From Days to Seconds: How an AI Query Layer Eliminated the Data Request Backlog on a Gift Card Platform
Providers and merchants on a digital gift card platform needed redemption data, balance figures, and settlement analytics to run their business - but every question required submitting a request to engineering. An AI sidecar with natural language query and role-based data isolation changed that.
Arko IT Services
The problem nobody had named
The gift card platform had two kinds of users who needed data every day to run their operations. Providers issued gift cards and needed to track issuance, redemption, and balance status. Merchants accepted those cards and needed to track their own redemptions, settlements, and dues.
Both groups were asking the same kind of question, over and over, in different words:
- "How many of my cards were redeemed this week?"
- "What's my current balance with the platform?"
- "Which cards are expiring in the next 30 days?"
- "What's my settlement amount for this month?"
- "Why is this card showing a different balance than I expect?"
None of these are hard questions. The data was sitting in the platform's database. The answers were simple to produce.
But the access path was engineering. Every question became a ticket. Every ticket joined a queue. Every answer took days.
Users were not even complaining. They had accepted it as how things worked. The teams had normalized it. But the pile-up of these requests had two effects that were harder to see from inside the system.
Engineering time was disappearing into data retrieval. The volume of routine data request tickets was not huge, but it was steady. Weeks of capacity per quarter went into producing answers that needed no engineering skill, only engineering access to the database.
And business decisions were getting delayed. A provider deciding whether to run a promotional campaign needed redemption data. If that data took three days to land, the decision got made on stale information, or on no data at all. The cost was real and nobody was counting it.
The architecture decision that made this safe
The first decision, and the most important one, was an architectural constraint: the AI query layer would have no write path to the production database.
That is not the obvious choice when you are scoping an AI feature. The tempting move is to wire the AI straight into the production system so it answers questions with real-time data. The catch is that an AI layer with direct production access adds risk to production. Bugs in query generation, unexpected load from a heavy query, prompt injection attempts: all of them now have production consequences.
The sidecar approach removes that risk completely. The AI layer works off a read replica synced from the production database on a schedule. The replica is the only database the AI layer touches. A bug in the AI layer produces a wrong answer. It cannot corrupt production data, lock production tables, or interfere with transaction processing.
```mermaid
graph TD
    subgraph PRODUCTION["Production Platform (.NET + SQL Server)"]
        APP[Application Backend]
        SOURCE[SQL Server Database]
    end
    subgraph ETL["ETL Layer"]
        SYNC[Sync Orchestrator - scheduled]
    end
    subgraph SIDECAR["AI Sidecar (Python + FastAPI)"]
        REPLICA[PostgreSQL Read Replica]
        ROUTER[LLM Router - intent classification]
        SQL_AGENT[SQL Agent]
        RAG_AGENT[RAG Agent]
        GUARDRAILS[Input Validator + Output Scrubber]
        RATE[Rate Limiter]
    end
    SOURCE -->|incremental sync| SYNC
    SYNC --> REPLICA
    APP -->|JWT auth| SIDECAR
    GUARDRAILS --> ROUTER
    ROUTER -->|structured intent| SQL_AGENT
    ROUTER -->|knowledge query| RAG_AGENT
    SQL_AGENT --> REPLICA
```
The ETL sync runs on a short interval, keeping the replica current for the kinds of questions the system is built to answer. Redemption counts, balance summaries, and settlement figures do not need millisecond freshness. They need minute-level freshness, and that is what the sidecar delivers.
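The article does not show the sync layer itself, so here is a minimal sketch of what a watermark-based incremental sync can look like, assuming the source tables carry an `updated_at` audit column. The `sync_watermarks` table, the column list, and the single `payload` column are illustrative assumptions, not the platform's actual schema.

```python
# Hypothetical watermark-based incremental sync: copy rows changed since the
# last run from the SQL Server source into the PostgreSQL replica.
import pyodbc
import psycopg2

def sync_table(source_dsn: str, replica_dsn: str, table: str) -> None:
    # `table` must come from a fixed allowlist; it is interpolated into SQL.
    src = pyodbc.connect(source_dsn)
    dst = psycopg2.connect(replica_dsn)
    try:
        with dst, dst.cursor() as cur:
            # The replica tracks how far each table has been synced.
            cur.execute(
                "SELECT last_synced_at FROM sync_watermarks WHERE table_name = %s",
                (table,),
            )
            (watermark,) = cur.fetchone()

        # Pull only rows modified since the last watermark.
        rows = src.cursor().execute(
            f"SELECT id, payload, updated_at FROM {table} WHERE updated_at > ?",
            watermark,
        ).fetchall()

        with dst, dst.cursor() as cur:
            for row_id, payload, updated_at in rows:
                # Upsert so re-running a window is harmless (idempotent sync).
                cur.execute(
                    f"INSERT INTO {table} (id, payload, updated_at) "
                    "VALUES (%s, %s, %s) "
                    "ON CONFLICT (id) DO UPDATE SET "
                    "payload = EXCLUDED.payload, updated_at = EXCLUDED.updated_at",
                    (row_id, payload, updated_at),
                )
            # Simplified: a production job would advance the watermark to the
            # max updated_at it actually copied, not the current clock.
            cur.execute(
                "UPDATE sync_watermarks SET last_synced_at = now() "
                "WHERE table_name = %s",
                (table,),
            )
    finally:
        src.close()
        dst.close()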
How the query system works
Intent classification
When a user submits a natural language query, an LLM classifier maps the input to a structured intent before any data is touched. The intent taxonomy covers the question categories that show up in real operational usage:
- Provider intents: gift card status, redemption analytics, balance summary, expiring cards.
- Merchant intents: outlet redemptions, settlement figures, outstanding dues, denomination performance.
- Admin intents: platform-wide volume, provider rankings, overall redemption rates.
The classification step matters for two reasons. First, it gates data access: a classified intent is checked against the user's role before any query runs. Second, it pins query generation to known templates for known intents, which shrinks the surface for SQL injection and unpredictable generated queries.
"Which of my cards were redeemed yesterday?" and "Show me redemptions from the last 24 hours" classify to the same intent, the same query template, and the same data scope. The phrasing variation that breaks a rule-based system is something the LLM classifier handles without fuss.
Role-based data isolation
Each user authenticates with the same JWT they use on the production platform. The sidecar validates the token and pulls out the role and entity identity.
Every query is automatically scoped to the authenticated user's data:
| Role | Query Scope |
|---|---|
| Provider | Own gift cards and redemptions only |
| Merchant | Own outlet transactions only |
| Admin | Full platform data |
A Provider user cannot ask about another Provider's cards. A Merchant user cannot see platform-wide figures. The scoping is enforced at the query generation level, not the response level - the query itself is written to return only the authorized data.
So the system never has to post-filter results and hope the filtering was right. The query is scoped before it runs.
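A minimal sketch of that idea, assuming HS256-signed JWTs carrying an `entity_id` claim and Postgres-style named parameters. The template text and claim names are assumptions, not the platform's actual code.

```python
# Sketch: the scope predicate is baked into the SQL template, so the
# generated query cannot return another entity's rows.
import jwt  # PyJWT

# One template per intent. Illustrative; real templates are not published.
TEMPLATES = {
    "redemption_analytics": (
        "SELECT count(*) AS redemptions "
        "FROM redemptions "
        "WHERE provider_id = %(entity_id)s "
        "AND redeemed_at >= now() - %(window_days)s * interval '1 day'"
    ),
}

def build_scoped_query(token: str, intent: str, parameters: dict, secret: str):
    claims = jwt.decode(token, secret, algorithms=["HS256"])
    # entity_id comes from the platform-issued JWT, never from the user's
    # question text, so the scope cannot be talked around.
    bind = {**parameters, "entity_id": claims["entity_id"]}
    return TEMPLATES[intent], bind
```

The LLM never writes SQL here; it only selects an intent and parameters, and the scope predicate ships with the template.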
The guardrails
Building an AI query layer against operational data means taking the security surface seriously.
Input validation blocks SQL injection patterns and prompt injection attempts before they reach the LLM. Queries with SQL keywords in unexpected positions, attempts to override system instructions, or patterns matching known injection signatures are rejected before classification.
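As a sketch of the shape of that check (the patterns below are illustrative examples of those signature classes, not a complete filter):

```python
import re

# Illustrative signature classes: stacked SQL statements, classic injection,
# and instruction-override phrasing. A real filter would be broader.
REJECT_PATTERNS = [
    re.compile(r";\s*(drop|delete|update|insert|alter)\b", re.IGNORECASE),
    re.compile(r"\bunion\s+select\b", re.IGNORECASE),
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
]

def validate_input(user_text: str) -> None:
    for pattern in REJECT_PATTERNS:
        if pattern.search(user_text):
            # Rejected before the text ever reaches the LLM classifier.
            raise ValueError("query rejected by input validator")
```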
Output scrubbing masks PII in responses. The platform data holds email addresses, phone numbers, and other personal identifiers. Responses are scanned and masked before they leave the sidecar, so a provider asking about cardholder redemption patterns gets aggregate data, not individual cardholder details.
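A sketch of that masking step; real PII detection would go well beyond two regexes, but the shape is this:

```python
import re

# Illustrative patterns for the two identifier types named above.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def scrub(text: str) -> str:
    # Mask identifiers in place; aggregates and figures pass through intact.
    text = EMAIL.sub("[email masked]", text)
    return PHONE.sub("[phone masked]", text)
```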
Rate limiting applies per-user limits at burst, minute, and hour granularity. That stops anyone from turning the query layer into an unrestricted database engine.
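A sketch of per-user limits at those three granularities. In production the counters would live in shared storage such as Redis; an in-process dict and illustrative numbers show the logic:

```python
import time
from collections import defaultdict, deque

# (window seconds, max requests): burst, per-minute, per-hour.
# The numbers are illustrative, not the platform's actual limits.
LIMITS = [(1, 3), (60, 20), (3600, 200)]

_history: dict = defaultdict(deque)

def allow(user_id: str) -> bool:
    now = time.monotonic()
    events = _history[user_id]
    # Drop events older than the largest window.
    while events and now - events[0] > 3600:
        events.popleft()
    # A request must fit under every window's cap to proceed.
    for window, max_requests in LIMITS:
        if sum(1 for t in events if now - t <= window) >= max_requests:
            return False
    events.append(now)
    return True
```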
The business value it unearthed
The immediate outcome was the one we expected: providers and merchants could ask questions and get answers in seconds instead of days. The value that surfaced over the following weeks and months was bigger than that.
Providers started running operations they had been quietly avoiding. When a provider can check which cards expire this week in seconds, she can act on it: run a reminder campaign, offer an extension, contact the affected cardholders. When the answer took three days, the information showed up too late to use. The sidecar made a whole category of business action practical that had only been theoretical before.
Merchants found discrepancies they had stopped looking for. Several merchants, once they could query their own settlement figures directly, found gaps between their expected and actual settlement amounts that had been quietly accumulating. These were reconciliation issues that had always been visible in the data. The friction of getting at the data was the only thing keeping them hidden.
The platform got stickier. Providers and merchants who use the query layer come back to the platform more often than those who do not. Daily operational questions that used to require an outside contact are now answered inside the platform. Engagement went up with no change to the core product.
Engineering got capacity back. The data request queue was not large before the sidecar; after it, the queue was close to zero for the question categories the sidecar covers. The engineers who had been the path to that data could put the time back into product work.
What it took to build and what it cost
Build time was about six weeks, from the first ETL design to production deployment. The ETL sync layer and the security guardrails ate the most time. The LLM routing and SQL agent work went faster than expected, because the intent taxonomy was well-defined before implementation started.
Ongoing operational cost: the sidecar adds an LLM inference cost per query. At production query volumes, that cost is a small fraction of the value created. Every query that no longer becomes an engineering ticket removes a cost nobody tracked but everybody paid.
Maintenance: the ETL sync needs attention when the source schema changes. The intent taxonomy needs expanding when new query patterns emerge from user behavior. Neither is heavy, but both need an owner.
What the sidecar did not change
The sidecar did not touch the production platform. No production schema changes, no changes to the application backend, no changes to authentication or authorization in the main system. The JWT validation is shared, but the JWT itself is still issued by the existing authentication system.
That was a deliberate constraint, and it is the reason the project finished in six weeks instead of six months. An additive layer with no production dependencies keeps the scope bounded and the risk contained.
The platform's operational data was always there. Its business value was always real. The only thing that changed was the access path.