[Hero image: man in orange and black vest wearing a white helmet, holding a yellow and black power tool]

Nov 28, 2025

The Tech Behind a Reliable eCommerce AI Agent

For Borgh, we set out to build an AI assistant that feels less like a chatbot and more like a domain-aware product expert.

Justin Plagis

Chief Product

With a catalog of 15,000+ Milwaukee and Makita SKUs, the challenge wasn’t generating text — it was getting retrieval, structure and trust right.

From a CTO lens, this project demonstrates how to combine vector search, traditional filtering, prompt governance and lightweight orchestration into a production-safe e-commerce agent. The result runs across Supabase, LangChain (n8n), OpenAI and LangFuse, with a React front-end on Vercel.

Below is the architecture and reasoning behind it.

Why we didn’t “just add a chatbot”

Tool shoppers aren’t browsing for inspiration — they’re solving a job. Their queries typically involve:

  • Brand preferences (“Makita M-series?”)

  • Specific variants (“M12 vs M18?”)

  • Edge constraints (budget, category, availability)

  • SKU-level precision

Generic LLM chat falls short here. It hallucinates SKUs, misses category boundaries, and lacks observability. We needed an agent that behaves more like a typed API client than a text generator — with controlled formatting, deterministic retrieval, and prompt governance.

This set the core requirements:

  1. Hybrid retrieval: structured filtering + pgvector semantic search

  2. Deterministic output structure: always SKU-first, always markdown

  3. Observability: traces, versioned prompts, LLM-as-a-judge scoring

  4. Composable orchestration: no monolithic backend, just tools + workflows

  5. Replaceable components: interchangeable LLM, modular edge functions, simple front-end integration

The Architectural Overview

We built the stack around four components:

  1. Experience layer:

    • React chat widget on Vercel

    • Sends messages + session ID to an n8n webhook

    • Renders structured markdown answers

  2. Orchestration layer (n8n):

    • Fetches latest system prompt from LangFuse

    • Runs an OpenAI 5.1 mini model as an agent

    • Provides two Supabase-backed tools:

      • search-products

      • search-information

    • Converts the model output into a structured JSON response

    • Logs to Google Sheets for lightweight QA

  3. Data layer (Supabase):

    • Postgres for structured product data

    • pgvector for semantic embeddings

    • Edge functions that expose search endpoints to the agent

    • Acts as the central, typed data platform

  4. Content governance (LangFuse):

    • Stores versioned system prompts

    • Captures traces and tool usage

    • Runs LLM-as-a-judge evaluations for regression testing

    • Provides analytics for prompt iteration

This architecture keeps each piece isolated, replaceable and inspectable.
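To make that isolation concrete, the contract between the orchestrator and the front-end is kept deliberately small. Here's a sketch of the response envelope the widget renders (field names are illustrative assumptions, not the actual Borgh schema):

```typescript
// Illustrative shape of the webhook response the front-end renders.
// Field names are assumptions, not the actual Borgh schema.
interface ChatResponse {
  sessionId: string;
  answerMarkdown: string; // stable markdown, SKU-first bullets
  toolCalls: string[];    // e.g. ["search-products"], for tracing
  promptVersion: number;  // LangFuse prompt version used
}

// Minimal runtime guard so the widget never renders a malformed payload.
function isChatResponse(value: unknown): value is ChatResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.sessionId === "string" &&
    typeof v.answerMarkdown === "string" &&
    Array.isArray(v.toolCalls) &&
    v.toolCalls.every((t) => typeof t === "string") &&
    typeof v.promptVersion === "number"
  );
}
```

Keeping the envelope this narrow is what makes the LLM swappable: as long as the orchestrator emits this shape, the front-end never cares which model produced it.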

Keeping 15,000 SKUs fresh: Magento → Supabase → pgvector

Rather than reindexing the full catalog, we rely on delta ingestion from Magento:

  1. Magento exports daily change logs (CRUD operations) to an SFTP location.

  2. A scheduled Supabase edge function pulls and parses the logs.

  3. Each changed product is normalized into:

    • Core fields (SKU, price, brand, category, stock)

    • Descriptive text for embeddings

  4. We compute OpenAI embeddings and update:

    • Postgres (structured data)

    • pgvector (semantic layer)

This gives the agent two search primitives:

  • Precision via SQL filters

  • Flexibility via semantic similarity

It’s lightweight, predictable, and avoids coupling to Magento’s runtime performance.
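As a sketch of the normalization step, assuming a semicolon-delimited change log with an operation column (the real Magento export format will differ):

```typescript
// One normalized change-log entry from the daily Magento export.
// The log format (op;sku;price;brand;category;stock) is an assumption.
type CrudOp = "create" | "update" | "delete";

interface ProductDelta {
  op: CrudOp;
  sku: string;
  price?: number;
  brand?: string;
  category?: string;
  stock?: number;
}

// Parse one export line into a typed delta, or null if malformed.
function parseDeltaLine(line: string): ProductDelta | null {
  const [op, sku, price, brand, category, stock] = line.trim().split(";");
  if (!sku || !["create", "update", "delete"].includes(op)) return null;
  if (op === "delete") return { op, sku };
  return {
    op: op as CrudOp,
    sku,
    price: Number(price),
    brand,
    category,
    stock: Number(stock),
  };
}
```

In the real pipeline, the scheduled edge function maps each create/update delta to an embedding call plus an upsert into Postgres and pgvector; deletes simply remove the row from both.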

Making the agent reliable: a multi-step retrieval strategy

The key decision was to treat the LLM not as a “chat model”, but as a controller that produces deterministic search requests.

Everything the agent does — tone, structure, constraints, and methodology — is defined in the system prompt and governed through LangFuse.

Clarify-First

Before any search, the agent must ask 1–3 targeted questions if the query is ambiguous (brand, category, price, number of results). This prevents retrieval drift and significantly reduces irrelevant matches.

Internal Query Plan

The LLM must internally construct (but not display) a typed JSON query plan defining:

  • Intent (find, compare, browse, order_note)

  • Mode (close vs. wide)

  • Constraints (brand, category_ids, min/max price, result count)

  • Tool configuration (search type, allowed categories, limit, shape)

This forces the model to reason explicitly before making a tool call.
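A sketch of how that plan could be typed (field names mirror the bullets above; the exact schema in the Borgh prompt is not shown here):

```typescript
// Illustrative typing of the internal query plan the prompt enforces.
// Field names mirror the post's description but are assumptions.
type Intent = "find" | "compare" | "browse" | "order_note";
type Mode = "close" | "wide";

interface QueryPlan {
  intent: Intent;
  mode: Mode;
  constraints: {
    brand?: string;
    category_ids?: number[];
    min_price?: number;
    max_price?: number;
    result_count: number;
  };
  tool: {
    search_type: "hybrid" | "semantic" | "sql";
    allowed_categories?: number[];
    limit: number;
  };
}

// Example plan for "Makita drill under 300, show 3 options".
const examplePlan: QueryPlan = {
  intent: "find",
  mode: "close",
  constraints: { brand: "Makita", category_ids: [42], max_price: 300, result_count: 3 },
  tool: { search_type: "hybrid", allowed_categories: [42], limit: 10 },
};
```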

Close → Wide search pattern

We enforce a two-stage search:

  • Close mode

    • Constrain to category IDs (derived from a seed SKU or query)

    • Apply brand + price filters

    • Use hybrid search as default

  • Wide mode (fallback)

    • Triggered automatically if close returns insufficient results

    • Removes category constraints

    • Uses semantic/hybrid retrieval over the full catalog

This mirrors how human sales experts narrow → broaden depending on available inventory.
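The escalation rule itself is small enough to sketch. Here the two search calls are injected as plain functions so the logic stays testable; in production they wrap the async Supabase edge functions:

```typescript
interface Product { sku: string; name: string; }

// A search callable; in production these wrap the async Supabase
// edge functions, but the escalation rule itself is simple.
type Search = (query: string) => Product[];

// Run close mode first; escalate to wide mode only when close
// returns fewer than `minResults` hits.
function closeThenWide(
  query: string,
  closeSearch: Search,
  wideSearch: Search,
  minResults = 3,
): { mode: "close" | "wide"; products: Product[] } {
  const close = closeSearch(query);
  if (close.length >= minResults) return { mode: "close", products: close };
  return { mode: "wide", products: wideSearch(query) };
}
```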

Post-Filtering & Diversity

After retrieval, the agent must:

  • Remove irrelevant or cross-category items

  • Eliminate accessories (strict rule)

  • Produce a maximum of three options

  • Provide brand or variant diversity wherever possible

Finally, the answer is formatted in stable markdown with SKU-first bullets.
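Because these rules are deterministic, they can be enforced in plain code rather than left to the model. A sketch, assuming an `isAccessory` flag derived from category data:

```typescript
interface Candidate {
  sku: string;
  name: string;
  brand: string;
  isAccessory: boolean; // assumption: flag derived from category data
}

// Apply the post-retrieval rules: drop accessories, prefer brand
// diversity, and cap the list at three options.
function postFilter(candidates: Candidate[]): Candidate[] {
  const tools = candidates.filter((c) => !c.isAccessory);
  const picked: Candidate[] = [];
  const seenBrands = new Set<string>();
  // First pass: one option per brand for diversity.
  for (const c of tools) {
    if (picked.length === 3) break;
    if (!seenBrands.has(c.brand)) {
      picked.push(c);
      seenBrands.add(c.brand);
    }
  }
  // Second pass: top up to three if diversity exhausted the pool.
  for (const c of tools) {
    if (picked.length === 3) break;
    if (!picked.includes(c)) picked.push(c);
  }
  return picked;
}

// Render SKU-first markdown bullets (the exact format is an assumption).
function toMarkdown(options: Candidate[]): string {
  return options.map((c) => `- **${c.sku}**: ${c.name} (${c.brand})`).join("\n");
}
```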

This is the difference between “chatbot answers” and e-commerce-grade product recommendations.

Prompt governance & observability with LangFuse

The project only works because we built observability into the core.

Versioned prompts

LangFuse stores the system prompt (borgh-chat-prompt) as a versioned artifact. n8n retrieves the latest version on every request. This lets us:

  • Update behavior without deployment

  • Roll back instantly

  • Run controlled A/B or phased rollouts
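The fetch pattern can be sketched with an injected client, so a cached last-known-good prompt survives a LangFuse outage (the client interface below is a stand-in, not the actual LangFuse SDK):

```typescript
// Minimal client interface; the real LangFuse SDK sits behind this
// stand-in so the fallback logic stays testable.
interface PromptClient {
  getPrompt(name: string): Promise<{ version: number; text: string }>;
}

// Cache of the last prompt successfully fetched per name.
const lastKnownGood = new Map<string, { version: number; text: string }>();

// Fetch the latest versioned prompt, falling back to the last
// known-good copy if the prompt store is unreachable.
async function fetchSystemPrompt(
  client: PromptClient,
  name = "borgh-chat-prompt",
): Promise<{ version: number; text: string }> {
  try {
    const prompt = await client.getPrompt(name);
    lastKnownGood.set(name, prompt);
    return prompt;
  } catch {
    const cached = lastKnownGood.get(name);
    if (cached) return cached;
    throw new Error(`no prompt available for ${name}`);
  }
}
```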

Traces & tool visibility

Every interaction logs:

  • User input

  • System prompt version

  • Tool calls (search-products, search-information)

  • Final output

  • Confidence scores

This makes the agent fully inspectable — a non-negotiable requirement for commercial use.

LLM-as-a-judge

We run automated evaluations that check:

  • Relevance of recommended products

  • Structural compliance (markdown, SKU-first, max 3 options)

  • Tone consistency with the Borgh persona

  • Correctness of “clarify-first” logic

These scores allow regression testing for each prompt or retrieval change, similar to traditional API behavior tests.
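The structural checks don't need an LLM at all; they can run as deterministic assertions before the judge model scores relevance and tone. A sketch, assuming the SKU-first bullet format described above:

```typescript
// Deterministic structural checks run alongside the LLM judge:
// markdown bullets, SKU-first bold lead, at most three options.
// The exact bullet pattern is an assumption about the house format.
function checkStructure(answer: string): { pass: boolean; reasons: string[] } {
  const reasons: string[] = [];
  const bullets = answer.split("\n").filter((l) => l.startsWith("- "));
  if (bullets.length === 0) reasons.push("no markdown bullets");
  if (bullets.length > 3) reasons.push("more than three options");
  for (const b of bullets) {
    // Each bullet must open with a bolded SKU, e.g. "- **M18-2904**: ..."
    if (!/^- \*\*[A-Za-z0-9-]+\*\*/.test(b)) reasons.push(`not SKU-first: ${b}`);
  }
  return { pass: reasons.length === 0, reasons };
}
```

Running this gate before the judge keeps the expensive LLM evaluation focused on the parts only an LLM can assess.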

Lightweight QA in Google Sheets

We complement LangFuse with a Google Sheet log for human QA, making stakeholder review easy without exposing internal systems.

What’s next

With retrieval, governance and UX foundations in place, the next steps are straightforward: expand the agent's capabilities to:

  • Offer discount deals at the lowest available prices

  • Place orders directly through the Magento backend

  • Handle basic support actions such as invoicing, shipping and billing information

Last but not least, here's a plain-text sketch of the architecture:

[ Shopper ]
   |
   v
[ Website (React, Vercel) ]
   |
POST message
   |
   v
[ n8n Orchestrator ]
   - Fetch prompt from LangFuse
   - LangChain AI Agent (OpenAI 5.1 mini)
   - Tools:
       * search-products (Supabase)
       * search-information (Supabase)
   - Parse output
   - Log to Sheets
   |
   v
[ Front-end renders structured markdown ]

Side connections:
[ LangFuse ] <-> prompts, traces, evaluations
[ Supabase ] -> Postgres + pgvector + edge functions
[ Magento ] -> SFTP -> Supabase ingestion job
