Building AI-Powered Search for E-commerce Applications
TL;DR
Traditional e-commerce search is keyword matching — it finds "brake pad" when someone types "brake pad" and returns nothing useful when someone types "grinding noise when I stop." I have built AI-powered search for two production e-commerce platforms: EuroParts Lanka (European car parts in Sri Lanka, where customers describe symptoms and get matched to OEM parts) and FreshMart (grocery delivery, where shoppers search by meal intent and get matched to ingredients). The architecture that works in production is a hybrid approach — semantic vector search for understanding intent, combined with traditional keyword and filter-based retrieval for precision. Neither one alone is good enough. Semantic search catches meaning but misses exact model numbers. Keyword search catches exact matches but misses intent. Together, with a weighted ranking layer, they deliver search results that feel like talking to a knowledgeable sales associate. In this article, I walk through the full architecture, the embedding pipeline, query processing, ranking algorithms, and the real performance numbers from both projects.
Why Traditional Search Fails
Every e-commerce platform starts with the same search implementation: take the user's query, split it into tokens, run a SQL LIKE or full-text search against product titles and descriptions, return results sorted by relevance score. It works fine for about two weeks, and then support tickets start rolling in.
The problem is not the technology. The problem is the assumption that customers know what they are looking for.
On EuroParts Lanka, I tracked search queries for 30 days before building the AI search system. Here is what I found:
- 38% of queries were symptom descriptions, not product names — "car shaking at high speed," "oil leak near engine," "AC not blowing cold"
- 24% had spelling errors or used Singlish transliterations that traditional search could not parse
- 19% used generic terms like "brake thing" or "that rubber part under the car"
- Only 19% of queries were precise enough for keyword search to return the right product on the first page
That means traditional search was failing 81% of the time. Not returning zero results — that would be obvious. It was returning wrong results. A customer searching for "steering wheel hard to turn" would get steering wheel covers, steering column locks, and steering wheel badges. Not power steering pumps, not rack and pinion assemblies, not power steering fluid — the things they actually needed.
FreshMart had a different version of the same problem. People do not search for "all-purpose flour 1kg" when they want to bake a cake. They search for "cake ingredients" or "birthday party supplies" or "something sweet for tonight." Keyword search cannot bridge the gap between intent and inventory.
The gap between what customers mean and what they type is where AI search lives.
Hybrid Search — Semantic + Keyword
The first mistake people make with AI search is going all-in on semantic search and ripping out keyword matching. I made this mistake on my first prototype for EuroParts Lanka. Pure semantic search understood "grinding noise when braking" beautifully — it returned brake pads, brake discs, brake calipers. But when a mechanic searched for "1K0 615 301 AA" — an exact OEM part number — semantic search returned vaguely related brake components instead of the exact part. That is a dealbreaker.
The architecture that works in production is hybrid search: two retrieval paths that run in parallel, with a ranking layer that merges and re-scores the results.
User Query
│
├──► Keyword Search (PostgreSQL full-text / Algolia)
│ └── Exact matches, part numbers, brand names
│
├──► Semantic Search (pgvector / embeddings)
│ └── Intent matching, symptom-to-product, natural language
│
└──► Filters (category, price range, vehicle compatibility)
└── Hard constraints that must be satisfied
▼
Ranking & Fusion Layer
│
▼
Ranked Results → User

Keyword search handles precision. When someone types an exact SKU, a brand name, or a specific product title, keyword search wins every time. It is fast, deterministic, and does not hallucinate.
Semantic search handles recall. When someone describes a problem, asks a vague question, or uses non-standard language, semantic search understands the intent behind the words and surfaces products that are conceptually relevant even if they share zero keywords with the query.
Filters handle constraints. Price range, category, vehicle compatibility, availability — these are not search problems. They are filter problems. Mixing them into the search scoring function adds complexity without improving results. Apply them as hard constraints after retrieval.
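Applying constraints after retrieval is just a plain filter pass over the candidate set, with no scoring involved. A minimal sketch, using an illustrative product shape and constraint names rather than the production schema:

```typescript
interface CandidateProduct {
  id: string;
  category: string;
  price: number;
  inStock: boolean;
}

interface HardConstraints {
  category?: string;
  maxPrice?: number;
  inStockOnly?: boolean;
}

// Hard constraints are a yes/no gate over retrieved candidates,
// applied after both search paths return
function applyHardConstraints(
  candidates: CandidateProduct[],
  constraints: HardConstraints
): CandidateProduct[] {
  return candidates.filter((p) => {
    if (constraints.category && p.category !== constraints.category) return false;
    if (constraints.maxPrice !== undefined && p.price > constraints.maxPrice) return false;
    if (constraints.inStockOnly && !p.inStock) return false;
    return true;
  });
}
```

Keeping this out of the scoring function means a constraint can never be "outvoted" by a high relevance score.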
The fusion layer is where the real engineering happens. I use Reciprocal Rank Fusion (RRF) to merge the two result sets. More on that in the ranking section.
Architecture
Here is the full architecture I use for AI-powered e-commerce search, refined across EuroParts Lanka and FreshMart.
// Core search architecture
interface SearchRequest {
query: string;
filters?: {
category?: string;
priceRange?: { min: number; max: number };
brand?: string;
inStock?: boolean;
vehicleModel?: string; // EuroParts-specific
};
page?: number;
pageSize?: number;
}
interface SearchResult {
products: RankedProduct[];
totalCount: number;
searchMeta: {
queryType: 'keyword' | 'semantic' | 'hybrid';
semanticScore: number;
keywordScore: number;
processingTimeMs: number;
didYouMean?: string;
};
}
interface RankedProduct {
id: string;
title: string;
description: string;
price: number;
score: number; // Combined ranking score
matchType: 'exact' | 'semantic' | 'hybrid';
highlightedFields: Record<string, string>;
}

The system has four main components:
- Embedding Pipeline — Pre-computes vector embeddings for every product at ingest time
- Query Processor — Analyzes, cleans, expands, and embeds the user's query in real time
- Dual Retriever — Runs keyword and semantic search in parallel
- Ranking Engine — Fuses results using RRF with domain-specific boosting signals
Each component is independently deployable. The embedding pipeline runs as a background job whenever products are created or updated. The query processor, dual retriever, and ranking engine form a single API endpoint that responds in under 200ms at the 95th percentile.
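The dual-retriever step fans the processed query out to both paths concurrently. A sketch of that wiring, with the two retrieval functions passed in as stand-ins (each is assumed to return an ordered list of product IDs):

```typescript
// Run keyword and semantic retrieval in parallel. The function signatures
// here are illustrative stand-ins for the two paths described above.
async function dualRetrieve(
  query: string,
  embedding: number[],
  keywordSearch: (q: string) => Promise<string[]>,
  semanticSearch: (e: number[]) => Promise<string[]>
): Promise<{ keywordIds: string[]; semanticIds: string[] }> {
  // Promise.all keeps total latency at max(path latencies), not their sum
  const [keywordIds, semanticIds] = await Promise.all([
    keywordSearch(query),
    semanticSearch(embedding),
  ]);
  return { keywordIds, semanticIds };
}
```

Running the paths sequentially would roughly double the retrieval portion of the latency budget; parallel execution is what keeps the whole endpoint under 200ms.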
Embedding Products
The embedding pipeline runs at ingest time — when a product is created, updated, or bulk-imported. This is not a real-time operation. You do not want to embed products on-the-fly during a search request.
The key decision is what text to embed. Embedding just the product title gives you a shallow representation. Embedding the full description gives you noise. The approach that works best is constructing a search document — a purpose-built text string that combines the most search-relevant fields.
import { generateEmbedding, generateEmbeddingBatch } from '@/lib/embeddings';
import { prisma } from '@/lib/prisma';
interface Product {
id: string;
title: string;
description: string;
category: string;
brand: string;
tags: string[];
specifications: Record<string, string>;
// EuroParts-specific
compatibleVehicles?: string[];
oemNumbers?: string[];
symptoms?: string[];
}
function buildSearchDocument(product: Product): string {
const parts: string[] = [
product.title,
product.category,
product.brand,
product.description.slice(0, 500),
];
if (product.tags.length > 0) {
parts.push(product.tags.join(', '));
}
// For auto parts: include symptoms and vehicle compatibility
if (product.symptoms && product.symptoms.length > 0) {
parts.push(`Symptoms: ${product.symptoms.join('. ')}`);
}
if (product.compatibleVehicles && product.compatibleVehicles.length > 0) {
parts.push(`Compatible with: ${product.compatibleVehicles.join(', ')}`);
}
if (product.oemNumbers && product.oemNumbers.length > 0) {
parts.push(`OEM: ${product.oemNumbers.join(', ')}`);
}
// Include key specifications
const specEntries = Object.entries(product.specifications);
if (specEntries.length > 0) {
const specText = specEntries
.map(([key, value]) => `${key}: ${value}`)
.join(', ');
parts.push(specText);
}
return parts.join(' | ');
}
async function embedProduct(product: Product): Promise<void> {
const searchDocument = buildSearchDocument(product);
const embedding = await generateEmbedding(searchDocument);
await prisma.$executeRaw`
UPDATE products
SET
search_document = ${searchDocument},
embedding = ${JSON.stringify(embedding)}::vector,
embedded_at = NOW()
WHERE id = ${product.id}
`;
}
async function embedProductBatch(
products: Product[],
batchSize = 100
): Promise<void> {
for (let i = 0; i < products.length; i += batchSize) {
const batch = products.slice(i, i + batchSize);
const documents = batch.map(buildSearchDocument);
const embeddings = await generateEmbeddingBatch(documents);
const values = batch.map((product, idx) => ({
id: product.id,
searchDocument: documents[idx],
embedding: embeddings[idx],
}));
await prisma.$transaction(
values.map((v) =>
prisma.$executeRaw`
UPDATE products
SET
search_document = ${v.searchDocument},
embedding = ${JSON.stringify(v.embedding)}::vector,
embedded_at = NOW()
WHERE id = ${v.id}
`
)
);
console.log(
`Embedded ${Math.min(i + batchSize, products.length)}/${products.length}`
);
}
}

A few things I learned the hard way:
Batch your embedding API calls. OpenAI's embedding API (and most alternatives) supports batch requests. Embedding 100 products one-by-one costs the same but takes 50x longer than a single batch call. For EuroParts Lanka's catalog of 1,400+ parts, batch embedding completes in under 2 minutes. One-by-one would take over an hour.
Include symptom language in the search document. For EuroParts, I added a symptoms field to each product — plain-language descriptions of what a failing part feels or sounds like. "Grinding noise when braking," "vibration in steering wheel at high speed," "whining noise when turning." This is the language customers actually use, and embedding it alongside the product data means semantic search naturally bridges the gap.
Re-embed on update, not on schedule. I trigger re-embedding via a Prisma middleware that fires on product update. No stale embeddings, no cron job drift. If the title, description, tags, or symptoms change, the embedding refreshes within seconds.
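A sketch of that hook: shouldReembed is the pure decision, and registerReembedHook shows the wiring against Prisma's $use middleware API. The field list and the minimal client shape are illustrative assumptions, not the production code.

```typescript
// Fields that feed buildSearchDocument — an illustrative list
const SEARCH_RELEVANT_FIELDS = ['title', 'description', 'tags', 'symptoms'];

// Only re-embed when a field baked into the embedding actually changed
function shouldReembed(updatedFields: string[]): boolean {
  return updatedFields.some((f) => SEARCH_RELEVANT_FIELDS.includes(f));
}

interface MiddlewareParams {
  model?: string;
  action: string;
  args: { data?: Record<string, unknown> };
}

type Next = (params: MiddlewareParams) => Promise<unknown>;

function registerReembedHook(
  client: { $use: (fn: (params: MiddlewareParams, next: Next) => Promise<unknown>) => void },
  embedProduct: (product: unknown) => Promise<void>
): void {
  client.$use(async (params, next) => {
    const result = await next(params);
    if (
      params.model === 'Product' &&
      (params.action === 'create' || params.action === 'update') &&
      shouldReembed(Object.keys(params.args.data ?? {}))
    ) {
      // Fire-and-forget: the mutation should not block on the embedding API
      void embedProduct(result).catch((err) => console.error('re-embed failed', err));
    }
    return result;
  });
}
```

The fire-and-forget call is the important design choice: a slow embedding provider should degrade search freshness, never write latency.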
Query Processing
The query side is where you make or break the search experience. A raw user query is messy — typos, mixed languages, ambiguous terms, unnecessary words. The query processor cleans, analyzes, and enriches it before it hits either search path.
interface ProcessedQuery {
original: string;
cleaned: string;
embedding: number[];
queryType: 'exact' | 'natural_language' | 'hybrid';
extractedFilters: Record<string, string>;
expandedTerms: string[];
}
async function processQuery(raw: string): Promise<ProcessedQuery> {
const cleaned = cleanQuery(raw);
const queryType = classifyQuery(cleaned);
const extractedFilters = extractFilters(cleaned);
const expandedTerms = expandSynonyms(cleaned);
const embedding = await generateEmbedding(cleaned);
return {
original: raw,
cleaned,
embedding,
queryType,
extractedFilters,
expandedTerms,
};
}
function cleanQuery(query: string): string {
return query
.toLowerCase()
.trim()
.replace(/\s+/g, ' ') // Normalize whitespace
.replace(/[^\w\s\-./]/g, '') // Keep hyphens, dots, slashes for part numbers
.slice(0, 200); // Cap length to prevent abuse
}
function classifyQuery(query: string): 'exact' | 'natural_language' | 'hybrid' {
// Part numbers: alphanumeric with hyphens/dots, e.g., "1K0-615-301-AA"
const partNumberPattern = /^[A-Z0-9][\w\-./]{4,}$/i;
if (partNumberPattern.test(query.replace(/\s/g, ''))) {
return 'exact';
}
// Natural language: contains common symptom/question words
const naturalLanguageSignals = [
'how', 'why', 'what', 'which', 'when',
'noise', 'sound', 'problem', 'issue', 'broken',
'not working', 'need', 'looking for', 'best',
'alternative', 'replacement', 'fix',
];
const hasNaturalLanguage = naturalLanguageSignals.some((signal) =>
query.includes(signal)
);
if (hasNaturalLanguage || query.split(' ').length > 4) {
return 'natural_language';
}
return 'hybrid';
}
function extractFilters(
query: string
): Record<string, string> {
const filters: Record<string, string> = {};
// Extract price constraints
const priceMatch = query.match(/under\s+(\d+)/);
if (priceMatch) {
filters['maxPrice'] = priceMatch[1];
}
// Extract brand names
const brands = ['audi', 'bmw', 'mercedes', 'volkswagen', 'porsche'];
for (const brand of brands) {
if (query.includes(brand)) {
filters['brand'] = brand;
}
}
// Extract vehicle model patterns
const modelMatch = query.match(
/(a[34568]|3 series|5 series|c class|e class|golf|passat|cayenne)/i
);
if (modelMatch) {
filters['model'] = modelMatch[1];
}
return filters;
}

The query classifier is critical. When I detect a part number pattern, I skip semantic search entirely and go straight to keyword/exact match. No point burning an embedding API call and a vector similarity search when the user knows exactly what they want. When I detect natural language — symptoms, questions, vague descriptions — I weight semantic search much higher in the ranking fusion. For everything in between, both paths get equal weight.
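That routing logic reduces to a small pure function over the classifier's output. The weights below are illustrative defaults, not tuned production values:

```typescript
type QueryType = 'exact' | 'natural_language' | 'hybrid';

interface RetrievalPlan {
  runKeyword: boolean;
  runSemantic: boolean;
  keywordWeight: number;
  semanticWeight: number;
}

function planRetrieval(queryType: QueryType): RetrievalPlan {
  switch (queryType) {
    case 'exact':
      // Part number detected: skip the embedding call and vector scan entirely
      return { runKeyword: true, runSemantic: false, keywordWeight: 1, semanticWeight: 0 };
    case 'natural_language':
      // Symptoms and questions: semantic carries most of the signal
      return { runKeyword: true, runSemantic: true, keywordWeight: 0.3, semanticWeight: 0.7 };
    default:
      // Ambiguous queries: both paths contribute equally
      return { runKeyword: true, runSemantic: true, keywordWeight: 0.5, semanticWeight: 0.5 };
  }
}
```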
Filter extraction is a simple but high-impact optimization. When someone searches "BMW brake pads under 5000" (prices on EuroParts are in LKR), I do not want to embed the brand name and price as part of the semantic query. I extract "BMW" as a brand filter and "5000" as a price ceiling, then run the cleaned query "brake pads" through the search pipeline. This gives dramatically better results because the semantic model focuses on product relevance while hard filters handle constraints.
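The stripping step pairs with extractFilters: once a term becomes a hard filter, it comes out of the text that gets embedded. A minimal sketch whose patterns mirror the extraction regexes above (the exact behavior is illustrative):

```typescript
// Remove terms that were promoted to hard filters, so the semantic query
// embeds only the product-relevance part of what the user typed
function stripFilterTerms(
  query: string,
  filters: Record<string, string>
): string {
  let cleaned = query;
  if (filters['maxPrice']) {
    cleaned = cleaned.replace(new RegExp(`under\\s+${filters['maxPrice']}`), '');
  }
  if (filters['brand']) {
    cleaned = cleaned.replace(filters['brand'], '');
  }
  if (filters['model']) {
    cleaned = cleaned.replace(filters['model'], '');
  }
  // Collapse the whitespace left behind by the removals
  return cleaned.replace(/\s+/g, ' ').trim();
}
```

So "bmw brake pads under 5000" becomes the semantic query "brake pads", while the brand and price ride along as constraints.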
Ranking and Relevance
This is the section that separates a prototype from a production search system. You have two result sets — one from keyword search, one from semantic search — and you need to merge them into a single ranked list that feels natural to the user.
I use Reciprocal Rank Fusion (RRF) as the base algorithm, with domain-specific boosting signals layered on top.
interface ScoredProduct {
productId: string;
keywordRank: number | null; // Position in keyword results (1-based)
semanticRank: number | null; // Position in semantic results (1-based)
rrfScore: number;
boostMultiplier: number;
finalScore: number;
}
function reciprocalRankFusion(
keywordResults: string[],
semanticResults: string[],
k = 60 // RRF constant — controls how much rank matters
): ScoredProduct[] {
const scoreMap = new Map<string, ScoredProduct>();
// Score keyword results
keywordResults.forEach((productId, index) => {
const rank = index + 1;
const existing = scoreMap.get(productId);
const rrfContribution = 1 / (k + rank);
if (existing) {
existing.keywordRank = rank;
existing.rrfScore += rrfContribution;
} else {
scoreMap.set(productId, {
productId,
keywordRank: rank,
semanticRank: null,
rrfScore: rrfContribution,
boostMultiplier: 1,
finalScore: 0,
});
}
});
// Score semantic results
semanticResults.forEach((productId, index) => {
const rank = index + 1;
const existing = scoreMap.get(productId);
const rrfContribution = 1 / (k + rank);
if (existing) {
existing.semanticRank = rank;
existing.rrfScore += rrfContribution;
} else {
scoreMap.set(productId, {
productId,
keywordRank: null,
semanticRank: rank,
rrfScore: rrfContribution,
boostMultiplier: 1,
finalScore: 0,
});
}
});
// Products appearing in BOTH lists get a fusion bonus
for (const product of scoreMap.values()) {
if (product.keywordRank !== null && product.semanticRank !== null) {
product.boostMultiplier *= 1.3;
}
}
// Apply boost and compute final scores
for (const product of scoreMap.values()) {
product.finalScore = product.rrfScore * product.boostMultiplier;
}
return Array.from(scoreMap.values()).sort(
(a, b) => b.finalScore - a.finalScore
);
}

RRF works by converting absolute relevance scores (which are not comparable across different search systems) into rank-based scores (which are). A product ranked #1 in keyword search gets 1 / (60 + 1) = 0.0164. A product ranked #1 in semantic search gets the same. If a product appears in both lists, its scores add up. The k constant (I use 60, which is standard) controls how quickly the score drops off with rank.
On top of RRF, I apply domain-specific boosting signals:
- In-stock boost — Products in stock get a 1.2x multiplier. Out of stock products still appear (so users know they exist) but rank lower.
- Popularity boost — Products with higher order counts in the last 90 days get a modest 1.1x boost. Not enough to override relevance, but enough to break ties.
- Recency boost — For FreshMart, seasonal products get a 1.15x boost when they are in season. Nobody wants to see Christmas pudding in July.
- Exact match override — If the keyword search returns a product with a 100% exact title match, that product gets pinned to position #1 regardless of fusion scores. When someone types the exact product name, give them exactly that.
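Stacked together, the boosts are plain multipliers on the RRF score. A sketch with the multipliers described above (the popularity threshold is illustrative, and the exact-match pin happens outside this function as a post-sort override):

```typescript
interface BoostSignals {
  inStock: boolean;
  orderCount90d: number;
  inSeason: boolean;
}

function applyBoosts(rrfScore: number, signals: BoostSignals): number {
  let multiplier = 1;
  if (signals.inStock) multiplier *= 1.2;            // availability
  if (signals.orderCount90d > 10) multiplier *= 1.1; // popularity tie-breaker (threshold illustrative)
  if (signals.inSeason) multiplier *= 1.15;          // seasonality (FreshMart)
  return rrfScore * multiplier;
}
```

Because RRF scores are small and close together, modest multipliers like these reorder near-ties without letting a popular-but-irrelevant product jump the list.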
The EuroParts Approach — Natural Language to Products
The most challenging and rewarding search problem I have solved was on EuroParts Lanka. The core challenge: a customer types "my Audi A4 is making a grinding noise when I brake" and needs to see brake discs and brake pads for the Audi A4 B8/B9, with the correct OEM numbers, at the right price points.
Traditional search returns zero useful results for that query. Semantic search alone returns brake-related parts across all vehicle brands. The solution is a three-stage pipeline that I call symptom-to-product mapping.
interface SymptomMapping {
symptomPattern: string;
relatedCategories: string[];
relatedParts: string[];
urgencyLevel: 'low' | 'medium' | 'high' | 'critical';
}
const SYMPTOM_MAPPINGS: SymptomMapping[] = [
{
symptomPattern: 'grinding.*brak|brake.*noise|squeal.*stop',
relatedCategories: ['brake-pads', 'brake-discs', 'brake-calipers'],
relatedParts: ['brake pad set', 'brake disc', 'brake caliper'],
urgencyLevel: 'high',
},
{
symptomPattern: 'vibrat.*steer|shak.*wheel|shimmy',
relatedCategories: ['suspension', 'wheel-bearings', 'tie-rods'],
relatedParts: ['wheel bearing', 'tie rod end', 'control arm'],
urgencyLevel: 'medium',
},
{
symptomPattern: 'oil.*leak|drip.*oil|oil.*under',
relatedCategories: ['gaskets', 'seals', 'oil-system'],
relatedParts: ['valve cover gasket', 'oil pan gasket', 'oil filter housing'],
urgencyLevel: 'high',
},
{
symptomPattern: 'ac.*cold|air.*hot|climate.*work',
relatedCategories: ['ac-system', 'climate-control'],
relatedParts: ['ac compressor', 'condenser', 'cabin filter', 'expansion valve'],
urgencyLevel: 'low',
},
{
symptomPattern: 'click.*turn|pop.*steer|knock.*turn',
relatedCategories: ['cv-joints', 'steering'],
relatedParts: ['cv joint', 'cv boot', 'steering rack'],
urgencyLevel: 'medium',
},
];
function matchSymptoms(query: string): SymptomMapping | null {
for (const mapping of SYMPTOM_MAPPINGS) {
const regex = new RegExp(mapping.symptomPattern, 'i');
if (regex.test(query)) {
return mapping;
}
}
return null;
}
async function searchWithSymptomContext(
query: string,
vehicleFilter?: { brand: string; model: string }
): Promise<SearchResult> {
const symptomMatch = matchSymptoms(query);
if (symptomMatch) {
// Boost results in matched categories
const categoryBoost = symptomMatch.relatedCategories;
// Run hybrid search with category boosting
const results = await hybridSearch(query, {
categoryBoost,
vehicleFilter,
});
// Add urgency context to search metadata
return {
...results,
searchMeta: {
...results.searchMeta,
symptomDetected: true,
urgencyLevel: symptomMatch.urgencyLevel,
suggestedCategories: symptomMatch.relatedCategories,
},
};
}
// No symptom detected — fall back to standard hybrid search
return hybridSearch(query, { vehicleFilter });
}

The symptom mapping layer sits between query processing and retrieval. When it detects a symptom pattern, it narrows the search space to relevant categories and boosts those categories in the ranking step. This is not replacing semantic search — it is augmenting it with domain expertise.
On EuroParts, I built up the symptom mapping table over 6 weeks by analyzing the WhatsApp message history between customers and sales staff. Every conversation where a customer described a problem and a staff member identified the correct part became a training example. I identified 47 distinct symptom patterns that map to specific product categories. These patterns handle the vast majority of natural language queries without needing an LLM in the search path — the embeddings handle the nuance, and the symptom mappings handle the structure.
The AI Part Finder (which I wrote about separately) uses Claude for deeper conversational interactions, but the search system itself does not call an LLM. Every search query resolves in under 200ms. An LLM call would add 1-3 seconds. For search, that latency kills the experience.
Handling Typos and Synonyms
In Sri Lanka, customers search in English, Singlish, and sometimes transliterated Sinhala. "Brake pad" might arrive as "break pad," "brak pad," or "braking pad." On FreshMart, "tomato" might come as "thakkali" or "tamato." Traditional search fails silently on all of these.
I handle this at two levels: fuzzy matching for keyword search, and the embedding model itself for semantic search.
// Fuzzy keyword matching with PostgreSQL trigram similarity
async function fuzzyKeywordSearch(
query: string,
threshold = 0.3
): Promise<string[]> {
const results = await prisma.$queryRaw<{ id: string; similarity: number }[]>`
SELECT
id,
GREATEST(
similarity(title, ${query}),
similarity(search_document, ${query})
) as similarity
FROM products
WHERE
similarity(title, ${query}) > ${threshold}
OR similarity(search_document, ${query}) > ${threshold}
ORDER BY similarity DESC
LIMIT 50
`;
return results.map((r) => r.id);
}
// Synonym expansion for known domain terms
const SYNONYM_MAP: Record<string, string[]> = {
'brake pad': ['brake shoe', 'braking pad', 'break pad', 'disk pad'],
'shock absorber': ['strut', 'damper', 'shock', 'suspension strut'],
'headlight': ['head lamp', 'headlamp', 'front light', 'head light'],
'wiper': ['wiper blade', 'windscreen wiper', 'windshield wiper'],
'battery': ['car battery', 'starter battery', 'accumulator'],
'air filter': ['air cleaner', 'engine filter', 'intake filter'],
'spark plug': ['ignition plug', 'sparking plug'],
'coolant': ['antifreeze', 'radiator fluid', 'cooling fluid'],
'timing belt': ['cam belt', 'cambelt', 'timing chain'],
};
function expandSynonyms(query: string): string[] {
const expanded: string[] = [query];
for (const [canonical, synonyms] of Object.entries(SYNONYM_MAP)) {
if (query.includes(canonical)) {
expanded.push(...synonyms.map((s) => query.replace(canonical, s)));
}
for (const synonym of synonyms) {
if (query.includes(synonym)) {
expanded.push(query.replace(synonym, canonical));
}
}
}
return [...new Set(expanded)];
}

PostgreSQL's pg_trgm extension is excellent for fuzzy matching. It compares strings using trigram (three-character subsequence) overlap. "brake pad" and "break pad" share enough trigrams to match with high similarity. This catches most typos without any ML overhead.
For semantic search, typo handling is largely free. Embedding models like text-embedding-3-small are trained on noisy internet text. They produce nearly identical vectors for "brake pad" and "break pad" because they have seen both millions of times. The embedding model is your best typo corrector — it understands meaning, not just characters.
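"Nearly identical vectors" is a measurable claim: the cosine similarity between the two query embeddings comes out close to 1. For reference, this is the metric pgvector's vector_cosine_ops computes server-side; a minimal client-side sketch:

```typescript
// Cosine similarity: 1 = same direction, 0 = orthogonal, -1 = opposite.
// This is the quantity the vector index ranks by (as 1 - cosine distance).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```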
The synonym map handles cases where both keyword search and semantic search might miss a match: abbreviations, regional language differences, and industry jargon. I maintain this manually. It is 47 entries for EuroParts and 83 for FreshMart. Small enough to curate by hand, high-impact enough to justify the effort.
Performance at Scale
Numbers matter. Here are the benchmarks from production:
EuroParts Lanka (1,444 products):
| Metric | Before AI Search | After AI Search |
|---|---|---|
| Search-to-cart rate | 8.2% | 23.7% |
| Zero-result queries | 31% | 4.2% |
| Average results page time | 45s | 12s |
| Search P95 latency | 180ms | 195ms |
| Support tickets from search | 28/week | 6/week |
FreshMart (6,200 products):
| Metric | Before | After |
|---|---|---|
| Search-to-cart rate | 12.4% | 29.1% |
| Zero-result queries | 22% | 2.8% |
| Average basket size from search | 3.2 items | 5.8 items |
| Search P95 latency | 210ms | 240ms |
The latency increase is marginal — about 15-30ms. That is the cost of one additional vector similarity query plus the embedding API call (which I cache aggressively). The business impact is dramatic. Search-to-cart rate nearly tripled on EuroParts and more than doubled on FreshMart. Zero-result queries dropped to near zero because semantic search always finds something conceptually relevant.
Here is the caching strategy that keeps latency under control:
import { Redis } from '@upstash/redis';
const redis = new Redis({
url: process.env.UPSTASH_REDIS_URL!,
token: process.env.UPSTASH_REDIS_TOKEN!,
});
async function getCachedEmbedding(query: string): Promise<number[] | null> {
const cacheKey = `emb:${hashQuery(query)}`;
const cached = await redis.get<number[]>(cacheKey);
return cached;
}
async function cacheEmbedding(
query: string,
embedding: number[]
): Promise<void> {
const cacheKey = `emb:${hashQuery(query)}`;
// Cache for 24 hours — product catalog changes daily at most
await redis.set(cacheKey, embedding, { ex: 86400 });
}
async function getOrCreateEmbedding(query: string): Promise<number[]> {
const cached = await getCachedEmbedding(query);
if (cached) return cached;
const embedding = await generateEmbedding(query);
await cacheEmbedding(query, embedding);
return embedding;
}
function hashQuery(query: string): string {
const normalized = query.toLowerCase().trim().replace(/\s+/g, ' ');
// Simple FNV-1a hash — fast and good enough for cache keys
let hash = 2166136261;
for (let i = 0; i < normalized.length; i++) {
hash ^= normalized.charCodeAt(i);
hash = Math.imul(hash, 16777619) >>> 0; // 32-bit multiply — avoids float precision loss
}
return hash.toString(36);
}

I cache query embeddings in Redis with a 24-hour TTL. On EuroParts, the cache hit rate is 67% because customers tend to search for similar symptoms repeatedly. On FreshMart, it is 54% — still significant. Each cache hit saves an embedding API call (~40ms) and reduces OpenAI costs.
For the vector similarity search itself, I use pgvector with an HNSW index:
-- Create the HNSW index for fast similarity search
CREATE INDEX ON products
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);
-- Set the search parameter at query time
SET hnsw.ef_search = 100;

HNSW with m=16 and ef_construction=200 gives excellent recall (>98%) at the catalog sizes I work with. For catalogs over 100K products, you would want to tune these parameters and potentially shard the index. At EuroParts' scale (1,444 products) and FreshMart's scale (6,200 products), a single pgvector index on a Supabase Pro instance handles the load without breaking a sweat.
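For completeness, the retrieval call that hits this index can be sketched as below. The <=> operator is pgvector's cosine-distance operator; the client here is injected as anything exposing Prisma's $queryRaw tagged-template signature, and the embedding must be serialized to pgvector's "[x,y,z]" text form before the cast. Table and column names match the article; everything else is an illustrative assumption.

```typescript
interface RawQueryClient {
  $queryRaw(strings: TemplateStringsArray, ...values: unknown[]): Promise<unknown>;
}

// pgvector expects its text literal form, e.g. "[0.1,0.2,0.3]"
function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(',')}]`;
}

async function semanticSearchIds(
  db: RawQueryClient,
  embedding: number[],
  limit = 50
): Promise<string[]> {
  // ORDER BY cosine distance ascending = most similar first;
  // the HNSW index turns this into an approximate nearest-neighbor scan
  const rows = (await db.$queryRaw`
    SELECT id
    FROM products
    WHERE embedding IS NOT NULL
    ORDER BY embedding <=> ${toVectorLiteral(embedding)}::vector
    LIMIT ${limit}
  `) as { id: string }[];
  return rows.map((r) => r.id);
}
```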
Results That Prove It Works
The numbers in the previous section tell the business story. Let me tell the human story.
Before AI search on EuroParts Lanka, the most common customer journey was: search on the site, find nothing useful, send a WhatsApp message, wait for a response, go back and forth 3-4 times, finally find the right part. That process took hours to days. Now, a customer types "my Golf 7 vibrates when I go over 100" and sees wheel bearings, brake discs, and CV joints for the VW Golf Mk7 in under a second. The right parts, for the right car, ranked by relevance.
One specific moment sold me on the approach. A customer searched "sound like helicopter inside car A4." That is not a query any keyword system would handle. But the semantic search matched it to wheel bearing assemblies — which is exactly what a failing wheel bearing sounds like. The customer ordered a wheel bearing kit, confirmed via WhatsApp that it fixed the noise, and left a review. That is the power of search that understands intent.
On FreshMart, the most impactful change was recipe-intent search. When someone searches "butter chicken tonight," they do not want to browse the entire grocery catalog. They want chicken thighs, yogurt, butter, garam masala, tomato paste, basmati rice, and naan. Semantic search, with recipe-enriched product embeddings, surfaces exactly those ingredients. Average basket size went from 3.2 items to 5.8 items per search session — customers find more of what they need because the search system thinks in terms of meals, not individual products.
Key Takeaways
- Hybrid search is not optional. Pure keyword search misses intent. Pure semantic search misses exact matches. You need both, fused with RRF.
- Embed the right text. Your product embedding should include symptom language, use cases, and compatibility data — not just the title and description.
- Classify before you search. Detect whether the query is an exact part number, a natural language description, or something in between. Route accordingly.
- Domain knowledge amplifies AI. Symptom-to-product mappings, synonym tables, and filter extraction are not glamorous. They are the difference between a demo and a production system.
- Cache query embeddings. Most e-commerce search queries are repetitive. A Redis cache with 24-hour TTL saves latency and API costs.
- Measure search-to-cart, not just search latency. A search system that returns results in 50ms but never leads to a purchase is worse than one that takes 200ms and triples your conversion rate.
- Start with pgvector. If you are already on PostgreSQL (and you probably are), pgvector gives you vector search without adding another service to your architecture. You will know when you have outgrown it.
If you are building an e-commerce platform and search is a pain point, I have done this twice now and would be happy to talk through your specific use case. Check out the EuroParts Lanka case study for the full build story, or get in touch about a project.
*Uvin Vindula is a full-stack and Web3 engineer based between Sri Lanka and the UK, building production AI systems, e-commerce platforms, and decentralized applications. Follow the work at iamuvin.com↗ or reach out at contact@uvin.lk.*