
Case Studies: Query Response Time Optimization for Vector Search in AI/ML and Solr Search Applications


CASE STUDY #1: VECTOR SEARCH FOR TEXT-TO-TEXT MATCHING (E-COMMERCE CLIENT)

  1. Hardware Specs – 8 vCores, 52 GB RAM
  2. Cloud VM – GCP n1 high-memory (n1-highmem)
  3. Operating System – Ubuntu 16.04
  4. Python Version – 3.7.10
  5. Number of Vectors – 26,129,342
  6. Vector dimensions – 128
  7. Number of concurrent requests – 10
  8. Performance test tool – Locust

BEFORE: INITIAL VECTOR SEARCH SOLUTION USING KDTREE

  • The top 100 nearest vectors are extracted for a single query vector, taking 281 ms per request on average (the baseline is sketched below).
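For reference, a minimal sketch of such a KDTree baseline, assuming scikit-learn's KDTree; the library choice and variable names are assumptions, and the vector count is scaled down from the production 26M:

    # Sketch of the KDTree baseline (assumed scikit-learn; scaled-down data).
    import numpy as np
    from sklearn.neighbors import KDTree

    rng = np.random.default_rng(0)
    vectors = rng.random((100_000, 128)).astype("float32")  # product-title embeddings
    tree = KDTree(vectors)                                  # built once, queried per request

    query_vec = rng.random((1, 128)).astype("float32")      # encoded user query
    dist, idx = tree.query(query_vec, k=100)                # top 100 nearest product ids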

PROBLEM OVERVIEW AND SOLUTION

Problem Statement – Product titles on an e-commerce website are encoded as 128-dimensional embeddings and stored in an index. When a user types a search query whose exact words are not present in the product titles (due to typos or synonyms), the query is encoded into the same embedding space and the top 100 most similar product IDs are retrieved via vector search. The KDTree approach does not scale to the required throughput as the number of products grows.
Solution

  1. Use FAISS as the vector search index instead of KDTree.
  2. FAISS returns the approximate nearest neighbours for the query vector along with distances (cosine distance in our case). Use METRIC_INNER_PRODUCT, normalize the vectors before indexing, and apply the same normalization to the query vector.
  3. Keep the FAISS index in main memory.
  4. When creating the index, set the number of clusters (the nlist parameter) to improve speed.
  5. Use Product Quantization for faster search in main memory.
  6. Retrain the index every day with the latest set of vectors instead of only calling add() (see the sketch after this list).
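A minimal sketch of this setup; the nlist/PQ values and the scaled-down vector count are illustrative assumptions, not the client's tuned parameters:

    import numpy as np
    import faiss

    d, nlist, m = 128, 1024, 16                # dimension, IVF clusters, PQ sub-quantizers
    vectors = np.random.rand(100_000, d).astype("float32")
    faiss.normalize_L2(vectors)                # normalized, so inner product == cosine

    # IVF + Product Quantization index with an inner-product metric (steps 2, 4, 5).
    index = faiss.index_factory(d, f"IVF{nlist},PQ{m}", faiss.METRIC_INNER_PRODUCT)
    index.train(vectors)                       # retrain daily on the latest vectors (step 6)
    index.add(vectors)

    query = np.random.rand(1, d).astype("float32")
    faiss.normalize_L2(query)                  # same treatment as the indexed vectors
    scores, ids = index.search(query, 100)     # top 100 approximate neighbours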

AFTER: FAISS-BASED VECTOR SEARCH

  • The top 100 nearest vectors are extracted for a single query vector, taking 23 ms per request on average: a ~12x speed improvement.

SMART SEARCH ENGINE
(CREATING AN INTELLIGENT SEARCH SERVICE IN PYTHON)

  • System Configuration:
  • Processor: Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz
  • Processor count: 36 cores
  • RAM: 256 GB
  • Storage: 3.3 TB
  • NIC: 2 Gbps
  • Model: PowerEdge R630

CASE STUDY #2

Problem Statement

  • The company was estimated to lose $2M every year due to higher-grade selection during requisition.
  • Lawsuits were filed by employees due to lower-grade mapping during recruitment.

Solution

  • Develop a tool for context/job-description generation.
  • Use the tool to assess the right-fit work profile and grade based on the required job description.
  • Approach: build a powerful search engine that can find the most relevant JD in the dataset based on keywords/context.

WHAT IT DOES?

  • The intelligence of search engines has been increasing for a simple reason: the value an effective search tool can bring to a business is enormous, and it is a key piece of intellectual property.
  • Enables search based on context and keywords, returning the most semantically similar job descriptions.
  • Scales to larger datasets.
  • Performs searches in milliseconds on large datasets.
  • Handles spelling mistakes, typos, and previously ‘unseen’ words in an intelligent way.

Prior Art

  • No system was in place to help HR understand which grade and job family fit a requirement.
  • Job descriptions were typed manually, and search was slow – around 3 to 4 minutes per query.
  • Wrong selections of job descriptions led to hiring unsuitable candidates and revenue loss.

Major Improvements Expected / Value to the Company

  • 122x faster and more accurate results with an ML/AI-based approach to HR automation, on a dataset 4x larger.
  • Avoids revenue loss from a higher grade mistakenly selected during requisition.
  • Avoids lawsuits from employees due to lower-grade mapping during recruitment.

WHY WORD VECTORS? WHY NOT BERT/GPT-3/[LATEST SOTA NLP MODEL]?

SOLUTION SUMMARY

  • Technology Used – BM25, fastText (from Facebook), NMSLIB.
  • BM25 applies diminishing returns to repeated term matches within a document (term-frequency saturation).
  • Creating word vectors – building a fastText model.
  • Applying BM25 weights to the word vectors.
  • Creating a superfast search index with NMSLIB (sketched below).
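A hedged sketch of the pipeline, using rank_bm25 and gensim as stand-ins for the exact tooling; weighting each fastText word vector by its BM25 IDF is one plausible reading of "applying BM25 to word vectors", not the confirmed formula:

    import numpy as np
    import nmslib
    from gensim.models import FastText
    from rank_bm25 import BM25Okapi

    # Tiny illustrative corpus; production used the full JD dataset.
    job_descriptions = [
        "senior data scientist builds ml models in python",
        "hr business partner supports recruitment and grading",
        "java developer maintains backend microservices",
    ]
    docs = [jd.lower().split() for jd in job_descriptions]

    ft = FastText(sentences=docs, vector_size=100, min_count=1, epochs=10)
    bm25 = BM25Okapi(docs)

    def doc_vector(tokens):
        # Weight each word vector by its BM25 IDF so rare, informative terms dominate.
        weights = np.array([bm25.idf.get(t, 1.0) for t in tokens])
        vecs = np.array([ft.wv[t] for t in tokens])
        return (vecs * weights[:, None]).sum(axis=0) / weights.sum()

    matrix = np.array([doc_vector(d) for d in docs], dtype="float32")

    # HNSW index over the weighted document vectors, cosine similarity space.
    index = nmslib.init(method="hnsw", space="cosinesimil")
    index.addDataPointBatch(matrix)
    index.createIndex({"M": 16, "efConstruction": 200})

    ids, dists = index.knnQuery(doc_vector("data scientist python".split()), k=2)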

FROM 3 MIN TO 0.0004 SECONDS

NMSLIB vs Simple Semantic Search

[Chart: execution time, NMSLIB-based vs. simple semantic search engine]

  • We see how combining word vectors with BM25, supercharged by a fast similarity-search index, creates a smart, scalable, and performant search engine.
  • Query time stays low even on large datasets.
  • Search results are accurate and context-based.

[Table: per-query execution time in minutes, simple search engine vs. NMSLIB-based search]

CASE STUDY #3

  • Domain: Life Sciences
  • Number of documents: 10 million
  • Elasticsearch: 7.0
  • System requirements: Ubuntu 18.04, 8 cores, 32 GB RAM, 2 TB SSD
  • Data ingestion: batch mode (once a day)
  • Programming Language: Python

ELASTICSEARCH SOLUTION (BEFORE)

  • Elasticsearch performs a match_all (scanning all the documents) to retrieve the results.
  • Takes ~2 to 3 seconds per query.
  • NEW UPDATE: Elasticsearch introduced the dense_vector field type in version 7.0.
  • It might improve recall, but ES still lags behind FAISS in latency (a sketch of this route follows).
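For illustration, a hedged sketch of the dense_vector route with the Python Elasticsearch client; the index and field names are invented, the cosineSimilarity script function arrived in later 7.x releases, and the match_all inside script_score still scans every document, which is why latency lags FAISS:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # Mapping with a 256-d dense_vector field alongside the text.
    es.indices.create(index="docs", body={
        "mappings": {"properties": {
            "text": {"type": "text"},
            "embedding": {"type": "dense_vector", "dims": 256},
        }}
    })

    query_vec = [0.1] * 256   # stand-in for a real 256-d query embedding
    resp = es.search(index="docs", body={
        "size": 100,
        "query": {"script_score": {
            "query": {"match_all": {}},   # still visits every document
            "script": {
                "source": "cosineSimilarity(params.q, 'embedding') + 1.0",
                "params": {"q": query_vec},
            },
        }},
    })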

FAISS SOLUTION

  • Uses ANN (approximate nearest neighbours), reducing both the search space and search time.
  • Takes 400 ms per query to retrieve the top 100 documents.
  • Recall is slightly higher with FAISS.
  • Indexing/reindexing is 5x faster than Elasticsearch.

FAISS REQUIREMENTS

  • Vector dimension: 256
  • System Requirements: Ubuntu 18.04, 8 cores, 32 GB RAM, 256 GB SSD
  • S/W requirements: Python 3.7, FAISS

CASE STUDY #4: CHATBOT

  • We created a chatbot using BERT and FastAPI.
  • There were around 1,500 questions in the database.
  • We used BERT to encode the questions; each encoding is a 768-dimensional vector.
  • The aim was to compute the cosine similarity between the new query’s encoding and the encodings already in the database, and return the indexes of the top n most similar questions.
  • We were running on a low-end system and doing a brute-force (linear) search, sketched below.
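A minimal sketch of that brute-force baseline; the array names are illustrative:

    import numpy as np

    encodings = np.random.rand(1500, 768).astype("float32")  # stored BERT question encodings
    query = np.random.rand(768).astype("float32")            # new query's encoding

    # Cosine similarity = dot product of L2-normalized vectors.
    normed = encodings / np.linalg.norm(encodings, axis=1, keepdims=True)
    sims = normed @ (query / np.linalg.norm(query))

    top_n = np.argsort(-sims)[:5]   # indexes of the top n=5 most similar questions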

ENVIRONMENT

  • Hardware Specs: 2-core CPU, 4 GB RAM, 30 GB HDD
  • Cloud VM: standard HDD
  • Operating System: Ubuntu 16.04
  • Python Version: 3.6.7
  • Number of vectors: 1500
  • Vector dimensions: 768
  • Number of Concurrent requests: 2

BEFORE:

  • 100 queries took around 2.5 hours, i.e. ~1.5 minutes per query.
  • The major time was consumed by the brute-force (linear) search, i.e. comparing the new query’s encoding against the encodings of all existing questions.
  • The timing graph showed that the linear search was by far the most expensive step.

SOLUTION

  • We proposed FAISS as the tool for finding the n most similar encodings.
  • FAISS is Facebook’s library for similarity search over very large datasets.
  • FAISS clusters the n-dimensional vector encodings to make search faster.
  • FAISS maintains the ~5 ms search time for indexes as large as 1M vectors (see the sketch below).
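A hedged sketch of the fix, assuming an IVF (clustered) FAISS index over L2-normalized encodings so inner product equals cosine similarity; the nlist value is an illustrative assumption:

    import numpy as np
    import faiss

    d = 768
    encodings = np.random.rand(1500, d).astype("float32")  # stand-in BERT encodings
    faiss.normalize_L2(encodings)                          # inner product == cosine

    quantizer = faiss.IndexFlatIP(d)
    index = faiss.IndexIVFFlat(quantizer, d, 16, faiss.METRIC_INNER_PRODUCT)
    index.train(encodings)      # learns the 16 clusters (nlist is assumed)
    index.add(encodings)

    query = np.random.rand(1, d).astype("float32")
    faiss.normalize_L2(query)
    scores, ids = index.search(query, 5)   # top n=5 most similar question indexes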

AFTER

  • With FAISS, the same search over the 1,500 encodings takes ~5 ms (0.005 s), where it earlier took ~1.5 min.
