
Case Studies: Query Response Time Optimization for Vector Search in AI/ML and Solr Search Applications


CASE STUDY #1: VECTOR SEARCH FOR TEXT-TO-TEXT MATCHING (E-COMMERCE CLIENT)

  1. Hardware Specs – 8 vCores, 52 GB RAM
  2. Cloud VM – GCP n1 high-memory (n1-highmem)
  3. Operating System – Ubuntu 16.04
  4. Python Version – 3.7.10
  5. Number of Vectors – 26,129,342
  6. Vector dimensions – 128
  7. Number of concurrent requests – 10
  8. Performance test tool – Locust

BEFORE: INITIAL VECTOR SEARCH SOLUTION USING KDTREE

  • The top 100 nearest vectors are extracted for a single query vector, taking 281 ms per request on average (the baseline is sketched below).
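For reference, a minimal sketch of such a KDTree baseline, assuming scikit-learn's KDTree; the library choice and variable names are assumptions, and the vector count is scaled down from the production 26M:

    # Sketch of the KDTree baseline (assumed scikit-learn; scaled-down data).
    import numpy as np
    from sklearn.neighbors import KDTree

    rng = np.random.default_rng(0)
    vectors = rng.random((100_000, 128)).astype("float32")  # product-title embeddings
    tree = KDTree(vectors)                                  # built once, queried per request

    query_vec = rng.random((1, 128)).astype("float32")      # encoded user query
    dist, idx = tree.query(query_vec, k=100)                # top 100 nearest product ids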

PROBLEM OVERVIEW AND SOLUTION

Problem Statement – Product titles on an e-commerce website are encoded as 128-dimensional embeddings and stored in an index. When a user types a search query whose exact words are not present in the product titles (due to typos or synonyms), the query is encoded into the same embedding space and the top 100 most similar product IDs are retrieved via vector search. The KDTree approach does not scale to the required throughput as the number of products grows.
Solution

  1. Use FAISS as the vector search index instead of KDTree.
  2. FAISS returns the approximate nearest neighbours for the query vector along with distances (cosine distance in our case). Use METRIC_INNER_PRODUCT, normalize the vectors before indexing, and apply the same normalization to the query vector.
  3. Keep the FAISS index in main memory.
  4. When creating the index, set the number of clusters (the nlist parameter) to improve speed.
  5. Use Product Quantization for faster search in main memory.
  6. Retrain the index every day with the latest set of vectors instead of only calling add() (see the sketch after this list).
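A minimal sketch of this setup; the nlist/PQ values and the scaled-down vector count are illustrative assumptions, not the client's tuned parameters:

    import numpy as np
    import faiss

    d, nlist, m = 128, 1024, 16                # dimension, IVF clusters, PQ sub-quantizers
    vectors = np.random.rand(100_000, d).astype("float32")
    faiss.normalize_L2(vectors)                # normalized, so inner product == cosine

    # IVF + Product Quantization index with an inner-product metric (steps 2, 4, 5).
    index = faiss.index_factory(d, f"IVF{nlist},PQ{m}", faiss.METRIC_INNER_PRODUCT)
    index.train(vectors)                       # retrain daily on the latest vectors (step 6)
    index.add(vectors)

    query = np.random.rand(1, d).astype("float32")
    faiss.normalize_L2(query)                  # same treatment as the indexed vectors
    scores, ids = index.search(query, 100)     # top 100 approximate neighbours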

AFTER: FAISS-BASED VECTOR SEARCH

  • The top 100 nearest vectors are extracted for a single query vector, taking 23 ms per request on average: a ~12x speed improvement.

SMART SEARCH ENGINE
(CREATING AN INTELLIGENT SEARCH SERVICE IN PYTHON)

  • System Configuration:
  • Processor: Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz
  • Processor count: 36 cores
  • RAM: 256 GB
  • Storage: 3.3 TB
  • NIC: 2 Gbps
  • Model: PowerEdge R630

CASE STUDY #2

Problem Statement

  • The company was estimated to lose $2M every year due to higher-grade selection during requisition.
  • Lawsuits were filed by employees due to lower-grade mapping during recruitment.

Solution

  • Develop a tool for context/job-description generation.
  • Use the tool to assess the right-fit work profile and grade based on the required job description.
  • Approach: build a powerful search engine that can find the most relevant JD in the dataset based on keywords/context.

WHAT IT DOES?

  • The intelligence of search engines has been increasing for a simple reason: the value an effective search tool can bring to a business is enormous, and it is a key piece of intellectual property.
  • Enables search based on context and keywords, returning the most semantically similar job descriptions.
  • Scales to larger datasets.
  • Performs searches in milliseconds on large datasets.
  • Handles spelling mistakes, typos, and previously ‘unseen’ words in an intelligent way.

Prior Art

  • No system was in place to help HR understand which grade and job family fit a requirement.
  • Job descriptions were typed manually, and search was slow – around 3 to 4 minutes per query.
  • Wrong selections of job descriptions led to hiring unsuitable candidates and revenue loss.

Major Improvements Expected / Value to the Company

  • 122x faster and more accurate results with an ML/AI-based approach to HR automation, on a dataset 4x larger.
  • Avoids revenue loss from a higher grade mistakenly selected during requisition.
  • Avoids lawsuits from employees due to lower-grade mapping during recruitment.

WHY WORD VECTORS? WHY NOT BERT/GPT-3/[LATEST SOTA NLP MODEL]?

SOLUTION SUMMARY

  • Technology Used – BM25, fastText (from Facebook), NMSLIB.
  • BM25 applies diminishing returns to repeated term matches within a document (term-frequency saturation).
  • Creating word vectors – building a fastText model.
  • Applying BM25 weights to the word vectors.
  • Creating a superfast search index with NMSLIB (sketched below).
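A hedged sketch of the pipeline, using rank_bm25 and gensim as stand-ins for the exact tooling; weighting each fastText word vector by its BM25 IDF is one plausible reading of "applying BM25 to word vectors", not the confirmed formula:

    import numpy as np
    import nmslib
    from gensim.models import FastText
    from rank_bm25 import BM25Okapi

    # Tiny illustrative corpus; production used the full JD dataset.
    job_descriptions = [
        "senior data scientist builds ml models in python",
        "hr business partner supports recruitment and grading",
        "java developer maintains backend microservices",
    ]
    docs = [jd.lower().split() for jd in job_descriptions]

    ft = FastText(sentences=docs, vector_size=100, min_count=1, epochs=10)
    bm25 = BM25Okapi(docs)

    def doc_vector(tokens):
        # Weight each word vector by its BM25 IDF so rare, informative terms dominate.
        weights = np.array([bm25.idf.get(t, 1.0) for t in tokens])
        vecs = np.array([ft.wv[t] for t in tokens])
        return (vecs * weights[:, None]).sum(axis=0) / weights.sum()

    matrix = np.array([doc_vector(d) for d in docs], dtype="float32")

    # HNSW index over the weighted document vectors, cosine similarity space.
    index = nmslib.init(method="hnsw", space="cosinesimil")
    index.addDataPointBatch(matrix)
    index.createIndex({"M": 16, "efConstruction": 200})

    ids, dists = index.knnQuery(doc_vector("data scientist python".split()), k=2)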

FROM 3 MIN TO 0.0004 SECONDS

NMSLIB vs Simple Semantic Search

[Chart: execution time, NMSLIB-based vs. simple semantic search engine]

  • We see how combining word vectors with BM25, supercharged by a fast similarity-search index, creates a smart, scalable, and performant search engine.
  • Query time stays low even on large datasets.
  • Search results are accurate and context-based.

[Table: per-query execution time in minutes, simple search engine vs. NMSLIB-based search]

CASE STUDY #3

  • Domain: Life Sciences
  • Number of documents: 10 million
  • Elasticsearch: 7.0
  • System requirements: Ubuntu 18.04, 8 cores, 32 GB RAM, 2 TB SSD
  • Data ingestion: batch mode (once a day)
  • Programming Language: Python

ELASTICSEARCH SOLUTION (BEFORE)

  • Elasticsearch performs a match_all (scanning all the documents) to retrieve the results.
  • Takes ~2 to 3 seconds per query.
  • NEW UPDATE: Elasticsearch introduced the dense_vector field type in version 7.0.
  • It might improve recall, but ES still lags behind FAISS in latency (a sketch of this route follows).
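For illustration, a hedged sketch of the dense_vector route with the Python Elasticsearch client; the index and field names are invented, the cosineSimilarity script function arrived in later 7.x releases, and the match_all inside script_score still scans every document, which is why latency lags FAISS:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # Mapping with a 256-d dense_vector field alongside the text.
    es.indices.create(index="docs", body={
        "mappings": {"properties": {
            "text": {"type": "text"},
            "embedding": {"type": "dense_vector", "dims": 256},
        }}
    })

    query_vec = [0.1] * 256   # stand-in for a real 256-d query embedding
    resp = es.search(index="docs", body={
        "size": 100,
        "query": {"script_score": {
            "query": {"match_all": {}},   # still visits every document
            "script": {
                "source": "cosineSimilarity(params.q, 'embedding') + 1.0",
                "params": {"q": query_vec},
            },
        }},
    })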

FAISS SOLUTION

  • Uses ANN (approximate nearest neighbours), reducing both the search space and search time.
  • Takes 400 ms per query to retrieve the top 100 documents.
  • Recall is slightly higher with FAISS.
  • Indexing/reindexing is 5x faster than Elasticsearch.

FAISS REQUIREMENTS

  • Vector dimension: 256
  • System Requirements: Ubuntu 18.04, 8 cores, 32 GB RAM, 256 GB SSD
  • S/W requirements: Python 3.7, FAISS

CASE STUDY #4: CHATBOT

  • We created a chatbot using BERT and FastAPI.
  • There were around 1,500 questions in the database.
  • We used BERT to encode the questions; each encoding is a 768-dimensional vector.
  • The aim was to compute the cosine similarity between the new query’s encoding and the encodings already in the database, and return the indexes of the top n most similar questions.
  • We were running on a low-end system and doing a brute-force (linear) search, sketched below.
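A minimal sketch of that brute-force baseline; the array names are illustrative:

    import numpy as np

    encodings = np.random.rand(1500, 768).astype("float32")  # stored BERT question encodings
    query = np.random.rand(768).astype("float32")            # new query's encoding

    # Cosine similarity = dot product of L2-normalized vectors.
    normed = encodings / np.linalg.norm(encodings, axis=1, keepdims=True)
    sims = normed @ (query / np.linalg.norm(query))

    top_n = np.argsort(-sims)[:5]   # indexes of the top n=5 most similar questions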

ENVIRONMENT

  • Hardware Specs: 2-core CPU, 4 GB RAM, 30 GB HDD
  • Cloud VM: standard HDD
  • Operating System: Ubuntu 16.04
  • Python Version: 3.6.7
  • Number of vectors: 1500
  • Vector dimensions: 768
  • Number of Concurrent requests: 2

BEFORE:

  • 100 queries took around 2.5 hours, i.e. ~1.5 minutes per query.
  • The major time was consumed by the brute-force (linear) search, i.e. comparing the new query’s encoding against the encodings of all existing questions.
  • The timing graph showed that the linear search was by far the most expensive step.

SOLUTION

  • We proposed FAISS as the tool for finding the n most similar encodings.
  • FAISS is Facebook’s library for similarity search over very large datasets.
  • FAISS clusters the n-dimensional vector encodings to make search faster.
  • FAISS maintains the ~5 ms search time for indexes as large as 1M vectors (see the sketch below).
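A hedged sketch of the fix, assuming an IVF (clustered) FAISS index over L2-normalized encodings so inner product equals cosine similarity; the nlist value is an illustrative assumption:

    import numpy as np
    import faiss

    d = 768
    encodings = np.random.rand(1500, d).astype("float32")  # stand-in BERT encodings
    faiss.normalize_L2(encodings)                          # inner product == cosine

    quantizer = faiss.IndexFlatIP(d)
    index = faiss.IndexIVFFlat(quantizer, d, 16, faiss.METRIC_INNER_PRODUCT)
    index.train(encodings)      # learns the 16 clusters (nlist is assumed)
    index.add(encodings)

    query = np.random.rand(1, d).astype("float32")
    faiss.normalize_L2(query)
    scores, ids = index.search(query, 5)   # top n=5 most similar question indexes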

AFTER

  • With FAISS, the same search over the 1,500 encodings takes ~5 ms (0.005 s), where it earlier took ~1.5 min.
