Introduction
Elasticsearch provides a variety of analyzers that preprocess and tokenize text at index time and at search time. This post looks at what we’ll call the N-EdgeGram Analyzer: a custom analyzer built on Elasticsearch’s edge_ngram tokenizer. We’ll explore how it works and demonstrate how it enables prefix (search-as-you-type) matching on a text field.
Understanding the N-EdgeGram Analyzer
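The N-EdgeGram Analyzer generates edge N-grams from the input text: sequences of N characters that always begin at the first character of a word. Unlike a tokenizer that emits only whole words, it emits each word’s leading prefixes (for example “El”, “Ela”, “Elas” for “Elasticsearch”), so a query can match on the start of a word. Because every gram is anchored to the beginning of a word, it supports prefix searches but not infix searches; matching in the middle of a word needs the plain ngram tokenizer instead.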
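To make the difference between plain N-grams and edge N-grams concrete, here is a minimal pure-Python sketch. The function names `ngrams` and `edge_ngrams` are made up for illustration; this is not Elasticsearch code and only approximates the idea.

```python
def ngrams(word: str, min_gram: int, max_gram: int) -> list[str]:
    """All substrings of length min_gram..max_gram, from every start position."""
    return [
        word[start:start + size]
        for start in range(len(word))
        for size in range(min_gram, max_gram + 1)
        if start + size <= len(word)
    ]


def edge_ngrams(word: str, min_gram: int, max_gram: int) -> list[str]:
    """Only the substrings that start at the first character (prefixes)."""
    longest = min(max_gram, len(word))
    return [word[:size] for size in range(min_gram, longest + 1)]


print(edge_ngrams("Search", 2, 4))  # ['Se', 'Sea', 'Sear']
print(ngrams("Search", 2, 3))       # also contains inner grams such as 'ea' and 'rch'
```

The edge variant produces far fewer terms per word, which is why it suits autocomplete, where users type from the start of a word.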
Use Case: Autocomplete Search
Suppose we have an Elasticsearch index containing product names. We want to enable autocomplete functionality in our search, allowing users to find products quickly as they type. The N-EdgeGram Analyzer is a good fit here: it indexes the leading N-grams of every word, so the first few characters a user types can be matched with an ordinary term lookup.
Creating the N-EdgeGram Analyzer
Before we proceed, let’s create the index with a custom analyzer that uses the edge_ngram tokenizer.
PUT /products_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "edge_ngram_analyzer": {
          "type": "custom",
          "tokenizer": "edge_ngram_tokenizer"
        }
      },
      "tokenizer": {
        "edge_ngram_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "product_name": {
        "type": "text",
        "analyzer": "edge_ngram_analyzer"
      }
    }
  }
}
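In the above example, we define a custom analyzer called “edge_ngram_analyzer,” which uses the “edge_ngram_tokenizer.” The min_gram and max_gram settings set the shortest and longest N-gram to emit, here 2 and 10 characters, so a single typed character matches nothing and prefixes longer than 10 characters are not indexed. The token_chars setting keeps only letters and digits inside a token; any other character, such as a space or a hyphen, acts as a word boundary, and N-grams are generated separately for each word.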
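The effect of these settings can be approximated in a few lines of Python. This is a rough sketch, not Elasticsearch’s implementation: the helper name `edge_ngram_tokenize` is hypothetical, and Python’s `isalpha`/`isdigit` stand in for the “letter” and “digit” character classes. Running the `_analyze` API against the real index remains the authoritative check.

```python
def edge_ngram_tokenize(text: str, min_gram: int = 2, max_gram: int = 10) -> list[str]:
    """Approximate an edge_ngram tokenizer with token_chars = [letter, digit]."""
    tokens: list[str] = []
    word = ""
    # A trailing space flushes the final word through the same code path.
    for ch in text + " ":
        if ch.isalpha() or ch.isdigit():
            word += ch
            continue
        # Any other character ends the current word; emit its prefixes.
        for size in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:size])
        word = ""
    return tokens


print(edge_ngram_tokenize("Elasticsearch in Action"))
```

With the settings from this post, “Elasticsearch” contributes prefixes from “El” up to the 10-character “Elasticsea”, “in” contributes only itself, and “Action” contributes “Ac” through “Action”.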
Indexing Data
Next, we’ll index some sample product names into our “products_index.”
POST /products_index/_doc/1
{
  "product_name": "Elasticsearch Cookbook"
}

POST /products_index/_doc/2
{
  "product_name": "Elasticsearch in Action"
}

POST /products_index/_doc/3
{
  "product_name": "Mastering Elasticsearch"
}
Performing Partial Match Search
Now that our data is indexed with the N-EdgeGram Analyzer, we can execute partial match searches for our autocomplete functionality.
GET /products_index/_search
{
  "query": {
    "match": {
      "product_name": "Elast"
    }
  }
}
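The search query above matches all three indexed documents, because each product name contains a word that starts with “Elast.” Two details of this setup are worth knowing. First, the mapping defines no separate search analyzer, so the query text is itself split into edge N-grams (“El”, “Ela”, “Elas”, “Elast”), and a document matches if it contains any one of them; a product whose name starts with just “El” would match too. Second, the analyzer has no lowercase filter, so matching is case-sensitive: searching for “elast” returns nothing.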
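The matching behaviour can be simulated without a cluster. The sketch below is an approximation under stated assumptions: `analyze` and `match` are hypothetical helpers, the regular expression stands in for the letter/digit token_chars, and a document counts as a hit when it shares at least one term with the analyzed query (the match query’s default OR behaviour). Relevance scoring is ignored.

```python
import re


def analyze(text: str, min_gram: int = 2, max_gram: int = 10) -> set[str]:
    """Edge n-grams of every letter/digit run, approximating the post's analyzer."""
    grams: set[str] = set()
    for word in re.findall(r"[^\W_]+", text):
        for size in range(min_gram, min(max_gram, len(word)) + 1):
            grams.add(word[:size])
    return grams


docs = {
    1: "Elasticsearch Cookbook",
    2: "Elasticsearch in Action",
    3: "Mastering Elasticsearch",
}
# The same analyzer runs at index time and at search time, as in the mapping above.
index = {doc_id: analyze(name) for doc_id, name in docs.items()}


def match(query: str) -> list[int]:
    """IDs of documents sharing at least one term with the analyzed query."""
    terms = analyze(query)
    return sorted(doc_id for doc_id, grams in index.items() if grams & terms)


print(match("Elast"))   # [1, 2, 3]
print(match("Cook"))    # [1]
print(match("search"))  # []  "search" is not the start of any indexed word
print(match("elast"))   # []  no lowercase filter, so case matters
```

The last two calls illustrate the limits discussed above: edge N-grams do not cover the middle of a word, and without lowercasing the user’s capitalisation has to match the indexed text.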
Conclusion
The N-EdgeGram Analyzer offers a straightforward way to implement autocomplete and prefix matching in your search applications. By indexing the leading character N-grams of each word, it lets a plain match query find documents from the first few characters a user types. Because the grams are anchored to the start of each word, it covers prefix queries only; infix matching calls for the ngram tokenizer.