Elasticsearch, a powerful search engine, provides a variety of analyzers to preprocess and tokenize text during indexing and search. One such analyzer is the N-EdgeGram Analyzer, a custom analyzer built on Elasticsearch’s edge_ngram tokenizer, which indexes the leading character sequences of each word. In this blog post, we’ll explore the N-EdgeGram Analyzer and demonstrate how it can enhance your search capabilities by enabling partial and prefix searches.
Understanding the N-EdgeGram Analyzer
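The N-EdgeGram Analyzer is a specialized analyzer that generates edge N-grams from the input text: character sequences of increasing length, each anchored at the start of a word. For example, with a minimum length of 2 and a maximum of 5, the word “Elastic” yields “El”, “Ela”, “Elas”, and “Elast”. Unlike tokenizers that emit only whole words, it also emits these word prefixes, which is what makes partial (prefix) matching possible. Because every N-gram starts at the beginning of a word, it does not support infix matches in the middle of a word; those require the plain N-gram tokenizer instead.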
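To make the idea concrete, here is a minimal Python sketch of how edge N-grams are derived from a single word. The function name edge_ngrams is purely illustrative and not part of Elasticsearch; it only mimics the behaviour described above.

```python
from typing import List


def edge_ngrams(token: str, min_gram: int, max_gram: int) -> List[str]:
    """Return the prefixes of `token` with lengths from min_gram to max_gram."""
    # Never ask for a prefix longer than the token itself.
    longest = min(len(token), max_gram)
    # A token shorter than min_gram yields an empty list in this sketch.
    return [token[:n] for n in range(min_gram, longest + 1)]


print(edge_ngrams("Elastic", 2, 5))  # ['El', 'Ela', 'Elas', 'Elast']
```

Every generated term starts at the first character of the word, which is why this approach suits prefix matching.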
Use Case: Autocomplete Search
Suppose we have an Elasticsearch index containing product names. We want to enable autocomplete functionality in our search, allowing users to find products quickly as they type. The N-EdgeGram Analyzer is a good fit for this: it indexes the leading N-grams of each word, so a partially typed word can be matched with an ordinary term lookup.
Creating the N-EdgeGram Analyzer
Before we proceed, let’s create a custom analyzer using the N-EdgeGram Tokenizer.
"min_gram": 2, <-- Set the minimum N-gram size
"max_gram": 10, <-- Set the maximum N-gram size
In the above example, we define a custom analyzer called “edge_ngram_analyzer”, which uses the “edge_ngram_tokenizer”. We specify the minimum and maximum N-gram sizes (2 and 10) to generate edge-based N-grams for partial matching. Additionally, we set token_chars to letter and digit, so any other character, such as whitespace or punctuation, acts as a token boundary.
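For reference, the configuration described above could be written out roughly as follows. This is a sketch that builds the request body as a Python dictionary and prints it as JSON. The analyzer, tokenizer, and index names and the gram sizes come from this post; the mapping that attaches the analyzer to the product_name field is an assumption about how the index is wired up.

```python
import json

# Request body for creating the index (sent as PUT /products_index).
create_index_body = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "edge_ngram_tokenizer": {
                    "type": "edge_ngram",
                    "min_gram": 2,    # minimum N-gram size
                    "max_gram": 10,   # maximum N-gram size
                    "token_chars": ["letter", "digit"],
                }
            },
            "analyzer": {
                "edge_ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "edge_ngram_tokenizer",
                }
            },
        }
    },
    # Assumption: product_name is a text field analyzed with the custom analyzer.
    "mappings": {
        "properties": {
            "product_name": {"type": "text", "analyzer": "edge_ngram_analyzer"}
        }
    },
}

print("PUT /products_index")
print(json.dumps(create_index_body, indent=2))
```

The printed JSON can be pasted into the Kibana console or sent with any HTTP client.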
Next, we’ll index some sample product names into our “products_index” index.
"product_name": "Elasticsearch Cookbook"
"product_name": "Elasticsearch in Action"
"product_name": "Mastering Elasticsearch"
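One way to send these three documents in a single request is the bulk API, which expects newline-delimited JSON: an action line followed by the document itself, with a trailing newline at the end. The sketch below only builds that body. The helper name build_bulk_body is hypothetical, and using the bulk API rather than three separate index requests is an assumption.

```python
import json

product_names = [
    "Elasticsearch Cookbook",
    "Elasticsearch in Action",
    "Mastering Elasticsearch",
]


def build_bulk_body(index_name, names):
    """Build a newline-delimited JSON body for POST /_bulk."""
    lines = []
    for name in names:
        # Action line: index this document into index_name (ID auto-generated).
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself.
        lines.append(json.dumps({"product_name": name}))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


bulk_body = build_bulk_body("products_index", product_names)
print("POST /_bulk")
print(bulk_body, end="")
```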
Performing Partial Match Search
Now that our data is indexed with the N-EdgeGram Analyzer, we can execute partial match searches for our autocomplete functionality.
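A query for the partial input “Elast” might look like the sketch below. It assumes a plain match query on product_name. The analyze function is a rough local approximation of the analyzer (split on anything that is not a letter or digit, then take edge N-grams of 2 to 10 characters) and is only there to show why the documents match; it is not Elasticsearch code.

```python
import json
import re


def analyze(text, min_gram=2, max_gram=10):
    """Approximate the analyzer: split on non-letter/digit, emit edge N-grams."""
    terms = set()
    for token in re.findall(r"[^\W_]+", text):
        for n in range(min_gram, min(len(token), max_gram) + 1):
            terms.add(token[:n])
    return terms


# Request body for GET /products_index/_search
search_body = {"query": {"match": {"product_name": "Elast"}}}
print("GET /products_index/_search")
print(json.dumps(search_body, indent=2))

products = [
    "Elasticsearch Cookbook",
    "Elasticsearch in Action",
    "Mastering Elasticsearch",
]

# A match query analyzes the query text too, then matches on any shared term.
query_terms = analyze(search_body["query"]["match"]["product_name"])
hits = [name for name in products if analyze(name) & query_terms]
print(hits)
```

Note that in this sketch the query text is split into N-grams as well, so shorter grams such as “El” also count as matches, and matching is case-sensitive because no lowercase filter is configured. Elasticsearch lets you set a separate search_analyzer on the field if the query should be matched as typed.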
The search query above, for the partial input “Elast”, will match all three indexed documents, because each one contains the word “Elasticsearch”, whose indexed edge N-grams include “Elast”.
The N-EdgeGram Analyzer in Elasticsearch offers a powerful solution for implementing autocomplete and partial matching functionalities in your search applications. By indexing the edge N-grams of each word, it turns prefix and partial-word queries into fast term lookups. Because the N-grams are anchored at the start of each word, it does not cover infix matches in the middle of a word.