Advanced Elasticsearch N-EdgeGram Search

Introduction

Elasticsearch, a powerful search engine, provides a variety of analyzers to preprocess and tokenize text during indexing and search. One useful option is a custom analyzer built on the edge_ngram tokenizer, which this post calls the N-EdgeGram Analyzer. It indexes the leading characters of each word, so a query can match a word before the user has finished typing it. In this blog post, we’ll explore the N-EdgeGram Analyzer in Elasticsearch and demonstrate how it can enhance your search capabilities by enabling partial-word and prefix searches.

Understanding the N-EdgeGram Analyzer

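The N-EdgeGram Analyzer is a custom analyzer built on Elasticsearch’s edge_ngram tokenizer. An N-gram is a sequence of N consecutive characters, and an edge N-gram is an N-gram anchored to the start of a word. Instead of emitting only whole words, the tokenizer emits a series of prefixes of each word, such as “El”, “Ela”, and “Elas” for “Elasticsearch”. This makes prefix and partial-word searches straightforward. Because every gram starts at the beginning of a word, edge N-grams do not support infix (middle-of-word) matches; the separate ngram tokenizer covers that case.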
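To make the distinction concrete, here is a minimal Python sketch comparing plain N-grams with edge N-grams for a single word. It is not Elasticsearch code: the function names ngrams and edge_ngrams are made up for this illustration, and the real tokenizer has more options.

```python
def ngrams(word, n):
    """All substrings of length n, taken at every position in the word."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def edge_ngrams(word, min_gram, max_gram):
    """Prefixes of the word, from min_gram up to max_gram characters long."""
    longest = min(max_gram, len(word))
    return [word[:n] for n in range(min_gram, longest + 1)]


print(ngrams("search", 3))          # ['sea', 'ear', 'arc', 'rch']
print(edge_ngrams("search", 2, 4))  # ['se', 'sea', 'sear']
```

Only the plain N-grams contain middle-of-word pieces such as “arc”; the edge N-grams are all prefixes, which is why they suit search-as-you-type.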

Use Case: Autocomplete Search

Suppose we have an Elasticsearch index containing product names. We want to enable autocomplete functionality in our search, allowing users to find products quickly as they type. The N-EdgeGram Analyzer is a good fit for this: the prefixes of every word are computed once, at index time, so whatever the user has typed so far can be matched against terms that are already in the index.

Creating the N-EdgeGram Analyzer

First, let’s create the index with a custom analyzer that uses the edge_ngram tokenizer.

PUT /products_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "edge_ngram_analyzer": {
          "type": "custom",
          "tokenizer": "edge_ngram_tokenizer"
        }
      },
      "tokenizer": {
        "edge_ngram_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "product_name": {
        "type": "text",
        "analyzer": "edge_ngram_analyzer"
      }
    }
  }
}

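In the example above, we define a custom analyzer called “edge_ngram_analyzer”, which uses the “edge_ngram_tokenizer”. The “min_gram” and “max_gram” settings give the shortest and longest prefix emitted for each word, here 2 and 10 characters. A one-character word therefore produces no tokens, and prefixes longer than 10 characters are not indexed. The “token_chars” setting lists the character classes that are kept inside a token; any other character, such as a space or a punctuation mark, acts as a word boundary. Finally, the mapping applies the analyzer to the “product_name” field.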
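The following Python sketch approximates what these settings do to a piece of text. It is a simplified model, not the Elasticsearch implementation: edge_ngram_tokenize is a hypothetical helper, and the regular expression is only a rough stand-in for the letter and digit character classes.

```python
import re


def edge_ngram_tokenize(text, min_gram=2, max_gram=10):
    """Approximate an edge_ngram tokenizer with token_chars = [letter, digit]."""
    tokens = []
    # Runs of letters and digits form words; every other character separates them.
    for word in re.findall(r"[^\W_]+", text):
        # Emit each prefix from min_gram up to max_gram (or the word length).
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:n])
    return tokens


print(edge_ngram_tokenize("2 Quick Foxes."))
# ['Qu', 'Qui', 'Quic', 'Quick', 'Fo', 'Fox', 'Foxe', 'Foxes']
print(edge_ngram_tokenize("Elasticsearch Cookbook"))
```

In the first call, “2” is shorter than min_gram and yields nothing, and the trailing period is dropped because it is neither a letter nor a digit. In the second call, the longest token for “Elasticsearch” is the 10-character prefix “Elasticsea”, because max_gram caps the length.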

Indexing Data

Next, we’ll index some sample product names into our “products_index.”

POST /products_index/_doc/1
{
  "product_name": "Elasticsearch Cookbook"
}

POST /products_index/_doc/2
{
  "product_name": "Elasticsearch in Action"
}

POST /products_index/_doc/3
{
  "product_name": "Mastering Elasticsearch"
}

Performing Partial Match Search

Now that our data is indexed with the N-EdgeGram Analyzer, we can execute partial match searches for our autocomplete functionality.

GET /products_index/_search
{
  "query": {
    "match": {
      "product_name": "Elast"
    }
  }
}

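The search query above matches all three indexed documents, because each product name contains the word “Elasticsearch”, which was indexed with the prefix “Elast”. Two details are worth knowing. First, the mapping applies the same analyzer at query time, so the query string is itself split into “El”, “Ela”, “Elas”, and “Elast”, and the match query returns documents that contain any of those terms. Second, the analyzer has no lowercase filter, so matching is case-sensitive: a query for “elast” finds nothing.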
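The matching behaviour can be modelled in a few lines of Python. This is a toy model under stated assumptions, not how Elasticsearch is implemented: analyze, build_index, and match are hypothetical helpers, scoring is ignored, and the match query is reduced to "any shared term is a hit".

```python
import re

PRODUCTS = {
    1: "Elasticsearch Cookbook",
    2: "Elasticsearch in Action",
    3: "Mastering Elasticsearch",
}


def analyze(text, min_gram=2, max_gram=10):
    """Simplified stand-in for edge_ngram_analyzer: prefixes of each word."""
    grams = set()
    for word in re.findall(r"[^\W_]+", text):
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            grams.add(word[:n])
    return grams


def build_index(docs):
    """Map each gram to the set of document ids whose name produced it."""
    index = {}
    for doc_id, name in docs.items():
        for gram in analyze(name):
            index.setdefault(gram, set()).add(doc_id)
    return index


def match(index, query):
    """OR semantics: a document is a hit if it shares any gram with the query."""
    hits = set()
    for gram in analyze(query):
        hits |= index.get(gram, set())
    return sorted(hits)


index = build_index(PRODUCTS)
print(match(index, "Elast"))  # [1, 2, 3]
print(match(index, "Cook"))   # [1]
print(match(index, "elast"))  # []
print(match(index, "Elk"))    # [1, 2, 3]
```

The last call shows a side effect of analyzing the query with the same analyzer: “Elk” is split into “El” and “Elk”, and “El” alone is enough to hit every document. A common way to avoid this is to set a different “search_analyzer” on the field so the query is not split into prefixes; combined with a lowercase filter on the index analyzer, that also makes matching case-insensitive.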

Conclusion

The N-EdgeGram Analyzer in Elasticsearch offers a practical way to implement autocomplete and partial-word matching in your search applications. By indexing the leading N-grams of every word, it turns prefix queries into ordinary term matches. Keep in mind that edge N-grams only cover the beginning of a word; infix matches need the ngram tokenizer instead.
