Elasticsearch Search with N-EdgeGram Analyzer

Introduction

Elasticsearch, a powerful search engine, provides a variety of analyzers to preprocess and tokenize text during indexing and search. One useful option is what this post calls the N-EdgeGram Analyzer: a custom analyzer built on Elasticsearch's edge n-gram (edge_ngram) tokenizer. In this blog post, we'll explore how it works and show how it enables partial and prefix searches.

Understanding the N-EdgeGram Analyzer

The N-EdgeGram Analyzer generates N-grams, which are sequences of N characters, from the input text. Unlike a plain n-gram tokenizer, every gram it produces is anchored to the start of a word. With a minimum gram size of 2, for example, “Cookbook” becomes “Co,” “Coo,” “Cook,” “Cookb,” and so on. Because every gram is a word prefix, the analyzer supports prefix and partial-word searches. Matching text in the middle of a word (infix search) requires the regular ngram tokenizer instead.
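The Python sketch below illustrates how edge n-grams are produced. It is a rough stand-in for the edge_ngram tokenizer configured with token_chars letter and digit, not Elasticsearch's actual (Lucene) implementation. The helper name edge_ngrams is hypothetical, and the letter-or-digit test is approximated with str.isalnum. The defaults of 2 and 10 match the min_gram and max_gram values used in the index settings later in this post.

```python
def edge_ngrams(text: str, min_gram: int = 2, max_gram: int = 10) -> list[str]:
    """Approximate an edge_ngram tokenizer with token_chars ["letter", "digit"].

    Illustrative only: this is not the Lucene implementation.
    """
    # 1. Split into words: any character that is not a letter or digit
    #    (approximated with str.isalnum) acts as a separator.
    words = []
    current = ""
    for ch in text:
        if ch.isalnum():
            current += ch
        elif current:
            words.append(current)
            current = ""
    if current:
        words.append(current)

    # 2. Emit the leading min_gram..max_gram characters of every word.
    #    Words shorter than min_gram yield nothing; longer words stop at max_gram.
    grams = []
    for word in words:
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            grams.append(word[:n])
    return grams


print(edge_ngrams("Elasticsearch in Action"))
```

For “Elasticsearch in Action” this yields “El” through “Elasticsea” for the first word, then “in,” then “Ac” through “Action.” The 13-letter “Elasticsearch” stops at ten characters because of max_gram. A one-letter word would produce no grams at all because of min_gram. The tokenizer itself does not change letter case.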

Use Case: Autocomplete Search

Suppose we have an Elasticsearch index containing product names. We want to enable autocomplete functionality in our search, allowing users to find products quickly as they type. The N-EdgeGram Analyzer is a good fit. It stores word prefixes at index time, so a partially typed word can match without a wildcard or prefix query.

Creating the N-EdgeGram Analyzer

First, let's create the index with a custom analyzer built on the edge_ngram tokenizer.

PUT /products_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "edge_ngram_analyzer": {
          "type": "custom",
          "tokenizer": "edge_ngram_tokenizer",
          "filter": [
            "lowercase"
          ]
        }
      },
      "tokenizer": {
        "edge_ngram_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "product_name": {
        "type": "text",
        "analyzer": "edge_ngram_analyzer"
      }
    }
  }
}

In the above example, we define a custom analyzer called “edge_ngram_analyzer,” which uses “edge_ngram_tokenizer.”

- min_gram (2) and max_gram (10) set the shortest and longest prefix generated for each word.
- token_chars limits words to letters and digits, so spaces and punctuation act as separators and never become part of a gram.
- The lowercase filter makes matching case-insensitive, so “elast” and “Elast” find the same products.

Because product_name has no separate search_analyzer, the same analyzer is also applied to the query text at search time.

Indexing Data

Next, we’ll index some sample product names into our “products_index.”

POST /products_index/_doc/1
{
  "product_name": "Elasticsearch Cookbook"
}

POST /products_index/_doc/2
{
  "product_name": "Elasticsearch in Action"
}

POST /products_index/_doc/3
{
  "product_name": "Mastering Elasticsearch"
}
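As an alternative to three separate requests, the same documents can be sent in one call to the _bulk API. Its body is newline-delimited JSON: for each document, an action line is followed by the document source. The Python sketch below only builds that body with the standard library; the helper name bulk_body is hypothetical. Sending it is left to you, for example by pasting it after POST /_bulk in a console such as Kibana Dev Tools, or by posting it with Content-Type: application/x-ndjson.

```python
import json


def bulk_body(index_name: str, docs: dict[int, dict]) -> str:
    """Build an NDJSON body for the _bulk API (hypothetical helper).

    Each document contributes one action line and one source line;
    the body must end with a newline.
    """
    lines = []
    for doc_id, source in docs.items():
        # Action line: index this document into index_name under the given id.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": str(doc_id)}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


products = {
    1: {"product_name": "Elasticsearch Cookbook"},
    2: {"product_name": "Elasticsearch in Action"},
    3: {"product_name": "Mastering Elasticsearch"},
}

print(bulk_body("products_index", products), end="")
```

Adding ?refresh=true to the _bulk request makes the documents searchable immediately, which is convenient for a quick demo. Otherwise they become visible after the next periodic refresh, usually within about a second.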

Performing Partial Match Search

Now that our data is indexed with the N-EdgeGram Analyzer, we can execute partial match searches for our autocomplete functionality.

GET /products_index/_search
{
  "query": {
    "match": {
      "product_name": "Elast"
    }
  }
}

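The query above matches all three indexed documents. “Elast” is a prefix of “Elasticsearch,” a word that appears in every product name, so the corresponding edge n-gram was stored for each document at index time. The query therefore finds it without any wildcard or prefix query.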
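To see why the match works, here is a toy model in Python. It builds an in-memory inverted index from the edge n-grams of the three product names. It then answers a query roughly the way a match query with the default OR operator would: the query text is analyzed with the same edge n-gram logic, and every document that shares at least one gram is collected. This is an illustration only: it ignores scoring and lowercasing. The names edge_ngrams and match are hypothetical helpers, not Elasticsearch APIs.

```python
from collections import defaultdict


def edge_ngrams(text, min_gram=2, max_gram=10):
    # Split on anything that is not a letter or digit, then keep word prefixes
    # of length min_gram..max_gram (a simplified edge_ngram tokenizer).
    words = "".join(ch if ch.isalnum() else " " for ch in text).split()
    return [w[:n] for w in words for n in range(min_gram, min(max_gram, len(w)) + 1)]


docs = {
    1: "Elasticsearch Cookbook",
    2: "Elasticsearch in Action",
    3: "Mastering Elasticsearch",
}

# Toy inverted index: edge n-gram -> set of document ids containing it.
index = defaultdict(set)
for doc_id, name in docs.items():
    for gram in edge_ngrams(name):
        index[gram].add(doc_id)


def match(query):
    # Analyze the query with the same logic and OR the postings together.
    hits = set()
    for gram in edge_ngrams(query):
        hits |= index.get(gram, set())
    return sorted(hits)


print(match("Elast"))  # [1, 2, 3]
```

Because the query text is analyzed too, “Elast” also produces the two-letter gram “El.” A product whose name merely contains a word starting with “El” (say, a hypothetical “Electric Kettle”) would therefore match as well, just with fewer shared grams and typically a lower score. Elasticsearch's edge_ngram tokenizer documentation discusses using a separate search_analyzer to tighten this behaviour.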

Conclusion

The N-EdgeGram Analyzer in Elasticsearch offers a practical way to implement autocomplete and prefix matching in your search applications. It indexes the leading characters of each word as edge n-grams, so partially typed words match through ordinary term lookups, at the cost of a larger index. For infix matching (text in the middle of a word), the regular ngram tokenizer is the better fit.
