Syntax
The significant_text aggregation is very useful when we want to extract meaningful keywords from a large body of text. Traditional aggregations rank terms mostly by how often they occur, whereas the significant_text aggregation uses statistical significance to find the terms that stand out from the rest.

{
  "aggs": {
    "significant_keywords": {
      "significant_text": {
        "field": "content",
        "size": 10
      }
    }
  }
}

In this query snippet, we harness significant_text by specifying the field (content) from which we want to extract significant keywords and limiting the result set to 10 terms.
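The snippet above contains only the aggregation. In practice it is sent together with a query, because the documents matched by the query are what the aggregation measures against the rest of the index. Here is a minimal sketch in Python that builds such a request body. The helper name build_significant_text_request, the match query and the search word "elasticsearch" are assumptions made for illustration; only the aggregation part comes from the snippet above.

```python
import json


def build_significant_text_request(field, size, query_text):
    """Build a search body pairing a query with a significant_text aggregation.

    The helper name and the match query are illustrative assumptions; only the
    aggregation itself mirrors the snippet above.
    """
    return {
        # We only want the aggregation result, not the matching documents.
        "size": 0,
        # The query decides which documents the keywords are extracted from.
        "query": {"match": {field: query_text}},
        "aggs": {
            "significant_keywords": {
                "significant_text": {"field": field, "size": size}
            }
        },
    }


body = build_significant_text_request("content", 10, "elasticsearch")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of a _search request against an index that has a text field named content.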

How is it different from Term Frequency

Keyword extraction usually relies on term frequency to identify the important words in a document. The problem is that this approach may overlook terms that are significant even though they are not very frequent in the index. The significant_text aggregation instead compares how often a term appears in the documents matching the query (the foreground set) with how often it appears in the index as a whole (the background set), and surfaces the terms that occur more often than random chance would suggest.
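To make the difference concrete, here is a small toy sketch in Python that ranks the same three terms once by raw frequency and once by a significance score. The score follows the shape of the JLH heuristic (absolute change in frequency multiplied by relative change), which Elasticsearch documents as the default for significant terms. The counts are made up, and the real scoring is configurable and more involved than this.

```python
def jlh_score(fg_count, fg_total, bg_count, bg_total):
    """Score a term by how much more common it is in the foreground set.

    Shape of the JLH heuristic: (absolute change) * (relative change).
    Returns 0.0 when the term is not more frequent in the foreground
    than in the background.
    """
    fg_rate = fg_count / fg_total
    bg_rate = bg_count / bg_total
    if bg_rate == 0 or fg_rate <= bg_rate:
        return 0.0
    return (fg_rate - bg_rate) * (fg_rate / bg_rate)


# Made-up counts: (docs with the term in the foreground, docs with it in the index).
FG_TOTAL, BG_TOTAL = 100, 10_000
counts = {
    "the": (95, 9_500),    # very frequent everywhere
    "fruit": (60, 3_000),  # frequent, somewhat more so in the foreground
    "mango": (20, 40),     # rare overall, concentrated in the foreground
}

by_frequency = sorted(counts, key=lambda t: counts[t][0], reverse=True)
by_significance = sorted(
    counts,
    key=lambda t: jlh_score(counts[t][0], FG_TOTAL, counts[t][1], BG_TOTAL),
    reverse=True,
)

print("by frequency:   ", by_frequency)
print("by significance:", by_significance)
```

With these numbers, "the" tops the frequency ranking but scores zero for significance, while the rare term "mango" moves to the top because half of all its occurrences fall inside the foreground set.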

Real-world example:

Let us take a look at a real-world scenario where significant_text can help us. Let us consider that we have an index that contains the details of fruits, and the requirement is to extract key concepts from the collection of fruit descriptions. We want to identify terms that are not only frequent but also statistically significant, since that is what shows their importance.

{
  "aggs": {
    "fruit_description_keywords": {
      "significant_text": {
        "field": "description",
        "size": 5
      }
    }
  }
}

In this example, we apply the significant_text aggregation to the description field, aiming to extract the five statistically significant keywords that capture the essence of the fruit descriptions.
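Because significant_text re-analyzes the field text on the fly, the Elasticsearch documentation suggests running it on a sample of the best matching documents and, where the text contains repeated passages, enabling filter_duplicate_text. Here is a hedged sketch of the fruit request with both in place, written as a Python dict. The match query on "tropical", the aggregation name sample and the shard_size of 100 are assumptions made for illustration and not part of the original example.

```python
import json

fruit_request = {
    "size": 0,
    # Assumed query: the matching fruit descriptions form the foreground set.
    "query": {"match": {"description": "tropical"}},
    "aggs": {
        "sample": {
            # Analyze only the top-matching documents on each shard.
            "sampler": {"shard_size": 100},
            "aggs": {
                "fruit_description_keywords": {
                    "significant_text": {
                        "field": "description",
                        "size": 5,
                        # Skip repeated passages so they do not skew the result.
                        "filter_duplicate_text": True,
                    }
                }
            },
        }
    },
}

print(json.dumps(fruit_request, indent=2))
```

A smaller shard_size makes the request cheaper but gives the aggregation less text to work with, so the value is worth tuning against your own data.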

Combining with Other Aggregations

As I have demonstrated in previous episodes, the true power of significant_text (like that of any query or aggregation) emerges when we combine it with other aggregations. We can use this to gain a more detailed understanding of our fruit-related data. Here we nest significant_text inside a terms aggregation, so that we not only extract significant fruit keywords but also group them by the fruit category in which they occur.

{
  "aggs": {
    "category_keywords": {
      "terms": {
        "field": "category.keyword"
      },
      "aggs": {
        "significant_keywords": {
          "significant_text": {
            "field": "description",
            "size": 5
          }
        }
      }
    }
  }
}

This query follows the conventional pattern of nesting one aggregation inside another. The terms aggregation creates one bucket per fruit category, and the nested significant_text aggregation returns the keywords that are statistically significant within each of those categories.
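To read the result of such a nested aggregation, we walk the category buckets and collect the keywords found inside each one. Here is a minimal sketch in Python. The function name keywords_by_category is hypothetical and the sample response is entirely made up for illustration; it only mimics the bucket layout that Elasticsearch returns for nested aggregations.

```python
def keywords_by_category(response):
    """Map each category bucket key to its list of significant keywords."""
    result = {}
    for category in response["aggregations"]["category_keywords"]["buckets"]:
        keyword_buckets = category["significant_keywords"]["buckets"]
        result[category["key"]] = [bucket["key"] for bucket in keyword_buckets]
    return result


# Made-up response fragment, trimmed to the fields used above.
sample_response = {
    "aggregations": {
        "category_keywords": {
            "buckets": [
                {
                    "key": "citrus",
                    "doc_count": 12,
                    "significant_keywords": {
                        "buckets": [
                            {"key": "zesty", "doc_count": 7, "score": 1.9},
                            {"key": "peel", "doc_count": 5, "score": 1.2},
                        ]
                    },
                },
                {
                    "key": "berry",
                    "doc_count": 9,
                    "significant_keywords": {
                        "buckets": [
                            {"key": "seeds", "doc_count": 4, "score": 0.8},
                        ]
                    },
                },
            ]
        }
    }
}

for category, keywords in keywords_by_category(sample_response).items():
    print(category, "->", ", ".join(keywords))
```

The same loop works for any number of categories, and each keyword bucket also carries a score that can be used to filter out weak terms.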

The versatility of significant_text makes it useful across many use cases and industries. In legal documents, we can use it to identify legally significant terms. In customer reviews, we can use it to pinpoint the words that carry crucial sentiment, and so on.

bucket-significanttext-aggregation.html


Decoding Elasticsearch Query DSL: significant_text Aggregation for Keyword Extraction
