Syntax
The significant_text aggregation is very useful when we want to extract meaningful keywords from a large body of text. Traditional aggregations work mostly on term frequency, but the significant_text aggregation uses statistical significance to find the terms that stand out from the rest.
{
  "aggs": {
    "significant_keywords": {
      "significant_text": {
        "field": "content",
        "size": 10
      }
    }
  }
}
In this query snippet, we tap into the power of significant_text by specifying the field (content) from which we want to extract significant keywords and setting the size of the result set to 10.
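One caveat worth sketching: significance is measured by comparing the documents that match the search (the foreground) against the index as a whole (the background), so the aggregation is normally sent together with a query. The Python snippet below builds such a request body as a plain dictionary. The search term "tropical" and the helper name build_significant_text_request are assumptions made for illustration; they are not part of the original example.

```python
import json


def build_significant_text_request(field, search_text, size=10):
    """Build a search body that pairs a match query with significant_text.

    The match query defines the foreground set; the aggregation then looks
    for terms in `field` that are unusually common within that set.
    """
    return {
        "query": {"match": {field: search_text}},
        "aggs": {
            "significant_keywords": {
                "significant_text": {"field": field, "size": size}
            }
        },
    }


# "tropical" is a hypothetical search term used only for illustration.
body = build_significant_text_request("content", "tropical")
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the body of a _search request against the index that holds the content field.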
How is it different from Term Frequency
Keyword extraction usually relies heavily on term frequency to identify important words within a document. The problem is that this approach may overlook terms that are significant even though they are not very frequent in the index. The significant_text aggregation instead compares how often a term appears in the documents matching the query (the foreground set) with how often it appears in the index as a whole (the background set), and surfaces the terms that occur more often in the matching documents than random chance would explain.
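To make the contrast with term frequency concrete, here is a small Python sketch that ranks three terms twice: once by raw count and once by a JLH-style score (absolute change multiplied by relative change), which to my understanding is the default heuristic Elasticsearch uses for significance. All counts are invented, and the function is a simplified illustration rather than the engine's actual implementation.

```python
def jlh_score(fg_count, fg_total, bg_count, bg_total):
    """JLH-style score: absolute change times relative change in frequency."""
    fg_pct = fg_count / fg_total
    bg_pct = bg_count / bg_total
    change = fg_pct - bg_pct
    if bg_pct == 0 or change <= 0:
        # Terms that are no more common in the foreground are not significant.
        return 0.0
    return change * (fg_pct / bg_pct)


# Invented counts: 100 documents match the query, 10,000 are in the index.
FG_TOTAL, BG_TOTAL = 100, 10_000
counts = {
    # term: (matching docs containing it, docs containing it in the whole index)
    "fruit": (90, 9_000),
    "sweet": (60, 4_000),
    "mango": (30, 50),
}

by_frequency = sorted(counts, key=lambda t: counts[t][0], reverse=True)
by_significance = sorted(
    counts,
    key=lambda t: jlh_score(counts[t][0], FG_TOTAL, counts[t][1], BG_TOTAL),
    reverse=True,
)

print("by frequency:   ", by_frequency)
print("by significance:", by_significance)
```

With these numbers, "fruit" tops the frequency ranking but scores zero on significance because it is equally common everywhere, while the rarer "mango" moves to the top because it is concentrated in the matching documents.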
Real-world example:
Let us take a look at a real-world scenario where significant_text can help us: extracting key concepts from a collection of fruit descriptions. Suppose we have an index that contains the details of fruits. Our requirement is to identify terms that are not only frequent but also statistically significant, which indicates their importance.
{
  "aggs": {
    "fruit_description_keywords": {
      "significant_text": {
        "field": "description",
        "size": 5
      }
    }
  }
}
In this example, we apply the significant_text aggregation to the description field, aiming to extract five statistically significant keywords that capture the essence of the fruit descriptions.
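Once the search returns, the keywords come back as buckets under the aggregation name, each carrying a key, a score, and foreground/background document counts. The Python sketch below reads them out of a response dictionary. The sample response is hand-written in the shape of a search response and every value in it is invented; top_keywords is a hypothetical helper name.

```python
def top_keywords(response, agg_name):
    """Return (term, score) pairs from a significant_text aggregation result."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(bucket["key"], bucket["score"]) for bucket in buckets]


# Hand-written sample in the shape of a search response; all values are invented.
sample_response = {
    "aggregations": {
        "fruit_description_keywords": {
            "doc_count": 100,
            "bg_count": 10000,
            "buckets": [
                {"key": "mango", "doc_count": 30, "score": 17.7, "bg_count": 50},
                {"key": "sweet", "doc_count": 60, "score": 0.3, "bg_count": 4000},
            ],
        }
    }
}

for term, score in top_keywords(sample_response, "fruit_description_keywords"):
    print(f"{term}: {score}")
```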
Combining with Other Aggregations
As I have demonstrated in previous episodes, the true power of significant_text (like any query or aggregation) emerges when we combine it with other aggregations. We can use it here to gain a detailed understanding of fruit-related data. Here we integrate significant_text with the terms aggregation to not only extract significant fruit keywords but also categorize them based on their occurrence in different fruit categories.
{
  "aggs": {
    "category_keywords": {
      "terms": {
        "field": "category.keyword"
      },
      "aggs": {
        "significant_keywords": {
          "significant_text": {
            "field": "content",
            "size": 5
          }
        }
      }
    }
  }
}
In this query, we use a conventional pattern: significant_text nested inside the terms aggregation. Now we get a bucket for each fruit category (terms aggregation) and, within each bucket, the keywords that are significant for that category (significant_text).
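The nested result can be flattened into a simple category-to-keywords mapping. The Python sketch below assumes the aggregation names used in the query above (category_keywords and significant_keywords); the category names and keywords in the sample response are invented for illustration, and keywords_by_category is a hypothetical helper.

```python
def keywords_by_category(response):
    """Map each terms bucket key to the list of its significant keywords."""
    result = {}
    for category in response["aggregations"]["category_keywords"]["buckets"]:
        inner_buckets = category["significant_keywords"]["buckets"]
        result[category["key"]] = [bucket["key"] for bucket in inner_buckets]
    return result


# Hand-written sample in the shape of a nested aggregation result; values are invented.
sample_response = {
    "aggregations": {
        "category_keywords": {
            "buckets": [
                {
                    "key": "citrus",
                    "doc_count": 40,
                    "significant_keywords": {
                        "buckets": [{"key": "zest"}, {"key": "tangy"}]
                    },
                },
                {
                    "key": "tropical",
                    "doc_count": 25,
                    "significant_keywords": {
                        "buckets": [{"key": "mango"}]
                    },
                },
            ]
        }
    }
}

print(keywords_by_category(sample_response))
```

Because each terms bucket acts as its own foreground set, the keywords listed for a category are those that distinguish it from the rest of the index.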
The versatility of significant_text makes it useful across many use cases and industries. In legal documents, we can use it to identify legally significant terms. In customer reviews, we can use it to pinpoint crucial sentiments, and so on.