Decoding Elasticsearch Query DSL: Navigating Statistical Insights with extended_stats Aggregation

The Robust extended_stats Aggregation

The extended_stats aggregation is useful when an analysis needs more than basic metrics. It goes beyond a simple count or average and returns a wide set of statistical measures in a single request. Let us first take a look at the syntax of extended_stats:

{
  "aggs": {
    "advanced_stats": {
      "extended_stats": {
        "field": "numeric_field"
      }
    }
  }
}

In this snippet, the extended_stats aggregation is applied to a numeric field and returns a detailed set of metrics: count, min, max, average (mean), sum, sum of squares, variance, standard deviation, and standard deviation bounds.

Before moving further, let's take a quick look at what these metrics mean, for the benefit of a wider audience.

Count

The number of values that were aggregated from the matching documents.

Mean

The arithmetic average, calculated by summing all values and then dividing by the count (the number of values).

Minimum and Maximum

The smallest and largest values in the index, which together give us the range of the values.

Sum

The total of all values within the index.
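
To make the definitions above concrete, here is a minimal Python sketch that computes the same kinds of metrics locally on a small list of numbers. The function name and the sample values are made up for illustration. The dictionary keys mirror the field names that extended_stats reports (avg, sum_of_squares, std_deviation), and the variance is the population variance, which is an assumption worth checking against the documentation for your Elasticsearch version.

```python
import math


def extended_stats(values):
    """Compute, in plain Python, the kinds of metrics extended_stats reports."""
    count = len(values)
    if count == 0:
        # Nothing to aggregate, so only the count is meaningful.
        return {"count": 0}

    total = sum(values)
    mean = total / count
    # Population variance: the average squared distance from the mean.
    variance = sum((v - mean) ** 2 for v in values) / count

    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": mean,
        "sum": total,
        "sum_of_squares": sum(v * v for v in values),
        "variance": variance,
        "std_deviation": math.sqrt(variance),
    }


# Hypothetical sample values, chosen so the results are round numbers.
stats = extended_stats([2, 4, 4, 4, 5, 5, 7, 9])
print(stats)
```

For this sample the mean is 5.0, the variance is 4.0, and the standard deviation is 2.0, so most values sit within two units of the average.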

Combining Aggregations with extended_stats

In the example below, we use a terms aggregation to first categorize the data, and then a nested extended_stats aggregation to compute the statistical measures within each of those categories.

{
  "aggs": {
    "category_stats": {
      "terms": {
        "field": "category.keyword"
      },
      "aggs": {
        "numeric_stats": {
          "extended_stats": {
            "field": "numeric_field"
          }
        }
      }
    }
  }
}
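
As a rough mental model of what this nested query computes, the Python sketch below groups a handful of in-memory documents by category and then calculates statistics for each group. The documents, the field values, and the function name are hypothetical. Elasticsearch does this work on the server, and the sketch only mimics the shape of the result.

```python
import math
from collections import defaultdict


def stats_per_category(docs, category_field, numeric_field):
    """Group documents by a category field, then compute stats per group."""
    groups = defaultdict(list)
    for doc in docs:
        # Skip documents that are missing either field.
        if category_field in doc and numeric_field in doc:
            groups[doc[category_field]].append(doc[numeric_field])

    result = {}
    for key, values in groups.items():
        count = len(values)
        mean = sum(values) / count
        # Population variance within this one category.
        variance = sum((v - mean) ** 2 for v in values) / count
        result[key] = {
            "count": count,
            "min": min(values),
            "max": max(values),
            "avg": mean,
            "sum": sum(values),
            "std_deviation": math.sqrt(variance),
        }
    return result


# Hypothetical documents for illustration only.
docs = [
    {"category": "books", "numeric_field": 10},
    {"category": "books", "numeric_field": 20},
    {"category": "games", "numeric_field": 30},
    {"category": "games", "numeric_field": 50},
    {"category": "games", "numeric_field": 70},
]
per_category = stats_per_category(docs, "category", "numeric_field")
print(per_category)
```

One difference to keep in mind: a terms aggregation returns only the top buckets by default, while this sketch returns every category it sees.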

Time-Aware Statistical Analysis

In the last episode of our series, we looked at the date_histogram aggregation in detail. Combining date_histogram with extended_stats produces insights that are useful in many use cases.

For example, combining these two aggregations can help us understand the resource usage of a particular instance over a period of time, the range of usage across different instances, and which instance's usage deviates from the normal mean or average usage. The query below uses calendar_interval, which replaced the older interval parameter of date_histogram in recent Elasticsearch versions.

{
  "aggs": {
    "monthly_stats": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "month"
      },
      "aggs": {
        "numeric_stats": {
          "extended_stats": {
            "field": "numeric_field"
          }
        }
      }
    }
  }
}

This query uses a date_histogram aggregation to segment data into monthly intervals, while the nested extended_stats aggregation computes statistical measures for the numeric field within each month.
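
The monthly bucketing described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not how Elasticsearch implements it: the sample readings and the function name are hypothetical, timestamps are assumed to be ISO 8601 strings without a time zone suffix, and the bounds are computed as the average plus or minus sigma standard deviations, with a default sigma of 2 to mirror the std_deviation_bounds field.

```python
import math
from collections import defaultdict
from datetime import datetime


def monthly_extended_stats(docs, sigma=2.0):
    """Bucket readings by calendar month and compute stats per bucket."""
    buckets = defaultdict(list)
    for doc in docs:
        ts = datetime.fromisoformat(doc["timestamp"])
        # Use "YYYY-MM" as the bucket key, one bucket per calendar month.
        buckets[ts.strftime("%Y-%m")].append(doc["numeric_field"])

    result = {}
    for month in sorted(buckets):
        values = buckets[month]
        count = len(values)
        avg = sum(values) / count
        std = math.sqrt(sum((v - avg) ** 2 for v in values) / count)
        result[month] = {
            "count": count,
            "avg": avg,
            "std_deviation": std,
            # Values outside these bounds deviate strongly from the mean.
            "std_deviation_bounds": {
                "upper": avg + sigma * std,
                "lower": avg - sigma * std,
            },
        }
    return result


# Hypothetical CPU usage readings for illustration only.
docs = [
    {"timestamp": "2024-01-05T10:00:00", "numeric_field": 40},
    {"timestamp": "2024-01-12T10:00:00", "numeric_field": 42},
    {"timestamp": "2024-01-19T10:00:00", "numeric_field": 38},
    {"timestamp": "2024-01-26T10:00:00", "numeric_field": 40},
    {"timestamp": "2024-02-02T10:00:00", "numeric_field": 50},
    {"timestamp": "2024-02-09T10:00:00", "numeric_field": 50},
]
monthly = monthly_extended_stats(docs)
print(monthly)
```

A reading that falls outside a month's bounds is a candidate for the kind of unusual usage mentioned earlier.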

The extended_stats aggregation in Elasticsearch does not just provide metrics; it can also open up detailed insights, as in the example above with server instances and their resource usage.

For more detailed documentation on the extended_stats aggregation, see the official Elastic documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-extendedstats-aggregation.html
