The Robust extended_stats Aggregation
The extended_stats aggregation is useful when basic metrics are not enough for the statistical analysis you need. It goes beyond the usual metrics and returns a wider set of statistical measures in a single request. Let us first take a look at the syntax of extended_stats:
{
  "aggs": {
    "advanced_stats": {
      "extended_stats": {
        "field": "numeric_field"
      }
    }
  }
}
In this snippet, the extended_stats aggregation is applied to a numeric field and returns a detailed set of metrics for it: count, mean (reported as avg), min, max, sum, variance, standard deviation, and more.
Before moving further, let's take a quick look at what these metrics mean, for readers who are new to them.
Count
The total number of values within the index or the dataset.
Mean
The arithmetic average, calculated by summing all the values and then dividing by the count (the number of values).
Minimum and Maximum
The smallest and largest values in the index, which together give us the range of the values.
Sum
The total of all the values within the index.
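The definitions above can be checked by hand on a small sample. The following is a minimal Python sketch, not Elasticsearch's own implementation: the function name basic_stats and the sample values are made up for illustration, and variance and standard deviation are computed in their population form, which is an assumption about how the plain variance and std_deviation fields are defined.

```python
import math


def basic_stats(values):
    """Compute the metrics described above for a list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    count = len(values)
    total = sum(values)
    mean = total / count
    # Population variance: average squared distance from the mean (assumed form).
    variance = sum((v - mean) ** 2 for v in values) / count
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": mean,
        "sum": total,
        "variance": variance,
        "std_deviation": math.sqrt(variance),
    }


# Hypothetical sample: five values of a numeric field.
sample = [2, 4, 4, 4, 6]
stats = basic_stats(sample)
print(stats)
```

For this sample the count is 5, the sum is 20 and the mean is 4, so the range runs from 2 to 6 around that mean.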
Combining Aggregations with extended_stats
In the example below, we use a terms aggregation to first group the data into categories, and then a nested extended_stats aggregation to compute the statistical measures within each of those categories.
{
  "aggs": {
    "category_stats": {
      "terms": {
        "field": "category.keyword"
      },
      "aggs": {
        "numeric_stats": {
          "extended_stats": {
            "field": "numeric_field"
          }
        }
      }
    }
  }
}
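To make the shape of the result concrete, here is a small Python sketch that walks a response of this kind and builds one summary row per category. The sample_response dictionary is hand-written for illustration (it is not output from a real cluster and shows only a few fields), and summarize_buckets is a hypothetical helper. The key names follow the aggregation names used in the query above and the usual buckets / key / doc_count layout of a terms aggregation.

```python
def summarize_buckets(response, terms_name="category_stats", stats_name="numeric_stats"):
    """Return one summary row per terms bucket from an aggregation response."""
    rows = []
    for bucket in response["aggregations"][terms_name]["buckets"]:
        stats = bucket[stats_name]
        rows.append({
            "category": bucket["key"],
            "docs": bucket["doc_count"],
            "avg": stats["avg"],
            "std_deviation": stats["std_deviation"],
            # Spread of the values inside this category.
            "range": stats["max"] - stats["min"],
        })
    return rows


# Hand-written sample, NOT real cluster output; only a few fields are shown.
sample_response = {
    "aggregations": {
        "category_stats": {
            "buckets": [
                {
                    "key": "books",
                    "doc_count": 3,
                    "numeric_stats": {
                        "count": 3, "min": 10.0, "max": 30.0,
                        "avg": 20.0, "sum": 60.0, "std_deviation": 8.16,
                    },
                },
                {
                    "key": "games",
                    "doc_count": 2,
                    "numeric_stats": {
                        "count": 2, "min": 40.0, "max": 60.0,
                        "avg": 50.0, "sum": 100.0, "std_deviation": 10.0,
                    },
                },
            ]
        }
    }
}

rows = summarize_buckets(sample_response)
for row in rows:
    print(row)
```

Each bucket carries its own copy of the extended_stats output, so comparing categories is just a matter of reading the same fields from each bucket.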
Time-Aware Statistical Analysis
In the previous part of this series, we looked at the date_histogram aggregation in detail. Combining date_histogram with extended_stats yields time-based insights that are useful in many use cases.
For example, the two aggregations together can help us understand how much of a resource a particular instance uses over a given period, how that usage ranges across different instances, and which instance's usage deviates from the mean.
{
  "aggs": {
    "monthly_stats": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "month"
      },
      "aggs": {
        "numeric_stats": {
          "extended_stats": {
            "field": "numeric_field"
          }
        }
      }
    }
  }
}
This query uses a date_histogram aggregation to segment data into monthly intervals, while the nested extended_stats aggregation computes statistical measures for the numeric field within each month.
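The same idea can be mimicked locally, which is handy for sanity-checking what a monthly bucket should contain. The sketch below is plain Python, not an Elasticsearch client call: monthly_extended_stats is a hypothetical helper, the readings are invented, and the band of the mean plus or minus 2 standard deviations is an assumption modelled on the std_deviation_bounds that extended_stats reports with its default sigma of 2.

```python
import math
from collections import defaultdict
from datetime import datetime


def monthly_extended_stats(readings, sigma=2.0):
    """Group (timestamp, value) pairs by calendar month and summarize each group."""
    buckets = defaultdict(list)
    for timestamp, value in readings:
        buckets[timestamp.strftime("%Y-%m")].append(value)

    result = {}
    for month, values in sorted(buckets.items()):
        count = len(values)
        avg = sum(values) / count
        # Population standard deviation (assumed form, see the note above).
        std = math.sqrt(sum((v - avg) ** 2 for v in values) / count)
        result[month] = {
            "count": count,
            "min": min(values),
            "max": max(values),
            "avg": avg,
            "std_deviation": std,
            # Band around the mean; values outside it deviate from normal usage.
            "lower": avg - sigma * std,
            "upper": avg + sigma * std,
        }
    return result


# Invented CPU-usage readings (percent) for a single instance.
readings = [
    (datetime(2024, 1, 5), 40.0),
    (datetime(2024, 1, 20), 60.0),
    (datetime(2024, 2, 3), 30.0),
    (datetime(2024, 2, 14), 50.0),
    (datetime(2024, 2, 27), 70.0),
]

monthly = monthly_extended_stats(readings)
for month, stats in monthly.items():
    print(month, stats)
```

A reading that falls outside the lower/upper band for its month is a candidate for the kind of deviation from average usage described earlier.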
The extended_stats aggregation in Elasticsearch does not just return metrics. Combined with other aggregations, it can also surface detailed insights, as in the example above with server instances and their resource usage.
For more details on the extended_stats aggregation, see the official Elastic documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-extendedstats-aggregation.html