What is bucket_script:
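The bucket_script pipeline aggregation is an important tool for building custom aggregations from arithmetic on other metrics. It lets us define logic that combines and transforms existing metrics within each bucket, which adds a new level of flexibility. Because bucket_script is a parent pipeline aggregation, it must be declared inside a multi-bucket aggregation (such as terms or date_histogram), next to the metrics it references. In the snippet below, the by_store terms aggregation and the store.keyword and bananas_count fields are assumed names for illustration.
{
  "aggs": {
    "by_store": {
      "terms": { "field": "store.keyword" },
      "aggs": {
        "sum_apples": { "sum": { "field": "apples_count" } },
        "avg_bananas": { "avg": { "field": "bananas_count" } },
        "custom_aggregation": {
          "bucket_script": {
            "buckets_path": {
              "totalApples": "sum_apples",
              "averageBananas": "avg_bananas"
            },
            "script": "params.totalApples / params.averageBananas",
            "gap_policy": "insert_zeros"
          }
        }
      }
    }
  }
}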
If you are wondering why we can't do these kinds of calculations with the traditional metrics aggregations, here is the answer. Traditional metrics aggregations only provide a predefined set of calculations such as sum, average, min, and max. The bucket_script and bucket_selector pipeline aggregations, by contrast, let us define and run our own calculations and filters on top of those results.
In the above example, we create a custom metric by dividing the total apples (sum_apples) by the average bananas (avg_bananas). A traditional metrics aggregation cannot do this on its own: it would let us compute either the sum or the average, but not combine the two.
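To see what bucket_script computes for each bucket, here is a simplified plain-Python model of the totalApples / averageBananas script. It is a sketch, not Elasticsearch code: the store buckets and their values are made up, the helper names (is_gap, ieee_divide, apply_bucket_script) are our own, None marks a missing metric, and only the skip and insert_zeros gap policies are modeled (empty doc_count 0 buckets are not). Painless divides doubles like Java, so a zero divisor yields Infinity or NaN instead of an error, and the helper mimics that.

```python
import math
from typing import Any, Optional

# One dict per terms bucket, keyed by sub-aggregation name. None marks a
# missing metric value (a "gap"). All values here are made up.
Bucket = dict[str, Any]


def is_gap(value: Optional[float]) -> bool:
    """A metric counts as a gap if it is missing or NaN."""
    return value is None or math.isnan(value)


def ieee_divide(numerator: float, denominator: float) -> float:
    """Divide like Painless/Java doubles instead of raising ZeroDivisionError.

    Assumes a positive-zero denominator, which is what insert_zeros produces.
    """
    if denominator == 0.0:
        if numerator == 0.0 or math.isnan(numerator):
            return math.nan
        return math.copysign(math.inf, numerator)
    return numerator / denominator


def apply_bucket_script(buckets: list[Bucket], gap_policy: str = "skip") -> list[Bucket]:
    """Simplified model of bucket_script running totalApples / averageBananas."""
    if gap_policy not in ("skip", "insert_zeros"):
        raise ValueError(f"gap policy not modeled: {gap_policy}")
    out = []
    for bucket in buckets:
        total_apples = bucket.get("sum_apples")
        average_bananas = bucket.get("avg_bananas")
        result = dict(bucket)  # copy so the input buckets stay untouched
        if is_gap(total_apples) or is_gap(average_bananas):
            if gap_policy == "skip":
                # The bucket stays in the response but gets no computed value.
                out.append(result)
                continue
            # insert_zeros: replace missing inputs with 0.0 and run the script anyway.
            total_apples = 0.0 if is_gap(total_apples) else total_apples
            average_bananas = 0.0 if is_gap(average_bananas) else average_bananas
        result["custom_aggregation"] = ieee_divide(total_apples, average_bananas)
        out.append(result)
    return out


buckets = [
    {"key": "store_a", "sum_apples": 120.0, "avg_bananas": 4.0},
    {"key": "store_b", "sum_apples": 30.0, "avg_bananas": None},
]
for b in apply_bucket_script(buckets, gap_policy="insert_zeros"):
    print(b["key"], b["custom_aggregation"])
# store_a 30.0
# store_b inf
```

With insert_zeros, store_b's missing average becomes 0, so the script still runs and produces infinity. With skip (the default gap policy), the bucket is kept but gets no custom_aggregation value.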
What is bucket_selector:
The bucket_selector pipeline aggregation, on the other hand, lets us filter buckets based on specific conditions, for example to narrow down the results for a more focused analysis. This is very useful when we want to include or exclude buckets based on our own criteria or on customer requirements.
{
  "aggs": {
    "filtered_aggregation": {
      "terms": {
        "field": "fruit_type.keyword"
      },
      "aggs": {
        "total_apples": {
          "sum": {
            "field": "apples_count"
          }
        },
        "filter_high_apples": {
          "bucket_selector": {
            "buckets_path": {
              "totalApples": "total_apples"
            },
            "script": "params.totalApples > 50"
          }
        }
      }
    }
  }
}
In the above query, we first aggregate the apple data by fruit type and then use bucket_selector to drop the fruit types whose total apple count is 50 or less, since the script keeps only buckets where params.totalApples > 50. This leaves a more focused set of high-performing fruit types for further analysis.
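The filtering step itself is simple. Here is a plain-Python model of it, using made-up bucket sums and a hypothetical apply_bucket_selector helper (not Elasticsearch code), to show the effect of the strict > comparison in the script:

```python
from typing import Any, Callable

# Made-up per-fruit-type sums standing in for the total_apples sub-aggregation.
fruit_buckets = [
    {"key": "fuji", "total_apples": 75.0},
    {"key": "gala", "total_apples": 50.0},
    {"key": "granny_smith", "total_apples": 12.0},
]


def apply_bucket_selector(
    buckets: list[dict[str, Any]], path: str, keep: Callable[[float], bool]
) -> list[dict[str, Any]]:
    """Simplified bucket_selector: keep a bucket only when keep(bucket[path]) is true."""
    return [bucket for bucket in buckets if keep(bucket[path])]


kept = apply_bucket_selector(fruit_buckets, "total_apples", lambda total: total > 50)
print([bucket["key"] for bucket in kept])  # ['fuji']
```

The gala bucket, at exactly 50, is removed because the condition is strict; using >= in the script would keep it.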
Combining bucket_script and bucket_selector:
As we have seen throughout this series, queries and aggregations become much more powerful when we combine them to fit our use case. Let's explore a scenario where we first calculate a custom metric with bucket_script and then filter the buckets on that metric with bucket_selector. Pipeline aggregations cannot have sub-aggregations of their own, so both are declared side by side inside the same parent terms aggregation; by_store, store.keyword and bananas_count are assumed names for illustration.
{
  "aggs": {
    "by_store": {
      "terms": { "field": "store.keyword" },
      "aggs": {
        "sum_apples": { "sum": { "field": "apples_count" } },
        "avg_bananas": { "avg": { "field": "bananas_count" } },
        "custom_workflow": {
          "bucket_script": {
            "buckets_path": {
              "totalApples": "sum_apples",
              "averageBananas": "avg_bananas"
            },
            "script": "params.totalApples / params.averageBananas",
            "gap_policy": "insert_zeros"
          }
        },
        "filtered_workflow": {
          "bucket_selector": {
            "buckets_path": { "customMetric": "custom_workflow" },
            "script": "params.customMetric > 0.5"
          }
        }
      }
    }
  }
}
In this slightly more complex example, we first create a custom metric with bucket_script and then use bucket_selector to keep only the buckets where that metric is greater than 0.5.
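Because bucket_selector reads the value that bucket_script adds to each bucket, the selector can only run once that value exists. The plain-Python sketch below models this as ordered stages over the bucket list; the stage functions are hypothetical, named after the aggregations, and the store values are made up. In Elasticsearch the order is worked out from the buckets_path references, whereas the sketch simply lists the stages in dependency order. It has no gap handling and assumes no zero divisors.

```python
from typing import Any, Callable

Bucket = dict[str, Any]
# Each stage takes the full bucket list and returns a new one, roughly how
# parent pipeline aggregations are applied one after another.
Stage = Callable[[list[Bucket]], list[Bucket]]


def custom_workflow(buckets: list[Bucket]) -> list[Bucket]:
    """bucket_script stage: add totalApples / averageBananas to every bucket."""
    return [{**b, "custom_workflow": b["sum_apples"] / b["avg_bananas"]} for b in buckets]


def filtered_workflow(buckets: list[Bucket]) -> list[Bucket]:
    """bucket_selector stage: keep buckets whose customMetric is above 0.5."""
    return [b for b in buckets if b["custom_workflow"] > 0.5]


def run_pipeline(buckets: list[Bucket], stages: list[Stage]) -> list[Bucket]:
    """Apply each stage to the output of the previous one."""
    for stage in stages:
        buckets = stage(buckets)
    return buckets


stores = [
    {"key": "store_a", "sum_apples": 6.0, "avg_bananas": 4.0},  # ratio 1.5
    {"key": "store_b", "sum_apples": 1.0, "avg_bananas": 5.0},  # ratio 0.2
    {"key": "store_c", "sum_apples": 2.0, "avg_bananas": 5.0},  # ratio 0.4
]
result = run_pipeline(stores, [custom_workflow, filtered_workflow])
print([b["key"] for b in result])  # ['store_a']
```

Swapping the two stages would fail with a KeyError, since the selector would look for custom_workflow before any stage had added it.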
Time series analysis:
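Let's look at a practical example where these custom aggregations really shine: time series analysis. Consider a scenario where we want to identify months with a significant increase in apple sales compared to the previous month. A bucket cannot read the previous bucket's value directly, so the query uses a derivative pipeline aggregation to get the month-over-month change; bucket_script then recovers the previous month's total from it (current minus change) and returns the growth ratio as a number, since bucket_script must return a number rather than a boolean. The first month has no derivative, so bucket_script skips it and the selector script guards against the missing value. The date_histogram uses calendar_interval, which replaces the deprecated interval parameter.
{
  "aggs": {
    "monthly_apple_sales": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "month"
      },
      "aggs": {
        "monthly_total_apple_sales": {
          "sum": {
            "field": "apples_count"
          }
        },
        "sales_change": {
          "derivative": {
            "buckets_path": "monthly_total_apple_sales"
          }
        },
        "custom_increase": {
          "bucket_script": {
            "buckets_path": {
              "currentSales": "monthly_total_apple_sales",
              "salesChange": "sales_change"
            },
            "script": "double prev = params.currentSales - params.salesChange; return prev > 0 ? params.currentSales / prev : 0.0;",
            "gap_policy": "skip"
          }
        },
        "filtered_months": {
          "bucket_selector": {
            "buckets_path": {
              "increaseRatio": "custom_increase"
            },
            "script": "params.increaseRatio != null && params.increaseRatio > 1.5"
          }
        }
      }
    }
  }
}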
So here is what we are doing in this example:
- we aggregate the apple sales data on a monthly basis,
- then we calculate a custom metric with bucket_script that compares each month's sales with the previous month's, to identify months where apple sales grew by more than 50%,
- then we filter the results with bucket_selector to keep only those significant increases.
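To make the "more than 50%" arithmetic concrete, here is a plain-Python sketch of the same month-over-month check on made-up monthly totals, using a hypothetical months_with_big_increase helper (not Elasticsearch code). A month qualifies when its total divided by the previous month's total exceeds 1.5.

```python
from typing import Optional

# Made-up monthly totals standing in for the monthly_total_apple_sales buckets.
monthly_sales = [
    ("2024-01", 100.0),
    ("2024-02", 160.0),
    ("2024-03", 150.0),
    ("2024-04", 240.0),
]


def months_with_big_increase(
    sales: list[tuple[str, float]], threshold: float = 1.5
) -> list[tuple[str, float]]:
    """Return (month, ratio) pairs where this month's total / last month's total > threshold."""
    selected = []
    previous: Optional[float] = None
    for month, current in sales:
        # The first month has no predecessor, and a zero previous total would
        # divide by zero, so neither case can be selected.
        if previous is not None and previous > 0:
            ratio = current / previous
            if ratio > threshold:  # 1.5 means "more than 50% higher"
                selected.append((month, ratio))
        previous = current
    return selected


print([month for month, _ in months_with_big_increase(monthly_sales)])
# ['2024-02', '2024-04']
```

March is not selected because its sales dropped slightly, while February and April both grew by 60% over the month before.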
I understand this may sound confusing at first glance, but it gets easier with each iteration. For more details on these two pipeline aggregations, the official Elastic documentation is very helpful:
Bucket_script: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline-bucket-script-aggregation.html
Bucket_selector: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline-bucket-selector-aggregation.html