The Summarize tool allows users to perform a variety of summary calculations and aggregations on their data, making it easy to analyze and understand large datasets.
Add a Summarize tool
1. Navigate to your analysis.
4. Double-Click to open the Summarize tool in a separate tab. This allows for more in-depth analysis and configuration.
Configuration
Add a Summarize tool to your analysis.
Determine if you want to group the aggregations. To add a Group By field, click the plus button under Group By and select the field from the list. To remove a Group By field, click the trash can icon next to the selected field.
Optionally, you can add sorting by selecting the plus button under Sort, then choose the field to sort by. Use the ascending and descending icons to change the sort order. Click the X button to delete the sorting configuration.
If you wish to perform window aggregations, check the checkbox for Window Aggregations. Specify the From and To options (e.g., Unbounded, Current Row, Proceeding, Following) and the number of rows for each.
To create an Aggregation column in your dataset, click the plus button under Aggregations.
Select the field or use the expression option from the Argument dropdown.
Choose the appropriate Calculation for the data type of the argument selected, and provide a New field name if desired.
Click Apply to confirm the aggregation.
To add another aggregated column, click the plus button under your existing aggregation, configure the new aggregation, and click Apply.
Deleting Aggregations: To delete an aggregated column, click the trash can icon next to the aggregation.
Window Aggregations
Window aggregations, also known as window functions or window operations, allow you to perform aggregate calculations over a specific "window" or subset of data within a larger dataset. Unlike regular aggregate functions (e.g., SUM, COUNT, AVG) that operate on the entire dataset, window functions partition the data into smaller groups or "windows" based on specified criteria, and then perform calculations within each window.
In window aggregations, the FROM
and TO
options, along with the options of "Unbounded," "Current Row," "Preceding," and "Following," are used to define the range of rows within the partition to include in the window frame for each row being processed.
Option | Description |
Unbounded | Includes all rows in the partition from the beginning |
Current Row | Includes only the current row in the partition |
Preceding | Includes a specified number of rows before the current row |
Following | Includes a specified number of rows after the current row |
FROM | TO | Description | Result |
Unbounded | Unbounded | Includes all rows in the partition. | All rows in the partition. |
Unbounded | Current Row | Includes all rows from the beginning of the partition up to and including the current row. | All rows from the start to the current row. |
Unbounded | Preceding | Includes all rows from the beginning of the partition up to a specified number of rows before the current row. | All rows from the start to (current - N) rows. |
Unbounded | Following | Includes all rows from the beginning of the partition up to a specified number of rows after the current row. | All rows from the start to (current + N) rows. |
Current Row | Unbounded | Includes all rows from the end of the partition up to the and including the current row. | Only the current row. |
Current Row | Current Row | Includes only the current row in the partition. | Only the current row. |
Current Row | Following | Includes a specified number of rows after the current row. | (current + N) rows. |
Preceding | Current Row | Includes a specified number of rows before the current row. | (current - N) rows. |
Preceding | Preceding | Includes a specified number of rows before the current row. | (current - N1) rows. |
Preceding | Following | Includes a specified number of rows before the current row and a specified number of rows after the current row. | (current - N1) rows to (current + N2) rows. |
Following | Following | Includes a specified number of rows after the current row. | (current + N) rows. |
Aggregations
Aggregation Function | Description |
COUNT | Returns the number of rows in a group or the total number of rows in a table if used without grouping. |
COUNT DISTINCT | Returns the number of unique values in a column or a combination of columns. |
MIN | Returns the minimum value from a column. |
MAX | Returns the maximum value from a column. |
AVG | Returns the average value of a numeric column. |
SUM | Returns the sum of numeric values in a column. |
STDDEV | Returns the standard deviation of a set of numeric values in a column. |
VAR | Returns the variance of a set of numeric values in a column. |
MEDIAN | Returns the middle value of a numeric column when the data is ordered. If even values, returns the average. |
CONCAT | Concatenates two or more strings together. |
CONCAT DISTINCT | Concatenates two or more unique string values together. |