When using multi-row functions in the Transform tool, several configuration options are available to customize the behavior and output of these functions. Below is a detailed description of each option.
Configuration Options
Field Name: The name of the field on which the function will operate.
Multi-Row Radio Button: Select Multi-Row to open the Multi-Row builder.
Data Type: The default is to infer the data type. You can select from the list to force a specific data type for the new field.
Dense/Percent - RANK: This option is available in the RANK function to specify whether the rank should be dense or percentage-based.
ntile - NTILE: This option is specific to the NTILE function and defines the number of buckets to divide the data into.
Limit Rows in Calculation: Defines the range of rows to include in the calculation.
From:
First Row: Starts counting from the very first row in the dataset.
Current Row: Starts counting from the current row.
Row Above: Starts counting from the row immediately above the current row.
Row Below: Starts counting from the row immediately below the current row.
To:
Last Row: Ends counting at the very last row in the dataset.
Current Row: Ends counting at the current row.
Row Above: Ends counting at the row immediately above the current row.
Row Below: Ends counting at the row immediately below the current row.
Partition: Defines the grouping of rows for which the function is calculated.
Argument - Row: Specifies the row to consider for the function.
Delimiter: Used in CONCAT and CONCAT DISTINCT functions to specify the delimiter for concatenated text.
Row Offset: Used in LEAD and LAG functions to specify the number of rows to offset.
Sort: Defines the order of rows before performing the function.
Function Documentation
AVG
Description
The AVG() function calculates the average value of a specified numeric field across a range or partition. This function is useful for determining the mean value in data analysis and reporting.
Output: numeric
Example
ID | Group | Value | AVG(Value) Over (Partition by Group) |
1 | A | 10 | 15 |
2 | A | 20 | 15 |
3 | A | 15 | 15 |
4 | B | 30 | 35 |
5 | B | 40 | 35 |
This example demonstrates how the AVG() function can be used to calculate the average value within each group.
CONCAT
Description
The CONCAT() function combines unique text strings from multiple rows in a specified field into a single text string. It ensures that each string appears only once in the output.
Output: text
Example
ID | Tags | CONCAT(Tags) Delimiter (, ) |
1 | Apple | Apple, Banana, Cherry, Apple |
2 | Banana | Apple, Banana, Cherry, Apple |
3 | Cherry | Apple, Banana, Cherry, Apple |
4 | Apple | Apple, Banana, Cherry, Apple |
This example demonstrates how the CONCAT() function can be used to combine text strings from multiple rows into a single string with unique values.
CONCAT DISTINCT
Description
The CONCAT DISTINCT() function combines unique text strings from multiple rows in a specified field into a single text string. It ensures that each string appears only once in the output.
Output: text
Example
ID | Tags | CONCAT DISTINCT(Tags) Delimiter (, ) |
1 | Apple | Apple, Banana, Cherry |
2 | Banana | Apple, Banana, Cherry |
3 | Cherry | Apple, Banana, Cherry |
4 | Apple | Apple, Banana, Cherry |
This example demonstrates how the CONCAT DISTINCT() function can be used to combine unique text strings from multiple rows into a single string.
COUNT
Description
The COUNT() function counts the number of rows within a specified range or partition. This function is useful for summarizing data and generating row counts for reporting.
Output: integer
Example
ID | Group | COUNT(ID) Over (Partition by Group) |
1 | A | 3 |
2 | A | 3 |
3 | A | 3 |
4 | B | 2 |
5 | B | 2 |
This example demonstrates how the COUNT() function can be used to count the number of rows within each group.
COUNT DISTINCT
Description
The COUNT DISTINCT() function counts the number of unique values in a specified field within a range or partition. This is useful for identifying the number of distinct entries.
Output: integer
Example
ID | Group | Value | COUNT DISTINCT(Value) Over (Partition by Group) |
1 | A | 10 | 2 |
2 | A | 20 | 2 |
3 | A | 10 | 2 |
4 | B | 30 | 1 |
5 | B | 30 | 1 |
This example demonstrates how the COUNT DISTINCT() function can be used to count the number of unique values within each group.
CUM SUM
Description
The CUM SUM() function calculates the cumulative sum of a specified numeric field across a range or partition. It is useful for tracking the running total.
Output: numeric
Example
ID | Group | Value | CUM SUM(Value) Over (Partition by Group) Sort (ID) |
1 | A | 10 | 10 |
2 | A | 20 | 30 |
3 | A | 15 | 45 |
4 | B | 30 | 30 |
5 | B | 40 | 70 |
This example demonstrates how the CUM SUM() function can be used to calculate the cumulative sum within each group.
FILL
Description
The FILL() function populates missing values in a specified field using a defined strategy, such as forward fill or backward fill. It is useful for handling missing data.
Output: varies based on input field type
Example
ID | Value | FILL(Value, 'forward') |
1 | 10 | 10 |
2 | NULL | 10 |
3 | 20 | 20 |
4 | NULL | 20 |
5 | 30 | 30 |
This example demonstrates how the FILL() function can be used to populate missing values using forward fill.
FIRST
Description
The FIRST() function returns the first value of a specified field within a range or partition. This is useful for identifying the initial entry.
Output: varies based on input field type
Example
ID | Group | Value | FIRST(Value) Over (Partition by Group) |
1 | A | 10 | 10 |
2 | A | 20 | 10 |
3 | A | 15 | 10 |
4 | B | 30 | 30 |
5 | B | 40 | 30 |
This example demonstrates how the FIRST() function can be used to return the first value within each group.
LAG
Description
The LAG() function returns the value of a specified field from a previous row in the dataset. It is useful for comparing current values to previous ones.
Output: varies based on input field type
Example
ID | Value | LAG(Value) Row Offset (1) |
1 | 10 | null |
2 | 20 | 10 |
3 | 15 | 20 |
4 | 30 | 15 |
5 | 40 | 30 |
This example demonstrates how the LAG() function can be used to return the value from a previous row.
LAST
Description
The LAST() function returns the last value of a specified field within a range or partition. This is useful for identifying the final entry.
Output: varies based on input field type
Example
ID | Group | Value | LAST(Value) Over (Partition by Group) |
1 | A | 10 | 15 |
2 | A | 20 | 15 |
3 | A | 15 | 15 |
4 | B | 30 | 40 |
5 | B | 40 | 40 |
This example demonstrates how the LAST() function can be used to return the last value within each group.
LEAD
Description
The LEAD() function returns the value of a specified field from a subsequent row in the dataset. It is useful for comparing current values to future ones.
Output: varies based on input field type
Example
ID | Value | LEAD(Value) Row Offset (1) |
1 | 10 | 20 |
2 | 20 | 15 |
3 | 15 | 30 |
4 | 30 | 40 |
5 | 40 | null |
This example demonstrates how the LEAD() function can be used to return the value from a subsequent row.
MAX
Description
The MAX() function returns the maximum value of a specified field within a range or partition. This is useful for identifying the highest value.
Output: varies based on input field type
Example
ID | Group | Value | MAX(Value) Over (Partition by Group) |
1 | A | 10 | 20 |
2 | A | 20 | 20 |
3 | A | 15 | 20 |
4 | B | 30 | 40 |
5 | B | 40 | 40 |
This example demonstrates how the MAX() function can be used to return the maximum value within each group.
MEDIAN
Description
The MEDIAN() function returns the median value of a specified numeric field within a range or partition. This is useful for identifying the central value.
Output: numeric
Example
ID | Group | Value | MEDIAN(Value) Over (Partition by Group) |
1 | A | 10 | 15 |
2 | A | 20 | 15 |
3 | A | 15 | 15 |
4 | B | 30 | 35 |
5 | B | 40 | 35 |
This example demonstrates how the MEDIAN() function can be used to return the median value within each group.
MIN
Description
The MIN() function returns the minimum value of a specified field within a range or partition. This is useful for identifying the lowest value.
Output: varies based on input field type
Example
ID | Group | Value | MIN(Value) Over (Partition by Group) |
1 | A | 10 | 10 |
2 | A | 20 | 10 |
3 | A | 15 | 10 |
4 | B | 30 | 30 |
5 | B | 40 | 30 |
This example demonstrates how the MIN() function can be used to return the minimum value within each group.
NTILE
Description
The NTILE() function assigns a bucket number to each row in a partition, with the number of buckets defined by the user. This is useful for data segmentation and ranking.
Output: integer
Example
ID | Group | Value | NTILE(3) Over (Partition by Group) Sort (ID) |
1 | A | 10 | 1 |
2 | A | 20 | 2 |
3 | A | 15 | 3 |
4 | B | 30 | 1 |
5 | B | 40 | 3 |
This example demonstrates how the NTILE() function can be used to assign bucket numbers to each row within each group.
RANK
Description
The RANK() function assigns a rank to each row within a partition of the dataset, with ties receiving the same rank. This is useful for ranking data based on specified criteria.
Output: integer
Example
ID | Group | Value | RANK() Over (Partition by Group) Sort (Value) |
1 | A | 10 | 1 |
2 | A | 20 | 3 |
3 | A | 15 | 2 |
4 | B | 30 | 1 |
5 | B | 40 | 2 |
This example demonstrates how the RANK() function can be used to assign ranks to rows based on their values within each group.
ROW NUM
Description
The ROW NUM() function assigns a unique sequential integer to rows within a partition of the dataset. This is useful for generating row numbers in ordered datasets.
Output: integer
Example
ID | Group | Value | ROW NUM() Over (Partition by Group) Sort (Value) |
1 | A | 10 | 1 |
2 | A | 20 | 3 |
3 | A | 15 | 2 |
4 | B | 30 | 1 |
5 | B | 40 | 2 |
This example demonstrates how the ROW NUM() function can be used to assign unique row numbers within each group.
STDDEV
Description
The STDDEV() function calculates the standard deviation of a specified numeric field within a range or partition. This is useful for measuring the dispersion or variability of the data.
Output: numeric
Example
ID | Group | Value | STDDEV(Value) Over (Partition by Group) |
1 | A | 10 | 5 |
2 | A | 20 | 5 |
3 | A | 15 | 5 |
4 | B | 30 | 7.071 |
5 | B | 40 | 7.071 |
This example demonstrates how the STDDEV() function can be used to calculate the standard deviation of values within each group.
SUM
Description
The SUM() function calculates the total sum of a specified numeric field within a range or partition. This is useful for aggregating data and calculating totals.
Output: numeric
Example
ID | Group | Value | SUM(Value) Over (Partition by Group) |
1 | A | 10 | 45 |
2 | A | 20 | 45 |
3 | A | 15 | 45 |
4 | B | 30 | 70 |
5 | B | 40 | 70 |
This example demonstrates how the SUM() function can be used to calculate the total sum of values within each group.
VAR
Description
The VAR() function calculates the variance of a specified numeric field within a range or partition. This is useful for measuring the spread or dispersion of the data.
Output: numeric
Example
ID | Group | Value | VAR(Value) Over (Partition by Group) |
1 | A | 10 | 25 |
2 | A | 20 | 25 |
3 | A | 15 | 25 |
4 | B | 30 | 50 |
5 | B | 40 | 50 |
This example demonstrates how the VAR() function can be used to calculate the variance of values within each group.