Sample Tool ⭐️

Like the RAND() function in Excel, the Sample tool selects a random sample of a dataset every time the bot runs.


About the Sample tool

The Sample tool allows users to randomly select a sample from their dataset.


Add a Sample tool

1. Navigate to your analysis.

2. Click this icon.

3. Click Sample.

4. Double-click the Sample tool to open it in a separate tab, where you can configure it and inspect its output in more detail.


Configuration

  1. Add a Sample tool to your analysis.

  2. Choose the appropriate Random Sample Strategy from the available options ("N Rows," "1 Row per N Rows," or "N Percent of Rows").

  3. In Sample Dimension, set the value of N for the chosen strategy. This value determines the size of the random sample.

  4. After making the necessary configurations, click Apply to generate the random sample.

Random Sample Strategy

| Strategy | Description |
| --- | --- |
| N Rows | Selects a specific number of rows from the dataset. Users can specify the exact number of rows in the sample. |
| 1 Row per N Rows | Selects one row from the dataset for every specified number of rows (N). Users can set the value of N. |
| N Percent of Rows | Selects a percentage of rows from the dataset. Users can specify the percentage of rows in the sample. |
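To make the difference between the three strategies concrete, here is a minimal Python sketch of what each one could look like on a plain list of rows. It is an illustration only, not the Sample tool's actual implementation: the function names are hypothetical, and the reading of "1 Row per N Rows" as one random row from each consecutive block of N rows is an assumption.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_n_rows(rows: Sequence[T], n: int,
                  rng: Optional[random.Random] = None) -> List[T]:
    """'N Rows': pick n distinct rows at random (all rows if fewer than n exist)."""
    rng = rng if rng is not None else random.Random()
    return rng.sample(list(rows), min(n, len(rows)))


def sample_one_per_n(rows: Sequence[T], n: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """'1 Row per N Rows': pick one random row from each consecutive block of n rows.

    Assumption: the real tool may choose rows differently.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = rng if rng is not None else random.Random()
    # The last block may hold fewer than n rows; it still contributes one row.
    return [rng.choice(rows[i:i + n]) for i in range(0, len(rows), n)]


def sample_n_percent(rows: Sequence[T], percent: float,
                     rng: Optional[random.Random] = None) -> List[T]:
    """'N Percent of Rows': pick roughly percent% of the rows at random."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    rng = rng if rng is not None else random.Random()
    k = round(len(rows) * percent / 100)  # rounding rule is an assumption
    return rng.sample(list(rows), k)


rows = list(range(1, 101))              # a stand-in dataset of 100 rows
print(len(sample_n_rows(rows, 10)))     # 10
print(len(sample_one_per_n(rows, 10)))  # 10
print(len(sample_n_percent(rows, 25)))  # 25
```

On a 100-row dataset, "N Rows" with N = 10 and "1 Row per N Rows" with N = 10 both return 10 rows, but in this sketch the second spreads them evenly across the dataset, one per block of 10.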


Randomness in Samples

Every time you click Apply, run a test, run the bot manually, or a scheduled bot run starts, the Sample tool draws a new random sample. The selected rows will therefore differ from one run to the next, and any results computed from the sample can vary between runs. Because rows are chosen at random rather than by position or value, the sample is not systematically biased toward any part of the dataset.
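The run-to-run behavior described above can be illustrated with a short Python sketch. This only mimics the effect with Python's standard `random` module; the function name `run_sample_step` is hypothetical and nothing here reflects how the Sample tool is built internally.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def run_sample_step(rows: Sequence[T], n: int) -> List[T]:
    """Simulate one run: draw n distinct rows with an unseeded generator."""
    return random.sample(list(rows), n)


rows = list(range(1, 10001))  # a stand-in dataset of 10,000 row IDs

first_run = run_sample_step(rows, 5)
second_run = run_sample_step(rows, 5)

print(sorted(first_run))
print(sorted(second_run))  # almost certainly a different set of rows

# Rows that happened to be picked in both runs (usually none at this size).
print(set(first_run) & set(second_run))
```

Both runs return the same number of rows, but there is no guarantee that any particular row appears in both.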


Use Cases and Benefits

1. Quality Checks and Compliance Audits

  • The Sample tool enables users to perform quality checks on large datasets by selecting a random sample for auditing.

  • Compliance audits can be made more manageable by reviewing a random subset of records for adherence to regulations and standards.

2. Experimental Cohort Testing

  • When conducting experimental cohort testing, researchers can use the Sample tool to draw randomly selected groups from the dataset, which reduces selection bias.

  • This allows for the comparison of outcomes between different cohorts while minimizing bias.
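As a rough illustration of the cohort idea, the Python sketch below randomly assigns a percentage of rows to a test cohort and treats the remaining rows as the control cohort. The function name `split_into_cohorts` is hypothetical, and using the unsampled remainder as the control group is an assumption: the Sample tool itself only returns the sampled rows.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_into_cohorts(
    rows: Sequence[T],
    percent_in_test: float,
    rng: Optional[random.Random] = None,
) -> Tuple[List[T], List[T]]:
    """Randomly place about percent_in_test% of rows in a test cohort.

    Returns (test, control); every row lands in exactly one of the two.
    """
    if not 0 <= percent_in_test <= 100:
        raise ValueError("percent_in_test must be between 0 and 100")
    rng = rng if rng is not None else random.Random()
    k = round(len(rows) * percent_in_test / 100)
    # Sample row positions rather than values so duplicate rows are handled.
    test_positions = set(rng.sample(range(len(rows)), k))
    test = [row for i, row in enumerate(rows) if i in test_positions]
    control = [row for i, row in enumerate(rows) if i not in test_positions]
    return test, control


customers = [f"customer_{i}" for i in range(100)]  # hypothetical row IDs
test_cohort, control_cohort = split_into_cohorts(customers, 30)
print(len(test_cohort), len(control_cohort))  # 30 70
```

Because membership is decided by a random draw rather than by any attribute of the row, differences in outcomes between the two cohorts are less likely to be caused by how the groups were chosen.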

3. Speeding up Development

  • During the development phase, analysts can use the Sample tool to test and refine analyses on smaller samples, reducing processing time.

  • Once the analysis is validated on a sample, it can be confidently applied to the entire dataset.

4. Exploratory Data Analysis

  • Random sampling helps analysts explore the dataset's characteristics, patterns, and relationships without being overwhelmed by the entire dataset's size.

  • Analysts can gain initial insights into the data, identify trends, and plan further analyses.

5. Resource Optimization

  • For large datasets, the Sample tool can reduce the load on computing resources by shrinking the dataset while keeping it broadly representative of the whole.

  • This is especially useful in scenarios where processing power or memory is limited.

6. Data Privacy and Security

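  • When sharing datasets internally or with external stakeholders, sharing a random sample instead of the full dataset limits how many records are exposed. Sampling does not remove or mask sensitive fields in the rows that are selected.

  • Random sampling can be used as one privacy safeguard, alongside other controls, for specific analyses.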
