About the Sample tool
The Sample tool allows users to randomly select a sample from their dataset.
Add a Sample tool
1. Navigate to your analysis.
4. Double-click the Sample tool to open it in a separate tab for more detailed analysis and configuration.
Configuration
1. Add a Sample tool to your analysis.
2. Choose the appropriate Random Sample Strategy from the available options ("N Rows," "1 Row per N Rows," or "N Percent of Rows").
3. In Sample Dimension, set the value of N for the chosen strategy to specify the size of the random sample.
4. Click Apply to generate the random sample.
Random Sample Strategy
| Strategy | Description |
| --- | --- |
| N Rows | Selects a specific number of rows from the dataset. Users can specify the exact number of rows in the sample. |
| 1 Row per N Rows | Selects one row from the dataset for every specified number of rows (N). Users can set the value of N. |
| N Percent of Rows | Selects a percentage of rows from the dataset. Users can specify the percentage of rows in the sample. |
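The three strategies can be illustrated with a minimal Python sketch. This is not the Sample tool's implementation: the function names are hypothetical, and the reading of "1 Row per N Rows" as one random row from each consecutive block of N rows is an assumption made for illustration.

```python
import random


def sample_n_rows(rows, n, rng=random):
    """N Rows: pick exactly n distinct rows (or all rows if fewer exist)."""
    return rng.sample(rows, min(n, len(rows)))


def sample_one_per_n(rows, n, rng=random):
    """1 Row per N Rows: pick one random row from each consecutive block of n rows.

    Assumption: the block-based reading of this strategy; the last block may be shorter.
    """
    return [rng.choice(rows[i:i + n]) for i in range(0, len(rows), n)]


def sample_n_percent(rows, percent, rng=random):
    """N Percent of Rows: pick about percent% of the rows, with the count rounded."""
    k = round(len(rows) * percent / 100)
    return rng.sample(rows, k)


rows = list(range(1, 101))            # stand-in for a 100-row dataset
ten_rows = sample_n_rows(rows, 10)    # 10 rows
one_per_ten = sample_one_per_n(rows, 10)  # 10 rows, one from each block of 10
quarter = sample_n_percent(rows, 25)  # 25 rows
```

On a 100-row dataset, "N Rows" with N = 10 and "1 Row per N Rows" with N = 10 both return 10 rows in this sketch, but the second spreads them evenly across the dataset's row order.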
Randomness in Samples
Every time the user clicks Apply, runs a test, executes a bot run, or initiates a scheduled bot run, the Sample tool draws a new random subset of the data. Results computed from the sample can therefore differ from one run to the next.
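The effect is similar to drawing from an unseeded random number generator. The Python sketch below is only an analogy, not the tool's internals; `draw_sample` is a hypothetical helper standing in for a single Apply or bot run.

```python
import random


def draw_sample(rows, n):
    """Draw n rows with a fresh, unseeded generator (one simulated run)."""
    return random.Random().sample(rows, n)


rows = list(range(1_000))  # stand-in for a 1,000-row dataset
first_run = draw_sample(rows, 5)
second_run = draw_sample(rows, 5)

# Both runs return 5 rows, but the selected rows will almost always differ.
print(first_run)
print(second_run)
```

Figures computed on a sample, such as counts or averages, should be expected to vary slightly between runs for the same reason.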
Use Cases and Benefits
1. Quality Checks and Compliance Audits
The Sample tool enables users to perform quality checks on large datasets by selecting a random sample for auditing.
Compliance audits can be made more manageable by reviewing a random subset of records for adherence to regulations and standards.
2. Experimental Cohort Testing
When conducting experimental cohort testing, researchers can use the Sample tool to create controlled groups from the dataset, ensuring unbiased sampling.
This allows for the comparison of outcomes between different cohorts while minimizing bias.
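The idea of random cohort assignment can be sketched in Python as follows. The Sample tool itself only returns a sample; `split_into_cohorts` is a hypothetical helper that shows the general technique of shuffling rows and dealing them into groups.

```python
import random


def split_into_cohorts(rows, cohort_count=2, rng=random):
    """Shuffle a copy of the rows, then deal them round-robin into cohorts."""
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return [shuffled[i::cohort_count] for i in range(cohort_count)]


customer_ids = list(range(1, 12))  # hypothetical data: 11 customers
control, treatment = split_into_cohorts(customer_ids)
# Every customer lands in exactly one cohort; sizes differ by at most one.
```

Because assignment depends only on the shuffle, neither cohort is systematically favored by the row order of the source data.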
3. Speeding up Development
During the development phase, analysts can use the Sample tool to test and refine analyses on smaller samples, reducing processing time.
Once the analysis is validated on a sample, it can be confidently applied to the entire dataset.
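As a rough illustration of this workflow in Python, the sketch below develops a calculation against a 1% sample and then reruns the same function on the full data. The dataset, the `amount` field, and `average_order_value` are all hypothetical.

```python
import random
import statistics


def average_order_value(orders):
    """Hypothetical analysis step: mean of the 'amount' field."""
    return statistics.mean(order["amount"] for order in orders)


rng = random.Random(42)  # fixed seed so this sketch is repeatable
orders = [{"id": i, "amount": rng.uniform(5, 500)} for i in range(100_000)]

# Develop and debug against a 1% sample, then rerun the same function on everything.
dev_sample = rng.sample(orders, len(orders) // 100)
sample_result = average_order_value(dev_sample)
full_result = average_order_value(orders)

print(f"sample: {sample_result:.2f}  full: {full_result:.2f}")
```

The sample result will be close to, but not identical to, the full result, so the sample is best used to validate logic rather than to report final figures.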
4. Exploratory Data Analysis
Random sampling helps analysts explore the dataset's characteristics, patterns, and relationships without being overwhelmed by the entire dataset's size.
Analysts can gain initial insights into the data, identify trends, and plan further analyses.
5. Resource Optimization
For large datasets, the Sample tool can help optimize computing resources by reducing the dataset size while keeping the sample approximately representative of the full dataset.
This is especially useful in scenarios where processing power or memory is limited.
6. Data Privacy and Security
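When sharing datasets internally or with external stakeholders, sharing a random sample instead of the full dataset limits how many records are exposed. The sampled rows still contain any sensitive fields present in the source, so sampling reduces exposure rather than removing it.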
Random sampling can be used as a privacy safeguard for specific analyses.