Overview
Savant’s Dedupe tool removes duplicate rows from your datasets, comparing either all columns or only the columns you choose, so that your data is clean, reliable, and ready for analysis or for joining with other datasets.
Use Cases
Ensuring Data Integrity: Use the Dedupe Tool to guarantee that your data contains unique values across specific columns. This is particularly useful before performing joins or creating lookup tables.
Eliminating Duplicate Rows: Remove rows that are exact copies of another row, with identical values in every column.
Filtering Duplicate Values: Remove rows whose values in selected columns repeat those of another row.
Example
Input
ID | Name | Age | Location |
1 | John | 25 | New York |
2 | Sarah | 30 | London |
3 | Emily | 25 | Paris |
4 | John | 25 | New York |
5 | Michael | 35 | Sydney |
Output
ID | Name | Age | Location |
1 | John | 25 | New York |
2 | Sarah | 30 | London |
3 | Emily | 25 | Paris |
5 | Michael | 35 | Sydney |
In the example above, we used the Dedupe tool to remove rows with duplicate values in the Name and Location columns. Rows 1 and 4 share the same Name (John) and Location (New York), so row 4 was removed. The resulting output table contains only unique combinations of Name and Location.
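The example can be reproduced with a short Python sketch. It is not Savant's implementation: it assumes rows are plain dictionaries, compares values exactly, and keeps the first row it sees for each Name and Location combination, which matches the output table above. The helper name `dedupe_on_fields` is hypothetical.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def dedupe_on_fields(rows: List[Row], fields: Sequence[str]) -> List[Row]:
    """Hypothetical helper: keep the first row for each distinct combination of `fields`."""
    seen = set()
    kept = []
    for row in rows:
        # The dedupe key is the tuple of values in the chosen columns only.
        key = tuple(row[f] for f in fields)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


rows = [
    {"ID": 1, "Name": "John", "Age": 25, "Location": "New York"},
    {"ID": 2, "Name": "Sarah", "Age": 30, "Location": "London"},
    {"ID": 3, "Name": "Emily", "Age": 25, "Location": "Paris"},
    {"ID": 4, "Name": "John", "Age": 25, "Location": "New York"},
    {"ID": 5, "Name": "Michael", "Age": 35, "Location": "Sydney"},
]

deduped = dedupe_on_fields(rows, ["Name", "Location"])
print([r["ID"] for r in deduped])  # [1, 2, 3, 5]
```

Note that Age is not part of the key, so rows 1 and 3 both survive even though they share the value 25.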
Configuration
Select dedupe method: You have two options for dedupe:
Distinct Rows: Choose this option to remove duplicate rows from the dataset, keeping one copy of each. Rows with identical values in all columns are treated as duplicates.
Dedupe on Fields: Select this option to remove rows with duplicate values in specific fields. You can choose the fields that you want to check for duplicate values.
Choose Fields to Deduplicate: If you selected "Dedupe on Fields," click the fields you want to use for deduplication. The tool will identify duplicates based on the selected fields.
Prioritize Rows (optional): Use this option to sort the data by specific columns before deduplication. Sorting first lets you control which row from each group of duplicates is kept.
For example, if you have multiple records with date information and want to keep the most recent record from each group of duplicates, you can prioritize rows by the created date column in descending order.
Click the Apply button to save your configuration.
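The two dedupe methods and the optional prioritization can be modelled with one small Python function. This is a sketch under stated assumptions, not how Savant works internally: the function `dedupe`, its parameters, and the `customer`/`status`/`created` columns are hypothetical. It assumes that, after sorting, the first row in sort order is the one kept, which is what the "keep the most recent records" example implies.

```python
from datetime import date
from typing import Dict, List, Optional, Sequence

Row = Dict[str, object]


def dedupe(
    rows: List[Row],
    fields: Optional[Sequence[str]] = None,
    sort_by: Optional[str] = None,
    descending: bool = False,
) -> List[Row]:
    """Hypothetical helper mirroring the configuration options.

    fields=None  -> "Distinct Rows": every column is part of the key.
    fields=[...] -> "Dedupe on Fields": only the listed columns form the key.
    sort_by      -> optional prioritization: rows are sorted first, so the
                    first row in sort order is the one that is kept.
    """
    if sort_by is not None:
        # sorted() is stable (also with reverse=True), so ties keep their input order.
        rows = sorted(rows, key=lambda r: r[sort_by], reverse=descending)
    seen = set()
    kept = []
    for row in rows:
        key_fields = fields if fields is not None else sorted(row)
        key = tuple(row[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


orders = [
    {"customer": "Ann", "status": "open", "created": date(2024, 1, 5)},
    {"customer": "Ann", "status": "closed", "created": date(2024, 3, 1)},
    {"customer": "Bob", "status": "open", "created": date(2024, 2, 10)},
    {"customer": "Bob", "status": "open", "created": date(2024, 2, 10)},
]

# Distinct Rows: only the exact duplicate Bob row is dropped.
print(len(dedupe(orders)))  # 3

# Dedupe on Fields, prioritized by created date in descending order (newest first).
latest = dedupe(orders, fields=["customer"], sort_by="created", descending=True)
print([(r["customer"], r["created"].isoformat()) for r in latest])
# [('Ann', '2024-03-01'), ('Bob', '2024-02-10')]
```

Sorting before deduplicating keeps the dedupe step itself simple: it always keeps the first match, and the sort decides which row comes first.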
💡 Savant’s Dedupe tool is case-insensitive: values such as "John" and "john" are treated as duplicates.
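Case-insensitive matching can be sketched by normalizing string values before building the dedupe key. This Python example is an illustration only: the helper names are hypothetical, and it assumes that numbers and dates are compared as-is and that the kept row retains its original casing. Whether Savant applies any other normalization, such as trimming whitespace, is not stated here.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def normalize(value: object) -> object:
    # casefold() is a stronger form of lower() intended for caseless matching;
    # non-string values are compared unchanged.
    return value.casefold() if isinstance(value, str) else value


def dedupe_case_insensitive(rows: List[Row], fields: Sequence[str]) -> List[Row]:
    """Hypothetical helper: keep the first row per case-insensitive key."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(normalize(row[f]) for f in fields)
        if key not in seen:
            seen.add(key)
            kept.append(row)  # the kept row keeps its original casing
    return kept


rows = [
    {"ID": 1, "Name": "John", "Location": "New York"},
    {"ID": 2, "Name": "john", "Location": "NEW YORK"},
    {"ID": 3, "Name": "Emily", "Location": "Paris"},
]
result = dedupe_case_insensitive(rows, ["Name", "Location"])
print([r["ID"] for r in result])  # [1, 3]
```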