About the Dedupe Tool
The Dedupe tool helps you remove duplicate rows from your datasets based on specific fields or all fields, ensuring that your dataset contains only unique entries.
Configuration Options
1. Deduplication Type
Distinct Rows: Removes rows that are exact duplicates across all fields in the dataset, keeping one copy of each. Use this option to ensure every row in the dataset is unique.
Dedupe on Fields: Removes duplicate rows based on specific fields. Select one or more fields to define the criteria for deduplication. For example, selecting "Customer_ID" will remove rows with duplicate values in this field, retaining only one instance of each.
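The difference between the two deduplication types can be sketched in plain Python. This is only an illustration, not the tool's implementation: the helper names distinct_rows and dedupe_on_fields are hypothetical, the sample data is made up, and keeping the first occurrence of each duplicate is an assumption, since the retention rule is not stated here.

```python
def distinct_rows(rows):
    """Keep one copy of each row that matches on every field."""
    seen = set()
    unique = []
    for row in rows:
        # Use all field/value pairs as the comparison key.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def dedupe_on_fields(rows, fields):
    """Keep one row per combination of values in the chosen fields."""
    seen = set()
    unique = []
    for row in rows:
        # Only the selected fields take part in the comparison.
        key = tuple(row[f] for f in fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)  # assumption: the first occurrence is kept
    return unique


sample_rows = [
    {"Customer_ID": 1, "Order_ID": 100},
    {"Customer_ID": 1, "Order_ID": 100},
    {"Customer_ID": 1, "Order_ID": 101},
    {"Customer_ID": 2, "Order_ID": 102},
]

distinct = distinct_rows(sample_rows)
by_customer = dedupe_on_fields(sample_rows, ["Customer_ID"])
print(len(distinct), len(by_customer))
```

With this sample, Distinct Rows drops only the exact repeat, while deduplicating on "Customer_ID" leaves a single row per customer.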
2. Sort By
When using the Dedupe on Fields option, you can specify which row to keep when duplicates are found. Use the Sort By field to choose a field (e.g., "Order_ID") and set the sort order (ascending or descending) to control whether the row with the lowest or highest value in that field is retained.
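One way to picture Sort By is as a sort applied before deduplication, so that the preferred row in each set of duplicates is encountered first. The sketch below assumes exactly that: the first row in sort order is the one retained, so ascending keeps the lowest value and descending keeps the highest. The function name dedupe_with_sort and the sample data are hypothetical.

```python
def dedupe_with_sort(rows, fields, sort_by, descending=False):
    """Sort rows, then keep the first row seen for each key."""
    # sorted() returns a new list, so the input rows are left untouched.
    ordered = sorted(rows, key=lambda r: r[sort_by], reverse=descending)
    seen = set()
    kept = []
    for row in ordered:
        key = tuple(row[f] for f in fields)
        if key not in seen:
            seen.add(key)
            kept.append(row)  # assumption: first row in sort order wins
    return kept


orders = [
    {"Customer_ID": 1, "Order_ID": 100},
    {"Customer_ID": 1, "Order_ID": 105},
    {"Customer_ID": 2, "Order_ID": 102},
]

lowest = dedupe_with_sort(orders, ["Customer_ID"], "Order_ID")
highest = dedupe_with_sort(orders, ["Customer_ID"], "Order_ID", descending=True)
print([r["Order_ID"] for r in lowest])
print([r["Order_ID"] for r in highest])
```

Under this assumption, customer 1 keeps order 100 with an ascending sort and order 105 with a descending sort.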
3. Group By
Specify one or more fields to group rows by before deduplication. Only rows within the same group are compared with each other, so matching rows in different groups are not treated as duplicates.
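The effect of grouping can be sketched by making the group part of the comparison key, so a row only counts as a duplicate of another row in the same group. This is a minimal illustration under that reading; the function name dedupe_within_groups and the "Region" field are hypothetical, and keeping the first occurrence is an assumption.

```python
def dedupe_within_groups(rows, group_by, fields):
    """Deduplicate on `fields`, comparing rows only within each group."""
    seen = set()
    kept = []
    for row in rows:
        group = tuple(row[f] for f in group_by)
        key = tuple(row[f] for f in fields)
        # The same key in a different group is not a duplicate.
        if (group, key) not in seen:
            seen.add((group, key))
            kept.append(row)
    return kept


regional_rows = [
    {"Region": "East", "Customer_ID": 1},
    {"Region": "East", "Customer_ID": 1},
    {"Region": "West", "Customer_ID": 1},
]

kept_rows = dedupe_within_groups(regional_rows, ["Region"], ["Customer_ID"])
print(kept_rows)
```

Here the second "East" row is dropped, but the "West" row survives even though it shares the same "Customer_ID".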
4. Include Path for Duplicates
Select the Include path for duplicates checkbox to create an additional output path specifically for duplicate records.
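Conceptually, the extra path behaves like a second output that receives the rows the deduplication step would otherwise discard. The sketch below assumes the duplicates path contains only the removed rows, not the retained copy; the function name split_duplicates and the sample data are hypothetical.

```python
def split_duplicates(rows, fields):
    """Return (unique_rows, duplicate_rows) for the chosen fields."""
    seen = set()
    unique = []
    duplicates = []
    for row in rows:
        key = tuple(row[f] for f in fields)
        if key in seen:
            # Assumption: only the removed rows go to the duplicates path.
            duplicates.append(row)
        else:
            seen.add(key)
            unique.append(row)
    return unique, duplicates


input_rows = [
    {"Customer_ID": 1, "Order_ID": 100},
    {"Customer_ID": 1, "Order_ID": 101},
    {"Customer_ID": 2, "Order_ID": 102},
]

unique_path, duplicate_path = split_duplicates(input_rows, ["Customer_ID"])
print(len(unique_path), len(duplicate_path))
```

Every input row ends up on exactly one of the two paths, which makes it possible to review what was removed.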