Dedupe Tool ⭐️

Remove duplicate rows from a dataset, based on all fields or only on the fields you select.

Updated yesterday

About the Dedupe Tool

The Dedupe tool helps you remove duplicate rows from your datasets based on specific fields or all fields, ensuring that your dataset contains only unique entries.

Configuration Options

1. Deduplication Type

  • Distinct Rows: Removes rows that are exact duplicates across all fields in the dataset. Use this option to ensure every row in the dataset is completely unique.

  • Dedupe on Fields: Removes duplicate rows based on specific fields. Select one or more fields to define the criteria for deduplication. For example, selecting "Customer_ID" will remove rows with duplicate values in this field, retaining only one instance of each.
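
The difference between the two deduplication types can be sketched in plain Python. This is only an illustration of the behavior described above, not the tool's implementation: the function names and sample rows are hypothetical, and keeping the first row encountered is an assumption, since the article does not say which instance is retained by default.

```python
def distinct_rows(rows):
    """Drop rows that are identical across all fields (first occurrence kept: an assumption)."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # every field takes part in the comparison
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def dedupe_on_fields(rows, fields):
    """Drop rows that repeat the same values in `fields` (first occurrence kept: an assumption)."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[f] for f in fields)  # only the selected fields are compared
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical sample data
orders = [
    {"Customer_ID": 1, "Order_ID": 100},
    {"Customer_ID": 1, "Order_ID": 100},  # exact duplicate of the first row
    {"Customer_ID": 1, "Order_ID": 101},  # same customer, different order
    {"Customer_ID": 2, "Order_ID": 102},
]

print(len(distinct_rows(orders)))                      # 3
print(len(dedupe_on_fields(orders, ["Customer_ID"])))  # 2
```

Distinct Rows removes only the exact copy, while Dedupe on Fields with "Customer_ID" also removes the second order for customer 1.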

2. Sort By

  • With the Dedupe on Fields option, you can control which row is kept when duplicates are found. Choose a field in Sort By (e.g., "Order_ID") and set the sort order (ascending or descending) to determine whether the row with the highest or lowest value in that field is retained.
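
One way to picture Sort By is "sort first, then keep the first row per key". The sketch below assumes exactly that model, which the article does not confirm; `dedupe_sorted` and the sample rows are hypothetical names used only for illustration.

```python
def dedupe_sorted(rows, fields, sort_by, descending=False):
    """Sort rows by `sort_by`, then keep the first row for each key in `fields`.

    Assumption: the first row after sorting is the one retained.
    """
    ordered = sorted(rows, key=lambda r: r[sort_by], reverse=descending)
    seen = set()
    kept = []
    for row in ordered:
        key = tuple(row[f] for f in fields)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Hypothetical sample data
orders = [
    {"Customer_ID": 1, "Order_ID": 100},
    {"Customer_ID": 1, "Order_ID": 101},
    {"Customer_ID": 2, "Order_ID": 102},
]

# Under this model, descending order keeps each customer's highest Order_ID,
# and ascending order keeps the lowest.
latest = dedupe_sorted(orders, ["Customer_ID"], "Order_ID", descending=True)
earliest = dedupe_sorted(orders, ["Customer_ID"], "Order_ID")
print(latest)
print(earliest)
```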

3. Group By

  • Specify one or more fields to group rows by before deduplication. Only rows within the same group are compared when identifying duplicates.
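
Read literally, Group By means that two rows with the same dedupe key are treated as duplicates only if they also fall in the same group. A minimal sketch of that interpretation follows. The "Region" field, the function name, and the keep-first rule are all assumptions made for illustration.

```python
from collections import defaultdict


def dedupe_within_groups(rows, group_by, fields):
    """Partition rows by `group_by`, then dedupe on `fields` inside each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[g] for g in group_by)].append(row)

    kept = []
    for group_rows in groups.values():
        seen = set()  # reset per group, so keys never match across groups
        for row in group_rows:
            key = tuple(row[f] for f in fields)
            if key not in seen:
                seen.add(key)
                kept.append(row)
    return kept


# Hypothetical sample data: customer 1 appears in two regions
orders = [
    {"Region": "East", "Customer_ID": 1, "Order_ID": 100},
    {"Region": "East", "Customer_ID": 1, "Order_ID": 101},
    {"Region": "West", "Customer_ID": 1, "Order_ID": 102},
]

result = dedupe_within_groups(orders, ["Region"], ["Customer_ID"])
print(result)
```

Here the second East row is dropped as a duplicate, but the West row survives because it belongs to a different group.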

4. Include Path for Duplicates

  • Select the Include path for duplicates checkbox to create an additional output path specifically for duplicate records.
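
Conceptually, enabling this option splits the input into two outputs instead of one. The sketch below shows that split under the assumption that the first row per key goes to the main output and every later match goes to the duplicates output. The function name and sample rows are hypothetical.

```python
def dedupe_with_duplicates(rows, fields):
    """Return (unique_rows, duplicate_rows) for the key defined by `fields`."""
    seen = set()
    unique, duplicates = [], []
    for row in rows:
        key = tuple(row[f] for f in fields)
        if key in seen:
            duplicates.append(row)  # would flow to the extra output path
        else:
            seen.add(key)
            unique.append(row)      # would flow to the main output path
    return unique, duplicates


# Hypothetical sample data
orders = [
    {"Customer_ID": 1, "Order_ID": 100},
    {"Customer_ID": 1, "Order_ID": 101},
    {"Customer_ID": 2, "Order_ID": 102},
]

unique, duplicates = dedupe_with_duplicates(orders, ["Customer_ID"])
print(unique)
print(duplicates)
```

Every input row ends up in exactly one of the two outputs, which makes it possible to review what was removed.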
