Dedupe ⭐️

Remove duplicates from your datasets

Overview

Savant’s Dedupe tool removes duplicate values from your datasets so that your data is clean, reliable, and ready for analysis or integration with other datasets.

Use Cases

  • Ensuring Data Integrity: Use the Dedupe tool to guarantee that your data contains unique values across specific columns. This is particularly useful before performing joins or creating lookup tables, where duplicate keys would otherwise produce extra or ambiguous matches.

  • Eliminating Duplicate Rows: Remove rows from your dataset that contain identical values in all columns.

  • Filtering Duplicate Values: Filter and remove rows with duplicate values in selected columns.

Example

Input

ID | Name    | Age | Location
1  | John    | 25  | New York
2  | Sarah   | 30  | London
3  | Emily   | 25  | Paris
4  | John    | 25  | New York
5  | Michael | 35  | Sydney

Output

ID | Name    | Age | Location
1  | John    | 25  | New York
2  | Sarah   | 30  | London
3  | Emily   | 25  | Paris
5  | Michael | 35  | Sydney

In the example above, we used the Dedupe tool to remove rows with duplicate values in the Name and Location columns. Row 4 was removed because its Name and Location values (John, New York) match those of row 1. Each combination of Name and Location now appears only once in the output table.
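To make the example concrete, here is a minimal sketch in plain Python that reproduces the same result. It is not Savant's implementation: the function name `dedupe_on_fields` is hypothetical, and keeping the first row seen for each combination is an assumption that happens to match the output table above.

```python
# Illustrative sketch only; Savant's internal behavior may differ.
rows = [
    {"ID": 1, "Name": "John", "Age": 25, "Location": "New York"},
    {"ID": 2, "Name": "Sarah", "Age": 30, "Location": "London"},
    {"ID": 3, "Name": "Emily", "Age": 25, "Location": "Paris"},
    {"ID": 4, "Name": "John", "Age": 25, "Location": "New York"},
    {"ID": 5, "Name": "Michael", "Age": 35, "Location": "Sydney"},
]


def dedupe_on_fields(rows, fields):
    """Keep the first row seen for each combination of values in `fields`."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[field] for field in fields)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


output = dedupe_on_fields(rows, ["Name", "Location"])
print([row["ID"] for row in output])  # prints [1, 2, 3, 5]
```

Note that Emily (row 3) is kept even though her Age matches John's, because Age is not one of the selected fields.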

Configuration

  1. Select a dedupe method: Choose one of two options:

    • Distinct Rows: Choose this option to remove all duplicate rows from the dataset. It considers rows with identical values in all columns as duplicates.

    • Dedupe on Fields: Select this option to remove rows with duplicate values in specific fields. You can choose the fields that you want to check for duplicate values.

  2. Choose Fields to Deduplicate: If you selected "Dedupe on Fields," click the fields you want to use for deduplication. The tool will identify duplicates based on the selected fields.

  3. Prioritize Rows (optional): If you want to prioritize certain rows before deduplication, you can use the "Prioritize by Rows" option. This feature allows you to sort the data based on specific columns and then perform deduplication.

    • For example, if you have multiple records with date information and want to keep the most recent records, you can prioritize rows by the created date column in descending order.

  4. Click the Apply button to save your configuration.
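The configuration options above can be modeled with a short plain-Python sketch. The function `dedupe`, its parameters, and the sample records are all hypothetical and exist only for illustration. The sketch also assumes that, after prioritizing, the first row in each group of duplicates is the one that is kept; confirm this against your own data in Savant.

```python
# Illustrative model of the two dedupe methods plus optional prioritization.
# Not Savant's implementation; names and data are made up for this sketch.
def dedupe(rows, fields=None, prioritize_by=None, descending=False):
    # Optional prioritization: sort first so the preferred row leads its group.
    if prioritize_by is not None:
        rows = sorted(rows, key=lambda r: r[prioritize_by], reverse=descending)
    seen, kept = set(), []
    for row in rows:
        # fields=None models "Distinct Rows": every column is compared.
        names = fields if fields is not None else sorted(row)
        key = tuple(row[name] for name in names)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


records = [
    {"email": "a@example.com", "plan": "free", "created": "2024-01-05"},
    {"email": "a@example.com", "plan": "pro", "created": "2024-03-10"},
    {"email": "b@example.com", "plan": "free", "created": "2024-02-01"},
    {"email": "b@example.com", "plan": "free", "created": "2024-02-01"},
]

# "Distinct Rows": only the fully identical fourth record is dropped.
distinct = dedupe(records)

# "Dedupe on Fields" on email, prioritized by created date, descending:
# the most recent record per email is kept.
latest = dedupe(records, fields=["email"], prioritize_by="created", descending=True)
```

Here `distinct` contains three records, while `latest` contains one record per email address, with the "pro" record retained for a@example.com because it has the most recent created date.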

💡 Savant’s Dedupe tool is case-insensitive: values that differ only in letter case (for example, "John" and "john") are treated as duplicates.
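As a rough illustration of case-insensitive matching, the sketch below normalizes text values before comparing them. The function name is hypothetical, the use of `str.casefold()` is an assumption about how such a comparison could be done, and which spelling of a value survives in Savant's output is not specified here.

```python
# Illustrative sketch of case-insensitive dedupe; not Savant's implementation.
def dedupe_case_insensitive(rows, fields):
    seen, kept = set(), []
    for row in rows:
        # Compare text values without regard to case; leave other types as-is.
        key = tuple(
            value.casefold() if isinstance(value, str) else value
            for value in (row[field] for field in fields)
        )
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


rows = [
    {"Name": "John", "Location": "New York"},
    {"Name": "JOHN", "Location": "new york"},
    {"Name": "Sarah", "Location": "London"},
]

result = dedupe_case_insensitive(rows, ["Name", "Location"])
```

In this sketch the second row is treated as a duplicate of the first, so `result` holds only the "John" and "Sarah" rows.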
