Understanding Subtract
Subtract is a type of join operation that helps identify the records in one table that do not have corresponding matches in another table. It is also referred to as a “subtract join” or “anti join.” The Subtract tool returns the rows from one table that have no matching values in the specified column(s) of another table. It is used to find the missing or non-matching records between two tables. It is useful when you want to identify records in one dataset that are not present in another dataset based on a specified condition.
Add a Subtract Tool
1. Navigate to your analysis.
👉 NoteYou may need to manually connect the second table to the second input node of the Subtract tool.
Configuration
Add a Subtract tool to your Canvas.
Connect the second table that you want to subtract against the first table to the second input node of the Subtract tool. The tool has two input nodes for two input connections.
Decide which table you want to use as the Subtract Table. This table will be the reference to find records that do not exist in the other table.
Specify the fields and operators to define the comparison conditions between the two tables.
Click the Add button to include additional fields to compare against using the Subtract Tool.
Once you've made the necessary configurations, click "Apply" to apply the subtraction in your analysis.
Operators
Operator | Description | Example (Assuming Join on "ID" Column) |
= | Equal to | Joins rows where the "ID" values are identical in both data sets. |
!= | Not equal to | Joins rows where the "ID" values are different in the two data sets. |
> | Greater than | Joins rows where the "ID" value in one data set is greater than the "ID" value in the other data set. |
>= | Greater than or equal to | Joins rows where the "ID" value in one data set is greater than or equal to the "ID" value in the other data set. |
< | Less than | Joins rows where the "ID" value in one data set is less than the "ID" value in the other data set. |
<= | Less than or equal to | Joins rows where the "ID" value in one data set is less than or equal to the "ID" value in the other data set. |
Example
Suppose we have two tables related to an e-commerce website: "Customers" and "Newsletter Subscribers." Both tables contain information about users who interact with the website.
Table: Customers
CustomerID | Name | Age | Address | |
1 | John Smith | 30 | 123 Main St | |
2 | Jane Doe | 25 | 456 Oak Ave | |
3 | Bob Johnson | 40 | 789 Maple Rd |
Table: Newsletter Subscribers
SubscriberID | Name | |
101 | John Smith | |
102 | Jane Doe | |
103 | Alice Lee |
In this scenario, the "Customers" table contains information about all registered customers on the website, whereas the "Newsletter Subscribers" table contains information about users who have subscribed to the website's newsletter.
The user wants to find out the customers who are NOT subscribed to the newsletter. They need to perform a subtract join between the "Customers" and "Newsletter Subscribers" tables based on the email column.
The user might configure Customers as the Subtract Table to identify which “Customers” are not subscribers based on whether the Customers.Name field equals the values in the NewsletterSubscribers.Name field.