Match
To identify matching records across datasets that do not have a common key field
Overview
This Snap performs record linkage to identify documents from different data sources (input views) that may represent the same entity without relying on a common key. The Match Snap enables you to automatically identify matched records across datasets that do not have a common key field.

Transform-type Snap
Does not support Ultra Tasks
Prerequisites
None.
Limitations and known issues
None.
Snap views
View | Description | Examples of upstream and downstream Snaps |
---|---|---|
Input | This Snap has exactly two document inputs.
|
|
Output |
This Snap has at most three document input views.
|
|
Error |
Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter when running the pipeline by choosing one of the following options from the When errors occur list under the Views tab. The available options are:
Learn more about Error handling in Pipelines. |
Snap settings
- Expression icon (
): JavaScript syntax to access SnapLogic Expressions to set field values dynamically (if enabled). If disabled, you can provide a static value. Learn more.
- SnapGPT (
): Generates SnapLogic Expressions based on natural language using SnapGPT. Learn more.
- Suggestion icon (
): Populates a list of values dynamically based on your Account configuration.
- Upload
: Uploads files. Learn more.
Field / field set | Type | Description |
---|---|---|
Label | String |
Required. Specify a unique name for the Snap. Modify this to be more appropriate, especially if more than one of the same Snaps is in the pipeline. Default value: Match Example: Match string values |
Threshold | String/Expression |
Required. The minimum confidence required for documents to be considered matched. Minimum value: 0 Maximum value: 1 Default value: 0.8 |
Confidence | Checkbox |
Required. Select this check box to include each match's confidence levels in the output. Default status: Deselected |
Match all | Checkbox |
Required. Select this check box to match one record from the first input with multiple records in the second input. Else, the Snap matches the first record of the second input with the first record of the first input. Default status: Deselected |
Matching Criteria |
Enables you to specify the settings that you want to use to perform the matching between the two input datasets. |
|
Left field | String/Suggestion |
The field in the first dataset that you want to use for matching. This property is a JSONPath Default value: N/A Example: $name |
Right field | String/Suggestion |
The field in the second dataset that you want to use for matching. This property is a JSONPath Default value: N/A Example: $country |
Cleaner | String/Expression/Suggestion |
Select the cleaner that you want to use on the selected fields. Depending on the nature of the data in the identified input fields, you can select the kind of cleaner you want to use from the options available:
Default value: None Example: $name |
Comparator | String/Suggestion |
Important:
A comparator compares two values and produces a similarity indicator, which is represented by a number that can range from 0 (completely different) to 1 (exactly equal). Choose the comparator that you want to use on the selected fields, from the drop-down list:
Default value: Levenshtein Example: Metaphone |
Low | String/Expression |
Enter a decimal value representing the level of probability of the records to be matched if the specified fields are completely unlike. Default value: N/A Example: 0.1 |
High | String/Expression |
Enter a decimal value representing the level of probability of the records to be matched if the specified fields are exact match. Default value: N/A Example: 0.8 Note: If this value is left empty, a value of 0.95 is applied automatically.
|
Snap execution | Dropdown list |
Select one of the three modes in which the Snap executes.
Available options are:
Default value: Validate & execute Example: Execute only |
Temporary files
During execution, data processing on Snaplex nodes occurs principally in-memory as streaming and is unencrypted. When processing larger datasets that exceed the available compute memory, the Snap writes unencrypted pipeline data to local storage to optimize the performance. These temporary files are deleted when the pipeline execution completes. You can configure the temporary data's location in the Global properties table of the Snaplex node properties, which can also help avoid pipeline errors because of the unavailability of space. Learn more about Temporary Folder in Configuration Options.