SCD2
SCD2 stands for Slowly Changing Dimension Type 2, which stores historical values for query or audit purposes. SnapLogic AutoSync supports SCD2 when loading data for sources that can provide a last updated timestamp. Each table must have a primary key and a timestamp that is updated each time a change is made to a record. The supported data type for the timestamp field differs by endpoint. Refer to the endpoint specific documentation for details.
If your endpoint does not have a column or field to store the last updated timestamp, you will need to add one to use SCD2. Make sure that the column is not nullable, contains no null values, and updates the timestamp value for all changes.
If you enable synchronization when creating the pipeline, for targets that support incremental load, by default AutoSync does a full load followed by incremental loads (upserts). To use SCD2, edit the data pipeline and change the load type.
To provide SCD2, AutoSync changes the destination schema to add columns that track whether a row contains current or historical value.
Some Salesforce objects are only created, such as LeadHistory
.
These objects do not have a field that tracks updates and should not be used with the SCD2 load type.
You can change a data pipeline using full or incremental load to SCD2. However, use care in the following situations:
- If you need to change a data pipeline from using SCD2 back to using incremental loads, you should first drop the target tables in the destination. Failing to drop tables synchronized with SCD2 before changing the load type can result in errors.
- We do not recommend setting up multiple data pipelines that load the a particular table or object to the same schema in the target. However, if two pipelines load the same table to the same schema and one is set to use incremental loads and the other to use SCD2, the incremental load will fail.
AutoSync supports the SCD2 load type for the source to destination combinations listed in the following table.
Source | Amazon Redshift | Google BigQuery | Snowflake |
---|---|---|---|
Gainsight PX | |||
Google BigQuery | |||
Marketo | |||
Microsoft Dynamics 365 for Sales | |||
MySQL | |||
Oracle | |||
PostgreSQL | |||
Salesforce | |||
ServiceNow | |||
SQL Server | |||
Zendesk |
Schema for SCD2
AutoSync tracks changes for SCD2 based on a last updated timestamp. For all destinations except MySQL, AutoSync stores timestamp values up to nanoseconds, 9 fractional digits. AutoSync updates the values in special columns to indicate whether a row contains the most recent or historical values. For example, when using SCD2 for Salesforce to Snowflake, AutoSync adds following columns to the target Snowflake table.
Column name | Description |
---|---|
|
Contains the value of the Salesforce object id field.
Used as a natural key to prevent duplicates.
|
|
The most recent record has a value of Y , historical records have a value of N . |
|
The start timestamp that records when the record was added to the table. |
|
The end timestamp for the record.
Current records have an end timestamp of 4999-12-31 00:00:0 .
Historical records have an end timestamp equal to the start timestamp of the current record.
|
|
The hashed value of the record. AutoSync uses this to detect changes. |
How AutoSync loads using SCD2
To populate an empty table, AutoSync inserts all rows with a value of Y
in the AUTOSYNC_CURRENTRECORDFLAG
column.
Changes to any field in the source, including the id
, cause AutoSync to:
- For the existing record:
- Change the
AUTOSYNC_CURRENTRECORDFLAG
field value toN
. - Change the
AUTOSYNC_EFFECTIVEENDTIME
field value to the current timestamp.
- Change the
- For the updated record:
- Insert a new row.
- Add a value of
Y
in theAUTOSYNC_CURRENTRECORDFLAG
field. - Add a value of 4999-12-31 00:00:0 in the
AUTOSYNC_EFFECTIVEENDTIME
field.