SCD2

SCD2 stands for Slowly Changing Dimension Type 2, which stores historical values for query or audit purposes. SnapLogic AutoSync supports SCD2 when loading data for sources that can provide a last updated timestamp. Each table must have a primary key and a timestamp that is updated each time a change is made to a record. If your endpoint does not have a column or field to store the last updated timestamp, you will need to add one. Make sure that the column is not nullable, contains no null values, and updates the timestamp for all changes.

If you enable synchronizaton when creating the pipeline, by default AutoSync does a full load followed by incremental loads (upserts). To use SCD2, edit the data pipeline and change the load type.

To provide SCD2, AutoSync changes the destination schema to add columns that track whether a row contains current or historical value. Some Salesforce objects are only created, such as LeadHistory. These objects do not have a field that tracks updates and should not be used with the SCD2 load type.

You can change a data pipeline using full or incremental load to SCD2. However, use care in the following situations:

  • If you need to change a data pipeline from using SCD2 back to using incremental loads, you should first drop the target tables in the destination. Failing to drop tables synchronized with SCD2 before changing the load type can result in errors.
  • We do not recommend setting up multiple data pipelines that load the a particular table or object to the same schema in the target. However, if two pipelines load the same table to the same schema and one is set to use incremental loads and the other to use SCD2, the incremental load will fail.

AutoSync supports the SCD2 load type for the source to destination combinations listed in the following table.

Table 1. Valid SCD2 endpoint combinations
Source Amazon Redshift Google BigQuery Snowflake
Gainsight PX
Google BigQuery
Marketo
Microsoft Dynamics 365 for Sales
MySQL
Oracle
PostgreSQL
Salesforce
ServiceNow
SQL Server
Zendesk

Schema for SCD2

AutoSync tracks changes for SCD2 based on a last updated timestamp. For all destinations except MySQL, AutoSync stores timestamp values up to nanoseconds, 9 fractional digits. AutoSync updates the values in special columns to indicate whether a row contains the most recent or historical values. For example, when using SCD2 for Salesforce to Snowflake, AutoSync adds following columns to the target Snowflake table.

Column name Description
AUTOSYNC_PRIMARYKEY Contains the value of the Salesforce object id field. Used as a natural key to prevent duplicates.
AUTOSYNC_CURRENTRECORDFLAG The most recent record has a value of Y, historical records have a value of N.
AUTOSYNC_EFFECTIVEBEGINTIME The start timestamp that records when the record was added to the table.
AUTOSYNC_EFFECTIVEENDTIME The end timestamp for the record. Current records have an end timestamp of 4999-12-31 00:00:0. Historical records have an end timestamp equal to the start timestamp of the current record.
AUTOSYNC_VALUEHASH The hashed value of the record. AutoSync uses this to detect changes.

How AutoSync loads using SCD2

To populate an empty table, AutoSync inserts all rows with a value of Y in the AUTOSYNC_CURRENTRECORDFLAG column.

Changes to any field in the source, including the id, cause AutoSync to:

  • For the existing record:
    • Change the AUTOSYNC_CURRENTRECORDFLAG field value to N.
    • Change the AUTOSYNC_EFFECTIVEENDTIME field value to the current timestamp.
  • For the updated record:
    • Insert a new row.
    • Add a value of Y in the AUTOSYNC_CURRENTRECORDFLAG field.
    • Add a value of 4999-12-31 00:00:0 in the AUTOSYNC_EFFECTIVEENDTIME field.