Implement Slowly Changing Dimension Type 2

Overview

This example demonstrates how to use the Redshift SCD2 Snap to track historical changes in dimension tables using SCD Type 2 methodology. The Snap enables historization of changing data while preserving both current and historical records.

Prerequisites

  • Amazon Redshift cluster with appropriate access
  • Valid Redshift account configured in SnapLogic
  • Understanding of Slowly Changing Dimension concepts
  • Target table with appropriate SCD2 columns (start_date, end_date, active flag)

SCD Type 2 Concepts

Slowly Changing Dimension Type 2 (SCD2) is a technique for tracking historical changes in dimension data. When a record's data changes:

  • The existing record is marked as historical (active = false) and end-dated
  • A new record is inserted with the updated data (active = true) and a new start date
  • Both records share the same natural key but have different surrogate keys

This allows queries to access either current data or historical snapshots based on date ranges.

Pipeline Overview

The pipeline consists of two Snaps working together:

  1. Redshift SCD2 Snap: Compares incoming data against the target table, identifying new and changed records
  2. Redshift Bulk Upsert Snap: Performs the actual database operations to historize old records and insert new ones

Snap Configuration

Redshift SCD2 Snap - Settings:


Redshift SCD2 Configuration
  • Schema name: Target schema (e.g., public)
  • Table name: Target dimension table
  • Natural key: Columns that uniquely identify a business entity (e.g., station_id, zipcode)
  • Cause-historization fields: Columns whose changes trigger historization (e.g., premium, regular)
  • SCD Fields: Configure dimension tracking columns:
    • Current row field: active (value: true for current, false for historical)
    • Start date field: start_date (value: Date.now())
    • End date field: end_date (value: Date.now() when historized, null for current)
  • Ignore unchanged rows: Optional - skip generating output for records with no changes

Redshift Bulk Upsert Snap - Settings:

  • Table name: Same as SCD2 Snap target table
  • Key columns: Must include at least one natural key field from SCD2 configuration

How It Works

When the pipeline executes:

  1. The SCD2 Snap receives incoming data and performs a lookup in the target table using the natural key
  2. For each input record:
    • If no match found: Generates one output document for insertion as a new current record
    • If match found and cause-historization fields changed: Generates two output documents:
      1. One to update the existing record (mark as historical, set end_date)
      2. One to insert the new record (mark as current, set start_date)
    • If match found and no changes: Either skips or generates pass-through based on configuration
  3. The Bulk Upsert Snap executes the updates and inserts generated by the SCD2 Snap

Example Data Flow

Initial Load:

Input: { "station_id": "U01", "zipcode": "94402", "premium": 4.13, "regular": 3.83 }
Output: One record marked as current (active=true, start_date=now, end_date=null)

After Price Update:

Input: { "station_id": "U01", "zipcode": "94402", "premium": 4.25, "regular": 3.98 }
Output: Two records:
  1. Update existing: active=false, end_date=now
  2. Insert new: active=true, start_date=now, end_date=null, premium=4.25, regular=3.98

Use Cases

This pattern is useful for:

  • Tracking price changes over time
  • Maintaining customer address history
  • Recording product attribute changes
  • Preserving organizational hierarchy changes
  • Any scenario requiring historical data snapshots

Best Practices

  • Choose natural keys that truly identify unique business entities
  • Select cause-historization fields carefully - only those whose changes matter
  • Ensure the target table has appropriate indexes on natural key columns
  • Use the Ignore unchanged rows option to reduce unnecessary processing
  • Test with small datasets to verify historization logic before full deployment
  • Consider data volume and retention policies when implementing SCD2