Replicate table schema and insert records

This example demonstrates how to replicate a table schema from a source database (such as Oracle or MySQL) to Redshift and insert records while preserving the table structure, including column definitions and constraints. The Redshift - Insert Snap receives both data and metadata through its two input views, allowing it to create the table with the same schema as the source.

The pipeline uses a source database Select Snap with two output views: one for the actual data and one for the table metadata. Both are connected to the Redshift - Insert Snap's two input views to replicate both structure and data.

Prerequisites:

  • A valid source database account (Oracle, MySQL, etc.)
  • A valid Redshift account
  • A source table with data to replicate
  • Appropriate permissions to read from source and create/insert in Redshift
To replicate the schema and data, complete the following steps:

  1. Configure the source Select Snap with two output views.

    Use a Select Snap from your source database (Oracle Select, MySQL Select, etc.) and configure it with two output views:

    • Output view "rows": Outputs the actual data records from the source table.
    • Output view "output1": Outputs the table metadata including column definitions, data types, constraints, primary keys, and indexes.

    Set the following properties in the Select Snap:

    • Schema name: Source database schema (for example, SNAPPOD).
    • Table name: Source table name (for example, EMPLOYEE_DATA).

    The metadata output contains a dump of the JDBC DatabaseMetaData interface, with information about:

    • Column names and data types
    • Primary key columns
    • Foreign key constraints
    • Indexes and unique constraints
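    The two output views can be pictured as two streams of documents. The following Python sketch shows what they might contain for the example table; the field names COLUMN_NAME, TYPE_NAME, DATA_TYPE, _SL_PRECISION, and _SL_SCALE come from the JDBC DatabaseMetaData dump described above, but the overall document shape shown here is an illustrative assumption, not the exact Snap output.

    ```python
    # Simplified sketch of a metadata document on the second ("output1") view.
    # The document shape is an assumption for illustration only.
    metadata_doc = {
        "schema": "SNAPPOD",
        "table": "EMPLOYEE_DATA",
        "columns": {
            "id": {
                "COLUMN_NAME": "id",
                "TYPE_NAME": "NUMBER",
                "DATA_TYPE": 2,          # java.sql.Types.NUMERIC
                "_SL_PRECISION": 10,
                "_SL_SCALE": 0,
            },
            "name": {
                "COLUMN_NAME": "name",
                "TYPE_NAME": "VARCHAR2",
                "DATA_TYPE": 12,         # java.sql.Types.VARCHAR
                "_SL_PRECISION": 100,
                "_SL_SCALE": 0,
            },
        },
        "primaryKeys": ["id"],
    }

    # The "rows" view carries plain data records:
    row_doc = {"id": 1, "name": "Alice"}
    ```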
  2. Configure the Redshift - Insert Snap with two input views.

    Connect both output views from the Select Snap to the Insert Snap:

    • Input view "rows": Receives the actual data to be inserted.
    • Input view "input1": Receives the table metadata to create the table structure.

    Set the following properties:

    • Schema name: Redshift schema (for example, public).
    • Table name: Target table name in Redshift. It can be the same as or different from the source table name.
    • Create table if not present: Select this option to automatically create the table based on the metadata received.
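    Conceptually, when "Create table if not present" is selected, the Snap translates the incoming metadata into a Redshift CREATE TABLE statement. The sketch below is a rough, hypothetical illustration of that translation; the type mapping shown is a small assumed subset, and the Snap's actual translation rules are internal to the product.

    ```python
    # Hypothetical sketch: building Redshift DDL from source column metadata.
    # The TYPE_MAP below is an illustrative assumption, not the Snap's real
    # type-translation table.
    TYPE_MAP = {
        "NUMBER": "DECIMAL",
        "VARCHAR2": "VARCHAR",
        "DATE": "TIMESTAMP",
    }

    def build_create_table(schema, table, columns, primary_keys):
        """columns: {name: {"TYPE_NAME": ..., "_SL_PRECISION": ..., "_SL_SCALE": ...}}"""
        col_defs = []
        for name, meta in columns.items():
            rs_type = TYPE_MAP.get(meta["TYPE_NAME"], "VARCHAR")
            if rs_type == "DECIMAL":
                rs_type += f"({meta['_SL_PRECISION']},{meta['_SL_SCALE']})"
            elif rs_type == "VARCHAR":
                rs_type += f"({meta.get('_SL_PRECISION', 256)})"
            col_defs.append(f"{name} {rs_type}")
        if primary_keys:
            col_defs.append(f"PRIMARY KEY ({', '.join(primary_keys)})")
        return (f"CREATE TABLE IF NOT EXISTS {schema}.{table} "
                f"({', '.join(col_defs)})")

    ddl = build_create_table(
        "public", "EMPLOYEE_DATA",
        {"id": {"TYPE_NAME": "NUMBER", "_SL_PRECISION": 10, "_SL_SCALE": 0},
         "name": {"TYPE_NAME": "VARCHAR2", "_SL_PRECISION": 100}},
        ["id"],
    )
    print(ddl)
    ```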
  3. (Optional) Transform the metadata using a Mapper Snap.

    If you need to modify the table structure during replication, insert a Mapper Snap between the Select and Insert Snaps on the metadata path. You can:

    • Rename columns: Set $.columns.name.COLUMN_NAME to the new name.
    • Change data types: Modify the TYPE_NAME or DATA_TYPE fields.
    • Drop columns: Set COLUMN_NAME to null or an empty string.
    • Adjust precision/scale for DECIMAL columns: Modify _SL_PRECISION and _SL_SCALE.

    For example, to rename the name column to full_name:

    • Target path: $.columns.name.COLUMN_NAME
    • Mapping expression: "full_name"
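    The rename above amounts to rewriting one field in the metadata document before it reaches the Insert Snap. A minimal Python sketch of that transformation, using the same simplified document shape assumed earlier:

    ```python
    # Sketch of the effect of the Mapper mapping
    #   $.columns.name.COLUMN_NAME -> "full_name"
    # applied to a simplified metadata document (document shape assumed).
    def rename_column(metadata, old, new):
        """Change the COLUMN_NAME of one column entry in the metadata."""
        metadata["columns"][old]["COLUMN_NAME"] = new
        return metadata

    meta = {"columns": {"name": {"COLUMN_NAME": "name", "TYPE_NAME": "VARCHAR2"}}}
    rename_column(meta, "name", "full_name")
    print(meta["columns"]["name"]["COLUMN_NAME"])  # full_name
    ```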
  4. Execute the pipeline.

    When the pipeline runs:

    1. The Select Snap extracts data and metadata from the source table.
    2. The Insert Snap receives the metadata and creates the table in Redshift if it doesn't exist.
    3. The Insert Snap inserts all records from the source table into the newly created Redshift table.

The pipeline successfully replicates the table schema and data from the source database to Redshift. The output includes:

  • Status of the table creation operation
  • Status of each record insert
  • The original input data for each record
Note: This pattern is useful for database migration scenarios where you need to replicate both the structure and data from one database to another. The Snap automatically translates data types between databases, though you may need to adjust mappings for types that don't have direct equivalents.
Important: In Redshift, uniqueness, primary key, and foreign key constraints are informational only and not enforced. The Snap will create these constraints in the table definition, but Redshift will allow duplicate primary keys and invalid foreign key references.
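Because Redshift does not enforce primary key uniqueness, duplicate key values are silently inserted. If duplicates would be a problem, one option is to deduplicate the records upstream of the Insert Snap. The sketch below is a minimal, hypothetical example of such a deduplication step; it is not part of the Snap itself.

```python
# Minimal sketch: drop records whose key value has already been seen,
# since Redshift will not reject duplicate primary keys on insert.
def dedupe_on_key(records, key):
    seen = set()
    unique = []
    for rec in records:
        if rec[key] not in seen:
            seen.add(rec[key])
            unique.append(rec)
    return unique

rows = [{"id": 1, "name": "Alice"}, {"id": 1, "name": "Alice (dup)"},
        {"id": 2, "name": "Bob"}]
print(dedupe_on_key(rows, "id"))  # keeps ids 1 and 2 only
```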