Redshift - SCD2
Overview
This Snap provides the functionality of SCD (slowly changing dimension) Type 2 on a target Redshift table. The Snap executes one SQL lookup request per set of input documents to avoid making a request for every input record. Its output is typically a stream of documents for the Redshift - Bulk Upsert Snap, which updates or inserts rows into the target table. Therefore, this Snap must be connected to the Redshift - Bulk Upsert Snap to accomplish the complete SCD2 functionality.
ETL Transformations and Data Flow
This Snap enables the following ETL operations/flows:
- Take the incoming document from the upstream snap and perform a lookup operation in the database, producing one or two documents in the output view.
- If a record exists in the database (with the values provided in the input document), generate two output documents; otherwise generates only one.
- Feed the output documents to the Redshift Bulk Upsert Snap, which will insert them into the destination table to preserve history.
- This is a Write-type Snap.
Works in Ultra Tasks
Prerequisites
- Redshift database installed.
- Able to connect to the database from the desired plex nodes.
- A valid Redshift Account with the required permissions.
Known Issues
If you use the PostgreSQL driver (org.postgresql.Driver) with the Redshift Snap Pack, it could result in errors if the data type provided to the Snap does not match the data type in the Redshift table schema. Either use the Redshift driver (com.amazon.redshift.jdbc42.Driver) or use the correct data type in the input document to resolve these errors.
Snap views
| Type | Description | Examples of upstream and downstream Snaps |
|---|---|---|
| Input |
This Snap has exactly one input view and expects documents in the view. |
|
| Output |
This Snap allows zero or one output views and produces documents in the view. |
|
| Learn more about Error handling. | ||
Examples
- Implement Slowly Changing Dimension Type 2: Implement Slowly Changing Dimension Type 2 with Redshift
Snap settings
| Field/Field set | Description | ||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
Label
|
Required. Required. Specify a unique name for the Snap. Modify this to be more appropriate, especially if more than one of the same Snaps is in the pipeline. Default value: Redshift - SCD2 Example: Redshift - SCD2 |
||||||||||||||||||||||||||||||||
|
Schema name
|
The name of the database schema that contains the table whose data you want to update. Selecting a schema filters the Table name list to only show tables created in the selected schema. If you do not specify a schema in the Schema name field, the Table name field lists out all tables with the name you specify in all schemas in the database. Default value: N/A Warning: The values can be passed using the pipeline parameters but not the upstream parameter fields.
|
||||||||||||||||||||||||||||||||
|
Table name
|
Required. The name of the table that contains the data you want to update. Default value: N/A Warning: The values can be passed using the pipeline parameters but not the upstream parameter fields.
|
||||||||||||||||||||||||||||||||
|
Natural key
|
Required. Names of fields that identify a unique row in the target table. The identity key cannot be used as the Natural key, since a current row and its historical rows cannot have the same natural key value. Default value: N/A Example: id (Each record has to have a unique value.) Warning: The values can be passed using the pipeline parameters but not the upstream parameter fields.
|
||||||||||||||||||||||||||||||||
|
Cause-historization fields
|
Required. Names of fields where any change in values causes the historization of an existing row and the insertion of a new 'current' row. Default value: N/A Example: gold bullion rate |
||||||||||||||||||||||||||||||||
|
SCD Fields
|
Required. Enter the field names you want to use as dimension fields–or select them from the suggestion list. The columns in the table specified in the Table name field above are used to populate the suggestion list. Example:
Default value:
|
||||||||||||||||||||||||||||||||
|
Ignore unchanged rows
|
Specifies whether the Snap must ignore writing unchanged rows from the source table to the target table. If you enable this option, the Snap generates a corresponding document in the target only if the Cause-historization column in the source row is changed. Else, the Snap does not generate any corresponding document in the target. Default value: Not selected |
||||||||||||||||||||||||||||||||
|
Snap execution
|
Select an option to specify how the Snap must be executed. Available options are:
Default value: Validate & Execute Example: Execute only |
||||||||||||||||||||||||||||||||
Troubleshooting
| Error | Reason | Resolution |
|---|---|---|
| type "e" does not exist | This issue occurs due to incompatibilities with the recent upgrade in the Postgres JDBC drivers. | Download the latest 4.1 Amazon Redshift driver here and use this driver in your Redshift Account configuration and retry running the Pipeline. |