Redshift - Insert

Overview

This Snap executes a SQL INSERT statement using the document's keys as the columns to insert to and the document's values as the values to insert into the columns.

According to Amazon's Redshift documentation, Amazon strongly recommends using the BULK insert functionality, advising against using the INSERT statement. The Redshift - Insert Snap will use the batch INSERT methodology suggested, but the performance of Redshift in executing might be, as Amazon advises, prohibitively.

SnapLogic has created the Redshift - Bulk Load Snap to address this issue, automating the use of the advised BULK insert using the COPY command. Refer to the Redshift - Bulk Load Snap for more information.

ETL Transformations & Data Flow

The Snap receives data from upstream component and loads the data into the selected table in the Redshift DB.

This is a Write-type Snap.
Works in Ultra Tasks

Prerequisites

A valid Redshift Account with the required permissions.

Limitations

Database Write Snaps output all records of a batch (as configured in your account settings) to the error view if the write fails during batch processing.
If you use the PostgreSQL driver (org.postgresql.Driver) with the Redshift Snap Pack, it could result in errors if the data type provided to the Snap does not match the data type in the Redshift table schema. Either use the Redshift driver (com.amazon.redshift.jdbc42.Driver) or use the correct data type in the input document to resolve these errors.

Snap views


Type	Description	Examples of upstream and downstream Snaps
Input	This Snap has one document input view by default. A second input view can be added for table metadata so that the table is created in Redshift with a similar schema as the source table.	Mapper JSON Generator File Reader
Output	This Snap has at most one output view containing the status and the original input.
Learn more about Error handling.

Examples

Insert records into Redshift table: Insert records into Redshift table using basic configuration
Insert records with data transformation: Insert records with data transformation using Mapper Snap
Replicate table schema and insert records: Replicate table schema and insert records into Redshift

Snap settings

Note: Learn about the common controls in the Snap settings dialog.


Field/Field set	Description
Label `String`	Required. Required. Specify a unique name for the Snap. Modify this to be more appropriate, especially if more than one of the same Snaps is in the pipeline. Default value: Redshift - Insert Example: Redshift - Insert
Schema name `String/Expression/ Suggestion`	Specify the database schema name. Leave this field blank to use the default schema. Default value: N/A Example: public
Table name `String/Expression/ Suggestion`	Required. Specify the name of the table into which to insert data. Default value: N/A Example: employees
Create table if not present `Checkbox`	Select this checkbox to create the table if it does not exist. Default value: Deselected Example: Selected
Number of retries `Integer`	Specify the maximum number of retry attempts the Snap must make in case of network failure. Default value: 0 Example: 3
Retry interval (seconds) `Integer`	Specify the time interval between two successive retry requests. A retry happens only when the previous attempt resulted in an exception. Default value: 1 Example: 10
Snap execution `Dropdown list`	Select one of the three modes in which the Snap executes. Available options are: Validate & Execute: Performs limited execution of the Snap and generates a data preview during Pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during Pipeline runtime. Execute only: Performs full execution of the Snap during Pipeline execution without generating preview data. Disabled: Disables the Snap and all Snaps that are downstream from it. Default value: Execute only Example: Validate & Execute

Table Creation

If the table does not exist when the Snap tries to do the insert, and the Create table if not present property is selected, the table will be created with the columns and data types required to hold the values in the first input document. If you would like the table to be created with the same schema as a source table, you can connect the second output view of a Select Snap to the second input view of this Snap. The extra view in the Select and Bulk Load Snaps are used to pass metadata about the table, effectively allowing you to replicate a table from one database to another.

The table metadata document that is read in by the second input view contains a dump of the JDBC DatabaseMetaData class. The document can be manipulated to affect the CREATE TABLE statement that is generated by this Snap. For example, to rename the name column to full_name, you can use a Mapper Snap that sets the path $.columns.name.COLUMN_NAME to full_name. The document contains the following fields:

columns - Contains the result of the getColumns() method with each column as a separate field in the object. Changing the COLUMN_NAME value will change the name of the column in the created table. Note that if you change a column name, you do not need to change the name of the field in the row input documents. The Snap will automatically translate from the original name to the new name. For example, when changing from name to full_name, the name field in the input document will be put into the "full_name" column. You can also drop a column by setting the COLUMN_NAME value to null or the empty string. The other fields of interest in the column definition are:
- TYPE_NAME - The type to use for the column. If this type is not known to the database, the DATA_TYPE field will be used as a fallback. If you want to explicitly set a type for a column, set the DATA_TYPE field.
- _SL_PRECISION - Contains the result of the getPrecision() method. This field is used along with the _SL_SCALE field for setting the precision and scale of a DECIMAL or NUMERIC field.
- _SL_SCALE - Contains the result of the getScale() method. This field is used along with the _SL_PRECISION field for setting the precision and scale of a DECIMAL or NUMERIC field.
primaryKeyColumns - Contains the result of the getPrimaryKeys() method with each column as a separate field in the object.
declaration - Contains the result of the getTables() method for this table. The values in this object are just informational at the moment. The target table name is taken from the Snap property.
importedKeys - Contains the foreign key information from the getImportedKeys() method. The generated CREATE TABLE statement will include FOREIGN KEY constraints based on the contents of this object. Note that you will need to change the PKTABLE_NAME value if you changed the name of the referenced table when replicating it.
indexInfo - Contains the result of the getIndexInfo() method for this table with each index as a separated field in the object. Any UNIQUE indexes in here will be included in the CREATE TABLE statement generated by this Snap.

Note: The Primary key is not enforced in RedShift, therefore it will allow duplicated records inserted with the same primary key. In fact, uniqueness, primary key, and foreign key constraints are informational only in Amazon RedShift. For more information, refer to: http://docs.aws.amazon.com/redshift/latest/dg/t_Defining_constraints.html

The Snap will not automatically fix some errors encountered during table creation since they may require user intervention to resolve correctly. For example, if the source table contains a column with a type that does not have a direct mapping in the target database, the Snap will fail to execute. You will then need to add a Mapper (Data) Snap to change the metadata document to explicitly set the values needed to produce a valid CREATE TABLE statement.

Troubleshooting

None.