Redshift - Update

Overview

This Snap executes a SQL update with the given properties. Keys from the input document are used as the columns to update, and their values become the new values in those columns.
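Conceptually, the Snap turns each input document into a parameterized UPDATE statement. The sketch below is illustrative only (it is not the Snap's implementation); the function name and placeholder style are assumptions:

```python
def build_update(table, doc, condition=None):
    """Build a parameterized UPDATE from a document's key/value pairs.

    Illustrative sketch only: each document key becomes a SET column,
    and each value becomes a bound parameter.
    """
    set_clause = ", ".join(f"{col} = ?" for col in doc)
    sql = f"UPDATE {table} SET {set_clause}"
    if condition:
        sql += f" WHERE {condition}"
    return sql, list(doc.values())

sql, params = build_update("employees",
                           {"ENAME": "SMITH", "DEPTNO": 20},
                           "EMPNO = 7369")
# sql → "UPDATE employees SET ENAME = ?, DEPTNO = ? WHERE EMPNO = 7369"
```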

ETL Transformations & Data Flow

This Snap is a data destination: it writes map data from input documents to the target table by executing a SQL UPDATE statement. To suggest table names, the Snap performs a standard JDBC DatabaseMetaData#getTables() query against the database; the Schema name value populates the schemaPattern parameter of that query.

This snap does not require any temporary files or other external resources.

Prerequisites

A valid Redshift Account with the required permissions.

Limitations

If you use the PostgreSQL driver (org.postgresql.Driver) with the Redshift Snap Pack, it could result in errors if the data type provided to the Snap does not match the data type in the Redshift table schema. Either use the Redshift driver (com.amazon.redshift.jdbc42.Driver) or use the correct data type in the input document to resolve these errors.

Snap views

Type Description Examples of upstream and downstream Snaps
Input

This Snap has at most one document input view. The input document contains map data with each document mapped to a single record by matching document field names with table column names.

Output

This Snap has at most one document output view.

Learn more about Error handling.
Warning: Database Write Snaps output all records of a batch (as configured in your account settings) to the error view if the write fails during batch processing.

Examples

  1. Update Records with Conditional Expressions: Update records with conditional expressions and logic

Snap settings

Note: Learn about the common controls in the Snap settings dialog.
Field/Field set Description

Label

String

Required. Specify a unique name for the Snap. Modify this to be more appropriate, especially if more than one of the same Snap is in the pipeline.

Default value: Redshift - Update

Example: Redshift - Update

Schema name

String/Expression/Suggestion

Specify the database schema name. Leave this field blank to use the default schema.

Default value: N/A

Example: public

Warning: Values can be passed using pipeline parameters but not upstream parameters.

Table name

String/Expression/Suggestion

Required.

Specify the name of the table to update.

Default value: N/A

Example: employees

Warning: Values can be passed using pipeline parameters but not upstream parameters.

Update condition

String/Expression/ Suggestion

Specify the SQL WHERE clause of the update statement. You can define specific values or columns to update (the SET condition) in an upstream Snap, such as a Mapper Snap, and then use the WHERE clause to apply these conditions to the columns sourced from the upstream Snap. For instance, here is a sample Update SQL query:

UPDATE table_name
SET column1 = value1, column2 = value2
WHERE condition;

If the Update Condition field is left blank, the condition is applied on all the records of the target table.

In scenarios where you want to use specific data from upstream Snaps without changing that data in the Update Snap, place the data in the original structure of the input document. We recommend the following format for the Update condition and input data:

{
"valueToUpdate" : "true",
"original": {
"col1" : "KEY"
}
}
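The effect of this shape can be demonstrated end to end: the top-level key supplies the new value, while the key under "original" drives the WHERE clause. In this sketch, sqlite3 stands in for Redshift; the table and column names are assumptions for illustration:

```python
import sqlite3

# sqlite3 stands in for Redshift here; illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT, valueToUpdate TEXT)")
conn.execute("INSERT INTO t VALUES ('KEY', 'false'), ('OTHER', 'false')")

# Input document in the recommended shape: the value to set at the top level,
# untouched key data preserved under "original".
doc = {"valueToUpdate": "true", "original": {"col1": "KEY"}}

# The update condition references the key under "original",
# so only the matching row changes.
conn.execute("UPDATE t SET valueToUpdate = ? WHERE col1 = ?",
             (doc["valueToUpdate"], doc["original"]["col1"]))
rows = conn.execute("SELECT col1, valueToUpdate FROM t ORDER BY col1").fetchall()
# rows → [('KEY', 'true'), ('OTHER', 'false')]
```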
Important:
  • Instead of building multiple Snaps with interdependent DML queries, we recommend using the Multi Execute Snap.
  • In a scenario where a downstream Snap depends on data processed by an upstream Database Bulk Load Snap, use a Script Snap to add a delay so that the data is available.

Refer to the example to understand how to use the Update Condition.

Default value: N/A

Example:

Without using expressions:

  • EMPNO=$EMPNO and ENAME=$EMPNAME
  • email = $email
  • emp=$emp

Using expressions:

  • "emp='" + $emp + "'"
  • "EMPNO=" + $EMPNO + " and ENAME='" + $EMPNAME + "'"
CAUTION: Using expressions that concatenate strings to build SQL queries or conditions carries a SQL injection risk and is therefore unsafe. Ensure that you understand all implications and risks involved before using string concatenation with the expression toggle (=) enabled.
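The difference between splicing a value into the SQL text and binding it as a parameter can be shown concretely. In this sketch, sqlite3 stands in for Redshift, and the table and hostile input are illustrative assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (EMPNO INTEGER, ENAME TEXT)")
conn.execute("INSERT INTO employees VALUES (1, 'SMITH')")

hostile = "SMITH' OR '1'='1"  # crafted input

# Concatenation splices the input into the SQL text, so the OR clause
# becomes part of the condition and the update matches every row.
n_concat = conn.execute(
    "UPDATE employees SET EMPNO=2 WHERE ENAME='" + hostile + "'").rowcount

# Parameter binding keeps the input as a plain string value; it matches
# no row, because no ENAME literally equals the hostile string.
n_bound = conn.execute(
    "UPDATE employees SET EMPNO=3 WHERE ENAME=?", (hostile,)).rowcount
# n_concat → 1, n_bound → 0
```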

Number of retries

Integer

Specify the maximum number of retry attempts the Snap must make in case of network failure.

Default value: 0

Example: 3

Retry interval (seconds)

Integer

Specify the time interval between two successive retry requests. A retry happens only when the previous attempt resulted in an exception.

Default value: 1

Example: 10
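Together, these two settings describe a simple retry loop: attempt the operation, and on failure wait for the retry interval before trying again. A minimal Python sketch of that behavior (illustrative only; the function and the flaky operation are assumptions, not the Snap's implementation):

```python
import time

def with_retries(operation, retries=0, interval=1):
    """Run `operation`, retrying after an exception up to `retries` times,
    sleeping `interval` seconds between attempts (defaults mirror the Snap's)."""
    for attempt in range(retries + 1):
        try:
            return operation()
        except Exception:
            if attempt == retries:
                raise
            time.sleep(interval)

attempts = []
def flaky():  # simulated network failure: fails twice, then succeeds
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("network failure")
    return "updated"

result = with_retries(flaky, retries=3, interval=0)
# result → "updated" after 3 attempts
```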

Snap execution

Dropdown list

Select one of the three modes in which the Snap executes. Available options are:

  • Validate & Execute: Performs limited execution of the Snap and generates a data preview during Pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during Pipeline runtime.
  • Execute only: Performs full execution of the Snap during Pipeline execution without generating preview data.
  • Disabled: Disables the Snap and all Snaps that are downstream from it.

Default value: Execute only

Example: Validate & Execute

Warning:

If Auto commit is set to true on the account, and a downstream Snap depends on data processed by an upstream Database Bulk Load Snap, use a Script Snap to add a delay so that the data is available.

For example, when performing create, insert, and delete operations sequentially in a pipeline, a Script Snap between the insert and the delete introduces a delay; otherwise, the delete may be triggered before the records are inserted into the table.

Troubleshooting

Error Reason Resolution
type "e" does not exist This issue occurs due to incompatibilities introduced by a recent upgrade of the PostgreSQL JDBC drivers. Download the latest Amazon Redshift JDBC 4.1 driver, use it in your Redshift Account configuration, and rerun the pipeline.