Redshift - Unload

Overview

This Snap unloads the result of a query to one or more files on Amazon S3, using Amazon S3 server-side encryption (SSE)/server-side encryption with KMS/client-side encryption with CMK. The target Redshift table is not modified.

ETL Transformations & Data Flow

For more information, see Amazon Redshift Unload/Copy Utility.

Upon successful execution, the expected output data is as follows:

[
    {
        "entries": [
            {
                "url": "s3:///mybucket/unload_folder/test0000_part_00"
            },
            {
                "url": "s3:///mybucket/unload_folder/test0001_part_00"
            },
            {
                "url": "s3:///mybucket/unload_folder/test0002_part_00"
            },
            {
                "url": "s3:///mybucket/unload_folder/test0003_part_00"
            }
        ],
        "status": "success",
        "unloadQuery": "UNLOAD ('SELECT * FROM public.company') TO 's3://mybucket/unload_folder/test'
CREDENTIALS 'aws_access_key_id=;aws_secret_access_key=' MANIFEST ESCAPE ALLOWOVERWRITE PARALLEL"
    }
]

Upon successful preview, the expected output data is as follows:

[
    {
        "entries": [
            {
                "url": "s3:///<bucket>/<folder>/<prefix>000n_part_00"
            }
        ],
        "status": "preview",
        "unloadQuery": "UNLOAD ('SELECT * FROM public.company') TO 's3://mybucket/unload_folder/test'
CREDENTIALS 'aws_access_key_id=;aws_secret_access_key=' MANIFEST ESCAPE ALLOWOVERWRITE PARALLEL"
    }
]
Warning: The Snap behaves the same on a Groundplex as it does in a Cloudplex.

Prerequisites

  • A valid Redshift Account with the required permissions.
  • The Redshift account should contain S3 Access-key ID, S3 Secret key, S3 Bucket and S3 Folder.
  • The Redshift account security settings should allow access from the IP Address of the cloudplex or groundplex.
  • The Amazon S3 bucket where Amazon Redshift will write the output files must reside in the same region as your cluster.
  • The Amazon S3 bucket where Amazon Redshift should allow write access from the IP address of the cloudplex or groundplex.

Snap views

Type Description Examples of upstream and downstream Snaps
Input

Document

Output
This Snap has at most one document output view. The output document contains map data:
  • A "status" field with "success" or "preview"
  • An "unloadQuery" field with the query sent to Redshift
  • An "entries" field with a list of S3 URLs written by the Redshift UNLOAD operation
Learn more about Error handling.

Examples

  1. Unload Data from Redshift to S3: Unload data from Redshift tables to Amazon S3

Snap settings

Note: Learn about the common controls in the Snap settings dialog.
Field/Field set Description

Label

String

Required. Specify a unique name for the Snap. Modify this to be more appropriate, especially if more than one of the same Snaps is in the pipeline.

Default value: Redshift - Unload

Example: Redshift - Unload

SELECT query

String/Expression

Required.

Specify the SELECT query to execute. The query results will be unloaded to S3.

Default value: N/A

Example:

SELECT * FROM employees WHERE dept = 'Engineering'

S3 file prefix

String/Expression

The prefix of AWS S3 file names which are used by Redshift to write data. The Snap uses S3 Bucket and S3 Folder in the RedShift Account to format the full S3 path. File names created by RedShift are in the format: s3://<bucket>/<folder>/<s3-file-prefix><slice-number>_part_<file-number>, where <s3-file-prefix> is the value of this property. The Amazon S3 bucket where Amazon Redshift will write the output files must reside in the same region as your cluster.

Default value: N/A

Example: test

Delimiter

String

Required.

Single ASCII character that is used to separate fields in the output file. DELIMITER will be ignored if FIXEDWIDTH is specified. If the data contains the delimiter character, you will need to specify the ESCAPE option to escape the delimiter, or use ADDQUOTES to enclose the data in double quotes. Alternatively, specify a delimiter that is not contained in the data.

Default value: a pipe character ( | )

Example: a pipe character ( | ), a comma ( , ), or a tab ( \t )

Fixed width

String/Expression

Specifies FIXEDWIDTH spec., in which Redshift unloads the data to a file where each column width is a fixed length, rather than separated by a delimiter. The FIXEDWIDTH spec is a string that specifies the number of columns and the width of the columns. DELIMITER is ignored if FIXEDWIDTH is specified. Because FIXEDWIDTH does not truncate data, the specification for each column in the UNLOAD query needs to be at least as long as the length of the longest entry for that column. The format for fixedwidth_spec is: 'colID1:colWidth1,colID2:colWidth2, ...'

Default value: N/A

Example: colID1:36,colID2:36,colID3:256

Null as

String/Expression

Specifies a string that represents a null value in unload files. If this option is used, all output files contain the specified string in place of any null values found in the selected data. If this option is not specified, null values are unloaded as:

  • Zero-length strings for delimited output
  • Whitespace strings for fixed-width output

If a null string is specified for a fixed-width unload and the width of an output column is less than the width of the null string, the following behavior occurs:

  • An empty field is output for non-character columns
  • An error is reported for character columns

Default value: N/A

Example: null

Note: Anaplan server may ignore this setting and try to auto-detect. See Anaplan for details.

Escape

Checkbox

If selected, for CHAR and VARCHAR columns in delimited unload files, an escape character (\) is placed before every occurrence of the following characters:

  • Linefeed: \n
  • Carriage return: \r
  • The delimiter character specified for the unloaded data.
  • The escape character: \
  • A quote character: " or ' (if both ESCAPE and ADDQUOTES are selected).

We strongly recommend that you always select this property unless you are certain that your data does not contain any delimiters or other characters that might need to be escaped.

Important: If you loaded your data using a COPY with the ESCAPE option, you must also specify the ESCAPE option with your UNLOAD command to generate the reciprocal output file. Similarly, if you UNLOAD using the ESCAPE option, you will need to use ESCAPE when you COPY the same data.

Default status: Selected

Add quotes

Checkbox

If checked, RedShift places quotation marks around each unloaded data field, so that Amazon Redshift can unload data values that contain the delimiter itself.

Default status: Deselected

Allow overwrite

Checkbox

By default, UNLOAD fails if it finds files that it would possibly overwrite. If specified, UNLOAD will overwrite existing files, including the manifest file.

Default status: Deselected

Manifest

Checkbox

Creates a manifest file that explicitly lists the data files that are created by the UNLOAD process. The manifest is a text file in JSON format that lists the URL of each file that was written to Amazon S3. The manifest file is written to the same Amazon S3 path prefix as the unload files in the format <s3_path_prefix>manifest. For example, if the unload file S3 path prefix is 's3://mybucket/myfolder/venue_', the manifest file location will be 's3://mybucket/myfolder/venue_manifest'. Since this Snap needs to read the content of the manifest file for the output document data, it always includes MANIFEST option in the query. After reading the content of the manifest file, the Snap deletes the manifest file if this property is un-selected.

Default status: Deselected

Parallel

Checkbox

By default, UNLOAD writes data in parallel to multiple files, according to the number of slices in the cluster. If unchecked, UNLOAD writes to one or more data files serially, sorted absolutely according to the ORDER BY clause, if one is used. The maximum size for a data file is 6.2 GB. Therefore, if the unload data is larger than 6.2 GB, UNLOAD will create more than one file. The UNLOAD command is designed to use parallel processing. We recommend leaving this property selected for most cases.

Default status: Selected

Gzip

Checkbox

Unloads data to one or more gzip-compressed file per slice. Each resulting file is appended with a .gz extension.

Default status: Deselected

Client-side Encryption

Checkbox

Specifies the Amazon S3 client-side encryption type for the output files on Amazon S3. UNLOAD by default creates encrypted files using Amazon S3 server-side encryption with AWS-managed encryption keys (SSE). UNLOAD does not support Amazon S3 server-side encryption with a customer-supplied key (SSE-C). To unload to encrypted gzip-compressed files, check the GZIP property. If selected, provide the customer master key in the Master symmetric key property.

Default status: Deselected

IAM role

Checkbox

Check this property if the bulk load/unload has to be done using IAM role. If checked, ensure the properties (AWS account ID, role name and region name) are provided in the account.

Default value: N/A

KMS Encryption type

Dropdown list

Specifies the type of KMS S3 encryption to be used on the data. The available encryption options are:

  • None - Files do not get encrypted using KMS encryption
  • Server-Side KMS Encryption - If selected, the output files on Amazon S3 are encrypted using this encryption with Amazon S3 generated KMS key.

Default value: None

Note: If both the KMS and Client-side encryption types are selected, the Snap gives precedence to the SSE, and displays an error prompting the user to select either of the options only.

Master symmetric key

String/Expression

Conditional. This property applies only when the Client-side Encryption property is selected. This is the customer master key for the data to be encrypted client side. For more information about this please refer to Customer Master Keys and Using Client Side Encryption.

Default value: N/A

KMS key

String/Expression

Specifies the AWS Key Management Service (KMS) key ID or ARN to be used for the S3 encryption. This is only required if the KMS Encryption type property is configured to use the encryption with KMS. For more information about the KMS key refer to AWS KMS Overview and Using Server Side Encryption.

Default value: N/A

S3 path

String/Expression

Required.

Specify the S3 path (including bucket and prefix) where the unloaded data will be written.

Default value: N/A

Example: s3://mybucket/unload/employees_

Snap execution

Dropdown list

Select an option to specify how the Snap must be executed. Available options are:

  • Validate & Execute: Performs limited execution of the Snap (up to 50 records) during Pipeline validation; performs full execution of the Snap (unlimited records) during Pipeline execution.
  • Execute only: Performs full execution of the Snap during Pipeline execution; does not execute the Snap during Pipeline validation.
  • Disabled: Disables the Snap and, by extension, its downstream Snaps.

Default value: Validate & Execute

Example: Execute only

Troubleshooting

Error Reason Resolution
type "e" does not exist This issue occurs due to incompatibilities with the recent upgrade in the Postgres JDBC drivers. Download the latest 4.1 Amazon Redshift driver here and use this driver in your Redshift Account configuration and retry running the Pipeline.