Redshift - Unload
Overview
This Snap unloads the result of a query to one or more files on Amazon S3, using Amazon S3 server-side encryption (SSE)/server-side encryption with KMS/client-side encryption with CMK. The target Redshift table is not modified.
ETL Transformations & Data Flow
For more information, see Amazon Redshift Unload/Copy Utility.
Upon successful execution, the expected output data is as follows:
[
{
"entries": [
{
"url": "s3:///mybucket/unload_folder/test0000_part_00"
},
{
"url": "s3:///mybucket/unload_folder/test0001_part_00"
},
{
"url": "s3:///mybucket/unload_folder/test0002_part_00"
},
{
"url": "s3:///mybucket/unload_folder/test0003_part_00"
}
],
"status": "success",
"unloadQuery": "UNLOAD ('SELECT * FROM public.company') TO 's3://mybucket/unload_folder/test'
CREDENTIALS 'aws_access_key_id=;aws_secret_access_key=' MANIFEST ESCAPE ALLOWOVERWRITE PARALLEL"
}
]
Upon successful preview, the expected output data is as follows:
[
{
"entries": [
{
"url": "s3:///<bucket>/<folder>/<prefix>000n_part_00"
}
],
"status": "preview",
"unloadQuery": "UNLOAD ('SELECT * FROM public.company') TO 's3://mybucket/unload_folder/test'
CREDENTIALS 'aws_access_key_id=;aws_secret_access_key=' MANIFEST ESCAPE ALLOWOVERWRITE PARALLEL"
}
]
- This is a Read-type Snap.
Works in Ultra Tasks
Prerequisites
- A valid Redshift Account with the required permissions.
- The Redshift account should contain S3 Access-key ID, S3 Secret key, S3 Bucket and S3 Folder.
- The Redshift account security settings should allow access from the IP Address of the cloudplex or groundplex.
- The Amazon S3 bucket where Amazon Redshift will write the output files must reside in the same region as your cluster.
- The Amazon S3 bucket where Amazon Redshift should allow write access from the IP address of the cloudplex or groundplex.
Snap views
| Type | Description | Examples of upstream and downstream Snaps |
|---|---|---|
| Input |
Document |
|
| Output |
This Snap has at most one document output view. The output document contains map data:
|
|
| Learn more about Error handling. | ||
Examples
- Unload Data from Redshift to S3: Unload data from Redshift tables to Amazon S3
Snap settings
| Field/Field set | Description | ||
|---|---|---|---|
|
Label
|
Required. Specify a unique name for the Snap. Modify this to be more appropriate, especially if more than one of the same Snaps is in the pipeline. Default value: Redshift - Unload Example: Redshift - Unload |
||
|
SELECT query
|
Required. Specify the SELECT query to execute. The query results will be unloaded to S3. Default value: N/A Example:
|
||
|
S3 file prefix
|
The prefix of AWS S3 file names which are used by Redshift to write data. The Snap uses S3 Bucket and S3 Folder in the RedShift Account to format the full S3 path. File names created by RedShift are in the format: s3://<bucket>/<folder>/<s3-file-prefix><slice-number>_part_<file-number>, where <s3-file-prefix> is the value of this property. The Amazon S3 bucket where Amazon Redshift will write the output files must reside in the same region as your cluster. Default value: N/A Example: test |
||
|
Delimiter
|
Required. Single ASCII character that is used to separate fields in the output file. DELIMITER will be ignored if FIXEDWIDTH is specified. If the data contains the delimiter character, you will need to specify the ESCAPE option to escape the delimiter, or use ADDQUOTES to enclose the data in double quotes. Alternatively, specify a delimiter that is not contained in the data. Default value: a pipe character ( | ) Example: a pipe character ( | ), a comma ( , ), or a tab ( \t ) |
||
|
Fixed width
|
Specifies FIXEDWIDTH spec., in which Redshift unloads the data to a file where each column width is a fixed length, rather than separated by a delimiter. The FIXEDWIDTH spec is a string that specifies the number of columns and the width of the columns. DELIMITER is ignored if FIXEDWIDTH is specified. Because FIXEDWIDTH does not truncate data, the specification for each column in the UNLOAD query needs to be at least as long as the length of the longest entry for that column. The format for fixedwidth_spec is: 'colID1:colWidth1,colID2:colWidth2, ...' Default value: N/A Example: colID1:36,colID2:36,colID3:256 |
||
|
Null as
|
Specifies a string that represents a null value in unload files. If this option is used, all output files contain the specified string in place of any null values found in the selected data. If this option is not specified, null values are unloaded as:
If a null string is specified for a fixed-width unload and the width of an output column is less than the width of the null string, the following behavior occurs:
Default value: N/A Example: null Note: Anaplan server may ignore this setting and try to auto-detect. See Anaplan for details.
|
||
|
Escape
|
If selected, for CHAR and VARCHAR columns in delimited unload files, an escape character (\) is placed before every occurrence of the following characters:
We strongly recommend that you always select this property unless you are certain that your data does not contain any delimiters or other characters that might need to be escaped. Important: If you loaded your data using a COPY with the ESCAPE option, you must also specify the ESCAPE option with your UNLOAD command to generate the reciprocal output file. Similarly, if you UNLOAD using the ESCAPE option, you will need to use ESCAPE when you COPY the same data.
Default status: Selected |
||
|
Add quotes
|
If checked, RedShift places quotation marks around each unloaded data field, so that Amazon Redshift can unload data values that contain the delimiter itself. Default status: Deselected |
||
|
Allow overwrite
|
By default, UNLOAD fails if it finds files that it would possibly overwrite. If specified, UNLOAD will overwrite existing files, including the manifest file. Default status: Deselected |
||
|
Manifest
|
Creates a manifest file that explicitly lists the data files that are created by the UNLOAD process. The manifest is a text file in JSON format that lists the URL of each file that was written to Amazon S3. The manifest file is written to the same Amazon S3 path prefix as the unload files in the format <s3_path_prefix>manifest. For example, if the unload file S3 path prefix is 's3://mybucket/myfolder/venue_', the manifest file location will be 's3://mybucket/myfolder/venue_manifest'. Since this Snap needs to read the content of the manifest file for the output document data, it always includes MANIFEST option in the query. After reading the content of the manifest file, the Snap deletes the manifest file if this property is un-selected. Default status: Deselected |
||
|
Parallel
|
By default, UNLOAD writes data in parallel to multiple files, according to the number of slices in the cluster. If unchecked, UNLOAD writes to one or more data files serially, sorted absolutely according to the ORDER BY clause, if one is used. The maximum size for a data file is 6.2 GB. Therefore, if the unload data is larger than 6.2 GB, UNLOAD will create more than one file. The UNLOAD command is designed to use parallel processing. We recommend leaving this property selected for most cases. Default status: Selected |
||
|
Gzip
|
Unloads data to one or more gzip-compressed file per slice. Each resulting file is appended with a .gz extension. Default status: Deselected |
||
|
Client-side Encryption
|
Specifies the Amazon S3 client-side encryption type for the output files on Amazon S3. UNLOAD by default creates encrypted files using Amazon S3 server-side encryption with AWS-managed encryption keys (SSE). UNLOAD does not support Amazon S3 server-side encryption with a customer-supplied key (SSE-C). To unload to encrypted gzip-compressed files, check the GZIP property. If selected, provide the customer master key in the Master symmetric key property. Default status: Deselected |
||
|
IAM role
|
Check this property if the bulk load/unload has to be done using IAM role. If checked, ensure the properties (AWS account ID, role name and region name) are provided in the account. Default value: N/A |
||
|
KMS Encryption type
|
Specifies the type of KMS S3 encryption to be used on the data. The available encryption options are:
Default value: None Note: If both the KMS and Client-side encryption types are selected, the Snap gives precedence to the SSE, and displays an error prompting the user to select either of the options only.
|
||
|
Master symmetric key
|
Conditional. This property applies only when the Client-side Encryption property is selected. This is the customer master key for the data to be encrypted client side. For more information about this please refer to Customer Master Keys and Using Client Side Encryption. Default value: N/A |
||
|
KMS key
|
Specifies the AWS Key Management Service (KMS) key ID or ARN to be used for the S3 encryption. This is only required if the KMS Encryption type property is configured to use the encryption with KMS. For more information about the KMS key refer to AWS KMS Overview and Using Server Side Encryption. Default value: N/A |
||
|
S3 path
|
Required. Specify the S3 path (including bucket and prefix) where the unloaded data will be written. Default value: N/A Example: s3://mybucket/unload/employees_ |
||
|
Snap execution
|
Select an option to specify how the Snap must be executed. Available options are:
Default value: Validate & Execute Example: Execute only |
||
Troubleshooting
| Error | Reason | Resolution |
|---|---|---|
| type "e" does not exist | This issue occurs due to incompatibilities with the recent upgrade in the Postgres JDBC drivers. | Download the latest 4.1 Amazon Redshift driver here and use this driver in your Redshift Account configuration and retry running the Pipeline. |