Unload Data from Redshift to S3

Overview

This example demonstrates how to use the Redshift Unload Snap to export data from Redshift tables to Amazon S3 using the UNLOAD command. The UNLOAD operation efficiently exports large datasets in parallel and can optionally compress and encrypt the output files.

Prerequisites

  • Amazon Redshift cluster with appropriate access
  • Valid Redshift account configured in SnapLogic
  • Amazon S3 bucket with write permissions
  • AWS IAM role or credentials with S3 write access
  • Source table or SELECT query with data to export

Pipeline Overview

The pipeline uses the Redshift Unload Snap to export query results from Redshift directly to S3. The Snap generates an UNLOAD command that leverages Redshift's parallel processing to write data efficiently to multiple files in S3.

Basic Unload Configuration

Redshift Unload Snap - Settings:

  • SELECT statement: Query to retrieve data to unload
  • S3 URL: Target S3 path (e.g., s3://mybucket/myprefix/)
  • AWS access key ID: AWS credentials for S3 access
  • AWS secret access key: AWS secret key (or use IAM role)
  • Additional options: GZIP, MANIFEST, DELIMITER, etc.

Redshift Unload Configuration

UNLOAD Command Format

The Snap generates an UNLOAD command with the following structure:

UNLOAD ('SELECT * FROM table_name WHERE condition')
TO 's3://bucket-name/prefix/filename'
IAM_ROLE 'arn:aws:iam::account-id:role/role-name'
[options];

Example with options:

UNLOAD ('SELECT customer_id, name, email FROM customers WHERE status = ''active''')
TO 's3://my-data-bucket/exports/customers_'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftUnloadRole'
GZIP
MANIFEST
ALLOWOVERWRITE
PARALLEL ON;

UNLOAD Options

The Redshift Unload Snap supports various options to control the export:

  • GZIP: Compress output files using gzip compression
  • MANIFEST: Create a manifest file listing all data files created
  • DELIMITER: Specify field delimiter (default is pipe |)
  • ALLOWOVERWRITE: Overwrite existing files in S3
  • PARALLEL: Control parallel export (ON by default)
  • ADDQUOTES: Enclose strings in double quotes
  • NULL AS: Specify string to represent NULL values
  • ESCAPE: Enable escape characters for special characters
  • ENCRYPTED: Encrypt files using S3 server-side encryption

Authentication Methods

The Snap supports multiple authentication approaches:

  • IAM Role: Recommended - uses IAM role attached to Redshift cluster
  • Access Keys: Provide AWS access key ID and secret access key
  • Temporary Credentials: Use temporary security credentials with session token
Note:

Using IAM roles is the preferred method as it eliminates the need to manage access keys and provides better security through automatic credential rotation.

Advanced Pipeline Pattern

Pipeline: Redshift Unload → JSON Splitter → Redshift Bulk Upsert

This advanced pattern exports data to S3, retrieves the manifest, and optionally processes the exported files:


Unload with Splitter Pipeline

Redshift Unload Output:

The Snap outputs a document containing:

  • S3 URLs of the exported files
  • Manifest file location (if MANIFEST option enabled)
  • Number of rows unloaded

Unload Output Document

JSON Splitter Configuration:

Split the manifest to process individual file references:


Splitter Configuration

Processing Unloaded Files:

Use the file references for downstream processing:


Processing Exported Files

Processing Output

File Naming and Partitioning

Redshift automatically creates multiple files when unloading:

  • Files are named with numeric suffixes: prefix000, prefix001, prefix002, etc.
  • The number of files depends on data size and cluster configuration
  • Each slice in the Redshift cluster writes to a separate file
  • Use PARALLEL OFF to create a single output file (slower)

Example file structure:

s3://my-bucket/exports/customers_000.gz
s3://my-bucket/exports/customers_001.gz
s3://my-bucket/exports/customers_002.gz
s3://my-bucket/exports/customers_manifest

Use Cases

This pattern is useful for:

  • Exporting large datasets from Redshift for archival or backup
  • Data migration to other systems or platforms
  • Creating data extracts for downstream analytics
  • Sharing data with external parties via S3
  • Offloading historical data to reduce storage costs
  • Creating snapshots for reporting or compliance
  • Implementing data lake ingestion workflows

Best Practices

  • Use IAM roles instead of access keys for better security
  • Enable GZIP compression to reduce S3 storage costs and transfer time
  • Use MANIFEST option to track all exported files
  • Include timestamps or dates in S3 prefixes for organization
  • Use ALLOWOVERWRITE carefully to avoid accidental data loss
  • Consider MAXFILESIZE option to control individual file sizes
  • Test UNLOAD queries with LIMIT before exporting full datasets
  • Monitor S3 costs when unloading large volumes of data
  • Use PARALLEL OFF only when single-file output is required
  • Enable S3 server-side encryption for sensitive data
  • Set appropriate S3 lifecycle policies for exported data