Advanced Use Case: File Reader to Redshift Bulk Load
Overview
This advanced example demonstrates how to use the Redshift Bulk Load Snap in a typical enterprise ETL scenario. The pipeline reads data from a CSV file, parses it, and loads it into a Redshift table. If the table does not exist, the Snap creates it with the appropriate schema based on the input data structure.
Prerequisites
- CSV file accessible to the File Reader Snap (local, SLDB, or remote location)
- Amazon Redshift cluster with appropriate access
- S3 bucket configured in Redshift account for staging files
- Valid Redshift account configured in SnapLogic
Pipeline Flow
The pipeline performs the following ETL operations:
- Extract: File Reader Snap reads the data from the CSV file and passes it to the CSV Parser
- Transform: CSV Parser parses the raw data, converting CSV rows into structured documents
- Load: Redshift Bulk Load Snap loads the entire dataset into the Redshift table, creating a new table with the provided schema if it doesn't exist
Pipeline
(Pipeline diagram: File Reader → CSV Parser → Redshift Bulk Load)
Snap Configuration
File Reader Snap:
- File: Path to the CSV file (for example, /path/to/data.csv)
- Read mode: Binary or Text (depending on file encoding)
- Reads the entire file content and passes it downstream
CSV Parser Snap:
- First row as header: Selected (if CSV has headers)
- Delimiter: Comma (or custom delimiter)
- Quote character: Double quote (default)
- Parses CSV data into structured JSON documents with field names and values
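Conceptually, the parser turns each CSV row into a key-value document. A minimal Python sketch of the same transformation, assuming a hypothetical sales.csv whose header row is id,region,amount:

import csv

# Each CSV row becomes one structured document (a dict), with the header
# row supplying the field names -- the same shape the CSV Parser Snap
# passes downstream.
with open("sales.csv", newline="") as f:  # hypothetical input file
    for document in csv.DictReader(f, delimiter=",", quotechar='"'):
        print(document)  # e.g. {'id': '1', 'region': 'West', 'amount': '199.99'}

Note that the parsed values are all strings at this stage; how they become typed columns is covered under Table Creation Behavior below.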
Redshift Bulk Load Snap:
- Schema name: Specify target schema (for example, public)
- Table name: Specify target table name
- Create table if not present: Selected
- Data Source: Input view
- Validate input data: Selected (recommended for initial loads)
- The Snap stages data to S3 and uses the COPY command for efficient bulk loading
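The stage-and-COPY pattern the Snap automates looks roughly like the sketch below. The bucket, cluster, table, and credentials are placeholders; in the pipeline itself, staging, cleanup, and authentication are handled through the configured Redshift account.

import boto3
import psycopg2

# 1. Stage the file in S3 (the Snap does this for you).
boto3.client("s3").upload_file("data.csv", "my-staging-bucket", "staging/data.csv")

# 2. Issue a COPY so Redshift ingests the staged file in bulk.
conn = psycopg2.connect(host="my-cluster.example.redshift.amazonaws.com",
                        port=5439, dbname="dev", user="etl_user", password="...")
with conn, conn.cursor() as cur:
    cur.execute("""
        COPY public.sales_data
        FROM 's3://my-staging-bucket/staging/data.csv'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
        FORMAT AS CSV
        IGNOREHEADER 1;
    """)

COPY is what makes the load efficient: Redshift reads the staged file in parallel across its slices rather than inserting rows one at a time.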
Table Creation Behavior
When Create table if not present is selected:
- The Snap analyzes the first input document to determine column names and data types
- A CREATE TABLE statement is automatically generated
- The table is created in Redshift with appropriate column definitions
- Data types are inferred from the CSV Parser output:
- Numeric values → INTEGER or DECIMAL
- Date/time strings → TIMESTAMP or DATE
- Text values → VARCHAR with appropriate length
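The Snap's exact inference rules are internal, but the effect resembles this simplified Python sketch, which inspects one sample document (field names and values are hypothetical) and emits the corresponding DDL:

from datetime import datetime

def infer_type(value):
    # Simplified stand-in for the Snap's type inference.
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "DECIMAL(18,4)"
    try:
        datetime.fromisoformat(str(value))
        return "TIMESTAMP"
    except ValueError:
        return "VARCHAR(256)"

sample = {"id": 1, "sale_date": "2024-01-15", "region": "West", "amount": 199.99}
columns = ", ".join(f"{name} {infer_type(value)}" for name, value in sample.items())
print(f"CREATE TABLE IF NOT EXISTS public.sales_data ({columns});")
# -> CREATE TABLE IF NOT EXISTS public.sales_data
#    (id INTEGER, sale_date TIMESTAMP, region VARCHAR(256), amount DECIMAL(18,4));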
If you need precise control over data types, consider:
- Pre-creating the table in Redshift with exact column definitions
- Using a Mapper Snap to explicitly cast data types before the Bulk Load Snap (see the sketch after this list)
- Providing schema metadata through the second input view of the Bulk Load Snap
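The explicit casting a Mapper Snap performs can be pictured as a per-document transform; a Python sketch with hypothetical field names:

from datetime import date
from decimal import Decimal

def cast_document(doc):
    # Coerce string fields from the CSV Parser into exact target types,
    # analogous to per-field expressions in a Mapper Snap.
    return {
        "id": int(doc["id"]),
        "sale_date": date.fromisoformat(doc["sale_date"]),
        "region": doc["region"],
        "amount": Decimal(doc["amount"]),
    }

print(cast_document({"id": "1", "sale_date": "2024-01-15",
                     "region": "West", "amount": "199.99"}))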
Output
The Snap outputs a document showing:
- Status: Success or failure of the bulk load operation
- Records loaded: Total count of successfully loaded records
- Failed records: Count of records that failed to load (if any)
- Table created: Confirmation if a new table was created
Example output:
{
  "status": "Success",
  "recordsLoaded": 10000,
  "failedRecords": 0,
  "tableCreated": true,
  "tableName": "public.sales_data"
}
Performance Considerations
- File size: For large CSV files (GB+), consider splitting the file (see the sketch after this list) or using parallelism settings
- S3 staging: Data is temporarily staged in S3 before loading; ensure sufficient S3 storage
- Compression: Enable compression (default) for faster S3 uploads unless CPU is constrained
- Batch processing: For multiple files, use a Directory Browser Snap to iterate and load each file
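One way to pre-split a large CSV before it reaches the pipeline is sketched below (file names and chunk size are arbitrary). Each chunk keeps a copy of the header row, so the CSV Parser configuration stays the same for every part:

import csv
import itertools

CHUNK_ROWS = 500_000  # arbitrary; tune to your file size

with open("big_export.csv", newline="") as src:  # hypothetical large file
    reader = csv.reader(src)
    header = next(reader)
    for part, rows in enumerate(
            iter(lambda: list(itertools.islice(reader, CHUNK_ROWS)), [])):
        with open(f"big_export_part{part}.csv", "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(header)  # repeat the header in every chunk
            writer.writerows(rows)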
Error Handling
Configure error handling to manage data quality issues:
- Maximum error count: Set threshold for acceptable failures (default: 100)
- Validate input data: Enable to catch data type mismatches early
- Accept invalid characters: Enable to handle UTF-8 encoding issues
- Connect the error view to capture and log failed records
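When records are rejected during the COPY, Redshift records the details in its stl_load_errors system table. A quick way to inspect recent failures (connection details are placeholders):

import psycopg2

conn = psycopg2.connect(host="my-cluster.example.redshift.amazonaws.com",
                        port=5439, dbname="dev", user="etl_user", password="...")
with conn, conn.cursor() as cur:
    # Most recent load errors first: offending file, line, column, and reason.
    cur.execute("""
        SELECT starttime, filename, line_number, colname, err_reason
        FROM stl_load_errors
        ORDER BY starttime DESC
        LIMIT 20;
    """)
    for row in cur.fetchall():
        print(row)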
Use Cases
This pipeline pattern is useful for:
- Daily batch loading of CSV exports from other systems
- Migrating historical data from flat files to Redshift
- Loading periodic reports or data feeds
- Initial data warehouse population from legacy systems
- ETL processes that receive data as file drops