Redshift - Select
Overview
This Snap fetches data from a Redshift database when you provide a table name and configure the connection. It writes the retrieved records to its output view, where a downstream Snap can process them.
ETL Transformations & Data Flow
This Snap enables the following ETL operations:
- Fetch data from an existing Redshift table using the user configuration, and feed it to downstream Snaps.
- JSON paths can be used in a query and will have values from an incoming document substituted into the query. However, documents missing values for a given JSON path will be written to the Snap's error view. After a query is executed, the query's results are merged into the incoming document overwriting any existing keys' values. The original document is output if there are no results from the query.
Queries produced by the Snap take a form equivalent to:
SELECT * FROM [table] WHERE [where clause] ORDER BY [ordering] LIMIT [limit] OFFSET [offset]
If you need more powerful functionality, use the Execute Snap instead.
- This is a Read-type Snap.
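To make the query format above concrete, here is a minimal Python sketch that assembles an equivalent statement from settings such as the table name, WHERE clause, ordering, limit, and offset. The function `build_select` and its parameter names are hypothetical and are not part of the Snap; the Snap's actual query generation may differ.

```python
from typing import Optional, Sequence


def build_select(
    table: str,
    schema: Optional[str] = None,
    output_fields: Sequence[str] = (),
    where: Optional[str] = None,
    order_by: Sequence[str] = (),
    limit: Optional[int] = None,
    offset: Optional[int] = None,
) -> str:
    """Assemble a SELECT statement from Snap-like settings (hypothetical helper).

    Identifiers are interpolated as-is for illustration only; real code
    should validate or quote them before building SQL.
    """
    # An empty field list means "all columns", mirroring SELECT *.
    columns = ", ".join(output_fields) if output_fields else "*"
    target = f"{schema}.{table}" if schema else table
    parts = [f"SELECT {columns} FROM {target}"]
    if where:
        parts.append(f"WHERE {where}")
    if order_by:
        parts.append("ORDER BY " + ", ".join(order_by))
    if limit is not None:
        parts.append(f"LIMIT {limit}")
    if offset is not None:
        parts.append(f"OFFSET {offset}")
    return " ".join(parts)


print(build_select("people", schema="public", where="age > 30",
                   order_by=["last", "first"], limit=10, offset=0))
# SELECT * FROM public.people WHERE age > 30 ORDER BY last, first LIMIT 10 OFFSET 0
```

Each optional clause is appended only when its setting is provided, which matches the idea that unset fields simply drop out of the generated query.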
Works in Ultra Tasks
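The JSON-path substitution and merge behavior described in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not the Snap's implementation: the helper names (`bind_where`, `merge_result`, `MissingPathError`) are hypothetical, and replacing each path with a `?` bind placeholder is an assumption about how values reach the database.

```python
import re
from typing import Any, Dict, List, Tuple

# Matches paths such as $id or $person.firstname (simplified JSON-path syntax).
PATH = re.compile(r"\$([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)")


class MissingPathError(KeyError):
    """Raised when a document has no value for a referenced path."""


def resolve(doc: Dict[str, Any], path: str) -> Any:
    """Walk a dotted path through nested dictionaries."""
    value: Any = doc
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            raise MissingPathError(path)
        value = value[key]
    return value


def bind_where(where: str, doc: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Replace each $path with '?' and collect the values in order (assumed style)."""
    params: List[Any] = []

    def replace(match: "re.Match[str]") -> str:
        params.append(resolve(doc, match.group(1)))
        return "?"

    return PATH.sub(replace, where), params


def merge_result(doc: Dict[str, Any], rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge each result row into the input document; row values win on key clashes.

    With no rows, the original document is output unchanged.
    """
    if not rows:
        return [doc]
    return [{**doc, **row} for row in rows]


doc = {"person": {"firstname": "Ada"}, "id": 1}
print(bind_where("first = $person.firstname", doc))
# ('first = ?', ['Ada'])
```

A caller would catch `MissingPathError` and route the offending document to an error collection, which corresponds to the Snap writing such documents to its error view.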
Prerequisites
A valid Redshift Account with the required permissions.
Limitations
If you use the PostgreSQL driver (org.postgresql.Driver) with the Redshift Snap Pack, it could result in errors if the data type provided to the Snap does not match the data type in the Redshift table schema. Either use the Redshift driver (com.amazon.redshift.jdbc42.Driver) or use the correct data type in the input document to resolve these errors.
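One way to supply the correct data types is to cast input values before they reach the Snap. The Python sketch below illustrates the idea only: the `CASTS` mapping, its column names, and the `coerce` helper are hypothetical, and in a pipeline this conversion would normally be done by an upstream Snap rather than by custom code. It assumes incoming values arrive as strings.

```python
from datetime import date
from decimal import Decimal
from typing import Any, Callable, Dict

# Hypothetical mapping from column name to the type the table schema expects.
CASTS: Dict[str, Callable[[Any], Any]] = {
    "id": int,                        # INTEGER column
    "balance": Decimal,               # DECIMAL column
    "joined": date.fromisoformat,     # DATE column, ISO 8601 string input
}


def coerce(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of doc with known columns cast to their expected types."""
    out = dict(doc)
    for column, cast in CASTS.items():
        # Leave missing or null values untouched.
        if out.get(column) is not None:
            out[column] = cast(out[column])
    return out


print(coerce({"id": "42", "balance": "12.50", "joined": "2021-03-04", "name": "Ada"}))
```

Columns that are not listed in the mapping, such as `name` here, pass through unchanged.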
Behavior Change
Starting with version main22460, in the Redshift Select Snap:
- When you create a table in Redshift, by default, all column names are displayed in lowercase in the output.
- When you enter column names in uppercase in the Output Field property, the column names are displayed in lowercase in the output.
Snap views
| Type | Description | Examples of upstream and downstream Snaps |
|---|---|---|
| Input | Document. If the input view is defined, the WHERE clause can substitute incoming values for a given expression (to use the Snap as a lookup). | |
| Output | This Snap has one output view by default and produces one document for each row in the table. A second view can be added to output the metadata for the table as a document. | |

Learn more about Error handling.
Examples
- Select Data with Pagination and Ordering: Select data with pagination and ordering options
- Query Redshift Tables with WHERE Clause: Query Redshift tables using WHERE clause conditions
Snap settings
| Field/Field set | Description |
|---|---|
| Label | Required. Specify a unique name for the Snap. Modify this to be more appropriate, especially if more than one of the same Snaps is in the pipeline. Default value: Redshift - Select. Example: Redshift - Select. |
| Schema name | The database schema name. Selecting a schema filters the Table name list to show only the tables within the selected schema. The property is suggestible and retrieves the available database schemas when you request suggestions. Default value: [None]. Example: public. |
| Table name | Required. The name of the table to execute the SELECT on. Example: people. |
| Where clause | The WHERE clause of the SELECT statement. This supports document value substitution (for example, $person.firstname is replaced with the value found at that path in the incoming document). However, you cannot use a value substitution after the word "IS" or "is". Default value: [None]. |
| Order by: Column names | Enter the columns in the order in which you want to sort. The default database sort order is used. |
| Limit offset | The starting row for the query. Default value: [None]. Example: 0. |
| Limit rows | The number of rows to return from the query. Default value: [None]. Example: 10. |
| Output fields | Enter or select the output field names for the SQL SELECT statement. To select all fields, leave this field at its default. Default value: [None]. Example: email, address, first, last. |
| Fetch Output Fields In Schema | Select this check box to include only the selected fields or columns in the Output Schema (second output view). If you do not provide any Output fields, all the columns are visible in the output. If you provide output fields, we recommend that you select the Fetch Output Fields In Schema check box. Default value: Not selected. |
| Pass through | If selected, the input document is passed through to the output view under the key 'original'. Default value: Selected. |
| Ignore empty result | If selected, no document is written to the output view when a SELECT operation does not produce any result. If this property is not selected and the Pass through property is selected, the input document is passed through to the output view. Default value: Not selected. |
| Auto commit | Select one of the options for this property to override the state of the Auto commit property on the account. Auto commit at the Snap level has three values: True, False, and Use account setting. Default value: False. |
| Number of retries | Specifies the maximum number of attempts to be made to receive a response. The request is terminated if the attempts do not result in a response. If the value is larger than 0, the Snap first downloads the target file into a temporary local file. If any error occurs during the download, the Snap waits for the time specified in the Retry interval and attempts to download the file again from the beginning. When the download is successful, the Snap streams the data from the temporary file to the downstream Pipeline. All temporary local files are deleted when they are no longer needed. Ensure that the local drive has sufficient free disk space to store the temporary local file. Default value: 0. Example: 3. |
| Retry interval (seconds) | Specifies the time interval between two successive retry requests. A retry happens only when the previous attempt resulted in an exception. Default value: 1. Example: 10. |
| Match data types | Conditional. This property applies only when the Output fields property contains one or more field values. If this property is selected, the Snap tries to match the output data types to those produced when the Output fields property is empty (SELECT * FROM ...). The output preview is then in the same format as when SELECT * FROM is implied and all the contents of the table are displayed. Default value: Not selected. |
| Staging mode | Required when the value in the Number of retries field is greater than 0. Specify where to store input documents between retries. |
| Snap execution | Select an option to specify how the Snap must be executed. Default value: Validate & Execute. Example: Execute only. |
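The interaction between Number of retries and Retry interval (seconds) can be pictured with a small Python sketch. This is not the Snap's implementation: `run_with_retries` and its parameters are hypothetical, and it assumes one initial attempt followed by at most `retries` further attempts, each made only after the previous attempt raised an exception.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(
    action: Callable[[], T],
    retries: int = 0,
    interval_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run action, retrying after exceptions up to `retries` times (assumed semantics)."""
    attempt = 0
    while True:
        try:
            return action()
        except Exception:
            # Out of retries: surface the last error to the caller.
            if attempt >= retries:
                raise
            attempt += 1
            # Wait for the configured interval before the next attempt.
            sleep(interval_seconds)
```

With the defaults (`retries=0`), the first failure is raised immediately, which mirrors a Number of retries value of 0. The `sleep` parameter exists only so that the wait can be replaced in tests.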
Troubleshooting
| Error | Reason | Resolution |
|---|---|---|
| type "e" does not exist | This issue occurs due to incompatibilities with the recent upgrade in the Postgres JDBC drivers. | Download the latest 4.1 Amazon Redshift driver, use it in your Redshift Account configuration, and run the Pipeline again. |