Redshift - Table List
Overview
This Snap outputs a list of tables in a database. The Snap will connect to the database, read its metadata, and output a document for each table found in the database. The table names are output in a topological order so that tables with the fewest dependencies are output first. In other words, if table A has a foreign key reference to table B, then table B will be output before A. The ordering is intended to ease the process of replicating a group of tables from one database to another.
Expected input: [None]
Expected output: Documents with the following fields:
- name - The fully-qualified name of the table. To use the table name in another Snap, like a Select or Insert, you can pass it through a ForEach Snap to another pipeline with the Select or Insert.
- type - The type of table. This value is currently fixed to the string "TABLE".
- dependents - (If Compute table graph is selected) A list of table names that have references to this table, including this table.
Replicating a Subset of Tables
The output of the Table List Snap can be directly used to replicate an entire database. However, if you are only interested in a subset of tables, you can use a Filter Snap to select the table names you are interested in as well as the tables that they reference. For example, given the following diamond-shaped table graph where A depends on B and C and they both depend on D:
A
/ \
B C
\ /
D
The Table List will output the following documents:
name=D; dependents=[A, B, C, D]
name=C; dependents=[A, C]
name=B; dependents=[A, B]
name=A; dependents=[A]
So, if you wanted to copy just table 'A' and its dependencies, you can add a Filter Snap with the following expression:
$.dependents.indexOf('A') != -1
The filter will then remove any extra tables that happen to be in the schema.
ETL Transformations & Data Flow
This snap is a data source. It works by performing a standard JDBC DatabaseMetaData#getTables() query for the tables within the database. The Schema name value is used to populate the schemaPattern parameter in that query.
This snap does not require any temporary files or other external resources.
- This is a Read-type Snap.
Works in Ultra Tasks
Prerequisites
- The Redshift account does need to specify the Endpoint, Database name, Username, and Password.
- The Redshift account does not need to specify the S3 Access-key ID, S3 Secret key, S3 Bucket, and S3 Folder.
- The Redshift account security settings does need to allow access from the IP Address of the cloudplex or groundplex.
Known Issues
If you use the PostgreSQL driver (org.postgresql.Driver) with the Redshift Snap Pack, it could result in errors if the data type provided to the Snap does not match the data type in the Redshift table schema. Either use the Redshift driver (com.amazon.redshift.jdbc42.Driver) or use the correct data type in the input document to resolve these errors.
Snap views
| Type | Description | Examples of upstream and downstream Snaps |
|---|---|---|
| Input |
Document All input documents are ignored. |
|
| Output |
This Snap has exactly one document output view. The output document contains:
Output Examples: Example output upon successful execution without Compute table graph selected:
Example output upon successful execution with Compute table graph selected:
|
|
| Learn more about Error handling. | ||
Examples
Examples for this Snap are coming soon.
Snap settings
| Field/Field set | Description | ||
|---|---|---|---|
|
Label
|
Required. Required. Specify a unique name for the Snap. Modify this to be more appropriate, especially if more than one of the same Snaps is in the pipeline. Default value: Redshift - Table List Example: Redshift - Table List |
||
|
Schema name
|
The database schema name. In case it is not defined, then the suggestion for the table name will retrieve all tables names of all schemas. The property is suggestible and will retrieve available database schemas during suggest values. Note: This field appears to support expressions but does not.
Default value: [None] Example: test |
||
|
Compute table graph
|
Computes the dependents among tables and returns each table with a list of tables it has foreign key references to. The ordering of outputted tables is from least dependent to most-dependent. Note: Turning on this option will significantly slow down the Snap; it should be left as off unless you need it.
Default value: Not selected |
||
|
Snap execution
|
Select an option to specify how the Snap must be executed. Available options are:
Default value: Validate & Execute Example: Execute only |
||
Troubleshooting
| Error | Reason | Resolution |
|---|---|---|
| type "e" does not exist | This issue occurs due to incompatibilities with the recent upgrade in the Postgres JDBC drivers. | Download the latest 4.1 Amazon Redshift driver here and use this driver in your Redshift Account configuration and retry running the Pipeline. |