Redshift - Table List

Overview

This Snap outputs a list of tables in a database. The Snap will connect to the database, read its metadata, and output a document for each table found in the database. The table names are output in a topological order so that tables with the fewest dependencies are output first. In other words, if table A has a foreign key reference to table B, then table B will be output before A. The ordering is intended to ease the process of replicating a group of tables from one database to another.

Expected input: [None]

Expected output: Documents with the following fields:

name - The fully-qualified name of the table. To use the table name in another Snap, like a Select or Insert, you can pass it through a ForEach Snap to another pipeline with the Select or Insert.
type - The type of table. This value is currently fixed to the string "TABLE".
dependents - (If Compute table graph is selected) A list of table names that have references to this table, including this table.

Replicating a Subset of Tables

The output of the Table List Snap can be directly used to replicate an entire database. However, if you are only interested in a subset of tables, you can use a Filter Snap to select the table names you are interested in as well as the tables that they reference. For example, given the following diamond-shaped table graph where A depends on B and C and they both depend on D:

The Table List will output the following documents:

name=D; dependents=[A, B, C, D]
name=C; dependents=[A, C]
name=B; dependents=[A, B]
name=A; dependents=[A]

So, if you wanted to copy just table 'A' and its dependencies, you can add a Filter Snap with the following expression:

$.dependents.indexOf('A') != -1

The filter will then remove any extra tables that happen to be in the schema.

ETL Transformations & Data Flow

This snap is a data source. It works by performing a standard JDBC DatabaseMetaData#getTables() query for the tables within the database. The Schema name value is used to populate the schemaPattern parameter in that query.

This snap does not require any temporary files or other external resources.

This is a Read-type Snap.
Works in Ultra Tasks

Prerequisites

The Redshift account does need to specify the Endpoint, Database name, Username, and Password.
The Redshift account does not need to specify the S3 Access-key ID, S3 Secret key, S3 Bucket, and S3 Folder.
The Redshift account security settings does need to allow access from the IP Address of the cloudplex or groundplex.

Known Issues

If you use the PostgreSQL driver (org.postgresql.Driver) with the Redshift Snap Pack, it could result in errors if the data type provided to the Snap does not match the data type in the Redshift table schema. Either use the Redshift driver (com.amazon.redshift.jdbc42.Driver) or use the correct data type in the input document to resolve these errors.

Snap views


Type	Description	Examples of upstream and downstream Snaps
Input	Document All input documents are ignored.
Output	This Snap has exactly one document output view. The output document contains: A "name" field containing the fully-qualified name of the table. A "type" field containing the type of table (currently limited to "TABLE"). An optional "dependents" field containing a list of tables with foreign references to this table. Output Examples: Example output upon successful execution without Compute table graph selected: `[ { "name": "\"demo\".\"'redshiftbulkload'\"", "type": "TABLE" }, { "name": "\"demo\".\"'shankaradp'\"", "type": "TABLE" }, { "name": "\"demo\".\"'shankaradp1'\"", "type": "TABLE" }, { "name": "\"demo\".\"'shankardemo'\"", "type": "TABLE" }, { "name": "\"demo\".\"'shankardemo1'\"", "type": "TABLE" } ]` Example output upon successful execution with Compute table graph selected: `[ { "name": "\"demo\".\"account\"", "type": "TABLE", "dependents": [ "\"demo\".\"account\"" ] }, { "name": "\"demo\".\"account_transaction\"", "type": "TABLE", "dependents": [ "\"demo\".\"account_transaction\"" ] }, { "name": "\"demo\".\"accounts_oy\"", "type": "TABLE", "dependents": [ "\"demo\".\"accounts_oy\"" ] } ]`	Mapper JSON Formatter File Writer
Learn more about Error handling.

Examples

Examples for this Snap are coming soon.

Snap settings

Note: Learn about the common controls in the Snap settings dialog.


Field/Field set	Description
Label `String`	Required. Required. Specify a unique name for the Snap. Modify this to be more appropriate, especially if more than one of the same Snaps is in the pipeline. Default value: Redshift - Table List Example: Redshift - Table List
Schema name `String/Expression/ Suggestion`	The database schema name. In case it is not defined, then the suggestion for the table name will retrieve all tables names of all schemas. The property is suggestible and will retrieve available database schemas during suggest values. Note: This field appears to support expressions but does not. Default value: [None] Example: test
Compute table graph `Checkbox`	Computes the dependents among tables and returns each table with a list of tables it has foreign key references to. The ordering of outputted tables is from least dependent to most-dependent. Note: Turning on this option will significantly slow down the Snap; it should be left as off unless you need it. Default value: Not selected
Snap execution `Dropdown list`	Select an option to specify how the Snap must be executed. Available options are: Validate & Execute: Performs limited execution of the Snap (up to 50 records) during Pipeline validation; performs full execution of the Snap (unlimited records) during Pipeline execution. Execute only: Performs full execution of the Snap during Pipeline execution; does not execute the Snap during Pipeline validation. Disabled: Disables the Snap and, by extension, its downstream Snaps. Default value: Validate & Execute Example: Execute only

Troubleshooting


Error	Reason	Resolution
type "e" does not exist	This issue occurs due to incompatibilities with the recent upgrade in the Postgres JDBC drivers.	Download the latest 4.1 Amazon Redshift driver here and use this driver in your Redshift Account configuration and retry running the Pipeline.