RC File Formatter
The RC File Formatter Snap formats incoming documents from upstream Snaps into the RC (Record Columnar) file format.
Overview
This Snap formats the incoming documents from upstream Snaps into the RC (Record Columnar) file format, which stores data in a columnar layout so that aggregate queries can be answered faster.
- This is a Format-type Snap.
- Works in Ultra Tasks.
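To illustrate why a row-columnar layout helps aggregate queries, here is a minimal Python sketch (illustrative only; it shows the storage idea, not the actual RC file binary layout or any SnapLogic API):

```python
# Illustrative sketch of the row-columnar idea behind RC files:
# rows are pivoted into per-column value lists, so an aggregate
# over one column never reads the other columns.

def to_columnar(rows, columns):
    """Pivot a list of row dicts into a column -> list-of-values mapping."""
    return {col: [row[col] for row in rows] for col in columns}

rows = [
    {"id": 1, "amount": 10.0, "city": "NYC"},
    {"id": 2, "amount": 25.5, "city": "SF"},
    {"id": 3, "amount": 4.5,  "city": "NYC"},
]

columnar = to_columnar(rows, ["id", "amount", "city"])

# An aggregate such as SUM(amount) touches only one column's values.
total = sum(columnar["amount"])
print(total)  # 40.0
```

A real RC file additionally splits the table into row groups and compresses each column run, but the access pattern shown above is the core reason the format suits aggregation.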
Snap views
| Input/Output | Type of View | Examples of Upstream and Downstream Snaps |
|---|---|---|
| Input | Document | This Snap has at most one document input view. The upstream Snap should output table-oriented data with columns and rows. |
| Output | Document | The RC File Formatter Snap outputs binary data, so the downstream Snap must be a data output Snap, for example, HDFS Writer. |
| Error | Document | This Snap has at most one document error view and produces zero or more documents in the view. |
Supported Accounts
Accounts are not used with this Snap.
Snap settings
| Field Name | Description |
|---|---|
| Label*<br>Default value: RC File Formatter<br>Example: RC File Formatter<br>Type: String | Required. Specify a unique name for the Snap. Modify this to be more specific, especially if the pipeline contains more than one Snap of the same type. |
| Hive Metastore URL<br>Default value: [None]<br>Example: thrift://hive.metastore.com:9083<br>Type: String/Expression | The Hive Metastore URI, for example: thrift://localhost:9083 |
| Database<br>Default value: [None]<br>Example: hive_db<br>Type: String/Expression/Suggestion | The database that holds the schema for the outgoing RC file data. |
| Table<br>Default value: [None]<br>Example: hive_tbl<br>Type: String/Expression/Suggestion | The table whose schema is used for the outgoing RC file data. |
| Column paths*<br>Default value: [None]<br>Example: Column Name: Fun; Column Path: $column_from_input_data; Column Type: string<br>Type: Table | Required. The paths where the column values appear in the input document. |
| Snap Execution<br>Default value: Execute only<br>Example: Validate & Execute<br>Type: Dropdown list | Select one of the following three modes in which the Snap executes:<br>- Validate & Execute: Performs limited execution of the Snap and generates a data preview during pipeline validation, then performs full execution during pipeline runtime.<br>- Execute only: Performs full execution of the Snap during pipeline runtime without generating preview data.<br>- Disabled: Disables the Snap and all Snaps downstream from it. |
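As a rough illustration of how the Column paths table maps input document fields to output columns, the following Python sketch (hypothetical helper names; not the Snap's actual path evaluator) resolves a `$`-prefixed path against an incoming document:

```python
# Hypothetical sketch of resolving "$"-prefixed column paths against an
# incoming document; the Snap's real path syntax and evaluation are richer.

def resolve_path(document, path):
    """Resolve a simple path like '$a.b' against a nested dict."""
    value = document
    for key in path.lstrip("$").split("."):
        value = value[key]
    return value

# One tuple per row of the Column paths table: (name, path, type).
column_paths = [("Fun", "$column_from_input_data", "string")]

document = {"column_from_input_data": "hello"}

row = {name: resolve_path(document, path) for name, path, _type in column_paths}
print(row)  # {'Fun': 'hello'}
```

Each incoming document is evaluated against every configured path, producing one output row whose columns carry the names and types declared in the table.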
Troubleshooting
Writing to S3 files with HDFS version CDH 5.8 or later
When running an HDFS version of CDH 5.8 or later, the Hadoop Snap Pack may fail to write to S3 files. To resolve this, make the following changes in Cloudera Manager:
- Go to the HDFS configuration.
- In Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml, add an entry with the following details:
  - Name: fs.s3a.threads.max
  - Value: 15
- Click Save.
- Restart all the nodes.
- Under Restart Stale Services, select Re-deploy client configuration.
- Click Restart Now.
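After the client configuration is re-deployed, the entry added above appears in core-site.xml roughly as follows (a sketch of the resulting property; manage it through Cloudera Manager rather than editing the file by hand):

```xml
<property>
  <name>fs.s3a.threads.max</name>
  <value>15</value>
</property>
```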