RC File Parser
The RC File Parser Snap parses RC file data and converts it into documents that downstream Snaps can process.
Overview
This Snap parses RC file data and converts it into documents that downstream Snaps can process.
- This is a Parse-type Snap.
- Works in Ultra Tasks.
Snap views
| Input/Output | Type of View | Examples of Upstream and Downstream Snaps |
|---|---|---|
| Input | This Snap has exactly one binary input view. | The upstream Snap should be a binary data source Snap sourcing an RC File from some data store. |
| Output | This Snap has exactly one document output view. | The RC File Parser outputs table data with columns and rows. The downstream Snap should be able to parse this information. |
| Error | This Snap has at most one document error view and produces zero or more documents in the view. | |
Supported Accounts
Accounts are not used with this Snap.
Snap settings
| Field Name | Description |
|---|---|
| Label* String | Required. Specify a unique name for the Snap. Modify this to be more descriptive, especially if the pipeline contains more than one instance of this Snap. Default value: RC File Parser. Example: RC File Parser |
| Hive Metastore URL String/Expression | The Hive Metastore URI, such as thrift://localhost:9083. Default value: [None]. Example: thrift://hive.metastore.com:9083 |
| Database String/Expression/Suggestion | The database that holds the schema for the incoming RC file data. Default value: [None]. Example: hive_db |
| Table String/Expression/Suggestion | The table whose schema should be used for parsing the incoming RC file data. Default value: [None]. Example: hive_tbl |
| Column definition | Manually configure the column definition for the incoming RC file data. Default value: [None] |
| Snap Execution | Select one of the three modes in which the Snap executes. Default value: Execute only. Example: Validate & Execute |
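
The Hive Metastore URL field expects a Thrift URI with a host and port. As a rough illustration only, the following Python sketch checks a value of that shape before you paste it into the Snap. The function name `parse_metastore_uri` is hypothetical and is not part of the Snap or of any SnapLogic API; the Snap may apply different or additional validation.

```python
from typing import Tuple
from urllib.parse import urlsplit


def parse_metastore_uri(uri: str) -> Tuple[str, int]:
    """Split a Hive Metastore URI into (host, port).

    Hypothetical helper for illustration; raises ValueError when the
    value does not look like thrift://<host>:<port>.
    """
    parts = urlsplit(uri)
    if parts.scheme != "thrift":
        raise ValueError(f"expected a thrift:// URI, got: {uri!r}")
    # parts.port itself raises ValueError if the port is not a valid number.
    if not parts.hostname or parts.port is None:
        raise ValueError(f"URI must include both host and port: {uri!r}")
    return parts.hostname, parts.port


if __name__ == "__main__":
    # Example value taken from the settings table above.
    host, port = parse_metastore_uri("thrift://hive.metastore.com:9083")
    print(host, port)
```

A check like this can catch a missing port or a wrong scheme early, before the pipeline is validated.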
Troubleshooting
Writing to S3 files with HDFS version CDH 5.8 or later
When running HDFS with CDH 5.8 or later, the Hadoop Snap Pack may fail to write to S3 files. To work around this, make the following changes in Cloudera Manager:
- Go to HDFS configuration.
- In Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml, add an entry with the following details:
  - Name: fs.s3a.threads.max
  - Value: 15
- Click Save.
- Restart all the nodes.
- Under Restart Stale Services, select Re-deploy client configuration.
- Click Restart Now.
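
The name/value entry in the steps above corresponds to a standard Hadoop `<property>` element in core-site.xml. The following Python sketch builds that element, which may be useful if your Cloudera Manager version lets you enter the safety valve as XML (an assumption; the steps above use the name/value form). The helper name `core_site_property` is hypothetical.

```python
import xml.etree.ElementTree as ET


def core_site_property(name: str, value: str) -> str:
    """Render one Hadoop configuration property as an XML string.

    Hypothetical helper for illustration; it only formats text and does
    not talk to Cloudera Manager or modify any cluster.
    """
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    # Name and value taken from the troubleshooting steps above.
    print(core_site_property("fs.s3a.threads.max", "15"))
    # prints <property><name>fs.s3a.threads.max</name><value>15</value></property>
```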