RC File Parser

The RC File Parser Snap parses RC file data and converts it into documents that can be processed by downstream Snaps.

Overview

This Snap parses RC (Record Columnar) file data and converts it into documents that downstream Snaps can process.

Snap views

Input/Output | Type of View | Examples of Upstream and Downstream Snaps
Input | This Snap has exactly one binary input view. | The upstream Snap should be a binary data source Snap sourcing an RC File from some data store.
Output | This Snap has exactly one document output view. | The RC File Parser outputs table data with columns and rows. The downstream Snap should be able to parse this information.
Error | This Snap has at most one document error view and produces zero or more documents in the view. |

Supported Accounts

Accounts are not used with this Snap.

Snap settings

Note: Learn about the common controls in the Snap settings dialog.
Field Name Description
Label*

String

Required. Specify a unique name for the Snap. Change the default to something more descriptive, especially if the pipeline contains more than one instance of this Snap.

Default value: RC File Parser

Example: RC File Parser

Hive Metastore URL

String/Expression

Specify the URI of the Hive Metastore, such as thrift://localhost:9083.

Default value: [None]

Example: thrift://hive.metastore.com:9083
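
The expected shape of this value (a thrift scheme, a host, and a port) can be checked before it is entered in the Snap. The following is a minimal sketch, not part of the Snap itself; the function name check_metastore_uri is hypothetical and only the Python standard library is assumed.

```python
from typing import Tuple
from urllib.parse import urlparse


def check_metastore_uri(uri: str) -> Tuple[str, int]:
    """Return (host, port) if uri looks like thrift://host:port, else raise ValueError.

    Hypothetical helper for illustration; the Snap performs its own validation.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "thrift":
        raise ValueError(f"expected thrift:// scheme, got {parsed.scheme!r}")
    if not parsed.hostname or parsed.port is None:
        raise ValueError("URI must include both a host and a port")
    return parsed.hostname, parsed.port


if __name__ == "__main__":
    print(check_metastore_uri("thrift://hive.metastore.com:9083"))
```

A check like this only confirms the format of the URI; it does not confirm that a Metastore is reachable at that address.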

Database

String/Expression/Suggestion

Database which holds the schema for the incoming RC File data.

Default value: [None]

Example: hive_db

Table

String/Expression/Suggestion

Table whose schema should be used for parsing the incoming RC file data.

Default value: [None]

Example: hive_tbl

Column definition

Manually configure the column definition for the incoming RC File data.

  • Column Name: Name of the column.
  • Column Type: Data type of the column (string, int, etc.).

Default value: [None]

Example:
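
As a rough illustration of how a column definition relates to the output documents, the sketch below pairs each column name and type with one value from a row and builds a dictionary, similar in spirit to one output document. The column names, the sample row, and the type converters are all assumptions made for this example; they are not taken from the Snap and do not describe its internal logic.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical column definition: (Column Name, Column Type) pairs.
COLUMNS: List[Tuple[str, str]] = [
    ("id", "int"),
    ("name", "string"),
    ("price", "double"),
]

# Simplified converters for a few primitive types (an assumption for this sketch).
CONVERTERS = {"int": int, "bigint": int, "double": float, "string": str}


def row_to_document(
    row: Sequence[str], columns: List[Tuple[str, str]] = COLUMNS
) -> Dict[str, Any]:
    """Map one row of raw string values onto the column definition."""
    if len(row) != len(columns):
        raise ValueError(f"expected {len(columns)} values, got {len(row)}")
    return {
        name: CONVERTERS[col_type](value)
        for (name, col_type), value in zip(columns, row)
    }


if __name__ == "__main__":
    print(row_to_document(["1", "widget", "9.5"]))
```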

Snap Execution

Dropdown list

Select one of the following three modes in which the Snap executes:

  • Validate & Execute: Performs limited execution of the Snap, and generates a data preview during Pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during Pipeline runtime.
  • Execute only: Performs full execution of the Snap during Pipeline execution without generating preview data.
  • Disabled: Disables the Snap and all Snaps that are downstream from it.

Default value: Execute only

Example: Validate & Execute

Troubleshooting

Writing to S3 files with HDFS version CDH 5.8 or later

When running HDFS on CDH 5.8 or later, the Hadoop Snap Pack may fail to write to S3 files. To resolve this, make the following changes in Cloudera Manager:

  1. Go to HDFS configuration.
  2. In Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml, add an entry with the following details:
    • Name: fs.s3a.threads.max
    • Value: 15
  3. Click Save.
  4. Restart all the nodes.
  5. Under Restart Stale Services, select Re-deploy client configuration.
  6. Click Restart Now.
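
If the safety valve entry from step 2 is supplied in XML form rather than as separate Name and Value fields (an assumption about how the snippet is entered), it follows the standard Hadoop property layout. The sketch below generates that fragment with the Python standard library; the helper name property_snippet is hypothetical.

```python
import xml.etree.ElementTree as ET


def property_snippet(name: str, value: str) -> str:
    """Build a Hadoop-style <property> element as a string."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    # Name and value taken from step 2 above.
    print(property_snippet("fs.s3a.threads.max", "15"))
```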