Write ORC files to HDFS and S3

This example shows how to configure the ORC Writer Snap to write ORC files to a local HDFS instance and to S3, using a table in the Hive Metastore to define the schema.

  1. Configure the ORC Writer Snap to write to a local HDFS instance.
    • Directory: Enter the HDFS directory path where the ORC file should be written (for example, /tmp/orc-output).
    • Hive Metastore URL: Specify the URL of the Hive Metastore from which the Snap reads the schema.
    • Database Name: Enter the database name (for example, masterdb).
    • Table Name: Enter the table name to read the schema from (for example, employee_orc).

    The Snap uses the Hive Metastore to read the schema from the specified table and writes the ORC file to the HDFS directory.

  2. Configure the ORC Writer Snap to write to S3.
    • Directory: Enter the S3 path where the ORC file should be written (for example, s3://bucket-name/orc-output).
    • Hive Metastore URL: Specify the URL of the Hive Metastore from which the Snap reads the schema.
    • Database Name: Enter the database name.
    • Table Name: Enter the table name to read the schema from.

    The Snap uses the Hive Metastore to read the schema and writes the ORC file to the specified S3 location.

In either configuration, the ORC Writer Snap writes the ORC file to the target location (HDFS or S3) using the schema of the table defined in the Hive Metastore.
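
The two configurations differ only in the Directory value. The following Python sketch illustrates that point. It is not SnapLogic code: the OrcWriterSettings class, its field names, and the target() helper are hypothetical mirrors of the Snap fields described in this example. The Metastore address is a placeholder, and the S3 configuration reuses the database and table names from the HDFS step as an assumption.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class OrcWriterSettings:
    """Hypothetical mirror of the four Snap fields used in this example."""

    directory: str
    hive_metastore_url: str
    database_name: str
    table_name: str

    def target(self) -> str:
        """Classify the Directory value by its URL scheme (illustrative only)."""
        scheme = urlparse(self.directory).scheme
        if scheme == "s3":
            return "s3"
        if scheme in ("", "hdfs"):
            # A bare path such as /tmp/orc-output has no scheme.
            return "hdfs"
        raise ValueError(f"unexpected scheme in Directory: {scheme!r}")


# Placeholder address; replace with the URL of your own Hive Metastore.
METASTORE_URL = "thrift://metastore-host:9083"

# Step 1: write to HDFS.
hdfs_settings = OrcWriterSettings(
    directory="/tmp/orc-output",
    hive_metastore_url=METASTORE_URL,
    database_name="masterdb",
    table_name="employee_orc",
)

# Step 2: write to S3 (database and table names assumed to match step 1).
s3_settings = OrcWriterSettings(
    directory="s3://bucket-name/orc-output",
    hive_metastore_url=METASTORE_URL,
    database_name="masterdb",
    table_name="employee_orc",
)

print(hdfs_settings.target())
print(s3_settings.target())
```

Because the schema source (Metastore URL, database, and table) is the same in both objects, switching a pipeline between HDFS and S3 in this sketch means changing only the directory field.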