Use Parquet Writer with second input view for table metadata

This example pipeline shows how to pass table metadata to the Parquet Writer Snap through its second input view. The schema received on that view overrides other schema settings, such as the schema in the Edit Schema box and the Hive Metastore properties.

Download this pipeline.

  1. Enable the second input view on the Parquet Writer Snap.

    When you enable the second input view, the Snap ignores its other schema settings and accepts the schema only from that view.

  2. Configure a Snap to provide table metadata to the second input view.

    Use a Snap such as Hive Execute or Catalog Query to retrieve table metadata and schema information.

    • Connect the metadata output from this Snap to the second input view of the Parquet Writer.
    • The metadata should include column names, data types, and other schema information.
  3. Configure the primary data input for the Parquet Writer.

    Connect your data source to the first input view of the Parquet Writer.

  4. Configure the Parquet Writer Snap settings.
    • Directory: Enter the output directory path (HDFS or S3).
    • Filename: Specify the output Parquet filename.
    • Schema: No entry is needed. The Snap automatically uses the schema from the second input view, which overrides any Hive Metastore URL setting.

    The Snap writes the Parquet file using the schema provided through the second input view.
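
The steps above amount to pairing a metadata stream (column names and data types) with a data stream (rows). The following Python sketch illustrates that idea only; it is not SnapLogic code. The metadata layout (`name` and `type` keys), the `CASTS` mapping, and the `apply_schema` helper are hypothetical, and dropping fields that the metadata does not name is a choice made for this sketch rather than documented Snap behavior.

```python
# Hypothetical metadata layout: one entry per column, similar in spirit to
# what a metadata-producing Snap might send to the second input view.
TABLE_METADATA = [
    {"name": "id", "type": "int"},
    {"name": "name", "type": "string"},
    {"name": "price", "type": "double"},
]

# Illustrative mapping from declared type names to Python casts (assumption).
CASTS = {"int": int, "string": str, "double": float}


def apply_schema(rows, metadata):
    """Shape each data row to the columns and types declared in the metadata."""
    shaped = []
    for row in rows:
        shaped_row = {}
        for column in metadata:
            value = row.get(column["name"])
            # Keep missing values as None; cast everything else.
            shaped_row[column["name"]] = (
                None if value is None else CASTS[column["type"]](value)
            )
        shaped.append(shaped_row)
    return shaped


# Rows as they might arrive on the first (data) input view.
rows = [{"id": "1", "name": "widget", "price": "9.5", "extra": "ignored"}]
shaped_rows = apply_schema(rows, TABLE_METADATA)
```

In this sketch the metadata alone decides which columns are written and how their values are typed, which mirrors how the second input view takes precedence over the other schema settings.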

When the second input view is enabled, the Parquet Writer uses only the schema from that view and ignores the following:

  • The schema defined in the Edit Schema box
  • Hive Metastore-related properties
  • Any other schema configuration settings

When the second input view is disabled, the Snap takes the schema from the Hive Metastore URL field or the Edit Schema settings.
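
The precedence rule described here can be summarized in a few lines of Python. This is a minimal sketch of the decision logic as stated in this example, not the Snap's implementation; the function name `resolve_schema` and its parameters are hypothetical, and `settings_schema` stands in for whichever schema the Snap settings (Hive Metastore URL or Edit Schema) would supply.

```python
def resolve_schema(second_view_enabled, second_view_schema, settings_schema):
    """Return the schema the writer would use under the rule described above."""
    if second_view_enabled:
        # Only the second input view counts; the Edit Schema box and
        # Hive Metastore properties are ignored.
        return second_view_schema
    # View disabled: fall back to the schema from the Snap settings.
    return settings_schema


metadata_schema = ["id:int", "name:string"]
edit_box_schema = ["id:string"]

enabled_choice = resolve_schema(True, metadata_schema, edit_box_schema)
disabled_choice = resolve_schema(False, None, edit_box_schema)
```

With the view enabled, `enabled_choice` is the metadata schema even though a settings schema is also present; with the view disabled, `disabled_choice` is the settings schema.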

To successfully reuse pipelines:
  1. Download and import the pipeline into SnapLogic.
  2. Configure Snap accounts as applicable.
  3. Provide pipeline parameters as applicable.