Hadoop Snap Pack examples
| Example | Snaps used |
|---|---|
| List directory contents and verify file creation in HDFS: This example pipeline demonstrates how to use the Hadoop Directory Browser Snap to list the contents of a Hadoop file system directory, save the directory listing as a file in the same directory, and then verify that the new file was created. | |
| Write and read ZIP files in HDFS: This example pipeline demonstrates how to use the HDFS ZipFile Writer Snap to zip and write a new file into HDFS, and then use the HDFS ZipFile Reader Snap to unzip and check the contents of the newly created ZIP file. | |
| Read ORC files from HDFS and S3: This example demonstrates how to configure the ORC Reader Snap to read ORC files from both local HDFS instances and S3 instances. | |
| Write ORC files to HDFS and S3: This example demonstrates how to configure the ORC Writer Snap to write ORC files to both local HDFS instances and S3 instances, using Hive Metastore for schema definition. | |
| Read Parquet files from HDFS, S3, and Kerberos-secured clusters: This example demonstrates various ways to configure the Parquet Reader Snap to read Parquet files from HDFS, S3, and Kerberos-secured clusters, and to use the Catalog Query Snap for schema information. | |
| Use Parquet Writer with second input view for table metadata: This example pipeline demonstrates how to use the second input view in the Parquet Writer Snap to receive table metadata, which overrides other schema settings such as the schema in the Edit Schema box or Hive Metastore properties. | |
| Partition Parquet files by specific fields: This example demonstrates how to use the Partition By functionality in the Parquet Writer Snap to organize output files into subdirectories based on field values. | |
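
To picture what the Partition By example means by organizing output into subdirectories based on field values, here is a minimal Python sketch. It assumes Hive-style `field=value` directory names, a common convention for partitioned Parquet data that this page does not confirm for the Parquet Writer Snap. The function names, the `/data/sales` base path, and the sample records are all hypothetical, and the sketch only computes paths; it does not write Parquet files.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def partition_path(base_dir, record, partition_fields):
    """Return the subdirectory for a record, one level per partition field."""
    path = PurePosixPath(base_dir)
    for field in partition_fields:
        # Assumption: Hive-style "field=value" directory names.
        path = path / f"{field}={record[field]}"
    return str(path)


def group_by_partition(base_dir, records, partition_fields):
    """Group records by the subdirectory they would be written to."""
    groups = defaultdict(list)
    for record in records:
        groups[partition_path(base_dir, record, partition_fields)].append(record)
    return dict(groups)


# Hypothetical sample data, not taken from the example pipeline.
records = [
    {"country": "US", "year": 2023, "amount": 10},
    {"country": "US", "year": 2024, "amount": 20},
    {"country": "DE", "year": 2024, "amount": 30},
]

groups = group_by_partition("/data/sales", records, ["country", "year"])
for directory in sorted(groups):
    print(directory, len(groups[directory]))
```

The order of the partition fields determines the nesting order of the directories, so listing `country` before `year` places each year directory inside a country directory.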