Write and read ZIP files in HDFS

This example pipeline demonstrates how to use the HDFS ZipFile Writer Snap to zip a file and write it to HDFS. It then shows how to use the HDFS ZipFile Reader Snap to unzip the newly created ZIP file and check its contents.

Download this pipeline.

  1. Use a Hadoop Directory Browser Snap to check the contents of the target directory before writing.
    • Directory: Enter the HDFS directory path where you want to write the ZIP file.
    • Filter: Use * to list all files in the directory.

    This Snap outputs the initial list of files in the directory.

  2. Use a JSON Generator or File Reader Snap to generate the file to upload.

    Create or read the file content that you want to zip and write to HDFS.

  3. Configure the HDFS ZipFile Writer Snap to zip and write the file to HDFS.
    • Directory: Enter the HDFS directory path where the ZIP file should be written.
    • Filename: Specify the name for the ZIP file (for example, test.zip).
    • Compression Level: Select the desired compression level.

    The Snap zips the input file and writes it to the specified HDFS directory.

  4. Use a second Hadoop Directory Browser Snap to verify that the ZIP file was created.

    Configure it with the same directory path to confirm the new ZIP file appears in the listing.

  5. Configure the HDFS ZipFile Reader Snap to read and unzip the file.
    • Directory: Enter the HDFS directory path containing the ZIP file.
    • Filename: Specify the ZIP file to read (for example, test.zip).
    • Filter: Optionally specify a filter to extract only certain files from the ZIP archive.

    The Snap reads the ZIP file, extracts its contents, and outputs the unzipped file data.
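
The Snaps handle HDFS access and ZIP encoding internally. As a rough local analogue of steps 3 and 5, the following Python sketch uses the standard-library zipfile module. It zips content in memory at a chosen compression level, then extracts only the entries whose names match a filter. It does not connect to HDFS, and the function names zip_entries and extract_matching are hypothetical. It also assumes that the Reader's Filter behaves like a shell-style glob pattern, which may not match the Snap's actual semantics.

```python
import io
import zipfile
from fnmatch import fnmatch


def zip_entries(entries, compresslevel=6):
    """Build ZIP archive bytes from a {name: bytes} mapping
    (hypothetical local stand-in for the HDFS ZipFile Writer Snap)."""
    buffer = io.BytesIO()
    # With ZIP_DEFLATED, compresslevel ranges from 0 (fastest) to 9 (smallest output).
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED,
                         compresslevel=compresslevel) as archive:
        for name, data in entries.items():
            archive.writestr(name, data)
    return buffer.getvalue()


def extract_matching(archive_bytes, pattern="*"):
    """Return {name: bytes} for file entries whose names match a glob pattern
    (hypothetical stand-in for the HDFS ZipFile Reader Snap; glob-style Filter assumed)."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return {
            info.filename: archive.read(info)
            for info in archive.infolist()
            if not info.is_dir() and fnmatch(info.filename, pattern)
        }


# Hypothetical sample content standing in for the JSON Generator output.
archive = zip_entries({"data.json": b'{"id": 1}', "notes.txt": b"hello"}, compresslevel=9)
print(extract_matching(archive, "*.json"))  # {'data.json': b'{"id": 1}'}
```

With the default pattern "*", every file entry is returned. This corresponds to leaving the Reader's Filter at its broadest setting.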

On successful execution:

  • The HDFS ZipFile Writer creates a compressed ZIP file in the target HDFS directory.
  • The second Hadoop Directory Browser confirms that the ZIP file exists.
  • The HDFS ZipFile Reader successfully extracts and outputs the file contents from the ZIP archive.
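
The existence check amounts to comparing the listing from step 1 with the listing from step 4. The following is a minimal Python sketch. It assumes that each Directory Browser's output has been reduced to a plain list of file names and that the Filter is a glob pattern. The helper new_files and the sample names are hypothetical.

```python
from fnmatch import fnmatch


def new_files(before, after, pattern="*"):
    """Return names present in `after` but not in `before`, keeping only
    names that match the glob pattern (assumed to mirror the Filter setting)."""
    added = set(after) - set(before)
    return sorted(name for name in added if fnmatch(name, pattern))


# Hypothetical listings from the first and second Directory Browser Snaps.
before = ["input.csv"]
after = ["input.csv", "test.zip"]
print(new_files(before, after))  # ['test.zip']
```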

To successfully reuse pipelines:
  1. Download and import the pipeline into SnapLogic.
  2. Configure Snap accounts as applicable.
  3. Provide pipeline parameters as applicable.