This example pipeline demonstrates how to use the Hadoop Directory Browser Snap to list the contents of a Hadoop file system directory, save the directory listing as a file in the same directory, and then verify that the new file was created.
Download this pipeline.
- Configure the first Hadoop Directory Browser Snap to list directory contents.
  - Directory: Enter the HDFS directory path to browse (for example, hdfs://cdhclusterqa-4-1.clouddev.snaplogic.com:8020/user/snaplogic/0ut).
  - Filter: Use * to list all files.
  This Snap outputs the list of files in the directory; the number of files (N) depends on the directory's current contents.
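The listing-with-filter behavior of this step can be sketched in Python. This is a minimal local stand-in, not the Snap's implementation: a temporary local directory plays the role of the HDFS directory, and `fnmatch` applies the glob-style Filter setting. The function and file names are illustrative assumptions.

```python
import fnmatch
import os
import tempfile

def browse_directory(directory, filter_pattern="*"):
    """List files in a directory whose names match a glob-style filter,
    mimicking the Directory and Filter settings of the Browser Snap."""
    return sorted(
        name
        for name in os.listdir(directory)
        if fnmatch.fnmatch(name, filter_pattern)
        and os.path.isfile(os.path.join(directory, name))
    )

# Demo against a temporary local directory standing in for HDFS.
with tempfile.TemporaryDirectory() as hdfs_dir:
    for name in ("a.csv", "b.csv", "notes.txt"):
        open(os.path.join(hdfs_dir, name), "w").close()
    print(browse_directory(hdfs_dir))           # all three files
    print(browse_directory(hdfs_dir, "*.csv"))  # only the CSV files
```

With the Filter set to `*`, every file in the directory is returned, which is what gives the baseline count of N files.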
- Use a Mapper Snap to transform the directory listing data. Configure the Mapper to extract the directory path and filename from the output.
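The extraction this Mapper performs can be sketched as a path split. HDFS paths use POSIX-style separators, so `posixpath.split` does the work; the function name and the `sample.csv` filename are illustrative assumptions, not part of the pipeline.

```python
import posixpath

def map_listing_entry(path):
    """Split a full HDFS path into directory and filename, the way the
    Mapper transforms each document in the directory listing."""
    directory, filename = posixpath.split(path)
    return {"directory": directory, "filename": filename}

# "sample.csv" is a hypothetical file in the browsed directory.
entry = map_listing_entry(
    "hdfs://cdhclusterqa-4-1.clouddev.snaplogic.com:8020/user/snaplogic/0ut/sample.csv"
)
```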
- Configure an HDFS Reader Snap to read a test file. This Snap reads the file that is later written back to the directory.
- Use a second Mapper Snap to prepare the filename for the new file. Extract and format the filename from the path.
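One way this formatting step could look is sketched below. The suffix scheme is an assumption for illustration only; the pipeline specifies just that the filename is extracted from the path and reformatted, not how.

```python
import posixpath

def format_new_filename(path, suffix="_listing"):
    """Derive the new file's name from an existing path by inserting a
    suffix before the extension. The suffix scheme is an assumption."""
    filename = posixpath.basename(path)
    stem, dot, ext = filename.rpartition(".")
    return f"{stem}{suffix}.{ext}" if dot else f"{filename}{suffix}"
```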
- Configure an HDFS Writer Snap to write the new file to the directory.
  - Directory: Use the same directory path as the Hadoop Directory Browser Snap.
  - File Action: Select OVERWRITE to replace any existing file with the same name.
  This creates a new file in the directory using the output from the Hadoop Directory Browser Snap.
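The effect of the OVERWRITE File Action can be sketched with local file I/O standing in for HDFS. This is a rough behavioral model, not the HDFS Writer's implementation; the function name and `out.json` filename are assumptions.

```python
import os
import tempfile

def write_file(directory, filename, data, file_action="OVERWRITE"):
    """Write data to directory/filename. With OVERWRITE, an existing file
    of the same name is replaced; any other action refuses to clobber it,
    roughly mirroring the HDFS Writer's File Action setting."""
    path = os.path.join(directory, filename)
    if file_action != "OVERWRITE" and os.path.exists(path):
        raise FileExistsError(f"{path} exists and File Action is not OVERWRITE")
    with open(path, "w") as f:
        f.write(data)
    return path

# Demo: writing the same name twice with OVERWRITE replaces the content.
with tempfile.TemporaryDirectory() as out_dir:
    write_file(out_dir, "out.json", "first")
    write_file(out_dir, "out.json", "second")
```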
- Configure a second Hadoop Directory Browser Snap to verify that the file was created. Use the same directory path and filter settings as the first Hadoop Directory Browser Snap. On successful execution, this Snap lists N+1 files, confirming that one new file was created in the directory.
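The verification logic amounts to comparing the two listings: every original file should still be present, plus exactly one new entry. A minimal sketch, with hypothetical before/after listings:

```python
def verify_new_file(before, after):
    """Check the second listing against the first: no original file may be
    missing, and exactly one new file should appear (N + 1 files total)."""
    missing = set(before) - set(after)
    added = set(after) - set(before)
    return not missing and len(added) == 1

# Hypothetical listings before and after the write.
before = ["a.csv", "b.csv"]
after = ["a.csv", "b.csv", "sample_listing.csv"]
```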
The pipeline execution statistics show:
- Original: N files (the number of files initially in the HDFS directory)
- New: N+1 files (the original files plus the one created using the output from the Hadoop Directory Browser Snap)
- Insertions: 1 (the new file)
- Unmodified: N (the original files in the directory)
To reuse this pipeline:
- Download and import the pipeline into SnapLogic.
- Configure Snap accounts as applicable.
- Provide pipeline parameters as applicable.