Read ORC files from HDFS and S3
This example shows how to configure the ORC Reader Snap to read ORC files from a local HDFS instance or from S3.
The ORC Reader Snap successfully reads and outputs the ORC file data from either HDFS or S3 storage.
Troubleshooting:
If the Snap fails to read ORC files from S3, update your HDFS configuration as follows:
- Go to the HDFS service configuration.
- In Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml, add an entry with the following details:
  - Name: fs.s3a.threads.max
  - Value: Set an appropriate thread count (for example, 10).
  - Name:
- Restart all the nodes.
- Under Restart Stale Services, select Re-deploy client configuration.
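The Safety Valve can typically take the entry as a Hadoop `<property>` element when edited in its XML view. The Python below is a minimal sketch under that assumption. It builds the entry for fs.s3a.threads.max and, after the client configuration is redeployed, looks the property up in a core-site.xml file to confirm it arrived. The helper names are hypothetical, and the value 10 is only the example count from the steps above.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Property from the troubleshooting steps; "10" is only the example thread count.
PROPERTY_NAME = "fs.s3a.threads.max"
PROPERTY_VALUE = "10"


def safety_valve_snippet(name: str, value: str) -> str:
    """Build a Hadoop <property> element to paste into the core-site.xml Safety Valve."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


def _lookup(root: ET.Element, name: str) -> Optional[str]:
    """Return the value of the first <property> whose <name> matches, else None."""
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def find_property(xml_text: str, name: str) -> Optional[str]:
    """Look up a property in core-site.xml content held in a string."""
    return _lookup(ET.fromstring(xml_text), name)


def read_property(core_site_path: str, name: str) -> Optional[str]:
    """Look up a property in a core-site.xml file on disk."""
    return _lookup(ET.parse(core_site_path).getroot(), name)


if __name__ == "__main__":
    # Print the entry to paste into the Safety Valve.
    print(safety_valve_snippet(PROPERTY_NAME, PROPERTY_VALUE))
```

For example, `read_property("/etc/hadoop/conf/core-site.xml", "fs.s3a.threads.max")` should return the configured value once the client configuration has been redeployed. The `/etc/hadoop/conf` location is an assumed client configuration path, so adjust it for your cluster.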