Joshua Fennessy

Configure Apache Hive to Recursively Search Directories for Files

It is common for files to end up inside subdirectories in HDFS, for example when using Flume to collect log data.

By default, Hive only looks for files in the root of the specified directory, but with a couple of tweaks, it can be configured to look recursively through subdirectories.

Consider the following file structure in HDFS:

root hdfs     133205 2015-06-30 02:14 /test/000000_0
root hdfs          0 2015-06-30 02:14 /test/test_child_directory
root hdfs     133205 2015-06-30 02:14 /test/test_child_directory/000000_0

Running this query:

SELECT COUNT(*) FROM test_table;

Provides this result: 2541 rows (no recursion).

By default, an external Hive table created using the directory /test as the table location will contain 2541 rows, which is the total number of rows found in a single instance of the file 000000_0. This shows that Hive does not traverse into subdirectories to read files.
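For reference, a table like the one queried above could be declared roughly as follows. This is only a sketch: the post does not show the actual table definition, so the single `line` column and the `TEXTFILE` storage format are assumptions made for illustration. Only the table name `test_table` and the location `/test` come from the example.

```sql
-- Hypothetical definition: the real schema of test_table is not shown in the post.
-- An external table leaves the files in /test in place; Hive only reads them.
CREATE EXTERNAL TABLE IF NOT EXISTS test_table (
  line STRING   -- assumed single column, used here only so the DDL is complete
)
STORED AS TEXTFILE   -- assumed storage format
LOCATION '/test';    -- table location used in the example above
```

Because `COUNT(*)` counts rows rather than column values, the row counts discussed here do not depend on the assumed column layout.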

The Solution

With the addition of a couple of property assignment statements, the default behavior of Hive can be modified to allow recursion.

Consider the following query:

SET mapred.input.dir.recursive=true;
SET hive.mapred.supports.subdirectories=true;

SELECT COUNT(*) FROM test_table;

Result with recursion enabled: 5082 rows

Note that the only change to the previous query is the addition of the two property statements that instruct Hive to traverse subdirectories.

The number of rows returned has doubled, as expected. In this query, Hive successfully traversed into the subdirectory and read the second instance of the file.

Adding these properties to each query that should use recursion is a good solution to this problem, but it’s not the only solution.

If you want recursion enabled by default in your Hive installation, you can add these two properties to your hive-site.xml file. Using Ambari, you would enter these properties as new keys in the Custom hive-site.xml section.
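After changing hive-site.xml and restarting Hive, one way to check that the defaults took effect is to print the properties from a new session. The sketch below relies on Hive's `SET <key>;` form, which displays a property's current value instead of assigning one. It assumes the same `test_table` as in the earlier example.

```sql
-- Print the current value of each property (no "=value" part means "show").
SET mapred.input.dir.recursive;
SET hive.mapred.supports.subdirectories;

-- If both report true, this query should include files in subdirectories
-- without any per-query SET statements (5082 rows in the example above).
SELECT COUNT(*) FROM test_table;
```

If either property prints as false or undefined, the session is not picking up the new defaults, and the per-query SET statements are still needed.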

Recursion properties in Ambari
