Compiling Hive for a non-release Hadoop version

We have been working on many interesting things around Perforator, like extending the core model to other systems such as Hive and Tez. At MSR, we developed on the hadoop-yarn trunk and deployed it under the version name 3.0.0-SNAPSHOT, and for a number of reasons we can’t simply rename the version. I have struggled a bit over the last few days to get Hive running on top of a non-release version like ours, and this post describes my solution.

While the Hive documentation has steps to compile from source, I could not find any documentation on how to compile against a Hadoop version not yet “integrated” with Hive. I was hoping to find at least some information on this on the Hive developer page; surprisingly enough, there was nothing there. Looking at the pom.xml, my early impression was that just changing the 0.23 hadoop version in the pom.xml would do the trick. But it turns out that Hive, on startup, checks the Hadoop version string to decide which shims to load, and thus fails with an unrecognized Hadoop version error like the following.

log4j:ERROR Could not instantiate class [org.apache.hadoop.hive.shims.HiveEventCounter].
java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 3.0.0-SNAPSHOT
	at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:154)
	at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:113)
	at org.apache.hadoop.hive.shims.ShimLoader.getEventCounter(ShimLoader.java:98)
	at org.apache.hadoop.hive.shims.HiveEventCounter.<init>(HiveEventCounter.java:34)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
	at java.lang.Class.newInstance0(Class.java:372)
	at java.lang.Class.newInstance(Class.java:325)
	at org.apache.log4j.helpers.OptionConverter.instantiateByClassName(OptionConverter.java:336)
	at org.apache.log4j.helpers.OptionConverter.instantiateByKey(OptionConverter.java:123)
	at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:752)
	at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:735)
	at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:615)
	at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:502)
	at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:547)
	at org.apache.log4j.PropertyConfigurator.configure(PropertyConfigurator.java:415)
	at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jDefault(LogUtils.java:124)
	at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jCommon(LogUtils.java:77)
	at org.apache.hadoop.hive.common.LogUtils.initHiveLog4j(LogUtils.java:58)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:630)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
log4j:ERROR Could not instantiate appender named "EventCounter".
..
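
You can confirm the string Hive is choking on by asking your Hadoop build directly; the first line printed by hadoop version carries the same version string that ShimLoader reads via VersionInfo:

hadoop version
Hadoop 3.0.0-SNAPSHOT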

There is an important piece of information highlighted above: ShimLoader.java is the culprit. Before loading the shims for a given Hadoop version, it sanity-checks that the Hadoop major version number is one it recognizes, so we need to make this check pass. I know the check is important, but if you are in a situation like mine, just go ahead and open the file,

vim shims/common/src/main/java/org/apache/hadoop/hive/shims/ShimLoader.java

and add a case statement for your major version number. I will add “case 3:” as shown below.

  public static String getMajorVersion() {
    ...
    ...

    // Special handling for Hadoop 1.x and 2.x
    switch (Integer.parseInt(parts[0])) {
    case 0:
      break;
    case 1:
      return "0.20S";
    case 2:
    case 3:
      // Hadoop 3.x is not known to Hive yet; reuse the 2.x ("0.23") shims
      return "0.23";
    default:
      throw new IllegalArgumentException("Unrecognized Hadoop major version number: " + vers);
    }
    ...
    ...
  }

Now you can build Hive with the following command.

mvn clean install -Phadoop-2,dist
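
If you do not need to run the unit tests, Maven’s standard -DskipTests flag makes the build considerably faster:

mvn clean install -Phadoop-2,dist -DskipTests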

After successful completion of the above command, you will find the packaged Hive distribution in the packaging/target/ folder.
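
For instance, a quick listing will show the built artifacts; the exact names depend on the Hive version you checked out (on current trunk they follow the apache-hive-<version>-bin naming pattern):

ls packaging/target/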

Gotchas

  1. Don’t use JDK 7, or Hive will fail to compile (mentioned in https://issues.apache.org/jira/browse/HIVE-3197 and https://issues.apache.org/jira/browse/HIVE-3384)
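
    To check which JDK the build will pick up, mvn -version reports the Java version Maven is running with; if it shows Java 7, point JAVA_HOME at a JDK 6 install first (the path below is only an example):

     mvn -version
     export JAVA_HOME=/path/to/jdk1.6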

  2. The contrib module in current Hive trunk is broken by its dependency on the package org.apache.hadoop.record, which was moved to the hadoop-streaming project and then moved back (mentioned in https://issues.apache.org/jira/browse/HIVE-7077 and https://issues.apache.org/jira/browse/HADOOP-10474)

    If you encounter this, there is a simple workaround: build the contrib module against a version of Hadoop that is not affected by the above change. Instead of making any changes to the pom files, I simply built it with the hadoop-1 profile. While this is likely to be fixed in a future version, here are the commands for the workaround,

     cd contrib
     mvn clean install -Phadoop-1,dist
     cd ..
     mvn install -Phadoop-2,dist

    Note that we have removed clean from the goals, so as to avoid recompiling the contrib module against hadoop-2. Since mvn sees that the module is already built, it just proceeds with the rest of Hive.
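
    If you want to double-check that the hadoop-1 build of contrib is the one being reused, its jar should already be sitting in the module’s target directory before the second command runs (the file name pattern assumes the hive-contrib artifact naming):

     ls contrib/target/hive-contrib-*.jar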