
Connecting to Yarn on Hortonworks HDP 2.0


Disclaimer:
This post reflects the current state of affairs. As new versions of HDP are released or their site is updated, the situation may change.

One of our clients decided to use the Hortonworks HDP 2.0 platform. For the POC we used the VM provided by Hortonworks on their website. One of the key points over the 1.x branch is YARN, so the first thing we started with was testing the ability to connect to YARN on the VM from another machine. The internet is full of examples that use YARN from the same box via

[bash]hadoop jar your_jar_with_client_and_appmaster.jar etc[/bash]

But what if you need to connect from another machine? Is it going to be as smooth as in “ideal” conditions? The answer is NO.
I started with the simplest code, and here is what I came across:

[java] package md.fusionworks.hadoop;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
// Distributed-shell classes, pulled in via the distributedshell dependency added below
import org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster;
import org.apache.hadoop.yarn.applications.distributedshell.Client;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.exceptions.YarnException;
import org.junit.Test;

public class BasicConnectivityTest {

    // YarnConfiguration picks up yarn-site.xml from the root of the classpath
    protected static Configuration conf = new YarnConfiguration();

    @Test
    public void test() throws YarnException, IOException {
        // Connect to the ResourceManager and ask it to create a new application
        YarnClient client = YarnClient.createYarnClient();
        client.init(conf);
        client.start();
        client.createApplication();
    }
}
[/java]

  1. HDP YARN runs on ports different from the defaults. Here is a yarn-site.xml that will help; it should be placed at the root of the classpath (with Maven, src/main/resources is fine). A quick way to confirm these ports are reachable from the client machine is sketched after the XML.
    [xml] <!-- Sun Jun 16 11:08:58 2013 -->
    <configuration>
      <property>
        <name>hadoop.security.authentication</name>
        <value>simple</value> <!-- A value of "simple" would disable security. -->
      </property>
      <property>
        <name>hadoop.security.authorization</name>
        <value>false</value>
      </property>
      <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>192.168.88.57:8141</value>
      </property>
      <property>
        <name>yarn.resourcemanager.address</name>
        <value>192.168.88.57:8050</value>
      </property>
      <property>
        <name>yarn.nodemanager.address</name>
        <value>192.168.88.57:45454</value>
      </property>
      <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>192.168.88.57:8030</value>
      </property>
      <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>192.168.88.57:8025</value>
      </property>
      <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>192.168.88.57:8088</value>
      </property>
      <property>
        <name>yarn.log.server.url</name>
        <value>http://192.168.88.57:19888/jobhistory/logs</value>
      </property>
    </configuration>
    [/xml]
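
    Since these are not the stock ports, it is worth confirming that they are actually reachable from the client machine before blaming the code. A minimal check, assuming nc (netcat) is installed on the client and using the address from the yarn-site.xml above:

    [bash] # Probe the ResourceManager ports listed in yarn-site.xml from the client box
    for port in 8050 8030 8025 8141 8088
    do
      nc -vz 192.168.88.57 "$port"
    done
    [/bash]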
  2. Hadoop libraries problem. The current HDP VM runs on a custom version of the Hadoop libs: 2.1.0.2.0.3.0-98. The version numbering made me think that 2.1.0-SNAPSHOT from Apache should work fine (Hortonworks does not expose its own Maven repo at the moment), but it resulted in errors like the one below:
    [java] 1311 [IPC Client (1943692956) connection to /192.168.0.31:8050 from lordorient] DEBUG org.apache.hadoop.ipc.Client.run null - IPC Client (1943692956) connection to /192.168.0.31:8050 from lordorient: starting, having connections 1
    1315 [IPC Client (1943692956) connection to /192.168.0.31:8050 from lordorient] DEBUG org.apache.hadoop.ipc.Client.close null - closing ipc connection to /192.168.0.31:8050: null
    java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:955)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:852)
    1319 [IPC Client (1943692956) connection to /192.168.0.31:8050 from lordorient] DEBUG org.apache.hadoop.ipc.Client.close null - IPC Client (1943692956) connection to /192.168.0.31:8050 from lordorient: closed
    1320 [IPC Client (1943692956) connection to /192.168.0.31:8050 from lordorient] DEBUG org.apache.hadoop.ipc.Client.run null - IPC Client (1943692956) connection to /192.168.0.31:8050 from lordorient: stopped, remaining connections 0
    1391 [main] TRACE org.apache.hadoop.ipc.ProtobufRpcEngine.invoke null - 1: Exception <- null@192.168.0.31/192.168.0.31:8050: getClusterMetrics {java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: "lordorient-VirtualBox/127.0.1.1"; destination host is: "192.168.0.31":8050; }
    [/java]
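
    If you run into something like this, it helps to check which Hadoop build the cluster is actually running and compare it with the jars on your client's classpath. On the VM itself the standard command is:

    [bash] # Run on the HDP VM: prints the exact Hadoop build (the HDP-style 2.1.0.2.0.x.y-NN number)
    hadoop version
    [/bash]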

    I contacted Hortonworks support and they told me to use exactly their libraries. So I went to their download site, got Hadoop and pushed all the jars to our Maven repo. Then I tried once again, but the same issue occurred.
    Then I noticed that the libraries in their tarballs differ from the ones on the VM: 2.1.0.2.0.4.0-38 vs 2.1.0.2.0.3.0-98. So the only option left was to collect all the jars from the VM and push them to the Maven repo. Given that the VM has no GUI, it was quite a task for a Windows guy like me. Fortunately, Dimon from the Ruby team came to the rescue with a nice shell script.

    [bash] #!/bin/bash
    # Collect all Hadoop jars from the HDP VM and push them to our Maven repo
    mkdir -p ~/jars
    rm -rf ~/jars/*
    find /usr/lib/hadoop* -iname "hadoop-*.jar" -exec cp {} ~/jars \;
    rm -f ~/jars/*-sources.jar

    cd ~/jars

    for i in *.jar
    do
      # Extract the jar to get at the pom.xml bundled under META-INF
      unzip "$i" -d "$i.dir"
      cp `find "$i.dir" -iname "pom.xml"` "$i.dir/pom.xml"
      # Deploy the jar together with its pom to the shared repository
      mvn deploy:deploy-file -Dfile="$i" -DpomFile="$i.dir/pom.xml" -Durl=http://somedomain/maven_repo_path -DrepositoryId=some_repositoryId
    done
    [/bash]

    Once this is done, the only thing left is to install some additional poms to your repository (one way to push them is sketched after the list):
    hadoop-main.pom
    hadoop-project.pom
    hadoop-project-dist.pom
    hadoop-yarn.pom
    hadoop-yarn-applications.pom
    hadoop-yarn-server.pom
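
    The sketch below pushes them with the same deploy:deploy-file approach as the script above; the repository URL and id are the same placeholders, and it assumes the .pom files have been copied into the current directory:

    [bash] # Deploy the collected parent poms to the shared Maven repository
    for pom in hadoop-main.pom hadoop-project.pom hadoop-project-dist.pom \
               hadoop-yarn.pom hadoop-yarn-applications.pom hadoop-yarn-server.pom
    do
      mvn deploy:deploy-file -Dfile="$pom" -DpomFile="$pom" \
        -Durl=http://somedomain/maven_repo_path -DrepositoryId=some_repositoryId
    done
    [/bash]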
    At this point you are ready to connect to YARN on the Hortonworks HDP 2.0 VM. Simply add the required Maven dependencies:

    [xml] <properties>
      <!-- <hadoop.version>2.1.0-SNAPSHOT</hadoop.version> -->
      <hadoop.version>2.1.0.2.0.3.0-98</hadoop.version>
      <!-- <hadoop.version>2.1.0.2.0.4.0-38</hadoop.version> -->
    </properties>

    <dependencies>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-yarn-api</artifactId>
        <version>${hadoop.version}</version>
      </dependency>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>${hadoop.version}</version>
      </dependency>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-yarn-applications-distributedshell</artifactId>
        <version>${hadoop.version}</version>
      </dependency>
    </dependencies>
    [/xml]
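
    With the dependencies resolving against the jars and poms pushed above, a quick smoke test from the client machine is simply running the JUnit test through Maven (assuming the usual Surefire setup):

    [bash] # Run only the connectivity test against the remote HDP VM
    mvn test -Dtest=BasicConnectivityTest
    [/bash]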

Conclusion

  • Hortonworks HDP 2.0 does not work with the latest libs from the Apache 2.1.0-SNAPSHOT branch
  • The libs on the Hortonworks website are not consistent across different distributions
  • Making everything work requires some effort

But once all the problems were resolved, I found the Hortonworks bundle very easy and comfortable to use. It is a very good thing that they provide a prepared VM that lets you use Hadoop without going through a complicated installation process.