Welcome to my website. I am always posting links to photo albums, art, technology and other creations. Everything that you will see on my numerous personal sites is powered by the formVista™ Website Management Engine.


  • How to Return Hive Query Results Similarly to MySQL \G in One Vertical Column
    06/09/2016 4:03PM

    When looking at data in a table with really wide rows, even a single selected row is nearly impossible to read once it wraps 7 or 8 times across the terminal.

    MySQL offers the '\G' statement terminator, which prints each column of a row on its own line.

    The corresponding method in Hive (from the Beeline client) is to execute the following set command before running the query:

    > !set outputformat vertical
    > SELECT something FROM some_table;
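
    To make the idea of a vertical layout concrete, here is a minimal Java sketch that prints one row as name/value lines. It only illustrates the layout and is not Beeline's exact output format; the class name VerticalRow and the sample columns are made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class VerticalRow {

    // Renders one row as "column  value" lines, padding the column
    // names with spaces so that all of the values line up.
    static String format(Map<String, String> row) {
        int width = 0;
        for (String name : row.keySet()) {
            width = Math.max(width, name.length());
        }
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> cell : row.entrySet()) {
            out.append(cell.getKey());
            for (int i = cell.getKey().length(); i < width; i++) {
                out.append(' ');
            }
            out.append("  ").append(cell.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the columns in insertion order.
        Map<String, String> row = new LinkedHashMap<>();
        row.put("id", "42");                // hypothetical sample data
        row.put("user_name", "alice");
        System.out.print(format(row));
    }
}
```

    Each column gets its own line, so a row with dozens of columns scrolls down the screen instead of wrapping across it.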

  • [SOLVED] java.lang.NoSuchMethodError: org.apache.avro.generic.GenericData.createDatumWriter When Using Avro Data with MapReduce
    01/14/2016 2:56PM

    I am working on a project and have decided to use Avro for the data serialization format.

    I encountered the following error when trying to set up the unit test to test the mapper implementation through Eclipse:

    java.lang.NoSuchMethodError: org.apache.avro.generic.GenericData.createDatumWriter(Lorg/apache/avro/Schema;)Lorg/apache/avro/io/DatumWriter;
        at org.apache.avro.hadoop.io.AvroSerialization.getSerializer(AvroSerialization.java:114)
        at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:82)
        at org.apache.hadoop.mrunit.internal.io.Serialization.copy(Serialization.java:67)
        at org.apache.hadoop.mrunit.internal.io.Serialization.copy(Serialization.java:98)
        at org.apache.hadoop.mrunit.internal.io.Serialization.copyWithConf(Serialization.java:111)
        at org.apache.hadoop.mrunit.TestDriver.copy(TestDriver.java:676)
        at org.apache.hadoop.mrunit.TestDriver.copyPair(TestDriver.java:680)
        at org.apache.hadoop.mrunit.MapDriverBase.addInput(MapDriverBase.java:120)
        at org.apache.hadoop.mrunit.MapDriverBase.addInput(MapDriverBase.java:130)
        at org.apache.hadoop.mrunit.MapDriverBase.addAll(MapDriverBase.java:141)
        at org.apache.hadoop.mrunit.MapDriverBase.withAll(MapDriverBase.java:247)
        at com.ryanchapin.hadoop.mapreduce.mrunit.UserDataSortTest.testMapper(UserDataSortTest.java:111)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
        at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
        at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
        at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
        at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
        at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
        at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
        at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
        at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
        at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:675)
        at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
        at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)

    After digging through the source code and finding that the method did, in fact, exist, I tried running the same unit test via the Maven CLI.  It worked just fine.

    After more digging, it turned out that the classpath in Eclipse was picking up avro-1.7.4, pulled in by the hadoop-common and hadoop-mapreduce-client-core jars in my project, and not the 1.7.7 version that I was trying to use.

    To see the difference between running it via the Maven CLI and running it in Eclipse, I went through the following steps:

    Added the following code to my test code to print out the classpath at runtime:

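        // Print out the classpath (needs java.net.URL and java.net.URLClassLoader).
        // The cast assumes Java 8 or earlier, where the system class loader is a URLClassLoader.
        ClassLoader sysClassLoader = ClassLoader.getSystemClassLoader();
        URL[] urls = ((URLClassLoader) sysClassLoader).getURLs();
        for (int i = 0; i < urls.length; i++) {
            System.out.println(urls[i].getFile());
        }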

    Then I ran it in Eclipse and saved the console output.
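
    As an alternative that does not depend on the system class loader's type, the classpath can also be read from the java.class.path system property. This is a minimal sketch, not the code I originally used, and the class name ClasspathDump is made up for this example. Note that when Surefire launches tests through its booter jar, this property may list only that jar, which is why the jar has to be unpacked in the steps that follow.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class ClasspathDump {

    // Splits the java.class.path system property into its individual entries,
    // using the platform's path separator (':' on Linux, ';' on Windows).
    static List<String> entries() {
        String classpath = System.getProperty("java.class.path", "");
        List<String> result = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            if (!entry.isEmpty()) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // One entry per line makes the output easy to diff between two runs.
        for (String entry : entries()) {
            System.out.println(entry);
        }
    }
}
```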

    Then, I added a sleep call for 100 seconds in the same place in the code.  This gave me time to run the test again from the terminal and copy the project/target/surefire/ directory, which contained the surefirebooter.jar, while the test was still running.

    After copying that jar to a temporary directory, I unpacked it and then compared the versions of avro between the Eclipse classpath and the classpath from the terminal and noticed that they were different.  Inspecting the dependency tree of my project, it was clear that 1.7.4 was pulled in by the hadoop jars I was using.

    Ultimately, I ended up changing my version of avro to 1.7.4 in my pom, matching the hadoop jars, to eliminate the conflict.
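
    A quicker way to confirm which jar a class actually came from is to ask the JVM for the class's code source. The following is a minimal sketch of that check rather than something from my original debugging session; JarLocator is a made-up name, and in the failing test you would pass org.apache.avro.generic.GenericData.class instead of the stand-in classes used here.

```java
import java.net.URL;
import java.security.CodeSource;

final class JarLocator {

    // Returns the location (usually a jar or a classes directory) that the
    // class was loaded from, or null when the JVM records no code source,
    // which is typical for core JDK classes such as java.lang.String.
    static URL locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        return source == null ? null : source.getLocation();
    }

    public static void main(String[] args) {
        // Stand-ins for illustration only; swap in the class you are chasing.
        System.out.println("JarLocator loaded from: " + locationOf(JarLocator.class));
        System.out.println("String loaded from:     " + locationOf(String.class));
    }
}
```

    Running the same line in Eclipse and under Maven should show the two different avro jar paths directly, without having to compare full classpath listings.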

  • Hadoop Cluster Sizing Wizard by Hortonworks
    07/13/2015 1:36PM

    Anyone who does any Hadoop development or systems engineering arrives at the "how should I size my cluster" question.

    Hortonworks has a very nice cluster sizing calculator that takes into account the basic use-cases and data profile to help get you started with your hardware requirements.

  • Debugging MapReduce MRv2 Code in Eclipse
    03/24/2015 11:36PM

    Following is how to set up your environment so that you can set breakpoints, step through, and debug your MapReduce code in Eclipse.

    All of this was done on a machine running Linux, but should work just fine for any *nix machine, and perhaps Windows running Cygwin (assuming that you can get Hadoop and its native libraries compiled under Windows).

    This also assumes that you are building your project with maven.

    Install a pseudo-distributed hadoop cluster on your development box.  (Yes, this calls for another article on exactly how to do that, which I will write shortly and link to from here.)

    Add the following environment variables to .bash_profile to ensure that they will be applied to any login shells (make sure to check the location of the directories for your installed hadoop distribution):

        export LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native
        export HADOOP_HOME=/usr/lib/hadoop
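
    Depending on how Eclipse is launched, it may or may not inherit these variables, so it can be worth checking what the JVM actually sees. Here is a minimal sketch of such a check; the class name HadoopEnvCheck is made up, and it assumes the lib/native layout under HADOOP_HOME shown above.

```java
import java.io.File;

final class HadoopEnvCheck {

    // Builds the expected native library directory for a given Hadoop home,
    // assuming the <home>/lib/native layout used in this post.
    static File nativeDir(String hadoopHome) {
        return new File(new File(hadoopHome, "lib"), "native");
    }

    public static void main(String[] args) {
        String hadoopHome = System.getenv("HADOOP_HOME");
        if (hadoopHome == null) {
            System.out.println("HADOOP_HOME is not set in this JVM's environment");
            return;
        }
        File dir = nativeDir(hadoopHome);
        System.out.println(dir + (dir.isDirectory() ? " exists" : " is missing"));
        System.out.println("LD_LIBRARY_PATH=" + System.getenv("LD_LIBRARY_PATH"));
    }
}
```

    Run it once from a terminal and once from an Eclipse run configuration; if the two outputs differ, set the variables in the run configuration's 'Environment' tab.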

    Make sure to include the following dependencies in your pom:


    After you import your maven project into Eclipse, update the Build Path to include the correct path to the native library shared objects:

    Right-click on your project and select 'Build Path -> Configure Build Path':

    Click on 'Libraries' tab:

    Click the drop-down arrow for 'Maven Dependencies'

    Click the drop-down arrow for the 'hadoop-common' jar

    Select the 'Native library location' entry, and click 'Edit'

    Browse to the path of the native directory; in my case it was /usr/lib/hadoop/lib/native.

    Click 'OK'

    Click 'OK' to close the build path dialogue

    Create a run configuration for the Main class in your project:

    Make sure that you do not add the /etc/hadoop/conf* dir to the class path.

    Add any command-line arguments for input and output directories to the 'Program arguments' section of the run configuration, and make sure that they point to your LOCAL file system and not HDFS.

    After that, you should be able to run your M/R code and debug it through Eclipse.
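
    If you want the job to fail fast when someone passes an HDFS path by mistake, a small guard in your Main class can check the arguments first. This is a minimal sketch under the assumption of Unix-style paths; LocalPathCheck is a made-up name and the host name in the usage note is hypothetical. Keep in mind that a path with no scheme resolves against whatever default file system the job's configuration names, which is why keeping the cluster conf dir off the classpath matters.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class LocalPathCheck {

    // True when the argument has no scheme (a plain path) or uses the
    // file: scheme; false for hdfs:// and any other scheme.
    static boolean isLocal(String pathArg) {
        try {
            String scheme = new URI(pathArg).getScheme();
            return scheme == null || scheme.equalsIgnoreCase("file");
        } catch (URISyntaxException e) {
            // Not parseable as a URI (for example, it contains spaces),
            // so treat it as a plain local path.
            return true;
        }
    }

    public static void main(String[] args) {
        for (String arg : args) {
            System.out.println(arg + " -> " + (isLocal(arg) ? "local" : "NOT local"));
        }
    }
}
```

    With this check, /tmp/input and file:///tmp/output are accepted, while something like hdfs://namenode:8020/data is flagged.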

  • Restarting Individual Services or the Entire HDP Stack in the Hortonworks Virtual Sandbox
    10/13/2014 12:18PM

    I'm using the Hortonworks Virtual Sandbox for development and testing and wanted to restart the HDP stack without (of course) having to restart the VM.

    It took me a little while to figure out how to go about it, as Internet searches on the topic revealed very little.

    It turns out that Hortonworks have set up their own service on the box, startup_script.

    If you take a look at /etc/init.d/startup_script, you will see that it calls a number of other shell scripts in /usr/lib/hue/tools/start_scripts/.

    To restart the whole stack, simply issue the following command as root:

    # service startup_script restart
