Welcome to my website. I am always posting links to photo albums, art, technology and other creations. Everything that you will see on my numerous personal sites is powered by the formVista™ Website Management Engine.

  • Debugging MapReduce MRv2 Code in Eclipse
    03/24/2015 11:36PM

    The following describes how to set up your environment so that you can set breakpoints, step through, and debug your MapReduce code in Eclipse.

    All of this was done on a machine running Linux, but it should work just fine on any *nix machine, and perhaps on Windows under Cygwin (assuming that you can get Hadoop and its native libraries compiled under Windows).

    This also assumes that you are building your project with Maven.

    Install a pseudo-distributed Hadoop cluster on your development box.  (Yes, this calls for another article on exactly how to do that, which I will write shortly and link to from here.)

    Add the following environment variables to .bash_profile so that they are applied to any login shell (make sure to check the directory locations for your installed Hadoop distribution):

        export LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native
        export HADOOP_HOME=/usr/lib/hadoop

    Make sure to include the following dependencies (all in the org.apache.hadoop group) in your pom.xml:

    hadoop-mapreduce-client-core
    hadoop-common
    hadoop-hdfs
    hadoop-client


    After you import your Maven project into Eclipse, update the build path to include the correct path to the native library shared objects:

    Right-click on your project and select 'Build Path -> Configure Build Path'.

    Click on the 'Libraries' tab.

    Click the drop-down arrow for the 'Maven Dependencies' entry.

    Click the drop-down arrow for the hadoop-common .jar.

    Select the 'Native library location' entry and click 'Edit'.

    Browse to the native library directory; in my case it was /usr/lib/hadoop/lib/native.

    Click 'OK'.

    Click 'OK' to close the build path dialog.
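    To confirm that the JVM Eclipse launches can actually see the native library, you can look for it on java.library.path. As I understand it, Eclipse applies the 'Native library location' setting through that property, but treat that as an assumption. The following is a minimal sketch: the class name NativeLibCheck is hypothetical, and it assumes the Linux library file name (System.mapLibraryName("hadoop") yields libhadoop.so there). If the Hadoop jars are already on your classpath, org.apache.hadoop.util.NativeCodeLoader.isNativeCodeLoaded() reports directly whether libhadoop was loaded.

```java
import java.io.File;

// Hypothetical helper: checks whether the Hadoop native library is visible
// on the JVM's native library search path (java.library.path).
public class NativeLibCheck {

    // Returns the first directory in the given path list that contains the
    // named file, or null if none does. Entries are separated by
    // File.pathSeparator (":" on Linux).
    static File findOnPath(String pathList, String fileName) {
        if (pathList == null || pathList.isEmpty()) {
            return null;
        }
        for (String dir : pathList.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty entries such as the one in "a::b"
            }
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                return new File(dir);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumption: on Linux this is "libhadoop.so"
        String libName = System.mapLibraryName("hadoop");
        String libPath = System.getProperty("java.library.path");
        File dir = findOnPath(libPath, libName);
        if (dir != null) {
            System.out.println("Found " + libName + " in " + dir);
        } else {
            System.out.println(libName + " not found on java.library.path: " + libPath);
        }
    }
}
```

    Run it from a run configuration in the same project; if it reports that the library was not found, recheck the native library location you configured.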

    Create a run configuration for the Main class in your project:

    Make sure that you do not add the /etc/hadoop/conf* directory to the classpath.

    Add any command-line arguments for the input and output directories to the 'Program arguments' section of the run configuration, making sure that they point to your LOCAL file system and not to HDFS.

    After that, you should be able to run your M/R code and debug it through Eclipse.
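    Hadoop resolves paths without a scheme against fs.defaultFS, which defaults to file:/// when no cluster core-site.xml is on the classpath, so keeping /etc/hadoop/conf off the classpath is what lets plain paths resolve locally. As an extra guard while debugging, a driver could refuse to start if an argument explicitly names another file system such as hdfs://. The following is a minimal sketch with a hypothetical class name (LocalPathCheck); it uses a simple scheme-prefix check rather than Hadoop's own Path handling and assumes Linux-style paths.

```java
import java.util.Locale;

// Hypothetical guard for a local debug run: rejects arguments that name a
// non-local file system, such as "hdfs://namenode:8020/input".
public class LocalPathCheck {

    // True if the argument has no URI scheme (e.g. "/tmp/input", "input")
    // or an explicit "file:" scheme; false for any other scheme.
    static boolean isLocalPath(String arg) {
        int colon = arg.indexOf(':');
        int slash = arg.indexOf('/');
        // A scheme, if present, ends at the first ':' and comes before any '/'
        boolean hasScheme = colon > 0 && (slash < 0 || colon < slash);
        if (!hasScheme) {
            return true;
        }
        String scheme = arg.substring(0, colon).toLowerCase(Locale.ROOT);
        return scheme.equals("file");
    }

    public static void main(String[] args) {
        for (String arg : args) {
            if (!isLocalPath(arg)) {
                System.err.println("Not a local path, refusing to run: " + arg);
                System.exit(1);
            }
        }
        System.out.println("All arguments point to the local file system");
        // Hypothetical: hand off to your real job driver here
    }
}
```

    For example, running it with the program arguments /tmp/input hdfs://namenode:8020/output prints the refusal message for the second argument and exits with status 1.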

  • Unit Testing Private Static Methods With Primitive Array Arguments
    03/19/2015 10:06AM

    When writing unit tests to cover your entire program, you will undoubtedly come across the need to test private methods.  This can be achieved using reflection in Java JUnit tests.

    What was a little tricky, and not completely obvious, was how to use reflection to test a private static method that accepts an array of primitives.  Following is a simple example, with explanations in the comments.

    Note that because countByteValue is private, invoke() can throw an IllegalAccessException when it is called from another class (including a separate JUnit test class); calling setAccessible(true) on the Method instance before invoking it avoids this.

    import java.lang.reflect.InvocationTargetException;
    import java.lang.reflect.Method;

    public class StaticArrayReflectionTest {

       public static void main(String[] args)
             throws NoSuchMethodException, SecurityException,
             IllegalAccessException, IllegalArgumentException,
             InvocationTargetException
       {

          byte[] arr = new byte[] {1, 2, 3, 4};

          // Get a Class instance of the class to be tested
          Class<ByteCounter> byteCounterClazz = ByteCounter.class;

          // Get a Method instance for the method to be tested
          //
          // The part to take note of is how to get a Class instance
          // of an array of primitives: byte[].class
          Method countByteValueMethod = byteCounterClazz.getDeclaredMethod(
                "countByteValue",
                new Class<?>[]{byte[].class}
          );

          // The method is private, so suppress Java's access checks;
          // without this, invoke() can throw an IllegalAccessException
          countByteValueMethod.setAccessible(true);

          // Invoke the Method instance passing in the arr argument
          //
          // Take note that the first argument of invoke is 'null' as there is no
          // object instance on which to invoke the method since the method in
          // question is static.
          // Also notice how the byte array is passed in, wrapped in an Object[]
          int result = (Integer) countByteValueMethod.invoke(null, new Object[]{arr});
          System.out.println(result); // prints 10

       }

       public static class ByteCounter {
          // Sums the values of all bytes in the array
          private static int countByteValue(byte[] arr) {
             int retVal = 0;
             for (int i = 0; i < arr.length; i++) {
                retVal += arr[i];
             }
             return retVal;
          }
       }
    }
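    The byte[].class literal used in the example is only one way to get the Class instance for a primitive array. The following is a small sketch (ArrayClassLookup is a hypothetical name) showing two equivalent alternatives: calling getClass() on an existing array, and Class.forName() with the JVM binary name, where "[" marks an array and "B" stands for byte ("[I" would be int[], "[J" long[], and so on). All three return the very same Class object, so any of them can be passed to getDeclaredMethod().

```java
// Hypothetical demo: three equivalent ways to obtain the Class for byte[]
public class ArrayClassLookup {

    static Class<?>[] byteArrayClasses() throws ClassNotFoundException {
        byte[] sample = new byte[0];
        return new Class<?>[] {
            byte[].class,          // 1. class literal
            sample.getClass(),     // 2. runtime class of an array instance
            Class.forName("[B")    // 3. JVM binary name for byte[]
        };
    }

    public static void main(String[] args) throws ClassNotFoundException {
        Class<?>[] classes = byteArrayClasses();
        System.out.println(classes[0].getName());           // prints [B
        System.out.println(classes[0] == classes[1]);       // prints true
        System.out.println(classes[0] == classes[2]);       // prints true
        System.out.println(classes[0].getComponentType());  // prints byte
    }
}
```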

  • One-Liner for Converting CRLF to LF in Text Files
    03/17/2015 10:31AM

    If you have text files created under DOS/Windows and need to convert the CRLF (carriage return and line feed) line endings to LF (line feed), here is a quick one-liner.

    $ cat file.txt | perl -ne 's/\x0D\x0A/\x0A/g; print' > file.txt.mod

    You can also use dos2unix; however, especially under Cygwin, I have seen dos2unix fail without giving any meaningful information about why it was unable to complete the task.  In that case, you can just do the conversion by hand.
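    If Perl is not available, the same conversion is easy to do in Java. The following is a minimal sketch (CrlfToLf is a hypothetical name) that, like the s/\x0D\x0A/\x0A/g substitution above, removes each CR that is immediately followed by LF and leaves lone CRs alone. It works on raw bytes, so it assumes an ASCII-compatible encoding such as UTF-8 or Latin-1; it would not be correct for UTF-16 files.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical byte-level CRLF -> LF converter
public class CrlfToLf {

    // Drops every CR (0x0D) that is directly followed by LF (0x0A);
    // all other bytes, including lone CRs, are copied unchanged.
    static byte[] crlfToLf(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length);
        for (int i = 0; i < in.length; i++) {
            if (in[i] == 0x0D && i + 1 < in.length && in[i + 1] == 0x0A) {
                continue; // skip the CR; the LF is written on the next iteration
            }
            out.write(in[i]);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java CrlfToLf <input> <output>");
            System.exit(1);
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), crlfToLf(data));
    }
}
```

    Usage mirrors the one-liner: java CrlfToLf file.txt file.txt.mod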

  • Configuring Eclipse to Replace Tabs with Spaces for Indentation
    01/23/2015 6:46PM

    Following are two basic settings (I believe there are other language-specific settings as well, for C++ for instance).

    For Java:

    Window->Preferences->Java->Code Style->Formatter
    Click on 'New' to create a new profile, and select the existing profile that you want to copy.
    Then click 'Edit' and select 'Spaces Only' from the 'Tab Policy' dropdown.

    You can also set the indentation size and tab size there.

    For the default text editor:

    Window->Preferences->General->Editors->Text Editors, then check 'Insert spaces for tabs'.

  • Parsing Command Line Arguments with getopt in Bash
    12/07/2014 10:01PM

    When writing utility scripts in Bash, it is tempting to simply pass positional arguments, use $1, $2, etc., and be done with it.  However, if you want to share the utility with other members of your team or incorporate it into your system, it makes sense to implement command-line argument parsing in a more flexible and maintainable manner.

    Using getopt, you can easily handle a variety of command-line options and arguments.

    Following is a link to a GitHub Gist with an example that illustrates flags, options that take values, and long option names.

    getopt-example.sh
