Setting the Compiler Version for Maven from the Command Line

By default, Maven sets the compiler version for you.  You can always set it in the pom, but there are cases where you cannot modify the pom, or where you want to run compilation and tests against different versions of Java.

The following arguments pass the compiler version to Maven from the command line:

mvn clean install -Dmaven.compiler.source=1.7 -Dmaven.compiler.target=1.7
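
If you want to run the build against several compiler levels in one go, the command above can be wrapped in a small shell function. Following is a minimal sketch, not a definitive script: the function name and the MVN override variable are my own placeholders (they are not part of Maven), and it assumes that the JDK running Maven supports every level that you pass in.

```shell
# Run 'mvn clean install' once for each compiler level given as an argument.
# MVN is a hypothetical override for the Maven executable; it defaults to 'mvn'.
build_with_versions() {
    for v in "$@"; do
        echo "== compiler level ${v} =="
        "${MVN:-mvn}" clean install \
            "-Dmaven.compiler.source=${v}" \
            "-Dmaven.compiler.target=${v}" || return 1   # stop at the first failure
    done
}
```

For example, build_with_versions 1.7 1.8 runs the build twice and stops at the first level that fails.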

How To Publish Artifacts to the Maven Central Repository

I have just finished releasing my first project to the Maven Central Repository and wanted to capture my notes on the setup of the project and all of the steps required.

Resources

Create account on OSSRH

You will need an account on the Sonatype JIRA for OSSRH.  From there, you can request the creation of a new project. See http://central.sonatype.org/pages/ossrh-guide.html for details.

Setup of the Project/pom and Pre-requisites:

PGP keys

http://central.sonatype.org/pages/working-with-pgp-signatures.html

Create a set of PGP keys

gpg2 --gen-key

List the keys

gpg2 --list-keys
/usr/local2/home/rchapin/.gnupg/pubring.gpg
-------------------------------------------
pub   4096R/E5170CE8 2015-03-26 [expires: 2016-03-25]
uid                  Ryan Chapin <rchapin@nbinteractive.com>
sub   4096R/8DAF9AD6 2015-03-26 [expires: 2016-03-25]

The id for the key created is ‘E5170CE8’
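
If you would rather pull the id out of the listing in a script, something like the following should work. It is only a sketch: it assumes the listing format shown above (other GnuPG versions may print the listing differently), and the function name is hypothetical.

```shell
# Read 'gpg2 --list-keys' output on stdin and print the short id of each
# public key, i.e. the part after the '/' on lines that start with 'pub'.
extract_pub_key_ids() {
    awk '$1 == "pub" { split($2, parts, "/"); print parts[2] }'
}
```

Usage: gpg2 --list-keys | extract_pub_key_ids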

Distribute the public key

gpg2 --keyserver hkp://pool.sks-keyservers.net --send-keys E5170CE8

Additions to the pom

Add the <license> tag to your pom. Following are some links regarding available licenses and an example of its use in the pom.

I typically use the BSD 3 License, and added the following in the pom:

<project>
  ...
  <licenses>
    <license>
      <name>The BSD 3-Clause License</name>
      <url>http://opensource.org/licenses/BSD-3-Clause</url>
      <distribution>repo</distribution>
    </license>
  </licenses>
  ...
</project>

...

Add the <developers> tag to your pom and add a <developer> entry for yourself:

<project>
  ...
  <developers>
    <developer>
      <id>rchapin</id>
      <name>Ryan Chapin</name>
      <email>rchapin@nbinteractive.com</email>
      <url>http://www.ryanchapin.com</url>
      <roles>
        <role>architect</role>
        <role>developer</role>
      </roles>
      <timezone>America/New_York</timezone>
      <properties>
        <picUrl>http://www.gravatar.com/516f2158d74d134faa9649e9180ef782</picUrl>
      </properties>
    </developer>
  </developers>
  ...
</project>

Add the <scm> tag to your pom with the details for your repository:

<project>
  ...
  <scm>
    <connection>scm:git:git@github.com:rchapin/hash-generator.git</connection>
    <developerConnection>scm:git:git@github.com:rchapin/hash-generator.git</developerConnection>
    <url>git@github.com:rchapin/hash-generator.git</url>
    <tag>HEAD</tag>
  </scm>
  ...
</project>

Distribution Management and Authentication:  Configure the pom to enable Maven to deploy to the OSSRH Nexus server with the Nexus Staging Maven plugin.  See below for the release profile configuration, which contains the configs for the nexus-staging-maven-plugin.

<project>
  ...
  <distributionManagement>
    <snapshotRepository>
      <id>ossrh</id>
      <url>https://oss.sonatype.org/content/repositories/snapshots</url>
    </snapshotRepository>
  </distributionManagement>
  ...
</project>

Create a profile to encapsulate the creation of the javadoc and source jars as well as the PGP signing of the artifacts.  In my case, I set the profile to <activeByDefault>true</activeByDefault> to ease release builds, and during development of the project I turn off the release profile by invoking the build as follows:

mvn package -P\!release

Also, configure the nexus-staging-maven-plugin setting <autoReleaseAfterClose>false</autoReleaseAfterClose> to enable manual inspection of the staging repository BEFORE it is released to central.  To deploy to OSSRH and release to the Central Repository in one step, set autoReleaseAfterClose to true.

<profiles>
    <profile>
      <id>release</id>
      <activation>
        <activeByDefault>true</activeByDefault>
      </activation>
      <build>
        <plugins>

          <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-javadoc-plugin</artifactId>
            <version>${maven.javadoc.plugin.version}</version>
            <executions>
              <execution>
                <id>attach-javadocs</id>
                <goals>
                  <goal>jar</goal>
                </goals>
              </execution>
            </executions>
          </plugin>

          <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-source-plugin</artifactId>
            <version>${maven.source.plugin.version}</version>
            <executions>
              <execution>
                <id>attach-sources</id>
                <goals>
                  <goal>jar</goal>
                </goals>
              </execution>
            </executions>
          </plugin>

          <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-gpg-plugin</artifactId>
            <version>1.6</version>
            <executions>
              <execution>
                <id>sign-artifacts</id>
                <phase>verify</phase>
                <goals>
                  <goal>sign</goal>
                </goals>
              </execution>
            </executions>
          </plugin>

          <plugin>
            <groupId>org.sonatype.plugins</groupId>
            <artifactId>nexus-staging-maven-plugin</artifactId>
            <version>1.6.5</version>
            <extensions>true</extensions>
            <configuration>
              <serverId>ossrh</serverId>
              <nexusUrl>https://oss.sonatype.org/</nexusUrl>
              <autoReleaseAfterClose>false</autoReleaseAfterClose>
            </configuration>
          </plugin>

        </plugins>
      </build>
    </profile>

</profiles>

Add the maven-release-plugin to the <build> section.  This should include disabling Maven's built-in release profile (useReleaseProfile), since the ‘release’ profile that we added and described above already attaches the sources and javadocs, and then specifying the deploy goal together with the activation of the ‘release’ profile for the deploy goal.

<project>
  ...
  <build>
    ...
    <plugins>
      ...
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-release-plugin</artifactId>
        <version>2.5.1</version>
        <configuration>
          <autoVersionSubmodules>true</autoVersionSubmodules>
          <useReleaseProfile>false</useReleaseProfile>
          <releaseProfiles>release</releaseProfiles>
          <goals>deploy</goals>
        </configuration>
      </plugin>
      ...
    </plugins>
    ...
  </build>
  ...
</project>

Credentials for both the distributionManagement and the PGP signing need to be added to ~/.m2/settings.xml: one entry for your OSSRH login, and another entry for your PGP signing passphrase.

OSSRH Login

<settings>
  ...
  <servers>
    <server>
      <id>ossrh</id>
      <username>your-uid</username>
      <password>your-passwd</password>
    </server>
  </servers>
  ...
</settings>

PGP Signing Passphrase

<settings>
  ...
  <profiles>
    <profile>
      <id>ossrh</id>
      <activation>
        <activeByDefault>true</activeByDefault>
      </activation>
      <properties>
        <gpg.executable>gpg2</gpg.executable>
        <gpg.passphrase>your-passphrase</gpg.passphrase>
      </properties>
    </profile>
  </profiles>
  ...
</settings>

Performing a SNAPSHOT Deployment

As long as your current pom version still ends in ‘SNAPSHOT’, you can deploy a snapshot version to OSSRH:

mvn clean deploy
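
If you script the deployment, the ‘ends in SNAPSHOT’ condition is easy to check before deploying. Following is a minimal sketch; the function name is hypothetical and the version string is assumed to be passed in by the caller.

```shell
# Succeed (exit status 0) only when the given version string ends in SNAPSHOT.
is_snapshot_version() {
    case "$1" in
        *SNAPSHOT) return 0 ;;
        *)         return 1 ;;
    esac
}
```

Usage: is_snapshot_version "1.0.0-SNAPSHOT" && mvn clean deploy.  On recent versions of the maven-help-plugin the version itself can be read with mvn help:evaluate -Dexpression=project.version -q -DforceStdout.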

To use the SNAPSHOT version in a project, users will need to add the snapshot repo to their Nexus, settings.xml or pom.xml. 

<project>
  ...
  <repositories>
    <repository>
      <id>ossrh-snapshots</id>
      <url>https://oss.sonatype.org/content/repositories/snapshots/</url>
    </repository>
  </repositories>
  ...
</project>

Release and Deployment

First make sure that all of the code is pushed to the remote repo and that everything is merged into your main branch.  Then make sure that you are ON the main branch on the local machine.
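
These checks can be scripted. Following is a sketch under a few assumptions: the project is in a git repository, the branch that you release from is passed as the first argument, and that branch has an upstream configured. All of the function names are my own.

```shell
# True when the checked-out branch is the one named in $1.
on_branch() {
    [ "$(git symbolic-ref --short HEAD 2>/dev/null)" = "$1" ]
}

# True when there are no modified, staged or untracked files.
working_tree_clean() {
    [ -z "$(git status --porcelain)" ]
}

# True when the local HEAD is the same commit as its upstream branch.
in_sync_with_upstream() {
    git fetch --quiet &&
        [ "$(git rev-parse HEAD)" = "$(git rev-parse '@{u}')" ]
}

# Run all three checks and report the first one that fails.
release_preflight() {
    on_branch "$1"        || { echo "not on branch $1" >&2; return 1; }
    working_tree_clean    || { echo "working tree is not clean" >&2; return 1; }
    in_sync_with_upstream || { echo "local and remote differ" >&2; return 1; }
}
```

Usage: release_preflight master && mvn release:prepare -DdryRun=true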

Prepare Release

Do a dry run to ensure that the pom will be bumped as expected and to check that everything is in order.  This will NOT check in or tag anything in the scm repository.

mvn release:prepare -DdryRun=true

Check the output and then do the following before running the actual prepare command:

mvn release:clean

Execute the release:prepare

mvn release:prepare

If you encounter any errors, fix them and run the prepare again from the beginning instead of resuming the failed run:

mvn release:prepare -Dresume=false

Alternatively, you can use

mvn release:clean release:prepare

See http://maven.apache.org/maven-release/maven-release-plugin/examples/prepare-release.html for details.

Once the release has been prepared, the scm repository should be updated with the new tag and the pom should be bumped to the next version for the next iteration of development.

Perform Release

Once the release has been prepared, a tag created, and the scm repository updated, you can deploy to OSSRH. The following will deploy to a staging repository.

mvn release:perform

Inspecting Staging Repo and Releasing to Central

Log in to OSSRH via https://oss.sonatype.org/.  Uid and passwd are the same as for issues.sonatype.org. See http://central.sonatype.org/pages/releasing-the-deployment.html for details.

Releasing to Central

Once you have inspected the staging repo and are ready to release the deployment to Central, do the following:

cd target/checkout
mvn nexus-staging:release

If this is your first release to central for this project, don’t forget to go back to your project creation request ticket and add a comment that you have promoted your first release so that your promotion can be verified and your artifacts synced with Central.  This step only needs to be done the first time you promote to central with a new OSSRH project.

Debugging MapReduce MRv2 Code in Eclipse

Following is how to set up your environment to be able to set breakpoints, step through, and debug your MapReduce code in Eclipse.

All of this was done on a machine running Linux, but should work just fine for any *nix machine, and perhaps Windows running Cygwin (assuming that you can get Hadoop and its native libraries compiled under Windows).

This also assumes that you are building your project with maven.

Install a pseudo-distributed Hadoop cluster on your development box.  (Yes, this calls for another article on exactly how to do that, which I will write shortly and link to from here.)

Add the following environment variables to .bash_profile to ensure that they will be applied to any login shells (make sure to check the location of the directories for your installed hadoop distribution):

export LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native
export HADOOP_HOME=/usr/lib/hadoop

Make sure to include the following dependencies in your pom:

  • hadoop-mapreduce-client-core
  • hadoop-common
  • hadoop-hdfs
  • hadoop-client

After you import your maven project into Eclipse update the Build Path to include the correct path to the Native library shared objects:

  1. Right-click on your project and select ‘Build Path’ -> ‘Configure Build Path’
  2. Click on the ‘Libraries’ tab
  3. Click the drop-down arrow for the ‘Maven Dependencies’
  4. Click on the drop-down arrow for the ‘hadoop-common’ jar
  5. Select the ‘Native library location’ entry, and click ‘Edit’
  6. Browse to the path of the native directory, in my case it was /usr/lib/hadoop/lib/native.
  7. Click ‘OK’
  8. Click ‘OK’ to close the build path dialogue

Create a run configuration for the Main class in your project:

Make sure that you do not add the /etc/hadoop/conf* dir to the class path.

Add any command-line arguments for the input and output directories to the ‘Program arguments’ section of the run configuration, and make sure that they point to your LOCAL file system and not HDFS.

After that, you should be able to run your M/R code and debug it through Eclipse.

Unit Testing Private Static Methods With Primitive Array Arguments

When writing unit tests to cover your entire program you will undoubtedly come across the need to test private methods.  There are arguments that these methods should be tested via integration tests, but there are times when it makes more sense to test all of the permutations in a unit test. This can be achieved using reflection in Java JUnit tests.

What is a little tricky, and was not completely obvious, is how to use reflection to test a private static method that accepts an array of primitives.  Following is a simple example, with explanations in the comments.

Note that because the method is private, the Method instance must be made accessible with setAccessible(true) before it is invoked; otherwise invoke throws an IllegalAccessException.  To use this in a test suite, transpose it into a valid JUnit test class.

Class with private method that you want to test:

public class ByteCounter {
    private static int countByteValue(byte[] arr) {
        int retVal = 0;
        for (int i = 0; i < arr.length; i++) {
            retVal += (int) arr[i];
        }
        return retVal;
    }
}

Unit test code:

import java.lang.reflect.Method;

public class StaticArrayReflectionTest {

    public static void main(String[] args) throws Exception {
        byte[] arr = new byte[] { 1, 2, 3, 4 };

        // Get a Class instance of the class to be tested
        Class<ByteCounter> byteCounterClazz = ByteCounter.class;

        /*
         * Get a Method instance for the method to be tested. The part to take
         * note of is how to get a Class instance of an array of primitives.
         */
        Method countByteValueMethod = byteCounterClazz.getDeclaredMethod("countByteValue",
            new Class[] { byte[].class });

        // The method is private, so make it accessible before invoking it;
        // otherwise invoke() throws an IllegalAccessException.
        countByteValueMethod.setAccessible(true);

        /*
         * Invoke the Method instance passing in the arr argument.
         *
         * Take note that the first argument of invoke is 'null' as there is no object
         * instance on which to invoke the method since the method in question is
         * static. Also notice how the byte array is passed in, wrapped in an Object[]
         */
        Object result = countByteValueMethod.invoke(null, new Object[] { arr });
        System.out.println("result = " + result);
    }
}

One-Liner for Converting CRLF to LF in Text Files

If you have text files created under DOS/Windows and need to convert the CRLF (carriage return and line feed) characters to LF (line feed) characters, here is a quick one-liner.

cat file.txt | perl -ne 's/\x0D\x0A/\x0A/g; print' > file.txt.mod

You can also use dos2unix. However, especially under Cygwin, I have seen dos2unix fail without giving any meaningful information about why it was unable to complete the task.  In that case, you can just do it by hand.
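
If perl is not available either, the same conversion can be sketched with awk, together with a quick check that no CRLF line endings remain. This is an alternative of my own rather than part of the one-liner above, and the function names are hypothetical. Note that awk always terminates the last line with a line feed, even if the input did not.

```shell
# Write a copy of file $1 to file $2 with the trailing CR stripped from each line.
crlf_to_lf() {
    awk '{ sub(/\r$/, ""); print }' "$1" > "$2"
}

# Print the number of lines in file $1 that still end in CR.
count_crlf_lines() {
    awk '/\r$/ { n++ } END { print n + 0 }' "$1"
}
```

Usage: crlf_to_lf file.txt file.txt.mod && count_crlf_lines file.txt.mod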

Configuring Eclipse to Replace Tabs with Spaces for Indentation

Following are two basic settings (I believe that there are other language-specific settings as well, for C++ for instance).

For Java:

Window->Preferences->Java->Code Style->Formatter->
Click on ‘New’ to create a new profile and select the profile that you want to copy
Then click ‘Edit’ and select ‘Spaces Only’ from the ‘Tab Policy’ dropdown.

You can further set the indentation and tab size.

For default text editor:

Window->Preferences->General->Editors->Text Editors->Insert spaces for tabs

Parsing Command Line Arguments with getopt in Bash

When writing utility scripts in Bash it is tempting to simply pass positional arguments, use $1, $2, etc. and be done with it.  However, if you want to share this utility with other members of your team or incorporate it into your system, it makes sense to implement your command line argument parsing in a more flexible and maintainable manner.

Using getopt you can very easily pass a variety of command line options and arguments.

Following is a link to a GitHub Gist with an example that illustrates the implementation of flags, options or arguments with values and long option names.

getopt-example.sh
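
For a quick idea of what that looks like, following is a minimal sketch. It is not the content of the Gist; the option names (-v/--verbose and -o/--output) and the variable names are hypothetical, and it assumes the enhanced getopt from util-linux, since the long option support is not available in every getopt implementation.

```shell
# Parse a flag (-v/--verbose) and an option with a value (-o/--output FILE).
# Results are left in VERBOSE, OUTPUT and REMAINING (the non-option arguments).
parse_args() {
    VERBOSE=0
    OUTPUT=""
    # A trailing ':' marks an option that requires a value.
    PARSED=$(getopt -o vo: --long verbose,output: -n parse_args -- "$@") || return 1
    # getopt prints a normalized, quoted argument list; load it back into $@.
    eval set -- "$PARSED"
    while true; do
        case "$1" in
            -v|--verbose) VERBOSE=1; shift ;;
            -o|--output)  OUTPUT="$2"; shift 2 ;;
            --)           shift; break ;;
            *)            return 1 ;;
        esac
    done
    REMAINING="$*"
}
```

Usage: parse_args "$@" at the top of the script, then read VERBOSE, OUTPUT and REMAINING.  Short and long forms can be mixed, as in -v --output out.txt input.txt.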

Passing an Array as an Argument to a Bash Function

If you want to pass an array of items to a bash function, the simple answer is that you need to pass the expanded values.  That means that you can pass the data as a quoted value, assuming that the elements are whitespace delimited, or you can pass it as a string and then split it using an updated IFS (Internal Field Separator) inside the function.

Following is an example of taking the output of a Hive query (a single column that is separated by new lines), wrapping it in quotes and passing it as a single value to the function.

#!/bin/bash

#
# This function will accept the expanded elements of the array
#
function foo() {
   # Loop through elements in the first argument passed.
   # In this case, each is separated by whitespace so we do
   # not need to change the IFS
   for i in $1
   do
      echo "i = $i"
   done
}

# Dynamically build our hive query
HIVE_QRY="use somedb; select some_column from some_table;"

# Dynamically build the hive command to execute
CMD="hive -e '$HIVE_QRY'"

# Execute the hive query in a subshell and store the result in
# the 'QRY_RETVAL' variable
QRY_RETVAL=$(eval $CMD)

# Call the foo function and pass it the output of the query, /QUOTED/
# so that it will be passed as a single argument and not a series
# of arguments for each row returned by the query
foo "${QRY_RETVAL}"
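
The other approach mentioned above, passing a string and splitting it with an updated IFS inside the function, can be sketched as follows. The function name and the choice of passing the delimiter as a second argument are my own, and the elements are assumed not to contain glob characters such as *.

```shell
# Print each element of the string in $1, split on the delimiter in $2.
print_delimited() {
    old_ifs="$IFS"     # remember the current field separator
    IFS="$2"           # split on the caller's delimiter instead of whitespace
    for item in $1     # intentionally unquoted so that splitting happens
    do
        echo "item = $item"
    done
    IFS="$old_ifs"     # restore the original separator
}
```

Usage: print_delimited "a b,c,d e" "," keeps ‘a b’ and ‘d e’ together as single elements because only the comma splits the string.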