Hadoop – Ryan Chapin's Website

[SOLVED] Ambari There are no DataNodes to do rolling restarts when there are DataNodes that do need a restart

Posted on September 16, 2016 (updated January 28, 2021) by rchapin

When maintaining a Hadoop cluster, you will need to restart various services from time to time when you update Hadoop configurations.

I ran into a problem today with Ambari where I wanted to do a rolling restart of all of my DataNodes, but when I clicked on the “Restart DataNodes” entry in the “Restart” drop-down, the dialog indicated “There are no DataNodes to do rolling restarts”.

This was clearly incorrect.

It did not take me too long to figure out that → Continue reading

[SOLVED] Unable to Connect to ambari-metrics-collector Issues

Posted on September 2, 2016 (updated January 28, 2021) by rchapin

I was having some issues with the ambari-metrics family of services on a ‘pseudo-distributed’ cluster that I have installed on my workstation.

The symptoms were:

1. Ambari indicated the following CRITICAL errors in the Ambari Dashboard under the Ambari Metrics section

Connection failed: [Errno 111] Connection refused to rchapin-wrkstn:6188

2. After attempting to restart the ambari-metrics-collector via either the Ambari Dashboard or the command line (# ambari-metrics-collector [stop|start]), you see messages similar to the following in ambari-metrics-collector.log:

2016-09-02 12:15:37,505 INFO

→ Continue reading
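
The first symptom above, a refused connection to port 6188, can be confirmed independently of the Ambari Dashboard. The following is a minimal sketch, not part of the original post: the helper name `port_is_open` is hypothetical, and the host and port are simply the values from the alert text. It only tells you whether anything is listening on the port, not why the collector is down.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "[Errno 111] Connection refused", timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Host and port come from the alert text above; substitute your own values.
    host, port = "rchapin-wrkstn", 6188
    state = "accepting connections" if port_is_open(host, port) else "not reachable"
    print(f"{host}:{port} is {state}")
```

Running this before and after a restart attempt shows whether the collector ever starts listening, which helps separate a startup failure from a problem on the Ambari alerting side.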

[SOLVED] java.lang.NoSuchMethodError: org.apache.avro.generic.GenericData.createDatumWriter When Using Avro Data with MapReduce

Posted on January 14, 2016 (updated January 30, 2021) by rchapin

I am working on a project and have decided to use Avro for the data serialization format.

I encountered the following error when trying to set up the unit test to test the mapper implementation through Eclipse:

java.lang.NoSuchMethodError: org.apache.avro.generic.GenericData.createDatumWriter(Lorg/apache/avro/Schema;)Lorg/apache/avro/io/DatumWriter;
    at org.apache.avro.hadoop.io.AvroSerialization.getSerializer(AvroSerialization.java:114)
    at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:82)
    at org.apache.hadoop.mrunit.internal.io.Serialization.copy(Serialization.java:67)
    at org.apache.hadoop.mrunit.internal.io.Serialization.copy(Serialization.java:98)
    at org.apache.hadoop.mrunit.internal.io.Serialization.copyWithConf(Serialization.java:111)
    at org.apache.hadoop.mrunit.TestDriver.copy(TestDriver.java:676)
    at org.apache.hadoop.mrunit.TestDriver.copyPair(TestDriver.java:680)
    at org.apache.hadoop.mrunit.MapDriverBase.addInput(MapDriverBase.java:120)
    at org.apache.hadoop.mrunit.MapDriverBase.addInput(MapDriverBase.java:130)
    at org.apache.hadoop.mrunit.MapDriverBase.addAll(MapDriverBase.java:141)
    at org.apache.hadoop.mrunit.MapDriverBase.withAll(MapDriverBase.java:247)
    at com.ryanchapin.hadoop.mapreduce.mrunit.UserDataSortTest.testMapper(UserDataSortTest.java:111)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)

→ Continue reading
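
In general, a `NoSuchMethodError` at runtime means the class that was actually loaded comes from a different version of a library than the one the calling code was compiled against. One way to look for such a mismatch is to list every jar that bundles the class named in the error. The sketch below is an illustration only and not the fix from the full post: the helper name `jars_containing` is hypothetical, and it assumes the jars of interest sit under a single directory tree (for example a local Maven repository or a project's dependency folder).

```python
import sys
import zipfile
from pathlib import Path
from typing import List


def jars_containing(class_name: str, search_dir: str) -> List[str]:
    """Return paths of all .jar files under search_dir that contain class_name."""
    # A class a.b.C is stored inside a jar as the entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(search_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(str(jar))
        except zipfile.BadZipFile:
            continue  # skip files that are not valid zip archives
    return hits


if __name__ == "__main__":
    # Usage (assumed): python find_class.py <directory containing jars>
    search_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in jars_containing("org.apache.avro.generic.GenericData", search_dir):
        print(path)
```

If more than one jar is listed, comparing their versions against the one your code expects is a reasonable next step. Which of them ends up first on the Eclipse test classpath is not something this script can determine.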

Hadoop Cluster Sizing Wizard by Hortonworks

Posted on July 13, 2015 (updated February 3, 2021) by rchapin

Anyone who does any Hadoop development or systems engineering arrives at the “how should I size my cluster” question.

Hortonworks has a very nice cluster sizing calculator that takes into account the basic use cases and data profile to help get you started with your hardware requirements. → Continue reading

Debugging MapReduce MRv2 Code in Eclipse

Posted on March 24, 2015 (updated March 27, 2021) by rchapin

The following describes how to set up your environment so that you can set breakpoints, step through, and debug your MapReduce code in Eclipse.

All of this was done on a machine running Linux, but should work just fine for any *nix machine, and perhaps Windows running Cygwin (assuming that you can get Hadoop and its native libraries compiled under Windows).

This also assumes that you are building your project with Maven.

Install a pseudo-distributed Hadoop cluster on your development box. (Yes, → Continue reading

Restarting Individual Services or the Entire HDP Stack in the Hortonworks Virtual Sandbox

Posted on October 13, 2014 (updated March 27, 2021) by rchapin

I’m using the Hortonworks Virtual Sandbox for development and testing and wanted to restart the HDP stack without (of course) having to restart the VM.

It took me a little while to figure out how to go about it as Internet searches on the topic revealed very little.

It turns out that Hortonworks have set up their own service on the box, startup_script.

If you take a look at /etc/init.d/startup_script you will see that it calls a number of other → Continue reading