How To Remove the Byte Order Mark (BOM) from UTF-8 Encoded Text Files

The easiest way I have found so far is to use tail to output everything except the first three bytes (that is, to start reading at the 4th byte), as follows:

tail --bytes=+4 text_file.txt > text_file-wo-bom.txt
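
The tail command strips the first three bytes whether or not they are actually a BOM, so running it on a file without one removes real content. A minimal sketch of a guarded variant follows, assuming a POSIX shell with head, od, tr, tail, and cp available. The function name strip_bom is hypothetical and only serves as an illustration; the portable -c option is used in place of --bytes.

```shell
# Hypothetical helper (not from the original post): copy SRC to DST,
# dropping a leading UTF-8 BOM (bytes EF BB BF) only if one is present.
strip_bom() {
    src="$1"
    dst="$2"
    # Dump the first three bytes as lowercase hex with no separators, e.g. "efbbbf".
    first_bytes=$(head -c 3 "$src" | od -An -tx1 | tr -d ' \n')
    if [ "$first_bytes" = "efbbbf" ]; then
        # BOM found: start output at the 4th byte.
        tail -c +4 "$src" > "$dst"
    else
        # No BOM: copy the file unchanged.
        cp "$src" "$dst"
    fi
}
```

It could then be called as strip_bom text_file.txt text_file-wo-bom.txt, and files without a BOM pass through untouched.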

[SOLVED] java.lang.NoSuchMethodError: org.apache.avro.generic.GenericData.createDatumWriter When Using Avro Data with MapReduce

I am working on a project and have decided to use Avro as the data serialization format.

I encountered the following error in Eclipse while setting up a unit test for the mapper implementation:

java.lang.NoSuchMethodError: org.apache.avro.generic.GenericData.createDatumWriter(Lorg/apache/avro/Schema;)Lorg/apache/avro/io/DatumWriter;
    at org.apache.avro.hadoop.io.AvroSerialization.getSerializer(AvroSerialization.java:114)
    at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:82)
    at org.apache.hadoop.mrunit.internal.io.Serialization.copy(Serialization.java:67)
    at org.apache.hadoop.mrunit.internal.io.Serialization.copy(Serialization.java:98)
    at org.apache.hadoop.mrunit.internal.io.Serialization.copyWithConf(Serialization.java:111)
    at org.apache.hadoop.mrunit.TestDriver.copy(TestDriver.java:676)
    at org.apache.hadoop.mrunit.TestDriver.copyPair(TestDriver.java:680)
    at org.apache.hadoop.mrunit.MapDriverBase.addInput(MapDriverBase.java:120)
    at org.apache.hadoop.mrunit.MapDriverBase.addInput(MapDriverBase.java:130)
    at org.apache.hadoop.mrunit.MapDriverBase.addAll(MapDriverBase.java:141)
    at org.apache.hadoop.mrunit.MapDriverBase.withAll(MapDriverBase.java:247)
    at com.ryanchapin.hadoop.mapreduce.mrunit.UserDataSortTest.testMapper(UserDataSortTest.java:111)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
    

Configuring Hidden, Invisible, or Whitespace Characters in The Eclipse Text Editor

Newer versions of Eclipse (I am currently using Mars, 4.5.0) provide very good tools for configuring the visibility of whitespace characters in code.

To customize your settings go to Window -> Preferences -> General -> Editors -> Text Editors.

On that page there is a checkbox next to “Show whitespace characters (configure visibility)”.

Clicking the ‘configure visibility’ link lets you choose which whitespace characters are shown and how opaque they are, which is a really …