Removing the Last Token From a String in Bash with awk

Let’s say that you have some number of files and, for each one, you want to create a containing directory whose name is the file name with its last token removed.

This is much easier to explain with an example. Given this list of files:

ls -1
foo_10_10_sometrash
foo_1_sometrash
foo_2_sometrash
foo_3_sometrash
foo_4_sometrash
foo_5_5_sometrash
foo_5_sometrash
foo_6_6_sometrash
foo_7_7_sometrash
foo_8_8_sometrash
foo_9_9_sometrash

You want to create a directory for each of the files as follows:

foo_5_sometrash should have a directory named foo_5.

Further, let’s assume that you have thousands, or even hundreds of thousands, of files. In that case, scripting it while you get a cup of coffee is the preferred solution.

The work is done in a for i in loop that iterates over the files in the current directory (using a glob rather than parsing the output of ls, which breaks on unusual file names), with a nested awk command.

for i in *; do DIRNAME=$(echo "$i" | awk -F_ '{$NF=""; print $0}' | sed 's/ /_/g' | sed 's/_$//'); mkdir -p "$DIRNAME"; done

Here is the command broken down:

DIRNAME=$(....)

sets the variable DIRNAME to the output of the commands within the parentheses.

awk -F_ '{$NF=""; print $0}'

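sets the field separator to ‘_’, the character on which you will be ‘splitting’ your string. $NF="" sets the last field to an empty string, and print $0 then prints the whole record. Because a field was modified, awk rebuilds $0 by joining the fields with the output field separator (OFS), which defaults to a space, so foo_5_sometrash comes out as "foo 5 " (with a trailing space).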

The following sed commands will replace the spaces generated by the awk command with the original separators, ‘_’, and then remove the spurious, trailing ‘_’.
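If you would rather avoid the two sed calls, one option is to set OFS to ‘_’ so the record awk rebuilds already uses underscores; bash’s own ${var%_*} expansion is another. The following is a minimal sketch of both; the helper names strip_last_token and strip_last_token_bash are hypothetical, and both assume the names contain at least one ‘_’.

```shell
#!/bin/bash
# Sketch: drop the last '_'-delimited token in a single awk call.
# With OFS set to '_', clearing $NF rebuilds the record as e.g. "foo_5_",
# and sub() then removes the trailing underscore.
strip_last_token() {
  printf '%s\n' "$1" | awk -F_ -v OFS=_ '{ $NF = ""; sub(/_$/, ""); print }'
}

# Pure-bash equivalent: remove the shortest suffix that matches "_*".
strip_last_token_bash() {
  printf '%s\n' "${1%_*}"
}
```

Either helper can replace the awk/sed pipeline in the loop, e.g. for i in *_*; do mkdir -p "$(strip_last_token "$i")"; done. The parameter expansion version avoids starting any external processes, which adds up when there are hundreds of thousands of files.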

Creating an Array in Bash from a File With Each Element on a Separate Line

Let’s say that you have a file and you would like to convert each line in the file to an element in an array.

The key to this is knowing how to manipulate IFS (the Internal Field Separator). The default IFS consists of a space, a tab, and a newline, and if you create an array from a whitespace-delimited list of strings, each token becomes an element of the array.

ARRAY=(a b d c)

This results in an array with a single letter in each element.

To do the same thing with the contents of a file, where each element is on a separate line, first set IFS so that it contains only line-ending characters (here a newline plus a carriage return, so that files with Windows-style CRLF line endings are handled too). Then use the contents of the file as the input for the array. Note that this approach drops blank lines.

# Save our existing IFS
OIFS="$IFS"

# Set our IFS to a new-line/carriage return
IFS=$'\r\n'

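# Create the array with the contents of a file
# (set -f disables filename expansion, so lines containing * or ? are kept literally)
set -f
TEST_ARRAY=($(cat some_file.txt))
set +f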

# Reset our IFS
IFS="$OIFS"

for i in "${TEST_ARRAY[@]}"
do
   echo "$i"
done
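On bash 4.0 or later, the mapfile builtin (also called readarray) can load the lines without touching IFS at all. This is a minimal sketch, assuming bash 4.0+; the helper name file_to_array is hypothetical, and unlike the IFS approach it keeps blank lines as empty elements.

```shell
#!/bin/bash
# Sketch: load each line of a file into the array TEST_ARRAY with mapfile.
# file_to_array is a hypothetical helper name; requires bash 4.0+.
file_to_array() {
  local cr=$'\r'
  # -t removes the trailing newline from each line. No word splitting or
  # filename expansion is applied, and blank lines become empty elements.
  mapfile -t TEST_ARRAY < "$1"
  # Strip a trailing carriage return from every element (CRLF files).
  TEST_ARRAY=("${TEST_ARRAY[@]%"$cr"}")
}
```

Usage: file_to_array some_file.txt, then iterate with for i in "${TEST_ARRAY[@]}"; do echo "$i"; done exactly as above.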

Removing The Last N Characters From a String in Bash Script with sed

Here is a quick one-liner for trimming a specific number of characters from the end of a string under bash:

# Remove the last 5 characters
$ echo "somestringwith12345" | sed "s/.....$//"
somestringwith

# Remove the last 3 characters
$ echo "somestringwith12345" | sed "s/...$//"
somestringwith12
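When N comes from a variable, bash substring expansion avoids building a regular expression with the right number of dots. A minimal sketch follows; remove_last_n is a hypothetical helper name, and it returns an empty string when N is at least the length of the input.

```shell
#!/bin/bash
# Sketch: remove the last N characters of a string with ${s:offset:length}.
# remove_last_n is a hypothetical helper name.
remove_last_n() {
  local s=$1 n=$2
  if (( n >= ${#s} )); then
    # N covers the whole string, so nothing is left.
    printf '\n'
  else
    # Keep the first (length - N) characters.
    printf '%s\n' "${s:0:${#s}-n}"
  fi
}
```

For example, remove_last_n "somestringwith12345" 5 prints somestringwith. If you prefer to stay with sed, the same idea can be written with a BRE interval expression, such as sed "s/.\{$n\}\$//".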

Splitting a String in Bash on the FIRST Occurrence of a Character

About a year ago I posted an article about how to split a string into an array of values based on a given delimiter in bash.

The following is how to take that same string and split it on the first occurrence of the same user-defined delimiter.

Both use the ‘read’ command, but in a slightly different way.

Instead of passing read the -a [aname] parameter, which tells it that “The words are assigned to sequential indices of the array variable aname, starting at 0.”, we pass it -r, which indicates that “Backslash does not act as an escape character. The backslash is considered to be part of the line.”. This makes sure that any backslash in the string is preserved in your output.

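Then, we provide two variables into which we will store the split string. Because read assigns the first field to the first variable and everything that remains, delimiters included, to the last variable, KEY receives foo and VALUE receives blah|moo.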

#!/bin/bash

SOURCE_STRING='foo|blah|moo'

# Save the initial Internal Field Separator
OIFS="$IFS"

# Set the IFS to a custom delimiter
IFS='|'

read -r KEY VALUE <<< "${SOURCE_STRING}"
echo "KEY = $KEY, VALUE = $VALUE"

# Reset original IFS
IFS="$OIFS"
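Two variations on the script above may be handy. Prefixing the assignment to the read command scopes IFS to that one command, so there is nothing to save and restore; alternatively, parameter expansion splits on the first delimiter without read at all. This is a sketch; the names KEY2, VALUE2, and delim are illustrative.

```shell
#!/bin/bash
SOURCE_STRING='foo|blah|moo'

# 1) IFS applies only to this read, so the shell's IFS is left untouched.
IFS='|' read -r KEY VALUE <<< "${SOURCE_STRING}"
echo "KEY = $KEY, VALUE = $VALUE"

# 2) Parameter expansion (the quoted "$delim" is matched literally):
#    %%"$delim"*  removes the longest suffix starting at a delimiter,
#                 leaving the text before the first delimiter.
#    #*"$delim"   removes the shortest prefix ending at a delimiter,
#                 leaving the text after the first delimiter.
delim='|'
KEY2=${SOURCE_STRING%%"$delim"*}
VALUE2=${SOURCE_STRING#*"$delim"}
echo "KEY2 = $KEY2, VALUE2 = $VALUE2"
```

One behavioral difference: if the string contains no delimiter, read leaves VALUE empty, whereas the #*"$delim" expansion matches nothing and VALUE2 ends up holding the whole string.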

BASH Script With Default Arguments Defined in The Script

Often you will want to write a BASH script without having to keep track of a long list of positional command-line arguments, configuring it instead with a set of environment variables, each of which has a default value defined in the script.

Following is the syntax for declaring them in the shell script, followed by an example of how to invoke it.

#!/bin/bash

# Assign each default only if the variable is unset or empty
: "${ARG1:=somedefault_arg1}"
: "${ARG2:=10}"

echo "ARG1 = $ARG1"
echo "ARG2 = $ARG2"

$ ./default-bash-vars.sh
ARG1 = somedefault_arg1
ARG2 = 10
$ ARG1="someOtherArg1" ARG2="20" ./default-bash-vars.sh
ARG1 = someOtherArg1
ARG2 = 20

In the example script, we have two variables, ARG1 and ARG2. The : (no-op) command forces each ${VAR:=default} expansion to be evaluated, and := assigns the default only when the variable is unset or empty. When you run the script without providing any additional configuration, the default values are used. When you define the variables on the command line before the script name, those values are used instead.

This avoids the situation where you have many command-line arguments and have to juggle the positional $1, $2, ... variables in the script.
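Bash has several closely related default-value expansions, and the colon matters. The sketch below contrasts them; the variable names A, B, and C are purely illustrative.

```shell
#!/bin/bash
# Sketch contrasting default-value expansions (A, B, C are illustrative names).
A=""
B=""
unset C

: "${A:=fallback}"   # ':=' assigns when A is unset OR empty, so A becomes "fallback"
: "${B=fallback}"    # '='  assigns only when B is unset, so B stays ""
echo "C expands to ${C:-fallback}"   # ':-' substitutes without assigning

if [ -z "${C+x}" ]; then
  echo "C is still unset"
fi
```

Applied to the script above, this means that running ARG2= ./default-bash-vars.sh (ARG2 set but empty) still prints ARG2 = 10, because := treats an empty value like an unset one.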

Running Dynamically Generated Hive Queries From a Shell Script

If you want to write an HQL hive query and run it multiple times from a shell script, passing it different data each time, here is a quick example that should get you started.

The first thing to know is that specifying any number of -hivevar key=value pairs when invoking hive on the command line allows you to pass that data into the hive process.

For example, if you do the following

$ hive -e 'SELECT * FROM some_table' -hivevar FOO=blah

You will have passed in a key of FOO with the value of ‘blah’ to the hive process.

A more practical example would be wanting to run the same hive query over multiple data partitions.

In this example, I’ve got a hive database that has a ‘packets’ table partitioned by hour, with partition values that look like 2014032601.

The hive query file (dest_ip_hive.sql) would look like:

SELECT packets.sourceip FROM packets
WHERE packets.destip = "${hivevar:DEST_IP}"
AND packets.hour = ${hivevar:HOUR}
GROUP BY packets.sourceip

And a shell script that would dynamically set those values for each invocation of hive would look like:

#!/bin/bash

#
# Destination IP that we are using to determine which
# packets we will examine.
#
DEST_IP="10.0.1.10"

for HOUR in 2014032209 2014032210 2014032211 2014032212
do

   echo "Running hive query for HOUR $HOUR"

   # Run a hive query from the command line, setting variables that will be
   # expanded in the .sql file.
   hive -hivevar HOUR=$HOUR -hivevar DEST_IP=$DEST_IP \
   -f dest_ip_hive.sql > "${DEST_IP}-${HOUR}.out"

done

For each hour defined in the for loop, we will execute a hive command telling it to run the query contained in the file dest_ip_hive.sql.  The DEST_IP and HOUR variables that will be expanded in the query are passed to hive via the

-hivevar HOUR=$HOUR -hivevar DEST_IP=$DEST_IP

part of the hive command, and the output of each query is written to its own file, named after the destination IP and the hour.
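If you need every hour of a day rather than a hand-written list, a small helper can generate the partition values for the loop. This is a sketch that assumes the partitions follow the YYYYMMDDHH pattern shown above (e.g. 2014032601); the function name hourly_partitions is hypothetical.

```shell
#!/bin/bash
# Sketch: print the 24 hourly partition values (YYYYMMDDHH) for one day.
# hourly_partitions is a hypothetical helper; the format is assumed from the
# example partition values above.
hourly_partitions() {
  local day=$1 h
  for h in {0..23}; do
    # Zero-pad the hour to two digits: 0 -> 00, 9 -> 09, 23 -> 23.
    printf '%s%02d\n' "$day" "$h"
  done
}
```

The loop in the script could then read for HOUR in $(hourly_partitions 20140322), which runs the query once for each hour from 2014032200 through 2014032223.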

Eclipse Android Development Error executing aapt: Cannot run program “/path/to/aapt”: error=2, No such file or directory: error=2, No such file or directory

Even though the ADT bundle provides a 64-bit version, the system requirements state that “64-bit distributions must be capable of running 32-bit applications.” I missed that when installing it under Fedora 20 and was getting the following error from Eclipse:

Error executing aapt: Cannot run program "/home/rchapin/sdks/adt-bundle-linux-x86_64-20131030/sdk/build-tools/android-4.4/aapt": error=2, No such file or directory: error=2, No such file or directory  android_sdk    line 1   Android ADT Problem

I checked to see if the file was there. Yep. I checked to see if it was executable. Yep.

It was only after finding a blog post about it and running the file command on the binary that I noticed it was a 32-bit executable:

file adt-bundle-linux-x86_64-20131030/sdk/build-tools/android-4.4/aapt 
adt-bundle-linux-x86_64-20131030/sdk/build-tools/android-4.4/aapt: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.8, not stripped

All I had to do was install the 32-bit libraries that the binaries are linked against:

yum install glibc.i686 zlib.i686 libstdc++.i686 ncurses-libs.i686 libgcc.i686

Once the libraries are installed, the error should disappear the next time Eclipse invokes the binaries during a regular build. If it does not, restarting Eclipse or cleaning the project should clear the errors.
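If you want to check several binaries at once, or the file command is not installed, the bitness can be read straight from the ELF header: bytes 0 to 3 are the magic number 0x7f 'E' 'L' 'F', and byte 4 (EI_CLASS) is 1 for 32-bit and 2 for 64-bit. This is a minimal sketch built on od; the function name elf_class is hypothetical.

```shell
#!/bin/bash
# Sketch: report whether a file is a 32-bit or 64-bit ELF binary by reading
# the EI_CLASS byte of its header. elf_class is a hypothetical helper name.
elf_class() {
  local magic class
  # First four bytes as hex, e.g. "7f454c46" for "\177ELF".
  magic=$(od -An -t x1 -N 4 "$1" | tr -d ' \n')
  if [ "$magic" != "7f454c46" ]; then
    echo "not-elf"
    return 1
  fi
  # Byte at offset 4 as an unsigned decimal: 1 = ELFCLASS32, 2 = ELFCLASS64.
  class=$(od -An -t u1 -j 4 -N 1 "$1" | tr -d ' \n')
  case "$class" in
    1) echo "32-bit" ;;
    2) echo "64-bit" ;;
    *) echo "unknown" ;;
  esac
}
```

For example: for f in adt-bundle-linux-x86_64-20131030/sdk/build-tools/android-4.4/*; do echo "$f: $(elf_class "$f")"; done. Any file reported as 32-bit will need the i686 libraries on a 64-bit system.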