Creating an Array in Bash from a File With Each Element on a Separate Line

Let’s say that you have a file and you would like to convert each line in the file to an element in an array.

The key to this is knowing about the IFS (Internal Field Separator) and how to manipulate it. By default, IFS contains a space, a tab, and a newline, so if you create an array from a whitespace-delimited list of strings, each token becomes an element in the array.

ARRAY=(a b d c)

Will result in an array with a single letter in each element.
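
Strictly speaking, the literal words in ARRAY=(a b d c) are separated by the shell's parser. IFS comes into play when the words come from an unquoted expansion such as $VAR or $(cat file). The minimal sketch below shows that difference; LIST, DEFAULT_SPLIT, COMMA_SPLIT and split_on_commas are hypothetical names used only for illustration.

```shell
#!/bin/bash
# Sketch: IFS controls how an unquoted expansion is split into array elements.
LIST="a,b d,c"

# Default IFS (space, tab, newline): the unquoted $LIST splits on the space only.
DEFAULT_SPLIT=($LIST)

# Hypothetical helper with its own local IFS, so the caller's IFS is untouched.
split_on_commas() {
   local IFS=','
   COMMA_SPLIT=($1)   # COMMA_SPLIT is not declared local, so it is global
}
split_on_commas "$LIST"

printf 'default: %s\n' "${DEFAULT_SPLIT[@]}"   # "a,b" and "d,c"
printf 'commas:  %s\n' "${COMMA_SPLIT[@]}"     # "a", "b d" and "c"
```

Using local IFS inside a function is one way to change the separator for a single operation without having to save and restore it by hand.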

To do the same thing with the contents of a file, where each element is on a separate line, first set IFS so that it contains only line-ending characters: the newline and the carriage return (the carriage return lets files with Windows-style CRLF line endings split correctly too). Then use the contents of the file as the input for the array.

# Save our existing IFS
OIFS="$IFS"

# Split only on carriage returns and newlines
IFS=$'\r\n'

# Disable globbing so a line such as "*" is not expanded to file names
set -f

# Create the array with the contents of a file
TEST_ARRAY=($(cat some_file.txt))

# Re-enable globbing and reset our IFS
set +f
IFS="$OIFS"

for i in "${TEST_ARRAY[@]}"
do
   echo "$i"
done
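
Bash 4 and later also provide the mapfile builtin (also available as readarray), which reads lines straight into an array without relying on IFS or globbing and keeps blank lines as empty elements. This is a different technique from the IFS approach above; the temporary file and the LINES name in this sketch are only for illustration.

```shell
#!/bin/bash
# Sketch: read a file into an array, one line per element, with mapfile (bash 4+).
# A temporary file is created here so the example is self-contained.
tmp_file=$(mktemp)
printf '%s\n' 'first line' '  indented * line' '' 'last line' > "$tmp_file"

# -t strips the trailing newline from each element.
mapfile -t LINES < "$tmp_file"

echo "Read ${#LINES[@]} lines"
for line in "${LINES[@]}"
do
   # Brackets make leading spaces and empty lines visible.
   printf '[%s]\n' "$line"
done

rm -f "$tmp_file"
```

Note that mapfile does not strip carriage returns, so a file with CRLF line endings would leave a trailing \r on each element.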

Removing The Last N Characters From a String in Bash Script with sed

Here is a quick one-liner for trimming a specific number of characters from the end of a string under bash:

# Remove the last 5 characters
$ echo "somestringwith12345" | sed 's/.....$//'
somestringwith

# Remove the last 3 characters
$ echo "somestringwith12345" | sed 's/...$//'
somestringwith12
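
When N lives in a variable, typing N dots gets awkward. The sketch below shows two alternatives: Bash substring expansion, and a sed interval expression that repeats "." N times. The names str, N and trimmed are illustrative, and the substring form assumes N is not larger than the string's length (Bash reports an error if the computed length is negative).

```shell
#!/bin/bash
# Sketch: remove the last N characters with N held in a variable.
str="somestringwith12345"
N=5

# Substring expansion: keep the first (length - N) characters.
trimmed=${str:0:${#str}-N}
echo "$trimmed"

# sed interval expression: sed receives s/.\{5\}$// here.
echo "$str" | sed "s/.\{$N\}\$//"
```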

Splitting a String in Bash on the FIRST Occurrence of a Character

About a year ago I posted an article about how to split a string into an array of values based on a given delimiter in bash.

The following is how to take that same string and split it on the first occurrence of the same user defined delimiter.

Both use the ‘read’ command, but in a slightly different way.

Instead of passing read the -a [aname] option, which tells it that “The words are assigned to sequential indices of the array variable aname, starting at 0.”, we pass it -r, which indicates that “Backslash does not act as an escape character. The backslash is considered to be part of the line.”. This makes sure that any backslash in the string is included in your output.

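Then, we provide two variables into which we will store the split string. Because read assigns everything left over after the earlier fields to the last variable named, KEY receives the text before the first delimiter and VALUE receives the rest of the string, including any later delimiters.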

#!/bin/bash

SOURCE_STRING='foo|blah|moo'

# Save the initial Internal Field Separator
OIFS="$IFS"

# Set the IFS to a custom delimiter
IFS='|'

read -r KEY VALUE <<< "${SOURCE_STRING}"
echo "KEY = $KEY, VALUE = $VALUE"
# Prints: KEY = foo, VALUE = blah|moo

# Reset original IFS
IFS="$OIFS"
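
The same first-occurrence split can also be done with Bash parameter expansion, without touching IFS at all. This is a swapped-in technique rather than the read approach above; the sketch also shows read -a for comparison, which splits on every delimiter. PARTS is a hypothetical variable name.

```shell
#!/bin/bash
# Sketch: split on the first "|" using parameter expansion instead of read.
SOURCE_STRING='foo|blah|moo'

# %% removes the longest suffix matching "|*", leaving the text before the first "|".
KEY=${SOURCE_STRING%%\|*}
# # removes the shortest prefix matching "*|", leaving everything after the first "|".
VALUE=${SOURCE_STRING#*\|}

echo "KEY = $KEY, VALUE = $VALUE"

# For comparison: read -a splits on every "|"; the IFS assignment applies only to read.
IFS='|' read -r -a PARTS <<< "$SOURCE_STRING"
echo "PARTS has ${#PARTS[@]} elements"
```

One difference to be aware of: if the string contains no "|", both expansions leave it unchanged, so KEY and VALUE both hold the whole string, whereas read leaves VALUE empty.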

BASH Script With Default Arguments Defined in The Script

Often you will want to write a BASH script where you don’t have to keep track of all of the positional command-line arguments, or where you want to configure it with a set of environment variables while still having a default value for each in the script.

Following is the syntax for declaring them in the shell script, and then an example of how to invoke it.

#!/bin/bash

# Assign each default only if the variable is unset or empty
: "${ARG1:=somedefault_arg1}"
: "${ARG2:=10}"

echo "ARG1 = $ARG1"
echo "ARG2 = $ARG2"

$ ./default-bash-vars.sh
ARG1 = somedefault_arg1
ARG2 = 10
$ ARG1="someOtherArg1" ARG2="20" ./default-bash-vars.sh
ARG1 = someOtherArg1
ARG2 = 20

In the example script, we have two variables, ARG1 and ARG2. When the script runs without any additional configuration, the default values are used. When the variables are defined on the command line before the script name, those values are used instead. Because := is used, a variable that is set but empty also gets its default.

This avoids the situation where you have many command-line arguments and have to juggle the positional $1, $2, … variables in the script.
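
Two related forms are worth knowing. Positional parameters cannot be assigned with :=, but ${1:-default} substitutes a default for them; and ${VAR:-default} expands to the default without assigning it, whereas ${VAR:=default} also sets the variable. The sketch below illustrates both; show_args and ARG3 are hypothetical names.

```shell
#!/bin/bash
# Sketch: defaults for positional arguments, and :- versus :=.
show_args() {
   # ${1:-...} substitutes a default when $1 is unset or empty.
   local name=${1:-somedefault_arg1}
   local count=${2:-10}
   echo "name = $name, count = $count"
}

show_args                      # name = somedefault_arg1, count = 10
show_args "someOtherArg1" 20   # name = someOtherArg1, count = 20

unset ARG3
echo "${ARG3:-fallback}"       # expands to fallback, but ARG3 stays unset
: "${ARG3:=fallback}"          # expands to fallback and assigns ARG3
echo "ARG3 = $ARG3"
```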

Running Dynamically Generated Hive Queries From a Shell Script

If you want to write an HQL Hive query and run it multiple times from a shell script, passing it different data each time, here is a quick example that should get you started.

The first thing to know is that specifying any number of -hivevar key=value pairs when invoking hive on the command line passes that data into the hive process.

For example, if you run the following:

$ hive -e 'SELECT * FROM some_table' -hivevar FOO=blah

You will have passed in a key of FOO with the value of ‘blah’ to the hive process.

A more practical example would be wanting to run the same hive query over multiple data partitions.

In this example, I’ve got a Hive database with a ‘packets’ table that is partitioned by hour, with partition values that look like 2014032601 (YYYYMMDDHH).

The hive query file (dest_ip_hive.sql) would look like:

SELECT packets.sourceip FROM packets
WHERE packets.destip = "${hivevar:DEST_IP}"
AND packets.hour = ${hivevar:HOUR}
GROUP BY packets.sourceip

And a shell script that would dynamically set those values for each invocation of hive would look like:

#!/bin/bash

#
# Destination IP that we are using to determine which
# packets we will examine.
#
DEST_IP="10.0.1.10"

for HOUR in 2014032209 2014032210 2014032211 2014032212
do

   echo "Running hive query for HOUR $HOUR"

   # Run a hive query from the command line, setting variables that will be
   # expanded in the .sql file.
   hive -hivevar HOUR="$HOUR" -hivevar DEST_IP="$DEST_IP" \
      -f dest_ip_hive.sql > "${DEST_IP}-${HOUR}.out"

done

For each hour defined in the for loop, we will execute a hive command telling it to run the query contained in the file dest_ip_hive.sql.  The DEST_IP and HOUR variables that will be expanded in the query are passed to hive via the

-hivevar HOUR=$HOUR -hivevar DEST_IP=$DEST_IP

part of the hive command. The output of each query is written to its own file, named after the destination IP and the hour.
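
Rather than typing every partition hour into the for loop, the list can be generated from a start time and a count. The sketch below assumes GNU date (for -d and the @epoch syntax) and UTC hours; START and COUNT are hypothetical settings, and the hive line is left commented out so the sketch runs without Hive installed.

```shell
#!/bin/bash
# Sketch: build consecutive hourly partition keys (YYYYMMDDHH). Assumes GNU date.
DEST_IP="10.0.1.10"
START="2014-03-22 09:00"   # hypothetical first hour, interpreted as UTC
COUNT=4                    # hypothetical number of hours

start_epoch=$(date -u -d "$START" +%s)

HOURS=()
for (( n = 0; n < COUNT; n++ ))
do
   # Add n hours (3600 seconds each) and format the result as YYYYMMDDHH.
   HOURS+=( "$(date -u -d "@$(( start_epoch + n * 3600 ))" +%Y%m%d%H)" )
done

for HOUR in "${HOURS[@]}"
do
   echo "Running hive query for HOUR $HOUR"
   # hive -hivevar HOUR="$HOUR" -hivevar DEST_IP="$DEST_IP" \
   #    -f dest_ip_hive.sql > "${DEST_IP}-${HOUR}.out"
done
```

Working in epoch seconds means day and month boundaries are handled by date rather than by string arithmetic.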

Creating a Beep from a Command Line or Shell Script

If you have a long-running command or shell script and want your PC running Linux to beep when it completes, do the following:

Make sure that the pcspkr module is loaded:

# modprobe pcspkr

Then create a wrapper shell script that looks something like this:

#!/bin/bash

# Some long running command here . . .

# Ring the bell on the system console (writing to /dev/console usually requires root)
echo -e '\a' > /dev/console
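
If you would rather not edit each script, a small wrapper function can run any command, ring the bell of the current terminal afterwards, and still return the command's exit status. This is a sketch: beep_when_done is a hypothetical name, and whether the bell is audible depends on your terminal's settings rather than on the pcspkr module.

```shell
#!/bin/bash
# Sketch: hypothetical wrapper that runs a command, then rings the terminal bell.
beep_when_done() {
   "$@"
   local status=$?        # exit status of the wrapped command
   # \a is the ASCII BEL character; it is sent to the current terminal via stderr,
   # which does not need root (unlike writing to /dev/console).
   printf '\a' >&2
   return "$status"
}

# Usage (hypothetical): beep_when_done make all
```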

Executing Dynamically Generated SQL Queries in a Shell Script and Saving the Output to a Variable

If you want a shell script to dynamically generate SQL queries for MySQL and save the output of each query to a variable that you can then use in the script, here is an example:

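#!/bin/bash

# USER_NAME, PASSWORD, HOST, PORT, DBASE and OUT_FILE must be set
# (for example, exported in the environment) before running this script.

# Read one table name per line from tables_list.txt
while IFS= read -r TABLE
do

   # Build the query
   QUERY="SELECT count(*) FROM ${TABLE}"

   # Run the query from the command line and save the
   # output into the $ROW_COUNT variable
   ROW_COUNT=$(mysql -u"${USER_NAME}" -p"${PASSWORD}" -h "${HOST}" -P "${PORT}" \
      --skip-column-names -e "$QUERY" "${DBASE}")

   # Do something with the var: write "<count><TAB><table>" to the output file
   printf '%s\t%s\n' "$ROW_COUNT" "$TABLE" >> "$OUT_FILE"

done < tables_list.txt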
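
Because the table name is spliced directly into the SQL text, an unusual name (for example, one containing a space or a backtick) can break the query. One option, sketched below, is to quote the name as a MySQL identifier: wrap it in backticks and double any backtick inside it. quote_identifier is a hypothetical helper, not part of the mysql client.

```shell
#!/bin/bash
# Sketch: quote a table name as a MySQL identifier before building the query.
quote_identifier() {
   local bt='`'
   # Double every backtick inside the name, then wrap the result in backticks.
   local name=${1//$bt/$bt$bt}
   printf '%s%s%s' "$bt" "$name" "$bt"
}

TABLE='orders'
QUERY="SELECT count(*) FROM $(quote_identifier "$TABLE")"
echo "$QUERY"
```

In the loop above, QUERY would then be built from $(quote_identifier "$TABLE") instead of the raw ${TABLE}.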