Looping Through a List of Files with Spaces in the File Name with Bash

If you loop over a list of files in bash and some of the file names contain spaces, the default IFS (Internal Field Separator) will match the spaces and split each of those names into multiple tokens.

The simple approach is to temporarily set IFS to a newline, as follows.  This can be done in a shell script, but the following example is written for ‘one-liner’ usage directly on the command line.

OIFS="$IFS"

IFS=$'\n' 

for i in $(find ./ -type f -iname '*some_criteria*'); do echo "something with $i"; done

IFS="$OIFS"

The previous commands will:

  1. Save the existing IFS
  2. Update the IFS to a newline char
  3. Execute your loop with the results of a find command
  4. Reset the IFS
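
Setting IFS to a newline still breaks on file names that themselves contain a newline, and the unquoted command substitution is also subject to glob expansion.  A minimal sketch of a more robust variant, assuming bash and a find that supports -print0, reads NUL-delimited names instead.  The function name list_files is hypothetical and only used for illustration.

```shell
#!/bin/bash
# Hypothetical helper: prints one "file: <path>" line for every regular
# file found under the directory given as $1.
list_files() {
    local dir="$1"
    # -print0 separates names with a NUL byte, which cannot occur in a path.
    # IFS= and -r keep leading/trailing whitespace and backslashes intact;
    # -d '' makes read stop at each NUL instead of at each newline.
    find "$dir" -type f -print0 |
        while IFS= read -r -d '' file; do
            printf 'file: %s\n' "$file"
        done
}
```

Calling list_files ./ walks the current directory.  Because IFS is only changed for the read command itself, there is nothing to save and restore afterwards.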

Use awk to Print from nth element to the End of the Line

If you want to extract everything from the nth token to the end of the line, here is how you can do that with awk.

Given a source file, test.out, with the following contents:

line1 -- 01   0011 1
line2 -- 01   0011 2
line3 -- 01   0011 3
line4 -- 01   0011 4
line5 -- 01   0011 5
line6 -- 01   0011 6
line7 -- 01   0011 7
line8 -- 01   0011 8
line9 -- 01   0011 9
line10 -- 01   0011 10

If you want to remove the 1st, 2nd, and 3rd fields from each line, you can use awk to set those fields to an empty value as follows:

awk '{$1=$2=$3=""; print $0}' test.out

This results in the following (note the leading spaces left behind by the emptied fields):

   0011 1
   0011 2
   0011 3
   0011 4
   0011 5
   0011 6
   0011 7
   0011 8
   0011 9
   0011 10
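
Setting fields to an empty string keeps their separators, which is where the leading spaces come from.  If that matters, one possible alternative is to print fields n through NF in a loop.  This is only a sketch: the function name print_from_field is made up for the example, and it reads its input from standard input.

```shell
#!/bin/bash
# Hypothetical helper: prints fields n..NF of every input line,
# joined by awk's output field separator (a single space by default).
print_from_field() {
    local n="$1"
    awk -v n="$n" '{
        out = ""
        for (i = n; i <= NF; i++) {
            # Add a separator before every field except the first one printed
            out = out (i > n ? OFS : "") $i
        }
        print out
    }'
}
```

For the sample file above, print_from_field 4 < test.out prints each line starting at the fourth field, without leading spaces.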

Generate a Random String of a Specified Size with a Shell Script

The following is a one-liner for generating a random string of a fixed size in bash, where the allowed characters are digits, letters, and the newline.

Including the newline in the character set makes it very likely that the output is broken into multiple lines rather than one long line of text.

< /dev/urandom tr -dc '[:digit:][:alpha:]\n' | head -c1000 > file.out
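
The one-liner can also be wrapped in a small function that takes the size as an argument.  The following is a sketch under a few assumptions: the name random_string and the default length of 32 are made up for the example, the character set is limited to letters and digits (no newline), and LC_ALL=C is set because some tr implementations reject the raw bytes from /dev/urandom in a multibyte locale.

```shell
#!/bin/bash
# Hypothetical helper: prints a random alphanumeric string of length $1
# (defaults to 32 characters when no argument is given).
random_string() {
    local length="${1:-32}"
    # tr deletes (-d) every byte that is NOT (-c) a letter or digit;
    # head stops reading once it has collected the requested number of bytes.
    LC_ALL=C tr -dc '[:alnum:]' < /dev/urandom | head -c "$length"
}
```

For example, random_string 1000 > file.out writes 1000 random characters to file.out.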

One-Liner for Converting CRLF to LF in Text Files

If you have text files created under DOS/Windows and need to convert the CRLF (carriage return and line feed) line endings to LF (line feed), here is a quick one-liner.

perl -pe 's/\x0D\x0A/\x0A/g' file.txt > file.txt.mod

You can also use dos2unix.  However, especially under Cygwin, I have seen dos2unix fail without giving any meaningful information about why it was unable to complete the task.  In that case, you can just do it by hand.
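
If perl is not available either, the same conversion can be sketched with sed.  This is an alternative and not the method shown above; the function name crlf_to_lf is hypothetical, and the carriage return is produced with printf so that the sketch does not depend on sed understanding the \r escape.

```shell
#!/bin/bash
# Hypothetical helper: reads text on stdin and removes a carriage return
# that appears directly before the end of a line (CRLF becomes LF).
crlf_to_lf() {
    local cr
    cr=$(printf '\r')
    # The pattern is "<CR>$": a literal carriage return at the end of the line
    sed "s/${cr}\$//"
}
```

Usage would look like crlf_to_lf < file.txt > file.txt.mod.  Carriage returns in the middle of a line are left alone.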

Parsing Command Line Arguments with getopt in Bash

When writing utility scripts in Bash, it is tempting to simply pass positional arguments, use $1, $2, etc., and be done with it.  However, if you want to share the utility with other members of your team or incorporate it into your system, it makes sense to implement command line argument parsing in a more flexible and maintainable manner.

With getopt you can easily handle a variety of command line options and arguments.

The following GitHub Gist contains an example that illustrates the implementation of flags, options or arguments with values, and long option names.

getopt-example.sh
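
The following is not the Gist referenced above but a minimal sketch of the same idea.  It assumes the enhanced getopt from util-linux (the one found on most Linux systems), and the function name, the option names (-v/--verbose and -o/--output), and the variable names are all made up for the example.

```shell
#!/bin/bash
# Sketch only: assumes the enhanced (util-linux) getopt is installed.
# Sets VERBOSE, OUTPUT and the REMAINING array from the given arguments.
parse_args() {
    local parsed
    # -o lists the short options, --long the long ones; a trailing ':'
    # marks an option that requires a value.
    parsed=$(getopt -o vo: --long verbose,output: -n parse_args -- "$@") || return 1
    # getopt prints a normalized, shell-quoted argument list; load it into $@
    eval set -- "$parsed"

    VERBOSE=0
    OUTPUT=""
    while true; do
        case "$1" in
            -v|--verbose) VERBOSE=1; shift ;;
            -o|--output)  OUTPUT="$2"; shift 2 ;;
            --)           shift; break ;;   # end of options
            *)            return 1 ;;       # should not happen
        esac
    done
    # Whatever is left are the positional arguments
    REMAINING=("$@")
}
```

In a script you would call parse_args "$@" near the top and then read VERBOSE, OUTPUT and REMAINING.  Short and long forms can be mixed freely, for example -v --output out.txt file1.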

Passing an Array as an Argument to a Bash Function

If you want to pass an array of items to a bash function, the simple answer is that you need to pass the expanded values.  That means that you can pass the data as a quoted value, assuming that the elements are whitespace delimited, or you can pass it as a string and then split it using an updated IFS (Internal Field Separator) inside the function.

Following is an example of taking the output of a Hive query (a single column that is separated by new lines), wrapping it in quotes and passing it as a single value to the function.

#!/bin/bash

#
# This function will accept the expanded elements of the array
#
function foo() {
   # Loop through elements in the first argument passed.
   # In this case, each is separated by whitespace so we do
   # not need to change the IFS
   for i in $1
   do
      echo "i = $i"
   done
}

# Dynamically build our hive query
HIVE_QRY="use somedb; select some_column from some_table;"

# Dynamically build the hive command to execute
CMD="hive -e '$HIVE_QRY'"

# Execute the hive query in a subshell and store the result in
# the 'QRY_RETVAL' variable
QRY_RETVAL=$(eval "$CMD")

# Call the foo function and pass it the output of the query, QUOTED,
# so that it will be passed as a single argument and not a series
# of arguments for each row returned by the query
foo "${QRY_RETVAL}"
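
The second option mentioned above, passing a string and splitting it with an updated IFS inside the function, is useful when the individual values may contain spaces.  Here is a minimal sketch, assuming the values are separated by newlines; the function name print_rows is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: splits its first argument on newlines only,
# so values that contain spaces stay in one piece.
print_rows() {
    local rows="$1" row
    # 'local' limits the IFS change to this function, so there is
    # nothing to restore afterwards.
    local IFS=$'\n'
    # Turn off glob expansion so a value such as '*' is not expanded
    # (note that this re-enables globbing unconditionally at the end).
    set -f
    for row in $rows; do
        printf 'row = %s\n' "$row"
    done
    set +f
}
```

It is called the same way as foo, with the value quoted: print_rows "${QRY_RETVAL}".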

Removing the Last Token From a String in Bash with awk

Let’s say that you have a number of files and want to create a containing directory for each one, named after the file name with its last ‘_’-separated token removed.

This is much easier to explain with an example.  Given this list of files:

ls -1
foo_10_10_sometrash
foo_1_sometrash
foo_2_sometrash
foo_3_sometrash
foo_4_sometrash
foo_5_5_sometrash
foo_5_sometrash
foo_6_6_sometrash
foo_7_7_sometrash
foo_8_8_sometrash
foo_9_9_sometrash

You want to create a directory for each of the files as follows:

foo_5_sometrash should have a directory named foo_5.

Further, let’s assume that you have thousands, or hundreds of thousands of files.  In that case doing it via a script while you get a cup of coffee is the preferred solution.

The work is done in a ‘for i in’ loop that iterates over the output of ls, with a nested awk command.

for i in $(ls -1); do DIRNAME=$(echo "$i" | awk -F_ '{$NF=""; print $0}' | sed 's/ /_/g' | sed 's/_$//'); mkdir -p "$DIRNAME"; done

Here is the command broken down:

DIRNAME=$(....)

will set the variable DIRNAME to the output of the commands within the parentheses.

awk -F_ '{$NF=""; print $0}'

will set the field separator to ‘_’, the character on which you will be ‘splitting’ your string.  $NF="" sets the last field to an empty string, and print $0 then prints the line rebuilt from the remaining fields, joined by awk’s default output separator (a space).

The sed commands that follow awk in the pipeline replace the spaces generated by the awk command with the original separators, ‘_’, and then remove the spurious trailing ‘_’.
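
As an alternative to the awk and sed pipeline, the same result can be sketched with shell parameter expansion alone, which avoids starting three extra processes per file.  The function name strip_last_token is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: prints $1 with the last '_'-separated token removed.
strip_last_token() {
    # ${1%_*} removes the shortest suffix matching "_*", that is,
    # the last underscore and everything after it.
    printf '%s\n' "${1%_*}"
}

# Possible usage (commented out so that nothing is created by accident):
# for f in foo_*; do mkdir -p "$(strip_last_token "$f")"; done
```

With this, foo_5_5_sometrash becomes foo_5_5 and foo_5_sometrash becomes foo_5.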