Welcome to my website. I am always posting links to photo albums, art, technology and other creations. Everything that you will see on my numerous personal sites is powered by the formVistaTM Website Management Engine.


  • How to Return Hive Query Results Similary to MySQL \G in One Vertical Column
    06/09/2016 4:03PM

    When trying to look at data in a database with really wide rows even just selecting 1 row to see the data is nearly impossible to understand when the single row wraps 7 or 8 times.

    MySQL offers the '\G' option to display the output in a single column.

    The corresponding method in Hive is to execute the following set command:

    > !set outputformat vertical
    > SELECT something FROM some table;

  • Running Dynamically Generated Hive Queries From a Shell Script
    03/26/2014 10:35AM

    If you want to write a HQL hive query and run it mulitple times from a shell script, each time passing it different data for the query, here is a quick example that should get you started.

    The first thing to know is that by specifying n number of -hivevar key value pairs when invoking hive on the command line will allow you to pass that data into the hive process.

    For example, if you do the following

    $ hive -e 'SELECT * FROM some_table' -hivevar FOO=blah

    You will have passed in a key of FOO with the value of 'blah' to the hive process.

    A more practical example would be wanting to run the same hive query over multiple data partitions.

    In this example, I've got a hive database that has a 'packets' table partitioned by hours which looks like 2014032601.

    The hive query file (dest_ip_hive.sql) would look like:

    SELECT packets.sourceip FROM packets
    WHERE packets.destip = "${hivevar:DEST_IP}"
    AND packets.hour = ${hivevar:HOUR}
    GROUP BY packets.sourceip

    And a shell script that would dynamically set those values for each invocation of hive would look like:


    # Destination IP that we are using to determine which
    # packets we will examine.

    for HOUR in 2014032209 2014032210 2014032211 2014032212

       echo "Running hive query for HOUR $HOUR"

       # Run a hive query from the command line setting variables that will be
       # expaned in the .sql file.
       hive -hivevar HOUR=$HOUR -hivevar DEST_IP=$DEST_IP \
       -f dest_ip_hive.sql > ${DEST_IP}-{$HOUR}.out


    For each hour defined in the for loop, we will execute a hive command telling it to run the query contained in the file dest_ip_hive.sql.  The DEST_IP and HOUR variables that will be expanded in the query are passed to hive via the

    -hivevar HOUR=$HOUR -hivevar DEST_IP=$DEST_IP

    part of the hive command.  And the output for each query will be written to a different file for each query.

Advanced Search