Query nested arrays in PostgreSQL JSON data

The following is an example showing how to query multiple nested arrays in JSON data in a PostgreSQL database.

Given

CREATE TABLE sample_json (
  id serial PRIMARY KEY,
  name varchar(64),
  json_data json
);


INSERT INTO sample_json (name, json_data)
VALUES
	(
	'NA',
	'
	{
	    "location": "US",
	    "topLevelArray": [
	        {
	            "id": 1,
	            "secondLevelArray": [
	                {
	                    "key": "someKey",
	                    "operator": "=",
	                    "value": 10
	                },
	                {
	                    "key": "foo",
	                    "operator": ">=",
	                    "value": 5
	                },
	                {
	                    "key": "someOtherKey",
	                    "operator": ">",
	                    "value": 647
	                }
	            ]
	        },
	        {
	            "id": 2,
	            "secondLevelArray": [
	                {
	                    "key": "blah",
	                    "operator": "<",
	                    "value": 7
	                }
	            ]
	        }
	    ]
	}
	'
	),
	(
	'EU',
	'
	{
	    "location": "poland",
	    "topLevelArray": [
	        {
	            "id": 2,
	            "secondLevelArray": [
	                {
	                    "key": "bar",
	                    "operator": "<",
	                    "value": 10
	                },
	                {
	                    "key": "moo",
	                    "operator": ">=",
	                    "value": 16
	                },
	                {
	                    "key": "baz",
	                    "operator": "!=",
	                    "value": 9
	                }
	            ]
	        }
	    ]
	}
	'	
	)
;

With the aforementioned data, let’s say we want to know the id of the rows that have an object in the secondLevelArray with operator equal to >= and the value of the key field.

The concept to understand to be able to search for all of the keys in the secondLevelArray where the operator is >= is the lateral join. The TLDR; is that a subquery appearing in the FROM clause can reference columns provided by preceding items. Or, you can write clauses in the FROM clause that read from the result of previous FROM clauses.

Let’s run some queries and go through them, line-by-line. First we will just select everything in the table.

SELECT * FROM sample_json;

|id |name|json_data                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
|---|----|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|1  |NA  |"
	{
	    \"location\": \"US\",
	    \"topLevelArray\": [
	        {
	            \"id\": 1,
	            \"secondLevelArray\": [
	                {
	                    \"key\": \"someKey\",
	                    \"operator\": \"=\",
	                    \"value\": 10
	                },
	                {
	                    \"key\": \"foo\",
	                    \"operator\": \">=\",
	                    \"value\": 5
	                },
	                {
	                    \"key\": \"someOtherKey\",
	                    \"operator\": \">\",
	                    \"value\": 647
	                }
	            ]
	        },
	        {
	            \"id\": 2,
	            \"secondLevelArray\": [
	                {
	                    \"key\": \"blah\",
	                    \"operator\": \"<\",
	                    \"value\": 7
	                }
	            ]
	        }
	    ]
	}
	"|
|2  |EU  |"
	{
	    \"location\": \"poland\",
	    \"topLevelArray\": [
	        {
	            \"id\": 2,
	            \"secondLevelArray\": [
	                {
	                    \"key\": \"bar\",
	                    \"operator\": \"<\",
	                    \"value\": 10
	                },
	                {
	                    \"key\": \"moo\",
	                    \"operator\": \">=\",
	                    \"value\": 16
	                },
	                {
	                    \"key\": \"baz\",
	                    \"operator\": \"!=\",
	                    \"value\": 9
	                }
	            ]
	        }
	    ]
	}
	"                                                                                                                                                                                                                                                                      |

As expected, we just get back everything.

Now, let’s start to drill down into the JSON object. First we will select the row id and just the data from the topLevelArray.

SELECT
	sj.id
	topLevelArray
FROM
	sample_json sj,
	json_array_elements(json_data -> 'topLevelArray') topLevelArray
;

|id |toplevelarray                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
|---|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|1  |"{
	            \"id\": 1,
	            \"secondLevelArray\": [
	                {
	                    \"key\": \"someKey\",
	                    \"operator\": \"=\",
	                    \"value\": 10
	                },
	                {
	                    \"key\": \"foo\",
	                    \"operator\": \">=\",
	                    \"value\": 5
	                },
	                {
	                    \"key\": \"someOtherKey\",
	                    \"operator\": \">\",
	                    \"value\": 647
	                }
	            ]
	        }"|
|1  |"{
	            \"id\": 2,
	            \"secondLevelArray\": [
	                {
	                    \"key\": \"blah\",
	                    \"operator\": \"<\",
	                    \"value\": 7
	                }
	            ]
	        }"                                                                                                                                                                                                                                                                                                                                    |
|2  |"{
	            \"id\": 2,
	            \"secondLevelArray\": [
	                {
	                    \"key\": \"bar\",
	                    \"operator\": \"<\",
	                    \"value\": 10
	                },
	                {
	                    \"key\": \"moo\",
	                    \"operator\": \">=\",
	                    \"value\": 16
	                },
	                {
	                    \"key\": \"baz\",
	                    \"operator\": \"!=\",
	                    \"value\": 9
	                }
	            ]
	        }"             |

The “columns” selected are the id and topLevelArray. The id is straightforward. The topLevelArray is a lateral join clause statement and we use the json_array_elements() function to select the contents of the json_data.topLevelArray key.

We can continue to traverse deeper into the JSON object with another lateral join clause that accesses the next nested array from the topLevelArray key.

SELECT
	sj.id,
	secondLevelElements
FROM
	sample_json sj,
	json_array_elements(json_data -> 'topLevelArray') topLevelArray,
	json_array_elements(topLevelArray -> 'secondLevelArray') secondLevelElements
;

|id |secondlevelelements                                                                                                                                   |
|---|----------------------------------------------------------------------------------------------------------------------------------------------------|
|1  |"{
	                    \"key\": \"someKey\",
	                    \"operator\": \"=\",
	                    \"value\": 10
	                }"      |
|1  |"{
	                    \"key\": \"foo\",
	                    \"operator\": \">=\",
	                    \"value\": 5
	                }"          |
|1  |"{
	                    \"key\": \"someOtherKey\",
	                    \"operator\": \">\",
	                    \"value\": 647
	                }"|
|1  |"{
	                    \"key\": \"blah\",
	                    \"operator\": \"<\",
	                    \"value\": 7
	                }"          |
|2  |"{
	                    \"key\": \"bar\",
	                    \"operator\": \"<\",
	                    \"value\": 10
	                }"          |
|2  |"{
	                    \"key\": \"moo\",
	                    \"operator\": \">=\",
	                    \"value\": 16
	                }"         |
|2  |"{
	                    \"key\": \"baz\",
	                    \"operator\": \"!=\",
	                    \"value\": 9
	                }"          |

We add an additional FROM clause. This one referencing the result of the previous FROM clause.

json_array_elements(json_data -> 'topLevelArray') topLevelArray,
json_array_elements(topLevelArray -> 'secondLevelArray') secondLevelElements

Now we have access to the elements in the secondLevelArray and we can add a WHERE clause to select only what we want from that nested array.

select
	sj.id,
	secondLevelElements ->> 'key'
from
	sample_json sj,
	json_array_elements(json_data -> 'topLevelArray') topLevelArray,
	json_array_elements(topLevelArray -> 'secondLevelArray') secondLevelElements
where
	secondLevelElements ->> 'operator' = '>='
;

|id |?column?|
|---|--------|
|1  |foo     |
|2  |moo     |

The result being the row id, and the value of the key field in inner most nested object.

Leave a Reply