The following is an example showing how to query multiple nested arrays in JSON data in a PostgreSQL database.
Given
CREATE TABLE sample_json (
id serial PRIMARY KEY,
name varchar(64),
json_data json
);
INSERT INTO sample_json (name, json_data)
VALUES
(
'NA',
'
{
"location": "US",
"topLevelArray": [
{
"id": 1,
"secondLevelArray": [
{
"key": "someKey",
"operator": "=",
"value": 10
},
{
"key": "foo",
"operator": ">=",
"value": 5
},
{
"key": "someOtherKey",
"operator": ">",
"value": 647
}
]
},
{
"id": 2,
"secondLevelArray": [
{
"key": "blah",
"operator": "<",
"value": 7
}
]
}
]
}
'
),
(
'EU',
'
{
"location": "poland",
"topLevelArray": [
{
"id": 2,
"secondLevelArray": [
{
"key": "bar",
"operator": "<",
"value": 10
},
{
"key": "moo",
"operator": ">=",
"value": 16
},
{
"key": "baz",
"operator": "!=",
"value": 9
}
]
}
]
}
'
)
;
With the aforementioned data, let’s say we want to know the id of the rows that have an object in the secondLevelArray with operator equal to >= and the value of the key field.
The concept to understand to be able to search for all of the keys in the secondLevelArray where the operator is >= is the lateral join. The TLDR; is that a subquery appearing in the FROM clause can reference columns provided by preceding items. Or, you can write clauses in the FROM clause that read from the result of previous FROM clauses.
Let’s run some queries and go through them, line-by-line. First we will just select everything in the table.
SELECT * FROM sample_json;
|id |name|json_data |
|---|----|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|1 |NA |"
{
\"location\": \"US\",
\"topLevelArray\": [
{
\"id\": 1,
\"secondLevelArray\": [
{
\"key\": \"someKey\",
\"operator\": \"=\",
\"value\": 10
},
{
\"key\": \"foo\",
\"operator\": \">=\",
\"value\": 5
},
{
\"key\": \"someOtherKey\",
\"operator\": \">\",
\"value\": 647
}
]
},
{
\"id\": 2,
\"secondLevelArray\": [
{
\"key\": \"blah\",
\"operator\": \"<\",
\"value\": 7
}
]
}
]
}
"|
|2 |EU |"
{
\"location\": \"poland\",
\"topLevelArray\": [
{
\"id\": 2,
\"secondLevelArray\": [
{
\"key\": \"bar\",
\"operator\": \"<\",
\"value\": 10
},
{
\"key\": \"moo\",
\"operator\": \">=\",
\"value\": 16
},
{
\"key\": \"baz\",
\"operator\": \"!=\",
\"value\": 9
}
]
}
]
}
" |
As expected, we just get back everything.
Now, let’s start to drill down into the JSON object. First we will select the row id and just the data from the topLevelArray.
SELECT
sj.id
topLevelArray
FROM
sample_json sj,
json_array_elements(json_data -> 'topLevelArray') topLevelArray
;
|id |toplevelarray |
|---|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|1 |"{
\"id\": 1,
\"secondLevelArray\": [
{
\"key\": \"someKey\",
\"operator\": \"=\",
\"value\": 10
},
{
\"key\": \"foo\",
\"operator\": \">=\",
\"value\": 5
},
{
\"key\": \"someOtherKey\",
\"operator\": \">\",
\"value\": 647
}
]
}"|
|1 |"{
\"id\": 2,
\"secondLevelArray\": [
{
\"key\": \"blah\",
\"operator\": \"<\",
\"value\": 7
}
]
}" |
|2 |"{
\"id\": 2,
\"secondLevelArray\": [
{
\"key\": \"bar\",
\"operator\": \"<\",
\"value\": 10
},
{
\"key\": \"moo\",
\"operator\": \">=\",
\"value\": 16
},
{
\"key\": \"baz\",
\"operator\": \"!=\",
\"value\": 9
}
]
}" |
The “columns” selected are the id and topLevelArray. The id is straightforward. The topLevelArray is a lateral join clause statement and we use the json_array_elements() function to select the contents of the json_data.topLevelArray key.
We can continue to traverse deeper into the JSON object with another lateral join clause that accesses the next nested array from the topLevelArray key.
SELECT
sj.id,
secondLevelElements
FROM
sample_json sj,
json_array_elements(json_data -> 'topLevelArray') topLevelArray,
json_array_elements(topLevelArray -> 'secondLevelArray') secondLevelElements
;
|id |secondlevelelements |
|---|----------------------------------------------------------------------------------------------------------------------------------------------------|
|1 |"{
\"key\": \"someKey\",
\"operator\": \"=\",
\"value\": 10
}" |
|1 |"{
\"key\": \"foo\",
\"operator\": \">=\",
\"value\": 5
}" |
|1 |"{
\"key\": \"someOtherKey\",
\"operator\": \">\",
\"value\": 647
}"|
|1 |"{
\"key\": \"blah\",
\"operator\": \"<\",
\"value\": 7
}" |
|2 |"{
\"key\": \"bar\",
\"operator\": \"<\",
\"value\": 10
}" |
|2 |"{
\"key\": \"moo\",
\"operator\": \">=\",
\"value\": 16
}" |
|2 |"{
\"key\": \"baz\",
\"operator\": \"!=\",
\"value\": 9
}" |
We add an additional FROM clause. This one referencing the result of the previous FROM clause.
json_array_elements(json_data -> 'topLevelArray') topLevelArray,
json_array_elements(topLevelArray -> 'secondLevelArray') secondLevelElements
Now we have access to the elements in the secondLevelArray and we can add a WHERE clause to select only what we want from that nested array.
select
sj.id,
secondLevelElements ->> 'key'
from
sample_json sj,
json_array_elements(json_data -> 'topLevelArray') topLevelArray,
json_array_elements(topLevelArray -> 'secondLevelArray') secondLevelElements
where
secondLevelElements ->> 'operator' = '>='
;
|id |?column?|
|---|--------|
|1 |foo |
|2 |moo |
The result being the row id, and the value of the key field in inner most nested object.