Query a single element from a dictionary-like column in SQL

I have a PostgreSQL table that looks as follows (id is an integer primary key, order_id is an integer, and products is character varying):

id | order_id | products
---|----------|--------------------------------
 1 | 123      | {"type": "foo", "counts": 2}
 2 | 456      | {"type": "foobar", "counts": 4}
 3 | 789      | {"type": "foo", "counts": 1}
 4 | 678      | {"type": "baz", "counts": 3}
I would like to query for only the rows where type = foo.
In another query, I've successfully used the following:
SELECT
    table_a.data::json->>'type' AS prod_type
FROM table_a
But this only works because the data column is of JSON type.
How would I index into the products column such that I only return type = foo?
 1 | 123      | {"type": "foo", "counts": 2}
 3 | 789      | {"type": "foo", "counts": 1}
Thanks!

You can try something like this:
SELECT *
FROM test
WHERE products::json->>'type' = 'foo';
Sample Output:

id | order_id | products
---|----------|--------------------------------
 1 | 123      | {"type": "foo", "counts": 2}
 3 | 789      | {"type": "foo", "counts": 1}
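If this filter runs often, a possible refinement (my sketch, not part of the original answer; it assumes the table is named test, as above, and that every products value holds valid JSON) is to store the column as jsonb so the cast happens once, and to index the extracted key:

-- Assumption: table "test", products currently character varying with valid JSON text
ALTER TABLE test ALTER COLUMN products TYPE jsonb USING products::jsonb;
-- Expression index so the filter below can use an index scan
CREATE INDEX test_products_type_idx ON test ((products->>'type'));
SELECT * FROM test WHERE products->>'type' = 'foo';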

Related

BigQuery string-formatting to json

Is the following a full list of all value types as they're passed to json in BigQuery? I've gotten this by trial and error but haven't been able to find this in the documentation:
select
NULL as NullValue,
FALSE as BoolValue,
DATE '2014-01-01' as DateValue,
INTERVAL 1 year as IntervalValue,
DATETIME '2014-01-01 01:02:03' as DatetimeValue,
TIMESTAMP '2014-01-01 01:02:03' as TimestampValue,
"Hello" as StringValue,
B"abc" as BytesValue,
123 as IntegerValue,
NUMERIC '3.14' as NumericValue,
3.14 as FloatValue,
TIME '12:30:00.45' as TimeValue,
[1,2,3] as ArrayValue,
STRUCT('Mark' as first, 'Thomas' as last) as StructValue,
[STRUCT(1 as x, 2 as y), STRUCT(5 as x, 6 as y)] as ArrayStructValue,
STRUCT(1 as x, [1,2,3] as y, ('a','b','c') as z) as StructNestedValue
{
"NullValue": null,
"BoolValue": "false", // why not just false without quotes?
"DateValue": "2014-01-01",
"IntervalValue": "1-0 0 0:0:0",
"DatetimeValue": "2014-01-01T01:02:03",
"TimestampValue": "2014-01-01T01:02:03Z",
"StringValue": "Hello",
"BytesValue": "YWJj",
"IntegerValue": "123",
"NumericValue": "3.14",
"FloatValue": "3.14",
"TimeValue": "12:30:00.450000",
"ArrayValue": ["1", "2", "3"],
"StructValue": {
"first": "Mark",
"last": "Thomas"
},
"ArrayStructValue": [
{"x": "1", "y": "2"},
{"x": "5", "y": "6"}
],
"StructNestedValue": {
"x": "1",
"y": ["1", 2", "3"],
"z": {"a": "a", b": "b", "c": "c"}
}
}
Honestly, it seems to me that other than the null value and the array [] or struct {} container, everything is string-enclosed, which seems a bit odd.
According to this document, JSON is built on two structures:
A collection of name/value pairs. In various languages, this is
realized as an object, record, struct, dictionary, hash table, keyed
list, or associative array.
An ordered list of values. In most
languages, this is realized as an array, vector, list, or sequence.
The result of the SELECT query is in JSON format, wherein [] denotes an array, {} denotes an object, and double quotes (" ") denote a string value, just as in the query itself.
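As for why the integer values come back quoted: JSON numbers are conventionally parsed as 64-bit floats, which cannot represent every INT64 exactly, so serializing INT64 as a string is the safe choice. A small check (my example using TO_JSON_STRING, not from the original answer):

SELECT TO_JSON_STRING(STRUCT(9007199254740993 AS big)) AS j
-- j: {"big":"9007199254740993"} -- 2^53 + 1 would not survive a float round-trip unquoted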

Is there a function to extract a value from a dictionary in a list?

I need to extract the name value (whether it's Action or Adventure) from this column into a new column in pandas:
'[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]'
You want from_records:
import pandas as pd
data = [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]
df = pd.DataFrame.from_records(data)
df
you get
id name
0 28 Action
1 12 Adventure
2 14 Fantasy
3 878 Science Fiction
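If the column actually holds the JSON as a string (as in the question), it needs to be parsed first; a sketch assuming a column named genres (my name, not from the question):

import json
import pandas as pd

df = pd.DataFrame({"genres": ['[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]']})
# Parse each string into a list of dicts, then pull out the name values.
df["names"] = df["genres"].apply(json.loads).apply(lambda lst: [d["name"] for d in lst])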

How to select a field of a JSON object coming from the WHERE condition

I have this table
id name json
1 alex {"type": "user", "items": [ {"name": "banana", "color": "yellow"}, {"name": "apple", "color": "red"} ] }
2 peter {"type": "user", "items": [ {"name": "watermelon", "color": "green"}, {"name": "pepper", "color": "red"} ] }
3 john {"type": "user", "items": [ {"name": "tomato", "color": "red"} ] }
4 carl {"type": "user", "items": [ {"name": "orange", "color": "orange"}, {"name": "nut", "color": "brown"} ] }
Importantly, each JSON object can have a different number of "items", but what I need is the product name of JUST the object that matched in the WHERE condition.
My desired output would be the first two columns and just the name of the item WHERE the color is like %red%:
id name fruit
1 alex apple
2 peter pepper
3 john tomato
select id, name, ***** (this is what I don't know) FROM table
where JSON_EXTRACT(json, "$.items[*].color") like '%red%'
I would recommend json_table(), if you are running MySQL 8.0:
select t.id, t.name, x.name as fruit
from mytable t
cross join json_table(
t.js,
'$.items[*]' columns (name varchar(50) path '$.name', color varchar(50) path '$.color')
) x
where x.color = 'red'
This function is not implemented in MariaDB (it only arrived in MariaDB 10.6). We can unnest manually with the help of a numbers table:
select t.id, t.name,
json_unquote(json_extract(t.js, concat('$.items[', x.num, '].name'))) as fruit
from mytable t
inner join (select 0 as num union all select 1 union all select 2 ...) x(num)
on x.num < json_length(t.js, '$.items')
where json_unquote(json_extract(t.js, concat('$.items[', x.num, '].color'))) = 'red'
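In MariaDB specifically, the built-in sequence tables can stand in for the hand-written union; a sketch under the same mytable/js naming, assuming at most 10 items per object:

select t.id, t.name,
    json_unquote(json_extract(t.js, concat('$.items[', x.seq, '].name'))) as fruit
from mytable t
inner join seq_0_to_9 x
    on x.seq < json_length(t.js, '$.items')
where json_unquote(json_extract(t.js, concat('$.items[', x.seq, '].color'))) = 'red'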
You can use the JSON_EXTRACT() function along with a recursive common table expression to generate index rows dynamically (the n-1 offset below is needed because JSON array paths are zero-based while the CTE counts from 1), such as
WITH RECURSIVE cte AS
(
SELECT 1 AS n
UNION ALL
SELECT n + 1
FROM cte
WHERE cte.n < (SELECT MAX(JSON_LENGTH(json, '$.items')) FROM t)
)
SELECT id, name,
JSON_UNQUOTE(JSON_EXTRACT(json,CONCAT('$.items[',n-1,'].name'))) AS fruit
FROM cte
JOIN t
WHERE JSON_EXTRACT(json,CONCAT('$.items[',n-1,'].color')) = "red"

PL/SQL: remove unwanted double quotes in a JSON

I have JSON like this (it is contained in a CLOB variable):
{"id": "33", "type": "abc", "val": "2", "cod": "", "sg1": "1", "sg2": "1"}
{"id": "359", "type": "abcef", "val": "52", "cod": "aa", "sg1": "", "sg2": "0"}
…
I need to remove the double quotes from the values of: id, val, sg1, sg2
Is it possible?
For example, I need to obtain this:
{"id": 33, "type": "abc", "val": 2, "cod": "", "sg1": 1, "sg2": 1}
{"id": 359, "type": "abcef", "val": 52, "cod": "aa", "sg1": , "sg2": 0}
…
If you are using Oracle 12 (R2?) or later then you can convert your JSON to the appropriate data types and then convert it back to JSON.
Oracle 18 Setup:
CREATE TABLE test_data ( value CLOB );
INSERT INTO test_data ( value )
VALUES ( '{"id": "33", "type": "abc", "val": "2", "cod": "", "sg1": "1", "sg2": "1"}' );
INSERT INTO test_data ( value )
VALUES ( '{"id": "359", "type": "abcef", "val": "52", "cod": "aa", "sg1": "", "sg2": "0"}' );
Query:
SELECT JSON_OBJECT(
'id' IS j.id,
'type' IS j.typ,
'val' IS j.val,
'cod' IS j.cod,
'sg1' IS j.sg1,
'sg2' IS j.sg2
) AS JSON
FROM test_data t
CROSS JOIN
JSON_TABLE(
t.value,
'$'
COLUMNS
id NUMBER(5,0) PATH '$.id',
typ VARCHAR2(10) PATH '$.type',
val NUMBER(5,0) PATH '$.val',
cod VARCHAR2(10) PATH '$.cod',
sg1 NUMBER(5,0) PATH '$.sg1',
sg2 NUMBER(5,0) PATH '$.sg2'
) j
Output:
| JSON |
| :--------------------------------------------------------------- |
| {"id":33,"type":"abc","val":2,"cod":null,"sg1":1,"sg2":1} |
| {"id":359,"type":"abcef","val":52,"cod":"aa","sg1":null,"sg2":0} |
Note that Oracle treats an empty string as NULL, which is why "cod": "" comes back as "cod": null above. Or, if you want to use regular expressions (you shouldn't, if you have the choice; use a proper JSON parser instead) then:
Query 2:
SELECT REGEXP_REPLACE(
REGEXP_REPLACE(
value,
'"(id|val|sg1|sg2)": ""',
'"\1": "null"'
),
'"(id|val|sg1|sg2)": "(\d+|null)"',
'"\1": \2'
) AS JSON
FROM test_data
Output:
| JSON |
| :-------------------------------------------------------------------------- |
| {"id": 33, "type": "abc", "val": 2, "cod": "", "sg1": 1, "sg2": 1} |
| {"id": 359, "type": "abcef", "val": 52, "cod": "aa", "sg1": null, "sg2": 0} |

Lookup smallest value greater than current

I have an objects table and a lookup table. In the objects table, I'm looking to add the smallest value from the lookup table that is greater than the object's number.
I found this similar question but it's about finding a value greater than a constant, rather than changing for each row.
In code:
import pandas as pd
objects = pd.DataFrame([{"id": 1, "number": 10}, {"id": 2, "number": 30}])
lookup = pd.DataFrame([{"number": 3}, {"number": 12}, {"number": 40}])
expected = pd.DataFrame(
[
{"id": 1, "number": 10, "smallest_greater": 12},
{"id": 2, "number": 30, "smallest_greater": 40},
]
)
First compare each value of lookup['number'] against objects['number'] to get a 2D boolean mask, then take the cumulative sum along each row and compare it to 1 so that only the first match per row is flagged, and take its position with numpy.argmax to select the value from lookup['number'].
The output is generated with numpy.where, which overwrites all unmatched rows with NaN.
objects = pd.DataFrame([{"id": 1, "number": 10}, {"id": 2, "number": 30},
{"id": 3, "number": 100},{"id": 4, "number": 1}])
print (objects)
id number
0 1 10
1 2 30
2 3 100
3 4 1
import numpy as np

m1 = lookup['number'].values > objects['number'].values[:, None]  # strictly greater, per the question
m2 = np.cumsum(m1, axis=1) == 1  # True only at the first match in each row
m3 = np.any(m1, axis=1)          # rows that have a match at all
out = lookup['number'].values[m2.argmax(axis=1)]
objects['smallest_greater'] = np.where(m3, out, np.nan)
print (objects)
id number smallest_greater
0 1 10 12.0
1 2 30 40.0
2 3 100 NaN
3 4 1 3.0
smallest_greater = []
for i in objects['number']:
    # first lookup value strictly greater than i (raises IndexError if none exists)
    smallest_greater.append(lookup[lookup['number'] > i].sort_values(by='number')['number'].iloc[0])
objects['smallest_greater'] = smallest_greater
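A more idiomatic alternative (my sketch, not from the original answers) is pd.merge_asof, which requires both frames to be sorted by the key:

import pandas as pd

objects = pd.DataFrame([{"id": 1, "number": 10}, {"id": 2, "number": 30}])
lookup = pd.DataFrame([{"number": 3}, {"number": 12}, {"number": 40}])

# direction='forward' picks the first lookup value at or after each number;
# allow_exact_matches=False makes the match strictly greater.
result = pd.merge_asof(
    objects.sort_values("number"),
    lookup.sort_values("number").rename(columns={"number": "smallest_greater"}),
    left_on="number",
    right_on="smallest_greater",
    direction="forward",
    allow_exact_matches=False,
)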