I have a JSON object that's written in a weird way.
> {"custom": [ { "name": "addressIdNum", "valueNum": 12345678}, {
> "name": "cancelledDateAt", "valueAt": "2017-02-30T01:43:04.000Z" }] }
I'm not sure how to parse something like this. The keys are addressIdNum and cancelledDateAt, and the values are 12345678 and 2017-02-30T01:43:04.000Z, respectively.
How do I parse this using Snowflake SQL?
Thanks for all your help!
Best,
Preet Rajdeo
If your input is ALWAYS in this form (two elements in the array, with the same fields in each element), you can combine the PARSE_JSON function with path access.
Just try this:
with input as (
    select parse_json(
        '{"custom": [ { "name": "addressIdNum", "valueNum": 12345678}, {"name": "cancelledDateAt", "valueAt": "2017-02-30T01:43:04.000Z" }] }'
    ) as json
)
select
    json:custom[0].valueNum::integer,
    json:custom[1].valueAt::timestamp
from input;
----------------------------------+-----------------------------------+
 JSON:CUSTOM[0].VALUENUM::INTEGER | JSON:CUSTOM[1].VALUEAT::TIMESTAMP |
----------------------------------+-----------------------------------+
                         12345678 | 2017-03-01 17:43:04               |
----------------------------------+-----------------------------------+
However, if the structure of your data might differ (e.g. the array elements might come in a different order), it's probably best to write a JavaScript UDF in Snowflake to convert such messy data into something easier to work with.
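For what it's worth, an order-independent lookup is also possible in plain Snowflake SQL with LATERAL FLATTEN and conditional aggregation; this is only a sketch, the field names come from the question, and the output aliases are mine:

with input as (
    select parse_json(
        '{"custom": [ { "name": "addressIdNum", "valueNum": 12345678}, {"name": "cancelledDateAt", "valueAt": "2017-02-30T01:43:04.000Z" }] }'
    ) as json
)
-- expand the custom array, then pick each value by its "name" field,
-- so the element order no longer matters
select
    max(iff(f.value:name::string = 'addressIdNum',    f.value:valueNum::integer,  null)) as address_id,
    max(iff(f.value:name::string = 'cancelledDateAt', f.value:valueAt::timestamp, null)) as cancelled_at
from input,
     lateral flatten(input => input.json:custom) f;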
not an SQL guru here.
I'm trying to write a query that gets a few columns of a table, plus only the value of "icon" from the JSON column below (named weather). I got to a point where I can list all the attributes right after sessions, which are timestamps, but I've had no luck iterating over them and joining them to the rest of the table.
I also have the feeling that it wasn't very clever to store the timestamp as an attribute name, especially as it's already stored in the "dt" value.
Can anybody confirm if this is best practice or not?
And could somebody help me get the "icon" value?
{
"lat":43.6423,
"lon":-72.2518,
"timezone":"America/New_York",
"timezone_offset":-14400,
"sessions":{
"1651078174":{
"dt":1651078174,
"sunrise":1651052825,
"sunset":1651103155,
"temp":48.45,
"feels_like":43.63,
"pressure":1009,
"humidity":68,
"dew_point":38.39,
"uvi":5,
"clouds":100,
"visibility":10000,
"wind_speed":11.5,
"wind_deg":310,
"weather":[
{
"id":804,
"main":"Clouds",
"description":"overcast clouds",
"icon":"04d"
}
]
}
}
}
If you have only one icon per session and multiple sessions, you can run the following.
If you have multiple icons, you need to apply another CTE layer to extract them with json_array_elements (see the sketch after the fiddle link below).
For more JSON functions, see https://www.postgresql.org/docs/current/functions-json.html
WITH CTE AS (
select value from json_each('{
"lat":43.6423,
"lon":-72.2518,
"timezone":"America/New_York",
"timezone_offset":-14400,
"sessions":{
"1651078174":{
"dt":1651078174,
"sunrise":1651052825,
"sunset":1651103155,
"temp":48.45,
"feels_like":43.63,
"pressure":1009,
"humidity":68,
"dew_point":38.39,
"uvi":5,
"clouds":100,
"visibility":10000,
"wind_speed":11.5,
"wind_deg":310,
"weather":[
{
"id":804,
"main":"Clouds",
"description":"overcast clouds",
"icon":"04d"
}
]
}
}
}
') WHERE key = 'sessions')
SELECT json_data.key AS session,
       json_data.value -> 'weather' -> 0 ->> 'icon'
FROM CTE, json_each(CTE.value) json_data;
| session    | ?column? |
| :--------- | :------- |
| 1651078174 | 04d      |
db<>fiddle here
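And if a session's weather array ever holds more than one entry, here is a rough sketch of that extra layer (the table and column names weather_table and weather are hypothetical; the json_array_elements step is the addition):

WITH CTE AS (
    SELECT value
    FROM weather_table, json_each(weather_table.weather)
    WHERE key = 'sessions'
)
-- one row per session, then one row per element of that session's weather array
SELECT json_data.key AS session,
       w ->> 'icon'  AS icon
FROM CTE,
     json_each(CTE.value) json_data,
     json_array_elements(json_data.value -> 'weather') w;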
I have a table with simple & complex entries:
id | formats | ...
 1 | [array] | ...
 2 | [array] | ...
...
I can select the rows I want based on some other columns.
The [array] is a list of complex entries
formats:
[{
format: "blah1"
hash_key: "hash_key1"
},
{
format: "blah2"
hash_key: "hash_key2"
},{
format: "correct"
hash_key: "hash_key3"
},
...
]
I need to loop through the list of formats, and if format == "correct", select the hash_key.
So I will return all of my rows with:
id1, hashkey
id2, hashkey
...
I don't know how this can be done in SQL. This would be easy with a while loop in C++ or Python, but I need to do it in SQL here.
I need to do this in Spanner SQL, in case that matters. I can try any standard SQL answers.
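One hedged sketch, using the usual GoogleSQL UNNEST pattern and assuming formats is an ARRAY<STRUCT<format STRING, hash_key STRING>> column in a hypothetical table named items (a JSON column would first need to be unpacked with Spanner's JSON functions):

-- flatten the formats array and keep only the "correct" entries
SELECT t.id,
       f.hash_key
FROM items AS t,
     UNNEST(t.formats) AS f
WHERE f.format = 'correct';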
I am following the instructions in the documentation on how to access JSON values in CloudWatch Logs Insights, where the recommendation is as follows:
JSON arrays are flattened into a list of field names and values. For example, to specify the value of instanceId for the first item in requestParameters.instancesSet, use requestParameters.instancesSet.items.0.instanceId.
ref
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_AnalyzeLogData-discoverable-fields.html
I am trying the following and getting nothing in return. The IntelliSense autofills up to processList.0 but no further.
fields processList.0.vss
| sort @timestamp desc
| limit 1
The JSON I am working with is:
"processList": [
{
"vss": xxxxx,
"name": "aurora",
"tgid": xxxx,
"vmlimit": "unlimited",
"parentID": 1,
"memoryUsedPc": 16.01,
"cpuUsedPc": 0.01,
"id": xxxxx,
"rss": xxxxx
},
{
"vss": xxxx,
"name": "aurora",
"tgid": xxxxxx,
"vmlimit": "unlimited",
"parentID": 1,
"memoryUsedPc": 16.01,
"cpuUsedPc": 0.06,
"id": xxxxx,
"rss": xxxxx
}]
Have you tried the following?
fields @timestamp, processList.0.vss
| sort @timestamp desc
| limit 5
It may be a syntax error. If not, please post a couple of records' worth of the overall structure, with @timestamp included.
The reference link that you have posted also states the following.
CloudWatch Logs Insights can extract a maximum of 100 log event fields
from a JSON log. For extra fields that are not extracted, you can use
the parse command to parse these fields from the raw unparsed log
event in the message field.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_AnalyzeLogData-discoverable-fields.html
For very large JSON messages, Insights' IntelliSense may not parse all the fields into named fields. So the solution is to use parse on the complete JSON string in the field where you expect your data to be present. In your example and mine, that is processList.
I was able to extract the value of a specific cpuUsedPc under processList by using a query like the following.
fields @timestamp, cpuUtilization.total, processList
| parse processList /"name":"RDS processes","tgid":.*?,"parentID":.*?,"memoryUsedPc":.*?,"cpuUsedPc":(?<RDSProcessesCPUUsedPc>.*?),/
| sort @timestamp asc
| display @timestamp, cpuUtilization.total, RDSProcessesCPUUsedPc
I have a table, say types, which has a JSON column, say location, that looks like this:
{ "attribute":[
{
"type": "state",
"value": "CA"
},
{
"type": "distance",
"value": "200.00"
} ...
]
}
Each row in the table has this data, and every row has "type": "state" in it. I just want to extract the value for "type": "state" from every row in the table and put it in a new column. I checked out several questions on SO, like:
Query for element of array in JSON column
Index for finding an element in a JSON array
Query for array elements inside JSON type
but could not get it working. I do not need to query on this. I need the value of this column. I apologize in advance if I missed something.
create table t(data json);
insert into t values('{"attribute":[{"type": "state","value": "CA"},{"type": "distance","value": "200.00"}]}'::json);
select elem->>'value' as state
from t, json_array_elements(t.data->'attribute') elem
where elem->>'type' = 'state';
| state |
| :---- |
| CA |
dbfiddle here
I mainly use Redshift where there is a built-in function to do this. So on the off-chance you're there, check it out.
redshift docs
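For illustration only, here is a hedged sketch of that approach with json_extract_path_text and json_extract_array_element_text, assuming the location column is stored as a JSON string and that the state entry is always the first array element:

-- pull "attribute", take element 0, then pull its "value"
SELECT json_extract_path_text(
           json_extract_array_element_text(
               json_extract_path_text(location, 'attribute'),
               0),
           'value') AS state
FROM types;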
It looks like Postgres has a similar function set:
https://www.postgresql.org/docs/current/static/functions-json.html
I think you'll need to chain three functions together to make this work.
SELECT
your_field::json->'attribute'->0->'value'
FROM
your_table
What I'm doing is a JSON extract by key name, followed by a JSON array extract by index (always the first, if your example is consistent with the full data), and finally another extract by key name.
Edit: got it working for your example
SELECT
'{ "attribute":[
{
"type": "state",
"value": "CA"
},
{
"type": "distance",
"value": "200.00"
}
]
}'::json->'attribute'->0->'value'
Returns "CA"
2nd edit: nested querying
@McNets has the right, better answer. But in this dive, I discovered you can nest queries in Postgres! How frickin' cool!
I stored the json as a text field in a dummy table and successfully ran this:
SELECT
    (SELECT value
     FROM json_to_recordset(my_column::json -> 'attribute') AS x(type text, value text)
     WHERE type = 'state')
FROM dummy_table
I have a table with JSON array data I'd like to search.
CREATE TABLE data (id SERIAL, json JSON);
INSERT INTO data (id, json)
VALUES (1, '[{"name": "Value A", "value": 10}]');
INSERT INTO data (id, json)
VALUES (2, '[{"name": "Value B1", "value": 5}, {"name": "Value B2", "value": 15}]');
As described in this answer, I created a function, which also allows creating an index on the array data (important).
CREATE OR REPLACE FUNCTION json_val_arr(_j json, _key text)
RETURNS text[] AS
$$
SELECT array_agg(elem->>_key)
FROM json_array_elements(_j) AS x(elem)
$$
LANGUAGE sql IMMUTABLE;
This works nicely if I want to find an entire value (e.g. "Value B1"):
SELECT *
FROM data
WHERE '{"Value B1"}'::text[] <# (json_val_arr(json, 'name'));
Now my questions:
Is it possible to find values with a wildcard (e.g. "Value*")? Something like the following (naive) approach:
...
WHERE '{"Value%"}'::text[] <# (json_val_arr(json, 'name'));
Is it possible to find numeric values with comparison operators (e.g. >= 10)? Again, a naive and obviously wrong approach:
...
WHERE '{10}'::int[] >= (json_val_arr(json, 'value'));
I tried to create a new function returning int[] but that did not work.
I created a SQL Fiddle to illustrate my problem.
Or would it be better to use a different approach like the following working queries:
SELECT *
FROM data,
json_array_elements(json) jsondata
WHERE jsondata ->> 'name' LIKE 'Value%';
and
...
WHERE cast(jsondata ->> 'value' as integer) <= 10;
However, for these queries, I was not able to create any index that was actually picked up by the queries.
Also, I'd like to implement all this in PostgreSQL 9.4 with jsonb eventually, but I think for the above questions this should not be an issue.
Thank you very much!
I know it's been a while, but I was just chugging along on something similar (using wildcards to query JSON datatypes) and thought I'd share what I found.
Firstly, this was a huge pointer in the right direction:
http://schinckel.net/2014/05/25/querying-json-in-postgres/
The takeaway is that your method of exploding the JSON element into something else (a record set) is the way to go. It lets you query the JSON elements with normal Postgres stuff.
In my case:
#Table:test
ID | jsonb_column
1 | {"name": "", "value": "reserved", "expires_in": 13732}
2 | {"name": "poop", "value": "{\"ns\":[\"Whaaat.\"]}", "expires_in": 4554}
3 | {"name": "dog", "value": "{\"ns\":[\"woof.\"]}", "expires_in": 4554}
Example Query
select * from test where jsonb_column ->> 'name' like '%o%';
# => Returns
# 2 | {"name": "poop", "value": "{\"ns\":[\"Whaaat.\"]}", "expires_in": 4554}
And to answer your question about jsonb: It looks like jsonb is the better route MOST of the time. It has more methods and faster read (but slower write) times.
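As one hedged illustration of those extra methods, reusing the data table from the question: jsonb supports the @> containment operator, which a GIN index can serve, so exact-match lookups can be indexed even though wildcard and range searches still need the expansion approach above.

-- expression index so the existing json column can be searched as jsonb
CREATE INDEX data_json_gin ON data USING gin ((json::jsonb));

-- containment: rows whose array has an element with name = 'Value B1'
SELECT *
FROM data
WHERE json::jsonb @> '[{"name": "Value B1"}]';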
Sources:
http://www.postgresql.org/docs/9.4/static/functions-json.html
http://www.postgresql.org/docs/9.4/static/datatype-json.html
Happy hunting!