Not an SQL guru here.
I'm trying to write a query that gets a few columns of a table, plus only the "icon" value of the JSON column below (named weather). I got to a point where I can list all the attributes nested right after sessions, which are timestamps, but I've had no luck iterating over them and joining the result to the rest of the table.
I also have the feeling that it wasn't very clever to store that timestamp as an attribute name, especially as it's already stored in the "dt" value.
Can anybody confirm whether this is best practice or not?
And could somebody help me get the "icon" value?
{
  "lat":43.6423,
  "lon":-72.2518,
  "timezone":"America/New_York",
  "timezone_offset":-14400,
  "sessions":{
    "1651078174":{
      "dt":1651078174,
      "sunrise":1651052825,
      "sunset":1651103155,
      "temp":48.45,
      "feels_like":43.63,
      "pressure":1009,
      "humidity":68,
      "dew_point":38.39,
      "uvi":5,
      "clouds":100,
      "visibility":10000,
      "wind_speed":11.5,
      "wind_deg":310,
      "weather":[
        {
          "id":804,
          "main":"Clouds",
          "description":"overcast clouds",
          "icon":"04d"
        }
      ]
    }
  }
}
If you have only one icon per session and multiple sessions, you can run the query below.
If you have multiple icons, you need to apply another layer to unnest them (since weather is an array, json_array_elements does that; see the sketch below the result).
For more JSON functions, see https://www.postgresql.org/docs/current/functions-json.html
WITH cte AS (
  SELECT value
  FROM json_each('{
    "lat":43.6423,
    "lon":-72.2518,
    "timezone":"America/New_York",
    "timezone_offset":-14400,
    "sessions":{
      "1651078174":{
        "dt":1651078174,
        "sunrise":1651052825,
        "sunset":1651103155,
        "temp":48.45,
        "feels_like":43.63,
        "pressure":1009,
        "humidity":68,
        "dew_point":38.39,
        "uvi":5,
        "clouds":100,
        "visibility":10000,
        "wind_speed":11.5,
        "wind_deg":310,
        "weather":[
          {
            "id":804,
            "main":"Clouds",
            "description":"overcast clouds",
            "icon":"04d"
          }
        ]
      }
    }
  }')
  WHERE key = 'sessions'
)
SELECT json_data.key AS session,
       json_data.value -> 'weather' -> 0 ->> 'icon'
FROM cte, json_each(cte.value) json_data;
session | ?column?
:--------- | :-------
1651078174 | 04d
db<>fiddle here
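For the multiple-icon case, here is a minimal sketch of that extra layer. It assumes the same document shape, but uses a trimmed literal with a hypothetical second weather entry added purely for illustration; json_array_elements unnests the weather array, so each session yields one row per icon:
WITH cte AS (
  SELECT value
  FROM json_each('{"sessions":{"1651078174":{"dt":1651078174,
    "weather":[{"id":804,"main":"Clouds","icon":"04d"},
               {"id":500,"main":"Rain","icon":"10d"}]}}}')
  WHERE key = 'sessions'
)
SELECT sessions.key AS session,
       w ->> 'icon' AS icon
FROM cte,
     json_each(cte.value) sessions,
     json_array_elements(sessions.value -> 'weather') w;
This returns one row per (session, weather element) pair, so a session with two icons produces two rows.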
"example_code": [
{
"code": "blah",
"type": "value"
}
]
In other cases we would write (meta->'example_code'->'code').
But since it is inside [] brackets, I am not able to extract it.
Welcome to SO. Since example_code is an array, use -> 0 to access its first (and only) element. Here is the documentation on it.
The CTE the_table mimics the real data.
with the_table(meta) as
(
values ('{"example_code":[{"code":"blah", "type":"value"}]}'::json)
)
select meta -> 'example_code' -> 0 ->> 'code' from the_table;
I have a table (orders) with a jsonb[] column named steps in a Postgres db.
I need to create a SQL query that selects records where Step1, Step2 and Step3 all have success status.
[
{
"step_name"=>"Step1",
"status"=>"success",
"timestamp"=>1636120240
},
{
"step_name"=>"Step2",
"status"=>"success",
"timestamp"=>1636120275
},
{
"step_name"=>"Step3",
"status"=>"success",
"timestamp"=>1636120279
},
{
"step_name"=>"Step4",
"timestamp"=>1636120236
"status"=>"success"
}
]
table structure
id | name | steps (jsonb)
'Normalize' steps into a list of JSON items and check whether every one of them has "status":"success". BTW your example is not valid JSON. All => need to be replaced with : and a comma is missing.
select id, name from orders
where
(
select bool_and(j->>'status' = 'success')
from jsonb_array_elements(steps) j
where j->>'step_name' in ('Step1','Step2','Step3') -- if not all steps but only these are needed
);
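A hedged variant of the same approach, in case you also need all three steps to actually be present: bool_and alone still returns true when a step is missing, as long as no matching step failed. Counting the successful matches instead closes that gap (this assumes step names are unique within a row, as in your example):
select id, name from orders
where
(
  select count(*) = 3   -- all three steps present AND successful
  from jsonb_array_elements(steps) j
  where j->>'step_name' in ('Step1','Step2','Step3')
    and j->>'status' = 'success'
);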
You can use the JSON containment operator (@>) to check whether the required values are present:
Demo
select
*
from
test
where
steps @> '[{"step_name":"Step1","status":"success"},{"step_name":"Step2","status":"success"},{"step_name":"Step3","status":"success"}]'
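For reference, a minimal setup the containment query runs against (assuming a table named test with a jsonb steps column, mirroring the demo). Containment ignores the extra timestamp keys and the extra Step4 element:
create table test(id int, name text, steps jsonb);
insert into test values (1, 'order-1', '[
  {"step_name":"Step1","status":"success","timestamp":1636120240},
  {"step_name":"Step2","status":"success","timestamp":1636120275},
  {"step_name":"Step3","status":"success","timestamp":1636120279},
  {"step_name":"Step4","status":"success","timestamp":1636120236}
]');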
I followed these instructions to get my AWS WAF data into an Athena table.
I would like to query the data to find the latest requests with an action of BLOCK. This query works:
SELECT
from_unixtime(timestamp / 1000e0) AS date,
action,
httprequest.clientip AS ip,
httprequest.uri AS request,
httprequest.country as country,
terminatingruleid,
rulegrouplist
FROM waf_logs
WHERE action='BLOCK'
ORDER BY date DESC
LIMIT 100;
My issue is cleanly identifying the "terminatingrule" - the reason the request was blocked. As an example, a result has
terminatingrule = AWS-AWSManagedRulesCommonRuleSet
And
rulegrouplist = [
{
"nonterminatingmatchingrules": [],
"rulegroupid": "AWS#AWSManagedRulesAmazonIpReputationList",
"terminatingrule": "null",
"excludedrules": "null"
},
{
"nonterminatingmatchingrules": [],
"rulegroupid": "AWS#AWSManagedRulesKnownBadInputsRuleSet",
"terminatingrule": "null",
"excludedrules": "null"
},
{
"nonterminatingmatchingrules": [],
"rulegroupid": "AWS#AWSManagedRulesLinuxRuleSet",
"terminatingrule": "null",
"excludedrules": "null"
},
{
"nonterminatingmatchingrules": [],
"rulegroupid": "AWS#AWSManagedRulesCommonRuleSet",
"terminatingrule": {
"rulematchdetails": "null",
"action": "BLOCK",
"ruleid": "NoUserAgent_HEADER"
},
"excludedrules":"null"
}
]
The piece of data I would like separated into a column is rulegrouplist[terminatingrule].ruleid which has a value of NoUserAgent_HEADER
AWS provides useful information on querying nested Athena arrays, but I have been unable to get the result I want.
I have framed this as an AWS question but since Athena uses SQL queries, it's likely that anyone with good SQL skills could work this out.
It's not entirely clear to me exactly what you want, but I'm going to assume you are after the array element where terminatingrule is not "null" (I will also assume that if there are multiple you want the first).
The documentation you link to says that the type of the rulegrouplist column is array<string>. The reason it is string and not a complex type is that there seem to be multiple different schemas for this column, one example being that the terminatingrule property is either the string "null" or a struct/object – something that can't be described using Athena's type system.
This is not a problem, however. When dealing with JSON there's a whole set of JSON functions that can be used. Here's one way to use json_extract combined with filter and element_at to remove array elements where the terminatingrule property is the string "null" and then pick the first of the remaining elements:
SELECT
element_at(
filter(
rulegrouplist,
rulegroup -> json_extract(rulegroup, '$.terminatingrule') <> CAST('null' AS JSON)
),
1
) AS first_non_null_terminatingrule
FROM waf_logs
WHERE action = 'BLOCK'
ORDER BY date DESC
You say you want the "latest", which to me is ambiguous and could mean both first non-null and last non-null element. The query above will return the first non-null element, and if you want the last you can change the second argument to element_at to -1 (Athena's array indexing starts from 1, and -1 is counting from the end).
To return the individual ruleid element of the json:
SELECT
  from_unixtime(timestamp / 1000e0) AS date,
  action,
  httprequest.clientip AS ip,
  httprequest.uri AS request,
  httprequest.country AS country,
  terminatingruleid,
  json_extract(
    element_at(
      filter(
        rulegrouplist,
        rulegroup -> json_extract(rulegroup, '$.terminatingrule') <> CAST('null' AS JSON)
      ),
      1
    ),
    '$.terminatingrule.ruleid'
  ) AS ruleid
FROM waf_logs
WHERE action='BLOCK'
ORDER BY date DESC
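One detail worth noting: json_extract returns a JSON-typed value, so the ruleid above comes back JSON-quoted. If you want it as a plain varchar, json_extract_scalar should do it – a sketch against the same table:
SELECT
  json_extract_scalar(
    element_at(
      filter(
        rulegrouplist,
        rulegroup -> json_extract(rulegroup, '$.terminatingrule') <> CAST('null' AS JSON)
      ),
      1
    ),
    '$.terminatingrule.ruleid'
  ) AS ruleid
FROM waf_logs
WHERE action = 'BLOCK'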
I had the same issue but the solution posted by Theo didn't work for me, even though the table was created according to the instructions linked to in the original post.
Here is what worked for me, which is basically the same as Theo's solution, but without the json conversion:
SELECT
from_unixtime(timestamp / 1000e0) AS date,
action,
httprequest.clientip AS ip,
httprequest.uri AS request,
httprequest.country as country,
terminatingruleid,
rulegrouplist,
element_at(filter(ruleGroupList, ruleGroup -> ruleGroup.terminatingRule IS NOT NULL),1).terminatingRule.ruleId AS ruleId
FROM waf_logs
WHERE action='BLOCK'
ORDER BY date DESC
LIMIT 100;
I have a JSON object that's written in a weird way.
> {"custom": [ { "name": "addressIdNum", "valueNum": 12345678}, {
> "name": "cancelledDateAt", "valueAt": "2017-02-30T01:43:04.000Z" }] }
Not sure how to parse something like this. The keys are addressIdNum and cancelledDateAt and the values are 12345678 and 2017-02-30T01:43:04.000Z respectively.
How do I parse this using Snowflake SQL?
Thanks for all your help!
Best,
Preet Rajdeo
If your input is ALWAYS in this form (two elements in an array, with the same fields in each element), you can combine the PARSE_JSON function with path access.
Just try this:
with input as (
select parse_json(
'{"custom": [ { "name": "addressIdNum", "valueNum": 12345678}, {"name": "cancelledDateAt", "valueAt": "2017-02-30T01:43:04.000Z" }] }')
as json)
select json:custom[0].valueNum::integer, json:custom[1].valueAt::timestamp from input;
----------------------------------+-----------------------------------+
JSON:CUSTOM[0].VALUENUM::INTEGER | JSON:CUSTOM[1].VALUEAT::TIMESTAMP |
----------------------------------+-----------------------------------+
12345678 | 2017-03-01 17:43:04 |
----------------------------------+-----------------------------------+
However, if the structure of your data might be different (e.g. elements in the array might be in a different order), it's probably best to write a JavaScript UDF in Snowflake to convert such messy data into something easier.
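Short of a full UDF, Snowflake's FLATTEN table function can also handle elements arriving in any order. A sketch with the same input, pivoting the name/value pairs back into columns (the iff/max aggregation is one way to do that pivot):
with input as (
  select parse_json(
    '{"custom": [ { "name": "addressIdNum", "valueNum": 12345678}, {"name": "cancelledDateAt", "valueAt": "2017-02-30T01:43:04.000Z" }] }')
  as json)
select
  max(iff(f.value:name::string = 'addressIdNum', f.value:valueNum::integer, null)) as address_id,
  max(iff(f.value:name::string = 'cancelledDateAt', f.value:valueAt::timestamp, null)) as cancelled_at
from input, lateral flatten(input => input.json:custom) f;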
I have a table, say types, which has a JSON column, say location, that looks like this:
{ "attribute":[
{
"type": "state",
"value": "CA"
},
{
"type": "distance",
"value": "200.00"
} ...
]
}
Each row in the table has this data, and every row has "type": "state" in it. I want to extract just the value paired with "type": "state" from every row and put it in a new column. I checked out several questions on SO, like:
Query for element of array in JSON column
Index for finding an element in a JSON array
Query for array elements inside JSON type
but could not get it working. I do not need to query on this. I need the value of this column. I apologize in advance if I missed something.
create table t(data json);
insert into t values('{"attribute":[{"type": "state","value": "CA"},{"type": "distance","value": "200.00"}]}'::json);
select elem->>'value' as state
from t, json_array_elements(t.data->'attribute') elem
where elem->>'type' = 'state';
| state |
| :---- |
| CA |
dbfiddle here
I mainly use Redshift where there is a built-in function to do this. So on the off-chance you're there, check it out.
redshift docs
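For instance, a sketch of that chain in Redshift, assuming the location JSON is stored as a plain varchar column on types and that the state entry is always first, as in your example (JSON_EXTRACT_PATH_TEXT pulls values by key, JSON_EXTRACT_ARRAY_ELEMENT_TEXT by index):
SELECT
  JSON_EXTRACT_PATH_TEXT(
    JSON_EXTRACT_ARRAY_ELEMENT_TEXT(
      JSON_EXTRACT_PATH_TEXT(location, 'attribute'),  -- the attribute array, as text
      0                                               -- first element: the state entry
    ),
    'value'
  ) AS state
FROM types;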
It looks like Postgres has a similar function set:
https://www.postgresql.org/docs/current/static/functions-json.html
I think you'll need to chain three functions together to make this work.
SELECT
your_field::json->'attribute'->0->'value'
FROM
your_table
What I'm doing is a JSON extract by key name, followed by a JSON array extract by index (always the 1st, if your example is consistent with the full data), followed finally by another extract by key name.
Edit: got it working for your example
SELECT
'{ "attribute":[
{
"type": "state",
"value": "CA"
},
{
"type": "distance",
"value": "200.00"
}
]
}'::json->'attribute'->0->'value'
Returns "CA"
2nd edit: nested querying
@McNets' answer is the right, better one. But in this dive, I discovered you can nest queries in Postgres! How frickin' cool!
I stored the json as a text field in a dummy table and successfully ran this:
SELECT
(SELECT value FROM json_to_recordset(
my_column::json->'attribute') as x(type text, value text)
WHERE
type = 'state'
)
FROM dummy_table