SQL - regexp_extract JSON string

I'm trying to extract the locationId from the following string:
{"type":"player","topic_id":"555","topic_name":"sfd","userId":116,"userLocation":{"countryCode":"BR","locationId":21,"locationCity":"Rio de Janeiro"}}
I'm able to extract, for example, the topic_id using the following:
safe_cast(regexp_extract(h.events.label, r'"topic_id":"([a-zA-Z0-9-_. ]+)"') as int64)
But this doesn't work for locationId. I'm guessing it's because of the nested dict, but I'm not sure how to get around that.

You'd be better off using a JSON function rather than a regexp function.
WITH sample_data AS (
SELECT '{"type":"player","topic_id":"555","topic_name":"sfd","userId":116,"userLocation":{"countryCode":"BR","locationId":21,"locationCity":"Rio de Janeiro"}}' json
)
SELECT CAST(JSON_VALUE(json, '$.userLocation.locationId') AS INT64) AS locationId
FROM sample_data;
+------------+
| locationId |
+------------+
|         21 |
+------------+
You wrote: "this doesn't work for locationId. I'm guessing it's because of the nested dict?"
I think the nesting is not the cause. The value of topic_id is a quoted string ("555"), while the value of locationId is an unquoted integer (21), so a pattern that expects double quotes around the value cannot match it.
r'"locationId":([a-zA-Z0-9-_. ]+)' will work for locationId, but a simpler regular expression is r'"locationId":(\d+)'.
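The difference between the quoted and unquoted patterns can be checked outside BigQuery. The following is a minimal sketch in Python, not BigQuery SQL: Python's re module stands in for regexp_extract (BigQuery uses RE2, which I'm assuming behaves the same for these simple patterns), json.loads stands in for JSON_VALUE, and all variable names are illustrative.

```python
import json
import re

label = (
    '{"type":"player","topic_id":"555","topic_name":"sfd","userId":116,'
    '"userLocation":{"countryCode":"BR","locationId":21,"locationCity":"Rio de Janeiro"}}'
)

# Pattern from the question: it requires double quotes around the value.
quoted_topic = re.search(r'"topic_id":"([a-zA-Z0-9-_. ]+)"', label)

# Same shape applied to locationId: no match, because 21 is not quoted.
quoted_location = re.search(r'"locationId":"([a-zA-Z0-9-_. ]+)"', label)

# Dropping the quotes and matching digits only does match.
unquoted_location = re.search(r'"locationId":(\d+)', label)

# Parsing the JSON avoids the regex entirely and follows the nested path.
parsed_location = json.loads(label)["userLocation"]["locationId"]

print(quoted_topic.group(1))
print(quoted_location)
print(unquoted_location.group(1))
print(parsed_location)
```

The regex versions return text that still needs a cast, while the parsed version already yields an integer, which is one reason the JSON function is the more robust choice.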

Related

extract json in column

I have a table
id | status | outgoing
-------------------------
1 | paid | {"a945248027_14454878":"processing","old.a945248027_14454878":"cancelled"}
2 | pending| {"069e5248cf_45299995":"processing"}
I am trying to extract the value after the underscore in each key of the outgoing column, e.g. from a945248027_14454878 I want 14454878.
Because the json data is not standardised I can't seem to figure it out.

You can extract the part of each JSON key after the underscore using the regular-expression form of substring.
select id, status, outgoing,
substring(key from '_([^_]+)$') as key
from the_table, lateral jsonb_object_keys(outgoing) as j(key);
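To illustrate what the pattern '_([^_]+)$' captures, here is a small sketch in Python rather than PostgreSQL. The helper name suffix_after_underscore is hypothetical, and re.search with a single capture group is used as a stand-in for substring(key from pattern), which returns the text matched by the parenthesized part.

```python
import re


def suffix_after_underscore(key):
    """Return the text after the last underscore in key, or None if there is none."""
    match = re.search(r"_([^_]+)$", key)
    return match.group(1) if match else None


# Keys taken from the first sample row of the outgoing column.
outgoing = {
    "a945248027_14454878": "processing",
    "old.a945248027_14454878": "cancelled",
}

suffixes = [suffix_after_underscore(key) for key in outgoing]
print(suffixes)
```

Because the pattern is anchored at the end of the key, any prefix such as "old." is ignored and only the trailing segment is returned.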

Extract Values based on Keys in a Bigquery column

I have data in the form of key-value pairs (not JSON), as shown below:
id | Attributes
---|---------------------------------------------------
12 | Country:US, Eligibility:Yes, startDate:2022-08-04
33 | Country:CA, Eligibility:Yes, startDate:2021-12-01
11 | Country:IN, Eligibility:No, startDate:2019-11-07
I would like to extract only startDate from Attributes section
Expected Output:
id | Attributes_startDate
---|----------------------
12 | 2022-08-04
33 | 2021-12-01
11 | 2019-11-07
One approach I tried was converting the Attributes column into JSON by adding { and } at the start and end, adding double quotes around the keys and values, and then extracting startDate. Is there a more effective way to extract startDate? I don't want to rely on regex.
Is there any way to just specify the key and extract its respective value (the way JSON_QUERY extracts values by key from a JSON column)?

You can try the following:
create temp function fakejson_extract(json string, attribute string) as ((
  -- split() defaults to ',' as the delimiter, giving one 'key:value' item per element
  select split(kv, ':')[safe_offset(1)]
  from unnest(split(json)) kv
  -- trim() removes the space that follows each comma before comparing the key
  where trim(split(kv, ':')[offset(0)]) = attribute
));
select id,
  fakejson_extract(Attributes, 'Country') as Country,
  fakejson_extract(Attributes, 'Eligibility') as Eligibility,
  fakejson_extract(Attributes, 'startDate') as startDate
from your_table;
Applied to the sample data in your question, this returns one row per id with the Country, Eligibility, and startDate values in separate columns.
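The split-based lookup can be sketched outside BigQuery as well. The following Python function is only an illustration of the same logic, not BigQuery code; the name fake_json_extract is hypothetical and mirrors the temp function above.

```python
def fake_json_extract(attributes, attribute):
    """Return the value stored under attribute in a 'key:value, key:value' string.

    Returns None when the key is absent or has no value part.
    """
    for kv in attributes.split(","):
        parts = kv.split(":")
        # strip() drops the space that follows each comma, like trim() in SQL.
        if parts[0].strip() == attribute:
            # Mirrors safe_offset(1): None instead of an error when no value exists.
            return parts[1] if len(parts) > 1 else None
    return None


row = "Country:US, Eligibility:Yes, startDate:2022-08-04"
print(fake_json_extract(row, "startDate"))
print(fake_json_extract(row, "Country"))
```

Note that this approach assumes values never contain ',' or ':' themselves; a value such as a timestamp with colons would be cut short.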

You wrote: "is there any other effective solution to extract startDate as I don't want to rely on Regex."
If you feel really strongly about avoiding regex here, use the following:
select id, split(Attribute, ':')[safe_offset(1)] Attributes_startDate
from your_table, unnest(split(Attributes)) Attribute
where trim(split(Attribute, ':')[offset(0)]) = 'startDate'

Use the following (I think regex is the most efficient option here):
select id, regexp_extract(Attributes, r'startDate:(\d{4}-\d{2}-\d{2})') Attributes_startDate
from your_table
Applied to the sample data in your question, this produces the expected output shown there.

Athena query get the index of any element in a list

I need to access elements in an array column according to the position of a given element in another array column. Say my dataset looks like this:
WITH dataset AS (
SELECT ARRAY ['hello', 'amazon', 'athena'] AS words,
ARRAY ['john', 'tom', 'dave'] AS names
)
SELECT * FROM dataset
And I'd like to achieve something like this:
SELECT element_at(words, index(names, 'john')) AS john_word
FROM dataset
Is there a function in Athena like "index"? If not, how can I write one myself? The desired result should look like this:
| -------- |
| john_word|
| -------- |
| hello |
| -------- |

Use array_position:
array_position(x, element) → bigint
Returns the position of the first occurrence of the element in array x (or 0 if not found).
Note that in Presto, array indexes start at 1.
SELECT element_at(words, array_position(names, 'john')) AS john_word
FROM dataset
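The 1-based position and the 0 "not found" value are easy to get wrong, so here is a minimal Python sketch of the semantics described above. It is not Athena code: array_position is reimplemented by hand for illustration, and the explicit check for 0 is my own addition to show how a missing name could be handled.

```python
def array_position(items, element):
    """Return the 1-based position of the first occurrence of element, or 0 if absent."""
    for position, item in enumerate(items, start=1):
        if item == element:
            return position
    return 0


words = ["hello", "amazon", "athena"]
names = ["john", "tom", "dave"]

position = array_position(names, "john")
# Convert the 1-based position back to Python's 0-based index; 0 means "not found".
john_word = words[position - 1] if position > 0 else None

print(position, john_word)
```

If the names might be missing from the array, it is worth deciding up front what the query should return for position 0 rather than relying on the default behaviour.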

SQL LIKE using the same row value

I'm wondering how I can use a row value as a variable in my LIKE statement. For example:
ID | PID | DESCRIPTION
1 | 4124 | Hi4124
2 | 2451 | Test
3 | 1467 | Hello
4 | 9642 | Me9642
Given the table above, I want to return IDs 1 and 4, since their DESCRIPTION contains their PID.
I thought it would be SELECT * from TABLE WHERE DESCRIPTION LIKE '%PID%', but I can't get it to work.

You can use CONCAT() to assemble the matching pattern, as in:
select *
from t
where description like concat('%', PID, '%')
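The question's attempt fails because '%PID%' is a literal string containing the letters P, I, D, not the column value. As a quick check of the pattern-building approach, here is a sketch that runs the sample data through SQLite from Python. It is an assumption-laden stand-in for the asker's database: SQLite's || operator is used in place of CONCAT, and the table name t follows the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID INTEGER, PID INTEGER, DESCRIPTION TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 4124, "Hi4124"), (2, 2451, "Test"), (3, 1467, "Hello"), (4, 9642, "Me9642")],
)

# Build the pattern from the row's own PID; || concatenates in SQLite.
rows = conn.execute(
    "SELECT ID FROM t WHERE DESCRIPTION LIKE '%' || PID || '%' ORDER BY ID"
).fetchall()
matching_ids = [row[0] for row in rows]
conn.close()

print(matching_ids)
```

One caveat with any LIKE-based version: if PID could contain % or _, those characters would act as wildcards in the pattern.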

We could also try using CHARINDEX here:
SELECT ID, PID, DESCRIPTION
FROM yourTable
WHERE CHARINDEX(PID, DESCRIPTION) > 0;
Note that I assume here that the PID column is actually text, and not a numeric column. If PID is numeric, we might first have to cast it in order to use CHARINDEX (or any of the methods given in the other answers).

Use the CONCAT SQL function
SELECT *
FROM TABLE
WHERE DESCRIPTION LIKE CONCAT('%', PID, '%')

Concatenating JSON results to single column postgresql

So, at the moment I have two columns in a table, one of which containing a JSON document, like so:
CID:
2
Name:
{"first" : "bob","1" : "william", "2" : "archebuster", "last" : "smith"}
When I do a search on this column using:
SELECT "CID", "Name"->>(json_object_keys("Name")) AS name FROM "Clients" WHERE
"Name"->>'first' LIKE 'bob' GROUP BY "CID";
I get:
CID | name
--------------
2 | bob
2 | william
2 | archebuster
2 | smith
When really I want:
CID | name
2 | bob william archebuster smith
How would I go about doing this? I'm new to JSON in PostgreSQL.
I've tried string_agg and it wouldn't work, presumably because I'm working with a JSON column, even though '->>' should return the result as text.

First, you need to understand that if you include a set-returning function in the SELECT clause, you create an implicit CROSS JOIN LATERAL.
Your query in reality looks like this:
SELECT "CID", "Name"->>"key" AS name
FROM "Clients"
CROSS JOIN LATERAL json_object_keys("Name") AS "foo"("key")
WHERE "Name"->>'first' LIKE 'bob'
GROUP BY "CID", "Name"->>"key"
If you really want to do that, you can group by "CID" only and apply an aggregate function to "Name"->>"key" (possibly array_agg or string_agg).
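To make the intended result concrete, here is a small Python sketch of the aggregation, not PostgreSQL code: it filters on the "first" key and joins all values of the JSON document per CID with spaces, which is what string_agg with a ' ' separator would be expected to produce. The clients list and the full_names dictionary are illustrative names only.

```python
import json

# One (CID, Name) row, copied from the question.
clients = [
    (2, '{"first" : "bob","1" : "william", "2" : "archebuster", "last" : "smith"}'),
]

full_names = {}
for cid, name_json in clients:
    name = json.loads(name_json)
    # LIKE 'bob' without wildcards behaves like an equality test.
    if name.get("first") == "bob":
        # Join the values in document order, one string per CID.
        full_names[cid] = " ".join(name.values())

print(full_names)
```

In SQL the order of aggregated values is not guaranteed unless the aggregate specifies one, so the sketch's document-order result should be treated as the goal rather than as what an unordered string_agg will always return.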
