How to extract values from a column - Google BigQuery [duplicate] - google-bigquery

This question already has answers here:
How to get a value of key with $ thru JSON_EXTRACT in BigQuery
(1 answer)
Parsing JSON files from a column with invalid token in BigQuery
(1 answer)
BigQuery parse json child column with special character
(1 answer)
Closed 5 months ago.
I am looking to extract the $revenue and $price values from a column in Google BigQuery that holds values like the ones below. I am not sure how to use REGEXP_EXTRACT or any other function to do so. The position of $revenue and $price varies within the string, so I can't rely on a specific position.
Any thoughts on how I can do that?
Value 1 -
{"utm_medium":"direct","utm_amplitude_user_id":"1580904318308","$quantity":1,"Locale":"English","$revenue":56.49,"Source":"App","utm_date":"2020-02-05","AppVersion":"Mac","$price":56.49,"utm_initial_medium":"direct"}
Value 2 -
{"utm_initial_source":"none","utm_medium":"direct","utm_amplitude_user_id":"1580904318308","$quantity":1,"Locale":"English","$revenue":56.49,"Source":"App","utm_date":"2020-02-05","AppVersion":"Mac","$price":56.49,"utm_source":"none","Device":"Desktop"}

You can try the query below.
The JSON_VALUE() function uses double quotes to escape characters that are not valid in a JSONPath, such as the leading $ in these keys.
See the BigQuery documentation on JSON functions for more information.
WITH sample_data AS (
SELECT '{"utm_medium":"direct","utm_amplitude_user_id":"1580904318308","$quantity":1,"Locale":"English","$revenue":56.49,"Source":"App","utm_date":"2020-02-05","AppVersion":"Mac","$price":56.49,"utm_initial_medium":"direct"}' json UNION ALL
SELECT '{"utm_initial_source":"none","utm_medium":"direct","utm_amplitude_user_id":"1580904318308","$quantity":1,"Locale":"English","$revenue":56.49,"Source":"App","utm_date":"2020-02-05","AppVersion":"Mac","$price":56.49,"utm_source":"none","Device":"Desktop"}'
)
SELECT JSON_VALUE(json, '$."$revenue"') AS revenue,
JSON_VALUE(json, '$."$price"') AS price,
FROM sample_data;
+---------+-------+
| revenue | price |
+---------+-------+
| 56.49 | 56.49 |
| 56.49 | 56.49 |
+---------+-------+
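Note that JSON_VALUE returns a STRING, so the values above are text rather than numbers. If you need to do arithmetic on them, a minimal sketch could look like the one below. It assumes the values are always numeric (SAFE_CAST returns NULL when they are not), and the shortened sample row is only there for illustration.

```sql
-- Sketch: cast the extracted strings to numeric types.
-- The one-row sample_data CTE is a shortened, illustrative copy of the question's data.
WITH sample_data AS (
  SELECT '{"$quantity":1,"$revenue":56.49,"$price":56.49}' AS json
)
SELECT
  SAFE_CAST(JSON_VALUE(json, '$."$revenue"') AS FLOAT64) AS revenue,
  SAFE_CAST(JSON_VALUE(json, '$."$price"') AS FLOAT64) AS price,
  SAFE_CAST(JSON_VALUE(json, '$."$quantity"') AS INT64) AS quantity
FROM sample_data;
```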

Related

extract json in column

I have a table
id | status | outgoing
-------------------------
1 | paid | {"a945248027_14454878":"processing","old.a945248027_14454878":"cancelled"}
2 | pending| {"069e5248cf_45299995":"processing"}
I am trying to extract the values after each underscore in the outgoing column, e.g. from a945248027_14454878 I want 14454878.
Because the json data is not standardised I can't seem to figure it out.
You can extract the part of each json key after the last underscore using the regexp version of substring.
select id, status, outgoing,
substring(key from '_([^_]+)$') as key
from the_table, lateral jsonb_object_keys(outgoing) as j(key);
See demo.
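If you also need the status stored under each key, one possible variation is to use jsonb_each_text instead of jsonb_object_keys. This is only a sketch: it assumes outgoing is of type jsonb, and the inline VALUES rows and the aliases k, v, key_suffix and key_status are placeholders standing in for the_table.

```sql
-- Sketch: return the key suffix together with the value stored under that key.
select t.id, t.status,
       substring(j.k from '_([^_]+)$') as key_suffix,  -- part after the last underscore
       j.v as key_status                               -- value stored under that key
from (values
        (1, 'paid', '{"a945248027_14454878":"processing","old.a945248027_14454878":"cancelled"}'::jsonb),
        (2, 'pending', '{"069e5248cf_45299995":"processing"}'::jsonb)
     ) as t(id, status, outgoing),
     lateral jsonb_each_text(t.outgoing) as j(k, v);
```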

How do I remove any text from string SQL Server? [duplicate]

This question already has answers here:
Select query to remove non-numeric characters
(19 answers)
Closed 3 months ago.
I have a column:
| Duration |
| -------- |
| 32 minutes|
| 27minutes |
| 20 mins |
| 15 |
I want to remove the text so that only the numbers remain, but as the text is varied I'm at a loss how to do so. I've reviewed multiple solutions and none seem to accomplish the job in an elegant way.
I had another column that was distance, and every row contained 'km' at the end so I was able to use replace.
UPDATE runner_orders
SET distance = REPLACE(distance,'km','')
I tried doing the same with a wildcard, but this didn't work.
UPDATE runner_orders
SET duration = REPLACE(duration, 'min%','')
Any input is well appreciated.
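You can achieve this with CAST, but only after cutting off the text: CAST('32 minutes' AS INT) on its own fails with a conversion error. Assuming the digits always come first, as in your sample, keep everything before the first non-digit character and cast that:
UPDATE runner_orders SET duration = CAST(LEFT(duration, PATINDEX('%[^0-9]%', duration + ' ') - 1) AS INT);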

Extract Values based on Keys in a Bigquery column

I have data in the form of key value pair (Not Json) as shown below
id | Attributes
---|---------------------------------------------------
12 | Country:US, Eligibility:Yes, startDate:2022-08-04
33 | Country:CA, Eligibility:Yes, startDate:2021-12-01
11 | Country:IN, Eligibility:No, startDate:2019-11-07
I would like to extract only startDate from Attributes section
Expected Output:
id | Attributes_startDate
---|----------------------
12 | 2022-08-04
33 | 2021-12-01
11 | 2019-11-07
One approach I tried was converting the Attributes column into JSON by adding { and } at the start and end, and then somehow adding double quotes around the keys and values, before extracting startDate. But is there a more effective way to extract startDate? I don't want to rely on Regex.
Is there any way to just specify the key and extract its value (just like extracting values by key from a JSON column with JSON_QUERY)?
You can try the following:
create temp function fakejson_extract(json string, attribute string) as ((
select split(kv, ':')[safe_offset(1)]
from unnest(split(json)) kv
where trim(split(kv, ':')[offset(0)]) = attribute
));
select id,
fakejson_extract(Attributes, 'Country') as Country,
fakejson_extract(Attributes, 'Eligibility') as Eligibility,
fakejson_extract(Attributes, 'startDate') as startDate
from your_table
Applied to the sample data in your question, this returns Country, Eligibility and startDate as separate columns for each id.
Regarding "is there any other effective solution to extract startDate as I don't want to rely on Regex":
If you feel strongly about avoiding Regex here, use the query below:
select id, split(Attribute, ':')[safe_offset(1)] Attributes_startDate
from your_table, unnest(split(Attributes)) Attribute
where trim(split(Attribute, ':')[offset(0)]) = 'startDate'
Use the query below (I think using RegEx here is the most efficient option):
select id, regexp_extract(Attributes, r'startDate:(\d{4}-\d{2}-\d{2})') Attributes_startDate
from your_table
Applied to the sample data in your question, the output matches the expected output shown there.
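If you want the result as a DATE rather than a STRING, a small sketch along the same lines is shown below. The CTE only reproduces the sample rows from the question so that the query runs on its own, and SAFE_CAST yields NULL if the extracted text is not a valid date.

```sql
-- Sketch: extract startDate and convert it to a DATE.
WITH your_table AS (
  SELECT 12 AS id, 'Country:US, Eligibility:Yes, startDate:2022-08-04' AS Attributes UNION ALL
  SELECT 33, 'Country:CA, Eligibility:Yes, startDate:2021-12-01' UNION ALL
  SELECT 11, 'Country:IN, Eligibility:No, startDate:2019-11-07'
)
SELECT
  id,
  SAFE_CAST(REGEXP_EXTRACT(Attributes, r'startDate:(\d{4}-\d{2}-\d{2})') AS DATE) AS Attributes_startDate
FROM your_table;
```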

Combine query to get all the matching search text in right order

I have the following table:
postgres=# \d so_rum;
Table "public.so_rum"
Column | Type | Collation | Nullable | Default
-----------+-------------------------+-----------+----------+---------
id | integer | | |
title | character varying(1000) | | |
posts | text | | |
body | tsvector | | |
parent_id | integer | | |
Indexes:
"so_rum_body_idx" rum (body)
I want to run a phrase search query, so I came up with the query below, for example:
select id from so_rum
where body @@ phraseto_tsquery('english','Is it possible to toggle the visibility');
This returns only the documents that match the entire phrase. However, there are documents where the distance between the lexemes is larger, and the above query doesn't return them. For example, 'it is something possible to do toggle between the. . . visibility' doesn't get returned. I know I can get it returned with the <2> (for example) distance operator by writing it into to_tsquery manually.
But I want to understand how to do this in the SQL statement itself, so that I get the results with a distance of 1 first, then 2, and so on (maybe up to 6-7). Finally, I want to append the results with the actual count of the search words, like the following query:
select count(id) from so_rum
where body @@ to_tsquery('english','string & string . . . ')
Is it possible to do this in a single query with good performance?
I don't see a canned solution to this. It sounds like you need to use plainto_tsquery to get all the results with all the lexemes, and then implement your own custom ranking function to rank them by distance between the lexemes, and maybe filter out ones with the wrong order.
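As a rough starting point for that idea, the sketch below matches on all lexemes with plainto_tsquery and orders the hits with the built-in ts_rank_cd, which takes the proximity of the matching lexemes into account. This is not a custom distance ranking: it does not enforce word order, it has not been tuned for performance, and the two inline documents are made up from the examples in the question.

```sql
-- Sketch: match all lexemes, then rank closer matches first.
-- docs stands in for so_rum; its two rows are illustrative only.
with docs(id, body) as (
  values
    (1, to_tsvector('english', 'Is it possible to toggle the visibility')),
    (2, to_tsvector('english', 'it is something possible to do toggle between the visibility'))
),
q as (
  select plainto_tsquery('english', 'Is it possible to toggle the visibility') as query
)
select d.id,
       ts_rank_cd(d.body, q.query) as rank  -- cover density rank, higher when lexemes sit closer together
from docs d, q
where d.body @@ q.query
order by rank desc;
```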

Google BigQuery - Parsing string data from a Bigquery table column

I have a table A within a dataset in Bigquery. This table has multiple columns and one of the columns called hits_eventInfo_eventLabel has values like below:
{ID:AEEMEO,Score:8.990000;ID:SEAMCV,Score:8.990000;ID:HBLION;Property ID:DNSEAWH,Score:0.391670;ID:CP1853;ID:HI2367;ID:H25600;}
If you write this string out in a tabular form, it contains the following data:
**ID | Score**
AEEMEO | 8.990000
SEAMCV | 8.990000
HBLION | -
DNSEAWH | 0.391670
CP1853 | -
HI2367 | -
H25600 | -
Some IDs have scores, some don't. I have multiple records with similar strings populated under the column hits_eventInfo_eventLabel within the table.
My question is how can I parse this string successfully WITHIN BIGQUERY so that I can get a list of property ids and their respective recommendation scores (if existing)? I would like to have the order in which the IDs appear in the string to be preserved after parsing this data.
Would really appreciate any info on this. Thanks in advance!
I would use a combination of SPLIT to separate the string into different rows and REGEXP_EXTRACT to separate each row into different columns, i.e.
select
regexp_extract(x, r'ID:([^,]*)') as id,
regexp_extract(x, r'Score:([\d\.]*)') score from (
select split(x, ';') x from (
select 'ID:AEEMEO,Score:8.990000;ID:SEAMCV,Score:8.990000;ID:HBLION;Property ID:DNSEAWH,Score:0.391670;ID:CP1853;ID:HI2367;ID:H25600;' as x))
It produces the following result:
Row id score
1 AEEMEO 8.990000
2 SEAMCV 8.990000
3 HBLION null
4 DNSEAWH 0.391670
5 CP1853 null
6 HI2367 null
7 H25600 null
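The query above relies on legacy SQL turning the SPLIT result into separate rows. In standard SQL the same idea needs an explicit UNNEST, and WITH OFFSET gives you the original position so the order of the IDs can be preserved. A minimal sketch, where the table_a CTE stands in for your table, part and pos are placeholder aliases, and the empty piece after the trailing semicolon is filtered out:

```sql
-- Sketch: standard SQL version that keeps the original order of the IDs.
WITH table_a AS (
  SELECT 'ID:AEEMEO,Score:8.990000;ID:SEAMCV,Score:8.990000;ID:HBLION;Property ID:DNSEAWH,Score:0.391670;ID:CP1853;ID:HI2367;ID:H25600;' AS hits_eventInfo_eventLabel
)
SELECT
  pos,                                                -- zero-based position in the original string
  REGEXP_EXTRACT(part, r'ID:([^,]*)') AS id,
  REGEXP_EXTRACT(part, r'Score:([\d.]*)') AS score    -- NULL when the piece has no score
FROM table_a,
  UNNEST(SPLIT(hits_eventInfo_eventLabel, ';')) AS part WITH OFFSET AS pos
WHERE part != ''
ORDER BY pos;
```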
You can write your own JavaScript functions in BigQuery to get exactly what you want now: http://googledevelopers.blogspot.com/2015/08/breaking-sql-barrier-google-bigquery.html