Extracting data from unstructured JSON in BigQuery - sql

I have JSON as a string in a BigQuery field:
[{"name":"user_group","value":"regular"},{"name":"checkout_version","value":"2.2"},{"name":"currency","value":"EUR"},{"name":"currency_exchange_rate","value":"1"},{"name":"currency_symbol","value":"€"},{"name":"variant","value":"default"},{"name":"snowplow_id","value":"XXXXXXX"},{"name":"ip_address","value":"XXXX"},{"name":"user_agent","value":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36"},{"name":"is_test_order","value":"false"}]
I'm going crazy trying to extract the value ("default") from this section:
{"name":"variant","value":"default"}
The part I want always follows "name":"variant","value":" and ends with a ".
I have tried json_extract, but regexp_extract seems like the best option. I tried this: (
select REGEXP_EXTRACT(json_string_field, r'/\{"value":"([^"]+)"/') as variant
from source_table
)
but I'm just getting nulls back...would appreciate ideas...

Consider the query below:
WITH json_data AS (
SELECT '[{"name":"user_group","value":"regular"},{"name":"checkout_version","value":"2.2"},{"name":"currency","value":"EUR"},{"name":"currency_exchange_rate","value":"1"},{"name":"currency_symbol","value":"€"},{"name":"variant","value":"default"},{"name":"snowplow_id","value":"XXXXXXX"},{"name":"ip_address","value":"XXXX"},{"name":"user_agent","value":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36"},{"name":"is_test_order","value":"false"}]' json
)
SELECT JSON_VALUE(kv, '$.value') AS value
FROM json_data, UNNEST(JSON_QUERY_ARRAY(json)) kv
WHERE JSON_VALUE(kv, '$.name') = 'variant';
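To see why the REGEXP_EXTRACT attempt in the question returned NULL, it can help to try the pattern outside BigQuery. The sketch below uses Python's re module as a stand-in for BigQuery's RE2 engine, which is an assumption (both should treat this simple pattern the same way). The shortened sample string and all variable names are illustrative only. The pattern from the question fails because the slashes are matched literally and the text {"value" never occurs in the data; anchoring on "name":"variant" does match.

```python
import json
import re

# Shortened, illustrative copy of the JSON string from the question.
json_string = (
    '[{"name":"currency","value":"EUR"},'
    '{"name":"variant","value":"default"},'
    '{"name":"is_test_order","value":"false"}]'
)

# Pattern from the question: the slashes are literal characters, and
# '{"value"' never appears in the data, so nothing matches.
original = re.search(r'/\{"value":"([^"]+)"/', json_string)

# Anchoring on the "variant" name captures the value that follows it.
fixed = re.search(r'"name":"variant","value":"([^"]+)"', json_string)
regex_variant = fixed.group(1) if fixed else None

# Parsing the JSON mirrors the UNNEST + JSON_VALUE approach above.
parsed_variant = next(
    (item["value"] for item in json.loads(json_string) if item["name"] == "variant"),
    None,
)

print(original, regex_variant, parsed_variant)
```

Parsing the JSON is usually the safer choice, since it does not depend on the key order inside each object.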

Related

Karate: Is there a way to pass variable as string in scenario outline and examples table [duplicate]

I am using latest karate v1.1.0. My feature looks like this:
Feature: Scenario outline with examples table
Background:
* url 'http://localhost:8080'
* def dummy = Java.type(karatetests.attributes)
* def test = new attributes()
* def userid = test.getuserid()
Scenario Outline: pass userid with string as parameter
Given path '<path>'
And Header Host = 'hostname'
And Header User-Agent = '<Ua-string>'
When method POST
Then status 200
Examples:
| path | Ua-string |
| api | AppleWebKit/537.36 (KHTML, like Gecko, userid)|
In Cucumber, I was able to substitute the value of the 'userid' variable into the Ua-string column with AppleWebKit/537.36 (KHTML, like Gecko, ${userid})
In Karate, I tried 'userid', "userid", "#(userid)", and '#(userid)', but unfortunately none of them worked.
Examples:
| path | Ua-string |
| api | AppleWebKit/537.36 (KHTML, like Gecko, userid)| => Result: userid string is passed not its value
| api | AppleWebKit/537.36 (KHTML, like Gecko, 'userid')| => Result: syntax error
| api | AppleWebKit/537.36 (KHTML, like Gecko, "userid")| => Result: "userid" string is passed not its value
| api | AppleWebKit/537.36 (KHTML, like Gecko, "#(userid)")| => Result: "#(userid)" string is passed not its value
| api | AppleWebKit/537.36 (KHTML, like Gecko, '#(userid)')| => Result: '#(userid)' syntax error
How can I replace the userid with its value, while passing it to Ua-string header?
Thanks
Works perfectly for me. Try this example that you can cut and paste into any feature and see that it works for yourself:
Scenario Outline:
* url 'https://httpbin.org/anything'
* header User-Agent = userAgent
* method get
Examples:
| userAgent |
| foo |
| bar |
But I think you are expecting functions and variables to work within Examples: - but sorry, that is not supported: https://stackoverflow.com/a/60358535/143475
EDIT: I still don't understand what you mean by "variable" in your comment, but if you are seeing problems with commas, try this, and refer to the docs: https://github.com/intuit/karate#scenario-outline-enhancements
Scenario Outline:
* url 'https://httpbin.org/anything'
* header User-Agent = userAgent
* method get
Examples:
| userAgent! |
| 'foo, bar' |
| 'baz, ban' |
And finally, if you really want to do some "magic" using variables, you can always do that in the scenario body as shown here: https://stackoverflow.com/a/60358535/143475
Or just use string concatenation. Is that so hard?

Correct syntax for SQL Query with XML

My DataXML looks like this
<TestResults>
<MethodResult>
X
X
<StepResult name="BluetoothERROR">
X
X
X
X
X
X
X
<StepResult name="FLOWERROR1">
<ActualValue>
<Number value="-100" />
</ActualValue>
X
X
<StepResult name="PowerOffError">
X
X
X
</StepResult>
</MethodResult>
</TestResults>
Where X means other instances of StepResult with different Name like BluetoothError or PowerOffError. Assume that the other StepResults can have similar outputs as the "FLOWERROR1".
I am particularly interested in StepResult with name "FlowError1" and I would like to return the Number value of -100.
I have tried this line of code and it did not work and only shows Nulls.
f.ResultXML.value('(/TestResults/MethodResult/StepResult/ActualValue/Number/@Value)[1]', 'varchar(max)') As "Actual Value"
What should I have done instead?
You can filter nodes by name and return the first matching node's value:
select @data.value('(TestResults/MethodResult/StepResult[@name="FLOWERROR1"]/ActualValue/Number)[1]/@value', 'int')
See Introduction to Using XPath Queries and XQuery Language Reference.
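The same attribute predicate can be tried outside SQL Server. The sketch below uses Python's xml.etree.ElementTree, which is an assumption standing in for the SQL Server XML methods, and a hypothetical well-formed version of the XML from the question. XPath marks attributes with @, and both element and attribute names are case-sensitive.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed version of the XML from the question.
xml_text = """
<TestResults>
  <MethodResult>
    <StepResult name="BluetoothERROR">
      <ActualValue><Number value="0" /></ActualValue>
    </StepResult>
    <StepResult name="FLOWERROR1">
      <ActualValue><Number value="-100" /></ActualValue>
    </StepResult>
  </MethodResult>
</TestResults>
"""

root = ET.fromstring(xml_text)

# The [@name='...'] predicate keeps only the matching StepResult.
# Names are case-sensitive: 'FlowError1' or an attribute 'Value' would not match.
number = root.find("./MethodResult/StepResult[@name='FLOWERROR1']/ActualValue/Number")
actual_value = int(number.get("value")) if number is not None else None

print(actual_value)
```

Without the predicate, the first StepResult in document order would be picked, which is not necessarily the one being looked for.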

Impala AnalysisException: Subqueries are not supported in the HAVING clause

I have a query where I am selecting destination host names where a user agent string matches and grouping by where there is a distinct srchostname using Impala.
select desthostname
from proxy_table
where useragentstring = "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/538.1 (KHTML, like Gecko) Google Earth Pro/7.3.2.5491 Safari/538.1"
group by desthostname
having count(*) = (select count(distinct srchostname) from proxy_table);
But I am running into the error:
AnalysisException: Subqueries are not supported in the HAVING clause.
Do you know how I can fix this?
Run this:
select desthostname from
(select desthostname,count(*) as cnt
from proxy_table
where useragentstring = "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/538.1 (KHTML, like Gecko) Google Earth Pro/7.3.2.5491 Safari/538.1"
group by desthostname) A where A.cnt in (select count(distinct srchostname) from proxy_table);
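The shape of this rewrite can be checked on a small data set. The sketch below runs the derived-table form against an in-memory SQLite database from Python; SQLite is an assumption standing in for Impala (it would also accept the original HAVING subquery, so this only illustrates the structure), and the sample rows and the short 'UA1' and 'UA2' strings are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE proxy_table (srchostname TEXT, desthostname TEXT, useragentstring TEXT)"
)
# Hypothetical sample rows: two distinct source hosts, 'a' and 'b'.
conn.executemany(
    "INSERT INTO proxy_table VALUES (?, ?, ?)",
    [
        ("a", "x.example", "UA1"),
        ("b", "x.example", "UA1"),
        ("a", "y.example", "UA1"),
        ("b", "y.example", "UA2"),
    ],
)

# Count per destination in a derived table, then filter in the outer WHERE,
# so no subquery is needed in HAVING.
query = """
SELECT desthostname
FROM (
    SELECT desthostname, COUNT(*) AS cnt
    FROM proxy_table
    WHERE useragentstring = ?
    GROUP BY desthostname
) A
WHERE A.cnt IN (SELECT COUNT(DISTINCT srchostname) FROM proxy_table)
"""
hosts = [row[0] for row in conn.execute(query, ("UA1",))]
print(hosts)
```

Only the destination whose row count for the user agent equals the number of distinct source hosts in the whole table is returned.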

SQL query between strings representing version numbers

I have a SQL query that goes like this:
select v FROM v.rsversion between 'minVer' and 'maxVer';
The version is expressed as a string in the format x.y.z.
This correctly returns all existing versions between 0.2.0 and 0.2.9,
but returns nothing if the range is 0.2.0 to 0.2.10.
Is there a way to make this work?
With Postgres you could do this by splitting up the version into three numbers and then compare those numbers. For other DBMS you would need to find a different way of splitting a string like 0.2.1 into three numbers.
with rsversion (v) as (
values
-- only here for sample data
('0.2.0'), ('0.2.1'), ('0.2.2'), ('0.2.10'), ('0.2.12'),
('0.3.0'), ('0.3.1'), ('0.3.2'), ('0.3.4')
), numeric_version (v, major, minor, patch) as (
select v,
split_part(v,'.', 1)::int,
split_part(v,'.', 2)::int,
split_part(v,'.', 3)::int
from rsversion
)
select v
FROM numeric_version
where (major,minor,patch) between (0,2,1) and (0,2,11)
The above prints:
v
------
0.2.1
0.2.2
0.2.10
SQLFiddle example: http://sqlfiddle.com/#!15/403d2/2
This does not work as intended because 0.2.0 and 0.2.x are not numbers. They are compared as strings, one character at a time from left to right.
So the strings are ordered as 0.2.0, 0.2.1, 0.2.10, 0.2.2, 0.2.3, etc.
You may be able to make this work by adding a leading 0 to the third part of the string (0.2.00, 0.2.01, 0.2.02, etc.), if that is possible for your purposes.
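The difference between string order and numeric order can be shown in a few lines. The Python sketch below is only an illustration of the comparison logic, not a replacement for the SQL; the helper name parse_version and the sample list are hypothetical.

```python
def parse_version(text):
    """Turn 'x.y.z' into a tuple of ints, e.g. '0.2.10' -> (0, 2, 10)."""
    return tuple(int(part) for part in text.split("."))


versions = ["0.2.0", "0.2.1", "0.2.2", "0.2.10", "0.2.12", "0.3.0"]

# As strings, '0.2.10' sorts before '0.2.2' because '1' < '2'.
string_order = sorted(versions)

# As integer tuples, the comparison is numeric per component.
low, high = parse_version("0.2.1"), parse_version("0.2.11")
in_range = [v for v in versions if low <= parse_version(v) <= high]

print(string_order)
print(in_range)
```

Comparing tuples of integers is the same idea as the row comparison (major, minor, patch) between (0,2,1) and (0,2,11) in the Postgres query.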
Does this work? I'm assuming your version will always be in a similar format and that your minVer and maxVer are stored as text.
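WITH
data AS
(
SELECT v.rsversion, -- plus whatever other data you require
-- major.minor with the dot removed, e.g. '0.2' becomes 2
REPLACE(SUBSTRING(v.rsversion FROM '^[0-9]+\.[0-9]+'), '.', '')::decimal AS version_part_one,
-- trailing patch number, e.g. '10'
SUBSTRING(v.rsversion FROM '[0-9]+$')::decimal AS version_part_two
FROM v
)
SELECT d.rsversion -- plus whatever other data you require
FROM data d
-- the two parts are compared independently, so this assumes minVer and maxVer share the same major.minor
WHERE d.version_part_one >= REPLACE(SUBSTRING(minVer FROM '^[0-9]+\.[0-9]+'), '.', '')::decimal
AND
d.version_part_two >= SUBSTRING(minVer FROM '[0-9]+$')::decimal
AND
d.version_part_one <= REPLACE(SUBSTRING(maxVer FROM '^[0-9]+\.[0-9]+'), '.', '')::decimal
AND
d.version_part_two <= SUBSTRING(maxVer FROM '[0-9]+$')::decimal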
You can change the name of the CTE if you're feeling more creative...

SQL 2005: How to use GROUP BY with a sub query

The following very simple query
select distinct guid, browser_agent
from tblMyGlossary
where browser_agent is not null
provides the following results:
guid browser_agent
367DE2B8-88A5-4DA9-ACBB-C0864493DC1F Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
5DCB918E-DA56-4545-A4E3-D09B1B803422 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
998B8F37-2C9A-49EB-AA0B-CF88C4CC7BDF Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5
A0DD3BCB-E8A9-4434-A869-C343FB21F993 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
I want to be able to count the number of unique browser_agent strings, so I am performing the following query:
select browser_agent, count(browser_agent) as 'count'
from
(
select distinct guid, browser_agent
from tblMyGlossary
where browser_agent is not null
)
group by browser_agent
order by 'count' desc;
Problem is SQL 2005 is complaining:
Msg 156, Level 15, State 1, Line 8
Incorrect syntax near the keyword 'group'.
Can anyone shed any light on how to resolve this please? I've run out of ideas.
Many thanks,
Mark
You need to alias your derived table.
select browser_agent, count(browser_agent) as 'count'
from
(
select distinct guid, browser_agent
from tblMyGlossary
where browser_agent is not null
) a
group by browser_agent
order by 'count' desc;