How to transform a JSON string into a new table in BigQuery - sql

I have a lot of raw data in one of my tables in BigQuery, and from this raw data I need to create a new table.
The raw data table has a column named raw_output; this column contains a JSON object that was stringified. It looks like this:
| raw_output |
| ----------------------------------------------------------------------|
| {"client":"A9310","c_integration":"889625","idntf":false,"nf_p":8.32} |
| {"client":"VB050","c_integration":"236590","idntf":true,"nf_p":4.36} |
| {"client":"XT5543","c_integration":"326957","idntf":true,"nf_p":2.33} |
From this table I would like to get something like:
| client | c_integration | idntf | nf_p |
| ------ | ------------- | ----- | ---- |
| A9310  | 889625        | false | 8.32 |
| VB050  | 236590        | true  | 4.36 |
| XT5543 | 326957        | true  | 2.33 |
So I can perform JOINs and do other operations with the data. I have looked into Google's BQ docs (JSON functions), but I was not able to get the expected output. Any idea/solution is much appreciated.
Thank you all in advance.

This should help
with raw_data as (
  select '{"client":"A9310","c_integration":"889625","idntf":false,"nf_p":8.32}' as raw_input union all
  select '{"client":"VB050","c_integration":"236590","idntf":true,"nf_p":4.36}' as raw_input union all
  select '{"client":"XT5543","c_integration":"326957","idntf":true,"nf_p":2.33}' as raw_input
)
select
  json_extract_scalar(raw_input, '$.client') as client,
  json_extract_scalar(raw_input, '$.c_integration') as c_integration,
  json_extract_scalar(raw_input, '$.idntf') as idntf,
  json_extract_scalar(raw_input, '$.nf_p') as nf_p
from raw_data
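Note that JSON_EXTRACT_SCALAR always returns a STRING. If the new table should keep the original JSON types, the extracted values can be cast; this is a sketch against the same sample data, and the BOOL/FLOAT64 target types are an assumption about what you want:

```sql
select
  json_extract_scalar(raw_input, '$.client') as client,
  json_extract_scalar(raw_input, '$.c_integration') as c_integration,
  -- cast the string 'true'/'false' back to a BOOL
  cast(json_extract_scalar(raw_input, '$.idntf') as bool) as idntf,
  -- cast the numeric string back to a FLOAT64
  cast(json_extract_scalar(raw_input, '$.nf_p') as float64) as nf_p
from raw_data
```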

Related

Extracting value of a json in Spark SQl

I am looking to aggregate by extracting the value of a JSON key from one of the columns here. Can someone help me with the right syntax in Spark SQL?
select count(distinct(Name)) as users, xHeaderFields['xyz'] as app group by app order by users desc
The table column is something like this (I have removed other columns for simplification). The table has columns like Name, etc.
Assuming that your dataset is called ds and there is only one key=xyz object per column:
First, to JSON conversion (if needed):
ds = ds.withColumn("xHeaderFields", expr("from_json(xHeaderFields, 'array<struct<key:string,value:string>>')"))
Then filter the key = xyz and take the first element (assuming there is only one xyz key):
.withColumn("xHeaderFields", expr("filter(xHeaderFields, x -> x.key == 'xyz')[0]"))
Finally, extract value from your object:
.withColumn("xHeaderFields", expr("xHeaderFields.value"))
Final result:
+-------------+
|xHeaderFields|
+-------------+
|null         |
|null         |
|Settheclass  |
+-------------+
Good luck!
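Putting the three steps together with the aggregation from the question, this can also be written as a single Spark SQL query. A sketch, assuming a view named `events` with columns `Name` and `xHeaderFields` (the raw JSON array string); grouping by the `app` alias relies on Spark's alias resolution in GROUP BY, which is enabled by default:

```sql
SELECT
  -- parse the JSON, keep only the key = 'xyz' entry, take its value
  filter(
    from_json(xHeaderFields, 'array<struct<key:string,value:string>>'),
    x -> x.key == 'xyz'
  )[0].value AS app,
  count(DISTINCT Name) AS users
FROM events
GROUP BY app
ORDER BY users DESC
```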

How to fix: cannot retrieve all fields MongoDB collection with Apache Drill SQL expression query

I'm trying to retrieve all (*) columns from a MongoDB object with an Apache Drill SQL expression query:
`_id`.`$oid`
Background: I'm using Apache Drill to query MongoDB collections. By default, Drill retrieves the ObjectId values in a different format than the stored in the database. For example:
Mongo: ObjectId("59f2c3eba83a576fe07c735c")
Drill query result: [B#3149a…]
In order to get the data in String format (59f2c3eba83a576fe07c735c) from the object, I changed the Drill config "store.mongo.bson.record.reader" to "false".
ALTER SESSION SET store.mongo.bson.record.reader = false
Drill query result after config set to false:
select * from calc;
+--------------------------------------+---------+
| _id | name |
+--------------------------------------+---------+
| {"$oid":"5cb0e161f0849231dfe16d99"} | thiago |
+--------------------------------------+---------+
Running a query by _id:
select `o`.`_id`.`$oid` , `o`.`name` from mongo.od_teste.calc o where `o`.`_id`.`$oid`='5cb0e161f0849231dfe16d99';
Result:
+---------------------------+---------+
| EXPR$0 | name |
+---------------------------+---------+
| 5cb0e161f0849231dfe16d99 | thiago |
+---------------------------+---------+
For an object with a few columns like the one above (_id, name), it's fine to specify all the columns in the select query by id. However, in my production database, the objects have hundreds of columns.
If I try to query all (*) columns from the collection, this is the result:
select `o`.* from mongo.od_teste.calc o where `o`.`_id`.`$oid`='5cb0e161f0849231dfe16d99';
or
select * from mongo.od_teste.calc o where `o`.`_id`.`$oid`='5cb0e161f0849231dfe16d99';
+-----+
| ** |
+-----+
+-----+
No rows selected (6.112 seconds)
Expected result: Retrieve all columns from a MongoDB collection instead of declaring all of them on the SQL query.
I have no suggestions here, because it is a bug in the Mongo storage plugin.
I have created a Jira ticket for it; please take a look and feel free to add any related info there: DRILL-7176

Access array elements ANSI SQL

I'm using Drill to query MongoDB with ANSI SQL. I have a field that contains an array of values, and I want to be able to access those elements to join them with other documents:
select name from table where table.id = array.element;
but other than FLATTEN, which divides them into multiple lines, I can't access the array's elements.
Any help, please?
I added some sample data in mongodb
db.col.insert({"id":1,name:"dev","arr":[1,2,3,4]});
Working query from Drill:
select name from col where id=arr[0];
Output:
+-------+
| name |
+-------+
| dev |
+-------+
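Indexing (arr[0]) only reaches one element. If the goal is to join every array element against another table, FLATTEN is still the idiomatic Drill approach, despite producing one row per element; a sketch, where `other` is a hypothetical table with `id` and `name` columns:

```sql
-- one row per (document, array element), then join each element
SELECT t.id AS doc_id, o.name
FROM (SELECT c.id, FLATTEN(c.arr) AS elem FROM col c) t
JOIN other o ON o.id = t.elem;
```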

Google BigQuery - Parsing string data from a BigQuery table column

I have a table A within a dataset in BigQuery. This table has multiple columns, and one of them, called hits_eventInfo_eventLabel, has values like the one below:
{ID:AEEMEO,Score:8.990000;ID:SEAMCV,Score:8.990000;ID:HBLION;Property
ID:DNSEAWH,Score:0.391670;ID:CP1853;ID:HI2367;ID:H25600;}
If you write this string out in tabular form, it contains the following data:

| ID      | Score    |
| ------- | -------- |
| AEEMEO  | 8.990000 |
| SEAMCV  | 8.990000 |
| HBLION  | -        |
| DNSEAWH | 0.391670 |
| CP1853  | -        |
| HI2367  | -        |
| H25600  | -        |
Some IDs have scores, some don't. I have multiple records with similar strings populated under the column hits_eventInfo_eventLabel within the table.
My question is how can I parse this string successfully WITHIN BIGQUERY so that I can get a list of property ids and their respective recommendation scores (if existing)? I would like to have the order in which the IDs appear in the string to be preserved after parsing this data.
Would really appreciate any info on this. Thanks in advance!
I would use a combination of SPLIT to separate into different rows and REGEXP_EXTRACT to separate into different columns, i.e.
select
  regexp_extract(x, r'ID:([^,]*)') as id,
  regexp_extract(x, r'Score:([\d\.]*)') as score
from (
  select split(x, ';') x from (
    select 'ID:AEEMEO,Score:8.990000;ID:SEAMCV,Score:8.990000;ID:HBLION;Property ID:DNSEAWH,Score:0.391670;ID:CP1853;ID:HI2367;ID:H25600;' as x))
It produces the following result:
| Row | id      | score    |
| --- | ------- | -------- |
| 1   | AEEMEO  | 8.990000 |
| 2   | SEAMCV  | 8.990000 |
| 3   | HBLION  | null     |
| 4   | DNSEAWH | 0.391670 |
| 5   | CP1853  | null     |
| 6   | HI2367  | null     |
| 7   | H25600  | null     |
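That query uses BigQuery legacy SQL, where SPLIT repeats the row for each piece. A sketch of the same idea in standard SQL, where SPLIT returns an array that has to be unnested; WITH OFFSET preserves the order in which the IDs appear in the string:

```sql
select
  regexp_extract(entry, r'ID:([^,]*)') as id,
  regexp_extract(entry, r'Score:([\d\.]*)') as score
from unnest(split(
  'ID:AEEMEO,Score:8.990000;ID:SEAMCV,Score:8.990000;ID:HBLION;Property ID:DNSEAWH,Score:0.391670;ID:CP1853;ID:HI2367;ID:H25600;',
  ';')) as entry with offset pos
where entry != ''    -- drop the empty piece after the trailing ';'
order by pos
```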
You can write your own JavaScript functions in BigQuery to get exactly what you want now: http://googledevelopers.blogspot.com/2015/08/breaking-sql-barrier-google-bigquery.html

How would I compare two fields in an SQL SELECT statement to then select other fields based on this result?

I have a table which contains a list of products for a company. They input data about how much stock they have and also the level at which they want to be reminded that they need to order new stock.
For example:
+-------+-------+----------------+
|column1|column2|column1<=column2|
+-------+-------+----------------+
|value1 |value1 | true |
|value2 |value3 | false |
|value4 |value4 | true |
+-------+-------+----------------+
I want to list all the true results in a form which the user is then able to navigate through. What would be the best way to go about doing this?
How about
SELECT * FROM mytable WHERE column1<=column2 ?
Using SQL I would suggest a statement like this:
SELECT * FROM table WHERE column1 <= column2
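Applied to the stock scenario from the question, with hypothetical column names for the stock level and the reorder threshold:

```sql
SELECT product_name, stock_level, reorder_level
FROM products
WHERE stock_level <= reorder_level;
```

Every row returned is one where the "remind me to reorder" condition is true, and the application can then page through that result set.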