Unnesting 3rd level dependency in Google BigQuery - sql

I'm trying to replace the schema of an existing table using BQ. Certain fields in the table are nested 3-5 levels deep.
For example, comsalesorders.comSalesOrdersInfo.storetransactionid is nested under two parent fields.
Since I'm using this to replace the existing table, I cannot change the field names in the query.
The query looks similar to this:
SELECT * REPLACE(comsalesorders.comSalesOrdersInfo.storetransactionid AS STRING) FROM CentralizedOrders_streaming.orderStatusUpdated, UNNEST(comsalesorders) AS comsalesorders, UNNEST(comsalesorders.comSalesOrdersInfo) AS comsalesorders.comSalesOrdersInfo
BQ lets me unnest the first-level field but runs into a problem with the second level of nesting.
What changes do I need to make to this query to use UNNEST() for such dependent schemas?

Since you haven't provided a schema, I will try to give a generalized answer. Please try to understand the difference between the two queries.
-- Provide an alias for each unnest (as if each is a separate table)
select c.stuff
from table
left join unnest(table.first_level_nested) a
left join unnest(a.second_level_nested) b
left join unnest(b.third_level_nested) c
-- b and c won't work here because you are 'double unnesting': each unnest goes back through the original array path instead of the alias of the previous unnest
select c.stuff
from table
left join unnest(table.first_level_nested) a
left join unnest(first_level_nested.second_level_nested) b
left join unnest(first_level_nested.second_level_nested.third_level_nested) c
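As a rough mental model only (not BigQuery code), chained UNNEST aliases behave like nested loops in which each level iterates over one element produced by the previous level. The sketch below uses a hypothetical two-level row and comma-join (inner) semantics rather than LEFT JOIN; the field names are taken from the generic query above and the data is made up.

```python
# Hypothetical row: an array of structs, each holding another array of structs.
row = {
    "first_level_nested": [
        {"second_level_nested": [{"stuff": "x"}, {"stuff": "y"}]},
        {"second_level_nested": [{"stuff": "z"}]},
    ]
}


def flatten(row):
    """Mimic: FROM t, UNNEST(t.first_level_nested) a, UNNEST(a.second_level_nested) b."""
    return [
        b["stuff"]
        for a in row["first_level_nested"]   # UNNEST(t.first_level_nested) a
        for b in a["second_level_nested"]    # UNNEST(a.second_level_nested) b
    ]


stuff = flatten(row)
print(stuff)

# The non-working form reaches through the array itself instead of through
# one of its elements, which has no meaning for a list either.
try:
    row["first_level_nested"]["second_level_nested"]
    path_through_array_failed = False
except TypeError:
    path_through_array_failed = True
print(path_through_array_failed)
```

The second part mirrors why `first_level_nested.second_level_nested` is rejected: the field belongs to each element of the array, so it is only reachable through the alias that stands for one element.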

I'm not sure I understand your question, but my guess is that you want to change one column's type to another type, such as STRING.
The UNNEST function is only used with columns that are array types, for example:
"comsalesorders":["comSalesOrdersInfo":{}, comSalesOrdersInfo:{}, comSalesOrdersInfo:{}]
But not with this kind of columns:
"comSalesOrdersInfo":{"storeTransactionID":"X1056-943462","ItemsWarrenty":0,"currencyCountry":"USD"}
Therefore, if I didn't misunderstand your question, I would write a query like this:
SELECT *, CAST(A.comSalesOrdersInfo.storeTransactionID as STRING)
FROM `TABLE`, UNNEST(comsalesorders) as A

Related

How to use Except clause in Bigquery?

I am trying to use the existing Except clause in Bigquery. Please find my query below
select * EXCEPT (b.hosp_id, b.person_id,c.hosp_id) from
person a
inner join hospital b
on a.hosp_id= b.hosp_id
inner join reading c
on a.hosp_id= c.hosp_id
As you can see, I am using 3 tables. All 3 tables have the hosp_id column, so I would like to remove the duplicate columns, which are b.hosp_id and c.hosp_id. Similarly, I would like to remove the b.person_id column as well.
When I execute the above query, I get the syntax error as shown below
Syntax error: Expected ")" or "," but got "." at [9:19]
Please note that all the columns I am using in the EXCEPT clause are present in the tables used. Additionally, all the tables used are temp tables created with a WITH clause. When I do the same manually by selecting the columns of interest, it works fine. But I have many columns and can't do this manually.
Can you help? I am trying to learn Bigquery. Your inputs would help
I use the EXCEPT on a per-table basis:
select p.* EXCEPT (hosp_id, person_id),
h.*,
r.* EXCEPT (hosp_id)
from person p inner join
hospital h
on p.hosp_id = h.hosp_id inner join
reading r
on p.hosp_id = r.hosp_id;
Note that this also uses meaningful abbreviations for table aliases, which makes the query much simpler to understand.
In your case, I don't think you need EXCEPT at all if you use the USING clause.
Try this instead:
select * EXCEPT (person_id) from
person a
inner join hospital b
using (hosp_id)
inner join reading c
using (hosp_id)
You can only put column names (not paths) in the EXCEPT list, and you can simply avoid projecting the duplicate columns with USING instead of ON.
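The effect of USING on `SELECT *` can be checked locally. The sketch below is an assumption-laden stand-in: it uses SQLite through Python's sqlite3 module rather than BigQuery (SQLite has no `* EXCEPT`), only two of the three tables, and hypothetical columns `name` and `hosp_name`. It compares the columns produced by an ON join with those produced by a USING join.

```python
import sqlite3


def joined_columns(join_clause):
    """Return the column names that SELECT * produces for the given join clause."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person   (hosp_id INTEGER, person_id INTEGER, name TEXT);
        CREATE TABLE hospital (hosp_id INTEGER, hosp_name TEXT);
        INSERT INTO person   VALUES (1, 10, 'Ann');
        INSERT INTO hospital VALUES (1, 'General');
    """)
    cur = conn.execute(
        f"SELECT * FROM person a INNER JOIN hospital b {join_clause}"
    )
    names = [d[0] for d in cur.description]
    conn.close()
    return names


on_cols = joined_columns("ON a.hosp_id = b.hosp_id")   # join key appears twice
using_cols = joined_columns("USING (hosp_id)")         # join key appears once
print(on_cols)
print(using_cols)
```

With ON, the join key is projected once per table; with USING, it is projected once in total, so there is nothing left to exclude for that column.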

Joining tables with incompatible types

I'm trying to join two tables using this command:
SELECT * FROM bigquery-public-data.github_repos.files INNER JOIN bigquery-public-data.github_repos.commits USING (repo_name)
but there are incompatible types on either side of the join: STRING and ARRAY<STRING>. Is there a way around this?
Thank you !
You want to join a 2 billion row table with a 200 million row one. This won't end well unless you restrict what you want to get out of it.
As for the technical problems of this query: The error says you are trying to JOIN a single value with an array of values. You need to UNNEST() that array.
This would work syntactically:
SELECT *
FROM `bigquery-public-data.github_repos.files` a
INNER JOIN (
SELECT * EXCEPT(repo_name)
FROM `bigquery-public-data.github_repos.commits`
, UNNEST(repo_name) repo
) b
ON a.repo_name=b.repo
But if you go for it, it will use all your free monthly quota (1TB of data scanned) for no good purpose, as far as I can tell.

Get overlapped data from two tables with same structure, giving preference to other : Oracle

I am completely lost thinking about how to solve this data retrieval challenge.
I have these two tables: MY_DATA and MY_DATA_CHANGE in my Oracle database.
I want to select data something like this:
SELECT ALL COLUMNS
FROM MY_DATA
WHERE ID IN (1,2,4,5) FROM MY_DATA
BUT IF ANY ID IS PRESENT IN (1,2,4,5) IN MY_DATA_CHANGE
THEN USE ROW FROM MY_DATA_CHANGE
So my overall result must look like:
I can only use SQL not stored procedure, as this query is going to be part of another very big query (legacy code written long back) (will be used in Crystal reports tool to create report).
So guys please help. My column data contains CLOB and the usual UNION logic does not work on them.
How do I do it ?
SELECT
m.Id
,COALESCE(c.CLOB1,m.CLOB1) as CLOB1
,COALESCE(c.CLOB2,m.CLOB2) as CLOB2
FROM
MY_DATA m
LEFT JOIN MY_DATA_CHANGE c
ON m.Id = c.Id
WHERE
m.ID IN (1,2,4,5)
The way I would choose to do that is via a LEFT JOIN between the two tables and then use COALESCE().
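A minimal sketch of the LEFT JOIN plus COALESCE pattern follows. It runs on SQLite through Python's sqlite3 module rather than Oracle, uses TEXT columns as a stand-in for CLOBs, keeps only one data column, and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MY_DATA        (Id INTEGER PRIMARY KEY, CLOB1 TEXT);
    CREATE TABLE MY_DATA_CHANGE (Id INTEGER PRIMARY KEY, CLOB1 TEXT);
    INSERT INTO MY_DATA VALUES (1, 'orig-1'), (2, 'orig-2'), (3, 'orig-3'), (4, 'orig-4');
    -- Id 2 has a changed value; Id 4 has a change row whose value is NULL.
    INSERT INTO MY_DATA_CHANGE VALUES (2, 'changed-2'), (4, NULL);
""")

rows = conn.execute("""
    SELECT m.Id, COALESCE(c.CLOB1, m.CLOB1) AS CLOB1
    FROM MY_DATA m
    LEFT JOIN MY_DATA_CHANGE c ON m.Id = c.Id
    WHERE m.Id IN (1, 2, 4, 5)
    ORDER BY m.Id
""").fetchall()

for row in rows:
    print(row)
```

Two things are worth noticing in this sketch. Id 5 is not returned because it does not exist in MY_DATA, which drives the join. Id 4 falls back to the original value because COALESCE cannot tell "no change row" apart from "changed to NULL"; if a NULL in MY_DATA_CHANGE should win, a CASE expression on `c.Id IS NOT NULL` would be needed instead.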

Column in field list is ambiguous error

I've recently been working in MySQL, and in one of my queries I wrote:
SELECT SIGLE_EEP, ID_SOUS_MODULE, LIBELLE
FROM mef_edi.eep a, mef_edi.envoi e, mef_edi.sous_module s
WHERE a.ID_EEP = e.ID_EEP
AND a.ID_SOUS_MODULE = s.ID_SOUS_MODULE;
and I got this error:
Column ID_SOUS_MODULE in field list is ambiguous
What should I do?
More than one table has a column named ID_SOUS_MODULE.
So you need to name the table every time you mention the column to specify which table you mean.
Change
SELECT ID_SOUS_MODULE
for instance to
SELECT a.ID_SOUS_MODULE
I agree with the answer above: you have duplicate column names across your 3 tables, and prefixing the column with the table alias (a, e, s) as noted above avoids that issue in the SELECT. In addition to what @juergen said, you may want to replace the comma-separated join with an explicit INNER or LEFT JOIN (INNER seems to be what you're going for). With comma joins, the tables are logically combined as a cartesian product and then filtered by the WHERE clause; explicit joins keep each join condition next to its table, which is easier to read and harder to get wrong as the query grows. Here is an example using explicit joins:
SELECT SIGLE_EEP, a.ID_SOUS_MODULE, LIBELLE
FROM mef_edi.eep a
INNER JOIN mef_edi.envoi e ON (a.ID_EEP = e.ID_EEP)
INNER JOIN mef_edi.sous_module s ON (a.ID_SOUS_MODULE = s.ID_SOUS_MODULE)
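The ambiguity error and its fix can be reproduced in a small sketch. This one uses SQLite through Python's sqlite3 module instead of MySQL, so the error text differs, and it keeps only the two tables that share ID_SOUS_MODULE; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE eep         (ID_EEP INTEGER, ID_SOUS_MODULE INTEGER, SIGLE_EEP TEXT);
    CREATE TABLE sous_module (ID_SOUS_MODULE INTEGER, LIBELLE TEXT);
    INSERT INTO eep         VALUES (1, 7, 'ABC');
    INSERT INTO sous_module VALUES (7, 'Module 7');
""")

join = "FROM eep a INNER JOIN sous_module s ON a.ID_SOUS_MODULE = s.ID_SOUS_MODULE"

# Unqualified column: both tables have ID_SOUS_MODULE, so the engine refuses.
try:
    conn.execute(f"SELECT SIGLE_EEP, ID_SOUS_MODULE, LIBELLE {join}")
    ambiguous = False
except sqlite3.OperationalError as exc:
    ambiguous = True
    print(exc)

# Qualified column: the alias says which table's column is meant.
rows = conn.execute(f"SELECT SIGLE_EEP, a.ID_SOUS_MODULE, LIBELLE {join}").fetchall()
print(rows)
```

Columns that exist in only one table (SIGLE_EEP, LIBELLE) can stay unqualified, although qualifying every column makes the query more robust if the schemas later change.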

How to get names present in both views?

I have a very large view of 5 million records with repeated names, where each row has a unique transaction number. There is also another view of 9000 records containing unique names. Now I want to retrieve the records in the first view whose names are present in the second view:
select * from v1 where name in (select name from v2)
But the query is taking very long to run. Is there a faster way?
Did you try just using an INNER JOIN? This will return all rows from v1 whose name also exists in v2:
select v1.*
from v1
INNER JOIN v2
on v1.name = v2.name
If you need help learning JOIN syntax, here is a great visual explanation.
You can add the DISTINCT keyword which will remove any duplicate values that the query returns.
Use JOIN.
DISTINCT returns only unique records, which matters because a record may match more than one row in the other table.
SELECT DISTINCT a.*
FROM v1 a
INNER JOIN v2 b
ON a.name = b.name
For faster performance, add an index on column NAME on both tables since you are joining through it.
To learn more about joins, see the link below:
Visual Representation of SQL Joins
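To see when DISTINCT actually matters, here is a small sketch on SQLite through Python's sqlite3 module. It uses plain tables named v1 and v2 in place of the views, a hypothetical `txn` column for the transaction number, and deliberately puts a duplicate name into v2 (unlike the question, where v2 holds unique names).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE v1 (txn INTEGER, name TEXT);
    CREATE TABLE v2 (name TEXT);
    INSERT INTO v1 VALUES (1, 'ann'), (2, 'ann'), (3, 'bob'), (4, 'cy');
    -- 'ann' appears twice here to show the row multiplication.
    INSERT INTO v2 VALUES ('ann'), ('ann'), ('cy');
""")

in_rows = conn.execute(
    "SELECT * FROM v1 WHERE name IN (SELECT name FROM v2) ORDER BY txn"
).fetchall()
join_rows = conn.execute(
    "SELECT v1.* FROM v1 INNER JOIN v2 ON v1.name = v2.name ORDER BY txn"
).fetchall()
distinct_rows = conn.execute(
    "SELECT DISTINCT v1.* FROM v1 INNER JOIN v2 ON v1.name = v2.name ORDER BY txn"
).fetchall()

print(len(in_rows), len(join_rows), len(distinct_rows))
```

The plain join returns each 'ann' row once per matching row in v2, while the IN query and the DISTINCT join return the same set. When the names in v2 are guaranteed unique, as in the question, the plain INNER JOIN already matches the IN query and DISTINCT only adds work.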