BigQuery join a nested table onto another table


I am trying to join a table of project data (table1), which has a nested array of project IDs, onto another table of project data (table2). The order of the tables is important here.
table1
proj_date   num_proj_per_day   proj_size   proj_id (nested)
1/1/2020    4                  150         a123
                                           b456
                                           c789
table2
proj_id (not nested)   proj_loc      lots_of_other_proj_fields...
a123                   Los Angeles
b456                   New York
c789                   Los Angeles
d012                   Denver
...                    ...
Desired outcome
proj_date   num_proj_per_day   proj_size   proj_id (unnested)   proj_loc
1/1/2020    4                  150         a123                 Los Angeles
1/1/2020    4                  150         b456                 New York
1/1/2020    4                  150         c789                 Los Angeles
I have been able to achieve this outcome by writing the SQL with table1 in the FROM clause, then CROSS JOIN UNNEST(proj_id), then LEFT JOIN table2. The problem is that I need table2 in the FROM clause and then join table1 on the unnested proj_id. Unfortunately, the order matters because I have to merge this new dataset (table1) into an existing dataset/framework (table2) within Looker.
Example of what works to get the correct outcome, but does not work for my application:
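SELECT
  table1.proj_date,
  table1.num_proj_per_day,
  table1.proj_size,
  unnested AS proj_id,   -- one output row per element of the proj_id array
  table2.proj_loc
FROM table1
CROSS JOIN UNNEST(table1.proj_id) AS unnested
LEFT JOIN table2
  ON table2.proj_id = unnested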
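For readers less familiar with BigQuery arrays, the semantics of this working query (flatten the array, then look each element up in table2) can be mimicked in plain Python. This is only an illustrative sketch, not BigQuery code: a Python list stands in for the ARRAY column, and a dict keyed by proj_id stands in for table2, reduced to proj_loc.

```python
from typing import Any, Optional

# Stand-in for table1: proj_id holds a list, like a BigQuery ARRAY<STRING>.
table1: list[dict[str, Any]] = [
    {"proj_date": "2020-01-01", "num_proj_per_day": 4, "proj_size": 150,
     "proj_id": ["a123", "b456", "c789"]},
]
# Stand-in for table2, reduced to proj_id -> proj_loc.
proj_loc_by_id: dict[str, str] = {
    "a123": "Los Angeles", "b456": "New York",
    "c789": "Los Angeles", "d012": "Denver",
}


def unnest_then_left_join(rows: list[dict[str, Any]],
                          loc_by_id: dict[str, str]) -> list[dict[str, Any]]:
    out = []
    for row in rows:
        # CROSS JOIN UNNEST: emit one output row per array element.
        for pid in row["proj_id"]:
            # LEFT JOIN: proj_loc is None when table2 has no matching proj_id.
            loc: Optional[str] = loc_by_id.get(pid)
            out.append({
                "proj_date": row["proj_date"],
                "num_proj_per_day": row["num_proj_per_day"],
                "proj_size": row["proj_size"],
                "proj_id": pid,
                "proj_loc": loc,
            })
    return out


for r in unnest_then_left_join(table1, proj_loc_by_id):
    print(r["proj_id"], r["proj_loc"])
```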
I am looking for something like the query below, but UNNEST cannot be used like that in the ON clause; BigQuery raises the error "Unexpected keyword UNNEST":
SELECT
table1.*,
table2.proj_loc
FROM table2
LEFT JOIN table1
ON UNNEST(table1.proj_id)=table2.proj_id
Thank you in advance, and let me know if you need any more clarifying information.

The following is for BigQuery Standard SQL:
#standardSQL
SELECT proj_date, num_proj_per_day, proj_size, t2.*
FROM `project.dataset.table2` t2
JOIN `project.dataset.table1` t1
ON t2.proj_id IN UNNEST(t1.proj_id)
You can test and play with the query above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table1` AS (
SELECT DATE '2020-01-01' proj_date, 4 num_proj_per_day, 150 proj_size, ['a123','b456','c789'] proj_id
),`project.dataset.table2` AS (
SELECT 'a123' proj_id, 'Los Angeles' proj_loc, 1 proj_field1, 2 proj_field2, 3 proj_field3 UNION ALL
SELECT 'b456', 'New York', 21, 22, 23 UNION ALL
SELECT 'c789', 'Los Angeles', 31, 32, 33 UNION ALL
SELECT 'd012', 'Denver', 41, 42, 43
)
SELECT proj_date, num_proj_per_day, proj_size, t2.*
FROM `project.dataset.table2` t2
JOIN `project.dataset.table1` t1
ON t2.proj_id IN UNNEST(t1.proj_id)
with output:
Row  proj_date   num_proj_per_day  proj_size  proj_id  proj_loc     proj_field1  proj_field2  proj_field3
1    2020-01-01  4                 150        a123     Los Angeles  1            2            3
2    2020-01-01  4                 150        b456     New York     21           22           23
3    2020-01-01  4                 150        c789     Los Angeles  31           32           33
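To make the table2-driven join concrete, here is a minimal Python sketch (not BigQuery) of what `JOIN ... ON t2.proj_id IN UNNEST(t1.proj_id)` does: every table2 row is kept once for each table1 row whose array contains its proj_id. The dict-based rows and the function name `join_in_unnest` are illustrative assumptions. Because this is an inner join, d012 (which appears in no array) is dropped; also note that SQL itself does not guarantee row order without an ORDER BY, whereas this loop simply follows table2's list order.

```python
from typing import Any

# Sample rows from the question; proj_id in table1 is a list standing in for ARRAY<STRING>.
table1: list[dict[str, Any]] = [
    {"proj_date": "2020-01-01", "num_proj_per_day": 4, "proj_size": 150,
     "proj_id": ["a123", "b456", "c789"]},
]
table2: list[dict[str, Any]] = [
    {"proj_id": "a123", "proj_loc": "Los Angeles"},
    {"proj_id": "b456", "proj_loc": "New York"},
    {"proj_id": "c789", "proj_loc": "Los Angeles"},
    {"proj_id": "d012", "proj_loc": "Denver"},
]


def join_in_unnest(t2_rows: list[dict[str, Any]],
                   t1_rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Mimic: SELECT proj_date, num_proj_per_day, proj_size, t2.*
    FROM table2 t2 JOIN table1 t1 ON t2.proj_id IN UNNEST(t1.proj_id)."""
    result = []
    for t2 in t2_rows:                          # table2 drives the query, as in the FROM clause
        for t1 in t1_rows:
            if t2["proj_id"] in t1["proj_id"]:  # membership test on the array
                result.append({
                    "proj_date": t1["proj_date"],
                    "num_proj_per_day": t1["num_proj_per_day"],
                    "proj_size": t1["proj_size"],
                    **t2,                       # t2.*
                })
    return result


for row in join_in_unnest(table2, table1):
    print(row["proj_id"], row["proj_loc"])
```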

Related

How to join two total tables using sql?

For a university assignment we have two tables in SQL:
table1:
column_name1  number_P1
PARIS         10
LISBOA        20
RIO           30
table2:
column_name2  number_P2
PARIS         100
NEW YORK      300
I need to join the two tables and add up the total number of people in each city. I tried:
SELECT table1.column_name1,
number_P2 + number_P1 AS TOTAL
FROM table1
LEFT JOIN table2 ON table1.column_name1 = table2.column_name2;
However, this does not work if a city appears in table1 but not in table2 (its TOTAL comes out NULL), and the same problem occurs if a city appears in table2 but not in table1 (it is missing from the result entirely). How can I handle both cases?
Desired output:
column_name  number_P
PARIS        110
LISBOA       20
RIO          30
NEW YORK     300

Instead of a JOIN, we can use UNION ALL with SUM:
SELECT column_name,
SUM(number_P) number_P
FROM (
SELECT column_name1 as column_name,number_P1 as number_P
FROM table1
UNION ALL
SELECT column_name2,number_P2
FROM table2
) t1
GROUP BY column_name
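The UNION ALL approach can be checked end to end with Python's built-in sqlite3 module, which is used here only as a convenient stand-in database (an assumption; the question does not name its DBMS). The query is the one above, unchanged; the two tables are loaded with the question's sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (column_name1 TEXT, number_P1 INTEGER);
    CREATE TABLE table2 (column_name2 TEXT, number_P2 INTEGER);
    INSERT INTO table1 VALUES ('PARIS', 10), ('LISBOA', 20), ('RIO', 30);
    INSERT INTO table2 VALUES ('PARIS', 100), ('NEW YORK', 300);
""")

# Stack both tables into one (city, count) list, then total the counts per city.
query = """
    SELECT column_name, SUM(number_P) AS number_P
    FROM (
        SELECT column_name1 AS column_name, number_P1 AS number_P FROM table1
        UNION ALL
        SELECT column_name2, number_P2 FROM table2
    ) t1
    GROUP BY column_name
"""
totals = dict(conn.execute(query).fetchall())
print(totals)
```

Because every city from either table ends up in the stacked list, cities present on only one side need no special NULL handling.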

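Another way to achieve this, without a subquery, is a FULL JOIN (this requires a database that supports FULL OUTER JOIN; MySQL, for example, does not):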
SELECT IFNULL(table1.column_name1,table2.column_name2) AS ColumnName,
(IFNULL(number_P2,0)+ IFNULL(number_P1,0)) AS TOTAL
FROM table1
FULL JOIN table2 ON table1.column_name1 = table2.column_name2;
Output
ColumnName  TOTAL
PARIS       110
LISBOA      20
RIO         30
NEW YORK    300
To also replace 'RIO' with 'RIO DE JANEIRO' in the output:
SELECT CASE IFNULL(table1.column_name1,table2.column_name2)
WHEN 'RIO' THEN 'RIO DE JANEIRO'
ELSE IFNULL(table1.column_name1,table2.column_name2) END AS ColumnName,
(IFNULL(number_P2,0)+ IFNULL(number_P1,0)) AS TOTAL
FROM table1
FULL JOIN table2 ON table1.column_name1 = table2.column_name2;
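The logic of the FULL JOIN answer can be mirrored in plain Python, which helps show why both IFNULL calls are needed. In this sketch (an illustration, not SQL), each table is a dict from city to count; the union of the two key sets plays the role of the FULL JOIN (unmatched cities from either side are kept), `.get(city, 0)` plays the role of `IFNULL(number_Px, 0)`, and the rename dict plays the role of the CASE expression.

```python
# City counts from the question's two tables (city -> number of people).
table1: dict[str, int] = {"PARIS": 10, "LISBOA": 20, "RIO": 30}
table2: dict[str, int] = {"PARIS": 100, "NEW YORK": 300}

# Same relabelling as the CASE expression: 'RIO' -> 'RIO DE JANEIRO'.
RENAMES: dict[str, str] = {"RIO": "RIO DE JANEIRO"}


def full_join_totals(left: dict[str, int], right: dict[str, int]) -> dict[str, int]:
    totals: dict[str, int] = {}
    # Union of keys: every city from either table, like a FULL JOIN.
    for city in left.keys() | right.keys():
        # A missing side counts as 0, like IFNULL(number_Px, 0).
        total = left.get(city, 0) + right.get(city, 0)
        totals[RENAMES.get(city, city)] = total
    return totals


print(full_join_totals(table1, table2))
```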

Join records only on first match

I'm trying to join two tables. I only want the first matching row to be joined; the others should be NULL.
One of the tables contains daily records per User and the second table contains the goal for each user and day.
The joined result table should join only the first occurrence of each User and Day and set the others to NULL. The Goal in the joined table can be interpreted as DailyGoal.
Example:
Table1
Id  Day         User  Value
===========================
01  01/01/2020  Bob   100
02  01/01/2020  Bob   150
03  01/01/2020  Bob   50
04  02/01/2020  Carl  200
05  02/01/2020  Carl  30
Table2
Id  Day         User  Goal
==========================
01  01/01/2020  Bob   300
02  02/01/2020  Carl  170
ResultTable
Day User Value Goal
============================================
01/01/2020 Bob 100 300
01/01/2020 Bob 150 (null)
01/01/2020 Bob 50 (null)
02/01/2020 Carl 200 170
02/01/2020 Carl 30 (null)
I tried TOP 1, DISTINCT and subqueries, but I can't find a way to do it. Is this possible?

One option uses window functions:
select t1.*, t2.goal
from (
select t1.*,
row_number() over(partition by day, user order by id) as rn
from table1 t1
) t1
left join table2 t2 on t2.day = t1.day and t2.user = t1.user and t1.rn = 1
A case expression is even simpler:
select t1.*,
case when row_number() over(partition by t1.day, t1.user order by t1.id) = 1
then t2.goal
end as goal
from table1 t1
left join table2 t2 on t2.day = t1.day and t2.user = t1.user
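Both answers can be run against the question's sample data using Python's built-in sqlite3 module, used here as a stand-in database (an assumption; it needs SQLite 3.25 or newer, the first release with window functions). Note that the CASE variant still needs table2 joined in for `t2.goal` to resolve, and that its columns must be qualified with t1 because both tables have day, user and id columns. The date strings are kept exactly as written in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, day TEXT, user TEXT, value INTEGER);
    CREATE TABLE table2 (id INTEGER, day TEXT, user TEXT, goal INTEGER);
    INSERT INTO table1 VALUES
        (1, '01/01/2020', 'Bob', 100), (2, '01/01/2020', 'Bob', 150),
        (3, '01/01/2020', 'Bob', 50),  (4, '02/01/2020', 'Carl', 200),
        (5, '02/01/2020', 'Carl', 30);
    INSERT INTO table2 VALUES
        (1, '01/01/2020', 'Bob', 300), (2, '02/01/2020', 'Carl', 170);
""")

# Variant 1: number rows per (day, user), join the goal only onto rn = 1.
SUBQUERY = """
    SELECT t1.day, t1.user, t1.value, t2.goal
    FROM (
        SELECT t1.*, ROW_NUMBER() OVER (PARTITION BY day, user ORDER BY id) AS rn
        FROM table1 t1
    ) t1
    LEFT JOIN table2 t2 ON t2.day = t1.day AND t2.user = t1.user AND t1.rn = 1
    ORDER BY t1.id
"""
# Variant 2: join every row, then blank out the goal except on the first row.
CASE_EXPR = """
    SELECT t1.day, t1.user, t1.value,
           CASE WHEN ROW_NUMBER() OVER (PARTITION BY t1.day, t1.user ORDER BY t1.id) = 1
                THEN t2.goal
           END AS goal
    FROM table1 t1
    LEFT JOIN table2 t2 ON t2.day = t1.day AND t2.user = t1.user
    ORDER BY t1.id
"""
subquery_rows = conn.execute(SUBQUERY).fetchall()
case_rows = conn.execute(CASE_EXPR).fetchall()
for row in subquery_rows:
    print(row)
```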

Oracle: merging a row and its next one into the same row

I have an issue here.
I have these 4 rows of data:
Origin Destination Distance Carrier Price
Miami New-York 800 BF 500
Dallas Chicago 300 AL 200
Dallas Chicago 300 KH 200
Miami New-York 800 JH 500
What I want is to merge rows 2 and 3 into one row, like this:
Dallas Chicago 300 AL, KH 200 (All information is the same except the Carrier)
The problem is that, for every row, I have to check whether the previous row contains the same information apart from the carrier.
How can I achieve that? With LEAD and LAG?
Thanks for your help.

Do a self join:
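-- tbl: your flights table (TABLE itself is a reserved word in Oracle)
select t1.Origin, t1.Destination, t1.Distance, t1.Carrier, t2.Carrier, t1.Price
from tbl t1
join tbl t2 on t1.Origin = t2.Origin
and t1.Destination = t2.Destination
and t1.Distance = t2.Distance
and t1.Price = t2.Price
and t1.Carrier < t2.Carrier -- keeps each pair once and skips a row matching itself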
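Row order doesn't matter here. (Of course, that's the DBMS way!)
If you also want to return flights that have no matching row, use LEFT JOIN instead of JOIN. Note that the second row of each matched pair will then also appear on its own, with NULLs for the t2 columns.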
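The self-join condition can be illustrated with a small Python sketch (plain Python, not Oracle SQL; the `Flight` tuple and `self_join_pairs` function are names invented for this example). Every row is compared with every other row; two rows pair up when everything but the carrier matches, and the `t1.carrier < t2.carrier` test is what keeps each pair only once and stops a row from pairing with itself.

```python
from typing import NamedTuple


class Flight(NamedTuple):
    origin: str
    destination: str
    distance: int
    carrier: str
    price: int


flights = [
    Flight("Miami", "New-York", 800, "BF", 500),
    Flight("Dallas", "Chicago", 300, "AL", 200),
    Flight("Dallas", "Chicago", 300, "KH", 200),
    Flight("Miami", "New-York", 800, "JH", 500),
]


def route(f: Flight) -> tuple[str, str, int, int]:
    # Everything except the carrier: the columns compared in the ON clause.
    return (f.origin, f.destination, f.distance, f.price)


def self_join_pairs(rows: list[Flight]) -> list[tuple[Flight, Flight]]:
    pairs = []
    for t1 in rows:
        for t2 in rows:
            # Same route, and carrier ordering so each pair appears exactly once.
            if route(t1) == route(t2) and t1.carrier < t2.carrier:
                pairs.append((t1, t2))
    return pairs


for a, b in self_join_pairs(flights):
    print(a.origin, a.destination, a.carrier, b.carrier)
```

As the answer says, this ignores row order entirely, so the two non-adjacent Miami rows are paired as well.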

Here you go. Note that this also merges the two Miami New-York rows. If you want only two adjacent rows to be merged, you need another column, such as an ID or an insert date. The query can then be modified to aggregate based on that.
with tbl (Origin, Destination, Distance, Carrier, Price)
as
(select 'Miami', 'New-York', 800, 'BF', 500 from dual union all
select 'Dallas', 'Chicago', 300, 'AL', 200 from dual union all
select 'Dallas', 'Chicago', 300, 'KH', 200 from dual union all
select 'Miami', 'New-York', 800, 'JH', 500 from dual)
-- one row per route; the carriers are concatenated in alphabetical order
select Origin, Destination, Distance,
listagg(Carrier, ', ') within group (order by Carrier) as AggCarrier,
Price
from tbl
group by Origin, Destination, Distance, Price
Output
Origin  Destination  Distance  AggCarrier  Price
Miami   New-York     800       BF, JH      500
Dallas  Chicago      300       AL, KH      200
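The GROUP BY plus LISTAGG idea translates directly into Python, which may help readers who want to see it step by step. This is an illustrative sketch, not Oracle code: rows are grouped on every column except the carrier, and the carriers in each group are sorted alphabetically (the sketch's choice of ordering) and joined with ", ".

```python
from collections import defaultdict

rows = [
    ("Miami", "New-York", 800, "BF", 500),
    ("Dallas", "Chicago", 300, "AL", 200),
    ("Dallas", "Chicago", 300, "KH", 200),
    ("Miami", "New-York", 800, "JH", 500),
]


def aggregate_carriers(rows):
    groups = defaultdict(list)
    for origin, destination, distance, carrier, price in rows:
        # GROUP BY Origin, Destination, Distance, Price
        groups[(origin, destination, distance, price)].append(carrier)
    # Like LISTAGG(Carrier, ', ') WITHIN GROUP (ORDER BY Carrier)
    return [
        (origin, destination, distance, ", ".join(sorted(carriers)), price)
        for (origin, destination, distance, price), carriers in groups.items()
    ]


for row in aggregate_carriers(rows):
    print(row)
```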
EDIT: Unless there is a column that identifies the insertion order of the data, you cannot achieve what you want. I tried adding ROWNUM to your data, but it does not number the rows in exactly the order you want; the ordering has to come from a column in the table you are querying. See the example below.
with tbl (Origin, Destination, Distance ,Carrier ,Price)
as
(select 'Miami','New-York',800,'BF', 500 from dual union
select 'Dallas','Chicago',300,'AL', 200 from dual union
select 'Dallas','Chicago',300,'KH', 200 from dual union
select 'Miami','New-York',800,'JH',500 from dual
)
select rownum,tbl.* from tbl
The output does not return the first Miami row before the Dallas rows:
ROWNUM ORIGIN DESTINATION DISTANCE CARRIER PRICE
1 Dallas Chicago 300 AL 200
2 Dallas Chicago 300 KH 200
3 Miami New-York 800 BF 500
4 Miami New-York 800 JH 500
So you need something, such as an ID, an insert time or another identifier, to determine that; otherwise the database has no way of knowing which record was inserted first. Ask for such a column so that you can achieve what you want.
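Once such a column exists, the adjacent-row merge the question asks for can be sketched as follows. This is plain Python rather than Oracle SQL, and the id values in the first position are hypothetical (the question's table has no such column). In SQL the same idea is usually expressed by comparing each row with LAG() over that column; here `itertools.groupby` does the equivalent, because it starts a new group whenever the key changes from the previous row, so the non-adjacent Miami rows stay separate.

```python
from itertools import groupby

# Hypothetical insertion-order id first; everything else is the question's data.
rows = [
    (1, "Miami", "New-York", 800, "BF", 500),
    (2, "Dallas", "Chicago", 300, "AL", 200),
    (3, "Dallas", "Chicago", 300, "KH", 200),
    (4, "Miami", "New-York", 800, "JH", 500),
]


def route_key(row):
    # Every column except the id and the carrier.
    _id, origin, destination, distance, _carrier, price = row
    return (origin, destination, distance, price)


def merge_adjacent(rows):
    merged = []
    ordered = sorted(rows, key=lambda r: r[0])  # order by the id column
    # groupby only groups *consecutive* rows with an equal key.
    for key, run in groupby(ordered, key=route_key):
        origin, destination, distance, price = key
        carriers = ", ".join(r[4] for r in run)
        merged.append((origin, destination, distance, carriers, price))
    return merged


for row in merge_adjacent(rows):
    print(row)
```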

Oracle 10g unpivot returning values in one column and creating month column

We just found out that the new database we have been given access to is Oracle 10g, so we cannot use features like UNPIVOT (introduced in Oracle 11g).
We have a table like this:
SUBMISSION COUNTRY CPM_ID PFM_ID T_AREA CNTRY_CODE V_TYPE RES_CAT JAN_2014 FEB_2014
01-JUN-2014 USA 10 24 TEST1 USA V1 210 5 10
01-AUG-2014 UK 20 30 TEST2 UK V1 213 20 30
The desired output would look like this:
SUBMISSION COUNTRY CPM_ID PFM_ID T_AREA CNTRY_CODE V_TYPE RES_CAT MONTH VALUE
01-JUN-2014 USA 10 24 TEST1 USA V1 210 01-JAN-2014 5
01-JUN-2014 USA 10 24 TEST1 USA V1 210 01-FEB-2014 10
01-AUG-2014 UK 20 30 TEST2 UK V1 213 01-JAN-2014 20
01-AUG-2014 UK 20 30 TEST2 UK V1 213 01-FEB-2014 30
I am working with a query like the one below, but I cannot get the month column to come out right:
select *
from (select t.submission,
t.country,
t.cpm_id,
t.pfm_id,
t.t_area,
t.cntry_code,
t.v_type,
t.res_cat,
(case
when n.n = 1 then JAN-2014
when n.n = 1 then FEB-2014 end) as value
from table1 t cross join
(select FEB_2014 as n from dual union all
select FEB_2014 from dual) n
) s
where value is not null;
Thanks for your help.

I would do:
select t.submission,
t.country,
t.cpm_id,
t.pfm_id,
t.t_area,
t.cntry_code,
t.v_type,
t.res_cat,
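n.d as month,
-- pick the source column that matches this row's month
case when n.d = '01-JAN-2014' then t.jan_2014 else t.feb_2014 end as value
from table1 t
cross join
(
-- one row per month column to unpivot
select '01-JAN-2014' d from dual
union all
select '01-FEB-2014' d from dual
) n;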
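The CROSS JOIN unpivot can be tried out with Python's built-in sqlite3 module as a stand-in for Oracle 10g (an assumption made only for testing). Two adaptations are needed: SQLite has no DUAL table, so the month list is built with bare SELECTs, and only a few of the question's columns are kept for brevity. Each source row is multiplied by the two-row month list, and the CASE expression picks the matching month column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced version of the question's table: only some of its columns are kept.
conn.executescript("""
    CREATE TABLE table1 (submission TEXT, country TEXT, res_cat INTEGER,
                         jan_2014 INTEGER, feb_2014 INTEGER);
    INSERT INTO table1 VALUES ('01-JUN-2014', 'USA', 210, 5, 10),
                              ('01-AUG-2014', 'UK', 213, 20, 30);
""")

# Cross join with a two-row month list; CASE selects the value column for each month.
query = """
    SELECT t.submission, t.country, t.res_cat, n.d AS month,
           CASE WHEN n.d = '01-JAN-2014' THEN t.jan_2014 ELSE t.feb_2014 END AS value
    FROM table1 t
    CROSS JOIN (SELECT '01-JAN-2014' AS d UNION ALL SELECT '01-FEB-2014') n
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Adding a month means adding one more row to the month list and one more WHEN branch.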

How to use SQL to output latest info with multiple columns

I have the "weather" table below, with 3 columns:
City        Temperature  Date
New York    22 C         10/10/2005
Seattle     21 C         10/10/2005
New York    18 C         10/09/2005
Seattle     20 C         10/09/2005
Washington  17 C         10/09/2005
New York    21 C         10/08/2005
Washington  20 C         10/08/2005
I want the latest City and Temperature information, also as 3 columns (see example):
City        Temperature  Date
New York    22 C         10/10/2005
Seattle     21 C         10/10/2005
Washington  17 C         10/09/2005
Can anyone help?

Find the maximum (latest) date for each city in a subquery, then join on the date and city:
select weather.*
from weather
inner join
(select city, max(date) as date from weather group by city) as latest
on weather.date = latest.date
and weather.city = latest.city
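The subquery-join answer can be checked with Python's built-in sqlite3 module as a stand-in database (an assumption; the question does not name its DBMS). Two data assumptions are labelled here: temperatures are stored as plain integers (degrees C), and the dates are rewritten as ISO-8601 text, because MAX and > on text only compare chronologically when the date format sorts that way. The `max(date) as date` alias is what lets the ON clause refer to latest.date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (city TEXT, temperature INTEGER, date TEXT)")
# The question's rows, with ISO-8601 dates so that MAX(date) is the latest date.
conn.executemany(
    "INSERT INTO weather VALUES (?, ?, ?)",
    [
        ("New York", 22, "2005-10-10"), ("Seattle", 21, "2005-10-10"),
        ("New York", 18, "2005-10-09"), ("Seattle", 20, "2005-10-09"),
        ("Washington", 17, "2005-10-09"), ("New York", 21, "2005-10-08"),
        ("Washington", 20, "2005-10-08"),
    ],
)

# Latest date per city, joined back to recover the temperature on that date.
query = """
    SELECT weather.*
    FROM weather
    INNER JOIN (SELECT city, MAX(date) AS date FROM weather GROUP BY city) AS latest
        ON weather.date = latest.date AND weather.city = latest.city
    ORDER BY weather.city
"""
latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```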

There are several methods. Personally, I think the following is the most expressive:
SELECT * FROM weather w1 WHERE NOT EXISTS
(SELECT * FROM weather w2 WHERE w2.city = w1.city AND w2.date > w1.date)

Option 1:
select city, temperature, date
from weather t1
where date = (select max(date)
from weather t2
where t2.city=t1.city)
Option 2:
select t1.city, t1.temperature, t1.date
from weather t1
where not exists (select 1
from weather t2
where t2.date > t1.date and t1.city=t2.city)
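The correlated-subquery option and the NOT EXISTS option should return the same rows, which can be confirmed with Python's built-in sqlite3 module as a stand-in database (an assumption). As in the previous answers' data, temperatures are plain integers and dates are rewritten as ISO-8601 text so that text comparison is chronological. Both forms keep every row that ties on a city's latest date.

```python
import sqlite3

# The question's rows, with ISO-8601 dates (assumption) so that > and MAX are chronological.
ROWS = [
    ("New York", 22, "2005-10-10"), ("Seattle", 21, "2005-10-10"),
    ("New York", 18, "2005-10-09"), ("Seattle", 20, "2005-10-09"),
    ("Washington", 17, "2005-10-09"), ("New York", 21, "2005-10-08"),
    ("Washington", 20, "2005-10-08"),
]

# Option 1: keep the row whose date equals the city's maximum date.
OPTION_1 = """
    SELECT city, temperature, date FROM weather t1
    WHERE date = (SELECT MAX(date) FROM weather t2 WHERE t2.city = t1.city)
    ORDER BY city
"""
# Option 2 / NOT EXISTS: keep the row for which no later row exists for the same city.
OPTION_2 = """
    SELECT t1.city, t1.temperature, t1.date FROM weather t1
    WHERE NOT EXISTS (SELECT 1 FROM weather t2
                      WHERE t2.date > t1.date AND t1.city = t2.city)
    ORDER BY t1.city
"""


def run(query: str) -> list[tuple]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE weather (city TEXT, temperature INTEGER, date TEXT)")
    conn.executemany("INSERT INTO weather VALUES (?, ?, ?)", ROWS)
    return conn.execute(query).fetchall()


print(run(OPTION_1))
print(run(OPTION_2))
```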
