What is the BigQuery SQL inner join equivalent for rows?

I have the following 2 tables:
I would like to create a table where
all rows of table 1 are included
if the timestamp of any row of table 2 falls in between the timestamp and endTime of any row of table 1, then include the row.
The resulting table would look like:
There are columns/fields that are common to both tables, but I haven't included them for brevity. Basically, I am looking for the equivalent of an inner join operation, but instead of adding the rows of table 2 as columns, I want to add them as rows. I have written some sample code while experimenting with an inner join, as below:
WITH table_a AS (
SELECT 'x' AS event, 1 AS timestamp, 5 AS endtime, 'a' AS field1
UNION ALL SELECT 'x', 100, 200, 'b'
),
table_b AS (
SELECT 'y' AS event, 2 AS timestamp, 'm' AS field2
UNION ALL SELECT 'y', 25, 'n'
UNION ALL SELECT 'y', 150, 'o'
)
SELECT
table_a.*,
table_b.*
FROM table_a JOIN table_b
Any thoughts on which BigQuery SQL functions I can use?

Use the query below:
select *, null field2 from table_a union all
select distinct b.event, b.timestamp, null, cast(null as string), field2
from table_b b
join table_a a
on b.timestamp between a.timestamp and a.endtime
If applied to the sample data in your question, the output is:
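To check the result without a BigQuery project, the same UNION ALL plus range join can be tried in SQLite from Python. This is only a sketch under assumptions: SQLite stands in for BigQuery, the typed NULL cast is dropped because SQLite does not need it, and the ORDER BY is added here purely to make the row order predictable.

```python
import sqlite3

# Sample data from the question as CTEs (SQLite syntax, not BigQuery).
QUERY = """
WITH table_a AS (
  SELECT 'x' AS event, 1 AS timestamp, 5 AS endtime, 'a' AS field1
  UNION ALL SELECT 'x', 100, 200, 'b'
),
table_b AS (
  SELECT 'y' AS event, 2 AS timestamp, 'm' AS field2
  UNION ALL SELECT 'y', 25, 'n'
  UNION ALL SELECT 'y', 150, 'o'
)
-- every row of table_a, padded with a NULL field2 column
SELECT *, NULL AS field2 FROM table_a
UNION ALL
-- rows of table_b whose timestamp falls inside some table_a interval
SELECT DISTINCT b.event, b.timestamp, NULL, NULL, b.field2
FROM table_b b
JOIN table_a a ON b.timestamp BETWEEN a.timestamp AND a.endtime
ORDER BY 2  -- sort by the timestamp column (added for a stable order)
"""

conn = sqlite3.connect(":memory:")
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
conn.close()
```

The row with timestamp 25 is absent because it falls in neither the 1 to 5 nor the 100 to 200 interval.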

Related

SQL - summarize results from multiple tables

I have the following simple SQL query that I need to run on 3 tables:
SELECT
A.date,
SUM(A.number)
FROM A
GROUP BY
A.date
But I have two other tables (B and C) on which I'd like to run the same query. And combine the results into one table as output.
I am expecting the output to look something like:
date | A.number | B.number | C.number
2022 | 12322.1 | 9999999 | 888888
We can try the following union approach:
SELECT
date,
SUM(CASE WHEN src = 'A' THEN number ELSE 0 END) AS A_sum,
SUM(CASE WHEN src = 'B' THEN number ELSE 0 END) AS B_sum,
SUM(CASE WHEN src = 'C' THEN number ELSE 0 END) AS C_sum
FROM
(
SELECT date, number, 'A' AS src FROM A
UNION ALL
SELECT date, number, 'B' FROM B
UNION ALL
SELECT date, number, 'C' FROM C
) t
GROUP BY date
ORDER BY date;
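The union approach can be tried out quickly with SQLite from Python. This is a minimal sketch: the tables A, B and C follow the question, but the sample rows are made up for illustration and SQLite is only a stand-in for whatever database is actually in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (date INTEGER, number INTEGER);
CREATE TABLE B (date INTEGER, number INTEGER);
CREATE TABLE C (date INTEGER, number INTEGER);
-- made-up sample rows, not taken from the question
INSERT INTO A VALUES (2021, 10), (2022, 20), (2022, 5);
INSERT INTO B VALUES (2022, 7);
INSERT INTO C VALUES (2021, 1), (2021, 2);
""")

# Tag each row with its source table, then sum per source inside one GROUP BY.
rows = conn.execute("""
SELECT
  date,
  SUM(CASE WHEN src = 'A' THEN number ELSE 0 END) AS A_sum,
  SUM(CASE WHEN src = 'B' THEN number ELSE 0 END) AS B_sum,
  SUM(CASE WHEN src = 'C' THEN number ELSE 0 END) AS C_sum
FROM (
  SELECT date, number, 'A' AS src FROM A
  UNION ALL
  SELECT date, number, 'B' FROM B
  UNION ALL
  SELECT date, number, 'C' FROM C
) t
GROUP BY date
ORDER BY date
""").fetchall()

for row in rows:
    print(row)
conn.close()
```

A date that is missing from one table shows up with 0 for that table, because of the ELSE 0 branch.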
Here is my approach:
Create Table TableA
(
Dates Date,
Number Int
)
GO
Create Table TableB
(
Dates Date,
Number Int
)
GO
Create Table TableC
(
Dates Date,
Number Int
)
GO
Insert Into TableA
Values ('2023-01-01', 100000),
('2023-01-02',30000)
GO
Insert Into TableB
Values ('2023-01-01', 200000),
('2023-01-02',10000)
GO
Insert Into TableC
Values ('2023-01-01', 400000),
('2023-01-02',20000)
GO
SELECT * from
(
Select *,'A' Det from TableA
UNION ALL
Select *,'B' from TableB
UNION ALL
Select *,'C' from TableC
)ABC
PIVOT
(SUM(ABC.Number) FOR Det IN (A,B,C))
XYZ
DROP TABLE TableA
DROP Table TableB
DROP table TableC
IMO the best option is to create a calendar first and then left join the calendar with the different tables.
The calendar is important because it supplies the rows that the joined data attaches to.
It is also quite odd to have a date column that holds only a year.
In my opinion that is too coarse a level of detail.
But OK, let's say that you have only the year.
Create a table with years:
Create Table Years
(
years int
)
Next, fill it:
INSERT INTO Years(years )
VALUES
(2022),(2021),(2020),(2019),(2018),(2017),(2016),(2015)
Then, for example:
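-- Aggregate each table before joining so the joins cannot multiply rows
SELECT
y.years,
a.SumA,
b.SumB,
c.SumC
FROM Years as y
left join
(SELECT date, sum(number) as SumA FROM table_a GROUP BY date) a
on
y.years=a.date
left join
(SELECT date, sum(number) as SumB FROM table_b GROUP BY date) b
on
y.years=b.date
left join
(SELECT date, sum(number) as SumC FROM table_c GROUP BY date) c
on
y.years=c.date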
Hopefully this helps!
Please let me know if it works as you wanted.
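When several tables are joined to the calendar on the same key, summing directly over the joined rows multiplies each table's values by the number of matching rows in the other tables, which is why aggregating each table before joining is the safer form. The sketch below shows the difference in SQLite from Python; the sample rows are made up, and only two of the three tables are used to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Years (years INTEGER);
CREATE TABLE table_a (date INTEGER, number INTEGER);
CREATE TABLE table_b (date INTEGER, number INTEGER);
-- made-up sample rows: 2021 has no data at all
INSERT INTO Years VALUES (2021), (2022);
INSERT INTO table_a VALUES (2022, 10), (2022, 20);
INSERT INTO table_b VALUES (2022, 1), (2022, 2), (2022, 3);
""")

# Summing over the joined rows: 2 x 3 = 6 joined rows for 2022.
summed_after_join = conn.execute("""
SELECT y.years, SUM(a.number), SUM(b.number)
FROM Years y
LEFT JOIN table_a a ON y.years = a.date
LEFT JOIN table_b b ON y.years = b.date
GROUP BY y.years
ORDER BY y.years
""").fetchall()

# Aggregating each table first: at most one row per year on each side.
summed_before_join = conn.execute("""
SELECT y.years, a.SumA, b.SumB
FROM Years y
LEFT JOIN (SELECT date, SUM(number) AS SumA FROM table_a GROUP BY date) a
  ON y.years = a.date
LEFT JOIN (SELECT date, SUM(number) AS SumB FROM table_b GROUP BY date) b
  ON y.years = b.date
ORDER BY y.years
""").fetchall()

print(summed_after_join)
print(summed_before_join)
conn.close()
```

In both variants the calendar still yields a row for 2021, with NULL sums, which is the reason for starting from the Years table.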

How to compare column in one table with array from another table in BigQuery?

This continues from the answer to my previous question.
I want to get all rows from table b for an id whenever that id's values differ in any way from the values in the array stored for the same id in table a.
WITH a as (SELECT 1 as id, ['123', 'abc', '456', 'qaz', 'uqw'] as value
UNION ALL SELECT 2, ['123', 'wer', 'thg', '10', '200']
UNION ALL SELECT 3, ['200']
UNION ALL SELECT 4, null
UNION ALL SELECT 5, ['140']),
b as (SELECT 1 as id, '123' as value
UNION ALL SELECT 1, 'abc'
UNION ALL SELECT 1, '456'
UNION ALL SELECT 1, 'qaz'
UNION ALL SELECT 1, 'uqw'
UNION ALL SELECT 2, '123'
UNION ALL SELECT 2, 'wer'
UNION ALL SELECT 2, '10'
UNION ALL SELECT 3, null
UNION ALL SELECT 4, 'wer'
UNION ALL SELECT 4, '234'
UNION ALL SELECT 5, '140'
UNION ALL SELECT 5, '121'
)
SELECT * EXCEPT(flag)
FROM (
SELECT b.*, COUNTIF(b.value IS NULL) OVER(PARTITION BY id) flag
FROM a LEFT JOIN a.value
FULL OUTER JOIN b
USING(id, value)
)
WHERE flag > 0
AND NOT id IS NULL
It works well for all ids except 5.
In my case I need to return all values if there is any difference.
In the example, the array with id 5 in table a has only one value, '140', while table b has two rows for id 5. In this case all values for id 5 from table b must also appear in the expected output.
How do I need to modify this query to get what I want?
UPDATED
The following seems to work for me, but I cannot be 100% sure:
SELECT * EXCEPT(flag)
FROM (
SELECT b.*, COUNTIF((b.value IS NULL AND a.value IS NOT NULL) OR (b.value IS NOT NULL AND a.value IS NULL)) OVER(PARTITION BY id) flag
FROM a LEFT JOIN a.value
FULL OUTER JOIN b
USING(id, value)
)
WHERE flag > 0
AND NOT id IS NULL
#standardSQL
SELECT *
FROM table_b
WHERE id IN (
SELECT id FROM table_a a
JOIN table_b b USING(id)
GROUP BY id
HAVING STRING_AGG(IFNULL(b.value, 'NULL') ORDER BY b.value) !=
IFNULL(ANY_VALUE((SELECT STRING_AGG(IFNULL(value, 'NULL') ORDER BY value) FROM a.value)), 'NULL')
)
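The comparison idea behind the query above (build one sorted value list per id on each side, then keep every row of b whose id has differing lists) can be sketched in plain Python to check the expected output for the sample data. This is an illustration of the logic only, not a translation of the BigQuery SQL: the helper name `normalize` is made up here, and a NULL array is treated as an empty list.

```python
from collections import defaultdict

# Sample data from the question: table a holds arrays, table b holds rows.
a = {
    1: ['123', 'abc', '456', 'qaz', 'uqw'],
    2: ['123', 'wer', 'thg', '10', '200'],
    3: ['200'],
    4: None,
    5: ['140'],
}
b = [
    (1, '123'), (1, 'abc'), (1, '456'), (1, 'qaz'), (1, 'uqw'),
    (2, '123'), (2, 'wer'), (2, '10'),
    (3, None),
    (4, 'wer'), (4, '234'),
    (5, '140'), (5, '121'),
]


def normalize(values):
    """Return the values sorted, with None replaced by the marker 'NULL'."""
    return sorted('NULL' if v is None else v for v in values)


# Group the rows of b by id.
b_by_id = defaultdict(list)
for row_id, value in b:
    b_by_id[row_id].append(value)

# An id is mismatched when its two normalized lists differ in any way.
mismatched_ids = {
    row_id
    for row_id, values in b_by_id.items()
    if row_id in a and normalize(values) != normalize(a[row_id] or [])
}

# Keep every row of b that belongs to a mismatched id.
result = [row for row in b if row[0] in mismatched_ids]
for row in result:
    print(row)
```

Only id 1 has identical values on both sides, so the rows for ids 2, 3, 4 and 5 are returned, including both rows for id 5.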

MINUS functionality in BigQuery database

I am new to BigQuery.
What is the BigQuery equivalent of Oracle's MINUS operator? I could not find a MINUS operator in BigQuery.
Oracle --> MINUS
BigQuery --> ??
Though there is no MINUS operator in BigQuery, you can use a LEFT OUTER JOIN as an alternative.
SELECT name, uid FROM a
MINUS
SELECT name, uid FROM b
Can be written as:
SELECT a.name, a.uid
FROM a LEFT OUTER JOIN b ON a.name= b.name AND a.uid= b.uid
WHERE b.name IS NULL
BigQuery doesn't have "MINUS", but it does have the functionally identical "EXCEPT DISTINCT".
with whole as
( select 1 as id, 'One' as value
union all
select 2 as id, 'Two' as value
union all
select 3 as id, 'Three' as value
),
sub_set as
(
select 1 as id, 'One' as value
union all
select 2 as id, 'Two' as value
)
select * from whole
except distinct
select * from sub_set
Result was
3 Three
Refer: https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#except
I got the error "EXCEPT ALL is not supported"; EXCEPT DISTINCT worked. Hope this helps.
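Both alternatives mentioned so far, the set operator and the LEFT OUTER JOIN filter, can be compared side by side on the sample data. The sketch below uses SQLite from Python as a stand-in, which is an assumption: SQLite spells the operator plain EXCEPT (with distinct semantics), where BigQuery requires EXCEPT DISTINCT.

```python
import sqlite3

# Sample data from the answer above, shared by both queries.
CTES = """
WITH whole AS (
  SELECT 1 AS id, 'One' AS value
  UNION ALL SELECT 2, 'Two'
  UNION ALL SELECT 3, 'Three'
),
sub_set AS (
  SELECT 1 AS id, 'One' AS value
  UNION ALL SELECT 2, 'Two'
)
"""

conn = sqlite3.connect(":memory:")

# Set-operator form.
except_rows = conn.execute(
    CTES + "SELECT * FROM whole EXCEPT SELECT * FROM sub_set"
).fetchall()

# Anti-join form: keep rows of whole that found no partner in sub_set.
anti_join_rows = conn.execute(CTES + """
SELECT w.id, w.value
FROM whole w
LEFT OUTER JOIN sub_set s ON w.id = s.id AND w.value = s.value
WHERE s.id IS NULL
""").fetchall()

print(except_rows)
print(anti_join_rows)
conn.close()
```

The two forms agree here, but they are not interchangeable in general: the set operator removes duplicate rows and treats NULLs as equal, while the join keeps duplicates and never matches NULL to NULL.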
Standard SQL equivalent of MINUS where ID is the composite key or primary key in Table1 and Table2
(same concept as Vamsi Mohan's)
SELECT ID, Name FROM Table1
WHERE ID NOT IN (SELECT DISTINCT ID FROM Table2)

SQl Query : need to get the latest created data in the child records

I have a requirement in which I need to get the latest created data in the child records.
Suppose there are two tables, A and B. A is the parent and B is the child, in a 1:M relation. Both have some columns, and table B also has a 'created date' column that holds the creation date of each record in table B.
Now I need to write a query that fetches all records from table A together with the latest created child record from table B. For example, if two child records were created today in table B for one parent record, only the later of the two should be fetched.
One record of table A can have many children, so how can we achieve this?
Result should be: columns of table A, columns of table B (latest created one)
I hope the 'created date' is a DATETIME column. This would give you the most recent child record. Assuming you have a consistent ID in the parent table with the same ParentID in the child table as a foreign key....
select A.*, B.*
from A
join B on A.ParentID = B.ParentID
join (
select ParentID, max([created date]) as [created date]
from B
group by ParentID
) maxchild on A.ParentID = maxchild.ParentID
where B.ParentID = maxchild.ParentID and B.[created date] = maxchild.[created date]
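The join against the per-parent maximum can be tried on a small data set. The sketch below runs it in SQLite from Python; the column names `name`, `created_date` and `value` and all sample rows are hypothetical, and dates are stored as ISO-8601 text so that MAX orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- hypothetical schema and rows, for illustration only
CREATE TABLE A (ParentID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE B (ParentID INTEGER, created_date TEXT, value TEXT);
INSERT INTO A VALUES (1, 'first parent'), (2, 'second parent');
INSERT INTO B VALUES
  (1, '2012-01-01 09:00:00', 'old child of 1'),
  (1, '2012-01-03 17:30:00', 'latest child of 1'),
  (2, '2012-02-01 08:00:00', 'old child of 2'),
  (2, '2012-02-01 12:00:00', 'latest child of 2');
""")

# maxchild holds one row per parent with its newest created_date;
# joining B to it keeps only the child rows carrying that date.
rows = conn.execute("""
SELECT A.ParentID, A.name, B.created_date, B.value
FROM A
JOIN B ON A.ParentID = B.ParentID
JOIN (
  SELECT ParentID, MAX(created_date) AS created_date
  FROM B
  GROUP BY ParentID
) maxchild
  ON B.ParentID = maxchild.ParentID
 AND B.created_date = maxchild.created_date
ORDER BY A.ParentID
""").fetchall()

for row in rows:
    print(row)
conn.close()
```

Because these are inner joins, a parent without any child rows would not appear; LEFT JOINs would be needed to keep such parents.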
Below is the query that can help you out.
select x, y from ( select a.coloumn_TAB_A x, b.coloumn_TAB_B y from TableA a ,
TableB b where a.primary_key=b.primary_key
and a.Primary_key ='XYZ' order by b.created_date desc) where rownum < 2
Here we have two tables, A and B, joined on their primary keys and ordered by the created date column of table B in descending order.
Use this output as an inline view for the outer query and select whichever columns you want, such as x and y. The condition rownum < 2 keeps only the first row, which is the latest record of table B.
This is not the most efficient, but it will work (SQL only):
SELECT [Table_A].[Columns], [Table_B].[Columns]
FROM [Table_A]
LEFT OUTER JOIN [Table_B]
ON [Table_B].ForeignKey = [Table_A].PrimaryKey
AND [Table_B].PrimaryKey = (SELECT TOP 1 [Table_B].PrimaryKey
FROM [Table_B]
WHERE [Table_B].ForeignKey = [Table_A].PrimaryKey
ORDER BY [Table_B].CREATIONDATE DESC)
You can use analytic functions to avoid hitting each table (or specifically B) more than once
Using CTEs to provide dummy data for A and B you can do this:
with A as (
select 1 as id from dual
union all select 2 from dual
union all select 3 from dual
),
B as (
select 1 as a_id, date '2012-01-01' as created_date, 'First for 1' as value
from dual
union all select 1, date '2012-01-02', 'Second for 1' from dual
union all select 1, date '2012-01-03', 'Third for 1' from dual
union all select 2, date '2012-02-01', 'First for 2' from dual
union all select 2, date '2012-02-03', 'Second for 2' from dual
union all select 3, date '2012-02-01', 'First for 3' from dual
union all select 3, date '2012-02-03', 'Second for 3' from dual
union all select 3, date '2012-02-05', 'Third for 3' from dual
union all select 3, date '2012-02-09', 'Fourth for 3' from dual
)
select id, created_date, value from (
select a.id, b.created_date, b.value,
row_number() over (partition by a.id order by b.created_date desc) as rn
from a
join b on b.a_id = a.id
)
where rn = 1
order by id;
ID CREATED_D VALUE
---------- --------- ------------
1 03-JAN-12 Third for 1
2 03-FEB-12 Second for 2
3 09-FEB-12 Fourth for 3
You can select any columns you want from A and B, but you'll need to alias them in the subquery if there are any with the same name in both tables.
You may also need to use rank() or dense_rank() instead of row_number() to handle ties appropriately, if you can have child records with the same created date.
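The effect of ties can be seen with a small experiment. The sketch below uses SQLite from Python rather than Oracle, which is an assumption (it needs SQLite 3.25 or later for window functions), and the sample rows are made up so that two children of the same parent share the latest date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE B (a_id INTEGER, created_date TEXT, value TEXT);
-- made-up rows: the two newest children share the same date
INSERT INTO B VALUES
  (1, '2012-01-01', 'First for 1'),
  (1, '2012-01-03', 'Second for 1'),
  (1, '2012-01-03', 'Third for 1');
""")

# Same query shape for both functions; only the ranking function changes.
LATEST = """
SELECT value FROM (
  SELECT value,
         {fn}() OVER (PARTITION BY a_id ORDER BY created_date DESC) AS rn
  FROM B
)
WHERE rn = 1
ORDER BY value
"""

with_row_number = conn.execute(LATEST.format(fn="ROW_NUMBER")).fetchall()
with_rank = conn.execute(LATEST.format(fn="RANK")).fetchall()

print(with_row_number)  # exactly one of the tied rows, chosen arbitrarily
print(with_rank)        # both tied rows
conn.close()
```

With row_number() the choice between tied rows is arbitrary unless the ORDER BY includes a tie-breaker column, whereas rank() returns every row that shares the latest date.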

TSQL Comparing two Sets

When two sets are given
s1 ={ a,b,c,d} s2={b,c,d,a}
(i.e)
TableA
Item
a
b
c
d
TableB
Item
b
c
d
a
How can I write a SQL query that displays "Elements in tableA and tableB are equal"? [Without using an SP or UDF]
Output
Elements in TableA and TableB contains identical sets
Use:
SELECT CASE
WHEN COUNT(*) = (SELECT COUNT(*) FROM a)
AND COUNT(*) = (SELECT COUNT(*) FROM b) THEN 'Elements in TableA and TableB contains identical sets'
ELSE 'TableA and TableB do NOT contain identical sets'
END
FROM (SELECT a.col
FROM a
INTERSECT
SELECT b.col
FROM b) x
Test with:
WITH a AS (
SELECT 'a' AS col
UNION ALL
SELECT 'b'
UNION ALL
SELECT 'c'
UNION ALL
SELECT 'd'),
b AS (
SELECT 'b' AS col
UNION ALL
SELECT 'c'
UNION ALL
SELECT 'd'
UNION ALL
SELECT 'a')
SELECT CASE
WHEN COUNT(*) = (SELECT COUNT(*) FROM a)
AND COUNT(*) = (SELECT COUNT(*) FROM b) THEN 'yes'
ELSE 'no'
END
FROM (SELECT a.col
FROM a
INTERSECT
SELECT b.col
FROM b) x
Something like this, using FULL JOIN:
SELECT
CASE
WHEN EXISTS (
SELECT * FROM s1 FULL JOIN s2 ON s1.Item = s2.Item
WHERE s1.Item IS NULL OR s2.Item IS NULL
)
THEN 'Elements in tableA and tableB are not equal'
ELSE 'Elements in tableA and tableB are equal'
END
This has the virtue of short-circuiting on the first non-match, unlike other solutions that require 2 full scans of each table (once for the COUNT(*), once for the JOIN/INTERSECT).
Estimated cost is significantly less than other solutions.
Watch out, I'm gonna use a Cross Join.
Declare @t1 table(val varchar(20))
Declare @t2 table(val varchar(20))
insert into @t1 values ('a')
insert into @t1 values ('b')
insert into @t1 values ('c')
insert into @t1 values ('d')
insert into @t2 values ('c')
insert into @t2 values ('d')
insert into @t2 values ('b')
insert into @t2 values ('a')
select
case when
count(1) =
(((Select count(1) from @t1)
+ (Select count(1) from @t2)) / 2.0)
then 1 else 0 end as SetsMatch from
@t1 t1 cross join @t2 t2
where t1.val = t2.val
My monstrosity:
;with SetA as
(select 'a' c union
select 'b' union
select 'c')
, SetB as
(select 'b' c union
select 'c' union
select 'a' union
select 'd'
)
select case (select count(*) from (
-- parentheses are required: EXCEPT and UNION otherwise evaluate left to right
(select * from SetA except select * from SetB)
union
(select * from SetB except select * from SetA)
)t)
when 0 then 'Equal' else 'NotEqual' end 'Equality'
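A note on combining two EXCEPTs with a UNION: in T-SQL, EXCEPT and UNION share the same precedence and are evaluated left to right, so how the two differences are grouped matters. The sketch below illustrates this with SQLite from Python, used here only as a stand-in that also chains compound operators left to right; SQLite needs subqueries rather than parentheses to group them, and the sample sets are chosen so that SetA has one extra element.

```python
import sqlite3

# SetA = {a, b, c, d} is a strict superset of SetB = {a, b, c}.
CTES = """
WITH SetA AS (
  SELECT 'a' AS c UNION ALL SELECT 'b' UNION ALL SELECT 'c' UNION ALL SELECT 'd'
),
SetB AS (
  SELECT 'a' AS c UNION ALL SELECT 'b' UNION ALL SELECT 'c'
)
"""

conn = sqlite3.connect(":memory:")

# Chained form: evaluated as ((SetA EXCEPT SetB) UNION SetB) EXCEPT SetA.
chained = conn.execute(CTES + """
SELECT c FROM SetA EXCEPT SELECT c FROM SetB
UNION
SELECT c FROM SetB EXCEPT SELECT c FROM SetA
""").fetchall()

# Grouped form: each difference is computed on its own, then combined.
grouped = conn.execute(CTES + """
SELECT c FROM (SELECT c FROM SetA EXCEPT SELECT c FROM SetB)
UNION
SELECT c FROM (SELECT c FROM SetB EXCEPT SELECT c FROM SetA)
""").fetchall()

print(chained)  # empty: the extra element 'd' is lost
print(grouped)  # contains 'd', so the sets are correctly seen as different
conn.close()
```

The chained form returns no rows even though the sets differ, so an equality test built on it would wrongly report a match whenever the first set is a superset of the second.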
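Could do it with EXCEPT and a CASE. Note that this checks one direction only: it reports a match whenever every item of s1 also appears in s2, so it needs to be run in both directions for a true equality test.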
select
case
when count (1)=0
then 'Elements in TableA and TableB contains identical sets'
else 'Nope' end from (
select item from s1
EXCEPT
select item from s2
) b
Since this thread was very helpful to me, I thought I'd share my solution.
I had a similar problem, perhaps more generally applicable than this specific single-set comparison. I was trying to find the id of an element that had a set of multi-element child elements that matched a query set of multi-element items.
The relevant schema information is:
table events, pk id
table solutions, pk id, fk event_id -> events
table solution_sources, fk solutionid -> solutions
columns unitsourceid, alpha
Query: find the solution for event with id 110 that has the set of solution_sources that match the set of (unitsourceid, alpha) in ss_tmp. (This can also be done without the tmp table, I believe.)
Solution:
with solutionids as (
select y.solutionid from (
select ss.solutionid, count(ss.solutionid) x
from solutions s, solution_sources ss
where s.event_id = 110 and ss.solutionid = s.id
group by ss.solutionid
) y where y.x = ( select count(*) from ss_tmp )
)
select solutionids.solutionid from solutionids where
(
select case
when count(*) = ( select count(*) from ss_tmp ) then true
else false
end
from
( SELECT unitsourceid, alpha FROM solution_sources
where solutionid = solutionids.solutionid
INTERSECT
SELECT unitsourceid, alpha FROM ss_tmp ) x
)
Tested against a test query of 4 items and a test db that had a matching solution (same number of child elements, each that matched), several completely non-matching solutions, and 1 solution that had 3 matching child elements, 1 solution that had all 4 matching child elements, plus an additional child, and 1 solution that had 4 child elements of which 3 of the 4 matched the query. Only the id of the true match was returned.
thanks a lot
-Linus
Use EXCEPT statement
When using the EXCEPT statement to test if two sets contain the same rows, you will need to do the EXCEPT in both directions (A EXCEPT B and B EXCEPT A). If either comparison returns any records, then the sets are different. If no records are returned by either, they are the same.
The nice thing about this is that you can do this comparison with any number of specific columns and NULL values are handled implicitly without having to jump through hoops to compare them.
A good use case for this is verifying that saving a set of records happened correctly, especially when affecting an existing set.
SELECT IsMatching = (1 ^ convert(bit, count(*)))
FROM (
SELECT Mismatched = 1 -- Can be any column name
FROM (
SELECT Item -- Can have additional columns
FROM TableA
EXCEPT
SELECT Item -- Can have additional columns
FROM TableB
) as A
UNION
SELECT Mismatched = 1 -- Can be any column name
FROM (
SELECT Item -- Can have additional columns
FROM TableB
EXCEPT
SELECT Item -- Can have additional columns
FROM TableA
) as A
) as A
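The two-directional EXCEPT check can also be tried outside SQL Server. The sketch below uses SQLite from Python as a stand-in, which is an assumption; the helper name `tables_match` is made up for this example, and a NULL item is included on both sides to show that EXCEPT treats NULLs as equal.

```python
import sqlite3


def tables_match(conn):
    """Return True when TableA and TableB hold the same set of Item values."""
    mismatches = conn.execute("""
    SELECT COUNT(*) FROM (
      -- items only in TableA
      SELECT Item FROM (SELECT Item FROM TableA EXCEPT SELECT Item FROM TableB)
      UNION ALL
      -- items only in TableB
      SELECT Item FROM (SELECT Item FROM TableB EXCEPT SELECT Item FROM TableA)
    )
    """).fetchone()[0]
    return mismatches == 0


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (Item TEXT);
CREATE TABLE TableB (Item TEXT);
INSERT INTO TableA VALUES ('a'), ('b'), ('c'), ('d'), (NULL);
INSERT INTO TableB VALUES ('b'), ('c'), ('d'), ('a'), (NULL);
""")

same_before = tables_match(conn)  # same items in a different order

conn.execute("INSERT INTO TableB VALUES ('e')")
same_after = tables_match(conn)   # TableB now has an extra item

print(same_before, same_after)
conn.close()
```

Row order plays no part in the comparison, and since EXCEPT works on distinct rows, a differing number of duplicates would not be detected by this check.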