Join 3 tables in bigquery with no duplication

I want to join 3 tables:
table 1
c_id gateway_id timestamp
1 0 2019-01-05 06:53:24 UTC
2 0 2019-01-05 08:51:24 UTC
table 2
gateway_id gateway_name
0 a
1 b
table 3
date u_id
2022-08-13 1
2022-08-13 2
I join the three tables: tables 1 and 2 on gateway_id, and the result of that join to table 3 on c_id = u_id. Here's the query I tried:
WITH
  date_dict AS (
    SELECT
      DATE('2022-08-04') AS start_dt,
      DATE_SUB(current_date, INTERVAL 1 day) AS end_dt),
  mp AS (
    SELECT
      a.c_id,
      a.gateway_id,
      b.gateway_name,
      SUM(a.actual_amount) total_p
    FROM a
    JOIN b USING(gateway_id)
    WHERE
      date(a.update_timestamp) between (select start_dt from date_dict) and (select end_dt from date_dict)
      AND b.gateway_source_name = 'Marketplace'
    GROUP BY 1,2,3)
SELECT m.c_id, p.u_id, m.gateway_id, m.gateway_name, date, m.total_p
FROM mp m
JOIN p ON m.c_id = CAST(p.u_id AS INT64)
But the result contains duplicate rows, like this:
c_id u_id gateway_id gateway_name date_key total_p
1 1 0 a 2022-08-18 800000
1 1 0 a 2022-08-18 800000
1 1 1 b 2022-08-18 634490
1 1 1 b 2022-08-18 634490
1 1 2 c 2022-08-18 200000
1 1 2 c 2022-08-18 200000
When I add GROUP BY to the final query, I get an error. I want the result to look like this:
c_id u_id gateway_id gateway_name date_key total_p
1 1 0 a 2022-08-18 800000
1 1 1 b 2022-08-18 634490
1 1 2 c 2022-08-18 200000
That is, with no duplicate rows. Any suggestions?
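Every row in the output above appears exactly twice, which is what an inner join produces when the right-hand table holds the same key more than once. A minimal sketch of that situation follows. It runs on SQLite through Python's sqlite3 rather than BigQuery, the mp table stands in for the aggregated CTE from the question, and the duplicated (u_id, date) rows in p are an assumption, since the real contents of p are not shown. Deduplicating p before the join removes the repeats:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- mp stands in for the aggregated CTE from the question (one row per gateway).
    CREATE TABLE mp (c_id INTEGER, gateway_id INTEGER, gateway_name TEXT, total_p INTEGER);
    INSERT INTO mp VALUES (1, 0, 'a', 800000), (1, 1, 'b', 634490), (1, 2, 'c', 200000);

    -- Assumption: p holds the same (u_id, date) pair more than once.
    CREATE TABLE p (date TEXT, u_id TEXT);
    INSERT INTO p VALUES ('2022-08-18', '1'), ('2022-08-18', '1');
""")

# The plain join repeats every mp row once per matching p row.
naive_count = conn.execute(
    "SELECT COUNT(*) FROM mp AS m JOIN p ON m.c_id = CAST(p.u_id AS INTEGER)"
).fetchone()[0]

# Deduplicate p first, then join.
query = """
    SELECT m.c_id, d.u_id, m.gateway_id, m.gateway_name, d.date, m.total_p
    FROM mp AS m
    JOIN (SELECT DISTINCT u_id, date FROM p) AS d
      ON m.c_id = CAST(d.u_id AS INTEGER)
    ORDER BY m.gateway_id
"""
rows = conn.execute(query).fetchall()

print("rows from the plain join:", naive_count)
for row in rows:
    print(row)
```

If the duplicates in the real data come from somewhere else (for example several dates per u_id), the DISTINCT subquery would need to select a different set of columns.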

Related

How to count numbers which are defined in other table, also show zero counts

This is the current situation:
Table1
key some_id date class
1 1 1.1.2000 2
1 2 1.1.2000 2
2 1 1.1.1999 3
... ... ... ...
I'm counting the classes and providing the information through a view by using following select statement:
SELECT key, date, class, count(class) as cnt
FROM table1
GROUP BY key, date, class
The result would be:
key date class cnt
1 1.1.2000 2 2
2 1.1.1999 3 1
... ... ... ...
but now there is another table which includes all possible class-codes, e.g.
parameter_key class_code
1 1
1 2
1 3
2 1
... ...
For my view I'm only querying data for parameter_key 1. The view now needs to show all possible class_codes, even when the count is 0.
So my desired result table is:
key date class cnt
1 1.1.2000 1 0
1 1.1.2000 2 2
1 1.1.2000 3 0
2 1.1.1999 1 0
2 1.1.1999 2 0
2 1.1.1999 3 1
... ... ... ...
but I just can't get my head around how to do this. I've tried to add a right join like this but that does not change anything (probably because I join the class column and do an aggregate which won't be displayed if there is nothing to count?):
SELECT key, date, class, count(class) as cnt
FROM table1
RIGHT JOIN table2 on table1.class = table2.class and table2.parameter_key = 1
GROUP BY key, date, class
Any idea on how to achieve the desired result table?

Use a partitioned outer join (Oracle syntax):
SELECT t2.parameter_key AS key,
t1."DATE",
t2.class_code AS class,
count(t1.class) as cnt
FROM table2 t2
LEFT OUTER JOIN table1 t1
PARTITION BY (t1."DATE")
ON (t1.class = t2.class_code AND t1.key = t2.parameter_key)
WHERE t2.parameter_key = 1
GROUP BY
t2.parameter_key,
t1."DATE",
t2.class_code
Which, for the sample data:
CREATE TABLE table1 (key, some_id, "DATE", class) AS
SELECT 1, 1, DATE '2000-01-01', 2 FROM DUAL UNION ALL
SELECT 1, 2, DATE '2000-01-01', 2 FROM DUAL UNION ALL
SELECT 2, 1, DATE '1999-01-01', 3 FROM DUAL;
CREATE TABLE table2 (parameter_key, class_code) AS
SELECT 1, 1 FROM DUAL UNION ALL
SELECT 1, 2 FROM DUAL UNION ALL
SELECT 1, 3 FROM DUAL UNION ALL
SELECT 2, 1 FROM DUAL;
Outputs:
KEY DATE CLASS CNT
1 1999-01-01 00:00:00 1 0
1 1999-01-01 00:00:00 2 0
1 1999-01-01 00:00:00 3 0
1 2000-01-01 00:00:00 1 0
1 2000-01-01 00:00:00 2 2
1 2000-01-01 00:00:00 3 0
Or, depending on how you want to manage the join conditions:
SELECT t1.key,
t1."DATE",
t2.class_code AS class,
count(t1.class) as cnt
FROM table2 t2
LEFT OUTER JOIN table1 t1
PARTITION BY (t1.key, t1."DATE")
ON (t1.class = t2.class_code)
WHERE t2.parameter_key = 1
GROUP BY
t1.key,
t1."DATE",
t2.class_code
Which outputs:
KEY DATE CLASS CNT
1 2000-01-01 00:00:00 1 0
1 2000-01-01 00:00:00 2 2
1 2000-01-01 00:00:00 3 0
2 1999-01-01 00:00:00 1 0
2 1999-01-01 00:00:00 2 0
2 1999-01-01 00:00:00 3 1
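PARTITION BY on an outer join is Oracle syntax. On engines without it, the same result can be sketched by first building every (key, date, class_code) combination and then left joining the facts onto it. The version below runs on SQLite through Python's sqlite3 purely for illustration; the ISO date strings and the aliases g, c and t1 are choices made for this sketch, not part of the original answer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 ("key" INTEGER, some_id INTEGER, date TEXT, class INTEGER);
    INSERT INTO table1 VALUES
        (1, 1, '2000-01-01', 2),
        (1, 2, '2000-01-01', 2),
        (2, 1, '1999-01-01', 3);

    CREATE TABLE table2 (parameter_key INTEGER, class_code INTEGER);
    INSERT INTO table2 VALUES (1, 1), (1, 2), (1, 3), (2, 1);
""")

query = """
    SELECT g."key", g.date, c.class_code AS class, COUNT(t1.class) AS cnt
    FROM (SELECT DISTINCT "key", date FROM table1) AS g          -- every group to report
    CROSS JOIN (SELECT class_code FROM table2
                WHERE parameter_key = 1) AS c                    -- every class to report
    LEFT JOIN table1 AS t1
      ON t1."key" = g."key" AND t1.date = g.date AND t1.class = c.class_code
    GROUP BY g."key", g.date, c.class_code
    ORDER BY g."key", g.date, c.class_code
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

COUNT(t1.class) counts only non-NULL values, so combinations with no matching row in table1 come out as 0 instead of disappearing.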

SQL Join Multiple Tables with One to Many Relationships without "Duplication"

First let me start by saying that I do understand that these are not duplicate rows. I understand the basic functionality of joining multiple tables. I am just trying to find out if there is a way to do what I am trying to do in SQL and I don't know a better way to title it.
Example Tables:
Day Table
Day_KEY Day_Label
1 Mon
2 Tues
3 Wed
4 Thur
EstHours Table
EstHours_KEY Day_KEY Est_Hours
1 1 2
2 1 1
3 1 3
ActHours Table
ActHours_KEY Day_KEY Act_Hours
1 1 3
2 1 2
3 1 2
Example Query:
select *
from Day
join EstHours on EstHours.Day_KEY = Day.Day_KEY
join ActHours on ActHours.Day_KEY = Day.Day_KEY
Result:
Day_KEY Day_Label EstHours_KEY Day_KEY Est_Hours ActHours_KEY Day_KEY Act_Hours
1 Mon 1 1 2 1 1 3
1 Mon 1 1 2 2 1 2
1 Mon 1 1 2 3 1 2
1 Mon 2 1 1 1 1 3
1 Mon 2 1 1 2 1 2
1 Mon 2 1 1 3 1 2
1 Mon 3 1 3 1 1 3
1 Mon 3 1 3 2 1 2
1 Mon 3 1 3 3 1 2
Desired Result:
Day_KEY Day_Label EstHours_KEY Day_KEY Est_Hours ActHours_KEY Day_KEY Act_Hours
1 Mon 1 1 2 1 1 3
1 Mon 2 1 1 2 1 2
1 Mon 3 1 3 3 1 2
What I have tried:
1)
Query:
select *
from (
    select Day.Day_KEY, Day.Day_Label, EstHours.EstHours_KEY, EstHours.Est_Hours,
           row_number() over (partition by Day.Day_KEY order by EstHours.EstHours_KEY) as rn
    from Day
    join EstHours on EstHours.Day_KEY = Day.Day_KEY) est
join (
    select *, row_number() over (partition by Day_KEY order by ActHours_KEY) as rn
    from ActHours) act on act.Day_KEY = est.Day_KEY and act.rn = est.rn
Result:
Day_KEY Day_Label EstHours_KEY Day_KEY Est_Hours ActHours_KEY Day_KEY Act_Hours
1 Mon 1 1 2 1 1 3
1 Mon 2 1 1 2 1 2
1 Mon 3 1 3 3 1 2
This does what I need unless EstHours has fewer rows than ActHours, in which case the unmatched rows from ActHours are left out.
2)
Query:
select *, null, null, null
from Day
join EstHours on EstHours.Day_KEY = Day.Day_KEY
union
select Day.*, null, null, null, ActHours.*
from Day
join ActHours on ActHours.Day_KEY = Day.Day_KEY
Result:
Day_KEY Day_Label EstHours_KEY Day_KEY Est_Hours ActHours_KEY Day_KEY Act_Hours
1 Mon 1 1 2 null null null
1 Mon 2 1 1 null null null
1 Mon 3 1 3 null null null
1 Mon null null null 1 1 3
1 Mon null null null 2 1 2
1 Mon null null null 3 1 2
This does what I want, except that I would prefer the values to be on the same rows, so that the maximum number of rows for a single Day_KEY is the row count of either EstHours or ActHours, whichever has more.
Has anyone any idea of how this can be done? Am I going about this all wrong?

It sounds like you need a GROUP BY clause on a unique field belonging to the 'one' table in the one-to-many relationship, such as a row id.
select * from table_a,table_b,table_c group by table_a.rowid
This will collapse the results to distinct rows from table_a, and also allow the select result to use/include aggregate functions like sum() on the fields from table_b or table_c.
In the example I used, think of every row from table_b and table_c overlapping with the unique rows of table_a that get returned.
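If the goal is to keep the individual rows side by side rather than collapse them, the row-numbering attempt from the question can be extended so that neither side loses rows. The sketch below is one way to do it, not the only one: it numbers both tables per Day_KEY, builds the set of all (Day_KEY, rn) slots, and left joins both tables onto those slots. It runs on SQLite (3.25 or later, for window functions) through Python's sqlite3; the CTE names est, act and slots are made up for this example, and the sample data is trimmed so that EstHours has fewer rows than ActHours:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Day (Day_KEY INTEGER, Day_Label TEXT);
    INSERT INTO Day VALUES (1, 'Mon'), (2, 'Tues');

    CREATE TABLE EstHours (EstHours_KEY INTEGER, Day_KEY INTEGER, Est_Hours INTEGER);
    INSERT INTO EstHours VALUES (1, 1, 2), (2, 1, 1);

    CREATE TABLE ActHours (ActHours_KEY INTEGER, Day_KEY INTEGER, Act_Hours INTEGER);
    INSERT INTO ActHours VALUES (1, 1, 3), (2, 1, 2), (3, 1, 2);
""")

query = """
    WITH est AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY Day_KEY ORDER BY EstHours_KEY) AS rn
        FROM EstHours
    ),
    act AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY Day_KEY ORDER BY ActHours_KEY) AS rn
        FROM ActHours
    ),
    slots AS (
        -- One slot per row position that exists on either side.
        SELECT Day_KEY, rn FROM est
        UNION
        SELECT Day_KEY, rn FROM act
    )
    SELECT d.Day_KEY, d.Day_Label,
           e.EstHours_KEY, e.Est_Hours,
           a.ActHours_KEY, a.Act_Hours
    FROM Day AS d
    JOIN slots AS s ON s.Day_KEY = d.Day_KEY
    LEFT JOIN est AS e ON e.Day_KEY = s.Day_KEY AND e.rn = s.rn
    LEFT JOIN act AS a ON a.Day_KEY = s.Day_KEY AND a.rn = s.rn
    ORDER BY d.Day_KEY, s.rn
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On engines that support FULL OUTER JOIN, joining est to act directly on (Day_KEY, rn) should give the same pairing without the slots step.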

SQL Aggregate on Two tables

Table A has millions of records, starting from 2014. I am using Oracle.
ID Sales_Amount Sales_Date
1 10 20/11/2014
1 10 22/11/2014
1 10 22/12/2014
1 10 22/01/2015
1 10 22/02/2015
1 10 22/03/2015
1 10 22/04/2015
1 10 22/05/2015
1 10 22/06/2015
1 10 22/07/2015
1 10 22/08/2015
1 10 22/09/2015
1 10 22/10/2015
1 10 22/11/2015
Table B
ID ID_Date
1 22/11/2014
2 01/12/2014
I want the sum of Sales_Amount over 6 months and over 1 year for ID 1, taking the starting date from Table B (22/11/2014).
Output:
ID Sales_Amount_6Months Sales_Amount_1Year
1 70 130
Shall I use add_months in this case?

Yes, you can use ADD_MONTHS() and conditional aggregation:
SELECT b.id,
SUM(CASE WHEN a.sales_date between b.id_date AND ADD_MONTHS(b.id_date,6) THEN a.sales_amount ELSE 0 END) as sales_6_month,
SUM(CASE WHEN a.sales_date between b.id_date AND ADD_MONTHS(b.id_date,12) THEN a.sales_amount ELSE 0 END) as sales_12_month
FROM TableB b
JOIN TableA a
ON(b.id = a.id)
GROUP BY b.id
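To check the conditional aggregation against the sample data, here is a small sketch on SQLite through Python's sqlite3. SQLite has no ADD_MONTHS(), so date(x, '+6 months') is used as a stand-in, and the dates are stored as ISO strings; both are substitutions made for this sketch, not Oracle behaviour:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (id INTEGER, sales_amount INTEGER, sales_date TEXT)")
conn.execute("CREATE TABLE TableB (id INTEGER, id_date TEXT)")

# Sample rows from the question, rewritten as ISO dates.
sales_dates = [
    "2014-11-20", "2014-11-22", "2014-12-22", "2015-01-22", "2015-02-22",
    "2015-03-22", "2015-04-22", "2015-05-22", "2015-06-22", "2015-07-22",
    "2015-08-22", "2015-09-22", "2015-10-22", "2015-11-22",
]
conn.executemany("INSERT INTO TableA VALUES (1, 10, ?)", [(d,) for d in sales_dates])
conn.executemany(
    "INSERT INTO TableB VALUES (?, ?)", [(1, "2014-11-22"), (2, "2014-12-01")]
)

query = """
    SELECT b.id,
           SUM(CASE WHEN a.sales_date BETWEEN b.id_date AND date(b.id_date, '+6 months')
                    THEN a.sales_amount ELSE 0 END) AS sales_6_month,
           SUM(CASE WHEN a.sales_date BETWEEN b.id_date AND date(b.id_date, '+12 months')
                    THEN a.sales_amount ELSE 0 END) AS sales_12_month
    FROM TableB AS b
    JOIN TableA AS a ON a.id = b.id
    GROUP BY b.id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

BETWEEN is inclusive at both ends, so the sales dated exactly 6 and 12 months after the start date are counted, which matches the 70 and 130 expected in the question. ID 2 has no sales in the sample, so the inner join leaves it out.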

Query to fetch max records from a table

I have a table named batch_log whose structure is as follows:
batch_id run_count start_date end_date
1 4 03/12/2014 03/12/2014
1 3 02/12/2014 02/12/2014
1 2 01/12/2014 01/12/2014
1 1 30/11/2014 30/11/2014
2 5 03/12/2014 03/12/2014
2 4 02/12/2014 02/12/2014
2 3 01/12/2014 01/12/2014
2 2 30/11/2014 30/11/2014
2 1 29/11/2014 29/11/2014
3 3 02/12/2014 02/12/2014
3 2 01/12/2014 01/12/2014
3 1 30/11/2014 30/11/2014
For each batch_id, I need to fetch the row with the maximum run_count.
The result of the query should be:
batch_id run_count start_date end_date
1 4 03/12/2014 03/12/2014
2 5 03/12/2014 03/12/2014
3 3 02/12/2014 02/12/2014
I tried many options using GROUP BY on batch_id and run_count, but could not get the correct result:
select a.* from batch_log a,batch_log b
where a.batch_id =b.batch_id
and a.run_count=b.run_count
and a.run_count in (select max(run_count) from batch_log
group by batch_id ) order by a.batch_id
Please help.

select *
from(
select a.*, max(run_count) over (partition by batch_id) max_run_count
from batch_log a)
where run_count=max_run_count;

This should also work:
SELECT * FROM batch_log b1
WHERE b1.run_count = (SELECT max(b2.run_count)
FROM batch_log b2
WHERE b2.batch_id = b1.batch_id
GROUP BY b2.batch_id)

You can do it with this query:
select *
from batch_log a
inner join (
select b.batch_id , max(run_count) as run_count
from batch_log b
group by b.batch_id
) c on a.batch_id = c.batch_id and a.run_count = c.run_count
Hope it helps

The answer given by Arion looks perfect to me. You can modify it as shown below to meet your exact requirement:
SELECT batch_id,run_count,start_date,end_date
FROM
(
SELECT
ROW_NUMBER() OVER(PARTITION BY batch_id ORDER BY run_count DESC) AS RowNbr,
batch_log.*
FROM
batch_log
) as batch
WHERE
batch.RowNbr=1
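The answers above fall into two families: match each row against the per-batch MAX(run_count), or rank the rows with ROW_NUMBER() and keep rank 1. A quick sketch comparing the two on the sample data follows, run on SQLite (3.25 or later) through Python's sqlite3, with dates rewritten as ISO strings for this example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE batch_log "
    "(batch_id INTEGER, run_count INTEGER, start_date TEXT, end_date TEXT)"
)
# (batch_id, run_count, date); start_date and end_date are equal in the sample.
sample = [
    (1, 4, "2014-12-03"), (1, 3, "2014-12-02"), (1, 2, "2014-12-01"), (1, 1, "2014-11-30"),
    (2, 5, "2014-12-03"), (2, 4, "2014-12-02"), (2, 3, "2014-12-01"), (2, 2, "2014-11-30"),
    (2, 1, "2014-11-29"),
    (3, 3, "2014-12-02"), (3, 2, "2014-12-01"), (3, 1, "2014-11-30"),
]
conn.executemany(
    "INSERT INTO batch_log VALUES (?, ?, ?, ?)", [(b, r, d, d) for b, r, d in sample]
)

# Family 1: join each row to the maximum run_count of its batch.
by_join = conn.execute("""
    SELECT a.batch_id, a.run_count, a.start_date, a.end_date
    FROM batch_log AS a
    JOIN (SELECT batch_id, MAX(run_count) AS run_count
          FROM batch_log
          GROUP BY batch_id) AS c
      ON a.batch_id = c.batch_id AND a.run_count = c.run_count
    ORDER BY a.batch_id
""").fetchall()

# Family 2: rank rows inside each batch and keep the first one.
by_row_number = conn.execute("""
    SELECT batch_id, run_count, start_date, end_date
    FROM (SELECT b.*,
                 ROW_NUMBER() OVER (PARTITION BY batch_id ORDER BY run_count DESC) AS rn
          FROM batch_log AS b) AS ranked
    WHERE rn = 1
    ORDER BY batch_id
""").fetchall()

for row in by_join:
    print(row)
```

The two approaches agree here because run_count is unique within each batch. If a batch had two rows tied on the maximum, the MAX-based queries would return both, while ROW_NUMBER() would keep only one of them.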

SQL Server: Join 2 tables, preferring results from one table where there is a conflict

I have tables that look like this:
tblConsuptionsFromA
id meter date total
1 1 03/01/2014 100.1
2 1 04/01/2014 184.1
3 1 05/01/2014 134.1
4 1 06/01/2014 132.4
5 1 07/01/2014 126.1
6 1 08/01/2014 190.1
and...
tblConsuptionsFromB
id meter date total
1 1 01/01/2014 164.1
2 1 02/01/2014 133.1
3 1 03/01/2014 136.1
4 1 04/01/2014 125.1
5 1 05/01/2014 190.1
6 1 06/01/2014 103.1
7 1 07/01/2014 164.1
8 1 08/01/2014 133.1
9 1 09/01/2014 136.1
10 1 10/01/2014 125.1
11 1 11/01/2014 190.1
I need to join these two tables, but if there is an entry for the same day in both tables, only take the result from tblConsuptionsFromA.
So the result would be:-
id source_id meter from date total
1 1 1 B 01/01/2014 164.1
2 2 1 B 02/01/2014 133.1
3 1 1 A 03/01/2014 100.1
4 2 1 A 04/01/2014 184.1
5 3 1 A 05/01/2014 134.1
6 4 1 A 06/01/2014 132.4
7 5 1 A 07/01/2014 126.1
8 6 1 A 08/01/2014 190.1
9 9 1 B 09/01/2014 136.1
10 10 1 B 10/01/2014 125.1
11 11 1 B 11/01/2014 190.1
This is beyond me, so if someone can solve... I will be very impressed.

Here's one way to do it:
SELECT
    COALESCE(a.source_id, b.source_id) AS source_id,
    COALESCE(a.meter, b.meter) AS meter,
    COALESCE(a.[from], b.[from]) AS [from],
    COALESCE(a.[date], b.[date]) AS [date],
    COALESCE(a.total, b.total) AS total
FROM (SELECT id AS source_id, meter, 'B' AS [from], [date], total
      FROM tblConsuptionsFromB) b
LEFT JOIN
     (SELECT id AS source_id, meter, 'A' AS [from], [date], total
      FROM tblConsuptionsFromA) a
  ON a.meter = b.meter AND a.[date] = b.[date]
Unfortunately, there's no shorthand like COALESCE(a.*,b.*) to apply the COALESCE to all columns. Note also that, because the join starts from tblConsuptionsFromB, a date that exists only in tblConsuptionsFromA would be dropped; a FULL JOIN would keep it.

The UNION operator is used to combine the result-set of two or more SELECT statements.
SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;
The document of UNION is here:
http://www.w3schools.com/sql/sql_union.asp
And ROW_NUMBER() returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.
ROW_NUMBER ( )
OVER ( [ PARTITION BY value_expression , ... [ n ] ] order_by_clause )
The document of ROW_NUMBER() is here:
http://technet.microsoft.com/en-us/library/ms186734.aspx
The following SQL statement uses UNION to select all records from tblConsuptionsFromA, plus those records from tblConsuptionsFromB whose date does not appear in tblConsuptionsFromA.
SELECT ROW_NUMBER() OVER(ORDER BY DATE ASC) AS 'id',
id AS 'source_id',meter, date,t AS 'from',total
FROM(
SELECT id,meter, date, 'A' AS t, total FROM tblConsuptionsFromA
UNION
SELECT id,meter, date, 'B' AS t,total FROM tblConsuptionsFromB
WHERE NOT date IN (SELECT date FROM tblConsuptionsFromA)
) AS C;
Hope this helps.

select ta.id as a_id, tb.id as b_id,
    coalesce(ta.meter, tb.meter) as meter,
    case when ta.[date] is null then 'B' else 'A' end as [from],
    coalesce(ta.[date], tb.[date]) as [date],
    case when ta.[date] is null then tb.total else ta.total end as total
from tblConsuptionsFromA ta
full join tblConsuptionsFromB tb
    on ta.meter = tb.meter and ta.[date] = tb.[date]

You would need to do a union of the two tables and exclude the records from table tblConsuptionsFromB that are also present in tblConsuptionsFromA, something like:
Select Id As Source_ID, meter, 'A' As [From], [Date], Total
FROM tblConsuptionsFromA
Union All
Select b.Id, b.meter, 'B', b.[Date], b.Total
FROM tblConsuptionsFromB b
Where NOT EXISTS (Select 1 from tblConsuptionsFromA a
                  Where a.meter = b.meter And a.[Date] = b.[Date])
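As a runnable check of the union-and-exclude idea, here is a sketch on SQLite (3.25 or later) through Python's sqlite3 instead of SQL Server. The output column is called src instead of from to avoid quoting a reserved word, and the dates are stored as ISO strings; both are choices made for this sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for name in ("tblConsuptionsFromA", "tblConsuptionsFromB"):
    conn.execute(f"CREATE TABLE {name} (id INTEGER, meter INTEGER, date TEXT, total REAL)")

# Sample data from the question: A covers 3 to 8 January, B covers 1 to 11 January.
a_totals = [100.1, 184.1, 134.1, 132.4, 126.1, 190.1]
b_totals = [164.1, 133.1, 136.1, 125.1, 190.1, 103.1, 164.1, 133.1, 136.1, 125.1, 190.1]
conn.executemany(
    "INSERT INTO tblConsuptionsFromA VALUES (?, 1, ?, ?)",
    [(i + 1, f"2014-01-{i + 3:02d}", t) for i, t in enumerate(a_totals)],
)
conn.executemany(
    "INSERT INTO tblConsuptionsFromB VALUES (?, 1, ?, ?)",
    [(i + 1, f"2014-01-{i + 1:02d}", t) for i, t in enumerate(b_totals)],
)

query = """
    SELECT ROW_NUMBER() OVER (ORDER BY date) AS id,
           source_id, meter, src, date, total
    FROM (
        SELECT id AS source_id, meter, 'A' AS src, date, total
        FROM tblConsuptionsFromA
        UNION ALL
        SELECT b.id, b.meter, 'B', b.date, b.total
        FROM tblConsuptionsFromB AS b
        WHERE NOT EXISTS (SELECT 1
                          FROM tblConsuptionsFromA AS a
                          WHERE a.meter = b.meter AND a.date = b.date)
    )
    ORDER BY date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

UNION ALL is enough here because the NOT EXISTS filter already guarantees that the two halves cannot overlap on (meter, date).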
