SQL query to sum by group and inner join two tables - sql

I have two tables like so:
Each row in both tables is uniquely identified by the columns week and city.
I want to create one table with 5 columns (week, value_a, value_b, value1, value2) and 3 rows (1 row for each week, with the value columns being summed across each city). The final table should look exactly like this:
sum_a is the sum of value_a for each week across all cities, sum_b is the sum of value_b across all cities, and so on.
Here is my SQL query:
SELECT *
FROM table1
INNER JOIN table2
ON table1.week = table2.week AND
table1.city = table2.city

If you need to sum columns across a join, aggregate each table first so that the join does not repeat rows.
Keep in mind that if a week appears in table1 but not in table2, the inner join will drop it from the result.
SELECT
A1.week,
A1.city,
A1.value1,
A1.value2,
A2.value_a,
A2.value_b
FROM (
SELECT
week,
city,
sum(value1) AS value1,
sum(value2) AS value2
FROM table1
GROUP BY week, city
) A1
INNER JOIN (
SELECT
week,
city,
sum(value_a) AS value_a,
sum(value_b) AS value_b
FROM table2
GROUP BY week, city
) A2
ON A1.week = A2.week AND A1.city = A2.city
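To get one row per week, as the question asks, the same "aggregate first, then join" idea can be applied with only week in each GROUP BY. The following is a minimal sketch, not a definitive answer: it runs the SQL through Python's sqlite3 module on made-up rows, and it assumes value1/value2 live in table1 and value_a/value_b in table2. Week 3 exists only in table1, so the inner join drops it.

```python
import sqlite3

def weekly_sums(rows1, rows2):
    """Sum each table per week first, then join the one-row-per-week results."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (week INTEGER, city TEXT, value1 INTEGER, value2 INTEGER)")
    con.execute("CREATE TABLE table2 (week INTEGER, city TEXT, value_a INTEGER, value_b INTEGER)")
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", rows1)
    con.executemany("INSERT INTO table2 VALUES (?, ?, ?, ?)", rows2)
    query = """
        SELECT a2.week, a2.sum_a, a2.sum_b, a1.sum_1, a1.sum_2
        FROM (SELECT week, SUM(value1) AS sum_1, SUM(value2) AS sum_2
              FROM table1 GROUP BY week) AS a1
        INNER JOIN (SELECT week, SUM(value_a) AS sum_a, SUM(value_b) AS sum_b
                    FROM table2 GROUP BY week) AS a2
          ON a1.week = a2.week
        ORDER BY a2.week
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

# Made-up sample data: week 3 is present in table1 only.
table1_rows = [(1, "Paris", 1, 2), (1, "Rome", 3, 4), (2, "Paris", 5, 6), (3, "Paris", 7, 8)]
table2_rows = [(1, "Paris", 10, 20), (1, "Rome", 30, 40), (2, "Paris", 50, 60)]

print(weekly_sums(table1_rows, table2_rows))
# -> [(1, 40, 60, 4, 6), (2, 50, 60, 5, 6)]
```

Note that this sums every city in each table, so it can differ from a join on (week, city) when a city is present in one table but missing from the other.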

The query below gives the expected output:
SELECT table1.week, sum(value_a) as sum_a, sum(value_b) as sum_b, sum(value1) as sum_1, sum(value2) as sum_2
FROM table1
INNER JOIN table2 ON table1.week = table2.week AND table1.city = table2.city
group by table1.week
The query can be validated with the linked db<>fiddle example.

Related

Looping/iterating based on table names in SQL Server

Currently I have 2 tables in SQL Server: tab_master (current month table) and tab_08_2022 (indicating it refers to Aug data based on the timestamp).
The tables have exactly the same structure; the only difference is that they pertain to different months.
There will be more tables coming in for subsequent months: tab_08_2022, tab_09_2022, tab_10_2022 and so on.
I need to count the number of rows for each flag (grouping by the flag column) and report the counts every month. The result should also include the data from all the previous months' tables.
For one table the query could be:
select count(*), flag_col
from tab_master
group by flag_col;
When the second table comes in the query would be
select *
from
(select count(*) as cnt_master, flag_col
from tab_master
group by flag_col) x
inner join
(select count(*) as cnt_08_2022, flag_col as flag_08_2022
from tab_08_2022
group by flag_col) y on x.flag_col = y.flag_08_2022;
When the third table comes in the query would be
select *
from
(select count(*) as cnt_master, flag_col
from tab_master
group by flag_col) x
inner join
(select count(*) as cnt_08_2022, flag_col as flag_08_2022
from tab_08_2022
group by flag_col) y on x.flag_col = y.flag_08_2022
inner join
(select count(*) as cnt_09_2022, flag_col as flag_09_2022
from tab_09_2022
group by flag_col) z on x.flag_col = z.flag_09_2022;
How do I go about doing that?
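One possible approach, sketched here under assumptions rather than as a tested SQL Server solution: read the table names from the catalog, build one UNION ALL statement over them, and return the counts in long form (one row per table and flag) instead of adding a join per month. The sketch uses Python's sqlite3 module and its sqlite_master catalog with made-up data; in SQL Server the names would come from sys.tables or INFORMATION_SCHEMA.TABLES, and the generated statement would be run as dynamic SQL, for example with sp_executesql. The function name and the name pattern are illustrative.

```python
import re
import sqlite3

# Table names cannot be bound as query parameters, so only accept expected shapes.
NAME_RE = re.compile(r"tab_(master|\d{2}_\d{4})")

def flag_counts(tables):
    """tables maps a table name to its list of flag values (made-up sample data)."""
    con = sqlite3.connect(":memory:")
    for name, flags in tables.items():
        if not NAME_RE.fullmatch(name):
            raise ValueError(f"unexpected table name: {name}")
        con.execute(f"CREATE TABLE {name} (flag_col TEXT)")
        con.executemany(f"INSERT INTO {name} VALUES (?)", [(flag,) for flag in flags])

    # Discover the tables from the catalog instead of hard-coding each month.
    names = [row[0] for row in con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    wanted = [n for n in names if NAME_RE.fullmatch(n)]
    if not wanted:
        con.close()
        return []

    # One SELECT per table, stitched together with UNION ALL.
    parts = [f"SELECT '{n}' AS source_table, flag_col, COUNT(*) AS cnt "
             f"FROM {n} GROUP BY flag_col" for n in wanted]
    query = " UNION ALL ".join(parts) + " ORDER BY source_table, flag_col"
    result = con.execute(query).fetchall()
    con.close()
    return result

sample_tables = {
    "tab_master": ["A", "A", "B"],
    "tab_08_2022": ["A", "B", "B"],
}
print(flag_counts(sample_tables))
# -> [('tab_08_2022', 'A', 1), ('tab_08_2022', 'B', 2), ('tab_master', 'A', 2), ('tab_master', 'B', 1)]
```

The long format means a new month only adds rows, not columns, so the query text never has to change shape; a pivot can be applied afterwards if a wide layout is needed.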

Best way to use CTEs to join two large tables?

I have 2 tables like this:
table1
user_id | region
th54d5d | South West
table2
user_id | date
th54d5d | South West
The tables are too big to join together, so I'm trying to use CTEs to query them. This is what I have tried:
'''
WITH a AS (
SELECT
DISTINCT region,
COUNT(DISTINCT user_id) AS users
FROM table1
GROUP BY 1
),
b AS (
SELECT
user_id AS users,
date
FROM table2
WHERE date BETWEEN '20220418' AND '20220821'
GROUP BY 1,2
)
SELECT
a.region,
a.users
FROM a
RIGHT JOIN b
ON a.users = b.users
WHERE b.date BETWEEN '20220418' AND '20220821'
GROUP BY 1,2
ORDER BY 2
;
'''
This just returns two blank columns. I'm not that great at CTEs; maybe someone can advise on the correct, or a better, way of going about this? (Amazon Redshift) Thanks!
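One thing that stands out in the query above is that a.users is a count while b.users is a user_id, so the join condition is unlikely to ever match. A hedged sketch of an alternative: filter table2 to the date range first, reduce it to distinct user_id values, join that to table1 on user_id, and only then count users per region. It is run here through Python's sqlite3 module on made-up rows rather than on Redshift, and the function name is illustrative. A CTE by itself generally does not make a join cheaper; the early date filter is what shrinks the input.

```python
import sqlite3

def users_per_region(regions, events, start, end):
    """Count distinct users per region among users active between start and end."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (user_id TEXT, region TEXT)")
    con.execute("CREATE TABLE table2 (user_id TEXT, date TEXT)")
    con.executemany("INSERT INTO table1 VALUES (?, ?)", regions)
    con.executemany("INSERT INTO table2 VALUES (?, ?)", events)
    query = """
        WITH active AS (
            -- Filter by date before joining, keeping one row per user.
            SELECT DISTINCT user_id
            FROM table2
            WHERE date BETWEEN ? AND ?
        )
        SELECT t1.region, COUNT(DISTINCT t1.user_id) AS users
        FROM table1 AS t1
        INNER JOIN active ON active.user_id = t1.user_id
        GROUP BY t1.region
        ORDER BY users, t1.region
    """
    result = con.execute(query, (start, end)).fetchall()
    con.close()
    return result

# Made-up sample data; dates are stored as YYYYMMDD strings as in the question.
region_rows = [("u1", "South West"), ("u2", "South West"), ("u3", "North"), ("u4", "South West")]
event_rows = [("u1", "20220501"), ("u1", "20220601"), ("u2", "20220101"),
              ("u3", "20220820"), ("u4", "20220710")]

print(users_per_region(region_rows, event_rows, "20220418", "20220821"))
# -> [('North', 1), ('South West', 2)]
```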

How to aggregate different CTEs in outer query SQL

I am trying to join two CTEs to get the difference in performance across countries, grouped by campaign ID. Here is my example.
Every campaign can run in several countries, so how can I group at the end to get one row per campaign ID?
CTE 1: (planned)
select
country
, campaign_id
, sum(sales) as planned_sales
from table x
group by 1,2
CTE 2: (Actual)
select
country
, campaign_id
, sum(sales) as actual_sales
from table y
group by 1,2
Outer select:
select
cte1.country,
planned_sales,
actual_sales,
planned_sales - actual_sales as diff
from cte1
join cte2
on cte1.campaign_id = cte2.campaign_id
This should do it:
select
cte1.campaign_id,
sum(cte1.planned_sales),
sum(cte2.actual_sales),
sum(cte1.planned_sales) - sum(cte2.actual_sales) as diff
from cte1
join cte2
on cte1.campaign_id = cte2.campaign_id and cte1.country = cte2.country
group by 1
I would suggest using a full join, so that all data from both tables is included, not just the rows that match in both. Your query is basically correct but it needs a group by.
select campaign_id,
sum(cte1.planned_sales) as planned_sales,
sum(cte2.actual_sales) as actual_sales,
(coalesce(sum(cte1.planned_sales), 0) -
coalesce(sum(cte2.actual_sales), 0)
) as diff
from cte1 full join
cte2
using (campaign_id, country)
group by campaign_id;
That said, there is no reason why the CTEs should aggregate by both campaign and country. They could just aggregate by campaign id -- simplifying the query and improving performance.
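A minimal sketch of that simplification, assuming tables named x and y with columns (country, campaign_id, sales) as in the question: each CTE aggregates by campaign_id only. To keep the example runnable on older SQLite builds it swaps the full join for a UNION of campaign ids followed by two left joins, which keeps campaigns that appear on either side. It also coalesces missing sums to 0, whereas the query above leaves them NULL. The data is made up and the SQL is run through Python's sqlite3 module.

```python
import sqlite3

def campaign_diff(planned_rows, actual_rows):
    """Planned vs. actual sales per campaign, keeping campaigns found on either side."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE x (country TEXT, campaign_id INTEGER, sales INTEGER)")
    con.execute("CREATE TABLE y (country TEXT, campaign_id INTEGER, sales INTEGER)")
    con.executemany("INSERT INTO x VALUES (?, ?, ?)", planned_rows)
    con.executemany("INSERT INTO y VALUES (?, ?, ?)", actual_rows)
    query = """
        WITH planned AS (
            SELECT campaign_id, SUM(sales) AS planned_sales
            FROM x GROUP BY campaign_id
        ),
        actual AS (
            SELECT campaign_id, SUM(sales) AS actual_sales
            FROM y GROUP BY campaign_id
        ),
        ids AS (
            -- Stand-in for a full join: every campaign id from either side.
            SELECT campaign_id FROM planned
            UNION
            SELECT campaign_id FROM actual
        )
        SELECT ids.campaign_id,
               COALESCE(p.planned_sales, 0) AS planned_sales,
               COALESCE(a.actual_sales, 0) AS actual_sales,
               COALESCE(p.planned_sales, 0) - COALESCE(a.actual_sales, 0) AS diff
        FROM ids
        LEFT JOIN planned AS p ON p.campaign_id = ids.campaign_id
        LEFT JOIN actual AS a ON a.campaign_id = ids.campaign_id
        ORDER BY ids.campaign_id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

# Made-up sample data: campaign 2 has no actuals, campaign 3 has no plan.
planned_sample = [("DE", 1, 100), ("FR", 1, 50), ("DE", 2, 70)]
actual_sample = [("DE", 1, 90), ("FR", 1, 40), ("US", 3, 30)]

print(campaign_diff(planned_sample, actual_sample))
# -> [(1, 150, 130, 20), (2, 70, 0, 70), (3, 0, 30, -30)]
```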

SQL GROUP BY ( DATEPART(), field1 ) result set to zero nulls

I want to aggregate counts, grouped by a datepart and column.
For example, a table with 3 columns with each row representing a unique event: id, name, date
I want to select total counts grouped by name and hour, with zeros when there are no events. If I'm only grouping by name, I can join it with a table of every name. With an hour I could do something similar.
How would I handle the case of grouping by both without having a table with a row for every name+hour combination?
The following is a MySQL solution:
create table hours (hour int)
insert hours (hour) values (0), (1) .... (23)
select hour, name, sum(case when name is null then 0 else 1 end)
from hours left outer join
event on (hour(event.date) = hours.hour)
group by hour, name
The sum(case when name is null then 0 else 1 end) expression handles the case when there are no events for a particular hour and name: the count will show as 0. Otherwise, each matching row contributes 1 to the sum.
For SQL Server, use datepart(hour, event.date) instead. The rest should be similar.
You can use cross join to generate all the rows and then other logic to fill in the values:
select h.hour, n.name, count(a.name) as cnt
from (select distinct hour(date) as hour from atable) h cross join
(select distinct name from atable) n left join
atable a
on hour(a.date) = h.hour and a.name = n.name
group by h.hour, n.name;
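Here is a small runnable sketch of the cross join approach, using Python's sqlite3 module and made-up events. SQLite has no hour() function, so strftime('%H', ...) cast to an integer stands in for it; the table and column names follow the query above.

```python
import sqlite3

def counts_by_hour_and_name(events):
    """Count events for every (hour, name) combination, with 0 where none exist."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE atable (id INTEGER, name TEXT, date TEXT)")
    con.executemany("INSERT INTO atable VALUES (?, ?, ?)", events)
    query = """
        SELECT h.hour, n.name, COUNT(a.name) AS cnt
        FROM (SELECT DISTINCT CAST(strftime('%H', date) AS INTEGER) AS hour
              FROM atable) AS h
        CROSS JOIN (SELECT DISTINCT name FROM atable) AS n
        LEFT JOIN atable AS a
          ON CAST(strftime('%H', a.date) AS INTEGER) = h.hour
         AND a.name = n.name
        GROUP BY h.hour, n.name
        ORDER BY h.hour, n.name
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

# Made-up sample data: ann has two events at 09h, bob has one at 10h.
event_sample = [
    (1, "ann", "2024-01-01 09:15:00"),
    (2, "ann", "2024-01-01 09:45:00"),
    (3, "bob", "2024-01-01 10:05:00"),
]

print(counts_by_hour_and_name(event_sample))
# -> [(9, 'ann', 2), (9, 'bob', 0), (10, 'ann', 0), (10, 'bob', 1)]
```

COUNT(a.name) skips the NULLs produced by the left join, which is what yields the zeros. Hours in which nobody has an event still do not appear, because the hour list is derived from the data; a separate hours table with 0 to 23 would cover those.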

How to join three select queries which has one common column

I have three select queries, shown below, each of which gives its own output:
select DATE_FORMAT(table1.value_date,'%b')as Month,
DATE_FORMAT(table1.value_date,'%Y') as Year,
table1.open as Open
from index_main as table1
join ( select min(`value_date`) `value_date`
from index_main
group by month(`value_date`), year( `value_date`)
) as table2 on table1.`value_date` = table2.`value_date`
Output columns - Month,year,open
select DATE_FORMAT(table1.value_date,'%b')as Month,
DATE_FORMAT(table1.value_date,'%Y') as Year,
table1.close as Close
from index_main as table1
join ( select max(`value_date`) `value_date`
from index_main group by month(`value_date`), year( `value_date`)
) as table2 on(table1.`value_date` = table2.`value_date`)
Output columns - Month,year,close
select DATE_FORMAT(table1.value_date,'%b')as Month,
DATE_FORMAT(table1.value_date,'%Y') as Year,
max(table1.high) as High,
min(table1.low) as Low
FROM `index_main` as table1
GROUP BY month(table1.value_date), year(table1.value_date)
ORDER BY year(table1.value_date) desc, month(table1.value_date) desc
Output columns - Month,year,high,low
I want to join these three select queries based on the common columns i.e month & year.
My final result should have the following columns - month,year,open,close,high,low.
Try this.
First create 3 views, one with each query (vw1, vw2 and vw3). Then use a query like this:
SELECT vw1.Month, vw1.Year, Open, Close, High
FROM vw1
LEFT JOIN vw2 ON vw1.Year = vw2.Year AND vw1.Month = vw2.Month
LEFT JOIN vw3 ON vw1.Year = vw3.Year AND vw1.Month = vw3.Month
Hope this helps you.
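For illustration, the same join can be tried end to end with CTEs standing in for the three views. This is only a sketch under assumptions: it uses Python's sqlite3 module with made-up prices, it assumes index_main has value_date, open, high, low and close columns, and it returns the month as a two-digit number because strftime is used in place of MySQL's DATE_FORMAT.

```python
import sqlite3

def monthly_ohlc(rows):
    """Open, close, high and low per month, joining three per-month result sets."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE index_main (value_date TEXT, open INTEGER, "
                "high INTEGER, low INTEGER, close INTEGER)")
    con.executemany("INSERT INTO index_main VALUES (?, ?, ?, ?, ?)", rows)
    query = """
        WITH vw1 AS (  -- open on the first date of each month
            SELECT strftime('%m', t.value_date) AS month,
                   strftime('%Y', t.value_date) AS year, t.open
            FROM index_main AS t
            JOIN (SELECT MIN(value_date) AS value_date FROM index_main
                  GROUP BY strftime('%Y-%m', value_date)) AS m
              ON m.value_date = t.value_date
        ),
        vw2 AS (  -- close on the last date of each month
            SELECT strftime('%m', t.value_date) AS month,
                   strftime('%Y', t.value_date) AS year, t.close
            FROM index_main AS t
            JOIN (SELECT MAX(value_date) AS value_date FROM index_main
                  GROUP BY strftime('%Y-%m', value_date)) AS m
              ON m.value_date = t.value_date
        ),
        vw3 AS (  -- highest high and lowest low of each month
            SELECT strftime('%m', value_date) AS month,
                   strftime('%Y', value_date) AS year,
                   MAX(high) AS high, MIN(low) AS low
            FROM index_main
            GROUP BY strftime('%Y', value_date), strftime('%m', value_date)
        )
        SELECT vw1.month, vw1.year, vw1.open, vw2.close, vw3.high, vw3.low
        FROM vw1
        LEFT JOIN vw2 ON vw2.year = vw1.year AND vw2.month = vw1.month
        LEFT JOIN vw3 ON vw3.year = vw1.year AND vw3.month = vw1.month
        ORDER BY vw1.year DESC, vw1.month DESC
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

# Made-up sample data: (value_date, open, high, low, close).
price_rows = [
    ("2024-01-02", 100, 110, 95, 105),
    ("2024-01-31", 106, 120, 101, 115),
    ("2024-02-01", 116, 118, 90, 92),
    ("2024-02-29", 93, 99, 88, 97),
]

print(monthly_ohlc(price_rows))
# -> [('02', '2024', 116, 97, 118, 88), ('01', '2024', 100, 115, 120, 95)]
```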