Sum multiple column with PARTITION from single table

Sum multiple column with PARTITION from single table - sql

I have a question, it seems simple but I can't figure it out.
I have a sample table like this:
Overtime Table (OT)
+----------+------------+----------+-------------+
|EmployeeId|OvertimeDate|HourMargin|OvertimePoint|
+----------+------------+----------+-------------+
| 1| 2020-07-01| 05:00| 15|
| 1| 2020-07-02| 03:00| 9|
| 2| 2020-07-01| 01:00| 3|
| 2| 2020-07-03| 03:00| 9|
| 3| 2020-07-06| 03:00| 9|
| 3| 2020-07-07| 01:00| 3|
+----------+------------+----------+-------------+
OLC Table (OLC)
+----------+------------+-----+------+
|EmployeeId| OLCDate | OLC | Trip |
+----------+------------+-----+------+
| 1| 2020-07-01| 2| 0|
| 3| 2020-07-13| 3| 6|
+----------+------------+-----+------+
So, based on that tables, I want to calculate total OT.HourMargin, OT.OTPoint, OLC.OLC, and OLC.Trip with the final result like this:
Result
+----------+-----------+----------+--------+----------+
|EmployeeId|TotalMargin|TotalPoint|TotalOLC|TotalPoint|
+----------+-----------+----------+--------+----------+
| 1| 08:00| 24| 2| 0|
| 2| 04:00| 12| 0| 0|
| 3| 04:00| 24| 3| 6|
+----------+-----------+----------+--------+----------+
Here is the query that I try to achieve the result:
DECLARE #Overtime TABLE (
EmployeeId INT,
OvertimeDate DATE,
HourMargin TIME,
OvertimePoint INT
)
DECLARE #OLC TABLE (
EmployeeId INT,
OLCDate DATE,
OLC INT,
Trip INT
)
INSERT INTO #Overtime VALUES (1, '2020-07-01', '05:00:00', 15)
INSERT INTO #Overtime VALUES (1, '2020-07-02', '03:00:00', 9)
INSERT INTO #Overtime VALUES (2, '2020-07-01', '01:00:00', 3)
INSERT INTO #Overtime VALUES (2, '2020-07-03', '03:00:00', 9)
INSERT INTO #Overtime VALUES (3, '2020-07-06', '03:00:00', 9)
INSERT INTO #Overtime VALUES (3, '2020-07-07', '01:00:00', 3)
INSERT INTO #OLC VALUES (1, '2020-07-01', 2, 0)
INSERT INTO #OLC VALUES (3, '2020-07-13', 3, 6)
SELECT
OT.EmployeeId,
CONVERT(TIME, DATEADD(MS, (SUM(DATEDIFF(MS, '00:00:00.000', OT.HourMargin)) OVER (PARTITION BY OT.EmployeeId)), '00:00:00.000')) AS TotalMargin,
SUM(OT.OvertimePoint) OVER (PARTITION BY OT.EmployeeId) AS TotalPoint,
SUM(OLC.OLC) OVER (PARTITION BY OLC.EmployeeId) AS TotalOLC,
SUM(OLC.Trip) OVER (PARTITION BY OLC.EmployeeId) AS TotalTrip
FROM
#Overtime OT
LEFT JOIN #OLC OLC ON OLC.EmployeeId = OT.EmployeeId
AND OLC.OLCDate = OT.OvertimeDate
ORDER BY
EmployeeId
Here is the result from my query:
+----------+-----------+----------+--------+----------+
|EmployeeId|TotalMargin|TotalPoint|TotalOLC|TotalPoint|
+----------+-----------+----------+--------+----------+
| 1| 08:00| 24| NULL| NULL|
| 1| 08:00| 24| 2| 0|
| 2| 04:00| 12| NULL| NULL|
| 2| 04:00| 12| NULL| NULL|
| 3| 04:00| 12| NULL| NULL|
| 3| 04:00| 12| NULL| NULL|
+----------+-----------+----------+--------+----------+
It seems when I try to SUM multiple columns from single table, it will create multiple rows in the final result. Right now, what came across to my mind is using CTE, separate the multiple column into multiple CTE's and querying from all CTE's. Or even try to create temp table/table variable, query the sum's from each column and store/update it.
So, any idea how to achieve my result without using multiple CTE's or temp tables?
Thank You

You want to group together rows that belong to the same EmployeeID, so this implies aggregation rather than window functions:
SELECT
OT.EmployeeId,
CONVERT(TIME, DATEADD(MS, SUM(DATEDIFF(MS, '00:00:00.000', OT.HourMargin)), '00:00:00.000')) AS TotalMargin,
SUM(OT.OvertimePoint) AS TotalPoint,
COALESCE(SUM(OLC.OLC), 0) AS TotalOLC,
COALESCE(SUM(OLC.Trip), 0) AS TotalTrip
FROM #Overtime OT
LEFT JOIN #OLC OLC ON OLC.EmployeeId = OT.EmployeeId
GROUP BY OT.EmployeeId
I also don't see the point for the join condition on the dates, so I removed it. Finally, you can use coalesce() to return 0 for rows that have no OLC.
Demo on DB Fiddle:
EmployeeId | TotalMargin | TotalPoint | TotalOLC | TotalTrip
---------: | :---------- | ---------: | -------: | --------:
1 | 08:00:00 | 24 | 4 | 0
2 | 04:00:00 | 12 | 0 | 0
3 | 04:00:00 | 12 | 6 | 12

You've decided to use SUM OVER but you're experiencing the "problem" of multiple rows... that's what a sum over does; you can conceive that doing an OVER(PARTITION..) does a group by that is auto joined back to the driving table so you end up with all the rows from the driving table together with repeated results of the summation
Here is a simple data set:
ProductID, Price
1, 100
1, 200
2, 300
2, 400
Here are some queries and results:
--perform a basic group and sum
SELECT ProductID, SUM(Price) S FROM x GROUP BY ProductID
1, 300
2, 700
--perform basic group/sum and join it back to the main table
SELECT ProductID, Price, S
FROM
x
INNER JOIN
(SELECT ProductID, SUM(Price) s FROM x GROUP BY ProductID) y
ON x.ProductID = y.ProductID
1, 100, 300
1, 200, 300
2, 300, 700
2, 400, 700
--perform a sum over, the partition here being the same as the earlier group
SELECT ProductID, Price, SUM(Price) OVER(PARTITION BY ProductID) FROM x
1, 100, 300
1, 200, 300
2, 300, 700
2, 400, 700
You can see the latter two produce the same result, extra rows with the total appended. It may help you understand simple window functions if you conceive that this is what he db does internally - it takes the "partition by", does a subquery group by with it, and joins the results back on whatever columns were in the partition
It looks like what you really want is a simple group:
SELECT
OT.EmployeeId,
CONVERT(TIME, DATEADD(MS, (SUM(DATEDIFF(MS, '00:00:00.000', OT.HourMargin))), '00:00:00.000')) AS TotalMargin,
SUM(OT.OvertimePoint) AS TotalPoint,
SUM(OLC.OLC) AS TotalOLC,
SUM(OLC.Trip) AS TotalTrip
FROM #Overtime OT
LEFT JOIN #OLC OLC ON OLC.EmployeeId = OT.EmployeeId
AND OLC.OLCDate = OT.OvertimeDate
GROUP BY OT.EmployeeID

Related

How to use window function in Redshift?

I have 2 tables:
| Product |
|:----: |
| product_id |
| source_id|
Source
source_id
priority
sometimes there are cases when 1 product_id can contain few sources and my task is to select data with min priority from for example
| product_id | source_id| priority|
|:----: |:------:| :-----:|
| 10| 2| 9|
| 10| 4| 2|
| 20| 2| 9|
| 20| 4| 2|
| 30| 2| 9|
| 30| 4| 2|
correct result should be like:
| product_id | source_id| priority|
|:----: |:------:| :-----:|
| 10| 4| 2|
| 20| 4| 2|
| 30| 4| 2|
I am using query:
SELECT p.product_id, p.source_id, s.priority FROM Product p
INNER JOIN Source s on s.source_id = p.source_id
WHERE s.priority = (SELECT Min(s1.priority) OVER (PARTITION BY p.product_id) FROM Source s1)
but it returns error "this type of correlated subquery pattern is not supported yet" so as i understand i can't use such variant in Redshift, how should it be solved, are there any other ways?

You just need to unroll the where clause into the second data source and the easiest flag for min priority is to use the ROW_NUMBER() window function. You're asking Redshift to rerun the window function for each JOIN ON test which creates a lot of inefficiencies in clustered database. Try the following (untested):
SELECT p.product_id, p.source_id, s.priority
FROM Product p
INNER JOIN (
SELECT ROW_NUMBER() OVER (PARTITION BY p.product_id, order by s1.priority) as row_num,
source_id,
priority
FROM Source) s
on s.source_id = p.source_id
WHERE row_num = 1
Now the window function only runs once. You can also move the subquery to a CTE if that improve readability for your full case.

Already found best solution for that case:
SELECT
p.product_id
, p.source_id
, s.priority
, Min(s.priority) OVER (PARTITION BY p.product_id) as min_priority
FROM Product p
INNER JOIN Source s
ON s.source_id = p.source_id
WHERE s.priority = p.min_priority

SQL Server create Unpivot table with where condition

Here is my sample table
---+-------------+------------------+------------------+------------------+--------------------+--------------------+--------------------+
| Id| CompanyName|part1_sales_amount|part2_sales_amount|part3_sales_amount|part1_sales_quantity|part2_sales_quantity|part3_sales_quantity|
+---+-------------+------------------+------------------+------------------+--------------------+--------------------+--------------------+
| 1| FastCarsCo| 1| 2| 3| 4| 5| 6|
| 2|TastyCakeShop| 4| 5| 6| 4| 5| 6|
| 3| KidsToys| 7| 8| 9| 7| 8| 9|
| 4| FruitStall| 10| 11| 12| 10| 11| 12|
+---+-------------+------------------+------------------+------------------+--------------------+--------------------+--------------------+
Here is output table that i want
+---+-------------+------------------+------------------+------------------+
| Id| CompanyName|Account |amount |quantity |
+---+-------------+------------------+------------------+------------------+
| 1| FastCarsCo| part1_sales| 1| 1|
| 1| FastCarsCo| part2_sales| 2| 2|
| 1| FastCarsCo| part3_sales| 3| 3|
| 2|TastyCakeShop| part1_sales| 4| 4|
| 2|TastyCakeShop| part2_sales| 5| 5|
| 2|TastyCakeShop| part3_sales| 6| 6|
| 3| KidsToys| part1_sales| 7| 7|
| 3| KidsToys| part2_sales| 8| 8|
| 3| KidsToys| part3_sales| 9| 9|
| 4| FruitStall| part1_sales| 10| 10|
| 4| FruitStall| part2_sales| 11| 11|
| 4| FruitStall| part3_sales| 12| 12|
+---+-------------+------------------+------------------+------------------+
Things I already did
SELECT
Id,
CompanyName,
REPLACE ( acc , '_amount' , '' ) AS Account,
amount,
quantity
FROM
(
SELECT Id, CompanyName, part1_sales_amount ,part2_sales_amount ,part3_sales_amount ,part1_sales_quantity ,part2_sales_quantity ,part3_sales_quantity
FROM privot
) src
UNPIVOT
(
amount FOR acc IN (part1_sales_amount ,part2_sales_amount ,part3_sales_amount )
) pvt1
UNPIVOT
(
quantity FOR acc1 IN (part1_sales_quantity, part2_sales_quantity, part3_sales_quantity )
) pvt2
It gave some result but it seems like there is some unexpected record also(Like cross join). so my final step the WHERE clause, what should I write in WHERE clause.I tried many thing but non is a correct one.
Note: In my real data base here are almost 200 column like those part1_sales_amount and part1_sales_quantity
Please any help appreciate.

You can use apply :
select t.id, t.companyname, tt.amount, tt.qty
from table t cross apply
( values (t.part1_sales_amount, t.part1_sales_quantity),
(t.part2_sales_amount, t.part2_sales_quantity),
(t.part3_sales_amount, t.part3_sales_quantity),
. . .
) tt(amount, qty);

SELECT Id, CompanyName, account, amount, quantity
FROM MyTable
CROSS APPLY (
SELECT account = 'part1_sales_amount', amount = part1_sales_amount, quantity = part1_sales_quantity
UNION ALL
SELECT account = 'part2_sales_amount', amount = part2_sales_amount, quantity = part2_sales_quantity
UNION ALL
SELECT account = 'part3_sales_amount', amount = part3_sales_amount, quantity = part3_sales_quantity
) AS AnotherData

Single unpivot and choose the corresponding quantity column:
declare #privot table
(
id int,
CompanyName varchar(20),
part1_sales_amount money,
part2_sales_amount money,
part3_sales_amount money,
part4_sales_amount money,
part5_sales_amount money,
part1_sales_quantity int,
part2_sales_quantity int,
part3_sales_quantity int,
part4_sales_quantity int,
part5_sales_quantity int
);
insert into #privot
(
Id, CompanyName,
part1_sales_amount, part2_sales_amount, part3_sales_amount, part4_sales_amount, part5_sales_amount,
part1_sales_quantity, part2_sales_quantity, part3_sales_quantity, part4_sales_quantity, part5_sales_quantity
)
values
(1, 'FastCarsCo', 1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
(2, 'TastyCakeShop', 10, 20, 30, 40, 50, 60, 70, 80, 90, 100),
(3, 'KidsToys', 11, 21, 31, 41, 51, 61, 71, 81, 91, 101),
(4, 'FruitStall', 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000);
select
Id, CompanyName, replace(acc, '_amount', '') as acc, amount,
quantity=choose(/*try_cast ??*/replace(left(acc, charindex('_', acc)-1), 'part', ''), /*quantity columns*/part1_sales_quantity, part2_sales_quantity, part3_sales_quantity, part4_sales_quantity, part5_sales_quantity)
FROM
(
SELECT *
--Id, CompanyName,
--part1_sales_amount, part2_sales_amount, part3_sales_amount, part4_sales_amount, part5_sales_amount,
--part1_sales_quantity ,part2_sales_quantity ,part3_sales_quantity , part4_sales_quantity, part5_sales_quantity
FROM #privot
) src
UNPIVOT
(
amount FOR acc IN (/*amount columns*/part1_sales_amount ,part2_sales_amount ,part3_sales_amount, part4_sales_amount, part5_sales_amount )
) pvt1;

Sql server select query of with ids, Count of ids grouped by casted date from datetime

i am struggling to find a right way to write as select query that produces a count of ids with unique date, i have Log table as
id| DateTime
1|23-03-2019 18:27:45|
1|23-03-2019 18:27:45|
2|23-03-2019 18:27:50|
2|23-03-2019 18:27:51|
2|23-03-2019 18:28:01|
3|23-03-2019 18:33:15|
1|24-03-2019 18:13:18|
2|23-03-2019 18:27:12|
2|23-03-2019 15:27:46|
3|23-03-2019 18:21:58|
3|23-03-2019 18:21:58|
4|24-03-2019 10:11:14|
What i have am tried
select id, count(cast(DateTime as DATE)) as Counts from Logs group by id
its producing proper count of ids with id like
id|count
1 | 2|
2 | 3|
3 | 1|
1 | 1|
2 | 2|
3 | 2|
4 | 1|
What i want is to add datetime column casted as date
id|count|Date
1 | 2| 23-03-2019
2 | 3| 23-03-2019
3 | 1| 23-03-2019
1 | 1| 24-03-2019
2 | 2| 24-03-2019
3 | 2| 24-03-2019
4 | 1| 24-03-2019
However i get an error saying
Column 'Logs.DateTime' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
when i try
select id, count(cast(DateTime as DATE)) as Counts from Logs group by id

You need to add cast(DateTime as DATE) also in group by
select id,cast(DateTime as DATE) as dateval, count(cast(DateTime as DATE)) as Counts
from Logs
group by id,cast(DateTime as DATE)

SQL - Pivot or Unpivot?

Another time, another problem. I have the following table:
|assemb.|Repl_1|Repl_2|Repl_3|Repl_4|Repl_5|Amount_1|Amount_2|Amount_3|Amount_4|Amount_5|
|---------------------------------------------------------------------------------------|
|4711001|111000|222000|333000|444000|555000| 1| 1| 1| 1| 1|
|---------------------------------------------------------------------------------------|
|4711002|222000|333000|444000|555000|666000| 1| 1| 1| 1| 1|
|---------------------------------------------------------------------------------------|
And here what I need:
|Article|Amount|
|--------------|
| 111000| 1|
|--------------|
| 222000| 2|
|--------------|
| 333000| 2|
|--------------|
| 444000| 2|
|--------------|
| 555000| 2|
|--------------|
| 666000| 1|
|---------------
Repl_1 to Repl_10 are replacement-articles of the assembly. I can have n assemblies with to 10 rep-articles. At the end I need to overview all articles with there amounts of all assemblies.
THX.
Best greetz
Vegeta

This is probably the quickest way of achieving it using UNION ALL. However, I'd recommend normalising your table
SELECT Article, SUM(Amount) FROM (
SELECT Repl_1 AS Article, SUM(Amount_1) AS Amount FROM #Test GROUP BY Repl_1
UNION ALL
SELECT Repl_2 AS Article, SUM(Amount_2) AS Amount FROM #Test GROUP BY Repl_2
UNION ALL
SELECT Repl_3 AS Article, SUM(Amount_3) AS Amount FROM #Test GROUP BY Repl_3
UNION ALL
SELECT Repl_4 AS Article, SUM(Amount_4) AS Amount FROM #Test GROUP BY Repl_4
UNION ALL
SELECT Repl_5 AS Article, SUM(Amount_5) AS Amount FROM #Test GROUP BY Repl_5
) tbl GROUP BY Article

SQL query for finding the most frequent value of a grouped by value

I'm using SQLite browser, I'm trying to find a query that can find the max of each grouped by a value from another column from:
Table is called main
| |Place |Value|
| 1| London| 101|
| 2| London| 20|
| 3| London| 101|
| 4| London| 20|
| 5| London| 20|
| 6| London| 20|
| 7| London| 20|
| 8| London| 20|
| 9| France| 30|
| 10| France| 30|
| 11| France| 30|
| 12| France| 30|
The result I'm looking for is the finding the most frequent value grouping by place:
| |Place |Most Frequent Value|
| 1| London| 20|
| 2| France| 30|
Or even better
| |Place |Most Frequent Value|Largest Percentage|2nd Largest Percentage|
| 1| London| 20| 0.75| 0.25|
| 2| France| 30| 1| 0.75|

You can group by place, then value, and order by frequency eg.
select place,value,count(value) as freq from cars group by place,value order by place, freq;
This will not give exactly the answer you want, but near to it like
London | 101 | 2
France | 30 | 4
London | 20 | 6
Now select place and value from this intermediate table and group by place, so that only one row per place is displayed.
select place,value from
(select place,value,count(value) as freq from cars group by place,value order by place, freq)
group by place;
This will produce the result like following:
France | 30
London | 20
This works for sqlite. But for some other programs, it might not work as expected and return the place and value with least frequency. In those, you can put order by place, freq desc instead to solve your problem.

The first part would be something like this.
http://sqlfiddle.com/#!7/ac182/8
with tbl1 as
(select a.place,a.value,count(a.value) as val_count
from table1 a
group by a.place,a.value
)
select t1.place,
t1.value as most_frequent_value
from tbl1 t1
inner join
(select place,max(val_count) as val_count from tbl1
group by place) t2
on t1.place=t2.place
and t1.val_count=t2.val_count
Here we are deriving tbl1 which will give us the count of each place and value combination. Now we will join this data with another derived table t2 which will find the max count and we will join this data to get the required result.
I am not sure how do you want the percentage in second output, but if you understood this query, you can use some logic on top of it do derive the required output. Play around with the sqlfiddle. All the best.

RANK
SQLite now supports RANK, so we can use the exact same syntax that works on PostgreSQL, similar to https://stackoverflow.com/a/12448971/895245
SELECT "city", "value", "cnt"
FROM (
SELECT
"city",
"value",
COUNT(*) AS "cnt",
RANK() OVER (
PARTITION BY "city"
ORDER BY COUNT(*) DESC
) AS "rnk"
FROM "Sales"
GROUP BY "city", "value"
) AS "sub"
WHERE "rnk" = 1
ORDER BY
"city" ASC,
"value" ASC
This would return all in case of tie. To return just one you could use ROW_NUMBER instead of RANK.
Tested on SQLite 3.34.0 and PostgreSQL 14.3. GitHub upstream.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Sum multiple column with PARTITION from single table - sql

Related

How to use window function in Redshift?

SQL Server create Unpivot table with where condition

Sql server select query of with ids, Count of ids grouped by casted date from datetime

SQL - Pivot or Unpivot?

SQL query for finding the most frequent value of a grouped by value

Categories

Resources