Incremental Sum across different groups - sql

I am trying to figure out how to count every product at every date so that the count is cumulative for each product.
This is a dummy table for illustration; I have millions of records with thousands of different products.
I can't work out a query that gives, at every date, the running count for each product, along with the miles for that date.
CREATE TABLE Dummy_tab (
empid int,
date1_start date,
name_emp varchar(255),
product varchar(255),
miles varchar(20)
);
INSERT INTO Dummy_tab VALUES
(1, '2018-08-27', 'Eric', 'a',10),
(1, '2018-08-28', 'Eric','b',10),
(1, '2018-08-28', 'Eric','a',20),
(2, '2020-01-08', 'Jack','d',10),
(2, '2020-02-08', 'Jack','b',20),
(2, '2020-12-28', 'Jack','b',20),
(2, '2020-12-28', 'Jack','d',20),
(2,'2021-10-28', 'Jack','c',20),
(2, '2022-12-28', 'Jack','d',20),
(3, '2018-12-31', 'Jane','',10),
(3, '2018-12-31', 'Jane','',15);
My desired O/p is this
Id Date a b c d empty miles
1 2018-08-27 1 0 0 0 0 10
1 2018-08-28 2 1 0 0 0 20
2 2020-01-08 0 0 0 1 0 10
2 2020-02-08 0 1 0 1 0 20
2 2020-12-28 0 2 0 2 0 20
2 2021-10-28 0 2 1 2 0 20
2 2022-12-28 0 2 1 3 0 20
3 2018-12-31 0 0 0 0 1 10
3 2019-12-31 0 0 0 0 2 15
For example:
Eric (ID = 1) has three entries: product a on 2018-08-27, product b on 2018-08-28, and product a on 2018-08-28.
So the first row has a = 1 for ID = 1. The second row adds the current entries to the previous ones, so a = 2 for ID = 1, and b = 1 because there was no earlier entry for b.
Miles needs to be the maximum miles for that date, taking past dates into account.

You need to first (conditionally) aggregate your values here, and then you can do a cumulative SUM:
WITH Aggregates AS(
SELECT empid AS Id,
date1_start AS [Date],
COUNT(CASE product WHEN 'a' THEN 1 END) AS a,
COUNT(CASE product WHEN 'b' THEN 1 END) AS b,
COUNT(CASE product WHEN 'c' THEN 1 END) AS c,
COUNT(CASE product WHEN 'd' THEN 1 END) AS d,
COUNT(CASE product WHEN '' THEN 1 END) AS empty,
MAX(CAST(miles AS int)) AS miles -- miles is varchar, so compare it as a number
FROM dbo.Dummy_tab
GROUP BY empid, date1_start)
SELECT Id,
[Date],
SUM(a) OVER (PARTITION BY Id ORDER BY [Date]) AS a,
SUM(b) OVER (PARTITION BY Id ORDER BY [Date]) AS b,
SUM(c) OVER (PARTITION BY Id ORDER BY [Date]) AS c,
SUM(d) OVER (PARTITION BY Id ORDER BY [Date]) AS d,
SUM(empty) OVER (PARTITION BY Id ORDER BY [Date]) AS empty,
miles
FROM Aggregates
ORDER BY ID,
[Date];
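To check the two-step query above on the sample data, here is a minimal sketch that runs the same conditional-aggregate-then-running-SUM logic in an in-memory SQLite database from Python. It assumes SQLite (3.25 or later, for window functions) as a stand-in for SQL Server, so `[Date]` is renamed `start_date` and the `dbo.` schema is dropped; `running_product_counts` is a hypothetical helper name. It also reads "maximum miles ... from past dates" as a running maximum, which is an assumption. With the data exactly as inserted, Jane's two rows share the date 2018-12-31 and therefore collapse into one row with empty = 2.

```python
import sqlite3

# Sample data from the question: (empid, date1_start, name_emp, product, miles).
# Dates are zero-padded so that text ordering equals date ordering.
ROWS = [
    (1, "2018-08-27", "Eric", "a", "10"),
    (1, "2018-08-28", "Eric", "b", "10"),
    (1, "2018-08-28", "Eric", "a", "20"),
    (2, "2020-01-08", "Jack", "d", "10"),
    (2, "2020-02-08", "Jack", "b", "20"),
    (2, "2020-12-28", "Jack", "b", "20"),
    (2, "2020-12-28", "Jack", "d", "20"),
    (2, "2021-10-28", "Jack", "c", "20"),
    (2, "2022-12-28", "Jack", "d", "20"),
    (3, "2018-12-31", "Jane", "", "10"),
    (3, "2018-12-31", "Jane", "", "15"),
]

QUERY = """
WITH Aggregates AS (
    -- Step 1: one row per (employee, date) with a conditional count per product.
    SELECT empid AS id,
           date1_start AS start_date,
           COUNT(CASE product WHEN 'a' THEN 1 END) AS a,
           COUNT(CASE product WHEN 'b' THEN 1 END) AS b,
           COUNT(CASE product WHEN 'c' THEN 1 END) AS c,
           COUNT(CASE product WHEN 'd' THEN 1 END) AS d,
           COUNT(CASE product WHEN '' THEN 1 END) AS empty,
           MAX(CAST(miles AS INTEGER)) AS miles  -- miles is text: compare as numbers
    FROM Dummy_tab
    GROUP BY empid, date1_start
)
-- Step 2: running totals per employee in date order.
SELECT id, start_date,
       SUM(a) OVER (PARTITION BY id ORDER BY start_date),
       SUM(b) OVER (PARTITION BY id ORDER BY start_date),
       SUM(c) OVER (PARTITION BY id ORDER BY start_date),
       SUM(d) OVER (PARTITION BY id ORDER BY start_date),
       SUM(empty) OVER (PARTITION BY id ORDER BY start_date),
       -- Running maximum (assumed reading of "maximum miles ... from past dates").
       MAX(miles) OVER (PARTITION BY id ORDER BY start_date)
FROM Aggregates
ORDER BY id, start_date
"""


def running_product_counts(rows):
    """Load rows into an in-memory table and return the cumulative counts."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Dummy_tab (empid INTEGER, date1_start TEXT,"
        " name_emp TEXT, product TEXT, miles TEXT)"
    )
    con.executemany("INSERT INTO Dummy_tab VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in running_product_counts(ROWS):
    print(row)
```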

Related

SQL get rank of ordered data

I have a data set that looks like this (ordered by date):
date value first_id second_id
2020-01-01 10 1 1
2020-01-02 15 1 1
2020-01-03 5 1 2
2020-01-04 75 2 2
2020-01-05 101 2 2
2020-01-06 12 1 1
2020-01-07 5 1 1
2020-01-08 14 1 2
I need to aggregate consecutive rows that have the same first_id and second_id (say, max(value)), so that I get:
max_value first_id second_id
15 1 1
5 1 2
101 2 2
12 1 1
14 1 2
If you do max(value) with GROUP BY, rows with the same first_id and second_id combination collapse into a single row, regardless of date order.
I was thinking of adding a RANK that increments whenever one of the ids changes, e.g.:
date value first_id second_id rank
2020-01-01 10 1 1 1
2020-01-02 15 1 1 1
2020-01-03 5 1 2 2
2020-01-04 75 2 2 3
2020-01-05 101 2 2 3
2020-01-06 12 1 1 4
2020-01-07 5 1 1 4
2020-01-08 14 1 2 5
But I don't know how to compute that rank either, since identical id combinations are treated as one group.
You can use lag() and a cumulative sum to define the groups and then aggregate. You can see the groups if you run this query:
select t.*,
sum(case when prev_date = prev_date2 then 0 else 1 end) over (order by date) as grp
from (select t.*,
lag(date) over (order by date) as prev_date,
lag(date) over (partition by first_id, second_id order by date) as prev_date2
from t
) t;
The logic: a new group starts whenever the previous row (by date) does not have the same values in the two id columns.
Then the aggregation is:
with grps as (
select t.*,
sum(case when prev_date = prev_date2 then 0 else 1 end) over (order by date) as grp
from (select t.*,
lag(date) over (order by date) as prev_date,
lag(date) over (partition by first_id, second_id order by date) as prev_date2
from t
) t
)
select first_id, second_id, max(value) as max_value, min(date) as min_date, max(date) as max_date
from grps
group by grp, first_id, second_id
order by min_date;
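The gaps-and-islands query above can be exercised end to end with Python's built-in sqlite3 module. This is a sketch under the assumption that SQLite (3.25+) behaves like the target database for LAG and a windowed SUM; the helper name `consecutive_maxima` is hypothetical. It groups by grp, first_id and second_id, because SQL Server and most other engines reject selecting columns that are neither grouped nor aggregated.

```python
import sqlite3

# Sample rows from the question: (date, value, first_id, second_id).
ROWS = [
    ("2020-01-01", 10, 1, 1),
    ("2020-01-02", 15, 1, 1),
    ("2020-01-03", 5, 1, 2),
    ("2020-01-04", 75, 2, 2),
    ("2020-01-05", 101, 2, 2),
    ("2020-01-06", 12, 1, 1),
    ("2020-01-07", 5, 1, 1),
    ("2020-01-08", 14, 1, 2),
]

QUERY = """
WITH flagged AS (
    SELECT t.*,
           LAG(date) OVER (ORDER BY date) AS prev_date,
           LAG(date) OVER (PARTITION BY first_id, second_id ORDER BY date) AS prev_date2
    FROM t
),
grps AS (
    -- A new group starts when the previous row overall is not also the
    -- previous row for the same (first_id, second_id) pair.
    SELECT flagged.*,
           SUM(CASE WHEN prev_date = prev_date2 THEN 0 ELSE 1 END)
               OVER (ORDER BY date) AS grp
    FROM flagged
)
SELECT MAX(value) AS max_value, first_id, second_id
FROM grps
GROUP BY grp, first_id, second_id
ORDER BY MIN(date)
"""


def consecutive_maxima(rows):
    """Return (max_value, first_id, second_id) for each run of equal ids."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE t (date TEXT, value INTEGER, first_id INTEGER, second_id INTEGER)"
    )
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(consecutive_maxima(ROWS))
```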

Total Sum and Partial Sum

I am currently using SSMS. I am pulling data, and trying to get two different columns that sum prices. The two columns 'ChangeSpend' and 'TotalSpend' both reference the same column and this is where I am running into problems.
I want ChangeSpend to return the sum of all the codes per receipt that start with V.Ch% (so they exclude all the others) and the TotalSpend to sum all of the codes for each receipt.
Here is my current code:
SELECT
Receipt
,ReceiptCode
,ReceiptAmount
,sum(ReceiptAmount) over (Partition by Receipt) as TotalSpend
,(CASE WHEN ReceiptCode = 'V.Ch%' then sum(ReceiptAmount)
over (Partition by Receipt)
ELSE 0
END) as ChangeSpend
FROM tableA
LEFT OUTER JOIN tableB
on A.Receipt = B.Receipt
WHERE ReceiptCode LIKE 'V.%'
ORDER BY Receipt
However, my query currently prints this:
Receipt ReceiptCode ReceiptAmount TotalSpend ChangeSpend
1 v.cha 5 20 0
1 v.rt 2 20 0
1 v.chb 6 20 0
1 v.abc 7 20 0
2 v.cha 20 21 0
2 v.abc 1 21 0
3 v.cha 4 14 0
3 v.chb 1 14 0
3 v.tye 7 14 0
3 v.chs 2 14 0
And I would like it to print this:
Receipt ReceiptCode ReceiptAmount TotalSpend ChangeSpend
1 v.cha 5 20 11
1 v.rt 2 20 11
1 v.chb 6 20 11
1 v.abc 7 20 11
2 v.cha 20 21 20
2 v.abc 1 21 20
3 v.cha 4 14 7
3 v.chb 1 14 7
3 v.tye 7 14 7
3 v.chs 2 14 7
Thanks for any help
Try
,SUM(CASE WHEN ReceiptCode LIKE 'V.Ch%' THEN ReceiptAmount ELSE 0 END)
OVER (Partition by Receipt)
AS ChangeSpend
You have to put the SUM outside the CASE, not the other way around:
SUM(CASE WHEN <condition> THEN MyColumn ELSE 0 END)
This may help:
Create Table Payment(
Receipt Int,
ReceiptCode VARCHAR(10),
ReceiptAmount decimal(10, 2))
Insert Into Payment
Values
(1, 'v.cha', 5),
(1, 'v.rt', 2),
(1, 'v.chb', 6),
(1, 'v.abc', 7),
(2, 'v.cha', 20),
(2, 'v.abc', 1),
(3, 'v.cha', 4),
(3, 'v.chb', 1),
(3, 'v.the', 7),
(3, 'v.chs', 2);
SELECT * ,
SUM(ReceiptAmount) OVER ( PARTITION BY Receipt ) AS TotalSpend ,
SUM(IIF(ReceiptCode LIKE 'v.ch%',ReceiptAmount,0)) OVER ( PARTITION
BY Receipt ) AS ChangeSpend
FROM payment;
SUM(CASE WHEN ReceiptCode LIKE 'V.Ch%' THEN ReceiptAmount ELSE 0 END)
OVER (PARTITION BY Receipt) AS ChangeSpend
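A minimal sketch of the conditional window sum, run against the question's sample rows in an in-memory SQLite database (an assumption standing in for SQL Server; `spend_per_row` is a hypothetical name). SQLite's LIKE is case-insensitive for ASCII letters, much like SQL Server under a case-insensitive collation, so 'v.ch%' also matches codes written as 'V.Ch...'.

```python
import sqlite3

# Sample rows from the question: (Receipt, ReceiptCode, ReceiptAmount).
PAYMENTS = [
    (1, "v.cha", 5), (1, "v.rt", 2), (1, "v.chb", 6), (1, "v.abc", 7),
    (2, "v.cha", 20), (2, "v.abc", 1),
    (3, "v.cha", 4), (3, "v.chb", 1), (3, "v.tye", 7), (3, "v.chs", 2),
]

QUERY = """
SELECT Receipt, ReceiptCode, ReceiptAmount,
       SUM(ReceiptAmount) OVER (PARTITION BY Receipt) AS TotalSpend,
       -- CASE inside SUM: rows whose code does not match contribute 0.
       SUM(CASE WHEN ReceiptCode LIKE 'v.ch%' THEN ReceiptAmount ELSE 0 END)
           OVER (PARTITION BY Receipt) AS ChangeSpend
FROM Payment
ORDER BY Receipt, ReceiptCode
"""


def spend_per_row(rows):
    """Return every payment row with its receipt's total and 'change' spend."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Payment (Receipt INTEGER, ReceiptCode TEXT, ReceiptAmount NUMERIC)"
    )
    con.executemany("INSERT INTO Payment VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in spend_per_row(PAYMENTS):
    print(row)
```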

Count the cycle, and count that already has been counted

I have my query:
SELECT UserGroupCode,COUNT(UserGroupCode) AS [CountofCycle]
FROM Users.GroupCycles
GROUP BY UserGroupCode;
Which shows me:
UserGroupCode CountofCycle
1 1
4 1
5 1
6 2 (gone into 2nd cycle)
7 1
8 1
9 1
10 1
11 1
12 1
13 1
14 1
15 1
16 1
17 1
18 1
19 1
When I try to count the total UserGroups where CountofCycle = 1:
SELECT Count(t.CountOfCycle) AS 'totalgroups'
FROM
(SELECT CreateDate, COUNT(userGroupCode) AS [CountofCycle]
FROM Users.GroupCycles
GROUP BY CreateDate,UserGroupCode)t
WHERE CountofCycle=1
I get a result of 18, which should be 16. If I delete CreateDate from both the SELECT and the GROUP BY, I get the correct number of CountofCycles.
And when I change the condition to CountofCycle = 2 or > 1, it shows 0.
What is the problem with showing UserGroups with cycle > 1?
Here is my query that filters on CreateDate. In the second query, which I UNION with the first one, I can't use CreateDate, because it distorts the results:
SELECT Count(t.CountOfCycle) AS 'total groups'
FROM
(SELECT COUNT(userGroupCode) AS [CountofCycle], CreateDate
FROM users.GroupCycles GROUP BY userGroupCode,CreateDate)t
WHERE t.CountOfCycle=1 AND t.CreateDate Between '03/16/2017' AND '04/25/2017'
UNION ALL
SELECT Count(t.CountOfCycle) AS 'group on date2'
FROM
(SELECT COUNT(userGroupCode) AS [CountofCycle] FROM users.GroupCycles GROUP BY userGroupCode)t
WHERE t.CountOfCycle=2
First, to address why you are not getting the results you expect: the simple reason is that you are comparing two different queries and expecting the results to be the same.
Consider this very simple example data
UserGroupCode | CreateDate
----------------+----------------
A | 2017-05-10
B | 2017-05-10
B | 2017-05-11
C | 2017-05-10
You have two records where the UserGroupCode is B, so if you run:
DECLARE @T TABLE (UserGroupCode CHAR(1), CreateDate DATE)
INSERT @T (UserGroupCode, CreateDate)
VALUES ('A', '2017-05-10'), ('B', '2017-05-10'), ('B', '2017-05-11');
SELECT UserGroupCode, COUNT(*) AS [Count]
FROM @T
GROUP BY UserGroupCode
HAVING COUNT(*) = 2;
This returns:
UserGroupCode Count
-------------------------
B 2
However, if you were to add CreateDate to the grouping, "B" would be split into two groups, each with a count of 1:
DECLARE @T TABLE (UserGroupCode CHAR(1), CreateDate DATE)
INSERT @T (UserGroupCode, CreateDate)
VALUES ('A', '2017-05-10'), ('B', '2017-05-10'), ('B', '2017-05-11');
SELECT UserGroupCode, CreateDate, COUNT(*) AS [Count]
FROM @T
WHERE UserGroupCode = 'B'
GROUP BY UserGroupCode, CreateDate;
This returns:
UserGroupCode CreateDate Count
---------------------------------------
B 2017-05-10 1
B 2017-05-11 1
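The effect of adding CreateDate to the grouping can be reproduced with a small Python sketch using sqlite3 (standing in for SQL Server; `groups_with_count` is a hypothetical helper). With only UserGroupCode as the key, B has two rows; once CreateDate is added, every group has a single row, so a filter on a count of 2 finds nothing.

```python
import sqlite3

# Example data from the answer: group B has two rows on different dates.
ROWS = [("A", "2017-05-10"), ("B", "2017-05-10"), ("B", "2017-05-11")]


def groups_with_count(rows, group_cols, n):
    """Return the groups (keyed by group_cols) that contain exactly n rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE T (UserGroupCode TEXT, CreateDate TEXT)")
    con.executemany("INSERT INTO T VALUES (?, ?)", rows)
    cols = ", ".join(group_cols)  # column names are trusted constants here
    sql = (
        f"SELECT {cols}, COUNT(*) FROM T GROUP BY {cols} "
        f"HAVING COUNT(*) = ? ORDER BY {cols}"
    )
    result = con.execute(sql, (n,)).fetchall()
    con.close()
    return result


print(groups_with_count(ROWS, ["UserGroupCode"], 2))
print(groups_with_count(ROWS, ["UserGroupCode", "CreateDate"], 2))
print(groups_with_count(ROWS, ["UserGroupCode", "CreateDate"], 1))
```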
Now, based on the queries you have posted, it looks like you want to know:
The number of groups that only have one record in the date range 16th March 2017 to 25th April 2017.
The number of groups that have two records in total.
For this, consider a slightly larger data set:
UserGroupCode | CreateDate
----------------+----------------
A | 2017-04-10
B | 2017-04-10
B | 2017-05-11
C | 2017-01-01
C | 2017-01-02
D | 2017-04-01
D | 2017-04-02
E | 2017-01-02
So here:
Group A has one record in total, and it falls within the date range.
Group B has two records in total, one in the date range and one not.
Group C has two records, neither in the date range.
Group D has two records, both in the date range.
Group E has one record, not in the date range.
So for your first requirement:
The number of groups that only have one record in the date range 16th March 2017 to 25th April 2017.
We would expect 2 groups, A and B, because C and E have no records in the date range, and D has two.
For the second requirement, we would expect three groups, B, C and D, since A and E only have one record each.
You can do this with a single query by using a conditional aggregate.
DECLARE @T TABLE (UserGroupCode CHAR(1), CreateDate DATE)
INSERT @T (UserGroupCode, CreateDate)
VALUES ('A', '2017-04-10'),
('B', '2017-04-10'), ('B', '2017-05-11'),
('C', '2017-01-01'), ('C', '2017-01-02'),
('D', '2017-04-01'), ('D', '2017-04-02'),
('E', '2017-01-02');
SELECT TotalGroups = COUNT(CASE WHEN RecordsInPeriod = 1 THEN 1 END),
GroupOnDate2 = COUNT(CASE WHEN TotalRecords = 2 THEN 1 END)
FROM ( SELECT UserGroupCode,
TotalRecords = COUNT(*),
RecordsInPeriod = COUNT(CASE WHEN CreateDate >= '20170316'
AND CreateDate <= '20170425' THEN 1 END)
FROM @T
GROUP BY UserGroupCode
) AS t;
Which gives:
TotalGroups GroupOnDate2
------------------------------
2 3
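Here is a sketch of the same single-pass conditional aggregate in Python with sqlite3 (assumed as a stand-in for SQL Server; `cycle_summary` is a hypothetical name). SQLite has no DATE type, so the dates are ISO-8601 strings, which compare correctly as text, and the aliases use AS instead of T-SQL's `alias = expression` form.

```python
import sqlite3

# The larger example data set from the answer: (UserGroupCode, CreateDate).
ROWS = [
    ("A", "2017-04-10"),
    ("B", "2017-04-10"), ("B", "2017-05-11"),
    ("C", "2017-01-01"), ("C", "2017-01-02"),
    ("D", "2017-04-01"), ("D", "2017-04-02"),
    ("E", "2017-01-02"),
]

QUERY = """
SELECT COUNT(CASE WHEN RecordsInPeriod = 1 THEN 1 END) AS TotalGroups,
       COUNT(CASE WHEN TotalRecords = 2 THEN 1 END) AS GroupOnDate2
FROM (
    -- One row per group: all its records, and only those inside the range.
    SELECT UserGroupCode,
           COUNT(*) AS TotalRecords,
           COUNT(CASE WHEN CreateDate >= '2017-03-16'
                       AND CreateDate <= '2017-04-25' THEN 1 END) AS RecordsInPeriod
    FROM T
    GROUP BY UserGroupCode
) AS t
"""


def cycle_summary(rows):
    """Return (TotalGroups, GroupOnDate2) computed in a single query."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE T (UserGroupCode TEXT, CreateDate TEXT)")
    con.executemany("INSERT INTO T VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchone()
    con.close()
    return result


print(cycle_summary(ROWS))
```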
I'd expect to see a HAVING clause rather than a WHERE:
SELECT UserGroupCode, COUNT(UserGroupCode) [CountofCycle]
FROM [Users].[GroupCycles]
GROUP BY UserGroupCode
HAVING COUNT(UserGroupCode) > 1;
You could use HAVING, which filters the grouped rows directly. Keep CreateDate out of the grouping, though, otherwise each group is split by date and no count ever exceeds 1:
select count(*)
from
(
SELECT UserGroupCode, COUNT(userGroupCode) AS [CountofCycle]
FROM Users.GroupCycles
GROUP BY UserGroupCode
having count(userGroupCode) > 1 -- here is HAVING clause
) x1

How to turn repeated ranking(1-5) row data to column data in TSQL

I have a table data:
ID Sale Weekday
1 12 1
2 15 2
3 16 3
4 17 4
5 18 5
6 11 1
7 13 2
8 14 3
9 15 4
10 20 5
11 25 1
12 14 2
13 18 3
14 21 4
15 11 5
.. ..
I'd like to turn it into:
Mo Tu We Th Fr
12 15 16 17 18
11 13 14 15 20
25 14 18 21 11
..
Thank you!
Try this
SELECT SUM(case when Weekday = 1 then Sale else 0 end) as Mo,
SUM(case when Weekday = 2 then Sale else 0 end) as Tu,
SUM(case when Weekday = 3 then Sale else 0 end) as We,
SUM(case when Weekday = 4 then Sale else 0 end) as Th,
SUM(case when Weekday = 5 then Sale else 0 end) as Fr
FROM
(
SELECT *,
Row_number()OVER(partition by weekday ORDER BY ID ) as seq_no
FROM tablename
) A
Group by seq_no
Order by seq_no
If, as in the sample data, your table has all five days for every week, you can number the rows in blocks of five instead:
SELECT SUM(case when Weekday = 1 then Sale else 0 end) as Mo,
SUM(case when Weekday = 2 then Sale else 0 end) as Tu,
SUM(case when Weekday = 3 then Sale else 0 end) as We,
SUM(case when Weekday = 4 then Sale else 0 end) as Th,
SUM(case when Weekday = 5 then Sale else 0 end) as Fr
FROM
(
SELECT *,
( ( Row_number()OVER(ORDER BY ID ) - 1 ) / 5 ) + 1 seq_no
FROM tablename
) A
Group by seq_no
Order by seq_no
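To verify the ROW_NUMBER-based pivot against the sample rows, here is a sketch using Python's sqlite3 module (SQLite stands in for SQL Server; the table name `Sales` and the helper `weekly_rows` are hypothetical). The ORDER BY seq_no keeps the output weeks in the same order as the input.

```python
import sqlite3

# Sample rows from the question: (ID, Sale, Weekday), Weekday 1 = Monday.
SALES = [
    (1, 12, 1), (2, 15, 2), (3, 16, 3), (4, 17, 4), (5, 18, 5),
    (6, 11, 1), (7, 13, 2), (8, 14, 3), (9, 15, 4), (10, 20, 5),
    (11, 25, 1), (12, 14, 2), (13, 18, 3), (14, 21, 4), (15, 11, 5),
]

QUERY = """
SELECT SUM(CASE WHEN Weekday = 1 THEN Sale ELSE 0 END) AS Mo,
       SUM(CASE WHEN Weekday = 2 THEN Sale ELSE 0 END) AS Tu,
       SUM(CASE WHEN Weekday = 3 THEN Sale ELSE 0 END) AS We,
       SUM(CASE WHEN Weekday = 4 THEN Sale ELSE 0 END) AS Th,
       SUM(CASE WHEN Weekday = 5 THEN Sale ELSE 0 END) AS Fr
FROM (
    -- seq_no = 1 for each weekday's first row, 2 for its second, and so on.
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Weekday ORDER BY ID) AS seq_no
    FROM Sales
) AS A
GROUP BY seq_no
ORDER BY seq_no
"""


def weekly_rows(rows):
    """Return one (Mo, Tu, We, Th, Fr) tuple per week."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Sales (ID INTEGER, Sale INTEGER, Weekday INTEGER)")
    con.executemany("INSERT INTO Sales VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for week in weekly_rows(SALES):
    print(week)
```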
You could use the pivot operator together with a partitioned row_number like this:
select
max([1]) as 'Mo',
max([2]) as 'Tu',
max([3]) as 'We',
max([4]) as 'Th',
max([5]) as 'Fr'
from
(
select *, row_number() over (partition by weekday order by id) rn
from your_table
) a
pivot (max(sale) for weekday in ([1],[2],[3],[4],[5])) p
group by rn;

Generating order statistics grouped by order total

Hopefully I can explain this correctly. I have a table of order lines (each order line consists of the quantity of an item and its price; there are other fields, but I left those out).
table 'orderitems':
orderid | quantity | price
1 | 1 | 1.5000
1 | 2 | 3.22
2 | 1 | 9.99
3 | 4 | 0.44
3 | 2 | 15.99
So to get order total I would run
SELECT SUM(Quantity * price) AS total
FROM OrderItems
GROUP BY OrderID
However, I would like to get a count of all total orders under $1 (just provide a count).
My end result I would like would be able to define ranges:
under $1, $1 - $3, 3-5, 5-10, 10-15, 15.. etc;
and my data to look like so (hopefully):
tunder1 | t1to3 | t3to5 | t5to10 | etc
10 | 500 | 123 | 5633 |
So that I can present a piechart breakdown of customer orders on our eCommerce site.
Now I can run individual SQL queries to get this, but I would like to know what the most efficient 'single sql query' would be. I am using MS SQL Server.
Currently I can run a single query like so to get under $1 total:
SELECT COUNT(total) AS tunder1
FROM (SELECT SUM(Quantity * price) AS total
FROM OrderItems
GROUP BY OrderID) AS a
WHERE (total < 1)
How can I optimize this? Thanks in advance!
select
count(case when total < 1 then 1 end) tunder1,
count(case when total >= 1 and total < 3 then 1 end) t1to3,
count(case when total >= 3 and total < 5 then 1 end) t3to5,
...
from
(
select sum(quantity * price) as total
from orderitems group by orderid
) AS t;
You need to use HAVING to filter grouped values, or filter them in an outer query as below.
Try this:
DECLARE @YourTable table (OrderID int, Quantity int, Price decimal(10, 4))
INSERT INTO @YourTable VALUES (1,1,1.5000)
INSERT INTO @YourTable VALUES (1,2,3.22)
INSERT INTO @YourTable VALUES (2,1,9.99)
INSERT INTO @YourTable VALUES (3,4,0.44)
INSERT INTO @YourTable VALUES (3,2,15.99)
SELECT
SUM(CASE WHEN TotalCost<1 THEN 1 ELSE 0 END) AS tunder1
,SUM(CASE WHEN TotalCost>=1 AND TotalCost<3 THEN 1 ELSE 0 END) AS t1to3
,SUM(CASE WHEN TotalCost>=3 AND TotalCost<5 THEN 1 ELSE 0 END) AS t3to5
,SUM(CASE WHEN TotalCost>=5 THEN 1 ELSE 0 END) AS t5andup
FROM (SELECT
SUM(quantity * price) AS TotalCost
FROM @YourTable
GROUP BY OrderID
) dt
OUTPUT:
tunder1 t1to3 t3to5 t5andup
----------- ----------- ----------- -----------
0 0 0 3
(1 row(s) affected)
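A sketch of the single-query bucketing with all the ranges listed in the question, run with Python's sqlite3 (an assumed stand-in for SQL Server; the column names t5to10, t10to15, t15up and the helper `order_buckets` are hypothetical extensions of the question's naming). Price is stored as REAL so fractional prices survive; in SQL Server, a bare `decimal` means decimal(18, 0), which would round every price to a whole number.

```python
import sqlite3

# Line items from the question: (orderid, quantity, price).
ORDER_ITEMS = [(1, 1, 1.50), (1, 2, 3.22), (2, 1, 9.99), (3, 4, 0.44), (3, 2, 15.99)]

QUERY = """
SELECT COUNT(CASE WHEN total < 1 THEN 1 END) AS tunder1,
       COUNT(CASE WHEN total >= 1 AND total < 3 THEN 1 END) AS t1to3,
       COUNT(CASE WHEN total >= 3 AND total < 5 THEN 1 END) AS t3to5,
       COUNT(CASE WHEN total >= 5 AND total < 10 THEN 1 END) AS t5to10,
       COUNT(CASE WHEN total >= 10 AND total < 15 THEN 1 END) AS t10to15,
       COUNT(CASE WHEN total >= 15 THEN 1 END) AS t15up
FROM (
    -- One total per order; the outer query counts orders per bucket.
    SELECT SUM(quantity * price) AS total
    FROM OrderItems
    GROUP BY orderid
) AS t
"""


def order_buckets(items):
    """Return the number of orders falling into each price bucket."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE OrderItems (orderid INTEGER, quantity INTEGER, price REAL)")
    con.executemany("INSERT INTO OrderItems VALUES (?, ?, ?)", items)
    result = con.execute(QUERY).fetchone()
    con.close()
    return result


print(order_buckets(ORDER_ITEMS))
```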
WITH orders (orderid, quantity, price) AS
(
SELECT 1, 1, 1.5
UNION ALL
SELECT 1, 2, 3.22
UNION ALL
SELECT 2, 1, 9.99
UNION ALL
SELECT 3, 4, 0.44
UNION ALL
SELECT 4, 2, 15.99
),
ranges (bound) AS
(
SELECT 1
UNION ALL
SELECT 3
UNION ALL
SELECT 5
UNION ALL
SELECT 10
UNION ALL
SELECT 15
),
rr AS
(
SELECT bound, ROW_NUMBER() OVER (ORDER BY bound) AS rn
FROM ranges
),
r AS
(
SELECT COALESCE(rf.rn, 0) AS rn, COALESCE(rf.bound, 0) AS f,
rt.bound AS t
FROM rr rf
FULL JOIN
rr rt
ON rt.rn = rf.rn + 1
)
SELECT rn, f, t, COUNT(*) AS cnt
FROM r
JOIN (
SELECT SUM(quantity * price) AS total
FROM orders
GROUP BY
orderid
) o
ON total >= f
AND total < COALESCE(t, 10000000)
GROUP BY
rn, t, f
Output:
rn f t cnt
1 1 3 1
3 5 10 2
5 15 NULL 1
That is, 1 order from $1 to $3, 2 orders from $5 to $10, and 1 order of $15 or more.
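The range-table approach can also be sketched with LEAD() building each bucket's upper bound, instead of the self FULL JOIN on ROW_NUMBER used above; that is a swapped-in technique, not the answer's own. It runs in Python's sqlite3 (an assumed stand-in for SQL Server; `bucket_counts` is a hypothetical helper) on the same rows as the CTE, where the 15.99 line belongs to order 4, and it adds an explicit 0 lower bound for orders under $1. As in the original, buckets with no orders are simply absent from the result.

```python
import sqlite3

# Line items as used in the CTE answer: (orderid, quantity, price).
ORDER_ITEMS = [(1, 1, 1.5), (1, 2, 3.22), (2, 1, 9.99), (3, 4, 0.44), (4, 2, 15.99)]
BOUNDS = [0, 1, 3, 5, 10, 15]  # lower bound of each bucket; the last is open-ended

QUERY = """
WITH ranges AS (
    -- LEAD() turns sorted lower bounds into [lo, hi) pairs; hi is NULL for the last.
    SELECT bound AS lo, LEAD(bound) OVER (ORDER BY bound) AS hi
    FROM bounds
),
totals AS (
    SELECT orderid, SUM(quantity * price) AS total
    FROM orders
    GROUP BY orderid
)
SELECT r.lo, r.hi, COUNT(*) AS cnt
FROM ranges AS r
JOIN totals AS o
  ON o.total >= r.lo AND (r.hi IS NULL OR o.total < r.hi)
GROUP BY r.lo, r.hi
ORDER BY r.lo
"""


def bucket_counts(items, bounds):
    """Return (lo, hi, cnt) for every bucket that contains at least one order."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (orderid INTEGER, quantity INTEGER, price REAL)")
    con.execute("CREATE TABLE bounds (bound INTEGER)")
    con.executemany("INSERT INTO orders VALUES (?, ?, ?)", items)
    con.executemany("INSERT INTO bounds VALUES (?)", [(b,) for b in bounds])
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(bucket_counts(ORDER_ITEMS, BOUNDS))
```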