I am trying to get a conditional sum based on another column. For example, suppose I have this dataset:
ID Date Type Total
-----------------------
5 12/16/2019 0 7
5 12/16/2019 1 0
5 12/17/2019 0 7
5 12/17/2019 1 7
5 12/18/2019 0 7
5 12/18/2019 1 0
5 12/19/2019 0 7
5 12/19/2019 1 7
5 12/20/2019 0 7
5 12/20/2019 1 7
5 12/23/2019 0 7
5 12/24/2019 0 7
5 12/25/2019 0 7
5 12/26/2019 0 7
5 12/27/2019 0 7
If there is a type of 1 for a date, then I only want that row for that date; else, if there is only a type of 0, then I want that row for that date.
So for 12/16/2019 I would want the value 0. For 12/23/2019 - 12/27/2019 I would want the value 7.
You can use row_number():
select t.*
from (select t.*, row_number() over (partition by id, date order by type desc) as seq
from table t
) t
where seq = 1;
A simple ROW_NUMBER can handle this quite easily. I changed some of the column names because reserved words are just painful to work with.
declare @Something table
(
    ID int
    , SomeDate date
    , MyType int
    , Total int
)
insert @Something values
(5, '12/16/2019', 0, 7)
, (5, '12/16/2019', 1, 0)
, (5, '12/17/2019', 0, 7)
, (5, '12/17/2019', 1, 7)
, (5, '12/18/2019', 0, 7)
, (5, '12/18/2019', 1, 0)
, (5, '12/19/2019', 0, 7)
, (5, '12/19/2019', 1, 7)
, (5, '12/20/2019', 0, 7)
, (5, '12/20/2019', 1, 7)
, (5, '12/23/2019', 0, 7)
, (5, '12/24/2019', 0, 7)
, (5, '12/25/2019', 0, 7)
, (5, '12/26/2019', 0, 7)
, (5, '12/27/2019', 0, 7)
select ID
    , SomeDate
    , MyType
    , Total
from
(
    select *
        -- order by MyType desc so that a type 1 row, when present, gets RowNum = 1
        , RowNum = ROW_NUMBER() over(partition by ID, SomeDate order by MyType desc)
    from @Something
) x
where x.RowNum = 1
You can do this with simple aggregation and a case expression:
select id, date, max(type),
coalesce(max(case when type = 1 then total end),
max(total)
) as total
from t
group by id, date;
This formulation assumes that you have only types 0 and 1, and at most one row of each type per day for a given id.
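To make the aggregation approach concrete, here is a minimal sketch that runs the same COALESCE/CASE query against an in-memory SQLite database from Python. The table name t, the column names some_date and my_type, the helper pick_daily_totals, and the ISO date strings are assumptions made for this illustration; only a few rows of the question's data are loaded.

```python
import sqlite3

def pick_daily_totals(rows):
    """Return one (id, some_date, my_type, total) row per id and date, preferring type 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INT, some_date TEXT, my_type INT, total INT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT id, some_date, MAX(my_type) AS my_type,
               -- the type 1 total when a type 1 row exists, otherwise the only total there is
               COALESCE(MAX(CASE WHEN my_type = 1 THEN total END), MAX(total)) AS total
        FROM t
        GROUP BY id, some_date
        ORDER BY id, some_date
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# A subset of the question's rows (dates rewritten in ISO form so they sort as text).
sample = [
    (5, "2019-12-16", 0, 7), (5, "2019-12-16", 1, 0),
    (5, "2019-12-17", 0, 7), (5, "2019-12-17", 1, 7),
    (5, "2019-12-23", 0, 7),
]
daily = pick_daily_totals(sample)
print(daily)
```

For 2019-12-16 the type 1 total (0) is chosen over the type 0 total (7), and 2019-12-23 falls back to its only row.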
I have a table t with:
PLACE  LOCATION  TS          ID  AMOUNT  GOING_IN  GOING_OUT
-------------------------------------------------------------
1      10        2020-10-01  1   100     10        0
1      10        2020-10-02  1   110     5         -50
1      10        2020-10-03  1   75      0         -100
1      10        2020-10-04  1   -25     30        0
1      10        2020-10-05  1   5       0         0
1      10        2020-10-06  1   5       38        -300
1      10        2020-10-07  1   -257    0         0
1      10        2020-10-01  2   1       10        0
1      10        2020-10-02  2   11      0         -12
1      10        2020-10-03  2   -1      0         -100
1      10        2020-10-04  2   -101    0         0
2      20        2020-11-15  1   18      20        0
2      20        2020-11-16  1   38      0         0
2      20        2020-11-15  3   -9      20        -31
2      20        2020-11-16  3   -20     0         0
So due to SAP legacy stuff some logistic data is mangled which may lead to negative inventory.
To check how severe the error is, I need to count, for each PLACE, LOCATION and ID:
COUNT_CORRECT: the number of rows that have a positive AMOUNT and no negative AMOUNT before them
COUNT_WRONG: the number of rows that have a negative AMOUNT, plus any rows with a positive AMOUNT that have a negative AMOUNT anywhere before them
As you can see in my table there are (for PLACE=1, LOCATION=10, ID=1) 3 rows with a positive AMOUNT without any negative AMOUNT before. But then there is a negative AMOUNT and some positive AMOUNTS afterwards --> those 4 rows should not be counted for COUNT_CORRECT but should count for COUNT_WRONG.
So in this example table my query should return:
PLACE  LOCATION  TOTAL  COUNT_CORRECT  COUNT_WRONG  RATIO
----------------------------------------------------------
1      10        11     5              6            0.55
2      20        4      2              2            0.5
My code so far:
CREATE OR REPLACE TABLE ANALYTICS.t (
PLACE INT NOT NULL
, LOCATION INT NOT NULL
, TS DATE NOT NULL
, ID INT NOT NULL
, AMOUNT INT NOT NULL
, GOING_IN INT NOT NULL
, GOING_OUT INT NOT NULL
, PRIMARY KEY(PLACE, LOCATION, ID, TS)
);
INSERT INTO ANALYTICS.t
(PLACE, LOCATION, TS, ID, AMOUNT, GOING_IN, GOING_OUT)
VALUES
(1, 10, '2020-10-01', 1, 100, 10, 0)
, (1, 10, '2020-10-02', 1, 110, 5, -50)
, (1, 10, '2020-10-03', 1, 75, 0, -100)
, (1, 10, '2020-10-04', 1, -25, 30, 0)
, (1, 10, '2020-10-05', 1, 5, 0, 0)
, (1, 10, '2020-10-06', 1, 5, 38, 300)
, (1, 10, '2020-10-07', 1, -257, 0, 0)
, (1, 10, '2020-10-04', 2, 1, 10, 0)
, (1, 10, '2020-10-05', 2, 11, 0, -12)
, (1, 10, '2020-10-06', 2, -1, 0, -100)
, (1, 10, '2020-10-07', 2, -101, 0, 0)
, (2, 20, '2020-11-15', 1, 18, 12, 0)
, (2, 20, '2020-11-16', 1, 30, 0, 0)
, (2, 20, '2020-11-15', 3, -9, 20, -31)
, (2, 20, '2020-11-16', 3, -20, 0, 0)
;
Then
SELECT PLACE
, LOCATION
, SUM(CASE WHEN AMOUNT >= 0 THEN 1 ELSE 0 END) AS 'COUNT_CORRECT'
, SUM(CASE WHEN AMOUNT < 0 THEN 1 ELSE 0 END) AS 'COUNT_WRONG'
, ROUND((SUM(CASE WHEN AMOUNT < 0 THEN 1 ELSE 0 END) / COUNT(AMOUNT)) * 100, 2) AS 'ratio'
FROM t
GROUP BY PLACE, LOCATION
ORDER BY PLACE, LOCATION
;
But I don't know how I can filter for "AND which do not have a negative AMOUNT before" and counting by PLACE, LOCATION, ID as an intermediate step.
Any help appreciated.
I'm not sure if I understand your question correctly, but the following gives you the number of rows before the first negative amount per (place, location) partition.
The subselect computes the row numbers of all rows with a negative amount. Then we can select the minimum of this as the first row with a negative amount.
SELECT
place,
location,
COUNT(*) - NVL(MIN(pos) - 1, COUNT(*)) AS COUNT_WRONG,
COUNT(*) - local.COUNT_WRONG AS COUNT_CORRECT,
ROUND(local.COUNT_WRONG / COUNT(*),2) AS RATIO
FROM
( SELECT
amount,
place,
location,
CASE
WHEN amount < 0
THEN ROW_NUMBER() over (
PARTITION BY
place,
location
ORDER BY
ts)
ELSE NULL
END pos -- Row numbers of rows with negative amount, else NULL
FROM
t)
GROUP BY
place,
location;
I have edited the query. Please let me know if this works.
ALL_ENTRIES query has all the row numbers for the table t partitioned by place,location and ID and ordered by timestamp.
TABLE1 is used to compute the first negative entry. This is done by joining with ALL_ENTRIES and selecting the minimum row number where amount < 0.
TABLE2 is used to compute the last correct entry. Basically ALL_ENTRIES is joined with TABLE1 with the condition that the row numbers should be less than the row number in TABLE1. This gives us the row number corresponding to the last correct entry.
TABLE1 and TABLE2 are joined with ALL_ENTRIES to calculate the max row number, which gives the total number of entries.
In the final select statement I have used a CASE WHEN expression to account for IDs that have no negative amount values. In those scenarios all the entries should be correct, so the max row number is used for those cases.
WITH ALL_ENTRIES AS (
    SELECT
        PLACE,
        LOCATION,
        ID,
        TS,
        AMOUNT,
        ROW_NUMBER() OVER (PARTITION BY PLACE, LOCATION, ID ORDER BY TS) AS ROW_NUM
    FROM t
)
SELECT
    PLACE,
    LOCATION,
    ID,
    TOTAL,
    COUNT_CORRECT,
    TOTAL - COUNT_CORRECT AS COUNT_WRONG,
    COUNT_CORRECT / TOTAL AS RATIO
FROM (
    SELECT
        ae.PLACE,
        ae.LOCATION,
        ae.ID,
        MAX(ae.ROW_NUM) AS TOTAL,
        -- no row in table2 means the ID has no negative amount, so every row is correct
        MAX(CASE WHEN table2.LAST_CORRECT_ENTRY IS NULL THEN ae.ROW_NUM ELSE table2.LAST_CORRECT_ENTRY END) AS COUNT_CORRECT
    FROM ALL_ENTRIES ae
    LEFT JOIN (
        -- TABLE2: last correct row per ID; 0 when the very first row is already negative
        SELECT
            table1.PLACE,
            table1.LOCATION,
            table1.ID,
            COALESCE(MAX(ae.ROW_NUM), 0) AS LAST_CORRECT_ENTRY
        FROM (
            -- TABLE1: row number of the first negative amount per ID
            SELECT
                t.PLACE,
                t.LOCATION,
                t.ID,
                MIN(ae.ROW_NUM) AS FIRST_NEGATIVE_ENTRY
            FROM t t
            INNER JOIN ALL_ENTRIES ae
                ON t.PLACE = ae.PLACE
                AND t.LOCATION = ae.LOCATION
                AND t.ID = ae.ID
                AND t.TS = ae.TS
                AND t.AMOUNT = ae.AMOUNT
                AND ae.AMOUNT < 0
            GROUP BY t.PLACE, t.LOCATION, t.ID
        ) table1
        LEFT JOIN ALL_ENTRIES ae
            ON ae.PLACE = table1.PLACE
            AND ae.LOCATION = table1.LOCATION
            AND ae.ID = table1.ID
            AND ae.ROW_NUM < table1.FIRST_NEGATIVE_ENTRY
        GROUP BY table1.PLACE, table1.LOCATION, table1.ID
    ) table2
        ON ae.PLACE = table2.PLACE
        AND ae.LOCATION = table2.LOCATION
        AND ae.ID = table2.ID
    GROUP BY ae.PLACE, ae.LOCATION, ae.ID
) per_id;
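As a cross-check of the counting rule from the question, the following sketch flags each row as correct when no negative AMOUNT exists at or before it for the same PLACE, LOCATION and ID, and then aggregates per PLACE and LOCATION. It is not the method of either answer above: it swaps in a correlated EXISTS subquery and runs on SQLite from Python, and the names ROWS, count_correct_and_wrong and is_correct are made up for the illustration. The rows are the ones shown in the question's table, with the GOING_IN and GOING_OUT columns left out.

```python
import sqlite3

# (place, location, ts, id, amount) taken from the table in the question.
ROWS = [
    (1, 10, "2020-10-01", 1, 100), (1, 10, "2020-10-02", 1, 110),
    (1, 10, "2020-10-03", 1, 75), (1, 10, "2020-10-04", 1, -25),
    (1, 10, "2020-10-05", 1, 5), (1, 10, "2020-10-06", 1, 5),
    (1, 10, "2020-10-07", 1, -257),
    (1, 10, "2020-10-01", 2, 1), (1, 10, "2020-10-02", 2, 11),
    (1, 10, "2020-10-03", 2, -1), (1, 10, "2020-10-04", 2, -101),
    (2, 20, "2020-11-15", 1, 18), (2, 20, "2020-11-16", 1, 38),
    (2, 20, "2020-11-15", 3, -9), (2, 20, "2020-11-16", 3, -20),
]

def count_correct_and_wrong(rows):
    """Return (place, location, total, count_correct, count_wrong, ratio) per place and location."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (place INT, location INT, ts TEXT, id INT, amount INT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", rows)
    query = """
        SELECT place, location,
               COUNT(*) AS total,
               SUM(is_correct) AS count_correct,
               COUNT(*) - SUM(is_correct) AS count_wrong,
               ROUND(1.0 * (COUNT(*) - SUM(is_correct)) / COUNT(*), 2) AS ratio
        FROM (
            SELECT cur.place, cur.location,
                   -- a row is wrong once any negative amount has occurred for its ID
                   CASE WHEN EXISTS (
                       SELECT 1 FROM t prev
                       WHERE prev.place = cur.place
                         AND prev.location = cur.location
                         AND prev.id = cur.id
                         AND prev.ts <= cur.ts
                         AND prev.amount < 0
                   ) THEN 0 ELSE 1 END AS is_correct
            FROM t cur
        ) flagged
        GROUP BY place, location
        ORDER BY place, location
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

summary = count_correct_and_wrong(ROWS)
for row in summary:
    print(row)
```

On this data the counts come out as 11 / 5 / 6 for PLACE 1 and 4 / 2 / 2 for PLACE 2, matching the expected output in the question. The ratio here is COUNT_WRONG divided by TOTAL, which is what the question's expected 0.55 implies.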
I've been trying to reset the row_number when the value changes in the Value column, and I have no idea how I should do this.
This is my SQL snippet:
WITH Sch(SubjectID, VisitID, Scheduled,Actual,UserId,RLev,SubjectTransactionID,SubjectTransactionTypeID,TransactionDateUTC,MissedVisit,FieldId,Value) as
(
select
svs.*,
CASE WHEN stdp.FieldID = 'FrequencyRegime' and svs.SubjectTransactionTypeID in (2,3) THEN
stdp.FieldID
WHEN stdp.FieldID is NULL and svs.SubjectTransactionTypeID = 1
THEN NULL
WHEN stdp.FieldID is NULL
THEN 'FrequencyRegime'
ELSE stdp.FieldID
END AS [FieldID],
CASE WHEN stdp.Value is NULL and svs.SubjectTransactionTypeID = 1
THEN NULL
WHEN stdp.Value IS NULL THEN
(SELECT TOP 1 stdp.Value from SubjectTransaction st
JOIN SubjectTransactionDataPoint STDP on stdp.SubjectTransactionID = st.SubjectTransactionID and stdp.FieldID = 'FrequencyRegime'
where st.SubjectID = svs.SubjectID
order by st.ServerDateST desc)
ELSE stdp.Value END AS [Value]
from SubjectVisitSchedule svs
left join SubjectTransactionDataPoint stdp on svs.SubjectTransactionID = stdp.SubjectTransactionID and stdp.FieldID = 'FrequencyRegime'
)
select
Sch.*,
CASE WHEN sch.Value is not NULL THEN
ROW_NUMBER() over(partition by Sch.Value, Sch.SubjectID order by Sch.SubjectID, Sch.VisitID)
ELSE NULL
END as [FrequencyCounter],
CASE WHEN Sch.Value = 1 THEN 1--v.Quantity
WHEN Sch.Value = 2 and (ROW_NUMBER() over(partition by Sch.Value, Sch.SubjectID order by Sch.SubjectID, Sch.VisitID) % 2) <> 0
THEN 0
WHEN Sch.Value = 2 and (ROW_NUMBER() over(partition by Sch.Value, Sch.SubjectID order by Sch.SubjectID, Sch.VisitID) % 2) = 0
THEN 1
ELSE NULL
END AS [DispenseQuantity]
from Sch
--left join VisitDrugAssignment v on v.VisitID = Sch.VisitID
where SubjectID = '4E80718E-D0D8-4250-B5CF-02B7A259CAC4'
order by SubjectID, VisitID
This is my Dataset:
Based on the dataset, I am trying to reset the FrequencyCounter to 1 every time the value changes for each subject. Right now it does 50% of what I want: it counts when the value 1 or 2 is found, but when value 1 comes again after value 2, it continues the count from where it left off. I want the count to start from the beginning every time the value changes.
It's difficult to reproduce and test without sample data, but if you want to know how to number rows based on a change in a column value, the following approach may help. It's probably not the best one, but it will at least give you a good start. Of course, I hope I understand your question correctly.
Data:
CREATE TABLE #Data (
[Id] int,
[Subject] varchar(3),
[Value] int
)
INSERT INTO #Data
([Id], [Subject], [Value])
VALUES
(1, '801', 1),
(2, '801', 2),
(3, '801', 2),
(4, '801', 2),
(5, '801', 1),
(6, '801', 2),
(7, '801', 2),
(8, '801', 2)
Statement:
;WITH ChangesCTE AS (
SELECT
*,
CASE
WHEN LAG([Value]) OVER (PARTITION BY [Subject] ORDER BY [Id]) <> [Value] THEN 1
ELSE 0
END AS [Change]
FROM #Data
), GroupsCTE AS (
SELECT
*,
SUM([Change]) OVER (PARTITION BY [Subject] ORDER BY [Id]) AS [GroupID]
FROM ChangesCTE
)
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY [Subject], [GroupID] ORDER BY [Id]) AS Rn
FROM GroupsCTE
Result:
--------------------------------------
Id Subject Value Change GroupID Rn
--------------------------------------
1 801 1 0 0 1
2 801 2 1 1 1
3 801 2 0 1 2
4 801 2 0 1 3
5 801 1 1 2 1
6 801 2 1 3 1
7 801 2 0 3 2
8 801 2 0 3 3
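The change-flag, running-sum and row-number steps can be tried end to end with the sketch below, which runs the same three-stage query on SQLite from Python (window functions need SQLite 3.25 or later). The second subject '802', the helper number_runs, and the column names is_change and run_id are assumptions added for this illustration; they show that the numbering restarts per subject as well as per run of equal values.

```python
import sqlite3

def number_runs(rows):
    """Number rows within each run of equal consecutive values, per subject."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (id INT, subject TEXT, value INT)")
    conn.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)
    query = """
        WITH changes AS (
            SELECT id, subject, value,
                   -- 1 when the value differs from the previous row of the same subject
                   CASE WHEN LAG(value) OVER (PARTITION BY subject ORDER BY id) <> value
                        THEN 1 ELSE 0 END AS is_change
            FROM data
        ), runs AS (
            SELECT id, subject, value,
                   -- the running total of change flags identifies each run
                   SUM(is_change) OVER (PARTITION BY subject ORDER BY id) AS run_id
            FROM changes
        )
        SELECT id, subject, value,
               ROW_NUMBER() OVER (PARTITION BY subject, run_id ORDER BY id) AS rn
        FROM runs
        ORDER BY subject, id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

rows = [
    (1, "801", 1), (2, "801", 2), (3, "801", 2), (4, "801", 2),
    (5, "801", 1), (6, "801", 2), (7, "801", 2), (8, "801", 2),
    (9, "802", 2), (10, "802", 2), (11, "802", 1),  # hypothetical second subject
]
numbered = number_runs(rows)
for row in numbered:
    print(row)
```

For subject '801' the rn column reproduces the Rn values in the result table above (1, 1, 2, 3, 1, 1, 2, 3), and subject '802' starts again at 1.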
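As per my understanding, you need DENSE_RANK, since you want the number to change only when the value changes. Note that DENSE_RANK numbers the distinct values in sort order, so equal values share a number wherever they appear. The syntax is as below: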
WITH your_table(your_column)
AS
(
SELECT 2 UNION ALL
SELECT 10 UNION ALL
SELECT 2 UNION ALL
SELECT 11
)
SELECT *,DENSE_RANK() OVER (ORDER BY your_column)
FROM your_table
Hopefully I don't make this complicated.
I wrote the following SQL, which returns the users whose most recent transaction meets a condition
(TRANS_TYPE NOT IN (4, 6, 21, 23) OR DEPOSIT_OPTION & 64 <> 64).
SELECT
TERMINAL_ID,REGISTER_ID,[USER_ID],sub.CREATE_DATE,sub.TRANS_TYPE_ID,BUS_DATE_ID
FROM (
SELECT
T.TERMINAL_ID
,US.REGISTER_ID
,U.[USER_ID]
,T.CREATE_DATE
,T.TRANS_TYPE_ID
,T.BUS_DATE_ID
,T.TRANS_CONFIG_ID
,TT.TRANS_TYPE
, row_number()
over (partition by U.[USER_ID]
order by T.CREATE_DATE desc) rn
From [RCMDYNAMIC].[dbo].[Transaction] T
INNER JOIN [RCMDYNAMIC].[dbo].[UserSession] US ON T.USER_SESSION_ID = US.USER_SESSION_ID
INNER JOIN [RCMSTATIC].[dbo].[User] U ON U.[USER_ID] = US.[USER_ID]
INNER JOIN [RCMSTATIC].[dbo].[TransactionType] TT ON T.TRANS_TYPE_ID = TT.TRANS_TYPE_ID
INNER JOIN [RCMSTATIC].[dbo].[Register] R ON US.REGISTER_ID = R.REGISTER_ID
) sub
LEFT JOIN
[RCMSTATIC].[dbo].[DepositConfig] DC
ON sub.TRANS_CONFIG_ID = DC.DEPOSIT_ID
WHERE sub.rn = 1 AND (TRANS_TYPE NOT IN (4, 6, 21, 23) OR DEPOSIT_OPTION & 64 <> 64)
I acquired the "most recent transaction" by using
row_number()
over (partition by U.[USER_ID]
order by T.CREATE_DATE desc) rn
However, what I really want is to select the most recent transaction with TRANS_TYPE = 10 if the most recent transaction met the previous condition.
The subquery in the previous code returns all transactions for all users and ranks them in descending order of CREATE_DATE; the outer SELECT displays the users that meet the condition by checking their rank 1 transaction.
What I want is having something like this
FOREACH user
IF the user rank 1 transaction meet the condition
THEN
FIND the user most recent transaction of TRANS_TYPE 10
it could be the transaction rank 1 or rank N
Example:
User_ID TRANS_TYPE DEPOSIT_OPTION Rank
1 4 7 1
1 10 7 2
2 22 64 1
2 23 4 2
2 10 126 3
2 4 7 4
3 10 3 1
4 6 64 1 -- doesn't meet the condition
4 10 7 2
From the previous results, if the rank 1 row satisfies the condition
WHERE sub.rn = 1 AND (TRANS_TYPE NOT IN (4, 6, 21, 23) OR DEPOSIT_OPTION & 64 <> 64)
I want the TRANS_TYPE= 10 to be displayed so I would expect the result to be:
User_ID TRANS_TYPE Rank
1 10 2
2 10 3
3 10 1
I am sorry if the question is not very clear I tried my best!
drop table if exists dbo.Deposits;
create table dbo.Deposits (
User_ID int
, Trans_Type int
, Deposit_Option int
, Rank int
);
insert into dbo.Deposits (User_ID, Trans_Type, Deposit_Option, Rank)
values (1, 4, 7, 1), (1, 10, 7, 2)
, (2, 22, 64, 1), (2, 23, 4, 2), (2, 10, 126, 3), (2, 4, 7, 4)
, (3, 10, 3, 1)
, (4, 6, 64, 1), (4, 10, 7, 2);
select
sub2.User_ID, sub2.Trans_Type, sub2.Rank
from dbo.Deposits sub
inner join dbo.Deposits sub2 on sub.User_ID = sub2.User_ID
and sub2.Trans_Type = 10
where sub.Rank = 1 AND (sub.Trans_Type NOT IN (4, 6, 21, 23) OR sub.Deposit_Option & 64 <> 64)
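The self-join above returns every TRANS_TYPE 10 row of a qualifying user. A hedged variation that keeps only the most recent one (the smallest rank) is sketched below on SQLite from Python. The table and column names (deposits, rnk), the helper latest_type10_for_qualifying_users, and the extra row (1, 10, 7, 3) are assumptions added for the illustration; the extra row is not in the question's data and is there only to show that an older type 10 transaction is ignored.

```python
import sqlite3

DEPOSITS = [
    (1, 4, 7, 1), (1, 10, 7, 2),
    (1, 10, 7, 3),  # hypothetical older type 10 row, not in the question's data
    (2, 22, 64, 1), (2, 23, 4, 2), (2, 10, 126, 3), (2, 4, 7, 4),
    (3, 10, 3, 1),
    (4, 6, 64, 1), (4, 10, 7, 2),
]

def latest_type10_for_qualifying_users(rows):
    """Return (user_id, rnk) of the most recent type 10 row for users whose rank 1 row qualifies."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE deposits (user_id INT, trans_type INT, deposit_option INT, rnk INT)"
    )
    conn.executemany("INSERT INTO deposits VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT latest.user_id, MIN(t10.rnk) AS rnk
        FROM deposits latest
        JOIN deposits t10
          ON t10.user_id = latest.user_id
         AND t10.trans_type = 10
        WHERE latest.rnk = 1
          AND (latest.trans_type NOT IN (4, 6, 21, 23)
               OR (latest.deposit_option & 64) <> 64)
        GROUP BY latest.user_id
        ORDER BY latest.user_id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

latest_type10 = latest_type10_for_qualifying_users(DEPOSITS)
print(latest_type10)
```

User 4 is dropped because its rank 1 row has TRANS_TYPE 6 and the 64 bit set in DEPOSIT_OPTION, so neither side of the OR holds.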
I have a table:
CREATE TABLE Cl
(
PropId int,
ClId int
);
INSERT INTO Cl
(PropId, ClId)
VALUES
(1, 1),
(1, 2),
(1, 3),
(2, 1),
(2, 2),
(3, 1),
(4, 1),
(4, 2),
(5, 1),
(5, 2);
PropId ClId
1 1
1 2
1 3
2 1
2 2
3 1
4 1
4 2
5 1
5 2
I would like to build a query that returns:
PropId
2
4
5
when my "WHERE" condition contains only PropId = 2. The join must go through the ClId values. Thanks in advance.
You can use group by and having. Here is one method:
select propid
from Cl
group by propid
having sum(case when clid = 1 then 1 else 0 end) > 0 and
sum(case when clid = 2 then 1 else 0 end) > 0 and
count(distinct clid) = 2;
As to the comment to the question:
(...) I have only 2 in my PropId and first of all I have to go to the CliId
values and according to them (values and their segmentations) achive
my PropId output. (...)
If by segmentation you mean a set of records whose count is greater than 1, you can join the data this way:
SELECT C.PropId
FROM Cl C INNER JOIN (
SELECT B.ClId
FROM Cl B
WHERE B.PropId = 2
) D ON D.Clid = C.Clid
GROUP BY C.PropID
HAVING COUNT(c.ClId )>1
Note: The result will be {1, 2, 4, 5}. Why?
Because the D subquery returns the set of ClId values {1, 2}, and four PropId values (1, 2, 4 and 5) match more than one of them.
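If the goal is the output from the question (2, 4, 5), one reading is that a PropId qualifies when its set of ClId values is exactly the set belonging to PropId 2. Under that assumption, and assuming each (PropId, ClId) pair occurs only once, the sketch below compares set sizes and membership without hard-coding the ClId values. It runs on SQLite from Python; the helper name props_with_same_clids and the alias ref are made up for the illustration.

```python
import sqlite3

CL_ROWS = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1), (4, 1), (4, 2), (5, 1), (5, 2)]

def props_with_same_clids(rows, target):
    """Return the PropIds whose ClId set equals the ClId set of the target PropId."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cl (propid INT, clid INT)")
    conn.executemany("INSERT INTO cl VALUES (?, ?)", rows)
    query = """
        SELECT c.propid
        FROM cl c
        GROUP BY c.propid
        -- same number of ClIds as the target ...
        HAVING COUNT(*) = (SELECT COUNT(*) FROM cl ref WHERE ref.propid = :target)
           -- ... and every one of them also belongs to the target
           AND SUM(CASE WHEN c.clid IN (SELECT ref.clid FROM cl ref WHERE ref.propid = :target)
                        THEN 1 ELSE 0 END) = COUNT(*)
        ORDER BY c.propid
    """
    result = [row[0] for row in conn.execute(query, {"target": target})]
    conn.close()
    return result

matching = props_with_same_clids(CL_ROWS, 2)
print(matching)
```

PropId 1 is excluded because it has a third ClId, and PropId 3 because it has only one.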
I have a data which is something like this
stories value
--------------------------
0 2194940472.78964
1 1651820586.1447
2 627935051.75
3 586994698.4272
4 89132137.57
5 134608008
6 40759564
7 0
8 0
10 0
11 0
12 0
13 26060602
17 0
18 0
19 84522335
20 316478066.045
24 0
I want to sum it up by range.
The output I expect:
stories value
0-3 125201021
4-7 215453123
8-12 453121545
12-max(numstories) 21354322
I tried this, but I am not able to figure out what is wrong:
select t.NumStories, SUM(t.bldnvalue)
from
(select
a.NumStories,
case
when a.NumStories between 0 and 3 then sum(a.BldgValue)
when a.NumStories between 4 and 7 then sum(a.BldgValue)
when a.NumStories between 8 and 12 then sum(a.BldgValue)
when a.NumStories between 13 and max(a.NumStories) then sum(a.BldgValue)
end as bldnvalue
from
dbo.EDM_CocaCola_Coca_Cola_Company_1_1 a
group by
a.NumStories) t
group by
t.NumStories
With this query I am getting this output
NumStories value
-------------------------------
0 2194940472.78964
3 586994698.4272
12 0
6 40759564
7 0
1 1651820586.1447
24 0
18 0
10 0
4 89132137.57
19 84522335
13 26060602
5 134608008
2 627935051.75
17 0
11 0
20 316478066.045
8 0
I like this result; I tried to use the bin concept. I think the only issue would be with your max bin. I don't understand how you got your output sums: the first record's value is 2,194,940,472.78964, which is bigger than your value in the 0-3 bin.
if OBJECT_ID('tempdb..#Test') is not null
drop table #Test;
Create table #Test (
Stories int
, Value float
)
insert into #Test
values
(0 , 2194940472.78964)
, (1 , 1651820586.1447 )
, (2 , 627935051.75 )
, (3 , 586994698.4272 )
, (4 , 89132137.57 )
, (5 , 134608008 )
, (6 , 40759564 )
, (7 , 0 )
, (8 , 0 )
, (10, 0 )
, (11, 0 )
, (12, 0 )
, (13, 26060602 )
, (17, 0 )
, (18, 0 )
, (19, 84522335 )
, (20, 316478066.045 )
, (24, 0 )
if OBJECT_ID('tempdb..#Bins') is not null
drop table #Bins;
create Table #Bins(
Label varchar(20)
, Min int
, Max int
)
insert into #Bins values
('0-3', 0, 3)
, ('4-7', 4, 7)
, ('8-12', 8, 12)
, ('13 - Max', 13, 999999999)
Select b.Label
, sum(t.Value) as Value
from #Test t
join #Bins b
on t.stories between b.Min and b.Max
Group by b.Label
order by 1
Output:
Label Value
-------------------- ----------------------
0-3 5061690809.11154
13 - Max 427061003.045
4-7 264499709.57
8-12 0
Just build the grouping string first that you want and group by that variable.
select
case
when a.NumStories between 0 and 3 then '0-3'
when a.NumStories between 4 and 7 then '4-7'
when a.NumStories between 8 and 12 then '8-12'
when a.NumStories >= 13 then '13-max'
end as stories,
sum(a.BldgValue) as value
from
dbo.EDM_CocaCola_Coca_Cola_Company_1_1 a
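-- SQL Server does not accept column ordinals in GROUP BY, so the expression is repeated
group by
case
when a.NumStories between 0 and 3 then '0-3'
when a.NumStories between 4 and 7 then '4-7'
when a.NumStories between 8 and 12 then '8-12'
when a.NumStories >= 13 then '13-max'
end;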
If you really want to print the max too, then you can put a subquery in the "13-max" line, such as (SELECT MAX(NumStories) FROM dbo.EDM_CocaCola_Coca_Cola_Company_1_1)
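Here is a small runnable sketch of the "build the label first, then group by it" idea, including the maximum in the last label. It uses SQLite from Python rather than SQL Server, so the || concatenation and grouping by the alias are SQLite behaviour; the table name buildings and the helper sum_by_story_range are assumptions for the illustration. The data is the sample from the question.

```python
import sqlite3

STORIES = [
    (0, 2194940472.78964), (1, 1651820586.1447), (2, 627935051.75),
    (3, 586994698.4272), (4, 89132137.57), (5, 134608008), (6, 40759564),
    (7, 0), (8, 0), (10, 0), (11, 0), (12, 0), (13, 26060602),
    (17, 0), (18, 0), (19, 84522335), (20, 316478066.045), (24, 0),
]

def sum_by_story_range(rows):
    """Return (label, total value) per story range, ordered by the range's lowest story."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE buildings (stories INT, value REAL)")
    conn.executemany("INSERT INTO buildings VALUES (?, ?)", rows)
    query = """
        SELECT CASE
                   WHEN stories BETWEEN 0 AND 3 THEN '0-3'
                   WHEN stories BETWEEN 4 AND 7 THEN '4-7'
                   WHEN stories BETWEEN 8 AND 12 THEN '8-12'
                   -- the scalar subquery puts the actual maximum into the last label
                   WHEN stories >= 13 THEN '13-' || (SELECT MAX(stories) FROM buildings)
               END AS label,
               SUM(value) AS total
        FROM buildings
        GROUP BY label
        ORDER BY MIN(stories)
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

binned = sum_by_story_range(STORIES)
for label, total in binned:
    print(label, total)
```

The sums agree with the bin-table answer above: about 5061690809.11 for 0-3, 264499709.57 for 4-7, 0 for 8-12 and 427061003.045 for the last bin, which is labelled 13-24 here.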
You can try this:
SELECT '0-3' AS stories,
SUM(value) AS value
FROM dbo.EDM_CocaCola_Coca_Cola_Company_1_1
WHERE stories BETWEEN 0 AND 3
UNION ALL
SELECT '4-7' AS stories,
SUM(value) AS value
FROM dbo.EDM_CocaCola_Coca_Cola_Company_1_1
WHERE stories BETWEEN 4 AND 7
UNION ALL
...
Here is a solution with a CTE that should work for any data set, without copying the code.
declare @YourTable table(stories int, value money)
declare @GroupMemberCount int=4
insert @YourTable (stories,value) values (0,5),(1,10),(2,11),(3,7),(4,18),(5,13),(7,15)
;with cte as
(
select c.stories+v.i*@GroupMemberCount FirstGroupMember, c.stories+v.i*@GroupMemberCount+@GroupMemberCount -1 LastGroupMember
,CAST(c.stories+v.i*@GroupMemberCount as varchar(50))
+'-'+CAST(c.stories+v.i*@GroupMemberCount+@GroupMemberCount -1 as varchar(50)) GroupName
from (select MIN(stories) stories from @YourTable) c
cross join (values (0),(1),(2),(3),(4)/* and so on */) v(i)
where exists (select * from @YourTable yt where yt.stories>=c.stories+v.i*@GroupMemberCount)
)
select c.GroupName, SUM(yt.value)
from cte c
JOIN @YourTable yt ON yt.stories BETWEEN c.FirstGroupMember AND c.LastGroupMember
GROUP BY c.GroupName
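When every group has the same width, the group boundaries can also be computed directly with integer division instead of generating them from a list of multipliers. This is a different technique from the CTE above, sketched on SQLite from Python with the answer's sample data; the names yt, bounds, lo and sum_by_fixed_width are assumptions for the illustration.

```python
import sqlite3

def sum_by_fixed_width(rows, width):
    """Sum values in consecutive groups of `width` stories, starting at the smallest story."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yt (stories INT, value REAL)")
    conn.executemany("INSERT INTO yt VALUES (?, ?)", rows)
    query = """
        WITH bounds AS (SELECT MIN(stories) AS lo FROM yt)
        SELECT lo + ((stories - lo) / :w) * :w AS first_member,           -- integer division
               lo + ((stories - lo) / :w) * :w + :w - 1 AS last_member,
               SUM(value) AS total
        FROM yt CROSS JOIN bounds
        GROUP BY first_member, last_member
        ORDER BY first_member
    """
    result = [
        (f"{first}-{last}", total)
        for first, last, total in conn.execute(query, {"w": width})
    ]
    conn.close()
    return result

sample = [(0, 5), (1, 10), (2, 11), (3, 7), (4, 18), (5, 13), (7, 15)]
bins = sum_by_fixed_width(sample, 4)
print(bins)
```

With a width of 4 the sample yields the groups 0-3 (total 33) and 4-7 (total 46). Groups that contain no rows do not appear, since the labels are derived from the rows themselves.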