Cumulative distinct count filtered by last value - T-SQL - sql

I am trying to come up with exactly the same answer as here:
Cumulative distinct count filtered by last value - DAX
but in SQL Server. For convenience I am copying the whole problem description.
I have a dataset:
month name flag
1 abc TRUE
2 xyz TRUE
3 abc TRUE
4 xyz TRUE
5 abc FALSE
6 abc TRUE
I want to calculate month-cumulative distinct count of 'name' filtered by last 'flag' value (TRUE). I.e. I want to have a result:
month count
1 1
2 2
3 2
4 2
5 1
6 2
In months 5 and 6 'abc' should be excluded because the flag switched to 'FALSE' in month 5.
I am thinking about using an OVER clause with PARTITION BY, but I have no experience with it, so it's a struggle for me.
UPDATE
I have updated the last row in exemplary source data.
was:
6 abc FALSE
is:
6 abc TRUE
And the last row in output data.
Was:
6 1
is:
6 2
It might not have been obvious from the description that it should work this way, and the proposed answer does not solve this problem.
UPDATE 2
I have managed to create a query that gives the correct result, but it's ugly and I think it could be shortened by using an OVER clause. Can you help me with that?
select t5.month_current, count(*) as count from
(select t3.month month_current, t4.month months_until_current, t3.name, t4.flag from
(select name ,month from
(select distinct name
from Source_data) t1
,(select distinct month
from Source_data) t2) t3
left join
Source_data t4
on t3.name = t4.name and t3.month >= t4.month) t5
inner join
(select t3.month month_current, max(t4.month) real_max_month_until_current, t3.name from
(select name ,month from
(select distinct name
from Source_data) t1
,(select distinct month
from Source_data) t2) t3
left join
Source_data t4
on t3.name = t4.name and t3.month >= t4.month
group by
t3.month, t3.name) t6
on t5.month_current = t6.month_current
and t5.months_until_current = t6.real_max_month_until_current
and t5.name = t6.name
where t5.flag = 'TRUE'
group by t5.month_current

You can do a cumulative distinct count as:
select t.*,
sum(case when seqnum = 1 then 1 else 0 end) over (order by month) as cnt
from (select t.*,
row_number() over (partition by name order by month) as seqnum
from t
) t;
I don't understand the logic for incorporating the flag.
You can replicate the results in the question as originally posted (before the month 6 row was changed to TRUE) by incorporating the flag:
select t.*,
sum(case when seqnum = 1 and flag = 'true' then 1
when seqnum = 1 and flag = 'false' then -1
else 0
end) over (order by month) as cnt
from (select t.*,
row_number() over (partition by name, flag order by month) as seqnum
from t
) t;
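For the updated expected output (month 6 gives 2), one way to read the requirement is: for each month, take every name's most recent flag up to that month and count the names whose latest flag is TRUE. Below is a minimal Python sketch of that logic, not a T-SQL solution. The function name `cumulative_true_names` is purely illustrative, and the sketch assumes one row per month, as in the sample data.

```python
def cumulative_true_names(rows):
    """rows: iterable of (month, name, flag) tuples, one row per month (assumed).

    Returns a list of (month, count), where count is the number of names
    whose most recent flag up to and including that month is True.
    """
    last_flag = {}  # name -> latest flag seen so far
    result = []
    for month, name, flag in sorted(rows):
        last_flag[name] = flag  # a newer row overrides the older flag
        result.append((month, sum(last_flag.values())))
    return result


rows = [
    (1, "abc", True),
    (2, "xyz", True),
    (3, "abc", True),
    (4, "xyz", True),
    (5, "abc", False),
    (6, "abc", True),
]
counts = cumulative_true_names(rows)
print(counts)
```

On the sample rows this yields the counts 1, 2, 2, 2, 1, 2 for months 1 to 6, which matches the updated expected output in the question.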

Related

SQL Server - Sum of difference between rows in a table

I have a table in the format :
SomeID SomeData
1 3
2 7
3 9
4 10
5 14
6 16
. .
. .
I want to find the sum of the differences between consecutive pairs of rows in this table, i.e. ((7-3) + (10-9) + (16-14) + ...).
What is the best way to do this?
Using a self join along with the modulus:
SELECT SUM(t1.SomeData - t2.SomeData) AS total_diff
FROM yourTable t1
INNER JOIN yourTable t2
ON t1.SomeID = t2.SomeID + 1
WHERE t1.SomeID % 2 = 0;
This answer assumes that the SomeID sequence in fact starts with 1 and increments by 1 with each subsequent row. If not, then we might be able to first apply ROW_NUMBER over SomeID and generate a 1 to N sequence.
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY SomeID) rn
FROM yourTable
)
SELECT SUM(t1.SomeData - t2.SomeData) AS total_diff
FROM cte t1
INNER JOIN cte t2
ON t1.rn = t2.rn + 1
WHERE t1.rn % 2 = 0;
You can use the ROW_NUMBER window function to generate a sequence number, take it modulo 2 to split the rows into the two groups, and then use conditional aggregation.
Query 1:
SELECT SUM(CASE WHEN rn = 0 THEN SomeData END) - SUM(CASE WHEN rn = 1 THEN SomeData END)
FROM (
SELECT SomeData,ROW_NUMBER() over(order by SomeID) % 2 rn
FROM t t1
) t1
Results:
| |
|---|
| 7 |
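The pairing logic behind these answers can be sketched outside SQL. This is a minimal Python illustration, assuming the values are already ordered by SomeID; the function name `sum_pair_differences` is made up for the example.

```python
def sum_pair_differences(values):
    """Sum (second - first) over consecutive, non-overlapping pairs.

    values must already be ordered by SomeID. A trailing unpaired value
    is ignored, as it would be by the self join.
    """
    firsts = values[0::2]   # rows 1, 3, 5, ... (rn % 2 = 1)
    seconds = values[1::2]  # rows 2, 4, 6, ... (rn % 2 = 0)
    return sum(b - a for a, b in zip(firsts, seconds))


some_data = [3, 7, 9, 10, 14, 16]
total = sum_pair_differences(some_data)
print(total)  # 7
```

Note that the conditional aggregation query treats an unpaired last row differently: it would subtract that row's value instead of ignoring it.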

SQL: Same date but different values

I have a table that looks like this
Station year month day number
A1 1990 1 1 50
A1 1990 1 1 60
A1 1990 1 2 55
A1 1990 1 3 10
A1 1990 1 4 40
For example, the query result should look like the table below: rows with the same station and date.
Station year month day number
A1 1990 1 1 50
A1 1990 1 1 60
How to set a proper SQL for it?
If I understand correctly, you want rows where the first four columns are duplicated. A simple method uses count(*):
select t.*
from (select t.*,
count(*) over (partition by station, year, month, day) as cnt
from t
) t
where cnt > 1;
Assuming your table has a primary key column called id, we can also try using exists logic here:
SELECT t1.*
FROM yourTable t1
WHERE EXISTS (SELECT 1 FROM yourTable t2
WHERE t2.Station = t1.Station AND t2.year = t1.year AND
t2.month = t1.month AND t2.day = t1.day AND t2.id <> t1.id);
If you don't have such an id column, then we could also use aggregation here:
SELECT t1.*
FROM yourTable t1
INNER JOIN
(
SELECT Station, year, month, day
FROM yourTable
GROUP BY Station, year, month, day
HAVING COUNT(*) > 1
) t2
ON t2.Station = t1.Station AND t2.year = t1.year AND t2.month = t1.month AND
t2.day = t1.day;
You can use an EXISTS clause to find rows that share the same station and date but have a different number, as follows:
Select t.*
From your_table t
Where exists
(select 1 from your_table tt
Where tt.station = t.station
And tt.year = t.year
And tt.month = t.month
And tt.day = t.day
And tt.number <> t.number)
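As a rough illustration of what the count-based queries compute, here is a small Python sketch that keeps only the rows whose first four columns occur more than once. The function name `rows_with_duplicate_keys` and the tuple layout (station, year, month, day, number) are assumptions made for the example.

```python
from collections import Counter


def rows_with_duplicate_keys(rows, key_len=4):
    """Return the rows whose first key_len columns occur more than once."""
    counts = Counter(row[:key_len] for row in rows)
    return [row for row in rows if counts[row[:key_len]] > 1]


rows = [
    ("A1", 1990, 1, 1, 50),
    ("A1", 1990, 1, 1, 60),
    ("A1", 1990, 1, 2, 55),
    ("A1", 1990, 1, 3, 10),
    ("A1", 1990, 1, 4, 40),
]
duplicates = rows_with_duplicate_keys(rows)
print(duplicates)
```

This mirrors the COUNT(*) variants. The EXISTS variant that compares number is slightly stricter, since it would skip duplicate rows that also share the same number.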

Running total (COUNT) SQL Server

I currently have this result
ID Code
1 AAA12
2 F5
3 GOFK568
4 G77
5 JLKJ4
6 FOG0
What I want to do is create a third column that keeps a running total of the codes that are longer than 4 characters.
I have this code, which gives me the number of codes longer than 4 characters.
SELECT * ,
SUM(CASE WHEN LENGTH(CODE) > 4 THEN 1 ELSE 0 END) AS [Count]
FROM Table1;
But this gives me this result
ID Code Count
1 AAA12 3
I am looking for a result like this
ID Code Running_Total
1 AAA12 1
2 F5 1
3 GOFK568 2
4 G77 2
5 JLKJ4 3
6 FOG0 3
I was working on something similar to this
SELECT * ,
CASE WHEN LENGTH(CODE) > 4 THEN (SUM(Code) OVER (PARTITION BY ID)) ELSE END
AS [Count]
FROM Table1;
But it still doesn't give me a running total.
I have an SQL Fiddle page
http://sqlfiddle.com/#!9/2746c/18
Any help would be great
Put the case in the sum:
SELECT Table1.* ,
SUM(case when len(Code) > 4 then 1 else 0 end) OVER (order BY ID) as counted
FROM Table1;
In SQL Server 2012+ you can use the SUM() OVER (ORDER BY ...) window function:
SELECT Sum(CASE WHEN Len(code) > 4 THEN 1 ELSE 0 END)
OVER(ORDER BY id)
FROM Yourtable
for older versions
SELECT *
FROM Yourtable a
CROSS apply (SELECT Count(*)
FROM Yourtable b
WHERE a.ID >= b.ID
AND Len(code) > 4) cs (runn)
ANSI SQL method
SELECT ID,Code,
(SELECT count(*)
FROM Yourtable b
WHERE a.ID >= b.ID and char_length(code) > 4) AS runn
FROM Yourtable a
There are some good and efficient answers here.
But in case you want to try a different approach, try the following query:
SELECT
t1.*,
(Select sum(r.cnt) from
(SELECT COUNT(t2.code) as cnt FROM table1 AS t2
WHERE t2.Id <= t1.Id
group by t2.code
having len(t2.code) > 4) r
) AS Count
FROM table1 AS t1;
Hope it helps!
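To make the running-total idea concrete outside SQL: each row contributes 1 if its code is longer than 4 characters and 0 otherwise, and the running total is the cumulative sum of those contributions in ID order. A minimal Python sketch, assuming the codes are listed in ID order:

```python
from itertools import accumulate

# Codes from the question, in ID order (1..6).
codes = ["AAA12", "F5", "GOFK568", "G77", "JLKJ4", "FOG0"]

# 1 for codes longer than 4 characters, 0 otherwise, then a cumulative sum.
running_total = list(accumulate(1 if len(code) > 4 else 0 for code in codes))
print(running_total)  # [1, 1, 2, 2, 3, 3]
```

This is the same computation as SUM(CASE ...) OVER (ORDER BY ID), just expressed row by row.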

Finding Missing Numbers series when Data Is Grouped in sql server

I need to write a query that will calculate the missing numbers with their count in a sequence when the data is "grouped". The data are in multiple groups & each group is in sequence.
For example, I have number series like 1001-1050, 1245-1270, 4571-4590, and all the numbers (1001, 1002, 1003, ..., 1050) are stored in Table1. Some of those numbers are also stored in another table, Table2, e.g. 1001, 1002, 1003, 1004, 1005.
I want to get output like this:
Utilized Numbers | Balance Numbers |
----------- -------------------------
1001 - 1005 = 5 | 1006 - 1050 = 45 |
1245 - 1251 = 7 | 1252 - 1270 = 19 |
4571 - 4573 = 3 | 4574 - 4590 = 17 |
The numbers of each series are stored in a single field, which exists in both tables.
You haven't really explained your data, so I am guessing that "Utilized" are the numbers found in both Table1 and Table2, and "Balance" are the numbers found only in Table1.
You can get the result at least this way, it's a little bit messy, mostly because of formatting the results:
Edit: This is a new version that does not use lag.
select
min (case when C2 = 1 then MINID end), max (case when C2 = 1 then MAXID end), max(case when C2=1 then ROWS end),
min (case when C2 = 0 then MINID end), max (case when C2 = 0 then MAXID end), max(case when C2=0 then ROWS end)
from (
select min(ID) as MINID, max(ID) as MAXID, count(*) as ROWS, C2, row_number() over (partition by C2 order by min(ID)) as GRP3 from (
select *, ID - RN as GRP1, ID - RN2 as GRP2 from (
select
T1.ID, row_number() over (order by T1.ID) as RN,
case when T2.ID is NULL then 0 else 1 end as C2,
row_number() over (partition by case when T2.ID is NULL then 0 else 1 end order by T1.ID) as RN2,
T2.ID as ID2
from #Table1 T1
left outer join #Table2 T2 on T1.ID = T2.ID
) X
) Y
group by GRP1, GRP2, C2
) Z
group by GRP3
order by 1
The idea here is to compute a row number ordered by Table1.ID and subtract it from Table1.ID; whenever that difference changes, a new group starts. The same logic is applied a second time, with the row number partitioned by whether the row exists in Table2, to handle the switch between "Utilized" and "Balance".
From those groupings you can get the min and max value + number of rows. There's one additional grouping with min/max and case to format the result into 2 columns.
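The grouping idea can be sketched in Python as follows. This is only an illustration under the same guess about the data (Table2 holds a subset of the IDs in Table1); the function name `utilized_and_balance` is made up. Because `itertools.groupby` only merges adjacent rows, the sketch needs a single row number rather than the two used in the SQL.

```python
from itertools import groupby


def utilized_and_balance(table1_ids, table2_ids):
    """Split Table1 IDs into consecutive ranges, separated by whether
    the ID also exists in Table2.

    Returns a list of (is_utilized, min_id, max_id, row_count).
    """
    used = set(table2_ids)
    ordered = sorted(table1_ids)
    # id - rn stays constant inside a run of consecutive IDs.
    keyed = [(id_ - rn, id_ in used, id_)
             for rn, id_ in enumerate(ordered, start=1)]
    islands = []
    for (_, is_used), items in groupby(keyed, key=lambda k: (k[0], k[1])):
        ids = [k[2] for k in items]
        islands.append((is_used, ids[0], ids[-1], len(ids)))
    return islands


table1 = list(range(1001, 1051)) + list(range(1245, 1271)) + list(range(4571, 4591))
table2 = list(range(1001, 1006)) + list(range(1245, 1252)) + list(range(4571, 4574))
islands = utilized_and_balance(table1, table2)
for island in islands:
    print(island)
```

With the series from the question, this produces the six ranges from the desired output, alternating between "Utilized" (True) and "Balance" (False).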

Count consecutive duplicate values in SQL

I have a table like so
ID OrdID Value
1 1 0
2 2 0
3 1 1
4 2 1
5 1 1
6 2 0
7 1 0
8 2 0
9 2 1
10 1 0
11 2 0
I want to get the length of the longest run of consecutive rows where the value is 0. Using the example above, the result will be 3 (rows 6, 7 and 8). I am using SQL Server 2008 R2.
I am going to presume that id is unique and increasing. You can get counts of consecutive values by using the difference of row numbers. The following counts all sequences:
select grp, value, min(id), max(id), count(*) as cnt
from (select t.*,
(row_number() over (order by id) - row_number() over (partition by value order by id)
) as grp
from table t
) t
group by grp, value;
If you want the longest sequence of 0s:
select top 1 grp, value, min(id), max(id), count(*) as cnt
from (select t.*,
(row_number() over (order by id) - row_number() over (partition by value order by id)
) as grp
from table t
) t
group by grp, value
having value = 0
order by count(*) desc
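To see why the difference of row numbers identifies each run, here is a small Python sketch of the same computation on the sample data. The function name `longest_run_of` is illustrative, and the rows are assumed to be ordered by id.

```python
from collections import defaultdict


def longest_run_of(rows, target=0):
    """rows: list of (id, value) ordered by id.

    Returns (min_id, max_id, count) for the longest run of target,
    or None if target never occurs.
    """
    seen_per_value = defaultdict(int)  # row_number() partitioned by value
    groups = defaultdict(list)         # (grp, value) -> ids in that run
    for rn, (id_, value) in enumerate(rows, start=1):
        seen_per_value[value] += 1
        grp = rn - seen_per_value[value]  # constant within one run
        groups[(grp, value)].append(id_)
    runs = [ids for (_, value), ids in groups.items() if value == target]
    if not runs:
        return None
    best = max(runs, key=len)
    return best[0], best[-1], len(best)


values = [0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0]
rows = list(enumerate(values, start=1))  # ids 1..11
result = longest_run_of(rows)
print(result)  # (6, 8, 3)
```

For the zero rows the difference takes the values 0 (ids 1-2), 3 (ids 6-8) and 4 (ids 10-11), so grouping by it separates the three runs.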
A query using not exists to find consecutive 0s
select top 1 min(t2.id), max(t2.id), count(*)
from mytable t
join mytable t2 on t2.id <= t.id
where not exists (
select 1 from mytable t3
where t3.id between t2.id and t.id
and t3.value <> 0
)
group by t.id
order by count(*) desc
http://sqlfiddle.com/#!3/52989/3