SQL: counting the number of ones in sequence

I have the following table, and as you can see the ids are not the same, so I can't use GROUP BY. I need to count all the ones that are in sequence, like from id 9 to 13 and from id 20 to 23. How can I do it?

Here's a solution with LAG and LEAD.
;WITH StackValues AS
(
SELECT
T.*,
PreviousStatus = LAG(T.Status, 1, 0) OVER (ORDER BY T.ID ASC),
NextStatus = LEAD(T.Status, 1, 0) OVER (ORDER BY T.ID ASC)
FROM
#YourTable AS T
),
ValuesToSum AS
(
SELECT
L.*,
ValueToSum = CASE
WHEN L.Status = 1 AND L.PreviousStatus = 1 AND L.NextStatus = 0 THEN 1
ELSE 0 END
FROM
StackValues AS L
)
SELECT
Total = SUM(V.ValueToSum)
FROM
ValuesToSum AS V
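LAG returns the value from the row N rows before the current one, and LEAD returns the value from the row N rows after it (N = 1 here, with 0 as the default when no such row exists). The query uses them to build another column, ValueToSum, which is 1 only on the last row of a run of consecutive ones (Status = 1, previous Status = 1, next Status = 0), and then sums that column. Because the previous row must also be 1, a lone 1 is not counted as a sequence.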
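To see the counting logic end to end, here is a minimal sketch that runs the same query from Python against an in-memory SQLite database rather than SQL Server. It assumes SQLite 3.25 or later (for LAG/LEAD), rewrites T-SQL's Alias = expr syntax as expr AS Alias, and uses hypothetical data with ones at ids 9-13 and 20-23 plus a lone one at id 5.

```python
import sqlite3

# Hypothetical data: Status = 1 for ids 9-13 and 20-23, plus a lone 1 at id 5.
rows = [(i, 1 if (9 <= i <= 13 or 20 <= i <= 23 or i == 5) else 0)
        for i in range(1, 26)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (ID INTEGER PRIMARY KEY, Status INTEGER)")
conn.executemany("INSERT INTO YourTable (ID, Status) VALUES (?, ?)", rows)

# Same logic as the T-SQL answer; "Alias = expr" is written as "expr AS Alias".
query = """
WITH StackValues AS (
    SELECT T.*,
           LAG(T.Status, 1, 0)  OVER (ORDER BY T.ID) AS PreviousStatus,
           LEAD(T.Status, 1, 0) OVER (ORDER BY T.ID) AS NextStatus
    FROM YourTable AS T
),
ValuesToSum AS (
    SELECT L.*,
           CASE WHEN L.Status = 1 AND L.PreviousStatus = 1 AND L.NextStatus = 0
                THEN 1 ELSE 0 END AS ValueToSum   -- 1 on the last row of each run
    FROM StackValues AS L
)
SELECT SUM(V.ValueToSum) AS Total
FROM ValuesToSum AS V
"""

total = conn.execute(query).fetchone()[0]
print(total)  # 2: one for ids 9-13, one for ids 20-23
```

The lone 1 at id 5 is not counted, because the CASE also requires the previous row to be 1.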

Related

Find overlapping range in PL/SQL

Sample data below
id start end
a 1 3
a 5 6
a 8 9
b 2 4
b 6 7
b 9 10
c 2 4
c 6 7
c 9 10
I'm trying to come up with a query that returns all the inclusive start-end ranges where a, b, and c overlap (but that is extendable to more ids). The expected output looks like this:
start end
2 3
6 6
9 9
The only way I can picture this is with a custom aggregate function that tracks the currently valid intervals and computes the new intervals during the iterate phase. However, I can't see this approach being practical on large datasets. So if someone out there has a query or knows a built-in function I'm not aware of, I would greatly appreciate the help.
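You can do this with a self-join. Assuming there are no overlaps within the ranges of "a" or within those of "b", this returns the overlaps between "a" and "b"; each further id ("c" and so on) needs one more join: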
select greatest(ta.start, tb.start) as start,
least(ta.end, tb.end) as end
from t ta join
t tb
on ta.start <= tb.end and ta.end >= tb.start and
ta.id = 'a' and tb.id = 'b';
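The join can be extended to the third id by joining once more against the overlap found so far. The sketch below illustrates this in SQLite (run from Python) with the sample data from the question. It assumes SQLite's multi-argument max() and min() stand in for GREATEST and LEAST, and it quotes the start and end columns because END is a keyword in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "end" is a keyword in SQLite, so the column names are quoted.
conn.execute('CREATE TABLE t (id TEXT, "start" INTEGER, "end" INTEGER)')
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("a", 1, 3), ("a", 5, 6), ("a", 8, 9),
     ("b", 2, 4), ("b", 6, 7), ("b", 9, 10),
     ("c", 2, 4), ("c", 6, 7), ("c", 9, 10)],
)

# SQLite's multi-argument max()/min() play the role of GREATEST()/LEAST().
# Each extra id adds one join against the overlap accumulated so far.
query = """
SELECT max(ta."start", tb."start", tc."start") AS s,
       min(ta."end",   tb."end",   tc."end")   AS e
FROM t AS ta
JOIN t AS tb
  ON ta."start" <= tb."end" AND ta."end" >= tb."start"
JOIN t AS tc
  ON max(ta."start", tb."start") <= tc."end"
 AND min(ta."end", tb."end") >= tc."start"
WHERE ta.id = 'a' AND tb.id = 'b' AND tc.id = 'c'
ORDER BY s
"""

overlaps = conn.execute(query).fetchall()
print(overlaps)  # [(2, 3), (6, 6), (9, 9)]
```

Like the two-way version, this relies on no overlaps existing within a single id's ranges.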
This is a lot uglier and more complex than Gordon's solution, but I think it matches the expected output better and should extend to more ids:
WITH NUMS(N) AS ( --GENERATE NUMBERS N FROM THE SMALLEST START VALUE TO THE LARGEST END VALUE
SELECT MIN("START") N FROM T
UNION ALL
SELECT N+1 FROM NUMS WHERE N < (SELECT MAX("END") FROM T)
),
SEQS(N,START_RANK,END_RANK) AS (
SELECT N,
CASE WHEN IS_START=1 THEN ROW_NUMBER() OVER (PARTITION BY IS_START ORDER BY N) ELSE 0 END START_RANK, --ASSIGN A RANK TO EACH RANGE START
CASE WHEN IS_END=1 THEN ROW_NUMBER() OVER (PARTITION BY IS_END ORDER BY N) ELSE 0 END END_RANK --ASSIGN A RANK TO EACH RANGE END
FROM (
SELECT N,
CASE WHEN NVL(LAG(N) OVER (ORDER BY N),N) + 1 <> N THEN 1 ELSE 0 END IS_START, --MARK N AS A RANGE START
CASE WHEN NVL(LEAD(N) OVER (ORDER BY N),N) -1 <> N THEN 1 ELSE 0 END IS_END /* MARK N AS A RANGE END */
FROM (
SELECT DISTINCT N FROM ( --GET THE SET OF NUMBERS N THAT ARE INCLUDED IN ALL ID RANGES
SELECT NUMS.*,T.*,COUNT(*) OVER (PARTITION BY N) N_CNT,COUNT(DISTINCT "ID") OVER () ID_CNT
FROM NUMS
JOIN T ON (NUMS.N >= T."START" AND NUMS.N <= T."END")
) WHERE N_CNT=ID_CNT
)
) WHERE IS_START + IS_END > 0
)
SELECT STARTS.N "START",ENDS.N "END" FROM SEQS STARTS
JOIN SEQS ENDS ON (STARTS.START_RANK=ENDS.END_RANK AND STARTS.N <= ENDS.N) ORDER BY "START"; --MATCH CORRESPONDING RANGE START/END VALUES
First we generate all the numbers between the smallest start value and the largest end value.
Then we find the numbers that are included in all the provided "id" ranges by joining our generated numbers to the ranges, and selecting each number "n" that appears once for each "id".
Then we determine whether each of these values "n" starts or ends a range. To determine that, for each N we say:
If the previous value of N does not exist or is not 1 less than current N, current N starts a range. If the next value of N does not exist or is not 1 greater than current N, current N ends a range.
Next, we assign a "rank" to each start and end value so we can match them up.
Finally, we self-join where the ranks match (and where the start <= the end) to get our result.
EDIT: After some searching, I found a better way to find the range starts and ends: within a run of consecutive numbers, ROW_NUMBER() OVER (ORDER BY N) - N is constant, so it can serve as a group id. I refactored the query to:
WITH NUMS(N) AS ( --GENERATE NUMBERS N FROM THE SMALLEST START VALUE TO THE LARGEST END VALUE
SELECT MIN("START") N FROM T
UNION ALL
SELECT N+1 FROM NUMS WHERE N < (SELECT MAX("END") FROM T)
)
SELECT MIN(N) "START",MAX(N) "END" FROM (
SELECT N,ROW_NUMBER() OVER (ORDER BY N)-N GRP_ID
FROM (
SELECT DISTINCT N FROM ( --GET THE SET OF NUMBERS N THAT ARE INCLUDED IN ALL ID RANGES
SELECT NUMS.*,T.*,COUNT(*) OVER (PARTITION BY N) N_CNT,COUNT(DISTINCT "ID") OVER () ID_CNT
FROM NUMS
JOIN T ON (NUMS.N >= T."START" AND NUMS.N <= T."END")
) WHERE N_CNT=ID_CNT
)
)
GROUP BY GRP_ID ORDER BY "START";
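The refactored query's logic (enumerate the numbers, keep those covered by every id, then group runs by the difference between each number and its ROW_NUMBER) can be tried in SQLite as a sketch. This is not Oracle: WITH RECURSIVE is required, and because SQLite does not allow COUNT(DISTINCT ...) as a window function, the "covered by every id" filter is written as GROUP BY with HAVING COUNT(DISTINCT id) instead. It uses n - ROW_NUMBER() rather than ROW_NUMBER() - N, which gives the same grouping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t (id TEXT, "start" INTEGER, "end" INTEGER)')
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("a", 1, 3), ("a", 5, 6), ("a", 8, 9),
     ("b", 2, 4), ("b", 6, 7), ("b", 9, 10),
     ("c", 2, 4), ("c", 6, 7), ("c", 9, 10)],
)

query = """
WITH RECURSIVE
bounds(lo, hi) AS (SELECT MIN("start"), MAX("end") FROM t),
nums(n) AS (                        -- every integer from lo to hi
    SELECT lo FROM bounds
    UNION ALL
    SELECT n + 1 FROM nums, bounds WHERE n < hi
),
common(n) AS (                      -- numbers covered by a range of every id
    SELECT nums.n
    FROM nums JOIN t ON nums.n BETWEEN t."start" AND t."end"
    GROUP BY nums.n
    HAVING COUNT(DISTINCT t.id) = (SELECT COUNT(DISTINCT id) FROM t)
),
grouped AS (                        -- n - ROW_NUMBER() is constant within a run
    SELECT n, n - ROW_NUMBER() OVER (ORDER BY n) AS grp FROM common
)
SELECT MIN(n), MAX(n) FROM grouped GROUP BY grp ORDER BY MIN(n)
"""

ranges = conn.execute(query).fetchall()
print(ranges)  # [(2, 3), (6, 6), (9, 9)]
```

Counting distinct ids per number also tolerates overlapping ranges within one id, which COUNT(*) per number would not.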

Select Random Numbers from a list

This is my query.
SELECT TOP 2 NUM
FROM QT_PIVOT
WHERE NUM BETWEEN 1 AND 45
ORDER BY NEWID()
I'm selecting 2 random numbers from a list, but I don't want these numbers to be consecutive.
Sometimes the result is
NUM
----
2
3
And I don't want this
This is basically the same as Gordon's second approach, except that it doesn't use the LAG function and therefore works on SQL Server 2008. It uses sys.objects and object_id in place of your table and column:
WITH Data AS(
SELECT *, RowNum = ROW_NUMBER() OVER (ORDER BY NEWID())
FROM sys.objects AS O
),
r AS(
SELECT TOP 1 *, FirstId = Data.object_id, SkipRow = 0
FROM Data
WHERE Data.RowNum = 1
UNION ALL
-- compare every later row with the first row (FirstId), not just with the row before it
SELECT d.*, r.FirstId, SkipRow = CASE WHEN d.object_id BETWEEN r.FirstId - 2 AND r.FirstId + 2 THEN 1 ELSE 0 END
FROM r
JOIN Data AS D
ON r.RowNum + 1 = D.RowNum
)
SELECT TOP 2 * FROM R
WHERE R.SkipRow = 0
One approach is to select the first number, and then select an appropriate second number:
WITH r AS (
SELECT TOP 1 num
FROM QT_PIVOT
WHERE NUM BETWEEN 1 AND 45
ORDER BY NEWId()
)
select num
from r
union all
select top 1 q.num
from qt_pivot q join
r
on q.num not in (r.num, r.num - 1, r.num + 1)
where q.num between 1 and 45
order by newid();
Another approach (if you had SQL Server 2012+) would use lag() to remove any possibilities that do not meet the conditions:
WITH r AS (
SELECT num, row_number() over (order by newid()) as seqnum
FROM QT_PIVOT
WHERE NUM BETWEEN 1 AND 45
)
SELECT r.num
FROM (SELECT r.*, LAG(num) OVER (ORDER BY seqnum) as prevnum
FROM r
) r
WHERE prevnum is null or
prevnum not in (num - 1, num + 1);
EDIT:
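The first approach doesn't work, because SQL Server does not materialize CTEs: each reference to r is evaluated separately, so NEWID() can pick a different first number in each branch of the UNION ALL, and there is no hint to prevent this. Here is an alternative approach that ensures the values are not consecutive: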
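DECLARE @r1 int = 1 + ABS(CHECKSUM(NEWID())) % 45; -- random position, 1 to 45
DECLARE @r2 int = 2 + ABS(CHECKSUM(NEWID())) % 42; -- allowable offset, 2 to 43
SELECT q.num
FROM QT_PIVOT q
WHERE q.num = @r1 or
q.num = 1 + (@r1 - 1 + @r2) % 45;
This calculates two random numbers up front, in variables, so each one is evaluated exactly once. The first is a random position from 1 to 45. The second is an allowable offset from 2 to 43 (hence the "2" and the 42 possible values), so the second number, taken that many steps after the first and wrapping from 45 back to 1, can be neither equal nor adjacent to the first. This assumes QT_PIVOT contains every number from 1 to 45.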
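The offset arithmetic behind this approach can be checked exhaustively. The sketch below assumes the second number is computed as 1 + (r1 - 1 + r2) % 45, with r1 in 1..45 and r2 in 2..43; it enumerates every pair, confirms the two numbers are never equal or adjacent, and shows why a larger offset would not be safe. It is plain Python for illustration, not T-SQL.

```python
import random

SIZE = 45  # numbers 1..45, as in the question


def second_number(r1: int, r2: int) -> int:
    """Position r2 steps after r1, wrapping from 45 back to 1."""
    return 1 + (r1 - 1 + r2) % SIZE


# Every first number r1 in 1..45 combined with every offset r2 in 2..43.
ok = all(
    abs(second_number(r1, r2) - r1) > 1          # neither equal nor adjacent
    for r1 in range(1, SIZE + 1)
    for r2 in range(2, SIZE - 1)
)
print(ok)  # True

# An offset of 44 steps back to r1 - 1 (for r1 > 1), i.e. an adjacent number.
print(second_number(10, 44))  # 9

# Drawing one pair the same way the query does.
r1 = random.randint(1, SIZE)
r2 = random.randint(2, SIZE - 2)
pair = (r1, second_number(r1, r2))
```

Without wrapping, the gap is r2 (at least 2); with wrapping, it is 45 - r2 (also at least 2), which is why the offset must stop at 43.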

Need help creating SQL query from example of data

I have the database table below.
I want to get a list of all DBKeys that have at least one entry with Staled = 1 and whose last entry has Staled = 0.
The list should not contain any DBKey that has only Staled = 0 or only Staled = 1 entries.
In this example, the list would be: DBKey=2 and DBKey=3
I think this should do the trick:
SELECT DISTINCT T.DBKey
FROM TABLE T
WHERE
-- checks that the DBKey has at least one entry with Staled = 1
EXISTS (
SELECT DISTINCT Staled
FROM TABLE
WHERE DBKey = T.DBKey
AND Staled = 1
)
-- checks that the last Staled entry for this DBKey is 0
AND EXISTS (
SELECT DISTINCT Staled
FROM TABLE
WHERE DBKey = T.DBKey
AND Staled = 0
AND EntryDateTime = (
SELECT MAX(EntryDateTime)
FROM TABLE
WHERE DBKey = T.DBKey
)
)
The idea is to use EXISTS to look for those individual conditions that you've described. I've added comments to my code to explain what each does.
This can be done with a simple self-join. Start with the Staled = 1 rows, join the table to itself on the same DBKey, and keep only the Staled = 0 rows on the other side. The HAVING clause then requires the latest Staled = 0 entry to be later than the latest Staled = 1 entry, which means the last entry is Staled = 0. Make sure you have an index on (DBKey, Staled, EntryDateTime).
SELECT
YT.DBKey,
MAX( YT.EntryDateTime ) as MaxStaled1,
MAX( YT2.EntryDateTime ) as MaxStaled0
from
YourTable YT
JOIN YourTable YT2
ON YT.DBKey = YT2.DBKey
AND YT2.Staled = 0
where
YT.Staled = 1
group by
YT.DBKey
having
MAX( YT.EntryDateTime ) < MAX( YT2.EntryDateTime )
Maybe this:
With X as
(
Select Row_Number() Over (Partition By DBKey Order By EntryDateTime Desc) RN, DBKey, Staled
From table
)
Select *
From X
Where rn = 1 and staled = 0 and
Exists (select 1 from x x2 where x2.dbkey = x.dbkey and Staled = 1)
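The ROW_NUMBER answer can be checked on a small example. Since the question's table is not reproduced here, the sketch below uses hypothetical rows for DBKeys 1 to 5, chosen so that only 2 and 3 qualify, matching the expected result. It runs in SQLite from Python, with YourTable as a placeholder name and the outer reference aliased as x1 for clarity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (DBKey INTEGER, Staled INTEGER, EntryDateTime TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [
        (1, 0, "2020-01-01"), (1, 0, "2020-01-02"),                        # only 0s
        (2, 1, "2020-01-01"), (2, 0, "2020-01-02"),                        # 1 then 0
        (3, 0, "2020-01-01"), (3, 1, "2020-01-02"), (3, 0, "2020-01-03"),  # last is 0
        (4, 0, "2020-01-01"), (4, 1, "2020-01-02"),                        # last is 1
        (5, 1, "2020-01-01"), (5, 1, "2020-01-02"),                        # only 1s
    ],
)

query = """
WITH X AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY DBKey ORDER BY EntryDateTime DESC) AS RN,
           DBKey, Staled
    FROM YourTable
)
SELECT x1.DBKey
FROM X AS x1
WHERE x1.RN = 1 AND x1.Staled = 0                 -- latest entry is Staled = 0
  AND EXISTS (SELECT 1 FROM X AS x2               -- and some entry is Staled = 1
              WHERE x2.DBKey = x1.DBKey AND x2.Staled = 1)
ORDER BY x1.DBKey
"""

keys = [row[0] for row in conn.execute(query)]
print(keys)  # [2, 3]
```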

T-sql problem with running sum

I am trying to write a T-SQL script that finds "open" records in a table.
The data is structured as follows:
Id (int PK) Ts (datetime) Art_id (int) Amount (float)
1 '2009-01-01' 1 1
2 '2009-01-05' 1 -1
3 '2009-01-10' 1 1
4 '2009-01-11' 1 -1
5 '2009-01-13' 1 1
6 '2009-01-14' 1 1
7 '2009-01-15' 2 1
8 '2009-01-17' 2 -1
9 '2009-01-18' 2 1
For each article, I want to show only the records after the last point (ordered by date) where the running sum is zero. So I want to extract records 5 and 6 for Art_id = 1 and record 9 for Art_id = 2. I am using SQL Server 2005, and my table has around 30K records with 6,000 distinct Art_id values.
In this solution I simply want to find all the rows where there isn't a subsequent row for that Art_id where the running sum was 0. I am assuming we can use the ID as a better tiebreaker than TS, since two rows can come in with the same timestamp but they will get sequential identity values.
;WITH base AS
(
SELECT
ID, Art_id, TS, Amount,
RunningSum = Amount + COALESCE
(
(
SELECT SUM(Amount)
FROM dbo.[table name]
WHERE Art_id = f.Art_id
AND ID < f.ID
)
, 0
)
FROM dbo.[table name] AS f
)
SELECT ID, Art_id, TS, Amount
FROM base AS b1
WHERE NOT EXISTS
(
SELECT 1
FROM base AS b2
WHERE Art_id = b1.Art_id
AND ID >= b1.ID
AND RunningSum = 0
)
ORDER BY ID;
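As a quick check of the NOT EXISTS logic above, here is a sketch that runs it against the question's sample data in SQLite from Python. The table name articles is a placeholder, Id is the tiebreaker as in the answer, and the T-SQL Alias = expr syntax is rewritten as expr AS Alias; SQLite is used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (Id INTEGER PRIMARY KEY, Ts TEXT, Art_id INTEGER, Amount REAL)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?, ?)",
    [(1, "2009-01-01", 1, 1), (2, "2009-01-05", 1, -1), (3, "2009-01-10", 1, 1),
     (4, "2009-01-11", 1, -1), (5, "2009-01-13", 1, 1), (6, "2009-01-14", 1, 1),
     (7, "2009-01-15", 2, 1), (8, "2009-01-17", 2, -1), (9, "2009-01-18", 2, 1)],
)

query = """
WITH base AS (
    SELECT Id, Art_id, Ts, Amount,
           -- running sum up to and including this row, ordered by Id
           Amount + COALESCE((SELECT SUM(Amount) FROM articles
                              WHERE Art_id = f.Art_id AND Id < f.Id), 0) AS RunningSum
    FROM articles AS f
)
SELECT b1.Id
FROM base AS b1
WHERE NOT EXISTS (            -- no zero running sum at or after this row
    SELECT 1 FROM base AS b2
    WHERE b2.Art_id = b1.Art_id AND b2.Id >= b1.Id AND b2.RunningSum = 0
)
ORDER BY b1.Id
"""

open_ids = [row[0] for row in conn.execute(query)]
print(open_ids)  # [5, 6, 9]
```

The correlated subquery recomputes each prefix sum, so on 30K rows an index on (Art_id, Id) would matter.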
Complete working query:
SELECT
*
FROM TABLE_NAME E
JOIN
(SELECT
C.ART_ID,
MAX(TS) MAX_TS
FROM
(SELECT
ART_ID,
TS,
COALESCE((SELECT SUM(AMOUNT) FROM TABLE_NAME B WHERE (B.Art_id = A.Art_id) AND (B.Ts < A.Ts)),0) ROW_SUM
FROM TABLE_NAME A) C
WHERE C.ROW_SUM = 0
GROUP BY C.ART_ID) D
ON
(D.ART_ID = E.ART_ID) AND
(E.TS >= D.MAX_TS)
First we calculate, for every row, the running sum of all earlier rows (by Ts) for the same article:
SELECT
ART_ID,
TS,
COALESCE((SELECT SUM(AMOUNT) FROM TABLE_NAME B WHERE (B.Art_id = A.Art_id) AND (B.Ts < A.Ts)),0) ROW_SUM
FROM TABLE_NAME A
Then, for each article, we find the latest row whose preceding running sum is 0:
SELECT
C.ART_ID,
MAX(TS) MAX_TS
FROM
(SELECT
ART_ID,
TS,
COALESCE((SELECT SUM(AMOUNT) FROM TABLE_NAME B WHERE (B.Art_id = A.Art_id) AND (B.Ts < A.Ts)),0) ROW_SUM
FROM TABLE_NAME A) C
WHERE C.ROW_SUM = 0
GROUP BY C.ART_ID
You can find all rows where the running sum is zero with:
select cur.id, cur.art_id
from #articles cur
left join #articles prev
on prev.art_id = cur.art_id
and prev.id <= cur.id
group by cur.id, cur.art_id
having sum(prev.amount) = 0
Then you can query all rows that come after the rows with a zero running sum:
select a.*
from #articles a
left join (
select cur.id, cur.art_id, running = sum(prev.amount)
from #articles cur
left join #articles prev
on prev.art_id = cur.art_id
and prev.ts <= cur.ts
group by cur.id, cur.art_id
having sum(prev.amount) = 0
) later_zero_running on
a.art_id = later_zero_running.art_id
and a.id <= later_zero_running.id
where later_zero_running.id is null
The LEFT JOIN combined with the WHERE clause says: there must be no row at or after this row where the running sum is zero.

Row Number Tsql

I have the following query:
select rcp.CalendarPeriodId
,rc.CalendarId
,rcp.CalendarYearId
,rcp.PeriodNumber
,rcp.PeriodStartDate,rcp.PeriodEndDate
,CASE WHEN GETDATE() BETWEEN rcp.PeriodStartDate AND rcp.PeriodEndDate THEN 1 ELSE 0 END AS 'CurrentPeriod'
from RentCalendarPeriod rcp
LEFT JOIN RentCalendarYear rcy ON rcy.CalenderYearId = rcp.CalendarYearId
LEFT JOIN RentCalendar rc ON rc.CalendarId = rcy.CalendarId
The RentCalendar table has two calendars (CalendarId 1 = Weekly, CalendarId 2 = Monthly).
Each calendar has years (RentCalendarYear table), and each year has a set of periods (RentCalendarPeriod table).
You will notice that in row 47 of the results, the final column (CurrentPeriod) is 1 (true) because it is the current period.
What I need to do is mark the previous 12 periods for each CalendarId. I was wondering whether I could do this with ROW_NUMBER, so that the period with CurrentPeriod = 1 gets 1 and the periods before it are numbered 2, 3, 4, 5, and so on.
I don't know how to do this though.
So something like this:
SELECT * FROM (
select rcp.CalendarPeriodId,rc.CalendarId,rcp.CalendarYearId,rcp.PeriodNumber,rcp.PeriodStartDate,rcp.PeriodEndDate,
ROW_NUMBER() OVER(ORDER BY PeriodStartDate DESC) AS CurrentPeriod
from RentCalendarPeriod rcp
LEFT JOIN RentCalendarYear rcy ON rcy.CalenderYearId = rcp.CalendarYearId
LEFT JOIN RentCalendar rc ON rc.CalendarId = rcy.CalendarId) AS p
WHERE currentperiod <= 12
I'm not sure I understood you correctly. This numbers the latest period (by PeriodStartDate, across all calendars) 1, the second latest 2, the third 3, and so on, in the CurrentPeriod column.
Something like this:
;WITH CTE AS (
SELECT rcp.CalendarPeriodId, rc.CalendarId, rcp.CalendarYearId,
rcp.PeriodNumber, rcp.PeriodStartDate, rcp.PeriodEndDate,
ROW_NUMBER() OVER (ORDER BY rcp.CalendarPeriodId) AS rn,
CASE
WHEN GETDATE() BETWEEN rcp.PeriodStartDate AND
rcp.PeriodEndDate THEN 1
ELSE 0
END AS CurrentPeriod
FROM RentCalendarPeriod rcp
LEFT JOIN RentCalendarYear rcy ON rcy.CalenderYearId = rcp.CalendarYearId
LEFT JOIN RentCalendar rc ON rc.CalendarId = rcy.CalendarId
)
SELECT CalendarPeriodId, CalendarId, CalendarYearId,
PeriodNumber, PeriodStartDate, PeriodEndDate,
c.CurrentPeriod,
(t.rn + 1) - c.rn AS rn
FROM CTE AS c
CROSS JOIN (SELECT rn FROM CTE WHERE CurrentPeriod = 1) AS t
WHERE c.rn BETWEEN t.rn - 11 AND t.rn
This returns 12 records, assuming exactly one period has CurrentPeriod = 1: that period and the 11 periods before it. The rn field numbers the records starting from the one with CurrentPeriod = 1.
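The question asks for the previous periods of each CalendarId, while this answer numbers rows across the whole result. A variant that partitions ROW_NUMBER by CalendarId and joins each row to the current period of its own calendar is sketched below in SQLite from Python. The periods table, its IsCurrent flag (standing in for the GETDATE() check), and the data are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE periods (CalendarPeriodId INTEGER, CalendarId INTEGER, IsCurrent INTEGER)")
# Hypothetical data: calendar 1 has periods 1-20 (current = 15),
# calendar 2 has periods 101-110 (current = 108).
rows = [(p, 1, int(p == 15)) for p in range(1, 21)]
rows += [(p, 2, int(p == 108)) for p in range(101, 111)]
conn.executemany("INSERT INTO periods VALUES (?, ?, ?)", rows)

query = """
WITH cte AS (
    SELECT CalendarPeriodId, CalendarId, IsCurrent,
           ROW_NUMBER() OVER (PARTITION BY CalendarId ORDER BY CalendarPeriodId) AS rn
    FROM periods
)
SELECT c.CalendarId, c.CalendarPeriodId,
       (t.rn + 1) - c.rn AS rel              -- 1 = current period, 2 = previous, ...
FROM cte AS c
JOIN cte AS t                                -- the current period of the same calendar
  ON t.CalendarId = c.CalendarId AND t.IsCurrent = 1
WHERE c.rn BETWEEN t.rn - 11 AND t.rn
ORDER BY c.CalendarId, rel
"""

result = conn.execute(query).fetchall()
weekly = [r for r in result if r[0] == 1]
monthly = [r for r in result if r[0] == 2]
print(len(weekly), weekly[0], weekly[-1])  # 12 (1, 15, 1) (1, 4, 12)
print([r[1] for r in monthly])  # [108, 107, 106, 105, 104, 103, 102, 101]
```

Joining on CalendarId instead of a CROSS JOIN keeps the two calendars' current periods from being mixed; a calendar with fewer than 12 earlier periods simply returns fewer rows.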