Window Function / Aggregate Function / Interrupting Window - sql

I have a Table looking like this (Cols A-D):
A B C D E
----------------------------------------------------------
1 2011 2011-06-30 A 2013-06-30
1 2012 2012-06-30 A 2013-06-30
1 2013 2013-06-30 A 2013-06-30
1 2014 2015-06-30 B 2015-06-30
1 2015 9999-12-31 A 9999-12-31
2 2014 9999-12-31 C 9999-12-31
2 2015 9999-12-31 C 9999-12-31
2 2016 9999-12-31 C 9999-12-31
I am trying to create column E from A-D using window functions. E should be max(C) over each uninterrupted run of D (when D changes, a new window should begin), with rows ordered by A, B and C.

You need to identify adjacent groups. One method uses a difference of window functions to identify the groups:
select t.*,
max(c) over (partition by a, d, seqnum_a - seqnum_ad) as e
from (select t.*,
row_number() over (partition by a order by b) as seqnum_a,
row_number() over (partition by a, d order by b) as seqnum_ad
from t
) t;
It is a bit hard to explain how the difference of row numbers works. However, if you run the subquery and stare at the results, you'll probably see how it works.
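To make the row-number difference visible, here is a small sketch that loads the sample rows into an in-memory SQLite database and prints both row numbers next to their difference. It assumes SQLite 3.25 or later (for window functions) and a table named t with lowercase columns a, b, c, d; those names are just for illustration. The partition includes d as well as the difference, because the difference alone can coincide for two different values of d.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c TEXT, d TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, 2011, "2011-06-30", "A"),
        (1, 2012, "2012-06-30", "A"),
        (1, 2013, "2013-06-30", "A"),
        (1, 2014, "2015-06-30", "B"),
        (1, 2015, "9999-12-31", "A"),
        (2, 2014, "9999-12-31", "C"),
        (2, 2015, "9999-12-31", "C"),
        (2, 2016, "9999-12-31", "C"),
    ],
)

query = """
SELECT a, b, d, seqnum_a, seqnum_ad,
       seqnum_a - seqnum_ad AS grp,
       MAX(c) OVER (PARTITION BY a, d, seqnum_a - seqnum_ad) AS e
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY a ORDER BY b) AS seqnum_a,
             ROW_NUMBER() OVER (PARTITION BY a, d ORDER BY b) AS seqnum_ad
      FROM t) AS s
ORDER BY a, b
"""
rows = conn.execute(query).fetchall()

# Each row: a, b, d, seqnum_a, seqnum_ad, grp, e
for row in rows:
    print(row)
```

Within one value of a, seqnum_a counts every row while seqnum_ad counts only the rows with the same d, so the difference stays constant along an uninterrupted run of d and changes as soon as another value of d comes in between.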

Try the query below. Note that it takes max(C) per value of D only, so it does not start a new window when D is interrupted:
select t1.*, t2.C as E
from table1 as t1
join (select D, max(C) as C from table1 group by D) as t2 on t1.D = t2.D

Related

How to filter out multiple downtime events in SQL Server?

I need to write a query that filters out multiples of the same downtime event. These records get created at the exact same time with multiple different timestealers, which I don't need. Also, when a downtime event has multiple timestealers, I need to make the timestealer NULL instead.
Example table:
Id  TimeStealer  Start                End                  Is_Downtime  Downtime_Event
--  -----------  -------------------  -------------------  -----------  --------------
1   Machine 1    2022-01-01 01:00:00  2022-01-01 01:01:00  1            Malfunction
2   Machine 2    2022-01-01 01:00:00  2022-01-01 01:01:00  1            Malfunction
3   NULL         2022-01-01 00:01:00  2022-01-01 00:59:59  0            Operating
What I need the query to return:
Id  TimeStealer  Start                End                  Is_Downtime  Downtime_Event
--  -----------  -------------------  -------------------  -----------  --------------
1   NULL         2022-01-01 01:00:00  2022-01-01 01:01:00  1            Malfunction
2   NULL         2022-01-01 00:01:00  2022-01-01 00:59:59  0            Operating
Seems like this is a top 1 row of each group, but with the added logic of making a column NULL when there are multiple rows. You can achieve that by also using a windowed COUNT, and then a CASE expression in the outer SELECT to only return the value of TimeStealer when there was 1 event:
WITH CTE AS(
SELECT V.Id,
V.TimeStealer,
V.Start,
V.[End],
V.Is_Downtime,
V.Downtime_Event,
ROW_NUMBER() OVER (PARTITION BY V.Start, V.[End], V.Is_Downtime,V.Downtime_Event ORDER BY ID) AS RN,
COUNT(V.ID) OVER (PARTITION BY V.Start, V.[End], V.Is_Downtime,V.Downtime_Event) AS Events
FROM(VALUES('1','Machine 1',CONVERT(datetime2(0),'2022-01-01 01:00:00'),CONVERT(datetime2(0),'2022-01-01 01:01:00'),'1','Malfunction'),
('2','Machine 2',CONVERT(datetime2(0),'2022-01-01 01:00:00'),CONVERT(datetime2(0),'2022-01-01 01:01:00'),'1','Malfunction'),
('3',NULL,CONVERT(datetime2(0),'2022-01-01 00:01:00'),CONVERT(datetime2(0),'2022-01-01 00:59:59'),'0','Operating'))V(Id,TimeStealer,[Start],[End],Is_Downtime,Downtime_Event))
SELECT ROW_NUMBER() OVER (ORDER BY ID) AS ID,
CASE WHEN C.Events = 1 THEN C.TimeStealer END AS TimeStealer,
C.Start,
C.[End],
C.Is_Downtime,
C.Downtime_Event
FROM CTE C
WHERE C.RN = 1;
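For anyone who wants to try the same idea without a SQL Server instance, here is a rough port to SQLite run from Python. It assumes SQLite 3.25 or later, a table named downtime (a made-up name), and integer Id and Is_Downtime columns; the new_id alias is also just for illustration. The logic is unchanged: ROW_NUMBER picks one row per event and the windowed COUNT decides whether TimeStealer is kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE downtime (Id INTEGER, TimeStealer TEXT, "Start" TEXT, '
    '"End" TEXT, Is_Downtime INTEGER, Downtime_Event TEXT)'
)
conn.executemany(
    "INSERT INTO downtime VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Machine 1", "2022-01-01 01:00:00", "2022-01-01 01:01:00", 1, "Malfunction"),
        (2, "Machine 2", "2022-01-01 01:00:00", "2022-01-01 01:01:00", 1, "Malfunction"),
        (3, None, "2022-01-01 00:01:00", "2022-01-01 00:59:59", 0, "Operating"),
    ],
)

query = """
WITH cte AS (
    SELECT d.*,
           -- number the rows inside each event so one can be kept
           ROW_NUMBER() OVER (
               PARTITION BY "Start", "End", Is_Downtime, Downtime_Event
               ORDER BY Id) AS rn,
           -- how many rows share the same event
           COUNT(*) OVER (
               PARTITION BY "Start", "End", Is_Downtime, Downtime_Event) AS events
    FROM downtime AS d
)
SELECT ROW_NUMBER() OVER (ORDER BY Id) AS new_id,
       CASE WHEN events = 1 THEN TimeStealer END AS TimeStealer,
       "Start", "End", Is_Downtime, Downtime_Event
FROM cte
WHERE rn = 1
ORDER BY new_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The two Malfunction rows collapse into one row with a NULL TimeStealer, and the Operating row keeps whatever TimeStealer it had (NULL in the sample data).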

How do I group aggregated data a certain way

I have the following sample transactional item receipt data, consisting of Item, Vendor and Receipt Date:
Item  Vendor  Receipt_Date
----  ------  -----------------------
A     1       2021-01-01 00:00:00.000
A     2       2021-01-31 00:00:00.000
B     1       2021-02-01 00:00:00.000
B     2       2021-02-10 00:00:00.000
B     3       2021-02-20 00:00:00.000
C     7       2021-03-01 00:00:00.000
I want to select the Vendor for each Item, based on the last (max) Receipt Date, so the expected result for the above sample would be:
Item  Last_Vendor_For_Receipt
----  -----------------------
A     2
B     3
C     7
I can group the data per Item and Vendor, but I cannot figure out how to achieve the above expected result with an outer query. I'm using SQL Server 2012. Here's the initial query:
select
ir.Item
,ir.Vendor
,max(ir.Receipt_Date) Last_Receipt_Date
from
ItemReceipt ir
group by
ir.Item
,ir.Vendor
I checked online and in the forum, but it was hard to search for my specific question.
Thanks
Here is one approach using TOP with ROW_NUMBER:
SELECT TOP 1 WITH TIES *
FROM yourTable
ORDER BY ROW_NUMBER() OVER (PARTITION BY Item ORDER BY Receipt_Date DESC);
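TOP 1 WITH TIES is specific to SQL Server. A more portable way to write the same idea is to compute the row number in a derived table and filter on it. The sketch below runs that variant against the sample data in SQLite (3.25 or later assumed); the ranked alias and the rn column are names I made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ItemReceipt (Item TEXT, Vendor INTEGER, Receipt_Date TEXT)")
conn.executemany(
    "INSERT INTO ItemReceipt VALUES (?, ?, ?)",
    [
        ("A", 1, "2021-01-01 00:00:00.000"),
        ("A", 2, "2021-01-31 00:00:00.000"),
        ("B", 1, "2021-02-01 00:00:00.000"),
        ("B", 2, "2021-02-10 00:00:00.000"),
        ("B", 3, "2021-02-20 00:00:00.000"),
        ("C", 7, "2021-03-01 00:00:00.000"),
    ],
)

query = """
SELECT Item, Vendor AS Last_Vendor_For_Receipt
FROM (SELECT ir.*,
             -- rn = 1 marks the most recent receipt of each item
             ROW_NUMBER() OVER (PARTITION BY Item
                                ORDER BY Receipt_Date DESC) AS rn
      FROM ItemReceipt AS ir) AS ranked
WHERE rn = 1
ORDER BY Item
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Unlike a MAX-based lookup, ROW_NUMBER returns exactly one row per item even when two receipts share the same latest date; which of the tied rows wins is then arbitrary unless you add a tie-breaker to the ORDER BY.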
First you select the desired max date per item:
select max(Receipt_Date) as max_rcpt_date
, Item
from your_unknown_table
group by Item
And then you can use this as a subquery to get the vendor:
select Item
, Vendor
from your_unknown_table
where ( Receipt_Date, Item ) in
( select max(Receipt_Date) as max_rcpt_date
, Item
from your_unknown_table
group by Item
)
This works in Oracle. SQL Server does not accept a multi-column IN predicate like this one, so there you would have to join to the subquery instead.
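The same lookup can be written as a join to the grouped subquery, which avoids comparing a pair of columns with IN. Below is a minimal sketch run in SQLite from Python; the ItemReceipt table name is taken from the question, and the alias m is only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ItemReceipt (Item TEXT, Vendor INTEGER, Receipt_Date TEXT)")
conn.executemany(
    "INSERT INTO ItemReceipt VALUES (?, ?, ?)",
    [
        ("A", 1, "2021-01-01 00:00:00.000"),
        ("A", 2, "2021-01-31 00:00:00.000"),
        ("B", 1, "2021-02-01 00:00:00.000"),
        ("B", 2, "2021-02-10 00:00:00.000"),
        ("B", 3, "2021-02-20 00:00:00.000"),
        ("C", 7, "2021-03-01 00:00:00.000"),
    ],
)

query = """
SELECT ir.Item, ir.Vendor
FROM ItemReceipt AS ir
JOIN (SELECT Item, MAX(Receipt_Date) AS max_rcpt_date
      FROM ItemReceipt
      GROUP BY Item) AS m
  ON m.Item = ir.Item
 AND m.max_rcpt_date = ir.Receipt_Date
ORDER BY ir.Item
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If two vendors share the latest receipt date for an item, this form returns both of them.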

Join two tables side by side

I have these two tables that I need to join side by side
Table A
id  date
--  ----------
1   03/01/2021
1   04/01/2021
1   05/01/2021
2   04/01/2021
2   05/01/2021
3   03/01/2021
3   04/01/2021
Table B
id  date
--  ----------
1   03/01/2021
1   04/01/2021
1   05/01/2021
1   06/01/2021
2   04/02/2021
2   05/02/2021
3   03/01/2021
The output would be
id  dateA       dateB
--  ----------  ----------
1   03/01/2021  03/01/2021
1   04/01/2021  04/01/2021
1   05/01/2021  05/01/2021
1               06/01/2021
2   04/01/2021  04/02/2021
2   05/01/2021  05/02/2021
3   03/01/2021  03/01/2021
3   04/01/2021
Basically, search all records that match a value, (for example 1, then list them side by side)
I tried joining them using id as key but it spawned a multitude of other rows that I don't want. Tried grouping as well but it messes with the order
I'm using sqlite via pandas
The query below causes some extra rows to be returned, which I can't figure out how to filter out
SELECT
A.id, A.date, B.date
FROM
A
JOIN
B ON B.id = A.id
Adding a group by causes the table to output only the first records of each multiple
Use a CTE where you rank all the rows of both tables by id and order of the dates and then aggregate:
WITH cte AS (
SELECT id, date dateA, null dateB, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) rn
FROM TableA
UNION ALL
SELECT id, null, date, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) rn
FROM TableB
)
SELECT id, MAX(dateA) dateA, MAX(dateB) dateB
FROM cte
GROUP BY id, rn
ORDER BY id, rn;
Note that dates stored as text in the format dd/mm/yyyy do not sort or compare correctly.
You should change them to yyyy-mm-dd for the code to work properly.
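Here is a self-contained way to check the query from Python with the built-in sqlite3 module. It assumes SQLite 3.25 or later, tables named TableA and TableB as in the query, and the sample dates rewritten as yyyy-mm-dd (reading the originals as dd/mm/yyyy).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (id INTEGER, date TEXT)")
conn.execute("CREATE TABLE TableB (id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?)",
    [(1, "2021-01-03"), (1, "2021-01-04"), (1, "2021-01-05"),
     (2, "2021-01-04"), (2, "2021-01-05"),
     (3, "2021-01-03"), (3, "2021-01-04")],
)
conn.executemany(
    "INSERT INTO TableB VALUES (?, ?)",
    [(1, "2021-01-03"), (1, "2021-01-04"), (1, "2021-01-05"), (1, "2021-01-06"),
     (2, "2021-02-04"), (2, "2021-02-05"),
     (3, "2021-01-03")],
)

query = """
WITH cte AS (
    SELECT id, date AS dateA, NULL AS dateB,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn
    FROM TableA
    UNION ALL
    SELECT id, NULL, date,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn
    FROM TableB
)
-- rows with the same id and position collapse into one output row
SELECT id, MAX(dateA) AS dateA, MAX(dateB) AS dateB
FROM cte
GROUP BY id, rn
ORDER BY id, rn
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Where one table has more rows for an id than the other, the missing side comes back as None (NULL), which matches the blank cells in the expected output.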

How to get 'COUNT DISTINCT' over moving window

I'm working on a query to compute the distinct users of particular features of an app within a moving window. So, if there's a range from 15-20th October, I want a query to go from 8-15 Oct, 9-16 Oct etc and get the count of distinct users per feature. So for each date, it should have x rows where x is the number of features.
I have the following query so far:
WITH V1(edate, code, total) AS
(
SELECT date, featurecode,
DENSE_RANK() OVER (PARTITION BY featurecode ORDER BY accountid ASC) + DENSE_RANK() OVER (PARTITION BY featurecode ORDER BY accountid DESC) - 1
FROM....
GROUP BY edate, featurecode, appcode, accountid
HAVING appcode='sample' AND eventdate BETWEEN '15-10-2018' And '20-10-2018'
)
Select distinct date, code, total
from V1
WHERE date between '2018-10-15' AND '2018-10-20'
This returns the same set of values for all the dates. Is there any way to do this efficiently?? It's a DB2 database by the way but I'm looking for insight from postgresql users too.
Present result- All the totals are being repeated.
date code total
10/15/2018 appname-feature1 123
10/15/2018 appname-feature2 234
10/15/2018 appname-feature3 321
10/16/2018 appname-feature1 123
10/16/2018 appname-feature2 234
10/16/2018 appname-feature3 321
Desired result.
date code total
10/15/2018 appname-feature1 123
10/15/2018 appname-feature2 234
10/15/2018 appname-feature3 321
10/16/2018 appname-feature1 212
10/16/2018 appname-feature2 577
10/16/2018 appname-feature3 2345
This is not easy to do efficiently. DISTINCT counts aren't incrementally maintainable (unless you go down the route of inexact DISTINCT counts such as HyperLogLog).
It is easy to code in SQL, though, and you can try the usual indexing etc. to help.
It is (possibly) not possible, however, to code with OLAP functions, not least because you can only use RANGE BETWEEN for SUM(), COUNT(), MAX() etc., but not RANK() or DENSE_RANK(). So just use a traditional correlated sub-select.
First some data
CREATE TABLE T(D DATE,F CHAR(1),A CHAR(1));
INSERT INTO T (VALUES
('2018-10-10','X','A')
, ('2018-10-11','X','A')
, ('2018-10-15','X','A')
, ('2018-10-15','X','A')
, ('2018-10-15','X','B')
, ('2018-10-15','Y','A')
, ('2018-10-16','X','C')
, ('2018-10-18','X','A')
, ('2018-10-21','X','B')
)
;
Now a simple select
WITH B AS (
SELECT DISTINCT D, F FROM T
)
SELECT D,F
, (SELECT COUNT(DISTINCT A)
FROM T
WHERE T.F = B.F
AND T.D BETWEEN B.D - 3 DAYS AND B.D + 4 DAYS
) AS DISTINCT_A_MOVING_WEEK
FROM
B
ORDER BY F,D
;
giving, e.g.
D F DISTINCT_A_MOVING_WEEK
---------- - ----------------------
2018-10-10 X 1
2018-10-11 X 2
2018-10-15 X 3
2018-10-16 X 3
2018-10-18 X 3
2018-10-21 X 2
2018-10-15 Y 1
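If you don't have DB2 at hand, the same correlated sub-select can be tried in SQLite from Python. This is only a sketch of a port: dates are stored as yyyy-mm-dd text, and DB2's "B.D - 3 DAYS" is replaced by SQLite's date(B.D, '-3 days'), which is my substitution and not part of the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (D TEXT, F TEXT, A TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [
        ("2018-10-10", "X", "A"),
        ("2018-10-11", "X", "A"),
        ("2018-10-15", "X", "A"),
        ("2018-10-15", "X", "A"),
        ("2018-10-15", "X", "B"),
        ("2018-10-15", "Y", "A"),
        ("2018-10-16", "X", "C"),
        ("2018-10-18", "X", "A"),
        ("2018-10-21", "X", "B"),
    ],
)

query = """
WITH B AS (
    SELECT DISTINCT D, F FROM T
)
SELECT D, F,
       -- distinct accounts for the same feature inside the 8-day window
       (SELECT COUNT(DISTINCT A)
        FROM T
        WHERE T.F = B.F
          AND T.D BETWEEN date(B.D, '-3 days') AND date(B.D, '+4 days')
       ) AS distinct_a_moving_week
FROM B
ORDER BY F, D
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

To get a trailing window like the 8-15 Oct range in the question, the bounds could be changed to date(B.D, '-7 days') and B.D.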

Get MAX count but keep the repeated calculated value if highest

I have the following table, I am using SQL Server 2008
BayNo FixDateTime FixType
1 04/05/2015 16:15:00 tyre change
1 12/05/2015 00:15:00 oil change
1 12/05/2015 08:15:00 engine tuning
1 04/05/2016 08:11:00 car tuning
2 13/05/2015 19:30:00 puncture
2 14/05/2015 08:00:00 light repair
2 15/05/2015 10:30:00 super op
2 20/05/2015 12:30:00 wiper change
2 12/05/2016 09:30:00 denting
2 12/05/2016 10:30:00 wiper repair
2 12/06/2016 10:30:00 exhaust repair
4 12/05/2016 05:30:00 stereo unlock
4 17/05/2016 15:05:00 door handle repair
On any given day I need to find the highest number of fixes made in a given bay, and if that highest number occurs on more than one day, each of those days should also appear in the result set.
I would like to see the result set as follows:
BayNo FixDateTime noOfFixes
1 12/05/2015 00:15:00 2
2 12/05/2016 09:30:00 2
4 12/05/2016 05:30:00 1
4 17/05/2016 15:05:00 1
I managed to get the counts for each, but I am struggling to get the max and keep the repeated highest value. Can someone help, please?
Use window functions.
Get the count for each day by bayno and also find the min fixdatetime for each day per bayno.
Then use dense_rank to compute the highest ranked row for each bayno based on the number of fixes.
Finally get the highest ranked rows.
select distinct bayno,minfixdatetime,no_of_fixes
from (
select bayno,minfixdatetime,no_of_fixes
,dense_rank() over(partition by bayno order by no_of_fixes desc) rnk
from (
select t.*,
count(*) over(partition by bayno,cast(fixdatetime as date)) no_of_fixes,
min(fixdatetime) over(partition by bayno,cast(fixdatetime as date)) minfixdatetime
from tablename t
) x
) y
where rnk = 1
You are looking for rank() or dense_rank(). I would write the query like this:
select bayno, thedate, numFixes
from (select bayno, cast(fixdatetime as date) as thedate,
count(*) as numFixes,
rank() over (partition by bayno order by count(*) desc) as seqnum
from t
group by bayno, cast(fixdatetime as date)
) b
where seqnum = 1;
Note that this returns the date in question. The date does not have a time component.
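As a rough check of the ranking approach against the sample data, here is a sketch for SQLite run from Python. It assumes SQLite 3.25 or later, a table named fixes (made up for the example), and the sample timestamps rewritten as yyyy-mm-dd hh:mm:ss (reading the originals as dd/mm/yyyy). The daily counts are computed in one step and ranked per bay in the next, and the earliest fix of each day is kept so the output lines up with the expected result in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fixes (bayno INTEGER, fixdatetime TEXT, fixtype TEXT)")
conn.executemany(
    "INSERT INTO fixes VALUES (?, ?, ?)",
    [
        (1, "2015-05-04 16:15:00", "tyre change"),
        (1, "2015-05-12 00:15:00", "oil change"),
        (1, "2015-05-12 08:15:00", "engine tuning"),
        (1, "2016-05-04 08:11:00", "car tuning"),
        (2, "2015-05-13 19:30:00", "puncture"),
        (2, "2015-05-14 08:00:00", "light repair"),
        (2, "2015-05-15 10:30:00", "super op"),
        (2, "2015-05-20 12:30:00", "wiper change"),
        (2, "2016-05-12 09:30:00", "denting"),
        (2, "2016-05-12 10:30:00", "wiper repair"),
        (2, "2016-06-12 10:30:00", "exhaust repair"),
        (4, "2016-05-12 05:30:00", "stereo unlock"),
        (4, "2016-05-17 15:05:00", "door handle repair"),
    ],
)

query = """
WITH daily AS (
    -- one row per bay and calendar day
    SELECT bayno,
           date(fixdatetime) AS thedate,
           MIN(fixdatetime) AS first_fix,
           COUNT(*) AS num_fixes
    FROM fixes
    GROUP BY bayno, date(fixdatetime)
), ranked AS (
    -- rank the days of each bay by number of fixes; ties share rank 1
    SELECT daily.*,
           RANK() OVER (PARTITION BY bayno ORDER BY num_fixes DESC) AS seqnum
    FROM daily
)
SELECT bayno, first_fix, num_fixes
FROM ranked
WHERE seqnum = 1
ORDER BY bayno, first_fix
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because rank() gives tied days the same rank, bay 4 returns both of its days, each with one fix.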