SQL get single row based on multiple condition after group by - sql

I need help creating a query for the case below.
Suppose I have a table with the following records:
Name Date Time Category CategoryKey
John 10/20/2012 10:00 Low 2
Sam 10/20/2012 10:00 High 4
Harry 10/20/2012 10:00 Medium 1
Michael 10/20/2012 10:00 Grey 3
Rob 10/22/2013 11:00 Low 2
Marry 10/23/2014 12:00 Low 2
Richard 10/23/2014 12:00 Grey 3
Jack 10/24/2015 1:30 High 4
If there are multiple names for the same date and time, select only one record using the following priority, stopping as soon as one of the conditions is met:
If Category is Medium then take name
Else If Category is High then take name
Else If Category is Low then take name
Else If Category is Grey then take name
So that the Final result will be
Name Date Time Category CategoryKey
Harry 10/20/2012 10:00 Medium 1
Rob 10/22/2013 11:00 Low 2
Marry 10/23/2014 12:00 Low 2
Jack 10/24/2015 1:30 High 4

The simplest method is row_number():
select t.*
from (select t.*,
             row_number() over (partition by date, time
                                order by (case category when 'Medium' then 1 when 'High' then 2 when 'Low' then 3 when 'Grey' then 4 else 5 end)
                               ) as seqnum
      from t
     ) t
where seqnum = 1;
It can be convenient to use string functions here:
row_number() over (partition by date, time
order by charindex(category, 'Medium,High,Low,Grey')
) as seqnum
This works for your case, but be careful that every value appears in the list (charindex() returns 0 for a value that is not found, so it would sort first) and that no value is a substring of another.
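As a rough check of the row_number() approach, here is a minimal sketch that runs the same idea against the sample rows in an in-memory SQLite database from Python. The table name t and the column names event_date, event_time and category_key are assumptions made for the example, and the dates are stored in ISO format so they sort as text.

```python
import sqlite3

# Hypothetical table/column names; the question does not name them.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (name text, event_date text, event_time text, "
    "category text, category_key integer)"
)
conn.executemany(
    "insert into t values (?, ?, ?, ?, ?)",
    [
        ("John", "2012-10-20", "10:00", "Low", 2),
        ("Sam", "2012-10-20", "10:00", "High", 4),
        ("Harry", "2012-10-20", "10:00", "Medium", 1),
        ("Michael", "2012-10-20", "10:00", "Grey", 3),
        ("Rob", "2013-10-22", "11:00", "Low", 2),
        ("Marry", "2014-10-23", "12:00", "Low", 2),
        ("Richard", "2014-10-23", "12:00", "Grey", 3),
        ("Jack", "2015-10-24", "01:30", "High", 4),
    ],
)

# Rank rows within each (date, time) group by category priority,
# then keep only the top-ranked row of each group.
query = """
select name, event_date, event_time, category, category_key
from (select t.*,
             row_number() over (
                 partition by event_date, event_time
                 order by case category
                              when 'Medium' then 1
                              when 'High' then 2
                              when 'Low' then 3
                              when 'Grey' then 4
                              else 5
                          end
             ) as seqnum
      from t
     ) ranked
where seqnum = 1
order by event_date, event_time
"""
rows = conn.execute(query).fetchall()
names = [row[0] for row in rows]
print(names)
```

SQLite needs version 3.25 or later for window functions. The query itself is the same shape as the one above, only with renamed columns.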

Related

How to identify invalid records from a dimension table?

This is my sample data. It's a slowly changing dimension (type 2).
iddim idperson name role IsActive start end
1 1234 jim driver 1 2022-01-01 2022-02-03
2 1234 jim driver 0 2022-02-03 9999-12-31
3 3456 tom accountant 1 2022-01-01 2022-08-30
4 4567 patty assistant 1 2022-01-01 9999-12-31
Due to a server error, one of my SSIS packages performed some unexpected actions, and there are now idperson values with no record ending on 9999-12-31 (e.g. Tom).
I need to identify them so I can correct this manually, so that my resulting table will be:
iddim idperson name role IsActive start end
1 1234 jim driver 1 2022-01-01 2022-02-03
2 1234 jim driver 0 2022-02-04 9999-12-31
3 3456 tom accountant 1 2022-01-01 2022-08-30
4 4567 patty assistant 1 2022-01-01 9999-12-31
5 3456 tom accountant 0 2022-08-31 9999-12-31
So, as I understand your requirements, you need to generate records to fill the gaps between the latest end date (per person) and '9999-12-31'. The filler records should have IsActive = 0 and should inherit the latest prior name and role for that idperson.
Perhaps something like the following:
SELECT
    idperson,
    name,
    role,
    IsActive = 0,
    start = DATEADD(day, 1, [end]),
    [end] = '9999-12-31'
FROM (
    SELECT *, Recency = ROW_NUMBER() OVER(PARTITION BY idperson ORDER BY [End] DESC)
    FROM #Data
) D
WHERE Recency = 1 AND [end] < '9999-12-31'
ORDER BY iddim
The Recency value calculated above will be 1 for the latest record per idperson and 2, 3, etc. for records with older end dates. If the latest record isn't end-of-time, a filler record is generated.
See this db<>fiddle for a working example (which includes a few additional test data records).
Note: The two existing jim records in your original posted data have different idperson values, so they are treated as different persons and the first triggers a gap record.
UPDATE: The above was revised to allow for possible name change over time for a given idperson.
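For anyone who wants to try the idea outside SQL Server, the following is a minimal sketch of the same query against an in-memory SQLite database from Python. It is not the fiddle referenced above. The table name dim and the columns is_active, start_date and end_date are assumptions (end is a keyword in SQLite), and SQLite's date(x, '+1 day') stands in for DATEADD.

```python
import sqlite3

# Hypothetical table/column names chosen for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table dim (iddim integer, idperson integer, name text, "
    "role text, is_active integer, start_date text, end_date text)"
)
conn.executemany(
    "insert into dim values (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 1234, "jim", "driver", 1, "2022-01-01", "2022-02-03"),
        (2, 1234, "jim", "driver", 0, "2022-02-03", "9999-12-31"),
        (3, 3456, "tom", "accountant", 1, "2022-01-01", "2022-08-30"),
        (4, 4567, "patty", "assistant", 1, "2022-01-01", "9999-12-31"),
    ],
)

# recency = 1 marks the latest record per idperson. A filler row is
# produced only when that latest record does not reach 9999-12-31.
query = """
select latest.idperson,
       latest.name,
       latest.role,
       0 as is_active,
       date(latest.end_date, '+1 day') as start_date,
       '9999-12-31' as end_date
from (select d.*,
             row_number() over (
                 partition by idperson order by end_date desc
             ) as recency
      from dim d
     ) latest
where latest.recency = 1
  and latest.end_date < '9999-12-31'
order by latest.iddim
"""
rows = conn.execute(query).fetchall()
print(rows)
```

With the four sample rows, only tom lacks an end-of-time record, so a single filler row starting on 2022-08-31 comes back.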

Build time window counters from raw data - Big Query

Consider raw events data regarding purchases in 2020, as per the following table:
BUYER DATE ITEM
Joe '2020-01-15' Dr. Pepper
Joe '2020-02-15' Dr. Pepper
Joe '2020-03-15' Dr. Pepper
Joe '2020-05-15' Dr. Pepper
Joe '2020-10-15' Dr. Pepper
Joe '2020-12-15' Dr. Pepper
I would like to aggregate the data to see what Joe did in a monthly moving sum, i.e., obtaining as an outcome
BUYER Date Num_Purchases_last_3months
Joe '2020-01-31' 1
Joe '2020-02-29' 2
Joe '2020-03-31' 3
Joe '2020-04-30' 2
.
.
.
Joe '2020-11-30' 1
Joe '2020-12-31' 2
How could I obtain the desired result in an efficient query?
You can use window functions, in this case, count(*) with a range window frame specification:
select t.*,
       count(*) over (partition by buyer
                      order by extract(year from date) * 12 + extract(month from date)
                      range between 2 preceding and current row
                     ) as Num_Purchases_last_3months
from t;
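To see what the range frame does, here is a minimal sketch using SQLite from Python rather than BigQuery. The table name purchases and the column purchase_date are assumptions, and strftime() stands in for extract(). The frame covers the current month number and the two before it, so the count is per purchase row. Months with no purchase (April, for example) produce no row and would need a separate calendar of months to appear.

```python
import sqlite3

# Hypothetical table/column names chosen for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("create table purchases (buyer text, purchase_date text, item text)")
conn.executemany(
    "insert into purchases values (?, ?, ?)",
    [
        ("Joe", "2020-01-15", "Dr. Pepper"),
        ("Joe", "2020-02-15", "Dr. Pepper"),
        ("Joe", "2020-03-15", "Dr. Pepper"),
        ("Joe", "2020-05-15", "Dr. Pepper"),
        ("Joe", "2020-10-15", "Dr. Pepper"),
        ("Joe", "2020-12-15", "Dr. Pepper"),
    ],
)

# year * 12 + month turns each date into a running month number, so
# "range between 2 preceding and current row" spans three calendar months.
query = """
select buyer,
       purchase_date,
       count(*) over (
           partition by buyer
           order by cast(strftime('%Y', purchase_date) as integer) * 12
                  + cast(strftime('%m', purchase_date) as integer)
           range between 2 preceding and current row
       ) as num_purchases_last_3months
from purchases
order by purchase_date
"""
rows = conn.execute(query).fetchall()
counts = [row[2] for row in rows]
print(counts)
```

Numeric range offsets need SQLite 3.28 or later.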

Rank values from table based on temporal sequences of values

I have a table similar to this one, representing which drivers were driving different cars at certain times.
CAR_ID DRIVER_ID DT
10 A 10:00
10 A 12:00
10 A 14:00
10 B 16:00
10 B 17:00
10 B 20:00
10 A 21:00
10 A 22:00
20 C 15:00
20 C 18:00
Where DT is a datetime. I'm trying to have something similar to what I would obtain using a DENSE_RANK() function but generating a new number when there is a change on the column DRIVER_ID between two drivers. This would be my expected output:
CAR_ID DRIVER_ID DT RES
10 A 10:00 1
10 A 12:00 1
10 A 14:00 1
10 B 16:00 2
10 B 17:00 2
10 B 20:00 2
10 A 21:00 3 #
10 A 22:00 3 #
20 C 15:00 4
20 C 18:00 4
Using DENSE_RANK() OVER (PARTITION BY CAR_ID, DRIVER_ID ORDER BY DT) AS RES I get the two elements marked with a # as members of the same group as the first three rows, but I want them to be different, because of the "discontinuity" (the car was driven by another driver from 16:00 to 20:00). I can't seem to find a solution that doesn't include a loop. Is this possible?
Any help would be greatly appreciated.
This can be done with lag and running sum.
select t.*,
       -- res restarts at 1 for each car_id because of the partition
       sum(case when prev_driver = driver_id then 0 else 1 end) over (partition by car_id order by dt) as res
from (select t.*,
             lag(driver_id) over (partition by car_id order by dt) as prev_driver
      from tbl t
     ) t
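Here is a minimal sketch of the lag plus running sum idea, run against the sample rows in SQLite from Python. The table name tbl comes from the query above; storing DT as HH:MM text is an assumption. This variant leaves the partition out of the running sum and orders by car_id, dt instead, so the numbering continues across cars as in the expected output. With partition by car_id in the outer window, the numbering would restart at 1 for car 20.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tbl (car_id integer, driver_id text, dt text)")
conn.executemany(
    "insert into tbl values (?, ?, ?)",
    [
        (10, "A", "10:00"), (10, "A", "12:00"), (10, "A", "14:00"),
        (10, "B", "16:00"), (10, "B", "17:00"), (10, "B", "20:00"),
        (10, "A", "21:00"), (10, "A", "22:00"),
        (20, "C", "15:00"), (20, "C", "18:00"),
    ],
)

# Inner query: previous driver of the same car (NULL on a car's first row).
# Outer query: add 1 whenever the driver differs from the previous one;
# a NULL previous driver also counts as a change.
query = """
select car_id, driver_id, dt,
       sum(case when prev_driver = driver_id then 0 else 1 end)
           over (order by car_id, dt) as res
from (select tbl.*,
             lag(driver_id) over (partition by car_id order by dt) as prev_driver
      from tbl
     ) t
order by car_id, dt
"""
rows = conn.execute(query).fetchall()
res = [row[3] for row in rows]
print(res)
```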
You need to compute a row_number partitioned by car and ordered by dt, and a second row_number partitioned by car and driver and ordered by dt. Subtracting the second from the first gives you a "segment" number, which in this case identifies each continuous period of possession that a driver had of a car.
This segment number has no intrinsic meaning; it is just guaranteed to be different for each segment within a given car and driver. Then use this segment number as an additional partition for whatever function you are trying to apply.
As a note, however, I couldn't work out how you got the results you've displayed for RES from the code you quote, and thus I'm not exactly sure what you are trying to achieve overall.
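The row_number difference described above can be sketched as follows, again using SQLite from Python with the sample rows. The table name tbl and the column alias seg are assumptions for the example. The seg values themselves are arbitrary; what matters is that each (car_id, driver_id, seg) combination marks one unbroken run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tbl (car_id integer, driver_id text, dt text)")
conn.executemany(
    "insert into tbl values (?, ?, ?)",
    [
        (10, "A", "10:00"), (10, "A", "12:00"), (10, "A", "14:00"),
        (10, "B", "16:00"), (10, "B", "17:00"), (10, "B", "20:00"),
        (10, "A", "21:00"), (10, "A", "22:00"),
        (20, "C", "15:00"), (20, "C", "18:00"),
    ],
)

# Position within the car minus position within (car, driver) stays
# constant while the same driver keeps the car, and changes after a gap.
query = """
select car_id, driver_id, dt,
       row_number() over (partition by car_id order by dt)
     - row_number() over (partition by car_id, driver_id order by dt) as seg
from tbl
order by car_id, dt
"""
rows = conn.execute(query).fetchall()
seg_values = [row[3] for row in rows]
segments = {(car, driver, seg) for car, driver, _dt, seg in rows}
print(seg_values)
print(len(segments))
```

Driver A's two runs on car 10 get seg 0 and seg 3, so grouping or partitioning by car_id, driver_id and seg keeps them apart.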

Report on a point in time

I am about to create what I assume will be 2 new tables in SQL. The idea is for one to be the "live" data and a second which would hold all the changes. Dates are in DD/MM/YYYY format.
Active
ID | Name | Start Date | End Date
1 Zac 1/1/2016 -
2 John 1/5/2016 -
3 Sam 1/6/2016 -
4 Joel 1/7/2016 -
Changes
CID | UID | Name | Start Date | End Date
1 1 Zac 1/1/2016 -
2 4 Joel 1/1/2016 -
3 4 Joel - 1/4/2016
4 2 John 1/5/2016 -
5 3 Sam 1/6/2016 -
6 4 Joel 1/7/2016 -
In the above situation you can see that Joel worked from the 1/1/2016 until the 1/4/2016, took 3 months off and then worked from the 1/7/2016.
I need to build a query whereby I can pick a point in time and report on who was working then. The above table only lists the name, but there will be many more columns to report on for a point in time.
What would be the best way to structure the tables to achieve this query?
I started writing this last night and am finally coming back to it. Basically, you would use your Changes table to build a slowly changing dimension, then generate a row number to pair each start record with the end record that follows it. This assumes the data never gets out of sync with two start records or two end records in a row for the same UID.
This also assumes you are using an RDBMS that supports common table expressions and window functions, such as SQL Server, Oracle, PostgreSQL, or DB2. ISNULL and GETDATE below are SQL Server functions; other systems would use COALESCE and their own current-date function.
WITH cte AS (
    SELECT
        *
        ,ROW_NUMBER() OVER (PARTITION BY UID ORDER BY ISNULL(StartDate,EndDate)) As RowNum
    FROM
        Changes c
)
SELECT
    s.UID
    ,s.Name
    ,s.StartDate
    ,COALESCE(e.EndDate,GETDATE()) as EndDate
FROM
    cte s
    LEFT JOIN cte e
    ON s.UID = e.UID
    AND s.RowNum + 1 = e.RowNum
WHERE
    s.StartDate IS NOT NULL
    AND '2016-05-05' BETWEEN s.StartDate AND COALESCE(e.EndDate,GETDATE())
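A minimal sketch of this pairing, run in SQLite from Python, may help to check it against the sample Changes rows. The snake_case column names are assumptions, the DD/MM/YYYY dates from the question are stored in ISO format, and coalesce() and date('now') stand in for ISNULL and GETDATE.

```python
import sqlite3

# Hypothetical column names; dates converted from DD/MM/YYYY to ISO text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table changes (cid integer, uid integer, name text, "
    "start_date text, end_date text)"
)
conn.executemany(
    "insert into changes values (?, ?, ?, ?, ?)",
    [
        (1, 1, "Zac", "2016-01-01", None),
        (2, 4, "Joel", "2016-01-01", None),
        (3, 4, "Joel", None, "2016-04-01"),
        (4, 2, "John", "2016-05-01", None),
        (5, 3, "Sam", "2016-06-01", None),
        (6, 4, "Joel", "2016-07-01", None),
    ],
)

# Number each person's change records in date order, then join every
# start record to the record that follows it to find its end date.
# A start with no following record is treated as open until today.
query = """
with cte as (
    select c.*,
           row_number() over (
               partition by uid order by coalesce(start_date, end_date)
           ) as row_num
    from changes c
)
select s.uid,
       s.name,
       s.start_date,
       coalesce(e.end_date, date('now')) as period_end
from cte s
left join cte e
       on s.uid = e.uid
      and s.row_num + 1 = e.row_num
where s.start_date is not null
  and ? between s.start_date and coalesce(e.end_date, date('now'))
order by s.uid
"""
rows = conn.execute(query, ("2016-05-05",)).fetchall()
working = [row[1] for row in rows]
print(working)
```

On 5 May 2016 Joel is in his gap and Sam has not started yet, so only Zac and John are reported.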

Compare two rows in SQL Server and return only one row

I have a table (trips) that has response data with columns:
TripDate
Job
Address
DispatchDateTime
OnSceneDateTime
Vehicle
Often two vehicles will respond to the same address on the same date, and I need to find the one that was there first.
I've tried this:
SELECT
    TripDate,
    Job,
    Vehicle,
    DispatchDateTime,
    (SELECT min(OnSceneDateTime)
     FROM Trips AS FirstOnScene
     WHERE AllTrips.TripDate = FirstOnScene.TripDate
       AND AllTrips.Address = FirstOnScene.Address) AS FirstOnScene
FROM
    Trips AS AllTrips
But I still get both records returned, and both have the same FirstOnScene time.
How do I get only THE record, with its DispatchDateTime and OnSceneDateTime, and not the row of the trip that was on scene second?
Here are a few example rows from the table:
2016-01-01 0169-a 150 Main St 2016-01-01 16:52 2016-01-01 16:59 Truck 1
2016-01-01 0171-a 150 Main St 2016-01-01 16:53 2016-01-01 17:05 Truck 2
2016-01-01 0190-a 29 Spring St 2016-01-01 17:19 2016-01-01 17:30 Truck 5
2016-01-02 0111-a 8 Fist St 2016-01-02 09:30 2016-01-02 09:40 Truck 1
2016-01-02 0112-a 8 Fist St 2016-01-02 09:32 2016-01-02 09:38 Truck 2
In the above examples I need to return the first, third, and last row of that data set.
Here is a total shot in the dark based on the sparse information provided. I don't really know what defines a given incident so you can adjust the partition accordingly.
with sortedValues as
(
    select TripDate
        , Job
        , Vehicle
        , DispatchDateTime
        , OnSceneDateTime
        , ROW_NUMBER() over(partition by TripDate, Address order by OnSceneDateTime) as RowNum
    from Trips
)
select TripDate
    , Job
    , Vehicle
    , DispatchDateTime
    , OnSceneDateTime
from sortedValues
where RowNum = 1
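As a rough check of the ROW_NUMBER approach against the example rows, here is a minimal sketch using SQLite from Python. It assumes an incident is identified by TripDate plus Address, which the question suggests but does not state, and it orders ascending so that the earliest arrival gets RowNum 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Trips (TripDate text, Job text, Address text, "
    "DispatchDateTime text, OnSceneDateTime text, Vehicle text)"
)
conn.executemany(
    "insert into Trips values (?, ?, ?, ?, ?, ?)",
    [
        ("2016-01-01", "0169-a", "150 Main St", "2016-01-01 16:52", "2016-01-01 16:59", "Truck 1"),
        ("2016-01-01", "0171-a", "150 Main St", "2016-01-01 16:53", "2016-01-01 17:05", "Truck 2"),
        ("2016-01-01", "0190-a", "29 Spring St", "2016-01-01 17:19", "2016-01-01 17:30", "Truck 5"),
        ("2016-01-02", "0111-a", "8 Fist St", "2016-01-02 09:30", "2016-01-02 09:40", "Truck 1"),
        ("2016-01-02", "0112-a", "8 Fist St", "2016-01-02 09:32", "2016-01-02 09:38", "Truck 2"),
    ],
)

# Assumption: one incident = one (TripDate, Address) pair.
# RowNum 1 is the vehicle with the earliest OnSceneDateTime per incident.
query = """
with sortedValues as (
    select TripDate, Job, Vehicle, DispatchDateTime, OnSceneDateTime,
           row_number() over (
               partition by TripDate, Address
               order by OnSceneDateTime
           ) as RowNum
    from Trips
)
select TripDate, Job, Vehicle, DispatchDateTime, OnSceneDateTime
from sortedValues
where RowNum = 1
order by OnSceneDateTime
"""
rows = conn.execute(query).fetchall()
jobs = [row[1] for row in rows]
print(jobs)
```

The three jobs returned correspond to the first, third, and last example rows. If two vehicles arrive at exactly the same time, ROW_NUMBER still returns only one of them, chosen arbitrarily.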
You can just filter the rows down by selecting only the MIN OnSceneDateTime like below:
SELECT TripDate, Job, Vehicle, DispatchDateTime,OnSceneDateTime FirstOnScene
FROM Trips as AllTrips
WHERE AllTrips.OnSceneDateTime = (SELECT MIN(OnSceneDateTime)
FROM Trips as FirstOnScene
WHERE AllTrips.TripDate = FirstOnScene.TripDate
and AllTrips.Address = FirstOnScene.Address
)
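How about using ORDER BY on OnSceneDateTime and keeping only the first row? SQL Server uses TOP rather than LIMIT. Note that this simplified version returns a single row for the whole table, so it needs a filter on date and address to answer for one incident:
SELECT TOP 1 TripDate, Job, Vehicle, DispatchDateTime, OnSceneDateTime FROM trips ORDER BY OnSceneDateTime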