Creating the SQL script that tracks progress of an object

I have a table with this schema:
Fruit   Truck ID  Bucket ID  Date
------  --------  ---------  ----------
Apple   1         101        2018/04/01
Apple   1         101        2018/04/10
Apple   1         112        2018/04/16
Apple   2         782        2018/08/18
Apple   2         782        2018/09/12
Apple   1         113        2019/09/12
Apple   1         113        2019/09/21
My goal is to write an SQL script that returns the start and end dates of each truck & bucket pair for each fruit. The intended result is below:
Fruit   Truck ID  Bucket ID  Start Date  End Date
------  --------  ---------  ----------  ----------
Apple   1         101        2018/04/01  2018/04/16
Apple   1         112        2018/04/16  2018/08/18
Apple   2         782        2018/08/18  2018/09/12
Apple   1         113        2019/09/12  2019/09/21
I have tried solving this with lag/lead window functions, but the dates come out wrong. Is there another way to solve this using window functions, or do I have to write subqueries for it?

I think you want aggregation and window functions:
select fruit, truck_id, bucket_id,
min(date) start_date,
lead(min(date), 1, max(date)) over(partition by fruit order by min(date)) end_date
from mytable
group by fruit, truck_id, bucket_id
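To check the aggregation-plus-lead idea against the sample rows, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the bundled SQLite is 3.25 or newer (the release that added window functions). It computes the per-group minimum in a CTE first and then applies lead over that, which is a rewrite of the query above rather than the exact statement; the function name truck_bucket_spans and the last_seen alias are made up for illustration.

```python
import sqlite3

# Sample rows from the question: (fruit, truck_id, bucket_id, date).
ROWS = [
    ("Apple", 1, 101, "2018/04/01"),
    ("Apple", 1, 101, "2018/04/10"),
    ("Apple", 1, 112, "2018/04/16"),
    ("Apple", 2, 782, "2018/08/18"),
    ("Apple", 2, 782, "2018/09/12"),
    ("Apple", 1, 113, "2019/09/12"),
    ("Apple", 1, 113, "2019/09/21"),
]

# Step 1 (CTE): one row per fruit/truck/bucket with its first and last date.
# Step 2: the end date is the next group's start; the last group falls back
# to its own last date via the third argument of LEAD.
QUERY = """
WITH spans AS (
    SELECT fruit, truck_id, bucket_id,
           MIN(date) AS start_date,
           MAX(date) AS last_seen
    FROM mytable
    GROUP BY fruit, truck_id, bucket_id
)
SELECT fruit, truck_id, bucket_id, start_date,
       LEAD(start_date, 1, last_seen)
           OVER (PARTITION BY fruit ORDER BY start_date) AS end_date
FROM spans
ORDER BY fruit, start_date
"""


def truck_bucket_spans(rows):
    """Load rows into an in-memory table and return the computed spans."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable "
        "(fruit TEXT, truck_id INTEGER, bucket_id INTEGER, date TEXT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in truck_bucket_spans(ROWS):
    print(row)
```

On the sample data the 782 bucket ends at 2019/09/12 (the next group's start) rather than the 2018/09/12 shown in the question, so the default or the partitioning would need adjusting if the last-seen date is what you actually want there.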

Related

Query Last data group by column

I have this data;
date owner p.code product
---- ----- ----- ------
21.08.2020 Micheal 5 apple
22.08.2020 Micheal 5 apple
15.08.2020 George 4 biscuit
14.08.2020 George 4 biscuit
10.08.2020 Micheal 4 biscuit
23.08.2020 Alice 2 pear
15.08.2020 Alice 2 pear
14.08.2020 Micheal 2 pear
11.08.2020 Micheal 2 pear
I want to group them by product and show the last date and the last owner for each product,
like this:
date owner p.code product
---- ----- ------ ------
22.08.2020 Micheal 5 apple
15.08.2020 George 4 biscuit
23.08.2020 Alice 2 pear
In Oracle, you can phrase this using group by:
select product, code,
max(date) as max_date,
max(owner) keep (dense_rank first order by date desc) as owner_at_max_date
from t
group by product, code;
The keep syntax is Oracle's rather verbose way of implementing a first() aggregation function.
You can use window functions:
select *
from (
select t.*, row_number() over(partition by product order by date desc) rn
from mytable t
) t
where rn = 1
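The row_number() approach can be tried out with a small sketch using Python's sqlite3 module (assuming SQLite 3.25+ for window functions). The dates are rewritten to ISO format here so that plain text ordering matches chronological order, and p.code is stored as code; both are assumptions of this example, as is the function name latest_per_product.

```python
import sqlite3

# Sample rows from the question: (date, owner, code, product),
# with dates converted from dd.mm.yyyy to ISO yyyy-mm-dd.
ROWS = [
    ("2020-08-21", "Micheal", 5, "apple"),
    ("2020-08-22", "Micheal", 5, "apple"),
    ("2020-08-15", "George", 4, "biscuit"),
    ("2020-08-14", "George", 4, "biscuit"),
    ("2020-08-10", "Micheal", 4, "biscuit"),
    ("2020-08-23", "Alice", 2, "pear"),
    ("2020-08-15", "Alice", 2, "pear"),
    ("2020-08-14", "Micheal", 2, "pear"),
    ("2020-08-11", "Micheal", 2, "pear"),
]

# Number the rows of each product from newest to oldest, keep number 1.
QUERY = """
SELECT date, owner, code, product
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY product ORDER BY date DESC) AS rn
    FROM mytable t
) ranked
WHERE rn = 1
ORDER BY product
"""


def latest_per_product(rows):
    """Return the most recent row for each product."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (date TEXT, owner TEXT, code INTEGER, product TEXT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in latest_per_product(ROWS):
    print(row)
```

If two rows of a product could share the same latest date, row_number() picks one of them arbitrarily; adding a tie-breaker column to the ORDER BY makes the choice deterministic.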

Oracle SQL -- selecting specific data from multiple rows into one row

I'm trying to select data across multiple rows into one row.
For example, with this data set:
NAME THING DATE
----- ------ ------
JACK 1 EARLY
JACK 2 LATER
JACK 3 NOW
JANE 1 LATER
JANE 2 EARLY
JANE 3 NOW
I want to produce the following result:
NAME THING DATE
---- ---- -----
JACK 1, 2, 3 NOW
JANE 1, 2, 3 NOW
I know I can use the LISTAGG function to combine the "Thing" rows, but my biggest question is how to select across multiple rows to get the "NOW" values in the date field.
Any help would be appreciated. Thanks!
It isn't clear if you want the latest date for any thing (ordering the aggregated things either by date or by their own values):
select name,
listagg(thing, ',') within group (order by date_col) as things,
max(date_col) as now
from your_table
group by name
order by name;
or the date corresponding to the highest value of thing:
select name,
listagg(thing, ',') within group (order by thing) as things,
max(date_col) keep (dense_rank last order by thing) as now
from your_table
group by name
order by name;
Since you said those are actually dates, here is your sample data with date values added, arranged slightly differently for the two names:
NAME THING DATE_COL
---- ---------- ----------
JACK 1 2019-01-01
JACK 2 2019-03-15
JACK 3 2019-04-30
JANE 1 2019-02-01
JANE 2 2019-05-03
JANE 3 2019-04-02
the first query gets:
NAME THINGS NOW
---- --------------- ----------
JACK 1,2,3 2019-04-30
JANE 1,3,2 2019-05-03
and the second query gets:
NAME THINGS NOW
---- --------------- ----------
JACK 1,2,3 2019-04-30
JANE 1,2,3 2019-04-02
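The difference between the two queries can also be spelled out in plain Python. This is only a sketch of the two semantics, not a translation of Oracle's LISTAGG or KEEP syntax; the function names are made up, and it assumes each thing value appears once per name.

```python
from itertools import groupby
from operator import itemgetter

# The dated sample rows from above: (name, thing, date_col).
ROWS = [
    ("JACK", 1, "2019-01-01"),
    ("JACK", 2, "2019-03-15"),
    ("JACK", 3, "2019-04-30"),
    ("JANE", 1, "2019-02-01"),
    ("JANE", 2, "2019-05-03"),
    ("JANE", 3, "2019-04-02"),
]


def _groups(rows):
    """Yield (name, rows_for_name) pairs in name order."""
    ordered = sorted(rows, key=itemgetter(0))
    for name, grp in groupby(ordered, key=itemgetter(0)):
        yield name, list(grp)


def latest_date_overall(rows):
    """First query: things listed in date order, plus the latest date."""
    out = []
    for name, grp in _groups(rows):
        by_date = sorted(grp, key=itemgetter(2))
        things = ",".join(str(r[1]) for r in by_date)
        out.append((name, things, by_date[-1][2]))
    return out


def date_of_highest_thing(rows):
    """Second query: things in their own order, plus the date of the highest."""
    out = []
    for name, grp in _groups(rows):
        by_thing = sorted(grp, key=itemgetter(1))
        things = ",".join(str(r[1]) for r in by_thing)
        out.append((name, things, by_thing[-1][2]))
    return out


print(latest_date_overall(ROWS))
print(date_of_highest_thing(ROWS))
```

The two functions agree for JACK and differ for JANE, matching the two result sets shown above.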

SQL: Identify distinct blocks of treatment over multiple start and end date ranges for each member

Objective: Identify distinct episodes of continuous treatment for each member in a table. Each member has a diagnosis and a service date, and an episode is defined as all services where the time between each consecutive service is less than some number (let's say 90 days for this example). The query will need to loop through each row and calculate the difference between dates, and return the first and last date associated with each episode. The goal is to group results by member and episode start/end date.
A very similar question has been asked before, and was somewhat helpful. The problem is that in customizing the code, the returned tables are excluding first and last records. I'm not sure how to proceed.
My data currently looks like this:
MemberCode Diagnosis ServiceDate
1001 ----- ABC ----- 2010-02-04
1001 ----- ABC ----- 2010-03-20
1001 ----- ABC ----- 2010-04-18
1001 ----- ABC ----- 2010-05-22
1001 ----- ABC ----- 2010-09-26
1001 ----- ABC ----- 2010-10-11
1001 ----- ABC ----- 2010-10-19
2002 ----- XYZ ----- 2010-07-10
2002 ----- XYZ ----- 2010-07-21
2002 ----- XYZ ----- 2010-11-08
2002 ----- ABC ----- 2010-06-03
2002 ----- ABC ----- 2010-08-13
In the above data, the first record for Member 1001 is 2010-02-04, and there is not a difference of more than 90 days between consecutive services until 2010-09-26 (the date at which a new episode starts). So Member 1001 has two distinct episodes: (1) Diagnosis ABC, which goes from 2010-02-04 to 2010-05-22, and (2) Diagnosis ABC, which goes from 2010-09-26 to 2010-10-19.
Similarly, Member 2002 has three distinct episodes: (1) Diagnosis XYZ, which goes from 2010-07-10 to 2010-07-21, (2) Diagnosis XYZ, which begins and ends on 2010-11-08, and (3) Diagnosis ABC, which goes from 2010-06-03 to 2010-08-13.
Desired output:
MemberCode Diagnosis EpisodeStartDate EpisodeEndDate
1001 ----- ABC ----- 2010-02-04 ----- 2010-05-22
1001 ----- ABC ----- 2010-09-26 ----- 2010-10-19
2002 ----- XYZ ----- 2010-07-10 ----- 2010-07-21
2002 ----- XYZ ----- 2010-11-08 ----- 2010-11-08
2002 ----- ABC ----- 2010-06-03 ----- 2010-08-13
I've been working on this query for too long, and still can't get exactly what I need. Any help would be appreciated. Thanks in advance!
SQL Server 2012 has the lag() and cumulative sum functions, which makes it easier to write such a query. The idea is to find the first in each sequence. Then take the cumulative sum of the first flag to identify each group. Here is the code:
select MemberCode, Diagnosis, min(ServiceDate) as EpisodeStartDate,
       max(ServiceDate) as EpisodeEndDate
from (select t.*, sum(ServiceStartFlag) over (partition by MemberCode, Diagnosis order by ServiceDate) as grp
      from (select t.*,
                   (case when datediff(day,
                                       lag(ServiceDate) over (partition by MemberCode, Diagnosis
                                                              order by ServiceDate),
                                       ServiceDate) < 90
                         then 0
                         else 1 -- handles both NULL and >= 90
                    end) as ServiceStartFlag
            from YourTable t -- replace YourTable with the actual table name
           ) t
     ) t
group by grp, MemberCode, Diagnosis;
You can do this in earlier versions of SQL Server but the code is more cumbersome.
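The flag-then-cumulative-sum idea can be tried on the question's data with a small sketch using Python's sqlite3 module (assuming SQLite 3.25+). SQLite has no datediff, so the day difference is computed with julianday; the table name Treatments and the function name find_episodes are assumptions of this example.

```python
import sqlite3

# Sample rows from the question: (MemberCode, Diagnosis, ServiceDate).
ROWS = [
    (1001, "ABC", "2010-02-04"), (1001, "ABC", "2010-03-20"),
    (1001, "ABC", "2010-04-18"), (1001, "ABC", "2010-05-22"),
    (1001, "ABC", "2010-09-26"), (1001, "ABC", "2010-10-11"),
    (1001, "ABC", "2010-10-19"), (2002, "XYZ", "2010-07-10"),
    (2002, "XYZ", "2010-07-21"), (2002, "XYZ", "2010-11-08"),
    (2002, "ABC", "2010-06-03"), (2002, "ABC", "2010-08-13"),
]

# Inner query: flag a row with 1 when it starts a new episode (no previous
# service, or a gap of 90 days or more). Middle query: a running sum of the
# flags numbers the episodes. Outer query: min/max date per episode.
QUERY = """
SELECT MemberCode, Diagnosis,
       MIN(ServiceDate) AS EpisodeStartDate,
       MAX(ServiceDate) AS EpisodeEndDate
FROM (
    SELECT f.*,
           SUM(ServiceStartFlag) OVER (
               PARTITION BY MemberCode, Diagnosis ORDER BY ServiceDate
           ) AS grp
    FROM (
        SELECT t.*,
               CASE WHEN julianday(ServiceDate) - julianday(
                             LAG(ServiceDate) OVER (
                                 PARTITION BY MemberCode, Diagnosis
                                 ORDER BY ServiceDate)
                         ) < 90
                    THEN 0
                    ELSE 1  -- covers both NULL (first row) and gaps >= 90
               END AS ServiceStartFlag
        FROM Treatments t
    ) f
) g
GROUP BY MemberCode, Diagnosis, grp
ORDER BY MemberCode, Diagnosis, EpisodeStartDate
"""


def find_episodes(rows):
    """Return (member, diagnosis, start, end) for each treatment episode."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Treatments "
        "(MemberCode INTEGER, Diagnosis TEXT, ServiceDate TEXT)"
    )
    conn.executemany("INSERT INTO Treatments VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in find_episodes(ROWS):
    print(row)
```

The five rows returned are the five episodes listed in the desired output, including the single-day episode on 2010-11-08.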
For versions of SQL Server prior to 2012, here are some code snippets that should work.
First, you'll need a temp table (as opposed to a CTE, because looking up the edge event in a CTE would fire the newid() function again rather than retrieving the value stored for that row):
CREATE TABLE #Edges (MemberCode INT, Diagnosis VARCHAR(3), ServiceDate DATE, GroupID VARCHAR(40))
INSERT INTO #Edges
SELECT *
FROM Treatments E
CROSS APPLY (
SELECT
CASE
WHEN EXISTS (
SELECT TOP 1 E2.ServiceDate
FROM Treatments E2
WHERE E.MemberCode = E2.MemberCode
AND E.Diagnosis = E2.Diagnosis
AND E.ServiceDate > E2.ServiceDate
AND DATEDIFF(dd,E2.ServiceDate,E.ServiceDate) BETWEEN 1 AND 90
ORDER BY E2.ServiceDate DESC
) THEN 'Group'
ELSE CAST(NEWID() AS VARCHAR(40))
END AS GroupID
) z
The EXISTS operator contains a query that looks into the past for a date between 1 and 90 days ago. Once the Edge cases are gathered, this query will provide the results you posted as desired from the test data you posted.
SELECT MemberCode, Diagnosis, MIN(ServiceDate) AS StartDate, MAX(ServiceDate) AS EndDate
FROM (
SELECT
MemberCode
, Diagnosis
, ServiceDate
, CASE GroupID
WHEN 'Group' THEN (
SELECT TOP 1 GroupID
FROM #Edges E2
WHERE E.MemberCode = E2.MemberCode
AND E.Diagnosis = E2.Diagnosis
AND E.ServiceDate > E2.ServiceDate
AND GroupID != 'Group'
ORDER BY ServiceDate DESC
)
ELSE GroupID END AS GroupID
FROM #Edges E
) Z
GROUP BY MemberCode, Diagnosis, GroupID
ORDER BY MemberCode, Diagnosis, MIN(ServiceDate)
Like Gordon said, more cumbersome, but it can be done if your server is not SQL 2012 or greater.
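For comparison, the same edge idea can be sketched without any window functions by using the edge row's own date as the group key instead of a NEWID() value. This is a variation on the approach above, not the same code: it runs on SQLite through Python's sqlite3 module, uses julianday in place of DATEDIFF, treats a gap of strictly less than 90 days as the same episode (as in the question), and the names EdgeDate and episodes_without_windows are made up for illustration.

```python
import sqlite3

# Sample rows from the question: (MemberCode, Diagnosis, ServiceDate).
ROWS = [
    (1001, "ABC", "2010-02-04"), (1001, "ABC", "2010-03-20"),
    (1001, "ABC", "2010-04-18"), (1001, "ABC", "2010-05-22"),
    (1001, "ABC", "2010-09-26"), (1001, "ABC", "2010-10-11"),
    (1001, "ABC", "2010-10-19"), (2002, "XYZ", "2010-07-10"),
    (2002, "XYZ", "2010-07-21"), (2002, "XYZ", "2010-11-08"),
    (2002, "ABC", "2010-06-03"), (2002, "ABC", "2010-08-13"),
]

# An "edge" is a service with no earlier service less than 90 days before it.
# Each row is assigned the date of the latest edge at or before it, and that
# date then serves as the episode key.
QUERY = """
SELECT MemberCode, Diagnosis,
       MIN(ServiceDate) AS StartDate,
       MAX(ServiceDate) AS EndDate
FROM (
    SELECT t.MemberCode, t.Diagnosis, t.ServiceDate,
           (SELECT MAX(e.ServiceDate)
            FROM Treatments e
            WHERE e.MemberCode = t.MemberCode
              AND e.Diagnosis = t.Diagnosis
              AND e.ServiceDate <= t.ServiceDate
              AND NOT EXISTS (
                  SELECT 1
                  FROM Treatments p
                  WHERE p.MemberCode = e.MemberCode
                    AND p.Diagnosis = e.Diagnosis
                    AND p.ServiceDate < e.ServiceDate
                    AND julianday(e.ServiceDate)
                        - julianday(p.ServiceDate) < 90
              )
           ) AS EdgeDate
    FROM Treatments t
) z
GROUP BY MemberCode, Diagnosis, EdgeDate
ORDER BY MemberCode, Diagnosis, StartDate
"""


def episodes_without_windows(rows):
    """Return (member, diagnosis, start, end) using only correlated subqueries."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Treatments "
        "(MemberCode INTEGER, Diagnosis TEXT, ServiceDate TEXT)"
    )
    conn.executemany("INSERT INTO Treatments VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in episodes_without_windows(ROWS):
    print(row)
```

The nested correlated subqueries re-scan the table for every row, so this form is likely to be slow on large tables without an index on the member, diagnosis and date columns.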

SQL sort that distributes results

Given a table of products like this:
ID  Name    Seller ID  Updated at
--  ------  ---------  -------------------
1   First   3          2012-01-01 12:00:10
2   Second  3          2012-01-01 12:00:09
3   Third   4          2012-01-01 12:00:08
4   Fourth  4          2012-01-01 12:00:07
5   Fifth   5          2012-01-01 12:00:06
I want to construct a query to sort the products like this:
ID
---
1
3
5
2
4
In other words, the query should show most recently updated products, distributed by seller to minimize the likelihood of continuous sequences of products from the same seller.
Any ideas on how to best accomplish this? (Note that the code for this application is Ruby, but I'd like to do this in pure SQL if possible).
EDIT:
Note that the query should handle this case, too:
ID  Name    Seller ID  Updated at
--  ------  ---------  -------------------
1   First   3          2012-01-01 12:00:06
2   Second  3          2012-01-01 12:00:07
3   Third   4          2012-01-01 12:00:08
4   Fourth  4          2012-01-01 12:00:09
5   Fifth   5          2012-01-01 12:00:10
to produce the following results:
ID
---
5
4
2
3
1
One option demonstrated in this sqlfiddle is
select subq.*
from (
select rank() over (partition by seller_id order by updated_at desc) rnk,
p.*
from products p) subq
order by rnk, updated_at desc;
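Both cases from the question can be checked against the rank() query with a short sketch using Python's sqlite3 module (assuming SQLite 3.25+). The column names id, name, seller_id and updated_at and the function name distributed_order are assumptions of this example.

```python
import sqlite3

# First case from the question: (id, name, seller_id, updated_at).
CASE_1 = [
    (1, "First", 3, "2012-01-01 12:00:10"),
    (2, "Second", 3, "2012-01-01 12:00:09"),
    (3, "Third", 4, "2012-01-01 12:00:08"),
    (4, "Fourth", 4, "2012-01-01 12:00:07"),
    (5, "Fifth", 5, "2012-01-01 12:00:06"),
]

# Second case (from the EDIT): same products, timestamps reversed.
CASE_2 = [
    (1, "First", 3, "2012-01-01 12:00:06"),
    (2, "Second", 3, "2012-01-01 12:00:07"),
    (3, "Third", 4, "2012-01-01 12:00:08"),
    (4, "Fourth", 4, "2012-01-01 12:00:09"),
    (5, "Fifth", 5, "2012-01-01 12:00:10"),
]

# Rank each seller's products from newest to oldest, then list every
# seller's newest product first, then every seller's second newest, etc.
QUERY = """
SELECT id
FROM (
    SELECT RANK() OVER (
               PARTITION BY seller_id ORDER BY updated_at DESC
           ) AS rnk,
           p.*
    FROM products p
) subq
ORDER BY rnk, updated_at DESC
"""


def distributed_order(rows):
    """Return product ids in the seller-distributed order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products "
        "(id INTEGER, name TEXT, seller_id INTEGER, updated_at TEXT)"
    )
    conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?)", rows)
    ids = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return ids


print(distributed_order(CASE_1))
print(distributed_order(CASE_2))
```

This only spreads sellers across rank tiers; the last product of one tier and the first product of the next can still belong to the same seller.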

DB2 SQL SUM and GROUPING

I am having problems with a query that needs both detail rows and group totals.
I need the following output:
officr, cbal, sname
ABC, 500.00, TOM JONES
ABC, 200.00, SUE JONES
ABC TOTAL 700.00
RAR, 100.10, JOE SMITH
RAR, 200.05, MILES SMITH
RAR TOTAL 300.15
SQL below produces the error:
[DB2 for i5/OS]SQL0122 - Column SNAME or expression in SELECT list not valid.
SELECT
lnmast.officr, SUM(LNMAST.CBAL), lnmast.sname
FROM
LNMAST
WHERE LNMAST.RATCOD IN (6,7,8) AND STATUS NOT IN ('2','8')
group by lnmast.officr
GROUP BY GROUPING SETS is a powerful tool for grouping and cubing data. It lets you combine detail-level groups and subtotal groups in one query result; columns that are not part of a given grouping set come back as NULL in that set's rows.
SELECT lnmast.officr, SUM(LNMAST.CBAL), lnmast.sname
FROM LNMAST
WHERE LNMAST.RATCOD IN (6,7,8)
AND STATUS NOT IN ('2','8')
GROUP BY GROUPING SETS ((lnmast.officr, lnmast.sname),(lnmast.officr))
An example from IBM DOCS: www.ibm.com/support/knowledgecenter/en/... :
SELECT WEEK(SALES_DATE) AS WEEK,
DAYOFWEEK(SALES_DATE) AS DAY_WEEK,
SALES_PERSON, SUM(SALES) AS UNITS_SOLD
FROM SALES
WHERE WEEK(SALES_DATE) = 13
GROUP BY GROUPING SETS ( (WEEK(SALES_DATE), SALES_PERSON),
(DAYOFWEEK(SALES_DATE), SALES_PERSON))
ORDER BY WEEK, DAY_WEEK, SALES_PERSON
This results in:
WEEK DAY_WEEK SALES_PERSON UNITS_SOLD
----------- ----------- --------------- -----------
13 - GOUNOT 32
13 - LEE 33
13 - LUCCHESSI 8
- 6 GOUNOT 11
- 6 LEE 12
- 6 LUCCHESSI 4
- 7 GOUNOT 21
- 7 LEE 21
- 7 LUCCHESSI 4
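To see the shape of the detail-plus-total output for the LNMAST question without a Db2 system at hand, here is a sketch using Python's sqlite3 module. SQLite does not support GROUPING SETS, so the two grouping sets are emulated with UNION ALL, which is a stand-in technique rather than the Db2 query itself. The RATCOD and STATUS values, the excluded OLD LOAN row, the is_total helper column and the function name balances_with_totals are all invented for illustration.

```python
import sqlite3

# Hypothetical LNMAST rows: (officr, sname, cbal, ratcod, status).
# The last row is filtered out by the STATUS condition.
ROWS = [
    ("ABC", "TOM JONES", 500.00, 6, "1"),
    ("ABC", "SUE JONES", 200.00, 7, "1"),
    ("RAR", "JOE SMITH", 100.10, 8, "3"),
    ("RAR", "MILES SMITH", 200.05, 6, "1"),
    ("RAR", "OLD LOAN", 999.00, 6, "8"),
]

# First branch = grouping set (officr, sname): one row per customer.
# Second branch = grouping set (officr): one total row per officer,
# with sname left NULL just as GROUPING SETS would report it.
QUERY = """
SELECT officr, sname, ROUND(SUM(cbal), 2) AS cbal, 0 AS is_total
FROM lnmast
WHERE ratcod IN (6, 7, 8) AND status NOT IN ('2', '8')
GROUP BY officr, sname
UNION ALL
SELECT officr, NULL, ROUND(SUM(cbal), 2), 1
FROM lnmast
WHERE ratcod IN (6, 7, 8) AND status NOT IN ('2', '8')
GROUP BY officr
ORDER BY officr, is_total, sname
"""


def balances_with_totals(rows):
    """Return (officr, sname, cbal) rows; sname is None on total rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE lnmast "
        "(officr TEXT, sname TEXT, cbal REAL, ratcod INTEGER, status TEXT)"
    )
    conn.executemany("INSERT INTO lnmast VALUES (?, ?, ?, ?, ?)", rows)
    result = [row[:3] for row in conn.execute(QUERY)]
    conn.close()
    return result


for officr, sname, cbal in balances_with_totals(ROWS):
    print(officr, sname if sname is not None else "TOTAL", cbal)
```

Each officer's detail rows come first and the total row last, which is the layout asked for in the question; on Db2 the GROUPING SETS query would need an equivalent ORDER BY to guarantee that ordering.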