Using ORDER BY when merging prioritised records from different users - sql

I'm having trouble with a complex 'ORDER BY' which is also a bit of a tricky one to explain (it may also not be possible!).
Let's say I have 2 customers who each raise 3 support tickets (into a SQL database) and later set a Priority for each one. The two result sets would look something like this (ordered by Priority):
USER | SUBJECT | PRIORITY | DATESTAMP
-------+---------+------------+----------------
THOMAS | Error A | Priority 1 | 20/11/2017 08:01
THOMAS | Error C | Priority 2 | 20/11/2017 10:30
THOMAS | Error B | Priority 3 | 20/11/2017 14:55
USER | SUBJECT | PRIORITY | DATESTAMP
-------+---------+------------+----------------
HENRY | Error B | Priority 1 | 20/11/2017 11:14
HENRY | Error A | Priority 2 | 20/11/2017 18:44
HENRY | Error C | Priority 3 | 20/11/2017 16:26
This is where it gets complicated (I think): let's say I then wanted to combine these lists to create a 'master list' of support tickets, ordered appropriately to create a task list for a technician. This ORDER BY would need to use the Datestamp to ensure earlier tickets were resolved first, but also honour the Priority numbers assigned by the users to address more important jobs before others. The resulting table should look something like this:
USER | SUBJECT | PRIORITY | DATESTAMP
-------+---------+------------+----------------
THOMAS | Error A | Priority 1 | 20/11/2017 08:01
THOMAS | Error C | Priority 2 | 20/11/2017 10:30
HENRY | Error B | Priority 1 | 20/11/2017 11:14
THOMAS | Error B | Priority 3 | 20/11/2017 14:55
HENRY | Error A | Priority 2 | 20/11/2017 18:44
HENRY | Error C | Priority 3 | 20/11/2017 16:26
Currently, I am struggling to achieve this.
If I order them by the Datetime field first this essentially ignores the Priority because all of the dates are different.
If I order them by the Priority field first, I get this:
USER | SUBJECT | PRIORITY | DATESTAMP
-------+---------+------------+----------------
THOMAS | Error A | Priority 1 | 20/11/2017 08:01
HENRY | Error B | Priority 1 | 20/11/2017 11:14
THOMAS | Error C | Priority 2 | 20/11/2017 10:30
HENRY | Error A | Priority 2 | 20/11/2017 18:44
THOMAS | Error B | Priority 3 | 20/11/2017 14:55
HENRY | Error C | Priority 3 | 20/11/2017 16:26
The problem here is that Henry's tickets are being bumped too high up the list: his Priority 1 should come after Thomas' Priorities 1 AND 2, because it was logged at a later time.
I feel like this can't be fixed by simply rearranging the order of the fields in the ORDER BY, if it can be done at all, but I can't think of a way around it. Is there some special multi-layered approach which can achieve this, or am I just being stupid? Thanks!

Okay, I'm not perfectly sure about this one and I have never used LAG before; it should return the previous row's value. I gave my explanation in the comments on the question. Put briefly: order by datestamp, but if a user's priorities are not in chronological order, adjust the dates so that priority wins, by giving the older, lower-priority ticket the same effective datestamp as the newer, higher-priority ticket above it.
Edit: the adjusted date (newdate) is now what the final ORDER BY uses.
SELECT T.UserName, T.Subject, T.Priority, T.DateStamp
FROM T
INNER JOIN (
    SELECT UserName, DateStamp,
           -- if the previous (higher-priority) ticket for this user was logged later,
           -- take its datestamp instead, so that priority wins over the raw date
           CASE WHEN LAG(DateStamp, 1, DateStamp) OVER (PARTITION BY UserName ORDER BY Priority) > DateStamp
                THEN LAG(DateStamp, 1, DateStamp) OVER (PARTITION BY UserName ORDER BY Priority)
                ELSE DateStamp END AS newdate
    FROM T
) AS x ON x.UserName = T.UserName AND x.DateStamp = T.DateStamp
ORDER BY x.newdate, T.Priority
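If a user's priorities are out of chronological order across more than one row, the single-step LAG comparison only corrects the row immediately below. A running maximum carries the latest preceding date forward across every priority step instead; this is only a sketch, assuming the same table T and columns as above:

SELECT UserName, Subject, Priority, DateStamp,
       -- latest datestamp among this user's tickets at this priority or higher
       MAX(DateStamp) OVER (PARTITION BY UserName
                            ORDER BY Priority
                            ROWS UNBOUNDED PRECEDING) AS effective_date
FROM T
ORDER BY effective_date, Priority

On the sample data this should reproduce the "master list" from the question, with ties on effective_date broken by Priority.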

The problem here is that, as stated, your criteria for sorting results in a comparison operator that is non-transitive. Consider:
Thomas | Priority 1 | 10:00
Thomas | Priority 2 | 08:00
Henry | Priority 1 | 09:00
Now, you want Thomas Priority 1 > Thomas Priority 2, but also Thomas Priority 2 > Henry Priority 1 because of the timestamps, AND Henry Priority 1 > Thomas Priority 1 because of timestamps, so you have A > B > C > A.
This means that there is no total ordering of the set of tickets, and therefore you cannot meaningfully sort them. The maths is here: https://en.wikipedia.org/wiki/Total_order.

Order them by the "Due Date". I assume each of your priorities has some form of SLA associated with it. Add the SLA to the DateStamp to determine when the ticket is due. Ordering by this value ensures that lower-priority tickets will eventually come up in the queue, while still requiring the higher-priority tickets to be resolved sooner.
DECLARE @T TABLE
(
    UserName VARCHAR(10),
    Subject VARCHAR(10),
    Priority INT,
    DateStamp DATETIME
);
INSERT INTO @T (UserName, Subject, Priority, DateStamp)
VALUES ('THOMAS', 'Error A', 1, '20171120 08:01'),
       ('THOMAS', 'Error C', 2, '20171120 10:30'),
       ('THOMAS', 'Error B', 3, '20171120 14:55'),
       ('HENRY', 'Error B', 1, '20171120 11:14'),
       ('HENRY', 'Error A', 2, '20171120 18:44'),
       ('HENRY', 'Error C', 3, '20171120 16:26');
SELECT UserName,
Subject,
Priority,
DateStamp,
DueDate = CASE Priority
WHEN 1
THEN DATEADD( HOUR, 2, DateStamp )
WHEN 2
THEN DATEADD( HOUR, 12, DateStamp )
WHEN 3
THEN DATEADD( HOUR, 24, DateStamp )
ELSE DATEADD( HOUR, 72, DateStamp )
END
FROM @T
ORDER BY DueDate
This also makes it very easy to see the impact of your SLA thresholds.
UserName Subject Priority DateStamp DueDate
---------- ---------- ----------- ----------------------- -----------------------
THOMAS Error A 1 2017-11-20 08:01:00.000 2017-11-20 10:01:00.000
HENRY Error B 1 2017-11-20 11:14:00.000 2017-11-20 13:14:00.000
THOMAS Error C 2 2017-11-20 10:30:00.000 2017-11-20 22:30:00.000
HENRY Error A 2 2017-11-20 18:44:00.000 2017-11-21 06:44:00.000
THOMAS Error B 3 2017-11-20 14:55:00.000 2017-11-21 14:55:00.000
HENRY Error C 3 2017-11-20 16:26:00.000 2017-11-21 16:26:00.000
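If the SLA thresholds change often, they could also live in their own lookup table instead of a CASE expression, so they can be tuned without editing the query. A minimal sketch, assuming a hypothetical @SLA table variable holding hours per priority, used against the sample @T data above:

DECLARE @SLA TABLE (Priority INT, SlaHours INT);
INSERT INTO @SLA (Priority, SlaHours)
VALUES (1, 2), (2, 12), (3, 24), (4, 72);

SELECT t.UserName,
       t.Subject,
       t.Priority,
       t.DateStamp,
       -- fall back to 72 hours for any priority not listed in @SLA
       DueDate = DATEADD(HOUR, COALESCE(s.SlaHours, 72), t.DateStamp)
FROM @T AS t
LEFT JOIN @SLA AS s ON s.Priority = t.Priority
ORDER BY DueDate;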
UPDATED
I've added some additional records and included some priority 4 values to better show how they will distribute.
UserName Subject Priority DateStamp DueDate
---------- ---------- ----------- ----------------------- -----------------------
HENRY Error A 2 2017-11-19 19:44:00.000 2017-11-20 07:44:00.000
THOMAS Error A 1 2017-11-20 08:01:00.000 2017-11-20 10:01:00.000
HENRY Error B 1 2017-11-20 11:14:00.000 2017-11-20 13:14:00.000
THOMAS Error C 2 2017-11-20 10:30:00.000 2017-11-20 22:30:00.000
HENRY Error D 2 2017-11-20 12:44:00.000 2017-11-21 00:44:00.000
THOMAS Error B 3 2017-11-20 14:55:00.000 2017-11-21 14:55:00.000
HENRY Error C 3 2017-11-20 16:26:00.000 2017-11-21 16:26:00.000
HENRY Error F 4 2017-11-18 18:44:00.000 2017-11-21 18:44:00.000
HENRY Error C 2 2017-11-21 08:44:00.000 2017-11-21 20:44:00.000
HENRY Error B 2 2017-11-21 18:44:00.000 2017-11-22 06:44:00.000
HENRY Error E 4 2017-11-19 18:44:00.000 2017-11-22 18:44:00.000

Related

SQL - Period range in subgroups of a group by

I have the following dataset:
A | B    | C
--+------+-----------
1 | John | 2018-08-14
1 | John | 2018-08-20
1 | John | 2018-09-03
2 | John | 2018-11-13
2 | John | 2018-12-11
2 | John | 2018-12-12
1 | John | 2020-01-20
1 | John | 2020-01-21
3 | John | 2021-03-02
3 | John | 2021-03-03
1 | John | 2020-05-10
1 | John | 2020-05-12
And I would like to have the following result:
A | B    | C
--+------+-----------
1 | John | 2018-08-14
2 | John | 2018-11-13
1 | John | 2020-01-20
3 | John | 2021-03-02
1 | John | 2020-05-10
If I group by A, B then the first and the third block just get merged together, which is coherent with how GROUP BY works. How could I create another column so that I can still use a GROUP BY and get the result I want?
If you have other ideas than mine, please explain them!
I tried to use first, last, rank and dense_rank without success.
Use lag(). Looks like B is a function of A in your data. So checking lag(A) will suffice.
select A,B,C
from (
select *, case when lag(A) over(order by C) = A then 0 else 1 end startFlag
from mytable
) t
where startFlag = 1
order by C
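Since the question mentions wanting to keep a GROUP BY, here is a hedged follow-up sketch: a running sum of startFlag numbers each block, and that number can then be grouped on (same table and column names assumed as above):

select A, B, min(C) as block_start, max(C) as block_end
from (
    select *, sum(startFlag) over(order by C) as blockNum
    from (
        select *, case when lag(A) over(order by C) = A then 0 else 1 end startFlag
        from mytable
    ) s
) t
group by blockNum, A, B
order by block_start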

SQL lite query - Merging time periods

I am working with a SQLite RDB and have the following problem.
PID EID EPISODETYPE START_TIME END_TIME
123 556 emergency_room 2020-03-29 15:09:00 2020-03-30 20:36:00
123 558 ward 2020-04-30 20:35:00 2020-05-04 22:12:00
123 660 ward 2020-05-04 22:12:00 2020-05-21 08:59:00
123 661 icu 2020-05-21 09:00:00 2020-07-01 17:00:00
Basically, PID represents each patient unique identifier. They all have an episode identifier for all the different beds they occupy during a unique stay.
What I wish to accomplish is to select all episodes from a single hospital stay and return it as the stay number.
I would want my query to result in this :
PID EID StayNumber
123 556 1
123 558 2
123 660 2
123 661 2
The 1st row gets StayNumber 1 as it's the first stay.
As the 2nd, 3rd and 4th row are from the same hospital stay (we can tell by the overlapping OR relatively close start and end time period) they are all labeled StayNumber 2.
A hospital stay is defined as the period of time during which the patient never left the hospital.
I tried to write the query by starting off with a GROUP BY PID (to isolate the process for each individual patient) and using datetime to compute a simple time-difference rule, but I have trouble writing a query that compares the end time of one row with the start time of the next row.
Thank you in advance.
I am a SQL learner.
Use window function LAG() to flag the groups for each hospital stay and window function SUM() to get the numbers:
SELECT PID, EID,
SUM(flag) OVER (PARTITION BY PID ORDER BY START_TIME) StayNumber
FROM (
SELECT *,
strftime('%s', START_TIME) -
strftime('%s', LAG(END_TIME, 1, datetime(START_TIME, '-1 hour')) OVER (PARTITION BY PID ORDER BY START_TIME)) > 60 flag
FROM tablename
)
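To reproduce the results below locally, a minimal setup for the sample data (SQLite; the table name tablename is taken from the query above):

CREATE TABLE tablename (PID INT, EID INT, EPISODETYPE TEXT, START_TIME TEXT, END_TIME TEXT);
INSERT INTO tablename VALUES
  (123, 556, 'emergency_room', '2020-03-29 15:09:00', '2020-03-30 20:36:00'),
  (123, 558, 'ward',           '2020-04-30 20:35:00', '2020-05-04 22:12:00'),
  (123, 660, 'ward',           '2020-05-04 22:12:00', '2020-05-21 08:59:00'),
  (123, 661, 'icu',            '2020-05-21 09:00:00', '2020-07-01 17:00:00');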
Results:
PID | EID | StayNumber
----+-----+-----------
123 | 556 | 1
123 | 558 | 2
123 | 660 | 2
123 | 661 | 2

Payment fully received

We are raising bills to clients on various dates, and the payment is received in an irregular way. We need to calculate the payment delay in days until full payment is received for a particular amount due. The data below is sample data for only one client, 0123.
table Due (id, fil varchar(12), amount numeric(10, 2), date DATE)
table Received (id, fil varchar(12), amount numeric(10, 2), date DATE)
Table Due:
id | fil  | amount | date
---+------+--------+------------
1  | 0123 | 1000   | 2019-jan-01
2  | 0123 | 1500   | 2019-jan-15
3  | 0123 | 1200   | 2019-jan-25
4  | 0123 | 1800   | 2019-feb-10
Table Received:
id | fil  | amount | date
---+------+--------+------------
1  | 0123 | 1000   | 2019-jan-10
2  | 0123 | 500    | 2019-jan-20
3  | 0123 | 1300   | 2019-jan-25
4  | 0123 | 400    | 2019-feb-08
5  | 0123 | 1000   | 2019-feb-20
The joined table should show:
fil  | due_date    | due_amount | received_amount | date | delay
-----+-------------+------------+-----------------+------+------------------------------------------
0123 | 2019-jan-01 | 1000       | 1000            |      | 9
0123 | 2019-jan-15 | 1500       | 500             |      |
0123 |             |            | 1300            |      | 10 (since payment completed on 25th Jan)
0123 | 2019-jan-25 | 1200       | 400             |      |
0123 |             |            | 1000            |      | 26
0123 | 2019-feb-10 | 1800       |                 |      |
I have tried to be as accurate as possible in the calculations. Please excuse any inadvertent error. I was just coming around to writing a script to do this, but maybe someone will be able to suggest a proper join.
Thanks for trying.
As @DavidHempy said, this is not possible without knowing which invoice each payment is meant for. You can calculate how many days it has been since the account was at 0, which might help:
with all_activity as (
    -- bills as negative amounts, payments as positive
    select due.date,
           -1 * amount as amount
    from due
    union all
    select received.date,
           amount
    from received),
totals as (
    -- running balance, plus a flag for the dates on which nothing is owed
    select date,
           amount,
           sum(amount) over (order by date),
           case when sum(amount) over (order by date) >= 0
                then true
                else false
           end as nothing_owed
    from all_activity)
select date,
       amount,
       sum,
       -- days since the balance was last at or above zero
       date - max(date) filter (where nothing_owed = true) over (order by date)
           as days_since_positive
from totals order by 1
;
date | amount | sum | days_since_positive
------------+----------+----------+---------------------
2019-01-01 | -1000.00 | -1000.00 |
2019-01-10 | 1000.00 | 0.00 | 0
2019-01-15 | -1500.00 | -1500.00 | 5
2019-01-20 | 500.00 | -1000.00 | 10
2019-01-25 | -1200.00 | -900.00 | 15
2019-01-25 | 1300.00 | -900.00 | 15
2019-02-08 | 400.00 | -500.00 | 29
2019-02-10 | -1800.00 | -2300.00 | 31
2019-02-20 | 1000.00 | -1300.00 | 41
(9 rows)
You could extend this logic to figure out the last due date from which they were above 0.
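If you are willing to assume that payments are applied to the oldest open bill first (FIFO), you can also get per-bill settlement dates with running totals: a bill is settled on the first payment date where the cumulative amount received reaches the cumulative amount due up to and including that bill. A hedged sketch in PostgreSQL syntax, using the same due and received tables:

with d as (
    select date as due_date, amount,
           sum(amount) over (order by date, id) as cum_due
    from due),
r as (
    select date as pay_date,
           sum(amount) over (order by date, id) as cum_received
    from received)
select d.due_date,
       d.amount as due_amount,
       min(r.pay_date) as settled_on,
       min(r.pay_date) - d.due_date as delay_days
from d
left join r on r.cum_received >= d.cum_due
group by d.due_date, d.amount
order by d.due_date;

On the sample data this gives delays of 9, 10 and 26 days for the first three bills and NULL for the last one, which is still open.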

SQL - Creating a timeline for each ID (Vertica)

I am dealing with the following problem in SQL (using Vertica):
In short -- Create a timeline for each ID (in a table where I have multiple lines, orders in my example, per ID)
What I would like to achieve -- At my disposal I have a table of historical order dates, and I would like to compute the rates of new customers (first order ever in the past month), active customers (>1 order in the last 1-3 months), passive customers (no order for the last 3-6 months) and inactive customers (no order for >6 months).
Which steps I have taken so far -- I was able to construct a table similar to the example presented below:
CustomerID Current order date Time between current/previous order First order date (all-time)
001 2015-04-30 12:06:58 (null) 2015-04-30 12:06:58
001 2015-09-24 17:30:59 147 05:24:01 2015-04-30 12:06:58
001 2016-02-11 13:21:10 139 19:50:11 2015-04-30 12:06:58
002 2015-10-21 10:38:29 (null) 2015-10-21 10:38:29
003 2015-05-22 12:13:01 (null) 2015-05-22 12:13:01
003 2015-07-09 01:04:51 47 12:51:50 2015-05-22 12:13:01
003 2015-10-23 00:23:48 105 23:18:57 2015-05-22 12:13:01
A little bit of intuition: customer 001 placed three orders from which the second one was 147 days after its first order. Customer 002 has only placed one order in total.
What I think that the next steps should be -- I would like to know for each date (also dates on which a certain user did not place an order), for each CustomerID, how long it has been since his/her last order. This would imply that I would create some sort of timeline for each CustomerID. In the example presented above I would get 287 lines (the days between the 1st of May 2015 and the 11th of February 2016, the timespan of this table) for each CustomerID. I have difficulties solving this previous step.
When I have performed this step I want to create a field which shows, at each date, the last order date, the period between the last order date and the current date, and what state someone is in at the current date. For the example presented earlier, this would look something like this:
CustomerID Last order date Current date Time between current date /last order State
001 2015-04-30 12:06:58 2015-05-01 00:00:00 0 00:00:00 New
...
001 2015-04-30 12:06:58 2015-06-30 00:00:00 60 11:53:02 Active
...
001 2015-09-24 17:30:59 2016-02-01 00:00:00 129 11:53:02 Passive
...
...
002 2015-10-21 17:30:59 2015-10-22 00:00:00 0 06:29:01 New
...
002 2015-10-21 17:30:59 2015-11-30 00:00:00 39 06:29:01 Active
...
...
003 2015-05-22 12:13:01 2015-06-23 00:00:00 31 11:46:59 Active
...
003 2015-07-09 01:04:51 2015-10-22 00:00:00 105 11:46:59 Inactive
...
At the dots there should be all the inbetween dates but for sake of space I have left these out of the table.
When I know for each date what the state is of each customer (active/passive/inactive) my plan is to sum the states and group by date which should give me the sum of new, active, passive and inactive customers. From here on I can easily compute the rates at each date.
Anybody that knows how I can possibly achieve this task?
Note -- If anyone has other ideas how to achieve the goal presented above (using some other approach compared to the approach I had in mind) please let me know!
EDIT
Suppose you start from a table like this:
SQL> select * from ord order by custid, ord_date ;
custid | ord_date
--------+---------------------
1 | 2015-04-30 12:06:58
1 | 2015-09-24 17:30:59
1 | 2016-02-11 13:21:10
2 | 2015-10-21 10:38:29
3 | 2015-05-22 12:13:01
3 | 2015-07-09 01:04:51
3 | 2015-10-23 00:23:48
(7 rows)
You can use Vertica's timeseries analytic functions TS_FIRST_VALUE() and TS_LAST_VALUE() to fill the gaps and carry the last order date forward to the current date.
Then you just have to join this with a Vertica TIMESERIES generated from the same table with an interval of one day, starting from the day each customer placed his/her first order up to now (current_date):
select
custid,
status_dt,
last_order_dt,
case
when status_dt::date - last_order_dt::date < 30 then case
when nord = 1 then 'New' else 'Active' end
when status_dt::date - last_order_dt::date < 90 then 'Active'
when status_dt::date - last_order_dt::date < 180 then 'Passive'
else 'Inactive'
end as status
from (
select
custid,
last_order_dt,
status_dt,
conditional_true_event (first_order_dt is null or
last_order_dt > lag(last_order_dt))
over(partition by custid order by status_dt) as nord
from (
select
custid,
ts_first_value(ord_date) as first_order_dt ,
ts_last_value(ord_date) as last_order_dt ,
dt::date as status_dt
from
( select custid, ord_date from ord
union all
select distinct(custid) as custid, current_date + 1 as ord_date from ord
) z timeseries dt as '1 day' over (partition by custid order by ord_date)
) x
) y
where status_dt <= current_date
order by 1, 2
;
And you will get something like this:
custid | status_dt | last_order_dt | status
--------+------------+---------------------+---------
1 | 2015-04-30 | 2015-04-30 12:06:58 | New
1 | 2015-05-01 | 2015-04-30 12:06:58 | New
1 | 2015-05-02 | 2015-04-30 12:06:58 | New
...
1 | 2015-05-29 | 2015-04-30 12:06:58 | New
1 | 2015-05-30 | 2015-04-30 12:06:58 | Active
1 | 2015-05-31 | 2015-04-30 12:06:58 | Active
...
etc.
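From there, the rates the question asks for are just a grouped count per day. A hedged sketch building on the query above, assuming it has been wrapped in a view or subquery named statuses (an assumed name):

select status_dt,
       count(case when status = 'New' then 1 end) as new_customers,
       count(case when status = 'Active' then 1 end) as active_customers,
       count(case when status = 'Passive' then 1 end) as passive_customers,
       count(case when status = 'Inactive' then 1 end) as inactive_customers
from statuses
group by status_dt
order by status_dt;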

How to find the first column in an ACCESS table row with a value not Zero or Blank?

I have a table which includes a column of each month (Jan-13, Feb-13, Mar-13, etc) for a period of four years. I need to find the first column in each row that has a value other than "0" (zero). Then I will need to find the last column with a value other than zero.
The query will let me know the start month and the end month of a resource. I have written an Excel formula but now I need to convert the same functionality to Access. When I find each of the columns I need to retrieve the column heading. Could anyone help me with the SQL for my query?
The report based on the query would be
ResourceName, StartDate EndDate
Bob Sample Apr-13 Apr-15
There are actually two tables involved: the Resource table with all the information for the individuals, and a Forecast table which has the months as columns and the resource id and task for rows. For each month an individual is forecast to work a given percent of their time. We are not concerned about the actual date (e.g. 11/20/2013) the individual starts, just the month. So a resource for task 1 is forecast to work .5 percent of their time in Nov-13, which would be the first month that the resource works. Then that resource may be forecast to work at the same level for the next nine months, so the column for Aug-14 would be the last month with the .5 value. After that, all columns contain zeros.
(I will limit my example to six months because I'm lazy.)
So we have some "wide" data in a table named [Forecast]:
ResourceID Jul-13 Aug-13 Sep-13 Oct-13 Nov-13 Dec-13
---------- ------ ------ ------ ------ ------ ------
1 0 0.5 1 0.5 0 0
2 0 0 2 0 0 0
3 0 0 3 4 0 0
Start by creating a saved query in Access named [ForecastUnpivoted] to convert the "short wide" data into "long skinny" data:
SELECT ResourceID, "2013-07" AS forecastMonth, [Jul-13] AS forecastValue
FROM Forecast
UNION ALL
SELECT ResourceID, "2013-08" AS forecastMonth, [Aug-13] AS forecastValue
FROM Forecast
UNION ALL
SELECT ResourceID, "2013-09" AS forecastMonth, [Sep-13] AS forecastValue
FROM Forecast
UNION ALL
SELECT ResourceID, "2013-10" AS forecastMonth, [Oct-13] AS forecastValue
FROM Forecast
UNION ALL
SELECT ResourceID, "2013-11" AS forecastMonth, [Nov-13] AS forecastValue
FROM Forecast
UNION ALL
SELECT ResourceID, "2013-12" AS forecastMonth, [Dec-13] AS forecastValue
FROM Forecast
which returns
ResourceID forecastMonth forecastValue
---------- ------------- -------------
1 2013-07 0
2 2013-07 0
3 2013-07 0
1 2013-08 0.5
2 2013-08 0
3 2013-08 0
1 2013-09 1
2 2013-09 2
3 2013-09 3
1 2013-10 0.5
2 2013-10 0
3 2013-10 4
1 2013-11 0
2 2013-11 0
3 2013-11 0
1 2013-12 0
2 2013-12 0
3 2013-12 0
Now we can use Min() and Max() to give us the start and end months for each resource:
SELECT
ResourceID,
Min(forecastMonth) AS StartMonth,
Max(forecastMonth) AS EndMonth
FROM ForecastUnpivoted
WHERE forecastValue <> 0
GROUP BY ResourceID
That gives us
ResourceID StartMonth EndMonth
---------- ---------- --------
1 2013-08 2013-10
2 2013-09 2013-09
3 2013-09 2013-10
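To get the report in the shape the question shows (ResourceName plus start and end months), the grouped query can be joined back to the resource table. A hedged sketch, assuming the resource table is named Resource with fields ID and ResourceName (names assumed):

SELECT r.ResourceName,
       Min(u.forecastMonth) AS StartMonth,
       Max(u.forecastMonth) AS EndMonth
FROM ForecastUnpivoted AS u
INNER JOIN [Resource] AS r ON r.ID = u.ResourceID
WHERE u.forecastValue <> 0
GROUP BY r.ResourceName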
In a relational database, you would not store your data like this. Instead, you would have 2 tables:
Your resources
The months in which a resource does work (for a task)
Example:
Table 1 resources:
ID | Resource_Name
1 | Mr. A
2 | Mrs. B
Table 2 Forecast:
FC_ID | FC_Month | FC_Resource_ID | FC_Task | FC_Percentage
1 | 2013-10-01 | 1 | actiontask! | 0.05
2 | 2013-11-01 | 1 | actiontask! | 0.10
3 | 2013-12-01 | 1 | actiontask! | 0.05
4 | 2013-07-01 | 2 | boring task | 0.3
5 | 2013-08-01 | 2 | boring task | 0.25
6 | 2013-09-01 | 2 | boring task | 0.3
7 | 2013-10-01 | 2 | boring task | 0.1
You can then request the start and end date using SQL:
SELECT Resource_Name, Min(FC_Month) AS colMin, Max(FC_Month) AS colMax
FROM tblForecast INNER JOIN tblResources ON tblForecast.FC_Resource_ID = tblResources.ID
GROUP BY FC_Resource_ID, Resource_Name
The result of this example will be:
Mr. A | 2013-10-01 | 2013-12-01
Mrs. B | 2013-07-01 | 2013-10-01