I am working with a SQLite database and have the following problem.
PID  EID  EPISODETYPE     START_TIME           END_TIME
123  556  emergency_room  2020-03-29 15:09:00  2020-03-30 20:36:00
123  558  ward            2020-04-30 20:35:00  2020-05-04 22:12:00
123  660  ward            2020-05-04 22:12:00  2020-05-21 08:59:00
123  661  icu             2020-05-21 09:00:00  2020-07-01 17:00:00
Basically, PID is each patient's unique identifier. Each patient also has an episode identifier (EID) for every different bed they occupy during a single stay.
What I wish to accomplish is to group all episodes belonging to a single hospital stay and return a stay number for each episode.
I would want my query to result in this:
PID EID StayNumber
123 556 1
123 558 2
123 660 2
123 661 2
The 1st row gets StayNumber 1 as it's the first stay.
As the 2nd, 3rd and 4th rows are from the same hospital stay (we can tell by the overlapping OR relatively close start and end times), they are all labeled StayNumber 2.
A hospital stay is defined as the period of time during which the patient never left the hospital.
I tried to write the query by starting off with a:
GROUP BY PID (to isolate the process for each individual patient)
and using datetime to compute a simple time-difference rule, but I have trouble writing a query that compares the end time of one row with the start time of the next row.
Thank you in advance. I am a SQL learner.
Use window function LAG() to flag the groups for each hospital stay and window function SUM() to get the numbers:
SELECT PID, EID,
       SUM(flag) OVER (PARTITION BY PID ORDER BY START_TIME) AS StayNumber
FROM (
  SELECT *,
         strftime('%s', START_TIME) -
         strftime('%s', LAG(END_TIME, 1, datetime(START_TIME, '-1 hour'))
             OVER (PARTITION BY PID ORDER BY START_TIME)) > 60 AS flag
  FROM tablename
)
Results:
|PID | EID | StayNumber
|:-- | :-- | ---------:
|123 | 556 | 1
|123 | 558 | 2
|123 | 660 | 2
|123 | 661 | 2
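If you want to try this locally, here is a minimal sketch of the query run through Python's `sqlite3` module (the table name `episodes` is an assumption; window functions require SQLite 3.25+):

```python
import sqlite3

# Sample rows from the question; the 60-second gap threshold is the
# one used in the query above.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE episodes (
    PID INT, EID INT, EPISODETYPE TEXT, START_TIME TEXT, END_TIME TEXT)""")
con.executemany("INSERT INTO episodes VALUES (?, ?, ?, ?, ?)", [
    (123, 556, 'emergency_room', '2020-03-29 15:09:00', '2020-03-30 20:36:00'),
    (123, 558, 'ward',           '2020-04-30 20:35:00', '2020-05-04 22:12:00'),
    (123, 660, 'ward',           '2020-05-04 22:12:00', '2020-05-21 08:59:00'),
    (123, 661, 'icu',            '2020-05-21 09:00:00', '2020-07-01 17:00:00'),
])
rows = con.execute("""
    SELECT PID, EID,
           SUM(flag) OVER (PARTITION BY PID ORDER BY START_TIME) AS StayNumber
    FROM (
        SELECT *,
               strftime('%s', START_TIME) -
               strftime('%s', LAG(END_TIME, 1, datetime(START_TIME, '-1 hour'))
                   OVER (PARTITION BY PID ORDER BY START_TIME)) > 60 AS flag
        FROM episodes
    )
""").fetchall()
print(rows)  # [(123, 556, 1), (123, 558, 2), (123, 660, 2), (123, 661, 2)]
```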
Our accounting department needs to pull tax data from our MIS every month and submit it online to the Dept. of Revenue. Unfortunately, when pulling the data, it is duplicated a varying number of times depending on which jurisdictions we have to pay taxes to. All she needs is the dollar amount for one jurisdiction, for one line, because she enters that on the website.
I've tried using DISTINCT to pull only one record of the type, in conjunction with LEFT() to pull just the first 7 characters of the jurisdiction, but it ended up excluding certain results that should have been included. I believe that was because the posting date and the amount on a couple of transactions were identical. They were separate transactions, but the query took them as duplicates and ignored them.
Here are a couple of examples of queries I've run; they pull most of the data, but usually either too much or not enough:
SELECT DISTINCT LEFT("Sales-Tax-Jurisdiction-Code", 7), "Taxable-Base", "Posting-Date"
FROM ARInvoiceTax
WHERE ("Posting-Date" >= '2019-09-01' AND "Posting-Date" <= '2019-09-30')
AND (("Sales-Tax-Jurisdiction-Code" BETWEEN '55001' AND '56763')
OR "Sales-Tax-Jurisdiction-Code" = 'Dakota Cty TT')
ORDER BY "Sales-Tax-Jurisdiction-Code"
Here is a query that I ran to pull all of the data; its result is below:
SELECT "Sales-Tax-Jurisdiction-Code", "Taxable-Base", "Posting-Date"
FROM ARInvoiceTax
WHERE ("Posting-Date" >= '2019-09-01' AND "Posting-Date" <= '2019-09-30')
AND (("Sales-Tax-Jurisdiction-Code" BETWEEN '55001' AND '56763')
OR "Sales-Tax-Jurisdiction-Code" = 'Dakota Cty TT')
ORDER BY "Sales-Tax-Jurisdiction-Code"
Below is a sample of the output:
Jurisdiction | Tax Amount | Posting Date
-------------|------------|-------------
5512100City | $50.00 | 2019-09-02
5512100City | $50.00 | 2019-09-03
5512100City | $70.00 | 2019-09-02
5512100Cnty | $50.00 | 2019-09-02
5512100Cnty | $50.00 | 2019-09-03
5512100Cnty | $70.00 | 2019-09-02
5512100State | $70.00 | 2019-09-02
5512100State | $50.00 | 2019-09-02
5512100State | $50.00 | 2019-09-03
5513100Cnty | $25.00 | 2019-09-12
5513100State | $25.00 | 2019-09-12
5514100City | $9.00 | 2019-09-06
5514100City | $9.00 | 2019-09-06
5514100Cnty | $9.00 | 2019-09-06
5514100Cnty | $9.00 | 2019-09-06
5515100State | $12.00 | 2019-09-11
5516100City | $6.00 | 2019-09-13
5516100City | $7.00 | 2019-09-13
5516100State | $6.00 | 2019-09-13
5516100State | $7.00 | 2019-09-13
As you can see, the data can be all over the place. One zip code could have multiple different lines. What the accounting department does now is prints a report with this information and, in a spreadsheet, only records (1) dollar amount per transaction. For example, for 55121, she would need to record $50.00, $50.00 and $70.00 (she tallies them and adds the total amount on the website) however the SQL query gives me those (3) numbers, (3) times.
I can't seem to figure out a query that will pull only one set of the data. Unfortunately, I can't do it based on the words/letters after the 00 because not all jurisdictions have all 3 (city, cnty, state) and thus trying to remove lines based on that removes valid lines as well.
Can you use select distinct? If the first five characters are the zip code and you just want that:
select distinct left(jurisdiction, 5), tax_amount
from t;
Alternatively, take only whichever variant (City/Cnty/State) sorts first:
select jurisdiction, tax_amount, posting_date
from (
  select *, dense_rank() over (
              partition by left(jurisdiction, 7)
              order by substring(jurisdiction, 8, len(jurisdiction))) rnk
  from taxes -- your output here
) t -- the derived table needs an alias in SQL Server
where rnk = 1;
SQL Server syntax; you may need other string functions in your DBMS.
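As a self-contained check, the same idea translated to SQLite (`substr` replacing `LEFT`/`SUBSTRING`; the table name `taxes` and the sample rows are taken from the question's output):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE taxes (jurisdiction TEXT, tax_amount TEXT, posting_date TEXT)")
con.executemany("INSERT INTO taxes VALUES (?, ?, ?)", [
    ('5512100City',  '$50.00', '2019-09-02'),
    ('5512100City',  '$50.00', '2019-09-03'),
    ('5512100City',  '$70.00', '2019-09-02'),
    ('5512100Cnty',  '$50.00', '2019-09-02'),
    ('5512100Cnty',  '$50.00', '2019-09-03'),
    ('5512100Cnty',  '$70.00', '2019-09-02'),
    ('5512100State', '$70.00', '2019-09-02'),
    ('5513100Cnty',  '$25.00', '2019-09-12'),
    ('5513100State', '$25.00', '2019-09-12'),
])
rows = con.execute("""
    SELECT jurisdiction, tax_amount, posting_date
    FROM (
        SELECT *, dense_rank() OVER (
                    PARTITION BY substr(jurisdiction, 1, 7)
                    ORDER BY substr(jurisdiction, 8)) AS rnk
        FROM taxes
    )
    WHERE rnk = 1
    ORDER BY jurisdiction
""").fetchall()
# Keeps only the alphabetically-first suffix per 7-character prefix:
# the three 5512100City rows and the single 5513100Cnty row.
for r in rows:
    print(r)
```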
I have a table that stores customer-care data. The table/view has the following structure:
userid calls_received calls_answered calls_rejected call_date
-----------------------------------------------------------------------
1030 134 100 34 28-05-2018
1012 140 120 20 28-05-2018
1045 120 80 40 28-05-2018
1030 99 39 50 28-04-2018
1045 50 30 20 28-04-2018
1045 200 100 100 28-05-2017
1030 160 90 70 28-04-2017
1045 50 30 20 28-04-2017
This is the sample data; it is stored on a per-day basis.
I have to create a report in a report-designer software that takes a date as input. When the user selects a date, e.g. 28/05/2018, it is sent as the parameter ${call_date}. I have to query the view so that the result looks like the one below: if the user selects 28/05/2018, then the data of 28/04/2018 and 28/05/2017 should be displayed side by side, in the column order shown.
userid | cl_cur | ans_cur | rej_cur |success_percentage |diff_percent|position_last_month| cl_last_mon | ans_las_mon | rej_last_mon |percentage_lm|cl_last_year | ans_last_year | rej_last_year
1030 | 134 | 100 | 34 | 74.6 % | 14% | 2 | 99 | 39 | 50 | 39.3% | 160 | 90 | 70
1045 | 120 | 80 | 40 | 66.6% | 26.7% | 1 | 50 | 30 | 20 | 60% | 50 | 30 | 20
The objective of this query is to show the data of the selected day, the same day of the previous month, and the same day of the previous year side by side, so the user can compare them at a glance. The result is ordered in descending order of the selected day's percentage (ans_cur/cl_cur), shown under success_percentage.
The column position_last_month is the position of that particular employee in the previous month when ordered in descending order of percentage. In this example userid 1030 was in 2nd position last month and userid 1045 in 1st position. I have to calculate the same for the year.
There is also a field called diff_percent which calculates the difference in percentage from the person who was in the same position last month. The same goes for last year. How can I achieve this result? Please help.
THIS ANSWERS THE ORIGINAL VERSION OF THE QUESTION.
One method is a join:
select t.userid,
       t.calls_received as cl_cur, t.calls_answered as ans_cur, t.calls_rejected as rej_cur,
       tm.calls_received as cl_last_mon, tm.calls_answered as ans_last_mon, tm.calls_rejected as rej_last_mon,
       ty.calls_received as cl_last_year, ty.calls_answered as ans_last_year, ty.calls_rejected as rej_last_year
from t left join
t tm
on tm.userid = t.userid and
tm.call_date = dateadd(month, -1, t.call_date) left join
t ty
on ty.userid = t.userid and
ty.call_date = dateadd(year, -1, t.call_date)
where t.call_date = ${call_date};
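A runnable sketch of the self-join idea, translated to SQLite (`date(..., '-1 month')` standing in for SQL Server's DATEADD; ISO date strings are assumed, and only the calls_received columns are shown for brevity):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE t (
    userid INT, calls_received INT, calls_answered INT,
    calls_rejected INT, call_date TEXT)""")
con.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", [
    (1030, 134, 100, 34,  '2018-05-28'),
    (1012, 140, 120, 20,  '2018-05-28'),
    (1045, 120,  80, 40,  '2018-05-28'),
    (1030,  99,  39, 50,  '2018-04-28'),
    (1045,  50,  30, 20,  '2018-04-28'),
    (1045, 200, 100, 100, '2017-05-28'),
    (1030, 160,  90, 70,  '2017-04-28'),
    (1045,  50,  30, 20,  '2017-04-28'),
])
rows = con.execute("""
    SELECT t.userid,
           t.calls_received  AS cl_cur,
           tm.calls_received AS cl_last_mon,
           ty.calls_received AS cl_last_year
    FROM t
    LEFT JOIN t tm ON tm.userid = t.userid
                  AND tm.call_date = date(t.call_date, '-1 month')
    LEFT JOIN t ty ON ty.userid = t.userid
                  AND ty.call_date = date(t.call_date, '-1 year')
    WHERE t.call_date = '2018-05-28'
    ORDER BY t.userid
""").fetchall()
# NULL (None) where a user has no row exactly one month / one year back.
print(rows)
```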
I'm having trouble with a complex 'ORDER BY' which is also a bit of a tricky one to explain (it may also not be possible!).
Let's say I have 2 customers who each raise 3 support tickets (into a SQL database) and later set a Priority for each one. The two result sets would look something like this (ordered by Priority):
USER | SUBJECT | PRIORITY | DATESTAMP
-------+---------+------------+----------------
THOMAS | Error A | Priority 1 | 20/11/2017 08:01
THOMAS | Error C | Priority 2 | 20/11/2017 10:30
THOMAS | Error B | Priority 3 | 20/11/2017 14:55
USER | SUBJECT | PRIORITY | DATESTAMP
-------+---------+------------+----------------
HENRY | Error B | Priority 1 | 20/11/2017 11:14
HENRY | Error A | Priority 2 | 20/11/2017 18:44
HENRY | Error C | Priority 3 | 20/11/2017 16:26
This is where it gets complicated (I think): let's say I then wanted to combine these lists to create a 'master list' of support tickets, ordered appropriately to create a task list for a technician. This ORDER BY would need to use the Datestamp to ensure earlier tickets were resolved first, but also honour the Priority numbers assigned by the users to address more important jobs before others. The resulting table should look something like this:
USER | SUBJECT | PRIORITY | DATESTAMP
-------+---------+------------+----------------
THOMAS | Error A | Priority 1 | 20/11/2017 08:01
THOMAS | Error C | Priority 2 | 20/11/2017 10:30
HENRY | Error B | Priority 1 | 20/11/2017 11:14
THOMAS | Error B | Priority 3 | 20/11/2017 14:55
HENRY | Error A | Priority 2 | 20/11/2017 18:44
HENRY | Error C | Priority 3 | 20/11/2017 16:26
Currently, I am struggling to achieve this.
If I order them by the Datetime field first, this essentially ignores the Priority because all of the dates are different.
If I order them by the Priority field first, I get this:
USER | SUBJECT | PRIORITY | DATESTAMP
-------+---------+------------+----------------
THOMAS | Error A | Priority 1 | 20/11/2017 08:01
HENRY | Error B | Priority 1 | 20/11/2017 11:14
THOMAS | Error C | Priority 2 | 20/11/2017 10:30
HENRY | Error A | Priority 2 | 20/11/2017 18:44
THOMAS | Error B | Priority 3 | 20/11/2017 14:55
HENRY | Error C | Priority 3 | 20/11/2017 16:26
The problem here is that Henry's tickets are being bumped too high up the list--his Priority 1 should be after Thomas' Priorities 1 AND 2, because it was logged at a later time.
I feel like this can't be fixed by simply rearranging the order of the fields in the ORDER BY, if it can be done at all, but I can't think of a way around it. Is there some special multi-layered approach which can achieve this, or am I just being stupid? Thanks!
Okay, I'm not perfectly sure about this one. I gave my explanation in the comments on the question. Quickly put: order by datestamp, but when a user's priorities are not in chronological order, carry the older, higher-priority ticket's datestamp forward onto its newer, lower-priority tickets, so that priority is honoured first. A running MAX() over the priority order does the carrying:
SELECT T.UserName, T.Subject, T.Priority, T.DateStamp
FROM T
INNER JOIN (
    SELECT UserName, DateStamp,
           MAX(DateStamp) OVER (PARTITION BY UserName
                                ORDER BY Priority
                                ROWS UNBOUNDED PRECEDING) AS newdate
    FROM T
) AS x ON x.UserName = T.UserName AND x.DateStamp = T.DateStamp
ORDER BY x.newdate, T.Priority
The problem here is that, as stated, your criteria for sorting results in a comparison operator that is non-transitive. Consider:
Thomas | Priority 1 | 10:00
Thomas | Priority 2 | 08:00
Henry | Priority 1 | 09:00
Now, you want Thomas Priority 1 to come before Thomas Priority 2, but also Thomas Priority 2 before Henry Priority 1 because of the timestamps, AND Henry Priority 1 before Thomas Priority 1 because of the timestamps, so you have A before B before C before A.
This means that there is no total ordering of the set of tickets, and therefore you cannot meaningfully sort them. Math here: https://en.wikipedia.org/wiki/Total_order.
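The cycle can be checked mechanically. A small Python sketch of the pairwise "should come first" rule described above (higher priority wins for the same user, earlier datestamp wins across users):

```python
from datetime import datetime

# The three tickets from the counterexample above.
tickets = {
    "thomas_p1": ("THOMAS", 1, datetime(2017, 11, 20, 10, 0)),
    "thomas_p2": ("THOMAS", 2, datetime(2017, 11, 20, 8, 0)),
    "henry_p1":  ("HENRY",  1, datetime(2017, 11, 20, 9, 0)),
}

def comes_before(a, b):
    (ua, pa, da), (ub, pb, db) = tickets[a], tickets[b]
    if ua == ub:
        return pa < pb   # same user: honour the priority order
    return da < db       # different users: earlier ticket first

# Each pair is ordered as intended...
assert comes_before("thomas_p1", "thomas_p2")
assert comes_before("thomas_p2", "henry_p1")
assert comes_before("henry_p1", "thomas_p1")
# ...so A < B < C < A: no ORDER BY can satisfy all three at once.
```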
Order them by the "Due Date". I assume each of your priorities has some form of SLA associated with it. Add the SLA to the DATESTAMP to determine when it should be due. Ordering by this value ensures that your lower-priority tickets will eventually come up in the queue, but still requires the higher-priority tickets to be resolved quicker.
DECLARE @T TABLE
(
UserName VARCHAR(10),
Subject VARCHAR(10),
Priority INT,
DateStamp DATETIME
)
;
INSERT INTO @T (UserName,
Subject,
Priority,
DateStamp
)
VALUES ('THOMAS', 'Error A', 1, '20171120 08:01'),
('THOMAS', 'Error C', 2, '20171120 10:30'),
('THOMAS', 'Error B', 3, '20171120 14:55'),
('HENRY', 'Error B', 1, '20171120 11:14'),
('HENRY', 'Error A', 2, '20171120 18:44'),
('HENRY', 'Error C', 3, '20171120 16:26')
;
SELECT UserName,
Subject,
Priority,
DateStamp,
DueDate = CASE Priority
WHEN 1
THEN DATEADD( HOUR, 2, DateStamp )
WHEN 2
THEN DATEADD( HOUR, 12, DateStamp )
WHEN 3
THEN DATEADD( HOUR, 24, DateStamp )
ELSE DATEADD( HOUR, 72, DateStamp )
END
FROM @T
ORDER BY DueDate
This also makes it very easy to see the impact of your SLA thresholds.
UserName Subject Priority DateStamp DueDate
---------- ---------- ----------- ----------------------- -----------------------
THOMAS Error A 1 2017-11-20 08:01:00.000 2017-11-20 10:01:00.000
HENRY Error B 1 2017-11-20 11:14:00.000 2017-11-20 13:14:00.000
THOMAS Error C 2 2017-11-20 10:30:00.000 2017-11-20 22:30:00.000
HENRY Error A 2 2017-11-20 18:44:00.000 2017-11-21 06:44:00.000
THOMAS Error B 3 2017-11-20 14:55:00.000 2017-11-21 14:55:00.000
HENRY Error C 3 2017-11-20 16:26:00.000 2017-11-21 16:26:00.000
UPDATED
I've added some additional records and included some priority 4 values to better show how they will distribute.
UserName Subject Priority DateStamp DueDate
---------- ---------- ----------- ----------------------- -----------------------
HENRY Error A 2 2017-11-19 19:44:00.000 2017-11-20 07:44:00.000
THOMAS Error A 1 2017-11-20 08:01:00.000 2017-11-20 10:01:00.000
HENRY Error B 1 2017-11-20 11:14:00.000 2017-11-20 13:14:00.000
THOMAS Error C 2 2017-11-20 10:30:00.000 2017-11-20 22:30:00.000
HENRY Error D 2 2017-11-20 12:44:00.000 2017-11-21 00:44:00.000
THOMAS Error B 3 2017-11-20 14:55:00.000 2017-11-21 14:55:00.000
HENRY Error C 3 2017-11-20 16:26:00.000 2017-11-21 16:26:00.000
HENRY Error F 4 2017-11-18 18:44:00.000 2017-11-21 18:44:00.000
HENRY Error C 2 2017-11-21 08:44:00.000 2017-11-21 20:44:00.000
HENRY Error B 2 2017-11-21 18:44:00.000 2017-11-22 06:44:00.000
HENRY Error E 4 2017-11-19 18:44:00.000 2017-11-22 18:44:00.000
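The due-date ordering above can be reproduced in SQLite with `datetime` modifiers in place of DATEADD; a minimal sketch, with the SLA hours mirroring the CASE expression:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tickets (UserName TEXT, Subject TEXT, Priority INT, DateStamp TEXT)")
con.executemany("INSERT INTO tickets VALUES (?, ?, ?, ?)", [
    ('THOMAS', 'Error A', 1, '2017-11-20 08:01'),
    ('THOMAS', 'Error C', 2, '2017-11-20 10:30'),
    ('THOMAS', 'Error B', 3, '2017-11-20 14:55'),
    ('HENRY',  'Error B', 1, '2017-11-20 11:14'),
    ('HENRY',  'Error A', 2, '2017-11-20 18:44'),
    ('HENRY',  'Error C', 3, '2017-11-20 16:26'),
])
rows = con.execute("""
    SELECT UserName, Subject,
           datetime(DateStamp, CASE Priority
                                 WHEN 1 THEN '+2 hours'
                                 WHEN 2 THEN '+12 hours'
                                 WHEN 3 THEN '+24 hours'
                                 ELSE '+72 hours' END) AS DueDate
    FROM tickets
    ORDER BY DueDate
""").fetchall()
# Order matches the result table above:
# Thomas A, Henry B, Thomas C, Henry A, Thomas B, Henry C
for r in rows:
    print(r)
```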
I am dealing with the following problem in SQL (using Vertica):
In short -- Create a timeline for each ID (in a table where I have multiple lines, orders in my example, per ID)
What I would like to achieve -- At my disposal I have a table of historical order dates, and I would like to compute the rates of new customers (first order ever in the past month), active customers (>1 order in the last 1-3 months), passive customers (no order for the last 3-6 months) and inactive customers (no order for >6 months).
Which steps I have taken so far -- I was able to construct a table similar to the example presented below:
CustomerID Current order date Time between current/previous order First order date (all-time)
001 2015-04-30 12:06:58 (null) 2015-04-30 12:06:58
001 2015-09-24 17:30:59 147 05:24:01 2015-04-30 12:06:58
001 2016-02-11 13:21:10 139 19:50:11 2015-04-30 12:06:58
002 2015-10-21 10:38:29 (null) 2015-10-21 10:38:29
003 2015-05-22 12:13:01 (null) 2015-05-22 12:13:01
003 2015-07-09 01:04:51 47 12:51:50 2015-05-22 12:13:01
003 2015-10-23 00:23:48 105 23:18:57 2015-05-22 12:13:01
A little bit of intuition: customer 001 placed three orders, of which the second was 147 days after the first. Customer 002 has only placed one order in total.
What I think the next steps should be -- I would like to know, for each date (including dates on which a certain user did not place an order) and for each CustomerID, how long it has been since his/her last order. This implies creating some sort of timeline for each CustomerID: in the example above I would get 287 lines per CustomerID (the days between the 1st of May 2015 and the 11th of February 2016, the timespan of this table). I have difficulties with this step. Once it is done, I want to create fields showing, at each date, the last order date, the period between the last order date and the current date, and the state the customer is in at the current date. For the example presented earlier, this would look something like this:
CustomerID Last order date Current date Time between current date /last order State
001 2015-04-30 12:06:58 2015-05-01 00:00:00 0 00:00:00 New
...
001 2015-04-30 12:06:58 2015-06-30 00:00:00 60 11:53:02 Active
...
001 2015-09-24 17:30:59 2016-02-01 00:00:00 129 11:53:02 Passive
...
...
002 2015-10-21 17:30:59 2015-10-22 00:00:00 0 06:29:01 New
...
002 2015-10-21 17:30:59 2015-11-30 00:00:00 39 06:29:01 Active
...
...
003 2015-05-22 12:13:01 2015-06-23 00:00:00 31 11:46:59 Active
...
003 2015-07-09 01:04:51 2015-10-22 00:00:00 105 11:46:59 Inactive
...
At the dots there should be all the in-between dates, but for the sake of space I have left these out of the table.
When I know for each date what the state is of each customer (active/passive/inactive) my plan is to sum the states and group by date which should give me the sum of new, active, passive and inactive customers. From here on I can easily compute the rates at each date.
Anybody that knows how I can possibly achieve this task?
Note -- If anyone has other ideas how to achieve the goal presented above (using some other approach compared to the approach I had in mind) please let me know!
EDIT
Suppose you start from a table like this:
SQL> select * from ord order by custid, ord_date ;
custid | ord_date
--------+---------------------
1 | 2015-04-30 12:06:58
1 | 2015-09-24 17:30:59
1 | 2016-02-11 13:21:10
2 | 2015-10-21 10:38:29
3 | 2015-05-22 12:13:01
3 | 2015-07-09 01:04:51
3 | 2015-10-23 00:23:48
(7 rows)
You can use Vertica's timeseries analytic functions TS_FIRST_VALUE() and TS_LAST_VALUE() to fill the gaps and carry the last order date forward to the current date.
You just have to join this with a Vertica timeseries generated from the same table, with an interval of one day, starting from the day each customer placed his/her first order up to now (current_date):
select
custid,
status_dt,
last_order_dt,
case
when status_dt::date - last_order_dt::date < 30 then case
when nord = 1 then 'New' else 'Active' end
when status_dt::date - last_order_dt::date < 90 then 'Active'
when status_dt::date - last_order_dt::date < 180 then 'Passive'
else 'Inactive'
end as status
from (
select
custid,
last_order_dt,
status_dt,
conditional_true_event (first_order_dt is null or
last_order_dt > lag(last_order_dt))
over(partition by custid order by status_dt) as nord
from (
select
custid,
ts_first_value(ord_date) as first_order_dt ,
ts_last_value(ord_date) as last_order_dt ,
dt::date as status_dt
from
( select custid, ord_date from ord
union all
select distinct(custid) as custid, current_date + 1 as ord_date from ord
) z timeseries dt as '1 day' over (partition by custid order by ord_date)
) x
) y
where status_dt <= current_date
order by 1, 2
;
And you will get something like this:
custid | status_dt | last_order_dt | status
--------+------------+---------------------+---------
1 | 2015-04-30 | 2015-04-30 12:06:58 | New
1 | 2015-05-01 | 2015-04-30 12:06:58 | New
1 | 2015-05-02 | 2015-04-30 12:06:58 | New
...
1 | 2015-05-29 | 2015-04-30 12:06:58 | New
1 | 2015-05-30 | 2015-04-30 12:06:58 | Active
1 | 2015-05-31 | 2015-04-30 12:06:58 | Active
...
etc.
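Setting the Vertica-specific pieces (TIMESERIES, CONDITIONAL_TRUE_EVENT) aside, the daily-timeline logic can be sketched in plain Python; the thresholds follow the answer above, and the `timeline` helper is illustrative only:

```python
from datetime import date, timedelta

# Order dates per customer, from the sample table.
orders = {
    1: [date(2015, 4, 30), date(2015, 9, 24), date(2016, 2, 11)],
    2: [date(2015, 10, 21)],
    3: [date(2015, 5, 22), date(2015, 7, 9), date(2015, 10, 23)],
}

def timeline(custid, upto):
    """Expand one customer's orders into a (day, status) list."""
    days = []
    d = orders[custid][0]                      # start at the first order
    while d <= upto:
        last = max(o for o in orders[custid] if o <= d)
        gap = (d - last).days                  # days since last order
        nord = sum(1 for o in orders[custid] if o <= d)  # orders so far
        if gap < 30:
            status = 'New' if nord == 1 else 'Active'
        elif gap < 90:
            status = 'Active'
        elif gap < 180:
            status = 'Passive'
        else:
            status = 'Inactive'
        days.append((d, status))
        d += timedelta(days=1)
    return days

tl = dict(timeline(1, date(2015, 6, 15)))
print(tl[date(2015, 5, 1)], tl[date(2015, 5, 30)])  # New Active
```

Summing these per-day statuses across customers, grouped by date, then gives the daily new/active/passive/inactive counts the question asks for.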
The table below includes a non-unique id, a money value, and dates/times.
id_1 value_1 value_time id_version Version_time
138 250 09-SEP-14 595 02-SEP-14
140 250 15-SEP-14 695 01-AUG-14
140 300 30-DEC-14 720 05-NOV-14
233 250 01-JUN-15 800 16-MAY-15
As you can see, the id_1, id_version and time columns can change in the table while value_1 may stay the same.
I know that if id_1 is the same across rows, value_1 can only change with id_version. But there are too many id_versions in the table, and while I know the value changes with id_version, I don't know its exact change time.
So first I have to determine which id_version and version time cause the value change, grouped by id_1.
But again, id_1 is not unique, and the id may change while the value stays the same :)
editor: From OP's comment - Begin
Here is a desired-result example: I want to get the first and second rows, not the third and fourth rows.
| 140 | 250 | 15-SEP-14 | 695 | 01-AUG-14 |
| 140 | 300 | 31-DEC-14 | 725 | 07-NOV-14 |
| 140 | 300 | 05-JAN-14 | 740 | 30-NOV-14 |
| 140 | 300 | 30-DEC-14 | 720 | 05-NOV-14 |
editor: From OP's comment - End
Thanks in advance; I really need help with this situation.
Based on the input given so far (and processing just the data in the linked-to picture rather than the current example data), the following should help to get you started:
SELECT
TMin.id_1
, TMin.value_1
, TO_CHAR(TAll.value_time, 'DD-MON-RR') value_time
, TMin.id_version
, TO_CHAR(TMin.version_time, 'DD-MON-RR') version_time
FROM
(SELECT
id_1
, value_1
, MIN(id_version) id_version
, MIN(version_time) version_time
FROM T
GROUP BY id_1, value_1
ORDER BY id_1, value_1
) TMin
JOIN T TAll
ON TMin.id_1 = TAll.id_1
AND TMin.value_1 = TAll.value_1
AND TMin.id_version = TAll.id_version
AND TMin.version_time = TAll.version_time
ORDER BY TMin.id_1, TMin.value_1
;
Please comment, if and as this requires adjustment / further detail.
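A sketch of the same query in SQLite, using the id_1 = 140 rows from the comments (ISO date strings assumed in place of Oracle's DD-MON-RR). One caveat worth noting: MIN(id_version) and MIN(version_time) are computed independently, so the join back only finds a match when both minima come from the same row, as they do here:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE t (id_1 INT, value_1 INT, value_time TEXT,
                               id_version INT, version_time TEXT)""")
con.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", [
    (140, 250, '2014-09-15', 695, '2014-08-01'),
    (140, 300, '2014-12-31', 725, '2014-11-07'),
    (140, 300, '2014-01-05', 740, '2014-11-30'),
    (140, 300, '2014-12-30', 720, '2014-11-05'),
])
rows = con.execute("""
    SELECT tmin.id_1, tmin.value_1, tall.value_time,
           tmin.id_version, tmin.version_time
    FROM (SELECT id_1, value_1,
                 MIN(id_version)   AS id_version,
                 MIN(version_time) AS version_time
          FROM t GROUP BY id_1, value_1) tmin
    JOIN t tall ON  tall.id_1 = tmin.id_1
                AND tall.value_1 = tmin.value_1
                AND tall.id_version = tmin.id_version
                AND tall.version_time = tmin.version_time
    ORDER BY tmin.id_1, tmin.value_1
""").fetchall()
# One row per (id_1, value_1): the earliest version that carried each value.
print(rows)
```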