Count trips in SQL Server - sql

my table structure
id zoneid status
1 35 IN starting zone
2 35 OUT 1st trip has been started
3 36 IN
4 36 IN
5 36 OUT
6 38 IN last station zone 1 trip completed
7 38 OUT returning back 2nd trip has start
8 38 OUT
9 36 IN
10 36 OUT
11 35 IN when return back in start zone means 2nd trip complete
12 35 IN
13 35 IN
14 35 OUT 3rd trip has been started
15 36 IN
16 36 IN
17 36 OUT
18 38 IN 3rd trip has been completed
19 38 OUT 4th trip has been started
20 38 OUT
21 36 IN
22 36 OUT
23 35 IN 4th trip completed
24 35 IN
Now I want a SQL query that counts the number of trips. I do not want to use the status field for the count.
Edit:
I want the total number of trips as the result, where 35 is the starting point and 38 is the ending point (this is one trip). When 35 occurs again after 38, that is the second trip, and so on.

So you don't want to look at the status, but only look at the zoneid changes ordered by id. zoneid 36 is irrelevant, so we select 35 and 38 only, order them by id and count changes. We detect changes by comparing a record with the previous one. We can look into a previous record with LAG.
select sum(ischange) as trips_completed
from
(
select
case when zoneid <> lag(zoneid) over (order by id) then 1 else 0 end as ischange
from trips
where zoneid in (35,38)
) changes_detected;
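To see why counting changes gives the trip count, the same logic can be traced outside the database. The following is a minimal Python sketch, not part of the original answer; the function name `count_trips` and the in-memory row list are assumptions made for illustration, with the zone ids copied from the question's sample data.

```python
def count_trips(rows, endpoints=(35, 38)):
    """Count zone changes among endpoint rows, in id order (mirrors the LAG query)."""
    # Keep only the two endpoint zones, ordered by id (the WHERE and ORDER BY).
    zones = [zone for _id, zone in sorted(rows) if zone in endpoints]
    # Compare each row with its predecessor; the first row has none, so it adds 0.
    return sum(1 for prev, cur in zip(zones, zones[1:]) if prev != cur)


# (id, zoneid) pairs from the question's sample data
rows = list(enumerate(
    [35, 35, 36, 36, 36, 38, 38, 38, 36, 36, 35, 35,
     35, 35, 36, 36, 36, 38, 38, 38, 36, 36, 35, 35], start=1))

print(count_trips(rows))  # 4 for the sample data
```

Each switch between 35 and 38 is one completed trip, so the four switches in the sample correspond to the four trips annotated in the question.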

I am suggesting this without any testing. Does the following query produce the correct number of rows? Note if there is a date_created (datetime) column then I would suggest using that column to order by instead of id.
select
ca.in_id, t.id as out_id, ca.in_status, t.status as out_status
from table1 t
cross apply (
select top (1) id as in_id, status as in_status
from table1
where table1.id < t.id
and zoneid = 35
order by id DESC
) ca
where t.zoneid = 38
/* and conditions for selecting one day only */
If that logic is correct then just use COUNT(*) instead of the column list.
CREATE TABLE Table1
("id" int, "zoneid" int, "status" varchar(3), "other" varchar(54))
;
INSERT INTO Table1
("id", "zoneid", "status", "other")
VALUES
(1, 35, 'IN', 'starting zone'),
(2, 35, 'OUT', '1st trip has been started'),
(3, 36, 'IN', NULL),
(4, 36, 'IN', NULL),
(5, 36, 'OUT', NULL),
(6, 38, 'IN', 'last station zone 1 trip completed'),
(7, 38, 'OUT', 'returning back 2nd trip has start'),
(8, 38, 'OUT', NULL),
(9, 36, 'IN', NULL),
(10, 36, 'OUT', NULL),
(11, 35, 'IN', 'when return back in start zone means 2nd trip complete'),
(12, 35, 'IN', NULL),
(13, 35, 'IN', NULL),
(14, 35, 'OUT', '3rd trip has been started'),
(15, 36, 'IN', NULL),
(16, 36, 'IN', NULL),
(17, 36, 'OUT', 'other'),
(18, 38, 'IN', '3rd trip has been completed'),
(19, 38, 'OUT', '4th trip has been started'),
(20, 38, 'OUT', NULL),
(21, 36, 'IN', NULL),
(22, 36, 'OUT', NULL),
(23, 35, 'IN', '4th trip completed'),
(24, 35, 'IN', NULL)
;

For learning purposes, here is a self explanatory & detailed version http://sqlfiddle.com/#!15/d8bf4/1/0
The solution is based on calculating the 'running' count of trips from '35 to 38' and '38 to 35'. Solution is very specific to the OP query but can be optimized with a much shorter version...
with trip_38_to_35 as (
select * from zonecount
where (zoneid=38 and status='OUT') OR (zoneid=35 and status='IN')
order by id asc
)
, count_start_on_38 as (
select count(*) as start_on_38
from trip_38_to_35
where (zoneid=38 and status='OUT') AND
id <
( select max(id)
from trip_38_to_35
where (zoneid=35 and status='IN')
) /*do not count unfinished trips*/
)
, count_end_on_35 as (
select count(*) as end_on_35
from trip_38_to_35
where (zoneid=35 and status='IN')
) /*the other way of trip*/
, trip_35_to_38 as (
select * from zonecount
where (zoneid=35 and status='OUT') OR (zoneid=38 and status='IN')
order by id asc
)
,count_start_on_35 as (
select count(*) as start_on_35
from trip_35_to_38
where (zoneid=35 and status='OUT') AND
id <
( select max(id)
from trip_35_to_38
where (zoneid=38 and status='IN')
) /*do not count unfinished trips*/
)
,count_end_on_38 as (
select count(*) as end_on_38
from trip_35_to_38
where (zoneid=38 and status='IN')
)
/*sum the MIN of the two trips count*/
select
(case when end_on_35 > start_on_38 then start_on_38 else end_on_35 end) +
(case when end_on_38 > start_on_35 then start_on_35 else end_on_38 end)
from
count_start_on_38,
count_end_on_35,
count_start_on_35,
count_end_on_38
btw, 6 trips are calculated as per definition

Related

Identify rows subsequent to other rows based on criteria?

I am fairly new to DB2 and SQL. There exists a table of customers and their visits. I need to write a query to find visits by the same customer subsequent and within 24hr to a visit when Sale = 'Y'.
Based on this example data:
CustomerId  VisitID  Sale  DateTime
1           1        Y     2021-04-23 20:16:00.000000
2           2        N     2021-04-24 20:16:00.000000
1           3        N     2021-04-23 21:16:00.000000
2           4        Y     2021-04-25 20:16:00.000000
3           5        Y     2021-04-23 20:16:00.000000
2           6        N     2021-04-25 23:16:00.000000
3           7        N     2021-05-23 20:16:00.000000
The query results should return:
VisitID
3
6
How do I do this?
Try this. You may uncomment the commented out block to run this statement as is.
/*
WITH MYTAB (CustomerId, VisitID, Sale, DateTime) AS
(
VALUES
(1, 1, 'Y', '2021-04-23 20:16:00'::TIMESTAMP)
, (1, 3, 'N', '2021-04-23 21:16:00'::TIMESTAMP)
, (2, 2, 'N', '2021-04-24 20:16:00'::TIMESTAMP)
, (2, 4, 'Y', '2021-04-25 20:16:00'::TIMESTAMP)
, (2, 6, 'N', '2021-04-25 23:16:00'::TIMESTAMP)
, (3, 5, 'Y', '2021-04-23 20:16:00'::TIMESTAMP)
, (3, 7, 'N', '2021-05-23 20:16:00'::TIMESTAMP)
)
*/
SELECT VisitID
FROM MYTAB A
WHERE EXISTS
(
SELECT 1
FROM MYTAB B
WHERE B.CustomerID = A.CustomerID
AND B.Sale = 'Y'
AND B.VisitID <> A.VisitID
AND A.DateTime BETWEEN B.DateTime AND B.DateTime + 24 HOUR
)
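The EXISTS condition can be checked by hand with a small Python sketch. This is only an illustration under stated assumptions: `visits_after_sale` and `ts` are hypothetical helper names, and the rows are the ones from the answer's commented-out VALUES block.

```python
from datetime import datetime, timedelta


def visits_after_sale(visits, window=timedelta(hours=24)):
    """Return ids of visits that fall within `window` after a sale visit
    by the same customer (the sale visit itself is excluded)."""
    hits = []
    for cust_a, visit_a, _sale_a, ts_a in visits:
        for cust_b, visit_b, sale_b, ts_b in visits:
            if (cust_b == cust_a and sale_b == "Y" and visit_b != visit_a
                    and ts_b <= ts_a <= ts_b + window):
                hits.append(visit_a)
                break  # like EXISTS, one matching sale is enough
    return sorted(hits)


def ts(text):
    """Parse a 'YYYY-MM-DD HH:MM:SS' string (hypothetical helper)."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


# (CustomerId, VisitID, Sale, DateTime)
visits = [
    (1, 1, "Y", ts("2021-04-23 20:16:00")),
    (1, 3, "N", ts("2021-04-23 21:16:00")),
    (2, 2, "N", ts("2021-04-24 20:16:00")),
    (2, 4, "Y", ts("2021-04-25 20:16:00")),
    (2, 6, "N", ts("2021-04-25 23:16:00")),
    (3, 5, "Y", ts("2021-04-23 20:16:00")),
    (3, 7, "N", ts("2021-05-23 20:16:00")),
]

print(visits_after_sale(visits))  # [3, 6]
```

Visit 2 is excluded because it happens before the sale in visit 4, and visit 7 because it is a month after the sale in visit 5.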

How to find all those Sellers from the table who had increase in sales in at least 3 months consecutively in SQL?

How to find all those Sellers from below table who had increase in sales in at least 3 months consecutively?
Record | Seller_id | Months | Sales_amount
0 121 Feb 100
1 121 Jan 87
2 121 Mar 95
3 121 May 105
4 121 Apr 100
5 321 Jan 100
6 321 Feb 87
7 321 Mar 95
8 321 Apr 105
9 321 May 110
10 597 Jan 100
11 597 Feb 105
12 597 Mar 95
13 597 Apr 100
14 597 May 110
It is curious that the data has no year and that the months are three-letter codes. You can do it with LAG and a lookup table of months:
With tbl as (
select * from (values
-- source data
(0 , 121,'Feb',100)
,(1 , 121,'Jan',87 )
,(2 , 121,'Mar',95 )
,(3 , 121,'May',105)
,(4 , 121,'Apr',100)
,(5 , 321,'Jan',100)
,(6 , 321,'Feb',87 )
,(7 , 321,'Mar',95 )
,(8 , 321,'Apr',105)
,(9 , 321,'May',110)
,(10, 597,'Jan',100)
,(11, 597,'Feb',105)
,(12, 597,'Mar',95 )
,(13, 597,'Apr',100)
,(14, 597,'May',110)
) t(id, Seller_id, Months, Sales_amount)
), months as (
select * from ( values
(1, 'Jan')
,(2, 'Feb')
,(3, 'Mar')
,(4, 'Apr')
,(5, 'May')
-- , etc
) t(id,name)
)
select *
from (
select t.*,
lag(Sales_amount,1) over (partition by Seller_id order by m.id) m1,
lag(Sales_amount,2) over (partition by Seller_id order by m.id) m2
from tbl t
join months m on m.name=t.Months
) t
where Sales_amount > m1 and m1 > m2;
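The two LAG columns amount to comparing each month with the two months before it. A rough Python sketch of that comparison follows; the names `MONTH_ORDER` and `sellers_with_rising_streak` are made up for illustration, and, like the query, it assumes every seller has a row for every month so that neighbouring rows are consecutive months.

```python
from collections import defaultdict

# Lookup table that gives the three-letter codes an order (same role as the months CTE).
MONTH_ORDER = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5}


def sellers_with_rising_streak(rows):
    """Return sellers that have three consecutive months of rising sales."""
    by_seller = defaultdict(list)
    for seller_id, month, amount in rows:
        by_seller[seller_id].append((MONTH_ORDER[month], amount))
    result = []
    for seller_id, sales in sorted(by_seller.items()):
        amounts = [amount for _month, amount in sorted(sales)]
        # a, b, c play the roles of m2, m1 and Sales_amount in the query.
        if any(a < b < c for a, b, c in zip(amounts, amounts[1:], amounts[2:])):
            result.append(seller_id)
    return result


rows = [
    (121, "Feb", 100), (121, "Jan", 87), (121, "Mar", 95), (121, "May", 105), (121, "Apr", 100),
    (321, "Jan", 100), (321, "Feb", 87), (321, "Mar", 95), (321, "Apr", 105), (321, "May", 110),
    (597, "Jan", 100), (597, "Feb", 105), (597, "Mar", 95), (597, "Apr", 100), (597, "May", 110),
]

print(sellers_with_rising_streak(rows))  # [121, 321, 597]
```

With the sample data every seller qualifies, because each one rises from March through May.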
WITH a
AS (SELECT *
FROM
(
VALUES -- source data
(0, 121, 'Feb', 100),
(1, 121, 'Jan', 87),
(2, 121, 'Mar', 95),
(3, 121, 'May', 105),
(4, 121, 'Apr', 100),
(5, 321, 'Jan', 100),
(6, 321, 'Feb', 87),
(7, 321, 'Mar', 95),
(8, 321, 'Apr', 105),
(9, 321, 'May', 110),
(10, 597, 'Jan', 100),
(11, 597, 'Feb', 105),
(12, 597, 'Mar', 95),
(13, 597, 'Apr', 100),
(14, 597, 'May', 110)
) t (id, Seller_id, Months, Sales_amount) ),
b
AS (SELECT *
FROM
(
VALUES
(1, 'Jan'),
(2, 'Feb'),
(3, 'Mar'),
(4, 'Apr'),
(5, 'May') -- , etc
) t (id, name) ),
c
AS (SELECT a.*,
b.id id2,
ROW_NUMBER() OVER (PARTITION BY a.Seller_id ORDER BY b.id ASC) rnk
FROM a
LEFT JOIN b
ON a.Months = b.name),
d
AS (SELECT --c1.*
c1.Seller_id,
c1.Months AS m1,
c2.Months AS m2,
c3.Months AS m3,
c1.Sales_amount AS sa1,
c2.Sales_amount AS sa2,
c3.Sales_amount AS sa3
FROM c c1
LEFT JOIN c c2
ON c1.id2 = c2.id2 - 1
AND c1.Seller_id = c2.Seller_id
LEFT JOIN c c3
ON c2.id2 = c3.id2 - 1
AND c2.Seller_id = c3.Seller_id)
SELECT *,
CASE
WHEN sa1 < sa2
AND sa2 < sa3 THEN
1
ELSE
0
END is_con
FROM d;

Amazon Redshift avoid recursive CTE for filling null values with previous

I have found a pretty good solution to a common problem in SQL, right here: https://stackoverflow.com/a/3474775
My only problem is that Amazon Redshift does not support recursive CTE, is there any way to rewrite this portion of code differently and avoid the recursion on CleanCust?
/* Test Data & Table */
CREATE TABLE #Customers
(Dates datetime,
Customer integer,
Value integer)
INSERT INTO #Customers
VALUES ('20100101', 1, 12),
('20100101', 2, NULL),
('20100101', 3, 32),
('20100101', 4, 42),
('20100101', 5, 15),
('20100102', 1, NULL),
('20100102', 2, NULL),
('20100102', 3, 39),
('20100102', 4, NULL),
('20100102', 5, 16),
('20100103', 1, 13),
('20100103', 2, 24),
('20100103', 3, NULL),
('20100103', 4, NULL),
('20100103', 5, 21),
('20100104', 1, 14),
('20100104', 2, NULL),
('20100104', 3, NULL),
('20100104', 4, 65),
('20100104', 5, 23) ;
/* CustCTE - This gives us a RowNum to allow us to build the recursive CTE CleanCust */
WITH CustCTE
AS (SELECT Customer,
Value,
Dates,
ROW_NUMBER() OVER (PARTITION BY Customer ORDER BY Dates) RowNum
FROM #Customers),
/* CleanCust - A recursive CTE. This runs down the list of values for each customer, checking the Value column, if it is null it gets the previous non NULL value.*/
CleanCust
AS (SELECT Customer,
ISNULL(Value, 0) Value, /* Ensure we start with no NULL values for each customer */
Dates,
RowNum
FROM CustCte cur
WHERE RowNum = 1
UNION ALL
SELECT Curr.Customer,
ISNULL(Curr.Value, prev.Value) Value,
Curr.Dates,
Curr.RowNum
FROM CustCte curr
INNER JOIN CleanCust prev ON curr.Customer = prev.Customer
AND curr.RowNum = prev.RowNum + 1)
The desired output is below, in the Required column:
Date Customer Value Required Rule
20100101 1 12 12
20100101 2 0 If no value assign 0
20100101 3 32 32
20100101 4 42 42
20100101 5 15 15
20100102 1 12 Take last known value
20100102 2 0 Take last known value
20100102 3 39 39
20100102 4 42 Take last known value
20100102 5 16 16
20100103 1 13 13
20100103 2 24 24
20100103 3 39 Take last known value
20100103 4 42 Take last known value
20100103 5 21 21
20100104 1 14 14
20100104 2 24 Take last known value
20100104 3 39 Take last known value
20100104 4 65 65
20100104 5 23 23
Use a running sum to set groups based on the occurrence of null values. Then get the max value for that group.
select dates,customer,val,coalesce(max(val) over(partition by customer,grp),0) as required
from (select dates,customer,val,
sum(case when val is null then 0 else 1 end)
over(partition by customer order by dates rows unbounded preceding) as grp
from customers
) t
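The grouping trick is easier to follow when traced step by step. Below is a small Python sketch of the same idea, offered as an illustration rather than as Redshift code; `fill_with_groups` is a hypothetical name, and the rows are a subset of the question's test data (customers 2 and 4).

```python
from collections import defaultdict


def fill_with_groups(rows):
    """Fill missing values using a running count of non-NULL values as the group id."""
    running = defaultdict(int)  # per-customer running count of non-NULL values (grp)
    group_value = {}            # (customer, grp) -> the one non-NULL value in that group
    tagged = []
    for date, customer, value in sorted(rows, key=lambda r: (r[0], r[1])):
        if value is not None:
            running[customer] += 1
            group_value[(customer, running[customer])] = value
        tagged.append((date, customer, running[customer]))
    # Group 0 holds rows before the first known value, so it falls back to 0 (the COALESCE).
    return [(date, customer, group_value.get((customer, grp), 0))
            for date, customer, grp in tagged]


rows = [
    ("20100101", 2, None), ("20100102", 2, None), ("20100103", 2, 24), ("20100104", 2, None),
    ("20100101", 4, 42), ("20100102", 4, None), ("20100103", 4, None), ("20100104", 4, 65),
]

for row in fill_with_groups(rows):
    print(row)
```

Every group contains at most one non-NULL value (the row that opened it), which is why taking the maximum over the group returns the last known value.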

SQL Server episode identification

I am working with a blood pressure database in SQL Server which contains patient_id, timestamp (per minute) and systolicBloodPressure.
My goals are to find:
the number of episodes in which a patient is under a certain blood pressure threshold
An episode runs from the timestamp where the patient drops below a certain threshold until the timestamp where the patient rises above it again.
the mean blood pressure per episode per patient
the duration of each episode per patient
What I have tried so far:
I am able to identify episodes by just making a new column which sets to 1 if threshold is reached.
select *
, CASE
when sys < threshold THEN '1'
END as below_threshold
from BPDATA
However, I am not able to identify the different episodes within a patient (episode 1, episode 2, and so on) together with their respective timestamps.
Could someone help me with this? Or is there someone with a better different solution?
EDIT: Sample data with example threshold 100
ID Timestamp SysBP below Threshold
----------------------------------------------------
1 9:38 110 Null
1 9:39 105 Null
1 9:40 96 1
1 9:41 92 1
1 9:42 102 Null
2 12:23 95 1
2 12:24 98 1
2 12:25 102 Null
2 12:26 104 Null
2 12:27 94 1
2 12:28 88 1
2 12:29 104 Null
Thanks for the sample data.
This should work:
create table #t (ID int, Timestamp time, SysBP int, belowThreshold bit)
insert #t
values
(1, '9:38', 110, null),
(1, '9:39', 105, null),
(1, '9:40', 96, 1),
(1, '9:41', 92, 1),
(1, '9:42', 102, null),
(2, '12:23', 95, 1),
(2, '12:24', 98, 1),
(2, '12:25', 102, null),
(2, '12:26', 104, null),
(2, '12:27', 94, 1),
(2, '12:28', 88, 1),
(2, '12:29', 104, null)
declare @threshold int = 100
;with y as (
select *, case when lag(belowThreshold, 1, 0) over(partition by id order by timestamp) = belowThreshold then 0 else 1 end epg
from #t
),
z as (
select *, sum(epg) over(partition by id order by timestamp) episode
from y
where sysbp < @threshold
)
)
select id, episode, count(episode) over(partition by id) number_of_episodes_per_id, avg(sysbp) avg_sysbp, datediff(minute, min(timestamp), max(timestamp))+1 episode_duration
from z
group by id, episode
This answer relies on the LEAD() and LAG() functions, so it only works on SQL Server 2012 or later:
Setup:
CREATE TABLE #bloodpressure
(
Patient_id int,
[TimeStamp] SmallDateTime,
SystolicBloodPressure INT
)
INSERT INTO #bloodpressure
VALUES
(1, '2017-01-01 09:01', 60),
(1, '2017-01-01 09:02', 55),
(1, '2017-01-01 09:03', 60),
(1, '2017-01-01 09:04', 70),
(1, '2017-01-01 09:05', 72),
(1, '2017-01-01 09:06', 75),
(1, '2017-01-01 09:07', 60),
(1, '2017-01-01 09:08', 50),
(1, '2017-01-01 09:09', 52),
(1, '2017-01-01 09:10', 53),
(1, '2017-01-01 09:11', 65),
(1, '2017-01-01 09:12', 71),
(1, '2017-01-01 09:13', 73),
(1, '2017-01-01 09:14', 74),
(2, '2017-01-01 09:01', 70),
(2, '2017-01-01 09:02', 75),
(2, '2017-01-01 09:03', 80),
(2, '2017-01-01 09:04', 70),
(2, '2017-01-01 09:05', 72),
(2, '2017-01-01 09:06', 75),
(2, '2017-01-01 09:07', 60),
(2, '2017-01-01 09:08', 50),
(2, '2017-01-01 09:09', 52),
(2, '2017-01-01 09:10', 53),
(2, '2017-01-01 09:11', 65),
(2, '2017-01-01 09:12', 71),
(2, '2017-01-01 09:13', 73),
(2, '2017-01-01 09:14', 74),
(3, '2017-01-01 09:12', 71),
(3, '2017-01-01 09:13', 60),
(3, '2017-01-01 09:14', 74)
Now use LAG and LEAD inside a common table expression to read the previous and next rows' values, which tells us whether a row is the beginning or the end of a run of low blood pressure readings. Using a UNION of start and end events ensures that an episode which covers just one minute is recorded as both a start and an end event.
;WITH CTE
AS
(
SELECT *,
LAG(SystolicBloodPressure,1)
OVER (PaRTITION BY Patient_Id ORDER BY TimeStamp) As PrevValue,
Lead(SystolicBloodPressure,1)
OVER (PaRTITION BY Patient_Id ORDER BY TimeStamp) As NextValue
FROM #bloodpressure
),
CTE2
AS
(
-- Get Start Events (EventType 1)
SELECT 1 As [EventType], Patient_id, TimeStamp,
ROW_NUMBER() OVER (ORDER BY Patient_id, TimeStamp) AS RN
FROM CTE
WHERE (PrevValue IS NULL AND SystolicBloodPressure < 70) OR
(PrevValue >= 70 AND SystolicBloodPressure < 70)
UNION
-- Get End Events (EventType 2)
SELECT 2 As [EventType], Patient_id, TimeStamp,
ROW_NUMBER() OVER (ORDER BY Patient_id, TimeStamp) AS RN
FROM CTE
WHERE (NextValue IS NULL AND SystolicBloodPressure < 70 ) OR
(NextValue >= 70 AND SystolicBloodPressure < 70)
)
SELECT C1.Patient_id, C1.TimeStamp As EventStart, C2.TimeStamp As EventEnd
FROM CTE2 C1
INNER JOIN CTE2 C2
ON C1.Patient_id = C2.Patient_id AND C1.RN = C2.RN
WHERE C1.EventType = 1 AND C2.EventType = 2
ORDER BY C1.Patient_id, C1.TimeStamp
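Both answers are instances of the same idea: split each patient's readings into consecutive runs and keep the runs that are below the threshold. As a cross-check, here is a short Python sketch of that idea; `low_pressure_episodes` is a hypothetical name, the threshold of 70 follows the second answer, and the readings are those of patients 1 and 3 from its setup, with times shortened to HH:MM.

```python
from itertools import groupby


def low_pressure_episodes(readings, threshold=70):
    """Return (patient, start, end, mean_pressure, n_readings) per low episode.

    readings: (patient_id, timestamp, systolic) tuples, one per minute.
    """
    episodes = []
    ordered = sorted(readings)  # by patient, then timestamp
    # A new run starts whenever the patient or the below/above state changes.
    for (patient, is_low), run in groupby(ordered, key=lambda r: (r[0], r[2] < threshold)):
        if not is_low:
            continue
        run = list(run)
        values = [systolic for _p, _t, systolic in run]
        episodes.append((patient, run[0][1], run[-1][1], sum(values) / len(values), len(run)))
    return episodes


minutes = ["09:%02d" % m for m in range(1, 15)]
patient1 = [60, 55, 60, 70, 72, 75, 60, 50, 52, 53, 65, 71, 73, 74]
readings = [(1, t, v) for t, v in zip(minutes, patient1)]
readings += [(3, "09:12", 71), (3, "09:13", 60), (3, "09:14", 74)]

for episode in low_pressure_episodes(readings):
    print(episode)
```

With one reading per minute, the number of readings in a run doubles as the episode duration in minutes, and the number of episodes per patient is simply the number of tuples returned for that patient.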

PostgreSQL/plpython: how compare two columns from different table in loop

I have a problem with a loop in which I must compare columns between different tables.
I have two tables, year2004 and year2005. Both contain month numbers and an amount for each month. I want to compare the amounts from both tables and produce a third table, year, with the month number and the greatest amount for that month.
For example, if I have 100 in 2004 and 200 in 2005, I must return values(2005, number_of_month, 200). Do you have any ideas for solving this problem?
PS. Sorry for my writing errors, I learned English only few years ago :)
I'm guessing that you're trying to find the greatest amount for each month across the two years.
This would be much, much easier if your data was all in one table monthly_statistics with a date column. Then it'd just be a simple aggregate function or a window.
So let's turn the two tables into one.
Given sample data:
CREATE TABLE year2004 ( month int primary key, amount int);
INSERT INTO year2004 (month, amount)
VALUES (1, 50), (2, 40), (3, 60), (4, 80), (5, 100), (6, 800), (7, 20), (8, 40), (9, 30), (10, 40), (11, 50), (12, 99);
CREATE TABLE year2005 ( month int primary key, amount int);
INSERT INTO year2005 (month, amount)
VALUES (1, 88), (2, 44), (3, 11), (4, 123), (5, 12), (6, 88), (7, 21), (8, 19), (9, 44), (10, 89), (11, 4), (12, 42);
we could either join the tables, or we could convert it to a single table by date then filter it. Here's how we might generate a single table with the contents:
-- (month - 1) so that month 1 maps to January of the given year
SELECT DATE '2004-01-01' + (month - 1) * INTERVAL '1' MONTH AS sampledate, amount
FROM year2004
UNION ALL
SELECT DATE '2005-01-01' + (month - 1) * INTERVAL '1' MONTH, amount
FROM year2005;
That's what you'd use if you were going to create a new table, but if you don't care about the actual dates, only the months, you can simply union all the two tables:
WITH samples AS (
SELECT month, amount
FROM year2004
UNION ALL
SELECT month, amount
FROM year2005
)
SELECT month, max(amount) AS amount
FROM samples
GROUP BY 1
ORDER BY month;
 month | amount
-------+--------
     1 |     88
     2 |     44
     3 |     60
     4 |    123
     5 |    100
     6 |    800
     7 |     21
     8 |     40
     9 |     44
    10 |     89
    11 |     50
    12 |     99
(12 rows)
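The question also asked for the year that holds the greatest amount, which the aggregate above does not return. A minimal Python sketch of that comparison is shown here as one possible reading of the requirement; `greatest_per_month` is a hypothetical name, the dictionaries copy the sample data, and ties are resolved in favour of the earlier year, which is an assumption.

```python
def greatest_per_month(years):
    """Return (year, month, amount) for the greatest amount in each month.

    years: {year: {month: amount}}. On a tie the earlier year wins (assumption).
    """
    best = {}
    for year in sorted(years):
        for month, amount in years[year].items():
            if month not in best or amount > best[month][1]:
                best[month] = (year, amount)
    return [(year, month, amount) for month, (year, amount) in sorted(best.items())]


year2004 = {1: 50, 2: 40, 3: 60, 4: 80, 5: 100, 6: 800,
            7: 20, 8: 40, 9: 30, 10: 40, 11: 50, 12: 99}
year2005 = {1: 88, 2: 44, 3: 11, 4: 123, 5: 12, 6: 88,
            7: 21, 8: 19, 9: 44, 10: 89, 11: 4, 12: 42}

for row in greatest_per_month({2004: year2004, 2005: year2005}):
    print(row)
```

The amounts match the max(amount) result of the query; the extra first element shows which table each maximum came from.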