Writing single SQL query satisfying two cases - sql

After the comments received, I am rephrasing this question with required data.
Reference: SQL query to exclude some records from the output
Vertica analytical functions: https://www.vertica.com/blog/analytic-queries-vertica/
Table 1:
create table etl_group_membership
(
group_item_id int not null,
member_item_id int not null
);
INSERT INTO etl_group_membership (group_item_id, member_item_id) VALUES (335640, 117722);
INSERT INTO etl_group_membership (group_item_id, member_item_id) VALUES (335640, 104151);
INSERT INTO etl_group_membership (group_item_id, member_item_id) VALUES (335640, 5316);
INSERT INTO etl_group_membership (group_item_id, member_item_id) VALUES (335641, 117723);
INSERT INTO etl_group_membership (group_item_id, member_item_id) VALUES (335641, 104152);
INSERT INTO etl_group_membership (group_item_id, member_item_id) VALUES (335641, 5317);
INSERT INTO etl_group_membership (group_item_id, member_item_id) VALUES (335642, 117724);
INSERT INTO etl_group_membership (group_item_id, member_item_id) VALUES (335642, 104153);
INSERT INTO etl_group_membership (group_item_id, member_item_id) VALUES (335642, 5318);
Table 2:
create table v_poll_item
(
device_item_id int not null,
item_id int not null
);
INSERT INTO v_poll_item (device_item_id, item_id) VALUES (117722, 273215);
INSERT INTO v_poll_item (device_item_id, item_id) VALUES (117722, 117936);
INSERT INTO v_poll_item (device_item_id, item_id) VALUES (117722, 117873);
INSERT INTO v_poll_item (device_item_id, item_id) VALUES (117722, 123305);
INSERT INTO v_poll_item (device_item_id, item_id) VALUES (104151, 240006);
INSERT INTO v_poll_item (device_item_id, item_id) VALUES (104151, 240005);
INSERT INTO v_poll_item (device_item_id, item_id) VALUES (104151, 239415);
INSERT INTO v_poll_item (device_item_id, item_id) VALUES (104151, 239414);
INSERT INTO v_poll_item (device_item_id, item_id) VALUES (5316, 118310);
INSERT INTO v_poll_item (device_item_id, item_id) VALUES (5316, 130627);
INSERT INTO v_poll_item (device_item_id, item_id) VALUES (5316, 298564);
INSERT INTO v_poll_item (device_item_id, item_id) VALUES (5316, 118311);
Table 3: Note that im_utilization can be NULL as well
create table nrm_cpustats_rate
(
item_id int not null,
tstamp datetime not null,
im_utilization float
);
INSERT INTO nrm_cpustats_rate (item_id, tstamp, im_utilization) VALUES (273215, '2021-06-28 19:55:00.000000', 2);
INSERT INTO nrm_cpustats_rate (item_id, tstamp, im_utilization) VALUES (273215, '2021-06-27 23:35:00.000000', 24);
INSERT INTO nrm_cpustats_rate (item_id, tstamp, im_utilization) VALUES (273215, '2021-06-26 14:05:00.000000', 27);
INSERT INTO nrm_cpustats_rate (item_id, tstamp, im_utilization) VALUES (273215, '2021-06-25 09:05:00.000000', 29);
INSERT INTO nrm_cpustats_rate (item_id, tstamp, im_utilization) VALUES (117936, '2021-06-28 19:30:00.000000', 17);
INSERT INTO nrm_cpustats_rate (item_id, tstamp, im_utilization) VALUES (117936, '2021-06-28 19:15:00.000000', 35);
INSERT INTO nrm_cpustats_rate (item_id, tstamp, im_utilization) VALUES (117936, '2021-06-28 19:05:00.000000', 50);
INSERT INTO nrm_cpustats_rate (item_id, tstamp, im_utilization) VALUES (117936, '2021-06-27 05:45:00.000000', 89);
INSERT INTO nrm_cpustats_rate (item_id, tstamp, im_utilization) VALUES (117936, '2021-06-25 09:20:00.000000', 37);
INSERT INTO nrm_cpustats_rate (item_id, tstamp, im_utilization) VALUES (117936, '2021-06-25 09:10:00.000000', 51);
INSERT INTO nrm_cpustats_rate (item_id, tstamp, im_utilization) VALUES (117936, '2021-06-25 08:50:00.000000', 90);
INSERT INTO nrm_cpustats_rate (item_id, tstamp, im_utilization) VALUES (118310, '2021-06-23 04:10:00.000000', 51);
INSERT INTO nrm_cpustats_rate (item_id, tstamp, im_utilization) VALUES (118310, '2021-06-23 03:15:00.000000', 48);
INSERT INTO nrm_cpustats_rate (item_id, tstamp, im_utilization) VALUES (118310, '2021-06-22 22:20:00.000000', 19);
INSERT INTO nrm_cpustats_rate (item_id, tstamp, im_utilization) VALUES (239414, '2021-06-22 17:10:00.000000', 11);
INSERT INTO nrm_cpustats_rate (item_id, tstamp, im_utilization) VALUES (239414, '2021-06-22 16:30:00.000000', 37);
INSERT INTO nrm_cpustats_rate (item_id, tstamp, im_utilization) VALUES (239414, '2021-06-22 16:35:00.000000', 38);
INSERT INTO nrm_cpustats_rate (item_id, tstamp, im_utilization) VALUES (239414, '2021-06-28 18:45:00.000000', 74);
INSERT INTO nrm_cpustats_rate (item_id, tstamp, im_utilization) VALUES (239414, '2021-06-28 18:48:00.000000', 76);
INSERT INTO nrm_cpustats_rate (item_id, tstamp, im_utilization) VALUES (239414, '2021-06-28 18:50:00.000000', 77);
INSERT INTO nrm_cpustats_rate (item_id, tstamp, im_utilization) VALUES (118311, '2021-06-28 00:40:00.000000', 29);
INSERT INTO nrm_cpustats_rate (item_id, tstamp, im_utilization) VALUES (118311, '2021-06-23 22:30:00.000000', 37);
INSERT INTO nrm_cpustats_rate (item_id, tstamp, im_utilization) VALUES (118311, '2021-06-23 22:25:00.000000', 92);
To get the device items ids in a group:
SELECT member_item_id FROM etl_group_membership WHERE group_item_id = 335640;
From the list of device item ids retrieved, to get the list of item_ids:
SELECT item_id FROM v_poll_item WHERE device_item_id IN (<devices retrieved from previous query>);
Inputs:
Time range: from yesterday back 7 days (tstamp > '2021-06-22 00:00:00.000000' AND tstamp <= '2021-06-28 23:59:59.000000')
Group id: 335640
breach_threshold: 25
Minimum number of breaches each day: 2
Expected output:
breached means im_utilization is >= 25
Pick only those records where count_of_breached in a given day >= 2
That is, records with item_id 273215 are excluded: even though the number of breaches (>= 25) is 2, there is only one on each day
device_item_id | item_id | count_of_breach | date_when_breached | max_utilization | max_utilization_tstamp
=====================================================================================================================
117722 | 117936 | 2 | 2021-06-28 | 90 | 2021-06-25 08:50:00.000000
117722 | 117936 | 3 | 2021-06-25 | 90 | 2021-06-25 08:50:00.000000
5316 | 118310 | 2 | 2021-06-23 | 51 | 2021-06-23 04:10:00.000000
5316 | 118311 | 2 | 2021-06-23 | 92 | 2021-06-23 22:25:00.000000
104151 | 239414 | 2 | 2021-06-22 | 77 | 2021-06-28 18:50:00.000000
104151 | 239414 | 3 | 2021-06-28 | 77 | 2021-06-28 18:50:00.000000
Even if a single SQL query cannot be written to produce this output, can two optimized queries be suggested? @marcothesane pointed out that even the query to get daily breaches can be written in a better way.
UPDATE:
This query has worked for me for finding the max im_utilization and the tstamp when it was at its max. I am not sure, though, why I had to use the timestamp range in two places!
SELECT t.item_id, t.tstamp, t.im_utilization
FROM (
SELECT item_id, MAX(im_utilization) AS max_cpu
FROM nrm_cpustats_rate
WHERE item_id IN (SELECT item_id FROM v_poll_item WHERE device_item_id IN (117722, 104151, 5316))
AND tstamp > '2021-06-22 00:00:00.000000'
AND tstamp <= '2021-06-28 23:59:59.000000'
GROUP BY item_id
) AS m
INNER JOIN nrm_cpustats_rate AS t
ON t.item_id = m.item_id
AND t.im_utilization = m.max_cpu
AND t.tstamp > '2021-06-22 00:00:00.000000'
AND t.tstamp <= '2021-06-28 23:59:59.000000'
ORDER BY 1, 2 DESC, 3 DESC

To start, all the preliminary work relies on the cpustats rate table alone. As you mentioned, a breach only counts when there is more than one (that is, 2 or more) ON THE SAME DAY. So, per item and per day (date( ncr.tstamp ) gives just the DATE portion regardless of time), we can apply a HAVING clause that keeps only the item/date groups with more than one breach.
Then we can join to the poll item table to get the device ID. The following should work for you.
select
vpi.device_item_id,
ncr.item_id,
count(*) count_of_breach,
date( ncr.tstamp ) date_when_breached,
max( ncr.im_utilization ) max_utilization
from
nrm_cpustats_rate ncr
join v_poll_item vpi
on ncr.item_id = vpi.item_id
where
ncr.im_utilization >= 25
group by
vpi.device_item_id,
ncr.item_id,
date( ncr.tstamp )
having
count(*) > 1
FEEDBACK - MODIFICATION
Removed the second table from the query and switched to the TRUNC() function instead of date(). Also added your date filter restriction.
select
ncr.item_id,
count(*) count_of_breach,
trunc( ncr.tstamp, 'DD' ) date_when_breached,
max( ncr.im_utilization ) max_utilization
from
nrm_cpustats_rate ncr
where
ncr.tstamp > '2021-06-22 00:00:00.000000'
AND ncr.tstamp <= '2021-06-28 23:59:59.000000'
AND ncr.im_utilization >= 25
group by
ncr.item_id,
trunc( ncr.tstamp, 'DD' )
having
count(*) > 1
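The per-item, per-day HAVING filter above can be tried out locally. This is only a minimal sketch: it assumes SQLite (through Python's sqlite3 module) as a stand-in for Vertica, so date(tstamp) takes the place of TRUNC(tstamp, 'DD') and timestamps are stored as ISO text. Only the rows for items 273215 and 117936 from the question are loaded.

```python
import sqlite3

# Rows for two items, copied from the question's nrm_cpustats_rate data.
rows = [
    (273215, "2021-06-28 19:55:00", 2),
    (273215, "2021-06-27 23:35:00", 24),
    (273215, "2021-06-26 14:05:00", 27),
    (273215, "2021-06-25 09:05:00", 29),
    (117936, "2021-06-28 19:30:00", 17),
    (117936, "2021-06-28 19:15:00", 35),
    (117936, "2021-06-28 19:05:00", 50),
    (117936, "2021-06-27 05:45:00", 89),
    (117936, "2021-06-25 09:20:00", 37),
    (117936, "2021-06-25 09:10:00", 51),
    (117936, "2021-06-25 08:50:00", 90),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nrm_cpustats_rate ("
    "item_id INTEGER NOT NULL, tstamp TEXT NOT NULL, im_utilization REAL)"
)
conn.executemany("INSERT INTO nrm_cpustats_rate VALUES (?, ?, ?)", rows)

# Count breaches (>= 25) per item and calendar day; keep days with 2 or more.
daily_breaches = conn.execute(
    """
    SELECT item_id,
           date(tstamp)        AS date_when_breached,
           COUNT(*)            AS count_of_breach,
           MAX(im_utilization) AS max_utilization
    FROM nrm_cpustats_rate
    WHERE tstamp > '2021-06-22 00:00:00'
      AND tstamp <= '2021-06-28 23:59:59'
      AND im_utilization >= 25
    GROUP BY item_id, date(tstamp)
    HAVING COUNT(*) >= 2
    ORDER BY item_id, date_when_breached
    """
).fetchall()

for row in daily_breaches:
    print(row)
```

Item 273215 drops out because its two breaches fall on different days, which is the behaviour the question asks for.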

You can hard-cast a TIMESTAMP to a DATE with the ::DATE cast operator.
As the others did it:
You can get the first five columns in a grouping query, grouping by device_item_id, v_poll_item.item_id, tstamp::DATE. I put that into a WITH clause as a common table expression (CTE). Then I joined that CTE back to nrm_cpustats_rate on item_id plus an equi-predicate between the previously obtained max_utilization and im_utilization. Finally, I filter out the rows with fewer than 2 breaches and filter again by the timestamp range.
WITH
grp AS (
SELECT
device_item_id
, v_poll_item.item_id
, SUM(
CASE
WHEN im_utilization >= 25 THEN 1
ELSE 0
END
) AS count_of_breach
, tstamp::DATE as date_when_breached
, MAX(im_utilization) AS max_utilization
FROM nrm_cpustats_rate
JOIN v_poll_item USING(item_id)
JOIN etl_group_membership ON member_item_id=device_item_id
WHERE tstamp > '2021-06-22 00:00:00.000000' AND tstamp <= '2021-06-28 23:59:59.000000'
GROUP BY
device_item_id
, v_poll_item.item_id
, tstamp::DATE
)
SELECT
grp.*
, o.tstamp AS max_utilization_tstamp
FROM grp
JOIN nrm_cpustats_rate o
ON o.item_id=grp.item_id
AND max_utilization=im_utilization
WHERE tstamp > '2021-06-22 00:00:00.000000' AND tstamp <= '2021-06-28 23:59:59.000000'
AND count_of_breach >= 2
;
-- out device_item_id | item_id | count_of_breach | date_when_breached | max_utilization | max_utilization_tstamp
-- out ----------------+---------+-----------------+--------------------+-----------------+------------------------
-- out 5316 | 118310 | 2 | 2021-06-23 | 51 | 2021-06-23 04:10:00
-- out 104151 | 239414 | 3 | 2021-06-28 | 77 | 2021-06-28 18:50:00
-- out 104151 | 239414 | 2 | 2021-06-22 | 38 | 2021-06-22 16:35:00
-- out 117722 | 117936 | 3 | 2021-06-25 | 90 | 2021-06-25 08:50:00
-- out 5316 | 118311 | 2 | 2021-06-23 | 92 | 2021-06-23 22:25:00
-- out 117722 | 117936 | 2 | 2021-06-28 | 50 | 2021-06-28 19:05:00
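A different way to pick up the timestamp of the daily maximum is a window function instead of the join back to the table. The sketch below is a swapped-in technique, not the query above: it assumes SQLite 3.25 or later (for window functions) through Python's sqlite3 module as a stand-in for Vertica, uses date(tstamp) in place of tstamp::DATE, and loads only the rows for item 239414.

```python
import sqlite3

# Rows for item 239414, copied from the question's sample data.
rows = [
    (239414, "2021-06-22 17:10:00", 11),
    (239414, "2021-06-22 16:30:00", 37),
    (239414, "2021-06-22 16:35:00", 38),
    (239414, "2021-06-28 18:45:00", 74),
    (239414, "2021-06-28 18:48:00", 76),
    (239414, "2021-06-28 18:50:00", 77),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nrm_cpustats_rate ("
    "item_id INTEGER NOT NULL, tstamp TEXT NOT NULL, im_utilization REAL)"
)
conn.executemany("INSERT INTO nrm_cpustats_rate VALUES (?, ?, ?)", rows)

# rn = 1 marks the row holding the highest utilization of each item/day;
# count_of_breach is the number of readings >= 25 on that same day.
daily_max = conn.execute(
    """
    WITH ranked AS (
        SELECT item_id,
               date(tstamp) AS date_when_breached,
               tstamp,
               im_utilization,
               SUM(CASE WHEN im_utilization >= 25 THEN 1 ELSE 0 END)
                   OVER (PARTITION BY item_id, date(tstamp)) AS count_of_breach,
               ROW_NUMBER()
                   OVER (PARTITION BY item_id, date(tstamp)
                         ORDER BY im_utilization DESC, tstamp) AS rn
        FROM nrm_cpustats_rate
        WHERE tstamp > '2021-06-22 00:00:00'
          AND tstamp <= '2021-06-28 23:59:59'
    )
    SELECT item_id, date_when_breached, count_of_breach,
           im_utilization AS max_utilization,
           tstamp         AS max_utilization_tstamp
    FROM ranked
    WHERE rn = 1 AND count_of_breach >= 2
    ORDER BY item_id, date_when_breached
    """
).fetchall()

for row in daily_max:
    print(row)
```

Because the ranking is done inside each item/day partition, the timestamp always comes from the same day as the maximum, and the date range only has to be written once.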

Related

postgresql How show most frequent value per day date

I've got a problem with a query that is supposed to return the value that occurs most often per date
+------------+------------------+
| Date | value |
+------------+------------------+
| 2020-01-01 | Programmer |
| 2020-01-02 | Technician |
| 2020-01-03 | Business Analyst |
+------------+------------------+
So far I have done
select count(headline) as asd, publication_date, employer -> 'name' as dsa from jobhunter
group by publication_date,dsa
ORDER BY publication_date DESC
But it shows 2020-12-31 19:06:00 instead of just YYYY-MM-DD
Any idea on how to fix this?
Test data:
create table tbl (
id serial primary key,
row_datetime TIMESTAMP,
row_val VARCHAR(60)
);
insert into tbl (row_datetime, row_val) values ('2021-01-01 00:00:00', 'a');
insert into tbl (row_datetime, row_val) values ('2021-01-01 01:00:00', 'a');
insert into tbl (row_datetime, row_val) values ('2021-01-01 02:00:00', 'b');
insert into tbl (row_datetime, row_val) values ('2021-01-02 00:00:00', 'a');
insert into tbl (row_datetime, row_val) values ('2021-01-02 01:00:00', 'b');
insert into tbl (row_datetime, row_val) values ('2021-01-02 02:00:00', 'b');
Example query:
SELECT dt, val, cnt
FROM (
SELECT dt, val, cnt, ROW_NUMBER() OVER (PARTITION BY dt ORDER BY cnt DESC) AS row_num
FROM (
SELECT dt, val, COUNT(val) AS cnt
FROM (
SELECT DATE(row_datetime) AS dt, row_val AS val FROM tbl
) AS T1 GROUP BY dt, val
) AS T2
) AS T3
WHERE row_num=1
ORDER BY dt ASC
You can additionally customize your query to optimize the performance, get more fields, etc.
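To check the ranking logic quickly, here is a minimal sketch that runs a variant of the example query against the test data. It assumes SQLite 3.25 or later via Python's sqlite3 module rather than PostgreSQL, and it adds one thing that is not in the query above: val as a secondary sort key, so that ties between equally frequent values are resolved the same way on every run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl ("
    "id INTEGER PRIMARY KEY, row_datetime TEXT, row_val TEXT)"
)
conn.executemany(
    "INSERT INTO tbl (row_datetime, row_val) VALUES (?, ?)",
    [
        ("2021-01-01 00:00:00", "a"),
        ("2021-01-01 01:00:00", "a"),
        ("2021-01-01 02:00:00", "b"),
        ("2021-01-02 00:00:00", "a"),
        ("2021-01-02 01:00:00", "b"),
        ("2021-01-02 02:00:00", "b"),
    ],
)

# Count each value per calendar day, rank the counts within the day,
# and keep the top-ranked value. "val ASC" is the added tie-breaker.
most_frequent = conn.execute(
    """
    SELECT dt, val, cnt
    FROM (
        SELECT dt, val, cnt,
               ROW_NUMBER() OVER (PARTITION BY dt
                                  ORDER BY cnt DESC, val ASC) AS row_num
        FROM (
            SELECT DATE(row_datetime) AS dt, row_val AS val, COUNT(*) AS cnt
            FROM tbl
            GROUP BY DATE(row_datetime), row_val
        ) AS counted
    ) AS ranked
    WHERE row_num = 1
    ORDER BY dt
    """
).fetchall()

for row in most_frequent:
    print(row)
```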

SQL Server Query to pull/updating the missing data from Previous months value

I have two tables, Rev and Cost. The columns they have in common for the join are Product ID and the date (month/year).
For each row in Rev I need to pull the cost from the Cost table. However, if no cost is found for that particular month, the query should check the previous month and use that cost for the product. It should keep checking back up to 6 months and return the most recent cost available within the 6 months before that date.
I have no idea how to solve this. Please help.
REV
Product ID Transaction Date Output should be
101 3/5/2018 16.8
101 3/24/2018 16.8
101 4/10/2018 16.8
101 5/30/2018 7.6
101 6/25/2018 14.3
102 1/1/2019 30.11
102 2/4/2019 30.11
102 2/11/2019 30.11
103 2/17/2019 6.62
103 2/25/2019 6.62
103 3/24/2019 6.62
103 3/30/2019 6.62
For the REV table I need to bring in the cost based on a Product ID and month/year match; if none is available, it should look back up to 6 months and bring in the cost of the latest available month.
Cost
Product ID PCR Period Cost
101 Jan-18 16.8
101 May-18 7.6
101 Jun-18 14.3
101 Jul-18 301.88
101 Aug-18 6.62
101 Nov-18 0.01
102 Dec-18 6.62
102 May-18 47.95
102 Jun-18 79.8
102 Jul-18 3.49
102 Jan-19 30.11
103 Mar-19 102.11
Let me know if you need any further details.
This should be close to what you're after
select r.id, r.td, (
select top 1 c.cost from cost c
where c.id = r.id
and datediff(day, CAST('01-' + c.td AS datetime), r.td) >= 0
order by CAST('01-' + c.td AS datetime) desc
) as cost
from rev r
EXAMPLE SCRIPT:
declare @REV table (id int, trandate datetime)
insert into @REV (id, trandate) values (101, '3/5/2018')
insert into @REV (id, trandate) values (101, '3/24/2018')
insert into @REV (id, trandate) values (101, '4/10/2018')
insert into @REV (id, trandate) values (101, '5/30/2018')
insert into @REV (id, trandate) values (101, '6/25/2018')
insert into @REV (id, trandate) values (102, '1/1/2019')
insert into @REV (id, trandate) values (102, '2/4/2019')
insert into @REV (id, trandate) values (102, '2/11/2019')
insert into @REV (id, trandate) values (103, '2/17/2019')
insert into @REV (id, trandate) values (103, '2/25/2019')
insert into @REV (id, trandate) values (103, '3/24/2019')
insert into @REV (id, trandate) values (103, '3/30/2019')
declare @COST table (id int, pcr varchar(20), cost float)
insert into @COST (id, pcr, cost) values (101, 'Jan-18', 16.8)
insert into @COST (id, pcr, cost) values (101, 'May-18', 7.6)
insert into @COST (id, pcr, cost) values (101, 'Jun-18', 14.3)
insert into @COST (id, pcr, cost) values (101, 'Jul-18', 301.88)
insert into @COST (id, pcr, cost) values (101, 'Aug-18', 6.62)
insert into @COST (id, pcr, cost) values (101, 'Nov-18', 0.01)
insert into @COST (id, pcr, cost) values (102, 'Dec-18', 6.62)
insert into @COST (id, pcr, cost) values (102, 'May-18', 47.95)
insert into @COST (id, pcr, cost) values (102, 'Jun-18', 79.8)
insert into @COST (id, pcr, cost) values (102, 'Jul-18', 3.49)
insert into @COST (id, pcr, cost) values (102, 'Jan-19', 30.11)
insert into @COST (id, pcr, cost) values (103, 'Mar-19', 102.11)
select r.id, r.trandate, (
select top 1 c.cost from @COST c
where c.id = r.id
and datediff(day, CAST('01-' + c.pcr AS datetime), r.trandate) >= 0
order by CAST('01-' + c.pcr AS datetime) desc
) as cost
from @REV r
RESULTS:
101 2018-03-05 16.8
101 2018-03-24 16.8
101 2018-04-10 16.8
101 2018-05-30 7.6
101 2018-06-25 14.3
102 2019-01-01 30.11
102 2019-02-04 30.11
102 2019-02-11 30.11
103 2019-02-17 NULL
103 2019-02-25 NULL
103 2019-03-24 102.11
103 2019-03-30 102.11
SELECT A.PRODUCT_ID
,A.TRANSACTION_DATE
,(
SELECT TOP 1 X.COST
FROM COST X
WHERE X.PRODUCT_ID = A.PRODUCT_ID
AND X.PCR_PERIOD <= A.TRANSACTION_DATE
AND X.PCR_PERIOD > DATEADD(MONTH,-6,A.TRANSACTION_DATE)
ORDER BY X.PCR_PERIOD DESC
) AS COST
FROM REV A
Written using Outer Apply and assuming PCR Period is a date:
SELECT REV.[Product ID], REV.[Transaction Date], LastCost.[Cost]
FROM REV
OUTER APPLY
(
SELECT TOP 1 Cost.Cost
FROM Cost
WHERE Cost.[Product ID]= REV.[Product ID]
AND Cost.[PCR Period] BETWEEN dateadd(month,-6,REV.[Transaction Date]) and REV.[Transaction Date]
ORDER BY Cost.[PCR Period] DESC
) AS LastCost
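The 6-month lookback can be sketched end to end on a few of the sample rows. This is a minimal sketch under stated assumptions: SQLite via Python's sqlite3 module stands in for SQL Server, so LIMIT 1 replaces TOP 1 and date(x, '-6 months') replaces DATEADD; the PCR period is assumed to be stored as the first day of its month in ISO format; and the table and column names (rev, cost, product_id, transaction_date, pcr_period) are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rev (product_id INTEGER, transaction_date TEXT)")
conn.execute("CREATE TABLE cost (product_id INTEGER, pcr_period TEXT, cost REAL)")

# A few rows from the question; periods such as 'Jan-18' become '2018-01-01'.
conn.executemany(
    "INSERT INTO rev VALUES (?, ?)",
    [
        (101, "2018-03-05"),
        (101, "2018-05-30"),
        (102, "2019-02-04"),
        (103, "2019-02-17"),
    ],
)
conn.executemany(
    "INSERT INTO cost VALUES (?, ?, ?)",
    [
        (101, "2018-01-01", 16.8),
        (101, "2018-05-01", 7.6),
        (102, "2019-01-01", 30.11),
        (103, "2019-03-01", 102.11),
    ],
)

# For each transaction, take the latest cost period that is not after the
# transaction date and not older than six months before it.
costs = conn.execute(
    """
    SELECT r.product_id,
           r.transaction_date,
           (SELECT c.cost
            FROM cost c
            WHERE c.product_id = r.product_id
              AND c.pcr_period <= r.transaction_date
              AND c.pcr_period > date(r.transaction_date, '-6 months')
            ORDER BY c.pcr_period DESC
            LIMIT 1) AS cost
    FROM rev r
    ORDER BY r.product_id, r.transaction_date
    """
).fetchall()

for row in costs:
    print(row)
```

Product 103 comes back with no cost for February 2019 because its only cost period (March 2019) lies after the transaction date, which matches the NULL rows in the results listed earlier.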

How to find records with recursively overlapping date ranges in Oracle DB

I have a table like this:
| ID | DSTART | DEND
+------+------------+-----------
| fat1 | 01/01/2017 | 31/01/2017
| fat2 | 01/02/2017 | 28/02/2017
| fat3 | 01/03/2017 | 31/03/2017
| fat4 | 01/04/2017 | 30/04/2017
| fat5 | 01/02/2017 | 31/03/2017
| fat6 | 01/01/2017 | 28/02/2017
| fat7 | 01/03/2017 | 30/04/2017
| fat8 | 01/06/2017 | 30/06/2017
| fat9 | 28/04/2017 | 02/05/2017
Given a record, I want to find all the overlapping records and all the records overlapping the overlapping records.
e.g. searching for overlapping records of fat7 should return
fat5 (overlaps fat7)
fat4 (overlaps fat7)
fat3 (overlaps fat7)
fat2 (*overlaps fat5)
fat6 (*overlaps fat5)
fat1 (**overlaps fat6)
to create the dataset:
create table zz_fatt
( id varchar2(100) primary key,
dstart date,
dend date);
insert into zz_fatt (id, dstart, dend) values ('fat7', to_date('03/01/2017', 'mm/dd/yyyy'), to_date('04/30/2017', 'mm/dd/yyyy'));
insert into zz_fatt (id, dstart, dend) values ('fat1', to_date('01/01/2017', 'mm/dd/yyyy'), to_date('01/31/2017', 'mm/dd/yyyy'));
insert into zz_fatt (id, dstart, dend) values ('fat2', to_date('02/01/2017', 'mm/dd/yyyy'), to_date('02/28/2017', 'mm/dd/yyyy'));
insert into zz_fatt (id, dstart, dend) values ('fat3', to_date('03/01/2017', 'mm/dd/yyyy'), to_date('03/31/2017', 'mm/dd/yyyy'));
insert into zz_fatt (id, dstart, dend) values ('fat4', to_date('04/01/2017', 'mm/dd/yyyy'), to_date('04/30/2017', 'mm/dd/yyyy'));
insert into zz_fatt (id, dstart, dend) values ('fat5', to_date('02/01/2017', 'mm/dd/yyyy'), to_date('03/31/2017', 'mm/dd/yyyy'));
insert into zz_fatt (id, dstart, dend) values ('fat6', to_date('01/01/2017', 'mm/dd/yyyy'), to_date('02/28/2017', 'mm/dd/yyyy'));
insert into zz_fatt (id, dstart, dend) values ('fat8', to_date('06/01/2017', 'mm/dd/yyyy'), to_date('06/15/2017', 'mm/dd/yyyy'));
You can assign a group identifier to the records. The idea is to find records that do not overlap, and use them as the beginning of a group.
The following assigns the groups to each record:
select t.*, sum(group_start) over (order by dstart) as grp
from (select t.*,
(case when not exists (select 1
from t t2
where t2.dstart < t.dstart and t2.dend >= t.dstart
)
then 1 else 0
end) group_start
from t
) t
If you only want the groups for a certain record then there are several ways, such as:
with overlaps as (
<query above>
)
select o.*
from overlaps o
where o.grp = (select o2.grp from overlaps o2 where o2.id = ???);
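The grouping idea can be run against the question's insert statements. This is a minimal sketch, assuming SQLite 3.25 or later via Python's sqlite3 module as a stand-in for Oracle, with dates stored as ISO text so that plain string comparison orders them correctly. The table name zz_fatt comes from the question; the CTE name flagged is only illustrative.

```python
import sqlite3

# Same rows as the question's insert statements, dates rewritten as ISO text.
rows = [
    ("fat1", "2017-01-01", "2017-01-31"),
    ("fat2", "2017-02-01", "2017-02-28"),
    ("fat3", "2017-03-01", "2017-03-31"),
    ("fat4", "2017-04-01", "2017-04-30"),
    ("fat5", "2017-02-01", "2017-03-31"),
    ("fat6", "2017-01-01", "2017-02-28"),
    ("fat7", "2017-03-01", "2017-04-30"),
    ("fat8", "2017-06-01", "2017-06-15"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE zz_fatt (id TEXT PRIMARY KEY, dstart TEXT, dend TEXT)")
conn.executemany("INSERT INTO zz_fatt VALUES (?, ?, ?)", rows)

# A record starts a new group when no earlier-starting record still covers
# its start date; the running sum of those flags labels each group.
query = """
WITH flagged AS (
    SELECT t.id, t.dstart, t.dend,
           CASE WHEN NOT EXISTS (
                    SELECT 1 FROM zz_fatt t2
                    WHERE t2.dstart < t.dstart AND t2.dend >= t.dstart)
                THEN 1 ELSE 0 END AS group_start
    FROM zz_fatt t
),
overlaps AS (
    SELECT id, SUM(group_start) OVER (ORDER BY dstart) AS grp
    FROM flagged
)
SELECT o.id
FROM overlaps o
WHERE o.grp = (SELECT o2.grp FROM overlaps o2 WHERE o2.id = ?)
  AND o.id <> ?
ORDER BY o.id
"""
related = [r[0] for r in conn.execute(query, ("fat7", "fat7"))]
print(related)
```

Records that share the same dstart are peers in the window ordering, so they receive the same running sum; the group labels are therefore not necessarily consecutive, but they still separate the chains of overlapping records.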

Using a (Recursive?) CTE + Window Functions to zero out sales orders?

I am trying to use a recursive CTE + window functions to find the last outcome of a series of buy/sell orders.
First, here's some nomenclature:
field_id is the store's ID.
Field_number is an order number, but can be reused by the same person
Field_date is the date of the initial order.
Field_inserted is when this specific transaction occurred.
Field_sale is whether we bought or returned it.
Unfortunately, because of the way the systems work, I do NOT get the cost when an item is returned, so figuring out the last outcome for an order is complicated (did we wind up selling any). I need to match the purchase with the sale, Which normally works pretty well. However, there are cases such as below when it fails, and I'm trying to find a way to do this in one pass, possibly using a recursive CTE.
Here's some code.
DECLARE @tablea TABLE (field_id int, field_number CHAR(3), field_date datetime, field_inserted DATETIME, field_sale varchar(4))
INSERT INTO @tablea
VALUES
(1, 100, '20170311','20170311 01:00:00', 'Buy'),
(1, 100, '20170311','20170311 01:01:00', 'Retu'),
(1, 100, '20170311','20170311 01:02:00', 'Buy'),
(1, 100, '20170311','20170311 01:03:00', 'Retu'),
(1, 100, '20170311','20170311 01:02:01', 'buy'),
(2, 100, '20170311','20170311 01:03:00', 'REtu'),
(1, 110, '20170311','20170311 01:03:00', 'Buy');
Now to remove the buys that were then returned. The ISNULL is there because otherwise the NOT conditions would discard all the rows that have NULL for the _lead/_lag values.
WITH cte AS
(SELECT
ROW_NUMBER() OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS row_num,
field_id,
field_number,
field_date,
field_sale,
lead(field_sale) OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS field_sale_lead,
lag(field_sale) OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS field_sale_lag
FROM @tablea
)
SELECT * FROM cte
WHERE NOT (cte.field_sale = 'Buy' AND ISNULL(field_sale_lead,'') = 'Retu')--AND field_sale_lead IS NOT null)
AND NOT (cte.field_sale = 'Retu' AND ISNULL(field_sale_lag,'') = 'buy' )--AND field_sale_lag IS NOT NULL)
And I felt pretty smug and thought I had it. However, that's the simple case: Buy, Return, Buy, Return. Let's try another case, Buy Buy Return Return, which is still valid, but obviously would result in a net of 0.
DECLARE @tablea TABLE (field_id int, field_number CHAR(3), field_date datetime, field_inserted DATETIME, field_sale varchar(4))
INSERT INTO @tablea
VALUES
(1, 100, '20170311','20170311 01:00:00', 'Buy'),
(1, 100, '20170311','20170311 01:01:00', 'Buy'),
(1, 100, '20170311','20170311 01:02:00', 'Retu'),
(1, 100, '20170311','20170311 01:03:00', 'Retu'),
(2, 100, '20170311','20170311 01:03:00', 'Buy'),
(1, 110, '20170311','20170311 01:03:00', 'Buy');
WITH cte AS
(SELECT
ROW_NUMBER() OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS row_num,
field_id,
field_number,
field_date,
field_sale,
lead(field_sale) OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS field_sale_lead,
lag(field_sale) OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS field_sale_lag
FROM @tablea
)
SELECT * FROM cte
WHERE NOT (cte.field_sale = 'Buy' AND ISNULL(field_sale_lead,'') = 'Retu')--AND field_sale_lead IS NOT null)
AND NOT (cte.field_sale = 'Retu' AND ISNULL(field_sale_lag,'') = 'buy' )--AND field_sale_lag IS NOT NULL)
When you do this, though, you realize that it found direct matches, but now there's still a Buy/Return pair, and I'd like to cancel that out.
It's at this point I'm stuck. I've done Recursive CTEs before, but for whatever reason I can't figure out how to recurse and make it cancel out 1/1/100 and 4/1/100. All I've managed to do is have it choke on the recursion.
DECLARE @tablea TABLE (field_id int, field_number CHAR(3), field_date datetime, field_inserted DATETIME, field_sale varchar(4))
INSERT INTO @tablea
VALUES
(1, 100, '20170311','20170311 01:00:00', 'Buy'),
(1, 100, '20170311','20170311 01:01:00', 'Buy'),
(1, 100, '20170311','20170311 01:02:00', 'Retu'),
(1, 100, '20170311','20170311 01:03:00', 'Retu'),
(2, 100, '20170311','20170311 01:03:00', 'Buy'),
(1, 110, '20170311','20170311 01:03:00', 'Buy');
WITH cte AS
(SELECT
ROW_NUMBER() OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS row_num,
field_id,
field_number,
field_date,
field_sale,
field_inserted,
lead(field_sale) OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS field_sale_lead,
lag(field_sale) OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS field_sale_lag
FROM @tablea
--)
--SELECT * FROM cte
--WHERE NOT (cte.field_sale = 'Buy' AND ISNULL(field_sale_lead,'') = 'Retu')--AND field_sale_lead IS NOT null)
--AND NOT (cte.field_sale = 'Retu' AND ISNULL(field_sale_lag,'') = 'buy' )--AND field_sale_lag IS NOT NULL)
UNION ALL
SELECT
ROW_NUMBER() OVER (PARTITION BY cte.field_id, cte.field_number, cte.field_date ORDER BY cte.field_inserted) AS row_num,
cte.field_id,
cte.field_number,
cte.field_date,
cte.field_sale,
cte.field_inserted,
lead(cte.field_sale) OVER (PARTITION BY cte.field_id, cte.field_number, cte.field_date ORDER BY cte.field_inserted) AS field_sale_lead,
lag(cte.field_sale) OVER (PARTITION BY cte.field_id, cte.field_number, cte.field_date ORDER BY cte.field_inserted) AS field_sale_lag
FROM @tablea INNER JOIN cte ON cte.field_date = [@tablea].field_date AND cte.field_id = [@tablea].field_id AND cte.field_number = [@tablea].field_number
)
SELECT * FROM cte
WHERE NOT (cte.field_sale = 'Buy' AND ISNULL(field_sale_lead,'') = 'Retu')--AND field_sale_lead IS NOT null)
AND NOT (cte.field_sale = 'Retu' AND ISNULL(field_sale_lag,'') = 'buy' )--AND field_sale_lag IS NOT NULL)
We can tackle this without loops or recursion by using a common table expression and row_number().
If I am understanding your question correctly, you want to remove sales that have been returned, and each 'retu' should remove the most recent 'buy'.
First we add id using row_number() to our rowset so we can uniquely identify our rows.
Next, we add br_rn (short for Buy/Return RowNumber), partitioned by field_id, field_number, field_date and also field_sale, and ordered by field_inserted desc.
This lets us match each 'retu' with the most recent 'buy', and once we can do that, we can eliminate all of the pairs with not exists():
;with cte as (
select
id = row_number() over (
order by field_id, field_number, field_date, field_inserted asc
)
, field_id
, field_number
, field_date
, field_inserted
, field_sale
, br_rn = row_number() over (
partition by field_id, field_number, field_date, field_sale
order by field_inserted desc
)
from @tablea
)
select
id
, field_id
, field_number
, field_date
, field_inserted
, field_sale
from cte
where not exists (
select 1
from cte as i
where i.field_id = cte.field_id
and i.field_number = cte.field_number
and i.field_date = cte.field_date
and i.br_rn = cte.br_rn
and i.id <> cte.id
)
order by id
rextester demo: http://rextester.com/TKXOC61533
For this input:
(1, 100, '20170311','20170311 01:00:00', 'Buy')
, (1, 100, '20170311','20170311 01:01:00', 'Buy')
, (1, 100, '20170311','20170311 01:02:00', 'Retu')
, (1, 100, '20170311','20170311 01:03:00', 'Retu')
, (2, 100, '20170311','20170311 01:03:00', 'Buy')
, (1, 110, '20170311','20170311 01:03:00', 'Buy');
returns:
+----+----------+--------------+------------+---------------------+------------+
| id | field_id | field_number | field_date | field_inserted | field_sale |
+----+----------+--------------+------------+---------------------+------------+
| 5 | 1 | 110 | 2017-03-11 | 2017-03-11 01:03:00 | Buy |
| 6 | 2 | 100 | 2017-03-11 | 2017-03-11 01:03:00 | Buy |
+----+----------+--------------+------------+---------------------+------------+
and for this input:
(1, 100, '20170311','20170311 01:01:00', 'Buy')
, (1, 100, '20170311','20170311 01:02:00', 'Buy')
, (1, 100, '20170311','20170311 01:03:00', 'Buy')
, (1, 100, '20170311','20170311 01:04:00', 'Retu')
, (1, 100, '20170311','20170311 01:05:00', 'Buy')
, (1, 100, '20170311','20170311 01:06:00', 'Retu')
, (1, 100, '20170311','20170311 01:07:00', 'Retu')
, (2, 100, '20170311','20170311 01:03:00', 'Buy')
, (1, 110, '20170311','20170311 01:03:00', 'Buy');
returns:
+----+----------+--------------+------------+---------------------+------------+
| id | field_id | field_number | field_date | field_inserted | field_sale |
+----+----------+--------------+------------+---------------------+------------+
| 1 | 1 | 100 | 2017-03-11 | 2017-03-11 01:01:00 | Buy |
| 8 | 1 | 110 | 2017-03-11 | 2017-03-11 01:03:00 | Buy |
| 9 | 2 | 100 | 2017-03-11 | 2017-03-11 01:03:00 | Buy |
+----+----------+--------------+------------+---------------------+------------+
for this input:
(1, 100, '20170311','20170311 01:01:00', 'Buy')
, (1, 100, '20170311','20170311 01:02:00', 'Buy')
, (1, 100, '20170311','20170311 01:04:00', 'Retu')
, (1, 100, '20170311','20170311 01:05:00', 'Retu')
, (1, 100, '20170312','20170311 01:06:00', 'Buy')
, (1, 100, '20170312','20170311 01:07:00', 'Buy')
, (2, 100, '20170311','20170311 01:03:00', 'Buy')
, (1, 110, '20170311','20170311 01:03:00', 'Buy')
returns:
+----+----------+--------------+------------+---------------------+------------+
| id | field_id | field_number | field_date | field_inserted | field_sale |
+----+----------+--------------+------------+---------------------+------------+
| 5 | 1 | 100 | 2017-03-12 | 2017-03-11 01:06:00 | Buy |
| 6 | 1 | 100 | 2017-03-12 | 2017-03-11 01:07:00 | Buy |
| 7 | 1 | 110 | 2017-03-11 | 2017-03-11 01:03:00 | Buy |
| 8 | 2 | 100 | 2017-03-11 | 2017-03-11 01:03:00 | Buy |
+----+----------+--------------+------------+---------------------+------------+
It may help illustrate what we are doing to look at what the cte returns before we eliminate any pairs.
Looking at just the set that needs filtering, before we filter it:
+----+----------+--------------+------------+---------------------+------------+-------+
| id | field_id | field_number | field_date | field_inserted | field_sale | br_rn |
+----+----------+--------------+------------+---------------------+------------+-------+
| 1 | 1 | 100 | 2017-03-11 | 2017-03-11 01:01:00 | Buy | 4 |
| 2 | 1 | 100 | 2017-03-11 | 2017-03-11 01:02:00 | Buy | 3 |
| 3 | 1 | 100 | 2017-03-11 | 2017-03-11 01:03:00 | Buy | 2 |
| 4 | 1 | 100 | 2017-03-11 | 2017-03-11 01:04:00 | Retu | 3 |
| 5 | 1 | 100 | 2017-03-11 | 2017-03-11 01:05:00 | Buy | 1 |
| 6 | 1 | 100 | 2017-03-11 | 2017-03-11 01:06:00 | Retu | 2 |
| 7 | 1 | 100 | 2017-03-11 | 2017-03-11 01:07:00 | Retu | 1 |
+----+----------+--------------+------------+---------------------+------------+-------+
Looking at it like this, we can easily see that the 'buy' order id 1 has a br_rn of 4 and there is no associated 'retu'.
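The br_rn pairing can also be reproduced outside SQL Server. The following is a minimal sketch, assuming SQLite 3.25 or later via Python's sqlite3 module, with an ordinary table named tablea standing in for the table variable; it loads the second input set shown above and applies the same not exists() filter.

```python
import sqlite3

# Second sample input from the answer above.
orders = [
    (1, "100", "2017-03-11", "2017-03-11 01:01:00", "Buy"),
    (1, "100", "2017-03-11", "2017-03-11 01:02:00", "Buy"),
    (1, "100", "2017-03-11", "2017-03-11 01:03:00", "Buy"),
    (1, "100", "2017-03-11", "2017-03-11 01:04:00", "Retu"),
    (1, "100", "2017-03-11", "2017-03-11 01:05:00", "Buy"),
    (1, "100", "2017-03-11", "2017-03-11 01:06:00", "Retu"),
    (1, "100", "2017-03-11", "2017-03-11 01:07:00", "Retu"),
    (2, "100", "2017-03-11", "2017-03-11 01:03:00", "Buy"),
    (1, "110", "2017-03-11", "2017-03-11 01:03:00", "Buy"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablea (field_id INTEGER, field_number TEXT, "
    "field_date TEXT, field_inserted TEXT, field_sale TEXT)"
)
conn.executemany("INSERT INTO tablea VALUES (?, ?, ?, ?, ?)", orders)

# br_rn numbers buys and returns separately, newest first, so the n-th most
# recent return lines up with the n-th most recent buy of the same order.
# A row survives only if no row of the other type shares its br_rn.
remaining = conn.execute(
    """
    WITH cte AS (
        SELECT ROW_NUMBER() OVER (
                   ORDER BY field_id, field_number, field_date, field_inserted
               ) AS id,
               field_id, field_number, field_date, field_inserted, field_sale,
               ROW_NUMBER() OVER (
                   PARTITION BY field_id, field_number, field_date, field_sale
                   ORDER BY field_inserted DESC
               ) AS br_rn
        FROM tablea
    )
    SELECT o.id, o.field_id, o.field_number, o.field_inserted, o.field_sale
    FROM cte AS o
    WHERE NOT EXISTS (
        SELECT 1
        FROM cte AS i
        WHERE i.field_id = o.field_id
          AND i.field_number = o.field_number
          AND i.field_date = o.field_date
          AND i.br_rn = o.br_rn
          AND i.id <> o.id
    )
    ORDER BY o.id
    """
).fetchall()

for row in remaining:
    print(row)
```

The surviving rows are the buy with br_rn 4 (id 1) plus the two unrelated single buys (ids 8 and 9), the same three rows as in the second result table above.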
One thing I can suggest: delete pairs of sequential buy/return for as long as any remain. Try
DECLARE @tablea TABLE (field_id int, field_number CHAR(3), field_date datetime, field_inserted DATETIME, field_sale varchar(4))
INSERT INTO @tablea
VALUES
(1, 100, '20170311','20170311 01:01:00', 'Buy'),
(1, 100, '20170311','20170311 01:02:00', 'Buy'),
(1, 100, '20170311','20170311 01:03:00', 'Buy'),
(1, 100, '20170311','20170311 01:04:00', 'Retu'),
(1, 100, '20170311','20170311 01:05:00', 'Buy'),
(1, 100, '20170311','20170311 01:06:00', 'Retu'),
(1, 100, '20170311','20170311 01:07:00', 'Retu'),
(2, 100, '20170311','20170311 01:03:00', 'Buy'),
(1, 110, '20170311','20170311 01:03:00', 'Buy');
select * from @tablea
order by field_id,
field_number,
field_inserted
declare @eoj int =1;
while @eoj > 0
begin
WITH cte AS
(
SELECT
case field_sale when 'Buy' then
lead (field_sale) OVER (PARTITION BY field_id, field_number ORDER BY field_inserted)
when 'Retu' then
lag (field_sale) OVER (PARTITION BY field_id, field_number ORDER BY field_inserted)
end nbr_type,
field_id,
field_number,
field_date,
field_sale,
field_inserted
FROM @tablea
)
delete
from cte
where nbr_type is not null and nbr_type <> field_sale;
set @eoj = @@rowcount;
-- check it
select * from @tablea
order by field_id,
field_number,
field_inserted;
end;
It will be repeated N+1 times where N is the length of the longest sequence of returns. N=2 in the above example.

T-SQL: Conditional NULL removal

I need to select only the Room_IDs that have no instances where the Status is NULL.
For example here :
TABLE_A
Room_Id Status Inspection_Date
-----------------------------------
1 NULL 5/15/2015
2 occupied 5/21/2015
2 NULL 1/19/2016
1 occupied 12/16/2015
4 NULL 3/25/2016
3 vacant 8/27/2015
1 vacant 4/17/2016
3 vacant 12/12/2015
3 vacant 3/22/2016
4 vacant 2/2/2015
4 vacant 3/24/2015
My result should look like this:
Room_Id Status Inspection_Date
-----------------------------------
3 vacant 8/27/2015
3 vacant 12/12/2015
3 vacant 3/22/2016
Because Room_ID '3' has no instances where the Status is NULL
Quick example of how to do it:
DECLARE @tTable TABLE(
Room_Id INT,
Status VARCHAR(20),
Inspection_Date DATETIME)
INSERT INTO @tTable VALUES
(1, NULL, '5/15/2015'),
(1,NULL, '5/15/2015'),
(2,'occupied', '5/21/2015'),
(2,NULL, '1/19/2016'),
(1,'occupied', '12/16/2015'),
(4,NULL, '3/25/2016'),
(3,'vacant', '8/27/2015'),
(1,'vacant', '4/17/2016'),
(3,'vacant', '12/12/2015'),
(3,'vacant', '3/22/2016'),
(4,'vacant', '2/2/2015'),
(4,'vacant', '3/24/2015')
SELECT * FROM @tTable T1
WHERE Room_Id NOT IN (SELECT Room_ID FROM @tTable WHERE Status IS NULL)
Gives :
Room_Id | Status | Inspection_Date |
-------------------------------------------------
3 | vacant | 2015-08-27 00:00:00.000
3 | vacant | 2015-12-12 00:00:00.000
3 | vacant | 2016-03-22 00:00:00.000
Try this out:
SELECT *
FROM Table1
WHERE Room_ID NOT IN
(
SELECT DISTINCT Room_ID
FROM Table1
WHERE Status IS NULL
)
The subquery returns a list of unique room IDs that, at one time or another, had a NULL status. The outer query looks at that list and says "Return * where the Room_ID is NOT one of those in the subquery."
If you want to try it in SQL Fiddle, here is the Schema:
CREATE TABLE Table1
(Room_ID int, Status varchar(8), Inspection_Date datetime)
;
INSERT INTO Table1
(Room_ID, Status, Inspection_Date)
VALUES
(1, NULL, '2015-05-15 00:00:00'),
(2, 'occupied', '2015-05-21 00:00:00'),
(2, NULL, '2016-01-19 00:00:00'),
(1, 'occupied', '2015-12-16 00:00:00'),
(4, NULL, '2016-03-25 00:00:00'),
(4, 'vacant', '2015-08-27 00:00:00'),
(1, 'vacant', '2016-04-17 00:00:00'),
(3, 'vacant', '2015-12-12 00:00:00'),
(3, 'vacant', '2016-03-22 00:00:00'),
(4, 'vacant', '2015-02-02 00:00:00'),
(4, 'vacant', '2015-03-24 00:00:00'),
(2, NULL, '2015-05-22 00:00:00')
;
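To see the subquery filter at work on the question's rows, here is a minimal sketch that assumes SQLite via Python's sqlite3 module instead of SQL Server, with a table named table_a and ISO-formatted dates. It runs the NOT IN form and an equivalent NOT EXISTS form side by side.

```python
import sqlite3

# Rows from the question's TABLE_A, dates rewritten in ISO format.
rows = [
    (1, None, "2015-05-15"),
    (2, "occupied", "2015-05-21"),
    (2, None, "2016-01-19"),
    (1, "occupied", "2015-12-16"),
    (4, None, "2016-03-25"),
    (3, "vacant", "2015-08-27"),
    (1, "vacant", "2016-04-17"),
    (3, "vacant", "2015-12-12"),
    (3, "vacant", "2016-03-22"),
    (4, "vacant", "2015-02-02"),
    (4, "vacant", "2015-03-24"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_a (Room_Id INTEGER, Status TEXT, Inspection_Date TEXT)"
)
conn.executemany("INSERT INTO table_a VALUES (?, ?, ?)", rows)

# Keep only rooms that never appear with a NULL status.
not_in_rows = conn.execute(
    """
    SELECT Room_Id, Status, Inspection_Date
    FROM table_a
    WHERE Room_Id NOT IN (SELECT Room_Id FROM table_a WHERE Status IS NULL)
    ORDER BY Inspection_Date
    """
).fetchall()

# Same filter written as a correlated NOT EXISTS.
not_exists_rows = conn.execute(
    """
    SELECT t1.Room_Id, t1.Status, t1.Inspection_Date
    FROM table_a t1
    WHERE NOT EXISTS (
        SELECT 1 FROM table_a t2
        WHERE t2.Room_Id = t1.Room_Id AND t2.Status IS NULL
    )
    ORDER BY t1.Inspection_Date
    """
).fetchall()

for row in not_in_rows:
    print(row)
```

Both forms agree here because Room_Id itself is never NULL. If the NOT IN subquery could return a NULL Room_Id, the NOT IN form would return no rows at all, which is one reason some people default to NOT EXISTS.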
As an alternative to Hashman's answer: I just prefer to use not exists over not in for these types of queries.
Creating some test data
Note that I just kept the same date for everything since it's not imperative to the question.
create table #table_a (
Room_Id int,
Status varchar(32),
Inspection_Date date);
insert #table_a (Room_Id, Status, Inspection_Date)
values
(1, null, getdate()),
(2, 'occupied', getdate()),
(2, null, getdate()),
(1, 'occupied', getdate()),
(4, null, getdate()),
(3, 'vacant', getdate()),
(1, 'vacant', getdate()),
(3, 'vacant', getdate()),
(3, 'vacant', getdate()),
(4, 'vacant', getdate()),
(4, 'vacant', getdate());
The query
select *
from #table_a t1
where not exists (
select *
from #table_a t2
where t1.Room_Id = t2.Room_Id
and t2.Status is null);
The results
Room_Id Status Inspection_Date
----------- -------------------------------- ---------------
3 vacant 2016-06-17
3 vacant 2016-06-17
3 vacant 2016-06-17
You can use a CTE and NOT EXISTS, as in the code below:
WITH bt
AS ( SELECT RoomId ,
Status,
Inspection_Date
FROM dbo.Table_1
)
SELECT *
FROM bt AS a
WHERE NOT EXISTS ( SELECT 1
FROM bt
WHERE bt.RoomId = a.RoomId
AND bt.Status IS NULL );