How to select rows and nearby rows with specific conditions - sql

I have a table (Trans) of values like
OrderID (unique) | CustID | OrderDate| TimeSinceLast|
------------------------------------------------------
123a | A01 | 20.06.18 | 20 |
123y | B05 | 20.06.18 | 31 |
113k | A01 | 18.05.18 | NULL | <------- need this
168x | C01 | 17.04.18 | 8 |
999y | B05 | 15.04.18 | NULL | <------- need this
188k | A01 | 15.04.18 | 123 |
678a | B05 | 16.03.18 | 45 |
What I need is to select the rows where TimeSinceLast is null, as well as the preceding and the following row where TimeSinceLast is not null, grouped by CustID.
I'd need my final table to look like:
OrderID (unique) | CustID | OrderDate| TimeSinceLast|
------------------------------------------------------
123a | A01 | 20.06.18 | 20 |
113k | A01 | 18.05.18 | NULL |
188k | A01 | 15.04.18 | 123 |
123y | B05 | 20.06.18 | 31 |
999y | B05 | 15.04.18 | NULL |
678a | B05 | 16.03.18 | 45 |
The main problem is that TimeSinceLast is not reliable: for whatever reason it does not correctly calculate the days since the last order, so I cannot use it in a query to find the preceding or following row.
I looked for existing code and found something like this on this forum:
with dt as
(select distinct custID, OrderID,
max (case when timeSinceLast is null then OrderID end)
over(partition by custID order by OrderDate
rows between 1 preceding and 1 following) as NullID
from Trans)
select *
from dt
where request_id between NullID -1 and NullID+1
But it does not work well for my purposes. It also looks like the max function cannot work with missing values.
Many thanks

Use lead() and lag().
What I need is to select the rows where TimeSinceLast is null, as well as a row preceding and following where TimeSinceLast is not null.
First, the ordering is a little unclear. Your sample data and code do not match. The following assumes some combination of the date and orderid, but there may be other columns that better capture what you mean by "preceding" and "following".
This is a little tricky, because you don't want to always include the first and last rows -- unless necessary. So, look at two columns:
select t.*
from (select t.*,
lead(TimeSinceLast) over (partition by custid order by orderdate, orderid) as next_tsl,
lag(TimeSinceLast) over (partition by custid order by orderdate, orderid) as prev_tsl,
lead(orderid) over (partition by custid order by orderdate, orderid) as next_orderid,
lag(orderid) over (partition by custid order by orderdate, orderid) as prev_orderid
from Trans t
) t
where TimeSinceLast is null or
(next_tsl is null and next_orderid is not null) or
(prev_tsl is null and prev_orderid is not null);
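As a quick check of the lead()/lag() idea, here is a minimal sketch that runs the query against the sample data in SQLite through Python's sqlite3 module. It assumes the first condition is meant to keep the NULL rows themselves (TimeSinceLast IS NULL), that dates are stored as ISO strings so they sort correctly, and a SQLite build with window functions (3.25 or later). Syntax on other databases may differ slightly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Trans (OrderID TEXT, CustID TEXT, OrderDate TEXT, TimeSinceLast INTEGER)"
)
# Sample rows from the question; dates rewritten as ISO strings (an assumption).
conn.executemany(
    "INSERT INTO Trans VALUES (?, ?, ?, ?)",
    [
        ("123a", "A01", "2018-06-20", 20),
        ("123y", "B05", "2018-06-20", 31),
        ("113k", "A01", "2018-05-18", None),
        ("168x", "C01", "2018-04-17", 8),
        ("999y", "B05", "2018-04-15", None),
        ("188k", "A01", "2018-04-15", 123),
        ("678a", "B05", "2018-03-16", 45),
    ],
)

query = """
SELECT OrderID, CustID, OrderDate, TimeSinceLast
FROM (SELECT t.*,
             LEAD(TimeSinceLast) OVER (PARTITION BY CustID ORDER BY OrderDate, OrderID) AS next_tsl,
             LAG(TimeSinceLast)  OVER (PARTITION BY CustID ORDER BY OrderDate, OrderID) AS prev_tsl,
             LEAD(OrderID)       OVER (PARTITION BY CustID ORDER BY OrderDate, OrderID) AS next_orderid,
             LAG(OrderID)        OVER (PARTITION BY CustID ORDER BY OrderDate, OrderID) AS prev_orderid
      FROM Trans t) t
WHERE TimeSinceLast IS NULL                              -- the NULL row itself
   OR (next_tsl IS NULL AND next_orderid IS NOT NULL)    -- row just before a NULL row
   OR (prev_tsl IS NULL AND prev_orderid IS NOT NULL)    -- row just after a NULL row
ORDER BY CustID, OrderDate DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The next_orderid and prev_orderid checks distinguish "the neighbouring row has a NULL TimeSinceLast" from "there is no neighbouring row", which is why customer C01 drops out.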

Use APPLY
DECLARE @TransTable TABLE (OrderID char(4), CustID char(3), OrderDate date, TimeSinceLast int)
INSERT @TransTable VALUES
('123a', 'A01', '06.20.2018', 20),
('123y', 'B05', '06.20.2018' ,31),
('113k', 'A01', '05.18.2018' ,NULL), ------- need this
('168x', 'C01', '04.17.2018' ,8),
('999y', 'B05', '04.15.2018' ,NULL), ------- need this
('188k', 'A01', '04.15.2018' ,123),
('678a', 'B05', '03.16.2018' ,45)
SELECT B.OrderID, B.CustID, B.OrderDate, B.TimeSinceLast
FROM @TransTable A
CROSS APPLY (
SELECT 0 AS rn, A.OrderID, A.CustID, A.OrderDate, A.TimeSinceLast
UNION ALL
SELECT TOP 2 ROW_NUMBER() OVER (PARTITION BY CASE WHEN T.OrderDate > A.OrderDate THEN 1 ELSE 0 END ORDER BY ABS(DATEDIFF(day, T.OrderDate, A.OrderDate))) rn,
T.OrderID, T.CustID, T.OrderDate, T.TimeSinceLast
FROM @TransTable T
WHERE T.CustID = A.CustID AND T.OrderID <> A.OrderID
ORDER BY rn
) B
WHERE A.TimeSinceLast IS NULL
ORDER BY B.CustID, B.OrderDate DESC

Related

Finding created on dates for duplicates in SQL

I have one table of contact records and I'm trying to get the count of duplicate records that were created on each date. I'm not looking to include the original instance in the count. I'm using SQL Server.
Here's an example table
| email | created_on |
| ------------- | ---------- |
| aaa@email.com | 08-16-22 |
| bbb@email.com | 08-16-22 |
| zzz@email.com | 08-16-22 |
| bbb@email.com | 07-12-22 |
| aaa@email.com | 07-12-22 |
| zzz@email.com | 06-08-22 |
| aaa@email.com | 06-08-22 |
| bbb@email.com | 04-21-22 |
And I'm expecting to return
| created_on | dupe_count |
| ---------- | ---------- |
| 08-16-22 | 3 |
| 07-12-22 | 2 |
| 06-08-22 | 0 |
| 04-21-22 | 0 |
Edited to add error message:
[Image: error message]
I created a subquery that numbers each email's rows by created date. The outer query then ignores the date on which each email was first created (row number 1) and counts the remaining rows per date. It works fine in this case.
Entire code:
Create table #Temp
(
email varchar(50),
dateCreated date
)
insert into #Temp
(email, dateCreated) values
('aaa@email.com', '08-16-22'),
('bbb@email.com', '08-16-22'),
('zzz@email.com', '08-16-22'),
('bbb@email.com', '07-12-22'),
('aaa@email.com', '07-12-22'),
('zzz@email.com', '06-08-22'),
('aaa@email.com', '06-08-22'),
('bbb@email.com', '04-21-22')
select datecreated, sum(case when r = 1 then 0 else 1 end) as duplicates
from
(
Select email, datecreated, ROW_NUMBER() over(partition by email
order by datecreated) as r from #Temp
) b
group by dateCreated
drop table #Temp
Output:
datecreated duplicates
2022-04-21 0
2022-06-08 0
2022-07-12 2
2022-08-16 3
You can calculate the difference between total count of emails for every day and the count of unique emails for the day:
select created_on,
count(email) - count(distinct email) as dupe_count
from cte
group by created_on
It seems I misunderstood your request, and you wanted to take earlier created_on dates into account too:
ct as (
select created_on,
(select case when (select count(*)
from cte t2
where t1.email = t2.email and t1.created_on > t2.created_on
) > 0 then email end) as c
from cte t1)
select created_on,
count(distinct c) as dupe_count
from ct
group by created_on
order by 1
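The "seen on an earlier date" idea can be tried end to end with a small sketch in SQLite via Python's sqlite3. The table name contacts is hypothetical, dates are stored as ISO strings so that string comparison orders them correctly, and EXISTS stands in for the count(*) > 0 test. This is an illustration of the approach, not a drop-in for any particular database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "contacts" is a hypothetical table name used only for this sketch.
conn.execute("CREATE TABLE contacts (email TEXT, created_on TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?)",
    [
        ("aaa@email.com", "2022-08-16"),
        ("bbb@email.com", "2022-08-16"),
        ("zzz@email.com", "2022-08-16"),
        ("bbb@email.com", "2022-07-12"),
        ("aaa@email.com", "2022-07-12"),
        ("zzz@email.com", "2022-06-08"),
        ("aaa@email.com", "2022-06-08"),
        ("bbb@email.com", "2022-04-21"),
    ],
)

query = """
WITH ct AS (
    SELECT t1.created_on,
           -- keep the email only if the same email exists on an earlier date
           CASE WHEN EXISTS (SELECT 1
                             FROM contacts t2
                             WHERE t2.email = t1.email
                               AND t2.created_on < t1.created_on)
                THEN t1.email END AS c
    FROM contacts t1
)
SELECT created_on, COUNT(DISTINCT c) AS dupe_count
FROM ct
GROUP BY created_on
ORDER BY created_on
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

COUNT(DISTINCT c) skips NULLs, so first occurrences (where c is NULL) do not contribute to the count.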
It seems that in Oracle it is also possible to do the aggregation in a single query:
select created_on,
count(distinct case when (select count(*)
from cte t2
where t1.email = t2.email and t1.created_on > t2.created_on
) > 0 then email end) as c
from cte t1
group by created_on
order by 1

how to compare two dates in same column in SQL

I have to compare two dates that are in the same column of a table. I need this comparison to find the dates before and after a specific date, and I have to show all three in separate columns.
I wrote this code but it's totally wrong:
CREATE VIEW
AS
SELECT (CASE
WHEN T1.BuyDate > T2.BuyDate THEN T1.BuyDate END)
AS PreviousBuyDate, T1.ItemName, T1.BuyDate,
(CASE
WHEN T1.BuyDate > T2.BuyDate THEN T1.BuyDate END)
AS NextDate
FROM FoodSara_tbl T1 , FoodSara_tbl T2
GO
input:
|ItemName | BuyDate | ItemOrigin |
|---------|---------|------------|
| cake |2020-10-2| UK |
| coca |2020-5-2 | US |
| cake |2019-10-6| UK |
| coca |2020-12-2| US |
Output:
|PreviousDate | ItemName | BuyDate |NextDate |
|-------------|----------|---------|---------|
| NULL |cake |2019-10-6|2020-10-2|
| NULL |coca |2020-5-2 |2020-12-2|
|2019-10-6 |cake |2020-10-2| NULL |
| 2020-5-2 |coca |2020-12-2| NULL |
PS: The dates have to be in order.
Try this with LAG function:
select LAG(BuyDate,1) OVER (PARTITION BY ItemName ORDER BY BuyDate asc) previous_date
, ItemName
, BuyDate
, LAG(BuyDate,1) OVER (PARTITION BY ItemName ORDER BY BuyDate desc) next_date
from FoodSara_tbl
See the final result: sqlfiddle
OR
Use LAG and LEAD function:
select LAG(BuyDate,1) OVER (PARTITION BY ItemName order by BuyDate) previous_date
, ItemName
, BuyDate
, LEAD(BuyDate,1) OVER (PARTITION BY ItemName order by BuyDate) next_date
from FoodSara_tbl
See the final result: sqlfiddle
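A minimal sketch of the LAG/LEAD version, run in SQLite through Python's sqlite3 so the result can be checked locally. It assumes the dates are stored as zero-padded ISO strings (for example 2020-05-02 rather than 2020-5-2) so that ordering by the text column matches date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FoodSara_tbl (ItemName TEXT, BuyDate TEXT, ItemOrigin TEXT)")
# Sample rows from the question, with zero-padded ISO dates (an assumption).
conn.executemany(
    "INSERT INTO FoodSara_tbl VALUES (?, ?, ?)",
    [
        ("cake", "2020-10-02", "UK"),
        ("coca", "2020-05-02", "US"),
        ("cake", "2019-10-06", "UK"),
        ("coca", "2020-12-02", "US"),
    ],
)

query = """
SELECT LAG(BuyDate, 1)  OVER (PARTITION BY ItemName ORDER BY BuyDate) AS previous_date,
       ItemName,
       BuyDate,
       LEAD(BuyDate, 1) OVER (PARTITION BY ItemName ORDER BY BuyDate) AS next_date
FROM FoodSara_tbl
ORDER BY ItemName, BuyDate
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The first row of each item has no previous date and the last row has no next date, so those columns come back as NULL (None in Python).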

Using the last_value function on every column | Downfilling all nulls in a table

I have an individual-level table, ordered by Person_ID and Date, ascending. There are duplicate entries at the Person_ID level. What I would like to do is "downfill" null values across every column. My impression is that the last_value( | ignore nulls) function will work perfectly for each column.
A major problem is that the table is hundreds of columns wide and quite dynamic (feature creation for ML experiments). There has to be a better way than writing out a last_value statement for each variable, something like this:
SELECT last_value(var1 ignore nulls) OVER (PARTITION BY Person_ID ORDER BY Date ASC
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as Var1,
last_value(var2 ignore nulls) OVER (PARTITION BY Person_ID ORDER BY Date ASC
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as Var2,
...
last_value(var300 ignore nulls) OVER (PARTITION BY Person_ID ORDER BY Date ASC
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as Var300
FROM TABLE
In summary, I have the following table:
+----------+-----------+------+------+---+------------+
| PersonID | YearMonth | Var1 | Var2 | … | Var300 |
+----------+-----------+------+------+---+------------+
| 1 | 200901 | 2 | null | | null |
| 1 | 200902 | null | 1 | | Category 1 |
| 1 | 201010 | null | 1 | | null |
+----------+-----------+------+------+---+------------+
and desire the following table:
+----------+-----------+------+------+---+------------+
| PersonID | YearMonth | Var1 | Var2 | … | Var300 |
+----------+-----------+------+------+---+------------+
| 1 | 200901 | 2 | null | | null |
| 1 | 200902 | 2 | 1 | | Category 1 |
| 1 | 201010 | 2 | 1 | | Category 1 |
+----------+-----------+------+------+---+------------+
I don't see any great options for you, but here are two approaches you might look into.
OPTION 1 -- Recursive CTE
In this approach, you use a recursive query, where each child value equals itself or, if it is null, its parent's value. Like so:
WITH
ordered AS (
SELECT yt.*,
row_number() over ( partition by yt.personid order by yt.yearmonth ) rn
FROM YOUR_TABLE yt),
downfilled ( personid, yearmonth, var1, var2, ..., var300, rn) as (
SELECT o.*
FROM ordered o
WHERE o.rn = 1
UNION ALL
SELECT c.personid, c.yearmonth,
nvl(c.var1, p.var1) var1,
nvl(c.var2, p.var2) var2,
...
nvl(c.var300, p.var300) var300,
c.rn
FROM downfilled p INNER JOIN ordered c ON c.personid = p.personid AND c.rn = p.rn + 1 )
SELECT * FROM downfilled
ORDER BY personid, yearmonth;
This replaces each expression like this:
last_value(var2 ignore nulls) OVER (PARTITION BY Person_ID ORDER BY Date ASC
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as Var2
with an expression like this:
NVL(c.var2, p.var2)
One downside, though, is that this makes you repeat the list of 300 columns twice: once for the 300 NVL() expressions and once to specify the output columns of the recursive CTE (downfilled).
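To see the recursive downfill working on the three sample rows, here is a minimal sketch in SQLite through Python's sqlite3. COALESCE plays the role of NVL, the table name person_features is hypothetical, and only three of the 300 variables are included. It is an illustration of the technique under those assumptions rather than a tuned implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "person_features" is a hypothetical table name; only 3 variables are shown.
conn.execute(
    "CREATE TABLE person_features "
    "(PersonID INTEGER, YearMonth INTEGER, Var1 INTEGER, Var2 INTEGER, Var300 TEXT)"
)
conn.executemany(
    "INSERT INTO person_features VALUES (?, ?, ?, ?, ?)",
    [
        (1, 200901, 2, None, None),
        (1, 200902, None, 1, "Category 1"),
        (1, 201010, None, 1, None),
    ],
)

query = """
WITH RECURSIVE
ordered AS (
    SELECT pf.*,
           ROW_NUMBER() OVER (PARTITION BY PersonID ORDER BY YearMonth) AS rn
    FROM person_features pf
),
downfilled (PersonID, YearMonth, Var1, Var2, Var300, rn) AS (
    -- anchor: the first row of each person is taken as-is
    SELECT PersonID, YearMonth, Var1, Var2, Var300, rn
    FROM ordered
    WHERE rn = 1
    UNION ALL
    -- step: each next row keeps its own value, or inherits the already filled one
    SELECT c.PersonID, c.YearMonth,
           COALESCE(c.Var1, p.Var1),
           COALESCE(c.Var2, p.Var2),
           COALESCE(c.Var300, p.Var300),
           c.rn
    FROM downfilled p
    JOIN ordered c ON c.PersonID = p.PersonID AND c.rn = p.rn + 1
)
SELECT PersonID, YearMonth, Var1, Var2, Var300
FROM downfilled
ORDER BY PersonID, YearMonth
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because each step reads from the already filled parent row, a value carries forward across any number of consecutive nulls, not just one.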
OPTION 2 -- UNPIVOT and PIVOT again
In this approach, you UNPIVOT your VARxx columns into rows, so that you only need to write the last_value()... expression one time.
WITH unp AS (
SELECT personid,
yearmonth,
var_column,
last_value(var_value ignore nulls)
over ( partition by personid, var_column order by yearmonth ) var_value
FROM YOUR_TABLE
UNPIVOT INCLUDE NULLS ( var_value FOR var_column IN ("VAR1","VAR2","VAR3") ) )
SELECT * FROM unp
PIVOT ( max(var_value) FOR var_column IN ('VAR1' AS VAR1, 'VAR2' AS VAR2, 'VAR3' AS VAR3 ) )
Here you still need to list each column twice. Also, I'm not sure what performance will be like if you have a large data set.

Returning most recent row SQL Server

I have this table
CREATE TABLE Test (
OrderID int,
Person varchar(10),
LastModified Date
);
INSERT INTO Test (OrderID, Person, LastModified)
VALUES (1, 'Sam', '2018-05-15'),
(1, 'Tim','2018-05-14'),
(1, 'Kim','2018-05-05'),
(1, 'Dave','2018-05-13'),
(1, 'James','2018-05-11'),
(1, 'Fred','2018-05-05');
select * result:
| OrderID | Person | LastModified |
|---------|--------|--------------|
| 1 | Sam | 2018-05-15 |
| 1 | Tim | 2018-05-14 |
| 1 | Kim | 2018-05-05 |
| 1 | Dave | 2018-05-13 |
| 1 | James | 2018-05-11 |
| 1 | Fred | 2018-05-05 |
I am looking to return the most recent modified row which is the first row with 'Sam'.
Now I know I can use max to return the most recent date, but how can I aggregate the Person column to return Sam?
Looking for a result set like
| OrderID | Person | LastModified |
|---------|--------|--------------|
| 1 | Sam | 2018-05-15 |
I ran this:
SELECT
OrderID,
max(Person) AS [Person],
max(LastModified) AS [LastModified]
FROM Test
GROUP BY
OrderID
but this returns:
| OrderID | Person | LastModified |
|---------|--------|--------------|
| 1 | Tim | 2018-05-15 |
Can someone advise me further, please? Thanks.
*** UPDATE
INSERT INTO Test (OrderID, Person, LastModified)
VALUES (1, 'Sam', '2018-05-15'),
(1, 'Tim','2018-05-14'),
(1, 'Kim','2018-05-05'),
(1, 'Dave','2018-05-13'),
(1, 'James','2018-05-11'),
(1, 'Fred','2018-05-05'),
(2, 'Dave','2018-05-13'),
(2, 'James','2018-05-11'),
(2, 'Fred','2018-05-05');
So I would be looking for this result:
| OrderID | Person | LastModified |
|---------|--------|--------------|
| 1 | Sam | 2018-05-15 |
| 2 | Dave | 2018-05-13 |
If you always want just one record (the latest modified one) per OrderID then this would do it:
SELECT
t2.OrderID
, t2.Person
, t2.LastModified
FROM (
SELECT
MAX( LastModified ) AS LastModified
, OrderID
FROM
Test
GROUP BY
OrderID
) t
INNER JOIN Test t2
ON t2.LastModified = t.LastModified
AND t2.OrderID = t.OrderID
Expanding on your comment ("thanks very much, is there a way i can do this if there is more than one orderID e.g. multiple people and lastmodified for multiple orderID's?"), in xcvd's answer, I assume what you therefore want is this:
WITH CTE AS(
SELECT OrderId,
Person,
LastModified,
ROW_NUMBER() OVER (PARTITION BY OrderID ORDER BY LastModified DESC) AS RN
FROM YourTable)
SELECT OrderID,
Person,
LastModified
FROM CTE
WHERE RN = 1;
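The ROW_NUMBER() approach can be checked against the updated sample data with a short sketch in SQLite via Python's sqlite3. It uses the Test table from the question in place of YourTable and stores the dates as ISO strings, which is an assumption made so that text ordering matches date ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test (OrderID INTEGER, Person TEXT, LastModified TEXT)")
conn.executemany(
    "INSERT INTO Test VALUES (?, ?, ?)",
    [
        (1, "Sam", "2018-05-15"),
        (1, "Tim", "2018-05-14"),
        (1, "Kim", "2018-05-05"),
        (1, "Dave", "2018-05-13"),
        (1, "James", "2018-05-11"),
        (1, "Fred", "2018-05-05"),
        (2, "Dave", "2018-05-13"),
        (2, "James", "2018-05-11"),
        (2, "Fred", "2018-05-05"),
    ],
)

query = """
WITH CTE AS (
    SELECT OrderID,
           Person,
           LastModified,
           -- number rows within each OrderID, newest first
           ROW_NUMBER() OVER (PARTITION BY OrderID ORDER BY LastModified DESC) AS RN
    FROM Test
)
SELECT OrderID, Person, LastModified
FROM CTE
WHERE RN = 1
ORDER BY OrderID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If two people share the latest date for an OrderID, ROW_NUMBER() picks one of them arbitrarily; adding a tiebreaker column to the ORDER BY makes the choice deterministic.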
How about just using TOP (1) and ORDER BY?
SELECT TOP (1) t.*
FROM Test t
ORDER BY LastModified DESC;
If you want this for each orderid, then this is a handy method in SQL Server:
SELECT TOP (1) WITH TIES t.*
FROM Test t
ORDER BY ROW_NUMBER() OVER (PARTITION BY OrderId ORDER BY LastModified DESC);
"xcvd's" answer is perfect for this, I would just like to add another solution that can be used here for the sake of showing you a method that can be used in more complex situations than this. This solution uses a nested query (sub-query) to find the MAX(LastModified) regardless of any other field and it will use the result in the original query's WHERE clause to find any results that meet the new criteria. Cheers.
SELECT OrderID
, Person
, LastModified
FROM Test
WHERE LastModified IN (SELECT MAX(LastModified)
FROM Test)
Here is one other method :
select t.*
from Test t
where LastModified = (select max(t1.LastModified) from Test t1 where t1.OrderID = t.OrderID);

Select that joins two tables Oracle PL/SQL

I've got two tables which I need to join with a SELECT, and I've got a problem.
The tables look like that:
table_price
Product_ID | Buy_date | Buy_price |
1 | 16.10.01 | 2.50 |
1 | 16.11.02 | 3.20 |
2 | 16.10.31 | 3.80 |
table expire_date
Product_ID | Count | Exp_date |
1 | 1000 | 17.10.01|
1 | 500 | 17.11.31|
2 | 500 | 17.11.01|
I need to write a select in Oracle PL/SQL which gives me the following results:
Product_ID| Count | Exp_date| last_buy_price|
1 | 1000 | 17.10.01| 3.20 |
1 | 500 | 17.11.31| 3.20 |
2 | 500 | 17.11.01| 3.80 |
It means that it will give me every expire date with count of product from table expire_date and match it with last buy price from table_price with product_id (always with last buy price, ordered by column buy_date)
Please help me. I've tried so many queries and I still can't get satisfying results.
A correlated subquery using keep is possibly the most performant method:
select ed.*,
(select max(p.buy_price) keep (dense_rank first order by p.buy_date desc)
from table_price p
where p.product_id = ed.product_id
) as last_buy_price
from expire_date ed;
You could, of course, also express this in the from clause:
select ed.*, p.last_buy_price
from expire_date ed left join
(select p.product_id,
max(p.buy_price) keep (dense_rank first order by p.buy_date desc) as last_buy_price
from table_price p
group by p.product_id
) p
on p.product_id = ed.product_id;
You can use ROW_NUMBER() :
SELECT ed.*,
tp.buy_price as last_buy_price
FROM expire_date ed
JOIN(SELECT s.*,
ROW_NUMBER() OVER(PARTITION BY s.product_id ORDER BY s.buy_date DESC) as rnk
FROM table_price s) tp
ON(ed.product_id = tp.product_id and tp.rnk = 1 )
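A minimal sketch of the ROW_NUMBER() join, run in SQLite through Python's sqlite3 rather than Oracle, so it only illustrates the logic. The dates are kept as the YY.MM.DD strings from the question, which happen to sort correctly as text, and the Count column is quoted because the name collides with a built-in function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_price (Product_ID INTEGER, Buy_date TEXT, Buy_price REAL)")
conn.execute('CREATE TABLE expire_date (Product_ID INTEGER, "Count" INTEGER, Exp_date TEXT)')
conn.executemany(
    "INSERT INTO table_price VALUES (?, ?, ?)",
    [(1, "16.10.01", 2.50), (1, "16.11.02", 3.20), (2, "16.10.31", 3.80)],
)
conn.executemany(
    "INSERT INTO expire_date VALUES (?, ?, ?)",
    [(1, 1000, "17.10.01"), (1, 500, "17.11.31"), (2, 500, "17.11.01")],
)

query = """
SELECT ed.Product_ID, ed."Count", ed.Exp_date, tp.Buy_price AS last_buy_price
FROM expire_date ed
JOIN (SELECT s.*,
             -- rank purchases per product, most recent first
             ROW_NUMBER() OVER (PARTITION BY s.Product_ID ORDER BY s.Buy_date DESC) AS rnk
      FROM table_price s) tp
  ON ed.Product_ID = tp.Product_ID AND tp.rnk = 1
ORDER BY ed.Product_ID, ed.Exp_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With an inner join, a product that has no row in table_price disappears from the result; a LEFT JOIN keeps it with a NULL last_buy_price.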