How to compare two dates in the same column in SQL

I have to compare two dates that are in one column of the table. For each row I need to find the date before and the date after that row's date, and show all three in separate columns.
I wrote this code, but it's totally wrong:
CREATE VIEW
AS
SELECT (CASE
WHEN T1.BuyDate > T2.BuyDate THEN T1.BuyDate END)
AS PreviousBuyDate, T1.ItemName, T1.BuyDate,
(CASE
WHEN T1.BuyDate > T2.BuyDate THEN T1.BuyDate END)
AS NextDate
FROM FoodSara_tbl T1 , FoodSara_tbl T2
GO
Input:
|ItemName | BuyDate | ItemOrigin |
|---------|---------|------------|
| cake |2020-10-2| UK |
| coca |2020-5-2 | US |
| cake |2019-10-6| UK |
| coca |2020-12-2| US |
Output:
|PreviousDate | ItemName | BuyDate |NextDate |
|-------------|----------|---------|---------|
| NULL |cake |2019-10-6|2020-10-2|
| NULL |coca |2020-5-2 |2020-12-2|
|2019-10-6 |cake |2020-10-2| NULL |
| 2020-5-2 |coca |2020-12-2| NULL |
PS: The dates have to come out in order.

Try this with the LAG function (the second LAG sorts descending, which makes it behave like LEAD):
select LAG(BuyDate,1) OVER (PARTITION BY ItemName ORDER BY BuyDate asc) previous_date
, ItemName
, BuyDate
, LAG(BuyDate,1) OVER (PARTITION BY ItemName ORDER BY BuyDate desc) next_date
from FoodSara_tbl
See the final result: sqlfiddle
OR
Use LAG and LEAD function:
select LAG(BuyDate,1) OVER (PARTITION BY ItemName order by BuyDate) previous_date
, ItemName
, BuyDate
, LEAD(BuyDate,1) OVER (PARTITION BY ItemName order by BuyDate) next_date
from FoodSara_tbl
See the final result: sqlfiddle
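Both variants can be sanity-checked from Python with SQLite (window functions need SQLite 3.25+). This is a sketch that recreates the question's table in memory; the dates are zero-padded here so that text ordering matches chronological ordering:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FoodSara_tbl (ItemName TEXT, BuyDate TEXT, ItemOrigin TEXT);
INSERT INTO FoodSara_tbl VALUES
  ('cake', '2020-10-02', 'UK'),
  ('coca', '2020-05-02', 'US'),
  ('cake', '2019-10-06', 'UK'),
  ('coca', '2020-12-02', 'US');
""")

# LAG looks at the previous row, LEAD at the next row, within each ItemName
rows = conn.execute("""
SELECT LAG(BuyDate)  OVER (PARTITION BY ItemName ORDER BY BuyDate) AS PreviousDate,
       ItemName,
       BuyDate,
       LEAD(BuyDate) OVER (PARTITION BY ItemName ORDER BY BuyDate) AS NextDate
FROM FoodSara_tbl
ORDER BY ItemName, BuyDate
""").fetchall()

for row in rows:
    print(row)
```

LAG over a descending sort (the first answer) produces the same NextDate column as LEAD over an ascending sort; the LEAD version is usually easier to read.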

Related

SQL Query to get 2nd most recent results with multiple columns

I am trying to obtain the 2nd most recent results for all distinct part_ids (based on order_date) to go into a report I am making, so I can compare them to the most recent results.
The commented sections are from another approach I tried without success.
Any help is greatly appreciated!
(Side note: I am new to posting on SO, and I apologize in advance if this is answered elsewhere, but I was unable to find anything that pertained to this issue.)
I am using the following query:
SELECT
PURCHASE_ORDER.ORDER_DATE
, PURC_ORDER_LINE.PART_ID
, PURCHASE_ORDER.VENDOR_ID
, PURC_ORDER_LINE.LINE_STATUS
, PURC_ORDER_LINE.ORDER_QTY
, PURC_ORDER_LINE.UNIT_PRICE
--, ROW_NUMBER() over (ORDER BY PURCHASE_ORDER.ORDER_DATE DESC)AS ROW
, CAST (PURC_ORDER_LINE.ORDER_QTY * PURC_ORDER_LINE.UNIT_PRICE AS VARCHAR) AS TOTAL_COST
FROM
PURCHASE_ORDER
INNER JOIN
PURC_ORDER_LINE
ON
PURCHASE_ORDER.ID = PURC_ORDER_LINE.PURC_ORDER_ID
WHERE PURCHASE_ORDER.ORDER_DATE < (SELECT MAX(ORDER_DATE) FROM PURCHASE_ORDER) AND PURC_ORDER_LINE.PART_ID = 'XXXX'
ORDER BY ORDER_DATE DESC
--WHERE PURC_ORDER_LINE.PART_ID = 'XXXX' and PURCHASE_ORDER.ORDER_DATE = (SELECT MAX(ORDER_DATE) FROM PURCHASE_ORDER WHERE ORDER_DATE < (SELECT MAX(ORDER_DATE) FROM PURCHASE_ORDER))
EDIT 5/28 LATE NIGHT:
Let's say below is the data set. I need the 2nd result from each part_id (2nd based on ORDER_DATE DESC):
+-------------+---------+-----------+
| ORDER_DATE | PART_ID | VENDOR_ID |
+-------------+---------+-----------+
| 2020-05-29 | XXXX | CVVB |
| 2020-05-27 | XXXX | CVVB |
| 2020-05-28 | XXXX | CVVA |
| 2020-05-28 | YYYY | GGNB |
| 2020-04-12 | YYYY | GGNB |
| 2020-02-08 | YYYY | GGNB |
| 2020-05-28 | ZZZZ | LLNB |
| 2019-10-28 | ZZZZ | LLNB |
| 2019-05-27 | ZZZZ | OKIJ |
+-------------+---------+-----------+
I am looking to receive the following output (for more than 3 different part id's):
+------------+---------+-----------+
| ORDER_DATE | PART_ID | VENDOR_ID |
+------------+---------+-----------+
| 2020-05-28 | XXXX | CVVA |
| 2020-04-12 | YYYY | GGNB |
| 2019-10-28 | ZZZZ | LLNB |
+------------+---------+-----------+
There are also additional columns in the query, but formatting them as a table would have taken much longer, so I have left a few of the columns off the examples.
ANOTHER EDIT
I am not sure if this information helps, but I am trying to compare the most recent results to the previous results to show pricing and vendor differences. We are compiling the data into Report Builder; My approach here was to create 2 separate datasets one with the most recent and the other with the 2nd most recent and combine the data from the datasets in Report Builder. If there is an easier approach and I am heading in the wrong direction please let me know!
Example:
+------------+---------+-----------+-------------+----------+------------+
| ORDER_DATE | PART_ID | VENDOR_ID | Porder_Date | Ppart_ID | pVendor_id |
+------------+---------+-----------+-------------+----------+------------+
| 2020-05-29 | XXXX | CVVB | 2020-05-28 | XXXX | CVVA |
| 2020-05-28 | YYYY | GGNB | 2020-04-12 | YYYY | GGNB |
| 2020-05-28 | ZZZZ | LLNB | 2019-10-28 | ZZZZ | LLNB |
+------------+---------+-----------+-------------+----------+------------+
EDIT THE NEXT MORNING
Thanks everyone for all the help! After Harry posted his solution, I made some tiny edits to add the columns I needed: I swapped his union portion for the original select statement. Everything here seems to be exactly what I am looking for!
Code:
;
WITH mycte AS
(
SELECT
PURCHASE_ORDER.ORDER_DATE
, PURC_ORDER_LINE.PART_ID
, PURCHASE_ORDER.VENDOR_ID
, PURC_ORDER_LINE.LINE_STATUS
, PURC_ORDER_LINE.ORDER_QTY
, PURC_ORDER_LINE.UNIT_PRICE
, CAST (PURC_ORDER_LINE.ORDER_QTY * PURC_ORDER_LINE.UNIT_PRICE AS VARCHAR) AS TOTAL_COST
FROM
PURCHASE_ORDER
INNER JOIN
PURC_ORDER_LINE
ON
PURCHASE_ORDER.ID = PURC_ORDER_LINE.PURC_ORDER_ID
)
, mycte2 AS
(
SELECT
CONVERT(DATE,order_date) AS order_date
, part_id
, vendor_id
, order_qty
, unit_price
, total_cost
, ROW_NUMBER() over(
PARTITION BY part_id
ORDER BY
CONVERT(DATE,order_date) DESC) AS row_num
FROM
mycte
)
SELECT
mycte2.order_date
, mycte2.part_id
, mycte2.vendor_id
, mycte2.order_qty
, mycte2.unit_price
, mycte2.total_cost
, previous.order_date porder_date
, previous.part_id ppart_id
, previous.vendor_id pvendor_id
, previous.order_qty poqrder_qty
, previous.unit_price punit_price
, previous.total_cost ptotal_cost
FROM
mycte2
LEFT JOIN
mycte2 previous
ON
previous.row_num = mycte2.row_num +1
AND mycte2.part_id = previous.part_id
WHERE
mycte2.row_num = 1
Based on the data you have provided, you can do this with a CTE and the ROW_NUMBER function.
Note: it usually helps to show the whole picture rather than just the part you want help with, as it is easier to answer when we can understand the entire issue!
See code below
;with mycte as (
select
'2020-05-29' as order_date , 'XXXX' as part_id , 'CVVB' as vendor_id
union all select
'2020-05-27' , 'XXXX' , 'CVVB'
union all select
'2020-05-28' , 'XXXX' , 'CVVA'
union all select
'2020-05-28' , 'YYYY' , 'GGNB'
union all select
'2020-04-12' , 'YYYY' , 'GGNB'
union all select
'2020-02-08' , 'YYYY' , 'GGNB'
union all select
'2020-05-28' , 'ZZZZ' , 'LLNB'
union all select
'2019-10-28' , 'ZZZZ' , 'LLNB'
union all select
'2019-05-27' , 'ZZZZ' , 'OKIJ'
)
, mycte2 as (
Select
convert(date,order_date) as order_date
,part_id
,vendor_id
,ROW_NUMBER() over( partition by part_id order by convert(date,order_date) desc) as row_num
from mycte
)
Select
mycte2.order_date
,mycte2.part_id
,mycte2.vendor_id
,previous.order_date porder_date
,previous.part_id ppart_id
,previous.vendor_id pvendor_id
from mycte2
left join mycte2 previous
on previous.row_num = mycte2.row_num +1
and mycte2.part_id = previous.part_id
where mycte2.row_num = 1
result
I think something like this would work to get the second most recent order:
;WITH cteOrders AS (
SELECT ROW_NUMBER() OVER (ORDER BY Order_Date DESC) AS row_num,
PURCHASE_ORDER.ORDER_DATE
, PURC_ORDER_LINE.PART_ID
, PURCHASE_ORDER.VENDOR_ID
, PURC_ORDER_LINE.LINE_STATUS
, PURC_ORDER_LINE.ORDER_QTY
, PURC_ORDER_LINE.UNIT_PRICE
FROM PURCHASE_ORDER
INNER JOIN PURC_ORDER_LINE ON PURCHASE_ORDER.ID = PURC_ORDER_LINE.PURC_ORDER_ID
WHERE PURCHASE_ORDER.ORDER_DATE < (SELECT MAX(ORDER_DATE) FROM PURCHASE_ORDER) AND PURC_ORDER_LINE.PART_ID = 'XXXX'
)
SELECT * FROM cteOrders WHERE row_num = 2
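A combined sketch of the ROW_NUMBER approach, run against the sample data in SQLite from Python (table and column names are shortened here; `PARTITION BY part_id` plus `row_num = 2` picks the 2nd most recent order for every part at once):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_date TEXT, part_id TEXT, vendor_id TEXT);
INSERT INTO orders VALUES
  ('2020-05-29','XXXX','CVVB'), ('2020-05-27','XXXX','CVVB'),
  ('2020-05-28','XXXX','CVVA'), ('2020-05-28','YYYY','GGNB'),
  ('2020-04-12','YYYY','GGNB'), ('2020-02-08','YYYY','GGNB'),
  ('2020-05-28','ZZZZ','LLNB'), ('2019-10-28','ZZZZ','LLNB'),
  ('2019-05-27','ZZZZ','OKIJ');
""")

# rank each part's orders newest-first, then keep rank 2
rows = conn.execute("""
WITH ranked AS (
  SELECT order_date, part_id, vendor_id,
         ROW_NUMBER() OVER (PARTITION BY part_id
                            ORDER BY order_date DESC) AS row_num
  FROM orders
)
SELECT order_date, part_id, vendor_id
FROM ranked
WHERE row_num = 2
ORDER BY part_id
""").fetchall()

for row in rows:
    print(row)
```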

Cumulated sum based on condition in other column

I would like to create a view based on data in following structure:
CREATE TABLE my_table (
date date,
daily_cumulative_precip float4
);
INSERT INTO my_table (date, daily_cumulative_precip)
VALUES
('2016-07-28', 3.048)
, ('2016-08-04', 2.286)
, ('2016-08-11', 5.334)
, ('2016-08-12', 0.254)
, ('2016-08-13', 2.794)
, ('2016-08-14', 2.286)
, ('2016-08-15', 3.302)
, ('2016-08-17', 3.81)
, ('2016-08-19', 15.746)
, ('2016-08-20', 46.739998);
I would like to accumulate the precipitation for consecutive days only.
Below is the desired result for a different test case - except that days without rain should be omitted:
I have tried window functions with OVER(PARTITION BY date, rain_on_day) but they do not yield the desired result.
How could I solve this?
SELECT date
, dense_rank() OVER (ORDER BY grp) AS consecutive_group_nr -- optional
, daily_cumulative_precip
, sum(daily_cumulative_precip) OVER (PARTITION BY grp ORDER BY date) AS cum_precipitation_mm
FROM (
SELECT date, t.daily_cumulative_precip
, row_number() OVER (ORDER BY date) - t.rn AS grp
FROM (
SELECT generate_series (min(date), max(date), interval '1 day')::date AS date
FROM my_table
) d
LEFT JOIN (SELECT *, row_number() OVER (ORDER BY date) AS rn FROM my_table) t USING (date)
) x
WHERE daily_cumulative_precip > 0
ORDER BY date;
db<>fiddle here
Returns all rainy days with cumulative sums for consecutive days (and a running group number).
Basics:
Select longest continuous sequence
Here's a way to calculate cumulative precipitation without having to explicitly enumerate all dates:
SELECT date, daily_cumulative_precip, sum(daily_cumulative_precip) over (partition by group_num order by date) as cum_precip
FROM
(SELECT date, daily_cumulative_precip, sum(start_group) over (order by date) as group_num
FROM
(SELECT date, daily_cumulative_precip, CASE WHEN (date != prev_date + 1) THEN 1 ELSE 0 END as start_group
FROM
(SELECT date, daily_cumulative_precip, lag(date, 1, '-infinity'::date) over (order by date) as prev_date
FROM my_table) t1) t2) t3
yields
| date | daily_cumulative_precip | cum_precip |
|------------+-------------------------+------------|
| 2016-07-28 | 3.048 | 3.048 |
| 2016-08-04 | 2.286 | 2.286 |
| 2016-08-11 | 5.334 | 5.334 |
| 2016-08-12 | 0.254 | 5.588 |
| 2016-08-13 | 2.794 | 8.382 |
| 2016-08-14 | 2.286 | 10.668 |
| 2016-08-15 | 3.302 | 13.97 |
| 2016-08-17 | 3.81 | 3.81 |
| 2016-08-19 | 15.746 | 15.746 |
| 2016-08-20 | 46.74 | 62.486 |
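Here is a runnable sketch of the same lag-based grouping in SQLite from Python; `julianday()` stands in for Postgres date subtraction, and the running sum of "group start" flags reproduces the islands (results match the table above up to floating-point rounding):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (date TEXT, daily_cumulative_precip REAL);
INSERT INTO my_table VALUES
  ('2016-07-28', 3.048), ('2016-08-04', 2.286), ('2016-08-11', 5.334),
  ('2016-08-12', 0.254), ('2016-08-13', 2.794), ('2016-08-14', 2.286),
  ('2016-08-15', 3.302), ('2016-08-17', 3.81),  ('2016-08-19', 15.746),
  ('2016-08-20', 46.74);
""")

rows = conn.execute("""
WITH flagged AS (
  -- 1 marks the start of a new island of consecutive days
  SELECT date, daily_cumulative_precip,
         CASE WHEN julianday(date)
                 - julianday(LAG(date) OVER (ORDER BY date)) = 1
              THEN 0 ELSE 1 END AS start_group
  FROM my_table
),
grouped AS (
  -- running sum of the flags numbers the islands
  SELECT date, daily_cumulative_precip,
         SUM(start_group) OVER (ORDER BY date) AS grp
  FROM flagged
)
SELECT date, daily_cumulative_precip,
       SUM(daily_cumulative_precip) OVER (PARTITION BY grp
                                          ORDER BY date) AS cum_precip
FROM grouped
ORDER BY date
""").fetchall()

for row in rows:
    print(row)
```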

Using the last_value function on every column | Downfilling all nulls in a table

I have an individual-level table, ordered by Person_ID and Date, ascending. There are duplicate entries at the Person_ID level. What I would like to do is "downfill" null values across every column -- my impression is that the last_value(... ignore nulls) function will work perfectly for each column.
A major problem is that the table is hundreds of columns wide and quite dynamic (feature creation for ML experiments). There has to be a better way than writing out a last_value statement for each variable, something like this:
SELECT last_value(var1 ignore nulls) OVER (PARTITION BY Person_ID ORDER BY Date ASC
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as Var1,
last_value(var2 ignore nulls) OVER (PARTITION BY Person_ID ORDER BY Date ASC
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as Var2,
...
last_value(var300 ignore nulls) OVER (PARTITION BY Person_ID ORDER BY Date ASC
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as Var300
FROM TABLE
In summary, I have the following table:
+----------+-----------+------+------+---+------------+
| PersonID | YearMonth | Var1 | Var2 | … | Var300 |
+----------+-----------+------+------+---+------------+
| 1 | 200901 | 2 | null | | null |
| 1 | 200902 | null | 1 | | Category 1 |
| 1 | 201010 | null | 1 | | null |
+----------+-----------+------+------+---+------------+
and desire the following table:
+----------+-----------+------+------+---+------------+
| PersonID | YearMonth | Var1 | Var2 | … | Var300 |
+----------+-----------+------+------+---+------------+
| 1 | 200901 | 2 | null | | null |
| 1 | 200902 | 2 | 1 | | Category 1 |
| 1 | 201010 | 2 | 1 | | Category 1 |
+----------+-----------+------+------+---+------------+
I don't see any great options for you, but here are two approaches you might look into.
OPTION 1 -- Recursive CTE
In this approach, you use a recursive query, where each child value equals itself or, if it is null, its parent's value. Like so:
WITH
ordered AS (
SELECT yt.*,
row_number() over ( partition by yt.personid order by yt.yearmonth ) rn
FROM YOUR_TABLE yt),
downfilled ( personid, yearmonth, var1, var2, ..., var300, rn) as (
SELECT o.*
FROM ordered o
WHERE o.rn = 1
UNION ALL
SELECT c.personid, c.yearmonth,
nvl(c.var1, p.var1) var1,
nvl(c.var2, p.var2) var2,
...
nvl(c.var300, p.var300) var300
FROM downfilled p INNER JOIN ordered c ON c.personid = p.personid AND c.rn = p.rn + 1 )
SELECT * FROM downfilled
ORDER BY personid, yearmonth;
This replaces each expression like this:
last_value(var2) OVER (PARTITION BY Person_ID ORDER BY Date ASC
RANGE BETWEEN UNBOUNDED PRECEDING) as Var2
with an expression like this:
NVL(c.var2, p.var2)
One downside, though, is that this makes you repeat the list of 300 columns twice: once for the 300 NVL() expressions and once to specify the output columns of the recursive CTE (downfilled).
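For what it's worth, the recursive-CTE idea ports outside Oracle too. A minimal SQLite sketch from Python, with NVL() replaced by the standard COALESCE() and the table trimmed to two variables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE obs (personid INT, yearmonth TEXT, var1 INT, var2 TEXT);
INSERT INTO obs VALUES
  (1, '200901', 2,    NULL),
  (1, '200902', NULL, 'Category 1'),
  (1, '201010', NULL, NULL);
""")

# anchor = first row per person; each later row inherits its parent's
# value wherever its own value is NULL
rows = conn.execute("""
WITH RECURSIVE ordered AS (
  SELECT obs.*,
         ROW_NUMBER() OVER (PARTITION BY personid ORDER BY yearmonth) AS rn
  FROM obs
),
downfilled(personid, yearmonth, var1, var2, rn) AS (
  SELECT personid, yearmonth, var1, var2, rn FROM ordered WHERE rn = 1
  UNION ALL
  SELECT c.personid, c.yearmonth,
         COALESCE(c.var1, p.var1),
         COALESCE(c.var2, p.var2),
         c.rn
  FROM downfilled p
  JOIN ordered c ON c.personid = p.personid AND c.rn = p.rn + 1
)
SELECT personid, yearmonth, var1, var2
FROM downfilled
ORDER BY personid, yearmonth
""").fetchall()

for row in rows:
    print(row)
```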
OPTION 2 -- UNPIVOT and PIVOT again
In this approach, you UNPIVOT your VARxx columns into rows, so that you only need to write the last_value()... expression one time.
WITH unp AS (
SELECT personid,
yearmonth,
var_column,
last_value(var_value ignore nulls)
over ( partition by personid, var_column order by yearmonth ) var_value
FROM YOUR_TABLE
UNPIVOT INCLUDE NULLS ( var_value FOR var_column IN ("VAR1","VAR2","VAR3") )
)
SELECT * FROM unp
PIVOT ( max(var_value) FOR var_column IN ('VAR1' AS VAR1, 'VAR2' AS VAR2, 'VAR3' AS VAR3 ) )
Here you still need to list each column twice. Also, I'm not sure what performance will be like if you have a large data set.

Redshift window function for change in column

I have a Redshift table with, amongst other things, a user_id and a plan_type column. I would like a window function that groups on changes in plan_type, so that if this is the data, for example:
| user_id | plan_type | created |
|---------|-----------|------------|
| 1 | A | 2019-01-01 |
| 1 | A | 2019-01-02 |
| 1 | B | 2019-01-05 |
| 2 | A | 2019-01-01 |
| 2 | A | 2019-01-05 |
I would like a result like this where I get the first date that the plan_type was "new":
| user_id | plan_type | created |
|---------|-----------|------------|
| 1 | A | 2019-01-01 |
| 1 | B | 2019-01-05 |
| 2 | A | 2019-01-01 |
Is this possible with window functions?
EDIT
Since I have some garbage in the data (plan_type can sometimes be null), and the accepted solution does not include the first row, I had to make some modifications. Hopefully this will help other people with similar issues. The final query is as follows:
SELECT * FROM
(
SELECT
user_id,
plan_type,
created_at,
lag(plan_type) OVER (PARTITION by user_id ORDER BY created_at) as prev_plan,
row_number() OVER (PARTITION by user_id ORDER BY created_at) as rownum
FROM tablename
WHERE plan_type IS NOT NULL
) userHistory
WHERE
userHistory.plan_type <> userHistory.prev_plan
OR userHistory.rownum = 1
ORDER BY created_at;
The plan_type IS NOT NULL filters out bad data at the source table, and the outer WHERE clause gets any changes OR the first row of data, which would not be included otherwise.
ALSO BE CAREFUL about the created_at timestamp if you are working off your prev_plan field, since it would of course give you the time of the new value!
This is a gaps-and-islands problem. I think lag() is the simplest approach:
select user_id, plan_type, created
from (select t.*,
lag(plan_type) over (partition by user_id order by created) as prev_plan_type
from t
) t
where prev_plan_type is null or prev_plan_type <> plan_type;
This assumes that plan types can move back to another value and you want each one.
If not, just use aggregation:
select user_id, plan_type, min(created)
from t
group by user_id, plan_type;
Use the row_number() window function:
select * from
(select *, row_number() over(partition by user_id, plan_type order by created) rn
from tablename
) a where a.rn=1
Use lag():
select * from
(
select user_id, plan_type, lag(plan_type) over (partition by user_id order by created) as changes, created
from tablename
)A where plan_type<>changes and changes is not null
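A quick check of the lag() comparison against the sample data, using SQLite from Python (a sketch; the table name `plans` is made up, and the `prev_plan IS NULL` branch keeps the first row per user):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE plans (user_id INT, plan_type TEXT, created TEXT);
INSERT INTO plans VALUES
  (1, 'A', '2019-01-01'), (1, 'A', '2019-01-02'), (1, 'B', '2019-01-05'),
  (2, 'A', '2019-01-01'), (2, 'A', '2019-01-05');
""")

# keep a row when it is the user's first row or the plan differs from
# the previous row's plan
rows = conn.execute("""
SELECT user_id, plan_type, created
FROM (
  SELECT user_id, plan_type, created,
         LAG(plan_type) OVER (PARTITION BY user_id
                              ORDER BY created) AS prev_plan
  FROM plans
)
WHERE prev_plan IS NULL OR prev_plan <> plan_type
ORDER BY user_id, created
""").fetchall()

for row in rows:
    print(row)
```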

How to select rows and nearby rows with specific conditions

I have a table (Trans) of values like
OrderID (unique) | CustID | OrderDate| TimeSinceLast|
------------------------------------------------------
123a | A01 | 20.06.18 | 20 |
123y | B05 | 20.06.18 | 31 |
113k | A01 | 18.05.18 | NULL | <------- need this
168x | C01 | 17.04.18 | 8 |
999y | B05 | 15.04.18 | NULL | <------- need this
188k | A01 | 15.04.18 | 123 |
678a | B05 | 16.03.18 | 45 |
What I need is to select the rows where TimeSinceLast is null, as well as a row preceding and following where TimeSinceLast is not null, grouped by custID
I'd need my final table to look like:
OrderID (unique) | CustID | OrderDate| TimeSinceLast|
------------------------------------------------------
123a | A01 | 20.06.18 | 20 |
113k | A01 | 18.05.18 | NULL |
188k | A01 | 15.04.18 | 123 |
123y | B05 | 20.06.18 | 31 |
999y | B05 | 15.04.18 | NULL |
678a | B05 | 16.03.18 | 45 |
The main problem is that TimeSinceLast is not reliable and, for whatever reason, does not correctly calculate the days since the last order, so I cannot use it in a query to find the preceding or following row.
I looked around and found something like this on this forum:
with dt as
(select distinct custID, OrderID,
max (case when timeSinceLast is null then OrderID end)
over(partition by custID order by OrderDate
rows between 1 preceding and 1 following) as NullID
from Trans)
select *
from dt
where request_id between NullID -1 and NullID+1
But it does not work well for my purposes. Also, it looks like the max function cannot handle missing values.
Many thanks
Use lead() and lag().
What I need is to select the rows where TimeSinceLast is null, as well as a row preceding and following where TimeSinceLast is not null.
First, the ordering is a little unclear. Your sample data and code do not match. The following assumes some combination of the date and orderid, but there may be other columns that better capture what you mean by "preceding" and "following".
This is a little tricky, because you don't want to always include the first and last rows -- unless necessary. So, look at two columns:
select t.*
from (select t.*,
lead(TimeSinceLast) over (partition by custid order by orderdate, orderid) as next_tsl,
lag(TimeSinceLast) over (partition by custid order by orderdate, orderid) as prev_tsl,
lead(orderid) over (partition by custid order by orderdate, orderid) as next_orderid,
lag(orderid) over (partition by custid order by orderdate, orderid) as prev_orderid
from t
) t
where TimeSinceLast is null or
(next_tsl is null and next_orderid is not null) or
(prev_tsl is null and prev_orderid is not null);
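Here is that lead/lag neighbour test run against the sample data in SQLite from Python (a sketch; the filter keeps each NULL row plus its immediate neighbours, using the orderid lead/lag columns to avoid picking up partition edges):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trans (orderid TEXT, custid TEXT, orderdate TEXT, timesincelast INT);
INSERT INTO trans VALUES
  ('123a','A01','2018-06-20',20),   ('123y','B05','2018-06-20',31),
  ('113k','A01','2018-05-18',NULL), ('168x','C01','2018-04-17',8),
  ('999y','B05','2018-04-15',NULL), ('188k','A01','2018-04-15',123),
  ('678a','B05','2018-03-16',45);
""")

rows = conn.execute("""
SELECT orderid, custid, orderdate, timesincelast
FROM (
  SELECT t.*,
         LEAD(timesincelast) OVER (PARTITION BY custid ORDER BY orderdate, orderid) AS next_tsl,
         LAG(timesincelast)  OVER (PARTITION BY custid ORDER BY orderdate, orderid) AS prev_tsl,
         LEAD(orderid) OVER (PARTITION BY custid ORDER BY orderdate, orderid) AS next_orderid,
         LAG(orderid)  OVER (PARTITION BY custid ORDER BY orderdate, orderid) AS prev_orderid
  FROM trans t
)
WHERE timesincelast IS NULL                            -- the NULL row itself
   OR (next_tsl IS NULL AND next_orderid IS NOT NULL)  -- row just before a NULL
   OR (prev_tsl IS NULL AND prev_orderid IS NOT NULL)  -- row just after a NULL
ORDER BY custid, orderdate DESC
""").fetchall()

for row in rows:
    print(row)
```

The orderid checks matter: LAG/LEAD also return NULL at the first and last row of each partition, and without them customer C01 (which has no NULL TimeSinceLast at all) would leak into the result.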
Use APPLY:
DECLARE @TransTable TABLE (OrderID char(4), CustID char(3), OrderDate date, TimeSinceLast int)
INSERT @TransTable VALUES
('123a', 'A01', '06.20.2018', 20),
('123y', 'B05', '06.20.2018' ,31),
('113k', 'A01', '05.18.2018' ,NULL), ------- need this
('168x', 'C01', '04.17.2018' ,8),
('999y', 'B05', '04.15.2018' ,NULL), ------- need this
('188k', 'A01', '04.15.2018' ,123),
('678a', 'B05', '03.16.2018' ,45)
SELECT B.OrderID, B.CustID, B.OrderDate, B.TimeSinceLast
FROM @TransTable A
CROSS APPLY (
SELECT 0 AS rn, A.OrderID, A.CustID, A.OrderDate, A.TimeSinceLast
UNION ALL
SELECT TOP 2 ROW_NUMBER() OVER (PARTITION BY CASE WHEN T.OrderDate > A.OrderDate THEN 1 ELSE 0 END ORDER BY ABS(DATEDIFF(day, T.OrderDate, A.OrderDate))) rn,
T.OrderID, T.CustID, T.OrderDate, T.TimeSinceLast
FROM @TransTable T
WHERE T.CustID = A.CustID AND T.OrderID <> A.OrderID
ORDER BY rn
) B
WHERE A.TimeSinceLast IS NULL
ORDER BY B.CustID, B.OrderDate DESC