Difference between current and previous column using OLAP functions

I was asked to create a report (using Teradata SQL OLAP functions) as below
EMPL_ID | perd_end_d | pdct_I | Year to date sal Amnt | Diff in sale amnt from Prev month
-------------------------------------------------------------------------------------------
I was given the following "sales" dataset and I have to calculate the "year-to-date sale amount" and the "difference between the current and previous month's sale amount".
empl_id| perd_end_d | pdct_I|sale_amnt|
----------------------------------------
E1001 | 31-01-2010 | P2003 | 2,03 |
E1003 | 31-01-2010 | P2015 | 44 |
E1003 | 31-01-2010 | P2004 | 67,6 |
E1001 | 31-01-2010 | P2002 | 135 |
E1003 | 31-01-2010 | P2003 | 545 |
E1001 | 31-01-2010 | P2001 | 1,00 |
E1002 | 31-01-2010 | P2005 | 23 |
E1002 | 31-01-2010 | P2007 | 343 |
E1006 | 28-02-2010 | P2005 | 34 |
E1006 | 28-02-2010 | P2004 | 43 |
E1001 | 28-02-2010 | P2003 | 54 |
E1001 | 28-02-2010 | P2002 | 878 |
E1003 | 28-02-2010 | P2008 | 434 |
E1001 | 28-02-2010 | P2001 | 66 |
E1007 | 28-02-2010 | P2009 | 455 |
E1007 | 28-02-2010 | P2009 | 4,54 |
E1003 | 28-02-2010 | P2007 | 56 |
E1008 | 28-02-2010 | P2009 | 786 |
E1010 | 31-01-2011 | P2001 | 300 |
E1001 | 31-01-2011 | P2002 | 200 |
E1009 | 31-01-2011 | P2003 | 100 |
E1011 | 31-01-2012 | P2004 | 700 |
E1002 | 31-01-2012 | P2005 | 400 |
E1011 | 31-01-2012 | P2003 | 600 |
E1002 | 31-01-2012 | P2007 | 500 |
---------------------------------------
I want something like below
empl_id| perd_end_d | pdct_I|sale_amnt| diff(cur_mnt_sal - prev_mnt_sal)
-------------------------------------------------------------------------
E1001 | 31-01-2010 | P2003 | 2,03 | 203 -- or may be null
E1003 | 31-01-2010 | P2015 | 44 | 159
E1003 | 31-01-2010 | P2004 | 67,6 | 632
E1001 | 31-01-2010 | P2002 | 135 | 541
E1003 | 31-01-2010 | P2003 | 545 | 410
...
So far I have managed to get the required result, but the query looks ugly. How can I improve the following solution?
SELECT perd_end_d
, pdct_I
, sale_amnt
, ABS( SUM(sale_amnt) over (partition by perd_end_d
order by perd_end_d
rows between 1 preceding and 1 preceding )
- SUM(sale_amnt) over (partition by perd_end_d
order by perd_end_d
rows current row ) )"prev_mnt_sal - cur_mnt_sal"
from sandbox.sales;
and the result set is as follows

SELECT perd_end_d
, pdct_I
, sale_amnt
, ABS( min(sale_amnt) over (partition by perd_end_d
order by perd_end_d
rows between 1 preceding and 1 preceding )
- sale_amnt) as "prev_mnt_sal - cur_mnt_sal"
from sandbox.sales;
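
The "value of the previous row" pattern used above can also be written with LAG. The following is a minimal sketch rather than the poster's Teradata query: it runs on SQLite through Python's sqlite3 module, assumes SQLite 3.25 or later for window functions, reads the amounts "2,03" and "67,6" as 203 and 676 (which is what the desired output implies), and adds a hypothetical seq column to pin the row order, because ordering only by perd_end_d leaves the order inside a month undefined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# seq is a hypothetical column that fixes the row order within a month.
conn.execute(
    "CREATE TABLE sales (seq INTEGER, perd_end_d TEXT, pdct_I TEXT, sale_amnt INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, "2010-01-31", "P2003", 203),
        (2, "2010-01-31", "P2015", 44),
        (3, "2010-01-31", "P2004", 676),
        (4, "2010-01-31", "P2002", 135),
        (5, "2010-01-31", "P2003", 545),
    ],
)

# LAG returns the previous row's amount within the partition (NULL for the first row).
rows = conn.execute(
    """
    SELECT pdct_I,
           sale_amnt,
           ABS(sale_amnt - LAG(sale_amnt) OVER (PARTITION BY perd_end_d
                                                ORDER BY seq)) AS diff
    FROM sales
    ORDER BY seq
    """
).fetchall()

for row in rows:
    print(row)
```

The first row of each partition gets NULL, which corresponds to the "or may be null" case in the desired output.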

You probably want something like this:
SELECT empl_id
, perd_end_d
, sum(sale_amnt) as sumsale
-- cumulative sum of sales per employee
, SUM(sumsale)
over (partition by empl_id
order by perd_end_d
rows unbounded preceding)
-- difference between current and previous month per employee
, sumsale -
SUM(sumsale)
over (partition by empl_id
order by perd_end_d
rows between 1 preceding and 1 preceding )
from sandbox.sales
group by 1,2;
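
To try the idea outside Teradata, here is a rough sketch on SQLite via Python's sqlite3 module (assuming SQLite 3.25+). SQLite cannot reference the sumsale alias inside a window function in the same SELECT, so the monthly totals are computed in a CTE first. The rows are a small subset of the question's data, and the names per_month, running_total and diff_prev are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (empl_id TEXT, perd_end_d TEXT, pdct_I TEXT, sale_amnt INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("E1001", "2010-01-31", "P2002", 135),
        ("E1001", "2010-02-28", "P2003", 54),
        ("E1001", "2010-02-28", "P2002", 878),
        ("E1001", "2010-02-28", "P2001", 66),
        ("E1001", "2011-01-31", "P2002", 200),
        ("E1003", "2010-01-31", "P2015", 44),
        ("E1003", "2010-02-28", "P2008", 434),
    ],
)

monthly = conn.execute(
    """
    WITH per_month AS (
        -- one row per employee and month
        SELECT empl_id, perd_end_d, SUM(sale_amnt) AS sumsale
        FROM sales
        GROUP BY empl_id, perd_end_d
    )
    SELECT empl_id,
           perd_end_d,
           sumsale,
           -- cumulative sum of monthly sales per employee
           SUM(sumsale) OVER (PARTITION BY empl_id ORDER BY perd_end_d
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
               AS running_total,
           -- difference between the current and the previous month
           sumsale - LAG(sumsale) OVER (PARTITION BY empl_id
                                        ORDER BY perd_end_d) AS diff_prev
    FROM per_month
    ORDER BY empl_id, perd_end_d
    """
).fetchall()

for row in monthly:
    print(row)
```

Like the query above, this accumulates across all periods of an employee; for a strict year-to-date figure the year would presumably have to be added to the PARTITION BY of the running total.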

Related

How to get Max date and sum of its rows SQL

I have following table,
+------+-------------+----------+---------+
| id   | date        | amount   | amount2 |
+------+-------------+----------+---------+
| 1    | 1/1/2020    | 1000     | 500     |
+------+-------------+----------+---------+
| 1    | 1/3/2020    | 1558     | 100     |
+------+-------------+----------+---------+
| 1    | 1/3/2020    | 126      | 200     |
+------+-------------+----------+---------+
| 2    | 2/5/2020    | 4921     | 500     |
+------+-------------+----------+---------+
| 2    | 2/5/2020    | 15       | 100     |
+------+-------------+----------+---------+
| 2    | 1/1/2020    | 5951     | 140     |
+------+-------------+----------+---------+
| 2    | 1/2/2020    | 1588     | 10      |
+------+-------------+----------+---------+
| 2    | 1/3/2020    | 1568     | 56      |
+------+-------------+----------+---------+
| 2    | 1/4/2020    | 12558    | 45      |
+------+-------------+----------+---------+
I need to get each id's max date and the sums of amount and amount2 for that date. Given the data above, I need the following output:
+------+-------------+----------+---------+
| id   | date        | amount   | amount2 |
+------+-------------+----------+---------+
| 1    | 1/3/2020    | 1684     | 300     |
+------+-------------+----------+---------+
| 2    | 2/5/2020    | 4936     | 600     |
+------+-------------+----------+---------+
How can I do this?
Aggregate and use MAX OVER to get the IDs' maximum dates:
select id, [date], sum_amount, sum_amount2
from
(
select
id, [date], sum(amount) as sum_amount, sum(amount2) as sum_amount2,
max([date]) over (partition by id) as max_date_for_id
from mytable group by id, [date]
) aggregated
where [date] = max_date_for_id
order by id;
First, use dense_rank() to find the rows with the latest date per id:
dense_rank () over (partition by id order by [date] desc)
After that, simply group by id and date and apply sum() to the amounts:
select id, [date], sum(amount), sum(amount2)
from
(
select *,
dr = dense_rank () over (partition by id order by [date] desc)
from your_table
) t
where dr = 1
group by id, [date]
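
As a quick check of the dense_rank approach, the sketch below loads the question's rows into an in-memory SQLite database through Python's sqlite3 module (SQLite 3.25+ assumed). The column name txn_date is a stand-in for [date], and the dates are stored as ISO strings on the assumption that the originals are month/day/year, so that text ordering matches date ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER, txn_date TEXT, amount INTEGER, amount2 INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (1, "2020-01-01", 1000, 500),
        (1, "2020-01-03", 1558, 100),
        (1, "2020-01-03", 126, 200),
        (2, "2020-02-05", 4921, 500),
        (2, "2020-02-05", 15, 100),
        (2, "2020-01-01", 5951, 140),
        (2, "2020-01-02", 1588, 10),
        (2, "2020-01-03", 1568, 56),
        (2, "2020-01-04", 12558, 45),
    ],
)

latest = conn.execute(
    """
    SELECT id, txn_date, SUM(amount), SUM(amount2)
    FROM (
        -- dr = 1 marks every row that carries the latest date of its id
        SELECT m.*,
               DENSE_RANK() OVER (PARTITION BY id ORDER BY txn_date DESC) AS dr
        FROM mytable AS m
    ) AS t
    WHERE dr = 1
    GROUP BY id, txn_date
    ORDER BY id
    """
).fetchall()

print(latest)
```

dense_rank is used rather than row_number because several rows can share the latest date and all of them have to be summed.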

Selecting the first instance of a vendor, part combination

I am trying to create an indicator for if a particular transaction was the first time a part was purchased from a particular vendor.
I have a dataset that looks like this:
| transaction_id | vendor_id | part_id | trans_date |
|:--------------:|:---------:|:-------:|:-----------------:|
| 9Bx*2Pc' | a | 873 | 10/12/2018 |
| 1Po.4Ot, | a | 473 | 4/22/2016 |
| 9Sk"7Kv/ | b | 123 | 7/23/2016 |
| 2Lz&7Hu& | a | 873 | 12/20/2017 |
| 8Lz)5Is# | b | 743 | 10/22/2016 |
| 5Sc'6Jl/ | a | 113 | 10/6/2016 |
| 0Ra&8Hb& | a | 653 | 10/4/2017 |
| 4Wc-8Of* | c | 333 | 8/3/2017 |
| 8Vv+9Yo/ | c | 333 | 12/7/2016 |
| 6Qh!1Ha- | c | 333 | 3/28/2017 |
| 2Ol%4Rs# | c | 333 | 5/2/2017 |
| 1Gg#8Cm% | c | 333 | 11/15/2016 |
| 0Lw(6Pv/ | d | 873 | 8/13/2017 |
| 1Gy/7Zw, | a | 443 | 10/12/2018 |
| 2Gz,4Gp. | b | 103 | 1/5/2018 |
| 5Dj)6Wc+ | a | 893 | 12/17/2016 |
| 5Hl-8Ds! | a | 903 | 12/8/2017 |
| 8Ws$3Vy* | b | 873 | 1/13/2018 |
What I am looking to do is determine if the transaction_id was the first time (sorted by trans_date), that the part_id was purchased from a vendor_id. I would imagine the ideal output to look like this:
| transaction_id | vendor_id | part_id | trans_date | first_time |
|:--------------:|:---------:|:-------:|:-----------------:|:----------:|
| 9Bx*2Pc' | a | 873 | 10/12/2018 | N |
| 1Po.4Ot, | a | 473 | 4/22/2016 | Y |
| 9Sk"7Kv/ | b | 123 | 7/23/2016 | Y |
| 2Lz&7Hu& | a | 873 | 12/20/2017 | Y |
| 8Lz)5Is# | b | 743 | 10/22/2016 | Y |
| 5Sc'6Jl/ | a | 113 | 10/6/2016 | Y |
| 0Ra&8Hb& | a | 653 | 10/4/2017 | Y |
| 4Wc-8Of* | c | 333 | 8/3/2017 | N |
| 8Vv+9Yo/ | c | 333 | 12/7/2016 | N |
| 6Qh!1Ha- | c | 333 | 3/28/2017 | N |
| 2Ol%4Rs# | c | 333 | 5/2/2017 | N |
| 1Gg#8Cm% | c | 333 | 11/15/2016 | Y |
| 0Lw(6Pv/ | d | 873 | 8/13/2017 | Y |
| 1Gy/7Zw, | a | 443 | 10/12/2018 | Y |
| 2Gz,4Gp. | b | 103 | 1/5/2018 | Y |
| 5Dj)6Wc+ | a | 893 | 12/17/2016 | Y |
| 5Hl-8Ds! | a | 903 | 12/8/2017 | Y |
| 8Ws$3Vy* | b | 873 | 1/13/2018 | Y |
So far, I have tried (which was influenced by this post):
WITH
first_instance AS (
SELECT
tbl_trans.*,
ROW_NUMBER() OVER (PARTITION BY vendor_id||part_id ORDER BY trans_date) AS row_nums
FROM
tbl_trans
)
SELECT
x.*,
CASE WHEN y.row_nums = 1 THEN 'Y' ELSE 'N' END AS first_time_indicator
FROM
tbl_trans x
LEFT JOIN first_instance y
But I am met with:
ORA-00905: missing keyword
I have created a SQL FIDDLE with this data and the query thus far for testing. How can I determine whether a transaction was a first-time purchase for a part/vendor combination?
Use window functions:
select t.*,
(case when row_number() over (partition by vendor_id, part_id order by trans_date) = 1
then 'Y' else 'N'
end) as first_time
from tbl_trans t;
You don't need a join.
Besides row_number, other analytic functions can produce the desired result.
For example, you can use the first_value analytic function:
Select t.*,
Case
when first_value(trans_date)
over (partition by vendor_id, part_id order by trans_date) = trans_date
then 'Y'
else 'N'
end as first_time
From your_table t;
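In the same way, you can also use min (note that, unlike row_number, both of these variants flag every transaction that shares the earliest date):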
Select t.*,
Case
when min(trans_date)
over (partition by vendor_id, part_id) = trans_date
then 'Y'
else 'N'
end as first_time
From your_table t;
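
The row_number variant can be tried on a few of the question's rows with the sketch below. It is only an illustration on SQLite via Python's sqlite3 module (SQLite 3.25+ assumed), not Oracle, and the dates are rewritten as ISO strings so that they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_trans "
    "(transaction_id TEXT, vendor_id TEXT, part_id INTEGER, trans_date TEXT)"
)
conn.executemany(
    "INSERT INTO tbl_trans VALUES (?, ?, ?, ?)",
    [
        ("9Bx*2Pc'", "a", 873, "2018-10-12"),
        ("2Lz&7Hu&", "a", 873, "2017-12-20"),
        ("4Wc-8Of*", "c", 333, "2017-08-03"),
        ("8Vv+9Yo/", "c", 333, "2016-12-07"),
        ("1Gg#8Cm%", "c", 333, "2016-11-15"),
        ("0Lw(6Pv/", "d", 873, "2017-08-13"),
    ],
)

# The earliest row of each (vendor_id, part_id) pair gets row number 1.
flags = dict(
    conn.execute(
        """
        SELECT transaction_id,
               CASE WHEN ROW_NUMBER() OVER (PARTITION BY vendor_id, part_id
                                            ORDER BY trans_date) = 1
                    THEN 'Y' ELSE 'N' END AS first_time
        FROM tbl_trans
        """
    ).fetchall()
)

for transaction_id, first_time in flags.items():
    print(transaction_id, first_time)
```

Partitioning by the two columns separately avoids the accidental collisions that the concatenation vendor_id||part_id from the question could in principle produce.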

Find rows with adjoining date ranges and accumulate their durations

My PostgreSQL database stores school vacation, public holiday and weekend dates so that parents can plan their vacations. School vacations are often adjacent to weekends or public holidays. I want to display the total number of non-school days for a school vacation, including any adjacent weekend or public holiday.
Example Data
locations
SELECT id, name, is_federal_state
FROM locations
WHERE is_federal_state = true;
| id | name | is_federal_state |
|----|-------------------|------------------|
| 2 | Baden-Württemberg | true |
| 3 | Bayern | true |
holiday_or_vacation_types
SELECT id, name FROM holiday_or_vacation_types;
| id | name |
|----|-----------------------|
| 1 | Herbst |
| 8 | Wochenende |
"Herbst" is German for "autumn" and "Wochenende" is German for "weekend".
periods
SELECT id, starts_on, ends_on, holiday_or_vacation_type_id
FROM periods
WHERE location_id = 2
ORDER BY starts_on;
| id | starts_on | ends_on | holiday_or_vacation_type_id |
|-----|--------------|--------------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 8 |
| 532 | "2019-10-28" | "2019-10-30" | 1 |
| 533 | "2019-10-31" | "2019-10-31" | 1 |
| 671 | "2019-11-02" | "2019-11-03" | 8 |
| 672 | "2019-11-09" | "2019-11-10" | 8 |
| 673 | "2019-11-16" | "2019-11-17" | 8 |
Task
I want to select all periods where location_id equals 2. And I want to calculate the duration of each period in days. That can be done with this SQL query:
SELECT id, starts_on, ends_on,
(ends_on - starts_on + 1) AS duration,
holiday_or_vacation_type_id
FROM periods
| id | starts_on | ends_on | duration | holiday_or_vacation_type_id |
|-----|--------------|--------------|----------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 2 | 8 |
| 532 | "2019-10-28" | "2019-10-30" | 3 | 1 |
| 533 | "2019-10-31" | "2019-10-31" | 1 | 1 |
| 671 | "2019-11-02" | "2019-11-03" | 2 | 8 |
| 672 | "2019-11-09" | "2019-11-10" | 2 | 8 |
| 673 | "2019-11-16" | "2019-11-17" | 2 | 8 |
Any human looking at the calendar would see that the ids 670 (weekend), 532 (fall vacation) and 533 (fall vacation) are adjacent, so they add up to a 6-day vacation period. So far I compute this in a separate program, but that takes quite a lot of resources (the actual table contains some 500,000 items).
Problem 1
Which SQL query would result in the following output (it adds a real_duration column)? Is that even possible with SQL?
| id | starts_on | ends_on | duration | real_duration | holiday_or_vacation_type_id |
|-----|--------------|--------------|----------|---------------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 2 | 6 | 8 |
| 532 | "2019-10-28" | "2019-10-30" | 3 | 6 | 1 |
| 533 | "2019-10-31" | "2019-10-31" | 1 | 6 | 1 |
| 671 | "2019-11-02" | "2019-11-03" | 2 | 2 | 8 |
| 672 | "2019-11-09" | "2019-11-10" | 2 | 2 | 8 |
| 673 | "2019-11-16" | "2019-11-17" | 2 | 2 | 8 |
Problem 2
Is it possible to list the adjoining periods in a part_of_range field? This would be the result. Can that be done with SQL?
| id | starts_on | ends_on | duration | part_of_range | holiday_or_vacation_type_id |
|-----|--------------|--------------|----------|---------------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 2 | 670,532,533 | 8 |
| 532 | "2019-10-28" | "2019-10-30" | 3 | 670,532,533 | 1 |
| 533 | "2019-10-31" | "2019-10-31" | 1 | 670,532,533 | 1 |
| 671 | "2019-11-02" | "2019-11-03" | 2 | | 8 |
| 672 | "2019-11-09" | "2019-11-10" | 2 | | 8 |
| 673 | "2019-11-16" | "2019-11-17" | 2 | | 8 |
This is a gaps and islands problem. In this case you can use lag() to see where an island starts and then a cumulative sum.
The final operation is some aggregation (using window functions):
SELECT p.*,
(Max(ends_on) OVER (PARTITION BY location_id, grp) - Min(starts_on) OVER (PARTITION BY location_id, grp) ) + 1 AS real_duration,
-- partition by grp as well, so that only the ids of the same island are collected
Array_agg(p.id) OVER (PARTITION BY location_id, grp) AS part_of_range
FROM (SELECT p.*,
-- grp increases whenever a period neither touches nor overlaps the previous one
Count(*) FILTER (WHERE prev_eo < starts_on - INTERVAL '1 day') OVER (PARTITION BY location_id ORDER BY starts_on) AS grp
FROM (SELECT id, starts_on, ends_on, location_id, holiday_or_vacation_type_id,
lag(ends_on) OVER (PARTITION BY location_id ORDER BY (starts_on)) AS prev_eo
FROM periods
) p
) p;
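
The lag-plus-running-sum idea can be checked on the six sample periods with the sketch below. It is an approximation on SQLite via Python's sqlite3 module (SQLite 3.25+ assumed), not the PostgreSQL query itself: dates are ISO strings, julianday() stands in for date subtraction, and location_id is dropped because the sample covers a single location. The CTE names flagged and grouped are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE periods (id INTEGER, starts_on TEXT, ends_on TEXT)")
conn.executemany(
    "INSERT INTO periods VALUES (?, ?, ?)",
    [
        (670, "2019-10-26", "2019-10-27"),
        (532, "2019-10-28", "2019-10-30"),
        (533, "2019-10-31", "2019-10-31"),
        (671, "2019-11-02", "2019-11-03"),
        (672, "2019-11-09", "2019-11-10"),
        (673, "2019-11-16", "2019-11-17"),
    ],
)

result = conn.execute(
    """
    WITH flagged AS (
        -- end date of the previous period in start order
        SELECT id, starts_on, ends_on,
               LAG(ends_on) OVER (ORDER BY starts_on) AS prev_eo
        FROM periods
    ),
    grouped AS (
        -- running count of island starts: a period starts a new island
        -- when it begins more than one day after the previous one ended
        SELECT id, starts_on, ends_on,
               SUM(CASE WHEN prev_eo IS NULL
                          OR julianday(starts_on) - julianday(prev_eo) > 1
                        THEN 1 ELSE 0 END)
                   OVER (ORDER BY starts_on) AS grp
        FROM flagged
    )
    SELECT id,
           grp,
           CAST(julianday(MAX(ends_on) OVER (PARTITION BY grp))
                - julianday(MIN(starts_on) OVER (PARTITION BY grp)) AS INTEGER) + 1
               AS real_duration
    FROM grouped
    ORDER BY starts_on
    """
).fetchall()

for row in result:
    print(row)
```

Rows that share a grp value belong to the same island, so the ids for a part_of_range column can be collected per grp.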

Omit Duplicate Only When In Sequential Order

I've got a query returning the following information. I used Over(PARTITION BY to include row numbers. I'm capturing every time my work_center_S begins a new Order#, but I want to exclude the beginning of a new order when the part_number was the same as the order/row previous to it. I'm not able to use the DISTINCT function because a part_number may appear numerous times a day and I'll need to capture every time such a change occurs.
[![Query Return][1]][1]
[1]: https://i.stack.imgur.com/IKvsR.jpg
+----+--------+---------------+-------------+------+
| rn | Order# | work_center_S | part_number | Hour |
+----+--------+---------------+-------------+------+
| 1 | 7098 | TB312 | 37203 | 1 |
+----+--------+---------------+-------------+------+
| 2 | 8797 | TB312 | 37194 | 4 |
+----+--------+---------------+-------------+------+
| 3 | 8802 | TB312 | 37355 | 11 |
+----+--------+---------------+-------------+------+
| 4 | 0946 | TB312 | 37194 | 15 |
+----+--------+---------------+-------------+------+
| 5 | 0698 | TB312 | 37203 | 18 |
+----+--------+---------------+-------------+------+
| 6 | 0699 | TB312 | 37203 | 21 |
+----+--------+---------------+-------------+------+
I assume there isn't a -1 part_number
select Order#,work_center_S,part_number,Hour
from (select *
,lag(part_number,1,-1) over
(
partition by work_center_S
order by Hour
) as prev_part_number
from mytable
) t
where part_number <> prev_part_number
--
+--------+---------------+-------------+------+
| Order# | work_center_S | part_number | Hour |
+--------+---------------+-------------+------+
| 7098 | TB312 | 37203 | 1 |
+--------+---------------+-------------+------+
| 8797 | TB312 | 37194 | 4 |
+--------+---------------+-------------+------+
| 8802 | TB312 | 37355 | 11 |
+--------+---------------+-------------+------+
| 946 | TB312 | 37194 | 15 |
+--------+---------------+-------------+------+
| 698 | TB312 | 37203 | 18 |
+--------+---------------+-------------+------+
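
For reference, the same lag filter can be reproduced with the sketch below, which runs on SQLite via Python's sqlite3 module (SQLite 3.25+ assumed). The table name runs and the column names order_no, work_center and hour are stand-ins for the names in the question, and order_no is stored as text so the leading zeros survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE runs (order_no TEXT, work_center TEXT, part_number INTEGER, hour INTEGER)"
)
conn.executemany(
    "INSERT INTO runs VALUES (?, ?, ?, ?)",
    [
        ("7098", "TB312", 37203, 1),
        ("8797", "TB312", 37194, 4),
        ("8802", "TB312", 37355, 11),
        ("0946", "TB312", 37194, 15),
        ("0698", "TB312", 37203, 18),
        ("0699", "TB312", 37203, 21),
    ],
)

rows = conn.execute(
    """
    SELECT order_no, part_number, hour
    FROM (
        -- -1 is the default for the first row, assuming no real part has that number
        SELECT t.*,
               LAG(part_number, 1, -1) OVER (PARTITION BY work_center
                                             ORDER BY hour) AS prev_part
        FROM runs AS t
    )
    WHERE part_number <> prev_part
    ORDER BY hour
    """
).fetchall()

kept = [r[0] for r in rows]
print(kept)
```

Order 0699 is dropped because it repeats the part number of the order directly before it, while the earlier, non-consecutive occurrences of 37203 and 37194 are kept.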

Fetch set of rows from table

I have a table Exam_record with the data below. I need to know how to pull out the latest 2 records for each EID based on the latest exam date.
EID | Exam_name | score | date_of_completion |
-----------------------------------------------
1 | Exam_1 | 60 | 23-Jun-2014 |
1 | Exam_1 | 70 | 10-Jan-2014 |
1 | Exam_1 | 71 | 15-Aug-2014 |
1 | Exam_1 | 65 | 1-Sep-2014 |
2 | Exam_2 | 50 | 2-Jul-2014 |
2 | Exam_2 | 55 | 12-May-2014 |
2 | Exam_2 | 65 | 15-Apr-2014 |
Desired output is
EID | Exam_name | score | date_of_completion |
-----------------------------------------------
1 | Exam_1 | 71 | 15-Aug-2014 |
1 | Exam_1 | 65 | 1-Sep-2014 |
2 | Exam_2 | 55 | 12-May-2014 |
2 | Exam_2 | 50 | 2-Jul-2014 |
Try like this:
select * from
(
-- number each EID's rows from the most recent exam date downwards
select t.*, row_number() over (partition by EID order by date_of_completion desc) as rn from Exam_record t
) x
where x.rn <= 2
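
A runnable sketch of this top-2-per-group pattern follows, using SQLite via Python's sqlite3 module (SQLite 3.25+ assumed) and the question's rows with the dates rewritten as ISO strings so that they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exam_record "
    "(eid INTEGER, exam_name TEXT, score INTEGER, date_of_completion TEXT)"
)
conn.executemany(
    "INSERT INTO exam_record VALUES (?, ?, ?, ?)",
    [
        (1, "Exam_1", 60, "2014-06-23"),
        (1, "Exam_1", 70, "2014-01-10"),
        (1, "Exam_1", 71, "2014-08-15"),
        (1, "Exam_1", 65, "2014-09-01"),
        (2, "Exam_2", 50, "2014-07-02"),
        (2, "Exam_2", 55, "2014-05-12"),
        (2, "Exam_2", 65, "2014-04-15"),
    ],
)

top2 = conn.execute(
    """
    SELECT eid, score, date_of_completion
    FROM (
        -- rn = 1 is the most recent exam of each eid, rn = 2 the one before it
        SELECT e.*,
               ROW_NUMBER() OVER (PARTITION BY eid
                                  ORDER BY date_of_completion DESC) AS rn
        FROM exam_record AS e
    )
    WHERE rn <= 2
    ORDER BY eid, date_of_completion DESC
    """
).fetchall()

for row in top2:
    print(row)
```

The filter has to be applied to the rn column in an outer query, since a window function cannot be used directly in a WHERE clause.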