I was asked to create a report (using Teradata SQL OLAP functions) as below
EMPL_ID | perd_end_d | pdct_I | Year to date sal Amnt | Diff in sale amnt from Prev month
-------------------------------------------------------------------------------------------
I was given the following "sales" dataset and I have to calculate the "year to date sale amount" and the "difference between the current and previous month's sale amount".
empl_id| perd_end_d | pdct_I|sale_amnt|
----------------------------------------
E1001 | 31-01-2010 | P2003 | 2,03 |
E1003 | 31-01-2010 | P2015 | 44 |
E1003 | 31-01-2010 | P2004 | 67,6 |
E1001 | 31-01-2010 | P2002 | 135 |
E1003 | 31-01-2010 | P2003 | 545 |
E1001 | 31-01-2010 | P2001 | 1,00 |
E1002 | 31-01-2010 | P2005 | 23 |
E1002 | 31-01-2010 | P2007 | 343 |
E1006 | 28-02-2010 | P2005 | 34 |
E1006 | 28-02-2010 | P2004 | 43 |
E1001 | 28-02-2010 | P2003 | 54 |
E1001 | 28-02-2010 | P2002 | 878 |
E1003 | 28-02-2010 | P2008 | 434 |
E1001 | 28-02-2010 | P2001 | 66 |
E1007 | 28-02-2010 | P2009 | 455 |
E1007 | 28-02-2010 | P2009 | 4,54 |
E1003 | 28-02-2010 | P2007 | 56 |
E1008 | 28-02-2010 | P2009 | 786 |
E1010 | 31-01-2011 | P2001 | 300 |
E1001 | 31-01-2011 | P2002 | 200 |
E1009 | 31-01-2011 | P2003 | 100 |
E1011 | 31-01-2012 | P2004 | 700 |
E1002 | 31-01-2012 | P2005 | 400 |
E1011 | 31-01-2012 | P2003 | 600 |
E1002 | 31-01-2012 | P2007 | 500 |
---------------------------------------
I want something like below
empl_id| perd_end_d | pdct_I|sale_amnt| diff(cur_mnt_sal - prev_mnt_sal)
-------------------------------------------------------------------------
E1001 | 31-01-2010 | P2003 | 2,03 | 203 -- or may be null
E1003 | 31-01-2010 | P2015 | 44 | 159
E1003 | 31-01-2010 | P2004 | 67,6 | 632
E1001 | 31-01-2010 | P2002 | 135 | 541
E1003 | 31-01-2010 | P2003 | 545 | 410
...
So far I have managed to get the required result, but it looks ugly. How can I improve the following solution?
SELECT perd_end_d
, pdct_I
, sale_amnt
, ABS( SUM(sale_amnt) over (partition by perd_end_d
order by perd_end_d
rows between 1 preceding and 1 preceding )
- SUM(sale_amnt) over (partition by perd_end_d
order by perd_end_d
rows current row ) )"prev_mnt_sal - cur_mnt_sal"
from sandbox.sales;
and the resultset is as following
SELECT perd_end_d
, pdct_I
, sale_amnt
, ABS( min(sale_amnt) over (partition by perd_end_d
order by perd_end_d
rows between 1 preceding and 1 preceding )
- sale_amnt) as "prev_mnt_sal - cur_mnt_sal"
from sandbox.sales;
You probably want something like this:
SELECT empl_id
, perd_end_d
, sum(sale_amnt) as sumsale
-- cumulative sum of sales per employee
, SUM(sumsale)
over (partition by empl_id
order by perd_end_d
rows unbounded preceding)
-- difference between current and previous month per employee
, sumsale -
SUM(sumsale)
over (partition by empl_id
order by perd_end_d
rows between 1 preceding and 1 preceding )
from sandbox.sales
group by 1,2;
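The running total and month-over-month difference can be tried out without a Teradata system. The following is only a sketch using Python's built-in sqlite3 module as a stand-in: it loads a few of the sample rows with unambiguous amounts, uses ISO dates (an assumption about how perd_end_d is stored), and aggregates in a CTE first because SQLite, unlike Teradata, does not let a window function reference the sumsale alias from the same select list. The column names running_total and diff_prev are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (empl_id TEXT, perd_end_d TEXT, pdct_i TEXT, sale_amnt INTEGER)"
)
# A subset of the sample rows; dates rewritten as ISO strings (assumption).
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("E1001", "2010-01-31", "P2002", 135),
        ("E1003", "2010-01-31", "P2015", 44),
        ("E1003", "2010-01-31", "P2003", 545),
        ("E1001", "2010-02-28", "P2003", 54),
        ("E1001", "2010-02-28", "P2002", 878),
        ("E1001", "2010-02-28", "P2001", 66),
        ("E1003", "2010-02-28", "P2008", 434),
        ("E1003", "2010-02-28", "P2007", 56),
    ],
)

query = """
WITH monthly AS (
    -- one row per employee and month
    SELECT empl_id, perd_end_d, SUM(sale_amnt) AS sumsale
    FROM sales
    GROUP BY empl_id, perd_end_d
)
SELECT empl_id,
       perd_end_d,
       sumsale,
       -- cumulative sum of sales per employee
       SUM(sumsale) OVER (PARTITION BY empl_id ORDER BY perd_end_d
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total,
       -- current month minus previous month (NULL for the first month)
       sumsale - SUM(sumsale) OVER (PARTITION BY empl_id ORDER BY perd_end_d
                                    ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS diff_prev
FROM monthly
ORDER BY empl_id, perd_end_d
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Note that the partition is per employee only, as in the query above, so the running total does not reset at the start of a new year. For a strict year-to-date figure the year would have to be part of the partition as well.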
I have the following table:
+------+-------------+----------+---------+
| id   | date        | amount   | amount2 |
+------+-------------+----------+---------+
| 1    | 1/1/2020    | 1000     | 500     |
| 1    | 1/3/2020    | 1558     | 100     |
| 1    | 1/3/2020    | 126      | 200     |
| 2    | 2/5/2020    | 4921     | 500     |
| 2    | 2/5/2020    | 15       | 100     |
| 2    | 1/1/2020    | 5951     | 140     |
| 2    | 1/2/2020    | 1588     | 10      |
| 2    | 1/3/2020    | 1568     | 56      |
| 2    | 1/4/2020    | 12558    | 45      |
+------+-------------+----------+---------+
I need to get each id's max date together with the sums of amount and amount2 for that date. For the data above, I need the following output:
+------+-------------+----------+---------+
| id   | date        | amount   | amount2 |
+------+-------------+----------+---------+
| 1    | 1/3/2020    | 1684     | 300     |
| 2    | 2/5/2020    | 4936     | 600     |
+------+-------------+----------+---------+
How can I do this?
Aggregate and use MAX OVER to get the IDs' maximum dates:
select id, [date], sum_amount, sum_amount2
from
(
select
id, [date], sum(amount) as sum_amount, sum(amount2) as sum_amount2,
max([date]) over (partition by id) as max_date_for_id
from mytable group by id, [date]
) aggregated
where [date] = max_date_for_id
order by id;
First, use dense_rank() to find the rows with the latest date for each id:
dense_rank () over (partition by id order by [date] desc)
After that, simply group by id and date and sum() the amounts:
select id, [date], sum(amount), sum(amount2)
from
(
select *,
dr = dense_rank () over (partition by id order by [date] desc)
from your_table
) t
where dr = 1
group by id, [date]
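To check the dense_rank() approach against the sample data, here is a small sketch that runs the same idea in SQLite through Python's sqlite3 module. It is not the SQL Server syntax used above: the column is called txn_date instead of [date] (a name chosen for this example), the dr = ... alias form is replaced by AS dr, and the dates are stored as ISO strings on the assumption that the original values are month/day/year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER, txn_date TEXT, amount INTEGER, amount2 INTEGER)"
)
# Sample rows from the question, dates converted to ISO format (assumed M/D/YYYY).
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (1, "2020-01-01", 1000, 500),
        (1, "2020-01-03", 1558, 100),
        (1, "2020-01-03", 126, 200),
        (2, "2020-02-05", 4921, 500),
        (2, "2020-02-05", 15, 100),
        (2, "2020-01-01", 5951, 140),
        (2, "2020-01-02", 1588, 10),
        (2, "2020-01-03", 1568, 56),
        (2, "2020-01-04", 12558, 45),
    ],
)

query = """
SELECT id, txn_date, SUM(amount) AS sum_amount, SUM(amount2) AS sum_amount2
FROM (
    -- dr = 1 marks every row that carries the latest date of its id
    SELECT m.*,
           DENSE_RANK() OVER (PARTITION BY id ORDER BY txn_date DESC) AS dr
    FROM mytable m
) t
WHERE dr = 1
GROUP BY id, txn_date
ORDER BY id
"""
latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```

dense_rank() matters here because several rows can share the latest date. row_number() would keep only one of them and the sums would come out too small.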
I am trying to create an indicator showing whether a particular transaction was the first time a part was purchased from a particular vendor.
I have a dataset that looks like this:
| transaction_id | vendor_id | part_id | trans_date |
|:--------------:|:---------:|:-------:|:-----------------:|
| 9Bx*2Pc' | a | 873 | 10/12/2018 |
| 1Po.4Ot, | a | 473 | 4/22/2016 |
| 9Sk"7Kv/ | b | 123 | 7/23/2016 |
| 2Lz&7Hu& | a | 873 | 12/20/2017 |
| 8Lz)5Is# | b | 743 | 10/22/2016 |
| 5Sc'6Jl/ | a | 113 | 10/6/2016 |
| 0Ra&8Hb& | a | 653 | 10/4/2017 |
| 4Wc-8Of* | c | 333 | 8/3/2017 |
| 8Vv+9Yo/ | c | 333 | 12/7/2016 |
| 6Qh!1Ha- | c | 333 | 3/28/2017 |
| 2Ol%4Rs# | c | 333 | 5/2/2017 |
| 1Gg#8Cm% | c | 333 | 11/15/2016 |
| 0Lw(6Pv/ | d | 873 | 8/13/2017 |
| 1Gy/7Zw, | a | 443 | 10/12/2018 |
| 2Gz,4Gp. | b | 103 | 1/5/2018 |
| 5Dj)6Wc+ | a | 893 | 12/17/2016 |
| 5Hl-8Ds! | a | 903 | 12/8/2017 |
| 8Ws$3Vy* | b | 873 | 1/13/2018 |
What I am looking to do is determine if the transaction_id was the first time (sorted by trans_date), that the part_id was purchased from a vendor_id. I would imagine the ideal output to look like this:
| transaction_id | vendor_id | part_id | trans_date | first_time |
|:--------------:|:---------:|:-------:|:-----------------:|:----------:|
| 9Bx*2Pc' | a | 873 | 10/12/2018 | N |
| 1Po.4Ot, | a | 473 | 4/22/2016 | Y |
| 9Sk"7Kv/ | b | 123 | 7/23/2016 | Y |
| 2Lz&7Hu& | a | 873 | 12/20/2017 | Y |
| 8Lz)5Is# | b | 743 | 10/22/2016 | Y |
| 5Sc'6Jl/ | a | 113 | 10/6/2016 | Y |
| 0Ra&8Hb& | a | 653 | 10/4/2017 | Y |
| 4Wc-8Of* | c | 333 | 8/3/2017 | N |
| 8Vv+9Yo/ | c | 333 | 12/7/2016 | N |
| 6Qh!1Ha- | c | 333 | 3/28/2017 | N |
| 2Ol%4Rs# | c | 333 | 5/2/2017 | N |
| 1Gg#8Cm% | c | 333 | 11/15/2016 | Y |
| 0Lw(6Pv/ | d | 873 | 8/13/2017 | Y |
| 1Gy/7Zw, | a | 443 | 10/12/2018 | Y |
| 2Gz,4Gp. | b | 103 | 1/5/2018 | Y |
| 5Dj)6Wc+ | a | 893 | 12/17/2016 | Y |
| 5Hl-8Ds! | a | 903 | 12/8/2017 | Y |
| 8Ws$3Vy* | b | 873 | 1/13/2018 | Y |
So far, I have tried (which was influenced by this post):
WITH
first_instance AS (
SELECT
tbl_trans.*,
ROW_NUMBER() OVER (PARTITION BY vendor_id||part_id ORDER BY trans_date) AS row_nums
FROM
tbl_trans
)
SELECT
x.*,
CASE WHEN y.row_nums = 1 THEN 'Y' ELSE 'N' END AS first_time_indicator
FROM
tbl_trans x
LEFT JOIN first_instance y
But I am met with:
ORA-00905: missing keyword
I have created a SQL FIDDLE with this data and the query so far for testing. How can I determine whether a transaction was a first-time purchase for a part/vendor combination?
Use window functions:
select t.*,
(case when row_number() over (partition by vendor_id, part_id order by trans_date) = 1
then 'Y' else 'N'
end) as first_time
from tbl_trans t;
You don't need a join.
Apart from row_number, there are several other ways to get the desired result with analytic functions.
You can use first_value analytical function as follows:
Select t.*,
Case
when first_value(trans_date)
over (partition by vendor_id, part_id order by trans_date) = trans_date
then 'Y'
else 'N'
end as first_time
From your_table t;
The same way, you can also use min as follows:
Select t.*,
Case
when min(trans_date)
over (partition by vendor_id, part_id) = trans_date
then 'Y'
else 'N'
end as first_time
From your_table t;
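The row_number and min variants can be compared side by side. The sketch below uses SQLite through Python's sqlite3 module rather than Oracle, loads a handful of the sample transactions, and stores trans_date as ISO strings (an assumption about the column type) so that ordering works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_trans (transaction_id TEXT, vendor_id TEXT, part_id INTEGER, trans_date TEXT)"
)
# A few of the sample rows, dates converted to ISO format.
conn.executemany(
    "INSERT INTO tbl_trans VALUES (?, ?, ?, ?)",
    [
        ("9Bx*2Pc'", "a", 873, "2018-10-12"),
        ("2Lz&7Hu&", "a", 873, "2017-12-20"),
        ("4Wc-8Of*", "c", 333, "2017-08-03"),
        ("8Vv+9Yo/", "c", 333, "2016-12-07"),
        ("1Gg#8Cm%", "c", 333, "2016-11-15"),
        ("0Lw(6Pv/", "d", 873, "2017-08-13"),
    ],
)

# Variant 1: the earliest row per vendor/part gets row number 1.
by_row_number = dict(conn.execute("""
    SELECT transaction_id,
           CASE WHEN ROW_NUMBER() OVER (PARTITION BY vendor_id, part_id
                                        ORDER BY trans_date) = 1
                THEN 'Y' ELSE 'N' END AS first_time
    FROM tbl_trans
""").fetchall())

# Variant 2: compare each row's date with the minimum date of its vendor/part.
by_min = dict(conn.execute("""
    SELECT transaction_id,
           CASE WHEN MIN(trans_date) OVER (PARTITION BY vendor_id, part_id) = trans_date
                THEN 'Y' ELSE 'N' END AS first_time
    FROM tbl_trans
""").fetchall())

for transaction_id, flag in by_row_number.items():
    print(transaction_id, flag, by_min[transaction_id])
```

The two variants agree on this data. They differ only when two transactions for the same vendor and part share the earliest date: row_number flags exactly one of them, whereas min and first_value flag all of them.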
My PostgreSQL database stores school vacation, public holiday and weekend dates so that parents can plan their vacations. School vacations are often adjacent to weekends or public holidays. I want to display the total number of non-school days for a school vacation, including any adjacent weekend or public holiday.
Example Data
locations
SELECT id, name, is_federal_state
FROM locations
WHERE is_federal_state = true;
| id | name | is_federal_state |
|----|-------------------|------------------|
| 2 | Baden-Württemberg | true |
| 3 | Bayern | true |
holiday_or_vacation_types
SELECT id, name FROM holiday_or_vacation_types;
| id | name |
|----|-----------------------|
| 1 | Herbst |
| 8 | Wochenende |
"Herbst" is German for "autumn" and "Wochenende" is German for "weekend".
periods
SELECT id, starts_on, ends_on, holiday_or_vacation_type_id
FROM periods
WHERE location_id = 2
ORDER BY starts_on;
| id | starts_on | ends_on | holiday_or_vacation_type_id |
|-----|--------------|--------------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 8 |
| 532 | "2019-10-28" | "2019-10-30" | 1 |
| 533 | "2019-10-31" | "2019-10-31" | 1 |
| 671 | "2019-11-02" | "2019-11-03" | 8 |
| 672 | "2019-11-09" | "2019-11-10" | 8 |
| 673 | "2019-11-16" | "2019-11-17" | 8 |
Task
I want to select all periods where location_id equals 2. And I want to calculate the duration of each period in days. That can be done with this SQL query:
SELECT id, starts_on, ends_on,
(ends_on - starts_on + 1) AS duration,
holiday_or_vacation_type_id
FROM periods
WHERE location_id = 2
ORDER BY starts_on;
| id | starts_on | ends_on | duration | holiday_or_vacation_type_id |
|-----|--------------|--------------|----------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 2 | 8 |
| 532 | "2019-10-28" | "2019-10-30" | 3 | 1 |
| 533 | "2019-10-31" | "2019-10-31" | 1 | 1 |
| 671 | "2019-11-02" | "2019-11-03" | 2 | 8 |
| 672 | "2019-11-09" | "2019-11-10" | 2 | 8 |
| 673 | "2019-11-16" | "2019-11-17" | 2 | 8 |
Anyone looking at a calendar would see that ids 670 (weekend), 532 (fall vacation) and 533 (fall vacation) are adjacent, so they add up to a 6-day vacation period. So far I compute this in a program, but that takes quite a lot of resources (the actual table contains some 500,000 rows).
Problem 1
Which SQL query would produce the following output (it adds a real_duration column)? Is that even possible with SQL?
| id | starts_on | ends_on | duration | real_duration | holiday_or_vacation_type_id |
|-----|--------------|--------------|----------|---------------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 2 | 6 | 8 |
| 532 | "2019-10-28" | "2019-10-30" | 3 | 6 | 1 |
| 533 | "2019-10-31" | "2019-10-31" | 1 | 6 | 1 |
| 671 | "2019-11-02" | "2019-11-03" | 2 | 2 | 8 |
| 672 | "2019-11-09" | "2019-11-10" | 2 | 2 | 8 |
| 673 | "2019-11-16" | "2019-11-17" | 2 | 2 | 8 |
Problem 2
Is it possible to list the adjacent periods in a part_of_range field? This would be the result. Can that be done with SQL?
| id | starts_on | ends_on | duration | part_of_range | holiday_or_vacation_type_id |
|-----|--------------|--------------|----------|---------------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 2 | 670,532,533 | 8 |
| 532 | "2019-10-28" | "2019-10-30" | 3 | 670,532,533 | 1 |
| 533 | "2019-10-31" | "2019-10-31" | 1 | 670,532,533 | 1 |
| 671 | "2019-11-02" | "2019-11-03" | 2 | | 8 |
| 672 | "2019-11-09" | "2019-11-10" | 2 | | 8 |
| 673 | "2019-11-16" | "2019-11-17" | 2 | | 8 |
This is a gaps and islands problem. In this case you can use lag() to see where an island starts and then a cumulative sum.
The final operation is some aggregation (using window functions):
SELECT p.*,
       -- length of the whole island in days
       (MAX(ends_on) OVER (PARTITION BY location_id, grp)
        - MIN(starts_on) OVER (PARTITION BY location_id, grp)) + 1 AS real_duration,
       -- ids of all periods in the same island
       ARRAY_AGG(p.id) OVER (PARTITION BY location_id, grp) AS part_of_range
FROM (SELECT p.*,
             -- running count of island starts: a new island begins after a gap of more than one day
             COUNT(*) FILTER (WHERE prev_eo < starts_on - INTERVAL '1 day')
                 OVER (PARTITION BY location_id ORDER BY starts_on) AS grp
      FROM (SELECT id, starts_on, ends_on, location_id, holiday_or_vacation_type_id,
                   LAG(ends_on) OVER (PARTITION BY location_id ORDER BY starts_on) AS prev_eo
            FROM periods
           ) p
     ) p;
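The three steps (lag to find island starts, a running sum to number the islands, then window aggregates per island) can be traced on the sample periods. This is only a sketch in SQLite via Python's sqlite3 module, not PostgreSQL: dates are ISO strings and julianday() stands in for date arithmetic, GROUP_CONCAT replaces array_agg, a SUM over a 0/1 flag replaces COUNT(*) FILTER, and location_id is left out because only one location is loaded. The names flagged, grouped and is_start are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE periods (id INTEGER, starts_on TEXT, ends_on TEXT)")
# Sample periods for location 2 from the question.
conn.executemany(
    "INSERT INTO periods VALUES (?, ?, ?)",
    [
        (670, "2019-10-26", "2019-10-27"),
        (532, "2019-10-28", "2019-10-30"),
        (533, "2019-10-31", "2019-10-31"),
        (671, "2019-11-02", "2019-11-03"),
        (672, "2019-11-09", "2019-11-10"),
        (673, "2019-11-16", "2019-11-17"),
    ],
)

query = """
WITH flagged AS (
    -- step 1: a period continues an island if it starts at most one day
    -- after the previous period ended; otherwise it starts a new island
    SELECT id, starts_on, ends_on,
           CASE WHEN julianday(starts_on)
                     - julianday(LAG(ends_on) OVER (ORDER BY starts_on)) <= 1
                THEN 0 ELSE 1 END AS is_start
    FROM periods
),
grouped AS (
    -- step 2: the running sum of island starts numbers the islands
    SELECT flagged.*, SUM(is_start) OVER (ORDER BY starts_on) AS grp
    FROM flagged
)
-- step 3: aggregate per island with window functions
SELECT id,
       CAST(julianday(MAX(ends_on) OVER (PARTITION BY grp))
            - julianday(MIN(starts_on) OVER (PARTITION BY grp)) + 1 AS INTEGER) AS real_duration,
       GROUP_CONCAT(id) OVER (PARTITION BY grp ORDER BY starts_on
                              ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS part_of_range
FROM grouped
ORDER BY starts_on
"""
islands = conn.execute(query).fetchall()
for row in islands:
    print(row)
```

The first three periods end up in one island of 6 days and each remaining weekend forms its own 2-day island. Unlike the desired output in the question, a single-period island still lists its own id in part_of_range.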
I've got a query returning the following information. I used Over(PARTITION BY to include row numbers. I'm capturing every time my work_center_S begins a new Order#, but I want to exclude the beginning of a new order when the part_number was the same as the order/row previous to it. I'm not able to use the DISTINCT function because a part_number may appear numerous times a day and I'll need to capture every time such a change occurs.
[![Query Return][1]][1]
[1]: https://i.stack.imgur.com/IKvsR.jpg
+----+--------+---------------+-------------+------+
| rn | Order# | work_center_S | part_number | Hour |
+----+--------+---------------+-------------+------+
| 1 | 7098 | TB312 | 37203 | 1 |
+----+--------+---------------+-------------+------+
| 2 | 8797 | TB312 | 37194 | 4 |
+----+--------+---------------+-------------+------+
| 3 | 8802 | TB312 | 37355 | 11 |
+----+--------+---------------+-------------+------+
| 4 | 0946 | TB312 | 37194 | 15 |
+----+--------+---------------+-------------+------+
| 5 | 0698 | TB312 | 37203 | 18 |
+----+--------+---------------+-------------+------+
| 6 | 0699 | TB312 | 37203 | 21 |
+----+--------+---------------+-------------+------+
I assume there isn't a -1 part_number
select Order#,work_center_S,part_number,Hour
from (select *
,lag(part_number,1,-1) over
(
partition by work_center_S
order by Hour
) as prev_part_number
from mytable
) t
where part_number <> prev_part_number
--
+--------+---------------+-------------+------+
| Order# | work_center_S | part_number | Hour |
+--------+---------------+-------------+------+
| 7098 | TB312 | 37203 | 1 |
+--------+---------------+-------------+------+
| 8797 | TB312 | 37194 | 4 |
+--------+---------------+-------------+------+
| 8802 | TB312 | 37355 | 11 |
+--------+---------------+-------------+------+
| 946 | TB312 | 37194 | 15 |
+--------+---------------+-------------+------+
| 698 | TB312 | 37203 | 18 |
+--------+---------------+-------------+------+
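The lag() filter can be reproduced on the six sample rows. The sketch below runs it in SQLite through Python's sqlite3 module; the column is named order_no instead of Order# and is stored as text so the leading zeros survive, both of which are choices made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (order_no TEXT, work_center_s TEXT, part_number INTEGER, hour INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        ("7098", "TB312", 37203, 1),
        ("8797", "TB312", 37194, 4),
        ("8802", "TB312", 37355, 11),
        ("0946", "TB312", 37194, 15),
        ("0698", "TB312", 37203, 18),
        ("0699", "TB312", 37203, 21),
    ],
)

query = """
SELECT order_no, work_center_s, part_number, hour
FROM (
    SELECT m.*,
           -- previous part on the same work center; -1 is the default for
           -- the first row, assuming -1 is never a real part_number
           LAG(part_number, 1, -1) OVER (PARTITION BY work_center_s
                                         ORDER BY hour) AS prev_part_number
    FROM mytable m
) t
WHERE part_number <> prev_part_number
ORDER BY work_center_s, hour
"""
changes = conn.execute(query).fetchall()
for row in changes:
    print(row)
```

Order 0699 is dropped because it runs the same part as order 0698 directly before it, while the second occurrence of part 37194 (order 0946) is kept because a different part ran in between.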
I have a table Exam_record with the data below. I need to know how to pull out the latest two records for each EID, based on the latest exam date.
EID | Exam_name | score | date_of_completion |
-----------------------------------------------
1 | Exam_1 | 60 | 23-Jun-2014 |
1 | Exam_1 | 70 | 10-Jan-2014 |
1 | Exam_1 | 71 | 15-Aug-2014 |
1 | Exam_1 | 65 | 1-Sep-2014 |
2 | Exam_2 | 50 | 2-Jul-2014 |
2 | Exam_2 | 55 | 12-May-2014 |
2 | Exam_2 | 65 | 15-Apr-2014 |
Desired output is
EID | Exam_name | score | date_of_completion |
-----------------------------------------------
1 | Exam_1 | 71 | 15-Aug-2014 |
1 | Exam_1 | 65 | 1-Sep-2014 |
2 | Exam_2 | 55 | 12-May-2014 |
2 | Exam_2 | 50 | 2-Jul-2014 |
Try like this:
select * from
(
select e.*, row_number() over (partition by EID order by date_of_completion desc) as rn
from Exam_record e
) x
where x.rn <= 2
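A quick way to see the row_number() pattern at work is to run it on the sample data. This sketch uses SQLite through Python's sqlite3 module and assumes date_of_completion is a real date (stored here as ISO strings). If the column held text such as '23-Jun-2014', ordering by it would sort alphabetically and give wrong results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Exam_record (EID INTEGER, Exam_name TEXT, score INTEGER, date_of_completion TEXT)"
)
conn.executemany(
    "INSERT INTO Exam_record VALUES (?, ?, ?, ?)",
    [
        (1, "Exam_1", 60, "2014-06-23"),
        (1, "Exam_1", 70, "2014-01-10"),
        (1, "Exam_1", 71, "2014-08-15"),
        (1, "Exam_1", 65, "2014-09-01"),
        (2, "Exam_2", 50, "2014-07-02"),
        (2, "Exam_2", 55, "2014-05-12"),
        (2, "Exam_2", 65, "2014-04-15"),
    ],
)

query = """
SELECT EID, Exam_name, score, date_of_completion
FROM (
    -- number each EID's exams from newest (rn = 1) to oldest
    SELECT e.*,
           ROW_NUMBER() OVER (PARTITION BY EID
                              ORDER BY date_of_completion DESC) AS rn
    FROM Exam_record e
) x
WHERE x.rn <= 2
ORDER BY EID, date_of_completion
"""
latest_two = conn.execute(query).fetchall()
for row in latest_two:
    print(row)
```

The filter has to be on the rn column of the derived table, and changing the 2 gives the latest N records per EID.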