Conditional Partition - SQL

I have a table (Employee_Training) that has the following columns:
Employee_Number
Course_ID
Date_Completed
I have this query that I use to show training; it filters out duplicate Date_Completed rows, showing only the most recent date per course:
SELECT x.*
FROM (SELECT t.*, ROW_NUMBER() OVER
(PARTITION BY t.Course_ID, t.Employee_Number
ORDER BY t.Date_Completed DESC) AS rank
FROM Employee_Training t) x
WHERE x.rank = 1
Is there any way to write this query so that the partition filter is not applied to one specific Course_ID, say 1000004? I would want to see all the rows where Course_ID = 1000004.
Here is some sample data:
Just using a select all on that table:
557 | 1000002 | 2014-11-18
557 | 1000002 | 2009-7-6
557 | 1000004 | 2011-1-15
557 | 1000004 | 2005-9-22
557 | 1000004 | 2004-4-17
557 | 1000010 | 2014-6-10
557 | 1000010 | 2013-6-09
557 | 1000010 | 2012-6-10
Using my original query I get these results:
557 | 1000002 | 2014-11-18
557 | 1000004 | 2011-1-15
557 | 1000010 | 2014-6-10
What I would like to see (Only the 1000004 not being filtered out):
557 | 1000002 | 2014-11-18
557 | 1000004 | 2011-1-15
557 | 1000004 | 2005-9-22
557 | 1000004 | 2004-4-17
557 | 1000010 | 2014-6-10
Thank you.

You could exclude that course from your ROW_NUMBER() partition and UNION the rows back on at the end:
SELECT x.*
FROM (SELECT t.*, ROW_NUMBER() OVER
        (PARTITION BY t.Course_ID, t.Employee_Number
         ORDER BY t.Date_Completed DESC) AS rank
      FROM Employee_Training t
      WHERE course_id != 1000004) x
WHERE x.rank = 1
UNION ALL
SELECT t.*, 1 AS rank
FROM Employee_Training t
WHERE course_id = 1000004
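An alternative that avoids the UNION: rank every group as before, but relax only the final filter so the exempt course keeps all its rows. A sketch using Python's sqlite3 on the sample data above (dates zero-padded so string ordering matches date order; SQLite is just for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Employee_Training (Employee_Number INT, Course_ID INT, Date_Completed TEXT);
INSERT INTO Employee_Training VALUES
  (557, 1000002, '2014-11-18'), (557, 1000002, '2009-07-06'),
  (557, 1000004, '2011-01-15'), (557, 1000004, '2005-09-22'),
  (557, 1000004, '2004-04-17'), (557, 1000010, '2014-06-10'),
  (557, 1000010, '2013-06-09'), (557, 1000010, '2012-06-10');
""")

# Rank every (course, employee) group, but keep rows that are either
# the most recent attempt OR belong to the exempt course 1000004.
rows = con.execute("""
SELECT x.Employee_Number, x.Course_ID, x.Date_Completed
FROM (SELECT t.*, ROW_NUMBER() OVER
        (PARTITION BY t.Course_ID, t.Employee_Number
         ORDER BY t.Date_Completed DESC) AS rnk
      FROM Employee_Training t) x
WHERE x.rnk = 1 OR x.Course_ID = 1000004
ORDER BY x.Course_ID, x.Date_Completed DESC
""").fetchall()

for r in rows:
    print(r)  # 5 rows: one per course, plus every 1000004 attempt
```

This keeps the query a single pass over the table instead of two.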

Related

Compare table to itself and update one value based on another - bulk

The following select provides a list of 8524 values. Half are duplicates of the other half, with different dates. I need to terminate the older values based on the new DateEffective
SELECT PRID, COUNT(SiteID) AS SiteID_Count FROM PRL
WHERE GETDATE() BETWEEN DateEffective AND DateTerminated
and SiteGID in (190,191,192,193,30,31,32,33)
GROUP BY PRID
HAVING COUNT(SiteID)=2
ORDER BY PRID
The tables below show the current and expected result:
select * from PRL where SiteGID in (30,31,32,33) and PRID = 1339
UNION
select * from PRL where SiteGID in (190,191,192,193) and PRID = 1339
Current:
| PRLID  | PRID | SiteGID | SiteID | DateEffective | DateTerminated |
| 895    | 1339 | 30      | 4353   | 2010-04-10    | 9999-12-31     |
| 966598 | 1339 | 191     | 4353   | 2021-02-19    | 9999-12-31     |
Expected:
| PRLID  | PRID | SiteGID | SiteID | DateEffective | DateTerminated |
| 895    | 1339 | 30      | 4353   | 2010-04-10    | **2021-02-18** |
| 966598 | 1339 | 191     | 4353   | 2021-02-19    | 9999-12-31     |
I want to link two temp tables together, possibly using ROW_NUMBER and partitions? I'm really not sure; any advice is greatly appreciated.
Based on your description:
- PRLID is the primary key of table PRL.
- Grouping is based on (PRID, SiteID).
- DateTerminated needs to be updated to the following DateEffective minus 1 day, where applicable.
with cte as (
  select prlid,
         date_sub(lead(date_effective, 1) over (partition by prid, site_id
                                                order by date_effective),
                  interval 1 day) as new_date_terminated
  from prl
)
update prl as p
inner join cte c using (prlid)
set p.date_terminated = c.new_date_terminated
where c.new_date_terminated is not null
  and p.date_terminated <> c.new_date_terminated;
Outcome:
prlid |prid|site_gid|site_id|date_effective|date_terminated|
------+----+--------+-------+--------------+---------------+
895|1339| 30| 4353| 2010-04-10| 2021-02-18|
966598|1339| 191| 4353| 2021-02-19| 9999-12-31|
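The LEAD step can be sanity-checked in isolation. Here is a sketch using Python's sqlite3 (SQLite spells the date arithmetic `date(..., '-1 day')` instead of MySQL's `DATE_SUB`; table and values mirror the question):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE prl (prlid INT, prid INT, site_gid INT, site_id INT,
                  date_effective TEXT, date_terminated TEXT);
INSERT INTO prl VALUES
  (895,    1339, 30,  4353, '2010-04-10', '9999-12-31'),
  (966598, 1339, 191, 4353, '2021-02-19', '9999-12-31');
""")

# LEAD fetches the next row's date_effective within the (prid, site_id)
# group; subtracting one day gives the new termination date. The newest
# row has no following row, so it stays NULL and is left untouched.
rows = con.execute("""
SELECT prlid,
       date(LEAD(date_effective) OVER (PARTITION BY prid, site_id
                                       ORDER BY date_effective),
            '-1 day') AS new_date_terminated
FROM prl
ORDER BY prlid
""").fetchall()

print(rows)  # [(895, '2021-02-18'), (966598, None)]
```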

Subtracting previous row value from current row

I'm doing an aggregation like this:
select
date,
product,
count(*) as cnt
from
t1
where
yyyy_mm_dd in ('2020-03-31', '2020-07-31', '2020-09-30', '2020-12-31')
group by
1,2
order by
product asc, date asc
This produces data which looks like this:
| date | product | cnt | difference |
|------------|---------|------|------------|
| 2020-03-31 | p1 | 100 | null |
| 2020-07-31 | p1 | 1000 | 900 |
| 2020-09-30 | p1 | 900 | -100 |
| 2020-12-31 | p1 | 1100 | 200 |
| 2020-03-31 | p2 | 200 | null |
| 2020-07-31 | p2 | 210 | 10 |
| ... | ... | ... | x |
But without the difference column. How could I make such a calculation? I could pivot the date column and subtract that way, but maybe there's a better way.
I was able to use LAG with PARTITION BY and ORDER BY to get this to work:
select
    date,
    product,
    cnt,
    cnt - lag(cnt) over (partition by product order by date) as difference
from (
    select
        date,
        product,
        count(*) as cnt
    from t1
    where yyyy_mm_dd in ('2020-03-31', '2020-07-31', '2020-09-30', '2020-12-31')
    group by 1, 2
) t
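The same aggregate-then-LAG pattern can be checked on a toy dataset with Python's sqlite3 (the table name matches the question, but the row counts here are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1 (yyyy_mm_dd TEXT, product TEXT)")
data = ([('2020-03-31', 'p1')] * 2 + [('2020-07-31', 'p1')] * 5
        + [('2020-03-31', 'p2')] * 4)
con.executemany("INSERT INTO t1 VALUES (?, ?)", data)

# Aggregate first, then LAG over the aggregated counts per product; the
# first row of each product has no previous count, so difference is NULL.
rows = con.execute("""
SELECT yyyy_mm_dd, product, cnt,
       cnt - LAG(cnt) OVER (PARTITION BY product ORDER BY yyyy_mm_dd)
         AS difference
FROM (SELECT yyyy_mm_dd, product, COUNT(*) AS cnt
      FROM t1
      GROUP BY 1, 2) t
ORDER BY product, yyyy_mm_dd
""").fetchall()

print(rows)
```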

Compare values of two timestamps and put them in two columns in results

DB-Fiddle
CREATE TABLE operations (
id SERIAL PRIMARY KEY,
time_stamp DATE,
product VARCHAR,
plan_week VARCHAR,
quantity DECIMAL
);
INSERT INTO operations
(time_stamp, product, plan_week, quantity
)
VALUES
('2020-01-01', 'Product_A', 'CW01', '125'),
('2020-01-01', 'Product_B', 'CW01', '300'),
('2020-01-01', 'Product_C', 'CW08', '700'),
('2020-01-01', 'Product_D', 'CW01', '900'),
('2020-01-01', 'Product_G', 'CW05', '600'),
('2020-01-01', 'Product_J', 'CW01', '465'),
('2020-03-15', 'Product_A', 'CW01', '570'),
('2020-03-15', 'Product_C', 'CW02', '150'),
('2020-03-15', 'Product_E', 'CW02', '325'),
('2020-03-15', 'Product_G', 'CW01', '482'),
('2020-03-15', 'Product_J', 'CW12', '323');
Expected Result:
time_stamp | product | plan_week | quantity | first_plan | last_plan |
---------- |-------------|--------------|------------|--------------|-------------|---
2020-01-01 | Product_A | CW01 | 125 | CW01 | CW01 |
2020-03-15 | Product_A | CW01 | 570 | CW01 | CW01 |
------------|-------------|--------------|------------|--------------|-------------|---
2020-01-01 | Product_B | CW01 | 300 | CW01 | CW01 |
------------|-------------|--------------|------------|--------------|-------------|---
2020-01-01 | Product_C | CW08 | 700 | CW08 | CW02 |
2020-03-15 | Product_C | CW02 | 150 | CW08 | CW02 |
------------|-------------|--------------|------------|--------------|-------------|---
2020-01-01 | Product_D | CW01 | 900 | CW01 | CW01 |
------------|-------------|--------------|------------|--------------|-------------|---
2020-03-15 | Product_E | CW02 | 325 | CW02 | CW02 |
------------|-------------|--------------|------------|--------------|-------------|---
2020-01-01 | Product_G | CW05 | 600 | CW05 | CW01 |
2020-03-15 | Product_G | CW01 | 482 | CW05 | CW01 |
------------|-------------|--------------|------------|--------------|-------------|---
2020-01-01 | Product_J | CW01 | 465 | CW01 | CW12 |
2020-03-15 | Product_J | CW12 | 323 | CW01 | CW12 |
I want to compare the plan_week of two timestamps per product and order them below each other, as you can see in the expected result.
In column first_plan I want to list the week of the first timestamp.
In column last_plan I want to list the week of the last timestamp.
I am currently using this query to achieve the result in PostgreSQL:
SELECT
time_stamp,
product,
plan_week,
quantity,
(FIRST_VALUE(plan_week) OVER (PARTITION BY product ORDER BY time_stamp ASC)) first_plan,
(FIRST_VALUE(plan_week) OVER (PARTITION BY product ORDER BY time_stamp DESC)) last_plan
FROM operations;
However, when I run this SQL on amazon-redshift I get:
ERROR: Aggregate window functions with an ORDER BY clause require a frame clause
How do I need to modify the query to make it work in Redshift as well?
The manual explains what a frame clause is:
https://docs.aws.amazon.com/redshift/latest/dg/r_WF_first_value.html
https://docs.aws.amazon.com/redshift/latest/dg/r_Window_function_synopsis.html
(It's how many rows the window should look forward or backward.)
You probably want something like...
SELECT
time_stamp,
product,
plan_week,
quantity,
FIRST_VALUE(plan_week)
OVER (
PARTITION BY product
ORDER BY time_stamp
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
AS first_plan,
LAST_VALUE(plan_week)
OVER (
PARTITION BY product
ORDER BY time_stamp
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
AS last_plan
FROM
operations
Note, I used LAST_VALUE() instead of reversing the ORDER BY. In general it's preferable to keep the same window clause for multiple window functions; it makes the optimiser's life a bit easier, which is good for you.
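The effect of the explicit frame can be demonstrated with Python's sqlite3, which accepts the same `ROWS BETWEEN` syntax (a two-product subset of the data, for illustration only):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE operations (time_stamp TEXT, product TEXT, plan_week TEXT, quantity INT);
INSERT INTO operations VALUES
  ('2020-01-01', 'Product_C', 'CW08', 700),
  ('2020-03-15', 'Product_C', 'CW02', 150),
  ('2020-01-01', 'Product_G', 'CW05', 600),
  ('2020-03-15', 'Product_G', 'CW01', 482);
""")

# The explicit frame makes LAST_VALUE see the whole partition instead of
# stopping at the current row (the default frame with an ORDER BY).
rows = con.execute("""
SELECT product, time_stamp,
       FIRST_VALUE(plan_week) OVER
         (PARTITION BY product ORDER BY time_stamp
          ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS first_plan,
       LAST_VALUE(plan_week) OVER
         (PARTITION BY product ORDER BY time_stamp
          ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_plan
FROM operations
ORDER BY product, time_stamp
""").fetchall()

print(rows)
```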

Joining 2 unrelated tables together

I have just delved into PostgreSQL and am currently trying to practice an unorthodox query whereby I want to join 2 unrelated tables, each with the same number of rows, together such that every row carries the combined columns of both tables.
These are what I have:
technical table
position | height | technical_id
----------+--------+-------------
Striker | 172 | 3
CAM | 165 | 4
(2 rows)
footballers table
name | age | country | game_id
----------+-----+-----------+--------
Pele | 77 | Brazil | 1
Maradona | 65 | Argentina | 2
(2 rows)
What I have tried:
SELECT name, '' AS position, null AS height, age, country, game_id, null as technical_id
from footballers
UNION
SELECT '' as name, position, height, null AS age,'' AS country, null as game_id, technical_id
from technical;
Output:
name | position | height | age | country | game_id | technical_id
----------+----------+--------+-----+-----------+---------+-------------
| Striker | 172 | | | | 3
| CAM | 165 | | | | 4
Maradona | | | 65 | Argentina | 2 |
Pele | | | 77 | Brazil | 1 |
(4 rows)
What I'm looking for (ideally):
name | position | height | age | country | game_id | technical_id
----------+----------+--------+-----+-----------+---------+-------------
Pele | Striker | 172 | 77 | Brazil | 1 | 3
Maradona | CAM | 165 | 65 | Argentina | 2 | 4
(2 rows)
Please use the below query. But it's not the right way of designing the schema; you should have a foreign key.
select t1.position, t1.height, t1.technical_id, t2.name, t2.age, t2.country, t2.game_id
from
(select position, height, technical_id,
        row_number() over (order by technical_id) as rnk
 from technical) t1
inner join
(select name, age, country, game_id,
        row_number() over (order by game_id) as rnk
 from footballers) t2
on t1.rnk = t2.rnk;
You don't have a column to join on, so you can generate one. What works is a sequential number generated by row_number(). So:
select *
from (select t.*, row_number() over () as seqnum
      from technical t
     ) t join
     (select f.*, row_number() over () as seqnum
      from footballers f
     ) f
     using (seqnum);
Note: Postgres has extended the syntax of row_number() so it does not require an order by clause. The ordering of the rows is arbitrary and might change on different runs of the query.
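A sketch of the row-number join in Python's sqlite3 (an ORDER BY is added inside each ROW_NUMBER() so the pairing is deterministic; without one, the pairing is arbitrary in SQLite just as in Postgres):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE technical (position TEXT, height INT, technical_id INT);
INSERT INTO technical VALUES ('Striker', 172, 3), ('CAM', 165, 4);
CREATE TABLE footballers (name TEXT, age INT, country TEXT, game_id INT);
INSERT INTO footballers VALUES ('Pele', 77, 'Brazil', 1), ('Maradona', 65, 'Argentina', 2);
""")

# Number the rows of each table independently, then join on that number.
rows = con.execute("""
SELECT f.name, t.position, t.height, f.age, f.country, f.game_id, t.technical_id
FROM (SELECT t.*, ROW_NUMBER() OVER (ORDER BY technical_id) AS seqnum
      FROM technical t) t
JOIN (SELECT f.*, ROW_NUMBER() OVER (ORDER BY game_id) AS seqnum
      FROM footballers f) f
  ON t.seqnum = f.seqnum
ORDER BY f.game_id
""").fetchall()

print(rows)
```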

How do I calculate minimum and maximum for groups in a sequence in SQL Server?

I have the following data in my database table in SQL Server:
Id Date Val_A Val_B Val_C Avg Vector MINMAXPOINTS
329 2016-01-15 78.09 68.40 70.29 76.50 BELOW 68.40
328 2016-01-14 79.79 75.40 76.65 76.67 BELOW 75.40
327 2016-01-13 81.15 74.59 79.00 76.44 ABOVE 81.15
326 2016-01-12 81.95 77.04 78.95 76.04 ABOVE 81.95
325 2016-01-11 82.40 73.65 81.34 75.47 ABOVE 82.40
324 2016-01-08 78.75 73.40 77.20 74.47 ABOVE 78.75
323 2016-01-07 76.40 72.29 72.95 73.74 BELOW 72.29
322 2016-01-06 81.25 77.70 78.34 73.12 ABOVE 81.25
321 2016-01-05 81.75 76.34 80.54 72.08 ABOVE 81.75
320 2016-01-04 80.95 75.15 76.29 70.86 ABOVE 80.95
The MINMAXPOINTS column should actually contain the lowest Val_B while Vector is 'BELOW' and the highest Val_A while Vector is 'ABOVE'. So we would have the following values in MINMAXPOINTS:
MINMAXPOINTS
68.40
68.40
82.40
82.40
82.40
82.40
72.29
81.75
81.75
81.75
Is it possible without cursor?
Any help will be greatly appreciated!
First apply the classic gaps-and-islands technique to determine the groups (islands of consecutive ABOVE/BELOW rows), then calculate the MIN and MAX for each group.
I assume that ID column defines the order of rows.
Tested on SQL Server 2008. Here is SQL Fiddle.
Sample data
CREATE TABLE #T
([Id] int, [dt] date, [Val_A] float, [Val_B] float, [Val_C] float, [Avg] float,
[Vector] varchar(5));
INSERT INTO #T ([Id], [dt], [Val_A], [Val_B], [Val_C], [Avg], [Vector]) VALUES
(329, '2016-01-15', 78.09, 68.40, 70.29, 76.50, 'BELOW'),
(328, '2016-01-14', 79.79, 75.40, 76.65, 76.67, 'BELOW'),
(327, '2016-01-13', 81.15, 74.59, 79.00, 76.44, 'ABOVE'),
(326, '2016-01-12', 81.95, 77.04, 78.95, 76.04, 'ABOVE'),
(325, '2016-01-11', 82.40, 73.65, 81.34, 75.47, 'ABOVE'),
(324, '2016-01-08', 78.75, 73.40, 77.20, 74.47, 'ABOVE'),
(323, '2016-01-07', 76.40, 72.29, 72.95, 73.74, 'BELOW'),
(322, '2016-01-06', 81.25, 77.70, 78.34, 73.12, 'ABOVE'),
(321, '2016-01-05', 81.75, 76.34, 80.54, 72.08, 'ABOVE'),
(320, '2016-01-04', 80.95, 75.15, 76.29, 70.86, 'ABOVE');
Query
To understand better how this works, examine the results of each CTE.
CTE_RowNumbers calculates two sequences of row numbers.
CTE_Groups assigns a number to each group (above/below).
CTE_MinMax calculates the MIN/MAX for each group.
The final SELECT picks the MIN or MAX to return.
WITH
CTE_RowNumbers
AS
(
SELECT [Id], [dt], [Val_A], [Val_B], [Val_C], [Avg], [Vector]
,ROW_NUMBER() OVER (ORDER BY ID DESC) AS rn1
,ROW_NUMBER() OVER (PARTITION BY Vector ORDER BY ID DESC) AS rn2
FROM #T
)
,CTE_Groups
AS
(
SELECT [Id], [dt], [Val_A], [Val_B], [Val_C], [Avg], [Vector]
,rn1-rn2 AS Groups
FROM CTE_RowNumbers
)
,CTE_MinMax
AS
(
SELECT [Id], [dt], [Val_A], [Val_B], [Val_C], [Avg], [Vector]
,MAX(Val_A) OVER(PARTITION BY Vector, Groups) AS MaxA
,MIN(Val_B) OVER(PARTITION BY Vector, Groups) AS MinB
FROM CTE_Groups
)
SELECT [Id], [dt], [Val_A], [Val_B], [Val_C], [Avg], [Vector]
,CASE
WHEN [Vector] = 'BELOW' THEN MinB
WHEN [Vector] = 'ABOVE' THEN MaxA
END AS MINMAXPOINTS
FROM CTE_MinMax
ORDER BY ID DESC;
Result
+-----+------------+-------+-------+-------+-------+--------+--------------+
| Id | dt | Val_A | Val_B | Val_C | Avg | Vector | MINMAXPOINTS |
+-----+------------+-------+-------+-------+-------+--------+--------------+
| 329 | 2016-01-15 | 78.09 | 68.4 | 70.29 | 76.5 | BELOW | 68.4 |
| 328 | 2016-01-14 | 79.79 | 75.4 | 76.65 | 76.67 | BELOW | 68.4 |
| 327 | 2016-01-13 | 81.15 | 74.59 | 79 | 76.44 | ABOVE | 82.4 |
| 326 | 2016-01-12 | 81.95 | 77.04 | 78.95 | 76.04 | ABOVE | 82.4 |
| 325 | 2016-01-11 | 82.4 | 73.65 | 81.34 | 75.47 | ABOVE | 82.4 |
| 324 | 2016-01-08 | 78.75 | 73.4 | 77.2 | 74.47 | ABOVE | 82.4 |
| 323 | 2016-01-07 | 76.4 | 72.29 | 72.95 | 73.74 | BELOW | 72.29 |
| 322 | 2016-01-06 | 81.25 | 77.7 | 78.34 | 73.12 | ABOVE | 81.75 |
| 321 | 2016-01-05 | 81.75 | 76.34 | 80.54 | 72.08 | ABOVE | 81.75 |
| 320 | 2016-01-04 | 80.95 | 75.15 | 76.29 | 70.86 | ABOVE | 81.75 |
+-----+------------+-------+-------+-------+-------+--------+--------------+
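The rn1 - rn2 island numbering ports to any engine with window functions. A sketch in Python's sqlite3 over a six-row subset (Vector is included in the final partition so that an ABOVE island and a BELOW island which happen to share the same rn1 - rn2 value stay separate):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t (id INT, val_a REAL, val_b REAL, vector TEXT);
INSERT INTO t VALUES
  (329, 78.09, 68.40, 'BELOW'), (328, 79.79, 75.40, 'BELOW'),
  (327, 81.15, 74.59, 'ABOVE'), (326, 81.95, 77.04, 'ABOVE'),
  (323, 76.40, 72.29, 'BELOW'), (322, 81.25, 77.70, 'ABOVE');
""")

rows = con.execute("""
WITH g AS (
  -- rn1 - rn2 is constant within each run of equal vector values
  SELECT *, ROW_NUMBER() OVER (ORDER BY id DESC)
          - ROW_NUMBER() OVER (PARTITION BY vector ORDER BY id DESC) AS grp
  FROM t
), m AS (
  SELECT *, MIN(val_b) OVER (PARTITION BY vector, grp) AS minb,
            MAX(val_a) OVER (PARTITION BY vector, grp) AS maxa
  FROM g
)
SELECT id, CASE vector WHEN 'BELOW' THEN minb ELSE maxa END AS minmaxpoints
FROM m
ORDER BY id DESC
""").fetchall()

print(rows)
```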
Modify the query to check the group of data at or after the current record, as follows. You can use the below query with a CASE statement, which lets you select a conditional value based on the Vector value for each row (SQL Server has no ROWID pseudo-column, so the Id column is used for the ordering):
SELECT Id, [Date], Val_A, Val_B, Val_C, [Avg], Vector,
CASE
    WHEN Vector = 'BELOW' THEN (SELECT MIN(Val_B) FROM [Table] A WHERE A.Id >= B.Id)
    WHEN Vector = 'ABOVE' THEN (SELECT MAX(Val_A) FROM [Table] A WHERE A.Id >= B.Id)
END AS MINMAXVALUE
FROM [Table] B
GO
This should yield the result you are expecting from the data.
You can use the below query with a CASE statement, which lets you select a conditional value based on the Vector value for each row. The query is:
SELECT Id, [Date], Val_A, Val_B, Val_C, [Avg], Vector,
CASE
    WHEN Vector = 'BELOW' THEN (SELECT MIN(Val_B) FROM [Table] A)
    WHEN Vector = 'ABOVE' THEN (SELECT MAX(Val_A) FROM [Table] A)
END AS MINMAXVALUE
FROM [Table] B
GO
Check if this helps you.