Is there a difference between Oracle SQL 'KEEP' for multiple columns and 'KEEP' for one and GROUP BY for the rest? - sql

I'm just now learning about KEEP in Oracle SQL, but I cannot seem to find documentation that explains why their examples use KEEP in all columns that are not indexed.
I have a table with 5 columns
PERSON_ID | BRANCH | YEAR | STATUS | TIMESTAMP
123456 | 0001 | 2017 | 1 | 1-1-2017 (ROW 1)
123456 | 0001 | 2017 | 2 | 2-1-2017 (ROW 2)
123456 | 0002 | 2017 | 3 | 3-1-2017 (ROW 3)
123456 | 0001 | 2017 | 2 | 4-1-2017 (ROW 4)
123456 | 0001 | 2018 | 2 | 1-1-2018 (ROW 5)
123456 | 0001 | 2018 | 3 | 2-1-2018 (ROW 6)
I want to return the row of the most recent timestamp by person, branch, and year, so rows 3, 4, and 6.
RESULTS
PERSON_ID | BRANCH | YEAR | STATUS | TIME_STAMP
123456 | 0002 | 2017 | 3 | 3-1-2017 (ROW 3)
123456 | 0001 | 2017 | 2 | 4-1-2017 (ROW 4)
123456 | 0001 | 2018 | 3 | 2-1-2018 (ROW 6)
To get the entire row, I would normally I would write something like this:
SELECT *
FROM STATUS_TABLE a
WHERE a.TIME_STAMP =
(
SELECT MAX(sub.TIME_STAMP)
FROM STATUS_TABLE sub
WHERE a.PERSON_ID = sub.PERSON_ID
AND a.YEAR = sub.YEAR
AND a.BRANCH = sub.BRANCH
)
But I'm learning I can write this:
SELECT
a.PERSON_ID,
a.YEAR,
a.BRANCH,
MAX(a.STATUS) KEEP (DENSE_RANK FIRST ORDER BY TIME_STAMP DESC)
FROM STATUS_TABLE a
GROUP BY a.PERSON_ID, a.YEAR, a.BRANCH;
My concern is that a lot of the documentation and example I'm finding doesn't put all the group-by columns in GROUP BY, but rather they write a KEEP statement for many columns.
Like this:
SELECT
a.PERSON_ID,
MAX(a.YEAR) KEEP (DENSE_RANK FIRST ORDER BY TIME_STAMP DESC),
MAX(a.BRANCH) KEEP (DENSE_RANK FIRST ORDER BY TIME_STAMP DESC),
MAX(a.STATUS) KEEP (DENSE_RANK FIRST ORDER BY TIME_STAMP DESC)
FROM STATUS_TABLE a
GROUP BY a.PERSON_ID;
QUESTION
If I know that there will never be duplicates on TIME_STAMP for an ID, YEAR, and BRANCH, can I write it the first way or do I still need to write it the 2nd way. Using the first way, I get the results I'm expecting, but I can't seem to find any explanation of this method and what the differences may be.
Are there any?

Your aggregation queries are different. When you have:
GROUP BY a.PERSON_ID, a.YEAR, a.BRANCH
Your result set will have one row in the result set for each combination of the three columns.
If you specify:
GROUP BY a.PERSON_ID
Then there is one row only for each PERSON_ID. Under some circumstances, this is the same as the above version. But only when there is one YEAR and BRANCH per PERSON_ID. That is not true in your data.
These versions are functionally equivalent for most practical purposes to your version with the correlated subquery. One difference is what happens if any of the grouping/correlation columns are NULL. The GROUP BY keeps these groupings. The correlated subquery filters them out.

Related

ORACLE SELECT DISTINCT VALUE ONLY IN SOME COLUMNS

+----+------+-------+---------+---------+
| id | order| value | type | account |
+----+------+-------+---------+---------+
| 1 | 1 | a | 2 | 1 |
| 1 | 2 | b | 1 | 1 |
| 1 | 3 | c | 4 | 1 |
| 1 | 4 | d | 2 | 1 |
| 1 | 5 | e | 1 | 1 |
| 1 | 5 | f | 6 | 1 |
| 2 | 6 | g | 1 | 1 |
+----+------+-------+---------+---------+
I need get a select of all fields of this table but only getting 1 row for each combination of id+type (I don't care the value of the type). But I tried some approach without result.
At the moment that I make an DISTINCT I cant include rest of the fields to make it available in a subquery. If I add ROWNUM in the subquery all rows will be different making this not working.
Some ideas?
My better query at the moment is this:
SELECT ID, TYPE, VALUE, ACCOUNT
FROM MYTABLE
WHERE ROWID IN (SELECT DISTINCT MAX(ROWID)
FROM MYTABLE
GROUP BY ID, TYPE);
It seems you need to select one (random) row for each distinct combination of id and type. If so, you could do that efficiently using the row_number analytic function. Something like this:
select id, type, value, account
from (
select id, type, value, account,
row_number() over (partition by id, type order by null) as rn
from your_table
)
where rn = 1
;
order by null means random ordering of rows within each group (partition) by (id, type); this means that the ordering step, which is usually time-consuming, will be trivial in this case. Also, Oracle optimizes such queries (for the filter rn = 1).
Or, in versions 12.1 and higher, you can get the same with the match_recognize clause:
select id, type, value, account
from my_table
match_recognize (
partition by id, type
all rows per match
pattern (^r)
define r as null is null
);
This partitions the rows by id and type, it doesn't order them (which means random ordering), and selects just the "first" row from each partition. Note that some analytic functions, including row_number(), require an order by clause (even when we don't care about the ordering) - order by null is customary, but it can't be left out completely. By contrast, in match_recognize you can leave out the order by clause (the default is "random order"). On the other hand, you can't leave out the define clause, even if it imposes no conditions whatsoever. Why Oracle doesn't use a default for that clause too, only Oracle knows.

Show last update date

I am new in this forum and also new in SQL my question is
I have an Excel sheet link to database with "From Microsoft query" I have 3 tables link together pd_ln,pdcflbrt,pdlbr
By using the following query I am getting this data
SELECT pdcflbrt.lbrcod, pdcflbrt.lbrrat, pd_ln.prdnum, pdcflbrt.begeffdat
FROM velocity.dbo.pd_ln pd_ln, velocity.dbo.pdcflbrt pdcflbrt, velocity.dbo.pdlbr pdlbr
WHERE pdlbr.lbrrattky = pdcflbrt.lbrrattky AND pd_ln.pd_ln_tky = pdlbr.pd_ln_tky
+--------------+--------------+-----------+------------------+
| lbrcod | lbrrat | prdnum | begeffdat |
+--------------+--------------+-----------+------------------+
| FC Braselton | 0.11 | 00236 | 7/15/2012 0:00 |
| FC Braselton | 0.11 | 00236 | 7/15/2012 0:00 |
| FC Braselton | 0.1 | 00236 | 12/10/2012 0:00 |
| Sizing | 0.21 | 03103 | 8/28/2015 0:00 |
| Sizing | 0.2 | 03103 | 10/13/2011 0:00 |
+--------------+--------------+-----------+------------------+
How do I query to get the last begeffdat of each prdnum.
Magood's answer may work in this situation. However, if there was a unique identifier for each edit that you were selecting, it wouldn't work. As far as I know, you would have to get involved with row_number() like so:
SELECT s2.lbrcod, s2.lbrrat, s2.prdnum, s2.begeffdat from
(SELECT pdcflbrt.lbrcod
, pdcflbrt.lbrrat
, pd_ln.prdnum
, pdcflbrt.begeffdat
, row_number() over (partition by pd_ln.prdnum order by pdcflbrt.begeffdat desc) as RN
FROM velocity.dbo.pd_ln pd_ln, velocity.dbo.pdcflbrt pdcflbrt, velocity.dbo.pdlbr pdlbr
WHERE pdlbr.lbrrattky = pdcflbrt.lbrrattky AND pd_ln.pd_ln_tky = pdlbr.pd_ln_tky) s2
where s2.rn = 1
This will return only the top date (it is the same query on the inner portion, but with the row_number() function added, with each different prdnum starting the numbers over, and ordering the rows by date, with the newest date first. The outer portion selects only row 1 (that's the last where) which is the newest date.
EDIT: Alternatively, if you only want the OLDEST update, you could change the desc in the main query's select statement to say asc.
-- Only for name and latest date
select lbrcod, max(begeffdate) begeffdat from #table
group by lbrcod
-- For all columns
select * from (
select *, row_number() over (partition by prdnum order by begeffdate desc) rowNum from #table
) data
where rowNum = 1

PostgreSQL return multiple rows with DISTINCT though only latest date per second column

Lets says I have the following database table (date truncated for example only, two 'id_' preix columns join with other tables)...
+-----------+---------+------+--------------------+-------+
| id_table1 | id_tab2 | date | description | price |
+-----------+---------+------+--------------------+-------+
| 1 | 11 | 2014 | man-eating-waffles | 1.46 |
+-----------+---------+------+--------------------+-------+
| 2 | 22 | 2014 | Flying Shoes | 8.99 |
+-----------+---------+------+--------------------+-------+
| 3 | 44 | 2015 | Flying Shoes | 12.99 |
+-----------+---------+------+--------------------+-------+
...and I have a query like the following...
SELECT id, date, description FROM inventory ORDER BY date ASC;
How do I SELECT all the descriptions, but only once each while simultaneously only the latest year for that description? So I need the database query to return the first and last row from the sample data above; the second it not returned because the last row has a later date.
Postgres has something called distinct on. This is usually more efficient than using window functions. So, an alternative method would be:
SELECT distinct on (description) id, date, description
FROM inventory
ORDER BY description, date desc;
The row_number window function should do the trick:
SELECT id, date, description
FROM (SELECT id, date, description,
ROW_NUMBER() OVER (PARTITION BY description
ORDER BY date DESC) AS rn
FROM inventory) t
WHERE rn = 1
ORDER BY date ASC;

Select first & last date in window

I'm trying to select first & last date in window based on month & year of date supplied.
Here is example data:
F.rates
| id | c_id | date | rate |
---------------------------------
| 1 | 1 | 01-01-1991 | 1 |
| 1 | 1 | 15-01-1991 | 0.5 |
| 1 | 1 | 30-01-1991 | 2 |
.................................
| 1 | 1 | 01-11-2014 | 1 |
| 1 | 1 | 15-11-2014 | 0.5 |
| 1 | 1 | 30-11-2014 | 2 |
Here is pgSQL SELECT I came up with:
SELECT c_id, first_value(date) OVER w, last_value(date) OVER w FROM F.rates
WINDOW w AS (PARTITION BY EXTRACT(YEAR FROM date), EXTRACT(MONTH FROM date), c_id
ORDER BY date ASC)
Which gives me a result pretty close to what I want:
| c_id | first_date | last_date |
----------------------------------
| 1 | 01-01-1991 | 15-01-1991 |
| 1 | 01-01-1991 | 30-01-1991 |
.................................
Should be:
| c_id | first_date | last_date |
----------------------------------
| 1 | 01-01-1991 | 30-01-1991 |
.................................
For some reasons last_value(date) returns every record in a window. Which giving me a thought that I'm misunderstanding how windows in SQL works. It's like SQL forming a new window for each row it iterates through, but not multiple windows for entire table based on YEAR and MONTH.
So could any one be kind and explain if I'm wrong and how do I achieve the result I want?
There is a reason why i'm not using MAX/MIN over GROUP BY clause. My next step would be to retrieve associated rates for dates I selected, like:
| c_id | first_date | last_date | first_rate | last_rate | avg rate |
-----------------------------------------------------------------------
| 1 | 01-01-1991 | 30-01-1991 | 1 | 2 | 1.1 |
.......................................................................
If you want your output to become grouped into a single (or just fewer) row(s), you should use simple aggregation (i.e. GROUP BY), if avg_rate is enough:
SELECT c_id, min(date), max(date), avg(rate)
FROM F.rates
GROUP BY c_id, date_trunc('month', date)
More about window functions in PostgreSQL's documentation:
But unlike regular aggregate functions, use of a window function does not cause rows to become grouped into a single output row — the rows retain their separate identities.
...
There is another important concept associated with window functions: for each row, there is a set of rows within its partition called its window frame. Many (but not all) window functions act only on the rows of the window frame, rather than of the whole partition. By default, if ORDER BY is supplied then the frame consists of all rows from the start of the partition up through the current row, plus any following rows that are equal to the current row according to the ORDER BY clause. When ORDER BY is omitted the default frame consists of all rows in the partition.
...
There are options to define the window frame in other ways ... See Section 4.2.8 for details.
EDIT:
If you want to collapse (min/max aggregation) your data and want to collect more columns than those what listed in GROUP BY, you have 2 choice:
The SQL way
Select min/max value(s) in a sub-query, then join their original rows back (but this way, you have to deal with the fact, that min/max-ed column(s) usually not unique):
SELECT c_id,
min first_date,
max last_date,
first.rate first_rate,
last.rate last_rate,
avg avg_rate
FROM (SELECT c_id, min(date), max(date), avg(rate)
FROM F.rates
GROUP BY c_id, date_trunc('month', date)) agg
JOIN F.rates first ON agg.c_id = first.c_id AND agg.min = first.date
JOIN F.rates last ON agg.c_id = last.c_id AND agg.max = last.date
PostgreSQL's DISTINCT ON
DISTINCT ON is typically meant for this task, but highly rely on ordering (only 1 extremum can be searched for this way at a time):
SELECT DISTINCT ON (c_id, date_trunc('month', date))
c_id,
date first_date,
rate first_rate
FROM F.rates
ORDER BY c_id, date
You can join this query with other aggregated sub-queries of F.rates, but this point (if you really need both minimum & maximum, and in your case even an average) the SQL compliant way is more suiting.
Windowing functions aren't appropriate for this. Use aggregate functions instead.
select
c_id, date_trunc('month', date)::date,
min(date) first_date, max(date) last_date
from rates
group by c_id, date_trunc('month', date)::date;
c_id | date_trunc | first_date | last_date
------+------------+------------+------------
1 | 2014-11-01 | 2014-11-01 | 2014-11-30
1 | 1991-01-01 | 1991-01-01 | 1991-01-30
create table rates (
id integer not null,
c_id integer not null,
date date not null,
rate numeric(2, 1),
primary key (id, c_id, date)
);
insert into rates values
(1, 1, '1991-01-01', 1),
(1, 1, '1991-01-15', 0.5),
(1, 1, '1991-01-30', 2),
(1, 1, '2014-11-01', 1),
(1, 1, '2014-11-15', 0.5),
(1, 1, '2014-11-30', 2);

MIN() Function in SQL

Need help with Min Function in SQL
I have a table as shown below.
+------------+-------+-------+
| Date_ | Name | Score |
+------------+-------+-------+
| 2012/07/05 | Jack | 1 |
| 2012/07/05 | Jones | 1 |
| 2012/07/06 | Jill | 2 |
| 2012/07/06 | James | 3 |
| 2012/07/07 | Hugo | 1 |
| 2012/07/07 | Jack | 1 |
| 2012/07/07 | Jim | 2 |
+------------+-------+-------+
I would like to get the output like below
+------------+------+-------+
| Date_ | Name | Score |
+------------+------+-------+
| 2012/07/05 | Jack | 1 |
| 2012/07/06 | Jill | 2 |
| 2012/07/07 | Hugo | 1 |
+------------+------+-------+
When I use the MIN() function with just the date and Score column I get the lowest score for each date, which is what I want. I don't care which row is returned if there is a tie in the score for the same date. Trouble starts when I also want name column in the output. I tried a few variation of SQL (i.e min with correlated sub query) but I have no luck getting the output as shown above. Can anyone help please:)
Query is as follows
SELECT DISTINCT
A.USername, A.Date_, A.Score
FROM TestTable AS A
INNER JOIN (SELECT Date_,MIN(Score) AS MinScore
FROM TestTable
GROUP BY Date_) AS B
ON (A.Score = B.MinScore) AND (A.Date_ = B.Date_);
Use this solution:
SELECT a.date_, MIN(name) AS name, a.score
FROM tbl a
INNER JOIN
(
SELECT date_, MIN(score) AS minscore
FROM tbl
GROUP BY date_
) b ON a.date_ = b.date_ AND a.score = b.minscore
GROUP BY a.date_, a.score
SQL-Fiddle Demo
This will get the minimum score per date in the INNER JOIN subselect, which we use to join to the main table. Once we join the subselect, we will only have dates with names having the minimum score (with ties being displayed).
Since we only want one name per date, we then group by date and score, selecting whichever name: MIN(name).
If we want to display the name column, we must use an aggregate function on name to facilitate the GROUP BY on date and score columns, or else it will not work (We could also use MAX() on that column as well).
Please learn about the GROUP BY functionality of RDBMS.
SELECT Date_,Name,MIN(Score)
FROM T
GROUP BY Name
This makes the assumption that EACH NAME and EACH date appears only once, and this will only work for MySQL.
To make it work on other RDBMSs, you need to apply another group function on the Date column, like MAX. MIN. etc
SELECT T.Name, T.Date_, MIN(T.Score) as Score FROM T
GROUP BY T.Date_
Edit: This answer is not corrected as pointed out by JNK in comments
SELECT Date_,MAX(Name),MIN(Score)
FROM T
GROUP BY Date_
Here I am using MAX(NAME), it will pick one name if two names were found with the same goal numbers.
This will find Min score for each day (no duplicates), scored by any player. The name that starts with Z will be picked first than the name that starts with A.
Edit: Fixed by removing group by name