Confusing SELECT statement - sql

First I will show you example tables that my issue pertains to, then I will ask the question.
[my_fruits]
fruit_name | fruit_id | fruit_owner | fruit_timestamp
----------------------------------------------------------------
Banana | 3 | Timmy | 3/4/11
Banana | 3 | Timmy | 4/1/11
Banana | 8 | Timmy | 5/2/11
Apple | 4 | Timmy | 2/1/11
Apple | 4 | Roger | 3/4/11
Now I want to run a query that only selects fruit_name, fruit_id, and fruit_owner values. I only want to get one row per fruit, and the way I want it to be decided is by the latest timestamp. For example the perfect query on this table would return:
[my_fruits]
fruit_name | fruit_id | fruit_owner |
----------------------------------------------
Banana | 8 | Timmy |
Apple | 4 | Roger |
I tried the query:
select max(my_fruits.fruit_name) keep
(dense_rank last order by my_fruits.fruit_timestamp) fruit_name,
my_fruits.fruit_id, my_fruits.fruit_owner
from my_fruits
group by my_fruits.fruit_id, my_fruits.fruit_owner
Now the issue with that is returns basically distinct fruit names, fruit ids, and fruit owners.

For Oracle 9i+, use:
SELECT x.fruit_name,
x.fruit_id,
x.fruit_owner
FROM (SELECT mf.fruit_name,
mf.fruit_id,
mf.fruit_owner,
ROW_NUMBER() OVER (PARTITION BY mf.fruit_name
ORDER BY mf.fruit_timestamp) AS rank
FROM MY_FRUIT mf) x
WHERE x.rank = 1
Most databases will support using a self join on a derived table/inline view:
SELECT x.fruit_name,
x.fruit_id,
x.fruit_owner
FROM MY_FRUIT x
JOIN (SELECT t.fruit_name,
MAX(t.fruit_timestamp) AS max_ts
FROM MY_FRUIT t
GROUP BY t.fruit_name) y ON y.fruit_name = x.fruit_name
AND y.max_ts = x.fruit_timestamp
However, this will return duplicates if there are 2+ fruit_name records with the same timestamp value.

If you want one row per fruit name, you have to group by fruit_name.
select fruit_name,
max(my_fruits.fruit_id) keep
(dense_rank last order by my_fruits.fruit_timestamp) fruit_id,
max(my_fruits.fruit_owner) keep
(dense_rank last order by my_fruits.fruit_timestamp) fruit_owner
from my_fruits
group by my_fruits.fruit_name
How you want to deal with tie-breaks is a separate issue.

Try a subquery:
select a.fruit_name, a.fruit_id, a.fruit_owner
from my_fruits a
where a.fruit_timestamp =
(select max(b.fruit_timestamp)
from my_fruits b
where b.fruit_id = a.fruit_id)

I would do it by finding out the list of (fruit_name, fruit_timestamp) which are of interest to you, and then grouping that "table" with the actual fruit table and retrieving the other values.
SELECT fruit_and_max_t.fruit_name,
my_fruits.fruit_id,
my_fruits.fruit_owner
FROM my_fruits,
( SELECT fruit_name, MAX(fruit_timestamp) AS max_timestamp
FROM my_fruits
GROUP BY fruit_name) AS fruit_and_max_t,
WHERE fruit_and_max_t.max_timestamp = my_fruits.fruit_timestamp
AND fruit_and_max_t.fruit_name = my_fruits.fruit_name
This assumes that there are not multiple entries in the table with the same value of (fruit_name, fruit_timestamp), i.e. that tuple (pair) act like a unique identifier.

Related

Novice seeking help, Max Aggregate not returning expected results

I'm still very new to MS-SQL. I have a simple table and query that that is getting the best of me. I know it will something fundamental I'm overlooking.
I've changed the field names but the idea is the same.
So the idea is that every time someone signs up they get a RegID, Name, and Team. The names are unique, so for below yes John changed teams. And that's my trouble.
Football Table
+------------+----------+---------+
| Max_RegID | Name | Team |
+------------+----------+---------+
| 100 | John | Red |
| 101 | Bill | Blue |
| 102 | Tom | Green |
| 103 | John | Green |
+------------+----------+---------+
With the query at the bottom using the Max_RegID, I was expecting to get back only one record.
+------------+----------+---------+
| Max_RegID | Name | Team |
+------------+----------+---------+
| 103 | John | Green |
+------------+----------+---------+
Instead I get back below, Which seems to include Max_RegID but also for each team. What am I doing wrong?
+------------+----------+---------+
| Max_RegID | Name | Team |
+------------+----------+---------+
| 100 | John | Red |
| 103 | John | Green |
+------------+----------+---------+
My Query
SELECT
Max(Football.RegID) AS Max_RegID,
Football.Name,
Football.Team
FROM
Football
GROUP BY
Football.RegID,
Football.Name,
Football.Team
EDIT* Removed the WHERE statement
The reason you're getting the results that you are is because of the way you have your GROUP BY clause structured.
When you're using any aggregate function, MAX(X), SUM(X), COUNT(X), or what have you, you're telling the SQL engine that you want the aggregate value of column X for each unique combination of the columns listed in the GROUP BY clause.
In your query as written, you're grouping by all three of the columns in the table, telling the SQL engine that each tuple is unique. Therefore the query is returning ALL of the values, and you aren't actually getting the MAX of anything at all.
What you actually want in your results is the maximum RegID for each distinct value in the Name column and also the Team that goes along with that (RegID,Name) combination.
To accomplish that you need to find the MAX(ID) for each Name in an initial data set, and then use that list of RegIDs to add the values for Name and Team in a secondary data set.
Caveat (per comments from #HABO): This is premised on the assumption that RegID is a unique number (an IDENTITY column, value from a SEQUENCE, or something of that sort). If there are duplicate values, this will fail.
The most straight forward way to accomplish that is with a sub-query. The sub-query below gets your unique RegIDs, then joins to the original table to add the other values.
SELECT
f.RegID
,f.Name
,f.Team
FROM
Football AS f
JOIN
(--The sub-query, sq, gets the list of IDs
SELECT
MAX(f2.RegID) AS Max_RegID
FROM
Football AS f2
GROUP BY
f2.Name
) AS sq
ON
sq.Max_RegID = f.RegID;
EDIT: Sorry. I just re-read the question. To get just the single record for the MAX(RegID), just take the GROUP BY out of the sub-query, and you'll just get the current maximum value, which you can use to find the values in the rest of the columns.
SELECT
f.RegID
,f.Name
,f.Team
FROM
Football AS f
JOIN
(--The sub-query, sq, now gets the MAX ID
SELECT
MAX(f2.RegID) AS Max_RegID
FROM
Football AS f2
) AS sq
ON
sq.Max_RegID = f.RegID;
Use row_number()
select * from
(SELECT
Football.RegID AS Max_RegID,
Football.Name,
Football.Team, row_number() over(partition by name order by Football.RegID desc) as rn
FROM
Football
WHERE
Football.Name = 'John')a
where rn=1
simply you can edit your query below way
SELECT *
FROM
Football f
WHERE
f.Name = 'John' and
Max_RegID = (SELECT Max(Football.Max_RegID) where Football.Name = 'John'
)
or
if sql server simply use this
select top 1 * from Football f
where f.Name = 'John'
order by Max_RegID desc
or
if mysql then
select * from Football f
where f.Name = 'John'
order by Max_RegID desc
Limit 1
You need self join :
select f1.*
from Football f inner join
Football f1
on f1.name = f.name
where f.Max_RegID = 103;
After re-visit question, the sample data suggests me subquery :
select f.*
from Football f
where name = (select top (1) f1.name
from Football f1
order by f1.Max_RegID desc
);

More efficient way to query shortest string value associated with each value in another column in Hive QL

I have a table in Hive containing store names, order IDs, and User IDs (as well as some other columns including item ID). There is a row in the table for every item purchased (so there can be more than one row per order if the order contains multiple items). Order IDs are unique within a store, but not across stores. A single order can have more than one user ID associated with it.
I'm trying to write a query that will return a list of all stores and order IDs and the shortest user ID associated with each order.
So, for example, if the data looks like this:
STORE | ORDERID | USERID | ITEMID
------+---------+--------+-------
| a | 1 | bill | abc |
| a | 1 | susan | def |
| a | 2 | jane | abc |
| b | 1 | scott | ghi |
| b | 1 | tony | jkl |
Then the output would look like this:
STORE | ORDERID | USERID
------+---------+-------
a | 1 | bill
a | 2 | jane
b | 1 | tony
I've written a query that will do this, but I feel like there must be a more efficient way to go about it. Does anybody know a better way to produce these results?
This is what I have so far:
select
users.store, users.orderid, users.userid
from
(select
store, orderid, userid, length(userid) as len
from
sales) users
join
(select distinct
store, orderid,
min(length(userid)) over (partition by store, orderid) as len
from
sales) len on users.store = len.store
and users.orderid = len.orderid
and users.len = len.len
Check out probably this will work for you, here you can achieve your goal of single "SELECT" clause with no extra overhead on SQL.
select distinct
store, orderid,
first_value(userid) over(partition by store, orderid order by length(userid) asc) f_val
from
sales;
The result will be:
store orderid f_val
a 1 bill
a 2 jane
b 1 tony
Probably rank() is the best way:
select s.*
from (select s.*, rank() over (partition by store order by length(userid) as seqnum
from sales s
) s
where seqnum = 1;

Closest match per group?

I have table that looks similar to this:
Motor MotorType CalibrationValueX CalibrationValueY
A Car 1.2343 2.33343
B Boat 1.2455 2.55434
B1 Boat 1.4554 2.11211
C Car 1.4323 4.56555
D Car 1.533 4.6666
..... 500 entries
In my SQL query, I am trying to find average of CalibrationValueY where CalibrationValueX is a certain value:
SELECT avg(CalibrationValueY), MotorType, Motor FROM MotorTable
WHERE CalibrationValueX = 1.23333
GROUP BY MotorType
This will not return anything, since there is not a CalibrationValueX value that equals exactly 1.23333.
I am able to find closest match separately for each MotorTable with:
SELECT TOP 1 CalibrationValueY, FileSize, MotorType, Motor FROM MotorTable
where FileType = 'text' order by abs(FileSize - 1.23333)
however, I can't get it to work with a group by statement.
How can I do it so that if I am grouping by MotorType and I am searching CalibrationValueX = 1.23333, I would get this:
A Car 1.2343 2.33343
B Boat 1.2455 2.55434
Using ROW_NUMBER and PARTITION BY You combinate TOP 1 for each group
SQL Fiddle Demo
with cte as (
SELECT MotorType, CalibrationValueX, CalibrationValueY,
ROW_NUMBER() over (partition by MotorType order by abs(CalibrationValueX - 1.23333)) rn
from historyCR
)
SELECT *
from cte
where rn = 1
OUTPUT
| MotorType | CalibrationValueX | CalibrationValueY | rn |
|-----------|-------------------|-------------------|----|
| Boat | 1.2455 | 2.55434 | 1 |
| Car | 1.2343 | 2.33343 | 1 |

PostgreSQL return multiple rows with DISTINCT though only latest date per second column

Lets says I have the following database table (date truncated for example only, two 'id_' preix columns join with other tables)...
+-----------+---------+------+--------------------+-------+
| id_table1 | id_tab2 | date | description | price |
+-----------+---------+------+--------------------+-------+
| 1 | 11 | 2014 | man-eating-waffles | 1.46 |
+-----------+---------+------+--------------------+-------+
| 2 | 22 | 2014 | Flying Shoes | 8.99 |
+-----------+---------+------+--------------------+-------+
| 3 | 44 | 2015 | Flying Shoes | 12.99 |
+-----------+---------+------+--------------------+-------+
...and I have a query like the following...
SELECT id, date, description FROM inventory ORDER BY date ASC;
How do I SELECT all the descriptions, but only once each while simultaneously only the latest year for that description? So I need the database query to return the first and last row from the sample data above; the second it not returned because the last row has a later date.
Postgres has something called distinct on. This is usually more efficient than using window functions. So, an alternative method would be:
SELECT distinct on (description) id, date, description
FROM inventory
ORDER BY description, date desc;
The row_number window function should do the trick:
SELECT id, date, description
FROM (SELECT id, date, description,
ROW_NUMBER() OVER (PARTITION BY description
ORDER BY date DESC) AS rn
FROM inventory) t
WHERE rn = 1
ORDER BY date ASC;

MIN() Function in SQL

Need help with Min Function in SQL
I have a table as shown below.
+------------+-------+-------+
| Date_ | Name | Score |
+------------+-------+-------+
| 2012/07/05 | Jack | 1 |
| 2012/07/05 | Jones | 1 |
| 2012/07/06 | Jill | 2 |
| 2012/07/06 | James | 3 |
| 2012/07/07 | Hugo | 1 |
| 2012/07/07 | Jack | 1 |
| 2012/07/07 | Jim | 2 |
+------------+-------+-------+
I would like to get the output like below
+------------+------+-------+
| Date_ | Name | Score |
+------------+------+-------+
| 2012/07/05 | Jack | 1 |
| 2012/07/06 | Jill | 2 |
| 2012/07/07 | Hugo | 1 |
+------------+------+-------+
When I use the MIN() function with just the date and Score column I get the lowest score for each date, which is what I want. I don't care which row is returned if there is a tie in the score for the same date. Trouble starts when I also want name column in the output. I tried a few variation of SQL (i.e min with correlated sub query) but I have no luck getting the output as shown above. Can anyone help please:)
Query is as follows
SELECT DISTINCT
A.USername, A.Date_, A.Score
FROM TestTable AS A
INNER JOIN (SELECT Date_,MIN(Score) AS MinScore
FROM TestTable
GROUP BY Date_) AS B
ON (A.Score = B.MinScore) AND (A.Date_ = B.Date_);
Use this solution:
SELECT a.date_, MIN(name) AS name, a.score
FROM tbl a
INNER JOIN
(
SELECT date_, MIN(score) AS minscore
FROM tbl
GROUP BY date_
) b ON a.date_ = b.date_ AND a.score = b.minscore
GROUP BY a.date_, a.score
SQL-Fiddle Demo
This will get the minimum score per date in the INNER JOIN subselect, which we use to join to the main table. Once we join the subselect, we will only have dates with names having the minimum score (with ties being displayed).
Since we only want one name per date, we then group by date and score, selecting whichever name: MIN(name).
If we want to display the name column, we must use an aggregate function on name to facilitate the GROUP BY on date and score columns, or else it will not work (We could also use MAX() on that column as well).
Please learn about the GROUP BY functionality of RDBMS.
SELECT Date_,Name,MIN(Score)
FROM T
GROUP BY Name
This makes the assumption that EACH NAME and EACH date appears only once, and this will only work for MySQL.
To make it work on other RDBMSs, you need to apply another group function on the Date column, like MAX. MIN. etc
SELECT T.Name, T.Date_, MIN(T.Score) as Score FROM T
GROUP BY T.Date_
Edit: This answer is not corrected as pointed out by JNK in comments
SELECT Date_,MAX(Name),MIN(Score)
FROM T
GROUP BY Date_
Here I am using MAX(NAME), it will pick one name if two names were found with the same goal numbers.
This will find Min score for each day (no duplicates), scored by any player. The name that starts with Z will be picked first than the name that starts with A.
Edit: Fixed by removing group by name