SQL - Select lowest values with group by and order by?

In my rankings database I have a table named times. I also have another table with authors. The authors have author id's (named ath_id inside the times table).
Records saved in times table:
id ath_id brand_id time date
------------- ------------ -------------- -------------- --------------
65125537 5384729 3 44741 May 8 2014
72073658 4298584 1 1104 Jun 28 2015
86139060 4298584 2 2376 Nov 20 2016
92237079 4298584 1 1115 Jun 24 2017
92237082 4298584 1 1104 Jun 24 2017
93436362 5384729 12 376492 Dec 31 2012
What I want to achieve
I'd like to retrieve an ordered list of the times that belong to the author (by the author id). I'd like to order them by brand_id, and I only want the records with the lowest time value.
Also, when there are multiple records with the same brand_id and the same time value, I'd like the list to be ordered by date. So the record with the latest date will be last.
What I have
I currently use this query: SELECT * FROM times WHERE ath_id = 4298584 GROUP BY brand_id ASC.
It works great, but it limits records with the same brand_id to 1, and thereby it limits records with the same time, even when multiple records have the lowest time value.
To sum it up
So, in the case of the example above, when I select all the records with ath_id = 4298584, I'd like to retrieve the following ordered list:
id ath_id brand_id time date
------------- ------------ -------------- -------------- --------------
72073658 4298584 1 1104 Jun 28 2015
92237082 4298584 1 1104 Jun 24 2017
86139060 4298584 2 2376 Nov 20 2016
This is my first time doing a bit more advanced SQL queries. I'm working with Laravel, so giving both a raw SQL solution and a Laravel solution using the Laravel Query Builder wouldn't do any harm.

You could try using a derived table to get the min time for an ath_id and brand_id. Then join it back to your original table to get the rest of the data.
SELECT t.*
FROM times t
JOIN (SELECT ath_id, brand_id, MIN(time) AS time FROM times GROUP BY ath_id, brand_id) b
ON t.ath_id = b.ath_id AND t.brand_id = b.brand_id AND t.time = b.time
WHERE t.ath_id = 4298584
ORDER BY t.brand_id ASC, t.date ASC
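As a rough check of the derived-table approach, here is a minimal sketch that loads the sample rows into an in-memory SQLite database from Python. The use of SQLite and the ISO-formatted date strings are assumptions made for the illustration (the question does not say how date is stored); dates are sorted ascending because the question wants the latest record last.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE times (id INTEGER PRIMARY KEY, ath_id INTEGER, "
    "brand_id INTEGER, time INTEGER, date TEXT)"
)
# Sample rows from the question; dates rewritten as ISO strings (assumption)
# so that plain text ordering matches chronological ordering.
rows = [
    (65125537, 5384729, 3, 44741, "2014-05-08"),
    (72073658, 4298584, 1, 1104, "2015-06-28"),
    (86139060, 4298584, 2, 2376, "2016-11-20"),
    (92237079, 4298584, 1, 1115, "2017-06-24"),
    (92237082, 4298584, 1, 1104, "2017-06-24"),
    (93436362, 5384729, 12, 376492, "2012-12-31"),
]
conn.executemany("INSERT INTO times VALUES (?, ?, ?, ?, ?)", rows)

# The derived table b holds the lowest time per (ath_id, brand_id);
# joining it back keeps every row that ties for that lowest time.
query = """
SELECT t.id, t.brand_id, t.time, t.date
FROM times t
JOIN (SELECT ath_id, brand_id, MIN(time) AS time
      FROM times
      GROUP BY ath_id, brand_id) b
  ON t.ath_id = b.ath_id AND t.brand_id = b.brand_id AND t.time = b.time
WHERE t.ath_id = ?
ORDER BY t.brand_id ASC, t.date ASC
"""
result = conn.execute(query, (4298584,)).fetchall()
for row in result:
    print(row)
```

Both rows with time 1104 for brand 1 are kept, followed by the single brand 2 row, which matches the list the question asks for.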

This is another way you can do it. The output would be similar to SQLChao's answer; the difference is that the inner query assigns a rank to each combination of ath_id, brand_id and date, ordered by time. The outer query then filters on rank 1. So basically you are replicating the row_number() function.
You can change rnk = 1 to rnk <= n if you want the first n records for each combination. But in your case, SQLChao's answer would be faster.
select t3.id,t3.ath_id,t3.brand_id,t3.time,t3.date
from times1 t3
inner join
(
select t1.ath_id,t1.brand_id,t1.date,t1.time,count(*) as rnk
from times1 t1
inner join times1 t2
on t1.ath_id=t2.ath_id
and t1.brand_id=t2.brand_id
and t1.date=t2.date
and t1.time >= t2.time
where t1.ath_id=4298584
group by t1.ath_id,t1.brand_id,t1.date,t1.time
) t4
on t3.ath_id=t4.ath_id
and t3.brand_id=t4.brand_id
and t3.date=t4.date
and t3.time = t4.time
and t4.rnk=1
;

Related

GROUP BY one column, then GROUP BY another column

I have a database with a sales table t:
ID TYPE AGE
1 B 20
1 BP 20
1 BP 20
1 P 20
2 B 30
2 BP 30
2 BP 30
3 P 40
If a person buys a bundle it appears the bundle sale (TYPE B) and the different bundle products (TYPE BP), all with the same ID. So a bundle with 2 products appears 3 times (1x TYPE B and 2x TYPE BP) and has the same ID.
A person can also buy any other product in that single sale (TYPE P), which has also the same ID.
I need to calculate the average/min/max age of the customers but the multiple entries per sale tamper with the correct calculation.
The real average age is
(20 + 30 + 40) / 3 = 30
and not
(20+20+20+20 + 30+30+30 + 40) / 8 = 26,25
But I don't know how I can reduce the sales to a single row entry AND get the 4 needed values?
Do I need to GROUP BY twice (first by ID, then by AGE?) and if yes, how can I do it?
My code so far:
SELECT
AVG(AGE)
, MIN(AGE)
, MAX(AGE)
, MEDIAN(AGE)
FROM t
but that counts every row.
Assuming the age is the same for all rows with the same ID (which in itself indicates a normalisation problem), you can use nested aggregation:
select avg(min(age)) from sales
group by id
AVG(MIN(AGE))
-------------
30
The example in the documentation is very similar; and is explained as:
This calculation evaluates the inner aggregate (MAX(salary)) for each group defined by the GROUP BY clause (department_id), and aggregates the results again.
So for your version:
This calculation evaluates the inner aggregate (MIN(age)) for each group defined by the GROUP BY clause (id), and aggregates the results again.
It doesn't really matter whether the inner aggregate is min or max - again, assuming they are all the same - it's just to get a single value per ID, which can then be averaged.
You can do the same for the other values in your original query:
select
avg(min(age)) as avg_age,
min(min(age)) as min_age,
max(min(age)) as max_age,
median(min(age)) as med_age
from sales
group by id;
AVG_AGE MIN_AGE MAX_AGE MED_AGE
------- ------- ------- -------
30 20 40 30
Or, if you prefer, you could get the one-age-per-ID values once in a CTE or subquery and apply the second layer of aggregation to that:
select
avg(age) as avg_age,
min(age) as min_age,
max(age) as max_age,
median(age) as med_age
from (
select min(age) as age
from sales
group by id
);
which gets the same result.
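For readers without an Oracle instance, the subquery form can be tried as a small sketch against SQLite from Python. This is an assumption-laden stand-in, not the Oracle answer itself: SQLite does not accept nested aggregates such as avg(min(age)), and a MEDIAN aggregate is typically not available in it, so the median is computed in Python from the same one-age-per-ID values.

```python
import sqlite3
from statistics import median

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, type TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, "B", 20), (1, "BP", 20), (1, "BP", 20), (1, "P", 20),
        (2, "B", 30), (2, "BP", 30), (2, "BP", 30),
        (3, "P", 40),
    ],
)

# Naive average over every row, which over-weights sales with many rows.
naive_avg = conn.execute("SELECT AVG(age) FROM sales").fetchone()[0]

# Inner query: one age per sale ID. Outer query: aggregate those ages.
avg_age, min_age, max_age = conn.execute(
    """
    SELECT AVG(age), MIN(age), MAX(age)
    FROM (SELECT MIN(age) AS age FROM sales GROUP BY id)
    """
).fetchone()

# Median computed client-side from the same per-ID ages (assumption:
# no MEDIAN aggregate is available in this SQLite build).
per_id_ages = [r[0] for r in conn.execute("SELECT MIN(age) FROM sales GROUP BY id")]
med_age = median(per_id_ages)

print(naive_avg, avg_age, min_age, max_age, med_age)
```

The naive average comes out as 26.25, while the per-ID average is 30, as in the question.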

SQL counting query

Sorry if this is a basic question.
Basically, I have a table that is as follows, below is a basic sample
store-ProdCode-result
13p I10x 5
13p I20x 7
13p I30x 8
14a K38z 23
17a K38z 23
my data set has nearly 100,000 records.
What I'm trying to do is, for every store find the top 10 prodCode.
I am unsure of how to do this but what I tried was:
select s_code as store, prod_code,count (prod_code)
from top10_secondary
where prod_code is not null
group by store,prod_code
order by count(prod_code) desc limit 10
this is giving me something completely different and i'm unsure on how I go about achieving my final result.
All help is appreciated.
Thanks
The expected output should be: for every store(s_code) display the top 10 prodcode
so:
store--prodcode--result
1a abc 5
1a abd 4
2a dgf 1
2a ldk 6
.(10 times until next store code)
You can use the table twice in the FROM clause: once for the data, and once to count how many records for that store have a lower result value.
SELECT a.s_code, a.prod_code, count(*)
FROM top10_secondary a
LEFT OUTER JOIN top10_secondary b
ON a.s_code = b.s_code
AND b.result < a.result
GROUP BY a.s_code, a.prod_code
HAVING count(*) < 10
With this technique, though, you may get more than 10 records per store if the 10th result value occurs multiple times, because the limit rule is simply "include a record as long as there are fewer than 10 records with result values lower than mine".
It looks like in your case "result" is a ranking, so the values would not be duplicated per store.
This is a good case for Window functions.
SELECT
s_code,
prod_code,
prod_count
FROM
(
SELECT
s_code,
prod_code,
prod_count,
RANK() OVER (PARTITION BY s_code ORDER BY prod_Count DESC) as prod_rank
FROM
(SELECT s_code, prod_code, count(prod_code) AS prod_count FROM top10_secondary GROUP BY s_code, prod_code) t1
) t2
WHERE prod_rank <= 10
The innermost query gets the count of each product at each store. The middle query ranks those products within each store based on that count. The outermost query then limits the results based on that rank.
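The three layers can be exercised with a small sketch on SQLite from Python (window functions need SQLite 3.25 or later). The sample rows, the K40z product code and the TOP_N value of 2 are made up for the illustration; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE top10_secondary (s_code TEXT, prod_code TEXT)")
# Hypothetical data: one row per occurrence of a product at a store.
sample = (
    [("13p", "I10x")] * 3 + [("13p", "I20x")] * 2 + [("13p", "I30x")]
    + [("14a", "K38z")] * 2 + [("14a", "K40z")]
)
conn.executemany("INSERT INTO top10_secondary VALUES (?, ?)", sample)

TOP_N = 2  # the question wants 10; 2 keeps the sample small

query = """
SELECT s_code, prod_code, prod_count
FROM (
    SELECT s_code, prod_code, prod_count,
           RANK() OVER (PARTITION BY s_code ORDER BY prod_count DESC) AS prod_rank
    FROM (
        SELECT s_code, prod_code, COUNT(prod_code) AS prod_count
        FROM top10_secondary
        GROUP BY s_code, prod_code
    ) t1
) t2
WHERE prod_rank <= ?
ORDER BY s_code, prod_rank, prod_code
"""
top = conn.execute(query, (TOP_N,)).fetchall()
for row in top:
    print(row)
```

Because RANK() gives tied counts the same rank, a store can return more than TOP_N rows when there is a tie at the cutoff; ROW_NUMBER() would cap it at exactly TOP_N, at the cost of picking arbitrarily among ties.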

How to make a single line query include multiple lines in Oracle

I would like to take a set of data and expand it by adding date rows based on an existing field. For instance, if I have the following table (TABLE1):
ID NAME YEAR
1 John 2001
2 Jim 2012
3 Sally 2005
I want to take this data and put it into another table but expand it to include a set of months (and from there I can add monthly information). If I just look at the first record (John) my result would be:
ID NAME YEAR MONTH
1 John 2001 01-JAN-2001
1 John 2001 01-FEB-2001
1 John 2001 01-MAR-2001
...
1 John 2001 01-DEC-2001
I have the mechanism to derive my monthly dates, but how do I extract the data from TABLE1 to make TABLE2? Here is just a quick query but, of course, I get ORA-01427 (single-row subquery returns more than one row), as expected. I'm just not sure how to organize the query to put these two pieces together:
select id,
name,
year,
book_cd,
(SELECT ADD_MONTHS('01-JAN-'|| year, LEVEL - 1)
FROM DUAL CONNECT BY LEVEL <= 12) month
from table1 ;
I realize I can't do this, but I'm not sure how to put the two pieces together. I plan to bulk process records, so it won't be one ID at a time. Thanks for the help.
You can use a cross join:
select t.id,
t.name,
t.year,
t.book_cd,
ADD_MONTHS(to_date(t.year || '-01-01', 'YYYY-MM-DD'), m.rn) as mnth
from table1 t
cross join (select rownum - 1 as rn
from dual
connect by rownum <= 12) m
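The same row-multiplying idea can be sketched outside Oracle. The version below is a SQLite analogue run from Python, not the Oracle syntax: a recursive CTE stands in for CONNECT BY, and SQLite's date() modifier stands in for ADD_MONTHS. The book_cd column is left out because the sample TABLE1 does not show it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, name TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, "John", 2001), (2, "Jim", 2012), (3, "Sally", 2005)],
)

# m produces the offsets 0..11; the cross join pairs every source row
# with every offset, giving 12 output rows per input row.
query = """
WITH RECURSIVE m(rn) AS (
    SELECT 0
    UNION ALL
    SELECT rn + 1 FROM m WHERE rn < 11
)
SELECT t.id, t.name, t.year,
       date(t.year || '-01-01', '+' || m.rn || ' months') AS mnth
FROM table1 t
CROSS JOIN m
ORDER BY t.id, m.rn
"""
expanded = conn.execute(query).fetchall()
for row in expanded[:12]:
    print(row)
```

Each of the three source rows is expanded to twelve rows, one per month of its year.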

SQL query with grouping and MAX

I have a table that looks like the following but also has more columns that are not needed for this instance.
ID DATE Random
-- -------- ---------
1 4/12/2015 2
2 4/15/2015 2
3 3/12/2015 2
4 9/16/2015 3
5 1/12/2015 3
6 2/12/2015 3
ID is the primary key
Random is a foreign key, but I am not actually using the table it points to.
I am trying to design a query that groups the results by Random and Date, selects the MAX Date within each grouping, and then gives me the associated ID.
If I do the following query
select top 100 ID, Random, MAX(Date) from DateBase group by Random, Date, ID
I get duplicate Randoms since ID is the primary key and will always be unique.
The results i need would look something like this
ID DATE Random
-- -------- ---------
2 4/15/2015 2
4 9/16/2015 3
Another question: there could be times where many rows have the same date. What will MAX do in that case?
You can use NOT EXISTS() :
SELECT * FROM YourTable t
WHERE NOT EXISTS(SELECT 1 FROM YourTable s
WHERE s.random = t.random
AND s.date > t.date)
This selects only the rows that don't have a later date for the corresponding random value.
Can also be done using IN() :
SELECT * FROM YourTable t
WHERE (t.random,t.date) in (SELECT s.random,max(s.date)
FROM YourTable s
GROUP BY s.random)
Or with a join:
SELECT t.* FROM YourTable t
INNER JOIN (SELECT s.random,max(s.date) as max_date
FROM YourTable s
GROUP BY s.random) tt
ON(t.date = tt.max_date and tt.random = t.random)
In SQL Server you could do something like the following,
select a.* from DateBase a inner join
(select Random,
MAX(dt) as dt from DateBase group by Random) as x
on a.dt =x.dt and a.random = x.random
This method will work in all versions of SQL as there are no vendor specifics (you'll need to format the dates using your vendor specific syntax)
You can do this in two stages:
The first step is to work out the max date for each random:
SELECT MAX(DateField) AS MaxDateField, Random
FROM Example
GROUP BY Random
Now you can join back onto your table to get the max ID for each combination:
SELECT MAX(e.ID) AS ID
,e.DateField AS DateField
,e.Random
FROM Example AS e
INNER JOIN (
SELECT MAX(DateField) AS MaxDateField, Random
FROM Example
GROUP BY Random
) data
ON data.MaxDateField = e.DateField
AND data.Random = e.Random
GROUP BY e.DateField, e.Random
To answer your second question:
If there are multiples of the same date, the MAX(e.ID) will simply choose the highest number. If you want the lowest, you can use MIN(e.ID) instead.
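To see the tie-breaking behaviour, here is a minimal sketch of the two-stage query on SQLite from Python. The Example table and DateField column follow this answer's naming; the ISO date strings and the extra row with ID 7 (a deliberate tie with ID 4) are assumptions added for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Example (ID INTEGER PRIMARY KEY, DateField TEXT, Random INTEGER)"
)
conn.executemany(
    "INSERT INTO Example VALUES (?, ?, ?)",
    [
        (1, "2015-04-12", 2),
        (2, "2015-04-15", 2),
        (3, "2015-03-12", 2),
        (4, "2015-09-16", 3),
        (5, "2015-01-12", 3),
        (6, "2015-02-12", 3),
        (7, "2015-09-16", 3),  # hypothetical row: same max date as ID 4
    ],
)

# Stage 1 (subquery): latest date per Random.
# Stage 2 (outer query): join back and collapse ties with MAX(e.ID).
query = """
SELECT MAX(e.ID) AS ID, e.DateField, e.Random
FROM Example AS e
INNER JOIN (
    SELECT MAX(DateField) AS MaxDateField, Random
    FROM Example
    GROUP BY Random
) AS data
  ON data.MaxDateField = e.DateField AND data.Random = e.Random
GROUP BY e.DateField, e.Random
ORDER BY e.Random
"""
latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```

For Random 3, two rows share the latest date, and MAX(e.ID) keeps ID 7; swapping in MIN(e.ID) would keep ID 4 instead.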

sql - return max value for a time period in two tables

Here is a simplified version of my problem:
I have two tables:
Students:
ST_STUDENT_ID NAME ST_DATE_TAKEN
------------- ----- -------------
1 Jim 2011-01-01
2 Fred 2011-01-02
3 Sarah 2011-01-03
4 Nancy 2001-02-04
SCORES:
SC_STUDENT_ID SC_SCORE
------------- --------
1 97
2 97
3 95
4 97
I need to pull the student with the highest score for a month (say January). However, I only want one student even if multiple students received that score, and that score could also exist outside my focus month, so this is complicating my query. The only way I could figure out to do it was to repeat all my criteria in each nested sub-query. Is there a better way? It wasn't too terrible here, but in my actual problem the where criteria are much more complicated and joined across many tables, so duplicating them is a pain, plus the cost gets quite large.
SELECT ST_STUDENT_ID, ST_TIMESTAMP, SC_SCORE
FROM STUDENTS
JOIN SCORES
ON ST_STUDENT_ID = SC_STUDENT_ID
WHERE ST_STUDENT_ID = (
SELECT MAX(ST_STUDENT_ID)
FROM STUDENTS
JOIN SCORES
ON ST_STUDENT_ID = SC_STUDENT_ID
WHERE ST_TIMESTAMP > '2011-01-01'
AND ST_TIMESTAMP < '2011-02-01'
AND SC_SCORE IS NOT NULL
AND SC_SCORE = (
SELECT MAX(SC_SCORE)
FROM STUDENTS
JOIN SCORES
ON ST_STUDENT_ID = SC_STUDENT_ID
WHERE ST_TIMESTAMP > '2011-01-01'
AND ST_TIMESTAMP < '2011-02-01'))
If you only want one score, and your time period will be passed explicitly into the query, what about something like this?
SELECT TOP 1 ST_STUDENT_ID, ST_TIMESTAMP, SC_SCORE
FROM STUDENTS
JOIN SCORES ON ST_STUDENT_ID = SC_STUDENT_ID
WHERE ST_TIMESTAMP >= '2011-01-01'
AND ST_TIMESTAMP <= '2011-02-01'
ORDER BY SC_SCORE DESC, ST_STUDENT_ID DESC
That syntax should work for MS SQL Server - different RDBMSs have slightly different syntaxes for the "TOP 1" concept.
[I see from your later comment that you're using DB2 - in which case the syntax apparently is FETCH FIRST 1 ROWS ONLY.]
Note that I'm following the logic in your example, which implies the student with the highest ID takes precedence. Good incentive to register late for class ;-)
(Assuming SQL Server 2005 or later, or another RDBMS that supports CTEs and window functions)
Something like:
;With OrderedScores as (
SELECT
ST_STUDENT_ID,
ST_TIMESTAMP,
SC_SCORE,
ROW_NUMBER() OVER (ORDER BY SC_SCORE desc,newid()) as rn /* Ordered randomly within same score */
FROM
STUDENTS
join
SCORES
on ST_STUDENT_ID = SC_STUDENT_ID
WHERE
ST_TIMESTAMP >= '20110101' and
ST_TIMESTAMP < '20110201'
)
select * from OrderedScores where rn = 1
Obviously, you can play with the criteria within the ORDER BY of the window function, to determine which student you pick when ties exist (in the above it's random; again assuming SQL Server - if another RDBMS, newid() should be replaced with something else)
In addition, I think I've got your date criteria correct here - in your original query, you have one set of criteria that use > and < (so excluding Jim), and in the other, you use <= and >=, which could include a student who tested on 1st Feb.
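A small sketch of the CTE approach, run on SQLite from Python (window functions need SQLite 3.25 or later), shows how the half-open date range and the tie-break interact. Several things here are assumptions for the illustration: the date column is named ST_TIMESTAMP as in the queries (the sample table labels it ST_DATE_TAKEN), and the random newid() tie-break is replaced with ST_STUDENT_ID DESC so the result is repeatable and follows the asker's original MAX(ST_STUDENT_ID) logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STUDENTS (ST_STUDENT_ID INTEGER, NAME TEXT, ST_TIMESTAMP TEXT)"
)
conn.execute("CREATE TABLE SCORES (SC_STUDENT_ID INTEGER, SC_SCORE INTEGER)")
conn.executemany(
    "INSERT INTO STUDENTS VALUES (?, ?, ?)",
    [
        (1, "Jim", "2011-01-01"),
        (2, "Fred", "2011-01-02"),
        (3, "Sarah", "2011-01-03"),
        (4, "Nancy", "2001-02-04"),
    ],
)
conn.executemany(
    "INSERT INTO SCORES VALUES (?, ?)", [(1, 97), (2, 97), (3, 95), (4, 97)]
)

# The date filter appears once, inside the CTE. The range is half-open:
# the first day of the month is included, the first day of the next is not.
query = """
WITH OrderedScores AS (
    SELECT ST_STUDENT_ID, ST_TIMESTAMP, SC_SCORE,
           ROW_NUMBER() OVER (ORDER BY SC_SCORE DESC, ST_STUDENT_ID DESC) AS rn
    FROM STUDENTS
    JOIN SCORES ON ST_STUDENT_ID = SC_STUDENT_ID
    WHERE ST_TIMESTAMP >= ? AND ST_TIMESTAMP < ?
)
SELECT ST_STUDENT_ID, ST_TIMESTAMP, SC_SCORE
FROM OrderedScores
WHERE rn = 1
"""
winner = conn.execute(query, ("2011-01-01", "2011-02-01")).fetchone()
print(winner)
```

Jim and Fred both score 97 inside the range, so the tie-break on the higher student ID picks Fred; Nancy's 97 falls outside the range and is never considered.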