How to write a correct SQL query?


I have a table with the following columns:
city name, population, year (the year in which the population data was collected)
The same city can appear in the table for more than one year, for example:
New York 7999999 2019
New York 8000000 2020
New York 7999998 2018
London 7000000 2020
London 7000000 2016
Moscow 12000000 2017
(So there are 3 records for New York, 2 for London and 1 for Moscow.)
I need a query that returns the newest record for every city.
So here the result would be:
New York 8000000 2020
London 7000000 2020
Moscow 12000000 2017

You want the latest record per city. A cross-database solution that usually performs well is to filter with a correlated subquery:
select t.*
from mytable t
where t.year = (select max(t1.year) from mytable t1 where t1.town = t.town)
A multi-column index on (town, year) would speed up this query.
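As a rough check of the correlated-subquery approach, here is a minimal sketch that runs it against the question's sample rows in an in-memory SQLite database through Python's sqlite3 module. The table name mytable and the columns town and year follow the answer's query; the population column name, the index name and the helper function are assumptions made for illustration.

```python
import sqlite3

def latest_per_town(rows):
    """Return the newest (town, population, year) row for each town."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema: "town" and "year" follow the answer's query,
    # "population" follows the question's description.
    conn.execute("CREATE TABLE mytable (town TEXT, population INTEGER, year INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    # The multi-column index mentioned in the answer (index name is made up).
    conn.execute("CREATE INDEX idx_mytable_town_year ON mytable (town, year)")
    result = conn.execute(
        """
        SELECT t.town, t.population, t.year
        FROM mytable t
        WHERE t.year = (SELECT max(t1.year) FROM mytable t1 WHERE t1.town = t.town)
        ORDER BY t.town
        """
    ).fetchall()
    conn.close()
    return result

# Sample rows copied from the question.
sample_rows = [
    ("New York", 7999999, 2019),
    ("New York", 8000000, 2020),
    ("New York", 7999998, 2018),
    ("London", 7000000, 2020),
    ("London", 7000000, 2016),
    ("Moscow", 12000000, 2017),
]

for row in latest_per_town(sample_rows):
    print(row)
```

The ORDER BY is only there to make the output order predictable; it is not part of the original query.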

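You can use something like this, which joins each row to the latest date for its town:
SELECT t.town, t.amount, t.date
FROM townlist t
JOIN (SELECT town, max(date) AS date FROM townlist GROUP BY town) latest
ON latest.town = t.town AND latest.date = t.date
(A plain WHERE date IN (SELECT max(date) FROM townlist) would only return rows from the single newest date in the whole table.)
Another approach is to rank each town's rows with the newest entry on top and keep the first one. This needs window function support. Grouping an ordered subquery (ORDER BY date DESC followed by GROUP BY town) is not reliable, because the database is free to ignore the inner ordering.
SELECT town, amount, date
FROM (SELECT t.*, ROW_NUMBER() OVER (PARTITION BY town ORDER BY date DESC) AS rn FROM townlist t) AS sortedTable
WHERE rn = 1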

Related

Find the Age and Name of the Youngest Player for Each Race

Table "participant":
ptcpt_id  ptcpt_name  brt_dt
1         Ana Perez   2001-10-10
2         John Sy     1999-04-03
3         Judy Ann    2001-10-10
Table "race":
race_id  race_name       race_date
1        Vroom Vroom     2023-01-01
2        Fast & Furious  2022-01-01
Table "individual_race_record":
irr_id  ptcpt_id  race_id  run_time
1       1         1        00:59:13
2       1         2        01:19:14
3       2         1        00:48:05
4       2         2        01:01:17
5       3         2        01:31:18
I want to select the name and age of the youngest participant for each race event, as well as the name and year of each race event.
This is what I have so far:
SELECT
    r.race_name,
    EXTRACT(YEAR FROM r.race_date) AS year,
    COALESCE(CAST(min.age AS varchar), 'N/A')
FROM (
    SELECT
        race_id,
        EXTRACT(YEAR FROM MIN(AGE(brt_dt))) AS age
    FROM (
        SELECT p.ptcpt_id, p.brt_dt, irr.race_id
        FROM participant p
        INNER JOIN individual_race_record irr
            ON p.ptcpt_id = irr.ptcpt_id
    ) sub
    GROUP BY race_id
) min
RIGHT JOIN race r ON r.race_id = min.race_id
ORDER BY year DESC
which resulted in the following table:
race_name       year  age
Vroom Vroom     2023  21
Fast & Furious  2022  21
But what I want is this:
race_name       year  age  ptcpt_name
Vroom Vroom     2023  21   Ana Perez
Fast & Furious  2022  21   Ana Perez
Fast & Furious  2022  21   Judy Ann
The problem is that I can't join it with the participant table. I still need another column for the name of the youngest participant, and if a race has several youngest participants, I'd like to show all of them. When I try to select ptcpt_id in the 'min' subquery, I get an error saying that ptcpt_id must also be included in the GROUP BY clause. But I don't want the result grouped by participant.
I'd appreciate any help and leads on this issue. Thank you.

You can use FETCH FIRST ... ROWS WITH TIES (PostgreSQL 13 or later) to keep every row that ties with the first row on the ORDER BY expression. If we order by a DENSE_RANK that ranks the participants of each race by age, the youngest participants of every race all get rank 1, so they all tie for first place and are all returned, including several participants for one race when they share the minimum age. Note that age is computed here as the race year minus the birth year, which is why Vroom Vroom shows 22 rather than 21.
SELECT r.race_name,
       EXTRACT(YEAR FROM r.race_date) AS "year",
       DATE_PART('year', r.race_date) - DATE_PART('year', p.brt_dt) AS age,
       p.ptcpt_name
FROM participant p
INNER JOIN individual_race_record irr ON p.ptcpt_id = irr.ptcpt_id
INNER JOIN race r ON r.race_id = irr.race_id
ORDER BY DENSE_RANK() OVER(
    PARTITION BY race_name
    ORDER BY DATE_PART('year', r.race_date) - DATE_PART('year', p.brt_dt))
FETCH FIRST 1 ROWS WITH TIES
Output:
race_name       year  age  ptcpt_name
Fast & Furious  2022  21   Ana Perez
Fast & Furious  2022  21   Judy Ann
Vroom Vroom     2023  22   Ana Perez
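For databases without FETCH FIRST ... WITH TIES, the same ranking idea can be sketched by computing DENSE_RANK in a subquery and keeping rank 1. This is a swapped-in variant, not the answer's exact query: it runs on SQLite (3.25 or later for window functions) through Python's sqlite3 module, stores dates as 'YYYY-MM-DD' text, and uses strftime in place of DATE_PART. The helper names and the race_year alias are assumptions.

```python
import sqlite3

def build_sample_db():
    """Create an in-memory database holding the question's three tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE participant (ptcpt_id INTEGER PRIMARY KEY, ptcpt_name TEXT, brt_dt TEXT);
        CREATE TABLE race (race_id INTEGER PRIMARY KEY, race_name TEXT, race_date TEXT);
        CREATE TABLE individual_race_record (
            irr_id INTEGER PRIMARY KEY, ptcpt_id INTEGER, race_id INTEGER, run_time TEXT);
        INSERT INTO participant VALUES
            (1, 'Ana Perez', '2001-10-10'), (2, 'John Sy', '1999-04-03'), (3, 'Judy Ann', '2001-10-10');
        INSERT INTO race VALUES
            (1, 'Vroom Vroom', '2023-01-01'), (2, 'Fast & Furious', '2022-01-01');
        INSERT INTO individual_race_record VALUES
            (1, 1, 1, '00:59:13'), (2, 1, 2, '01:19:14'), (3, 2, 1, '00:48:05'),
            (4, 2, 2, '01:01:17'), (5, 3, 2, '01:31:18');
        """
    )
    return conn

def youngest_per_race(conn):
    """Return (race_name, race_year, age, ptcpt_name) for the youngest entrant(s) of each race."""
    return conn.execute(
        """
        WITH entries AS (
            SELECT r.race_id,
                   r.race_name,
                   CAST(strftime('%Y', r.race_date) AS INTEGER) AS race_year,
                   -- Age as race year minus birth year, as in the answer.
                   CAST(strftime('%Y', r.race_date) AS INTEGER)
                     - CAST(strftime('%Y', p.brt_dt) AS INTEGER) AS age,
                   p.ptcpt_name
            FROM participant p
            JOIN individual_race_record irr ON irr.ptcpt_id = p.ptcpt_id
            JOIN race r ON r.race_id = irr.race_id
        ),
        ranked AS (
            -- Everyone sharing the minimum age in a race gets rank 1.
            SELECT *, DENSE_RANK() OVER (PARTITION BY race_id ORDER BY age) AS rnk
            FROM entries
        )
        SELECT race_name, race_year, age, ptcpt_name
        FROM ranked
        WHERE rnk = 1
        ORDER BY race_name, ptcpt_name
        """
    ).fetchall()

for row in youngest_per_race(build_sample_db()):
    print(row)
```

Filtering on the rank in an outer query is needed because window functions cannot appear directly in a WHERE clause.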

Find the player with the highest score in each year

I have a table like this:
country  gender  player   score  year
Germany  male    Michael  14     1990
Austria  male    Simon    13     1990
Germany  female  Mila     16     1990
Austria  female  Simona   15     1990
This is a table in the database. It covers 70 countries around the world, with each player's name and gender, and shows how many goals each player scored in each year. The years run from 1990 to 2015, so the table is large. Now I would like to know which female player and which male player scored the most in each year from 2010 to 2012.
I expect this:
gender  player   score  year
male    Michael  24     2010
male    Simon    19     2011
male    Milos    19     2012
female  Mara     16     2010
female  Simona   16     2011
female  Dania    17     2012
I used this code but got an error:
SELECT gender,year,player, max(score) as score from (football) where player = max(score) and year in ('2010','2011','2012') group by 1,2,3
football is the table name

with main as (
    select
        gender,
        player,
        year,
        sum(score) as total_score -- in case a player has several rows (matches) in a year
    from football
    where year between 2010 and 2012
    group by 1, 2, 3
),
ranking as (
    select *,
        row_number() over (partition by year, gender order by total_score desc) as rank_
    from main
)
select
    gender,
    player,
    year,
    total_score
from ranking
where rank_ = 1
How it works:
First, filter on the years 2010 to 2012.
Then sum the score per player and year, to cover the case where the same player has several matches in the same year.
Then rank the players within each year and gender by total score.
Finally, keep rank_ = 1, which is the highest score. Note that row_number() keeps only one player when several tie for the top score.
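The aggregate-then-rank steps can be tried out with a small sketch on SQLite (3.25 or later) through Python's sqlite3 module. The table name football comes from the question; the rows below are made up for illustration (the question only shows 1990 data), and the function name and its year parameters are assumptions.

```python
import sqlite3

def top_scorer_per_year_and_gender(rows, first_year=2010, last_year=2012):
    """Return (gender, player, year, total_score) for the top scorer of each year and gender."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE football (country TEXT, gender TEXT, player TEXT, score INTEGER, year INTEGER)"
    )
    conn.executemany("INSERT INTO football VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        WITH main AS (
            -- Step 1: filter the years and total the score per player and year.
            SELECT gender, player, year, sum(score) AS total_score
            FROM football
            WHERE year BETWEEN ? AND ?
            GROUP BY gender, player, year
        ),
        ranking AS (
            -- Step 2: rank players within each (year, gender) by total score.
            SELECT *,
                   row_number() OVER (PARTITION BY year, gender
                                      ORDER BY total_score DESC) AS rank_
            FROM main
        )
        -- Step 3: keep the top-ranked player of each group.
        SELECT gender, player, year, total_score
        FROM ranking
        WHERE rank_ = 1
        ORDER BY year, gender
        """,
        (first_year, last_year),
    ).fetchall()
    conn.close()
    return result

# Hypothetical rows, not taken from the question.
football_rows = [
    ("Germany", "male", "Michael", 14, 2010),
    ("Germany", "male", "Michael", 10, 2010),  # second row for the same player and year
    ("Austria", "male", "Simon", 19, 2010),
    ("Germany", "female", "Mila", 16, 2010),
    ("Austria", "female", "Simona", 15, 2010),
    ("Austria", "male", "Simon", 19, 2011),
    ("Germany", "male", "Michael", 12, 2011),
    ("Austria", "female", "Simona", 16, 2011),
    ("Germany", "male", "Michael", 30, 2009),  # outside the requested years
]

for row in top_scorer_per_year_and_gender(football_rows):
    print(row)
```

In this made-up data Michael only beats Simon in 2010 because his two rows are summed first, which is the case the sum(score) step is meant to cover.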

You can use the dense_rank function to achieve this, if you are using sqlite version 3.25 or higher.
Query
select t.* from(
select *, dense_rank() over(
partition by year, gender
order by score desc
) as rn
from football
where year in ('2010','2011','2012')
) as t
where t.rn = 1;
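One practical difference between dense_rank and row_number is how ties for the top score are handled. The sketch below, again on SQLite via Python's sqlite3 module, runs the same query with either function on made-up rows in which two players tie; the function name, the whitelist and the data are assumptions for illustration.

```python
import sqlite3

ALLOWED_RANK_FUNCTIONS = {"row_number", "dense_rank"}

def top_rows(rows, rank_function):
    """Return the rank-1 rows per (year, gender) using the given window function."""
    if rank_function not in ALLOWED_RANK_FUNCTIONS:
        raise ValueError(f"unsupported rank function: {rank_function}")
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE football (gender TEXT, player TEXT, score INTEGER, year INTEGER)")
    conn.executemany("INSERT INTO football VALUES (?, ?, ?, ?)", rows)
    # The function name comes from the whitelist above, so formatting it in is safe.
    query = f"""
        SELECT gender, player, score, year FROM (
            SELECT *, {rank_function}() OVER (
                PARTITION BY year, gender
                ORDER BY score DESC
            ) AS rn
            FROM football
            WHERE year IN (2010, 2011, 2012)
        ) AS t
        WHERE t.rn = 1
        ORDER BY year, gender, player
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Hypothetical rows: Michael and Simon tie for the top male score in 2010.
tied_rows = [
    ("male", "Michael", 24, 2010),
    ("male", "Simon", 24, 2010),
    ("male", "Milos", 19, 2010),
    ("female", "Mara", 16, 2010),
]

print(top_rows(tied_rows, "dense_rank"))  # keeps both tied players
print(top_rows(tied_rows, "row_number"))  # keeps a single, arbitrarily chosen tied player
```

Which of the two is appropriate depends on whether tied top scorers should all be listed.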

How to aggregate using distinct values across two columns?

I have the following data in an orders table:
revenue expenses location_1 location_2
3 6 London New York
6 11 Paris Toronto
1 8 Houston Sydney
1 4 Chicago Los Angeles
2 5 New York London
7 11 New York Boston
4 6 Toronto Paris
5 11 Toronto New York
1 2 Los Angeles London
0 0 Mexico City London
I would like to create a result set that has 3 columns:
a list of the 10 DISTINCT city names
the sum of revenue for each city
the sum of expenses for each city
The desired result is:
location revenue expenses
London 6 13
New York 17 33
Paris 10 17
Toronto 15 28
Houston 1 8
Sydney 1 8
Chicago 1 4
Los Angeles 2 6
Boston 7 11
Mexico City 0 0
Is it possible to aggregate on distinct values across two columns? If yes, how would I do it?
Here is a fiddle:
http://sqlfiddle.com/#!9/0b1105/1

Shorter (and often faster):
SELECT location, sum(revenue) AS rev, sum(expenses) AS exp
FROM (
SELECT location_1 AS location, revenue, expenses FROM orders
UNION ALL
SELECT location_2 , revenue, expenses FROM orders
) sub
GROUP BY 1;
May be faster:
WITH cte AS (
SELECT location_1, location_2, revenue AS rev, expenses AS exp
FROM orders
)
SELECT location, sum(rev) AS rev, sum(exp) AS exp
FROM (
SELECT location_1 AS location, rev, exp FROM cte
UNION ALL
SELECT location_2 , rev, exp FROM cte
) sub
GROUP BY 1;
The (materialized!) CTE adds overhead, which may outweigh the benefit. Depends on many factors like total table size, available indexes, possible bloat, available RAM, storage speed, Postgres version, ...
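To check the UNION ALL approach against the desired result in the question, here is a minimal sketch that loads the ten sample orders into an in-memory SQLite database through Python's sqlite3 module. The function name and the output aliases total_revenue and total_expenses are assumptions; the query shape is the first one shown above.

```python
import sqlite3

def totals_by_location(rows):
    """Return {location: (total_revenue, total_expenses)} across both location columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (revenue INTEGER, expenses INTEGER, location_1 TEXT, location_2 TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT location, sum(revenue) AS total_revenue, sum(expenses) AS total_expenses
        FROM (
            -- Stack both location columns into one, repeating each order's amounts.
            SELECT location_1 AS location, revenue, expenses FROM orders
            UNION ALL
            SELECT location_2, revenue, expenses FROM orders
        ) sub
        GROUP BY location
        """
    ).fetchall()
    conn.close()
    return {location: (rev, exp) for location, rev, exp in result}

# Sample rows copied from the question.
order_rows = [
    (3, 6, "London", "New York"),
    (6, 11, "Paris", "Toronto"),
    (1, 8, "Houston", "Sydney"),
    (1, 4, "Chicago", "Los Angeles"),
    (2, 5, "New York", "London"),
    (7, 11, "New York", "Boston"),
    (4, 6, "Toronto", "Paris"),
    (5, 11, "Toronto", "New York"),
    (1, 2, "Los Angeles", "London"),
    (0, 0, "Mexico City", "London"),
]

for location, totals in sorted(totals_by_location(order_rows).items()):
    print(location, totals)
```

UNION ALL (rather than UNION) matters here: UNION would drop duplicate rows and could undercount when two orders share the same city and amounts.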

You could UNION ALL two queries and then select from it...
select location, sum(rev) as rev, sum(exp) as exp
from (
select location_1 as location, sum(revenue) as rev, sum(expenses) as exp
from orders
group by location_1
union all
select location_2 as location, sum(revenue) as rev, sum(expenses) as exp
from orders
group by location_2
)z
group by location
order by 1

Group by based on field length

I want to count the number of IDs per year, including only IDs that are 4, 5 or 6 bytes long.
ID      year  name    location  geo   new_loc   addr 1  addr 2  addr 3  addr 4
12345   2019  bob     UK        UK-4  basic     dat1    dat11   dat13   dat123
19804   2004  sam     US        US-1  advanced  dat2    dat21   dat23   dat233
19      2000  lister  EU        EU    basic     dat3    dat31   dat33   dat333
190838  2004  harold  US        US-3  basic     dat4    dat41   dat53   dat533
11804   2019  beanie  SK        UK-2  advanced  NULL    NULL    NULL    NULL
Output
ID      year  name    location  new location  num_of_ids_each_year
12345   2019  bob     UK        basic         2
11804   2019  beanie  SK        advanced      2
19804   2004  sam     US        advanced      2
190838  2004  harold  US        basic         2
What I tried:
select ID, year, name, location, [new location], count(year)
from table1
group by ID, year, name, location, [new location], count(year);
Could someone advise on how to include only those IDs that are 4, 5 or 6 bytes long?

You can use COUNT() OVER (PARTITION BY year) to get the per-year count without using GROUP BY.
SELECT ID, [year], [name], [location], [new location]
, COUNT(1) OVER (PARTITION BY year) AS num_of_ids_each_year
FROM table1
WHERE LEN(ID) IN (4,5,6)
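A rough equivalent of this query can be tested on SQLite (3.25 or later) through Python's sqlite3 module. This is a sketch under assumptions: SQLite has no LEN, so length(CAST(id AS TEXT)) stands in for it, the column is named new_location instead of [new location], and the function name is made up. The rows are the question's sample data without the geo and addr columns.

```python
import sqlite3

def ids_with_yearly_count(rows):
    """Return rows whose id has 4 to 6 characters, with the number of such ids per year."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (id INTEGER, year INTEGER, name TEXT, location TEXT, new_location TEXT)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT id, year, name, location, new_location,
               COUNT(*) OVER (PARTITION BY year) AS num_of_ids_each_year
        FROM table1
        -- WHERE runs before the window function, so filtered-out ids are not counted.
        WHERE length(CAST(id AS TEXT)) IN (4, 5, 6)
        ORDER BY year DESC, id
        """
    ).fetchall()
    conn.close()
    return result

# Sample rows from the question (geo and addr columns omitted).
table1_rows = [
    (12345, 2019, "bob", "UK", "basic"),
    (19804, 2004, "sam", "US", "advanced"),
    (19, 2000, "lister", "EU", "basic"),
    (190838, 2004, "harold", "US", "basic"),
    (11804, 2019, "beanie", "SK", "advanced"),
]

for row in ids_with_yearly_count(table1_rows):
    print(row)
```

The row with ID 19 is dropped by the length filter, so the year 2000 does not appear in the result at all.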

Thanks @Squirrel, I finally got it working.
select id, Year, name, location, [new location],
count(id) over (partition by year) as num_of_ids_each_year
from table1 where len(id) in (4,5,6);

Please try filtering on the ID length in a HAVING clause,
e.g.
select ID,
year,
name,
location,
[new location],
len(ID) as id_length
from table1
group by ID, year, name, location, [new location]
having len(ID) in (4, 5, 6)

GROUP BY expression (part of column) instead of whole column

I'm creating a query:
I need to get the number of returns made in each year.
ReturnDate is the date on which each member made the return.
To get the year I'm using LEFT(ReturnDate,4).
It all seems OK so far, but I need the list to show the year, the city and the total number of returns for that city in that year, like:
YEAR CITY QUANTITY
2011 London 300
2011 Stockholm 40
2012 London 250
Right now, I'm getting this instead:
YEAR CITY QUANTITY
2011 London 200
2011 London 100
2011 Stockholm 30
2011 Stockholm 10
2012 London 250
This is what I've come up with so far:
SELECT LEFT(ReturnDate,4) AS Year, City, COUNT(ReturnDate) AS Quantity
FROM Member
GROUP BY ReturnDate, City

Group by the same expression you select, so that all dates within a year fall into one group. Try:
GROUP BY LEFT(ReturnDate, 4), City

Alternatively, you can try using DATEPART:
SELECT DATEPART(yyyy, ReturnDate) AS Year, City, COUNT(ReturnDate) AS Quantity
FROM Member
GROUP BY DATEPART(yyyy, ReturnDate), City
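Grouping by the year expression instead of the full date can be illustrated with a small sketch on SQLite through Python's sqlite3 module. Several things here are assumptions: ReturnDate is stored as 'YYYY-MM-DD' text, substr(ReturnDate, 1, 4) stands in for LEFT(ReturnDate, 4), and the rows and the function name are made up for illustration.

```python
import sqlite3

def returns_per_year_and_city(rows):
    """Return (year, city, quantity) with one row per year and city."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Member (ReturnDate TEXT, City TEXT)")
    conn.executemany("INSERT INTO Member VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT substr(ReturnDate, 1, 4) AS Year, City, COUNT(ReturnDate) AS Quantity
        FROM Member
        -- Group by the year part, not the whole date, so dates in one year share a group.
        GROUP BY substr(ReturnDate, 1, 4), City
        ORDER BY Year, City
        """
    ).fetchall()
    conn.close()
    return result

# Hypothetical rows, not taken from the question.
member_rows = [
    ("2011-01-15", "London"),
    ("2011-03-02", "London"),
    ("2011-03-02", "London"),
    ("2011-07-20", "Stockholm"),
    ("2012-02-11", "London"),
    ("2012-09-30", "London"),
]

for row in returns_per_year_and_city(member_rows):
    print(row)
```

With GROUP BY ReturnDate, City the 2011 London returns would be split into one row per distinct date, which is the duplication described in the question.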
