Summing and ordering at once - sql

I have a table of orders. There I need to find out which 3 partner_id's have made the largest sum of amount_totals, and sort those 3 from biggest to smallest.
testdb=# SELECT amount_total, partner_id FROM sale_order;
amount_total | partner_id
--------------+------------
1244.00 | 9
3065.90 | 12
3600.00 | 3
2263.00 | 25
3000.00 | 10
3263.00 | 3
123.00 | 25
5400.00 | 12
(8 rows)
Just starting SQL, I find it confusing ...

Aggregated amounts
If you want to list aggregated amounts, it can be as simple as:
SELECT partner_id, sum(amount_total) AS amount_supertotal
FROM sale_order
GROUP BY 1
ORDER BY 2 DESC
LIMIT 3;
The 1 in GROUP BY 1 is a positional reference to the first item in the SELECT list. Here it is just a notational shortcut for GROUP BY partner_id. Likewise, ORDER BY 2 refers to the second item, the sum.
This ignores the special case where more than three partners qualify because of ties, and picks 3 arbitrarily (for lack of a definition).
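To try the aggregated query against the sample rows from the question, here is a minimal sketch using Python's built-in sqlite3 module instead of PostgreSQL. The function name top_partners and the column alias total are my own choices, and the column types are assumed:

```python
import sqlite3

# Sample rows (amount_total, partner_id) copied from the question.
ROWS = [
    (1244.00, 9), (3065.90, 12), (3600.00, 3), (2263.00, 25),
    (3000.00, 10), (3263.00, 3), (123.00, 25), (5400.00, 12),
]


def top_partners(limit=3):
    """Return (partner_id, total) for the partners with the largest sums."""
    conn = sqlite3.connect(":memory:")
    # Column types are an assumption; the question does not show the schema.
    conn.execute("CREATE TABLE sale_order (amount_total REAL, partner_id INTEGER)")
    conn.executemany("INSERT INTO sale_order VALUES (?, ?)", ROWS)
    rows = conn.execute(
        """
        SELECT partner_id, sum(amount_total) AS total
        FROM sale_order
        GROUP BY partner_id
        ORDER BY total DESC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for partner_id, total in top_partners():
        print(partner_id, round(total, 2))
```

With the sample data, partners 12, 3 and 10 come out on top, in that order.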
Individual amounts
SELECT partner_id, amount_total
FROM sale_order
JOIN (
SELECT partner_id, rank() OVER (ORDER BY sum(amount_total) DESC) As rnk
FROM sale_order
GROUP BY 1
ORDER BY 2
LIMIT 3
) top3 USING (partner_id)
ORDER BY top3.rnk;
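This one, on the other hand, returns the individual orders of the top partners. The window function rank() gives tied partners the same rank, but note that LIMIT 3 still cuts the subquery to three partners. To include all peers when more than 3 partners qualify for the top 3, filter on rnk <= 3 in an outer query instead of using LIMIT.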
The technique here is to group by partner_id in the subquery top3 and have the window function rank() attach ranks after the aggregation (window functions execute after aggregate functions). ORDER BY is applied after window functions and LIMIT is applied last. All in one subquery.
Then I join the base table to this subquery, so that only the top dogs remain in the result and order by rnk.
Window functions require PostgreSQL 8.4 or later.
This is rather advanced stuff. You should start learning SQL with something simpler probably.
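As a sketch of the tie handling discussed above, the following variant keeps every partner whose rank is 3 or better by filtering on rnk rather than using LIMIT. It runs on SQLite (3.25 or later, for window functions) through Python's sqlite3 module. The data is hypothetical, and the aggregation is done in a nested subquery before ranking, which is my own restructuring rather than the query from the answer:

```python
import sqlite3

# Hypothetical orders (partner_id, amount_total); partners 3 and 4 tie on their sum.
ROWS = [(1, 300), (1, 200), (2, 400), (3, 100), (3, 200), (4, 300), (5, 100)]

QUERY = """
SELECT s.partner_id, s.amount_total, ranked.rnk
FROM sale_order AS s
JOIN (
    -- rank the per-partner sums; tied sums share a rank
    SELECT partner_id, rank() OVER (ORDER BY total DESC) AS rnk
    FROM (
        SELECT partner_id, sum(amount_total) AS total
        FROM sale_order
        GROUP BY partner_id
    ) AS sums
) AS ranked ON ranked.partner_id = s.partner_id
WHERE ranked.rnk <= 3          -- keeps all peers, unlike LIMIT 3
ORDER BY ranked.rnk, s.partner_id, s.amount_total DESC
"""


def top3_with_ties():
    """Return the individual orders of every partner ranked 3rd or better."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sale_order (partner_id INTEGER, amount_total INTEGER)")
    conn.executemany("INSERT INTO sale_order VALUES (?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in top3_with_ties():
        print(row)
```

Here four partners are returned, because partners 3 and 4 share rank 3.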

select amount_total, partner_id
from (
select
sum(amount_total) amount_total,
partner_id
from sale_order
group by partner_id
) s
order by amount_total desc
limit 3

Related

SQL Distinct / GroupBy

Ok, I’m stuck on an SQL query and tried long enough that it’s time to ask for help :) I'm using Objection.js – but that's not super relevant as I really just can't figure out how to structure the SQL.
I have the following example data set:
Items
id | name
---|-------
1  | Test 1
2  | Test 2
3  | Test 3
Listings
id | item_id | price | created_at
---|---------|-------|-----------
1  | 1       | 100   | 1654640000
2  | 1       | 60    | 1654640001
3  | 1       | 80    | 1654640002
4  | 2       | 90    | 1654640003
5  | 2       | 90    | 1654640004
6  | 3       | 50    | 1654640005
What I’m trying to do:
Return the lowest priced listing for each item
If all listings for an item have the same price, I want to return the newest of the two items
Overall, I want to return the resulting items by price
I’m trying to write a query that returns the data:
id | item_id | name   | price | created_at
---|---------|--------|-------|-----------
6  | 3       | Test 3 | 50    | 1654640005
2  | 1       | Test 1 | 60    | 1654640001
5  | 2       | Test 2 | 90    | 1654640004
Any help would be greatly appreciated! I'm also starting fresh, so I can add new columns to the data if that would help at all :)
An example of where my query is right now:
select *
from "listings"
inner join (
select "item_id", MIN(price) as "min_price"
from "listings"
group by "item_id"
) as "grouped_listings"
on "listings"."item_id" = "grouped_listings"."item_id"
and "listings"."price" = "grouped_listings"."min_price"
where "listings"."sold_at" is null
and "listings"."expires_at" > ?
order by CAST(price AS DECIMAL) ASC
limit ?;
This gets me listings – but if two listings have the same price, it returns multiple listings with the same item_id – not ideal.
Given the postgresql tag, this should work:
with listings_numbered as (
select *, row_number() over (
partition by item_id
order by price asc, created_at desc
) as rownum
from listings
)
select l.id, l.item_id, i.name, l.price, l.created_at
from listings_numbered l
join items i on l.item_id=i.id
where l.rownum=1
order by price asc;
This is a bit of an advanced query, using window functions and a common table expression, but we can break it down.
with listings_numbered as (...) select simply means to run the query inside of the ..., and then we can refer to the results of that query as listings_numbered inside of the select, as though it was a table.
We're selecting all of the columns in listings, plus one more:
row_number() over (partition by item_id order by price asc, created_at desc). partition by item_id means that we would like the row number to reset for each new item_id, and the order by specifies the ordering that the rows should get within each partition before we number them: first increasing by price, then decreasing by creation time to break ties.
The result of the CTE listings_numbered looks like:
id | item_id | price | created_at | rownum
---|---------|-------|------------|-------
2  | 1       | 60    | 1654640001 | 1
3  | 1       | 80    | 1654640002 | 2
1  | 1       | 100   | 1654640000 | 3
5  | 2       | 90    | 1654640004 | 1
4  | 2       | 90    | 1654640003 | 2
6  | 3       | 50    | 1654640005 | 1
If you look at only the rows where rownum (the last column) is 1, then you can see that it's exactly the set of listings that you're interested in.
The outer query then selects from this dataset, joins on items to get the name, filters to only the listings where rownum is 1, and sorts by price, to get the final result:
id | item_id | name   | price | created_at
---|---------|--------|-------|-----------
6  | 3       | Test 3 | 50    | 1654640005
2  | 1       | Test 1 | 60    | 1654640001
5  | 2       | Test 2 | 90    | 1654640004
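The row_number() walkthrough above can be reproduced locally. This is a minimal sketch that loads the sample Items and Listings data into an in-memory SQLite database (assumed to be version 3.25 or later, for window functions) and runs the same CTE query. The function name and the column types are my own assumptions:

```python
import sqlite3


def lowest_listing_per_item():
    """Return the cheapest listing per item, newest first on price ties."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE listings (id INTEGER PRIMARY KEY, item_id INTEGER,
                               price INTEGER, created_at INTEGER);
        INSERT INTO items VALUES (1, 'Test 1'), (2, 'Test 2'), (3, 'Test 3');
        INSERT INTO listings VALUES
            (1, 1, 100, 1654640000), (2, 1, 60, 1654640001),
            (3, 1, 80, 1654640002), (4, 2, 90, 1654640003),
            (5, 2, 90, 1654640004), (6, 3, 50, 1654640005);
        """
    )
    rows = conn.execute(
        """
        WITH listings_numbered AS (
            SELECT *, row_number() OVER (
                PARTITION BY item_id
                ORDER BY price ASC, created_at DESC
            ) AS rownum
            FROM listings
        )
        SELECT l.id, l.item_id, i.name, l.price, l.created_at
        FROM listings_numbered l
        JOIN items i ON l.item_id = i.id
        WHERE l.rownum = 1
        ORDER BY l.price ASC
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in lowest_listing_per_item():
        print(row)
```

The rows come back as listings 6, 2 and 5, matching the table in the question.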
Aggregate functions, such as the MIN function you employed in your query, are a viable option, yet if you want an efficient query for your problem, window functions can be your best friends. This class of functions lets you compute values over "windows" (partitions) of your table, defined by the columns you specify.
For the solution to this problem I'm going to compute two values using the window functions:
the minimum value for "listings.price", by partitioning on "listings.item_id",
the maximum value for "created_at", by partitioning on "listings.item_id" and listings.price
SELECT *,
MIN(price) OVER(PARTITION BY item_id) AS min_price,
MAX(created_at) OVER(PARTITION BY item_id, price) AS max_created_at
FROM listings
Once you have all records of listings associated to the corresponding minimum price and latest date, it's necessary for you to select the records whose
price equals the minimum price
created_at equals the most recent created_at
WITH cte AS (
SELECT *,
MIN(price) OVER(PARTITION BY item_id) AS min_price,
MAX(created_at) OVER(PARTITION BY item_id, price) AS max_created_at
FROM listings
)
SELECT id,
item_id,
price,
created_at
FROM cte
WHERE price = min_price
AND created_at = max_created_at
If you need to order by price, it's sufficient to add an ORDER BY price clause.
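For comparison, the MIN/MAX window approach can be checked the same way. This is a sketch only, run on an in-memory SQLite database (3.25 or later assumed) with the sample listings from the question. The ORDER BY price is added as suggested, and the function name is hypothetical:

```python
import sqlite3


def cheapest_newest_listings():
    """Keep rows whose price is the item minimum and whose date is the latest at that price."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE listings (id INTEGER PRIMARY KEY, item_id INTEGER,
                               price INTEGER, created_at INTEGER);
        INSERT INTO listings VALUES
            (1, 1, 100, 1654640000), (2, 1, 60, 1654640001),
            (3, 1, 80, 1654640002), (4, 2, 90, 1654640003),
            (5, 2, 90, 1654640004), (6, 3, 50, 1654640005);
        """
    )
    rows = conn.execute(
        """
        WITH cte AS (
            SELECT *,
                   MIN(price) OVER (PARTITION BY item_id) AS min_price,
                   MAX(created_at) OVER (PARTITION BY item_id, price) AS max_created_at
            FROM listings
        )
        SELECT id, item_id, price, created_at
        FROM cte
        WHERE price = min_price
          AND created_at = max_created_at
        ORDER BY price
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in cheapest_newest_listings():
        print(row)
```

Note that if two listings of one item shared both the minimum price and the same created_at, both rows would be returned, whereas row_number() always keeps exactly one.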

How can I select the top n records per category without any duplicates in SQL?

I am trying to select a set of recipes in a database by category. The criteria is that I need n number of recipes per category with no repeats. So, given a dataset recipes:
id | category
---|---------
1 | dairy
1 | eggs
1 | vegetarian
2 | dairy
2 | dessert
3 | thanksgiving
...
Is it possible to perform a select in such a way that my resulting dataset looks like this, where n=1?
id | category
----|----------
1 | dairy
2 | dessert
3 | thanksgiving
I happen to be using Presto to query this dataset, and there are about 30 categories total. I originally thought that maybe I could do some nested UNION statements, but a) that would be tedious for the number of categories I have and b) I don't think it will work since each UNION is kind of its own thing and has no knowledge of the past. I also considered using
select id from (
select id, category, row_number() over (partition by category order by id) as row_num
from recipes)
where row_num < 2
which would allow me to set how many ids I want back from each category, but doesn't deal with removing duplicates.
Ultimately I have a feeling this isn't possible in SQL, and that I should move it into Python or something, but if it's possible I'm very interested to see it in action!
You are close. Use partition by id instead:
select id, category
from (select id, category,
row_number() over (partition by id order by id) as seqnum
from recipes
)
where seqnum = 1;
The order by only makes a difference if you want to determine which row you want -- the first category alphabetically for instance.
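To see what the choice of order by does, here is a small sketch on SQLite (3.25 or later assumed) rather than Presto, using only the six sample rows shown in the question. It orders each partition by category, so every id keeps its alphabetically first category. The function name is my own:

```python
import sqlite3

# The six sample rows shown in the question (the "..." rows are not included).
ROWS = [
    (1, "dairy"), (1, "eggs"), (1, "vegetarian"),
    (2, "dairy"), (2, "dessert"), (3, "thanksgiving"),
]


def first_category_per_id():
    """Return one (id, category) row per id: the alphabetically first category."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE recipes (id INTEGER, category TEXT)")
    conn.executemany("INSERT INTO recipes VALUES (?, ?)", ROWS)
    rows = conn.execute(
        """
        SELECT id, category
        FROM (SELECT id, category,
                     row_number() OVER (PARTITION BY id ORDER BY category) AS seqnum
              FROM recipes
             ) AS numbered
        WHERE seqnum = 1
        ORDER BY id
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in first_category_per_id():
        print(row)
```

Each id now appears exactly once. With this ordering, ids 1 and 2 both land on dairy, so this guarantees unique ids but not unique categories.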
As a note: If you wanted one id per category, then I might suggest aggregation:
select category, min(id)
from recipes
group by category;

Selecting pair(including reverse order) with highest date value

I have a messages table like this
Messages Table
I want to select each unique pair (including reversed order) with highest date. Therefore resulting SQL Select Statement would be like this:
from_id | to_id | date  | message
--------|-------|-------|------------------
1       | 2     | 13:06 | I'm Alp
2       | 3     | 13:06 | I'm Oliver
3       | 1     | 11:38 | From third to one
I tried to use distinct with max function but it didn't help.
You can use window functions:
select *
from (
select m.*,
row_number() over(partition by min(from_id, to_id), max(from_id, to_id) order by date desc) rn
from messages m
) m
where rn = 1
Note: counter-intuitively enough, SQLite's min() and max() functions, when given several arguments, are the equivalent of least() and greatest() in other databases.
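A quick way to check the pair-normalising trick is to run it on SQLite itself (3.25 or later assumed). The original messages table was only posted as an image, so the rows below are hypothetical, chosen to produce the expected result from the question. The zero-padded HH:MM text format for date is also an assumption:

```python
import sqlite3

# Hypothetical messages (from_id, to_id, date, message); the real table is not shown.
ROWS = [
    (1, 2, "13:06", "I'm Alp"),
    (2, 1, "12:45", "Hello Alp"),
    (2, 3, "13:06", "I'm Oliver"),
    (3, 2, "10:20", "Hi Oliver"),
    (3, 1, "11:38", "From third to one"),
    (1, 3, "09:15", "From one to third"),
]


def latest_message_per_pair():
    """Return the newest message for each unordered (from_id, to_id) pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages (from_id INTEGER, to_id INTEGER, date TEXT, message TEXT)"
    )
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(
        """
        SELECT from_id, to_id, date, message
        FROM (
            SELECT m.*,
                   row_number() OVER (
                       -- multi-argument min()/max() act like least()/greatest()
                       PARTITION BY min(from_id, to_id), max(from_id, to_id)
                       ORDER BY date DESC
                   ) AS rn
            FROM messages m
        ) AS numbered
        WHERE rn = 1
        ORDER BY from_id
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in latest_message_per_pair():
        print(row)
```

Because (1, 2) and (2, 1) map to the same partition key, only the newer of the two directions survives.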

PostgreSQL using sum in where clause

I have a table which has a numeric column named 'capacity'. I want to select the first rows whose total capacity is no greater than X, something like this query:
select * from table where sum(capacity )<X
But I know I cannot use aggregate functions in the where clause. So what other ways exist to solve this problem?
Here is some sample data
id| capacity
1 | 12
2 | 13.5
3 | 15
I want to list the rows whose running sum is less than 26, in order of id, so a query like this
select * from table where sum(capacity )<26 order by id
and it must give me
id| capacity
1 | 12
2 | 13.5
because 12+13.5<26
A bit late to the party, but for future reference, the following should work for a similar problem as the OP's:
SELECT id, sum(capacity)
FROM table
GROUP BY id
HAVING sum(capacity) < 26
ORDER by id ASC;
Use the PostgreSQL docs for reference to aggregate functions: https://www.postgresql.org/docs/9.1/tutorial-agg.html
Use a HAVING clause (it needs a GROUP BY and must come before ORDER BY):
select id, sum(capacity) from table group by id having sum(capacity)<X order by id
You can use the window variant of sum to produce a cumulative sum, and then use it in the where clause. Note that window functions can't be placed directly in the where clause, so you'd need a subquery:
SELECT id, capacity
FROM (SELECT id, capacity, SUM(capacity) OVER (ORDER BY id ASC) AS cum_sum
FROM mytable) t
WHERE cum_sum < 26
ORDER BY id ASC;
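The running-sum query can be verified against the sample data with a short sketch. It uses SQLite (3.25 or later assumed) via Python's sqlite3 module rather than PostgreSQL, and the limit parameter and function name are my own additions:

```python
import sqlite3

# Sample rows (id, capacity) from the question.
ROWS = [(1, 12), (2, 13.5), (3, 15)]


def rows_within_capacity(limit):
    """Return rows, in id order, while the cumulative capacity stays below limit."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, capacity REAL)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", ROWS)
    rows = conn.execute(
        """
        SELECT id, capacity
        FROM (SELECT id, capacity,
                     SUM(capacity) OVER (ORDER BY id ASC) AS cum_sum
              FROM mytable) t
        WHERE cum_sum < ?
        ORDER BY id ASC
        """,
        (limit,),
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(rows_within_capacity(26))
```

The cumulative sums are 12, 25.5 and 40.5, so with a limit of 26 only ids 1 and 2 are returned.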

SQL: How many rows have the largest value for a column

I am sure this is a very simple answer, though I have not turned anything up. Mostly because I am sure I am phrasing the question wrong.
Anyway, let's say I have this very simple table:
Table: election_candidates
id | candidate_id | election_id | votes
---------------------------------------
1 | 2 | 1 | 3
2 | 5 | 1 | 3
3 | 3 | 1 | 2
I need to know if two candidates are tied, that is, whether more than one candidate has the most votes for an election.
I know I can use the MAX function to get the largest value for an election, but is there an easy query to get how many candidates have the MAX for a given election?
I'm using PHP and the Codeigniter framework, though just a general example of a query that could work is just fine.
Most databases support ANSI-standard window functions. One way to do this is using rank():
select ec.election_id, count(*) as NumTies
from (select ec.*, rank() over (partition by election_id order by votes desc) as seqnum
from election_candidates ec
) ec
where seqnum = 1
group by ec.election_id;
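A sketch of the rank() tie count, run on SQLite (3.25 or later assumed) through Python's sqlite3 module. The first three rows are the sample data from the question. The two rows for election 2 are hypothetical, added only to show an election with a clear winner:

```python
import sqlite3

ROWS = [
    # (id, candidate_id, election_id, votes) from the question
    (1, 2, 1, 3), (2, 5, 1, 3), (3, 3, 1, 2),
    # hypothetical second election with a single winner
    (4, 7, 2, 5), (5, 8, 2, 1),
]


def count_leaders():
    """Return (election_id, number of candidates sharing the top vote count)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE election_candidates "
        "(id INTEGER, candidate_id INTEGER, election_id INTEGER, votes INTEGER)"
    )
    conn.executemany("INSERT INTO election_candidates VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(
        """
        SELECT election_id, count(*) AS num_ties
        FROM (SELECT ec.*,
                     rank() OVER (PARTITION BY election_id
                                  ORDER BY votes DESC) AS seqnum
              FROM election_candidates ec
             ) AS ranked
        WHERE seqnum = 1
        GROUP BY election_id
        ORDER BY election_id
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for election_id, num_ties in count_leaders():
        print(election_id, num_ties, "tie" if num_ties > 1 else "clear winner")
```

A count greater than 1 means the election is tied at the top. Here election 1 has two leaders and election 2 has one.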
Couldn't you just do something like:
select e.*
from election_candidates e
inner join (
select election_id, max(votes) as maxVotes
from election_candidates
group by election_id
) maxVotesPerElectionId on e.election_Id = maxVotesPerElectionId.election_id
and e.votes = maxVotesPerElectionId.maxVotes
This should get you the candidates (per election) with the max votes.
Just the winner:
SELECT *
from election_candidates
ORDER BY votes DESC
LIMIT 0,1
This lists every candidate in every election: rank() ranks the candidates within each election by votes cast, and the result is listed in order of placement.
All candidates are displayed, showing how they did in each election.
-- T-SQL table variable holding the sample data (table variables are named @T, not #T)
DECLARE @T AS TABLE (id INT, candidate_id INT, election_id INT, votes INT)
INSERT INTO @T VALUES
(1 ,2,1,3),(2 ,5,1,3),(3 ,3,1,2),(4 ,2,2,3),(5 ,5,3,1),(6 ,6,1,4),(7 ,2,3,3),(8 ,1,4,3),
(9 ,1,5,2),(10,4,5,3),(11,5,5,3),(12,6,5,4)
SELECT
election_id,
votes,
-- DESC so that the candidate with the most votes gets rank 1
RANK() OVER (PARTITION BY election_id ORDER BY votes DESC) AS RANKING,
candidate_id
FROM @T
ORDER BY election_id,
RANKING