SQL select rows while ignoring duplicate set of columns


I have these two tables
Shoe
id: str
size: str
model_id: str
created_at: Date
Model
id: str
name: str
I want to join Shoe and Model and order the result by created_at.
So I have this query:
SELECT *
FROM Shoe
JOIN Model ON Shoe.model_id = Model.id
ORDER BY Shoe.created_at DESC
LIMIT 10
But the caveat is that I do not want rows with the same size and model_id to appear more than once in this result.
For example:
id | size | model_id | created_at | name
1 | 10 | abc | 2022-12-06 | 'Shoe'
2 | 10 | abc | 2022-12-04 | 'Shoe'
3 | 11 | abc | 2022-12-03 | 'Shoe'
In the above example, I only want the first and third rows. The second row has the same size + model_id as the first one, so it should be ignored.
How can we achieve this?
I have been trying to play with GROUP BY, but still can't seem to figure this out.
How can I modify this SQL command
SELECT *
FROM Shoe
JOIN Model ON Shoe.model_id = Model.id
ORDER BY Shoe.created_at DESC
LIMIT 10
to get the desired result?

This could be one way of doing it:
SELECT *
FROM Shoe
JOIN Model ON Shoe.model_id = Model.id
JOIN (
    SELECT size, model_id, MIN(id) AS min_id
    FROM Shoe
    GROUP BY size, model_id
) AS f1 ON f1.size = Shoe.size AND f1.model_id = Shoe.model_id AND f1.min_id = Shoe.id
ORDER BY Shoe.created_at DESC
LIMIT 10
The subquery keeps one id (the smallest) for each size and model_id pair, and the join discards every other Shoe row.
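To check the approach above against the sample rows from the question, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory database, the TEXT column types, and the explicit column list are assumptions made for illustration; the real schema and RDBMS may differ.

```python
import sqlite3

# In-memory database holding the three sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Model (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE Shoe (id TEXT PRIMARY KEY, size TEXT, model_id TEXT, created_at TEXT);
    INSERT INTO Model VALUES ('abc', 'Shoe');
    INSERT INTO Shoe VALUES
        ('1', '10', 'abc', '2022-12-06'),
        ('2', '10', 'abc', '2022-12-04'),
        ('3', '11', 'abc', '2022-12-03');
""")

# Keep one Shoe row (the smallest id) per (size, model_id) pair.
rows = conn.execute("""
    SELECT Shoe.id, Shoe.size, Shoe.model_id, Shoe.created_at, Model.name
    FROM Shoe
    JOIN Model ON Shoe.model_id = Model.id
    JOIN (
        SELECT size, model_id, MIN(id) AS min_id
        FROM Shoe
        GROUP BY size, model_id
    ) AS f1 ON f1.min_id = Shoe.id
    ORDER BY Shoe.created_at DESC
    LIMIT 10
""").fetchall()

for row in rows:
    print(row)
```

With the sample data this keeps ids 1 and 3. If the newest row per pair is wanted rather than the smallest id, the aggregate in the subquery would need to change accordingly.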

Related

Trying to display Using Sum() with JOIN doesn't provide correct output

I'm trying to create a query that displays a user's Id, the sum of total steps, and sum of total calories burnt.
The data for steps and calories are in two separate datasets, so I used a JOIN. However, when I run the joined query, the totals do not look correct, whereas running the two queries separately appears to give the correct data.
Below are my queries. I am fairly new to SQL, so I am somewhat confused about what I did wrong. How do I correct this? Thank you in advance for the help!
For the Steps table, "Id" and "StepTotal" are Integers. For the Calories table, "Id" and "Calories" are also Integers.
SELECT steps.Id, SUM(steps.StepTotal) AS Total_steps, SUM(cal.Calories) AS Total_calories
FROM fitbit.Daily_steps AS steps
JOIN fitbit.Daily_calories AS cal ON steps.Id = cal.Id
GROUP BY steps.Id
For Steps
SELECT Id,Sum(StepTotal) AS Total_steps
FROM fitbit.Daily_steps
group by Id
Id | Total_steps
1503960366 | 375619
1624580081 | 178061
1644430081 | 218489
For Calories
SELECT Id,Sum(Calories) AS Total_calories
FROM fitbit.Daily_calories
group by Id
Id | Total_calories
1503960366 | 56309
1624580081 | 45984
1644430081 | 84339

I believe your current solution is returning additional rows as the result of the JOIN.
Let's look at an example data set
Steps
id | total
a | 5
a | 7
b | 3
Calories
id | total
a | 100
a | 300
b | 400
Now, if we SELECT * FROM Calories, we'd get 3 rows. If we SELECT * FROM Calories GROUP BY id, we'd get two rows.
But if we use a JOIN:
SELECT Steps.id, Steps.total AS steps, Calories.total AS cals
FROM Steps
JOIN Calories ON Steps.id = Calories.id
WHERE Steps.id = 'a'
This would return the following:
Steps_Calories
id | steps | cals
a | 5 | 100
a | 5 | 300
a | 7 | 100
a | 7 | 300
So now if we GROUP BY id and SUM(steps), we get 24 instead of the expected 12, because the JOIN returns every pairing of steps and calories rows.
To mitigate this, we can group and sum within subqueries before joining:
SELECT step_totals.id, step_totals.steps, cal_totals.cals
FROM (SELECT id, SUM(total) AS steps FROM Steps GROUP BY id) AS step_totals
JOIN (SELECT id, SUM(total) AS cals FROM Calories GROUP BY id) AS cal_totals
ON cal_totals.id = step_totals.id
Now each subquery only returns a single row for each id, so the join only returns a single row as well.
Of course, you'll have to adapt this for your schema.
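The difference between the two approaches can be reproduced with the example data set above. This is a minimal sketch using Python's sqlite3 module with an in-memory database; the variable names `naive` and `fixed` are only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Steps (id TEXT, total INTEGER);
    CREATE TABLE Calories (id TEXT, total INTEGER);
    INSERT INTO Steps VALUES ('a', 5), ('a', 7), ('b', 3);
    INSERT INTO Calories VALUES ('a', 100), ('a', 300), ('b', 400);
""")

# Joining first and aggregating afterwards counts every steps/calories pairing.
naive = conn.execute("""
    SELECT Steps.id, SUM(Steps.total), SUM(Calories.total)
    FROM Steps
    JOIN Calories ON Steps.id = Calories.id
    GROUP BY Steps.id
    ORDER BY Steps.id
""").fetchall()

# Aggregating inside each subquery first leaves one row per id on both sides.
fixed = conn.execute("""
    SELECT step_totals.id, step_totals.steps, cal_totals.cals
    FROM (SELECT id, SUM(total) AS steps FROM Steps GROUP BY id) AS step_totals
    JOIN (SELECT id, SUM(total) AS cals FROM Calories GROUP BY id) AS cal_totals
      ON cal_totals.id = step_totals.id
    ORDER BY step_totals.id
""").fetchall()

print(naive)  # inflated totals for id 'a'
print(fixed)  # one correct total per id
```

For id 'a' the naive query reports 24 steps, while the pre-aggregated version reports 12.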

How to merge two tables with one to many relationship

I have two tables, one for orders and one for ordered products.
Table 1: ORDERS
"CREATE TABLE IF NOT EXISTS ORDERS("
"id_order INTEGER PRIMARY KEY AUTOINCREMENT,"
"o_date TEXT,"
"o_seller TEXT,"
"o_buyer TEXT,"
"o_shipping INTEGER,"
"d_amount INTEGER,"
"d_comm INTEGER,"
"d_netAmount INTEGER)"
Table 2: ORDERED_PRODUCTS
"CREATE TABLE IF NOT EXISTS dispatch_products("
"id_order INTEGER NOT NULL REFERENCES ORDERS(id_order),"
"product_name INTEGER,"
"quantity INTEGER,"
"rate INTEGER)"
I tried to join these two tables using following query:
SELECT *
FROM ORDERS a
INNER JOIN ORDERED_PRODUCTS b
ON a.id_order = b.id_order
WHERE a.buyer = 'abc'
The issue is with the entries with multiple products in table 2.
The output I'm getting is like below:
order_ID date seller buyer Ship amt comm nAmt Prod Qty Rate
1 A x 5 100 5 115 Scale 10 10
2 B abc 10 100 5 115 pen 5 10
2 B abc 10 100 5 115 paper 10 5
3 C xyz 10 100 5 220 book 5 20
3 C xyz 10 100 5 220 stapl 10 10
expected output:
order_ID date seller buyer Ship amt comm nAmt Prod Qty Rate
1 A x 5 100 5 115 Scale 10 10
2 B abc 10 100 5 115 pen 5 10
Paper 10 5
3 C xyz 10 100 5 220 Book 5 20
Stapl 10 10

Databases don't really work like that; you got what you asked for, and with no duplicates (all rows are different). You're looking at the columns of data that came from orders and saying "oh, the data is duplicated", but it isn't: it's joined "in context".
Imagine I gave you just one of your sample rows from your expected output:
Paper 10 5
Promise I just copy pasted that.
What order is it from?
No idea. You've lost the context, so it could be from any order. Rows are individual entities that stand alone, without reference to any other row, as a set of data. This is why the same order info needs to appear on each row. A database could be made to produce the expected output you asked for, but it would be really quite complex in a low-end database like SQLite. More important to me is to point out why there's a difference between what you thought the query would give you and what it gave you, as I think that's the real problem: the query gave you what it was supposed to, and there's no fault in it; it's more a faulty assumption about what you'd get.
If you're trying to prepare a report that uses the order as some kind of header, select them individually in the front-end app. Select ALL the orders, then one by one (order by order) pull all the item detail out, building the report as you go:
myorders = dbquery("SELECT * FROM ORDERS")
for each(order o in myorders)
    print(o.header)
    details = dbquery("SELECT * FROM dispatch_products where id_order = ?", o.id)
    for each(detail d in details)
        print(d.prod, d.qty, d.rate)
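The pseudocode above could look roughly like this in a real front end. This is a sketch in Python with the sqlite3 module; the trimmed-down ORDERS schema, the TEXT type for product_name, the sample rows, and the report layout are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Trimmed-down versions of the tables from the question (assumed columns).
    CREATE TABLE ORDERS (id_order INTEGER PRIMARY KEY, o_seller TEXT, o_buyer TEXT);
    CREATE TABLE dispatch_products (
        id_order INTEGER NOT NULL REFERENCES ORDERS(id_order),
        product_name TEXT, quantity INTEGER, rate INTEGER);
    INSERT INTO ORDERS VALUES (1, 'A', 'x'), (2, 'B', 'abc'), (3, 'C', 'xyz');
    INSERT INTO dispatch_products VALUES
        (1, 'Scale', 10, 10), (2, 'pen', 5, 10), (2, 'paper', 10, 5),
        (3, 'book', 5, 20), (3, 'stapl', 10, 10);
""")

report = []
orders = conn.execute(
    "SELECT id_order, o_seller, o_buyer FROM ORDERS ORDER BY id_order"
).fetchall()
for id_order, seller, buyer in orders:
    # One header line per order...
    report.append(f"Order {id_order}: seller={seller} buyer={buyer}")
    details = conn.execute(
        "SELECT product_name, quantity, rate FROM dispatch_products "
        "WHERE id_order = ? ORDER BY rowid",
        (id_order,),
    )
    # ...followed by one indented line per ordered product.
    for name, qty, rate in details:
        report.append(f"    {name} {qty} {rate}")

print("\n".join(report))
```

The order header appears once and the detail lines stay tied to it by position in the report, which is the context a single flat result set cannot carry.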

Here's a way to make the DB do it, but you'll need a version of SQLite that supports window functions (they were added in 3.25; 3.10 doesn't have them) or another DB (SQL Server > 2008, Oracle > 9, another big-name DB from the last 10 or so years, or a very recent MySQL):
SELECT
    CASE WHEN d.rn = 1 THEN d.o_date END AS o_date,
    CASE WHEN d.rn = 1 THEN d.o_seller END AS o_seller,
    CASE WHEN d.rn = 1 THEN d.o_buyer END AS o_buyer,
    CASE WHEN d.rn = 1 THEN d.o_shipping END AS o_shipping,
    CASE WHEN d.rn = 1 THEN d.d_amount END AS d_amount,
    CASE WHEN d.rn = 1 THEN d.d_comm END AS d_comm,
    CASE WHEN d.rn = 1 THEN d.d_netAmount END AS d_netAmount,
    d.product_name,
    d.quantity,
    d.rate
FROM (
    -- number the product rows within each order
    SELECT o.*, op.product_name, op.quantity, op.rate,
        row_number() OVER (PARTITION BY o.id_order ORDER BY op.product_name, op.quantity, op.rate) AS rn
    FROM ORDERS o
    INNER JOIN ORDERED_PRODUCTS op
        ON o.id_order = op.id_order
    WHERE o.o_buyer = 'abc'
) d
ORDER BY d.id_order, d.rn
We basically take your query, add a row number that restarts every time the order id changes, and only show data from the orders table where the row number is 1. If your SQLite doesn't have row_number you can fake it (see "How to use ROW_NUMBER in sqlite"), which I'll leave as an exercise for the reader :)

How can I SELECT the max row in a table SQL?

I have a little problem.
My table is:
Bill | Product ID | Units Sold
-----|------------|-----------
1 | 10 | 25
1 | 20 | 30
2 | 30 | 11
3 | 40 | 40
3 | 20 | 20
I want to SELECT the product which has sold the most units; in this sample case, it should be the product with ID 20, showing 50 units.
I have tried this:
SELECT
SUM(pv."Units sold")
FROM
"Products" pv
GROUP BY
pv."Product ID";
But this shows all the products. How can I select only the product with the most units sold?

Leaving aside for the moment the possibility of having multiple products with the same number of units sold, you can always sort your results by the sum, highest first, and take the first row:
SELECT pv."Product ID", SUM(pv."Units sold")
FROM "Products" pv
GROUP BY pv."Product ID"
ORDER BY SUM(pv."Units sold") DESC
LIMIT 1
I'm not quite sure whether the double-quote syntax for column and table names will work - exact syntax will depend on your specific RDBMS.
Now, if you do want to get multiple rows when more than one product has the same sum, then the SQL will become a bit more complicated:
SELECT pv.`Product ID`, SUM(pv.`Units sold`)
FROM `Products` pv
GROUP BY pv.`Product ID`
HAVING SUM(pv.`Units sold`) = (
    SELECT MAX(sums)
    FROM (
        SELECT SUM(pv2.`Units sold`) AS sums
        FROM `Products` pv2
        GROUP BY pv2.`Product ID`
    ) AS subq
)
Here's the sqlfiddle
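Both variants can be tried against the sample table from the question. The sketch below uses Python's sqlite3 module, where double-quoted identifiers work; the extra row inserted halfway through is hypothetical and exists only to create a tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products ("Bill" INTEGER, "Product ID" INTEGER, "Units sold" INTEGER);
    INSERT INTO Products VALUES
        (1, 10, 25), (1, 20, 30), (2, 30, 11), (3, 40, 40), (3, 20, 20);
""")

# Variant 1: sort by the per-product sum and keep only the first row.
top_one = conn.execute("""
    SELECT pv."Product ID", SUM(pv."Units sold") AS total_units
    FROM Products pv
    GROUP BY pv."Product ID"
    ORDER BY total_units DESC
    LIMIT 1
""").fetchall()

# Hypothetical extra sale so that products 20 and 40 both reach 50 units.
conn.execute("INSERT INTO Products VALUES (4, 40, 10)")

# Variant 2: keep every product whose sum equals the maximum sum.
with_ties = conn.execute("""
    SELECT pv."Product ID", SUM(pv."Units sold") AS total_units
    FROM Products pv
    GROUP BY pv."Product ID"
    HAVING SUM(pv."Units sold") = (
        SELECT MAX(sums)
        FROM (
            SELECT SUM(pv2."Units sold") AS sums
            FROM Products pv2
            GROUP BY pv2."Product ID"
        ) AS subq
    )
    ORDER BY pv."Product ID"
""").fetchall()

print(top_one)
print(with_ties)
```

The LIMIT 1 version is shorter but silently drops tied products, which is the trade-off between the two queries.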

SELECT pv."Product ID", SUM(pv."Units sold") AS total_units
FROM "Products" pv
GROUP BY pv."Product ID"
ORDER BY total_units DESC
LIMIT 1
In short: ORDER BY the sum, then LIMIT 1.

A simple tool for this is the MAX function.
Here's the general syntax of the MAX function:
SELECT MAX(ID) AS id
FROM Products;
and in your case:
SELECT MAX("Units Sold") FROM Products;
Note that this returns the largest single "Units Sold" value in the table, not the largest per-product total, so it has to be combined with SUM and GROUP BY to answer the question as asked.

greatest N per group with padding

I've been trying to solve this problem over the weekend, without luck so far. I have two tables:
TopOffers:
OfferId | RetailerId | Order
1 | 38 | 0
2 | 8 | 3
3 | 17 | 2
4 | 22 | 1
And Offers:
Id | RetailerId | Name | Description | etc...
1 | 3 | Strawberry | Red and smelly
2 | 38 | Cookie | Crunchy
3 | 17 | Onion | Of the nice kind
4 | 22 | Apple | Cheap
5 | 8 | Toothbrush | Lasts extra long!
My goal is to get the top 10 Offers for each Retailer ID. The order in which they should be listed is specified by the Order field in the TopOffer table (sort order is ascending). On top of that, the result should be padded to 10 offers when there are fewer than 10 TopOffer records for a retailer. The TopOffer table always contains 10 or fewer records per retailer.
So far I've managed to get this going, which works (I realize it doesn't get the top 10, but rather everything that's in the TopOffer table, which is alright, since the TopOffer table is always equal to or smaller than the top 10 for any retailer):
SELECT b.*
FROM
(
SELECT o.Id, to.`Order` FROM Offer AS o
LEFT JOIN TopOffer AS to
ON o.Id = to.OfferId
) AS a,
(
SELECT o.*, to.`Order` FROM Offer AS o
LEFT JOIN TopOffer AS to
ON o.Id = to.OfferId
) AS b
WHERE a.`Order` >= b.`Order` AND a.Id = b.Id
GROUP BY b.RetailerId, b.Id
HAVING Count(1) BETWEEN 1 AND 10
ORDER BY RetailerId, `Order` ASC
Unfortunately I can't seem to find any way of padding the result of this query with offers that don't have an entry in the TopOffer table if there aren't 10 TopOffer records for that retailer.
My sincerest thanks in advance for any help!

If you create a virtual table with the numbers 1-10, you can left join it to your results to get 10 of each:
select number, results.*
from
(select 1 as number union select 2 union select 3 ... union select 10) numbers
left join
(your query here) results
on numbers.number = results.rank
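To see what the numbers-table join produces, here is a minimal sketch in Python with sqlite3. It assumes a hypothetical `results` table that already holds the ranked offers of a single retailer, with the rank stored in a column named `pos`; slots without a matching offer come back as NULL (None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical ranked result for one retailer: only three top offers exist.
    CREATE TABLE results (pos INTEGER, offer_id INTEGER);
    INSERT INTO results VALUES (1, 4), (2, 3), (3, 2);
""")

# Build the 1..10 "virtual table" as a chain of SELECTs.
numbers_sql = " UNION ALL ".join(f"SELECT {n} AS number" for n in range(1, 11))

padded = conn.execute(f"""
    SELECT numbers.number, results.offer_id
    FROM ({numbers_sql}) AS numbers
    LEFT JOIN results ON numbers.number = results.pos
    ORDER BY numbers.number
""").fetchall()

for row in padded:
    print(row)
```

The join always yields ten rows, but the empty slots still have to be filled with non-top offers in a further step if real padding is required.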

How do I fix this SQL query returning improper values?

I am writing an SQL query which will return a list of auctions a certain user is losing, like on eBay.
This is my table:
bid_id bid_belongs_to_auction bid_from_user bid_price
6 7 1 15.00
8 7 2 19.00
13 7 1 25.00
The problematic area is this (taken from my full query, placed at the end of the question):
AND EXISTS (
SELECT 1
FROM bids x
WHERE x.bid_belongs_to_auction = bids.bid_belongs_to_auction
AND x.bid_price > bids.bid_price
AND x.bid_from_user <> bids.bid_from_user
)
The problem is that the query returns all the auctions on which there are higher bids, but ignoring the user's even higher bids.
So, an example when the above query works:
bid_id bid_belongs_to_auction bid_from_user bid_price
6 7 1 15.00
7 7 2 18.00
In this case, user 1 is returned as losing the auction, because there is another bid higher than the user's bid.
But, here is when the query doesn't work:
bid_id bid_belongs_to_auction bid_from_user bid_price
6 7 1 15.00
8 7 2 19.00
13 7 1 25.00
In this case, user 1 is incorrectly returned as losing the auction, because there is another bid higher than one of his previous bids, but the user has already placed a higher bid over that.
Here's my full query in case it matters, although I don't think it's needed to solve the problem described above:
$query = "
SELECT
`bid_belongs_to_auction`,
`auction_unixtime_expiration`,
`auction_belongs_to_hotel`,
`auction_seo_title`,
`auction_title`,
`auction_description_1`
FROM (
SELECT
`bid_belongs_to_auction`,
`bid_from_user`,
MAX(`bid_price`) AS `bid_price`,
`auctions`.`auction_enabled`,
`auctions`.`auction_unixtime_expiration`,
`auctions`.`auction_belongs_to_hotel`,
`auctions`.`auction_seo_title`,
`auctions`.`auction_title`,
`auctions`.`auction_description_1`
FROM `bids`
LEFT JOIN `auctions` ON `auctions`.`auction_id`=`bids`.`bid_belongs_to_auction`
WHERE `auction_enabled`='1' AND `auction_unixtime_expiration` > '$time' AND `bid_from_user`='$userId'
AND EXISTS (
SELECT 1
FROM bids x
WHERE x.bid_belongs_to_auction = bids.bid_belongs_to_auction
AND x.bid_price > bids.bid_price
AND x.bid_from_user <> bids.bid_from_user
)
GROUP BY `bid_belongs_to_auction`
) AS X
WHERE `bid_from_user`='$userId'
";

Here's a different approach:
$query = "
SELECT
`max_bids`.`bid_belongs_to_auction`,
`auctions`.`auction_unixtime_expiration`,
`auctions`.`auction_belongs_to_hotel`,
`auctions`.`auction_seo_title`,
`auctions`.`auction_title`,
`auctions`.`auction_description_1`
FROM `auctions`
INNER JOIN (
SELECT
`bid_belongs_to_auction`,
MAX(`bid_price`) AS `auction_max_bid`,
MAX(CASE `bid_from_user` WHEN '$userId' THEN `bid_price` END) AS `user_max_bid`
FROM `bids`
GROUP BY `bid_belongs_to_auction`
) AS `max_bids` ON `auctions`.`auction_id` = `max_bids`.`bid_belongs_to_auction`
WHERE `auctions`.`auction_enabled`='1'
AND `auctions`.`auction_unixtime_expiration` > '$time'
AND `max_bids`.`user_max_bid` IS NOT NULL
AND `max_bids`.`user_max_bid` <> `max_bids`.`auction_max_bid`
";
Basically, when you retrieve the max bids for all the auctions, you also retrieve the specific user's max bid for each auction. The next step is to join the resulting list to the auctions table and apply an additional filter requiring the user's max bid to differ from the auction's max bid.
Note: the `max_bids`.`user_max_bid` IS NOT NULL condition might be unnecessary. It would definitely be so in SQL Server, because the non-nullness would be implied by the `max_bids`.`user_max_bid` <> `max_bids`.`auction_max_bid` condition. I'm not sure if it's the same in MySQL.
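The conditional-aggregation idea can be checked against the two bid tables from the question. This is a sketch in Python with sqlite3; the helper name `losing_auctions` is hypothetical, and the join to the auctions table is left out to keep the example small.

```python
import sqlite3

def losing_auctions(bid_rows, user_id):
    """Return the auction ids where user_id has bid but does not hold the top bid."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bids (bid_id INTEGER, bid_belongs_to_auction INTEGER, "
        "bid_from_user INTEGER, bid_price REAL)"
    )
    conn.executemany("INSERT INTO bids VALUES (?, ?, ?, ?)", bid_rows)
    rows = conn.execute("""
        SELECT bid_belongs_to_auction
        FROM (
            SELECT bid_belongs_to_auction,
                   MAX(bid_price) AS auction_max_bid,
                   MAX(CASE bid_from_user WHEN ? THEN bid_price END) AS user_max_bid
            FROM bids
            GROUP BY bid_belongs_to_auction
        ) AS max_bids
        WHERE user_max_bid IS NOT NULL
          AND user_max_bid <> auction_max_bid
    """, (user_id,)).fetchall()
    conn.close()
    return [r[0] for r in rows]

# First example from the question: user 1 has been outbid by user 2.
outbid = losing_auctions([(6, 7, 1, 15.00), (7, 7, 2, 18.00)], 1)

# Second example: user 1 later placed the highest bid, so nothing is returned.
second_case = [(6, 7, 1, 15.00), (8, 7, 2, 19.00), (13, 7, 1, 25.00)]
leading = losing_auctions(second_case, 1)

print(outbid, leading)
```

Because only each user's highest bid per auction is compared with the auction's highest bid, earlier lower bids from the same user no longer trigger a false "losing" result.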

Untested, but this is how I would approach it. Ought to perform OK if there's an index on userid and also one on auctionid.
select OurUserInfo.auctionid, OurUserInfo.userid,
    OurUserInfo.OurUsersMaxBid, WinningBids.TopPrice
from
(
    select A.auctionid, A.userid, max(A.price) as OurUsersMaxBid
    from auctions A where userid = ?
    group by A.auctionid, A.userid
) as OurUserInfo
inner join
(
    -- get the current winning bids for all auctions in which our user is bidding
    select RelevantAuctions.auctionid, max(auctions.price) as TopPrice
    from auctions inner join
    (
        select distinct auctionid from auctions where userid = ? -- get our user's auctions
    ) as RelevantAuctions
    on auctions.auctionid = RelevantAuctions.auctionid
    group by RelevantAuctions.auctionid
) as WinningBids
on OurUserInfo.auctionid = WinningBids.auctionid
where WinningBids.TopPrice > OurUserInfo.OurUsersMaxBid

Instead of
SELECT 1
FROM bids x
WHERE x.bid_belongs_to_auction = bids.bid_belongs_to_auction
AND x.bid_price > bids.bid_price
AND x.bid_from_user <> bids.bid_from_user
try this (it needs window function support, for example MySQL 8.0 or later):
SELECT 1
FROM (SELECT bid_id,
             bid_belongs_to_auction,
             bid_from_user,
             bid_price
      FROM (SELECT bid_id,
                   bid_belongs_to_auction,
                   bid_from_user,
                   bid_price,
                   RANK() OVER (
                       PARTITION BY bid_belongs_to_auction, bid_from_user
                       ORDER BY bid_price DESC) AS my_rank
            FROM bids) ranked
      WHERE my_rank = 1) x
WHERE x.bid_belongs_to_auction = bids.bid_belongs_to_auction
AND x.bid_price > bids.bid_price
AND x.bid_from_user <> bids.bid_from_user
