How to get row numbers without duplicating - sql

I want to get row numbers for a list of orders. I tried ROW_NUMBER() and DENSE_RANK() in several variations, but I only get duplicated rows.
With my code (below), SQL returns the list of all orders that include at least one of the products of order 20 (those three products: '1013', '1024', '1025').
The problem is that when I try to add row numbers to that list, some rows are duplicated because an order can contain more than one of those products.
With my code it looks like this:
Order_number
20
22
27
With ROW_NUMBER() it looks like this and that is problem:
Row_number | Order_number
1 20
2 20
3 20
4 22
5 27
6 27
I want it to look like this:
Row_number | Order_number
1 20
2 22
3 27
SELECT DISTINCT ORDER_ID AS 'ORDERS, WHICH HAVE AT LEAST ONE PRODUCT OF ORDER 20'
FROM ORDERS
INNER JOIN STORAGE ON ORDERS.PRODUCT_ID = STORAGE.PRODUCT_ID
WHERE STORAGE.PRODUCT_ID IN ('1013', '1024', '1025');

I would suggest using EXISTS, so the join itself cannot multiply rows (if ORDERS holds one row per product in an order, you still need DISTINCT or GROUP BY on ORDER_ID):
SELECT o.ORDER_ID
FROM ORDERS o
WHERE EXISTS (SELECT 1
FROM STORAGE S
WHERE o.PRODUCT_ID = s.PRODUCT_ID AND
s.PRODUCT_ID IN (1013, 1024, 1025)
);
With an index on STORAGE(PRODUCT_ID), which is the column the subquery looks up, this should have very good performance.
If STORAGE itself carries an ORDER_ID column, you can also do this directly using aggregation on STORAGE:
SELECT s.ORDER_ID
FROM STORAGE S
WHERE s.PRODUCT_ID IN (1013, 1024, 1025)
GROUP BY s.ORDER_ID;

1. Get your distinct ORDER_ID values first, then number them:
SELECT ROW_NUMBER() OVER (ORDER BY ORDER_ID), ORDER_ID
FROM
(
SELECT ORDER_ID
FROM ORDERS
INNER JOIN STORAGE ON ORDERS.PRODUCT_ID = STORAGE.PRODUCT_ID
WHERE STORAGE.PRODUCT_ID IN ('1013', '1024', '1025')
GROUP BY ORDER_ID
) dt
2. Don't join to the STORAGE table; instead use a correlated subquery, and group by ORDER_ID so each order is numbered once:
SELECT ROW_NUMBER() OVER (ORDER BY ORDER_ID), ORDER_ID
FROM ORDERS o
WHERE EXISTS
(
SELECT 1
FROM STORAGE s
WHERE s.PRODUCT_ID IN ('1013', '1024', '1025')
AND s.PRODUCT_ID = o.PRODUCT_ID
)
GROUP BY ORDER_ID
3. Use DENSE_RANK() (untested, since you don't say which RDBMS you are using, but it should work):
SELECT DISTINCT DENSE_RANK() OVER (ORDER BY ORDER_ID), ORDER_ID
FROM ORDERS
INNER JOIN STORAGE ON ORDERS.PRODUCT_ID = STORAGE.PRODUCT_ID
WHERE STORAGE.PRODUCT_ID IN ('1013', '1024', '1025')
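To check variants 1 and 3 against the sample from the question, here is a minimal sketch that runs them in SQLite through Python's built-in sqlite3 module. The table contents (including order 21 and product 1099) are made up for illustration, and it assumes ORDERS holds one row per order/product pair. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# In-memory sample data; assumed layout: ORDERS has one row per (order, product).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STORAGE (PRODUCT_ID INTEGER PRIMARY KEY);
    CREATE TABLE ORDERS (ORDER_ID INTEGER, PRODUCT_ID INTEGER);
    INSERT INTO STORAGE VALUES (1013), (1024), (1025), (1099);
    INSERT INTO ORDERS VALUES
        (20, 1013), (20, 1024), (20, 1025),
        (21, 1099),
        (22, 1013),
        (27, 1024), (27, 1025);
""")

# Variant 1: remove duplicates in a derived table, then number the result.
numbered = conn.execute("""
    SELECT ROW_NUMBER() OVER (ORDER BY ORDER_ID) AS RN, ORDER_ID
    FROM (
        SELECT DISTINCT o.ORDER_ID
        FROM ORDERS o
        INNER JOIN STORAGE s ON o.PRODUCT_ID = s.PRODUCT_ID
        WHERE s.PRODUCT_ID IN (1013, 1024, 1025)
    ) dt
    ORDER BY RN
""").fetchall()

# Variant 3: DENSE_RANK gives equal ORDER_IDs the same number,
# so DISTINCT can collapse the repeated rows.
ranked = conn.execute("""
    SELECT DISTINCT DENSE_RANK() OVER (ORDER BY o.ORDER_ID) AS RN, o.ORDER_ID
    FROM ORDERS o
    INNER JOIN STORAGE s ON o.PRODUCT_ID = s.PRODUCT_ID
    WHERE s.PRODUCT_ID IN (1013, 1024, 1025)
    ORDER BY RN
""").fetchall()

print(numbered)  # [(1, 20), (2, 22), (3, 27)]
print(ranked)    # same rows as above
```

Both queries number each order once because the duplicates are gone (or ranked identically) before the numbering is applied.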

Related

Counting table rows where a column value first appears after date XYZ

We have an orders table that looks like so:
orderId customerId orderDate
320 45 2020-01-01
455 67 2021-02-11
122 45 2019-04-22
Based on this I need to count all 'new' customers that first entered the system after date XYZ.
I'm thinking of something involving a having clause but wondered if there was a better way to go about it. Something along these lines (SQL may not be exact, but the general idea):
select count(*) from (select distinct(customerId) from orders group by customerId having min(orders.orderDate) > XYZ) as foo
Is there a better / faster way to go about this?
Assuming you wanted the count of new customers coming into the system after 2021-02-11, you could try:
SELECT COUNT(DISTINCT customerId)
FROM orders o1
WHERE
orderDate > '2021-02-11' AND
NOT EXISTS (SELECT 1 FROM orders o2
WHERE o2.customerId = o1.customerId AND o2.orderDate <= '2021-02-11');
The above logic reads, in plain English, to count any customer record appearing after 2021-02-11, where that customer also did not appear previously in a record on or before 2021-02-11.
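Your query is already fine. Another option is to number each customer's orders with a window function and count only the first one (an alternative to the DISTINCT keyword). The date filter has to be applied after the numbering, otherwise returning customers are counted too:
select count(1) from (select orderDate,
row_number() over (partition by customerId order by orderDate asc) rn
from orders) t1
where rn = 1 and orderDate > '2020-01-01'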
You are already using this method:
select count(*)
from (select customerId, min(orderDate) as first_orderDate
from orders o
group by customerId
having min(orderDate) >= '2021-02-11'
) oc;
For performance, I would suggest using the customers table (note that this also counts customers who have no orders at all):
select count(*)
from customers c
where not exists (select 1
from orders o
where o.customerId = c.customerId and
o.orderDate < '2021-02-11'
);
For this, you want an index on orders(customerId, orderDate).
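As a quick sanity check, the HAVING, NOT EXISTS and window-function formulations can be compared on a small data set. This is only a sketch using Python's sqlite3 module: the first three rows come from the question, the last two rows (orders 501 and 502) are invented, and "after" is taken to mean strictly later than the cutoff date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (orderId INTEGER, customerId INTEGER, orderDate TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (320, 45, "2020-01-01"),
    (455, 67, "2021-02-11"),
    (122, 45, "2019-04-22"),
    (501, 88, "2021-03-05"),  # hypothetical: a genuinely new customer
    (502, 45, "2021-04-01"),  # hypothetical: a returning customer
])
cutoff = "2021-02-11"  # ISO dates stored as text compare correctly

# A customer is new if their earliest order is after the cutoff.
having_count = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT customerId FROM orders
        GROUP BY customerId
        HAVING MIN(orderDate) > ?
    ) AS first_orders
""", (cutoff,)).fetchone()[0]

# Same idea: an order after the cutoff with no order on or before it.
not_exists_count = conn.execute("""
    SELECT COUNT(DISTINCT o1.customerId)
    FROM orders o1
    WHERE o1.orderDate > ?
      AND NOT EXISTS (SELECT 1 FROM orders o2
                      WHERE o2.customerId = o1.customerId
                        AND o2.orderDate <= ?)
""", (cutoff, cutoff)).fetchone()[0]

# Number every order per customer first, then filter on the first one.
window_count = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT orderDate,
               ROW_NUMBER() OVER (PARTITION BY customerId ORDER BY orderDate) AS rn
        FROM orders
    ) AS numbered
    WHERE rn = 1 AND orderDate > ?
""", (cutoff,)).fetchone()[0]

print(having_count, not_exists_count, window_count)  # 1 1 1
```

Only customer 88 counts as new: customer 45 ordered before the cutoff, and customer 67's first order falls exactly on it.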

Access SQL - Delete records with same identifiers based on criteria

I have a database with multiple records with the same identifier. I want to remove just one of those records.
OrderNum Cost
10001 254
10002 343
10002 300
10003 435
10003 323
For the above table, let's say I just want to delete, among records with duplicate order numbers, the ones with the smaller cost. Example: for the 10002 records, keep the one with a cost of 343 and delete the smaller 300.
Here is the query I have come up with. However, it uses the cost alone to identify the duplicate, which goes wrong if the same cost appears somewhere else in the table.
DELETE Orders.*
FROM Orders
WHERE (cost In
(Select min(cost) FROM Orders
GROUP BY [OrderNum] HAVING Count(*) > 1))
How can I write the query so that it uses the order number and deletes the smaller-cost row of each duplicated order?
I'll explain the solution in stages:
SELECT OrderNum, Min(Cost) as MinCost
FROM Orders
GROUP BY OrderNum
HAVING COUNT(*) > 1
This returns records you intend to delete:
OrderNum MinCost
10002 300
10003 323
The following is another version of the same query using sub-SELECTs:
SELECT *
FROM
(
SELECT OrderNum, Min(Cost) as MinCost
FROM Orders
GROUP BY OrderNum
HAVING COUNT(*) > 1
) M
We want to join the records marked for deletion back to the Orders table. One way to achieve this is an EXISTS clause:
SELECT *
FROM Orders O
WHERE EXISTS (
SELECT *
FROM
(
SELECT OrderNum, Min(Cost) as MinCost
FROM Orders
GROUP BY OrderNum
HAVING COUNT(*) > 1
) M
WHERE O.OrderNum = M.OrderNum
AND O.Cost = M.MinCost
)
Now that we've mastered the SELECT statement needed, we turn it into the DELETE statement:
DELETE
FROM Orders O
WHERE EXISTS (
SELECT *
FROM
(
SELECT OrderNum, Min(Cost) as MinCost
FROM Orders
GROUP BY OrderNum
HAVING COUNT(*) > 1
) M
WHERE O.OrderNum = M.OrderNum
AND O.Cost = M.MinCost
)
If you have large amounts of data, you may wish to create an index to speed up the join:
CREATE INDEX IX_Orders_001 ON Orders (OrderNum, Cost);
You really want to do something like:
WHERE (ordernum, cost) IN (SELECT ordernum, min(cost) as cost FROM Orders GROUP BY OrderNum HAVING COUNT(*) > 1);
But Access doesn't support tuples like this, as many larger RDBMSs do.
Instead you could concatenate your tuples (with a separator, so that two different pairs cannot produce the same string):
WHERE ordernum & '|' & cost IN (SELECT ordernum & '|' & min(cost) FROM Orders GROUP BY OrderNum HAVING Count(*) > 1);
This will remove all duplicates but the one with the largest cost for each order number. It relies on window functions and a deletable CTE (SQL Server syntax), so it will not run in Access:
with ranked as (
select row_number() over (partition by ordernum order by cost desc) as rownum
from yourtable
)
delete from ranked
where rownum <> 1
You can use a JOIN to delete the smaller costs of each OrderNum, like this:
DELETE Orders.*
FROM Orders
INNER JOIN (Select OrderNum, max(cost) as cost FROM Orders
GROUP BY [OrderNum] HAVING Count(*) > 1) as R
on Orders.OrderNum=R.OrderNum and Orders.cost < R.cost
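The underlying rule in these answers is "delete a row if another row with the same OrderNum has a higher cost". Below is a rough sketch of that rule, run in SQLite through Python's sqlite3 module rather than in Access, so the DELETE syntax is SQLite's and not Access SQL. The row for order 10004 is invented to show that an equal cost elsewhere in the table is left alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderNum INTEGER, Cost INTEGER)")
conn.executemany("INSERT INTO Orders VALUES (?, ?)", [
    (10001, 254),
    (10002, 343), (10002, 300),
    (10003, 435), (10003, 323),
    (10004, 300),  # hypothetical: same cost as a deleted row, but no duplicate OrderNum
])

# Delete every row that is beaten by a higher cost within the same OrderNum.
conn.execute("""
    DELETE FROM Orders
    WHERE EXISTS (SELECT 1
                  FROM Orders AS o2
                  WHERE o2.OrderNum = Orders.OrderNum
                    AND o2.Cost > Orders.Cost)
""")

remaining = conn.execute(
    "SELECT OrderNum, Cost FROM Orders ORDER BY OrderNum, Cost"
).fetchall()
print(remaining)
# [(10001, 254), (10002, 343), (10003, 435), (10004, 300)]
```

Because the comparison is correlated on OrderNum, the cost value is never used on its own to pick rows, which was the weakness of the query in the question.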

SQL Select Group By Min() - but select other

I want to select the ID of the Table Products with the lowest Price Grouped By Product.
ID Product Price
1 123 10
2 123 11
3 234 20
4 234 21
Which by logic would look like this:
SELECT
ID,
Min(Price)
FROM
Products
GROUP BY
Product
But I don't want to select the Price itself, just the ID.
Resulting in
1
3
EDIT: The DBMSes used are Firebird and Filemaker
You didn't specify your DBMS, so this is ANSI standard SQL:
select id
from (
select id,
row_number() over (partition by product order by price) as rn
from products
) t
where rn = 1
order by id;
If your DBMS doesn't support window functions, you can do that with joining against a derived table:
select p.id
from products p
join (
select product,
min(price) as min_price
from products
group by product
) t on t.product = p.product and t.min_price = p.price;
Note that this will return a slightly different result than the first solution: if the minimum price for a product occurs more than once, all those IDs will be returned. The first solution will only return one of them. If you don't want that, you need to group again in the outer query:
select min(p.id)
from products p
join (
select product,
min(price) as min_price
from products
group by product
) t on t.product = p.product and t.min_price = p.price
group by p.product;
SELECT ID
FROM Products as A
where price = ( select Min(Price)
from Products as B
where B.Product = A.Product )
GROUP BY id
This will show the ID of the cheapest row of each product, which in this case is 1 and 3.
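The difference between the window-function and the join approach only shows up when a minimum price is tied. Here is a small sketch using Python's sqlite3 module (SQLite 3.25+ for the window function). The row with ID 5 is an invented tie for product 234, and the extra `id` in the window's ORDER BY is an assumption added to make the tie-break deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ID INTEGER PRIMARY KEY, Product INTEGER, Price INTEGER)")
conn.executemany("INSERT INTO Products VALUES (?, ?, ?)", [
    (1, 123, 10), (2, 123, 11),
    (3, 234, 20), (4, 234, 21),
    (5, 234, 20),  # hypothetical: ties with ID 3 for the lowest price
])

# Window function: exactly one ID per product (lowest ID wins a tie here).
window_ids = [row[0] for row in conn.execute("""
    SELECT id FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY product ORDER BY price, id) AS rn
        FROM Products
    ) t
    WHERE rn = 1
    ORDER BY id
""")]

# Join against the per-product minimum: every tied ID is returned.
join_ids = [row[0] for row in conn.execute("""
    SELECT p.id
    FROM Products p
    JOIN (SELECT product, MIN(price) AS min_price
          FROM Products
          GROUP BY product) t
      ON t.product = p.product AND t.min_price = p.price
    ORDER BY p.id
""")]

print(window_ids)  # [1, 3]
print(join_ids)    # [1, 3, 5]
```

Without the tied row, both queries return 1 and 3, which is the result asked for in the question.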

SQL select x number of rows from table based on column value

I'm looking for a way to select top 3 rows from 4 vendors from a table of products, following this criteria:
Must select 4 vendors.
Must select top 3 products for each vendor ordered by product rating.
I tried doing something like:
select top 12 productname, vendor
from products
order by productrating
but obviously that doesn't give me 3 products for each vendor.
The product table has:
productid (int), productname (nvarchar(500)), productrating (float),
vendor (id), price (float).
These are the relevant columns.
You can use the ANSI standard row_number() function to get 3 products for each vendor:
select p.*
from (select p.*,
row_number() over (partition by vendor order by productrating desc) as seqnum
from products p
) p
where p.seqnum <= 3
If you want 4 vendors (this assumes each of the first 4 vendors has at least 3 products):
select top 12 p.*
from (select p.*,
row_number() over (partition by vendor order by productrating desc) as seqnum
from products p
) p
where p.seqnum <= 3
order by vendor;
This will give you top 3 Products per vendor. You didn't specify how you're selecting the 4 vendors. That logic could easily be included using the WHERE clause or using a different ORDER BY depending on how you select the 4 vendors.
SELECT TOP 12 vnd.VendorName, apl.ProductName
FROM Vendors vnd
CROSS APPLY (
SELECT TOP 3 prd.ProductName
FROM Products prd
WHERE prd.Vendor = vnd.VendorID
ORDER BY prd.ProductRating DESC
) apl
ORDER BY vnd.VendorName
If you have a fixed list of vendors you would like to query, you can use the following approach (note that TOP 3 here limits the whole result, not each vendor):
SELECT TOP 3
p.ProductID
FROM Products p
WHERE p.ProductID IN ( SELECT v.ProductID
FROM Vendors v
WHERE v.VendorID IN (Vendor1ID, Vendor2ID, Vendor3ID, Vendor4ID) )
ORDER BY p.ProductRating DESC
If you only have the vendor names, you will need a workaround: filter the vendors by name but keep the join condition on their IDs.
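For what it's worth, the row_number() approach is easy to try out on generated data. The sketch below uses Python's sqlite3 module (SQLite has no TOP, so the four vendors are picked with a plain IN list, which is an assumption about how they would be chosen). All product names, ratings and prices are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        productid INTEGER PRIMARY KEY,
        productname TEXT,
        productrating REAL,
        vendor INTEGER,
        price REAL
    )
""")
# Hypothetical data: 5 vendors with 4 products each, ratings 1.0 to 4.0.
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?, ?)",
    [(v * 10 + k, f"v{v}-p{k}", float(k), v, 9.99)
     for v in range(1, 6) for k in range(1, 5)],
)

# Rank products inside each vendor, keep the best 3, restrict to 4 vendors.
rows = conn.execute("""
    SELECT vendor, productname, productrating
    FROM (
        SELECT p.*,
               ROW_NUMBER() OVER (PARTITION BY vendor
                                  ORDER BY productrating DESC) AS seqnum
        FROM products p
        WHERE vendor IN (1, 2, 3, 4)
    ) ranked
    WHERE seqnum <= 3
    ORDER BY vendor, seqnum
""").fetchall()

print(len(rows))   # 12
print(rows[:3])    # [(1, 'v1-p4', 4.0), (1, 'v1-p3', 3.0), (1, 'v1-p2', 2.0)]
```

Filtering the vendors inside the derived table, rather than relying on TOP 12, keeps the result at 3 rows per vendor even when some vendor has fewer than 3 products.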

Oracle Complex Sort - Multiple Children

I have a table as follows:
BRAND_ID PRODUCT_ID PRODUCT_DESC PRODUCT_TYPE
100 1000 Tools A
100 1500 Tools A
200 2000 Burgers B
300 3000 Clothing C
300 4000 Makeup D
300 5000 Clothing C
So a brand can have multiple products, either all of the same type or of mixed types. Brands whose products are all of the same type should come first in the result, sorted by product type, followed by the brands that have different product types. I can do this programmatically, but I wanted to see if there is a way to do it in the query.
I don't have access to Oracle, but I believe something along these lines should work...
WITH
ranked_data
AS
(
SELECT
COUNT(DISTINCT product_type) OVER (PARTITION BY brand_id) AS brand_rank,
MIN(product_type) OVER (PARTITION BY brand_id) AS first_product_type,
t.*
FROM
yourTable t
)
SELECT
*
FROM
ranked_data
ORDER BY
brand_rank,
first_product_type,
brand_id,
product_type,
product_desc
An alternative is to JOIN on to a sub-query to calculate the two sorting fields.
SELECT
yourTable.*
FROM
yourTable
INNER JOIN
(
SELECT
brand_id,
COUNT(DISTINCT product_type) AS brand_rank,
MIN(product_type) AS first_product_type
FROM
yourTable
GROUP BY
brand_id
)
brand_summary
ON yourTable.brand_id = brand_summary.brand_id
ORDER BY
brand_summary.brand_rank,
brand_summary.first_product_type,
yourTable.brand_id,
yourTable.product_type,
yourTable.product_desc
How about selecting from a sub-select that figures out the number of distinct product types per brand, and then sorting by that count?
select t.BRAND_ID,
t.PRODUCT_ID,
t.PRODUCT_DESC,
t.PRODUCT_TYPE
from (select t2.BRAND_ID,
count(distinct t2.PRODUCT_TYPE) cnt
from YOURTABLE t2
group by t2.BRAND_ID) data
join YOURTABLE t on t.BRAND_ID = data.BRAND_ID
order by data.cnt, t.BRAND_ID, t.PRODUCT_ID, t.PRODUCT_TYPE
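The join-to-a-summary idea can be checked on the sample rows without an Oracle instance. This is a sketch in SQLite via Python's sqlite3 module, so it is not Oracle syntax. The table name brand_products is made up, and PRODUCT_ID is added as a last sort key (an assumption) so that rows with the same type and description come out in a fixed order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE brand_products (
        BRAND_ID INTEGER, PRODUCT_ID INTEGER,
        PRODUCT_DESC TEXT, PRODUCT_TYPE TEXT
    )
""")
conn.executemany("INSERT INTO brand_products VALUES (?, ?, ?, ?)", [
    (100, 1000, "Tools", "A"),
    (100, 1500, "Tools", "A"),
    (200, 2000, "Burgers", "B"),
    (300, 3000, "Clothing", "C"),
    (300, 4000, "Makeup", "D"),
    (300, 5000, "Clothing", "C"),
])

# Summarise each brand once, then sort the detail rows by that summary:
# single-type brands (type_count = 1) first, ordered by their type.
ordered_ids = [row[0] for row in conn.execute("""
    SELECT t.PRODUCT_ID
    FROM brand_products t
    JOIN (SELECT BRAND_ID,
                 COUNT(DISTINCT PRODUCT_TYPE) AS type_count,
                 MIN(PRODUCT_TYPE) AS first_type
          FROM brand_products
          GROUP BY BRAND_ID) s
      ON s.BRAND_ID = t.BRAND_ID
    ORDER BY s.type_count, s.first_type, t.BRAND_ID,
             t.PRODUCT_TYPE, t.PRODUCT_DESC, t.PRODUCT_ID
""")]

print(ordered_ids)  # [1000, 1500, 2000, 3000, 5000, 4000]
```

Brands 100 and 200 have a single product type and come first; brand 300 mixes types C and D, so it sorts last with its rows grouped by type.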