Find Duplicates in a table

My table contains multiple lots (LOT_ID); each lot contains multiple products (PRODUCT_ID), and there are multiple orders (ORDER_ID) under each product. I would like to find the order IDs that are repeated across multiple products within a given lot.
S.NO  LOT_ID  Product_ID  Order_ID
----  ------  ----------  --------
1     101     P108        90001
2     101     P109        90001
3     101     P110        80900
4     102     S189        10098
5     102     S234        10087
6     102     S465        10098
7     102     S342        10050
8     103     L109        20090
9     103     L110        20098
10    103     L111        20020
Desired result
S.NO  LOT_ID  Product_ID  Order_ID
----  ------  ----------  --------
1     101     P108        90001
2     101     P109        90001
3     102     S189        10098
4     102     S465        10098

Group by LOT_ID and Order_ID first to find the order IDs that appear under more than one product within a lot, then join back to the table to return the matching rows. I haven't run this, so please check it.
SELECT t.LOT_ID, t.Product_ID, t.Order_ID
FROM <tableName> t
JOIN (SELECT LOT_ID, Order_ID
      FROM <tableName>
      GROUP BY LOT_ID, Order_ID
      HAVING COUNT(DISTINCT Product_ID) > 1) d
  ON d.LOT_ID = t.LOT_ID
 AND d.Order_ID = t.Order_ID;

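Count the repeats with window functions, then filter on the counts you need:
select t.*,
       count(*) over (partition by t.LOT_ID, t.Product_ID, t.Order_ID) as c,
       count(*) over (partition by t.LOT_ID, t.Order_ID) as c2
from t
Here c counts the rows for one (lot, product, order) combination and c2 counts the rows for one (lot, order) combination. Your case is the rows where c2 is greater than c: the order appears under more than one product in the same lot.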
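A minimal sketch of the window-count approach, run against the question's sample rows. The in-memory SQLite database, the Python sqlite3 wrapper, the table name t and the c2 > c filter are assumptions made for illustration (window functions need SQLite 3.25 or later); the original answer does not say which database it targets.

```python
import sqlite3

# Assumption: an in-memory SQLite table named t holding the sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (LOT_ID INTEGER, Product_ID TEXT, Order_ID INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (101, "P108", 90001), (101, "P109", 90001), (101, "P110", 80900),
        (102, "S189", 10098), (102, "S234", 10087), (102, "S465", 10098),
        (102, "S342", 10050), (103, "L109", 20090), (103, "L110", 20098),
        (103, "L111", 20020),
    ],
)

# c  = rows per (lot, product, order); c2 = rows per (lot, order).
# c2 > c means the order also appears under another product in the same lot.
query = """
SELECT LOT_ID, Product_ID, Order_ID
FROM (
    SELECT t.*,
           COUNT(*) OVER (PARTITION BY LOT_ID, Product_ID, Order_ID) AS c,
           COUNT(*) OVER (PARTITION BY LOT_ID, Order_ID) AS c2
    FROM t
) AS counted
WHERE c2 > c
ORDER BY LOT_ID, Product_ID
"""
duplicates = conn.execute(query).fetchall()
for row in duplicates:
    print(row)
```

On the sample data this should return the four rows listed under "Desired result" in the question.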

Related

MS Access SQL, How to return only the newest row before a given date joined to a master table

I have two tables in a MS Access database as shown below. CustomerId is a primary key and fkCustomerId is a foreign key linked to the CustomerId in the other table.
Customer table
CustomerId  Name
----------  -----
1           John
2           Bob
3           David
Purchase table
fkCustomerId  OrderDate   fkStockId
------------  ----------  ---------
1             01/02/2010  100
3             08/07/2010  101
2             14/01/2011  102
2             21/10/2011  103
3             02/03/2012  104
1             30/09/2012  105
3             01/01/2013  106
1             18/04/2014  107
3             22/11/2015  108
I am trying to return a list of customers showing the last fkStockId for each customer ordered before a given date.
So for the date 01/10/2012, I'd be looking for a return of
fkCustomerId  Name   fkStockId
------------  -----  ---------
1             John   105
2             Bob    103
3             David  104
A solution seems to be escaping me, any help would be greatly appreciated.

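You can use a nested SELECT to get each customer's last order date before the cutoff, then join back to Purchase to pick up the fkStockId. Access needs INNER JOIN spelled out and parentheses around the first join when there is more than one.
SELECT p.fkCustomerId,
       c.Name,
       p.fkStockId
FROM (Purchase AS p
INNER JOIN
(
    SELECT fkCustomerId,
           MAX(OrderDate) AS last_OrderDate
    FROM Purchase
    WHERE OrderDate < #10/1/2012#
    GROUP BY fkCustomerId
) AS lastOrder
ON (lastOrder.fkCustomerId = p.fkCustomerId
    AND lastOrder.last_OrderDate = p.OrderDate))
LEFT JOIN Customer AS c
ON c.CustomerId = p.fkCustomerId
This example hard-codes the cutoff of 1 October 2012. Access reads date literals between # characters as mm/dd/yyyy, so change #10/1/2012# if you want to filter by a different date.
It also assumes that each customer has only one purchase on any given OrderDate; otherwise a customer with several purchases on the last date gets several rows.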
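To check the logic of the nested-select approach against the question's data, here is a small sketch. It uses an in-memory SQLite database through Python's sqlite3 module rather than MS Access, and stores the dates as ISO yyyy-mm-dd text so that string comparison orders them correctly; both choices are assumptions for illustration, not part of the original answer.

```python
import sqlite3

# Assumption: SQLite stands in for Access; dates are ISO text (yyyy-mm-dd).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Purchase (fkCustomerId INTEGER, OrderDate TEXT, fkStockId INTEGER);
INSERT INTO Customer VALUES (1, 'John'), (2, 'Bob'), (3, 'David');
INSERT INTO Purchase VALUES
    (1, '2010-02-01', 100), (3, '2010-07-08', 101), (2, '2011-01-14', 102),
    (2, '2011-10-21', 103), (3, '2012-03-02', 104), (1, '2012-09-30', 105),
    (3, '2013-01-01', 106), (1, '2014-04-18', 107), (3, '2015-11-22', 108);
""")

# Inner query: last order date per customer before the cutoff.
# Outer query: join back to Purchase to fetch that row's fkStockId.
query = """
SELECT p.fkCustomerId, c.Name, p.fkStockId
FROM Purchase AS p
JOIN (
    SELECT fkCustomerId, MAX(OrderDate) AS last_OrderDate
    FROM Purchase
    WHERE OrderDate < ?
    GROUP BY fkCustomerId
) AS lastOrder
  ON lastOrder.fkCustomerId = p.fkCustomerId
 AND lastOrder.last_OrderDate = p.OrderDate
LEFT JOIN Customer AS c
  ON c.CustomerId = p.fkCustomerId
ORDER BY p.fkCustomerId
"""
latest = conn.execute(query, ("2012-10-01",)).fetchall()
for row in latest:
    print(row)
```

Passing the cutoff as a parameter avoids the question of how a date literal is written in a particular database.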

Aggregation and joining 2 tables or Sub Queries

I have the following tables.
Order_table
Order_ID  Item_ID  Qty_shipped
--------  -------  -----------
1111      11       4
1111      22       6
1111      33       6
1111      44       6
Shipping_det
Order_ID  Ship_num  Ship_cost
--------  --------  ---------
1111      1         16.84
1111      2         16.60
1111      3         16.60
I want my output to be as follows,
Order ID  Qty_shipped  Ship_cost
--------  -----------  ---------
1111      22           50.04
I wrote the following query,
select sum(O.qty_shipped) as Qty_shipped, sum(S.Ship_cost) as Total_cost
from Order_table O
join shipping_det S on O.Order_ID = S.Order_ID
and I got my output as
Qty_shipped  Total_cost
-----------  ----------
66           200.16
As I understand it, because I joined the two tables, Qty_shipped was multiplied by 3 (once per shipping row) and Total_cost was multiplied by 4 (once per order row).
Any help would be appreciated.
Thanks in advance.

You need to aggregate before joining. Alternatively, union the tables together and then aggregate:
select order_id, sum(qty_shipped), sum(ship_cost)
from ((select order_id, qty_shipped, 0 as ship_cost
from order_table
) union all
(select order_id, 0, ship_cost
from shipping_det
)
) os
group by order_id;
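The other option mentioned above, aggregating each table before joining, could look roughly like this. It is only a sketch: the in-memory SQLite database and the Python sqlite3 wrapper are assumptions for illustration, and the table and column names are taken from the question.

```python
import sqlite3

# Assumption: in-memory SQLite tables loaded with the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Order_table (Order_ID INTEGER, Item_ID INTEGER, Qty_shipped INTEGER);
CREATE TABLE Shipping_det (Order_ID INTEGER, Ship_num INTEGER, Ship_cost REAL);
INSERT INTO Order_table VALUES (1111, 11, 4), (1111, 22, 6), (1111, 33, 6), (1111, 44, 6);
INSERT INTO Shipping_det VALUES (1111, 1, 16.84), (1111, 2, 16.60), (1111, 3, 16.60);
""")

# Each subquery collapses its table to one row per order, so the join
# cannot multiply rows and the sums stay correct.
query = """
SELECT o.Order_ID, o.Qty_shipped, s.Ship_cost
FROM (SELECT Order_ID, SUM(Qty_shipped) AS Qty_shipped
      FROM Order_table
      GROUP BY Order_ID) AS o
JOIN (SELECT Order_ID, SUM(Ship_cost) AS Ship_cost
      FROM Shipping_det
      GROUP BY Order_ID) AS s
  ON s.Order_ID = o.Order_ID
"""
totals = conn.execute(query).fetchall()
for order_id, qty, cost in totals:
    print(order_id, qty, round(cost, 2))
```

Unlike the UNION ALL version, an inner join like this drops orders that have no shipping rows; a LEFT JOIN would keep them.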

How to find number of distinct phones per customer and put the customers(counts) in different buckets as per the counts?

Below is the table where I have customer_id and different phones they have.
customer_id phone_number
101 123456789
102 234567891
103 345678912
102 456789123
101 567891234
104 678912345
105 789123456
106 891234567
106 912345678
106 456457234
101 655435664
107 453426782
Now, I want to find customer_id and distinct phone number count.
So I used this query:
select customer_id, count(distinct phone_number)
from customer_phone
group by customer_id;
customer_id no of phones
101 3
102 2
103 1
104 1
105 1
106 3
107 1
From the above result, my final goal is the output below: use the counts as buckets, then count the number of customers that fall into each bucket.
Buckets no of consumers
3 2
2 1
1 4
There are close to 200 million records. Can you please explain an efficient way to work on this?

You can use width_bucket for that:
select bucket, count(*)
from (
select width_bucket(count(distinct phone_number), 1, 10, 10) as bucket
from customer_phone
group by customer_id
) t
group by bucket;
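width_bucket(..., 1, 10, 10) creates ten equal-width buckets over the range from 1 up to (but not including) 10, so counts 1 through 9 land in buckets 1 through 9. A count of 10 or more falls into the overflow bucket, number 11.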
Online Example: http://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=1e6d55305570499f363837aba21bdc7e

Use two aggregations:
select cnt, count(*), min(customer_id), max(customer_id)
from (select customer_id, count(distinct phone_number) as cnt
from customer_phone
group by customer_id
) c
group by cnt
order by cnt;
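The two-step aggregation can be tried on the question's sample rows with a sketch like the one below. The in-memory SQLite database, the Python sqlite3 wrapper and the output column names bucket and no_of_customers are assumptions for illustration; nothing here speaks to performance on 200 million rows.

```python
import sqlite3

# Assumption: in-memory SQLite table loaded with the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_phone (customer_id INTEGER, phone_number TEXT)")
conn.executemany(
    "INSERT INTO customer_phone VALUES (?, ?)",
    [
        (101, "123456789"), (102, "234567891"), (103, "345678912"),
        (102, "456789123"), (101, "567891234"), (104, "678912345"),
        (105, "789123456"), (106, "891234567"), (106, "912345678"),
        (106, "456457234"), (101, "655435664"), (107, "453426782"),
    ],
)

# Inner aggregation: distinct phones per customer.
# Outer aggregation: number of customers per phone count.
query = """
SELECT cnt AS bucket, COUNT(*) AS no_of_customers
FROM (
    SELECT customer_id, COUNT(DISTINCT phone_number) AS cnt
    FROM customer_phone
    GROUP BY customer_id
) AS c
GROUP BY cnt
ORDER BY cnt DESC
"""
buckets = conn.execute(query).fetchall()
for row in buckets:
    print(row)
```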

Subtracting rows depending on values of another column

I have a Purchases table and want to subtract purchase dates per customer ID. Customer IDs repeat, so I want the difference between the purchase dates of the two rows for customer 105, the two rows for customer 108, and so on.
I have the following code, but it is subtracting each purchase date from the next purchase date:
SELECT DATEDIFF(DAY,P1.PURCHASEDATE,P2.PURCHASEDATE) AS "diff in days since last purchase"
FROM Purchases P1
JOIN Purchases P2
ON P1.CustomerID= P2.CustomerID

Try adding a not-equal condition to your ON clause, P1.PURCHASEID <> P2.PURCHASEID, so that a row is never paired with itself. Something like this:
SELECT DATEDIFF(DAY,P1.PURCHASEDATE,P2.PURCHASEDATE) AS "diff in days"
FROM Purchases P1
JOIN Purchases P2
ON P1.CustomerID = P2.CustomerID AND P1.PURCHASEID <> P2.PURCHASEID

You can use OUTER APPLY:
;WITH Purchases AS (
SELECT *
FROM (VALUES
(1,'2012-08-15',1,105,'a510'),
(2,'2012-08-15',2,102,'a510'),
(3,'2012-08-15',3,103,'a506'),
(4,'2012-08-16',1,105,'a510'),
(5,'2012-08-17',5,106,'a507'),
(6,'2012-08-17',5,107,'a509'),
(7,'2012-08-18',4,108,'a502'),
(8,'2012-08-19',2,108,'a510'),
(9,'2012-08-19',3,109,'a502'),
(10,'2012-08-20',3,110,'a503')
) as t(PurchaseID,PurchaseDate,Qty,CustomerID,ProductID)
)
SELECT p1.*,
DATEDIFF(DAY,P2.PurchaseDate,P1.PurchaseDate) as ddiff
FROM Purchases p1
OUTER APPLY (
SELECT TOP 1 *
FROM Purchases
WHERE p1.CustomerID = CustomerID
AND PurchaseDate < p1.PurchaseDate
ORDER BY PurchaseDate DESC
) p2
Will output:
PurchaseID PurchaseDate Qty CustomerID ProductID ddiff
1 2012-08-15 1 105 a510 NULL
2 2012-08-15 2 102 a510 NULL
3 2012-08-15 3 103 a506 NULL
4 2012-08-16 1 105 a510 1
5 2012-08-17 5 106 a507 NULL
6 2012-08-17 5 107 a509 NULL
7 2012-08-18 4 108 a502 NULL
8 2012-08-19 2 108 a510 1
9 2012-08-19 3 109 a502 NULL
10 2012-08-20 3 110 a503 NULL
Also you can use LAG (SQL Server 2012 and up):
SELECT *,
DATEDIFF(DAY,LAG(PurchaseDate,1,NULL) OVER (PARTITION BY CustomerID ORDER BY PurchaseDate),PurchaseDate) as ddiff
FROM Purchases
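The LAG idea is not specific to SQL Server. As a rough sketch, the same per-customer difference can be computed in SQLite (3.25 or later) through Python's sqlite3 module. SQLite has no DATEDIFF, so this version swaps in a julianday() subtraction; that substitution, the in-memory database and the ISO text dates are all assumptions for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite table loaded with the answer's sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Purchases (PurchaseID INTEGER, PurchaseDate TEXT, "
    "Qty INTEGER, CustomerID INTEGER, ProductID TEXT)"
)
conn.executemany(
    "INSERT INTO Purchases VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2012-08-15", 1, 105, "a510"), (2, "2012-08-15", 2, 102, "a510"),
        (3, "2012-08-15", 3, 103, "a506"), (4, "2012-08-16", 1, 105, "a510"),
        (5, "2012-08-17", 5, 106, "a507"), (6, "2012-08-17", 5, 107, "a509"),
        (7, "2012-08-18", 4, 108, "a502"), (8, "2012-08-19", 2, 108, "a510"),
        (9, "2012-08-19", 3, 109, "a502"), (10, "2012-08-20", 3, 110, "a503"),
    ],
)

# LAG fetches the same customer's previous purchase date (NULL if none);
# julianday() turns both dates into day numbers so they can be subtracted.
query = """
SELECT PurchaseID,
       CustomerID,
       CAST(julianday(PurchaseDate)
            - julianday(LAG(PurchaseDate) OVER (
                  PARTITION BY CustomerID ORDER BY PurchaseDate))
            AS INTEGER) AS ddiff
FROM Purchases
ORDER BY PurchaseID
"""
diffs = conn.execute(query).fetchall()
for row in diffs:
    print(row)
```

Only purchases 4 and 8 have an earlier purchase by the same customer, so every other row gets a NULL (None in Python) ddiff.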

MS Access SQL Query - Showing total number of orders in a certain year

SQL appears to be more complex than I anticipated. My problem: for each customer, I would like to show the Customer ID and the total number of orders placed in 2011.
My table looks like this
Table: Order_t
Order_ID Order_Date Customer_ID
-------- ---------- -----------
1001 10/21/2011 1
1002 10/25/2011 8
1003 10/26/2011 15
1004 10/27/2011 5
1005 11/24/2011 3
1006 11/27/2011 2
1007 11/28/2011 11
1008 12/3/2011 12
1009 12/5/2011 1
1010 1/16/2012 4
I would like my query to display a table like this:
Customer_ID Orders_Placed
----------- -------------
1 2
2 1
3 1
5 1
8 1
11 1
12 1
15 1
My current query is this (I am completely neglecting the date part for now because I haven't even figured out the grouping yet):
SELECT Customer_ID, SUM(Order_ID) AS Orders_Placed
FROM Order_t
GROUP BY Order_ID, Customer_ID
And this is the obviously wrong result:
Customer_ID Orders_Placed
----------- -------------
1 1001
8 1002
15 1003
5 1004
3 1005
2 1006
11 1007
12 1008
1 1009
4 1010
Thanks for help, but I would also like to understand where the problem is in my logic. What crucial part do I seem to not understand?

The problem with your logic is this:
GROUP BY Order_ID, Customer_ID
This puts each combination of (Order_ID, Customer_ID) in its own group. Since Order_ID alone is unique, practically no grouping happens, and SUM(Order_ID) just returns each order's own ID.
To do it correctly, GROUP BY Customer_ID only (which reads like what you need), then COUNT the orders. Finally, add the date filter as well.
SELECT Customer_ID, COUNT(Order_ID) AS Orders_Placed
FROM Order_t
WHERE Order_Date >= #1/1/2011# and Order_Date < #1/1/2012#
GROUP BY Customer_ID
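To see the grouping and the date filter working together, here is a sketch that runs the same query shape on the question's rows. It uses an in-memory SQLite database through Python's sqlite3 module instead of Access, with dates stored as ISO yyyy-mm-dd text; both are assumptions for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for Access; dates are ISO text (yyyy-mm-dd).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Order_t (Order_ID INTEGER, Order_Date TEXT, Customer_ID INTEGER)")
conn.executemany(
    "INSERT INTO Order_t VALUES (?, ?, ?)",
    [
        (1001, "2011-10-21", 1), (1002, "2011-10-25", 8), (1003, "2011-10-26", 15),
        (1004, "2011-10-27", 5), (1005, "2011-11-24", 3), (1006, "2011-11-27", 2),
        (1007, "2011-11-28", 11), (1008, "2011-12-03", 12), (1009, "2011-12-05", 1),
        (1010, "2012-01-16", 4),
    ],
)

# One group per customer; the half-open date range keeps only 2011 orders.
query = """
SELECT Customer_ID, COUNT(Order_ID) AS Orders_Placed
FROM Order_t
WHERE Order_Date >= '2011-01-01' AND Order_Date < '2012-01-01'
GROUP BY Customer_ID
ORDER BY Customer_ID
"""
orders_2011 = conn.execute(query).fetchall()
for row in orders_2011:
    print(row)
```

Customer 4 does not appear because their only order was placed in 2012.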

Use COUNT() instead of SUM(), and group by Customer_ID only:
SELECT Customer_ID, COUNT(Order_ID) AS Orders_Placed
FROM Order_t
GROUP BY Customer_ID

We Keep Coding
