Show duplicate rows(all columns of that row) where all columns are duplicate except one column - sql

In below table, I need to select duplicate records where all columns are duplicate except Customer Type and Price for a particular week.
For e.g
Week Customer Product Customer Type Price
1 Alex Cycle Consumer 100
1 Alex Cycle Reseller 101
2 John Motor Consumer 200
3 John Motor Consumer 200
3 John Motor Reseller 201
I am using below query but this query doesn't show me both costumer type, it just shows me consumer count(*) for a combination.
select Week, Customer, product, count(distinct Customer Type)
from table
group by Week, Customer, product
having count(distinct Customer Type) > 1
I would like to see below result, that shows me duplicate values and not just the count(*) of duplicate row. I am trying to see customers assigned to multiple customer types in a particular week for a product and at the same time show me all columns. It doesn't matter if the price is different.
Week Customer Product Customer Type Price
1 Alex Cycle Consumer 100
1 Alex Cycle Reseller 101
3 John Motor Consumer 200
3 John Motor Reseller 201
Thanks
Shaki

WITH CustomerDistribution_CTE (WeekC ,CustomerC, ProductC)
AS
(
select Week, Customer, product
from Your_Table_Name group by Week, Customer,
product having count(distinct CustomerType) > 1
)
SELECT Y.*
FROM CustomerDistribution_CTE C
inner join Your_Table_Name Y on C.WeekC =Y.Week
and C.CustomerC =Y.Customer and C.productC =Y.product
Note :Please replace "Your_Table_Name" with exact table name and Try.

One way to achieve this, using generic SQL, is to use a "derived table" like this:
select x.*
from tablex x
inner join (
select Week, Customer, Product
from tablex
group by Week, Customer, Product
having count(*) > 1
) d on x.Week = d.Week and x.Customer = d.Customer and x.Product = d.Product

You can do that by using DISTINCT like
select DISTINCT Customer,Product,Customer_Type,Price from Your_Table_Name
will look for DISTINCT combination.
Note: This query if of SQL Server

From the expected result that you have pasted, it looks like you are not concerned about the week.
If you have a ID (incremental PK), it would be much simpler like below
select * from table where ID not in
(select max(ID) from table group by Customer, Product, CustomerType having count(*) > 1 )
This is tested on MySQL. Do you have a ID column?
In case you don't have a ID column, try the below:
select max(week) week, Customer, Product, CustomerType, max(price) from device group by Customer, Product, CustomerType;
I have not verified this one.

This will return your expected result set:
select *
from table
-- Teradata syntax to filter the result of an OLAP-function
-- (similar to HAVING after GROUP BY)
qualify
count(*)
over (partition by Week, Customer, product) > 1
For other DBMSes you will need to nest your query:
select *
from
(
select ...,
count(*)
over (partition by Week, Customer, product) as cnt
from table
) as dt
where cnt > 1
Edit:
After re-reading your description above Select might be not exactly what you want, because it will also return rows with a single type. Then switch to:
select *
from table
-- Teradata syntax to filter the result of an OLAP-function
-- (similar to HAVING after GROUP BY)
qualify -- at least two different types:
min(Customer_Type) over (partition by Week, Customer, product)
<> max(Customer_Type) over (partition by Week, Customer, product)

Related

Best approach to display all the users who have more than 1 purchases in a month in SQL

I have two tables in an Oracle Database, one of which is all the purchases done by all the customers over many years (purchase_logs). It has a unique purchase_id that is paired with a customer_id.The other table contains the user info of all the customers. Both have a common key of customer_id.
I want to display the user info of customers who have more than 1 unique item (NOT the item quantity) purchased in any month (i.e if A customer bought 4 unique items in february 2020 they would be valid as well as someone who bought 2 items in june). I was wondering what should my correct approach be and also how to correct execute that approach.
The two approaches that I can see are
Approach 1
Count the overall number of purchases done by all customers, filter the ones that are greater than 1 and then check if they any of them were done within a month.
Use this as a subquery in the where clause of the main query for retrieving the customer info for all the customer_id which match this condition.
This is what i've done so far,this retrieves the customer ids of all the customers who have more than 1 purchases in total. But I do not understand how to filter out all the purchases that did not occur in a single arbitrary month.
SELECT * FROM customer_details
WHERE customer_id IN (
SELECT cust_id from purchase_logs
group by cust_id
having count(*) >= 2);
Approach 2
Create a temporary table to Count the number of monthly purchases of a specific user_id then find the MAX() of the whole table and check if that MAX value is bigger than 1 or not. Then if it is provide it as true for the main query's where clause for the customer_info.
Approach 2 feels like the more logical option but I cannot seem to understand how to write the proper subquery for it as the command MAX(COUNT(customer_id)) from purchase_logs does not seem to be a valid query.
This is the DDL diagram.
This is the Sample Data of Purchase_logs
Customer_info
and Item_info
and the expected output for this sample data would be
It is certainly possible that there is a simpler approach that I am not seeing right now.
Would appreciate any suggestions and tips on this.
You need this query:
SELECT DISTINCT cust_id
FROM purchase_logs
GROUP BY cust_id, TO_CHAR(purchase_date, 'YYYY-MON')
HAVING COUNT(DISTINCT item_id) > 1;
to get all the cust_ids of the customers who have more than 1 unique item purchased in any month and you can use with the operator IN:
SELECT *
FROM customer_details
WHERE customer_id IN (
SELECT DISTINCT cust_id -- here DISTINCT may be removed as it does not make any difference when the result is used with IN
FROM purchase_logs
GROUP BY cust_id, TO_CHAR(purchase_date, 'YYYY-MON')
HAVING COUNT(DISTINCT item_id) > 1
);
One approach might be to try
with multiplepurchase as (
select customer_id,month(purchasedate),count(*) as order_count
from purchase_logs
group by customer_id,month(purchasedate)
having count(*)>=2)
select customer_id,username,usercategory
from mutiplepurchase a
left join userinfo b
on a.customer_id=b.customer_id
Expanding on #MT0 answer:
SELECT *
FROM customer_details CD
WHERE exists (
SELECT cust_id
FROM purchase_logs PL
where CD.customer_id = PL.customer_id
GROUP BY cust_id, item_id, to_char(purchase_date,'YYYYMM')
HAVING count(*) >= 2
);
I want to display the user info of customers who have more than 1 purchases in a single arbitrary month.
Just add a WHERE filter to your sub-query.
So assuming that you wanted the month of July 2021 and you had a purchase_date column (with a DATE or TIMESTAMP data type) in your purchase_logs table then you can use:
SELECT *
FROM customer_details
WHERE customer_id IN (
SELECT cust_id
FROM purchase_logs
WHERE DATE '2021-07-01' <= purchase_date
AND purchase_date < DATE '2021-08-01'
GROUP BY cust_id
HAVING count(*) >= 2
);
If you want the users where they have bought two-or-more items in any single calendar month then:
SELECT *
FROM customer_details c
WHERE EXISTS (
SELECT 1
FROM purchase_logs p
WHERE c.customer_id = p.cust_id
GROUP BY cust_id, TRUNC(purchase_date, 'MM')
HAVING count(*) >= 2
);

SQL Check is there different records in table

I want to check if there are any different records in my table. I have a sales table and in this table I have 2 records for same product so actually this salesman doesn't sale more than one product. I need a select query to check salesmen who sell more than 1 (different) product.
Table name SALES -- columns sales_id, salesman_name, product,name, quantity
Need only show salesman names.
Example data and expectation
You can use COUNT(DISTINCT <Field Name>) for what you want. (same for sqlserver, MySQL and Oracle)
Check this query:
SELECT salesman_name
,COUNT(DISTINCT product) ProductCount
FROM Sales
GROUP BY salesman_name
HAVING COUNT(DISTINCT product) >1
SELECT *
FROM SALES
WHERE salesman_name IN (
SELECT
salesman_name
FROM SALES
GROUP BY product
HAVING COUNT(*) > 1
)
Just use aggregation:
select salesman_name
from sales
group by salesman_name
having min(product) <> max(product);
If you want the original rows use exists:
select s.*
from sales s
where exists (select 1
from sales s2
where s2.salesman_name = s.salesman_name and
s2.product <> s.product
);
With an index on sales(salesman_name, product), this should be the fastest way to get the original rows.
SELECT
salesman_name,product,COUNT(*)
FROM
SALES
GROUP BY
salesman_name,product
HAVING
COUNT(*) > 1

select multiple records based on order by

i have a table with a bunch of customer IDs. in a customer table is also these IDs but each id can be on multiple records for the same customer. i want to select the most recently used record which i can get by doing order by <my_field> desc
say i have 100 customer IDs in this table and in the customers table there is 120 records with these IDs (some are duplicates). how can i apply my order by condition to only get the most recent matching records?
dbms is sql server 2000.
table is basically like this:
loc_nbr and cust_nbr are primary keys
a customer shops at location 1. they get assigned loc_nbr = 1 and cust_nbr = 1
then a customer_id of 1.
they shop again but this time at location 2. so they get assigned loc_nbr = 2 and cust_Nbr = 1. then the same customer_id of 1 based on their other attributes like name and address.
because they shopped at location 2 AFTER location 1, it will have a more recent rec_alt_ts value, which is the record i would want to retrieve.
You want to use the ROW_NUMBER() function with a Common Table Expression (CTE).
Here's a basic example. You should be able to use a similar query with your data.
;WITH TheLatest AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY group-by-fields ORDER BY sorting-fields) AS ItemCount
FROM TheTable
)
SELECT *
FROM TheLatest
WHERE ItemCount = 1
UPDATE: I just noticed that this was tagged with sql-server-2000. This will only work on SQL Server 2005 and later.
Since you didn't give real table and field names, this is just psuedo code for a solution.
select *
from customer_table t2
inner join location_table t1
on t1.some_key = t2.some_key
where t1.LocationKey = (select top 1 (LocationKey) as LatestLocationKey from location_table where cust_id = t1.cust_id order by some_field)
Use an aggregate function in the query to group by customer IDs:
SELECT cust_Nbr, MAX(rec_alt_ts) AS most_recent_transaction, other_fields
FROM tableName
GROUP BY cust_Nbr, other_fields
ORDER BY cust_Nbr DESC;
This assumes that rec_alt_ts increases every time, thus the max entry for that cust_Nbr would be the most recent entry.
By using time and date we can take out the recent detail for the customer.
use the column from where you take out the date and the time for the customer.
eg:
SQL> select ename , to_date(hiredate,'dd-mm-yyyy hh24:mi:ss') from emp order by to_date(hiredate,'dd-mm-yyyy hh24:mi:ss');

Product sales by month - SQL

I just created a small data warehouse with the following details.
Fact Table
Sales
Dimensions
Supplier
Products
Time (Range is one year)
Stores
I want to query which product has the max sales by month, I mean the output to be like
Month - Product Code - Num_Of_Items
JAN xxxx xxxxx
FEB xxxx xxxxx
I tried the following query
with product_sales as(
SELECT dd.month,
fs.p_id,
dp.title,
SUM(number_of_items) Num
FROM fact_sales fs
INNER JOIN dim_products dp
ON fs.p_id = dp.p_id
INNER JOIN dim_date dd
ON dd.date_id = fs.date_id
GROUP BY dd.month,
fs.p_id,
dp.title
)
select distinct month,movie_id,max(num)
from product_sales
group by movie_id,title, month;
Instead of max of 12 rows, I am having 132 records. I need guidance with this. Thanks.
There are a few things about your query that don't make sense, such as:
Where does movie_id come from?
What is from abc? Should it be from product_sales?
That said, if you need the maximum product sales by month and you need to include the product code (or movie ID or whatever), you need an analytical query. Yours would go something like this:
WITH product_sales AS (
SELECT
dd.month,
fs.p_id,
dp.title,
SUM(number_of_items) Num,
RANK() OVER (PARTITION BY dd.month ORDER BY SUM(number_of_items) DESC) NumRank
FROM fact_sales fs
INNER JOIN dim_products dp ON fs.p_id = dp.p_id
INNER JOIN dim_date dd ON dd.date_id = fs.date_id
GROUP BY dd.month, fs.p_id, dp.title
)
SELECT month, p_id, title, num
FROM product_sales
WHERE NumRank = 1
Note that if there's a tie for top sales in any month, this query will show all top sales for the month. In other words, if product codes AAAA and BBBB are tied for top sales in January, the query results will have a January row for both products.
If you want just one row per month even if there's a tie, use ROW_NUMBER instead of RANK(), but note that ROW_NUMBER will arbitrarily pick a winner unless you define a tie-breaker. For example, to have the lowest p_id be the tie-breaker, define the NumRank column like this:
ROW_NUMBER() OVER (
PARTITION BY dd.month
ORDER BY SUM(number_of_items) DESC, p_id
) NumRank
you can user MAX () KEEP (DENSE_RANK FIRST ORDER BY ) to select the movie_id with the max value of num
...
select
month,
MAX(movie_id) KEEP (DENSE_RANK FIRST order by num desc) as movie_id,
MAX(num)
from
abc
group by month
;

Find duplicate records in a table using SQL Server

I am validating a table which has a transaction level data of an eCommerce site and find the exact errors.
I want your help to find duplicate records in a 50 column table on SQL Server.
Suppose my data is:
OrderNo shoppername amountpayed city Item
1 Sam 10 A Iphone
1 Sam 10 A Iphone--->>Duplication to be detected
1 Sam 5 A Ipod
2 John 20 B Macbook
3 John 25 B Macbookair
4 Jack 5 A Ipod
Suppose I use the below query:
Select shoppername,count(*) as cnt
from dbo.sales
having count(*) > 1
group by shoppername
will return me
Sam 2
John 2
But I don't want to find duplicate just over 1 or 2 columns. I want to find the duplicate over all the columns together in my data. I want the result as:
1 Sam 10 A Iphone
with x as (select *,rn = row_number()
over(PARTITION BY OrderNo,item order by OrderNo)
from #temp1)
select * from x
where rn > 1
you can remove duplicates by replacing select statement by
delete x where rn > 1
SELECT OrderNo, shoppername, amountPayed, city, item, count(*) as cnt
FROM dbo.sales
GROUP BY OrderNo, shoppername, amountPayed, city, item
HAVING COUNT(*) > 1
SQL> SELECT JOB,COUNT(JOB) FROM EMP GROUP BY JOB;
JOB COUNT(JOB)
--------- ----------
ANALYST 2
CLERK 4
MANAGER 3
PRESIDENT 1
SALESMAN 4
Just add all fields to the query and remember to add them to Group By as well.
Select shoppername, a, b, amountpayed, item, count(*) as cnt
from dbo.sales
group by shoppername, a, b, amountpayed, item
having count(*) > 1
To get the list of multiple records use following command
select field1,field2,field3, count(*)
from table_name
group by field1,field2,field3
having count(*) > 1
Try this instead
SELECT MAX(shoppername), COUNT(*) AS cnt
FROM dbo.sales
GROUP BY CHECKSUM(*)
HAVING COUNT(*) > 1
Read about the CHECKSUM function first, as there can be duplicates.
Try this
with T1 AS
(
SELECT LASTNAME, COUNT(1) AS 'COUNT' FROM Employees GROUP BY LastName HAVING COUNT(1) > 1
)
SELECT E.*,T1.[COUNT] FROM Employees E INNER JOIN T1 ON T1.LastName = E.LastName
with x as (
select shoppername,count(shoppername)
from sales
having count(shoppername)>1
group by shoppername)
select t.* from x,win_gp_pin1510 t
where x.shoppername=t.shoppername
order by t.shoppername
First of all, I doubt that the result it not accurate? Seem like there are Three 'Sam' from the original table. But it is not critical to the question.
Then here we come for the question itself. Based on your table, the best way to show duplicate value is to use count(*) and Group by clause. The query would look like this
SELECT OrderNo, shoppername, amountPayed, city, item, count(*) as RepeatTimes FROM dbo.sales GROUP BY OrderNo, shoppername, amountPayed, city, item HAVING COUNT(*) > 1
The reason is that all columns together from your table uniquely identified each record, which means the records will be considered as duplicate only when all values from each column are exactly the same, also you want to show all fields for duplicate records, so the group by will not miss any column, otherwise yes because you can only select columns that participate in the 'group by' clause.
Now I would like to give you any example for With...Row_Number()Over(...), which is using table expression together with Row_Number function.
Suppose you have a nearly same table but with one extra column called Shipping Date, and the value may change even the rest are the same. Here it is:
OrderNo shoppername amountpayed city Item Shipping Date
1 Sam 10 A Iphone 2016-01-01
1 Sam 10 A Iphone 2016-02-02
1 Sam 5 A Ipod 2016-03-03
2 John 20 B Macbook 2016-04-04
3 John 25 B Macbookair 2016-05-05
4 Jack 5 A Ipod 2016-06-06
Notice that row# 2 is not a duplicate one if you still take all columns as a unit. But what if you want to treat them as duplicate as well in this case? You should use With...Row_Number()Over(...), and the query would look like this:
WITH TABLEEXPRESSION
AS
(SELECT *,ROW_NUMBER() OVER (PARTITION BY OrderNo, shoppername, amountPayed, city, item ORDER BY [Shipping Date] as Identifier) --if you consider the one with late shipping date as the duplicate
FROM dbo.sales)
SELECT * FROM TABLEEXPRESSION
WHERE Identifier !=1 --or use '>1'
The above query will give result together with Shipping Date, for example:
OrderNo shoppername amountpayed city Item Shipping Date Identifier
1 Sam 10 A Iphone 2016-02-02 2
Note this one is different from the one with 2016-01-01, and the reason why 2016-02-02 has been filtered out is PARTITION BY OrderNo, shoppername, amountPayed, city, item ORDER BY [Shipping Date] as Identifier, and Shipping Date is NOT one of the column that need to be took care of for duplicate records, which means the one with 2016-02-02 still could be a perfect result for your question.
Now summarize it little bit, using count(*) and Group by clause together is the best choice when you only want to show all columns from Group byclause as the result, otherwise you will miss the columns that do not participate in group by.
While For With...Row_Number()Over(...), it is suitable in every scenario that you want to find duplicate records, however, it is little bit complicated to write the query and little bit over engineered compared to the former one.
If your purpose is to delete duplicate records from table, you have to use the later WITH...ROW_NUMBER()OVER(...)...DELETE FROM...WHERE one.
Hope this helps!
You can use below methods to find the output
with Ctec AS
(
select *,Row_number() over(partition by name order by Name)Rnk
from Table_A
)
select Name from ctec
where rnk>1
select name from Table_A
group by name
having count(*)>1
Select *
from dbo.sales
group by shoppername
having(count(Item) > 1)
Select EventID,count() as cnt
from dbo.EventInstances
group by EventID
having count() > 1
The following is running code:
SELECT abnno, COUNT(abnno)
FROM tbl_Name
GROUP BY abnno
HAVING ( COUNT(abnno) > 1 )