Return only unique rows from a Table - sql

I have a table with 4 columns and 7 rows.
This table contains 1 customer with the same ID same LNAME and FNAME.
Also the table has 2 customers with the same ID, but different LNAME or FNAME.
That is the sales reps input error. Ideally my table should have only 2 rows (Row with ID_pk 3 and 7)
I need to have the following result-sets from the above table:
All unique rows by all the four columns (Row with ID_pk 3 and 7). (excluding case # 3 listed below)
All duplicates by all the four columns (Row with ID_pk 3 and 8).
All duplicates by Customer_ID but with not matching LNAME and/or FNAME (Row with ID_pk 1, 2, 4 and 5) (these rows have to be sent back to sales reps for validation.)

Doing stuff this like relies heavily on nested queries, the GROUP BY clause, and the COUNT function.
Part 1 - Unique rows
This query will show you all the rows where the customer ID has matching data.
SELECT Customer_ID, Customer_FNAME, Customer_LNAME FROM dbo.customers WHERE Customer_ID IN (
SELECT Customer_ID FROM (
SELECT DISTINCT Customer_ID, Customer_FNAME, Customer_LNAME FROM dbo.customers
GROUP BY Customer_ID, Customer_FNAME, Customer_LNAME
) Customers
GROUP BY Customer_ID
HAVING COUNT(Customer_ID) = 1
)
GROUP BY Customer_ID, Customer_FNAME, Customer_LNAME
Part 2 - Duplicates
This query will show you all the rows that have the same data entered more than once.
SELECT Customer_ID, Customer_FNAME, Customer_LNAME
FROM dbo.customers
GROUP BY Customer_ID, Customer_FNAME, Customer_LNAME
HAVING COUNT(Customer_ID) > 1
Part 3 - Mismatched Data
This query is basically the same as the first, just looking for a different COUNT value.
SELECT Customer_ID, Customer_FNAME, Customer_LNAME FROM dbo.customers WHERE Customer_ID IN (
SELECT Customer_ID FROM (
SELECT DISTINCT Customer_ID, Customer_FNAME, Customer_LNAME FROM dbo.customers
GROUP BY Customer_ID, Customer_FNAME, Customer_LNAME
) Customers
GROUP BY Customer_ID
HAVING COUNT(Customer_ID) > 1
)
GROUP BY Customer_ID, Customer_FNAME, Customer_LNAME

You may use a CTE (Common Table expression): https://msdn.microsoft.com/en-us/library/ms175972.aspx
;WITH checkDup AS (
SELECT Customer_ID, ROW_NUMBER() OVER (PARTITION BY Customer_ID ORDER BY Customer ID) AS 'RN'
FROM Table)
SELECT Customer_ID FROM checkDup
WHERE RN = 1;
Will give you your example output.
You may manipulate the CTE to get the other results you seek.

Related

How to select columns that aren't part of an aggregate query using HAVING SUM() in the WHERE and selecting only certain rows on db2

Using AS400 db2 for this.
I have a table of orders. From that table I have to:
Get all orders from a specified list of order IDs and type
Group by the user_id on those orders
Check to make sure the total order amount on the group is greater than $100
Return all orders that matched the group but the results won't be grouped, which includes order_id which is not part of the group
I got a bit stuck because the AS400 did not like that I was asking to select a field that wasn't part of the group, which I need.
I came up with this query, but it's slow.
-- Create a common temp table we can use in both places
WITH wantedOrders AS (
SELECT order_id FROM orders
WHERE
-- Only orders from the web
order_type = 'web'
-- And only orders that we want to get at this time
AND order_id IN
(
50,
20,
30
)
)
-- Our main select that gets all order information, even the non-grouped stuff
SELECT
t1.order_id,
t1.user_id,
t1.amount,
t2.total_amount,
t2.count
FROM orders AS t1
-- Join in the group data where we can do our query
JOIN (
SELECT
user_id,
SUM(amount) as total_amount,
COUNT(*) AS count
FROM
orders
-- Re use the temp table to get the order numbers
WHERE order_id IN (SELECT order_id FROM wantedOrders)
GROUP BY
user_id
HAVING SUM(amount)>100
) AS t2 ON t2.user_id=t1.user_id
-- Make sure we only use the order numbers
WHERE order_id IN (SELECT order_id FROM wantedOrders)
ORDER BY t1.user_id ASC;
What's the better way to write this query?
Try this:
WITH
wantedOrders (order_id) AS
(
VALUES 1, 2
)
, orders (order_id, user_id, amount) AS
(
VALUES
(1, 1, 50)
, (2, 1, 50)
, (1, 2, 60)
, (2, 2, 60)
, (3, 3, 200)
, (4, 3, 200)
)
-- Our main select that gets all order information, even the non-grouped stuff
SELECT *
FROM
(
SELECT
order_id,
user_id,
amount,
SUM (amount) OVER (PARTITION BY user_id) AS total_amount,
COUNT (*) OVER (PARTITION BY user_id) AS count
FROM orders t
WHERE EXISTS
(
SELECT 1
FROM wantedOrders w
WHERE w.order_id = t.order_id
)
) A
WHERE total_amount > 100
ORDER BY user_id ASC
ORDER_ID
USER_ID
AMOUNT
TOTAL_AMOUNT
COUNT
1
2
60
120
2
2
2
60
120
2
If order_id is the PK of the table. Then just add the columns you need to the wantedOrders query and use it as your "base" (instead of using orders and refiltering it. You should end up joining wantedOrders with itself.
You can do:
select t.*
from orders t
join (
select user_id
from orders t
where order_id in (50, 20, 30)
group by user_id
having sum(total_amount) > 100
) s on s.user_id = t.user_id
The first table orders as t will produce the data you want. It will be filtered by the second "table expression" s that preselects the groups according to your logic.

Finding customers that only bought items no one else bought

Below is a list of orders, is there a way to find the person_id of the customers, that has only bought products no one else has bought?
CREATE TABLE orders
AS
SELECT product_id, person_id
FROM ( VALUES
( 1 , 1 ),
( 2 , 1 ),
( 2 , 2 ),
( 3 , 3 ),
( 12, 6 ),
( 10, 3 )
) AS t(product_id, person_id);
The result would be the following table:
| person_id |
|-----------|
| 3 |
| 6 |
Do i have to find all the people who did buy items no one else bought and create a table that doesn't include those people?
You want all the products purchased by the person to be unique.
select person_id
from (select t.*,
min(person_id) over (partition by product_id) as minp,
max(person_id) over (partition by product_id) as maxp
from t
) t
group by person_id
having sum(case when minp <> maxp then 1 else 0 end) = 0;
You are probably thinking "Huh? What does this do?".
The subquery calculates the minimum person and maximum person on each product. If these are the same, than that one person is the only purchaser.
The having then checks that there are no non-single-purchaser products for a given person.
Perhaps a more intuitive phrasing of the logic would be:
select person_id
from (select t.*,
count(distinct person_id) over (partition by product_id) as numpersons
from t
) t
group by person_id
having max(numperson) = 1;
Alas, Postgres doesn't support COUNT(DISTINCT) as a window function.
The traditional self join with boolean aggregation
select o0.person_id
from
orders o0
left join
orders o1 on o0.product_id = o1.product_id and o0.person_id <> o1.person_id
group by o0.person_id
having bool_and(o1.product_id is null)
;
person_id
-----------
3
6
The inline view which is being joined gets all the product_ids which have only one person_id. Once all product_ids are found they will be joined to the original customers table to get the person_ids. This should solve your problem!!
SELECT person_id
FROM customers c1
INNER JOIN
(
SELECT product_id
FROM customers
GROUP BY product_id
HAVING COUNT(person_id ) = 1
) c2
ON c1.product_id = c2.product_id;
This is Gordon's logic using aggregates only:
SELECT person_id
FROM
(
SELECT product_id,
-- if count = 1 it's the only customer who bought this product
min(person_id) as person_id,
-- if the combination(person_id,product_id) is unique DISTINCT can be removed
count(distinct person_id) as cnt
FROM customers
GROUP BY product_id
) AS dt
GROUP BY person_id
HAVING max(cnt) = 1 -- only unique products
Here is another solution:
with unique_products as
(select product_id
from orders
group by product_id
having count(*) = 1)
select person_id
from orders
except
select person_id
from orders
where not exists
(select * from unique_products where unique_products.product_id = orders.product_id)
First all the identifier of products that appear in a single order are found. Then we subtract from all the persons (in the orders) those which do not have a order with a single product (i.e. all the persons that have at least ordered a product ordered by somebody else).

SQL Server : select only last record per customer from a join query

Assume I have these 3 tables :
The first 2 tables define customers of different types ,i.e second table has other columns which are not included in table 1 i just left them the same to save complexity.
The third table defines orders for both types of customers . Each customer has more than one orders
I want to select the last order for every customer, i.e the order with order_id 4 for customer 1 which was created on 23/12/2016 and the order with order_id 5 for customer 2 which was created on 26/12/2016
I tried something like this :
select *
from customertype1
left join order on order.customer_id = customertype1.customer_id
order by order_id desc;
But this gives me multiple records for every customer, as I have stated above I want only the last order for every customertype1.
If you want the last order for each customer, then you only need the orders table:
select o.*
from (select o.*,
row_number() over (partition by customer_id order by datecreated desc) as seqnum
from orders o
) o
where seqnum = 1;
If you want to include all customers, then you need to combine the two tables. Assuming they are mutually exclusive:
with c as (
select customer_id from customers1 union all
select customer_id from customers2
)
select o.*
from c left join
(select o.*,
row_number() over (partition by customer_id order by datecreated desc) as seqnum
from orders o
) o
on c.customer_id = o.customer_id and seqnum = 1;
A note about your data structure: You should have one table for all customers. You can then define a foreign key constraint between orders and customers. For the additional columns, you can have additional tables for the different types of customers.
Use ROW_NUMBER() and PARTITION BY.
ROW_NUMBER(): it will give sequence no to your each row
PARTITION BY: it will group your data by given column
When you use ROW_NUMBER() and PARTITION BY both together then first partition by group your records and then row_number give then sequence no by each group, so for each group you have start sequence from 1
Help Link: Example of ROW_NUMBER() and PARTITION BY
This is the general idea. You can work out the details.
with customers as
(select customer_id, customer_name
from table1
union
select customer_id, customer_name
from table2)
, lastOrder as
(select customer_id, max(order_id) maxOrderId
from orders
group by customer_id)
select *
from lastOrder join customers on lastOrder.Customer_id = customers.customer_id
join orders on order_id = maxOrderId

T-Sql find duplicate row values

I want to write a stored procedure.
In that stored procedure, I want to find duplicate row values from a table, and calculate sum operation on these rows to the same table.
Let's say, I have a CustomerSales table;
ID SalesRepresentative Customer Quantity
1 Michael CustA 55
2 Michael CustA 10
and I need to turn table to...
ID SalesRepresentative Customer Quantity
1 Michael CustA 65
2 Michael CustA 0
When I find SalesRepresentative and Customer duplicates at the same time, I want to sum all Quantity values of these rows and assign to the first row of a table, and others will be '0'.
Could you help me.
To aggregate duplicates into one row:
SELECT min(ID) AS ID, SalesRepresentative, Customer
,sum(Quantity) AS Quantity
FROM CustomerSales
GROUP BY SalesRepresentative, Customer
ORDER BY min(ID)
Or, if you actually want those extra rows with 0 as Quantity in the result:
SELECT ID, SalesRepresentative, Customer
,CASE
WHEN (count(*) OVER (PARTITION BY SalesRepresentative,Customer)) = 1
THEN Quantity
WHEN (row_number() OVER (PARTITION BY SalesRepresentative,Customer
ORDER BY ID)) = 1
THEN sum(Quantity) OVER (PARTITION BY SalesRepresentative,Customer)
ELSE 0
END AS Quantity
FROM CustomerSales
ORDER BY ID
This makes heavy use of window functions.
Alternative version without window functions:
SELECT min(ID) AS ID, SalesRepresentative, Customer, sum(Quantity) AS Quantity
FROM CustomerSales
GROUP BY SalesRepresentative, Customer
UNION ALL
SELECT ID, SalesRepresentative, Customer, 0 AS Quantity
FROM CustomerSales c
GROUP BY SalesRepresentative, Customer
LEFT JOIN (
SELECT min(ID) AS ID
FROM CustomerSales
GROUP BY SalesRepresentative, Customer
) x ON (x.ID = c.ID)
WHERE x.ID IS NULL
ORDER BY ID

PostgreSQL SELECT the last order per customer per date range

In PostgreSQL:
I have a Table that has 3 columns:
CustomerNum, OrderNum, OrderDate.
There may(or may not) be many orders for each customer per date range. What I am needing is the last OrderNum for each Customer that lies in the date range that is supplied.
What I have been doing is getting a ResultSet of the customers and querying each one separately, but this is taking too much time.
Is there any way of using a sub-select to select out the customers, then get the last OrderNum for each Customer?
On postgres you can also use the non-standard DISTINCT ON clause:
SELECT DISTINCT ON (CustomerNum) CustomerNum, OrderNum, OrderDate
FROM Orders
WHERE OrderDate BETWEEN 'yesterday' AND 'today'
ORDER BY CustomerNum, OrderDate DESC;
See http://www.postgresql.org/docs/current/static/sql-select.html#SQL-DISTINCT
select customernum, max(ordernum)
from table
where orderdate between '...' and '...'
group by customernum
that's all.
SELECT t1.CustomerNum, t1.OrderNum As LastOrderNum, t1.LastOrderDate
FROM table1 As t1
WHERE t1.OrderDate = (SELECT MAX(t2.OrderDate)
FROM table1 t2
WHERE t1.CustomerNum = t2.CustomerNum
AND t2.OrderDate BETWEEN date1 AND date2)
AND t1.OrderDate BETWEEN date1 AND date2
Not sure about your Customer table's structure or relationships, but this should work:
SELECT Customer.Num, (
SELECT OrderNum FROM Orders WHERE CustomerNum = Customer.Num AND OrderDate BETWEEN :start AND :end ORDER BY OrderNum DESC LIMIT 1
) AS LastOrderNum
FROM Customer
If by last order number you mean the largest order number then you can just use your select as the predicate for customer num, group the results and select the maximum:
SELECT CustomerNum, MAX(OrderNum) AS LastOrderNum
FROM Orders
WHERE
CustomerNum IN (SELECT CustomerNum FROM ...)
AND
OrderDate BETWEEN :first_date AND :last_date
GROUP BY CustomerNum
If the last order number isn't necessarily the largest order number then you'll need to either find the largest order date for each customer and join it together with the rest of the orders to find the corresponding number(s):
SELECT O.CustomerNum, O.OrderNum AS LastOrderNum
FROM
(SELECT CustomerNum, MAX(OrderDate) AS OrderDate
FROM Orders
WHERE
OrderDate BETWEEN :first_date AND :last_date
AND
CustomerNum IN (SELECT CustomerNum FROM ...)
GROUP BY CustomerNum
) AS CustLatest
INNER JOIN
Orders AS O USING (CustomerNum, OrderDate);
-- generate some data
DROP TABLE tmp.orders;
CREATE TABLE tmp.orders
( id INTEGER NOT NULL
, odate DATE NOT NULL
, payload VARCHAR
)
;
ALTER TABLE tmp.orders ADD PRIMARY KEY (id,odate);
INSERT INTO tmp.orders(id,odate,payload) VALUES
(1, '2011-10-04' , 'one' )
, (1, '2011-10-24' , 'two' )
, (1, '2011-10-25' , 'three' )
, (1, '2011-10-26' , 'four' )
, (2, '2011-10-23' , 'five' )
, (2, '2011-10-24' , 'six' )
;
-- CTE to the rescue ...
WITH sel AS (
SELECT * FROM tmp.orders
WHERE odate BETWEEN '2011-10-23' AND '2011-10-24'
)
SELECT * FROM sel s0
WHERE NOT EXISTS (
SELECT * FROM sel sx
WHERE sx.id = s0.id
AND sx.odate > s0.odate
)
;
result:
DROP TABLE
CREATE TABLE
NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "orders_pkey" for table "orders"
ALTER TABLE
INSERT 0 6
id | odate | payload
----+------------+---------
1 | 2011-10-24 | two
2 | 2011-10-24 | six
(2 rows)