sql deleting duplicate row - sql

I have a table in SQL
Id owner_id amount
1 100 1000
2 101 2000
3 100 3000
4 104 800
5 100 1200
I want only one row per owner_id: owner_id 100 should not appear multiple times, but I still need the total amount for owner_id 100, i.e. the amounts should be added together (1000 + 3000 + 1200) when the duplicate owner_id rows are removed. How can I do this?
One more issue: owner_id comes from another table. How do I add a join to get the owner's name from that table?

SELECT
owner_id,
SUM(amount) total_amount
FROM
table
GROUP BY
owner_id

Try this:
-- Accumulate the total amount per owner on every row, so the cleanup loses no data
UPDATE t SET amount = x.sumAmount
FROM table t
JOIN (SELECT owner_id, SUM(amount) sumAmount
FROM table
GROUP BY owner_id) x ON x.owner_id = t.owner_id;
-- Delete duplicated data, keeping the row with the lowest Id per owner
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY owner_id, amount ORDER BY Id) rn
FROM table)
DELETE FROM CTE WHERE rn <> 1
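The same "accumulate, then delete" idea can be sketched in a runnable form. This is only an illustration under assumptions: it uses SQLite through Python's sqlite3 module rather than the dialect above, the table name payments and the staging table totals are made up for the example, and the staging table replaces the UPDATE ... FROM / DELETE-through-CTE syntax, which not every database supports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 'payments' is a hypothetical name for the table in the question
    CREATE TABLE payments (id INTEGER PRIMARY KEY, owner_id INTEGER, amount INTEGER);
    INSERT INTO payments VALUES
        (1, 100, 1000), (2, 101, 2000), (3, 100, 3000), (4, 104, 800), (5, 100, 1200);

    -- Step 1: work out, per owner, which row to keep and the summed amount
    CREATE TEMP TABLE totals AS
        SELECT MIN(id) AS keep_id, owner_id, SUM(amount) AS total
        FROM payments
        GROUP BY owner_id;

    -- Step 2: delete every row that is not the kept row for its owner
    DELETE FROM payments WHERE id NOT IN (SELECT keep_id FROM totals);

    -- Step 3: write the summed amount onto the surviving row
    UPDATE payments
    SET amount = (SELECT total FROM totals WHERE totals.keep_id = payments.id);
""")

remaining = conn.execute(
    "SELECT id, owner_id, amount FROM payments ORDER BY id"
).fetchall()
print(remaining)
```

Computing the totals before deleting anything is the important part: once the duplicate rows are gone, their amounts can no longer be summed.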

Please use the query below:
select owner_id, sum(amount) from table_name group by owner_id;

This works with the SUM function.
SUM adds up the values of the column you pass to it, here: amount.
Be aware that when you use an aggregate function, every other selected column must either be inside an aggregate function or be listed in GROUP BY.
SELECT
owner_id,
SUM(amount) total_amount
FROM
table
GROUP BY
owner_id
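The question also asks how to pull the owner's name from another table. A minimal sketch of that join follows, run on SQLite through Python's sqlite3 module. The table names payments and owners, the name column, and the sample names are all assumptions, since the question does not show the second table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 'payments' stands in for the table in the question
    CREATE TABLE payments (id INTEGER PRIMARY KEY, owner_id INTEGER, amount INTEGER);
    -- 'owners' and its 'name' column are hypothetical; adjust to the real lookup table
    CREATE TABLE owners (owner_id INTEGER PRIMARY KEY, name TEXT);

    INSERT INTO payments VALUES
        (1, 100, 1000), (2, 101, 2000), (3, 100, 3000), (4, 104, 800), (5, 100, 1200);
    INSERT INTO owners VALUES (100, 'Alice'), (101, 'Bob'), (104, 'Carol');
""")

# Join to the lookup table, then group by both the id and the name,
# because every selected non-aggregated column must appear in GROUP BY.
totals = conn.execute("""
    SELECT p.owner_id, o.name, SUM(p.amount) AS total_amount
    FROM payments p
    JOIN owners o ON o.owner_id = p.owner_id
    GROUP BY p.owner_id, o.name
    ORDER BY p.owner_id
""").fetchall()

for row in totals:
    print(row)
```

If some owner_id values might be missing from the owners table, a LEFT JOIN would keep those totals with a NULL name instead of dropping them.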

Related

How to enforce uniqueness in postgresql per row for a specific column

I have the following table (stripped down for demonstration)
products
- id
- part_number
- group_id
I want to be able to query against products and only return a single row per group_id (whichever is noticed first in the query is fine). All rows with group_id = null return as well.
Example:
ID part_number group_id
2314 ABB19 1
4543 GFH54 1
3454 GHT56 2
3657 QWT56 2
7689 GIT56 2
3465 HG567 null
5675 FG345 null
I would want to query against this table and get the following results:
ID part_number group_id
2314 ABB19 1
3454 GHT56 2
3465 HG567 null
5675 FG345 null
I have tried using GROUP BY but wasn't able to get it working without selecting group_id and grouping on it, which just returned a list of unique group_ids. Given the complexity of my real products table, it's important that I can keep using select * rather than naming each column I need to return.
row_number() and filtering might be more efficient than distinct on and union all, which incur two table scans.
select *
from (
select p.*,
row_number() over(partition by group_id order by id) rn
from products p
) p
where rn = 1 or group_id is null
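To see what the row_number() filter returns on the sample data from the question, here is a small sketch using SQLite through Python's sqlite3 module. It assumes an SQLite build with window-function support (3.25 or later); the final ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, part_number TEXT, group_id INTEGER);
    INSERT INTO products VALUES
        (2314, 'ABB19', 1), (4543, 'GFH54', 1),
        (3454, 'GHT56', 2), (3657, 'QWT56', 2), (7689, 'GIT56', 2),
        (3465, 'HG567', NULL), (5675, 'FG345', NULL);
""")

# Number the rows inside each group_id; keep the first of each group,
# plus every row whose group_id is NULL (all NULLs share one partition).
rows = conn.execute("""
    SELECT id, part_number, group_id
    FROM (
        SELECT p.*,
               ROW_NUMBER() OVER (PARTITION BY group_id ORDER BY id) AS rn
        FROM products p
    ) AS ranked
    WHERE rn = 1 OR group_id IS NULL
    ORDER BY group_id IS NULL, group_id, id
""").fetchall()

for row in rows:
    print(row)
```

The OR group_id IS NULL condition matters: without it only one of the NULL rows would survive the rn = 1 filter.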
I was able to solve this with a combination of DISTINCT ON and a UNION
SELECT DISTINCT ON (group_id) * from products
WHERE group_id IS NOT NULL
UNION
SELECT * FROM products
WHERE group_id IS NULL

PostgreSQL: Remove rows from a table using id's

I have a bill table with id as the pk and a billno column which I should remove duplicates from
total rows (62924)
select count(billno) from bill
unique billno (59704), so need to remove 3220 rows
select count(distinct billno) from bill
query to get the duplicates (3220)
select count(*) from bill
WHERE bill.billno IN (SELECT billno
FROM bill
GROUP BY billno HAVING COUNT(*) > 1)
AND bill.company_code like '1'
However when I remove duplicates by id, the total does not tally :-
count after remove duplicated rows (61385) => SHOULD GET 59704 here..
select count (*) from bill
where bill.id not in
(
select id from bill
WHERE bill.billno IN (SELECT billno
FROM bill
GROUP BY billno HAVING COUNT(*) > 1)
AND bill.company_code like '1'
)
Can I know why this is happening?
You seem to be removing all duplicate rows. If you want a result set with no duplicates, use distinct on:
select distinct on (billno) b.*
from bill b
order by billno, id desc;
This returns the row for each bill that has the highest id.
I'm not sure why your query filters on the company. The question mentions nothing about that filtering.
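The difference between removing every copy of a duplicated billno and keeping one row per billno can be illustrated with a tiny made-up data set. The sketch below uses SQLite through Python's sqlite3 module; since DISTINCT ON is PostgreSQL-specific, it uses MAX(id) per billno as a stand-in that keeps the same row (the highest id).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bill (id INTEGER PRIMARY KEY, billno TEXT);
    -- made-up sample: 'A' appears twice, 'C' three times, 'B' once
    INSERT INTO bill VALUES (1,'A'), (2,'A'), (3,'B'), (4,'C'), (5,'C'), (6,'C');
""")

duplicated = "SELECT billno FROM bill GROUP BY billno HAVING COUNT(*) > 1"

# Excluding every row whose billno is duplicated drops ALL copies,
# so duplicated bill numbers vanish completely.
all_copies_removed = conn.execute(
    f"SELECT COUNT(*) FROM bill WHERE billno NOT IN ({duplicated})"
).fetchone()[0]

# Keeping the highest id per billno leaves exactly one row per bill number.
one_per_billno = conn.execute(
    "SELECT COUNT(*) FROM bill WHERE id IN (SELECT MAX(id) FROM bill GROUP BY billno)"
).fetchone()[0]

distinct_billnos = conn.execute("SELECT COUNT(DISTINCT billno) FROM bill").fetchone()[0]
print(all_copies_removed, one_per_billno, distinct_billnos)
```

Only the second count matches the number of distinct bill numbers, which is the total the question expected to see.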

how can I select rows that column does NOT have more than 1 value?

I am very new to SQL and I am wondering how to solve this issue. For example, my table looks as follows:
As you can see in the table, item_id 1 appears in both city_id 1 and 2, and so does item_id 4, but I want to get all the items that appear in only one city_id.
In this example, these would be item_id 2 (appearing only in city_id 2) and item_id 3 (appearing in city_id 1).
Use aggregation on item_id and count distinct values of city_id. The having clause can be used to filter on aggregates.
select item_id from mytable group by item_id having count(distinct city_id) = 1
You can use the following query:
SELECT item_id
FROM table_name
GROUP BY item_id
HAVING COUNT(DISTINCT city_id) = 1
In case you want to see the city_id too, you can use this query:
SELECT item_id, MIN(city_id) AS city_id
FROM example
GROUP BY item_id
HAVING COUNT(DISTINCT city_id) = 1
Since there is only one city_id you can use MIN or MAX to get the id.
You want all the id where they have only one distinct city:
SELECT item_id
FROM table
GROUP BY item_id
HAVING count(distinct city_id) = 1
It works by counting all the different values that city_id has for the same item_id. For those item ids where they repeat a lot, but the city_id is always the same the count of unique values in the city id is 1, and we can look for these using a HAVING clause. "Having" is like a where clause that runs after a GROUP BY operation is completed. It is the conceptual equivalent of this:
SELECT item_id
FROM
(
SELECT item_id, count(distinct city_id) as cdci
FROM table
GROUP BY item_id
) x
WHERE cdci = 1
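A quick way to convince yourself that the HAVING form and the subquery form are equivalent is to run both on the data described in the question. The sketch below does this with SQLite through Python's sqlite3 module; the table name item_city is an assumption, and an extra repeated (3, 1) row is added to show that repeats within the same city do not matter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 'item_city' is a hypothetical name for the table in the question
    CREATE TABLE item_city (item_id INTEGER, city_id INTEGER);
    INSERT INTO item_city VALUES
        (1, 1), (1, 2), (2, 2), (3, 1), (3, 1), (4, 1), (4, 2);
""")

# Filter on the aggregate directly with HAVING
with_having = conn.execute("""
    SELECT item_id
    FROM item_city
    GROUP BY item_id
    HAVING COUNT(DISTINCT city_id) = 1
    ORDER BY item_id
""").fetchall()

# Same logic: aggregate in a subquery, then filter with an ordinary WHERE
with_subquery = conn.execute("""
    SELECT item_id
    FROM (
        SELECT item_id, COUNT(DISTINCT city_id) AS cdci
        FROM item_city
        GROUP BY item_id
    ) AS x
    WHERE cdci = 1
    ORDER BY item_id
""").fetchall()

print(with_having, with_subquery)
```

Both queries return items 2 and 3; item 3 still qualifies even though it has two rows, because both rows are in the same city.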
If you want the city id too you can either get the MAX city (because in this case there is only one city so it's safe to do):
SELECT item_id, MAX(city_id) as city_id
FROM table
GROUP BY item_id
HAVING count(distinct city_id) = 1
or you could join this query back to the item table as a subquery:
SELECT t.*
FROM
(
SELECT item_id
FROM table
GROUP BY item_id
HAVING count(distinct city_id) = 1
) x
INNER JOIN
table t
ON x.item_id = t.item_id
This technique is the more general process for performing a group by that finds some particular set of rows, then bringing in the rest of the data from those rows. You can't always stick every other column you want in a MAX because it will mix row data up, and you can't put the extra columns in your group by because that will subdivide what you're grouping on, giving the wrong results. Doing the group as a subquery and joining it back is a typical way to get all the row data when you have to group it to find which rows are interesting.
In your case this form of query will bring back all the duplicated rows (whereas the group by/max won't). If you don't want the duplicate rows you can make the top line SELECT DISTINCT t.*, but don't make a habit of slapping DISTINCT in to get rid of duplicated rows; if your tables don't have duplicates to start with but you suddenly get duplicated rows after writing a JOIN, look up what a Cartesian product is in database queries and how to prevent it.
You just need a group by on item id with having
Select item_id from table group by
item_id having count(distinct city_id)
=1
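Also, if you want one row per item_id together with its city, compute the rank in a subquery, because a window-function alias cannot be referenced in WHERE at the same query level:
Select item_id, city_id from
(Select item_id, city_id, rank()
over(partition by item_id order by city_id)
rn
From table) x where rn=1;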

Show duplicate rows(all columns of that row) where all columns are duplicate except one column

In below table, I need to select duplicate records where all columns are duplicate except Customer Type and Price for a particular week.
For e.g
Week Customer Product Customer Type Price
1 Alex Cycle Consumer 100
1 Alex Cycle Reseller 101
2 John Motor Consumer 200
3 John Motor Consumer 200
3 John Motor Reseller 201
I am using the query below, but it doesn't show me both customer types; it just shows the count for each combination.
select Week, Customer, product, count(distinct Customer Type)
from table
group by Week, Customer, product
having count(distinct Customer Type) > 1
I would like to see below result, that shows me duplicate values and not just the count(*) of duplicate row. I am trying to see customers assigned to multiple customer types in a particular week for a product and at the same time show me all columns. It doesn't matter if the price is different.
Week Customer Product Customer Type Price
1 Alex Cycle Consumer 100
1 Alex Cycle Reseller 101
3 John Motor Consumer 200
3 John Motor Reseller 201
Thanks
Shaki
WITH CustomerDistribution_CTE (WeekC ,CustomerC, ProductC)
AS
(
select Week, Customer, product
from Your_Table_Name group by Week, Customer,
product having count(distinct CustomerType) > 1
)
SELECT Y.*
FROM CustomerDistribution_CTE C
inner join Your_Table_Name Y on C.WeekC =Y.Week
and C.CustomerC =Y.Customer and C.productC =Y.product
Note: Please replace "Your_Table_Name" with the exact table name and try it.
One way to achieve this, using generic SQL, is to use a "derived table" like this:
select x.*
from tablex x
inner join (
select Week, Customer, Product
from tablex
group by Week, Customer, Product
having count(*) > 1
) d on x.Week = d.Week and x.Customer = d.Customer and x.Product = d.Product
You can do that by using DISTINCT, like this:
select DISTINCT Customer,Product,Customer_Type,Price from Your_Table_Name
This looks for DISTINCT combinations of those columns.
Note: This query is for SQL Server.
From the expected result that you have pasted, it looks like you are not concerned about the week.
If you have an ID column (incremental PK), it would be much simpler, like below:
select * from table where ID not in
(select max(ID) from table group by Customer, Product, CustomerType having count(*) > 1 )
This is tested on MySQL. Do you have an ID column?
In case you don't have an ID column, try the below:
select max(week) week, Customer, Product, CustomerType, max(price) from device group by Customer, Product, CustomerType;
I have not verified this one.
This will return your expected result set:
select *
from table
-- Teradata syntax to filter the result of an OLAP-function
-- (similar to HAVING after GROUP BY)
qualify
count(*)
over (partition by Week, Customer, product) > 1
For other DBMSes you will need to nest your query:
select *
from
(
select ...,
count(*)
over (partition by Week, Customer, product) as cnt
from table
) as dt
where cnt > 1
Edit:
After re-reading your description, the SELECT above might not be exactly what you want, because it will also return groups that contain only a single customer type. In that case, switch to:
select *
from table
-- Teradata syntax to filter the result of an OLAP-function
-- (similar to HAVING after GROUP BY)
qualify -- at least two different types:
min(Customer_Type) over (partition by Week, Customer, product)
<> max(Customer_Type) over (partition by Week, Customer, product)
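For databases without QUALIFY, the min/max comparison can be moved into a nested query. The following sketch runs that variant on the sample rows from the question, using SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). The table name sales and the snake_case column names are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 'sales' and the column names are hypothetical stand-ins for the real table
    CREATE TABLE sales (week INTEGER, customer TEXT, product TEXT,
                        customer_type TEXT, price INTEGER);
    INSERT INTO sales VALUES
        (1, 'Alex', 'Cycle', 'Consumer', 100),
        (1, 'Alex', 'Cycle', 'Reseller', 101),
        (2, 'John', 'Motor', 'Consumer', 200),
        (3, 'John', 'Motor', 'Consumer', 200),
        (3, 'John', 'Motor', 'Reseller', 201);
""")

# A group has at least two different customer types exactly when
# its smallest and largest customer_type values differ.
rows = conn.execute("""
    SELECT week, customer, product, customer_type, price
    FROM (
        SELECT s.*,
               MIN(customer_type) OVER (PARTITION BY week, customer, product) AS min_type,
               MAX(customer_type) OVER (PARTITION BY week, customer, product) AS max_type
        FROM sales s
    ) AS dt
    WHERE min_type <> max_type
    ORDER BY week, customer_type
""").fetchall()

for row in rows:
    print(row)
```

Week 2 is dropped because John has only one customer type there, which matches the expected result in the question.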

select multiple records based on order by

I have a table with a bunch of customer IDs. These IDs are also in a customer table, but each ID can be on multiple records for the same customer. I want to select the most recently used record, which I can get by doing order by <my_field> desc.
Say I have 100 customer IDs in this table, and in the customers table there are 120 records with these IDs (some are duplicates). How can I apply my order by condition to get only the most recent matching records?
dbms is sql server 2000.
table is basically like this:
loc_nbr and cust_nbr are primary keys
a customer shops at location 1. they get assigned loc_nbr = 1 and cust_nbr = 1
then a customer_id of 1.
they shop again but this time at location 2. so they get assigned loc_nbr = 2 and cust_Nbr = 1. then the same customer_id of 1 based on their other attributes like name and address.
because they shopped at location 2 AFTER location 1, it will have a more recent rec_alt_ts value, which is the record i would want to retrieve.
You want to use the ROW_NUMBER() function with a Common Table Expression (CTE).
Here's a basic example. You should be able to use a similar query with your data.
;WITH TheLatest AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY group-by-fields ORDER BY sorting-fields) AS ItemCount
FROM TheTable
)
SELECT *
FROM TheLatest
WHERE ItemCount = 1
UPDATE: I just noticed that this was tagged with sql-server-2000. This will only work on SQL Server 2005 and later.
Since you didn't give real table and field names, this is just pseudocode for a solution.
select *
from customer_table t2
inner join location_table t1
on t1.some_key = t2.some_key
where t1.LocationKey = (select top 1 (LocationKey) as LatestLocationKey from location_table where cust_id = t1.cust_id order by some_field)
Use an aggregate function in the query to group by customer IDs:
SELECT cust_Nbr, MAX(rec_alt_ts) AS most_recent_transaction, other_fields
FROM tableName
GROUP BY cust_Nbr, other_fields
ORDER BY cust_Nbr DESC;
This assumes that rec_alt_ts increases every time, thus the max entry for that cust_Nbr would be the most recent entry.
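If the other columns of the most recent row are needed as well, one option that avoids window functions (and so does not depend on ROW_NUMBER() being available) is to join the table back to its own MAX(rec_alt_ts) per customer. The sketch below demonstrates the idea with SQLite through Python's sqlite3 module. The table name customers and the timestamps are made-up sample data; only loc_nbr, cust_nbr, customer_id and rec_alt_ts come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 'customers' is a hypothetical table name; timestamps are invented sample values
    CREATE TABLE customers (
        loc_nbr INTEGER, cust_nbr INTEGER, customer_id INTEGER, rec_alt_ts TEXT,
        PRIMARY KEY (loc_nbr, cust_nbr)
    );
    INSERT INTO customers VALUES
        (1, 1, 1, '2009-01-05 10:00:00'),
        (2, 1, 1, '2009-03-17 15:30:00'),
        (1, 2, 2, '2009-02-01 09:00:00');
""")

# Find the latest timestamp per customer_id, then join back to fetch the full row.
rows = conn.execute("""
    SELECT c.loc_nbr, c.cust_nbr, c.customer_id, c.rec_alt_ts
    FROM customers c
    JOIN (
        SELECT customer_id, MAX(rec_alt_ts) AS latest_ts
        FROM customers
        GROUP BY customer_id
    ) AS m
      ON m.customer_id = c.customer_id
     AND m.latest_ts = c.rec_alt_ts
    ORDER BY c.customer_id
""").fetchall()

for row in rows:
    print(row)
```

One caveat of this approach: if two records for the same customer share the same latest timestamp, both are returned.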
Using the date and time, we can pick out the most recent record for the customer.
Order by the column that holds the customer's date and time.
E.g.:
SQL> select ename , to_date(hiredate,'dd-mm-yyyy hh24:mi:ss') from emp order by to_date(hiredate,'dd-mm-yyyy hh24:mi:ss');