How to get one row from duplicated rows? - sql

I have two tables, SVC_ServiceTicket and SVC_CustomersVehicle.
The table ServiceTicket has a column customerID, which is a foreign key to CustomersVehicle. So the customerID column in ServiceTicket can contain duplicate values.
When I do
select sst.ServiceTicketID, sst.CustomerID
from ServiceTicket sst, CustomersVehicle scv
where sst.CustomerID = scv.CV_ID
then it gives me duplicate customerID values. So my requirement is: if there are duplicate values of customerID, I want the latest customerID along with the ServiceTicketID that corresponds to it.
For example, in the screenshot below customerID 13 is repeated, so in this case I want the latest customerID together with its ServiceTicketID, i.e. the values 8008 and 13.
Please tell me how to do this.

Use the aggregate function MAX. I would also recommend using an explicit JOIN.
SELECT MAX(sst.ServiceTicketID) AS ServiceTicketID, sst.CustomerID
FROM ServiceTicket sst
JOIN CustomersVehicle scv ON sst.CustomerID = scv.CV_ID
GROUP BY sst.CustomerID
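If you also need more columns from the latest service ticket than just its ID, a window-function variant works as well. This is only a sketch, assuming an engine that supports ROW_NUMBER() (e.g. SQL Server) and that "latest" means the highest ServiceTicketID:
SELECT ServiceTicketID, CustomerID
FROM (
    SELECT sst.ServiceTicketID, sst.CustomerID,
           ROW_NUMBER() OVER (PARTITION BY sst.CustomerID
                              ORDER BY sst.ServiceTicketID DESC) AS rn
    FROM ServiceTicket sst
    JOIN CustomersVehicle scv ON sst.CustomerID = scv.CV_ID
) t
WHERE rn = 1;   -- keeps only the newest ticket per CustomerID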

SQL data cleaning SELECT DISTINCT from duplicate ID and return list of records. Scenario: Return unique IDs for first and latest instance

Dataset: customer_data
Table: customer_table (30 records)
Fields: customer_id, name
Datatype: customer_id = INTEGER, name = STRING
The problem or request: the customer_table contains 30 rows of customer data, but some rows are duplicates and I need to clean the data. I am using Google BigQuery for my SQL querying, and I want to query the customer_table in the customer_data dataset to return each unique customer_id along with the corresponding name.
If a duplicate customer_id exists with a different name, return the first instance of the record, discard the duplicate, and continue returning all unique customer_id and name values.
Alternately, if a duplicate customer_id exists with a different name, return the latest instance of the record, discard the duplicate, and continue returning all unique customer_id and name values.
My methods:
Identify the unique values using SELECT DISTINCT.
SELECT DISTINCT customer_id
FROM customer_data.customer_table
Result: 24 rows
SELECT DISTINCT name
FROM customer_data.customer_table
Result: 25 rows
After finding that the number of unique values for customer_id and for name do not match, I suspect one customer_id is associated with two different names.
Visualize which duplicate customer_id has two names:
SELECT DISTINCT customer_id, name
FROM customer_data.customer_table
ORDER BY customer_id ASC
Result: 25 rows
It appears there is one duplicate customer_id, and that customer_id has two different names.
Example:
customer_id  name
-----------  --------------
1890         Henry Fiction
1890         Arthur Stories
Return DISTINCT customer_id and name. If there are duplicates, return only the first instance, discard the duplicate, and continue returning unique customer_id and name.
SELECT DISTINCT customer_id, name
FROM
    (SELECT
        customer_id, name,
        ROW_NUMBER() OVER (PARTITION BY customer_id
                           ORDER BY customer_id ASC) AS row_num
    FROM
        customer_data.customer_table) subquery
WHERE
    subquery.row_num = 1
Result: 24 rows
I decided to try using ROW_NUMBER() in a subquery so that the query performs an inner task first: numbering the rows within each customer_id. The outer query then uses a WHERE clause to return the list of DISTINCT customer_id values and the matching name from the first instance each customer_id is recorded in the customer_table.
Excellent! I was able to write a query that returns each unique customer_id along with its name from the customer_table and, when a duplicate customer_id has a different name, keeps the first instance recorded in the customer_table.
Now, what if I wanted the query to create a list of unique customer_id and name values that, instead of selecting the first record when it encounters a duplicate customer_id, selects the latest record entry in the table? How should I approach this problem? What query method would you suggest?
Expected result: 24 rows
What I've tried:
SELECT DISTINCT customer_id, name
FROM
    (SELECT
        customer_id, name,
        ROW_NUMBER() OVER (PARTITION BY customer_id
                           ORDER BY customer_id ASC) AS row_num
    FROM
        customer_data.customer_table) subquery
WHERE
    subquery.row_num > 1
Result : 4 rows
Desired result: 24 rows
I tried changing the WHERE clause to subquery.row_num > 1 just to see what would change and whether the desired values would appear in my list of unique customer_id and name. Of the 4 rows produced by the query, only one row has the duplicate customer_id with a different name that I want, namely the latest duplicate customer_id recorded in the customer_table. Referring back to the example where
SELECT DISTINCT customer_id, name
FROM customer_data.customer_table
revealed:
customer_id  name
-----------  --------------
1890         Henry Fiction
1890         Arthur Stories
One of the rows for the duplicate customer_id, 1890, was recorded first in the table and the other was recorded later. The alternate request is to return a list of unique customer_id and name values where, if the query encounters a duplicate customer_id, it selects the latest record in the customer_table.
If you don't have a timestamp recording when each row was added, I'm afraid you won't be able to identify the latest record. Based on this post, BigQuery does not add a timestamp automatically. Is your table partitioned? If it is, you might be able to identify the latest record using the partitions.
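If the table did have a column recording when each row was loaded (say a created_at TIMESTAMP column, or the partitioning column mentioned above; created_at is hypothetical here), the ROW_NUMBER() approach above would only need its ORDER BY changed. A minimal sketch, assuming BigQuery and that hypothetical column:
SELECT customer_id, name
FROM
    (SELECT
        customer_id, name,
        ROW_NUMBER() OVER (PARTITION BY customer_id
                           ORDER BY created_at DESC) AS row_num  -- latest row gets row_num = 1
    FROM
        customer_data.customer_table) subquery
WHERE
    subquery.row_num = 1
Note that ordering by customer_id inside a partition that is already keyed on customer_id, as in the original queries, cannot distinguish first from latest; the ORDER BY column has to be one that actually differs between the duplicate rows.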

Insert values from another table and update original table with returning values

I'm new to PostgreSQL (and even Stackoverflow).
Say, I have two tables Order and Delivery:
Order
 id | product | address      | delivery_id
----+---------+--------------+-------------
  1 | apple   | mac street   | (null)
  3 | coffee  | java island  | (null)
  4 | window  | micro street | (null)

Delivery
 id | address
----+---------
Delivery.id and Order.id are auto-incrementing serial columns.
The table Delivery is currently empty.
I would like to move Order.address to Delivery.address and store the corresponding Delivery.id in Order.delivery_id to arrive at this state:
Order
 id | product | address      | delivery_id
----+---------+--------------+-------------
  1 | apple   | mac street   | 1
  3 | coffee  | java island  | 2
  4 | window  | micro street | 3

Delivery
 id | address
----+--------------
  1 | mac street
  2 | java island
  3 | micro street
I'll then remove Order.address.
I found a similar question for Oracle but failed to convert it to PostgreSQL:
How to insert values from one table into another and then update the original table?
I still think it should be possible to use a plain SQL statement with the RETURNING clause and a following INSERT in Postgres.
I tried this (as well as some variants):
WITH ids AS (
    INSERT INTO Delivery (address)
    SELECT address
    FROM Order
    RETURNING Delivery.id AS d_id, Order.id AS o_id
)
UPDATE Order
SET    delivery_id = d_id
FROM   ids
WHERE  Order.id = ids.o_id;
This latest attempt failed with:
ERROR: missing FROM-clause entry for table "Delivery" LINE 1: ...address Order RETURNING Delivery.id...
How to do this properly?
First of all, ORDER is a reserved word. Don't use it as an identifier. I'll assume orders as the table name instead.
WITH ids AS (
    INSERT INTO delivery (address)
    SELECT DISTINCT address
    FROM   orders
    ORDER  BY address  -- optional
    RETURNING *
)
UPDATE orders o
SET    delivery_id = i.id
FROM   ids i
WHERE  o.address = i.address;
You have to account for possible duplicates in orders.address. SELECT DISTINCT produces unique addresses.
In the outer UPDATE we can now join back on address because delivery.address is unique. You should probably keep it that way beyond this statement and add a UNIQUE constraint on the column.
Effectively, this results in a one-to-many relationship between delivery and orders: one row in delivery can have many corresponding rows in orders. Consider enforcing that by adding a FOREIGN KEY constraint accordingly.
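A minimal sketch of those two constraints, using the table and column names above (the constraint names are made up):
-- Keep addresses unique in delivery so the join back on address stays unambiguous
ALTER TABLE delivery ADD CONSTRAINT delivery_address_uni UNIQUE (address);

-- Enforce the one-to-many relationship between delivery and orders
ALTER TABLE orders
    ADD CONSTRAINT orders_delivery_id_fkey
    FOREIGN KEY (delivery_id) REFERENCES delivery (id);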
This statement enjoys the benefit of starting out on an empty delivery table. If delivery wasn't empty, we'd have to work with an UPSERT instead of the INSERT. See:
How to use RETURNING with ON CONFLICT in PostgreSQL?
Related:
Insert data in 3 tables at a time using Postgres
About the cause for the error message you got:
RETURNING causes error: missing FROM-clause entry for table
Use legal, lower-case identifiers exclusively, if you can. See:
Are PostgreSQL column names case-sensitive?
You can't return columns from the FROM relation in the RETURNING clause of the CTE query. You'll have to either manage this in a cursor, or add an order_id column to the Delivery table, something like this:
ALTER TABLE Delivery ADD COLUMN order_id INTEGER;
INSERT INTO Delivery (address, order_id)
SELECT address, id
FROM Order;

WITH q_ids AS (
    SELECT id, order_id
    FROM Delivery
)
UPDATE Order
SET    delivery_id = q_ids.id
FROM   q_ids
WHERE  Order.id = q_ids.order_id;

Distinct column with primary key column

Distinct column count differs when adding the primary key column in the Select query
The count distinct for supplier_payment_terms is 110, but when adding the PK column, the count changes to thousands.
select distinct supplier, unique_id from indirect_spend;
I expect the same record count of 110 when including the PK column in the select. The SELECT should return only one unique_id per supplier.
"I expect the same record count of 110 when including the PK column in the select"
Then you expect wrong. SELECT DISTINCT causes all rows appearing in the result to be distinct, i.e. no duplicate rows in the result.
Besides, imagine two rows of (supplier, unique_id): (1, 2) and (1, 5). You say you expect only one row in the result. How is the system going to determine which of the two rows to deliver?
You can use aggregation to get example primary keys:
select supplier, min(unique_id), max(unique_id)
from indirect_spend
group by supplier;
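If you want one full representative row per supplier rather than example keys, a window function is a common alternative. A sketch, assuming your database supports ROW_NUMBER(); ordering by unique_id is an arbitrary choice, any deterministic ORDER BY works:
SELECT supplier, unique_id
FROM (
    SELECT supplier, unique_id,
           ROW_NUMBER() OVER (PARTITION BY supplier
                              ORDER BY unique_id) AS rn   -- pick the smallest key per supplier
    FROM indirect_spend
) t
WHERE rn = 1;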

oracle unique constraint

I'm trying to insert distinct values from one table into another. My target table has a primary key, studentid, and when I load only the distinct id values from source into target the load is successful. When I try to load a bunch of columns from source into target, including student_id, I get a "unique constraint violated" error. There is only one constraint on target, which is the primary key on studentid.
my query looks like this (just an example)
insert into target(studentid, age, schoolyear)
select distinct id, age, 2012 from source
Why does the above query return an error, whereas the below query works perfectly fine?
insert into target(studentid)
select distinct id from source
Help me troubleshoot this.
Thanks for your time.
In your first query you are selecting the distinct combination of three columns, i.e.,
select distinct id, age, 2012 from source
not the distinct id alone. In that case duplicate ids are still possible.
For example, your query would happily accept data like this:
id  age
--  ---
 1   23
 1   24
 1   25
 2   23
 3   23
But in your second query you are selecting only distinct ids:
select distinct id from source
So this will return:
id
--
 1
 2
 3
In this case there is no way to get duplicates, so your insert into target will not fail.
If you really want to do a bulk insert with the constraint on target in place, then use an aggregate function:
select id, max(age), 2012 from source group by id
Or, if you don't want to lose any records from source to target, remove the constraint on target and then insert.
Hope this helps
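Putting that together with the INSERT from the question, a sketch of the aggregate approach (column names taken from the question; MAX(age) is just an arbitrary way to pick one age per id):
-- One row per id, so the primary key on target.studentid is no longer violated
INSERT INTO target (studentid, age, schoolyear)
SELECT id, MAX(age), 2012
FROM source
GROUP BY id;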

sql insert error

This is my Insert Statement
INSERT INTO ProductStore (ProductID, StoreID, CreatedOn)
(SELECT DISTINCT(ProductId), 1, GETDATE() FROM ProductCategory
WHERE EXISTS (SELECT StoreID, EntityID FROM EntityStore
WHERE EntityType = 'Category' AND ProductCategory.CategoryID = EntityStore.EntityID AND StoreID = 1))
I am trying to insert into table ProductStore all the products which are mapped to categories that are mapped to Store 1. The StoreID column can definitely have more than one row with the same value. And I am getting the following error: Violation of PRIMARY KEY constraint...
However, the Following query does work:
INSERT INTO ProductStore (ProductID, StoreID, CreatedOn)
VALUES (2293,1,GETDATE()),(2294,1,GETDATE())
So apparently the query is trying to insert the same ProductID more than once.
Can you see anything wrong with my query?
TIA
I don't see any part of that query that excludes records already in the table.
Take out the INSERT INTO statement and just run the SELECT - you should be able to spot pretty quickly where the duplicates are.
My guess is that you're slightly mistaken about what SELECT DISTINCT actually does, as suggested by the parentheses you put around ProductId. SELECT DISTINCT only eliminates rows where every column in the select list matches; it won't guarantee here that you get only one row per ProductId.
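For illustration only (not part of the original answer), here is a sketch that applies both ideas: deduplicate on ProductId alone via GROUP BY, and skip products already present in ProductStore. It assumes SQL Server syntax and the column names from the question:
INSERT INTO ProductStore (ProductID, StoreID, CreatedOn)
SELECT pc.ProductId, 1, GETDATE()
FROM ProductCategory pc
WHERE EXISTS (SELECT 1 FROM EntityStore es
              WHERE es.EntityType = 'Category'
                AND es.EntityID = pc.CategoryID
                AND es.StoreID = 1)
  AND NOT EXISTS (SELECT 1 FROM ProductStore ps        -- skip rows already in the table
                  WHERE ps.ProductID = pc.ProductId
                    AND ps.StoreID = 1)
GROUP BY pc.ProductId;    -- one row per product, even if it maps to several categories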
select distinct productid is selecting an ID that already exists in the table and is therefore in violation of your primary key constraint.
Why don't you create the primary key using Identity increment? In that case you don't need to worry about the ID itself, it will be generated for you.