SQL query to delete duplicate rows - sql

I have three tables.
Diagnose, Patient and PatientDiagnose
The tables look like this
Diagnose:
uuid,text,date
Patient:
uuid,name
PatientDiagnose:
patientuuid,diagnoseuuid
One patient can of course have multiple diagnoses, and two patients can have the same diagnosis, but in that case the two diagnoses are stored as separate rows in Diagnose with different uuids. Each patient is therefore represented in PatientDiagnose by their patient uuid together with their own unique diagnose uuid.
Now I have found something I would like to fix in my DB. I would like to delete the diagnoses that are considered duplicates for a patient. Diagnoses are duplicates if they belong to the same patient and have the same text within the same year (using a year function on date?). Exactly one diagnosis from each set of duplicates should be left intact.
I want to remove those duplicates because I only want one diagnosis per patient with the same text per year.
How can I do that in SQL?
Tommy

You say that a diagnosis shall refer to exactly one patient. Your database, however, doesn't guarantee this, so you should fix that issue first. That would leave you with only two tables:
Patient: patientuuid, name
Diagnose: diagnoseuuid, text, date, patientuuid
Once you've converted your tables thus, you can easily do the cleanup:
delete from diagnose
where exists
(
select *
from diagnose other
where other.diagnoseuuid < diagnose.diagnoseuuid
and other.text = diagnose.text
and year(other.date) = year(diagnose.date)
and other.patientuuid = diagnose.patientuuid
);
You haven't mentioned which DBMS you are using. It may not feature the YEAR function. In that case try EXTRACT(YEAR FROM date) or look up date functions in your manual.
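To try the cleanup on throwaway data first, here is a minimal sketch that runs the same DELETE against an in-memory SQLite database from Python. The sample rows are made up for illustration, and SQLite has no YEAR function, so strftime('%Y', date) stands in for it here; this assumes the dates are stored as ISO 'YYYY-MM-DD' text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient  (patientuuid TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE diagnose (diagnoseuuid TEXT PRIMARY KEY, text TEXT,
                           date TEXT, patientuuid TEXT REFERENCES patient);
    -- Hypothetical sample data, not taken from the question.
    INSERT INTO patient VALUES ('p1', 'Ann'), ('p2', 'Bob');
    INSERT INTO diagnose VALUES
        ('d1', 'flu',  '2010-01-10', 'p1'),
        ('d2', 'flu',  '2010-11-02', 'p1'),  -- same patient, text and year as d1
        ('d3', 'flu',  '2011-03-05', 'p1'),  -- different year
        ('d4', 'flu',  '2010-06-01', 'p2'),  -- different patient
        ('d5', 'cold', '2010-02-01', 'p1');  -- different text
""")

# Delete every row for which a "smaller" twin exists, so the row with the
# lowest diagnoseuuid in each group of duplicates survives.
conn.execute("""
    DELETE FROM diagnose
    WHERE EXISTS (
        SELECT 1
        FROM diagnose AS other
        WHERE other.diagnoseuuid < diagnose.diagnoseuuid
          AND other.text = diagnose.text
          AND strftime('%Y', other.date) = strftime('%Y', diagnose.date)
          AND other.patientuuid = diagnose.patientuuid
    )
""")

remaining = [row[0] for row in conn.execute(
    "SELECT diagnoseuuid FROM diagnose ORDER BY diagnoseuuid")]
print(remaining)
```

Only d2 is removed in this sample; which row of a duplicate group survives is decided purely by the uuid comparison, so swap in a date comparison if you would rather keep the earliest or latest diagnosis.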

Related

How to understand this query?

SELECT DISTINCT
...
...
...
FROM Reviews Rev
INNER JOIN Reviews SubRev ON Subrev.W_ID=Rev.ID
WHERE Rev.Status='Approved'
This is a small part of a long query that I've been trying to understand for a day now. What is happening with the join? The Reviews table appears to be joined with itself under different aliases. Why is this done? What does it achieve? Also, the ID field of the Reviews table is null for entries that are nevertheless selected and returned. This is correct, but I don't understand how that can happen if the W_ID field is not null.
It allows you to join one row from the table to a different row in the table.
I've both seen this done and used it myself in cases where there is a relationship between rows of the same table.
Real-world examples:
An old version of a record and a newer version.
Some sort of hierarchical relationship (e.g. if the table contains records of people, you can record that someone is a parent of someone else).
There are probably plenty of other possible use cases, too.
SQL allows you to create a foreign key that relates two different columns in the same table.
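The parent/child case can be sketched like this, using an in-memory SQLite database from Python. The person table, its parent_id column and the names are all hypothetical and only mirror the shape of the Reviews query (one table, two aliases, one column pointing at another row's ID).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table: parent_id refers to another row of the same table.
    CREATE TABLE person (
        id        INTEGER PRIMARY KEY,
        name      TEXT,
        parent_id INTEGER REFERENCES person(id)
    );
    INSERT INTO person VALUES (1, 'Alice', NULL), (2, 'Bob', 1), (3, 'Carol', 1);
""")

# The same table appears twice under different aliases: once in the role of
# the parent row and once in the role of the child row.
rows = conn.execute("""
    SELECT child.name, parent.name
    FROM person AS parent
    INNER JOIN person AS child ON child.parent_id = parent.id
    ORDER BY child.name
""").fetchall()
print(rows)
```

Alice has no parent, so she never shows up on the child side of the inner join, but she does appear as the parent of both other rows.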

Multiple Columns but only one join

I have searched through quite a few answers on this but I can't seem to find anything specific to this situation. Apologies if I have overlooked this.
We have a calendar system that we are re-coding to log events, holidays, absent days, MOTs, inspections, etc.
Initially our calendar focused on people, so we had a join from the person table to the calendarDay, but now other tables in the database also need to have days assigned to them.
My plan was either to have a junction table for each of the three tables that need joining (Person, Job and Vehicle), or to have a single Assignment junction table with Person_ID, Job_ID and Vehicle_ID foreign key columns:
CalendarDayAssignment
ID CalendarDayID PersonID JobID VehicleID
So this table would contain a CalendarDay_ID and either a Person_ID, Job_ID or Vehicle_ID, depending on which table has a day assigned to it, leaving the other two columns NULL. (These columns could actually be moved directly to the CalendarDay table, as days will not be shared.)
My preference would be the latter, as shown above, since I would only need one table rather than three (potentially more if more objects need days assigned to them), and referential integrity stays intact.
I know this may be subjective, but is this a reasonable way of accomplishing this? It seems easier to include more objects should the time come.
Just as a note to the above, we will be pulling different data from the Person, Job and Vehicle tables, so I can't see how to implement a one-size-fits-all solution that doesn't end up with redundant data from a coding POV. The junction table will likely have only about 10,000 rows added per year.
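For reference, the single-table variant could be sketched as below, tried out in an in-memory SQLite database from Python. The CHECK constraint that forces exactly one of the three foreign keys to be set is an addition for illustration and not part of the plan above; the boolean arithmetic inside it works in SQLite but would need rewriting (for example with CASE expressions) on other DBMSs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    CREATE TABLE CalendarDay (ID INTEGER PRIMARY KEY);
    CREATE TABLE Person      (ID INTEGER PRIMARY KEY);
    CREATE TABLE Job         (ID INTEGER PRIMARY KEY);
    CREATE TABLE Vehicle     (ID INTEGER PRIMARY KEY);
    CREATE TABLE CalendarDayAssignment (
        ID            INTEGER PRIMARY KEY,
        CalendarDayID INTEGER NOT NULL REFERENCES CalendarDay(ID),
        PersonID      INTEGER REFERENCES Person(ID),
        JobID         INTEGER REFERENCES Job(ID),
        VehicleID     INTEGER REFERENCES Vehicle(ID),
        -- Assumed rule: exactly one of the three owner columns is filled in.
        CHECK ((PersonID IS NOT NULL) + (JobID IS NOT NULL)
               + (VehicleID IS NOT NULL) = 1)
    );
    INSERT INTO CalendarDay VALUES (1);
    INSERT INTO Person VALUES (10);
    INSERT INTO Job VALUES (20);
""")

# A day assigned to one person: accepted.
conn.execute(
    "INSERT INTO CalendarDayAssignment (CalendarDayID, PersonID) VALUES (1, 10)")

# A day assigned to a person and a job at once: rejected by the CHECK.
try:
    conn.execute(
        "INSERT INTO CalendarDayAssignment (CalendarDayID, PersonID, JobID) "
        "VALUES (1, 10, 20)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

count = conn.execute("SELECT COUNT(*) FROM CalendarDayAssignment").fetchone()[0]
print(rejected, count)
```

Adding a fourth kind of object later would mean one more nullable column plus an updated CHECK, which is the trade-off against one junction table per object type.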

I'm having trouble solving an internship PostgreSQL exercise using two different tables

I've applied for an internship that uses PostgreSQL. I had never had any contact with a programming language before university, which started just a couple of months ago. The employer sent me an email with some exercises that I have to do before Monday, so I have three days to learn the language and solve the exercises. I've been studying the whole day, about 14 hours (for real). I'm getting used to PostgreSQL, but I'm struggling with one thing. Since I'm very new to programming and don't have enough time to research this very specific issue, I have no other option but to ask you guys.
Here's the problem. Both tables have the same column 'id_cliente'. I need to produce a table that shows every person's name and id, and how many movies each of them has rented.
I tried two different queries and neither of them works.
select en_cliente.id_cliente, nome, count(en_aluguel.id_cliente) as alugueis
from en_cliente, en_aluguel
where en_cliente.id_cliente=en_aluguel.id_cliente
group by en_cliente.id_cliente;
This makes Maria go missing (because her ID doesn't show up in the first table). She is supposed to show a zero.
Also:
select en_cliente.id_cliente, nome, count(en_aluguel.id_cliente) as alugueis
from en_cliente, en_aluguel
group by en_cliente.id_cliente;
This makes every value in the last column (alugueis) a '7'.
[Images of the first, second and third tables were attached here.]
Two things:
Any time you list two tables in the FROM clause separated by a comma and match them up in the WHERE clause, you're effectively doing an INNER JOIN, which requires matching rows in both tables.
Any time you have criteria in the WHERE clause, only rows matching them are returned, which again limits you to records that exist in both tables.
You need to LEFT JOIN, which allows you to go from records that exist, to records that may or may not exist.
Give this a try. It will start at your en_cliente table and will join to your records in the en_aluguel table even if there is not a match in en_aluguel.
select en_cliente.id_cliente, nome, count(en_aluguel.id_cliente) as alugueis
from en_cliente
left join en_aluguel on (en_cliente.id_cliente=en_aluguel.id_cliente)
group by en_cliente.id_cliente;
Note: if you change the word “left” in my example to “inner”, you’ll end up with exactly the same code (just a different syntax) as your first example.
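If you want to see the difference side by side, here is a small sketch that runs the query with both join types against an in-memory SQLite database from Python. The sample rows and the id_aluguel column are made up, since the original tables aren't shown; nome is added to the GROUP BY so the query also runs when id_cliente is not declared as a primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE en_cliente (id_cliente INTEGER PRIMARY KEY, nome TEXT);
    -- id_aluguel is a hypothetical key column for the rentals table.
    CREATE TABLE en_aluguel (id_aluguel INTEGER PRIMARY KEY, id_cliente INTEGER);
    INSERT INTO en_cliente VALUES (1, 'Joao'), (2, 'Maria');
    INSERT INTO en_aluguel VALUES (1, 1), (2, 1);  -- Maria has no rentals
""")

query = """
    SELECT c.id_cliente, c.nome, COUNT(a.id_cliente) AS alugueis
    FROM en_cliente AS c
    {join} JOIN en_aluguel AS a ON c.id_cliente = a.id_cliente
    GROUP BY c.id_cliente, c.nome
    ORDER BY c.id_cliente
"""

# COUNT(a.id_cliente) skips NULLs, so unmatched clients are counted as 0.
left_rows = conn.execute(query.format(join="LEFT")).fetchall()
inner_rows = conn.execute(query.format(join="INNER")).fetchall()
print(left_rows)
print(inner_rows)
```

With the LEFT JOIN Maria is listed with 0 rentals; with the INNER JOIN her row disappears, which is the behaviour of the first query in the question.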

Joining multiple Tables in Oracle gives out duplicated records

I am a newbie to SQL. I have three tables: mr1, mr2 and mr3. Caseid is the primary key in all of these tables. I need to join the columns of all these tables and display the result.
The problem is that I don't know which join to use.
When I joined them all with the query below:
select mr1.col1,mr1.col2,mr2.col1,mr2.col2,mr3.col1,mr3.col2
from mr1,mr2,mr3
where mr1.caseid = mr2.caseid
and mr2.caseid = mr3.caseid;
it displays 4 records, even though the maximum number of records in any table is two, which is in table mr2.
The records are duplicated. Can anyone help me with this?
DISTINCT will do it, but it's not the correct approach.
You need to add another join condition (mr1.caseid = mr3.caseid) because mr2 and mr3 rows must be related to the same row in mr1; otherwise you end up with 2 pairs, one for each table joined to your primary table (mr2).
First answer on SO, so forgive me if it wasn't that clear.
Your problem is that your tables are in a one-to-many relationship. When you join them, it is expected that the number of rows will go up unless you take steps to limit the records returned. How to fix it depends on the meaning of the data.
If all the fields are exactly the same, then adding DISTINCT will fix the problem. However, depending on the size of the tables and the number of records you are returning, it may be faster to use a derived table that limits the join to only one record from the table with multiple records.
If at least one of the fields is different, however, then you need to know the business rule that lets you pick the correct record. That might be done by adding a WHERE clause, by using an aggregate function with GROUP BY, or both. It really depends on the meaning of the result set, which is why you need to ask further questions within your own organization: they are the only ones who know which of the multiple records is the correct one from the perspective of the people who will use the query results. It may even turn out that the business wants to see all of the records, in which case you have no problem at all.
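To make the row multiplication concrete, here is a minimal sketch in an in-memory SQLite database from Python. It assumes caseid is not actually unique in mr2 and mr3 (otherwise the join could not multiply rows), and it uses MAX(col1) purely as a stand-in for whatever business rule picks the right mr3 record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: one mr1 row, two mr2 rows and two mr3 rows per case.
    CREATE TABLE mr1 (caseid INTEGER, col1 TEXT);
    CREATE TABLE mr2 (caseid INTEGER, col1 TEXT);
    CREATE TABLE mr3 (caseid INTEGER, col1 TEXT);
    INSERT INTO mr1 VALUES (1, 'a');
    INSERT INTO mr2 VALUES (1, 'b1'), (1, 'b2');
    INSERT INTO mr3 VALUES (1, 'c1'), (1, 'c2');
""")

# Plain join: every mr2 row pairs with every mr3 row of the same case (2 x 2).
plain = conn.execute("""
    SELECT mr1.col1, mr2.col1, mr3.col1
    FROM mr1
    JOIN mr2 ON mr1.caseid = mr2.caseid
    JOIN mr3 ON mr2.caseid = mr3.caseid
""").fetchall()

# Derived table: collapse mr3 to one row per case before joining.
limited = conn.execute("""
    SELECT mr1.col1, mr2.col1, m3.col1
    FROM mr1
    JOIN mr2 ON mr1.caseid = mr2.caseid
    JOIN (SELECT caseid, MAX(col1) AS col1
          FROM mr3
          GROUP BY caseid) AS m3 ON mr2.caseid = m3.caseid
    ORDER BY mr2.col1
""").fetchall()
print(len(plain), limited)
```

The plain join returns 4 rows for 2 source records, the same symptom as in the question, while the derived-table version returns one row per mr2 record.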

What is a fast way of joining two tables and using the first table column to "filter" the second table?

I am trying to develop a SQL Server 2005 query, but so far without success. I have tried every approach I know, such as derived tables, sub-queries and CTEs, but I couldn't solve the problem. I won't post the queries I tried here because they involve many other columns and tables, but I will try to explain the problem with a simpler example:
There are two tables: PARTS_SOLD and PARTS_PURCHASED. The first contains products that were sold to customers, and the second contains products that were purchased from suppliers. Both tables contain a foreign key to the movement record itself, which holds the dates, etc.
Here is the simplified schema:
Table PARTS_SOLD:
part_id
date
other columns
Table PARTS_PURCHASED
part_id
date
other columns
What I need is to join every row in PARTS_SOLD with a unique row from PARTS_PURCHASED, chosen by part_id and the maximum "date" in PARTS_PURCHASED that is equal to or before the "date" of the PARTS_SOLD row. In other words, for every sale of an item I need to collect some information from the last purchase of that item.
The problem itself is that I didn't find a way of joining the PARTS_PURCHASED table with PARTS_SOLD table using the column "date" from PARTS_SOLD to limit the MAX(date) of the PARTS_PURCHASED table.
I could have done this with a cursor to solve the problem with the tools I know, but every table has millions of rows, and perhaps using cursors or sub-queries that evaluate a query for every row would make the process very slow.
You aren't going to like my answer. Your database is designed incorrectly, which is why you can't get the data back out the way you want. Even using a cursor, you would not get good data from this. Assume that you purchased 5 of part 1 on May 31, 2010, and that on June 1 you sold ten of part 1. Matching just on date, you would match all ten to the May 31 purchase even though that is clearly not correct: some of the parts might have been purchased on May 23 and some on July 19, 2008.
If you want to know which purchased part relates to which sold part, your database design should include the PartPurchasedID as part of the PartsSold record and this should be populated at the time of the purchase, not later for reporting when you have 1,000,000 records to sort through.
Perhaps the following would help:
SELECT S.*
FROM PARTS_SOLD S
INNER JOIN (SELECT PART_ID, MAX(DATE) AS MAX_DATE
FROM PARTS_PURCHASED
GROUP BY PART_ID) D
ON (D.PART_ID = S.PART_ID)
WHERE D.MAX_DATE <= S.DATE
Share and enjoy.
I'll toss this out there, but it's likely to contain all kinds of mistakes... both because I'm not sure I understand your question and because my SQL is... weak at best. That being said, my thought would be to try something like:
SELECT * FROM PARTS_SOLD
INNER JOIN (SELECT part_id, max(date) AS max_date
FROM PARTS_PURCHASED
GROUP BY part_id) AS subtable
ON PARTS_SOLD.part_id = subtable.part_id
AND PARTS_SOLD.date < subtable.max_date
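Another way to express "the last purchase on or before each sale" is a correlated subquery that computes the maximum purchase date per sale row rather than per part. Below is a minimal sketch of that idea, run against an in-memory SQLite database from Python rather than SQL Server 2005; the supplier column and the sample rows are made up, and the dates are assumed to be ISO 'YYYY-MM-DD' text so that string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- supplier is a hypothetical stand-in for the "other columns".
    CREATE TABLE PARTS_PURCHASED (part_id INTEGER, date TEXT, supplier TEXT);
    CREATE TABLE PARTS_SOLD      (part_id INTEGER, date TEXT);
    INSERT INTO PARTS_PURCHASED VALUES
        (1, '2010-05-01', 'A'), (1, '2010-05-31', 'B'), (1, '2010-07-01', 'C');
    INSERT INTO PARTS_SOLD VALUES
        (1, '2010-05-15'), (1, '2010-06-01'), (1, '2010-04-01');
""")

# For each sale, join the purchase whose date equals the latest purchase
# date that is not after the sale date. LEFT JOIN keeps sales that have no
# earlier purchase at all.
rows = conn.execute("""
    SELECT s.part_id, s.date, p.date, p.supplier
    FROM PARTS_SOLD AS s
    LEFT JOIN PARTS_PURCHASED AS p
           ON p.part_id = s.part_id
          AND p.date = (SELECT MAX(p2.date)
                        FROM PARTS_PURCHASED AS p2
                        WHERE p2.part_id = s.part_id
                          AND p2.date <= s.date)
    ORDER BY s.date
""").fetchall()
for row in rows:
    print(row)
```

If two purchases of the same part share a date, this returns both, so a tie-breaker column would be needed. On SQL Server 2005 the same idea is often written with OUTER APPLY and TOP 1, and an index on (part_id, date) in PARTS_PURCHASED is the usual first thing to try for speed, though how it performs on millions of rows would need measuring.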