SQL delete duplicate rows based on multiple fields

I have the following table in sql:
id | trip_id | stop_id | departure_time
----------------------------------------
1  | 1       | 1       | 06:25:00
2  | 1       | 2       | 06:35:00
3  | 1       | 3       | 06:45:00
4  | 1       | 2       | 06:55:00
What I need to do is identify where a trip_id has multiple instances of a certain stop_id (in this case stop_id 2).
I then need to delete the duplicates, leaving only the one with the latest departure time.
So given the above table, I'd delete the row with id 2 and be left with:
id | trip_id | stop_id | departure_time
----------------------------------------
1  | 1       | 1       | 06:25:00
3  | 1       | 3       | 06:45:00
4  | 1       | 2       | 06:55:00
I have managed to do this with a series of SQL queries, but I hit the N+1 issue and it takes ages.
Can anyone recommend a way I may be able to do this in one query? Or at the very least identify all the ids of rows that need deleting in one query?
I'm doing this in Ruby on Rails, so if you prefer an Active Record solution I wouldn't hate you for it :)
Thanks in advance.

You may try the following logic:
DELETE
FROM yourTable t1
WHERE EXISTS (SELECT 1 FROM yourTable t2
              WHERE t2.trip_id = t1.trip_id AND
                    t2.stop_id = t1.stop_id AND
                    t2.departure_time > t1.departure_time);
In plain English, this says to scan your entire table, and delete any record for which we can find another record with an identical trip_id and stop_id, where the departure time is also greater than that of the record being considered for deletion. If we find such a match, then it is a duplicate according to your definition.
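The same logic can be tried out end to end with a small sketch. It assumes Python's built-in sqlite3 module and a hypothetical table name stop_times loaded with the four rows from the question; the outer table is left unaliased only to keep the statement portable across SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stop_times ("
    "id INTEGER PRIMARY KEY, trip_id INTEGER, stop_id INTEGER, departure_time TEXT)"
)
# The four rows from the question. Zero-padded HH:MM:SS strings sort correctly as text.
conn.executemany(
    "INSERT INTO stop_times VALUES (?, ?, ?, ?)",
    [(1, 1, 1, "06:25:00"), (2, 1, 2, "06:35:00"),
     (3, 1, 3, "06:45:00"), (4, 1, 2, "06:55:00")],
)

# Delete every row for which a later departure exists for the same trip and stop.
conn.execute(
    """
    DELETE FROM stop_times
    WHERE EXISTS (
        SELECT 1
        FROM stop_times AS later
        WHERE later.trip_id = stop_times.trip_id
          AND later.stop_id = stop_times.stop_id
          AND later.departure_time > stop_times.departure_time
    )
    """
)

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM stop_times ORDER BY id")]
print(remaining_ids)
```

Only the row with id 2 is removed, because it is the only row with a later departure for the same trip_id and stop_id.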

You can try it this way:
DELETE FROM tablename
WHERE id IN
(
    SELECT id FROM
    (SELECT id, row_number() OVER (PARTITION BY trip_id, stop_id ORDER BY departure_time DESC) AS rn FROM tablename) aa
    WHERE rn > 1
);
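For the "at the very least identify all the ids" part of the question, the window-function idea can be sketched as a plain SELECT. This is only an illustration under assumptions: Python's sqlite3 module (window functions need SQLite 3.25 or newer), a hypothetical table name stop_times, and an extra hypothetical row 5 on a second trip, added to show why the partition should include trip_id as well as stop_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stop_times ("
    "id INTEGER PRIMARY KEY, trip_id INTEGER, stop_id INTEGER, departure_time TEXT)"
)
conn.executemany(
    "INSERT INTO stop_times VALUES (?, ?, ?, ?)",
    [(1, 1, 1, "06:25:00"), (2, 1, 2, "06:35:00"),
     (3, 1, 3, "06:45:00"), (4, 1, 2, "06:55:00"),
     (5, 2, 2, "07:05:00")],  # hypothetical row: a different trip visiting stop 2
)

def ids_to_delete(partition_cols):
    """Return ids of rows that are not the latest departure within their partition."""
    sql = f"""
        SELECT id FROM (
            SELECT id,
                   ROW_NUMBER() OVER (
                       PARTITION BY {partition_cols}
                       ORDER BY departure_time DESC
                   ) AS rn
            FROM stop_times
        )
        WHERE rn > 1
        ORDER BY id
    """
    return [row[0] for row in conn.execute(sql)]

per_trip_and_stop = ids_to_delete("trip_id, stop_id")
per_stop_only = ids_to_delete("stop_id")
print(per_trip_and_stop, per_stop_only)
```

Partitioning by stop_id alone would also flag row 4, which belongs to a different trip than row 5, so the two-column partition is the one that matches the question's definition of a duplicate.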

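Try something like the following. Note that ctid is a PostgreSQL system column, so this is PostgreSQL-specific, and it keeps the row with the highest ctid in each group rather than the one with the latest departure_time:
DELETE FROM tablename a
WHERE a.ctid <> (SELECT max(b.ctid)
                 FROM tablename b
                 WHERE a.trip_id = b.trip_id AND
                       a.stop_id = b.stop_id);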

Related

SQL UPDATE ORA-01427: single-row subquery returns more than one row calculating percentage

This is the table I want to fill with percentages.
+------+--------+-------------+--------------+
| H_ID | H_NAME | DOCTOR_STAT | PATIENT_STAT |
+------+--------+-------------+--------------+
|    1 | NAME 1 |           0 |            0 |
|    2 | NAME 2 |           0 |            0 |
|    3 | NAME 3 |           0 |            0 |
|    4 | NAME 4 |           0 |            0 |
|    5 | NAME 5 |           0 |            0 |
+------+--------+-------------+--------------+
This is the code I have written. The inner SQL query works fine on its own (it correctly prints the percentage for each hospital id), but it returns multiple rows, so the update doesn't work.
UPDATE HOSPITAL_STATISTICS
SET DOCTOR_STAT=(SELECT ROUND(100*COUNT(DOCTOR.WORK_HOSPITAL)/(SELECT COUNT(DOCTOR.ID)FROM DOCTOR),2)
FROM DOCTOR,HOSPITAL
WHERE HOSPITAL.HOSPITAL_ID=DOCTOR.WORK_HOSPITAL
GROUP BY HOSPITAL.HOSPITAL_ID)
I know this returns multiple rows, but I don't know how to solve this. I must calculate the percentage of doctors working at each hospital. If you need more table data to help, please tell me :)
EDIT: I have a second table named Patient_Visit and a third one, Hospital:
Patient_Visit
[Visit_ID,Patient_ID,Hospital_ID]
Hospital
[Hospital_ID,Hospital_Name]
and I'm trying to do the same thing with this code:
UPDATE HOSPITAL_STATISTICS
SET PATIENT_STAT=(SELECT ROUND(100* COUNT (*) / (SELECT COUNT(*) FROM PATIENT_VISIT),2)
FROM PATIENT_VISIT,HOSPITAL
WHERE PATIENT_VISIT.HOSPITAL_ID=HOSPITAL.HOSPITAL_ID AND HOSPITAL_STATISTICS.H_ID=PATIENT_VISIT.HOSPITAL_ID
GROUP BY HOSPITAL_VISIT.HOSPITAL_ID)
which gives me this error: ORA-01407: cannot update ("HOSPITAL_STATISTICS"."PATIENT_STAT") to NULL. Any ideas?

You want a correlated subquery, not a JOIN:
UPDATE HOSPITAL_STATISTICS HS
SET DOCTOR_STAT = (SELECT ROUND(100 * COUNT(*) / (SELECT COUNT(*) FROM DOCTOR), 2)
FROM DOCTOR D
WHERE HS.H_ID = D.WORK_HOSPITAL
);
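A minimal sketch of the correlated update, run against SQLite through Python's sqlite3 module rather than Oracle. The sample doctors are hypothetical, and 100.0 is used because SQLite would otherwise do integer division. The last query illustrates one plausible reading of the ORA-01407 error from the question: a grouped scalar subquery yields no row, and therefore NULL, for a hospital with no matching rows, whereas COUNT(*) without GROUP BY yields 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE hospital_statistics (
        h_id INTEGER PRIMARY KEY,
        doctor_stat REAL NOT NULL DEFAULT 0
    );
    CREATE TABLE doctor (id INTEGER PRIMARY KEY, work_hospital INTEGER);
    INSERT INTO hospital_statistics (h_id) VALUES (1), (2), (3);
    -- Hypothetical data: two doctors each at hospitals 1 and 2, none at hospital 3.
    INSERT INTO doctor (id, work_hospital) VALUES (1, 1), (2, 1), (3, 2), (4, 2);
    """
)

# Correlated subquery: evaluated once per hospital_statistics row, always exactly one value.
conn.execute(
    """
    UPDATE hospital_statistics
    SET doctor_stat = (
        SELECT ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM doctor), 2)
        FROM doctor
        WHERE doctor.work_hospital = hospital_statistics.h_id
    )
    """
)
stats = conn.execute(
    "SELECT h_id, doctor_stat FROM hospital_statistics ORDER BY h_id"
).fetchall()

# With GROUP BY, a hospital that has no doctors produces no group, so the scalar result is NULL.
grouped_for_empty = conn.execute(
    "SELECT (SELECT COUNT(*) FROM doctor WHERE work_hospital = 3 GROUP BY work_hospital)"
).fetchone()[0]

print(stats, grouped_for_empty)
```

Under that reading, dropping the GROUP BY from the PATIENT_STAT update as well would give 0 instead of NULL for hospitals without visits.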

SQL Server select query with ids and count of ids grouped by date cast from datetime

I am struggling to find the right way to write a select query that produces a count of ids per unique date. I have a Logs table like this:
id | DateTime
1  | 23-03-2019 18:27:45
1  | 23-03-2019 18:27:45
2  | 23-03-2019 18:27:50
2  | 23-03-2019 18:27:51
2  | 23-03-2019 18:28:01
3  | 23-03-2019 18:33:15
1  | 24-03-2019 18:13:18
2  | 23-03-2019 18:27:12
2  | 23-03-2019 15:27:46
3  | 23-03-2019 18:21:58
3  | 23-03-2019 18:21:58
4  | 24-03-2019 10:11:14
What I have tried:
select id, count(cast(DateTime as DATE)) as Counts from Logs group by id
It produces the count for each id, like this:
id | count
1  | 2
2  | 3
3  | 1
1  | 1
2  | 2
3  | 2
4  | 1
What I want is to add the DateTime column cast as a date:
id | count | Date
1  | 2     | 23-03-2019
2  | 3     | 23-03-2019
3  | 1     | 23-03-2019
1  | 1     | 24-03-2019
2  | 2     | 24-03-2019
3  | 2     | 24-03-2019
4  | 1     | 24-03-2019
However, I get an error saying
Column 'Logs.DateTime' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
when I try
select id, count(cast(DateTime as DATE)) as Counts from Logs group by id

You also need to add cast(DateTime as DATE) to the GROUP BY:
select id,cast(DateTime as DATE) as dateval, count(cast(DateTime as DATE)) as Counts
from Logs
group by id,cast(DateTime as DATE)
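The grouping rule can be checked with a small sketch. It assumes SQLite via Python's sqlite3 module instead of SQL Server, so date(...) on ISO-formatted text stands in for cast(DateTime as DATE); the column name logged_at and the reduced sample are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER, logged_at TEXT)")
# A few rows modelled on the question, rewritten as ISO timestamps so date() can parse them.
conn.executemany(
    "INSERT INTO logs VALUES (?, ?)",
    [(1, "2019-03-23 18:27:45"), (1, "2019-03-23 18:27:45"),
     (2, "2019-03-23 18:27:50"), (2, "2019-03-23 18:27:51"),
     (1, "2019-03-24 18:13:18"), (4, "2019-03-24 10:11:14")],
)

# Every non-aggregated expression in the select list also appears in GROUP BY.
counts_per_day = conn.execute(
    """
    SELECT id, date(logged_at) AS day, COUNT(*) AS counts
    FROM logs
    GROUP BY id, date(logged_at)
    ORDER BY day, id
    """
).fetchall()

for row in counts_per_day:
    print(row)
```

Each (id, day) pair becomes one output row, which is why id 1 appears once for each of its two dates.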

SQL: Merge localized version of a table to the main one

Imagine I have a main table like:
Table guys
|id| name|profession|
|--|------|----------|
| 1| John| developer|
| 2| Mike| boss|
| 3| Roger| fireman|
| 4| Bob| policeman|
I also have a localized version which is not complete (the boss is missing):
Table guys_bg
|id| name | profession|
|--|------|-----------|
| 1| Джон|разработчик|
| 3|Роджър| пожарникар|
| 4| Боб| полицай|
I want to prioritize guys_bg results while still showing all the guys (The boss is still a guy, right?).
This is the desired result:
|id| name | profession|
|--|------|-----------|
| 1| Джон|разработчик|
| 2| Mike| boss|
| 3|Роджър| пожарникар|
| 4| Боб| полицай|
Take into consideration that both tables may have a lot of (100+) columns so joining the tables and using CASE for every column will be very tedious.
What are my options?

Here is one way using union all:
select gb.*
from guys_bg gb
union all
select g.*
from guys g
where not exists (select 1 from guys_bg gb where gb.id = g.id);
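As a quick check of the union all approach, here is a sketch using Python's sqlite3 module with the two tables from the question. The final ORDER BY 1 is an addition for a stable output order and is not part of the answer's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE guys (id INTEGER PRIMARY KEY, name TEXT, profession TEXT)")
conn.execute("CREATE TABLE guys_bg (id INTEGER PRIMARY KEY, name TEXT, profession TEXT)")
conn.executemany(
    "INSERT INTO guys VALUES (?, ?, ?)",
    [(1, "John", "developer"), (2, "Mike", "boss"),
     (3, "Roger", "fireman"), (4, "Bob", "policeman")],
)
conn.executemany(
    "INSERT INTO guys_bg VALUES (?, ?, ?)",
    [(1, "Джон", "разработчик"), (3, "Роджър", "пожарникар"), (4, "Боб", "полицай")],
)

# Take every localized row, then add only those main rows that have no localized version.
merged = conn.execute(
    """
    SELECT * FROM guys_bg
    UNION ALL
    SELECT * FROM guys AS g
    WHERE NOT EXISTS (SELECT 1 FROM guys_bg AS gb WHERE gb.id = g.id)
    ORDER BY 1
    """
).fetchall()

for row in merged:
    print(row)
```

Because both branches use SELECT *, the query does not grow with the number of columns, as long as the two tables list their columns in the same order.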

You can also do it using a FULL JOIN (ISNULL here is the SQL Server function; COALESCE is the portable equivalent):
SELECT
ISNULL(b.id,g.id) id
, ISNULL(b.name, g.name) name
, ISNULL(b.profession, g.profession) profession
FROM
guys g
FULL JOIN guys_bg b ON g.id = b.id

SQL Count occurrences of non-unique column

Suppose I have a SQL table looking something like this:
--------------------
| id| name|
--------------------
| 1| Alice|
| 2| Bob|
| 3| Alice|
| 4| Alice|
| 5| Jeff|
| ...| ...|
--------------------
Is it possible to formulate a query which returns a list of names and the number of times they occur? I've made a solution to this by querying all the rows, removing duplicates, counting, and then ordering; it works, but it just looks messy. Can this be neatened up in a SQL query?

This is standard SQL and should deliver your expected result:
select name, count(*)
from tblName
group by name
order by name
If you want to order by the count in descending order, you can use:
select name, count(*)
from tblName
group by name
order by 2 DESC
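A small runnable sketch of the second query, assuming Python's sqlite3 module and the five sample rows from the question. The alias occurrences and the secondary sort on name are additions to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblName (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO tblName VALUES (?, ?)",
    [(1, "Alice"), (2, "Bob"), (3, "Alice"), (4, "Alice"), (5, "Jeff")],
)

# One output row per distinct name, most frequent first, ties broken alphabetically.
name_counts = conn.execute(
    """
    SELECT name, COUNT(*) AS occurrences
    FROM tblName
    GROUP BY name
    ORDER BY occurrences DESC, name
    """
).fetchall()

print(name_counts)
```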

SQL query for finding the most frequent value of a grouped by value

I'm using SQLite browser and trying to write a query that finds the most frequent value in one column for each group defined by another column:
Table is called main
| |Place |Value|
| 1| London| 101|
| 2| London| 20|
| 3| London| 101|
| 4| London| 20|
| 5| London| 20|
| 6| London| 20|
| 7| London| 20|
| 8| London| 20|
| 9| France| 30|
| 10| France| 30|
| 11| France| 30|
| 12| France| 30|
The result I'm looking for is the most frequent value for each place:
| |Place |Most Frequent Value|
| 1| London| 20|
| 2| France| 30|
Or even better
| |Place |Most Frequent Value|Largest Percentage|2nd Largest Percentage|
| 1| London| 20| 0.75| 0.25|
| 2| France| 30| 1| 0.75|

You can group by place, then value, and order by frequency, e.g.:
select place,value,count(value) as freq from cars group by place,value order by place, freq;
This will not give exactly the answer you want, but near to it like
London | 101 | 2
France | 30 | 4
London | 20 | 6
Now select place and value from this intermediate table and group by place, so that only one row per place is displayed.
select place,value from
(select place,value,count(value) as freq from cars group by place,value order by place, freq)
group by place;
This will produce the result like following:
France | 30
London | 20
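This works in SQLite, which allows non-aggregated columns in a GROUP BY query, but SQLite does not guarantee which row in each group supplies value, so check the result against your data. Other databases may reject the query or return the place and value with the lowest frequency. In those that accept it, you can try order by place, freq desc instead.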

The first part would be something like this.
http://sqlfiddle.com/#!7/ac182/8
with tbl1 as
(select a.place,a.value,count(a.value) as val_count
from table1 a
group by a.place,a.value
)
select t1.place,
t1.value as most_frequent_value
from tbl1 t1
inner join
(select place,max(val_count) as val_count from tbl1
group by place) t2
on t1.place=t2.place
and t1.val_count=t2.val_count
Here we are deriving tbl1 which will give us the count of each place and value combination. Now we will join this data with another derived table t2 which will find the max count and we will join this data to get the required result.
I am not sure how you want the percentage in the second output, but if you understood this query, you can build some logic on top of it to derive the required output. Play around with the sqlfiddle. All the best.
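One possible way to get a percentage on top of the same idea, sketched with Python's sqlite3 module. This is an assumption about what the asker meant: the share of a place's rows held by its most frequent value. The table name readings and the CTE names counts and totals are hypothetical; the data is the twelve rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, place TEXT, value INTEGER)")
london = [101, 20, 101, 20, 20, 20, 20, 20]
france = [30, 30, 30, 30]
conn.executemany(
    "INSERT INTO readings (place, value) VALUES (?, ?)",
    [("London", v) for v in london] + [("France", v) for v in france],
)

# counts: rows per (place, value); totals: rows per place and the highest count in it.
most_frequent = conn.execute(
    """
    WITH counts AS (
        SELECT place, value, COUNT(*) AS val_count
        FROM readings
        GROUP BY place, value
    ),
    totals AS (
        SELECT place, SUM(val_count) AS total, MAX(val_count) AS top_count
        FROM counts
        GROUP BY place
    )
    SELECT c.place, c.value, 1.0 * c.val_count / t.total AS share
    FROM counts AS c
    JOIN totals AS t
      ON t.place = c.place AND t.top_count = c.val_count
    ORDER BY c.place
    """
).fetchall()

print(most_frequent)
```

If two values tie for the highest count in a place, the join returns both of them.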

RANK
SQLite now supports RANK, so we can use the exact same syntax that works on PostgreSQL, similar to https://stackoverflow.com/a/12448971/895245
SELECT "city", "value", "cnt"
FROM (
SELECT
"city",
"value",
COUNT(*) AS "cnt",
RANK() OVER (
PARTITION BY "city"
ORDER BY COUNT(*) DESC
) AS "rnk"
FROM "Sales"
GROUP BY "city", "value"
) AS "sub"
WHERE "rnk" = 1
ORDER BY
"city" ASC,
"value" ASC
This would return all in case of tie. To return just one you could use ROW_NUMBER instead of RANK.
Tested on SQLite 3.34.0 and PostgreSQL 14.3.
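The difference between RANK and ROW_NUMBER on a tie can be seen in a short sketch. It assumes Python's sqlite3 module with SQLite 3.25 or newer and a hypothetical sales table in which two values tie in one city. The extra value ASC tie-breaker in the ROW_NUMBER variant is an addition so that the surviving row is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (city TEXT, value INTEGER)")
# Hypothetical data: in Paris, values 1 and 2 both occur twice (a tie).
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Paris", 1), ("Paris", 1), ("Paris", 2), ("Paris", 2), ("Rome", 5)],
)

def top_values(window_expr):
    """Return (city, value, cnt) rows whose window expression evaluates to 1."""
    sql = f"""
        SELECT city, value, cnt FROM (
            SELECT city, value, COUNT(*) AS cnt, {window_expr} AS rnk
            FROM sales
            GROUP BY city, value
        )
        WHERE rnk = 1
        ORDER BY city, value
    """
    return conn.execute(sql).fetchall()

with_rank = top_values("RANK() OVER (PARTITION BY city ORDER BY COUNT(*) DESC)")
with_row_number = top_values(
    "ROW_NUMBER() OVER (PARTITION BY city ORDER BY COUNT(*) DESC, value ASC)"
)

print(with_rank)
print(with_row_number)
```

RANK keeps both tied Paris rows, while ROW_NUMBER keeps exactly one row per city.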
