How to deselect duplicate entries in a query? - sql

I've got a query like this:
SELECT *
FROM RecipeTable, RecipeIngredientTable, SyncRecipeIngredientTable
WHERE RecipeTable.recipe_id = SyncRecipeIngredientTable.recipe_id
AND RecipeIngredientTable.recipe_ingredient_id =
SyncRecipeIngredientTable.recipe_ingredient_id
AND RecipeIngredientTable.recipe_item_name in ("ayva", "pirinç", "su")
GROUP by RecipeTable.recipe_id
HAVING COUNT(*) >= 3;
and this query returns the result like this:
As you can see in the image there is 3 duplicate, unnecessary entries (no, i can't delete them because of the multiple foreign keys). How can I deselect these duplicate entries from the result query? In the end I want to return 6 entries not 9.

What you want to eliminate in the result set is not duplication of recipe_id values but recipe_name values.
You just need to group(partition) by recipe_name through use of ROW_NUMBER() analytic function :
SELECT recipe_id, author_name ...
FROM
(
SELECT ROW_NUMBER() OVER (PARTITION BY recipe_name) AS rn,
sr.recipe_id, author_name ...
FROM SyncRecipeIngredientTable sr
JOIN RecipeIngredientTable ri
ON ri.recipe_ingredient_id = sr.recipe_ingredient_id
JOIN RecipeTable rt
ON rt.recipe_id = sr.recipe_id
WHERE ri.recipe_item_name in ("ayva", "pirinç", "su")
)
WHERE rn = 1
This way, you can pick only one of the records with rn=1 (ORDER BY Clause might be added to that analytic function after PARTITION BY clause if spesific record is needed to be picked)

Related

SQL query to return duplicate rows for certain column, but with unique values for another column

I have written the query shown here that combines three tables and returns rows where the at_ticket_num from appeal_tickets is duplicated but against a different at_sys_ref value
select top 100
t.t_reference, at.at_system_ref, at_ticket_num, a.a_case_ref
from
tickets t, appeal_tickets at, appeals_2 a
where
t.t_reference in ('AB123','AB234') -- filtering on these values so that I can see that its working
and t.t_number = at.at_ticket_num
and at.at_system_ref = a.a_system_ref
and at.at_ticket_num IN (select at_ticket_num
from appeal_tickets
group by at_ticket_num
having count(distinct at_system_ref) > 1)
order by
t.t_reference desc
This is the output:
t_reference at_system_ref at_ticket_num a_case_ref
-------------------------------------------------------
AB123 30838974 23641583 1111979010
AB123 30838976 23641583 1111979010
AB234 30839149 23641520 1111977352
AB234 30839209 23641520 1111988003
I want to modify this so that it only returns records where t_reference is duplicated but against a different a_case_ref. So in above case only records for AB234 would be returned.
Any help would be much appreciated.
You want all ticket appeals that have more than one system reference and more than one case reference it seems. You can join the tables, count the occurrences per ticket and then only keep the tickets that match these criteria.
select *
from
(
select
t.t_reference, at.at_system_ref, at.at_ticket_num, a.a_case_ref,
count(distinct a.a_system_ref) over (partition by at.at_ticket_num) as sysrefs,
count(distinct a.a_case_ref) over (partition by at.at_ticket_num) as caserefs
from tickets t
join appeal_tickets at on at.at_ticket_num = t.t_number
join appeals_2 a on a.a_system_ref = at.at_system_ref
) counted
where sysrefs > 1 and caserefs > 1
order by t.t_reference, at.at_system_ref, at.at_ticket_num, a.a_case_ref;
Correction
It seems that SQL Server still doesn't support COUNT(DISTINCT ...) OVER (...). You can count distinct values in a subquery though. Replace
count(distinct a.a_system_ref) over (partition by at.at_ticket_num) as sysrefs,
by
(
select count(distinct a2.a_system_ref)
from appeal_tickets at2
join appeals_2 a2 on a2.a_system_ref = at2.at_system_ref
where at2.at_ticket_num = t.t_number
) as sysrefs,
An alternative workaround is to use DENSE_RANK in two directions (found here: https://stackoverflow.com/a/53518204/2270762):
dense_rank() over (partition by at.at_ticket_num order by a.a_system_ref) +
dense_rank() over (partition by at.at_ticket_num order by a.a_system_ref desc) -
1 as sysrefs,
with data as (
<your query plus one column>,
case when
min() over (partition by t.t_reference)
<>
max() over (partition by t.t_reference)
then 1 end as dup
)
select * from data where dup = 1

SQL Server: Each GROUP BY expression must contain at least one column that is not an outer reference

scenario 1:
I have two tables INFUSION_APP_APPOINTMENT,INFUSION_APP_NURSE_NOTES where
INFUSION_APP_NURSE_NOTES.APPOINTMENT_ID=INFUSION_APP_APPOINTMENT.ID and i want to find out the INFUSION_APP_NURSE_NOTES.ID's where INFUSION_APP_NURSE_NOTES.APPOINTMENT_ID is same.
for eg. if the INFUSION_APP_NURSE_NOTES.APPOINTMENT_ID = 1 and INFUSION_APP_NURSE_NOTES.ID is 12,15,78, then i want to display all the
INFUSION_APP_NURSE_NOTES.ID's where INFUSION_APP_NURSE_NOTES.APPOINTMENT_ID =1.
i use below script
SELECT INFUSION_APP_NURSE_NOTES.APPOINTMENT_ID,INFUSION_APP_NURSE_NOTES.ID
FROM INFUSION_APP_NURSE_NOTES
GROUP BY INFUSION_APP_NURSE_NOTES.APPOINTMENT_ID,INFUSION_APP_NURSE_NOTES.ID
HAVING COUNT(INFUSION_APP_NURSE_NOTES.APPOINTMENT_ID)>1
but it does not gives me any records.
scenario 2:
I am running below script with the intention to get the duplicate records with different INFUSION_APP_NURSE_NOTES.ID's but same INFUSION_APP_NURSE_NOTES.APPOINTMENT_ID.
SELECT INFUSION_APP_NURSE_NOTES.ID,INFUSION_APP_NURSE_NOTES.APPOINTMENT_ID,INFUSION_APP_NURSE_NOTES.TYPE
FROM INFUSION_APP_NURSE_NOTES
WHERE
EXISTS (
SELECT 1 FROM INFUSION_APP_APPOINTMENT
WHERE
INFUSION_APP_NURSE_NOTES.ENABLE=1
AND INFUSION_APP_NURSE_NOTES.APPOINTMENT_ID=INFUSION_APP_APPOINTMENT.ID
GROUP BY INFUSION_APP_NURSE_NOTES.ID
HAVING COUNT(INFUSION_APP_NURSE_NOTES.APPOINTMENT_ID)>1
)
ORDER BY INFUSION_APP_NURSE_NOTES.APPOINTMENT_ID;
but getting below error
SQL Error(164): Each GROUP BY expression must contain at least one
column that is not an outer reference
how to solve it?
i want the only row which has common APPOINTMENT_ID but different n
The question is unclear. Finding duplicates is typically performed using ranking functions like ROW_NUMBER(). This query :
SELECT *,ROW_NUMBER(PARTITION BY APPOINTMENT_ID ORDER BYID) as RN
FROM INFUSION_APP_NURSE_NOTES
WHERE
ENABLE=1
Will rank notes for the same appointment by ID and return 1, 2, 3 etc starting from the earliest note. ORDER BY ID DESC would return 1 for the latest note.
This can be used in a subquery or CTE to find the first, last or or duplicate records, eg :
with notes as (
SELECT *,ROW_NUMBER(PARTITION BY APPOINTMENT_ID ORDER BYID) as RN
FROM INFUSION_APP_NURSE_NOTES
WHERE
ENABLE=1
)
select *
from notes
where RN=1
Will return the first note per appointment while :
where RN>1
Will return only duplicates.
The question doesn't say what should be done with the duplicates though.
If the question is how to return all notes from appointments with multiple notes, a subquery can be used to return the APPOINTMENT_IDs that have more than one note. There's no need to include the INFUSION_APP_APPOINTMENT table though :
SELECT *
FROM INFUSION_APP_NURSE_NOTES
where
ENABLE=1 AND
APPOINTMENT_ID IN ( SELECT APPOINTMENT_ID
FROM INFUSION_APP_NURSE_NOTES
WHERE
ENABLE=1
group by APPOINTMENT_ID
having count(*)>1)
Try this
SELECT INFUSION_APP_NURSE_NOTES.ID,INFUSION_APP_NURSE_NOTES.APPOINTMENT_ID,INFUSION_APP_NURSE_NOTES.TYPE
FROM INFUSION_APP_NURSE_NOTES
WHERE
EXISTS (
SELECT COUNT(B.APPOINTMENT_ID), B.ID
FROM INFUSION_APP_APPOINTMENT A
INNER JOIN INFUSION_APP_NURSE_NOTES B ON B.APPOINTMENT_ID = A.ID
WHERE
B.ENABLE=1
GROUP BY B.ID
HAVING COUNT(B.APPOINTMENT_ID)>1
)
ORDER BY INFUSION_APP_NURSE_NOTES.APPOINTMENT_ID;

Remove duplicate row based on select statement

I have two select statements which is returning duplicated data. What I'm trying to accomplish is to remove a duplicated leg. But I'm having hard times to get to the second row programmatically.
select i.InvID, i.UID, i.StartDate, i.EndDate, i.Minutes,i.ABID from inv_v i, InvoiceLines_v i2 where
i.Period = '2014/08'
and i.EndDate = i2.EndDate
and i.Minutes = i2.Minutes
and i.Uid <> i2.Uid
and i.abid = i2.abid
order by i.EndDate
This select statement returns the following data.
As you can see it returns duplicate rows where minutes are the same ABID is the same but InvID are different. What I need to do is to remove one of the InvID where the criteria matches. Doesn't matter which one.
The second select statement is returning different data.
select i.InvID, i.UID, i.StartDate, i.EndDate, i.Minutes from InvoiceLines_v i, InvoiceLines_v i2 where
i.Period = '2014/08'
and i.EndDate = i2.EndDate
and i.Uid = i2.Uid
and i.Abid <> i2.Abid
and i.Language <> i2.Language
order by i.startdate desc
In this select statement I want to remove an InvID where UID is the same then select the lowest Mintues. In This case, I would remove the following InvIDs: 2537676 , 2537210
My goal is to remove those rows...
I could accomplish this using cursor grab the InvID and remove it by simple delete statement, but I'm trying to stay away from cursors.
Any suggestions on how I can accomplish this?
You can use exists to delete all duplicates except the one with the highest InvID by deleting those rows where another row exists with the same values but with a higher InvID
delete from inv_v
where exists (
select 1 from inv_v i2
where i2.InvID > inv_v.InvID
and i2.minutes = inv_v.minutes
and i2.EndDate = inv_v.EndDate
and i2.abid = inv_v.abid
and i2.uid <> inv_v.uid -- not sure why <> is used here, copied from question
)
I have faced similar problems regarding duplicate data and some one told me to use partition by and other methods but those were causing performance issues
However , I had a primary key in my table through which I was able to select one row from the duplicate data and then delete it.
For example in the first select statement "minutes" and "ABID" are the criteria to consider duplicacy in data.But "Invid" can be used to distinguish between the duplicate rows.
So you can use below query to remove duplicacy.
delete from inv_i where inv_id in (select max(inv_id) from inv_i group by minutes,abid having count(*) > 1 );
This simple concept was helpful to me. It can be helpful in your case if "Inv_id" is unique.
;WITH CTE AS
(
SELECT InvID
,[UID]
,StartDate
,EndDate
,[Minutes]
,ROW_NUMBER() OVER (PARTITION BY InvID, [UID] ORDER BY [Minutes] ASC) rn
FROM InvoiceLines_v
)
SELECT *
FROM CTE
WHERE rn = 1
Replace the ORIGINAL_TABLE with your table name.
QUERY 1:
WITH DUP_TABLE AS
(
SELECT ROW_NUMBER()
OVER (PARTITION BY minutes, ABID ORDER BY minutes, ABID) As ROW_NO
FROM <ORIGINAL_TABLE>
)
DELETE FROM DUP_TABLE WHERE ROW_NO > 1;
QUERY 2:
WITH DUP_TABLE AS
(
SELECT ROW_NUMBER()
OVER (PARTITION BY UID ORDER BY minutes) As ROW_NO
FROM <ORIGINAL_TABLE>
)
DELETE FROM DUP_TABLE WHERE ROW_NO > 1;

Getting row number for query

I have a query which will return one row. Is there any way I can find the row index of the row I'm querying when the table is sorted?
I've tried rowid but got #582 when I was expecting row #7.
Eg:
CategoryID Name
I9GDS720K4 CatA
LPQTOR25XR CatB
EOQ215FT5_ CatC
K2OCS31WTM CatD
JV5FIYY4XC CatE
--> C_L7761O2U CatF <-- I want this row (#5)
OU3XC6T19K CatG
L9YKCYAYMG CatH
XKWMQ7HREG CatI
I've tried rowid with unexpected results:
SELECT rowid FROM Categories WHERE CategoryID = 'C_L7761O2U ORDER BY Name
EDIT: I've also tried J Cooper's suggestion (below), but the row numbers just aren't right.
using (var cmd = conn.CreateCommand()) {
cmd.CommandText = string.Format(#"SELECT (SELECT COUNT(*) FROM Recipes AS t2 WHERE t2.RecipeID <= t1.RecipeID) AS row_Num
FROM Recipes AS t1
WHERE RecipeID = 'FB3XSAXRWD'
ORDER BY Name";
cmd.Parameters.AddWithValue("#recipeId", id);
idx = Convert.ToInt32(cmd.ExecuteScalar());
Here is a way to get the row number in Sqlite:
SELECT CategoryID,
Name,
(SELECT COUNT(*)
FROM mytable AS t2
WHERE t2.Name <= t1.Name) AS row_Num
FROM mytable AS t1
ORDER BY Name, CategoryID;
Here's a funny trick you can use in Spatialite to get the order of values. If you use the count() function with a WHERE clause limiting to only values >= the current value, then the count will actually give the order. So if I have a point layer called "mypoints" with columns "value" and "val_order" then:
SELECT value, (
SELECT count(*) FROM mypoints AS my
WHERE my.value>=mypoints.value) AS val_order
FROM mypoints
ORDER BY value DESC;
Gives the descending order of the values.
I can update the "val_order" column this way:
UPDATE mypoints SET val_order = (
SELECT count(*) FROM mypoints AS my
WHERE my.value>=mypoints.value
);
What you are asking can be explained in two different ways, but I'm assuming you want to sort the resulting table and then number those rows according to the sort.
declare #resultrow int
select
#resultrow = row_number() OVER (ORDER BY Name Asc) as 'Row Number'
from Categories WHERE CategoryID = 'C_L776102U'
select #resultrow

Selecting a single (random) row for an SQL join

I've got an sql query that selects data from several tables, but I only want to match a single(randomly selected) row from another table.
Easier to show some code, I guess ;)
Table K is (k_id, selected)
Table C is (c_id, image)
Table S is (c_id, date)
Table M is (c_id, k_id, score)
All ID-columns are primary keys, with appropriate FK constraints.
What I want, in english, is for eack row in K that has selected = 1 to get a random row from C where there exists a row in M with (K_id, C_id), where the score is higher than a given value, and where c.image is not null and there is a row in s with c_id
Something like:
select k.k_id, c.c_id, m.score
from k,c,m,s
where k.selected = 1
and m.score > some_value
and m.k_id = k.k_id
and m.c_id = c.c_id
and c.image is not null
and s.c_id = c.c_id;
The only problem is this returns all the rows in C that match the criteria - I only want one...
I can see how to do it using PL/SQL to select all relevent rows into a collection and then select a random one, but I'm stuck as to how to select a random one.
you can use the 'order by dbms_random.random' instruction with your query.
i.e.:
SELECT column FROM
(
SELECT column FROM table
ORDER BY dbms_random.value
)
WHERE rownum = 1
References:
http://awads.net/wp/2005/08/09/order-by-no-order/
http://www.petefreitag.com/item/466.cfm
with analytics:
SELECT k_id, c_id, score
FROM (SELECT k.k_id, c.c_id, m.score,
row_number() over(PARTITION BY k.k_id ORDER BY NULL) rk
FROM k, c, m, s
WHERE k.selected = 1
AND m.score > some_value
AND m.k_id = k.k_id
AND m.c_id = c.c_id
AND c.image IS NOT NULL
AND s.c_id = c.c_id)
WHERE rk = 1
This will select one row that satisfies your criteria per k_id. This will likely select the same set of rows if you run the query several times. If you want more randomness (each run produces a different set of rows), you would replace ORDER BY NULL by ORDER BY dbms_random.value
I'm not too familiar with oracle SQL, but try using LIMIT random(), if there is such a function available.