Find Duplicate Values in a column based on specific criteria

I have a table that holds actions against specific accounts. The actions are grouped into numbered action sets, and within each set every action gets a unique, sequential number. We ran into an issue where one of these unique numbers had somehow been duplicated, and we would like to check for other cases where this might have happened. The table looks a little like this:
Account | Action Set | Action No | Action Code
--------|------------|-----------|------------
001 | 1 | 1 | GEN
001 | 1 | 2 | PHO
001 | 1 | 3 | RAN
001 | 1 | 3 | GEN
002 | 1 | 1 | GEN
002 | 1 | 2 | PHO
002 | 1 | 3 | RAN
I have tried various things I've found through searches on here but can't find anything that looks like it fits my specific circumstances.
For any given account number, I would like to find where the same Action Number is used more than once within one Action Set. I also need to return the full row, not just a count of how many there are.
From the example above, I would expect to see these results (same account, same action set, same action number):
Account | Action Set | Action No | Action Code
--------|------------|-----------|------------
001 | 1 | 3 | RAN
001 | 1 | 3 | GEN
I would post what I have tried so far but honestly the extent of the code I have written so far is:
SELECT
TIA
Mark

Based on your description, you can use exists:
select t.*
from t
where exists (select 1
              from t t2
              where t2.account = t.account and
                    t2.actionset = t.actionset and
                    t2.actionno = t.actionno and
                    t2.actioncode <> t.actioncode
             );
EDIT:
The above assumes that the action codes of the duplicated rows are different. Otherwise you can use:
select t.*
from t
where (select count(*)
       from t t2
       where t2.account = t.account and
             t2.actionset = t.actionset and
             t2.actionno = t.actionno
      ) >= 2;
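For anyone who wants to check the requirement from the question (same account, same action set, same action number, full rows returned) against the sample data, here is a minimal sketch using Python's built-in sqlite3 module. The table name actions and the lowercase column names are assumptions made for the example, and it uses a GROUP BY / HAVING join rather than EXISTS, so treat it as one possible variant rather than the answer's exact method.

```python
import sqlite3

# In-memory database with the sample rows from the question.
# Table and column names are assumed for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE actions (account TEXT, actionset INTEGER, "
    "actionno INTEGER, actioncode TEXT)"
)
conn.executemany(
    "INSERT INTO actions VALUES (?, ?, ?, ?)",
    [
        ("001", 1, 1, "GEN"), ("001", 1, 2, "PHO"),
        ("001", 1, 3, "RAN"), ("001", 1, 3, "GEN"),
        ("002", 1, 1, "GEN"), ("002", 1, 2, "PHO"),
        ("002", 1, 3, "RAN"),
    ],
)

# Find every (account, actionset, actionno) key that occurs more than once,
# then join back to the table to return the full rows for those keys.
duplicates = conn.execute(
    """
    SELECT a.account, a.actionset, a.actionno, a.actioncode
    FROM actions AS a
    JOIN (
        SELECT account, actionset, actionno
        FROM actions
        GROUP BY account, actionset, actionno
        HAVING COUNT(*) > 1
    ) AS d
      ON d.account = a.account
     AND d.actionset = a.actionset
     AND d.actionno = a.actionno
    ORDER BY a.account, a.actionset, a.actionno, a.actioncode DESC
    """
).fetchall()

print(duplicates)
```

Unlike a comparison on the action code, this variant also reports rows that are duplicated in every column.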

try this one
Select account,actionset,actioncode,actionno
from table
where (account,actionset,actionno)
IN
(
Select account,actionset,actionno from table
group by account,actionset,actionno
having count(*)>1
)

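Please find my solution for getting duplicate records from the table. Note that rnk > 1 returns only the second and later copies of each duplicated action number, not the first one.
SELECT [Account],[ActionSet],[ActionCode],[ActionNo]
FROM
(
SELECT *,ROW_NUMBER() OVER(PARTITION BY [Account],[ActionSet],[ActionNo] ORDER BY
[ActionNo]) as rnk FROM [dbo].[ActionAccount]
) t where t.rnk>1
Thanks.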

Related

JOIN two tables, but only include data from first table in first instance of each unique record

Title might be confusing.
I have a table of Cases, and each Case can contain many Tasks. To achieve a different workflow for each Task, I have different tables such as Case_Emails, Case_Calls, Case_Chats, etc...
I want to build a Query that will eventually be exported to Excel. In this query, I want to list out each Task, and the Tasks are already joined together via a UNION in another table using a common format. For each task in the Query, I want only the first Task associated with a case to include the details from Cases table. Example below:
+----+---------+------------+-------------+-------------+-------------+
| id | Case ID | Agent Name | Task Info 1 | Task Info 2 | Task Info 3 |
+----+---------+------------+-------------+-------------+-------------+
| 1 | 4000000 | Some Name | Detailstuff | Stuffdetail | Thingsyo |
| 2 | | | Detailstuff | Stuffdetail | Thingsyo |
| 3 | | | Detailstuff | Stuffdetail | Thingsyo |
| 4 | 4000003 | Some Name | Detailstuff | Stuffdetail | Thingsyo |
| 5 | | | Detailstuff | Stuffdetail | Thingsyo |
| 6 | 4000006 | Some Name | Detailstuff | Stuffdetail | Thingsyo |
+----+---------+------------+-------------+-------------+-------------+
My original approach was attempting a LEFT JOIN on Case ID, but I couldn't figure out how to filter the data out from the extra rows.

This would be much simpler if Access supported the ROW_NUMBER function. It doesn't, but you can sort of simulate it with a correlated subquery using the Tasks table (this assumes that each task has a unique numeric ID). This basically assigns a row number to each task, partitioned by the CaseID. Then you can just conditionally display the CaseID and AgentName where RowNum = 1.
SELECT Switch(RowNum = 1, CaseID) as Case,
Switch(RowNum = 1, AgentName) as Agent,
TaskName
FROM (
SELECT c.CaseID,
c.AgentName,
t.TaskName,
(select count(*)
from Tasks t2
where t2.CaseID = c.CaseID and t2.ID <= t.ID) as RowNum
FROM Cases c
INNER JOIN Tasks t ON c.CaseID = t.CaseID
order by c.CaseID, t.TaskName
)
You didn't post your table structure, so I'm not sure this will work for you as-is, but maybe you can adapt it.
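To make the row-number idea concrete, here is a small sketch run through Python's sqlite3 module instead of Access. The cases and tasks tables, their columns, and the sample rows are all made up for the example (the question did not post its structure), and CASE WHEN stands in for the Access Switch function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per case, many tasks per case.
conn.executescript(
    """
    CREATE TABLE cases (case_id INTEGER PRIMARY KEY, agent_name TEXT);
    CREATE TABLE tasks (id INTEGER PRIMARY KEY, case_id INTEGER, task_name TEXT);
    INSERT INTO cases VALUES (4000000, 'Some Name'), (4000003, 'Other Name');
    INSERT INTO tasks VALUES
        (1, 4000000, 'Email'), (2, 4000000, 'Call'), (3, 4000000, 'Chat'),
        (4, 4000003, 'Email'), (5, 4000003, 'Call');
    """
)

# The correlated COUNT(*) numbers each task within its case (1, 2, 3, ...).
# Case details are shown only on the row numbered 1 and are NULL elsewhere.
rows = conn.execute(
    """
    SELECT id,
           CASE WHEN row_num = 1 THEN cid END AS case_id,
           CASE WHEN row_num = 1 THEN agent END AS agent_name,
           task_name
    FROM (
        SELECT t.id,
               c.case_id AS cid,
               c.agent_name AS agent,
               t.task_name,
               (SELECT COUNT(*)
                FROM tasks AS t2
                WHERE t2.case_id = t.case_id AND t2.id <= t.id) AS row_num
        FROM cases AS c
        JOIN tasks AS t ON t.case_id = c.case_id
    ) AS numbered
    ORDER BY cid, id
    """
).fetchall()

for row in rows:
    print(row)
```

The inner columns are aliased (cid, agent) so that the final ORDER BY sorts on the real case id rather than on the blanked-out output column.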

No matter what, when you join you will get duplicate values. To remove the duplicates, either put a DISTINCT in your SELECT or add a GROUP BY after your filters. This should resolve the duplicates in your query for Task Info 1, 2 and 3.

Found out that I can name my tables in the query like so:
FROM Case_Calls Calls
With this other name, I was able to filter based on a sub query:
IIF( Calls.[ID] <> (select top 1 [ID] from Case_Calls where [Case ID] = Calls.[Case ID]), '', Cases.[Creator]) As [Case Creator]
This solution gives me the results that I want :) It's rather ugly SQL, and difficult to parse when I'm dealing with dozens of columns, but it gets the job done!
I'm still curious if there is a better solution...

Find spectators that have seen the same shows (match multiple rows for each)

For an assignment I have to write several SQL queries for a database stored on a server running PostgreSQL 9.3.0. However, I find myself stuck on the last query. The database models a reservation system for an opera house. The query is about associating a spectator with the other spectators who attend the same events every time.
The model looks like this:
Reservations table
id_res | create_date | tickets_presented | id_show | id_spectator | price | category
-------+---------------------+---------------------+---------+--------------+-------+----------
1 | 2015-08-05 17:45:03 | | 1 | 1 | 195 | 1
2 | 2014-03-15 14:51:08 | 2014-11-30 14:17:00 | 11 | 1 | 150 | 2
Spectators table
id_spectator | last_name | first_name | email | create_time | age
---------------+------------+------------+----------------------------------------+---------------------+-----
1 | gonzalez | colin | colin.gonzalez@gmail.com | 2014-03-15 14:21:30 | 22
2 | bequet | camille | bequet.camille@gmail.com | 2014-12-10 15:22:31 | 22
Shows table
id_show | name | kind | presentation_date | start_time | end_time | id_season | capacity_cat1 | capacity_cat2 | capacity_cat3 | price_cat1 | price_cat2 | price_cat3
---------+------------------------+--------+-------------------+------------+----------+-----------+---------------+---------------+---------------+------------+------------+------------
1 | madama butterfly | opera | 2015-09-05 | 19:30:00 | 21:30:00 | 2 | 315 | 630 | 945 | 195 | 150 | 100
2 | don giovanni | opera | 2015-09-12 | 19:30:00 | 21:45:00 | 2 | 315 | 630 | 945 | 195 | 150 | 100
So far I've started by writing a query to get the id of each spectator and the date of the show they are attending. The query looks like this:
SELECT Reservations.id_spectator, Shows.presentation_date
FROM Reservations
LEFT JOIN Shows ON Reservations.id_show = Shows.id_show;
Could someone help me understand the problem better and point me towards a solution? Thanks in advance.
So the result I'm expecting should be something like this
id_spectator | other_id_spectators
-------------+--------------------
1| 2,3
Meaning that every time spectator with id 1 went to a show, spectators 2 and 3 did too.

Note based on comments: Wanted to make clear that this answer may be of limited use as it was answered in the context of SQL-Server (tag was present at the time)
There is probably a better way to do it, but you could do it with the STUFF function. The only drawback here is that, since your ids are ints, placing a comma between values requires a workaround (the ids need to be cast to strings). Below is the method I can think of, using that workaround.
SELECT [id_spectator], [id_show]
, STUFF((SELECT ',' + CAST(A.[id_spectator] as NVARCHAR(10))
FROM reservations A
Where A.[id_show]=B.[id_show] AND a.[id_spectator] != b.[id_spectator] FOR XML PATH('')),1,1,'') As [other_id_spectators]
From reservations B
Group By [id_spectator], [id_show]
This will show you all other spectators that attended the same shows.

Meaning that every time spectator with id 1 went to a show, spectators 2 and 3 did too.
In other words, you want a list of ...
all spectators that have seen all the shows that a given spectator has seen (and possibly more than the given one)
This is a special case of relational division. We have assembled an arsenal of basic techniques here:
How to filter SQL results in a has-many-through relation
It is special because the list of shows each spectator has to have attended is dynamically determined by the given prime spectator.
Assuming that (id_spectator, id_show) is unique in reservations, which has not been clarified.
A UNIQUE constraint on those two columns (in that order) also provides the most important index.
For best performance in query 2 and 3 below also create an index with leading id_show.
1. Brute force
The primitive approach would be to form a sorted array of shows the given user has seen and compare the same array of others:
SELECT 1 AS id_spectator, array_agg(sub.id_spectator) AS id_other_spectators
FROM (
SELECT id_spectator
FROM reservations r
WHERE id_spectator <> 1
GROUP BY 1
HAVING array_agg(id_show ORDER BY id_show)
@> (SELECT array_agg(id_show ORDER BY id_show)
FROM reservations
WHERE id_spectator = 1)
) sub;
But this is potentially very expensive for big tables. The whole table has to be processed, and in a rather expensive way, too.
2. Smarter
Use a CTE to determine relevant shows, then only consider those
WITH shows AS ( -- all shows of id 1; 1 row per show
SELECT id_spectator, id_show
FROM reservations
WHERE id_spectator = 1 -- your prime spectator here
)
SELECT sub.id_spectator, array_agg(sub.other) AS id_other_spectators
FROM (
SELECT s.id_spectator, r.id_spectator AS other
FROM shows s
JOIN reservations r USING (id_show)
WHERE r.id_spectator <> s.id_spectator
GROUP BY 1,2
HAVING count(*) = (SELECT count(*) FROM shows)
) sub
GROUP BY 1;
@> (used in query 1) is the "contains" operator for arrays, so we get all spectators that have seen at least the same shows.
Query 2 is faster than query 1 because only relevant shows are considered.
3. Real smart
To also eliminate spectators that are not going to qualify early in the query, use a recursive CTE:
WITH RECURSIVE shows AS ( -- produces exactly 1 row
SELECT id_spectator, array_agg(id_show) AS shows, count(*) AS ct
FROM reservations
WHERE id_spectator = 1 -- your prime spectator here
GROUP BY 1
)
, cte AS (
SELECT r.id_spectator, 1 AS idx
FROM shows s
JOIN reservations r ON r.id_show = s.shows[1]
WHERE r.id_spectator <> s.id_spectator
UNION ALL
SELECT r.id_spectator, idx + 1
FROM cte c
JOIN reservations r USING (id_spectator)
JOIN shows s ON s.shows[c.idx + 1] = r.id_show
)
SELECT s.id_spectator, array_agg(c.id_spectator) AS id_other_spectators
FROM shows s
JOIN cte c ON c.idx = s.ct -- has an entry for every show
GROUP BY 1;
Note that the first CTE is non-recursive. Only the second part is recursive (iterative really).
This should be fastest for small selections from big tables. Rows that don't qualify are excluded early. The two indices I mentioned are essential.
SQL Fiddle demonstrating all three.
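If you don't have a PostgreSQL instance at hand, the counting idea behind query 2 can be tried out with Python's sqlite3 module. This is only a sketch under assumptions: the sample reservations are invented, the helper name companions is hypothetical, and it uses plain COUNT(DISTINCT ...) because SQLite has no arrays or array_agg.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reservations (id_spectator INTEGER, id_show INTEGER)")
# Invented sample data: spectator 1 saw shows 1 and 11.
conn.executemany(
    "INSERT INTO reservations VALUES (?, ?)",
    [
        (1, 1), (1, 11),
        (2, 1), (2, 11), (2, 5),   # saw everything 1 saw, plus one more
        (3, 1), (3, 11),           # saw exactly what 1 saw
        (4, 1),                    # missed show 11
    ],
)


def companions(conn, prime):
    """Spectators who attended every show the prime spectator attended."""
    rows = conn.execute(
        """
        SELECT r.id_spectator
        FROM reservations AS r
        WHERE r.id_spectator <> :prime
          AND r.id_show IN (SELECT id_show
                            FROM reservations
                            WHERE id_spectator = :prime)
        GROUP BY r.id_spectator
        HAVING COUNT(DISTINCT r.id_show) =
               (SELECT COUNT(DISTINCT id_show)
                FROM reservations
                WHERE id_spectator = :prime)
        ORDER BY r.id_spectator
        """,
        {"prime": prime},
    ).fetchall()
    return [row[0] for row in rows]


others = companions(conn, 1)
print(others)
```

Only the prime spectator's shows are considered, and a spectator qualifies when the number of matching shows equals the prime spectator's total, which is the relational-division test described above.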

It sounds like you have one half of the total question--determining which id_shows a particular id_spectator attended.
What you want to ask yourself is how you can determine which id_spectators attended an id_show, given an id_show. Once you have that, combine the two answers to get the full result.

So the final answer I got, looks like this :
SELECT id_spectator, id_show,(
SELECT string_agg(to_char(A.id_spectator, '999'), ',')
FROM Reservations A
WHERE A.id_show=B.id_show
) AS other_id_spectators
FROM Reservations B
GROUP By id_spectator, id_show
ORDER BY id_spectator ASC;
Which prints something like this:
id_spectator | id_show | other_id_spectators
-------------+---------+---------------------
1 | 1 | 1, 2, 9
1 | 14 | 1, 2
Which suits my needs, however if you have any improvements to offer, please share :) Thanks again everybody!

Access SQL Max-Function

I have a question concerning MS Access queries involving these tables:
tblMIDProcessMain ={ Process_ID,Process_Title,...}
tblMIDProcessVersion = { ProcessVersion_ID, ProcessVersion_FK_Process, ProcessVersion_VersionNo, ProcessVersion_FK_Status, ...}
tblMIDProcessVersionStatus = { ProcessVersionStatus_ID,ProcessVersionStatus_Value }
The tables store different versions of a process description. The "ProcessVersion_VersionNo" field contains an integer providing the version number. Now I would like to get for each process the highest version number thus the current version. If I do the following it kind of works:
SELECT tblMIDProcessMain.Process_Titel
, Max(tblMIDProcessVersion.ProcessVersion_VersionNo) AS CurrentVersion
FROM tblMIDProcessMain
INNER JOIN tblMIDProcessVersion
ON tblMIDProcessMain.Process_ID = tblMIDProcessVersion.ProcessVersion_FK_Process
GROUP BY tblMIDProcessMain.Process_Titel;
The query returns a recordset with each existing process_title and the respective max number of the version field. But as soon as I add other fields like "ProcessVersion_FK_Status" in the Select statement the query stops working.
Any help would be appreciated. Thanks in advance.
Jon
Edit:
To clarify things a little I added a simplified example
Parent-Table:
Process_ID | Process_Title
----------------------------------
1 | "MyProcess"
2 | "YourProcess"
Child-Table:
Version_ID | Version_FK_ProcessID | Version_No | Version_Status
---------------------------------------------------------------------------
1 | 1 | 1 | "New"
2 | 2 | 1 | "Discarded"
3 | 2 | 2 | "Reviewed"
4 | 2 | 3 | "Released"
Intended Result:
Title | Max_Version_No | Status
--------------------------------------------------------
MyProcess | 1 | "New"
YourProcess | 3 | "Released"

Given the example tables you updated your post with, this should work:
select p.process_title as Title
, max_version.max_version_no
, c.version_status as status
from (parenttable p
inner join (select max(version_no) as max_version_no, version_fk_processid from childtable group by version_fk_processid) max_version
on p.process_id = max_version.version_fk_processid)
inner join childtable c
on max_version.max_version_no = c.version_no and max_version.version_fk_processid = c.version_fk_processid
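Here is a quick way to try the "highest version per process" pattern on the simplified example from the question, using Python's sqlite3 module rather than Access. The table names parenttable and childtable and the lowercase column names are assumptions, and the sketch takes the maximum of the version number (not the version id) before joining back for the status.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified parent/child tables from the question's edit (names assumed).
conn.executescript(
    """
    CREATE TABLE parenttable (process_id INTEGER PRIMARY KEY, process_title TEXT);
    CREATE TABLE childtable (
        version_id INTEGER PRIMARY KEY,
        version_fk_processid INTEGER,
        version_no INTEGER,
        version_status TEXT
    );
    INSERT INTO parenttable VALUES (1, 'MyProcess'), (2, 'YourProcess');
    INSERT INTO childtable VALUES
        (1, 1, 1, 'New'),
        (2, 2, 1, 'Discarded'),
        (3, 2, 2, 'Reviewed'),
        (4, 2, 3, 'Released');
    """
)

# Step 1 (derived table m): highest version number per process.
# Step 2: join back to the child table to pick up that version's status.
current_versions = conn.execute(
    """
    SELECT p.process_title, c.version_no, c.version_status
    FROM parenttable AS p
    JOIN childtable AS c
      ON c.version_fk_processid = p.process_id
    JOIN (
        SELECT version_fk_processid, MAX(version_no) AS max_no
        FROM childtable
        GROUP BY version_fk_processid
    ) AS m
      ON m.version_fk_processid = c.version_fk_processid
     AND m.max_no = c.version_no
    ORDER BY p.process_id
    """
).fetchall()

print(current_versions)
```

Keeping the aggregate in its own derived table is what lets extra columns such as the status appear without being added to the GROUP BY.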

I assume you are adding the new field to the GROUP BY clause? If not, then you must either include it in the GROUP BY or wrap it in an aggregate function such as Max or First.

Fetch Id's that are related to a specific set of items, but not others

Good morning all, and apologies for the title; I had trouble simplifying the problem down to one line. My database platform is Teradata.
I am working with a table like the following (let's call it "t1"):
+------------+----------------------------------------+
| Service_Id | Product |
+------------+----------------------------------------+
| 1 | Traffic |
| 1 | Weather |
| 1 | Travel |
| 1 | Audio |
| 1 | Audio Add-on |
| 2 | Traffic |
| 2 | Weather |
| 2 | Travel |
+------------+----------------------------------------+
I am trying to select service_id's that are related to the following products AND ONLY the following products: Traffic, Weather, Travel
"Service_Id = 1" does not apply here because while it has the required products, it also has an "audio" product related to it... so we have to leave it out. I was able to successfully do this through a series of temp (volatile) tables but it's feeling really hacky and I feel there's got to be a better way. Thanks for your assistance.

I'm doing stuff like that (find a subset/superset/exact match for a set of rows) in my training classes using pizzas :-)
There are several ways to get your result, but for an exact match the easiest way is a SUM using following logic:
SELECT service_id
FROM t1
GROUP BY 1
HAVING
SUM(CASE WHEN Product IN ('Traffic', 'Weather', 'Travel') THEN 1 ELSE -1 END) = 3
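To see why the SUM trick gives an exact match, here is a small sketch using Python's sqlite3 module instead of Teradata. It loads the sample rows from the question plus one extra, made-up service (id 3) that has only two of the three products, and it assumes each product appears at most once per service.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (service_id INTEGER, product TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?)",
    [
        (1, "Traffic"), (1, "Weather"), (1, "Travel"),
        (1, "Audio"), (1, "Audio Add-on"),
        (2, "Traffic"), (2, "Weather"), (2, "Travel"),
        (3, "Traffic"), (3, "Weather"),   # extra row set, not in the question
    ],
)

# Each wanted product scores +1 and every other product scores -1.
# The total is 3 only when all three wanted products are present
# and nothing else is attached to the service.
exact_matches = conn.execute(
    """
    SELECT service_id
    FROM t1
    GROUP BY service_id
    HAVING SUM(CASE WHEN product IN ('Traffic', 'Weather', 'Travel')
                    THEN 1 ELSE -1 END) = 3
    ORDER BY service_id
    """
).fetchall()

print(exact_matches)
```

Service 1 scores 3 - 2 = 1 and service 3 scores 2, so only service 2 reaches the target of 3.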

Assuming that Product is unique for every service_ID.
SELECT service_ID
FROM tableName a
WHERE Product IN ('Traffic', 'Weather', 'Travel') AND
EXISTS
(
SELECT 1
FROM tableName b
WHERE a.Service_ID = b.Service_ID
GROUP BY b.Service_ID
HAVING COUNT(*) = 3 -- <<== total number of products
)
GROUP BY service_ID
HAVING COUNT(*) = 3 -- <<== total number of products
SQLFiddle Demo (demo is running under MySQL database, not sure if it will work on teradata)

complex django or SQL query

A record can have status 'renewal_required'. If it enters this status, and the applicant indeed renews, a copy is generated, which enters status 'in_process' (But an application can have status 'in_process' for other reasons too).
Now I need to get all records that have renewal_required status, BUT, if a copy exists in status 'in_process' for a given applicant, I shall only show that one...the key is the applicant_id, being the same for copied records.
| id | status | applicant_id |
| 1 | renewal_required | 2 |
| 2 | in_process | 3 |
| 3 | renewal_required | 4 |
| 4 | in_process | 4 |
in the above example, records with id 1 and 4 would be returned...
Can this be done? Thanks for any suggestion (DB-redesign excluded, even if the design looks ridiculous - can't do anything about it right now)
Solution needs to be for django but if a SQL solution is being proposed I will happily accept it and adapt/execute directly

select a.applicant_id,COALESCE(b.status,a.status) status from
(select applicant_id,status from yourtable where status='renewal_required') a
left join
(select applicant_id,status from yourtable where status='in_process') b
on a.applicant_id = b.applicant_id;
check the DEMO

Here, a possible solution
SELECT MAX(t1.id) as max_id, t1.status, t1.applicant_id
FROM t1
JOIN (
SELECT MIN(status) as status, applicant_id
FROM t1
WHERE status in ('renewal_required', 'in_process')
GROUP by applicant_id ) tmp
ON t1.status = tmp.status
AND t1.applicant_id = tmp.applicant_id
GROUP BY t1.status, t1.applicant_id
SQL Fiddle
EDIT: On second thought, this won't work if there are more statuses than just these two, because of SELECT MIN(status). Could you comment on that?
EDIT2: It might work after all with this change: added WHERE status in ('renewal_required', 'in_process').
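For comparison, here is a sketch of the rule from the question expressed with EXISTS / NOT EXISTS, run through Python's sqlite3 module. The table name records is an assumption; the rows are the four from the question. It returns full rows (including id), which may be easier to adapt to a Django raw query than the aggregate versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, status TEXT, applicant_id INTEGER)"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [
        (1, "renewal_required", 2),
        (2, "in_process", 3),
        (3, "renewal_required", 4),
        (4, "in_process", 4),
    ],
)

# Keep an in_process row only if the same applicant also has a
# renewal_required row (it is the renewal copy); keep a renewal_required
# row only if that applicant has no in_process copy yet.
to_show = conn.execute(
    """
    SELECT r.id, r.status, r.applicant_id
    FROM records AS r
    WHERE (r.status = 'in_process'
           AND EXISTS (SELECT 1 FROM records AS x
                       WHERE x.applicant_id = r.applicant_id
                         AND x.status = 'renewal_required'))
       OR (r.status = 'renewal_required'
           AND NOT EXISTS (SELECT 1 FROM records AS x
                           WHERE x.applicant_id = r.applicant_id
                             AND x.status = 'in_process'))
    ORDER BY r.id
    """
).fetchall()

print(to_show)
```

Because the statuses are compared by name rather than through MIN(status), additional status values in the table do not affect the result.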
