SQL Finding multiple-line duplicates - sql

At my worki we have data stored in a database, the data is not normalized. I am looking for a way to find what data was duplicated.
Our Data base has 3 rows columns, Name, State, Strategy
This data might looks something like this:
OldTable:
Name | State | Strat
-----+-------+------
A | M | 1
A | X | 3
B | T | 6
C | M | 1
C | X | 3
D | X | 3
What I'd like to do is move the data to two tables, one containing the name the other containing the set of State and Strats so it would look more like this
NewTable0:
Name | StratID
-----+--------
A | 1
B | 2
C | 1
D | 3
NewTable1:
StratID | State | Strat
--------+-------+------
1 | M | 1
1 | X | 3
2 | T | 6
3 | X | 3
So in the data example A and C would be duplicates, but D would not be. How would I go about finding and/or identifying these duplicates?

Try:
SELECT OT1.Name Name1, OT2.Name Name2
FROM OldTable OT1
JOIN OldTable OT2 ON OT1.Name < OT2.Name AND
OT1.State = OT2.State AND
OT1.Strat = OT2.Strat
GROUP BY OT1.Name, OT2.Name
HAVING COUNT(*) = (SELECT COUNT(*) FROM OldTable TC1 WHERE TC1.NAME = OT1.NAME)
AND COUNT(*) = (SELECT COUNT(*) FROM OldTable TC2 WHERE TC2.NAME = OT2.NAME)

You could find this out by grouping the Names together, and only listing those where there is more than one record:
SELECT OldTable.Name, COUNT(1) Duplicates
FROM OldTable
GROUP BY OldTable.Name
HAVING Duplicates > 1
Should output:
OldTable:
Name | Duplicates
-----+------------
A | 2
C | 2

Related

SQL generate Data based of the ids of three tables

I have three tables store, gender, age_group each of these tables have ids. I need to generate table data for each one all possible combinations of the three.
ex. store_id = (1,2,3) gender_id = (1,2,3) age_group_id = (1,2,3)
so that i have a table that looks like this:
|store_id|gender_id|age_group_id|
|:------:|:-------:|:----------:|
| 1 | 1 | 1 |
| 1 | 2 | 1 |
| 1 | 3 | 1 |
| 2 | 1 | 3 |
| 2 | 2 | 3 |
| 3 | 1 | 3 |
| 3 | 2 | 3 |
etc. continuing on until each combination is populated, any suggestions on best approach to do this in SQL
Cross join the three tables:
select
s.Id as store_id,
g.Id as gender_id,
a.Id as age_group_id
from store s
cross join gender g
cross join age_group a

how to select unique records from a table based on a column which has distinct values in another column

I have below table SUBJ_SKILLS which has records like
TCHR_ID | LINE_NBR | SUBJ | SUBJ_TYPE
--------| ------- | ---------- | ----------
1 | 1 | Maths | R
1 | 2 | 101 | U
2 | 1 | BehaviourialTech | U
3 | 2 | Maths | R
4 | 1 | RegionalLANG | U
5 | 3 | ForeignLANG | U
5 | 4 | Maths | R
6 | 2 | Science | R
7 | 1 | 101 | U
7 | 3 | Physics | R
..
..
I am trying to retrieve records like below (i.e. single teacher who taught multiple different subjects)
TCHR_ID | LINE_NBR | SUBJ | SUBJ_TYPE
--------| ------- | ---------- | ----------
5 | 3 | ForeignLANG | U
5 | 4 | Maths | R
7 | 1 | 101 | U
7 | 3 | Physics | R
1 | 1 | Maths | R
1 | 2 | 101 | U
Here, the line numbers are unique, means that TCHR_ID:5 taught Physics (which was LINE_NBR=1, but was removed later). So, the LINE_NBR are not updated and stay as is.
i also have a look up table (SUBJ_LKUP) for subject and their categories/type like below ('R' for Regular subject and 'U' for Unique subject )
SUBJ | SUBJ_TYPE
----------------- | ------------
Maths | R
Physics | R
ForeignLANG | U
101 | U
Science | R
BehaviourialTech | U
RegionalLANG | U
My approach to resolve this was to create a table which have 2 records for Teacher and use another query on base table (SUBJ_SKILLS) and new table to filter out distinct records. I came up with below queries..
Query-1:
create table tchr_with_2_subj as select SS.TCHR_ID
from SUBJ_SKILLS SS, SUBJ_LKUP SL
where SS.SUBJ = SL.SUBJ
and SL.SUBJ_TYPE IN ('R', 'U') AND SS.TCHR_ID IN
(select SS.TCHR_ID from SUBJ_SKILLS SS)
GROUP BY SS.TCHR_ID HAVING COUNT(*) = 2)
Query-2:
select SS.TCHR_ID from SUBJ_SKILLS SS, tchr_with_2_subj tw2s
where SS.TCHR_ID = tw2s.TCHR_ID
GROUP BY SS.TCHR_ID,SS.SUBJ_TYPE HAVING COUNT(*) > 1)
Question:
1)'IN' condition in Query-1 is causing problems and pulling wrong records.
2) Is there a better way to write query to pull matching records using a single query (i.e. instead of creating a table)
Could someone help me on this pls.
For the answer to your original question, I would use window functions:
select ss.*
from (select ss.*,
min(subj) over (partition by tchr_id) as mins,
max(subj) over (partition by tchr_id) as maxs
from SUBJ_SKILLS ss
) ss
where mins <> maxs;
It is unclear how the subject type fits in, but if you need to include that, similar logic will work.
Your second table can be obtained from your first table with:
select ss.*
from
subj_skills as ss
inner join (
select tchr_id
from subj_skills
group by tchr_id
having count(*) > 1
) as mult on mult.tchr_id=ss.tchr_id;
I'd use analytic functions here, asomething like:
select tchr_id, line_nbr, subj, SUBJ_TYPE
from (select count(distinct subj) over (partition by tchr_id) as grp_cnt,
s.*
from subj_skills s)
where grp_cnt > 1
If you need to filter out invalid records, you can do it in the inner query. If a teacher cannot teach the same subject multiple times (the req 'multiple different subjects' can be translated to 'multiple subjects'), then I'd rather use count(*) instead of count(distinct subj).

Why isn't this returning unique combinations of these attributes?

When using the following query:
with neededSkills(SkillCode) as (
select distinct SkillCode
from job natural join hasprofile natural join requires_skill
where job_code = '1'
minus
select skillcode
from person natural join hasskill
where id = '1'
)
select distinct
taughtin.c_code as c,
count(taughtin.skillcode) as s,
ti.c_code as cc,
count(ti.skillcode) as ss
from taughtin, taughtin ti
where taughtin.c_code <> ti.c_code
and taughtin.skillcode <> ti.skillcode
and taughtin.skillcode in (select skillcode from neededskills)
and ti.skillcode in (select skillcode from neededskills)
group by (taughtin.c_code, ti.c_code)
order by (taughtin.c_code);
It returns:
C | S | CC | SS
----|----|----|----
1 | 1 | 2 | 1
1 | 1 | 3 | 1
1 | 1 | 5 | 1
2 | 1 | 1 | 1
3 | 1 | 1 | 1
5 | 1 | 1 | 1
I would expect it to return only lines where the combination of C and CC was not already used. Do I misunderstand how group by works? How would I achieve this result?
I am trying to have it return:
C | S | CC | SS
----|----|----|----
1 | 1 | 2 | 1
1 | 1 | 3 | 1
1 | 1 | 5 | 1
I use Oracle SQLPlus.
You're grouping on the combination of taughtin.c_code and ti.c_code, which are seperate columns in the context of the query (even though they are the same column in the schema). A pair of 1, 2 is not the same as a pair of 2, 1; the values may be the same but the sources are not.
If you want to get the combinations one way but not the other then the simplest thing is to always make one value large than the other; instead of:
where taughtin.c_code <> ti.c_code
use:
where ti.c_code > taughtin.c_code
Though it would be better to use ANSI joins for the main query too, and I'm not a fan of natural joins. You also don't need either distinct; the first may eliminate duplicates but they don't logically matter if you're only using the temporary result set for in()

Exclude a group when one row match in another table

I have two tables :
the first one called "card" with one column "id".
| id |
| 1 |
| 2 |
| 3 |
| .. |
The second table is named "waste" with two columns "card_id" and "waste_type".
| card_id | waste_type |
| 1 | 1 |
| 1 | 3 |
| 2 | 2 |
| 2 | 1 |
And i want to select only the card where there is no waste_type = 2
The query should look like this :
SELECT c.id FROM card c
JOIN waste w
ON c.id = w.card_id
WHERE waste_type <> 2
I want this result :
id
1
But i get :
id
1
2
How can i do that ? Thank you so much in advance !
You should use a not exists clause for that.
select c.id
from card c
where not exists (select null from waste w
where w.card_id = c.id
and w.waste_type = 2)
With your query, I would guess you rather retrieve
1
1
2

How to write a Sql query to find distinct values that have never met the following "Where Not(a=x and b=x)"

I have the following table called Attributes
* AttId * CustomerId * Class * Code *
| 1 | 1 | 1 | AA |
| 2 | 1 | 1 | AB |
| 3 | 1 | 1 | AC |
| 4 | 1 | 2 | AA |
| 5 | 1 | 2 | AB |
| 6 | 1 | 3 | AB |
| 7 | 2 | 1 | AA |
| 8 | 2 | 1 | AC |
| 9 | 2 | 2 | AA |
| 10 | 3 | 1 | AB |
| 11 | 3 | 3 | AB |
| 12 | 4 | 1 | AA |
| 13 | 4 | 2 | AA |
| 14 | 4 | 2 | AB |
| 15 | 4 | 3 | AB |
Where each Class, Code pairing represents a specific Attribute.
I'm trying to write a query that returns all customers that are NOT linked to the Attribute pairing Class = 1, Code = AB.
This would return Customer Id values 2 and 4.
I started to write Select Distinct A.CustomerId From Attributes A Where (A.Class = 1 and A.Code = 'AB') but stopped when I realised I was writing a SQL query and there is not an operator available to place before the parentheses to indicate the clause within must Not be met.
What am I missing? Or which operator should I be looking at?
Edit:
I'm trying to write a query that only returns those Customers (ie distinct Customer Id's) that have NO link to the Attribute pairing Class = 1, Code = AB.
This could only be Customer Id values 2 and 4 as the table does Not contain the rows:
* AttId * CustomerId * Class * Code *
| x | 2 | 1 | AB |
| x | 4 | 1 | AB |
Changed Title from:
How to write "Where Not(a=x and b=x)"in Sql Query
To:
How to write a Sql query to find distinct values that have never met the following "Where Not(a=x and b=x)"
As the previous title was a question in it's own right however the detail of the question added an extra dimension which led to confusion.
One way would be
SELECT DISTINCT CustomerId FROM Attributes a
WHERE NOT EXISTS (
SELECT * FROM Attributes forbidden
WHERE forbidden.CustomerId = a.CustomerId AND forbidden.Class = _forbiddenClassValue_ AND forbidden.Code = _forbiddenCodeValue_
)
or with join
SELECT DISTINCT a.CustomerId FROM Attributes a
LEFT JOIN (
SELECT CustomerId FROM Attributes
WHERE Class = _forbiddenClassValue_ AND Code = _forbiddenCodeValue_
) havingForbiddenPair ON a.CustomerId = havingForbiddenPair.CustomerId
WHERE havingForbiddenPair.CustomerId IS NULL
Yet another way is to use EXCEPT, as per ypercube's answer
SELECT CustomerId
FROM Attributes
EXCEPT
SELECT CustomerId
FROM Attributes
WHERE Class = 1
AND Code = AB ;
Since no one has posted the simple logical statement, here it is:
select . . .
where A.Class <> 1 OR A.Code <> 'AB'
The negative of (X and Y) is (not X or not Y).
I see, this is a grouping thing. For this, you use aggregation and having:
select customerId
from Attributes a
group by CustomerId
having sum(case when A.Class = 1 and A.Code = 'AB' then 1 else 0 end) = 0
I always prefer to solve "is it in a set" type questions using this technique.
Select Distinct A.CustomerId From Attributes A Where not (A.Class = 1 and A.Code = 'AB')
Try this:
SELECT DISTINCT A.CustomerId From Attributes A Where
0 = CASE
WHEN A.Class = 1 and A.Code = 'AB' THEN 1
ELSE 0
END
Edit: of course this still gives you cust 1 (doh!), you should probably use pjotrs NOT EXISTS query ideally, serves me right for not looking at the data closely enough :)