Comparing Result Set Against Itself - sql

I have a query that I would like to compare against itself - so far this query is working (if there is a neater/better way to write it I'd like to know!) but it's producing some "duplicate" values.
SELECT
a.GroupID,
a.MemberH,
a.ChartID as ChartIDA,
b.ChartID as ChartIDB
FROM
(Select DISTINCT
s.GroupID,
c.ChartID,
m.MemberH
From Charts c, ChartRetrieval cr, Sites s, Members m
Where
c.ChartID=cr.ChartID
and cr.ChartScanningStatusID <> 331
and s.SiteID=c.SiteID
and s.ProjectID not in (1,2,111)
and m.MemberID=c.MemberID) a
INNER JOIN
(Select DISTINCT
s.GroupID,
c.ChartID,
m.MemberH
From Charts c, ChartRetrieval cr, Sites s, Members m
Where
c.ChartID=cr.ChartID
and cr.ChartScanningStatusID <> 331
and s.SiteID=c.SiteID
and s.ProjectID not in (1,2,111)
and m.MemberID=c.MemberID) b ON a.GroupID=b.GroupID AND a.MemberHICN=b.MemberHICN
WHERE
a.GroupID=b.GroupID
and a.MemberH=b.MemberH
and a.ChartID <> b.ChartID
Order By a.GroupID
So far the results are correct but as I said it's giving me some dupes.
IE -
Group ID | MemberH | ChartIDA | ChartIDB
-----------------------------------------
471021 | 810392941 | 4810391 | 2193845
-----------------------------------------
471021 | 810392941 | 2193845 | 4810391
I know these rows are technically not duplicates but the info is the same just flipped (so for me they are dupes lol).
Is there a way I can fix this?

A simple trick :
and a.ChartID <> b.ChartID
Instead of
and a.ChartID < b.ChartID
This will show only one of the two rows.

This is half an answer as it does not cover the duplicate rows. You could self join a CTE in order to have a smaller query:
;WITH SomeQuery AS (
SELECT
a.GroupID,
a.MemberH,
a.ChartID as ChartIDA,
b.ChartID as ChartIDB
FROM
(Select DISTINCT
s.GroupID,
c.ChartID,
m.MemberH
From Charts c, ChartRetrieval cr, Sites s, Members m
Where
c.ChartID=cr.ChartID
and cr.ChartScanningStatusID <> 331
and s.SiteID=c.SiteID
and s.ProjectID not in (1,2,111)
and m.MemberID=c.MemberID)
)
SELECT A.*
FROM SomeQuery A
INNER JOIN SomeQuery B
ON A.GroupID = B.GroupID AND A.MemberHICN = B.MemberHICN
WHERE
A.GroupID = B.GroupID
and A.MemberH = B.MemberH
and A.ChartID <> B.ChartID
Order By A.GroupID
More about Common Table Expressions here.

Related

Sql code for distinct fields

I was wondering if anyone can help me with this query.
I have two tables that I join together (DDS2ENVR.QBO AND KCA0001.ORTS)
THE QBO Table has a field labeled NIIN AND RIC. THE KCA0001.ORTS table has a field named SERVICE and OWN_RIC.
I Join the tables by QBO.RIC and ORTS.OWN_RIC. My dilemma is that under the NIIN field multiple rows can be identical but have different values for RIC.
Example:
NIIN RIC
123455 A
122222 B
123456 C
122222 A
I want to query a distinct count for NIINS that separates by the different service where it does not overlap. So example NIIN should only find distinct values only associated with A where the same NIIN is not found in B,C,D etc.
SELECT D.SERVICE, COUNT(C.NIIN)
FROM DDS2ENVR.QBO C
JOIN KCA0001.ORTS D ON D.OWN_RIC = C.RIC
WHERE C.SITE_ID = ('HEAA')
GROUP BY D.SERVICE
HAVING COUNT(DISTINCT C.NIIN) > 1
Please ask questions if this does not make any sense.
Using Not Exists
SELECT D.SERVICE, COUNT(C.NIIN)
FROM DDS2ENVR.QBO C
JOIN KCA0001.ORTS D ON D.OWN_RIC = C.RIC
WHERE C.SITE_ID = ('HEAA')
and NOT EXISTS (Select 1 from DDS2ENVR.QBO C1 where C1.NIIN = C.NIIN and C1.RIC <> C.RIC)
GROUP BY D.SERVICE
HAVING COUNT(DISTINCT C.NIIN) > 1
Also if the table DDS2ENVR.QBO doesn't contain duplicates and your dbms supports CTE
With cte as
(Select NIIN from DDS2ENVR.QBO group by NIIN having count(*) = 1)
SELECT D.SERVICE, COUNT(C.NIIN)
FROM DDS2ENVR.QBO C
JOIN KCA0001.ORTS D ON D.OWN_RIC = C.RIC
WHERE C.SITE_ID = ('HEAA')
and C.NIIN in (Select * from cte)
GROUP BY D.SERVICE
HAVING COUNT(DISTINCT C.NIIN) > 1

Restricting results to only rows where one value appears only once

I have a query that is more complex than the example here, but which needs to only return the rows where a certain field doesn't appear more than once in the data set.
ACTIVITY_SK STUDY_ACTIVITY_SK
100 200
101 201
102 200
100 203
In this example I don't want any records with an ACTIVITY_SK of 100 being returned because ACTIVITY_SK appears twice in the data set.
The data is a mapping table, and is used in many joins, but multiple records like this imply data quality issues and so I need to simply remove them from the results, rather than cause a bad join elsewhere.
SELECT
A.ACTIVITY_SK,
A.STATUS,
B.STUDY_ACTIVITY_SK,
B.NAME,
B.PROJECT
FROM
ACTIVITY A,
PROJECT B
WHERE
A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK
I had tried something like this:
SELECT
A.ACTIVITY_SK,
A.STATUS,
B.STUDY_ACTIVITY_SK,
B.NAME,
B.PROJECT
FROM
ACTIVITY A,
PROJECT B
WHERE
A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK
WHERE A.ACTIVITY_SK NOT IN
(
SELECT
A.ACTIVITY_SK,
COUNT(*)
FROM
ACTIVITY A,
PROJECT B
WHERE
A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK
GROUP BY A.ACTIVITY_SK
HAVING COUNT(*) > 1
)
But there must be a less expensive way of doing this...
Something like this could be a bit "cheaper" to run:
SELECT
A.ACTIVITY_SK,
A.STATUS,
B.STUDY_ACTIVITY_SK,
B.NAME,
B.PROJECT
PROJECT B INNER JOIN
(SELECT
ACTIVITY_SK,
MIN(STATUS) STATUS,
FROM
ACTIVITY
GROUP BY ACTIVITY_SK
HAVING COUNT(ACTIVITY_SK) = 1 ) A
ON A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK
Another alternative:
select * from (
SELECT
A.ACTIVITY_SK,
A.STATUS,
B.STUDY_ACTIVITY_SK,
B.NAME,
B.PROJECT,
count(distinct a.pk) over (partition by a.activity_sk) AS c
FROM
ACTIVITY A,
PROJECT B
WHERE
A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK
) where c = 1;
(where a.pk refers to a unique identifier from the ACTIVITY table)

Using (IN operator) OR condition in Where clause as AND condition

Please look at following image, I have explained my requirements in the image.
alt text http://img30.imageshack.us/img30/5668/shippment.png
I can't use here WHERE UsageTypeid IN(1,2,3,4) because this will behave as an OR condition and fetch all records.
I just want those records, of first table, which are attached with all 4 ShipmentToID .
All others which are attached with 3 or less ShipmentToIDs are not needed in result set.
Thanks.
if (EntityId, UsageTypeId) is unique:
select s.PrimaryKeyField, s.ShipmentId from shipment s, item a
where s.PrimaryKeyField = a.EntityId and a.UsageTypeId in (1,2,3,4)
group by s.PrimaryKeyField, s.ShipmentId having count(*) = 4
otherwise, 4-way join for the 4 fields,
select distinct s.* from shipment s, item a, item b, item c, item d where
s.PrimaryKeyField = a.EntityId = b.EntityId = c.EntityId = d.EntityId and
a.UsageTypeId = 1 and b.UsageTypeId = 2 and c.UsageTypeId = 3 and
d.UsageTypeId = 4
you'll want appropriate index on (EntityId, UsageTypeId) so it doesn't hang...
If there will never be duplicates of the UsageTypeId-EntityId combo in the 2nd table, so you'll never see:
EntityUsageTypeId | EntityId | UsageTypeId
22685 | 4477 | 1
22687 | 4477 | 1
You can count matching EntityIds in that table.
WHERE (count(*) in <tablename> WHERE EntityId = 4477) = 4
DECLARE #numShippingMethods int;
SELECT #numShippingMethods = COUNT(*)
FROM shippedToTable;
SELECT tbl1.shipmentID, COUNT(UsageTypeId) as Usages
FROM tbl2 JOIN tbl1 ON tbl2.EntityId = tbl1.EntityId
GROUP BY tbl1.EntityID
HAVING COUNT(UsageTypeId) = #numShippingMethods
This way is preferred to the multiple join against same table method, as you can simply modify the IN clause and the COUNT without needing to add or subtract more tables to the query when your list of IDs changes:
select EntityId, ShipmentId
from (
select EntityId
from (
select EntityId
from EntityUsage eu
where UsageTypeId in (1,2,3,4)
group by EntityId, UsageTypeId
) b
group by EntityId
having count(*) = 4
) a
inner join Shipment s on a.EntityId = s.EntityId

MySql scoping problem with correlated subqueries

I'm having this Mysql query, It works:
SELECT
nom
,prenom
,(SELECT GROUP_CONCAT(category_en) FROM
(SELECT DISTINCT category_en FROM categories c WHERE id IN
(SELECT DISTINCT category_id FROM m3allems_to_categories m2c WHERE m3allem_id = 37)
) cS
) categories
,(SELECT GROUP_CONCAT(area_en) FROM
(SELECT DISTINCT area_en FROM areas c WHERE id IN
(SELECT DISTINCT area_id FROM m3allems_to_areas m2a WHERE m3allem_id = 37)
) aSq
) areas
FROM m3allems m
WHERE m.id = 37
The result is:
nom prenom categories areas
Man Multi Carpentry,Paint,Walls Beirut,Baalbak,Saida
It works correclty, but only when i hardcode into the query the id that I want (37).
I want it to work for all entries in the m3allem table, so I try this:
SELECT
nom
,prenom
,(SELECT GROUP_CONCAT(category_en) FROM
(SELECT DISTINCT category_en FROM categories c WHERE id IN
(SELECT DISTINCT category_id FROM m3allems_to_categories m2c WHERE m3allem_id = m.id)
) cS
) categories
,(SELECT GROUP_CONCAT(area_en) FROM
(SELECT DISTINCT area_en FROM areas c WHERE id IN
(SELECT DISTINCT area_id FROM m3allems_to_areas m2a WHERE m3allem_id = m.id)
) aSq
) areas
FROM m3allems m
And I get an error:
Unknown column 'm.id' in 'where
clause'
Why?
From the MySql manual:
13.2.8.7. Correlated Subqueries
[...]
Scoping rule: MySQL evaluates from inside to outside.
So... do this not work when the subquery is in a SELECT section? I did not read anything about that.
Does anyone know? What should I do? It took me a long time to build this query... I know it's a monster query but it gets what I want in a single query, and I am so close to getting it to work!
Can anyone help?
You can only correlate one level deep.
Use:
SELECT m.nom,
m.prenom,
x.categories,
y.areas
FROM m3allens m
LEFT JOIN (SELECT m2c.m3allem_id,
GROUP_CONCAT(DISTINCT c.category_en) AS categories
FROM CATEGORIES c
JOIN m3allems_to_categories m2c ON m2c.category_id = c.id
GROUP BY m2c.m3allem_id) x ON x.m3allem_id = m.id
LEFT JOIN (SELECT m2a.m3allem_id,
GROUP_CONCAT(DISTINCT a.area_en) AS areas
FROM AREAS a
JOIN m3allems_to_areas m2a ON m2a.area_id = a.id
GROUP BY m2a.m3allem_id) y ON y.m3allem_id = m.id
WHERE m.id = ?
The reason for the error is that in the subquery m is not defined. It is defined later in the outer query.

Multiple MAX values select using inner join

I have query that work for me only when values in the StakeValue don't repeat.
Basically, I need to select maximum values from SI_STAKES table with their relations from two other tables grouped by internal type.
SELECT a.StakeValue, b.[StakeName], c.[ProviderName]
FROM SI_STAKES AS a
INNER JOIN SI_STAKESTYPES AS b ON a.[StakeTypeID] = b.[ID]
INNER JOIN SI_PROVIDERS AS c ON a.[ProviderID] = c.[ID] WHERE a.[EventID]=6
AND a.[StakeGroupTypeID]=1
AND a.StakeValue IN
(SELECT MAX(d.StakeValue) FROM SI_STAKES AS d
WHERE d.[EventID]=a.[EventID] AND d.[StakeGroupTypeID]=a.[StakeGroupTypeID]
GROUP BY d.[StakeTypeID])
ORDER BY b.[StakeName], a.[StakeValue] DESC
Results for example must be:
[ID] [MaxValue] [StakeTypeID] [ProviderName]
1 1,5 6 provider1
2 3,75 7 provider2
3 7,6 8 provider3
Thank you for your help
There are two problems to solve here.
1) Finding the max values per type. This will get the Max value per StakeType and make sure that we do the exercise only for the wanted events and group type.
SELECT StakeGroupTypeID, EventID, StakeTypeID, MAX(StakeValue) AS MaxStakeValue
FROM SI_STAKES
WHERE Stake.[EventID]=6
AND Stake.[StakeGroupTypeID]=1
GROUP BY StakeGroupTypeID, EventID, StakeTypeID
2) Then we need to get only one return back for that value since it may be present more then once.
Using the Max Value, we must find a unique row for each I usually do this by getting the Max ID is has the added advantage of getting me the most recent entry.
SELECT MAX(SMaxID.ID) AS ID
FROM SI_STAKES AS SMaxID
INNER JOIN (
SELECT StakeGroupTypeID, EventID, StakeTypeID, MAX(StakeValue) AS MaxStakeValue
FROM SI_STAKES
WHERE Stake.[EventID]=6
AND Stake.[StakeGroupTypeID]=1
GROUP BY StakeGroupTypeID, EventID, StakeTypeID
) AS SMaxVal ON SMaxID.StakeTypeID = SMaxVal.StakeTypeID
AND SMaxID.StakeValue = SMaxVal.MaxStakeValue
AND SMaxID.EventID = SMaxVal.EventID
AND SMaxID.StakeGroupTypeID = SMaxVal.StakeGroupTypeID
3) Now that we have the ID's of the rows that we want, we can just get that information.
SELECT Stakes.ID, Stakes.StakeValue, SType.StakeName, SProv.ProviderName
FROM SI_STAKES AS Stakes
INNER JOIN SI_STAKESTYPES AS SType ON Stake.[StakeTypeID] = SType.[ID]
INNER JOIN SI_PROVIDERS AS SProv ON Stake.[ProviderID] = SProv.[ID]
WHERE Stake.ID IN (
SELECT MAX(SMaxID.ID) AS ID
FROM SI_STAKES AS SMaxID
INNER JOIN (
SELECT StakeGroupTypeID, EventID, StakeTypeID, MAX(StakeValue) AS MaxStakeValue
FROM SI_STAKES
WHERE Stake.[EventID]=6
AND Stake.[StakeGroupTypeID]=1
GROUP BY StakeGroupTypeID, EventID, StakeTypeID
) AS SMaxVal ON SMaxID.StakeTypeID = SMaxVal.StakeTypeID
AND SMaxID.StakeValue = SMaxVal.MaxStakeValue
AND SMaxID.EventID = SMaxVal.EventID
AND SMaxID.StakeGroupTypeID = SMaxVal.StakeGroupTypeID
)
You can use the over clause since you're using T-SQL (hopefully 2005+):
select distinct
a.stakevalue,
max(a.stakevalue) over (partition by a.staketypeid) as maxvalue,
b.staketypeid,
c.providername
from
si_stakes a
inner join si_stakestypes b on
a.staketypeid = b.id
inner join si_providers c on
a.providerid = c.id
where
a.eventid = 6
and a.stakegrouptypeid = 1
Essentially, this will find the max a.stakevalue for each a.staketypeid. Using a distinct will return one and only one row. Now, if you wanted to include the min a.id along with it, you could use row_number to accomplish this:
select
s.id,
s.maxvalue,
s.staketypeid,
s.providername
from (
select
row_number() over (order by a.stakevalue desc
partition by a.staketypeid) as rownum,
a.id,
a.stakevalue as maxvalue,
b.staketypeid,
c.providername
from
si_stakes a
inner join si_stakestypes b on
a.staketypeid = b.id
inner join si_providers c on
a.providerid = c.id
where
a.eventid = 6
and a.stakegrouptypeid = 1
) s
where
s.rownum = 1