Multiple rows from Left Join in SQL where rows are uniquely matched - sql

I have two views that I am trying to join. I am joining on three elements, date, case number and surgeon id number. Each should only have one match for the previous case out value, but I am getting multiple rows after my left join.
Here is my code:
CREATE VIEW [dbo].[OR]
AS
SELECT DISTINCT
[ID].*,
[BYSURG].[PREV_PAT_OUT] AS PrevPtOut
FROM
[dbo].[OR_LOG_INDEXED] [ID]
LEFT JOIN
[DBO].[OR_CASE_NUM] BYSURG ON [ID].[SURG_DT] = [BYSURG].[SURG_DT]
AND [ID].[SURGEON_ID] = [BYSURG].[SURGEON_ID]
AND [ID].[CASE_NUM_BY_ROOM] = [BYSURG].[CASE_NUM_BY_ROOM_ADJ]
Any insights are much appreciated.
Thanks!
M

Replace your select block with one that retrieves all columns:
SELECT
*
FROM
[dbo].[OR_LOG_INDEXED] [ID]
LEFT JOIN
[DBO].[OR_CASE_NUM] BYSURG ON [ID].[SURG_DT] = [BYSURG].[SURG_DT]
AND [ID].[SURGEON_ID] = [BYSURG].[SURGEON_ID]
AND [ID].[CASE_NUM_BY_ROOM] = [BYSURG].[CASE_NUM_BY_ROOM_ADJ]
Run it and look at your "duplicate" rows: something about them will no longer be a duplicate. Perhaps you've forgotten to include some other criterion in your join condition or where clause.
Putting DISTINCT in the select block is not the answer. Find out which data element differs between the "duplicate" rows, and then filter out the rows you don't want.
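One way to find the differing element is to count how many right-hand rows share each join key. The sketch below uses Python's built-in sqlite3 module with simplified, made-up tables (or_log, or_case_num and their rows are assumptions for illustration, not the asker's real views), so treat it as a starting point rather than a drop-in fix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for OR_LOG_INDEXED and OR_CASE_NUM.
    CREATE TABLE or_log (surg_dt TEXT, surgeon_id INTEGER, case_num INTEGER);
    CREATE TABLE or_case_num (surg_dt TEXT, surgeon_id INTEGER,
                              case_num_adj INTEGER, prev_pat_out TEXT);
    INSERT INTO or_log VALUES ('2020-01-01', 7, 2);
    -- Two right-hand rows share the same join key but differ in prev_pat_out.
    INSERT INTO or_case_num VALUES ('2020-01-01', 7, 2, '09:15');
    INSERT INTO or_case_num VALUES ('2020-01-01', 7, 2, '09:40');
""")

# Join keys that occur more than once in the right-hand table.
dup_keys = conn.execute("""
    SELECT surg_dt, surgeon_id, case_num_adj, COUNT(*) AS n
    FROM or_case_num
    GROUP BY surg_dt, surgeon_id, case_num_adj
    HAVING COUNT(*) > 1
""").fetchall()

# The left join emits one output row per matching right-hand row.
joined = conn.execute("""
    SELECT l.*, r.prev_pat_out
    FROM or_log l
    LEFT JOIN or_case_num r
      ON l.surg_dt = r.surg_dt
     AND l.surgeon_id = r.surgeon_id
     AND l.case_num = r.case_num_adj
""").fetchall()

print(dup_keys)     # the key that is not unique on the right side
print(len(joined))  # the single left row appears once per match
```

Any key listed in dup_keys is one the left join will multiply, which tells you where an extra join criterion or a data fix is needed.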

Related

Query with Left outer join and group by returning duplicates

To begin with, I have a table in my db that is fed with SalesForce info. When I run this example query it returns 2 rows:
select * from SalesForce_INT_Account__c where ID_SAP_BAYER__c = '3783513'
When I run this next query on the same table I obtain one of the rows, which is what I need:
SELECT MAX(ID_SAP_BAYER__c) FROM SalesForce_INT_Account__c where ID_SAP_BAYER__c = '3783513' GROUP BY ID_SAP_BAYER__c
Now, I have another table (PedidosEspecialesZarateCabeceras) which has a field (NroClienteDireccionEntrega) that I can match with the field I've been using in the SalesForce table (ID_SAP_BAYER__c). This table has a key that consists of just 1 field (NroPedido).
What I need to do is join these 2 tables to obtain a row from PedidosEspecialesZarateCabeceras with additional fields coming from the SalesForce table. If those additional fields are not available, they should come back as NULL values, so I'm using a LEFT OUTER JOIN.
The problem is that I have to match NroClienteDireccionEntrega against ID_SAP_BAYER__c, and since there are 2 rows in the SalesForce table with the same ID_SAP_BAYER__c, my query returns 2 duplicate rows from PedidosEspecialesZarateCabeceras (they both have the same NroPedido).
This is an example query that returns duplicates:
SELECT
cab.CUIT AS CUIT,
convert(nvarchar(4000), cab.NroPedido) AS NroPedido,
sales.BillingCity__c as Localidad,
sales.BillingState__c as IdProvincia,
sales.BillingState__c_Desc as Provincia,
sales.BillingStreet__c as Calle,
sales.Billing_Department__c as Distrito,
sales.Name as RazonSocial,
cab.NroCliente as ClienteId
FROM PedidosEspecialesZarateCabeceras AS cab WITH (NOLOCK)
LEFT OUTER JOIN
SalesForce_INT_Account__c AS sales WITH (NOLOCK) ON
cab.NroClienteDireccionEntrega = sales.ID_SAP_BAYER__c
and sales.ID_SAP_BAYER__c in
( SELECT MAX(ID_SAP_BAYER__c)
FROM SalesForce_INT_Account__c
GROUP BY ID_SAP_BAYER__c
)
WHERE cab.NroPedido ='5320'
Even though the join has MAX and GROUP BY, this returns 2 duplicate rows with different SalesForce information (because of the 2 SalesForce rows with the same ID_SAP_BAYER__c), which should not be possible.
What I need is for the left outer join in my query to pick only ONE of the SalesForce rows, to prevent the duplication that is happening right now. For some reason the SELECT MAX with the GROUP BY is not working.
Maybe I should try to join these tables in a different way. Can anyone give me some other ideas on how to join the two tables to return just 1 row? It doesn't matter if the SalesForce row that gets picked out of the 2 isn't the correct one; I just need it to pick one of them.
Your IN clause is not actually doing anything, since...
SELECT MAX(ID_SAP_BAYER__c)
FROM SalesForce_INT_Account__c
GROUP BY ID_SAP_BAYER__c
... returns all possible ID_SAP_BAYER__c values. (The GROUP BY says you want to return one row per unique ID_SAP_BAYER__c and then, since your MAX is operating on exactly one unique value per group, you simply return that value.)
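A quick way to convince yourself of this is sketched below with Python's sqlite3 module and a hypothetical account table (the table name, columns and rows are invented for illustration). It compares the grouped MAX against a plain DISTINCT and checks how many rows the IN filter actually removes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical miniature of SalesForce_INT_Account__c.
    CREATE TABLE account (id INTEGER PRIMARY KEY, sap_id TEXT, name TEXT);
    INSERT INTO account (sap_id, name) VALUES
        ('3783513', 'first'), ('3783513', 'second'), ('42', 'other');
""")

# MAX of a column grouped by that same column is just the column's distinct values.
grouped = conn.execute(
    "SELECT MAX(sap_id) FROM account GROUP BY sap_id ORDER BY 1"
).fetchall()
distinct = conn.execute(
    "SELECT DISTINCT sap_id FROM account ORDER BY 1"
).fetchall()

# So filtering with IN (...) against that list keeps every row.
total = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
kept = conn.execute("""
    SELECT COUNT(*) FROM account
    WHERE sap_id IN (SELECT MAX(sap_id) FROM account GROUP BY sap_id)
""").fetchone()[0]

print(grouped == distinct)  # same list of values
print(total, kept)          # the filter removes nothing
```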
You will want to change your query to operate on a value that is actually different between the two rows you are trying to differentiate (probably the MAX(ID) for the relevant ID_SAP_BAYER__c). Plus, you will want to link that inner query to your outer query.
You could probably do something like:
...
LEFT OUTER JOIN
SalesForce_INT_Account__c sales
ON cab.NroClienteDireccionEntrega = sales.ID_SAP_BAYER__c
and sales.ID in
(
SELECT MAX(ID)
FROM SalesForce_INT_Account__c sales2
WHERE sales2.ID_SAP_BAYER__c = cab.NroClienteDireccionEntrega
)
WHERE cab.NroPedido ='5320'
By using sales.ID in ... SELECT MAX(ID) ... instead of sales.ID_SAP_BAYER__c in ... SELECT MAX(ID_SAP_BAYER__c) ... this ensures you only match one of the two rows for that ID_SAP_BAYER__c. The WHERE sales2.ID_SAP_BAYER__c = cab.NroClienteDireccionEntrega condition links the inner query to the outer query.
There are multiple ways of doing the above, especially if you don't care which of the relevant rows you match on. You can use the above as a starting point and make it match your preferred style.
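To see the correlated MAX(ID) idea end to end, here is a small sketch using Python's sqlite3 module. The orders and account tables, their columns and their rows are simplified assumptions standing in for PedidosEspecialesZarateCabeceras and SalesForce_INT_Account__c, and it assumes the account table has a unique id column to break the tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified versions of the two tables in the question.
    CREATE TABLE orders (nro_pedido INTEGER PRIMARY KEY, sap_id TEXT);
    CREATE TABLE account (id INTEGER PRIMARY KEY, sap_id TEXT, name TEXT);
    INSERT INTO orders VALUES (5320, '3783513'), (5321, 'missing');
    INSERT INTO account (id, sap_id, name) VALUES
        (1, '3783513', 'first'), (2, '3783513', 'second');
""")

rows = conn.execute("""
    SELECT o.nro_pedido, a.id, a.name
    FROM orders o
    LEFT JOIN account a
      ON a.sap_id = o.sap_id
     -- Keep only the account row with the highest id for this order's sap_id.
     AND a.id = (SELECT MAX(a2.id)
                 FROM account a2
                 WHERE a2.sap_id = o.sap_id)
    ORDER BY o.nro_pedido
""").fetchall()

for row in rows:
    print(row)
```

Each order comes back exactly once: order 5320 is paired with a single account row, and order 5321, which has no matching account, still appears with NULLs.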
An alternative might be to use OUTER APPLY with TOP 1. Something like:
SELECT
...
FROM PedidosEspecialesZarateCabeceras AS cab
OUTER APPLY(
SELECT TOP 1 *
FROM SalesForce_INT_Account__c s1
WHERE cab.NroClienteDireccionEntrega = s1.ID_SAP_BAYER__c
) sales
WHERE cab.NroPedido ='5320'
Without an ORDER BY the match that TOP 1 chooses will be arbitrary, but I think that's what you want anyway. (If not, you could add an ORDER BY).

JOIN results in too many rows

I would be super happy if I get help for this problem. Thank you in advance.
Table #1: station_temporar_con_station has 5984 rows, and 7 columns as seen in the screenshot:
ID_stations, latitude, longitude, connection_coord_city_type, coordinates_text, type_of_stations, ID_city
SQL_station_temporar_con_station
Table #2: air_quality_temporar has 11946 rows and 13 columns as seen in this screenshot:
table air_quality_temporar
Now I should have a table with all the 11946 rows from air_quality_temporar supplemented with the column connection_coord_city_type from station_temporar_con_station.
What I've tried so far:
Solution #1:
SELECT
ID_measurement, ID_stations,
station_temporar_con_station.latitude,
station_temporar_con_station.longitude,
station_temporar_con_station.connection_coord_city_type,
station_temporar_con_station.coordinates_text,
type_of_stations, ID_city
FROM
station_temporar_con_station
JOIN
air_quality_temporar ON station_temporar_con_station.coordinates_text = air_quality_temporar.coordinates_text;
But this JOIN results in 14'377 rows instead of 11'946 rows.
Solution #2:
SELECT
reference, pm25, PM10, latitude, longitude,
(SELECT connection_coord_city_type
FROM station_temporar_con_station),
conc_pm25, conc_pm10, year, pm10_type, pm25_type, date_compiled
FROM
air_quality_temporar;
But only the first value from connection_coord_city_type is filled in, because the DB does not know what it should assign where.
Does anyone have any input or a solution?
You should avoid joining on values that are not unique. I added the latitude and longitude fields to the join condition below.
I also used a LEFT JOIN with air_quality_temporar as the left table, so that all of its 11946 rows are kept.
SELECT ID_measurement, ID_stations,
station_temporar_con_station.latitude,
station_temporar_con_station.longitude,
station_temporar_con_station.connection_coord_city_type,
station_temporar_con_station.coordinates_text, type_of_stations, ID_city
FROM air_quality_temporar
LEFT JOIN station_temporar_con_station
ON air_quality_temporar.coordinates_text = station_temporar_con_station.coordinates_text
AND air_quality_temporar.latitude = station_temporar_con_station.latitude AND
air_quality_temporar.longitude = station_temporar_con_station.longitude
You have duplicates in your tables. The one of interest is station_temporar_con_station.
To find the duplicates, use:
SELECT coordinates_text, MIN(connection_coord_city_type), MAX(connection_coord_city_type)
FROM station_temporar_con_station
GROUP BY coordinates_text;
Then you need to figure out what to do. I would suggest fixing the data.
If you just want to get any matching row in the query, you can use window functions:
SELECT aqt.*, stcs.*
FROM air_quality_temporar aqt LEFT JOIN
(SELECT stcs.*,
ROW_NUMBER() OVER (PARTITION BY coordinates_text ORDER BY coordinates_text) as seqnum
FROM station_temporar_con_station stcs
) stcs
ON stcs.coordinates_text = aqt.coordinates_text AND
stcs.seqnum = 1;
Note that this returns an arbitrary row when there are duplicates. I also replaced the JOIN with LEFT JOIN. The duplicate rows might be hiding the fact that some rows have no matches.
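The ROW_NUMBER approach can be tried out on a toy dataset. The following sketch uses Python's sqlite3 module (it assumes the bundled SQLite is 3.25 or newer, which is when window functions were added). The station and measurement tables and their rows are made up, and unlike the query above it orders by city_type inside each partition so that the chosen row is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical miniature versions of the two tables.
    CREATE TABLE station (coordinates_text TEXT, city_type TEXT);
    CREATE TABLE measurement (id INTEGER PRIMARY KEY, coordinates_text TEXT);
    INSERT INTO station VALUES
        ('1,1', 'urban'), ('1,1', 'suburban'), ('2,2', 'rural');
    INSERT INTO measurement VALUES (1, '1,1'), (2, '2,2'), (3, '9,9');
""")

rows = conn.execute("""
    SELECT m.id, s.city_type
    FROM measurement m
    LEFT JOIN (
        -- Number the stations within each coordinates_text group.
        SELECT st.*,
               ROW_NUMBER() OVER (PARTITION BY st.coordinates_text
                                  ORDER BY st.city_type) AS seqnum
        FROM station st
    ) s
      ON s.coordinates_text = m.coordinates_text
     AND s.seqnum = 1          -- keep only the first station per group
    ORDER BY m.id
""").fetchall()

for row in rows:
    print(row)
```

All three measurements come back exactly once: the duplicated coordinate contributes a single station, and the measurement with no station at all is kept with a NULL.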

SQL: Combine two tables, into same columns, based on relation table

Not being well versed in complex SQL, I am trying to figure out how I can write a query that will return (almost) the same columns from two tables, based on a "relationship" table. I have tried using UNION, but the number of columns is different between the three tables. I also tried IF...ELSE, but could not get that to function. I have also looked at INCLUDE and EXCLUDE.
Here is my current query:
SELECT
/* Relation Table */
[data_Related_Asset].[ID_Related_Asset]
,[data_Related_Asset].[BIOMED_Tag]
,[data_Related_Asset].[Related_BIOMED_Tag]
/* Lab Table */
,[data_Lab_Asset].[Room]
,[Lab_Area].[Work_Area]
,[data_Lab_Asset].[Pet_Name_Bench]
,[data_Lab_Asset].[BGL_ID]
,[data_Lab_Asset].[BIOMED_Tag] AS LAB_BIOMED
,[data_Lab_Asset].[Endpoint_Tag]
,[Lab_Class].[Class]
,[Lab_Class].[Subclass]
,[Lab_Class].[Subcategory]
/* IT Table */
,[data_IT_Asset].[Room]
,[IT_Area].[Work_Area]
,[data_IT_Asset].[Bench_Instrument]
,[data_IT_Asset].[BIOMED_Tag] AS IT_BIOMED
,[data_IT_Asset].[Endpoint_Tag]
,[IT_Class].[Class]
,[IT_Class].[Subclass]
,[IT_Class].[Subcategory]
FROM [data_Related_Asset]
LEFT JOIN [data_Lab_Asset] ON [data_Lab_Asset].[BIOMED_Tag] = [data_Related_Asset].[Related_BIOMED_Tag]
LEFT JOIN [data_IT_Asset] ON [data_IT_Asset].[BIOMED_Tag] = [data_Related_Asset].[Related_BIOMED_Tag]
LEFT JOIN [tbl_Class] Lab_Class ON [Lab_Class].[ID_Class] = [data_Lab_Asset].[Class_ID]
LEFT JOIN [tbl_Class] IT_Class ON [IT_Class].[ID_Class] = [data_IT_Asset].[Class_ID]
LEFT JOIN [tbl_Work_Area] Lab_Area ON [Lab_Area].[ID_Work_Area] = [data_Lab_Asset].[Work_Area_ID]
LEFT JOIN [tbl_Work_Area] IT_Area ON [IT_Area].[ID_Work_Area] = [data_IT_Asset].[Work_Area_ID]
ORDER BY ID_Related_Asset
The query is being used in a custom app and is set up to search for an "ID" in the [data_Related_Asset].[BIOMED_Tag] column, and return all [Related_BIOMED_Tag] records.
When I run the above query I get all the results I need, but across a lot of columns. If the item being returned is in the LAB table, then the LAB_Asset columns are populated, but the IT_Asset columns are all NULL. And if the item is in the IT table, the opposite is true: the LAB_Asset columns are all NULL and the IT_Asset columns are populated. For example, below you can see where rows 2 & 12 returned the IT_Asset information.
I'd like to be able to return everything in the same set of NINE columns to condense the viewed table. (Room, Work_Area, Bench, BGL_ID, BIOMED_Tag, Endpoint_Tag, Class, Subclass, Subcategory) For example, below you can see where I moved the info from the IT_Asset table over to the first columns.
I'm sure I'm missing a simple solution/function here. Any help is greatly appreciated!
You can use UNION but you just have to ensure that you have the same columns in the same order in each statement being union'd.
So for missing columns just use nulls (or any suitable dummy data) e.g.
SELECT col1, col2, null, col4
from tableA
UNION
SELECT col1, null, col3, null
from tableB
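Applied to the lab/IT case, the padding idea might look like the sketch below, run through Python's sqlite3 module. The lab_asset and it_asset tables are cut down to a few hypothetical columns, and the extra source column is an addition of mine to show which table each row came from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, trimmed-down asset tables; only lab assets have a bgl_id.
    CREATE TABLE lab_asset (biomed_tag TEXT, room TEXT, bgl_id TEXT);
    CREATE TABLE it_asset (biomed_tag TEXT, room TEXT);
    INSERT INTO lab_asset VALUES ('B1', 'R101', 'BGL-7');
    INSERT INTO it_asset VALUES ('B2', 'R202');
""")

rows = conn.execute("""
    SELECT biomed_tag, room, bgl_id, 'LAB' AS source FROM lab_asset
    UNION
    -- Pad the column the IT table lacks with NULL so both halves line up.
    SELECT biomed_tag, room, NULL, 'IT' FROM it_asset
    ORDER BY biomed_tag
""").fetchall()

for row in rows:
    print(row)
```

Both kinds of asset end up in the same four columns, with NULL where the IT table has no corresponding value.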

How do I write an SQL query to identify duplicate values in a specific field?

This is the table I'm working with:
I would like to identify only the ReviewIDs that have duplicate deduction IDs for different parameters.
For example, in the image above, ReviewID 114 has two different parameter IDs, but both records have the same deduction ID.
For my purposes, this record (ReviewID 114) has an error. There should not be two or more unique parameter IDs that have the same deduction ID for a single ReviewID.
I would like to write a query to identify these types of records, but my SQL skills aren't there yet. Help?
Thanks!
Update 1: I'm using TSQL (SQL Server 2008) if that helps
Update 2: The output that I'm looking for would be the same as the image above, minus any records that do not match the criteria I've described.
Cheers!
SELECT * FROM table t1 INNER JOIN (
SELECT review_id, deduction_id FROM table
GROUP BY review_id, deduction_id
HAVING COUNT(parameter_id) > 1
) t2 ON t1.review_id = t2.review_id AND t1.deduction_id = t2.deduction_id;
http://www.sqlfiddle.com/#!3/d858f/3
If it is possible to have exact duplicates and that is ok, you can modify the HAVING clause to COUNT(DISTINCT parameter_id).
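The difference between the two HAVING variants only shows up when exact duplicate rows exist. Here is a small sketch with Python's sqlite3 module. The reviews table and its rows are invented, and flagged is a hypothetical helper that only swaps the counting expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reviews (review_id INTEGER, deduction_id INTEGER, parameter_id INTEGER);
    INSERT INTO reviews VALUES
        (114, 5, 1), (114, 5, 2),   -- same deduction, two different parameters
        (116, 7, 3), (116, 7, 3);   -- exact duplicate row
""")

def flagged(count_expr):
    """Return (review_id, deduction_id) groups where count_expr exceeds 1."""
    sql = f"""
        SELECT review_id, deduction_id
        FROM reviews
        GROUP BY review_id, deduction_id
        HAVING {count_expr} > 1
        ORDER BY review_id
    """
    return conn.execute(sql).fetchall()

plain = flagged("COUNT(parameter_id)")
distinct = flagged("COUNT(DISTINCT parameter_id)")

print(plain)     # also flags the exact duplicate
print(distinct)  # flags only groups with different parameter ids
```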
Select ReviewID, deduction_ID from Table
Group By ReviewID, deduction_ID
Having count(ReviewID) > 1
http://www.sqlfiddle.com/#!3/6e113/3 has an example
If I understand the criteria: For each combination of ReviewID and deduction_id you can have only one parameter_id and you want a query that produces a result without the ReviewIDs that break those rules (rather than identifying those rows that do). This will do that:
;WITH review_errors AS (
SELECT ReviewID
FROM test
GROUP BY ReviewID,deduction_ID
HAVING COUNT(DISTINCT parameter_id) > 1
)
SELECT t.*
FROM test t
LEFT JOIN review_errors r
ON t.ReviewID = r.ReviewID
WHERE r.ReviewID IS NULL
To explain: review_errors is a common table expression (think of it as a named sub-query that doesn't clutter up the main query). It selects the ReviewIDs that break the criteria. When you left join on it, it selects all rows from the left table regardless of whether they match the right table and only the rows from the right table that match the left table. Rows that do not match will have nulls in the columns for the right-hand table. By specifying WHERE r.ReviewID IS NULL you eliminate the rows from the left hand table that match the right hand table.
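The same anti-join pattern can be checked on a handful of rows. This sketch uses Python's sqlite3 module and a made-up test table, so the data is an assumption chosen to show one offending ReviewID and one clean one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (ReviewID INTEGER, deduction_ID INTEGER, parameter_id INTEGER);
    INSERT INTO test VALUES
        (114, 5, 1), (114, 5, 2), (114, 6, 3),  -- 114 breaks the rule on deduction 5
        (115, 5, 1), (115, 6, 2);               -- 115 is clean
""")

rows = conn.execute("""
    WITH review_errors AS (
        SELECT ReviewID
        FROM test
        GROUP BY ReviewID, deduction_ID
        HAVING COUNT(DISTINCT parameter_id) > 1
    )
    SELECT t.*
    FROM test t
    LEFT JOIN review_errors r
      ON t.ReviewID = r.ReviewID
    WHERE r.ReviewID IS NULL      -- keep only rows with no matching error
    ORDER BY t.ReviewID, t.deduction_ID
""").fetchall()

for row in rows:
    print(row)
```

Note that every row of ReviewID 114 is dropped, including the one for deduction 6 that is fine on its own, because the filter works per ReviewID.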

Use two DISTINCT statements in SQL

I have combined two different tables together, one side is named DynDom and the other is CATH. I am trying to remove duplicates from that table such as below:
However, if I select the distinct DynDom pdbcode from the table, it returns distinct values of that pdbcode.
and
Based on the pictures above, I commented out the DynDom/CATH columns in the table and ran the query separately for DynDom and for CATH. Each run returned the values I need. I was wondering whether it's possible to use 2 DISTINCT statements to return distinct values of the entire table based on the pdbcode.
Here's my code :
select DISTINCT
cath_dyndom_table_2."DYNDOM_DOMAINID",
cath_dyndom_table_2."DYNDOM_DSTART",
cath_dyndom_table_2."DYNDOM_DEND",
cath_dyndom_table_2."DYNDOM_CONFORMERID",
cath_dyndom_table_2.pdbcode,
cath_dyndom_table_2."DYNDOM_ChainID",
cath_dyndom_table_2.cath_pdbcode,
cath_dyndom_table_2."CATH_BEGIN",
cath_dyndom_table_2."CATH_END"
from
cath_dyndom_table_2
where
pdbcode = '2hun'
order by
cath_dyndom_table_2."DYNDOM_DOMAINID",
cath_dyndom_table_2."DYNDOM_DSTART",
cath_dyndom_table_2."DYNDOM_DEND",
cath_dyndom_table_2.pdbcode,
cath_dyndom_table_2.cath_pdbcode,
cath_dyndom_table_2."CATH_BEGIN",
cath_dyndom_table_2."CATH_END";
In the end, I would like to search domains from DynDom and CATH based on the pdbcode, and return the rows without duplicate values.
Thank you.
UPDATE :
This is the view that I have created.
CREATE VIEW cath_dyndom_table AS
SELECT
r.domainid AS "DYNDOM_DOMAINID",
r.DomainStart AS "DYNDOM_DSTART",
r.Domain_End AS "DYNDOM_DEND",
r.ddid AS "DYN_DDID",
r.confid AS "DYNDOM_CONFORMERID",
r.pdbcode,
r.chainid AS "DYNDOM_ChainID",
d.cath_pdbcode,
d.cathbegin AS "CATH_BEGIN",
d.cathend AS "CATH_END"
FROM dyndom_domain_table r
FULL OUTER JOIN cath_domains d ON d.cath_pdbcode::character(4) = r.pdbcode
ORDER BY confid ASC;
What you are getting is the Cartesian product of the two tables.
In order to get one line without duplicates, you need to have a 1-to-1 relation between both tables.
It sounds as though you want a UNION of domain name and ranges from each table - this can be achieved like so:
SELECT DYNDOM_DOMAINID, DYNDOM_DSTART, DYNDOM_DEND
FROM DynDom
UNION
SELECT RTRIM(cath_pdbcode), CATH_BEGIN, CATH_END
FROM CATH
This should eliminate exact duplicates (ie. where the domain name, start and end are all identical) but will not eliminate duplicate domain names with different ranges - if these exist you will need to decide how to handle them (retain them as separate entries, combine them with lowest start and highest end, or whatever other option is preferred).
EDIT: Actually, I believe you can get the desired results simply by changing the JOIN ON condition in your view to be:
FULL OUTER JOIN cath_domains d
ON d.cath_pdbcode::character(5) = r.pdbcode || r.chainid AND
r.DomainStart <= d.cathbegin AND
r.Domain_End >= d.cathend