JOIN results in too many rows - sql

I would be super happy if I get help for this problem. Thank you in advance.
Table #1: station_temporar_con_station has 5984 rows, and 7 columns as seen in the screenshot:
ID_stations, latitude, longitude, connection_coord_city_type, coordinates_text, type_of_stations, ID_city
[Screenshot: SQL_station_temporar_con_station]
Table #2: air_quality_temporar has 11946 rows and 13 columns as seen in this screenshot:
[Screenshot: table air_quality_temporar]
I need a result with all 11946 rows from air_quality_temporar, each supplemented with the column connection_coord_city_type from station_temporar_con_station.
What I've tried so far:
Solution #1:
SELECT
ID_measurement, ID_stations,
station_temporar_con_station.latitude,
station_temporar_con_station.longitude,
station_temporar_con_station.connection_coord_city_type,
station_temporar_con_station.coordinates_text,
type_of_stations, ID_city
FROM
station_temporar_con_station
JOIN
air_quality_temporar ON station_temporar_con_station.coordinates_text = air_quality_temporar.coordinates_text;
But this JOIN results in 14'377 rows instead of 11'946 rows.
Solution #2:
SELECT
reference, pm25, PM10, latitude, longitude,
(SELECT connection_coord_city_type
FROM station_temporar_con_station),
conc_pm25, conc_pm10, year, pm10_type, pm25_type, date_compiled
FROM
air_quality_temporar;
But only the first value of connection_coord_city_type is filled in, because the DB does not know which value to assign to which row.
Does anyone have any input or a solution?

You should avoid joining on duplicate values and join only on unique data. I added the latitude and longitude fields to the join condition below.
I also used a LEFT JOIN with air_quality_temporar as the left table, so that all 11946 of its rows are kept.
SELECT ID_measurement, ID_stations,
station_temporar_con_station.latitude,
station_temporar_con_station.longitude,
station_temporar_con_station.connection_coord_city_type,
station_temporar_con_station.coordinates_text, type_of_stations, ID_city
FROM air_quality_temporar
LEFT JOIN station_temporar_con_station
ON air_quality_temporar.coordinates_text = station_temporar_con_station.coordinates_text
AND air_quality_temporar.latitude = station_temporar_con_station.latitude AND
air_quality_temporar.longitude = station_temporar_con_station.longitude

You have duplicates in your tables. The one of interest is station_temporar_con_station.
To find the duplicates, use:
SELECT coordinates_text, COUNT(*), MIN(connection_coord_city_type), MAX(connection_coord_city_type)
FROM station_temporar_con_station
GROUP BY coordinates_text
HAVING COUNT(*) > 1;
Then you need to figure out what to do. I would suggest fixing the data.
If you just want to get any matching row in the query, you can use window functions:
SELECT aqt.*, stcs.*
FROM air_quality_temporar aqt LEFT JOIN
(SELECT stcs.*,
ROW_NUMBER() OVER (PARTITION BY coordinates_text ORDER BY coordinates_text) as seqnum
FROM station_temporar_con_station stcs
) stcs
ON stcs.coordinates_text = aqt.coordinates_text AND
stcs.seqnum = 1;
Note that this returns an arbitrary row when there are duplicates. I also replaced the JOIN with LEFT JOIN. The duplicate rows might be hiding the fact that some rows have no matches.
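To see the effect on a small scale, here is a minimal sketch using Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). The rows are made up, only three of the real columns are kept, and the ORDER BY inside ROW_NUMBER() uses ID_stations instead of coordinates_text so that the picked row is deterministic; that tie-breaker is an assumption, not something taken from the question.

```python
import sqlite3

# Toy stand-ins for the two tables; all rows are invented for illustration.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE air_quality_temporar (ID_measurement INTEGER, coordinates_text TEXT);
    CREATE TABLE station_temporar_con_station (
        ID_stations INTEGER, coordinates_text TEXT, connection_coord_city_type TEXT);
    INSERT INTO air_quality_temporar VALUES (1, 'A'), (2, 'A'), (3, 'B'), (4, 'C');
    -- 'A' appears twice in the station table, 'C' not at all.
    INSERT INTO station_temporar_con_station VALUES
        (10, 'A', 'A-urban'), (11, 'A', 'A-rural'), (12, 'B', 'B-urban');
""")

# Plain inner join: every 'A' measurement is doubled, the 'C' measurement is lost.
plain_join = con.execute("""
    SELECT COUNT(*)
    FROM air_quality_temporar aqt
    JOIN station_temporar_con_station stcs
      ON stcs.coordinates_text = aqt.coordinates_text
""").fetchone()[0]

# LEFT JOIN against one station row per coordinates_text.
deduplicated = con.execute("""
    SELECT aqt.ID_measurement, stcs.connection_coord_city_type
    FROM air_quality_temporar aqt
    LEFT JOIN (
        SELECT s.*,
               ROW_NUMBER() OVER (PARTITION BY coordinates_text
                                  ORDER BY ID_stations) AS seqnum
        FROM station_temporar_con_station s
    ) stcs
      ON stcs.coordinates_text = aqt.coordinates_text AND stcs.seqnum = 1
    ORDER BY aqt.ID_measurement
""").fetchall()

print(plain_join)     # 5 rows from only 4 measurements
print(deduplicated)   # exactly one row per measurement
```

The measurement with no matching station comes back with NULL (None in Python) instead of disappearing, which is the case the plain JOIN was hiding.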

Related

Concatenate ALL values from 2 tables using SQL?

I am trying to use SQL to create a table that concatenates all dates from a specific range to all items in another table. See image for an example.
I have a solution where I can create a column of "null" values in both tables and join on that column but wondering if there is a more sophisticated approach to doing this.
[Image: example]
I've tried the following:
Added a constant value to each table
Then I joined the 2 tables on that constant value so that each row matched each row of both tables.
This got the intended result but I'm wondering if there's a better way to do this where I don't have to add the constant values:
SELECT c.Date_,k.user_email
FROM `operations-div-qa.7_dbtCloud.calendar_table_hours_st` c
JOIN `operations-div-qa.7_dbtCloud.table_key` k
ON c.match = k.match
ORDER BY Date_,user_email asc
What the image shows is not a concatenation, it's a join (table1 and table2 stand in for your two tables):
select t1.dates as Date_, t2.name as Person
from table1 t1, table2 t2;
A cross join should work for you:
It pairs every row of one table with every row of the other. Use this when there is no relationship between the tables.
Did not test so syntax may be slightly off.
SELECT c.Date_,k.user_email
FROM `operations-div-qa.7_dbtCloud.calendar_table_hours_st` c
CROSS JOIN `operations-div-qa.7_dbtCloud.table_key` k
ORDER BY Date_,user_email asc
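As a quick check of what CROSS JOIN returns, here is a small sketch with Python's sqlite3 module. The table name calendar and all rows are placeholders; only the Date_ and user_email column names come from the question.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE calendar (Date_ TEXT);          -- placeholder for the calendar table
    CREATE TABLE table_key (user_email TEXT);
    INSERT INTO calendar VALUES ('2023-01-01'), ('2023-01-02'), ('2023-01-03');
    INSERT INTO table_key VALUES ('a@example.com'), ('b@example.com');
""")

# No ON clause and no helper "match" column: every date meets every user.
pairs = con.execute("""
    SELECT c.Date_, k.user_email
    FROM calendar c
    CROSS JOIN table_key k
    ORDER BY c.Date_, k.user_email
""").fetchall()

print(len(pairs))  # 3 dates x 2 users = 6 rows
```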

table with duplicate id inner join table no duplicate id

In the result of the last SELECT I see duplicate IDs. How can I remove the duplicates? See the attached picture.
[Image: 3 select query]
While your sample data does not allow a 100% accurate answer, here are some guidelines that will hopefully help you.
To avoid duplicates in a join when no column can be used to uniquely identify the duplicate records, I would suggest suppressing them in a subquery, and then joining the subquery with the other table.
As you seem to have true duplicates, meaning all columns that you retrieve from the child table have identical values, DISTINCT should do the trick:
SELECT i.*, c.TELEPHONE_NUM
FROM [dbo].[GT_Import] AS i
JOIN (
SELECT DISTINCT TELEPHONE_NUM
FROM [dbo].[ComplainSubscriber_Import]
) AS c ON c.TELEPHONE_NUM = i.TELEPHONE_NUM
WHERE ...
You can add more columns to the subquery, as long as their values are also identical across the duplicated rows.
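The difference between joining the raw child table and joining the DISTINCT subquery can be sketched like this with Python's sqlite3 module. The rows and the id column are invented, and the [dbo] schema prefix is dropped because SQLite has no such schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE GT_Import (id INTEGER, TELEPHONE_NUM TEXT);
    CREATE TABLE ComplainSubscriber_Import (TELEPHONE_NUM TEXT);
    INSERT INTO GT_Import VALUES (1, '555-0100'), (2, '555-0101');
    -- '555-0100' is a true duplicate in the child table.
    INSERT INTO ComplainSubscriber_Import VALUES ('555-0100'), ('555-0100'), ('555-0101');
""")

# Joining the child table directly repeats id 1 once per duplicate.
direct = con.execute("""
    SELECT i.id
    FROM GT_Import AS i
    JOIN ComplainSubscriber_Import AS c ON c.TELEPHONE_NUM = i.TELEPHONE_NUM
    ORDER BY i.id
""").fetchall()

# Collapsing the duplicates first keeps one row per id.
via_distinct = con.execute("""
    SELECT i.id
    FROM GT_Import AS i
    JOIN (SELECT DISTINCT TELEPHONE_NUM FROM ComplainSubscriber_Import) AS c
      ON c.TELEPHONE_NUM = i.TELEPHONE_NUM
    ORDER BY i.id
""").fetchall()

print(direct)        # id 1 appears twice
print(via_distinct)  # each id appears once
```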

Multiple rows from Left Join in SQL where rows are uniquely matched

I have two views that I am trying to join. I am joining on three elements, date, case number and surgeon id number. Each should only have one match for the previous case out value, but I am getting multiple rows after my left join.
Here is my code:
CREATE VIEW [dbo].[OR]
AS
SELECT DISTINCT
[ID].*,
[BYSURG].[PREV_PAT_OUT] AS PrevPtOut
FROM
[dbo].[OR_LOG_INDEXED] [ID]
LEFT JOIN
[DBO].[OR_CASE_NUM] BYSURG ON [ID].[SURG_DT] = [BYSURG].[SURG_DT]
AND [ID].[SURGEON_ID] = [BYSURG].[SURGEON_ID]
AND [ID].[CASE_NUM_BY_ROOM] = [BYSURG].[CASE_NUM_BY_ROOM_ADJ]
Any insights are much appreciated.
Thanks!
M
Replace your select block with one that retrieves all columns:
SELECT
*
FROM
[dbo].[OR_LOG_INDEXED] [ID]
LEFT JOIN
[DBO].[OR_CASE_NUM] BYSURG ON [ID].[SURG_DT] = [BYSURG].[SURG_DT]
AND [ID].[SURGEON_ID] = [BYSURG].[SURGEON_ID]
AND [ID].[CASE_NUM_BY_ROOM] = [BYSURG].[CASE_NUM_BY_ROOM_ADJ]
Run it and look at your "duplicate" rows. Something about them will no longer be a duplicate; perhaps you've forgotten to include some other criteria in your where clause.
Putting DISTINCT in the select block is not the answer. Find out which data element differs between the "duplicate" rows and then filter out the rows you don't want.
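One possible way to find the keys that match more than once is to group the right-hand table by the three join columns. This is only a sketch with invented rows in Python's sqlite3 module; the column names are taken from the view above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE OR_CASE_NUM (SURG_DT TEXT, SURGEON_ID INTEGER,
                              CASE_NUM_BY_ROOM_ADJ INTEGER, PREV_PAT_OUT TEXT);
    INSERT INTO OR_CASE_NUM VALUES
        ('2020-01-06', 7, 1, '08:15'),
        ('2020-01-06', 7, 2, '10:40'),
        ('2020-01-06', 7, 2, '10:55');  -- same join key, different PREV_PAT_OUT
""")

# Any key listed here will produce more than one row per match in the LEFT JOIN.
ambiguous_keys = con.execute("""
    SELECT SURG_DT, SURGEON_ID, CASE_NUM_BY_ROOM_ADJ, COUNT(*) AS matches
    FROM OR_CASE_NUM
    GROUP BY SURG_DT, SURGEON_ID, CASE_NUM_BY_ROOM_ADJ
    HAVING COUNT(*) > 1
""").fetchall()

print(ambiguous_keys)
```

If this returns rows, the three columns do not uniquely identify a case in OR_CASE_NUM, and DISTINCT cannot collapse the result because PREV_PAT_OUT differs.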

Number of Records don't match when Joining three tables

Despite going through every material I could possibly find on the internet, I haven't been able to solve this issue myself. I am new to MS Access and would really appreciate any pointers.
Here's my problem - I have three tables
Source1084 with columns - Department, Sub-Dept, Entity, Account, +few more
R12CAOmappingTable with columns - Account, R12_Account
Table4 with columns - R12_Account, Department, Sub-Dept, Entity, New Dept, LOB +few more
I have a total of 1084 records in Source and the result table must also contain 1084 records. I need to draw a table with all the columns from Source + R12_account from R12CAOmappingTable + all columns from Table4.
Here is the query I wrote. It yields the right columns, but returns more or fewer than 1084 records depending on which join options I use.
SELECT rmt.r12_account,
srb.version,
srb.fy,
srb.joblevel,
srb.scenario,
srb.department,
srb.[sub-department],
srb.[job function],
srb.entity,
srb.employee,
table4.lob,
table4.product,
table4.newacct,
table4.newdept,
srb.[beg balance],
srb.jan,
srb.feb,
srb.mar,
srb.apr,
srb.may,
srb.jun,
srb.jul,
srb.aug,
srb.sep,
srb.oct,
srb.nov,
srb.dec,
rmt.r12_account
FROM (source1084 AS srb
LEFT JOIN r12caomappingtable AS rmt
ON srb.account = rmt.account)
LEFT JOIN table4
ON ( srb.department = table4.dept )
AND ( srb.[sub-department] = table4.subdept )
AND ( srb.entity = table4.entity )
WHERE ( ( ( srb.[sub-department] ) = table4.subdept )
AND ( ( srb.entity ) = table4.entity )
AND ( ( rmt.r12_account ) = table4.r12_account ) );
In this simple example, Table1 contains 3 rows with unique fld1 values. Table2 contains one row, and the fld1 value in that row matches one of those in Table1. Therefore this query returns 3 rows.
SELECT *
FROM
Table1 AS t1
LEFT JOIN Table2 AS t2
ON t1.fld1 = t2.fld1;
However if I add the WHERE clause as below, that version of the query returns only one row --- the row where the fld1 values match.
SELECT *
FROM
Table1 AS t1
LEFT JOIN Table2 AS t2
ON t1.fld1 = t2.fld1
WHERE t1.fld1 = t2.fld1;
In other words, that WHERE clause counteracts the LEFT JOIN because it excludes rows where t2.fld1 is Null. If that makes sense, notice that second query is functionally equivalent to this ...
SELECT *
FROM
Table1 AS t1
INNER JOIN Table2 AS t2
ON t1.fld1 = t2.fld1;
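The three queries above can be run side by side with Python's sqlite3 module. The fld1 values are made up to match the description (three rows in Table1, one matching row in Table2).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Table1 (fld1 INTEGER);
    CREATE TABLE Table2 (fld1 INTEGER);
    INSERT INTO Table1 VALUES (1), (2), (3);
    INSERT INTO Table2 VALUES (2);
""")

def row_count(sql):
    """Return the number of rows a query produces."""
    return len(con.execute(sql).fetchall())

left_join = row_count(
    "SELECT * FROM Table1 AS t1 LEFT JOIN Table2 AS t2 ON t1.fld1 = t2.fld1")
left_join_where = row_count(
    "SELECT * FROM Table1 AS t1 LEFT JOIN Table2 AS t2 ON t1.fld1 = t2.fld1 "
    "WHERE t1.fld1 = t2.fld1")  # NULL = anything is never true, so unmatched rows drop out
inner_join = row_count(
    "SELECT * FROM Table1 AS t1 INNER JOIN Table2 AS t2 ON t1.fld1 = t2.fld1")

print(left_join, left_join_where, inner_join)
```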
Your situation is similar. I suggest you first eliminate the WHERE clause and confirm this query returns at least your expected 1084 rows.
SELECT Count(*) AS CountOfRows
FROM (source1084 AS srb
LEFT JOIN r12caomappingtable AS rmt
ON srb.account = rmt.account)
LEFT JOIN table4
ON ( srb.department = table4.dept )
AND ( srb.[sub-department] = table4.subdept )
AND ( srb.entity = table4.entity );
After you get the query returning the correct number of rows, you can alter the SELECT list to return the columns you want. But the columns aren't really the issue until you can get the correct rows.
Without knowing the values in your tables it is hard to give a complete answer to your question. Based on how you described it, the problem is more than likely caused by the type of joins you are using.
The best way I have found to understand which type of join to use is to look at a Venn diagram that explains the different join types.
Jeff Atwood also has a really good explanation of SQL joins on his site that uses the same method.
Best to just use the query builder. Drop in your main table. Choose the columns you want. Then, for each of the other lookup values, simply drop in the other table, draw the join line(s), double-click the line, and choose a left join. You can do this for 2 or 30 columns that need to "grab" or look up values from other tables. As long as each lookup table has at most one matching row per key, the number of ORIGINAL rows returned from the base table should ALWAYS remain the same.
So just use the query builder and follow the above.
The problem with your posted SQL is you NESTED the joins inside (). Don't do that. (or let the query builder do this for you – they tend to be quite messy but will also work).
Just use this:
FROM source1084 AS srb
LEFT JOIN r12caomappingtable AS rmt
ON srb.account = rmt.account
LEFT JOIN table4
ON ( srb.department = table4.dept )
AND ( srb.[sub-department] = table4.subdept )
AND ( srb.entity = table4.entity )
As noted, I don't see why you are "repeating" the conditions again in the where clause.

Different count results with join

I have this SQL:
SELECT DISTINCT
count(KTT)
FROM
TRA.EVENT;
It returns 1901335.
Now I want to expand the sql with a join like this:
SELECT DISTINCT
count(E.KTT)
FROM
TRA.EVENT E
LEFT JOIN TRA.TMP_BNAME TBN ON E.KTT = TBN.KTT_DEF;
But here I have a result of 1942376.
I don't understand why. I expected a result of 1901335 here as well. I thought I would simply be adding the values from TBN to the existing entries of EVENT?
EDIT
SELECT DISTINCT
E.KTT,
TB.B_BEZEICHNER
FROM
TRA.EVENT E
LEFT JOIN TRA.TMP_BNAME TBN ON E.KTT = TBN.KTT_DEF
LEFT JOIN TRA.TMP_B TB ON TBN.B_ID = TB.B_ID;
What am I doing wrong?
Thx for your help.
Stefan
You have not provided full details, so treat these comments as general ones.
When you join 2 tables, the join can create "duplicate" rows from one table. In your case, there may be more than 1 record with the same KTT_DEF in the TRA.TMP_BNAME table. When you join that to the TRA.EVENT table, it creates more than one record for each such original record in TRA.EVENT.
You may choose to count the distinct values of KTT from TRA.EVENT and use DISTINCT keyword but you need to put it into the COUNT: SELECT COUNT(DISTINCT E.KTT). This will work provided that your values are actually unique. If they are not, the count will be different from the first query.
You want to count the distinct KTT?
Then your code is wrong. You have to use:
SELECT count(DISTINCT KTT)
FROM TRA.EVENT;
You get a different count because you count every row, not only the distinct ones. And because the join adds more rows to the result, you get a bigger number.
Try this:
SELECT COUNT(DISTINCT E.KTT)
FROM TRA.EVENT E
LEFT JOIN TRA.TMP_BNAME TBN ON E.KTT = TBN.KTT_DEF;
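A minimal sketch of the effect with Python's sqlite3 module. The rows are invented and the TRA schema prefix is dropped because SQLite has no such schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE EVENT (KTT INTEGER);
    CREATE TABLE TMP_BNAME (KTT_DEF INTEGER);
    INSERT INTO EVENT VALUES (1), (2), (3);
    -- KTT_DEF 1 occurs twice, so the event with KTT 1 is doubled by the join.
    INSERT INTO TMP_BNAME VALUES (1), (1), (2);
""")

def scalar(sql):
    """Return the single value produced by an aggregate query."""
    return con.execute(sql).fetchone()[0]

# DISTINCT in front of count() only de-duplicates the one-row result, so it changes nothing.
base_count = scalar("SELECT DISTINCT count(KTT) FROM EVENT")
joined_count = scalar(
    "SELECT DISTINCT count(E.KTT) FROM EVENT E "
    "LEFT JOIN TMP_BNAME TBN ON E.KTT = TBN.KTT_DEF")
# DISTINCT inside count() ignores the rows multiplied by the join.
joined_distinct = scalar(
    "SELECT COUNT(DISTINCT E.KTT) FROM EVENT E "
    "LEFT JOIN TMP_BNAME TBN ON E.KTT = TBN.KTT_DEF")

print(base_count, joined_count, joined_distinct)
```

Note that COUNT(DISTINCT E.KTT) only equals the original count here because the toy KTT values are unique in EVENT; with repeated KTT values it would be smaller.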