Full outer join joining together every record multiple times - sql

Query below:
select
cu.course_id as 'bb_course_id',
cu.user_id as 'bb_user_id',
cu.role as 'bb_role',
cu.available_ind as 'bb_available_ind',
CASE cu.row_status WHEN 0 THEN 'ENABLED' ELSE 'DISABLED' END AS 'bb_row_status',
eff.course_id as 'registrar_course_id',
eff.user_id as 'registrar_user_id',
eff.role as 'registrar_role',
eff.available_ind as 'registrar_available_ind',
CASE eff.row_status WHEN 'DISABLE' THEN 'DISABLED' END as 'registrar_row_status'
into enrollments_comparison_temp
from narrowed_users_enrollments cu
full outer join enrollments_feed_file eff on cu.course_id = eff.course_id
Quick background: I'm taking the data from a replicated table and selecting it into narrowed_users_enrollments based on some criteria. In a script I'm taking a text feed file, with enrollment data, and inserting it into enrollments_feed_file. The purpose is to compare the most recent enrollment data with enrollments already in the database.
However the issue is that joining these tables results in about 160,000 rows when I'm really only expecting about 22,000. The point of doing this comparison is so that I can look for nulled values on either side of the join. For example, if the table on the right contains a null, then disable the enrollment record. If the table on the left contains a null, then add this student's enrollment.
I know it's a little off because I'm not using PKs or FKs. This is what is selected into the table:
Here's a screenshot showing a select * from the enrollments table on the left and a feed file on the right.
http://i.imgur.com/0ZPZ9HS.png
Here's a screenshot showing the newly created table from the full outer join.
http://i.imgur.com/89ssAkS.png
As you can see even though there there's only one matching enrollment(the matching jmartinez12 columns), there's 4 extra rows created for the same record on the left for the enrollments on the right. What I'm trying to get is for it to be 5 rows, with the first being how it is in the screenshot(matching pre-existing enrollment and enrollment in the feed file), BUT, the next 4 rows with the bb_* columns should be NULL up to the registrar_course_id.
Am I overlooking something simple here? I've tried a select distinct and I've added a where clause specifying when the course_ids are equal however that ensures that I won't get null rows which I need. I have also joined the tables on the user_id however the results are still the same.

One quick suggestion is to add the DISTNCT clause. If the records you are setting are complete duplicates that may cut it down to what you are expecting.

The fix was to also join on:
ON cu.course_id = eff.course_id AND cu.user_id = eff.user_id

Related

SQL Query across two tables only show most recently updated result per tag address

I have two tables: violator_state and violator_tags
violator_state:
m_state_id
is_violating
m_translatedid
m_tag
m_violator_tag
This table holds the "tags" which has an unchanging row count of 10 in this case. The purpose is to list out each tag present, connect the full tag address (m_violator_tag) with its shorthand name (m_tag) and state whether it is in "violation". I need to use this table as reference because of the link between m_violator_tag and m_tag.
violator_tags
m_violator_id
m_eval_time_from
m_eval_time_to
m_tag
m_tag_peers
m_tag_position
This table is constantly having new rows added to it holding the information of what tags are in violation with a specific tag. So it would show T6 in violation with T1,T2,T9 ect.
I am looking to create a query which joins the two tables to show only the most recently updated (largest m_eval_time_from) for each tag.
I am using the following query to join the two tables but I expect m_translatedid and m_tag to match but they do not. Unsure why.
SELECT violator_state.m_violator_tag, violator_state.is_violating, violator_state.m_translatedid, violator_tags.m_tag, violator_tags.m_eval_time_to, violator_tags.m_tag_peers,
violator_tags.m_tag_position, violator_tags.m_eval_time_from
FROM violator_tags CROSS JOIN
violator_state
Violation_state table
violation_tags table
results of my (incorrect) query
Any suggestions on what I should try?
Your CROSS JOIN will give you a cartesian product where EVERY row in the first table is paired with ALL the rows in the second table e.g. if you have 10 rows in each, you will get 10 x 10 = 100 rows in the result! I believe you need to join the tables on the m_tag column and select the violator_tags row with the latest date. The query below should do this for you (though you haven't provided your question in a manner that makes it easy for me to double-check my code - see the link provided by a_horse_with_no_name for more on this or use a website like db-fiddle to set up your example).
SELECT vs.m_violator_tag,
vs.is_violating,
vs.m_translatedid,
vt.m_tag,
vt.m_eval_time_to,
vt.m_tag_peers,
vt.m_tag_position,
vt.m_eval_time_from
FROM violator_tags vt
JOIN violator_state vs
ON vt.m_tag = vs.m_tag
AND vt.m_eval_time_from = (SELECT MAX(vt.m_eval_time_from)
FROM violator_tags
WHERE m_tag = vt.m_tag)

Outputting the data from several sql tables without having a common value

I have a select query which combined several tables. PRODUCTION_ORDER_RESULTS, PRODUCTION_ORDERS and SERVICE_GUARANTY_NEW have common value however STOCKS table does not.
SELECT PR_ORDERS.ARRIVED_CITY,
PR_ORDERS.MONTAJ_DATE,
PR_ORDER_RESULT.TRANSFER_DATE,
PR_ORDERS.P_ORDER_ID,
PR_ORDER_RESULT.P_ORDER_ID,
SG.SALE_CONSUMER_ID,
SG.IS_SERI_SONU,
S.BRAND_ID,
S.PROPERTY
FROM workcube_test_1.PRODUCTION_ORDER_RESULTS PR_ORDER_RESULT,
workcube_test_1.PRODUCTION_ORDERS PR_ORDERS,
workcube_test_1.SERVICE_GUARANTY_NEW SG,
workcube_test_1.STOCKS S
WHERE PR_ORDER_RESULT.P_ORDER_ID = PR_ORDERS.P_ORDER_ID
AND PR_ORDER_RESULT.PR_ORDER_ID = SG.PROCESS_ID
when I run the query, it shows the output as below.
The problem here is there are four data rows returned from PRODUCTION_ORDER_RESULTS, PRODUCTION_ORDERS, SERVICE_GUARANTY_NEW and once I have added the STOCKS table, arrived_city, montaj_date, transfer_date columns are side by side with STOCKS table's rows, but the columns value should be null, not filled with data.
The way I tried is UNION of STOCKS table, however unioned table values are ignored, can not use them in html blocks.
there needs to be at least one more join condition among tables where there's for STOCKS table, I think there might exist such a column STOCK_ID within a table such as PRODUCTION_ORDER_RESULTS in order to join with STOCKS table. I think this should be the reason for multiple returning rows. If there's no common column, then the returning data will be produced as many as the number of records within STOCK table due to existing CROSS JOIN logic within the current query. So rearrange your query as
SELECT PR_ORDERS.ARRIVED_CITY,
PR_ORDERS.MONTAJ_DATE,
PR_ORDER_RESULT.TRANSFER_DATE,
PR_ORDERS.P_ORDER_ID,
PR_ORDER_RESULT.P_ORDER_ID,
SG.SALE_CONSUMER_ID,
SG.IS_SERI_SONU,
S.BRAND_ID,
S.PROPERTY
FROM workcube_test_1.PRODUCTION_ORDER_RESULTS PR_ORDER_RESULT
JOIN workcube_test_1.PRODUCTION_ORDERS PR_ORDERS
ON PR_ORDER_RESULT.P_ORDER_ID = PR_ORDERS.P_ORDER_ID
JOIN workcube_test_1.SERVICE_GUARANTY_NEW SG
ON PR_ORDER_RESULT.PR_ORDER_ID = SG.PROCESS_ID
JOIN workcube_test_1.STOCKS S
ON PR_ORDER_RESULT.STOCK_ID = S.ID

Inner joining two tables returns empty result

I am trying to get two tables that aren't related to pull a column of data each.
I have one table called AlphaData and one called TLAuth. Each includes a column that is labelled invoice and I need to pull both columns so I can at least start a comparison. TLAuth will include some of the invoice numbers from AlphaData, but not all of them.
Right now I am using the following code:
SELECT Alphadata.Invoice, TLAuth.Invoice
FROM Alphadata
INNER JOIN TlAuth
ON TLauth.TLAUthID = Alphadata.TLAUthID;
But every time I run this it comes up totally blank. There is definitely data in there, I can pull one column of data from each, but not both at the same time. I have even setup a relationship (1 to Many from TL Auth to Alphadata) and it doesn't seem to work so any help would be grand.
If the tables could not match you should use left join
SELECT Alphadata.Invoice, TLAuth.Invoice
From Alphadata
LEFT JOIN TlAuth ON TLauth.TLAUthID=Alphadata.TLAUthID;

Select distinct record with join count records

I have two tables: Company and Contact, with a relationship of one-to-many.
I have another table Track which identifies some of the companies as parent companies to other companies.
I want to write a SQL query that selects the parent companies from Track and the amount of contacts that each parent has.
SELECT Track.ParentId, Count(Contact.companyId)
FROM Track
INNER JOIN Contact
ON Track.ParentId = Contact.companyId
GROUP BY Track.ParentId
however The result holds less records than when I run the following query:
SELECT DISTINCT Track.ParentId
FROM Track
I tried the first query with an added DISTINCT and it returned the same results (less then what it was meant to).
You're performing an INNER JOIN with the Contact table, which means that any rows from the first table (Track in this case) with no matches to the JOINed table will not show up in your results. Try using a LEFT OUTER JOIN instead.
The COUNT with Contact.companyId will only count rows where there is a match (Contact.companyId is not NULL). Since you're counting contacts that's fine as they will count as 0. If you were trying to count some other set of data and tried to do a COUNT on a specific column (rather than COUNT(*)) then any NULL values in that column would not count towards your total, which might or might not be what you want.
I used an INNER JOIN which returns only records that are identical in both tables.
To return all records from Track table, and records that match in the Contact table, I need to use LEFT JOIN.

How do I remove "duplicate" rows from a view?

I have a view which was working fine when I was joining my main table:
LEFT OUTER JOIN OFFICE ON CLIENT.CASE_OFFICE = OFFICE.TABLE_CODE.
However I needed to add the following join:
LEFT OUTER JOIN OFFICE_MIS ON CLIENT.REFERRAL_OFFICE = OFFICE_MIS.TABLE_CODE
Although I added DISTINCT, I still get a "duplicate" row. I say "duplicate" because the second row has a different value.
However, if I change the LEFT OUTER to an INNER JOIN, I lose all the rows for the clients who have these "duplicate" rows.
What am I doing wrong? How can I remove these "duplicate" rows from my view?
Note:
This question is not applicable in this instance:
How can I remove duplicate rows?
DISTINCT won't help you if the rows have any columns that are different. Obviously, one of the tables you are joining to has multiple rows for a single row in another table. To get one row back, you have to eliminate the other multiple rows in the table you are joining to.
The easiest way to do this is to enhance your where clause or JOIN restriction to only join to the single record you would like. Usually this requires determining a rule which will always select the 'correct' entry from the other table.
Let us assume you have a simple problem such as this:
Person: Jane
Pets: Cat, Dog
If you create a simple join here, you would receive two records for Jane:
Jane|Cat
Jane|Dog
This is completely correct if the point of your view is to list all of the combinations of people and pets. However, if your view was instead supposed to list people with pets, or list people and display one of their pets, you hit the problem you have now. For this, you need a rule.
SELECT Person.Name, Pets.Name
FROM Person
LEFT JOIN Pets pets1 ON pets1.PersonID = Person.ID
WHERE 0 = (SELECT COUNT(pets2.ID)
FROM Pets pets2
WHERE pets2.PersonID = pets1.PersonID
AND pets2.ID < pets1.ID);
What this does is apply a rule to restrict the Pets record in the join to to the Pet with the lowest ID (first in the Pets table). The WHERE clause essentially says "where there are no pets belonging to the same person with a lower ID value).
This would yield a one record result:
Jane|Cat
The rule you'll need to apply to your view will depend on the data in the columns you have, and which of the 'multiple' records should be displayed in the column. However, that will wind up hiding some data, which may not be what you want. For example, the above rule hides the fact that Jane has a Dog. It makes it appear as if Jane only has a Cat, when this is not correct.
You may need to rethink the contents of your view, and what you are trying to accomplish with your view, if you are starting to filter out valid data.
So you added a left outer join that is matching two rows? OFFICE_MIS.TABLE_CODE is not unique in that table I presume? you need to restrict that join to only grab one row. It depends on which row you are looking for, but you can do something like this...
LEFT OUTER JOIN OFFICE_MIS ON
OFFICE_MIS.ID = /* whatever the primary key is? */
(select top 1 om2.ID
from OFFICE_MIS om2
where CLIENT.REFERRAL_OFFICE = om2.TABLE_CODE
order by om2.ID /* change the order to fit your needs */)
If the secondd row has one different value than it is not really duplicate and should be included.
Instead of using DISTINCT, you could use a GROUP BY.
Group by all the fields that you want to be returned as unique values.
Use MIN/MAX/AVG or any other function to give you one result for fields that could return multiple values.
Example:
SELECT Office.Field1, Client.Field1, MIN(Office.Field1), MIN(Client.Field2)
FROM YourQuery
GROUP BY Office.Field1, Client.Field1
You could try using Distinct Top 1 but as Hunter pointed out, if there is if even one column is different then it should either be included or if you don't care about or need the column you should probably remove it. Any other suggestions would probably require more specific info.
EDIT: When using Distinct Top 1 you need to have an appropriate group by statement. You would really be using the Top 1 part. The Distinct is in there because if there is a tie for Top 1 you'll get an error without having some way to avoid a tie. The two most common ways I've seen are adding Distinct to Top 1 or you could add a column to the query that is unique so that sql would have a way to choose which record to pick in what would otherwise be a tie.