Oracle Different row counts using Join and without Join - sql

I have an Oracle DB and use this query below to fetch records for a requirement. Five columns from three tables and a where condition.
select un.name, he.emp_no, he.lname, hr.in_unit, hr.out_unit
from hr_employee he
inner join hr_roster hr on he.eid = hr.eid
inner join units un on he.unit = un.unit_code
where hr.unit_date = to_date( '24-JUL-20','dd-MON-yy')
Later I realized that the version below, written without joins, is slightly faster.
select un.name, he.emp_no, he.lname, hr.in_unit, hr.out_unit
from hr_employee he, hr_roster hr, units un
where hr.unit_date = to_date( '24-JUL-20','dd-MON-yy')
But I noticed that the two queries above fetch a different number of rows.
When I took a row count of both queries, the one using joins returned 1012, while the other one kept fetching without ever returning a count.
I am a bit confused and do not know which query is the right one to use.

The second query is treated as a CROSS JOIN, since it has no join conditions between the tables' columns, only a restriction on a date, while the first one joins the tables with regular INNER JOIN conditions.

The second query is basically incorrect, as it has no join conditions between the tables; its only condition is a date filter on hr_roster. So it produces a Cartesian product: the hr_roster rows selected by the date filter, times ALL rows in hr_employee, times ALL rows in units.
The first query, which is the correct one, produces the hr_roster rows selected by the date filter, each combined only with the hr_employee rows matched by he.eid = hr.eid and the units rows matched by he.unit = un.unit_code.
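To see the difference in row counts on a small scale, here is a minimal sketch using Python's built-in sqlite3 module instead of Oracle. The table and column names come from the question, but the sample rows are invented, and the date is stored as ISO text because SQLite has no to_date.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle schema (sample rows are invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE units (unit_code TEXT, name TEXT);
    CREATE TABLE hr_employee (eid INTEGER, emp_no TEXT, lname TEXT, unit TEXT);
    CREATE TABLE hr_roster (eid INTEGER, unit_date TEXT, in_unit TEXT, out_unit TEXT);
    INSERT INTO units VALUES ('U1', 'North'), ('U2', 'South');
    INSERT INTO hr_employee VALUES
        (1, 'E001', 'Silva', 'U1'), (2, 'E002', 'Perera', 'U2'), (3, 'E003', 'Khan', 'U1');
    INSERT INTO hr_roster VALUES
        (1, '2020-07-24', 'U1', 'U2'), (2, '2020-07-24', 'U2', 'U1'), (3, '2020-07-23', 'U1', 'U1');
""")

def count(sql):
    """Run a SELECT COUNT(*) query and return the single number."""
    return conn.execute(sql).fetchone()[0]

# Explicit joins: each roster row meets only its own employee and unit.
explicit_join = count("""
    SELECT COUNT(*) FROM hr_employee he
    INNER JOIN hr_roster hr ON he.eid = hr.eid
    INNER JOIN units un ON he.unit = un.unit_code
    WHERE hr.unit_date = '2020-07-24'
""")

# Comma syntax with only the date filter: a Cartesian product.
comma_no_conditions = count("""
    SELECT COUNT(*) FROM hr_employee he, hr_roster hr, units un
    WHERE hr.unit_date = '2020-07-24'
""")

# Comma syntax with the join conditions moved into WHERE: same rows as the explicit joins.
comma_with_conditions = count("""
    SELECT COUNT(*) FROM hr_employee he, hr_roster hr, units un
    WHERE he.eid = hr.eid AND he.unit = un.unit_code
      AND hr.unit_date = '2020-07-24'
""")

print(explicit_join, comma_no_conditions, comma_with_conditions)
```

With these rows the unconditioned comma query returns 2 roster rows x 3 employees x 2 units, which is why the real query on full tables never seems to finish.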

Related

Newbie to SQL: I have run the inner join query but the result comes up with columns only

I ran this query in AdventureWorks and it ran successfully, but I only get the column headers with no data rows. Why?
select
a.BusinessEntityID,b.bonus,b.SalesLastYear
from
[Sales].[SalesPersonQuotaHistory] a
inner join
[Sales].[SalesPerson] b
on
a.SalesQuota = b.SalesQuota
My best guess is that instead of joining the tables on SalesQuota, you should be joining them on something else - An ID field, typically.
I don't have Adventureworks here, but judging from the names of the tables and the columns that you've provided, I would assume that there's a SalesPersonID field of some sort that actually connects a Salesperson's quota history to the Salesperson him/herself.
I would expect that you're looking for something closer to this:
SELECT
a.BusinessEntityID
,b.bonus
,b.SalesLastYear
FROM [Sales].[SalesPersonQuotaHistory] a
INNER JOIN [Sales].[SalesPerson] b
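ON a.SalesPersonID = b.SalesPersonID -- assumed column name; the question selects a.BusinessEntityID, so that may be the actual shared key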
General Knowledge:
INNER JOIN means "Show me only entries (rows) that have a matching value on both sides of the condition." (i.e. The value in Table A matches the value in Table B).
So ON a.SalesQuota = b.SalesQuota means "Only where the value of SalesQuota in Table A matches the value of SalesQuota in Table B."
I'm not sure what the purpose of this query could be. It is entirely possible that two salespeople have the same SalesQuota value, and then you would get duplicate rows (because the values would match in both cases). It is also possible that the values don't match at all, and then you wouldn't get any rows; I suspect that is what's happening to you.
Consider the conditions of what you're trying to join. Are you really trying to join quota amounts, or are you trying to retrieve quota information for specific salespeople? The answer should help guide your JOIN conditions.
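The effect of the join condition can be checked on a toy dataset. This is only a sketch with Python's sqlite3 module: the tables sales_person and quota_history, their columns, and the rows are all hypothetical stand-ins, not the real AdventureWorks schema.

```python
import sqlite3

# Hypothetical miniature tables; names and values are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_person (person_id INTEGER, sales_quota INTEGER, bonus INTEGER);
    CREATE TABLE quota_history (person_id INTEGER, quota_date TEXT, sales_quota INTEGER);
    INSERT INTO sales_person VALUES (1, 300000, 5000), (2, 250000, 4000);
    INSERT INTO quota_history VALUES
        (1, '2011-Q1', 28000), (1, '2011-Q2', 30000), (2, '2011-Q1', 91000);
""")

# Joining on the amount: no historical quota equals a current quota, so no rows come back.
by_quota = conn.execute("""
    SELECT h.person_id, p.bonus
    FROM quota_history h
    INNER JOIN sales_person p ON h.sales_quota = p.sales_quota
""").fetchall()

# Joining on the key: every history row finds its salesperson.
by_key = conn.execute("""
    SELECT h.person_id, h.quota_date, p.bonus
    FROM quota_history h
    INNER JOIN sales_person p ON h.person_id = p.person_id
    ORDER BY h.person_id, h.quota_date
""").fetchall()

print(len(by_quota), len(by_key))
```

An empty result with column headers only is exactly what the amount-based join produces here.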

Querying on tables, in different databases, on same server

I ran two queries that I came across and they returned different results.
SELECT *
FROM [Log].[dbo].[LogTable] AS MainLog, [Archive].[dbo].[LogTable]
ORDER BY [Log].[dbo].[LogTable] DESC;
SELECT *
FROM [Log].[dbo].[LogTable]
UNION ALL
SELECT *
FROM [Archive].[dbo].[LogTable]
ORDER BY [Log].[dbo].[LogTable] DESC;
The second query returned the correct number of rows and ordered them correctly. The first query brought back a lot of rows. What exactly did the first query do? It didn't error and it did combine the data.
Without specifying how and what columns to join on, you are performing a cross-join. The second query is selecting the records from each table and concatenating the results together.
What you are getting in query one is called a Cartesian product. Let's read your query step by step.
Access two databases.
Join LogTable in the Archive and Log databases.
With no WHERE clause or JOIN ... ON condition, the result is a cross join.
A cross join means that if you have a, b, c in Log and 1, 2, 3 in Archive, the output will be a1, a2, a3, b1, b2, b3, c1, c2, c3 (one combination per row, of course).
Whereas in the second query you are just writing the output of the two statements one after the other: a, b, c, 1, 2, 3.
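The a, b, c and 1, 2, 3 example can be reproduced directly. This is a small sketch using Python's sqlite3 module; log_t and archive_t are made-up single-column tables standing in for the two LogTable tables.

```python
import sqlite3

# Two tiny stand-in tables: log_t holds a, b, c and archive_t holds 1, 2, 3.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE log_t (v TEXT);
    CREATE TABLE archive_t (v TEXT);
    INSERT INTO log_t VALUES ('a'), ('b'), ('c');
    INSERT INTO archive_t VALUES ('1'), ('2'), ('3');
""")

# Comma-separated tables with no join condition: every row pairs with every row.
cross_rows = [r[0] for r in conn.execute(
    "SELECT l.v || a.v AS pair FROM log_t AS l, archive_t AS a ORDER BY pair"
)]

# UNION ALL: the rows of one result are appended to the rows of the other.
union_rows = [r[0] for r in conn.execute(
    "SELECT v FROM log_t UNION ALL SELECT v FROM archive_t"
)]

print(len(cross_rows), cross_rows)
print(len(union_rows), sorted(union_rows))
```

The cross join yields 3 x 3 = 9 rows, while UNION ALL yields 3 + 3 = 6.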

Why does this SQL query need DISTINCT?

I've written a query to filter a table based on criteria found in a master table, and then remove rows that match a third table. I'm executing the query in Access, so I can't use MINUS. It works, but I found that it returns duplicate rows for some, but not all, of the selected records. I fixed it with DISTINCT, but I don't know why it would return duplicates in the first place. It's a pretty simple query:
select distinct sq.*
from
(select List_to_Check.*, Master_List.SELECTION_VAR
from List_to_Check
left join Master_List
on List_to_Check.SUB_ID = Master_List.SUB_ID
where Master_List.SELECTION_VAR = 'criteria'
) as sq
left join List_to_Exclude
on sq.SUB_ID = List_to_Exclude.SUB_ID
where List_to_Exclude.SUB_ID is null
;
Edit: The relationships between all three tables are 1-to-1 on the SUB_ID var. Combined with using a LEFT JOIN, I would expect one line per ID.
I recommend breaking your query apart and checking for duplicates. My guess is that it's your data: the SUB_ID isn't actually unique.
Start with your subquery, since you're returning all of those columns. If you get duplicates there, your query is going to return duplicates regardless of what is in your exclusion table.
Once you have those duplicates cleared up, check the exclusion table for duplicate SUB_ID values.
To save time in trouble-shooting, if there are known culprits that are duplicates, you may want to limit the returned values, so you can focus on the peculiarities of those data.
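One way to run the duplicate check described above is a GROUP BY ... HAVING COUNT(*) > 1 query against each table. The sketch below uses Python's sqlite3 module rather than Access; the table names come from the question, while the rows and the helper duplicate_keys are invented for illustration.

```python
import sqlite3

# Sample data is made up; Master_List deliberately contains SUB_ID 2 twice.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE List_to_Check (SUB_ID INTEGER);
    CREATE TABLE Master_List (SUB_ID INTEGER, SELECTION_VAR TEXT);
    CREATE TABLE List_to_Exclude (SUB_ID INTEGER);
    INSERT INTO List_to_Check VALUES (1), (2), (3);
    INSERT INTO Master_List VALUES
        (1, 'criteria'), (2, 'criteria'), (2, 'criteria'), (3, 'other');
    INSERT INTO List_to_Exclude VALUES (3);
""")

def duplicate_keys(table):
    """Return (SUB_ID, occurrences) for every SUB_ID that appears more than once.

    The table name is interpolated directly, so only pass trusted, hard-coded names.
    """
    sql = f"SELECT SUB_ID, COUNT(*) FROM {table} GROUP BY SUB_ID HAVING COUNT(*) > 1"
    return conn.execute(sql).fetchall()

report = {t: duplicate_keys(t) for t in ("List_to_Check", "Master_List", "List_to_Exclude")}
print(report)
```

Any table with a non-empty list in the report is not really 1-to-1 on SUB_ID, and joining to it will multiply rows.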
I'm not sure this is a problem, but look into the logic on
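on List_to_Check.SUB_ID = Master_List.SUB_ID
where Master_List.SELECTION_VAR = 'criteria'
A WHERE clause on a column from the right side of a left outer join may not return the data you expect: rows with no match have NULL in that column, so the WHERE condition discards them and the join behaves like an inner join. Try this and see what happens: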
on List_to_Check.SUB_ID = Master_List.SUB_ID
and Master_List.SELECTION_VAR = 'criteria'
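The difference between the two placements of the condition shows up on a tiny dataset. This is a sketch with Python's sqlite3 module; the table and column names are from the question, the rows are invented.

```python
import sqlite3

# Invented rows: SUB_ID 3 has no Master_List entry, SUB_ID 2 has a non-matching one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE List_to_Check (SUB_ID INTEGER);
    CREATE TABLE Master_List (SUB_ID INTEGER, SELECTION_VAR TEXT);
    INSERT INTO List_to_Check VALUES (1), (2), (3);
    INSERT INTO Master_List VALUES (1, 'criteria'), (2, 'other');
""")

# Condition in WHERE: unmatched rows (NULL SELECTION_VAR) are filtered out afterwards.
where_rows = conn.execute("""
    SELECT c.SUB_ID, m.SELECTION_VAR
    FROM List_to_Check c
    LEFT JOIN Master_List m ON c.SUB_ID = m.SUB_ID
    WHERE m.SELECTION_VAR = 'criteria'
    ORDER BY c.SUB_ID
""").fetchall()

# Condition in ON: every List_to_Check row is kept, with NULL where nothing matched.
on_rows = conn.execute("""
    SELECT c.SUB_ID, m.SELECTION_VAR
    FROM List_to_Check c
    LEFT JOIN Master_List m ON c.SUB_ID = m.SUB_ID
                           AND m.SELECTION_VAR = 'criteria'
    ORDER BY c.SUB_ID
""").fetchall()

print(where_rows)
print(on_rows)
```

Note that the two forms answer different questions, so which one is right depends on whether rows without the criteria should survive.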
The inner query joins List_to_Check and Master_List, but the outer query joins List_to_Exclude with the subquery result (these are just the names I'm using for the three tables; change them as needed).
To avoid duplicates, you need to use one of the tables in both the inner and the outer query.

Outer join on multiple tables on SQL Server

I'm writing a recursive algorithm. It's taking data from 4 periods in the last year, and creating a resultset.
The issue is that not all scenarios return 4 periods.
So I've done a set of 4 selects on the table and used an outer join to connect them. They're joined on the PK. However, they're all joined to the first datapoint. Sometimes this datapoint doesn't exist, which throws a wrench in my join.
Is there an easy way to do a full outer join on 4 tables using a PK without doing 16 where clauses and outer joining them with (+)?
Actually, does (+) even work on SQL Server?
Thanks,
Eric
You should first create a complete dataset containing all periods in the previous year. You could do this by using something like SELECT DISTINCT PERIOD FROM (SELECT PERIOD FROM SetA UNION SELECT PERIOD FROM SetB UNION SELECT PERIOD FROM SetC etc...) AS COMPLETESET
Then left join all the other datasets against COMPLETESET on period.
The data points that do not exist in the joins will return null values.
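A minimal sketch of that approach, run through Python's sqlite3 module rather than SQL Server. SetA, SetB, SetC, PERIOD and COMPLETESET follow the names used above; the val column and the rows are assumptions made up for the example.

```python
import sqlite3

# Three invented period datasets, none of which covers all four periods.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SetA (PERIOD TEXT, val INTEGER);
    CREATE TABLE SetB (PERIOD TEXT, val INTEGER);
    CREATE TABLE SetC (PERIOD TEXT, val INTEGER);
    INSERT INTO SetA VALUES ('Q1', 10), ('Q2', 20);
    INSERT INTO SetB VALUES ('Q2', 200), ('Q3', 300);
    INSERT INTO SetC VALUES ('Q4', 4000);
""")

# COMPLETESET is the union of all periods; each dataset is then left joined to it,
# so no single dataset has to contain every period.
rows = conn.execute("""
    WITH COMPLETESET AS (
        SELECT PERIOD FROM SetA
        UNION SELECT PERIOD FROM SetB
        UNION SELECT PERIOD FROM SetC
    )
    SELECT c.PERIOD, a.val, b.val, s.val
    FROM COMPLETESET c
    LEFT JOIN SetA a ON a.PERIOD = c.PERIOD
    LEFT JOIN SetB b ON b.PERIOD = c.PERIOD
    LEFT JOIN SetC s ON s.PERIOD = c.PERIOD
    ORDER BY c.PERIOD
""").fetchall()

for row in rows:
    print(row)
```

UNION (without ALL) already removes duplicate periods, so the sketch leaves out the extra DISTINCT.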

SQL Join returns empty set

I'm having problems with my SQL Join. I want to join two tables on a specific ID number and during a specific time frame, but I just keep getting an empty set returned. What I want to get is a match between both tables on the ID numbers, and also filter it by time, also called "Term". Term is on the ProcInfo table I believe. Any ideas on what I'm doing wrong?
SELECT *
FROM tblPernfo INNER JOIN tblProcInfo ON tblProcInfo.eID=tblPernfo.eID
WHERE Term In ('1st Sum 2010')
ORDER BY Term;
First
SELECT (specify columns here)
FROM tblPernfo
INNER JOIN tblProcInfo ON tblProcInfo.eID=tblPernfo.eID
WHERE Term In ('1st Sum 2010')
ORDER BY Term;
It is very poor practice to use SELECT *. It can cause performance problems, especially with a join, where the join column is returned twice.
Why are you using IN? = should work.
Now to get to why no records are returned. This is a simple dataset, so there are only a couple of possibilities. The first is that there are no records in tblProcInfo that match records in tblPernfo. You can confirm or exclude this possibility by running the statement without the WHERE clause.
SELECT (specify columns here)
FROM tblPernfo
INNER JOIN tblProcInfo ON tblProcInfo.eID=tblPernfo.eID
If it returns records, the WHERE clause is the issue; if it does not, the join is the issue. Next run this (or substitute tblProcInfo if that is the table that contains the Term column):
SELECT (specify columns here)
FROM tblPernfo
WHERE Term In ('1st Sum 2010')
If that returns data and the first query returned records, then the only possibility left is that no records in the second table match the first table for this specific value.
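The three checks can be run side by side. Below is a sketch using Python's sqlite3 module with invented rows, assuming Term lives in tblProcInfo as the question suggests; the name column in tblPernfo is hypothetical.

```python
import sqlite3

# Invented data: the only '1st Sum 2010' row belongs to an eID missing from tblPernfo.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblPernfo (eID INTEGER, name TEXT);
    CREATE TABLE tblProcInfo (eID INTEGER, Term TEXT);
    INSERT INTO tblPernfo VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO tblProcInfo VALUES (1, 'Fall 2010'), (3, '1st Sum 2010');
""")

term = "1st Sum 2010"

# Check 1: the join alone, without the WHERE clause.
join_only = conn.execute("""
    SELECT COUNT(*) FROM tblPernfo
    INNER JOIN tblProcInfo ON tblProcInfo.eID = tblPernfo.eID
""").fetchone()[0]

# Check 2: the filter alone, on the table that holds Term.
term_only = conn.execute(
    "SELECT COUNT(*) FROM tblProcInfo WHERE Term = ?", (term,)
).fetchone()[0]

# Check 3: join and filter together, as in the original query.
combined = conn.execute("""
    SELECT COUNT(*) FROM tblPernfo
    INNER JOIN tblProcInfo ON tblProcInfo.eID = tblPernfo.eID
    WHERE Term = ?
""", (term,)).fetchone()[0]

print(join_only, term_only, combined)
```

Here both individual checks return a row, yet the combined query is empty, because no eID that has the requested Term also exists in tblPernfo.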