Outer join on multiple tables on SQL Server - sql

I'm writing a recursive algorithm. It takes data from 4 periods in the last year and creates a result set.
The issue is that not all scenarios return 4 periods.
So I've done a set of 4 selects on the table and used an outer join to connect them. They're joined on the PK. However, they're all joined to the first data point. Sometimes this data point doesn't exist, which throws a wrench in my join.
Is there an easy way to do a full outer join on 4 tables using a PK without doing 16 where clauses and outer joining them with (+)?
Actually, does (+) even work on SQL Server?
Thanks,
Eric

You should first create a complete dataset containing all periods in the previous year. You could do this by using something like SELECT DISTINCT PERIOD FROM (SELECT PERIOD FROM SetA UNION SELECT PERIOD FROM SetB UNION SELECT PERIOD FROM SetC etc...) AS COMPLETESET
Then left join all the other datasets against COMPLETESET on period.
The data points that do not exist in the joins will return null values.
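As a rough illustration of this approach, here is a minimal sketch that runs the SQL through Python's built-in sqlite3 module rather than SQL Server. The tables set_a, set_b, set_c and their val column are hypothetical stand-ins for the per-period selects. UNION already removes duplicates, so the sketch leaves out DISTINCT.

```python
import sqlite3

# Hypothetical per-period datasets; none of them covers every period.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE set_a (period INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE set_b (period INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE set_c (period INTEGER PRIMARY KEY, val TEXT);
    INSERT INTO set_a VALUES (2, 'a2'), (3, 'a3');
    INSERT INTO set_b VALUES (1, 'b1'), (3, 'b3');
    INSERT INTO set_c VALUES (4, 'c4');
""")

# Build the complete set of periods first, then left join every dataset to it.
rows = conn.execute("""
    SELECT cs.period, a.val, b.val, c.val
    FROM (
        SELECT period FROM set_a
        UNION
        SELECT period FROM set_b
        UNION
        SELECT period FROM set_c
    ) AS cs
    LEFT JOIN set_a AS a ON a.period = cs.period
    LEFT JOIN set_b AS b ON b.period = cs.period
    LEFT JOIN set_c AS c ON c.period = cs.period
    ORDER BY cs.period
""").fetchall()

for row in rows:
    print(row)  # missing data points come back as None (NULL)
```

Because every dataset is joined to the complete set instead of to the first dataset, a period that is missing from set_a still appears in the result.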

Related

Terminology for when a join propagates out additional rows

When joining between two tables/queries:
with
cte1 (id) as (
select 1 from dual),
cte2 (id) as (
select 1 from dual union all
select 1 from dual)
select
cte1.id as cte1_id,
cte2.id as cte2_id
from
cte1
left join
cte2
on cte1.id = cte2.id
CTE1_ID CTE2_ID
1 1
1 1
Unsurprisingly, that join propagates out additional rows. The query on the left side of the join only had one row. But the resultset has two rows due to the join.
I suspect “propagate” isn’t quite the right word for describing that scenario.
What’s the proper term?
For example, when talking to people who are new to SQL, I often say, “Be careful with that join. It looks like you’re accidentally propagating out additional rows, since the join is 1:many.”
In this example you are not propagating rows (at least in my understanding anyway). You have two rows in the table on the right side of the join and you have two rows in the result.
However, if you had this:
WITH cte AS
(
SELECT 1 AS id FROM dual
UNION ALL
SELECT 1 FROM dual
)
SELECT x.id, y.id
FROM cte x
INNER JOIN cte y ON x.id = y.id;
You would start with 2 rows and the query would return 4, because the join is partial. To me, this is propagating data.
When every row from one side of the join is joined with every row on the other, the term you are looking for is a "cartesian product", which is achieved in SQL using a "cross join". In cases where the join condition is not unique but still limits the matches, you could say "partial cartesian product" (though I don't recommend it) or, more commonly, "partial cross join". I think the latter is more likely to be readily understood by SQL developers.
In either case, there are times when both can be appropriate, but a lot of the time they are the result of an error in a join clause.
What’s the proper term?
"Cartesian Product" could be one term you can use.
I.e. "Be careful of that join. It looks like you are accidentally returning the cartesian product of the two tables."
A CROSS JOIN will return the cartesian product of the two joined tables; it is also called a "Cartesian Join".
An INNER JOIN will return the cartesian product of the two joined tables filtered by some relationship (the join condition(s)) between columns of the two tables; when the join conditions use only equality comparisons, it is also called an "Equi Join".
An OUTER JOIN is similar to the INNER JOIN but will also return the non-matched rows on one (for LEFT or RIGHT joins) or both (for FULL joins) sides of the join condition.
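To make the three definitions above concrete, here is a small sketch using Python's sqlite3 module. The tables t1 and t2 and their contents are made up for illustration; only the row counts matter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER);
    CREATE TABLE t2 (id INTEGER);
    INSERT INTO t1 VALUES (1), (2);          -- 2 rows
    INSERT INTO t2 VALUES (1), (1), (3);     -- 3 rows, id 1 is duplicated
""")

def row_count(sql):
    """Run a query and return how many rows it produces."""
    return len(conn.execute(sql).fetchall())

# Cartesian product: every t1 row paired with every t2 row (2 * 3).
cross_rows = row_count("SELECT t1.id, t2.id FROM t1 CROSS JOIN t2")

# Cartesian product filtered by the join condition: t1.id = 1 matches twice.
inner_rows = row_count(
    "SELECT t1.id, t2.id FROM t1 INNER JOIN t2 ON t1.id = t2.id")

# Same matches, plus the unmatched t1 row (id 2) padded with NULL.
left_rows = row_count(
    "SELECT t1.id, t2.id FROM t1 LEFT JOIN t2 ON t1.id = t2.id")

print(cross_rows, inner_rows, left_rows)  # 6 2 3
```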
Why a LEFT JOIN? An INNER JOIN would do the same!
And no, it doesn't propagate. cte2 has two rows with id 1, which is what UNION ALL actually produces, so when you join both tables on the same id you receive 2 rows in the joined result set.
A LEFT JOIN also takes all rows of the left table and tries to join them, in your case by id; if it doesn't find a match, it adds the row with the right table's columns as NULL.
So no wonders and no miracles, simple SQL.

Oracle Left outer joining same tables multiple times

I have four tables named ROLE, EMPLOYEE, HIERARCHY_UNIT and EMPLOYEE_CLASS. Each of them has its own nameid column, which refers to the primary key of the STRING_TABLE table. STRING_TABLE also has a column stringtext that stores the exact text, or we can say the name.
All of these tables are linked, as each of them has an empid column.
Now I want to select some information from the EMPLOYEE table along with the names of the corresponding role, hierarchy unit and employee class. My select query will be something like
SELECT EMPST.STRINGTEXT,EROLE.STRINGTEXT,EHIER.STRINGTEXT,EEMPCLASS.STRINGTEXT
FROM EMPLOYEE EMP
INNER JOIN ROLE RO ON RO.EMPID=EMP.EMPID
INNER JOIN EMPLOYEE_CLASS EC ON EC.EMPID = EMP.EMPID
INNER JOIN HIERARCHY_UNIT HU ON HU.EMPID=EMP.EMPID
LEFT OUTER JOIN STRING_TABLE EMPST ON EMPST.NAMEID=EMP.NAMEID
LEFT OUTER JOIN STRING_TABLE EROLE ON RO.NAMEID=EROLE.NAMEID
LEFT OUTER JOIN STRING_TABLE EHIER ON HU.NAMEID=EHIER.NAMEID
LEFT OUTER JOIN STRING_TABLE EEMPCLASS ON EEMPCLASS.NAMEID=EC.NAMEID
The above query is working fine, but I have a question: will joining to the same table repeatedly cause any performance issue? In the above example I have used left outer join 4 times (in my actual case I have 26 joins with the string table). Is there any way to optimize the above select query and avoid joining the same table multiple times?
It is hard to say anything about the performance of a query without its execution plan. Guessing is the only thing possible at the moment.
So, assuming STRING_TABLE.NAMEID is worth indexing (there are many unique values) and you already have this column indexed, this query is fine.
In the select part of the query you've specified 4 STRING_TABLE columns. This means you're asking the database to find values for different NAMEIDs from 4 different rows in that table and put them in one row for each row in the query result.
What the database has to do is look 4 times (or 26 in production) for the particular row with a particular nameid for each row in the output. This is why you need to join STRING_TABLE 4 times and, again, this is fine as long as the column is worth indexing and is already indexed.
If there are many non-unique values in the NAMEID column, you might need to use partitioning or even change the database structure in order to get better performance. But, again, the query is fine and I don't see any way to make it better.
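The sketch below shows the same pattern on a reduced, hypothetical schema (only EMPLOYEE, ROLE and STRING_TABLE, with made-up rows), run through Python's sqlite3 module rather than Oracle. Each alias of string_table is a separate keyed lookup, and the query plan can be inspected to check that the key is used. The plan wording differs between engines and versions, so treat the printed text as informational only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE string_table (nameid INTEGER PRIMARY KEY, stringtext TEXT);
    CREATE TABLE employee (empid INTEGER PRIMARY KEY, nameid INTEGER);
    CREATE TABLE role (empid INTEGER, nameid INTEGER);
    INSERT INTO string_table VALUES (1, 'Alice'), (2, 'Engineer');
    INSERT INTO employee VALUES (10, 1);
    INSERT INTO role VALUES (10, 2);
""")

# One alias of string_table per name that has to be resolved.
query = """
    SELECT empst.stringtext, erole.stringtext
    FROM employee emp
    INNER JOIN role ro ON ro.empid = emp.empid
    LEFT OUTER JOIN string_table empst ON empst.nameid = emp.nameid
    LEFT OUTER JOIN string_table erole ON erole.nameid = ro.nameid
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Alice', 'Engineer')]

# The last column of each plan row is the human-readable step description.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
for step in plan:
    print(step)
```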

Oracle Different row counts using Join and without Join

I have an Oracle DB and use this query below to fetch records for a requirement. Five columns from three tables and a where condition.
select un.name, he.emp_no, he.lname, hr.in_unit, hr.out_unit
from hr_employee he
inner join hr_roster hr on he.eid = hr.eid
inner join units un on he.unit = un.unit_code
where hr.unit_date = to_date( '24-JUL-20','dd-MON-yy')
Later on I realized that if it is written as below, without joins, it is slightly faster.
select un.name, he.emp_no, he.lname, hr.in_unit, hr.out_unit
from hr_employee he, hr_roster hr, units un
where hr.unit_date = to_date( '24-JUL-20','dd-MON-yy')
But I noticed that the two queries above fetch a different number of rows.
When I took a row count of both queries, the one using joins returned 1012, while the other one kept fetching without ever returning a count.
I am a bit confused and do not know which query is the most suitable to use.
The second query is treated as a CROSS JOIN, since there are no join conditions among the tables' columns, only a restriction on a certain date, while the first one has standard inner joins among the tables with regular INNER JOIN conditions.
The second query is basically incorrect, as it has no join conditions between the tables, only a date restriction on hr_roster. So it basically produces a cartesian product of the selected hr_roster rows times ALL rows of hr_employee times ALL rows of units.
The first query, which looks more correct, produces the hr_employee rows joined to the selected hr_roster rows by he.eid = hr.eid and to the units rows by he.unit = un.unit_code.
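A small sketch of the difference, using Python's sqlite3 module with made-up rows and a reduced set of columns (the dates are stored as ISO text here, which is an assumption of the sketch, not of the original schema). It also shows that the comma syntax gives the same result as the explicit joins once the join conditions are written in the WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hr_employee (eid INTEGER, emp_no TEXT, unit TEXT);
    CREATE TABLE hr_roster (eid INTEGER, unit_date TEXT);
    CREATE TABLE units (unit_code TEXT, name TEXT);
    INSERT INTO hr_employee VALUES (1, 'E1', 'U1'), (2, 'E2', 'U2');
    INSERT INTO hr_roster VALUES
        (1, '2020-07-24'), (2, '2020-07-24'), (1, '2020-07-25');
    INSERT INTO units VALUES ('U1', 'Unit one'), ('U2', 'Unit two');
""")

# Explicit joins: one row per employee rostered on the date.
joined = conn.execute("""
    SELECT un.name, he.emp_no
    FROM hr_employee he
    INNER JOIN hr_roster hr ON he.eid = hr.eid
    INNER JOIN units un ON he.unit = un.unit_code
    WHERE hr.unit_date = '2020-07-24'
    ORDER BY he.emp_no
""").fetchall()

# Comma syntax with no join conditions: 2 roster rows * 2 employees * 2 units.
cross = conn.execute("""
    SELECT un.name, he.emp_no
    FROM hr_employee he, hr_roster hr, units un
    WHERE hr.unit_date = '2020-07-24'
""").fetchall()

# Comma syntax with the join conditions moved into WHERE.
comma_with_conditions = conn.execute("""
    SELECT un.name, he.emp_no
    FROM hr_employee he, hr_roster hr, units un
    WHERE he.eid = hr.eid
      AND he.unit = un.unit_code
      AND hr.unit_date = '2020-07-24'
    ORDER BY he.emp_no
""").fetchall()

print(len(joined), len(cross), len(comma_with_conditions))  # 2 8 2
```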

How to LEFT JOIN two simple tables but not throwing away data from the second table? [duplicate]

I am trying to get the number of page opens on a per day basis using the following query.
SELECT day.days, COUNT(*) as opens
FROM day
LEFT OUTER JOIN tracking ON day.days = DAY(FROM_UNIXTIME(open_date))
WHERE tracking.open_id = 10
GROUP BY day.days
The output I get it is this:
days opens
1 9
9 2
The thing is, in my day table, I have a single column that contains the numbers 1 to 30 to represent the days in a month. I did a left outer join and I am expecting to have all days show in the days column!
But my query is not doing that. Why might that be?
Nanne's answer explains why you don't get the desired result (your WHERE clause removes rows), but not how to fix it.
The solution is to change WHERE to AND so that the condition is part of the join condition, not a filter applied after the join:
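-- Count a tracking column rather than * so days with no opens report 0
SELECT day.days, COUNT(tracking.open_id) as opens
FROM day
LEFT OUTER JOIN tracking
ON day.days = DAY(FROM_UNIXTIME(open_date))
AND tracking.open_id = 10
GROUP BY day.days
Now all rows in the left table will be present in the result. Note that COUNT(*) would report 1 for a day with no matching opens, because the NULL-padded row is still a row; counting tracking.open_id gives 0 for those days.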
You specify that the joined tracking.open_id must be 10. For days with no matching tracking row it will be NULL, so those rows don't show up!
The condition is in the WHERE clause. After the tables are joined, the WHERE conditions are evaluated and only rows matching the criteria are kept. Thus anything not matching tracking.open_id = 10 gets discarded.
If you want to apply this condition while joining the two tables, put it in the ON clause (i.e. the join condition) rather than in the WHERE clause, which filters the entire joined dataset.
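Here is a runnable sketch of the WHERE versus ON behaviour using Python's sqlite3 module. To keep it engine-neutral, the tracking table stores a hypothetical open_day column directly instead of computing DAY(FROM_UNIXTIME(open_date)), and the day table only holds 1 to 5. The second query counts tracking.open_id rather than *, so days without opens report 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE day (days INTEGER);
    CREATE TABLE tracking (open_id INTEGER, open_day INTEGER);
    INSERT INTO day VALUES (1), (2), (3), (4), (5);
    INSERT INTO tracking VALUES (10, 1), (10, 1), (10, 3), (11, 2);
""")

# Condition in WHERE: NULL-padded rows fail open_id = 10 and are dropped.
filtered_after_join = conn.execute("""
    SELECT day.days, COUNT(*) AS opens
    FROM day
    LEFT OUTER JOIN tracking ON day.days = tracking.open_day
    WHERE tracking.open_id = 10
    GROUP BY day.days
    ORDER BY day.days
""").fetchall()

# Condition in ON: it only decides which tracking rows match, so every
# day survives the join.
filtered_in_join = conn.execute("""
    SELECT day.days, COUNT(tracking.open_id) AS opens
    FROM day
    LEFT OUTER JOIN tracking
        ON day.days = tracking.open_day
        AND tracking.open_id = 10
    GROUP BY day.days
    ORDER BY day.days
""").fetchall()

print(filtered_after_join)  # [(1, 2), (3, 1)]
print(filtered_in_join)     # [(1, 2), (2, 0), (3, 1), (4, 0), (5, 0)]
```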

Make the correct INNER JOIN when using a BETWEEN

I'm sorry if this is a dumb question, but I have this particular case I can't figure out how to handle. I need a query that returns all date values between two date values from another table, and right now this is my query:
SELECT h.hour_gkey, h.hour_time
FROM Hours as h
INNER JOIN ServiceHours sh ON h.hour_gkey BETWEEN sh.openhour_hour_gkey AND sh.closehour_hour_gkey;
To explain a bit further: the ServiceHours table has two integer fields, openhour_hour_gkey and closehour_hour_gkey. These two fields are foreign keys to the Hours table and therefore correspond to time values. hour_gkey (integer) is the primary key of the Hours table, and I need to show only the values of hour_time (date field type) that are between the dates that correspond to those two fields. How could I do that?
I'm currently using SQL Server 2014.
I'm interpreting your question to be "How do I select all of the rows of Hours whose hour_time values are between those related to ServiceHours.openhour_hour_gkey and ServiceHours.closehour_hour_gkey?"
I furthermore suppose that it is intentional that you are neither selecting any columns from ServiceHours nor filtering to narrow the results to those associated with a single ServiceHours row. Thus, if there are multiple ServiceHours rows you will get a set of Hours rows for each one, with these sets not necessarily being disjoint, and with no indication of which goes with which ServiceHours row.
In any case, you need to perform a join for each relationship you want to traverse, and for this query you seem to want another join to get the target data. That might look like this:
SELECT h.hour_gkey, h.hour_time
FROM
Hours h
CROSS JOIN ServiceHours sh
INNER JOIN Hours sho
ON sh.openhour_hour_gkey = sho.hour_gkey
INNER JOIN Hours shc
ON sh.closehour_hour_gkey = shc.hour_gkey
WHERE h.hour_time BETWEEN sho.hour_time AND shc.hour_time;
I have written the BETWEEN condition as a filter predicate instead of a join predicate because that seems a better characterization. For inner joins, however, the two alternatives are equivalent. Note also that this query is semantically equivalent to @DaveCosta's.
Here is one way I think it could be done. I am not certain if the syntax is exactly right for SQL Server. But the basic idea is, you would need to join from ServiceHours to Hours to get the actual open/close hour values, then select the rows from Hours with values in that range.
WITH min_max AS (
SELECT h1.hour_time min_hour_time, h2.hour_time max_hour_time
FROM ServiceHours sh
JOIN Hours h1 ON h1.hour_gkey = sh.openhour_hour_gkey
JOIN Hours h2 ON h2.hour_gkey = sh.closehour_hour_gkey
)
SELECT h.hour_time
FROM Hours h
JOIN min_max ON h.hour_time BETWEEN min_max.min_hour_time AND min_max.max_hour_time
(Note I'm assuming that ServiceHours has only one row. If it doesn't, there is probably some other field you want to include in both the subquery and the main query to indicate which row in ServiceHours each resulting row relates to.)
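The following sketch contrasts comparing keys with comparing the times those keys point to. It uses Python's sqlite3 module instead of SQL Server, stores hour_time as 'HH:MM' text, and uses made-up rows whose key order deliberately differs from their time order; all of that is assumed for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Hours (hour_gkey INTEGER PRIMARY KEY, hour_time TEXT);
    CREATE TABLE ServiceHours (
        openhour_hour_gkey INTEGER,
        closehour_hour_gkey INTEGER
    );
    -- Keys are not in time order on purpose.
    INSERT INTO Hours VALUES
        (1, '10:00'), (2, '08:00'), (3, '09:00'), (4, '12:00'), (5, '11:00');
    -- Open at 09:00 (key 3), close at 11:00 (key 5).
    INSERT INTO ServiceHours VALUES (3, 5);
""")

# Original attempt: BETWEEN applied to the keys themselves.
by_key = conn.execute("""
    SELECT h.hour_gkey, h.hour_time
    FROM Hours h
    INNER JOIN ServiceHours sh
        ON h.hour_gkey BETWEEN sh.openhour_hour_gkey AND sh.closehour_hour_gkey
    ORDER BY h.hour_gkey
""").fetchall()

# Resolve both keys to times first, then compare times.
by_time = conn.execute("""
    SELECT h.hour_gkey, h.hour_time
    FROM Hours h
    CROSS JOIN ServiceHours sh
    INNER JOIN Hours sho ON sh.openhour_hour_gkey = sho.hour_gkey
    INNER JOIN Hours shc ON sh.closehour_hour_gkey = shc.hour_gkey
    WHERE h.hour_time BETWEEN sho.hour_time AND shc.hour_time
    ORDER BY h.hour_time
""").fetchall()

print(by_key)   # [(3, '09:00'), (4, '12:00'), (5, '11:00')]
print(by_time)  # [(3, '09:00'), (1, '10:00'), (5, '11:00')]
```

The key-based version wrongly includes 12:00 and misses 10:00 here, which is why the extra joins back to Hours are needed whenever key order is not guaranteed to follow time order.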