How to use an alias in a sql join - sql

I create a variable with a case when like it:
case when (a.exit_date='0001-01-01' and z.fermeture<>'0001-01-01') then z.fermeture
else a.exit_date
end as final_exit_date,
And after I got a sql join like it:
select a.*,b.*
from table1 as a
left join table2 as b on (a.id=b.id and b.start <= a.exit_date and a.exit_date < b.end)
where a.id=28445
When I do it, it works ! But me I don't want use the variable "a.exit_date"
I want replace it per the variable that I created ( final_exit_date), like it:
select a.*,b.*
from table1 as a
left join table2 as b on (a.id = b.id and b.start <= final_exit_date and final_exit_date < b.end)
where a.id=28445
Thanks in advance for reading me !!

When you create an expression with an alias in a SELECT list, the only place you are allowed to use the alias is in the ORDER BY clause. This frustrates many people in SQL because it is often that the alias would be useful in a WHERE clause or elsewhere, but that is not possible.
The solution is that you have to duplicate the expression instead of using the alias.
SELECT a.*,b.*
FROM table1 AS a
-- you will need the z table as well
LEFT JOIN table2 AS b ON (a.id=b.id and b.start <=
CASE WHEN (a.exit_date='0001-01-01' AND z.fermeture<>'0001-01-01') THEN z.fermeture ELSE a.exit_date END
AND
CASE WHEN (a.exit_date='0001-01-01' AND z.fermeture<>'0001-01-01') THEN z.fermeture ELSE a.exit_date END < b.end)
WHERE a.id=28445
Another option is to use a common table expression (CTE) or a subquery so that the alias is available. Using a CTE, it would look something like this:
;WITH records AS (
SELECT a.*, CASE WHEN (a.exit_date='0001-01-01' AND z.fermeture<>'0001-01-01') THEN z.fermeture ELSE a.exit_date END AS final_exit_date
FROM table1 AS a
LEFT OUTER JOIN otherTable AS z ON a.id = z.id -- whatever the condition is
)
SELECT r.* b.*
FROM records AS r
LEFT JOIN table2 AS b ON r.id = b.id AND b.start<= r.final_exit_date AND r.final_exit_date < b.end
There are likely some issues with the query above since you didn't include the table names or the columns (or how they are even related), but you should be able to create a working solution by adapting one of these two approaches.

Related

Full outer join acts like inner join with multiple conditions on the two tables

I am trying to have a full outer join between two tables Table1 and Table2 on ID with a query like the following in Teradata. The problem is it acts like inner join.
SELECT *
FROM Table1 AS a
FULL OUTER JOIN Table2 AS b
ON a.ID = b.ID
WHERE a.country in ('US','FR')
AND a.create_date = '2021-01-01'
AND b.country IN ('US','DE','BE')
AND b.create_date = '2021-01-01';
What I want is something like this:
SELECT * FROM
(
SELECT * FROM Table1 as a
WHERE a.country in ('US','FR')
AND a.create_date = '2021-01-01'
) as ax
FULL OUTER JOIN
(
SELECT * FROM Table2 as b
WHERE b.country IN ('US','DE','BE')
AND b.create_date = '2021-01-01'
) as bx
ON ax.ID=bx.ID;
I feel like the second query is not best practice, maybe inefficient and/or hard to read in complicated cases. How can I modify the first query to get the desired output?
I know that this is a fundamental problem and probably there are many other ways to do it (e.g. with USING, HAVING etc) but could not find a basic explanation. Would appreciate a comprehensive answer on alternative solutions as a guide for future reference.
EDIT
The difference in my question to Left Join With Where Clause is that I require a condition in both tables. I cannot figure out where to put the second WHERE condition.
The short answer: Both sets of predicates belong in the ON clause.
SELECT *
FROM Table1 AS a
FULL OUTER JOIN Table2 AS b
ON a.ID = b.ID
AND a.country in ('US','FR')
AND a.create_date = '2021-01-01'
AND b.country IN ('US','DE','BE')
AND b.create_date = '2021-01-01';
The ON clause both limits the rows that are eligible to participate in the join (pre-join filtering) and specifies how to match rows (join criteria). The WHERE clause filters results (after the join).
A generally less-desirable alternative would be to modify the predicates so as not to filter out the non-matching rows, e.g. assuming ID is NOT NULL in both tables
SELECT *
FROM Table1 AS a
FULL OUTER JOIN Table2 AS b
ON a.ID = b.ID
WHERE (a.country in ('US','FR')
AND a.create_date = '2021-01-01'
OR a.ID IS NULL)
AND (b.country IN ('US','DE','BE')
AND b.create_date = '2021-01-01'
OR b.ID IS NULL);
Logically the ON and WHERE work the same way for INNER JOIN but in that case the net result is the same (and many databases including Teradata will generate the same query plan for INNER JOIN regardless of where you put the filter predicates).

Join based on temp column

I have two tables that need to be joined, but the only similar column has excess data that needs to be stripped. I would just modify the tables, but I only have read access to them. So, I strip the unneeded text out of the table and add a temp column, but I cannot join to it. I get the error:
Invalid column name 'TempJoin'
SELECT
CASE WHEN CHARINDEX('- ExtraText',a.Column1)>0 THEN LEFT(a.Column1, (CHARINDEX('- ExtraText', a.Column1))-1)
WHEN CHARINDEX('- ExtraText',a.Column1)=0 THEN a.Column1
END AS TempJoin
,a.Column1
,b.Column2
FROM Table1 as a
LEFT JOIN Table2 as b WITH(NOLOCK) ON b.Column2=TempJoin
Easiest way would be to wrap this in a CTE. Also, be careful using NOLOCK, unless you have an explicit reason.
WITH cte AS (
SELECT
CASE WHEN CHARINDEX('- ExtraText',a.Column1) > 0
THEN LEFT(a.Column1, (CHARINDEX('- ExtraText', a.Column1))-1)
WHEN CHARINDEX('- ExtraText',a.Column1) = 0
THEN a.Column1
END AS TempJoin,
a.Column1
FROM Table1 AS a
)
SELECT *
FROM cte
LEFT JOIN Table2 AS b WITH(NOLOCK) ON b.Column2 = TempJoin;

BigQuery Full outer join producing "left join" results

I have 2 tables, both of which contain distinct id values. Some of the id values might occur in both tables and some are unique to each table. Table1 has 10,910 rows and Table2 has 11,304 rows
When running a left join query:
SELECT COUNT(DISTINCT a.id)
FROM table1 a
JOIN table2 b on a.id = b.id
I get a total of 10,896 rows or 10,896 ids shared across both tables.
However, when I run a FULL OUTER JOIN on the 2 tables like this:
SELECT COUNT(DISTINCT a.id)
FROM table1 a
FULL OUTER JOIN EACH table2 b on a.id = b.id
I get total of 10,896 rows, but I was expecting all 10,910 rows from table1.
I am wondering if there is an issue with my query syntax.
As you are using EACH - it looks like you are running your queries in Legacy SQL mode.
In BigQuery Legacy SQL - COUNT(DISTINCT) function is probabilistic - gives statistical approximation and is not guaranteed to be exact.
You can use EXACT_COUNT_DISTINCT() function instead - this one gives you exact number but a little more expensive on back-end
Even better option - just use Standard SQL
For your specific query you will only need to remove EACH keyword and it should work as a charm
#standardSQL
SELECT COUNT(DISTINCT a.id)
FROM table1 a
JOIN table2 b on a.id = b.id
and
#standardSQL
SELECT COUNT(DISTINCT a.id)
FROM table1 a
FULL OUTER JOIN table2 b on a.id = b.id
I added the original query as a subquery and counted ids and produced the expected results. Still a little strange, but it works.
SELECT EXACT_COUNT_DISTINCT(a.id)
FROM
(SELECT a.id AS a.id,
b.id AS b.id
FROM table1 a FULL OUTER JOIN EACH table2 b on a.id = b.id))
It is because you count in both case the number of non-null lines for table a by using a count(distinct a.id).
Use a count(*) and it should works.
You will have to add coalesce... BigQuery, unlike traditional SQL does not recognize fields unless used explicitly
SELECT COUNT(DISTINCT coalesce(a.id,b.id))
FROM table1 a
FULL OUTER JOIN EACH table2 b on a.id = b.id
This query will now take full effect of full outer join :)

Join Tables on Date Range in Hive

I need to join tableA to tableB on employee_id and the cal_date from table A need to be between date start and date end from table B. I ran below query and received below error message, Would you please help me to correct and query. Thank you for you help!
Both left and right aliases encountered in JOIN 'date_start'.
select a.*, b.skill_group
from tableA a
left join tableB b
on a.employee_id= b.employee_id
and a.cal_date >= b.date_start
and a.cal_date <= b.date_end
RTFM - quoting LanguageManual Joins
Hive does not support join conditions that are not equality conditions
as it is very difficult to express such conditions as a map/reduce
job.
You may try to move the BETWEEN filter to a WHERE clause, resulting in a lousy partially-cartesian-join followed by a post-processing cleanup. Yuck. Depending on the actual cardinality of your "skill group" table, it may work fast - or take whole days.
If your situation allows, do it in two queries.
First with the full join, which can have the range; Then with an outer join, matching on all the columns, but include a where clause for where one of the fields is null.
Ex:
create table tableC as
select a.*, b.skill_group
from tableA a
, tableB b
where a.employee_id= b.employee_id
and a.cal_date >= b.date_start
and a.cal_date <= b.date_end;
with c as (select * from TableC)
insert into tableC
select a.*, cast(null as string) as skill_group
from tableA a
left join c
on (a.employee_id= c.employee_id
and a.cal_date = c.cal_date)
where c.employee_id is null ;
MarkWusinich had a great solution but with one major issue. If table a has an employee ID twice within the date range table c will also have that employee_ID twice (if b was unique if not more) creating 4 records after the join. As such if A is not unique on employee_ID a group by will be necessary. Corrected below:
with C as
(select a.employee_id, b.skill_group
from tableA a
, tableB b
where a.employee_id= b.employee_id
and a.cal_date >= b.date_start
and a.cal_date <= b.date_end
group by a.employee_id, b.skill_group
) C
select a.*, c.skill_group
from tableA a
left join c
on a.employee_id = c.employee_id
and a.cal_date = c.cal_date;
Please note: If B was somehow intentionally not distinct on (employee_id, skill_group), then my query above would also have to be modified to appropriately reflect that.

How to select records from a Table that has a certain number of rows in a related table in SQL Server?

Not quite sure how to ask this, but I have 2 tables that are related in a 1 to many relationship, I need to select all records in the "1" table that have less than three records in the "many' table.
select b.foreignkey,count(b.foreignkey) as bidcount
from b
where b.foreignkey in (select a.id from a) and bidcount< 3
group by b.foreignkey
this doesn't work at all I know but I am at a loss how to do this.
I need to in the end select all the records from the "a" table based on this criteria. Sorry if that is confusing!
Just using your code, not tested:
SELECT
b.foreignkey,
count(b.foreignkey) as bidcount
FROM
b
WHERE
b.foreignkey IN (SELECT a.id FROM a)
GROUP BY
b.foreignkey
HAVING
count(b.foreignkey) < 3
Try this:
SELECT t1.id,COUNT(t2.parentId)
FROM table1 as t1
INNER JOIN table2 as t2
ON t1.id = t2.parentId
GROUP BY t1.id
HAVING COUNT(t2.parentId) < 3
You didn't mention which version of SQL Server you're using - if you're on SQL Server 2005 or newer, you could use this CTE (Common Table Expression):
;WITH ChildRows AS
(
SELECT A.Id, COUNT(b.Id) AS 'BCount'
FROM
dbo.TableA A
INNER JOIN
dbo.TableB B ON B.TableAId = A.Id
)
SELECT A.*, R.BCount
FROM dbo.TableA A
INNER JOIN ChildRows R ON A.Id = R.Id
The inner SELECT lists the Id columns from TableA and the count of the child rows associated with those (using the INNER JOIN to TableB) - and the outer SELECT just builds on top of that result set and shows all fields from table A (and the count from the B table)
if you want to return all fields of your (1) table in one query, I suggest you consider using CROSS APPLY:
SELECT t1.* FROM table_1 t1
CROSS APPLY (SELECT COUNT(*) cnt FROM Table_Many t2 WHERE t2.fk = t1.pk) a
where a.cnt < 3
in some particular cases, based on your indices and db structure, this query may run 4 times faster than the GROUP BY method
you have posted this question in sql server, I have a answer in oracle database system (don't know whether it will run in sql server as well or not)
this is as follow-
select [desired column list] from
(select b.*, count(*) over (partition by b.foreignkey) c_1
from b
where b.foreignkey in (select a.id from a) )
where c_1 < 3 ;
i hope it should work on sql server as well...
if not please let me update ..