Not exist in anyone - sql

I want to write a query that selects the IDs from table A that are linked only to finished rows in table B, i.e. rows in B whose end_date is not null.
The relation between A and B is one to many: one A row can correlate to many B rows, and each B row always correlates to exactly one A row.
I have written something like this:
select id
from A
where not exists
(select 1
from B
where end_date is null
and A.id=B.id)
Is this correct? Or is there a faster query for the same thing?
EDIT: end_date is in table B
example :
In the data set:
A.id=1
B.id=1
B.bid=333
B.end_date=null
A.id=1
B.id=1
B.bid=334
B.end_date=05/05/2014
A.id=2
B.id=2
B.bid=335
B.end_date=null
A.id=2
B.id=2
B.bid=336
B.end_date=null
A.id=3
B.id=3
B.bid=337
B.end_date=04/04/2014
A.id=3
B.id=3
B.bid=338
B.end_date=04/04/2014
My query should return only id=3.

Assuming your table structure is
A(id)
B(id, end_date)
Then to select all A.id for which no related B row has a null end_date, you can use this query
Select id
From A
Where id Not In (Select id From B Where end_date Is Null)

You don't specify your DBMS, but in later versions of SQL Server, this might be faster. You will have to test based on your data:
SELECT DISTINCT A.ID
FROM A
INNER JOIN B ON A.ID = b.ID
WHERE b.End_date IS NOT NULL
EXCEPT
SELECT B.ID
FROM B
WHERE b.End_date IS NULL
EXCEPT is a set operator that returns all entries in the first set that don't exist in the second set. Doing the query this way gives you two SARGable WHERE clauses rather than one nonSARGable subquery, so it could end up faster depending on your data topography and your physical indexes.
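As a rough check of the EXCEPT approach against the question's sample data, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for whatever DBMS is actually in use, the column types are assumptions, and nothing here says anything about relative speed on SQL Server.

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY);
    CREATE TABLE B (bid INTEGER PRIMARY KEY, id INTEGER, end_date TEXT);
    INSERT INTO A VALUES (1), (2), (3);
    INSERT INTO B VALUES
        (333, 1, NULL), (334, 1, '05/05/2014'),
        (335, 2, NULL), (336, 2, NULL),
        (337, 3, '04/04/2014'), (338, 3, '04/04/2014');
""")

# The asker's NOT EXISTS query.
not_exists_ids = conn.execute("""
    SELECT id FROM A
    WHERE NOT EXISTS (SELECT 1 FROM B WHERE B.end_date IS NULL AND B.id = A.id)
    ORDER BY id
""").fetchall()

# The EXCEPT variant: ids with a finished row, minus ids with an unfinished row.
except_ids = conn.execute("""
    SELECT A.id FROM A JOIN B ON A.id = B.id WHERE B.end_date IS NOT NULL
    EXCEPT
    SELECT B.id FROM B WHERE B.end_date IS NULL
    ORDER BY 1
""").fetchall()

print(not_exists_ids, except_ids)
```

One difference to keep in mind: NOT EXISTS also returns A rows that have no B rows at all, while the EXCEPT variant does not, because its first branch uses an inner join.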

You can probably use a LEFT JOIN like
select a.id
from A a
left join B b
on a.id = b.id
and b.end_date is null
where b.id is null

Related

SQL inner join with conditional selection

I am new to SQL. Let's say I have two tables, table_A and table_B, and I want to create a view from the two of them, called view_1.
table_A:
id foo
1 d
2 e
null f
table_B:
id name
1 a
2 b
3 c
and when I use this query :
SELECT DISTINCT table_A.id, table_B.name
FROM table_A
INNER JOIN table_B ON table_B.id = table_A.id
the null value in table_A can't be seen in the view_1 since it is not found in table_B. I want view_1 to show also this null row like :
id name
1 a
2 b
null no entry
Should I create a 4. table? I couldn't find a way.
Try this Query:
SELECT DISTINCT a.id,
(CASE WHEN b.name IS NULL OR b.name = '' THEN 'No Entry' ELSE b.name END) name
FROM table_A a
LEFT JOIN table_B b ON a.id = b.id
You are looking for an outer join. Thus you keep all table_A rows and join table_B rows where they exist. If no match exists, the table_B columns in the joined row are NULL.
You replace NULLs with a value with COALESCE.
SELECT a.id, COALESCE(b.name, 'no entry') AS name
FROM table_a a
LEFT OUTER JOIN table_b b ON b.id = a.id
ORDER BY a.id NULLS LAST;
You haven't tagged your request with your DBMS. Not all DBMS support the NULLS LAST clause.
Please note that there is no DISTINCT in my query. It is not needed. And every time you think you must use DISTINCT, think twice. SELECT DISTINCT is very seldom needed. Most often it is used, because the query is kind of flawed and causes the undesired duplicates itself.
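To see the outer join plus COALESCE on the question's sample rows, here is a small sketch using Python's built-in sqlite3 module. SQLite stands in for the unspecified DBMS, and the ORDER BY uses `a.id IS NULL` as a portable substitute for NULLS LAST, which is my own swap and not part of the answer.

```python
import sqlite3

# Sample rows from the question; the column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER, foo TEXT);
    CREATE TABLE table_b (id INTEGER, name TEXT);
    INSERT INTO table_a VALUES (1, 'd'), (2, 'e'), (NULL, 'f');
    INSERT INTO table_b VALUES (1, 'a'), (2, 'b'), (3, 'c');
""")

# Keep every table_a row; replace the missing name with a default.
rows = conn.execute("""
    SELECT a.id, COALESCE(b.name, 'no entry') AS name
    FROM table_a a
    LEFT OUTER JOIN table_b b ON b.id = a.id
    ORDER BY a.id IS NULL, a.id  -- sorts the NULL id last without NULLS LAST
""").fetchall()

for row in rows:
    print(row)
```

The row with the null id survives because a NULL never equals anything in the join condition, so it gets no match and its name falls back to the default.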

Why does this query return values in the column

I have the following query:
SELECT
a.id,
c.c_date
FROM table_a a ,
table_c c
WHERE
a.id = c.id AND
a.id IN (SELECT id from table_c where c_date is null);
I have two tables, table_a and table_c.
I join these two tables, but then use an IN condition to show only the ids which are in table_c AND have the c_date column set to null.
This query though returns id, and c_date values, and some of the c_date values are not null, how is this possible?
I thought in my sub query I am only selecting id which have null c_dates?
This should work without the subquery, assuming you only want to return rows with null dates. Please note the use of the join:
SELECT a.id,
c.c_date
FROM table_a a
JOIN table_c c ON a.id = c.id
WHERE c_date is null;
It's difficult to answer your specific question though without sample data and expected results. You probably have multiple records in table_c that match the id field in table_a.
It will be easier to explain with this example:
table_a
id col_x col_...
1
2
table_c
id c_date col_m col_...
1 null
1 03/14/2016
2 04/14/2016
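Here the subquery returns id 1, because one of its table_c rows has a null c_date. The outer query then joins table_a to every table_c row with id 1, so the row with c_date 03/14/2016 is returned as well. You should reconsider what you want in your result set; changing your query to the one in @sgeddes's answer is one way.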
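The example tables can be run directly to show the effect. This is a sketch using Python's built-in sqlite3 module as a stand-in DBMS; only the id and c_date columns are modelled, and the other columns from the example are left out.

```python
import sqlite3

# Example rows from the answer above (extra columns omitted).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER);
    CREATE TABLE table_c (id INTEGER, c_date TEXT);
    INSERT INTO table_a VALUES (1), (2);
    INSERT INTO table_c VALUES (1, NULL), (1, '03/14/2016'), (2, '04/14/2016');
""")

# The asker's query: IN filters on the id, not on the individual table_c row.
original = conn.execute("""
    SELECT a.id, c.c_date
    FROM table_a a, table_c c
    WHERE a.id = c.id
      AND a.id IN (SELECT id FROM table_c WHERE c_date IS NULL)
""").fetchall()

# Filtering the joined row itself keeps only the null dates.
joined = conn.execute("""
    SELECT a.id, c.c_date
    FROM table_a a
    JOIN table_c c ON a.id = c.id
    WHERE c.c_date IS NULL
""").fetchall()

print(original)
print(joined)
```

The first query returns both table_c rows for id 1, including the dated one, while the second returns only the row whose c_date is null.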

Join Tables on Date Range in Hive

I need to join tableA to tableB on employee_id, and the cal_date from tableA needs to be between date_start and date_end from tableB. I ran the query below and received the error message below. Would you please help me correct the query? Thank you for your help!
Both left and right aliases encountered in JOIN 'date_start'.
select a.*, b.skill_group
from tableA a
left join tableB b
on a.employee_id= b.employee_id
and a.cal_date >= b.date_start
and a.cal_date <= b.date_end
RTFM - quoting LanguageManual Joins
Hive does not support join conditions that are not equality conditions
as it is very difficult to express such conditions as a map/reduce
job.
You may try to move the BETWEEN filter to a WHERE clause, resulting in a lousy partially-cartesian-join followed by a post-processing cleanup. Yuck. Depending on the actual cardinality of your "skill group" table, it may work fast - or take whole days.
If your situation allows, do it in two queries.
First, run the join with the range condition in the WHERE clause; then run an outer join that matches on all the columns, with a WHERE clause that keeps only the rows where one of the joined fields is null.
Ex:
create table tableC as
select a.*, b.skill_group
from tableA a
, tableB b
where a.employee_id= b.employee_id
and a.cal_date >= b.date_start
and a.cal_date <= b.date_end;
with c as (select * from TableC)
insert into tableC
select a.*, cast(null as string) as skill_group
from tableA a
left join c
on (a.employee_id= c.employee_id
and a.cal_date = c.cal_date)
where c.employee_id is null ;
MarkWusinich had a great solution, but with one major issue. If table A has an employee ID twice within the date range, table C will also have that employee ID twice (or more, if B is not unique), creating 4 records after the join. So if A is not unique on employee_id, a group by is necessary. Corrected below:
with c as
(select a.employee_id, a.cal_date, b.skill_group
from tableA a
, tableB b
where a.employee_id= b.employee_id
and a.cal_date >= b.date_start
and a.cal_date <= b.date_end
-- cal_date is selected and grouped so the join below can match on it
group by a.employee_id, a.cal_date, b.skill_group
)
select a.*, c.skill_group
from tableA a
left join c
on a.employee_id = c.employee_id
and a.cal_date = c.cal_date;
Please note: If B was somehow intentionally not distinct on (employee_id, skill_group), then my query above would also have to be modified to appropriately reflect that.
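The overall shape of this workaround (range filter in a WHERE clause first, then an equality-only outer join back to tableA) can be tried outside Hive. The sketch below uses Python's built-in sqlite3 module purely as a stand-in, so it says nothing about Hive performance or exact Hive syntax, and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data; only the column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (employee_id INTEGER, cal_date TEXT);
    CREATE TABLE tableB (employee_id INTEGER, date_start TEXT,
                         date_end TEXT, skill_group TEXT);
    INSERT INTO tableA VALUES (1, '2020-01-10'), (1, '2020-03-10'), (2, '2020-01-15');
    INSERT INTO tableB VALUES
        (1, '2020-01-01', '2020-01-31', 'support'),
        (2, '2020-02-01', '2020-02-28', 'sales');
""")

rows = conn.execute("""
    WITH c AS (
        -- Step 1: equality join, with the date range applied as a filter.
        SELECT a.employee_id, a.cal_date, b.skill_group
        FROM tableA a
        JOIN tableB b ON a.employee_id = b.employee_id
        WHERE a.cal_date >= b.date_start AND a.cal_date <= b.date_end
        GROUP BY a.employee_id, a.cal_date, b.skill_group
    )
    -- Step 2: equality-only outer join restores the unmatched tableA rows.
    SELECT a.employee_id, a.cal_date, c.skill_group
    FROM tableA a
    LEFT JOIN c ON a.employee_id = c.employee_id AND a.cal_date = c.cal_date
    ORDER BY a.employee_id, a.cal_date
""").fetchall()

for row in rows:
    print(row)
```

Every tableA row appears once in the result, with skill_group left null where no date range matched.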

Can this be done with a single SQL Join?

I am not sure if this can be done with a single JOIN, but I basically have two tables with an ID column in common. To make it simple I'll say Table A just contains an ID while Table B contains an ID and Code. There is a 1:M relationship between Table A and Table B, however it's also possible an ID from Table A is not contained in Table B at all. I was hoping to have a query return every ID that exists in Table B within a particular code range, or does not exist in Table B at all.
I tried using a LEFT JOIN with something like:
SELECT A.id FROM A LEFT JOIN B ON A.id = B.id AND b.code BETWEEN '000' AND '123'
But, this still gives me the IDs that exist in Table B outside of the code range.
Use a left join, and filter the result to contain the codes in the range, and also the lines where there is no matching record in table B:
select
A.id
from
A
left join B on B.id = A.id
where
B.code between '000' and '123' or B.id is null
What about
SELECT A.id FROM A LEFT JOIN B ON A.id = B.id
WHERE b.code IS NULL OR b.code BETWEEN ' ' AND '123'
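The difference between putting the range test in the ON clause and in the WHERE clause can be seen on a tiny data set. This sketch uses Python's built-in sqlite3 module; the three A rows and two B rows are hypothetical.

```python
import sqlite3

# Hypothetical data: id 1 is in range, id 2 is out of range, id 3 has no B row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER);
    CREATE TABLE B (id INTEGER, code TEXT);
    INSERT INTO A VALUES (1), (2), (3);
    INSERT INTO B VALUES (1, '050'), (2, '200');
""")

# Range test in ON: a LEFT JOIN still keeps every A row, matched or not.
on_filter = conn.execute("""
    SELECT A.id FROM A
    LEFT JOIN B ON A.id = B.id AND B.code BETWEEN '000' AND '123'
    ORDER BY A.id
""").fetchall()

# Range test in WHERE: out-of-range matches are dropped, missing ones kept.
where_filter = conn.execute("""
    SELECT A.id FROM A
    LEFT JOIN B ON B.id = A.id
    WHERE B.code BETWEEN '000' AND '123' OR B.id IS NULL
    ORDER BY A.id
""").fetchall()

print(on_filter)
print(where_filter)
```

Note that with a 1:M relationship, an id that has one code inside the range and another outside it is still returned by the WHERE version.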

SQL Where Condition with IF Statement

So I'm trying to write a query to pull some data, and I have one condition that needs to be met that I can't figure out how to express. What I'm trying to achieve is that if a column is not null in one table, then I want to check another table and see if there is a specific value in one of those columns. So in a pseudo code type of way I'm trying to do this
SELECT id, user_name, created_date, transaction_number
FROM TableA
WHERE (IF TableA.response_id IS NULL OR
IF (SELECT type_id from TableB WHERE (type_id NOT IN ('4')) AND (id = TableA.response_id)))
So from here what I'm trying to do is pull all transactions for customers that have no responses in them, but from those that do have responses I still want to grab transactions that don't have a specific code associated with them. I'm not sure if it's possible to do it in this manner or if I need to create some temporary tables that can then be manipulated, but I'm stuck on this one condition.
At first I thought you wanted the CASE statement from the wording of your question, but I think you're just looking for an OUTER JOIN with an OR statement:
SELECT DISTINCT a.id, a.user_name, a.created_date, a.transaction_number
FROM TableA A
LEFT JOIN TableB B ON A.response_id = B.Id
WHERE A.response_id IS NULL
OR B.type_id NOT IN (4)
A Visual Explanation of SQL Joins
where TableA.Response_id is null or (select count(1) from TableB WHERE (type_id NOT IN ('4')) AND (id = TableA.response_id)) = 0
provided that your subquery is logically correct.
Well I'm not 100% certain I follow, but assuming what you want is to see if there are response entries for a particular ID in Table A I think you want something like this.
SELECT a.id, user_name, created_date, transaction_number
FROM TableA a
LEFT JOIN TableB b
ON a.id=b.id
LEFT JOIN TableC c
ON a.id=c.id
WHERE isnull(b.id,c.id) IS NOT NULL
GROUP BY a.id, user_name, created_date, transaction_number
ISNULL will return b.id if it is not null, c.id if c.id is not null, and NULL otherwise. That will tell you if there's a response for a.id in either TableB or TableC. That's assuming TableB & TableC are more like logs. If you're saying those tables will certainly have an entry for a.id, then it's just a matter of replacing b.id & c.id with b.[response_column] & c.[response_column] respectively.
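For comparison, the LEFT JOIN with OR suggested earlier in this thread can be tried on a small data set. This is only a sketch using Python's built-in sqlite3 module: the rows are hypothetical, TableA is cut down to three columns, and it assumes each response_id matches at most one TableB row.

```python
import sqlite3

# Hypothetical rows: no response, a response of type 4, a response of type 7.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id INTEGER, user_name TEXT, response_id INTEGER);
    CREATE TABLE TableB (id INTEGER, type_id INTEGER);
    INSERT INTO TableA VALUES (1, 'ann', NULL), (2, 'bob', 10), (3, 'cid', 11);
    INSERT INTO TableB VALUES (10, 4), (11, 7);
""")

# Keep transactions with no response, or whose response is not of type 4.
rows = conn.execute("""
    SELECT DISTINCT a.id, a.user_name
    FROM TableA a
    LEFT JOIN TableB b ON a.response_id = b.id
    WHERE a.response_id IS NULL
       OR b.type_id NOT IN (4)
    ORDER BY a.id
""").fetchall()

print(rows)
```

The transaction whose response has type 4 is filtered out, while the one without a response and the one with a different type are kept.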