SQL function to create a one-to-one match between two tables? - sql

I am trying to join 2 tables. Table_A has ~145k rows whereas Table_B has ~205k rows.
They have two columns in common (i.e. ISIN and date). However, when I execute this query:
SELECT A.*,
B.column_name
FROM Table_A A
JOIN
Table_B B ON A.date = B.date
WHERE A.isin = B.isin
I get a table with more than 147k rows. How is it possible? Shouldn't it return a table with at most ~145k rows?

What you are seeing indicates that, for some of the records in Table_A, there are several records in Table_B that satisfy the join conditions (equality on the (date, isin) tuple).
To exhibit these records, you can do:
select B.date, B.isin
from Table_A A
join Table_B B on A.date = B.date and A.isin = B.isin
group by B.date, B.isin
having count(*) > 1
It's up to you to define how to handle those duplicates. For example:
- if the duplicates have different values in column column_name, then you can decide to pull out the maximum or minimum value, or use another column to pick the top or bottom record within the duplicates
- if the duplicates are true duplicates, then you can use select distinct in a subquery to dedup them before joining
- ... other solutions are possible ...
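As a rough illustration of the first option, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows, the price column, and the choice of MAX are assumptions made for the example, not something taken from the question. It shows the plain join returning more rows than Table_A has, and a pre-aggregated Table_B restoring one row per key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_A (isin TEXT, date TEXT, price REAL);
    CREATE TABLE Table_B (isin TEXT, date TEXT, column_name REAL);
    INSERT INTO Table_A VALUES ('X1', '2020-01-01', 10.0),
                               ('X2', '2020-01-01', 20.0);
    -- ('X1', '2020-01-01') appears twice in Table_B (hypothetical data)
    INSERT INTO Table_B VALUES ('X1', '2020-01-01', 1.0),
                               ('X1', '2020-01-01', 3.0),
                               ('X2', '2020-01-01', 5.0);
""")

# The plain join returns one row per matching pair, so the duplicate key
# in Table_B inflates the result beyond the row count of Table_A.
plain_count = conn.execute("""
    SELECT COUNT(*)
    FROM Table_A a
    JOIN Table_B b ON a.date = b.date AND a.isin = b.isin
""").fetchone()[0]

# Collapse Table_B to one row per (isin, date) before joining.
# Keeping the maximum is an arbitrary choice for this sketch.
deduped = conn.execute("""
    SELECT a.isin, a.date, b.column_name
    FROM Table_A a
    JOIN (SELECT isin, date, MAX(column_name) AS column_name
          FROM Table_B
          GROUP BY isin, date) b
      ON a.date = b.date AND a.isin = b.isin
    ORDER BY a.isin
""").fetchall()

print(plain_count)  # 3 rows from a 2-row Table_A
print(deduped)
```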

If you want exactly one row per row of Table_A, then use OUTER APPLY:
SELECT A.*,
B.column_name
FROM Table_A a OUTER APPLY
(SELECT TOP (1) b.*
FROM Table_B b
WHERE A.date = B.date AND A.isin = B.isin
ORDER BY ? -- you can specify *which* row you want when there are duplicates
) b;
OUTER APPLY implements a lateral join. The TOP (1) ensures that at most one row is returned. The OUTER (as opposed to CROSS) ensures that nothing is filtered out. In this case, you could also phrase it as a correlated subquery.
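The correlated-subquery phrasing mentioned above could look roughly like the following sketch. It runs on SQLite through Python's sqlite3 module, which has no OUTER APPLY or TOP, so LIMIT 1 stands in for TOP (1). The sample data and the ORDER BY choice (largest column_name first) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_A (isin TEXT, date TEXT);
    CREATE TABLE Table_B (isin TEXT, date TEXT, column_name REAL);
    INSERT INTO Table_A VALUES ('X1', '2020-01-01'),
                               ('X2', '2020-01-01'),
                               ('X3', '2020-01-01');
    -- X1 has two candidate rows, X3 has none (hypothetical data)
    INSERT INTO Table_B VALUES ('X1', '2020-01-01', 1.0),
                               ('X1', '2020-01-01', 3.0),
                               ('X2', '2020-01-01', 5.0);
""")

# The scalar subquery yields at most one value per Table_A row;
# rows without a match get NULL instead of being dropped.
rows = conn.execute("""
    SELECT a.isin, a.date,
           (SELECT b.column_name
            FROM Table_B b
            WHERE b.date = a.date AND b.isin = a.isin
            ORDER BY b.column_name DESC
            LIMIT 1) AS column_name
    FROM Table_A a
    ORDER BY a.isin
""").fetchall()

print(rows)
```

The result has exactly as many rows as Table_A, which is the property the OUTER APPLY version is meant to guarantee. A scalar subquery can only return one column, so pulling several columns from Table_B would need one subquery per column or a lateral join.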
All that said, your data does not seem to be what you really expect. You should figure out where the duplicates are coming from. The place to start is:
select b.date, b.isin, count(*)
from Table_B b
group by b.date, b.isin
having count(*) >= 2;
This will show you the duplicates, so you can figure out what to do about them.

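The possibility of duplicates has already been discussed.
When millions of records are joined, a poor cardinality estimate often leads to an inefficient execution plan. Note that this affects performance, not the number of rows returned.
For this, put both conditions in the join, with isin first: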
SELECT A.*,
B.column_name
FROM Table_A A
JOIN
Table_B B ON A.isin = B.isin
and
A.date = B.date
Also create a nonclustered index on both tables:
CREATE NONCLUSTERED INDEX isin_date_table_A ON Table_A (isin, date) INCLUDE (*Table_A)
*Table_A = comma-separated list of the Table_A columns required in the result set
CREATE NONCLUSTERED INDEX isin_date_table_B ON Table_B (isin, date) INCLUDE (column_name)
UPDATE STATISTICS Table_A
UPDATE STATISTICS Table_B
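For an inner join, where the isin predicate sits (in WHERE or in ON) and whether an index exists should not change which rows come back. The sketch below checks this on SQLite via Python's sqlite3 module. The sample rows are assumptions, and SQLite's plain CREATE INDEX stands in for the SQL Server nonclustered index (SQLite has no INCLUDE clause).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_A (isin TEXT, date TEXT);
    CREATE TABLE Table_B (isin TEXT, date TEXT, column_name REAL);
    INSERT INTO Table_A VALUES ('X1', '2020-01-01'), ('X2', '2020-01-02');
    INSERT INTO Table_B VALUES ('X1', '2020-01-01', 1.0),
                               ('X2', '2020-01-01', 2.0),
                               ('X2', '2020-01-02', 3.0);
    -- Composite index on the join key; affects the plan, not the result.
    CREATE INDEX isin_date_table_B ON Table_B (isin, date);
""")

# isin predicate in the WHERE clause, as in the question.
in_where = conn.execute("""
    SELECT a.isin, a.date, b.column_name
    FROM Table_A a
    JOIN Table_B b ON a.date = b.date
    WHERE a.isin = b.isin
    ORDER BY a.isin, a.date
""").fetchall()

# Both predicates in the ON clause.
in_on = conn.execute("""
    SELECT a.isin, a.date, b.column_name
    FROM Table_A a
    JOIN Table_B b ON a.isin = b.isin AND a.date = b.date
    ORDER BY a.isin, a.date
""").fetchall()

print(in_where == in_on)
```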

If the date columns of both tables are converted to the same format in the join condition, you should get the result you expect.
Select A.*, B.column_name
from Table_A A
join Table_B B on to_date(A.date,'DD-MON-YY') = to_date(B.date,'DD-MON-YY')
where A.isin = B.isin

Related

SQL inner join with conditional selection

I am new to SQL. Let's say I have two tables, table_A and table_B, and I want to create a view from the two of them, view_1.
table_A:
id    foo
1     d
2     e
null  f
table_B:
id    name
1     a
2     b
3     c
and when I use this query :
SELECT DISTINCT table_A.id, table_B.name
FROM table_A
INNER JOIN table_B ON table_B.id = table_A.id
the null value in table_A can't be seen in the view_1 since it is not found in table_B. I want view_1 to show also this null row like :
id    name
1     a
2     b
null  no entry
Should I create a 4th table? I couldn't find a way.
Try this Query:
SELECT DISTINCT a.id,
CASE WHEN b.name IS NULL OR b.name = '' THEN 'No Entry' ELSE b.name END AS name
FROM table_A a
LEFT JOIN table_B b ON a.id = b.id
You are looking for an outer join. Thus you keep all table_A rows and join table_B rows where they exist. If no match exists, the table_B columns in the joined row are NULL.
You replace NULLs with a value with COALESCE.
SELECT a.id, COALESCE(b.name, 'no entry') AS name
FROM table_a a
LEFT OUTER JOIN table_b b ON b.id = a.id
ORDER BY a.id NULLS LAST;
You haven't tagged your request with your DBMS. Not all DBMS support the NULLS LAST clause.
Please note that there is no DISTINCT in my query. It is not needed. And every time you think you must use DISTINCT, think twice. SELECT DISTINCT is very seldom needed. Most often it is used, because the query is kind of flawed and causes the undesired duplicates itself.
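Here is a small runnable sketch of the LEFT OUTER JOIN plus COALESCE approach, using the sample rows from the question and Python's sqlite3 module. Because NULLS LAST is not available in every engine or version, this sketch sorts on `a.id IS NULL` first instead; that substitution is an assumption of the example, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER, foo TEXT);
    CREATE TABLE table_b (id INTEGER, name TEXT);
    INSERT INTO table_a VALUES (1, 'd'), (2, 'e'), (NULL, 'f');
    INSERT INTO table_b VALUES (1, 'a'), (2, 'b'), (3, 'c');
""")

# Keep every table_a row; unmatched rows get 'no entry' via COALESCE.
# "a.id IS NULL" evaluates to 0 or 1, which pushes the NULL id to the end.
rows = conn.execute("""
    SELECT a.id, COALESCE(b.name, 'no entry') AS name
    FROM table_a a
    LEFT OUTER JOIN table_b b ON b.id = a.id
    ORDER BY a.id IS NULL, a.id
""").fetchall()

print(rows)  # [(1, 'a'), (2, 'b'), (None, 'no entry')]
```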

Distinct IDs from one table for inner join SQL

I'm trying to take the distinct IDs that appear in table a, filter table b for only these distinct IDs from table a, and present the remaining columns from b. I've tried:
SELECT * FROM
(
SELECT DISTINCT
a.ID,
a.test_group,
b.ch_name,
b.donation_amt
FROM table_a a
INNER JOIN table_b b
ON a.ID=b.ID
ORDER by a.ID;
) t
This doesn't seem to work. This query worked:
SELECT DISTINCT a.ID, a.test_group, b.ch_name, b.donation_amt
FROM table_a a
inner join table_b b
on a.ID = b.ID
order by a.ID
But I'm not entirely sure this is the correct way to go about it. Is this second query only going to take unique combinations of a.ID and a.test_group or does it know to only take distinct values of a.ID which is what I want.
Your first and second queries are essentially the same (except that you cannot use ; inside a subquery), so both would produce the same result.
Even the second query, which you think gives the desired output, cannot produce what you actually want.
DISTINCT applies to the entire column list of the select clause.
In your case, if the same a.ID has different a.test_group values, the result will contain multiple rows with the same a.ID and different a.test_group.
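To see this behaviour concretely, the following sketch (Python's sqlite3 module, with made-up sample rows) runs the second query against data where ID 1 belongs to two test groups. DISTINCT keeps both rows because the full column tuples differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (ID INTEGER, test_group TEXT);
    CREATE TABLE table_b (ID INTEGER, ch_name TEXT, donation_amt INTEGER);
    -- ID 1 appears with two different test_group values (hypothetical data)
    INSERT INTO table_a VALUES (1, 'control'), (1, 'treatment'), (2, 'control');
    INSERT INTO table_b VALUES (1, 'charity_x', 10), (2, 'charity_y', 20);
""")

# DISTINCT removes rows only when every selected column is identical.
rows = conn.execute("""
    SELECT DISTINCT a.ID, a.test_group, b.ch_name, b.donation_amt
    FROM table_a a
    INNER JOIN table_b b ON a.ID = b.ID
    ORDER BY a.ID, a.test_group
""").fetchall()

ids = [r[0] for r in rows]
print(rows)
print(ids)  # ID 1 is listed twice
```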

Get all rows that do not exist in the appropriate table depending on the date in SQL

I have two tables: TableA and TableB (as in the following picture):
The result should be as in the following figure:
What is the best way to get result (as in the table Result) using mssql query?
Thanks.
If I understand correctly, you want the date/value pairs that don't exist.
Generate the list of all date/value pairs using a cross join. Then filter out the ones you don't want:
select b.value, d.date
from tableb b cross join
(select distinct date from tablea a) d
where not exists (select 1 from tablea a where a.date = d.date and a.value = b.value)
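Since the original pictures are not available, the sketch below assumes a simple shape for the two tables (tablea with date and value, tableb with value) and invents a few rows. It runs the cross join plus NOT EXISTS query through Python's sqlite3 module to list the missing date/value pairs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed schemas and data; the question's tables were only shown as images.
    CREATE TABLE tablea (date TEXT, value TEXT);
    CREATE TABLE tableb (value TEXT);
    INSERT INTO tablea VALUES ('2020-01-01', 'a'),
                              ('2020-01-01', 'b'),
                              ('2020-01-02', 'a');
    INSERT INTO tableb VALUES ('a'), ('b');
""")

# Build every (value, date) combination, then keep those absent from tablea.
missing = conn.execute("""
    SELECT b.value, d.date
    FROM tableb b
    CROSS JOIN (SELECT DISTINCT date FROM tablea) d
    WHERE NOT EXISTS (SELECT 1
                      FROM tablea a
                      WHERE a.date = d.date AND a.value = b.value)
    ORDER BY d.date, b.value
""").fetchall()

print(missing)  # [('b', '2020-01-02')]
```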

Keep on sort order when using IN statement in WHERE clause

I'm using SQL Server 2005.
I have a temporary sorted table (Table_A) that contains 2 columns (ID, RowNumber).
Now, I create a new table by selecting all rows from other table (Table_B) that exist (ID value) in the temporary table (Table_A).
SELECT *
FROM Table_B
WHERE Table_B.ID IN (SELECT ID FROM Table_A)
The results of the query above are not sorted in Table_A's order.
I'm looking for a way to keep the new result table sorted the same way as Table_A.
Tx....
You'll need to use a JOIN instead. I have assumed below that Table_A can only have 1 row per ID. If this is not the case then rewriting as a JOIN will introduce duplicate rows and we will need more details of which RowNumber to use for sorting purposes in that case.
SELECT Table_B.*
FROM Table_B JOIN Table_A ON Table_B.ID = Table_A.ID
ORDER BY Table_A.RowNumber
select b.* from Table_B b
join Table_A a on a.id = b.id
order by a.RowNumber
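A quick way to check the JOIN approach is the sketch below, run with Python's sqlite3 module. The payload column and the sample rows are assumptions, and Table_A is assumed to hold one row per ID, as stated in the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_A (ID INTEGER, RowNumber INTEGER);
    CREATE TABLE Table_B (ID INTEGER, payload TEXT);
    -- Table_A defines the desired order: 30, then 10, then 20.
    INSERT INTO Table_A VALUES (30, 1), (10, 2), (20, 3);
    INSERT INTO Table_B VALUES (10, 'x'), (20, 'y'), (30, 'z'), (40, 'w');
""")

# The join filters Table_B to IDs present in Table_A (like the IN version)
# and makes RowNumber available for sorting.
rows = conn.execute("""
    SELECT b.*
    FROM Table_B b
    JOIN Table_A a ON a.ID = b.ID
    ORDER BY a.RowNumber
""").fetchall()

print(rows)  # [(30, 'z'), (10, 'x'), (20, 'y')]
```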

SQL LEFT outer join with only some rows from the right?

I have two tables TABLE_A and TABLE_B having the joined column as the employee number EMPNO.
I want to do a normal left outer join. However, TABLE_B has certain records that are soft-deleted (status='D'), and I want these to be included. To clarify: an employee could have active records (status = null/a/anything) as well as deleted records in TABLE_B; in that case I don't want that employee in my result. If, however, the employee has only deleted records in TABLE_B, I want the employee to be included in the result. I hope I'm making my requirement clear. (I could do a lengthy qrslt kind of thingy and get what I want, but I figure there has to be a more optimized way of doing this using the join syntax.) I would appreciate any suggestions (even without the join). I am trying the following query without the desired result:
SELECT TABLE_A.EMPNO
FROM TABLE_A
LEFT OUTER JOIN TABLE_B ON TABLE_A.EMPNO = TABLE_B.EMPNO AND TABLE_B.STATUS<>'D'
Much appreciate any help.
Just to clarify -- all records from TABLE_A should appear, unless there are rows in TABLE_B with statuses other than 'D'?
You'll need at least one non-null column on B (I'll use 'B.ID' as an example, and this approach should work):
SELECT TABLE_A.EMPNO
FROM TABLE_A
LEFT OUTER JOIN TABLE_B ON
(TABLE_A.EMPNO = TABLE_B.EMPNO)
AND (TABLE_B.STATUS <> 'D' OR TABLE_B.STATUS IS NULL)
WHERE
TABLE_B.ID IS NULL
That is, reverse the logic you might think -- join onto TABLE_B only where you have rows that would exclude TABLE_A entries, and then use the IS NULL at the end to exclude those. This means that only those which didn't match (those with no row in TABLE_B, or with only 'D' rows) get included.
An alternative might be
SELECT TABLE_A.EMPNO
FROM TABLE_A
WHERE NOT EXISTS (
SELECT * FROM TABLE_B
WHERE TABLE_B.EMPNO = TABLE_A.EMPNO
AND (TABLE_B.STATUS <> 'D' OR TABLE_B.STATUS IS NULL)
)
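The two variants above can be compared side by side with a sketch like this one, run through Python's sqlite3 module. The four employees and their TABLE_B rows are invented to cover each case: only deleted rows, only active rows, a mix, and no rows at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE_A (EMPNO INTEGER);
    CREATE TABLE TABLE_B (ID INTEGER NOT NULL, EMPNO INTEGER, STATUS TEXT);
    INSERT INTO TABLE_A VALUES (1), (2), (3), (4);
    -- 1: only deleted, 2: only active, 3: deleted + NULL status, 4: no rows
    INSERT INTO TABLE_B VALUES (1, 1, 'D'), (2, 2, 'A'),
                               (3, 3, 'D'), (4, 3, NULL);
""")

# Anti-join: match only the rows that should exclude an employee,
# then keep the employees for which nothing matched.
anti_join = [r[0] for r in conn.execute("""
    SELECT TABLE_A.EMPNO
    FROM TABLE_A
    LEFT OUTER JOIN TABLE_B
      ON TABLE_A.EMPNO = TABLE_B.EMPNO
     AND (TABLE_B.STATUS <> 'D' OR TABLE_B.STATUS IS NULL)
    WHERE TABLE_B.ID IS NULL
    ORDER BY TABLE_A.EMPNO
""")]

# Same requirement expressed with NOT EXISTS.
not_exists = [r[0] for r in conn.execute("""
    SELECT TABLE_A.EMPNO
    FROM TABLE_A
    WHERE NOT EXISTS (
        SELECT * FROM TABLE_B
        WHERE TABLE_B.EMPNO = TABLE_A.EMPNO
          AND (TABLE_B.STATUS <> 'D' OR TABLE_B.STATUS IS NULL)
    )
    ORDER BY TABLE_A.EMPNO
""")]

print(anti_join, not_exists)  # [1, 4] [1, 4]
```

Both forms keep employee 1 (only deleted rows) and employee 4 (no rows), and drop employees 2 and 3.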
The following query will get you the employee records that aren't deleted, or those where the employee has only deleted records.
select
a.*
from
table_a a
left join table_b b on
a.empno = b.empno
where
b.status <> 'D'
or (b.status = 'D' and
(select count(distinct status) from table_b where empno = a.empno) = 1)
This is in ANSI SQL, but if I knew your RDBMS, I could give a more specific solution that may be a bit more elegant.
ah crud, this apparently works ><
SELECT TABLE_A.EMPNO
FROM TABLE_A
LEFT OUTER JOIN TABLE_B ON TABLE_A.EMPNO = TABLE_B.EMPNO
where TABLE_B.STATUS<>'D'
If you guys have any extra info to chime in with though, please feel free.
UPDATE:
I saw this question again after some time and thought I'd add more helpful info. This link has good info regarding ANSI syntax - http://www.oracle-base.com/articles/9i/ANSIISOSQLSupport.php
In particular this part from the linked page is informative:
Extra filter conditions can be added to the join to using AND to form a complex join. These are often necessary when filter conditions are required to restrict an outer join. If these filter conditions are placed in the WHERE clause and the outer join returns a NULL value for the filter column the row would be thrown away. if the filter condition is coded as part of the join the situation can be avoided.
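The difference described in the quoted passage can be seen in a small sketch like the following, run with Python's sqlite3 module on invented rows. The same STATUS filter is placed first in the WHERE clause and then in the join condition of a left outer join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE_A (EMPNO INTEGER);
    CREATE TABLE TABLE_B (EMPNO INTEGER, STATUS TEXT);
    INSERT INTO TABLE_A VALUES (1), (2), (3), (4);
    -- 1: only deleted, 2: only active, 3: deleted + NULL status, 4: no rows
    INSERT INTO TABLE_B VALUES (1, 'D'), (2, 'A'), (3, 'D'), (3, NULL);
""")

# Filter in WHERE: rows where STATUS is 'D' or NULL (including the
# NULL-extended rows produced by the outer join) are thrown away.
filter_in_where = conn.execute("""
    SELECT a.EMPNO, b.STATUS
    FROM TABLE_A a
    LEFT OUTER JOIN TABLE_B b ON a.EMPNO = b.EMPNO
    WHERE b.STATUS <> 'D'
    ORDER BY a.EMPNO
""").fetchall()

# Filter in ON: it only restricts which TABLE_B rows may match;
# every TABLE_A row still appears at least once.
filter_in_on = conn.execute("""
    SELECT a.EMPNO, b.STATUS
    FROM TABLE_A a
    LEFT OUTER JOIN TABLE_B b ON a.EMPNO = b.EMPNO AND b.STATUS <> 'D'
    ORDER BY a.EMPNO
""").fetchall()

print(filter_in_where)  # [(2, 'A')]
print(filter_in_on)     # [(1, None), (2, 'A'), (3, None), (4, None)]
```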
SELECT A.*, B.*
FROM
Table_A A
INNER JOIN Table_B B
ON A.EmpNo = B.EmpNo
WHERE
NOT EXISTS (
SELECT *
FROM Table_B X
WHERE
A.EmpNo = X.EmpNo
AND X.Status <> 'D'
)
I think this does the trick. The left join is not needed because you only want to include employees with all (and at least one) deleted rows.
This is how I understand the question. You need to include only those employees for which either of the following is true:
an employee has only (soft-)deleted rows in TABLE_B;
an employee has only non-deleted rows in TABLE_B;
an employee has no rows in TABLE_B at all.
In other words, if an employee has both deleted and non-deleted rows in TABLE_B, omit that employee, otherwise include them.
This is how I think it could be solved:
SELECT DISTINCT a.EMPNO
FROM TABLE_A a
LEFT JOIN TABLE_B b1 ON a.EMPNO = b1.EMPNO
LEFT JOIN TABLE_B b2 ON b1.EMPNO = b2.EMPNO
AND (b1.STATUS = 'D' AND (b2.STATUS <> 'D' OR b2 IS NULL) OR
b2.STATUS = 'D' AND (b1.STATUS <> 'D' OR b1 IS NULL))
WHERE b2.EMPNO /* or whatever non-nullable column there is */ IS NULL
Alternatively, though, you could use grouping:
SELECT a.EMPNO
FROM TABLE_A a
LEFT JOIN TABLE_B b ON a.EMPNO = b.EMPNO
GROUP BY a.EMPNO
HAVING 0 IN (COUNT(CASE b.STATUS WHEN 'D' THEN 1 ELSE NULL END),
COUNT(CASE b.STATUS WHEN 'D' THEN NULL ELSE 1 END))
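The grouping approach can be tried out with the sketch below (Python's sqlite3 module, invented rows). To stay portable it spells the HAVING test as two comparisons joined by OR, which is intended to be equivalent to the `0 IN (...)` form above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE_A (EMPNO INTEGER);
    CREATE TABLE TABLE_B (EMPNO INTEGER, STATUS TEXT);
    INSERT INTO TABLE_A VALUES (1), (2), (3), (4);
    -- 1: only deleted, 2: only active, 3: deleted + active, 4: no rows
    INSERT INTO TABLE_B VALUES (1, 'D'), (2, 'A'), (3, 'D'), (3, 'A');
""")

# COUNT ignores NULLs, so each CASE counts one kind of row per employee.
# An employee is kept when at least one of the two counts is zero,
# i.e. when deleted and non-deleted rows are not both present.
employees = [r[0] for r in conn.execute("""
    SELECT a.EMPNO
    FROM TABLE_A a
    LEFT JOIN TABLE_B b ON a.EMPNO = b.EMPNO
    GROUP BY a.EMPNO
    HAVING COUNT(CASE b.STATUS WHEN 'D' THEN 1 ELSE NULL END) = 0
        OR COUNT(CASE b.STATUS WHEN 'D' THEN NULL ELSE 1 END) = 0
    ORDER BY a.EMPNO
""")]

print(employees)  # [1, 2, 4]
```

Employee 3, who has both deleted and non-deleted rows, is the only one omitted, which matches the interpretation of the requirement given in this answer.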