LEFT JOIN FIlter - sql

If I have two tables - Table_A and Table_B - and if I am using LEFT JOIN to join them, how can I filter only those rows from Table_B which joined with the rows in the Table_A more than once?
DB flavor: Teradata

If I'm not mistaken Teradata supports window functions, so this might work:
select *
from (
select a.*,
b.*
count(*) over (partition by a.MyCol) as cnt
from Table_A a
left join Table_B b ON a.MyCol = b.MyCol
where ... -- Conditions
) t
where cnt > 1
(not tested)

Here is a Teradata-specific version of your accepted answer:
select a.*,
b.*
from Table_A a
left join Table_B b
ON a.MyCol = b.MyCol
where ... -- Conditions
QUALIFY count(*) over (partition by a.MyCol) > 1
Note that QUALIFY is a Teradata extension to the ANSI standard (and a handy one at that).

may be it's help for you
1) you can used INNER JOIN .
2) you can also check joind row is not null or blank .

Select a.*,b.* from Table_A a
left join Table_B b on condition
HAVING COUNT(DISTINCT a.value)>1
make necessary edits and check

Related

Exposing more fields on group by sql

I know, in a Group By you can't Select a field that is not in an aggregate function or the GROUP BY clause.
However, There must be a workaround using joins or something else.
I have TWO tables BMP_VISITS_SITES and BMP_VISITS_COMMENTS which are connected by StationID in a one-to-many relationship. One Site can have many comments.
I'm trying to write a query that returns all Sites and the latest (only 1) comment. I have a "working" query but it only returns two columns which are in either an aggregate function or group by.
Here is my "working" query:
select a.StationID,
MAX(b.[dateobserved]) as LastDateObserved,
a.Status
from BMP_VISITS_SITES a
left outer join BMP_VISITS_COMMENTS as b
on a.[StationID] = b.[StationID]
group by a.StationID;
But how can I access all the columns in both tables?
I've tried inner joins with 1/2 success. When I join my BMP_VISITS_SITES to the above query I get all the fields of the table (t1). Great, but as soon as I try joining on BMP_VISITS_COMMENTS (t3) I get more results than I should.
select t1.*, t2.*
--,t3.*
from BMP_VISITS_SITES t1
inner join (
select a.StationID, MAX(b.[dateobserved]) as LastDateObserved from BMP_VISITS_SITES a
left outer join BMP_VISITS_COMMENTS as b
on a.[StationID] = b.[StationID]
group by a.StationID
) t2 on t2.StationID = t1.StationID
--inner join sde.BMP_VISITS_COMMENTS t3 on t3.StationID = t2.StationID;
SELECT a.*, b.* FROM
BMP_VISITS_SITES a
OUTER APPLY
(
SELECT TOP 1 *
FROM BMP_VISITS_COMMENTS b
WHERE b.StationID = a.StationID
ORDER BY LastDateObserved DESC
) b
You can use apply to get the last comment record and return all fields from both sides of the query.
Use row_number()
select *
from
(
select a.StationID,
a.Status,
b.*,
row_number() over (partition by a.stationid, a.status order by b.[dateobserved] desc) as rn
from BMP_VISITS_SITES a
left outer join BMP_VISITS_COMMENTS as b
on a.[StationID] = b.[StationID]
) v
where rn = 1

Oracle SQL WITH clause select joined column

SQL:
WITH joined AS (
SELECT *
FROM table_a a
JOIN table_b b ON (a.a_id = b.a_id)
)
SELECT a_id
FROM joined
returns invalid identifier.
How can you select joined column when using WITH clause? I have tried aliases, prefixing and nothing worked. I know I can use:
WITH joined AS (
SELECT a.a_id
FROM table_a a
JOIN table_b b ON (a.a_id = b.a_id)
)
SELECT a_id
FROM joined
but I need this alias to cover all fields.
Only way I managed to meet this condition is using:
WITH joined AS (
SELECT a.a_id a_id_alias, a.*, b.*
FROM table_a a
JOIN table_b b ON (a.a_id = b.a_id)
)
SELECT a_id_alias
FROM joined
but it is not perfect solution...
You can use the effect of the USING clause when joining the tables.
When you join tables where the join columns have the same name (as it is the case with your example), the USING clause will return the join column only once, so the following works:
with joined as (
select *
from table_a a
join table_b b using (a_id)
)
select a_id
from joined;
SQLFiddle example: http://sqlfiddle.com/#!4/e7e099/2
I don't think you can do this without aliases. The result of the "joined" query has two fields, both named a_id. Unless you alias one (or both), as you did in your final query, the outer query has no idea which a_id you are referring to.
Why is your final query not a "perfect" solution?
You can probably use alias as below:
WITH JOINED AS (
SELECT A.A_ID A_A_ID, B.A_ID B_A_ID,
A.FIELD_NAME1 A_FIELDNAME1, A.FIELDNAME2 A_FIELDNAME2,A.FIELDNAME_N A_FIELDNAME_N,
B.FIELD_NAME1 B_FIELDNAME1, B.FIELDNAME2 B_FIELDNAME2,B.FIELDNAME_N B_FIELDNAME_N,
FROM TABLE_A A
JOIN TABLE_B B ON (A.A_ID = B.A_ID)
)
SELECT A_A_ID, B_A_ID
FROM JOINED
IT IS ALWAYS A GOOD PRACTICE TO AVOID USING SELECT *

SQL joined by last date

This is a question asked here before more than once, however I couldn't find what I was looking for. I am looking for join two tables, where the joined table is set by the last register ordered by date time, until here all is ok.
My trouble start on having more than two records on the joined table, let me show you a sample
table_a
-------
id
name
description
created
updated
table_b
-------
id
table_a_id
name
description
created
updated
What I have done at the beginning was:
SELECT a.id, b.updated
FROM table_a AS a
LEFT JOIN (SELECT table_a_id, max (updated) as updated
FROM table_b GROUP BY table_a_id ) AS b
ON a.id = b.table_a_id
Until here I was getting cols, a.id and b.updated. I need the full table_b cols, but when I try to add a new col to my query, Postgres tells me that I need to add my col to a GROUP BY criteria in order to complete the query, and the result is not what I am looking for.
I am trying to find a way to have this list.
DISTINCT ON or is your friend. Here is a solution with correct syntax:
SELECT a.id, b.updated, b.col1, b.col2
FROM table_a as a
LEFT JOIN (
SELECT DISTINCT ON (table_a_id)
table_a_id, updated, col1, col2
FROM table_b
ORDER BY table_a_id, updated DESC
) b ON a.id = b.table_a_id;
Or, to get the whole row from table_b:
SELECT a.id, b.*
FROM table_a as a
LEFT JOIN (
SELECT DISTINCT ON (table_a_id)
*
FROM table_b
ORDER BY table_a_id, updated DESC
) b ON a.id = b.table_a_id;
Detailed explanation for this technique as well as alternative solutions under this closely related question:
Select first row in each GROUP BY group?
Try:
SELECT a.id, b.*
FROM table_a AS a
LEFT JOIN (SELECT t.*,
row_number() over (partition by table_a_id
order by updated desc) rn
FROM table_b t) AS b
ON a.id = b.table_a_id and b.rn=1
You can use Postgres's distinct on syntax:
select a.id, b.*
from table_a as a left join
(select distinct on (table_a_id) table_a_id, . . .
from table_b
order by table_a_id, updated desc
) b
on a.id = b.table_a_id
Where the . . . is, you should put in the columns that you want.

How to select records from a Table that has a certain number of rows in a related table in SQL Server?

Not quite sure how to ask this, but I have 2 tables that are related in a 1 to many relationship, I need to select all records in the "1" table that have less than three records in the "many' table.
select b.foreignkey,count(b.foreignkey) as bidcount
from b
where b.foreignkey in (select a.id from a) and bidcount< 3
group by b.foreignkey
this doesn't work at all I know but I am at a loss how to do this.
I need to in the end select all the records from the "a" table based on this criteria. Sorry if that is confusing!
Just using your code, not tested:
SELECT
b.foreignkey,
count(b.foreignkey) as bidcount
FROM
b
WHERE
b.foreignkey IN (SELECT a.id FROM a)
GROUP BY
b.foreignkey
HAVING
count(b.foreignkey) < 3
Try this:
SELECT t1.id,COUNT(t2.parentId)
FROM table1 as t1
INNER JOIN table2 as t2
ON t1.id = t2.parentId
GROUP BY t1.id
HAVING COUNT(t2.parentId) < 3
You didn't mention which version of SQL Server you're using - if you're on SQL Server 2005 or newer, you could use this CTE (Common Table Expression):
;WITH ChildRows AS
(
SELECT A.Id, COUNT(b.Id) AS 'BCount'
FROM
dbo.TableA A
INNER JOIN
dbo.TableB B ON B.TableAId = A.Id
)
SELECT A.*, R.BCount
FROM dbo.TableA A
INNER JOIN ChildRows R ON A.Id = R.Id
The inner SELECT lists the Id columns from TableA and the count of the child rows associated with those (using the INNER JOIN to TableB) - and the outer SELECT just builds on top of that result set and shows all fields from table A (and the count from the B table)
if you want to return all fields of your (1) table in one query, I suggest you consider using CROSS APPLY:
SELECT t1.* FROM table_1 t1
CROSS APPLY (SELECT COUNT(*) cnt FROM Table_Many t2 WHERE t2.fk = t1.pk) a
where a.cnt < 3
in some particular cases, based on your indices and db structure, this query may run 4 times faster than the GROUP BY method
you have posted this question in sql server, I have a answer in oracle database system (don't know whether it will run in sql server as well or not)
this is as follow-
select [desired column list] from
(select b.*, count(*) over (partition by b.foreignkey) c_1
from b
where b.foreignkey in (select a.id from a) )
where c_1 < 3 ;
i hope it should work on sql server as well...
if not please let me update ..

How do I find records that are not joined?

I have two tables that are joined together.
A has many B
Normally you would do:
select * from a,b where b.a_id = a.id
To get all of the records from a that has a record in b.
How do I get just the records in a that does not have anything in b?
select * from a where id not in (select a_id from b)
Or like some other people on this thread says:
select a.* from a
left outer join b on a.id = b.a_id
where b.a_id is null
select * from a
left outer join b on a.id = b.a_id
where b.a_id is null
The following image will help to understand SQL LET JOIN :
Another approach:
select * from a where not exists (select * from b where b.a_id = a.id)
The "exists" approach is useful if there is some other "where" clause you need to attach to the inner query.
SELECT id FROM a
EXCEPT
SELECT a_id FROM b;
You will probably get a lot better performance (than using 'not in') if you use an outer join:
select * from a left outer join b on a.id = b.a_id where b.a_id is null;
SELECT <columnns>
FROM a WHERE id NOT IN (SELECT a_id FROM b)
In case of one join it is pretty fast, but when we are removing records from database which has about 50 milions records and 4 and more joins due to foreign keys, it takes a few minutes to do it.
Much faster to use WHERE NOT IN condition like this:
select a.* from a
where a.id NOT IN(SELECT DISTINCT a_id FROM b where a_id IS NOT NULL)
//And for more joins
AND a.id NOT IN(SELECT DISTINCT a_id FROM c where a_id IS NOT NULL)
I can also recommended this approach for deleting in case we don't have configured cascade delete.
This query takes only a few seconds.
The first approach is
select a.* from a where a.id not in (select b.ida from b)
the second approach is
select a.*
from a left outer join b on a.id = b.ida
where b.ida is null
The first approach is very expensive. The second approach is better.
With PostgreSql 9.4, I did the "explain query" function and the first query as a cost of cost=0.00..1982043603.32.
Instead the join query as a cost of cost=45946.77..45946.78
For example, I search for all products that are not compatible with no vehicles. I've 100k products and more than 1m compatibilities.
select count(*) from product a left outer join compatible c on a.id=c.idprod where c.idprod is null
The join query spent about 5 seconds, instead the subquery version has never ended after 3 minutes.
Another way of writing it
select a.*
from a
left outer join b
on a.id = b.id
where b.id is null
Ouch, beaten by Nathan :)
This will protect you from nulls in the IN clause, which can cause unexpected behavior.
select * from a where id not in (select [a id] from b where [a id] is not null)