How to get FIRST response in a join

I am working with this query:
select t1.*, t2.Value from `db.ds.table1` t1
join `db.ds.table2` t2
on t1.Address= t2.Address
t2.Value is identical in all join matches on Address. However, the join multiplies rows: each t1 row comes back once for every matching t2 row.
How do I set up the join so I get just the "first" match from the join, and not ALL of them?
Btw, there are close to 300 million rows per table.
Thanks!

t2.Value is identical in all join matches on Address ...
... so it is not necessarily the first match you need, but rather any one of them ...
Below is for BigQuery Standard SQL
#standardSQL
SELECT t1.*, t2.value
FROM `project.dataset.table1` t1
JOIN (
SELECT address, ANY_VALUE(value) value
FROM `project.dataset.table2`
GROUP BY address
) t2
ON t1.address = t2.address
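To see the effect on row counts, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. The tiny tables and their data are invented for illustration, and MIN() stands in for ANY_VALUE(), which SQLite does not have; since the value is identical per address, either aggregate returns the same thing.

```python
import sqlite3

# Hypothetical miniature tables; names mirror the question, data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, address TEXT);
    CREATE TABLE table2 (address TEXT, value TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b');
    -- address 'a' appears three times with the same value
    INSERT INTO table2 VALUES ('a', 'x'), ('a', 'x'), ('a', 'x'), ('b', 'y');
""")

# Plain join: each table1 row repeats once per matching table2 row.
naive = conn.execute("""
    SELECT t1.id, t2.value
    FROM table1 t1
    JOIN table2 t2 ON t1.address = t2.address
""").fetchall()

# Collapse table2 to one row per address first, then join.
# MIN() is a stand-in for BigQuery's ANY_VALUE().
deduped = conn.execute("""
    SELECT t1.id, t2.value
    FROM table1 t1
    JOIN (
        SELECT address, MIN(value) AS value
        FROM table2
        GROUP BY address
    ) t2 ON t1.address = t2.address
    ORDER BY t1.id
""").fetchall()

print(len(naive), deduped)
```

The plain join returns four rows for two table1 rows, while the pre-aggregated join returns exactly one row per table1 row.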

One method uses row_number():
select t1.*, t2.Value
from `db.ds.table1` t1 join
(select t2.*, row_number() over (partition by address order by ?) as seqnum
from `db.ds.table2` t2
) t2
on t2.address = t1.address and t2.seqnum = 1;
The ? is for the column that specifies the ordering -- what "first" means.
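As a rough illustration of how the ordering column decides what "first" means, the sketch below runs the row_number() pattern in SQLite (3.25 or later, which added window functions) from Python. The loaded_at column and all data are assumptions made up for the example; they are not part of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, address TEXT);
    -- loaded_at is a hypothetical ordering column
    CREATE TABLE table2 (address TEXT, value TEXT, loaded_at TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b');
    INSERT INTO table2 VALUES
        ('a', 'old',  '2020-01-01'),
        ('a', 'new',  '2021-01-01'),
        ('b', 'only', '2020-06-01');
""")

def first_match(direction):
    # direction is restricted to two literals, so formatting it into the SQL is safe here.
    assert direction in ("ASC", "DESC")
    sql = f"""
        SELECT t1.id, t2.value
        FROM table1 t1
        JOIN (
            SELECT t2.*,
                   ROW_NUMBER() OVER (PARTITION BY address
                                      ORDER BY loaded_at {direction}) AS seqnum
            FROM table2 t2
        ) t2 ON t2.address = t1.address AND t2.seqnum = 1
        ORDER BY t1.id
    """
    return conn.execute(sql).fetchall()

oldest = first_match("ASC")   # "first" = earliest loaded_at
newest = first_match("DESC")  # "first" = latest loaded_at
print(oldest, newest)
```

Both calls return one row per table1 row; only the choice of which table2 row survives changes with the ordering.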

Related

Displaying records only when there are more than same three values

Is there a way to display only records where one email corresponds to more than 3 names?
I have tried the code below, but it does not return anything.
SELECT
t1.Name, t2.Email
FROM
Table1 t1
JOIN Table2 t2 on t1.ID=t2.PersonID
GROUP BY
t1.Name, t2.Email
HAVING COUNT(t2.Email) > 3

If your version supports window functions, you can do:
select t.*
from (select t1.name, t2.email,
count(t2.email) over (partition by t1.name) as cnt
from Table1 t1 inner join
Table2 t2
on t2.personid = t1.id
) t
where cnt > 3;

I think you want:
SELECT t1.Name, GROUP_CONCAT(t2.Email)
FROM Table1 t1 JOIN
Table2 t2
ON t1.ID = t2.PersonID
GROUP BY t1.Name
HAVING COUNT(t2.Email) > 3;
The big change is to the GROUP BY -- it no longer includes Email. Grouping by both Name and Email puts each pair in its own group, so the count never exceeds 3. You seem to want the emails returned on each row, so this concatenates them together.
EDIT:
In SQL Server, you would use string_agg():
SELECT t1.Name, STRING_AGG(t2.Email, ',')
FROM Table1 t1 JOIN
Table2 t2
ON t1.ID = t2.PersonID
GROUP BY t1.Name
HAVING COUNT(t2.Email) > 3;
Or, if you want individual rows:
SELECT n.*
FROM (SELECT t1.Name, t2.Email,
COUNT(*) OVER (PARTITION BY t1.Name) as cnt
FROM Table1 t1 JOIN
Table2 t2
ON t1.ID = t2.PersonID
) n
WHERE cnt > 3
ORDER BY Name;
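A small sketch of why the original query returns nothing, run against SQLite from Python. The sample names and emails are invented, and it assumes each (Name, Email) pair occurs only once, as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ID INTEGER, Name TEXT);
    CREATE TABLE Table2 (PersonID INTEGER, Email TEXT);
    INSERT INTO Table1 VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Table2 VALUES
        (1, 'a1@example.com'), (1, 'a2@example.com'),
        (1, 'a3@example.com'), (1, 'a4@example.com'),
        (2, 'b1@example.com');
""")

# Grouping by Name AND Email: every group holds a single row, so COUNT is 1.
original = conn.execute("""
    SELECT t1.Name, t2.Email
    FROM Table1 t1
    JOIN Table2 t2 ON t1.ID = t2.PersonID
    GROUP BY t1.Name, t2.Email
    HAVING COUNT(t2.Email) > 3
""").fetchall()

# Grouping by Name only: the count covers all of that person's emails.
by_name = conn.execute("""
    SELECT t1.Name, COUNT(t2.Email) AS cnt
    FROM Table1 t1
    JOIN Table2 t2 ON t1.ID = t2.PersonID
    GROUP BY t1.Name
    HAVING COUNT(t2.Email) > 3
""").fetchall()

print(original, by_name)
```

With this data the first query is empty and the second returns only Ann, who has four emails.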

How to join two tables on distinct values of a column?

SELECT table1.*
,address
,job
FROM table1
JOIN table2 ON table2.name = table1.name
The above query returns result for duplicate values of name too. How can I convert the query to get only one value for distinct values of name column?
I am using SQL Server

You can easily accomplish this with row_number window function. See query below:
select t1.id, t1.name, t1.pets, t2.address, t2.job
from (
select *,
row_number() over (partition by [name] order by id) rn
from Table1
) t1
join table2 t2 on t1.name = t2.name
where t1.rn = 1

I would recommend a lateral join -- apply -- for this purpose:
SELECT t1.*, t2.address, t2.job
FROM table2 t2 CROSS APPLY
(SELECT TOP (1) t1.*
FROM table1 t1
WHERE t2.name = t1.name
) t1;
Normally, the subquery would have an ORDER BY to specify which row TOP (1) returns. Otherwise the result is indeterminate.
This is often faster than using window functions for this purpose.

Cannot query an alias table

I'm doing something like:
SELECT T1.NAME, T2.DATE
FROM T1
INNER JOIN
(
SELECT * FROM OTHERTABLE
) AS T2 ON T2.USERID = T1.USERID
Which works, but if I query the alias table, I get an error saying that T2 is an invalid object name.
Example:
SELECT
T1.NAME,
T2.DATE,
CASE
WHEN EXISTS (SELECT TOP 1 1 FROM T2 WHERE T2.THISFIELD = T1.THISFIELD) THEN 'HELLO'
ELSE 'BYE'
END AS COMMENT -- THIS ALSO FAILS
FROM T1
INNER JOIN
(
SELECT * FROM OTHERTABLE
) AS T2 ON T2.USERID = T1.USERID
WHERE (SELECT COUNT(*) FROM T2) > 0
I thought that's what I did: "create" T2. Is there any way I can use T2 like this?
My goal is to fetch all the related data from OTHERTABLE once, because I'll have many CASE expressions in the SELECT clause depending on whether data exists in T2 or not. I don't want to do EXISTS for every field, since that would launch a new query against a huge table every time.

Your query uses a sub-query of SELECT * FROM OTHERTABLE, which adds nothing over joining the table directly. You can modify it like this:
SELECT
T1.NAME,
T2.DATE,
...
FROM T1
JOIN OTHERTABLE T2 ON T2.USERID = T1.USERID
WHERE (SELECT COUNT(*) FROM OTHERTABLE ) > 0
You cannot reference a sub-query's alias as a table elsewhere in the same query; the alias only qualifies columns. Instead, use a Common Table Expression (CTE), which can be referenced by name more than once. T2 is a CTE in the following example.
;WITH T2 AS
(
SELECT UserId, col1, col2, [Date]
FROM OtherTable
)
SELECT T1.NAME, T2.DATE
FROM T1
JOIN T2 ON T2.USERID = T1.USERID
WHERE (SELECT COUNT(*) FROM T2) > 0
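The difference can be tried out with a short sketch in Python against SQLite. The VISIT_DATE column and the sample rows are made up for the example, and SQLite reports the failure as "no such table" rather than SQL Server's "invalid object name", but the scoping rule being shown is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (USERID INTEGER, NAME TEXT);
    -- VISIT_DATE is a hypothetical column standing in for DATE
    CREATE TABLE OTHERTABLE (USERID INTEGER, VISIT_DATE TEXT);
    INSERT INTO T1 VALUES (1, 'ann');
    INSERT INTO OTHERTABLE VALUES (1, '2024-01-01');
""")

# A derived-table alias is not visible as a table inside another subquery.
derived_sql = """
    SELECT T1.NAME, T2.VISIT_DATE
    FROM T1
    JOIN (SELECT * FROM OTHERTABLE) AS T2 ON T2.USERID = T1.USERID
    WHERE (SELECT COUNT(*) FROM T2) > 0
"""
try:
    conn.execute(derived_sql).fetchall()
    derived_ok = True
except sqlite3.OperationalError:
    derived_ok = False

# A CTE has a name that every part of the statement can reference.
cte_rows = conn.execute("""
    WITH T2 AS (
        SELECT USERID, VISIT_DATE FROM OTHERTABLE
    )
    SELECT T1.NAME, T2.VISIT_DATE
    FROM T1
    JOIN T2 ON T2.USERID = T1.USERID
    WHERE (SELECT COUNT(*) FROM T2) > 0
""").fetchall()

print(derived_ok, cte_rows)
```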

Can I use the exists function in the select part of an SQL query?

I need to run a query where one of the fields returned is a yes or no if there is a row in another table matching one of the key fields in the first table.
Sounds like a job for join, except the second table is one to many and I just need to know if there are zero or a non zero number of rows in the secondary table.
I could do something like this:
select t1.name, t1.id, (select count(1) from t2 where t1.id=t2.id) from t1
but I'd like to avoid making an aggregate subquery if possible.
It was mentioned to me that I could use the exists() function, but I'm not seeing how to do that in a select field.
This is Sybase 15, by the way.

You could still do the JOIN, something like this:
SELECT t1.name, t1.id, CASE WHEN t2.id IS NULL THEN 0 ELSE 1 END Existst2
FROM t1
LEFT JOIN (SELECT id FROM t2 GROUP BY id) t2
ON t1.id = t2.id

Ahh, I got it from another Stack Overflow question...
case when exists (select * from t2 where t1.id = t2.id) then 1 else 0 end
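For context, here is a minimal sketch of that expression inside a full SELECT, run against SQLite from Python rather than Sybase. The rows are invented; the point is that a one-to-many match still yields one row per t1 row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, name TEXT);
    CREATE TABLE t2 (id INTEGER);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b');
    -- id 1 has three children, id 2 has none
    INSERT INTO t2 VALUES (1), (1), (1);
""")

rows = conn.execute("""
    SELECT t1.name, t1.id,
           CASE WHEN EXISTS (SELECT 1 FROM t2 WHERE t2.id = t1.id)
                THEN 1 ELSE 0 END AS has_t2
    FROM t1
    ORDER BY t1.id
""").fetchall()

print(rows)
```

EXISTS only tests for the presence of a match, so the three t2 rows for id 1 do not multiply the output.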

I am just writing down the syntax here:
if exists (select * from table1 t1 inner join table1 t2 on t1.id = t2.id )
select * from table2

How about this query (it should work on all databases):
select t1.name, t1.id, 'Y' as HasChild
from t1
where exists ( select 1 from t2 where t2.id = t1.id)
UNION
select t1.name, t1.id, 'N' as HasChild
from t1
where NOT exists ( select 1 from t2 where t2.id = t1.id)

SQL Query with conditional JOIN

The scenario:
Table1
CatId|Name|Description
Table2
ItId|Title|Date|CatId (foreign key)
I want to return all rows from Table1, plus Title and Date from Table2, where
the row returned from Table2 must be the latest one by the Date column.
(In the second table there are many items with the same CatId, and I need just the latest.)
I have 2 queries but can't merge them together:
Query 1:
SELECT Table1.Name, Table1.Description,
Table2.Title, Table2.Date
FROM
Table1 LEFT JOIN Table2 ON Table1.CatId=Table2.CatId
Query2:
SELECT TOP 1 Table2.Title, Table2.Date
FROM
Table2
WHERE
Table2.CatId = @inputParam
ORDER BY Table2.Date DESC

You can use a UNION, but you'll need to make the columns match up:
OK, after rereading the question, I understand what you're trying to do.
This should do the trick:
SELECT Table1.Name, Table1.Description,
T2.Title, T2.Date
FROM
Table1
LEFT JOIN (
SELECT CatId, Title, Date, ROW_NUMBER() over (ORDER BY CatId, Date DESC) - RANK() over (ORDER BY CatID) as Num
FROM Table2) T2 on T2.CatId = Table1.CatId AND T2.Num = 0

Sounds like you're talking about a groupwise maximum (newest row in Table2 for each matching row in Table1), in which case, the easiest way is use ROW_NUMBER:
WITH CTE AS
(
SELECT
t1.Name, t1.Description, t2.Title, t2.Date,
ROW_NUMBER() OVER (PARTITION BY t1.CatId ORDER BY t2.Date DESC) AS Seq
FROM Table1 t1
LEFT JOIN Table2 t2
ON t2.CatId = t1.CatId
)
SELECT *
FROM CTE
WHERE Seq = 1
OR Date IS NULL
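Here is a hedged sketch of the same groupwise-maximum pattern, run in SQLite (3.25 or later) from Python with invented rows. It suggests that filtering on Seq = 1 alone already keeps categories with no items, because an unmatched Table1 row is the only row in its partition and so gets Seq = 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (CatId INTEGER, Name TEXT, Description TEXT);
    CREATE TABLE Table2 (ItId INTEGER, Title TEXT, Date TEXT, CatId INTEGER);
    INSERT INTO Table1 VALUES (1, 'Books', 'has items'), (2, 'Empty', 'no items');
    INSERT INTO Table2 VALUES
        (10, 'Old', '2020-01-01', 1),
        (11, 'New', '2021-01-01', 1);
""")

rows = conn.execute("""
    WITH CTE AS (
        SELECT t1.CatId, t1.Name, t2.Title, t2.Date,
               ROW_NUMBER() OVER (PARTITION BY t1.CatId
                                  ORDER BY t2.Date DESC) AS Seq
        FROM Table1 t1
        LEFT JOIN Table2 t2 ON t2.CatId = t1.CatId
    )
    SELECT Name, Title, Date
    FROM CTE
    WHERE Seq = 1
    ORDER BY CatId
""").fetchall()

print(rows)
```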

Shouldn't something like this work? A derived table in a LEFT JOIN cannot reference Table1, so in SQL Server the correlated subquery has to go in an OUTER APPLY instead:
SELECT Table1.Name, Table1.Description,
T2.Title, T2.Date
FROM
Table1 OUTER APPLY (
SELECT TOP 1 Table2.Title, Table2.Date
FROM
Table2
WHERE
Table2.CatId = Table1.CatId
ORDER BY Table2.Date DESC
) T2
