How to join two tables on distinct values of a column? - sql

SELECT table1.*
,address
,job
FROM table1
JOIN table2 ON table2.name = table1.name
The query above also returns a row for every duplicate value of name. How can I change it so that each distinct value of the name column yields only one row?
I am using SQL Server

You can easily accomplish this with the row_number window function. See the query below:
select t1.id, t1.name, t1.pets, t2.address, t2.job
from (
select *,
row_number() over (partition by [name] order by id) rn
from Table1
) t1
join table2 t2 on t1.name = t2.name
where t1.rn = 1
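As a rough way to try the row_number approach without a SQL Server instance, the sketch below runs the same query shape against an in-memory SQLite database from Python. SQLite standing in for SQL Server, the sample rows, and the column types are all assumptions made for illustration; window functions need SQLite 3.25 or newer.

```python
import sqlite3

# In-memory SQLite is a stand-in for SQL Server (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT, pets TEXT);
    CREATE TABLE table2 (name TEXT, address TEXT, job TEXT);
    -- Hypothetical sample rows: 'ann' appears twice in table1.
    INSERT INTO table1 VALUES (1, 'ann', 'cat'), (2, 'ann', 'dog'), (3, 'bob', 'fish');
    INSERT INTO table2 VALUES ('ann', '1 Main St', 'engineer'), ('bob', '2 Oak Ave', 'chef');
""")

# Number the rows within each name, then keep only the first row per name.
rows = conn.execute("""
    SELECT t1.id, t1.name, t1.pets, t2.address, t2.job
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id) AS rn
        FROM table1
    ) AS t1
    JOIN table2 AS t2 ON t1.name = t2.name
    WHERE t1.rn = 1
    ORDER BY t1.id
""").fetchall()

for row in rows:
    print(row)
```

Only table1 is deduplicated here; if table2 also held several rows per name, the join would still multiply them.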

I would recommend a lateral join -- apply -- for this purpose:
SELECT t1.*, t2.address, t2.job
FROM table2 t2 CROSS APPLY
(SELECT TOP (1) t1.*
FROM table1 t1
WHERE t2.name = t1.name
) t1;
Normally, the subquery would have an ORDER BY to specify which row counts as the first one. Otherwise the choice of row is indeterminate.
This is often faster than using window functions for this purpose.
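SQLite has no APPLY operator, so the sketch below swaps in a different but related technique: a correlated sub-query with ORDER BY ... LIMIT 1 in the join condition, which likewise picks one table1 row for each table2 row. The sample rows are hypothetical, and the sketch assumes id is unique in table1.

```python
import sqlite3

# In-memory SQLite is a stand-in for SQL Server (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT, pets TEXT);
    CREATE TABLE table2 (name TEXT, address TEXT, job TEXT);
    -- Hypothetical sample rows: 'ann' appears twice in table1.
    INSERT INTO table1 VALUES (1, 'ann', 'cat'), (2, 'ann', 'dog'), (3, 'bob', 'fish');
    INSERT INTO table2 VALUES ('ann', '1 Main St', 'engineer'), ('bob', '2 Oak Ave', 'chef');
""")

# For each table2 row, the correlated sub-query returns the lowest id
# among the table1 rows with the same name, so at most one row joins.
rows = conn.execute("""
    SELECT t1.id, t1.name, t1.pets, t2.address, t2.job
    FROM table2 AS t2
    JOIN table1 AS t1
      ON t1.id = (SELECT x.id
                  FROM table1 AS x
                  WHERE x.name = t2.name
                  ORDER BY x.id
                  LIMIT 1)
    ORDER BY t1.id
""").fetchall()

for row in rows:
    print(row)
```

The ORDER BY inside the sub-query is what makes the chosen row deterministic.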

Related

How to Group By all fields nested tables in a Left Join query in BigQuery?

I have about 10 tables that I combine into one big nested table, round by round, with the following query:
R1 AS(
SELECT ANY_VALUE(Table1).*, ARRAY_AGG(( SELECT AS STRUCT Table2.* EXCEPT(ID))) AS Table2
FROM Table1 LEFT JOIN Table2 USING(ID)
GROUP BY Table1.ID),
R2 AS(
SELECT ANY_VALUE(R1).*, ARRAY_AGG(( SELECT AS STRUCT Table3.* EXCEPT(ID))) AS Table3
FROM R1 LEFT JOIN Table3 USING(ID)
GROUP BY R1.ID),
...
SELECT ANY_VALUE(R9).*, ARRAY_AGG(( SELECT AS STRUCT Table10.* EXCEPT(ID))) AS Table10
FROM R9 LEFT JOIN Table10 USING(ID)
The problem is that, for example, my first table can contain two records with the same ID but different values in some other fields. I want to treat them as two distinct records, and therefore group by all the fields of the table when I join.
Then I want to do the same with all the "sub-tables" (the R tables in the query), so that I can group by all the fields of the nested tables.
How can I do this easily?
I tried GROUP BY Table1.* but it doesn't work...
Thank you in advance
Try to_json_string:
...
FROM Table1 t1
...
GROUP BY to_json_string(t1)
You seem to want something like this:
select *
from table1 t1 left join
(select t2.*
from table2 t2
where true
qualify row_number() over (partition by t2.id order by t2.id) = 1
) t2
using (id)
This uses qualify instead of group by to keep one row per id.
If you don't want all rows from table1, you can whittle them down as well:
select *
from (select t1.*
from table1 t1
where true
qualify row_number() over (partition by id, col1, col2 order by id) = 1
) t1 left join
(select t2.*
from table2 t2
where true
qualify row_number() over (partition by t2.id order by t2.id) = 1
) t2
using (id)
How to Group By all fields ...?
I tried GROUP BY Table1.* but it doesn't work...
Consider below example
SELECT ANY_VALUE(t1).*,
ARRAY_AGG(( SELECT AS STRUCT t2.* EXCEPT(ID))) AS Table2
FROM Table1 t1 LEFT JOIN Table2 t2 USING(ID)
GROUP BY FORMAT('%t', t1)

How to get FIRST response in a join

I am working with this query:
select t1.*, t2.Value from `db.ds.table1` t1
join `db.ds.table2` t2
on t1.Address= t2.Address
t2.Value is identical in all join matches on Address. However, the query produces a Cartesian product of the matching rows.
How do I set up the join so that I get just the "first" match from the join, and not ALL of them?
By the way, there are close to 300 million rows per table.
Thanks!
t2.Value is identical in all join matches on Address ...
... so it is not necessarily the first, but rather any ...
Below is for BigQuery Standard SQL
#standardSQL
SELECT t1.*, t2.value
FROM `project.dataset.table1` t1
JOIN (
SELECT address, ANY_VALUE(value) value
FROM `project.dataset.table2`
GROUP BY address
) t2
ON t1.address = t2.address
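The effect of collapsing table2 to one row per address before joining can be checked on a tiny data set. The sketch below uses SQLite from Python as a stand-in for BigQuery. SQLite has no ANY_VALUE, so MIN(value) is swapped in, which is equivalent only because value is assumed identical within an address. The table contents and the owner column are hypothetical.

```python
import sqlite3

# In-memory SQLite is a stand-in for BigQuery (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (address TEXT, owner TEXT);
    CREATE TABLE table2 (address TEXT, value INTEGER);
    -- Hypothetical rows: table2 repeats each address with the same value.
    INSERT INTO table1 VALUES ('a1', 'ann'), ('a2', 'bob');
    INSERT INTO table2 VALUES ('a1', 10), ('a1', 10), ('a1', 10), ('a2', 20), ('a2', 20);
""")

# Plain join: every matching table2 row produces an output row.
naive = conn.execute("""
    SELECT t1.address, t1.owner, t2.value
    FROM table1 AS t1
    JOIN table2 AS t2 ON t1.address = t2.address
""").fetchall()

# Pre-aggregated join: table2 is reduced to one row per address first.
# MIN(value) replaces ANY_VALUE(value), which SQLite does not provide.
deduped = conn.execute("""
    SELECT t1.address, t1.owner, t2.value
    FROM table1 AS t1
    JOIN (
        SELECT address, MIN(value) AS value
        FROM table2
        GROUP BY address
    ) AS t2 ON t1.address = t2.address
    ORDER BY t1.address
""").fetchall()

print(len(naive), len(deduped))
print(deduped)
```

With this data the plain join returns one row per table2 row, while the pre-aggregated join returns one row per table1 row.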
One method uses row_number():
select t1.*, t2.Value
from `db.ds.table1` t1 join
(select t2.*, row_number() over (partition by address order by ?) as seqnum
from `db.ds.table2` t2
) t2
on t2.address = t1.address and t2.seqnum = 1;
The ? is for the column that specifies the ordering -- what "first" means.

ORACLE SQL select two table where the same type

I have two tables
-------table1-------
_name _status
aaa   Y
bbb   Y
ccc   N
-------table2-------
_name _type
aaa   AA
aaa   BB
aaa   CC
bbb   AA
bbb   BB
ccc   CC
Can I select the following result?
_name _status _type
aaa   Y   AA,BB,CC
bbb   Y   AA,BB
ccc   N   CC
You can use listagg():
select t1.name, t1.status,
listagg(t2.type, ',') within group (order by t2.type) as types
from table1 t1 join
table2 t2
on t1.name = t2.name
group by t1.name, t1.status;
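To try the aggregation on the sample data from the question without an Oracle instance, the sketch below uses SQLite from Python. SQLite's GROUP_CONCAT is swapped in for listagg. It has no WITHIN GROUP clause and does not promise an element order here, so the order is normalised in Python afterwards. The column names without leading underscores follow the answer's query and are an assumption about the real schema.

```python
import sqlite3

# In-memory SQLite is a stand-in for Oracle (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (name TEXT, status TEXT);
    CREATE TABLE table2 (name TEXT, type TEXT);
    -- Sample rows copied from the question.
    INSERT INTO table1 VALUES ('aaa', 'Y'), ('bbb', 'Y'), ('ccc', 'N');
    INSERT INTO table2 VALUES
        ('aaa', 'AA'), ('aaa', 'BB'), ('aaa', 'CC'),
        ('bbb', 'AA'), ('bbb', 'BB'),
        ('ccc', 'CC');
""")

# GROUP_CONCAT plays the role of listagg: one comma-separated string per group.
rows = conn.execute("""
    SELECT t1.name, t1.status, GROUP_CONCAT(t2.type, ',') AS types
    FROM table1 AS t1
    JOIN table2 AS t2 ON t1.name = t2.name
    GROUP BY t1.name, t1.status
    ORDER BY t1.name
""").fetchall()

# Sort the elements in Python, mirroring WITHIN GROUP (ORDER BY t2.type).
result = [(name, status, ",".join(sorted(types.split(","))))
          for name, status, types in rows]

for row in result:
    print(row)
```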

Cannot query an alias table

I'm doing something like:
SELECT T1.NAME, T2.DATE
FROM T1
INNER JOIN
(
SELECT * FROM OTHERTABLE
) AS T2 ON T2.USERID = T1.USERID
Which works, but if I query the alias table, I get an error saying that T2 is an invalid object name.
Example:
SELECT
T1.NAME,
T2.DATE,
CASE
WHEN EXISTS (SELECT TOP 1 1 FROM T2 WHERE T2.THISFIELD = T1.THISFIELD) THEN 'HELLO'
ELSE 'BYE'
END AS COMMENT -- THIS ALSO FAILS
FROM T1
INNER JOIN
(
SELECT * FROM OTHERTABLE
) AS T2 ON T2.USERID = T1.USERID
WHERE (SELECT COUNT(*) FROM T2) > 0
I thought that's what I did: "create" T2. Is there any way I can use T2 like that?
My goal is to fetch all the related data from OTHERTABLE once, because I'll have many CASE expressions in the SELECT clause depending on whether data exists in T2 or not. I don't want to do EXISTS for every field, since that will launch a new query against a huge table every time.
Your query uses a sub-query of SELECT * FROM OTHERTABLE, which adds nothing over joining the table directly. You can modify it like this:
SELECT
T1.NAME,
T2.DATE,
...
FROM T1
JOIN OTHERTABLE T2 ON T2.USERID = T1.USERID
WHERE (SELECT COUNT(*) FROM OTHERTABLE ) > 0
The alias of a derived table (a sub-query in the FROM clause) cannot be used as a table name in other sub-queries of the same query. Use a Common Table Expression (CTE) for that purpose instead. T2 is a CTE in the following example.
;WITH T2 AS
(
SELECT UserId, col1, col2, [Date]
FROM OtherTable
)
SELECT T1.NAME, T2.DATE
FROM T1
JOIN T2 ON T2.USERID = T1.USERID
WHERE (SELECT COUNT(*) FROM T2) > 0
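The difference between a derived-table alias and a CTE name can be observed in a small experiment. The sketch below uses SQLite from Python as a stand-in for SQL Server; SQLite also refuses to treat the alias as a table, though its error text differs from SQL Server's "invalid object name". The table contents are hypothetical.

```python
import sqlite3

# In-memory SQLite is a stand-in for SQL Server (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (userid INTEGER, name TEXT);
    CREATE TABLE othertable (userid INTEGER, "date" TEXT);
    -- Hypothetical sample rows.
    INSERT INTO t1 VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO othertable VALUES (1, '2024-01-01');
""")

# Attempt 1: reuse the derived-table alias t2 as if it were a table.
try:
    conn.execute("""
        SELECT t1.name, t2."date"
        FROM t1
        JOIN (SELECT * FROM othertable) AS t2 ON t2.userid = t1.userid
        WHERE (SELECT COUNT(*) FROM t2) > 0
    """)
    alias_usable_as_table = True
except sqlite3.OperationalError as exc:
    alias_usable_as_table = False
    print("derived-table alias rejected:", exc)

# Attempt 2: define t2 as a CTE, which can be referenced more than once.
rows = conn.execute("""
    WITH t2 AS (
        SELECT userid, "date" FROM othertable
    )
    SELECT t1.name, t2."date"
    FROM t1
    JOIN t2 ON t2.userid = t1.userid
    WHERE (SELECT COUNT(*) FROM t2) > 0
    ORDER BY t1.name
""").fetchall()

print(rows)
```

Whether the engine evaluates the CTE once or once per reference is an implementation detail, so the CTE guarantees the name is reusable, not that OTHERTABLE is scanned only once.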

SQL Query with conditional JOIN

The scenario:
Table1
CatId|Name|Description
Table2
ItId|Title|Date|CatId (foreign key)
I want to return all rows from Table1 together with Title and Date from Table2, where the row returned from Table2 must be the latest one by the Date column.
(In the second table there are many items with the same CatId and I need just the latest.)
I have 2 queries but can't merge them together:
Query 1:
SELECT Table1.Name, Table1.Description,
Table2.Title, Table2.Date
FROM
Table1 LEFT JOIN Table2 ON Table1.CatId=Table2.CatId
Query2:
SELECT TOP 1 Table2.Title, Table2.Date
FROM
Table2
WHERE
Table2.CatId = #inputParam
ORDER BY Table2.Date DESC
You can use a UNION, but you'll need to make the columns match up:
OK, after rereading the question, I understand what you're trying to do.
This should do the trick:
SELECT Table1.Name, Table1.Description,
T2.Title, T2.Date
FROM
Table1
LEFT JOIN (
SELECT CatId, Title, Date, ROW_NUMBER() over (ORDER BY CatId, Date DESC) - RANK() over (ORDER BY CatID) as Num
FROM Table2) T2 on T2.CatId = Table1.CatId AND T2.Num = 0
Sounds like you're talking about a groupwise maximum (the newest row in Table2 for each matching row in Table1), in which case the easiest way is to use ROW_NUMBER:
WITH CTE AS
(
SELECT
t1.Name, t1.Description, t2.Title, t2.Date,
ROW_NUMBER() OVER (PARTITION BY t1.CatId ORDER BY t2.Date DESC) AS Seq
FROM Table1 t1
LEFT JOIN Table2 t2
ON t2.CatId = t1.CatId
)
SELECT *
FROM CTE
WHERE Seq = 1
OR Date IS NULL
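The groupwise-maximum pattern can be exercised on a few rows to see how categories without items behave. The sketch below uses SQLite from Python as a stand-in for SQL Server, with hypothetical sample rows and ISO-formatted date strings so that text ordering matches date ordering.

```python
import sqlite3

# In-memory SQLite is a stand-in for SQL Server (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (CatId INTEGER, Name TEXT, Description TEXT);
    CREATE TABLE Table2 (ItId INTEGER, Title TEXT, Date TEXT, CatId INTEGER);
    -- Hypothetical rows: category 1 has two items, category 2 has none.
    INSERT INTO Table1 VALUES (1, 'Books', 'Printed'), (2, 'Games', 'Board');
    INSERT INTO Table2 VALUES
        (10, 'Older', '2024-01-15', 1),
        (11, 'Newer', '2024-05-01', 1);
""")

# Number the joined rows per category, newest first, and keep the first one.
rows = conn.execute("""
    WITH CTE AS (
        SELECT t1.Name, t1.Description, t2.Title, t2.Date,
               ROW_NUMBER() OVER (PARTITION BY t1.CatId
                                  ORDER BY t2.Date DESC) AS Seq
        FROM Table1 AS t1
        LEFT JOIN Table2 AS t2 ON t2.CatId = t1.CatId
    )
    SELECT Name, Description, Title, Date
    FROM CTE
    WHERE Seq = 1
    ORDER BY Name
""").fetchall()

for row in rows:
    print(row)
```

A category with no items still yields exactly one joined row, with NULL Title and Date, and that row gets Seq = 1, so the Seq = 1 filter alone keeps it in this sketch.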
Shouldn't this work? A derived table in a LEFT JOIN cannot reference Table1, so the sub-query has to go into an OUTER APPLY instead:
SELECT Table1.Name, Table1.Description,
T2.Title, T2.Date
FROM
Table1 OUTER APPLY (
SELECT TOP 1 Table2.Title, Table2.Date
FROM
Table2
WHERE
Table2.CatId = Table1.CatId
ORDER BY Table2.Date DESC
) T2