SQL Query with conditional JOIN - sql

The scenario:
Table1
CatId|Name|Description
Table2
ItId|Title|Date|CatId (foreign key)
I want to return all rows from Table1 together with Title and Date from Table2, where the row returned from Table2 must be the latest one by the Date column.
(The second table has many items with the same CatId, and I need just the latest.)
I have 2 queries but can't merge them together:
Query 1:
SELECT Table1.Name, Table1.Description,
Table2.Title, Table2.Date
FROM
Table1 LEFT JOIN Table2 ON Table1.CatId=Table2.CatId
Query2:
SELECT TOP 1 Table2.Title, Table2.Date
FROM
Table2
WHERE
Table2.CatId = #inputParam
ORDER BY Table2.Date DESC

My first thought was a UNION (you'd need to make the columns match up), but after rereading the question, I understand what you're trying to do.
This should do the trick:
SELECT Table1.Name, Table1.Description,
T2.Title, T2.Date
FROM
Table1
LEFT JOIN (
SELECT CatId, Title, Date, ROW_NUMBER() over (ORDER BY CatId, Date DESC) - RANK() over (ORDER BY CatID) as Num
FROM Table2) T2 on T2.CatId = Table1.CatId AND T2.Num = 0

Sounds like you're talking about a groupwise maximum (newest row in Table2 for each matching row in Table1), in which case the easiest way is to use ROW_NUMBER:
WITH CTE AS
(
SELECT
t1.Name, t1.Description, t2.Title, t2.Date,
ROW_NUMBER() OVER (PARTITION BY t1.CatId ORDER BY t2.Date DESC) AS Seq
FROM Table1 t1
LEFT JOIN Table2 t2
ON t2.CatId = t1.CatId
)
SELECT *
FROM CTE
WHERE Seq = 1
OR Date IS NULL
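As a quick way to check the ROW_NUMBER approach, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. The sample rows are invented for illustration, and it assumes the bundled SQLite is 3.25 or newer (needed for window functions); SQL Server syntax details may differ.

```python
import sqlite3

# In-memory database with made-up sample data (assumption, not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (CatId INTEGER PRIMARY KEY, Name TEXT, Description TEXT);
CREATE TABLE Table2 (ItId INTEGER PRIMARY KEY, Title TEXT, Date TEXT, CatId INTEGER);
INSERT INTO Table1 VALUES (1, 'Books', 'Printed'), (2, 'Music', 'Audio'), (3, 'Empty', 'No items');
INSERT INTO Table2 VALUES
  (10, 'Old book',   '2020-01-01', 1),
  (11, 'New book',   '2021-06-01', 1),
  (12, 'Only album', '2019-03-15', 2);
""")

# Number the joined rows per category, newest first, and keep row 1.
# A category without items still yields one row (with NULLs) thanks to the LEFT JOIN.
query = """
WITH CTE AS (
  SELECT t1.Name, t1.Description, t2.Title, t2.Date,
         ROW_NUMBER() OVER (PARTITION BY t1.CatId ORDER BY t2.Date DESC) AS Seq
  FROM Table1 t1
  LEFT JOIN Table2 t2 ON t2.CatId = t1.CatId
)
SELECT Name, Description, Title, Date
FROM CTE
WHERE Seq = 1
ORDER BY Name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each category appears exactly once, and the category with no items comes back with NULL (None) for Title and Date.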

Shouldn't this work? (A derived table in a LEFT JOIN cannot reference Table1, so the correlated subquery goes in an OUTER APPLY instead.)
SELECT Table1.Name, Table1.Description,
T2.Title, T2.Date
FROM
Table1 OUTER APPLY (
SELECT TOP 1 Table2.Title, Table2.Date
FROM
Table2
WHERE
Table2.CatId = Table1.CatId
ORDER BY Table2.Date DESC
) T2

Related

How to Group By all fields nested tables in a Left Join query in BigQuery?

I have about 10 tables that I combine into one big nested table, in rounds, with the following query:
R1 AS(
SELECT ANY_VALUE(Table1).*, ARRAY_AGG(( SELECT AS STRUCT Table2.* EXCEPT(ID))) AS Table2
FROM Table1 LEFT JOIN Table2 USING(ID)
GROUP BY Table1.ID),
R2 AS(
SELECT ANY_VALUE(R1).*, ARRAY_AGG(( SELECT AS STRUCT Table3.* EXCEPT(ID))) AS Table3
FROM R1 LEFT JOIN Table3 USING(ID)
GROUP BY R1.ID),
...
SELECT ANY_VALUE(R9).*, ARRAY_AGG(( SELECT AS STRUCT Table10.* EXCEPT(ID))) AS Table10
FROM R9 LEFT JOIN Table10 USING(ID)
The thing is that, for example, my first table can have two records with the same ID but different values in some other fields. I want to treat them as two distinct records, and thus group by all the fields of the table while I join.
Then I want to do the same with all the "sub-tables" (the R tables in the query), so I will be able to group by all the fields of the nested tables.
How can I do this easily?
I tried GROUP BY Table1.* but it doesn't work...
Thank you in advance
Try to_json_string:
...
FROM Table1 t1
...
GROUP BY to_json_string(t1)
You seem to want something like this:
select *
from table1 t1 left join
(select t2.*
from table2 t2
where true
qualify row_number() over (partition by t2.id order by t2.id) = 1
) t2
using (id)
This uses qualify instead of group by to fetch one row.
If you don't want all rows from table1, you can whittle them down as well:
select *
from (select t1.*
from table1 t1
where true
qualify row_number() over (partition by id, col1, col2 order by id) = 1
) t1 left join
(select t2.*
from table2 t2
where true
qualify row_number() over (partition by t2.id order by t2.id) = 1
) t2
using (id)
How to Group By all fields ...?
I tried GROUP BY Table1.* but it doesn't work...
Consider below example
SELECT ANY_VALUE(t1).*,
ARRAY_AGG(( SELECT AS STRUCT t2.* EXCEPT(ID))) AS Table2
FROM Table1 t1 LEFT JOIN Table2 t2 USING(ID)
GROUP BY FORMAT('%t', t1)
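The idea behind both TO_JSON_STRING and FORMAT('%t', ...) is to turn the whole row into a single string and group on that. The following is only a plain-Python analogy of that idea (it does not call BigQuery); the sample rows and the `color`/`size` fields are made up for illustration.

```python
import json
from collections import defaultdict

# Hypothetical sample data: two Table1 rows share an ID but differ in another field.
table1 = [
    {"ID": 1, "color": "red"},
    {"ID": 1, "color": "blue"},  # same ID, different field: stays a distinct record
    {"ID": 1, "color": "red"},   # exact duplicate of the first row
]
table2 = [{"ID": 1, "size": "S"}, {"ID": 1, "size": "M"}]

groups = defaultdict(list)
for t1 in table1:
    # Serialize the whole row, so the group key covers every field (like GROUP BY TO_JSON_STRING(t1)).
    key = json.dumps(t1, sort_keys=True)
    # LEFT JOIN semantics: keep the row even when nothing matches.
    matches = [t2 for t2 in table2 if t2["ID"] == t1["ID"]] or [None]
    for t2 in matches:
        # Mirror "SELECT AS STRUCT t2.* EXCEPT(ID)" by dropping the join key.
        groups[key].append(None if t2 is None else {k: v for k, v in t2.items() if k != "ID"})

# One output record per distinct Table1 row, with the matching Table2 rows nested.
nested = [dict(json.loads(key), Table2=items) for key, items in groups.items()]
for record in nested:
    print(record)
```

Note that exact duplicate rows collapse into one group whose array holds the joined rows of every duplicate, which is also what grouping on the serialized row does in SQL.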

Subquery based on column in main query

I'm looking for help structuring a SQL query with a subquery on table2 based on a column in table1, but where table1 and table2 have no relation.
something like
SELECT name, address, dateCreated,
(SELECT itemId FROM table2 WHERE itemDate BETWEEN dateCreated AND DATEADD(ss,10,dateCreated)) as item
FROM table1
So for each row 'item' must be selected from table2 based on dateCreated for that row.
You can try using WHERE EXISTS as shown below.
SELECT name
, [address]
, dateCreated
FROM table1
where exists(
SELECT itemId
FROM table2 WHERE itemDate BETWEEN dateCreated AND DATEADD(ss, 10,dateCreated) and table1.ItemId = table2.ItemId)
SELECT name, address, dateCreated, table2.itemId
from table1 LEFT JOIN table2 ON itemDate BETWEEN dateCreated AND DATEADD(ss,10,dateCreated)
If you want at most one item from table2, then your approach is fine but you want top (1):
SELECT t1.name, t1.address, t1.dateCreated,
(SELECT TOP (1) t2.itemId
FROM table2 t2
WHERE t2.itemDate BETWEEN t1.dateCreated AND DATEADD(second, 10, t1.dateCreated)
) as item
FROM table1 t1;
You can also phrase this as a lateral join, using outer apply:
SELECT t1.name, t1.address, t1.dateCreated,
t2.itemId
FROM table1 t1 OUTER APPLY
(SELECT TOP (1) t2.itemId
FROM table2 t2
WHERE t2.itemDate BETWEEN t1.dateCreated AND DATEADD(second, 10, t1.dateCreated)
) t2;
This makes it easy to select multiple columns.
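To see the "at most one item per row" behaviour in action, here is a small sketch using SQLite from Python. It is not SQL Server: SQLite has no TOP or DATEADD, so the sketch assumes LIMIT 1 and integer timestamps in seconds instead, and it adds an ORDER BY so the chosen item is deterministic. The sample data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Timestamps are plain integers (seconds) to keep the sketch portable.
CREATE TABLE table1 (name TEXT, address TEXT, dateCreated INTEGER);
CREATE TABLE table2 (itemId INTEGER, itemDate INTEGER);
INSERT INTO table1 VALUES ('ann', '1 Main St', 100), ('bob', '2 Side St', 500);
INSERT INTO table2 VALUES (1, 103), (2, 108), (3, 300);
""")

# Correlated scalar subquery: for each table1 row, pick the earliest item
# created within 10 seconds of dateCreated, or NULL when there is none.
query = """
SELECT t1.name, t1.address, t1.dateCreated,
       (SELECT t2.itemId
        FROM table2 t2
        WHERE t2.itemDate BETWEEN t1.dateCreated AND t1.dateCreated + 10
        ORDER BY t2.itemDate
        LIMIT 1) AS item
FROM table1 t1
ORDER BY t1.name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Two items fall inside ann's window, yet only one is returned, and bob gets NULL rather than being dropped.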
Can you please try the logic below? The way you tried it will throw an error if more than one record is found in table2 for any row from table1.
SELECT A.name,
A.address,
A.dateCreated,
B.itemId
FROM table1 A
INNER JOIN table2 B
ON B.itemDate BETWEEN A.dateCreated AND DATEADD(ss,10,A.dateCreated)
With the above query, you will get N rows for each row in table1, based on the date logic applied: BETWEEN A.dateCreated AND DATEADD(ss,10,A.dateCreated).

How to join two tables on distinct values of a column?

SELECT table1.*
,address
,job
FROM table1
JOIN table2 ON table2.name = table1.name
The above query returns rows for duplicate values of name too. How can I change the query to get only one row for each distinct value of the name column?
I am using SQL Server
You can easily accomplish this with row_number window function. See query below:
select t1.id, t1.name, t1.pets, t2.address, t2.job
from (
select *,
row_number() over (partition by [name] order by id) rn
from Table1
) t1
join table2 t2 on t1.name = t2.name
where t1.rn = 1
I would recommend a lateral join -- apply -- for this purpose:
SELECT t1.*, t2.address, t2.job
FROM table2 t2 CROSS APPLY
(SELECT TOP (1) t1.*
FROM table1 t1
WHERE t2.name = t1.name
) t1;
Normally, the subquery would have an ORDER BY to specify the ordering. Otherwise the result is indeterminate.
This is often faster than using window functions for this purpose.
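A minimal sketch of the row_number variant, run against SQLite from Python so it can be tried without a SQL Server instance. The `id` and `pets` columns follow the first answer; the sample rows are assumptions, and SQLite 3.25 or newer is assumed for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT, pets TEXT);
CREATE TABLE table2 (name TEXT, address TEXT, job TEXT);
INSERT INTO table1 VALUES (1, 'ann', 'cat'), (2, 'ann', 'dog'), (3, 'bob', 'fish');
INSERT INTO table2 VALUES ('ann', '1 Main St', 'chef'), ('bob', '2 Side St', 'pilot');
""")

# Number the table1 rows within each name (lowest id first) and join only row 1,
# so each distinct name contributes a single row to the result.
query = """
SELECT t1.id, t1.name, t1.pets, t2.address, t2.job
FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id) AS rn
  FROM table1
) t1
JOIN table2 t2 ON t2.name = t1.name
WHERE t1.rn = 1
ORDER BY t1.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The ORDER BY inside the window decides which duplicate survives; here it is the row with the lowest id.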

where column in from another select results with limit (mysql/mariadb)

When I run this query, it returns all rows whose id exists in the select from table2:
SELECT * FROM table1 WHERE id in (
SELECT id FROM table2 where name ='aaa'
)
But when I add LIMIT or BETWEEN to the second select:
SELECT * FROM table1 WHERE id in (
SELECT id FROM table2 where name ='aaa' limit 4
)
it returns this error:
This version of MariaDB doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
You are using LIMIT without an ORDER BY. This is generally not recommended because that returns an arbitrary set of rows -- and those can change from one execution to another.
You can convert this to a JOIN -- fortunately. If id is not duplicated in table2:
SELECT t1.*
FROM table1 t1 JOIN
(SELECT t2.id
FROM table2 t2
WHERE t2.name = 'aaa'
LIMIT 4
) t2
USING (id);
If id can be duplicated in table2, then:
SELECT t1.*
FROM table1 t1 JOIN
(SELECT DISTINCT t2.id
FROM table2 t2
WHERE t2.name = 'aaa'
LIMIT 4
) t2
USING (id);
Another fun way uses LIMIT:
SELECT t1.*
FROM table1 t1
WHERE id <= (SELECT t2.id
FROM table2 t2
WHERE t2.name = 'aaa'
ORDER BY t2.id
LIMIT 1 OFFSET 3
);
LIMIT is allowed in a scalar subquery.
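The derived-table rewrite can be tried out with a short sketch like the one below. It uses SQLite from Python, which happens to accept LIMIT inside IN subqueries, so it cannot reproduce the MariaDB error; it only shows that the JOIN form returns the expected rows. The sample data, the `label` column, and the added ORDER BY (to make the limited set deterministic) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE table2 (id INTEGER, name TEXT);
INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e'), (6, 'f');
-- id 2 is duplicated on purpose; id 3 has a different name.
INSERT INTO table2 VALUES
  (1, 'aaa'), (2, 'aaa'), (2, 'aaa'), (3, 'bbb'), (4, 'aaa'), (5, 'aaa'), (6, 'aaa');
""")

# The LIMIT lives in a derived table, which is joined instead of used with IN.
# DISTINCT keeps a duplicated id from producing duplicate table1 rows.
query = """
SELECT t1.*
FROM table1 t1
JOIN (SELECT DISTINCT t2.id
      FROM table2 t2
      WHERE t2.name = 'aaa'
      ORDER BY t2.id
      LIMIT 4) t2
  ON t2.id = t1.id
ORDER BY t1.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Only the first four distinct matching ids (1, 2, 4, 5) are joined, and id 2 appears once despite being duplicated in table2.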
You can use an analytic function such as ROW_NUMBER() to return a single row from the subquery. This way, I suppose, no "too many rows" issue would be raised:
SELECT * FROM
(
SELECT t1.*,
ROW_NUMBER() OVER (ORDER BY t2.id DESC) AS rn
FROM table1 t1
JOIN table2 t2 ON t2.id = t1.id
WHERE t2.name ='aaa'
) t
WHERE rn = 1
P.S.: Btw, id columns are expected to be primary keys of your tables, aren't they ?
Update ( depending on your need in the comment ) Consider using :
SELECT * FROM
(
SELECT j.*,
ROW_NUMBER() OVER (ORDER BY j.id DESC) AS rn2
FROM job_forum j
JOIN
( SELECT t2.*,
ROW_NUMBER() OVER (PARTITION BY t2.id ORDER BY t2.id DESC) AS rn1
FROM table2 t2
WHERE t2.name ='aaa' ) t2
ON t2.id = j.id
WHERE rn1 = 1
) jj
WHERE rn2 <= 10

Oracle nested correlated subquery problem

Consider table1 and table2 with a one-to-many relationship (table1 is the master table and table2 is the detail table). I want to get records from table1 where some value ('XXX') is the value of the most recent record in table2 of the detail records correlated to table1. What I want to do is this:
select t1.pk_id
from table1 t1
where 'XXX' = (select a_col
from ( select a_col
from table2 t2
where t2.fk_id = t1.pk_id
order by t2.date_col desc)
where rownum = 1)
But, because the reference to table1 (t1) in the correlated subquery is two-levels deep, it pops up with an Oracle error (invalid id t1). I need to be able to rewrite this, but the one caveat is that only the where clause may be changed (i.e. the initial select and from must remain unchanged). Can it be done?
Here's a different analytic approach:
select t1.pk_id
from table1 t1
where 'XXX' = (select distinct first_value(t2.a_col)
over (order by t2.date_col desc)
from table2 t2
where t2.fk_id = t1.pk_id)
And here's the same idea using a ranking function:
select t1.pk_id
from table1 t1
where 'XXX' = (select max(t2.a_col) keep
(dense_rank first order by t2.date_col desc)
from table2 t2
where t2.fk_id = t1.pk_id)
You could use analytics here: join table1 to table2, take the most recent table2 record for each element in table1, and verify that this most recent record has a value of 'XXX':
SELECT *
FROM (SELECT t1.*,
t2.a_col,
row_number() over (PARTITION BY t1.pk_id
ORDER BY t2.date_col DESC) rnk
FROM table1 t1
JOIN table2 t2 ON t2.fk_id = t1.pk_id)
WHERE rnk = 1
AND a_col = 'XXX'
Update: Without modifying the top-level SELECT, you could write a query like this:
SELECT t1.pk_id
FROM table1 t1
WHERE 'XXX' =
(SELECT a_col
FROM (SELECT a_col,
t2_in.fk_id,
row_number() over(PARTITION BY t2_in.fk_id
ORDER BY t2_in.date_col DESC) rnk
FROM table2 t2_in) t2
WHERE rnk = 1
AND t2.fk_id = t1.pk_id)
Basically you only join (SEMI-JOIN) the rows from table2 that are the most recent for each fk_id
Try this:
select t1.pk_id
from table1 t1
where 'XXX' =
(select a_col
from table2 t2
where t2.fk_id = t1.pk_id
and t2.date_col =
(select max(t3.date_col)
from table2 t3
where t3.fk_id = t2.fk_id)
)
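The max-date formulation above only changes the WHERE clause and keeps the reference to t1 one level deep. Here is a small sketch that exercises it with SQLite from Python rather than Oracle; the sample rows are invented, and dates are stored as ISO strings so that MAX compares them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (pk_id INTEGER PRIMARY KEY);
CREATE TABLE table2 (fk_id INTEGER, a_col TEXT, date_col TEXT);
INSERT INTO table1 VALUES (1), (2), (3);
INSERT INTO table2 VALUES
  (1, 'XXX', '2020-01-01'), (1, 'YYY', '2021-01-01'),  -- latest for 1 is YYY
  (2, 'YYY', '2020-01-01'), (2, 'XXX', '2021-01-01');  -- latest for 2 is XXX
""")

# t1 is referenced only one level deep; the innermost subquery correlates on t2.
query = """
select t1.pk_id
from table1 t1
where 'XXX' =
  (select t2.a_col
   from table2 t2
   where t2.fk_id = t1.pk_id
     and t2.date_col =
       (select max(t3.date_col)
        from table2 t3
        where t3.fk_id = t2.fk_id))
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Only pk_id 2 qualifies: pk_id 1 has an 'XXX' row, but it is not the most recent one, and pk_id 3 has no detail rows. If two detail rows share the same latest date, the scalar subquery can match more than one row, which Oracle would reject.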
Does this do what you are looking for?
select t1.pk_id
from table1 t1
where 'XXX' = ( select a_col
from table2 t2
where t2.fk_id = t1.pk_id
and t2.date_col = (select max(date_col) from table2 where fk_id = t1.pk_id)
)