How to join a large subset of data with a smaller subset of data - sql

I have three tables in SQL Server
TABLE_A - contains 500 rows
TABLE_B - contains 1 million rows
TABLE_C - contains 1 million rows
I want to select rows from TABLE_B and TABLE_C, joined with TABLE_A, based on a row-number position within TABLE_B and TABLE_C.
Below is my sample query:
SELECT TOP (50) *
INTO ##tempResult
FROM TABLE_A
LEFT JOIN
(SELECT *
FROM
(SELECT
TABLE_B.memberID,
ROW_NUMBER() OVER (PARTITION BY TABLE_B.memberID ORDER BY UTupdateDate DESC) AS rowNum
FROM
TABLE_B
JOIN
TABLE_C ON TABLE_B.memberID = TABLE_C.memberID
) AS TABLE_subset
WHERE
TABLE_subset.rowNum <=2
) AS TABLE_INC ON TABLE_A.memberID = TABLE_INC.memberID
WHERE TABLE_A.colA = 'XYZ'
Here the TABLE_subset is joining entire records in TABLE_B and TABLE_C, but I want to join only the top 50 records with TABLE_A.
Is there any way to achieve this?

Your question and query don't match exactly, but CROSS APPLY is probably your friend here.
The general idea is:
select TOP 50 *
from tableA a
CROSS APPLY (
SELECT TOP 2 b.id, c.otherid
from tableB b
inner join tableC c
ON c.id = b.id
where b.id = a.id -- Here you match field between A and B
order by b.date DESC -- order by something
) data
Now you just need to adapt it to your needs.
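For anyone who wants to try the "top N rows per member" idea without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no CROSS APPLY, so this swaps in ROW_NUMBER() (assumes SQLite 3.25 or newer) to keep the two newest rows per member. The table layout, column names, and sample values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema and data, not taken from the question.
    CREATE TABLE table_a (member_id INTEGER, col_a TEXT);
    CREATE TABLE table_b (member_id INTEGER, update_date TEXT);
    INSERT INTO table_a VALUES (1, 'XYZ'), (2, 'XYZ'), (3, 'ABC');
    INSERT INTO table_b VALUES
        (1, '2020-01-01'), (1, '2020-02-01'), (1, '2020-03-01'),
        (2, '2020-01-15');
""")

# Restrict table_a in a derived table, then attach at most the
# two newest table_b rows for each remaining member.
query = """
    SELECT a.member_id, r.update_date
    FROM (SELECT * FROM table_a WHERE col_a = 'XYZ') AS a
    LEFT JOIN (
        SELECT member_id, update_date,
               ROW_NUMBER() OVER (PARTITION BY member_id
                                  ORDER BY update_date DESC) AS rn
        FROM table_b
    ) AS r ON r.member_id = a.member_id AND r.rn <= 2
    ORDER BY a.member_id, r.update_date DESC
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Member 1 has three rows in table_b but only the two newest come back; member 3 is filtered out by the col_a condition.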

Related

Sparksql to select certain records against 3 tables

I have 3 tables and need to fetch the records as below
Table_A,
Table_B,
Table_C
Select only the Table_A records that are common to Table_B and Table_C, and ignore those that are not common to both. The final result should contain no duplicates.
Approach 1 tried: inner join Table_A with Table_B, separately inner join Table_A with Table_C, and finally union the two results.
Ab = Table_A.join(Table_B,Table_A["id"] == Table_B["id"], "inner").select(common columns)
Ac = Table_A.join(Table_C,Table_A["id"] == Table_C["id"], "inner").select(common columns)
result = Ab.union(Ac)  # got more duplicates
result = result.dropDuplicates(["id"])
But still I got the duplicates.
Approach 2 Tried with SparkSql:
Table_A
left outer
Table_B
on A.id = B.id
left outer Table_C
on A.id = c.id
With this approach there are no duplicates, but I get more records than Table_A has, including the uncommon records.
Any suggestions on the best approach would be appreciated.
In Spark SQL, I would recommend exists:
select a.*
from table_a a
where exists (select 1 from table_b b where b.id = a.id)
and exists (select 1 from table_c c where c.id = a.id)
This does the filtering you want, and will not duplicate records of table_a in the result set, even if there are multiple matches in table_b or table_c.
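To see why EXISTS avoids the duplicates that a plain join produces, here is a small sketch run against SQLite through Python's sqlite3 module rather than Spark. The sample ids are invented; the EXISTS query is the same shape as the one above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Invented sample data for illustration.
    CREATE TABLE table_a (id INTEGER);
    CREATE TABLE table_b (id INTEGER);
    CREATE TABLE table_c (id INTEGER);
    INSERT INTO table_a VALUES (1), (2), (3);
    INSERT INTO table_b VALUES (1), (1), (2);   -- id 1 appears twice
    INSERT INTO table_c VALUES (1), (3), (3);
""")

# Semi-join: each table_a row is tested once, so it appears at most once.
exists_rows = conn.execute("""
    SELECT a.id FROM table_a a
    WHERE EXISTS (SELECT 1 FROM table_b b WHERE b.id = a.id)
      AND EXISTS (SELECT 1 FROM table_c c WHERE c.id = a.id)
    ORDER BY a.id
""").fetchall()

# Plain inner joins: one output row per matching (b, c) combination.
join_rows = conn.execute("""
    SELECT a.id FROM table_a a
    JOIN table_b b ON b.id = a.id
    JOIN table_c c ON c.id = a.id
    ORDER BY a.id
""").fetchall()

print(exists_rows)
print(join_rows)
```

Only id 1 is present in both table_b and table_c. The EXISTS version returns it once, while the join version returns it twice because table_b holds two matching rows.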

Oracle SQL: Get all users in one table but not another and join to a third table

I am wondering how to use oracle sql to get all the rows that are in one table but not another. The issue I am having is that the two tables don't have a field in common so I need to join to a third master table.
This is what I've tried. It doesn't produce any errors, but it also returns 0 records, which isn't possible, so clearly I've done something wrong.
SELECT a.USER_ID, c.AD_ID, c.CREATED_DATE_ FROM $A$ a, $C$ c, $B$ b
WHERE (b.USER_ID IS NULL AND a.CUSTOMER_ID = c.CUSTOMER_ID)
I have three tables:
Table A has fields CUSTOMER_ID & USER_ID
Table B has field USER_ID
Table C has field CUSTOMER_ID
I need all the users that are in table C but not table B. They are all in Table A because that is the master list of users.
Any insight would be greatly appreciated.
SELECT
*
FROM
table_a
WHERE
NOT EXISTS (SELECT * FROM table_b WHERE table_b.user_id = table_a.user_id )
AND EXISTS (SELECT * FROM table_c WHERE table_c.customer_id = table_a.customer_id)
My solution:
select * from TableC tc
join TableA ta on tc.CUSTOMER_ID=ta.CUSTOMER_ID
left join TableB tb on tb.USER_ID=ta.USER_ID
where tb.USER_ID is null
I think you want:
select a.USER_ID, c.AD_ID, c.CREATED_DATE_
from a join
c
on a.customer_id = c.customer_id
where not exists (select 1 from b where b.user_id = a.user_id);
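As a quick check of the anti-join pattern, the sketch below runs both the NOT EXISTS form and the LEFT JOIN ... IS NULL form against SQLite via Python's sqlite3 module instead of Oracle. The tables are named a, b, and c as in the question; the rows and the ad_id values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Invented rows; a is the master list linking customers to users.
    CREATE TABLE a (customer_id INTEGER, user_id TEXT);
    CREATE TABLE b (user_id TEXT);
    CREATE TABLE c (customer_id INTEGER, ad_id TEXT);
    INSERT INTO a VALUES (10, 'u1'), (20, 'u2'), (30, 'u3');
    INSERT INTO b VALUES ('u1');
    INSERT INTO c VALUES (10, 'ad1'), (20, 'ad2');
""")

# Users reachable from c through a, with no row in b.
not_exists_rows = conn.execute("""
    SELECT a.user_id, c.ad_id
    FROM a JOIN c ON a.customer_id = c.customer_id
    WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.user_id = a.user_id)
    ORDER BY a.user_id
""").fetchall()

# Same result with an outer join; the NULL test must be on b's column.
left_join_rows = conn.execute("""
    SELECT a.user_id, c.ad_id
    FROM c
    JOIN a ON a.customer_id = c.customer_id
    LEFT JOIN b ON b.user_id = a.user_id
    WHERE b.user_id IS NULL
    ORDER BY a.user_id
""").fetchall()

print(not_exists_rows)
print(left_join_rows)
```

Here u1 is excluded because it exists in b, u3 is excluded because its customer has no row in c, and only u2 remains in both results.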

Need to retrieve all records in table A and only single one in table B that is the last updated

I have to retrieve certain records in TABLE_A - then need to display the last time the row was updated - which is in TABLE_B (however, there are many records that correlate in TABLE_B). TABLE_A's TABLE_A.PK is ID and links to TABLE_B through TABLE_B.LINK, where the schema would be:
TABLE_A
===================
ID NUMBER
DESC VARCHAR2
TABLE_B
===================
ID NUMBER
LINK NUMBER
LAST_DATE DATE
And the actual table data would be:
TABLE_A
===================
100 DESCRIPTION0
101 DESCRIPTION1
TABLE_B
===================
1 100 12/12/2012
2 100 12/13/2012
3 100 12/14/2013
4 101 12/12/2012
5 101 12/13/2012
6 101 12/14/2013
So, I would need something to read out:
Result
====================
100 DESCRIPTION0 12/14/2013
101 DESCRIPTION1 12/14/2013
I tried to join different ways, but nothing seems to work:
select * from
(SELECT ID, DESC from TABLE_A WHERE ID >= 100) TBL_A
full outer join
(select LAST_DATE from TABLE_B WHERE ROWNUM = 1 order by LAST_DATE DESC) TBL_B
on TBL_A.ID = TBL_B.LINK;
The easiest thing to do would be to join table_a with an aggregate query on table_b:
SELECT table_a.*, table_b.last_date
FROM table_a
LEFT JOIN (SELECT link, MAX(last_date) AS last_date
FROM table_b
GROUP BY link) table_b ON table_a.id = table_b.link
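The aggregate-and-join query above can be tried on the question's sample data with Python's sqlite3 module. This is only a sketch: it runs on SQLite rather than Oracle, the DESC column is renamed to descr because DESC is a reserved word, and the dates are stored as ISO strings so that MAX compares them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER, descr TEXT);
    CREATE TABLE table_b (id INTEGER, link INTEGER, last_date TEXT);
    INSERT INTO table_a VALUES (100, 'DESCRIPTION0'), (101, 'DESCRIPTION1');
    INSERT INTO table_b VALUES
        (1, 100, '2012-12-12'), (2, 100, '2012-12-13'), (3, 100, '2013-12-14'),
        (4, 101, '2012-12-12'), (5, 101, '2012-12-13'), (6, 101, '2013-12-14');
""")

# Collapse table_b to one row per link first, then join that to table_a.
rows = conn.execute("""
    SELECT a.id, a.descr, b.last_date
    FROM table_a a
    LEFT JOIN (SELECT link, MAX(last_date) AS last_date
               FROM table_b
               GROUP BY link) b ON a.id = b.link
    ORDER BY a.id
""").fetchall()
print(rows)
```

Each table_a row comes back exactly once, paired with the latest last_date among its linked table_b rows.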
If you just want the most recent date, think aggregation and join. The extra levels of subqueries do not help. Something like:
select a.id, a.desc, max(last_date)
from table_a a join
table_b b
on a.id = b.link
where a.id >= 100
group by a.id, a.desc;
Note: I doubt a full outer join is necessary, although you can keep that if you have join keys that don't match between the tables. Perhaps a left join is appropriate.
I should point out that if you want more fields from b, then your initial inclination to use row_number() is correct. But the query would look like:
select a.id, a.desc, b.*
from table_a a left join
(select b.*, row_number() over (partition by link order by last_date desc) as seqnum
from table_b b
) b
on a.id = b.link and b.seqnum = 1
where a.id >= 100;

SQL joined by last date

This question has been asked here more than once, but I couldn't find what I was looking for. I want to join two tables, where the row taken from the joined table is its latest record ordered by date/time. Up to here everything is fine.
My trouble starts when there are more than two records in the joined table. Let me show you a sample:
table_a
-------
id
name
description
created
updated
table_b
-------
id
table_a_id
name
description
created
updated
What I have done at the beginning was:
SELECT a.id, b.updated
FROM table_a AS a
LEFT JOIN (SELECT table_a_id, max (updated) as updated
FROM table_b GROUP BY table_a_id ) AS b
ON a.id = b.table_a_id
Up to here I get the columns a.id and b.updated. I need all of table_b's columns, but when I try to add a new column to my query, Postgres tells me that I need to add it to the GROUP BY clause in order to complete the query, and the result is not what I am looking for.
I am trying to find a way to have this list.
DISTINCT ON is your friend. Here is a solution with correct syntax:
SELECT a.id, b.updated, b.col1, b.col2
FROM table_a as a
LEFT JOIN (
SELECT DISTINCT ON (table_a_id)
table_a_id, updated, col1, col2
FROM table_b
ORDER BY table_a_id, updated DESC
) b ON a.id = b.table_a_id;
Or, to get the whole row from table_b:
SELECT a.id, b.*
FROM table_a as a
LEFT JOIN (
SELECT DISTINCT ON (table_a_id)
*
FROM table_b
ORDER BY table_a_id, updated DESC
) b ON a.id = b.table_a_id;
Detailed explanation for this technique as well as alternative solutions under this closely related question:
Select first row in each GROUP BY group?
Try:
SELECT a.id, b.*
FROM table_a AS a
LEFT JOIN (SELECT t.*,
row_number() over (partition by table_a_id
order by updated desc) rn
FROM table_b t) AS b
ON a.id = b.table_a_id and b.rn=1
You can use Postgres's distinct on syntax:
select a.id, b.*
from table_a as a left join
(select distinct on (table_a_id) table_a_id, . . .
from table_b
order by table_a_id, updated desc
) b
on a.id = b.table_a_id
Where the . . . is, you should put in the columns that you want.
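DISTINCT ON is specific to Postgres. As a portable alternative that builds on the MAX(updated) subquery from the question, the sketch below joins table_b back to its own per-group maximum to recover the whole row. It runs on SQLite through Python's sqlite3 module, uses invented sample rows and a reduced column list, and assumes updated is unique within each table_a_id (ties would return more than one row per group).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced, invented version of the question's schema.
    CREATE TABLE table_a (id INTEGER);
    CREATE TABLE table_b (id INTEGER, table_a_id INTEGER,
                          name TEXT, updated TEXT);
    INSERT INTO table_a VALUES (1), (2), (3);
    INSERT INTO table_b VALUES
        (1, 1, 'old',  '2021-01-01'),
        (2, 1, 'new',  '2021-06-01'),
        (3, 2, 'only', '2021-03-01');
""")

# m finds the newest timestamp per group; joining it back to table_b
# on both the group key and that timestamp picks out the full row.
rows = conn.execute("""
    SELECT a.id, b.id, b.name, b.updated
    FROM table_a a
    LEFT JOIN (
        SELECT t.*
        FROM table_b t
        JOIN (SELECT table_a_id, MAX(updated) AS updated
              FROM table_b
              GROUP BY table_a_id) m
          ON m.table_a_id = t.table_a_id AND m.updated = t.updated
    ) b ON b.table_a_id = a.id
    ORDER BY a.id
""").fetchall()
print(rows)
```

The row for id 3 has no match in table_b, so the LEFT JOIN keeps it with NULLs (None in Python) in the table_b columns.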

LEFT JOIN - How to join tables and include extra row even if you have right match

I have two tables
Table A
-------
ID
ProductName
Table B
-------
ID
ProductID
Size
I want to join these two tables
SELECT * FROM
(SELECT * FROM A)
LEFT JOIN
(SELECT * FROM B)
ON A.ID = B.ProductID
This is easy, I will get all rows from A multiplied by rows matched in B, and NULL fields if there is no match.
But here comes the tricky question, how can I get all rows from A with NULL fields for table B, even if there is a match, so I get an extra line with NULL values plus all the matches?
SELECT A.*
, B3.ID
, B3.ProductID
, B3.Size
FROM A
LEFT JOIN
(
SELECT ProductID as MatchID
, ID
, ProductID
, Size
FROM B
UNION ALL
SELECT ID
, null
, null
, null
FROM A A2
) B3
ON A.ID = B3.MatchID
Instead of using UNION ALL in a subquery as suggested by others, you could also (and I would) use UNION ALL at the outer level, which keeps the query simpler:
SELECT A.ID, A.ProductName, B.ID, B.Size
FROM A
INNER JOIN B
ON B.ProductID = A.ID
UNION ALL
SELECT A.ID, A.ProductName, NULL, NULL
FROM A
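The outer-level UNION ALL query can be checked with a small sketch using Python's sqlite3 module. The product names and sizes are invented, and the ORDER BY is added only to make the output deterministic (SQLite sorts NULLs first in ascending order).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Invented sample rows for the A and B tables from the question.
    CREATE TABLE A (ID INTEGER, ProductName TEXT);
    CREATE TABLE B (ID INTEGER, ProductID INTEGER, Size TEXT);
    INSERT INTO A VALUES (1, 'Shirt'), (2, 'Hat');
    INSERT INTO B VALUES (10, 1, 'S'), (11, 1, 'M');
""")

# First branch: real matches. Second branch: one extra all-NULL row per product.
rows = conn.execute("""
    SELECT A.ID, A.ProductName, B.ID, B.Size
    FROM A
    INNER JOIN B ON B.ProductID = A.ID
    UNION ALL
    SELECT A.ID, A.ProductName, NULL, NULL
    FROM A
    ORDER BY 1, 3
""").fetchall()
for row in rows:
    print(row)
```

The product with two sizes yields three rows (one NULL row plus two matches), and the product with no sizes yields just its NULL row, which is what a LEFT JOIN alone would also have produced for it.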
Since every join is going to be successful, we can switch to a full/inner join:
SELECT
*
FROM
A
INNER JOIN
(SELECT ID,ProductID,Size FROM B
UNION ALL
SELECT NULL,ID,NULL FROM A) B
ON
A.ID = B.ProductID
Now would be a very good time to switch to naming columns explicitly, rather than using SELECT *
Or, if, as per @Andomar's comment, you need all of the B columns to be NULL:
SELECT
A.ID,A.ProductName,
B.ID,B.ProductID,B.Size
FROM
A
INNER JOIN
(SELECT ID,ProductID,Size,ProductID as MatchID FROM B
UNION ALL
SELECT NULL,NULL,NULL,ID FROM A) B
ON
A.ID = B.MatchID