SQL: Inner Join return one row based on criteria

This is probably simple, but I'm looking for the raw SQL to perform an INNER JOIN that returns only one of the matching rows from the second table, based on a criterion.
Given two tables:
**TableOne**
ID Name
1 abc
2 def
**TableTwo**
ID Date
1 12/1/2014
1 12/2/2014
2 12/3/2014
2 12/4/2014
2 12/5/2014
I want to join but only return the latest date from the second table:
Expected Result:
1 abc 12/2/2014
2 def 12/5/2014
I can easily accomplish this in LINQ like so:
TableOne.Select(x=> new { x.ID, x.Name, Date = x.TableTwo.Max(y=>y.Date) });
So in other words, what does the above LINQ statement translate into in raw SQL?

There are two ways to do this:
Using GROUP BY and MAX():
SELECT one.ID,
one.Name,
MAX(two.Date)
FROM TableOne one
INNER JOIN TableTwo two on one.ID = two.ID
GROUP BY one.ID, one.Name
Using ROW_NUMBER() with a CTE:
; WITH cte AS (
SELECT one.ID,
one.Name,
two.Date,
ROW_NUMBER() OVER (PARTITION BY one.ID ORDER BY two.Date DESC) as rn
FROM TableOne one
INNER JOIN TableTwo two ON one.ID = two.ID
)
SELECT ID, Name, Date FROM cte WHERE rn = 1
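To check that both approaches give the expected result, here is a minimal sketch that runs them against the sample data using Python's built-in sqlite3 module. It assumes SQLite 3.25 or later (needed for ROW_NUMBER) and stores the dates as ISO strings so that MAX and ORDER BY sort them chronologically; the LatestDate alias is an addition for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableOne (ID INTEGER, Name TEXT);
CREATE TABLE TableTwo (ID INTEGER, Date TEXT);
INSERT INTO TableOne VALUES (1, 'abc'), (2, 'def');
INSERT INTO TableTwo VALUES
    (1, '2014-12-01'), (1, '2014-12-02'),
    (2, '2014-12-03'), (2, '2014-12-04'), (2, '2014-12-05');
""")

# Approach 1: GROUP BY with MAX()
group_by_rows = conn.execute("""
    SELECT one.ID, one.Name, MAX(two.Date) AS LatestDate
    FROM TableOne one
    INNER JOIN TableTwo two ON one.ID = two.ID
    GROUP BY one.ID, one.Name
    ORDER BY one.ID
""").fetchall()

# Approach 2: ROW_NUMBER() in a CTE, keeping only the newest row per ID
row_number_rows = conn.execute("""
    WITH cte AS (
        SELECT one.ID, one.Name, two.Date,
               ROW_NUMBER() OVER (PARTITION BY one.ID ORDER BY two.Date DESC) AS rn
        FROM TableOne one
        INNER JOIN TableTwo two ON one.ID = two.ID
    )
    SELECT ID, Name, Date FROM cte WHERE rn = 1 ORDER BY ID
""").fetchall()

print(group_by_rows)
print(row_number_rows)
```

The ROW_NUMBER variant is the one to reach for if you later need other columns from the latest TableTwo row, since MAX() only returns the date itself.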

You could join the first table with an aggregate query:
SELECT t1.id, t1.name, t2.d
FROM TableOne t1
JOIN (SELECT id, MAX([date]) AS d
FROM TableTwo
GROUP BY id) t2 ON t1.id = t2.id

Something like:
SELECT TableOne.id, TableOne.name, MAX(TableTwo.Date)
FROM TableOne
LEFT JOIN TableTwo ON TableOne.id = TableTwo.id
GROUP BY TableOne.id, TableOne.name;
The join produces one row for each matching row in TableTwo (plus one row for each TableOne row that has no match, because it is a LEFT JOIN), and the GROUP BY then collapses that to one row per row of TableOne.
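A small sketch of that behaviour, run on SQLite through Python's sqlite3 module. The third TableOne row (id 3, 'ghi') is a hypothetical addition with no rows in TableTwo, included only to show what the LEFT JOIN does with unmatched rows; dates are stored as ISO strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableOne (id INTEGER, name TEXT);
CREATE TABLE TableTwo (id INTEGER, Date TEXT);
-- id 3 is a hypothetical row with no matches in TableTwo
INSERT INTO TableOne VALUES (1, 'abc'), (2, 'def'), (3, 'ghi');
INSERT INTO TableTwo VALUES
    (1, '2014-12-01'), (1, '2014-12-02'),
    (2, '2014-12-03'), (2, '2014-12-04'), (2, '2014-12-05');
""")

# Rows produced by the join alone, before grouping
joined_count = conn.execute("""
    SELECT COUNT(*)
    FROM TableOne
    LEFT JOIN TableTwo ON TableOne.id = TableTwo.id
""").fetchone()[0]

# Rows after GROUP BY: one per TableOne row
grouped = conn.execute("""
    SELECT TableOne.id, TableOne.name, MAX(TableTwo.Date)
    FROM TableOne
    LEFT JOIN TableTwo ON TableOne.id = TableTwo.id
    GROUP BY TableOne.id, TableOne.name
    ORDER BY TableOne.id
""").fetchall()

print(joined_count)
print(grouped)
```

With an INNER JOIN the unmatched row would disappear; with the LEFT JOIN it is kept and its date comes back as NULL (None in Python).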

Since nobody else has covered a Common Table Expression (CTE) that will perform the task you want, I'll throw it in here:
with maxDates as (
select Id, max(Date) as Date
from TableTwo
group by Id
)
select x.Id, x.Name, y.Date
from TableOne x
inner join maxDates y
on x.Id = y.id

Related

Join table and pick rows where for given id exists only one value

I'm not sure the title is a good one, so let me illustrate the problem.
I have two tables, and for this case I need to select the documents whose payments were made ONLY in EUR.
The correct document IDs are: 2, 3, 4, 5.
These are large tables with 900k+ records.
Can you suggest what the query should look like?
Use a correlated subquery with NOT EXISTS:
select distinct a.document_id from tablename a inner join tablename b on a.document_id=b.payment_docid
where not exists
(select 1 from tablename b1 where b1.payment_docid=b.payment_docid and b1.currency<>'EUR')
Try this query:
select payment_docId from MyTable
group by payment_docId
having max(currency) = 'EUR'
and min(currency) = 'EUR'
Or you could use having count(*) = 1 with min or max as well.
Use a correlated subquery:
select t1.* from table2 as t1
where exists( select 1 from table2 t2 where t1.payment_docid=t2.payment_docid
having count(distinct currency)=1)
and currency='EUR'
It is possible to use INNER JOIN with the following conditions to get all rows:
SELECT
pd.payment_doc_id
, pd.currency
FROM DocTable dt
INNER JOIN PaymentDocs pd
ON dt.document_id = pd.payment_doc_id AND pd.currency IN ('EUR')
If you want distinct rows, then you can apply operator GROUP BY:
SELECT
pd.payment_doc_id
, pd.currency
FROM DocTable dt
INNER JOIN PaymentDocs pd
ON dt.document_id = pd.payment_doc_id AND pd.currency IN ('EUR')
GROUP BY pd.payment_doc_id
, pd.currency
Aggregation is the only efficient way:
select doc_id
from table t
group by doc_id
having min(currency) = max(currency) and min(currency) = 'EUR';
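For comparison, here is a hedged sketch that runs the aggregation and NOT EXISTS approaches side by side on SQLite via Python's sqlite3 module. The original tables are not shown in the question, so the payments table and its rows below are hypothetical, chosen so that documents 2, 3, 4 and 5 are the EUR-only ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical data: the question's real tables are not available
CREATE TABLE payments (payment_docid INTEGER, currency TEXT);
INSERT INTO payments VALUES
    (1, 'EUR'), (1, 'USD'),   -- mixed currencies: excluded
    (2, 'EUR'),
    (3, 'EUR'), (3, 'EUR'),   -- two payments, both EUR: included
    (4, 'EUR'),
    (5, 'EUR'),
    (6, 'USD');               -- no EUR at all: excluded
""")

# Aggregation: every currency in the group must equal 'EUR'
aggregated = [row[0] for row in conn.execute("""
    SELECT payment_docid
    FROM payments
    GROUP BY payment_docid
    HAVING MIN(currency) = MAX(currency) AND MIN(currency) = 'EUR'
    ORDER BY payment_docid
""")]

# NOT EXISTS: no payment for the document in any other currency
not_exists = [row[0] for row in conn.execute("""
    SELECT DISTINCT p.payment_docid
    FROM payments p
    WHERE NOT EXISTS (SELECT 1
                      FROM payments p1
                      WHERE p1.payment_docid = p.payment_docid
                        AND p1.currency <> 'EUR')
    ORDER BY p.payment_docid
""")]

print(aggregated)
print(not_exists)
```

In this made-up data, document 3 has two EUR payments, so a condition based only on count(*) = 1 would leave it out; whether that matters depends on whether repeated payments occur in the real tables.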

How to compare two tables in Hive based on counts

I have the Hive tables below:
Table_1
ID
1
1
2
Table_2
ID
1
2
2
I am comparing the two tables based on the count of each ID in both tables, and I need output like the following:
ID
1 - 2 records in Table 1 and 1 record in Table 2
2 - 1 record in Table 1 and 2 records in Table 2
Table_1 is the parent table.
I am using the queries below:
select count(*),ID from Table_1 group by ID;
select count(*),ID from Table_2 group by ID;
Just do a full outer join on your queries with the on condition as X.id = Y.id, and then select * from the resultant table checking for nulls on either side.
select coalesce(X.id, Y.id) as id, concat(coalesce(X.cnt1, 0), ' entries in table 1, ', coalesce(Y.cnt2, 0), ' entries in table 2')
from (select count(*) as cnt1, id from Table_1 group by id) X
full outer join (select count(*) as cnt2, id from Table_2 group by id) Y
on X.id = Y.id
Try This. You may use a case statement to check if it should be record / records etc.
SELECT m.id,
CONCAT (COALESCE(a.ct, 0), ' record in table 1, ', COALESCE(b.ct, 0),
' record in table 2')
FROM (SELECT id
FROM table_1
UNION
SELECT id
FROM table_2) m
LEFT JOIN (SELECT Count(*) AS ct,
id
FROM table_1
GROUP BY id) a
ON m.id = a.id
LEFT JOIN (SELECT Count(*) AS ct,
id
FROM table_2
GROUP BY id) b
ON m.id = b.id;
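As a rough check of the UNION plus LEFT JOIN query, the sketch below runs it on the sample data. It uses SQLite through Python's sqlite3 module rather than Hive, so treat it as an illustration of the query shape only; the describe helper is a hypothetical addition that handles the record / records wording in Python instead of a CASE expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_1 (id INTEGER);
CREATE TABLE table_2 (id INTEGER);
INSERT INTO table_1 VALUES (1), (1), (2);
INSERT INTO table_2 VALUES (1), (2), (2);
""")

# All distinct ids from both tables, with the per-table counts joined on
rows = conn.execute("""
    SELECT m.id, COALESCE(a.ct, 0) AS ct1, COALESCE(b.ct, 0) AS ct2
    FROM (SELECT id FROM table_1
          UNION
          SELECT id FROM table_2) m
    LEFT JOIN (SELECT COUNT(*) AS ct, id FROM table_1 GROUP BY id) a
           ON m.id = a.id
    LEFT JOIN (SELECT COUNT(*) AS ct, id FROM table_2 GROUP BY id) b
           ON m.id = b.id
    ORDER BY m.id
""").fetchall()


def describe(count):
    """Hypothetical helper: singular or plural wording for a count."""
    return f"{count} record" if count == 1 else f"{count} records"


lines = [
    f"{id_} - {describe(ct1)} in table 1 and {describe(ct2)} in table 2"
    for id_, ct1, ct2 in rows
]
for line in lines:
    print(line)
```

The COALESCE calls matter when an id exists in only one of the tables: the LEFT JOIN then yields NULL for the missing side, and the count is reported as 0.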
You could use this Python program to do a full comparison of 2 Hive tables:
https://github.com/bolcom/hive_compared_bq
If you want a quick comparison just based on counts, then pass the "--just-count" option (you can also specify the group by column with "--group-by-column").
The script also allows you to visually see all the differences on all rows and all columns if you want a complete validation.

SQL select 1 to many within the same row

I have a table with 1 record, which then ties back to a secondary table which can contain either no match, 1 match, or 2 matches.
I need to fetch the corresponding records and display them within the same row which would be easy using left join if I just had 1 or no matches to tie back, however, because I can get 2 matches, it produces 2 records.
Example with 1 match:
Select T1.ID, T1.Person1, T2.Owner
From T1
Left Join T2
ON T1.ID = T2.MatchID
Output
ID Person1 Owner1
----------------------
1 John Frank
Example with 2 match:
Select T1.ID, T1.Person1, T2.Owner
From T1
Left Join T2
ON T1.ID = T2.MatchID
Output
ID Person1 Owner
----------------------
1 John Frank
1 John Peter
Is there a way I can formulate my select so that my output would reflect the following When I have 2 matches:
ID Person1 Owner1 Owner2
-------------------------------
1 John Frank Peter
I explored Oracle Pivots a bit, however couldn't find a way to make this work. Also explored the possibility of using left join on the same table twice using MIN() and MAX() when fetching the matches, however I can only see myself resorting this as a "no other option" scenario.
Any suggestions?
** EDIT **
@ughai - Using a CTE does address the issue to some extent. However, when I attempt to retrieve all of the records, the details derived from this common table don't show any records on the LEFT JOIN unless I specify the "MatchID" (CASE_MBR_KEY) value. In other words, when I remove the "where" clause, my outer joins produce no records, even though the CASE_MBR_KEY values are there in the CTE data.
WITH CTE AS
(
SELECT TEMP.BEAS_KEY,
TEMP.CASE_MBR_KEY,
TEMP.FULLNAME,
TEMP.BIRTHDT,
TEMP.LINE1,
TEMP.LINE2,
TEMP.LINE3,
TEMP.CITY,
TEMP.STATE,
TEMP.POSTCD,
ROW_NUMBER()
OVER(ORDER BY TEMP.BEAS_KEY) R
FROM TMP_BEN_ASSIGNEES TEMP
--WHERE TEMP.CASE_MBR_KEY = 4117398
)
The reason is that the ROW_NUMBER value, given the number of records, won't necessarily be 1 or 2, so I attempted the following, but I get ORA-01799: a column may not be outer-joined to a subquery
--// BEN ASSIGNEE 1
LEFT JOIN CTE BASS1
ON BASS1.CASE_MBR_KEY = C.CASE_MBR_KEY
AND BASS1.R IN (SELECT min(R) FROM CTE A WHERE A.CASE_MBR_KEY = C.CASE_MBR_KEY)
--// END BA1
--// BEN ASSIGNEE 2
LEFT JOIN CTE BASS2
ON BASS2.CASE_MBR_KEY = C.CASE_MBR_KEY
AND BASS2.R IN (SELECT MAX(R) FROM CTE B WHERE B.CASE_MBR_KEY = C.CASE_MBR_KEY)
--// END BA2
** EDIT 2 **
Fixed the above issue by moving the Row number clause to the "Where" portion of the query instead of within the JOIN clause. Seems to work now.
You can use CTE with ROW_NUMBER() with 2 LEFT JOIN OR with PIVOT like this.
Query with Multiple Left Joins
WITH CTE as
(
SELECT MatchID, Owner, ROW_NUMBER() OVER (PARTITION BY MatchID ORDER BY Owner) r FROM t2
)
select T1.ID, T1.Person, t2.Owner as Owner1, t3.Owner as Owner2
FROM T1
LEFT JOIN CTE T2
ON T1.ID = T2.MatchID AND T2.r = 1
LEFT JOIN CTE T3
ON T1.id = T3.MatchID AND T3.r = 2;
Query with PIVOT
WITH CTE as
(
SELECT MatchID, Owner, ROW_NUMBER() OVER (PARTITION BY MatchID ORDER BY Owner) R FROM t2
)
SELECT ID, Person,O1,O2
FROM T1
LEFT JOIN CTE T2
ON T1.ID = T2.MatchID
PIVOT(MAX(Owner) FOR R IN (1 as O1,2 as O2));
Output
ID PERSON OWNER1 OWNER2
1 John Maxwell Peter
If you know there are at most two matches, you can also use aggregation:
Select T1.ID, T1.Person1,
MIN(T2.Owner) as Owner1,
(CASE WHEN MIN(t2.Owner) <> MAX(t2.Owner) THEN MAX(t2.Owner) END) as Owner2
From T1 Left Join
T2
on T1.ID = T2.MatchID
Group By t1.ID, t1.Person1;
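The ROW_NUMBER plus two LEFT JOINs idea can be tried out with the sketch below, which runs on SQLite (3.25 or later) via Python's sqlite3 module rather than Oracle. The rows for Mary and Ann are hypothetical additions that cover the one-match and no-match cases. Note that the row numbers are partitioned by MatchID so that numbering restarts at 1 for every T1 row; without the partition, only the first person in the whole table would get r = 1 and r = 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (ID INTEGER, Person1 TEXT);
CREATE TABLE T2 (MatchID INTEGER, Owner TEXT);
-- Mary (one match) and Ann (no match) are hypothetical extra rows
INSERT INTO T1 VALUES (1, 'John'), (2, 'Mary'), (3, 'Ann');
INSERT INTO T2 VALUES (1, 'Frank'), (1, 'Peter'), (2, 'Alice');
""")

rows = conn.execute("""
    WITH cte AS (
        SELECT MatchID, Owner,
               ROW_NUMBER() OVER (PARTITION BY MatchID ORDER BY Owner) AS r
        FROM T2
    )
    SELECT T1.ID, T1.Person1, o1.Owner AS Owner1, o2.Owner AS Owner2
    FROM T1
    LEFT JOIN cte o1 ON T1.ID = o1.MatchID AND o1.r = 1
    LEFT JOIN cte o2 ON T1.ID = o2.MatchID AND o2.r = 2
    ORDER BY T1.ID
""").fetchall()

for row in rows:
    print(row)
```

Each T1 row appears exactly once, and missing owners come back as NULL (None in Python).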

selecting records from main table and count of each row in another table

I have 2 tables in my database that are related by a foreign key.
I want to select all records from the main table and, for each of them, the count of rows in the other table that have the same ID as the main table row. I tried to write a select query, but it does not work correctly.
This query returns all records from the main table plus the count of all records in the other table (not the count for each related row):
SELECT tblForumSubGroups_1.id, tblForumSubGroups_1.GroupID,
tblForumSubGroups_1.SubGroupTitle, tblForumSubGroups_1.SubGroupDesc,
(SELECT COUNT(dbo.tblForumPosts.id) AS Expr1
FROM dbo.tblForumSubGroups INNER JOIN dbo.tblForumPosts ON
dbo.tblForumSubGroups.id = dbo.tblForumPosts.SubGroupID) AS Expr1
FROM dbo.tblForumSubGroups AS tblForumSubGroups_1 INNER JOIN
dbo.tblForumPosts AS tblForumPosts_1 ON tblForumSubGroups_1.id
= tblForumPosts_1.SubGroupID
SELECT tblForumSubGroups_1.id, tblForumSubGroups_1.GroupID, tblForumSubGroups_1.SubGroupTitle, tblForumSubGroups_1.SubGroupDesc,
COUNT(tblForumPosts_1.id) AS Expr1
FROM dbo.tblForumSubGroups AS tblForumSubGroups_1
INNER JOIN dbo.tblForumPosts AS tblForumPosts_1 ON tblForumSubGroups_1.id = tblForumPosts_1.SubGroupID
GROUP BY tblForumSubGroups_1.id, tblForumSubGroups_1.GroupID, tblForumSubGroups_1.SubGroupTitle, tblForumSubGroups_1.SubGroupDesc
I would suggest cross apply as you can do a lot more things with it ...
SELECT t1.id,
t1.GroupID,
t1.SubGroupTitle,
t1.SubGroupDesc,
x.val
FROM dbo.tblForumSubGroups AS t1
cross apply (SELECT COUNT(*)
FROM dbo.tblForumPosts as t2
WHERE t1.id = t2.SubGroupID) x(val)
Do not mix sub-query and join logic. Use only one of them. I prefer sub-select.
SELECT tblForumSubGroups_1.id,
tblForumSubGroups_1.GroupID,
tblForumSubGroups_1.SubGroupTitle,
tblForumSubGroups_1.SubGroupDesc,
(SELECT COUNT(*)
FROM dbo.tblForumPosts
WHERE tblForumSubGroups_1.id = dbo.tblForumPosts.SubGroupID) AS Expr1
FROM dbo.tblForumSubGroups AS tblForumSubGroups_1
Just to supply another answer though I believe the cross apply is likely the best option:
SELECT
A.id, A.GroupID, A.SubGroupTitle, A.SubGroupDesc,
B.IDCount AS Expr1
FROM dbo.tblForumSubGroups A
INNER JOIN (
Select SubGroupID, Count(ID) as IDCount
from dbo.tblForumPosts
Group By SubGroupID
) B On A.ID = B.SubGroupID
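To compare the options on a subgroup that has no posts, here is a minimal sketch on SQLite via Python's sqlite3 module (so no dbo schema and no CROSS APPLY). The rows and the PostCount alias are hypothetical. It contrasts the correlated sub-select with a join against the grouped counts, using LEFT JOIN and COALESCE so that empty subgroups are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblForumSubGroups (
    id INTEGER, GroupID INTEGER, SubGroupTitle TEXT, SubGroupDesc TEXT);
CREATE TABLE tblForumPosts (id INTEGER, SubGroupID INTEGER);
-- Hypothetical rows; subgroup 3 deliberately has no posts
INSERT INTO tblForumSubGroups VALUES
    (1, 10, 'General', 'General talk'),
    (2, 10, 'Help', 'Questions'),
    (3, 20, 'Empty', 'No posts yet');
INSERT INTO tblForumPosts VALUES (1, 1), (2, 1), (3, 2);
""")

# Correlated sub-select: one count per subgroup row
correlated = conn.execute("""
    SELECT s.id, s.SubGroupTitle,
           (SELECT COUNT(*)
            FROM tblForumPosts p
            WHERE p.SubGroupID = s.id) AS PostCount
    FROM tblForumSubGroups s
    ORDER BY s.id
""").fetchall()

# Join against pre-aggregated counts; LEFT JOIN keeps empty subgroups
left_joined = conn.execute("""
    SELECT s.id, s.SubGroupTitle, COALESCE(c.IDCount, 0) AS PostCount
    FROM tblForumSubGroups s
    LEFT JOIN (SELECT SubGroupID, COUNT(id) AS IDCount
               FROM tblForumPosts
               GROUP BY SubGroupID) c
           ON s.id = c.SubGroupID
    ORDER BY s.id
""").fetchall()

print(correlated)
print(left_joined)
```

If the join were an INNER JOIN, the 'Empty' subgroup would drop out of the result, which matters when the goal is all records from the main table.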

Exposing more fields on group by sql

I know, in a Group By you can't Select a field that is not in an aggregate function or the GROUP BY clause.
However, there must be a workaround using joins or something else.
I have TWO tables BMP_VISITS_SITES and BMP_VISITS_COMMENTS which are connected by StationID in a one-to-many relationship. One Site can have many comments.
I'm trying to write a query that returns all Sites and the latest (only 1) comment. I have a "working" query but it only returns two columns which are in either an aggregate function or group by.
Here is my "working" query:
select a.StationID,
MAX(b.[dateobserved]) as LastDateObserved,
a.Status
from BMP_VISITS_SITES a
left outer join BMP_VISITS_COMMENTS as b
on a.[StationID] = b.[StationID]
group by a.StationID;
But how can I access all the columns in both tables?
I've tried inner joins with 1/2 success. When I join my BMP_VISITS_SITES to the above query I get all the fields of the table (t1). Great, but as soon as I try joining on BMP_VISITS_COMMENTS (t3) I get more results than I should.
select t1.*, t2.*
--,t3.*
from BMP_VISITS_SITES t1
inner join (
select a.StationID, MAX(b.[dateobserved]) as LastDateObserved from BMP_VISITS_SITES a
left outer join BMP_VISITS_COMMENTS as b
on a.[StationID] = b.[StationID]
group by a.StationID
) t2 on t2.StationID = t1.StationID
--inner join sde.BMP_VISITS_COMMENTS t3 on t3.StationID = t2.StationID;
SELECT a.*, b.* FROM
BMP_VISITS_SITES a
OUTER APPLY
(
SELECT TOP 1 *
FROM BMP_VISITS_COMMENTS b
WHERE b.StationID = a.StationID
ORDER BY b.[dateobserved] DESC
) b
You can use apply to get the last comment record and return all fields from both sides of the query.
Use row_number()
select *
from
(
select a.StationID,
a.Status,
b.*,
row_number() over (partition by a.stationid, a.status order by b.[dateobserved] desc) as rn
from BMP_VISITS_SITES a
left outer join BMP_VISITS_COMMENTS as b
on a.[StationID] = b.[StationID]
) v
where rn = 1
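The row_number() idea can be tried on SQLite (3.25 or later) through Python's sqlite3 module, as sketched below. SQLite has no OUTER APPLY, so only the window-function variant is shown. The Comment column and all rows are hypothetical, and the sketch lists the columns explicitly instead of using b.* so that StationID is not returned twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BMP_VISITS_SITES (StationID INTEGER, Status TEXT);
CREATE TABLE BMP_VISITS_COMMENTS (
    StationID INTEGER, dateobserved TEXT, Comment TEXT);
-- Hypothetical rows; station 3 has no comments
INSERT INTO BMP_VISITS_SITES VALUES (1, 'Active'), (2, 'Closed'), (3, 'Active');
INSERT INTO BMP_VISITS_COMMENTS VALUES
    (1, '2020-01-01', 'first'),
    (1, '2020-02-01', 'second'),
    (2, '2020-01-15', 'only');
""")

latest = conn.execute("""
    SELECT StationID, Status, dateobserved, Comment
    FROM (
        SELECT a.StationID, a.Status, b.dateobserved, b.Comment,
               ROW_NUMBER() OVER (PARTITION BY a.StationID
                                  ORDER BY b.dateobserved DESC) AS rn
        FROM BMP_VISITS_SITES a
        LEFT JOIN BMP_VISITS_COMMENTS b ON a.StationID = b.StationID
    ) v
    WHERE rn = 1
    ORDER BY StationID
""").fetchall()

for row in latest:
    print(row)
```

Every site comes back once with the columns of its most recent comment, and a site without comments is kept with NULLs on the comment side.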