First row in one to many relationship join - sql

I have 2 tables like this:
Table A:
guv, col1, col2
Table B:
guv, col3, col4, col5..
A and B have a one-to-many relationship, so when I run the following query:
select * from A, B where A.guv = B.guv
it returns every row in B that matches the join. How do I return only one matching row per row of A (chosen by some ordering on one of the columns)?
I tried to do this using TOP, as suggested in some other answers, but it's not supported by AWS Athena.

You can use the ROW_NUMBER() function inside the join query, as follows:
SELECT guv, col1, col2, col3, col4, col5
FROM
(
SELECT A.guv, A.col1, A.col2, B.col3, B.col4, B.col5,
ROW_NUMBER() OVER (PARTITION BY A.guv ORDER BY B.col3) rn
FROM TableA A JOIN TableB B
ON A.guv=B.guv
) T
WHERE rn = 1
In ROW_NUMBER() OVER (PARTITION BY A.guv ORDER BY B.col3), you can replace B.col3 in the ORDER BY with whichever column should decide which row counts as the first one.
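To see the ROW_NUMBER() approach run end to end, here is a minimal sketch using Python's built-in sqlite3 module in place of Athena. The sample rows are made up for illustration, and it assumes the bundled SQLite is 3.25 or newer (the first version with window functions).

```python
import sqlite3

# In-memory database with hypothetical sample data for TableA and TableB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (guv INTEGER, col1 TEXT, col2 TEXT);
    CREATE TABLE TableB (guv INTEGER, col3 INTEGER, col4 TEXT, col5 TEXT);
    INSERT INTO TableA VALUES (1, 'a1', 'a2'), (2, 'b1', 'b2');
    INSERT INTO TableB VALUES
        (1, 30, 'x', 'p'), (1, 10, 'y', 'q'), (1, 20, 'z', 'r'),
        (2, 5, 'w', 's');
""")

# Number the joined rows within each guv, ordered by col3, and keep row 1.
FIRST_ROW_SQL = """
    SELECT guv, col1, col2, col3, col4, col5
    FROM (
        SELECT A.guv, A.col1, A.col2, B.col3, B.col4, B.col5,
               ROW_NUMBER() OVER (PARTITION BY A.guv ORDER BY B.col3) AS rn
        FROM TableA A JOIN TableB B ON A.guv = B.guv
    ) T
    WHERE rn = 1
    ORDER BY guv
"""
rows = conn.execute(FIRST_ROW_SQL).fetchall()
for row in rows:
    print(row)
```

With this data, guv 1 has three matches in TableB and only the one with the smallest col3 is kept.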

Related

Get random value sets from table without using cursor or While loop

I have a table with 5 columns:
ID (int identity), col1, col2, col3, col4 (all four are varchar)
There are approx. 68,000 unique col1/col2 values. For each of these, there can be between 1 and approx. 214,000 unique col3/col4 values.
My task is to retrieve one random col3 and col4 (from the same row) for each of the unique col1/col2 values.
Is it possible to accomplish this without using a While loop or a cursor? I've done some research and know how to get random values (and the identity column helps with that), but the only way I can see to do this is to go through the 68,000 unique col1/col2 values one by one and grab a random col3/col4 value for each.
Also, these row counts are for preliminary development/testing (collected from 4 previous months of data). When this goes live we will be going back 27 months. So obviously, we are talking about a massive amount of data.
I've seen some mentions of using CTEs, but have not been successful in finding an example or explanation.
Thanks for your help.
I figured out a solution involving temp tables, ROW_NUMBER() over..., and RAND().
First, I selected the distinct col1 and col2 values into #temp1.
SELECT DISTINCT col1, col2
INTO #temp1
FROM sourceTable
Next, I selected the distinct col3 and col4 values for each col1/col2, along with a row number, and put them in temp table #temp2:
SELECT t.COL1, t.COL2, a.col3, a.col4,
ROW_NUMBER() OVER (
PARTITION BY t.col1, t.col2
ORDER BY t.col1, t.col2, a.col3, a.col4) as RowNumber
INTO #temp2
FROM #temp1 t
JOIN sourceTable a ON a.col1 = t.col1 AND a.col2 = t.col2
GROUP BY t.col1, t.col2, a.col3, a.col4
ORDER BY t.col1, t.col2, RowNumber
Then, I selected one of the rows at random from each set of col1/col2's into a 3rd temp table:
SELECT x.col1, x.col2,
(SELECT TOP 1 y.RowNumber
FROM #temp2 y
WHERE y.col1 = x.col1
AND y.col2 = x.col2
AND y.RowNumber >= RAND() *
(SELECT MAX(z.RowNumber)
FROM #temp2 z
WHERE z.col1 = x.col1
AND z.col2 = x.col2)) AS Random_RowNumber
INTO #temp3
FROM #temp1 x
ORDER BY x.col1, x.col2
Lastly, I join the tables to get the random rows:
SELECT t3.col1, t3.col2, t2.col3, t2.col4
FROM #temp3 t3
JOIN #temp2 t2 on t2.col1 = t3.col1 AND t2.col2 = t3.col2 AND t2.RowNumber = t3.Random_RowNumber
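Since the question asks about CTEs, here is a hedged sketch of a single-query alternative: deduplicate the col3/col4 pairs, number them in random order within each col1/col2 group, and keep row 1. It runs against SQLite through Python's sqlite3 (3.25+ assumed) with made-up sample rows; random() is SQLite's function, and on SQL Server NEWID() would play the same role in the ORDER BY.

```python
import sqlite3

# Hypothetical sample data shaped like the table in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sourceTable (
        ID INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 TEXT, col4 TEXT);
    INSERT INTO sourceTable (col1, col2, col3, col4) VALUES
        ('a', 'x', 'c1', 'd1'), ('a', 'x', 'c2', 'd2'), ('a', 'x', 'c3', 'd3'),
        ('b', 'y', 'c4', 'd4'), ('b', 'y', 'c5', 'd5'),
        ('c', 'z', 'c6', 'd6');
""")

RANDOM_ROW_SQL = """
    WITH pairs AS (
        -- one row per distinct col3/col4 pair within each col1/col2
        SELECT DISTINCT col1, col2, col3, col4 FROM sourceTable
    ),
    numbered AS (
        -- shuffle each group independently and number the shuffled rows
        SELECT col1, col2, col3, col4,
               ROW_NUMBER() OVER (PARTITION BY col1, col2
                                  ORDER BY random()) AS rn
        FROM pairs
    )
    SELECT col1, col2, col3, col4
    FROM numbered
    WHERE rn = 1
    ORDER BY col1, col2
"""
picked = conn.execute(RANDOM_ROW_SQL).fetchall()
for row in picked:
    print(row)
```

Each run returns exactly one row per col1/col2 group, with col3 and col4 taken from the same source row; which row is picked varies between runs.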

Using column names and count together

I am new to SQL.
I am currently trying to write a query that lists all the details in my tables. I am using joins to bring them together, and everything is working fine. Where I get stuck is when I try to use count alongside my other columns. The column I want to count is a text field, and in that table the same id appears multiple times, so I want my query to return that count. My query looks like this:
select col1, col2, col3, count(col4)
from table1 c
left join table2 a on c.id = a.id
However, this does not work. I would appreciate any leads.
Read up on GROUP BY and use it here. Your code should look something like this:
select col1, col2, col3, count(col4)
from table1 c
left join table2 a on c.id = a.id
group by col1, col2, col3
You need to add a GROUP BY clause:
select col1, col2, col3, count(col4)
from table1 c
left join table2 a on c.id = a.id
group by col1, col2, col3
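A small runnable sketch of the GROUP BY fix, using Python's sqlite3 and invented sample data. Which table owns which column is an assumption here (col1 to col3 on table1, col4 on table2), since the question does not say.

```python
import sqlite3

# Hypothetical layout: table1 holds the detail columns, table2 holds the
# repeated ids with the text column being counted.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, col1 TEXT, col2 TEXT, col3 TEXT);
    CREATE TABLE table2 (id INTEGER, col4 TEXT);
    INSERT INTO table1 VALUES (1, 'a', 'b', 'c'), (2, 'd', 'e', 'f'), (3, 'g', 'h', 'i');
    INSERT INTO table2 VALUES (1, 't1'), (1, 't2'), (1, 't3'), (2, 't4');
""")

# Every non-aggregated column in the SELECT list also appears in GROUP BY.
COUNT_SQL = """
    SELECT c.col1, c.col2, c.col3, COUNT(a.col4) AS col4_count
    FROM table1 c
    LEFT JOIN table2 a ON c.id = a.id
    GROUP BY c.col1, c.col2, c.col3
    ORDER BY c.col1
"""
counts = conn.execute(COUNT_SQL).fetchall()
for row in counts:
    print(row)
```

Because COUNT(a.col4) ignores NULLs, the table1 row with no match in table2 still appears, with a count of 0.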

In SQL can I Perform Logic on Multiple Columns, which are SELECT Statements?

I am trying to accomplish the following, and I am not sure if it is possible. I have a SELECT Statement that contains an inner SELECT for two of the table columns like so:
SELECT
col1,
col2,
(SELECT SUM(col1)
FROM table2)
AS FirstResultToAdd,
(SELECT SUM(col2)
FROM table3)
AS SecondResultToAdd
FROM Table1
So my question is: Is it possible to perform a calculation, such as adding "FirstResultToAdd" and "SecondResultToAdd", and return that as a single column in the result for "Table1"? Also keep in mind that I have excluded any joins between the tables to keep the example simple.
I believe you want to perform some logic on the results of the sub-queries.
To add the two sub-query results:
SELECT col1,
col2,
(SELECT SUM(col1)
FROM table2)
AS FirstResultToAdd,
(SELECT SUM(col2)
FROM table3)
AS SecondResultToAdd,
(SELECT SUM(col1)
FROM table2)
+
(SELECT SUM(col2)
FROM table3)
AS total
FROM table1
To make the query more readable, you can turn the original query into a sub-select and perform the logic in the outer query.
Just nest one more time:
select col1, col2, FirstResultToAdd + SecondResultToAdd as Total
from (
SELECT
col1,
col2,
(SELECT SUM(col1)
FROM table2)
AS FirstResultToAdd,
(SELECT SUM(col2)
FROM table3)
AS SecondResultToAdd
FROM Table1
) t
Edit: Fixed Group By
Try this:
Select A.Col1,
A.Col2,
(B.Col3 + C.Col4)
From(
(Select Col1,
Col2
From [Table1]) A
Inner join (Select Sum(Col3) AS Col3
From [Table2]) B on 1 = 1
Inner join (Select Sum(Col4) AS Col4
From [Table3]) C on 1 = 1
)
Group By A.Col1,
A.Col2,
B.Col3,
C.Col4

Sum on subqueries on SQL Server

I have a query with some subqueries inside and I want to add a sum query to sum them all.
How can I do that?
example:
Id,
(SELECT COUNT(*) FROM table1 LEFT JOIN table2 on ...) as col1,
(SELECT COUNT(*) FROM table3 LEFT JOIN table4 on ...) as col2,
** Sum of both col1 and col2 here **
Try this:
SELECT ID, col1, col2, [Total] = (col1 + col2)
FROM (
SELECT Id,
(SELECT COUNT(*) FROM table1 LEFT JOIN table2 on ...) as col1,
(SELECT COUNT(*) FROM table3 LEFT JOIN table4 on ...) as col2
FROM [TABLE]) T
Hope that helps.
The easiest way would be to treat your whole query as a subquery:
select Id, col1 + col2 as total
from
(<yourCode>) s
This is needed because a column alias can't be referenced in the SELECT clause of the same query level that defines it.
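The wrap-it-in-a-subquery idea can be tried out with the following sketch, run through Python's sqlite3. The tables items, orders and reviews and their join columns are hypothetical stand-ins, since the joins in the question are elided.

```python
import sqlite3

# Hypothetical tables standing in for the ones elided in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (Id INTEGER);
    CREATE TABLE orders (item_id INTEGER);
    CREATE TABLE reviews (item_id INTEGER);
    INSERT INTO items VALUES (1), (2);
    INSERT INTO orders VALUES (1), (1), (2);
    INSERT INTO reviews VALUES (1), (2), (2), (2);
""")

# The inner query defines the aliases col1 and col2; the outer query is one
# level up, so it can refer to them by name and add them.
TOTAL_SQL = """
    SELECT Id, col1, col2, col1 + col2 AS total
    FROM (
        SELECT i.Id,
               (SELECT COUNT(*) FROM orders o WHERE o.item_id = i.Id) AS col1,
               (SELECT COUNT(*) FROM reviews r WHERE r.item_id = i.Id) AS col2
        FROM items i
    ) s
    ORDER BY Id
"""
totals = conn.execute(TOTAL_SQL).fetchall()
for row in totals:
    print(row)
```

The derived table gets an alias (s here) because SQL Server requires one, even though nothing refers to it.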

Join Tables SQL Server with duplicates

I have a table
col1
1
2
and other table
col1  col2  col3
1     1     data value one
1     2     data value one
2     3     data value two
and I want to join both tables to obtain the following result
col1  col2  col3
1     1     data value one
2     3     data value two
The second table has duplicates, but I need to join only one of them (chosen randomly). I've tried Inner Join, Left Join and Right Join, and each one always returns all rows. I am currently using SQL Server 2008.
select t1.col1, t2.col2, t2.col3 from table1 t1
cross apply
(select top 1 col2, col3 from table2 where col1 = t1.col1 order by newid()) t2
You can use the ROW_NUMBER function along with ORDER BY NEWID() to get one random row for each value in col1:
WITH CTE AS
( SELECT Col1,
Col2,
Col3,
[RowNumber] = ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY NEWID())
FROM Table2
)
SELECT *
FROM Table1
INNER JOIN CTE
ON CTE.Col1 = table1.Col1
AND CTE.RowNumber = 1 -- ONLY GET ONE ROW FOR EACH VALUE
Use DISTINCT; it will eliminate duplicates. But are you sure both rows will contain the same data?
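To check these suggestions against the sample data from the question, here is a sketch using Python's sqlite3 (SQLite 3.25+ assumed). It substitutes SQLite's random() for NEWID(), which is a swap made only so the example runs outside SQL Server.

```python
import sqlite3

# The two tables exactly as given in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (col1 INTEGER);
    CREATE TABLE Table2 (col1 INTEGER, col2 INTEGER, col3 TEXT);
    INSERT INTO Table1 VALUES (1), (2);
    INSERT INTO Table2 VALUES
        (1, 1, 'data value one'),
        (1, 2, 'data value one'),
        (2, 3, 'data value two');
""")

# DISTINCT compares whole rows, and col2 differs between the two col1 = 1 rows.
distinct_rows = conn.execute("""
    SELECT DISTINCT t1.col1, t2.col2, t2.col3
    FROM Table1 t1 JOIN Table2 t2 ON t2.col1 = t1.col1
""").fetchall()

# ROW_NUMBER() over a random ordering keeps exactly one row per col1.
one_per_key = conn.execute("""
    WITH CTE AS (
        SELECT col1, col2, col3,
               ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY random()) AS RowNumber
        FROM Table2
    )
    SELECT t1.col1, CTE.col2, CTE.col3
    FROM Table1 t1
    JOIN CTE ON CTE.col1 = t1.col1 AND CTE.RowNumber = 1
    ORDER BY t1.col1
""").fetchall()

print(len(distinct_rows), "rows with DISTINCT")
print(one_per_key)
```

On this data DISTINCT still returns all three rows because the col2 values differ, while the ROW_NUMBER() version returns one row per col1.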