I have 2 tables like this:
Table A:
guv, col1, col2
Table B:
guv, col3, col4, col5..
A and B have a one-to-many relationship, so when I run the following query:
select * from A, B where A.guv = B.guv
it returns every row in B that matches the join. How do I return only one matching row per row of A (chosen by some ordering on one of the columns)?
I tried using TOP, as suggested in some other answers, but it is not supported by AWS Athena.
You can use the ROW_NUMBER() function within the join query, as follows:
SELECT guv, col1, col2, col3, col4, col5
FROM
(
SELECT A.guv, A.col1, A.col2, B.col3, B.col4, B.col5,
ROW_NUMBER() OVER (PARTITION BY A.guv ORDER BY B.col3) rn
FROM TableA A JOIN TableB B
ON A.guv=B.guv
) T
WHERE rn = 1
In ROW_NUMBER() OVER (PARTITION BY A.guv ORDER BY B.col3), you can replace B.col3 in the ORDER BY with whichever column should decide which row is kept.
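To see the ROW_NUMBER() approach above in action, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. The sample rows are made up for illustration, and it assumes the SQLite bundled with your Python is 3.25 or newer (needed for window functions); Athena itself is not involved.

```python
import sqlite3

# Hypothetical sample data: the values below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (guv INTEGER, col1 TEXT, col2 TEXT);
    CREATE TABLE TableB (guv INTEGER, col3 INTEGER, col4 TEXT, col5 TEXT);
    INSERT INTO TableA VALUES (1, 'a1', 'a2'), (2, 'b1', 'b2');
    INSERT INTO TableB VALUES
        (1, 30, 'x', 'p'), (1, 10, 'y', 'q'), (1, 20, 'z', 'r'),
        (2, 7, 'u', 's'), (2, 5, 'v', 't');
""")

# Number the B rows within each guv by col3, then keep only the first one.
query = """
    SELECT guv, col1, col2, col3, col4, col5
    FROM (
        SELECT A.guv, A.col1, A.col2, B.col3, B.col4, B.col5,
               ROW_NUMBER() OVER (PARTITION BY A.guv ORDER BY B.col3) AS rn
        FROM TableA A JOIN TableB B ON A.guv = B.guv
    ) T
    WHERE rn = 1
    ORDER BY guv
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each guv comes back exactly once, paired with the B row that has the smallest col3.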
I have a table with 5 columns:
ID - int identity, col1, col2, col3, col4 (the last 4 columns are all varchar)
There are approx. 68,000 unique col1/col2 values. For each of these, there can be between 1 and approx. 214,000 unique col3/col4 values.
My task is to retrieve one random col3 and col4 (from the same row) for each of the unique col1/col2 values.
Is it possible to accomplish this without using a WHILE loop or a cursor? I've done some research and know how to get random values (and the identity column helps with that), but the only way I can see to do this is to go through the 68,000 unique col1/col2 values one by one and grab a random col3/col4 value for each.
Also, these row counts are for preliminary development/testing (collected from the 4 previous months of data). When this goes live we will be going back 27 months, so we are talking about a massive amount of data.
I've seen some mentions of using CTEs, but have not been successful in finding an example or explanation.
Thanks for your help.
I figured out a solution involving temp tables, ROW_NUMBER() over..., and RAND().
First, I selected the distinct col1 and col2 values into #temp1.
SELECT DISTINCT col1, col2
INTO #temp1
FROM sourceTable
Next, I selected the distinct col3 and col4 values for each col1/col2, along with a row number, and put them in temp table #temp2:
SELECT t.COL1, t.COL2, a.col3, a.col4,
ROW_NUMBER() OVER (
PARTITION BY t.col1, t.col2
ORDER BY t.col1, t.col2, a.col3, a.col4) as RowNumber
INTO #temp2
FROM #temp1 t
JOIN sourceTable a ON a.col1 = t.col1 AND a.col2 = t.col2
GROUP BY t.col1, t.col2, a.col3, a.col4
ORDER BY t.col1, t.col2, RowNumber
Then, I selected one of the rows at random from each set of col1/col2's into a 3rd temp table:
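-- Note: SQL Server typically evaluates RAND() once per statement,
-- so every col1/col2 group gets the same random fraction here.
SELECT x.col1, x.col2,
(SELECT TOP 1 y.RowNumber
FROM #temp2 y
WHERE y.col1 = x.col1
AND y.col2 = x.col2
AND y.RowNumber >= RAND() *
(SELECT MAX(z.RowNumber)
FROM #temp2 z
WHERE z.col1 = x.col1
AND z.col2 = x.col2)
-- Without an ORDER BY, TOP 1 may return any qualifying row number.
ORDER BY y.RowNumber) AS Random_RowNumber
INTO #temp3
FROM #temp1 x
ORDER BY x.col1, x.col2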
Lastly, I joined the tables to get the random rows:
SELECT t3.col1, t3.col2, t2.col3, t2.col4
FROM #temp3 t3
JOIN #temp2 t2 on t2.col1 = t3.col1 AND t2.col2 = t3.col2 AND t2.RowNumber = t3.Random_RowNumber
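Since the question asked for a CTE example, here is a rough single-query sketch of the same idea: number the distinct col3/col4 pairs within each col1/col2 group in random order, then keep row 1. It is run against SQLite from Python purely so it can be tried out; the sample rows are invented, SQLite 3.25+ is assumed, and random() is SQLite's function (on SQL Server the usual stand-in would be NEWID()). This is not the temp-table method above, just an alternative under those assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sourceTable ("
    "ID INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 TEXT, col4 TEXT)"
)

# Hypothetical sample rows: three col1/col2 groups of different sizes.
sample = [
    ("a", "x", "c1", "d1"),
    ("a", "x", "c2", "d2"),
    ("a", "x", "c3", "d3"),
    ("b", "y", "c4", "d4"),
    ("b", "z", "c5", "d5"),
    ("b", "z", "c6", "d6"),
]
conn.executemany(
    "INSERT INTO sourceTable (col1, col2, col3, col4) VALUES (?, ?, ?, ?)",
    sample,
)

# Shuffle the distinct col3/col4 pairs inside each group, keep the first.
query = """
    WITH numbered AS (
        SELECT col1, col2, col3, col4,
               ROW_NUMBER() OVER (
                   PARTITION BY col1, col2 ORDER BY random()
               ) AS rn
        FROM (SELECT DISTINCT col1, col2, col3, col4 FROM sourceTable) d
    )
    SELECT col1, col2, col3, col4
    FROM numbered
    WHERE rn = 1
    ORDER BY col1, col2
"""
picked = conn.execute(query).fetchall()
for row in picked:
    print(row)
```

Every col1/col2 pair appears once, and the col3/col4 values always come from the same source row; which row is picked changes from run to run.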
I am new to SQL.
I am currently trying to write a query that lists all the details in my tables. I am using joins to bring them together, and that part works fine. Where I get stuck is when I try to use COUNT alongside my other columns. The column I want to count is a text field, and in that table the same id appears multiple times; I want my query to return that count. My query looks like this:
select col1, col2, col3, count(col4)
from table1 c
left join table2 a on c.id = a.id
However, this does not work. I would appreciate any leads.
Read up on GROUP BY and use it. Your code should look something like this:
select col1, col2, col3, count(col4)
from table1 c
left join table2 a on c.id = a.id
group by col1, col2, col3
You need to add a GROUP BY clause:
select col1, col2, col3, count(col4)
from table1 c
left join table2 a on c.id = a.id
group by col1, col2, col3
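A small sketch of what the GROUP BY version returns, run against SQLite from Python. The table layouts and rows are assumptions made up for the demo (the question does not say which table holds which column); here col1 to col3 live in table1 and the text field col4 lives in table2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data, invented for illustration.
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, col1 TEXT, col2 TEXT, col3 TEXT);
    CREATE TABLE table2 (id INTEGER, col4 TEXT);
    INSERT INTO table1 VALUES
        (1, 'a', 'b', 'c'), (2, 'd', 'e', 'f'), (3, 'g', 'h', 'i');
    INSERT INTO table2 VALUES (1, 't1'), (1, 't2'), (2, 't3');
""")

# Every non-aggregated column in the SELECT list is repeated in GROUP BY.
query = """
    SELECT col1, col2, col3, COUNT(col4)
    FROM table1 c
    LEFT JOIN table2 a ON c.id = a.id
    GROUP BY col1, col2, col3
    ORDER BY col1
"""
counts = conn.execute(query).fetchall()
for row in counts:
    print(row)
```

Because COUNT(col4) only counts non-NULL values, the table1 row with no match in table2 still shows up, with a count of 0.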
I am trying to accomplish the following, and I am not sure if it is possible. I have a SELECT Statement that contains an inner SELECT for two of the table columns like so:
SELECT
col1,
col2,
(SELECT SUM(col1)
FROM table2)
AS FirstResultToAdd,
(SELECT SUM(col2)
FROM table3)
AS SecondResultToAdd
FROM Table1
So my question is: is it possible to perform a calculation, such as adding "FirstResultToAdd" and "SecondResultToAdd", and return that as a single column in the result for "Table1"? Also keep in mind that I have left out any joins between the tables to keep the example simple.
I believe you want to perform some logic on the results of the subqueries.
To add the two subquery results:
SELECT col1,
col2,
(SELECT SUM(col1)
FROM table2)
AS FirstResultToAdd,
(SELECT SUM(col2)
FROM table3)
AS SecondResultToAdd,
(SELECT SUM(col1)
FROM table2)
+
(SELECT SUM(col2)
FROM table3)
AS total
FROM table1
To make the query more readable, you can wrap the original query in a sub-select and perform the calculation in the outer query.
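As a sketch of that sub-select idea, the snippet below wraps the question's query in a derived table (aliased t, a name picked for this example) and adds the two aliases in the outer query. It runs on SQLite from Python with invented sample rows, so treat the data and the Total alias as assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data, invented for illustration.
conn.executescript("""
    CREATE TABLE Table1 (col1 TEXT, col2 TEXT);
    CREATE TABLE table2 (col1 INTEGER);
    CREATE TABLE table3 (col2 INTEGER);
    INSERT INTO Table1 VALUES ('r1', 's1'), ('r2', 's2');
    INSERT INTO table2 VALUES (1), (2), (3);
    INSERT INTO table3 VALUES (10), (20);
""")

# Inner query computes the two sums once; outer query combines them.
query = """
    SELECT col1, col2, FirstResultToAdd, SecondResultToAdd,
           FirstResultToAdd + SecondResultToAdd AS Total
    FROM (
        SELECT col1,
               col2,
               (SELECT SUM(col1) FROM table2) AS FirstResultToAdd,
               (SELECT SUM(col2) FROM table3) AS SecondResultToAdd
        FROM Table1
    ) t
    ORDER BY col1
"""
totals = conn.execute(query).fetchall()
for row in totals:
    print(row)
```

The subqueries are written only once, and the outer level can refer to them by alias.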
just nest one more time...
select col1, col2, sum(FirstResultToAdd + SecondResultToAdd) as Total
from (
SELECT
col1,
col2,
(SELECT SUM(col1)
FROM table2)
AS FirstResultToAdd,
(SELECT SUM(col2)
FROM table3)
AS SecondResultToAdd
FROM Table1
) t
group by col1, col2
Edit: Fixed Group By
Try this:
Select A.Col1,
A.Col2,
(B.Col3 + C.Col4)
From(
(Select Col1,
Col2
From [Table1]) A
Inner join (Select Sum(Col3) AS Col3
From [Table2]) B on 1 = 1
Inner join (Select Sum(Col4) AS Col4
From [Table3]) C on 1 = 1
)
Group By A.Col1,
A.Col2,
B.Col3,
C.Col4
I have a query with some subqueries inside and I want to add a sum query to sum them all.
How can I do that?
example:
Id,
(SELECT COUNT(*) FROM table1 LEFT JOIN table2 on ...) as col1,
(SELECT COUNT(*) FROM table3 LEFT JOIN table4 on ...) as col2,
** Sum of both col1 and col2 here **
Try this:
SELECT ID, col1, col2, [Total] = (col1 + col2)
FROM (
SELECT Id,
(SELECT COUNT(*) FROM table1 LEFT JOIN table2 on ...) as col1,
(SELECT COUNT(*) FROM table3 LEFT JOIN table4 on ...) as col2
FROM [TABLE]) T
Hope that helps.
The easiest way is to treat your whole query as a subquery:
select Id, col1 + col2 as total
from
(<yourCode>) s
This is needed because a column alias cannot be referenced in the SELECT clause of the same query level in which it is defined.
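To illustrate the alias rule, here is a small sketch on SQLite from Python. The items table and the simplified COUNT(*) subqueries are stand-ins made up for the demo (the joins elided in the question are left out). The first query refers to col1 and col2 at the level where they are defined and is rejected; the wrapped version works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and rows, invented for illustration.
conn.executescript("""
    CREATE TABLE items  (Id INTEGER);
    CREATE TABLE table1 (x INTEGER);
    CREATE TABLE table3 (y INTEGER);
    INSERT INTO items  VALUES (1), (2);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table3 VALUES (1), (2);
""")

# Attempt 1: use the aliases in the same SELECT list that defines them.
same_level = """
    SELECT Id,
           (SELECT COUNT(*) FROM table1) AS col1,
           (SELECT COUNT(*) FROM table3) AS col2,
           col1 + col2 AS total
    FROM items
"""
try:
    conn.execute(same_level).fetchall()
    same_level_failed = False
except sqlite3.OperationalError as exc:
    same_level_failed = True
    print("same-level alias rejected:", exc)

# Attempt 2: wrap the query so the aliases become real columns of s.
wrapped = """
    SELECT Id, col1, col2, col1 + col2 AS total
    FROM (
        SELECT Id,
               (SELECT COUNT(*) FROM table1) AS col1,
               (SELECT COUNT(*) FROM table3) AS col2
        FROM items
    ) AS s
    ORDER BY Id
"""
rows = conn.execute(wrapped).fetchall()
for row in rows:
    print(row)
```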
I have a table
col1
1
2
and another table
col1  col2  col3
1     1     data value one
1     2     data value one
2     3     data value two
and I want to join both tables to obtain the following result
col1  col2  col3
1     1     data value one
2     3     data value two
The second table has duplicates, but I need to join only one of them (chosen randomly). I've tried Inner Join, Left Join and Right Join, and each always returns all the rows. I am currently using SQL Server 2008.
select t1.col1, t2.col2, t2.col3 from table1 t1
cross apply
(select top 1 col2, col3 from table2 where col1 = t1.col1 order by newid()) t2
You can use the ROW_NUMBER function along with ORDER BY NEWID() to get one random row for each value in col1:
WITH CTE AS
( SELECT Col1,
Col2,
Col3,
[RowNumber] = ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY NEWID())
FROM Table2
)
SELECT *
FROM Table1
INNER JOIN CTE
ON CTE.Col1 = table1.Col1
AND CTE.RowNumber = 1 -- ONLY GET ONE ROW FOR EACH VALUE
Use DISTINCT; it will eliminate duplicates, but are you sure both rows contain the same data?
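A quick check of the DISTINCT suggestion against the question's own sample data, sketched on SQLite from Python (SQLite 3.25+ assumed, with random() standing in for NEWID()). Because the two rows for col1 = 1 differ in col2, DISTINCT keeps both of them, while the ROW_NUMBER approach returns one row per key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample data copied from the question.
conn.executescript("""
    CREATE TABLE table1 (col1 INTEGER);
    CREATE TABLE table2 (col1 INTEGER, col2 INTEGER, col3 TEXT);
    INSERT INTO table1 VALUES (1), (2);
    INSERT INTO table2 VALUES
        (1, 1, 'data value one'),
        (1, 2, 'data value one'),
        (2, 3, 'data value two');
""")

# DISTINCT only removes rows that are identical in every selected column.
distinct_rows = conn.execute("""
    SELECT DISTINCT t1.col1, t2.col2, t2.col3
    FROM table1 t1
    JOIN table2 t2 ON t2.col1 = t1.col1
    ORDER BY t1.col1, t2.col2
""").fetchall()

# One random row per col1: shuffle within each key and keep the first.
one_per_key = conn.execute("""
    WITH cte AS (
        SELECT col1, col2, col3,
               ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY random()) AS rn
        FROM table2
    )
    SELECT t1.col1, cte.col2, cte.col3
    FROM table1 t1
    JOIN cte ON cte.col1 = t1.col1 AND cte.rn = 1
    ORDER BY t1.col1
""").fetchall()

print(len(distinct_rows), "rows with DISTINCT")
print(len(one_per_key), "rows with ROW_NUMBER")
```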