So I have two tables, team A and team B, each listing users and their scores. For every member of team A, I want the rank their score would have within team B, using SQL or Vertica, as shown below:
Team A Table
user score
-------------
asa 100
bre 200
cqw 50
duy 50
Team B Table
user score
------------
gfh 20
ewr 80
kil 70
cvb 90
Output:
Team A Table
user score rank in team B
------------------------------
asa 100 1
bre 200 1
cqw 50 4
duy 50 4
Try this; note that it only works in Vertica.
INTERPOLATE PREVIOUS VALUE is an outer-join predicate specific to Vertica that joins two tables on non-equal columns, using the 'last known' value in the outer-joined table to make a match succeed.
WITH
-- input, don't use in query itself
table_a (the_user,score) AS (
SELECT 'asa',100
UNION ALL SELECT 'bre',200
UNION ALL SELECT 'cqw',50
UNION ALL SELECT 'duy',50
)
,
table_b(the_user,score) AS (
SELECT 'gfh',20
UNION ALL SELECT 'ewr',80
UNION ALL SELECT 'kil',70
UNION ALL SELECT 'cvb',90
)
-- end of input - start WITH clause here
,
ranked_b AS (
SELECT
RANK() OVER(ORDER BY score DESC) AS the_rank
, *
FROM table_b
)
SELECT
a.the_user AS a_user
, a.score AS a_score
, b.the_rank AS rank_in_team_b
FROM table_a a
LEFT JOIN ranked_b b
ON a.score INTERPOLATE PREVIOUS VALUE b.score
ORDER BY 1
;
a_user|a_score|rank_in_team_b
asa | 100| 1
bre | 200| 1
cqw | 50| 4
duy | 50| 4
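Outside Vertica, the "previous value" match can be emulated. For each team A score, pick the rank of the largest team B score that is less than or equal to it. Below is a minimal sketch using Python's built-in sqlite3 module with an in-memory copy of the sample data. It assumes SQLite 3.25 or later, which is needed for RANK(). It is a portable approximation, not a reproduction of Vertica's join operator. In this emulation, an A score below every B score gets NULL.

```python
import sqlite3

# In-memory copy of the sample data (names follow the Vertica answer above).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (the_user TEXT, score INTEGER);
    CREATE TABLE table_b (the_user TEXT, score INTEGER);
    INSERT INTO table_a VALUES ('asa', 100), ('bre', 200), ('cqw', 50), ('duy', 50);
    INSERT INTO table_b VALUES ('gfh', 20), ('ewr', 80), ('kil', 70), ('cvb', 90);
""")

# Rank team B once, then for each A score take the rank of the closest
# B score at or below it (the "last known" value, as described above).
rows = conn.execute("""
    WITH ranked_b AS (
        SELECT score, RANK() OVER (ORDER BY score DESC) AS the_rank
        FROM table_b
    )
    SELECT a.the_user,
           a.score,
           (SELECT b.the_rank
            FROM ranked_b b
            WHERE b.score <= a.score
            ORDER BY b.score DESC
            LIMIT 1) AS rank_in_team_b
    FROM table_a a
    ORDER BY a.the_user
""").fetchall()

for row in rows:
    print(row)
```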
A simple correlated subquery should do:
select
a.*,
(select count(*) + 1 from table_b b where b.score > a.score) rank_in_b
from table_a a;
All you need to do is count the users in table B with a higher score than the current user, then add 1 to get the rank.
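To check the counting logic without a Vertica instance, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory copy of the sample data. The table and column names (table_a, table_b, the_user, score) are borrowed from the Vertica answer above. Ties in team A share a rank because the count only looks at strictly higher B scores.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (the_user TEXT, score INTEGER);
    CREATE TABLE table_b (the_user TEXT, score INTEGER);
    INSERT INTO table_a VALUES ('asa', 100), ('bre', 200), ('cqw', 50), ('duy', 50);
    INSERT INTO table_b VALUES ('gfh', 20), ('ewr', 80), ('kil', 70), ('cvb', 90);
""")

# Rank in B = 1 + number of B users with a strictly higher score.
rows = conn.execute("""
    SELECT a.the_user,
           a.score,
           (SELECT COUNT(*) + 1 FROM table_b b WHERE b.score > a.score) AS rank_in_b
    FROM table_a a
    ORDER BY a.the_user
""").fetchall()

for row in rows:
    print(row)
```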
I have a table historical_data:
ID  Date        column_a  column_b
----------------------------------
1   2011-10-01  a         a1
1   2011-11-01  w         w1
1   2011-09-01  a         a1
2   2011-01-12  q         q1
2   2011-02-01  d         d1
3   2011-11-01  s         s1
I need to retrieve the whole history of an ID whenever any one of its rows matches the date condition.
For example, the condition date >= '2011-11-01' should return:
ID  Date        column_a  column_b
----------------------------------
1   2011-10-01  a         a1
1   2011-11-01  w         w1
1   2011-09-01  a         a1
3   2011-11-01  s         s1
I am aware that I can get this with a CTE or a subquery, like:
with selected_id as (
select distinct id from historical_data where date>='2011-11-01'
)
select hd.* from historical_data hd
inner join selected_id si on hd.id = si.id
or
select * from historical_data
where id in (select id from historical_data where date>='2011-11-01')
In both methods I have to query/scan the table historical_data twice.
I have indexes on both id and date so it's not a problem right now, but as the table grows this may cause issues.
The table above is only a sample; the real table is about to reach 1 TB in size, with upwards of 600M rows.
Is there any way to achieve this by only querying the table once? (I am using Snowflake)
Using QUALIFY:
SELECT *
FROM historical_data
QUALIFY MAX(date) OVER(PARTITION BY id) >= '2011-11-01'::DATE;
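SQLite has no QUALIFY clause, but the same filter can be written by computing the window value in a derived table and filtering in the outer query. Here is a minimal sketch of that substitution using Python's sqlite3 module and the sample rows above. The bound parameter for the cutoff date is my own choice, and it assumes SQLite 3.25 or later for window functions. Dates are stored as ISO strings, so string comparison matches date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE historical_data (id INTEGER, date TEXT, column_a TEXT, column_b TEXT);
    INSERT INTO historical_data VALUES
        (1, '2011-10-01', 'a', 'a1'),
        (1, '2011-11-01', 'w', 'w1'),
        (1, '2011-09-01', 'a', 'a1'),
        (2, '2011-01-12', 'q', 'q1'),
        (2, '2011-02-01', 'd', 'd1'),
        (3, '2011-11-01', 's', 's1');
""")

# The latest date per id is attached to every row by the window function;
# the outer WHERE then plays the role of QUALIFY.
rows = conn.execute("""
    SELECT id, date, column_a, column_b
    FROM (
        SELECT *, MAX(date) OVER (PARTITION BY id) AS max_date
        FROM historical_data
    )
    WHERE max_date >= ?
    ORDER BY id, date
""", ("2011-11-01",)).fetchall()

for row in rows:
    print(row)
```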
I have two tables:
Table A
Purchase_date  Product_ID
-------------------------
20200101       1
20190101       2
20200301       1
20201201       2
Table B
Product_ID  Price  Price_change_date
------------------------------------
1           10     20191231
2           15     20201031
1           12     20200110
1           20     20201231
2           8      20190331
I want to join these two tables based on two criteria:
If purchase_date < min(price_change_date) for that product, return the price corresponding to min(price_change_date).
Otherwise, return the price at the max(price_change_date) that is less than purchase_date.
I have written a query that gets the right results for the second criterion, but not the first, and I'm not sure whether the two can be combined in the same query.
For the tables above, the results should be:
Results
Purchase_date  Product_ID  Price  Price_change_date
---------------------------------------------------
20200101       1           10     20191231
20190101       2           8      20190331
20200301       1           12     20200110
20201201       2           15     20201031
Notice that the second row returns a price whose price_change_date comes after the purchase date.
Thanks in advance!!
You can use a lateral join:
select a.*, b.*
from a, lateral
(select b.*
from b
where b.product_id = a.product_id and b.price_change_date <= a.purchase_date
order by b.price_change_date desc
limit 1
) b;
EDIT:
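The above gives you the most recent price change on or before each purchase. Because it is an inner (comma) lateral join, purchases made before a product's first price change are dropped. To find those rows in a, you can use:
select a.*, b.min_price_change_date
from a join
     (select b.product_id, min(b.price_change_date) as min_price_change_date
      from b
      group by b.product_id
     ) b
     on a.product_id = b.product_id and
        a.purchase_date < b.min_price_change_date;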
I have a table such as this:
PalmId | UserId | CreatedDate
1 | 1 | 2018-03-08 14:18:27.077
1 | 2 | 2018-03-08 14:18:27.077
1 | 3 | 2018-03-08 14:18:27.077
1 | 1 | 2018-03-08 14:18:27.077
I want to know how many dates were created for Palm 1, and also how many distinct users created those dates. For the data above, the first result should be 4 and the second 3.
I am wondering whether I can do that in a single query, as opposed to using a subquery and a self-join as in the example below:
SELECT MT.[PalmId], COUNT(*) AS TotalDates, T1.[TotalUsers]
FROM [MyTable] MT
LEFT OUTER JOIN (
SELECT MT2.[PalmId], COUNT(DISTINCT MT2.[UserId]) AS TotalUsers
FROM [MyTable] MT2
GROUP BY MT2.[PalmId]
) T1 ON T1.[PalmId] = MT.[PalmId]
GROUP BY MT.[PalmId], T1.[TotalUsers]
Based on the table above, you could do something like this:
select count(distinct UserId) as N_Users,
count(CreatedDate) as N_Dates, -- count(column) skips NULLs; count(*) would also count rows where CreatedDate is NULL
PalmId
from your_table
group by PalmId
If you want "4" and "3", then I think you want:
SELECT MT.PalmId, COUNT(*) AS NumRows, COUNT(DISTINCT mt.UserId) as NumUsers
FROM MyTable MT
GROUP BY MT.PalmId
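Here is a minimal sketch confirming the 4 and 3 on the sample rows, run in SQLite via Python's sqlite3 module. The table name MyTable is taken from the question. COUNT(*) counts every row, while COUNT(DISTINCT UserId) counts user 1 only once even though it appears twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (PalmId INTEGER, UserId INTEGER, CreatedDate TEXT);
    INSERT INTO MyTable VALUES
        (1, 1, '2018-03-08 14:18:27.077'),
        (1, 2, '2018-03-08 14:18:27.077'),
        (1, 3, '2018-03-08 14:18:27.077'),
        (1, 1, '2018-03-08 14:18:27.077');
""")

# One pass over the table: total rows and distinct users per PalmId.
rows = conn.execute("""
    SELECT PalmId, COUNT(*) AS NumRows, COUNT(DISTINCT UserId) AS NumUsers
    FROM MyTable
    GROUP BY PalmId
""").fetchall()

print(rows)
```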
I'm trying to write a query that selects distinct uids, but I want to choose which row is kept for each uid based on an ordering of the modified_at column.
Example:
Table_A
uid data_value modified_at
=== ========== ===========
1 a 1/1/2016
1 b 1/2/2016
1 c 1/3/2016
2 d 1/1/2016
2 e 1/2/2016
3 f 3/1/2016
3 g 3/3/2016
3 h 3/4/2016
4 i 2/1/2016
5 j 1/5/2016
5 k 1/6/2016
So, for each distinct uid, I want the row that was modified most recently.
I'm not sure if there's a quick query that would allow me to do this, rather than pulling the information into a script and processing it there.
Right now, all I can do is:
select distinct uid, data_value, modified_at
from Table_A (...and other stuff if I want to join and do things)
You can use DISTINCT ON (PostgreSQL-specific):
SELECT DISTINCT ON (uid) uid, data_value, modified_at
FROM Table_A
ORDER BY uid, modified_at DESC
Use the window function row_number(), with CTE syntax for readability:
WITH cte as (
SELECT *,
row_number() over (PARTITION BY uid ORDER BY modified_at DESC) as rn
FROM Table_A
)
SELECT *
FROM cte
WHERE rn = 1
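Here is a minimal sketch of the row_number() approach on the sample data, run in SQLite via Python's sqlite3 module (assuming SQLite 3.25 or later for window functions). One assumption to flag: the dates are stored here as ISO strings (2016-01-03 rather than 1/3/2016). If modified_at were text in m/d/yyyy form, ORDER BY would sort it as text, not chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table_A (uid INTEGER, data_value TEXT, modified_at TEXT)")
conn.executemany("INSERT INTO Table_A VALUES (?, ?, ?)", [
    (1, "a", "2016-01-01"), (1, "b", "2016-01-02"), (1, "c", "2016-01-03"),
    (2, "d", "2016-01-01"), (2, "e", "2016-01-02"),
    (3, "f", "2016-03-01"), (3, "g", "2016-03-03"), (3, "h", "2016-03-04"),
    (4, "i", "2016-02-01"),
    (5, "j", "2016-01-05"), (5, "k", "2016-01-06"),
])

# Number rows within each uid from newest to oldest, then keep row 1.
rows = conn.execute("""
    WITH cte AS (
        SELECT uid, data_value, modified_at,
               ROW_NUMBER() OVER (PARTITION BY uid ORDER BY modified_at DESC) AS rn
        FROM Table_A
    )
    SELECT uid, data_value, modified_at
    FROM cte
    WHERE rn = 1
    ORDER BY uid
""").fetchall()

for row in rows:
    print(row)
```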
I have a table like this
Table A:
Id Count
1 4
1 16
1 8
2 10
2 15
3 18
etc
Table B:
1 sample1.file
2 sample2.file
3 sample3.file
TABLE C:
Count fileNumber
16 1234
4 2345
15 3456
18 4567
and so on...
What I want is this
1 sample1.file 1234
2 sample2.file 3456
3 sample3.file 4567
To get the max value from table A, I used:
Select MAX (Count) from A where Id='1'
This works, but my problem comes when combining the data with another table.
When I join table B and table A, I need the MAX for every Id, and in my query I don't know what the Id is.
This is my query
SELECT B.*,C.*
JOIN A on A.Id = B.ID
JOIN C on A.id = B.ID
WHERE (SELECT MAX(COUNT)
FROM A
WHERE Id = <what goes here????>)
To summarise, I want the values from table B and the FileNumber from table C, where Count is the max for that Id in table A.
UPDATE: Corrected table C above. It looks like I do need table A.
I think this is the query you're looking for:
select b.*, c.filenumber from b
join (
select id, max(count) as count from a
group by id
) as NewA on b.id = NewA.id
join c on NewA.count = c.count
However, note that I don't get why for id=1 in tableA you choose 16 (the max) to match against table C, while for id=2 you choose 10 (the min). I assumed you meant the max in both cases.
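Here is a minimal sketch of the max-per-id join on the corrected data, run in SQLite via Python's sqlite3 module. Two assumptions apply. Table B has no header in the question, so its columns are named id and filename here. The count column is renamed cnt, as in the follow-up answer, to avoid the keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, cnt INTEGER);
    CREATE TABLE b (id INTEGER, filename TEXT);
    CREATE TABLE c (cnt INTEGER, filenumber INTEGER);
    INSERT INTO a VALUES (1, 4), (1, 16), (1, 8), (2, 10), (2, 15), (3, 18);
    INSERT INTO b VALUES (1, 'sample1.file'), (2, 'sample2.file'), (3, 'sample3.file');
    INSERT INTO c VALUES (16, 1234), (4, 2345), (15, 3456), (18, 4567);
""")

# Reduce A to one (id, max cnt) row per id, then look up B by id and C by cnt.
rows = conn.execute("""
    SELECT b.id, b.filename, c.filenumber
    FROM b
    JOIN (SELECT id, MAX(cnt) AS cnt FROM a GROUP BY id) AS max_a
      ON b.id = max_a.id
    JOIN c
      ON c.cnt = max_a.cnt
    ORDER BY b.id
""").fetchall()

for row in rows:
    print(row)
```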
Edit:
I see you've updated tableA data. The query results in this, given the previous data:
+----+---------------+------------+
| ID | FILENAME | FILENUMBER |
+----+---------------+------------+
| 1 | sample1.file | 1234 |
| 2 | sample2.file | 3456 |
| 3 | sample3.file | 4567 |
+----+---------------+------------+
Here is a working example
Using Mosty's working example (with the column count renamed to cnt, since count is a keyword), this is another approach:
with abc as (
select
a.id,
a.cnt,
rank() over (
partition by a.id
order by cnt desc
) as rk,
b.filename
from a join b on a.id = b.id
)
select
abc.id, abc.filename, c.filenumber
from abc join c
on c.cnt = abc.cnt
where rk = 1;
select
PreMax.ID,
B.FileName,
C2.FileNumber
from
( select A.ID, max( A.Count ) maxPerID
from TableA A
group by A.ID ) PreMax
JOIN TableC C2
on PreMax.maxPerID = C2.Count
JOIN TableB B
on PreMax.ID = B.ID