SELECT columns OVER (PARTITION BY column) - sql

Suppose I want to retrieve the swimmer and their time at the 75th Percentile for each day.
This is what I was trying to do:
SELECT tableA.DATE, tableA.SWIMMER, tableA.TIME
OVER (PARTITION BY tableA.DATE)
FROM tableA
WHERE RANK = CEIL(0.75 * NUM_OF_SWIMMERS);
But this errors at the OVER statement.
What's the best way to get the data I need?
Thanks!

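Your error is that OVER has to follow a window function (for example RANK() or COUNT(*)); it cannot be attached to a plain list of columns. Ranking functions such as RANK() also need an ORDER BY inside the OVER clause.
But assuming that RANK and NUM_OF_SWIMMERS are already available as columns for each day, why not just return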
select
date,
swimmer,
time
from tablea
where
RANK = CEIL(0.75 * NUM_OF_SWIMMERS)
?
The WHERE clause will ensure that the only rows returned are those at the 75th percentile for each day.
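If RANK and NUM_OF_SWIMMERS are not already stored as columns, one option is to derive them with window functions in a subquery and filter in the outer query. The following is only a sketch, run through Python's sqlite3 module so it can be tried end to end (it needs SQLite 3.25 or later for window functions). The sample rows and the aliases rnk and num_swimmers are made up for illustration, and ROW_NUMBER() is used instead of RANK() so that tied times cannot skip the target position.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (date TEXT, swimmer TEXT, time REAL)")
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?, ?)",
    [  # hypothetical sample data
        ("2020-01-01", "Ann", 50.0),
        ("2020-01-01", "Bob", 52.0),
        ("2020-01-01", "Cy", 54.0),
        ("2020-01-01", "Dee", 56.0),
        ("2020-01-02", "Ann", 51.0),
        ("2020-01-02", "Bob", 49.0),
    ],
)

percentile_rows = conn.execute("""
    SELECT date, swimmer, time
    FROM (
        SELECT date, swimmer, time,
               -- position of each swimmer within the day, fastest first
               ROW_NUMBER() OVER (PARTITION BY date ORDER BY time) AS rnk,
               -- number of swimmers on that day
               COUNT(*) OVER (PARTITION BY date) AS num_swimmers
        FROM tableA
    ) AS ranked
    -- (3 * n + 3) / 4 is integer arithmetic for CEIL(0.75 * n)
    WHERE rnk = (3 * num_swimmers + 3) / 4
    ORDER BY date
""").fetchall()

for row in percentile_rows:
    print(row)
```

The window functions have to be computed in a subquery (or CTE) because their results cannot be referenced directly in the WHERE clause of the same SELECT.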

Related

MAX in Select statement not returning the highest value?

I have a question regarding the MAX function in a SELECT statement.
Without MAX, I have this query:
SELECT stockID, DATE, close, symbol
FROM ta_stockprice JOIN ta_stock ON ta_stock.id = ta_stockprice.stockID
WHERE stockid = 8648
ORDER BY close
At the end I only want the row with the maximum value in the close column, so I tried:
Why didn't I get date = "2021-07-02" as output?
(I saw that I always get "2021-07-01" as output, no matter whether I use MAX / MIN / AVG...)
The MAX() turns the query into an aggregation query. With no GROUP BY, it returns one row. But the query is syntactically incorrect, because it mixes aggregated and unaggregated columns.
Once upon a time, MySQL allowed such syntax in violation of the SQL Standard, but it returned values from arbitrary rows for the unaggregated columns.
Use ORDER BY to do what you want:
SELECT stockID, DATE, close, symbol
FROM ta_stockprice JOIN ta_stock ON ta_stock.id = ta_stockprice.stockID
WHERE stockid = 8648
ORDER BY close DESC
LIMIT 1;
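To see the ORDER BY ... LIMIT 1 approach return the whole row that holds the highest close, here is a small sketch using Python's sqlite3 module. The prices and the symbol 'XYZ' are invented for illustration. The second query is an alternative that is not from the answer above: it filters on a MAX() subquery, which would also return every tied row if several dates shared the same highest close.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ta_stock (id INTEGER PRIMARY KEY, symbol TEXT);
CREATE TABLE ta_stockprice (stockID INTEGER, date TEXT, close REAL);
-- hypothetical sample data
INSERT INTO ta_stock VALUES (8648, 'XYZ');
INSERT INTO ta_stockprice VALUES
    (8648, '2021-07-01', 10.5),
    (8648, '2021-07-02', 12.0),
    (8648, '2021-07-05', 11.2);
""")

# Sort by close descending and keep only the first row.
top_row = conn.execute("""
    SELECT stockID, date, close, symbol
    FROM ta_stockprice
    JOIN ta_stock ON ta_stock.id = ta_stockprice.stockID
    WHERE stockID = 8648
    ORDER BY close DESC
    LIMIT 1
""").fetchone()

# Alternative: keep every row whose close equals the maximum (ties included).
tied_rows = conn.execute("""
    SELECT stockID, date, close, symbol
    FROM ta_stockprice
    JOIN ta_stock ON ta_stock.id = ta_stockprice.stockID
    WHERE stockID = 8648
      AND close = (SELECT MAX(close) FROM ta_stockprice WHERE stockID = 8648)
""").fetchall()

print(top_row)
print(tied_rows)
```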

SQL- calculate ratio and get max ratio with corresponding user and date details

I have a table with user, date and a col each for messages sent and messages received:
I want the maximum messages_sent/messages_recieved ratio for each date, together with the user who has that ratio. So this is the output I expect:
Andrew Lean 10/2/2020 10
Andrew Harp 10/1/2020 6
This is my query:
SELECT
ds.date, ds.user_name, max(ds.ratio) from
(select a.user_name, a.date, a.message_sent/ a.message_received as ratio
from messages a
group by a.user_name, a.date) ds
group by ds.date
But the output I get is:
Andrew Lean 10/2/2020 10
Jalinn Kim 10/1/2020 6
In the above output, 6 is the correct max ratio for that date, but the user is wrong. What am I doing wrong?
With a recent version of most databases, you could do something like this.
This assumes, as in your data, there's one row per user per day. If you have more rows per user per day, you'll need to provide a little more detail about how to combine them or ignore some rows. You could want to SUM them. It's tough to know.
WITH cte AS (
select a.user_name, a.date
, a.message_sent / a.message_received AS ratio
, ROW_NUMBER() OVER (PARTITION BY a.date ORDER BY a.message_sent / a.message_received DESC) as rn
from messages a
)
SELECT t.user_name, t.date, t.ratio
FROM cte AS t
WHERE t.rn = 1
;
Note: There's no attempt to handle ties, where more than one user has the same ratio. We could use RANK (or other methods) for that, if your database supports it.
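As a sketch of the RANK variant mentioned in the note, the following runs the same idea through Python's sqlite3 module (SQLite 3.25+). The rows are invented, including a hypothetical user "Pat Doe" who ties with Andrew Lean, and the ratio is multiplied by 1.0 because several databases, SQLite among them, would otherwise perform integer division on integer columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (user_name TEXT, date TEXT, "
    "message_sent INTEGER, message_received INTEGER)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?)",
    [  # hypothetical sample data
        ("Andrew Harp", "2020-10-01", 12, 2),  # ratio 6.0
        ("Jalinn Kim", "2020-10-01", 9, 3),    # ratio 3.0
        ("Andrew Lean", "2020-10-02", 20, 2),  # ratio 10.0
        ("Pat Doe", "2020-10-02", 30, 3),      # ratio 10.0, tied for first
        ("Jalinn Kim", "2020-10-02", 5, 5),    # ratio 1.0
    ],
)

top_per_date = conn.execute("""
    WITH cte AS (
        SELECT user_name, date,
               message_sent * 1.0 / message_received AS ratio,
               -- RANK gives every tied row the same rank, so all of them survive
               RANK() OVER (
                   PARTITION BY date
                   ORDER BY message_sent * 1.0 / message_received DESC
               ) AS rnk
        FROM messages
    )
    SELECT user_name, date, ratio
    FROM cte
    WHERE rnk = 1
    ORDER BY date, user_name
""").fetchall()

for row in top_per_date:
    print(row)
```

With ROW_NUMBER() in place of RANK(), only one of the two tied users would be returned for the second date, and which one is not defined unless the ORDER BY includes a tie-breaker.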
Here, I am just calculating the ratio for each row in the first CTE.
In the second part, I take the maximum of that ratio at the date level. This assumes each user has one row for each date.
The max() function at the date level ensures that we always get the highest ratio for each date.
There could be ties between the ratios. For that we can use ROW_NUMBER() or RANK() to rank each row by whatever criteria we want to apply in case of ties, and then filter on the generated rank.
with data as (
select
date,
user_id,
messages_sent / messages_recieved as ratio
from [table name]
)
select
date,
max(ratio) as highest_ratio_per_date
from data
group by 1

Get count of records of the top 3 rows and compare the counts

In SQL Server 2016, I have a query as such:
SELECT [Report_date], count(distinct indv_id)
FROM
[dbo].[STG_TABLE] group by report_date order by report_date desc
I get the results as below:
Report_date (No column name)
2020-08-21 47918
2020-08-12 968065
2020-07-31 977804
Now I want to compare the difference between the counts in each row. If the difference is more than 10%, then I need to send an email out in the SSIS package.
How can I go through each row and calculate the difference? I want to look at the first row and compare it with the second row.
Your question seems to be about calculating the ratios between rows. For that, use lag(). To get the ratio:
SELECT [Report_date], COUNT(DISTINCT indv_id),
(COUNT(DISTINCT indv_id) * 1.0 / LAG(COUNT(DISTINCT indv_id)) OVER (ORDER BY report_date))
FROM [dbo].[STG_TABLE]
GROUP BY report_date
ORDER BY report_date DESC;
I'm not sure what results you want, but this is the basic information.
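One way to turn the LAG ratio into the "more than 10%" check from the question is sketched below with Python's sqlite3 module (SQLite 3.25+). The data is invented, the table is created without the dbo schema, and the distinct counts are computed in a CTE first so that LAG runs over plain columns. The list named alerts is a hypothetical stand-in for whatever would trigger the email in the SSIS package.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STG_TABLE (report_date TEXT, indv_id INTEGER)")

# Hypothetical data: 10, 10 and 4 distinct ids on three report dates.
sample = [
    (day, indv_id)
    for day, n in [("2020-07-31", 10), ("2020-08-12", 10), ("2020-08-21", 4)]
    for indv_id in range(n)
]
sample.append(("2020-08-12", 0))  # duplicate id, ignored by COUNT(DISTINCT ...)
conn.executemany("INSERT INTO STG_TABLE VALUES (?, ?)", sample)

ratio_rows = conn.execute("""
    WITH counts AS (
        SELECT report_date, COUNT(DISTINCT indv_id) AS cnt
        FROM STG_TABLE
        GROUP BY report_date
    )
    SELECT report_date,
           cnt,
           -- ratio of this date's count to the previous report date's count
           cnt * 1.0 / LAG(cnt) OVER (ORDER BY report_date) AS ratio_to_prev
    FROM counts
    ORDER BY report_date DESC
""").fetchall()

# Flag report dates whose count moved by more than 10% versus the previous one.
alerts = [
    report_date
    for report_date, cnt, ratio in ratio_rows
    if ratio is not None and abs(ratio - 1.0) > 0.10
]

print(ratio_rows)
print(alerts)
```

The earliest report date has no previous row, so its ratio is NULL (None in Python) and it is skipped by the check.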

How can I make this query run efficiently?

In BigQuery, we're trying to run:
SELECT day, AVG(value)/(1024*1024) FROM (
SELECT value, UTC_USEC_TO_DAY(timestamp) as day,
PERCENTILE_RANK() OVER (PARTITION BY day ORDER BY value ASC) as rank
FROM [Datastore.PerformanceDatum]
WHERE type = "MemoryPerf"
) WHERE rank >= 0.9 AND rank <= 0.91
GROUP BY day
ORDER BY day desc;
which returns a relatively small amount of data. But we're getting the message:
Error: Resources exceeded during query execution. The query contained a GROUP BY operator, consider using GROUP EACH BY instead. For more details, please see https://developers.google.com/bigquery/docs/query-reference#groupby
What is making this query fail, the size of the subquery? Is there some equivalent query we can do which avoids the problem?
Edit in response to comments: If I add GROUP EACH BY (and drop the outer ORDER BY), the query fails, claiming GROUP EACH BY is here not parallelizable.
I wrote an equivalent query that works for me:
SELECT day, AVG(value)/(1024*1024) FROM (
SELECT data value, UTC_USEC_TO_DAY(dtimestamp) as day,
PERCENTILE_RANK() OVER (PARTITION BY day ORDER BY value ASC) as rank
FROM [io_sensor_data.moscone_io13]
WHERE sensortype = "humidity"
) WHERE rank >= 0.9 AND rank <= 0.91
GROUP BY day
ORDER BY day desc;
If I run only the inner query, I get 3,660,624 results. Is your dataset bigger than that?
The outer select gives me only 4 results when grouped by day. I'll try a different grouping to see if I can hit a limit there:
SELECT day, AVG(value)/(1024*1024) FROM (
SELECT data value, dtimestamp / 1000 as day,
PERCENTILE_RANK() OVER (PARTITION BY day ORDER BY value ASC) as rank
FROM [io_sensor_data.moscone_io13]
WHERE sensortype = "humidity"
) WHERE rank >= 0.9 AND rank <= 0.91
GROUP BY day
ORDER BY day desc;
Runs too, now with 57,862 different groups.
I tried different combinations to get to the same error. I was able to get the same error as you by doubling the amount of initial data. An easy "hack" to double the amount of data is changing:
FROM [io_sensor_data.moscone_io13]
To:
FROM [io_sensor_data.moscone_io13], [io_sensor_data.moscone_io13]
Then I get the same error. How much data do you have? Can you apply an additional filter? As you are already partitioning the percentile_rank by day, can you add an additional query to only analyze a fraction of the days (for example, only last month)?

get previous from max value

I have the following SQL query and I want to get the value just before the max from the table.
select max(card_no),vehicle_number
FROM WBG.WBG_01_01
group by vehicle_number
Through this query I get the maximum card number for each vehicle. But I want to get the one before that max. For example,
if a vehicle number has card numbers 21,19,17,10,5,6,1, I want to get 19 instead of the max.
Can anyone please tell me how I can do this in SQL?
Another idea would be to use analytics, something like this:
select
vehicle_number,
prev_card_no
from (
select
card_no,
vehicle_number,
lag(card_no) over
(partition by vehicle_number order by card_no) as prev_card_no,
max(card_no) over
(partition by vehicle_number) as max_card_no
FROM WBG.WBG_01_01
)
where max_card_no = card_no;
Of course, this doesn't take into account your seemingly arbitrary ordering from your question, nor would it work with duplicate maximum numbers.
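If duplicate maximum numbers are a concern, one possible variation (not part of the answer above) is to rank the distinct card numbers with DENSE_RANK() and keep rank 2. The sketch below uses Python's sqlite3 module (SQLite 3.25+), creates the table without the WBG schema prefix, and uses invented vehicle numbers 'V1' and 'V2'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE WBG_01_01 (vehicle_number TEXT, card_no INTEGER)")
conn.executemany(
    "INSERT INTO WBG_01_01 VALUES (?, ?)",
    # hypothetical data: V1 has the cards from the question plus a duplicate max
    [("V1", n) for n in (21, 21, 19, 17, 10, 5, 6, 1)]
    + [("V2", n) for n in (8, 3)],
)

second_highest = conn.execute("""
    SELECT DISTINCT vehicle_number, card_no
    FROM (
        SELECT vehicle_number, card_no,
               -- equal card numbers share a rank, and ranks have no gaps
               DENSE_RANK() OVER (
                   PARTITION BY vehicle_number
                   ORDER BY card_no DESC
               ) AS rnk
        FROM WBG_01_01
    ) AS ranked
    WHERE rnk = 2
    ORDER BY vehicle_number
""").fetchall()

print(second_highest)
```

A vehicle with only one distinct card number has no rank 2 and simply does not appear in the result.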
Try this one:
select max(t.card_no), t.vehicle_number
FROM WBG.WBG_01_01 t
where t.card_no < (select max(m.card_no) from WBG.WBG_01_01 m where m.vehicle_number = t.vehicle_number)
group by t.vehicle_number