SQL Server LAG() function to calculate differences between rows

I'm new to SQL Server and I have some questions about the LAG() function.
I need to calculate the average gap (in days) between consecutive activities of each user. So I have to group by user, calculate the date differences between consecutive rows for each user, and finally take the average for each group.
Just to be clear, I've got this kind of table:
First I have to keep only the days with activity (activities!=0). Then I have to build this:
And finally, the expected outcome is this one:
I thought the code might look something like this:
select userid, avg(diff)
(SELECT *,DATEDIFF(day, Lag(dateid, 1) OVER(ORDER BY [Userid]),
dateid) as diff
FROM table1
where activities!=0
group by userid) t
group by userid
Of course it doesn't work. I think I also have to use a WHILE loop, since the row number changes for each user.
I hope you can help me! Thank you very much.

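You are almost there. Just add PARTITION BY userid so that the difference is calculated separately for each user, and order by dateid. You also need FROM before the derived table, and because AVG of an integer expression returns an integer in SQL Server, multiply by 1.0 to keep the fractional part:
SELECT userid, AVG(diff * 1.0) AS avg_diff
FROM (
SELECT t.*
,DATEDIFF(day, LAG(dateid, 1) OVER(PARTITION BY [Userid] ORDER BY [dateid]), dateid) AS diff
FROM table1 t
WHERE wager != 0
) d
GROUP BY userid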
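A small runnable check of the PARTITION BY idea, as a sketch: it uses Python's built-in sqlite3 module with SQLite as a stand-in for SQL Server (window functions need SQLite 3.25 or later). SQLite has no DATEDIFF, so the difference of julianday() values plays that role; the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data: (userid, dateid, wager). wager = 0 marks a day
# without activity, so those rows are filtered out before gaps are computed.
ROWS = [
    (1, "2020-01-01", 5), (1, "2020-01-03", 2), (1, "2020-01-04", 0),
    (1, "2020-01-07", 1), (2, "2020-01-01", 3), (2, "2020-01-11", 4),
]

# SQLite has no DATEDIFF, so a julianday() difference stands in for
# DATEDIFF(day, previous, current). The window clause matches the answer.
QUERY = """
SELECT userid, AVG(diff) AS avg_diff
FROM (
    SELECT userid,
           julianday(dateid)
             - julianday(LAG(dateid, 1) OVER (PARTITION BY userid ORDER BY dateid)) AS diff
    FROM table1
    WHERE wager != 0
) AS d
GROUP BY userid
ORDER BY userid
"""


def average_gaps(rows):
    """Return [(userid, average gap in days)]; AVG skips each user's NULL first gap."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (userid INTEGER, dateid TEXT, wager INTEGER)")
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(average_gaps(ROWS))  # [(1, 3.0), (2, 10.0)]
```

User 1's active days are 2020-01-01, 01-03 and 01-07 (the 01-04 row has wager = 0), giving gaps of 2 and 4 days. A user with a single active day gets NULL, because the only LAG() result is NULL.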

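You don't need lag() at all. The gaps between consecutive dates add up to the last date minus the first date, so the average gap is the maximum minus the minimum divided by one less than the count (NULLIF returns NULL for a user with only one active day instead of dividing by zero):
SELECT userid,
DATEDIFF(day, MIN(dateid), MAX(dateid)) * 1.0 / NULLIF(COUNT(*) - 1, 0) AS avg_diff
FROM table1
WHERE wager <> 0
GROUP BY userid;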
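The shortcut works because the consecutive gaps telescope: (d2 - d1) + (d3 - d2) + ... + (dn - dn-1) = dn - d1, and there are n - 1 gaps. The sketch below, using Python's sqlite3 module with julianday() standing in for DATEDIFF and hypothetical dates, computes one user's average gap both ways.

```python
import sqlite3

# Average of consecutive gaps via LAG() (needs SQLite 3.25+ for window functions).
LAG_QUERY = """
SELECT AVG(diff) FROM (
    SELECT julianday(dateid)
           - julianday(LAG(dateid) OVER (ORDER BY dateid)) AS diff
    FROM days
) AS d
"""

# (max - min) / (count - 1); NULLIF yields NULL instead of a division by zero
# when there is only one active day.
RANGE_QUERY = """
SELECT (julianday(MAX(dateid)) - julianday(MIN(dateid))) * 1.0
       / NULLIF(COUNT(*) - 1, 0)
FROM days
"""


def both_averages(dates):
    """Compute the average gap for one user's active days both ways."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE days (dateid TEXT)")
    con.executemany("INSERT INTO days VALUES (?)", [(d,) for d in dates])
    via_lag = con.execute(LAG_QUERY).fetchone()[0]
    via_range = con.execute(RANGE_QUERY).fetchone()[0]
    con.close()
    return via_lag, via_range


# Gaps are 2, 4 and 1 days, so both values are 7/3 (about 2.333).
print(both_averages(["2020-01-01", "2020-01-03", "2020-01-07", "2020-01-08"]))
```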

Related

Users that played on X different dates - Standard SQL + BigQuery

I have the following schema of a data model (I only have the schema, not the tables) in BigQuery using Standard SQL.
I have created this query to select the top 10 users who generated the most revenue in the last three months on the Love game:
SELECT
users.user_id,
SUM(pay.amount) AS total_rev
FROM
`my-database.User` AS users
INNER JOIN
`my-database.IAP_events` AS pay
ON
users.User_id = pay.User_id
INNER JOIN
`my-database.Games` AS games
ON
users.Game_id = games.Game_id
WHERE
games.game_name = "Love"
GROUP BY
users.user_id
ORDER BY
total_rev DESC
LIMIT
10
But then, the exercise says to only consider users who played on 10 different days in the last 3 months. I understand I should use a subquery that counts the dates, but I am a little lost on how to do it...
Thanks a lot!
EDIT: You need to count distinct dates, not transactions, so the QUALIFY clause must use COUNT(DISTINCT date_) OVER (...) instead of COUNT(transaction_id) OVER (...). The code below is already fixed.
As far as I understood, you need to count the distinct dates (date_) in IAP_events over the previous 3 months, check that the count is greater than 10, and then sum the amounts of all the users who meet that constraint.
To do so, you can use BigQuery's analytic functions, aka window functions:
with window_counting as (
select
user_id,
amount
from
iap_events
where
date_ >= date_sub(current_date(), interval 3 month)
qualify
count(distinct date_) over (partition by user_id) > 10
),
final as (
select
user_id,
sum(amount) as total_rev
from
window_counting
group by
1
order by
2 desc
limit 10
)
select * from final
You just need to add the required joins inside the first CTE to filter by game_name :)
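SQLite, used here through Python's sqlite3 module as a stand-in for BigQuery, does not accept COUNT(DISTINCT ...) as a window function, so this sketch swaps the QUALIFY approach for an equivalent GROUP BY ... HAVING COUNT(DISTINCT date_) filter. Both keep users with more than 10 distinct days and sum only their amounts inside the 3-month window. The table layout, the today parameter, and the sample events are hypothetical.

```python
import sqlite3
from datetime import date, timedelta


def make_events():
    """Hypothetical stand-in for IAP_events rows: (user_id, date_, amount)."""
    events = []
    for i in range(12):  # user 1: 12 distinct days in June 2020, 2.0 per day
        events.append((1, (date(2020, 6, 1) + timedelta(days=i)).isoformat(), 2.0))
    events.append((1, "2020-06-01", 1.0))  # second purchase on an already-counted day
    for i in range(5):  # user 2: only 5 distinct days
        events.append((2, (date(2020, 6, 1) + timedelta(days=i)).isoformat(), 9.0))
    for i in range(12):  # user 3: 12 days, but all older than 3 months
        events.append((3, (date(2020, 1, 1) + timedelta(days=i)).isoformat(), 5.0))
    return events


# The 3-month window, the distinct-day filter and the revenue ranking in one query.
QUERY = """
SELECT user_id, SUM(amount) AS total_rev
FROM iap_events
WHERE date_ >= date(?, '-3 months')
GROUP BY user_id
HAVING COUNT(DISTINCT date_) > ?
ORDER BY total_rev DESC
LIMIT ?
"""


def top_players(events, today, min_days=10, limit=10):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE iap_events (user_id INTEGER, date_ TEXT, amount REAL)")
    con.executemany("INSERT INTO iap_events VALUES (?, ?, ?)", events)
    result = con.execute(QUERY, (today, min_days, limit)).fetchall()
    con.close()
    return result


print(top_players(make_events(), "2020-06-30"))  # [(1, 25.0)]
```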

Show only users with at least one entry per month

I have two tables; let's say one is called User and the other is called Data.
Every user has many entries in the Data table.
The Data table includes the UserID and Dates columns.
I would like to write a SQL query that returns only the users with at least one entry in every month of 2019.
I have no idea how to do that.
You should really mention your database type. Treat this more like pseudo-code for now. But if you update your question, I can update my answer.
SELECT userID,
YEAR(Dates),
COUNT(DISTINCT MONTH(Dates))
FROM Data
WHERE YEAR(Dates) = 2019
GROUP BY UserId,
YEAR(Dates)
HAVING COUNT(DISTINCT MONTH(Dates))=12
Since you are only looking at 2019, you can leave the year out of the GROUP BY clause. If you also need a minimum number of entries per month (here, at least 5), I would suggest:
WITH CTE AS (
SELECT userID,
MONTH(Dates) as [month],
COUNT(*) as TotalEntriesPerMonth
FROM Data
WHERE YEAR(Dates) = 2019
GROUP BY UserId, MONTH(Dates)
HAVING COUNT(*)>=5
)
SELECT userID
FROM CTE
GROUP BY userID
HAVING COUNT([month]) = 12
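A runnable sketch of the first query's idea, using Python's sqlite3 module as a stand-in database: SQLite has no MONTH() or YEAR(), so strftime('%m', ...) is used, and the year filter is written as a date range, which lets an index on Dates be used. The table contents are hypothetical.

```python
import sqlite3

# Keep 2019 rows, then require all 12 distinct months per user.
QUERY = """
SELECT UserId
FROM Data
WHERE Dates >= '2019-01-01' AND Dates < '2020-01-01'
GROUP BY UserId
HAVING COUNT(DISTINCT strftime('%m', Dates)) = 12
ORDER BY UserId
"""


def users_active_every_month(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Data (UserId INTEGER, Dates TEXT)")
    con.executemany("INSERT INTO Data VALUES (?, ?)", rows)
    result = [user_id for (user_id,) in con.execute(QUERY)]
    con.close()
    return result


# Hypothetical data: user 1 has an entry in every month of 2019 (twice in March);
# user 2 misses December 2019 (its December entry is in 2018).
ROWS = [(1, f"2019-{m:02d}-15") for m in range(1, 13)] + [(1, "2019-03-02")]
ROWS += [(2, f"2019-{m:02d}-10") for m in range(1, 12)] + [(2, "2018-12-10")]

print(users_active_every_month(ROWS))  # [1]
```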
It's not clear what you are asking, but your query may look like this:
CREATE TABLE Data(UserId int,Dates date)
INSERT INTO Data(UserId,Dates) VALUES(1,'2020/04/28'),(1,'2020/04/29'),(2,'2020/04/29')
;WITH CTE AS (
SELECT UserId,ROW_NUMBER() OVER(PARTITION BY MONTH(Dates),UserId ORDER BY UserId) AS rn FROM Data)
SELECT Distinct UserId FROM CTE WHERE rn >=1

SQL - Select query to get grouped records whose SUM(DATA) > 24

I need to select the UserIDs whose sum of DATA is greater than 24.
I am able to group and sum the records using
SELECT SUM(DATA),UserID FROM TableName GROUP BY UserID
But how can I select only the groups for which SUM(DATA) > 24?
I have tried
SELECT SUM(DATA),UserID FROM #tempTimesheetValue where SUM(DATA)>24 GROUP BY UserID
But it's not working.
Thanks in advance for any suggestions.
You can do this with the following query:
select UserID, DATA from (
SELECT SUM(DATA) as DATA, UserID FROM #tempTimesheetValue GROUP BY UserID
) A where DATA > 24
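The question might as well have the correct answer, which uses HAVING. WHERE filters individual rows before they are grouped, so it cannot reference an aggregate such as SUM(DATA); HAVING filters the groups after aggregation: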
SELECT SUM(DATA), UserID
FROM #tempTimesheetValue
GROUP BY UserID
HAVING SUM(DATA) > 24;
A subquery could be used, but it is an unnecessary complication.
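To see the difference between WHERE and HAVING in practice, here is a small sketch using Python's sqlite3 module. The table is a hypothetical stand-in for #tempTimesheetValue (SQLite has no # temp tables). It shows that HAVING and the subquery return the same groups, and that an aggregate in WHERE is rejected.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TimesheetValue (UserID INTEGER, DATA INTEGER)")
con.executemany("INSERT INTO TimesheetValue VALUES (?, ?)",
                [(1, 10), (1, 20), (2, 5), (2, 6), (3, 30)])

# HAVING filters groups after aggregation.
with_having = con.execute("""
    SELECT UserID, SUM(DATA) FROM TimesheetValue
    GROUP BY UserID HAVING SUM(DATA) > 24 ORDER BY UserID
""").fetchall()

# The subquery version returns the same groups.
with_subquery = con.execute("""
    SELECT UserID, DATA FROM (
        SELECT SUM(DATA) AS DATA, UserID FROM TimesheetValue GROUP BY UserID
    ) AS A WHERE DATA > 24 ORDER BY UserID
""").fetchall()

# An aggregate in WHERE is rejected, because WHERE runs before grouping.
try:
    con.execute("SELECT UserID, SUM(DATA) FROM TimesheetValue "
                "WHERE SUM(DATA) > 24 GROUP BY UserID")
    where_rejected = False
except sqlite3.Error:
    where_rejected = True

print(with_having, with_subquery, where_rejected)
# [(1, 30), (3, 30)] [(1, 30), (3, 30)] True
```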

PostgreSQL: select the daily max and the corresponding hour of occurrence

I have the following table structure, with daily-hourly data:
time_of_ocurrence(timestamp); particles(numeric)
"2012-11-01 00:30:00";191.3
"2012-11-01 01:30:00";46
...
"2013-01-01 02:30:00";319.6
How do I select the DAILY max and THE HOUR in which this max occurs?
I've tried
SELECT date_trunc('hour', time_of_ocurrence) as hora,
MAX(particles)
from my_table WHERE time_of_ocurrence > '2013-09-01'
GROUP BY hora ORDER BY hora
But it doesn't work; it returns the max for every hour instead of one per day:
"2013-09-01 00:00:00";34.35
"2013-09-01 01:00:00";33.13
"2013-09-01 02:00:00";33.09
"2013-09-01 03:00:00";28.08
My result would be in this format instead (one max per day, showing the hour)
"2013-09-01 05:00:00";100.35
"2013-09-02 03:30:00";80.13
How can I do that? Thanks!
This type of question has come up on StackOverflow frequently, and these questions are categorized with the greatest-n-per-group tag, if you want to see other solutions.
edit: I changed the following code to group by day instead of by hour.
Here's one solution:
SELECT t.*
FROM (
SELECT date_trunc('day', time_of_ocurrence) as hora, MAX(particles) AS particles
FROM my_table
GROUP BY hora
) AS _max
INNER JOIN my_table AS t
ON _max.hora = date_trunc('day', t.time_of_ocurrence)
AND _max.particles = t.particles
WHERE time_of_ocurrence > '2013-09-01'
ORDER BY time_of_ocurrence;
This might also show more than one result per day, if more than one row has the max value.
Another solution using window functions that does not show such duplicates:
SELECT * FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY date_trunc('day', time_of_ocurrence)
ORDER BY particles DESC) AS _rn
FROM my_table
) AS _max
WHERE _rn = 1
ORDER BY time_of_ocurrence;
If multiple rows have the same max, only one of them will nevertheless be numbered 1, and which one is arbitrary. If you need specific control over which row is numbered 1, add a unique column to the ORDER BY inside the OVER clause to break such ties.
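The ROW_NUMBER() approach with an explicit tie-breaker can be tried with Python's sqlite3 module (SQLite 3.25 or later). SQLite has no date_trunc, so date(time_of_ocurrence) defines the daily partitions; the readings below are hypothetical, with a deliberate tie on the first day.

```python
import sqlite3

QUERY = """
SELECT time_of_ocurrence, particles
FROM (
    SELECT time_of_ocurrence, particles,
           ROW_NUMBER() OVER (
               PARTITION BY date(time_of_ocurrence)        -- one partition per day
               ORDER BY particles DESC, time_of_ocurrence  -- tie: earliest reading wins
           ) AS rn
    FROM my_table
) AS ranked
WHERE rn = 1
ORDER BY time_of_ocurrence
"""

# Hypothetical hourly readings; 2012-11-01 has the same max at 00:30 and 05:30.
READINGS = [
    ("2012-11-01 00:30:00", 191.3),
    ("2012-11-01 01:30:00", 46.0),
    ("2012-11-01 05:30:00", 191.3),
    ("2012-11-02 02:30:00", 319.6),
    ("2012-11-02 03:30:00", 100.0),
]


def daily_max(readings):
    """Return one (timestamp, particles) row per day: the daily max and when it occurred."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE my_table (time_of_ocurrence TEXT, particles REAL)")
    con.executemany("INSERT INTO my_table VALUES (?, ?)", readings)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(daily_max(READINGS))
# [('2012-11-01 00:30:00', 191.3), ('2012-11-02 02:30:00', 319.6)]
```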
Use window functions:
select distinct
date_trunc('day',time_of_ocurrence) as day,
max(particles) over (partition by date_trunc('day',time_of_ocurrence)) as particles_max_of_day,
first_value(date_trunc('hour',time_of_ocurrence)) over (partition by date_trunc('day',time_of_ocurrence) order by particles desc)
from my_table
order by 1
One edge case is when the same MAX number of particles shows up more than once on the same day, but in different hours. This version would arbitrarily pick one of them. If you prefer one over the other (for example, always the earlier one), you can add that to the ORDER BY clause:
first_value(date_trunc('hour',time_of_ocurrence)) over (partition by date_trunc('day',time_of_ocurrence) order by particles desc, time_of_ocurrence)

How do I use ROW_NUMBER()?

I want to use ROW_NUMBER() to:
1. Get the max(ROW_NUMBER()), which I guess would also be the count of all rows.
I tried doing:
SELECT max(ROW_NUMBER() OVER(ORDER BY UserId)) FROM Users
but it didn't seem to work...
2. Get the ROW_NUMBER() for a given piece of information, i.e. if I have a name, I want to know which row the name came from.
I assume it would be something similar to what I tried for #1:
SELECT ROW_NUMBER() OVER(ORDER BY UserId) From Users WHERE UserName='Joe'
but this didn't work either...
Any ideas?
For the first question, why not just use
SELECT COUNT(*) FROM myTable
to get the count?
And for the second question, the primary key of the row is what should be used to identify a particular row. Don't try and use the row number for that.
If you return ROW_NUMBER() in your main query (USER is a reserved word in SQL Server, so the table name needs brackets):
SELECT ROW_NUMBER() OVER (ORDER BY Id) AS RowNumber, Field1, Field2, Field3
FROM [User]
then, when you want to go 5 rows back, you can take the current row number (held here in a variable @CurrentRow) and use the following query to find the row at @CurrentRow - 5:
SELECT us.Id
FROM (SELECT ROW_NUMBER() OVER (ORDER BY Id) AS RowNum, Id
FROM [User]) us
WHERE RowNum = @CurrentRow - 5
Though I agree with others that you could use COUNT() to get the total number of rows, here is how you can use row_number():
To get the total number of rows:
with temp as (
select row_number() over (order by id) as rownum
from table_name
)
select max(rownum) from temp
To get the row numbers where name is Matt:
with temp as (
select name, row_number() over (order by id) as rownum
from table_name
)
select rownum from temp where name like 'Matt'
You can further use min(rownum) or max(rownum) to get the first or last row for Matt respectively.
These were very simple implementations of row_number(). You can use it for more complex grouping. Check out my response on Advanced grouping without using a sub query
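A quick sketch confirming the points above with Python's sqlite3 module and a hypothetical table_name: MAX(rownum) matches COUNT(*) (except on an empty table, where MAX returns NULL and COUNT returns 0), and MIN/MAX over Matt's row numbers give the first and last occurrence.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table_name (id INTEGER PRIMARY KEY, name TEXT)")
con.executemany("INSERT INTO table_name VALUES (?, ?)",
                [(1, "Ann"), (2, "Matt"), (3, "Bob"), (4, "Matt"), (5, "Eve")])

# Shared CTE prefix: number every row by id (needs SQLite 3.25+).
NUMBERED = ("WITH temp AS (SELECT name, ROW_NUMBER() OVER (ORDER BY id) AS rownum "
            "FROM table_name) ")

max_rownum = con.execute(NUMBERED + "SELECT MAX(rownum) FROM temp").fetchone()[0]
row_count = con.execute("SELECT COUNT(*) FROM table_name").fetchone()[0]
matt_rows = [r for (r,) in con.execute(
    NUMBERED + "SELECT rownum FROM temp WHERE name = 'Matt' ORDER BY rownum")]
first_last = con.execute(
    NUMBERED + "SELECT MIN(rownum), MAX(rownum) FROM temp WHERE name = 'Matt'").fetchone()

print(max_rownum, row_count)  # 5 5
print(matt_rows, first_last)  # [2, 4] (2, 4)
```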
If you need to return the table's total row count, there is an alternative to the SELECT COUNT(*) statement.
Because SELECT COUNT(*) has to scan the whole table (or its smallest index) to return the row count, it can take a long time for a large table. In that case you can use the sysindexes system table instead. Its rows column contains the row count for each table in your database. You can use the following SELECT statement:
SELECT rows FROM sysindexes WHERE id = OBJECT_ID('table_name') AND indid < 2
This reads stored metadata instead of scanning the table, so it is much faster. Note that sysindexes is a deprecated compatibility view; on SQL Server 2005 and later, sys.partitions exposes the same information, and these metadata counts may be approximate.
You can use this to get the first record that matches a WHERE clause:
SELECT TOP(1) * , ROW_NUMBER() OVER(ORDER BY UserId) AS rownum
FROM Users
WHERE UserName = 'Joe'
ORDER BY rownum ASC
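ROW_NUMBER() returns a sequential number for each row, starting with 1. You can use it by simply writing the following (the column name must not be quoted as a string literal, because SQL Server does not allow constants in the ORDER BY of a window function):
ROW_NUMBER() OVER (ORDER BY Column_Name DESC) AS RowNum
This may not be directly related to the question, but I found it useful when using ROW_NUMBER: ordering by a constant subquery such as (SELECT 100) lets you number rows without imposing any particular order (the numbering is then arbitrary):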
SELECT *,
ROW_NUMBER() OVER (ORDER BY (SELECT 100)) AS Any_ID
FROM #Any_Table
Adding PARTITION BY restarts the numbering for each group, here for each blockid:
select
Ml.Hid,
ml.blockid,
row_number() over (partition by ml.blockid order by Ml.Hid desc) as rownumber,
H.HNAME
from MIT_LeadBechmarkHamletwise ML
join [MT.HAMLE] h on ML.Hid=h.HID
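To find where 'Joe' appears among all users, number the rows in a derived table first and filter afterwards, so the WHERE clause does not change the numbering:
SELECT num, UserName FROM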
(SELECT UserName, ROW_NUMBER() OVER(ORDER BY UserId) AS num
From Users) AS numbered
WHERE UserName='Joe'
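The order of operations matters here. This sketch, using Python's sqlite3 module (SQLite 3.25 or later) and hypothetical users, contrasts numbering after the WHERE filter with numbering in a derived table before filtering.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Users (UserId INTEGER PRIMARY KEY, UserName TEXT)")
con.executemany("INSERT INTO Users VALUES (?, ?)",
                [(1, "Ann"), (2, "Joe"), (3, "Bob"), (4, "Joe")])

# Filter first: ROW_NUMBER() only sees the two 'Joe' rows, so it yields 1, 2.
filtered_first = con.execute("""
    SELECT ROW_NUMBER() OVER (ORDER BY UserId) AS num, UserName
    FROM Users WHERE UserName = 'Joe' ORDER BY num
""").fetchall()

# Number first, filter afterwards: positions among all users are kept.
numbered_first = con.execute("""
    SELECT num, UserName FROM (
        SELECT UserName, ROW_NUMBER() OVER (ORDER BY UserId) AS num FROM Users
    ) AS numbered
    WHERE UserName = 'Joe' ORDER BY num
""").fetchall()

print(filtered_first)  # [(1, 'Joe'), (2, 'Joe')]
print(numbered_first)  # [(2, 'Joe'), (4, 'Joe')]
```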
You can use ROW_NUMBER() to limit query results, for example for paging:
SELECT * FROM (
select row_number() OVER (order by createtime desc) AS ROWINDEX,*
from TABLENAME ) TB
WHERE TB.ROWINDEX between 1 and 10
With the query above, I get page 1 (rows 1 to 10) of the results from TABLENAME.
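A parameterized version of the same paging query, as a sketch with Python's sqlite3 module (SQLite 3.25 or later); the fetch_page helper, the id column, and the sample rows are hypothetical. On SQL Server 2012 and later, ORDER BY ... OFFSET ... FETCH NEXT is another option for paging.

```python
import sqlite3


def fetch_page(con, page_number, page_size=10):
    """Return the ids on a 1-based page, newest createtime first."""
    first = (page_number - 1) * page_size + 1
    last = page_number * page_size
    rows = con.execute("""
        SELECT id FROM (
            SELECT id, ROW_NUMBER() OVER (ORDER BY createtime DESC) AS rowindex
            FROM tablename
        ) AS tb
        WHERE tb.rowindex BETWEEN ? AND ?
        ORDER BY tb.rowindex
    """, (first, last))
    return [row[0] for row in rows]


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tablename (id INTEGER, createtime TEXT)")
# Hypothetical rows: id n was created on 2020-01-n, so higher ids are newer.
con.executemany("INSERT INTO tablename VALUES (?, ?)",
                [(n, f"2020-01-{n:02d}") for n in range(1, 26)])

print(fetch_page(con, 1))  # ids 25 down to 16
print(fetch_page(con, 3))  # [5, 4, 3, 2, 1]
```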
If you absolutely want to use ROW_NUMBER for this (instead of count(*)) you can always use:
SELECT TOP 1 ROW_NUMBER() OVER (ORDER BY Id)
FROM USERS
ORDER BY ROW_NUMBER() OVER (ORDER BY Id) DESC
You can create a virtual table (a common table expression) with WITH ... AS, as in the query below.
Through this CTE you can then select, insert, update, or delete rows based on their row_number. TABLE is a reserved word, so the CTE needs a different name:
QUERY:
WITH numbered AS
(SELECT ROW_NUMBER() OVER(ORDER BY UserId) rn, * FROM Users)
SELECT * FROM numbered WHERE UserName='Joe'
You can also use INSERT, UPDATE, or DELETE in the final statement instead of SELECT.
The SQL ROW_NUMBER() function sorts the rows of a result set and assigns each row a sequential number. It is used, for example, to identify the top 10 rows with the highest order amount, or each customer's highest-value order.
If you want to number rows separately within categories, use ROW_NUMBER() with a PARTITION BY clause, for example to rank each customer's orders among that customer's own orders when the dataset contains all orders:
SELECT
SalesOrderNumber,
CustomerId,
SubTotal,
ROW_NUMBER() OVER (PARTITION BY CustomerId ORDER BY SubTotal DESC) rn
FROM Sales.SalesOrderHeader
But as I understand it, you want the number of rows per group. For example, if you want to show the count of all orders of the related customer as a separate column beside the order info, you can use the COUNT() aggregate function with a PARTITION BY clause.
For example,
SELECT
SalesOrderNumber,
CustomerId,
COUNT(*) OVER (PARTITION BY CustomerId) CustomerOrderCount
FROM Sales.SalesOrderHeader
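Both window functions can sit side by side. This sketch runs them in SQLite (3.25 or later) via Python's sqlite3 module; since SQLite has no schemas, Sales.SalesOrderHeader becomes a plain SalesOrderHeader table, and the orders are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE SalesOrderHeader "
            "(SalesOrderNumber TEXT, CustomerId INTEGER, SubTotal REAL)")
con.executemany("INSERT INTO SalesOrderHeader VALUES (?, ?, ?)",
                [("SO1", 1, 100.0), ("SO2", 1, 250.0), ("SO3", 2, 80.0), ("SO4", 1, 40.0)])

# rn ranks each customer's orders by SubTotal; CustomerOrderCount repeats the
# per-customer total on every row without collapsing rows like GROUP BY would.
rows = con.execute("""
    SELECT SalesOrderNumber, CustomerId, SubTotal,
           ROW_NUMBER() OVER (PARTITION BY CustomerId ORDER BY SubTotal DESC) AS rn,
           COUNT(*) OVER (PARTITION BY CustomerId) AS CustomerOrderCount
    FROM SalesOrderHeader
    ORDER BY CustomerId, rn
""").fetchall()

for row in rows:
    print(row)
```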
This query:
SELECT ROW_NUMBER() OVER(ORDER BY UserId) From Users WHERE UserName='Joe'
will return one row for every user named 'Joe', and no rows if there is no UserName = 'Joe'.
The rows are numbered in order of UserId, starting with 1 and incrementing once for each row with UserName = 'Joe'. Because the numbering is applied after the WHERE clause, these numbers are positions among the matching rows only, not in the whole table.
If it does not work for you, then either your WHERE clause has an issue or there is no UserId column in the table. Check the spelling of both UserId and UserName.