show only user with at least one entry per month - sql

I have two tables, say one called User and the other called Data.
Every user has many entries in the Data table.
The Data table includes the UserID and Dates columns.
I would like to write a SQL query that returns only users with at least one entry per month in 2019.
I have no idea how to do that.

You should really mention your database type. Treat this more like pseudo-code for now. But if you update your question, I can update my answer.
SELECT userID,
YEAR(Dates),
COUNT(DISTINCT MONTH(Dates))
FROM Data
WHERE YEAR(Dates) = 2019
GROUP BY UserId,
YEAR(Dates)
HAVING COUNT(DISTINCT MONTH(Dates))=12
Since you are looking only at the year 2019, you can drop YEAR(Dates) from the SELECT and GROUP BY clauses. If you need to require a minimum number of entries per month (five in this example), I would suggest:
WITH CTE AS (
SELECT userID,
MONTH(Dates) as [month],
COUNT(*) as TotalEntriesPerMonth
FROM Data
WHERE YEAR(Dates) = 2019
GROUP BY UserId, MONTH(Dates)
HAVING COUNT(*)>=5
)
SELECT userID
FROM CTE
GROUP BY userID
HAVING COUNT([month]) = 12
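Since the database type was never stated, here is a minimal sketch of the first query run against SQLite through Python's sqlite3 module. SQLite has no YEAR() or MONTH(), so strftime() stands in for them; the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Data (UserID INTEGER, Dates TEXT)")

# Hypothetical sample data: user 1 has entries in every month of 2019,
# user 2 misses December, user 3 only has entries in 2018.
rows = [(1, f"2019-{m:02d}-15") for m in range(1, 13)]
rows += [(1, "2019-03-20")]  # a second entry in March must not be double counted
rows += [(2, f"2019-{m:02d}-01") for m in range(1, 12)]
rows += [(3, f"2018-{m:02d}-01") for m in range(1, 13)]
conn.executemany("INSERT INTO Data VALUES (?, ?)", rows)

# strftime('%Y', ...) and strftime('%m', ...) replace YEAR() and MONTH().
query = """
SELECT UserID
FROM Data
WHERE strftime('%Y', Dates) = '2019'
GROUP BY UserID
HAVING COUNT(DISTINCT strftime('%m', Dates)) = 12
"""
users_all_months = [row[0] for row in conn.execute(query)]
print(users_all_months)
```

COUNT(DISTINCT ...) is what makes this work: a user with 50 entries in one month still only contributes one month to the count.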

It's not clear what you are asking, but your query may look like this:
CREATE TABLE Data(UserId int,Dates date)
INSERT INTO Data(UserId,Dates) VALUES(1,'2020/04/28'),(1,'2020/04/29'),(2,'2020/04/29')
;WITH CTE AS (
SELECT UserId,ROW_NUMBER() OVER(PARTITION BY MONTH(Dates),UserId ORDER BY UserId) AS rn FROM Data)
SELECT Distinct UserId FROM CTE WHERE rn >=1

Related

Users that played in X different dates - SQL Standard + BigQuery

I have the following schema of a data model (I only have the schema, not the tables) on BigQuery with SQL Standard.
I have created this query to select the top 10 users that generated the most revenue in the last three months on the Love game:
SELECT
users.user_id,
SUM(pay.amount) AS total_rev
FROM
`my-database.User` AS users
INNER JOIN
`my-database.IAP_events` AS pay
ON
users.User_id = pay.User_id
INNER JOIN
`my-database.Games` AS games
ON
users.Game_id = games.Game_id
WHERE
games.game_name = "Love"
GROUP BY
users.user_id
ORDER BY
total_rev ASC
LIMIT
10
But then, the exercise says to only consider users that played during 10 different days in the last 3 months. I understand I would use a subquery with a count in the dates but I am a little lost on how to do it...
Thanks a lot!
EDIT: You need to count distinct dates, not transactions, so in the qualify clause you'll need to state COUNT(DISTINCT date_) OVER ... instead of COUNT(transaction_id) OVER .... Fixed the code already.
As far as I understand, you need to count the distinct dates in IAP_Events over a window of the previous 3 months, check that the count is greater than 10, and then sum the amounts for all the users who meet that constraint.
To do so, you can use BigQuery's analytic functions, aka window functions:
with window_counting as (
select
user_id,
amount
from
iap_events
where
date_ >= date_sub(current_date(), interval 3 month)
qualify
count(distinct date_) over (partition by user_id) > 10
),
final as (
select
user_id,
sum(amount)
from
window_counting
group by
1
order by
2 desc
limit 10
)
select * from final
You will just need to add the needed joins inside the first CTE in order to filter by game_name :)
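For what it's worth, because the final step aggregates per user anyway, the same filter can probably also be written with GROUP BY and HAVING instead of QUALIFY. Here is a rough sketch of that variant, run on SQLite through Python since I cannot run BigQuery here. The rows, the fixed cutoff date and the lowered threshold (MIN_DAYS = 2, to keep the sample small) are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE iap_events (user_id INTEGER, date_ TEXT, amount REAL)")
conn.executemany("INSERT INTO iap_events VALUES (?, ?, ?)", [
    (1, "2020-04-01", 5.0),
    (1, "2020-04-01", 3.0),    # same day twice: still one distinct date
    (1, "2020-04-02", 2.0),
    (2, "2020-04-01", 50.0),   # big spender, but only one distinct date
    (3, "2020-04-03", 1.0),
    (3, "2020-04-04", 1.0),
    (3, "2020-04-05", 1.0),
    (3, "2019-01-01", 100.0),  # outside the time window, ignored
])

MIN_DAYS = 2          # hypothetical threshold; the exercise asks for 10
SINCE = "2020-02-01"  # hypothetical fixed cutoff instead of current_date()

# Filter rows by date first, then keep only users with enough distinct dates.
query = """
SELECT user_id, SUM(amount) AS total_rev
FROM iap_events
WHERE date_ >= :since
GROUP BY user_id
HAVING COUNT(DISTINCT date_) >= :min_days
ORDER BY total_rev DESC
LIMIT 10
"""
top_users = conn.execute(query, {"since": SINCE, "min_days": MIN_DAYS}).fetchall()
print(top_users)
```

User 2 has the highest revenue but is dropped because all of it falls on a single day.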

SQL Server LAG() function to calculate differences between rows

I'm new in SQL Server and I've got some doubts about the lag() function.
I have to calculate the average distance (in days) between a user's consecutive activities. To do that, I have to group by user, calculate all the date differences between rows for each user, and finally select the average for each group.
Just to be clear, I've got this kind of table:
First I have to filter days with activities (activities!=0). Then I have to create this:
And finally, the expected outcome is this one:
I thought this could be a "kind of" code:
select userid, avg(diff)
(SELECT *,DATEDIFF(day, Lag(dateid, 1) OVER(ORDER BY [Userid]),
dateid) as diff
FROM table1
where activities!=0
group by userid) t
group by userid
Of course it doesn't work. I think I also have to use a while loop, since the row number changes for each user.
I hope you can help meeee! thank you very much
You are almost there. Just add partition by userid so the difference is calculated for each userid and order by dateid.
select userid, avg(diff)
from (SELECT t.*
,DATEDIFF(day, Lag(dateid, 1) OVER(PARTITION BY [Userid] ORDER BY [dateid]),dateid) as diff
FROM table1 t
where activities!=0
) t
group by userid
You don't need lag() at all. The average is the maximum minus the minimum, divided by one less than the count:
SELECT userid,
DATEDIFF(day, MIN(dateid), MAX(dateid)) * 1.0 / NULLIF(COUNT(*) - 1, 0) as avg_diff
FROM table1
WHERE activities <> 0
GROUP BY userid;
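The reason this works is that the consecutive differences telescope: (d2 - d1) + (d3 - d2) + ... adds up to max - min, and there are count - 1 of them. A small sketch that checks both approaches agree, using SQLite via Python (needs SQLite 3.25+ for window functions). julianday() replaces DATEDIFF here, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (userid INTEGER, dateid TEXT, activities INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [
    (1, "2020-01-01", 3),
    (1, "2020-01-04", 1),
    (1, "2020-01-05", 0),  # no activity, filtered out
    (1, "2020-01-11", 2),
    (2, "2020-02-01", 1),
    (2, "2020-02-03", 4),
])

# Approach 1: per-row gaps with LAG, then average them (AVG skips the NULL first row).
lag_query = """
SELECT userid, AVG(diff)
FROM (
    SELECT userid,
           julianday(dateid)
           - julianday(LAG(dateid) OVER (PARTITION BY userid ORDER BY dateid)) AS diff
    FROM table1
    WHERE activities <> 0
) AS t
GROUP BY userid
ORDER BY userid
"""

# Approach 2: (max - min) / (count - 1); NULLIF avoids dividing by zero for single rows.
minmax_query = """
SELECT userid,
       (julianday(MAX(dateid)) - julianday(MIN(dateid))) * 1.0
       / NULLIF(COUNT(*) - 1, 0)
FROM table1
WHERE activities <> 0
GROUP BY userid
ORDER BY userid
"""

lag_result = conn.execute(lag_query).fetchall()
minmax_result = conn.execute(minmax_query).fetchall()
print(lag_result, minmax_result)
```

For user 1 the gaps are 3 and 7 days (average 5), and (Jan 11 - Jan 1) / 2 is also 5.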

SQL - Select Query to get group by records whose sum(data) > 24

I need to select the UserIDs from the table whose sum of DATA is greater than 24.
I am able to group and sum the records using
SELECT SUM(DATA),UserID FROM TableName GROUP BY UserID
But how can I select only the records for which SUM(DATA) > 24?
I have tried
SELECT SUM(DATA),UserID FROM #tempTimesheetValue where SUM(DATA)>24 GROUP BY UserID
But it's not working.
Thanks in advance for any suggestions.
You can do this with the query below:
select UserID, DATA from (
SELECT SUM(DATA) as DATA, UserID FROM #tempTimesheetValue GROUP BY UserID
) A where DATA > 24
The question might as well have the correct answer, which is:
SELECT SUM(DATA), UserID
FROM #tempTimesheetValue
GROUP BY UserID
HAVING SUM(DATA) > 24;
A subquery could be used, but it is an unnecessary complication.
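To see the difference in practice: WHERE filters individual rows before grouping, so it cannot reference an aggregate, while HAVING filters the groups afterwards. A quick sketch on SQLite via Python; the table name Timesheet and the rows are made up, and the exact error type will differ on other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Timesheet (UserID INTEGER, DATA INTEGER)")
conn.executemany("INSERT INTO Timesheet VALUES (?, ?)",
                 [(1, 10), (1, 20), (2, 8), (2, 8), (3, 25)])

# An aggregate inside WHERE is rejected, because WHERE runs before GROUP BY.
try:
    conn.execute(
        "SELECT SUM(DATA), UserID FROM Timesheet WHERE SUM(DATA) > 24 GROUP BY UserID"
    )
    where_failed = False
except sqlite3.OperationalError:
    where_failed = True

# HAVING runs after grouping, so the aggregate is available.
having_rows = conn.execute(
    "SELECT SUM(DATA), UserID FROM Timesheet "
    "GROUP BY UserID HAVING SUM(DATA) > 24 ORDER BY UserID"
).fetchall()
print(where_failed, having_rows)
```

User 2 sums to 16 and is dropped; users 1 and 3 sum to 30 and 25 and are kept.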

Max of a Date field into another field in Postgresql

I have a PostgreSQL table with a few fields, such as id and date. I need to find the max date for each id and show it in a new field for all the ids. The SQLFiddle site was not responding, so I have an example in Excel. Here is the screenshot of the data and the output for the table.
You could use the windowing variant of max:
SELECT id, date, MAX(date) OVER (PARTITION BY id)
FROM mytable
Something like this might work:
WITH maxdts AS (
SELECT id, max(date) AS maxdt FROM mytable GROUP BY id
)
SELECT t.id, t.date, m.maxdt FROM mytable t, maxdts m WHERE t.id = m.id;
Keep in mind that, without more information, this could be a horribly inefficient query, but it will get you what you need.
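If you want to convince yourself that the window version and the CTE-plus-join version return the same rows, here is a small sketch. It runs on SQLite via Python rather than PostgreSQL (window functions need SQLite 3.25+), and the table contents are invented since the screenshot data is not available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, date TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [
    (1, "2014-01-05"),
    (1, "2014-03-20"),
    (2, "2014-02-11"),
    (2, "2014-02-01"),
])

# Window variant: every row keeps its own date plus the max date of its id.
window_rows = conn.execute("""
    SELECT id, date, MAX(date) OVER (PARTITION BY id) AS maxdt
    FROM mytable
    ORDER BY id, date
""").fetchall()

# CTE variant: compute one max per id, then join it back onto every row.
join_rows = conn.execute("""
    WITH maxdts AS (
        SELECT id, MAX(date) AS maxdt FROM mytable GROUP BY id
    )
    SELECT t.id, t.date, m.maxdt
    FROM mytable t
    JOIN maxdts m ON t.id = m.id
    ORDER BY t.id, t.date
""").fetchall()
print(window_rows)
```

The window variant reads the table once and is usually the simpler choice when the database supports it.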

How to count rows in SQL Server 2012?

I am trying to find whether a person (id = A3) is continuously active in a program for at least five months in a given year (2013). Any suggestion would be appreciated. My data look as follows:
You simply use group by and a conditional expression:
select id,
(case when count(ActiveMonthYear) >= 5 then 'YES!' else 'NAW' end)
from table t
where ListOfTheMonths between '201301' and '201312'
group by id;
EDIT:
I suppose "continuously" doesn't just mean any five months. For that, there are various ways. I like the difference of row numbers approach:
select distinct id
from (select t.*,
(row_number() over (partition by id order by ListOfTheMonths) -
count(ActiveMonthYear) over (partition by id order by ListOfTheMonths)
) as grp
from table t
where ListOfTheMonths between '201301' and '201312'
) t
where ActiveMonthYear is not null
group by id, grp
having count(*) >= 5;
The difference in the subquery is constant for groups of consecutive active months. This is then used for grouping. The result is a list of all ids that meet these criteria. You can add a where for a particular id (do it in the subquery).
By the way, this is written using select distinct and group by. This is one of the rare cases where these two are appropriately used together. A single id could have two periods of five months in the same year. There is no reason to include that person twice in the result set.
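Here is a rough sketch of that query on made-up data, run on SQLite via Python (window functions need SQLite 3.25+). The table name activity and the data layout are assumptions: one row per id per month, with ActiveMonthYear left NULL for inactive months.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity (id TEXT, ListOfTheMonths TEXT, ActiveMonthYear TEXT)"
)

def year_rows(person, active_months):
    """One row per month of 2013; ActiveMonthYear is NULL when inactive."""
    rows = []
    for m in range(1, 13):
        ym = f"2013{m:02d}"
        rows.append((person, ym, ym if m in active_months else None))
    return rows

# A3: active Jan-Mar and May-Sep (a run of 5). B1: active Jan-Apr and Jun-Sep
# (8 active months in total, but no run longer than 4).
data = year_rows("A3", {1, 2, 3, 5, 6, 7, 8, 9})
data += year_rows("B1", {1, 2, 3, 4, 6, 7, 8, 9})
conn.executemany("INSERT INTO activity VALUES (?, ?, ?)", data)

# Row number minus running count of active months stays constant within a run
# of consecutive active months and increases after every inactive month.
query = """
SELECT DISTINCT id
FROM (
    SELECT id, ActiveMonthYear,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY ListOfTheMonths)
           - COUNT(ActiveMonthYear) OVER (PARTITION BY id ORDER BY ListOfTheMonths) AS grp
    FROM activity
    WHERE ListOfTheMonths BETWEEN '201301' AND '201312'
) AS t
WHERE ActiveMonthYear IS NOT NULL
GROUP BY id, grp
HAVING COUNT(*) >= 5
"""
continuous_ids = [row[0] for row in conn.execute(query)]
print(continuous_ids)
```

B1 would pass the simple count(ActiveMonthYear) >= 5 check, but it is excluded here because its active months are split into two runs of four.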