Users that played on X different dates - Standard SQL + BigQuery

I have the following schema of a data model (I only have the schema, not the tables) on BigQuery with Standard SQL.
I have created this query to select the top 10 users that generated the most revenue in the last three months on the Love game:
SELECT
users.user_id,
SUM(pay.amount) AS total_rev
FROM
`my-database.User` AS users
INNER JOIN
`my-database.IAP_events` AS pay
ON
users.User_id = pay.User_id
INNER JOIN
`my-database.Games` AS games
ON
users.Game_id = games.Game_id
WHERE
games.game_name = "Love"
GROUP BY
users.user_id
ORDER BY
total_rev DESC
LIMIT
10
But then, the exercise says to only consider users that played on 10 different days in the last 3 months. I understand I would use a subquery with a count on the dates, but I am a little lost on how to do it...
Thanks a lot!

EDIT: You need to count distinct dates, not transactions, so the qualify clause has to use COUNT(DISTINCT date_) OVER ... instead of COUNT(transaction_id) OVER .... The code below is already fixed.
As far as I understood, you need to count the distinct dates per user inside IAP_Events over the previous 3 months, check that the count is at least 10, and then sum the amounts of all the users that meet that constraint.
To do so, you can use BigQuery's analytic functions, aka window functions:
with window_counting as (
select
user_id,
amount
from
iap_events
where
date_ >= date_sub(current_date(), interval 3 month)
qualify
count(distinct date_) over (partition by user_id) >= 10
),
final as (
select
user_id,
sum(amount) as total_rev
from
window_counting
group by
1
order by
2 desc
limit 10
)
select * from final
You will just need to add the needed joins inside the first CTE in order to filter by game_name :)
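The subquery route mentioned in the question also works and does not depend on QUALIFY. Here is a minimal sketch, run through Python's built-in sqlite3 module so it can be tried without a BigQuery project. The table layout, the date_ column name and the sample rows are assumptions for illustration. The 3-month date filter is left out so the fixed sample dates keep working; in BigQuery it would be added to both the outer query and the subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified IAP_events table (the real schema is not known).
conn.execute("CREATE TABLE iap_events (user_id TEXT, date_ TEXT, amount REAL)")

rows = []
# User "a": one purchase of 1.0 on each of 10 different days -> qualifies.
rows += [("a", f"2024-01-{day:02d}", 1.0) for day in range(1, 11)]
# User "b": 12 purchases of 5.0 on a single day -> more revenue, but only 1 distinct day.
rows += [("b", "2024-01-05", 5.0)] * 12
conn.executemany("INSERT INTO iap_events VALUES (?, ?, ?)", rows)

query = """
SELECT user_id, SUM(amount) AS total_rev
FROM iap_events
WHERE user_id IN (
    -- users with purchases on at least 10 distinct days
    SELECT user_id
    FROM iap_events
    GROUP BY user_id
    HAVING COUNT(DISTINCT date_) >= 10
)
GROUP BY user_id
ORDER BY total_rev DESC
LIMIT 10
"""
top_users = conn.execute(query).fetchall()
print(top_users)  # [('a', 10.0)]
```

User "b" has the higher revenue but is dropped because all purchases fall on one day, which is the behaviour the exercise asks for.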

Related

SQL Server LAG() function to calculate differences between rows

I'm new to SQL Server and I have some doubts about the lag() function.
I have to calculate the average distance (in days) between a user's consecutive activities. So I have to group by user, calculate the date differences between consecutive rows for each user, and finally take the average per group.
Just to be clear, I've got this kind of table:
First I have to filter days with activities (activities!=0). Then I have to create this:
And finally, the expected outcome is this one:
I thought this could be a "kind of" code:
select userid, avg(diff)
(SELECT *,DATEDIFF(day, Lag(dateid, 1) OVER(ORDER BY [Userid]),
dateid) as diff
FROM table1
where activities!=0
group by userid) t
group by userid
Of course it doesn't work. I think I also have to do a while loop since rownumber changes for each users.
I hope you can help meeee! thank you very much
You are almost there. Just add partition by userid so the difference is calculated for each userid, and order by dateid.
select userid, avg(diff * 1.0) as avg_diff -- * 1.0 avoids an integer average
from (SELECT t.*
,DATEDIFF(day, Lag(dateid, 1) OVER(PARTITION BY [Userid] ORDER BY [dateid]),dateid) as diff
FROM table1 t
where activities!=0
) t
group by userid
You don't need lag() at all. The consecutive differences add up to the maximum minus the minimum, so the average is that span divided by one less than the count:
SELECT userid,
DATEDIFF(day, MIN(dateid), MAX(dateid)) * 1.0 / NULLIF(COUNT(*) - 1, 0) as avg_diff
FROM table1
WHERE activities <> 0
GROUP BY userid;
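To check that the two answers agree, here is a small sketch using Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). SQLite has no DATEDIFF, so julianday() differences stand in for it; that substitution and the sample rows are assumptions made for illustration, not part of the SQL Server answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (userid INTEGER, dateid TEXT, activities INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", 3),
        (1, "2020-01-04", 1),
        (1, "2020-01-05", 0),  # no activity, filtered out
        (1, "2020-01-10", 2),
        (2, "2020-02-01", 5),
        (2, "2020-02-03", 5),
    ],
)

# Approach 1: per-row gap with LAG, then average per user.
via_lag = conn.execute("""
    SELECT userid, AVG(diff)
    FROM (
        SELECT userid,
               julianday(dateid)
             - julianday(LAG(dateid) OVER (PARTITION BY userid ORDER BY dateid)) AS diff
        FROM table1
        WHERE activities <> 0
    )
    GROUP BY userid
    ORDER BY userid
""").fetchall()

# Approach 2: (max - min) / (count - 1), no window function needed.
via_minmax = conn.execute("""
    SELECT userid,
           (julianday(MAX(dateid)) - julianday(MIN(dateid))) / NULLIF(COUNT(*) - 1, 0)
    FROM table1
    WHERE activities <> 0
    GROUP BY userid
    ORDER BY userid
""").fetchall()

print(via_lag)     # [(1, 4.5), (2, 2.0)]
print(via_minmax)  # [(1, 4.5), (2, 2.0)]
```

For user 1 the gaps are 3 and 6 days, so the average is 4.5, which is the same as the 9-day span divided by 2 gaps.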

How to filter records by their count per date?

I have a table 'A' that has a date column, and the same date can appear in several records. I'm trying to filter the records where the number of records per day is less than 5, while still keeping all the fields of the table.
I mean that if I have only 4 records on 11/10/2017, I need to filter all 4 of these records.
You can select them based on a subquery. In the subquery, group by the date column and use HAVING with an aggregated count to find out how many records each date group has, then select all records whose date has a count lower than 5:
SELECT *
FROM A
WHERE A.date in (SELECT subA.date
FROM A AS subA
GROUP BY subA.date
HAVING COUNT(*) < 5 );
Take Care's answer is good. Alternatively, you can use an analytic/windowing function. I'd benchmark both and see which one works better.
with cte as (
select *, count(1) over (partition by date) as cnt
from A
)
select *
from cte
where cnt < 5
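A quick way to convince yourself that both versions return the same rows, sketched with Python's built-in sqlite3 module (SQLite 3.25+ for the window function). The id column and the sample rows are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER, date TEXT)")
rows = [(i, "2017-10-10") for i in range(1, 7)]     # 6 records on one day
rows += [(i, "2017-10-11") for i in range(7, 11)]   # 4 records on the next day
conn.executemany("INSERT INTO A VALUES (?, ?)", rows)

# Subquery + HAVING version.
with_subquery = conn.execute("""
    SELECT id, date
    FROM A
    WHERE date IN (SELECT date FROM A GROUP BY date HAVING COUNT(*) < 5)
    ORDER BY id
""").fetchall()

# Window-function version.
with_window = conn.execute("""
    WITH cte AS (
        SELECT id, date, COUNT(*) OVER (PARTITION BY date) AS cnt
        FROM A
    )
    SELECT id, date
    FROM cte
    WHERE cnt < 5
    ORDER BY id
""").fetchall()

print(with_subquery)  # the four rows dated 2017-10-11
```

Which one is faster depends on the engine and the indexes, so the benchmarking advice still applies.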

SQL Getting Top 2 Results for each individual column value

I have a table 'Cashup_Till' that records everything a particular till has recorded in a venue for a given day. Each venue has multiple tills, each with a designated number 'Till_No'. I need to get the previous 2 days' entries for each till number. For each till individually I can do this...
SELECT TOP 2 T.* FROM CashUp_Till T
WHERE T.Till_No = (Enter Till Number Here)
ORDER BY T.Till_Id DESC
Some venues have 20-30 tills, so ideally I need to do all the tills in one call. I can pass in a user-defined table type of till numbers and then select them in a subquery, but that's as far as my SQL knowledge takes me. Does anyone have a solution?
Here is one way:
SELECT T.*
FROM (SELECT T.*,
ROW_NUMBER() OVER (PARTITION BY Till_No ORDER BY Till_Id DESC) as seqnum
FROM CashUp_Till T
) T
WHERE seqnum <= 2;
This assumes that there is one record per day, which I believe is suggested by the question.
If you have a separate table of tills (called t here), then:
select ct.*
from t cross apply
(select top 2 ct.*
from cashup_till ct
where ct.till_no = t.till_no
order by till_id desc
) ct;
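The ROW_NUMBER() version can be tried on a toy table with Python's built-in sqlite3 module (SQLite 3.25+). The two-column table and its rows are assumptions for illustration; CROSS APPLY and TOP are SQL Server specific, so only the first query is reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CashUp_Till (Till_Id INTEGER PRIMARY KEY, Till_No INTEGER)")
conn.executemany(
    "INSERT INTO CashUp_Till VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (4, 2), (5, 2)],  # till 1 has 3 entries, till 2 has 2
)

latest_two = conn.execute("""
    SELECT Till_Id, Till_No
    FROM (
        SELECT T.*,
               ROW_NUMBER() OVER (PARTITION BY Till_No ORDER BY Till_Id DESC) AS seqnum
        FROM CashUp_Till T
    ) T
    WHERE seqnum <= 2
    ORDER BY Till_No, Till_Id DESC
""").fetchall()

print(latest_two)  # [(3, 1), (2, 1), (5, 2), (4, 2)]
```

The oldest entry of till 1 (Till_Id 1) is dropped, and every till is handled in a single statement.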

How to count rows in SQL Server 2012?

I am trying to find whether a person (id = A3) is continuously active in a program for at least five months in a given year (2013). Any suggestion would be appreciated. My data looks as follows:
You simply use group by and a conditional expression:
select id,
(case when count(ActiveMonthYear) >= 5 then 'YES!' else 'NAW' end)
from table t
where ListOfTheMonths between '201301' and '201312'
group by id;
EDIT:
I suppose "continuously" doesn't just mean any five months. For that, there are various ways. I like the difference of row numbers approach
select distinct id
from (select t.*,
(row_number() over (partition by id order by ListOfTheMonths) -
count(ActiveMonthYear) over (partition by id order by ListOfTheMonths)
) as grp
from table t
where ListOfTheMonths between '201301' and '201312'
) t
where ActiveMonthYear is not null
group by id, grp
having count(*) >= 5;
The difference in the subquery is constant for groups of consecutive active months, so it can be used for grouping. The result is a list of all ids that meet the criterion. You can add a where for a particular id (do it in the subquery).
By the way, this is written using select distinct and group by. This is one of the rare cases where these two are appropriately used together. A single id could have two periods of five months in the same year. There is no reason to include that person twice in the result set.
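Here is a small sketch of the difference-of-row-numbers query on sample data, using Python's built-in sqlite3 module (SQLite 3.25+). The table name activity and the rows are hypothetical, since the original data is not shown: A3 has one run of five active months, B1 has seven active months but no run longer than four.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per id and month; ActiveMonthYear is NULL when inactive.
conn.execute(
    "CREATE TABLE activity (id TEXT, ListOfTheMonths TEXT, ActiveMonthYear TEXT)"
)

active_months = {
    "A3": {2, 3, 4, 5, 6},        # five consecutive months
    "B1": {1, 2, 3, 5, 6, 7, 8},  # runs of three and four months
}
rows = []
for person, active in active_months.items():
    for m in range(1, 13):
        month = f"2013{m:02d}"
        rows.append((person, month, month if m in active else None))
conn.executemany("INSERT INTO activity VALUES (?, ?, ?)", rows)

continuous = conn.execute("""
    SELECT DISTINCT id
    FROM (
        SELECT id, ActiveMonthYear,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY ListOfTheMonths)
             - COUNT(ActiveMonthYear) OVER (PARTITION BY id ORDER BY ListOfTheMonths) AS grp
        FROM activity
        WHERE ListOfTheMonths BETWEEN '201301' AND '201312'
    ) t
    WHERE ActiveMonthYear IS NOT NULL
    GROUP BY id, grp
    HAVING COUNT(*) >= 5
""").fetchall()

print(continuous)  # [('A3',)]
```

The plain count(ActiveMonthYear) >= 5 query would also return B1, which is why the grouping on grp matters when "continuously" is meant literally.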

Is there a way to extract the year data from a date in ORACLE? Counting user votes

Hi I am creating a database that tracks how many times a user votes for one of our products.
Is there a way that I can change the query below to count the max number of votes each year? I'm using Oracle SQL developer
The tables are structured as
Members(username, email, pwd)
Votes (username*, prodCode*, score, voteDate)
note:
SELECT username, count(username)
FROM Votes NATURAL JOIN members
GROUP BY username, email
HAVING COUNT(*) >= All(SELECT count(username)
FROM Votes v1
GROUP BY username);
I'm not entirely sure I understand your question, but something like this maybe?
SELECT username,
extract(year from voteDate) as vote_year,
count(*) as votes_per_year,
max(count(*)) over (partition by extract(year from voteDate)) as max_votes_per_year
FROM Votes vt
GROUP BY username, extract(year from voteDate);
This will give you the count for each user and year and the max count for each year in every row. (Note the members table is not necessary for this as you apparently don't need any column from that).
If you need to show a zero count for years where nobody voted (or one specific user didn't vote), then you'll need something like this:
with years as (
select 2009 + level as year
from dual
connect by level <= 21
)
SELECT username,
y.year,
count(vt.username) as votes_per_year,
max(count(vt.username)) over (partition by y.year) as max_votes_per_year
FROM years y
LEFT JOIN Votes vt on y.year = extract(year from vt.voteDate)
GROUP BY username, y.year
order by y.year, username;
This will generate a list of years between 2010 and 2030 "on-the-fly" using a common table expression (with ...). You can adjust the CTE to extend or narrow the number of years.
For this you'd be better off having a table of years (or at least a table of numbers that could help you build a set of years).
SELECT
username
, MAX(cnt) AS mcnt
FROM (
SELECT
members.username
, count(Votes.username) AS cnt
FROM members
CROSS JOIN Years -- assumed helper table with one firstJanuary date per year
INNER JOIN Votes ON (
members.username = Votes.username -- using an INT here would be more efficient
AND
Votes.voteDate >= Years.firstJanuary
AND
Votes.voteDate < ADD_MONTHS(Years.firstJanuary, 12) -- Oracle syntax; the original answer used pseudo code here
)
GROUP BY members.username, Votes.prodCode, Years.firstJanuary
) X