BigQuery - Nested Query with different WHERE parameters? - google-bigquery

I'm trying to fetch user_counts and new_user_counts by date, where new_user_counts is defined by the condition that the date of event_timestamp equals the date of user_first_touch_timestamp, while user_counts is the distinct count of the user_pseudo_id field over the same date range. How can I do this in a single query? My current query is shown below.
Eventually, I'd like the result to be as:
|Date | new_user_count | user_counts |
|20200820 | X | Y |
Here is the error I'm getting at line 8 of code:
Syntax error: Function call cannot be applied to this expression. Function calls require a path, e.g. a.b.c() at [8:5]
Thanks.
SELECT
event_date,
COUNT (DISTINCT(user_pseudo_id)) AS new_user_counts FROM
`my-google-analytics-table-name.*`
WHERE DATE(TIMESTAMP_MICROS(event_timestamp)) =
DATE(TIMESTAMP_MICROS(user_first_touch_timestamp))
AND event_date BETWEEN '20200820' AND '20200831'
(SELECT
COUNT (DISTINCT(user_pseudo_id)) AS user_counts
FROM `my-google-analytics-table-name.*`
WHERE event_date BETWEEN '20200820' AND '20200831'
)
GROUP BY event_date
ORDER BY event_date ASC

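Try the query below (based solely on your original query, just fixing the syntax/logic). The conditional COUNT(DISTINCT IF(...)) counts a user only on rows where the event date matches the first-touch date, so both counts come from a single scan:
SELECT
event_date,
COUNT(DISTINCT IF(
DATE(TIMESTAMP_MICROS(event_timestamp)) = DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)),
user_pseudo_id,
NULL
)) AS new_user_counts,
COUNT(DISTINCT(user_pseudo_id)) AS user_counts
FROM `my-google-analytics-table-name.*`
WHERE event_date BETWEEN '20200820' AND '20200831'
GROUP BY event_date
ORDER BY event_date ASC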
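The conditional-aggregation idea in the answer above can be tried locally. The following is a minimal sketch using Python's built-in sqlite3 module rather than BigQuery: the events table, its first_touch_date column (a pre-computed 'YYYYMMDD' string standing in for DATE(TIMESTAMP_MICROS(user_first_touch_timestamp))) and the sample rows are all assumptions made for illustration, and CASE WHEN replaces BigQuery's IF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the GA export: dates are kept as 'YYYYMMDD' strings.
conn.execute(
    "CREATE TABLE events (event_date TEXT, user_pseudo_id TEXT, first_touch_date TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("20200820", "a", "20200820"),  # new user on the 20th
        ("20200820", "a", "20200820"),  # second event of the same user, same day
        ("20200820", "b", "20200815"),  # returning user
        ("20200821", "a", "20200820"),  # user a again, no longer new
        ("20200821", "c", "20200821"),  # new user on the 21st
        ("20200901", "d", "20200901"),  # outside the date range
    ],
)

query = """
SELECT
  event_date,
  -- CASE yields NULL for non-matching rows, and COUNT(DISTINCT ...) ignores NULLs
  COUNT(DISTINCT CASE WHEN first_touch_date = event_date
                      THEN user_pseudo_id END) AS new_user_counts,
  COUNT(DISTINCT user_pseudo_id) AS user_counts
FROM events
WHERE event_date BETWEEN '20200820' AND '20200831'
GROUP BY event_date
ORDER BY event_date
"""
daily_counts = conn.execute(query).fetchall()
for row in daily_counts:
    print(row)
```

Each day in the sample has one new user and two distinct users, and the row dated 20200901 is excluded by the WHERE clause.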

Related

Cumulative Sum with PostgreSQL using date truncating

I'm relatively new to using SQL in Apache Superset and I'm not sure where to look or how to solve my problem.
The short version of what I am trying to do is add a column of cumulative sum based on the total number of users by month.
Here is my PostgreSQL query so far:
SELECT
DATE(DATE_TRUNC('month', crdate)) AS "Month",
COUNT(DISTINCT user_id) AS "COUNT_DISTINCT(user_id)"
FROM
datasource
WHERE
user_id IS NOT NULL
GROUP BY
DATE(DATE_TRUNC('month', create))
ORDER BY
"COUNT_DISTINCT(user_id)" DESC
Sum of Users by Month
There is an error in the GROUP BY: the date column is misspelled (create instead of crdate). Ordering by the quoted alias is allowed in PostgreSQL; repeating the aggregate expression, as below, works as well. So it should be like this:
SELECT
DATE(DATE_TRUNC('month', crdate)) AS "Month",
COUNT(DISTINCT user_id) AS "COUNT_DISTINCT(user_id)"
FROM
datasource
WHERE
user_id IS NOT NULL
GROUP BY
DATE(DATE_TRUNC('month', crdate))
ORDER BY
COUNT(DISTINCT user_id) DESC
You can use your query as the basis for a window function:
CREATE TABLE datasource(crdate timestamp, user_id int);
WITH CTE AS (
SELECT
DATE_TRUNC('month',"crdate") as "Month",
COUNT(DISTINCT user_id) AS "COUNT_DISTINCT(user_id)"
FROM
datasource
WHERE
user_id IS NOT NULL
GROUP BY
DATE_TRUNC('month', "crdate")
)
SELECT "Month", SUM("COUNT_DISTINCT(user_id)") OVER (ORDER BY "Month") AS cumulative_sum
FROM CTE
Month | cumulative_sum
:---- | -------------:
db<>fiddle here
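To see the running total with actual numbers, here is a small sketch using Python's sqlite3 module instead of PostgreSQL. It assumes an SQLite build with window-function support (3.25 or later), uses strftime('%Y-%m', ...) as a stand-in for DATE_TRUNC('month', ...), and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datasource (crdate TEXT, user_id INTEGER)")
conn.executemany(
    "INSERT INTO datasource VALUES (?, ?)",
    [
        ("2021-01-05", 1),
        ("2021-01-20", 1),     # same user twice in January
        ("2021-01-21", 2),
        ("2021-02-03", 3),
        ("2021-03-10", 1),     # user 1 returns in March
        ("2021-03-11", 4),
        ("2021-03-12", None),  # dropped by the IS NOT NULL filter
    ],
)

query = """
WITH monthly AS (
  SELECT strftime('%Y-%m', crdate) AS month,
         COUNT(DISTINCT user_id)  AS users
  FROM datasource
  WHERE user_id IS NOT NULL
  GROUP BY strftime('%Y-%m', crdate)
)
SELECT month,
       users,
       -- running total of the monthly counts, ordered by month
       SUM(users) OVER (ORDER BY month) AS cumulative_sum
FROM monthly
ORDER BY month
"""
monthly_totals = conn.execute(query).fetchall()
for row in monthly_totals:
    print(row)
```

Note that this sums the monthly distinct counts, so a user active in two different months (user 1 here) contributes to the cumulative figure twice. If the goal is the number of distinct users seen so far, a different query is needed.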

depending on the max aggregate function to count the same column

I'm trying to get the date of the latest game and the number of games a player had in the 1-hour window before it.
Here is the code I want to execute
SELECT MAX(saved_date), COUNT(saved_date) FROM Game
WHERE username = 'user_name'
HAVING saved_date >= MAX(saved_date) - interval '1' hour;
but it gives me the error "not a GROUP BY expression". The error doesn't appear when I add the line GROUP BY saved_date, but then the result doesn't answer my question. I know I can do it in two statements, but I'd prefer to do it in one.
Do you have any advice or a solution? As this is my first post here, please be indulgent. Thanks!
Additional info :
The Game table is created like this :
CREATE TABLE GAME (
game_id NUMBER GENERATED ALWAYS AS IDENTITY,
...
saved_date DATE,
...
);
output with GROUP BY saved_date :
MAX(saved_date) | COUNT(saved_date)
----------------|------------------
29/03/21 | 1
29/03/21 | 1
29/03/21 | 1
desired output :
MAX(saved_date) | COUNT(saved_date)
----------------|------------------
29/03/21 | 3
You can use analytic functions (instead of aggregation functions):
SELECT saved_date,
COUNT(*) OVER (
ORDER BY saved_date DESC
RANGE BETWEEN INTERVAL '0' HOUR PRECEDING
AND INTERVAL '1' HOUR FOLLOWING
) AS num_games
FROM game
WHERE username = 'user_name'
ORDER BY saved_date DESC
FETCH FIRST ROW ONLY;
or, if you are using Oracle 11g:
SELECT saved_date,
num_games
FROM (
SELECT ROW_NUMBER() OVER ( ORDER BY saved_date DESC ) AS rn,
saved_date,
COUNT(*) OVER (
ORDER BY saved_date DESC
RANGE BETWEEN INTERVAL '0' HOUR PRECEDING
AND INTERVAL '1' HOUR FOLLOWING
) AS num_games
FROM game
WHERE username = 'user_name'
)
WHERE rn = 1;
Which, for the sample data:
CREATE TABLE GAME (
game_id NUMBER GENERATED ALWAYS AS IDENTITY,
username VARCHAR2(100),
saved_date DATE
);
INSERT INTO game ( username, saved_date )
SELECT 'user_name', DATE '1970-01-01' + ( LEVEL - 1 ) * INTERVAL '10' MINUTE
FROM DUAL
CONNECT BY LEVEL <= 20;
Outputs:
SAVED_DATE | NUM_GAMES
:------------------ | --------:
1970-01-01 03:10:00 | 7
db<>fiddle here
You need a subquery. I would approach this using a window function:
SELECT MAX(saved_date), COUNT(*)
FROM (SELECT G.*,
MAX(G.saved_date) OVER (PARTITION BY G.username) as max_saved_date
FROM Game G
WHERE G.username = 'user_name'
) G
WHERE saved_date >= max_saved_date - interval '1' hour;
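As a rough cross-check of the answers above, the same result can be reproduced with a scalar subquery, which is a third variant and not the code from either answer. The sketch below uses Python's sqlite3 module instead of Oracle, stores saved_date as ISO-8601 text (so string comparison matches chronological order), and mirrors the 20 rows at 10-minute intervals from the sample data. The extra 'other_user' row is an assumption added to show that the username filter matters.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE game (username TEXT, saved_date TEXT)")

# 20 games, 10 minutes apart, starting at 1970-01-01 00:00:00
start = datetime(1970, 1, 1)
games = [
    ("user_name", (start + timedelta(minutes=10 * i)).strftime("%Y-%m-%d %H:%M:%S"))
    for i in range(20)
]
games.append(("other_user", "1970-01-01 03:05:00"))  # must not be counted
conn.executemany("INSERT INTO game VALUES (?, ?)", games)

query = """
SELECT MAX(saved_date), COUNT(*)
FROM game
WHERE username = :name
  -- keep only rows within one hour before this user's latest game
  AND saved_date >= (SELECT datetime(MAX(saved_date), '-1 hour')
                     FROM game
                     WHERE username = :name)
"""
latest_and_count = conn.execute(query, {"name": "user_name"}).fetchone()
print(latest_and_count)
```

With this data the latest game is at 03:10:00 and seven games fall in the closed one-hour window ending there, which agrees with the NUM_GAMES value shown in the sample output above.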

how to get unique row numbers in sql

How can I get only the first row from the result of the query below? I need the latest record for each date, so I partitioned by created_date. But in some places I am getting the same row number, and I am not able to get the expected output. Please find the query, current output, and expected output below.
What changes do I need to make in order to get the expected output? Thank you.
WITH ctetable
AS (
SELECT created_date BPMDate
,tenor
,row_number() OVER (
PARTITION BY created_date ORDER BY created_date DESC
) rw
FROM table1 a
INNER JOIN table2 b ON a.case_id = b.case_id
AND a.eligible_transaction = 'true'
AND to_date(a.created_date) >= '2020-10-01'
AND to_date(a.created_date) <= '2020-10-05'
AND case_status = 'Completed'
)
SELECT BPMDate
,Tenor
,rw
FROM ctetable
Current output:
date tenor rw
2020-10-05 13:24:15.0 1W 1
2020-10-05 12:15:43.0 1Y 1
2020-10-05 12:15:43.0 1Y 2
2020-10-01 13:30:59.0 1W 1
2020-10-01 13:30:59.0 1W 2
Expected output:
date tenor rw
2020-10-05 13:24:15.0 1W 1
2020-10-01 13:30:59.0 1W 1
Regards,
Viresh
That would be:
with ctetable as (
select created_date bpmdate, tenor,
row_number() over (partition by date(created_date) order by created_date desc) rn
from table1 a
inner join table2 b
on a.case_id = b.case_id
and a.eligible_transaction = 'true'
and to_date(a.created_date) >= '2020-10-01'
and to_date(a.created_date) <= '2020-10-05'
and case_status = 'Completed'
)
select bpmdate, tenor, rn
from ctetable
where rn = 1
Changes to your original code:
you need to remove the time portion of the date in the partition by clause of the window function; you didn't tell which database you are using: I used date(), but the function might be different in your database (trunc() in Oracle, date_trunc() in Postgres, and so on)
the outer query needs to filter on the row number that is equal to 1
You seem to want the latest row per day:
select BPMDate, Tenor, seqnum as rw
from (select t.*,
row_number() over (partition by trunc(bpmdate) order by bpmdate desc) as seqnum
from ctetable t
) t
where seqnum = 1;
Note: I don't know if your database supports trunc(), but that is simply some method for extracting the date from the column.
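The partition-by-day idea from both answers can be checked against the rows shown in the question. This is a minimal sketch with Python's sqlite3 module (window functions need SQLite 3.25 or later): the single cases table is a hypothetical stand-in for the joined table1/table2 result, and SQLite's date() plays the role of date()/trunc().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened version of the join result from the question.
conn.execute("CREATE TABLE cases (created_date TEXT, tenor TEXT)")
conn.executemany(
    "INSERT INTO cases VALUES (?, ?)",
    [
        ("2020-10-05 13:24:15", "1W"),
        ("2020-10-05 12:15:43", "1Y"),
        ("2020-10-05 12:15:43", "1Y"),
        ("2020-10-01 13:30:59", "1W"),
        ("2020-10-01 13:30:59", "1W"),
    ],
)

query = """
WITH ranked AS (
  SELECT created_date,
         tenor,
         -- partition by the calendar day, not the full timestamp
         ROW_NUMBER() OVER (
           PARTITION BY date(created_date)
           ORDER BY created_date DESC
         ) AS rn
  FROM cases
)
SELECT created_date, tenor, rn
FROM ranked
WHERE rn = 1
ORDER BY created_date DESC
"""
latest_per_day = conn.execute(query).fetchall()
for row in latest_per_day:
    print(row)
```

Partitioning by the full timestamp gives every distinct timestamp its own partition, which is why the original query produced several rows numbered 1 on the same day. When two rows share the latest timestamp, ROW_NUMBER() still picks exactly one of them, but which one is arbitrary unless a tie-breaker is added to the ORDER BY.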

Sub-query is Not Working for Date_Part()

I want to pass the subquery as an argument to the EXTRACT() function of Postgres to get the number of the day of the week but it is not working.
Working Code:
SELECT EXTRACT(dow FROM DATE '2018-06-07');
It returns:
+-------------+
| date_part |
|-------------|
| 4.0 |
+-------------+
Not Working Code:
SELECT EXTRACT(DOW FROM DATE
(SELECT start_date from leaves where submitted_by=245 and type_id = 16)
);
It returns
syntax error at or near "SELECT"
LINE 1: SELECT EXTRACT(DAY FROM DATE (SELECT submitted_on FROM leave...
I don't know why EXTRACT() function is not accepting subquery result as the query:
SELECT start_date from leaves where submitted_by=245 and type_id = 16;
returns the following, which I think is identical to what I passed as a date string in the working example.
+--------------+
| start_date |
|--------------|
| 2018-06-07 |
+--------------+
Can somebody correct it or let me know some other way to get the number of the day of the week.
Just apply it to the column of the select:
SELECT EXTRACT(DOW from start_date)
from leaves
where submitted_by=245 and type_id = 16
If you really want to use a scalar sub-query, then you must get rid of the DATE keyword, that is only needed to specify date constants.
SELECT EXTRACT(DOW FROM
(SELECT start_date from leaves where submitted_by=245 and type_id = 16)
);
Put the function inside the select:
select (select extract(dow from start_date)
from leaves
where submitted_by = 245 and type_id = 16
)
I don't see the advantage of using a subquery in the select for this (as opposed to, say, moving the subquery to the from clause), but this should do what you want.
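The two working shapes from the answers above (applying the function to the column, and passing a scalar subquery as the argument) can be compared side by side. The sketch below uses Python's sqlite3 module, so it is only an analogue: SQLite has no EXTRACT, and strftime('%w', ...) is used here as a substitute that, like PostgreSQL's dow, numbers Sunday as 0. The second row in the sample table is an assumption added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE leaves (submitted_by INTEGER, type_id INTEGER, start_date TEXT)"
)
conn.executemany(
    "INSERT INTO leaves VALUES (?, ?, ?)",
    [
        (245, 16, "2018-06-07"),  # the row from the question (a Thursday)
        (245, 3, "2018-06-10"),   # hypothetical extra row, filtered out
    ],
)

# Variant 1: apply the function directly to the selected column.
dow_from_column = conn.execute(
    """
    SELECT CAST(strftime('%w', start_date) AS INTEGER)
    FROM leaves
    WHERE submitted_by = 245 AND type_id = 16
    """
).fetchone()[0]

# Variant 2: pass a parenthesised scalar subquery as the argument.
# It must return a single row and a single column.
dow_from_subquery = conn.execute(
    """
    SELECT CAST(strftime('%w',
        (SELECT start_date FROM leaves
         WHERE submitted_by = 245 AND type_id = 16)
    ) AS INTEGER)
    """
).fetchone()[0]

print(dow_from_column, dow_from_subquery)
```

Both variants give the same day number for 2018-06-07 as the working EXTRACT example in the question. The subquery form only makes sense when the inner query is guaranteed to return one row.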

sql - find the number of days a user was using the app

I would like to write a SQL query that counts the number of days each user used the application and how many of those days were consecutive. A user can enter the app several times a day, but that should count as 1.
My table looks like this:
id | bigint
user_id | bigint
action_date | timestamp without time zone
To count the number of days per user:
SELECT user_id, count(DISTINCT action_date::date) AS days
FROM user_action_tbl
GROUP BY user_id;
One way to do it
SELECT user_id, COUNT(*) days_total, SUM(consecutive) days_consecutive
FROM
(
SELECT user_id,
CASE WHEN LEAD(date, 1) OVER (PARTITION BY user_id ORDER BY date) - date = 1 THEN 1 ELSE 0 END consecutive
FROM
(
SELECT user_id, action_date::date date
FROM table1
GROUP BY user_id, action_date::date
) q
) p
GROUP BY user_id
Here is a SQLFiddle demo
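The LEAD-based approach above can be exercised on a few rows to see what the two counts mean. This is a sketch with Python's sqlite3 module (SQLite 3.25 or later for window functions) rather than PostgreSQL: date(action_date) replaces the ::date cast, the "next day" test is written as a comparison against date(day, '+1 day'), and the sample data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_action_tbl (id INTEGER PRIMARY KEY, user_id INTEGER, action_date TEXT)"
)
conn.executemany(
    "INSERT INTO user_action_tbl (user_id, action_date) VALUES (?, ?)",
    [
        (1, "2021-01-01 09:00:00"),
        (1, "2021-01-01 18:00:00"),  # second visit on the same day counts once
        (1, "2021-01-02 10:00:00"),
        (1, "2021-01-03 08:30:00"),
        (1, "2021-01-05 12:00:00"),  # gap: the 4th is missing
        (2, "2021-02-01 07:00:00"),
        (2, "2021-02-10 07:00:00"),
    ],
)

query = """
SELECT user_id,
       COUNT(*)         AS days_total,
       SUM(consecutive) AS days_consecutive
FROM (
  SELECT user_id,
         -- 1 when the user's next active day is exactly the following calendar day
         CASE WHEN LEAD(day) OVER (PARTITION BY user_id ORDER BY day)
                   = date(day, '+1 day')
              THEN 1 ELSE 0 END AS consecutive
  FROM (
    -- collapse multiple visits per day into one row per user and day
    SELECT DISTINCT user_id, date(action_date) AS day
    FROM user_action_tbl
  )
)
GROUP BY user_id
ORDER BY user_id
"""
usage = conn.execute(query).fetchall()
for row in usage:
    print(row)
```

As written, days_consecutive counts the days that are immediately followed by another active day, so a three-day streak contributes 2, not 3. If the length of the longest streak is wanted instead, a gaps-and-islands style query would be needed.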