PostgreSQL: how to add WHERE to a query with count() and GROUP BY

How can I add a WHERE clause to my query?
SELECT count(*), date_trunc('year', "createdAt") AS txn_year
FROM tables
WHERE active = 1 -- not working, I don't know why
GROUP BY txn_year;
Thanks for any suggestions.

SELECT count(*), date_trunc('year', "createdAt") AS txn_year
FROM tables
WHERE column_active = 1
GROUP BY txn_year;

active is of type character varying, i.e. a string type. This should work:
SELECT count(*), date_trunc('year', "createdAt") AS txn_year
FROM tables
WHERE active = '1'
GROUP BY txn_year;

Related

Cumulative sum with PostgreSQL using date truncation

I'm relatively new to using SQL in Apache Superset and I'm not sure where to look or how to solve my problem.
The short version of what I am trying to do is add a column of cumulative sum based on the total number of users by month.
Here is my PostgreSQL query so far:
SELECT
DATE(DATE_TRUNC('month', crdate)) AS "Month",
COUNT(DISTINCT user_id) AS "COUNT_DISTINCT(user_id)"
FROM
datasource
WHERE
user_id IS NOT NULL
GROUP BY
DATE(DATE_TRUNC('month', create))
ORDER BY
"COUNT_DISTINCT(user_id)" DESC
Sum of Users by Month
There is one error: the GROUP BY refers to create instead of crdate. Ordering by the quoted alias is allowed in PostgreSQL, so that part can stay. It should look like this:
SELECT
DATE(DATE_TRUNC('month', crdate)) AS "Month",
COUNT(DISTINCT user_id) AS "COUNT_DISTINCT(user_id)"
FROM
datasource
WHERE
user_id IS NOT NULL
GROUP BY
DATE(DATE_TRUNC('month', crdate))
ORDER BY
"COUNT_DISTINCT(user_id)" DESC
You can use your query as the basis for a window function:
CREATE TABLE datasource(crdate timestamp, user_id int);
WITH CTE AS (
SELECT
DATE_TRUNC('month',"crdate") as "Month",
COUNT(DISTINCT user_id) AS "COUNT_DISTINCT(user_id)"
FROM
datasource
WHERE
user_id IS NOT NULL
GROUP BY
DATE_TRUNC('month', "crdate")
)
SELECT "Month", SUM("COUNT_DISTINCT(user_id)") OVER (ORDER BY "Month") AS cumulative_sum
FROM CTE
Month | cumulative_sum
:---- | -------------:
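The empty result above comes from an empty table, so here is a small runnable sketch of the same CTE plus window function pattern with a few rows. It uses Python's built-in sqlite3 module purely as a stand-in for PostgreSQL: strftime('%Y-%m', ...) replaces DATE_TRUNC('month', ...), the sample rows are invented, and window functions are assumed to be available (SQLite 3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datasource (crdate TEXT, user_id INTEGER)")
conn.executemany(
    "INSERT INTO datasource VALUES (?, ?)",
    [
        ("2023-01-05", 1),
        ("2023-01-20", 2),
        ("2023-01-25", 1),     # same user again in January, counted once
        ("2023-02-03", 3),
        ("2023-03-10", 4),
        ("2023-03-11", 5),
        ("2023-03-12", None),  # dropped by the IS NOT NULL filter
    ],
)

# Monthly distinct users in a CTE, then a running total over the months.
# strftime('%Y-%m', ...) is the SQLite stand-in for DATE_TRUNC('month', ...).
rows = conn.execute(
    """
    WITH cte AS (
        SELECT strftime('%Y-%m', crdate) AS month,
               COUNT(DISTINCT user_id)   AS users
        FROM datasource
        WHERE user_id IS NOT NULL
        GROUP BY strftime('%Y-%m', crdate)
    )
    SELECT month, users, SUM(users) OVER (ORDER BY month) AS cumulative_sum
    FROM cte
    ORDER BY month
    """
).fetchall()

for row in rows:
    print(row)
```

Note that this running total adds up the monthly distinct counts, so a user who is active in two different months is counted once in each month.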

Month over Month percent change in user registrations

I am trying to write a query to find the month-over-month percent change in user registrations.
The users table holds the logs for user registrations:
user_id - pk, integer
created_at - account created date, varchar
activated_at - account activated date, varchar
state - active or pending, varchar
I found the number of users for each year and month. How do I find month over month percent change in user registration? I think I need a window function?
SELECT
EXTRACT(month from created_at::timestamp) as created_month
,EXTRACT(year from created_at::timestamp) as created_year
,count(distinct user_id) as number_of_registration
FROM users
GROUP BY 1,2
ORDER BY 1,2
This is the output of above query:
Then I wrote this to find the difference in user registrations compared with the previous year.
SELECT
*
,number_of_registration - lag(number_of_registration) over (partition by created_month) as difference_in_previous_year
FROM (
SELECT
EXTRACT(month from created_at::timestamp) as created_month
,EXTRACT(year from created_at::timestamp) as created_year
,count( user_id) as number_of_registration
FROM users as u
GROUP BY 1,2
ORDER BY 1,2) as temp
The output is this:
You want an order by clause that contains created_year.
number_of_registration
- lag(number_of_registration) over (partition by created_month order by created_year) as difference_in_previous_year
Note that you don't actually need a subquery for this. You can do:
select
extract(year from created_at) as created_year,
extract(month from created_at) as created_month,
count(*) as number_of_registration,
count(*) - lag(count(*)) over(partition by extract(month from created_at) order by extract(year from created_at)) as difference_in_previous_year
from users as u
group by created_year, created_month
order by created_year, created_month
I used count(*) instead of count(user_id), because I assume that user_id is not nullable (in which case count(*) is equivalent, and more efficient). The query also drops the cast to timestamp, which assumes created_at is stored as a proper timestamp; if the column is still a varchar, keep the ::timestamp cast.
These queries work as long as you have data for every month. If you have gaps, then the problem should be addressed differently - but this is not the question you asked here.
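To see why the ORDER BY inside the window matters, here is a small sketch run through Python's sqlite3 module as a stand-in for PostgreSQL (strftime replaces extract, the rows are made up, and SQLite 3.25 or later is assumed for window functions). Partitioning by month and ordering by year makes lag() return the same month of the previous year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany(
    "INSERT INTO users (created_at) VALUES (?)",
    [
        ("2013-01-10",), ("2013-01-15",),                 # 2 in Jan 2013
        ("2013-02-01",),                                  # 1 in Feb 2013
        ("2014-01-03",), ("2014-01-04",), ("2014-01-09",),  # 3 in Jan 2014
        ("2014-02-20",), ("2014-02-21",),                 # 2 in Feb 2014
    ],
)

# Count per (year, month), then compare each month with the same month
# one year earlier via lag() partitioned by month and ordered by year.
rows = conn.execute(
    """
    SELECT created_year, created_month, n,
           n - LAG(n) OVER (PARTITION BY created_month
                            ORDER BY created_year) AS difference_in_previous_year
    FROM (
        SELECT CAST(strftime('%Y', created_at) AS INTEGER) AS created_year,
               CAST(strftime('%m', created_at) AS INTEGER) AS created_month,
               COUNT(*) AS n
        FROM users
        GROUP BY strftime('%Y', created_at), strftime('%m', created_at)
    )
    ORDER BY created_year, created_month
    """
).fetchall()

for row in rows:
    print(row)
```

The 2013 rows have no earlier year in their partition, so their difference is NULL (None in Python).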
I can get the registrations for each year as two derived tables and join them, but that is not very efficient:
SELECT
t1.created_year as year_2013
,t2.created_year as year_2014
,t1.created_month as month_of_year
,t1.number_of_registration_2013
,t2.number_of_registration_2014
,100.0 * (t2.number_of_registration_2014 - t1.number_of_registration_2013) / t1.number_of_registration_2013 as percent_change_in_previous_year_month
FROM
(select
extract(year from created_at) as created_year
,extract(month from created_at) as created_month
,count(*) as number_of_registration_2013
from users
where extract(year from created_at) = '2013'
group by 1,2) t1
inner join
(select
extract(year from created_at) as created_year
,extract(month from created_at) as created_month
,count(*) as number_of_registration_2014
from users
where extract(year from created_at) = '2014'
group by 1,2) t2
on t1.created_month = t2.created_month
First off, why are you using strings to hold date/time values? Your first step should be to define created_at and activated_at as proper timestamps. The query below assumes this correction. If you do not correct it, cast the string to a timestamp in the CTE that generates the date range. But keep in mind that if you leave the columns as text, you will at some point get a conversion exception.
To calculate month-over-month change, use the formula "100*(Nt - Nl)/Nl", where Nt is the number of users this month and Nl is the number of users last month. There are 2 potential issues:
There are gaps in the data.
Nl is 0 (which would raise a divide-by-zero exception).
The following handles both by first generating every month from the earliest date to the latest date, then outer joining the monthly counts to the generated months. When Nl = 0 the query returns NULL, indicating that the percent change could not be calculated.
with full_range(the_month) as
(select generate_series(low_month, high_month, interval '1 month')
from (select min(date_trunc('month',created_at)) low_month
, max(date_trunc('month',created_at)) high_month
from users
) m
)
select to_char(the_month,'yyyy-mm')
, users_this_month
, case when users_last_month = 0
then null::float
else round((100.00*(users_this_month-users_last_month)/users_last_month),2)
end percent_change
from (
select the_month, users_this_month , lag(users_this_month) over(order by the_month) users_last_month
from ( select f.the_month, count(u.created_at) users_this_month
from full_range f
left join users u on date_trunc('month',u.created_at) = f.the_month
group by f.the_month
) mc
) pc
order by the_month;
NOTE: There are several places where the above can be shortened, but the longer form is intentional to show how the final values are derived.
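As a cross-check of the logic (not of the SQL itself), the same steps can be sketched in plain Python: build the full month range, count registrations per month with gaps filled as 0, and apply 100*(Nt - Nl)/Nl, returning None where there is no previous month or Nl is 0. The function names and the sample dates are made up for illustration.

```python
from collections import Counter
from datetime import date


def month_range(first, last):
    """Yield the first day of every month from first to last, inclusive."""
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        yield date(year, month, 1)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


def percent_changes(created_dates):
    """Return (yyyy-mm, users_this_month, percent_change) for every month."""
    counts = Counter(d.replace(day=1) for d in created_dates)
    result = []
    previous = None  # no "last month" before the first month
    for month in month_range(min(counts), max(counts)):
        current = counts.get(month, 0)  # months without rows count as 0
        if previous is None or previous == 0:
            change = None  # undefined: nothing to compare with, or divide by 0
        else:
            change = round(100.0 * (current - previous) / previous, 2)
        result.append((month.strftime("%Y-%m"), current, change))
        previous = current
    return result


signups = [
    date(2014, 1, 5), date(2014, 1, 20),
    date(2014, 2, 1), date(2014, 2, 2), date(2014, 2, 3),
    date(2014, 4, 9),  # March has no registrations at all
]
table = percent_changes(signups)
for row in table:
    print(row)
```

March shows up with 0 users and a change of -100.0, and April gets None because the previous month's count is 0.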

SQL subquery using group by item from main query

I have a table with a created timestamp and id identifier.
I can get number of unique id's per week with:
SELECT date_trunc('week', created)::date AS week, count(distinct id)
FROM my_table
GROUP BY week ORDER BY week;
Now I want the accumulated number of unique ids per week, something like this:
SELECT date_trunc('week', created)::date AS week, count(distinct id),
(SELECT count(distinct id)
FROM my_table
WHERE date_trunc('week', created)::date <= week) as acc
FROM my_table
GROUP BY week ORDER BY week;
But that doesn't work, as week is not accessible in the sub select (ERROR: column "week" does not exist).
How do I solve this?
I'm using PostgreSQL
Use a cumulative aggregation. But, I don't think you need the distinct, so:
SELECT date_trunc('week', created)::date AS week, count(*) as cnt,
SUM(COUNT(*)) OVER (ORDER BY MIN(created)) as running_cnt
FROM my_table
GROUP BY week
ORDER BY week;
In any case, as you've phrased the problem, you can change cnt to use count(distinct). Your subquery is not using distinct at all.
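Whether distinct matters depends on whether the same id can appear in more than one week. The sketch below, run through Python's sqlite3 module as a stand-in for PostgreSQL, contrasts a running sum of weekly distinct counts with a running count of ids by the week they first appear. The sample rows, the CTE names, and the "count each id in its first week" approach are illustrative assumptions, not part of the answers here; date(created, 'weekday 0', '-6 days') is a SQLite substitute for date_trunc('week', created), and SQLite 3.25 or later is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, created TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [
        (1, "2024-01-01"),  # week starting Monday 2024-01-01
        (2, "2024-01-03"),
        (1, "2024-01-09"),  # id 1 again, week starting 2024-01-08
        (3, "2024-01-10"),
        (2, "2024-01-16"),  # id 2 again, week starting 2024-01-15
    ],
)

rows = conn.execute(
    """
    WITH weekly AS (
        -- Monday of the week each row falls in
        SELECT date(created, 'weekday 0', '-6 days') AS week, id
        FROM my_table
    ),
    first_seen AS (
        -- the first week in which each id appears
        SELECT id, MIN(week) AS week FROM weekly GROUP BY id
    ),
    per_week AS (
        SELECT w.week,
               COUNT(DISTINCT w.id) AS ids_in_week,
               (SELECT COUNT(*) FROM first_seen f WHERE f.week = w.week) AS new_ids
        FROM weekly w
        GROUP BY w.week
    )
    SELECT week, ids_in_week,
           SUM(ids_in_week) OVER (ORDER BY week) AS running_sum,
           SUM(new_ids)     OVER (ORDER BY week) AS running_distinct
    FROM per_week
    ORDER BY week
    """
).fetchall()

for row in rows:
    print(row)
```

With repeated ids the two running columns diverge: running_sum counts id 1 and id 2 twice, while running_distinct stops at 3.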
CTEs or a temp table should fix your problem. Here is an example using CTEs.
WITH abc AS (
SELECT date_trunc('week', created)::date AS week, count(distinct id) as IDCount
FROM my_table
GROUP BY week
)
SELECT abc.week, abc.IDCount,
(SELECT count(*)
FROM my_table
WHERE date_trunc('week', created)::date <= abc.week) as acc
FROM abc
ORDER BY abc.week;
Hope this helps

Unique values per time period

In my table trips, I have two columns: created_at and user_id.
My goal is to count unique user_ids per month with a query in postgres. So far, I have written this - but it returns an error
SELECT user_id,
to_char(created_at, 'YYYY-MM') as t COUNT(*)
FROM (SELECT DISTINCT user_id
FROM trips) group by t;
How should I change this query?
The query is much simpler than that:
SELECT to_char(created_at, 'YYYY-MM') as yyyymm, COUNT(DISTINCT user_id)
FROM trips
GROUP BY yyyymm
ORDER BY yyyymm;

Check if timestamp is contained in date

I'm trying to check if a datetime falls on the current date, but I'm not able to do it.
This is my query:
select
date(timestamp) as event_date,
count(*)
from pixel_logs.full_logs f
where 1=1
where event_date = CUR_DATE()
How can I fix it?
Like Mikhail said, you need to use CURRENT_DATE(). Also, count(*) requires you to GROUP BY the date in your example. I do not know how your data is formatted, but one way to modify your query:
#standardSQL
WITH
table AS (
SELECT
1494977678 AS timestamp_secs) -- Current timestamp (in seconds)
SELECT
event_date,
COUNT(*) as count
FROM (
SELECT
DATE(TIMESTAMP_SECONDS(timestamp_secs)) AS event_date,
CURRENT_DATE()
FROM
table)
WHERE
event_date = CURRENT_DATE()
GROUP BY
event_date;
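The comparison the query performs can also be sketched outside SQL. This plain Python version assumes, like the example above, that events are stored as epoch seconds and that dates are taken in UTC; the function names are made up for illustration.

```python
from datetime import date, datetime, timezone


def event_date(timestamp_secs):
    """Convert epoch seconds to a calendar date in UTC."""
    return datetime.fromtimestamp(timestamp_secs, tz=timezone.utc).date()


def count_events_on(timestamps, day):
    """Count how many timestamps fall on the given calendar date."""
    return sum(1 for ts in timestamps if event_date(ts) == day)


# The sample value from the query above, plus one event a day later.
events = [1494977678, 1494977678 + 86400]
print(event_date(events[0]))
print(count_events_on(events, event_date(events[0])))

# Events that happened today, in UTC:
today = datetime.now(timezone.utc).date()
print(count_events_on(events, today))
```

Because the sample timestamp is fixed, it only matches the current date on the day it was generated, which is why the last call will normally print 0.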