The query only works when there is :: DATE.
-- Wrap the query you wrote in a CTE named reg_dates
WITH reg_dates AS (
SELECT
user_id,
MIN(order_date) AS reg_date
FROM orders
GROUP BY user_id)
SELECT
-- Count the unique user IDs by registration month
DATE_TRUNC('month', reg_date) :: DATE AS delivr_month,
COUNT(DISTINCT user_id) AS regs
FROM reg_dates
GROUP BY delivr_month
ORDER BY delivr_month ASC;
Why is that required? When I run the query below, without :: DATE, it does not work.
-- Wrap the query you wrote in a CTE named reg_dates
WITH reg_dates AS (
SELECT
user_id,
MIN(order_date) AS reg_date
FROM orders
GROUP BY user_id)
SELECT
-- Count the unique user IDs by registration month
DATE_TRUNC('month', reg_date) AS delivr_month,
COUNT(DISTINCT user_id) AS regs
FROM reg_dates
GROUP BY delivr_month
ORDER BY delivr_month ASC;
Highly likely your RDBMS is PostgreSQL, in your case the :: converts a date type of date, further, :: is represented as CAST(expression AS type).
Equally,
CAST (DATE_TRUNC('month', reg_date) AS DATE) AS delivr_month
What does "does not work" mean? Note that date_trunc in PostgreSQL returns a datetime. So if you need a date for your query to work, this is why you need ::date.
Related
I am trying to write a query to find month over month percent change in user registration. \
Users table has the logs for user registrations
user_id - pk, integer
created_at - account created date, varchar
activated_at - account activated date, varchar
state - active or pending, varchar
I found the number of users for each year and month. How do I find month over month percent change in user registration? I think I need a window function?
SELECT
EXTRACT(month from created_at::timestamp) as created_month
,EXTRACT(year from created_at::timestamp) as created_year
,count(distinct user_id) as number_of_registration
FROM users
GROUP BY 1,2
ORDER BY 1,2
This is the output of above query:
Then I wrote this to find the difference in user registration in the previous year.
SELECT
*
,number_of_registration - lag(number_of_registration) over (partition by created_month) as difference_in_previous_year
FROM (
SELECT
EXTRACT(month from created_at::timestamp) as created_month
,EXTRACT(year from created_at::timestamp) as created_year
,count( user_id) as number_of_registration
FROM users as u
GROUP BY 1,2
ORDER BY 1,2) as temp
The output is this:
You want an order by clause that contains created_year.
number_of_registration
- lag(number_of_registration) over (partition by created_month order by created_year) as difference_in_previous_year
Note that you don't actually need a subquery for this. You can do:
select
extract(year from created_at) as created_year,
extract(month from created_at) as created_year
count(*) as number_of_registration,
count(*) - lag(count(*)) over(partition by extract(month from created_at) order by extract(year from created_at))
from users as u
group by created_year, created_month
order by created_year, created_month
I used count(*) instead of count(user_id), because I assume that user_id is not nullable (in which case count(*) is equivalent, and more efficient). Casting to a timestamp is also probably superfluous.
These queries work as long as you have data for every month. If you have gaps, then the problem should be addressed differently - but this is not the question you asked here.
I can get the registrations from each year as two tables and join them. But it is not that effective
SELECT
t1.created_year as year_2013
,t2.created_year as year_2014
,t1.created_month as month_of_year
,t1.number_of_registration_2013
,t2.number_of_registration_2014
,(t2.number_of_registration_2014 - t1.number_of_registration_2013) / t1.number_of_registration_2013 * 100 as percent_change_in_previous_year_month
FROM
(select
extract(year from created_at) as created_year
,extract(month from created_at) as created_month
,count(*) as number_of_registration_2013
from users
where extract(year from created_at) = '2013'
group by 1,2) t1
inner join
(select
extract(year from created_at) as created_year
,extract(month from created_at) as created_month
,count(*) as number_of_registration_2014
from users
where extract(year from created_at) = '2014'
group by 1,2) t2
on t1.created_month = t2.created_month
First off, Why are you using strings to hold date/time values? Your 1st step should to define created_at, activated_at as a proper timestamps. In the resulting query I assume this correction. If this is faulty (you do not correct it) then cast the string to timestamp in the CTE generating the date range. But keep in mind that if you leave it as text you will at some point get a conversion exception.
To calculate month-over-month use the formula "100*(Nt - Nl)/Nl" where Nt is the number of users this month and Nl is the number of users last month. There are 2 potential issues:
There are gaps in the data.
Nl is 0 (would incur divide by 0 exception)
The following handles this by first generating the months between the earliest date to the latest date then outer joining monthly counts to the generated dates. When Nl = 0 the query returns NULL indication the percent change could not be calculated.
with full_range(the_month) as
(select generate_series(low_month, high_month, interval '1 month')
from (select min(date_trunc('month',created_at)) low_month
, max(date_trunc('month',created_at)) high_month
from users
) m
)
select to_char(the_month,'yyyy-mm')
, users_this_month
, case when users_last_month = 0
then null::float
else round((100.00*(users_this_month-users_last_month)/users_last_month),2)
end percent_change
from (
select the_month, users_this_month , lag(users_this_month) over(order by the_month) users_last_month
from ( select f.the_month, count(u.created_at) users_this_month
from full_range f
left join users u on date_trunc('month',u.created_at) = f.the_month
group by f.the_month
) mc
) pc
order by the_month;
NOTE: There are several places there the above can be shortened. But the longer form is intentional to show how the final vales are derived.
I want to find customers where for example, system by error registered duplicates of an order.
It's pretty easy, if reg_date is EXACTLY the same but I have no idea how to implement it in query to count as duplicate if for example there was up to 1 second difference between transactions.
select * from
(select customer_id, reg_date, count(*) as cnt
from orders
group by 1,2
) x where cnt > 1
Here is example dataset:
https://www.db-fiddle.com/f/m6PhgReSQbVWVZhqe8n4mi/0
CUrrently only customer's 104 orders are counted as duplicates because its reg_date is identical, I want to count also orders 1,2 and 4,5 as there's just 1 second difference
demo:db<>fiddle
SELECT
customer_id,
reg_date
FROM (
SELECT
*,
reg_date - lag(reg_date) OVER (PARTITION BY customer_id ORDER BY reg_date) <= interval '1 second' as is_duplicate
FROM
orders
) s
WHERE is_duplicate
Use the lag() window function. It allows to have a look hat the previous record. With this value you can do a diff and filter the records where the diff time is more than one second.
Try this following script. This will return you day/customer wise duplicates.
SELECT
TO_CHAR(reg_date :: DATE, 'dd/mm/yyyy') reg_date,
customer_id,
count(*) as cnt
FROM orders
GROUP BY
TO_CHAR(reg_date :: DATE, 'dd/mm/yyyy'),
customer_id
HAVING count(*) >1
In my table trips , I have two columns: created_at and user_id
My goal is to count unique user_ids per month with a query in postgres. So far, I have written this - but it returns an error
SELECT user_id,
to_char(created_at, 'YYYY-MM') as t COUNT(*)
FROM (SELECT DISTINCT user_id
FROM trips) group by t;
How should I change this query?
The query is much simpler than that:
SELECT to_char(created_at, 'YYYY-MM') as yyyymm, COUNT(DISTINCT user_id)
FROM trips
GROUP BY yyyymm
ORDER BY yyyymm;
I'm trying to check if a datetime is contained in current date, but I'm not veing able to do it.
This is my query:
select
date(timestamp) as event_date,
count(*)
from pixel_logs.full_logs f
where 1=1
where event_date = CUR_DATE()
How can I fix it?
Like Mikhail said, you need to use CURRENT_DATE(). Also, count(*) requires you to GROUP BY the date in your example. I do not know how your data is formatted, but one way to modify your query:
#standardSQL
WITH
table AS (
SELECT
1494977678 AS timestamp_secs) -- Current timestamp (in seconds)
SELECT
event_date,
COUNT(*) as count
FROM (
SELECT
DATE(TIMESTAMP_SECONDS(timestamp_secs)) AS event_date,
CURRENT_DATE()
FROM
table)
WHERE
event_date = CURRENT_DATE()
GROUP BY
event_date;
My PG query:
SELECT "Tracks"."PageId",
date_trunc("month", "Tracks"."createdAt") AS month,
count(*)
FROM "Tracks"
WHERE "Tracks"."PageId" IN (29,30,31)
GROUP BY month, "Tracks"."PageId"
and my schema:
id, createdAt, updatedAt, PageId
A bit confused as to why I'm receiving this error!
You can use an alias in the where or group by clause. You need to repeat the expression:
SELECT "Tracks"."PageId",
date_trunc('month', "Tracks"."createdAt") AS month,
count(*)
FROM "Tracks"
WHERE "Tracks"."PageId" IN (29,30,31)
GROUP BY date_trunc('month', "Tracks"."createdAt"), "Tracks"."PageId";
Note that the first parameter for date_trunc() is a varchar value, so you need to put that in single quotes, not double quotes.
If you don't want to repeat the expression you can put that into a derived table:
select "PageId", month, count(*)
from (
SELECT "Tracks"."PageId",
date_trunc('month', "Tracks"."createdAt") AS month
FROM "Tracks"
WHERE "Tracks"."PageId" IN (29,30,31)
) t
group by month, "PageId";
Unrelated, but: you should really avoid quoted identifiers. They are much more trouble then they are worth it