How to force zero values in Redshift? - sql

If this is my query:
select
(min(timestamp))::date as date,
count(distinct user_id) as user_id_count,
(row_number() over (order by signup_day desc)-1) as days_since
from
data.table
where
timestamp >= current_date - 3
group by
timestamp
order by
timestamp asc;
And these are my results:
date | user_id_count | days_since
------------+-----------------+-------------
2018-01-22 | 3 | 1
2018-01-23 | 5 | 0
How can I get the table to also show the dates where the user ID count is 0, like this?
date | user_id_count | days_since
------------+-----------------+-------------
2018-01-21 | 0 | 0
2018-01-22 | 3 | 1
2018-01-23 | 5 | 0

You need to generate the dates. In Postgres, generate_series() is the way to go:
select g.ts::date as dte,
count(distinct t.user_id) as user_id_count,
row_number() over (order by g.ts desc) - 1 as days_since
from generate_series(current_date::timestamp - interval '3 day', current_date::timestamp, interval '1 day') g(ts) left join
data.table t
on t.timestamp::date = g.ts
group by g.ts
order by g.ts;
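The same "generate the dates, then left join" idea can be tried end to end without a Redshift or Postgres instance. The sketch below is only an illustration under stated assumptions: it uses SQLite through Python's sqlite3 module, builds the date list with a recursive CTE instead of generate_series(), uses a hypothetical events table in place of data.table, and fixes "today" to 2018-01-23 so the output is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for data.table: one row per user event.
conn.execute("CREATE TABLE events (ts TEXT, user_id INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("2018-01-22 09:00:00", u) for u in (1, 2, 3)]
    + [("2018-01-23 10:00:00", u) for u in (1, 2, 3, 4, 5)],
)

sql = """
WITH RECURSIVE days(d) AS (
    SELECT date(:today, '-2 days')              -- first day of the window
    UNION ALL
    SELECT date(d, '+1 day') FROM days WHERE d < :today
)
SELECT days.d                    AS dte,
       count(DISTINCT e.user_id) AS user_id_count,
       CAST(julianday(:today) - julianday(days.d) AS INTEGER) AS days_since
FROM days
LEFT JOIN events e ON date(e.ts) = days.d       -- days without events survive the join
GROUP BY days.d
ORDER BY days.d
"""

rows = conn.execute(sql, {"today": "2018-01-23"}).fetchall()
for row in rows:
    print(row)
```

The row for 2018-01-21 comes out with a count of 0 because count() ignores the NULL user_id produced by the left join.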

You have to create a "calendar" table with dates and left join your aggregated result to it, like this:
with
aggregated_result as (
select ...
)
select
t1.date
,coalesce(t2.user_id_count,0) as user_id_count
,coalesce(t2.days_since,0) as days_since
from calendar t1
left join aggregated_result t2
using (date)
More on creating a calendar table: How do I create a dates table in Redshift?

Related

how to know the changed name in table by date_key

I have a table with 3 columns:
Date_key | user_name | user_id
2022-07-12 | milkcotton | 1
2022-09-12 | cereal | 2
2022-06-12 | musicbox1 | 3
2022-12-31 | harrybel1 | 1
2022-12-25 | milkcotton1| 4
2023-01-01 | cereal | 2
I want to find the users who changed their user_name within one semester (1 July 2022 to 31 December 2022). Can I do this?
My expected result is:
previous_name| new_name | user_id
milkcotton | harrybel1 | 1
Thank you!
Note: This was written for PostgreSQL. It should be similar in most SQL engines, although date functions may differ slightly.
Try this:
with BaseTbl as(
select *,
cast(to_char(Date_key, 'YYYYMM') as int) as year_month,
cast(to_char(Date_key, 'MM') as int) as month,
row_number() over(partition by user_id order by date_key desc) as rnk
from Table1
),
LatestTwoChanges as(
select *
from BaseTbl
where user_id in (select user_id from BaseTbl where rnk=2 )
and rnk <=2
)
select
t2.user_name as previous_name,
t1.user_name as new_name,
t1.user_id
from LatestTwoChanges t1
join LatestTwoChanges t2
on t1.user_id=t2.user_id
where t1.rnk=1
and t2.rnk=2
and t1.year_month-t2.year_month <6
and t1.user_name <> t2.user_name
and (t1.month + t2.month <= 12 or t1.month + t2.month >=14 )
-- this is to check whether the date falling in the same semester.
SQL fiddle demo Here
Here, t1 holds the latest row and t2 holds the previous row for each user_id.
The last filter condition
and (t1.month + t2.month <= 12 or t1.month + t2.month >=14 )
is meant to check that the two dates fall in the same semester, that is, both months lie between 1 and 6 or both lie between 7 and 12.
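As a cross-check, the expected row from the question can also be reproduced with a different technique than the one above: filter to the semester first, then compare each row with the previous one using lag(). This is a minimal sketch, assuming SQLite (3.25 or later, for window functions) through Python's sqlite3 module, dates stored as ISO text, and the semester bounds passed as the hypothetical parameters :sem_start and :sem_end.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (date_key TEXT, user_name TEXT, user_id INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        ("2022-07-12", "milkcotton", 1),
        ("2022-09-12", "cereal", 2),
        ("2022-06-12", "musicbox1", 3),
        ("2022-12-31", "harrybel1", 1),
        ("2022-12-25", "milkcotton1", 4),
        ("2023-01-01", "cereal", 2),
    ],
)

sql = """
WITH in_range AS (
    SELECT user_id, user_name, date_key,
           -- previous name of the same user, looking only inside the semester
           lag(user_name) OVER (PARTITION BY user_id ORDER BY date_key) AS previous_name
    FROM table1
    WHERE date_key BETWEEN :sem_start AND :sem_end
)
SELECT previous_name, user_name AS new_name, user_id
FROM in_range
WHERE previous_name IS NOT NULL
  AND previous_name <> user_name
ORDER BY user_id, date_key
"""

changes = conn.execute(
    sql, {"sem_start": "2022-07-01", "sem_end": "2022-12-31"}
).fetchall()
print(changes)
```

Because the WHERE clause runs before the window function, rows outside the semester never become a "previous" row, so only user 1 is reported.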

Count if previous month data exists postgres

I'm stuck on a query that counts, for each month, the ids that also exist in the previous month.
My table looks like this:
date | id |
2020-02-02| 1 |
2020-03-04| 1 |
2020-03-04| 2 |
2020-04-05| 1 |
2020-04-05| 3 |
2020-05-06| 2 |
2020-05-06| 3 |
2020-06-07| 2 |
2020-06-07| 3 |
I'm stuck with this query:
SELECT date_trunc('month',date), id
FROM table
WHERE id IN
(SELECT DISTINCT id FROM table WHERE date
BETWEEN date_trunc('month', current_date) - interval '1 month' AND date_trunc('month', current_date))
The main problem is that I'm stuck with the current_date function. Is there a dynamic way to replace current_date? Thanks.
What I expect as a result is:
date | count |
2020-02-01| 0 |
2020-03-01| 1 |
2020-04-01| 1 |
2020-05-01| 1 |
2020-06-01| 2 |
Solution 1 with SELF JOIN
SELECT date_trunc('month', c.date) :: date AS date
, count(DISTINCT c.id) FILTER (WHERE p.date IS NOT NULL)
FROM test AS c
LEFT JOIN test AS p
ON c.id = p.id
AND date_trunc('month', c.date) = date_trunc('month', p.date) + interval '1 month'
GROUP BY date_trunc('month', c.date)
ORDER BY date_trunc('month', c.date)
Result :
date count
2020-02-01 0
2020-03-01 1
2020-04-01 1
2020-05-01 1
2020-06-01 2
Solution 2 with WINDOW FUNCTIONS
SELECT DISTINCT ON (date) date
, count(*) FILTER (WHERE count > 0 AND previous_month) OVER (PARTITION BY date)
FROM
( SELECT DISTINCT ON (id, date_trunc('month', date))
id
, date_trunc('month', date) AS date
, count(*) OVER w AS count
, first_value(date_trunc('month', date)) OVER w = date_trunc('month', date) - interval '1 month' AS previous_month
FROM test
WINDOW w AS (PARTITION BY id ORDER BY date_trunc('month', date) GROUPS BETWEEN 1 PRECEDING AND 1 PRECEDING)
) AS a
Result :
date count
2020-02-01 0
2020-03-01 1
2020-04-01 1
2020-05-01 1
2020-06-01 2
see dbfiddle
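For readers who want to run the self-join idea locally, here is a minimal sketch, assuming SQLite through Python's sqlite3 module rather than PostgreSQL. The date column is renamed to day, months are derived with date(day, 'start of month'), and the (id, month) pairs are de-duplicated first, so a plain count() replaces the FILTER clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (day TEXT, id INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [
        ("2020-02-02", 1),
        ("2020-03-04", 1), ("2020-03-04", 2),
        ("2020-04-05", 1), ("2020-04-05", 3),
        ("2020-05-06", 2), ("2020-05-06", 3),
        ("2020-06-07", 2), ("2020-06-07", 3),
    ],
)

sql = """
WITH m AS (
    -- one row per id and month
    SELECT DISTINCT id, date(day, 'start of month') AS month
    FROM test
)
SELECT c.month, count(p.id) AS cnt
FROM m AS c
LEFT JOIN m AS p
       ON p.id = c.id
      AND p.month = date(c.month, '-1 month')   -- same id, previous month
GROUP BY c.month
ORDER BY c.month
"""

result = conn.execute(sql).fetchall()
for row in result:
    print(row)
```

No current_date is involved: every month is compared with its own predecessor through the join condition.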

Obtain Name Column Based on Value

I have a table that calculates the number of associated records that fit a criteria for each parent record. See example below:
note - morning, afternoon and evening are only weekdays
| id | morning | afternoon | evening | weekend |
| -- | ------- | --------- | ------- | ------- |
| 1 | 0 | 2 | 3 | 1 |
| 2 | 2 | 9 | 4 | 6 |
What I am trying to achieve is to determine which columns have the lowest value and get their column name as such:
| id | time_of_day |
| -- | ----------- |
| 1 | morning |
| 2 | afternoon |
Here is my current SQL code to result in the first table:
SELECT
leads.id,
COALESCE(morning, 0) morning,
COALESCE(afternoon, 0) afternoon,
COALESCE(evening, 0) evening,
COALESCE(weekend, 0) weekend
FROM leads
LEFT OUTER JOIN (
SELECT DISTINCT ON (lead_id) lead_id, COUNT(*) AS morning
FROM lead_activities
WHERE lead_activities.modality = 'Call' AND lead_activities.bound_type = 'outbound' AND extract('dow' from created_at) IN (0,1,2,3,4,5) AND (extract('hour' from created_at) >= 0 AND extract('hour' from created_at) < 12)
GROUP BY lead_id
) morning ON morning.lead_id = leads.id
LEFT OUTER JOIN (
SELECT DISTINCT ON (lead_id) lead_id, COUNT(*) AS afternoon
FROM lead_activities
WHERE lead_activities.modality = 'Call' AND lead_activities.bound_type = 'outbound' AND extract('dow' from created_at) IN (0,1,2,3,4,5) AND (extract('hour' from created_at) >= 12 AND extract('hour' from created_at) < 17)
GROUP BY lead_id
) afternoon ON afternoon.lead_id = leads.id
LEFT OUTER JOIN (
SELECT DISTINCT ON (lead_id) lead_id, COUNT(*) AS evening
FROM lead_activities
WHERE lead_activities.modality = 'Call' AND lead_activities.bound_type = 'outbound' AND extract('dow' from created_at) IN (0,1,2,3,4,5) AND (extract('hour' from created_at) >= 17 AND extract('hour' from created_at) < 25)
GROUP BY lead_id
) evening ON evening.lead_id = leads.id
LEFT OUTER JOIN (
SELECT DISTINCT ON (lead_id) lead_id, COUNT(*) AS weekend
FROM lead_activities
WHERE lead_activities.modality = 'Call' AND lead_activities.bound_type = 'outbound' AND extract('dow' from created_at) IN (6,7)
GROUP BY lead_id
) weekend ON weekend.lead_id = leads.id
You can use CASE/WHEN/ELSE to check for the specific conditions and produce different values. For example:
with
q as (
-- your query here
)
select
id,
case
when morning <= least(afternoon, evening, weekend) then 'morning'
when afternoon <= least(morning, evening, weekend) then 'afternoon'
when evening <= least(morning, afternoon, weekend) then 'evening'
else 'weekend'
end as time_of_day
from q
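The CASE expression can be exercised on a few rows to see how ties are resolved. This is a small sketch, assuming SQLite through Python's sqlite3 module, where the multi-argument min() plays the role of least(). The table q and its rows are made up for illustration and are not the data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical result of the counting query: one row per lead.
conn.execute(
    "CREATE TABLE q (id INTEGER, morning INTEGER, afternoon INTEGER,"
    " evening INTEGER, weekend INTEGER)"
)
conn.executemany(
    "INSERT INTO q VALUES (?, ?, ?, ?, ?)",
    [
        (1, 0, 2, 3, 1),   # morning is lowest
        (2, 5, 1, 4, 6),   # afternoon is lowest
        (3, 7, 7, 2, 2),   # evening and weekend tie
        (4, 9, 8, 7, 3),   # weekend is lowest
    ],
)

sql = """
SELECT id,
       CASE
           WHEN morning   <= min(afternoon, evening, weekend) THEN 'morning'
           WHEN afternoon <= min(morning, evening, weekend)   THEN 'afternoon'
           WHEN evening   <= min(morning, afternoon, weekend) THEN 'evening'
           ELSE 'weekend'
       END AS time_of_day
FROM q
ORDER BY id
"""

labels = conn.execute(sql).fetchall()
print(labels)
```

Because the branches are tested in order and use <=, a tie goes to the earlier column, so row 3 is labelled 'evening' rather than 'weekend'.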

Using window functions to count by groups of dates PostgreSQL

I have a table amongst whose columns are id and created_at and I want to use window functions around the created_at of each entry to count how many entries there are within 48 hours of them. As an example, for the original table:
id | created_at
----|------------
01 | 2016/01/04
02 | 2016/01/05
03 | 2016/01/05
04 | 2016/01/06
05 | 2016/01/07
06 | 2016/01/08
07 | 2016/01/08
08 | 2016/01/09
and the result should be
id | created_at | count
----|------------|-------
01 | 2016/01/04 | 4
02 | 2016/01/05 | 5
03 | 2016/01/05 | 5
04 | 2016/01/06 | 7
05 | 2016/01/07 | 7
06 | 2016/01/08 | 5
07 | 2016/01/08 | 5
08 | 2016/01/09 | 4
The explanation is that since there are 2 transactions on 2016/01/05, 1 on 2016/01/06, 1 on 2016/01/07, 2 on 2016/01/08, and 1 on 2016/01/09, there are a total of 7 transactions within 2 days of transaction 05.
It is better to use a date table that has consecutive dates, in case the dates in your table have gaps.
I am not sure what role the id column plays, so here is how I would do it without considering it:
select row_number() over (order by dt) as id
,dt as created_at
,cnt1+cnt2+cnt3+cnt4+cnt5 as cnt
from
(
select
date_table.dt
-- coalesce inside lag/lead so that dates with no rows count as 0 instead of NULL
,lag(coalesce(c.cnt,0),2,0) over (order by date_table.dt) as cnt1
,lag(coalesce(c.cnt,0),1,0) over (order by date_table.dt) as cnt2
,coalesce(c.cnt,0) as cnt3
,lead(coalesce(c.cnt,0),1,0) over (order by date_table.dt) as cnt4
,lead(coalesce(c.cnt,0),2,0) over (order by date_table.dt) as cnt5
from
date_table left join
(select created_at,count(*) as cnt from your_table group by created_at) c
on date_table.dt = c.created_at
) T
Using window functions for this purpose is challenging because of the duplicate days. You can get the results using a join or correlated subquery:
select t.*,
(select count(*)
from t t2
where t2.created_at between t.created_at - interval '2 day' and
t.created_at + interval '2 day'
) as cnt
from t;
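The correlated subquery can be checked against the expected counts from the question. Below is a minimal sketch, assuming SQLite through Python's sqlite3 module, with dates stored as ISO text and date(x, '-2 days') standing in for Postgres interval arithmetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        (1, "2016-01-04"), (2, "2016-01-05"), (3, "2016-01-05"),
        (4, "2016-01-06"), (5, "2016-01-07"), (6, "2016-01-08"),
        (7, "2016-01-08"), (8, "2016-01-09"),
    ],
)

sql = """
SELECT t.id,
       t.created_at,
       (SELECT count(*)
        FROM t AS t2
        WHERE t2.created_at BETWEEN date(t.created_at, '-2 days')
                                AND date(t.created_at, '+2 days')
       ) AS cnt
FROM t
ORDER BY t.id
"""

counts = [row[2] for row in conn.execute(sql)]
print(counts)
```

The printed counts, in id order, match the cnt column of the expected result table: 4, 5, 5, 7, 7, 5, 5, 4.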
EDIT:
You could use window functions by doing a cumulative sum by date and then joining back. This is, of course, a bit challenging because of holes in the dates. But, something like this:
with c as (
select d.dte, count(t.created_at) as cnt,
sum(count(t.created_at)) over (order by d.dte) as cumecnt
from (select generate_series(min(created_at) - interval '3 day',
max(created_at) + interval '2 day',
interval '1 day')
from t
) d(dte) left join
t
on d.dte = t.created_at
group by d.dte
)
-- cumulative count up to day + 2 minus cumulative count up to day - 3
-- leaves the rows in [day - 2, day + 2]
select t.*, cmax.cumecnt - cmin.cumecnt as cnt
from t join
c cmin
on t.created_at = cmin.dte + interval '3 day' join
c cmax
on t.created_at = cmax.dte - interval '2 day';

Select distinct users group by time range

I have a table with the following info
|date | user_id | week_beg | month_beg|
SQL to create table with test values:
CREATE TABLE uniques
(
date DATE,
user_id INT,
week_beg DATE,
month_beg DATE
);
INSERT INTO uniques VALUES ('2013-01-01', 1, '2012-12-30', '2013-01-01');
INSERT INTO uniques VALUES ('2013-01-03', 3, '2012-12-30', '2013-01-01');
INSERT INTO uniques VALUES ('2013-01-06', 4, '2013-01-06', '2013-01-01');
INSERT INTO uniques VALUES ('2013-01-07', 4, '2013-01-06', '2013-01-01');
INPUT TABLE:
| date | user_id | week_beg | month_beg |
| 2013-01-01 | 1 | 2012-12-30 | 2013-01-01 |
| 2013-01-03 | 3 | 2012-12-30 | 2013-01-01 |
| 2013-01-06 | 4 | 2013-01-06 | 2013-01-01 |
| 2013-01-07 | 4 | 2013-01-06 | 2013-01-01 |
OUTPUT TABLE:
| date | time_series | cnt |
| 2013-01-01 | D | 1 |
| 2013-01-01 | W | 1 |
| 2013-01-01 | M | 1 |
| 2013-01-03 | D | 1 |
| 2013-01-03 | W | 2 |
| 2013-01-03 | M | 2 |
| 2013-01-06 | D | 1 |
| 2013-01-06 | W | 1 |
| 2013-01-06 | M | 3 |
| 2013-01-07 | D | 1 |
| 2013-01-07 | W | 1 |
| 2013-01-07 | M | 3 |
I want to calculate the number of distinct user_ids for a date:
1. For that date
2. For that week up to that date (week to date)
3. For the month up to that date (month to date)
Item 1 is easy to calculate.
For 2 and 3 I am trying to use queries like these:
SELECT
date,
'W' AS "time_series",
(COUNT DISTINCT user_id) COUNT (user_id) OVER (PARTITION BY week_beg) AS "cnt"
FROM user_subtitles
SELECT
date,
'M' AS "time_series",
(COUNT DISTINCT user_id) COUNT (user_id) OVER (PARTITION BY month_beg) AS "cnt"
FROM user_subtitles
Postgres does not allow window functions for DISTINCT calculation, so this approach does not work.
I have also tried out a GROUP BY approach, but it does not work as it gives me numbers for whole week/months.
What's the best way to approach this problem?
Count all rows
SELECT date, '1_D' AS time_series, count(DISTINCT user_id) AS cnt
FROM uniques
GROUP BY 1
UNION ALL
SELECT DISTINCT ON (1)
date, '2_W', count(*) OVER (PARTITION BY week_beg ORDER BY date)
FROM uniques
UNION ALL
SELECT DISTINCT ON (1)
date, '3_M', count(*) OVER (PARTITION BY month_beg ORDER BY date)
FROM uniques
ORDER BY 1, time_series
Your columns week_beg and month_beg are 100 % redundant and can easily be replaced by
date_trunc('week', date + 1) - 1 and date_trunc('month', date) respectively.
Your week seems to start on Sunday (off by one), therefore the + 1 .. - 1.
The default frame of a window function with ORDER BY in the OVER clause is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. That's exactly what you need.
Use UNION ALL, not UNION.
Your unfortunate choice for time_series (D, W, M) does not sort well, I renamed to make the final ORDER BY easier.
This query can deal with multiple rows per day. Counts include all peers for a day.
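The point about the default frame and peers can be seen on a tiny example. The sketch below is an illustration only, assuming SQLite (3.25 or later) through Python's sqlite3 module, which uses the same default frame, and a hypothetical table u with two rows on the same day. The ROWS variant also orders by user_id purely to make its output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE u (day TEXT, user_id INTEGER)")
conn.executemany(
    "INSERT INTO u VALUES (?, ?)",
    [("2013-01-01", 1), ("2013-01-01", 2), ("2013-01-03", 3)],
)

sql = """
SELECT day, user_id,
       -- default frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW,
       -- so rows with the same day (peers) are all included
       count(*) OVER (ORDER BY day) AS range_cnt,
       -- explicit ROWS frame: stops at the current physical row
       count(*) OVER (ORDER BY day, user_id
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_cnt
FROM u
ORDER BY day, user_id
"""

frames = conn.execute(sql).fetchall()
for row in frames:
    print(row)
```

Both rows for 2013-01-01 get range_cnt = 2, while rows_cnt runs 1, 2, 3. This is why the running count with the default frame already includes all peers for a day.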
More about DISTINCT ON:
Select first row in each GROUP BY group?
DISTINCT users per day
To count every user only once per day, use a CTE with DISTINCT ON:
WITH x AS (SELECT DISTINCT ON (1,2) date, user_id FROM uniques)
SELECT date, '1_D' AS time_series, count(user_id) AS cnt
FROM x
GROUP BY 1
UNION ALL
SELECT DISTINCT ON (1)
date, '2_W'
,count(*) OVER (PARTITION BY (date_trunc('week', date + 1)::date - 1)
ORDER BY date)
FROM x
UNION ALL
SELECT DISTINCT ON (1)
date, '3_M'
,count(*) OVER (PARTITION BY date_trunc('month', date) ORDER BY date)
FROM x
ORDER BY 1, 2
DISTINCT users over dynamic period of time
You can always resort to correlated subqueries. Tend to be slow with big tables!
Building on the previous queries:
WITH du AS (SELECT date, user_id FROM uniques GROUP BY 1,2)
,d AS (
SELECT date
,(date_trunc('week', date + 1)::date - 1) AS week_beg
,date_trunc('month', date)::date AS month_beg
FROM uniques
GROUP BY 1
)
SELECT date, '1_D' AS time_series, count(user_id) AS cnt
FROM du
GROUP BY 1
UNION ALL
SELECT date, '2_W', (SELECT count(DISTINCT user_id) FROM du
WHERE du.date BETWEEN d.week_beg AND d.date )
FROM d
GROUP BY date, week_beg
UNION ALL
SELECT date, '3_M', (SELECT count(DISTINCT user_id) FROM du
WHERE du.date BETWEEN d.month_beg AND d.date)
FROM d
GROUP BY date, month_beg
ORDER BY 1,2;
SQL Fiddle for all three solutions.
Faster with dense_rank()
@Clodoaldo came up with a major improvement: use the window function dense_rank(). Here is another idea for an optimized version. It should be even faster to exclude daily duplicates right away. The performance gain grows with the number of rows per day.
Building on a simplified and sanitized data model
- without the redundant columns
- day as column name instead of date
date is a reserved word in standard SQL and a basic type name in PostgreSQL and shouldn't be used as identifier.
CREATE TABLE uniques(
day date -- instead of "date"
,user_id int
);
Improved query:
WITH du AS (
SELECT DISTINCT ON (1, 2)
day, user_id
,date_trunc('week', day + 1)::date - 1 AS week_beg
,date_trunc('month', day)::date AS month_beg
FROM uniques
)
SELECT day, count(user_id) AS d, max(w) AS w, max(m) AS m
FROM (
SELECT user_id, day
,dense_rank() OVER(PARTITION BY week_beg ORDER BY user_id) AS w
,dense_rank() OVER(PARTITION BY month_beg ORDER BY user_id) AS m
FROM du
) s
GROUP BY day
ORDER BY day;
SQL Fiddle demonstrating the performance of 4 faster variants. It depends on your data distribution which is fastest for you.
All of them are about 10x as fast as the correlated subqueries version (which isn't bad for correlated subqueries).
Without correlated subqueries. SQL Fiddle
with u as (
select
"date", user_id,
date_trunc('week', "date" + 1)::date - 1 week_beg,
date_trunc('month', "date")::date month_beg
from uniques
)
select
"date", count(distinct user_id) D,
max(week_dr) W, max(month_dr) M
from (
select
user_id, "date",
dense_rank() over(partition by week_beg order by user_id) week_dr,
dense_rank() over(partition by month_beg order by user_id) month_dr
from u
) s
group by "date"
order by "date"
Try
SELECT
*
FROM
(
SELECT dates, count(user_id), 'D' as timeseries FROM users_data GROUP BY dates
UNION
SELECT max(dates), count(user_id), 'W' FROM users_data GROUP BY date_part('year',dates), date_part('week',dates)
UNION
SELECT max(dates), count(user_id), 'M' FROM users_data GROUP BY date_part('year',dates), date_part('month',dates)
) tEMP order by dates, timeseries
SQLFIDDLE
Try queries like this
SELECT count(distinct user_id), to_char(date, 'YYYY-MM-DD') as date_period
FROM uniques
GROUP By date_period