Compare counts of two tables joined on week

I'm a newbie in SQL. I have two tables. I want to count the number of occurrences of one thing each week in the first, and of another thing each week in the second, and then compare them.
I already have the code for each count in two separate graphs, but I can't seem to join them.
My first count :
select
date_part('week',Table2.date at time zone 'utc' at time zone 'Europe/Paris') as week,
count(Table2.issue_solved) as count2
from Table2
where date is not null
group by week
order by week asc
My second count
select
date_part('week',Table1.activity_date at time zone 'utc' at time zone 'Europe/Paris') as week,
count(distinct Table1.activity_id) as count1
from Table1
left join X
on Y1 = Y2
left join W
on A1 = A2
and B1 = B2
where activity_dimensions.type in ('Training')
and acquisition_opportunity_dimensions.product_family = 'EHR'
and activity_dimensions.country = 'fr'
and activity_date::date >= date_trunc('[aggregation]', [daterange_start])
and activity_date::date <= [daterange_end]
and activity_date::date <= current_date
group by week
order by count_training_meetings desc
I tried to join the first code into the second with a join on week, but I can't seem to make this work.
Any idea?

I'm not sure whether Periscope allows full join, but if some weeks in your first data set (query) don't appear in the second one, or vice versa, you should use that operator to retrieve everything.
coalesce returns the first of its arguments that is not null.
In standard SQL, it should be something like this:
select
coalesce(q1.week, q2.week) as week,
count1,
count2
from
(
select
date_part('week',Table2.date at time zone 'utc' at time zone 'Europe/Paris') as week,
count(Table2.issue_solved) as count2
from Table2
where date is not null
group by week
) q1
full join
(
select
date_part('week',Table1.activity_date at time zone 'utc' at time zone 'Europe/Paris') as week,
count(distinct Table1.activity_id) as count1
from Table1
left join X
on Y1 = Y2
left join W
on A1 = A2
and B1 = B2
where activity_dimensions.type in ('Training')
and acquisition_opportunity_dimensions.product_family = 'EHR'
and activity_dimensions.country = 'fr'
and activity_date::date >= date_trunc('[aggregation]', [daterange_start])
and activity_date::date <= [daterange_end]
and activity_date::date <= current_date
group by week
) q2
on q1.week = q2.week
As I mentioned in earlier comments, date_part('week', ...) returns only the week number, so if your data spans more than one year, weeks from different years will be mixed together, which is probably wrong. This is just a suggestion.
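If the platform turns out not to support full join, one possible workaround (a different technique from the query above) is to stack both weekly counts with union all and aggregate once more. Below is a minimal sketch using Python's built-in sqlite3 module. The tables weekly1 and weekly2 and their rows are hypothetical stand-ins for the two grouped subqueries, and missing weeks come back as 0 rather than null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical pre-aggregated results standing in for the two grouped subqueries.
    create table weekly1 (week integer, count1 integer);
    create table weekly2 (week integer, count2 integer);
    insert into weekly1 values (1, 5), (2, 3);
    insert into weekly2 values (2, 4), (3, 7);
""")

# Stack both result sets, padding the other side's count with 0,
# then collapse to one row per week.
rows = conn.execute("""
    select week,
           sum(count1) as count1,
           sum(count2) as count2
    from (
        select week, count1, 0 as count2 from weekly1
        union all
        select week, 0 as count1, count2 from weekly2
    ) as stacked
    group by week
    order by week
""").fetchall()

print(rows)  # [(1, 5, 0), (2, 3, 4), (3, 0, 7)]
```

Week 1 exists only in the first set and week 3 only in the second, yet both appear in the result, which is the behaviour the full join is meant to provide.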

Related

Redshift - Adding dates (month interval) between two dates

Using Amazon Redshift.
Also have a dates table with all calendar dates that can be utilized.
Question: How can I take a start timestamp (created_at) and end timestamp (ended_at) and add a column that adds 1 month to the start timestamp until the end timestamp.
I have a table with:
user_id,
plan_id,
created_at,
ended_at, (can be null)
So if I had a created_at timestamp of 2019-07-11, I would have a column with additional rows for 2019-08-11, 2019-09-11, 2019-10-11, etc. The goal is to associate the monthly amounts paid by a user to the dates when starting with only a start and end date.
EDIT:
I used the below query which works when an ended_at timestamp is present, however, when it is null, I need to have the next month populated until an ended_at timestamp is present.
select
ps.network_id,
ps.user_id,
ps.plan_id,
ps.created_at,
extract('day' from ps.created_at) as extract_day,
d.calendar_date,
ps.archived_at as ended_at,
ps.application_fee_percent,
pp.amount,
pp.interval,
pp.name
from payments_subscriptions ps
left outer join dates d on extract('day' from date_trunc('day',d.calendar_date)) = extract('day' from ps.created_at) AND date_trunc('day',d.calendar_date) >= date_trunc('day',ps.created_at) AND date_trunc('day',d.calendar_date) < date_trunc('day',ps.archived_at)
left outer join payments_plans pp on ps.plan_id = pp.id
where ps.network_id = '1318990'
and ps.user_id = '2343404'
order by 3,6 desc
output from above query - subscription with null ended_at needs to continue until ended_at is present
Use dateadd function for increasing time/date in timestamp
https://docs.aws.amazon.com/redshift/latest/dg/r_DATEADD_function.html
For increasing one month use this:
DATEADD(month, 1, CURRENT_TIMESTAMP)
For anyone looking for a potential solution, I ended up joining my dates table in this fashion:
LEFT OUTER JOIN dates d ON extract('day' FROM date_trunc('day',d.calendar_date)) = extract('day' FROM payments_subscriptions.created_at)
AND date_trunc('day',d.calendar_date) >= date_trunc('day',payments_subscriptions.created_at)
AND date_trunc('day',d.calendar_date) < date_trunc('day',getdate())
and this where clause:
WHERE (calendar_date < date_trunc('day',payments_subscriptions.archived_at) OR payments_subscriptions.archived_at is null)

Postgresql left join date_trunc with default values

I have 3 tables which I'm querying to get the data based on different conditions. I have from and to params and these are the ones I'm using to create a range of time in which I'm looking for the data in those tables.
For instance, if from equals '2020-07-01' and to equals '2020-08-01', I expect to receive the row values of the tables grouped by week. If some weeks don't have records I want to return 0, and if several tables have records for the same week I'd like to sum them.
Currently I have this:
SELECT d.day, COALESCE(t.total, 0)
FROM (
SELECT day::date
FROM generate_series(timestamp '2020-07-01',
timestamp '2020-08-01',
interval '1 week') day
) d
LEFT JOIN (
SELECT date AS day,
SUM(total)
FROM table1
WHERE id = '1'
AND date BETWEEN '2020-07-01' AND '2020-08-01'
GROUP BY day
) t USING (day)
ORDER BY d.day;
I'm generating a series of dates one week apart, and on top of that I'm adding a left join. For some reason, it only works if the dates match exactly; otherwise COALESCE(t.total, 0) returns 0 even if the SUM(total) for that week is not 0.
I apply other left joins to other tables in the same query in the same way, so I run into the same problem there.
Please see if this works for you. Whenever you find yourself aggregating more than once, ask yourself whether it is necessary.
Rather than try to match on discrete days, use time ranges.
with limits as (
select '2020-07-01'::timestamp as dt_start,
'2020-08-01'::timestamp as dt_end
), weeks as (
SELECT x.day::date as day, least(x.day::date + 7, dt_end::date) as day_end
FROM limits l
CROSS JOIN LATERAL
generate_series(l.dt_start, l.dt_end, interval '1 week') as x(day)
WHERE x.day::date != least(x.day::date + 7, dt_end::date)
), t1 as (
select w.day,
sum(coalesce(t.total, 0)) as t1total
from weeks w
left join table1 t
on t.id = 1
and t.date >= w.day
and t.date < w.day_end
group by w.day
), t2 as (
select w.day,
sum(coalesce(t.sum_measure, 0)) as t2total
from weeks w
left join table2 t
on t.something = 'whatever'
and t.date >= w.day
and t.date < w.day_end
group by w.day
)
select t1.day,
t1.t1total,
t2.t2total
from t1
join t2 on t2.day = t1.day;
You can keep adding tables like that with CTEs.
My earlier example with multiple left join was bad because it blows out the rows due to a lack of join conditions between the left-joined tables.
There is an interesting corner case for e.g. 2019-02-01 to 2019-03-01 which returns an empty interval as the last week. I have updated to filter that out.
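To see where that empty interval comes from, here is a small Python sketch that mimics the weeks CTE, assuming generate_series includes the end value when it falls exactly on a step. The helper week_ranges is hypothetical and exists only for illustration.

```python
from datetime import date, timedelta

def week_ranges(start, end):
    """Mimic the weeks CTE: weekly steps from start to end (inclusive),
    with each range end capped at the overall end date."""
    ranges = []
    day = start
    while day <= end:
        day_end = min(day + timedelta(days=7), end)
        ranges.append((day, day_end))
        day += timedelta(days=7)
    return ranges

# February 2019 has exactly 28 days, so the series lands on the end date itself.
all_ranges = week_ranges(date(2019, 2, 1), date(2019, 3, 1))

# Equivalent of the WHERE clause that filters out the empty interval.
non_empty = [(s, e) for s, e in all_ranges if s != e]

print(all_ranges[-1])  # (datetime.date(2019, 3, 1), datetime.date(2019, 3, 1))
print(len(all_ranges), len(non_empty))  # 5 4
```

With 2020-07-01 to 2020-08-01 the last step is 2020-07-29, which is capped at 2020-08-01, so no range is empty and the filter removes nothing.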

Postgres: Return zero as default for rows where there is no match

I am trying to get all the paid contracts from my contracts table and group them by month. I can get the data but for months where there is no new paid contract I want to get a zero instead of missing month. I have tried coalesce and generate_series but I cannot seem to get the missing row.
Here is my query:
with months as (
select generate_series(
'2019-01-01', current_date, interval '1 month'
) as series )
select date(months.series) as day, SUM(contracts.price) from months
left JOIN contracts on date(date_trunc('month', contracts.to)) = months.series
where contracts.tier='paid' and contracts.trial=false and (contracts.to is not NULL) group by day;
I want the results to look like:
|Contract Value| Month|
| 20 | 01-2020|
| 10 | 02-2020|
| 0 | 03-2020|
I can get the rows where there is a contract but cannot get the zero row.
Postgres Version 10.9
I think that you want:
with months as (
select generate_series('2019-01-01', current_date, interval '1 month' ) as series
)
select m.series as day, coalesce(sum(c.price), 0) sum_price
from months m
left join contracts c
on c.to >= m.series
and c.to < m.series + interval '1' month
and c.tier = 'paid'
and not c.trial
group by m.series;
That is:
you want the condition on the left joined table in the on clause of the join rather than in the where clause, otherwise they become mandatory, and evict rows where the left join came back empty
the filter on the date can be optimized to avoid using date functions; this makes the query SARGeable, ie the database may take advantage of an index on the date column
table aliases make the query easier to read and write
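The first point can be checked with a small experiment. This is a sketch using Python's built-in sqlite3 module rather than Postgres, so generate_series is replaced by a hypothetical months table, the column to is renamed ends_on, and SQLite's date(..., '+1 month') stands in for the interval arithmetic. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data: March has no paid contract.
    create table months (series text);
    create table contracts (ends_on text, price integer, tier text);
    insert into months values ('2020-01-01'), ('2020-02-01'), ('2020-03-01');
    insert into contracts values
        ('2020-01-15', 20, 'paid'),
        ('2020-02-10', 10, 'paid'),
        ('2020-03-05', 99, 'free');
""")

# Filter in WHERE: months without a matching paid contract are discarded.
where_rows = conn.execute("""
    select m.series, coalesce(sum(c.price), 0)
    from months m
    left join contracts c
      on c.ends_on >= m.series
     and c.ends_on < date(m.series, '+1 month')
    where c.tier = 'paid'
    group by m.series
    order by m.series
""").fetchall()

# Same filter in ON: every month survives, and coalesce turns the null sum into 0.
on_rows = conn.execute("""
    select m.series, coalesce(sum(c.price), 0)
    from months m
    left join contracts c
      on c.ends_on >= m.series
     and c.ends_on < date(m.series, '+1 month')
     and c.tier = 'paid'
    group by m.series
    order by m.series
""").fetchall()

print(where_rows)  # [('2020-01-01', 20), ('2020-02-01', 10)]
print(on_rows)     # [('2020-01-01', 20), ('2020-02-01', 10), ('2020-03-01', 0)]
```

Only the version with the tier condition in the on clause returns the zero row for March.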
You need to move conditions to the on clause:
with months as (
select generate_series( '2019-01-01'::date, current_date, interval '1 month') as series
)
select m.series as day, coalesce(sum(c.price), 0)
from months m left join
contracts c
on c.to >= m.series and
c.to < m.series + interval '1 month' and
c.tier = 'paid' and
c.trial = false
group by day;
Note some changes to the query:
The conditions on c that were in the where clause are in the on clause.
The date comparison uses simple data comparisons, rather than truncating to the month. This helps the optimizer and makes it easier to use an index.
Table aliases make the query easier to write and to read.
There is no need to convert day to a date. It already is.
to is a bad choice for a column name because it is reserved. However, I did not change it.

Subtraction of counts of 2 tables

I have 2 different tables, A and B. A holds something like created records and B holds removed records.
I want to obtain the net difference of the counts per week in an SQL query.
Currently I have
SELECT DATE_TRUNC('week', TIMESTAMP AT time ZONE '+08') AS Week,
Count(id) AS "A - New"
FROM table_name.A
GROUP BY 1
ORDER BY 1
This gets me the count per week for table A only. How could I incorporate the logic of subtracting the same Count(id) from B, for the same timeframe?
Thanks! :)
The potential issue here is that for any week you might only have additions or removals, so to align a count from the 2 tables - by week - an approach would be to use a full outer join, like this:
SELECT COALESCE(a.week, b.week) as week
, count_a
, count_b
, COALESCE(count_a,0) - COALESCE(count_b,0) net
FROM (
SELECT DATE_TRUNC('week', TIMESTAMP AT time ZONE '+08') AS week
, Count(*) AS count_a
FROM table_a
GROUP BY DATE_TRUNC('week', TIMESTAMP AT time ZONE '+08')
) a
FULL OUTER JOIN (
SELECT DATE_TRUNC('week', TIMESTAMP AT time ZONE '+08') AS week
, Count(*) AS count_b
FROM table_b
GROUP BY DATE_TRUNC('week', TIMESTAMP AT time ZONE '+08')
) b on a.week = b.week
The usual syntax for subtracting values from 2 queries is as follows:
Select (Query1) - (Query2) from dual;
Assuming both tables have the same number of ids in the 'id' column and your query works for table A, the following query subtracts the count(id) of one table from the other.
select(SELECT DATE_TRUNC('week', TIMESTAMP AT time ZONE '+08') AS Week,
Count(id) AS "A - New" FROM table_name.A GROUP BY 1 ORDER BY 1) - (SELECT DATE_TRUNC('week', TIMESTAMP AT time ZONE '+08') AS Week,
Count(id) AS "B - New" FROM table_name.B GROUP BY 1 ORDER BY 1) from dual
Or you can also try the following approach
Select c1-c2 from(Query1 count()as c1),(Query2 count() as c2);
So your query will be like
Select c1-c2 from (SELECT DATE_TRUNC('week', TIMESTAMP AT time ZONE '+08') AS Week, Count(id) AS c1 FROM table_name.A GROUP BY 1 ORDER BY 1),(SELECT DATE_TRUNC('week', TIMESTAMP AT time ZONE '+08') AS Week, Count(id) AS c2 FROM table_name.B GROUP BY 1 ORDER BY 1);

MySQL date range SELECT + JOIN query using column with CURRENT_TIMESTAMP

I am using this query:
SELECT p.id, count(clicks.ip)
FROM `p`
LEFT JOIN c clicks ON p.id = clicks.pid
WHERE clicks.ip = '111.222.333.444'
To select clicks from table "c", that has "pid" = "p.id". The query seems to work fine, but now I want to use that query with date ranges. The "c" table has a column "time" that uses MySQL CURRENT_TIMESTAMP data type (YYYY-MM-DD HH:MM:SS). How can I use my query with date range using that column?
I want to be able to select count(clicks.ip) from a specific day, and also group the results by hour (but this is for a different query).
Use:
SELECT p.id,
COUNT(clicks.ip)
FROM `p`
LEFT JOIN c clicks ON clicks.pid = p.id
AND clicks.ip = '111.222.333.444'
AND clicks.time BETWEEN DATE_SUB(NOW(), INTERVAL 1 DAY)
AND NOW()
GROUP BY p.id
I provided an example that will count clicks that occurred between this time yesterday (DATE_SUB(NOW(), INTERVAL 1 DAY)) and today (NOW()). Mind that BETWEEN is inclusive.
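Because BETWEEN includes both endpoints, a row stamped exactly at the upper bound is counted. When ranges for consecutive days must not overlap, a half-open range (>= start and < end) may be preferable. Here is a minimal sketch with Python's built-in sqlite3 module instead of MySQL. The table c and its rows are made-up sample data, and fixed timestamps replace NOW() so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical clicks, one of them exactly on the upper bound.
    create table c (pid integer, time text);
    insert into c values
        (1, '2020-01-01 00:00:00'),
        (1, '2020-01-01 12:30:00'),
        (1, '2020-01-02 00:00:00');
""")

# BETWEEN is inclusive on both ends, so the midnight row of the next day is counted.
between_count = conn.execute("""
    select count(*) from c
    where time between '2020-01-01 00:00:00' and '2020-01-02 00:00:00'
""").fetchone()[0]

# A half-open range excludes the upper bound.
half_open_count = conn.execute("""
    select count(*) from c
    where time >= '2020-01-01 00:00:00'
      and time <  '2020-01-02 00:00:00'
""").fetchone()[0]

print(between_count, half_open_count)  # 3 2
```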