I have a scenario where there are cases, and multiple calls are made to customers against those cases. All these call logs are in a table which has the following columns:
id primary key int
case_id int
call_made_at timestamp
I have to find the number of new calls (the 1st call made for a case) in the last 7 days and the number of old calls (any call which is not the 1st call for its case) in the last 7 days.
I could use row_number() partitioned by case_id, but the lifetime of a case is short, so running a partition over the entire table seems wasteful. The table will also soon become huge.
Any suggestions?
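For reference, a minimal sketch of the row_number() approach the question describes (the call_logs table name is taken from the answer below):
-- A sketch, not a drop-in: rn = 1 marks the first call ever made for a case.
SELECT count(*) FILTER (WHERE rn = 1) AS new_calls,
       count(*) FILTER (WHERE rn > 1) AS old_calls
FROM (SELECT call_made_at,
             row_number() OVER (PARTITION BY case_id
                                ORDER BY call_made_at) AS rn
      FROM call_logs
     ) t
WHERE call_made_at >= current_date - interval '7 day';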
I see this as aggregation, not window functions:
select sum(case when min_cma >= current_date - interval '7 day' then 1 else 0 end) as new_last_7_days,
       sum(case when max_cma >= current_date - interval '7 day' and
                min_cma < current_date - interval '7 day'
           then 1 else 0 end) as old_last_7_days
from (select cl.case_id,
             min(call_made_at) as min_cma,
             max(call_made_at) as max_cma
      from call_logs cl
      group by cl.case_id
     ) cl;
You can add something like where max_cma >= current_date - interval '7 day' to the outer query. It will probably improve performance.
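Applied to the query above, that might look like this (with the filter in place, every surviving case already has a call in the window, so the "old" condition reduces to min_cma < the cutoff):
select sum(case when min_cma >= current_date - interval '7 day' then 1 else 0 end) as new_last_7_days,
       sum(case when min_cma < current_date - interval '7 day' then 1 else 0 end) as old_last_7_days
from (select cl.case_id,
             min(call_made_at) as min_cma,
             max(call_made_at) as max_cma
      from call_logs cl
      group by cl.case_id
     ) cl
where max_cma >= current_date - interval '7 day';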
My database table looks like this:
CREATE TABLE record
(
id INT,
status INT,
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (id)
);
And I want to create a generic query to get the count of records created in each 3-hour interval over the last day.
For example, over the last 1 day I want to know how many records were created in each 3-hour window.
What I have so far: with a little help from Stack Overflow I was able to create a query that calculates the count for a single full day.
SELECT
DATE(created_at) AS day, COUNT(1)
FROM
record
WHERE
created_at >= current_date - 1
GROUP BY
DATE(created_at)
This tells me that, say, 24 records were created over the full day, but I want to know how many were made in each 3-hour interval.
If you want the count for the last three hours of data:
select count(*)
from record
where created_at >= now() - interval '3 hour';
If you want the last day minus 3 hours, that would be 21 hours:
select count(*)
from record
where created_at >= now() - interval '21 hour';
EDIT:
You want intervals of 3 hours for the last 24 hours. The simplest method is probably generate_series():
select gs.ts, count(r.created_at)
from generate_series(now() - interval '24 hour',
                     now() - interval '3 hour',
                     interval '3 hour') gs(ts) left join
     record r
     on r.created_at >= gs.ts and
        r.created_at < gs.ts + interval '3 hour'
group by gs.ts
order by gs.ts;
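Note that buckets with no rows still show up with a count of 0, because count(r.created_at) ignores the NULLs produced by the left join.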
In my PostgreSQL database I have an invitations table with the following fields:
Invitations: id, created_at, completed_at (timestamp)
I am working to write a PostgreSQL query that returns the number of records completed between 2 and 7 days after the creation date.
Here is what I have so far:
SELECT round(count(i.completed_at <= i.created_at + interval '7 day'
                   and i.completed_at > i.created_at + interval '1 day')::decimal
             / count(DISTINCT i.id), 2) * 100 AS "CR in D2-D7"
FROM invitations i
The select statement is not returning the correct value. What am I doing wrong?
The expression you are feeding to the first COUNT yields a boolean, and never a NULL (unless its inputs are NULL). But COUNT counts every non-NULL input, so whether the expression returns true or false, the count still increments. There are many ways to fix this; a simple one (probably not the best, just the smallest typing difference from what you already have) is to use nullif to convert false to NULL inside the first COUNT.
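That minimal fix might look like this:
-- nullif() turns false into NULL, so COUNT skips those rows.
SELECT round(count(nullif(i.completed_at <= i.created_at + interval '7 day'
                      and i.completed_at > i.created_at + interval '1 day', false))::decimal
             / count(DISTINCT i.id), 2) * 100 AS "CR in D2-D7"
FROM invitations i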
But even then, is this correct? It seems odd that one COUNT has a DISTINCT and the other does not.
So a more complete solution may be something like:
SELECT
round(
count(distinct i.id) filter (where i.completed_at <= i.created_at + interval '7 day' and i.completed_at > i.created_at + interval '1 day')::decimal
/
count(DISTINCT i.id)
,2) * 100 AS "CR in D2-D7"
FROM invitations i
Just do this below:
SELECT *
FROM invitations i
WHERE i.completed_at <= i.created_at + interval '7 day'
  AND i.completed_at > i.created_at + interval '1 day'
Here is another way:
SELECT COUNT(*)
FROM invitations i
WHERE completed_at BETWEEN (created_at + '2 days'::interval) AND (created_at + '7 days'::interval);
I have a table (shown as an image in the original post) with User, Volume, and created (timestamp) columns. What I need is the average value of the Volume column, grouped by User, both over the last 1 hour and over the last 24 hours. How can I use avg with two different date ranges in a single query?
You can do it like:
SELECT user, AVG(Volume)
FROM mytable
WHERE created >= NOW() - interval '1 hour'
AND created <= NOW()
GROUP BY user
A few things to remember: this assumes you are executing the query on the same server, in the same time zone. You need to group by user to collect all the values in the Volume column, and then apply an aggregate function like avg to compute the average. If you need both ranges together, you could do the following:
SELECT u1.user, u1.average AS avg_1hour, u2.average AS avg_1day
FROM
(SELECT user, AVG(Volume) as average
FROM mytable
WHERE created >= NOW() - interval '1 hour'
AND created <= NOW()
GROUP BY user) AS u1
INNER JOIN
(SELECT user, AVG(Volume) as average
FROM mytable
WHERE created >= NOW() - interval '1 day'
AND created <= NOW()
GROUP BY user) AS u2
ON u1.user = u2.user
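Note that the INNER JOIN drops users who had activity in the last day but none in the last hour; the FILTER approach in the next answer keeps them.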
Use conditional aggregation. Postgres offers very convenient syntax using the FILTER clause:
SELECT user,
AVG(Volume) FILTER (WHERE created >= NOW() - interval '1 hour' AND created <= NOW()) as avg_1hour,
AVG(Volume) FILTER (WHERE created >= NOW() - interval '1 day' AND created <= NOW()) as avg_1day
FROM mytable
WHERE created >= NOW() - interval '1 DAY' AND
created <= NOW()
GROUP BY user;
This will filter out users who have had no activity in the past day. If you want all users -- even those with no recent activity -- remove the WHERE clause.
The more traditional method uses CASE:
SELECT user,
AVG(CASE WHEN created >= NOW() - interval '1 hour' AND created <= NOW() THEN Volume END) as avg_1hour,
AVG(CASE WHEN created >= NOW() - interval '1 day' AND created <= NOW() THEN Volume END) as avg_1day
. . .
SELECT User, AVG(Volume),
       (CASE WHEN created < NOW() - INTERVAL '1 hour' THEN 1 ELSE 0 END) AS IntervalType
FROM mytable
WHERE created >= NOW() - INTERVAL '24 hour'
GROUP BY User, (CASE WHEN created < NOW() - INTERVAL '1 hour' THEN 1 ELSE 0 END)
Please tell me about its result :)
For large datasets, which option is better: multiple selects or CASE?
CASE EXAMPLE:
SELECT SUM(CASE WHEN(created_at > (CURRENT_DATE - INTERVAL '1 days')) THEN 1 ELSE 0 END) as day_count,
SUM(CASE WHEN(created_at > (CURRENT_DATE - INTERVAL '1 months')) THEN 1 ELSE 0 END) as month_count,
SUM(CASE WHEN(created_at > (CURRENT_DATE - INTERVAL '3 months')) THEN 1 ELSE 0 END) as quarter_count,
SUM(CASE WHEN(created_at > (CURRENT_DATE - INTERVAL '6 months')) THEN 1 ELSE 0 END) as half_year_count,
SUM(CASE WHEN(created_at > (CURRENT_DATE - INTERVAL '1 years')) THEN 1 ELSE 0 END) as year_count,
count(*) as total_count from wallets;
Multiple Select Query:
SELECT count(*) from wallets where created_at > CURRENT_DATE - INTERVAL '1 days';
SELECT count(*) from wallets where created_at > CURRENT_DATE - INTERVAL '1 months';
SELECT count(*) from wallets where created_at > CURRENT_DATE - INTERVAL '3 months';
SELECT count(*) from wallets where created_at > CURRENT_DATE - INTERVAL '6 months';
SELECT count(*) from wallets where created_at > CURRENT_DATE - INTERVAL '1 years';
SELECT count(*) from wallets;
The requirement is to find the wallet count by day, month, 3 months, 6 months, and year.
If I go with multiple selects, then 6 queries are needed to fetch the data.
Using CASE I can get the data in a single query, but I am not sure it is best practice to use CASE on large datasets.
Please find the query analysis below (I have only 10 records in my DB):
Case query analysis: (screenshot omitted)
Multiple query analysis: (screenshot omitted)
The single query is going to be better. You will get an improvement in performance using filter:
SELECT COUNT(*) FILTER (WHERE created_at > (CURRENT_DATE - INTERVAL '1 days')) as day_count,
COUNT(*) FILTER (WHERE created_at > (CURRENT_DATE - INTERVAL '1 months')) as month_count,
COUNT(*) FILTER (WHERE created_at > (CURRENT_DATE - INTERVAL '3 months')) as quarter_count,
COUNT(*) FILTER (WHERE created_at > (CURRENT_DATE - INTERVAL '6 months')) as half_year_count,
COUNT(*) FILTER (WHERE created_at > (CURRENT_DATE - INTERVAL '1 years')) as year_count,
COUNT(*) as total_count
FROM wallets;
If you have an index on created_at, this should also let Postgres answer the query from that index alone.
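If the index does not exist yet, it would be something like the following (the index name is illustrative):
CREATE INDEX idx_wallets_created_at ON wallets (created_at);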
I can make an educated guess, but without actual data, tests are of little use.
The multiple-select approach is easier for the database planner to optimise. PostgreSQL 9.6+ can use index-only scans for it, so it may end up as a few very fast queries.
The CASE example is very hard to read. I'm afraid no index can serve it directly, so the query will be forced to scan the whole table, which could be horribly slow.
From the knex point of view, the difference is that multiple queries can be sent to the DB through separate connections and executed in parallel. Still, doing a single query is probably more performant overall, causing less stress / data-transfer overhead on the DB server.
The biggest drawback of the first way (the single CASE query) is that you cannot build it nicely with knex, and it looks horrible to anyone who reads the code.
A better way to pack multiple queries into a single one is to use the Postgres WITH statement (common table expressions) https://www.postgresql.org/docs/9.6/static/queries-with.html, which knex also supports: http://knexjs.org/#Builder-with
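A sketch of that CTE idea in plain SQL (two of the intervals shown; the rest follow the same pattern, and the aliases are illustrative):
WITH day_count AS (
    SELECT count(*) AS n
    FROM wallets
    WHERE created_at > CURRENT_DATE - INTERVAL '1 days'
),
month_count AS (
    SELECT count(*) AS n
    FROM wallets
    WHERE created_at > CURRENT_DATE - INTERVAL '1 months'
)
SELECT day_count.n AS day_count, month_count.n AS month_count
FROM day_count, month_count;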
EDIT: or just do multiple subqueries in a single select, somewhat like Gordon Linoff suggested:
knex
.select(
knex('wallets')
.where('created_at', '>', knex.raw("CURRENT_DATE - INTERVAL '1 days'"))
.count()
.as('lastDay'),
knex('wallets')
.where('created_at', '>', knex.raw("CURRENT_DATE - INTERVAL '1 months'"))
.count()
.as('lastMonth'),
... rest of the queries ...
);
https://runkit.com/embed/wsy01ar1hb73
PostgreSQL should be able to optimize the multiple subqueries into a plan similar to the one Gordon's answer produces.
I am trying to get aggregate values by time period from two relations (buys and uses) and join them, so that I can get the results in one report and also compute a ratio between them. I am using PostgreSQL. The end report requires: dateTime, u.sum, b.sum, b.sum/u.sum.
The following query works but scales very poorly with larger table sizes.
SELECT b2.datetime AS dateTime, b2.sum AS BUY_VOLUME, u1.sum AS USE_VOLUME,
       CASE u1.sum
           WHEN 0 THEN 0
           ELSE (b2.sum / u1.sum)
       END AS buyToUseRatio
FROM (SELECT SUM(b.total / 100.0) AS sum,
             date_trunc('week', (b.datetime + INTERVAL '1 day')) - INTERVAL '1 day' AS datetime
      FROM buys AS b
      WHERE datetime > date_trunc('month', CURRENT_DATE) - INTERVAL '1 year'
      GROUP BY datetime) AS b2
INNER JOIN
     (SELECT SUM(u.amount) / 100.00 AS sum,
             date_trunc('week', (u.datetime + INTERVAL '1 day')) - INTERVAL '1 day' AS datetime
      FROM uses AS u
      WHERE datetime > date_trunc('month', CURRENT_DATE) - INTERVAL '1 year'
      GROUP BY datetime) AS u1
     ON b2.datetime = u1.datetime
ORDER BY b2.datetime ASC;
I was wondering if anyone could help me by providing an alternative query that would get the end result required and is faster to execute.
I appreciate any help on this :-) My junior level SQL is a little rusty and I can't think of another way of doing this without creating indexes. Thanks in advance.
At least, these indexes can help your query:
create index idx_buys_datetime on buys(datetime);
create index idx_uses_datetime on uses(datetime);
Your query seems fine. However, you could use a full join (instead of inner) to keep all rows where at least one of your tables has data. You could even use generate_series() to always have 1 year of results, even when there is no data in either table, but I'm not sure if that's what you need. Also, some things can be written more simply; your query could look like this:
select dt, buy_volume, use_volume, buy_volume / nullif(use_volume, 0.0) buy_to_use_ratio
from (select sum(total / 100.0) buy_volume, date_trunc('week', (datetime + interval '1 day')) - interval '1 day' dt
from buys
where datetime > date_trunc('month', current_timestamp - interval '1 year')
group by 2) b
full join (select sum(amount) / 100.0 use_volume, date_trunc('week', (datetime + interval '1 day')) - interval '1 day' dt
from uses
where datetime > date_trunc('month', current_timestamp - interval '1 year')
group by 2) u using (dt)
order by 1
http://rextester.com/YVASV92568
So the answer depends on how large your tables are, but if it were me, I would create one or two new "summary" tables based on your query and make sure to keep them updated (run a batch job once a day to rebuild them, or once an hour with all the data that has changed recently).
Then I would be able to query those tables much faster.
If, however, your tables are very small, then just keep going the way you are and play around with indexes till you get acceptable timings.
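A minimal sketch of that summary-table idea in Postgres is a materialized view over the existing query, refreshed on a schedule (the view name is illustrative; the body reuses the rewritten query above):
CREATE MATERIALIZED VIEW weekly_buy_use_report AS
select dt, buy_volume, use_volume, buy_volume / nullif(use_volume, 0.0) buy_to_use_ratio
from (select sum(total / 100.0) buy_volume, date_trunc('week', (datetime + interval '1 day')) - interval '1 day' dt
      from buys
      where datetime > date_trunc('month', current_timestamp - interval '1 year')
      group by 2) b
full join (select sum(amount) / 100.0 use_volume, date_trunc('week', (datetime + interval '1 day')) - interval '1 day' dt
           from uses
           where datetime > date_trunc('month', current_timestamp - interval '1 year')
           group by 2) u using (dt);
-- Refresh from a scheduled job (cron, pg_cron, etc.) as often as the report needs to be fresh:
REFRESH MATERIALIZED VIEW weekly_buy_use_report;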