Get the count of distinct userids for last couple of days - sql

Let's say the last 7 days for this table:
Userid Download time
Rab01 2020-04-29 03:28
Klm01 2020-04-29 04:01
Klm01 2020-04-30 05:10
Rab01 2020-04-29 12:14
Osa_3 2020-04-25 09:01
Following is the required output:
Count Download_time
1 2020-04-25
2 2020-04-29
1 2020-04-30

Tested with PostgreSQL. You also tagged Redshift, which forked at Postgres 8.2, a long time ago. There may be discrepancies ..
Since you seem to be happy with standard ISO format, a simple cast to date would be most efficient:
SELECT count(DISTINCT userid) AS "Count"
, download_time::date AS "Download_Day"
FROM tbl
WHERE download_time >= CURRENT_DATE - 7
AND download_time < CURRENT_DATE
GROUP BY 2;
db<>fiddle here
CURRENT_DATE is standard SQL and works for both Postgres and Redshift. Related:
How do I determine the last day of the previous month using PostgreSQL?
About the "last 7 days": I took the last 7 whole days (excluding today - necessarily incomplete), with syntax that can use a plain index on (download_time). Related:
Get dates of a day of week in a date range
Slow LEFT JOIN on CTE with time intervals
Interval (days) in PostgreSQL with two parameters
Ideally, you have a composite index on (download_time, userid) (and fulfill some preconditions) to get very fast index-only scans. See:
Is a composite index also good for queries on the first field?
count(DISTINCT ...) is typically slow. For big tables with many duplicates, there are faster techniques. Disclose your exact setup and cardinalities if you need to optimize performance.
If the actual data type is timestamptz, not just timestamp, you also need to define the time zone defining day boundaries. See:
Ignoring time zones altogether in Rails and PostgreSQL
About the optional short syntax GROUP BY 2:
Select first row in each GROUP BY group?
About capitalization of identifiers:
Are PostgreSQL column names case-sensitive?

You can use date_trunc function for get day only part from datetime and use it for grouping.
The query may be next:
SELECT
count(distinct Userid) as Count, -- get unuque users count
to_char(date_trunc('day', Download_time), 'YYYY-MM-DD') AS Download_Day -- convert time do day
FROM table
WHERE DATE_PART('day', NOW() - Download_time) < 7 -- last 7 days
GROUP BY Download_Day; -- group by day
Fiddle

Related

How to make a group query to select multiple rows?

I have a DateTime column (timestamp 2022-05-22 10:10:12) with a batch of stamps per each day.
I need to filter the rows where stamp is before 9am (here is no problem) and I'm using this code:
SELECT * FROM tickets
WHERE date_part('hour'::text, tickets.date_in) < 9::double precision;
The output is the list of the rows where the time in timestamp is less than 9 am (50 rows from 2000).
date_in
2022-05-22 08:10:12
2022-04-23 07:11:13
2022-06-15 08:45:26
Then I need to find all the days where at least one row has a stamp before 9 am - and here I'm stuck. Any idea how to select all the days where at least one stamp was before 9 am?
The code I'm trying:
SELECT * into temp1 FROM tickets
WHERE date_part('hour'::text, tickets.date_in) < 9::double precision
ORDER BY date_part('day'::text, date_in);
Select * into temp2
from tickets, temp1
where date_part('day'::text, tickets.date_in) = date_part('day'::text, temp1.date_in);
Update temp2 set distorted_route = 1;
But this is giving me nothing.
Expected output is to get all the days where at least one route was done before 9am:
date_in
2022-05-22 08:10:12
2022-05-22 10:11:45
2022-05-22 12:14:59
2022-04-23 07:11:13
2022-04-23 11:42:25
2022-06-15 08:45:26
2022-06-15 15:10:57
Should I make an additional table (temp1) to feed it with the first query result (just the rows before 9am) and then make a cross table query to find in the source table public.tickets all the days which are equal to the public.temp1?
Select * from tickets, temp1
where TO_Char(tickets.date_in, 'YYYY-MM-DD')
= TO_Char(temp1.date_in, 'YYYY-MM-DD');
or like this:
SELECT *
FROM tickets
WHERE EXISTS (
SELECT date_in FROM TO_Char(tickets.date_in, 'YYYY-MM-DD') = TO_Char(temp1.date_in, 'YYYY-MM-DD')
);
Ideally, I'd want to avoid using a temporary table and make a request just for one table.
After that, I need to create a view or update and add some remarks to the source table.
Assuming you mean:
How to select all rows where at least one row exists with a timestamp before 9 am of the same day?
SELECT *
FROM tickets t
WHERE EXISTS (
SELECT FROM tickets t1
WHERE t1.date_in::date = t.date_in::date -- same day
AND t1.date_in::time < time '9:00' -- time before 9:00
AND t1.id <> t.id -- exclude self
)
ORDER BY date_id; -- optional, but typically helpful
id being the PK column of your undisclosed table.
But be aware that ...
... typically you'll want to work with timestamptz instead of timestamp. See:
Ignoring time zones altogether in Rails and PostgreSQL
https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_timestamp_.28without_time_zone.29
... this query is slow for big tables, because it cannot use a plain index on (date_id) (not "sargable"). Related:
How do you do date math that ignores the year?
There are various ways to optimize performance. The best way depends on undisclosed information for performance questions.

Oddities with postgres SQL [negative date interval and alias that doesn't work only in condition clause]

I'm coming to you guys with with two small oddities I can't seem to understand with postgres:
(1)
SELECT "LASTREQUESTED",
(DATE_TRUNC('seconds', CURRENT_TIMESTAMP - "LASTREQUESTED")
- INTERVAL '8 hours') AS "TIME"
FROM "USER" AS u
JOIN "REQUESTLOG" AS r ON u."ID" = r."ID"
ORDER BY "TIME"
I'm calculating when users can make their next request [once every 8 hours], but if you look at entry 16 I get "1 day -06:20:47" instead of "18:00:00" ish, unlike every other line. [The table LASTREQUESTED is a simple timestamp, nothing different here from the other entries for line 16], why is that?
(2)
On the same request, if I try to add a condition on the "TIME" column, the compiler says it doesn't exist although using it to order by is ok. I don't get why.
SELECT (DATE_TRUNC('seconds', CURRENT_TIMESTAMP - "LASTREQUESTED")
- INTERVAL '8 hours') AS "TIME"
FROM "USER" AS u
JOIN "REQUESTLOG" AS r ON u."ID" = r."ID"
WHERE "TIME" > 0
ORDER BY "TIME";
Question #1: negative hours but positive days?
According to the PostgreSQL documentation, this is a situation where PostgreSQL differs from the SQL standard:
According to the SQL standard all fields of an interval value must have the same sign…. PostgreSQL allows the fields to have different signs….
Internally interval values are stored as months, days, and seconds. This is done because the number of days in a month varies, and a day can have 23 or 25 hours if a daylight savings time adjustment is involved. The months and days fields are integers while the seconds field can store fractions. …
You can see a more extreme example of this with the following query:
=# select interval '1 day' - interval '300 hours';
?column?
------------------
1 day -300:00:00
(1 row)
So this is not a single interval in seconds expressed in a strange way; instead, it's an interval of 0 months, +1 day, and -1,080,000.0 seconds. If you are certain that there's no daylight savings time issues with the timestamps that you got these intervals from, you can use justify_hours to convert days into 24-hour periods and get an interval that makes more sense:
=# select justify_hours(interval '1 day' - interval '300 hours');
justify_hours
--------------------
-11 days -12:00:00
Question #2: SELECT columns can't be used in WHERE?
This is standard PostgreSQL behavior. See this duplicate question. Solutions presented there include:
Repeat the expression twice, once in the SELECT list, and again in the WHERE clause. (I've done this more times than I want to remember…)
SELECT (my - big * expression) AS x
FROM stuff
WHERE (my - big * expression) > 5
ORDER BY x
Create a subquery without that WHERE filter, and put the WHERE conditions in the outer query
SELECT *
FROM (SELECT (my - big * expression) AS x
FROM stuff) AS subquery
WHERE x > 5
ORDER BY x
Use a WITH statement to achieve something similar to the subquery trick.
I don't now exactly why it's calculating as-is (maybe because you subtract an Interval from another Interval) but when you change the calculation to Timestamp minus Timestamp it works as expected:
DATE_TRUNC('seconds', CURRENT_TIMESTAMP - (LASTREQUESTED + INTERVAL '8 hours'))
See Fiddle
Regarding #2: Based on Standard SQL the columns in the Select-list are calculated after FROM/WHERE/GROUP BY/HAVING, but before ORDER, that's why you can't use an alias in WHERE. There are some good articles on that topic written by Itzik Ben-Gan (based on MS SQL Server, but similar for PostgreSQL).

Get a count of records created each week in SQL

I have a table Questions. How can I get a count of all questions asked in a week?
More generically, how can I bucket records by the week they were created in?
Questions
id created_at title
----------------------------------------------------
1 2014-12-31 09:43:42 "Add things"
2 2013-11-23 02:98:55 "How do I ruby?"
3 2015-01-15 15:11:19 "How do I python?"
...
I'm using SQLLite, but PG answers are fine too.
Or if you have the answer using Rails ActiveRecord, that is amazing, but not required.
I've been trying to use DATEPART() but haven't come up with anything successful yet: http://msdn.microsoft.com/en-us/library/ms174420.aspx
In postgreSQL it's as easy as follows:
SELECT id, created_at, title, date_trunc('week', created_at) created_week
FROM Questions
If you wanted to get the # of questions per week, simply do the following:
SELECT date_trunc('week', created_at) created_week, COUNT(*) weekly_cnt
FROM Questions
GROUP BY date_trunc('week', created_at)
Hope this helps. Note that date_trunc() will return a date and not a number (i.e., it won't return the ordinal number of the week in the year).
Update:
Also, if you wanted to accomplish both in a single query you could do so as follows:
SELECT id, created_at, title, date_trunc('week', created_at) created_week
, COUNT(*) OVER ( PARTITION BY date_trunc('week', created_at) ) weekly_cnt
FROM Questions
In the above query I'm using COUNT(*) as a window function and partitioning by the week in which the question was created.
If the created_at field is already indexed, I would simply look for all rows with a created_at value between X and Y. That way the index can be used.
For instance, to get rows with a created_at value in the 3rd week of 2015, you would run:
select *
from questions
where created_at between '2015-01-11' and '2015-01-17'
This would allow the index to be used.
If you want to be able to specify a week in the where clause, you could use the date_part or extract functions to add a column to this table storing the year and week #, and then index that column so that queries can take advantage of it.
If you don't want to add the column, you could of course use either function in the where clause and query against the table, but you won't be able to take advantage of any indexes.
Because you mentioned not wanting to add a column to the table, I would recommend adding a function based index.
For example, if your ddl were:
create table questions
(
id int,
created_at timestamp,
title varchar(20)
);
insert into questions values
(1, '2014-12-31 09:43:42','"Add things"'),
(2, '2013-11-23 02:48:55','"How do I ruby?"'),
(3, '2015-01-15 15:11:19','"How do I python?"');
create or replace function to_week(ts timestamp)
returns text
as 'select concat(extract(year from ts),extract(week from ts))'
language sql
immutable
returns null on null input;
create index week_idx on questions (to_week(created_at));
You could run:
select q.*, to_week(created_at) as week
from questions q
where to_week(created_at) = '20153';
And get:
| ID | CREATED_AT | TITLE | WEEK |
|----|--------------------------------|--------------------|-------|
| 3 | January, 15 2015 15:11:19+0000 | "How do I python?" | 20153 |
(reflecting the third week of 2015, ie. '20153')
Fiddle: http://sqlfiddle.com/#!15/c77cd/3/0
You could similarly run:
select q.*,
concat(extract(year from created_at), extract(week from created_at)) as week
from questions q
where concat(extract(year from created_at), extract(week from created_at)) =
'20153';
Fiddle: http://sqlfiddle.com/#!15/18c1e/3/0
But it would not take advantage of the function based index, because there is none. In addition, it would not use any index you might have on the created_at field because, while that field might be indexed, you really aren't searching on that field. You are searching on the result of a function applied against that field. So the index on the column itself cannot be used.
If the table is large you will either want a function based index or a column holding that week that is itself indexed.
SQLite has no native datetime type like MS SQL Server does, so the answer may depend on how you are storing dates. Not all T-SQL will work in SQLite.
You can store datetime as an integer that counts seconds since 1/1/1970 12:00 AM. There are 604,800 seconds in a week. So you could query on an expression like
rawdatetime / 604800 -- iff rawdatetime is integer
More on handling datetimes in SQLite here: https://www.sqlite.org/datatype3.html
Get the week number using strfdate(%V)
Store it in DB, and use it to identify in which week a question was asked
http://apidock.com/ruby/DateTime/strftime
SQL can do it too with the DATE_FORMAT(datetime,'%u')
So use:
SELECT DATE_FORMAT(column,'%u') FROM Table

Postgresql query between date ranges

I am trying to query my postgresql db to return results where a date is in certain month and year. In other words I would like all the values for a month-year.
The only way i've been able to do it so far is like this:
SELECT user_id
FROM user_logs
WHERE login_date BETWEEN '2014-02-01' AND '2014-02-28'
Problem with this is that I have to calculate the first date and last date before querying the table. Is there a simpler way to do this?
Thanks
With dates (and times) many things become simpler if you use >= start AND < end.
For example:
SELECT
user_id
FROM
user_logs
WHERE
login_date >= '2014-02-01'
AND login_date < '2014-03-01'
In this case you still need to calculate the start date of the month you need, but that should be straight forward in any number of ways.
The end date is also simplified; just add exactly one month. No messing about with 28th, 30th, 31st, etc.
This structure also has the advantage of being able to maintain use of indexes.
Many people may suggest a form such as the following, but they do not use indexes:
WHERE
DATEPART('year', login_date) = 2014
AND DATEPART('month', login_date) = 2
This involves calculating the conditions for every single row in the table (a scan) and not using index to find the range of rows that will match (a range-seek).
From PostreSQL 9.2 Range Types are supported. So you can write this like:
SELECT user_id
FROM user_logs
WHERE '[2014-02-01, 2014-03-01]'::daterange #> login_date
this should be more efficient than the string comparison
Just in case somebody land here... since 8.1 you can simply use:
SELECT user_id
FROM user_logs
WHERE login_date BETWEEN SYMMETRIC '2014-02-01' AND '2014-02-28'
From the docs:
BETWEEN SYMMETRIC is the same as BETWEEN except there is no
requirement that the argument to the left of AND be less than or equal
to the argument on the right. If it is not, those two arguments are
automatically swapped, so that a nonempty range is always implied.
SELECT user_id
FROM user_logs
WHERE login_date BETWEEN '2014-02-01' AND '2014-03-01'
Between keyword works exceptionally for a date. it assumes the time is at 00:00:00 (i.e. midnight) for dates.
Read the documentation.
http://www.postgresql.org/docs/9.1/static/functions-datetime.html
I used a query like that:
WHERE
(
date_trunc('day',table1.date_eval) = '2015-02-09'
)
or
WHERE(date_trunc('day',table1.date_eval) >='2015-02-09'AND date_trunc('day',table1.date_eval) <'2015-02-09')

Postgres SQL select a range of records spaced out by a given interval

I am trying to determine if it is possible, using only sql for postgres, to select a range of time ordered records at a given interval.
Lets say I have 60 records, one record for each minute in a given hour. I want to select records at 5 minute intervals for that hour. The resulting rows should be 12 records each one 5 minutes apart.
This is currently accomplished by selecting the full range of records and then looping thru the results and pulling out the records at the given interval. I am trying to see if I can do this purly in sql as our db is large and we may be dealing with tens of thousands of records.
Any thoughts?
Yes you can. Its really easy once you get the hang of it. I think its one of jewels of SQL and its especially easy in PostgreSQL because of its excellent temporal support. Often, complex functions can turn into very simple queries in SQL that can scale and be indexed properly.
This uses generate_series to draw up sample time stamps that are spaced 1 minute apart. The outer query then extracts the minute and uses modulo to find the values that are 5 minutes apart.
select
ts,
extract(minute from ts)::integer as minute
from
( -- generate some time stamps - one minute apart
select
current_time + (n || ' minute')::interval as ts
from generate_series(1, 30) as n
) as timestamps
-- extract the minute check if its on a 5 minute interval
where extract(minute from ts)::integer % 5 = 0
-- only pick this hour
and extract(hour from ts) = extract(hour from current_time)
;
ts | minute
--------------------+--------
19:40:53.508836-07 | 40
19:45:53.508836-07 | 45
19:50:53.508836-07 | 50
19:55:53.508836-07 | 55
Notice how you could add an computed index on the where clause (where the value of the expression would make up the index) could lead to major speed improvements. Maybe not very selective in this case, but good to be aware of.
I wrote a reservation system once in PostgreSQL (which had lots of temporal logic where date intervals could not overlap) and never had to resort to iterative methods.
http://www.amazon.com/SQL-Design-Patterns-Programming-Focus/dp/0977671542 is an excellent book that goes has lots of interval examples. Hard to find in book stores now but well worth it.
Extract the minutes, convert to int4, and see, if the remainder from dividing by 5 is 0:
select *
from TABLE
where int4 (date_part ('minute', COLUMN)) % 5 = 0;
If the intervals are not time based, and you just want every 5th row; or
If the times are regular and you always have one record per minute
The below gives you one record per every 5
select *
from
(
select *, row_number() over (order by timecolumn) as rown
from tbl
) X
where mod(rown, 5) = 1
If your time records are not regular, then you need to generate a time series (given in another answer) and left join that into your table, group by the time column (from the series) and pick the MAX time from your table that is less than the time column.
Pseudo
select thetimeinterval, max(timecolumn)
from ( < the time series subquery > ) X
left join tbl on tbl.timecolumn <= thetimeinterval
group by thetimeinterval
And further join it back to the table for the full record (assuming unique times)
select t.* from
tbl inner join
(
select thetimeinterval, max(timecolumn) timecolumn
from ( < the time series subquery > ) X
left join tbl on tbl.timecolumn <= thetimeinterval
group by thetimeinterval
) y on tbl.timecolumn = y.timecolumn
How about this:
select min(ts), extract(minute from ts)::integer / 5
as bucket group by bucket order by bucket;
This has the advantage of doing the right thing if you have two readings for the same minute, or your readings skip a minute. Instead of using min even better would be to use one of the the first() aggregate functions-- code for which you can find here:
http://wiki.postgresql.org/wiki/First_%28aggregate%29
This assumes that your five minute intervals are "on the fives", so to speak. That is, that you want 07:00, 07:05, 07:10, not 07:02, 07:07, 07:12. It also assumes you don't have two rows within the same minute, which might not be a safe assumption.
select your_timestamp
from your_table
where cast(extract(minute from your_timestamp) as integer) in (0,5);
If you might have two rows with timestamps within the same minute, like
2011-01-01 07:00:02
2011-01-01 07:00:59
then this version is safer.
select min(your_timestamp)
from your_table
group by (cast(extract(minute from your_timestamp) as integer) / 5)
Wrap either of those in a view, and you can join it to your base table.