PostgreSQL group by epoch month - sql

I have a crazy number of rows that use epoch time.
id, customerid, orderid, uxtime
On my desk right now is to build an admin page that allows others to quickly wade through this humongous pile of rows.
They want to be able to choose a year and month and get that month's list.
That means: choosing 2016 and April should return all ids from April 2016.
This has to be doable in one smart, cool SQL statement. That is where you come in. I am working on it and making some progress, but I am pretty sure all of you are so much quicker than me. :)

To convert "April 2016" to a Unix epoch, use make_date() and extract():
extract(epoch from make_date(2016,4,1))
You also need an upper bound for a where clause which would typically be the first of the next month:
extract(epoch from make_date(2016,4,1) + interval '1' month)
So your SQL statement would be something like this:
select ...
from ...
where uxtime >= extract(epoch from make_date(2016,4,1))
and uxtime < extract(epoch from make_date(2016,4,1) + interval '1' month);
A slightly shorter way of writing it would be:
select ...
from ...
where to_char(to_timestamp(uxtime), 'yyyy-mm') = '2016-04'
The above, however, will be a lot slower than the first solution because it can't make use of an index on uxtime.
You could create an index on to_char(to_timestamp(uxtime), 'yyyy-mm') if you really prefer that solution to speed up the query.
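If the year and month come from the admin page, the two bounds can also be computed in application code and passed in as query parameters. A minimal Python sketch, assuming uxtime holds UTC-based epoch seconds; the function name month_epoch_bounds is made up for illustration:

```python
from datetime import datetime, timezone

def month_epoch_bounds(year, month):
    """Return (start, end) epoch seconds for a month as a half-open range."""
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    # Roll over to January of the next year after December.
    if month == 12:
        end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    else:
        end = datetime(year, month + 1, 1, tzinfo=timezone.utc)
    return int(start.timestamp()), int(end.timestamp())

start, end = month_epoch_bounds(2016, 4)
# Intended use: WHERE uxtime >= start AND uxtime < end
print(start, end)
```

Because the comparison stays on the bare uxtime column, a plain index on uxtime should remain usable.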

Related

Trying to exclude a subset within a few days of a report

I am trying to exclude trades of a certain typology if they are shorter than 4 days, but the filter I'm using is super slow in returning results. It also doesn't capture trades that are booked on the 10th of May and expire on the 13th of June, which it should. If I amend it to include these, it will also include trades within the same month that are shorter than 4 days. Can someone help me make it more efficient and capture what I want? I'm using Oracle SQL Developer.
and ( DC.M_TYPOLOGY <> 'Repo BD' OR (DC.M_TYPOLOGY ='Repo BD' and
(((to_char( DC.M_OPTMAT , 'YYYY')- to_char (DC.M_TRADEDATE, 'YYYY'))= 0
and (to_char( DC.M_OPTMAT , 'MM')- to_char (DC.M_TRADEDATE, 'MM') < 1))
and (to_char( DC.M_OPTMAT , 'DD')- to_char (DC.M_TRADEDATE, 'DD') > 4))))
All that conversion is unnecessary (and wrong - what happens when trades span year end?). We can do arithmetic with Oracle dates so this is the same:
( DC.M_TYPOLOGY <> 'Repo BD'
OR (DC.M_TYPOLOGY ='Repo BD'
and DC.M_OPTMAT - DC.M_TRADEDATE > 4 )
)
Note: this answer posted before the recent edit to the question. I'm leaving this in place but I will revise it once the OP has clarified the rules they want to enforce.
Of course, this may not speed your query up. Performance problems can arise from many different causes and there simply isn't enough detail provided.
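To see why comparing the year, month and day parts separately goes wrong, here is a small Python sketch that mirrors the two approaches on the 10 May to 13 June trade from the question. The function names are hypothetical, and the "more than 4 days" threshold is taken from the question's filter, not from any confirmed business rule:

```python
from datetime import date

def component_filter(trade_date, maturity):
    """Mirror of the question's filter: compare year, month and day parts."""
    return (maturity.year - trade_date.year == 0
            and maturity.month - trade_date.month < 1
            and maturity.day - trade_date.day > 4)

def date_diff_filter(trade_date, maturity):
    """Plain date arithmetic: keep trades lasting more than 4 days."""
    return (maturity - trade_date).days > 4

trade, maturity = date(2016, 5, 10), date(2016, 6, 13)
print(component_filter(trade, maturity))  # month parts differ, so the trade is dropped
print(date_diff_filter(trade, maturity))  # 34 days apart, so the trade is kept
```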

Specifying custom date range in SQL query

I want to write a query in which I need to specify a custom range (instead of a hardcoded date range) for the date, starting from the order day. The table being used has the date of the order.
As of now I have hardcoded the date range like:
where owh.order_day between TO_DATE('2016/07/15','YYYY/MM/DD') and TO_DATE('2017/01/17','YYYY/MM/DD')
where order_day is a date.
But rather I want something like:
where owh.order_day between TO_DATE(owh.order_day - 1,'YYYY/MM/DD') and TO_DATE(owh.order_day +3,'YYYY/MM/DD')
I am doing "-1" as it's "between", so it will take from order_day - order_day+2
For example, If the order_day is: "17/01/2016" then I want the condition to be where the date range is dynamically calculated as: "16/01/2016 - 20/01/2016" .
Is something like this possible? If yes, how can we achieve it in SQL?
The DB in question is Oracle
Any leads appreciated
Since you have not told us which RDBMS you are using, and since you are saying "any leads appreciated", I suppose we are free to give an answer for any RDBMS. The following will work for MySQL:
BETWEEN DATE_SUB( somedate, INTERVAL 1 DAY ) AND DATE_ADD( somedate, INTERVAL 1 DAY )
(source: https://www.tutorialspoint.com/sql/sql-date-functions.htm#function_date-add)
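For reference, the window the question describes (one day before the order day through three days after it) is just date arithmetic on each order day. A rough Python sketch of that calculation; order_window and its parameter names are invented for illustration and are not part of any database API:

```python
from datetime import date, timedelta

def order_window(order_day, days_before=1, days_after=3):
    """Return the (start, end) dates of the window around an order day."""
    start = order_day - timedelta(days=days_before)
    end = order_day + timedelta(days=days_after)
    return start, end

# The example from the question: 17/01/2016 should give 16/01/2016 - 20/01/2016.
print(order_window(date(2016, 1, 17)))
```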

Oddities with postgres SQL [negative date interval and alias that doesn't work only in condition clause]

I'm coming to you guys with two small oddities I can't seem to understand with postgres:
(1)
SELECT "LASTREQUESTED",
(DATE_TRUNC('seconds', CURRENT_TIMESTAMP - "LASTREQUESTED")
- INTERVAL '8 hours') AS "TIME"
FROM "USER" AS u
JOIN "REQUESTLOG" AS r ON u."ID" = r."ID"
ORDER BY "TIME"
I'm calculating when users can make their next request [once every 8 hours], but if you look at entry 16 I get "1 day -06:20:47" instead of "18:00:00" ish, unlike every other line. [The column LASTREQUESTED is a simple timestamp; nothing about line 16 differs from the other entries.] Why is that?
(2)
On the same request, if I try to add a condition on the "TIME" column, the compiler says it doesn't exist although using it to order by is ok. I don't get why.
SELECT (DATE_TRUNC('seconds', CURRENT_TIMESTAMP - "LASTREQUESTED")
- INTERVAL '8 hours') AS "TIME"
FROM "USER" AS u
JOIN "REQUESTLOG" AS r ON u."ID" = r."ID"
WHERE "TIME" > 0
ORDER BY "TIME";
Question #1: negative hours but positive days?
According to the PostgreSQL documentation, this is a situation where PostgreSQL differs from the SQL standard:
According to the SQL standard all fields of an interval value must have the same sign…. PostgreSQL allows the fields to have different signs….
Internally interval values are stored as months, days, and seconds. This is done because the number of days in a month varies, and a day can have 23 or 25 hours if a daylight savings time adjustment is involved. The months and days fields are integers while the seconds field can store fractions. …
You can see a more extreme example of this with the following query:
=# select interval '1 day' - interval '300 hours';
?column?
------------------
1 day -300:00:00
(1 row)
So this is not a single interval in seconds expressed in a strange way; instead, it's an interval of 0 months, +1 day, and -1,080,000.0 seconds. If you are certain that there's no daylight savings time issues with the timestamps that you got these intervals from, you can use justify_hours to convert days into 24-hour periods and get an interval that makes more sense:
=# select justify_hours(interval '1 day' - interval '300 hours');
justify_hours
--------------------
-11 days -12:00:00
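To make the "separate fields" idea concrete, here is a simplified Python model of an interval as a (days, seconds) pair, with a function that normalizes it the way justify_hours appears to. This is only a sketch of the behavior described above: it ignores the months field, and it is not PostgreSQL's actual implementation.

```python
SECONDS_PER_DAY = 86400

def justify_hours(days, seconds):
    """Fold whole 24-hour periods into days and make both fields share a sign."""
    whole_days = int(seconds / SECONDS_PER_DAY)  # int() truncates toward zero
    seconds -= whole_days * SECONDS_PER_DAY
    days += whole_days
    if days > 0 and seconds < 0:
        seconds += SECONDS_PER_DAY
        days -= 1
    elif days < 0 and seconds > 0:
        seconds -= SECONDS_PER_DAY
        days += 1
    return days, seconds

# interval '1 day' - interval '300 hours' is stored as +1 day, -1,080,000 seconds
print(justify_hours(1, -300 * 3600))

# The "1 day -06:20:47" value from the question
print(justify_hours(1, -(6 * 3600 + 20 * 60 + 47)))
```

The first call gives -11 days and -43200 seconds (that is, -12:00:00), matching the output above. The second gives 0 days and 63553 seconds, which is 17:39:13, roughly the "18:00:00 ish" value the question expected.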
Question #2: SELECT columns can't be used in WHERE?
This is standard PostgreSQL behavior. See this duplicate question. Solutions presented there include:
Repeat the expression twice, once in the SELECT list, and again in the WHERE clause. (I've done this more times than I want to remember…)
SELECT (my - big * expression) AS x
FROM stuff
WHERE (my - big * expression) > 5
ORDER BY x
Create a subquery without that WHERE filter, and put the WHERE conditions in the outer query
SELECT *
FROM (SELECT (my - big * expression) AS x
FROM stuff) AS subquery
WHERE x > 5
ORDER BY x
Use a WITH statement to achieve something similar to the subquery trick.
I don't know exactly why it's calculated that way (maybe because you subtract an Interval from another Interval), but when you change the calculation to Timestamp minus Timestamp it works as expected:
DATE_TRUNC('seconds', CURRENT_TIMESTAMP - ("LASTREQUESTED" + INTERVAL '8 hours'))
See Fiddle
Regarding #2: Based on Standard SQL the columns in the Select-list are calculated after FROM/WHERE/GROUP BY/HAVING, but before ORDER, that's why you can't use an alias in WHERE. There are some good articles on that topic written by Itzik Ben-Gan (based on MS SQL Server, but similar for PostgreSQL).

Postgresql query between date ranges

I am trying to query my postgresql db to return results where a date is in certain month and year. In other words I would like all the values for a month-year.
The only way i've been able to do it so far is like this:
SELECT user_id
FROM user_logs
WHERE login_date BETWEEN '2014-02-01' AND '2014-02-28'
Problem with this is that I have to calculate the first date and last date before querying the table. Is there a simpler way to do this?
Thanks
With dates (and times) many things become simpler if you use >= start AND < end.
For example:
SELECT
user_id
FROM
user_logs
WHERE
login_date >= '2014-02-01'
AND login_date < '2014-03-01'
In this case you still need to calculate the start date of the month you need, but that should be straightforward in any number of ways.
The end date is also simplified; just add exactly one month. No messing about with 28th, 30th, 31st, etc.
This structure also has the advantage of being able to maintain use of indexes.
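The difference between the closed and half-open forms is easiest to see on a few sample values. A small Python sketch, assuming login_date carries a time of day (the sample rows are made up):

```python
from datetime import datetime

# Hypothetical login timestamps around the end of February 2014
logins = [
    datetime(2014, 2, 1, 0, 0),
    datetime(2014, 2, 28, 15, 30),
    datetime(2014, 3, 1, 0, 0),
]

start = datetime(2014, 2, 1)
last_day = datetime(2014, 2, 28)   # midnight at the start of the 28th
next_month = datetime(2014, 3, 1)

# BETWEEN '2014-02-01' AND '2014-02-28': misses the afternoon of the 28th
closed_28 = [t for t in logins if start <= t <= last_day]

# BETWEEN '2014-02-01' AND '2014-03-01': also picks up midnight on 1 March
closed_mar = [t for t in logins if start <= t <= next_month]

# >= '2014-02-01' AND < '2014-03-01': exactly the February rows
half_open = [t for t in logins if start <= t < next_month]

print(len(closed_28), len(closed_mar), len(half_open))
```

If login_date is a pure date column, the first form happens to work for this February, but the half-open form works in both cases.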
Many people may suggest a form such as the following, but they do not use indexes:
WHERE
date_part('year', login_date) = 2014
AND date_part('month', login_date) = 2
This involves calculating the conditions for every single row in the table (a scan) and not using index to find the range of rows that will match (a range-seek).
Range types are supported from PostgreSQL 9.2, so you can write this as:
SELECT user_id
FROM user_logs
WHERE '[2014-02-01, 2014-03-01)'::daterange @> login_date
This should be more efficient than the string comparison.
Just in case somebody lands here... since 8.1 you can simply use:
SELECT user_id
FROM user_logs
WHERE login_date BETWEEN SYMMETRIC '2014-02-01' AND '2014-02-28'
From the docs:
BETWEEN SYMMETRIC is the same as BETWEEN except there is no
requirement that the argument to the left of AND be less than or equal
to the argument on the right. If it is not, those two arguments are
automatically swapped, so that a nonempty range is always implied.
SELECT user_id
FROM user_logs
WHERE login_date BETWEEN '2014-02-01' AND '2014-03-01'
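The BETWEEN keyword also works well for dates; a date literal is assumed to be at 00:00:00 (i.e. midnight). Note that BETWEEN is inclusive, so this query also matches a login_date of exactly 2014-03-01 00:00:00.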
Read the documentation.
http://www.postgresql.org/docs/9.1/static/functions-datetime.html
I used a query like that:
WHERE
(
date_trunc('day',table1.date_eval) = '2015-02-09'
)
or
WHERE (date_trunc('day', table1.date_eval) >= '2015-02-09' AND date_trunc('day', table1.date_eval) < '2015-02-10')

select * from table where datetime in month (without breaking index)

I have a bunch of timestamped rows (using the 'datetime' data type)
I want to select all the rows that have a timestamp that is within a particular month.
The column is indexed, so I can't do MONTH(timestamp) = 3 because that'll make this index unusable.
If I have year and month variables (in perl), is there a horrific bit of SQL I can use like:
timestamp BETWEEN DATE($year, $month, 0) AND DATE($year, $month, 31);
But nicer, and actually works?
I would actually go with the idea you proposed ; maybe with a small difference :
select *
from your_table
where date_field >= '2010-01-01'
and date_field < '2010-02-01'
(Of course, it's up to you to build those two dates properly from $year and $month.)
Note the < '2010-02-01' part : you might have to consider this, if you have dates that include the time.
For instance, if you have a line with a date like '2010-01-31 12:53:12', you probably want to have that line selected -- and, by default, '2010-01-31' means '2010-01-31 00:00:00'.
Maybe that doesn't look 'nice' to the eye, but it'll work and use the index... It's the kind of solution I generally use when I have that kind of problem.
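Building the two boundary strings from a year and month mostly comes down to handling the December wrap-around. A quick sketch of that step, written in Python rather than Perl purely for illustration; month_range is a made-up helper name:

```python
def month_range(year, month):
    """Return the first day of the month and of the following month as 'YYYY-MM-DD'."""
    # month // 12 is 1 only for December, which carries into the year;
    # month % 12 + 1 maps 12 -> 1 and any other month m -> m + 1.
    next_year, next_month = year + month // 12, month % 12 + 1
    return f"{year:04d}-{month:02d}-01", f"{next_year:04d}-{next_month:02d}-01"

print(month_range(2010, 1))
print(month_range(2010, 12))
```

The two strings are meant to be bound to the >= and < conditions respectively.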
This is substantively Pascal MARTIN's answer, but avoids having to know explicitly what the next year/month is (so you don't have to increment year and wrap around the $month, when $month == 12):
my $sth = $mysql_dbh->prepare(<<__EOSQL);
SELECT ...
FROM tbl
WHERE ts >= ? AND ts < (? + INTERVAL 1 MONTH)
__EOSQL
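# Bind a full YYYY-MM-01 date; a bare 'YYYY-MM' string is not a valid date for interval arithmetic.
my $first_day = sprintf('%04d-%02d-01', $year, $month);
$sth->execute($first_day, $first_day);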
For bonus fugly points, you could also do this:
... WHERE ts BETWEEN ? AND (? + INTERVAL 1 MONTH - INTERVAL 1 SECOND)
That - INTERVAL 1 SECOND will coerce the upper boundary from a DATE into a DATETIME/TIMESTAMP type set to the last second of a day, which is, as Pascal indicated, what you want on the upper bound.
If you need the same month of every year the index you have will not help you and no amount of SQL syntax trickery will help you
On the other hand if you need a month of a particular year then any query with date ranges should do it
Another alternative is to add an extra column to the table that stores the month, precomputed. That would just be a simple int column, and is trivial to index. Unless you're dealing with a kajillion rows, the extra space for an unsigned tiny int is negligible (one byte + db overhead per row).
It'd require a bit of extra work to keep synched with the timestamp column, but that's what triggers are for.
How about WHERE MONTH(`date`) = '$month' AND YEAR(`date`) = '$year'