I have a bunch of timestamped rows (using the 'datetime' data type).
I want to select all the rows that have a timestamp that is within a particular month.
The column is indexed, so I can't do MONTH(timestamp) = 3, because that would make the index unusable.
If I have year and month variables (in Perl), is there a horrific bit of SQL I can use like:
timestamp BETWEEN DATE($year, $month, 0) AND DATE($year, $month, 31);
But nicer, and actually works?
I would actually go with the idea you proposed, maybe with a small difference:
select *
from your_table
where date_field >= '2010-01-01'
and date_field < '2010-02-01'
(Of course, it's up to you to use $year and $month properly.)
Note the < '2010-02-01' part: you might have to consider this if you have dates that include the time.
For instance, if you have a line with a date like '2010-01-31 12:53:12', you probably want to have that line selected, and, by default, '2010-01-31' means '2010-01-31 00:00:00'.
Maybe that doesn't look 'nice' to the eye, but it'll work, and it'll use the index. It's the kind of solution I generally use when I have that kind of problem.
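To see why the strict `<` on the first day of the next month matters, here is a small sketch using Python's built-in `sqlite3` module. Assumptions: SQLite stands in for MySQL, timestamps are stored as ISO-8601 text (which sorts chronologically), and the rows are invented sample data. The closed range that stops at '2010-01-31' loses the '2010-01-31 12:53:12' row, exactly as described above.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (date_field TEXT)")
conn.execute("CREATE INDEX idx_date_field ON your_table (date_field)")
conn.executemany(
    "INSERT INTO your_table (date_field) VALUES (?)",
    [
        ("2009-12-31 23:59:59",),
        ("2010-01-01 00:00:00",),
        ("2010-01-31 12:53:12",),  # late on the last day of the month
        ("2010-02-01 00:00:00",),
    ],
)

# Half-open range: first day of the month inclusive, first day of next month exclusive.
half_open = conn.execute(
    "SELECT date_field FROM your_table"
    " WHERE date_field >= ? AND date_field < ? ORDER BY date_field",
    ("2010-01-01", "2010-02-01"),
).fetchall()

# Closed range ending on the last calendar day: '2010-01-31' sorts before
# '2010-01-31 12:53:12', so that row is silently dropped.
closed = conn.execute(
    "SELECT date_field FROM your_table"
    " WHERE date_field BETWEEN ? AND ? ORDER BY date_field",
    ("2010-01-01", "2010-01-31"),
).fetchall()

print(half_open)  # [('2010-01-01 00:00:00',), ('2010-01-31 12:53:12',)]
print(closed)     # [('2010-01-01 00:00:00',)]
```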
This is substantively Pascal MARTIN's answer, but avoids having to know explicitly what the next year/month is (so you don't have to increment $year and wrap $month around when $month == 12):
my $sth = $mysql_dbh->prepare(<<__EOSQL);
SELECT ...
FROM tbl
WHERE ts >= ? AND ts < (? + INTERVAL 1 MONTH)
__EOSQL
# MySQL needs a complete date here: '2010-03' on its own is not a valid DATE.
my $month_start = sprintf('%04d-%02d-01', $year, $month);
$sth->execute($month_start, $month_start);
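If you would rather compute both bounds on the client, the December wraparound is the only tricky part. A minimal Python sketch (the helper name `month_bounds` is hypothetical) that lets `divmod` handle the year carry:

```python
from datetime import date


def month_bounds(year: int, month: int) -> tuple[date, date]:
    """Return (first day of the month, first day of the following month).

    Intended for a half-open test: start <= ts < end.
    """
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    start = date(year, month, 1)
    # Work with a 0-based month index so divmod carries December into January.
    next_year, next_month0 = divmod(year * 12 + (month - 1) + 1, 12)
    end = date(next_year, next_month0 + 1, 1)
    return start, end


for y, m in [(2010, 1), (2010, 12), (2012, 2)]:
    start, end = month_bounds(y, m)
    print(start.isoformat(), end.isoformat())
```

The two ISO strings can then be bound to the placeholders of `ts >= ? AND ts < ?`.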
For bonus fugly points, you could also do this:
... WHERE ts BETWEEN ? AND (? + INTERVAL 1 MONTH - INTERVAL 1 SECOND)
That - INTERVAL 1 SECOND will coerce the upper boundary from a DATE into a DATETIME/TIMESTAMP set to the last second of the month (for March 2010, '2010-03-31 23:59:59'), which is, as Pascal indicated, what you want on the upper bound.
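The arithmetic behind that upper bound, and its one weak spot, can be checked with Python's `datetime` (an illustration only; MySQL's own `INTERVAL` arithmetic is not being called). The bound lands on 23:59:59 of the month's last day, so a column that stores fractional seconds, such as MySQL `DATETIME(6)`, can hold March values that fall after it; the half-open `< next month` form has no such gap.

```python
from datetime import datetime, timedelta

start = datetime(2010, 3, 1)
next_month = datetime(2010, 4, 1)
upper = next_month - timedelta(seconds=1)  # mirrors "+ INTERVAL 1 MONTH - INTERVAL 1 SECOND"
print(upper)                               # 2010-03-31 23:59:59

# A March value with a fractional second, as a DATETIME(6) column could store.
late = datetime(2010, 3, 31, 23, 59, 59, 500000)

in_between = start <= late <= upper        # BETWEEN-style closed range
in_half_open = start <= late < next_month  # half-open range
print(in_between, in_half_open)            # False True
```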
If you need the same month of every year, the index you have will not help you, and no amount of SQL syntax trickery will.
On the other hand, if you need a month of a particular year, then any query with date ranges should do it.
Another alternative is to add an extra column to the table that stores the month, precomputed. That would just be a simple int column, and is trivial to index. Unless you're dealing with a kajillion rows, the extra space for an unsigned tiny int is negligible (one byte + db overhead per row).
It'd require a bit of extra work to keep synched with the timestamp column, but that's what triggers are for.
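A minimal sketch of the trigger idea, using SQLite through Python's `sqlite3` module as a stand-in. The table, column, and trigger names are hypothetical, and MySQL syntax differs: there you would typically use a `BEFORE INSERT`/`BEFORE UPDATE` trigger that does `SET NEW.month = ...`, whereas SQLite cannot assign to `NEW`, so this sketch updates the row after the fact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (
    id    INTEGER PRIMARY KEY,
    ts    TEXT NOT NULL,      -- ISO-8601 timestamp
    month INTEGER             -- precomputed 1..12, maintained by the triggers
);
CREATE INDEX idx_events_month ON events (month);

-- Fill in the month after every insert.
CREATE TRIGGER events_month_ins AFTER INSERT ON events
BEGIN
    UPDATE events SET month = CAST(strftime('%m', NEW.ts) AS INTEGER)
    WHERE id = NEW.id;
END;

-- Keep it in sync when the timestamp itself changes.
CREATE TRIGGER events_month_upd AFTER UPDATE OF ts ON events
BEGIN
    UPDATE events SET month = CAST(strftime('%m', NEW.ts) AS INTEGER)
    WHERE id = NEW.id;
END;
""")

conn.executemany(
    "INSERT INTO events (ts) VALUES (?)",
    [("2009-03-15 10:00:00",), ("2010-03-02 08:30:00",), ("2010-04-01 00:00:00",)],
)
conn.execute("UPDATE events SET ts = '2011-03-20 12:00:00' WHERE id = 3")

# "March of any year" is now a plain equality on an indexed integer column.
march = conn.execute("SELECT ts FROM events WHERE month = 3 ORDER BY ts").fetchall()
print(march)
```

The update trigger is restricted to `UPDATE OF ts`, so the trigger's own write to `month` does not fire it again.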
How about WHERE MONTH(`date`) = '$month' AND YEAR(`date`) = '$year'?
Related
I have the following query that joins two large tables. I am trying to join on patient_id and records that are not older than 30 days.
select * from
chairs c
join data id
on c.patient_id = id.patient_id
and to_date(c.from_date, 'YYYYMMDD') - to_date(id.from_date, 'YYYYMMDD') >= 0
and to_date (c.from_date, 'YYYYMMDD') - to_date(id.from_date, 'YYYYMMDD') < 30
Currently, this query takes 2 hours to run. What indexes can I create on these tables to make this query run faster?
I will take a shot in the dark, because, as others said, it depends on the table structure, the indexes, and the output of the planner.
The most obvious thing here is that, whenever possible, you want to store dates in a date data type instead of as strings. That is the first and most important change you should make. No index can save you if you transform strings in the join condition, and very likely the problem is not the patient_id but your date calculation.
Other than that, forcing hash joins on the patient_id and then doing the filtering could help if for some reason the planner decided to do nested loops for that condition. But that is for after you fixed your date representation AND you still have a problem AND you see that the planner does nested loops on that attribute.
Some observations if you are stuck with string fields for the dates:
YYYYMMDD date strings are ordered and can be used for <,> and =.
Building strings from the data in chairs to use in the JOIN on data can make good use of an index on data (patient_id, from_date).
So my suggestion would be to write expressions that build the date strings you want to use in the JOIN. Or to put it another way: do not transform the child table data from a string to something else.
Example expression that takes 30 days off a string date and returns a string date:
select to_char(to_date('20200112', 'YYYYMMDD') - INTERVAL '30 DAYS','YYYYMMDD')
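Both observations are easy to check outside the database. A small Python sketch: fixed-width `YYYYMMDD` strings compare in the same order as the dates they encode, and the `to_date(...) - INTERVAL '30 DAYS'` / `to_char(...)` round trip has a direct counterpart in `strptime`/`strftime` (the helper name `shift_yyyymmdd` is hypothetical):

```python
from datetime import datetime, timedelta


def shift_yyyymmdd(s: str, days: int) -> str:
    """Parse a YYYYMMDD string, add `days` (may be negative), and format it back."""
    return (datetime.strptime(s, "%Y%m%d") + timedelta(days=days)).strftime("%Y%m%d")


print(shift_yyyymmdd("20200112", -30))  # 20191213

# Fixed-width, zero-padded strings sort exactly like the dates they encode.
samples = ["20200301", "20191231", "20200112", "20200229"]
assert sorted(samples) == sorted(samples, key=lambda s: datetime.strptime(s, "%Y%m%d"))
```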
Untested:
select * from
chairs c
join data id
on c.patient_id = id.patient_id
-- 29, not 30: the original "< 30" excludes a gap of exactly 30 days, and BETWEEN is inclusive
and id.from_date between to_char(to_date(c.from_date, 'YYYYMMDD') - INTERVAL '29 DAYS','YYYYMMDD')
and c.from_date
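One detail worth checking before running a rewrite like this: the original condition is `0 <= difference < 30`, while `BETWEEN` is inclusive at both ends, so a lower bound of exactly 30 days back would also admit rows whose dates are 30 days apart. A brute-force comparison in Python (a sketch; the helper names are hypothetical) shows which lower bound reproduces the original predicate on `YYYYMMDD` strings:

```python
from datetime import datetime, timedelta


def parse(s: str) -> datetime:
    return datetime.strptime(s, "%Y%m%d")


def original(c: str, i: str) -> bool:
    # Mirrors: to_date(c) - to_date(i) >= 0 AND to_date(c) - to_date(i) < 30
    diff = (parse(c) - parse(i)).days
    return 0 <= diff < 30


def rewritten(c: str, i: str, days_back: int) -> bool:
    # Mirrors: i BETWEEN to_char(to_date(c) - days_back days) AND c  (string comparison)
    lower = (parse(c) - timedelta(days=days_back)).strftime("%Y%m%d")
    return lower <= i <= c


c = "20200112"
# Candidate child dates from 40 days before c to 5 days after it.
candidates = [(parse(c) + timedelta(days=k)).strftime("%Y%m%d") for k in range(-40, 6)]

results = {
    days_back: [i for i in candidates if original(c, i) != rewritten(c, i, days_back)]
    for days_back in (29, 30)
}
print(results)  # {29: [], 30: ['20191213']}
```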
For this query:
select *
from chairs c join data
id
on c.patient_id = id.patient_id and
to_date(c.from_date, 'YYYYMMDD') - to_date(id.from_date, 'YYYYMMDD') >= 0 and
to_date (c.from_date, 'YYYYMMDD') - to_date(id.from_date, 'YYYYMMDD') < 30;
You should start with indexes on (patient_id, from_date) -- you can put them in both tables.
The date comparisons are problematic. Storing the values as actual dates can help. But it is not a 100% solution because comparison operations are still needed.
Depending on what you are actually trying to accomplish there might be other ways of writing the query. I might encourage you to ask a new question, providing sample data, desired results, and a clear explanation of what you really want. For instance, this query is likely to return a lot of rows. And that just takes time as well.
Your query has a non-sargable predicate: it wraps the columns in functions that must be evaluated for every row, so no index on from_date can be used for that condition. You need to remove such functions and compare the columns directly. As an example, assuming from_date is stored as a date:
SELECT *
FROM chairs AS c
JOIN data AS id
ON c.patient_id = id.patient_id
AND c.from_date >= id.from_date
AND c.from_date < id.from_date + INTERVAL '30 days'
It should run faster with these two indexes:
CREATE INDEX X_SQLpro_001 ON chairs (patient_id, from_date);
CREATE INDEX X_SQLpro_002 ON data (patient_id, from_date);
Also, try to avoid SELECT * and list only the necessary columns.
Here is what I did:
Select count(check_id)
From Checks
Where timestamp::date > '2012-07-31'
Group by 1
Is it right to do it like I did or is there a better way? Should/could I have used the DateDIFF function in my WHERE clause? Something like: DATEDIFF(day, timestamp, '2012/07/31') > 0
Also, I need to figure out how to calculate the total rate of acceptance for this time period. Can anyone provide their expertise with this?
Is it right to do it like I did or is there a better way?
Using a cast like that is a perfectly valid way to convert a timestamp to a date. (I don't understand the reference to the non-existent datediff, though: why would adding anything to a timestamp change it?)
However, the cast has one drawback: if there is an index on the column "timestamp" it won't be used.
But as you just want a range after a certain date, there is no reason to cast the column to begin with.
The following will achieve the same thing as your query, but can make use of an index on the column "timestamp" in case there is one and using it is considered beneficial by the optimizer.
Select count(distinct check_id)
From Checks
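Where "timestamp" >= date '2012-07-31' + 1
Note the + 1, which selects the day after; otherwise the query would include rows that are on 2012-07-31 but after midnight. The >= keeps rows stamped exactly at midnight on 2012-08-01, which the original cast also includes.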
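The equivalence this relies on, `"timestamp"::date > '2012-07-31'` versus a plain comparison on the raw column, can be checked in Python (illustration only; no database involved). The boundary value, midnight on 2012-08-01, is what decides between `>` and `>=`: for that row the cast and `>=` agree, while `>` drops it.

```python
from datetime import date, datetime, timedelta

cutoff = date(2012, 7, 31)
# Start of the following day: 2012-08-01 00:00:00.
lower = datetime.combine(cutoff + timedelta(days=1), datetime.min.time())

samples = [
    datetime(2012, 7, 31, 23, 59, 59),
    datetime(2012, 8, 1, 0, 0, 0),  # exactly midnight: the boundary case
    datetime(2012, 8, 1, 9, 15, 0),
]

for ts in samples:
    by_cast = ts.date() > cutoff  # "timestamp"::date > '2012-07-31'  (not sargable)
    by_ge = ts >= lower           # "timestamp" >= date '2012-07-31' + 1
    by_gt = ts > lower            # "timestamp" >  date '2012-07-31' + 1
    print(ts, by_cast, by_ge, by_gt)
```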
I removed the unnecessary group by from your query.
If you want to get a count per day, then you will need to include the day in the SELECT list. In that case casting is a good way to do it:
Select "timestamp"::date, count(distinct check_id)
From Checks
Where "timestamp" >= date '2012-07-31' + 1
group by "timestamp"::date
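A runnable sketch of the per-day version, with SQLite (via Python's `sqlite3`) standing in for PostgreSQL. Assumptions: SQLite has no `::date` cast, so its `date()` function plays that role, timestamps are ISO text, and the `checks` rows are invented sample data. The filter stays on the raw column while the grouping uses the cast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE checks (check_id INTEGER, "timestamp" TEXT)')
conn.execute('CREATE INDEX idx_checks_ts ON checks ("timestamp")')
conn.executemany(
    'INSERT INTO checks (check_id, "timestamp") VALUES (?, ?)',
    [
        (1, "2012-07-31 18:00:00"),  # before the cutoff: excluded
        (2, "2012-08-01 00:00:00"),
        (2, "2012-08-01 10:30:00"),  # same check twice: counted once for the day
        (3, "2012-08-01 11:00:00"),
        (4, "2012-08-02 09:00:00"),
    ],
)

per_day = conn.execute(
    'SELECT date("timestamp") AS day, COUNT(DISTINCT check_id)'
    " FROM checks"
    ' WHERE "timestamp" >= ?'  # range on the raw column, index-friendly
    ' GROUP BY date("timestamp")'
    " ORDER BY day",
    ("2012-08-01",),
).fetchall()
print(per_day)  # [('2012-08-01', 2), ('2012-08-02', 1)]
```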
I have found many Questions and Answers about a SELECT excluding rows with a value "NOT IN" a sub-query (such as this). But how to exclude a list of values rather than a sub-query?
I want to search for rows whose timestamp is within a range but exclude some specific date-times. In English, that would be:
Select all the ORDER rows recorded between noon and 2 PM today except for the ones of these times: Today 12:34, Today 12:55, and Today 13:05.
SQL might be something like:
SELECT *
FROM order_
WHERE recorded_ >= ?
AND recorded_ < ?
AND recorded_ NOT IN ( list of date-times… )
;
So two parts to this Question:
How to write the SQL to exclude rows having any of a list of values?
How to set an arbitrary number of arguments to a PreparedStatement in JDBC? (The arbitrary number being the count of the list of values to be excluded.)
Pass array
A fast and NULL-safe alternative would be a LEFT JOIN to an unnested array:
SELECT o.*
FROM order_ o
LEFT JOIN unnest(?::timestamp[]) x(recorded_) USING (recorded_)
WHERE o.recorded_ >= ?
AND o.recorded_ < ?
AND x.recorded_ IS NULL;
This way you can prepare a single statement and pass any number of timestamps as array.
The explicit cast ::timestamp[] is only necessary if you cannot type your parameters (like you can in prepared statements). The array is passed as single text (or timestamp[]) literal:
'{2015-07-09 12:34, 2015-07-09 12:55, 2015-07-09 13:05}', ...
Or put CURRENT_DATE into the query and pass times to add, as outlined by @drake. More about adding a time / interval to a date:
How to get the end of a day?
Pass individual values
You could also use a VALUES expression - or any other method to create an ad-hoc table of values.
SELECT o.*
FROM order_ o
LEFT JOIN (VALUES (?::timestamp), (?), (?) ) x(recorded_)
USING (recorded_)
WHERE o.recorded_ >= ?
AND o.recorded_ < ?
AND x.recorded_ IS NULL;
And pass:
'2015-07-09 12:34', '2015-07-09 12:55', '2015-07-09 13:05', ...
This way you can only pass a predetermined number of timestamps.
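When the driver has no array support, a common workaround for a variable-length list is to generate one placeholder per value at run time; in JDBC that means building the `?, ?, ?` string and calling `setTimestamp` once per placeholder. A minimal sketch of the same idea with Python's `sqlite3` (assumptions: SQLite stands in for PostgreSQL, timestamps are ISO text so the exact-match caveat below applies to the text too, and `orders_excluding` is a hypothetical helper). Note that `NOT IN` never matches if one of the listed values is NULL, which is one reason the `LEFT JOIN` form is described as NULL-safe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_ (id INTEGER PRIMARY KEY, recorded_ TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO order_ (recorded_) VALUES (?)",
    [(t,) for t in [
        "2015-07-09 11:59:00",
        "2015-07-09 12:34:00",
        "2015-07-09 12:40:00",
        "2015-07-09 12:55:00",
        "2015-07-09 13:05:00",
        "2015-07-09 14:00:00",
    ]],
)


def orders_excluding(conn, start, end, excluded):
    """Rows with start <= recorded_ < end whose recorded_ is not in `excluded`."""
    sql = "SELECT recorded_ FROM order_ WHERE recorded_ >= ? AND recorded_ < ?"
    if excluded:
        # One "?" per value: only placeholders are generated, never the values.
        sql += " AND recorded_ NOT IN (%s)" % ", ".join("?" * len(excluded))
    sql += " ORDER BY recorded_"
    return [row[0] for row in conn.execute(sql, [start, end, *excluded])]


print(orders_excluding(
    conn,
    "2015-07-09 12:00:00",
    "2015-07-09 14:00:00",
    ["2015-07-09 12:34:00", "2015-07-09 12:55:00", "2015-07-09 13:05:00"],
))  # ['2015-07-09 12:40:00']
```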
Asides
For up to 100 parameters (or your setting of max_function_args), you could use a server-side function with a VARIADIC parameter:
Return rows matching elements of input array in plpgsql function
I know that you are aware of timestamp characteristics, but for the general public: equality matches can be tricky for timestamps, since those can have up to 6 fractional digits for seconds and you need to match exactly.
Related
Select rows which are not present in other table
Optimizing a Postgres query with a large IN
SELECT *
FROM order_
WHERE recorded_ BETWEEN CURRENT_DATE + time '12:00' AND CURRENT_DATE + time '14:00'
AND recorded_ NOT IN (CURRENT_DATE + time '12:34',
CURRENT_DATE + time '12:55',
CURRENT_DATE + time '13:05')
;
I am trying to query my postgresql db to return results where a date is in certain month and year. In other words I would like all the values for a month-year.
The only way I've been able to do it so far is like this:
SELECT user_id
FROM user_logs
WHERE login_date BETWEEN '2014-02-01' AND '2014-02-28'
Problem with this is that I have to calculate the first date and last date before querying the table. Is there a simpler way to do this?
Thanks
With dates (and times) many things become simpler if you use >= start AND < end.
For example:
SELECT
user_id
FROM
user_logs
WHERE
login_date >= '2014-02-01'
AND login_date < '2014-03-01'
In this case you still need to calculate the start date of the month you need, but that should be straightforward in any number of ways.
The end date is also simplified; just add exactly one month. No messing about with 28th, 30th, 31st, etc.
This structure also has the advantage of being able to maintain use of indexes.
Many people may suggest a form such as the following, but they do not use indexes:
WHERE
DATE_PART('year', login_date) = 2014
AND DATE_PART('month', login_date) = 2
This involves calculating the conditions for every single row in the table (a scan) instead of using an index to find the range of rows that will match (a range seek).
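The scan-versus-seek difference is visible in the query planner. A sketch using SQLite's `EXPLAIN QUERY PLAN` through Python's `sqlite3` (assumptions: SQLite stands in for PostgreSQL and `strftime` plays the role of `DATE_PART`; the exact plan wording varies between SQLite versions, so only the SEARCH/SCAN keywords are inspected). The range query should report a SEARCH using the index, the function-based one a full SCAN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_logs (user_id INTEGER, login_date TEXT)")
conn.execute("CREATE INDEX idx_login_date ON user_logs (login_date)")


def plan(sql: str, params=()) -> str:
    # Each EXPLAIN QUERY PLAN row ends with a human-readable detail column.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


range_plan = plan(
    "SELECT user_id FROM user_logs WHERE login_date >= ? AND login_date < ?",
    ("2014-02-01", "2014-03-01"),
)
function_plan = plan(
    "SELECT user_id FROM user_logs"
    " WHERE strftime('%Y', login_date) = '2014' AND strftime('%m', login_date) = '02'"
)
print(range_plan)
print(function_plan)
```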
From PostgreSQL 9.2, range types are supported, so you can write this as:
SELECT user_id
FROM user_logs
WHERE '[2014-02-01,2014-03-01)'::daterange @> login_date
This should be more efficient than the string comparison.
Just in case somebody lands here: since 8.1 you can simply use:
SELECT user_id
FROM user_logs
WHERE login_date BETWEEN SYMMETRIC '2014-02-01' AND '2014-02-28'
From the docs:
BETWEEN SYMMETRIC is the same as BETWEEN except there is no requirement that the argument to the left of AND be less than or equal to the argument on the right. If it is not, those two arguments are automatically swapped, so that a nonempty range is always implied.
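The documented rule is simple enough to model in a few lines. A Python sketch (the function names are hypothetical; this mimics the semantics and does not call PostgreSQL):

```python
from datetime import date


def between(x, low, high) -> bool:
    # Plain BETWEEN: inclusive at both ends, and empty when low > high.
    return low <= x <= high


def between_symmetric(x, a, b) -> bool:
    # BETWEEN SYMMETRIC: swap the bounds if needed, so the range is never empty.
    low, high = (a, b) if a <= b else (b, a)
    return low <= x <= high


d = date(2014, 2, 14)
print(between(d, date(2014, 2, 28), date(2014, 2, 1)))            # False
print(between_symmetric(d, date(2014, 2, 28), date(2014, 2, 1)))  # True
```

It only spares you from ordering the bounds: both ends stay inclusive, so you still need to know the month's last day.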
SELECT user_id
FROM user_logs
WHERE login_date BETWEEN '2014-02-01' AND '2014-03-01'
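The BETWEEN keyword works well for dates: for a date it assumes the time is 00:00:00 (i.e. midnight). Keep in mind that BETWEEN is inclusive at both ends, so this range also returns rows dated 2014-03-01 (or, for a timestamp column, rows stamped exactly 2014-03-01 00:00:00).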
Read the documentation.
http://www.postgresql.org/docs/9.1/static/functions-datetime.html
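Because `BETWEEN` includes both endpoints, a quick check with SQLite through Python's `sqlite3` (standing in for PostgreSQL; dates stored as ISO text; sample rows invented) shows what the upper bound '2014-03-01' lets through when the column holds plain dates, compared with the half-open form:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_logs (user_id INTEGER, login_date TEXT)")
conn.executemany(
    "INSERT INTO user_logs VALUES (?, ?)",
    [(1, "2014-02-01"), (2, "2014-02-28"), (3, "2014-03-01")],
)

between_ids = [row[0] for row in conn.execute(
    "SELECT user_id FROM user_logs"
    " WHERE login_date BETWEEN '2014-02-01' AND '2014-03-01' ORDER BY user_id")]
half_open_ids = [row[0] for row in conn.execute(
    "SELECT user_id FROM user_logs"
    " WHERE login_date >= '2014-02-01' AND login_date < '2014-03-01' ORDER BY user_id")]
print(between_ids)    # [1, 2, 3]  (the 1 March login is included)
print(half_open_ids)  # [1, 2]
```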
I used a query like this:
WHERE
(
date_trunc('day',table1.date_eval) = '2015-02-09'
)
or
WHERE (date_trunc('day', table1.date_eval) >= '2015-02-09' AND date_trunc('day', table1.date_eval) < '2015-02-10')
How would I go about doing a query that returns results of all rows that contain dates for current year and month at the time of query.
Timestamps for each row are formatted as such: yyyy-mm-dd
I know it probably has something to do with the date function and that I must somehow set a special parameter to make it spit out like such: yyyy-mm-%%.
Setting days to be wild card character would do the trick but I can't seem to figure it out how to do it.
Here is a link for quick reference to the date and time functions in MySQL:
http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html
Thanks
I think EXTRACT is the function you are looking for:
SELECT * FROM table
WHERE EXTRACT(YEAR_MONTH FROM timestamp_field) = EXTRACT(YEAR_MONTH FROM NOW())
You could extract the year and month using a function, but then the query will not be able to use an index.
If you want scalable performance, you need to do this:
SELECT *
FROM myTable
WHERE some_date_column BETWEEN '2009-01-01' AND '2009-01-31'
select * from someTable where year(myDt) = 2009 and month(myDt) = 9 and day(myDt) = 12