Postgres Interval not working in where subquery


I have the following code:
Select * from table
where to_date <= ( select max(to_date)
FROM table)
and to_date >= (select (max(to_date)::date - interval '6 months')::date as to_date
FROM table)
Basically, I am trying to get all rows between the max date and 6 months before it, and I tried doing that with 2 subqueries.
I seem to get null, but oddly enough, if I take the date that
(select (max(to_date)::date - interval '6 months')::date
returns and paste it in as a literal, >= 'yyyy-mm-dd', the query works fine. This is weird, as both subqueries return date values, and I have no idea why it behaves this way.

You don't need both comparisons:
select *
from table
where to_date >= (select (max(to_date)::date - interval '6 months')::date as to_date
from table
) ;
This is assuming that the table reference is the same in both the inner and outer query.

I can't really think of a reason why this wouldn't work, but you can rewrite the query to only run a single sub-query, which is also more efficient:
select t.*
from the_table t
cross join (
select max(to_date) as max_date
from the_table
) mt
where t.to_date <= mt.max_date
and t.to_date >= mt.max_date - interval '6 months'
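If you want to sanity-check the logic outside the database, here is a minimal Python sketch of the same idea: compute the max date once, then filter on both bounds. The helper minus_months and the sample rows are made up for illustration. The helper clamps the day to the end of the target month, which is my understanding of how Postgres handles subtracting interval '6 months' from a date.

```python
import calendar
from datetime import date


def minus_months(d: date, months: int) -> date:
    """Subtract whole months, clamping the day to the end of the target month."""
    index = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def last_six_months(rows):
    """Keep rows whose to_date lies between (max - 6 months) and max, inclusive."""
    max_date = max(r["to_date"] for r in rows)   # the single "sub-query"
    lower = minus_months(max_date, 6)
    return [r for r in rows if lower <= r["to_date"] <= max_date]


# Hypothetical sample data, assuming to_date is never NULL.
rows = [
    {"id": 1, "to_date": date(2023, 1, 10)},
    {"id": 2, "to_date": date(2023, 5, 31)},
    {"id": 3, "to_date": date(2023, 8, 31)},
    {"id": 4, "to_date": date(2023, 2, 28)},
]
recent = last_six_months(rows)
print([r["id"] for r in recent])
```

Comparing this output with what the SQL returns for the same rows is a quick way to tell whether the problem is in the date arithmetic or somewhere else.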

Related

SELECT to_char(date_col, 'YYYY-MM') from table - doesn't work in subquery

I'm trying to build a query using Postgres 9.5.3
The to_char function in a simple statement like this:
SELECT to_char(date_created,'YYYY-MM') FROM some_table;
returns results as follows:
+-----------
| to_char
+-----------
| 2017-06
| 2017-07
| 2017-10
Full statement I want to run
SELECT * FROM generate_series(
to_date('2016-01-01', 'YYYY-MM'),
to_date('2017-01-01', 'YYYY-MM'),
interval '1 month')
AS dates
WHERE dates NOT IN (
SELECT to_char(date_created,'YYYY-MM') FROM some_table
);
results in the following error:
Error in query: ERROR: operator does not exist: timestamp with time zone = text
LINE 2: WHERE dates NOT IN (SELECT to_char(date_create,'YYYY-MM')...
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.

You are comparing apples to oranges (timestamps to text).
generate_series() with an interval as the third parameter actually returns a timestamp (here a timestamp with time zone, as the error message shows), not a date. Your sub-select returns the column date_created as a string (text), and comparing a timestamp to text doesn't work.
As you apparently only want to check for the same month, you need to convert the date returned from generate_series() to the same text value:
SELECT *
FROM generate_series(to_date('2016-01-01', 'YYYY-MM'),
to_date('2017-01-01', 'YYYY-MM'),
interval '1 month') as dates (d)
WHERE to_char(dates.d, 'yyyy-mm') NOT IN (SELECT to_char(date_created,'YYYY-MM')
FROM some_table);
Another option is to compare dates by "normalizing" the date_created to the start of the month:
SELECT *
FROM generate_series(to_date('2016-01-01', 'YYYY-MM'),
to_date('2017-01-01', 'YYYY-MM'),
interval '1 month') as dates (d)
WHERE dates.d NOT IN (SELECT date_trunc('month', date_created)
FROM some_table);

You have a data type mismatch. Use TO_DATE to convert the right side (the inner query) back to DATE:
...
WHERE dates NOT IN (
SELECT to_date(to_char(date_created,'YYYY-MM'), 'YYYY-MM') FROM some_table
);
or use TO_CHAR to convert the left side to text:
...
WHERE to_char(dates, 'YYYY-MM') NOT IN (
SELECT to_char(date_created,'YYYY-MM') FROM some_table
);
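The underlying rule in both answers is "bring both sides to the same representation before comparing". Here is a small Python sketch of the text-based variant: every generated month and every date_created value is reduced to a 'YYYY-MM' string first. The function month_starts and the sample timestamps are invented for the example.

```python
from datetime import date, datetime


def month_starts(start: date, stop: date):
    """Yield the first day of each month from start to stop, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (stop.year, stop.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


# Hypothetical stand-in for some_table.date_created.
created = [
    datetime(2016, 3, 14, 9, 30),
    datetime(2016, 3, 20, 17, 5),
    datetime(2016, 11, 2, 8, 0),
]

# Normalize the "inner query" side to text once.
seen = {ts.strftime("%Y-%m") for ts in created}

# Normalize the generated side the same way before the NOT IN test.
missing = [
    m for m in month_starts(date(2016, 1, 1), date(2017, 1, 1))
    if m.strftime("%Y-%m") not in seen
]
print(missing)
```

Comparing a date object directly against the strings in seen would never match, which is the Python analogue of the type mismatch Postgres refuses to run.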

SQL Aggregation Join and Subquery Optimisation

I am trying to get aggregate values by time periods of two relations (buys and uses) and join them so that I can get the results in one report and also draw a ratio on them. I am using PostgreSQL. The end report required is: dateTime, u.sum, b.sum, b.sum/u.sum
The following query works but scales very poorly with larger table sizes.
SELECT b2.datetime AS dateTime, b2.sum AS BUY_VOLUME, u1.sum AS USE_VOLUME,
CASE u1.sum
WHEN 0 THEN 0
ELSE (b2.sum / u1.sum)
END AS buyToUseRatio
FROM(
SELECT SUM(b.total / 100.0) AS sum, date_trunc('week', (b.datetime + INTERVAL '1 day')) - INTERVAL '1 day' as datetime
FROM buys AS b
WHERE
datetime > date_trunc('month', CURRENT_DATE) - INTERVAL '1 year'
GROUP BY datetime) AS b2
INNER JOIN (SELECT SUM(u.amount) / 100.00 AS sum, date_trunc('week', (u.datetime + INTERVAL '1 day')) - INTERVAL '1 day' AS datetime
FROM uses AS u
WHERE
datetime > date_trunc('month', CURRENT_DATE) - INTERVAL '1 year'
GROUP BY datetime) AS u1 ON b2.datetime = u1.datetime
ORDER BY b2.datetime ASC;
I was wondering if anyone could help me by providing an alternative query that would get the end result required and is faster to execute.
I appreciate any help on this :-) My junior level SQL is a little rusty and I can't think of another way of doing this without creating indexes. Thanks in advance.

At least, these indexes can help your query:
create index idx_buys_datetime on buys(datetime);
create index idx_uses_datetime on uses(datetime);
Your query seems fine. However, you could use a full join (instead of inner) to get all rows where at least one of your tables has data. You could even use generate_series() to always have 1 year of results, even when there is no data in either of your tables, but I'm not sure if that's what you need. Some other things can also be written more simply; your query could look like this:
select dt, buy_volume, use_volume, buy_volume / nullif(use_volume, 0.0) buy_to_use_ratio
from (select sum(total / 100.0) buy_volume, date_trunc('week', (datetime + interval '1 day')) - interval '1 day' dt
from buys
where datetime > date_trunc('month', current_timestamp - interval '1 year')
group by 2) b
full join (select sum(amount) / 100.0 use_volume, date_trunc('week', (datetime + interval '1 day')) - interval '1 day' dt
from uses
where datetime > date_trunc('month', current_timestamp - interval '1 year')
group by 2) u using (dt)
order by 1
http://rextester.com/YVASV92568
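To make the bucketing and the ratio easier to follow, here is a rough Python sketch of what the rewritten query computes. It assumes date_trunc('week', ...) truncates to Monday, so shifting by one day before and after gives weeks that start on Sunday. All function names and sample rows are hypothetical, and amounts are assumed to be stored in cents as in the question.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def week_bucket(ts: datetime):
    """date_trunc('week', ts + 1 day) - 1 day: the Sunday that starts ts's week."""
    shifted = ts + timedelta(days=1)
    monday = shifted.date() - timedelta(days=shifted.weekday())
    return monday - timedelta(days=1)


def weekly_totals(rows):
    """Sum (timestamp, cents) rows per week bucket, converted from cents."""
    totals = defaultdict(float)
    for ts, cents in rows:
        totals[week_bucket(ts)] += cents / 100.0
    return totals


def weekly_report(buys, uses):
    """Full join on the week bucket; the ratio is None when use volume is 0 or missing."""
    b, u = weekly_totals(buys), weekly_totals(uses)
    report = []
    for week in sorted(set(b) | set(u)):
        buy, use = b.get(week), u.get(week)
        ratio = buy / use if buy is not None and use else None  # like nullif(use, 0)
        report.append((week, buy, use, ratio))
    return report


buys = [(datetime(2024, 1, 6, 10), 500), (datetime(2024, 1, 7, 10), 300),
        (datetime(2024, 1, 8, 9), 200)]
uses = [(datetime(2024, 1, 8, 12), 250)]
for row in weekly_report(buys, uses):
    print(row)
```

The week with buys but no uses still shows up in the report with an empty ratio, which is the difference between the full join and the original inner join.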

The answer depends on how large your tables are, but if it were me, I would create one or two new "summary" tables based on your query and make sure to keep them updated (run a batch job once a day, or once an hour, that refreshes them with the data that has changed recently).
Then I could query those tables instead, which would be much faster.
If, however, your tables are very small, just keep going the way you are and play around with indexes until you get acceptable timings.

Postgres - Fast way to sum over rows from last day of month

I want to query a table and sum a column for all of the rows from the last day of the month.
Let's use the following table as an example:
CREATE TABLE example(dt date, value int)
(The real table has many more columns and is relatively large, and the real query is more complicated)
I have the following query:
SELECT dt, SUM(value)
FROM example
WHERE dt IN (SELECT DISTINCT
date_trunc('MONTH', generate_series('2012-01-01'::date,
'2016-12-01'::date,
interval '1 day') + INTERVAL '1 MONTH - 1 day')::date)
GROUP BY dt
It runs in about 2 seconds on my real table.
However, if I generate the full list of end-of-month days in my range and parameterise the query like so:
SELECT dt, SUM(value)
FROM example
WHERE dt IN ('2012-01-31', ...)
GROUP BY dt
It's much quicker, ~750ms.
I would prefer not to generate the dates and pass them through to the query like that. Is there a way I can do this entirely in SQL and make it as fast as the latter version?

The sub-select is needlessly complicated. It can be simplified to:
SELECT dt, SUM(value)
FROM example
WHERE dt IN (SELECT d::date
from generate_series('2012-01-01'::date, '2016-12-01'::date, interval '1 month') dates (d))
GROUP BY dt; --<< the group by is necessary
Maybe that speeds up the query.
You can also try to put the date generation into a CTE:
with dates (d) as (
SELECT t::date
from generate_series('2012-01-01'::date, '2016-12-01'::date, interval '1 month') t
)
SELECT dt, SUM(value)
FROM example
WHERE dt IN ( select d from dates)
GROUP BY dt;
Sometimes doing a JOIN is also more efficient:
with dates (d) as (
SELECT t::date
from generate_series('2012-01-01'::date, '2016-12-01'::date, interval '1 month') t
)
SELECT dt, SUM(value)
FROM example
JOIN dates on example.dt = dates.d
GROUP BY dt;

The performance problem in your query comes from generating a daily series. Change it to monthly, remove the distinct, and add a group by:
select dt, sum(value)
from
example
inner join (
select date_trunc('month', dt) + interval '1 month - 1 day' as dt
from generate_series('2012-01-01'::date, '2016-12-01', '1 month') gs (dt)
) d using (dt)
group by dt
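As a cross-check of the expression date_trunc('month', dt) + interval '1 month - 1 day', here is a short Python sketch that builds the same set of month-end dates and sums only the rows that fall on them. It is only an illustration of the logic, not a replacement for the SQL. The function names and sample rows are made up, and calendar.monthrange stands in for the interval arithmetic.

```python
import calendar
from collections import defaultdict
from datetime import date


def month_end_dates(first_year: int, last_year: int):
    """Return the last calendar day of every month in the given years."""
    return {
        date(y, m, calendar.monthrange(y, m)[1])
        for y in range(first_year, last_year + 1)
        for m in range(1, 13)
    }


def sum_on_month_ends(rows, first_year, last_year):
    """Sum value per dt, keeping only rows dated on a month end."""
    wanted = month_end_dates(first_year, last_year)   # one entry per month
    totals = defaultdict(int)
    for dt, value in rows:
        if dt in wanted:
            totals[dt] += value
    return dict(totals)


# Hypothetical rows of (dt, value), mirroring the example table.
rows = [
    (date(2012, 1, 31), 5),
    (date(2012, 1, 31), 7),
    (date(2012, 2, 15), 100),
    (date(2012, 2, 29), 1),
]
print(sum_on_month_ends(rows, 2012, 2016))
```

The lookup set has one entry per month, which matches the point above about generating a monthly series instead of a daily one.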

Difference of datetime column in SQL

I have a table of 20,000 records. Each record has a datetime field. I want to select all records where the gap between one record and the next is more than one hour (the condition applies to the datetime field).
Can anyone give me the SQL for this?
regards
KAM

ANSI SQL supports the lead() function. However, date/time functions vary by database. The following is the logic you want, although the exact syntax varies, depending on the database:
select t.*
from (select t.*,
lead(datetimefield) over (order by datetimefield) as next_datetimefield
from t
) t
where datetimefield + interval '1 hour' < next_datetimefield;
Note: In Teradata, the where would be:
where datetimefield + interval '1' hour < next_datetimefield;
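If the lead() logic is unfamiliar, this is roughly what it does, written as a small Python sketch: sort by the datetime field, pair each row with the one after it, and keep the pairs that are more than an hour apart. The function name and the sample timestamps are invented for illustration.

```python
from datetime import datetime, timedelta


def rows_before_long_gaps(timestamps, gap=timedelta(hours=1)):
    """Return (current, next) pairs where next is more than `gap` after current."""
    ordered = sorted(timestamps)                 # order by datetimefield
    pairs = zip(ordered, ordered[1:])            # lead(datetimefield)
    return [(cur, nxt) for cur, nxt in pairs if cur + gap < nxt]


# Hypothetical sample values, deliberately unsorted.
stamps = [
    datetime(2024, 1, 1, 11, 0),
    datetime(2024, 1, 1, 9, 0),
    datetime(2024, 1, 1, 13, 0),
    datetime(2024, 1, 1, 9, 30),
    datetime(2024, 1, 1, 11, 45),
]
for cur, nxt in rows_before_long_gaps(stamps):
    print(cur, "->", nxt)
```

The last row never appears as "current" because it has no successor, the same way lead() returns NULL for it and the comparison drops the row.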

This can also be done with a subquery, which should work on all DBMSs. As Gordon said, date/time functions differ from one to the next.
SELECT t.* FROM YourTable t
WHERE t.DateCol + interval '1 hour' < (SELECT min(s.DateCol) FROM YourTable s
WHERE t.ID = s.ID AND s.DateCol > t.DateCol)
You can replace this:
t.DateCol + interval '1 hour'
with one of these, depending on the DBMS:
DATE_ADD(t.DateCol, INTERVAL 1 hour) -- MySQL
DATEADD(hour, 1, t.DateCol) -- SQL Server

Although Teradata doesn't support Standard SQL's LEAD it's easy to rewrite:
select tab.*,
min(ts) over (order by ts rows between 1 following and 1 following) as next_ts
from tab
qualify
ts < next_ts - interval '1' hour
If you don't need to show the next timestamp:
select *
from tab
qualify
ts < min(ts) over (order by ts rows between 1 following and 1 following) - interval '1' hour
QUALIFY is a Teradata extension, but really nice to have; it filters on window functions the way HAVING filters after GROUP BY.

SQL date check within set number of days

I'm trying to find out how many clients viewed a property within 14 days of May 20, 2004, either before or after. I'm not really sure how to go about this.
I'm assuming I need to group it and use a HAVING?
EDIT: I am using Oracle now.
select count(*)
from VIEWING
WHERE CLAUSE?

For a one time query with that specific date,
select count(*) clients
from yourtable
where yourdatefield >= {d'2004-05-06'}
and yourdatefield < {d'2004-06-04'}
You might want to consult a calendar to see if those dates are correct.
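Instead of a calendar, the window can be checked with a few lines of date arithmetic. This is just a sketch: window_bounds, count_within, and the sample viewing dates are hypothetical names and values, and the column is assumed to hold plain dates without a time part.

```python
from datetime import date, timedelta


def window_bounds(center: date, days: int):
    """Return the inclusive (start, end) dates `days` before and after center."""
    delta = timedelta(days=days)
    return center - delta, center + delta


def count_within(view_dates, center: date, days: int = 14):
    """Count viewings whose date falls inside the inclusive window."""
    start, end = window_bounds(center, days)
    return sum(1 for d in view_dates if start <= d <= end)


start, end = window_bounds(date(2004, 5, 20), 14)
print(start, end)

# Hypothetical viewing dates: two just outside the window, two on its edges.
views = [date(2004, 5, 5), date(2004, 5, 6), date(2004, 6, 3), date(2004, 6, 4)]
print(count_within(views, date(2004, 5, 20)))
```

With inclusive bounds this gives May 6 through June 3, 2004.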

Edit #1, since you are using Oracle, you can use:
select count(*) TotalClients
from yourtable
where dt >= (to_date('2004-05-20', 'yyyy-mm-dd') - INTERVAL '14' DAY)
and dt <= (to_date('2004-05-20', 'yyyy-mm-dd') + INTERVAL '14' DAY)
See SQL Fiddle with Demo
Based on some of your previous questions you were using MySQL.
If you are using MySQL then you can use the DATE_ADD() function to get the date range and then use count(*) to return all records from those dates:
select count(*) TotalClients
from yourtable
where dt >= date_add(str_to_date('2004-05-20', '%Y-%m-%d'), INTERVAL -14 day)
and dt <= date_add(str_to_date('2004-05-20', '%Y-%m-%d'), INTERVAL 14 day)
