Postgres partitioned tables with dynamic where clause - sql

I have a partitioned table that I'm using to get the sum of column X for the records that match on column Y.
If I use
where logdate >= '2020-01-01' AND logdate < '2020-02-01'
I get a clean query plan and execution, while if I do
where logdate >= (SELECT CURRENT_DATE - 30) AND logdate < '2020-02-01'
(really, SELECT CURRENT_DATE - 30 is just an example to show that the initial date is going to be generated dynamically) it traverses all the partitions with bitmap heap scans and index scans, resulting in query execution 10 times slower.
Is there something I can do in order to improve the query for that given part (the WHERE clause), or should I give up and instead manipulate the dates in the programming language I'm currently using?
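For what it's worth, one workaround (on versions before PostgreSQL 11, which added run-time partition pruning) is to build the query with the dates interpolated as literals, for example in a PL/pgSQL function. The sketch below is an illustration under stated assumptions, not a definitive fix; the table name my_partitioned_table and column x are made up to stand in for your schema:
-- Interpolating the dates with %L turns them into literals in the dynamic
-- statement, so the planner can prune partitions at plan time.
-- my_partitioned_table and x are assumed names; logdate is from the question.
CREATE OR REPLACE FUNCTION sum_x_between(p_start date, p_end date)
RETURNS numeric
LANGUAGE plpgsql AS
$$
DECLARE
    v_sum numeric;
BEGIN
    EXECUTE format(
        'SELECT sum(x) FROM my_partitioned_table
          WHERE logdate >= %L AND logdate < %L',
        p_start, p_end)
    INTO v_sum;
    RETURN v_sum;
END;
$$;

-- Usage:
SELECT sum_x_between(CURRENT_DATE - 30, DATE '2020-02-01');
Computing the date in the application and sending it as a literal achieves the same thing. On PostgreSQL 11 or later, run-time partition pruning can often handle stable expressions like CURRENT_DATE - 30 directly, so upgrading may make the workaround unnecessary.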

Related

My basic query is taking a long time to perform

I use MS SQL Server. I have a "jobs" table which has 140 columns and contains more than 4 million records. The table's columns are mostly varchar and bit.
Forty of the table's columns are connected to other tables, like "issuerid" from the "issuers" table, "fileid" from "files"...
The table's only index is on "fileid", and it is non-unique and non-clustered.
My basic query is like in the following:
select issuerid, count(id) as total,
       sum(case when X_Status = 1 then 1 else 0 end) as P_Count
from jobs
where 1=1
  and issuerid = '1001'
  and creationdate between '01/01/2019 12:00:01 AM' and '06/30/2019 11:59:59 PM'
group by issuerid
The duration of the query is: 1 min 20 seconds (the PC has an SSD and 4 GB of RAM).
So I tried an index on issuerid, but it didn't help much.
I have a lot of queries on this table for my ASP page. Mostly it is just the SUM CASE that changes, like this:
sum(case when Y_Status=1 then 1 else 0 end) P_Count
So I even tried keeping just 2 columns in the table and executed this query:
select count(id) as total, sum(case when X_Status=1 then 1 else 0 end) P_Count from newjobs where 1=1
and this took around 30 seconds.
I have read many topics and articles on improving query performance, but nothing worked. Does anyone have an idea to share?
Thank you.
The following should work for your exact query:
CREATE NONCLUSTERED INDEX IX_Jobs__IssuerID_CreationDate ON dbo.Jobs (IssuerID, CreationDate)
INCLUDE (X_Status);
Since your query filters on IssuerID and CreationDate, these are the key columns; I have then added X_Status as a non-key column so that the whole query can be answered from this index, with no chance of a bookmark lookup or a full index scan.
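Since you mention that mostly the SUM CASE changes (X_Status, Y_Status, ...), a variation on the same idea is to include every status column you aggregate on, so that the other query variants stay covered as well. The column list and index name below are assumptions based on your examples:
-- Include each status column used in a SUM(CASE ...) so all variants
-- of the query are covered by the one index:
CREATE NONCLUSTERED INDEX IX_Jobs__IssuerID_CreationDate_Statuses
    ON dbo.Jobs (IssuerID, CreationDate)
    INCLUDE (X_Status, Y_Status);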
As an aside, your current WHERE clause will always exclude things that happen in the first second of the first day and the last second of the last day (i.e. between 00:00:00 and 00:00:01 on 1st January, and between 06/30/2019 23:59:59 and 07/01/2019 00:00:00). This may be deliberate, but I suspect it isn't. It is usually much better, and also clearer as to your intentions, to use an open-ended date range:
WHERE CreationDate > '20190101'
AND CreationDate < '20190701'
Or more likely:
WHERE CreationDate >= '20190101'
AND CreationDate < '20190701'
I have also switched to a culture-invariant datetime format, so that the date literal is interpreted as the same date on every machine. For more reading see:
What do BETWEEN and the devil have in common?
Bad habits to kick : mis-handling date / range queries

SQL query to get today's new records compared with yesterday

I have this table:
COD (Integer) (PK)
ID (Varchar)
DATE (Date)
I just want to get the new IDs from today, compared with yesterday (the IDs from today that are not present yesterday).
This needs to be done with just one query, with maximum efficiency, because the table will have 4-5 million records.
As a Java developer I am able to do this with 2 queries, but doing it with just one is beyond my knowledge, so any help would be much appreciated.
EDIT: the date format is dd/mm/yyyy, and each ID may appear 0 or 1 times per day.
Here is a solution that will go over the base data one time only. It selects the id and the date where the date is either yesterday or today (or both). Then it GROUPS BY id - each group will have either one or two rows. Then it filters by the condition that the MIN date in the group is "today". Those are the id's that exist today but did not exist yesterday.
DATE is an Oracle keyword, best not used as a column name. I changed that to DT. I also assume that your "dt" field is a pure date (as pure as it can be in Oracle, meaning: time of day, which is always present, is 00:00:00).
select id
from your_table
where dt in (trunc(sysdate), trunc(sysdate) - 1)
group by id
having min(dt) = trunc(sysdate)
;
Edit: Gordon makes a good point: perhaps you have more than one such row per ID in the same day? In that case the time-of-day may also be different from 00:00:00.
If so, the solution can be adapted:
select id
from your_table
where dt >= trunc(sysdate) - 1 and dt < trunc(sysdate) + 1
group by id
having min(dt) >= trunc(sysdate)
;
Either way: (1) the base table is read just once; (2) the column DT is not wrapped within any function, so if there is an index on that column, it can be used to access just the needed rows.
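For example, a composite index along these lines (the index name is made up) would let both versions above be answered with an index range scan on DT, without touching the table itself:
-- dt first, so the range predicate can drive the scan; id is added
-- so the whole query is covered by the index alone:
create index your_table_dt_id_idx on your_table (dt, id);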
The typical method would use not exists:
select t.*
from t
where t.date >= trunc(sysdate) and t.date < trunc(sysdate + 1) and
      not exists (select 1
                  from t t2
                  where t2.id = t.id and
                        t2.date >= trunc(sysdate - 1) and t2.date < trunc(sysdate)
                 );
This is a general solution. If you know that there is at most one record per day, there are better solutions, such as using lag().
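For completeness, here is a sketch of how the lag() variant could look, assuming (as the question's edit states) that each ID appears at most once per day; the table and column names follow the your_table/DT renaming from the answer above:
select id
from (select id, dt,
             lag(dt) over (partition by id order by dt) as prev_dt
      from your_table
      where dt >= trunc(sysdate - 1) and dt < trunc(sysdate + 1)
     )
where dt >= trunc(sysdate)   -- the row is from today ...
  and prev_dt is null;       -- ... and there was no row for this id yesterday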
Use MINUS. I suppose your date column has a time part, so you need to truncate it.
select id from mytable where trunc(date) = trunc(sysdate)
minus
select id from mytable where trunc(date) = trunc(sysdate) - 1;
I suggest the following function-based index. Without it, the query would have to full scan the table, which would probably be quite slow.
create index mytable_trunc_date_idx on mytable (trunc(date), id);

Most efficient way to retrieve data by timestamps

I'm using PostgreSQL 9.2.8.
I have a table like:
CREATE TABLE foo
(
foo_date timestamp without time zone NOT NULL,
-- other columns, constraints
)
This table contains about 4,000,000 rows. One day's data is about 50,000 rows.
My goal is to retrieve one day's data as fast as possible.
I have created an index like:
CREATE INDEX foo_foo_date_idx
ON foo
USING btree
(date_trunc('day'::text, foo_date));
And now I'm selecting data like this (now() is just an example, I need data from ANY day):
select *
from process
where date_trunc('day'::text, now()) = date_trunc('day'::text, foo_date)
This query takes about 20 s.
Is there any possibility of obtaining the same data in a shorter time?
It takes time to retrieve 50,000 rows. 20 seconds seems like a long time, but if the rows are wide, then that might be an issue.
You can directly index foo_date and use inequalities. So, you might try this version:
create index foo_foo_date_idx2 on foo(foo_date);
select p.*
from process p
where p.foo_date >= date_trunc('day', now()) and
      p.foo_date < date_trunc('day', now() + interval '1 day');

SQL: Difference between "BETWEEN" vs "current_date - number"

I am wondering which of the following is the best way to implement and why.
select * from table1 where request_time between '01/18/2012' and '02/17/2012'
and
select * from table1 where request_time > current_date - 30
I ran the two queries through some of my date tables in my database and using EXPLAIN ANALYZE I found these results:
explain analyze
select * from capone.dim_date where date between '01/18/2012' and '02/17/2012'
Total runtime: 22.716 ms
explain analyze
select * from capone.dim_date where date > current_date - 30
Total runtime: 65.044 ms
So it looks like the 1st option is more optimal. Of course this is biased towards my DBMS but these are still the results I got.
The table has dates ranging from 1900 to 2099 so it is rather large, and not just some dinky little table.
BETWEEN uses inclusive ranges, i.e. when you issue a query like id BETWEEN 2 AND 10, rows with the values 2 and 10 will also be fetched. If you want to exclude those boundary values, use > and <.
Also, with an index on the date column, an open-ended range written with > and < (or >= and <) can use the index just as well as BETWEEN; the practical difference is which boundary rows are included, which matters most for timestamp columns, as the example below shows.
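Rewriting the first query from the question as a half-open range (inclusive start, exclusive end) keeps it index-friendly and still covers the whole of 02/17/2012 regardless of time of day; the dates and column name are taken from the question:
select *
from table1
where request_time >= date '2012-01-18'
  and request_time <  date '2012-02-18';  -- exclusive end: the 18th is not wanted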

Query aggregate faster than MAX

I have a fairly large table in which one of the columns is a date column. The query I execute is as follows.
select max(date) from tbl where date < to_date('10/01/2010','MM/DD/YYYY')
That is, I want to find the cell value closest to and less than a particular date value. This takes considerable time because of the MAX over the large table. Is there a faster way to do this? Maybe using LAST_VALUE?
Put an index on the date column and the query should be plenty fast.
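A minimal sketch of that index (Oracle syntax); since DATE is a reserved word in Oracle, the column is written here as the_date, an assumed name standing in for the question's "date" column:
-- the_date stands in for the question's "date" column:
create index tbl_the_date_idx on tbl (the_date);
With such an index in place, Oracle can typically satisfy max(the_date) under a range predicate with a MIN/MAX index range scan, reading a single index entry instead of the whole table.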
1) Add an index to the date column. Simply put, an index lets the database engine store information about the data so that it can speed up most queries in which that column appears in one of the clauses. Info here: http://docs.oracle.com/cd/B28359_01/server.111/b28310/indexes003.htm
2) Consider adding a second clause to the query. You have where date < to_date('10/01/2010','MM/DD/YYYY') now; why not change it to:
where date < to_date('10/01/2010','MM/DD/YYYY') and date > to_date('09/30/2010', 'MM/DD/YYYY')
since this will reduce the number of scanned rows.
Try
select date from (
    select date
    from tbl
    where date < to_date('10/01/2010','MM/DD/YYYY')
    order by date desc
) where rownum = 1