SQLite - select expression is very slow - sql

I'm experiencing some heavy performance issues with a query in SQLite. Currently there are around 20,000 entries in the table activity_tbl and about 40 in the table activity_data_tbl. I have an index on both of the columns used in the query below, but it doesn't seem to have any effect on performance at all.
SELECT a._id, a.start_time + b.length AS time
FROM activity_tbl a INNER JOIN activity_data_tbl b
ON a.activity_data_id = b._data_id
WHERE time > ?
ORDER BY 2
LIMIT 1
As you can see, I select one column and a value created from adding two columns together. I guess this is what's causing the low performance, since the query is very fast if I just select a.start_time or b.length.
Do you guys have any suggestion for how I could optimize this?

Try putting an index on the time column. This should speed up the query.

This query is not optimizable using indexes for the filter part, since you are filtering and ordering on a calculated value. To optimize the query you will either need to filter on one of the actual table columns (start_time or length) or pre-compute the time values before querying.
The only place an index will help, and I assume you have one, is on b._data_id.
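A rough sketch of the pre-computation approach, assuming the times are stored as integers and the application (or a trigger) keeps the new column up to date whenever start_time or the referenced length changes; the column and index names below are made up:
ALTER TABLE activity_tbl ADD COLUMN end_time INTEGER;
UPDATE activity_tbl
SET end_time = start_time + (SELECT length FROM activity_data_tbl b WHERE b._data_id = activity_data_id);
CREATE INDEX activity_end_time_idx ON activity_tbl(end_time);
-- the query can then filter and order on the indexed column directly:
SELECT _id, end_time AS time FROM activity_tbl WHERE end_time > ? ORDER BY end_time LIMIT 1;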

A compound index may help. According to its documentation, SQLite tries to avoid accessing the table if the index has enough information. So if the engine does its homework, it will recognize that the index is enough to compute the WHERE clause value and save some time. If that does not work, only the pre-computation will do.
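For example, a pair of covering indexes along these lines (names made up) would let SQLite answer the join and read start_time and length from the indexes alone, without visiting the table rows; the filter on the computed time still has to scan, though, and a._id comes for free if it is the rowid:
CREATE INDEX activity_cover_idx ON activity_tbl(activity_data_id, start_time);
CREATE INDEX activity_data_cover_idx ON activity_data_tbl(_data_id, length);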
If you are more often confronted with similar tasks, please read this: http://www.sqlite.org/rtree.html

Related

Index for join query with where clause PostgreSQL

I have to optimize the following query with the help of indexes.
SELECT f.*
FROM first f
JOIN second s on f.attributex_id = s.id
WHERE f.attributex_id IS NOT NULL AND f.attributey_id IS NULL
ORDER BY s.month ASC LIMIT 100;
Further infos:
attributex_id is a foreign key pointing to second.id
attributey_id is a foreign key pointing to another table not used in the query
Changing the query is not an option
For most entries (98%) in first, f.attributex_id IS NOT NULL will be true. The same holds for the second condition, f.attributey_id IS NULL.
I tried to add an index as follows.
CREATE INDEX index_for_first
ON first (attributex_id, attributey_id)
WHERE attributex_id IS NOT NULL AND (attributey_id IS NULL)
But the index is not used (checked via EXPLAIN ANALYZE) when executing the query. What kind of indexes would I need to optimize the query, and what am I doing wrong with the above index?
Does an index on s.month make sense, too (month is unique)?
Based on the query text and the fact that nearly all records in first satisfy the where clause, what you're essentially trying to do is
identify the 100 second records with the lowest month value
output the contents of the related records in the first table.
To achieve that, you can create indexes on the following (a sketch comes after this list):
second.month
first.attributex_id
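A minimal sketch of those two indexes (the index names are made up):
CREATE INDEX second_month_idx ON second (month);
CREATE INDEX first_attributex_id_idx ON first (attributex_id);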
Caveats
Since this query must be optimized, it's safe to say there are many rows in both tables. And since there are only 12 months in the year, the output of the query is probably not deterministic (i.e., it may return a different set of rows each time it's run, even if there is no activity in either table between runs), because many records likely share the same value for month. Adding "tie breaker" column(s) to the index on second may help, though your order by only includes month, so no guarantees. Also, if second.month can have null values, you'll need to decide whether those null values should collate first or last among values.
Also, this particular query is not the only one being run against your data. These indexes will take up disk space and incrementally slow down writes to the tables. If you have a dozen queries that perform poorly, you might fall into a trap of creating a couple indexes to help each one individually and that's not a solution that scales well.
Finally, you stated that
changing the query is not an option
Does that mean you're not allowed to change the text of the query, or the output of the query?
I personally feel like re-writing the query to select from second and then join first makes the goal of the query more obvious. The fact that your initial instinct was to add indexes to first lends credence to this idea. If the query were written as follows, it would have been more obvious that the thing to do is facilitate efficient access to the tiny set of rows in second that you're interested in:
...
from second s
join first f ...
where ...
order by s.month asc limit 100;

Strange behavior when doing where and order by query in postgres

Background: a large table, 50M+ rows; every column used in the query is indexed.
When I run a query like this:
select * from table where A=? order by id DESC limit 10;
In this statement, A and id are both indexed.
Now confusing things happen:
the more rows are returned, the less time the whole SQL takes
the fewer rows are returned, the more time the whole SQL takes
I have a guess here: Postgres does the ORDER BY first and then the WHERE, so it takes more time to find 10 rows in the ordered index when the target row set is small (like finding 10 particular grains of sand on a beach); conversely, if the target row set is large, it's easy to find the first 10.
Is it right? Or there are some other reason for this?
Final question: How to optimize this situation?
It can either use the index on A to apply the selectivity, then sort on "id" and apply the limit. Or it can read the rows already in order using the index on "id", then filter for the ones that meet the A condition until it finds 10 of them. It will choose the one it thinks is faster, and sometimes it makes the wrong choice.
If you had a multi-column index on (A, id), it could use that one index to do both things at the same time: get the selectivity on A and still fetch the rows already ordered by "id".
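Something like the following, assuming the table is simply called t (the question only calls it "table"); with equality on A, the id part of the index is already in the order the query asks for, so the scan can stop after 10 rows:
CREATE INDEX t_a_id_idx ON t (A, id DESC);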
Do you know pgAdmin? With "explain verbose" before your statement, you can check how the query is executed (meaning the order of the operators). Usually the filter happens first and only afterwards the sorting...
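For example, in psql or pgAdmin's query tool (EXPLAIN ANALYZE actually executes the statement; t and the value 42 are placeholders):
EXPLAIN (ANALYZE, VERBOSE)
SELECT * FROM t WHERE A = 42 ORDER BY id DESC LIMIT 10;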

SQL query in dire need of optimization

I have this query, which works fine, except it takes a couple of minutes to load. I need help optimizing it so it runs faster and I don't know where to start:
SELECT
job_header.job,
job_header.suffix,
job_header.customer,
job_header.description,
job_header.comments_1,
job_header.date_due,
job_header.part,
job_header.customer_po,
job_header.date_closed,
job_header.flag_hold,
job_header.code_sort,
wo_user_flds.user_7,
wo_user_flds.user_3,
wo_user_flds.user_6,
wo_user_flds.user_5,
wo_user_flds.user_2,
quote_lines.user_2 as serialNo,
quote_lines.user_3 as unit,
quote_lines.user_4 as package
FROM job_header
LEFT JOIN wo_user_flds ON
(job_header.job = wo_user_flds.job) AND
(job_header.suffix = wo_user_flds.suffix)
LEFT JOIN quote_lines ON
(job_header.part = quote_lines.part)
WHERE job_header.date_closed = '000000'
AND LENGTH(job_header.job) > 5;
More information that might be of use:
Only the columns found in the select are the columns I need.
My query returns roughly 400 records.
Job_Header table has 97 columns and 6,300 records.
Wo_User_Flds table has 12 columns and 1,100 records.
Quote_Lines table has 198 columns and 46,000 records.
I could speculate on what I think I need to do, but I'm really just guessing at this point. I looked at similar questions and saw a lot of talk of 'indexes', so I checked, and these tables do have some indexes... if that helps? Thanks in advance.
[EDIT]
Thanks for the quick responses guys, really appreciate it. I'm going to look into everything everyone said, but here is the ddl for these tables: http://paste.ubuntu.com/13247664/
[EDIT 2]
My query takes 1 minute to load. My expectations may not be realistic about how much faster it can get. I might have to resort to breaking the query up into more than one and then assembling the data on the client.
Without any other info, you'd need an index on job_header on either (job, date_closed) or (date_closed, job). But post the indexes on the table, e.g. via sp_helpindex, or better still the CREATE INDEX script (right-click on the index in SSMS and script the index).
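If this is SQL Server, as the SSMS reference suggests, the existing indexes can be listed with:
EXEC sp_helpindex 'job_header';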
First, be sure you have indexes on the columns you JOIN on and on your WHERE-clause column. In this case, you should have indexes on these columns (a sketch follows the list):
--Table job_header indexes, besides the unique index
job_header.job
job_header.suffix
job_header.part
job_header.date_closed
--Table quote_lines indexes, besides the unique index
quote_lines.part
--Table wo_user_flds indexes, besides the unique index
wo_user_flds.job
wo_user_flds.suffix
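A sketch of those indexes, with the join-key columns combined into composite indexes (the index names are made up; adjust the syntax to your database):
CREATE INDEX ix_job_header_job_suffix ON job_header (job, suffix);
CREATE INDEX ix_job_header_part ON job_header (part);
CREATE INDEX ix_job_header_date_closed ON job_header (date_closed);
CREATE INDEX ix_quote_lines_part ON quote_lines (part);
CREATE INDEX ix_wo_user_flds_job_suffix ON wo_user_flds (job, suffix);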
Then, avoid using functions in the WHERE clause (like LENGTH, CAST, concatenation, etc.). But in this case, you can leave LENGTH there. So your query would stay the same; only your indexes would improve the query execution plan drastically.
Also, use the execution plan to see where you have INDEX_SCAN and INDEX_SEEK. If you have an INDEX_SCAN somewhere, it is usually a sign that you need an index on that column.
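If you're on SQL Server, SSMS can show the estimated plan as text before running anything; the SELECT below is just a cut-down stand-in for the full query:
SET SHOWPLAN_ALL ON;
GO
SELECT job_header.job FROM job_header WHERE job_header.date_closed = '000000';
GO
SET SHOWPLAN_ALL OFF;
GO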
This would be for a start.

Oracle sql statement on very large table

I'm relatively new to SQL and I have a statement which takes forever to run.
SELECT
sum(a.amountcur)
FROM
custtrans a
WHERE
a.transdate <= '2013-12-31';
It's a large table, but the statement takes about 6 minutes!
Any ideas why?
Your select, as you post it, will read 99% of the whole table (2013-12-31 is just a week ago, and I assume most entries are before that date and only very few after). If your table has many large columns (like varchar2(4000)), all that data will be read as well when Oracle scans the table. So you might read several KB per row just to get the 30 bytes you need for amountcur and transdate.
If you have this scenario, create a combined index on transdate and amountcur:
CREATE INDEX myindex ON custtrans(transdate, amountcur)
With the combined index, Oracle can read the index to fulfill your query and doesn't have to touch the main table at all, which might result in considerably less data being read from disk.
Make sure the table has an index on transdate.
create index custtrans_idx on custtrans (transdate);
Also, if this field is defined as a date in the table, then do:
SELECT sum(a.amountcur)
FROM custtrans a
WHERE a.transdate <= to_date('2013-12-31', 'yyyy-mm-dd');
If the table is really large, the query has to scan every row with transdate below the given date.
Even if you have an index on transdate and it helps to stop the scan early (which it may not), when the number of matching rows is very high, it would take considerable time to scan them all and sum the values.
To speed things up, you could calculate partial sums, e.g. for each past month, assuming that your data is historical and the past does not change. Then you'd only need to scan custtrans for 1-2 months, quickly scan the table with the monthly sums, and add the results.
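A rough sketch of the partial-sum idea in Oracle syntax (the summary table and its columns are made up; it assumes closed months never change and would need a small job to keep it refreshed):
CREATE TABLE custtrans_monthly AS
SELECT TRUNC(transdate, 'MM') AS month_start,
       SUM(amountcur) AS month_total
FROM custtrans
GROUP BY TRUNC(transdate, 'MM');
-- total up to 2013-12-31 = all fully summed months before December
-- plus a scan of the December rows only:
SELECT (SELECT NVL(SUM(month_total), 0)
        FROM custtrans_monthly
        WHERE month_start < DATE '2013-12-01')
     + (SELECT NVL(SUM(amountcur), 0)
        FROM custtrans
        WHERE transdate >= DATE '2013-12-01'
          AND transdate <= DATE '2013-12-31') AS total
FROM dual;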
Try to create an index only on column amountcur:
CREATE INDEX myindex ON custtrans(amountcur)
In this case Oracle will most probably read only the index (index full scan), nothing else.
Correction, as mentioned in the comments: it must be a composite index:
CREATE INDEX myindex ON custtrans(transdate, amountcur)
But maybe it is a bit useless to create an index just for a single select statement.
One option is to create an index on the column used in the WHERE clause (this is useful if you want to retrieve only 10-15% of the rows via the indexed column).
Another option is to partition your table if it has millions of rows. Even then, if you try to retrieve 70-80% of the data, it won't help.
The best option is first to analyze your requirements and then make a choice.
Whenever you deal with dates, it's better to use the to_date() function. Do not rely on implicit data type conversion.

How to speed up SQL query with group by statement + max function?

I have a table with millions of rows, and I need to do LOTS of queries which look something like:
select max(date_field)
from some_table  -- table name not given in the question
where varchar_field1 = 'something'
group by varchar_field2;
My questions are:
Is there a way to create an index to help with this query?
What (other) options do I have to enhance performance of this query?
An index on (varchar_field1, varchar_field2, date_field) would be of most use. The database can use the first index field for the where clause, the second for the group by, and the third to calculate the maximum date. It can complete the entire query just using that index, without looking up rows in the table.
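A sketch of that index, keeping the placeholder table name some_table from the query above:
CREATE INDEX some_table_v1_v2_date_idx ON some_table (varchar_field1, varchar_field2, date_field);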
Obviously, an index on varchar_field1 will help a lot.
You can create yourself an extra table with the columns
varchar_field1 (unique index)
max_date_field
You can set up triggers on inserts, updates, and deletes on the table you're searching that will maintain this little table -- whenever a row is added or changed, insert or update the corresponding row in this table.
We've had good success with performance improvement using this refactoring technique. In our case it was made simpler because we never delete rows from the table until they're so old that nobody ever looks up the max field. This is an especially helpful technique if you can add max_date_field to some other table rather than create a new one.
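A rough sketch of this summary-table idea in PostgreSQL syntax, following the two-column layout described above (the question names neither the table nor the database, so some_table and every other name here is a placeholder; deletes are deliberately left out, as discussed):
-- stand-in for the real millions-of-rows table from the question:
CREATE TABLE some_table (
    varchar_field1 text,
    varchar_field2 text,
    date_field date
);
CREATE TABLE max_dates (
    varchar_field1 text PRIMARY KEY,
    max_date_field date NOT NULL
);
CREATE FUNCTION refresh_max_dates() RETURNS trigger AS $$
BEGIN
    INSERT INTO max_dates (varchar_field1, max_date_field)
    VALUES (NEW.varchar_field1, NEW.date_field)
    ON CONFLICT (varchar_field1)
    DO UPDATE SET max_date_field = GREATEST(max_dates.max_date_field, EXCLUDED.max_date_field);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
-- PostgreSQL 11+; on older versions use EXECUTE PROCEDURE
CREATE TRIGGER some_table_max_dates
AFTER INSERT OR UPDATE ON some_table
FOR EACH ROW EXECUTE FUNCTION refresh_max_dates();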