I have a statement (simplified, it looks like this):
SELECT * FROM VIEW WHERE VIEW.target='alpha'
The view uses multiple tables (say X, Y and Z), and the target column is in X. The view returns a dataset of 2 billion rows, and with this query the database loads all of them and searches for the rows where target equals 'alpha'. Is there any way to make this query faster?
Maybe it is possible to make the view I load from a little smaller? I think if I put the target='alpha' condition inside the view, the view would be smaller and that would really help... but I suspect the problem would stay the same, because the view would end up doing the same work (am I right?).
Ideally I would keep the view as it is and do the work in the new statement, but if someone has an idea that requires changing the view, that could be done as well.
Thank you!
A view is a stored SQL statement, and to make a view faster you will have to make the stored SQL statement faster.
The SQL statement depends on tables, and to speed up queries against tables you use indexes. The benefits of indexes are plenty; read this great answer to a SO question here. It explains what an index is, how it works and why it is needed.
So to prevent a full table scan, you add indexes to make the query faster. In your case you need to identify the columns in the SQL statement that are a good fit for an index. Columns that are commonly used in WHERE, ORDER BY, JOIN and GROUP BY clauses are usually good candidates.
Start with the table where the target column exists: add an index there, then continue with the columns that relate it to the other tables in the query. This will eventually result in a faster response time when using your view.
To create an index on your Oracle table, read the Oracle docs here.
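As a minimal sketch (X, Y and target are from your example; the index names and the Y-to-X join column are hypothetical):
CREATE INDEX x_target_idx ON X (target);
-- then index the join columns, e.g. if Y joins to X on a column x_id:
CREATE INDEX y_x_id_idx ON Y (x_id);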
Related
I have a table with over 300 million records, containing only a key, a source and a hash value. The built-in SQL of the application runs a large IN predicate on the hash value to fetch the data. The SQLs are performing slowly, so I need suggestions on how to improve their performance. I cannot change the SQL, as it is an internal SQL already built into the application. So far I have tried putting an index on the key column and another on the hash column, but that doesn't provide much help.
Off the top of my head, you could create a table, perhaps a temporary table (which can also accept indices), containing the values in the IN predicate of your current query. Then, a simple inner join of your original query to this table would have the same effect, except it might be substantially faster if DB2 can take advantage of the index. You would need to make sure that the columns involved in the join both have an index.
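A sketch of the idea on DB2, with hypothetical names throughout (a declared global temporary table requires a user temporary tablespace to exist):
DECLARE GLOBAL TEMPORARY TABLE session.hash_filter (
    hash_value VARCHAR(64) NOT NULL
) ON COMMIT PRESERVE ROWS NOT LOGGED;

CREATE INDEX session.hash_filter_ix ON session.hash_filter (hash_value);

-- load the values that used to sit in the IN list
INSERT INTO session.hash_filter (hash_value) VALUES ('hash1'), ('hash2');

-- then join instead of using the IN predicate
SELECT t.key_col, t.source, t.hash_value
FROM big_table t
JOIN session.hash_filter f ON f.hash_value = t.hash_value;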
You (or your DBA) might like to try the Design Advisor to advise on new indexes etc.
This can be run from the command-line tool db2advis: https://www.ibm.com/support/knowledgecenter/SSEPGG_11.1.0/com.ibm.db2.luw.admin.perf.doc/doc/c0005144.html
Or from Data Server Manager, which provides a web-based front end for the Query Advisor and Access Path Advisor: https://www-01.ibm.com/support/docview.wss?uid=swg27048195
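For example (the database name and statement are placeholders; see the linked docs for the full option list):
db2advis -d MYDB -t 5 -s "SELECT key_col, source, hash_value FROM big_table WHERE hash_value IN ('hash1', 'hash2')"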
I have a massive PostgreSQL database table. One of its columns stores a boolean parameter. What I want is a very simple query:
SELECT COUNT(*) FROM myTable WHERE myTable.isSthTrue;
The problem I have is that it is very slow, as it needs to check every row to see whether it satisfies the criterion. Adding an index on this column speeds up the calculation roughly two times, but doesn't really improve the complexity in general.
Some people on the internet suggest adding triggers that update the count which is stored in a separate table. But that feels like too much effort for an easy problem like this.
If you need an exact result, then yes, a trigger-based solution is likely the best way to go.
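A minimal sketch of that approach, using the table and column from your query (the side table, function and trigger names are hypothetical, and isSthTrue is assumed NOT NULL):
-- hypothetical side table holding the running count
CREATE TABLE myTableCount (n bigint NOT NULL);
INSERT INTO myTableCount VALUES (0);

CREATE FUNCTION maintain_count() RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    IF NEW.isSthTrue THEN UPDATE myTableCount SET n = n + 1; END IF;
  ELSIF TG_OP = 'DELETE' THEN
    IF OLD.isSthTrue THEN UPDATE myTableCount SET n = n - 1; END IF;
  ELSIF TG_OP = 'UPDATE' THEN
    IF NEW.isSthTrue AND NOT OLD.isSthTrue THEN
      UPDATE myTableCount SET n = n + 1;
    ELSIF OLD.isSthTrue AND NOT NEW.isSthTrue THEN
      UPDATE myTableCount SET n = n - 1;
    END IF;
  END IF;
  RETURN NULL;  -- return value of an AFTER trigger is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER myTable_count
AFTER INSERT OR UPDATE OR DELETE ON myTable
FOR EACH ROW EXECUTE PROCEDURE maintain_count();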
If an estimate is okay, consider Materialized Views. (Postgres 9.3+)
Something like CREATE MATERIALIZED VIEW myCount AS SELECT COUNT(*) FROM myTable WHERE myTable.isSthTrue; would store the result of the expensive query you reference. The only caveat is that this aggregate view is not updated automatically. To do that you need to call REFRESH MATERIALIZED VIEW, which could be done from cron or another timed task.
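Spelled out (the column alias is mine, added for easier querying):
CREATE MATERIALIZED VIEW myCount AS
SELECT COUNT(*) AS true_count
FROM myTable
WHERE myTable.isSthTrue;

-- re-run periodically, e.g. from cron, to refresh the stored count
REFRESH MATERIALIZED VIEW myCount;

-- reading the cached value is then cheap
SELECT true_count FROM myCount;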
What will happen when I query a view in SQL Server 2000? Does it query the underlying table? Does creating a view on a table speed up query execution?
There is a table with 10 million records (from 2005 to date). The users are mostly interested in the current year's data. So if I create a view on the table using only the current year's data, would that increase query performance when querying the view?
I mean, instead of querying the table with the condition year='2013', could the user query the view and get better performance?
As Damien stated in his comment, the definitions of views are expanded and the query used to build them is "inlined" into your own query, so it is unlikely that you will get any performance increase by using a standard view.
An alternative is to create an indexed view, which is a standard view that has a clustered index created on it, effectively materialising it and making it act as if it were a table. In non-enterprise editions of SQL server you need to use the NOEXPAND hint when referencing the indexed view to make the optimiser take the index into account, but you can get performance gains this way.
There are special considerations for indexed views: certain settings and conditions must be in place for them to work. It's all detailed here:
Creating an Indexed View
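A sketch of the shape this takes (table and column names are hypothetical, and the SET-option requirements from the linked article still apply):
CREATE VIEW dbo.vCurrentYear
WITH SCHEMABINDING
AS
SELECT order_id, order_date, amount
FROM dbo.Orders
WHERE order_year = 2013
GO
CREATE UNIQUE CLUSTERED INDEX IX_vCurrentYear ON dbo.vCurrentYear (order_id)
GO
-- on non-Enterprise editions, reference it with NOEXPAND:
SELECT order_id, order_date, amount FROM dbo.vCurrentYear WITH (NOEXPAND)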
I have two potential roads to take on the following problem; the try-it-and-see methodology won't pay off here, as the load on the server is constantly in flux. The two approaches I have are as follows:
select *
from
(
select foo.a,bar.b,baz.c
from foo,bar,baz
-- updated for clarity's sake
where foo.a=bar.b
and bar.b=baz.c
)
group by a,b,c
versus
create table results as
select foo.a,bar.b,baz.c
from foo,bar,baz
where foo.a=bar.b
and bar.b=baz.c;
create index results_spanning on results(a,b,c);
select * from results group by a,b,c;
So, in case it isn't clear: the top query performs the GROUP BY directly against the multi-table select, thus preventing me from using an index. The second approach lets me create a new table that stores the results of the query, then create a spanning index, and finally run the GROUP BY query so it can utilize the index.
What is the complexity difference between these two approaches, i.e. how do they scale, and which is preferable for large quantities of data? Also, the main issue is the performance of the overall select, so that is what I am attempting to fix here.
Comments
Are you really doing a CROSS JOIN on three tables? Are those three columns indexed in their own right? How often do you want to run the query which delivers the end result?
1) No.
2) Yes; the where clause was omitted for the sake of discussion, as this is clearly a super-trivial example.
3) Doesn't matter.
2nd Update
This is a temporary table, as it is only valid for a brief moment in time, so yes, this table will only be queried against one time.
If your query is executed frequently and unacceptably slow, you could look into creating materialized views to pre-compute the results. This gives you the benefit of an indexable "table", without the overhead of creating a table every time.
You'll need to refresh the materialized view (preferably fast if the tables are large) either on commit or on demand. There are some restrictions on how you can create on commit, fast refreshable views, and they will add to your commit time processing slightly, but they will always give the same result as running the base query. On demand MVs will become stale as the underlying data changes until these are refreshed. You'll need to determine whether this is acceptable or not.
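A sketch of an on-commit, fast-refreshable join MV for the query above (fast refresh of a join MV requires materialized view logs with ROWID on each base table, and the base tables' ROWIDs in the select list):
create materialized view log on foo with rowid;
create materialized view log on bar with rowid;
create materialized view log on baz with rowid;

create materialized view results_mv
refresh fast on commit
as
select foo.rowid f_rid, bar.rowid b_rid, baz.rowid z_rid,
       foo.a, bar.b, baz.c
from foo, bar, baz
where foo.a = bar.b
and bar.b = baz.c;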
So the question is, which is quicker?
Run a query once and sort the result set?
Run a query once to build a table, then build an index, then run the query again and sort the result set?
Hmmm. Tricky one.
The use cases for temporary tables are pretty rare in Oracle. They normally only apply when we need to freeze a result set which we are then going to query repeatedly. That is apparently not the case here.
So, take the first option and just tune the query if necessary.
The answer is, as is so often the case with tuning questions, it depends.
Why are you doing a GROUP BY in the first place? The query as you posted it doesn't do any aggregation, so the only reason for doing a GROUP BY would be to eliminate duplicate rows, i.e. a DISTINCT operation. If this is actually the case, then you are doing some form of cartesian join, and one way of tuning the query would be to fix the WHERE clause so that it only returns distinct records.
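In other words, under that assumption the query is equivalent to:
select distinct foo.a, bar.b, baz.c
from foo, bar, baz
where foo.a = bar.b
and bar.b = baz.c;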
I have a 30 GB table with 30-40 columns. I create reports using this table and it causes performance problems. I only use 4-5 of its columns for the reports, so I want to create a second table for the reports. But the second table must be updated when the original table changes, without using triggers.
No matter what my query is, when it is executed SQL Server tries to cache all 30 GB. When the cache is fully loaded, SQL Server starts to use disk. This is what I actually want to avoid.
How can I do this?
Is there a way of doing this using SSIS?
Thanks in advance.
CREATE VIEW myView
AS
SELECT
column1,
column3,
column4 * column7 AS computedColumn
FROM
yourTable
A view is effectively just a stored query, like a macro. You can then select from that view as if it were a normal table.
Unless you go for materialised views, it's not really a table, it's just a query. So it won't speed anything up, but it does encapsulate code and assist in controlling what data different users/logins can read.
If you are using SQL Server, what you want is an indexed view. Create a view using the column you want and then place an index on them.
An indexed view stores the data in the view. It should keep the view up-to-date with the underlying table, and it should reduce the I/O for reading the table. Note: this assumes that your 4-5 columns are much narrower than the overall table.
Dems answer with the view seems ideal, but if you are truly looking for a new table, create it and have it automatically updated with triggers.
Triggers placed on the primary table can be added for all INSERT, UPDATE and DELETE actions upon it. When the action happens, the trigger fires and can be used to do additional work, such as updating your new secondary table. Inside the trigger you pull from the inserted and deleted tables (MSDN).
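A minimal sketch of such a sync trigger (all table and column names here are hypothetical):
CREATE TRIGGER trg_MainTable_Sync
ON dbo.MainTable
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- drop the old copies of any deleted or updated rows
    DELETE r
    FROM dbo.ReportTable r
    JOIN deleted d ON d.id = r.id;

    -- copy the few reporting columns for new or updated rows
    INSERT INTO dbo.ReportTable (id, col1, col2, col3)
    SELECT i.id, i.col1, i.col2, i.col3
    FROM inserted i;
END;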
You can create that second table just like you're thinking, and use triggers to update table 2 whenever table 1 is updated.
However, triggers present performance problems of their own; the speed of your inserts and updates will suffer. I would recommend looking for more conventional ways to improve query performance; it sounds like you're on SQL Server, since you mentioned SSIS.
Since it's only 4-5 out of 30 columns, have you tried adding an index which covers the query? I'm not sure if there are even more columns in your WHERE clause, but you should try that first. A covering index would actually do exactly what you're describing, since the table would never need to be touched by the query. Of course, this does cost a little in terms of space and insert/update performance. There's always a tradeoff.
On top of that, I can't believe that you would need to pull a large percentage of rows out of a 30 GB table for any given report; it's simply too much data for a report to contain. A filtered index can improve query performance even more by only indexing the rows that are most likely to be asked for. If you have a report which lists the results for the past calendar month, you could add a condition to only index the rows WHERE report_date > '5/1/2012', for example.
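A sketch of both ideas (the index names, included columns and date literal are all illustrative):
CREATE NONCLUSTERED INDEX IX_MainTable_Report
ON dbo.MainTable (report_date)
INCLUDE (col1, col2, col3, col4);

-- or, filtered so only the hot rows are indexed:
CREATE NONCLUSTERED INDEX IX_MainTable_RecentReport
ON dbo.MainTable (report_date)
INCLUDE (col1, col2, col3, col4)
WHERE report_date > '20120501';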