I have a query that pulls information from one table. That table is rather large at 1.8 million rows and growing by the week. The query takes quite a while to run and is problematic when run multiple times. Is there any process that may speed up a query on a table with this many rows or more? I have another one with around 5 million rows... The query is rather basic, using a prompt to pull the rows relevant to the site number and a prompt for a range of dates:
Arrival_ID criteria = [Select Arrival ID]
Week criteria = Between [Select week begin:] And [Select week end:]
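For context, the underlying Access query presumably looks something like this sketch (the table and field names here are assumptions):
PARAMETERS [Select Arrival ID] Long, [Select week begin:] DateTime, [Select week end:] DateTime;
SELECT *
FROM tblArrivals
WHERE Arrival_ID = [Select Arrival ID]
  AND [Week] BETWEEN [Select week begin:] AND [Select week end:];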
Any help or direction pointing would be greatly appreciated.
Indexes on the columns Arrival_ID and Week might help.
Unless you're selecting a lot of columns from a very wide table, you should get fairly quick performance from Access on 1.8 million rows, as long as your indexes are selective.
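For example, you could create either separate single-column indexes or one composite index covering both criteria. A minimal sketch of the latter in Access DDL, assuming the table is named tblArrivals (run it as a data-definition query, or via CurrentDb.Execute in VBA):
CREATE INDEX idx_Arrival_Week
ON tblArrivals (Arrival_ID, [Week]);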
I agree with Kieren Johnstone - can you store the data in SQL Server and then use Access to run the queries?
Do double check the indexes.
When you compact/repair, do it twice - make it a habit. The second pass clears up issues left behind by the first.
I've been trying to figure out a performance issue for a while and would appreciate if someone can help me understand the issue.
Our application is connected to Oracle 11g. We have a very big table in which we keep data for the last two months. We do millions of inserts every half hour and a big bulk delete operation at the end of each day. Two of our columns are indexed, and we definitely have skewed columns.
The problem is that we are facing many slow responses when reading from this table. I've done some research, as I am not a DB expert. I know about bind variable peeking and cursor sharing. The issue is that even for one specific query with specific parameters, we see different execution times!
There is no LOB column in the table, and the query we use to read data is not complex! It looks for all rows with a specific name (the column is indexed) within a specific range (that column is also indexed).
I am wondering if the large number of insertions/deletions we do could cause any issues?
Is there any type of analysis we could consider to get more input on this issue?
I can see several possible causes of the inconsistency of your query times.
The number of updates being done while your query is running. As long as there are locks on the tables used in your query, the query has to wait for them to be released.
The statistics on the table can become badly out of sync with this much data manipulation. I would try two things. First, I would find out when the DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC job runs and make sure the bulk delete is performed before that job each night. If this does not help, I would ask the DBA to set up DBMS_MONITOR on your database to help you troubleshoot the issue.
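As a sketch of both checks (the schema name APP_OWNER, table name BIG_TABLE, and the &sql_id placeholder are assumptions): gather fresh statistics right after the bulk delete, then look in V$SQL to see whether the statement is flipping between execution plans.
-- refresh optimizer statistics after the nightly bulk delete
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'APP_OWNER',
    tabname => 'BIG_TABLE',
    cascade => TRUE);   -- also gathers index statistics
END;
/

-- check whether the same statement has several plans (plan_hash_value changes)
SELECT sql_id, child_number, plan_hash_value, executions,
       ROUND(elapsed_time / NULLIF(executions, 0) / 1E6, 3) AS avg_elapsed_s
FROM   v$sql
WHERE  sql_text LIKE '%BIG_TABLE%';

-- display the plan actually used by one child cursor
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('&sql_id', 0));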
Currently we have an AuditLog table that holds over 11M records. Regardless of the indexes and statistics, any query referencing this table takes a long time. Most reports don't check for audit records more than a year old, but we would still like to keep those records. What's the best way to handle this?
I was thinking of keeping the AuditLog table to hold all records that are a year old or less, then moving any records older than a year to an AuditLogHistory table - maybe just running a batch job every night to move these records over and then updating the indexes and statistics of the AuditLog table. Is this an okay way to complete this task, or how else should I be storing older records?
The records brought back from the AuditLog table hit a linked server and check 6 different DBs to see whether a certain member exists in them, based on a condition. I don't have access to make any changes to the linked-server DBs, so I can only optimize what I have, which is the AuditLog. Hitting the linked-server DBs accounts for over 90% of the query's cost, so I'm just trying to limit what I can.
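For reference, a rough sketch of the nightly move described above, assuming SQL Server, an AuditDate column, and a history table with the same structure (all names are assumptions); in practice you would batch the delete and wrap both statements in a transaction:
-- copy rows older than one year into the history table
INSERT INTO dbo.AuditLogHistory
SELECT *
FROM   dbo.AuditLog
WHERE  AuditDate < DATEADD(year, -1, GETDATE());

-- then remove them from the live table
DELETE FROM dbo.AuditLog
WHERE  AuditDate < DATEADD(year, -1, GETDATE());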
First, I find it hard to believe that you cannot optimize a query on a table with 11 million records. You should investigate the indexes that you have relative to the queries that are frequently run.
In any case, the answer to your question is "partitioning". You would partition by the date column and be sure to include this condition in all queries. That will reduce the amount of data and probably speed the processing.
The documentation is a good place to start for learning about partitioning.
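A minimal sketch of date-based partitioning in SQL Server (the boundary dates, filegroup, column, and table names are assumptions); the key point is that queries must filter on the partitioning column so that only the relevant partitions are read:
CREATE PARTITION FUNCTION pf_AuditByYear (datetime)
AS RANGE RIGHT FOR VALUES ('2012-01-01', '2013-01-01', '2014-01-01');

CREATE PARTITION SCHEME ps_AuditByYear
AS PARTITION pf_AuditByYear ALL TO ([PRIMARY]);

CREATE TABLE dbo.AuditLogPartitioned (
    AuditID   bigint        NOT NULL,
    AuditDate datetime      NOT NULL,
    MemberID  int           NULL,
    Detail    varchar(4000) NULL
) ON ps_AuditByYear (AuditDate);

-- reports then restrict on AuditDate so partition elimination kicks in
SELECT *
FROM   dbo.AuditLogPartitioned
WHERE  AuditDate >= DATEADD(year, -1, GETDATE());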
Suppose we have a big table in a relational database that we need to query.
We have two options:
query the whole table
query subsets of the data inside the table, i.e. rows 1 to 1000, then 1001 to 2000, and so on (sketched below)
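A sketch of the second option in standard OFFSET/FETCH syntax (the table name, key column, and placeholder values are assumptions):
-- rows 1001 to 2000 of the whole result, in a stable order
SELECT *
FROM   big_table
ORDER BY id
OFFSET 1000 ROWS FETCH NEXT 1000 ROWS ONLY;

-- keyset alternative: remember the last id read and continue from there,
-- which avoids the growing cost of large offsets
SELECT *
FROM   big_table
WHERE  id > 2000   -- last key seen in the previous chunk (placeholder value)
ORDER BY id
OFFSET 0 ROWS FETCH NEXT 1000 ROWS ONLY;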
Does this separation make some sense?
Does it depend on query structure?
Let's add some math. Suppose a query's execution time is proportional to n^3, where n is the number of rows in the table. In the first case the execution time is proportional to n^3. The second option is different: if we split the work into three subsets, the total time would be (n/3)^3 + (n/3)^3 + (n/3)^3 = n^3 / 9, which is better.
Real life is more complicated: the query would not be the same in this case; we have to spend some time limiting the rows to the subset.
Also, the number of connections and the database's concurrency may be limited, so we might not be able to run, say, 10 such queries against it simultaneously, at least not at full speed.
But do these arguments make sense? Could this help cut the time spent on some big tables?
It depends on a lot of criteria, some of them being:
How busy is the database, i.e. how many queries are running in parallel?
Reason: if a large number of queries are running, or any query is using a number of parallel sessions, then querying the big table will be slow, while the smaller ones will work faster.
Into how many smaller tables has the big table been divided?
Reason: one point to consider here is that if a big table is divided into several small tables and the query is run against each of the smaller tables, the individual results need to be aggregated. This may take time, depending on the query.
What type of query is being executed?
Reason: if you are running a query with a filtering condition on a column and you divide the large table based on the values of that column, then you can skip some of the tables based on the query condition and hence reduce the time needed to produce the output.
Overall, in such a scenario it is better to partition the table rather than divide it into smaller ones. Range partitioning can be used on the bigger table for faster query execution.
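As a sketch of that last suggestion, a range-partitioned table in Oracle syntax (the table, columns, and boundary values are assumptions); a query that filters on the partitioning column then touches only the matching partitions:
CREATE TABLE sales (
    sale_id   NUMBER,
    sale_date DATE,
    amount    NUMBER(12,2)
)
PARTITION BY RANGE (sale_date) (
    PARTITION p2012 VALUES LESS THAN (DATE '2013-01-01'),
    PARTITION p2013 VALUES LESS THAN (DATE '2014-01-01'),
    PARTITION pmax  VALUES LESS THAN (MAXVALUE)
);

-- only partition p2013 is scanned for this range
SELECT SUM(amount)
FROM   sales
WHERE  sale_date BETWEEN DATE '2013-01-01' AND DATE '2013-06-30';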
I'm re-designing a front-end for SQL Server in Access, to allow non-programmers in our company to query the database.
The problem is that some of the tables are very large. At the moment I'm using linked tables. One query that I'm trying to allow accesses five tables, including that large one. The table has millions of rows, as it has every transaction ever made in the company.
When I tried the query in Access it took minutes and would not finish; Access just froze. So instead I decided to use a subquery to narrow down the large table before doing the joins. Every entry in the table has a date, so I made a subquery and filtered it to return only the current day, just to test. In fact, because I was just testing, I filtered it even further to return only the date column. This narrows it down to 80,000 entries or so. Eventually I did get results, but it took around three minutes, and that's just the subquery I'm testing. Once results DID return, Access would freeze every time I attempted to use the scroll bar.
Next I tried pass-through queries, thinking they'd be faster. They were faster, but still took around a minute and a half, and still had the freezing problems with the scroll bar. The issue is that this same query takes only 3 seconds on SQL Server (the date query, I mean). I was hoping that I could get this query very fast and then use it for the join.
I could use views, but the problem is that I want the user to be able to specify the date range.
Is there anything I can do to speed up this performance or am I screwed?
It makes no sense to let the users scroll through tens of thousands of records. They will be lost in the data flood. Instead, provide them with means to analyze the data. First answer the question: "What kind of information do the users need?" They might want to know how many transactions of a certain type occurred during the day or within an hour. They might want to compare different days. Let the users group the data; this reduces the number of records that have to be transmitted and displayed. Show them counts, sums, or averages. Let them filter the data, or present the grouped data in charts.
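A sketch of the kind of grouped summary this suggests, written as a SQL Server pass-through query (the table, columns, and date literals are assumptions); a few hundred summary rows are far friendlier to Access than tens of thousands of raw transactions:
SELECT CAST(TransactionDate AS date) AS TxnDay,
       TransactionType,
       COUNT(*)    AS TxnCount,
       SUM(Amount) AS TotalAmount
FROM   dbo.Transactions
WHERE  TransactionDate >= '2013-01-01'
  AND  TransactionDate <  '2013-02-01'
GROUP BY CAST(TransactionDate AS date), TransactionType
ORDER BY TxnDay, TransactionType;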
We currently have a search on our website that allows users to enter a date range. The page calls a stored procedure that queries for the date range and returns the appropriate data. However, a lot of our tables contain 30m to 60m rows. If a user entered a date range of a year (or some large range), the database would grind to a halt.
Is there any solution that doesn't involve putting a time constraint on the search? Paging is already implemented to show only the first 500 rows, but the database is still getting hit hard. We can't put a hard limit on the number of results returned because the user "may" need all of them.
If the user-entered date range is too large, have your application do the search in smaller date-range steps, possibly using a slow-start approach: the first search is limited to, say, a one-month range, and if it brings back fewer than 500 rows, search the two preceding months, and so on until you have 500 rows.
You will want to start with the most recent dates for descending order and with the oldest dates for ascending order.
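A rough T-SQL sketch of that slow-start idea for descending order (the table, columns, sample dates, and widening step are assumptions):
DECLARE @UserStart  datetime = '2012-01-01',   -- user-supplied range (sample values)
        @UserEnd    datetime = '2013-01-01',
        @RangeStart datetime,
        @Rows       int = 0;

SET @RangeStart = DATEADD(month, -1, @UserEnd);  -- first pass: one month

WHILE @Rows < 500 AND @RangeStart > @UserStart
BEGIN
    SELECT @Rows = COUNT(*)
    FROM   dbo.Orders
    WHERE  OrderDate >= @RangeStart AND OrderDate < @UserEnd;

    IF @Rows < 500
        SET @RangeStart = DATEADD(month, -2, @RangeStart);  -- widen by two more months
END

IF @RangeStart < @UserStart
    SET @RangeStart = @UserStart;   -- never search outside the requested range

SELECT TOP (500) *
FROM   dbo.Orders
WHERE  OrderDate >= @RangeStart AND OrderDate < @UserEnd
ORDER BY OrderDate DESC;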
It sounds to me like this is a design and not a technical problem. No one ever needs millions of records of data on the fly.
You're going to have to ask yourself some hard questions: Is there another way of getting people their data than the web? Is there a better way you can ask for filtering? What exactly is it that the users need this information for and is there a way you can provide that level of reporting instead of spewing everything?
Reevaluate what it is that the users want and need.
We can't put a hard limit on the number of results returned because the user "may" need all of them.
You seem to be saying that you can't prevent the user from requesting large datasets for business reasons. I can't see any technical way around that.
Index your date field and force the query to use that index:
CREATE INDEX ix_mytable_mydate ON mytable (mydate);

SELECT TOP 100 *
FROM mytable WITH (INDEX (ix_mytable_mydate))
WHERE mydate BETWEEN @start AND @end;
It seems that the optimizer chooses a FULL TABLE SCAN when it sees the large range.
Could you please post the query you use and execution plan of that query?
I don't know which of these are possible:
Use a search engine rather than a database?
Don't allow very general searches
Cache the results of popular searches
Break the database into shards on separate servers and combine the results in your application.
Do multiple queries with smaller date ranges internally
It sounds like you aren't really paging. I would have the stored procedure take a range (which you calculate) for the pages and then only fetch the rows for the current page. Assuming that the data doesn't change frequently, this would reduce the load on the database server.
How is your table data physically structured, i.e. partitioned, split across filegroups and disk storage, etc.?
Are you using table partitioning? If not you should look into using aligned partitioning. You could partition your data by date, say a partition for each year as an example.
Were I to request a query spanning three years on a multiprocessor system, I could access all three partitions concurrently, thereby improving query performance.
How are you implementing the paging?
I remember I faced a problem like this a few years back and the issue was to do with how I implemented the paging. However the data that I was dealing with was not as big as yours.
Parallelize, and put it in RAM (or a cloud). You'll find that once you want to access large amounts of data at the same time, an RDBMS becomes the problem instead of the solution. Nobody doing visualizations uses an RDBMS.