I'm setting up a Virtuoso server on my local machine. The database is not big (about 2GB).
The application I'm using the server for needs to make a very large number of queries, and the results need to come back fast.
The HDD I'm using is mechanical, so it's not that fast. I am now trying to find a way to allocate part of my main memory as local storage so that I can put the database file on it.
Is there an easy way to do that?
That's not what RAM is for.
If your server ever lost power, you would lose all of the data.
If you want a faster HDD, get one with a higher RPM, or get an SSD.
Take a look at the Performance Tuning Guide...
It details how to configure exactly what you are looking for.
Data is still held on disk, but the more of it that can be loaded into memory, the better the performance.
Get all your data into memory and that's probably as fast as it gets :-)
There's a program called RamDisk plus
you can see a demo here:
http://www.youtube.com/watch?v=vAdRsQJBEBE
This software allows you to create a disk partition right out of your RAM
Related
I am running some spatial queries on tables that have close to a billion records. However, I cannot understand why Postgres is not using the memory in the dedicated server (with 32GB of memory). I tuned the server based on the suggestions in here. However, I couldn't see any difference in the running time at all, and I see less than 100MB of memory usage. I would expect Postgres to consume more memory by loading bigger chunks of data into it, thus reducing the disk reads and the running time. What could I be doing wrong here?
Already looked at these posts:
https://dba.stackexchange.com/questions/18484/tuning-postgresql-for-large-amounts-of-ram
http://patshaughnessy.net/2016/1/22/is-your-postgres-query-starved-for-memory
I've seen this question asked in many ways all over the Internet, but despite implementing the abundance of advice (and some voodoo), I'm still struggling. I have a 100GB+ database that is constantly inserting and updating records in very large transactions (200+ statements per transaction). After a system restart, the performance is amazing (data is written to a large SATA III SSD connected via USB 3.0). The SQL Server instance is running on a VM under VMware Workstation. The host is set to hold the entire VM in memory. The VM itself has a paging cache of 5000 MB. The SQL Server user is set to 'hold pages in memory'. I have 5 GB of RAM allocated to the VM, and the max memory of the SQL Server instance is set to half a gigabyte.
I have played with every single one of these parameters to attempt to maintain consistent performance, but slowly and steadily, the performance degrades to the point where it begins to time out. Here's the kicker though: if I stop the application that's loading the database and then execute the stored proc in Management Studio, it runs like lightning, clearly indicating it's not an issue with the query, and probably nothing to do with memory management or paging. If I then restart the loader app, it still crawls. If I reboot the VM, however, the app once again runs like lightning... for a while...
Does anybody have any other suggestions based upon the symptoms presented?
Depending on how large your hot set is, 5GB of memory may simply be too little for a 100+GB database.
Check indices and query plans. We cannot help you without them. And I bet you are missing some indices, which is the standard performance issue people have.
Otherwise, once you have done your homework, head over to dba.stackexchange.com and ask there.
Generally, consider that 200 statements per transaction may simply indicate seriously sub-optimal programming. For example, you could bulk-load the data into a temp table and then merge it into the final one.
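To make the "temp table, then merge" suggestion concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in so it can run anywhere; the `items` and `staging` tables and the `bulk_upsert` helper are made up for illustration. On SQL Server the merge step would more likely be a MERGE statement or an UPDATE followed by an INSERT, so treat this as the shape of the idea rather than something to paste in.

```python
import sqlite3

def bulk_upsert(conn, rows):
    """Load rows into a temp staging table, then merge them in one statement."""
    cur = conn.cursor()
    cur.execute(
        "CREATE TEMP TABLE IF NOT EXISTS staging (id INTEGER PRIMARY KEY, value TEXT)"
    )
    cur.execute("DELETE FROM staging")
    # One batched insert instead of hundreds of individual statements.
    cur.executemany("INSERT INTO staging (id, value) VALUES (?, ?)", rows)
    # Merge step: new ids are inserted, existing ids are overwritten.
    cur.execute(
        "INSERT OR REPLACE INTO items (id, value) SELECT id, value FROM staging"
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO items VALUES (1, 'old')")
bulk_upsert(conn, [(1, "new"), (2, "added")])
result = conn.execute("SELECT id, value FROM items ORDER BY id").fetchall()
print(result)
```

The point is that the final table is touched by a single set-based statement per batch, rather than by 200+ row-by-row statements inside one transaction.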
Actually, I may have a working theory. What I did was add some logic to the app so that when it times out, it sits for two minutes and then tries again, and voila! Back to full speed. I rubber-ducky'd my co-worker and came up with the concept that my perceived SSD write speeds were actually the write speed to the VMware host's virtual USB 3 buffer, and that the actual SSD write speeds were slower. I'm probably hitting the host's buffer size limit, and by forcing the app to wait 2 minutes, the host has a chance to dump its back-buffered data to the SSD. Elementary, Watson :)
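For anyone wanting to try the same workaround, the wait-and-retry logic might look roughly like this. It is only a sketch: the original app's language and timeout exception aren't stated, so `run_with_retry`, its parameters and the use of `TimeoutError` are all assumptions.

```python
import time

def run_with_retry(operation, retries=3, wait_seconds=120.0, sleep=time.sleep):
    """Call operation(); on TimeoutError, pause and retry up to `retries` times."""
    for attempt in range(retries + 1):
        try:
            return operation()
        except TimeoutError:
            if attempt == retries:
                raise  # out of retries: surface the timeout to the caller
            sleep(wait_seconds)  # give the storage layer time to drain its buffer
```

The `sleep` parameter is only there so the pause can be swapped out in tests; in the real loader you would leave it at the default.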
If this approach also fails to be sustainable, I'll report in.
Try executing this to determine your problem queries:
SELECT TOP 20
    qs.sql_handle,
    qs.execution_count,
    qs.total_worker_time AS Total_CPU,
    -- Converted from microseconds
    total_CPU_inSeconds = qs.total_worker_time / 1000000.0,
    -- Decimal division, so sub-second averages are not truncated to 0
    average_CPU_inSeconds = qs.total_worker_time / 1000000.0 / qs.execution_count,
    qs.total_elapsed_time,
    -- Converted from microseconds
    total_elapsed_time_inSeconds = qs.total_elapsed_time / 1000000.0,
    st.text,
    qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
ORDER BY qs.total_worker_time DESC;
Then check your estimated and actual execution plans on the queries this command helps you pinpoint.
Source: "How do I find out what is hammering my SQL Server?" and the bottom of the page at http://technet.microsoft.com/en-us/magazine/2007.11.sqlquery.aspx
Beyond the excellent indexing suggestions already given,
be sure to read up on parameter sniffing. That could be the cause of the problem.
SQL Server - parameter sniffing
http://www.sommarskog.se/query-plan-mysteries.html#compileps
As a result you could have a bad query plan being re-used, or SQL's buffer could be getting full and writing pages out of memory to disk (maybe that's other allocated memory in your case).
You could run DBCC FREEPROCCACHE and DBCC FREESYSTEMCACHE ('ALL') to empty it and see if you get a performance boost.
You should give SQL Server more memory too: as much as you can while leaving room for other critical programs and the OS. You might have 5GB of RAM on the VM, but SQL Server is only getting to play with half a GB, which seems REALLY small for what you're describing.
If those things don't move you in the right direction, install the SQL Management Data Warehouse so you can see exactly what is happening when your slowdown begins. Running it takes up additional memory, but you will give the DBAs more to go on.
In the end, what I did was a combination of two things: putting in logic to recover when timeouts occurred, and setting the VM's core count to reflect only the host's physical cores, not its logical cores. For example, the host has 2 cores that are hyper-threaded. When I set my VM to use 4 cores, it occasionally gets hung in some infinite loop, but when I set it to 2 cores, it runs without fail. Still, aberrant behavior like this is difficult to mitigate reliably.
My Rails application always reaches the threshold of the disk I/O rate set by my VPS at Linode. It's set at 3000 (I raised it from 2000), and every hour or so I get a notification that it has reached 4000-5000+.
What are the methods that I can use to minimize the disk IO rate? I mostly use Sphinx (Thinking Sphinx plugin) and Latitude and Longitude distance search.
What are the methods to avoid?
I'm using Rails 2.3.11 and MySQL.
Thanks.
Did you check if your server is swapping itself to death? What does "top" say?
Your Linode may have limited RAM, and it is quite likely that it is swapping like crazy to keep things running.
If you see red in the IO graph, that is swapping activity! You need to upgrade your Linode to more RAM,
or limit the number / size of processes which are running. You should also add approximately 2x the RAM size as swap space (swap partition).
http://tinypic.com/view.php?pic=2s0b8t2&s=7
Since your question is too vague to answer concisely, this is generally a sign of one of a few things:
Your data set is too large because of historical data that you could prune. Delete what is no longer relevant.
Your tables are not indexed properly and you are hitting a lot of table scans. Check with EXPLAIN on each of your slow queries.
Your data structure is not optimized for the way you are using it, and you are doing too many joins. Some tactical de-normalization would help here. Make sure all your JOIN queries are strictly necessary.
You are retrieving more data than is required to service the request. It is, sadly, all too common that people load enormous TEXT or BLOB columns from a user table when displaying only a list of user names. Load only what you need.
You're being hit by some kind of automated scraper or spider robot that's systematically downloading your entire site, page by page. You may want to alter your robots.txt if this is an issue, or start blocking troublesome IPs.
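To illustrate the indexing point from the list above, here is a small sketch of checking a query plan before and after adding an index. It uses Python's built-in sqlite3 module and SQLite's `EXPLAIN QUERY PLAN` only so the example is self-contained; on MySQL you would run `EXPLAIN` on the slow query instead, and the output looks different. The `users` table, the index name and the `plan_for` helper are made up for illustration.

```python
import sqlite3

def plan_for(conn, sql, params=()):
    """Return the planner's description lines for a query (SQLite-specific)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # last column holds the readable detail

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, bio TEXT)")
# Select only the column that is needed, not the large bio column.
query = "SELECT id FROM users WHERE email = ?"

before = plan_for(conn, query, ("a@example.com",))
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan_for(conn, query, ("a@example.com",))

print(before)  # describes a full scan of users
print(after)   # describes a search that uses idx_users_email
```

The exact wording of the plan text varies between SQLite versions, so look for "scan" versus "search ... using index" rather than an exact string.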
Is it going high and staying high for a long time, or is it just spiking temporarily?
There aren't going to be specific methods to avoid (other than not writing to disk).
You could try using a profiler in production like New Relic to get more insight into your performance. A profiler will highlight the actions that are taking a long time, and when you examine the specific algorithm you're using in one of those actions, you might discover what's inefficient about it.
How does the work_mem option in Postgres work? Here's the description from http://www.postgresql.org/docs/8.4/static/runtime-config-resource.html:
Specifies the amount of memory to be used by internal
sort operations and hash tables before switching to
temporary disk files. The value defaults to one megabyte
(1MB). Note that for a complex query, several sort or
hash operations might be running in parallel; each one
will be allowed to use as much memory as this value
specifies before it starts to put data into temporary
files. Also, several running sessions could be doing
such operations concurrently. So the total memory used
could be many times the value of work_mem; it is
necessary to keep this fact in mind when choosing the
value. Sort operations are used for ORDER BY, DISTINCT,
and merge joins. Hash tables are used in hash joins,
hash-based aggregation, and hash-based processing of IN
subqueries.
I'm probably totally wrong here, but... isn't "switching to temporary disk files" essentially the same thing as "virtual memory" in the operating system? Wouldn't the OS just create a swap file once the RAM is gone? Wouldn't it be better to set this to something like 100TB and let the OS figure it out? Before I potentially mess up my system, I want to check if anyone has actually tried this approach.
PostgreSQL will for example switch to a sorting operation more suitable for on-disk sort than in-memory sort if it knows the sort will happen on disk - which it won't know if it happens in swap.
Also, PostgreSQL can switch to a completely different plan (for example, using a different JOIN method) if it figures out the data does not fit in RAM.
Setting work_mem too high will get you a very slow database as soon as you have enough data so that everything doesn't always fit in RAM anymore.
Keep in mind that work_mem is the maximum amount of RAM that can be used by each individual sort operation. For a single query, multiple sort operations might run in parallel, and there might be multiple connections querying the database at once. For that reason, the sort operations together may use many times the amount of work_mem in RAM (that's why a conservative value is recommended).
Now back to your question: if you set work_mem to such a high value, sort operations might use up most of your RAM, which leads to pages being swapped in and out (keep in mind that lots of other processes and other parts of PostgreSQL also need some, or even lots of, RAM). Disk-based sort operations are many times more efficient than page swaps done by the OS. As some of the other replies pointed out, a database server that is constantly swapping in and out will perform extremely slowly.
Another point is that with such a high work_mem value, a single query (on purpose or by accident) might more or less make the whole database server unresponsive.
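The multiplication described in this answer can be written down as a back-of-the-envelope calculation. This is only a rough upper bound under the assumption that every sort or hash operation uses its full allowance at the same time; the function name and the workload numbers are hypothetical.

```python
def worst_case_work_mem_mb(work_mem_mb, ops_per_query, concurrent_queries):
    """Upper bound on sort/hash memory if every operation uses its full allowance."""
    return work_mem_mb * ops_per_query * concurrent_queries

# Hypothetical workload: 64MB work_mem, 4 sort/hash operations per query, 50 sessions.
print(worst_case_work_mem_mb(64, 4, 50))  # 12800 (MB), i.e. 12.5 GB
```

Even a modest-looking 64MB setting can add up to more than 12GB in this worst case, which is why an enormous value such as 100TB effectively removes the limit altogether.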
A database server that swaps is a dead database server.
In RAM, PostgreSQL uses quicksort; on disk, it uses another algorithm that is much better suited to hard disks. Using quicksort on swapped-out memory will be incredibly slow.
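To give a feel for what a disk-friendly sort looks like, here is a generic external merge sort sketch. It is not PostgreSQL's actual implementation, just the textbook idea: sort memory-sized chunks, write each one out as a sorted run, then merge the runs while reading every file sequentially. The `external_sort` function and the `chunk_size` parameter are illustrative.

```python
import heapq
import tempfile

def external_sort(numbers, chunk_size):
    """Sort integers using sorted runs on disk plus a streaming k-way merge."""
    runs = []
    chunk = []

    def flush():
        # Sort one memory-sized chunk and write it out as a sorted run.
        run = tempfile.TemporaryFile(mode="w+")
        run.writelines(f"{n}\n" for n in sorted(chunk))
        run.seek(0)
        runs.append(run)
        chunk.clear()

    for n in numbers:
        chunk.append(n)
        if len(chunk) >= chunk_size:
            flush()
    if chunk:
        flush()

    # Each run is read sequentially; only one value per run is held in memory.
    streams = [(int(line) for line in run) for run in runs]
    try:
        yield from heapq.merge(*streams)
    finally:
        for run in runs:
            run.close()

print(list(external_sort([5, 3, 8, 1, 9, 2, 7], chunk_size=3)))
```

The access pattern is what matters: every file is written once and read once, front to back, which is cheap on a spinning disk. Quicksort over swapped-out memory jumps around at random and pays a page fault for much of that jumping.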
The OS handles swap in a generic way. Besides, there's a finite amount of address space a process can use, which isn't that big on 32-bit systems (2GB on a 32-bit Windows platform, which can be extended to 3GB). But you're right, you could let the OS handle this through virtual memory.
PostgreSQL is not 'generic': it knows much better than the OS how to structure data once disk access is involved, so letting the database switch over to explicit file handling once memory is exhausted has benefits over letting the OS handle it.
I have a problem with a large database that resides on a single drive. The database contains around a dozen tables; the two main ones are around 1GB each and cannot be made smaller. My problem is that the disk queue for the database drive is around 96% to 100% even when the website that uses the DB is idle. What optimisation could be done, or what is the source of the problem? The DB on disk is 16GB in total and almost all the data is required: transaction data, customer information and stock details.
What are the reasons why the disk queue is always high no matter the website traffic?
What can be done to help improve performance on a database this size?
Any suggestions would be appreciated!
The database is an MS SQL 2000 Database running on Windows Server 2003 and as stated 16GB in size (Data File on Disk size).
Thanks
Well, how much memory do you have on the machine? If you can't store the pages in memory, SQL Server is going to have to go to the disk to get its information. If your memory is low, you might want to consider upgrading it.
Since the database is so big, you might want to consider adding two separate physical drives and then putting the transaction log on one drive and partitioning some of the other tables onto the other drive (you have to do some analysis to see what the best split between tables is).
In doing this, you are allowing IO accesses to occur in parallel, instead of in serial, which should give you some more performance from your DB.
Before buying more disks and shifting things around, you might also update statistics and check your queries - if you are doing lots of table scans and so forth you will be creating unnecessary work for the hardware.
Your database isn't that big after all - I'd first look at tuning your queries. Have you profiled what sort of queries are hitting the database?
If your disk activity is that high while your site is idle, I would look for other processes that might be running that could be affecting it. For example, are you sure there aren't any scheduled backups running? Especially with a large DB, these could be running for a long time.
As Mike W pointed out, there is usually a lot you can do with query optimization with existing hardware. Isolate your slow-running queries and find ways to optimize them first. In one of our applications, we spent literally 2 months doing this and managed to improve the performance of the application, and the hardware utilization, dramatically.