I am using a MySQL database and I would like to know: if I issue multiple (500 or more) queries simultaneously in order to get information from multiple tables, how are these queries handled? Sequentially or in parallel?
Queries are always handled in parallel between multiple sessions (i.e. client connections). All queries on a single connection are run one after another. The level of parallelism between multiple connections can be configured depending on your available server resources.
Generally, some groups of operations can be guarded against interference from other sessions (these guarded groups are called transactions). Transactions are supported by the InnoDB backend, but not by MyISAM tables (which instead support a concept called atomic operations). There are various levels of isolation, which differ in which operations are guarded from each other (and thus in how operations in one parallel transaction affect another) and in their performance impact.
For more information read about transactions in general and the implementation in MySQL.
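To make the connection-versus-query distinction concrete, here is a minimal JDBC sketch (the URL, credentials and table name are placeholders): two threads, each with its own connection, can have their queries executed in parallel by the server, while two queries issued on the same connection always run one after the other.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ParallelConnections {
    // Placeholder connection details.
    static final String URL = "jdbc:mysql://localhost:3306/test";

    public static void main(String[] args) throws Exception {
        Runnable worker = () -> {
            // Each thread opens its own connection (its own session),
            // so the server can run its query in parallel with the other thread's.
            try (Connection conn = DriverManager.getConnection(URL, "user", "password");
                 Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM some_table")) {
                rs.next();
                System.out.println(Thread.currentThread().getName() + ": " + rs.getLong(1));
            } catch (Exception e) {
                e.printStackTrace();
            }
        };

        Thread t1 = new Thread(worker);
        Thread t2 = new Thread(worker);
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        // Two executeQuery() calls on the SAME connection, by contrast,
        // would always run one after the other.
    }
}
```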
Each connection can run at most one query at a time, and runs it in a single thread. The server uses one thread per connection.
Normally, hopefully, queries don't block each other. Depending on the engine and the queries however, they may. There is a lot of locking in MySQL which is discussed in some detail in the manual.
However, if they don't block each other, they can still slow each other down by consuming resources. IO operations are a particular source of these slow-downs. If your data don't fit in memory, you should really limit the number of parallel queries to what your IO subsystem can handle, or things will get really bad. Measurement is the key.
I would normally say that if 500 queries are running at once (and NOT waiting on locks), you may not be getting the best value from your hardware (do you have 500 cores? How many threads are waiting for IO?).
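If you do want to fire off hundreds of queries from one client, a bounded thread pool is a simple way to keep the number of in-flight queries at a level your hardware can actually service. A rough sketch (connection details and table names are made up; in practice you would also use a connection pool rather than opening a connection per task):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BoundedParallelQueries {
    static final String URL = "jdbc:mysql://localhost:3306/test"; // placeholder

    public static void main(String[] args) throws Exception {
        // Cap the number of queries in flight instead of firing all 500 at once.
        int maxParallel = 8; // assumed value; measure and tune for your cores and IO
        ExecutorService pool = Executors.newFixedThreadPool(maxParallel);

        List<Callable<Long>> tasks = new ArrayList<>();
        for (int i = 0; i < 500; i++) {
            final int n = i;
            tasks.add(() -> {
                // One connection per task; a real application would pool connections.
                try (Connection conn = DriverManager.getConnection(URL, "user", "password");
                     Statement st = conn.createStatement();
                     ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM table_" + n)) {
                    rs.next();
                    return rs.getLong(1);
                }
            });
        }

        pool.invokeAll(tasks); // at most maxParallel queries run at any moment
        pool.shutdown();
    }
}
```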
Normally all queries will be run in parallel.
But... there are some exceptions to that. Depending on your transaction isolation level a row can be locked while updating. Read more about that over here: http://dev.mysql.com/doc/refman/5.1/en/dynindex-isolevel.html
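As a small illustration, here is a JDBC sketch of how such a row lock comes into play (the accounts table and connection details are invented). A SELECT ... FOR UPDATE inside a transaction locks the matching rows, so another session trying to update them blocks until the first commits:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class RowLockDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details and table.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/test", "user", "password")) {
            conn.setAutoCommit(false);
            // REPEATABLE READ is InnoDB's default; stricter levels lock more aggressively.
            conn.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ);

            try (Statement st = conn.createStatement();
                 // FOR UPDATE takes row locks: another session updating (or also
                 // selecting FOR UPDATE) the same rows blocks until we commit.
                 ResultSet rs = st.executeQuery(
                         "SELECT id, balance FROM accounts WHERE id = 42 FOR UPDATE")) {
                while (rs.next()) {
                    // ... work with the locked row ...
                }
            }
            conn.commit();
        }
    }
}
```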
I am using very large tables containing hundreds of millions of rows, and I am measuring the performance of some queries using SQL Developer. I found an option called Unshared SQL worksheet, which allows me to execute many queries at the same time. Executing many queries at the same time suits me, especially since some queries or procedures take hours to execute.
My question is: does executing many queries at the same time affect performance? (By performance I mean the execution time of the queries.)
Every query costs something to execute. That's just physics. Your database has a fixed amount of resources - CPU, memory, I/O, temp disk - to service queries (let's leave elastic cloud instances out of the picture). Every query which is executing simultaneously is asking for resources from that fixed pot. Potentially, if you run too many queries at the same time you will run into resource contention, which will affect the performance of individual queries.
Note the word "potentially". Whether you will run into an actual problem depends on many things: what resources your queries need, how efficiently your queries have been written, how much resource your database server has available, how efficiently it's been configured to support multiple users (and whether the DBA has implemented profiles to manage resource usage). So, like with almost every database tuning question, the answer is "it depends".
This is true even for queries which hit massive tables such as you describe. Although, if you have queries which you know will take hours to run you might wish to consider tuning them as a matter of priority.
Are Continuous Queries executed in a multi-threaded mode or just a single thread? I am trying to find out what the performance implications are when millions of entries are added to a cache for which ContinuousQuery is enabled.
Well, both - depends on what you mean by "multi-threaded". Query's remote filter is executed by the same thread that performs the cache update, but the updates themselves are generally performed in multiple threads.
On the performance considerations: calling filters and listeners is relatively fast and having a continuous query should not slow down your application, but make sure not to put heavy code into them and not to acquire locks or use transactions to avoid deadlocks.
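For illustration, here is a rough sketch based on the Apache Ignite 2.x Java API (cache name, filter logic and values are made up): the remote filter is deliberately trivial, and any heavier work belongs in the local listener or downstream of it.

```java
import javax.cache.Cache;
import javax.cache.configuration.FactoryBuilder;
import javax.cache.event.CacheEntryEvent;
import javax.cache.event.CacheEntryEventFilter;
import javax.cache.event.CacheEntryListenerException;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.ContinuousQuery;
import org.apache.ignite.cache.query.QueryCursor;

public class ContinuousQueryDemo {

    /** Remote filter: runs on the node owning the entry, in the update thread, so keep it cheap. */
    public static class ShortValueFilter implements CacheEntryEventFilter<Integer, String> {
        @Override
        public boolean evaluate(CacheEntryEvent<? extends Integer, ? extends String> evt)
                throws CacheEntryListenerException {
            return evt.getValue() != null && evt.getValue().length() > 3;
        }
    }

    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            IgniteCache<Integer, String> cache = ignite.getOrCreateCache("demoCache");

            ContinuousQuery<Integer, String> qry = new ContinuousQuery<>();
            qry.setRemoteFilterFactory(FactoryBuilder.factoryOf(ShortValueFilter.class));

            // Local listener: receives the events that passed the remote filter.
            qry.setLocalListener(evts -> {
                for (CacheEntryEvent<? extends Integer, ? extends String> e : evts)
                    System.out.println("key=" + e.getKey() + ", value=" + e.getValue());
            });

            // The continuous query stays active while the cursor is open.
            try (QueryCursor<Cache.Entry<Integer, String>> cur = cache.query(qry)) {
                cache.put(1, "hello"); // passes the filter
                cache.put(2, "hi");    // filtered out
            }
        }
    }
}
```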
I am aware that multiple connections use multiple CPU cores in Postgres and hence run in parallel. But when I execute a long-running query, say 30 seconds (let's assume that it cannot be optimized further), the I/O is blocked and no other query from the same client/connection is run.
Is this by design or can it be improved ?
So I am assuming that the best way to run long-running queries is to get a new connection, or not to run any other query on the same connection until that query is complete?
It is a design limitation.
PostgreSQL uses one process per connection, and has one session per process. Each process is single-threaded and makes heavy use of globals inherited via fork() from the postmaster. Shared memory is managed explicitly.
This has some big advantages in ease of development, debugging and maintenance, and makes the system more robust in the face of errors. However, it makes it significantly harder to add parallelization on a query level.
There's ongoing work to add parallel query support, but at present the system is really limited to using one CPU core per query. It can benefit from parallel I/O in some areas, like bitmap index scans (via effective_io_concurrency), but not in others.
There are some IMO pretty hacky workarounds like PL/Proxy but mostly you have to deal with parallelization yourself client-side if it's needed. This is rapidly becoming one of the more significant limitations impacting PostgreSQL. Applications can split up large queries into multiple smaller queries that affect a subset of the data, then unify client-side (or into an unlogged table that then gets further processed), i.e. a map/reduce-style pattern. If a mix of big long running queries and low-latency OLTP queries is needed, multiple connections are required and the app should usually use an internal connection pool.
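As a sketch of that client-side map/reduce pattern (the table name, partitioning scheme and connection details are assumed), each worker opens its own connection, so each slice gets its own backend process and can use its own CPU core; the client then combines the partial results:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ClientSideParallelism {
    static final String URL = "jdbc:postgresql://localhost:5432/mydb"; // placeholder

    public static void main(String[] args) throws Exception {
        int partitions = 4; // one connection (= one backend process = one core) per slice
        ExecutorService pool = Executors.newFixedThreadPool(partitions);

        List<Callable<Long>> tasks = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            final int part = p;
            tasks.add(() -> {
                try (Connection conn = DriverManager.getConnection(URL, "user", "password");
                     PreparedStatement ps = conn.prepareStatement(
                             // Hypothetical table/columns; each task aggregates one slice of ids.
                             "SELECT COUNT(*) FROM big_table WHERE id % ? = ?")) {
                    ps.setInt(1, partitions);
                    ps.setInt(2, part);
                    try (ResultSet rs = ps.executeQuery()) {
                        rs.next();
                        return rs.getLong(1);
                    }
                }
            });
        }

        long total = 0;
        for (Future<Long> f : pool.invokeAll(tasks))
            total += f.get(); // the "reduce" step happens on the client
        pool.shutdown();
        System.out.println("total rows: " + total);
    }
}
```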
I want to make sure the stress to the server is minimal while running queries from a read only schema (a user can select data and create temp tables and variables, but can't execute SPs, write and other more advanced stuff). What db hints/other tricks could I use in this situation?
Currently I am:
Using the WITH (NOLOCK) hint for every table
Setting the DEADLOCK_PRIORITY for the whole batch to -10 (although I am not sure it's really needed, since I am using NOLOCK)
My goal is to use as few server resources as possible and allow other, more important things to be processed by the server freely. The queries that I am going to send to the server are local (they can't be saved as SPs) and there will be many of them, coming from various users every 5 minutes. They are generally simple SELECTs and are cheap in isolation. Are there any other ways to make them even less expensive?
EDIT:
I am not the owner of the server I am connecting to, so I can only use the SQL query I am passing to the server to achieve what I want.
The two measures you have taken will have little impact. They are mostly used out of superstition. They can have an impact in rare cases: practically, READ UNCOMMITTED (which is 100% identical to NOLOCK) enables allocation-order scans on B-trees, and that is only important for tables that are not in memory anyway.
If you want to minimize locking and blocking, snapshot isolation can be a simple and very effective solution.
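As a rough illustration, assuming the DBA has enabled snapshot isolation on the database (ALTER DATABASE ... SET ALLOW_SNAPSHOT_ISOLATION ON), a read-only session can opt into it per batch; the connection string, table and columns below are invented:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SnapshotReads {
    public static void main(String[] args) throws Exception {
        // Assumed connection details; requires ALLOW_SNAPSHOT_ISOLATION on the database.
        String url = "jdbc:sqlserver://myserver;databaseName=MyDb;user=reader;password=secret";

        try (Connection conn = DriverManager.getConnection(url);
             Statement st = conn.createStatement()) {

            // Readers see a consistent snapshot and take no shared locks,
            // so they neither block writers nor get blocked by them.
            st.execute("SET TRANSACTION ISOLATION LEVEL SNAPSHOT");

            try (ResultSet rs = st.executeQuery(
                    "SELECT TOP (100) OrderId, Total FROM dbo.Orders ORDER BY OrderId DESC")) {
                while (rs.next()) {
                    // ... consume rows ...
                }
            }
        }
    }
}
```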
In order to truly minimize the impact of a certain workload you need to use Resource Governor. Everything else is a partial solution/workaround.
Consider limiting CPU usage, memory, IO and parallelism.
I wrote a Java program to add data to and retrieve data from an MS Access database. At present it goes sequentially through ~200K insert queries in ~3 minutes, which I think is slow. I plan to rewrite it using threads, with 3-4 threads handling different parts of the hundreds of thousands of records. I have a compound question:
Will this help speed up the program because of the divided workload or would it be the same because the threads still have to access the database sequentially?
What strategy do you think would speed up this process (other than query optimization, which I have already done, in addition to using Java's PreparedStatement)?
Don't know. Without knowing more about what the bottleneck is, I can't say whether it will make things faster. If the database is the limiter then chances are more threads will slow it down.
I would dump the Access database to a flat file and then bulk load that file. Bulk loading allows for optimizations which are far, far faster than running multiple insert queries.
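If you do stay with JDBC-style inserts, batching through a single PreparedStatement with autocommit off is usually the next best thing after a true bulk load. A sketch, assuming the UCanAccess driver and an invented table; the batch size of 1000 is an arbitrary starting point:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.List;

public class BatchInsert {
    public static void main(String[] args) throws Exception {
        List<String[]> rows = loadRows(); // the ~200K records, however they are produced

        // UCanAccess is one JDBC driver that talks to .mdb/.accdb files; the path is assumed.
        try (Connection conn = DriverManager.getConnection("jdbc:ucanaccess://C:/data/mydb.mdb");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO people (first_name, last_name) VALUES (?, ?)")) {
            conn.setAutoCommit(false); // one commit at the end instead of one per row

            int count = 0;
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch();
                if (++count % 1000 == 0)
                    ps.executeBatch(); // send the inserts to the driver in chunks
            }
            ps.executeBatch();
            conn.commit();
        }
    }

    static List<String[]> loadRows() {
        // Placeholder for however the records are actually produced.
        return List.of(new String[] {"Ada", "Lovelace"}, new String[] {"Alan", "Turing"});
    }
}
```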
First, don't use Access. Move your data anywhere else -- SQL/Server -- MySQL -- anything. The DB engine inside access (called Jet) is pitifully slow. It's not a real database; it's for personal projects that involve small amounts of data. It doesn't scale at all.
Second, threads rarely help.
The JDBC-to-Database connection is a process-wide resource. All threads share the one connection.
"But wait," you say, "I'll create a unique Connection object in each thread."
Noble, but sometimes doomed to failure. Why? Operating System processing between your JVM and the database may involve a socket that's a single, process-wide resource, shared by all your threads.
If you have a single OS-level I/O resource that's shared across all threads, you won't see much improvement. In this case, the ODBC connection is one bottleneck. And MS-Access is the other.
With MS Access as the backend database, you'll probably get better insert performance if you do an import from within MS Access. Another option (since you're using Java) is to directly manipulate the MDB file (if you're creating it from scratch and there are no other concurrent users - which MS Access doesn't handle very well) with a library like Jackcess.
If none of these are solutions for you, then I'd recommend using a profiler on your Java application and see if it is spending most of its time waiting for the database (in which case adding threads probably won't help much) or if it is doing processing and parallelizing will help.
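For completeness, a rough sketch of what writing the MDB file directly with Jackcess (2.x API) looks like; the file path, table name and columns are invented, and this assumes no other process has the file open:

```java
import java.io.File;

import com.healthmarketscience.jackcess.Database;
import com.healthmarketscience.jackcess.DatabaseBuilder;
import com.healthmarketscience.jackcess.Table;

public class JackcessLoad {
    public static void main(String[] args) throws Exception {
        // Opens the .mdb file directly -- no JDBC/ODBC layer, and no concurrent users allowed.
        try (Database db = DatabaseBuilder.open(new File("C:/data/mydb.mdb"))) {
            Table table = db.getTable("people"); // assumed table name

            for (int i = 0; i < 200_000; i++) {
                // Values must be supplied in the table's column order.
                table.addRow(i, "first" + i, "last" + i);
            }
        }
    }
}
```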
Stimms' bulk load approach will probably be your best bet, but everything is worth trying once. Note that your bottleneck is going to be disk IO, and multiple threads may slow things down. MS Access can also fall apart when multiple users are banging on the file, and that is exactly what your multi-threaded approach will act like (make a backup!). If performance continues to be an issue, consider upgrading to SQL Express.
MS Access to SQL Server Migrations docs.
Good luck.
I would agree that dumping Access would be the best first step. Having said that...
In a .NET and SQL environment I have definitely seen threads help maximize INSERT throughput.
I have an application that accepts asynchronous file drops and then processes them into tables in a database.
I created a loader that parsed the file and placed the data into a queue. The queue was served by one or more threads whose maximum count I could tune with a parameter. I found that even on a single-core CPU with a typical 7200 RPM drive, the ideal number of worker threads was 3. It shortened the load time by an almost proportional amount. The key is to tune the thread count so that the CPU bottleneck and the disk I/O bottleneck are balanced.
So in cases where a bulk copy is not an option, threads should be considered.
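A Java/JDBC sketch of the same queue-plus-workers pattern (the connection string, table and the worker count of 3 are assumptions; tune the count against your own CPU and disk):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueuedLoader {
    // Sentinel value telling a worker it can stop.
    private static final String[] POISON = new String[0];

    public static void main(String[] args) throws Exception {
        BlockingQueue<String[]> queue = new ArrayBlockingQueue<>(10_000);
        int workers = 3; // tune until CPU and disk I/O are both kept busy

        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                // Each worker gets its own connection (assumed connection details).
                try (Connection conn = DriverManager.getConnection(
                        "jdbc:sqlserver://localhost;databaseName=LoadDb", "user", "password");
                     PreparedStatement ps = conn.prepareStatement(
                             "INSERT INTO people (first_name, last_name) VALUES (?, ?)")) {
                    while (true) {
                        String[] row = queue.take();
                        if (row == POISON)
                            break;
                        ps.setString(1, row[0]);
                        ps.setString(2, row[1]);
                        ps.executeUpdate();
                    }
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            threads[i].start();
        }

        // Producer: parse the input and feed the queue (dummy rows here).
        for (int i = 0; i < 200_000; i++)
            queue.put(new String[] {"first" + i, "last" + i});

        for (int i = 0; i < workers; i++)
            queue.put(POISON); // one sentinel per worker
        for (Thread t : threads)
            t.join();
    }
}
```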
On modern multi-core machines, using multiple threads to populate a database can make a difference. It depends on the database and its hardware. Try it and see.
Just try it and see if it helps. I would guess not because the bottleneck is likely to be in the disk access and locking of the tables, unless you can figure out a way to split the load across multiple tables and/or disks.
IIRC Access doesn't allow multiple connections to the same file because of the locking policy it uses.
And I agree totally about dumping Access for SQL.