Apache Ignite Continuous Query

Are Continuous Queries executed in multi-threaded mode or just in a single thread? I am trying to find out what the performance implications are when millions of entries are added to a cache for which ContinuousQuery is enabled.

Well, both - it depends on what you mean by "multi-threaded". The query's remote filter is executed by the same thread that performs the cache update, but the updates themselves are generally performed by multiple threads.
On performance: calling filters and listeners is relatively fast, so having a continuous query should not slow down your application. Just make sure not to put heavy code into them, and do not acquire locks or start transactions inside them, to avoid deadlocks.
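For reference, a minimal sketch of a continuous query with a lightweight remote filter and local listener (the cache name, the Integer/String types, and the "important" prefix condition are illustrative assumptions, not anything from the question):

```java
import javax.cache.Cache;
import javax.cache.configuration.Factory;
import javax.cache.event.CacheEntryEvent;
import javax.cache.event.CacheEntryEventFilter;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.ContinuousQuery;
import org.apache.ignite.cache.query.QueryCursor;

public class CqSketch {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();
        IgniteCache<Integer, String> cache = ignite.getOrCreateCache("myCache");

        ContinuousQuery<Integer, String> qry = new ContinuousQuery<>();

        // Remote filter: runs on the node owning the entry, in the same thread
        // that performs the update -- keep it cheap, no locks, no transactions.
        Factory<CacheEntryEventFilter<Integer, String>> filterFactory =
            () -> evt -> evt.getValue() != null && evt.getValue().startsWith("important");
        qry.setRemoteFilterFactory(filterFactory);

        // Local listener: receives batches of filtered events on the querying node.
        qry.setLocalListener(evts -> {
            for (CacheEntryEvent<? extends Integer, ? extends String> e : evts)
                System.out.println("Updated: " + e.getKey() + " -> " + e.getValue());
        });

        // The query stays registered until the cursor is closed.
        try (QueryCursor<Cache.Entry<Integer, String>> cur = cache.query(qry)) {
            cache.put(1, "important update"); // passes the filter, reaches the listener
            cache.put(2, "something else");   // filtered out on the remote side
        }
    }
}
```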

Related

Performant Distributed Locking

I am evaluating the best approach to distributed locking. The out-of-the-box reentrant locking support in Ignite is tied to the thread that acquires the lock. Our requirements need locking and unlocking that are not tied to the same thread/process: one thread/process can lock and another thread/process can unlock. While evaluating alternatives, we came across two options:
Use semaphores - this works but is extremely slow. We cannot use it.
Do manual locking - use a separate cache where we add entries for locks and remove entries to unlock (a rough sketch of this idea is at the end of this question). But it needs too many cases to handle, like nodes going down during put operations, deciding which other node's put gets through, etc.
Just wanted to check if there are other out-of-the-box, performant options we might have overlooked.
TIA
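For illustration, a rough sketch of what option 2 (manual locking via a dedicated cache) could look like; the "locks" cache, the token scheme, and the 30-second expiry (so that locks held by crashed nodes eventually disappear) are all assumptions, not a recommendation:

```java
import java.util.UUID;
import java.util.concurrent.TimeUnit;
import javax.cache.expiry.CreatedExpiryPolicy;
import javax.cache.expiry.Duration;
import org.apache.ignite.IgniteCache;

public class CacheBasedLock {
    private final IgniteCache<String, String> locks;

    public CacheBasedLock(IgniteCache<String, String> locksCache) {
        // Entries expire after 30 s, so a lock held by a crashed node is eventually released.
        this.locks = locksCache.withExpiryPolicy(
            new CreatedExpiryPolicy(new Duration(TimeUnit.SECONDS, 30)));
    }

    /** Returns a lock token on success, or null if the lock is already held. */
    public String tryLock(String lockName) {
        String token = UUID.randomUUID().toString();
        return locks.putIfAbsent(lockName, token) ? token : null;
    }

    /** Unlock can be called from any thread or process that knows the token. */
    public boolean unlock(String lockName, String token) {
        return locks.remove(lockName, token);
    }
}
```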

Query parallelization for single connection in Postgres

I am aware that multiple connections use multiple CPU cores in Postgres and hence run in parallel. But when I execute a long-running query, say 30 seconds (let's assume it cannot be optimized further), the I/O is blocked and no other query can run on the same client/connection.
Is this by design, or can it be improved?
So I am assuming that the best way to handle long-running queries is to get a new connection, or not to run any other query on the same connection until that query is complete?
It is a design limitation.
PostgreSQL uses one process per connection, and has one session per process. Each process is single-threaded and makes heavy use of globals inherited via fork() from the postmaster. Shared memory is managed explicitly.
This has some big advantages in ease of development, debugging and maintenance, and makes the system more robust in the face of errors. However, it makes it significantly harder to add parallelization on a query level.
There's ongoing work to add parallel query support, but at present the system is really limited to using one CPU core per query. It can benefit from parallel I/O in some areas, like bitmap index scans (via effective_io_concurrency), but not in others.
There are some, IMO, pretty hacky workarounds like PL/Proxy, but mostly you have to handle parallelization yourself on the client side if it's needed. This is rapidly becoming one of the more significant limitations affecting PostgreSQL. Applications can split a large query into multiple smaller queries that each touch a subset of the data, then unify the results client-side (or in an unlogged table that gets further processed), i.e. a map/reduce-style pattern. If you need a mix of big, long-running queries and low-latency OLTP queries, multiple connections are required, and the app should usually use an internal connection pool.
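For example, a sketch of that split-and-unify pattern from the client side, using one connection per chunk so each chunk runs in its own PostgreSQL backend (the table big_table(id, amount), the id range, and the JDBC URL are assumptions):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class ParallelAggregate {
    // Hypothetical table: big_table(id bigint, amount numeric), ids roughly 1..40,000,000.
    static final String URL = "jdbc:postgresql://localhost/mydb?user=me&password=secret";

    public static void main(String[] args) throws Exception {
        long maxId = 40_000_000L;
        int partitions = 4;                       // one connection (and CPU core) per partition
        long step = maxId / partitions + 1;

        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        List<Future<Long>> parts = new ArrayList<>();

        for (int i = 0; i < partitions; i++) {
            long lo = i * step, hi = lo + step;
            parts.add(pool.submit(() -> {
                // Each task opens its own connection, so PostgreSQL runs it in its own backend.
                try (Connection c = DriverManager.getConnection(URL);
                     PreparedStatement ps = c.prepareStatement(
                         "SELECT COALESCE(SUM(amount), 0) FROM big_table WHERE id >= ? AND id < ?")) {
                    ps.setLong(1, lo);
                    ps.setLong(2, hi);
                    try (ResultSet rs = ps.executeQuery()) {
                        rs.next();
                        return rs.getLong(1);
                    }
                }
            }));
        }

        long total = 0;
        for (Future<Long> f : parts)
            total += f.get();                     // the "reduce" step happens client-side
        pool.shutdown();
        System.out.println("Total: " + total);
    }
}
```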

Updating 2,800,000 records with 4 threads

I have a VB.net application with an Access database with one table that contains about 2,800,000 records; each row is updated with new data daily. The machine has 64 GB of RAM and an i7-3960X overclocked to 4.9 GHz.
Note: data sources are local.
I wonder: if I use ~10 threads, will it finish updating the rows faster?
If it is possible, what would be the mechanism for dividing this big loop among multiple threads?
Update: sometimes the loop has to repeat the calculation for a row depending on the results; also, the loop has exactly 63 conditions and is 242 lines of code.
Microsoft Access is not particularly good at handling many concurrent updates, compared to other database platforms.
The more your tasks need to do calculations, the more you will typically benefit from concurrency / threading. If you spin up 10 threads that do little more than send update commands to Access, it is unlikely to be much faster than it is with just one thread.
If you have to do any significant calculations between reading and writing data, threads may show a performance improvement.
I would suggest trying the following and measuring the result:
One thread to read data from Access
One thread to perform whatever calculations are needed on the data you read
One thread to update Access
You can implement this using a Producer / Consumer pattern, which is pretty easy to do with a BlockingCollection.
The nice thing about the Producer / Consumer pattern is that you can add more producer and/or consumer threads with minimal code changes to find the sweet spot.
Supplemental Thought
IO is probably the bottleneck of your application. Consider placing the Access file on faster storage if you can (SSD, RAID, or even a RAM disk).
Well if you're updating 2,800,000 records with 2,800,000 queries, it will definitely be slow.
Generally, it's good to avoid opening multiple connections to update your data.
You might want to show us some code of how you're currently doing it, so we could tell you what to change.
So I don't think (with the information you gave) that going multi-threaded for this would be faster. Now, if you're thinking about going multi-threaded because the update freezes your GUI, that's another story.
If the processing is slow, I personally don't think it's due to your server's specs. I'd guess it's more likely something about the logic you use to update the data.
Don't wonder; test. Write it so you can dispatch as many threads as you like to do the work, and test it with various numbers of threads. What does the loop you are talking about look like?
With questions like "if I add more threads, will it work faster?" it is always best to test, though there are rules of thumb. If the DB is local, chances are that Oded is right.

How are simultaneous queries handled in a MySQL database?

I am using a MySQL database and I would like to know: if I make multiple (500 or more) queries simultaneously in order to get information from multiple tables, how are these queries handled? Sequentially or in parallel?
Queries are always handled in parallel between multiple sessions (i.e. client connections). All queries on a single connection are run one after another. The level of parallelism between multiple connections can be configured depending on your available server resources.
Generally, some operations are guarded between individual query sessions (called transactions). These are supported by InnoDB backends, but not by MyISAM tables (though MyISAM supports a concept called atomic operations). There are various levels of isolation, which differ in which operations are guarded from each other (and thus how operations in one parallel transaction affect another) and in their performance impact.
For more information read about transactions in general and the implementation in MySQL.
Each connection can run at most one query at a time, and runs it in a single thread. The server uses one thread per connection.
Normally, hopefully, queries don't block each other. Depending on the engine and the queries however, they may. There is a lot of locking in MySQL which is discussed in some detail in the manual.
However, if they don't block each other, they can still slow each other down by consuming resources. I/O operations are a particular source of these slowdowns. If your data doesn't fit in memory, you should really limit the number of parallel queries to what your I/O subsystem can handle, or things will get really bad. Measurement is key.
I would normally say that if 500 queries are running at once (and NOT waiting on locks), you may not be getting the best value from your hardware (do you have 500 cores? How many threads will be waiting on I/O?).
Normally all queries will be run in parallel.
But... there are some exceptions. Depending on your transaction isolation level, a row can be locked while it is being updated. Read more about that here: http://dev.mysql.com/doc/refman/5.1/en/dynindex-isolevel.html
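As a small illustration of that row locking (the accounts table, the connection URL, and the two-connection setup are assumptions for the example): with InnoDB, an uncommitted UPDATE in one connection blocks another connection's UPDATE of the same row until the first transaction commits:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class RowLockDemo {
    // Hypothetical InnoDB table: accounts(id INT PRIMARY KEY, balance INT).
    static final String URL = "jdbc:mysql://localhost/test?user=me&password=secret";

    public static void main(String[] args) throws Exception {
        Connection c1 = DriverManager.getConnection(URL);
        c1.setAutoCommit(false);
        try (Statement s1 = c1.createStatement()) {
            // c1 now holds a row lock on id = 1 until it commits or rolls back.
            s1.executeUpdate("UPDATE accounts SET balance = balance - 10 WHERE id = 1");
        }

        Thread other = new Thread(() -> {
            try (Connection c2 = DriverManager.getConnection(URL);
                 Statement s2 = c2.createStatement()) {
                // Blocks on the row lock held by c1; proceeds once c1 commits.
                s2.executeUpdate("UPDATE accounts SET balance = balance + 10 WHERE id = 1");
                System.out.println("second update finished");
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        other.start();

        Thread.sleep(2000);   // keep the lock for a while; the other connection is waiting
        c1.commit();          // releasing the lock lets the second update proceed
        c1.close();
        other.join();
    }
}
```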

Will it be faster to use several threads to update the same database?

I wrote a Java program to add data to and retrieve data from an MS Access database. At present it goes sequentially through ~200K insert queries in ~3 minutes, which I think is slow. I plan to rewrite it using threads, with 3-4 threads handling different parts of the hundreds of thousands of records. I have a compound question:
Will this help speed up the program because of the divided workload or would it be the same because the threads still have to access the database sequentially?
What strategy do you think would speed up this process (besides query optimization, which I have already done, in addition to using Java's PreparedStatement)?
Don't know. Without knowing more about what the bottleneck is, I can't say whether it will make it faster. If the database is the limiting factor, then chances are more threads will slow it down.
I would dump the Access database to a flat file and then bulk load that file. Bulk loading allows for optimizations which are far, far faster than running multiple insert queries.
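If a true bulk load isn't available, even batching the inserts over one PreparedStatement and committing once usually beats issuing one statement per row. A minimal sketch (the JDBC URL, table, and columns are placeholders for whatever driver and schema you actually use):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.List;

public class BatchInsert {
    // jdbcUrl, my_table, and the record layout are placeholders.
    public static void insertAll(String jdbcUrl, List<String[]> records) throws Exception {
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             PreparedStatement ps = conn.prepareStatement(
                 "INSERT INTO my_table (col_a, col_b) VALUES (?, ?)")) {

            conn.setAutoCommit(false);          // one commit instead of one per row
            int count = 0;

            for (String[] rec : records) {
                ps.setString(1, rec[0]);
                ps.setString(2, rec[1]);
                ps.addBatch();

                if (++count % 1000 == 0)        // flush every 1000 rows
                    ps.executeBatch();
            }

            ps.executeBatch();                  // flush the remainder
            conn.commit();
        }
    }
}
```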
First, don't use Access. Move your data anywhere else -- SQL Server, MySQL, anything. The DB engine inside Access (called Jet) is pitifully slow. It's not a real database; it's for personal projects that involve small amounts of data. It doesn't scale at all.
Second, threads rarely help.
The JDBC-to-Database connection is a process-wide resource. All threads share the one connection.
"But wait," you say, "I'll create a unique Connection object in each thread."
Noble, but sometimes doomed to failure. Why? Operating System processing between your JVM and the database may involve a socket that's a single, process-wide resource, shared by all your threads.
If you have a single OS-level I/O resource that's shared across all threads, you won't see much improvement. In this case, the ODBC connection is one bottleneck. And MS-Access is the other.
With MS Access as the backend database, you'll probably get better insert performance if you do an import from within MS Access. Another option (since you're using Java) is to directly manipulate the MDB file (if you're creating it from scratch and there are no other concurrent users - which MS Access doesn't handle very well) with a library like Jackcess.
If none of these are solutions for you, then I'd recommend using a profiler on your Java application and see if it is spending most of its time waiting for the database (in which case adding threads probably won't help much) or if it is doing processing and parallelizing will help.
Stimms' bulk-load approach will probably be your best bet, but everything is worth trying once. Note that your bottleneck is going to be disk IO, and multiple threads may slow things down. MS Access can also fall apart when multiple users are banging on the file, and that is exactly what your multi-threaded approach will act like (make a backup!). If performance continues to be an issue, consider upgrading to SQL Express.
MS Access to SQL Server Migrations docs.
Good luck.
I would agree that dumping Access would be the best first step. Having said that...
In a .NET and SQL environment I have definitely seen threads help in maximizing INSERT throughput.
I have an application that accepts asynchronous file drops and then processes them into tables in a database.
I created a loader that parsed the file and placed the data into a queue. The queue was served by one or more worker threads, whose maximum count I could tune with a parameter. I found that even on a single-core CPU with a typical 7200 RPM drive, the ideal number of worker threads was 3. It shortened the load time by an almost proportional amount. The key is to balance it so that the CPU bottleneck and the disk I/O bottleneck are balanced.
So in cases where a bulk copy is not an option, threads should be considered.
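For what it's worth, a bare-bones sketch of that queue-plus-workers layout in Java (the record type, queue size, worker count, and writeRecord stub are placeholders; the real consumer would do the database insert):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class LoaderSketch {
    private static final String POISON = "<eof>";  // sentinel telling workers to stop

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(10_000);
        int workers = 3;                           // tune this and measure

        // Consumers: take parsed records off the queue and write them out.
        Thread[] pool = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            pool[i] = new Thread(() -> {
                try {
                    for (String rec = queue.take(); !POISON.equals(rec); rec = queue.take())
                        writeRecord(rec);          // e.g. a batched JDBC insert
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            pool[i].start();
        }

        // Producer: parse the input and feed the queue.
        for (int i = 0; i < 200_000; i++)
            queue.put("record-" + i);              // stand-in for parsing a line of the file

        for (int i = 0; i < workers; i++)          // one poison pill per worker
            queue.put(POISON);
        for (Thread t : pool)
            t.join();
    }

    private static void writeRecord(String rec) {
        // Placeholder for the real work (insert into the database, etc.).
    }
}
```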
On modern multi-core machines, using multiple threads to populate a database can make a difference. It depends on the database and its hardware. Try it and see.
Just try it and see if it helps. I would guess not because the bottleneck is likely to be in the disk access and locking of the tables, unless you can figure out a way to split the load across multiple tables and/or disks.
IIRC, Access doesn't allow multiple connections to the same file because of the locking policy it uses.
And I totally agree about dumping Access for SQL.