I would like to use HSQLDB + Hibernate in a server with 5 to 30 clients that will write to the DB fairly intensively.
Each client will persist about twelve thousand rows to a single table every 30 seconds (24/7, that's roughly 1 billion rows/day), and the clients will also query the database for a few thousand rows at more or less random times, at an average frequency of a couple of requests every 5 to 10 seconds.
Can HSQLDB handle such a use case, or should I switch to MySQL/PostgreSQL?
You are looking at a total of 2000 - 12000 writes and 5000 - 30000 reads per second.
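For reference, the write figures follow directly from the numbers in the question. A minimal sketch of the arithmetic, assuming exactly 12,000 rows per client per 30-second batch (the function name `write_rate` is just for illustration):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def write_rate(clients: int, rows_per_batch: int = 12_000, interval_s: float = 30.0) -> float:
    """Aggregate rows written per second across all clients."""
    return clients * rows_per_batch / interval_s


low = write_rate(5)    # 5 clients
high = write_rate(30)  # 30 clients
print(low, high)               # 2000.0 12000.0
print(high * SECONDS_PER_DAY)  # 1036800000.0, i.e. roughly 1 billion rows/day
```

The read figures depend on how many rows "a few thousand" means and how often each client queries, so they are not reproduced here.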
With fast hardware, HSQLDB can probably handle this with persistent memory tables. With CACHED tables, it may be able to handle the lower range with solid state disks (disk seek time is the main parameter).
See this test. You can run it with MySQL and PostgreSQL for comparison.
http://hsqldb.org/web/hsqlPerformanceTests.html
You should switch. HSQLDB is not for critical apps. Be prepared for data corruption and decreasing startup performance over time.
The main negative hype comes from JBoss: https://community.jboss.org/wiki/HypersonicProduction
See also http://www.coderanch.com/t/89950/JBoss/HSQLDB-production
Also see this similar question: Is it safe to use HSQLDB for production? (JBoss AS5.1)
I have an Oracle table containing 900 million records. The table is split into 24 partitions and has indexes.
I tried using a parallel hint and set fetch_buffer to 100000:
select /*+ parallel(8) */
* from table
It takes 30 minutes to get 100 million records.
My question is:
Is there any faster way to get all 900 million records (all the data in the table)? Should I use the partitions and run 24 sequential queries? Or should I use the indexes and split my query into, for example, 10 queries?
The network is almost certainly the bottleneck here. Oracle parallelism only impacts the way the database retrieves the data, but data is still sent to the client with a single thread.
Assuming a single thread doesn't already saturate your network, you'll probably want to build a concurrent retrieval solution. It helps that the table is already partitioned, because you can then read large chunks of data without re-reading anything.
I'm not sure how to do this in Scala, but you want to run multiple queries like this at the same time, to use all the client and network resources possible:
select * from table partition (p1);
select * from table partition (p2);
...
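I can't give you Scala, but here is a rough sketch of the idea in Python: one query per partition, several in flight at once. The `fetch` callable is hypothetical. It stands in for whatever your client code does with one SQL statement (open its own connection, stream the rows to a file, return a row count). The table and partition names are placeholders too.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def partition_query(table: str, partition: str) -> str:
    """Build the per-partition query shown above."""
    return f"select * from {table} partition ({partition})"


def fetch_all_partitions(
    table: str,
    partitions: Iterable[str],
    fetch: Callable[[str], int],  # hypothetical: runs one query, returns rows fetched
    max_workers: int = 4,
) -> Dict[str, int]:
    """Run one query per partition on a thread pool and collect row counts."""
    names = list(partitions)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each worker should use its own database connection inside fetch().
        counts = pool.map(lambda p: fetch(partition_query(table, p)), names)
        return dict(zip(names, counts))
```

Start with a small `max_workers` and raise it until throughput stops improving. Past that point you have probably saturated the network or the client.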
Not really an answer but too long for a comment.
A few too many variables can impact this to give informed advice, so the following are just some general hints.
Is this over a network or local on the server? If the database is on a remote server then you are paying a heavy network price. I would suggest (if possible) running the extract on the server using the BEQUEATH protocol to avoid using the network. Once the file(s) are complete, it will be quicker to compress them and transfer them to the destination than to transfer the data directly from the database to a local file via JDBC row processing.
With JDBC remember to set the cursor fetch size to reduce round tripping - setFetchSize. The default value is tiny (10 I think), try something like 1000 to see how that helps.
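To see why the fetch size matters so much at this volume, here is a back-of-the-envelope sketch. It assumes one network round trip per fetched batch and ignores everything else the driver does; the helper name `round_trips` is mine.

```python
def round_trips(total_rows: int, fetch_size: int) -> int:
    """Number of batches needed to fetch total_rows, fetch_size rows at a time."""
    if fetch_size <= 0:
        raise ValueError("fetch_size must be positive")
    return -(-total_rows // fetch_size)  # ceiling division on integers


rows = 100_000_000
print(round_trips(rows, 10))    # 10000000
print(round_trips(rows, 1000))  # 100000
```

Going from 10 to 1000 cuts the number of round trips by a factor of 100. The gains flatten out at some point, so measure a few values rather than just picking the largest.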
As for the query, you are writing to a file so even though Oracle might process the query in parallel, your write to file process probably doesn't so it's a bottleneck.
My approach would be to write the Java program to operate on a range of values passed as command line parameters, and experiment to find which range size and number of concurrent instances give optimal performance. The range will likely fall within discrete partitions, so you will benefit from partition pruning (assuming the range value is on an indexed column, ideally the partition key).
Roughly speaking, I would start with a range of 5m and run as many concurrent instances as the number of CPU cores minus 2. This is not a scientifically derived number, just one that I tend to use as my first stab to see what happens.
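A small sketch of that starting point, in Python rather than Java to keep it short. It assumes the range column is an integer key. The names `split_ranges` and `instance_count` are made up for illustration, and the 5m step and "cores - 2" are just the first-stab values above.

```python
import os
from typing import List, Optional, Tuple


def split_ranges(start: int, stop: int, step: int = 5_000_000) -> List[Tuple[int, int]]:
    """Split [start, stop) into consecutive half-open ranges of at most `step` keys."""
    return [(lo, min(lo + step, stop)) for lo in range(start, stop, step)]


def instance_count(cores: Optional[int] = None) -> int:
    """First guess for concurrent instances: CPU cores minus 2, but at least 1."""
    if cores is None:
        cores = os.cpu_count() or 1
    return max(1, cores - 2)


print(split_ranges(0, 12_000_000))
# [(0, 5000000), (5000000, 10000000), (10000000, 12000000)]
print(instance_count(8))  # 6
```

Each (lo, hi) pair would become the command line parameters for one run of the extract program.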
I have seen the warnings of not using Google Big Table for small data sets.
Does this mean that a workload of 100 QPS could run slower (total time; not per query) than a workload of 8000 QPS?
I understand that 100 QPS is going to be incredibly inefficient on Bigtable, but could it be as drastic as 100 inserts taking 15 seconds to complete, whereas 8000 inserts could run in 1 second?
Just looking for a "in theory; from time to time; yes" vs "probably relatively unlikely" type answer to be a rough guide for how I structure my performance test cycles.
Thanks
There's a flat startup cost to running any Cloud Bigtable operations. That startup cost is generally less than 1 second. I would expect 100 operations to take less time than 8000 operations. When I see extreme slowness, I usually suspect network latency or some other unique condition.
We're having issues running small workloads on our development Bigtable instance (2.5 TB, one instance instead of 3).
We have a key set up on user id, with around 100 rows per user id. Total records in the database are a few million. We are querying Bigtable and seeing 1.4 seconds of latency when fetching the rows associated with a single user id key. The total number of records returned is less than 100, and we're seeing well over a second of latency. It seems to me that giant workloads are the only way to use this data store. We're looking at other NoSQL alternatives like Redis.
I work on a database where we store sales of about 300 stores. There is 1 table per store and the total amount of lines is about 120 million (4 million for the biggest table).
The machine is a windows server 2008 R2 on a citrix virtual machine with 65Gb memory and SQL Server version is 2014.
Lines are added from the stores to the database via a web service every minute so that customers (the store owners) can view their stats almost in real time.
Christmas is close and the number of sales per day is increasing. It is now something like 100k lines per day.
The monitor says there are about 100-200 queries per second. They are all for statistics and therefore query a lot of data.
Database I/O says about 0.1Mb/s ~ 0.5Mb/s.
CPU goes from 10% to 50%.
Often, the database server stops responding (no more connections possible) for about 30 sec ~ 2 min and I don't know why.
Is there any way I can find out why?
Should I upsize or do something else?
As the data is not relational at all, should I move to a NoSQL solution for better availability?
We use SQL Server and it can handle that much data. The profiler should give you some useful information.
If the data is not relational, NoSQL may be faster. Depending on your needs, the most recent version of MongoDB is worth checking out.
Actually, it was a hardware problem.
Everything is back to normal after changing the hard drive.
I have a task which pulls records from an Oracle db into our SQL Server using a dataflow task. This package runs every day and takes around 45 minutes. It refreshes about 15 tables. All but one are incremental updates, so almost every task runs for 2 to 10 minutes.
The one task which does a full replacement runs for up to 25 minutes. I want to tune this dataflow task to run faster.
There are just 400k rows in the table. I did read some articles about DefaultBufferSize and DefaultBufferMaxRows. I have the questions below.
If I can set DefaultBufferSize up to 100 MB, is there any place to look at or analyse how much I should provide?
DefaultBufferMaxRows is set to 10k. If I set it to 50k and provide 10 MB for DefaultBufferSize, which can only hold some 20k rows, what will SSIS do? Will it just ignore the other 30k records, or will it still pull all 50k records (spooling)?
Can I use Logging options to set proper limits?
As a general practice (and if you have enough memory), a smaller number of large buffers is better than a larger number of small buffers, but only up to the point where you start paging to disk (which is bad for obvious reasons).
To test it, you can log the event BufferSizeTuning, which will show you how many rows are in each buffer.
Also, before you begin adjusting the sizing of the buffers, the most important improvement that you can make is to reduce the size of each row of data by removing unneeded columns and by configuring data types appropriately.
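On the 50k rows vs. 10 MB question: SSIS does not drop rows. The engine works from its estimated row size and reduces the row count per buffer when the rows would not fit in DefaultBufferSize; the remaining rows simply travel in later buffers. Here is a simplified model of that sizing in Python. It assumes a fixed row size of 512 bytes (picked so that a 10 MB buffer holds about 20k rows, as in the question) and ignores the engine's internal minimum buffer size. The function names are mine, not SSIS APIs.

```python
def rows_per_buffer(row_bytes: int, max_rows: int, buffer_bytes: int) -> int:
    """Rows per buffer: capped by both the row limit and the byte limit."""
    return max(1, min(max_rows, buffer_bytes // row_bytes))


def buffers_needed(total_rows: int, per_buffer: int) -> int:
    """Number of buffers required to move total_rows through the data flow."""
    return -(-total_rows // per_buffer)  # ceiling division


ten_mb = 10 * 1024 * 1024
per_buffer = rows_per_buffer(row_bytes=512, max_rows=50_000, buffer_bytes=ten_mb)
print(per_buffer)                           # 20480
print(buffers_needed(400_000, per_buffer))  # 20
```

So in this model the 50k setting is effectively capped at about 20k rows per buffer, and all 400k rows still flow through. Shrinking the row (fewer columns, tighter data types) raises the rows per buffer without touching either setting.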
How can I configure the maximum memory that a query (select query) can use in sql server 2008?
I know there is a way to set the minimum value but how about the max value? I would like to use this because I have many processes in parallel. I know about the MAXDOP option but this is for processors.
Update:
What I am actually trying to do is run a data load continuously. This data load is in ETL form (extract, transform and load). While the data is being loaded I want to run some queries (selects). All of them are expensive queries (containing group by). The most important process for me is the data load. I obtained an average speed of 10000 rows/sec, and when I run the queries in parallel it drops to 4000 rows/sec or even lower. I know that more details should be provided, but this is a complex product that I work on and I cannot go into more detail. Another thing that I can guarantee is that my load speed does not drop due to locking problems, because I monitored for them and removed them.
There isn't any way of setting a maximum memory at a per query level that I can think of.
If you are on Enterprise Edition you can use resource governor to set a maximum amount of memory that a particular workload group can consume which might help.
In SQL 2008 you can use resource governor to achieve this. There you can set request_max_memory_grant_percent to limit the memory (this is a percentage relative to the pool size specified by the pool's max_memory_percent value). This setting is not query specific, it is session specific.
In addition to Martin's answer
If your queries are all the same or similar, working on the same data, then they will be sharing memory anyway.
Example:
A busy web site with 100 concurrent connections running 6 different parametrised queries between them on broadly the same range of data.
6 execution plans
100 user contexts
one buffer pool with assorted flags and counters to show usage of each data page
If you have 100 different queries or they are not parametrised then fix the code.
Memory per query is something I've never thought or cared about since last millennium.