Best database for reading millions of records on a secondary index - Aerospike

Currently I am using an Aerospike database to store about 27 million records on EBS, and when I scan this data on a secondary index, throughput is very low. Could somebody please suggest possible alternatives? Thanks in advance.

Posting the comment as an answer:
Performance on EBS will be slow, and IOPS will limit the read TPS; on an EBS-only store, read TPS will be the same as the available IOPS. Use bcache or set data-in-memory to true.
http://www.aerospike.com/docs/operations/plan/ssd/bcache/
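For reference, a minimal sketch of the relevant namespace settings in aerospike.conf, assuming a namespace named test stored on a single EBS volume (the device path and sizes are placeholders, not taken from the question):

    namespace test {
        replication-factor 2
        memory-size 8G                  # RAM for the index (and the data copy, with data-in-memory)
        default-ttl 0

        storage-engine device {
            device /dev/xvdb            # placeholder EBS device path
            write-block-size 128K
            data-in-memory true         # keep a full copy of the data in RAM so reads avoid EBS IOPS
        }
    }

With data-in-memory true, secondary-index scans are served from RAM instead of EBS, at the cost of enough memory to hold the whole namespace.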

Related

Redis performance if the number of keys is increasing

We have a Redis database that currently contains more than 10 million keys in our production environment, and per our prediction it might grow to 100-200 million keys.
Will this impact my read/write time to Redis?
I think an increasing key count will not affect Redis's benchmark performance, but the read/write rate is limited by your resources, and you can't expect Redis to respond beyond what those resources allow. So if you try to read/write more than that, it may result in delays, lost connections, and so on.
My suggestion is to use Redis Cluster (with multiple servers) to increase the read/write rate.
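As a rough illustration of that suggestion, a minimal sketch using the redis-py client against a Redis Cluster (the host name and key names are placeholders):

    # pip install redis   (redis-py 4.x includes cluster support)
    from redis.cluster import RedisCluster

    # Connect to any node; the client discovers the rest of the cluster topology.
    rc = RedisCluster(host="redis-node-1.example.internal", port=6379, decode_responses=True)

    # Keys are hashed into slots and spread across the master nodes,
    # so the read/write load is shared instead of hitting a single server.
    rc.set("user:1001:last_seen", "2016-01-01T00:00:00Z")
    print(rc.get("user:1001:last_seen"))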

Can Redis save 30 TB of data?

Redis is a good solution for my work, but the problem is that Redis needs a lot of memory to store data, and my data is too big. Is there some solution that lets me store such big data? Can Redis compress the data to save space?
Thanks!
The answer to your question is: YES. As chris pointed out, you would need a system with 30 TB of RAM or a distributed system with a total of 30 TB of RAM.
BUT
The effective way to use Redis is as a cache store; otherwise, you could use any NoSQL database built for big data like yours. Why would you need 30 TB of data in a cache? Instead, you can try one of the following approaches:
Store the actual content/data on disk and store only an index to the data in Redis; you can build secondary indexes with Redis data structures.
Encode/compress your data using any lossless compression technique and then store it in Redis; Redis will not compress data for you (see the sketch after this list).
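For the compression approach, a minimal sketch of client-side compression before writing to Redis (the key name and the zlib/json choice are just illustrative; any lossless codec works):

    import json
    import zlib

    import redis  # pip install redis

    r = redis.Redis(host="localhost", port=6379)

    def put_compressed(key, obj):
        # Serialize and compress on the client; Redis just stores the raw bytes.
        r.set(key, zlib.compress(json.dumps(obj).encode("utf-8")))

    def get_compressed(key):
        raw = r.get(key)
        return None if raw is None else json.loads(zlib.decompress(raw).decode("utf-8"))

    put_compressed("doc:42", {"title": "example", "body": "lorem ipsum " * 1000})
    print(get_compressed("doc:42")["title"])

How much this saves depends entirely on how compressible your data is; highly repetitive text compresses well, already-compressed media does not.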

Google BigQuery basic questions

These may be a few basic questions.
When I load data into BQ tables, where exactly is the data stored (if billing is already enabled)? If it is a data center, what is the data center's capacity? Does our data co-exist with other users' data?
When we fire queries, how are they processed? What is the default compute engine used for this?
How can we increase query processing capacity?
Thanks
CP
BigQuery datacenter capacity is practically unlimited. If you plan to upload petabytes in a very short time frame, you might need to contact support first just to make sure, but for normal large loads everything should be fine.
BigQuery doesn't use Compute Engine, but rather a series of very large clusters where all queries run. That's the secret to its low cost per query, without ongoing per-hour costs like other alternatives.
BigQuery elastically increases the number of CPUs involved in your query as the query needs them. You don't need to manage storage or processing capacity.
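To illustrate the point that there is no capacity to manage, a minimal sketch with the google-cloud-bigquery Python client (the project id is a placeholder; the query uses a public sample dataset):

    # pip install google-cloud-bigquery
    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")  # placeholder project id

    # You only submit SQL; BigQuery decides how many workers to throw at it.
    query_job = client.query(
        "SELECT word, SUM(word_count) AS total "
        "FROM `bigquery-public-data.samples.shakespeare` "
        "GROUP BY word ORDER BY total DESC LIMIT 10"
    )

    for row in query_job.result():  # blocks until the job finishes
        print(row.word, row.total)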

Can compression help the performance of SSIS or SQL import/export?

I have 20-30 million records in a table, and I want to export these records to another server using SSIS or SQL's Import/Export Wizard, but the speed is very low, around 9,000 records per 5 seconds. I altered the maximum buffer size, and this is the best I can get. I feel that if the data were compressed on the source server, it would help the transfer rate. Please comment! Any help is appreciated.
Thanks
This is more or less unknowable for us. Specifically, we don't know what the bottleneck is for your export, so compressing the data may or may not help. Now, my feeling is that your bottleneck is likely something other than how quickly SQL Server can get the data off of disk, but it might be. For your transfer, do a wait analysis from within SQL Server (perhaps with an extended event session) and see what it's waiting on.
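A rough sketch of what that wait analysis could look like from Python with pyodbc, comparing the top server-wide waits before and during the transfer (the connection string is a placeholder; an extended event session would give per-session detail):

    import pyodbc  # pip install pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=source-server;DATABASE=master;Trusted_Connection=yes;"  # placeholder
    )

    def top_waits(cursor):
        # Server-wide cumulative waits; snapshot before and during the export and diff them.
        cursor.execute(
            "SELECT TOP 10 wait_type, wait_time_ms, waiting_tasks_count "
            "FROM sys.dm_os_wait_stats ORDER BY wait_time_ms DESC"
        )
        return cursor.fetchall()

    for wait_type, wait_ms, tasks in top_waits(conn.cursor()):
        print(wait_type, wait_ms, tasks)

If the diff is dominated by ASYNC_NETWORK_IO rather than PAGEIOLATCH_* waits, SQL Server is waiting on the consumer/network rather than on reading from disk, which matches the answer's suspicion.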

What is "Excessive resource usage" in SQL Azure?

I searched online for a while about what "excessive resource usage" means on SQL Azure, but still can't get a clear idea.
Some articles suggest that queries taking too long, using too much memory, etc. will cause "excessive resource usage". But if I use simple queries and a simple data structure, what will happen?
For example, say I get a 1 GB SQL Azure database for session state. Since each session is a very small string that is saved and deleted all the time, I don't think it will grow to 1 GB even with millions of simultaneous sessions. You can calculate it: 1 million sessions at 20 characters each take only about 20 MB, and with a 20-minute expiry it cannot come close to 1 GB. But there will be lots and lots of queries, each very simple and fast via an index.
I want to know whether this kind of use would be considered "excessive resource usage". Is there any hard number that limits your usage?
By the way, in the example above, if everything happens in the same datacenter, the only cost is the 1 GB database at $10 a month, right?
Unfortunately the answer is 'it depends'. Probably the best reference (with guidance) on the SQL Azure query throttle is here: TechNet Article on SQL Azure Performance. It provides details about the metrics that are monitored and the mechanism of the throttle.
The reason I say it depends is that the throttle is non-deterministic for any given user, because it is activated based on the total load on the node (the physical SQL Server in the Azure datacenter). While the subscribers who get throttled are those delivering the greatest load, the level at which the throttle kicks in depends on the total load on the node. So if you are on a quiet node (where other tenant DBs are relatively inactive), you will be able to push through much more throughput than if you are on a busy node.
It is very appealing to use 1 GB SQL Azure DBs for session state storage; you've identified the cost benefits. You are taking a risk, though. One way to mitigate this risk is to partition across at least two 1 GB SQL Azure DBs and adjust the load yourself based on whether one of the DBs starts hitting the throttle; a rough sketch of that partitioning follows below.
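A minimal sketch of that partitioning idea in Python, hashing the session id to pick one of the two databases (the connection strings and the dbo.SessionState table are placeholders made up for illustration):

    import hashlib
    import pyodbc  # pip install pyodbc

    # Placeholder connection strings for the two 1 GB SQL Azure session DBs.
    SHARDS = [
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=shard0.database.windows.net;DATABASE=sessions0;UID=app;PWD=secret",
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=shard1.database.windows.net;DATABASE=sessions1;UID=app;PWD=secret",
    ]

    def shard_for(session_id):
        # Stable hash so a given session always maps to the same database.
        digest = hashlib.md5(session_id.encode("utf-8")).hexdigest()
        return SHARDS[int(digest, 16) % len(SHARDS)]

    def save_session(session_id, payload):
        conn = pyodbc.connect(shard_for(session_id))
        try:
            # Assumes the row already exists; a MERGE/upsert would handle new sessions.
            conn.execute(
                "UPDATE dbo.SessionState SET Payload = ? WHERE SessionId = ?",
                payload, session_id,
            )
            conn.commit()
        finally:
            conn.close()

If one shard starts hitting the throttle you can skew the mapping toward the quieter database, since the hash-based split is entirely under your control.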
Another option, if you want deterministic throughput, is to use the Windows Azure Cache to back your session state store. The Cache has hard pre-defined limits for query throughput, so you can plan for it more easily: Azure Caching FAQ including Limits. The Cache approach is probably a bit more expensive, but with a lower risk of problems.