I am planning to use HSQL as an in-memory datastore (only in-memory, no disk backup). I will then take a periodic backup of HSQL every x minutes (e.g. 15 minutes) so that I can restore the data in case the box goes down for some reason.
A few doubts:
1) Is HSQL good for storing large amounts of data (e.g. 15 GB)?
2) Will search be fast? I guess yes, since it is in-memory.
3) Any other concerns?
4) Have you used HSQL for such a purpose?
5) Any other open-source option which supports SQL-like queries? I know MemSQL, but it's not open source.
Yes, I have used HSQLDB as an in-memory database. I run a stand-alone Java application with a 20 GB heap and HSQLDB, and I have faced no problems. The application processes nearly 1 million records every day and holds around 12 GB of data.
I do not see major concerns with HSQLDB, but concurrent access and locking are something which has to be handled properly.
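For reference, here is a minimal sketch of that kind of setup, assuming HSQLDB's mem: catalog and its SCRIPT statement for dumping the contents to a SQL file; the table, file path and 15-minute schedule are only illustrative.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    // Minimal sketch: an in-memory HSQLDB catalog plus a dump to a SQL script
    // file; run the dump from a scheduled task (e.g. every 15 minutes) and
    // replay the script after a restart to restore the data.
    public class InMemoryHsqldbSketch {
        public static void main(String[] args) throws Exception {
            // "mem:" catalogs live entirely in the JVM heap; nothing touches disk
            try (Connection conn = DriverManager.getConnection("jdbc:hsqldb:mem:mydb", "SA", "");
                 Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE events (id BIGINT PRIMARY KEY, payload VARCHAR(1024))");
                st.execute("INSERT INTO events VALUES (1, 'hello')");

                // SCRIPT writes the DDL plus the data of MEMORY tables to a file
                // (the target directory must already exist).
                st.execute("SCRIPT '/tmp/backup/mydb-dump.sql'");
            }
            // To restore after a crash, execute the statements in mydb-dump.sql
            // against a fresh jdbc:hsqldb:mem:mydb connection.
        }
    }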
There are many open-source in-memory databases:
Derby (Java DB), H2, HSQLDB.
You can also use MySQL, PostgreSQL, or Oracle Coherence (recently released).
Here is a nice article which may help you with your decision.
What I think:
I used HSQLDB intensively 2 years ago in in-memory mode, though not with very large amounts of data (2-3 GB), and it performed well enough. However, it was said not to be the best solution for production usage.
HSQLDB doesn't support full-text search - use H2 for that. H2 also has built-in support for clustering and replication, according to their comparison.
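If full-text search matters, here is a rough sketch of what it looks like with H2's built-in full-text helper (org.h2.fulltext.FullText) on an in-memory database; the DOCS table and search term are just examples, so treat the exact calls as an assumption to verify against the H2 docs.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import org.h2.fulltext.FullText;

    // Sketch: H2 in-memory database with native full-text search.
    public class H2FullTextSketch {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:test", "sa", "")) {
                try (Statement st = conn.createStatement()) {
                    st.execute("CREATE TABLE DOCS (ID INT PRIMARY KEY, BODY VARCHAR(4000))");
                    st.execute("INSERT INTO DOCS VALUES (1, 'in-memory databases with SQL support')");
                }
                FullText.init(conn);                                // installs the FT_* functions
                FullText.createIndex(conn, "PUBLIC", "DOCS", null); // null = index all string columns
                try (ResultSet rs = FullText.search(conn, "memory", 0, 0)) {
                    while (rs.next()) {
                        System.out.println(rs.getString("QUERY"));  // locator of each matching row
                    }
                }
            }
        }
    }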
If you don't need ACID in your application and feel that key-value storage is enough, I certainly recommend using Redis. It is designed to perform well on in-memory data and can certainly handle millions of rows.
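For the key-value route, a minimal sketch with the Jedis client, assuming a Redis server is already running locally on the default port 6379:

    import redis.clients.jedis.Jedis;

    // Sketch: Redis as an in-memory key-value store via the Jedis client.
    public class RedisSketch {
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                jedis.set("user:42:name", "alice");       // plain key-value write
                String name = jedis.get("user:42:name");  // read it back
                System.out.println(name);
            }
        }
    }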
Related
I have been given a task to improve the performance of an Oracle SQL database / web service. The web service needs billions of records from the Oracle SQL database and has to load all of them on each startup. The data is mostly read-only and very rarely needs to be updated or written.
It is a very old codebase, which is why the solution loads all the data into memory to increase performance, and that is what is slowing down development: the first launch takes 30+ minutes, and if the in-memory cached data becomes corrupted for some reason, I have to reload it from the database, which means another 30+ minutes of waiting.
My task is to improve this process. I have the flexibility to change the SQL database to something else that could help speed it up. Do you have any suggestions? Thanks in advance!
You can try MySQL. As far as I know, MySQL has no limit on the size of the database. I've attached a comparison between MySQL and Oracle that you can look at: Comparison
I am working on a solution architecture and am having a hard time choosing between Azure SQL DB and SQL DW.
The current scope involves developing a real-time BI reporting solution based on multiple sources, but in the long run the solution may be extended into a full-fledged EDW and marts.
I initially thought of using SQL DW so that the MPP capabilities could be used for the future scope. But when I spoke to a mate who recently used SQL DW, he explained that development in SQL DW is not similar to SQL DB.
I have previously worked on real-time reporting with no scope for an EDW, and we successfully used SQL DB. With SQL DB we can also create facts, dimensions, and marts.
Is there a strong case where I should be choosing SQL DW over SQL DB?
I think the two most important data points you can have here are the volume of data you're processing and the number of concurrent queries that you need to support. When talking about processing large-volume data, and by large I mean more than 3 TB (which is not even really large, but large enough), Azure SQL Data Warehouse becomes a juggernaut. The parallel processing is simply amazing (it's amazing at smaller volumes too, but you're paying a lot of money for overkill). However, the one issue can be the simultaneous query limit. It currently has a limit of 128 concurrent queries with a limit of 1,000 queries queued (read more here). If you're using the Data Warehouse as a data warehouse to process large amounts of data and then feed them into data marts where the majority of the querying takes place, this isn't a big deal. If you're planning to open this to large-volume querying, it quickly becomes problematic.
Answer those two questions, query volume and data volume, and you can more easily decide between the two.
Additional factors can include the T-SQL surface currently supported, which is smaller than in traditional SQL Server. Again, for most purposes around data warehousing, this is not an issue. For a full-blown reporting server, it might be.
Most people successfully implementing Azure SQL Data Warehouse are using a combination of the warehouse for processing and storage and Azure SQL Database for data marts. There are exceptions when dealing with very large data volumes that need the parallel processing, but don't require lots of queries.
The 4 TB limit of Azure SQL Database may be an important factor to consider when choosing between the two options. Queries can be faster with Azure SQL Data Warehouse since it is an MPP solution. You can pause Azure SQL DW to save costs; with Azure SQL Database you can scale down to the Basic tier (when possible).
Azure SQL DB can support up to 6,400 concurrent queries and 32k active connections, whereas Azure SQL DW can only support up to 32 concurrent queries and 1,024 active connections. So SQL DB is a much better solution if you are using something like a dashboard with thousands of users.
As for development, Azure SQL Database supports Entity Framework, but Azure SQL DW does not.
I also want to give you a quick glimpse of how the two compare in terms of performance: 1 DWU is approximately 7.5 DTU (Database Throughput Unit, used to express the horsepower of an OLTP Azure SQL Database) in capacity, although they are not exactly comparable. More information about this comparison here.
Thanks for your responses, Grant and Alberto. They have cleared up a lot and made the choice easier.
Since the data would be used for dashboarding and querying, I am leaning towards SQL Database instead of SQL DW.
Thanks again.
I was wondering whether, in addition to processing and displaying data on a dashboard in WSO2 CEP, I can store it somewhere for a long period of time to extract further information later. I have read that there are two types of tables used in WSO2 CEP: in-memory and RDBMS tables.
Which one should I choose?
There is one more option, which is to switch to WSO2 DAS. Is that a good approach?
Is the default database fine for that purpose, or should I move towards other supported databases like MySQL, Oracle, etc.?
In-memory or RDBMS?
In-memory tables internally use Java collection structures, so the data is destroyed once the JVM terminates (after a server restart, the data won't be available). RDBMS tables, on the other hand, persist data permanently. For your scenario, I think you should proceed with RDBMS tables.
CEP or DAS?
CEP only provides real-time analytics, whereas DAS provides batch analytics (with Spark SQL) in addition to real-time analytics. If you have a scenario which requires batch processing, incremental processing, etc., you can go ahead with DAS. Note that migration from CEP to DAS is quite simple (since the artifacts are identical).
Default (H2) DB or other DB?
By default, WSO2 products use the embedded H2 DB as the data source. However, it's recommended to use MySQL or Oracle in production environments.
I have a data-intensive project for which I recently wrote the code; the data and stored procedures live in an MS SQL database. My initial estimate is that the DB will grow to 50 TB, after which its growth will become fairly static. The final application will perform lots of row-level lookups and reads, with a very small percentage of writes back to the DB.
With the above scenario in mind, it's being suggested that I should look at a NoSQL option in order to scale to the large load of data and transactions, and after a bit of research the road leads to Neo4j (while considering MongoDB as a second alternative).
I would appreciate your guidance with the following set of initial questions:
-Does Neo4j support the concept of stored procs? And does it support conditional statements (if/then/else, loops, etc.)?
-Would I be able to install and run the 50TB db on a single node (single Windows Server)?
-Does Neo4j support/leverage multiple CPUs in single server (ex: 4 CPUs)?
-Would the open source version be able to support the 50 TB DB, or would I need to purchase the Enterprise version?
Regards,
-r
Currently we have 100+ databases, some around 10 GB in size with millions of records, and they are growing at an alarming rate. We need to evaluate our archiving strategy.
Does anyone have any suggestions and sample scripts that go through all the tables and archive the data into an ARCHIVED database - with everything being audited (in regard to the number of records imported, etc.) and, in case of failure, everything rolled back?
Regards
Partitioning can help a lot when archiving within a single database. The sliding window scenario is a particularly useful tool.
Let me suggest setting up an Admin database. It will hold all the settings and information about archiving.
There may be two SQL Server instances: a Current Server and an Archive Server. They will have the same structure.
A process copies data from the current server to the archive server using the settings from the Admin DB. You may need to write dynamic SQL. Check sp_MSForEachDB.
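To make the copy step concrete, here is a hedged JDBC sketch of archiving one table with an audit record and a rollback on failure; the connection strings, the Orders table, the cutoff date and the AdminDb.dbo.ArchiveAudit table are all assumptions, and in a real setup the table list would come from the Admin DB (or dynamic SQL / sp_MSForEachDB) rather than being hard-coded.

    import java.sql.*;

    // Sketch: copy rows older than a cutoff from the current server to the
    // archive server inside a transaction, write an audit row with the count,
    // and roll back the archive side if anything fails.
    public class ArchiveSketch {
        public static void main(String[] args) throws SQLException {
            try (Connection src = DriverManager.getConnection(
                     "jdbc:sqlserver://current-server;databaseName=Sales;user=archiver;password=...");
                 Connection dst = DriverManager.getConnection(
                     "jdbc:sqlserver://archive-server;databaseName=SalesArchive;user=archiver;password=...")) {

                dst.setAutoCommit(false);   // all-or-nothing on the archive side
                try {
                    int copied = 0;
                    try (PreparedStatement read = src.prepareStatement(
                             "SELECT OrderId, CustomerId, CreatedAt FROM Orders WHERE CreatedAt < ?");
                         PreparedStatement write = dst.prepareStatement(
                             "INSERT INTO Orders (OrderId, CustomerId, CreatedAt) VALUES (?, ?, ?)")) {
                        read.setDate(1, Date.valueOf("2015-01-01"));   // assumed cutoff
                        try (ResultSet rs = read.executeQuery()) {
                            while (rs.next()) {
                                write.setLong(1, rs.getLong(1));
                                write.setLong(2, rs.getLong(2));
                                write.setTimestamp(3, rs.getTimestamp(3));
                                write.addBatch();
                                copied++;
                            }
                        }
                        write.executeBatch();
                    }

                    // audit row (assumed table) recording how many rows were archived
                    try (PreparedStatement audit = dst.prepareStatement(
                             "INSERT INTO AdminDb.dbo.ArchiveAudit (TableName, RowsCopied, RunAt) VALUES (?, ?, ?)")) {
                        audit.setString(1, "Orders");
                        audit.setInt(2, copied);
                        audit.setTimestamp(3, new Timestamp(System.currentTimeMillis()));
                        audit.executeUpdate();
                    }

                    dst.commit();
                    // deleting the archived rows from the source would be a separate,
                    // equally audited step once the copy is confirmed
                } catch (SQLException e) {
                    dst.rollback();   // failure: leave the archive untouched
                    throw e;
                }
            }
        }
    }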
+1 to the partitioning idea. To add: I think you can also use it if you have Developer edition.