Which is the best database for accessing big time-based data?

We need some help choosing the best database for the following upcoming situation:
Write:
We'll receive a huge amount of data every minute (roughly one million entries) and need to save it in a database. One of the unique identifiers of the entries is a timestamp.
Read:
The software should be able to load the stored data for a user-defined time range. The loading process should be as fast as possible.
Currently we are using Microsoft SQL Server in our application. I'm not sure whether it is the right technology for our new requirements.
Which database should we use? Do we need to replace MSSQL entirely, or can another database run alongside it?
Thank you!

Related

What is an efficient way to manage a VB form which handles huge data (much larger than 2 GB) with Access 2013 as the database?

I am currently designing a Windows Form using VB.NET. The internet states that 2 GB is the limit for a .accdb file. However, I am required to handle data a lot larger than 2 GB. What is the best way to implement this? Is there any way I could regularly store data to some other Access DB and empty my main database? (But would this create trouble in migrating data from the .accdb to the Windows Form when demanded by the user?)
Edit: I read somewhere that splitting could help, but I don't see how; it only creates a copy of the database on your local machine on the network.
You can use linked tables to Microsoft SQL Server 2012 Express edition, where the maximum relational database size is 10 GB.
You can also use MySQL linked tables, which have a 2 TB limit.
It's not easy to give a generic answer without further details.
My first recommendation would be to change DBMS and use SQLite, which supports databases of up to roughly 140 TB.
If you must use Access then you will need a master database containing pointers to the real location of the data.
E.g.
MasterDB -> LocationTable -> (id, database_location)
So if you need a resource you will have to query the master with the id to get its actual location and then connect to the secondary and fetch the data.
Or you could have a mapping model where a certain range of IDs lives in a certain database, so you can keep the logic in code and hit the DB only once.
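For the master-lookup variant, the plumbing could look roughly like this in Access SQL (a sketch only; the table names, columns, and file paths are all invented):

    -- Hypothetical master lookup table in MasterDB
    CREATE TABLE LocationTable (
        id                INTEGER CONSTRAINT pk_location PRIMARY KEY,
        database_location TEXT(255)
    );

    -- Step 1: ask the master where the resource lives
    SELECT database_location FROM LocationTable WHERE id = 12345;

    -- Step 2: query the secondary database directly via Access's IN clause
    SELECT * FROM Resources IN 'C:\data\archive_03.accdb' WHERE id = 12345;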
Use SQL Server Express. It's free.
https://www.microsoft.com/en-us/cloud-platform/sql-server-editions-express
Or, if you don't want to use that, you'll need to split your data into different Access databases, and link to what you need. Do a Google search on this and you'll have everything you need to get going.
I agree with the other posts about switching to a more robust database system, but if you really do have to stay in Access, then yes, it can be done with linked tables.
You can have a master database with queries that use linked tables in multiple databases, each of which can be up to 2 GB. If a particular table needs to have more than the Access limit, then put part of the data in one database and part in another. A UNION query will allow you to return both tables as a single dataset.
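For example (a sketch; Orders_A and Orders_B stand in for the two linked tables that each hold part of the data):

    SELECT * FROM Orders_A
    UNION ALL
    SELECT * FROM Orders_B;

UNION ALL is preferable to a plain UNION here: the two halves are disjoint by construction, so the duplicate-removal step that UNION performs is wasted work.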
Reads and updates are one thing, but there is the not-so-trivial task of managing growth if you need to do inserts. You'll need to know when a database file is about to grow beyond 2 GB and create a new one whose tables must then be linked to your master database.
It won't be pretty.

Is it possible to update a clone database only with changes in the source database?

For reasons I'm not about to explain, we keep an Access database that is meant to be a copy of a subset of a larger Oracle database. It is not feasible to refer to the data directly in the Oracle database due to speed issues (don't ask).
Every time a specific application is opened, the local Access database is updated with the newest data found up to the time the application was opened. First of all, this does not capture changes to existing records. Secondly, it does not take into account changes made in the source database after the application was opened.
For this reason several checks may be needed when carrying out certain operations in the application. So is it possible to update the local Access database only with the changes in the Oracle database, in a smarter and faster way than the hard way I am imagining (I'm not a PL/SQL / SQL expert)? Possibly it might be sufficient to look for changes only after a certain date (stored in one of the fields of the recordset retrieved).
Any suggestions?
You might want to look into data replication between Oracle and MS Access databases, for example through an ODBC driver or a SQL Server database. Just google "ms access oracle replication" and see if this solves your problem.
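If you end up rolling it yourself, the date-based idea from the question is workable. A sketch (assuming the Oracle table has a LAST_MODIFIED column, the rows are pulled into a local StagingOrders table first, and the application stores the timestamp of the last successful sync; all names are invented):

    -- Pass-through query against Oracle: fetch only rows changed since the last sync
    SELECT *
    FROM   ORDERS
    WHERE  LAST_MODIFIED > TO_DATE('2009-06-01 12:00:00', 'YYYY-MM-DD HH24:MI:SS');

    -- On the Access side, replace the stale copies of those rows
    DELETE FROM LocalOrders
    WHERE  OrderID IN (SELECT OrderID FROM StagingOrders);

    INSERT INTO LocalOrders SELECT * FROM StagingOrders;

This only works if the Oracle side reliably maintains LAST_MODIFIED (e.g., via a trigger), and it will not detect deletions.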

Methods of maintaining sample data in a database

Firstly, let me apologize for the title, as it probably isn't as clear as I think it is.
What I'm looking for is a way to keep sample data in a database (SQL Server 2005, 2008, and Express) that gets modified every so often. At present I have a handful of scripts to populate the database with a specific set of data, but every time the database is changed, all the scripts have to be more or less rewritten, and I was looking for some alternatives.
I've seen a number of tools and other software for creating sample data in a database, some free and some not. Are there any other methods I haven’t considered?
Thanks in advance for any input.
Edit: Also, if anyone has any advice at all in dealing with keeping data in sync with a changing application or database, that would be of some help as well.
If you are looking for tools for SQL Server, go visit Red Gate Software; they have the best tools. They have a data compare tool that you can use to keep lookup-type tables up to date, and a SQL compare tool that you can use to keep the tables synced up between two databases. So, using SQL Data Compare, create a database with all the sample data you want. Then periodically refresh your testing DB (or your prod DB, if these are strictly lookup-type tables) using the compare tool.
I also like the alternative of having a script (you can use Red Gate's tool to create scripts) because that means you can store this info in your source control and use it as part of a deployment package to other servers.
You could save them in another database, or in the same DB in different tables distinguished by name, like employee_test.
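For instance (a trivial sketch; employee stands in for the real table):

    -- Snapshot the current sample rows into a parallel _test table
    SELECT *
    INTO   employee_test
    FROM   employee;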
Joseph,
Do you need to keep just the data in sync, or the schema as well?
One solution to the data question would be SQL Server snapshots. You create a snapshot of your initial configuration, so any changes to the "real" database don't show up in the snapshot. Then, when you need to reset the table, select from the snapshot into a new table. I'm not sure how it will work if the schema changes, but it might be worth a try.
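A rough sketch of that approach (the paths and names are invented; note that database snapshots in SQL Server 2005/2008 require Enterprise or Developer edition, and the logical name in the ON clause must match the source database's data file):

    -- Freeze the initial state of the SampleData database
    CREATE DATABASE SampleData_Initial
    ON (NAME = SampleData, FILENAME = 'C:\Snapshots\SampleData_Initial.ss')
    AS SNAPSHOT OF SampleData;

    -- Later, rebuild a table from the frozen copy
    SELECT *
    INTO   SampleData.dbo.Customers_Reset
    FROM   SampleData_Initial.dbo.Customers;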
For generation of sample data, the Database project in Visual Studio has functionality that will create fake/random data.
Let me know if this makes sense.
Erick

How to restore a database from different computers into one

I have 3 computers running the same SQL Server 2005 database. I would like to gather the data from those 3 computers onto another computer which has the same database. Please help me.
This is called "data conversion", and a lot of your work will be determining uniqueness in each one of them and coming up with strategies to prevent collisions, mainly for primary keys that are likely the same across these databases. There is no simple answer here; it can be a project in itself.
It might be difficult without some manual data transformation. It depends on your database and the type of the data. For example, what do you use as keys? If you have sequential integers as primary/foreign keys, then you will have to do some manual data transformation. If you use GUIDs, it gets slightly easier, but you still have to ensure that, for example, lookup tables don't end up with different GUID keys for the same items, etc. But there is no tool for doing this automatically.
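To illustrate the sequential-integer case, one common trick is to shift each source's keys into a disjoint range (a sketch; all server, table, and column names are invented):

    -- Machine 2's keys get the 1,000,000+ range in the merged database
    INSERT INTO Target.dbo.Customers (CustomerID, Name)
    SELECT CustomerID + 1000000, Name
    FROM   Machine2Db.dbo.Customers;

    -- Foreign keys must be shifted by the same offset to stay consistent
    INSERT INTO Target.dbo.Orders (OrderID, CustomerID, OrderDate)
    SELECT OrderID + 1000000, CustomerID + 1000000, OrderDate
    FROM   Machine2Db.dbo.Orders;

(If the target key columns are IDENTITY columns, SET IDENTITY_INSERT must be ON while loading.)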
Maybe if you have some very simple data without any relations to other tables (like a table with one column of text messages, etc.) you can script the data with SQL Server Database Publishing Wizard, and then execute the scripts against your target database.
You need to back up your databases by right-clicking in Enterprise Manager and choosing Backup, then choosing the location, etc.
After backing up, you can then restore to your local SQL Server by right-clicking and choosing Restore.
After you have the data locally, you will need to write queries to transfer the data to your local database.
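The same steps can be scripted; a minimal sketch (paths, database names, and logical file names are invented):

    -- On each source machine
    BACKUP DATABASE MyAppDb
    TO DISK = 'C:\Backups\MyAppDb_machine1.bak';

    -- On the target machine: restore each backup under its own name
    RESTORE DATABASE MyAppDb_Machine1
    FROM DISK = 'C:\Backups\MyAppDb_machine1.bak'
    WITH MOVE 'MyAppDb'     TO 'C:\Data\MyAppDb_Machine1.mdf',
         MOVE 'MyAppDb_log' TO 'C:\Data\MyAppDb_Machine1.ldf';

    -- Then move rows into the consolidated database
    INSERT INTO MyAppDb.dbo.Customers (Name, City)
    SELECT Name, City
    FROM   MyAppDb_Machine1.dbo.Customers;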
Alternatively you can use something like Red Gate's SQL Data Compare to compare and transfer data using a visual interface, although this costs money.
Red Gate's SQL Toolbelt may be able to help you. You would first copy one database to that other computer and then compare it with SQL Data Compare against the 3 databases, always copying data only one way (into your new database). However, I am not 100% sure it will work the way I think it would; you would have to verify that yourself.
As other people suggested, some things like primary keys, etc., may be problematic.

How do I keep a table synchronized with a query in SQL Server - ETL?

I wasn't sure how to word this question, so I'll try to explain. I have a third-party database on SQL Server 2005. I have another server running SQL Server 2008, to which I want to "publish" some of the data from the third-party database. That database I shall then use as the back end for a portal and Reporting Services; it shall be the data warehouse.
On the destination server I want to store the data in different table structures from those in the third-party DB. Some tables I want to denormalize, and there are lots of columns that aren't necessary. I'll also need to add additional fields to some of the tables, which I'll need to populate based on data stored in the same rows. For example, there are varchar fields that contain info I'll want to use to populate other columns. All of this should cleanse the data and make it easier to report on.
I can write the queries to get all the info I want into a particular destination table. However, I want to be able to keep it up to date with the source on the other server. It doesn't have to be updated immediately (although that would be good), but I'd like it to be updated perhaps every 10 minutes. There are hundreds of thousands of rows of data, but the changes to the data and the addition of new rows, etc., aren't huge.
I've had a look around, but I'm still not sure of the best way to achieve this. As far as I can tell, replication won't do what I need. I could manually write the T-SQL to do the updates, perhaps using the MERGE statement (sketched below), and then schedule it as a job with SQL Server Agent. I've also been having a look at SSIS, and that looks to be geared at the ETL kind of thing.
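Roughly what I have in mind with MERGE (a sketch; all table and column names are made up, and the source is reached over a linked server):

    -- Upsert changed rows from the source server into a denormalized warehouse table
    MERGE dw.dbo.CustomerFlat AS target
    USING (
        SELECT c.CustomerID, c.Name, a.City    -- City denormalized from a second table
        FROM   SourceServer.ThirdPartyDb.dbo.Customer c
        JOIN   SourceServer.ThirdPartyDb.dbo.Address  a ON a.CustomerID = c.CustomerID
    ) AS source
    ON target.CustomerID = source.CustomerID
    WHEN MATCHED AND (target.Name <> source.Name OR target.City <> source.City) THEN
        UPDATE SET Name = source.Name, City = source.City
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (CustomerID, Name, City)
        VALUES (source.CustomerID, source.Name, source.City);

(MERGE requires the target table to be local, which fits here since the warehouse is the 2008 server.)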
I'm just not sure what to use to achieve this, and I was hoping to get some advice on how one should go about doing this kind of thing. Any suggestions would be greatly appreciated.
For those tables whose schemas/relations are not changing, I would still strongly recommend replication.
For the tables whose data and/or relations are changing significantly, I would recommend that you develop a Service Broker implementation to handle it. The high-level approach with Service Broker (SB) is:
Table-->Trigger-->SB.Service >====> SB.Queue-->StoredProc(activated)-->Table(s)
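A bare-bones sketch of that pipeline (all object names are invented, and the XML shredding and error handling are left out):

    CREATE MESSAGE TYPE [//dw/RowChanged] VALIDATION = WELL_FORMED_XML;
    CREATE CONTRACT [//dw/RowChangedContract] ([//dw/RowChanged] SENT BY INITIATOR);
    CREATE QUEUE dbo.DwQueue;
    CREATE SERVICE [//dw/DwService] ON QUEUE dbo.DwQueue ([//dw/RowChangedContract]);
    GO

    -- Trigger on the watched table ships each change as an XML message
    CREATE TRIGGER trg_Customer_Changed ON dbo.Customer
    AFTER INSERT, UPDATE
    AS
    BEGIN
        DECLARE @h UNIQUEIDENTIFIER;
        DECLARE @msg XML;
        SET @msg = (SELECT * FROM inserted FOR XML AUTO, ELEMENTS);

        BEGIN DIALOG CONVERSATION @h
            FROM SERVICE [//dw/DwService]
            TO SERVICE   '//dw/DwService'
            ON CONTRACT  [//dw/RowChangedContract]
            WITH ENCRYPTION = OFF;

        SEND ON CONVERSATION @h MESSAGE TYPE [//dw/RowChanged] (@msg);
    END;
    GO

    -- Activated procedure drains the queue and applies changes to the target table(s)
    CREATE PROCEDURE dbo.ProcessDwQueue
    AS
    BEGIN
        DECLARE @h UNIQUEIDENTIFIER;
        DECLARE @msg XML;

        RECEIVE TOP (1) @h = conversation_handle, @msg = CAST(message_body AS XML)
        FROM dbo.DwQueue;

        IF @msg IS NOT NULL
        BEGIN
            -- shred @msg and upsert into the destination table(s) here
            END CONVERSATION @h;
        END
    END;
    GO

    ALTER QUEUE dbo.DwQueue
    WITH ACTIVATION (STATUS = ON,
                     PROCEDURE_NAME = dbo.ProcessDwQueue,
                     MAX_QUEUE_READERS = 1,
                     EXECUTE AS OWNER);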
I would not recommend SSIS for this, unless you wanted to go to something like daily exports/imports. It's fine for that kind of thing, but IMHO far too kludgey and cumbersome for either continuous or short-period incremental data distribution.
Nick, I have gone the SSIS route myself. I have jobs that run every 15 minutes that are based in SSIS and do exactly what you are trying to do. We have a huge relational database, and we wanted to do complicated reporting on top of it using a product called Tableau. We quickly discovered that our relational model wasn't really so hot for that, so I built a cube over it with SSAS, and that cube is updated and processed every 15 minutes.
Yes, SSIS does give the aura of being mainly for straight ETL jobs, but I have found that it can be used for simple, quick jobs like this as well.
I think staging and partitioning would be too much for your case. I am implementing the same thing in SSIS now, but with a frequency of 1 hour, as I need to allow some time for support activities. I am sure that using SSIS is a good way of doing it.
During the design I also thought of another way to achieve custom replication: customizing the Change Data Capture (CDC) process. That way you can get near-real-time replication, but it is a tricky thing.
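For reference, the CDC route looks roughly like this (a sketch; CDC is available in SQL Server 2008 Enterprise/Developer editions, and the table and capture-instance names are invented):

    -- Enable CDC on the source database and on the watched table
    EXEC sys.sp_cdc_enable_db;
    EXEC sys.sp_cdc_enable_table
         @source_schema = N'dbo',
         @source_name   = N'Customer',
         @role_name     = NULL;

    -- Periodically pull only the changes since the last processed LSN
    DECLARE @from_lsn binary(10), @to_lsn binary(10);
    SET @to_lsn   = sys.fn_cdc_get_max_lsn();
    SET @from_lsn = sys.fn_cdc_get_min_lsn('dbo_Customer');  -- in practice: the last LSN you persisted

    SELECT *
    FROM   cdc.fn_cdc_get_all_changes_dbo_Customer(@from_lsn, @to_lsn, N'all');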