I have a program that runs continuously and saves data to a SQLite database every second.
I want to change this and use a memory database that saves to disk every 15 minutes instead.
My program is written in Java and I use the library SQLite4Java. It works well and I have a 'Job Queue' that handles the inserts with an ExecutorService in Java.
I have found out that the first time I write to file I can use 'SQLiteConnection.initializeBackup(File)' and after that DROP & CREATE the tables in memory (and maybe use VACUUM) to clear memory. This works well, but later it gets tricky... How do I append/merge the new data from the memory DB into the existing file after another 15 minutes?
Is there a standard approach for this? Would it be good practice to simply fetch all data from each table in the memory DB (and lock it to prevent further inserts), loop through it, and insert it into the file DB?
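One fairly standard approach, instead of looping over rows in Java, is to ATTACH the on-disk database to the in-memory connection and copy each table across with a single INSERT ... SELECT inside one transaction. A rough sketch (the file path and table name are placeholders; you would run these statements through SQLiteConnection.exec() while your job queue is paused):

ATTACH DATABASE '/path/to/data.db' AS disk;
BEGIN;
INSERT INTO disk.samples SELECT * FROM main.samples;  -- append the new rows to the file DB
DELETE FROM main.samples;                              -- clear the in-memory table (instead of DROP & CREATE)
COMMIT;
DETACH DATABASE disk;

This assumes the file database already contains the same schema, e.g. from your initial initializeBackup() run.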
I am converting a large SQL database (100GB stored in 10 files, with 100 tables per file) to SQLite. Right now, I am using the CodeProject C# utility, as suggested in another thread (convert sql-server *.mdf file into sqlite file). However, this approach is not entirely satisfactory for two reasons:
The conversion process usually stops abruptly when converting one of my files. Then I have to go in and check which tables were converted successfully and which were not.
I could manually convert 10 tables at a time, but this requires 100 repetitions and my constant presence in front of my computer.
Thank you so much for your kind help!
It is possible that a 'transaction log' (in SQLite's case, the rollback journal) is being created. This is the log used to roll back changes if something goes wrong. Since your job is so large, this log file can grow very large and the process will fail.
Try this:
1) Back up the data.
2) Turn off the log with this: PRAGMA journal_mode = OFF;
Caveat: I've never tried this with SQLite, but other databases work in a similar fashion.
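If you do try it on the SQLite side, the relevant settings look like this; run them on the destination connection before the bulk conversion (a sketch only, and note that with these settings a crash mid-import can leave the file corrupted):

PRAGMA journal_mode = OFF;   -- or MEMORY: no rollback journal on disk
PRAGMA synchronous = OFF;    -- don't fsync after every write
PRAGMA temp_store = MEMORY;  -- keep temporary tables and indices in RAM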
I am a C# developer and not really good with SQL, so I have a simple question here. I need to move more than 50 million records from one database to another database. I tried to use the import function in MS SQL, but it got stuck because the log was full (I got the error message: The transaction log for database 'mydatabase' is full due to 'LOG_BACKUP'). The database recovery model was set to simple. My friend said that importing millions of records using Tasks -> Import Data will cause the log to become massive, and told me to use a loop instead to transfer the data. Does anyone know how and why? Thanks in advance.
If you are moving the entire database, use backup and restore; it will be the quickest and easiest.
http://technet.microsoft.com/en-us/library/ms187048.aspx
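A minimal sketch of that route (database name, paths and logical file names are placeholders):

-- on the source server
BACKUP DATABASE MyDatabase TO DISK = 'C:\backup\MyDatabase.bak' WITH INIT;

-- copy the .bak file across, then on the destination server
RESTORE DATABASE MyDatabase FROM DISK = 'C:\backup\MyDatabase.bak'
    WITH MOVE 'MyDatabase'     TO 'D:\data\MyDatabase.mdf',
         MOVE 'MyDatabase_log' TO 'D:\data\MyDatabase_log.ldf';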
If you are just moving a single table, read about and use the BCP command-line tools for this many records:
The bcp utility bulk copies data between an instance of Microsoft SQL Server and a data file in a user-specified format. The bcp utility can be used to import large numbers of new rows into SQL Server tables or to export data out of tables into data files. Except when used with the queryout option, the utility requires no knowledge of Transact-SQL. To import data into a table, you must either use a format file created for that table or understand the structure of the table and the types of data that are valid for its columns.
http://technet.microsoft.com/en-us/library/ms162802.aspx
The fastest and probably most reliable way is to bulk copy the data out via SQL Server's bcp.exe utility. If the schema on the destination database is exactly identical to that on the source database, including nullability of columns, export it in "native format":
http://technet.microsoft.com/en-us/library/ms191232.aspx
http://technet.microsoft.com/en-us/library/ms189941.aspx
If the schema differs between source and target, you will encounter...interesting (yes, interesting is a good word for it) problems.
If the schemas differ or you need to perform any transforms on the data, consider using text format. Or another format (BCP lets you create and use a format file to specify the format of the data for export/import).
You might consider exporting data in chunks: if you encounter problems it gives you an easier time of restarting without losing all the work done so far.
You might also consider zipping the exported data files up to minimize time on the wire.
Then FTP the files over to the destination server.
bcp them in. You can use the bcp utility on the destination server or the BULK INSERT statement in SQL Server to do the work. It makes no real difference.
The nice thing about using BCP to load the data is that the load is what is described as a 'non-logged' transaction, though it's really more like a 'minimally logged' transaction.
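For the BULK INSERT variant on the destination server, it looks roughly like this (table name and file path are placeholders; DATAFILETYPE = 'native' matches a bcp -n export):

BULK INSERT dbo.MyTable
FROM 'D:\import\MyTable.dat'
WITH (DATAFILETYPE = 'native',  -- file exported with bcp -n
      BATCHSIZE = 100000,       -- commit every 100,000 rows
      TABLOCK);                 -- table lock, helps qualify for minimal logging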
If the tables on the destination server have IDENTITY columns, you'll need to use the SET IDENTITY_INSERT statement to allow explicit values in the identity column on the table(s) involved for the nonce (don't forget to turn it off again). After your data is imported, you'll need to run DBCC CHECKIDENT to get things back in sync.
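For reference, the identity handling described above looks something like this (table name is a placeholder):

SET IDENTITY_INSERT dbo.MyTable ON;       -- allow explicit values in the identity column
-- ... load the data ...
SET IDENTITY_INSERT dbo.MyTable OFF;
DBCC CHECKIDENT ('dbo.MyTable', RESEED);  -- reseed the identity from the current maximum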
And depending on what you're doing, it can sometimes be helpful to put the database in single-user mode or dbo-only mode for the duration of the surgery: http://msdn.microsoft.com/en-us/library/bb522682.aspx
Another approach I've used to great effect is to use Perl's DBI/DBD modules (which provide access to the bulk copy interface) and write a Perl script to pull the data out of the source server, transform it and bulk load it directly into the destination server, without having to save it to disk and move it. It also means you can trap errors and design things for recovery and restart right at the point of failure.
Use BCP to migrate data.
Another approach I have used in the past is to take a backup of the transaction log and shrink the log prior to the migration. Split the migration script into parts and run the log backup / shrink / migrate iteration a few times.
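And to answer the original 'use a loop' question: a batched copy keeps each transaction, and therefore the log, small. A rough sketch, assuming the table has an increasing key column Id and that both databases are reachable from one connection (all names are placeholders):

DECLARE @lastId bigint = 0, @batch int = 100000;
WHILE 1 = 1
BEGIN
    INSERT INTO TargetDb.dbo.MyTable (Id, Col1, Col2)
    SELECT TOP (@batch) Id, Col1, Col2
    FROM   SourceDb.dbo.MyTable
    WHERE  Id > @lastId
    ORDER BY Id;

    IF @@ROWCOUNT = 0 BREAK;   -- nothing left to copy

    SELECT @lastId = MAX(Id) FROM TargetDb.dbo.MyTable;
    -- under the FULL recovery model you would BACKUP LOG here to keep it from growing
END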
I need to push a large SQL table from my local instance to SQL Azure. The transfer is a simple, 'clean' upload - simply push the data into a new, empty table.
The table is extremely large (~100 million rows) and consists only of GUIDs and other simple types (no timestamps or anything).
I create an SSIS package using the Data Import / Export Wizard in SSMS. The package works great.
The problem is when the package is run over a slow or intermittent connection. If the internet connection goes down halfway through, then there is no way to 'resume' the transfer.
What is the best approach to engineering an SSIS package to upload this data, in a resumable fashion? i.e. in case of connection failure, or to allow the job to be run only between specific time windows.
Normally, in a situation like that, I'd design the package to enumerate through batches of size N (1k rows, 10M rows, whatever) and log the last successfully transmitted batch to a processing table. However, with GUIDs you can't quite partition them out into buckets.
In this particular case, I would modify your data flow to look like Source -> Lookup -> Destination. In your lookup transformation, query the Azure side and only retrieve the keys (SELECT myGuid FROM myTable). Here, we're only going to be interested in rows that don't have a match in the lookup recordset as those are the ones pending transmission.
A full cache is going to cost about 1.5GB (100M * 16 bytes) of memory, assuming the Azure side was fully populated, plus the associated data transfer costs. That cost will be less than truncating and re-transferring all the data, but I just want to make sure I call it out.
Just order by your GUID when uploading. And make sure you use the max(guid) from Azure as your starting point when recovering from a failure or restart.
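A sketch of that restart logic (table and column names are placeholders): look up the highest GUID that already made it to Azure and start the source query just after it.

-- on the Azure side: last row that arrived
SELECT MAX(RowGuid) FROM dbo.MyTable;

-- source query for the restarted transfer, with the value above as @lastGuid
SELECT RowGuid, Col1, Col2
FROM   dbo.MyTable
WHERE  RowGuid > @lastGuid
ORDER BY RowGuid;

Note that SQL Server compares uniqueidentifier values in an unusual byte order (the last byte group is the most significant), but the ordering is the same on both sides, and consistency is all the restart logic needs.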
I want to save some temporary data in memory, which should be removed after the server shuts down.
There is a temporary table in HSQLDB, but its data is removed immediately after the transaction is committed, which is too short for me. On the other hand, a memory table keeps a script log file and restores the data when the server starts again. It takes time and space to maintain such script logs, which are useless for my situation.
What I need is a type of table where only the table structure is persisted to disk; the data and all data operations should be performed only in memory. Otherwise, why would I need an in-memory DB instead of MySQL?
Is there such type of table in HSQLDB?
thanks
Create the file: database, then create the tables. Perform SHUTDOWN. Edit the .properties file for the database, add the setting below and save.
files_readonly=true
When you perform your tests with this database, no data is written to disk.
Alternatively, with the latest versions of HSQLDB 2.2.x, you can specify this property on the connection URL during the tests. For example:
jdbc:hsqldb:file:myfilepath;files_readonly=true
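A minimal sketch of the whole flow (the path and table are placeholders): create the schema once against the file: database, shut it down, then always connect with files_readonly=true so the structure is loaded from disk but data changes stay in memory and are never written back.

-- one-time setup against jdbc:hsqldb:file:myfilepath
CREATE MEMORY TABLE temp_data (id INTEGER PRIMARY KEY, payload VARCHAR(200));
SHUTDOWN;

-- afterwards connect with jdbc:hsqldb:file:myfilepath;files_readonly=true
-- INSERT/UPDATE/DELETE on temp_data now happen only in memory and vanish on shutdown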
Some years ago I built an Excel file for my colleagues that displays lots of data from an external ODBC data source. The data is partitioned into lots of data tables in different sheets. The file also contains a button that allows the user to update the data.
Since accessing the data from the external source was very slow, I implemented some caching logic that stored the parts of the results that were unlikely to change in external tables on our SQL server, and did some magic to keep the data synchronized. The Excel file itself only accesses the SQL server. Every data table uses a SPROC to get its part of the data.
Fast forward 5 years. The Excel file has grown in size and contains so many sheets and so much data that our Excel (still version 2003) had problems with it. So my colleagues split the file into two halves.
The problem is now, that both excel files contain the logic to update the data and it can happen that a user clicks the update button in file no. 1 while another user is already updating file no. 2.
That's the point where the updating logic goes berserk and produces garbage.
The update run is only required once for both excel files because it updates all the data that's displayed in both files. It's quite expensive and lasts from 5 to 15 minutes.
I could split the update run into two halves as well, but that wouldn't make it any faster and updating the two files would take twice as long.
What I think about is some kind of mutex: User A clicks on the update button and the update run starts. User B wants to update too, but the (VBA/SPROC) logic detects that there's already an update running and waits till it finishes.
You could perform the updates in a Transaction with Serializable isolation level; your update code would need to detect and handle SQL Server error 1205 (and report to user that another update is in process).
Alternatively, add a rowversion timestamp to each row and only update a row if it hasn't been changed since you loaded it.
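A sketch of the rowversion variant (table and column names are placeholders): read the row's version together with the data, and make the update a no-op if someone else changed the row in the meantime.

ALTER TABLE dbo.CacheTable ADD RowVer rowversion;   -- one-time schema change

-- @ver holds the RowVer value read when the data was loaded
UPDATE dbo.CacheTable
SET    Payload = @newPayload
WHERE  Id = @id AND RowVer = @ver;

IF @@ROWCOUNT = 0
    RAISERROR('Row was changed by another update run.', 16, 1);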
But when A has finished, B will run the update 'for nothing'.
Instead: When A clicks update, call a stored proc which fires the update asynchronously.
When the update starts, it looks at the last time it ran itself and exits if it was less than X minutes ago.
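A sketch of that guard, assuming a one-row control table dbo.UpdateControl(LastRun datetime) and an application lock as the mutex (all names are placeholders; how the proc gets fired asynchronously, e.g. from an Agent job, is a separate piece):

CREATE PROCEDURE dbo.RunCacheUpdate
AS
BEGIN
    -- serialize concurrent callers; the second caller waits here until the first finishes
    EXEC sp_getapplock @Resource = 'CacheUpdate', @LockMode = 'Exclusive', @LockOwner = 'Session';

    DECLARE @lastRun datetime;
    SELECT @lastRun = LastRun FROM dbo.UpdateControl;

    -- skip the refresh if another run finished within the last 15 minutes
    IF @lastRun IS NULL OR DATEDIFF(MINUTE, @lastRun, GETDATE()) >= 15
    BEGIN
        -- ... the expensive 5-15 minute refresh goes here ...
        UPDATE dbo.UpdateControl SET LastRun = GETDATE();
    END

    EXEC sp_releaseapplock @Resource = 'CacheUpdate', @LockOwner = 'Session';
END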