How to recover HDB with tplogs? - backup

Our system is currently backing up tplogs to S3. From what I have read, simply making sure these files are in the place that kdb expects them will allow for recovery if there is an issue with the RDB during the day.
However, I did not see an explanation of how to use the tplogs to recover the HDB. I'm tempted to create another backup system to sync the HDB folders to S3 as well. That would be more work to set up and use at least double the storage, as well as being redundant, so if it's not necessary I would like to avoid that extra step.
Is there a way to recover the HDB from the tplogs in the event that we lose access to our HDB folders, or do I need to add another backup system for the HDB folders? Thanks.

To replay a log file to the HDB:
.Q.hdpf[`::;get `:tpLogFile;.z.d;`sym]
In my experience, if you are building an HDB from a TP log file, it is efficient to load the log file using the get function and save it with .Q.dpft.
If you want to use the -11! function, then you have to provide an upd function (-11! reads each message from the TP log file and calls upd, which inserts the data into the in-memory table) to load the data into memory and then save it to disk.
In both cases you have to load the data into memory, but by using get you can skip the upd function call.
The -11! function is efficient for building the RDB because it does not load the full log file into memory at once.
For more details, see http://www.firstderivatives.com/downloads/q_for_Gods_July_2014.pdf

OK, actually found a forum answer to a similar question, with a script for replaying log files.
https://groups.google.com/forum/#!topic/personal-kdbplus/E9OkvJKGrLI
Jonny Press says:
The usual way of doing it is to use -11! to replay the log file. A basic script would be something like
// load schema
\l schema.q
// define upd
upd:insert
// replay log file
-11!`:schema2015.07.09
// save
.Q.hdpf[`::;`:hdb;2015.07.09;`sym]
This will read the full log file into memory. So you will need to have RAM available.
TorQ has a TP log replay script:
https://github.com/AquaQAnalytics/TorQ/blob/master/code/processes/tickerlogreplay.q

Related

(OS X) Determine if file is being written to?

My app is monitoring a "hot" folder somewhere on the local filesystem for newly added files to push to a network location. I'm running into a problem when very large files are being written into the hot folder: the file system event notifying me of changes in the hot folder will fire well before the file completes writing. When my app tries to upload the file, it mis-reads the file size as the current number of copied bytes, not the eventual total number of bytes.
Things I've tried:
NSURL getResourceValue:forKey:error: to read NSURLAllocatedFileSizeKey (same value as NSURLFileSizeKey while the file is being written).
NSFileManager attributesOfItemAtPath:error: to look at NSFileBusy (always NO).
I can't seem to find any mechanism short of repeatedly polling a file for its size to determine if the file is finished copying and can be uploaded.
There aren't great ways to do this.
If you can be certain that the writer is using NSFileCoordinator, then you can also use that to coordinate your access to the file.
Likewise, if you're sure that the writer has opted in to advisory locking, you could try to open the file for shared access by calling open() with the O_SHLOCK and O_NONBLOCK flags. If you succeed, then there are no other descriptors open for exclusive access. You can either use the file descriptor you've got or close it and then use some other API to access the file.
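A minimal C sketch of that check, assuming the writer really does use flock()-style locking; the path is a placeholder and the error handling is deliberately thin:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    const char *path = "/path/to/hot/folder/bigfile.mov";  /* placeholder */
    /* O_SHLOCK asks for a shared flock()-style lock; O_NONBLOCK makes the
       open fail instead of waiting if an exclusive lock is held. */
    int fd = open(path, O_RDONLY | O_SHLOCK | O_NONBLOCK);
    if (fd == -1) {
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            printf("the writer still holds an exclusive lock\n");
        else
            perror("open");
        return 1;
    }
    printf("no exclusive lock held; safe to read (if the writer locks at all)\n");
    /* Use fd directly, or close it and hand the path to a higher-level API. */
    close(fd);
    return 0;
}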
However, if you can't be sure of any of those, then your best bet may be to set a timer to repeatedly check the file's metadata (size, date modified, etc.). Only when you see that it has stopped changing over a reasonable time interval (2 seconds, maybe) would you attempt to access it (and cancel the timer).
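For the polling fallback, here is a rough C sketch using stat(): it simply watches the size and modification time until they have stopped changing for a chosen settle period (the path and the 2-second interval are placeholders):

#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return only once the file's size and mtime have been stable for
   `settle` consecutive seconds. Purely heuristic, as described above. */
static void wait_until_stable(const char *path, unsigned settle) {
    struct stat prev = {0}, cur;
    unsigned stable = 0;
    while (stable < settle) {
        sleep(1);
        if (stat(path, &cur) != 0) { stable = 0; continue; }  /* not there (yet) */
        if (cur.st_size == prev.st_size && cur.st_mtime == prev.st_mtime)
            stable++;              /* unchanged since the last look */
        else
            stable = 0;            /* still being written; start over */
        prev = cur;
    }
}

int main(void) {
    wait_until_stable("/path/to/hot/folder/bigfile.mov", 2);  /* placeholder path */
    printf("metadata has settled; attempt the upload now\n");
    return 0;
}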
You might want to do all three. Wait for the file's metadata to settle down, then use an NSFileCoordinator to read from the file. When it calls your reader block, use open() with O_SHLOCK | O_NONBLOCK to make sure there are no other processes which have exclusive access to it.
You need some form of coordinated file locking. fcntl() and flock() are the common functions for this; read up on them first, then see what options you have. If you can control the code base of those other processes, all the better.
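As a rough illustration (only useful if both the writer and your app opt in to it), this is what advisory locking with fcntl() looks like in C; flock() is the simpler cousin with whole-file-only semantics. The path is a placeholder:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fd = open("/path/to/hot/folder/bigfile.mov", O_RDONLY);  /* placeholder */
    if (fd == -1) { perror("open"); return 1; }

    /* Describe a read (shared) lock over the whole file. */
    struct flock fl = { .l_type = F_RDLCK, .l_whence = SEEK_SET,
                        .l_start = 0, .l_len = 0 };

    /* F_SETLK fails immediately if a conflicting write lock is held;
       F_SETLKW would block and wait for it instead. */
    if (fcntl(fd, F_SETLK, &fl) == -1) {
        printf("the writer still holds a conflicting lock; try again later\n");
    } else {
        printf("got a shared lock; the writer (if it cooperates) is done\n");
        /* ... read/upload the file here ... */
    }
    close(fd);
    return 0;
}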
The problem with really large files is that what's changed or changing inside them is opaque and isn't always at the end.
Good processes should generally be doing atomic writes (write to a temp file, then swap it into place), but if these files are actually databases then you will want to look at using the database's server app for this sort of thing.
If the files are wrappers containing other files then it gets extra messy as those contents might have dependencies on one another to be in a usable state.

Can I use an rsync log file from a dry run as an input to a real run?

I like rsync. I can see what files will be deleted first. But what happens if, during the backup, a sector of the source disk fails? Files could be deleted from the destination that should not be. However, if I check the log file for all files scheduled for deletion first, then use the log file as instructions to rsync, a source disk failure during backup should result in a lower probability of data loss.
I've read the man page and have to conclude that the answer is no. If not rsync, then what?
You can mitigate the risk of source disk failure using:
--delete-after receiver deletes after transfer, not during
That will not delete files on the destination if an I/O error occurs during the copy.
But to ensure the integrity of your backup, I think the right way is to use:
--only-write-batch=FILE like --write-batch but w/o updating destination
That will write the diffs into a file. Once the batch is created, you move it to the destination machine and apply the diffs with:
--read-batch=FILE read a batched update from FILE

downloading huge files - application using grails

I am developing a RESTful web service that allows users to download data in CSV and JSON formats that is dynamically retrieved from the database.
Right now I am using a StringWriter to write out the CSV data. My major concern is that the result set could get very large depending on the user input. In that case, having it all in memory doesn't seem like a good idea to me.
I am thinking of creating a temp file, but how do I make sure the file gets deleted soon after the download completes?
Is there a better way to do this?
Thanks for the help.
If memory is the issue, you could simply write out to the response writer, which writes directly to the output stream. This way you're not storing anything (much) in memory and there's no need to write out temporary files:
// controller action for CSV download
def download = {
    response.setContentType("text/csv")
    response.setHeader("Content-disposition", "attachment;filename=downloadFile.csv")
    def out = response.writer // stream straight to the response instead of buffering in memory
    def results = [] // get all your results
    results.each { result ->
        out << result.col1 << ',' << result.col2 // etc
        out << '\n'
    }
    out.flush()
}
This writes out to the output stream as it is looping round your results.
In theory you can make this even more memory efficient by using a scrollable result set - see the "Using Scrollable Results" section of Querying with GORM - Criteria - and looping round that whilst writing out to the response writer. In theory this means you're also not loading all your DB results into memory, but in practice this may not work as expected if you're using MySQL (and its Java connector). Manually batching up the queries may work too (get DB rows 1-10000, write them out, get 10001-20000, etc.).
This kind of thing might be more difficult with JSON, depending on what library you're using to render your objects.
Well, the simplest solution to prevent temp files from sticking around too long would be a cron job that simply deletes any file in the temp directory with a modified time older than, say, 1 hour.
If you want it to all be done within Grails, you could design a Quartz job to clean up files. This job could either do as described above (and simply check modification timestamps to decide what to delete) or you could run the job only "on demand" with a parameter of a file name to be deleted. Once the download action is called you could schedule the cleanup of that specific file for X minutes later (to allow enough time for a successful download). The job would then be in charge of simply deleting the file.
Depending on the number of files involved you can always use File.deleteOnExit() (http://download.oracle.com/javase/1.5.0/docs/api/java/io/File.html#deleteOnExit()) to ensure the file is blown away when the VM shuts down.
To create a temp file that gets automatically deleted after the session has expired, you can use the Session Temp Files plugin.

Intersystems Cache routine to write process information to a file on local system?

I am interested in creating a routine that would query the currently running cache processes and then write this information to a file. How could this be done in Cache 2008.2?
PERFMON might be what you're looking for. It's an app with its own UI, but you can call its functions directly too, as an API.
Check the Cache docs for "Cache Monitoring Guide". That will give you links to PERFMON docs, as well as docs for other system monitoring tools.
You might find something useful in the Class Reference, under packages %SYSTEM, %SYS, and %Monitor.
For some process info you might need to shell out to the OS. In that case, check into the $ZF function. That will let you invoke OS-level commands from within Cache.
Oh, and you might want to consider saving the process data within the Cache DB, rather than dumping it out to a file. That is, create a Persistent Class with Properties corresponding to each process attribute that you want to capture, then write code to create, populate, and save instances of that class, taking the data from PERFMON or whatever other source you choose.
If you do that you can use Cache SQL to generate whatever kind of report you need. (Cache will automatically generate a SQL Table corresponding to your Persistent Class.) Cache supports ODBC, so you can use an external tool like Crystal Reports or Access for that part.
Obviously that will be more work than just echoing data to a file, but some kind of structure will be needed if you're going to do anything interesting with the information.

Platform independent file locking?

I'm running a very computationally intensive scientific job that spits out results every now and then. The job is basically to just simulate the same thing a whole bunch of times, so it's divided among several computers, which use different OSes. I'd like to direct the output from all these instances to the same file, since all the computers can see the same filesystem via NFS/Samba. Here are the constraints:
Must allow safe concurrent appends. Must block if some other instance on another computer is currently appending to the file.
Performance does not count. I/O for each instance is only a few bytes per minute.
Simplicity does count. The whole point of this (besides pure curiosity) is so I can stop having every instance write to a different file and manually merging these files together.
Must not depend on the details of the filesystem. Must work with an unknown filesystem on an NFS or Samba mount.
The language I'm using is D, in case that matters. I've looked, there's nothing in the standard lib that seems to do this. Both D-specific and general, language-agnostic answers are fully acceptable and appreciated.
Over NFS you face some problems with client-side caching and stale data. I have written an OS-independent lock module to work over NFS before. The simple idea of creating a [datafile].lock file does not work well over NFS. The basic idea to work around it is to create a lock file [datafile].lock which, if present, means the file is NOT locked; a process that wants to acquire the lock renames the file to a different name like [datafile].lock.[hostname].[pid]. The rename is an atomic enough operation that it works well enough over NFS to guarantee exclusivity of the lock. The rest is basically a bunch of fail-safes, loops, error checking and lock recovery in case a process dies before releasing the lock and renaming the lock file back to [datafile].lock.
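To make that concrete, here is a stripped-down C sketch of the rename trick; the file names, the host/pid naming scheme and the retry policy are all placeholders, and the fail-safe and stale-lock handling mentioned above is left out:

#include <stdio.h>
#include <unistd.h>

/* Acquire the lock by renaming "<datafile>.lock" (present = unlocked) to
   "<datafile>.lock.<hostname>.<pid>". rename() is atomic, so only one
   process can win; everyone else keeps retrying. */
static void acquire(const char *lockfile, char *mine, size_t n) {
    char host[64];
    gethostname(host, sizeof host);
    snprintf(mine, n, "%s.%s.%d", lockfile, host, (int)getpid());
    while (rename(lockfile, mine) != 0)
        sleep(1);                          /* someone else holds it; retry */
}

static void release(const char *lockfile, const char *mine) {
    rename(mine, lockfile);                /* put the unlocked name back */
}

int main(void) {
    char mine[256];
    const char *lockfile = "/nfs/share/results.dat.lock";   /* placeholder */
    acquire(lockfile, mine, sizeof mine);
    /* ... append to /nfs/share/results.dat here ... */
    release(lockfile, mine);
    return 0;
}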
The classic solution is to use a lock file, or more accurately a lock directory. On all common OSs creating a directory is an atomic operation so the routine is:
try to create a lock directory with a fixed name in a fixed location
if the create failed, wait a second or so and try again - repeat until success
write your data to the real data file
delete the lock directory
This has been used by applications such as CVS for many years across many platforms. The only problem occurs in the rare cases when your app crashes while writing and before removing the lock.
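A minimal sketch of that routine in C; the directory and file names are placeholders and, as noted, the stale-lock case isn't handled:

#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    const char *lockdir = "/shared/results.lockdir";    /* placeholder */

    /* Steps 1-2: creating a directory is atomic; spin until we are the creator. */
    while (mkdir(lockdir, 0700) != 0) {
        if (errno != EEXIST) { perror("mkdir"); return 1; }
        sleep(1);                     /* someone else holds the lock; retry */
    }

    /* Step 3: write to the real data file while we hold the lock. */
    FILE *f = fopen("/shared/results.dat", "a");        /* placeholder */
    if (f) { fprintf(f, "a few bytes of results\n"); fclose(f); }

    /* Step 4: release the lock. */
    rmdir(lockdir);
    return 0;
}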
Why not just build a simple server which sits between the file and the other computers?
Then if you ever wanted to change the data format, you would only have to modify the server, and not all of the clients.
In my opinion building a server would be much easier than trying to use a Network file system.
Lock File with a twist
Like other answers have mentioned, the easiest method is to create a lock file in the same directory as the datafile.
Since you want to be able to access the same file from multiple PCs, the best solution I can think of is to just include, in the lock file, the identifier of the machine currently writing to the data file.
So the sequence for writing to the data file would be:
Check if there is a lock file present
If there is a lock file, see if I'm the one owning it by checking that its content has my identifier.
If that's the case, just write to the data file then delete the lock file.
If that's not the case, just wait a second or a small random length of time and try the whole cycle again.
If there is no lock file, create one with my identifier and try the whole cycle again to avoid a race condition (re-check that the lock file is really mine).
Along with the identifier, I would record a timestamp in the lock file and check whether it's older than a given timeout value.
If the timestamp is too old, then assume that the lock file is stale and just delete it, as it would mean that one of the PCs writing to the data file may have crashed or its connection may have been lost.
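Here is a rough C sketch of that sequence; the identifier format, file names and staleness timeout are placeholders, and the NFS caching caveats discussed below still apply:

#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define LOCK_PATH  "/shared/results.dat.lock"    /* placeholder */
#define STALE_SECS 30                             /* placeholder timeout */

/* Read "<id> <timestamp>" from the lock file; return 0 if it doesn't exist. */
static int read_lock(char *owner, long *stamp) {
    FILE *f = fopen(LOCK_PATH, "r");
    if (!f) return 0;
    if (fscanf(f, "%63s %ld", owner, stamp) != 2) owner[0] = '\0';
    fclose(f);
    return 1;
}

void acquire_lock(const char *my_id) {
    char owner[64]; long stamp = 0;
    for (;;) {
        if (read_lock(owner, &stamp)) {
            if (strcmp(owner, my_id) == 0) return;                   /* it's mine: write away */
            if (time(NULL) - stamp > STALE_SECS) remove(LOCK_PATH);  /* stale: delete it */
            else sleep(1);                                           /* owned elsewhere: wait */
        } else {
            FILE *f = fopen(LOCK_PATH, "w");                         /* claim it with my id */
            if (f) { fprintf(f, "%s %ld\n", my_id, (long)time(NULL)); fclose(f); }
            /* loop again to re-check that the lock really is mine */
        }
    }
}

/* After writing to the data file, release the lock with remove(LOCK_PATH). */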
Another solution
If you are in control of the format of the data file, another solution could be to reserve a structure at the beginning of the file to record whether it is locked or not.
If you just reserve a byte for this purpose, you could assume, for instance, that 00 would mean the data file isn't locked, and that other values would represent the identifier of the machine currently writing to it.
Issues with NFS
OK, I'm adding a few things because Jiri Klouda correctly pointed out that NFS uses client-side caching that will result in the actual lock file being in an undetermined state.
A few ways to solve this issue:
Mount the NFS directory with the noac or sync options. This is easy, but it doesn't completely guarantee data consistency between client and server, so there may still be issues, although in your case it may be OK.
Open the lock file or data file using the O_DIRECT, O_SYNC or O_DSYNC flags. This is supposed to disable caching altogether.
This will lower performance but will ensure consistency.
You may be able to use flock() to lock the data file but its implementation is spotty and you will need to check if your particular OS actually uses the NFS locking service. It may do nothing at all otherwise.
If the data file is locked, then another client opening it for writing will fail.
Oh yeah, and it doesn't seem to work on SMB shares, so it's probably best to just forget about it.
Don't use NFS and just use Samba instead: there is a good article on the subject and why NFS is probably not the best answer to your usage scenario.
You will also find in this article various methods for locking files.
Jiri's solution is also a good one.
Basically, if you want to keep things simple, don't use NFS for frequently-updated files that are shared amongst multiple machines.
Something different
Use a small database server to save your data into and bypass the NFS/SMB locking issues altogether, or keep your current multiple-data-file system and just write a small utility to concatenate the results.
It may still be the safest and simplest solution to your problem.
I don't know D, but I think using a mutex file to do the job might work. Here's some pseudo-code you might find useful:
do {
    // Try to create a new file to use as the mutex.
    // If it already exists, the call fails and returns null.
    mutex = create_file_for_writing('lock_file');
} while (mutex == null);
// Open your log file for appending and write the results.
log_file = open_file_for_appending('the_log_file');
write(log_file, data);
close_file(log_file);
close_file(mutex);
// Free the mutex and allow other processes to create the same file.
delete_file(mutex);
So, all processes will try to create the mutex file but only the one who wins will be able to continue. Once you write your output, close and delete the mutex so other processes can do the same.