Breaking/opening files after failed jobs (sas7bdat.lck issue) - automation

Good day,
Tl;dr:
a) Is it feasible to recover data from a .lck file?
b) When the .lck issue appears, can SAS be made to work around it automatically?
We have automated mundane jobs running on SAS machines. Every now and then a job fails, and this sometimes leaves a locked file behind (<filename>.sas7bdat.lck instead of <filename>.sas7bdat).
This prevents re-running the program, because SAS sees that the specified filename already exists, tries to access it, and fails with the message:
Attempt to rename temporary member of <dataset> failed.
Currently we handle them by manually deleting the file and adjusting the generation number.
The question is twofold: a) Is it feasible to recover data from a .lck file? b) When the .lck issue appears, can SAS be made to work around it automatically? (Note that we have a lot of jobs, and adding checking code to all of them would be labor-intensive.)

The .sas7bdat.lck file is the one that SAS writes to as it's creating a data set. If the data step (or PROC) completes successfully, the original data set file is deleted and the .sas7bdat.lck file gets renamed to remove the .lck part. If any errors occur, the .lck file gets deleted and the original data set is left in place, unmodified. That's how SAS avoids overwriting existing data sets when errors occur.
Therefore, you should be able to just rename the file to remove the .lck, or maybe rename it to damaged.sas7bdat for example, and then try accessing the file. You can try a PROC DATASETS REPAIR (https://v8doc.sas.com/sashtml/proc/z0247721.htm) if you really need to get whatever data might be present.
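If you want to automate the cleanup/recovery step, a small sweep script can look for leftover .lck files and rename them so they can be inspected or repaired later. A rough Python sketch (the library path, age threshold and "damaged_" prefix are my assumptions; make sure no SAS session still holds the file):
import os
import time
LIB_DIR = "/sasdata/mylib"          # hypothetical library path
MIN_AGE_SECONDS = 3600              # only touch .lck files older than an hour
for name in os.listdir(LIB_DIR):
    if not name.endswith(".sas7bdat.lck"):
        continue
    lck_path = os.path.join(LIB_DIR, name)
    # Skip files that might still be being written by a running job.
    if time.time() - os.path.getmtime(lck_path) < MIN_AGE_SECONDS:
        continue
    base = name[:-len(".sas7bdat.lck")]
    recovered = os.path.join(LIB_DIR, "damaged_" + base + ".sas7bdat")
    if os.path.exists(recovered):
        continue                    # don't clobber an earlier recovery attempt
    os.rename(lck_path, recovered)
    print(f"renamed {lck_path} -> {recovered}")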
The best solution will obviously be to correct whatever fault is causing your jobs to bomb out like this in the first place. No SAS program should ever leave .lck files lying about, even if it encounters errors - your jobs must actually be crashing the SAS environment itself, or perhaps they're being killed prematurely by another process. Simply accepting that this happens and trying to work around it is likely to just be storing up more problems for the future.

Related

SQL Server - insufficient memory (mscorlib) / 'the operation could not be completed'

I have been working on building a new database. I began by building the structure within the database it is replacing and populating this as I created each set of tables. Once I had made additions I would drop what had been created, execute the code to build the structure again, and then run a separate file to insert the data. I repeated this until the structure and content were complete, to ensure each stage was as I intended.
The insert file is approximately 30 MB with 500,000 lines of code (I appreciate this is not the best way to do this, but for various reasons I cannot use alternative options). The final insert completed and took approximately 30 minutes.
A new database was created for me, and the structure executed successfully, but the data would not insert. I received the first error message shown below. I have looked into this and it appears I need to use the sqlcmd utility to get around it, although I find that odd as it worked in the other database, which is on the same server and has the same autogrow settings.
However, when I attempted to save the file after this error I received the second error message seen below. When I selected OK it took me to my file directory as it would if I selected Save As, I tried saving in a variety of places but received the same error.
I attempted to copy the code into notepad to save my changes but the code would not copy to the clipboard. I accepted I would lose my changes and rebooted my system. If I reopen this file and attempt to save it I receive the second error message again.
Does anyone have an explanation for this behaviour?
Hm. This looks more like an issue with SSMS and not the SQL Server DB/engine.
If you've been doing this a few times, possibly Management Studio ran out of RAM?
Have you tried breaking INSERT into batches/smaller files?
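If splitting by hand is too tedious, it can be scripted. A rough Python sketch (file names, the 10,000-line chunk size, and the assumption that each statement ends with a semicolon are mine):
LINES_PER_CHUNK = 10_000            # arbitrary batch size
with open("big_insert.sql", encoding="utf-8") as src:
    chunk, chunk_no = [], 1
    for line in src:
        chunk.append(line)
        # Only cut at a statement boundary so no INSERT is split in half.
        if len(chunk) >= LINES_PER_CHUNK and line.strip().endswith(";"):
            with open(f"insert_part_{chunk_no:03d}.sql", "w", encoding="utf-8") as out:
                out.writelines(chunk)
            chunk, chunk_no = [], chunk_no + 1
    if chunk:
        with open(f"insert_part_{chunk_no:03d}.sql", "w", encoding="utf-8") as out:
            out.writelines(chunk)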

Watch folder for files being Read

I am trying to watch files in a directory to determine when files are opened/accessed. I thought FileSystemWatcher would do the trick using the event Changed.
Problem is that some applications do not create a lock on the file they open/access, nor do they change either the date modified or the date accessed (even after fsutil behavior set disablelastaccess 0). Notepad, for example: apparently it makes a copy of the file in memory and plays with it there until you save it, and it doesn't update the Date Accessed either.
How can I monitor a directory of files and be notified when a file is simply opened/accessed by any program (e.g. Notepad)? Files may be opened from another computer, not necessarily on the computer running the "watcher".
I found lots of similar questions but did not see one focusing on file "access".
This is quite normal. Updating an existing file is quite dangerous since it can cause irretrievable data loss. A disk error (like disk full) while writing is very bad news. The common algorithm used:
rename the original file
write a new file using the original name
no error: delete the renamed file
error: delete the new file, rename original file back
Clearly this doesn't cause a Changed event to be raised; no file was changed.
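For illustration, that sequence looks roughly like this in Python (a minimal sketch; real editors also fsync, preserve permissions and so on):
import os
def safe_save(path: str, data: bytes) -> None:
    backup = path + ".bak"
    os.rename(path, backup)            # rename the original file
    try:
        with open(path, "wb") as f:    # write a new file using the original name
            f.write(data)
    except OSError:
        if os.path.exists(path):
            os.remove(path)            # error: delete the new file...
        os.rename(backup, path)        # ...and rename the original file back
        raise
    else:
        os.remove(backup)              # no error: delete the renamed file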
Sorry, I didn't read the question well enough. There is no notification whatsoever for an app just opening a file for reading. FSW can only detect changes to the file system. There is no ready alternative either, this requires a custom file system filter driver that snoops on driver requests. Like the kind that SysInternals' ProcMon utility uses. I'm not aware of such a driver ready for use in a C# program, you can't write them in C# either. This just isn't a common requirement.

Platform independent file locking?

I'm running a very computationally intensive scientific job that spits out results every now and then. The job is basically to just simulate the same thing a whole bunch of times, so it's divided among several computers, which use different OSes. I'd like to direct the output from all these instances to the same file, since all the computers can see the same filesystem via NFS/Samba. Here are the constraints:
Must allow safe concurrent appends. Must block if some other instance on another computer is currently appending to the file.
Performance does not count. I/O for each instance is only a few bytes per minute.
Simplicity does count. The whole point of this (besides pure curiosity) is so I can stop having every instance write to a different file and manually merging these files together.
Must not depend on the details of the filesystem. Must work with an unknown filesystem on an NFS or Samba mount.
The language I'm using is D, in case that matters. I've looked, there's nothing in the standard lib that seems to do this. Both D-specific and general, language-agnostic answers are fully acceptable and appreciated.
Over NFS you face some problems with client-side caching and stale data. I have written an OS-independent lock module to work over NFS before. The simple idea of creating a [datafile].lock file does not work well over NFS. The basic idea to work around it is to create a lock file [datafile].lock which, if present, means the file is NOT locked; a process that wants to acquire the lock renames the file to a different name like [datafile].lock.[hostname].[pid]. The rename is an atomic enough operation that it works well over NFS to guarantee exclusivity of the lock. The rest is basically a bunch of fail-safes, loops, error checking and lock recovery in case the process dies before releasing the lock and renaming the lock file back to [datafile].lock.
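A rough Python sketch of that rename trick (it assumes a [datafile].lock file is created once when the data file is first set up, and it leaves out the fail-safe and stale-lock handling described above):
import os
import socket
import time
def acquire_lock(datafile: str, timeout: float = 60.0) -> str:
    free = datafile + ".lock"
    held = f"{free}.{socket.gethostname()}.{os.getpid()}"
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            os.rename(free, held)      # atomic enough, even over NFS
            return held
        except OSError:
            time.sleep(1)              # someone else holds the lock; retry
    raise TimeoutError(f"could not acquire lock on {datafile}")
def release_lock(held: str, datafile: str) -> None:
    os.rename(held, datafile + ".lock")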
The classic solution is to use a lock file, or more accurately a lock directory. On all common OSs creating a directory is an atomic operation so the routine is:
try to create a lock directory with a fixed name in a fixed location
if the create failed, wait a second or so and try again - repeat until success
write your data to the real data file
delete the lock directory
This has been used by applications such as CVS for many years across many platforms. The only problem occurs in the rare cases when your app crashes while writing and before removing the lock.
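A minimal Python sketch of that routine (the directory name and retry interval are arbitrary):
import os
import time
LOCK_DIR = "results.lockdir"
def with_lock_dir(write_results):
    while True:
        try:
            os.mkdir(LOCK_DIR)        # atomic: fails if another process holds the lock
            break
        except FileExistsError:
            time.sleep(1)             # wait a second or so and try again
    try:
        write_results()               # write your data to the real data file
    finally:
        os.rmdir(LOCK_DIR)            # delete the lock directory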
Why not just build a simple server which sits between the file and the other computers?
Then if you ever wanted to change the data format, you would only have to modify the server, and not all of the clients.
In my opinion building a server would be much easier than trying to use a Network file system.
Lock File with a twist
Like other answers have mentioned, the easiest method is to create a lock file in the same directory as the datafile.
Since you want to be able to access the same file over multiple PC the best solution I can think of is to just include the identifier of the machine currently writing to the data file.
So the sequence for writing to the data file would be:
Check if there is a lock file present
If there is a lock file, see if I'm the one owning it by checking that its content has my identifier.
If that's the case, just write to the data file then delete the lock file.
If that's not the case, just wait a second or a small random length of time and try the whole cycle again.
If there is no lock file, create one with my identifier and try the whole cycle again to avoid race condition (re-check that the lock file is really mine).
Along with the identifier, I would record a timestamp in the lock file and check whether it's older than a given timeout value.
If the timestamp is too old, then assume the lock file is stale and just delete it, as that would mean one of the PCs writing to the data file has crashed or lost its connection.
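Here is roughly what that cycle could look like in Python (a sketch only; the lock-file name, identifier format and timeout are assumptions, and exclusive create is used for the "create, then re-check" step):
import os
import socket
import time
LOCK = "data.lock"                        # hypothetical lock-file name
MY_ID = f"{socket.gethostname()}:{os.getpid()}"
STALE_AFTER = 300                         # seconds before a lock is considered stale
def locked_write(append_data):
    while True:
        try:
            with open(LOCK) as f:
                owner, stamp = f.read().split("|")
            if owner == MY_ID:
                append_data()             # the lock is mine: write to the data file
                os.remove(LOCK)           # then delete the lock file
                return
            if time.time() - float(stamp) > STALE_AFTER:
                os.remove(LOCK)           # stale lock: its owner probably crashed
            else:
                time.sleep(1)             # someone else owns it: wait and retry
        except FileNotFoundError:
            try:
                with open(LOCK, "x") as f:    # no lock file: try to claim it...
                    f.write(f"{MY_ID}|{time.time()}")
            except FileExistsError:
                pass                      # ...unless another machine beat us to it
            # loop again to re-check that the lock file is really mine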
Another solution
If you are in control of the format of the data file, you could reserve a structure at the beginning of the file to record whether it is locked or not.
If you just reserve a byte for this purpose, you could assume, for instance, that 00 would mean the data file isn't locked, and that other values would represent the identifier of the machine currently writing to it.
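A tiny sketch of that idea, assuming byte 0 of the data file is reserved: 0x00 means unlocked, any other value is the identifier of the machine holding the lock (identifiers assigned by you):
import os
MY_ID = 7                                  # hypothetical identifier for this machine
def try_lock_in_place(path: str) -> bool:
    # Note: the read-then-write is not atomic, so this shares the race and
    # caching caveats discussed in the NFS section below.
    with open(path, "r+b") as f:
        if f.read(1) != b"\x00":
            return False                   # another machine has marked the file locked
        f.seek(0)
        f.write(bytes([MY_ID]))            # claim it by writing our identifier
        f.flush()
        os.fsync(f.fileno())
        return True
def unlock_in_place(path: str) -> None:
    with open(path, "r+b") as f:
        f.write(b"\x00")                   # set byte 0 back to "unlocked"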
Issues with NFS
OK, I'm adding a few things because Jiri Klouda correctly pointed out that NFS uses client-side caching that will result in the actual lock file being in an undetermined state.
A few ways to solve this issue:
Mount the NFS directory with the noac or sync options. This is easy, but it doesn't completely guarantee data consistency between client and server, so there may still be issues, although in your case it may be OK.
Open the lock file or data file using the O_DIRECT, O_SYNC or O_DSYNC flags. This is supposed to disable caching altogether.
This will lower performance but will ensure consistency.
You may be able to use flock() to lock the data file but its implementation is spotty and you will need to check if your particular OS actually uses the NFS locking service. It may do nothing at all otherwise.
If the data file is locked, then another client opening it for writing will fail.
Oh yeah, and it doesn't seem to work on SMB shares, so it's probably best to just forget about it.
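For completeness, here is what the flock() route might look like in Python (a sketch; as noted, over NFS/SMB treat it as best-effort and test it on your actual mounts):
import fcntl
def append_line(path: str, line: str) -> None:
    with open(path, "a") as f:
        fcntl.flock(f, fcntl.LOCK_EX)     # blocks until the exclusive lock is granted
        try:
            f.write(line + "\n")
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)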
Don't use NFS and just use Samba instead: there is a good article on the subject and why NFS is probably not the best answer to your usage scenario.
You will also find in this article various methods for locking files.
Jiri's solution is also a good one.
Basically, if you want to keep things simple, don't use NFS for frequently-updated files that are shared amongst multiple machines.
Something different
Use a small database server to save your data into and bypass the NFS/SMB locking issues altogether or keep your current multiple data files system and just write a small utility to concatenate the results.
It may still be the safest and simplest solution to your problem.
I don't know D, but I think using a mutex file to do the job might work. Here's some pseudo-code you might find useful:
do {
// Try to create a new file to use as mutex.
// If it already exists, assume this returns null (or catch whatever error it throws).
mutex = create_file_for_writing('lock_file');
} while (mutex == null);
// Open your log file for appending and write the results
log_file = open_file_for_appending('the_log_file');
write(log_file, data);
close_file(log_file);
close_file(mutex);
// Free mutex and allow other processes to create the same file.
delete_file(mutex);
So, all processes will try to create the mutex file but only the one who wins will be able to continue. Once you write your output, close and delete the mutex so other processes can do the same.

How to reliably handle files uploaded periodically by an external agent?

It's a very common scenario: some process wants to drop a file on a server every 30 minutes or so. Simple, right? Well, I can think of a bunch of ways this could go wrong.
For instance, processing a file may take more or less than 30 minutes, so it's possible for a new file to arrive before I'm done with the previous one. I don't want the source system to overwrite a file that I'm still processing.
On the other hand, the files are large, so it takes a few minutes to finish uploading them. I don't want to start processing a partial file. The files are just transferred with FTP or sftp (my preference), so OS-level locking isn't an option.
Finally, I do need to keep the files around for a while, in case I need to manually inspect one of them (for debugging) or reprocess one.
I've seen a lot of ad-hoc approaches to shuffling upload files around, swapping filenames, using datestamps, touching "indicator" files to assist in synchronization, and so on. What I haven't seen yet is a comprehensive "algorithm" for processing files that addresses concurrency, consistency, and completeness.
So, I'd like to tap into the wisdom of crowds here. Has anyone seen a really bulletproof way to juggle batch data files so they're never processed too early, never overwritten before done, and safely kept after processing?
The key is to do the initial juggling at the sending end. All the sender needs to do is:
Store the file with a unique filename.
As soon as the file has been sent, move it to a subdirectory called e.g. completed.
Assuming there is only a single receiver process, all the receiver needs to do is:
Periodically scan the completed directory for any files.
As soon as a file appears in completed, move it to a subdirectory called e.g. processed, and start working on it from there.
Optionally delete it when finished.
On any sane filesystem, file moves are atomic provided they occur within the same filesystem/volume. So there are no race conditions.
Multiple Receivers
If processing could take longer than the period between files being delivered, you'll build up a backlog unless you have multiple receiver processes. So, how to handle the multiple-receiver case?
Simple: Each receiver process operates exactly as before. The key is that we attempt to move a file to processed before working on it: that, and the fact the same-filesystem file moves are atomic, means that even if multiple receivers see the same file in completed and try to move it, only one will succeed. All you need to do is make sure you check the return value of rename(), or whatever OS call you use to perform the move, and only proceed with processing if it succeeded. If the move failed, some other receiver got there first, so just go back and scan the completed directory again.
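The receiver side of that scheme is only a few lines. A Python sketch (directory names, the scan interval and the process() callback are assumptions):
import os
import time
COMPLETED, PROCESSED = "completed", "processed"
def receiver_loop(process):
    while True:
        for name in os.listdir(COMPLETED):
            src = os.path.join(COMPLETED, name)
            dst = os.path.join(PROCESSED, name)
            try:
                os.rename(src, dst)        # atomic within one filesystem/volume
            except OSError:
                continue                   # another receiver claimed it first
            process(dst)                   # safe: this receiver now owns the file
        time.sleep(30)                     # periodically re-scan the directory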
If the OS supports it, use file system hooks to intercept open and close file operations. Something like Dazuko. Other operating systems may let you know about file operations in another way; for example, Novell Open Enterprise Server lets you define epochs and read the list of files modified during an epoch.
Just realized that in Linux you can use the inotify subsystem, or the utilities from the inotify-tools package.
File transfers is one of the classics of system integration. I'd recommend you to get the Enterprise Integration Patterns book to build your own answer to these questions -- to some extent, the answer depends on the technologies and platforms you are using for endpoint implementation and for file transfer. It's a quite comprehensive collection of workable patterns, and fairly well written.

The strategy to get recovery from broken files?

My colleague and I are trying to implement a mechanism to recover from broken files on an embedded device.
This can happen under certain circumstances, e.g. the user removes the battery during a file write.
So far we have just one idea:
Create duplicate backup files, and copy them back if a risky file I/O operation does not finish properly.
This feels rather crude, because if the backup files are also broken, we are simply stuck.
Do you have any suggestions or good articles on this?
Thanks in advance.
Read up on database logging and database journal files.
A database (like Oracle) has very, very robust file writing. Don't actually use Oracle; borrow its design pattern without using the actual product. The pattern goes something like this:
Your transaction (i.e., Insert) will fetch the block to be updated. Usually this is in memory cache, if not, it is read from disk to memory cache.
A "before image" (or rollback segment) copy is made of the block you're about to write.
You change the cache copy, write a journal entry, and queue up a DB write.
You commit the change, which makes the cache change visible to other transactions.
At some point, the DB writer will finalize the DB file change.
The journal is a simple circular queue file -- the records are just a history of changes with little structure to them. It can be replicated on multiple devices.
The DB files are more complex structures. They have a "transaction number" -- a simple sequential count of overall transactions. This is encoded in the block (two different ways) as well as written to the control file.
A good DBA assures that the control file is replicated across devices.
When Oracle starts up, it checks the control file(s) to find which one is likely to be correct. Others may be corrupted. Oracle checks the DB files to see which match the control file. It checks the journal to see if transactions need to be applied to get the files up to the correct transaction number.
Of course, if it crashes while writing all of the journal copies, that transaction will be lost -- not much can be done about that. However, if it crashes after the journal entry is written, it will probably recover cleanly with no problems.
If you lose media, and recover a backup, there's a chance that the journal file can be applied to the recovered backup file and bring it up to date. Otherwise, old journal files have to be replayed to get it up to date.
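To make the idea concrete for an embedded device, here is a deliberately tiny "journal first, then update in place" sketch in Python (file names, the JSON record format and the single-writer assumption are mine, and it is nothing like Oracle's actual on-disk format):
import json
import os
JOURNAL, DATA = "changes.journal", "data.bin"
def apply_change(offset: int, payload: bytes) -> None:
    record = json.dumps({"offset": offset, "payload": payload.hex()}) + "\n"
    with open(JOURNAL, "a") as j:
        j.write(record)
        j.flush()
        os.fsync(j.fileno())          # once this returns, the change survives a crash
    with open(DATA, "r+b") as d:
        d.seek(offset)
        d.write(payload)
        d.flush()
        os.fsync(d.fileno())          # data file now matches the journal
def recover() -> None:
    # On startup, re-apply every journalled change; the writes are idempotent,
    # so replaying an already-applied record is harmless.
    if not os.path.exists(JOURNAL):
        return
    with open(JOURNAL) as j, open(DATA, "r+b") as d:
        for line in j:
            try:
                rec = json.loads(line)
            except ValueError:
                break                 # torn final record from a crash; discard it
            d.seek(rec["offset"])
            d.write(bytes.fromhex(rec["payload"]))
        d.flush()
        os.fsync(d.fileno())
    os.remove(JOURNAL)                # checkpoint: the journal is no longer needed
(A real implementation would also checkpoint periodically so the journal doesn't grow without bound.)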
It depends on which OS etc., but in most cases what you can do is write to a temporary file name and, as the last step, rename the file to the correct name.
This means the WOOPS (Window Of Opportunity Of Potential S****p) is confined to the interval when the renames take place.
If the OS supports a nice directory structure and you lay out the files intelligently you can further refine this by copying the new files to a temp directory and renaming the directory so the WOOPS becomes the interval between "rename target to save" and "rename temp to target".
This gets even better if the OS supports soft links to directories: then you can "ln -s target temp". On most OSes replacing a softlink is an "atomic" operation which will work or not work without any messy halfway states.
All these options depend on having enough storage to keep a complete old and new copy on the file system.
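A small Python sketch of the symlink variant (POSIX-only; names are illustrative): write the new files into a fresh directory, then atomically repoint a "current" symlink at it.
import os
def publish(new_dir: str, link: str = "current") -> None:
    tmp_link = link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)           # clean up a leftover temp link if present
    os.symlink(new_dir, tmp_link)     # build the new link under a temporary name
    os.rename(tmp_link, link)         # atomically replace the old "current" symlink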