how can I safely import files to sql server in ssis while new files are actively being written to the source directory? - sql

I need to import many xml files into sql server every day. I was thinking of running a for each loop container every few minutes to import the files to the db table and then move them to another directory, but sometimes over a dozen new files are written to the source folder every minute. Is it going to be an issue if the Package tries to loop through the folder at the exact moment new files are being written to the folder? If so, how can I work around this?

You could loop over the files in a script task and attempt to move them to a separate "ReadyToProcess" folder in a try/catch. Catch the IOException if the file is in use by another process, and continue on to the next file. The skipped file will be picked up on the next run. Then loop over the files in "ReadyToProcess" to read them into the database.

It seems like you know what files are finished writing and what files are still being modified which makes things a little easier. It is important to remember: if your SSIS task tries to open a file this currently being modified or used by another process the SSIS package will fail.
You can work around this by using a script task to generate a list of files in your source folder at a point in time and use a for or foreach loop to only fetch the files that are in the generated list. This would be in contrast to fetching everything that's in your source folders, as your post implies.
Other solutions would be to batch your incoming files and offset the package execution time so there isn't a risk of the file being exported to SQL as it's imported into your source folder.
For instance, loading your source documents in batches every 30 minutes: 1:00, 1:30, 2...
and execute your SSIS task every 30 minutes, but offset from the batch by 15 minutes: 1:15, 1:45, 2:15...
Lastly, if possible, run your SSIS package at a period where there will be no new files written to your source folder. While not always possible, if you knew there wouldn't be any new documents coming in at 2AM that'd be the best time to scheudle your SSIS package.

Related

SSIS - Why won't my Data Flow Task fail?

I've got a simple SSIS package that runs a 'foreach' loop, checking a folder for .csv files. It imports the contents of the CSV into a staging table where the columns map. On success of this, it moves the file to an archive folder appending the date. Where it fails, it is supposed to put the file into a failure folder.
However, i've tested with a random csv, that doesn't have column headings that match the mappings, and the data flow task DOESN'T fail & the file goes to the archive folder (of course the table isn't updated either). Any ideas as to why this is happening?
Here is the package:
Here is the data flow:
OK, I can do this.
Start with seven text files of input data, one of which contains error data.
The control flow executes like this.
The good files get moved to the ProcessedData folder.
The bad file gets moved to the ToReviewData folder.
The only setting you need to make is MaximumErrorCount on the Foreach Loop Container. Set this to a suitably high value.
I haven't changed any of the properties on the Load Cats task. In particular, you can see that FailPackageOnFailure is False; this is only required for checkpoints.
The precedence constraints are as you'd expect. Nothing clever here.
See training kit 70-463 > Chapter 4: Designing and Implementing Control Flow.

How to delete large file in Grails using Apache camel

I am using Grails 2.5. We are using Camel. I have folder called GateIn. In this delay time is 3minutes. So Every 3minutes , it will look into the folder for file. If the file exists, it will start to process. If the file is processed within 3 minutes, file get deleted automatically. Suppose my file takes 10minutes,file is not deleted.Again and again, it process the same file. How to make file get deleted whether it is small or bulk file. I have used noop= true to stop reuse of file. But i want to delete the file too once it is preocessed. Please give me some suggestion for that.
You can check the file size using camel file language and decide what to do next.
Usually, in this kind of small interval want to process a large size of file, it will be better to have another process zone (physical directory), you have to move the file after immediately consuming it to that zone.
You can have a separate logic or camel route to process the file. After successful process, you can delete or do appropriate step according to your requirement. Hope it helps !!

Monitoring a folder for a specific file

I have a program that uploads .txt or .rje files from a folder. Now when you put any other file format into the folder, like .jar, then the application crashes.
Now I cannot change the mechanics of the application, so I would like to know if there is a type of program/script that I can use that monitors the folder for any files that are non-txt/rje and then move them out of the folder once they are put there...
Is this possible using a script? (I do not want to use a .exe application to do this...not allowed to install 3rd party software onto the server this folder exists...)
Thank you
Your solution won't work as you have a race condition between the program doing the upload and the one doing the deletion. If upload runs first it still crashes.
The correct solution is to modify the upload program to cope with this scenario.
If that is not possible then the only safe work around would be to use a new folder to drop the files in, have a script run that constantly scans the folder and if a new file appears either move it to the processing folder or deletes it as appropriate.
(For the actual detection that's not my area of expertise but the simplest would be to have a bat file that just runs periodically (or even just runs once and loops with a wait, check, move, wait, check, move, etc) and processes everything in the folder when it runs).

How do I run data flow task successfully if certain files in the data flow doesn't exist

I have a data flow task that imports excel files. I cant use a for each loop to go through the excel files as the metadata for each excel file is completely different.
So in the data flow task I have 10 separate source files and use a union component to combine them then import it to SQL.
Problem i am facing now is sometimes certain excel files that i am importing might not exist so when my package runs it will fail as the file doesn't exist. So is there any way for me to create a check that allows the package run to skip the source file that doesn't exist and run the rest of the source files?
I am using SSIS 2005.
Suggestion: if the file doesn't exist, then create it first.
Have an empty version of each source file somewhere, and in your control flow (before the data flow), check to see if the files exist, and if they don't, copy the blank files to the location of the real files.
This article explains how to perform a check if file exists mechanism in SSIS:
http://www.bidn.com/blogs/DevinKnight/ssis/76/does-file-exist-check-in-ssis

Vb.net - FileMovement Lock on destination folder

Source code: vb.net
We are using File.Move() method to move the file from source to destination locaion.
But the destination location is being monitored by one tool, whenever we are moving files to the destination location, it will pick up the file and process it. The issue here when we try to move huge volume file like around 5GB file, the tool is immediately picking up the file and try to process it before the move operation is complete and send failure notice to all the users.. After again successfully moving the file completely, it picks up the same and process it sucessfully this time and send successful notice this time.
We can't have control over the tool which is monitoring the destination folder because it is a third party tool. However we want to find out the alternative option to place a lock over the destination foler like ReadWrite access till the move operation completes so that 3rd party will not be able to pick up or try to access that file.
Pls help us.
Not sure if it works, but you might be able to make the following work with directories as well:
FileOpen(1, "c:\file.ext", OpenMode.Binary)
Lock(1)
'Do something with file here
Unlock(1)
FileClose(1)
Reference and example here
I hope it helps.
First, I agree with #hometoast, sometimes tools like this just look for specific file extensions so you can copy in with a different file extension and then rename.
But barring that, download the file to a temp location and then Move the file into the dir getting watched. A Move does not recopy the file contents but just updates its pointers in the filesystem. Should be atomic.