I've got a simple SSIS package that runs a foreach loop, checking a folder for .csv files. It imports the contents of each CSV into a staging table whose columns map to it. On success, it moves the file to an archive folder, appending the date. Where it fails, it is supposed to put the file into a failure folder.
However, I've tested with a random CSV that doesn't have column headings matching the mappings, and the data flow task DOESN'T fail and the file goes to the archive folder (of course the table isn't updated either). Any ideas as to why this is happening?
Here is the package:
Here is the data flow:
OK, I can do this.
Start with seven text files of input data, one of which contains error data.
The control flow executes like this.
The good files get moved to the ProcessedData folder.
The bad file gets moved to the ToReviewData folder.
The only setting you need to change is MaximumErrorCount on the Foreach Loop Container. Set this to a suitably high value so that a failing iteration doesn't cause the loop itself (and therefore the package) to fail, and the remaining files still get processed.
I haven't changed any of the properties on the Load Cats task. In particular, you can see that FailPackageOnFailure is False; this is only required for checkpoints.
The precedence constraints are as you'd expect. Nothing clever here.
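In the flow above the moves are done with ordinary File System Tasks. Purely as an illustration, if you preferred to do the archive move in a Script Task instead, the body could look roughly like the sketch below. The variable names User::FileName and User::ArchiveFolder and the date format are my assumptions, not settings from this package.

// Sketch of a Script Task Main() that archives the current file with the date
// appended (e.g. cats.csv -> cats_20240131.csv). This goes inside the
// SSIS-generated ScriptMain class. ReadOnlyVariables: User::FileName,
// User::ArchiveFolder (both hypothetical names).
using System;
using System.IO;

public void Main()
{
    string sourcePath    = (string)Dts.Variables["User::FileName"].Value;
    string archiveFolder = (string)Dts.Variables["User::ArchiveFolder"].Value;

    // Build the archived name: original name + _yyyyMMdd + original extension.
    string stamped = Path.GetFileNameWithoutExtension(sourcePath)
                   + "_" + DateTime.Now.ToString("yyyyMMdd")
                   + Path.GetExtension(sourcePath);

    File.Move(sourcePath, Path.Combine(archiveFolder, stamped));

    // A task on the failure path would do the same move into the failure folder.
    Dts.TaskResult = (int)ScriptResult.Success;
}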
See training kit 70-463 > Chapter 4: Designing and Implementing Control Flow.
I am loading CSV files from a folder using Pentaho, and once files are loaded, I am making an entry into a table with the filenames that are loaded.
Before loading a file, I need to check whether it has already been loaded; for that I want to pick up the filename and check it against the names in the table that holds the already-loaded files. Since I am new to Pentaho, I am struggling to design this approach.
Please suggest how I should go about this, or whether there is a completely different approach.
Your approach is valid. Keep some bookkeeping of the processed filenames in a database (you could also use a CSV file for that).
The difficulty with this approach is that the filename may not be available as a field. So you have to write a master job that uses Add filenames to result and then hands over to a transformation that loads the CSV (press Ctrl-Space in the filename box and pick your variable from the drop-down), checks the database with a Stream lookup, and uses Filter rows to keep only the rows that are not matched (i.e. files not yet loaded). After the load, you Update the bookkeeping table.
Another approach we used successfully in the past was to load the files from one directory and move each processed file into another directory. That way it was easy to drop new files into the input directory and to retrieve processed files in case of problems.
This could be a start:
The Job
The transformation
Good day,
Tl;dr:
a) Is it feasibly possible to recover data from a .lck file?
b) If the .lck issue appears, can SAS be made to work around it?
We have automated mundane jobs running on SAS machines. Every now and then a job fails. This sometimes leaves a locked file behind (<filename>.sas7bdat.lck instead of <filename>.sas7bdat).
This prevents re-running the program, as SAS sees that the specified filename is already there, tries to access it, and fails with the message:
Attempt to rename temporary member of <dataset> failed.
Currently we handle them by manually deleting the file and adjusting generation number.
The question is twofold: a) Is it feasibly possible to recover data from a .lck file? b) If the .lck issue appears, can SAS be made to work around it? (Note that we have a lot of jobs, and adding checking code to all of them is work intensive.)
The .sas7bdat.lck file is the one that SAS writes to as it's creating a data set. If the data step (or PROC) completes successfully, the original data set file is deleted and the .sas7bdat.lck file gets renamed to remove the .lck part. If any errors occur, the .lck file gets deleted and the original data set is left in place, unmodified. That's how SAS avoids overwriting existing data sets when errors occur.
Therefore, you should be able to just rename the file to remove the .lck, or maybe rename it to damaged.sas7bdat for example, and then try accessing the file. You can try a PROC DATASETS REPAIR (https://v8doc.sas.com/sashtml/proc/z0247721.htm) if you really need to get whatever data might be present.
The best solution will obviously be to correct whatever fault is causing your jobs to bomb out like this in the first place. No SAS program should ever leave .lck files lying about, even if it encounters errors - your jobs must actually be crashing the SAS environment itself, or perhaps they're being killed prematurely by another process. Simply accepting that this happens and trying to work around it is likely to just be storing up more problems for the future.
I have a data flow task that imports Excel files. I can't use a foreach loop to go through the Excel files as the metadata for each Excel file is completely different.
So in the data flow task I have 10 separate source files and use a Union All component to combine them, then import the result into SQL Server.
The problem I am facing now is that some of the Excel files I am importing might not exist, so when my package runs it fails because a file is missing. Is there any way to create a check that lets the package skip the source files that don't exist and run the rest?
I am using SSIS 2005.
Suggestion: if the file doesn't exist, then create it first.
Have an empty version of each source file somewhere, and in your control flow (before the data flow), check to see if the files exist, and if they don't, copy the blank files to the location of the real files.
This article explains how to implement a "does file exist" check in SSIS:
http://www.bidn.com/blogs/DevinKnight/ssis/76/does-file-exist-check-in-ssis
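For reference, the core of that check is just a Script Task that tests File.Exists and writes the result into a package variable for your precedence constraints to evaluate. A minimal sketch is below; the variable names User::ExcelFilePath and User::FileExists are assumptions, and note that the SSIS 2005 Script Task only supports VB.NET, so this C# version would need translating (it works as-is from SSIS 2008 onward).

// Inside the Script Task's ScriptMain. Configure ReadOnlyVariables:
// User::ExcelFilePath and ReadWriteVariables: User::FileExists (hypothetical names).
using System.IO;

public void Main()
{
    string path = (string)Dts.Variables["User::ExcelFilePath"].Value;

    // Flag that precedence constraints can test, e.g. @[User::FileExists] == true
    Dts.Variables["User::FileExists"].Value = File.Exists(path);

    Dts.TaskResult = (int)ScriptResult.Success;
}

Combined with the blank-file suggestion above, the "copy the blank file" path would run when the flag is false, so the data flow always finds something to read.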
I'm trying to load data from my database into an Excel file based on a standard template. The package is ready and it runs, throwing a couple of validation warnings stating that truncation may occur because my template has fields of a slightly smaller size than the DB columns I've mapped them to.
However, no data is getting populated into my Excel sheet.
No errors are reported, and when I click Preview on my OLE DB source, it shows me rows of results. None of these end up in my Excel sheet, though.
You should first make sure that you have data coming through the pipeline. In the arrow connecting your Source task to Destination task (I'm assuming you don't have any steps between), double click and you'll open the Data Flow Path Editor. Click on Data Viewer, then Add and click OK. That will allow you to see what is moving through the pipeline.
Something to consider with Excel is that it prefers Unicode data types to non-Unicode. Chances are your database collation is non-Unicode, so you might have to convert the values with a Data Conversion transformation.
Also, you may need to force the package to execute in the 32-bit runtime. Visual Studio is a 32-bit application, so the drivers you have visibility of at design time are 32-bit; if there is no 64-bit equivalent, the package will break when you run it. Right-click your project, click Properties, and under the Debugging settings change Run64BitRuntime to False.
You don't provide much information. Add a Data Viewer between your source and your Excel destination to see whether data is passing through. To do it, just double-click the data flow path, select Data Viewers, and then add a grid.
Run your package. If you see data, provide more details so we can help you.
Couple of questions that may lead to an answer:
Have you checked that data is actually passed through the SSIS package at run time?
Have you double checked your mapping?
Try converting within the package so you don't have the truncation issue
If you add some more details about what you're running, I may be able to give a better answer.
EDIT: Considering what you wrote in your comment, I'd definitely try the third option. Let us know if this doesn't solve the problem.
Just as an assist for anyone else running into this - I had a similar issue and beat my head against the wall for a long time before I found out what was going on. My export WAS writing data to the file, but because I was using a template file as the destination, and that template file had previous data that had been deleted, the process was appending the data BELOW the previously used rows. So, I was writing out three lines of data, for example, but the data did not start until row 344!!!
The solution was to select the entire spreadsheet in my template file, and delete every bit of it so that I had a completely clean sheet to begin with. I then added my header lines to the clean sheet and saved it. Then I ran the data flow task and...ta-daa!!! Perfect export!
Hopefully this will help some poor soul who runs into this same issue in the future!
Okay, I'll try to explain as well as I can... Quite a particular case.
Tools: SSIS 2008
We have a control flow that now needs to be triggered by an event: the presence of one or more files (1, 2, or 3).
The variables used:
BO_FileLocation_1
BO_FileLocation_2
BO_FileLocation_3
BO_FileName_1
BO_FileName_2
BO_FileName_3
There can be one, two, or three files, defined in the variables above. When the variables are filled in, the corresponding files should be processed. When they are empty (meaning there is only one file, for example), the process should ignore them and jump to the next (file watcher?) task.
For example:
BO_FileLocation_1 = "C:\"
BO_FileLocation_2 = NULL
BO_FileLocation_3 = NULL
BO_FileName_1 = "test.csv"
BO_FileName_2 = NULL
BO_FileName_3 = NULL
The report only needs one file.
I need a generic concept that checks for the presence of these files; it may need to be more generic than my SSIS knowledge can handle right now. That would be handy, for example, if a fourth file is added in the future. I was also thinking of working with a single script to handle all the logic.
Thanks in advance
A possibly irrelevant image:
If all you want is to trigger the Copy Source File task when one or more of the files is present, just use the OR constraint in your flow. The following image shows you how:
First connect all to the destination:
Then click one of the green arrows. This will make its properties window pop up. Select Logical OR instead of Logical AND:
If everything went well, you should now see the connections as dashed lines:
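If you would rather go down the single-script route you mentioned, a Script Task can also do the check generically across however many location/name variable pairs you have. The sketch below is only an illustration: it reads the BO_FileLocation_N / BO_FileName_N variables you listed, skips the empty ones, and writes the number of files actually found into a hypothetical User::FilesFound variable that later precedence constraints could test.

// Script Task ScriptMain.Main(). ReadOnlyVariables: the six BO_* variables,
// ReadWriteVariables: User::FilesFound (hypothetical Int32 variable).
using System.IO;

public void Main()
{
    int found = 0;

    // Check each location/name pair; raise the upper bound if a 4th file is added.
    for (int i = 1; i <= 3; i++)
    {
        string location = (Dts.Variables["User::BO_FileLocation_" + i].Value ?? "").ToString();
        string fileName = (Dts.Variables["User::BO_FileName_" + i].Value ?? "").ToString();

        // Empty values mean no file is expected in this slot, so skip it.
        if (location.Length == 0 || fileName.Length == 0)
            continue;

        if (File.Exists(Path.Combine(location, fileName)))
            found++;
    }

    Dts.Variables["User::FilesFound"].Value = found;
    Dts.TaskResult = (int)ScriptResult.Success;
}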
There are several possible solutions:
Create a sequence container and include all the file imports in it. Add Int32 variables RowCountFile1, RowCountFile2, and RowCountFile3 and leave them at 0 (the default value when you create an int variable). Add a Row Count transformation to each of the data flows, writing to the matching variable. Create a precedence constraint from the sequence container to the "Do something" task, set its evaluation operation to Expression and Constraint with the constraint on Success, and use the expression @[User::RowCountFile1] > 0 || @[User::RowCountFile2] > 0 || @[User::RowCountFile3] > 0. The advantage of this approach is that you can act as soon as the files are detected, you import all available files, and you only take the follow-up action after all the files have been imported. You could then schedule this SSIS package as a SQL Server Agent job step and run it as frequently as you want.
A variant on solution 1 is to use Foreach Loop containers with file enumerators inside the sequence container. This would be useful if you don't know the exact name of the file and you expect to import more than one under some circumstances. For instance, if you get a file every few minutes with a timestamp in its file name and your process doesn't run for some reason, then you may have to process multiple files to catch up and then take an action once that has been done.
You could use the File Watcher Task as you outlined in your question. The only problem I have with the File Watcher Task is that the package has to be in a constantly running state, which makes it harder to troubleshoot problems and performance. It can also introduce other issues; I remember having some problems with the File Watcher Task years ago when it first came out. It may well be a totally stable task now, but I prefer other methods after having been burned previously. If you really want the package to run continuously instead of being called by a job, then you could always use a Script Task to check for the file, sleep the thread if it isn't found, check again, and so on (a bare-bones sketch of that polling loop is at the end of this list). I'm sure that's essentially what the File Watcher Task does, but I would trust my own C# over the task. Power to anyone who has had better experiences with it than me.
Use PowerShell. If you just want to take an action when a file appears and you aren't importing the data, then a PowerShell script can do this just as well as an SSIS package. The drawbacks are that you have to learn some basic PowerShell, it may be hard to maintain in the future if PowerShell isn't your bread-and-butter language, and you may have to rewrite it as an SSIS package if you later want to import the data. You would probably call the PowerShell script from a SQL Server Agent job step, so scheduling can be handled pretty easily.
There are more options than what I listed, so let me know if you still want more suggestions.
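For what it's worth, here is roughly what the hand-rolled polling loop from the File Watcher alternative above could look like in a Script Task. This is only a minimal sketch under assumed names: User::WatchPath is a hypothetical variable holding the full path to wait for, and the timeout and polling interval are arbitrary.

// Script Task ScriptMain.Main(). ReadOnlyVariables: User::WatchPath (hypothetical).
// Poll for the file, sleeping between checks, until it appears or we give up.
using System;
using System.IO;
using System.Threading;

public void Main()
{
    string watchPath = (string)Dts.Variables["User::WatchPath"].Value;

    TimeSpan timeout  = TimeSpan.FromMinutes(30);   // give up after this long
    TimeSpan interval = TimeSpan.FromSeconds(15);   // wait between checks
    DateTime started  = DateTime.Now;

    while (!File.Exists(watchPath))
    {
        if (DateTime.Now - started > timeout)
        {
            // No file arrived in time; fail the task so the package can react.
            Dts.TaskResult = (int)ScriptResult.Failure;
            return;
        }
        Thread.Sleep(interval);
    }

    Dts.TaskResult = (int)ScriptResult.Success;
}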