How to get part of file name parsed to the columns in SSIS - sql

Trying to use this:
substring(@[User::v_Filename],37,3)
However, it seems substring can only handle 20 characters?
The file name looks like this:
D:\Projects\OTS\MYSSA Dashboard\Data\ATL_20150725Text.csv
All I want is the ATL portion.
But when SSIS moves to the next file, it may change to NYC or DAL; there are about 26 files to be processed, all from different regions.

First test the substring function on its own in an expression, without your file variable.
Example - create a variable with the expression, then click "evaluate":
substring("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 25, 2)
It will work fine.
Given that the substring function works fine, there must be something wrong with your [User::v_Filename] variable. Are you sure it is being set correctly? Try running BIDS with the debugger on and a breakpoint set right after you assign the filename, and verify that it is indeed being set correctly.
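As a side note, a fixed offset only works while every path has the same length. Here is a minimal Python sketch (not SSIS expression syntax) of the alternative idea: take the file name without its folder, then keep everything before the first underscore. The helper names `region_from_path` and `ssis_substring` are made up for illustration, and `ssis_substring` only mimics the 1-based start position of SUBSTRING. It assumes the variable holds exactly the path shown in the question.

```python
import ntpath  # Windows path rules, usable on any OS


def region_from_path(path: str) -> str:
    """Return the text before the first underscore of the file name."""
    filename = ntpath.basename(path)  # e.g. "ATL_20150725Text.csv"
    return filename.split("_", 1)[0]


def ssis_substring(text: str, start: int, length: int) -> str:
    """Mimic a 1-based SUBSTRING(text, start, length) (illustrative helper)."""
    return text[start - 1:start - 1 + length]


path = r"D:\Projects\OTS\MYSSA Dashboard\Data\ATL_20150725Text.csv"

region = region_from_path(path)          # independent of folder depth
at_37 = ssis_substring(path, 37, 3)      # what offset 37 yields for this exact path
at_38 = ssis_substring(path, 38, 3)      # the region code starts one character later
```

For this exact path, offset 37 lands on the backslash before ATL, so if the variable really holds that string, the start position would need to be 38. Splitting on the file name avoids counting characters at all.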

Related

Use Pentaho Variable to Dynamically name EXCEL file

I am trying to dynamically name an excel file after processing it for archiving purposes.
If I process Logistics.xlsx I want to save it as U:\Archive\${varDP}.xlsx
Resulting file name U:\Archive\20190709.xlsx
I have tried Get System Info to get the date, and this works fine: I have created the field (DateProcessed). However, I am unable to use Set Variables to set varDP to DateProcessed.
Thank you
You cannot set and use a variable in the same transformation. If you want to use a variable you should have a job with two transformations: first transformation gets the date and sets the variable; second transformation can then use the variable.
The main reason for that is that all steps initialise at the same time. Therefore, when the variable is read by the step that is using it, it's probably not set yet.
For these cases of variable usage and parameter passing, I've been pointing people to a previous answer. It links to another answer of mine where I go step by step through how to pass parameters to another transformation without 'Set Variables', and that linked answer includes a downloadable example.

Snakemake: Pull sample-specific information from config sheet

In my workflow, I have a sample sheet that contains all the samples that are supposed to be analysed + the path where to find input files + the reference genome that is supposed to be used. All of this is sample-specific.
In my config file, I have a list of reference genomes and for each of them a list of paths of files depending on the tool.
In the rule that performs the alignment of each sample, I need to load some of those files but in a sample-specific way because the reference genome might not be the same for all samples.
Here is how I tried to solve this:
params: reference=lambda wildcards: table_samples['reference'][wildcards.sample],
        chrom_sizes=config[reference]['chrom_sizes']
However, when I try to run it like this, I get an error (directly when running Snakemake) saying that reference in the line of chrom_sizes=... is not defined.
Does anybody have an idea of a workaround?
EDIT: Some more information because I guess it's not really clear what I meant. Here is the relevant part of my config file.
hg19:
  bwa: 'path/to/hg19/bwa/reference'
  samtools: 'path/to/hg19/samtools/reference'
  chrom_sizes: '...'
mm9:
  bwa: 'path/to/mm9/bwa/reference'
  samtools: 'path/to/mm9/samtools/reference'
  chrom_sizes: '...'
And here is an example of the sample sheet.
name     path             reference
sample1  path/to/sample1  mm9
So, in the line reference=lambda wildcards: table_samples['reference'][wildcards.sample] I load the respective reference to be used for the current sample. Then, in chrom_sizes=config[reference]['chrom_sizes'] I need to use reference as a variable to get chrom_sizes for the correct reference genome.
I hope this makes it a bit more clear.
This is probably an ugly solution but should work.
params:
    reference = table_samples['reference']['{sample}'],
    chrom_sizes = config[table_samples['reference']['{sample}']]['chrom_sizes']
You were defining a variable under params and attempted to pass its value within params itself; I'm not sure Snakemake can do that.
You forgot to put quotes around the reference key. Like you write it, Python interprets it as a variable.
chrom_sizes=config['reference']['chrom_sizes']
Alright, taking the information from your comments I was able to make it work. I just had to modify them a little.
As I added to my original post, I actually needed reference to be a variable in order to pull the information for every sample individually.
As @JeeYem suggested, I tried to do the following:
chrom_sizes = config[table_samples['reference']['{sample}']]['chrom_sizes']
However, it seems not to be possible to use {sample} in this context. Instead, I changed it like this:
chrom_sizes = lambda wildcards: config[table_samples['reference'][wildcards.sample]]['chrom_sizes']
For now, it works! Thanks to everyone for the contributions!
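For anyone wondering why the lambda version works, here is a minimal plain-Python sketch of the lookup chain, runnable outside Snakemake. The `config` and `table_samples` dictionaries are simplified stand-ins for the real config file and sample sheet, the chrom_sizes values are placeholders (the real paths are not given above), and `SimpleNamespace` merely imitates the wildcards object that gets passed to the function.

```python
from types import SimpleNamespace

# Stand-in for the parsed config file; chrom_sizes values are placeholders.
config = {
    "hg19": {"chrom_sizes": "HG19_CHROM_SIZES_PLACEHOLDER"},
    "mm9": {"chrom_sizes": "MM9_CHROM_SIZES_PLACEHOLDER"},
}

# Stand-in for the sample sheet: column name -> {sample name -> value}.
# "sample2" is a made-up second sample for illustration.
table_samples = {"reference": {"sample1": "mm9", "sample2": "hg19"}}

# Same expression as in the rule: evaluated later, once per sample,
# when the wildcards for that job are known.
chrom_sizes = lambda wildcards: config[table_samples["reference"][wildcards.sample]]["chrom_sizes"]

# Imitate the call made for a job whose sample wildcard is "sample1".
resolved_sample1 = chrom_sizes(SimpleNamespace(sample="sample1"))
resolved_sample2 = chrom_sizes(SimpleNamespace(sample="sample2"))
```

The point is that nothing is looked up when the rule is parsed. The literal string '{sample}' is never a key of the sample table, whereas `wildcards.sample` holds the actual sample name by the time the function is called.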

Extracting information from a file variable in d3 pick basic

I have a file variable in d3 pick basic and I am trying to figure out what file it corresponds to.
I tried the obvious thing which was to say:
print f *suppose the file variable's name is f in this case
but that didn't work, because:
SELECTION: 58[B34] in program "FILEPRINTER", Line 7: File variable used
where string expression expected.
I also tried things like:
list f *didn't compile
execute list dict f *same error
execute list f *same error
but those also did not work.
In case anyone is wondering, the reason I am trying to do this in the first place is that a global variable is passed up and down the code base I am working with, and I can't find where it gets its value from.
That file pointer variable is called a "file descriptor". You can't get any information from it.
You can use the file-of-files to log Write events, and after a Write is performed by the code, check to see what file was updated. The details for doing this would be a bit cumbersome. You really should rely on the Value-Add Reseller or contract with competent assistance for this.
If this is not a live end-user system, you can also modify an item getting written with some very unique text like "WHAT!FILE!IS!THIS?". Then you can do a Search-System command to search the entire account (or system) to find that text. See docs for proper use of that command.
This is probably the best option... Inject the following:
IF @USER = "CRISZ" THEN ; * substitute your user ID
   READU FOO FROM F,"BLAH" ELSE
      DEBUG
      RELEASE F,"BLAH"
   END
END
That code will stop only for one person - for everyone else it will flow as normal. When it does stop, use the LIST-LOCKS command to see which file has a read lock for item "BLAH". That's your file! Don't forget to remove and recompile the code. Note that recompiling code while users are actively using it results in aborts. It's best to do this kind of thing after hours or on a test system.
If you can't modify the code like that, diagnostics like this can be difficult. If the above suggestions don't help, this challenge may be beyond your current level of experience, and I recommend you get some help.
If a suggestion here does help, please flag this as the answer. :)

How to parse a date from an SSIS Excel filename

I want to use the Foreach container to iterate through a folder matching something like: "Filename_MMYYYY.xls". That's easy enough to do, but I can't seem to find a way to parse the MMYYYY from the filename and add it to a variable (or something) that I can use as a lookup field for my DimDate table. It seems possible with a flat file data source, but not with an Excel connection. I'm using Visual Studio 2005. Please help!
Do I understand correctly that you want to take your filename, deconstruct it, and get a date-typed variable out of it? If so, then you need to start with the filename variable that you get from the Foreach Loop - I'll call that variable @FileName.
First, make a new variable - @FileDate - as a DateTime type. Go to its properties window (F4), and set the EvaluateAsExpression property to True. Edit the expression, and type in something like this (you may need to tweak the offsets, for example if the variable holds a full path rather than just the file name):
(DT_DBTIMESTAMP)(SUBSTRING(@FileName, 12, 4) + "-" + SUBSTRING(@FileName, 10, 2) + "-01")
Now, if you want to take that date value and use it in your Data Flow, you can just use it straight in a Derived Column transform, or in an expression on your Lookup SQL statement, or wherever.
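To sanity-check the offsets before putting them into the package, here is a small Python sketch of the same parsing logic. It assumes the variable holds only the file name in the "Filename_MMYYYY.xls" pattern from the question; `file_date` is a made-up helper name, and "Filename_072015.xls" is just a sample value.

```python
import re
from datetime import date


def file_date(filename: str) -> date:
    """Parse the MMYYYY suffix of 'Something_MMYYYY.xls' into the first day of that month."""
    match = re.search(r"_(\d{2})(\d{4})\.xls$", filename)
    if match is None:
        raise ValueError(f"no _MMYYYY.xls suffix in {filename!r}")
    month, year = int(match.group(1)), int(match.group(2))
    return date(year, month, 1)


sample = "Filename_072015.xls"  # sample value, not from a real folder
parsed = file_date(sample)

# The 1-based offsets 12 (length 4) and 10 (length 2) map to these slices:
year_part = sample[11:15]
month_part = sample[9:11]
```

Matching on the suffix rather than on fixed positions keeps working when the prefix before the underscore changes length, which fixed SUBSTRING offsets do not.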

SSIS save string variable to text file

It seems like it should be simple, but so far I haven't found a way to save the value stored in an SSIS string variable to a text file. I've looked at using the flat file destination inside a data flow, but that requires a data flow source.
Any ideas on how to do this?
Use a script task.
I just tried this. I created a File connection manager, with the connection string pointing to the file I wanted to write to. I then created a string variable containing the text to write.
I added a Script Task, specified my string variable in the Read Only Variables list, then clicked Edit Script. The script was as follows:
public void Main()
{
    ConnectionManager cm = Dts.Connections["File.tmp"];
    var path = cm.ConnectionString;
    var textToWrite = (string)Dts.Variables["User::StringVariable"].Value;
    System.IO.File.WriteAllText(path, textToWrite);
    Dts.TaskResult = (int)ScriptResults.Success;
}
This worked with no problems.
Here's a little sample of some code that worked in a SQL CLR in C#. You'll need to use VB if you're on 2005 I believe. The script task also needs the read variable property set to MyVariable to make the value of your variable available to it.
// create a writer and open the file
TextWriter tw = new StreamWriter("\\\\server\\share$\\myfile.txt");
// write a line of text to the file
tw.WriteLine(Dts.Variables["MyVariable"].Value);
// close the stream
tw.Close();
All it takes is one line of code in a simple Script task. No other dependencies, such as a connection manager, are needed.
Here's what it would look like in C#:
public void Main()
{
    string variableValue = Dts.Variables["TheVariable"].Value.ToString();
    string outputFile = Dts.Variables["Path"].Value.ToString();
    System.IO.File.WriteAllText(outputFile, variableValue);
    Dts.TaskResult = (int)ScriptResults.Success;
}
Obviously the most important line here is the one containing the WriteAllText function call.
The Path variable should contain a full path + filename for the output file.
OK, I have an answer that doesn't involve a Script task. Pick a simple OLE DB SQL source that you have a lot of control over. Make a query that returns only one row. Then put this query in a string variable:
"select vara, ' var =: " + @[User::varIWantToSee] + "' as myvar from tablea where vara = 1"
Then in OLEDB source pick "SQL command from a variable"
Make sure you initialize varIWantToSee with a long string of characters; otherwise SSIS gives that column a very small length that it won't let you override. At run time varIWantToSee gets set and you can see its value. Pump all of this into a flat file destination and you are in business.
Why do some people have to do this? Because they need to know the values of variables in the runtime environment, and development on their laptop doesn't show the values they need. In my case I was running this in an Azure environment that had the database access I needed to test. If I were Microsoft, I would create a task that shows the runtime value of a variable at that stage of the job by writing it to the SSIS log file created when the package runs. If someone knows how to do that, please enlighten us.
It's possible to use a Derived Column transformation to write the value of a variable into a column. The problem is that it needs a source to drive it, and there's no stock data source you can use that just spits out a null row onto the pipeline.
So, either you repurpose a single-row source to drive the derived column transformation, or you do what another answer suggests, and do it with a Script source.
I did it the way you described. I already had an OLE DB connection manager defined, so I used an OLE DB Source with the SQL Command data access mode. I used a simple query:
select getdate() as dt
...just to get it out of the way. Now I know the date of my variable pull. Then I used a Derived Column Transform to make my package variables available and wrote it out to a flat file.
Elegant? No, but it gets the job done.
Let's say you don't want to mess with Script tasks and you don't have a database you can connect to just to issue a data source command like:
SELECT 'Some arbitrary text'
There are still several ways to use an Execute Process task for something as simple as writing a line of text to a file. For example, you can run PowerShell with its StandardInputVariable set to a variable built using the following expression:
"'"+REPLACE(@[User::Text],"'","''")+"' > '"+REPLACE(@[User::Filename],"'","''")+"'"
Notice I escaped the filename because single quotes are legal there. Also note I used '>' for redirecting which overwrites the file if it exists. If I wanted to append I'd use '>>'.
Initially I had trouble with this method when User::Text contained multiple lines. It turns out you need some extra EOL characters after your filename when a command spans lines. Like this:
"'"+REPLACE(@[User::Text],"'","''")+"' > '"+REPLACE(@[User::Filename],"'","''")+"'\r\n\r\n"
Using cmd.exe with echo is a bit more precarious but can also work in certain circumstances and has much less overhead.
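To make the quoting in the PowerShell expressions above easier to follow, here is a rough Python sketch that builds the same command string. It only illustrates the string construction: nothing here calls PowerShell, and `ps_quote` and `build_stdin` are made-up helper names.

```python
def ps_quote(value: str) -> str:
    """Wrap a value in single quotes, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_stdin(text: str, filename: str, multiline: bool = False) -> str:
    """Build the '<text>' > '<file>' command fed to PowerShell on standard input."""
    command = ps_quote(text) + " > " + ps_quote(filename)
    if multiline:
        # Extra EOLs so a command that spans several lines is terminated.
        command += "\r\n\r\n"
    return command


single = build_stdin("it's done", "C:\\out.txt")
multi = build_stdin("line one\r\nline two", "C:\\o'brien.txt", multiline=True)
```

Doubling the quote is what keeps a value like "it's done" from ending the literal early; swapping " > " for " >> " would append instead of overwrite, as noted above.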
P.S. I've noticed with some versions of PowerShell that StandardInputVariable content is ignored without this:
-Command -
in the Arguments box. A lone minus sign as a Command argument is 'magic' and documented at https://learn.microsoft.com/en-us/powershell/scripting/powershell.exe-command-line-help. I believe all versions of PowerShell accept this param so even if it's not required for your version you may want to include it since it shouldn't break anything and may keep your code from breaking if PowerShell is updated to a version that requires it.