Inconsistent line endings in SSIS Flat File import - sql

I have a large, pipe-delimited text file with no text qualifiers, and it looks like whatever produced this file accidentally emitted spurious "LF" markers in the last column every few hundred rows.
The last column is a descriptive column, and it is not text qualified in any way, as it should be.
The file looks similar to this:
id|data|data|data|data|Description[LF]
id|data|data|data|data|Description[LF]
id|data|data|data|data|Description[LF]
id|data|data|data|data|Descr[LF]
iption[LF]
id|data|data|data|data|Description[LF]
Id|data|data|data|data|Description[LF]
id|data|data|data|data|Descripti[LF]
on[LF]
id|data|data|data|data|Description[LF]
id|data|data|data|data|Description[LF]
id|data|data|data|data|Description[LF]
id|data|data|data|data|Description[LF]
id|data|data|data|data|D[LF]
escription[LF]
I'm pretty new to SSIS and SQL in general. Does anyone have any advice on how to fix this?

I did actually find a way to fix it in Notepad++, because I don't know C# and I don't know SSIS well enough.
The ID was 8 digits long and followed by 7 blank spaces. That pattern was absolutely unique to this file.
In Notepad++ I used Find in Extended mode to search for "\n" (LF) and replace it with nothing.
Then I used this regular expression for Find:
(\d\d\d\d\d\d\d\d[[:blank:]][[:blank:]][[:blank:]][[:blank:]][[:blank:]][[:blank:]][[:blank:]])
to find all 8-digit numbers with 7 trailing spaces, and for Replace I used this:
\r\n\1
to put a [CR][LF] in front of those 8-digit numbers.
Lo and behold, it worked!
Either way, my boss contacted the client and is requesting a better file. Now I get kudos, and we get proper data. Thanks for the advice, all!
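The two Notepad++ steps above can also be sketched in Python, for files that are too large or arrive too often to fix by hand. This is a minimal sketch and not part of the original fix: the function name `repair_records` and the sample rows are hypothetical. It assumes, as described above, that every real record starts with an 8-digit ID followed by 7 blanks and that this pattern never appears inside a description.

```python
import re

# Assumption taken from the post: a record starts with 8 digits followed by
# 7 blanks (space or tab), and that pattern never occurs mid-description.
ID_PATTERN = re.compile(r"(\d{8}[ \t]{7})")


def repair_records(text):
    """Rejoin records that were split by stray line breaks."""
    # Step 1: drop every line break, stray or genuine.
    joined = text.replace("\r", "").replace("\n", "")
    # Step 2: put a line break back in front of each record ID.
    marked = ID_PATTERN.sub(r"\n\1", joined)
    # The first ID also gets a leading break, so discard empty pieces.
    return [record for record in marked.split("\n") if record]


if __name__ == "__main__":
    pad = " " * 7  # the seven trailing blanks after each ID
    sample = (
        "00000001" + pad + "|a|b|c|d|Description\n"
        "00000002" + pad + "|a|b|c|d|Descr\n"
        "iption\n"
    )
    for record in repair_records(sample):
        print(record)
```

Like the Notepad++ version, this reads the whole file into memory, which is fine for a one-off repair but may need a streaming variant for very large files.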

If I had to take a guess, I would say that this is occurring because of how the file is created. Your data probably includes certain special characters that are being incorrectly interpreted as a line feed.
Check this site to see if the data within your problem lines match any of these encodings. If this is the case then ultimately you have two options available:
1) Create some elaborate and complicated ETL process to detect and correct the file data before you process it. This is inadvisable as it will be a major pain to create and maintain.
2) Try changing the way this file is produced. Most text export wizards will allow you to place quotes (") around text items so that your import process can quickly detect something as a text block as opposed to a series of encoded characters to interpret.
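To illustrate why option 2 helps, here is a small Python sketch using the standard `csv` module. It is only an illustration of text qualifiers in general, not of SSIS or of the client's export tool, and the helper names `write_rows` and `read_rows` are hypothetical. Once a field is wrapped in quotes, a line break inside it is read back as part of the field rather than as the end of the record.

```python
import csv
import io


def write_rows(rows):
    """Write pipe-delimited text, quoting only the fields that need it."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, delimiter="|", quotechar='"',
                        quoting=csv.QUOTE_MINIMAL)
    writer.writerows(rows)
    return buffer.getvalue()


def read_rows(text):
    """Read the text back; quoted line breaks stay inside their field."""
    source = io.StringIO(text, newline="")
    return list(csv.reader(source, delimiter="|", quotechar='"'))


if __name__ == "__main__":
    rows = [["1", "data", "Descr\niption"], ["2", "data", "Description"]]
    exported = write_rows(rows)
    print(exported)
    print(read_rows(exported) == rows)
```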


Pentaho - Spoon Decimal from Text File Input

I'm new to Pentaho and have a little problem with the Text file Input.
Currently I have to have several data records written to a database. In the files, the decimal numbers are separated by a point.
Pentaho is currently transforming the number 123.3659 € to 12.33 €.
Can someone help?
When you read the file, do you read it as a CSV, Excel or something like that? If that's the case, then you can specify the format of the column so the number is interpreted correctly (I think; I'm talking from memory now). Or maybe playing with the language of the file might work.
If it's a file containing a string, you can use some step like the string operator to replace the point with a comma.
This problem might have various causes, but I think you can solve it by following these steps.
-First, add a "Replace in String" step;
-Then search for the dot and replace it with nothing as I show in the following image, or with a comma if the number you show is a float;
Example snip
Hope this helped!
Give feedback if so!
Have a good day!

Multi-line text in a .env file

In Vue, is there a way to have a value span multiple lines in a .env file? For example:
Instead of:
someValue=[{"someValue":"Here is a really really long piece which should be split into multiple lines"}]
I want to do something like:
someValue=`[{"someValue":"Here is a really
really long piece which
should be split into multiple lines"}]`
Doing the latter gives me a JSON parsing error if I try to do JSON.parse(someValue) in my code.
I don't know if this will work, but I can't format a comment appropriately enough to get the point across so see if this will work:
someValue=[{"someValue":"Here is a really\
really long piece which\
should be split into multiple lines"}]
Where "\" should escape the newline similar to how you can write long bash commands while escaping the newline. I'm not certain the .env interpreter will support it though.
EDIT
Looks like this won't work. This syntax was actually proposed, but I don't think it was incorporated. See motdotla/dotenv#333 (which is what Vue uses to parse .env).
Like @zero298 said, this isn't possible. You could instead delimit the entry with a character that wouldn't normally show up in the text (^ is a good candidate), then restore the line breaks within the application using string.replace(/\^/g, '\n'); a plain string pattern such as '^' would replace only the first occurrence.
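The placeholder idea can be sketched as follows. This is a Python illustration of the order of operations only, not Vue or dotenv code, and the names `PLACEHOLDER`, `restore_breaks` and `load_multiline_json` are hypothetical. JSON does not allow a raw line break inside a string, so the sketch parses the value first and swaps the placeholder for a line break afterwards; the same ordering would apply around JSON.parse.

```python
import json

PLACEHOLDER = "^"  # hypothetical choice; it must never occur in the real text


def restore_breaks(value):
    """Recursively replace the placeholder with a line break in all strings."""
    if isinstance(value, str):
        return value.replace(PLACEHOLDER, "\n")
    if isinstance(value, list):
        return [restore_breaks(item) for item in value]
    if isinstance(value, dict):
        return {key: restore_breaks(item) for key, item in value.items()}
    return value


def load_multiline_json(raw):
    """Parse the single-line JSON value, then restore the line breaks."""
    return restore_breaks(json.loads(raw))


if __name__ == "__main__":
    raw = '[{"someValue":"Here is a really^really long piece"}]'
    print(load_multiline_json(raw))
```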

Import text from a .txt file using keywords in random positions

I'm new to this great platform and I have a question about Visual Basic .NET.
I would like to import data from a txt file (or if you prefer a richtextbox!) using keywords that can be placed in a random position within the txt file. For example a txt like this:
keyword 25
or like this:
keyword 25
In both cases the application should be able to recognise the line because of the presence of the keyword and get the number (25) that will be saved in a variable. Of course this number can vary in different files.
I was thinking to use a code similar to this one:
If line.StartsWith(keyword) Then
.....
End If
but the problem is that the keyword is not always the first character on the line (there can be spaces before it), and I don't know which line of the txt file contains the keyword.
I would also like to ask how to get the number 25, which can also be placed in a random position after the keyword (but always on the same line).
I hope everything is clear and thanks if you can help me.
You may consider using .TrimStart() on the lines as you read them, like so:
If line.TrimStart.StartsWith(keyword) Then
.......
End If
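The answer above covers finding the line but not extracting the number. The sketch below shows one way to do both with a regular expression. It is written in Python rather than VB.NET, so treat it as an outline of the logic only; the function name `find_keyword_value` is hypothetical, and it assumes the number is a plain unsigned integer and is the first run of digits after the keyword.

```python
import re


def find_keyword_value(lines, keyword):
    """Return the first integer that follows the keyword, or None."""
    # Optional leading whitespace, the literal keyword, any non-digits,
    # then the digits to capture.
    pattern = re.compile(r"^\s*" + re.escape(keyword) + r"\D*(\d+)")
    for line in lines:
        match = pattern.match(line)
        if match:
            return int(match.group(1))
    return None


if __name__ == "__main__":
    sample = ["some other line", "     keyword        25"]
    print(find_keyword_value(sample, "keyword"))
```

The same pattern should carry over to .NET's Regex class, since it uses only constructs common to both regex flavours.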

Flat File Schema lines longer than expected

Hello there Stack Overflow, I've been tasked with making a flat file schema as well as a map. However, our specification says that there are 3 fields:
----------
Name       Length
----------
TIdentity     2
OIdentity     17
Result        2
However, each line of the file that we receive is 500(ish) characters long. Is there a way to make it ignore the remaining empty characters?
Thanks for any help you guys might be able to supply
You should definitely ensure the spec and sample files are correct (particularly that the spec contains any whitespace requirements/options), but assuming they are and you're just supposed to ignore the whitespace, you can create a node to stuff the whitespace into and just ignore it.
Without knowing a bit more about your requirements, it's hard to say exactly how this should work. If the whitespace is always a fixed length, make a node that expects that many characters. If it's not always a fixed length, you may have to make a repeating node that's one character long but not the record terminator (presumably CR/LF or something of the like). If the whitespace itself is the delimiter, you might be able to do something with the ignore_trailing_delimiter on the record.
Worst case scenario (whitespace is variable, you can't control the partner who sends it to you, and you can't get the FFDASM to sensibly deal with it), write a custom Decode component to preprocess the file and remove the extraneous whitespace.
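As a rough illustration of what such a preprocessing step would need to do, the Python sketch below slices the three positional fields out of each line and discards whatever follows. It is not BizTalk pipeline code: the field names and widths come from the table in the question, while `FIELDS` and `parse_record` are hypothetical names.

```python
# Field names and widths from the specification in the question.
FIELDS = [("TIdentity", 2), ("OIdentity", 17), ("Result", 2)]


def parse_record(line):
    """Slice the fixed-width fields and ignore everything after them."""
    record = {}
    offset = 0
    for name, width in FIELDS:
        record[name] = line[offset:offset + width]
        offset += width
    return record


if __name__ == "__main__":
    # 21 meaningful characters followed by padding up to roughly 500.
    line = "AB" + "12345678901234567" + "OK" + " " * 479 + "\r\n"
    print(parse_record(line))
```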

Reading blocks of text from a CSV file - vb.net

I need to parse a CSV file with blocks of text being processed in different ways according to certain rules, e.g.
userone,columnone,columntwo
userthirteen,columnone,columntwo
usertwenty,columnone,columntwo
customerone,columnone
customertwo,columntwo
singlevalueone
singlevaluetwo
singlevalueone_otherruleapplies
singlevaluethree_otherruleapplies
Each block of text will be grouped so the first three rows will be parsed using certain rules and so on. Notice that the last two groups have only one single column but each group must be handled in a different way.
I have the chance to propose the file format to the customer, so I'm thinking of proposing the following.
[group 1]
userone,columnone,columntwo
userthirteen,columnone,columntwo
usertwenty,columnone,columntwo
[group N]
rowN
That is, sections like those in the INI files from some years ago. However, I'd like to hear your comments because I think there must be a better way to handle this.
I proposed to use XML but the customer prefers the text files.
Any suggestions are welcome.
m0dest0.
Ps. using VB.net and VS 2008
You can use regular expressions with capturing groups, choosing the RegexOptions value to suit the data: a line-oriented mode if each line has the same format, or a multi-line mode if a pattern is not constrained to a single line. To match across several lines, include \n (line feed) in your pattern, or \r\n if the file uses carriage return plus line feed endings; if the pattern sits on a single line, you don't need to include it.
VB.NET, like many other modern programming languages, has extensive support for grouping operations. You can use indexed groups or named groups.
Each named group, such as header1 or whatever you want to call it, is written in this format: (?<myname>...)
See this link for more info: How do I access named capturing groups in a .NET Regex?.
Good luck.
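For the INI-style layout proposed in the question, a named group is enough to recognise the section headers. The sketch below is in Python rather than VB.NET, so it only outlines the logic, and the names `HEADER` and `parse_sections` are hypothetical. Note that Python writes a named group as (?P<name>...), whereas .NET writes it as (?<name>...).

```python
import re

# A header line such as "[group 1]"; the named group captures the label.
HEADER = re.compile(r"^\[(?P<name>[^\]]+)\]$")


def parse_sections(text):
    """Group comma-separated rows under the section header that precedes them."""
    sections = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        header = HEADER.match(line)
        if header:
            current = header.group("name")
            sections[current] = []
        elif current is not None:
            sections[current].append(line.split(","))
    return sections


if __name__ == "__main__":
    sample = "[group 1]\nuserone,columnone,columntwo\n[group N]\nrowN\n"
    print(parse_sections(sample))
```

Each group's rows can then be handed to whichever rule applies to that section name.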