Import text from a .txt file using keywords in random positions - vb.net

I'm new to this great platform and I have a question about VB.NET.
I would like to import data from a .txt file (or, if you prefer, a RichTextBox!) using keywords that can appear at any position within the file. For example, a file like this:
keyword 25
or like this:
      keyword        25
In both cases the application should recognise the line by the presence of the keyword and read the number (25) into a variable. Of course, this number can vary from file to file.
I was thinking of using code similar to this:
If line.StartsWith(keyword) Then
.....
End If
but the problem is that the keyword is not always the first thing on the line (there can be spaces before it), and I don't know which line of the file contains it.
I would also like to know how to get the number 25, which can also appear at any position after the keyword (but always on the same line).
I hope everything is clear; thanks in advance for any help.

You may consider using .TrimStart() on the lines as you read them, like so:
If line.TrimStart.StartsWith(keyword) Then
.......
End If
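The answer covers finding the line but not the second part of the question (reading the number). A minimal sketch of both steps is below, written in Python rather than VB.NET; the function name find_keyword_value is hypothetical, and it assumes the value is a non-negative integer that is the first run of digits after the keyword. In VB.NET the same regular expression can be used through Regex.Match and Regex.Escape from System.Text.RegularExpressions.

```python
import re
from typing import Iterable, Optional


def find_keyword_value(lines: Iterable[str], keyword: str) -> Optional[int]:
    """Return the first integer after `keyword` on the first line that starts with it.

    Leading whitespace is ignored (the same idea as TrimStart), and any
    spaces or other non-digit text may sit between the keyword and the number.
    """
    # ^\s*   : optional leading whitespace
    # keyword: escaped so characters like '.' or '(' are matched literally
    # \D*?   : lazily skip non-digit characters after the keyword
    # (\d+)  : capture the number itself
    pattern = re.compile(r"^\s*" + re.escape(keyword) + r"\D*?(\d+)")
    for line in lines:
        match = pattern.match(line)
        if match:
            return int(match.group(1))
    return None  # no line starts with the keyword followed by a number
```

Because a Python file object yields its lines one at a time, the function can be called directly on an open file, e.g. find_keyword_value(open(path, encoding="utf-8"), "keyword"). If the keyword may also appear in the middle of a line, dropping the ^\s* prefix and using pattern.search instead of pattern.match is one option, at the cost of also matching words that merely end with the keyword.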

Related

Cannot move a text object (variable) outside a function

I am trying to convert PDF credit card statements to text and then use regex to extract the date, amount, and vendor from each line. I can extract all the lines of text as they appear on the statement, but when I later access the variable holding the text, it only contains the last line.
I set the directory and read in the PDF credit card statement as "dfpdf".
I run this code:
import pdfplumber as plumb  # assumption: "plumb" is an alias for the pdfplumber package

with plumb.open(dfpdf) as pdf:
    pages = pdf.pages
    for page in pdf.pages:
        text = page.extract_text()
        global line
        for line in text.split('\n'):
            print(line)
This prints all the lines in the statement, which is what I want. But if I later try to use or print "line", all I get is the last line of the statement. Besides what is probably a really simple answer, I would also love a suggestion for a really good tutorial or class on using Python to convert PDFs and then using regex to build pandas DataFrames. Thanks to all of you out there who know what you're doing and take the time to help amateurs like me. Mark
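The behaviour described is how Python for loops work: "line" is rebound on every iteration, so after the loop it only holds the last value, and the global statement does not change that. One way around it is to append every line to a list. The sketch below does that and then applies a regex to each line; the function names and especially the statement layout "MM/DD VENDOR AMOUNT" are hypothetical, so the pattern will need adjusting to real statements.

```python
import re
from typing import Dict, Iterable, List, Optional


def collect_lines(page_texts: Iterable[Optional[str]]) -> List[str]:
    """Gather every line from every page into one list instead of one variable."""
    all_lines: List[str] = []
    for text in page_texts:
        if text:  # skip pages where no text was extracted (None or "")
            all_lines.extend(text.split("\n"))
    return all_lines


# Hypothetical layout, e.g. "08/11 ACME STORE 42.50" or "08/12 GAS CO 1,234.56".
TRANSACTION = re.compile(r"^(\d{2}/\d{2})\s+(.+?)\s+(-?[\d,]+\.\d{2})$")


def parse_transactions(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Turn matching lines into dicts; other lines (headers, totals) are skipped."""
    rows: List[Dict[str, object]] = []
    for line in lines:
        m = TRANSACTION.match(line.strip())
        if m:
            date, vendor, amount = m.groups()
            rows.append({"date": date, "vendor": vendor,
                         "amount": float(amount.replace(",", ""))})
    return rows
```

Inside the original with block, lines = collect_lines(page.extract_text() for page in pdf.pages) keeps every line, and pandas.DataFrame(parse_transactions(lines)) turns the list of dicts into a DataFrame with date, vendor, and amount columns.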

Select line after finding keyword

I want to write code that selects the line in a text file that contains the keyword it is searching for. I have no clue how to do this; what I found online either didn't help, was outdated, or was for another language. I need this code for VB.NET. Thank you.
Here is an example of what I mean.
Let's say we want to search for: SO11
And the file contains these lines:
(1) : HJ6
(2) : 46J
(3) : SO11
(4) : NTE
(5) : 4UJ
The line we're looking for is line 3. I want the code to select line 3 and store it in a String variable (Dim it as a String) so I can use it later.
Try breaking your question up into smaller chunks. You might be thinking too broadly.
For example:
1) Read the text file.
2) Loop through the file while reading it; if a line contains SO11, save that line to a variable.
3) Do stuff with that variable.
Give that a try and let us know how it goes.
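The three steps above can be sketched as follows, in Python rather than VB.NET; find_line is a hypothetical name, and it returns both the 1-based line number and the line text so either can be used afterwards. In VB.NET the same loop can be written over System.IO.File.ReadLines with String.Contains.

```python
from typing import Iterable, Optional, Tuple


def find_line(lines: Iterable[str], keyword: str) -> Optional[Tuple[int, str]]:
    """Return (line number, line text) for the first line containing keyword, else None."""
    # Steps 1 and 2: walk through the lines, counting from 1 like a text editor.
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")  # lines read from a file keep their newline
        if keyword in text:
            return number, text  # Step 3: the caller now has the line as a string
    return None
```

Note that "keyword in text" is a substring test, so searching for SO11 would also match a line such as SO110; comparing with == (after stripping) is the stricter alternative when the whole line must equal the keyword.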

Inconsistent line endings in SSIS Flat File import

I have a large, pipe-delimited text file with no text qualifiers, and it looks like whatever produced this file accidentally wrote spurious LF characters into the last column every few hundred rows.
The last column is a free-text description column, and it is not text-qualified in any way, even though it should be.
The file looks similar to this:
id|data|data|data|data|Description[LF]
id|data|data|data|data|Description[LF]
id|data|data|data|data|Description[LF]
id|data|data|data|data|Descr[LF]
iption[LF]
id|data|data|data|data|Description[LF]
id|data|data|data|data|Description[LF]
id|data|data|data|data|Descripti[LF]
on[LF]
id|data|data|data|data|Description[LF]
id|data|data|data|data|Description[LF]
id|data|data|data|data|Description[LF]
id|data|data|data|data|Description[LF]
id|data|data|data|data|D[LF]
escription[LF]
I'm pretty new to SSIS and SQL in general. Does anyone have any advice on how to fix this?
I did actually find a way to fix it in Notepad++, since I don't know C# and I don't know SSIS well enough.
The ID was 8 digits long and followed by 7 blank spaces. That pattern was absolutely unique within this file.
In Notepad++ I used the Extended search mode to replace "\n" (LF) with nothing.
Then, in Regular expression mode, I used this expression in Find:
(\d\d\d\d\d\d\d\d[[:blank:]][[:blank:]][[:blank:]][[:blank:]][[:blank:]][[:blank:]][[:blank:]])
to find all 8-digit numbers followed by 7 spaces, and for Replace I used this:
\r\n\1
to put a [CR][LF] in front of each of those 8-digit numbers.
Lo and behold, it worked!
Either way, my boss contacted the client and requested a better file. Now I get kudos, and we get proper data. Thanks for the advice, all!
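The same two-step Notepad++ repair can be scripted so it runs as a preprocessing step before the flat-file import. This is a sketch in Python, not an SSIS feature; repair_records is a hypothetical name, it relies on the post's assumption that an 8-digit ID followed by exactly 7 blanks occurs nowhere except at the start of a record, and unlike the Notepad++ version it also removes any CR characters in step 1.

```python
import re
from typing import List

# Assumption from the post: each record starts with an 8-digit ID followed by
# exactly 7 blanks (spaces or tabs, like [[:blank:]]), and that pattern is unique.
RECORD_START = re.compile(r"(\d{8}[ \t]{7})")


def repair_records(raw: str) -> List[str]:
    """Rejoin records that were split by stray line feeds."""
    # Step 1 (Notepad++ Extended mode): remove every line break.
    flat = raw.replace("\r", "").replace("\n", "")
    # Step 2 (Notepad++ regex mode): put a break back in front of each record start.
    rebuilt = RECORD_START.sub(r"\n\1", flat)
    # Splitting leaves an empty string before the first record; drop it.
    return [record for record in rebuilt.split("\n") if record]
```

Writing the result back out with "\r\n".join(records) gives a file with one record per line that the Flat File Source can read normally.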
If I had to guess, I would say this is happening because of how the file is created: your data probably just happens to include certain special characters that are being incorrectly interpreted as a line feed.
Check this site to see whether the data in your problem lines matches any of these encodings. If so, you ultimately have two options:
1) Create some elaborate and complicated ETL process to detect and correct the file data before you process it. This is inadvisable, as it will be a major pain to create and maintain.
2) Try changing the way this file is produced. Most text export wizards allow you to put quotes (") around text fields (text qualifiers), so that your import process can recognise a text block as a single value rather than as a series of encoded characters to interpret.
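To illustrate why option 2 helps: once the description column is text-qualified, a parser that honours qualifiers can keep an embedded line feed inside the field instead of treating it as the end of the record. The sketch below uses Python's csv module purely as an example of such a parser; read_qualified is a hypothetical name, and the sample data assumes double quotes as the qualifier.

```python
import csv
import io
from typing import List


def read_qualified(text: str) -> List[List[str]]:
    """Parse pipe-delimited text whose free-text fields are wrapped in double quotes."""
    # newline="" keeps line endings untranslated, as the csv module expects;
    # a line feed inside a quoted field then stays part of that field.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="|", quotechar='"')
    return list(reader)
```

With the quotes in place, 'id1|data|"Descr\niption"' comes back as a single row of three fields, with the line break preserved inside the description.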

How to use FILE_MASK parameter in FM EPS2_GET_DIRECTORY_LISTING

I am trying to filter files using the FILE_MASK parameter of EPS2_GET_DIRECTORY_LISTING to reduce the time spent searching all the files in the folder (it has thousands of files).
The file mask I tried:
TK5_*20150811*
The file name in the folder is:
TK5_Invoic_828243P_20150811111946364.xml.asc
But it exports all files to the DIR_LIST table, so nothing is filtered.
But when I try:
TK5_Invoic*20150811*
It works!
It seems to work only if I give the first 10 characters literally. But in my case I do not always know the first 10 characters.
Can you give me some advice on using FILE_MASK?
I haven't tried it, but this sounds plausible:
https://archive.sap.com/discussions/thread/3470593
The * wildcard may only be used at the end of the search string.
It is not specified, what a '*' matches to, when it is not the last non-space character in the FILE parameter value.
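If '*' is only reliable at the end of the mask, one possible workaround (not verified against EPS2_GET_DIRECTORY_LISTING) is to pass only the known prefix with a trailing '*', such as TK5_*, and then filter the returned names with the full pattern yourself. In ABAP that second filter could use the CP (covers pattern) comparison operator; the sketch below shows the same filtering logic in Python with fnmatch, using a hypothetical filter_names helper.

```python
import fnmatch
from typing import Iterable, List


def filter_names(names: Iterable[str], pattern: str) -> List[str]:
    """Keep only the names that match a shell-style pattern with '*' anywhere."""
    # fnmatchcase is case-sensitive on every platform (fnmatch.fnmatch is not on Windows).
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]
```

The server-side prefix mask still narrows the listing, so only the remaining names need to be checked against TK5_*20150811*.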

Handling newlines in a simple text import class

I have an input file whose lines I want to split with the String Split function, depending on the Type field. However, the Description field sometimes contains newlines, which breaks my file reader because it uses StreamReader's ReadLine() method.
Handled:
Type|Name|User|Description
Type|Name|User|Description
Unhandled:
Type|Name|User|Description line 1
Description Line 2
Type|Name|User|Description
Other than validating the Type field on each line and continuing to read until the next Type field appears, can anyone suggest a way to read this file properly?
My solution was to have the file's producer replace newline characters in the Description field with another unique character that I can later convert back. I'm still interested in solutions from the file reader's perspective, though.
I know I'm talking to myself a lot here, but I found another solution, which is to remove the line feeds, since the program that created the output file wrote out a carriage return for each line.
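The embedded newline breaks ReadLine() because .NET's StreamReader.ReadLine treats "\n", "\r" and "\r\n" all as line terminators. The carriage-return approach can be sketched as follows, in Python; records_from_cr is a hypothetical name, and it assumes records end in CR (or CRLF) while line breaks inside the Description field are bare LFs.

```python
from typing import List


def records_from_cr(raw: str, sep: str = "|") -> List[List[str]]:
    """Split a file whose records end in CR/CRLF but whose descriptions contain bare LFs."""
    # Removing every LF leaves CR as the only record terminator.
    # Note: the two halves of a wrapped description are glued together with no space.
    flat = raw.replace("\n", "")
    return [record.split(sep) for record in flat.split("\r") if record]
```

When reading the file in Python, open it with newline="" so the CR and LF characters reach this function untranslated; replacing "\n" with " " instead of "" is a small variation that keeps the words of a wrapped description apart.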
You could easily add a conditional check on whether the array returned by Split contains more than one element, which would indicate that it's a line you want to parse.
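The Split-length check can also be used to glue continuation lines back onto the record they belong to. A minimal sketch in Python, assuming continuation lines never contain the "|" separator; merge_continuations is a hypothetical name.

```python
from typing import Iterable, List


def merge_continuations(lines: Iterable[str], sep: str = "|") -> List[List[str]]:
    """Start a new record on lines containing the separator; append other lines to the last field."""
    records: List[List[str]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        fields = line.split(sep)
        if len(fields) > 1 or not records:
            records.append(fields)  # more than one element: a real record line
        else:
            # One element only: this is a continuation of the previous Description.
            records[-1][-1] += "\n" + line
    return records
```

Because the check happens after splitting, each line is split exactly once, and the rejoined Description keeps its original line break.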