VB.net String Extraction - vb.net

At the company I work at, I am developing software in VB.NET. It uses a WebBrowser control to load an Excel file that the employee can modify. It then saves a copy of the Excel file as an Excel file for future modification, saves it as a PDF file to send to the customer, and then prints the first page twice. I am trying to create a quote list. Quote file names are structured as follows...
12345 My Company Name Here 10-25-2013.pdf
Is there any way to "extract" just the "My Company Name Here" in the above example? I tried removing all numbers, and then the - and .pdf from the string, but that actually makes fewer results appear in the ListView control. Any ideas?
Dim di As New IO.DirectoryInfo("Z:\Quotes\" & Today.Year & "\" & Today.Month _
& " " & MonthName(Today.Month))
Dim diar1 As IO.FileInfo() = di.GetFiles("*.pdf")
Dim dra As IO.FileInfo
ListView1.View = View.Details
ListView1.Columns.Clear()
ListView1.Columns.Add("Quote Number")
ListView1.Columns.Add("Customer Name")
ListView1.Columns(0).Width = -2
ListView1.Columns(1).Width = -2
For Each dra In diar1
If dra.ToString.Contains("Product") = False Or dra.ToString.Contains("Thumbs.db") Then
Dim newIrm() = dra.ToString.Split(" ")
Dim NumericCharacters As New System.Text.RegularExpressions.Regex("\d")
Dim nonNumericOnlyString As String = NumericCharacters.Replace(newIrm(2), String.Empty)
ListView1.Items.Add(New ListViewItem({newIrm(0), newIrm(1) & newIrm(2)}))
End If
Next
Filename Format:
Z:\Quotes\2013\10 October\12345-RR My Company Name Here 10-25-2013.pdf

Assuming that the company name is always surrounded by spaces and that none of the surrounding text contains any spaces, you can use IndexOf and LastIndexOf. Sample code:
Dim input As String = "Z:\Quotes\2013\10 October\12345-RR My Company Name Here 10-25-2013.pdf"
Dim companyName As String = System.IO.Path.GetFileNameWithoutExtension(input)
companyName = companyName.Substring(companyName.IndexOf(" "), companyName.LastIndexOf(" ") - companyName.IndexOf(" ")).Trim()
If these conditions do not fully apply, you would need to describe the constraints clearly so that this code can be updated. Without constraints that are applied systematically, there is no way to deliver an accurate solution to this problem.
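To plug the IndexOf/LastIndexOf idea into the question's two-column ListView, it can help to return the quote number and the company name together. The sketch below is a hypothetical helper (SplitQuoteFileName is not from the original answer) that works on the bare file name, so spaces in folder names such as "10 October" are ignored. Like the answer, it assumes the quote number and the date never contain spaces.

```vbnet
Imports System
Imports System.IO

Module QuoteListHelper
    ' Hypothetical helper: returns {quoteNumber, companyName} for a quote file
    ' such as "12345-RR My Company Name Here 10-25-2013.pdf".
    ' Assumes the quote number and the date never contain spaces.
    Public Function SplitQuoteFileName(filePath As String) As String()
        ' Strip the folder and ".pdf" so spaces in folder names do not interfere.
        Dim baseName As String = Path.GetFileNameWithoutExtension(filePath)
        Dim firstSpace As Integer = baseName.IndexOf(" "c)
        Dim lastSpace As Integer = baseName.LastIndexOf(" "c)
        If firstSpace < 0 OrElse lastSpace <= firstSpace Then
            ' Not in the expected format: keep the whole name as the quote number.
            Return New String() {baseName, String.Empty}
        End If
        Dim quoteNumber As String = baseName.Substring(0, firstSpace)
        ' Everything between the first and the last space is the company name.
        Dim companyName As String = baseName.Substring(firstSpace + 1, lastSpace - firstSpace - 1).Trim()
        Return New String() {quoteNumber, companyName}
    End Function
End Module
```

Because ListViewItem has a constructor that takes a String array, the question's loop could then fill both columns with ListView1.Items.Add(New ListViewItem(SplitQuoteFileName(dra.Name))).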

The suffix (the date plus .pdf) has a constant length, assuming your date format uses leading zeros.
The prefix has a variable length; however, the first space of the complete file name always comes right before the first character of the company name.
Using these two facts, you can find the indices of the first and last characters of the company name and "extract" it with Substring.
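A minimal sketch of the fixed-length suffix approach, assuming the suffix is always a space, an MM-dd-yyyy date with leading zeros and ".pdf" (for example " 10-25-2013.pdf", 15 characters), and that the function receives the bare file name rather than the full path. CompanyNameBySuffix is a hypothetical name.

```vbnet
Module QuoteSuffixParser
    ' Assumed suffix: a space, an MM-dd-yyyy date with leading zeros, and ".pdf",
    ' e.g. " 10-25-2013.pdf", which is always 15 characters long.
    Private Const SuffixLength As Integer = 15

    ' Hypothetical helper: takes a bare file name (no folder) and returns the
    ' text between the first space and the fixed-length date suffix.
    Public Function CompanyNameBySuffix(fileName As String) As String
        Dim startIndex As Integer = fileName.IndexOf(" "c) + 1
        Dim endIndex As Integer = fileName.Length - SuffixLength
        ' No space found, or the name is too short to hold a company name.
        If startIndex = 0 OrElse endIndex <= startIndex Then Return String.Empty
        Return fileName.Substring(startIndex, endIndex - startIndex)
    End Function
End Module
```

Pass FileInfo.Name rather than the full path: a folder such as "10 October" contains a space and would shift the start index.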
Alternatively, you can split the file name into an array using a space as the delimiter. You can then take every element of the array except the first and the last, and join these elements back together separated by a space.
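The split-and-join alternative can be sketched with String.Join's overload that takes a start index and a count, so no explicit loop is needed. This assumes, as above, that the quote number and the "date.pdf" part contain no spaces; CompanyNameBySplit is a hypothetical name.

```vbnet
Module QuoteSplitParser
    ' Hypothetical helper: splits a bare file name on spaces, drops the first
    ' token (quote number) and the last token (date + ".pdf"), and joins the
    ' remaining tokens back together with single spaces.
    Public Function CompanyNameBySplit(fileName As String) As String
        Dim parts() As String = fileName.Split(" "c)
        ' Need at least a quote number, one company word, and a date.
        If parts.Length < 3 Then Return String.Empty
        Return String.Join(" ", parts, 1, parts.Length - 2)
    End Function
End Module
```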

Related

How do I search multiple textfiles for text, then adding that text into a listbox

I've got multiple text files within a folder, like this:
C:\Example\ 1.txt, 2.txt, 3.txt, 4.txt
The file names are generated by time and date they were created at so please don't try to open/search the documents using [1-4].txt or something similar as these are just examples.
I would like to search through all of these text files (without knowing their names as they're randomly generated), and if it matches certain text, I would like the rest of the text on that line to be added into a ListBox, then search the next/rest of the text files.
Example of text file contents:
[14:49:16] [Client thread/INFO]: Setting user: Users Name
All text after Setting user: which is on the same line should be added to the ListBox, so in this case, Users Name would be added.
The above text will always be the first line of the text file, so there is no need to search the whole file. The line always begins with the time the file was created (which differs for each text file), followed by [Client thread/INFO]: Setting user: (which is the same for all of the text files), and then the user's name. The name will not literally be Users Name; it is the part I would like to find and add to the ListBox.
I've written some of the code, but there are three problems with it.
1: I have to define the name of the text file, which I will not know.
2: I'm not sure how to search through all of the documents, only the one that is defined.
3: I can get it to output the Users name, but only if I remove the leading time and [Client thread/INFO]:, but these items will always be there.
With these three problems the code is useless; I'm just providing it in case it makes it easier for someone to help me.
Public Class Form1
Private Sub LoadFiles()
For Each line As String In IO.File.ReadLines("C:\Example\2016-09-28-1.txt")
'I had to define the name of the text file here, but I need to somehow automatically
'search all .txt files in that folder.
Dim params() As String = Split(line, ": ")
Select Case params(0)
'Text file has to be modified to show as:
'Setting user: RandomNameHere
'for the RandomName to show within the ListBox,
'but normally it will never be shown like this within the text files.
Case "Setting user"
ListBox1.Items.Add(params(1))
End Select
Next
End Sub
End Class
Use System.IO.Directory.GetFiles to get the list of files, and System.IO.Path.GetExtension to filter for .txt files. The String.IndexOf function will let you search for text within each line of the file, and String.Substring will let you retrieve part of the line.
While your original code using Split could be made to work (you would need another loop to go through the split text), I think IndexOf and Substring are simpler in this case.
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
Dim strFilenames() As String = System.IO.Directory.GetFiles("C:\Example")
For i As Integer = 0 To strFilenames.GetUpperBound(0)
Dim strFilename As String = strFilenames(i)
If System.IO.Path.GetExtension(strFilename).ToLower = ".txt" Then
For Each strLine As String In System.IO.File.ReadLines(strFilename)
'[14:49:16] [Client thread/INFO]: Setting user: Users Name
Dim strSearchText As String = "Setting user: "
Dim intPos As Integer = strLine.IndexOf(strSearchText)
If intPos > -1 Then
Dim strUsername As String = strLine.Substring(intPos + strSearchText.Length)
MsgBox(strFilename & " - " & strUsername) '<-- replace this with your SELECT CASE or whatever
End If
Next strLine
End If
Next i
End Sub
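Since the question says the name always sits on the first line, a variant of this answer can read only that line. The sketch below is an assumption-based alternative, not the answerer's code: it lets Directory.EnumerateFiles filter on "*.txt" (so no file names are needed in advance) and uses File.ReadLines with FirstOrDefault, which stops reading after the first line. GetUserNames is a hypothetical name.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq

Module LogUserReader
    ' Fixed text that precedes the user name on the first line of each log.
    Private Const Marker As String = "Setting user: "

    ' Hypothetical helper: reads only the first line of every .txt file in the
    ' folder and returns the text that follows "Setting user: " on that line.
    Public Function GetUserNames(folder As String) As List(Of String)
        Dim names As New List(Of String)
        For Each logPath As String In Directory.EnumerateFiles(folder, "*.txt")
            ' ReadLines is lazy, so FirstOrDefault stops after the first line.
            Dim firstLine As String = File.ReadLines(logPath).FirstOrDefault()
            If firstLine Is Nothing Then Continue For
            Dim pos As Integer = firstLine.IndexOf(Marker, StringComparison.Ordinal)
            If pos >= 0 Then names.Add(firstLine.Substring(pos + Marker.Length).Trim())
        Next
        Return names
    End Function
End Module
```

In the form, ListBox1.Items.AddRange(GetUserNames("C:\Example").ToArray()) would then add every name found.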
You can use the System.IO.Directory class and its GetFiles method to get the file names from a directory. Then you can open each file and do what you need.
https://msdn.microsoft.com/en-us/library/system.io.directory.getfiles(v=vs.110).aspx

How to get a filename without using for next

Using VB.Net
Get a file name without using for next from the directory
Dim filefound = Directory.GetFiles("C:\", "1.txt")
For Each inwardfile In filefound
Dim strFilename As String = Path.GetFileName(inwardfile)
Next
The above code works fine, but I don't want to use a For loop because I will always get one file at a time, not a list of files. Also, I am searching with a specific file name, not a pattern like "*.txt".
So how can I modify the code? Can anyone assist?
Your question is inconsistent which makes it very unclear. Do you want a file name or a collection of them? If you want filename not like "*.txt" then why use "1.txt" for GetFiles?
This will answer the title question How to get a filename without using for next. For this, assume a directory full of ".json" files. Some are named cs###.json (e.g. cs001.json) some are named vb###.json and many others.
Path would work to allow you to examine parts of the filename, but DirectoryInfo will provide access to some of that info via FileInfo objects. This uses EnumerateFiles to be able to filter out unwanted files so they are not even returned to your code/array variable.
Dim dix As New DirectoryInfo("C:\Temp\Json")
Get the json files which DO NOT start with CS. This will return an array of FileInfo objects in case you need to do further exclusions:
Dim jFile = dix.EnumerateFiles.Where(Function(f) f.Extension = ".json" AndAlso
f.Name.StartsWith("cs") = False).
ToArray()
Get the json files which ARE "*1.json" and DO NOT start with "cs". In this case, the last Select method will cause an array of filenames to be returned:
Dim jFile = dix.EnumerateFiles.Where(Function(f) f.Extension = ".json" AndAlso
f.Name.EndsWith("1.json") AndAlso
f.Name.StartsWith("cs") = False).
Select(Function(q) q.Name).
ToArray()
Change the Select to Function(q) q.FullName if you want the full path name. These should come close to whatever you are really looking for.
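For the narrower case in the question, where at most one file is expected, FirstOrDefault can replace the For Each loop entirely. This is a sketch under that assumption; FindFirstFileName is a hypothetical name, and it returns Nothing when no file matches.

```vbnet
Imports System.IO
Imports System.Linq

Module SingleFileLookup
    ' Hypothetical helper: returns the bare name of the first file in folder
    ' that matches pattern, or Nothing when there is no match. FirstOrDefault
    ' replaces the explicit For Each loop from the question.
    Public Function FindFirstFileName(folder As String, pattern As String) As String
        Dim match As String = Directory.EnumerateFiles(folder, pattern).FirstOrDefault()
        If match Is Nothing Then Return Nothing
        Return Path.GetFileName(match)
    End Function
End Module
```

If the exact file name is already known, File.Exists(Path.Combine("C:\", "1.txt")) answers the same question without enumerating the directory at all.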

How to search a sub-folder for a particular text file

I have a folder titled ‘The Arts’ which contains various sub-folders, one of which is titled ‘Music’. This ‘Music’ sub-folder contains various text files in the format:
John Doe.TXT
John Lennon.TXT
Elton John.TXT
Now, on my Form, I have two Textboxes in which the user can enter the names of artists like so:
Textbox1.Text = John
Textbox2.Text = Lennon
What I want to achieve is that on clicking a button on this form, the program searches the ‘The Arts’ parent folder for the ‘Music’ sub-folder and then searches within this music sub-folder for the text file name which exactly matches the artist name concatenated from Textboxes 1 and 2.
If a text file name exactly matches the artist name concatenated from Textboxes 1 and 2, then display a message. If no text file name within the Music sub-folder matches the name concatenated from Textboxes 1 and 2; then display a message that no file is found.
The below code is incomplete and just shows how I specified the main file path. I do not know how to proceed to get the program to do the above.
I am using Visual Basic 2010 Express. Thank you for your help.
Dim FilePath As String
FilePath = (Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "The Arts\"))
'This section is where I am stuck and need help...Thank you in advance.
If File.Exists(FilePath) Then
MsgBox("File found.")
Else
MsgBox("A record does not exist for this artist.")
Exit Sub
End If
How to check if a text file name exactly matches the artist name concatenated from Textboxes 1 and 2
You first need to concatenate the text from the two text boxes, which, given your example, need to be separated by a space. There are a few ways to accomplish that.
For example like this:
Dim artistName = TextBox1.Text + " " + TextBox2.Text
Or this:
Dim artistName = String.Concat(TextBox1.Text, " ", TextBox2.Text)
And there are even more ways to do this.
Next you will need to assemble this into a full file path name. For readability it makes sense to do this in a few steps:
' Directory
Dim desktopPath = Environment.GetFolderPath(Environment.SpecialFolder.Desktop)
Dim musicPath = Path.Combine(desktopPath, "The Arts", "Music")
' Combine directory name and the name of the file we want to find.
Dim filePath = Path.Combine(musicPath, artistName + ".TXT")
Finally you can check whether that file exists by calling the File.Exists method.
Dim found = File.Exists(filePath)
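Putting the steps above together, a sketch could look like the following. The helper names are hypothetical, the base folder is a parameter so it can point at the Desktop (or anywhere else for testing), and trimming the text box values is an added assumption so stray spaces do not break the match. Path.Combine with three arguments is available from .NET 4, which Visual Basic 2010 targets.

```vbnet
Imports System.IO

Module ArtistLookup
    ' Hypothetical helper: builds <baseFolder>\The Arts\Music\<first> <last>.TXT
    ' from the two text box values.
    Public Function ArtistFilePath(baseFolder As String, firstName As String, lastName As String) As String
        Dim artistName As String = firstName.Trim() & " " & lastName.Trim()
        Dim musicPath As String = Path.Combine(baseFolder, "The Arts", "Music")
        Return Path.Combine(musicPath, artistName & ".TXT")
    End Function

    ' True when a file for that artist exists in the Music sub-folder.
    Public Function ArtistFileExists(baseFolder As String, firstName As String, lastName As String) As Boolean
        Return File.Exists(ArtistFilePath(baseFolder, firstName, lastName))
    End Function
End Module
```

In the button's Click handler you could then call ArtistFileExists(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), TextBox1.Text, TextBox2.Text) and show "File found." or "A record does not exist for this artist." depending on the result.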

Validate a csv file

This is my sample file
#%cty_id1,#%ccy_id2,#%cty_src,#%cty_cd3,#%cty_nm4,#%cty_reg5,#%cty_natnl6,#%cty_bus7,#%cty_data8
690,ALL2,,AL,ALBALODMNIA,,,,
90,ALL2,,,AQ,AKNTARLDKCTICA,,,
161,IDR2,,AZ,AZLKFMERBALFKIJAN,,,,
252,LTL2,,BJ,BENLFMIN,,,,
206,CVE2,,BL,SAILFKNT BAFSDRTHLEMY,,,,
360,,,BW2,BOPSLFTSWLSOANA,,,,
The problem is that #%cty_cd3 is a standard column (NOT NULL) that holds exactly 2 letters, but in SQL Server the record shifts into the next column (due to an extra comma in between). How do I validate a CSV file to make sure that
a 2-character value only ever appears in the 4th column?
There are around 10000 records.
Set of rules defined:
Each row should have a standard number of delimiters
if not
Check for NOT NULL columns that contain null values
If a null is found
remove the delimiter at that position
i.e. the 3 commas (,,,) should be replaced with 2 (,,)
UPDATE: Can this be done using a script?
UPDATE 2: I only need a function that operates on records like
90,ALL2,,,AQ,AKNTARLDKCTICA,,, and corrects them using a regex or any other method, then puts them back into the source file.
Your best bet here may be to use the tSchemaComplianceCheck component in Talend.
If you read the file in with a tFileInputDelimited component and then check it with the tSchemaComplianceCheck where you set cty_cd to not nullable then it will reject your Antarctica row simply for the null where you expect no nulls.
From here you can use a tMap and simply map the fields to the one above.
You should be able to tweak this easily as necessary, potentially with further tSchemaComplianceChecks down the reject lines and mapping to suit. This method is much more self-explanatory, and you don't have to deal with complicated regexes that need careful maintenance whenever you want to accommodate different variations of your file structure. It also has the benefit that you will always capture all of the well-formatted rows.
You could delete field 4 whenever it is not a two-character field, as follows (this assumes quoted fields, as in the output below):
awk 'BEGIN {FS=OFS=","}
{
for (i=1; i<=NF; i++) {
if (!(i==4 && length($4)!=4))
printf "%s%s",$i,(i<NF)?OFS:ORS
}
}' file.csv
Output:
"id","cty_ccy_id","cty_src","cty_nm","cty_region","cty_natnl","cty_bus_load","cty_data_load"
6,"ALL",,"AL","ALBANIA",,,,
9,"ALL",,"AQ","ANTARCTICA",,,
16,"IDR",,"AZ","AZERBAIJAN",,,,
25,"LTL",,"BJ","BENIN",,,,
26,"CVE",,"BL","SAINT BARTHÉLEMY",,,,
36,,,"BW","BOTSWANA",,,,
41,"BNS",,"CF","CENTRAL AFRICAN REPUBLIC",,,,
47,"CVE",,"CL","CHILE",,,,
50,"IDR",,"CO","COLOMBIA",,,,
61,"BNS",,"DK","DENMARK",,,,
Note:
We use length($4)!=4 because column 4 should hold two characters plus the two surrounding double quotes.
The solution is to use a look-ahead regex, as suggested before. To reproduce your issue I used this:
"\\,\\,\\,(?=\\\"[A-Z]{2}\\\")"
which matches three commas followed by two quoted uppercase letters, without including those letters in the match. Of course, you may need to adjust it for your needs (e.g. an arbitrary number of commas rather than exactly three).
But you cannot use it in Talend directly without tons of errors. Here's how to design your job:
read the file line by line, with no fields yet. Then, inside the tMap, do the match-and-replace, like:
row1.line.replaceAll("\\,\\,\\,(?=\\\"[A-Z]{2}\\\")", ",,")
and finally tokenize the line using "," as separator to get your final schema. You probably need to manually trim out the quotes here and there, since tExtractDelimitedFields won't.
You don't need to enter the schema for tExtractDelimitedFields by hand. Use the wizard to record a DelimitedFile Schema into the metadata repository, as you probably already did. You can also use this schema as a Generic Schema, fitting it to the outgoing connection of tExtractDelimitedFields. Purists may not like it, but it works and saves time.
About your UI problems, they are often related to file encodings and locale settings. Don't worry too much, they (usually) won't affect the job execution.
EDIT: here's a sample TOS job which shows the solution, just import in your project: TOS job archive
Coming to the party late with a VBA-based approach. An alternative to regex is to parse the file and remove a comma when the 4th field is empty. Using the Microsoft Scripting Runtime, the code opens the file and reads each line, copying it to a new temporary file; if the 4th element is empty, it writes the line with the extra comma removed. The cleaned data is then copied back to the original file and the temporary file is deleted. It seems a bit of a long way round, but when I tested it on a file of 14000 rows based on your sample it took under 2 seconds to complete.
' Requires a reference to "Microsoft Scripting Runtime" (Tools > References)
Sub Remove4thFieldIfEmpty()
Const iNUMBER_OF_FIELDS As Integer = 9
Dim str As String
Dim fileHandleInput As Scripting.TextStream
Dim fileHandleCleaned As Scripting.TextStream
Dim fsoObject As Scripting.FileSystemObject
Dim sPath As String
Dim sFilenameCleaned As String
Dim sFilenameInput As String
Dim vFields As Variant
Dim iCounter As Integer
Dim sNewString As String
sFilenameInput = "Regex.CSV"
sFilenameCleaned = "Cleaned.CSV"
Set fsoObject = New FileSystemObject
sPath = ThisWorkbook.Path & "\"
Set fileHandleInput = fsoObject.OpenTextFile(sPath & sFilenameInput)
If fsoObject.FileExists(sPath & sFilenameCleaned) Then
Set fileHandleCleaned = fsoObject.OpenTextFile(sPath & sFilenameCleaned, ForWriting)
Else
Set fileHandleCleaned = fsoObject.CreateTextFile((sPath & sFilenameCleaned), True)
End If
Do While Not fileHandleInput.AtEndOfStream
str = fileHandleInput.ReadLine
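vFields = Split(str, ",")
' Skip blank or short lines that have no 4th field (avoids a subscript error)
If UBound(vFields) >= 3 Then
If vFields(3) = "" Then
' Rebuild the line without the empty 4th field
sNewString = vFields(0)
For iCounter = 1 To UBound(vFields)
If iCounter <> 3 Then sNewString = sNewString & "," & vFields(iCounter)
Next iCounter
str = sNewString
End If
End If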
fileHandleCleaned.WriteLine (str)
Loop
fileHandleInput.Close
fileHandleCleaned.Close
Set fileHandleInput = fsoObject.OpenTextFile(sPath & sFilenameInput, ForWriting)
Set fileHandleCleaned = fsoObject.OpenTextFile(sPath & sFilenameCleaned)
Do While Not fileHandleCleaned.AtEndOfStream
fileHandleInput.WriteLine (fileHandleCleaned.ReadLine)
Loop
fileHandleInput.Close
fileHandleCleaned.Close
Set fileHandleCleaned = Nothing
Set fileHandleInput = Nothing
' Delete the temporary file (KillFile is not a built-in VBA procedure)
fsoObject.DeleteFile sPath & sFilenameCleaned
Set fsoObject = Nothing
End Sub
If that's the only problem (and if you never have a comma in the field bt_cty_ccy_id), then you could remove such an extra comma by loading your file into an editor that supports regexes and have it replace
^([^,]*,[^,]*,[^,]*,),(?="[A-Z]{2}")
with \1.
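Since the question asks for a function that fixes records and writes them back to the source file, here is a sketch of the same look-ahead idea in VB.NET. It is an adaptation, not the answer's exact pattern: the quotes around the country code are made optional so it also fixes the question's unquoted sample, and the look-ahead requires a comma after the code. It assumes that an empty 4th field followed by a two-letter uppercase code always means an extra comma. FixRow and FixFile are hypothetical names.

```vbnet
Imports System.IO
Imports System.Linq
Imports System.Text.RegularExpressions

Module CsvRowFixer
    ' Three leading fields (each ending in a comma), then an extra comma that
    ' is directly followed by a two-letter country code, quoted or not.
    Private ReadOnly ExtraComma As New Regex("^((?:[^,]*,){3}),(?=""?[A-Z]{2}""?,)")

    ' Removes the extra comma from one record; other rows are returned unchanged.
    Public Function FixRow(row As String) As String
        Return ExtraComma.Replace(row, "$1")
    End Function

    ' Rewrites the CSV file in place with every record passed through FixRow.
    Public Sub FixFile(csvPath As String)
        Dim fixedRows() As String = File.ReadAllLines(csvPath).Select(Function(r) FixRow(r)).ToArray()
        File.WriteAllLines(csvPath, fixedRows)
    End Sub
End Module
```

With this pattern, 90,ALL2,,,AQ,AKNTARLDKCTICA,,, becomes 90,ALL2,,AQ,AKNTARLDKCTICA,,, while rows whose 4th field is already filled, and the header row, are left alone.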
I would ask the source system that sends you this file why some rows contain this extra comma. I assume you are using a comma as the delimiter when importing this .csv file into Talend.
(Another suggestion would be to ask for a semicolon as the column separator in the input file.)
9,"ALL",,,"AQ","ANTARCTICA",,,,
will be
9;"ALL";,;"AQ";"ANTARCTICA";;;;

reverse engineer vba code excel

I am not a VBA programmer. However, I have the 'unpleasant' task of re-implementing someone's VBA code in another language. The VBA code consists of 75 modules which use one massive 'calculation sheet' to store all 'global variables'. So instead of using descriptive variable names, it often uses:
= Worksheets("bla").Cells(100, 75).Value
or
Worksheets("bla").Cells(100, 75).Value =
To make things worse, the 'calculation sheet' also contains some formulas.
Are there any (free) tools which allow you to reverse engineer such code (e.g. create Nassi–Shneiderman diagram, flowcharts)? Thanks.
I think @JulianKnight's suggestion should work.
Building on this, you could:
Copy all the code to a text editor capable of RegEx search/replace (e.g. Notepad++).
Then use the RegEx search/Replace with a search query like:
Worksheets\("bla"\)\.Cells\((\d+), (\d+)\)\.Value
And replace with:
Var_\1_\2
This will convert all the sheet stored values to variable names with row column indices.
Example:
Worksheets("bla").Cells(100, 75).Value becomes Var_100_75
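If you would rather script this replacement across all 75 exported modules than run it by hand in an editor, the same pattern works with .NET's Regex.Replace, where the backreferences are written $1 and $2 instead of \1 and \2. This is a sketch with two stated deviations: RegexOptions.IgnoreCase so "bla" and "Bla" both match, and \s* to tolerate a missing space after the comma. RenameCellReferences is a hypothetical name.

```vbnet
Imports System.Text.RegularExpressions

Module SheetCellRenamer
    ' Worksheets("bla").Cells(row, col).Value  ->  Var_row_col
    ' IgnoreCase also catches "Bla" vs "bla" spelling differences.
    Private ReadOnly CellRef As New Regex(
        "Worksheets\(""bla""\)\.Cells\((\d+),\s*(\d+)\)\.Value",
        RegexOptions.IgnoreCase)

    ' Hypothetical helper: rewrites every sheet-cell reference in a code string.
    Public Function RenameCellReferences(code As String) As String
        Return CellRef.Replace(code, "Var_$1_$2")
    End Function
End Module
```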
These variables still need to be initialized.
This may be done by writing VBA code that reads every (relevant) cell in the "Bla" worksheet and writes it out to a text file as variable initialization code.
Example:
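' Requires a reference to "Microsoft Scripting Runtime" (Tools > References)
Sub DumpCalculationSheet()
Dim FSO As FileSystemObject
Dim FSOFile As TextStream
Dim FilePath As String
' Declare each type separately: "Dim col, row As Integer" makes col a Variant
Dim col As Long, row As Long
FilePath = "c:\WriteTest.txt" ' create a test.txt file or change this
Set FSO = New FileSystemObject
' opens file in write mode, creating it if it does not exist
Set FSOFile = FSO.OpenTextFile(FilePath, ForWriting, True)
'loop round adding lines; replace the two limits with the sheet's used range
For col = 1 To Whatever_is_the_column_limit
For row = 1 To Whatever_is_the_row_limit
' Construct the output line; CStr avoids the leading space that Str() adds
FSOFile.WriteLine "Var_" & CStr(row) & "_" & CStr(col) & _
" = " & CStr(Worksheets("Bla").Cells(row, col).Value)
Next row
Next col
FSOFile.Close
End Sub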
Obviously you need to correct the output line syntax and variable name structure for whatever other language you need to use.
P.S. If you are not familiar with RegEx (Regular Expressions), you will find a plethora of articles on the web explaining it.