Speeding up looping through a textfile in vbs - testing

Good afternoon,
I have a problem with my code, which loops through a text file. The file has approximately 10,000 lines, so I use the InStr function to find the character position at which the "test name" appears, then use the Mid function and count left from there to get the line number.
For example:
000004###24503###Open Account Web ISA single###2#########Please enter your first name.###False#########Mr############callie####################################################################################################################################################################################################################################################################################################################################666###Imagenericpassword###Ops#######################################################################################################################################Cash ISA 2009 / 2010##########################################################################################################################################################################################################################################
So in this case it finds "Open Account Web ISA single" and counts left to find 000004. This saves me from looping through 10,000 lines.
Next I split this line into an array using ### as the delimiter. This produces many empty "columns", since those cells were empty when I concatenated the data from Excel, and leaves me with around 247 columns in total. I don't really want to loop through all 247 columns, since many of them contain nothing. Is there a quicker way to do this?
I used to use Excel, but it is far too slow.

You can remove the empty columns:
Set re = New RegExp
re.Pattern = "(###){2,}"
re.Global = True
withoutEmptyCols=re.Replace(input,"###")
This is the result for your example:
000004###24503###Open Account Web ISA single###2###Please enter your first name.###False###Mr###callie###666###Imagenericpassword###Ops###Cash ISA 2009 / 2010###
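One caveat with collapsing the delimiters: the surviving values no longer sit at their original column positions, so a field cannot be looked up by its column number afterwards. If the positions matter, a minimal sketch is to split once and keep only the non-empty fields, keyed by their original index. NonEmptyFields is a hypothetical helper name, and the code avoids typed declarations so that it should run in both VBScript and VBA:

```vba
' Hypothetical helper: returns a Scripting.Dictionary that maps the
' zero-based column index to its value, for non-empty columns only.
Function NonEmptyFields(lineText)
    Dim parts, i, fields
    Set fields = CreateObject("Scripting.Dictionary")
    parts = Split(lineText, "###")
    For i = 0 To UBound(parts)
        ' Skip the columns that were empty in the source data.
        If Len(parts(i)) > 0 Then fields.Add i, parts(i)
    Next
    Set NonEmptyFields = fields
End Function
```

For the sample line, the dictionary would hold "000004" under key 0 and "Open Account Web ISA single" under key 2. Check fields.Exists(n) before reading a column, because reading a missing key from a Dictionary silently adds it. This still visits every column once, but a single pass over a few hundred short strings is normally cheap compared with reading the file.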

Related

Scanning through lists in VBA and adding each hit in the same cell

For the past several months I have been using a few lines of VBA which, with the help of the Stack Overflow community, I was able to adjust as needed (see "Looping and scanning a list of e.g., 200 strings against a column with string data (Excel, VBA, Macros)"). Essentially, the code scans row by row through a range of data, checks each cell against a list of predefined keywords, and writes any hit into an adjacent empty cell. For example, if my column contained "DOG ABC LLC" and my keyword list contained "ABC", the macro displayed "ABC" in the adjacent column.
I have noticed one issue with this method: often more than one keyword matches the same cell. For example, my array can contain both "ABC" and "DOG" as separate keywords. The current loop apparently records only one hit per cell and then moves on. Is there an easy way to adjust the code so that all hits are written into the cell, separated by a comma or space? Instead of seeing just "DOG" or "ABC", one could then clearly see that there were two hits: "ABC, DOG". Here is the code I have been using so far:
Dim wordsArray() As Variant
wordsArray = Worksheets("Keywords").Range("B2:B439").Value
Dim word As Variant
Dim cell As Range
For Each cell In Worksheets("Normalized").Range("J2:J49010")
    For Each word In wordsArray
        If InStr(cell.Value, word) > 0 Then
            cell.Offset(0, -1).Value = word
        End If
    Next word
Next cell
Thank you in advance for advice!
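The loop above assigns cell.Offset(0, -1).Value on every match, so each new hit overwrites the previous one. A minimal sketch of one way to keep all hits, assuming the same sheet names and ranges as in the question (ListAllKeywordHits is a hypothetical macro name), is to build up a string per cell and write it once:

```vba
Sub ListAllKeywordHits()
    Dim wordsArray As Variant
    Dim word As Variant
    Dim cell As Range
    Dim hits As String

    wordsArray = Worksheets("Keywords").Range("B2:B439").Value

    For Each cell In Worksheets("Normalized").Range("J2:J49010")
        hits = vbNullString
        For Each word In wordsArray
            ' Skip blank keywords: InStr reports a match for an empty string.
            If Len(word) > 0 Then
                If InStr(cell.Value, word) > 0 Then
                    If Len(hits) > 0 Then hits = hits & ", "
                    hits = hits & word
                End If
            End If
        Next word
        ' Write every hit for this row at once, e.g. "ABC, DOG".
        cell.Offset(0, -1).Value = hits
    Next cell
End Sub
```

Unlike the original loop, this sketch also clears the adjacent cell when nothing matches, because it always writes the collected string. Hits appear in keyword-list order, not in the order they occur in the text.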

VBA getcrossreferenceitems(wdRefTypeNumberedItem) Paragraph Cut Off?

I'm using Excel VBA to extract information from a Word document.
In the Word document, there are several levels of numbered lists. For example:
1. ABC
1.1 DEF
1.1.1 ABCDEF
2. AAA
2.1 BBB
2.1.1. CCC
I need to get the full text of each heading at each level and put them into an Excel range, i.e. {"1.ABC", "1.1 DEF", "1.1.1 ABCDEF", "2. AAA", "2.1 BBB", "2.1.1. CCC"}
The function I use is:
For Each sec In objDoc.getcrossreferenceitems(wdRefTypeNumberedItem)
However, my headings are truncated if the headings are too long. For example, I have (random text is added for confidentiality reasons):
"5.2.11. Current References: As part of the evaluation process, XXX will conduct 2340AERTQ3493YR. When selecting ADT34534FDGSR, please ensure that they are AERA34AEFDS."
But only
5.2.11. Current References: As part of the evaluation process, XXX will conduct 234
is displayed, and the rest of the sentence is gone.
If anybody has an alternate solution, please let me know.
I can confirm this behavior. A workable, albeit elaborate, solution is to scan the document for all numbered items, which gives you the full text, and then cross-reference that result against the list returned by GetCrossReferenceItems. There is quite some work involved, but it works and lets you build one list of referable Headings and NumberedItems, which is what I did to create a more user-friendly alternative to Word's own implementation.
You will have to match the formatting Word applies to the list returned by GetCrossReferenceItems, i.e. the indentation and the removal of special characters.
Be careful with track changes. There is a bug in GetCrossReferenceItems: items (in my case headers) that have a tracked change at the beginning of their text are not returned by GetCrossReferenceItems, but internally they are still on the list, so the index is offset. If the item in question is item 11, then GetCrossReferenceItems returns the text of item 12 at index 11. A workaround is to accept all revisions before calling GetCrossReferenceItems and undo that afterwards.
It's not easy, but it works.
I ran into a similar problem in MS Word. I found that the text of some paragraphs is shortened by the following code:
Sub bug()
    items = ActiveDocument.GetCrossReferenceItems(wdRefTypeNumberedItem)
    For idx = 1 To UBound(items)
        MsgBox items(idx)
    Next
End Sub
I had to use a rather long solution (in Python, sorry, but it is easy to rewrite in VBA):
import win32com.client  # objDoc is assumed to be an already opened Word document

varHeadings = []
for par in objDoc.Paragraphs:
    if par.Range.ListFormat.ListType == win32com.client.constants.wdListOutlineNumbering:
        idx = par.Range.ListFormat.ListString
        # Strip the trailing paragraph mark before storing the text.
        txt = par.Range.Text.strip('\r\n')
        varHeadings.append('%s%s' % (idx, txt))
which does work. However, as I said, it is somewhat tedious. So did I miss some VBA function in MS Word, or does GetCrossReferenceItems have a known bug with no replacement available in VBA?
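The Python loop above translates to VBA roughly as follows. This is a sketch, not a tested replacement for GetCrossReferenceItems: NumberedParagraphs is a hypothetical function name, doc is assumed to be an open Word document object, and the constant is declared locally so the code also works with late binding from Excel.

```vba
Function NumberedParagraphs(ByVal doc As Object) As Collection
    ' Value of Word's wdListOutlineNumbering, declared here for late binding.
    Const wdListOutlineNumbering As Long = 4

    Dim result As New Collection
    Dim par As Object
    Dim txt As String

    For Each par In doc.Paragraphs
        If par.Range.ListFormat.ListType = wdListOutlineNumbering Then
            txt = par.Range.Text
            ' Drop the trailing paragraph mark that Range.Text includes.
            If Right$(txt, 1) = vbCr Then txt = Left$(txt, Len(txt) - 1)
            ' ListString is the number as displayed, e.g. "5.2.11."
            result.Add par.Range.ListFormat.ListString & " " & txt
        End If
    Next par

    Set NumberedParagraphs = result
End Function
```

Like the Python version, this only picks up paragraphs whose list type is outline numbering; simply numbered lists would need an additional check against the other list types.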

Excel file contains invalid hidden characters that can't be removed

I have a peculiar problem with hidden characters in an Excel spreadsheet which uses VBA to create a text file. I've attached a link to a test version of the file, and I'll explain as best I can the issue.
The file creates a plain txt file that can be used to feed data into a System we use. It works well normally, however we've been supplied approximately 15,000 rows of data, and at random points throughout there are hidden characters.
In the test file, there's 1 row and it's cell B11 that has hidden characters at the beginning and end of the value. If you put your cursor at the end of it, and press the backspace key, it will look as if nothing has happened, but actually you've just deleted one of the characters.
As far as Excel is concerned, those hidden characters are question marks, but they are not: a text stream would write real question marks without complaint, whereas these characters make it throw an "Invalid procedure call" error.
I've tried Excel's CLEAN formula, the VBA equivalent, and 'Replace', but nothing seems to recognise those characters. Excel is convinced they are just question marks; even asking for the ASCII character code gives me the same answer (63). Yet Replace does not treat them as question marks, it simply skips them.
Any help on this, even if it's just a formula I could apply would be appreciated. In the interests of data protection the data in the file is fake by the way, it's nobody's real NI number.
The excel file with vba code is here
This VBA macro can be run on its own or in conjunction with the ClearFormatting macro. It stripped the rogue Unicode characters out of the sample.
Sub strip_Rogue_Unichars()
    Dim uc As Long
    With Cells(11, 1).CurrentRegion
        For uc = 8000 To 8390
            .Replace what:=ChrW(uc), replacement:=vbNullString, lookat:=xlPart
            DoEvents
        Next uc
    End With
End Sub
There is probably a better way to do this, and restricting the range of Unicode characters to search and replace would obviously speed things up. Turning off .EnableEvents, .ScreenUpdating, etc. would likewise help. I believe calculation was already set to manual. I intentionally left a DoEvents in the loop because my first run covered several thousand different Unicode characters.
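To narrow the range, it helps to find out which code points are actually present. Asc converts through the ANSI code page and returns 63 ("?") for characters it cannot map, which would explain the result described in the question, whereas AscW returns the Unicode value. A minimal sketch, assuming the problem value is in B11 of the active sheet (ListNonAsciiCodePoints is a hypothetical macro name):

```vba
Sub ListNonAsciiCodePoints()
    Dim s As String
    Dim i As Long
    Dim cp As Long

    s = CStr(Cells(11, 2).Value)   ' B11, the cell named in the question

    For i = 1 To Len(s)
        cp = AscW(Mid$(s, i, 1))
        ' AscW returns a signed Integer, so map negative values to 32768-65535.
        If cp < 0 Then cp = cp + 65536
        ' Print position, decimal value and hex value of anything outside ASCII.
        If cp > 127 Then Debug.Print i, cp, "U+" & Hex$(cp)
    Next i
End Sub
```

The values printed in the Immediate window can then replace the 8000 To 8390 loop bounds with the handful of characters that really occur.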

Copy cells if specific text is found

I have a problem that I have not yet found a solution to, which is why I am asking here:
Each day I have a list of invoices and orders coming from different suppliers, and the orders are based on part numbers and types.
This list is imported as text and then goes through a macro I made to arrange everything in cells.
I also need to go through some steps to format this list based on the type of order (e.g. windshield, carpets, wheels, etc.). What I usually do is filter everything, select the order type I am interested in, and then copy cells with text and formulas into the same row from another worksheet named "template", which is a list of conditions.
Since the list varies from day to day, it may not contain all part types, which is why I could not use a macro and have to continue by hand; sometimes the list exceeds 200-300 lines.
To give you an example: if E2 contains "windshield", I copy a selection of cells from "Template" (M2 to Q2) into M2 to Q2; if it contains "carpets", I copy M3 to Q3, and so on. The list of conditions is around 15-20 rows, and sometimes two conditions may apply (if an order exceeds $5000 I mark it red, if it is overdue I bold everything, etc.), but mainly I copy based on the text in column E.
If this could be turned into a macro, I would really appreciate it, as I need to take the same steps every time (auto-fit, copy the header, convert the amounts from text to numbers, change the text color based on order type, etc.), and this too takes time.
I hope this information is enough to give an idea of the problem; if not, I can post an example of the list I have to work with.
Many thanks in advance for your support.
Use Application.WorksheetFunction.Match to find the row in Template that holds the cells to be copied, then copy range M:Q of that row and paste it into your file.
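A minimal sketch of that idea follows. It rests on assumptions the question does not confirm: that the order list is the active sheet, and that column E of "Template" holds the order type for each template row. CopyTemplateCells is a hypothetical macro name. The sketch uses Application.Match rather than WorksheetFunction.Match because the former returns an error value that can be tested, instead of raising a run-time error, when a type is missing.

```vba
Sub CopyTemplateCells()
    Dim wsData As Worksheet
    Dim wsTpl As Worksheet
    Dim rowLast As Long
    Dim r As Long
    Dim m As Variant

    Set wsData = ActiveSheet                 ' assumption: the imported order list
    Set wsTpl = Worksheets("Template")

    rowLast = wsData.Cells(wsData.Rows.Count, "E").End(xlUp).Row

    For r = 2 To rowLast
        ' Assumption: Template column E lists the order types to match against.
        m = Application.Match(wsData.Cells(r, "E").Value, wsTpl.Range("E:E"), 0)
        If Not IsError(m) Then
            ' Copy M:Q of the matching template row into M:Q of the order row.
            wsTpl.Range("M" & m & ":Q" & m).Copy wsData.Range("M" & r)
        End If
    Next r
End Sub
```

Rows whose type is not found in the template are left untouched, so they can still be handled by hand.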
You are asking too much in one question to get help here. We are best at single-issue questions. The text and code below are intended to give you some ideas. If your code does not work, post the relevant part here and explain the difference between what it does and what you want it to do.
The problems you mention do not sound difficult. I would expect basic VBA to be enough to get you started. Are you looking for bits of relevant code without learning VBA? If you are, this is a big mistake. Search the web for "Excel VBA tutorial" or visit a large library and review their Excel VBA primers. There are many tutorials and books to choose from, so select one that is right for you. The time spent learning the basics will quickly repay itself.
Dim RowCrnt As Long
Dim RowLast As Long
With Worksheets("xxxx")
    RowLast = .Cells(.Rows.Count, "E").End(xlUp).Row
    For RowCrnt = 2 To RowLast
        ' Code to process each row goes here
    Next
End With
The above is probably the structure of your code. The For loop will examine each row in turn which will allow you to take relevant actions.
I have used "E" as a column letter because your question suggests column "E" is the most important. However, code that references columns in this way can be very confusing. Worse, if the column positions change, you will have to work carefully through your code changing the column letters. Better to have some statements at the top like this:
Const ColDate As String = "A"
Const ColAmtInv As String = "B"
Const ColAmtPaid As String = "C"
Const ColProdType As String = "E"
With these constants every reference to a column uses a name not a letter. The code is easier to read and, if a column moves, one change to the constant statement will fix the problem.
The Select Case statement is useful:
Select Case .Cells(RowCrnt, ColProdType).Value
    Case "carpets"
        ' code for carpets
    Case "windshield"
        ' code for windshield
    Case Else
        ' Unknown product type
        .Cells(RowCrnt, ColProdType).Font.Color = RGB(255, 0, 0)
End Select
You can have a string of If statements so multiple actions can occur on a single row:
If DateAdd("m", 1, .Cells(RowCrnt, ColDate).Value) < Now() And _
   .Cells(RowCrnt, ColAmtInv).Value > .Cells(RowCrnt, ColAmtPaid).Value Then
    ' Invoice overdue
    .Rows(RowCrnt).Font.Bold = True
End If
If .Cells(RowCrnt, ColAmtInv).Value > 5000 Then
    ' Large invoice
    .Rows(RowCrnt).Font.Color = RGB(255, 0, 0)
End If
You do not have to write the entire macro in one go. If most action is because of the product type then try that first.
Hope this helps get you started.

Script to modify outlook (2003) contacts

I'm trying to clean up my Outlook 2003 contacts, which have become a rather ugly mess of various formats.
Basically, I have a bunch of contacts whose numbers take one of these forms:
0xxxxxxxxx [ten digits, starting with 0]
0xxxxxxxx [nine digits, starting with 0]
0xxxxxxxx (xxxxx) [the same nine digits as above, with the last five repeated in parentheses]
+xxxxxxx [some random "complete" number with an international dialing code, etc.]
I want all of the numbers to match the last format. The algorithm is simple enough: for the first two types, drop the 0 and add +YYY, where YYY is my country code. Ditto for the third, but drop everything in parentheses.
My problem is that I don't know how to go about doing this. I've written a million scripts in my life in Perl, but I'd rather not export everything to text, process it, and re-import; I'd like a one-click solution that can easily be re-run (such as when I import a new contact from my company's directory, which comes in one of the forms above). I suspect that VBScript is the way to go; I've seen a few references online to accessing contacts as objects, but I'm not really sure what the best way to get started is.
Any recommended resources?
This is a duplicate of https://superuser.com/questions/15913/script-to-modify-outlook-2003-contacts ; I'm not sure which site is a better location
I would say VBA, rather than VBScript.
Sub GetContactsTel()
    Set oFolder = GetNamespace("MAPI").GetDefaultFolder(olFolderContacts)
    ' Loop through all of the items in the folder.
    For i = 1 To oFolder.Items.Count
        Debug.Print oFolder.Items(i).BusinessTelephoneNumber
    Next
End Sub
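Building on that loop, the algorithm from the question could be sketched as below. This is an untested outline under stated assumptions: NormalizeNumber and NormalizeBusinessNumbers are hypothetical names, COUNTRY_CODE is a placeholder that must be replaced with the real code, and only BusinessTelephoneNumber is handled (the other number fields, such as MobileTelephoneNumber, would need the same treatment). Back up the contacts folder before running anything that saves items.

```vba
' Placeholder: replace YYY with your own country code before running.
Const COUNTRY_CODE As String = "YYY"

Function NormalizeNumber(ByVal raw As String) As String
    Dim s As String
    Dim p As Long

    s = Trim$(raw)
    ' Only numbers starting with 0 are rewritten; "+..." numbers stay as they are.
    If Left$(s, 1) = "0" Then
        ' Drop the repeated digits in parentheses, if present.
        p = InStr(s, "(")
        If p > 0 Then s = Trim$(Left$(s, p - 1))
        ' Replace the leading 0 with the international prefix.
        s = "+" & COUNTRY_CODE & Mid$(s, 2)
    End If
    NormalizeNumber = s
End Function

Sub NormalizeBusinessNumbers()
    Dim oFolder As Object
    Dim oItem As Object
    Dim fixed As String

    Set oFolder = Application.GetNamespace("MAPI").GetDefaultFolder(olFolderContacts)

    For Each oItem In oFolder.Items
        ' The contacts folder can also hold distribution lists; skip those.
        If TypeName(oItem) = "ContactItem" Then
            fixed = NormalizeNumber(oItem.BusinessTelephoneNumber)
            If fixed <> oItem.BusinessTelephoneNumber Then
                oItem.BusinessTelephoneNumber = fixed
                oItem.Save
            End If
        End If
    Next oItem
End Sub
```

Because numbers that already start with "+" are left alone, the macro can be re-run after each import without changing contacts that were fixed earlier.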