I have large tables in RTF format, from 20-150 Mb in size. I first tried to export the RTF -> HTML -> Import to excel. It took about 35 minutes for a 60 Mb file. Next, I tried copying the table directly from Word -> excel. It always fails midway (everything gets pasted, by data is not in the right cell).
I tried a few more ways (importing all cells into memory before transferring to excel, and other permutations, and methods detailed in this and other sources) before settling on the .ConvertToText method.
This method is relatively faster, taking about 25 minutes for the same 60 Mb file (this is without displaying Word, setting repagination, events, dispayupdate and tableautofit to false).
Considering that these files can be entirely loaded into RAM memory in less than 10 seconds, I wonder why does it take 25 minutes to read data off a 60 Mb file. I understand that the table engine in Word is slow because of the change to HTML format, but reading a table cell by cell is atrociously slow. First few cells are super fast, Last cells are slower - I'm sure manual reading is faster than that. It defeats the whole purpose of automation. However, I do not have a choice.
The code is:
Dim oWord As Word.Application
Dim RTF As Word.Document
Set oWord = CreateObject("Word.Application")
Set RTF = oWord.Documents.Open(filename:=Fname, ConfirmConversions:=False, ReadOnly:=False) ', ReadOnly:=True)
Application.StatusBar = vbNullString
Application.EnableEvents = False
Application.ScreenUpdating = False
Application.DisplayAlerts = False
With oWord
Options.Pagination = False
Options.AllowReadingMode = False
Application.AutoRecover.Enabled = False
Options.SaveInterval = 0
Options.CheckGrammarAsYouType = False
Options.CheckGrammarWithSpelling = False
End With
With RTF
Options.Pagination = False
Options.AllowReadingMode = False
Application.AutoRecover.Enabled = False
Options.SaveInterval = 0
Options.CheckGrammarAsYouType = False
Options.CheckGrammarWithSpelling = False
End With
Dim AAF As Table
For Each AAF In RTF.Tables
AAF.AllowAutoFit = False
Next
oWord.Visible = False
Dim rng As Word.Range
Dim sData As String
Dim aData1() As String
Dim aData2() As String
Dim aDataAll() As String
Dim nrRecs As Long
Dim nrFields As Long
Dim lRecs As Long
Dim lFields As Long
Dim CTbl As Table 'Data Table
Dim oCell As Cell
'I'm not displaying the code which replaces all ^p with a spl character to maintain the table structure - it is staright forward, and does the job
Set rng = CTbl.ConvertToText(Separator:="$", NestedTables:=False)
sData = rng.Text 'This contains the entire table, delimited by vbCr and $...
Application.StatusBar = "Closing open files..."
RTF.Close (wdDoNotSaveChanges) 'All data has been extracted, hence quit word
oWord.Quit
Set oWord = Nothing
sData = Mid(sData, 1, Len(sData) - 1)
aData1() = Split(sData, vbCr)
nrRecs = UBound(aData1())
If Dbg Then MsgBox "The table contained " & nrRecs + 1 & " rows"
For lRecs = LBound(aData1()) To nrRecs 'Cycle through all rows
aData2() = Split(aData1(lRecs), "$") 'Split rows into arrays
Debug.Print aData1(lRecs)
nrFields = UBound(aData2()) 'Find out the number of columns
If lRecs = LBound(aData1()) Then 'If this is the first row/cycle,
ReDim Preserve aDataAll(nrRecs, 9) 'nrFields) 'Resize the array - currently I'm using a fixed size for the column since the first row of my table contains merged rows
End If
For lFields = LBound(aData2()) To nrFields 'Cycle through all columns
aDataAll(lRecs, lFields) = aData2(lFields) 'Collate the data in a single array
'If MsgBox(aDataAll(lRecs, lFields), vbYesNo, "Continue?") = vbNo Then Exit For
Next
Next 'All of this was slapped together from MS code samples and stackoverflow examples
Any suggestions to improve performance?
The conversion will go a lot faster if you first split the table (I assume there's one very large table) into smaller tables and then convert each table to text.
I tried this on a table with 10000 rows and 10 columns. The time to convert to text went from ~280 seconds to ~70 seconds (i.e. 4X faster).
I ran the code below directly from the document with the 10000 row table (as opposed to running from Excel) for simplicity.
Splt then convert:
Sub SplitThenConvert()
Dim t As Table
Set t = ActiveDocument.Tables.Item(1)
Dim rowCount As Integer
Dim index As Integer
Dim numSteps As Integer
Dim splitRow As Integer
Dim increment As Integer
Dim start_time, end_time
start_time = Now()
Application.ScreenUpdating = False
rowCount = t.Rows.Count
numSteps = 10
increment = rowCount / numSteps
splitRow = rowCount - increment
For index = 1 To numSteps
Debug.Print "Split #" + CStr(index)
ActiveDocument.Tables(1).Rows(splitRow).Select
Selection.SplitTable
splitRow = splitRow - increment
If splitRow < increment Then
Exit For
End If
Next index
index = ActiveDocument.Tables.Count
While index > 0
Debug.Print "Convert #" + CStr(index)
ActiveDocument.Tables(index).ConvertToText ","
index = index - 1
Wend
end_time = Now()
Application.ScreenUpdating = True
MsgBox (DateDiff("s", start_time, end_time))
End Sub
Convert entire table without splitting:
Sub ConvertAll()
Dim start_time, end_time
Application.ScreenUpdating = False
start_time = Now()
ActiveDocument.Tables(1).ConvertToText ","
end_time = Now()
Application.ScreenUpdating = True
MsgBox (DateDiff("s", start_time, end_time))
End Sub
I do agree with #KazJaw: reading/writing from/to MS Office programs (including .rtf because is treated as Word) is very computational expensive, better relying on other means as much as possible (just converting the .rtf file reading into a simple .txt file reading would improve the speed a lot). I have recently answered a post on these lines.
The other proposal I have is reducing the number of "live Office variables" as much as possible. Instead creating the RTF and the oDoc variables at the same time, better doing it one after the other (the same for Excel). What should be done only under exceptional circumstamces (because of being too computational expensive) is copying/pasting in real time between two different instances (for example, two different Word documents).
Thus, use the connection to Office programs for what it is intended, that is, top-level access to a file storing information in a pretty complex way: populate values, change formatting, perform complex actions (e.g., searching through the whole document); but intend to reduce the iterative behaviour (e.g., copying from one cell and pasting into another one over and over) as much as possible. See it in this way: copying/pasting in a .txt file involves just inspecting the input value/the target location and performing the action; doing it in Word involves the same than in the .txt file plus accounting for the huge amount of variables analysed while considering each record (formatting, references to other elements, special actions, etc.).
Related
I have a macro that loops through an unknown number of times. The number of times varies based on a total number of rows in multiple tables in a reference document, and that number of rows will vary across reference documents that may be used. The relevant snippet of code for the loop is below:
For Each oRow In oTbl.Rows
p = p + 1
Helper.ProgressIndicator_Code (p)
strPhrase = Split(Trim(oRow.Range.Cells(1).Range.Text), vbCr)(0)
strRule = Split(Trim(oRow.Cells(2).Range.Text), vbCr)(0)
If strPhrase <> "" Then
If Not strStartWord = vbNullString Then
'Process defined sections
arrEndWords = Split(strEndWord, "|")
For lngIndex = 0 To UBound(arrEndWords)
Set oRng = GetDocRange(strStartWord, arrEndWords(lngIndex))
If Not oRng Is Nothing Then Exit For
Next lngIndex
Else
'Process whole document
Set oRng = m_oDocCurrent.Range
End If
If Not oRng Is Nothing Then
Set oRngScope = oRng.Duplicate
With oRng.Find
.Text = strPhrase
Do While .Execute
If Not oRng.InRange(oRngScope) Then Exit For
oRng.HighlightColorIndex = wdTurquoise
If strRule <> "" Then
Set oComment = m_oDocCurrent.Comments.Add(Range:=oRng, Text:=strUsr & ": " & strRule)
oComment.Author = UCase("WordCheck")
oComment.Initial = UCase("WC")
End If
Loop
End With
End If
End If
Next oRow
The progress bar is a classic progress bar for which a label field width is updated using the below code based on a value of p as updated in the above code:
Sub progress(pctCompl As Integer)
ProgressIndicator.Text.Caption = pctCompl & "% Completed"
ProgressIndicator.Bar.Width = pctCompl * 2
DoEvents
End Sub
Here's my problem: The value of p varies based on which reference document is used, so my progress bar is never even approximately accurate with respect to the processing of the VBA macro. The progress bar doesn't have to be exact, merely close and to indicate that progress is being made and nothing has hung.
I'm not looking for written code, just would be very grateful for suggestions or advice as to approaches for making my progress bar more accurate so that I can learn (e.g., I just ran the macro for three different reference documents - one gave me 25%, one gave 44%, and one gave 82%; none showed even close to 100% when completed). Essentially I need to divide i by an unknown number to get my percentage, which is clearly impossible, so some function for a close approximation is needed.
Edit: New code based on #macropod suggestion.
Dim strCheckDoc As String, docRef As Document, projectPath As String, _
j As Integer, i As Integer, k As Integer, oNumRows as Long
j = 1
For i = 0 To UBound(strUsr)
strCheckDoc = [path to reference document unique to each strUsr]
Set docRef = Documents.Open(strCheckDoc, ReadOnly:=True, Visible:=False)
For k = 1 To docRef.Tables.Count
oNumRows = oNumRows + docRef.Tables(i).Rows.Count
Next k
Next i
Then the code to update the progress bar is:
Dim pctCompl As Single
pctCompl = Round((p / oNumRows) * 100)
ProgressIndicator.Text.Caption = pctCompl & "% Completed"
ProgressIndicator.Bar.Width = pctCompl * 2
DoEvents
The progress bar now gets to 64% when complete (i.e., it should be at 100%). I'm also working on a way to make oNumRows only count a row if the row has content in the first column.
Before I ask my question, I'm an amateur coder with basically no meaningful experience beyond VBA in ms office apps (I know - noob!)
I'm trying to create a web scraper using VBA to import data into excel and as per my comments in the below extract of code, the best I've been able to find on this is was in the winning answer to this question.
Below, I'm using investing.com as an example but in reality my project will be across multiple sites and will feed into a matrices which will be updating daily and self cannibalizing as events expire - For this reason I'd rather front-up the workload on the code side to make the inputs on an ongoing basis as minimal as possible (for me).
With that in mind, can I ask if there's a way to do any of the following (brace yourself, this will be cringe-worthy basic knowledge for some):
Is there a way in which I can and navigate to a url and run a for each loop on every table on that page (without have a known id for any)? this is to speed up my code as much as it's to minimise my inputs as there'll be quite a bit of data to be updated and I was planning on putting a 2 minute looping trigger on the refresh.
Instead of doing what I've been doing below, is it possible to reference a table, rather than a row, and do something along the lines of Cells(2,5).value to return the value within row 1, column 4? (assuming that both the array indexing starts at 0 in both dimensions?) Further to that, my first column (my primary key in some ways) may not be in the same order on all sources so is there a way in which I could do the equivalent to Columns("A:A").Find(What:=[Primary key], After:=Cells(1, 1), LookIn:=xlValues, LookAt:=xlWhole, SearchOrder:=xlByRows, SearchDirection:=xlNext, MatchCase:=True, SearchFormat:=False).Row to find what row within the table relates to the even I'm looking for?
Code :
Sub Scraper()
Dim appIE, allRowOfData As Object
' As per https://stackoverflow.com/questions/27066963/scraping-data-from-website-using-vba
Set appIE = CreateObject("internetexplorer.application")
With appIE
.Navigate "http://uk.investing.com/rates-bonds/financial-futures" 'Sample page
.Visible = False
End With
Do While appIE.Busy
Application.Wait (Now + TimeValue("0:00:01")) 'If page not open, wait a second befor trying again
Loop
Set allRowOfData = appIE.document.getElementById("pair_8907")
'tr id="[ID of row within table]"
Dim myValue As String: myValue = allRowOfData.Cells(8).innerHTML
'The 8 is the column number of the table
'(note: column numbers start at 0 so the 9th column should have "8" entered here
Set appIE = Nothing
Range("A1").Value = myValue
End Sub
If you want to use Excel functions to navigate the tables why not dump the tables first onto a worksheet this code works for me
Option Explicit
Sub Scraper()
Dim appIE As Object
' As per http://stackoverflow.com/questions/27066963/scraping-data-from-website-using-vba
Set appIE = CreateObject("internetexplorer.application")
With appIE
.Navigate "http://uk.investing.com/rates-bonds/financial-futures" 'Sample page
.Visible = True
End With
Do While appIE.Busy
DoEvents
Application.Wait (Now + TimeValue("0:00:01")) 'If page not open, wait a second befor trying again
Loop
'Debug.Print TypeName(appIE.document)
Dim doc As Object 'MSHTML.HTMLDocument
Set doc = appIE.document
'* appIE busy is good but you need to wait for the whole document to completely load and initialise so use this
While doc.readyState <> "complete"
DoEvents
Wend
'* we can select all the tables because they share the same CSS class name
Dim tablesSelectedByClass As Object 'MSHTML.HTMLElementCollection
Set tablesSelectedByClass = doc.getElementsByClassName("genTbl")
'* you can change this, it was just convenient for me to add sheets to my workbook
Dim shNewResults As Excel.Worksheet
Set shNewResults = ThisWorkbook.Worksheets.Add
Dim lRowCursor As Long '* this controls pasting down the sheet
lRowCursor = 1
Dim lTableIndexLoop As Long
For lTableIndexLoop = 0 To tablesSelectedByClass.Length - 1
Dim tableLoop As Object 'MSHTML.HTMLTable
Set tableLoop = tablesSelectedByClass.Item(lTableIndexLoop)
If LenB(tableLoop.ID) > 0 Then '* there are some extra nonsense tables, this subselects
Dim sParentColumn As String, objParentColumn As Object ' MSHTML.HTMLSemanticElement
Set objParentColumn = FindMyColumn(tableLoop, sParentColumn) '* need to understand is table on left hand or right hand side
Dim vHeader As Variant: vHeader = Empty
If sParentColumn = "leftColumn" Then
'* tables on the left have a preceding H3 element with the table's description
Dim objH3Headers As Object
Set objH3Headers = objParentColumn.getElementsByTagName("H3")
vHeader = objH3Headers.Item(lTableIndexLoop).innerText
Else
'* tables on the right have a hidden attribute we can use
vHeader = tableLoop.Attributes.Item("data-gae").Value
If Len(vHeader) > 3 Then
vHeader = Mid$(vHeader, 4)
Mid$(vHeader, 1, 1) = Chr(Asc(Mid$(vHeader, 1, 1)) - 32)
End If
End If
'* tables on the right do not have column headers
Dim bHasColumnHeaders As Boolean
bHasColumnHeaders = (tableLoop.ChildNodes.Length = 2)
Dim vTableCells() As Variant '* this will be our table data container which we will paste in one go
Dim lRowCount As Long: lRowCount = 0
Dim lColumnCount As Long: lColumnCount = 0
Dim lDataHeadersSectionIdx As Long: lDataHeadersSectionIdx = 0
Dim objColumnHeaders As Object: Set objColumnHeaders = Nothing
If bHasColumnHeaders Then
Set objColumnHeaders = tableLoop.ChildNodes.Item(0).ChildNodes.Item(0)
lRowCount = lRowCount + 1
lDataHeadersSectionIdx = 1
Else
lDataHeadersSectionIdx = 0
End If
Dim objDataRows As Object 'MSHTML.HTMLElementCollection
Set objDataRows = tableLoop.ChildNodes.Item(lDataHeadersSectionIdx).ChildNodes
lColumnCount = objDataRows.Item(0).ChildNodes.Length
lRowCount = lRowCount + objDataRows.Length
ReDim vTableCells(1 To lRowCount, 1 To lColumnCount) As Variant
'* we have them get the column headers
Dim lColLoop As Long
If bHasColumnHeaders Then
For lColLoop = 1 To lColumnCount
vTableCells(1, lColLoop) = objColumnHeaders.ChildNodes.Item(lColLoop - 1).innerText
Next
End If
'* get the data cells
Dim lRowLoop As Long
For lRowLoop = 1 To lRowCount - VBA.IIf(bHasColumnHeaders, 1, 0)
For lColLoop = 1 To lColumnCount
vTableCells(lRowLoop + VBA.IIf(bHasColumnHeaders, 1, 0), lColLoop) = objDataRows.Item(lRowLoop - 1).ChildNodes.Item(lColLoop - 1).innerText
Next
Next
'* paste our table description
shNewResults.Cells(lRowCursor, 1).Value2 = vHeader
lRowCursor = lRowCursor + 1
'* paste our table data
shNewResults.Cells(lRowCursor, 1).Resize(lRowCount, lColumnCount).Value2 = vTableCells
lRowCursor = lRowCursor + lRowCount + 1
End If
Next
End Sub
Function FindMyColumn(ByVal node As Object, ByRef psColumn As String) As Object
'* this code ascends the DOM looking for "column" in the id of each node
While InStr(1, node.ID, "column", vbTextCompare) = 0 And Not node.ParentNode Is Nothing
DoEvents
Set node = node.ParentNode
Wend
If InStr(1, node.ID, "column", vbTextCompare) > 0 Then
Set FindMyColumn = node
psColumn = CStr(node.ID)
End If
End Function
By the way, if you trade a lot the brokers get rich and you get poor, brokerage charges really impact in long run.
I have a program to remove duplicates and everything is working properly. It is just freezing with large data sets i.e. 1 to 2.5 million words.
What is wrong with my approach? Is there a better one?
Sub DeleteDuplicateParagraphs()
Dim p1 As Paragraph
Dim p2 As Paragraph
Dim DupCount As Long
Dim StartTime As Double
Dim SecondsElapsed As Double
StartTime = Timer
For Each p1 In ActiveDocument.Paragraphs
If p1.range.Text <> vbCr Then
For Each p2 In ActiveDocument.Paragraphs
If p1.range.Text = p2.range.Text Then
DupCount = DupCount + 1
If p1.range.Text = p2.range.Text And DupCount > 1 Then p2.range.Delete
End If
Next p2
End If
DupCount = 0
Next p1
SecondsElapsed = Round(Timer - StartTime, 2)
MsgBox "This code ran successfully in " & SecondsElapsed & " seconds", vbInformation
DupCount = 0
End Sub
Try this (first add a reference to the Microsoft Scripting Runtime to your VBA project):
Sub DeleteDuplicateParagraphs()
Dim p As Paragraph
Dim d As New Scripting.Dictionary
Dim t As Variant
Dim i As Integer
Dim StartTime As Single
StartTime = Timer
' collect duplicates
For Each p In ActiveDocument.Paragraphs
t = p.Range.Text
If t <> vbCr Then
If Not d.Exists(t) Then d.Add t, New Scripting.Dictionary
d(t).Add d(t).Count + 1, p
End If
Next
' eliminate duplicates
Application.ScreenUpdating = False
For Each t In d
For i = 2 To d(t).Count
d(t)(i).Range.Delete
Next
Next
Application.ScreenUpdating = True
MsgBox "This code ran successfully in " & Round(Timer - StartTime, 2) & " seconds", vbInformation
End Sub
This makes use of the fact that the Scripting.Dictionary is a hash table that is geared towards very quickly associating unique keys with values. It is therefore very good at spotting duplicate keys. Dictionary keys have to be strings, conveniently we can use the paragraph texts for that.
For values we use more dictionary objects, solely for the fact that they work a lot better than VBA's arrays. In them we collect the references to the actual paragraph instances with the same text.
Actually deleting duplicate paragraphs is a very simple matter afterwards.
Note: The duplicate detection part in the above code is very fast. However, if Word becomes unresponsive in large documents then it's in the duplicate removal part, namely because of Word's undo buffer.
The culprit is that the paragraph ranges are deleted one after another, causing Word to build a very large undo buffer. Unfortunately there is no way (that I know of) to either
delete multiple separate ranges in one step (which would result in only a single entry in the undo buffer), or
disable the undo buffer altogether from VBA
Calling UndoClear periodically in the "eliminate duplicates" loop might help, disabling ScreenUpdating is also not a bad idea:
' eliminate duplicates
Dim x As Integer
Application.ScreenUpdating = False
For Each t In d
x = x + 1
For i = 2 To d(t).Count
d(t)(i).Range.Delete
Next
If x Mod 50 = 0 Then ActiveDocument.UndoClear
Next
ActiveDocument.UndoClear
Application.ScreenUpdating = True
First of all, Just wanted to thank you so much for the time and effort you have put in to helping me.
Your idea behind the method is really impressive. I did change the code slightly and would like you to peruse it when you have the time, to see if it is of optimal standard. Again, I truly thank you, the code ran 20 splits faster than the previous and that is not even over a larger data set.
> Sub DeleteDuplicateParagraphs()
>
> Dim p As Paragraph
> Set d = CreateObject("Scripting.Dictionary")
> Dim t As Variant
> Dim i As Integer
> Dim StartTime As Single
>
> StartTime = Timer
>
> ' collect duplicates For Each p In ActiveDocument.Paragraphs
> t = p.range.Text
> If t <> vbCr Then
> If Not d.Exists(t) Then d.Add t, CreateObject("Scripting.Dictionary")
> d(t).Add d(t).Count + 1, p
> End If Next
>
> ' eliminate duplicates For Each t In d
> For i = 2 To d(t).Count
> d(t)(i).range.Delete
> Next Next
>
> MsgBox "This code ran successfully in " & Round(Timer - StartTime,
> 2) & " seconds", vbInformation
>
> End Sub
I have two bits of code. First a standard copy paste from cell A to cell B
Sheets(sheet_).Cells(x, 1).Copy Destination:=Sheets("Output").Cells(startrow, 2)
I can do almost the same using
Sheets("Output").Cells(startrow, 2) = Sheets(sheet_).Cells(x, 1)
Now this second method is much faster, avoiding copying to clipboard and pasting again. However it does not copy across the formatting as the first method does. The Second version is almost instant to copy 500 lines, while the first method adds about 5 seconds to the time. And the final version could be upwards of 5000 cells.
So my question can the second line be altered to included the cell formatting (mainly font colour) while still staying fast.
Ideally I would like to be able to copy the cell values to a array/list along with the font formatting so I can do further sorting and operations on them before I "paste" them back on to the worksheet..
So my ideal solution would be some thing like
for x = 0 to 5000
array(x) = Sheets(sheet_).Cells(x, 1) 'including formatting
next
for x = 0 to 5000
Sheets("Output").Cells(x, 1)
next
is it possible to use RTF strings in VBA or is that only possible in vb.net, etc.
Answer*
Just to see how my origianl method and new method compar, here are the results or before and after
New code = 65msec
Sheets("Output").Cells(startrow, 2) = Sheets(sheet_).Cells(x, 1)
Sheets("Output").Range("B" & startrow).Font.ColorIndex = Sheets(sheet_).Range("A" & x).Font.ColorIndex 'copy font colour as well
Old code = 1296msec
'Sheets("Output").Cells(startrow, 2).Value = Sheets(sheet_).Cells(x, 1)
'Sheets(sheet_).Cells(x, 1).Copy
'Sheets("Output").Cells(startrow, 2).PasteSpecial (xlPasteFormats)
'Application.CutCopyMode = False
You could have simply used Range("x1").value(11)
something like below:
Sheets("Output").Range("$A$1:$A$500").value(11) = Sheets(sheet_).Range("$A$1:$A$500").value(11)
range has default property "Value" plus value can have 3 optional orguments 10,11,12.
11 is what you need to tansfer both value and formats. It doesn't use clipboard so it is faster.- Durgesh
For me, you can't. But if that suits your needs, you could have speed and formatting by copying the whole range at once, instead of looping:
range("B2:B5002").Copy Destination:=Sheets("Output").Cells(startrow, 2)
And, by the way, you can build a custom range string, like Range("B2:B4, B6, B11:B18")
edit: if your source is "sparse", can't you just format the destination at once when the copy is finished ?
Remember that when you write:
MyArray = Range("A1:A5000")
you are really writing
MyArray = Range("A1:A5000").Value
You can also use names:
MyArray = Names("MyWSTable").RefersToRange.Value
But Value is not the only property of Range. I have used:
MyArray = Range("A1:A5000").NumberFormat
I doubt
MyArray = Range("A1:A5000").Font
would work but I would expect
MyArray = Range("A1:A5000").Font.Bold
to work.
I do not know what formats you want to copy so you will have to try.
However, I must add that when you copy and paste a large range, it is not as much slower than doing it via an array as we all thought.
Post Edit information
Having posted the above I tried by own advice. My experiments with copying Font.Color and Font.Bold to an array have failed.
Of the following statements, the second would fail with a type mismatch:
ValueArray = .Range("A1:T5000").Value
ColourArray = .Range("A1:T5000").Font.Color
ValueArray must be of type variant. I tried both variant and long for ColourArray without success.
I filled ColourArray with values and tried the following statement:
.Range("A1:T5000").Font.Color = ColourArray
The entire range would be coloured according to the first element of ColourArray and then Excel looped consuming about 45% of the processor time until I terminated it with the Task Manager.
There is a time penalty associated with switching between worksheets but recent questions about macro duration have caused everyone to review our belief that working via arrays was substantially quicker.
I constructed an experiment that broadly reflects your requirement. I filled worksheet Time1 with 5000 rows of 20 cells which were selectively formatted as: bold, italic, underline, subscript, bordered, red, green, blue, brown, yellow and gray-80%.
With version 1, I copied every 7th cells from worksheet "Time1" to worksheet "Time2" using copy.
With version 2, I copied every 7th cells from worksheet "Time1" to worksheet "Time2" by copying the value and the colour via an array.
With version 3, I copied every 7th cells from worksheet "Time1" to worksheet "Time2" by copying the formula and the colour via an array.
Version 1 took an average of 12.43 seconds, version 2 took an average of 1.47 seconds while version 3 took an average of 1.83 seconds. Version 1 copied formulae and all formatting, version 2 copied values and colour while version 3 copied formulae and colour. With versions 1 and 2 you could add bold and italic, say, and still have some time in hand. However, I am not sure it would be worth the bother given that copying 21,300 values only takes 12 seconds.
** Code for Version 1**
I do not think this code includes anything that needs an explanation. Respond with a comment if I am wrong and I will fix.
Sub SelectionCopyAndPaste()
Dim ColDestCrnt As Integer
Dim ColSrcCrnt As Integer
Dim NumSelect As Long
Dim RowDestCrnt As Integer
Dim RowSrcCrnt As Integer
Dim StartTime As Single
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
NumSelect = 1
ColDestCrnt = 1
RowDestCrnt = 1
With Sheets("Time2")
.Range("A1:T715").EntireRow.Delete
End With
StartTime = Timer
Do While True
ColSrcCrnt = (NumSelect Mod 20) + 1
RowSrcCrnt = (NumSelect - ColSrcCrnt) / 20 + 1
If RowSrcCrnt > 5000 Then
Exit Do
End If
Sheets("Time1").Cells(RowSrcCrnt, ColSrcCrnt).Copy _
Destination:=Sheets("Time2").Cells(RowDestCrnt, ColDestCrnt)
If ColDestCrnt = 20 Then
ColDestCrnt = 1
RowDestCrnt = RowDestCrnt + 1
Else
ColDestCrnt = ColDestCrnt + 1
End If
NumSelect = NumSelect + 7
Loop
Debug.Print Timer - StartTime
' Average 12.43 secs
Application.Calculation = xlCalculationAutomatic
End Sub
** Code for Versions 2 and 3**
The User type definition must be placed before any subroutine in the module. The code works through the source worksheet copying values or formulae and colours to the next element of the array. Once selection has been completed, it copies the collected information to the destination worksheet. This avoids switching between worksheets more than is essential.
Type ValueDtl
Value As String
Colour As Long
End Type
Sub SelectionViaArray()
Dim ColDestCrnt As Integer
Dim ColSrcCrnt As Integer
Dim InxVLCrnt As Integer
Dim InxVLCrntMax As Integer
Dim NumSelect As Long
Dim RowDestCrnt As Integer
Dim RowSrcCrnt As Integer
Dim StartTime As Single
Dim ValueList() As ValueDtl
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
' I have sized the array to more than I expect to require because ReDim
' Preserve is expensive. However, I will resize if I fill the array.
' For my experiment I know exactly how many elements I need but that
' might not be true for you.
ReDim ValueList(1 To 25000)
NumSelect = 1
ColDestCrnt = 1
RowDestCrnt = 1
InxVLCrntMax = 0 ' Last used element in ValueList.
With Sheets("Time2")
.Range("A1:T715").EntireRow.Delete
End With
StartTime = Timer
With Sheets("Time1")
Do While True
ColSrcCrnt = (NumSelect Mod 20) + 1
RowSrcCrnt = (NumSelect - ColSrcCrnt) / 20 + 1
If RowSrcCrnt > 5000 Then
Exit Do
End If
InxVLCrntMax = InxVLCrntMax + 1
If InxVLCrntMax > UBound(ValueList) Then
' Resize array if it has been filled
ReDim Preserve ValueList(1 To UBound(ValueList) + 1000)
End If
With .Cells(RowSrcCrnt, ColSrcCrnt)
ValueList(InxVLCrntMax).Value = .Value ' Version 2
ValueList(InxVLCrntMax).Value = .Formula ' Version 3
ValueList(InxVLCrntMax).Colour = .Font.Color
End With
NumSelect = NumSelect + 7
Loop
End With
With Sheets("Time2")
For InxVLCrnt = 1 To InxVLCrntMax
With .Cells(RowDestCrnt, ColDestCrnt)
.Value = ValueList(InxVLCrnt).Value ' Version 2
.Formula = ValueList(InxVLCrnt).Value ' Version 3
.Font.Color = ValueList(InxVLCrnt).Colour
End With
If ColDestCrnt = 20 Then
ColDestCrnt = 1
RowDestCrnt = RowDestCrnt + 1
Else
ColDestCrnt = ColDestCrnt + 1
End If
Next
End With
Debug.Print Timer - StartTime
' Version 2 average 1.47 secs
' Version 3 average 1.83 secs
Application.Calculation = xlCalculationAutomatic
End Sub
Just use the NumberFormat property after the Value property:
In this example the Ranges are defined using variables called ColLetter and SheetRow and this comes from a for-next loop using the integer i, but they might be ordinary defined ranges of course.
TransferSheet.Range(ColLetter & SheetRow).Value = Range(ColLetter & i).Value
TransferSheet.Range(ColLetter & SheetRow).NumberFormat = Range(ColLetter & i).NumberFormat
Does:
Set Sheets("Output").Range("$A$1:$A$500") = Sheets(sheet_).Range("$A$1:$A$500")
...work? (I don't have Excel in front of me, so can't test.)
I have the following Data in Excel.
CHM0123456 SRM0123:01
CHM0123456 SRM0123:02
CHM0123456 SRM0256:12
CHM0123456 SRM0123:03
CHM0123457 SRM0789:01
CHM0123457 SRM0789:02
CHM0123457 SRM0789:03
CHM0123457 SRM0789:04
What I need to do is pull out all the relevent SRM numbers that relate to a single CHM ref. now I have a formular that will do some thing like this
=INDEX($C$2:$C$6, SMALL(IF($B$8=$B$2:$B$6, ROW($B$2:$B$6)-MIN(ROW($B$2:$B$6))+1, ""), ROW(A1)))
however this is a bit untidy and I really want to produce this same using short vb script, do i jsut have to right a loop that will run though and check each row in turn.
For x = 1 to 6555
if Ax = Chm123456
string = string + Bx
else
next
which should give me a final string of
SRM0123:01,SRM123:02,SRM0256:12,SRM0123:03
to use with how i want.
Or is ther a neater way to do this ?
Cheers
Aaron
my current code
For x = 2 To 6555
If Cells(x, 1).Value = "CHM0123456" Then
outstring = outstring + vbCr + Cells(x, 2).Value
End If
Next
MsgBox (outstring)
End Function
I'm not sure what your definition of 'neat' is, but here is a VBA function that I consider very neat and also flexible and it's lightning fast (10k+ entires with no lag). You pass it the CHM you want to look for, then the range to look in. You can pass a third optional paramater to set how each entry is seperated. So in your case you could write (assuming your list is :
=ListUnique(B2, B2:B6555)
You can also use Char(10) as the third parameter to seperat by line breaks, etc.
Function ListUnique(ByVal search_text As String, _
ByVal cell_range As range, _
Optional seperator As String = ", ") As String
Application.ScreenUpdating = False
Dim result As String
Dim i as Long
Dim cell As range
Dim keys As Variant
Dim dict As Object
Set dict = CreateObject("scripting.dictionary")
On Error Resume Next
For Each cell In cell_range
If cell.Value = search_text Then
dict.Add cell.Offset(, 1).Value, 1
End If
Next
keys = dict.keys
For i = 0 To UBound(keys)
result = result & (seperator & keys(i))
Next
If Len(result) <> 0 Then
result = Right$(result, (Len(result) - Len(seperator)))
End If
ListUnique = result
Application.ScreenUpdating = True
End Function
How it works: It simple loops through your range looking for the search_string you give it. If it finds it, it adds it to a dictionary object (which will eliminate all dupes). You dump the results in an array then create a string out of them. Technically you can just pass it "B:B" as the search array if you aren't sure where the end of the column is and this function will still work just fine (1/5th of a second for scanning every cell in column B with 1000 unique hits returned).
Another solution would be to do an advancedfilter for Chm123456 and then you could copy those to another range. If you get them in a string array you can use the built-in excel function Join(saString, ",") (only works with string arrays).
Not actual code for you but it points you in a possible direction that can be helpful.
OK, this might be pretty fast for a ton of data. Grabbing the data for each cell takes a ton of time, it is better to grab it all at once. The the unique to paste and then grab the data using
vData=rUnique
where vData is a variant and rUnique is the is the copied cells. This might actually be faster than grabbing each data point point by point (excel internally can copy and paste extremely fast). Another option would be to grab the unique data without having the copy and past happen, here's how:
dim i as long
dim runique as range, reach as range
dim sData as string
dim vdata as variant
set runique=advancedfilter(...) 'Filter in place
set runique=runique.specialcells(xlCellTypeVisible)
for each reach in runique.areas
vdata=reach
for i=lbound(vdata) to ubound(vdata)
sdata=sdata & vdata(i,1)
next l
next reach
Personally, I would prefer the internal copy paste then you could go through each sheet and then grab the data at the very end (this would be pretty fast, faster than looping through each cell). So going through each sheet.
dim wks as worksheet
for each wks in Activeworkbook.Worksheets
if wks.name <> "CopiedToWorksheet" then
advancedfilter(...) 'Copy to bottom of list, so you'll need code for that
end if
next wks
vdata=activeworkbook.sheets("CopiedToWorksheet").usedrange
sData=vdata(1,1)
for i=lbound(vdata) + 1 to ubound(vdata)
sData=sData & ","
next i
The above code should be blazing fast. I don't think you can use Join on a variant, but you could always attempt it, that would make it even faster. You could also try application.worksheetfunctions.contat (or whatever the contatenate function is) to combine the results and then just grab the final result.
On Error Resume Next
wks.ShowAllData
On Error GoTo 0
wks.UsedRange.Rows.Hidden = False
wks.UsedRange.Columns.Hidden = False
rFilterLocation.ClearContents