Manipulate/copy .CSV data, without opening the file? - optimization

I'm trying to optimize some code that takes some test data stored in CSV files does some analysis and copies their data into an excel sheet. This code is often run on hundreds of tests at a time, and its taking about 4.5 seconds per test so it can take hours to complete at times.
I looked up some optimization techniques and cut it down by about .25 seconds per test but I think the bulk of the time is being taken up by excel having to "open" the individual files before it can do anything with them. Is there a way to do this more efficiently?
I am open to answers that involve using another language to compile the files into one big file if that would make things quicker.

I would open them as text rather than workbooks:
Sub ReadCSV()
Dim MyString As String
Open "C:\path\text.csv" For Input As #1 ' Open file for input.
Do While Not EOF(1) ' Loop until end of file.
Line Input #1, MyString ' Read a line into variable
Debug.Print MyString ' Print data to the Immediate window.
Loop
Close #1 ' Close file.
End Sub
This will be much faster than opening as a workbook

I have this function working greate handling lot of CSV files. You need to indicate in cell "D11" the name of folder containing all the CSV files and will combine them into one single file. I handle over 200 files and make it quick. Hope it helps.
Sub CombineAllFilesInADirectory()
Dim Path As String 'string variable to hold the path to look through
Dim FileName As String 'temporary filename string variable
Dim tWB As Workbook 'temporary workbook (each in directory)
Dim tWS As Worksheet 'temporary worksheet variable
Dim aWS As Worksheet 'active sheet in master workbook
Dim RowCount As Long 'Rows used on master sheet
Dim uRange As Range 'usedrange for each temporary sheet
Dim mWB_comb As Workbook 'master workbook exclusivo de esta funcion
Path = Sheets("CombineFiles").Range("D11").Value
Application.EnableEvents = False 'turn off events
Application.ScreenUpdating = False 'turn off screen updating
Set mWB_comb = Workbooks.Add(1) 'create a new one-worksheet workbook
Set aWS = mWB_comb.ActiveSheet 'set active sheet variable to only sheet in mWB
If Right(Path, 1) <> Application.PathSeparator Then 'if path doesnt end in "\"
Path = Path & Application.PathSeparator 'add "\"
End If
FileName = Dir(Path & "*.csv", vbNormal) 'set first file's name to filename variable
Application.StatusBar = "reading files, please wait."
Do Until FileName = "" 'loop until all files have been parsed
If Path <> ThisWorkbook.Path Or FileName <> ThisWorkbook.Name Then
Set tWB = Workbooks.Open(FileName:=Path & FileName) 'open file, set to tWB variable
For Each tWS In tWB.Worksheets 'loop through each sheet
Set uRange = tWS.Range("A4", tWS.Cells(tWS.UsedRange.Row + tWS.UsedRange.Rows.count - 1, _
tWS.UsedRange.Column + tWS.UsedRange.Columns.count - 1)) 'set used range
If RowCount + uRange.Rows.count > 65536 Then 'if the used range wont fit on the sheet
aWS.Columns.AutoFit 'autofit mostly-used worksheet's columns
Set aWS = mWB_comb.Sheets.Add(After:=aWS) 'add a new sheet that will accommodate data
RowCount = 0 'reset RowCount variable
End If
If RowCount = 0 Then 'if working with a new sheet
aWS.Range("A1", aWS.Cells(3, uRange.Columns.count)).Value = tWS.Range("A1", _
tWS.Cells(3, uRange.Columns.count)).Value 'copy headers from tWS
RowCount = 3 'add one to rowcount
End If
aWS.Range("A" & RowCount + 1).Resize(uRange.Rows.count, _
uRange.Columns.count).Value = uRange.Value 'move data from temp sheet to data sheet
RowCount = RowCount + uRange.Rows.count 'increase rowcount accordingly
Next 'tWS
tWB.Close False 'close temporary workbook without saving
End If
FileName = Dir() 'set next file's name to FileName variable
Loop
Application.StatusBar = "Ready"
mWB_comb.Sheets(1).Select 'select first data sheet on master workbook
Application.EnableEvents = True 're-enable events
Application.ScreenUpdating = True 'turn screen updating back on
'Clear memory of the object variables
Set tWB = Nothing
Set tWS = Nothing
Set mWB_comb = Nothing
Set aWS = Nothing
Set uRange = Nothing
End Sub

Related

Edit VBA code that Consolidates Workbooks

I have a macro here that consolidates all files into a master file. Currently this copies the entire worksheet from each file and stacks it into the master file. This ideal code will go into source file and find the header row and copy below. The header row is not static. Sometimes its on Row 5, sometimes row 15. There is data above the header and its usually long text strings.
How do I edit the below code to do that?
Thanks In Advance!
Here is the Code:
Sub Consolidate_BST()
Dim SummarySheet As Worksheet
Dim FolderPath As String
Dim NRow As Long
Dim FileName As String
Dim WorkBk As Workbook
Dim sourceRange As Range
Dim destrange As Range
Worksheets("Consolidate BSts").Range("A1:J50000").ClearContents
Application.AskToUpdateLinks = False
Application.DisplayAlerts = False
' Create a new workbook and set a variable to the first sheet.
Set SummarySheet = ThisWorkbook.Sheets("Consolidate BSts")
' Modify this folder path to point to the files you want to use. 'M
FolderPath = "C:\Users\413315\Documents\\March Bluesheets"
' NRow keeps track of where to insert new rows in the destination workbook.
NRow = 1
' Call Dir the first time, pointing it to all Excel files in the folder path.
FileName = Dir(FolderPath & "*.xl*")
' Loop until Dir returns an empty string.
Do While FileName <> ""
' Open a workbook in the folder
Set WorkBk = Workbooks.Open(FolderPath & FileName)
' Set the cell in column A to be the file name.
SummarySheet.Range("A" & NRow).Value = FileName
' Modify this range for your workbooks.
' It can span multiple rows.
Set sourceRange = WorkBk.Worksheets("Regional Estimates").Range("B3:J1000")
' Set the destination range to start at column B and
' be the same size as the source range.
Set destrange = SummarySheet.Range("B" & NRow)
Set destrange = destrange.Resize(sourceRange.Rows.Count, _
sourceRange.Columns.Count)
' Copy over the values from the source to the destination.
destrange.Value = sourceRange.Value
' Increase NRow so that we know where to copy data next.
NRow = NRow + destrange.Rows.Count
' Close the source workbook without saving changes.
WorkBk.Close savechanges:=False
' Use Dir to get the next file name.
FileName = Dir()
Loop
SummarySheet.Columns.AutoFit
Worksheets("Consolidate BSts").Range("D2:D50000").ClearContents
End Sub
Assuming that the header is always in the same column, you could add something like the following to the top of your code:
Dim cond As Boolean
Dim headerRow As Integer, i As Integer
cond = False
While cond <> True
i = i + 1
If sourceRange.Cells(i, 1) = "Name of Header" Then 'For a header in column A
i = headerRow
cond = True
End If
Wend
Then with you should be able to modify the range of data that it intakes based on what row the headers are in.
Also you could change the if statement to something more general, but it's hard to know what it should be without knowing what your headers and data look like.

Copying data from many workbooks to a summary workbook with Excel-VBA. Run time errors

I have files in a folder and I want to copy data from these files and paste them into another Master workbook sheet.
I keep getting a runtime error ‘1004’: Sorry we couldn’t find C:\Users\jjordan\Desktop\Test Dir\MASTER`, It is possible it was moved, renamed or deleted.
The error is highlighted on this line of code: Workbooks.Open SumPath & SumName
I have seen other questions similar to this on the web, I have tried making various changes. But still without success. Please advise.
Dir for source files: C:\Users\ jjordan \Desktop\Test Dir\GA Test\
Dir for Master file: C:\Users\ jjordan \Desktop\Test Dir\MASTER\
Source filenames differ, but all end in "*.xlsx."
Master filename: " MASTER – Data List - 2016.xlsm " ‘macro file
Source worksheet name = "Supplier_Comments"
Master worksheet name = "Sheet5"
Option Explicit
Sub GetDataFromMaster()
Dim MyPath As String
Dim SumPath As String
Dim MyName As String
Dim SumName As String
Dim MyTemplate As String
Dim SumTemplate As String
Dim myWS As Worksheet
Dim sumWS As Worksheet
'Define folders and filenames
MyPath = "C:\Users\jjordan\Desktop\Test Dir\GA Test\"
SumPath = "C:\Users\jjordan\Desktop\Test Dir\MASTER\"
MyTemplate = "*.xlsx" 'Set the template.
SumTemplate = "MASTER – Data List - 2016.xlsm"
'Open the template file and get the Worksheet to put the data into
SumName = Dir(SumPath & SumTemplate)
Workbooks.Open SumPath & SumName
Set sumWS = ActiveWorkbook.Worksheets("Sheet5")
'Open each source file, copying the data from each into the template file
MyName = Dir(MyPath & MyTemplate) 'Retrieve the first file
Do While MyName <> ""
'Open the source file and get the worksheet with the data we want.
Workbooks.Open MyPath & MyName
Set myWS = ActiveWorkbook.Worksheets("Suppliers_Comment")
'Copy the data from the source and paste at the end of sheet 5
myWS.Range("A2:N100").Copy
sumWS.Range("A65536").End(xlUp).Offset(1, 0).PasteSpecial Paste:=xlPasteValues
'Close the current sourcefile and get the next
Workbooks(MyName).Close SaveChanges:=False 'close
MyName = Dir 'Get next file
Loop
'Now all sourcefiles are copied into the Template file. Close and save it
Workbooks(SumName).Close SaveChanges:=True
End Sub
Here is a template for what you'd like done. NOTE that forward slashes can cause run time error b/c vba handles them in an annoying way.
Sub DougsLoop()
Dim wbk As Workbook
Dim Filename As String
Dim path As String
Dim rCell As Range
Dim rRng As Range
Dim wsO As Worksheet
Dim StartTime As Double
Dim SecondsElapsed As Double
Dim sheet As Worksheet
Application.ScreenUpdating = False 'these three statements help performance by disabling the self titled in each, remeber to re-enable at end of code
Application.DisplayAlerts = False
Application.Calculation = xlCalculationManual
StartTime = Timer 'Starts timer to see how long code takes to execute. I like having this in macors that loop through files
path = "C:\Users\jjordan\Desktop\Test Dir\GA Test" & "\" 'pay attention to this line of code********
Filename = Dir(path & "*.xl??")
Set wsO = ThisWorkbook.Sheets("Sheet5")
Do While Len(Filename) > 0 'this tells the code to stop when there are no more files in the destination folder
DoEvents
Set wbk = Workbooks.Open(path & Filename, True, True)
For Each sheet In ActiveWorkbook.Worksheets
Set rRng = sheet.Range("a2:n100")
For Each rCell In rRng.Cells
wsO.Cells(wsO.Rows.count, 1).End(xlUp).Offset(1, 0).Value = rCell
Next rCell
Next
wbk.Close False
Filename = Dir
Loop
Application.ScreenUpdating = True
Application.DisplayAlerts = True
Application.Calculation = xlCalculationAutomatic
SecondsElapsed = Round(Timer - StartTime, 2)
MsgBox "This code ran successfully in " & SecondsElapsed & " seconds", vbInformation
End Sub
alter to this to your needs and you'll find it works perfectly :)
EDIT: Also in your code you make use of COPY & PASTE a lot. Try avoid doing this in the future. Try doing something:
ThisWorkbook.Sheets("Sheet1").Range("a1").Value = OtherWork.Sheets("Sheet1").Range("a1").Value
This is more efficient and wont bog down your code as much.
here is some offset logic
wsO.Cells(wsO.Rows.count, 1).End(xlUp).Offset(1, 0).Value =
wsO.Cells(wsO.Rows.count, 1).End(xlUp).Offset(0, 1).Value =
wsO.Cells(wsO.Rows.count, 1).End(xlUp).Offset(0, 2).Value =
notice the Offset(x,y) value? Essentially x is down and y is right. this is of course referencing the original position. So to get a value to go in the same row but three columns over you would use "Offset(0,3)" etc etc
Ill let you alter your code to do this. :)
I guess actually trying to piece it together was a struggle? Here this version assumes the macro is in the master workbook(and that youre running it form the master). If you want to change go ahead, but this is as far as I go. At some point, you'll have to experiment on your own.

Delimit/text to columns .CSV file before extracting data

I have a code that goes through .xlsx , .xls, and .csv files(many files). But the .csv file data are usually all in column A separated by "|". How can I delimit these files first before looping through them and pulling an extract? It can be complicated because sometimes not only Col A has data but Col B may have a few rows.
EDIT~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Sub ColumnHeaders()
'includes filling down
'Skips unreadable files
Dim wb As Workbook, fileNames As Object, errCheck As Boolean
Dim wsReport As Excel.Worksheet
Set wsReport = ActiveWorkbook.Sheets("Sheet1") 'Whatever sheet you want to write to
Dim lRow As Long
lRow = 1
' Turn off screen updating and automatic calculation
With Application
.ScreenUpdating = False
.Calculation = xlCalculationManual
End With
'get user input for files to search
Set fileNames = CreateObject("Scripting.Dictionary")
errCheck = UserInput.FileDialogDictionary(fileNames)
If errCheck Then
Exit Sub
End If
'''
For Each Key In fileNames 'loop through the dictionary
On Error Resume Next
Set wb = Workbooks.Open(fileNames(Key))
If Err.Number <> 0 Then
Set wb = Nothing ' or set a boolean error flag
End If
On Error GoTo 0 ' or your custom error handler
If wb Is Nothing Then
wksSkipped.Cells(wksSkipped.Cells(wksSkipped.Rows.Count, "A").End(xlUp).Row + 1, 1) = fileNames(Key)
Else
Debug.Print "Successfully loaded " & fileNames(Key)
wb.Application.Visible = False 'make it not visible
'--------------------DATA Extraction ----------------------------------------
Dim iIndex As Integer
Dim lCol As Long
Dim lOutputCol As Long
'Loop through the worksheets in the current workbook.
For iIndex = 1 To wb.Worksheets.Count
'Set the current worksheet
Set ws = Application.Worksheets(iIndex)
'List out the workbook and worksheet names
wsReport.Range("A" & lRow).Value = wb.Name
wsReport.Range("B" & lRow).Value = ws.Name
'Start a counter of the columns that we are writing to
lOutputCol = 3
'Loop through the columns.
For lCol = 1 To ws.UsedRange.Columns.Count
'Write the header
wsReport.Range(Col_Letter(lOutputCol) & lRow).Value = ws.Range(Col_Letter(lCol) & "1").Value
'Increment our column counters.
lOutputCol = lOutputCol + 1
Next lCol
'Increment the row we are writing to
lRow = lRow + 1
Next iIndex
'-----------------------Data Extraction END-------------------------------------
wb.Close savechanges:=False 'close the workbook do not save
Set wb = Nothing 'release the object
End If
Next 'End of the fileNames loop
Set fileNames = Nothing
' Reset system settings
With Application
.Calculation = xlCalculationAutomatic
.ScreenUpdating = True
.Visible = True
End With
End Sub
Two possible approaches.
APPROACH #1
Your code first opens the file as text, then prepends (inserts as the first line) to the file this:
sep=|
Next, your code then opens as CSV in Excel. If your text qualifiers are normal quotes, this will work surprisingly well.
I know that prepending is a little tricky in VBA. If the file is not too big, then you would create a new file with that first line followed by the rest of the original file (e.g. http://www.ozgrid.com/forum/showthread.php?t=163510).
The advantage of this approach is that your VBA doesn't need to do text-to-columns, etc.
APPROACH #2
The VBA loops through all rows and columns to clean up first before performing text-to-columns. As you said, if the delimiter is non-standard, you often end up with values in column B or C or even more so you need "clean up" first by putting them all back into column A.
I used to run my own little macro to do this cleaning up for exactly this reason i.e. "|" as a delimiter.
Later, I discovered that I could insert "sep=|" (e.g. using Notepad++) before opening in Excel.
I wrote a short article on both approaches and offered my code here: https://wildwoollytech.wordpress.com/2015/12/11/combine-excel-cells-into-one/

Trying to iterate through some workbooks from a list of workbooks, getting out of range errors

I have a problem. I'm guessing its easier to first write the code, and then explain it so here goes:
Sub Test()
Dim myHeadings() As String
Dim i As Long
Dim path As String
Dim pathtwo As String
Dim currentWb As Workbook
Dim openWb As Workbook
Dim openWs As Worksheet
Set currentWb = ActiveWorkbook
path = "C:\pathto\"
pfile = Split("File1,File2,File3", ",")
myHeadings = Split("Januari,Februari,Mars,April,Maj,Juni,Juli,Augusti,September,Oktober,November,December", ",")
For j = 0 To UBound(pfile)
pathtwo = path & pfile(j) & ".xlsx"
i = 0
If IsFile(pathtwo) = True Then
For i = 0 To UBound(myHeadings)
Set openWb = Workbooks.Open(pathtwo)
Set openWs = openWb.Sheets(myHeadings(i))
If openWs.Range("C34") = 0 Then
currentWb.Sheets("Indata").Cells(70, i + 27 + 12*j.Value = ""
Else
currentWb.Sheets("Indata").Cells(70, i + 27 + 12*j).Value = openWs.Range("C34")
End If
Next i
End if
Workbooks(openWb.Name).Close
Next j
End sub
What I want to pick a file from the pfile list, iterate through all its sheets defined in myHeadings and deduct the value at C34 (in reality there are plenty more values deducted, but to keep it short). After this I want to Close the file, go to the next file and do the same thing until all the Three files (again, in reality there are more, which some of them does not exist yet).
The function "IsFile" is
Function IsFile(fName As String) As Boolean
'Returns TRUE if the provided name points to an existing file.
'Returns FALSE if not existing, or if it's a folder
On Error Resume Next
IsFile = ((GetAttr(fName) And vbDirectory) <> vbDirectory)
End Function
written by iDevlop at stackoverflow, in this thread: VBA check if file exists
The reason why I have
currentWb.Sheets("Indata").Cells(70, i + 27 + 12*j).Value = openWs.Range("C34")
is because I want to start to write my data into currentWb at AA70 (Row 70, column 27). j*12 is because it is "periodic" depending on which file it is (the file file1 corresponds to 2015, file2 to 2016 etc), and hence in my summary I have it month and yearwise.
The problem arises though when I run this macro, at the first file at the sheet Mars I get out of range, but Before I added the iteration of files, there was not any subscript out of range at the first file. Is there anyone who can see how this can be?
Please note that indentation and so on may be somewhat off as I copied this from a much larger file with many rows in between with irrelevant code.
This isnt the right answer for your specific question but this is how I have done something similar and might help you to see how i did it. Basically what this is doing is opening up a CSV and copying the entire sheet and pasting it into a workbook. I was consolidating like 20 CSV dumps into 1 workbook to make it easier to dig through the stuff.
Regarding Dir()
You can invoke Dir with 2 arguments or with no arguments. You initialize it with 2 arguments the pathway and the attributes (which is optional). The 2nd time I am calling Dir in this sub it is without any arguments. What this does is fetch the subsequent files.
Sub Add_Sheets()
Dim ws As Worksheet
Dim PasteSheet As Worksheet
Dim wb As Workbook
Set wb = Application.Workbooks.Open("C:\Users\Desktop\CSV\All.xlsx") 'Location of where you want the workbook to be
StrFile = Dir("c:\Users\Desktop\CSV\*.csv") 'Dir of where all the CSVs were.
Do While Len(StrFile) > 0
Debug.Print StrFile
Application.Workbooks.Open ("c:\Users\Desktop\CSV\" & StrFile)
Set ws = ActiveSheet
ws.Range("A1:C" & rows.Count).Select 'Selecting Specific content on the worksheet
Selection.Copy
wb.Activate
wb.Worksheets.add(After:=Worksheets(Worksheets.Count)).name = StrFile 'Setting the sheet name to the name of the CSV file
Range("A1").PasteSpecial Paste:=xlPasteValues
StrFile = Dir
Loop
End Sub

Copy data from another Workbook through VBA

I want to collect data from different files and insert it into a workbook doing something like this.
Do While THAT_DIFFERENT_FILE_SOMEWHERE_ON_MY_HDD.Cells(Rand, 1).Value <> "" And Rand < 65536
then 'I will search if the last row in my main worksheet is in this file...
End Loop
If the last row from my main worksheet is in the file, I'll quit the While Loop. If not, I'll copy everything. I'm having trouble finding the right algorithm for this.
My problem is that I don't know how to access different workbooks.
The best (and easiest) way to copy data from a workbook to another is to use the object model of Excel.
Option Explicit
Sub test()
Dim wb As Workbook, wb2 As Workbook
Dim ws As Worksheet
Dim vFile As Variant
'Set source workbook
Set wb = ActiveWorkbook
'Open the target workbook
vFile = Application.GetOpenFilename("Excel-files,*.xls", _
1, "Select One File To Open", , False)
'if the user didn't select a file, exit sub
If TypeName(vFile) = "Boolean" Then Exit Sub
Workbooks.Open vFile
'Set targetworkbook
Set wb2 = ActiveWorkbook
'For instance, copy data from a range in the first workbook to another range in the other workbook
wb2.Worksheets("Sheet2").Range("C3:D4").Value = wb.Worksheets("Sheet1").Range("A1:B2").Value
End Sub
You might like the function GetInfoFromClosedFile()
Edit: Since the above link does not seem to work anymore, I am adding alternate link 1 and alternate link 2 + code:
Private Function GetInfoFromClosedFile(ByVal wbPath As String, _
wbName As String, wsName As String, cellRef As String) As Variant
Dim arg As String
GetInfoFromClosedFile = ""
If Right(wbPath, 1) <> "" Then wbPath = wbPath & ""
If Dir(wbPath & "" & wbName) = "" Then Exit Function
arg = "'" & wbPath & "[" & wbName & "]" & _
wsName & "'!" & Range(cellRef).Address(True, True, xlR1C1)
On Error Resume Next
GetInfoFromClosedFile = ExecuteExcel4Macro(arg)
End Function
Are you looking for the syntax to open them:
Dim wkbk As Workbook
Set wkbk = Workbooks.Open("C:\MyDirectory\mysheet.xlsx")
Then, you can use wkbk.Sheets(1).Range("3:3") (or whatever you need)
There's very little reason not to open multiple workbooks in Excel. Key lines of code are:
Application.EnableEvents = False
Application.ScreenUpdating = False
...then you won't see anything whilst the code runs, and no code will run that is associated with the opening of the second workbook. Then there are...
Application.DisplayAlerts = False
Application.Calculation = xlManual
...so as to stop you getting pop-up messages associated with the content of the second file, and to avoid any slow re-calculations. Ensure you set back to True/xlAutomatic at end of your programming
If opening the second workbook is not going to cause performance issues, you may as well do it. In fact, having the second workbook open will make it very beneficial when attempting to debug your code if some of the secondary files do not conform to the expected format
Here is some expert guidance on using multiple Excel files that gives an overview of the different methods available for referencing data
An extension question would be how to cycle through multiple files contained in the same folder. You can use the Windows folder picker using:
With Application.FileDialog(msoFileDialogFolderPicker)
.Show
If .Selected.Items.Count = 1 the InputFolder = .SelectedItems(1)
End With
FName = VBA.Dir(InputFolder)
Do While FName <> ""
'''Do function here
FName = VBA.Dir()
Loop
Hopefully some of the above will be of use
I had the same question but applying the provided solutions changed the file to write in. Once I selected the new excel file, I was also writing in that file and not in my original file. My solution for this issue is below:
Sub GetData()
Dim excelapp As Application
Dim source As Workbook
Dim srcSH1 As Worksheet
Dim sh As Worksheet
Dim path As String
Dim nmr As Long
Dim i As Long
nmr = 20
Set excelapp = New Application
With Application.FileDialog(msoFileDialogOpen)
.AllowMultiSelect = False
.Filters.Add "Excel Files", "*.xlsx; *.xlsm; *.xls; *.xlsb", 1
.Show
path = .SelectedItems.Item(1)
End With
Set source = excelapp.Workbooks.Open(path)
Set srcSH1 = source.Worksheets("Sheet1")
Set sh = Sheets("Sheet1")
For i = 1 To nmr
sh.Cells(i, "A").Value = srcSH1.Cells(i, "A").Value
Next i
End Sub
With excelapp a new application will be called. The with block sets the path for the external file. Finally, I set the external Workbook with source and srcSH1 as a Worksheet within the external sheet.