Looping through Excel files in a folder, skipping the pre-processed ones (VBA)

I'm new to the site and I'm learning VBA.
Basically, I wrote code that loops through the Excel files in a folder and processes some data, which is then written to a single common Excel file: the name of the processed file goes in column A and all the data I want to record goes in the following cells.
Since I'm working with a lot of XLS files and the folder is constantly updated with new ones, I was wondering what the easiest way is to go through the files again when the macro starts while skipping the pre-processed files, so that only the new ones are recorded.
Thanks in advance

Add a function that checks if your file is already processed. Assuming that you have the list of processed files in column A of 1st Worksheet:
Function FileAlreadyProcessed(filename As String) As Boolean
    Dim r As Range, matchRes As Variant
    Set r = ThisWorkbook.Sheets(1).Range("A:A")
    matchRes = Application.Match(filename, r, 0)
    FileAlreadyProcessed = (Not IsError(matchRes))
End Function
This function searches column A for the filename. If it is found, the function returns True, otherwise False. So add a check in your loop:
If Not FileAlreadyProcessed(fileName) Then
    ' ... do your processing
End If
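Put together, the loop might look like the sketch below. The folder path, the `*.xls*` pattern, and the `ProcessFile` and `RecordFile` routines are placeholders made up for illustration; `FileAlreadyProcessed` is the function above. The sketch assumes column A stores the bare file name (not the full path), since that is what `Dir` returns.

```vba
Sub ProcessNewFiles()
    ' Hypothetical folder; replace with your own (note the trailing backslash).
    Const FOLDER_PATH As String = "C:\Data\Incoming\"
    Dim fileName As String

    fileName = Dir(FOLDER_PATH & "*.xls*")      ' first matching file, name only
    Do While fileName <> ""
        If Not FileAlreadyProcessed(fileName) Then
            ProcessFile FOLDER_PATH & fileName  ' placeholder for your existing code
            RecordFile fileName                 ' so the next run skips this file
        End If
        fileName = Dir()                        ' next matching file
    Loop
End Sub

' Writes the file name below the last used cell in column A of the first sheet.
Private Sub RecordFile(fileName As String)
    With ThisWorkbook.Sheets(1)
        .Cells(.Rows.Count, "A").End(xlUp).Offset(1, 0).Value = fileName
    End With
End Sub

' Placeholder: open the workbook, read what you need, close it.
Private Sub ProcessFile(fullPath As String)
    Dim wb As Workbook
    Set wb = Workbooks.Open(fullPath, ReadOnly:=True)
    ' ... copy the values you need into ThisWorkbook ...
    wb.Close SaveChanges:=False
End Sub
```

If your existing processing code already writes the file name to column A, drop the `RecordFile` call. Also avoid calling `Dir` inside `ProcessFile`, because that would reset the outer enumeration.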

Related

VBA - Loop through multiple subfolders on a network location with date search criteria/ Improve search speed

Purpose of my question and of the VBA code:
Get specific data (a couple of columns) from each "table.csv" file in a network directory. Each networkdirectory/subfolders01/subfolders02 contains one "table.csv" file, but each networkdirectory/subfolders01 also contains about 100 other subfolders. Those other folders are not needed; the only one we are interested in is subfolders02 within each subfolders01. The network directory holds about 15000 subfolders01. However, I only need the subfolders01 from, for example, Jan 2020 to Apr 2020 (about 200 subfolders).
The final purpose is to trend the data.
Issue:
I am trying to understand how I could improve the VBA code that I am currently using.
This code goes through each subfolder one by one and then checks the date and file name.
I am wondering if there is a way to add search filter criteria for the subfolder date and name to get a faster loop.
How can we keep the code from going through every subfolder?
Please see the code I am using below.
I really appreciate your time and hope my request is clear.
'''
Function GetFiles(startPath As String) As Collection
    Dim fso As Object, rv As New Collection, colFolders As New Collection, fpath As String
    Dim subFolder As Object, f, dMinfold, dtMod
    Set fso = CreateObject("Scripting.FileSystemObject")
    dMinfold = ThisWorkbook.Sheets("Enter_Date").Cells(2, 1)
    colFolders.Add startPath
    Do While colFolders.Count > 0
        fpath = colFolders(1)
        colFolders.Remove 1
        'process subfolders
        For Each subFolder In fso.getfolder(fpath).subfolders
            If subFolder.DateLastModified >= dMinfold Then
                colFolders.Add subFolder.Path
            End If
        Next subFolder
        'process files
        f = Dir(fso.buildpath(fpath, "*Table.csv"), vbNormal)
        Do While f <> ""
            f = fso.buildpath(fpath, f)
            dtMod = FileDateTime(f)
            If dtMod >= dMinfold And Right(f, 3) = "csv" Then
                rv.Add f
            End If
            f = Dir()
        Loop
    Loop
    Set GetFiles = rv
End Function
'''
Then I have my code to transfer the data from each file.
Thank you.
I'll put in screenshots to clear up the Get & Transform method, since it is a GUI approach rather than code.
It is possible to filter before loading the contents, which speeds things up significantly.
I tried it with a few thousand subfolders filtered down to 20, and it loads instantly.
Here's the initial screen for "get data from folder".
You can then filter on path. In your case it will be based on the date from the folder name.
Now that it's filtered you can expand the content using the header button.
Inside content, you'll have to expand again to convert from csv to excel table
Choose/rename columns as needed, then hit "close and load" to drop it into excel.
Default is to a new table, but you can "load to" if something more custom is needed.
Here's your output. You can right-click refresh or refresh from vba as needed.
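For the "refresh from vba" part, a minimal sketch could look like this. The sheet name `Output` and table name `Table_Query` are hypothetical; use whatever "close and load" created in your workbook.

```vba
Sub RefreshFolderQuery()
    ' "Output" and "Table_Query" are made-up names for the loaded query table.
    Dim lo As ListObject
    Set lo = ThisWorkbook.Worksheets("Output").ListObjects("Table_Query")
    ' BackgroundQuery:=False makes VBA wait until the refresh has finished.
    lo.QueryTable.Refresh BackgroundQuery:=False
End Sub
```

If you simply want every query and connection in the workbook refreshed, `ThisWorkbook.RefreshAll` does that in one call.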
Edit- Just noticed that I used .txt rather than .csv for the files. Might change how a step or two looks in the middle, but the general idea is the same.
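If you prefer to stay in VBA, one way to avoid walking every subfolder is to enumerate only the first level and build the path to the one child folder you care about directly. This is only a sketch under assumptions that are not stated in the question: the child folder (subfolders02) has the same known name everywhere, passed in here as the hypothetical parameter `childName`, and the file matches the `*Table.csv` pattern from the original code.

```vba
Function GetTableFiles(startPath As String, childName As String, _
                       minDate As Date) As Collection
    Dim fso As Object, topFolder As Object
    Dim childPath As String, f As String
    Dim result As New Collection

    Set fso = CreateObject("Scripting.FileSystemObject")

    ' Only the first level is enumerated; the ~100 sibling folders
    ' below each first-level folder are never visited.
    For Each topFolder In fso.GetFolder(startPath).SubFolders
        If topFolder.DateLastModified >= minDate Then
            ' Build the expected child path instead of searching for it.
            childPath = fso.BuildPath(topFolder.Path, childName)
            If fso.FolderExists(childPath) Then
                f = Dir(fso.BuildPath(childPath, "*Table.csv"))
                If f <> "" Then result.Add fso.BuildPath(childPath, f)
            End If
        End If
    Next topFolder

    Set GetTableFiles = result
End Function
```

A call might look like `Set files = GetTableFiles("\\server\share\data", "results", DateSerial(2020, 1, 1))`, where the share and folder names are made up. This still touches every first-level folder once. Keep in mind that a folder's modified date generally changes only when its direct children are added, removed or renamed, so filtering on a date encoded in the folder name, as suggested above, may be more reliable.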

Update an excel file by multiple users at same time without opening the file

Scenario
I have an Excel file that contains data. Multiple users access the file at the same time.
Problem
There is a problem when multiple users try to enter data into that Excel file at the same time, because only one user is allowed to open the file at a time.
Question
Is there any way to update the Excel file (e.g. add a value to a cell, delete a value from a cell, find a particular cell, etc.) without opening it, so that multiple users can update it at the same time using Excel VBA?
I went in the direction of using shared files, but later found that Excel shared workbooks are very buggy. With a shared file, Excel and its macros can be very slow, there are intermittent crashes, and sometimes the whole file gets corrupted and cannot be opened or repaired afterwards. Depending on how many users use the file, the file size can also grow quite big. So it is best not to use a shared workbook; it is not worth trying.
Instead, if multiple users need to update data simultaneously, it is better to use a database such as MS Access or MS SQL Server (Update MSSQL from Excel) together with Excel. In my situation the number of users is small, so I didn't use a database; instead I prompt the user to wait until the other user closes the file. Please see the code below, which checks whether a file is open and, if so, prompts the user. I got this code from Stack Overflow itself and modified it to suit my needs.
Call the procedure TestFileOpened as below.
Sub fileCheck()
    Call TestFileOpened(plannerFilePathTemp)
    If fileInUse = True Then
        Application.ScreenUpdating = True
        Exit Sub
    End If
End Sub
Here plannerFilePathTemp is the location of the temporary file that belongs to your original file. Whenever an Excel file is opened, a temporary file is created in the same folder. For example, suppose your original file location is as below
plannerFilePath = "C:\TEMP\XXX\xxx.xlsx"
Thus your temporary file location will be
plannerFilePathTemp = "C:\TEMP\XXX\~$xxx.xlsx"
or in other words, temporary file name will be ~$xxx.xlsx
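Rather than typing the second path by hand, the temporary file path can be derived from the original one. The helper name `LockFilePath` is made up for this sketch; it only applies the naming rule described above (prefix the file name with `~$`).

```vba
' Returns the path of the ~$ temporary file that sits next to originalPath.
Function LockFilePath(originalPath As String) As String
    Dim sepPos As Long
    sepPos = InStrRev(originalPath, "\")    ' position of the last backslash, 0 if none
    LockFilePath = Left$(originalPath, sepPos) & "~$" & Mid$(originalPath, sepPos + 1)
End Function
```

With it, the assignment becomes `plannerFilePathTemp = LockFilePath(plannerFilePath)`.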
The following code runs when you Call TestFileOpened(plannerFilePathTemp):
Public fileInUse As Boolean
Sub TestFileOpened(fileOpenedOrNot As String)
    Dim objFSO As Object
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    ' The ~$ temporary file exists only while someone has the workbook open
    If objFSO.FileExists(fileOpenedOrNot) Then
        fileInUse = True
        MsgBox "Database is open and in use by " & GetFileOwner(fileOpenedOrNot) & ". Please wait a few seconds and click again", vbInformation, "Database in Use"
    Else
        fileInUse = False
    End If
End Sub
Function GetFileOwner(strFileName)
    ' Reads the owner of the temporary file through WMI
    Set objWMIService = GetObject("winmgmts:")
    Set objFileSecuritySettings = _
        objWMIService.Get("Win32_LogicalFileSecuritySetting='" & strFileName & "'")
    intRetVal = objFileSecuritySettings.GetSecurityDescriptor(objSD)
    If intRetVal = 0 Then
        GetFileOwner = objSD.Owner.Name
    Else
        GetFileOwner = "Unknown"
    End If
End Function
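As an alternative to looking for the temporary file, you can try to open the workbook file itself with an exclusive lock and see whether that fails. This is a different technique from the one above, not a replacement I have tested in this setup; the function name `IsFileLocked` is made up, and the sketch assumes that the other user's Excel holds the file open so that the attempt fails with run-time error 70 (Permission denied).

```vba
' Returns True if another process currently has filePath open.
' Pass the original workbook path here, not the ~$ temporary file.
Function IsFileLocked(filePath As String) As Boolean
    Dim fileNum As Integer, errNum As Long

    On Error Resume Next
    fileNum = FreeFile
    ' Try to take a read lock; this fails if the file is already in use.
    Open filePath For Input Lock Read As #fileNum
    errNum = Err.Number
    Close #fileNum
    On Error GoTo 0

    ' 70 = Permission denied; other errors (e.g. 53, file not found) are not "locked".
    IsFileLocked = (errNum = 70)
End Function
```

This does not tell you who has the file open, so it cannot replace GetFileOwner in the message.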
I also ran into out-of-memory issues when using shared files. In the process, I figured out the following methods to minimize memory consumption:
Some tips to clear memory

Adding a macro file to an XLSX/XLSM file using Packaging

I'm using System.IO.Packaging to build simple Excel files. One of our customers would like to have an autorun macro that updates data and recalcs the sheet.
Pulling apart existing sheets I can see that all you need to do is add the vbaProject.bin file and change a few types in the _rels. So I made the macro in one file, extracted the vbaProject.bin, copied it into another file, and presto, there's the macro.
I know how to add package parts when they are in XML format, like the sheets or the workbook itself, but I've never added a binary file and I can't figure it out. Has anyone done this before?
Ok I got it. Following TnTinMn's suggestion:
Open a new workbook and type in your macro. Change the extension to zip, open it, open the xl folder and copy out the vbaProject.bin to somewhere easy to find.
In your .NET code, make a new Part and add it to the Package as 'xl/vbaProject.bin'. Copy the bytes over one-for-one from the vbaProject.bin you extracted above. They will be compressed as you add them.
Then you have to add a relationship to the workbook that points to your new file. You can find those relationships in xl/_rels/workbook.xml.rels.
You also have to add a content type entry at the root of the document, which goes into [Content_Types].xml. This happens automatically when you use the ContentType parameter of CreatePart.
And finally, change the extension to .xlsm or .xltm.
I'm extracting the following from many places in my code, so this is pseudo...
'the package...
Dim xlPackage As Package = Package.Open(WBStream, FileMode.Create)
'start with the workbook, we need the object before we physically insert it
Dim xlPartUri As URI = PackUriHelper.CreatePartUri(New Uri(GetAbsoluteTargetUri("/", "xl/workbook.xml"), UriKind.Relative)) 'the URI is relative to the outermost part of the package, /
Dim xlPart As PackagePart = xlPackage.CreatePart(xlPartUri, "application/vnd.ms-excel.sheet.macroEnabled.main+xml", CompressionOption.Normal)
'add an entry in the root _rels folder pointing to the workbook
xlPackage.CreateRelationship(xlPartUri, TargetMode.Internal, "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument", "xlWorkbook") 'it turns out the ID can be anything unique
'now that we have the WB part, we can make our macro relative to it
Dim xlMacroUri As Uri = PackUriHelper.CreatePartUri(New Uri(GetAbsoluteTargetUri("/xl/workbook.xml", "vbaProject.bin"), UriKind.Relative))
Dim xlMacroPart As PackagePart = xlPackage.CreatePart(xlMacroUri, "application/vnd.ms-office.vbaProject", CompressionOption.Normal)
'time we link the vba to the workbook: the relationship lives on the workbook part and targets the macro part
xlPart.CreateRelationship(xlMacroUri, TargetMode.Internal, "http://schemas.microsoft.com/office/2006/relationships/vbaProject", "rIdMacro") 'the ID on the macro can be anything as well
'copy over the data from the macro file
Using MacroStream As New FileStream("C:\yourdirectory\vbaProject.bin", FileMode.Open, FileAccess.Read)
MacroStream.CopyTo(xlMacroPart.GetStream())
End Using
'
'now write data into the main workbook any way you like, likely using new Parts to add Sheets

Delete top row of excel file using SSIS

I have an Excel file with a header row that I want to delete. The header row in this file is cells A1 to W1 merged into one. This causes a problem when I try to read the file, because I am expecting column names. Proper column names exist in the second row of the file, which is why I want to delete the first.
To accomplish this I thought I'd be able to use the 'Excel Source' item in SSIS since it supports a SQL option to write a query. What I want to do is something like this:
SELECT * from ExcelFile WHERE Row > 1
My file only has data in columns A thru W.
I don't know what syntax I can use in the query to do this. The query builder that is in the Excel Source item will allow me to do many things with columns but I don't see an option for doing anything with rows. Searching online and using the help didn't get me anywhere.
None of these solutions will work because the Excel driver will be confused by the merged first line. You won't be able to use any driver features such as skip first row to do this. You need to run some script to open the Excel file and delete the row manually.
There is some basic sample script at this site:
http://www.sqlservercentral.com/Forums/Topic1327014-1292-1.aspx
The code below is adapted from the code written by snsingh at that site.
You would obviously want to use connection manager properties, not hard-coded paths.
Excel needs to be installed on the SSIS server for this to work; this is the only way to use Excel automation.
Dim filename As String
Dim appExcel As Object
Dim newBook As Object
Dim oSheet1 As Object
appExcel = CreateObject("Excel.Application")
filename = "C:\test.xls"
appExcel.DisplayAlerts = False
newBook = appExcel.Workbooks.Open(filename)
oSheet1 = newBook.worksheets("Sheet1")
oSheet1.Range("A1").Entirerow.Delete()
newBook.SaveAs(filename, FileFormat:=56)
appExcel.Workbooks.Close()
appExcel.Quit()
You don't need any query syntax.
Go to the control flow and pull in a data flow task.
Add an Excel source and add a connection manager pointing to your Excel sheet.
Open your connection manager and check the box for column names in the first row.
That's it; then add your destination.

SPSS Script from version 15 to version 20 in BASIC

The below script is written in 'Winwrap basic' which is almost identical to VBA.
I would like this script to work on SPSS 20, the script works fine on SPSS15 (by changing the file extension from STT to TLO as that is what the tablelook file was back then).
However, whenever I run this script in SPSS 20 the wwb processor crashes with a generic error message 'WWBProcessor has encountered a problem and needs to close. We are sorry for the inconvenience.'
The script is well commented, but the purpose of the script is to change the tablelook of every table in the output viewer window, by activating each table in turn and setting the table look to one specified by the user, rotating the inner column labels, closing the table and activating the next table.
The loop continues until every table has been set to the new tablelook and rotation.
Manually setting the rotation of a few hundred tables is arduous and very time consuming not to mention numbingly boring. This script used to perform this task in seconds back in version 15, but ever evolving needs and lack of support for the older version has meant that I've been forced to use the newer version.
I'd be grateful for any assistance.
Mav
Option Explicit
Sub Main
'BEGIN DESCRIPTION
'This script changes all tabs to the same 'Tablelook' style. You will be prompted to choose the tablelook file.
'END DESCRIPTION
'******************
'Old description
'This script assumes that objSpssApp ist the currently running
'SPSS-Application and assigns every existing Pivot Table
'in the Output Navigator a new TableLook which can be selected
'from a Dialog box. Hidden tables will also be affected.
'Originally Created by SPSS Germany. Author: Arnd Winter.
'******************
'This script is written in the BASIC revision 'WinWrap Basic'. Code copied from VB or other BASIC languages may have to be modified to function properly.
On Error GoTo Bye
' Variable Declaration
' For an undetermined reason scripts cannot be executed through the Utilities -> Run scripts menu,
' Instead they must be opened like a syntax file and run from the SPSS 19 Scripting page.
' Functionality on SPSS 20 is now completely gone, error message only reads 'WWB processor has encountered a problem and needs to close'.
Dim objOutputDoc As ISpssOutputDoc 'Declares the Output variable
Set objOutputDoc = objSpssApp.GetDesignatedOutputDoc 'Assigns currently active output to Output variable
Dim strAppPath As String
Dim objOutputItems As ISpssItems 'variable defining every item in the current output window
Dim objOutputItem As ISpssItem 'variable defining the current item
Dim objPivotTable As PivotTable
Dim intCount As Integer 'declare the variable that will store the number of instances
Dim varStrLook As String
Set objOutputItems=objOutputDoc.Items
Dim i As Integer 'for loops we need an INT variable that will be counted against the number of instances 'i' is standard notation
' Find out SPSS Directory
strAppPath = objSpssApp.GetSPSSPath
' Select TableLook
'The parameters you must enter into the GetFilePath() function are as follows:
'(Optional)Firstly you enter the initial file name (if none is required use an asterisk * and the file extension, or *.*)
'(Optional)The second part is the file extension expected, you can choose multiple filetypes if you separate them with a semi-colon ;
'(Optional)The third parameter is the directory where the file should be opened.(default - Current path)
'The fourth parameter is the Title of the prompt, which should be enclosed in speech marks.
'The final parameter is the 'Option'
'0 Only allow the user to select a file that exists.
'1 Confirm creation when the user selects a file that does not exist.
'2 Allow the user to select any file whether it exists or not.
'3 Confirm overwrite when the user selects a file that exists.
'+4 Selecting a different directory changes the application's current directory.
'For more detailed information visit the WWB website.
' http://www.winwrap.com/web/basic/language/?p=doc_getfilepath__func.htm
varStrLook = GetFilePath$("*.stt","stt",strAppPath,"Select Tablelook and confirm with Save.",4)
' Tested re-applying the dollar sign; confusingly, removing or adding the dollar sign ($)
' seems to have no effect.
' If user presses Cancel or selected a file with the wrong file type then exit script
If (Len(varStrLook)= 0) Or (Right(varStrLook,3)<>"stt") Then
Exit Sub
End If
' Loop which assigns a new TableLook to all existing Tables.
intCount = objOutputItems.Count 'Assigns the total number of output items to the count-marker
For i = 0 To intCount-1 'Start loop
Set objOutputItem=objOutputItems.GetItem(i) 'Get current item
If objOutputItem.SPSSType=SPSSPivot Then 'If the item is a pivot table then...
Set objPivotTable=objOutputItem.ActivateTable 'Activate the table for editing
objPivotTable.TableLook = varStrLook 'Apply the earlier selected table look.
objPivotTable.RotateColumnLabels=True 'Rotate column labels
objOutputItem.Deactivate 'Confirm changes and deactivate the table
End If
Next 'End loop
'********************************************************
'Updated script from Version 15 ->
'Script now includes inner column label rotation
'Script has been modified and adapted to improve performance
'and to help people who wish to use/adapt the script
'in future endeavours.
'********************************************************
Bye:
End Sub
The first thing to try is to replace the activate/deactivate calls with GetTableOLEObject. This is much more efficient and does not require the pivot table editor, but you can do all the things that you could do on an activated table.
If you don't have the current fixpack for V20, fixpack2, installing that would be a good idea, too.
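As an untested sketch of that suggestion, the loop from the question could be rewritten roughly as follows. It assumes that GetTableOLEObject is called on the output item in place of ActivateTable and returns a pivot table object with the same TableLook and RotateColumnLabels properties; the procedure name and the `strLook` parameter are made up for the example, and everything else is taken from the original script.

```vba
Sub ApplyLookWithoutActivating(strLook As String)
    Dim objOutputDoc As ISpssOutputDoc
    Dim objOutputItems As ISpssItems
    Dim objOutputItem As ISpssItem
    Dim objPivotTable As PivotTable
    Dim i As Integer

    Set objOutputDoc = objSpssApp.GetDesignatedOutputDoc
    Set objOutputItems = objOutputDoc.Items

    For i = 0 To objOutputItems.Count - 1
        Set objOutputItem = objOutputItems.GetItem(i)
        If objOutputItem.SPSSType = SPSSPivot Then
            ' Get the pivot table object directly; no Activate/Deactivate pair,
            ' so the pivot table editor is never opened.
            Set objPivotTable = objOutputItem.GetTableOLEObject
            objPivotTable.TableLook = strLook
            objPivotTable.RotateColumnLabels = True
        End If
    Next
End Sub
```

From Main you would call it as `ApplyLookWithoutActivating varStrLook` after the file dialog, in place of the existing For loop.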