VBA Access: Import CSV with additional header data - vba

I am new to coding in VBA and was wondering if you could help me. I have a CSV file structured as follows:
- The first 22 rows contain file-specific header data (this all loads into one column in Excel)
- The column headers for the table are in row 23
- The actual data starts at row 24.
The code needs to insert this data into a new table with the right column titles. While inserting, it also needs to put the file name and header data in the first few columns of the table.
So far I have, I believe, imported the entire CSV into an array.
Here is what I have so far:
Sub readCSV()
Dim fs As Object
Dim fso As New FileSystemObject
Dim tsIn As Object
Dim sFileIn, filename As String
Dim aryFile, aryHeader, aryBody As Variant
sFileIn = "C:\doc\test.csv"
Set filename = fso.GetFileName(sFileIn)
Set fs = CreateObject("Scripting.FileSystemObject")
Set tsIn = fs.OpenTextFile(sFileIn, 1)
sTmp = tsIn.ReadAll
aryFile = Split(sTmp, vbCrLf)
For i = 1 To 22
aryHeader(1, i) = aryFile(i)
Next i
For i = 23 To UBound(aryFile)
aryBody(i) = Split(aryFile(i), ",")
DoCmd.RunSQL "INSERT INTO MAINS VALUES (filename,aryHeader(1),aryBody(i))"
Next i
End Sub
Is this correct? Can anyone see if I am taking the right approach?
UPDATE - recoded this a bit

Use DoCmd.TransferText instead of rolling out your own code:
http://msdn.microsoft.com/en-us/library/office/ff835958%28v=office.15%29.aspx
In your Import Specification, you can set the starting row.
See "Skip first three lines of CSV file (using DoCmd?) in MS Access" for more information!
Edit: The import specification can be changed to rename the fields etc. See http://www.access-programmers.com/creating-an-import-specification-in-access-2003.aspx (the Import wizard exists in Access 2007 as well) and the Advanced dialog specifically.
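A minimal sketch of how the TransferText route might look for this file, assuming the 22 header rows are first stripped into a temporary copy so that the import starts at the column-title row. The helper name SkipLines, the temporary file path, and the table name MainsBody are hypothetical; TransferText is called without a saved specification here, so pass your own specification name as the second argument if you have one.

```vba
' Returns content with the first lineCount lines removed.
' Assumption: lines end in CRLF unless another eol is passed.
Public Function SkipLines(ByVal content As String, ByVal lineCount As Long, _
                          Optional ByVal eol As String = vbCrLf) As String
    Dim pos As Long, i As Long
    pos = 1
    For i = 1 To lineCount
        pos = InStr(pos, content, eol)
        If pos = 0 Then Exit Function   ' fewer lines than requested: return ""
        pos = pos + Len(eol)
    Next i
    SkipLines = Mid$(content, pos)
End Function

Public Sub ImportBodyWithTransferText()
    Const SOURCE_FILE As String = "C:\doc\test.csv"
    Const TEMP_FILE As String = "C:\doc\test_body.csv"   ' hypothetical path
    Const HEADER_ROWS As Long = 22
    Dim fso As Object, ts As Object, body As String

    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ts = fso.OpenTextFile(SOURCE_FILE, 1)            ' 1 = ForReading
    body = SkipLines(ts.ReadAll, HEADER_ROWS)
    ts.Close

    Set ts = fso.CreateTextFile(TEMP_FILE, True)         ' True = overwrite
    ts.Write body
    ts.Close

    ' HasFieldNames:=True because the first remaining line is row 23,
    ' the column titles. "MainsBody" is a hypothetical table name.
    DoCmd.TransferText acImportDelim, , "MainsBody", TEMP_FILE, True
End Sub
```

The file name and the 22 header lines would still have to be added to the imported rows afterwards, for example with an UPDATE query.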

I was a bit irked by the use of multiple arrays in your code (which is super confusing, to me, anyway, because you are looking at counters everywhere) so I thought I would post an alternative for you. If you can do it your way, more power to you, but if you run into problems, you can try this. Code below is much more verbose, but may save you time in the future if you hand it off or even have to come back to it yourself and have no idea what is going on (lol):
Sub ReadCSV()
On Error GoTo ErrorHandler
Dim db As DAO.Database
Dim rst As DAO.Recordset
Dim fso As Scripting.FileSystemObject
Dim tst As Scripting.TextStream
Dim strFileName As String
Dim intCurrentLine As Long 'Long rather than Integer, so files with more than 32,767 lines do not overflow.
Dim strCurrentLine As String
Dim intHeaderRows As Integer
Dim strHeader As String
Dim strHeaderDelimInField As String
'Consider these your 'constants', so you don't come back to this code in a month
'and wonder what the random numbers mean.
intHeaderRows = 22 'Number of header rows in CSV.
strHeaderDelimInField = "~" 'The character(s) you want to separate each
'header line, in field.
strFileName = "C:\IrregularCSV.csv"
intCurrentLine = 1 'Keep track of which line in the file we are currently on.
'Next two lines get a reference to your table; will add data via DAO and not SQL,
'to avoid messy dynamic SQL.
Set db = CurrentDb()
Set rst = db.OpenRecordset("Mains", dbOpenDynaset)
Set fso = New Scripting.FileSystemObject
Set tst = fso.OpenTextFile(strFileName, ForReading)
'Instead of storing data in arrays, let's go through the file line by line
'and do the work we need to do.
With tst
Do Until .AtEndOfStream
strCurrentLine = .ReadLine
If intCurrentLine <= intHeaderRows Then
strHeader = strHeader & strHeaderDelimInField & strCurrentLine
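ElseIf intCurrentLine = intHeaderRows + 1 Then
'This row holds the column titles (row 23 in the question), not data, so skip it.
Else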
'Add the records via DAO here.
rst.AddNew
'In DAO, rst.Fields("FieldName") are the columns in your table.
rst.Fields("FileName") = strFileName
'Remove the leading delimiter with Mid (this also works if the delimiter is longer than one character).
rst.Fields("HeaderInfo") = Mid(strHeader, Len(strHeaderDelimInField) + 1)
'Note that Split always returns a zero-based array
'and is unaffected by the Option Base statement.
'The way below is less efficient than storing
'the return of Split, but also less confusing, imo.
rst.Fields("Field1") = Split(strCurrentLine, ",")(0)
rst.Fields("Field2") = Split(strCurrentLine, ",")(1)
rst.Fields("Field3") = Split(strCurrentLine, ",")(2)
rst.Update
End If
intCurrentLine = intCurrentLine + 1
Loop
End With
tst.Close
rst.Close
ExitMe:
Set tst = Nothing
Set fso = Nothing
Set rst = Nothing
Set db = Nothing
Exit Sub
ErrorHandler:
Debug.Print Err.Number & ": " & Err.Description
GoTo ExitMe
End Sub
To be honest, I think there are a lot of gotchas to the way you are going about it. Not saying it won't work, because I think it can, but this method is more robust. An unexpected single quote won't ruin your work and using a data object to do the inserts is not prone (well, less, at least) to SQL injection issues. And I've done it with no persisted arrays. Anyway, some food for thought. Good luck.

This is what I ended up with:
Sub ReadCSV2()
Dim fs As Object
Dim filename As String
Dim tsIn As Object
Dim sFileIn As String
Dim aryHeader, aryBody As Variant
Dim Text As String
Dim sqlcre As String
Dim sqlsta As String
sFileIn = "C:\test\test.csv"
filename = GetFilenameFromPath(sFileIn) 'function to get the file name
Set fs = CreateObject("Scripting.FileSystemObject")
Set tsIn = fs.OpenTextFile(sFileIn, 1)
For i = 1 To 23
Tmps = tsIn.ReadLine
Next i
aryHeader = Split(Tmps, ",")
On Error Resume Next
DoCmd.RunSQL "DROP TABLE tempdata"
On Error GoTo 0
sqlcre = "CREATE TABLE tempdata ([Filename] Text,"
For k = LBound(aryHeader) To UBound(aryHeader)
sqlcre = sqlcre & "[" & aryHeader(k) & " " & k + 1 & "] Text,"
Next k
k = k - 1
sqlcre = Left(sqlcre, Len(sqlcre) - 13) & ")"
'Debug.Print k
'Debug.Print sqlcre
DoCmd.RunSQL sqlcre
DoCmd.SetWarnings False
While Not tsIn.AtEndOfStream
Tmps = tsIn.ReadLine
aryBody = Split(Tmps, ",")
sqlsta = "INSERT INTO tempdata VALUES ('" & filename & "','"
For M = LBound(aryBody) To UBound(aryBody)
sqlsta = sqlsta & Replace(aryBody(M), "'", "`") & "', '"
Next M
M = M - 1
Debug.Print M
If M < k Then
Text = ""
For i = 1 To (k - M)
Text = Text & "', '"
Next i
sqlsta = sqlsta & Text
End If
sqlsta = Left(sqlsta, Len(sqlsta) - 7) & ")"
'Debug.Print sqlsta
'Debug.Print k
DoCmd.RunSQL sqlsta
Wend
DoCmd.SetWarnings True
End Sub

Related

Why does this line of code work half the time, and the other half gives me Data Type Conversion Error 3421

Here is the full code:
Dim db As DAO.Database
Dim rs As DAO.Recordset
Dim frm As Access.Form
Dim i As Long
'For readability
Set frm = Forms!Frm_JobTicket
'Open Tbl_Schedule for adding Schedule Dates
Set db = CurrentDb
Set rs = db.OpenRecordset("Tbl_Schedule", dbOpenDynaset, dbAppendOnly)
'Creates loop for fields 1-14. Sets Date_ScheduledX = Forms!Frm_JobTicket!Txt_DateScheduledX. Runs through Loop then closes recordset
rs.AddNew
For i = 1 To 14
If (Not IsNull(frm("Txt_DateScheduled" & i & "_JobTicket"))) Then
rs("Date_Scheduled" & i) = frm("Txt_DateScheduled" & i & "_JobTicket")
End If
Next i
'Adds in Sales Order Number to Tbl_Schedule
rs!Sales_Order_Number = frm("Sales_Order_Number")
'Adds in Part Number to Tbl_Schedule
rs!Part_Number = frm("Part_Number")
'Adds updates and closes table
rs.Update
rs.Close
'Shows message box to inform the User if item was Scheduled
MsgBox "Item Scheduled."
'Runs Private Sub above. Clears all values from DateScheduled1-14 on Frm_JobTicket to null
ClearFields
'Clears DB and RS to null
Set db = Nothing
Set rs = Nothing
The line that doesn't work is this rs("Date_Scheduled" & i) = frm("Txt_DateScheduled" & i & "_JobTicket"). Sometimes it will run perfectly fine, and other times it gives me an endless flow of 3421 Data type conversion errors. I do not know what could be going wrong, none of the fields have default values, all of the fields in the table side are Date/Time with this same format, and now I am checking for nulls.
Any help would be greatly appreciated!!
Maybe something like
If Len(frm("Txt_DateScheduled" & i & "_JobTicket") & vbNullString) > 0 Then
rs("Date_Scheduled" & i) = frm("Txt_DateScheduled" & i & "_JobTicket")
Else
'A Date/Time field cannot take an empty string; use Null instead.
rs("Date_Scheduled" & i) = Null
End If
This is completely untested, but I think you should get the concept.
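As a hedged illustration of the same idea, the check can be moved into a small helper that only ever hands a real date or Null to the Date/Time field. The function name DateOrNull is hypothetical, and the sketch assumes error 3421 comes from text boxes that hold an empty string or other non-date text rather than Null.

```vba
' Returns a Date when rawValue can be read as one, otherwise Null.
' IsDate returns False for Null, for "" and for non-date text.
Public Function DateOrNull(ByVal rawValue As Variant) As Variant
    If IsDate(rawValue) Then
        DateOrNull = CDate(rawValue)
    Else
        DateOrNull = Null
    End If
End Function
```

Inside the loop it could then be used as rs("Date_Scheduled" & i) = DateOrNull(frm("Txt_DateScheduled" & i & "_JobTicket").Value), which removes the need for the IsNull test.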

MS Access Query to find gaps in sequential numbers when numbers are stored in Short Text field

I have a table (tblParts) with a PartNumber field (Short Text) which stores 6 digit part numbers for parts belonging to several families. The families are denoted by the first 2 digits of the part number (00, 01, 02, etc).
(NOTE: I did not create this table and am not able to change it at this time)
I need to find gaps in the numbering in order to fill in unused part numbers. If I have a project starting that needs 6 consecutive part numbers in a specific family, I want to find the first unused number in the first gap of that size or greater within that family.
Here is a small subset of the data.
PartNumber
020001
020002
020003
020004
020005
020006
020007
020009
020010
020011
020012
020013
020014
020019
020101
If I needed a single number, the query should find 020008. If I needed 3 numbers, it should find 020015, and if I needed 10 numbers it should find 020020.
My SQL knowledge is very limited but I am trying to learn. I realize this would be much easier if the information was stored properly but I have no control over it.
I once wrote an article on the subject:
Find and Generate Missing Values in an Access Table
but that fills every gap it finds until all the new numbers have been assigned. So that code would need to be expanded with an outer loop to ensure the numbers it returns are always consecutive.
Private Sub btnSearch_Click()
' Read table/query sequentially to
' record all missing IDs.
' Fill a ListBox with missing values.
' A reference to Microsoft DAO must be
' present.
' Define search table or query.
Const cstrTable As String = "Orders"
Const cstrField As String = "OrderID"
Dim dbs As DAO.Database
Dim rst As DAO.Recordset
Dim lst As ListBox
Dim col As Collection
Dim strSQL As String
Dim strList As String
Dim lngLast As Long
Dim lngNext As Long
Dim lngMiss As Long
strSQL = "Select " & cstrField & "" _
& " From " & cstrTable & "" _
& " Order By 1;"
Set lst = Me!lstMissing
Set col = New Collection
Set dbs = CurrentDb
Set rst = dbs.OpenRecordset(strSQL)
If rst.RecordCount = 0 Then
'The recordset is empty.
'Nothing to do.
Else
lngLast = rst(cstrField).Value
rst.MoveNext
While rst.EOF = False
lngNext = rst(cstrField).Value
For lngMiss = lngLast + 1 To _
lngNext - 1
col.Add (lngMiss)
Next
lngLast = lngNext
rst.MoveNext
Wend
'Generate next value in sequence.
'Discard if collecting only
'missing values.
col.Add (lngLast + 1)
End If
rst.Close
'Populate list box from collection.
For lngMiss = 1 To col.Count
If Len(strList) > 0 Then
strList = strList & ";"
End If
strList = strList & col(lngMiss)
Debug.Print col(lngMiss)
Next
lst.RowSource = strList
Debug.Print strList
Set rst = Nothing
Set dbs = Nothing
Set col = Nothing
Set lst = Nothing
End Sub
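The "first gap of a given size" requirement from the question can be sketched separately from the recordset handling. This is a minimal sketch, not the article's code: the function name FirstGapStart is hypothetical, and it assumes the part numbers of one family have already been read in ascending order and converted to Long (for example with CLng on each PartNumber).

```vba
' Returns the first free number that starts a run of at least
' gapSize unused numbers. sortedNumbers must be ascending.
Public Function FirstGapStart(ByRef sortedNumbers() As Long, _
                              ByVal gapSize As Long) As Long
    Dim i As Long
    For i = LBound(sortedNumbers) To UBound(sortedNumbers) - 1
        ' Count of free numbers strictly between two neighbours.
        If sortedNumbers(i + 1) - sortedNumbers(i) - 1 >= gapSize Then
            FirstGapStart = sortedNumbers(i) + 1
            Exit Function
        End If
    Next i
    ' No interior gap is large enough: continue after the highest number.
    FirstGapStart = sortedNumbers(UBound(sortedNumbers)) + 1
End Function
```

The result can be turned back into the stored text form with Format$(result, "000000"). The sketch does not check that the run stays inside the family's two-digit prefix, so that test would still need to be added.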

Access database import from excel file fails when AssocID (first field) is mixed with integers and strings

I have an Access database with VBA code that opens an Excel sheet and copies the data to a recordset using DAO.Recordset. If every value in the column (assocId) is an integer, the import works wonderfully, and if all are strings it works too. But if you have a mixed bag (e.g. 111111 | Vinny | etc. on row one and JOE-1 | Joe | etc. on row two), the import fails. It says "You cannot record your changes because a value you entered violates the settings defined for this table".
Here is the offending sub:
Public Sub LoadFileInfo()
'Load information from selected file
On Error GoTo ErrorHappened
Dim db As DAO.Database
Dim rs As DAO.Recordset
Dim i As Integer
Dim lastTransType As String
Dim transactionCounter As Integer
Dim currentRecord As CurrentImportRecord
Dim wtf As Variant
Set db = CurrentDb()
Set rs = db.OpenRecordset(selectTransTypesSql & GetSetting("PayrollManualImportExportTransactionTypes") & ")")
ReDim transTypes(DCount("TransType", "MasterTransactionTypes", "IsActive <> 0")) As MasterTransactionTypes
rs.MoveFirst
Do While Not rs.EOF
'Add each transaction type and desc to the Private Type and increment the appropriate counter
GetMasterTransactionTypes rs!TransType, rs!TransDesc, i
rs.MoveNext
i = i + 1
Loop
rs.Close
Set db = OpenDatabase(importFileName, False, True, "Excel 12.0;HDR=Yes")
Set rs = db.OpenRecordset("SELECT * FROM " & "[" & GetSheetName(importFileName) & "$]" & " ORDER BY TransType")
rs.MoveLast
importFields = vbNullString
For i = 0 To rs.Fields.count - 1
importFields = importFields & rs.Fields(i).Name & ","
Next
fullImport = (rs.Fields.count > 4)
i = 0
transactionCounter = 1
lblFile.Caption = "File name: " & importFileName
Dim rowNum As Variant
rowNum = rs.RecordCount
wtf = rs.GetRows(rowNum)
ReDim ledgerEntries(rs.RecordCount) As PayrollLedgerImport
'Check to see if the recordset actually contains rows; if so push transaction objects to private type array
rs.MoveFirst
Do While Not rs.EOF
currentRecord.associateId = CStr(rs!assocId)
currentRecord.transactionType = rs!TransType
currentRecord.transactionNotes = CStr(rs!TransNotes)
If lastTransType = CStr(currentRecord.transactionType) Then
transactionCounter = transactionCounter + 1
Else
transactionCounter = 1
End If
If IsValidTransType(currentRecord.transactionType) Then
If Not fullImport Then
currentRecord.transactionAmount = rs!TransAmount
GetPayrollTransactions currentRecord.associateId, currentRecord.transactionType, currentRecord.transactionAmount, currentRecord.transactionNotes, i, transactionCounter
Else
currentRecord.transactionAmount = rs!TransAmt
GetPayrollTransactions currentRecord.associateId, currentRecord.transactionType, currentRecord.transactionAmount, currentRecord.transactionNotes, i, transactionCounter
End If
Else
MsgBox (currentRecord.transactionType & ": Not A Valid Transaction Type")
End If
lastTransType = rs!TransType
rs.MoveNext
i = i + 1
Loop
FormatFileInformationWindow
cmdImportFile.Enabled = True
End Sub
I've been at this for hours. I've tried casting all the columns when I take them in and I get the same issue. Works fine for all ints or all strings but in reality some of our employees have string and some have int for employee ID. I tried taking them all in as string converting where necessary but that didn't work either. Only thing that works is two sheets - one containing strings one containing ints.

Load csv file into a VBA array rather than Excel Sheet

I am currently able to get CSV file data into Excel VBA by loading it onto a worksheet with the code below and then working with the table. This is surely not the best way, as I am only interested in some of the data and delete the sheet after using it:
Sub CSV_Import()
Dim ws As Worksheet, strFile As String
Set ws = ActiveSheet 'set to current worksheet name
strFile = Application.GetOpenFilename("Text Files (*.csv),*.csv", ,"Please select text file...")
With ws.QueryTables.Add(Connection:="TEXT;" & strFile, Destination:=ws.Range("A1"))
.TextFileParseType = xlDelimited
.TextFileCommaDelimiter = True
.Refresh
End With
End Sub
Is it possible to simply load the csv into a two dimensional variant array in VBA rather than going through the use of an excel worksheet?
Okay, looks like you need two things: stream the data from the file, and populate a 2-D array.
I have a 'Join2d' and a 'Split2d' function lying around (I recall posting them in another reply on StackOverflow a while ago). Do look at the comments in the code, there are things you might need to know about efficient string-handling if you're handling large files.
However, it's not a complicated function to use: just paste the code if you're in a hurry.
Streaming the file is simple BUT we're making assumptions about the file format: are the lines in the file delimited by Carriage-Return characters or the Carriage-Return-and-Linefeed character pair? I'm assuming 'CR' rather than CRLF, but you need to check that.
Another assumption about the format is that numeric data will appear as-is, and string or character data will be encapsulated in quote marks. This should be true, but often isn't... And stripping out the quote marks adds a lot of processing - lots of allocating and deallocating strings - which you really don't want to be doing in a large array. I've short-cut the obvious cell-by-cell find-and-replace, but it's still an issue on large files.
If your file has commas embedded in the string values, this code won't work: and don't try to code up a parser that picks out the encapsulated text and skips these embedded commas when splitting-up the rows of data into individual fields, because this intensive string-handling can't be optimised into a fast and reliable csv reader by VBA.
Anyway: here's the source code: watch out for line-breaks inserted by StackOverflow's textbox control:
Running the code:
Note that you'll need a reference to the Microsoft Scripting Runtime (system32\scrrun.dll)
Private Sub test()
Dim arrX As Variant
arrX = ArrayFromCSVfile("MyFile.csv")
End Sub
Streaming a csv file.
Note that I'm assuming your file is in the temp folder:
C:\Documents and Settings\[$USERNAME]\Local Settings\Temp
You'll need to use filesystem commands to copy the file into a local folder: it's always quicker than working across the network.
Public Function ArrayFromCSVfile( _
strName As String, _
Optional RowDelimiter As String = vbCr, _
Optional FieldDelimiter = ",", _
Optional RemoveQuotes As Boolean = True _
) As Variant
' Load a file created by FileToArray into a 2-dimensional array
' The file name is specified by strName, and it is expected to exist
' in the user's temporary folder. This is a deliberate restriction:
' it's always faster to copy remote files to a local drive than to
' edit them across the network
' RemoveQuotes=TRUE strips out the double-quote marks (Char 34) that
' encapsulate strings in most csv files.
On Error Resume Next
Dim objFSO As Scripting.FileSystemObject
Dim arrData As Variant
Dim strFile As String
Dim strTemp As String
Set objFSO = New Scripting.FileSystemObject
strTemp = objFSO.GetSpecialFolder(Scripting.TemporaryFolder).ShortPath
strFile = objFSO.BuildPath(strTemp, strName)
If Not objFSO.FileExists(strFile) Then ' raise an error?
Exit Function
End If
Application.StatusBar = "Reading the file... (" & strName & ")"
If Not RemoveQuotes Then
arrData = Split2d(objFSO.OpenTextFile(strFile, ForReading).ReadAll, RowDelimiter, FieldDelimiter)
Application.StatusBar = "Reading the file... Done"
Else
' we have to do some allocation here...
strTemp = objFSO.OpenTextFile(strFile, ForReading).ReadAll
Application.StatusBar = "Reading the file... Done"
Application.StatusBar = "Parsing the file..."
strTemp = Replace$(strTemp, Chr(34) & RowDelimiter, RowDelimiter)
strTemp = Replace$(strTemp, RowDelimiter & Chr(34), RowDelimiter)
strTemp = Replace$(strTemp, Chr(34) & FieldDelimiter, FieldDelimiter)
strTemp = Replace$(strTemp, FieldDelimiter & Chr(34), FieldDelimiter)
If Right$(strTemp, 1) = Chr(34) Then
strTemp = Left$(strTemp, Len(strTemp) - 1)
End If
If Left$(strTemp, 1) = Chr(34) Then
strTemp = Right$(strTemp, Len(strTemp) - 1)
End If
Application.StatusBar = "Parsing the file... Done"
arrData = Split2d(strTemp, RowDelimiter, FieldDelimiter)
strTemp = ""
End If
Application.StatusBar = False
Set objFSO = Nothing
ArrayFromCSVfile = arrData
Erase arrData
End Function
Split2d
Creates a 2-dimensional VBA array from a string:
Public Function Split2d(ByRef strInput As String, _
Optional RowDelimiter As String = vbCr, _
Optional FieldDelimiter = vbTab, _
Optional CoerceLowerBound As Long = 0 _
) As Variant
' Split up a string into a 2-dimensional array.
' Works like VBA.Strings.Split, for a 2-dimensional array.
' Check your lower bounds on return: never assume that any array in
' VBA is zero-based, even if you've set Option Base 0
' If in doubt, coerce the lower bounds to 0 or 1 by setting
' CoerceLowerBound
' Note that the default delimiters are those inserted into the
' string returned by ADODB.Recordset.GetString
On Error Resume Next
' Coding note: we're not doing any string-handling in VBA.Strings -
' allocating, deallocating and (especially!) concatenating are SLOW.
' We're using the VBA Join & Split functions ONLY. The VBA Join,
' Split, & Replace functions are linked directly to fast (by VBA
' standards) functions in the native Windows code. Feel free to
' optimise further by declaring and using the Kernel string functions
' if you want to.
' ** THIS CODE IS IN THE PUBLIC DOMAIN **
' Nigel Heffernan Excellerando.Blogspot.com
Dim i As Long
Dim j As Long
Dim i_n As Long
Dim j_n As Long
Dim i_lBound As Long
Dim i_uBound As Long
Dim j_lBound As Long
Dim j_uBound As Long
Dim arrTemp1 As Variant
Dim arrTemp2 As Variant
arrTemp1 = Split(strInput, RowDelimiter)
i_lBound = LBound(arrTemp1)
i_uBound = UBound(arrTemp1)
If VBA.LenB(arrTemp1(i_uBound)) <= 0 Then
' clip out empty last row: a common artifact in data
'loaded from files with a terminating row delimiter
i_uBound = i_uBound - 1
End If
i = i_lBound
arrTemp2 = Split(arrTemp1(i), FieldDelimiter)
j_lBound = LBound(arrTemp2)
j_uBound = UBound(arrTemp2)
If VBA.LenB(arrTemp2(j_uBound)) <= 0 Then
' ! potential error: first row with an empty last field...
j_uBound = j_uBound - 1
End If
i_n = CoerceLowerBound - i_lBound
j_n = CoerceLowerBound - j_lBound
ReDim arrData(i_lBound + i_n To i_uBound + i_n, j_lBound + j_n To j_uBound + j_n)
' As we've got the first row already... populate it
' here, and start the main loop from lbound+1
For j = j_lBound To j_uBound
arrData(i_lBound + i_n, j + j_n) = arrTemp2(j)
Next j
For i = i_lBound + 1 To i_uBound Step 1
arrTemp2 = Split(arrTemp1(i), FieldDelimiter)
For j = j_lBound To j_uBound Step 1
arrData(i + i_n, j + j_n) = arrTemp2(j)
Next j
Erase arrTemp2
Next i
Erase arrTemp1
Application.StatusBar = False
Split2d = arrData
End Function
Join2D
Turns a 2-dimensional VBA array to a string:
Public Function Join2d(ByRef InputArray As Variant, _
Optional RowDelimiter As String = vbCr, _
Optional FieldDelimiter = vbTab, _
Optional SkipBlankRows As Boolean = False _
) As String
' Join up a 2-dimensional array into a string. Works like the standard
' VBA.Strings.Join, for a 2-dimensional array.
' Note that the default delimiters are those inserted into the string
' returned by ADODB.Recordset.GetString
On Error Resume Next
' Coding note: we're not doing any string-handling in VBA.Strings -
' allocating, deallocating and (especially!) concatenating are SLOW.
' We're using the VBA Join & Split functions ONLY. The VBA Join,
' Split, & Replace functions are linked directly to fast (by VBA
' standards) functions in the native Windows code. Feel free to
' optimise further by declaring and using the Kernel string functions
' if you want to.
' ** THIS CODE IS IN THE PUBLIC DOMAIN **
' Nigel Heffernan Excellerando.Blogspot.com
Dim i As Long
Dim j As Long
Dim i_lBound As Long
Dim i_uBound As Long
Dim j_lBound As Long
Dim j_uBound As Long
Dim arrTemp1() As String
Dim arrTemp2() As String
Dim strBlankRow As String
i_lBound = LBound(InputArray, 1)
i_uBound = UBound(InputArray, 1)
j_lBound = LBound(InputArray, 2)
j_uBound = UBound(InputArray, 2)
ReDim arrTemp1(i_lBound To i_uBound)
ReDim arrTemp2(j_lBound To j_uBound)
For i = i_lBound To i_uBound
For j = j_lBound To j_uBound
arrTemp2(j) = InputArray(i, j)
Next j
arrTemp1(i) = Join(arrTemp2, FieldDelimiter)
Next i
If SkipBlankRows Then
If Len(FieldDelimiter) = 1 Then
strBlankRow = String(j_uBound - j_lBound, FieldDelimiter)
Else
For j = j_lBound To j_uBound
strBlankRow = strBlankRow & FieldDelimiter
Next j
End If
Join2d = Replace(Join(arrTemp1, RowDelimiter), strBlankRow & RowDelimiter, "")
i = Len(strBlankRow & RowDelimiter)
If Left(Join2d, i) = strBlankRow & RowDelimiter Then
Mid$(Join2d, 1, i) = ""
End If
Else
Join2d = Join(arrTemp1, RowDelimiter)
End If
Erase arrTemp1
End Function
Share and enjoy.
Yes read it as a text file.
See this example
Option Explicit
Sub Sample()
Dim MyData As String, strData() As String
Open "C:\MyFile.CSV" For Binary As #1
MyData = Space$(LOF(1))
Get #1, , MyData
Close #1
strData() = Split(MyData, vbCrLf)
End Sub
FOLLOWUP
Like I mentioned below in the comments, AFAIK, there is no direct way of filling a 2d Array from a csv. You will have to use the code that I gave above and then split it per line and finally filling up a 2D array which can be cumbersome. Filling up a column is easy but if you specifically want say from Row 5 to Col 7 Data then it becomes cumbersome as you will have to check if there are sufficient columns/rows in the data. Here is a basic example to get Col B in a 2D Array.
NOTE: I have not done any error handling. I am sure you can take care of that.
Let's say our CSV file looks like this.
When you run this code
Option Explicit
Const Delim As String = ","
Sub Sample()
Dim MyData As String, strData() As String, TmpAr() As String
Dim TwoDArray() As String
Dim i As Long, n As Long
Open "C:\Users\Siddharth Rout\Desktop\Sample.CSV" For Binary As #1
MyData = Space$(LOF(1))
Get #1, , MyData
Close #1
strData() = Split(MyData, vbCrLf)
n = 0
For i = LBound(strData) To UBound(strData)
If Len(Trim(strData(i))) <> 0 Then
TmpAr = Split(strData(i), Delim)
n = n + 1
ReDim Preserve TwoDArray(1, 1 To n)
'~~> TmpAr(1) : 1 for Col B, 0 would be A
TwoDArray(1, n) = TmpAr(1)
End If
Next i
For i = 1 To n
Debug.Print TwoDArray(1, i)
Next i
End Sub
You will get the output as shown below
BTW, I am curious that since you are doing this in Excel, why not use inbuilt Workbooks.Open or QueryTables method and then read the range into a 2D array? That would be much simpler...
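The worksheet-based route mentioned above can be sketched roughly as follows. The function name CsvToArrayViaWorkbook is hypothetical, and the sketch assumes Excel parses the file acceptably with its default CSV settings when opening it.

```vba
' Hypothetical helper: reads a whole CSV into a 2-D Variant array
' by letting Excel parse it in a temporary, read-only workbook.
Public Function CsvToArrayViaWorkbook(ByVal csvPath As String) As Variant
    Dim wb As Workbook
    Application.ScreenUpdating = False
    Set wb = Workbooks.Open(Filename:=csvPath, ReadOnly:=True)
    ' UsedRange.Value is a 1-based 2-D array when the range has more
    ' than one cell; a single-cell file would return a scalar instead.
    CsvToArrayViaWorkbook = wb.Worksheets(1).UsedRange.Value
    wb.Close SaveChanges:=False
    Application.ScreenUpdating = True
End Function
```

No sheet is added to the calling workbook, so there is nothing to delete afterwards, but Excel may still convert values such as long digit strings or dates while parsing.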
OK, after looking into this, the solution I have arrived at is to use ADODB (requires a reference to Microsoft ActiveX Data Objects). This loads the CSV file into an array without looping over the rows and columns. It does require the data to be in good condition.
Sub LoadCSVtoArray()
strPath = ThisWorkbook.Path & "\"
Set cn = CreateObject("ADODB.Connection")
strcon = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & strPath & ";Extended Properties=""text;HDR=Yes;FMT=Delimited"";"
cn.Open strcon
strSQL = "SELECT * FROM SAMPLE.csv;"
Dim rs As Recordset
Dim rsARR() As Variant
Set rs = cn.Execute(strSQL)
rsARR = WorksheetFunction.Transpose(rs.GetRows)
rs.Close
Set cn = Nothing
[a1].Resize(UBound(rsARR), UBound(Application.Transpose(rsARR))) = rsARR
End Sub
To get a known format csv data file into a 2D array I finally adopted the following method, which seems to work well and is quite quick.
I decided that file read operations are fairly fast nowadays, so I run a first pass on the csv file to get the size required for both dimension of the array. With the array suitably dimensioned it is then a simple task to re-read the file, line by line, and populate the array.
Function ImportTestData(ByRef srcFile As String, _
ByRef dataArr As Variant) _
As Boolean
Dim FSO As FileSystemObject, Fo As TextStream
Dim line As String, Arr As Variant
Dim lc As Long, cc As Long
Dim i As Long, j As Long
ImportTestData = False
Set FSO = CreateObject("Scripting.FilesystemObject")
Set Fo = FSO.OpenTextFile(srcFile)
' First pass; read the file to get array size
lc = 0 ' Counter for number of lines in the file
cc = 0 ' Counter for number of columns in the file
While Not Fo.AtEndOfStream ' Read the csv file line by line
line = Fo.ReadLine
If lc = 0 Then ' Count commas to get array's 2nd dim index
cc = 1 + Len(line) - Len(Replace(line, ",", ""))
End If
lc = lc + 1
Wend
Fo.Close
' Set array dimensions to accept file contents
ReDim dataArr(0 To lc - 1, 0 To cc - 1)
'Debug.Print "CSV has "; lc; " rows with "; cc; " fields/row"
If lc > 1 And cc > 1 Then
ImportTestData = True
End If
' Second pass; Re-open data file and copy to array
Set Fo = FSO.OpenTextFile(srcFile)
lc = 0
While Not Fo.AtEndOfStream
line = Fo.ReadLine
Arr = Split(line, ",")
For i = 0 To UBound(Arr)
dataArr(lc, i) = Arr(i)
Next i
lc = lc + 1
Wend
Fo.Close ' Release the file handle after the second pass
End Function 'ImportTestData()
I created this as a Function rather than a Sub to get a simple return value, if required.
Reading a file with 8,500 rows of 20 columns takes approximately 180ms.
This method assumes that the structure (number of delimiters) of the CSV file is the same for every row, typical of a data logging application.
The following solution does not use ActiveX:
I wrote code to import a csv (actually tab-separated) file into an array. That code is the following.
First let's designate the array (initially it is completely void but it will be resized appropriately later):
Dim TxtFile$()
Now for the sub-procedure:
' Fills TxtFile$() array
Sub FillTextFileArray(A$)
'***********************************************************************
' Declarations
'***********************************************************************
Dim I As Long, J As Long ' "Dim I, J As Integer" would make I a Variant
Dim LineString As String
'***********************************************************************
I = -1: J = 0 ' Will hold array dimensions
Open A$ For Input As #1
Do While Not EOF(1) ' Loop until end of file.
Line Input #1, LineString
LineString = LineString + vbTab ' If not done empty lines give error with Split()
I = I + 1
If J < UBound(Split(LineString, vbTab)) Then J = UBound(Split(LineString, vbTab))
Loop
ReDim TxtFile$(1 To I + 4, 1 To J + 4) ' Not indexed from 0 ! (Plus some room at the end.) This is done to match worksheet format.
Seek #1, 1 ' Reset to start
I = -1 ' Will hold array row index
Do While Not EOF(1) ' Loop until end of file.
Line Input #1, LineString
LineString = LineString + vbTab ' If not done empty lines give error with Split()
I = I + 1
For J = 0 To UBound(Split(LineString, vbTab))
TxtFile$(I + 1, J + 1) = Split(LineString, vbTab)(J)
Next J
Loop
Close #1 ' Close file.
' TxtFile$() now holds the contents of the text file
End Sub
Obviously you can then do what you want with the TxtFile$ array. A$ is the location and name of the text file. As already said, this particular code works with tab-delimited files (vbTab), not comma-delimited (separated), but any adaptation should not be too difficult. It has the advantage of avoiding ActiveX complications.
Alternatively, you can use code like this:
Dim line As String, Arr
Dim FSO As Object, Fo As Object
Set FSO = CreateObject("Scripting.FileSystemObject")
Set Fo = FSO.OpenTextFile("csvfile.csv")
While Not Fo.AtEndOfStream
line = Fo.ReadLine ' Read the csv file line by line
Arr = Split(line, ",") ' The csv line is loaded into the Arr as an array
For i = 0 To UBound(Arr) - 1: Debug.Print Arr(i) & " ";: Next
Debug.Print
Wend
01/01/2019 1 1 1 36 55.6 0.8 85.3 95 95 109 102 97 6 2.5 2.5 3.9
01/01/2019 1 2 0 24 0.0 2.5 72.1 89 0 0 97 95 10 6.7 4.9 3.9
01/01/2019 1 3 1 36 26.3 4 80.6 92 92 101 97 97 8 5.5 5.3 3.7
01/01/2019 1 4 0 16 30.0 8 79.2 75 74 87 87 86 10 3.8 4 4.2
These days, GitHub hosts at least three CSV parsers that do exactly what the OP asked for - load a CSV file into a VBA array.
I'm the author of this one:
https://github.com/PGS62/VBA-CSV
It handles a broad variety of CSV files, including those with "embedded" commas, line-feeds etc, and those with a varying number of fields per row. I provide links to alternative VBA CSV parsers in the README file.
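To illustrate what handling "embedded" commas involves, here is a minimal sketch of a quote-aware splitter for a single line. It is not taken from the linked library; the function name SplitCsvLine is hypothetical, and the sketch assumes fields are quoted with double quotes, that a doubled quote inside a quoted field stands for one literal quote, and that no field contains a line break.

```vba
' Splits one CSV line into fields, keeping delimiters that sit
' inside double-quoted fields. Returns a 1-based Collection.
Public Function SplitCsvLine(ByVal textLine As String, _
                             Optional ByVal delim As String = ",") As Collection
    Dim fields As New Collection
    Dim current As String
    Dim inQuotes As Boolean
    Dim i As Long, ch As String

    i = 1
    Do While i <= Len(textLine)
        ch = Mid$(textLine, i, 1)
        If inQuotes Then
            If ch = """" Then
                If Mid$(textLine, i + 1, 1) = """" Then
                    current = current & """"   ' doubled quote: literal quote
                    i = i + 1
                Else
                    inQuotes = False           ' closing quote
                End If
            Else
                current = current & ch         ' includes embedded delimiters
            End If
        ElseIf ch = """" Then
            inQuotes = True                    ' opening quote
        ElseIf ch = delim Then
            fields.Add current
            current = vbNullString
        Else
            current = current & ch
        End If
        i = i + 1
    Loop
    fields.Add current                         ' last field on the line
    Set SplitCsvLine = fields
End Function
```

Building each field one character at a time is slow on large files, which is the performance concern raised in an earlier answer, so a dedicated parser is likely the better choice for anything sizeable.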

need a faster way to match these data sets

I have a set of Excel sheets, each set up as follows:
ID | imageName
--------------
1 abc.jpg
2 def.bmp
3 abc.jpg
4 xyz123.jpg
This sheet corresponds to a folder with contents like:
abc.pdf
ghijkl.pdf
def.pdf
def.xls
x-abc.pdf
I'm trying to generate a report that matches the instance of each imageName with the lowest ID with the PDFs that match it, and also identifies unmatched imageName in the sheet and unmatched PDFs in the folder. A filename with an "x-" prefix is equivalent to one without the prefix, so the report for this data set would be as follows:
ID imageName filename
-----------------------
1 abc.jpg abc.pdf
1 abc.jpg x-abc.pdf
2 def.bmp def.pdf
4 xyz123.jpg
ghijkl.pdf
My current solution is as follows:
'sheetObj is the imageName set, folderName is the path to the file folder
Sub makeReport(sheetObj As Worksheet, folderName As String)
    Dim fso As New FileSystemObject
    Dim imageDict As Dictionary
    Dim fileArray As Variant
    Dim ctr As Long
    Dim file As File, key As Variant, fileFound As Boolean
    'initializes fileArray for storing filename/imageName pairs
    '(row 0 holds the file path, row 1 the matched imageName)
    ReDim fileArray(1, 0)
    'returns a Dictionary where key is imageName and value is lowest ID for that imageName
    Set imageDict = lowestDict(sheetObj)
    'checks all files in folder and populates fileArray with their imageName matches
    For Each file In fso.GetFolder(folderName).Files
        fileFound = False
        'gets extension and checks if it's ".pdf"
        If isPDF(file.Name) Then
            For Each key In imageDict.Keys
                'checks to see if base names are equal, accounting for "x-" prefix
                If equalNames(file.Name, key) Then
                    'adds a record to fileArray mapping filename to imageName
                    addToFileArray fileArray, file.Path, key
                    fileFound = True
                End If
            Next
            'checks to see if filename did not match any dictionary entries
            If fileFound = False Then
                addToFileArray fileArray, file.Path, ""
            End If
        End If
    Next
    'outputs report of imageDict entries and their matches (if any)
    For Each key In imageDict.Keys
        fileFound = False
        'checks for all fileArray matches to this imageName
        For ctr = 0 To UBound(fileArray, 2)
            If fileArray(1, ctr) = key Then
                fileFound = True
                'writes the data for this match to the worksheet
                outputToExcel sheetObj, key, imageDict(key), fileArray(0, ctr)
            End If
        Next
        'checks to see if no fileArray match was found
        If fileFound = False Then
            outputToExcel sheetObj, key, imageDict(key), ""
        End If
    Next
    'outputs unmatched fileArray entries
    For ctr = 0 To UBound(fileArray, 2)
        If fileArray(1, ctr) = "" Then
            outputToExcel sheetObj, "", "", fileArray(0, ctr)
        End If
    Next
End Sub
This program outputs the report successfully, but it's very slow. Because of the nested For loops, the processing time grows with the product of the number of imageName entries and the number of files, roughly O(ImageNames × Files).
Is there a better way to check for matches in these sets? It might be faster if I make fileArray into a Dictionary, but a dictionary can't have duplicate keys, and this data structure needs to have duplicate entries in its fields, as a filename may match multiple imageNames and vice versa.
This should find the first match pretty quickly. You can do whatever you need inside that last If statement. It uses an ADO recordset, which should be faster than nested For loops.
Sub match()
Dim sheetName As String: sheetName = "Sheet1"
Dim rst As New ADODB.Recordset
Dim cnx As New ADODB.Connection
Dim cmd As New ADODB.Command
'setup the connection
'[HDR=Yes] means the Field names are in the first row
With cnx
.Provider = "Microsoft.Jet.OLEDB.4.0"
.ConnectionString = "Data Source='" & ThisWorkbook.FullName & "'; " & "Extended Properties='Excel 8.0;HDR=Yes;IMEX=1'"
.Open
End With
'setup the command
Set cmd.ActiveConnection = cnx
cmd.CommandType = adCmdText
cmd.CommandText = "SELECT * FROM [" & sheetName & "$]"
rst.CursorLocation = adUseClient
rst.CursorType = adOpenDynamic
rst.LockType = adLockOptimistic
'open the recordset
rst.Open cmd
Dim fso As FileSystemObject: Set fso = New FileSystemObject
Dim filesInFolder As files, f As File
Set filesInFolder = fso.GetFolder("C:\Users\Bradley\Downloads").files
For Each f In filesInFolder
rst.MoveFirst
rst.Find "imageName = '" & f.Name & "'", , adSearchForward
If Not rst.EOF Then
Debug.Print rst("imagename") & "::" & rst("ID") '<-- Do what you need to do here
End If
Next f
End Sub
FYI: I referenced this post
Another way.
Sub Sample()
Dim ws As Worksheet, wstemp As Worksheet
Dim FileAr() As String
Dim n As Long, wsLRow As Long
Dim strFile As String
Set ws = Sheets("Sheet1") '<~~ Which has imageNames
wsLRow = ws.Range("A" & ws.Rows.Count).End(xlUp).Row
n = 0
strFile = Dir("C:\Temp\*.*")
Do While strFile <> ""
n = n + 1
ReDim Preserve FileAr(n)
If Mid(strFile, Len(strFile) - 3, 1) = "." Then
FileAr(n) = Mid(strFile, 1, Len(strFile) - 4)
ElseIf Mid(strFile, Len(strFile) - 4, 1) = "." Then
FileAr(n) = Mid(strFile, 1, Len(strFile) - 5)
Else
FileAr(n) = strFile
End If
strFile = Dir
Loop
Set wstemp = Worksheets.Add
wstemp.Range("A1").Resize(UBound(FileAr) + 1, 1).Value = Application.Transpose(FileAr)
ws.Range("B1:B" & wsLRow).Formula = "=IF(ISERROR(VLOOKUP(A1," & wstemp.Name & _
"!A:A,1,0)),"""",VLOOKUP(A1," & wstemp.Name & "!A:A,1,0))"
ws.Range("B1:B" & wsLRow).Value = ws.Range("B1:B" & wsLRow).Value
Application.DisplayAlerts = False
wstemp.Delete
Application.DisplayAlerts = True
End Sub
Thanks for the responses.
I ended up solving this by building an array of the filenames in folderName. The folder is on a network share, and iterating through the collection returned by fso.getfolder(foldername).files was too slow, so I used the WinAPI FindFirstFile and FindNextFile functions to walk the folder instead.
I then made a filename/basename dictionary from the filename array, as:
key | value
-----------------------
abc.pdf | abc
x-lmnop.pdf | lmnop
x-abc.pdf | abc
From this dictionary I made a reverse dictionary fileConcat that concatenated keys from duplicate basenames, as:
key | value
-----------------------
abc | abc.pdf,x-abc.pdf
lmnop | x-lmnop.pdf
I was then able to match the basename for each imageDict key to a key in fileConcat, and then iterate through an array of the concatenated values generated by:
split(fileConcat(key), ",")
where key is the basename of the imageDict key.
As @chrisneilsen commented, eliminating the nested For loops reduces the growth rate to O(ImageNames)+O(Files), and the function now performs at a satisfactory speed.
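The dictionary approach described above could be sketched roughly as follows. This is not the original poster's code: the names BaseName, BuildFileConcat and DemoMatch are hypothetical, the sketch uses a late-bound Scripting.Dictionary, and it assumes file names contain no commas (the comma is used as the separator in the concatenated value).

```vba
' Returns the lower-cased name without its extension and without a leading "x-".
Function BaseName(ByVal fileName As String) As String
    Dim dotPos As Long
    dotPos = InStrRev(fileName, ".")
    If dotPos > 1 Then fileName = Left$(fileName, dotPos - 1)
    If LCase$(Left$(fileName, 2)) = "x-" Then fileName = Mid$(fileName, 3)
    BaseName = LCase$(fileName)
End Function

' Builds the reverse dictionary: basename -> "file1,file2,..."
Function BuildFileConcat(fileNames As Variant) As Object
    Dim dict As Object, i As Long, key As String
    Set dict = CreateObject("Scripting.Dictionary")
    For i = LBound(fileNames) To UBound(fileNames)
        key = BaseName(CStr(fileNames(i)))
        If dict.Exists(key) Then
            ' Duplicate basename: append this file to the existing entry
            dict(key) = dict(key) & "," & fileNames(i)
        Else
            dict.Add key, CStr(fileNames(i))
        End If
    Next i
    Set BuildFileConcat = dict
End Function

' One dictionary lookup per imageName replaces the inner loop over all files.
Sub DemoMatch()
    Dim fileConcat As Object, matches As Variant, m As Variant
    Set fileConcat = BuildFileConcat(Array("abc.pdf", "x-lmnop.pdf", "x-abc.pdf"))
    If fileConcat.Exists(BaseName("abc.jpg")) Then
        matches = Split(fileConcat(BaseName("abc.jpg")), ",")
        For Each m In matches
            Debug.Print "abc.jpg", m
        Next m
    End If
End Sub
```

Running DemoMatch lists abc.pdf and x-abc.pdf as the matches for abc.jpg. Files that were never matched can be found by removing each basename from the dictionary as it is used and reporting whatever is left at the end.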