VBA - Replacing commas in CSV not inside quotes

Filename = Dir(Filepath & "\" & "*.csv")
While Filename <> ""
SourceFile = Filepath & "\" & Filename
TargetFile = SavePath & "\" & Replace(Filename, ".csv", ".txt")
OpenAsUnicode = False
Dim objFSO: Set objFSO = CreateObject("Scripting.FileSystemObject")
'Detect Unicode Files
Dim Stream: Set Stream = objFSO.OpenTextFile(SourceFile, 1, False)
intChar1 = Asc(Stream.Read(1))
intChar2 = Asc(Stream.Read(1))
Stream.Close
If intChar1 = 255 And intChar2 = 254 Then
    OpenAsUnicode = True
End If
'Get script content
Set Stream = objFSO.OpenTextFile(SourceFile, 1, 0, OpenAsUnicode)
arrData = Stream.ReadAll()
Stream.Close
'Create output file
Dim objOut: Set objOut = objFSO.CreateTextFile(TargetFile)
objOut.Write Replace(Replace(arrData,",", "#|#"), Chr(34), "") '-- This line is working fine but it is replacing all the commas inside the text qualifier as well..
objOut.Close
Filename = Dir
Wend
In the code above, the line objOut.Write Replace(Replace(arrData,",", "#|#"), Chr(34), "") replaces every comma with #|#, including the commas inside quoted strings. I want to replace only the commas that are not inside double quotes.
For example, if the file contains the string
"A","B,C",D
the result I need is
A#|#B,C#|#D
Thanks for your help in advance.

How about something along the lines of:
objOut.Write Replace(Replace(arrData, """,""", "#|#"), Chr(34), "")
This first swaps each "," (quote, comma, quote) for #|#, so a comma inside a quoted field such as "B,C" is never touched. The outer Replace then strips the quotes that remain at the start and end of every line. Note that this only works when every field is quoted: an unquoted field, such as D in the example, is still separated by a plain comma, so "A","B,C",D becomes A#|#B,C,D.
Because of the speed concerns raised in the comments, here is the complete code I used to test this solution:
Option Explicit
Option Compare Text

Public Sub ConvertFile()
    Dim strLineFromFile As String
    Dim strSourceFile As String
    Dim strDestinationFile As String
    strSourceFile = "C:\tmp\Extract.txt"
    strDestinationFile = "C:\tmp\Extract_b.txt"
    Open strSourceFile For Input As #1
    Open strDestinationFile For Output As #2
    Do Until EOF(1)
        Line Input #1, strLineFromFile
        ' Swap "," for #|#, then strip the quotes left at the start and end of the line
        strLineFromFile = Replace(Replace(strLineFromFile, """,""", "#|#"), Chr(34), "")
        ' Print # writes the text as-is; Write # would wrap it in quotes again
        Print #2, strLineFromFile
    Loop
    Close #1
    Close #2
End Sub
The tested file was 350 MB with a bit over 4 million rows. The code completed in less than a minute.
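Both Replace-based versions assume every field is quoted, so neither turns the mixed line "A","B,C",D into A#|#B,C#|#D. A quote-aware scan can: walk the line one character at a time, toggle an in-quotes flag on each ", and only swap the commas seen outside quotes. This is a sketch assuming RFC 4180-style quoting (a doubled "" inside a quoted field is a literal quote); ReplaceUnquotedCommas is a hypothetical helper name, not something from the thread.

```vba
' Hypothetical helper: replaces every comma that is NOT inside double quotes
' with Delim and drops the quote characters themselves.
' Assumes RFC 4180-style quoting: "" inside a quoted field is a literal quote.
Public Function ReplaceUnquotedCommas(ByVal Text As String, ByVal Delim As String) As String
    Dim i As Long
    Dim ch As String
    Dim inQuotes As Boolean
    Dim result As String

    For i = 1 To Len(Text)
        ch = Mid$(Text, i, 1)
        If ch = """" Then
            If inQuotes And Mid$(Text, i + 1, 1) = """" Then
                ' Doubled quote inside a quoted field: keep one literal quote
                result = result & """"
                i = i + 1
            Else
                ' Opening or closing quote: flip the state, do not copy it
                inQuotes = Not inQuotes
            End If
        ElseIf ch = "," And Not inQuotes Then
            ' Field separator outside quotes
            result = result & Delim
        Else
            ' Ordinary character, or a comma inside quotes
            result = result & ch
        End If
    Next i

    ReplaceUnquotedCommas = result
End Function
```

To use it with the Line Input loop above, call it once per line, for example Print #2, ReplaceUnquotedCommas(strLineFromFile, "#|#"), rather than on the whole file: building the result with & copies the string on every append, which gets slow on very long strings. Per-line processing also assumes no quoted field spans several lines.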

Creating a table using VBA for Access

I managed to write code that creates a group of tables from the .csv files in a folder.
I want each file to become a separate table, so most of the posts about concatenating files didn't help me.
Public Function importExcelSheets(Directory As String) As Long
On Error Resume Next
Dim strDir As String
Dim strFile As String
Dim I As Long
Dim N As Long
Dim FSO As Object, MyFile As Object
Dim FileName As String, Arr As Variant
Dim Content As String
Dim objStreamIn
Dim objStreamOut
'Prepare Table names-------------------------------------------------------------------------------------
FileName = "path/to/table/names.txt"
Set FSO = CreateObject("Scripting.FileSystemObject")
Set MyFile = FSO.OpenTextFile(FileName, 1)
Arr = Split(MyFile.ReadAll, vbNewLine)
'Verify Directory and pull a file------------------------------------------------------------------------
If Right(Directory, 1) <> "\" Then
strDir = Directory & "\"
Else
strDir = Directory
End If
strFile = Dir(strDir & "*.csv")
'Fill Tables----------------------------------------------------------------------------------------------
I = UBound(Arr) - 1
While strFile <> ""
strFile = strDir & strFile
Set objStreamIn = CreateObject("ADODB.Stream")
Set objStreamOut = CreateObject("ADODB.Stream")
objStreamIn.Charset = "utf-8"
objStreamOut.Charset = "utf-8"
objStreamIn.Open
objStreamOut.Open
objStreamIn.LoadFromFile (strFile)
N = 1
While Not objStreamIn.EOS
Content = objStreamIn.ReadText(-2)
If N = 1 Then
Content = Replace(Content, "/", vbNullString, , 1)
objStreamOut.WriteText Content & vbCrLf
Else
objStreamOut.WriteText Content & vbCrLf
End If
N = N + 1
Wend
objStreamOut.SaveToFile strFile, 2
objStreamIn.Close
objStreamOut.Close
Set objStreamIn = Nothing
Set objStreamOut = Nothing
DoCmd.TransferText _
TransferType:=acImportDelim, _
TableName:=Arr(I), _
FileName:=strFile, _
HasFieldNames:=True, _
CodePage:=65001
strFile = Dir()
I = I - 1
Wend
importExcelSheets = I
End Function
It works until the last section where I use TransferText to create the table.
I get different results depending on which of these I try:
Running the script after commenting out the entire objStream section gives me the data and table names, but the headers are [empty], "F2", "F3", ... "F27".
I suspected it was because there was a forward slash in the first column header, so I put in the Replace() to remove it.
Running the script as shown above gives me an empty table.
I now suspect that the encoding header of the file is the reason for this.
Running the script after changing objStreamOut.Charset = "utf-8" to objStreamOut.Charset = "us-ascii" and updating the CodePage to 20127 gives me an empty table with black diamond question marks for a column header.
I want to blame the encoding characters, but it ran almost flawlessly once with utf-8 encoding and CodePage 65001. Is there another way around this?
Here is the byte order mark of the file, showing the UTF-8 encoding: [image: hex view of the file's first bytes]
Edit: changed CodeType to CodePage and added vbCrLf to append to Content
Edit: Included a picture of the files' hex dump showing the UTF-8 offset
With help from the comments, I got it working after fixing the vbCrLf problem. I switched the objStreamOut charset to us-ascii and changed the CodePage to 20127 to match. Headers, table names, and data now all import normally. Here is the final code:
Public Function importExcelSheets(Directory As String) As Long
On Error Resume Next
Dim strDir As String
Dim strFile As String
Dim I As Long
Dim N As Long
Dim FSO As Object, MyFile As Object
Dim FileName As String, Arr As Variant
Dim Content As String
Dim objStreamIn
Dim objStreamOut
'Prepare Table names-------------------------------------------------------------------------------------
FileName = "path/to/table/names.txt"
Set FSO = CreateObject("Scripting.FileSystemObject")
Set MyFile = FSO.OpenTextFile(FileName, 1)
Arr = Split(MyFile.ReadAll, vbNewLine)
'Verify Directory and pull a file------------------------------------------------------------------------
If Right(Directory, 1) <> "\" Then
strDir = Directory & "\"
Else
strDir = Directory
End If
strFile = Dir(strDir & "*.csv")
'Fill Tables----------------------------------------------------------------------------------------------
I = UBound(Arr) - 1
While strFile <> ""
strFile = strDir & strFile
Set objStreamIn = CreateObject("ADODB.Stream")
Set objStreamOut = CreateObject("ADODB.Stream")
objStreamIn.Charset = "utf-8"
objStreamOut.Charset = "us-ascii"
objStreamIn.Open
objStreamOut.Open
objStreamIn.LoadFromFile (strFile)
N = 1
While Not objStreamIn.EOS
Content = objStreamIn.ReadText(-2)
If N = 1 Then
Content = Replace(Content, "/", vbNullString, , 1)
objStreamOut.WriteText Content & vbCrLf
Else
objStreamOut.WriteText Content & vbCrLf
End If
N = N + 1
Wend
objStreamOut.SaveToFile strFile, 2
objStreamIn.Close
objStreamOut.Close
Set objStreamIn = Nothing
Set objStreamOut = Nothing
DoCmd.TransferText _
TransferType:=acImportDelim, _
TableName:=Arr(I), _
FileName:=strFile, _
HasFieldNames:=True, _
CodePage:=20127
strFile = Dir()
I = I - 1
Wend
importExcelSheets = I
End Function
I'm still not entirely sure why VBA did not get the correct data with utf-8 and CodePage 65001 but works now with us-ascii and 20127. This works for me, however.
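One caveat with the us-ascii workaround: that charset cannot represent characters such as accented letters, so any non-ASCII data is likely to be lost on the way through. If you would rather keep utf-8 and CodePage 65001, one thing to try is saving the file without the 3-byte UTF-8 byte order mark (EF BB BF) that ADODB.Stream writes in front of utf-8 text. This is a sketch under the assumption that the BOM is what confused TransferText (the thread does not confirm that), and SaveUtf8NoBom is a hypothetical helper name.

```vba
' Hypothetical helper: writes Text to Path as UTF-8 without a byte order mark.
' Assumes Text is not empty.
Public Sub SaveUtf8NoBom(ByVal Text As String, ByVal Path As String)
    Const TYPE_BINARY As Long = 1        ' adTypeBinary
    Const TYPE_TEXT As Long = 2          ' adTypeText
    Const SAVE_OVERWRITE As Long = 2     ' adSaveCreateOverWrite
    Dim txtStream As Object
    Dim binStream As Object

    Set txtStream = CreateObject("ADODB.Stream")
    txtStream.Type = TYPE_TEXT
    txtStream.Charset = "utf-8"
    txtStream.Open
    txtStream.WriteText Text             ' stream now holds EF BB BF + encoded text

    ' Type can only be changed at Position 0; then skip the 3 BOM bytes
    txtStream.Position = 0
    txtStream.Type = TYPE_BINARY
    txtStream.Position = 3

    Set binStream = CreateObject("ADODB.Stream")
    binStream.Type = TYPE_BINARY
    binStream.Open
    txtStream.CopyTo binStream           ' copies from the current position to the end
    binStream.SaveToFile Path, SAVE_OVERWRITE

    binStream.Close
    txtStream.Close
End Sub
```

In the import loop you would collect the rewritten lines in a String variable instead of writing them to objStreamOut, call SaveUtf8NoBom with that string and strFile, and keep CodePage:=65001 in DoCmd.TransferText.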

VBA to open Explorer dialogue, select txt file, and add a header that is the filename without file path

I have hundreds of text files that are named correctly, but I need each file's name added as a new first row (shifting the existing data down to the second row), with " on either side of the name.
The text files are spread across multiple folders, so I need to be able to open an Explorer dialog first, select multiple text files, and add the new header row to every one.
Any help would be hugely appreciated, as I cannot find the answer anywhere on Google!
Tom
Here is my attempt, but it doesn't really work because (1) I have to hard-code the directory, and (2) I need the filename with " on either side, for example "Line1":
Sub ChangeRlnName()
'the final string to print in the text file
Dim strData As String
'each line in the original text file
Dim strLine As String
Dim time_date As String
Set FSO = CreateObject("Scripting.FileSystemObject")
'Get File Name
Filename = FSO.GetFileName("C:\Users\eflsensurv\Desktop\Tom\1.txt")
'Get File Name no Extension
FileNameWOExt = Left(Filename, InStr(Filename, ".") - 1)
strData = ""
time_date = Format(Date, "yyyymmdd")
'open the original text file to read the lines
Open "C:\Users\eflsensurv\Desktop\Tom\1.txt" For Input As #1
'continue until the end of the file
While EOF(1) = False
'read the current line of text
Line Input #1, strLine
'add the current line to strData
strData = strData + strLine & vbCrLf
Wend
'add the new line
strData = FileNameWOExt + vbLf + strData
Close #1
'reopen the file for output
Open "C:\Users\eflsensurv\Desktop\Tom\1.txt" For Output As #1
Print #1, strData
Close #1
End Sub
Try something like this:
Sub Tester()
Dim colFiles As Collection, f
'get all txt files under specified folder
Set colFiles = GetMatches("C:\Temp\SO", "*.txt")
'loop files and add the filename as a header
For Each f In colFiles
AddFilenameHeader CStr(f)
Next f
End Sub
Sub AddFilenameHeader(fpath As String)
Dim base, content
With CreateObject("scripting.filesystemobject")
base = .GetBaseName(fpath) 'no extension
With .OpenTextFile(fpath, 1)
'get any existing content
If Not .AtEndOfStream Then content = .readall()
.Close
End With
DoEvents
'overwrite existing content with header and previous content
.OpenTextFile(fpath, 2, True).write """" & base & """" & vbCrLf & content
End With
End Sub
'Return a collection of file paths given a starting folder and a file pattern
' e.g. "*.txt"
'Pass False for last parameter if don't want to check subfolders
Function GetMatches(startFolder As String, filePattern As String, _
Optional subFolders As Boolean = True) As Collection
Dim fso, fldr, f, subFldr, fpath
Dim colFiles As New Collection
Dim colSub As New Collection
Set fso = CreateObject("scripting.filesystemobject")
colSub.Add startFolder
Do While colSub.Count > 0
Set fldr = fso.getfolder(colSub(1))
colSub.Remove 1
If subFolders Then
For Each subFldr In fldr.subFolders
colSub.Add subFldr.Path
Next subFldr
End If
fpath = fldr.Path
If Right(fpath, 1) <> "\" Then fpath = fpath & "\"
f = Dir(fpath & filePattern) 'Dir is faster...
Do While Len(f) > 0
colFiles.Add fpath & f
f = Dir()
Loop
Loop
Set GetMatches = colFiles
End Function
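The answer above scans a hard-coded folder, but the question also asked for a dialog to pick the files. In an Office host such as Excel, Application.FileDialog with the file-picker type (msoFileDialogFilePicker, value 3) and AllowMultiSelect can return several paths at once. A minimal sketch, assuming the code runs in such a host; PickTextFiles is a hypothetical name.

```vba
' Hypothetical helper: shows a multi-select file picker and returns the chosen
' paths. Returns an empty Collection if the user cancels.
' Assumes a host such as Excel where Application.FileDialog is available.
Public Function PickTextFiles() As Collection
    Const FILE_PICKER As Long = 3        ' msoFileDialogFilePicker
    Dim colFiles As New Collection
    Dim i As Long

    With Application.FileDialog(FILE_PICKER)
        .Title = "Select the text files to update"
        .AllowMultiSelect = True
        .Filters.Clear
        .Filters.Add "Text files", "*.txt"
        ' Show returns -1 when the user confirms, 0 on Cancel
        If .Show = -1 Then
            For i = 1 To .SelectedItems.Count
                colFiles.Add .SelectedItems(i)
            Next i
        End If
    End With

    Set PickTextFiles = colFiles
End Function
```

To tie it to the answer's code, replace the GetMatches call in Tester with Set colFiles = PickTextFiles(); the For Each loop that calls AddFilenameHeader stays the same. The dialog lets you select files from one folder at a time, so files in several folders need several runs (or GetMatches for whole trees).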

Parse and format text file

I have a text file that is not in a format that I can use for printing labels. The current format is like this:
DY234-02 0.5 0.5 Qty 6
U21 U12 U14 U28
TR459-09 0.5 0.5 Qty 9
U11 U78 U7 U8 U30 U24
I need the file to end up like this:
DY234-02 0.5 0.5 Qty 6 U21 U12 U14 U28
TR459-09 0.5 0.5 Qty 9 U11 U78 U7 U8 U30 U24
The files contain about 100 lines in this format. I have used VBScript to try to get what I need, but the output is not much different from the input. If someone could point me in the right direction, that would be great. I am open to any other method of accomplishing this. Thanks
This is my VBScript code, but it does not do the job correctly:
Const ForReading = 1
Const ForWriting = 2
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile("C:\Scripts\parse.txt", ForReading)
Do Until objFile.AtEndOfStream
strLine1 = objFile.ReadLine
strLine2 = ""
If Not objFile.AtEndOfStream Then
strLine2 = objFile.ReadLine
End If
strNewLine = strLine1 & strLine2
strNewContents = strNewContents & strNewLine & vbCrLf
Loop
objFile.Close
Set objFile = objFSO.OpenTextFile("C:\Scripts\B3.txt", ForWriting, True)
objFile.Write strNewContents
objFile.Close
If the format repeats like this, you can read the text file line by line and check whether each line contains data. If it does, join the data to the output string; otherwise (a blank line) add a line break to the output string. Finally, write the output to a new text file. Something like this, perhaps:
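Sub JoinRecordLines()
    Dim strInFile As String
    Dim strOutFile As String
    Dim intInFile As Integer
    Dim intOutFile As Integer
    Dim strInput As String
    Dim strOutput As String
    Dim blnNewRecord As Boolean
    strInFile = "J:\downloads\data-in.txt"
    strOutFile = "J:\downloads\data-out.txt"
    intInFile = FreeFile
    Open strInFile For Input As intInFile
    intOutFile = FreeFile
    Open strOutFile For Output As intOutFile
    blnNewRecord = True
    Do Until EOF(intInFile)
        Line Input #intInFile, strInput
        If Len(Trim(strInput)) > 0 Then
            ' Data line: append it, with a space unless it starts a new record
            If Not blnNewRecord Then strOutput = strOutput & " "
            strOutput = strOutput & strInput
            blnNewRecord = False
        ElseIf Not blnNewRecord Then
            ' First blank line after a record: end the record
            strOutput = strOutput & vbCrLf
            blnNewRecord = True
        End If
    Loop
    Print #intOutFile, strOutput
    Close intInFile
    Close intOutFile
End Sub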
Try the following code. It is fast because it reads the whole text file at once and writes the result at once; everything in between happens in memory.
Sub testSplitTextFile()
    Dim objFSO As Object, objTF As Object, strIn As String, fullFilename As String, retFile As String
    Dim arrIn As Variant, strRet As String, i As Long, hFile As Integer
    'use your own paths here
    fullFilename = "C:\Teste VBA Excel\Teste StackOverflow\TestSplit.txt"
    retFile = "C:\Teste VBA Excel\Teste StackOverflow\RetFile.txt"
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objTF = objFSO.OpenTextFile(fullFilename, 1)
    strIn = objTF.ReadAll 'read the whole txt file into one string
    objTF.Close
    arrIn = Split(strIn, vbCrLf) 'split the string into lines
    'Join each non-empty line with the non-empty line that follows it
    '(assumes two-line records separated by blank lines)
    For i = 0 To UBound(arrIn) - 1
        If arrIn(i) <> "" And arrIn(i + 1) <> "" Then
            strRet = strRet & arrIn(i) & " " & arrIn(i + 1) & vbCrLf
        End If
    Next i
    'remove the last vbCrLf (two characters), if there is one
    If Len(strRet) > 0 Then strRet = Left(strRet, Len(strRet) - 2)
    hFile = FreeFile
    Open retFile For Output As #hFile
    Print #hFile, strRet 'write the whole string at once
    Close #hFile
End Sub
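Both answers rely on blank lines between records, and the second also assumes every record has exactly two lines. If the blank lines are not reliable, another option is to key on the header line itself. This sketch assumes, based only on the sample data, that every record starts with a line containing " Qty " and that the lines after it (up to the next such line) are component lists; JoinRecords is a hypothetical name.

```vba
' Hypothetical helper: joins each header line (assumed to contain " Qty ")
' with the component lines that follow it. Blank lines are ignored, and any
' lines before the first header line are dropped.
Public Function JoinRecords(ByVal Text As String) As String
    Dim lines() As String
    Dim i As Long
    Dim ln As String
    Dim rec As String
    Dim out As String

    lines = Split(Replace(Text, vbCrLf, vbLf), vbLf)
    For i = LBound(lines) To UBound(lines)
        ln = Trim$(lines(i))
        If Len(ln) > 0 Then
            If InStr(1, ln, " Qty ", vbTextCompare) > 0 Then
                ' New header line: flush the previous record first
                If Len(rec) > 0 Then out = out & rec & vbCrLf
                rec = ln
            ElseIf Len(rec) > 0 Then
                ' Component line: append it to the current record
                rec = rec & " " & ln
            End If
        End If
    Next i
    ' Last record, without a trailing line break
    If Len(rec) > 0 Then out = out & rec

    JoinRecords = out
End Function
```

Because it works on a string, it combines with the ReadAll approach above: strRet = JoinRecords(strIn) replaces the For loop, and the result is written out the same way.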

How to find length of all .csv files in directory?

I have multiple .csv files in my directory, and I need to find the length of each one (the number of rows that contain data). I'm running the following code from an .xlsx file in the same directory. (I eventually intend to copy data from the .csv files into the .xlsx file.)
i = 1
FilePath = Application.ActiveWorkbook.Path & "\"
file = Dir(FilePath & "*.csv")
Do While Len(file) > 0
Open FilePath & file For Input As #1
length(i) = Cells(Rows.Count, 1).End(xlUp).Row
i = i + 1
Close #1
file = Dir
Loop
All the values of the length array end up being 1, even though the .csv files are probably 15-20 rows long.
You're not actually opening the file in Excel, so there are no cells to count: Cells(Rows.Count, 1).End(xlUp).Row looks at the active sheet of your .xlsx workbook, not at the .csv file. Try counting the lines instead:
Open FilePath & file For Input As #1
n = 0
While Not EOF(1): Line Input #1, trashLine: n = n + 1: Wend
length(i) = n
i = i + 1
Close #1
Alternatively, open the file in Excel, count the rows, and close it afterwards:
Set tempWB = Workbooks.Open(FilePath & file)
length(i) = tempWB.Sheets(1).Cells(tempWB.Sheets(1).Rows.Count, 1).End(xlUp).Row
tempWB.Close False
An even quicker way is to let the Windows find command count the lines, via WScript.Shell:
Dim i As Long, varFile As Variant
For Each varFile In _
Filter(Split(CreateObject("WScript.Shell").Exec("cmd /c find /v /c """" """ _
& ThisWorkbook.Path & "\*.csv""").StdOut.ReadAll, vbCrLf), ":")
i = i + CLng(Split(varFile, ":")(2))
Next
Debug.Print i
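That way, if you've got 10 files, the code only works with 10 strings rather than opening and closing each file or reading thousands of lines. Note that this version adds the line counts of all the files together in i rather than storing one length per file.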
As @SOofWXLS stated, your code is not opening the files in Excel; you are opening them for direct file I/O.
Here is a complete code sample that will fill your array with the file lengths as you were trying to do.
Dim fPath As String
Dim fName As String
Dim hFile As Long
Dim i As Long
Dim NumLines As Long
Dim length() As Long
Dim strLine As String
ReDim length(1 To 1)
fPath = Application.ActiveWorkbook.Path & "\"
fName = Dir(fPath & "*.csv")
Do While Len(fName) > 0
i = i + 1
NumLines = 0
ReDim Preserve length(1 To i)
hFile = FreeFile
Open fPath & fName For Input As hFile
Do While Not EOF(hFile)
Line Input #hFile, strLine
NumLines = NumLines + 1
Loop
Close hFile
length(i) = NumLines
fName = Dir
Loop
This will also dynamically expand your array to accommodate as many files as are found.
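The line-counting loops above count every line, including blank ones, while the question asks for rows that have data. A small variation reads each file in one Get call and counts only lines that are not empty or whitespace. CountDataLines is a hypothetical name, and note that a row made only of commas (all fields empty) would still be counted.

```vba
' Hypothetical helper: counts the lines in a file that contain something
' other than whitespace. Splits on vbLf so both CRLF and LF files work.
Public Function CountDataLines(ByVal Path As String) As Long
    Dim hFile As Integer
    Dim txt As String
    Dim parts() As String
    Dim i As Long
    Dim n As Long

    hFile = FreeFile
    Open Path For Binary Access Read As #hFile
    txt = Space$(LOF(hFile))          ' buffer sized to the whole file
    Get #hFile, , txt                 ' read the whole file in one call
    Close #hFile

    parts = Split(txt, vbLf)
    For i = LBound(parts) To UBound(parts)
        ' Drop the CR left over from CRLF endings, then ignore blank lines
        If Len(Trim$(Replace(parts(i), vbCr, ""))) > 0 Then n = n + 1
    Next i

    CountDataLines = n
End Function
```

Inside the Dir loop it replaces the inner Do While Not EOF loop: length(i) = CountDataLines(fPath & fName).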

Reading in data from text file into a VBA array

I have the following VBA code:
Sub read_in_data_from_txt_file()
Dim dataArray() As String
Dim i As Integer
Const strFileName As String = "Z:\sample_text.txt"
Open strFileName For Input As #1
' -------- read from txt file to dataArrayay -------- '
i = 0
Do Until EOF(1)
ReDim Preserve dataArray(i)
Line Input #1, dataArray(i)
i = i + 1
Loop
Close #1
Debug.Print UBound(dataArray())
End Sub
I'm trying to read text from a file line by line (assume 'sample.txt' is a regular ASCII file) and assign each line to consecutive elements of an array.
When I run this, I get all my data in the first value of the array.
For example, if 'sample.txt' is:
foo
bar
...
dog
cat
I want each one of these words in a consecutive array element.
What you have is fine. If everything ends up in dataArray(0), the lines in the file are not delimited by CrLf (or a bare Cr), so Line Input reads the whole file as a single line.
Instead:
open strFileName for Input as #1
dataArray = split(input$(LOF(1), #1), vbLf)
close #1
This assumes the delimiter is vbLf (which it would be in a file coming from a *nix system).
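If you do not know in advance whether a file uses CRLF, LF, or a bare CR, you can normalize the line endings before splitting. This is a sketch; ReadLinesAnyEnding is a hypothetical name, and reading in Binary mode assumes a single-byte encoding such as the ASCII file in the question.

```vba
' Hypothetical helper: reads a text file into a String array, one element
' per line, whether it uses CRLF, LF or CR line endings.
Public Function ReadLinesAnyEnding(ByVal Path As String) As String()
    Dim hFile As Integer
    Dim txt As String

    hFile = FreeFile
    Open Path For Binary Access Read As #hFile
    txt = Space$(LOF(hFile))
    Get #hFile, , txt
    Close #hFile

    ' Normalize every line ending to vbLf, then split once
    txt = Replace(txt, vbCrLf, vbLf)
    txt = Replace(txt, vbCr, vbLf)
    ' Drop one trailing line break so the array has no empty last element
    If Right$(txt, 1) = vbLf Then txt = Left$(txt, Len(txt) - 1)

    ReadLinesAnyEnding = Split(txt, vbLf)
End Function
```

With the question's declarations, dataArray = ReadLinesAnyEnding(strFileName) then gives one element per line whichever convention the file uses.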
Here is a cleaner example that uses a For Each loop in VBA:
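Sub TxtParse(ByVal FileName As String)
    Dim fs As Object, ts As Object
    Dim strdic() As String
    Dim oitem As Variant
    Set fs = CreateObject("Scripting.FileSystemObject")
    ' 1 = ForReading, False = do not create, -2 = TristateUseDefault (system default encoding)
    Set ts = fs.OpenTextFile(FileName, 1, False, -2)
    If ts.AtEndOfStream Then ts.Close: Exit Sub   ' ReadAll fails on an empty file
    strdic = Split(ts.ReadAll, vbLf)
    ts.Close
    For Each oitem In strdic
        ' Skip lines that contain the placeholder text "YourString"
        If InStr(oitem, "YourString") = 0 Then
            ' Quote each field: tab-delimited lines first, otherwise comma-delimited
            If InStr(1, oitem, vbTab) <> 0 Then
                Debug.Print "Line: "; "'" & Replace(oitem, vbTab, "','") & "'"
            Else
                Debug.Print "Line: "; "'" & Replace(oitem, ",", "','") & "'"
            End If
        End If
    Next oitem
End Sub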