I have a 100 million character text file that I'm trying to import into an MS Access database. The file does not have any linefeeds so it is just one giant line of text. I tried loading it into a string variable but that did not work because of the size. I then successfully loaded it into a byte array, but I'm unsure of how to parse it the way I need it. The file has fixed length records, but has more than one type of record. One type may be 180 characters of data and 220 characters of filler, and another may be 100 characters of data and 300 characters of filler. I would like to the different record types into separate tables. I was thinking of rewriting the data to a new text file in 400 character chunks, and then maybe using Trim to get just the data I need with no filler. From there I could test the line length and import.
Here is what I have, but it does not work.
Public Sub modMain_ParseAQTFiles()
Dim bytFile(400) As Byte
Dim intFileIn As Integer
Dim intFileOut As Integer
Dim intFileOut1 As Integer
Dim intFileOut2 As Integer
Dim intFFIn As Integer
Dim intFFOut As Integer
Dim lngBytePos As Long
Dim dblStartChar As Double
Dim lngNoRecs As Long
Dim lngIndex As Long
Dim strFileIn As String
Dim strFileOut1 As String
Dim strFileOut2 As String
Dim strLineOfText As String
Dim strTextLine As String
Dim strUserName As String
'Get username
strUserName = Environ("Username")
'Set file paths
strFileIn = "C:\Users\" & strUserName & "\Desktop\Pooltalk\aqt.txt"
strFileOut1 = "C:\Users\" & strUserName & "\ Desktop\Pooltalk\ AQT_Quartiles_Header-out.txt"
strFileOut2 = "C:\Users\" & strUserName & "\Desktop\Pooltalk \AQT_Quartiles_Detail-out.txt"
'Reads data into byte array
intFFIn = FreeFile
intFFOut = FreeFile
dblStartChar = 1
Open strFileIn For Binary Access Read As #intFFIn
lngNoRecs = LOF(intFFIn) / 400
For lngIndex = 1 To lngNoRecs
Get #intFFIn, dblStartChar, bytFile
strLineOfText = StrConv(bytFile, vbFromUnicode)
Open strFileOut For Binary Access Write As #intFFOut
Put intFFOut, dblStartChar, strLineOfText & vbCrLf
Debug.Print strLineOfText
dblStartChar = dblStartChar + 400
Next lngIndex
Close #intFFIn
Close #intFFOut
End Sub
I would be happy to hear if anyone has any advice to get this working. Thanks.
EDIT:
Here is one record type:
1004569 AS20170431360FCE319840701
34 characters of data and 366 blanks
Here is the second record type:
200456906875{06875{06875{06875{06875{06875{07I07I07I07I07I07I40B40B40B40B40B40B0000630000{0000630000{0000630000{0000630000{0000630000{0000630000{48{48{48{48{48{48{05926{05926{05926{05926{05926{05926{01250{01250{01250{01250{01250{01250{06875{06875{06875{06875{06875{06875{16875{16875{16875{16875{16875{16875{
307 characters and 93 blanks.
Here is my final code:
Public Sub modMain_ParseAQTFiles()
Dim intFileIn As Integer
Dim intFileOut1 As Integer
Dim intFFIn As Integer
Dim intFFOut As Integer
Dim lngNoRecs As Long
Dim lngIndex As Long
Dim strFileIn As String
Dim strFileOut1 As String
Dim strUserName As String
Dim strRecord As String
Dim dblStartChar As Double
Dim lngCharNo As Long
strUserName = Environ("Username")
'Set file paths
strFileIn = "C:\Users\" & strUserName & "\Desktop\Pooltalk\aqt.txt"
strFileOut1 = "C:\Users\" & strUserName & "\Desktop\Pooltalk\AQT_Parsed.txt"
strRecord = Space$(400)
dblStartChar = 1
'Reads data into byte array
intFFIn = FreeFile
Open strFileIn For Binary Access Read As #intFFIn
intFFOut = FreeFile
Open strFileOut1 For Binary Access Write As #intFFOut
'Find number of records
lngNoRecs = LOF(intFFIn) / 400
For lngIndex = 1 To lngNoRecs
Get #intFFIn, dblStartChar, strRecord
strRecord = Trim(strRecord)
Put intFFOut, , strRecord & vbCrLf
dblStartChar = dblStartChar + 400
strRecord = Space$(400)
Next lngIndex
Close #intFFIn
Close #intFFOut
MsgBox "Done!"
End Sub
If all records are 400 characters long, I would read them directly into a string variable of that length.
Dim strRecord As String
Dim x As Long
' Get reads as many characters as are in the target variable
strRecord = Space$(400)
Get #intFFIn, dblStartChar, strRecord
' Find first 0-byte character
x = Instr(strRecord, Chr$(0))
' and trim off the fillers
strRecord = Left$(strRecord, x-1)
See https://msdn.microsoft.com/VBA/Language-Reference-VBA/articles/get-statement at the bottom (before the example).
Related
I tried to get the number of lines of data in several CSV files in VBA.
Here is the code.
Sub Woo_Products()
Dim fso As New FileSystemObject
Dim flds As Folders
Dim fls As Files
Dim strText As String
Dim i As Integer
Dim j As Integer
Dim k As Integer
Dim extfind As String
Dim FilePath As String
Dim sLineOfText As String
On Error Resume Next
Workbooks.Open Filename:="F:\Work\scrape\" & "woocommerce-products.csv", UpdateLinks:=3
Set fls = fso.getfolder("C:\Users\star\Downloads").Files
k = 2
For Each f In fls
strText = f.Name
extfind = Right$(strText, Len(strText) - InStrRev(strText, "."))
If extfind = "csv" Then
FilePath = "C:\Users\star\Downloads\" & strText
Open FilePath For Input As #1
i = 0
Do Until EOF(1)
Line Input #1, sLineOfText
If sLineOfText <> "" Then i = i + 1
Loop
Close #1
End If
Next
Windows("woocommerce-products.csv").Activate
ActiveWorkbook.Save
ActiveWorkbook.Close
End Sub
But I am getting the same count for each file.
Of course, each file has different lines of data.
Hope to help me for this.
If all you need is a line count, I would write a function to return the count.
Function getFileLineCount(FullFileName As String, Optional LineDelimiter As String = vbNewLine) As Long
Dim text As String
Dim fileNo As Integer, n As Long
fileNo = FreeFile
Open FullFileName For Input As #fileNo
Do Until EOF(1)
Line Input #1, text
n = n + 1
Loop
Close #fileNo
getFileLineCount = n
End Function
Another approach using FileSystemObject:
Public Function GetLineCount(ByVal Path As String) As Long
With CreateObject("Scripting.FileSystemObject")
GetLineCount = UBound(Split(.OpenTextFile(Path, 1).ReadAll, vbNewLine)) + 1
End With
End Function
You'll need to add the references (Tools --> References)
Microsoft Scripting Runtime
Microsoft VBScript Regular Expressions 5.5
This will count "Return & NewLine" characters in the file.
Private Function LineCount(ByVal PathFile As String) As Long
Dim sData As String
Dim oFile As New FileSystemObject
sData = oFile.OpenTextFile(PathFile, ForReading).ReadAll
Dim oRegX As New RegExp
oRegX.Pattern = "\r\n"
oRegX.Global = True
LineCount = oRegX.Execute(sData).Count + 1
Set oRegX = Nothing
Set oFile = Nothing
End Function
i = ActiveWorkbook.ActiveSheet.Cells(ActiveWorkbook.ActiveSheet.Rows.Count, 1).End(xlUp).Row
It's working so well.
I am attempting to pull a column of letter codes using InStr from existing text files. I would like to allow this macro to be flexible enough to allow the option to manually input the letter code being searched.
Where I am running into an issue is when common numbers are in the letter code (i.e. C4 and C45). Originally the code was written to search for 2 specific letter codes, and I was hoping to be able to replace those 2 identifiers with a link to 2 cells.
No such luck, as I keep getting responses that are not exact. Please see the code below:
Private Sub CmdLettersGetfile_Click()
Dim objFSO As Object
Dim myDir As String, myList()
Dim objFolder As Object
Dim objFile As Object
Dim i As Integer
Dim sFolder As String
Dim fd As Office.FileDialog
Dim row As Long
Dim row1 As Long
Dim FCount As Integer
Dim FCount1 As Integer
Dim Val As String
Dim Pos As Integer
Dim Last1 As Long 'Mark the last row
Dim Start2 As Long
Dim Last2 As Long 'Mark the last row
Set fd = Application.FileDialog(msoFileDialogFilePicker)
For Each cn In ThisWorkbook.Connections
cn.Delete
Next cn
With ActiveSheet
Lastrow = Sheets("MAIN").Cells(.Rows.Count, "E").End(xlUp).row
End With
'
Worksheets("REPORT").Range("A6:AA1000000").ClearContents
Worksheets("REPORT").Range("A6:AA1000000").ClearFormats
row1 = 6 'Start of REPORT ROW
For row = 12 To Lastrow
sFile = Worksheets("MAIN").Cells(row, "E").Value
Pos = InStr(1, sFile, "org")
If Pos = 0 Then
Val = Worksheets("MAIN").Cells(9, "H")
Else
Val = Worksheets("MAIN").Cells(10, "H")
End If
Pos = 0
Dim strFilename As String
Dim strTextLine As String
Dim iFile As Integer
iFile = FreeFile
strFilename = folderName & "\" & sFile
Open strFilename For Input As #iFile
Last1 = 0
Last2 = 0
Do Until EOF(1)
Line Input #1, strTextLine
FCount = FCount
Pos = InStr(1, strTextLine, Val)
I cleaned up some of the irrelevant code in an attempt to save space and focus on the lines where I am having the issue.
********Update (26-Apr-2018 # 18:12 EST):
When I tried to format the Val to be one uppercase letter and 2 numbers
Dim Val As String: Val = "([A-Z]{1})([0-9]{1})([0-9]{1})"
I get the following:
Resulting data
Column I is the letter code that I need to get narrowed down.
*********Update 2 (27-Apr-2018 # 14:37 EST):
Expanded resulting data
I was able to mass redact what I needed to. I hope this offers more insight into what I'm looking to do.
I'm working on a sub that will use an HTTP post to upload a file to a server. I've already worked out all of the code to actually sent the post request, but I can't figure out how to get the file into the appropriate format. The server is expecting the file's bytes as a string, but no matter what approach I take to creating the string the values don't match up with what's expected.
Here is the code I currently have:
Sub Test()
Dim FileNum As Integer
Dim TestString As String
Dim TestBytes() As Byte
Dim TestByte As Variant
FileNum = FreeFile
Open ActiveDocument.Path & "\" & ActiveDocument.name For Binary Access Read As #FileNum 'Open document to write to string
ReDim TestBytes(LOF(FileNum) - 1)
Get FileNum, , TestBytes
Close FileNum
For Each TestByte In TestBytes
TestString = TestString & Hex(TestByte)
Next TestByte
Debug.Print TestString
End Sub
The output looks something like this (truncated, as the full string is obviously quite long):
504B3414060800021076DD8E8C9D130825B436F6E74656E745F54797065735D2E786D6C20...
The problem is, the example output I have says it should look like this:
UEsDBBQABgAIAAAAIQCuTjGvewEAAAIGAAATAAgCW0NvbnRlbnRfVHlwZXNdLnhtbCCiBAIooAACA...
I assumed the problem was that my test code is trying to encode the bytes as hex values, whereas the example is obviously not in hex, but when I just try to output the file's bytes directly as a string I get a lot of invalid characters. Here's the code:
Sub Test()
Dim FileNum As Integer
Dim TestString As String
Dim TestBytes() As Byte
Dim TestByte As Variant
FileNum = FreeFile
Open ActiveDocument.Path & "\" & ActiveDocument.name For Binary Access Read As #FileNum 'Open document to write to string
TestString = String$(LOF(FileNum), Chr(32)) 'Fill string with blank space to set string length
Get FileNum, , TestString 'Write binary data from file to string
Debug.Print TestString
End Sub
And here's the output:
PK ! vÝŽŒ p ...
Is there something I'm missing as to how I can encode the bytes to get output that's encoded the same as the example output? When performing a similar operation in another language (e.g. Java using readFileToString), how is the string encoded?
Using Alex K.'s advice in the comments above, and the Base64 encoding function found here, I came to this solution:
Sub Test()
Dim FileNum As Integer
Dim TestString As String
Dim TestBytes() As Byte
Dim TestByte As Variant
FileNum = FreeFile
Open ActiveDocument.Path & "\" & ActiveDocument.name For Binary Access Read As #FileNum 'Open document to write to string
TestString = String$(LOF(FileNum), Chr(32)) 'Fill string with blank space to set string length
Get FileNum, , TestString 'Write binary data from file to string
TestString = EncodeBase64(TestString)
Debug.Print TestString
End Sub
Function EncodeBase64(text As String) As String
Dim arrData() As Byte
arrData = StrConv(text, vbFromUnicode)
Dim objXML As MSXML2.DOMDocument
Dim objNode As MSXML2.IXMLDOMElement
Set objXML = New MSXML2.DOMDocument
Set objNode = objXML.createElement("b64")
objNode.DataType = "bin.base64"
objNode.nodeTypedValue = arrData
EncodeBase64 = objNode.text
Set objNode = Nothing
Set objXML = Nothing
End Function
Which outputs the correct string:
UEsDBBQABgAIAAAAIQCuTjGvewEAAAIGAAATAAgCW0NvbnRlbnRfVHlwZXNdLnhtbCCiBAIooAACA...
I am currently working in vb.net. My company is going paperless and I want to do a cost saving analysis on paper savings. Currently we save all of our PDF files onto a server. The file path is like this "Server>Folder1>Folder2>Folder3>Folder4>PDF files." Folders 1 and 2 are always used to navigate through. Folder 3 is a list of departments, and folder 4 is each job. Each folder 4 has multiple pdf files. To be put simply the names of Folder 1 and Folder 2 are static while folders 3 and 4 are dynamic. To make things even harder all of the PDF files located after folder 4 have different names. I have the bit of code below to detect how many pages a pdf is without having to open it but it requires the file pathway. Considering there are hundreds if not over a thousand pdf files I want to programmatically loop through all of these files, detect if the file is a pdf file, then sum all of the pages that are found. I can then use that number to calculate cost savings of going paperless.
PdfReader pr = new PdfReader("/path/to/yourFile.pdf");
return pr.getNumberOfPages();
Another idea would be to somehow merge all the files togther into a single PDF file which would make it as simple as opening the file to see how many pages are there.
Here is a VBA solution. Run the code in Excel.
Sub PDFandNumPages()
Dim Folder As Object
Dim file As Object
Dim fso As Object
Dim iExtLen As Integer, iRow As Integer
Dim sFolder As String, sExt As String
Dim sPDFName As String
sExt = "pdf"
iExtLen = Len(sExt)
iRow = 1
' Must have a '\' at the end of path
sFolder = "C:\your_path_here\"
Set fso = CreateObject("Scripting.FileSystemObject")
If sFolder <> "" Then
Set Folder = fso.GetFolder(sFolder)
For Each file In Folder.Files
If Right(file, iExtLen) = sExt Then
Cells(iRow, 1).Value = file.Name
Cells(iRow, 2).Value = pageCount(sFolder & file.Name)
iRow = iRow + 1
End If
Next file
End If
End Sub
Function pageCount(sFilePathName As String) As Integer
Dim nFileNum As Integer
Dim sInput As String
Dim sNumPages As String
Dim iPosN1 As Integer, iPosN2 As Integer
Dim iPosCount1 As Integer, iPosCount2 As Integer
Dim iEndsearch As Integer
' Get an available file number from the system
nFileNum = FreeFile
'OPEN the PDF file in Binary mode
Open sFilePathName For Binary Lock Read Write As #nFileNum
' Get the data from the file
Do Until EOF(nFileNum)
Input #1, sInput
sInput = UCase(sInput)
iPosN1 = InStr(1, sInput, "/N ") + 3
iPosN2 = InStr(iPosN1, sInput, "/")
iPosCount1 = InStr(1, sInput, "/COUNT ") + 7
iPosCount2 = InStr(iPosCount1, sInput, "/")
If iPosN1 > 3 Then
sNumPages = Mid(sInput, iPosN1, iPosN2 - iPosN1)
Exit Do
ElseIf iPosCount1 > 7 Then
sNumPages = Mid(sInput, iPosCount1, iPosCount2 - iPosCount1)
Exit Do
' Prevent overflow and assigns 0 to number of pages if strings are not in binary
ElseIf iEndsearch > 1001 Then
sNumPages = "0"
Exit Do
End If
iEndsearch = iEndsearch + 1
Loop
' Close pdf file
Close #nFileNum
pageCount = CInt(sNumPages)
End Function
Here is an alternative way of doing essentially the same thing.
Sub Test()
Dim MyPath As String, MyFile As String
Dim i As Long
MyPath = "C:\your_path_here\"
MyFile = Dir(MyPath & Application.PathSeparator & "*.pdf", vbDirectory)
Range("A:B").ClearContents
Range("A1") = "File Name": Range("B1") = "Pages"
Range("A1:B1").Font.Bold = True
i = 1
Do While MyFile <> ""
i = i + 1
Cells(i, 1) = MyFile
Cells(i, 2) = GetPageNum(MyPath & Application.PathSeparator & MyFile)
MyFile = Dir
Loop
Columns("A:B").AutoFit
MsgBox "Total of " & i - 1 & " PDF files have been found" & vbCrLf _
& " File names and corresponding count of pages have been written on " _
& ActiveSheet.Name, vbInformation, "Report..."
End Sub
'
Function GetPageNum(PDF_File As String)
'Haluk 19/10/2008
Dim FileNum As Long
Dim strRetVal As String
Dim RegExp
Set RegExp = CreateObject("VBscript.RegExp")
RegExp.Global = True
RegExp.Pattern = "/Type\s*/Page[^s]"
FileNum = FreeFile
Open PDF_File For Binary As #FileNum
strRetVal = Space(LOF(FileNum))
Get #FileNum, , strRetVal
Close #FileNum
GetPageNum = RegExp.Execute(strRetVal).Count
End Function
I am trying to delete some specific lines of a text using VB.Net. I saw a solution here however it is in VB6. The problem is, I am not really familiar with VB6. Can somebody help me?
This is the code from the link:
Public Function DeleteLine(ByVal fName As String, ByVal LineNumber As Long) _As Boolean
'Purpose: Deletes a Line from a text file
'Parameters: fName = FullPath to File
' LineNumber = LineToDelete
'Returns: True if Successful, false otherwise
'Requires: Reference to Microsoft Scripting Runtime
'Example: DeleteLine("C:\Myfile.txt", 3)
' Deletes third line of Myfile.txt
'______________________________________________________________
Dim oFSO As New FileSystemObject
Dim oFSTR As Scripting.TextStream
Dim ret As Long
Dim lCtr As Long
Dim sTemp As String, sLine As String
Dim bLineFound As Boolean
On Error GoTo ErrorHandler
If oFSO.FileExists(fName) Then
oFSTR = oFSO.OpenTextFile(fName)
lCtr = 1
Do While Not oFSTR.AtEndOfStream
sLine = oFSTR.ReadLine
If lCtr <> LineNumber Then
sTemp = sTemp & sLine & vbCrLf
Else
bLineFound = True
End If
lCtr = lCtr + 1
Loop
oFSTR.Close()
oFSTR = oFSO.CreateTextFile(fName, True)
oFSTR.Write(sTemp)
DeleteLine = bLineFound
End If
ErrorHandler:
On Error Resume Next
oFSTR.Close()
oFSTR = Nothing
oFSO = Nothing
End Function
Dim delLine As Integer = 10
Dim lines As List(Of String) = System.IO.File.ReadAllLines("infile.txt").ToList
lines.RemoveAt(delLine - 1) ' index starts at 0
System.IO.File.WriteAllLines("outfile.txt", lines)
'This can also be the file that you read in
Dim str As String = "sdfkvjdfkjv" & vbCrLf & "dfsgkjhdfj" & vbCrLf & "dfkjbhhjsdbvcsdhjbvdhs" & vbCrLf & "dfksbvashjcvhjbc"
Dim str2() As String = str.Split(vbCrLf)
For Each s In str2
If s.Contains("YourString") Then
'add your line to txtbox
Else
'don't add your line to txtbox
End If
Next
Or You Can Use
TextFile = TextFile.Replace("You want to Delete","")