I have a file that changes size depending on the amount of data it contains, so the location of the bytes I want to read moves back and forth every time the file is saved by its main application. I am using the string "This is the data" inside the file to get close to the bytes I want to read, 31 38 33 34. They're always at the same position after the string, regardless of the size of the file. The only consistent thing is the string; the bytes will be different every time.
Try
TextBuffer = File.ReadAllText("C:\test.txt")
Catch ex As Exception
Exit Sub
End Try
Dim indexTar As Integer = TextBuffer.IndexOf("This is the data")
If indexTar >= 0 Then
ListView1.Items.Add("This is the data")
End If
I use the code above to read the whole file and end up near the location where the bytes I want to read are.
How do I read those bytes 31 38 33 34?
I'm not quite sure why you're talking about bytes, when this seems to be a text file - it would be easier to read and treat it as such, but simplistically you can read the whole thing into memory, find the index of what you know and then add some amount to get to the thing you don't:
Dim s = File.ReadAllText("C:\test.txt")
Dim indexTar = s.IndexOf("This is the data")
If indexTar >= 0 Then
Dim tIdx = indexTar + "This is the data".Length + 4 'seems to be 4 bytes after the end of the string
Dim iWantText = s.Substring(tIdx, 4)
End If
Now, iWantText contains 1834. If you want it as a byte array, Dim bytes = Encoding.ASCII.GetBytes(iWantText) will give it to you.
If the file is huge, it might be better to read it char by char (the reads are buffered elsewhere, so don't worry about the inefficiency of reading one char at a time), looking for T and, if you find it, checking whether "his is the data" follows...
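If the values really are raw bytes rather than text, the same idea can be applied to a byte array instead of a string. The following is only a minimal sketch, assuming the marker is stored as ASCII and the wanted bytes sit a fixed distance after it; the names FindBytes and ReadAfterMarker and the gap parameter are made up for illustration.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module MarkerSearch
    'Returns the index of the first occurrence of pattern in data, or -1.
    Function FindBytes(data As Byte(), pattern As Byte()) As Integer
        For i As Integer = 0 To data.Length - pattern.Length
            Dim matched As Boolean = True
            For j As Integer = 0 To pattern.Length - 1
                If data(i + j) <> pattern(j) Then
                    matched = False
                    Exit For
                End If
            Next
            If matched Then Return i
        Next
        Return -1
    End Function

    'Returns count bytes located gap bytes after the end of the marker,
    'or Nothing if the marker is missing or the data is too short.
    'Assumption: the marker is stored in the file as plain ASCII.
    Function ReadAfterMarker(data As Byte(), marker As String,
                             gap As Integer, count As Integer) As Byte()
        Dim markerBytes = Encoding.ASCII.GetBytes(marker)
        Dim idx = FindBytes(data, markerBytes)
        If idx < 0 Then Return Nothing
        Dim start = idx + markerBytes.Length + gap
        If start + count > data.Length Then Return Nothing
        Dim result(count - 1) As Byte
        Array.Copy(data, start, result, 0, count)
        Return result
    End Function
End Module
```

It could be called as ReadAfterMarker(File.ReadAllBytes("C:\test.txt"), "This is the data", 4, 4), where the gap of 4 is the same guess used in the Substring version above.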
I have a simple .txt log file to which an application adds lines as it does its work. The lines consist of a timestamp and a variable-length text:
17-06-25 06:37:43 xxxxxxxxxxxxxxx
17-06-25 06:37:46 yyyyyyy
17-06-25 06:37:50 zzzzzzzzzzzzzzzzzzzzzzzzzzzz
...
I need to extract all lines with a timestamp greater than a certain date-time. Typically these are the last 20-40 log entries (lines).
The problem is that the file is large and growing.
If all line lengths were equal, I'd use a binary search. But they aren't, so I end up using something like:
Private Sub ExtractNewestLogs(dEarliest As Date)
Dim sLine As String = ""
Dim oSRLog As New StreamReader(gsFilLog)
sLine = oSRLog.ReadLine()
Do While Not (sLine Is Nothing)
Debug.Print(sLine)
sLine = oSRLog.ReadLine()
Loop
End Sub
which, well, isn't really fast.
Is there a method with which I can read such files "backwards", i.e., last line first? If not, what other option do I have?
The function below returns the last x bytes of a file as an array of strings, using a binary reader. You can then pull the last records you want much more quickly than by reading the entire log file. You can fine-tune the number of bytes to read according to a rough estimate of how many bytes the last 20-40 log entries take up. On my PC, it took less than 10 ms to read the last 10,000 characters of a 17 MB text file.
Of course, this code assumes that your log file is plain ASCII text.
Private Function ReadLastbytes(filePath As String, x As Long) As String()
    Dim fileData() As Byte = {}
    Using oFileStream As New FileStream(filePath, FileMode.Open, FileAccess.Read)
        Using oBinaryReader As New BinaryReader(oFileStream)
            'Read at most x bytes, and never more than the file contains.
            Dim lBytes As Long = Math.Min(x, oFileStream.Length)
            'Position the stream lBytes before the end of the file.
            oFileStream.Seek(-lBytes, SeekOrigin.End)
            fileData = oBinaryReader.ReadBytes(CInt(lBytes))
        End Using
    End Using
    'Skip NUL bytes; every other byte is treated as one ASCII character.
    Dim tempString As New StringBuilder(fileData.Length)
    For Each b As Byte In fileData
        If b <> 0 Then tempString.Append(Chr(b))
    Next
    'Note: the first entry may be a partial line.
    Return tempString.ToString().Split({vbCrLf}, StringSplitOptions.None)
End Function
I attempted a binary search anyway, even though the file does not have fixed line lengths.
First some considerations, then the code:
Sometimes the last n lines of a log file need to be extracted, based on an ascending sort key at the beginning of each line. The key could be anything, but in log files it typically represents a date-time, usually in the format YYMMDDHHNNSS (possibly with some punctuation).
Log files are typically text-based files consisting of many lines, at times millions of them. Often log files have fixed-length lines, in which case a specific key is quite easy to locate with a binary search. Probably just as often, however, log files have variable-length lines. To access these, one can use an estimate of the average line length to calculate a file position from the end, and then process sequentially from there to the EOF.
But a binary approach can also be employed for this type of file, as demonstrated here. The advantage grows with the file size. A log file's maximum size is determined by the file system: NTFS theoretically allows for 16 EiB (16 x 2^60 B); in practice, under Windows 8 or Server 2012, the limit is 256 TiB (256 x 2^40 B).
(What 256 TiB actually means: a typical log file is designed to be readable by a human and rarely exceeds 80 characters per line by much. Let's assume your log file logs along happily and completely uninterrupted for an astonishing 12 years, a total of 4,383 days at 86,400 seconds each. Your application could then write 9 entries per millisecond into said log file and would only hit the 256 TiB limit in its 13th year.)
The great advantage of the binary approach is that n comparisons suffice for a log file of 2^n bytes, and the advantage grows rapidly with the file size: whereas 10 comparisons are required for a file of 1 KiB (1 per 102.4 B), only 20 are needed for 1 MiB (1 per 51.2 KiB), 30 for 1 GiB (1 per about 34.1 MiB), and a mere 40 for a file of 1 TiB (1 per 25.6 GiB).
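The comparison counts quoted above can be reproduced with a few lines. This is only an illustrative sketch; the function name ComparisonsNeeded is made up, and it counts pure halving steps, ignoring the forward scan for the next CR/LF that the actual function performs at each step.

```vbnet
Imports System

Module BisectionCost
    'Number of halving steps needed to narrow a window of fileSize bytes
    'down to a single byte.
    Function ComparisonsNeeded(fileSize As Long) As Integer
        Dim steps As Integer = 0
        Dim window As Long = fileSize
        While window > 1
            window = (window + 1) \ 2   'Halve, rounding up.
            steps += 1
        End While
        Return steps
    End Function
End Module
```

For example, ComparisonsNeeded(1024) yields 10 and ComparisonsNeeded(1024L * 1024L) yields 20.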
Now to the function. These assumptions are made: the log file is encoded in UTF-8, the log lines are separated by a CR/LF sequence, and the timestamp is located at the beginning of each line in ascending order, probably in the format [YY]YYMMDDHHNNSS, possibly with some punctuation in between. (All of these assumptions could easily be changed and handled by overloaded functions.)
In an outer loop, the search window is narrowed binarily by comparing keys against the provided earliest date-time. As soon as a new position within the stream has been determined, an independent forward search is made in an inner loop to locate the next CR/LF sequence. The byte after this sequence marks the start of the key of the record being compared. If this key is greater than or equal to the one we are searching for, it is ignored. Only if the found key is smaller than the one we are searching for is its position treated as a possible candidate for the record just before the one we want. We end up with the last record whose key is smaller than the searched key.
In the end, all log records following the final candidate are returned to the caller as a string array.
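The function requires the import of System.IO; SequenceEqual additionally needs System.Linq, which VB projects usually import at project level.
Imports System.IO
Imports System.Linq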
'This function expects a log file which is organized in lines of varying
'lengths, delimited by CR/LF. At the start of each line is a sort criterion
'of any kind (in log files typically YYMMDD HHMMSS), by which the lines are
'sorted in ascending order (newest log line at the end of the file). The
'earliest match allowed to be returned must be provided. From this the sort
'key's length is inferred. The key need not exist in the file. If it does,
'it can occur multiple times, like all other sort keys. The returned string
'array contains all lines that follow the last one found to be smaller
'than the provided sort key.
Public Shared Function ExtractLogLines(sLogFile As String,
sEarliest As String) As String()
Dim oFS As New FileStream(sLogFile, FileMode.Open, FileAccess.Read,
FileShare.Read) 'The log file as file stream.
Dim lMin, lPos, lMax As Long 'Examined stream window.
Dim i As Long 'Iterator to find CR/LF.
Dim abEOL(0 To 1) As Byte 'Bytes to find CR/LF.
Dim abCRLF() As Byte = {13, 10} 'Search for CR/LF.
Dim bFound As Boolean 'CR/LF found.
Dim iKeyLen As Integer = sEarliest.Length 'Length of sort key.
Dim sActKey As String 'Key of examined log record.
Dim abKey() As Byte 'Reading the current key.
Dim lCandidate As Long 'File position of promising candidate.
Dim sRecords As String 'All wanted records.
'The byte array accepting the records' keys is as long as the provided
'key.
ReDim abKey(0 To iKeyLen - 1) '0-based!
'We search for the last log line whose sort key is smaller than the sort
'key provided in sEarliest.
lMin = 0 'Start at stream start
lMax = oFS.Length - 1 - 2 '0-based, and without terminal CRLF.
Do
lPos = (lMax - lMin) \ 2 + lMin 'Position to examine now.
'Although the key to be compared with sEarliest is located after
'lPos, it is important, that lPos itself is not modified when
'searching for the key.
i = lPos 'Iterator for the CR/LF search.
bFound = False
Do While i < lMax
oFS.Seek(i, SeekOrigin.Begin)
oFS.Read(abEOL, 0, 2)
If abEOL.SequenceEqual(abCRLF) Then 'CR/LF found.
bFound = True
Exit Do
End If
i += 1
Loop
If Not bFound Then
'Between lPos and lMax no more CR/LF could be found. This means,
'that the search is over.
Exit Do
End If
i += 2 'Skip CR/LF.
oFS.Seek(i, SeekOrigin.Begin) 'Read the key after the CR/LF
oFS.Read(abKey, 0, iKeyLen) 'into a string.
sActKey = System.Text.Encoding.UTF8.GetString(abKey)
'Compare the actual key with the earliest key. We want to find the
'largest key just before the earliest key.
If sActKey >= sEarliest Then
'Not interested in this one, look for an earlier key.
lMax = lPos
Else
'Possibly interesting, remember this.
lCandidate = i
lMin = lPos
End If
Loop While lMin < lMax - 1
'lCandidate is the position of the first record to be taken into account.
'Note, that we need the final CR/LF here, so that the search for the
'next CR/LF sequence following below will match a valid first entry even
'in case there are no entries to be returned (sEarliest being larger than
'the last log line).
ReDim abKey(CInt(oFS.Length - lCandidate - 1)) '0-based.
oFS.Seek(lCandidate, SeekOrigin.Begin)
oFS.Read(abKey, 0, CInt(oFS.Length - lCandidate))
'We're done with the stream.
oFS.Close()
'Convert into a string, but omit the first line, then return as a
'string array split at CR/LF, without the empty last entry.
sRecords = (System.Text.Encoding.UTF8.GetString(abKey))
sRecords = sRecords.Substring(sRecords.IndexOf(Chr(10)) + 1)
Return sRecords.Split(ControlChars.CrLf.ToCharArray(),
StringSplitOptions.RemoveEmptyEntries)
End Function
I've been trying for 2 weeks to decompress this user-defined TXXX string from an MP3 file's ID3v2.3 tag.
000000B0789C6330377433D63534D575F3F737B570343767B02929CA2C4B2D4BCD2B29B6B301D376367989B9A976C519F9E50ACE1989452536FA60019B924C20696800017A10CA461F2C6AA30FD58A61427E5E72AA42228A114666E6F88CD047721100D5923799
Thanks to Dr. Adler for the correct answer when I converted the values to a string.
I have tried both MS DeflateStream and GZipStream with no success.
Every example I see uses a file stream. I am not using a file; I have the above zlib data in both an array and a string variable.
GZipStream gives me 'no magic number' and DeflateStream gives me 'Block length does not match with its complement'.
I read this post:
http://george.chiramattel.com/blog/2007/09/deflatestream-block-length-does-not-match.html
I tried removing bytes from the head, but no luck. (I have read trazillions of articles about sending a string to DeflateStream, but again, no luck!)
I have the above string, so how do I send it to DeflateStream? I'd post the two hundred different code examples I tried, but that would be silly.
The funny thing is, I built my webAudio cue marker editor in less than two weeks, and this is the last thing I need it to do. My program must get the marker positions from a program that has the worst audio editor known to man (they embedded them in the MP3 for some (bad) reason). Hence, I wrote my own to change audio cue markers so I could save hours of frustration at work. However, I'm not getting much sleep lately.
Help me get some sleep, please.
You can use a MemoryStream instead of a FileStream as they are both Streams:
Imports System.Collections.Generic
Imports System.IO
Imports System.IO.Compression
Imports System.Linq
Imports System.Text
Module Module1
Function HexStringToBytes(s As String) As Byte()
If (s.Length And 1) = 1 Then
Throw New ArgumentException("String is an odd number of characters in length - it must be even.")
End If
Dim bb As New List(Of Byte)
For i = 0 To s.Length - 1 Step 2
bb.Add(Convert.ToByte(s.Substring(i, 2), 16))
Next
Return bb.ToArray()
End Function
Sub Main()
Dim s = "000000B0789C6330377433D63534D575F3F737B570343767B02929CA2C4B2D4BCD2B29B6B301D376367989B9A976C519F9E50ACE1989452536FA60019B924C20696800017A10CA461F2C6AA30FD58A61427E5E72AA42228A114666E6F88CD047721100D5923799"
Dim result As String = ""
' trim off the leading zero bytes and skip the three bytes 0xB0 0x78 0x9C
Dim buffer = HexStringToBytes(s).SkipWhile(Function(b) b = 0).Skip(3).ToArray()
Using ms As New MemoryStream(buffer)
Using decompressedMemoryStream As New MemoryStream
Using decompressionStream As New DeflateStream(ms, CompressionMode.Decompress)
decompressionStream.CopyTo(decompressedMemoryStream)
result = Encoding.Default.GetString((decompressedMemoryStream.ToArray()))
End Using
End Using
End Using
Console.WriteLine(result)
Console.ReadLine()
End Sub
End Module
Outputs:
71F3-15-FOO58A77 <trivevents><event><name>show Chart</name><time>10000000.000000</time></event><event><name>show once a</name><time>26700000.000000</time></event></trivevents>
(There is a leading zero byte.)
P.S. It looks a bit strange that there is 71F3-15-FOO58A77 with letter Os instead of zeros.
P.P.S. If you could get the compressed data into a Base64 string instead of a hex string, you could pack more data into the same space.
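The bytes 78 9C are the two-byte zlib header (RFC 1950), which is why DeflateStream, expecting raw deflate data, rejected the unmodified input. On .NET 6 or later, ZLibStream reads that header itself, so only the prefix has to be skipped. The following is a minimal sketch under two assumptions: that the runtime is .NET 6 or later, and that the first four bytes (00 00 00 B0) are a length field rather than part of the compressed data. The name InflateFrame is made up for illustration.

```vbnet
Imports System
Imports System.IO
Imports System.IO.Compression

Module ZlibSketch
    'Assumption: frame = 4-byte prefix (probably a length) + zlib stream.
    'Requires .NET 6 or later for ZLibStream.
    Function InflateFrame(frame As Byte()) As Byte()
        'Wrap everything after the 4-byte prefix in a read-only stream.
        Using input As New MemoryStream(frame, 4, frame.Length - 4)
            Using zlib As New ZLibStream(input, CompressionMode.Decompress)
                Using output As New MemoryStream()
                    zlib.CopyTo(output)
                    Return output.ToArray()
                End Using
            End Using
        End Using
    End Function
End Module
```

With the helper from the answer above, it could be called as InflateFrame(HexStringToBytes(s)); the resulting bytes can then be turned into text with an Encoding of your choice.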
I have a System.Net.Sockets.TcpClient. I can make a connection and write packets to it accurately. The problem is that I want to read the packets as they are (as caught by sniffers, in hex form like 0F 03 56 56 etc.).
I tried looking at examples of GetStream.Write, but I fail to read the packets that way. I also tried using a StreamReader and then converting the packets to hex, but the thing I connect to sends screwed-up packets which can't be converted or are simply not in string form.
I hope I'm clear enough.
Dim bytes(tcpClient.ReceiveBufferSize) As Byte
' Read can return anything from 0 to numBytesToRead.
' This method blocks until at least one byte is read.
netStream.Read(bytes, 0, CInt(tcpClient.ReceiveBufferSize))
' Returns the data received from the host to the console.
Dim returndata As String = Encoding.ASCII.GetString(bytes)
Found the solution! Here it is:
Dim Bytes() as Byte = New Byte(1024){}
Array.Resize(Bytes, TcpClient.Client.Receive(Bytes, SocketFlags.None))
For Each B As Byte In Bytes
Debug.Print("Byte in HEX Format: " & B.ToString("X2"))
Next
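To get the whole packet on one line in the sniffer-style format from the question, BitConverter.ToString can do the formatting. A small sketch; the helper name ToHexLine is made up, and count is expected to be the number of bytes actually received, not the buffer size.

```vbnet
Imports System

Module HexDump
    'Formats the first count bytes of buffer as "0F 03 56 56".
    Function ToHexLine(buffer As Byte(), count As Integer) As String
        'BitConverter.ToString yields "0F-03-56-56"; swap the dashes for spaces.
        Return BitConverter.ToString(buffer, 0, count).Replace("-", " ")
    End Function
End Module
```

With a NetworkStream this might be used as Dim n = netStream.Read(bytes, 0, bytes.Length) followed by Debug.Print(ToHexLine(bytes, n)), so that only the bytes actually read are printed.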
So we can re-read that again.
Say I did:
Dim offset = sr.BaseStream.Position
Dim l As String = sr.ReadLine()
Dim nextOffset = sr.BaseStream.Position
Now nextOffset will automatically become 1024, even though the length of l is only 62. I understand that the stream reads characters 1 KB at a time, so I suppose there is 1 KB of data in the buffer. I guess I will need to find the offset within that buffer. How do I find that?
Also, knowing the offset, can we call ReadLine starting from that offset later?
Basically, in the future, I want to do:
sr2.BaseStream.Position = offset1
Dim l2 = sr2.ReadLine
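One way around the StreamReader's internal buffer is to read the bytes directly from the underlying stream and keep track of the offsets yourself. The following is only a sketch under stated assumptions: the text is UTF-8 (or ASCII), lines end in LF or CR/LF, and the stream is seekable. The name ReadLineAt is made up for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text

Module LineOffsets
    'Reads one line starting at byte position offset and reports, via
    'nextOffset, the byte position at which the following line starts.
    'Returns Nothing at the end of the stream.
    'Assumptions: UTF-8 text, LF or CR/LF line ends, seekable stream.
    Function ReadLineAt(fs As Stream, offset As Long, ByRef nextOffset As Long) As String
        fs.Position = offset
        Dim lineBytes As New List(Of Byte)
        Dim b As Integer = fs.ReadByte()
        If b = -1 Then
            nextOffset = offset
            Return Nothing
        End If
        'Collect bytes up to, but not including, the next LF.
        While b <> -1 AndAlso b <> 10
            lineBytes.Add(CByte(b))
            b = fs.ReadByte()
        End While
        nextOffset = fs.Position
        'Drop a CR left over from a CR/LF line end.
        If lineBytes.Count > 0 AndAlso lineBytes(lineBytes.Count - 1) = 13 Then
            lineBytes.RemoveAt(lineBytes.Count - 1)
        End If
        Return Encoding.UTF8.GetString(lineBytes.ToArray())
    End Function
End Module
```

The value returned in nextOffset can be stored and passed back in later as offset, which corresponds to setting BaseStream.Position and calling ReadLine in the question.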