GZipStream cuts off data from original file when decompressing - vb.net

For a long time I have been trying to work out why my parse counts were off after downloading files to be parsed, and it has made me look really dumb. After some debugging I found that when I decompress a downloaded file with GZipStream, the result is missing data from the original file. Here is my code for decompressing:
Using originalFileStream As FileStream = fileItem.OpenRead()
Dim currentFileName As String = fileItem.FullName
Dim newFileName = currentFileName.Remove(currentFileName.Length - fileItem.Extension.Length)
newFile = newFileName
Using decompressedFileStream As FileStream = File.Create(newFileName)
Using decompressionStream As GZipStream = New GZipStream(originalFileStream, CompressionMode.Decompress)
decompressionStream.CopyTo(decompressedFileStream)
Console.WriteLine("Decompressed: {0}", fileItem.Name)
decompressionStream.Close()
originalFileStream.Close()
End Using
End Using
End Using
I then return newFile to the calling function and read its contents there:
Dim responseData As String = inputFile.ReadToEnd
If I paste the URL into a browser, download the file there, and open it with WinRAR, I can see that the data is not the same. This does not happen every time; some files decompress and parse correctly. Each downloaded file has a check counter stating how many posts I should be parsing from it, and that is what alerted me to the mismatch in counts.
EDIT
Here is what I found in addition. If I read the problem file line by line (as I said, only some files behave this way), I get all the data:
Dim rData As String = inputFile.ReadLine
If Not rData = "" Then
While Not inputFile.EndOfStream
rData = inputFile.ReadLine + vbNewLine + rData
End While
getIndividualPosts(rData)
End If
If I try to read an individual line from a file that is not problematic, it returns nothing, so I have to use ReadToEnd. Can anyone explain this odd behavior? Is it related to GZipStream or to some error in my code?
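One way to narrow this down is to check whether the decompressed file on disk is already complete before any StreamReader touches it. The sketch below is a self-contained round trip under stated assumptions: the module name, the helper names Compress and DecompressFile, and the sample file name are all hypothetical, and the test data is generated rather than downloaded. It does not reproduce the problem described above; it only shows a way to compare what went in with what came out.

```vbnet
Imports System
Imports System.IO
Imports System.IO.Compression
Imports System.Linq
Imports System.Text

Module GZipRoundTrip
    ' Compresses text into a gzip byte array (helper for the self-check only).
    Function Compress(text As String) As Byte()
        Using buffer As New MemoryStream()
            Using gz As New GZipStream(buffer, CompressionMode.Compress)
                Dim bytes = Encoding.UTF8.GetBytes(text)
                gz.Write(bytes, 0, bytes.Length)
            End Using ' disposing the GZipStream flushes the remaining compressed data
            Return buffer.ToArray() ' ToArray still works after the stream is closed
        End Using
    End Function

    ' Decompresses a .gz file next to itself and returns the new file's path.
    Function DecompressFile(gzPath As String) As String
        Dim outPath = Path.ChangeExtension(gzPath, Nothing) ' strips ".gz"
        Using source As FileStream = File.OpenRead(gzPath)
            Using target As FileStream = File.Create(outPath)
                Using gz As New GZipStream(source, CompressionMode.Decompress)
                    gz.CopyTo(target)
                End Using
            End Using
        End Using
        Return outPath
    End Function

    Sub Main()
        ' Generated sample data (an assumption, standing in for a downloaded file).
        Dim lines = Enumerable.Range(1, 1000).Select(Function(i) "post " & i.ToString())
        Dim original = String.Join(Environment.NewLine, lines)

        Dim gzPath = Path.Combine(Path.GetTempPath(), "roundtrip-sample.txt.gz")
        File.WriteAllBytes(gzPath, Compress(original))

        ' Read the result only after every stream has been closed.
        Dim restored = File.ReadAllText(DecompressFile(gzPath))
        Console.WriteLine("Lengths match: {0}", restored.Length = original.Length)
    End Sub
End Module
```

If a check like this passes for a problem file but the parsed counts are still off, the difference is more likely in how the file is read afterwards than in the decompression step.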

Related

Remove double quotes in the content of text files

I am using a legacy application where all the source code is in vb.net. I am checking if the file exists and if the condition is true replace all the " in the contents of the file. For instance "text" to be replaced as text. I am using the below code.
vb.net
Dim FileFullPath As String
FileFullPath = "\\Fileshare\text\sample.txt"
If File.Exists(FileFullPath) Then
Dim stripquote As String = FileFullPath
stripquote = stripquote.Replace("""", "").Trim()
Else
'
End If
I get no errors, but the " characters are not being removed from the content of the file.
Data:
ID, Date, Phone, Comments
1,05/13/2021,"123-000-1234","text1"
2,05/13/2021,"123-000-2345","text2"
3,05/13/2021,"123-000-3456","text2"
Output:
1,05/13/2021,123-000-1234,text1
2,05/13/2021,123-000-2345,text2
3,05/13/2021,123-000-3456,text2
You can read each line of the file, remove the double-quotes, write that to a temporary file, then when all the lines are done delete the original and move/rename the temporary file as the filename:
Imports System.IO
'...
Sub RemoveDoubleQuotes(filename As String)
Dim tmpFilename = Path.GetTempFileName()
Using sr As New StreamReader(filename)
Using sw As New StreamWriter(tmpFilename)
While Not sr.EndOfStream
sw.WriteLine(sr.ReadLine().Replace("""", ""))
End While
End Using
End Using
File.Delete(filename)
File.Move(tmpFilename, filename)
End Sub
Add error handling as desired.
The best way to go about this depends on the potential size of the file. If the file is relatively small then there's no point processing it line by line and certainly not using a TextFieldParser. Just read the data in, process it and write it out:
File.WriteAllText(FileFullPath,
File.ReadAllText(FileFullPath).
Replace(ControlChars.Quote, String.Empty))
Only if the file is potentially large and reading it all in one go would require too much memory should you consider processing it line by line. In that case, I'd go this way:
'Let the system create a temp file.
Dim tempFilePath = Path.GetTempFileName()
'Open the temp file for writing text.
Using tempFile As New StreamWriter(tempFilePath)
'Open the source file and read it line by line.
For Each line In File.ReadLines(FileFullPath)
'Remove double-quotes from the current line and write the result to the temp file.
tempFile.WriteLine(line.Replace(ControlChars.Quote, String.Empty))
Next
End Using
'Overwrite the source file with the temp file.
File.Move(tempFilePath, FileFullPath, True)
Note the use of File.ReadLines rather than File.ReadAllLines. The former will only read one line at a time where the latter reads every line before you can process any of them.
EDIT:
Note that this:
File.Move(tempFilePath, FileFullPath, True)
only works in .NET Core 3.0 and later, including .NET 5.0. If you're targeting .NET Framework then you have three other options:
Delete the original file (File.Delete) and then move the temp file (File.Move).
Copy the temp file (File.Copy) and then delete the temp file (File.Delete).
Call My.Computer.FileSystem.MoveFile to move the temp file and overwrite the original file in one go.
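As a rough sketch of the second option in the list above (copy, then delete), assuming a hypothetical helper named OverwriteWithTemp and throwaway demo files in the temp folder:

```vbnet
Imports System
Imports System.IO

Module ReplaceFileSketch
    ' Hypothetical helper: replaces destPath with tempPath without using the
    ' three-argument File.Move overload.
    Sub OverwriteWithTemp(tempPath As String, destPath As String)
        ' Copy over the original (True = overwrite), then remove the temp file.
        File.Copy(tempPath, destPath, True)
        File.Delete(tempPath)
    End Sub

    Sub Main()
        ' Demo files only; the names are assumptions for illustration.
        Dim destPath = Path.Combine(Path.GetTempPath(), "overwrite-demo.txt")
        Dim tempPath = Path.GetTempFileName()
        File.WriteAllText(destPath, "old")
        File.WriteAllText(tempPath, "new")

        OverwriteWithTemp(tempPath, destPath)
        Console.WriteLine(File.ReadAllText(destPath))
    End Sub
End Module
```

Copying before deleting means the original is only replaced once the new content is fully available, at the cost of writing the data twice.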
TextFieldParser is probably the way to go.
Your code with a few changes.
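'A string containing one double-quote character.
Static doubleQ As String = New String(ControlChars.Quote, 1)
Dim FileFullPath As String
FileFullPath = "\\Fileshare\text\sample.txt"
If IO.File.Exists(FileFullPath) Then
    'Read the file's contents rather than working on the path.
    Dim stripquote As String = IO.File.ReadAllText(FileFullPath)
    stripquote = stripquote.Replace(doubleQ, "").Trim()
    'Write the result back so the file itself changes.
    IO.File.WriteAllText(FileFullPath, stripquote)
Else
    '
End If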
Note the static declaration. I adopted this approach because it confused the heck out of me.

Add a path to a code VB.net / visual basic

How do I add a path to the code where "HERE_HAS_TO_BE_A_PATH" is? When I do, I'm getting an error message. The goal is to be able to specify the path where the final text file is saved.
Thanks!
Here is a code:
Dim newFile As IO.StreamWriter = IO.File.CreateText("HERE_HAS_TO_BE_A_PATH")
Dim fix As String
fix = My.Computer.FileSystem.ReadAllText("C:\test.txt")
fix = Replace(fix, ",", ".")
My.Computer.FileSystem.WriteAllText("C:\test.txt", fix, False)
Dim query = From data In IO.File.ReadAllLines("C:\test.txt")
Let name As String = data.Split(" ")(0)
Let x As Decimal = data.Split(" ")(1)
Let y As Decimal = data.Split(" ")(2)
Let z As Decimal = data.Split(" ")(3)
Select name & " " & x & "," & y & "," & z
For i As Integer = 0 To query.Count - 1
newFile.WriteLine(query(i))
Next
newFile.Close()
1) Use a literal string:
The easiest way is replacing "HERE_HAS_TO_BE_A_PATH" with the literal path to desired output target, so overwriting it with "C:\output.txt":
Dim newFile As IO.StreamWriter = IO.File.CreateText("C:\output.txt")
2) Check permissions and read/write file references are correct:
There are a few reasons why you might be having difficulties. If you're trying to read and write in the root C:\ directory, you might be having permissions issues.
Also, go line by line to make sure that the input and output files are correct every time you are using one or the other.
3) Make sure the implicit path is correct for non-fully qualified paths:
Next, in case you're using a relative path: when you test run the program, it doesn't actually run from the project folder but from a subfolder, "\bin\Debug". So for a project named [ProjectName], it compiles into this folder by default:
C:\path\to\[ProjectName]\bin\Debug\Program.exe
In other words, if you are trying to type in a path name as a string to save the file to and you don't specify the full path name starting from the C:\ drive, like "output.txt" instead of "C:\output.txt", it's saving it here:
C:\path\to\[ProjectName]\bin\Debug\output.txt
To find out exactly what paths it's defaulting to, in .Net Framework you can check against these:
Application.ExecutablePath
Application.StartupPath
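For a console project without a Windows Forms reference, a similar check can be sketched with Path.GetFullPath and AppDomain.CurrentDomain.BaseDirectory. The file name "output.txt" is only the example name used above, and the module name is hypothetical.

```vbnet
Imports System
Imports System.IO

Module PathResolutionSketch
    Sub Main()
        ' A relative name is resolved against the current working directory.
        Console.WriteLine(Path.GetFullPath("output.txt"))

        ' Folder that contains the running executable.
        Console.WriteLine(AppDomain.CurrentDomain.BaseDirectory)

        ' Building an absolute path explicitly avoids relying on either default.
        Dim target = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "output.txt")
        Console.WriteLine(target)
    End Sub
End Module
```

Printing these once at startup usually makes it obvious where a "missing" output file actually went.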
4) Get user input via SaveFileDialog
In addition to a literal string ("C:\output.txt"), you may want the user to provide input. Since it looks like you're using .Net Framework (as opposed to .Net Core, etc.), the easiest way to set a file name to use in your program is the built-in SaveFileDialog object in System.Windows.Forms (the dialog you see whenever you save a file in most programs). You can do so really quickly like so:
Dim SFD As New SaveFileDialog
SFD.Filter = "Text Files|*.txt"
' Continue only if the user confirmed the dialog; FileName is empty on cancel
If SFD.ShowDialog() = DialogResult.OK Then
    ' For reuse, storing file path to string
    Dim myFilePath As String = SFD.FileName
    Dim newFile As IO.StreamWriter = IO.File.CreateText(myFilePath) ' path var
    ' Do the rest of your code here
    newFile.Close()
End If
5) Get user input via console
In case you ever want to get a path in .Net Core, i.e. with a console, the Main procedure can accept a String array called args(). Here's a different version that lets the user pass a path as the first parameter when running the program; if one is not provided, it asks the user for input:
Console.WriteLine("Hello World!")
Dim myFilePath = ""
If args.Length > 0 Then
myFilePath = args(0)
End If
If myFilePath = "" Then
Console.WriteLine("No file name provided, please input file name:")
While (myFilePath = "")
Console.Write("File and Path: ")
myFilePath = Console.ReadLine()
End While
End If
Dim newFile As IO.StreamWriter = IO.File.CreateText(myFilePath) ' path var
' Do the rest of your code here
newFile.Close()
6) Best practices: Close & Dispose vs. Using Blocks
In order to keep the code as similar to yours as possible, I tried to change only the pieces that needed changing. Vikyath Rao and Mary respectively pointed out a simplified way to declare it as well as a common best practice.
For more information, check out these helpful explanations:
Can any one explain why StreamWriter is an Unmanaged Resource. and
Should I call Close() or Dispose() for stream objects?
In summary, although streams are managed objects that are eventually garbage collected, they hold unmanaged file-system resources, which is the primary reason why it's a good idea to dispose of them manually. Your ".Close()" does this: for both the StreamReader and StreamWriter classes, Close calls the ".Dispose()" method. It is still common practice to use a Using .. End Using block to avoid "running with scissors", as Enigmativity puts it in his post. In other words, the block makes sure that you don't go off somewhere else in the program and forget to dispose of the open file stream.
Within your program, you could simply replace the "Dim newFile As IO.StreamWriter = IO.File.CreateText("C:\output.txt")" and "newFile.close()" lines with the opening and closing statements for the Using block while using the simplified syntax, like so:
'Dim newFile As IO.StreamWriter = IO.File.CreateText(myFilePath) ' old
Using newFile As New IO.StreamWriter(myFilePath) ' new
Dim fix As String = "Text from somewhere!"
newFile.WriteLine(fix)
' other similar operations here
End Using ' new -- ensures disposal
'newFile.Close() ' old
You can write it this way. The StreamWriter automatically creates the file.
Dim newFile As New StreamWriter("HERE_HAS_TO_BE_A_PATH")
PS: I cannot mention all of this in the comment section as I have less than 50 reputation, so I wrote an answer. Please feel free to tell me if it's wrong.
regards,
vikyath

Closing an XML file after reading so it can be deleted

I have a problem deleting an XML file after loading it into an XmlDocument.
My code parses the XML file for specific nodes and assigns their values to variables.
Once complete, the code processes data based on the values from the XML file.
This works fine until the end, when I try to delete the XML file: it is still open, and I get the error "The process cannot access the file because it is being used by another process", which I guess is caused by the XmlDocument reader.
Here is a section of the XML processing code - this works fine.
Dim xmlDoc As XmlDocument = New XmlDocument()
xmlDoc.Load(strFileName)
intPassed = xmlDoc.SelectSingleNode("//CalibrationPassed").InnerText
boolCheck = xmlDoc.SelectSingleNode("//ChecksComplete").InnerText
intCertRequired = xmlDoc.SelectSingleNode("//Schedule").InnerText
Console.WriteLine("Calibration Passed: " & intPassed)
Console.WriteLine("Checks Complete:" & boolCheck)
Console.WriteLine("Schedule: " & intCertRequired)
strFirstName = xmlDoc.SelectSingleNode("//FirstName").InnerText
strEMail = xmlDoc.SelectSingleNode("//Email").InnerText
strCusEmail = xmlDoc.SelectSingleNode("//CustomerEmail").InnerText
strCompanyName = xmlDoc.SelectSingleNode("//CompanyName").InnerText
strContractNumber = xmlDoc.SelectSingleNode("//ContractNo").InnerText
Console.WriteLine("First name: " & strFirstName)
Console.WriteLine("Email: " & strEMail)
Console.WriteLine("Customer EMail: " & strCusEmail)
Console.WriteLine("Company name: " & strCompanyName)
Console.WriteLine("Contract no: " & strContractNumber)
Console.WriteLine("XML Parsing Complete")
The code being used to delete the file is:
If System.IO.File.Exists(strFileName) = True Then
System.IO.File.Delete(strFileName)
Console.WriteLine("Deleted XML file")
End If
Any help on where I'm going wrong would be gratefully received.
Thanks
XmlDocument.Load uses a stream reader under the hood. There are two strategies for avoiding this:
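1) A Using block will close/dispose your stream automatically and promptly
Dim xmlDoc As XmlDocument = New XmlDocument()
'XmlDocument itself is not disposable, so wrap the reader instead
Using reader As XmlReader = XmlReader.Create(strFileName)
    xmlDoc.Load(reader)
End Using
'all of your copying stuff
'now delete your file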
2) Load your XML and avoid using a reader:
Dim strXml as string
strXml = System.IO.File.ReadAllText(strFileName)
Dim xmlDoc As XmlDocument = New XmlDocument()
xmlDoc.LoadXml(strXml)
'and then the rest of your code
The downside to the 2nd approach is that my example doesn't consider any other encoding, but it should get you past your current problem. Dealing with various encoding options is a whole different matter.
If you're not using an XmlReader, then try this:
Dim xmlDoc = New XmlDocument()
xmlDoc.Load(strFileName)
'do all the reading stuff
Using writer = New StreamWriter(strFileName)
    xmlDoc.Save(writer)
End Using
This saves your xmlDoc and disposes the writer, which should unlock the document so that it can then be deleted.
I haven't tested the code, but give it a go.
Thanks for all the help. The file was being held open by a StreamReader, which I had not considered as a cause of the problem because I assumed it would have caused an error when XmlDocument used the file.
This worked for me: setting the reference to Nothing (null) so that the document can be garbage collected.
xmlDoc = Nothing
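Since the cause reported above was a StreamReader left open, here is a minimal sketch of reading through a StreamReader inside a Using block and deleting the file afterwards. The file name and XML content are made up for the demo, and the module name is hypothetical.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module DeleteAfterReadSketch
    Sub Main()
        ' Demo file only; name and content are assumptions for illustration.
        Dim strFileName = Path.Combine(Path.GetTempPath(), "calibration-demo.xml")
        File.WriteAllText(strFileName, "<Root><FirstName>Ann</FirstName></Root>")

        Dim xmlDoc As New XmlDocument()
        ' The reader is closed at End Using, which releases the file handle.
        Using reader As New StreamReader(strFileName)
            xmlDoc.Load(reader)
        End Using

        ' The document lives in memory now, so the file is no longer needed.
        Console.WriteLine("First name: " & xmlDoc.SelectSingleNode("//FirstName").InnerText)

        File.Delete(strFileName)
        Dim deleted As Boolean = Not File.Exists(strFileName)
        Console.WriteLine("Deleted: {0}", deleted)
    End Sub
End Module
```

The same pattern applies to any other reader or stream opened on the file earlier in the program: each one needs to be closed before File.Delete is called.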

Illegal Characters in Path Error When Downloading CSV File

I need to download a CSV file and then read it. Here is my code:
tickerValue = "goog"
Dim strURL As String = "http://ichart.yahoo.com/table.csv?s=" & tickerValue
Dim strBuffer As String = RequestWebData(strURL)
Using streamReader = New StreamReader(strBuffer)
Using reader = New CsvReader(streamReader)
I keep getting this error: An unhandled exception of type 'System.ArgumentException' occurred in mscorlib.dll Additional information: Illegal characters in path.
What am I doing wrong?
Additional Info
In another part of my program I use this code and it works fine.
Address = "http://www.nasdaq.com/screening/companies-by-industry.aspx?exchange=AMEX&render=download"
Dim strBuffer As String = Historical_Stock_Prices.RequestWebData(Address)
Using streamReader = New StringReader(strBuffer)
Using reader = New CsvReader(streamReader)
Isn't my second code the same concept as my problem code?
You are giving it, essentially, a web URL. Somewhere in your code, something does not support the web URL. It could be the StreamReader; it could be the CsvReader.
Which line of code does the error point to?
The best bet is to save the file to disk, then read it from disk.
UPDATE
Here is an example to save to disk:
Using writer As New StreamWriter("C:\Test.csv")
    writer.Write(strBuffer)
End Using
Here is an example to read from disk:
Using strReader As New StreamReader("C:\Test.csv")
    ' this code is presumably how it works for reading into the CsvReader:
    Using reader As New CsvReader(strReader)
        ' now do your thing
    End Using
End Using
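The working snippet in the question uses StringReader, while the failing one uses StreamReader. As a small sketch of the difference, assuming strBuffer holds the downloaded text itself (the sample data and module name below are made up, and CsvReader is left out so the example has no external dependency):

```vbnet
Imports System
Imports System.IO

Module StringReaderSketch
    Sub Main()
        ' Stand-in for the text returned by RequestWebData (hypothetical sample data).
        Dim strBuffer = "Date,Close" & Environment.NewLine &
                        "2000-01-01,1.00" & Environment.NewLine &
                        "2000-01-02,2.00"

        ' StringReader reads from the string itself. New StreamReader(strBuffer)
        ' would instead treat the string as a file path.
        Using textReader As New StringReader(strBuffer)
            Dim line = textReader.ReadLine()
            While line IsNot Nothing
                Console.WriteLine(String.Join(" | ", line.Split(","c)))
                line = textReader.ReadLine()
            End While
        End Using
    End Sub
End Module
```

Both classes derive from TextReader, so code that accepts a TextReader can usually take either one without going through a file on disk.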

Stream Reader and Writer Conflict

I am making a class to help save some strings to a local text file (I want to append them to the file rather than overwrite it, so that it works as a log file). When I write with the StreamWriter and use a StreamReader to find the end of the previous text, I get the error "the file is not available as it is being used by another process". I looked into this problem on MSDN and found very little help. To eliminate some variables I removed the StreamReader to check whether it was the problem, and it was: writing to the file then worked and I got no error. So I concluded that the problem arises from the StreamReader, but I could not figure out why.
Here is the code:
Public Sub SaveFile(ByVal Task As String, ByVal Difficulty As Integer, ByVal Time_Taken As String)
Dim SW As String = "C:/Program Files/Business Elements/Dashboard System Files/UserWorkEthic.txt"
Dim i As Integer
Dim aryText(3) As String
aryText(0) = Task
aryText(1) = Difficulty
aryText(2) = Time_Taken
Dim objWriter As System.IO.StreamWriter = New System.IO.StreamWriter(SW, True)
Dim reader As System.IO.StreamReader = New System.IO.StreamReader(SW, True)
reader.ReadToEnd()
reader.EndOfStream.ToString()
For i = 0 To 3
objWriter.WriteLine(aryText(reader.EndOfStream + i))
Next
reader.Close()
objWriter.Close()
End Sub
As Joel has commented on the previous answer, it is possible to change the type of locking.
Otherwise, building on what Neil has suggested: if you open the file with a new writer, it is difficult not to lose the information already within the file.
I would suggest you rename the original file to a temporary name, "UserWorkEthicTEMP.txt" for example, and create a new text file with the original name. Then read a line and write a line between the two files, before adding your new data onto the end. Finally, delete the temporary file and you will have the new file with the new details. If you have an error, the temporary file will serve as a backup of the original. Some sample code below:
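'Change the file names first: the reader opens the renamed (temporary) file,
'and objWriter opens a new file with the original name.
Dim line As String
line = reader.ReadLine()
Do Until line Is Nothing
    objWriter.WriteLine(line)
    line = reader.ReadLine()
Loop
'Add the new values on the end with objWriter, close both streams,
'and then remove the old (temporary) file.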
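Put together, the rename-and-copy approach described above might look like the following sketch. The helper name AppendViaTemp, the ".tmp" suffix, and the demo file are assumptions for illustration, not part of the original code.

```vbnet
Imports System
Imports System.IO

Module RenameAndCopySketch
    ' Hypothetical helper: appends newLines to filePath by copying the old
    ' content through a temporary file.
    Sub AppendViaTemp(filePath As String, newLines As String())
        Dim tempPath = filePath & ".tmp"
        If File.Exists(tempPath) Then File.Delete(tempPath)
        File.Move(filePath, tempPath) ' the renamed original doubles as a backup

        Using reader As New StreamReader(tempPath)
            Using writer As New StreamWriter(filePath)
                ' Copy the existing lines across, one at a time.
                Dim line = reader.ReadLine()
                Do Until line Is Nothing
                    writer.WriteLine(line)
                    line = reader.ReadLine()
                Loop
                ' Then add the new values on the end.
                For Each item In newLines
                    writer.WriteLine(item)
                Next
            End Using
        End Using

        File.Delete(tempPath) ' only reached if the copy succeeded
    End Sub

    Sub Main()
        Dim logPath = Path.Combine(Path.GetTempPath(), "workethic-demo.txt")
        File.WriteAllLines(logPath, {"first entry"})
        AppendViaTemp(logPath, {"Task A", "3", "00:10"})
        Console.WriteLine(File.ReadAllText(logPath))
    End Sub
End Module
```

Because the reader and the writer work on two different files, neither one blocks the other.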
You are trying to read and write to the same file at the same time, and this is causing lock contention. Store the contents of the file in a variable, close the reader, and then write the contents back out to the file together with your new data.
Pseudocode:
Reader.Open file
String content = Reader.ReadToEnd()
Reader.Close
Writer.Open file
Loop
Writer.Write newContent
Writer.Close
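The pseudocode above could be sketched in VB.NET roughly as follows. The helper name AppendByRewrite and the demo file are hypothetical; the key point is that the reader is closed before the writer is opened.

```vbnet
Imports System
Imports System.IO

Module ReadThenWriteSketch
    ' Hypothetical helper: reads everything, closes the reader, then rewrites
    ' the file with the old content followed by the new line.
    Sub AppendByRewrite(filePath As String, newContent As String)
        Dim content As String
        Using reader As New StreamReader(filePath)
            content = reader.ReadToEnd()
        End Using ' the reader is closed here, before any writer is opened

        Using writer As New StreamWriter(filePath, False) ' False = overwrite
            writer.Write(content)
            writer.WriteLine(newContent)
        End Using
    End Sub

    Sub Main()
        Dim logPath = Path.Combine(Path.GetTempPath(), "rewrite-demo.txt")
        File.WriteAllText(logPath, "first entry" & Environment.NewLine)
        AppendByRewrite(logPath, "second entry")
        Console.WriteLine(File.ReadAllText(logPath))
    End Sub
End Module
```

If the only goal is to add lines to the end of a log, opening the writer in append mode (New StreamWriter(filePath, True), as the question already does) places new text after the existing content, so no reader is needed at all.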