I'd like to know the last-modified date of a remote file (identified by a URL), and only download it if it's newer than my locally stored copy.
I managed to do that for local files, but can't find a way to do it for remote files (without downloading them).
working:
Dim infoReader As System.IO.FileInfo = My.Computer.FileSystem.GetFileInfo("C:/test.txt")
MsgBox("File was last modified on " & infoReader.LastWriteTime)
not working:
Dim infoReader As System.IO.FileInfo = My.Computer.FileSystem.GetFileInfo("http://google.com/robots.txt")
MsgBox("File was last modified on " & infoReader.LastWriteTime)
I'd love a solution that only has to download the headers of the file.
You can use the System.Net.Http.HttpClient class to fetch the last modified date from the server. Because it's sending a HEAD request, it will not fetch the file contents:
Dim client = New HttpClient()
Dim msg = New HttpRequestMessage(HttpMethod.Head, "http://google.com/robots.txt")
Dim resp = client.SendAsync(msg).Result
Dim lastMod = resp.Content.Headers.LastModified
You could also use the If-Modified-Since request header with a GET request. This way the response should be 304 - Not Modified if the file has not been changed (no file content sent), or 200 - OK if the file has been changed (and the contents of the file will be sent in the response), although the server is not required to honor this header.
Dim client = New HttpClient()
Dim msg = New HttpRequestMessage(HttpMethod.Get, "http://google.com/robots.txt")
msg.Headers.IfModifiedSince = DateTimeOffset.UtcNow.AddDays(-1) ' use the date of your copy of the file
Dim resp = client.SendAsync(msg).Result
Select Case resp.StatusCode
Case HttpStatusCode.NotModified
' Your copy is up-to-date
Case HttpStatusCode.OK
' Your copy is out of date, so save it
File.WriteAllBytes("C:\robots.txt", resp.Content.ReadAsByteArrayAsync.Result)
End Select
Note the use of .Result, since I was testing in a console application - you should probably await instead.
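For completeness, here is a minimal sketch of the awaited variant, combining the HEAD request with a comparison against the local file's timestamp. The module and function names (RemoteFileCheck, IsRemoteNewer, DownloadIfNewerAsync) are made up for illustration, and treating a missing Last-Modified header as "newer" is an assumption you may want to change.

```vbnet
Imports System
Imports System.IO
Imports System.Net.Http
Imports System.Threading.Tasks

Module RemoteFileCheck
    ' Hypothetical helper: True when the server reports a newer timestamp.
    ' Assumption: a missing Last-Modified header (Nothing) counts as "newer",
    ' so the caller downloads again rather than keeping a possibly stale copy.
    Function IsRemoteNewer(remoteModified As DateTimeOffset?, localModifiedUtc As DateTime) As Boolean
        If Not remoteModified.HasValue Then Return True
        Return remoteModified.Value.UtcDateTime > localModifiedUtc
    End Function

    ' Sends a HEAD request first and downloads the body only when the remote copy is newer.
    ' Returns True if the file was downloaded.
    Async Function DownloadIfNewerAsync(client As HttpClient, url As String, localPath As String) As Task(Of Boolean)
        Using head As New HttpRequestMessage(HttpMethod.Head, url)
            Using resp = Await client.SendAsync(head)
                resp.EnsureSuccessStatusCode()
                ' A file that does not exist locally is treated as infinitely old
                Dim localUtc = If(File.Exists(localPath), File.GetLastWriteTimeUtc(localPath), DateTime.MinValue)
                If Not IsRemoteNewer(resp.Content.Headers.LastModified, localUtc) Then Return False
            End Using
        End Using
        File.WriteAllBytes(localPath, Await client.GetByteArrayAsync(url))
        Return True
    End Function
End Module
```

From an Async method you would call it as Await DownloadIfNewerAsync(client, "http://google.com/robots.txt", "C:\robots.txt"). This costs two requests when the file has changed; the If-Modified-Since approach above needs only one.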
If the server offers it, you can read the date from the Last-Modified HTTP header. But with a plain GET request you're still stuck downloading the full file.
You could get it through FTP.
See if the server allows you to see the list of files in a folder.
If the website shows the date somewhere, you could pull it through screen scraping.
I know this is a rather old question, but there's still a better answer.
Dim req As WebRequest = WebRequest.Create("someurl")
' HEAD asks the server for the headers only, not the file body
req.Method = "HEAD"
Dim remoteFileLastModified As String
Using resp As WebResponse = req.GetResponse()
remoteFileLastModified = resp.Headers.Get("Last-Modified")
End Using
Dim remoteFileLastModifiedDateTime As DateTime
If DateTime.TryParse(remoteFileLastModified, remoteFileLastModifiedDateTime) Then
MsgBox("Date Last Modified: " & remoteFileLastModifiedDateTime.ToString("d MMMM yyyy dddd HH:mm:ss"))
Else
MsgBox("could not determine")
End If
I need to delete a SharePoint file using a Script Task in SSIS. In Visual Basic, I've tried using SPListItemCollection with Imports Microsoft.Sharepoint, but the namespace isn't recognized. I found few threads on this subject, and what I did find wasn't related to the Script Task, so any help would be really appreciated. Many thanks.
Update based on @Hadi's answer
Thanks, Hadi, for your answer. I've given up on the idea of using SPListCollection as it seems too complicated. Instead I'm trying to delete the file after it has been downloaded from SharePoint to the local folder. I need help with the line that actually deletes the file. Here is the code:
Public Sub Main()
Try
' get location of local folder
Dim dir As DirectoryInfo = New DirectoryInfo(Dts.Variables("DestP").Value.ToString())
If dir.Exists Then
' Create the filename for local storage
Dim file As FileInfo = New FileInfo(dir.FullName & "\" & Dts.Variables("FileName").Value.ToString())
If Not file.Exists Then
' get the path of the file to download
Dim fileUrl As String = Dts.Variables("SHP_URL").Value.ToString()
If fileUrl.Length <> 0 Then
Dim client As New WebClient()
If Left(fileUrl, 4).ToLower() = "http" Then
'download the file from SharePoint
client.Credentials = New System.Net.NetworkCredential(Dts.Variables("$Project::UserN").Value.ToString(), Dts.Variables("$Project::Passw").Value.ToString())
client.DownloadFile(fileUrl.ToString() & "/" & Dts.Variables("FileName").Value.ToString(), file.FullName)
Else
System.IO.File.Copy(fileUrl.ToString() & Dts.Variables("FileName").Value.ToString(), file.FullName)
End If
'delete file from Sharepoint
client.(fileUrl.ToString() & "/" & Dts.Variables("FileName").Value.ToString(), file.FullName).delete()
Else
Throw New ApplicationException("EncodedAbsUrl variable does not contain a value!")
End If
End If
Else
Throw New ApplicationException("No ImportFolder!")
End If
' Report success only when no exception was thrown
Dts.TaskResult = ScriptResults.Success
Catch ex As Exception
Dts.Events.FireError(0, String.Empty, ex.Message, String.Empty, 0)
Dts.TaskResult = ScriptResults.Failure
End Try
End Sub
Update 1 - Delete using FtpWebRequest
You cannot delete a file using the WebClient class. You can do it with the FtpWebRequest class by sending a WebRequestMethods.Ftp.DeleteFile request, as described in the link below:
How To Delete a File From FTP Server in C#
It should work with SharePoint as well.
Here is the function in VB.NET:
Private Function DeleteFile(ByVal fileName As String) As String
' fileUrl is local to Main, so read the base URL from the package variable again
Dim fileUrl As String = Dts.Variables("SHP_URL").Value.ToString()
Dim request As FtpWebRequest = CType(WebRequest.Create(fileUrl & "/" & fileName), FtpWebRequest)
request.Method = WebRequestMethods.Ftp.DeleteFile
request.Credentials = New NetworkCredential(Dts.Variables("$Project::UserN").Value.ToString(), Dts.Variables("$Project::Passw").Value.ToString())
Using response As FtpWebResponse = CType(request.GetResponse(), FtpWebResponse)
Return response.StatusDescription
End Using
End Function
You should replace the following line:
client.(fileUrl.ToString() & "/" & Dts.Variables("FileName").Value.ToString(), file.FullName).delete()
With
DeleteFile(Dts.Variables("FileName").Value.ToString())
Also, you may use the following credentials:
request.Credentials = System.Net.CredentialCache.DefaultNetworkCredentials
References
How to pass credentials to httpwebrequest for accessing SharePoint Library
How To Delete a File From FTP Server in C#
Downloading all files in a FTP folder and then deleting them
Deleting file from FTP in C#
Initial Answer
I have been looking into a similar issue for a while. It looks like you cannot delete a SharePoint file in SSIS using a File System Task or an Execute Process Task; the only way is to use a Script Task. There are many links online describing this process, such as:
how to delete or remove only text files from share point in C# or SSIS script?
Fastest way to delete all items with C#
Deleting files programatically
Deleting all the items from a large list in SharePoint
Concerning the problem you mentioned, I think you should make sure that Microsoft.Sharepoint.dll is added as a reference inside the Script Task. If it is, try using the fully qualified Microsoft.Sharepoint.SPListItemCollection instead of SPListItemCollection.
Thanks, @Hadi, for your help.
For me it didn't work with FtpWebResponse.
It worked with HttpWebRequest. Here is the script:
Dim request As System.Net.HttpWebRequest = CType(WebRequest.Create(fileUrl.ToString() & "/" & Dts.Variables("FileName").Value.ToString()), HttpWebRequest)
request.Credentials = New System.Net.NetworkCredential(Dts.Variables("$Project::UserN").Value.ToString(), Dts.Variables("$Project::Passw").Value.ToString())
request.Method = "DELETE"
Dim response As System.Net.HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
I'm trying to upload a file from an FTP site to Basecamp using the Basecamp API. I'm using a simple console application. Here's my code:
Try
Dim accountID As String = ConfigurationManager.AppSettings("BaseCampID")
Dim projectID As Integer = 9999999
Dim folderName As String = "XXXXX/XXXXX"
Dim fileName As String = "XXX.zip"
'The URL to access the attachment method of the API
Dim apiURL = String.Format("https://basecamp.com/{0}/api/v1/projects/{1}/attachments.json", accountID, projectID)
'Get the file from the FTP server as a byte array
Dim fileBytes As Byte() = GetFileBytes(String.Format("{0}\\{1}", folderName, fileName))
'Initialize the WebClient object
Dim client As New WebClient()
client.Headers.Add("Content-Type", "application/zip")
'Need to provide a user-agent with a URL or email address
client.Headers.Add("User-Agent", "Basecamp Upload (email@email.com)")
'Keep the connection alive so it doesn't close
client.Headers.Add("Keep-Alive", "true")
'Provide the Basecamp credentials
client.Credentials = New NetworkCredential("username", "password")
'Upload the file as a byte array to the API, and get the response
Dim responseStr As Byte() = client.UploadData(apiURL, "POST", fileBytes)
'Convert the JSON response to a BaseCampAttachment object
Dim attachment As BaseCampAttachment
attachment = JSonHelper.FromJSon(Of BaseCampAttachment)(Encoding.Default.GetString(responseStr))
Catch ex As Exception
Console.WriteLine(ex.Message)
Finally
Console.ReadLine()
End Try
But whenever it calls client.UploadData, I get the error message "The underlying connection was closed: The connection was closed unexpectedly." I ran into this issue earlier and thought I had solved it by adding the "Keep-Alive" header, but it's not working anymore. The API works if I upload a local file with client.UploadFile, but I'd like to upload the byte array I get from the FTP server directly, rather than downloading the file locally and then uploading it to Basecamp.
Any thoughts would be greatly appreciated. Thanks!
I never figured out what was wrong with the WebClient call, but I ended up using a Basecamp API wrapper from https://basecampwrapper.codeplex.com. That wrapper uses HTTPRequest and HTTPResponse instead of WebClient.UploadData. It's also much easier to just use that wrapper than to try writing my own code from scratch.
Is there a way to check whether a file has already been downloaded by comparing its file size?
Below is my download code:
Private Sub bgw_DoWork(ByVal sender As Object, ByVal e As System.ComponentModel.DoWorkEventArgs)
Dim TestString As String = "http://123/abc.zip," & _
"http://abc/134.zip"
'Split the comma-separated URLs into an array (no trailing comma, so there is no empty entry)
address = TestString.Split(","c)
'Loop through each URL and download the file
For Each add As String In address
'Build the local path the downloaded file is saved to
Dim fname As String = IO.Path.Combine("C:\Temp", IO.Path.GetFileName(add))
'Arguments: no credentials, showUI = False, 60-second timeout, overwrite = True
My.Computer.Network.DownloadFile(add, fname, "", "", False, 60000, True)
Next
End Sub
The size of a local file can be determined using the File or FileInfo class from the System.IO namespace. To determine the size of a file to be downloaded by HTTP, you can use an HttpWebRequest. If you set it up as though you were going to download the file but then set the Method to Head instead of Get you will get just the response headers, from which you can read the file size.
I've never done it myself, or even used an HttpWebRequest to download a file, so I'm not going to post an example. I'd have to research it and you can do that just as easily as I can.
Here's an existing thread that shows how it's done in C#:
How to get the file size from http headers
Here's a VB translation of the code from the top answer:
Dim req As System.Net.WebRequest = System.Net.HttpWebRequest.Create("https://stackoverflow.com/robots.txt")
req.Method = "HEAD"
Using resp As System.Net.WebResponse = req.GetResponse()
'Long rather than Integer, so files larger than 2 GB do not fail to parse
Dim ContentLength As Long
If Long.TryParse(resp.Headers.Get("Content-Length"), ContentLength) Then
'Do something useful with ContentLength here
End If
End Using
A better practice would be to write this line:
req.Method = "HEAD"
like this:
req.Method = System.Net.WebRequestMethods.Http.Head
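Putting the two halves together, a rough sketch of the size comparison the question asks about could look like the following. The names SizeCheck, GetRemoteLength and IsAlreadyDownloaded are invented for this example. It relies on WebResponse.ContentLength, which is -1 when the server sends no Content-Length header; in that case the sketch assumes the file must be downloaded.

```vbnet
Imports System.IO
Imports System.Net

Module SizeCheck
    ' Hypothetical helper: asks the server for the headers only and returns the reported size.
    ' Returns -1 when the server does not send a Content-Length header.
    Function GetRemoteLength(url As String) As Long
        Dim req As WebRequest = WebRequest.Create(url)
        req.Method = WebRequestMethods.Http.Head
        Using resp As WebResponse = req.GetResponse()
            Return resp.ContentLength
        End Using
    End Function

    ' Hypothetical helper: True only when a local file exists and its size matches the remote size.
    ' Assumption: an unknown remote size (negative value) never counts as a match.
    Function IsAlreadyDownloaded(localPath As String, remoteLength As Long) As Boolean
        If remoteLength < 0 OrElse Not File.Exists(localPath) Then Return False
        Return New FileInfo(localPath).Length = remoteLength
    End Function
End Module
```

Inside the download loop you could then skip a file with something like If Not IsAlreadyDownloaded(fname, GetRemoteLength(add)) Then ... before calling DownloadFile. Keep in mind that equal sizes only suggest the files are the same; they do not prove the contents match.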
For a long time I have been trying to debug why my parsing counts were off when downloading files to be parsed. I did some debugging and found that when I decompress a downloaded file using GZipStream, the output is missing data from the original file. Here is my code for decompressing:
Using originalFileStream As FileStream = fileItem.OpenRead()
Dim currentFileName As String = fileItem.FullName
Dim newFileName = currentFileName.Remove(currentFileName.Length - fileItem.Extension.Length)
newFile = newFileName
Using decompressedFileStream As FileStream = File.Create(newFileName)
Using decompressionStream As GZipStream = New GZipStream(originalFileStream, CompressionMode.Decompress)
decompressionStream.CopyTo(decompressedFileStream)
Console.WriteLine("Decompressed: {0}", fileItem.Name)
decompressionStream.Close()
originalFileStream.Close()
End Using
End Using
End Using
Then I return the new file to the calling function and read the contents from there:
Dim responseData As String = inputFile.ReadToEnd
If I paste the URL into a browser, download the file from there, and open it with WinRAR, I can see that the data is not the same. This does not happen every time; some files decompress and parse correctly. Each downloaded file has a check counter stating how many posts I am supposed to parse from it, and that is what made me notice the mismatch in counts.
EDIT
Here is what I found in addition. If I read the problem file line by line (as I said, only some files behave this way), I get all the data:
Dim rData As String = inputFile.ReadLine
If Not rData = "" Then
While Not inputFile.EndOfStream
rData = inputFile.ReadLine + vbNewLine + rData
End While
getIndividualPosts(rData)
End If
If I try to read a single line from a file that is not problematic, it returns nothing, so I have to use ReadToEnd. Can anyone explain this odd behavior? Is it related to GZipStream or to some error in my code?
Trying to download file in code.
Current code:
Dim uri As New UriBuilder
uri.UserName = "xxx"
uri.Password = "xxx"
uri.Host = "xxx"
uri.Path = "xxx.aspx?q=65"
Dim request As HttpWebRequest = DirectCast(WebRequest.Create(uri.Uri), HttpWebRequest)
request.AllowAutoRedirect = True
request = DirectCast(WebRequest.Create(DownloadUrlIn), HttpWebRequest)
request.Timeout = 10000
'request.AllowWriteStreamBuffering = True
Dim response As HttpWebResponse = Nothing
response = DirectCast(request.GetResponse(), HttpWebResponse)
Dim s As Stream = response.GetResponseStream()
'Write to disk
Dim fs As New FileStream("c:\xxx.pdf", FileMode.Create)
Dim read As Byte() = New Byte(255) {}
Dim count As Integer = s.Read(read, 0, read.Length)
While count > 0
fs.Write(read, 0, count)
count = s.Read(read, 0, read.Length)
End While
'Close everything
fs.Close()
s.Close()
response.Close()
Running this code and checking response.ResponseUri shows that I'm being redirected back to the login page and not to the PDF file.
For some reason it's not authorising access. What could I be missing, given that I'm sending the user name and password in the URI? Thanks for your help.
You don't need all of that code to download a file from the net.
Just use the WebClient class and its DownloadFile method.
You should check whether the site requires cookies (most do). I'd use a packet analyzer, run your code, and see exactly what the server is returning. Use Fiddler or HTTP Analyzer to log the traffic.
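As a rough illustration of the WebClient suggestion, the sketch below sets the credentials on the client instead of embedding them in the URI. The names DownloadSketch, CreateAuthenticatedClient and DownloadPdf are hypothetical. Note the assumption: this only helps when the server uses HTTP authentication (for example Basic or NTLM). If the site uses a login form with cookies, which a redirect to a login page suggests, credentials set this way will not be enough.

```vbnet
Imports System.Net

Module DownloadSketch
    ' Hypothetical helper: builds a WebClient that answers HTTP authentication challenges.
    Function CreateAuthenticatedClient(userName As String, password As String) As WebClient
        Dim client As New WebClient()
        client.Credentials = New NetworkCredential(userName, password)
        Return client
    End Function

    ' Hypothetical helper: downloads the resource at url to localPath in a single call.
    Sub DownloadPdf(url As String, localPath As String, userName As String, password As String)
        Using client = CreateAuthenticatedClient(userName, password)
            client.DownloadFile(url, localPath)
        End Using
    End Sub
End Module
```

If the downloaded file turns out to be the HTML of the login page, that is a strong hint the site expects a session cookie rather than HTTP credentials.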
With UWP, this has become a more pertinent question, as UWP does not have a WebClient. The correct answer is that if you are being redirected to the login page, there must be an issue with your credentials or with the headers set (or not set) on the HttpWebRequest.
According to Microsoft, the request for downloading is sent with the call to GetResponse() on the HttpWebRequest, therefore the downloaded file SHOULD be in the stream in the response (returned by the GetResponse() call mentioned above).