Download large file from FTP server in chunks - vb.net

I need to download a large file from an FTP server. A new file is uploaded once a week, and I need to be the first to download it. I've written a check that detects when the file has been uploaded, and once it's there, the download starts. The problem is that this is a big file (3 GB). I can download about 10% of the file within the first few minutes, but as more and more people discover that the file is up, the average download speed drops and drops, to the point where it takes about 3-4 hours to download the remaining 80-90%.
The time isn't a huge problem in itself, but it sure would be nice to get the download done quicker. The real problem is that my download never finishes, and I think it's because the connection times out.
One solution would be to extend the download timeout, but I have another idea: download the file in chunks. Right now I'm downloading from beginning to end in one go. It starts off with a good download speed, but as more and more people begin their downloads, it slows all of us down. I would like to split the download into smaller chunks and have all the separate downloads start at the same time. I've made an illustration:
Here I have 8 starting points, which means I'll end up with 8 parts of the zip file, which I then need to recombine into one file once the download has ended. Is this even possible, and how would I approach this solution? If I could do this, I would be able to complete the entire download in about 10-15 minutes, and I wouldn't have to wait the extra 3-4 hours for the download to fail and then have to restart it.
Currently I use a WebClient to download the FTP file, since all the other approaches I tried couldn't finish the download because the file is larger than 2.4 GB.
Private wc As New WebClient()
wc.DownloadFileAsync(New Uri("ftp://user:password@ip/FOLDER/" & FILENAME), downloadPath & FILENAME)
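
Yes, this is possible as long as the server supports resuming transfers (the FTP REST command), which most servers do. Below is a rough sketch of the idea, not a drop-in implementation: it asks the server for the file size, starts eight FtpWebRequest downloads in parallel, each one seeking to its own offset via ContentOffset and reading only its share, and finally concatenates the parts. The URI, credentials, chunk count, and file names are placeholders. One caveat: many FTP servers cap the number of simultaneous connections per login, which limits how many chunks you can pull at once.

Imports System.Collections.Generic
Imports System.IO
Imports System.Net
Imports System.Threading.Tasks

Module ChunkedFtpDownload

    Const ChunkCount As Integer = 8
    Const FtpUri As String = "ftp://ip/FOLDER/FILENAME"          ' placeholder
    Const User As String = "user", Password As String = "password"

    Sub Main()
        ' Ask the server for the total file size (the FTP SIZE command).
        Dim sizeRequest = CType(WebRequest.Create(FtpUri), FtpWebRequest)
        sizeRequest.Credentials = New NetworkCredential(User, Password)
        sizeRequest.Method = WebRequestMethods.Ftp.GetFileSize
        Dim totalSize As Long
        Using response = sizeRequest.GetResponse()
            totalSize = response.ContentLength
        End Using

        Dim chunkSize As Long = totalSize \ ChunkCount
        Dim tasks As New List(Of Task)

        For i As Integer = 0 To ChunkCount - 1
            Dim index = i
            Dim offset = index * chunkSize
            ' The last chunk runs to the end of the file.
            Dim length = If(index = ChunkCount - 1, totalSize - offset, chunkSize)
            tasks.Add(Task.Run(Sub() DownloadChunk(offset, length, $"part{index}.bin")))
        Next
        Task.WaitAll(tasks.ToArray())

        ' Recombine the parts, in order, into the final file.
        Using output As New FileStream("FILENAME.zip", FileMode.Create)
            For i As Integer = 0 To ChunkCount - 1
                Using part As New FileStream($"part{i}.bin", FileMode.Open)
                    part.CopyTo(output)
                End Using
            Next
        End Using
    End Sub

    Sub DownloadChunk(offset As Long, length As Long, path As String)
        Dim request = CType(WebRequest.Create(FtpUri), FtpWebRequest)
        request.Credentials = New NetworkCredential(User, Password)
        request.Method = WebRequestMethods.Ftp.DownloadFile
        request.ContentOffset = offset   ' tells the server where to restart (REST)

        Using response = request.GetResponse(),
              stream = response.GetResponseStream(),
              file As New FileStream(path, FileMode.Create)
            Dim buffer(81919) As Byte
            Dim remaining As Long = length
            ' Read only this chunk's share, then close the connection.
            While remaining > 0
                Dim read = stream.Read(buffer, 0, CInt(Math.Min(buffer.Length, remaining)))
                If read <= 0 Then Exit While
                file.Write(buffer, 0, read)
                remaining -= read
            End While
        End Using
    End Sub

End Module

Each chunk also gives you a natural retry unit: if one connection times out, you only have to re-download that part instead of restarting the whole 3 GB transfer.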

Related

How can I safely import files to SQL Server in SSIS while new files are actively being written to the source directory?

I need to import many XML files into SQL Server every day. I was thinking of running a Foreach Loop Container every few minutes to import the files to the DB table and then move them to another directory, but sometimes over a dozen new files are written to the source folder every minute. Is it going to be an issue if the package tries to loop through the folder at the exact moment new files are being written to it? If so, how can I work around this?
You could loop over the files in a script task and attempt to move them to a separate "ReadyToProcess" folder in a try/catch. Catch the IOException if the file is in use by another process, and continue on to the next file. The skipped file will be picked up on the next run. Then loop over the files in "ReadyToProcess" to read them into the database.
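
A minimal sketch of that script-task loop, with hypothetical folder paths; File.Move throws an IOException while the writer still has the file open, which is exactly the signal to skip it:

Imports System.IO

' Inside the SSIS script task: move what we can, skip what is still locked.
Dim sourceFolder As String = "C:\Drop"                 ' hypothetical path
Dim readyFolder As String = "C:\ReadyToProcess"        ' hypothetical path

For Each filePath As String In Directory.GetFiles(sourceFolder, "*.xml")
    Dim target As String = Path.Combine(readyFolder, Path.GetFileName(filePath))
    Try
        File.Move(filePath, target)
    Catch ex As IOException
        ' Still being written; it will be picked up on the next run.
    End Try
Next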
It seems like you know which files are finished writing and which are still being modified, which makes things a little easier. It is important to remember: if your SSIS task tries to open a file that is currently being modified or used by another process, the SSIS package will fail.
You can work around this by using a script task to generate a list of files in your source folder at a point in time and use a for or foreach loop to only fetch the files that are in the generated list. This would be in contrast to fetching everything that's in your source folders, as your post implies.
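
A sketch of that snapshot idea (path hypothetical): capture the directory listing once at the start of the run, then iterate only over that fixed array, so files that arrive mid-run are ignored until the next execution:

Imports System.IO

' Snapshot taken once; files that arrive later are not in this array.
Dim snapshot As String() = Directory.GetFiles("C:\Drop", "*.xml")   ' hypothetical path
For Each filePath As String In snapshot
    ' Hand filePath to the processing loop / data flow here.
Next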
Other solutions would be to batch your incoming files and offset the package execution time so there isn't a risk of the file being exported to SQL as it's imported into your source folder.
For instance, load your source documents in batches every 30 minutes (1:00, 1:30, 2:00, ...) and execute your SSIS task every 30 minutes as well, but offset from the batch by 15 minutes (1:15, 1:45, 2:15, ...).
Lastly, if possible, run your SSIS package during a period when no new files are written to your source folder. It's not always possible, but if you knew there wouldn't be any new documents coming in at 2 AM, that would be the best time to schedule your SSIS package.

FTP Server Download Returns blank files

We have a process in place, built on Excel VBA, that uploads a file to an FTP server. On the other side, our client downloads it. Every so often, they complain that the file they received is blank (though the file name is correct). We then check at our end and see that the file we uploaded was never blank. So here comes the problem: we're always arguing about whether the error was ours or theirs.
I figured that there might be a couple of reasons behind it but I have a few questions to ask before coming to conclusions:
If, say, the file was never uploaded (a possibility), what happens when the client runs a download process at their end? Can that download process generate a blank file with the same name as our output file? It sounds impossible to me but since the client is following up on this issue, I have to ask this silly question.
How does the mechanism work: what are the steps that happen on the FTP server the moment my process completes uploading the file? I sometimes see that as soon as I upload the file, a 0 KB file is created, and then a second later (or less) the file with the right size appears. Could it be that their download process runs right before the actual file contents are written?
Thank you in advance for your help!
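
For what it's worth, a common safeguard for exactly this race (an illustration, not something from this thread): have the upload write to a temporary name and rename it only after the transfer completes. Renames are effectively atomic on most servers, so the real file name either doesn't exist yet or points at complete data, and the client can never download a half-written file. A VB.NET sketch of the two steps; the Excel VBA version would do the same, and the credentials and names are placeholders:

Imports System.IO
Imports System.Net

Sub UploadAtomically(localPath As String, ftpFolder As String, fileName As String)
    Dim tempName As String = fileName & ".uploading"

    ' 1. Upload under a temporary name that the client's job ignores.
    Dim upload = CType(WebRequest.Create(ftpFolder & tempName), FtpWebRequest)
    upload.Credentials = New NetworkCredential("user", "password")   ' placeholders
    upload.Method = WebRequestMethods.Ftp.UploadFile
    Using source As New FileStream(localPath, FileMode.Open),
          target = upload.GetRequestStream()
        source.CopyTo(target)
    End Using

    ' 2. Rename to the real name only after the upload succeeded.
    Dim rename = CType(WebRequest.Create(ftpFolder & tempName), FtpWebRequest)
    rename.Credentials = New NetworkCredential("user", "password")
    rename.Method = WebRequestMethods.Ftp.Rename
    rename.RenameTo = fileName
    rename.GetResponse().Close()
End Sub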

How to delete a large file in Grails using Apache Camel

I am using Grails 2.5 with Camel. I have a folder called GateIn that is polled with a delay of 3 minutes, so every 3 minutes Camel looks in the folder for a file. If a file exists, processing starts. If the file is processed within 3 minutes, it gets deleted automatically, but if processing takes, say, 10 minutes, the file is not deleted and the same file is processed again and again. How can I make sure the file gets deleted whether it is a small or a bulk file? I have used noop=true to stop the file from being reused, but I also want the file deleted once it is processed. Please give me some suggestions.
You can check the file size using the Camel File language and decide what to do next.
Usually, when you want to process large files on a short polling interval like this, it is better to have a separate processing zone (a physical directory) and move the file there immediately after consuming it; the file component's preMove/move options do exactly this.
You can then have separate logic or a separate Camel route to process the file from that zone. After successful processing, you can delete it (delete=true) or take whatever step your requirement calls for. Hope it helps!

How chunked file upload works

I am working on file upload and really wondering how chunked file upload actually works.
I understand that the client sends data to the server in small chunks instead of the complete file at once, but I have a few questions:
For the browser to divide the whole file and send it in chunks, will it read the complete file into memory? If so, isn't there again a risk of excessive memory use and browser crashes for big files (say > 10 GB)?
How do cloud applications like Google Drive or Dropbox handle such big file uploads?
If multiple files are selected for upload and each is larger than 5-10 GB, does the browser keep all the files in memory and then send them chunk by chunk?
Not sure if you're still looking for an answer; I was in your position recently, and here's what I've come up with, hope it helps: Deal chunk uploaded files in php
During uploading, if you print out the request on the backend, you will see three parameters: _chunkNumber, _totalSize and _chunkSize. With these parameters it's easy to decide whether a given chunk is the last piece, and if it is, assembling all of the pieces into a whole shouldn't be hard.
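
As an illustration only (the linked answer is PHP; this sketch is VB.NET to match the rest of the thread, and it assumes a zero-based _chunkNumber and in-order delivery): write each chunk at its offset and treat the upload as complete once the file reaches _totalSize:

Imports System.IO

' Append one chunk into the target file; returns True once the file is complete.
Function AppendChunk(chunkData As Byte(), chunkNumber As Integer,
                     chunkSize As Integer, totalSize As Long,
                     targetPath As String) As Boolean
    Using file As New FileStream(targetPath, FileMode.OpenOrCreate,
                                 FileAccess.Write, FileShare.None)
        ' Seek to this chunk's offset so a re-sent chunk just overwrites itself.
        file.Seek(CLng(chunkNumber) * chunkSize, SeekOrigin.Begin)
        file.Write(chunkData, 0, chunkData.Length)
    End Using
    ' The size check is only valid with in-order chunks; with parallel uploads
    ' you would track which chunk numbers have arrived instead.
    Return New FileInfo(targetPath).Length >= totalSize
End Function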
As for the JavaScript side, ng-file-upload has a setting named "resumeChunkSize" that lets you enable chunked mode and set the chunk size.

How can I determine if files in a "drop folder" are completely transferred

Remote clients will upload images (and perhaps some instructional files in specially formatted text) to a "drop folder." Once the upload is complete we need to begin processing these images. It would be an easy, but flawed, solution to just have a script automatically begin processing any files in the folder every few seconds (the files can be moved out of the folder once processed); but problems would arise when attempting to process large images that are only partially transferred.
What are some tricks I can use to ensure the files are fully uploaded before processing them?
A few of my own thoughts:
The script can check the validity of the file; e.g., a partial JPEG would result in an error, and the script could respond to that error, though this would be fairly CPU-intensive. Some formats have special markers at the end, but I can't count on that since I'm not sure which formats I'll be dealing with.
I've heard of "file handles" but haven't really figured out the basics of what they are and how I can tell if there is a "file handle" on a particular file. Basically the FTP daemon (actually, I'm on Windows, so "service") would keep a "handle" on the file while it's being uploaded and you would know not to process that file. These are just a few of my thoughts but I'm not really sure if they will work or if there are better or more accepted ways of solving this problem.
If you have a server-side script upload system (PHP, ASP, JSP, whatever), you could instruct the script to call another script to process the files, or to create a flag file indicating the upload is done, something like this.
If your server is Linux-based, you can use lsof to check whether the file is open. Since your FTP daemon/script/CGI will close the file once the upload completes, lsof will no longer show the file in its list.
If your server is Windows-based, you can use Process Explorer to list the open files.
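On Windows you can also do the check programmatically rather than with Process Explorer: try to open the file with an exclusive lock and treat a failure as "still uploading". A sketch:

Imports System.IO

' A file the FTP service is still writing cannot normally be opened with
' an exclusive lock, so a failed open means "not ready yet".
Function IsFileReady(path As String) As Boolean
    Try
        Using stream As New FileStream(path, FileMode.Open,
                                       FileAccess.Read, FileShare.None)
            Return True
        End Using
    Catch ex As IOException
        Return False   ' still locked by the uploader; try again later
    End Try
End Function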
By what method are your users uploading the images?