Stream PDFs from a URL and add them to a zip - pdf

I have an MVC 4.5 application where I show a grid. The first column of the grid is a document name, which is a hyperlink to the actual document hosted on our site and available via a URL. The documents can be PDF, DOC, or PPT. I can access these documents only via URL; I do not have access to the physical files on our server.
I am providing users an option to select one or more of these documents from the grid and then download them. What I am trying to achieve is to read each selected document via its URL, write it to a zip file, and make the zip file downloadable, so users download one file instead of many.
I have tried to stream the documents via URL in memory and then add them to the zip file using Microsoft's ZipArchive library, but this is not working for me.
I was able to add documents that were on disk to a zip file using ZipArchive, and that works great. But I do not have access to the physical documents, since I can reach them only through a URL. My next option is to download each of these documents to a temp location on the server and then add them to the zip file using ZipArchive, but I am trying to avoid downloading files to a temp location.
Please suggest how I can read the documents via URL in memory, add each of them to a zip file, and make the zip file downloadable.
Any help will be appreciated.

Thank you Cbroe for commenting. I figured out the answer. The problem was that I was reading the PDF from the URL, converting it to a MemoryStream, and then trying to add the MemoryStream to the ZipArchive, which was not working. Instead, I extracted the byte array from the MemoryStream and added that to the zip archive, and it worked.
Here is a code snippet that might be useful for someone. My first contribution to Stack Overflow.
// Requires: System.IO, System.IO.Compression, System.Net, System.Web.Mvc
public FileResult DownloadZip()
{
    var memoryStream = new MemoryStream();

    // Build the zip in memory; leaveOpen: true keeps memoryStream usable after the archive is disposed.
    using (var archive = new ZipArchive(memoryStream, ZipArchiveMode.Create, true))
    {
        var demoFile = archive.CreateEntry("Pdf123.pdf");
        var convertedStream = ConvertTobyte("http://www.example.com/Pdf123.pdf");
        using (var entryStream = demoFile.Open())
        {
            entryStream.Write(convertedStream, 0, convertedStream.Length);
        }

        demoFile = archive.CreateEntry("Pdf456.pdf");
        convertedStream = ConvertTobyte("http://www.example.com/Pdf456.pdf");
        using (var entryStream = demoFile.Open())
        {
            entryStream.Write(convertedStream, 0, convertedStream.Length);
        }
    }

    // This option is to write the zip to your local disk.
    using (var fileStream = new FileStream(@"C:\Temp\test.zip", FileMode.Create))
    {
        memoryStream.Seek(0, SeekOrigin.Begin);
        memoryStream.CopyTo(fileStream);
    }

    // This option is to download the zip via the browser.
    memoryStream.Seek(0, SeekOrigin.Begin);
    return new FileStreamResult(memoryStream, "application/zip")
    {
        FileDownloadName = "Archive.zip"
    };
}

private static byte[] ConvertTobyte(string fileUrl)
{
    // Download the document at the given URL into a byte array.
    using (var wc = new System.Net.WebClient())
    {
        return wc.DownloadData(fileUrl);
    }
}
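As a follow-up, the same approach extends to the grid scenario where the user selects an arbitrary set of documents. Below is a minimal sketch, not part of the original answer: the method name DownloadSelectedAsZip and the selectedUrls parameter are hypothetical, and it assumes the entry name can be taken from the URL and that ConvertTobyte from above is available.
// Sketch: loop over user-selected URLs instead of hard-coding entries.
// Requires System.Collections.Generic in addition to the namespaces listed above.
public FileResult DownloadSelectedAsZip(IEnumerable<string> selectedUrls)
{
    var memoryStream = new MemoryStream();
    using (var archive = new ZipArchive(memoryStream, ZipArchiveMode.Create, true))
    {
        foreach (var url in selectedUrls)
        {
            // Derive the zip entry name from the last segment of the URL.
            var entryName = Path.GetFileName(new Uri(url).LocalPath);
            var entry = archive.CreateEntry(entryName);
            var fileBytes = ConvertTobyte(url);
            using (var entryStream = entry.Open())
            {
                entryStream.Write(fileBytes, 0, fileBytes.Length);
            }
        }
    }
    memoryStream.Seek(0, SeekOrigin.Begin);
    return new FileStreamResult(memoryStream, "application/zip") { FileDownloadName = "Archive.zip" };
}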

Related

.NET Core API saving image upload asynchronously with ImageSharp, MemoryStream and FileStream

I have a .NET Core API that I'd like to extend to save uploaded images asynchronously.
Using ImageSharp, I should be able to check uploads and resize them if predefined size limits are exceeded. However, I can't get a simple async save working.
A simple (non-async) save to file works without problem:
My controller extracts the IFormFile from the upload and calls the following method without any problem:
public static void Save(IFormFile image, string imagesFolder)
{
    var fileName = Path.Combine(imagesFolder, image.FileName);
    using (var stream = image.OpenReadStream())
    using (var imgIS = Image.Load(stream, out IImageFormat format))
    {
        imgIS.Save(fileName);
    }
}
ImageSharp is currently lacking async methods, so a workaround is necessary.
The updated code below saves the uploaded file, but the format is incorrect - when viewing the file I get the message "It appears we don't support this file format".
The format is extracted from the ImageSharp Load method and used when saving to the MemoryStream.
The MemoryStream's CopyToAsync method is used to copy to the FileStream so the upload is saved asynchronously.
public static async void Save(IFormFile image, string imagesFolder)
{
    var fileName = Path.Combine(imagesFolder, image.FileName);
    using (var stream = image.OpenReadStream())
    using (var imgIS = Image.Load(stream, out IImageFormat format))
    using (var memoryStream = new MemoryStream())
    using (var fileStream = new FileStream(fileName, FileMode.OpenOrCreate))
    {
        imgIS.Save(memoryStream, format);
        await memoryStream.CopyToAsync(fileStream).ConfigureAwait(false);
        fileStream.Flush();
        memoryStream.Close();
        fileStream.Close();
    }
}
I can't work out whether the issue is with the ImageSharp save to the MemoryStream or with MemoryStream.CopyToAsync.
I'm currently getting a 404 on the SixLabors docs - hopefully not an indication that the project has folded.
How can I make the upload async and save to file in the correct format?
CopyToAsync copies a stream starting at its current position. You must move the current position of memoryStream back to the start before copying:
// ...
memoryStream.Seek(0, SeekOrigin.Begin);
await memoryStream.CopyToAsync(fileStream).ConfigureAwait(false);
// ...
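For reference, here is a sketch of the full corrected method under the same assumptions as the question's code; the SaveAsync name and the switch from async void to async Task are my own adjustments, not spelled out in the answer above.
// Sketch of the corrected save; requires System.Threading.Tasks in addition to the ImageSharp usings.
public static async Task SaveAsync(IFormFile image, string imagesFolder)
{
    var fileName = Path.Combine(imagesFolder, image.FileName);
    using (var stream = image.OpenReadStream())
    using (var imgIS = Image.Load(stream, out IImageFormat format))
    using (var memoryStream = new MemoryStream())
    using (var fileStream = new FileStream(fileName, FileMode.OpenOrCreate))
    {
        // Encode the image into the memory stream, then rewind before copying to disk.
        imgIS.Save(memoryStream, format);
        memoryStream.Seek(0, SeekOrigin.Begin);
        await memoryStream.CopyToAsync(fileStream).ConfigureAwait(false);
    }
}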

Can Azure Data Lake Store .NET SDK read and write binary files?

The ADLS .NET SDK has some good examples for reading and creating text files. The example below uses StreamReader, which shouldn't be used with binary files. I tried using BinaryReader but have been unsuccessful.
https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-data-operations-net-sdk
// Read file contents
using (var readStream = new StreamReader(client.GetReadStream(fileName)))
{
    string line;
    while ((line = readStream.ReadLine()) != null)
    {
        Console.WriteLine(line);
    }
}
Can the .NET SDK create/read binary? If so, are there any examples of doing this?
The short answer is yes. Please refer to the following demo code.
Create a binary file:
AdlsClient adlsClient = AdlsClient.CreateClient($"{datalakeAccount}.azuredatalakestore.net", clientCreds);
using (var stream = adlsClient.CreateFile("file name", IfExists.Overwrite))
{
    byte[] fileBytes = File.ReadAllBytes(@"local file path");
    stream.Write(fileBytes, 0, fileBytes.Length);
}
Read a binary file and write it to a local file:
using (var fileStream = adlsClient.GetReadStream("1.png"))
using (var memoryStream = new MemoryStream())
{
    fileStream.CopyTo(memoryStream);
    memoryStream.Position = 0;
    File.WriteAllBytes(@"filename", memoryStream.ToArray());
}
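As a side note, if the files are large you can skip the intermediate MemoryStream and copy the ADLS read stream straight to a FileStream. A minimal sketch under that assumption; the file name and destination path are placeholders:
// Stream the ADLS file directly to disk without buffering the whole file in memory.
using (var readStream = adlsClient.GetReadStream("1.png"))
using (var outStream = new FileStream(@"C:\Temp\1.png", FileMode.Create))
{
    readStream.CopyTo(outStream);
}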

Rename Azure blob file and download it

I have a few files in Azure blobs that are stored with unique file names, and when the client wants to download one, I want to rename it to a friendly name.
I'm still using the 2014 Azure storage DLLs in my project and I'm not planning to update them anytime soon, so I can't use the built-in ContentDisposition support to rename it.
I tried using the following code in my controller:
var blob = blobStorage.GetBlobRef("https://mysite.blob.core.windows.net/my-container/WERF3234435FFF_ERFas23E.doc");
MemoryStream memStream = new MemoryStream();
blob.DownloadToStream(memStream);
Response.ContentType = blob.Properties.ContentType;
Response.AddHeader("Content-Disposition", "Attachment; filename=abcd_New.doc");
Response.AddHeader("Content-Length", blob.Properties.Length.ToString());
Response.BinaryWrite(memStream.ToArray());
but it's not downloading the file.
I also tried using this:
MemoryStream memStream = new MemoryStream();
blob.DownloadToStream(memStream);
System.Web.HttpContext.Current.Response.Clear();
System.Web.HttpContext.Current.Response.ContentType = blob.Properties.ContentType;
System.Web.HttpContext.Current.Response.AddHeader("Content-Disposition", "Attachment; filename=" + friendlyName.doc);
System.Web.HttpContext.Current.Response.AddHeader("Content-Length", blob.Properties.Length.ToString());
System.Web.HttpContext.Current.Response.BinaryWrite(memStream.ToArray());
System.Web.HttpContext.Current.Response.End();
I have my business logic in a separate solution and getting the blob reference from there to my main solution.
Am I missing something?
Since we're talking about ASP.NET MVC, I'm missing the controller/action in your code. You're not supposed to write to the HttpContext yourself in ASP.NET MVC; you have ActionResults for that.
public ActionResult Download()
{
    // ...
    var bytes = memStream.ToArray();
    return File(bytes, System.Net.Mime.MediaTypeNames.Application.Octet, "abcd_New.doc");
}
The browser will decide whether to download the file or open it directly in the browser window. If you want to control that, you need the following piece of code before the return File(...) call:
var contentDisposition = new System.Net.Mime.ContentDisposition
{
    FileName = "abcd_New.doc",
    Inline = false // true will try to open in the browser, false will force a download
};
Response.AppendHeader("Content-Disposition", contentDisposition.ToString());
We need to flush the response after writing the file to it. I used the code you provided; after adding the following lines, the file downloads from the server.
Response.BinaryWrite(memStream.ToArray());
Response.Flush();
Response.End();
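Putting the pieces together, here is a sketch of a complete action along the lines suggested above. blobStorage.GetBlobRef is the asker's own helper, and the blob URL and friendly file name are placeholders taken from the question:
public ActionResult Download()
{
    // Resolve the blob via the existing helper and buffer it in memory.
    var blob = blobStorage.GetBlobRef("https://mysite.blob.core.windows.net/my-container/WERF3234435FFF_ERFas23E.doc");
    using (var memStream = new MemoryStream())
    {
        blob.DownloadToStream(memStream);

        // Returning a FileResult sets Content-Disposition with the friendly name for us.
        return File(memStream.ToArray(), System.Net.Mime.MediaTypeNames.Application.Octet, "abcd_New.doc");
    }
}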

Upload file with metadata and check in to SharePoint folder using the Client Object Model

Hi, I'm trying to upload a file to SharePoint 2010 using the client API with metadata and also check in the file after I'm done. Below is my code:
public void UploadDocument(SharePointFolder folder, String filename, Boolean overwrite)
{
    var fileInfo = new FileInfo(filename);
    var targetLocation = String.Format("{0}{1}{2}", folder.ServerRelativeUrl,
        Path.AltDirectorySeparatorChar, fileInfo.Name);

    using (var fs = new FileStream(filename, FileMode.Open))
    {
        SPFile.SaveBinaryDirect(mClientContext, targetLocation, fs, overwrite);
    }

    // doesn't work
    SPFile newFile = mRootWeb.GetFileByServerRelativeUrl(targetLocation);
    mClientContext.Load(newFile);
    mClientContext.ExecuteQuery();

    // check out to make sure not to create multiple versions
    newFile.CheckOut();
    // use OverwriteCheckIn type to make sure not to create multiple versions
    newFile.CheckIn("test", CheckinType.OverwriteCheckIn);
    mClientContext.Load(newFile);
    mClientContext.ExecuteQuery();

    //SPFile uploadFile = mRootWeb.GetFileByServerRelativeUrl(targetLocation);
    //uploadFile.CheckOut();
    //uploadFile.CheckIn("SOME VERSION COMMENT I'D LIKE TO ADD", CheckinType.OverwriteCheckIn);
    //mClientContext.ExecuteQuery();
}
I'm able to upload the file, but I can't add any metadata and the file is left checked out. I want to add some metadata and check in the file after I'm done.
My SharePointFolder class has the ServerRelativeUrl of the folder path to upload to. Any help greatly appreciated.
You need to set credentials before calling ExecuteQuery() and SaveBinaryDirect().
For example:
mClientContext.Credentials = new NetworkCredential("LoginID","LoginPW", "LoginDomain");
SPFile newFile = mRootWeb.GetFileByServerRelativeUrl(targetLocation);
mClientContext.Load(newFile);
mClientContext.ExecuteQuery();
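For instance, in the context of the asker's UploadDocument method, the credentials would be assigned on mClientContext once, before any SaveBinaryDirect or ExecuteQuery call runs. A sketch, with the site URL and login values as placeholders:
// Sketch: assign credentials up front so both SaveBinaryDirect and ExecuteQuery are authenticated.
mClientContext = new ClientContext("http://yoursharepointsite");  // placeholder site URL
mClientContext.Credentials = new NetworkCredential("LoginID", "LoginPW", "LoginDomain");

// ...later, inside UploadDocument, the existing calls work unchanged:
// SPFile.SaveBinaryDirect(mClientContext, targetLocation, fs, overwrite);
// mClientContext.ExecuteQuery();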

Nutch to extract only PDF files

Is there any way to apply one urlfilter for levels 1-5 and a different urlfilter from level 5 onwards? I need to extract PDF files, which appear only after a given level (just to experiment).
The PDF files are stored in binary format in the crawl/segment folder. I would like to extract these PDF files and store them all in one folder. I have been able to write a Java program that identifies a PDF file, but I can't figure out how to write out a PDF file whose content keeps the same fonts, page numbers, images, etc.
perform crawl
merge segment data
run makePDF.java
This only identifies PDF files:
String uri = "/usr/local/nutch/framework/apache-nutch-1.6/merged572/20130407131335";
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create(uri), conf);
Path path = new Path(uri, Content.DIR_NAME + "/part-00000/data");
SequenceFile.Reader reader = null;
try {
    reader = new SequenceFile.Reader(fs, path, conf);
    Text key = new Text();
    Content content = new Content();
    while (reader.next(key, content)) {
        String contentType = content.getContentType();
        if (contentType.equalsIgnoreCase("application/pdf")) {
            // System.out.write(content.getContent(), 0, content.getContent().length);
            System.out.println(key);
        }
    }
    reader.close();
} finally {
    fs.close();
}
content.getContent() returns the content as bytes.
Just write those bytes to a file using a BufferedOutputStream and save it with a .pdf extension.