PDF header signature not found error?

I am working on an ASP.NET MVC application with Azure. When I upload a PDF document to Azure blob storage, it uploads fine using the code below:
var filename = Document.FileName;
var contenttype = Document.ContentType;
int pdfocument = Request.ContentLength;
// uploading the document into Azure blob storage
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("StorageConnectionString"));
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = blobClient.GetContainerReference("containername");
container.CreateIfNotExists();
var permissions = container.GetPermissions();
permissions.PublicAccess = BlobContainerPublicAccessType.Blob;
container.SetPermissions(permissions);
string uniqueBlobName = filename;
CloudBlockBlob blob = container.GetBlockBlobReference(uniqueBlobName);
blob.Properties.ContentType = ;
blob.UploadFromStream(Request.InputStream);
After uploading the document to the blob, I try to read the PDF document and get the error "PDF header signature not found." The code that throws the error is:
byte[] pdf = new byte[pdfocument];
HttpContext.Request.InputStream.Read(pdf, 0, pdfocument);
PdfReader pdfReader = new PdfReader(pdf); //error getting here
One more thing I forgot: if I comment out the code above (uploading the document to the Azure blob), I do not get the error.

In your combined use case you try to read Request.InputStream twice: once during upload and once later when reading it into your byte[] pdf. The first read consumes the stream to its end, so the second read most likely does not get any data at all.
Since you intend to read the PDF into memory anyway (the aforementioned byte[] pdf), you could, in your combined use case,
first read the data into that array
int pdfocument = Request.ContentLength;
byte[] pdf = new byte[pdfocument];
HttpContext.Request.InputStream.Read(pdf, 0, pdfocument); // note: Read may return fewer bytes than requested; see the read-fully sketch below
then upload that array using CloudBlob.UploadByteArray
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("StorageConnectionString"));
[...]
CloudBlockBlob blob = container.GetBlockBlobReference(uniqueBlobName);
blob.Properties.ContentType = ; // <--- something missing in your code...
blob.UploadByteArray(pdf); // <--- upload byte[] instead of stream
and then feed your PDF reader
PdfReader pdfReader = new PdfReader(pdf);
This way you read the stream only once, and a byte[] should be re-usable...
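One caveat about the Read call above: Stream.Read is not guaranteed to fill the buffer in a single call, so for large requests a single Read may stop short. A minimal read-fully sketch (the helper name is mine, not part of the original code):
private static byte[] ReadFully(Stream input, int expectedLength)
{
    byte[] buffer = new byte[expectedLength];
    int offset = 0;
    while (offset < expectedLength)
    {
        int read = input.Read(buffer, offset, expectedLength - offset);
        if (read == 0) break; // stream ended early
        offset += read;
    }
    return buffer;
}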

Azure blob storage: Azure Source blob size is set to Zero when I download the blob using DownloadRange API

I am trying to download an Azure blob using the Azure Data Movement library.
I am facing an issue where the Azure source blob size is set to zero when I download the source blob using the downloadRange API.
The destination file is downloaded correctly and its size is correct. Am I missing anything?
I am using the azure-storage Java SDK, version 8.6.5. Here is the sample code:
CloudStorageAccount storageAccount = CloudStorageAccount.parse("myConnectionString");
CloudBlobClient blobClient = storageAccount.createCloudBlobClient();
CloudBlobContainer container = blobClient.getContainerReference("myContainer");
CloudBlockBlob cloudBlockBlob = container.getBlockBlobReference("abc.txt");
And this is the loop where I am reading in chunks. The destination file size is correct; I just cannot figure out why the source blob size is set to zero. This is not reproducible when the entire content is downloaded using the download API:
try (OutputStream out = new ByteBufferBackedOutputStream(buffer)) {
    cloudBlockBlob.downloadRange(position, (long) buffer.capacity(), out);
}...
Thanks in Advance!!
If you just want to read a file in chunks via cloudBlockBlob.downloadRange, try the code below:
import java.io.ByteArrayOutputStream;

import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.blob.CloudBlobClient;
import com.microsoft.azure.storage.blob.CloudBlobContainer;
import com.microsoft.azure.storage.blob.CloudBlockBlob;

public class LegacyStorageSdk {
    public static void main(String[] args) throws Exception {
        CloudStorageAccount storageAccount = CloudStorageAccount.parse(
                "<storage account connection string>");
        CloudBlobClient blobClient = storageAccount.createCloudBlobClient();
        CloudBlobContainer container = blobClient.getContainerReference("<container name>");
        CloudBlockBlob cloudBlockBlob = container.getBlockBlobReference("<.txt file name>");

        // fetch the blob's properties so getLength() reports the actual size
        cloudBlockBlob.downloadAttributes();
        long totalSize = cloudBlockBlob.getProperties().getLength();
        long readSize = 2;
        for (long i = 0; i < totalSize; i += readSize) {
            ByteArrayOutputStream downloadStream = new ByteArrayOutputStream();
            // clamp the last chunk so the range never runs past the end of the blob
            cloudBlockBlob.downloadRange(i, Math.min(readSize, totalSize - i), downloadStream);
            System.out.println("===>" + downloadStream);
        }
    }
}
Let me know if you have any questions.

How do I set ContentType before upload to an Azure blob container? [duplicate]

This question already has answers here:
How upload blob in Azure Blob Storage with specified ContentType with .NET v12 SDK?
(2 answers)
Closed 2 years ago.
I'm using an asp.net-core web API to upload images to Azure Storage.
I was able to successfully upload an image blob to Azure Storage (using the quickstart). However, the content-type property in Azure is set to application/octet-stream. The problem with this is that the public URL will not load in a browser due to this content type, and I plan to eventually consume this URL/image in my website. Is there any way to specify the content type as image/jpeg? I've also tried the following code but received the error "404 (The specified blob does not exist.)" during the SetHttpHeaders call (the UploadBlob call that is currently commented out does work, but leaves the octet-stream content type).
BlobClient blobClient = containerClient.GetBlobClient(guids[index]);
using (var content = file.OpenReadStream())
{
    blobClient.Upload(content);
    blobClient.SetHttpHeaders(new BlobHttpHeaders() { ContentType = "image/jpeg" });
    //containerClient.UploadBlob(guids[index], content);
}
I don't set a specific content type; with a blob name of the form filename.[ext], the download works fine. When creating the reference to the BlobClient, include the file extension. Download example:
var fileName = $"{guids[index]}.jpg";
var pathStorage = Path.Combine(path, fileName);
BlobClient blobClient = containerClient.GetBlobClient(pathStorage);
BlobDownloadInfo download = await blobClient.DownloadAsync();
byte[] bytesContent;
using (var ms = new MemoryStream())
{
    await download.Content.CopyToAsync(ms);
    bytesContent = ms.ToArray();
}
return bytesContent;
Example upload:
var fileName = $"{guids[index]}.jpg";
var pathStorage = Path.Combine(path, fileName);
BlobClient blobClient = containerClient.GetBlobClient(pathStorage);
var stream = new MemoryStream(bytesContent);
var uploadInfo = await blobClient.UploadAsync(stream);
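As the linked duplicate suggests, the content type can also be set during the upload itself with the v12 SDK, which avoids the separate SetHttpHeaders round trip that produced the 404 above. A minimal sketch, assuming the same blobClient and file as in the question:
var headers = new BlobHttpHeaders { ContentType = "image/jpeg" };
using (var content = file.OpenReadStream())
{
    // BlobUploadOptions lets the headers travel with the upload request itself
    await blobClient.UploadAsync(content, new BlobUploadOptions { HttpHeaders = headers });
}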

Can you download a specific Block from an Azure Block Blob?

Is it possible to download a specific block from an Azure Block Blob if you know the Block Id?
Yes, you absolutely can. Here's an example of how to download the first block:
var storageAccount = CloudStorageAccount.Parse("DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net");
CloudBlobClient serviceClient = storageAccount.CreateCloudBlobClient();
var container = serviceClient.GetContainerReference("containerName");
var blockBlob = container.GetBlockBlobReference("blobName");

var blocklist = await blockBlob.DownloadBlockListAsync();
var firstBlock = blocklist.First();

var memStream = new MemoryStream();
await blockBlob.DownloadRangeToStreamAsync(memStream, 0, firstBlock.Length);
memStream.Seek(0, SeekOrigin.Begin); // rewind before reading, or the reader sees an empty stream

string contents;
using (var streamReader = new StreamReader(memStream))
{
    contents = await streamReader.ReadToEndAsync();
}
You will need the Microsoft.WindowsAzure.Storage package from NuGet; the blob types live in its Microsoft.WindowsAzure.Storage.Blob namespace.
You could leverage the Microsoft Azure Storage SDK to get started with Azure Blob Storage quickly. The Azure Storage SDK is a wrapper over the Blob service REST API, and the official documentation for that REST API offers nothing for downloading a specific block by its block Id. However, you can use Get Blob to download the bytes of your blob in a specified range by passing the offset and the length of the data to download.
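Building on the range trick above: because no operation fetches a block by its Id, the offset of the n-th committed block is the sum of the lengths of the blocks before it in the block list. A minimal sketch (the index n and the variable names are illustrative):
// download the n-th block by computing its byte offset from the committed block list
int n = 2;
var blocks = (await blockBlob.DownloadBlockListAsync()).ToList();
long offset = blocks.Take(n).Sum(b => b.Length);
var blockStream = new MemoryStream();
await blockBlob.DownloadRangeToStreamAsync(blockStream, offset, blocks[n].Length);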

Stream pdfs from url and add it to Zip

I have an MVC 4.5 application where I show a grid. The first column of the grid is a document name. The document name is a hyperlink to the actual document, which is hosted on our site and is available via a URL. The documents can be PDF, DOC, or PPT. I can access these documents only via URL; I do not have access to the actual physical documents on our server.
I am giving users an option to select one or more of these documents from the grid and then download them. What I am trying to achieve is to read each of the selected documents via its URL, write it to a zip file, and make the zip file downloadable, so users download one file instead of many.
I have tried to stream the documents via URL into memory and then add them to the zip file using Microsoft's ZipArchive library. This is not working for me.
I was able to add documents that were on disk to a zip file using ZipArchive, and it works great. But I do not have access to the physical documents, as I can access them only through a URL. My next option is to download each of these documents to a temp location on the server and then add them to the zip file using ZipArchive, but I am trying to avoid downloading files to a temp location.
Please suggest how I can read documents via URL into memory, add each of them to a zip file, and make the zip file downloadable.
Any help will be appreciated.
Thank you Cbroe for commenting. I figured out the answer. The problem was that I was reading the PDF from the URL into a memory stream and then trying to add the memory stream to the ZipArchive, which was not working. Instead, I extracted the byte array out of the memory stream and added that to the zip archive, and it worked.
Here is the code snippet, which might be useful for someone. My first contribution to Stack Overflow.
public FileResult DownloadZip()
{
    MemoryStream memoryStream = new MemoryStream();
    using (var archive = new ZipArchive(memoryStream, ZipArchiveMode.Create, true))
    {
        var demoFile = archive.CreateEntry("Pdf123.pdf");
        var convertedStream = ConvertTobyte("http://www.example.com/Pdf123.pdf");
        using (var entryStream = demoFile.Open())
        {
            entryStream.Write(convertedStream, 0, convertedStream.Length);
        }

        demoFile = archive.CreateEntry("Pdf456.pdf");
        convertedStream = ConvertTobyte("http://www.example.com/Pdf456.pdf");
        using (var entryStream = demoFile.Open())
        {
            entryStream.Write(convertedStream, 0, convertedStream.Length);
        }
    }

    // This option is to write the zip to your local disk
    using (var fileStream = new FileStream(@"C:\Temp\test.zip", FileMode.Create))
    {
        memoryStream.Seek(0, SeekOrigin.Begin);
        memoryStream.CopyTo(fileStream);
    }

    // This option is to download the zip via the browser
    memoryStream.Seek(0, SeekOrigin.Begin);
    return new FileStreamResult(memoryStream, "application/zip")
    {
        FileDownloadName = "Archive.zip"
    };
}

private static byte[] ConvertTobyte(string fileUrl)
{
    byte[] imageData = null;
    using (var wc = new System.Net.WebClient())
        imageData = wc.DownloadData(fileUrl);
    return imageData;
}
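For what it's worth, the earlier MemoryStream attempt probably failed because the stream's Position was still at its end when it was handed to the archive; resetting it would likely also have worked. Another option is to skip the intermediate buffer entirely and copy the download straight into the entry. A sketch, reusing the names and URL from the example above:
using (var wc = new System.Net.WebClient())
using (var source = wc.OpenRead("http://www.example.com/Pdf123.pdf"))
using (var entryStream = demoFile.Open())
{
    // stream the remote document directly into the zip entry
    source.CopyTo(entryStream);
}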

Windows Azure UploadFromStream No Longer Works After Porting to MVC4 - Pointers?

Updated my MVC3/.NET 4.5/Azure solution to MVC4.
My code for uploading an image to blob storage appears to fail every time in the upgraded MVC4 solution; the same code in my MVC3 solution works fine. The code that does the uploading, which lives in a DLL, has not changed.
I have uploaded the same image file in both the MVC3 and MVC4 solutions, and I have inspected the stream, which appears to be fine. In both instances I am running the code locally on my machine, and my connections point to blob storage in the cloud.
Any pointers for debugging? Any known issues that I may not be aware of when upgrading to MVC4?
Here is my upload code:
public string AddImage(string pathName, string fileName, Stream image)
{
    var client = _storageAccount.CreateCloudBlobClient();
    client.RetryPolicy = RetryPolicies.Retry(3, TimeSpan.FromSeconds(5));
    var container = client.GetContainerReference(AzureStorageNames.ImagesBlobContainerName);
    image.Seek(0, SeekOrigin.Begin);
    var blob = container.GetBlobReference(Path.Combine(pathName, fileName));
    blob.Properties.ContentType = "image/jpeg";
    blob.UploadFromStream(image);
    return blob.Uri.ToString();
}
I managed to fix it. For some reason, reading the stream directly from the HttpPostedFileBase wasn't working; simply copying it into a new MemoryStream solved it (presumably because UploadFromStream needs a stream whose length and position it can rely on).
My code:
public string StoreImage(string album, HttpPostedFileBase image)
{
    var blobStorage = storageAccount.CreateCloudBlobClient();
    var container = blobStorage.GetContainerReference("containerName");
    if (container.CreateIfNotExist())
    {
        // configure container for public access
        var permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }

    string uniqueBlobName = string.Format("{0}{1}", Guid.NewGuid().ToString(), Path.GetExtension(image.FileName)).ToLowerInvariant();
    CloudBlockBlob blob = container.GetBlockBlobReference(uniqueBlobName);
    blob.Properties.ContentType = image.ContentType;

    image.InputStream.Position = 0;
    using (var imageStream = new MemoryStream())
    {
        image.InputStream.CopyTo(imageStream);
        imageStream.Position = 0;
        blob.UploadFromStream(imageStream);
    }
    return blob.Uri.ToString();
}
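For reference, here is what the same workaround might look like folded back into the AddImage method from the question. A sketch only, with the retry policy kept as in the original:
public string AddImage(string pathName, string fileName, Stream image)
{
    var client = _storageAccount.CreateCloudBlobClient();
    client.RetryPolicy = RetryPolicies.Retry(3, TimeSpan.FromSeconds(5));
    var container = client.GetContainerReference(AzureStorageNames.ImagesBlobContainerName);
    var blob = container.GetBlobReference(Path.Combine(pathName, fileName));
    blob.Properties.ContentType = "image/jpeg";

    // copy the incoming stream into a seekable MemoryStream before uploading
    using (var buffer = new MemoryStream())
    {
        image.CopyTo(buffer);
        buffer.Position = 0;
        blob.UploadFromStream(buffer);
    }
    return blob.Uri.ToString();
}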