PDF Creation using Diagnostics.Process and Wkhtmltopdf slow on server

People,
I'm generating a PDF file within a .NET application using wkhtmltopdf inside a System.Diagnostics.Process.
While this process takes approx. 1 sec to run on my local machine (Win 10 Pro, 16 GB memory), once deployed to the server and using the same data it takes approx. 40 secs (Win Server 2012, 8 GB memory). The resulting PDF file in both cases is only about 34 KB.
Having timed each line of code, I have found that it is this line which takes all the time:
if (!process.WaitForExit(120000))
I have tried changing permissions on the output folder and also changing the output folder itself. I have also changed the identity on the IIS application pool.
With such a disparity in performance I'm not convinced it is a code issue, just configuration. Can anyone shed any light on this?
Slightly abbreviated code below.
I should also mention that I ran procmon on the server while this was running, and it appears to be doing very little after initially loading the program.
var temp = HttpContext.Current.Server.MapPath("~//temppdf//");
var outputPdfFilePath = Path.Combine(temp,
    String.Format("{0}.pdf", Guid.NewGuid()));
document.Url = "-"; // "-" tells wkhtmltopdf to read the HTML from stdin

ProcessStartInfo si;
StringBuilder paramsBuilder = new StringBuilder();
paramsBuilder.Append("--page-size A4 ");
paramsBuilder.Append("--zoom 1.000 ");
paramsBuilder.Append("--disable-smart-shrinking ");
paramsBuilder.AppendFormat("\"{0}\" \"{1}\"", document.Url, outputPdfFilePath);

si = new ProcessStartInfo();
si.CreateNoWindow = false;
si.FileName = environment.WkHtmlToPdfPath; // path to the exe in Program Files (x86)
si.Arguments = paramsBuilder.ToString();
si.UseShellExecute = false;
si.RedirectStandardError = false;
si.RedirectStandardInput = true;

try
{
    using (var process = new Process())
    {
        process.StartInfo = si;
        process.Start();

        if (document.Html != null)
        {
            // Write the HTML to wkhtmltopdf's stdin
            using (var stream = process.StandardInput)
            {
                byte[] buffer = Encoding.UTF8.GetBytes(document.Html);
                stream.BaseStream.Write(buffer, 0, buffer.Length);
                stream.WriteLine();
            }
        }

        if (!process.WaitForExit(120000))
            throw new PdfConvertTimeoutException();
        // This call above takes 1 sec locally and 42 secs on the server
    }
}
finally
{
    if (delete && File.Exists(outputPdfFilePath))
        File.Delete(outputPdfFilePath);
}

Well, I finally got to the bottom of this and it had absolutely nothing to do with the code. It transpires that the HTML I was trying to convert had links to images on a live website. While that website was accessible from my local machine, it was inaccessible from the server, so the delay was caused by timeouts while wkhtmltopdf tried to load the unreachable images.
Doh!!!
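For anyone hitting the same symptom: before blaming WaitForExit, it can help to check from the server itself whether the external resources referenced in the HTML are reachable. Below is a minimal diagnostic sketch (the image URL is a placeholder, not from the original post, and it needs using System.Net); wkhtmltopdf also has switches such as --no-images that skip image loading entirely, which can help confirm the diagnosis.

// Quick reachability probe, assuming a placeholder image URL taken from the HTML.
// If this times out on the server, wkhtmltopdf will stall in the same way.
var probe = (HttpWebRequest)WebRequest.Create("https://example.com/images/logo.png");
probe.Method = "HEAD";
probe.Timeout = 3000; // fail fast instead of waiting out the default timeout
try
{
    using (var response = (HttpWebResponse)probe.GetResponse())
    {
        Console.WriteLine("Image reachable: " + response.StatusCode);
    }
}
catch (WebException ex)
{
    Console.WriteLine("Image not reachable from this machine: " + ex.Status);
}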

Related

Very slow AWS presignedURL download. How to speed it up?

I am reading 20-30 different objects of varying size using the S3 IAM and a unique presigned URL for each file. The download of all files occurs at once. Each phase occurs in sequence. Unfortunately the S3Client is not thread safe, so we cannot use async operations. Some files transfer rapidly while others lag. The total operation can take anywhere from 7 to more than 15 seconds. I expected greater performance from S3 since AWS advertises that it has high throughput.
I see several posts that are unanswered about the download performance from S3. However, the problem seems to have increased once we introduced link ambiguation using the IAM and presigned URLs.
FYI, my internet connection is broadband, so it is unlikely to be the cause of the performance issue.
The tests are performed only a few hundred miles from the S3 storage, which eliminates distance as a factor in the performance issues.
There is no server between the client and S3 for downloading objects, so that is not the cause of the performance issues either.
One caveat: we tried using async forAllChunked from the Rice.edu Habanero API. When we did not have any errors due to threading problems, the download performance was still very slow. This should seemingly eliminate the idea that download performance is slow due to its serialization in the for loop, albeit performance should be far better if we can download files simultaneously.
Code attached.
public void cloudGetMedia(ArrayList<MediaSyncObj> mediaObjs, ArrayList<String> signedUrls) {
    long getTime = System.currentTimeMillis();
    // Ensure media directory exists or create it
    String toDiskDir = DirectoryMgr.getMediaPath('M');
    File diskFile = new File(toDiskDir);
    FileOpsUtil.folderExists(diskFile);
    // Process signedURLs
    for (String signedurl : signedUrls) {
        LOGGER.debug("cloudGetMedia called. signedURL is null: {}", signedurl == null);
        URI fileToBeDownloaded = null;
        try {
            fileToBeDownloaded = new URI(signedurl);
        } catch (URISyntaxException e) {
            e.printStackTrace();
        }
        // get the file name from the presignedURL
        AmazonS3URI s3URI = new AmazonS3URI(fileToBeDownloaded);
        String localURL = toDiskDir + "/" + s3URI.getKey();
        File file = new File(localURL);
        AmazonS3 client = AmazonS3ClientBuilder.standard()
                .withRegion(s3URI.getRegion())
                .build();
        try {
            URL url = new URL(signedurl);
            PresignedUrlDownloadRequest req = new PresignedUrlDownloadRequest(url);
            client.download(req, file);
        } catch (MalformedURLException e) {
            LOGGER.warn(e.getMessage());
            e.printStackTrace();
        }
    }
    getTime = (System.currentTimeMillis() - getTime);
    LOGGER.debug("Total get time in syncCloudMediaAction: {} milliseconds, numElement: {}", getTime, signedUrls.size());
}

Too many files open when using generic packager with external packager.xml file

I am using jPOS 2.1.0 with an external packager XML file for an ISO 8583 client. Due to the large number of requests over two or three days, I encountered "Too Many Files Open", even though I have set ulimit -n to 50000. I suspect the packager files are not being closed properly, which is why this limit has been exceeded. Please help me close the open files properly.
JposLogger logger = new JposLogger(isoLogLocation);
org.jpos.iso.ISOPackager customPackager = new GenericPackager(isoPackagerLocation + iso8583Properties.getPackager());
BaseChannel channel = new ASCIIChannel(iso8583Properties.getServerIp(), Integer.parseInt(iso8583Properties.getServerPort()), customPackager);
logger.jposlogconfig(channel);
try {
    channel.setTimeout(45000);
    channel.connect();
} catch (Exception ex) {
    log4j.error(ex.getMessage());
    throw new ConnectIpsException("Unable to establish connection with bank.");
}
log4j.info("Connection established using ASCIIChannel");
ISOMsg m = new ISOMsg();
m.set(0, "1200");
........
m.set(126, "connectIPS");
m.setPackager(customPackager);
log4j.info(ISOUtil.hexdump(m.pack()));
channel.send(m);
log4j.info("Message has been send");
ISOMsg r = channel.receive();
r.setPackager(customPackager);
log4j.info(ISOUtil.hexdump(r.pack()));
String actionCode = (String) r.getValue("39");
channel.disconnect();
return bancsxfr;
}
You know when you open a file, a socket, or a channel, you need to close it, right?
I don't see a finally in your try that would close the channel.
You have a huge leak there.

Uploading and downloading large files ~50gb using ASP Core 2.2 api

I'm struggling to provide the ability in my ASP Core 2.2 app to upload and download large files, up to 50 GB. Currently, for testing purposes, I'm saving the files on local storage, but in the future I will move them to some cloud storage provider.
Files will be sent by another server written in Java; more specifically, it will be a Jenkins plugin that sends project builds to my ASP Core server using this library.
Currently, I use a classic Controller class with HttpPost to upload the files, but this doesn't seem like the best solution for my purposes, since I won't use any webpage to attach files from the client.
[HttpPost]
[RequestFormLimits(MultipartBodyLengthLimit = 50000000000)]
[RequestSizeLimit(50000000000)]
[AllowAnonymous]
[Route("[controller]/upload")]
public async Task<IActionResult> Upload()
{
    // Accessing Request.Form.Files buffers the whole multipart body
    // before the loop below ever starts
    var files = Request.Form.Files;
    SetProgress(HttpContext.Session, 0);
    long totalBytes = files.Sum(f => f.Length);
    if (!IsMultipartContentType(HttpContext.Request.ContentType))
        return StatusCode(415);

    foreach (IFormFile file in files)
    {
        ContentDispositionHeaderValue contentDispositionHeaderValue =
            ContentDispositionHeaderValue.Parse(file.ContentDisposition);
        string filename = contentDispositionHeaderValue.FileName.Trim().ToString();
        byte[] buffer = new byte[16 * 1024];
        using (FileStream output = System.IO.File.Create(GetPathAndFilename(filename)))
        {
            using (Stream input = file.OpenReadStream())
            {
                long totalReadBytes = 0;
                int readBytes;
                while ((readBytes = input.Read(buffer, 0, buffer.Length)) > 0)
                {
                    await output.WriteAsync(buffer, 0, readBytes);
                    totalReadBytes += readBytes;
                    int progress = (int)((float)totalReadBytes / (float)totalBytes * 100.0);
                    SetProgress(HttpContext.Session, progress);
                    Log($"SetProgress: {progress}", @"\LogSet.txt");
                    // 100 ms pause per 16 KB chunk; this alone adds minutes
                    // for every few hundred MB written
                    await Task.Delay(100);
                }
            }
        }
    }
    return Content("success");
}
I'm using this code now to upload files, but for larger files (>300 MB) it takes ages to start the upload.
I tried looking for many articles on how to achieve this, such as:
Official docs
or
Stack
But none of the solutions seem to work for me, since the upload takes ages, and I also noticed that for files of ~200 MB (the largest file I could upload so far) the more data is uploaded, the slower my PC gets.
I need a piece of advice on whether I am following the right path or should change my approach. Thank you.
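For reference, the streaming approach described in the official docs avoids Request.Form entirely and copies each multipart section straight to disk as it arrives. The sketch below is only an outline of that idea under the same assumptions as the code above (it reuses the GetPathAndFilename helper from that snippet and needs Microsoft.AspNetCore.WebUtilities and Microsoft.Net.Http.Headers); the docs additionally use a custom DisableFormValueModelBinding filter and more validation, which are omitted here.

[HttpPost]
[RequestSizeLimit(50000000000)]
[AllowAnonymous]
[Route("[controller]/uploadstream")]
public async Task<IActionResult> UploadStream()
{
    // Read the multipart boundary from the Content-Type header and walk the sections
    var mediaType = MediaTypeHeaderValue.Parse(Request.ContentType);
    var boundary = HeaderUtilities.RemoveQuotes(mediaType.Boundary).Value;
    var reader = new MultipartReader(boundary, Request.Body);

    MultipartSection section;
    while ((section = await reader.ReadNextSectionAsync()) != null)
    {
        if (ContentDispositionHeaderValue.TryParse(section.ContentDisposition,
                out var contentDisposition)
            && contentDisposition.FileName.HasValue)
        {
            var filename = contentDisposition.FileName.Value.Trim('"');
            using (var output = System.IO.File.Create(GetPathAndFilename(filename)))
            {
                // Stream straight from the request body to disk: no buffering of
                // the whole file and no artificial per-chunk delay
                await section.Body.CopyToAsync(output);
            }
        }
    }
    return Content("success");
}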

Download files from S3 in parallel (AWS .NET SDK)

I can't get AmazonS3Client.GetObject to download files in parallel. The code is as follows:
public async Task<string> ReadFile(string filename)
{
    string filePath = config.RootFolderPath + filename;
    var sw = Stopwatch.StartNew();
    Console.WriteLine(filePath + " - start");
    using (var response = await s3Client.GetObjectAsync(config.Bucket, filePath))
    {
        Console.WriteLine(filePath + " - request - " + sw.ElapsedMilliseconds);
        using (var reader = new StreamReader(response.ResponseStream))
        {
            return await reader.ReadToEndAsync();
        }
    }
}
This is called like this:
var tasks = (from file in files select ReadFile(file)).ToArray();
await Task.WhenAll(tasks);
This results in the requests being returned sequentially (though not in order). I read about 50 tiny files, so that takes about 25 seconds, hanging in GetObjectAsync for the last read. Instead I hoped I could read the 50 files in 2-3 seconds.
I've already verified:
I'm on the task pool, so the synchronization context isn't in the mix. I also added ConfigureAwait(false) to the tasks, but that didn't make a difference, as expected.
I've tried various settings on the AmazonS3Client, like using the HTTP protocol or changing the buffer size, without success.
I added a stopwatch to verify the problem isn't around reading the response stream. However, when not reading the response stream, the whole method returns quickly.

Application freezes when downloading a webpage

I wrote a function to download a webpage, like this:
public string GetWebPage(string sURL)
{
    System.Net.WebResponse objResponse = null;
    System.Net.WebRequest objRequest = null;
    System.IO.StreamReader objStreamReader = null;
    string sResultPage = null;
    try
    {
        objRequest = System.Net.HttpWebRequest.Create(sURL);
        // GetResponse() and ReadToEnd() both block the calling thread
        // until the whole page has been downloaded
        objResponse = objRequest.GetResponse();
        objStreamReader = new System.IO.StreamReader(objResponse.GetResponseStream());
        sResultPage = objStreamReader.ReadToEnd();
        return sResultPage;
    }
    catch (Exception ex)
    {
        return "";
    }
}
But my problem is that while this function is working, the application freezes (not responding) and the user can't do anything. How can I solve this so that the user can do other things in my application while the download is running?
Welcome to the world of blocking IO.
Consider the following:
You want your program to download a web page and then return the first 10 letters it finds in the source html. Your code might look like this:
...
string page = GetWebPage("http://example.com"); // download web page
page = page.Substring(0, 10);
Console.WriteLine(page);
....
When your program calls GetWebPage(), it must WAIT for the web page to be fully downloaded before it can possibly try to call Substring() - else it may try to get the substring before it actually downloads the letters.
Now consider your program. You've got lots of code - maybe a GUI interface running - and it's all executing line by line one instruction at a time. When your code calls GetWebPage(), it can't possibly continue executing additional code until that request is fully finished. Your entire program is waiting on that request to finish.
The problem can be solved in a few different ways, and the best solution depends on exactly what you're doing with your code. Ideally, your code needs to execute asynchronously. C# has methods that can handle a lot of this for you, but one way or another, you're going to want to start some work - downloading the web page in your case - and then continue executing code until your main thread is notified that the webpage is fully downloaded. Then your main thread can begin parsing the return value.
I'm assuming that since you've asked this question, you are very new to threads and concurrency in general. You have a lot of work to do. Here are some resources for you to read up on threading and implementing concurrency in C# (a short async sketch follows after these links):
C# Thread Introduction
.NET Asynchronous IO Design
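To make that idea concrete, here is a minimal sketch using async/await and HttpClient (assuming a .NET version where both are available; it is only an illustration, not the only way to do it):

// A minimal non-blocking version of GetWebPage.
// Requires using System.Net.Http and using System.Threading.Tasks.
private static readonly HttpClient httpClient = new HttpClient();

public async Task<string> GetWebPageAsync(string sURL)
{
    try
    {
        // The await releases the calling (UI) thread while the request is in flight,
        // so the rest of the application stays responsive.
        return await httpClient.GetStringAsync(sURL);
    }
    catch (HttpRequestException)
    {
        return "";
    }
}

// Usage, e.g. from an async button click handler:
// string page = await GetWebPageAsync("http://example.com");
// page = page.Substring(0, 10);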
The best way is to use a thread:
new Thread(download).Start(url);
And if the page you are downloading is large, use chunked reads:
HttpWebRequest ObjHttpWebRequest = (HttpWebRequest)HttpWebRequest.Create(Convert.ToString(url));
ObjHttpWebRequest.AddRange(99204);
ObjHttpWebRequest.Timeout = Timeout.Infinite;
ObjHttpWebRequest.Method = "GET";
HttpWebResponse ObjHttpWebResponse = (HttpWebResponse)ObjHttpWebRequest.GetResponse();
Stream ObjStream = ObjHttpWebResponse.GetResponseStream();

string downloaddata = "";
byte[] buffer = new byte[1224];
int length = 0;
while ((length = ObjStream.Read(buffer, 0, buffer.Length)) > 0)
{
    // Decode only the bytes actually read in this chunk
    downloaddata += Encoding.GetEncoding(936).GetString(buffer, 0, length);
}