MaxMind: loading GeoIP2 mmdb into memory for fast read

I am using MaxMind's GeoIP2 to get the geo information for an IP address. In my Java web application, I build the DatabaseReader like this:
DatabaseReader reader = new DatabaseReader.Builder(new File("C:\\GeoLite2-City.mmdb")).withCache(new CHMCache()).build();
I am hoping to load the entire file into memory for efficient, fast reads.
Is the approach shown above the most efficient way of using the mmdb database?

The code you pasted will memory-map the file and use the data cache. It should be efficient, but it will not load the whole database into memory. If you want to do that, you would need to load the database using the fileMode builder option, e.g.:
DatabaseReader reader = new DatabaseReader.Builder(new File("C:\\GeoLite2-City.mmdb"))
        .fileMode(com.maxmind.db.Reader.FileMode.MEMORY)
        .withCache(new CHMCache())
        .build();
However, in most cases, you will probably not see a performance difference between this and the memory-mapped file.
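Putting it together, a typical lookup with the fully in-memory reader might look something like this (the database path and the IP address are just placeholders):
import java.io.File;
import java.net.InetAddress;

import com.maxmind.db.CHMCache;
import com.maxmind.geoip2.DatabaseReader;
import com.maxmind.geoip2.model.CityResponse;

public class GeoLookup {
    public static void main(String[] args) throws Exception {
        // Load the whole database into memory and cache repeated lookups.
        DatabaseReader reader = new DatabaseReader.Builder(new File("C:\\GeoLite2-City.mmdb"))
                .fileMode(com.maxmind.db.Reader.FileMode.MEMORY)
                .withCache(new CHMCache())
                .build();

        // The reader is thread-safe and relatively expensive to build, so reuse a single instance.
        CityResponse response = reader.city(InetAddress.getByName("128.101.101.101"));
        System.out.println(response.getCountry().getIsoCode());
        System.out.println(response.getCity().getName());
    }
}
Build the reader once (for example in a servlet context listener) and share it across requests, rather than constructing it per lookup.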

Related

Saving data so it is persistent in Objective-C

Over many different Objective-C iOS coding projects I have frequently come across the issue of keeping data accessible after I initially obtain it.
For example, currently I am reading from the stackoverflow API. I do this with a session and get a dictionary back (my JSON response).
But outside the scope of the session, the dictionary is unavailable! I can't copy the contents to a different dictionary that I've defined globally, or anything. It's like it disappears outside of the session.
So I am wondering, what's the best way to save this data that I want to use? From what I've been reading it seems like NSUserDefaults or maybe creating a plist file, although admittedly I've been having trouble with both options. If there is a method that is best for this then I can concentrate on that.
Thank you!
It depends on how persistent you need the data to be.
If you save this dictionary into a global variable, it is stored in the part of the device's RAM that is reserved for the running app. When the app stops running (it gets killed by the OS or removed by the user) or the device reboots, this memory is lost.
If you save this dictionary to the device's flash storage (its file system), it will survive restarts and reboots.
Usually people combine the approaches: when you get the data from the network, you keep it in a global variable and also save it to the file system. After an app restart you try to load the data from the file system. The reason for not using the file system all the time is that it is much slower than RAM access. I guess I'm describing caching.
Note that you can implement caching manually (using plain data or text files, NSUserDefaults, Core Data or other libraries), but you can also use the built-in HTTP cache, NSURLCache. If you create a session with NSURLSession.sharedSession, it will use the default NSURLCache and respect the caching policy dictated by the server side.
For more control and full offline support I'd recommend implementing caching manually. See this about reading and writing plists and writeToFile:atomically:.

Intercepting File Writes on OS X

I have a program that generates information from the contents of files. However, I believe it would be more efficient to do this as the files are being written, rather than reading the contents back after some delay, since I could generate the data while the file is being written to disk.
What method(s) are available for an application to hook into the file-write process, i.e. to process the data stream as it's being written to disk? Also, which of these (if any) are allowed for App Store apps?
I've been considering using a Spotlight importer; however, this still involves reading the contents of a file after they've been written, in which case I'm relying on the file still being in the RAM cache to reduce disk access.

PDFNet SDK Convert files on Azure storage

I have a web app that needs to convert PDFs to XODs (PDFTron's format for displaying documents in their WebViewer). My web app is hosted on Azure, and the PDFs are on Azure Storage. We would like to go with the on-premises conversion via the PDFNet SDK (http://www.pdftron.com/webviewer/getstarted.html, see “Choosing a deployment model for conversions”); my code so far is the following:
WebRequest req = HttpWebRequest.Create("url of PDF on Azure Storage here");
using (Stream stream = req.GetResponse().GetResponseStream())
{
    PDFDoc pdfdoc = new PDFDoc(stream);
    var converted = pdftron.PDF.Convert.ToXod(pdfdoc);
    //pdfdoc.Save(stream, pdftron.SDF.SDFDoc.SaveOptions.e_linearized); //not clear to me
}
My approach here is to create a stream from the file on azure storage and convert that stream to XOD. I still don’t know if I should call “Save” and in that case where the file would be saved.
My questions are:
Since everything runs in the cloud, does it make sense to use the CloudAPI instead of the self-hosted solution, or does it not make any difference?
In both cases, where is the converted file stored (since I am getting it from Azure Storage and not from a local server)? I would then need to move it to my Azure storage account. Does the file get saved locally (meaning on the web/worker role that is processing it) and therefore need to be moved to the storage?
Here (http://www.pdftron.com/pdfnet/samplecode.html) there are conversion code samples but they all use files on local machine, which would not be my case.
Since everything runs in the cloud, does it make sense to use the CloudAPI instead of the self-hosted solution, or does it not make any difference?
In both cases, where is the converted file stored [...]
If you were to go with the cloud solution, you would transfer your files to PDFTron's servers, where they will be converted. Then you would download the converted files.
If you were to go with the on-premises solution, you would need to run DocPub CLI (https://www.pdftron.com/docpub/downloads.html) on your Azure instance, and its only communication with PDFTron would be to increment the billing counter for your PWS account (https://www.pdftron.com/pws/index.html).
You'd have to decide for yourself which solution works best for you.
Here (http://www.pdftron.com/pdfnet/samplecode.html) there are conversion code samples but they all use files on the local machine, which would not be my case.
[Note: these samples show how to use the PDFNet SDK to run conversions. To run PDFNet you would need an additional license. So you probably want to use DocPub CLI or the cloud converter instead.]
The samples show how to convert the files locally, since XOD conversion needs to be run server-side. Most people do this by setting up a web service to which PDF (or other format) files are uploaded. They then convert the documents server-side and place the converted XOD files somewhere the WebViewer can serve them.
After some extra research I found out that I can get and set the streams of the source and destination files (even if the destination file does not exist yet) directly on Azure without downloading the file. The resulting code is then something like:
using (var sourceStream = sourceBlob.OpenRead())
{
    var destinationContainer = BlobClient.GetContainerReference(projectKey);
    var destinationBlob = destinationContainer.GetBlockBlobReference(xodName);
    using (var destinationStream = destinationBlob.OpenWrite())
    {
        var pdfDoc = new PDFDoc(sourceStream);
        pdftron.PDF.Convert.ToXod(pdfDoc);
        pdfDoc.Save(destinationStream, pdftron.SDF.SDFDoc.SaveOptions.e_linearized);
    }
}

Azure Blob Storage: CloudBlockBlob.DownloadToStream throws OutOfMemory exception

This happens on my Azure Storage Emulator (I have not tried on real Azure Storage yet). I'm saving files to Blob Storage. I don't have any problem with smaller files (e.g. <= 107 MB). However, for bigger files (e.g. >= 114 MB), I can upload the file without error, but I get an out-of-memory exception when trying to download it.
public Stream GetStream(string fileName)
{
    var blob = GetCloudBlobContainer().GetBlockBlobReference(fileName);
    if (blob.Exists())
    {
        Stream stream = new MemoryStream();
        blob.DownloadToStream(stream);
        return stream;
    }
    return null;
}
The exception is thrown on call blob.DownloadToStream(stream).
How can I fix this problem?
UPDATE:
Okay, I found a workaround for my case. Instead of returning a stream, I can save the blob to a local file directly (I need it anyway) using blob.DownloadToFile(), which works fine. However, I'm still interested in finding a solution to this problem.
MemoryStream stores all your data in memory, and the fact that DownloadToFile works for you means that your machine might be running out of memory when trying to hold the blob in memory.
As for uploads, if you upload directly from a file on your file system to a blob, we do not load the whole file into memory, so you will not hit the same problem as on download.
In addition to Vinay's answer above, I would suggest that you also keep the "Performance" and "Processes" tabs of Windows Task Manager open to monitor memory usage while downloading.

downloading huge files - application using grails

I am developing a RESTful web service that allows users to download data in csv and json formats that is dynamically retrieved from the database.
Right now I am using a StringWriter to write out the CSV data. My major concern is that the result set could get very large depending on the user input. In that case, having it all in memory doesn't seem like a good idea to me.
I am thinking of creating a temp file, but how do I make sure the file gets deleted soon after the download completes?
Is there a better way to do this?
Thanks for the help.
If memory is the issue, you could simply write to the response writer, which writes directly to the output stream. This way you're not storing anything (much) in memory and there's no need to write out temporary files:
// controller action for CSV download
def download = {
    response.setContentType("text/csv")
    response.setHeader("Content-disposition", "attachment;filename=downloadFile.csv")
    def out = response.writer // write straight to the response instead of buffering in memory
    def results = // get all your results
    results.each { result ->
        out << result.col1 << ',' << result.col2 // etc
        out << '\n'
    }
}
This writes out to the output stream as it is looping round your results.
In theory you can make this even more memory efficient by using a scrollable result set - see the "Using Scrollable Results" section of Querying with GORM - Criteria - and looping over that while writing out to the response writer. In theory this means you're also not loading all your DB results into memory, but in practice this may not work as expected if you're using MySQL (and its Java connector). Manually batching up the queries may work too (get DB rows 1-10000, write them out, get rows 10001-20000, and so on), as sketched below.
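To illustrate the manual batching idea in isolation (a hypothetical sketch, not Grails-specific: Row and fetchBatch stand in for whatever your data-access layer provides, e.g. a GORM list(max:, offset:) call):
import java.io.PrintWriter;
import java.util.List;

public class BatchedCsvExport {

    // Stand-in row type for whatever your query returns.
    record Row(String col1, String col2) {}

    // Stand-in for a paged query, e.g. SELECT ... LIMIT batchSize OFFSET offset.
    static List<Row> fetchBatch(int offset, int batchSize) {
        return List.of(); // replace with a real query
    }

    // Stream the results in fixed-size chunks so only one batch is in memory at a time.
    static void export(PrintWriter out) {
        final int batchSize = 10_000;
        int offset = 0;
        while (true) {
            List<Row> batch = fetchBatch(offset, batchSize);
            if (batch.isEmpty()) {
                break; // no more rows
            }
            for (Row row : batch) {
                out.print(row.col1());
                out.print(',');
                out.println(row.col2());
            }
            out.flush();         // push this chunk to the client
            offset += batchSize; // move to the next page
        }
    }
}
The same loop works from a Grails controller by passing response.writer as the PrintWriter.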
This kind of thing might be more difficult with JSON, depending on what library you're using to render your objects.
Well, the simplest solution to preventing temp files from sticking around too long would be a cron job that simply deletes any file in the temp directory that has a modified time older than, say, 1 hour.
If you want it to all be done within Grails, you could design a Quartz job to clean up files. This job could either do as described above (and simply check modification timestamps to decide what to delete) or you could run the job only "on demand" with a parameter of a file name to be deleted. Once the download action is called you could schedule the cleanup of that specific file for X minutes later (to allow enough time for a successful download). The job would then be in charge of simply deleting the file.
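A rough sketch of that age-based cleanup (whether it runs from cron or from a Quartz job; the directory path here is a made-up example):
import java.io.File;
import java.util.concurrent.TimeUnit;

public class TempFileCleanup {
    public static void main(String[] args) {
        File tempDir = new File("/tmp/csv-exports"); // example path, adjust to your setup
        long cutoff = System.currentTimeMillis() - TimeUnit.HOURS.toMillis(1);

        File[] files = tempDir.listFiles();
        if (files == null) {
            return; // directory missing or not readable
        }
        for (File file : files) {
            // Delete anything not modified in the last hour.
            if (file.isFile() && file.lastModified() < cutoff) {
                file.delete(); // best effort; returns false if deletion fails
            }
        }
    }
}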
Depending on the number of files involved, you can always use File.deleteOnExit() (http://download.oracle.com/javase/1.5.0/docs/api/java/io/File.html#deleteOnExit()) to ensure the file is blown away when the VM shuts down.
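For example (a minimal illustration; the file prefix and suffix are arbitrary):
import java.io.File;
import java.io.IOException;

public class TempExportFile {
    public static void main(String[] args) throws IOException {
        // Create the temp file for the export and ask the JVM to remove it on shutdown.
        File export = File.createTempFile("csv-export-", ".csv");
        export.deleteOnExit();
        // ... write the CSV data and stream the file to the client ...
    }
}
Keep in mind that deleteOnExit() only runs on a normal JVM shutdown, and a long-running server JVM may not restart often, so the scheduled cleanup described above is usually the safer bet for a web app.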
To create a temp file that gets automatically deleted after the session has expired, you can use the Session Temp Files plugin.