Bigger cookie-like files for local data storage (browser "caching" of complex structures) - ruby-on-rails-3

I am developing a browser based game, and I have a big map there. The terrain of the map is static. Therefore, I have some thousands of tiles that will not change (whether they represent a forest, a desert, whatever), just the players above it can change.
Hence, I wanted to store all my map in the player's computer. I am working with Ruby on Rails, and those map information are passed from the server to the javascript that runs on the user browser, in order to render a pretty map. But it makes me pretty sad to have a 200kb .html file, containing all those map related information.
What would be the simplest way to solve this issue? Cookies! Well. That's what I thought. A complete map information can get to almost 200kb (they are pretty big). A cookie can have at most 4kb.. I don't feel that the right way to achieve my objective is to create tons of cookies, one for each row of the map, for instance. Is there any more elegant way to have this static information lie on the player's browser, without creating lots of cookies? A way to cache it on his browser? I mean.. I can cache a 400kb image, why can't I cache a 200kb map structure?
Thanks in advance!
Fernando.

Well, HTML Local Storage gives you 5 MB (though data is stored *as strings*, so the actual amount of data you can fit in the container is likely a lot less than 5 MB.
This limit is oddly fluid. For one thing, it's just a recommended limit; and for another, i.e., Webkit-based browsers use UTF-16, which immediately cuts that in half (2.5 MB).
Browser support for Local Storage is good: IE, Firefox, Safari 4.0+, Chrome 4.0+, and Opera 10.5+. Both iPhone and Android are supported above versions 3.0 and 2.0. respectively.
Using Local Storage to preserve game state appears to be a proto-typical use case.
Finally, Paul Kinlan published an excellent step-by-step tutorial on HTML5Rocks, which i highly recommend (though it's a little more than a year old).

Have you considered storing it in a js file? Most browser will cache linked js files, allowing you to only serve it every once in a while. It would be very simple to deploy.

Related

Store tesseract OCR traineddata on S3?

I have an app hosted on Heroku. I seek to extract text from various PDFs. I'm currently using tesseract for this.
Since Heroku does not offer that much storage space and the .traineddata files are big in size (need to use all of them), is it possible to somehow store the tessdata language data on S3? I was not able to find any solution to this yet.
All I could find is that I can define the --tessdata-dir PATH, but that's for a directory.
Sadly, I'm not sure Heroku is a good fit for your needs if you can't make all the data fit within the heroku slug. Even if you could get it to work, it would be quite a performance hit.
You'd probably be better off setting the Tesseract as an API with it's own server(s), then sending whatever you need to that API from heroku (or moving the entire app over). Depending on the size of the rest of your app and how quickly Tesseract is growing in size, that might just mean Tesseract gets it's own heroku app with absolutely minimal dependencies or might mean moving that part of the app to AWS or something.

Cloudbees instance sizing

I am preparing to configure our production Cloudbees instance, and would like some advice on app-cell sizing. We will be using multiple instances with auto scaling enabled, but need to choose the app-cell size. This is obviously a compromise between horizontal and vertical scaling and the best choice will be different for each app.
Our app is a shopping service written in GWT, so there is one JSP and an initial download of the application HTML, JS, CSS, and Image files. After that, everything runs in the clients' browser with only search calls hitting the server and returning plain JSON with no serialization. Also, we are not using sessions, and do not need any type of state on the server, so the memory footprint should be low.
Given all of this, my gut is to go with a more horizontally scaled deployment with a larger number of smaller instances. Any suggestions would be welcome.
There is no general answer to this question. Application design and internal processing comes in the equation and changes the result. Only load test can give you answer about the best sizing for your application.
Statelessness implies you can scale horizontally very easily - so this opens up both ways to scale.
My suggestion is to fine which "container size" works well enough for a single instance (usually a memory restriction) and once you are happy with that, you can then scale horizontally based on usage.

Ajax request to SQL returns hundreds of images to JSON -> CSS Sprites? Base64?

We are writing a Web-based events calendar with thousands of theatre shows, festivals, and concerts in a SQL database.
The user goes to the Website, performs a search, the SQL server returns JSON, and jQuery code on the client side displays the 200+ events.
Our problem is that each event has an image. If I return URLs, the browser has to make 200+ HTTP GET requests for these tiny (2-4Kb) images.
It is tempting to pack all the images into one CSS sprite, but since the user searches can return any combination of images, this would have to be done dynamically and we'd lose control over caching. Every search would have to return every image in a brand new CSS sprite. Also, we want to avoid any solution that requires dynamically reading or writing from a filesystem: that would be way too slow.
I'd rather return binary images as part of the JSON, and keep track of which images we already have cached and which we really need. But I can't figure out how. It seems to require converting to Base64? Is there any example code on this?
Thanks for any help you can give! :)
The web site you are using (StackOverflow) has to provide 50 avatars for 50 questions shown in the questions page. How does it do it? Browser makes 50 requests.
I would say, you had better implement pagination so that the list does not get too big. Also you can load images on the background or when the user scrolls and gets to the image.
Keeping track of images downloaded is not our job, it is browser's. Our job is to make sure the URL is unique and consistent and response headers allow caching.
Also base64 will make the stream at least 33% longer while it's processing on the client side is not trivial - I have never seen an implementation but probably someone has done some javascript for it.
I believe all you need is just pagination.
It looks like the original poster has proceeded with essentially this solution on their own, however based on their comment about 33% space overhead, I don't think they observed an unexpected phenomenon that you get when base64-ing a bunch of images into one response entity and then gzipping it...believe it or not, this can easily produce an overall transfer size that is smaller than the total of the unencoded image files, even if each unencoded image was also separately gzipped.
I've covered this JSON technique for accelerated image presentation in depth here as an alternative to CSS sprites, complete with a couple live samples. It aims show that it is often a superior technique to CSS sprites.
Data is never random. For example you could name your sprites
date_genre_location.jpg
or however you organise your searches. This might be worth it. Maybe.
You'll need to do the math
Here is what we decided.
Creating sprites has some downsides:
To reduce image loss, we need to store the originals as PNGs instead of JPEGs on the server. This is going to add to the slowness, and there's already quite some slowness in creating dynamic CSS sprites with ImageMagick.
To reduce image size, the final CSS sprite must be a JPG, but JPGs are lossy and with a grid of images, things get weird at the edges as JPG tries to blend and compress. We can fix that by putting a blank border around all the images but even so, this adds complexity.
with CSS sprites, it becomes harder to do intelligent caching; we have to pack up all the images regardless of what we've sent in the past. (or we have to keep all the old CSS sprites regardless of whether they contain many or few images we still need).
I'm afraid we have too many combinations of date-category-location for precomputing CSS sprites, although surely we could handle part of the data this way
Thus our solution is to Base64 it, to actually send each image one by one. Despite the 33% overhead, it is far less complex to code and manage and when you take caching issues into account, it may even be less data transfer.
Thank you all for your advice on this!

Random-access data object in J2ME

I'm planning to develop a small J2ME utility for viewing local public transport schedules using a mobile phone. The data part for those is mostly a big bunch of numbers representing the times when the buses arrive or leave.
What I'm trying to figure out is what is the best way to store that data. The representation needs to
be considerably small (because of persistent storage limitations of a mobile phone)
fit into a single file (for the ease of updating the schedule database afterwards over HTTP) fit into a constant number of files, i.e. (routes.dat, times.dat, ..., agencies.dat), and not (schedule_111.dat, schedule_112.dat, ...)
have a random access ability (unserializing the whole data object into memory would be just too much for a mobile phone :))
if there's some library for accessing that data format, a Java implementation should be present
In other words, if you had to squeeze a big part of GTFS-like data into a mobile device, how would you do that?
Google Protocol Buffers seemed like a good candidate for defining data but it didn't have random access.
What would you suggest?
Persistent storage on J2ME is a tricky business; see this related question for more general background: Best practice for storing large amounts of data with J2ME
In my experience, J2ME persistent storage tends to work best/most reliably with many small records rather than a few monolithic ones. Think about how the program is going to want to access the data, then try to store it in those increments in the J2ME persistent store.
I'd generally recommend decoupling your client-server protocol for downloading updates from the on-device storage format. You can change the latter with every code update, but you're pretty much stuck supporting a client-server protocol forever, unless you want to break older clients out in the field.
Finally, I know there are some people on the Transit Developers group who have built offline transit apps in J2ME, so it's worth asking for tips there.
I made app like this and I used xml-s generated with php. This enabled us to have a single provider for 3 presentation layers which were:
j2me app
website for mobile phones
usual website
We used xslt to convert xml to html on websites and kXML - very light pull parser to do it on j2me app. This worked well even on very old phones with b/w screens and small amounts of memory.
Besides on j2me there is no concept of file. You have the db in which you can store information.
This is a link to "mobile" website.
http://mobi.krakow.pl/rozklady/
and here to the app:
http://www.mobi.krakow.pl/rozklady/j2me/rjk.jar
This is in polish, but I think it's not hard to figure out what's this and that.
If you want, I can provide you with more help and advice or if this is a commercial product then I think we can figure out something too ;)
I think your issue is requirement 2.
Updating 10MB of data just because 4 digits changed somewhere in the middle of the file seems highly inefficient.
Spliting the data into several files allows for a better update granularity that will be well worth the added code complexity.
Real-time public transport schedules are usually modified one bus/train/tram line at a time.

Optimizing for low bandwidth

I am charged with designing a web application that displays very large geographical data. And one of the requirements is that it should be optimized so the PC still on dial-ups common in the suburbs of my country could use it as well.
Now I am permitted to use Flash and/or Silverlight if that will help with the limited development time and user experience.
The heavy part of the geographical data are chunked into tiles and loaded like map tiles in Google Maps but that means I need a lot of HTTP requests.
Should I go with just javascript + HTML? Would I end up with a faster application regarding Flash/Silverlight? Since I can do some complex algorithm on those 2 tech (like DeepZoom). Deploying desktop app though, is out of the question since we don't have that much maintenance funds.
It just needs to be fast... really fast..
p.s. faster is in the sense of "download faster"
I would suggest you look into Silverlight and DeepZoom
Is something like Gears acceptable? This will let you store data locally to limit re-requests.
I would also stay away from flash and Silverlight and go straight to javascript/AJAX. jQuery is a ton-O-fun.
I don't think you'll find Flash or Silverlight is going to help too much for this application. Either way you're going to be utilizing tiled images and the images are going to be the same size in both scenarios. Using Flash or Silverlight may allow you to add some neat animations to the application but anything you gain here will be additional overhead for your clients on dialup connections. I'd stick with plain Javascript/HTML.
You may also want to look at asynchronously downloading your tiles via one of the Ajax libraries available. Let's say your user can view 9 tiles at a time and scroll/zoom. Download those 9 tiles they can see plus whatever is needed to handle the zoom for those tiles on the first load; then you'll need to play around with caching strategies for prefetching other information asynchronously.
At one place I worked a rules engine was taking a bit too long to return a result so they opted to present the user with a "confirm this" screen. The few seconds it took the users to review and click next was more than enough time to return the results. It made the app look lightening fast to the user when in reality it took a bit longer. You have to remember, user perception of performance is just as important in some cases as the actual performance.
I believe Microsoft's Seadragon is your answer. However, I am not sure if that is available to developers.
It looks like some of it has found its way into Silverlight