Sharing Seaside sessions using Gemstone/S (Smalltalk)

I am writing a web application using VisualWorks Smalltalk, Seaside and Gemstone/S.
At the moment the session state for the client can be maintained only if the client always returns to the same image (though if the session expires the client can restart with any image).
I would like to understand whether it's feasible to share session state using the Gemstone/S database. This means straightforward http load balancing could be used and I wouldn't need to worry about image affinity for a large population of users.
I can see that all this is done "for free" with the GemStone implementation of Seaside, but I would like to use VisualWorks because it has a much more powerful IDE and I am already familiar with it. All references I can find on Google to this assume use of GemStone Smalltalk and the GemStone implementation of Seaside.
Has anybody done this? If so are there any write ups of findings or tutorials I can follow? Many apologies if I have missed an obvious link on a search but I really couldn't see it.

As far as I know there is no easy way to do it. Very much simplified, and ignoring many optimizations made by Seaside, a Seaside session is a snapshot of an execution stack waiting to be resumed. Migrating that frozen execution stack to another image and resuming it there is non-trivial.
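To make that concrete, here is a minimal sketch of the kind of suspended state involved (the class and the prompts are made up; WATask, #request:, #confirm: and #inform: are standard Seaside API). Between each of these calls the server holds a live stack frame waiting for the user's answer, and that suspended stack is exactly what would have to migrate between images:

    "A hypothetical task component, Pharo/Squeak-style class definition."
    WATask subclass: #GreetingTask
        instanceVariableNames: ''
        classVariableNames: ''
        category: 'Example'

    GreetingTask >> go
        | name |
        name := self request: 'What is your name?'.        "execution is suspended here until the form is submitted"
        (self confirm: 'Hello, ' , name , '. Continue?')    "and suspended again here"
            ifTrue: [ self inform: 'Welcome!' ]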

Our application uses the same stack (Seaside, VW, GS), but we're not using the full GBS interface between VW and GS, which allows one VW image to service several Seaside sessions. Even in this mix, however, session affinity to each VW image is required.
Your reasons for developing your Seaside app in VW make sense. It is indeed a better development environment, but you may consider splitting development and deployment: write your code in VW and deploy on GLASS. That's the conventional way to use GLASS, although usually with Pharo for development.

How does Smalltalk image handle IO?

I just started to learn Smalltalk, went through its syntax, but haven't done any real coding with it yet. While reading some introductory articles and some SO questions like:
What gives Smalltalk the ability to do image persistence, and why can't languages like Ruby/Python serialize themselves?
What is a Smalltalk “image”?
one question always comes to my mind: How does the Smalltalk image handle IO?
A Smalltalk program can resume from where it left off, using information stored in the image. Say I have some open TCP connections (not to mention all sorts of buffers), how do they get recovered? There seems to be no way other than reopening them (confirmed by this answer). And if Smalltalk does reopen those connections, isn't it going against the idea of "resume execution of the program at a later time exactly from where you left off"? Or is there some magic behind it?
I don't mind if the answer is specific to certain dialects, say Pharo.
I would also be interested in some resources to learn more about this topic.
As you have noted, some resources are not part of the memory heap and therefore will not be recovered just by loading the image back into memory. In particular this applies to all kinds of resources managed by the operating system, and cross-platform Smalltalks, where you can copy the image from one OS to another and restart it, even have to restore such resources differently than they were before.
The trick in the Smalltalks I have seen is that all classes receive a message immediately after the image has resumed. By implementing a method for that message they can restore any transient resources (sockets, connections, foreign handles, ...) that their instances might need. To find all instances, some Smalltalks provide messages such as allInstances, or you must maintain a registry of the relevant objects yourself.
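For illustration, a minimal sketch of that pattern in Pharo/Squeak style (the class, its Instances registry, #reconnect and #closeSocket are all made up; the class-side #startUp: / #shutDown: hooks are the real mechanism, and in Pharo the class additionally has to be registered so that it receives them, as touched on in the SessionManager answer further down):

    Object subclass: #ChatConnection
        instanceVariableNames: 'socket host port'
        classVariableNames: 'Instances'
        category: 'Example'

    ChatConnection class >> startUp: resuming
        "Sent after the image starts. Re-open the OS-level sockets that were lost."
        resuming ifTrue: [
            Instances ifNotNil: [ :all | all do: [ :each | each reconnect ] ] ]

    ChatConnection class >> shutDown: quitting
        "Sent before the image is saved or quit. Release OS resources cleanly."
        quitting ifTrue: [
            Instances ifNotNil: [ :all | all do: [ :each | each closeSocket ] ] ]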
And if Smalltalk does reopen those connections, isn't it going against the idea of "resume execution of the program at a later time exactly from where you left off"?
From a user perspective, after that reinitialization and reallocation of resources, everything still looks like "exactly where you left off", even though some technical details have changed under the hood. Of course this won't be the case if it is impossible to restore the resources (no network, for example). Some limits cannot be overcome by Smalltalk magic.
How does the Smalltalk image handle IO?
To make that resumption described above possible, all external resources are wrapped and represented as some kind of Smalltalk object. The wrapper objects will be persisted in the image, although they will be disconnected from the outside world when Smalltalk is shut down. The wrappers can then restore the external resources after the image has been started up again.
It might be useful to add a small history lesson: Smalltalk was created in the 1970s at Xerox's Palo Alto Research Center (PARC). At the same time, in the same place, personal computing was invented. Also at the same time, in the same place, Ethernet was invented.
Smalltalk was a single integrated system: it was at once the IDE, the GUI, the shell, the kernel, and the OS; even the microcode for the CPU was part of the Smalltalk system. Smalltalk didn't have to deal with non-Smalltalk resources from outside the image, because for all intents and purposes there was no "outside". It was possible to re-create the exact machine state, since there wasn't really any boundary between the virtual machine and the machine. (Almost all of the system was implemented in Smalltalk. There were only a couple of tiny bits of microcode, assembly, and Mesa. Even what we would consider device drivers nowadays were written in Smalltalk.)
There was no need to persist network connections to other computers, because nobody outside of a few labs had networks. Heck, almost no organization even had more than one computer. There was no need to interact with the host OS because Smalltalk machines didn't have an OS; Smalltalk was the OS. (You probably know the famous quote from Dan Ingalls' Design Principles Behind Smalltalk: "An operating system is a collection of things that don't fit into a language. There shouldn't be one.") Because Smalltalk was the OS, there was no need for a filesystem, all data was simply objects.
Smalltalk cannot control what is outside of Smalltalk. This is a general property that is not unique to Smalltalk. You can break encapsulation in Java by editing the compiled bytecode. You can break type-safety in Haskell by editing the compiled machine code. You can create a memory leak in Rust by editing the compiled machine code.
So, all the guarantees, features, and properties of Smalltalk are only available as long as you don't leave Smalltalk.
Here's an even simpler example that does not involve networking or moving the image to a different machine: open a file in the host filesystem. Suspend the image. Delete the file. Resume the image. There is no possible way to resume the image in the same state.
All Smalltalk can do is approximate the state of external resources as well as it possibly can. It can attempt to re-open the file. If the file is gone, it can maybe attempt to create one with the same name. It can try to resume a network connection. If that fails, it can try to re-establish it by creating a new connection to the same address.
But ultimately, everything outside the image is outside of the control of Smalltalk, and there is nothing Smalltalk can do about it.
Note that this impedance mismatch between the inside of the image and the "outside world" is one of the major criticisms that is typically leveled at Smalltalk. And if you look at Smalltalk systems that try to integrate deeply with the outside world, they often have to compromise. E.g. GNU Smalltalk, which is specifically designed to integrate deeply into a Unix system, actually gives up on the image and persistence.
I'll add one more angle to the nice answers of Joerg and JayK.
What is important to understand is the context of the time and age in which Smalltalk was created (Joerg already pointed out the important aspect of everything being Smalltalk). We are talking about a time right after ARPANET.
I think they were not expecting the collaboration and interconnection we have nowadays. The image was meant as a record of a session of a single programmer without any external communication. Times have changed and now you naturally ask the IO question. As JayK noted, you can re-initialize the image, so you will get an image similar to the point where you ended your session.
The real issue, the reason I've decided to add my 2c, is collaboration among multiple developers. This is where the image, as originally conceived, has in my opinion outlived itself. There is no way to share an image among multiple developers so that they can develop at the same time and share the code.
Imagine: wouldn't it be great if you could have one central image, and the developers would keep only diffs and open their environment where they left off, with everyone's new code incorporated? Sounds familiar? This is the kind of VCS we have, like Mercurial, Git, etc., just without the image, only code. Sadly, such image re-construction does not exist.
Smalltalk is trying to catch up with the standard versioning tooling we use nowadays.
(Side note: Smalltalk dialects had their own "versioning" systems, but they rather fall short in many ways compared to current VCSs. The ones used were Monticello (Pharo), ENVY (VA Smalltalk), and Store (VisualWorks).)
Pharo is now trying to catch the train and implement Git functionality via Iceberg. The Smalltalk/X-jv branch has integrated decent Mercurial support. Squeak has Squot for Git (for now) [thank you @JayK].
Now we "only" (that is a BIG only) need to add support for central/diff images :).
For some practical code dealing with image startUp and shutDown, take a look at the class side of ZnServer. SessionManager is the class providing all the functionality you need to deal with giving up system resources on shutdown and re-acquiring them on startup.
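In a Pharo playground, for example, you could explore and use that machinery roughly like this (a sketch assuming Pharo's current SessionManager API; the exact registration selector varies between versions and dialects, and older Squeak images use Smalltalk addToStartUpList: instead; ChatConnection is the hypothetical class from the earlier sketch):

    "Browse ZnServer and switch to the class side to see its #startUp: / #shutDown: methods."
    ZnServer browse.

    "Register your own class so that it receives #startUp: and #shutDown: on image startup/shutdown."
    SessionManager default registerNetworkClassNamed: #ChatConnection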
Need to chime in on the source control discussion a bit.
The solution I have seen with VS(E)/VA is that you work with Envy/PVCS and share this repository with the developers.
Every developer has his/her own image with all the pros and cons of images.
One company I was working for was discussing whether it wouldn't make sense to rebuild the development image from scratch every couple of weeks, in order to get rid of everything that might dilute the code quality (open file handles, global variables, pool dictionary entries; you name it, you will get it, and it will crash your code at run-time).
When it comes to building a "run-time", you take the plain tiny standard image, and all your code comes from bind files and SLLs.

Etherpad-style synchronisation in Meteor?

I'm looking into Meteor to create a collaborative document editing app, because it's great that Meteor synchronizes data between multiple clients by default.
But when using a text-area, like in Sameer Kalburgi’s example
http://www.skalb.com/2012/04/16/creating-a-document-sharing-site-with-meteor-js/
http://docshare-tutorial.meteor.com/
the experience is sub-optimal.
I tried to type at the same time as a colleague and my changes would be overwritten when she typed, and vice versa. So I think there is no merge algorithm in the conflict resolution yet?
Is this planned for the future? Are there ways to implement this currently? Etherpad seems to handle this problem rather well. Having this in Meteor would make creating collaborative document editing apps way more accessible.
So I looked into it some more, the algorithm used in Etherpad is known as Operational Transformation:
The solution is Operational Transformation (OT). If you haven’t heard of it, OT is a class of algorithms that do multi-site realtime concurrency. OT is like realtime git. It works with any amount of lag (from zero to an extended holiday). It lets users make live, concurrent edits with low bandwidth. OT gives you eventual consistency between multiple users without retries, without errors and without any data being overwritten.
Unfortunately, implementing OT sucks. There's a million algorithms with different tradeoffs, mostly trapped in academic papers. The algorithms are really hard and time consuming to implement correctly. We need some good libraries, so any project can just plug in OT if they need it.
That's from the site of ShareJS, a Node.js-based OT server/client that you can hook into your existing client.
OT is also implemented in the Racer model-synchronization engine for Node.js, which forms the underpinnings of Derby. At the moment derby.js doesn't transparently provide it yet, but they plan to. From the Derby docs:
Currently, Racer defaults to applying all transactions in the order received, i.e. last-writer-wins. (…) Racer [also] supports conflict resolution via a combination of Software Transactional Memory (STM), Operational Transformation (OT), and Diff-match-patch techniques.
These features are not fully implemented yet, but the Racer demos show preliminary examples of STM and OT.
Coincidentally, both the ShareJS and DerbyJS teams have an ex-Google Wave developer on board, and Meteor has an ex-Etherpad/Google Wave developer on its core team. Since Etherpad is one of the best-known implementations of OT, I was imagining Meteor would surely want to support it at some point as well…
I've created a Meteor smart package that integrates ShareJS:
https://github.com/mizzao/meteor-sharejs
It's quite preliminary right now, but you can import it into your app, drop in textareas, and it "just works". Please try it out and submit some new features via pull requests :)
Check out a demo here:
http://documents.meteor.com
What you describe seems out of Meteor's scope to me. It's not a tool for setting up collaboration possibilities!
What it provides is a way to transparently work against a subset of a server's database. But the implementation of use-case-specific merging functionality is the job of the application, not the framework.

Using a virtual machine inside of a game?

I am in the design stage of making a 3D "robot programming" game. Inspired by games such as Colobot, Robot Odyssey, Cholo, etc.
I want every robot in the game to have its own isolated environment/operating system/virtual machine, just as it would in real life. Each environment should be sandboxed so that it is local to the robot in terms of how it interacts with the rest of the game.
Originally I was going to implement the HACK VM as described in the book "Elements of Computing Systems", but then got curious as to whether there is a better solution in terms of performance for this style of game.
So my question is: Is there a virtual machine architecture already existing that would serve my purpose well?
P.S. The language and game engine to be used have not been decided yet, but it will probably be either C# or Smalltalk.
Hmm… using a separate image per robot is a bit of overkill, IMO.
I don't know the requirements of your project, but why not just run all robots in the same image, each in its own Process?
(Note that Smalltalk supports a green-threading model.)
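For example, a minimal sketch of that approach (Robot, #isAlive and #step are hypothetical; the Process, Delay and scheduling API is standard Smalltalk):

    "Give each robot its own green thread (Process) inside a single image."
    | robots |
    robots := (1 to: 100) collect: [ :i | Robot new ].
    robots do: [ :robot |
        [ [ robot isAlive ] whileTrue: [
            robot step.                           "execute one slice of the robot's program"
            (Delay forMilliseconds: 10) wait ] ]  "yield so the other robots get scheduled"
                forkAt: Processor userBackgroundPriority ]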
About HydraVM: originally it was a proof-of-concept project. Surprisingly, it worked well and was quite stable.
But for further development you need projects that really require such an architecture.
And frankly, the Smalltalk language-level infrastructure was not ready for it at that moment (I wouldn't say it is ready today ;)
because to leverage it you need better tools, like remote debugging, remote browsing, remote image management, etc.
I am really surprised to hear that you were able to run 11 images in parallel. That's awesome, since I never tried to run more than 2 :)
The problem with that many images is that you need a different memory-management system, and the one used in Hydra is inherited from the Squeak VM and does not really fit such a design.

Is Seaside still a valid option?

Seaside just released a release candidate for the upcoming 3.0 version, so it appeared on my radar again. As I'm currently pondering which web framework to use for a future project, I wonder whether it's something to consider. Alas, most of the publicity for Seaside is from '07, which is probably one or two generations ago in web terms. So I'm hoping that the community here can answer some questions:
Continuation-based frameworks were pretty great when most of your workflow was in plain HTML, e.g. form submits. For today's JavaScript-heavy environments, that hardly seems worthwhile anymore.
Is Squeak able to handle a reasonable workload? From other questions here and elsewhere, it seems that for proper scaling another implementation (Gemstone etc.) would probably fare better in the long run, but I don't have a proper idea how far away that is. Sessions seem to be rather expensive.
I know that comparisons are hard, but most of the articles you find on the net set Seaside and Rails side by side. How would combinations like Scala/Lift, Clojure/Compojure or Erlang/Nitrogen do instead?
I have answers to questions one and two:
This is true. However, since version 2.8 Seaside is not a strictly "continuation-based" framework anymore; Seaside uses continuations in the flow module only, and since Seaside 3.0 the flow module is even optional. Also note that Seaside has had strong JavaScript support since 2005, long before mainstream frameworks started to add JavaScript functionality. Today Seaside comes with jQuery and jQuery UI support built in.
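For instance, an Ajax interaction with Seaside's jQuery binding looks roughly like this (a sketch in the style of the Seaside book's examples; count is assumed to be an instance variable of the component, and no continuation is involved):

    renderContentOn: html
        "Render a counter; clicking the link updates only the div, via jQuery's load()."
        html div
            id: 'count';
            with: count.
        html anchor
            onClick: ((html jQuery id: 'count')
                load html: [ :r | count := count + 1. r render: count ]);
            with: 'increment'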
Of course that depends on what you store within your session objects, but typically sessions are small (less than 20 KiB). Use the memory profiler in your application to determine the exact memory consumption.
And there is a new Seaside book: http://book.seaside.st/book
I find that the productivity of working in a Smalltalk IDE with a good set of abstractions outweighs all other issues in engineering-dominated projects. It works well as an enterprise system for a small company with about 100 (simultaneous, but not heavy) users on a single server (without going to SSD). Since 2007:
Seaside has shown it is able to make the switch from HTML workflows to JavaScript ones;
Seaside has been ported to a lot of different Smalltalks;
GemStone has released GLASS;
the new Cog VM, with much improved performance, was released a few weeks ago and shows great promise.
In Smalltalk we now have three web frameworks to consider: besides Seaside there are also Aida/Web and Iliad.
The latter two effectively solve tree-like control flow without needing continuations. Both also have very strong Ajax integration; in fact, you hardly realize anymore that you are working with Ajax.
Both also scale well in memory consumption: 10,000 sessions take 220 MB in Aida/Web, that is about 23 KB per session, which can be optimized further down to a mere 400 B per session. This means that you can run not just one but many websites from a single Smalltalk image. Of course you can always move to a load-balancing solution when you really need to, which in my experience is very rarely.
Compared to Ruby on Rails: a friend of mine complained that he needed 50 MB of memory initially for every webshop site he sells. He then turned to the Aida/Web solution, where he needs about the same amount for the image, but then just a few KB for every additional webshop site.
Avi Bryant, the developer of Seaside, said that Ajax trumps continuations in almost all situations. Nevertheless, you can build reasonably powerful applications with Seaside and Ajax, too.
The application part of a web app can be done quite well in other frameworks using Ajax.
I think a Seaside-integrated Smalltalk-to-JavaScript framework like Cappuccino-for-Clamato is currently missing. I'd like to be able to build real JavaScript apps using Smalltalk.
JavaScript is awesome, but being able to deal with complicated workflows in a clean, cheap way on the server side (as Seaside allows you to) keeps it from becoming obsolete. Economy and utility are things that give returns in the short and the long run. But saying this in the abstract has no value at all: you should be talking about a precise application and deciding whether Seaside is part of the bunch of competitive advantages that form an equation that rocks (and knowing why).
About scaling workload with Seaside, the short answer is that you can scale it like hell, yeah (for the long answer check my answer here: Does Seaside scale?).
Too unanswerable, man :) Try a variation of what you're really trying to ask.
I think the best thing you can do is a prototype of something in a weekend.
If you can do a prototype in two days, and you capture some attention, and you enjoyed the development experience of doing it with Seaside, then you'll have the foundation of your next thing.
It costs only your time (you can publish it on an Amazon server).
BTW, this week I heard about a startup that made its prototype by hand (everything was static and they processed stuff manually). Pretty amazing and crazy and cheap. When they felt they had enough traction on the idea (which they had), they implemented the app (with whatever tech; I'm sure that's no challenge for a Seaside developer).

PyAMF backend choices!

I've been using PyAMF to write a backend for a Flex app that will request different groups of hundreds of different images depending on what the client needs. I have been using the "simple_server" WSGI server that PyAMF supplies while developing the Flex code. Now I'm ready to write a robust backend that will be able to pull images from a MySQL database and send them as fast and as efficiently as possible to many concurrent clients.
The PyAMF documentation is great because they supply many examples to follow, however I am confused about what kind of backend I am trying to create.
Do I want a SocketServer or a WSGI server or something like Twisted or web2py or Tornado? Are these even all different? :) Should I be using Apache modules instead (mod_wsgi or modjy or mod_python)?
I realize that this probably touches on many open debates, so maybe you could just point me to any good summaries of these debates?
It's great to have so many options, but how do I choose?
The short answer is, of course, that it depends on the requirements of your project.
How many concurrent connections is "a lot"?
How much programmer time can you throw at the problem?
How much hardware can you throw at the problem?
...etc...
If you plan to have lots of concurrent clients, it's hard to beat Twisted in the Python world. However, you'll have to deal with your database asynchronously to avoid blocking, and depending on how complex your database interactions are, this can be a bit of a pain. You're basically limited to either using twisted.enterprise.adbapi or coming up with your own twisted-ORM integration.
If you'd rather have "easy" database code (i.e. you want to use an ORM), you're better off going with a (TurboGears/Pylons/plain wsgi) project, probably hosted using Apache and mod_wsgi. This can be a pretty scalable solution, and you get a lot of stuff for free using these frameworks, but it may be more than you need.
I would avoid using one of the many plain python wsgi servers out there (wsgiref, paster, etc.) in production if you really want high performance.
Good Luck!