Is it possible to model the Universe in an object oriented manner from the subatomic level upwards? - oop

While I'm certain this must have been tried before, I can't seem to find any examples of the concept actually being done.
What I'm describing goes off the idea that you could effectively model all "things" that exist as objects. From there you can make objects which use other objects. An example would be starting with the fundamental particles in physics, combining them to get particles like protons, neutrons and electrons, then atoms, and working your way up to the rest of chemistry, and so on.
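To make the idea concrete, here is the kind of naive composition I mean (a toy sketch only; it captures structure, not behaviour or interactions):

```python
# A toy sketch of the "compose upwards" idea (hypothetical classes, no physics implied).
class Quark:
    def __init__(self, flavor):
        self.flavor = flavor            # "up" or "down"

class Proton:
    def __init__(self):
        self.quarks = [Quark("up"), Quark("up"), Quark("down")]

class Neutron:
    def __init__(self):
        self.quarks = [Quark("up"), Quark("down"), Quark("down")]

class Electron:
    pass

class Atom:
    def __init__(self, protons, neutrons, electrons):
        self.nucleus = [Proton() for _ in range(protons)] + [Neutron() for _ in range(neutrons)]
        self.electrons = [Electron() for _ in range(electrons)]

hydrogen = Atom(protons=1, neutrons=0, electrons=1)
```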
Has this been attempted before and is it possible? How would I even go about it?

If what you mean by "the Universe" is the entire actual universe, the answer to "Is it possible?" is a resounding "Hell no!!!"
Consider a single mole of H2O, good old water. By definition a mole contains ~6×10^23 atoms, and knowing the atomic weights involved yields the mass. The density of water is well known. Pulling all the pieces together, we end up with 1 mole being about 18 mL of water. To put that in perspective, the cough syrup dose cup in my medicine cabinet is 20 mL.

If you could represent the state of each atom using a single byte—I doubt it!—you'd require 10^11 terabytes of storage just to represent a snapshot of that mass, and you'd need to update that volume of data every delta-t for the duration you wish to simulate. Additionally, the number of 2-way interactions between N entities grows as O(N^2), i.e., on the order of 10^46 calculations would be involved, again at every delta-t. To put that into perspective, if you had access to the world's fastest current distributed computer with exaflop capability, it would take you O(10^28) seconds (on the order of 10^20 years) to perform the calculations for a single simulated delta-t update!

You might be able to improve that by playing games with locality, but given the speed of light and the small distances involved you'd have to make a convincing case that heat transfer via thermal radiation couldn't cause state-altering interactions between any pair of atoms within the volume. To sum it up, the storage and calculation requirements are both infeasible for as little as a single mole of mass.
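To make the arithmetic easy to check, here is a back-of-envelope sketch in Python, taking the one-byte-per-atom assumption and the exaflop and operation-count figures above as given:

```python
# Back-of-envelope check of the feasibility claims above (figures from the text taken as given).
AVOGADRO = 6e23                  # ~atoms in one mole of water, per the text
BYTES_PER_ATOM = 1               # the (very optimistic) one-byte-per-atom assumption
OPS_PER_STEP = 1e46              # the O(N^2) pairwise-interaction estimate above
EXAFLOPS = 1e18                  # operations/second of an exaflop-class machine

storage_tb = AVOGADRO * BYTES_PER_ATOM / 1e12        # terabytes per snapshot
seconds = OPS_PER_STEP / EXAFLOPS                    # wall-clock time for one delta-t
years = seconds / (3600 * 24 * 365)

print(f"snapshot storage : ~{storage_tb:.0e} TB")            # ~6e+11 TB
print(f"compute per step : ~{seconds:.0e} s (~{years:.0e} years)")  # ~1e+28 s, ~3e+20 years
```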
I know from a conversation at a conference a couple of years ago that there are some advanced physics labs that have worked on this approach to get an idea of what happens with a few thousand atoms. However, I can't give specific references since I haven't seen the papers and only heard about it over a beer.

Related

Detect CANBUS Patterns using Machine Learning

I would like to use AI/TensorFlow/Machine Learning of some description in order to recognise patterns in a set of data.
Most of the samples of machine learning seem to be based on decision making: whether a value falls above or below a line, and then the decision is good or bad. But what I seem to have is a set of data that may or may not have any relationship and may or may not have a pattern; a single entity by itself is neither good nor bad, but the whole set of data together needs to be used to work out the type of data.
The data in question is a set of readings over a period of time from an automotive CANBUS reader - so hexadecimal values containing a Command ID, and 1 or more values, usually in the format: FFF#FF:FF:FF:FF:FF:FF:FF:FF
eg:
35C#F4:C6:4E:7C:31:2B:60:28
One CANBUS command may contain one or more sensor readings - so for example F4 above might represent the steering position, C6 might indicate the pitch of the vehicle, and 4E might indicate the roll.
Each sensor may take up one or more octets, 7C:31 might indicate the speed of the vehicle, and "B" of "2B" might indicate whether the engine is running or not.
I can detect the data with a human eye, and can see that the relevant item might be linear, random, static (i.e. a limited set of values), or it might be a bell curve.
I'm new to statistical analysis and machine learning, so I don't know the terminology. In the first instance I'm looking for the terminology and references to appropriate material that will help me achieve my goal.
My goal is, given a sample of data from a CANBUS reader, to scan all the values, using each and every possible combination of numbers (e.g. octets 1, 1+2, 1+2+3... 2, 2+3, 2+3+4... 3, 3+4, etc.) to detect patterns within the data and work out whether those patterns are linear, curve, static, or random.
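As a very rough illustration of that scan (a sketch only; the helper names and classification thresholds below are arbitrary assumptions, not a recommendation of any particular technique), one could slide over every contiguous octet combination and apply simple statistical tests:

```python
import statistics

def classify(series):
    """Crude guess at the 'shape' of a candidate signal: static, linear-ish, or noisy."""
    if len(set(series)) <= 4:
        return "static"                       # only a handful of distinct values
    diffs = [b - a for a, b in zip(series, series[1:])]
    if statistics.pstdev(diffs) < 0.1 * (abs(statistics.mean(diffs)) + 1):
        return "linear"                       # nearly constant first difference
    return "random/curve"                     # needs a closer look

def candidate_signals(frames):
    """frames: list of 8-byte payloads for one CAN ID. Yields every contiguous octet slice."""
    for start in range(8):
        for end in range(start + 1, 9):
            series = [int.from_bytes(bytes(f[start:end]), "big") for f in frames]
            yield (start, end), classify(series)

# Example: payloads already parsed into lists of ints, e.g. [0xF4, 0xC6, ...]
frames = [[0xF4, 0xC6, 0x4E, 0x7C, 0x31, 0x2B, 0x60, 0x28]] * 10
for (start, end), label in candidate_signals(frames):
    print(f"octets {start}-{end - 1}: {label}")
```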
I want to basically read as many CAN-BUS readings from as many cars as I can, throw them at a program to do some analysis and learning, and hopefully have it provide me with possibilities to investigate further so I can monitor various systems on different cars.
It seems like a relatively simple premise, but extremely hard for me to define.

First Fit Decreasing Algorithm on CVRP in Optaplanner

I am using the FFD algorithm in OptaPlanner as a construction heuristic for my CVRP problem. I thought I understood the FFD algorithm from bin packing, but I don't understand the logic behind it when applied in OptaPlanner to CVRP. My thought was that it focuses on the demands (sort cities in decreasing order, starting with the highest demand). To test my assumption, I fixed the city coordinates to only one location, so the distance to the depot is the same for all cities. Then I changed the demands from big to small. But it doesn't take the cities in decreasing order in the result file.
The Input is: City 1: Demand 16, City 2: Demand 12, City 3: Demand 8, City 4: Demand 4,
City 5: Demand 2.
3 Vehicles with a capacity of 40 per vehicle.
What I thought: V1<-[C1,C2,C3,C4], V2<-[C5]
What happened: V1<-[C5,C4,C3,C2], V2<-[C1]
Could anyone please explain the theory behind this? Also, I would like to know what happens the other way around: same capacities, but different locations per customer. I tried this too, but it also doesn't sort the cities beginning with the farthest one.
Thank you!
(braindump)
Unlike with non-VRP problems, the choice of "difficulty comparison" to determine the "Decreasing Difficulty" of "First Fit Decreasing" isn't always clear. I've done experiments with several forms - such as based on distance to depot, angle to depot, latitude, etc. You can find all those forms of difficulty comparators in the examples, usually TSP.
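To illustrate the principle outside of OptaPlanner's own API (a plain, hypothetical sketch of textbook First Fit Decreasing with a pluggable difficulty function, not OptaPlanner's actual implementation):

```python
def first_fit_decreasing(cities, vehicles, difficulty):
    """cities: list of (name, demand); vehicles: list of capacities.
    difficulty: key function, e.g. demand, distance to depot, angle to depot."""
    loads = [0] * len(vehicles)
    assignment = {i: [] for i in range(len(vehicles))}
    # Hardest cities first ("Decreasing"), each placed in the first vehicle that fits ("First Fit").
    for name, demand in sorted(cities, key=difficulty, reverse=True):
        for i, capacity in enumerate(vehicles):
            if loads[i] + demand <= capacity:
                loads[i] += demand
                assignment[i].append(name)
                break
    return assignment

cities = [("C1", 16), ("C2", 12), ("C3", 8), ("C4", 4), ("C5", 2)]
print(first_fit_decreasing(cities, [40, 40, 40], difficulty=lambda c: c[1]))
# -> {0: ['C1', 'C2', 'C3', 'C4'], 1: ['C5'], 2: []}
```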
One common pitfall is to tweak the comparator before enabling Nearby Selection: tweak Nearby Selection first. If you're dealing with large datasets, an "angle to depot" comparator will behave much better, just because Nearby Selection and/or Partitioned Search aren't active yet. In any case, as always: you need to be using optaplanner-benchmark for this sort of work.
That being said, on a pure TSP use case, the First Fit Decreasing algorithm has worse results than the Nearest Neighbor algorithm (which is another construction heuristic for which we have limited support atm). Both require Local Search to improve further, of course. However, translating the Nearest Neighbor algorithm to VRP is difficult/ambiguous (to say the least): I was/am working on such a translation and I am calling it the Bouy algorithm (look for a class in the vrp example that starts with Bouy). It works, but it doesn't combine well with Nearby Selection yet, IIRC.
Oh, and there's also the Clarke and Wright savings algorithm, which is great on small pure CVRP cases, but it suffers from big-O (= scaling) problems on bigger datasets and it too becomes difficult/ambiguous when other constraints are added (such as time windows, skill requirements, lunch breaks, ...).
Long story short: the jury's still out on what's the best construction heuristic for real-world, advanced VRP cases. optaplanner-benchmark will help us there. This despite all the academic papers talking about their perfect CH for a simple form of VRP on small datasets...

Is there a common/standard/accepted way to model GPS entities (waypoints, tracks)?

This question somewhat overlaps knowledge on geospatial information systems, but I think it belongs here rather than GIS.StackExchange
There are a lot of applications around that deal with GPS data with very similar objects, most of them defined by the GPX standard. These objects would be collections of routes, tracks, waypoints, and so on. Some important programs, like GoogleMaps, serialize more or less the same entities in KML format. There are a lot of other mapping applications online (ridewithgps, strava, runkeeper, to name a few) which treat this kind of data in a different way, yet allow for more or less equivalent "operations" with the data. Examples of these operations are:
Direct manipulation of tracks/trackpoints with the mouse (including drawing over a map);
Merging and splitting based on time and/or distance;
Replacing GPS-collected elevation with DEM/SRTM elevation;
Calculating properties of part of a track (total ascent, average speed, distance, time elapsed);
There are some small libraries (like GpxPy) that try to model these objects AND THEIR METHODS, in a way that would ideally allow for an encapsulated, possibly language-independent Library/API.
The fact is: this problem has been around long enough to allow for a "commonly accepted standard" to emerge, hasn't it? On the other hand, most GIS software is very professionally oriented towards geospatial analyses and topographic and cartographic applications, while the typical trip-logging and trip-planning applications seem to be more consumer/hobbyist oriented, which might explain the quite dispersed way the different projects/apps treat and model the problem.
Thus, considering everything said, the question is: is there, at present or being planned, a standard way to model canonically, in an Object-Oriented way, the most used GPS/tracklog entities and their canonical attributes and methods?
There is the GPX schema and it is very close to what I imagine, but it only contains objects and attributes, not methods.
Any information will be very much appreciated, thanks!!
As far as I know, there is no standard library, interface, or even set of established best practices when it comes to storing/manipulating/processing "route" data. We have put a lot of effort into these problems at Ride with GPS and I know the same could be said by the other sites that solve related problems. I wish there was a standard, and would love to work with someone on one.
GPX is OK and appears to be a sort-of standard... at least until you start processing GPX files and discover everyone has simultaneously added their own custom extensions to the format to deal with data like heart rate, cadence, power, etc. Also, there isn't a standard way of associating a route point with a track point. Your "bread crumb trail" of the route is represented as a series of trkpt elements, and course points (e.g. "turn left onto 4th street") are represented in a separate series of rtept elements. Ideally you want to associate a given course point with a specific track point, rather than just giving the course point a latitude and longitude. If your path does several loops over the same streets, it can introduce some ambiguity in where the course points should be attached along the route.
KML and Garmin's TCX format are similar to GPX, with their own pros and cons. In the end these formats really only serve the purpose of transferring the data between programs. They do not address the issue of how to represent the data in your program, or what type of operations can be performed on the data.
We store our track data as an array of objects, with keys corresponding to different attributes such as latitude, longitude, elevation, time from start, distance from start, speed, heart rate, etc. Additionally we store some metadata along the route to specify details about each section. When parsing our array of track points, we use this metadata to split a Route into a series of Segments. Segments can be split, joined, removed, attached, reversed, etc. They also encapsulate the method of trackpoint generation, whether that is by interpolating points along a straight line, or requesting a path representing directions between the endpoints. These methods allow a reasonably straightforward implementation of drag/drop editing and other common manipulations. The Route object can be used to handle operations involving multiple segments. One example is if you have a route composed of segments - some driving directions, straight lines, walking directions, whatever - and want to reverse the route. You can ask each segment to reverse itself, maintaining its settings in the process. At a higher level we use a Map class to wire up the interface, dispatch commands to the Route(s), and keep a series of snapshots or transition functions updated properly for sensible undo/redo support.
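A heavily stripped-down sketch of that kind of structure (hypothetical class and field names, not the actual Ride with GPS code) might look like:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TrackPoint:
    lat: float
    lon: float
    elevation: float = 0.0
    time_from_start: float = 0.0     # seconds
    dist_from_start: float = 0.0     # meters

@dataclass
class Segment:
    points: List[TrackPoint] = field(default_factory=list)
    generation: str = "line"         # how points were produced: "line", "directions", ...

    def reverse(self):
        self.points.reverse()        # keeps its own generation settings

@dataclass
class Route:
    segments: List[Segment] = field(default_factory=list)

    def reverse(self):
        self.segments.reverse()
        for seg in self.segments:
            seg.reverse()

    def split(self, seg_index, point_index):
        seg = self.segments[seg_index]
        left = Segment(seg.points[:point_index], seg.generation)
        right = Segment(seg.points[point_index:], seg.generation)
        self.segments[seg_index:seg_index + 1] = [left, right]
```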
Route manipulation and generation is one of the goals. The others are aggregating summary statistics and structuring the data for efficient visualization/interaction. These problems have been solved to some degree by any system that will take in data and produce a line graph, so this isn't exactly new territory. One interesting characteristic of route data is that you will often have two variables to choose from for your x-axis: time from start, and distance from start. Both are monotonically increasing, and both offer useful but different interpretations of the data. Looking at a graph of elevation with an x-axis of distance will show a bike ride going up and down a hill as symmetrical. Using an x-axis of time, the uphill portion is considerably wider. This isn't just about visualizing the data on a graph; it also translates to decisions you make when processing the data into summary statistics. Some weighted averages make sense to base on time, some on distance.

The operations you end up wanting are min, max, weighted average (based on your choice of independent variable), the ability to filter points and perform a filtered min/max/avg (only use points where you were moving, ignore outliers, etc.), different smoothing functions (to aid in calculating total elevation gain, for example), a basic concept of map/reduce functionality (how much time did I spend between 20-30 mph, etc.), and fixed-window moving averages that involve some interpolation. The latter is necessary if you want to identify your fastest 10 minutes, or your 10 minutes of highest average heart rate, etc. Lastly, you're going to want an easy and efficient way to perform whatever calculations you're running on subsets of your trackpoints.
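For example, a weighted average and a fixed-window "best effort" search over the per-point arrays described above could be sketched like this (a simplified sketch with no interpolation; the function names are made up):

```python
def weighted_average(values, weights):
    """Weighted mean, e.g. speed weighted by per-point distance or by elapsed time."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total if total else 0.0

def best_window(times, values, window_seconds):
    """Highest average of `values` over any span of `window_seconds` (no interpolation here)."""
    best, j = 0.0, 0
    for i in range(len(times)):
        while times[i] - times[j] > window_seconds:
            j += 1
        span = values[j:i + 1]
        best = max(best, sum(span) / len(span))
    return best

# 10-minute best average heart rate, given parallel arrays of time (s) and HR samples:
# best_window(time_from_start, heart_rate, 600)
```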
You can see an example of all of this in action here if you're interested: http://ridewithgps.com/trips/964148
The graph at the bottom can be moused over, drag-select to zoom in. The x-axis has a link to switch between distance/time. On the left sidebar at the bottom you'll see best 30 and 60 second efforts - those are done with fixed window moving averages with interpolation. On the right sidebar, click the "Metrics" tab. Drag-select to zoom in on a section on the graph, and you will see all of the metrics update to reflect your selection.
Happy to answer any questions, or work with anyone on some sort of standard or open implementation of some of these ideas.
This probably isn't quite the answer you were looking for but figured I would offer up some details about how we do things at Ride with GPS since we are not aware of any real standards like you seem to be looking for.
Thanks!
After some deeper research, I feel obligated, for the record and for the help of future people looking for this, to mention the pretty much exhaustive work on the subject done by two entities, sometimes working in conjunction: ISO and OGC.
From ISO (International Standards Organization), the "TC 211 - Geographic information/Geomatics" section pretty much contains it all.
From the OGC (Open Geospatial Consortium), their Abstract Specifications are very extensive, being at the same time redundant and complementary to ISO's.
I'm not sure they contain object methods related to the proposed application (GPS track and waypoint analysis and manipulation), but the core concepts contained in these documents are certainly solid. UML is their schema representation of choice.
ISO 6709 "[...] specifies the representation of coordinates, including latitude and longitude, to be used in data interchange. It additionally specifies representation of horizontal point location using coordinate types other than latitude and longitude. It also specifies the representation of height and depth that can be associated with horizontal coordinates. Representation includes units of measure and coordinate order."
ISO 19107 "specifies conceptual schemas for describing the spatial characteristics of geographic features, and a set of spatial operations consistent with these schemas. It treats vector geometry and topology up to three dimensions. It defines standard spatial operations for use in access, query, management, processing, and data exchange of geographic information for spatial (geometric and topological) objects of up to three topological dimensions embedded in coordinate spaces of up to three axes."
If I find something new, I'll come back to edit this, including links when available.

Implementation of achievement systems in modern, complex games

Many games that are created these days come with their own achievement system that rewards players/users for accomplishing certain tasks. The badges system here on stackoverflow is exactly the same.
There are some problems though for which I couldn't figure out good solutions.
Achievement systems have to watch out for certain events all the time; think of a game that offers 20 to 30 achievements for, e.g., combat. The server would have to check for these events (e.g. the player avoided x attacks of the opponent in this battle, or the player walked x miles) all the time.
How can a server handle this large amount of operations without slowing down and maybe even crashing?
Achievement systems usually need data that is only used in the core engine of the game and wouldn't be needed outside of it if there weren't those nasty achievements (think of, e.g., how often the player jumped during each fight; you don't want to store all this information in a database). What I mean is that in some cases the only way of adding an achievement would be adding the code that checks for its current state to the game core, and that's usually a very bad idea.
How do achievement systems interact with the core of the game that holds the later unnecessary information? (see examples above)
How are they separated from the core of the game?
My examples may seem "harmless" but think of the 1000+ achievements currently available in World of Warcraft and the many, many players online at the same time, for example.
Achievement systems are really just a form of logging. For a system like this, publish/subscribe is a good approach. In this case, players publish information about themselves, and interested software components (that handle individual achievements) can subscribe. This allows you to watch public values with specialised logging code, without affecting any core game logic.
Take your 'player walked x miles' example. I would implement the distance walked as a field in the player object, since this is a simple value to increment and does not require increasing space over time. An achievement that rewards players that walk 10 miles is then a subscriber of that field. If there were many players then it would make sense to aggregate this value with one or more intermediate broker levels. For example, if 1 million players exist in the game, then you might aggregate the values with 1000 brokers, each responsible for tracking 1000 individual players. The achievement then subscribes to these brokers, rather than to all the players directly. Of course, the optimal hierarchy and number of subscribers is implementation-specific.
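A minimal sketch of the "walked x miles" case (hypothetical class names, no broker hierarchy) could look like this:

```python
class Publisher:
    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, player_id, value):
        for callback in self._subscribers:
            callback(player_id, value)

class DistanceAchievement:
    """Awards players once their published distance reaches a threshold."""
    def __init__(self, miles):
        self.miles = miles
        self.awarded = set()

    def on_distance(self, player_id, miles_walked):
        if miles_walked >= self.miles and player_id not in self.awarded:
            self.awarded.add(player_id)
            print(f"{player_id} earned 'Walked {self.miles} miles'")

distance_topic = Publisher()
achievement = DistanceAchievement(miles=10)
distance_topic.subscribe(achievement.on_distance)

# The game core only publishes the field it already tracks:
distance_topic.publish("player_42", 10.3)
```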
In the case of your fight example, players could publish details of their last fight in exactly the same way. An achievement that monitors jumping in fights would subscribe to this info, and check the number of jumps. Since no historical state is required, this does not grow with time either. Again, no core code need be modified; you only need to be able to access some values.
Note also that most rewards do not need to be instantaneous. This allows you some leeway in managing your traffic. In the previous example, you might not update the broker's published distance travelled until a player has walked a total of one more mile, or a day has passed since last update (incrementing internally until then). This is really just a form of caching; the exact parameters will depend on your problem.
You can even do this if you don't have access to source, for example in videogame emulators. A simple memory-scan tool can be written to find the displayed score for example. Once you have that your achievement system is as easy as polling that memory location every frame and seeing if their current "score" or whatever is higher than their highest score. The cool thing about videogame emulators is that memory locations are deterministic (no operating system).
There are two ways this is done in normal games.
Offline games: nothing as complex as pub/sub - that's massive overkill. Instead you just use a big map / dictionary, and log named "events". Then every X frames, or Y seconds (or, usually: "every time something dies, and 1x at end of level"), you iterate across achievements and do a quick check. When the designers want a new event logged, it's trivial for a programmer to add a line of code to record it.
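In sketch form, with made-up event and achievement names:

```python
from collections import Counter

events = Counter()                 # the "big map" of named counters

def log_event(name, amount=1):     # one line of code per new event the designers want
    events[name] += amount

ACHIEVEMENTS = {
    "Marathoner": lambda e: e["distance_walked_m"] >= 42_195,
    "Pacifist":   lambda e: e["level_finished"] >= 1 and e["enemies_killed"] == 0,
}
unlocked = set()

def check_achievements():          # called every X frames, on deaths, and at end of level
    for name, predicate in ACHIEVEMENTS.items():
        if name not in unlocked and predicate(events):
            unlocked.add(name)
            print(f"Achievement unlocked: {name}")
```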
NB: pub/sub is a poor fit for this IME because the designers never want "when player.distance = 50". What they actually want is "when player's distance as perceived by someone watching the screen seems to have travelled past the first village, or at least 4 screen widths to the right" -- i.e. far more vague and abstract than a simple counter.
In practice, that means that the logic goes at the point where the change happens (before the event is even published), which is a poor way to use pub/sub. There are some game engines that make it easier to do a "logic goes at the point of receipt" (the "sub" part), but they're not the majority, IME.
Online games: almost identical, except you store "counters" (ints that go up), and usually also "deltas" (circular buffers of what's-happened frame to frame) and "events" (complex things that happened in game that can be hard-coded into a single ID plus a fixed-size array of parameters). These are then exposed via e.g. SNMP for other servers to collect at low CPU cost and asynchronously.
i.e. almost the same as 1 above, except that you're careful to do two things:
Fixed-size memory usage; and if the "reading" servers go offline for a while, achievements won in that time will need to be re-won (although you usually can have a customer support person manually go through the main system logs and work out that the achievement "probably" was won, and manually award it)
Very low overhead; SNMP is a good standard for this, and most teams I know end up using it
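A fixed-size delta buffer in the spirit of the first point (a sketch only; the SNMP exposure itself is out of scope here) can be as simple as:

```python
from collections import deque

class DeltaBuffer:
    """Keeps only the last N frame deltas, so memory usage stays fixed
    even if the reading servers don't poll for a while."""
    def __init__(self, max_frames=1024):
        self.deltas = deque(maxlen=max_frames)   # old frames fall off the end

    def record_frame(self, delta):
        self.deltas.append(delta)

    def drain(self):
        """Called by the collector; anything that fell off before this is simply lost."""
        collected = list(self.deltas)
        self.deltas.clear()
        return collected
```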
If your game architecture is Event-driven, then you can implement achievements system using finite-state machines.
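For example, a "win 3 fights in a row" achievement can be a tiny state machine driven by game events (a hedged sketch, not tied to any particular engine):

```python
class WinStreakAchievement:
    """States 0..3: number of consecutive wins. Reaching 3 unlocks the achievement."""
    def __init__(self):
        self.state = 0
        self.unlocked = False

    def on_event(self, event):
        if self.unlocked:
            return
        if event == "fight_won":
            self.state += 1
            if self.state == 3:
                self.unlocked = True
        elif event == "fight_lost":
            self.state = 0             # streak broken, back to the start state

# Wired to the event dispatcher, e.g. dispatcher.subscribe(achievement.on_event)  (hypothetical API)
```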

GPS signal cleaning & road network matching

I'm using GPS units and mobile computers to track individual pedestrians' travels. I'd like to in real time "clean" the incoming GPS signal to improve its accuracy. Also, after the fact, not necessarily in real time, I would like to "lock" individuals' GPS fixes to positions along a road network. Have any techniques, resources, algorithms, or existing software to suggest on either front?
A few things I am already considering in terms of signal cleaning:
- drop fixes for which num. of satellites = 0
- drop fixes for which speed is unnaturally high (say, 600 mph)
And in terms of "locking" to the street network (which I hear is called "map matching"):
- lock to the nearest network edge based on root mean squared error
- when fixes are far away from road network, highlight those points and allow user to use a GUI (OpenLayers in a Web browser, say) to drag, snap, and drop on to the road network
Thanks for your ideas!
I assume you want to "clean" your data to remove erroneous spikes caused by dodgy readings. This is a basic DSP process. There are several approaches you could take to this; it depends how clever you want it to be.
At a basic level, yes, you can just look for really large figures, but what is a really large figure? Yeah, 600 mph is fast, but not if you're on Concorde. Whilst you are looking for a value which is "out of the ordinary", you are effectively hard-coding "ordinary". A better approach is to examine past data to determine what "ordinary" is, and then look for deviations. You might want to consider calculating the variance of the data over a small local window and then seeing if the z-score of your current data is greater than some threshold; if so, exclude it.
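A rolling z-score filter along those lines might be sketched as follows (the window size and threshold are arbitrary assumptions):

```python
import statistics

def reject_spikes(speeds, window=10, z_threshold=3.0):
    """Yield (speed, keep) pairs; a fix is flagged for rejection when its speed
    deviates too far from the recent local window."""
    history = []
    for speed in speeds:
        if len(history) >= 3:
            mean = statistics.mean(history)
            stdev = statistics.pstdev(history)
            # With a perfectly flat history we can't judge deviation, so keep the fix.
            keep = stdev == 0 or abs(speed - mean) / stdev <= z_threshold
        else:
            keep = True                      # not enough context yet
        yield speed, keep
        if keep:
            history.append(speed)            # rejected spikes don't pollute the window
            history = history[-window:]

# e.g. list(reject_spikes([3, 4, 3, 5, 600, 4, 3])) flags the 600 mph fix for rejection
```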
One note: you should use 3 as the minimum satellites, not 0. A GPS needs at least three sources to calculate a horizontal location. Every GPS I have used includes a status flag in the data stream; less than 3 satellites is reported as "bad" data in some way.
You should also consider "stationary" data. How will you handle the pedestrian standing still for some period of time? Perhaps waiting at a crosswalk or interacting with a street vendor?
Depending on what you plan to do with the data, you may need to suppress those extra data points or average them into a single point or location.
You mention this is for pedestrian tracking, but you also mention a road network. Pedestrians can travel a lot of places where a car cannot, and, indeed, which probably are not going to be on any map you find of a "road network". Most road maps don't have things like walking paths in parks, hiking trails, and so forth. Don't assume that "off the road network" means the GPS isn't getting an accurate fix.
In addition to Andrew's comments, you may also want to consider interference factors such as multipath, and how they are reflected in your incoming GPS data stream, e.g. the HDOPs in the GSA line of NMEA 0183. In my own GPS controller software, I allow user-specified rejection criteria against a range of QA-related parameters.
I also tend to work on a moving window principle in this regard, where you can consider rejecting data that represents a spike based on surrounding data in the same window.
Read the position fix indicator to see if the signal is valid (the fix quality field in the $GPGGA sentence if you parse raw NMEA strings). If it's 0, ignore the message.
Besides that you could look at the combination of HDOP and the number of satellites if you really need to be sure that the signal is very accurate, but in normal situations that shouldn't be necessary.
Of course it doesn't hurt to do some sanity checks on GPS signals (a rough sketch follows the list):
- latitude between -90..90;
- longitude between -180..180 (or E..W, N..S, 0..90 and 0..180 if you're reading raw NMEA strings);
- speed between 0 and 255 (for normal cars);
- distance to the previous measurement (based on lat/lon) roughly matches the indicated speed;
- time difference with the system time not larger than x (unless the system clock cannot be trusted or relies on GPS synchronisation :-) ).
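Here is that sketch, applied to already-parsed values (the field names and the "roughly matches" factor are assumptions, not any particular library's API):

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters."""
    r = 6_371_000
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def sane_fix(fix, prev=None, max_clock_skew_s=5.0):
    """fix/prev: dicts with lat, lon, speed_kmh, timestamp (s), sys_time (s)."""
    if not (-90 <= fix["lat"] <= 90 and -180 <= fix["lon"] <= 180):
        return False
    if not (0 <= fix["speed_kmh"] <= 255):                  # "normal cars" upper bound
        return False
    if abs(fix["timestamp"] - fix["sys_time"]) > max_clock_skew_s:
        return False
    if prev is not None:
        dt = fix["timestamp"] - prev["timestamp"]
        if dt > 0:
            implied_kmh = haversine_m(prev["lat"], prev["lon"], fix["lat"], fix["lon"]) / dt * 3.6
            if implied_kmh > 2 * max(fix["speed_kmh"], 1):  # "roughly matches" is a loose factor here
                return False
    return True
```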
To do map matching, you basically iterate through your road segments and check which segment is the most likely one for your current position, direction, speed, and possibly previous GPS measurements and matches.
If you're not doing a realtime application, or if a delay in feedback is acceptable, you can even look into the 'future' to see which segment is the most likely.
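In its simplest form (pure nearest-segment matching on projected planar coordinates, with no heading, speed, or history yet, which is where the real art comes in), that iteration might look like:

```python
def point_to_segment_dist(px, py, ax, ay, bx, by):
    """Distance from point P to segment AB in a local planar (projected) frame."""
    abx, aby = bx - ax, by - ay
    apx, apy = px - ax, py - ay
    denom = abx * abx + aby * aby
    t = 0.0 if denom == 0 else max(0.0, min(1.0, (apx * abx + apy * aby) / denom))
    cx, cy = ax + t * abx, ay + t * aby
    return ((px - cx) ** 2 + (py - cy) ** 2) ** 0.5

def match_to_network(px, py, segments):
    """segments: list of (segment_id, ax, ay, bx, by). Returns the closest segment."""
    return min(segments, key=lambda s: point_to_segment_dist(px, py, *s[1:]))
```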
Doing all that properly is an art by itself, and this space here is too short to go into it deeply.
It's often difficult to decide with 100% confidence which road segment somebody is on. For example, if there are 2 parallel roads that are equally close to the current position, it's a matter of creative heuristics.