It looks like lots of stuff was recently deprecated or moved to contrib, and it doesn't seem like there are any up-to-date equivalents in core. What was the reasoning behind this? I can't seem to find any GitHub issue or discussion explaining these changes. Example GitHub search
Around the time of the 1.0 release, we made a large effort to carefully choose what should be part of the "official" supported API and feature set, because anything included would have to be supported for a long, long time and becomes very hard to change. The rules for stuff in contrib are less tight in terms of making non-backwards-compatible changes, so components judged likely to change substantially in the near future (i.e., to not be stable yet) were moved to contrib or elsewhere.
Related
From OptaPlanner 8.17, it seems that the code of task assigning example project has been refactored a lot. I didn't succeed in finding in the release notes nor on Github any comment about these changes.
In particular, the implementation of the problem to solve doesn't involve chained variables anymore since this version. Could someone from the OptaPlanner team explain why ? I'm also a bit confuse because the latest version of the documentation related to this example project is still referencing the previous deleted classes from version before 8.17 (eg.
org/optaplanner/examples/taskassigning/domain/TaskOrEmployee.java).
It's using #PlanningListVariable, an new (experimental) alternative to chained planning variables, which is far easier to understand and maintain.
Documentation for this new feature hasn't been written yet. We're finishing up the ListVariableListener interface and then the documantation will be updated to cover #PlanningListVariable too. At that time, it will be ready for announcement.
Unlike a normal feature, this big, complex feature took more than a year to bake. That's why it's been delivered in portions. One could argue the task assignment example shouldn't have escaped the feature branch, but it was proving extremely expensive to not merge the stable feature branches in sooner rather than later.
I'm considering adding the PH-tree to ELKI.
I couldn't find any tutorials for examples for that, and the internal architecture is not fully obvious to me at the moment.
Do you think it makes sense to add the PH-tree to ELKI?
How much effort would that be?
Could I get some help?
Does it make sense to implement only an in-memory version, as done for the kd-tree (as far as I understand)?
Some context:
The PH-tree is a spatial index that was published at SIGMOD'14: paper, Java source code is available here.
It is a bit similar to a quadtree, but much more space efficient, doesn't require rebalancing and scales quite well with dimensionality.
What makes the PH-tree different from the R*-Tree implementations is that there is no concept of leaf/inner nodes, and nodes don't will not directly map to pages. It also works quite well with random insert/delete (no bulk-loading required).
Yes.
Of course it would be nice to have a PH-tree in ELKI, to allow others to experiment with it. We want ELKI to become a comprehensive tool; it has R-trees, M-trees, k-d-trees, cover-trees, LSH, iDistance, inverted lists, space-filling-curves, PINN, ...; there are working-but-not-cleaned-up implementations of X-tree, rank-cover-trees, bond, and some more.
We want to enable researchers to study which index works best for their data easily, and of course it would be nice to have PH-tree, too. We also try to push the limits of these indexes, e.g. when supporting other distance measures than Euclidean distance.
The effort depends on how experienced you are with coding; ELKI uses some well-optimized data structures, but that means we are not using standard Java APIs in a number of places because of performance. Adding the cover tree took me about one day of work, for example (and it performed really nicely). I'd assume a more flexible (but also more memory intensive) k-d-tree would be a similar amount of work. I have not studied the PH-tree in detail, but I'd assume it is slightly more effort than that.
My guts also say that it won't be as fast as advertised. It appears to be a prefix-compressed quadtree. In my experiments bit-interleaving approaches such as required for Hilbert curves can be surprisingly expensive. It also probably only works for Minkowski metrics. But you are welcome to prove me wrong. ;-)
You are always welcome to as for help at the mailing list, or here.
I would do an in-memory variant first, to fully understand the index. Then benchmark it to identify optimization potential, and debug it. Until then, you may not have figured out all the corner cases, such as duplicate points handling, degenerated data sets etc.
Always make on-disk optional. If your data fits into memory, a memory-only implementation will be substantially faster than any on-disk version.
When contributing to ELKI, please:
avoid external dependencies. We've had bad experience with the quality of e.g. Apache Commons, and we want to have the package easy to install and maintain, so we want to keep the .jar dependencies to a minimum (also, having tons of jars with redundant functionality comes at a performance cost). I'm inclined to only accept external dependencies for optional extension modules.
do not copy code from other sources. ELKI is AGPL-3 licensed, and any contribution to ELKI itself should be AGPL-3 licensed, too. In some cases it may be possible to include e.g. public domain code, but we need to keep these to a minimum. We could probably use Apache licensed code (in an external library), but shouldn't mix them. So from a quick look, you are not allowed to copy their source code into ELKI.
If you are looking for data mining project ideas, here is a list of articles/algorithms that we would love to see contributed to ELKI (we keep this list up to date for student implementation projects):
http://elki.dbs.ifi.lmu.de/wiki/ProjectIdeas
Given that it's very hard to find anything about dependency version ranges in the official documentation (the best I could come up with is http://docs.codehaus.org/display/MAVEN/Dependency+Mediation+and+Conflict+Resolution), I wonder if they're still considered a 1st class citizen of Maven POMs.
I think most people would agree that they're a bad practice anyway, but I wonder why it's so hard to find anything official about it.
They are not deprecated in the formal sense that they will be removed in a future version. However, their limitations (and the subsequent lack of wide adoption), mean that they are not as useful as originally intended, and also that they are unlikely to get improvements without a significant re-think.
This is why the documentation is only in the form of the design doc - they exist, but important use cases were never finished to the point where I'd recommend generally using them.
If you have a use case that works currently, and can accommodate the limitations, you can expect them to continue to work for the forseeable future, but there is little beyond that in the works.
I don't know why you think that version ranges are not documented. There is a concrete abstract in the Maven Complete Reference documentation.
Nevertheless - a huge problem (in my opinion) is that it is documented that "Resolution of dependency ranges should not resolve to a snapshot (development version) unless it is included as an explicit boundary." (the link you provided) but the system behaves different. If you use version ranges you will get SNAPSHOT versions if they exists in your range (MNG-3092). The discussion if this is wanted or not has not ended yet.
Currently - if you use version ranges - you might get SNAPSHOT dependencies. So you really have to be careful and decide if this is wanted. It might be useful for your own developed depedencies but I doubt that you should use it for 3rd party libraries.
Version ranges are the only reason that Maven is still useful. Even considering not using them is bad practice as it leads you into the disaster of multi-module builds, non-functional parent poms, builds that take 10 minutes or longer, badly structured projects like Spring, Hibernate and Wicket as we cover on our Illegal Argument podcast.
To answer your question, they are not deprecated and are actively used in many projects successfully (except when Sonatype allows corrupt metadata into Apache Maven Central).
If you want a really good example of a non-multi-module build (reactor.xml's only) where version ranges are used extensively, go look at Sticky code (http://code.google.com/p/stickycode/)
I have been interested in formal methods for some time. I have used formal methods to reason about some very specific sub-areas of a few projects I have been working on. I was never able to convince other team members to try the same let alone specify an entire domain with a formal method.
One method I have found particularly interesting is Alloy. I think that it may "scale" better as foundation for an entire project because it is conceptually and notationally very close to actual programming languages. Furthermore, the tools are quite solid so that the benefits of model verification are readily available.
I'd be very much interested to hear about any real-world experiences you folks might have had with using Alloy in your projects. Do you feel that it has helped you in designing a better domain model? Did find errors in your domain model during verification? Would you use it again?
I've used Alloy on a few projects and have found it helpful; on some but not all of those projects I have been able to persuade others involved to use Alloy as well, or at least to work with the Alloy models I wrote. These projects may or may not be what you have in mind in asking for 'real-world' projects, but they certainly took place in the part of the real world I work in.
In 2006 and 2007 I created a partial Alloy model for the then-current draft of the W3C XProc specification; as far as I could tell, most members of the working group never read the paper I wrote (at http://www.w3.org/XML/XProc/2006/12/alloy-models/models.html); they said "Oh, we changed that part of the spec last week, so what the model says is no longer relevant". But the paper did manage to persuade the editor of the spec that the abstract 'component' level described in the first draft of the spec was woefully underspecified and needed to be either fully specified or dropped. He dropped it, with (I think) good results for the readability and usability of the spec.
In 2010 I made an Alloy model of the XPath 1.0 data model, which uncovered some glitches in the specification. The reaction of most interested parties (including the W3C working group responsible for maintaining the XPath 1.0 spec) has, unfortunately, not been encouraging.
A research project I'm involved with has used Alloy to model the MLCD Overlap Corpus, a collection of sample documents and related information we are creating (hyperlinks suppressed at SO's insistence); the Alloy model found a couple of errors in our initial design for the corpus catalog, so it was well worth the effort.
And we have also used Alloy to formalize some modeling work we have done on the nature of transcription and on the extension of the type/token distinction to document structure (for our paper, look for the 2010 proceedings of Balisage: The Markup Conference). This lies a little bit outside Alloy's usual area of application, as it has nothing to do with software design, but Alloy's ability to check models for consistency and generate instances has been invaluable in showing us some of the logical consequences of this or that possible axiom for our model.
To answer your specific questions: yes, Alloy has helped me specify cleaner domain models, and yes, it has found errors and glitches. They have often been small, for the reasons Daniel Jackson explains in his book Software Abstractions: first, if you use models during design, you catch errors early, when everything is still small. And, second (in Jackson's words), "In hindsight, most software design issues are trivial."
He continues: "But if you don't address them head-on, trivial issues have a nasty habit of becoming nontrivial." My experience amply confirms this. Much better to head off such problems early. So yes, I will use Alloy again.
Yes, I've used Alloy and it's cousins industrially. Alloy has been most helpful in convincing me that my models weren't wildly wrong---or rather, showing me where they were wrong and gave rise to silly results. Other more specific tools, like Song's Athena and Guttman and Ramsdell's CPSA have been more useful in their narrower domains. What more would you like to hear about?
Belatedly adding to this thread... Eunsuk Kang has recently applied Alloy to perform security analyses of web APIs for some start ups (following many applications of Alloy in security such as Apurva's analysis of OAuth and Barth et al's analysis of browser based security mechanisms for CSRF etc); Pamela Zave has been working on an impressive analysis of Chord, a peer to peer storage system, and has recently written up a fix to the original algorithm.
This is more general question then language-specific, altho I bumped into this problem while playing with python ncurses module. I needed to display locale characters and have them recognized as characters, so I just quickly monkey-patched few functions / methods from curses module.
This was what I call a fast and ugly solution, even if it works. And the changes were relativly small, so I can hope I haven't messed up anything. My plan was to find another solution, but seeing it works and works well, you know how it is, I went forward to other problems I had to deal with, and I'm sure if there's no bug in this I won't ever make it better.
The more general question appeared to me though - obviously some languages allow us to monkey-patch large chunks of code inside classes. If this is the code I only use for myself, or the change is small, it's ok. What if some other developer takes my code though, he sees that I use some well-known module, so he can assume it works as it's used to. Then, this method suddenly behaves diffrent then it should.
So, very subjective, should we use monkey patching, and if yes, when and how? How should we document it?
edit: for #guerda:
Monkey-patching is the ability to dynamicly change the behavior of some piece of code at the execution time, without altering the code itself.
A small example in Python:
import os
def ld(name):
print("The directory won't be listed here, it's a feature!")
os.listdir = ld
# now what happens if we call os.listdir("/home/")?
os.listdir("/home/")
Don't!
Especially with free software, you have all the possibilities out there to get your changes into the main distribution. But if you have a weakly documented hack in your local copy you'll never be able to ship the product and upgrading to the next version of curses (security updates anyone) will be very high cost.
See this answer for a glimpse into what is possible on foreign code bases. The linked screencast is really worth a watch. Suddenly a dirty hack turns into a valuable contribution.
If you really cannot get the patch upstream for whatever reason, at least create a local (git) repo to track upstream and have your changes in a separate branch.
Recently I've come across a point where I have to accept monkey-patching as last resort: Puppet is a "run-everywhere" piece of ruby code. Since the agent has to run on - potentially certified - systems, it cannot require a specific ruby version. Some of those have bugs that can be worked around by monkey-patching select methods in the runtime. These patches are version-specific, contained, and the target is frozen. I see no other alternative there.
I would say don't.
Each monkey patch should be an exception and marked (for example with a //HACK comment) as such so they are easy to track back.
As we all know, it is all to easy to leave the ugly code in place because it works, so why spend any more time on it. So the ugly code will be there for a long time.
I agree with David in that monkey patching production code is usually not a good idea.
However, I believe that for languages that support it, monkey patching is a very valuable tool for unit testing. It allows you to isolate the piece of code you need to test even when it has complex dependencies - for instance with system calls that cannot be Dependency Injected.
I think the question can't be addressed with a single definitive yes-no/good-bad answer - the differences between languages and their implementations have to be considered.
In Python, one needs to consider whether a class can be monkey-patched at all (see this SO question for discussion), which relates to Python's slightly less-OO implementation. So I'd be cautious and inclined to expend some effort looking for alternatives before monkey-patching.
In Ruby, OTOH, which was built to be OO down into the interpreter, classes can be modified irrespective of whether they're implemented in C or Ruby. Even Object (pretty much the base class of everything) is open to modification. So monkey-patching is rather more enthusiastically adopted as a technique in that community.