Hortonworks vs Apache projects

I want to know the difference between installing Hortonworks HDP and installing the components directly from the Apache projects. One thing I can think of is that Hortonworks probably aligns the packages so that the version of each component is compatible with the others in the suite, whereas getting them directly from the Apache projects, I may have to handle version compatibility myself. Is that correct? Is there any other difference involved, ignoring the support-subscription aspect of it?
Thanks.

There are a lot of differences between "roll your own" and using a distribution. Some of the most obvious include:
All of the various components and versions have been tested and built to work together - incompatibilities between versions (of Hive, Hadoop, Spark, etc.) can be a painful problem to sort through on your own
Most distribution providers, including Hortonworks, backport patches from unstable releases into stable releases, so even for the "same" version (e.g. Hive 1.2.1) you're getting a better release than vanilla - these can include both bug fixes and "safe" feature changes
Most distribution providers, including Hortonworks, provide some flavor of centralized platform management. I'm a big fan of Ambari (the one that comes with HDP), for example - it makes configuration and monitoring significantly easier than coordinating a vanilla install
I would strongly recommend against trying to deploy vanilla, unless it's just for learning and playing. The HDP community edition is free (in both senses) and a major improvement over doing it yourself. My last deployment of HDP was entirely based on the community edition.

Related

SemVer collision: how to release a bug fix over the last stable version when there are alpha/beta/rc versions and the work is still in progress?

I'm maintaining a JS library. Releases follow SemVer. The current stable version is 1.5.0. I'm working on 1.5.1 and have 1.5.1-beta.2 published on npm under the "next" tag. Today I got a bug report, found the issue, and am ready to fix it. The thing is, 1.5.1 is not going to be finished in the next few days; it turned out to be more complicated than I initially planned. But I want the fix published now.
What is the right strategy in this situation? The obvious approach, which I'd like to avoid, is to postpone the bug fix until 1.5.1 is done and published, and then release 1.5.2 containing the fix.
Another way is to publish the fix as 1.5.1, based on 1.5.0, and then continue the previous work, switching it from 1.5.1-beta.2 to 1.5.2 or even 1.6.0. I'm concerned about the inconsistency in the resulting chain in this case:
1.5.0 → 1.5.1-beta → 1.5.1-beta.1 → 1.5.1-beta.2 → 1.5.1 (bug fix, based on 1.5.0) → 1.5.2 (based on 1.5.1-beta.2)
How are such collisions addressed under SemVer?
Okay, so you have bug set A currently baking as 1.5.1-beta.2, and you have a new bug set B that you want to get a fix out for immediately. The correct mechanism for this is to fork from 1.5.0, fix bug set B, and release 1.5.2 (assuming you don't need a beta). Then merge your B fixes into your A working branch, release 1.5.3-beta.1, and proceed to drive that to an official release.
It gets a little more complicated when you have two parallel beta sequences running, particularly when you're not sure which is going to make it to release first, but it is manageable. The key is to keep in mind how SemVer precedence impacts the decisions your customers make (and the algorithms they apply): whether to fast-track a particular version into their production systems, versus how their developers pull bits from you.
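To see why the chain above stays consistent, it helps to look at how SemVer actually orders these versions. Below is a minimal, simplified Python sketch of SemVer precedence - it handles only dotted prerelease identifiers like the ones in this question, not build metadata:

```python
def semver_key(version):
    """Sort key implementing (simplified) SemVer precedence.

    A prerelease sorts before the corresponding final release, numeric
    prerelease identifiers compare as integers and rank below alphanumeric
    ones, and a shorter identifier list ranks below a longer one.
    """
    core, _, pre = version.partition("-")
    major, minor, patch = (int(n) for n in core.split("."))
    if not pre:
        # Final release: flag 1 sorts it after any prerelease (flag 0).
        return (major, minor, patch, 1, ())
    ids = tuple(
        (0, int(p), "") if p.isdigit() else (1, 0, p)
        for p in pre.split(".")
    )
    return (major, minor, patch, 0, ids)

versions = ["1.5.1-beta.2", "1.5.2", "1.5.0", "1.5.1", "1.5.1-beta"]
print(sorted(versions, key=semver_key))
# → ['1.5.0', '1.5.1-beta', '1.5.1-beta.2', '1.5.1', '1.5.2']
```

So whichever numbers you end up shipping, consumers sorting by precedence will always rank the hot fix and the eventual full release above every beta that preceded them.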
My production systems have two inputs:
Development is the product of my engineers.
Automated maintenance is the product of a system that:
Pulls patch releases and applies them to a fork of my current production code.
Tests the applied changes against an extensive suite of functional and performance tests.
If the tests are green, flight tests the changes in my production environment, while monitoring for unusual changes in production failure rates.
As long as everything is going well and a human doesn't step in to stop it, eventually rolls out the changes to the entire production system.
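The gate that pipeline applies can be sketched as follows. This is a hypothetical illustration, not real tooling; every name in it (`is_auto_patch_candidate`, `maintenance_cycle`, `tests_green`, `canary_healthy`) is a placeholder:

```python
def is_auto_patch_candidate(current, candidate):
    """True when `candidate` is a patch-only bump over `current`:
    same major.minor, higher patch, and no prerelease tag. Only
    these releases are eligible for automated fast-tracking."""
    core, _, pre = candidate.partition("-")
    if pre:  # never auto-apply prereleases
        return False
    major, minor, patch = (int(n) for n in core.split("."))
    cur_major, cur_minor, cur_patch = (int(n) for n in current.split("."))
    return (major, minor) == (cur_major, cur_minor) and patch > cur_patch

def maintenance_cycle(current, available, tests_green, canary_healthy):
    """One pass of the pipeline: pick the newest eligible patch, then
    gate on the test suite and a monitored production canary."""
    candidates = [v for v in available if is_auto_patch_candidate(current, v)]
    if not candidates:
        return current
    best = max(candidates, key=lambda v: tuple(int(n) for n in v.split(".")))
    if tests_green(best) and canary_healthy(best):
        return best      # roll out to the entire production system
    return current       # leave production alone; a human investigates
```

Here a 1.6.0 or a 1.5.1-beta.2 is simply ignored by the automation, which is exactly the signal being described: the version number tells the customer's tooling what it may safely fast-track.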
There are, of course, variations for services and packaged products. The point is, you can use your release points to signal to your customers' automation, or developers, that you have an important bug fix with little risk of breaking anything. There is no requirement that 1.5.2 have any lineage back to the 1.5.1 betas. You are not required to ever release a 1.5.1. It is customary, however, to add a note in your release notes that 1.5.2 is a hot fix for the bug in 1.5.0 and does not contain the fixes from the 1.5.1 betas.
While you may never encounter a need to do so, you don't have to include the bug fixes from 1.5.2 in your eventual 1.5.3 release, provided the later release passes your quality controls. It is sometimes the case that a specific bug fix winds up not being applicable to later releases.
How you maintain your product quality is entirely up to you. How you signal the level of risk/importance of a specific release is defined by the SemVer standard.

Why does Akka.Persistence still have beta releases? Is it stable?

Why does Akka.Persistence still have beta releases on NuGet? Does that imply it is still not stable and not suitable for use in production applications?
In Akka.NET, in order to get out of prerelease, a package must meet multiple criteria, such as:
Having a full test suite up and running. For clustered plugins, this also includes multi-node tests.
Having a fixed API. There are dedicated API approval tests ensuring that no public API has been accidentally changed.
Having a battery of performance tests. While many of the plugins are ready and usually fast without these, stress tests are needed to check that none of the merged pull requests introduced performance penalties.
Having all documentation written and published.
While this is a lot, not all of it is necessary to make a plugin functional. In the case of Akka.Persistence there are minor outstanding changes (like the deprecation of PersistentView in favor of persistence queries), but the plugin itself is production-ready and already used as such. However, the maturity of the persistence backend plugins used underneath may vary.
Akka.Persistence is stable now. You can install it by running the following command in the Package Manager Console:
Install-Package Akka.Persistence

How long will a version of a Python package be available on PyPI?

I would like to distribute an application that depends on several PyPI-packaged libraries. I have carefully pinned certain versions of some of these libraries, as newer versions are in some cases incompatible. My installer downloads them (with pip) at install time and sets up the environment for the application. But how long are those versions going to be available? 6 hours? 2 years? Anything in between?
I'm basically looking for some sort of policy that tells me how long those versions of libraries are going to be hosted on PyPI (and who makes that decision).
In-before-"distribute them yourself": That is an answer to a different question.
This is really about how PyPI works, not how I distribute my application.
The friendly people in #python tell me that authors can delete any version of their packages at any time.
The only way to protect yourself against a version of something being deleted is to ship it yourself (assuming its license allows it).
There is an argument for continuous integration against the latest versions on PyPI, but that assumes there will be a new version and that the author doesn't just delete the whole thing. CI is just a good practice here, not a panacea.
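If you do keep installing from PyPI, one cheap safety net is to verify after install that the exact pinned versions were actually resolved, so a deleted or replaced release fails loudly instead of silently. A sketch in Python - the package names and versions in `PINS` are illustrative only:

```python
from importlib.metadata import PackageNotFoundError, version

# Illustrative pins -- substitute your own carefully selected versions.
PINS = {"requests": "2.31.0", "lxml": "4.9.3"}

def check_pins(pins, lookup=version):
    """Return {name: (wanted, found)} for every pin that is missing or
    installed at a different version; an empty dict means all is well."""
    problems = {}
    for name, wanted in pins.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            found = None  # the package (or that version) never got installed
        if found != wanted:
            problems[name] = (wanted, found)
    return problems
```

An installer can call `check_pins(PINS)` right after running pip and abort with a clear message if anything is off, rather than shipping a subtly different environment.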

Building couchdb with minimal dependencies

I want to play around with and check out Apache CouchDB as a possible back end for a web app that I am designing. So I want an instance of CouchDB, but I also want to be able to throw it away when the testing is done. The development computer is an Ubuntu laptop (not a server). The problems are:
The Ubuntu repository has CouchDB 1.0, but the CouchDB website strongly recommends installing 1.1, built from source.
I have Erlang built and installed from source, because the Erlang distribution from the repository is defective. I don't see the point in installing another Erlang alongside it.
CouchDB has a lot of dependencies, including a whole bunch of Perl libs, that I really don't need and would prefer to throw away when I'm done.
So I am looking for a way to either:
Install CouchDB 1.1 as a package that can be easily uninstalled, or
Build CouchDB from source with as few installed dependencies as possible, so when I'm done I can just delete it. Preferably, do this without building another Erlang distribution, instead configuring it to use the existing one.
Is either of these possible, and how? Thanks in advance.
BTW, I am aware of the build-couchdb project, but from what I've read, it requires installing all the build dependencies in advance, which is undesirable because it will leave a whole bunch of dangling packages on my system without their being dependencies of a CouchDB package. It also fetches its own copy of Erlang, which is redundant for me.
(Dear moderators: this question combines issues that relate not only to programming, but also to server administration, Unix software, and, particularly, Ubuntu Linux. Therefore, it might be suitable for a few other Stack Exchange sites. I reckon it is most likely to be answered here, since this kind of hackery is often done by programmers. However, if I am wrong, feel free to migrate it, and I apologize in advance for your trouble.)
You could install CouchDB into a chroot jail.
A chroot is a way of isolating applications from the rest of your computer by putting them in a jail. This is particularly useful if you are testing an application which could potentially alter important system files.
From the Ubuntu instructions on creating a chroot jail
Another option, assuming your laptop has the appropriate hardware virtualization support, is to use KVM.
The KVM option might be more helpful in the long run, as you could move the VM's disk image onto a server.
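If you'd rather try the build-from-source route directly, a throwaway prefix install is another low-commitment option. The commands below are a sketch only; check `./configure --help` in the CouchDB 1.1 source tree, since available options vary between releases:

```shell
# Build CouchDB 1.1 into a disposable prefix, reusing the Erlang
# already on PATH (configure locates it via `erl`). Paths and flags
# are illustrative, not verified against a specific 1.1 tarball.
./configure --prefix="$HOME/couchdb-sandbox"
make && make install

# Run it from the sandbox...
"$HOME/couchdb-sandbox/bin/couchdb"

# ...and when the experiment is over, remove it completely:
rm -rf "$HOME/couchdb-sandbox"
```

This keeps the install (and its deletion) confined to one directory, though the build-time dependencies you apt-get along the way will still need removing by hand.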

Should I start with Trac 0.12?

I'm going to start using Trac for the first time. From what I've gathered, the latest 0.12 is capable of supporting multiple projects easily (which is something I will need, since I have about 5 projects). However, it seems 0.12 is still in development (0.12-dev). So, my question is: is it good enough for a Trac newbie like me? Does anyone have any experience using it? It will be installed on a Linux server.
BTW, I'll only be using the basic functions such as the svn browser, wiki, tickets, and so on.
0.12 is only going to support a subset of the multiple-projects milestone: you can now connect multiple source repositories to a single Trac environment, but you will still need to create your own logic for handling multiple projects inside that single environment, with ticket components or what have you.
I'm running all my environments on 0.12 trunk (currently r9280); I follow the Trac development timeline and hand-pick the next revision to upgrade to when something important gets a fix. Some of my environments have multiple svn and git repositories connected. svn is rock solid; GitPlugin occasionally causes some quirks (rev-caching issues mainly), but for me it's all minor compared to the convenience I get.
I would definitely recommend moving straight to 0.12-dev; I've already written a bit about some other benefits over 0.11.