Pentaho Data Integration commercial costs + Azure hosting

I did some Google searches but could not find any clear answer. How much does PDI/Kettle cost for commercial usage? Is it potentially free? Can it be hosted in Azure?

The company I work for was recently given a quote for the Pentaho EE licence. I'm from the UK, so the price is in pounds. We were offered a quote of £30,000 a year and told it is usually £50,000 a year.
Needless to say, we settled on CE.

It depends on your workload. If you only need a limited amount of usage, Pentaho Community Edition would be better, since it's free; for Pentaho Enterprise Edition you need to pay for the licence, and I don't know much about the exact pricing.
But since you are looking only for Kettle, I would suggest going with the CE edition. Hope it helps :)

There are two flavors of Pentaho: a limited free version (Community Edition), and a professional version (Enterprise Edition).
The Community Edition Kettle ETL (Extract, Transform and Load) tool is open-source and quite powerful, but the free version of the Business Analytics tool is not as versatile. You can find both here: http://community.pentaho.com/
The Enterprise Edition's price will vary depending on your planned use, primarily the number of cores you want to run it on. I can't give exact numbers, but as of December 2014 it's the most affordable of the professional BI platforms, probably about 10% of the cost of MicroStrategy. It might still be out of reach of most small to medium businesses, though.

Related

Pentaho OLAP limit

I have been working on multidimensional analysis with Pentaho Community. The problem is, when I do the aggregations and filters, I get no more than 1000 records (rows) in the output. I want to know if I am doing something wrong or if the Pentaho analysis tool has a limitation.
If so, does the Power BI community edition have a better limit? Or can you suggest another community tool I could continue the work with?
Are you using Saiku for OLAP analysis?
For Saiku, the limit is TABLE_LAZY_SIZE = 1000 (the default), which you can change as per your requirement.
reference: http://saiku-documentation.readthedocs.io/en/latest/saiku_settings.html
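For reference, the change itself is a one-line setting; exactly which configuration file it lives in depends on your Saiku version (the settings documentation linked above lists the file), so treat this as an illustrative properties-style excerpt, with the value 10000 chosen arbitrarily:

```
# Raise the default 1000-row limit on lazily fetched result tables
TABLE_LAZY_SIZE=10000
```

After changing the setting, restart the Saiku server so the new limit takes effect.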

Testing my anaphora resolution tool

I am in the course of building an anaphora resolution tool. I have done a lot of literature review and I have a pretty good idea of what I should do to build a basic tool. The problem, however, is how to test it: I can't find any annotated corpus to test it on. Could someone suggest how I would measure the precision and recall of my tool?
From here:
http://www.mitpressjournals.org/doi/pdf/10.1162/COLI_a_00152
Section 4.1
OntoNotes-Dev – development partition of OntoNotes v4.0 provided in the CoNLL-2011 shared task (Pradhan et al. 2011).
OntoNotes-Test – test partition of OntoNotes v4.0 provided in the CoNLL-2011 shared task.
ACE2004-Culotta-Test – partition of the ACE 2004 corpus reserved for testing by several previous studies (Culotta et al. 2007; Bengtson and Roth 2008; Haghighi and Klein 2009).
ACE2004-nwire – newswire subset of the ACE 2004 corpus, utilized by Poon and Domingos (2008) and Haghighi and Klein (2009) for testing.
MUC6-Test – test corpus from the sixth Message Understanding Conference (MUC-6) evaluation.
You can find MUC details here
http://www-nlpir.nist.gov/related_projects/muc/muc_data/muc_data_index.html
Just look around at the start of the experimental section in your references. You are bound to find links. If you look at the most commonly used ones, you will find your data sets.
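To make the measurement concrete: once you have gold annotations from one of the corpora above, a simple link-based precision/recall computation looks like the sketch below. This is a minimal Python illustration with made-up mention pairs; real coreference evaluation normally uses the MUC/B³/CEAF metrics as implemented in the CoNLL shared-task reference scorer.

```python
# Link-based scoring: a "link" is an (anaphor, antecedent) pair.
# Gold links would come from an annotated corpus (OntoNotes, MUC-6, ...).

def precision_recall(predicted, gold):
    """Return (precision, recall, F1) over sets of coreference links."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy example (illustrative mention pairs, not from a real corpus):
gold = [("he", "John"), ("she", "Mary"), ("it", "the report")]
pred = [("he", "John"), ("she", "the report")]
p, r, f = precision_recall(pred, gold)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.5 0.33 0.4
```

Link-based scoring is the simplest option; the cluster-based metrics (B³, CEAF) penalize errors differently, so it is worth reporting more than one.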

Alternatives to Essbase

I have Essbase as the BI solution (for Predictive Analytics and Data Mining) in my current workplace. It's a really clunky tool, hard to configure and slow to use. We're looking at alternatives. Any pointers as to where I can start at?
Is Microsoft Analysis Services an option I can look at? SAS or any others?
Essbase's focus and strength are in the information management space, not in predictive analytics and data mining.
The top (and expensive) players in this space are SAS (with the Enterprise Miner & Enterprise Guide combination) and IBM with SPSS.
Microsoft SSAS (Analysis Services) is a lot less expensive (it's included with some SQL Server editions) and has good data mining capabilities, but is more limited in the OR (operations research) and econometrics/statistics space.
You could also use R, an open-source alternative that keeps growing in popularity and capability; for example, some strong BI players (SAP, MicroStrategy, Tableau, etc.) are developing R integration for predictive analytics and data mining.
Check www.kpionline.com, a cloud-based product built on Artus.
It has many prebuilt dashboards, scenarios and functions for doing analysis.
Another tool you could check is MicroStrategy, which also has many analysis functions.

Quality Audit Software applied to Free Software

I'm investigating quality audit software applied to free software, but I haven't had much luck so far. I already found information on software that performs quality audits (http://en.wikipedia.org/wiki/Computer-aided_audit_tools) but don't know whether those apply to free software too.
Any idea or information on this matter would be very useful.
Thanks in advance.
From experience working with CAAT technology, IDEA has been the strongest software by far, but it is not free. I'm sure there is some free software, but in my experience it tends to be lower quality.
There's the annual Open Source Code Quality report, which seems to be the most relevant document.
References
Projects Audited by Coverity Scan
2013 Coverity Scan Report (pdf)
2012 Coverity Scan Report (pdf)

Scaling cheaply: MySQL and MS SQL

How cheap can MySQL be compared to MS SQL when you have tons of data (and joins/search)? Consider a site like stackoverflow full of Q&As already and after getting dugg.
My ASP.NET sites are currently on SQL Server Express, so I don't have any idea how the cost compares in the long run. Although after some quick research, I'm starting to envy the savings MySQL folks get.
MSSQL Standard Edition (32 or 64 bit) will cost around $5K per CPU socket. 64 bit will allow you to use as much RAM as you need. Enterprise Edition is not really necessary for most deployments, so don't worry about the $20K you would need for that license.
MySQL is only free if you forego a lot of the useful tools offered with the paid licenses, and it's probably (at least as of 2008) going to be a little more work to get it to scale like SQL Server.
In the long run I think you will spend much more on hardware and people than you will on just the licenses. If you need to scale, then you will probably have the cash flow to handle $5K here and there.
The performance benefits of MS SQL over MySQL are fairly negligible, especially if you mitigate them with server- and client-side optimizations like server caching (in RAM), client caching (cache and expires headers) and gzip compression.
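As a concrete illustration of two of those mitigations, here is a small framework-agnostic Python sketch; the page-rendering function is a hypothetical stand-in for an expensive database query plus template render, and the header names follow standard HTTP.

```python
# Two mitigations from above: an in-RAM server-side cache and gzip
# compression of the response body.
import gzip
from functools import lru_cache

@lru_cache(maxsize=1024)  # simple in-RAM server-side cache
def render_question_page(question_id: int) -> str:
    # Hypothetical stand-in for a DB query + template render.
    return f"<html><body>Question {question_id}</body></html>" * 50

body = render_question_page(42).encode("utf-8")
compressed = gzip.compress(body)

# Headers a web framework would attach to the response:
headers = {
    "Content-Encoding": "gzip",              # tell the client it's gzipped
    "Cache-Control": "public, max-age=300",  # client-side caching: 5 minutes
}

print(len(body), len(compressed))  # repetitive HTML compresses well
```

Repeat hits on the same question are then served from memory instead of the database, and the compressed body cuts bandwidth, which is often where the real scaling cost shows up.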
I know that stackoverflow has had problems with deadlocks from reads/writes coming at odd intervals but they're claiming their architecture (MSSQL) is holding up fine. This was before the public beta of course and according to Jeff's twitter earlier today:
the range of top 32 newest/modified questions was about 20 minutes in the private beta; now it's about 2 minutes.
That the site hasn't crashed yet is a testament to the database (as well as good coding and testing).
But why not post some specific numbers about your site?
MySQL is extremely cheap when you have the distro (or staff to build) that carries MySQL Enterprise edition. This is a High Availability version which offers multi-master replication over many servers.
Pros are low (license) costs after the initial purchase of hardware (gigs of RAM needed!) and time to set up.
The drawbacks are suboptimal performance with many joins, no full-text indexing, no stored procedures (I think), and the need to replicate grants to every master node.
Yet it's easier to run than the replication/proxy balancing setup that's available for PostgreSQL.