Does Pentaho Kettle create dimension and fact tables - pentaho

I am new to Pentaho Kettle and wanted to know: does it create the data warehouse and the dimension and fact tables automatically?
Can anyone provide a link where I can study the full feature set of Kettle?

How about the Docs?
Kettle Documentation
Specifically, look at the Combination Lookup/Update step and the Dimension Lookup/Update step.
Any other questions, check here or on the Pentaho forums.
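To give a feel for what those steps do: the Dimension Lookup/Update step can maintain a type-2 slowly changing dimension, i.e. look up the natural key's current row and, if a tracked attribute changed, close that version and insert a new one. Here is a minimal conceptual sketch of that logic in Python with sqlite3 (the table and column names are invented for the example, not Kettle's own):

```python
import sqlite3

# Conceptual sketch of a type-2 slowly changing dimension update, the kind of
# logic Kettle's Dimension Lookup/Update step handles for you. An open row
# (valid_to IS NULL) is the current version of a customer.
def scd2_upsert(conn, customer_id, city, today):
    cur = conn.execute(
        "SELECT dim_id, city FROM dim_customer "
        "WHERE customer_id = ? AND valid_to IS NULL", (customer_id,))
    row = cur.fetchone()
    if row is None:
        # New natural key: insert the first version.
        conn.execute(
            "INSERT INTO dim_customer (customer_id, city, valid_from, valid_to) "
            "VALUES (?, ?, ?, NULL)", (customer_id, city, today))
    elif row[1] != city:
        # Attribute changed: close the current version, open a new one.
        conn.execute("UPDATE dim_customer SET valid_to = ? WHERE dim_id = ?",
                     (today, row[0]))
        conn.execute(
            "INSERT INTO dim_customer (customer_id, city, valid_from, valid_to) "
            "VALUES (?, ?, ?, NULL)", (customer_id, city, today))
    # Unchanged rows are left alone.

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE dim_customer (
    dim_id INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id INTEGER, city TEXT, valid_from TEXT, valid_to TEXT)""")

scd2_upsert(conn, 42, "Berlin", "2013-01-01")
scd2_upsert(conn, 42, "Munich", "2013-06-01")  # change -> new version
versions = conn.execute(
    "SELECT city, valid_to FROM dim_customer ORDER BY dim_id").fetchall()
print(versions)  # [('Berlin', '2013-06-01'), ('Munich', None)]
```

In Kettle you configure the same behavior in the step dialog (natural key, version fields, date range fields) instead of writing code; the fact table then stores the surrogate key (`dim_id` here) returned by the lookup.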

Related

Example data for Hive tutorial

The original Hive tutorial available online refers to a dataset called "pv_2008-06-08.txt":
https://cwiki.apache.org/confluence/display/Hive/Tutorial
And of course, it is referenced in dozens of tutorials all over the Internet. However, I can't find the original data anywhere. Does anybody have a clue where it is?
After reading through that site, I found that the examples given there are outdated. Please use the newer link below for more examples.
https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-UsageandExamples
NOTE: Many of the following examples are out of date.  More up to date information can be found in the LanguageManual.
If you are still interested in that dataset, I suggest you mail the community and ask for it (please see the link below):
http://hive.apache.org/mailing_lists.html
Hortonworks datasets:
I recently came across these Hortonworks datasets, which can be used for creating databases and queries in Hive and Pig:
https://app.box.com/v/hadoopcrashcoursedata
If you want to try this dataset, here is the link for creating tables from it:
http://hortonworks.com/hadoop-tutorial/hello-world-an-introduction-to-hadoop-hcatalog-hive-and-pig/#section_4

Why Pentaho PDI has many names?

I was just wondering why Pentaho PDI has so many names, such as Spoon, Kettle, and Pentaho PDI. What is the real name of this tool?
(I'm talking about the tool to extract data from certain data source and modify and migrate to another location)
As dirk said - basically PDI started out life like any other independent open source product. The naming schemes of such products are typically quite jokey. You can clearly see the reason for the original naming - everything is related to things you do in the kitchen. For example, you use a spoon to mix your recipe, and Spoon is the tool you use to build your ETL. So it's actually a pretty useful naming scheme.
Obviously when Pentaho came along they needed something more professional. Hence PDI was created.
In reality everyone still uses the old names. (Even pentaho - look at the jira, github, and shell script names!)
I'm surprised there are both a kettle and a pdi tag on here, actually! They should be merged!

How to use Pentaho Data Integration (Spoon 4.4) for Star Modeling

I'm looking for a tool to build a Mondrian cube XML description file to use over my star schema (one fact table and 4 dimension tables directly linked to the fact table).
I'm a bit lost; browsing tutorials doesn't help, since they seem to cover a different version than mine. Mine doesn't include the Star Model perspective; I just have a Model perspective.
My feeling is that this Model perspective is only useful for building a Mondrian cube over a flat model (just one big fact table). I would be glad if someone could confirm that. But my main question is: how do I build a Star Model description schema with Pentaho's tools? If a plugin is missing, how do I install it?
For the moment you need to download a separate tool called Schema Workbench. Get it from the mondrian project (not the pentaho project) on SourceForge.
This tool will be replaced by something new in the next 6 months, but it does what you need and has handy validation built in.
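For orientation, a Mondrian schema for a star model is just an XML file that maps each dimension table onto the fact table through its foreign key. A minimal hand-written sketch (all table, column, and cube names here are invented; Schema Workbench generates the same structure through its GUI):

```xml
<Schema name="SalesStar">
  <Cube name="Sales">
    <!-- The fact table at the center of the star -->
    <Table name="fact_sales"/>
    <!-- One dimension table, joined via the fact table's foreign key -->
    <Dimension name="Product" foreignKey="product_id">
      <Hierarchy hasAll="true" primaryKey="product_id">
        <Table name="dim_product"/>
        <Level name="Category" column="category"/>
        <Level name="Product" column="product_name"/>
      </Hierarchy>
    </Dimension>
    <!-- Measures come from columns of the fact table -->
    <Measure name="Sales Amount" column="amount" aggregator="sum"/>
  </Cube>
</Schema>
```

Each of your 4 dimension tables becomes one `<Dimension>` element with its own `foreignKey`, which is exactly the part the flat-model perspective cannot express.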

mondrian adapter for bigquery

It would be mighty useful to have a way to query Google's BigQuery with MDX. I believe the natural solution would be a Mondrian adapter.
Is something like this in the works?
The reason I'm asking is because there is a lot of know-how in MDX and an MDX connector would allow us to reuse what we already know.
Furthermore, MDX is ideally suited for OLAP queries. Things like hierarchies and calculating a ratio of one's parent (e.g. % contribution to total) are standardized in MDX but can be solved in 100 different ways in SQL.
Calculating a Moving Average of the last 3 non empty weeks is still complicated in SQL and easy in MDX. There are many examples.
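To make that point concrete, here is the "average of the last 3 non-empty weeks" logic spelled out procedurally in Python (the weekly figures are invented; `None` marks an empty week). MDX can express this as a single set expression over the time hierarchy, while plain SQL needs window functions plus non-empty filtering:

```python
# Moving average over the last 3 non-empty weeks, computed procedurally.
# None represents a week with no sales, which must be skipped, not averaged.
weekly_sales = [10.0, None, 14.0, None, None, 18.0, 22.0]

def moving_avg_last_3_nonempty(series):
    out = []
    for i in range(len(series)):
        # Keep only non-empty values up to and including this week,
        # then take the last 3 of them.
        nonempty = [v for v in series[:i + 1] if v is not None][-3:]
        out.append(sum(nonempty) / len(nonempty) if nonempty else None)
    return out

print(moving_avg_last_3_nonempty(weekly_sales))
# [10.0, 10.0, 12.0, 12.0, 12.0, 14.0, 18.0]
```

The point is not the Python itself but that the "skip empty periods" rule forces explicit bookkeeping in procedural code and in SQL, whereas MDX has non-empty semantics built into its set operations.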
And lastly, it would allow us to analyze data from Google BigQuery with an Excel Pivot or any of the 100+ other existing tools that emit MDX queries.
Cheers,
Micha
There is a demo here that is using Mondrian/BigQuery with the Saiku user interface:
http://dev.analytical-labs.com/
This archive contains dependencies that can be used to set up a BigQuery data source in Saiku's embedded Mondrian server (got this from the Saiku twitter feed):
http://t.co/EbtaP95G
Their instructions are here for setting up BigQuery:
https://gist.github.com/4073088
You can download Saiku (with embedded Tomcat and Mondrian) here to run locally for testing:
http://analytical-labs.com/downloads.php
One issue I notice is that the drill-down functionality doesn't work because of the limitations of BigQuery SQL. My guess is that Mondrian devs will have to add some special SQL support for BigQuery to get around this. For example, any fields used in an ORDER BY clause must also be in the SELECT field list.
There is no existing BigQuery integration with Pentaho's Mondrian. One thing I would point out is that BigQuery is already very fast over massive datasets, so some of Mondrian's advantages may be moot with a BigQuery back end. However, I could imagine that one could use an existing Pentaho analysis tool to explore data. I'd like to know more about the use case.

any way to populate random data into a database for testing purposes?

I have been searching on Stack Overflow, but so far I have only found this website: http://www.generatedata.com/#generator
However, it doesn't support foreign key constraints.
Is there any software that can fill a database with lots of tables?
Or is the only way to write my own script?
Thanks
Check out SQL Data Generator from Red Gate:
http://www.red-gate.com/products/sql-development/sql-data-generator/
The features page lists:
Foreign key support for generating consistent data across multiple tables
Not exactly what you're looking for, but DBUnit can be used to generate meaningful test data.
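If you do end up writing your own script, the trick for foreign key consistency is simply to populate parent tables first and have child rows pick from the keys that actually exist. A minimal sketch in Python with sqlite3 (the schema and row counts are invented for the example; the same pattern works against any database driver):

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # have SQLite enforce the FK
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    amount REAL)""")

random.seed(1)  # reproducible test data

# Parents first, so their keys exist before any child references them.
customer_ids = list(range(1, 11))
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(i, f"customer_{i}") for i in customer_ids])

# Children pick a random existing parent key, so every FK is valid.
orders = [(i, random.choice(customer_ids), round(random.uniform(1, 500), 2))
          for i in range(1, 101)]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)

# Sanity check: no order points at a missing customer.
orphans = conn.execute("""SELECT COUNT(*) FROM orders o
    LEFT JOIN customers c ON o.customer_id = c.id
    WHERE c.id IS NULL""").fetchone()[0]
print(orphans)  # 0
```

For deeper schemas, generate tables in dependency order (a topological sort of the FK graph) and keep each table's generated keys in memory for its children to sample from.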