Exporting data sources between environments in pentaho - datasource

I'm new with Pentaho and I'm trying to set up an automatic deployment process for the pentaho business analytics platform repository, but I'm having troubles to find out how to proceed with the data sources.
I would like to do export/import all the data sources, the same that here is explained with the repository (Reporting, Analyzer, Dashboards, Solution Files...) but with the data connections, mondrian files, schemas....
I know there's way to backup and restore the entire repository (explained here), but that's not the way I want to proceed, since the entire repository could contain undesired changes for production.
This would need to be with command line or rest system or some other thing that be triggered by Jenkins.

Did you try import-export with the -ds(DataSource) qualifier ? This will include the data connection, mondrian schema and metadata models.
Otherwise, you can import everything, unzip, filter according a certain logic (to be defined by the guy in charge of the deployment), zip again and export it to prod. A half day project with the Pentaho Data Integrator.

Related

How mysql repository works in Pentaho User console?

Based on Pentaho guideline (https://help.pentaho.com/Documentation/8.2/Setup/Installation/Archive/MySQL_Repository) I successfully converted pentaho File based repository to MySQL database repository.
Now does anyone have any idea how MySQL repository store the data in database? It means If create new folder, new dashboard or new connection then how pentaho store this data in mysql database. Also need to know which tables is used for which purpose of data store.
Default created attached schema and tables based on mysql pentaho repository.
Please Provide any inputs or any reference material for same?
Pentaho's repository comprises three third party technologies: Jackrabbit, Hibernate, and Quartz. Reports/Jobs/Transformations and any other artifacts stored inside the Pentaho Server are generally stored in Jackrabbit. Scheduling info and triggers are stored in Quartz. And diagnostic info is stored in Hibernate (such as who accessed what reports, how long a report took to run, etc.).
None of this info is designed to be human readable directly out of the database tables. These are sort of "black box" technologies. These are third party technologies that Pentaho simply leverages for its repository functions. If you have additional questions, I'd recommend checking out the technologies themselves on their project pages.

Deploying USQL project

I am new to data lake analytics and using USQL.
I am currently setting up data factory pipeline which would replace an existing SSIS workflow. The data factory pipeline would essentially
Extract data transactional database into ADLS
Transform raw entities using USQL
Load the data into SSAS using custom activity
Question
I have a USQL project set up and wanted if there was a standard way of deploying them to ADLA other than just uploading the scripts to a folder in the store.
Great question!
I'm not sure about a standard way, or even a way that might be considered best practice yet. But I use all of the tools you mention to perform very similar tasks.
To try and answer your question: What I do is create the U-SQL scripts as stored procedures within the logical ADLA database. In the VS USQL project I have 1 script per stored proc. The ADF activities then call the proc name. This gives you the right level of disconnection between services and also means you don't need additional blob storage for USQL files.
In my VS solution I often also have a PowerShell project to help manage things. Specifically one what takes all my 'usp_' U-SQL scripts to create one big DDL style thing that can be deployed to the logical ADLA database.
The PowerShell then does the deployment for me using the submit job cmdlet. Example below.
Submit-AzureRmDataLakeAnalyticsJob `
-Name $JobName `
-AccountName $DLAnalytics `
–Script $USQLProcDeployAll `
-DegreeOfParallelism $DLAnalyticsDoP
Hope this gives you a steer. I also accept that these tools are still fairly new. So open to other suggestions.
Cheers

MongoDB connection with Pentaho Kettle (PDI)

I've just downloaded Pentaho Data Integration Community (pdi-ce-6.1.0.1-196) a.k.a. Kettle, with the goal of designing an ETL routine to make nightly migrations from MongoDB scheme into PostgreSQL.
I couldn't achieve the very first task: create a MongoDB connection. MongoDB is not listed as a Connection Type in the New Connection dialog, so I chose Generic database. Then, I failed to find anything related to MongoDB in the Custom Driver Class Name field required for the generic connection.
Is it possible that the installation/configuration went wrong with Kettle? I remember that I had to kill the first startup because it hanged forever.
Or does PDI-CE lacks some component that I must get somewhere else?
PDI handles Mongodb differently than other databases.
If working on a transformation (vs a job), go to the "Big Data" group of steps and there are two steps - one for MongoDB Input and one for MongoDB Output.
Within those steps you specify the connection information to your database.
Hope that helps,
Mark
P.S. There is also a "MongoDB Delete" in the marketplace that comes in useful when deleting data from collections.

The Pentaho BI Platform Workflow Issue

I have been working with Pentaho for the last few days. I have been able to setup the Pentaho Report Designer to generate a sample report by follow their documentation. Then I follow this article http://www.robertomarchetto.com/www/how_to_use_pentaho_report_designer_tutorial and managed to export the report to Pentaho BI server.
All I don't understand is Pentaho workflow. What should be the process I should follow which means what's the purpose of exporting the export to Pentaho BI server? Why there is a Data Integration tool? Why there is a BI sever when I can export the report from the Designer tool?
Requirement
All I want to do is retrieve the data from the MYSQL DB. Put them into a data-mart. Then from the data-mart generate a report.(According to what I have read, creating a data mart is the efficient way).
How can I get it done?
Pentaho Data Integration can be used to make this report generation automated.
In report designer you will be passing a parameter or set of parameters to generate a single report output.
With Data integration you can generate the reports for different set of parameters. for eg: if reports are generated on daily basis, we can make it automated for the whole month, so that there is no need of generating reports daily and manually.
And using the Pentaho Business Intelligence server we can make all these operations scheduled.
To generate Data/Table(Fact tables/dimension table) in MYSQL DB From difference source like files/different DB - Data Integration tool comes in to picture .
To create Schema on top of Fact tables - Mondrian tool
To handle user/roles on top of created cubes -Meta data editor
To create simple reports on top of small tables - Report Designer
For sequential Execution (at a go) usage of DI jobs/transformation , Reports, Java script - Design Studio
thanks to user surya.thanuri # forums.pentaho.com
The Data Integration tool is mostly for ETL, it's a separate tool and you can ignore it unless you are doing complex analysis of data from multiple dissimilar data sources. You don't need to 'export' reports to the pentaho server, you can write them directly to a directory then refresh the repository from inside the Pentaho web application. Exporting them is just one workflow technique.
You're going to find that there are about a dozen ways to do any one thing with Pentaho. For instance I use the CDA datasources with my reports vice placing the sql code inside my report. Alternatively you can link up to a Data Integration server to execute the Data Integration scripts to view a result set.
Just to answer your datamart question. In general a datamart should probably be supported by either the Data Integration tool (depending on your situation I don't exactly recommend this) or database functions/replication streams (recommended).
Just to hazard a guess, it sounds like someone tossed you a project saying: We need a BI system, here's the database where the data is stored, here are the reports we're already getting. X looked at Pentaho and liked it. You should use that.
First thing you need to do is understand the shape of the data, volume, tables, interrelations. Figure out what the real questions they want to answer are. Determine whether they need real time reporting, etc..etc. Just getting the datamart together itself, if you even need one, can take quite awhile. I think you may have jumped the gun on Pentaho itself.
thanks to user flamierd # forums.pentaho.com

How to do deployment using alter script in ssas

Is any thing wrong if i create alter script on the entire database in analysis service in the development server SSMS and execute that script on the production server SSMS instead of deploying through BIDS?
no, you actually should never use BIDS to deploy to prod. BIDS will always overwrites the management settings(security and partition) of the target server.
the best option is to use the Deployment Wizard. It enables you to generate an incremental deployment script that updates the cube and dimension structures. Can customize how roles and partitions are handled. It uses as input files the XML output files generated by building the SSAS in BIDS and you can run on several modes:
Silent Mode (/s): Runs the utility in silent mode and not display any dialog boxes.
Answer file mode (/a): Do not deploy. Only modify the input files.
Output mode (/o): No user interface is displayed. Generate the XMLA script that would be sent to the deployment targets. Deployment will not occur.
If you want a complete synchronization, you can use the "Synchronize Database Wizard". It pretty much clones a database. When the destination database already exists, it performs metadata synchronization and incremental data synchronization. When the destination database does not exist, a full deployment and data synchronization is done.
I think the main disadvantage of scripting the whole database is that everything may be reprocessed. Also, if another team or team member is responsible for deploying the script it may be a lot harder to review and understand if everything is rebuilt with each update.
I work for Red Gate and we recently introduced a free tool called SSAS Compare to help manage this scenario. It helps you to create a script containing just the changes you want to deploy