The Pentaho BI Platform Workflow Issue - pentaho

I have been working with Pentaho for the last few days. I have been able to setup the Pentaho Report Designer to generate a sample report by follow their documentation. Then I follow this article http://www.robertomarchetto.com/www/how_to_use_pentaho_report_designer_tutorial and managed to export the report to Pentaho BI server.
All I don't understand is Pentaho workflow. What should be the process I should follow which means what's the purpose of exporting the export to Pentaho BI server? Why there is a Data Integration tool? Why there is a BI sever when I can export the report from the Designer tool?
Requirement
All I want to do is retrieve the data from the MYSQL DB. Put them into a data-mart. Then from the data-mart generate a report.(According to what I have read, creating a data mart is the efficient way).
How can I get it done?

Pentaho Data Integration can be used to make this report generation automated.
In report designer you will be passing a parameter or set of parameters to generate a single report output.
With Data integration you can generate the reports for different set of parameters. for eg: if reports are generated on daily basis, we can make it automated for the whole month, so that there is no need of generating reports daily and manually.
And using the Pentaho Business Intelligence server we can make all these operations scheduled.

To generate Data/Table(Fact tables/dimension table) in MYSQL DB From difference source like files/different DB - Data Integration tool comes in to picture .
To create Schema on top of Fact tables - Mondrian tool
To handle user/roles on top of created cubes -Meta data editor
To create simple reports on top of small tables - Report Designer
For sequential Execution (at a go) usage of DI jobs/transformation , Reports, Java script - Design Studio
thanks to user surya.thanuri # forums.pentaho.com
The Data Integration tool is mostly for ETL, it's a separate tool and you can ignore it unless you are doing complex analysis of data from multiple dissimilar data sources. You don't need to 'export' reports to the pentaho server, you can write them directly to a directory then refresh the repository from inside the Pentaho web application. Exporting them is just one workflow technique.
You're going to find that there are about a dozen ways to do any one thing with Pentaho. For instance I use the CDA datasources with my reports vice placing the sql code inside my report. Alternatively you can link up to a Data Integration server to execute the Data Integration scripts to view a result set.
Just to answer your datamart question. In general a datamart should probably be supported by either the Data Integration tool (depending on your situation I don't exactly recommend this) or database functions/replication streams (recommended).
Just to hazard a guess, it sounds like someone tossed you a project saying: We need a BI system, here's the database where the data is stored, here are the reports we're already getting. X looked at Pentaho and liked it. You should use that.
First thing you need to do is understand the shape of the data, volume, tables, interrelations. Figure out what the real questions they want to answer are. Determine whether they need real time reporting, etc..etc. Just getting the datamart together itself, if you even need one, can take quite awhile. I think you may have jumped the gun on Pentaho itself.
thanks to user flamierd # forums.pentaho.com

Related

How mysql repository works in Pentaho User console?

Based on Pentaho guideline (https://help.pentaho.com/Documentation/8.2/Setup/Installation/Archive/MySQL_Repository) I successfully converted pentaho File based repository to MySQL database repository.
Now does anyone have any idea how MySQL repository store the data in database? It means If create new folder, new dashboard or new connection then how pentaho store this data in mysql database. Also need to know which tables is used for which purpose of data store.
Default created attached schema and tables based on mysql pentaho repository.
Please Provide any inputs or any reference material for same?
Pentaho's repository comprises three third party technologies: Jackrabbit, Hibernate, and Quartz. Reports/Jobs/Transformations and any other artifacts stored inside the Pentaho Server are generally stored in Jackrabbit. Scheduling info and triggers are stored in Quartz. And diagnostic info is stored in Hibernate (such as who accessed what reports, how long a report took to run, etc.).
None of this info is designed to be human readable directly out of the database tables. These are sort of "black box" technologies. These are third party technologies that Pentaho simply leverages for its repository functions. If you have additional questions, I'd recommend checking out the technologies themselves on their project pages.

Querying PowerBI data from MS-SQL

I finally decided to ask this (after a lot of google searching):
So we use Power BI for data visualization and thus in it are some calculated dashboards / data outputs which are used to monitor data quality etc. I want to be able to historical log these results so that over-time we can monitor progress i.e. was data quality improved. This is the end of the initial problem.
One approach to this problem was to connect to PowerBI from the MS-SQL side - hoping we can then set timed triggers to do the log by READING THE POWER-BI DASHBOARDS: So how do I query that (I have already developed a method to determine the connection using the Power-BI port as described here:
EXPORTING DATA FROM POWER BI DESKTOP TO MS-SQL
This is a screenshot from one of my MS-SQL connections through "Analysis Services":
I am assuming the objects named like "LocalDateTable_" are the actual BI analysis I want to query. "New Query" is an MDX type of Query. Should I go this route for my problem (logging powerbi analyses)?
At first this sounds crazy but on reflection I guess it was only a matter of time, and a sign of the maturity of Power BI solutions ...
I would use the SQL Server Profiler to capture the queries generated while you use your dashboard & report.
https://insightsquest.com/2017/05/07/profiler-trace-for-power-bi-desktop/
Then I would build an SSIS package to run the MDX queries and deliver the datasets to SQL Server, with extra columns e.g. StartTime.

SSAS - MSBI - Solution - Suggestions

Is it correct in my understanding that we can build SSAS cubes sourcing from the transaction Systems? I meant the not the live but copy of the Live.
I'm trying to see if there is any scope to address few reporting needs without the need to build a traditional Data Warehouse and then build cubes on top of the data warehouse, instead build cubes to do Financial monthly aggregated reporting needs sourcing from backup copy of the Transaction systems.
Alternatively, if you have any better way to proceed please suggest.
Regards,
KK
You can create a set of views on top of you transactional system tables and then build your SSAS cubes ontop of those views. This would be less effort than creating a fully fledged datawarehouse.
I am a data warehouse developer (and therefore believe in cubes), but not every reporting solution warrants the cost of building a cube. If your short to medium term reporting requirements are fixed and you don't have users requiring data to be sliced differently each week, then a series of fixed reports may suffice.
You can create a series of SQL Server Reporting Services reports (or extract to Excel) either directly against your copied transactional data, or against a series of summarised tables that are created periodically. If you decide to utilise a series of pre-formatted reporting tables, try to create tables that cover multiple similar reports (rather than 1 monthly report table = 1 report) for ease of ongoing maintenance.
There are many other important aspects to this that you may need to consider first. Like how busy is the transaction system, what is the size of the data, concurrency and availability issues etc.
It is absolutely fine to have a copy of your live data and then build a report on the top of it. Bear in mind that the data you see in the report will not be the latest and there will be a latency factor depending on the frequency of your data pull.

Pulling data across multiple servers

The company i am working for is implementing Share-point with reporting servers that runs on an SQL back end. The information that we need lives on two different servers. The first server being the Manufacturing server that collects data from PLCs and inputs that information into a SQL database, the other server is our erp server which has data for payroll and hours worked on specific projects. The i have is to create a view on a separate database and then from there i can pull the information from both servers. I am having a little bit of trouble with the syntax for connecting the two servers to run the View. We are running ms SQL. If you need any more information or clarification please let me know.
Please read this about Linked Servers.
Alternatively you can make a Data Warehouse - which would be a reporting data base. You can feed this by either making procs with linked servers or use SSIS packages if they're not linked.
It all depends on a project size and complexity, but in many cases it is difficult to aggregate data from multiple sources with Views. The reason is that the source data structure is modeled for the source application and not optimized for reporting.
In that case, I would suggest going with an ETL process, where you would create a set of Extract, Transform and Load jobs to get data from multiple sources (databases) into a target database where data will be stored in the format optimized for reporting.
Ralph Kimball has many great books on the subject, for example:
1) The Data Warehouse ETL Toolkit
2) The Data Warehouse Toolkit
They are truly worth the read if you are dealing with data

What are ways to transfer tables from Oracle to SQL Server

I've been searching the internet for this question:
What are ways to transfer data and tables on a daily basis from an Oracle's Hyperion to SQL Server 2000?
I am an intern at a company and trying to figure out possible ways to do this. Any help or point in the right direction is greatly appreciated
This is going to depend a lot on specifics. Here are just a few possible solutions:
DTS
DTS is packaged with SQL 2000 and is made for this kind of a task. If written correctly, your DTS package can have good error-handling and be rerunnable/reusable.
SSIS
SSIS is actually packaged with SQL 2005 and above, but you can connect it to other databases. It's basically a better version of DTS. (technically it's radically different than DTS, but has a lot of the same functionality)
Linked Servers
From SQL 2000 you should be able to connect directly to your Oracle database as a linked server. In the pros column this kind of direct access can be easy to work with if you don't have any other technical skills such as DTS or SSIS, but it can be complex to get the initial set-up right and there may be security concerns/issues.
Build Your Own
Depending on what other technologies you use you can build your own application to do the ETL (Extract/Transform/Load, which is what you're doing). This could be in .NET, Java, etc. In the pros column you can use something with which you're familiar but there's a big downside here in that most of the low level type of work is already out there in tools like DTS/SSIS, so why reinvent the wheel?
BCP
You can simply extract the data from Oracle as .csv files (or some other format) and then import them back in using SQL Server's Bulk Copy Process. This can be fast, but there aren't many bells and whistles to go with this. If this is a one-time thing with just a few tables though then this is probably the easiest and fastest way to do it.
Third Party Applications
There are a slew of ETL applications already written out there (Data Import, Data Slave, etc.). They will usually provide wizards and one-click solutions (maybe a few more than one click), but they are also going to cost a bit of extra money.
EDIT:
Given your latest comment, I would probably go with a DTS package that's scheduled in SQL Agent to run daily. You can add in error-handling and have the system email/text/call someone if there's ever an issue (or do positive case reporting - ie. send a message when it's successful so that someone knows that there's a problem if they don't get a message each day.
In our company we use ADO.Net for the same task.
We created a source to Oracle , taking all data and then creating it in SQL server
You could write DTS packages to copy the data, and schedule them to run within Sql Server Agent.
See DTS Overview for information on DTS packages.
Here's a tutorial on creating a DTS package: Creating DTS Packages With SQL Server 2000
Oracle Hyperion is a suite of products, largely unrelated to Oracle's database product. I expect you are referring to a product such as Hyperion Financial Management or Hyperion Strategic Finance. These products have APIs that can be consumed using COM Interop or web services. The data can be extracted from the internal multidimensional database by analyzing the database metadata, creating dimension trees, and then using the information to create selections, that represent subcubes within the database; allowing you to get or set cell data.
I don't know what your level of knowledge of multidimensional databases is, but unless it is substantial you may find the task pretty hard. You also need to get a handle on the particular product API.
My company specializes in these kinds of activities, and we have components for this kind of thing. Drop me a line on my blog if you need further advice.
danielvaughan.org
Cheers,
Daniel
I don't know anything about Hyperion, but SQL Server 2000 is very old and may not have a driver to be able to pull data from Hyperion if the version of that is newer than the year 2000. You may need to look to see if there is a way to push the data from Hyperion rather than pull it into SQL Server 2000. One way i have done this is the past is to create pipe delimited text file from the data base that orginally has the data and palce it in a processing directory. I do know that DTS will process a pipe-delimited text file. So if you can't find a driver to process this data directly, consider if you can push it out to file and then process. You wil have to schedule a time gap between the job on Hyperion that creates the file and the DTS package job. But if you are only doing it once a day, that's prbably not a problme.