Storing data files on Pentaho Repository

I am working with Pentaho products: Pentaho Data Integration (PDI) and Pentaho Server.
From PDI, I was able to create a connection to the Pentaho Repository running on a Pentaho Server. I have already created some jobs and transformations, stored them in the Repository, and they all execute correctly.
However, when I configure the input/output sources of the jobs/transformations, I can only use data files from my local machine. Is there any way to store data files on the Repository and configure the jobs/transformations to read data from them?
Thank you so much.
P.S.: I am working with PDI v9.3 CE and Pentaho Server v9.3 CE.

This is one of the solutions I found.
For each transformation/job stored in the Pentaho Repository, there are three run configurations: Pentaho local, Pentaho server, and Pentaho slave.
Pentaho local: executes the transformation/job on the machine you are working on, which means it can only read and write data on local storage.
Pentaho server: executes the transformation/job on the machine hosting the Pentaho Server (and therefore the Pentaho Repository). In this mode, the transformation/job can read and write data on the server's storage.
Pentaho slave: executes the transformation/job using a slave server's resources.
So, in my case, the solution is to switch the run configuration of the transformation/job to the Pentaho server configuration.
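To make the difference concrete, here is a minimal Java sketch (my own illustration, not part of the original setup) using the embedded Kettle API: whichever machine executes the transformation is the one whose filesystem the file paths resolve against. The .ktr path, the INPUT_DIR variable and the data directory are hypothetical; under the "Pentaho server" run configuration the equivalent paths would have to exist on the server host instead of the local machine.

import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class RunTransWherePathsResolve {
    public static void main(String[] args) throws Exception {
        KettleEnvironment.init();

        // Hypothetical transformation whose CSV input step reads ${INPUT_DIR}/sales.csv
        TransMeta meta = new TransMeta("/path/to/read_sales.ktr");
        Trans trans = new Trans(meta);

        // This path is resolved on the machine that executes the transformation:
        // the local workstation for "Pentaho local", the server host for "Pentaho server".
        trans.setVariable("INPUT_DIR", "/opt/pentaho/data");

        trans.execute(null);
        trans.waitUntilFinished();
        if (trans.getErrors() > 0) {
            throw new RuntimeException("Transformation finished with errors");
        }
    }
}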

Related

Tableau with Hive Kerberos

My project uses published tableau data-sources.
These data-sources have been created using tableau desktop.
All connect to Hive database using the Native Hortonworks Hadoop Hive connector.
We have a database user and a Tableau user with publish rights.
Database credentials are embedded in the extract, which is then published to Tableau Server.
The reports fetch data from these published data-sources.
The Hive database is now being Kerberized and SSL-enabled.
Will my existing published data-sources still work?
Do I have to re-create all the extracts and publish them to Tableau Server again?
What would be the best plan to migrate all these data-sources to the new Kerberized environment?
Regards
Please see the link below from the Tableau community forum; the versions may differ, but people there were able to solve the Kerberos Hive connectivity issue.
https://community.tableau.com/thread/149383

How can I use NiFi to ingest data from/to ADLS

I would like to use NiFi to connect to ADLS. My scenario is like this: NiFi is installed and running on a Windows machine. Now I want to move data from my local Windows directory to ADLS. I am not using any Hadoop components for now. From ADLS, I then want to move that data to SQL Server, which is also in Azure.
How can I connect the Windows-based NiFi to ADLS? All the instructions I found involve configuring core-site.xml and copying JARs into a NiFi-specific folder. But since I don't have Hadoop running (and therefore no core-site.xml file), how can I connect NiFi to ADLS?
Can anyone please share pointers on how this can be done?
Thanks in advance.
You can try using the ExecuteGroovyScript processor with the native Azure library to work with ADLS.
Here is a java example:
https://github.com/Azure-Samples/data-lake-store-java-upload-download-get-started/blob/master/src/main/java/com/contoso/sample/UploadDownloadApp.java
But it can easily be converted to a Groovy script.
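For reference, here is a minimal Java sketch along the lines of that sample, using the azure-data-lake-store-sdk (ADLStoreClient). The service-principal credentials, account name and paths are placeholders; inside an ExecuteGroovyScript processor the same calls would simply be rewritten as Groovy.

import com.microsoft.azure.datalake.store.ADLStoreClient;
import com.microsoft.azure.datalake.store.DirectoryEntry;
import com.microsoft.azure.datalake.store.IfExists;
import com.microsoft.azure.datalake.store.oauth2.AccessTokenProvider;
import com.microsoft.azure.datalake.store.oauth2.ClientCredsTokenProvider;

import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class AdlsUploadSketch {
    public static void main(String[] args) throws Exception {
        // Service principal credentials -- placeholders, not real values.
        String authTokenEndpoint = "https://login.microsoftonline.com/<tenant-id>/oauth2/token";
        String clientId = "<application-id>";
        String clientKey = "<client-secret>";
        String accountFQDN = "<your-account>.azuredatalakestore.net";

        AccessTokenProvider provider =
                new ClientCredsTokenProvider(authTokenEndpoint, clientId, clientKey);
        ADLStoreClient client = ADLStoreClient.createClient(accountFQDN, provider);

        // Upload a local Windows file to ADLS (both paths are hypothetical).
        byte[] localBytes = Files.readAllBytes(Paths.get("C:/data/input.csv"));
        try (OutputStream out = client.createFile("/landing/input.csv", IfExists.OVERWRITE)) {
            out.write(localBytes);
        }

        // Confirm the upload by reading back the file's metadata.
        DirectoryEntry entry = client.getDirectoryEntry("/landing/input.csv");
        System.out.println("Uploaded " + entry.fullName + " (" + entry.length + " bytes)");
    }
}

This only covers the ADLS side of the flow; moving the data on to SQL Server would be a separate step.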

Apache Ignite Application to load database table?

How can I create a maven project in java to load an Oracle database table on the Apache Ignite server?
Also, I'm supposed to create the project on my local machine, while Apache Ignite runs on a remote machine to which I have an SSH connection.
You can use Ignite Web Console to do that. There is a public Ignite Web Console hosted by GridGain.
It will ask you to download the Ignite Web Console Agent, connect to your Oracle database, analyze your data structure, and output a zipped Maven project with data-load functionality out of the box (via loadCache).
Deployment of the project to the remote machine is out of scope of this exercise.
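For a rough idea of what the generated data-load code boils down to, here is a hedged Java sketch using Ignite's CacheJdbcPojoStore and loadCache. The schema, table, columns, JDBC URL and credentials are placeholders for a hypothetical PERSON table, and the Oracle JDBC driver would need to be on the classpath.

import java.sql.Types;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.store.jdbc.CacheJdbcPojoStoreFactory;
import org.apache.ignite.cache.store.jdbc.JdbcType;
import org.apache.ignite.cache.store.jdbc.JdbcTypeField;
import org.apache.ignite.cache.store.jdbc.dialect.OracleDialect;
import org.apache.ignite.configuration.CacheConfiguration;

public class LoadOracleTable {
    public static void main(String[] args) {
        // Map rows of a hypothetical MYSCHEMA.PERSON table to cache entries.
        JdbcType personType = new JdbcType();
        personType.setCacheName("PersonCache");
        personType.setDatabaseSchema("MYSCHEMA");
        personType.setDatabaseTable("PERSON");
        personType.setKeyType(Long.class);
        personType.setValueType(Person.class);
        personType.setKeyFields(new JdbcTypeField(Types.NUMERIC, "ID", Long.class, "id"));
        personType.setValueFields(new JdbcTypeField(Types.VARCHAR, "NAME", String.class, "name"));

        // Cache store that reads the table through JDBC (Oracle driver required).
        CacheJdbcPojoStoreFactory<Long, Person> storeFactory = new CacheJdbcPojoStoreFactory<>();
        storeFactory.setDialect(new OracleDialect());
        storeFactory.setTypes(personType);
        storeFactory.setDataSourceFactory(() -> {
            try {
                oracle.jdbc.pool.OracleDataSource ds = new oracle.jdbc.pool.OracleDataSource();
                ds.setURL("jdbc:oracle:thin:@//db-host:1521/ORCL"); // placeholder URL
                ds.setUser("scott");                                // placeholder credentials
                ds.setPassword("tiger");
                return ds;
            } catch (java.sql.SQLException e) {
                throw new RuntimeException(e);
            }
        });

        CacheConfiguration<Long, Person> cacheCfg = new CacheConfiguration<>("PersonCache");
        cacheCfg.setCacheStoreFactory(storeFactory);

        try (Ignite ignite = Ignition.start()) {
            IgniteCache<Long, Person> cache = ignite.getOrCreateCache(cacheCfg);
            cache.loadCache(null); // pulls every row of PERSON into the cache
            System.out.println("Loaded entries: " + cache.size());
        }
    }

    /** Minimal POJO matching the hypothetical PERSON table. */
    public static class Person {
        private long id;
        private String name;

        public long getId() { return id; }
        public void setId(long id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }
}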

Copying reports from one reporting database to another

I am a total newbie in SQL/SQL Server, and I am using SSRS to set up a new reporting server/service and upload some .rdl files to it.
I have a reporting server on one machine, which has a lot of reports and data sources uploaded to its database.
I created a new reporting server with a fresh database on another machine, and what I want to do is copy the old database content to the fresh one (the reports, the data sources, etc.).
I have no copies of the individual reports that I could upload to the new server via localhost/reports.
Is there a fast solution to this? Please explain it in detail, because I have never worked with SQL before.
Different ways to do this:
Report Server Databases
Use the detach/attach or backup/restore instructions here. Both of these methods require a backup of encryption keys on the existing instance, which are then restored to the new report server instance. Instructions on backup/restore of encryption keys here. Migrating the ReportServer and ReportServerTempdb databases is the easiest way to ensure all content is available on the new server.
Report Object Scripting
Reporting Services Scripter is an older tool (still working with SSRS 2008 R2; not sure about 2012) that can be used to transfer objects (folders, shared data sources, shared data sets, reports, etc.) between report servers. It is a good choice if you want to pick and choose what is migrated.
If you are receiving an error regarding unsupported scale-out deployment, this means you are running Standard edition and need to remove the old report server entry from the database in the new location. It can be done using Reporting Services Configuration Manager, or by using rskeymgmt at command line.
Reporting Services Configuration Manager
Open Reporting Services Configuration Manager and connect to the new report server instance.
Click on Scale-out Deployment to view registered report servers.
Select the old report server instance and click the Remove Server button.
Command line and rskeymgmt
Browse to the Tools\Binn folder of your SQL Server client installation.
Run the following to list the registered report servers:
rskeymgmt -l -i <instance name>
Using the installation ID (GUID) of the old report server, remove it:
rskeymgmt -r <installation ID> -i <instance name>
More info on scale-out deployments and rskeymgmt here.
To migrate Reporting Services, use the migration manual from MSDN (https://msdn.microsoft.com/en-us/library/ms143724(v=sql.120).aspx). If you encounter the error "the feature: scale-out deployment is not supported in this edition of reporting services. (rsOperationNotSupported)", go to the ReportServer database and remove the old encryption key from the table dbo.Keys.

How to load SQL data into Hortonworks?

I have installed the Hortonworks Sandbox on my PC. I also tried it with a CSV file, and the data comes in as a structured table (Hive + Hadoop), which is fine. Now I want to migrate my current SQL database (MS SQL Server 2008 R2) into the Sandbox. How do I do this? I also want to connect it to my project (VS 2010, C#).
Is it possible to connect through ODBC?
I heard Sqoop is used for transferring data from SQL to Hadoop, so how can I do this migration with Sqoop?
You could write your own job to migrate the data, but Sqoop would be more convenient. To do that, you have to download Sqoop and the appropriate connector, the Microsoft SQL Server Connector for Apache Hadoop in your case. You can download it from here. Please go through the Sqoop user guide; it contains all the information in proper detail.
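The actual transfer is normally a single sqoop import run from the Sandbox's command line; purely as a hedged illustration (keeping the examples in Java), the same invocation can be wrapped through Sqoop's programmatic entry point. The connection string, credentials and table names below are placeholders, and the SQL Server JDBC driver has to be on Sqoop's classpath.

import org.apache.sqoop.Sqoop;

public class SqlServerToHiveImport {
    public static void main(String[] args) {
        // Equivalent of running "sqoop import ..." on the command line.
        String[] importArgs = {
            "import",
            "--connect", "jdbc:sqlserver://<host>:1433;databaseName=<MyDatabase>",
            "--username", "<user>",
            "--password", "<password>",
            "--table", "Customers",      // placeholder source table
            "--hive-import",             // create and load a matching Hive table
            "--hive-table", "customers",
            "-m", "1"                    // one mapper is enough for a small table
        };
        System.exit(Sqoop.runTool(importArgs));
    }
}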
And Hive does support ODBC. You can find more on this at this page.
I wrote down the steps you need to go through in the Hortonworks Sandbox to install the JDBC driver and get it to work: http://hortonworks.com/community/forums/topic/import-microsoft-sql-data-into-sandbox/
To connect to Hadoop in your C# project you can use the Hortonworks Hive ODBC driver from http://hortonworks.com/thankyou-hdp13/#addon-table. Read the PDF (which is also on that page) to see how it works (I used Hive Server Type 2 with user name sandbox)