How can I use NiFi to ingest data from/to ADLS?

I would like to use NiFi to connect to ADLS. My scenario is this: NiFi is installed and running on a Windows machine. I want to move data from a local directory on that Windows machine to ADLS. I am not using any Hadoop components for now. From ADLS I then want to move the data to SQL Server, which is also in Azure.
How can I connect the Windows-hosted NiFi to ADLS? All the instructions I have found involve configuring core-site.xml and copying Hadoop JARs into a NiFi-specific folder. But since I don't have Hadoop running (and therefore no core-site.xml file), how can I connect NiFi to ADLS?
Can anyone share pointers on how this can be done?
Thanks in advance.

You can try the ExecuteGroovyScript processor together with the native Azure Data Lake Store library to work with ADLS.
Here is a Java example:
https://github.com/Azure-Samples/data-lake-store-java-upload-download-get-started/blob/master/src/main/java/com/contoso/sample/UploadDownloadApp.java
It can easily be converted to a Groovy script.
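For orientation, below is a minimal upload sketch in plain Java, modeled on that sample. It assumes the azure-data-lake-store-sdk jar (and its dependencies) is made available to the script, and the account name, tenant id, client id and secret are placeholders for your own service principal; it is not a drop-in NiFi script. Because Groovy accepts Java syntax, the body of main() can be pasted into ExecuteGroovyScript with only minor changes.

    // Minimal sketch, assuming the azure-data-lake-store-sdk jar is on the classpath.
    // Account, tenant, client id and secret below are placeholders, not real values.
    import com.microsoft.azure.datalake.store.ADLStoreClient;
    import com.microsoft.azure.datalake.store.IfExists;
    import com.microsoft.azure.datalake.store.oauth2.AccessTokenProvider;
    import com.microsoft.azure.datalake.store.oauth2.ClientCredsTokenProvider;

    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.io.OutputStream;

    public class AdlsUploadSketch {
        public static void main(String[] args) throws Exception {
            String accountFQDN = "youraccount.azuredatalakestore.net";
            String tokenEndpoint = "https://login.microsoftonline.com/<tenant-id>/oauth2/token";
            AccessTokenProvider provider =
                    new ClientCredsTokenProvider(tokenEndpoint, "<client-id>", "<client-secret>");

            // Connect to the ADLS account.
            ADLStoreClient client = ADLStoreClient.createClient(accountFQDN, provider);

            // Stream a local Windows file into an ADLS path.
            try (InputStream in = new FileInputStream("C:\\data\\input.csv");
                 OutputStream out = client.createFile("/landing/input.csv", IfExists.OVERWRITE)) {
                byte[] buffer = new byte[4 * 1024 * 1024];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
            }
        }
    }

Reading back from ADLS works the same way via client.getReadStream(path).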

Related

Storing data files on Pentaho Repository

I am working with Pentaho products: Pentaho Data Integration (PDI) and Pentaho Server.
From PDI I was able to create a connection to the Pentaho Repository running on a Pentaho Server. I have already created some jobs and transformations, stored them in the Repository, and they all execute fine.
However, when I configure the input/output sources of the jobs/transformations, I can only use data files from my local machine. Is there any way to store data files on the Repository and configure the jobs/transformations to read data from them?
Thank you so much.
P.S.: I am working with PDI v9.3 CE and Pentaho Server v9.3 CE.
This is one solution I found.
For each transformation/job stored on the Pentaho Repository, there are three run configurations: Pentaho local, Pentaho server, and Pentaho slave.
Pentaho local: executes the transformation/job on the machine being used, which means it can only read and write data in local storage.
Pentaho server: executes the transformation/job on the machine hosting the Pentaho Server (and therefore the Pentaho Repository); with this mode the transformation/job can read and write data in the server's storage.
Pentaho slave: executes the transformation/job using a slave server's resources.
So, in my case, the solution is to switch the run configuration of the transformation/job to the Pentaho server configuration.

How to connect sql database with ignite cluster to sync data?

I am new to Apache Ignite. I created an Ignite cluster and connected my Node.js thin client to it. It is working fine, but it only creates the caches defined by the cache-creation functions in the JS file. Now I want to sync my SQL Server data with Ignite. Any idea how I can do this?
I also tried GridGain, but it does not let me create a free cluster.
Please refer to 3rd Party Persistence documentation regarding RDBMS integration.
GridGain Web Console can help you set up database integration by generating Maven project corresponding to your RDBMS data model.
GridGain Community Edition is free to use as long as you deploy it yourself. However, 3rd party persistence is also supported by stock Apache Ignite.
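To give a sense of what that integration looks like in plain Apache Ignite, here is a minimal sketch of a cache backed by a SQL Server table through CacheJdbcPojoStoreFactory. The Person class, the PERSON table and its columns, and the data source are hypothetical; the real mapping would follow your own schema (or be generated for you by the Web Console mentioned above).

    // Minimal sketch: cache backed by a hypothetical SQL Server table PERSON(ID, NAME).
    import javax.cache.configuration.Factory;
    import javax.sql.DataSource;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cache.store.jdbc.CacheJdbcPojoStoreFactory;
    import org.apache.ignite.cache.store.jdbc.JdbcType;
    import org.apache.ignite.cache.store.jdbc.JdbcTypeField;
    import org.apache.ignite.cache.store.jdbc.dialect.SQLServerDialect;
    import org.apache.ignite.configuration.CacheConfiguration;

    public class SqlServerCacheStoreSketch {
        public static void main(String[] args) {
            // Describe how the PERSON table maps to the cache key/value classes.
            JdbcType personType = new JdbcType();
            personType.setCacheName("PersonCache");
            personType.setDatabaseTable("PERSON");
            personType.setKeyType(Long.class);
            personType.setValueType(Person.class);
            personType.setKeyFields(new JdbcTypeField(java.sql.Types.BIGINT, "ID", Long.class, "id"));
            personType.setValueFields(new JdbcTypeField(java.sql.Types.VARCHAR, "NAME", String.class, "name"));

            CacheJdbcPojoStoreFactory<Long, Person> storeFactory = new CacheJdbcPojoStoreFactory<>();
            storeFactory.setDialect(new SQLServerDialect());
            storeFactory.setTypes(personType);
            storeFactory.setDataSourceFactory(new SqlServerDataSourceFactory());

            CacheConfiguration<Long, Person> cacheCfg = new CacheConfiguration<>("PersonCache");
            cacheCfg.setCacheStoreFactory(storeFactory);
            cacheCfg.setReadThrough(true);   // cache misses are loaded from SQL Server
            cacheCfg.setWriteThrough(true);  // puts are written back to SQL Server

            try (Ignite ignite = Ignition.start()) {
                IgniteCache<Long, Person> cache = ignite.getOrCreateCache(cacheCfg);
                cache.loadCache(null); // initial bulk load from the database
            }
        }

        // Placeholder: construct and return a DataSource pointing at your SQL Server instance.
        public static class SqlServerDataSourceFactory implements Factory<DataSource> {
            @Override public DataSource create() {
                throw new UnsupportedOperationException("supply your own SQL Server DataSource");
            }
        }

        // Hypothetical value class matching the PERSON table (getters/setters omitted for brevity).
        public static class Person implements java.io.Serializable {
            public long id;
            public String name;
        }
    }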

Apache Ignite Application to load database table?

How can I create a Maven project in Java to load an Oracle database table onto the Apache Ignite server?
Also, I'm supposed to create the project on my local machine while Apache Ignite runs on a remote machine to which I have SSH access.
You can use Ignite Web Console to do that. There is a public Ignite Web Console hosted by GridGain.
It will ask you to download the Ignite Web Console Agent, connect to your Oracle database, analyze your data structure, and output a zipped Maven project with data-load functionality out of the box (via loadCache), as sketched below.
Deployment of the project to the remote machine is out of scope of this exercise.
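For orientation only, the load step in such a generated project usually boils down to starting a node (often in client mode) with the generated configuration and calling loadCache() on the cache mapped to your Oracle table. The configuration file name and cache name below are assumptions, not the actual generated names:

    // Minimal sketch, assuming the generated project defines a cache named "PersonCache"
    // and ships a Spring XML configuration file; both names here are placeholders.
    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;

    public class LoadOracleTableSketch {
        public static void main(String[] args) {
            // Join the remote cluster as a client node.
            Ignition.setClientMode(true);
            try (Ignite ignite = Ignition.start("generated-client-config.xml")) {
                // Pull the Oracle table contents into the cache defined by that configuration.
                ignite.cache("PersonCache").loadCache(null);
            }
        }
    }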

How to load SQL data into the Hortonworks?

I have installed the Hortonworks Sandbox on my PC. I also tried it with a CSV file, and the data comes in as a table structure, which is fine (Hive + Hadoop). Now I want to migrate my current SQL database (MS SQL Server 2008 R2) into the Sandbox. How do I do this? I also want to connect it to my project (VS 2010, C#).
Is it possible to connect through ODBC?
I heard Sqoop is used for transferring data from SQL to Hadoop, so how can I do this migration with Sqoop?
You could write your own job to migrate the data, but Sqoop is more convenient. To do that you have to download Sqoop and the appropriate connector, in your case the Microsoft SQL Server Connector for Apache Hadoop; you can download it from here. Please go through the Sqoop user guide; it contains all the information in proper detail. A typical import command is sketched below.
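As a rough illustration (not part of the original answer), a Sqoop import from SQL Server into Hive generally looks something like the following; the host, database, table, and credentials are placeholders, and the exact JDBC URL depends on the connector and driver version you install:

    sqoop import \
      --connect "jdbc:sqlserver://<sql-server-host>:1433;databaseName=MyDatabase" \
      --username <user> --password <password> \
      --table Customers \
      --hive-import --hive-table customers \
      -m 1

The --hive-import flag creates and loads the corresponding Hive table, so the imported data becomes queryable from the Sandbox.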
And Hive does support ODBC. You can find more on this at this page.
I wrote down the steps you need to go through in the Hortonworks Sandbox to install the JDBC driver and get it to work: http://hortonworks.com/community/forums/topic/import-microsoft-sql-data-into-sandbox/
To connect to Hadoop from your C# project you can use the Hortonworks Hive ODBC driver from http://hortonworks.com/thankyou-hdp13/#addon-table. Read the PDF (which is also on that page) to see how it works (I used Hive Server Type 2 with the user name sandbox).

Accessing Hive through web browser using thrift php

I have Hive installed on my Ubuntu machine, and I installed PHP5 and the Apache2 server as well.
I started the Thrift server using hive --service hiveserver.
Querying Hive tables from a PHP file on the command-line interface (CLI) gives me the expected results,
but from the web browser (http://localhost:10000/) I'm not able to invoke Hive.
I tried googling the problem but couldn't find a solution. Please help.
The Hive Thrift server only provides a Thrift service for Hive queries, not a web service, which is why opening port 10000 in a browser does not work.
I think what you need is HWI (the Hive Web Interface). I recommend this project; we use it in our production environment.
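For reference, in the older Hive releases that ship HWI, it is typically enabled with properties along these lines in hive-site.xml and then started as its own service; treat the values (especially the war file path, which varies by version) as an example rather than exact settings:

    <property>
      <name>hive.hwi.listen.host</name>
      <value>0.0.0.0</value>
    </property>
    <property>
      <name>hive.hwi.listen.port</name>
      <value>9999</value>
    </property>
    <property>
      <!-- adjust the war file name to match your Hive version -->
      <name>hive.hwi.war.file</name>
      <value>lib/hive-hwi-<your-version>.war</value>
    </property>

Then start it with hive --service hwi and browse to http://localhost:9999/hwi (note that this is a different port from the Thrift server's 10000).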