I believe Apache Hive can be embedded in Java programs. Can somebody please direct me to the page where "Embedded Hive" can be downloaded? I need to embed Hive to be able to run it on Windows, which is where I am developing my application. Further instructions for embedding and code samples will also be useful.
Hive supports embedded mode only in the sense that the RDBMS which stores the meta information for the Hive tables can run locally or on a standalone server (ref https://cwiki.apache.org/confluence/display/Hive/HiveClient ). Furthermore, Hive with its accompanying database is merely an orchestrator for a string of MapReduce jobs, which requires the Hadoop framework to be running as well.
Use the class org.apache.hadoop.hive.service.HiveServer.HiveServerHandler, and make sure that hive/conf is on the classpath, along with all the Hive jars from hive/lib. This embedded client needs to run on the same machine where Hive is installed. If your hive-site.xml uses Derby, the embedded client will create a .metastore folder; if your hive-site.xml uses a standalone database, the embedded client will communicate with that database directly, so make sure it is running.
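For example, a minimal hive-site.xml for the embedded-Derby case might contain something like the following (the values are illustrative; check the defaults shipped with your own installation):

```xml
<configuration>
  <!-- Embedded Derby metastore: Hive creates a local metastore_db on first use -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.EmbeddedDriver</value>
  </property>
</configuration>
```

With a standalone database, ConnectionURL would instead point at the JDBC URL of the running database server.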
Related
In development I have used a Text File Output step in Pentaho 8 CE.
While using the local client installation, output files are written normally to the file system.
When I installed Pentaho Server 8 CE and configured everything to use MySQL as the repository, I noticed that the system was no longer writing files to the file system.
I suspect that this is because Jackrabbit has been configured to use MySQL as the repository, following the official documentation (https://help.pentaho.com/Documentation/8.2/Setup/Installation/Archive/MySQL_Repository).
Is it possible to configure Jackrabbit so all files use the filesystem?
If so, where in the documentation is this process documented?
Is there any alternative step which forces using local file system?
Quick answers:
No.
See above.
The files can be written, but most likely they’re inside your pentaho-solutions folder inside the server installation. You must use absolute paths when running from the Pentaho repository.
I want to use Pentaho for my work. After a bit of research I found that to store the ktr/kjb files I can use either a database repository or the file system as a repository. However, I can't find any benefit of using a database repository over the file system. The basic purpose of the repository here is to create a common location where I can keep all the developed ktr/kjb files in the production environment. With a database repository, all the developed ktr/kjb files in production are held in the database, and every time I need to run a job/transformation I connect to the database to fetch the respective ktr/kjb file (similar to how Informatica stores transformations). A file-based repository, on the other hand, is simply a folder holding all the developed files.
Can somebody explain the pros and cons of both types of repository?
Please let me know if you need any other information.
Thanks in advance.
When several people develop on the same jobs/transformations, the database repository will hold the changes and ensure everyone works from the latest versions.
The pros of a filesystem are, of course, ease of backup, no database connection that can trouble you, and the possibility of using other, more modern and mature version control systems for the files than the database repositories offer.
If you are using the free community edition, I would definitely go with the file repository, along with external file-based version control and migration systems. If you are using the enterprise edition, then you might want to consider the database repository, since you can then use Pentaho's built-in version control and migration systems.
I have set up a single-node cluster of Hadoop 2.6, but I need to integrate ZooKeeper and HBase with it.
I am a beginner with no prior experience in big data tools.
How do you set up ZooKeeper to coordinate a Hadoop cluster, and how do we use HBase over HDFS?
How do they combine to make an ecosystem?
For standalone mode, just follow the steps provided in this HBase guide: http://hbase.apache.org/book.html#quickstart
HBase has a standalone mode that makes it easy for starters to get going. In standalone mode, HBase, HDFS, and ZooKeeper all run in a single JVM process.
It depends on the kind of system that you want to build. As you said, the Hadoop ecosystem is made of three major components: HBase, HDFS, and ZooKeeper. Although they can be installed independently from each other, sometimes there is no need to install them all, depending on the kind of cluster that you want to set up.
Since you are using a single-node cluster, there are two HBase run modes that you can choose from: standalone mode and pseudo-distributed mode. In standalone mode there is no need to install HDFS or ZooKeeper; HBase handles everything transparently. If you want to use pseudo-distributed mode, you can run HBase against the local filesystem or against HDFS. If you want to use HDFS, you'll have to install Hadoop. As for ZooKeeper, again, HBase will do the job by itself (you just need to tell it so through the configuration files).
In case you want to use HDFS in pseudo-distributed mode, downloading Hadoop gives you both HDFS and MapReduce. If you don't want to execute MapReduce jobs, just ignore its tools.
If you want to learn more, I think that this guide explains it all very well: https://hbase.apache.org/book.html (check the HBase run modes).
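As a rough illustration of the difference between the two modes, the key setting is hbase.rootdir in hbase-site.xml (the paths below are examples, not values you must use):

```xml
<configuration>
  <!-- Standalone mode: local filesystem, no HDFS or external ZooKeeper needed -->
  <property>
    <name>hbase.rootdir</name>
    <value>file:///home/user/hbase</value>
  </property>

  <!-- Pseudo-distributed over HDFS instead: point rootdir at the namenode
       and enable distributed mode (requires Hadoop to be running):

  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  -->
</configuration>
```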
I want to configure Python/Jython in IBM BPM so that these files can be executed directly from a process app. How can I do that?
How to setup this entry in WebSphere Application Server?
Why do you need to install Python or Jython on IBM BPM? If you need it to deploy using the WAS command line, there are commands that are not related to Jython or Python and can do the same.
I don't believe that IBM BPM Standard really handles this use case (although more details would help). It may be part of the "Advanced" offering, but I'm not as familiar with the Integration Designer product.
IBM BPM Standard allows you to call Java code directly, either via LiveConnect (bad) or by executing Java code you place in JAR files in the server files of your process app (good). I have seen this used to leverage Java's ability to call command-line scripts in order to issue some of the wsadmin scripts, but that has been the limit of the integration with Jython that I have seen.
For details about creating Java connectors you can use this article: http://www.ibm.com/developerworks/bpm/bpmjournal/1206_olivieri/1206_olivieri.html. While it says 7.5.1, the approach works from TeamWorks 7 through IBM BPM 8.5.5.
Can you give more details about the use case you are trying to meet with this technical approach?
You can call any system process API/command using Java, and that Java code can be called from IBM BPM by packaging it as JAR libraries.
A system process API/command can then execute Python or any other code.
IBM BPM > Jar libs > System (OS) Process API/Commands > Python
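The Java link in that chain can be sketched roughly as follows. This is plain JDK code, not an IBM BPM API, and it assumes a python3 executable is available on the server's PATH:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class ScriptRunner {

    // Runs an external command and returns its combined stdout/stderr.
    public static String run(String... command) throws Exception {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true);
        Process process = pb.start();
        StringBuilder output = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line);
            }
        }
        process.waitFor();
        return output.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical call-out from Java to a Python one-liner; packaged as a
        // JAR in the process app's server files, this is the kind of helper
        // IBM BPM could invoke.
        System.out.println(run("python3", "-c", "print(6 * 7)"));
    }
}
```

Packaged into a JAR and placed in the process app's server files, a helper like this provides the "Jar libs > OS process > Python" path described above.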
I haven't come across any such use case.
I am new to CloudBees and have been trying to find out how I can run, on the CloudBees infrastructure, an existing JBoss Portal Server based application which we build in our locally hosted CI.
Our stack has the following components
JDK 1.6
JBoss Portal Server (EPP 4.3)
Oracle Express Edition (XE)
Would appreciate any help from the community to ensure that I don't discard the option of running Jenkins in the cloud on the CloudBees platform without proper research.
You will have to set up your build job to install and start the adequate runtime.
JDK 6 is available as part of the CloudBees runtimes. You can then use the /private repository to store EPP 4.3 as a zip and expand it to /tmp during a pre-build step.
The same principle applies to your database, but I'm not sure you can install Oracle XE without user interaction and without being root. I remember doing this myself some years ago on Ubuntu, and it was not as trivial as just unzipping a binary distro.
Is your code tied to this DB? Or are you using some DB abstraction layer that would let you test against another DB runtime (MySQL/Postgres)?
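If such an abstraction layer exists, the swap can be as small as choosing the JDBC URL from the environment. A minimal sketch (the TEST_JDBC_URL variable name is made up for illustration; no connection is opened here):

```java
public class DbConfig {

    // Pick the JDBC URL from the environment so CI can point the tests at
    // MySQL/Postgres, while production keeps Oracle XE as the default.
    public static String jdbcUrl() {
        String url = System.getenv("TEST_JDBC_URL");
        return (url != null && !url.isEmpty())
                ? url
                : "jdbc:oracle:thin:@localhost:1521:XE";
    }

    public static void main(String[] args) {
        System.out.println(jdbcUrl());
    }
}
```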