Oozie Java Action access to Hive Server 2 (Kerberized) using delegation token - hive

I am currently having an issue and really need some help.
We are trying to kerberize our Hadoop cluster, including HiveServer2 and Oozie. My Oozie job spins off a Java action on a data node which tries to connect to the kerberized HiveServer2.
There is no user Kerberos keytab available for authentication, so I can only use the delegation token passed by Oozie to the Java action to connect to HiveServer2.
My question is: is there any way to use a delegation token in an Oozie Java action to connect to HiveServer2? If so, how can I do it through Hive JDBC?
Thanks
Jary

When using Oozie in a kerberized cluster...
for a "Hive" or "Pig" action, you must configure <credentials> of type HCat
for a "Hive2" action (just released with V4.2), you must configure <credentials> of type Hive2
for a "Java" action opening a custom JDBC connection to HiveServer2, I fear that Oozie cannot help -- unless there is an undocumented hack that would make it possible to reuse this new Hive2 credential?!?
Reference: Oozie documentation about Kerberos credentials
AFAIK you cannot use Hadoop delegation tokens with HiveServer2. HS2 uses Thrift for managing client connections, and Thrift supports Kerberos; but the Hadoop delegation tokens are something different (Kerberos was never intended for distributed computing, a workaround was needed)
What you can do is ship a full set of GSSAPI configuration, including a keytab, in your "Java" action; a minimal sketch follows after these caveats. It works, but there are a number of caveats:
the Hadoop Auth library seems to be hard-wired on the local ticket cache in a very lame way; if you must connect to both HDFS and HiveServer2, then do HDFS first, because as soon as JDBC initiates its own ticket based on your custom conf, Hadoop Auth will be broken
Kerberos configuration is tricky, GSSAPI config is worse, and since these are security features the error messages are not very helpful, by design (it would be bad taste to tell hackers why their intrusion attempt was rejected)
use OpenJDK if possible; by default the Sun/Oracle JVM has limitations on cryptography (because of silly and obsolete US export policies), so you must download 2 JARs with "unlimited strength" crypto settings to replace the default ones
Reference: another StackOverflow post that I found really helpful to set up "raw" Kerberos authentication when connecting to HiveServer2; plus a link about a very helpful "trace flag" for debugging your GSSAPI config e.g.
-Djava.security.debug=gssloginconfig,configfile,configparser,logincontext
Final warning: Kerberos is black magic. It will suck your soul away. More prosaically, it will have you lose many man-days to cryptic config issues, and team morale will suffer. We've been there.

As Samson said, a Java action in Oozie requires additional authentication to connect to "kerberized" services such as Hive. It can be achieved in a relatively simple way, without modifications to the application.
Oozie action
<action name="java-action">
<java>
...
<main-class>some.App</main-class>
<!-- useSubjectCredsOnly must be false so that the com.sun.security.jgss.initiate entry in jaas.conf is actually consulted by the JGSS layer -->
<java-opts>-Djavax.security.auth.useSubjectCredsOnly=false -Djava.security.krb5.conf=/etc/krb5.conf -Djava.security.auth.login.config=jaas.conf</java-opts>
<file>hdfs://some/path/App.jar</file>
<file>hdfs://some/path/user.keytab</file>
<file>hdfs://some/path/jaas.conf</file>
</java>
...
</action>
jaas.conf
com.sun.security.jgss.initiate {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
useTicketCache=true
principal="USER#EXAMPLE.COM"
doNotPrompt=true
keyTab="user.keytab"
};

Related

Issue with Dspring.cloud.bootstrap.enabled when enabling cloud config with master 2.4 version

I upgraded my Spring Boot application to the master pom 2.4 version and I am using cloud configs with the property spring.cloud.bootstrap.enabled = true. I have the DB password encrypted in the cloud properties, so by the time I use the DB properties my encryption framework is not yet available, and eventually my application fails with an invalid username and password ("I have my own encryption service").
I am looking to load the cloud config properties after my encryption service is available, but spring.cloud.bootstrap.enabled makes them load first on application startup. Before I upgraded to the master pom I was not using spring.cloud.bootstrap.enabled, so I did not have any issue; adding the property changed the order of loading, so I am running into an issue. Any help will be greatly appreciated. Thanks
so by the time I use the db properties I don't have my encryption framework available
Use the @DependsOn annotation on the bean that uses the db properties so that it depends on the encryption framework bean, as in the sketch below.
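A minimal sketch of that suggestion, assuming a Java-config style setup; the bean name "encryptionService" and the JDBC settings below are made-up placeholders, not the asker's real names:
import javax.sql.DataSource;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.DependsOn;
import org.springframework.jdbc.datasource.DriverManagerDataSource;

@Configuration
public class DataSourceConfig {

    // "encryptionService" is a hypothetical bean name; use the name of the
    // bean that bootstraps your own encryption framework.
    @Bean
    @DependsOn("encryptionService")
    public DataSource dataSource() {
        // Spring initializes "encryptionService" before this bean, so the
        // encrypted password from the cloud config can be decrypted here.
        DriverManagerDataSource ds = new DriverManagerDataSource();
        ds.setUrl("jdbc:postgresql://db-host/app");      // placeholder URL
        ds.setUsername("app");                           // placeholder user
        ds.setPassword("decrypted-password-goes-here");  // result of your decryption call
        return ds;
    }
}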

How to do kerberos authentication on a flink standalone installation?

I have a standalone Flink installation on top of which I want to run a streaming job that is writing data into a HDFS installation. The HDFS installation is part of a Cloudera deployment and requires Kerberos authentication in order to read and write the HDFS. Since I found no documentation on how to make Flink connect with a Kerberos-protected HDFS I had to make some educated guesses about the procedure. Here is what I did so far:
I created a keytab file for my user.
In my Flink job, I added the following code:
UserGroupInformation.loginUserFromKeytab("myusername", "/path/to/keytab");
Finally, I am using a TextOutputFormat to write data to HDFS.
When I run the job, I'm getting the following error:
org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1730)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1668)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1593)
at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:397)
at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:393)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:393)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:337)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:889)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:786)
at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.create(HadoopFileSystem.java:405)
For some odd reason, Flink seems to try SIMPLE authentication, even though I called loginUserFromKeytab. I found another similar issue on Stackoverflow (Error with Kerberos authentication when executing Flink example code on YARN cluster (Cloudera)) which had an answer explaining that:
Standalone Flink currently only supports accessing Kerberos secured HDFS if the user is authenticated on all worker nodes.
That may mean that I have to do some authentication at the OS level e.g. with kinit. Since my knowledge of Kerberos is very limited I have no idea how I would do it. Also I would like to understand how the program running after kinit actually knows which Kerberos ticket to pick from the local cache when there is no configuration whatsoever regarding this.
I'm not a Flink user, but based on what I've seen with Spark & friends, my guess is that "authenticated on all worker nodes" means that each worker process has
a core-site.xml config available on the local fs with hadoop.security.authentication set to kerberos (among other things)
the local dir containing core-site.xml added to the CLASSPATH so that it is found automatically by the Hadoop Configuration object [it will revert silently to default hard-coded values otherwise, duh]
implicit authentication via kinit and the default cache [TGT set globally for the Linux account, impacts all processes, duh] ## or ## implicit authentication via kinit and a "private" cache set thru the KRB5CCNAME env variable (Hadoop supports only "FILE:" type) ## or ## explicit authentication via UserGroupInformation.loginUserFromKeytab() and a keytab available on the local fs
That UGI "login" method is incredibly verbose, so if it was indeed called before Flink tries to initiate the HDFS client from the Configuration, you will notice. On the other hand, if you don't see the verbose stuff, then your attempt to create a private Kerberos TGT is bypassed by Flink, and you have to find a way to bypass Flink :-/
You can also configure your standalone cluster to handle authentication for you without additional code in your jobs.
Export HADOOP_CONF_DIR and point it to the directory where core-site.xml and hdfs-site.xml are located
Add to flink-conf.yaml
security.kerberos.login.use-ticket-cache: false
security.kerberos.login.keytab: <path to keytab>
security.kerberos.login.principal: <principal>
env.java.opts: -Djava.security.krb5.conf=<path to krb5 conf>
Add the pre-bundled Hadoop jar to the lib directory of your cluster: https://flink.apache.org/downloads.html
The only dependencies you should need in your jobs are:
compile "org.apache.flink:flink-java:$flinkVersion"
compile "org.apache.flink:flink-clients_2.11:$flinkVersion"
compile "org.apache.hadoop:hadoop-hdfs:$hadoopVersion"
compile "org.apache.hadoop:hadoop-client:$hadoopVersion"
In order to access a secured HDFS or HBase installation from a standalone Flink installation, you have to do the following:
Log into the server running the JobManager, authenticate against Kerberos using kinit and start the JobManager (without logging out or switching the user in between).
Log into each server running a TaskManager, authenticate again using kinit and start the TaskManager (again, with the same user).
Log into the server from where you want to start your streaming job (often, it's the same machine running the JobManager), log into Kerberos (with kinit) and start your job with /bin/flink run.
In my understanding, kinit logs in the current user and creates a file somewhere in /tmp with some login data. The mostly static class UserGroupInformation looks up that file with the login data when it is loaded for the first time. If the current user is authenticated with Kerberos, the information is used to authenticate against HDFS.
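A quick way to see what UserGroupInformation actually picked up (for example from the kinit ticket cache) is a tiny diagnostic sketch like this:
import org.apache.hadoop.security.UserGroupInformation;

public class WhoAmI {
    public static void main(String[] args) throws Exception {
        UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
        System.out.println("user: " + ugi.getUserName());
        System.out.println("auth: " + ugi.getAuthenticationMethod());  // KERBEROS vs SIMPLE
        System.out.println("has kerberos credentials: " + ugi.hasKerberosCredentials());
    }
}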

How to submit code to a remote Spark cluster from IntelliJ IDEA

I have two clusters, one in a local virtual machine and another in a remote cloud. Both clusters are in standalone mode.
My Environment:
Scala: 2.10.4
Spark: 1.5.1
JDK: 1.8.40
OS: CentOS Linux release 7.1.1503 (Core)
The local cluster:
Spark Master: spark://local1:7077
The remote cluster:
Spark Master: spark://remote1:7077
I want to finish this:
Write code (a simple word count) in IntelliJ IDEA locally (on my laptop), set the Spark Master URL to spark://local1:7077 or spark://remote1:7077, then run the code from IntelliJ IDEA. That is, I don't want to use spark-submit to submit a job.
But I got some problem:
When I use the local cluster, everything goes well. Running the code from IntelliJ IDEA or using spark-submit can both submit the job to the cluster and finish it.
But when I use the remote cluster, I get a warning log:
TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
It says sufficient resources, not sufficient memory!
This log keeps printing with no further action. Both spark-submit and running the code from IntelliJ IDEA give the same result.
I want to know:
Is it possible to submit code from IntelliJ IDEA to the remote cluster?
If so, what configuration is needed?
What are the possible causes of my problem?
How can I handle this problem?
Thanks a lot!
Update
There is a similar question here, but I think my scenario is different. When I run my code in IntelliJ IDEA with the Spark Master set to the local virtual machine cluster, it works; with the remote cluster I get the Initial job has not accepted any resources;... warning instead.
I want to know whether a security policy or firewall could cause this?
Submitting code programmatically (e.g. via SparkSubmit) is quite tricky. At the least, there is a variety of environment settings and considerations - handled by the spark-submit script - that are quite difficult to replicate within a Scala program. I am still uncertain how to achieve it, and there have been a number of long-running threads within the Spark developer community on the topic.
My answer here is about a portion of your post: specifically the
TaskSchedulerImpl: Initial job has not accepted any resources; check
your cluster UI to ensure that workers are registered and have
sufficient resources
The reason is typically a mismatch between the memory and/or number of cores requested by your job and what is available on the cluster. Possibly, when submitting from IJ, the settings in
$SPARK_HOME/conf/spark-defaults.conf
were not properly matching the parameters required for your task on the existing cluster. You may need to update:
spark.driver.memory 4g
spark.executor.memory 8g
spark.executor.cores 8
You can check the Spark UI on port 8080 to verify that the parameters you requested are actually available on the cluster.
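For completeness, when launching from the IDE instead of spark-submit, the same executor settings can be put on the SparkConf directly; the values below are just examples to tune to the cluster, and driver memory has to be set on the IDE run configuration (-Xmx) since the driver JVM is already running when SparkConf is read:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class RemoteWordCountLauncher {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("word-count")
                .setMaster("spark://remote1:7077")   // the remote master from the question
                .set("spark.executor.memory", "8g")  // must fit what the workers actually offer
                .set("spark.executor.cores", "8");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... build the word-count job here ...
        sc.stop();
    }
}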

Read-only web console access in ActiveMQ

I'm using ActiveMQ 5.10 and would like to create a user that has read-only access through the web console.
Red Hat published this article, mentioning that it's not really read-only due to a bug in ActiveMQ.
According to the bug report AMQ-4567, the bug is fixed as of ActiveMQ 5.9. However, I'm not seeing it work appropriately.
I have tried a number of different configurations, with the most recent being two separate JAAS implementations, one for Jetty and one for ActiveMQ. The relevant property files are excerpted below.
I can mostly log in to the web console using the "system" user. But the guest user doesn't work at all. The application user (appuser) doesn't need access to the web console at all.
My authN/authZ needs are pretty trivial: one admin user, one application account, and one read-only monitoring account.
Is there any good way to get this working with a recent version of ActiveMQ (>= 5.9.0)?
groups.properties
admins=system
users=appuser,admin
guests=guest
users.properties
system={password redacted}
appuser=appuser
guest=guest
jetty-realm.properties
system: MD5:46cf1b5451345f5176cd70713e0c9e07,user,admin
guest: guest,guest
As an aside, I used the Jetty tutorial and the Rundeck instructions to figure out the jetty-realm.properties file and chapter 6 of ActiveMQ in Action to work out the ActiveMQ JAAS.
I was finally able to get to what I wanted by deploying the web console to an external Tomcat instance. I assume that when it runs out of process, it can't bypass security and so has to use whatever credentials you provide. In this case, I gave the Tomcat instance the read-only JMX user credentials.
It's not great, as there is no security-trimmed UI. You can still attempt to create new destinations, delete destinations, etc. When you try with a read-only user, you get an error. That gets a "D" for UX, but a "B" for security.

How to use java.util.logging in Weblogic?

I have an application that was migrated from GlassFish to WebLogic, and it uses java.util.logging as its logging framework.
The only way I have found to make the logs work is by editing the logging.properties file of the JVM and restarting the server. This solution is awkward and causes problems because the log is written to a different file than the standard WebLogic ones, so we have to look at too many files for a log in a clustered environment. Besides, for some reason this does not work on some Windows systems.
Is there a way to keep using standard java logging to write messages to weblogic's standard log files? I tried the instructions on this page but it doesn't work either.
WebLogic Server ships with a JDK logging handler which will pick up log messages emitted from the JDK logging framework and direct them into the WebLogic Server logging system.
Set the default logging level for new ServerLoggingHandler instances in logging.properties and add the ServerLoggingHandler to the handlers.
handlers = weblogic.logging.ServerLoggingHandler
weblogic.logging.ServerLoggingHandler.level = ALL
http://docs.oracle.com/cd/E14571_01/web.1111/e13739/logging_services.htm#CHDBBEIJ
To direct the JDK logging framework to use the logging.properties file, the standard System property java.util.logging.config.file is used. With WebLogic Server, this can be easily accomplished by setting the JAVA_OPTIONS System property with the corresponding value.
$ export JAVA_OPTIONS="-Djava.util.logging.config.file=/Users/xxx/Projects/Domains/wls1035/logging.properties"
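With the handler and the config-file property in place, ordinary JDK logging calls in application code should end up in the WebLogic server log; a minimal sketch (class and method names are made up):
import java.util.logging.Level;
import java.util.logging.Logger;

public class OrderService {
    private static final Logger LOG = Logger.getLogger(OrderService.class.getName());

    public void placeOrder(String id) {
        LOG.info("Placing order " + id);   // routed to the server log by ServerLoggingHandler
        try {
            // ... business logic ...
        } catch (RuntimeException e) {
            LOG.log(Level.SEVERE, "Order " + id + " failed", e);
        }
    }
}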
Some more hints here: http://buttso.blogspot.de/2011/06/using-slf4j-with-weblogic-server.html