I'm working on a setup of EC2 machines that has a standalone Spark cluster, Hive, and Apache Ranger. Hive is integrated with Ranger.
As Ranger doesn't have support for the Spark SQL JDBC endpoint (port 10015), I tried the open source project https://github.com/yaooqinn/spark-authorizer for Spark authorization, but it didn't work, as it seems to rely on the YARN resource manager.
I wanted to know of any possible ways to achieve authorization on Spark SQL with Apache Ranger.
We are not using any vendor distribution, so features like Spark-LLAP in Hortonworks are not an option.
I have already tried what is explained in http://mail-archives.apache.org/mod_mbox/ranger-user/201601.mbox/%3CCAC1CY9P7iek6U6VDwLEXvLdCNRTcJzk5UWg3sei1MuUMCGrtWA#mail.gmail.com%3E, but that didn't work either.
I raised a Spark JIRA for this last year, but it doesn't seem to have been picked up yet: https://issues.apache.org/jira/browse/SPARK-24503
We are using Spark 2.3, Hive 2.3, and Ranger 1.0.
Build a simple custom password authentication class and plug it into the Thrift server on port 10015.
package hive.test;

import java.util.Hashtable;
import javax.security.sasl.AuthenticationException;
import org.apache.hive.service.auth.PasswdAuthenticationProvider;

/*
javac -cp $HIVE_HOME/lib/hive-service-0.11-mapr.jar SampleAuthenticator.java -d .
jar cf sampleauth.jar hive
cp sampleauth.jar $HIVE_HOME/lib/.
*/
public class SampleAuthenticator implements PasswdAuthenticationProvider {

    Hashtable<String, String> store = null;

    public SampleAuthenticator() {
        store = new Hashtable<String, String>();
        store.put("user1", "passwd1");
        store.put("user2", "passwd2");
    }

    @Override
    public void Authenticate(String user, String password) throws AuthenticationException {
        String storedPasswd = store.get(user);
        if (storedPasswd != null && storedPasswd.equals(password)) {
            return;
        }
        throw new AuthenticationException("SampleAuthenticator: Error validating user");
    }
}
Configure the following properties in the hive-site.xml file on each node where HiveServer2 is installed:
hive.server2.authentication: CUSTOM
hive.server2.custom.authentication.class: the fully qualified name of the authentication class
<property>
  <name>hive.server2.authentication</name>
  <value>CUSTOM</value>
</property>
<property>
  <name>hive.server2.custom.authentication.class</name>
  <value>hive.test.SampleAuthenticator</value>
</property>
Then restart HiveServer2 to apply the changes.
reference: https://mapr.com/docs/52/Hive/HiveServer2-CustomAuth.html
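To sanity-check the authenticator, a plain JDBC client can connect to port 10015 with one of the hard-coded users; a minimal sketch, assuming the Hive JDBC driver is on the classpath and "your-thrift-host" is a placeholder for the actual host:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class AuthSmokeTest {
    public static void main(String[] args) throws Exception {
        // jdbc:hive2 is the scheme used by both HiveServer2 and the Spark Thrift server
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://your-thrift-host:10015/default", "user1", "passwd1");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW DATABASES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1)); // prints databases only if auth succeeded
            }
        }
    }
}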
How can I use an Elasticache Redis Replication Group as a data sink in Flink for Kinesis Analytics?
I have created an Elasticache Redis Replication Group, and would like to compute something in Flink and store the results in this group.
My Java code:
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSink;
import org.apache.flink.streaming.connectors.redis.RedisSink;
import org.apache.flink.streaming.connectors.redis.common.config.FlinkJedisClusterConfig;
import java.net.InetSocketAddress;
import java.util.Set;
...
var endpoint = "foo.bar.clustercfg.usw2.cache.amazonaws.com";
var port = 6379;
var node = new InetSocketAddress(endpoint, port);
var jedisConfig = new FlinkJedisClusterConfig.Builder()
        .setNodes(Set.of(node))
        .build();
var redisMapper = new MyRedisMapper();
var redisSink = new RedisSink<>(jedisConfig, redisMapper);
This gives me the following error:
java.lang.NumberFormatException: For input string: "6379@1122"
at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.base/java.lang.Integer.parseInt(Integer.java:652)
at java.base/java.lang.Integer.valueOf(Integer.java:983)
at redis.clients.util.ClusterNodeInformationParser.getHostAndPortFromNodeLine(ClusterNodeInformationParser.java:39)
at redis.clients.util.ClusterNodeInformationParser.parse(ClusterNodeInformationParser.java:14)
at redis.clients.jedis.JedisClusterInfoCache.discoverClusterNodesAndSlots(JedisClusterInfoCache.java:50)
at redis.clients.jedis.JedisClusterConnectionHandler.initializeSlotsCache(JedisClusterConnectionHandler.java:39)
at redis.clients.jedis.JedisClusterConnectionHandler.<init>(JedisClusterConnectionHandler.java:28)
at redis.clients.jedis.JedisSlotBasedConnectionHandler.<init>(JedisSlotBasedConnectionHandler.java:21)
at redis.clients.jedis.JedisSlotBasedConnectionHandler.<init>(JedisSlotBasedConnectionHandler.java:16)
at redis.clients.jedis.BinaryJedisCluster.<init>(BinaryJedisCluster.java:39)
at redis.clients.jedis.JedisCluster.<init>(JedisCluster.java:45)
This occurs while trying to parse the response of CLUSTER NODES. The ip:port@cport format is expected as part of the response (see https://redis.io/commands/cluster-nodes/), but Jedis is unable to parse it.
Am I doing something wrong here, or is this a bug in Jedis?
After a little digging I found that this is a bug which affects Jedis 2.8 and earlier when using Redis 4.0 or later. https://github.com/redis/jedis/issues/1958
My Redis cluster is running 6.2.6, and my Apache Flink is 1.13, which is old but is the newest version currently supported by AWS.
To solve this issue, I upgraded Jedis to the latest 2.x version, which contains the fix while remaining compatible with the Flink 1.13 libraries. Upgrading Jedis to a 3.x or 4.x version broke Flink.
<!-- https://mvnrepository.com/artifact/redis.clients/jedis -->
<dependency>
  <groupId>redis.clients</groupId>
  <artifactId>jedis</artifactId>
  <version>2.10.2</version>
</dependency>
I have an ActiveMQ service on AWS using the AMQP protocol. AWS returns to me:
failover:(amqp+ssl://b-ca138bd4-e6c4-4596-8329-f11bebf40111-1.mq.us-east-1.amazonaws.com:5671,amqp+ssl://b-ca138bd4-e6c4-4596-8329-f11bebf40111-2.mq.us-east-1.amazonaws.com:5671)
I am trying to connect to that endpoint from Spring Boot, but I am having a lot of problems. I have tried many approaches, but I can't connect to ActiveMQ from Spring.
I have tried:
Creating many configuration Beans, like:
@Bean
fun connectionFactory(): ConnectionFactory {
    val activeMQConnectionFactory = ActiveMQConnectionFactory()
    activeMQConnectionFactory.brokerURL = "amqp+ssl://b-ca138bd4-e6c4-4596-8329-f11bebf40111-1.mq.us-east-1.amazonaws.com:5671"
    activeMQConnectionFactory.trustedPackages = listOf("com.rappi.scaffolding")
    return activeMQConnectionFactory
}
and using many dependencies like:
implementation("org.apache.activemq:activemq-spring:5.17.0")
implementation("org.springframework:spring-jms")
and
implementation("org.springframework.boot:spring-boot-starter-artemis")
But I am not able to establish the connection. At the moment I am seeing this error:
Reason: java.io.IOException: Transport scheme NOT recognized: [amqp+ssl]
Are there any examples in Java or Kotlin, or any guide, for connecting to AWS ActiveMQ using the AMQP protocol? I couldn't find any on Google.
I have read about using Qpid, but it didn't work for me.
I have found many examples using RabbitMQ, but not for the Apache ActiveMQ amqp+ssl protocol.
Finally, it works using this bean:
@Bean
fun connectionFactory(): ConnectionFactory {
    return JmsConnectionFactory(
        "failover:(amqps://b-ca138bd4-e6c4-4596-8329-f11bebf40111-1.mq.us-east-1.amazonaws.com:5671,amqps://b-ca138bd4-e6c4-4596-8329-f11bebf40111-2.mq.us-east-1.amazonaws.com:5671)").apply {
        this.username = user
        this.password = passwordAQ
    }
}
There are many things wrong with your code and configuration.
First, the URL you're using for your client is incorrect. The amqp+ssl scheme is not valid for any client. That's the scheme used to define the connector in the ActiveMQ broker configuration.
Second, your dependencies are wrong. As far as the client goes you just need:
implementation("org.apache.qpid:qpid-jms-client:1.6.0")
Of course, if you're using Spring you'll need all the related Spring dependencies, but as far as the client itself goes this is all you need.
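For example, a Gradle dependency block on the client side might boil down to something like this (the Spring Boot starter line is an assumption about your particular setup):
implementation("org.springframework.boot:spring-boot-starter")
implementation("org.springframework:spring-jms")
implementation("org.apache.qpid:qpid-jms-client:1.6.0")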
Third, your code is wrong. You should be using something like this:
@Bean
fun connectionFactory(): ConnectionFactory {
    return org.apache.qpid.jms.JmsConnectionFactory("failover:(amqps://b-ca138bd4-e6c4-4596-8329-f11bebf40111-1.mq.us-east-1.amazonaws.com:5671,amqps://b-ca138bd4-e6c4-4596-8329-f11bebf40111-2.mq.us-east-1.amazonaws.com:5671)")
}
I am experimenting with Spring Cloud APIs as part of a microservices course.
To set up a serverless task, I am using Spring Cloud Task, Spring Cloud Stream (RabbitMQ), and Spring Web.
For this I have set up the following projects:
Serverless task to be executed -
https://github.com/Omkar-Shetkar/pluralsight-springcloud-m3-task
Component to receive Http request from user and submit to RabbitMQ -
https://github.com/Omkar-Shetkar/pluralsight-springcloud-m3-taskintake
Sink component to receive TaskLaunchRequest and forward to cloud task - https://github.com/Omkar-Shetkar/pluralsight-springcloud-m3-tasksink
Having set up the above components, I ensured that the task component is available in the local Maven repository.
After initiating a POST request to /tasks in pluralsight.com.TaskController.launchTask(String), I see an HTTP response.
But I couldn't see any update in the tasklogs DB associated with the serverless task.
This means the task itself is not launched.
In the RabbitMQ console I can see that connections are established from the intake and sink components, but I don't see any messages being exchanged.
The queue named tasktopic has a message count of zero.
I'd appreciate any pointers or suggestions on how to resolve this issue.
Thanks.
There were two issues with my implementation:
In the intake and sink modules' application.properties, the binding property key was wrong.
It should be:
In the intake module:
spring.cloud.stream.bindings.output.destination=tasktopic
In the sink module:
spring.cloud.stream.bindings.input.destination=tasktopic
Also, the local cloud deployer version was incompatible in the sink module's pom.xml.
I updated it to:
<dependency>
  <groupId>org.springframework.cloud</groupId>
  <artifactId>spring-cloud-deployer-local</artifactId>
  <version>1.3.0.RELEASE</version>
</dependency>
With these changes, I am able to get RabbitMQ messages.
@EnableTaskLauncher annotation is missing in TaskIntakeApplication.
@SpringBootApplication
@EnableTaskLauncher
public class PluralsightSpringcloudM3TaskintakeApplication {
    public static void main(String[] args) {
        SpringApplication.run(PluralsightSpringcloudM3TaskintakeApplication.class, args);
    }
}
I have the following code based on the docs...
@Controller
@RequestMapping("neptune")
public class NeptuneEndpoint {

    @GetMapping("")
    @ResponseBody
    public String test() {
        Cluster.Builder builder = Cluster.build();
        builder.addContactPoint("...endpoint...");
        builder.port(8182);
        Cluster cluster = builder.create();
        GraphTraversalSource g = EmptyGraph.instance()
                .traversal()
                .withRemote(DriverRemoteConnection.using(cluster));
        GraphTraversal t = g.V().limit(2).valueMap();
        t.forEachRemaining(e -> System.out.println(e));
        cluster.close();
        return "Neptune Up";
    }
}
But when I try to run it, I get ...
java.util.concurrent.TimeoutException: Timed out while waiting for an available host - check the client configuration and connectivity to the server if this message persists
Also, how would I add the secret key from an AWS IAM account?
Neptune doesn't allow you to connect to the DB instance from your local machine. You can only connect to Neptune from an EC2 instance inside the same VPC as Neptune (see the AWS documentation).
Try building a runnable JAR of this code and running it on an EC2 instance; the code should work fine there. If you're trying to debug something from your local system, use PuTTY tunneling to connect to the EC2 instance, which then forwards traffic to the Neptune cluster, as in the sketch below.
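The SSH equivalent of that PuTTY tunnel might look like this (the key file, EC2 host, and Neptune endpoint are placeholders); the Gremlin client then points at localhost:8182, though with SSL enabled the certificate hostname check may still get in the way:
ssh -i my-key.pem -N -L 8182:your-neptune-endpoint.cluster-xxxxxxxx.us-east-1.neptune.amazonaws.com:8182 ec2-user@your-ec2-public-dns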
Have you created an instance with IAM auth enabled?
If yes, you will have to sign your request using SigV4. More information (and examples) on how to connect using SigV4 is available at https://docs.aws.amazon.com/neptune/latest/userguide/iam-auth-connecting-gremlin-java.html
The examples given in the documentation above also contain information on how to use your IAM credentials to connect to a Neptune cluster.
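As a rough sketch of what that documentation describes (it assumes the Neptune SigV4 channelizer library from the AWS docs is on the classpath, and that AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and SERVICE_REGION are exported in the environment), the Cluster in the controller above would be built roughly like this:
import org.apache.tinkerpop.gremlin.driver.Cluster;
import org.apache.tinkerpop.gremlin.driver.SigV4WebSocketChannelizer;

Cluster cluster = Cluster.build()
        .addContactPoint("...endpoint...")
        .port(8182)
        .enableSsl(true)                              // IAM auth requires TLS
        .channelizer(SigV4WebSocketChannelizer.class) // signs each request with SigV4
        .create();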
I just had the same issue, and the root cause was a dependency version conflict with Netty, which is unfortunately a very pervasive dependency. Gremlin 3.3.2 uses io.netty/netty-all version 4.0.56.Final. Your project might depend on another Netty jar such as io.netty/netty or io.netty/netty-handler, both of which can cause issues, so you will need to exclude them from other dependencies in your POM or use dependency management to set a project-level Netty version.
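For example, in Maven the project-level pin might look like this (4.0.56.Final matches Gremlin 3.3.2 as mentioned above; adjust for your driver version):
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>io.netty</groupId>
      <artifactId>netty-all</artifactId>
      <version>4.0.56.Final</version>
    </dependency>
  </dependencies>
</dependencyManagement>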
Another option is to use an AWS SigV4 signing proxy that acts as a bridge between Neptune and your local development environment. One such proxy is https://github.com/monken/aws4-proxy
npm install --global aws4-proxy
# have your credentials exported as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
aws4-proxy --service neptune-db --endpoint cluster-die4eenu.cluster-eede5pho.eu-west-1.neptune.amazonaws.com --region eu-west-1
wscat localhost:3000/gremlin
Note: you need to be in the same VPC to access the Neptune cluster.
I have a GlassFish v2.1.1 cluster setup. I deployed an EAR file consisting of a single stateless bean to a standalone server. It has IIOP port 3752.
My client application, which will be communicating with this bean, is deployed on the cluster. When I look up the bean's name, I get a NameNotFoundException. The code looks as below:
Properties props = new Properties();
props.setProperty("java.naming.factory.initial", "com.sun.enterprise.naming.SerialInitContextFactory");
props.setProperty("java.naming.factory.url.pkgs", "com.sun.enterprise.naming");
props.setProperty("java.naming.factory.state", "com.sun.corba.ee.impl.presentation.rmi.JNDIStateFactoryImpl");
if (logger.isDebugEnabled()) {
    logger.debug("Looking for bean from location : " + PropertiesService.instance().getSchedulerOrbHost() + ":"
            + PropertiesService.instance().getSchedulerOrbPort());
}
props.setProperty("org.omg.CORBA.ORBInitialHost", PropertiesService.instance().getSchedulerOrbHost());
props.setProperty("org.omg.CORBA.ORBInitialPort", PropertiesService.instance().getSchedulerOrbPort());
InitialContext context = null;
try {
    context = new InitialContext(props);
} catch (NamingException e) {
    e.printStackTrace();
}
String beanName = "test.OperationControllerRemote";
OperationControllerRemote remote = (OperationControllerRemote) context.lookup(beanName);
Note that I checked the JNDI tree and the name "test.OperationControllerRemote" is there.
Any opinions please?
Here are the ways I have got it to work with a GF 2.1.1 cluster and a Swing client. I'm currently going with the Standalone option because of client launch speed, but the ACC might work for you.
Standalone
The way you're doing it is considered standalone.
http://glassfish.java.net/javaee5/ejb/EJB_FAQ.html#StandaloneRemoteEJB
http://blogs.oracle.com/dadelhardt/entry/standalone_iiop_clients_with_glassfish
ACC
Another way to approach this is to launch the client with the ACC. This means packaging the client code into the ear as an Application Client and either launching using the JNLP method or manually installing a bundled ACC (mini glassfish really) on client machines. In GF 2.1, either way works ok, but both are pretty fat and JNLP method can make startup times a bit longer. Supposedly in GF 3.1 they've reworked the ACC and it starts up faster. Something that may not be obvious is that with the ACC you get the list of servers in the cluster provided automatically at client startup.
http://blogs.oracle.com/theaquarium/entry/java_ee_clients_with_or
http://download.oracle.com/docs/cd/E18930_01/html/821-2418/beakv.html#scrolltoc
http://download.oracle.com/docs/cd/E18930_01/html/821-2418/gkusn.html
Lookups
Either of the above ways provides RMI/CORBA failover and load balancing for the client.
Either way, when you have the right dependencies on your classpath and the com.sun.appserv.iiop.endpoints system property set (like node1:33700,node2:33701), you'll only need the no-args InitialContext because Glassfish's stuff autoregisters their connection properties, etc as described in the first link:
new InitialContext()
And lookups will work. For my remote session beans (EJB 3.0) I typically do it like:
@Stateless(mappedName="FooBean")
public class FooBean implements FooBeanRemote {}

@Remote
public interface FooBeanRemote {}
then in client code:
FooBeanRemote foo = (FooBeanRemote) ctx.lookup("FooBean");
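Putting the pieces together, a standalone client's startup might look roughly like this sketch (the endpoint list is a placeholder; FooBeanRemote is the remote interface from the example above):
import javax.naming.InitialContext;

public class FooClient {
    public static void main(String[] args) throws Exception {
        // Endpoints of the cluster nodes; GlassFish's client libraries pick this up
        // and use it for failover and load balancing as described above.
        System.setProperty("com.sun.appserv.iiop.endpoints", "node1:33700,node2:33701");

        // No-args InitialContext: the GlassFish naming provider registers itself.
        InitialContext ctx = new InitialContext();
        FooBeanRemote foo = (FooBeanRemote) ctx.lookup("FooBean");
        // ... call business methods on foo ...
    }
}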