i want to write stream data to accumulo!. There is any API for accumulo to write data. It is possible in python instead of java?
See the BatchWriter with you instantiate via Connector. The Accumulo Thrift Proxy enables non-Java clients to interact with Accumulo.
Related
For my current project, I am working with Kafka (python) and wanted to know if there is any method by which I can send the streaming Kafka data to the AWS S3 bucket(without using Confluent). I am getting my source data from Reddit API.
I even wanted to know whether Kafka+s3 is a good combination for storing the data which will be processed using pyspark or I should skip the s3 step and directly read data from Kafka.
Kafka S3 Connector doesn't require "using Confluent". It's completely free, open source and works with any Apache Kafka cluster.
Otherwise, sure, Spark or plain Kafka Python consumer can write events to S3, but you've not clearly explained what happens when data is in S3, so maybe start with processing the data directly from Kafka
Also if I have authorisation header fields-> Where and how do I give them?? Is there any tool for this?
I have done local host "ws://localhost:8080/socket" using
If you are using the "WebSockets Samplers" plugin with JMeter, you can open multiple websocket connections, but not at the same time on the same thread (JMeter user). If you are sure you need that feature, there is an experimental build of the plugin that supports multiple connections on one thread.
You can add authorisation fields by using the standard JMeter HeaderManager.
Is it possible to access contents of spark.sql output via JDBC, like %sql interpreter does?
You just need to setup Spark's thriftserver as described in the Spark's documentation. Zeppelin is just consumer of the Spark's execution, and doesn't expose data itself.
If you really need to extract specific information from paragraph, you can use Notebook API.
Why do Apache Hive needs Apache Thrift? On the Thrift's site it says that it can compile in multiple languages, but I can't understand where does it fits and why do Hive need it.
Thanks
Cited from safaribooksonline:
Chapter 16. Hive Thrift Service
Hive has an optional component known as HiveServer or HiveThrift that
allows access to Hive over a single port. Thrift is a software
framework for scalable cross-language services development. See
http://thrift.apache.org/ for more details. Thrift allows clients
using languages including Java, C++, Ruby, and many others, to
programmatically access Hive remotely.
The CLI is the most common way to access Hive. However, the design of
the CLI can make it difficult to use programmatically. The CLI is a
fat client; it requires a local copy of all the Hive components and
configuration as well as a copy of a Hadoop client and its
configuration. Additionally, it works as an HDFS client, a MapReduce
client, and a JDBC client (to access the metastore). Even with the
proper client installation, having all of the correct network access
can be difficult, especially across subnets or datacenters.
Couldn't have said it better. Emphasis mine.
https://cwiki.apache.org/confluence/display/Hive/HiveServer
HiveServer is an optional service that allows a remote client to submit requests to Hive, using a variety of programming languages, and retrieve results. HiveServer is built on Apache ThriftTM (http://thrift.apache.org/), therefore it is sometimes called the Thrift server although this can lead to confusion because a newer service named HiveServer2 is also built on Thrift.
For more details on how to connect to hive server(thrift server) see the link above.
what is difference between org.apache.hive.jdbc.HiveDriver and org.apache.hadoop.hive.jdbc.HiveDriver ?
Which one to use to Write a JDBC Client to connect to hive ?
Hive 0.11 includes a new JDBC driver that works with HiveServer2, enabling users to write JDBC applications against Hive. The application needs to use the JDBC driver class and specify the network address and port in the connection URL in order to connect to Hive.
HiveServer2 (HIVE-2935), brings concurrency, authentication, and a foundation for authorization to Hive
HiveServer2 is an improved version of HiveServer that supports Kerberos authentication and multi-client concurrency and use the driver "org.apache.hive.jdbc.HiveDriver"
HiveServer1 or Thrift server cannot handle concurrent requests from more than one client. This is actually a limitation imposed by the Thrift interface that HiveServer exports, and can't be resolved by modifying the HiveServer code. The Driver which Hive Server "org.apache.hadoop.hive.jdbc.HiveDriver"
Please find the links which will help you to understand more.
org.apache.hadoop.hive.jdbc.HiveDriver and org.apache.hive.jdbc.HiveDriver
To work with, it depend on ur requirement which version you are having and how you have done Hive Configuration.