Apache Drill: issue connecting to Hive with Kerberos enabled

I have a Kerberized cluster. I have installed Drill on a separate server and am trying to query Hive, which is part of the Kerberized cluster.
For Hive, I have put the configuration below in my drill-override.conf:
drill.exec: {
security: {
# user.auth.enabled:true,
auth.mechanisms:["KERBEROS"],
auth.principal:"xxxx/xxxxxxxx",
auth.keytab:"/xxx/xxxx/drill.keytab"
drill.exec.http.ssl_enabled="true"
}
}
drill.exec:
{
cluster-id: "drillbits1",
zk.connect: "localhost:2181"
}
When I access Hive from the Drill UI, I get the errors below:
2017-04-07 12:32:48,322 [2718c667-5587-b307-58f7-b673e29b7dbf:frag:0:0] WARN o.a.d.e.s.h.schema.HiveSchemaFactory - Failure while getting Hive database list.
org.apache.thrift.TException: java.util.concurrent.ExecutionException: MetaException(message:Got exception: org.apache.thrift.transport.TTransportException null)
I have tried Drill versions 1.5.0 and 1.10.0.
I would appreciate any help resolving this issue.

The configuration in your drill-override.conf is for the DrillClient-to-Drillbit connection using Kerberos.
For Hive, I don't think we have tried it before, but based on some research I think you can try adding the following to your Drill Hive storage plugin. Also make sure that you have generated a Kerberos ticket on the Drillbit node, using the kinit command, for the process user that runs the Drillbit. Please try it and let us know if it helps.
{
"type": "hive",
"enabled": true,
"configProps": {
"hive.metastore.uris": "thrift://<metastore_ip:port>",
"hive.metastore.sasl.enabled": "true",
"hive.metastore.kerberos.principal": "<metastore_kerberos_principal>"
}
}
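As a rough sketch of how the plugin definition above could be generated instead of hand-edited, the Python below assembles the same JSON. The function name, the sample metastore URI and the sample principal are placeholders for illustration, not values from this thread; only the three configProps keys come from the answer.

```python
import json


def hive_plugin_config(metastore_uri, metastore_principal):
    """Build a Drill Hive storage plugin definition for a Kerberized metastore.

    Both arguments are supplied by the caller; only the three configProps
    keys are taken from the answer above.
    """
    return {
        "type": "hive",
        "enabled": True,
        "configProps": {
            "hive.metastore.uris": metastore_uri,
            # Hive configuration values are strings, hence "true" and not True.
            "hive.metastore.sasl.enabled": "true",
            "hive.metastore.kerberos.principal": metastore_principal,
        },
    }


if __name__ == "__main__":
    # Hypothetical host and principal, for illustration only.
    config = hive_plugin_config(
        "thrift://metastore.example.com:9083",
        "hive/_HOST@EXAMPLE.COM",
    )
    print(json.dumps(config, indent=2))
```

The printed JSON can be pasted into the storage plugin editor in the Drill web UI.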

Related

Connect App Engine to Google cloud SQL fails

I'm following this guide
I'm filling the config like this:
val datasourceConfig = HikariConfig().apply {
jdbcUrl = "jdbc:mysql:///$DB_NAME"
username = DB_USER
password = DB_PASS
mapOf(
"cloudSqlInstance" to CLOUD_SQL_CONNECTION_NAME,
"socketFactory" to "com.google.cloud.sql.mysql.SocketFactory",
"ipTypes" to "PUBLIC,PRIVATE",
).forEach {
addDataSourceProperty(
it.key,
it.value
)
}
}
output of the gcloud sql instances describe project-name:
backendType: SECOND_GEN
connectionName: project-name:europe-west1:project-name-db
databaseVersion: MYSQL_5_7
failoverReplica:
available: true
gceZone: europe-west1-d
instanceType: CLOUD_SQL_INSTANCE
ipAddresses:
- ipAddress: *.*.*.*
type: PRIMARY
kind: sql#instance
name: project-name-db
project: project-name
region: europe-west1
from which I'm filling my env variables:
DB_NAME=project-name-db
CLOUD_SQL_CONNECTION_NAME=project-name:europe-west1:project-name-db
On the deployed app line val dataSource = HikariDataSource(datasourceConfig) crashes with the following exception:
com.zaxxer.hikari.pool.HikariPool$PoolInitializationException: Failed to initialize pool: Cannot connect to MySQL server on localhost:3,306.
Make sure that there is a MySQL server running on the machine/port you are trying to connect to and that the machine this software is running on is able to connect to this host/port (i.e. not firewalled). Also make sure that the server has not been started with the --skip-networking flag.
update: I've tried adding google between second and third slashes("jdbc:mysql://google/$DB_NAME"), according to this answer, now I get:
Cannot connect to MySQL server on google:3,306.
I was missing the following dependency:
implementation("com.google.cloud.sql:mysql-socket-factory-connector-j-8:1.2.2")
more info here
Also, DB_NAME is not the instance name shown in the gcloud sql instances describe output, but the name of a database that must be created in Console -> Project -> SQL -> Databases.
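To make the distinction between the instance connection name and the database name concrete, here is a minimal Python sketch that builds the equivalent JDBC URL with the socket factory passed as query parameters. The helper names and the sample database name "mydb" are made up for illustration; the connection name is the one from the question.

```python
from urllib.parse import urlencode

SOCKET_FACTORY = "com.google.cloud.sql.mysql.SocketFactory"


def parse_connection_name(connection_name):
    """Split 'project:region:instance' into its three parts."""
    parts = connection_name.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected project:region:instance, got %r" % connection_name)
    return tuple(parts)


def build_jdbc_url(database_name, connection_name):
    """Build a JDBC URL that routes through the Cloud SQL socket factory.

    database_name is a database created inside the instance, not the
    instance name reported by `gcloud sql instances describe`.
    """
    _, _, instance = parse_connection_name(connection_name)
    if database_name == instance:
        # Not necessarily wrong, but this is the mix-up described above.
        print("warning: database name equals instance name; is that intended?")
    query = urlencode(
        {"cloudSqlInstance": connection_name, "socketFactory": SOCKET_FACTORY},
        safe=":",
    )
    return "jdbc:mysql:///%s?%s" % (database_name, query)


if __name__ == "__main__":
    # "mydb" is a hypothetical database name.
    print(build_jdbc_url("mydb", "project-name:europe-west1:project-name-db"))
```

Whichever way the properties are passed, the socket factory class is only found if the mysql-socket-factory dependency is on the classpath.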

Error while running query on Impala with Superset

I'm trying to connect Impala to Superset. When I test the connection it prints "Seems OK!", and when I browse the Impala databases in the left-hand panel of the SQL Editor, all databases show up without problems.
Preview of Databases/Tables
But when I write a query and click on "Run Query", it gives the error: "Could not start SASL: b'Error in sasl_client_start (-1) SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (Ticket expired)'"
Error running query
I'm running superset with SSL and in production mode (with Gunicorn) and Impala with SSL in a Kerberized Hadoop Cluster, and my impala database config is:
Impala Config
And in the extras I put:
{
"metadata_params": {},
"engine_params": {
"connect_args": {
"port": 21050,
"use_ssl": "True",
"ca_cert": "path/to/my/ca_cert.pem",
"auth_mechanism": "GSSAPI"
}
},
"metadata_cache_timeout": {},
"schemas_allowed_for_csv_upload": []
}
How can I solve this error? In my superset log it only shows:
Triggering query_id: 65
INFO:superset.views.core:Triggering query_id: 65
Query 65: Running query on a Celery worker
INFO:superset.views.core:Query 65: Running query on a Celery worker
Versions: Superset 0.36.0, Impyla 0.16.2
I was able to fix this error with the following steps:
1 - Created a service user for the Celery worker, created a Kerberos ticket for that user, and set up a crontab entry to renew the ticket.
2 - Ran the Celery worker as this service user instead of as root.
3 - Killed a Celery worker that was running on another machine in my cluster.
4 - Restarted Impala and Superset.
I think this error occurred because some queries, instead of using the Celery worker on my Superset machine, were using the Celery worker on another machine that had no valid Kerberos ticket. I found this by reading the Celery worker log, which showed that a connection to the Celery worker on the other machine failed while a query was running.
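The ticket renewal in step 1 could be scripted roughly as below and called from cron. This is a sketch under assumptions: it relies on MIT Kerberos, where `klist -s` exits non-zero when the credential cache is missing or expired, and the principal and keytab path are hypothetical placeholders, not values from this thread.

```python
import subprocess


def has_valid_ticket(run=subprocess.run):
    """Return True if the current user's credential cache holds a usable ticket.

    With MIT Kerberos, `klist -s` prints nothing and exits non-zero when the
    cache cannot be read or the ticket has expired.
    """
    return run(["klist", "-s"]).returncode == 0


def ensure_ticket(principal, keytab, run=subprocess.run):
    """Obtain a fresh ticket from a keytab if the current one is unusable.

    Returns True if kinit was invoked, False if the existing ticket was fine.
    The `run` parameter exists only so the function can be exercised
    without a real Kerberos installation.
    """
    if has_valid_ticket(run):
        return False
    run(["kinit", "-kt", keytab, principal], check=True)
    return True


if __name__ == "__main__":
    # Hypothetical service principal and keytab location.
    renewed = ensure_ticket("celery@EXAMPLE.COM", "/etc/security/celery.keytab")
    print("renewed" if renewed else "ticket still valid")
```

It has to run as the same service user that runs the Celery worker, since each user has its own credential cache.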

400 bad request when attempting connection to AWS Neptune with IAM enabled

I am unable to connect to a Neptune instance that has IAM enabled. I have followed the AWS documentation (and corrected a few of my silly errors on the way) but without luck.
When I connect via my Java application using the SigV4Signer and when I use the gremlin console, I get a 400 bad request websocket error.
o.a.t.g.d.Handler$GremlinResponseHandler : Could not process the response
io.netty.handler.codec.http.websocketx.WebSocketHandshakeException: Invalid handshake response getStatus: 400 Bad Request
at io.netty.handler.codec.http.websocketx.WebSocketClientHandshaker13.verify(WebSocketClientHandshaker13.java:267)
at io.netty.handler.codec.http.websocketx.WebSocketClientHandshaker.finishHandshake(WebSocketClientHandshaker.java:302)
at org.apache.tinkerpop.gremlin.driver.handler.WebSocketClientHandler.channelRead0(WebSocketClientHandler.java:69)
When I run com.amazon.neptune.gremlin.driver.example.NeptuneGremlinSigV4Example (from my machine over port-forwarding AND from the EC2 jumphost) I get:
java.util.concurrent.TimeoutException: Timed out while waiting for an available host - check the client configuration and connectivity to the server if this message persists
I am able to connect to my neptune instance using the older deprecated certificate mechanism. I am using a jumphost ec2 instance and port-forwarding.
I believe that the SigV4 aspect is working as in the neptune audit logs I can see attempts to connect with the aws_access_key:
1584098990319, <jumphost_ip>:47390, <db_instance_ip>:8182, HTTP_GET, [unknown], [unknown], "HttpObjectAggregator$AggregatedFullHttpRequest(decodeResult: success, version: HTTP/1.1, content: CompositeByteBuf(ridx: 0, widx: 0, cap: 0, components=0)) GET /gremlin HTTP/1.1 upgrade: websocket connection: upgrade sec-websocket-key: g44zxck9hTI9cZrq05V19Q== sec-websocket-origin: http://localhost:8182 sec-websocket-version: 13 Host: localhost:8182 X-Amz-Date: 20200313T112950Z Authorization: AWS4-HMAC-SHA256 Credential=<my_access_key>/20200313/eu-west-2/neptune-db/aws4_request, SignedHeaders=host;sec-websocket-key;sec-websocket-origin;sec-websocket-version;upgrade;x-amz-date, Signature=<the_signature> content-length: 0", /gremlin
But when I look
This is the policy that I created:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"neptune-db:*"
],
"Resource": [
"arn:aws:neptune-db:eu-west-2:<my_aws_account>:*/*"
]
}
]
}
I have previously tried with a policy that references my cluster resource id.
I created a new api user with this policy attached as its only permission. (I've tried this twice).
IAM is showing me that the graph-user I created has not successfully logged in (duh).
Seems that the issue is with the IAM set-up somewhere along the line. Is it possible to get more information out of AWS with regards to why the connection attempt is failing?
I am using the most recent release of Neptune and the 3.4.3 Gremlin Driver and console. I am using Java 8 when running the NeptuneGremlinSigV4Example and building the libraries to deploy to the console.
thanks
It appears from the audit log output that the SigV4 Signature that is being created is using localhost as the Host header. This is most likely due to the fact that you're using a proxy to connect to Neptune. By default, the NeptuneGremlinSigV4Example assumes that you're connecting directly to a Neptune endpoint and reuses the endpoint as the Host header in creating the Signature.
To get around this, you can use the following example code that overrides this process and allows you to use a proxy and still sign the request properly.
https://github.com/aws-samples/amazon-neptune-samples/tree/master/gremlin/gremlin-java-client-demo
I was able to get this to work using the following.
Create an SSH tunnel from your local workstation to your EC2 jumphost:
ssh -i <key-pem-file> -L 8182:<neptune-endpoint>:8182 ec2-user@<ec2-jumphost-hostname>
Set the following environment variables:
export AWS_ACCESS_KEY_ID=<access_key>
export AWS_SECRET_ACCESS_KEY=<secret_key>
export SERVICE_REGION=<region_id>  # e.g. us-west-2
Once the tunnel is up and your environment variables are set, use the following format with the Gremlin-Java-Client-Demo:
java -jar target/gremlin-java-client-demo.jar --nlb-endpoint localhost --lb-port 8182 --neptune-endpoint <neptune-endpoint> --port 8182 --enable-ssl --enable-iam-auth
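To illustrate why signing with localhost as the Host header fails, here is a simplified Python sketch of the SigV4 calculation. It signs only the host and x-amz-date headers, whereas the request in the audit log signs several more, so it is not a drop-in signer. The secret key and the endpoint hostname are made-up placeholders; the date, region and service name are the ones visible in the audit log.

```python
import hashlib
import hmac


def _hmac(key, msg):
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_signature(host, secret_key, amz_date, region, service="neptune-db"):
    """Compute a SigV4 signature for GET /gremlin, signing host and x-amz-date only."""
    date_stamp = amz_date[:8]
    # The Host header is part of the canonical request, so it feeds the signature.
    canonical_headers = "host:%s\nx-amz-date:%s\n" % (host, amz_date)
    signed_headers = "host;x-amz-date"
    payload_hash = hashlib.sha256(b"").hexdigest()  # empty request body
    canonical_request = "\n".join(
        ["GET", "/gremlin", "", canonical_headers, signed_headers, payload_hash]
    )
    scope = "%s/%s/%s/aws4_request" % (date_stamp, region, service)
    string_to_sign = "\n".join(
        ["AWS4-HMAC-SHA256", amz_date, scope,
         hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()]
    )
    # Derive the signing key: date -> region -> service -> "aws4_request".
    key = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (region, service, "aws4_request"):
        key = _hmac(key, part)
    return hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    args = ("not-a-real-secret", "20200313T112950Z", "eu-west-2")
    via_tunnel = sigv4_signature("localhost:8182", *args)
    # Hypothetical endpoint name, for illustration only.
    direct = sigv4_signature("my-cluster.example.eu-west-2.neptune.amazonaws.com:8182", *args)
    print("signatures match:", via_tunnel == direct)
```

If the server validates against its own endpoint name, as the explanation above implies, a signature computed over localhost cannot match. That is presumably why the demo client takes the tunnel address and the Neptune endpoint as separate arguments.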

logstash and centralised redis problems

I'm trying to get logstash working in a centralised setup using the docs as an example:
http://logstash.net/docs/1.2.2/tutorials/getting-started-centralized
I've got logstash (as indexer), redis, elasticsearch and standalone kibana3 running on my web server. I then need to run logstash as an agent on another server to collect apache logs and send them to the web server via redis. The number of agents will increase and the logs will vary, but for now I just want to get this working!
I need everything to run as a service so that all is well after reboots etc. All servers are running Ubuntu.
For all logstash instances (indexer and agent), I'm using the following init script (Ubuntu version, second gist):
https://gist.github.com/shadabahmed/5486949#file-logstash-ubuntu
For running redis as a service, I followed the instructions here:
http://redis.io/topics/quickstart (Installing redis more properly)
Elasticsearch is also running as a service.
On the web server, running redis-cli returns PONG correctly. Navigating to the correct Elasticsearch URL returns the correct JSON response. Navigating to the Kibana3 url gives me the dashboard, but no data. UFW is set to allow the redis port (at the moment from everywhere).
On the web server, my logstash.conf is:
input {
file {
path => "/var/log/apache2/access.log"
type => "apache-access"
sincedb_path => "/etc/logstash/.sincedb"
}
redis {
host => "127.0.0.1"
data_type => "list"
key => "logstash"
codec => json
}
}
filter {
grok {
type => "apache-access"
pattern => "%{COMBINEDAPACHELOG}"
}
}
output {
elasticsearch {
embedded => true
}
statsd {
# Count one hit every event by response
increment => "apache.response.%{response}"
}
}
From the agent server, I can telnet successfully to the web server IP and redis port. logstash is running. The logstash.conf file is:
input {
file {
path => "/var/log/apache2/shift.access.log"
type => "apache"
sincedb_path => "/etc/logstash/since_db"
}
stdin {
type => "example"
}
}
filter {
if [type] == "apache" {
grok {
pattern => "%{COMBINEDAPACHELOG}"
}
}
}
output {
stdout { codec => rubydebug }
redis { host => ["xx.xx.xx.xx"] data_type => "list" key => "logstash" }
}
If I comment out the stdin and stdout lines, I still don't get a result. The logstash logs do not give me any connection errors - only warnings about the deprecated grok settings format.
I have also tried running logstash from the command line (making sure to stop the daemonised service first). The apache log file is correctly output in the terminal, so I know that logstash is accessing the log correctly. And I can write random strings and they are output in the correct logstash format.
The redis logs on the web server show no sign of trouble......
The frustrating thing is that this has worked once. One message from stdin made it all the way through to elastic search. That was this morning just after getting everything setup. Since then, I have had no luck and I have no idea why!
Any tips/pointers gratefully received... Solving my problem will stop me tearing out more of my hair which will also make my wife happy......
UPDATE
Rather than filling the comments....
Thanks to @Vor and @rutter, I've confirmed that the user running logstash can read/write to the logstash.log file.
I've run the agent with -vv and the logs are populated with e.g.:
{:timestamp=>"2013-12-12T06:27:59.754000+0100", :message=>"config LogStash::Outputs::Redis/@host = [\"XX.XX.XX.XX\"]", :level=>:debug, :file=>"/opt/logstash/logstash.jar!/logstash/config/mixin.rb", :line=>"104"}
I then input random text into the terminal and get stdout results. However, I do not see anything in the logs until AFTER terminating the logstash agent. After the agent is terminated, I get lines like these in the logstash.log:
{:timestamp=>"2013-12-12T06:27:59.835000+0100", :message=>"Pipeline started", :level=>:info, :file=>"/opt/logstash/logstash.jar!/logstash/pipeline.rb", :line=>"69"}
{:timestamp=>"2013-12-12T06:29:22.429000+0100", :message=>"output received", :event=>#<LogStash::Event:0x77962b4d @cancelled=false, @data={"message"=>"test", "@timestamp"=>"2013-12-12T05:29:22.420Z", "@version"=>"1", "type"=>"example", "host"=>"Ubuntu-1204-precise-64-minimal"}>, :level=>:info, :file=>"(eval)", :line=>"16"}
{:timestamp=>"2013-12-12T06:29:22.461000+0100", :level=>:debug, :host=>"XX.XX.XX.XX", :port=>6379, :timeout=>5, :db=>0, :file=>"/opt/logstash/logstash.jar!/logstash/outputs/redis.rb", :line=>"230"}
But while I do get messages in stdout, I get nothing in redis on the other server. I can however telnet to the correct port on the other server, and I get "ping/PONG" in telnet, so redis on the other server is working..... And there are no errors etc in the redis logs.
It looks to me very much like the redis plugin on the logstash shipper agent is not working as expected, but for the life of me, I can't see where the breakdown is coming from.....
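One way to narrow down where events get lost is to check from the agent server whether anything is arriving in the "logstash" list on the central redis. The Python below is a minimal sketch that speaks the Redis protocol over a plain socket, so it needs no client library. The helper names are made up for illustration, and it assumes redis listens on the default port 6379 without authentication.

```python
import socket


def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def parse_integer_reply(reply):
    """Parse a RESP integer reply such as b':3\\r\\n'."""
    if not reply.startswith(b":"):
        raise ValueError("unexpected reply: %r" % reply)
    return int(reply[1:].strip())


def list_length(host, key="logstash", port=6379, timeout=5.0):
    """Ask Redis for the length of the list the shipper pushes to."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(encode_command("LLEN", key))
        return parse_integer_reply(conn.recv(64))


if __name__ == "__main__":
    # Replace with the web server's address (masked as xx.xx.xx.xx above).
    print(list_length("127.0.0.1"))
```

A running indexer removes entries from the list as it reads them, so a length of 0 only means something if the indexer is stopped first. If the length then grows as the agent ships events, the shipper side works and the problem lies with the indexer.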

Logstash initially reads but then stops reading log files from CIFS network share

I've set up Logstash on a CentOS server to read our production web servers' IIS logs via a CIFS mount.
input {
file {
path => "/mnt/remote/server*/W3SVC1/ex*.log"
type => "w3c"
}
}
filter {
grok {
type => "w3c"
match => [ "message", "%{HOST:hostname} %{IP:hostip} %{WORD:method} %{URIPATH:request} (?:%{NOTSPACE:param}|-) %{NUMBER:port} (?:%{USER:username}|-) %{IPORHOST:clientip} %{NOTSPACE:httpver} (?:%{NOTSPACE:agent}|-) %{NOTSPACE:cookies} %{NOTSPACE:referer} %{IPORHOST:webhostname} %{NUMBER:status} %{NUMBER:time-taken}" ]
}
}
But after reading an initial burst of logs, it just dies.
(The elevated data afterwards is from a different data source)
I tried a hack from Jordan from this thread, but it doesn't seem to work
tail -f /mnt/remote/server1/W3SVC1/ex130913.log | java -jar logstash.jar
We are purposely avoiding installing Java/Logstash on our front-end web servers because of security issues. So, can you think of a way to make this work?