I am planning to apply Knox to our HDP 2.2 cluster but I encountered the hiveserver2 http mode connection. I configured hive-site.xml like below:
<property>
<name>hive.server2.enable.doAs</name>
<value>true</value>
</property>
<property>
<name>fs.hdfs.impl.disable.cache</name>
<value>true</value>
</property>
<property>
<name>fs.file.impl.disable.cache</name>
<value>true</value>
</property>
<property>
<name>hive.server2.allow.user.substitution</name>
<value>true</value>
</property>
<property>
<name>hive.server2.thrift.http.path</name>
<value>cliservice</value>
<description>Path component of URL endpoint when in HTTP mode.</description>
</property>
<property>
<name>hive.server2.transport.mode</name>
<value>http</value>
</property>
<property>
<name>hive.server2.thrift.http.port</name>
<value>10010</value>
</property>
<property>
<name>hive.server2.thrift.http.min.worker.threads</name>
<value>5</value>
</property>
<property>
<name>hive.server2.thrift.http.max.worker.threads</name>
<value>100</value>
</property>
However, when I try to connect to the server via beeline, the beeline hangs. There is no exception and error messages in the log file. If I change the transport mode to "biniary", the problem is totally gone. I don't know why it happens.
What should I do for that?
Related
Use case : I am trying to set up EMR cluster which can be connected to jdbc for querying hive. I have decided to go for ldap authentication and I am able to authenticate
Problem : I am trying to restrict who can authenticate to cluster using group filter or custom filter using this document hive-ldap, when i authenticate using email it is trying to search for my email id's prefix in the ldap and it is not able to find (since my username is different from email id prefix). Do any one know how to fix it
hive-site.xml :
<property>
<name>hive.server2.authentication.ldap.baseDN</name>
<value>DC=some,DC=com</value>
</property>
<property>
<name>hive.server2.authentication.ldap.groupDNPattern</name>
<value>CN=%s,CN=something,OU=something,DC=some,DC=com</value>
</property>
<property>
<name>hive.server2.authentication.ldap.groupFilter</name>
<value>group_name</value>
</property>
<property>
<name>hive.server2.authentication.ldap.userMembershipKey</name>
<value>memberOf</value>
</property>
<property>
<name>hive.server2.authentication.ldap.groupClassKey</name>
<value>groupOfNames</value>
</property>
<property>
<name>hive.server2.authentication.ldap.groupMembershipKey</name>
<value>member</value>
</property>
<property>
<name>hive.server2.authentication.ldap.binddn</name>
<value>CN=something,OU=something,DC=some,DC=com</value>
</property>
<property>
<name>hive.server2.authentication.ldap.bindpw</name>
<value>bind_password</value>
</property>
<property>
<name>hive.server2.authentication</name>
<value>LDAP</value>
</property>
<property>
<name>hive.server2.transport.mode</name>
<value>binary</value>
</property>
<property>
<name>hive.users.in.admin.role</name>
<value>admin_user</value>
</property>
<property>
<name>hive.server2.authentication.ldap.url</name>
<value>ldaps://ldap.some.com:1234</value>
</property>
Error message
2021-07-16T22:24:26,405 INFO [HiveServer2-Handler-Pool: Thread-37([])]: ldap.LdapSearch (LdapSearch.java:findUserDn(100)) - Expected exactly one user result for the user: first_name.last_name, but got 0. Returning null
2021-07-16T22:24:26,406 ERROR [HiveServer2-Handler-Pool: Thread-37([])]: transport.TSaslTransport (TSaslTransport.java:open(315)) - SASL negotiation failure
beeline -u "jdbc:hive2://namenode:10000/default"
Connecting to jdbc:hive2://namenode:10000/default
19/05/11 17:21:52 [main]: WARN jdbc.HiveConnection: Failed to connect to namenode:10000
Unknown HS2 problem when communicating with Thrift server.
Error: Could not open client transport with JDBC Uri: jdbc:hive2://namenode:10000/default: java.net.SocketException: Connection reset (state=08S01,code=0)
Beeline version 3.1.0 by Apache Hive
hive-site.xml
javax.jdo.option.ConnectionURL
jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true
<property>
<name>metastore.warehouse.dir</name>
<value>hdfs://namenode:9820/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.cj.jdbc.Driver</value>
</property>
<property>
<name>metastore.schema.verification</name>
<value>true</value>
</property>
<property>
<name>metastore.hmshandler.retry.attemp</name>
<value>10</value>
<description>The number of times to retry a HMSHandler call if there were a connection error.</description>
</property>
<property>
<name>metastore.thrift.uris</name>
<value>thrift://namenode:9083</value>
</property>
<property>
<name>metastore.thrift.port</name>
<value>9083</value>
</property>
<property>
<name>hive.server2.transport.mode</name>
<value>binary</value>
<description>The server transport mode. The value can be binary or http. Set to http to enable HTTP transport mode.
</description>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
<description>TCP port number to listen on</description>
</property>
<property>
<name>hiver.server2.thrift.bind.host</name>
<value>namenode</value>
<description>TCP interface to bind to</description>
</property>
<property>
<name>hive.server2.thrift.http.port</name>
<value>100002</value>
<description>HTTP port number to listen on</description>
</property>
<property>
<name>hive.server2.thrift.http.min.worker.threads</name>
<value>5</value>
<description>Maximum worker threads in the server pool</description>
</property>
<property>
<name>hive.server2.thrift.http.max.worker.threads</name>
<value>500</value>
<description>Maximum worker threads in the server pool</description>
</property>
<property>
<name>hive.server2.thrift.min.worker.threads</name>
<value>5</value>
<description>Minimum number of worker threads</description>
</property>
<property>
<name>hive.server2.thrift.max.worker.threads</name>
<value>500</value>
<description>Maximum number of worker threads</description>
</property>
<property>
<name>hive.server2.authentication</name>
<value>NOSASL</value>
<description>
Expects one of [nosasl, none, ldap, kerberos, pam, custom].
Client authentication types.
NONE: no authentication check
LDAP: LDAP/AD based authentication
KERBEROS: Kerberos/GSSAPI authentication
CUSTOM: Custom authentication provider
(Use with property hive.server2.custom.authentication.class)
PAM: Pluggable authentication module
NOSASL: Raw transport
</description>
</property>
<property>
<name>hive.server2.thrift.http.path</name>
<value>cliservice</value>
<description>The path component of the URL endpoint when in HTTP mode</description>
</property>
<property>
<name>hive.metastore.event.db.notification.api.auth</name>
<value>false</value>
<description>
Should metastore do authorization against database notification related APIs such as get_next_notification.
If set to true, then only the superusers in proxy settings have the permission
</description>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>hdfs://namenode:9820/tmp/hive</value>
<description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission.</description>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/tmp/${user.name}</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/tmp/${user.name}_resources</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
<name>hive.scratch.dir.permission</name>
<value>733</value>
<description>The permission for the user specific scratch directories that get created.</description>
</property>
Hello all,
hive server is listening on port 10000(tcp) and 10002(http).
If I hit, beeline -u jdbc:hive2://
It works, but when I try to access from host-ip or hostname,
It shows above errors. Anyone have idea?
I have my own s3 running locally instead of aws s3. Is there a way to overwrite s3.amazonaws.com?
I have created hive-site.xml and put it in ${HIVE_HOME}/conf/.
This is what I have got in .xml:
<configuration>
<property>
<name>fs.s3n.impl</name>
<value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
</property>
<property>
<name>fs.s3n.endpoint</name>
<value>local_s3_ip:port</value>
</property>
<property>
<name>fs.s3n.awsAccessKeyId</name>
<value>VALUE</value>
</property>
<property>
<name>fs.s3n.awsSecretAccessKey</name>
<value>VALUE</value>
</property>
Now I want to create table and if I put:
LOCATION('s3n://hive/sample_data.csv')
I have an error:
org.apache.hadoop.hive.ql.exec.DDLTask. java.net.UnknownHostException: hive.s3.amazonaws.com: Temporary failure in name resolution
It doesn't work neither for s3 nor s3n.
Is it possible to overwrite default s3.amazonaws.com and use my own s3?
Switch to the S3A Connector (and Hadoop 2.7+ JARs)
set "fs.s3a.endpoint" to the hostname of your server
and "fs.s3a.path.style.access" = true (rather than expect every bucket to have DNS)
Expect to spend time working on authentication options as signing is always a troublespot in third-party stores.
With this configuration I am able to reach my own s3 endpoint.
<configuration>
<property>
<name>fs.s3a.impl</name>
<value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
<property>
<name>fs.s3a.endpoint</name>
<value> <ip>:<port> </value>
</property>
<property>
<name>fs.s3a.path.style.access</name>
<value>true</value>
</property>
<property>
<name>fs.s3a.access.key</name>
<value> <ak> </value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value> <sk> </value>
</property>
<property>
<name>fs.s3a.awsAccessKeyId</name>
<value> <ak> </value>
</property>
<property>
<name>fs.s3a.awsSecretAccessKey</name>
<value> <sk> </value>
</property>
<property>
<name>fs.s3a.connection.ssl.enabled</name>
<value>false</value>
</property>
i am trying to setup a multi node hadoop 2.x cluster in Virtual machine,after the setup and configurations,while i try to start the cluster,node manager is not getting started in slave nodes,all other daemons are get started in the cluster,can anyone help me to resolve this issue.
core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hduser/hadoop-2.6.0/data/nnode</value>
</property>
<property>
<name>dfs.datanode.name.dir</name>
<value>/home/hduser/hadop-2.6.0/data/dnode</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.hosts.exclude</name>
<value>/home/hduser/exclude</value>
</property>
<property>
<name>dfs.hosts.include</name>
<value>/home/hduser/include</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.nodes.include-path</name>
<value>/home/hduser/include</value>
</property>
<property>
<name>yarn.resourcemanager.nodes.exclude-path</name>
<value>/home/hduser/exclude</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
</configuration>
I see that in your configuration:
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
Node Manager is a component of YARN, on startup, this component registers with the RM. If "master" is not resolved to a valid ip address the Node Manager can't contact the resource manager.
Status is not pulled by the Resource Manager, instead is pushed by the datanodes to RM.
I think you should investigate this.
I recently upgraded our cluster's HiveServer to HiveServer2. I also set up the Hive Metastore (in remote mode) and moved away from embedded mode (which we were previously running).
I want to test that things are properly configured and that the metatdata is acutally being stored in the remote metastore. What would be the easiest way to do this? Are their certain logs I could check to verify this behavior?
I am afraid things are not configured correctly, and I am still running my metastore in local mode, as when I query the postgresql database on the machine hosting the metastore, there are no rows in the metastore DB (despite the fact that I have created test tables through beeline).
It might be worth mentioning is that the end goal of this is to be able to query data stored in HDFS via SparkSQL. Do I need HiveServer2 to accomplish this? Apologies, I am new to a lot of this technology.
Here is my hive-site.xml:
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:postgresql://w7/metastore</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.postgresql.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hiveuser</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>password</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://w7:9083</value>
<description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
<property>
<name>hive.warehouse.subdir.inherit.perms</name>
<value>true</value>
</property>
<property>
<name>hive.zookeeper.quorum</name>
<value>mn</value>
</property>
<property>
<name>hive.zookeeper.client.port</name>
<value>2181</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>mn</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hive.zookeeper.namespace</name>
<value>hive_zookeeper_namespace_hive</value>
</property>
<property>
<name>hive.cluster.delegation.token.store.class</name>
<value>org.apache.hadoop.hive.thrift.MemoryTokenStore</value>
</property>
<property>
<name>hive.server2.enable.doAs</name>
<value>true</value>
</property>
<property>
<name>hive.server2.use.SSL</name>
<value>false</value>
</property>
<property>
<name>hive.support.concurrency</name>
<description>Enable Hive's Table Lock Manager Service</description>
<value>true</value>
</property>
</configuration>