I run my own S3-compatible store locally instead of AWS S3. Is there a way to override s3.amazonaws.com?
I have created hive-site.xml and put it in ${HIVE_HOME}/conf/.
This is what I have in hive-site.xml:
<configuration>
<property>
<name>fs.s3n.impl</name>
<value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
</property>
<property>
<name>fs.s3n.endpoint</name>
<value>local_s3_ip:port</value>
</property>
<property>
<name>fs.s3n.awsAccessKeyId</name>
<value>VALUE</value>
</property>
<property>
<name>fs.s3n.awsSecretAccessKey</name>
<value>VALUE</value>
</property>
</configuration>
Now I want to create a table, and if I specify:
LOCATION('s3n://hive/sample_data.csv')
I get an error:
org.apache.hadoop.hive.ql.exec.DDLTask. java.net.UnknownHostException: hive.s3.amazonaws.com: Temporary failure in name resolution
It works with neither s3 nor s3n.
Is it possible to override the default s3.amazonaws.com and use my own S3?
Switch to the S3A connector (and Hadoop 2.7+ JARs).
Set "fs.s3a.endpoint" to the hostname of your server,
and set "fs.s3a.path.style.access" to true (rather than expecting every bucket to have its own DNS entry).
Expect to spend time working on authentication options, as signing is always a trouble spot with third-party stores.
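As a hedged illustration of the signing point above: S3A has a "fs.s3a.signing-algorithm" property that overrides the request signer. The value below is an assumption for a store that only accepts the older V2-style signatures. Whether your store needs any override at all depends on the server, so treat this as a sketch to experiment with, not a required setting.

```xml
<!-- Sketch only: override the S3A request signer for a third-party store. -->
<!-- "S3SignerType" selects V2-style signing; this value is an assumption
     about the store. Remove the property if the default signer works. -->
<property>
  <name>fs.s3a.signing-algorithm</name>
  <value>S3SignerType</value>
</property>
```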
With this configuration I am able to reach my own S3 endpoint:
<configuration>
<property>
<name>fs.s3a.impl</name>
<value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
<property>
<name>fs.s3a.endpoint</name>
<value>IP:PORT</value>
</property>
<property>
<name>fs.s3a.path.style.access</name>
<value>true</value>
</property>
<property>
<name>fs.s3a.access.key</name>
<value>ACCESS_KEY</value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>SECRET_KEY</value>
</property>
<property>
<name>fs.s3a.awsAccessKeyId</name>
<value>ACCESS_KEY</value>
</property>
<property>
<name>fs.s3a.awsSecretAccessKey</name>
<value>SECRET_KEY</value>
</property>
<property>
<name>fs.s3a.connection.ssl.enabled</name>
<value>false</value>
</property>
</configuration>
Related
Use case: I am trying to set up an EMR cluster that can be reached over JDBC for querying Hive. I decided to use LDAP authentication, and I am able to authenticate.
Problem: I am trying to restrict who can authenticate to the cluster with a group filter or custom filter, following the hive-ldap document. When I authenticate with my email address, Hive searches LDAP for the prefix of my email ID and cannot find it, because my username is different from the email prefix. Does anyone know how to fix this?
hive-site.xml:
<property>
<name>hive.server2.authentication.ldap.baseDN</name>
<value>DC=some,DC=com</value>
</property>
<property>
<name>hive.server2.authentication.ldap.groupDNPattern</name>
<value>CN=%s,CN=something,OU=something,DC=some,DC=com</value>
</property>
<property>
<name>hive.server2.authentication.ldap.groupFilter</name>
<value>group_name</value>
</property>
<property>
<name>hive.server2.authentication.ldap.userMembershipKey</name>
<value>memberOf</value>
</property>
<property>
<name>hive.server2.authentication.ldap.groupClassKey</name>
<value>groupOfNames</value>
</property>
<property>
<name>hive.server2.authentication.ldap.groupMembershipKey</name>
<value>member</value>
</property>
<property>
<name>hive.server2.authentication.ldap.binddn</name>
<value>CN=something,OU=something,DC=some,DC=com</value>
</property>
<property>
<name>hive.server2.authentication.ldap.bindpw</name>
<value>bind_password</value>
</property>
<property>
<name>hive.server2.authentication</name>
<value>LDAP</value>
</property>
<property>
<name>hive.server2.transport.mode</name>
<value>binary</value>
</property>
<property>
<name>hive.users.in.admin.role</name>
<value>admin_user</value>
</property>
<property>
<name>hive.server2.authentication.ldap.url</name>
<value>ldaps://ldap.some.com:1234</value>
</property>
Error message
2021-07-16T22:24:26,405 INFO [HiveServer2-Handler-Pool: Thread-37([])]: ldap.LdapSearch (LdapSearch.java:findUserDn(100)) - Expected exactly one user result for the user: first_name.last_name, but got 0. Returning null
2021-07-16T22:24:26,406 ERROR [HiveServer2-Handler-Pool: Thread-37([])]: transport.TSaslTransport (TSaslTransport.java:open(315)) - SASL negotiation failure
beeline -u "jdbc:hive2://namenode:10000/default"
Connecting to jdbc:hive2://namenode:10000/default
19/05/11 17:21:52 [main]: WARN jdbc.HiveConnection: Failed to connect to namenode:10000
Unknown HS2 problem when communicating with Thrift server.
Error: Could not open client transport with JDBC Uri: jdbc:hive2://namenode:10000/default: java.net.SocketException: Connection reset (state=08S01,code=0)
Beeline version 3.1.0 by Apache Hive
hive-site.xml
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>metastore.warehouse.dir</name>
<value>hdfs://namenode:9820/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.cj.jdbc.Driver</value>
</property>
<property>
<name>metastore.schema.verification</name>
<value>true</value>
</property>
<property>
<name>metastore.hmshandler.retry.attemp</name>
<value>10</value>
<description>The number of times to retry a HMSHandler call if there were a connection error.</description>
</property>
<property>
<name>metastore.thrift.uris</name>
<value>thrift://namenode:9083</value>
</property>
<property>
<name>metastore.thrift.port</name>
<value>9083</value>
</property>
<property>
<name>hive.server2.transport.mode</name>
<value>binary</value>
<description>The server transport mode. The value can be binary or http. Set to http to enable HTTP transport mode.
</description>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
<description>TCP port number to listen on</description>
</property>
<property>
<name>hiver.server2.thrift.bind.host</name>
<value>namenode</value>
<description>TCP interface to bind to</description>
</property>
<property>
<name>hive.server2.thrift.http.port</name>
<value>100002</value>
<description>HTTP port number to listen on</description>
</property>
<property>
<name>hive.server2.thrift.http.min.worker.threads</name>
<value>5</value>
<description>Maximum worker threads in the server pool</description>
</property>
<property>
<name>hive.server2.thrift.http.max.worker.threads</name>
<value>500</value>
<description>Maximum worker threads in the server pool</description>
</property>
<property>
<name>hive.server2.thrift.min.worker.threads</name>
<value>5</value>
<description>Minimum number of worker threads</description>
</property>
<property>
<name>hive.server2.thrift.max.worker.threads</name>
<value>500</value>
<description>Maximum number of worker threads</description>
</property>
<property>
<name>hive.server2.authentication</name>
<value>NOSASL</value>
<description>
Expects one of [nosasl, none, ldap, kerberos, pam, custom].
Client authentication types.
NONE: no authentication check
LDAP: LDAP/AD based authentication
KERBEROS: Kerberos/GSSAPI authentication
CUSTOM: Custom authentication provider
(Use with property hive.server2.custom.authentication.class)
PAM: Pluggable authentication module
NOSASL: Raw transport
</description>
</property>
<property>
<name>hive.server2.thrift.http.path</name>
<value>cliservice</value>
<description>The path component of the URL endpoint when in HTTP mode</description>
</property>
<property>
<name>hive.metastore.event.db.notification.api.auth</name>
<value>false</value>
<description>
Should metastore do authorization against database notification related APIs such as get_next_notification.
If set to true, then only the superusers in proxy settings have the permission
</description>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>hdfs://namenode:9820/tmp/hive</value>
<description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission.</description>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/tmp/${user.name}</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/tmp/${user.name}_resources</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
<name>hive.scratch.dir.permission</name>
<value>733</value>
<description>The permission for the user specific scratch directories that get created.</description>
</property>
Hello all,
HiveServer2 is listening on port 10000 (TCP) and 10002 (HTTP).
If I run beeline -u jdbc:hive2://
it works, but when I try to connect using the host IP or hostname,
it shows the errors above. Does anyone have an idea?
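Two entries in the configuration pasted above look like typos, whatever the actual cause of the connection reset turns out to be. The bind-host key is spelled "hiver.server2.thrift.bind.host" (the Hive property is "hive.server2.thrift.bind.host"), and the HTTP port is 100002, which is outside the valid TCP port range, while the post itself says the server listens on 10002. A corrected fragment, assuming those were the intended values, might look like this:

```xml
<!-- Sketch: the two entries with the suspected typos corrected. -->
<!-- The values are assumptions taken from the rest of the post. -->
<property>
  <name>hive.server2.thrift.bind.host</name>
  <value>namenode</value>
  <description>TCP interface to bind to</description>
</property>
<property>
  <name>hive.server2.thrift.http.port</name>
  <value>10002</value>
  <description>HTTP port number to listen on</description>
</property>
```

Separately, because hive.server2.authentication is set to NOSASL, the JDBC URL usually needs ;auth=noSasl appended (for example jdbc:hive2://namenode:10000/default;auth=noSasl). Without it, Beeline attempts a SASL handshake against a raw transport.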
1. I created a jar with a custom UDF and copied it into the dynamic jars directory (hbase.dynamic.jars.dir). When I use the UDF in the SELECT list, I get results without issues.
2. But when the function is part of the WHERE clause, I get an error saying that the class of my custom function cannot be found.
select PK FROM "my.custom.view" where MY_FUN(ARRAY["COLF"."COL1"], 'SOMEPARAM') limit 1;
Caused by: org.apache.hadoop.hbase.DoNotRetryIOException: BooleanExpressionFilter failed during reading: java.lang.ClassNotFoundException: com.myCompany.phoenix.MyCustomFunction
at org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:96)
at org.apache.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:62)
at org.apache.phoenix.filter.BooleanExpressionFilter.readFields(BooleanExpressionFilter.java:109)
at org.apache.phoenix.filter.SingleKeyValueComparisonFilter.readFields(SingleKeyValueComparisonFilter.java:133)
at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:131)
at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:101)
at org.apache.phoenix.filter.SingleCQKeyValueComparisonFilter.parseFrom(SingleCQKeyValueComparisonFilter.java:50)
... 16 more
hbase-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:57000/user/hbase</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>21081</value>
</property>
<property>
<name>hbase.client.keyvalue.maxsize</name>
<value>0</value>
</property>
<!-- SEP is basically replication, so enable it -->
<property>
<name>hbase.replication</name>
<value>true</value>
</property>
<property>
<name>hbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily</name>
<value>128</value>
</property>
<property>
<name>hbase.fs.tmp.dir</name>
<value>/tmp/hbase</value>
</property>
<property>
<name>phoenix.functions.allowUserDefinedFunctions</name>
<value>true</value>
</property>
<property>
<name>hbase.dynamic.jars.dir</name>
<value>${hbase.rootdir}/lib/</value>
</property>
<property>
<name>fs.hdfs.impl</name>
<value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
</property>
</configuration>
I add the jar manually with:
hdfs dfs -copyFromLocal -f /my.jar hdfs:///user/hbase/lib/my.jar
I create the function with:
CREATE FUNCTION MY_FUN(BINARY[], VARCHAR) RETURNS BOOLEAN as 'com.myCompany.phoenix.MyCustomFunction' using jar 'hdfs://localhost:57000/user/hbase/lib/my.jar';
I ran into something similar when I upgraded to Phoenix 5.0 from 4.7. I got an exception stating that I now needed to place my UDF .jar into /apps/hbase/data/lib due to permission issues. In the old environment I was able to get away using the /apps/hbase/lib directory. Maybe this is happening to you as well but it's not alerting you to the new path change.
I recently upgraded our cluster's HiveServer to HiveServer2. I also set up the Hive Metastore (in remote mode) and moved away from embedded mode (which we were previously running).
I want to test that things are properly configured and that the metadata is actually being stored in the remote metastore. What would be the easiest way to do this? Are there certain logs I could check to verify this behavior?
I am afraid things are not configured correctly and that I am still running my metastore in local mode: when I query the PostgreSQL database on the machine hosting the metastore, there are no rows in the metastore DB, even though I have created test tables through beeline.
It might be worth mentioning that the end goal is to be able to query data stored in HDFS via SparkSQL. Do I need HiveServer2 to accomplish this? Apologies, I am new to a lot of this technology.
Here is my hive-site.xml:
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:postgresql://w7/metastore</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.postgresql.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hiveuser</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>password</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://w7:9083</value>
<description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
<property>
<name>hive.warehouse.subdir.inherit.perms</name>
<value>true</value>
</property>
<property>
<name>hive.zookeeper.quorum</name>
<value>mn</value>
</property>
<property>
<name>hive.zookeeper.client.port</name>
<value>2181</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>mn</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hive.zookeeper.namespace</name>
<value>hive_zookeeper_namespace_hive</value>
</property>
<property>
<name>hive.cluster.delegation.token.store.class</name>
<value>org.apache.hadoop.hive.thrift.MemoryTokenStore</value>
</property>
<property>
<name>hive.server2.enable.doAs</name>
<value>true</value>
</property>
<property>
<name>hive.server2.use.SSL</name>
<value>false</value>
</property>
<property>
<name>hive.support.concurrency</name>
<description>Enable Hive's Table Lock Manager Service</description>
<value>true</value>
</property>
</configuration>
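For what it's worth, a Hive client such as HiveServer2 or Spark SQL treats the metastore as remote whenever hive.metastore.uris is non-empty. The javax.jdo.option.* settings are then only used by the metastore service itself. A minimal client-side hive-site.xml, as a sketch that reuses the w7 host from the configuration above and assumes nothing else is needed on the client, could be as small as:

```xml
<!-- Sketch: client-side hive-site.xml for a remote metastore. -->
<!-- Host and port are copied from the post; everything else is omitted
     on the assumption that defaults are acceptable on the client. -->
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://w7:9083</value>
  </property>
</configuration>
```

Spark SQL talks to the metastore directly when a file like this is on its configuration path, so HiveServer2 is not needed for that use case.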
I am planning to put Knox in front of our HDP 2.2 cluster, but I ran into a problem connecting to HiveServer2 in HTTP mode. I configured hive-site.xml as below:
<property>
<name>hive.server2.enable.doAs</name>
<value>true</value>
</property>
<property>
<name>fs.hdfs.impl.disable.cache</name>
<value>true</value>
</property>
<property>
<name>fs.file.impl.disable.cache</name>
<value>true</value>
</property>
<property>
<name>hive.server2.allow.user.substitution</name>
<value>true</value>
</property>
<property>
<name>hive.server2.thrift.http.path</name>
<value>cliservice</value>
<description>Path component of URL endpoint when in HTTP mode.</description>
</property>
<property>
<name>hive.server2.transport.mode</name>
<value>http</value>
</property>
<property>
<name>hive.server2.thrift.http.port</name>
<value>10010</value>
</property>
<property>
<name>hive.server2.thrift.http.min.worker.threads</name>
<value>5</value>
</property>
<property>
<name>hive.server2.thrift.http.max.worker.threads</name>
<value>100</value>
</property>
However, when I try to connect to the server via beeline, beeline hangs. There are no exceptions or error messages in the log file. If I change the transport mode to "binary", the problem goes away completely. I don't know why this happens.
What should I do about it?