Hadoop Namenode not accessible

So I have been trying to set up a Hadoop cluster on a private network, and I have come across a strange problem.
My datanodes and namenode are all running, but the datanodes cannot connect to the namenode.
I have noticed in the logs that the namenode is starting on
system1/192.168.100.11:9000
and datanodes are searching for
192.168.100.11:9000
My configuration only has:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.100.11:9000</value>
  </property>
</configuration>
Any ideas as to why Hadoop is trying to work this way?
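The system1/192.168.100.11:9000 form in the log is just Java's InetSocketAddress notation (hostname/IP:port), so the two addresses may actually agree. A more common culprit in this situation, though this is an assumption rather than something visible in the logs shown, is that the namenode host resolves its own hostname to a loopback address, so the RPC server binds to 127.0.0.1 and nothing listens on the private interface. A minimal sketch of the two things worth checking, assuming the namenode host is named system1:
# /etc/hosts on the namenode: the hostname should map to the private
# address, not to 127.0.0.1 or 127.0.1.1
192.168.100.11  system1
<!-- hdfs-site.xml: alternatively, force the namenode RPC server to
     listen on all interfaces (standard Hadoop 2.x property) -->
<property>
  <name>dfs.namenode.rpc-bind-host</name>
  <value>0.0.0.0</value>
</property>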

Related

Connect timeout from Presto / Trino to Amazon S3

I currently have a Kubernetes setup outside of AWS in which a data lake residing in Amazon S3 is queried using Presto v348. Data is stored in Parquet file format. An additional component is a Hive metastore.
I encounter the following error and am at a loss as to how to troubleshoot the underlying issue:
io.prestosql.spi.PrestoException: Unable to execute HTTP request: Connect to s3-eu-central-1.amazonaws.com:80 [s3-eu-central-1.amazonaws.com] failed: connect timed out
This issue sometimes arises with bigger queries and, interestingly, brings the system into a state where all subsequent queries time out. In roughly one in five tries a query will succeed; smaller queries generally work perfectly fine. The situation improves after about 10-20 minutes, and restarting Presto does not shorten that window, so I suspect there must be another underlying problem.
I am aware that I might be hitting a performance ceiling, but getting bare timeouts instead of a clear error, with the whole system unusable for 10-20 minutes, is not acceptable.
I have already increased settings like hive.s3.max-connections in Presto and fs.s3a.connection.maximum in the metastore config, but that doesn't seem to solve the problem. Beyond these, I found no suggestions on how to tweak the setup to prevent the error from happening.
Presto connector config:
connector.name=hive-hadoop2
hive.metastore.uri=thrift://hive-metastore:9083
hive.metastore.username=prestodb
hive.s3.aws-access-key="S3_ACCESS_KEY"
hive.s3.aws-secret-key="S3_SECRET_KEY"
hive.s3.endpoint=s3-eu-central-1.amazonaws.com
hive.s3.ssl.enabled=false
hive.s3.path-style-access=true
hive.parquet.use-column-names=true
hive.allow-drop-table=true
hive.s3-file-system-type=PRESTO
hive.s3.max-connections=50000
hive.s3select-pushdown.max-connections=50000
hive.s3.connect-timeout=60s
hive.allow-rename-column=true
Metastore config:
core-site.xml: |
<configuration>
  <property>
    <name>fs.s3a.connection.ssl.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>xxx</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>xxx</value>
  </property>
  <property>
    <name>fs.s3a.fast.upload</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.s3a.connection.maximum</name>
    <value>50000</value>
  </property>
  <property>
    <name>fs.s3a.connection.establish.timeout</name>
    <value>60000</value>
  </property>
  <property>
    <name>fs.s3a.threads.max</name>
    <value>64</value>
  </property>
  <property>
    <name>fs.s3a.max.total.tasks</name>
    <value>128</value>
  </property>
</configuration>
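Since the failure happens at TCP connect time, it can help to separate raw network reachability from connector configuration by probing the endpoint from inside the cluster while the problem is occurring. A hedged sketch, with the pod name as a placeholder:
# from a Presto worker pod: does the S3 endpoint accept TCP connections at all?
kubectl exec -it presto-worker-0 -- curl -sv --max-time 10 http://s3-eu-central-1.amazonaws.com/
If this also hangs during the bad 10-20 minute window, the bottleneck is more likely NAT/conntrack exhaustion or egress limits on the Kubernetes side than any of the connector settings above.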

When YARN is running, submitted Hadoop jobs get stuck in ACCEPTED state

I am using VirtualBox to run an Ubuntu 14 VM on a Windows laptop. I have configured an Apache-distribution HDFS and YARN for a single node. When I start DFS and YARN, all required daemons are running. When I don't configure YARN and run DFS only, I can execute MapReduce jobs successfully, but when I run YARN as well, the job gets stuck in the ACCEPTED state. I have tried many changes to the node's memory settings, but no luck.
I followed this link to set up the single node:
https://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-common/SingleCluster.html
core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
settings of hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/shaileshraj/hadoop/name/data</value>
  </property>
</configuration>
settings of mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
settings of yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2200</value>
    <description>Amount of physical memory, in MB, that can be allocated for containers.</description>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>500</value>
  </property>
</configuration>
RM Web UI
Here is the Application Master screen of the RM Web UI. What I can see is that the AM container is not allocated; maybe that is the problem.
If the job is not getting enough resources, it will stay in the ACCEPTED state; whenever it gets resources, it will change to the RUNNING state.
In your case, open the Resource Manager Web UI and check how many resources are available to run jobs.
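If it is specifically the AM container that is never allocated, the request size matters as well: for MapReduce the AM asks for yarn.app.mapreduce.am.resource.mb (1536 MB by default), and that request must fit within yarn.scheduler.maximum-allocation-mb and the node's yarn.nodemanager.resource.memory-mb. A hedged sketch for the 2200 MB node above (values illustrative, not tuned):
<!-- yarn-site.xml: allow containers up to the full node capacity -->
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>2200</value>
</property>
<!-- mapred-site.xml: shrink the AM request so it fits comfortably -->
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>1024</value>
</property>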

HRegionServer shows "error telling master we are up". Showing socket exception: Invalid argument

I am trying to create an HBase cluster on 3 CentOS machines. Hadoop (v2.8.0) is up and running, and on top of it I configured HBase (v1.2.5). HBase startup is fine: it starts HMaster and the region servers, but the region servers still show the following error, and the HMaster log reports that no region servers are checked in.
2017-04-20 19:30:33,950 WARN [regionserver/localhost/127.0.0.1:16020] regionserver.HRegionServer: error telling master we are up
com.google.protobuf.ServiceException: java.net.SocketException: Invalid argument
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:240)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336)
at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:8982)
at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2316)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:907)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketException: Invalid argument
at sun.nio.ch.Net.connect0(Native Method)
at sun.nio.ch.Net.connect(Net.java:454)
at sun.nio.ch.Net.connect(Net.java:446)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupConnection(RpcClientImpl.java:416)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:722)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:906)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:873)
at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1241)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
jps output on my master node:
[hadoop@localhost bin]$ jps
20624 SecondaryNameNode
20800 ResourceManager
20401 NameNode
18061 Jps
17839 HMaster
jps output on my region nodes:
[hadoop@localhost bin]$ jps
11168 Jps
482 DataNode
10840 HQuorumPeer
10974 HRegionServer
hbase-site.xml of all nodes
<configuration>
  <property>
    <name>hbase.master.hostname</name>
    <value>NameNode</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://NameNode:8020/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>hdfs://NameNode:8020/zookeeper</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>DataNode1,DataNode2</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>
The regionservers file contains:
DataNode1
DataNode2
The /etc/hosts file on all nodes contains actual IPs rather than loopback IPs:
192.168.00.00 NameNode
192.168.00.00 DataNode1
192.168.00.00 DataNode2
Note: the configuration is the same on all nodes. Any help will be appreciated.
Putting the following properties in every region server's hbase-site.xml solved my problem:
<property>
  <name>hbase.regionserver.hostname</name>
  <value>DataNode1</value>
</property>
<property>
  <name>hbase.regionserver.port</name>
  <value>16020</value>
</property>
I was facing the same problem, but changing the hostname resolved it:
sudo hostnamectl set-hostname new_hostname
I had a master and a node called node1. (Link to the wiki that has the configs.)
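Both fixes point at the same underlying requirement: the name a region server reports for itself must resolve to its LAN address on every node. A quick, hedged way to verify this with standard Linux tools on each node:
hostname -f                      # the name the JVM will report for this node
getent hosts "$(hostname -f)"    # how that name resolves; it should not be 127.0.x.x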

Applications not shown in YARN UI when running MapReduce Hadoop job?

I am using Hadoop 2.2. I see that my jobs complete successfully, and I can browse the filesystem to find the output. However, when I browse http://NNode:8088/cluster/apps, I am unable to see any applications that have completed so far (I ran 3 wordcount jobs, but none of them is shown here).
Are there any configurations that need to be taken into account?
Here is the yarn-site.xml
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>NNode</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<!--
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
-->
Here is mapred-site.xml:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
I have job history server running too:
jps
4422 NameNode
5452 Jps
4695 SecondaryNameNode
4924 ResourceManager
72802 Jps
5369 JobHistoryServer
After applications are completed, responsibility for them may be handed off to the Job History Server, so check the Job History Server URL. It normally listens on port 19888, e.g.
http://<job_history_server_address>:19888/jobhistory
Log directories and log retention durations are configurable in yarn-site.xml. With YARN you can even aggregate logs to a single (configurable) location.
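For example, turning on aggregation so the logs of completed containers land in a single HDFS directory is one switch plus an optional target directory (a sketch; the directory shown is the default):
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/tmp/logs</value>
</property>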
Sometimes, even though an application is listed, its logs are not available (I am not sure if that is due to some bug in YARN). However, almost every time I was able to get the logs from the command line:
yarn logs -applicationId the_application_id
There are multiple options, though; use help for details:
yarn logs --help
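For the history server to be reachable at that URL in the first place, mapred-site.xml normally carries its addresses; a hedged sketch using this cluster's NNode hostname (the ports shown are the defaults):
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>NNode:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>NNode:19888</value>
</property>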
You can also refer to "Hadoop is not showing my job in the job tracker even though it is running":
conf.set("fs.defaultFS", "hdfs://master:9000");
conf.set("mapreduce.jobtracker.address", "master:54311");
conf.set("mapreduce.framework.name", "yarn");
conf.set("yarn.resourcemanager.address", "master:8032");
I tested this in my cluster. It works!

Looking for a proper hbase-site.xml / hbase-default.xml config example for an HBase client

I am trying to connect to an HBase node from a Java application. HBaseConfiguration is key, but the available Javadoc and documentation are really poor and insufficient.
Does anyone have proper examples of hbase-site.xml and hbase-default.xml to use for a remote connection?
Thanks!
There are only two variables you need to set from a client's point of view:
hbase.rootdir
hbase.zookeeper.quorum
Here are the steps from my setup doc about the hbase-site.xml. We don't make any changes to the hbase-default.xml as ... well... that's all the default settings. :)
Edit hbase-site.xml. Copy the following into the file.
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://PDHadoop1.corp.COMPANY.com:54310/usr/hbase</value>
    <final>true</final>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>PDHadoop1.corp.COMPANY.com,PDHadoop2.corp.COMPANY.com,PDHadoop3.corp.COMPANY.com,PDHadoop4.corp.COMPANY.com</value>
    <final>true</final>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    <final>true</final>
  </property>
</configuration>
Save the file and quit the editor.
Please note that hbase.rootdir points to PDHadoop1, as that is the name node in the development environment. Similarly, hbase.zookeeper.quorum points to all ZooKeeper servers in the development environment. Please substitute these values with the appropriate server names for your environment.
Edit regionservers. Copy the following into the file.
PDHadoop3.corp.COMPANY.com
PDHadoop2.corp.COMPANY.com
PDHadoop1.corp.COMPANY.com
These are the settings we use in production; I opened the file on my dev cluster to verify them.
I hope that helps.
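Since the question mentions connecting from a Java application: with those values in place, a minimal client sketch looks like the following (HBase 1.x client API; host and table names are placeholders, and an hbase-site.xml on the classpath can supply the settings instead of the conf.set calls):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;

public class HBaseClientSketch {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath if present.
        Configuration conf = HBaseConfiguration.create();
        // Placeholder quorum; replace with your ZooKeeper hosts.
        conf.set("hbase.zookeeper.quorum",
                "PDHadoop1.corp.COMPANY.com,PDHadoop2.corp.COMPANY.com");
        conf.set("hbase.zookeeper.property.clientPort", "2181");

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("test_table"))) {
            // Reaching this line means the client found ZooKeeper and the cluster.
            System.out.println("Connected; got handle to " + table.getName());
        }
    }
}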
One major gotcha I've encountered is that if your /etc/hosts contains an entry for the hostname pointing to a loopback address (127.0.0.1, 127.0.1.1, et cetera), then the HBase master will incorrectly register itself in ZooKeeper with that loopback address, which will not work when your client is not on the same machine as your master.
I wasted quite a bit of time getting HBase working in the first place. The solution is to remove that entry from /etc/hosts, but this requires overriding the out-of-the-box behavior of the OS, at least on the Ubuntu box I've tested this on...
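On Ubuntu the usual offender is the 127.0.1.1 alias the installer adds for the machine's own hostname; a hedged before/after sketch (hostname and LAN address are placeholders):
# /etc/hosts as installed (problematic for a distributed HBase):
127.0.0.1   localhost
127.0.1.1   hbase-master
# adjusted: map the hostname to the LAN address instead
127.0.0.1   localhost
192.168.1.10   hbase-master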