I am currently working with Flume and trying the basic examples, but whenever I run the Flume agent no logs are shown.
I have followed many tutorials, but they all end up with the same result.
This is the command I ran:
/usr/hadoop-env/flume-1.10.1/bin$ ./flume-ng agent -n na -c conf -f ../conf/flume-conf-netcat.properties>text.txt
na.sources = nas1
na.channels = nac1
na.sinks = nal1
na.sources.nas1.type = netcat
na.sources.nas1.bind = localhost
na.sources.nas1.port = 77777
na.channels.nac1.capacity = 100
na.channels.nac1.type = memory
na.sinks.nal1.type = logger
na.channels.nac1.capacity = 100
This is the config file.
I have tried many examples, but they all end up at this stage.
The output looks like this:
Info: Including Hadoop libraries found via (/usr/local/hadoop/bin/hadoop) for HDFS access
Info: Including Hive libraries found via () for Hive access
+ exec /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Xmx20m -cp 'conf:/usr/hadoop-env/flume-1.10.1/lib/*:/usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/*:/usr/local/hadoop/share/hadoop/common/*:/usr/local/hadoop/share/hadoop/hdfs:/usr/local/hadoop/share/hadoop/hdfs/lib/*:/usr/local/hadoop/share/hadoop/hdfs/*:/usr/local/hadoop/share/hadoop/yarn:/usr/local/hadoop/share/hadoop/yarn/lib/*:/usr/local/hadoop/share/hadoop/yarn/*:/usr/local/hadoop/share/hadoop/mapreduce/lib/*:/usr/local/hadoop/share/hadoop/mapreduce/*:/usr/local/hadoop/contrib/capacity-scheduler/*.jar:/lib/*' -Djava.library.path=:/usr/local/hadoop/lib org.apache.flume.node.Application -n na -f ../conf/flume-conf-netcat.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hadoop-env/flume-1.10.1/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Nothing further is shown, and when I tried telnet there was no output either. Can someone help me?
I'm using the Docker approach from here, and I can't find a way to get the archiver plugin working fully with the report and qa plugins.
Apparently, the archiver plugin is not executed: the archiver update tasks are launched and listed in the queue, but they are never executed and I don't see any archive created.
The celery configs in production.ini are as follows:
# Before [app:main]
[app:celery]
BROKER_URL = redis://redis:6379
RESULT_BACKEND = redis://redis:6379
BROKER_BACKEND = redis
BROKER_HOST = redis://redis/1
CELERY_RESULT_BACKEND = redis
REDIS_HOST = redis
REDIS_PORT = 6379
REDIS_DB = 0
REDIS_CONNECT_RETRY = True
and these are the configs for the archiver:
# Archiver Settings
ckanext-archiver.archive_dir=/var/lib/ckan/storage/resource_cache
ckanext-archiver.cache_url_root=http://ckan:5000/resource_cache
ckanext-archiver.max_content_length=50000000
The db schema is initialized for all 3 plugins.
Another issue: the report info on the dataset detail page is not visible.
Any suggestion on the right way to configure the archiver plugin would be appreciated.
Firstly, did you include the archiver plugin in the Docker setup? If not, follow this Docker setup as an example.
Next, the archiver can be triggered by adding a dataset with a resource, by updating a resource URL, or by using the ckanext-archiver command. Please check the archiver docs.
You can also check these sh scripts, meant to run as cron jobs, which execute the archiver and then generate the reports (a rough sketch follows below):
archiver.sh
report.sh
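As a rough sketch of what such a cron job can look like (the paster-based CLI and the config path are assumptions for an older CKAN 2.7/2.8 style install; adjust them to your container layout and CKAN version):
#!/bin/sh
# Hypothetical combined cron script approximating archiver.sh + report.sh
CONFIG=/etc/ckan/default/production.ini
# Queue archival tasks for all datasets (processed by the background workers)
paster --plugin=ckanext-archiver archiver update -c "$CONFIG"
# Regenerate the reports so they become visible in the UI
paster --plugin=ckanext-report report generate -c "$CONFIG"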
I downloaded DataStax Enterprise 6 and would like to spin up a single analytics node on Mac (El Capitan); Spark is fine, but Spark + Search would be good. I extracted the gz, configured the directory structure, and executed dse cassandra -ks. Startup seems to work just fine and I can get to the Spark master node; the problem is when I run dse spark-sql (or just dse spark). Is it possible to set up a single node for development? I constantly get the following error:
ERROR [ExecutorRunner for app-20180623083819-0000/212] 2018-06-23 08:40:28,323 SPARK-WORKER Logging.scala:91 - Error running executor
java.lang.IllegalStateException: Cannot find any build directories.
at org.apache.spark.launcher.CommandBuilderUtils.checkState(CommandBuilderUtils.java:248) ~[spark-launcher_2.11-2.2.0.14.jar:2.2.0.14]
at org.apache.spark.launcher.AbstractCommandBuilder.getScalaVersion(AbstractCommandBuilder.java:240) ~[spark-launcher_2.11-2.2.0.14.jar:2.2.0.14]
at org.apache.spark.launcher.AbstractCommandBuilder.buildClassPath(AbstractCommandBuilder.java:194) ~[spark-launcher_2.11-2.2.0.14.jar:2.2.0.14]
at org.apache.spark.launcher.AbstractCommandBuilder.buildJavaCommand(AbstractCommandBuilder.java:117) ~[spark-launcher_2.11-2.2.0.14.jar:2.2.0.14]
at org.apache.spark.launcher.WorkerCommandBuilder.buildCommand(WorkerCommandBuilder.scala:39) ~[spark-core_2.11-2.2.0.14.jar:2.2.0.14]
at org.apache.spark.launcher.WorkerCommandBuilder.buildCommand(WorkerCommandBuilder.scala:45) ~[spark-core_2.11-2.2.0.14.jar:2.2.0.14]
at org.apache.spark.deploy.worker.CommandUtils$.buildCommandSeq(CommandUtils.scala:63) ~[spark-core_2.11-2.2.0.14.jar:6.0.0]
at org.apache.spark.deploy.worker.CommandUtils$.buildProcessBuilder(CommandUtils.scala:51) ~[spark-core_2.11-2.2.0.14.jar:6.0.0]
at org.apache.spark.deploy.worker.ExecutorRunner.fetchAndRunExecutor(ExecutorRunner.scala:150) ~[spark-core_2.11-2.2.0.14.jar:6.0.0]
at org.apache.spark.deploy.worker.DseExecutorRunner$$anon$2.run(DseExecutorRunner.scala:80) [dse-spark-6.0.0.jar:6.0.0]
INFO [dispatcher-event-loop-7] 2018-06-23 08:40:28,323 SPARK-WORKER Logging.scala:54 - Executor app-20180623083819-0000/212 finished with state FAILED message java.lang.IllegalStateException: Cannot find any build directories.
INFO [dispatcher-event-loop-7] 2018-06-23 08:40:28,324 SPARK-MASTER Logging.scala:54 - Removing executor app-20180623083819-0000/212 because it is FAILED
INFO [dispatcher-event-loop-0] 2018-06-23 08:40:30,288 SPARK-MASTER Logging.scala:54 - Received unregister request from application app-20180623083819-0000
INFO [dispatcher-event-loop-0] 2018-06-23 08:40:30,292 SPARK-MASTER Logging.scala:54 - Removing app app-20180623083819-0000
INFO [dispatcher-event-loop-0] 2018-06-23 08:40:30,295 SPARK-MASTER CassandraPersistenceEngine.scala:50 - Removing existing object
Check dse.yaml for directories under /var/lib/... - you should have write access to them. For example, check that the DSEFS directories, the AlwaysOn SQL directories, etc. are correctly configured.
But in reality, the issue should be fixed by starting DSE as dse cassandra -s -k instead of the combined -ks.
P.S. I'm using the following script to point logs, etc. to specific directories:
export CASSANDRA_HOME=WHERE_YOU_EXTRACTED
export DATA_BASE_DIR=SOME_DIRECTORY
export DSE_DATA=${DATA_BASE_DIR}/data
export DSE_LOGS=${DATA_BASE_DIR}/logs
# set up where you want log so you don’t have to mess with logback.xml files
export CASSANDRA_LOG_DIR=$DSE_LOGS/cassandra
mkdir -p $CASSANDRA_LOG_DIR
# so we don’t have to play with dse-spark-env.sh
export SPARK_WORKER_DIR=$DSE_DATA/spark/worker
# new setting in 6.0, in older versions set SPARK_LOCAL_DIRS
export SPARK_EXECUTOR_DIRS=$DSE_DATA/spark/rdd
export SPARK_LOCAL_DIRS=$DSE_DATA/spark/rdd
mkdir -p $SPARK_LOCAL_DIRS
export SPARK_WORKER_LOG_DIR=$DSE_DATA/spark/worker/
export SPARK_MASTER_LOG_DIR=$DSE_DATA/spark/master
# if you want to run the always on sql server
export ALWAYSON_SQL_LOG_DIR=$DSE_DATA/spark/alwayson_sql_server
export ALWAYSON_SQL_SERVER_AUDIT_LOG_DIR=$DSE_DATA/spark/alwayson_sql_server
# so tomcat logs for solr goes to a place we know
export TOMCAT_LOGS=$DSE_LOGS/tomcat
PATH=${CASSANDRA_HOME}/bin:${CASSANDRA_HOME}/resources/cassandra/tools/bin:$PATH
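As a usage sketch (the file name dse-env.sh is a placeholder, not from the original answer), you would source the script in the shell you start DSE from:
# Load the environment overrides, then start DSE with Search (-s) and Spark (-k) enabled
source ./dse-env.sh
dse cassandra -s -k
# Once the node is up, the Spark SQL shell should be able to find its work/RDD directories
dse spark-sql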
I am trying to stream real-time Twitter data into HDFS using Apache Flume. When I run the command ./flume-ng agent -c /usr/local/apache-flume-1.4.0-bin/conf/ -f /usr/local/apache-flume-1.4.0-bin/conf/flume.conf -n TwitterAgent it gives me an SLF4J exception (error screenshot here) and does not allow the program to run further. Any suggestions to solve this issue would be of immense help.
I am trying to start hiveserver2 by going to the bin folder of my Hive install and typing hiveserver2. However, nothing happens: it just hangs there, and when I check whether anything is running on the Hive ports (the interface on 10002, for example) there is nothing, nor anything in netstat.
Initially I had errors about SLF4J:
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
But I solved that by moving /usr/lib/hive/lib/log4j-slf4j-impl-2.4.1.jar into the /tmp directory.
Now when I run bin/hiveserver2 it hangs for quite a while with no output, and then just returns to the command line - the Hive server isn't started. I'm struggling to find any logs either.
There is a chance that HiveServer2 is running without showing any information. Try this answer, which starts it with an argument flag that shows the running log.
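For instance (a sketch assuming a standard Apache Hive install; not taken from the linked answer), forcing the root logger onto the console usually shows why the server hangs or exits:
# Start HiveServer2 with its log output going to the console
$HIVE_HOME/bin/hiveserver2 --hiveconf hive.root.logger=INFO,console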
I am running the Hive command below from Beeline. Can someone please tell me where I can see the MapReduce logs for it?
0: jdbc:hive2://<servername>:10003/> select a.offr_id offerID , a.offr_nm offerNm , b.disp_strt_ts dispStartDt , b.disp_end_ts dispEndDt , vld_strt_ts validStartDt, vld_end_ts validEndDt from gcor_offr a, gcor_offr_dur b where a.offr_id = b.offr_id and b.disp_end_ts > '2016-09-13 00:00:00';
When using beeline, MapReduce logs are part of HiveServer2 log4j logs.
If your Hive install was configured by Cloudera Manager (CM), then they will typically be in /var/log/hive/hadoop-cmf-HIVE-1-HIVESERVER2-*.out on the node where HiveServer2 is running (which may or may not be the same node you are running beeline from).
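For example, while the query runs in beeline you can follow that log on the HiveServer2 host (the path is taken from above; the exact file name varies per install):
# Run this on the HiveServer2 node, not necessarily the beeline client node
tail -f /var/log/hive/hadoop-cmf-HIVE-1-HIVESERVER2-*.out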
A few other scenarios:
Your Hive install was not configured by CM? You will need to manually create the log4j config file:
Create a hive-log4j.properties config file in the directory specified by the HIVE_CONF_DIR environment variable. (This makes it accessible to the HiveServer2 JVM classpath.)
In this file, the log location is specified by log.dir and log.file. See conf/hive-log4j.properties.template in your distribution for an example template for this file (a sketch of the relevant lines follows after this list).
You run beeline in "embedded HS2 mode" (i.e. beeline -u jdbc:hive2:// user password)?
You will need to customize the beeline log4j (as opposed to the HiveServer2 log4j).
The beeline log4j properties file is strictly called beeline-log4j2.properties (in versions prior to Hive 2.0, it is called beeline-log4j.properties). It needs to be created and made accessible to the beeline JVM classpath via HIVE_CONF_DIR. See HIVE-10502 and HIVE-12020 for further discussion.
You want to customize which HiveServer2 logs get printed on the beeline stdout?
This can be configured at the HiveServer2 level using the hive.server2.logging.operation.enabled and hive.server2.logging.operation.level configs.
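Referring back to the non-CM scenario above, here is a sketch of the relevant steps (property names follow the stock template; the paths are placeholders):
# Copy the template shipped with Hive into HIVE_CONF_DIR, then edit it
cp $HIVE_HOME/conf/hive-log4j.properties.template $HIVE_CONF_DIR/hive-log4j.properties
# Inside the copied file, the log location is controlled by lines like:
#   hive.log.dir=/var/log/hive
#   hive.log.file=hiveserver2.log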
Hive uses log4j for logging. These logs are not emitted to the standard output by default but are instead written to a log file specified by Hive's log4j properties file. By default, Hive uses hive-log4j.default in the conf/ directory of the Hive installation, which writes logs to /tmp/<userid>/hive.log and uses the WARN level.
It is often desirable to emit the logs to the standard output and/or change the logging level for debugging purposes. These can be done from the command line as follows:
$HIVE_HOME/bin/hive --hiveconf hive.root.logger=INFO,console
On Hive 2.0 and later, logging is asynchronous by default; it can be disabled with:
set hive.async.log.enabled=false
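For example (a sketch assuming Hive 2.x, where asynchronous logging is the default), both settings can also be passed together at startup:
# Debug-level console logging with asynchronous logging turned off
$HIVE_HOME/bin/hive --hiveconf hive.root.logger=DEBUG,console --hiveconf hive.async.log.enabled=false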