Filebeat does not complete on close_eof + --once - filebeat

Using filebeat 7.5.2:
I'm using a Filebeat configuration with close_eof enabled, and I run Filebeat with the --once flag. I can see the harvester reaching EOF, but Filebeat keeps running instead of exiting.
Filebeat conf:
filebeat.inputs:
- type: log
  close_eof: true
  enabled: true
  paths:
    - "${LOGS_PATH}"
  scan_frequency: 1s
  fields: {
    machine: "${HOST}"
  }
output.logstash:
  hosts: ["192.168.41.6:5044"]
  bulk_max_size: 1024
  timeout: 30s
  pipelining: 1
  workers: 1
And I run it using:
filebeat run --once -v -c "PATH TO CONF..."
And some logs from the filebeat instance:
...
2020-02-04T18:30:16.950Z INFO instance/beat.go:297 Setup Beat: filebeat; Version: 7.5.2
2020-02-04T18:30:17.059Z INFO [publisher] pipeline/module.go:97 Beat name: logstash
2020-02-04T18:30:17.167Z WARN beater/filebeat.go:152 Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning.
2020-02-04T18:30:17.168Z INFO instance/beat.go:429 filebeat start running.
2020-02-04T18:30:17.168Z INFO [monitoring] log/log.go:118 Starting metrics logging every 30s
2020-02-04T18:30:17.168Z INFO registrar/migrate.go:104 No registry home found. Create: /tmp/tmp.BXJtfiaEzb/data/registry/filebeat
2020-02-04T18:30:17.179Z INFO registrar/migrate.go:112 Initialize registry meta file
2020-02-04T18:30:17.192Z INFO registrar/registrar.go:108 No registry file found under: /tmp/tmp.BXJtfiaEzb/data/registry/filebeat/data.json. Creating a new registry file.
2020-02-04T18:30:17.193Z INFO registrar/registrar.go:145 Loading registrar data from /tmp/tmp.BXJtfiaEzb/data/registry/filebeat/data.json
2020-02-04T18:30:17.193Z INFO registrar/registrar.go:152 States Loaded from registrar: 0
2020-02-04T18:30:17.193Z WARN beater/filebeat.go:368 Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning.
2020-02-04T18:30:17.193Z INFO crawler/crawler.go:72 Loading Inputs: 1
2020-02-04T18:30:17.194Z INFO log/input.go:152 Configured paths: [/tmp/tmp.BXJtfiaEzb/*.log]
2020-02-04T18:30:17.206Z INFO input/input.go:114 Starting input of type: log; ID: 13918413832820009056
2020-02-04T18:30:17.225Z INFO input/input.go:167 Stopping Input: 13918413832820009056
2020-02-04T18:30:17.225Z INFO crawler/crawler.go:106 Loading and starting Inputs completed. Enabled inputs: 1
2020-02-04T18:30:17.225Z INFO log/harvester.go:251 Harvester started for file: /tmp/tmp.BXJtfiaEzb/dcbgw-20200124080032_darkblue.log
2020-02-04T18:30:17.231Z INFO beater/filebeat.go:384 Running filebeat once. Waiting for completion ...
2020-02-04T18:30:17.231Z INFO beater/filebeat.go:386 All data collection completed. Shutting down.
2020-02-04T18:30:17.231Z INFO crawler/crawler.go:139 Stopping Crawler
2020-02-04T18:30:17.231Z INFO crawler/crawler.go:149 Stopping 1 inputs
2020-02-04T18:30:17.258Z INFO pipeline/output.go:95 Connecting to backoff(async(tcp://192.168.41.6:5044))
2020-02-04T18:30:17.296Z INFO pipeline/output.go:105 Connection to backoff(async(tcp://192.168.41.6:5044)) established
... Only metrics here ...
2020-02-04T18:35:55.686Z INFO log/harvester.go:274 End of file reached: /tmp/tmp.BXJtfiaEzb/dcbgw-20200124080032_darkblue.log. Closing because close_eof is enabled.
2020-02-04T18:35:55.686Z INFO crawler/crawler.go:165 Crawler stopped
... MORE METRICS ...
2020-02-04T18:36:26.609Z ERROR logstash/async.go:256 Failed to publish events caused by: read tcp 192.168.41.6:49662->192.168.41.6:5044: i/o timeout
2020-02-04T18:36:26.621Z ERROR logstash/async.go:256 Failed to publish events caused by: client is not connected
2020-02-04T18:36:28.520Z ERROR pipeline/output.go:121 Failed to publish events: client is not connected
2020-02-04T18:36:28.520Z INFO pipeline/output.go:95 Connecting to backoff(async(tcp://192.168.41.6:5044))
2020-02-04T18:36:28.521Z INFO pipeline/output.go:105 Connection to backoff(async(tcp://192.168.41.6:5044)) established
... MORE METRICS ...
I'm sending this output to Logstash 7.5.2 running in the same Ubuntu 18 VM. Running Logstash with log level trace does not show any errors.
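For reference, the gap between the harvester starting and close_eof firing can be worked out from the two harvester timestamps in the log above. This is only a sketch for reading the log; the helper names are made up here and are not part of Filebeat.

```python
from datetime import datetime

# Timestamps copied from the "Harvester started" and "End of file reached" log lines.
HARVESTER_STARTED = "2020-02-04T18:30:17.225Z"
EOF_REACHED = "2020-02-04T18:35:55.686Z"


def parse_ts(ts: str) -> datetime:
    """Parse a timestamp in the format used by the log lines above."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    return (parse_ts(end) - parse_ts(start)).total_seconds()


if __name__ == "__main__":
    print(f"{elapsed_seconds(HARVESTER_STARTED, EOF_REACHED):.3f} s")
```

So the harvester needed a bit more than five and a half minutes to reach EOF, long after the "All data collection completed. Shutting down." message at 18:30:17.231.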

Related

Filebeat not starting TCP server (input)

I have configured Filebeat to accept input via TCP. This is my filebeat.yml file:
filebeat.inputs:
- type: tcp
  host: ["localhost:9000"]
  max_message_size: 20MiB
For some reason Filebeat does not start the TCP server on port 9000. I verified this with Wireshark, which shows no traffic on port 9000.
This is the output of the command filebeat -e -d "*" run in a terminal:
2019-08-14T09:12:40.745-0600 INFO instance/beat.go:468 Home path: [/usr/local/Cellar/filebeat/6.2.4] Config path: [/usr/local/etc/filebeat] Data path: [/usr/local/var/lib/filebeat] Logs path: [/usr/local/var/log/filebeat]
2019-08-14T09:12:40.745-0600 DEBUG [beat] instance/beat.go:495 Beat metadata path: /usr/local/var/lib/filebeat/meta.json
2019-08-14T09:12:40.745-0600 INFO instance/beat.go:475 Beat UUID: 764da0fd-ea93-4777-b1ea-63149be0d6b6
2019-08-14T09:12:40.745-0600 INFO instance/beat.go:213 Setup Beat: filebeat; Version: 6.2.4
2019-08-14T09:12:40.745-0600 DEBUG [beat] instance/beat.go:230 Initializing output plugins
2019-08-14T09:12:40.745-0600 DEBUG [processors] processors/processor.go:49 Processors:
2019-08-14T09:12:40.745-0600 INFO pipeline/module.go:76 Beat name: Ad-MBP.domain
2019-08-14T09:12:40.745-0600 ERROR fileset/modules.go:95 Not loading modules. Module directory not found: /usr/local/Cellar/filebeat/6.2.4/module
2019-08-14T09:12:40.745-0600 INFO [monitoring] log/log.go:97 Starting metrics logging every 30s
2019-08-14T09:12:40.745-0600 INFO instance/beat.go:301 filebeat start running.
2019-08-14T09:12:40.745-0600 DEBUG [registrar] registrar/registrar.go:90 Registry file set to: /usr/local/var/lib/filebeat/registry
2019-08-14T09:12:40.746-0600 INFO registrar/registrar.go:110 Loading registrar data from /usr/local/var/lib/filebeat/registry
2019-08-14T09:12:40.746-0600 INFO registrar/registrar.go:121 States Loaded from registrar: 0
2019-08-14T09:12:40.746-0600 WARN beater/filebeat.go:261 Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning.
2019-08-14T09:12:40.746-0600 INFO crawler/crawler.go:48 Loading Prospectors: 1
2019-08-14T09:12:40.746-0600 DEBUG [registrar] registrar/registrar.go:152 Starting Registrar
2019-08-14T09:12:40.746-0600 DEBUG [cfgfile] cfgfile/reload.go:95 Checking module configs from: /usr/local/etc/filebeat/modules.d/*.yml
2019-08-14T09:12:40.746-0600 DEBUG [cfgfile] cfgfile/reload.go:109 Number of module configs found: 0
2019-08-14T09:12:40.746-0600 INFO crawler/crawler.go:82 Loading and starting Prospectors completed. Enabled prospectors: 0
2019-08-14T09:12:40.746-0600 INFO cfgfile/reload.go:127 Config reloader started
2019-08-14T09:12:40.748-0600 DEBUG [cfgfile] cfgfile/reload.go:151 Scan for new config files
2019-08-14T09:12:40.748-0600 DEBUG [cfgfile] cfgfile/reload.go:170 Number of module configs found: 0
2019-08-14T09:12:40.748-0600 INFO cfgfile/reload.go:219 Loading of config files completed.
I am not sure what I am doing wrong.
I believe Filebeat inputs are only available from Filebeat 6.3 onward; anything older used Filebeat prospectors. That would match your log, which reports "Enabled prospectors: 0".
Here is the 6.3 TCP input documentation. Nothing equivalent is available for 6.2 or older, as those versions use prospectors:
https://www.elastic.co/guide/en/beats/filebeat/6.3/filebeat-input-tcp.html
Your logs show that you are on Filebeat version 6.2.4. Could you try your configuration with 6.3+?
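If you want to check this automatically, here is a minimal sketch that pulls the version out of the "Setup Beat" log line and compares it against 6.3. The 6.3 threshold is taken from the answer above, and the function names are hypothetical.

```python
import re

# Assumption from the answer above: filebeat.inputs needs Filebeat 6.3 or newer.
MIN_VERSION = (6, 3, 0)


def beat_version(log_text: str):
    """Return (major, minor, patch) from a 'Setup Beat' log line, or None if absent."""
    match = re.search(r"Setup Beat: filebeat; Version: (\d+)\.(\d+)\.(\d+)", log_text)
    return tuple(int(part) for part in match.groups()) if match else None


def supports_inputs(log_text: str) -> bool:
    """True when the logged version is at least MIN_VERSION."""
    version = beat_version(log_text)
    return version is not None and version >= MIN_VERSION


# Log line copied from the question.
line = ("2019-08-14T09:12:40.745-0600 INFO instance/beat.go:213 "
        "Setup Beat: filebeat; Version: 6.2.4")

if __name__ == "__main__":
    print(beat_version(line), supports_inputs(line))
```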

Problem with filebeat yml file on Windows

I am quite new to the Elastic stack and trying to experiment with visualization of apache log files in Kibana. I am using filebeat to ingest the apache logs. However when I run .\filebeat.exe setup -e, I get the following error:
2019-02-05T20:53:10.515+0530 INFO elasticsearch/client.go:165 Elasticsearch url: http://localhost:9200
2019-02-05T20:53:10.520+0530 INFO elasticsearch/client.go:721 Connected to Elasticsearch version 6.6.0
2019-02-05T20:53:10.520+0530 INFO kibana/client.go:118 Kibana url: http://localhost:5601
2019-02-05T20:53:10.567+0530 WARN fileset/modules.go:388 X-Pack Machine Learning is not enabled
2019-02-05T20:53:10.572+0530 ERROR instance/beat.go:911 Exiting: 1 error: error loading config file: invalid config: yaml: line 4: did not find expected hexdecimal number
My filebeat.yml file looks like this:
filebeat.inputs:
- type: log
  enabled: true
  paths: C:\Users\bigdataadmin\Downloads\ApacheLogs\*
#============================= Filebeat modules ===============================
filebeat.config.modules:
  path: C:\Program Files\Filebeat\modules.d\*.yml
  reload.enabled: true
  reload.period: 60s
#==================== Elasticsearch template setting ==========================
setup.template.settings:
  index.number_of_shards: 3
setup.kibana:
  host: "localhost:5601"
output.elasticsearch:
  hosts: ["localhost:9200"]
# Configure processors to enhance or manipulate events generated by the beat.
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
I also checked the yml on http://www.yamllint.com/ but didn't find any problems. I can't seem to figure out what's wrong with line 4 of this file.
I am using filebeat 6.6
The paths key (on line 4) expects an array, so you need to write it as a YAML list.
Example:
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - C:\Users\bigdataadmin\Downloads\ApacheLogs\*
Be careful about the data type each key expects in config files like this. I made the same mistake when I was working with Filebeat, and it cost me a lot of time for such a small error.
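To catch this kind of type mistake early, a quick check along these lines might help. It is only a sketch: it assumes the config has already been parsed into plain Python objects (for example by a YAML library), check_paths is a made-up helper, and this is not something Filebeat itself runs.

```python
def check_paths(inputs):
    """Return a list of problems for inputs whose 'paths' is not a list of strings."""
    problems = []
    for index, item in enumerate(inputs):
        paths = item.get("paths")
        is_list_of_str = isinstance(paths, list) and all(isinstance(p, str) for p in paths)
        if not is_list_of_str:
            problems.append(
                f"input {index}: 'paths' should be a list of strings, "
                f"got {type(paths).__name__}"
            )
    return problems


# The shape from the question: 'paths' parsed as a single string.
bad_inputs = [{"type": "log", "enabled": True,
               "paths": r"C:\Users\bigdataadmin\Downloads\ApacheLogs\*"}]
# The shape from the answer: 'paths' parsed as a list.
good_inputs = [{"type": "log", "enabled": True,
                "paths": [r"C:\Users\bigdataadmin\Downloads\ApacheLogs\*"]}]

if __name__ == "__main__":
    print(check_paths(bad_inputs))
    print(check_paths(good_inputs))
```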

How to get the application ID when submitting Flink jobs to YARN from the command line interface?

My team is building a Flink-based real-time computation platform. We submit Flink jobs to YARN.
We create a Process and run the submit command through the CLI. To get the YARN application ID, we create a thread that parses the process output. The application ID is then used in other methods.
For example, we submit a job with this command:
nohup flink run -m yarn-cluster -d -yqu root.default \
  -ynm BDP_RTC_FLINK_10457_MultiOutputTestFrontEnd -yjm 1024 \
  -yn 2 -ytm 1024 -ys 2
The output is shown below:
2018-10-10 11:21:04 [info] 2018-10-10 11:21:04,629 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Submitting application master application_1536669298614_67675
2018-10-10 11:21:04 [info] 2018-10-10 11:21:04,654 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1536669298614_67675
2018-10-10 11:21:04 [info] 2018-10-10 11:21:04,656 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deploying cluster, current state ACCEPTED
2018-10-10 11:21:12 [info] 2018-10-10 11:21:12,699 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - YARN application has been deployed successfully.
2018-10-10 11:21:12 [info] 2018-10-10 11:21:12,700 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - The Flink YARN client has been started in detached mode.
We parse process output and get application id: application_1536669298614_67675.
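The parsing step described here can be sketched roughly as follows. This is not our production code, just a minimal illustration; extract_application_id is a hypothetical name, and the pattern assumes IDs always look like application_<digits>_<digits>, as in the output above.

```python
import re
from typing import Optional

# Assumed ID shape, based on the sample output: application_<digits>_<digits>
APP_ID_PATTERN = re.compile(r"\bapplication_\d+_\d+\b")


def extract_application_id(output: str) -> Optional[str]:
    """Return the first YARN application ID found in the CLI output, or None."""
    match = APP_ID_PATTERN.search(output)
    return match.group(0) if match else None


# Two lines copied from the submission output shown above.
sample_output = (
    "2018-10-10 11:21:04 [info] 2018-10-10 11:21:04,629 INFO "
    "org.apache.flink.yarn.AbstractYarnClusterDescriptor - "
    "Submitting application master application_1536669298614_67675\n"
    "2018-10-10 11:21:04 [info] 2018-10-10 11:21:04,654 INFO "
    "org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - "
    "Submitted application application_1536669298614_67675\n"
)

if __name__ == "__main__":
    print(extract_application_id(sample_output))
```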
Is there a more elegant way to get the application ID in our situation?
Maybe you can look up the relation between the YARN application and the Flink job.
First, list the YARN applications:
yarn application -list
That gives you the application list. You can then list the Flink jobs running on a given YARN application:
./bin/flink list -m yarn-cluster -yid <Yarn Application Id>
By the way, since -d already runs the client in detached mode, you can use
./bin/flink run -d
on its own, without nohup.

Hiveserver2 does not start after installing HDP 2.6.4.0-91 using cloudbreak on AWS

HiveServer2 does not start after installing HDP 2.6.4.0-91 using Cloudbreak on AWS.
I started HiveServer2 from the Ambari UI and checked the contents of /var/log/hive/hiveserver2.log.
The error log is below.
Any help would be appreciated.
Contents of hiveserver2.log
2018-03-08 04:41:53,345 WARN [main-EventThread]: server.HiveServer2 (HiveServer2.java:process(343)) - This instance of HiveServer2 has been removed from the list of server instances available for dynamic service discovery. The last client session has ended - will shutdown now.
2018-03-08 04:41:53,347 INFO [main]: zookeeper.ZooKeeper (ZooKeeper.java:close(684)) - Session: 0x16203aad5af0040 closed
2018-03-08 04:41:53,347 INFO [main]: server.HiveServer2 (HiveServer2.java:removeServerInstanceFromZooKeeper(361)) - Server instance removed from ZooKeeper.
2018-03-08 04:41:53,348 INFO [main-EventThread]: server.HiveServer2 (HiveServer2.java:stop(405)) - Shutting down HiveServer2
2018-03-08 04:41:53,348 INFO [main-EventThread]: server.HiveServer2 (HiveServer2.java:removeServerInstanceFromZooKeeper(361)) - Server instance removed from ZooKeeper.
2018-03-08 04:41:53,348 INFO [main-EventThread]: zookeeper.ClientCnxn (ClientCnxn.java:run(524)) - EventThread shut down
2018-03-08 04:41:53,348 WARN [main]: server.HiveServer2 (HiveServer2.java:startHiveServer2(508)) - Error starting HiveServer2 on attempt 1, will retry in 60 seconds
org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1520480101488_0046 failed 2 times due to AM Container for appattempt_1520480101488_0046_000002 exited with exitCode: -1000
For more detailed output, check the application tracking page: http://ip-10-0-91-7.ap-northeast-2.compute.internal:8088/cluster/app/application_1520480101488_0046 Then click on links to logs of each attempt.
Diagnostics: ExitCodeException exitCode=2: tar: Removing leading `/' from member names
tar: Skipping to next header
gzip: /hadoopfs/fs1/yarn/nodemanager/filecache/60_tmp/tmp_tez.tar.gz: invalid compressed data--format violated
tar: Exiting with failure status due to previous errors
Failing this attempt. Failing the application.
at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:699)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:218)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:116)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.startPool(TezSessionPoolManager.java:76)
at org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:488)
at org.apache.hive.service.server.HiveServer2.access$700(HiveServer2.java:87)
at org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:720)
at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:593)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
I had exactly the same issue with HDP on AWS. FYI, in my case the issue was with HDP version 2.6.4.5-2. I'm going to show how I fixed it using that version, because it is the latest at this time.
As the error log shows, the problem is that tez.tar.gz is corrupted, so YARN is unable to decompress it in the YARN container.
This tez.tar.gz file is copied from the hdfs:///hdp/apps/<hdp_version>/tez/tez.tar.gz.
To reproduce the error and confirm that this file is corrupted, you can run the following command:
sudo su
su hdfs
hdfs dfs -get /hdp/apps/2.6.4.5-2/tez/tez.tar.gz
tar -xvzf tez.tar.gz
You will get the following error:
gzip: stdin: invalid compressed data--format violated
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
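If you prefer to script this check, here is a minimal Python sketch that does roughly what tar -xvzf does above, but without extracting anything. It assumes the file has already been copied to the local file system; is_valid_targz is a name I made up.

```python
import tarfile
import zlib


def is_valid_targz(path: str) -> bool:
    """Return True if every file member of the .tar.gz at `path` can be read to the end."""
    try:
        with tarfile.open(path, mode="r:gz") as archive:
            for member in archive:
                if member.isfile():
                    stream = archive.extractfile(member)
                    # Read and discard the data; we only care that decompression succeeds.
                    while stream.read(1024 * 1024):
                        pass
        return True
    except (tarfile.TarError, OSError, EOFError, zlib.error):
        # Corrupt or truncated archives (and unreadable paths) end up here.
        return False


if __name__ == "__main__":
    # Assumes tez.tar.gz was fetched into the current directory beforehand.
    print(is_valid_targz("tez.tar.gz"))
```

A False result for the copy fetched from HDFS and True for /usr/hdp/current/tez-client/lib/tez.tar.gz would confirm that only the HDFS copy is damaged.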
The fix is pretty simple: replace the HDFS file with the copy you have on your local file system by running the following commands:
hdfs dfs -rm /hdp/apps/2.6.4.5-2/tez/tez.tar.gz
hdfs dfs -put /usr/hdp/current/tez-client/lib/tez.tar.gz /hdp/apps/2.6.4.5-2/tez/tez.tar.gz
Now restart the HiveServer2 service and you are done!
NOTE: If something similar happens with other services you can do the same thing. Please check the following link that has more details: https://community.hortonworks.com/articles/30096/foxing-broken-targz-and-jar-files-in-hdp-24.html
Hope this helps!

Apache hadoop Installation on Windows 10

While setting up a single-node cluster without Cygwin on Windows 10, I followed this document: Link for Hadoop installation in windows 10
I get the error below when starting HDFS using D:\hadoop-2.6.2.tar\hadoop-2.6.2\hadoop-2.6.2\sbin>start-dfs.cmd
Error message stack trace:
17/01/12 12:25:42 FATAL datanode.DataNode: Exception in secureMain
java.lang.RuntimeException: Error while running command to get file permissions : ExitCodeException exitCode=-1073741515:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:808)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1097)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:582)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:557)
at org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:139)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:156)
at org.apache.hadoop.hdfs.server.datanode.DataNode$DataNodeDiskChecker.checkDir(DataNode.java:2299)
at org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:2341)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2323)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2215)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2262)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2438)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2462)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:620)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:557)
at org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:139)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:156)
at org.apache.hadoop.hdfs.server.datanode.DataNode$DataNodeDiskChecker.checkDir(DataNode.java:2299)
at org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:2341)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2323)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2215)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2262)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2438)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2462)
17/01/12 12:25:42 INFO util.ExitUtil: Exiting with status 1
I also get this error message when starting the NameNode:
17/01/12 12:25:43 FATAL namenode.NameNode: Failed to start namenode.
java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:557)
at org.apache.hadoop.fs.FileUtil.canWrite(FileUtil.java:996)
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:490)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:309)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:202)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1022)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:741)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:538)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:597)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:764)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:748)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1441)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1507)
17/01/12 12:25:43 INFO util.ExitUtil: Exiting with status 1
[Problem analysis] The permissions on the /data directory are insufficient, so the NameNode cannot be started.
[Solution]
(1) As root, assign the permissions on the /data directory to the hadoop user;
(2) empty the /data directory;
(3) reformat the NameNode and restart the Hadoop cluster.
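Before and after step (1), you can check whether the user that starts the NameNode is actually able to write to the directory. A minimal sketch, to be run as that user; /data is the directory from the analysis above and is_writable_dir is a hypothetical helper name.

```python
import tempfile


def is_writable_dir(path: str) -> bool:
    """Return True if the current user can create a file inside `path`."""
    try:
        # The temporary file is removed again as soon as the block exits.
        with tempfile.TemporaryFile(dir=path):
            return True
    except OSError:
        # Missing directory, not a directory, or insufficient permissions.
        return False


if __name__ == "__main__":
    print(is_writable_dir("/data"))
```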