RabbitMQ Crash Explanation - rabbitmq

Our RabbitMQ service crashed twice with the following report in the $RABBITMQ_NODENAME-sasl.log:
=CRASH REPORT==== 7-Jun-2016::14:37:25 ===
crasher:
initial call: gen:init_it/6
pid: <0.223.0>
registered_name: []
exception exit: {{badmatch,
{[{msg_location,
<<162,171,39,113,226,229,228,92,227,253,48,186,
45,48,29,98>>,
1,357,0,583},
******************
16000 similar msg_location lines snipped
******************
1795219}},
[{rabbit_msg_store,combine_files,3,[]},
{rabbit_msg_store_gc,attempt_action,3,[]},
{rabbit_msg_store_gc,handle_cast,2,[]},
{gen_server2,handle_msg,2,[]},
{proc_lib,wake_up,3,
[{file,"proc_lib.erl"},{line,250}]}]}
in function gen_server2:terminate/3
ancestors: [msg_store_persistent,rabbit_sup,<0.159.0>]
messages: [{'$gen_cast',{combine,394,380}}]
links: [#Port<0.86370>,<0.218.0>,#Port<0.86369>]
dictionary: [{{"/var/lib/rabbitmq/mnesia/$RABBITMQ_NODENAME/msg_store_persistent/357.rdq",
fhc_file},
{file,1,false}},
{{"/var/lib/rabbitmq/mnesia/$RABBITMQ_NODENAME/msg_store_persistent/340.rdq",
fhc_file},
{file,1,true}},
{fhc_age_tree,{2,
{{1465,346244,764691},
#Ref<0.0.3145729.257998>,nil,
{{1465,346244,891543},
#Ref<0.0.3145729.258001>,nil,nil}}}},
{{#Ref<0.0.3145729.257998>,fhc_handle},
{handle,{file_descriptor,prim_file,{#Port<0.86369>,59}},
0,false,0,1048576,[],false,
"/var/lib/rabbitmq/mnesia/$RABBITMQ_NODENAME/msg_store_persistent/357.rdq",
[raw,binary,read_ahead,read],
[{write_buffer,1048576}],
false,true,
{1465,346244,764691}}},
{{#Ref<0.0.3145729.258001>,fhc_handle},
{handle,{file_descriptor,prim_file,{#Port<0.86370>,64}},
14212552,false,0,1048576,[],false,
"/var/lib/rabbitmq/mnesia/$RABBITMQ_NODENAME/msg_store_persistent/340.rdq",
[raw,binary,read_ahead,read,write],
[{write_buffer,1048576}],
true,true,
{1465,346244,891543}}}]
trap_exit: false
status: running
heap_size: 121536
stack_size: 27
reductions: 835024
neighbours:
We'd like to understand what this crash report means. Does it signify a bad message, RMQ can't find a message, or something completely different? We're using RabbitMQ 3.1.5 with Erlang 18, and while we know we're using an old version, we want to first know what's causing the crash before dedicating resources to an upgrade.

This message means that RabbitMQ message storage process has failed to combine files during garbage collection on message store. This can in theory cause message loss.
Note that 3.1.5 is not supported and has not been tested with OTP 10. This issue can be already fixed in newer versions though.

Related

Issues while running with BigQueryIO.Write.Method.STORAGE_WRITE_API

We are testing with STORAGE_WRITE_API to insert data into BigQuery. We've seen several errors/warnings in our Dataflow pipeline(written in Java). It might work well in the beginning, but eventually the system lag would be increasing, it would stop processing any data from PubSub and the unacked messages piled up.
One common warning is:
Operation ongoing in step insertTableRowsToBigQuery/StorageApiLoads/StorageApiWriteSharded/Write Records for at least 03h35m00s without outputting or completing in state process
at java.base#11.0.9/jdk.internal.misc.Unsafe.park(Native Method)
at java.base#11.0.9/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
at java.base#11.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
at java.base#11.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1039)
at java.base#11.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1345)
at java.base#11.0.9/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:232)
at app//org.apache.beam.sdk.io.gcp.bigquery.RetryManager$Callback.await(RetryManager.java:153)
at app//org.apache.beam.sdk.io.gcp.bigquery.RetryManager$Operation.await(RetryManager.java:136)
at app//org.apache.beam.sdk.io.gcp.bigquery.RetryManager.await(RetryManager.java:256)
at app//org.apache.beam.sdk.io.gcp.bigquery.RetryManager.run(RetryManager.java:248)
at app//org.apache.beam.sdk.io.gcp.bigquery.StorageApiWritesShardedRecords$WriteRecordsDoFn.process(StorageApiWritesShardedRecords.java:453)
at app//org.apache.beam.sdk.io.gcp.bigquery.StorageApiWritesShardedRecords$WriteRecordsDoFn$DoFnInvoker.invokeProcessElement(Unknown Source)
Other exceptions we've seen:
Got error io.grpc.StatusRuntimeException: FAILED_PRECONDITION: Stream is closed
Got error io.grpc.StatusRuntimeException: ALREADY_EXIST
PodSandboxStatus of sandbox "..." for pod "df-...-pipeline-...-harness-qw4j_default(...)" error: rpc error: code = Unknown desc = Error: No such container
Code sample:
toBq.apply("insertTableRowsToBigQuery",
BigQueryIO
.writeTableRows()
.to(String.format("%s:%s.%s", PROJECT_ID, DATASET, table))
.withTriggeringFrequency(Duration.standardSeconds(options.getTriggeringFrequency()))
.withNumStorageWriteApiStreams(options.getNumStorageWriteApiStreams())
.withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
There was a production issue related to connection being stuck after streaming 10MB which has been fixed. If you try again, it should work.

Reactor Kafka health check in a Spring webflux app

I have a Reactor Kafka application that consumes messages from a topic indefinitely. I need to expose a health check REST endpoint that can indicate the health of this process - Essentially interested in knowing if the Kafka receiver flux sequence has terminated so that some action can be taken to start it. Is there a way to know the current status of a flux (completed/terminated etc)? The application is Spring Webflux + Reactor Kafka.
Edit 1 - doOnTerminate/doFinally do not execute
Flux.range(1, 5)
.flatMap(record -> Mono.just(record)
.map(i -> {
throw new OutOfMemoryError("Forcing exception for " + i);
})
.doOnNext(i -> System.out.println("doOnNext: " + i))
.doOnError(e -> System.err.println(e))
.onErrorResume(e -> Mono.empty()))
.doFinally(signalType -> System.err.println("doFinally: Terminating with Signal type: " + signalType))
.doOnTerminate(()-> System.err.println("doOnTerminate: executed"))
.subscribe();
"C:\Program Files\Java\jdk1.8.0_211\bin\java.exe" "-javaagent:C:\Program Files\JetBrains\IntelliJ IDEA Community Edition 2019.2.4\lib\idea_rt.jar=52295:C:\Program Files\JetBrains\IntelliJ IDEA Community Edition 2019.2.4\bin" -Dfile.encoding=UTF-8 -classpath "C:\Program Files\Java\jdk1.8.0_211\jre\lib\charsets.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\deploy.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\access-bridge-64.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\cldrdata.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\dnsns.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\jaccess.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\jfxrt.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\localedata.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\nashorn.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\sunec.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\sunjce_provider.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\sunmscapi.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\sunpkcs11.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\zipfs.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\javaws.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\jce.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\jfr.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\jfxswt.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\jsse.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\management-agent.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\plugin.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\resources.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\rt.jar;C:\Users\akoul680\intellij-workspace\basics\target\classes;C:\Users\akoul680\.m2\repository\com\zaxxer\HikariCP\3.4.1\HikariCP-3.4.1.jar;C:\Users\akoul680\.m2\repository\org\apache\kafka\kafka-clients\2.2.0\kafka-clients-2.2.0.jar;C:\Users\akoul680\.m2\repository\com\github\luben\zstd-jni\1.3.8-1\zstd-jni-1.3.8-1.jar;C:\Users\akoul680\.m2\repository\org\lz4\lz4-java\1.5.0\lz4-java-1.5.0.jar;C:\Users\akoul680\.m2\repository\org\xerial\snappy\snappy-java\1.1.7.2\snappy-java-1.1.7.2.jar;C:\Users\akoul680\.m2\repository\org\apache\avro\avro\1.9.0\avro-1.9.0.jar;C:\Users\akoul680\.m2\repository\com\fasterxml\jackson\core\jackson-core\2.9.8\jackson-core-2.9.8.jar;C:\Users\akoul680\.m2\repository\com\fasterxml\jackson\core\jackson-databind\2.9.8\jackson-databind-2.9.8.jar;C:\Users\akoul680\.m2\repository\com\fasterxml\jackson\core\jackson-annotations\2.9.0\jackson-annotations-2.9.0.jar;C:\Users\akoul680\.m2\repository\org\apache\commons\commons-compress\1.18\commons-compress-1.18.jar;C:\Users\akoul680\.m2\repository\com\codahale\metrics\metrics-core\3.0.2\metrics-core-3.0.2.jar;C:\Users\akoul680\.m2\repository\org\junit\jupiter\junit-jupiter-api\5.3.2\junit-jupiter-api-5.3.2.jar;C:\Users\akoul680\.m2\repository\org\apiguardian\apiguardian-api\1.0.0\apiguardian-api-1.0.0.jar;C:\Users\akoul680\.m2\repository\org\opentest4j\opentest4j\1.1.1\opentest4j-1.1.1.jar;C:\Users\akoul680\.m2\repository\org\junit\platform\junit-platform-commons\1.3.2\junit-platform-commons-1.3.2.jar;C:\Users\akoul680\.m2\repository\org\slf4j\slf4j-api\1.7.26\slf4j-api-1.7.26.jar;C:\Users\akoul680\.m2\repository\ch\qos\logback\logback-core\1.2.3\logback-core-1.2.3.jar;C:\Users\akoul680\.m2\repository\ch\qos\logback\logback-classic\1.2.3\logback-classic-1.2.3.jar;C:\Users\akoul680\.m2\repository\io\projectreactor\reactor-core\3.4.10\reactor-core-3.4.10.jar;C:\Users\akoul680\.m2\repository\org\reactivestreams\reactive-streams\1.0.3\reactive-streams-1.0.3.jar;C:\Users\akoul680\.m2\repository\io\projectreactor\reactor-test\3.4.10\reactor-test-3.4.10.jar;C:\Users\akoul680\.m2\repository\commons-net\commons-net\3.6\commons-net-3.6.jar;C:\Users\akoul680\.m2\repository\com\box\box-java-sdk\2.32.0\box-java-sdk-2.32.0.jar;C:\Users\akoul680\.m2\repository\com\eclipsesource\minimal-json\minimal-json\0.9.1\minimal-json-0.9.1.jar;C:\Users\akoul680\.m2\repository\org\bitbucket\b_c\jose4j\0.4.4\jose4j-0.4.4.jar;C:\Users\akoul680\.m2\repository\org\bouncycastle\bcprov-jdk15on\1.52\bcprov-jdk15on-1.52.jar;C:\Users\akoul680\.m2\repository\com\jcraft\jsch\0.1.55\jsch-0.1.55.jar;C:\Users\akoul680\.m2\repository\org\apache\commons\commons-vfs2\2.4\commons-vfs2-2.4.jar;C:\Users\akoul680\.m2\repository\commons-logging\commons-logging\1.2\commons-logging-1.2.jar;C:\Users\akoul680\.m2\repository\org\bouncycastle\bcpkix-jdk15on\1.52\bcpkix-jdk15on-1.52.jar;C:\Users\akoul680\intellij-workspace\basics\lib\db2jcc4.jar" lrn.chapter14.ErrorHandling
2021-10-12T09:53:34,344 main r.util.Loggers - Using Slf4j logging framework
Exception in thread "main" java.lang.OutOfMemoryError: Forcing exception for 1
at lrn.chapter14.ErrorHandling.lambda$null$0(ErrorHandling.java:19)
at reactor.core.publisher.FluxMapFuseable$MapFuseableConditionalSubscriber.onNext(FluxMapFuseable.java:281)
at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2398)
at reactor.core.publisher.FluxMapFuseable$MapFuseableConditionalSubscriber.request(FluxMapFuseable.java:354)
at reactor.core.publisher.FluxPeekFuseable$PeekFuseableConditionalSubscriber.request(FluxPeekFuseable.java:437)
at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.request(MonoPeekTerminal.java:139)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.set(Operators.java:2194)
at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onSubscribe(FluxOnErrorResume.java:74)
at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.onSubscribe(MonoPeekTerminal.java:152)
at reactor.core.publisher.FluxPeekFuseable$PeekFuseableConditionalSubscriber.onSubscribe(FluxPeekFuseable.java:471)
at reactor.core.publisher.FluxMapFuseable$MapFuseableConditionalSubscriber.onSubscribe(FluxMapFuseable.java:263)
at reactor.core.publisher.MonoJust.subscribe(MonoJust.java:55)
at reactor.core.publisher.Mono.subscribe(Mono.java:4361)
at reactor.core.publisher.FluxFlatMap$FlatMapMain.onNext(FluxFlatMap.java:426)
at reactor.core.publisher.FluxRange$RangeSubscription.slowPath(FluxRange.java:156)
at reactor.core.publisher.FluxRange$RangeSubscription.request(FluxRange.java:111)
at reactor.core.publisher.FluxFlatMap$FlatMapMain.onSubscribe(FluxFlatMap.java:371)
at reactor.core.publisher.FluxRange.subscribe(FluxRange.java:69)
at reactor.core.publisher.Flux.subscribe(Flux.java:8468)
at reactor.core.publisher.Flux.subscribeWith(Flux.java:8641)
at reactor.core.publisher.Flux.subscribe(Flux.java:8438)
at reactor.core.publisher.Flux.subscribe(Flux.java:8362)
at reactor.core.publisher.Flux.subscribe(Flux.java:8280)
at lrn.chapter14.ErrorHandling.ex5(ErrorHandling.java:26)
at lrn.chapter14.ErrorHandling.main(ErrorHandling.java:12)
Process finished with exit code 1
You can't query the flux itself, but you can tell it to do something if it ever stops.
In the service that contains your Kafka listener, I'd recommend adding a terminated (or similar) boolean flag that's false by default. You can then ensure that the last operator in your flux is:
.doOnTerminate(() -> terminated = true)
...and then get the healthcheck endpoint to monitor that value, marking the container as unhealthy if that flag is ever true.
doOnTerminate() is more reliable than doOnError() in this use-case, as it executes whether the publisher has terminated either with an error, or a completion signal. As per the comment though, this isn't completely reliable - if your publisher terminates due to a JVM error or similar, that doOnTerminate() operator won't be run.
In my experience, if this happens it's usually due to an OutOfMemoryError, in which case the -XX:+ExitOnOutOfMemoryError is a good VM option to use (the immediate exit can then trigger an immediate restart policy, without waiting for a healthcheck endpoint to be called and trigger the restart after a while.)
Bear in mind there are other fatal JVM errors that wouldn't get caught by the above process though, so that's still not 100% reliable.

How to handle reactor-netty terminating a request in spring-webflux?

We have a spring-webflux application running on spring-boot-starter-webflux:jar:2.1.5.RELEASE and reactor-netty:jar:0.8.8.RELEASE. When a reactive client goes away (k8s pod killed or the client subscription is disposed off) before the request is completed, we see the server simply stops processing the request and no more application logs related to that request are seen. However, reactive-netty prints a trace log indicating that the channel is inactive and will be terminated.
Is there anything we can do to handle this termination gracefully? We would ideally like to respond to a cancellation signal from reactor.
2020-03-18T16:46:40.372Z TRACE --- [reactor-http-epoll-2] r.n.c.ChannelOperations : [id: 0x4cad974b, L:/<SERVER_IP_ADDRESS>:8080 ! R:/<SOME_OTHER_IP_ADDRESS>:55436] Disposing ChannelOperation from a channel
java.lang.Exception: ChannelOperation terminal stack
at reactor.netty.channel.ChannelOperations.terminate(ChannelOperations.java:391)
at reactor.netty.channel.ChannelOperations.onInboundClose(ChannelOperations.java:360)
at reactor.netty.channel.ChannelOperationsHandler.channelInactive(ChannelOperationsHandler.java:72)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:236)
at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelInactive(CombinedChannelDuplexHandler.java:420)
at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:393)
at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:358)
at io.netty.channel.CombinedChannelDuplexHandler.channelInactive(CombinedChannelDuplexHandler.java:223)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:236)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1416)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243)
at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:912)
at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:816)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:416)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:331)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.base/java.lang.Thread.run(Thread.java:834)
The doOnCancel operator allows for a callback to be provided when a subscription is cancelled. This is different from the doOnEach operator which is run only when the subscription completes or errors out.
https://projectreactor.io/docs/core/release/api/reactor/core/publisher/Mono.html#doOnCancel-java.lang.Runnable-

Can anyone help to understand the following errors where RabbitMQ is going in error state?

I am getting this error again and again.
"operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
For the above error I have already found something from this https://bugzilla.redhat.com/show_bug.cgi?id=1343027 i.e
Rabbit can join the rabbitmq cluster if the controller-0 was rebooted,came up,started all the resources and only when everything works controller-1 goes for the reboot. In other words everything should work when rebooting one of the controllers. If,for some reason, controller-1 reboots while controller-0 not fully recovered after its reboot - things go wrong.
But I am not sure why is the error log file also showing me the below error:
=ERROR REPORT==== 29-Dec-2019::17:44:26 === Mnesia('messaging#rabbit-2'): ** ERROR ** (ignoring core) ** FATAL ** mnesia_monitor crashed: {badarg, [{ets, lookup, [mnesia_decision, 'messaging#rabbit-3'], []}, {mnesia_recover, has_mnesia_down, 1, [{file, "mnesia_recover.erl"}, {line, 299}]}, {mnesia_monitor, check_mnesia_down, 2, [{file, "mnesia_monitor.erl"}, {line, 862}]}, {mnesia_monitor, handle_info, 2, [{file, "mnesia_monitor.erl"}, {line, 579}]}, {gen_server, try_dispatch, 4, [{file, "gen_server.erl"}, {line, 615}]}, {gen_server, handle_msg, 5, [{file, "gen_server.erl"}, {line, 681}]}, {proc_lib, init_p_do_apply, 3, [{file, "proc_lib.erl"}, {line, 240}]}]} state: {state, <0.745.0>, [], [], true, [], undefined, [], []}
The error message says one system process of the Mnesia DB, mnesia_monitor is crashing when it tries to look up a value from an ETS table (mnesia_decision) owned by an other system process of the DB, mnesia_recover. This can only happen if the ETS table no longer exists, that is if the mnesia_recover has stopped.
This error message doesn't say why mnesia_recover has stopped. If it has crashed, there should be an other error message about that event in the log. But it is also possible that the whole Mnesia application has been stopping at that time, because the supervisor would stop mnesia_recover before mnesia_monitor. If that's the case, this error is just caused by bad timing: mnesia_monitor sees the messaging#rabbit-3 node coming up at a point when Mnesia on its node is already shutting down.

php7.0.2 Program terminated with signal 11, Segmentation fault

I am running php-7.0.2 with codeigniter (a php mvc frame). I got some segmentation faults which caused core dumps. And, I found that these segmentation faults randomly occurred when the child php-fpm processes shutdown and restart. I don't know why.
Using gdb "bt" to display the core dump:
Core was generated by `php-fpm: pool www '.
Program terminated with signal 11, Segmentation fault.
\#0 zend_string_release (ht=0x114dae0) at /home/smt/phpng/php-7.0.2/Zend/zend_string.h:269
269 /home/smt/phpng/php-7.0.2/Zend/zend_string.h: No such file or directory.
in /home/smt/phpng/php-7.0.2/Zend/zend_string.h
Missing separate debuginfos, use: debuginfo-install php7-7.0.2-20160407105024.x86_64
(gdb) bt
\#0 zend_string_release (ht=0x114dae0) at /home/smt/phpng/php-7.0.2/Zend/zend_string.h:269
\#1 zend_hash_destroy (ht=0x114dae0) at /home/smt/phpng/php-7.0.2/Zend/zend_hash.c:1273
\#2 0x000000000080647b in module_destructor (module=0x14b6ae0)
at /home/smt/phpng/php-7.0.2/Zend/zend_API.c:2509
\#3 0x000000000080075c in module_destructor_zval (zv=<value optimized out>)
at /home/smt/phpng/php-7.0.2/Zend/zend.c:615
\#4 0x000000000080dcff in _zend_hash_del_el_ex (ht=0x1154780)
at /home/smt/phpng/php-7.0.2/Zend/zend_hash.c:1013
\#5 _zend_hash_del_el (ht=0x1154780) at /home/smt/phpng/php-7.0.2/Zend/zend_hash.c:1037
\#6 zend_hash_graceful_reverse_destroy (ht=0x1154780) at /home/smt/phpng/php-7.0.2/Zend/zend_hash.c:1489
\#7 0x0000000000800096 in zend_shutdown () at /home/smt/phpng/php-7.0.2/Zend/zend.c:840
\#8 0x00000000007a2a6a in php_module_shutdown () at /home/smt/phpng/php-7.0.2/main/main.c:2339
\#9 0x000000000089e45d in main (argc=<value optimized out>, argv=<value optimized out>)
at /home/smt/phpng/php-7.0.2/sapi/fpm/fpm/fpm_main.c:1997
(gdb) quit
The php-fpm.log is as following:
[20-Apr-2016 08:00:02] WARNING: [pool www] child 11751 exited on signal 11 (SIGSEGV - core dumped) after 3600.462022 seconds from start
I am very curious about this bug.
Until now, I am sure that the core dumps occurred when the fpm restarted. The restarts were caused by the command 'kill -10 fpm-master-process-ids'. Or, the fpm also restarted when it had processed 'pm.max_requests' requests.
However, the core dumps didn't occur at every restart and the probability of core dumps was very small. I cannot find the role.
Fortunately, I have installed the 7.0.5 version to replace the 7.0.2 version in our production environment and it had run for three days without core dumps.
I cannot find any modification in the changelogs from 7.0.2 to 7.0.5. This is exactly a very strange bug and I want to know the reason. who can tell me something about this bug?
After updating to 7.0.5, core dump has not occurred for 2 weeks. So, this bug has been fixed in 7.0.5!
I still don't know what case this bug.
I am a curious cat. #_#