hang in google/protobuf/pyext/_message.so at exit - tensorflow

This is TensorFlow 1.0.1 installed via pip.
It runs via an embedded CPython (libpython).
Sometimes (maybe 30% of my runs) it hangs in Py_Finalize(), and I see this backtrace:
/work/asr2/zeyer/sprint-executables/20160902.235443.fad8965.linux-x86_64-standard/Flf/flf-tool.linux-intel-standard(_ZN17AssertionsPrivate15safe_stackTraceEi+0x21)[0xc5b891]
/work/asr2/zeyer/sprint-executables/20160902.235443.fad8965.linux-x86_64-standard/Flf/flf-tool.linux-intel-standard[0xc5b8ef]
/u/zeyer/tools/glibc217/libpthread.so.0(+0x113d0)[0x2b6d89bad3d0]
/u/zeyer/tools/glibc217/libpthread.so.0(raise+0x29)[0x2b6d89bad2a9]
/u/zeyer/py-envs/py2-ubuntu16/local/lib/python2.7/site-packages/faulthandler.so(+0x3198)[0x2b6dc2372198]
/u/zeyer/tools/glibc217/libpthread.so.0(+0x113d0)[0x2b6d89bad3d0]
/u/zeyer/py-envs/py2-ubuntu16/local/lib/python2.7/site-packages/google/protobuf/pyext/_message.so(+0xaa943)[0x2b6dc14f0943]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(+0x160f6b)[0x2b6d8b23af6b]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(+0xc8f0e)[0x2b6d8b1a2f0e]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(+0x15d747)[0x2b6d8b237747]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyDict_SetItem+0x7b)[0x2b6d8b23becb]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(_PyModule_Clear+0xb5)[0x2b6d8b278565]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyImport_Cleanup+0x437)[0x2b6d8b2280e7]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(Py_Finalize+0xfe)[0x2b6d8b1fed9e]
/work/asr2/zeyer/sprint-executables/20160902.235443.fad8965.linux-x86_64-standard/Flf/flf-tool.linux-intel-standard(_ZN6Python11Initializer19AtExitUninitHandlerEv+0x2e)[0xff80de]
/u/zeyer/tools/glibc217/libc.so.6(+0x39fe8)[0x2b6d8bc39fe8]
/u/zeyer/tools/glibc217/libc.so.6(+0x3a035)[0x2b6d8bc3a035]
/u/zeyer/tools/glibc217/libc.so.6(__libc_start_main+0xf7)[0x2b6d8bc20837]
/work/asr2/zeyer/sprint-executables/20160902.235443.fad8965.linux-x86_64-standard/Flf/flf-tool.linux-intel-standard[0x7d6991]
or with GDB:
(gdb) bt full
#0 0x00002b6dc14f0943 in std::tr1::_Hashtable<google::protobuf::DescriptorPool const*, std::pair<google::protobuf::DescriptorPool const* const, google::protobuf::python::PyDescriptorPool*>, std::allocator<std::pair<google::protobuf::DescriptorPool const* const, google::protobuf::python::PyDescriptorPool*> >, std::_Select1st<std::pair<google::protobuf::DescriptorPool const* const, google::protobuf::python::PyDescriptorPool*> >, std::equal_to<google::protobuf::DescriptorPool const*>, google::protobuf::hash<google::protobuf::DescriptorPool const*>, std::tr1::__detail::_Mod_range_hashing, std::tr1::__detail::_Default_ranged_hash, std::tr1::__detail::_Prime_rehash_policy, false, false, true>::erase (
__k=@0x7ffd1bbea740: 0x8269780, this=0x2b6dc1826e40 <google::protobuf::python::descriptor_pool_map>)
at /opt/rh/devtoolset-2/root/usr/include/c++/4.8.2/tr1/hashtable.h:1041
__slot = <optimized out>
__saved_slot = <optimized out>
__code = 136746880
__n = 0
__result = 0
#1 google::protobuf::python::cdescriptor_pool::Dealloc (self=0x2b6dc0d86880)
at google/protobuf/pyext/descriptor_pool.cc:152
No locals.
#2 0x00002b6d8b23af6b in ?? () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
No symbol table info available.
#3 0x00002b6d8b1a2f0e in ?? () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
No symbol table info available.
#4 0x00002b6d8b237747 in ?? () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
No symbol table info available.
#5 0x00002b6d8b23becb in PyDict_SetItem () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
No symbol table info available.
#6 0x00002b6d8b278565 in _PyModule_Clear () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
No symbol table info available.
#7 0x00002b6d8b2280e7 in PyImport_Cleanup () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
No symbol table info available.
#8 0x00002b6d8b1fed9e in Py_Finalize () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
No symbol table info available.
#9 0x0000000000ff80de in Python::Initializer::AtExitUninitHandler() ()
No symbol table info available.
#10 0x00002b6d8bc39fe8 in ?? () from /u/zeyer/tools/glibc217/libc.so.6
No symbol table info available.
#11 0x00002b6d8bc3a035 in exit () from /u/zeyer/tools/glibc217/libc.so.6
No symbol table info available.
#12 0x00002b6d8bc20837 in __libc_start_main () from /u/zeyer/tools/glibc217/libc.so.6
No symbol table info available.
#13 0x00000000007d6991 in _start ()
No symbol table info available.
That is, the hang occurs during _PyModule_Clear, inside google/protobuf/pyext/_message.so, which is why I think it is related to TensorFlow.
In the case when it does not hang, I see this output:
Exception AttributeError: AttributeError("'NoneType' object has no attribute 'raise_exception_on_not_ok_status'",) in <bound method Session.__del__ of <tensorflow.python.client.session.Session object at 0x2afd625b12d0>> ignored
I also asked upstream at TensorFlow, but they suggested posting it here.
Any idea why it might hang and how to resolve this?

Note that this hang happened inside a callback registered via std::atexit. My guess is that some state belonging to the Google libraries or to the C++ standard library has already been cleaned up by the time my atexit handler calls Py_Finalize, and that this leads to the hang. I don't think this should happen, though.
Anyway, I worked around the problem by no longer using std::atexit and running my own exit-handler logic instead (which, however, would not work if I called exit() directly anywhere).
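The workaround described above could look roughly like the following minimal sketch. `ShutdownRegistry`, `run_all` and `demo_shutdown_order` are hypothetical names, not taken from the original program. The idea is to run the teardown explicitly before main() returns, while the globals of all loaded libraries are still alive, instead of from a std::atexit callback. In the real program one of the handlers would call Py_Finalize(); here plain lambdas stand in for it.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: collects shutdown callbacks and runs them on demand,
// in reverse registration order (the same order std::atexit would use).
class ShutdownRegistry {
public:
    void add(std::function<void()> fn) { handlers_.push_back(std::move(fn)); }

    // Runs every handler exactly once; repeated calls are no-ops.
    void run_all() {
        while (!handlers_.empty()) {
            std::function<void()> fn = std::move(handlers_.back());
            handlers_.pop_back();
            fn();
        }
    }

private:
    std::vector<std::function<void()>> handlers_;
};

// Demonstration: returns the order in which two handlers ran.
// In the real program the first handler would wrap Py_Finalize().
inline std::string demo_shutdown_order() {
    std::string log;
    ShutdownRegistry registry;
    registry.add([&log] { log += "finalize-python;"; });
    registry.add([&log] { log += "close-session;"; });
    registry.run_all();  // called explicitly, before returning from main()
    registry.run_all();  // second call does nothing
    return log;
}
```

As noted above, this only helps on code paths that reach the explicit `run_all()` call; a direct exit() elsewhere in the program bypasses it.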

Related

Nanomsg gives signal 6 abort during fetching from couchbase

I am getting a signal 6 error when I try to fetch data from Couchbase; it occurs at erratic intervals. I am using nanomsg version 1.1, and from the code I can see that if poll returns a value less than 0, errno_assert is triggered, which aborts the application with signal 6.
Below is the backtrace of nanomsg thread:
Program terminated with signal 6, Aborted.
#0 0x00007ffff4c74a33 in select () from /usr/lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install boost-system-1.53.0-28.el7.x86_64 cyrus-sasl-lib-2.1.26-23.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-50.el7.x86_64 libcom_err-1.42.9-19.el7.x86_64 libcurl-7.29.0-59.el7_9.1.x86_64 libidn-1.28-4.el7.x86_64 libselinux-2.5-15.el7.x86_64 libssh2-1.8.0-4.el7.x86_64 nspr-4.32.0-1.el7_9.x86_64 nss-3.53.1-3.el7_9.x86_64 nss-util-3.67.0-1.el7_9.x86_64 openldap-2.4.44-22.el7.x86_64 openssl-libs-1.0.2k-21.el7_9.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-19.el7_9.x86_64
(gdb) thread 18
[Switching to thread 18 (Thread 0x7fffcef74700 (LWP 40630))]
#0 0x00007ffff4bfce00 in _IO_cleanup () from /usr/lib64/libc.so.6
(gdb) bt full
#0 0x00007ffff4bfce00 in _IO_cleanup () from /usr/lib64/libc.so.6
No symbol table info available.
#1 0x00007ffff4bb6be5 in abort () from /usr/lib64/libc.so.6
No symbol table info available.
#2 0x00000000009ff371 in nn_err_abort ()
No symbol table info available.
#3 0x00000000009ff2cd in nn_efd_wait ()
No symbol table info available.
#4 0x00000000009fbb13 in nn_sock_recv ()
No symbol table info available.
#5 0x00000000009f95fa in nn_recvmsg ()
No symbol table info available.
#6 0x00000000009f9015 in nn_recv ()
No symbol table info available.
#7 0x000000000099b8c9 in vcmNpsIcmMsgRecv ()
No symbol table info available.
#8 0x0000000000975a57 in __vcmNpsIcmRecv ()
No symbol table info available.
#9 0x00000000007261f5 in vcmDpeEmaIcmStatsCb(void*) ()
No symbol table info available.
#10 0x000000000099a54b in vcmNpsIcmInterfaceCreate ()
No symbol table info available.
#11 0x000000000095b1c0 in ?? ()
No symbol table info available.
#12 0x00007ffff7250ea5 in start_thread (arg=0x7fffcef74700) at pthread_create.c:307
__res = <optimized out>
pd = 0x7fffcef74700
now = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140736665700096, -1738215837302118578, 0, 33558528, 0, 140736665700096, 1738108024908031822,
1738235171653297998}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
#13 0x00007ffff4c7d9fd in clone () from /usr/lib64/libc.so.6
This is present in one of our stdout files:
Invalid argument [22] (home/3rd-party/nanomsg/src/utils/efd.c:91)
Code:
88 rc = poll (&pfd, 1, timeout);
89 if (nn_slow (rc < 0 && errno == EINTR))
90 return -EINTR;
91 errno_assert (rc >= 0);
I found that in version 1.2 this errno_assert is no longer present in the code. Can this errno_assert be safely removed from our copy of the code without updating the nanomsg version? Please help.
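As a rough illustration of the alternative to aborting, the sketch below wraps poll() so that every failure is returned to the caller as a negative errno value. `wait_for_event` and `demo_wait_for_event` are hypothetical names; this is not nanomsg code and it does not claim to match what nanomsg 1.2 does. Whether the callers of nn_efd_wait can cope with an additional error return would have to be checked separately.

```cpp
#include <cassert>
#include <cerrno>
#include <poll.h>
#include <unistd.h>

// Hypothetical wrapper (not nanomsg code): waits for an event on fd.
// Returns 0 when poll() reports an event, -ETIMEDOUT on timeout, and
// -errno on any poll() failure (EINTR, EINVAL, ...) instead of aborting.
inline int wait_for_event(int fd, int timeout_ms) {
    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    int rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -errno;      // report the error to the caller
    if (rc == 0)
        return -ETIMEDOUT;  // nothing happened within the timeout
    return 0;
}

// Small self-check using a pipe: an empty pipe times out, and after one
// byte is written the read end reports an event.
inline bool demo_wait_for_event() {
    int fds[2];
    if (pipe(fds) != 0)
        return false;
    bool timed_out = wait_for_event(fds[0], 10) == -ETIMEDOUT;
    char byte = 'x';
    bool wrote = write(fds[1], &byte, 1) == 1;
    bool ready = wait_for_event(fds[0], 1000) == 0;
    close(fds[0]);
    close(fds[1]);
    return timed_out && wrote && ready;
}
```

With a wrapper like this, the decision about what to do on EINVAL moves to the caller, so simply deleting the assertion is only safe if every caller checks the return value.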

just_audio SocketException

When I try to open a source (URL) while the device has no internet connection (flight mode on), I get the following logs:
[ +1 ms] flutter: SocketException: Failed host lookup: 'raw.githubusercontent.com' (OS Error: nodename nor servname provided, or not known, errno = 8)
[ +1 ms] flutter:
#0 _NativeSocket.startConnect (dart:io-patch/socket_patch.dart:682:35)
#1 _RawSocket.startConnect (dart:io-patch/socket_patch.dart:1817:26)
#2 RawSocket.startConnect (dart:io-patch/socket_patch.dart:27:23)
#3 RawSecureSocket.startConnect (dart:io/secure_socket.dart:237:22)
#4 SecureSocket.startConnect (dart:io/secure_socket.dart:60:28)
#5 _ConnectionTarget.connect (dart:_http/http_impl.dart:2438:24)
#6 _HttpClient._getConnection.connect (dart:_http/http_impl.dart:2834:12)
#7 _HttpClient._getConnection (dart:_http/http_impl.dart:2839:12)
#8 _HttpClient._openUrl (dart:_http/http_impl.dart:2698:12)
#9 _HttpClient.getUrl (dart:_http/http_impl.dart:2575:48)
#10 _proxyHandlerForUri.handler (package:just_audio/just_audio.dart:3039:46)
#11 _proxyHandlerForUri.handler (package:just_audio/just_audio.dart:3038:23)
#12 _ProxyHttpServer.start.<anonymous closure> (package:just_audio/just_audio.dart:1946:16)
#13 _ProxyHttpServer.start<…>
Wrapping the call in try/catch doesn't help...
Any advice or ideas?
UPDATE:
It took me some time to debug the example app and compare it with mine.
The issue occurs because I use headers with my source.
Here is the piece of code from the library where the behavior changes:
@override
Future<void> _setup(AudioPlayer player) async {
  await super._setup(player);
  if (uri.scheme == 'asset') {
    _overrideUri = await _loadAsset(uri.pathSegments.join('/'));
  } else if (uri.scheme != 'file' &&
      !kIsWeb &&
      (headers != null || player._userAgent != null)) {
    await player._proxy.ensureRunning();
    _overrideUri = player._proxy.addUriAudioSource(this);
  }
}
We are interested in this particular line:
(headers != null || player._userAgent != null)) {
I use headers, so in my app the code goes through the proxy branch and then fails inside _proxyHandlerForUri, at line 3039.

Getting Seg Fault when I try to dynamically load a custom library (.so) which is compiled with webkit2gtk library

I have created a shared library which has a function displaywebview that launches a GTK window and loads the URL into it using webkit2gtk.
Now I am writing a caller program which loads this library using dlopen, gets the method displaywebview using dlsym and calls this function.
I get a seg fault inside displaywebview at the point where I call webkit_web_view_new(). Could someone help me out on why this is happening?
webkit_main.so
#include <stdio.h>
#include <gtk/gtk.h>
#include <webkit2/webkit2.h>
// destroyWindowCb and closeWebViewCb are declared earlier in webkit_main.cpp (not shown in this post)
extern "C"
{
    int displayWebView();
}
int displayWebView()
{
    printf("Entered in displayWebView\n");
    // Initialize GTK+
    gtk_init(NULL, NULL);
    // Create an 800x600 window that will contain the browser instance
    GtkWidget *main_window = gtk_window_new(GTK_WINDOW_TOPLEVEL);
    gtk_window_set_default_size(GTK_WINDOW(main_window), 800, 600);
    // Create the browser instance (this is the call that segfaults)
    WebKitWebView *webView = (WebKitWebView*)webkit_web_view_new();
    // Put the browser area into the main window
    gtk_container_add(GTK_CONTAINER(main_window), GTK_WIDGET(webView));
    // Set up callbacks so that if either the main window or the browser instance is
    // closed, the program will exit
    g_signal_connect(main_window, "destroy", G_CALLBACK(destroyWindowCb), NULL);
    g_signal_connect(webView, "close", G_CALLBACK(closeWebViewCb), main_window);
    // Load a web page into the browser instance
    webkit_web_view_load_uri(webView, "http://www.gmail.com");
    // Make sure that when the browser area becomes visible, it will get mouse
    // and keyboard events
    gtk_widget_grab_focus(GTK_WIDGET(webView));
    // Make sure the main window and all its contents are visible
    gtk_widget_show_all(main_window);
    // Run the main GTK+ event loop
    gtk_main();
    return 0;
}
caller.cpp
#include <unistd.h>
#include <stdlib.h>
#include <dlfcn.h>
#include <stdio.h>
typedef int (*PDISPLAYWEBVIEW)();
int main(){
    // Load the shared library that wraps webkit2gtk
    void* hnd = dlopen("/home/radix/Desktop/webkit_socket/webkit_main.so", RTLD_LAZY);
    // sleep(10);
    if(hnd!=NULL){
        // Look up the exported entry point
        PDISPLAYWEBVIEW pdisplayWebView = (PDISPLAYWEBVIEW)dlsym(hnd,"displayWebView");
        if(pdisplayWebView == NULL){
            printf("dlsym error %s\n", dlerror());
        }
        else{
            printf("Everything okay, launch the function\n");
            (*pdisplayWebView)();
        }
        dlclose(hnd);
    }
    else{
        printf("The error is %s\n", dlerror());
    }
    return 0;
}
BACKTRACE details:
Thread 1 "caller" received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff790d165 in std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2 0x00007ffff2009168 in bmalloc::Scavenger::Scavenger(std::lock_guard<bmalloc::Mutex>&) () from /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#3 0x00007ffff1d12f61 in bmalloc::PerProcess<bmalloc::Scavenger>::getSlowCase() () from /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#4 0x00007ffff2001fcc in bmalloc::Heap::Heap(bmalloc::HeapKind, std::lock_guard<bmalloc::Mutex>&) () from /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#5 0x00007ffff1fff1ff in bmalloc::PerProcess<bmalloc::PerHeapKind<bmalloc::Heap> >::getSlowCase() () from /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#6 0x00007ffff1ffee99 in bmalloc::Cache::Cache(bmalloc::HeapKind) () from /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#7 0x00007ffff1fff311 in bmalloc::PerThread<bmalloc::PerHeapKind<bmalloc::Cache> >::getSlowCase() () from /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#8 0x00007ffff1ffef0d in bmalloc::Cache::allocateSlowCaseNullCache(bmalloc::HeapKind, unsigned long) () from /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#9 0x00007ffff1af22a2 in JSC::ExecutableAllocator::initializeAllocator() () from /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#10 0x00007ffff1d0bf25 in ?? () from /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#11 0x00007ffff0926739 in __pthread_once_slow (once_control=0x7ffff22a6ff0, init_routine=0x7ffff790c120 <__once_proxy>) at pthread_once.c:116
#12 0x00007ffff1d0d90d in JSC::initializeThreading() () from /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#13 0x00007ffff49beb29 in ?? () from /usr/lib/x86_64-linux-gnu/libwebkit2gtk-4.0.so.37
#14 0x00007ffff4aa4add in ?? () from /usr/lib/x86_64-linux-gnu/libwebkit2gtk-4.0.so.37
#15 0x00007ffff4b0eb00 in ?? () from /usr/lib/x86_64-linux-gnu/libwebkit2gtk-4.0.so.37
#16 0x00007ffff0e5e777 in ?? () from /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
#17 0x00007ffff0e5fc0d in g_object_newv () from /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
#18 0x00007ffff0e603c4 in g_object_new () from /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
#19 0x00007ffff4b0ab7d in ?? () from /usr/lib/x86_64-linux-gnu/libwebkit2gtk-4.0.so.37
#20 0x00007ffff0ba74a5 in g_once_impl () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#21 0x00007ffff4b31cc9 in webkit_web_view_new () from /usr/lib/x86_64-linux-gnu/libwebkit2gtk-4.0.so.37
#22 0x00007ffff6d96f0f in displayWebView () at webkit_main.cpp:81
#23 0x00005555555548df in main () at caller.cpp:20
Note:
When I use the same program with webkitgtk-1.0, it runs absolutely fine. With webkit2gtk-4.0, it gives this issue.
When I compile caller.cpp with libwebkit2gtk-4.0, strangely, it no longer segfaults.
Could someone help me out why this is happening?
I am using debian 9 with webkit2gtk-4.0-37 of version: 2.22.2-1~bpo9+1
Some compiler options have to be the same across all components (object modules, libraries, shared objects) for them to work together.
Besides -pthread, large-file support (on 32-bit) comes to mind. Here the backtrace ends with std::thread::_M_start_thread jumping to address 0, which would be consistent with caller.cpp having been built without -pthread.
Another possible source of problems is linking together different versions of a component.

Jaguar orm - reflexive relationship

Does jaguar_orm support reflexive relationships?
I have a Category class that can be part of another Category:
class Category {
  /// constructor
  Category();
  Category.make(this.id, this.name);
  /// fields
  @PrimaryKey()
  int id;
  @Column(isNullable: false)
  String name;
  @BelongsTo(CategoryBean, isNullable: true, refCol: 'id')
  int parentCategoryId;
  /// database
  // String toString() => "Product($id, $name, $parentCategoryId)";
  String toString() => "Product($id, $name)";
}
When I try to create one I get a stack overflow as follows:
Exception while parsing field: id!
Stack Overflow
#0 _Uri._uriEncode (dart:core/runtime/liburi_patch.dart:34:3)
#1 _Uri._makePath.<anonymous closure> (dart:core/uri.dart:2116:23)
#2 ListIterable.join (dart:_internal/iterable.dart)
#3 _Uri._makePath (dart:core/uri.dart:2117:12)
#4 _SimpleUri.replace (dart:core/uri.dart:4358:19)
#5 urlOfElement (package:source_gen/src/utils.dart:87:11)
#6 _MirrorTypeChecker.isExactly (package:source_gen/src/type_checker.dart:264:49)
#7 _ListBase&Object&ListMixin.any (dart:collection/list.dart)
#8 TypeChecker.isAssignableFromType (package:source_gen/src/type_checker.dart:162:57)
#9 ParsedBean._makeField.<anonymous closure> (package:jaguar_orm_gen/src/parser/parser.dart:351:47)
#10 WhereIterator.moveNext (dart:_internal/iterable.dart)
#11 MappedIterator.moveNext (dart:_internal/iterable.dart:391:19)
#12 new List.from (dart:core/runtime/libarray_patch.dart:40:17)
#13 ParsedBean._makeField (package:jaguar_orm_gen/src/parser/parser.dart:472:34)
#14 ParsedBean._parseFields (package:jaguar_orm_gen/src/parser/parser.dart:325:21)
#15 ParsedBean.detect (package:jaguar_orm_gen/src/parser/parser.dart:74:5)
#16 ParsedBean.detect (package:jaguar_orm_gen/src/parser/parser.dart:85:56)
This repeats for some time.
Is there something special I need to do, to prevent the stack overflow?
You need to add a @HasOne or @HasMany decorator, depending on your case.

EXC_BAD_ACCESS on presentRenderbuffer

When I call presentRenderbuffer, my app sometimes crashes with EXC_BAD_ACCESS, although most of the time everything works.
Call stack is here:
#0 0x2f53f02e in glrGetPrivateInteger ()
#1 0x329a192e in gliGetInteger ()
#2 0x002eec04 in __collect_all_context_profiling_data_block_invoke ()
#3 0x0015ea7c in iter_contexts ()
#4 0x002ee7f2 in collect_all_context_profiling_data ()
#5 0x00163fbc in copy_profiling_data_dictionary(ContextInfo*, unsigned int, unsigned long long) ()
#6 0x00160566 in handle_frame_boundary ()
#7 0x002f194c in EAGLContext_presentRenderbuffer(EAGLContext*, objc_selector*, unsigned int) ()
#8 0x00044a68 in __36-[CanvasView initializeWithContext:]_block_invoke56
Do you have any ideas about this?
SOLVED:
The texture was being created and deleted in different contexts, which caused the problem.
Now the texture is created and deleted in the same context, and that solved it.
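One way to guard against this kind of mistake is to record which context created a texture and refuse to delete it from any other. The sketch below is only an illustration of that bookkeeping: `ContextId` and `OwnedTexture` are hypothetical stand-ins, and no real OpenGL ES or EAGLContext calls are made. A real app would compare against its current EAGLContext and call glDeleteTextures where the comment indicates.

```cpp
#include <cassert>

// Hypothetical stand-in for a rendering context identity.
using ContextId = int;

// Remembers the context that created the texture and only allows
// deletion from that same context.
class OwnedTexture {
public:
    explicit OwnedTexture(ContextId creator) : owner_(creator), alive_(true) {}

    // Returns true if the texture was deleted. Returns false and leaves the
    // texture untouched when called from a different context or twice.
    bool destroy(ContextId current) {
        if (!alive_ || current != owner_)
            return false;
        alive_ = false;  // real code would call glDeleteTextures here
        return true;
    }

    bool alive() const { return alive_; }

private:
    ContextId owner_;
    bool alive_;
};
```

A failed `destroy` can then be logged or asserted on during development, which surfaces the cross-context deletion at the point where it happens rather than later inside presentRenderbuffer.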