Since last week, Word and Excel have been crashing frequently, even on very simple documents. Just now, Firefox crashed as well. What can I do to repair this or find the real root cause?
I have already repaired the Office installation using Setup. All regular updates should be installed. The laptop is rebooted every day.
I have configured WinDbg to attach, and this is what I get. I also have a dump, so if you need more information, I can still get it. Here's the info from my first dump of Word:
0:020> .exr -1
ExceptionAddress: 11fdf91c
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 00000008
Parameter[1]: 11fdf91c
Attempt to execute non-executable address 11fdf91c
0:020> kb
ChildEBP RetAddr Args to Child
WARNING: Frame IP not in any known module. Following frames may be wrong.
11fdf91c 75d05d3f 00000008 0dc0bbe8 fffffffe 0x11fdf91c
11fdfacc 75d38f82 bf3079e6 0dc0bbe8 00000000 ole32!COIDTable::ThreadCleanup+0xcb [d:\w7rtm\com\ole32\com\dcomrem\idobj.cxx # 1760]
11fdfb10 75d38ec3 00000000 11fdfb60 75e37724 ole32!FinishShutdown+0x9d [d:\w7rtm\com\ole32\com\class\compobj.cxx # 1035]
11fdfb30 75d2bac3 00000000 75d309ad 0dc0bbe8 ole32!ApartmentUninitialize+0x96 [d:\w7rtm\com\ole32\com\class\compobj.cxx # 1291]
11fdfb48 75d388e8 11fdfb60 00000000 00000000 ole32!wCoUninitialize+0x153 [d:\w7rtm\com\ole32\com\class\compobj.cxx # 2766]
11fdfb64 6e77314a 11fdfbf4 75f043c0 0b179b08 ole32!CoUninitialize+0x72 [d:\w7rtm\com\ole32\com\class\compobj.cxx # 2620]
11fdfb6c 75f043c0 0b179b08 00000000 00000000 NetworkItemFactory!FDBackgroundThreadHandler+0x21
11fdfbf4 75bf336a 0da0f624 11fdfc40 773a9f72 SHLWAPI!WrapperThreadProc+0x1b5
11fdfc00 773a9f72 0da0f624 66709c63 00000000 kernel32!BaseThreadInitThunk+0xe
11fdfc40 773a9f45 75f042ed 0da0f624 ffffffff ntdll!__RtlUserThreadStart+0x70
11fdfc58 00000000 75f042ed 0da0f624 00000000 ntdll!_RtlUserThreadStart+0x1b
0:020> vertarget
Windows 7 Version 7601 (Service Pack 1) MP (8 procs) Free x86 compatible
Product: WinNt, suite: SingleUserTS
kernel32.dll version: 6.1.7601.18229 (win7sp1_gdr.130801-1533)
Machine Name:
Debug session time: Wed Feb 5 14:55:55.241 2014 (UTC + 1:00)
System Uptime: 0 days 3:46:03.386
Process Uptime: 0 days 0:05:08.582
Kernel time: 0 days 0:00:03.822
User time: 0 days 0:00:11.528
If I recall correctly, it was related to a VoIP application called Netphone Client from Deutsche Telekom, which injects itself into other applications via a COM object.
So this was exactly as predicted by Hans Passant in the comments:
You are buried inside the COM plumbing with a clear hint that its internal state is corrupted. This is an environmental problem, some kind of DLL that gets injected into the process and screws things up. Long before the crash occurs so you'll have very little hope of diagnosing it with a debugger. Find the common source of the problem from the modules list. Suspect any shell extension, anti-malware, any utility similar to Dropbox. Use SysInternals' AutoRuns to disable them.
I don't remember exactly how I found the culprit, but in the end I switched to a physical VoIP phone instead of the software plus headset.
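The advice to find the common source from the modules list can be sketched as follows. This is only a rough illustration: it assumes you have saved the text output of WinDbg's lm command from each crashing process, and the helper names, the trusted list, and the sample module lists (including the made-up examplehook module and all addresses) are hypothetical and not taken from the dumps above.

```python
import re

# An lm line starts with two hex addresses followed by the module name.
LM_LINE = re.compile(r"^([0-9a-f`]{8,})\s+([0-9a-f`]{8,})\s+(\S+)", re.IGNORECASE)


def modules_from_lm(text):
    """Return the lower-cased module names found in saved `lm` output."""
    names = set()
    for line in text.splitlines():
        match = LM_LINE.match(line.strip())
        if match:
            names.add(match.group(3).lower())
    return names


def common_suspects(lm_outputs, trusted):
    """Modules loaded in every crashing process, minus ones you already trust."""
    module_sets = [modules_from_lm(text) for text in lm_outputs]
    if not module_sets:
        return set()
    common = set.intersection(*module_sets)
    return common - {name.lower() for name in trusted}


# Illustrative input only: addresses and the 'examplehook' module are invented.
word_lm = """start    end        module name
2f8e0000 2f9f5000   WINWORD    (deferred)
75cf0000 75e4c000   ole32      (pdb symbols)
10000000 10020000   examplehook   (deferred)
"""
firefox_lm = """start    end        module name
00a10000 00a5a000   firefox    (deferred)
75cf0000 75e4c000   ole32      (pdb symbols)
10000000 10020000   examplehook   (deferred)
"""

print(common_suspects([word_lm, firefox_lm], trusted={"ole32", "ntdll", "kernel32"}))
```

Whatever remains after removing the Windows system modules is a reasonable first list of candidates to disable with AutoRuns.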
First some background:
We have the following setup in our iMX6-based embedded system. There are two U-Boot partitions and two system (Linux) partitions. Currently we use only the first U-Boot partition and it uses a standard method for selecting, running and (if need be) rolling back the system partitions.
We are now looking into a similar scheme for upgrading U-Boot itself (this will happen very rarely but we do want the ability to do this without having to return the devices to base).
However, this is more fraught with danger because, once you tell the iMX6 device to boot from the alternate U-Boot partition, that's it - there's no U-Boot/watchdog combo that will revert to the previous one if boot fails, so a bad update runs a serious risk of bricking the device until we can return it to base for repair (a costly option which is why we're trying to mitigate it as much as possible).
The method chosen is a two-step U-Boot install procedure, consisting of 'write' and 'activate'. It relies on our ability to successfully figure out which U-Boot partition will be run if the device reboots (the selected one) and which is currently being run (the booted one). We've got this bit sorted out already.
But the bit that we're missing is the ability for UBoot to transfer control to the other UBoot partition under some circumstances. We got it doing different actions based solely on the UBoot environment as follows:
First, mmcboot has a prefix added so that it checks for the control transfer, specifically it's set to run ub_xfer_chk ; <original content of mmcboot>.
Secondly, we have a variable ub_xfer_flag normally set to 0.
Thirdly, we have the checking function ub_xfer_chk, defined as:
if test ${ub_xfer_flag} -eq 1 ; then
echo Soft-booting other UBoot...
setenv ub_xfer_flag 0
saveenv
weave_magic
fi
The weave_magic code is where we are having trouble :-) The idea is that this will load the other UBoot partition into memory (at our CONFIG_SYS_TEXT_BASE of 0x17800000) and execute it as if the actual device had done it.
We've tested the meat of this solution by using reset in place of weave_magic and it successfully restarts the device once, so we're certain we can make it safe.
My specific question then is: how can I convince U-Boot to load a second copy from another partition and run it?
The two UBoot partitions live in the /dev/mmcblk3boot0 and /dev/mmcblk3boot1 devices accessible from the system partition and are 2M files, including the 1K lead-in header and a fair bit of padding at the end.
Update:
We have actually had some success and managed to load an IMX image from the boot partition with the command:
ext4load mmc ${mmcdev}:${mmcpart} 0x17800000 ${bootdir}/u-boot.imx
but, when trying to execute it with:
go 0x17800000
we get an illegal instruction and immediate reboot:
pc : [<17800070>] lr : [<4ff83c64>]
sp : 4f579ac0 ip : 00000030 fp : 4f57be58
r10: 00000002 r9 : 4f579efc r8 : 4ffbe2b0
r7 : 4f57be68 r6 : 17800000 r5 : fffff200 r4 : 000002cc
r3 : 17800000 r2 : 4f57be6c r1 : 4f57be6c r0 : 00000000
Flags: nZCv IRQs off FIQs off Mode SVC_32
Resetting CPU ...
So I'm guessing that's not executable code at the start of that file. Any ideas on where to go from here?
The actual code in the IMX file is not at the beginning. You can discover this fact by using the excellent on-line disassembler with armv5 big-endian no-thumb architecture to figure out that the bytes at the beginning frequently give you invalid and/or not-very-sensible code:
ldtdmi a1, [a1], -a2 ; <UNPREDICTABLE>
strne a1, [a1, a1]
andeq a1, a1, a1
ldrbne pc, [pc, -ip, lsr #8]! ; <UNPREDICTABLE>
In any case, the data at the start of the IMX file is known to be header information (the d1 at the start is a "magic" marker indicating the IVT header, and there should also be a DCD block after that). However, even beyond the IVT and DCD blocks (based on their purported lengths in the header fields), the code is not sensible.
However, there's viable information at offset 0xc00 following a large chunk of 0x00 bytes:
00000be0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000bf0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000c00: 0f00 00ea 14f0 9fe5 14f0 9fe5 14f0 9fe5 ................
00000c10: 14f0 9fe5 14f0 9fe5 14f0 9fe5 14f0 9fe5 ................
Putting the hex bytes at offset 0xc00 into the disassembler, and adjusting for areas that are skipped by branches, shows both valid and sane ARM code.
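As a quick cross-check without an online disassembler, the first word at offset 0xc00 can be decoded by hand. The sketch below assumes the hex dump shows bytes in file order and that they form a little-endian 32-bit ARM (non-Thumb) instruction; the function name decode_arm_branch is just an illustration, and the load address 0x17800000 is the one used in the question.

```python
import struct


def decode_arm_branch(word_bytes, address):
    """Return the target of an ARM B/BL instruction, or None if it is not one.

    word_bytes: 4 bytes in file order (assumed little-endian).
    address:    address the instruction would be executed from.
    """
    (word,) = struct.unpack("<I", word_bytes)
    cond = word >> 28
    # B/BL have bits 27..25 == 0b101; cond 0xF is a different encoding (BLX).
    if cond == 0xF or (word >> 25) & 0b111 != 0b101:
        return None
    imm24 = word & 0x00FFFFFF
    if imm24 & 0x800000:          # sign-extend the 24-bit offset
        imm24 -= 1 << 24
    # The PC reads as the instruction address + 8 in ARM state.
    return (address + 8 + (imm24 << 2)) & 0xFFFFFFFF


first_word = bytes.fromhex("0f0000ea")   # bytes at file offset 0xc00
print(hex(decode_arm_branch(first_word, 0x17800000)))   # branch over the vector table
print(decode_arm_branch(bytes(4), 0x17800000))          # zero padding is not a branch
```

A word that decodes as a branch followed by a run of identical load-to-pc words is what an ARM exception vector table typically looks like, which is consistent with real code starting at 0xc00.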
And, indeed, stripping the IMX file with:
dd if=u-boot.imx bs=1 skip=3072 of=ub-at-c00.imx
should give you a file you can boot with:
ext4load mmc ${mmcdev}:${mmcpart} 0x17800000 ${bootdir}/ub-at-c00.imx
go 0x17800000
When we do this, it outputs:
U-Boot 2014.04 (Nov 07 2018 - 19:05:32)
CPU: Freescale i.MX6Q rev1.5 at 792 MHz
CPU: Temperature 32 C, calibration data: 0x5764e169
Reset cause: unknown reset
Board: DTI BRD0208 (Spitfire I) 05/01/2017
I2C: ready
DRAM: 1 GiB
We know this is the newer UBoot simply because the normal one we're using outputs an October date rather than a November one.
Unfortunately, it hangs at that point, with the watchdog timer eventually kicking in and rebooting back to the original UBoot, but I suspect that has to do with UBoot not liking the current state of the device (i.e., it doesn't like initialising it twice).
So we'll have to figure out how to convince it to do so but at least we've gotten it booting another copy of itself, which is what the question was about.
One of my redis servers is repeatedly going down today without any overt, diagnosable cause. My users all end up getting Error 111 connecting to unix socket: /var/run/redis/redis2.sock. Connection refused errors.
Looking into the logs at /var/log/redis, the last few lines capture nothing more nefarious than a scheduled backup:
[8248] 09 Mar 07:48:17.090 * 10 changes in 21600 seconds. Saving...
[8248] 09 Mar 07:48:17.374 * Background saving started by pid 47613
[47613] 09 Mar 07:51:02.257 * DB saved on disk
[47613] 09 Mar 07:51:02.486 * RDB: 526 MB of memory used by copy-on-write
[8248] 09 Mar 07:51:02.920 * Background saving terminated with success
The pid file still exists too, which implies the server wasn't formally shut down. Was redis still daemonized?
I logged into my system and did sudo service redis-server restart twice to get it up and running. Apart from these logs, how else can I diagnose what might have gone wrong?
Update: I noticed that at the time of the first crash, disk swapping started taking place. This hasn't happened before. Moreover, cat /proc/sys/vm/swappiness confirms swappiness is set to 2.
free -m shows (after normal operation):
total used free shared buffers cached
Mem: 28136 27015 1120 305 80 6586
-/+ buffers/cache: 20349 7787
Swap: 1023 991 32
free -m shows (after the redis server goes down):
total used free shared buffers cached
Mem: 28136 8770 19365 305 60 441
-/+ buffers/cache: 8268 19868
Swap: 1023 1022 1
This sounds like the work of the OS's OOM killer. You can verify or discredit the hypothesis by reviewing /var/log/syslog.
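A minimal sketch of that check, assuming the kernel logged the kill with wording like "Out of memory: Kill process <pid> (<name>)" or "Killed process <pid> (<name>)". The exact message format varies between kernel versions, and the function name and the sample lines below are illustrative, not taken from the logs in the question.

```python
import re

# Matches both "Kill process 123 (name)" and "Killed process 123 (name)".
OOM_KILL = re.compile(r"Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_kills(lines):
    """Return unique (pid, process name) pairs for OOM kills, in log order."""
    victims = []
    for line in lines:
        match = OOM_KILL.search(line)
        if match:
            victim = (int(match.group(1)), match.group(2))
            if victim not in victims:
                victims.append(victim)
    return victims


# Illustrative lines only; real syslog entries carry more detail.
sample = [
    "Mar  9 07:50:58 host kernel: redis-server invoked oom-killer",
    "Mar  9 07:50:59 host kernel: Out of memory: Kill process 8248 (redis-server) score 723",
    "Mar  9 07:50:59 host kernel: Killed process 8248 (redis-server)",
    "Mar  9 07:51:02 host cron[991]: unrelated entry",
]

print(find_oom_kills(sample))
```

To run it against the real log, pass an open file instead of the sample list, for example open("/var/log/syslog", errors="replace").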
In this case, the persistence job's overhead triggered the killer. You need to provision for that by setting maxmemory and allocating enough RAM to accommodate persistence's requirements, including COW.
Note that free isn't useful after the fact - you need to monitor your resources continuously.
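Continuous monitoring is normally a job for a proper monitoring system, but as a rough sketch, the figures that free reports can be read from /proc/meminfo and sampled on a schedule. The helper names here are hypothetical, and the sample text uses made-up numbers purely to show the parsing.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swap_used_fraction(info):
    """Fraction of swap in use, or 0.0 when no swap is configured."""
    total = info.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return (total - info.get("SwapFree", 0)) / total


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (Linux only); call this from a timer or cron job."""
    with open(path) as handle:
        return parse_meminfo(handle.read())


# Made-up numbers, only to demonstrate the parsing.
sample = """MemTotal:       4000 kB
MemFree:        1000 kB
SwapTotal:      1000 kB
SwapFree:        250 kB
"""

info = parse_meminfo(sample)
print(info["MemFree"], swap_used_fraction(info))
```

Logging these values every minute or so gives you the memory picture just before a crash, which is what a one-off free after the fact cannot show.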
As for swap, if you don't care about latency then you can certainly do that.
I have a 32-bit web application running on IIS 7.5 that crashes randomly and leaves an hdmp file under the WER directory. The problem is that the dump file is around 3 GB and can't be read by the 32-bit WinDbg, which reports that the file is too big. The 64-bit version of WinDbg will open it, but when I try to run any commands it says it can't because the crash dump is for a 32-bit app.
Is there a way to read this dump file, or to make it smaller for future crashes? Is there a size limit on 32-bit crash dumps that WinDbg can analyse, i.e. 2 GB? I'm a sysadmin rather than a web developer, so this is all new to me.
UPDATE
Got some kind of output from !analyze -v, although symbols haven't loaded properly directly from MS:
NTGLOBALFLAG: 0
APPLICATION_VERIFIER_FLAGS: 0
APP: w3wp.exe
ANALYSIS_VERSION: 6.3.9600.17336 (debuggers(dbg).150226-1500) amd64fre
MANAGED_STACK: !dumpstack -EE
SOS does not support the current target architecture.
MANAGED_BITNESS_MISMATCH:
Managed code needs matching platform of sos.dll for proper analysis. Use 'x86' debugger.
PRIMARY_PROBLEM_CLASS: WRONG_SYMBOLS
BUGCHECK_STR: APPLICATION_FAULT_WRONG_SYMBOLS_CLR_EXCEPTION
LAST_CONTROL_TRANSFER: from 72ec615a to 756fb727
STACK_TEXT:
1cdef038 72ec615a e0434352 00000001 00000005 KERNELBASE!RaiseException+0x58
1cdef0dc 72fd1bec 00000000 10f354c4 1cdef100 clr!RaiseTheExceptionInternalOnly+0x276
1cdef0f4 72fd1e1d 00000004 1cdef26c 72ec637e clr!RaiseTheException+0x86
1cdef11c 72fd1e4d 00000004 00000004 00000000 clr!RaiseTheExceptionInternalOnly+0x30a
1cdef150 72faef2d 8843cf04 00000000 1b8f51c0 clr!RealCOMPlusThrow+0x2f
1cdef278 72fae48a 00000000 1cdef2bc 8843ce3c clr!Thread::RaiseCrossContextException+0x37b
1cdef340 72e57c22 00000002 01e57c43 00000000 clr!Thread::DoADCallBack+0x2d3
1cdef398 00b4a9bd ffffffff 00b4a940 1cdef3d4 clr!UM2MDoADCallBack+0x92
WARNING: Frame IP not in any known module. Following frames may be wrong.
1cdef3cc 72cf9651 00000000 00c4f1c4 0000000e 0xb4a9bd
1cdef3f0 72cfa3b4 733e81f0 72cfa382 1cdef448 webengine4!W3_MGD_HANDLER::ProcessNotification+0x58
1cdef400 72e4de98 00c4f1c4 8843c96c 1cdef479 webengine4!ProcessNotificationCallback+0x32
1cdef448 72e519b1 1cdef479 1cdef47b 000b001c clr!UnManagedPerAppDomainTPCount::DispatchWorkItem+0x1c6
1cdef45c 72e52591 8843c914 72e5245b 00000000 clr!ThreadpoolMgr::ExecuteWorkRequest+0x42
1cdef4c4 72e5b4ad 00000000 00000000 776620c0 clr!ThreadpoolMgr::WorkerThreadStart+0x353
1cdef85c 751933ca 1b89d348 1cdef8a8 77599ed2 clr!Thread::intermediateThreadProc+0x4d
1cdef868 77599ed2 1b89d348 4e01fdde 00000000 kernel32!BaseThreadInitThunk+0xe
1cdef8a8 77599ea5 72e5b464 1b89d348 ffffffff ntdll!__RtlUserThreadStart+0x70
1cdef8c0 00000000 72e5b464 1b89d348 00000000 ntdll!_RtlUserThreadStart+0x1b
FOLLOWUP_IP:
clr!Thread::DoADCallBack+2d3
72fae48a 8b4508 mov eax,dword ptr [ebp+8]
SYMBOL_STACK_INDEX: 6
SYMBOL_NAME: clr!Thread::DoADCallBack+2d3
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: clr
IMAGE_NAME: clr.dll
DEBUG_FLR_IMAGE_TIMESTAMP: 52310b2d
STACK_COMMAND: ~34s; .ecxr ; kb
FAILURE_BUCKET_ID: WRONG_SYMBOLS_e0434352_clr.dll!Thread::DoADCallBack
BUCKET_ID: APPLICATION_FAULT_WRONG_SYMBOLS_CLR_EXCEPTION_clr!Thread::DoADCallBack+2d3
ANALYSIS_SOURCE: UM
FAILURE_ID_HASH_STRING: um:wrong_symbols_e0434352_clr.dll!thread::doadcallback
FAILURE_ID_HASH: {15ee992e-553a-06bb-20c5-a254780b9452}
Followup: MachineOwner
Anybody any ideas???
Thanks a lot!
To be able to run my eSATA Sheevaplug with Debian Wheezy, I had to upgrade U-Boot to the DENX version.
As a step-by-step guide I used this write-up from Martin Michlmayr. I did the upgrade using screen and a USB stick in the plug.
The upgrade went well, and after a reset the plug started with the new version.
Marvell>> version
U-Boot 2013.10 (Oct 21 2013 - 21:06:56)
Marvell-Sheevaplug - eSATA - SD/MMC
gcc (Debian 4.8.1-9) 4.8.1
GNU ld (GNU Binutils for Debian) 2.23.52.20130727
Marvell>>
The guide says to set the machid environment variable and the MAC address.
Unfortunately, saveenv fails because of bad blocks in the NAND. I tried different versions of U-Boot, including the one provided by NewIT. All behave the same way.
Marvell>> setenv machid a76
Marvell>> saveenv
Saving Environment to NAND...
Erasing NAND...
Skipping bad block at 0x00060000
Writing to NAND... FAILED!
Some blocks are marked as bad, which might be normal according to NewIT.
Marvell>> nand info
Device 0: nand0, sector size 128 KiB
Page size 2048 b
OOB size 64 b
Erase size 131072 b
Marvell>> nand bad
Device 0 bad blocks:
00060000
00120000
00360000
039c0000
0c300000
10dc0000
1ac40000
1f1c0000
Does anyone have a clue what the problem is and what I need to change to be able to save environment variables in U-Boot?
Thanks,
schibbl
Environment storage in NAND is configured at a fixed address and rounded to the 128k sector size, and a bad block sits exactly at that address, so it is not possible to write the env to NAND.
Marvell>> nand bad
Device 0 bad blocks:
00060000
...
The environment offset defined in include/configs/sheevaplug.h points exactly at the bad block:
/*
* max 4k env size is enough, but in case of nand
* it has to be rounded to sector size
*/
#define CONFIG_ENV_SIZE 0x20000 /* 128k */
#define CONFIG_ENV_ADDR 0x60000
#define CONFIG_ENV_OFFSET 0x60000 /* env starts here */
Because the sector from 0x80000 to 0x9FFFF is unused, I moved the env storage there:
/*
* max 4k env size is enough, but in case of nand
* it has to be rounded to sector size
*/
#define CONFIG_ENV_SIZE 0x20000 /* 128k */
#define CONFIG_ENV_ADDR 0x80000
#define CONFIG_ENV_OFFSET 0x80000 /* env starts here due to bad block */
Beware! We have to ensure our compiled u-boot.kwb is less than 384k. Otherwise we will write U-Boot into the memory marked as a bad block and will brick the device.
The best way to recompile with a custom env address is to use Michlmayr's sources, which include patches for MMC and eSATA support.
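The arithmetic behind the move and the 384k limit can be checked with a small sketch. It uses the erase size and bad block list reported by nand info and nand bad above, and it assumes the U-Boot image is written starting at NAND offset 0; the function names are purely illustrative.

```python
ERASE_SIZE = 0x20000  # 128 KiB, from "nand info"

# Bad block offsets as reported by "nand bad" in the question.
BAD_BLOCKS = [
    0x00060000, 0x00120000, 0x00360000, 0x039C0000,
    0x0C300000, 0x10DC0000, 0x1AC40000, 0x1F1C0000,
]


def blocks_covering(offset, size, erase_size=ERASE_SIZE):
    """Start offsets of every erase block touched by [offset, offset + size)."""
    first = offset // erase_size * erase_size
    last = (offset + size - 1) // erase_size * erase_size
    return list(range(first, last + 1, erase_size))


def bad_blocks_in_range(offset, size, bad=BAD_BLOCKS):
    """Bad blocks that a region of the given offset and size would hit."""
    return [block for block in blocks_covering(offset, size) if block in bad]


def max_image_size(bad=BAD_BLOCKS, start=0):
    """Largest image that fits between `start` and the first bad block after it."""
    return min(block for block in bad if block >= start) - start


print(bad_blocks_in_range(0x60000, 0x20000))  # original env location
print(bad_blocks_in_range(0x80000, 0x20000))  # relocated env location
print(max_image_size() // 1024, "KiB available for U-Boot")
```

The same check is worth repeating for any other offset you consider, since a different plug will have a different bad block list.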
Hi,
I'm running Apache 2.2.3 on 64-bit Oracle (a Red Hat clone) and I'm hitting a brick wall with an issue. I have a program which uses MIME::Lite to send mail through sendmail. (I apologize, I'm not sure what versions of sendmail or mod_perl I'm running, although I believe the sendmail portion is irrelevant, as you'll see in a moment.)
On occasion, apache will segfault (11), and digging deep into the MIME::Lite module, I see it is on the following line:
open SENDMAIL, "|$sendmailcmd" or Carp::croak "open |$sendmailcmd: $!\n"; (this is in MIME::Lite)
Now, one would automatically suspect sendmail, but if I did the same line to use /bin/cat (as shown):
open SENDMAIL, "|/bin/cat"
apache still segfaults.
I attached an strace to the apache processes and see the following:
(when it does NOT crash)
12907 write(2, "SENDMAIL send_by_sendmail 1\n", 28) = 28
12907 write(2, "SENDMAIL /usr/lib/sendmail -t -o"..., 40) = 40
12907 pipe([24, 26]) = 0
12907 pipe([28, 29]) = 0
12907 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b4bcbbd75d0) = 13186
Note that the "SENDMAIL send_by_sendmail" lines are my own debug output. You can clearly see the pipes being opened. When it DOES crash, you'll see the following:
10805 write(2, "SENDMAIL send_by_sendmail (for y"..., 40) = 40
10805 --- SIGSEGV (Segmentation fault) # 0 (0) ---
Now notice it never pipes. I've tried GDB and it hasn't really shown me anything.
Finally, I wrote a simple program to run through mod_perl and regular cgi:
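use CGI qw(header);   # provides header()
use Carp;             # provides Carp::croak
print header();
print "test";
# Open a pipe to a trivial child process; this is the call that segfaults under mod_perl
open SENDMAIL, "|/bin/cat" or Carp::croak "open |sendmailcmd: $!\n";
print SENDMAIL "foodaddy";
close SENDMAIL;
print "test done <br/>";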
Under mod_perl it has successfully crashed.
My analysis tells me it has to do with the attempt to open a file handle: the piping function returns either false or a corrupt file handle.
I also increased the file descriptor limit to 2048, no dice.
Does anyone have any thoughts as to where I should look? Any thoughts?
I appreciate the help
I just spent a long time tracking down a problem that started with identical symptoms. I eventually discovered that Test::More does not play well with mod_perl . Removing this module from my code appears to have solved the problem (so far!). I didn't follow this any deeper, but I suspect that the problem actually lies in Test::Builder.
I may have treated only the symptoms rather than the cause. I ran into this issue when I used global (package-scope) variables, declared at the package level, inside a Perl object instance. As soon as I passed them as object properties instead of relying on Perl's default variable scoping, the segmentation faults stopped.