Request objects causing memory leaks - scrapy

Just a quick question: I am facing problems with Scrapy using too much memory. The documentation shows an example where HtmlResponse objects live as long as Request objects, and uses that observation to conclude that the HtmlResponse objects are being leaked. My question is: what about the Request objects themselves? Shouldn't they also be short-lived?
The following is my prefs() output:
Spider 1 oldest: 353s ago
Request 46544 oldest: 338s ago
HtmlResponse 9 oldest: 3s ago
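A large, old Request count like this is usually not a leak by itself: Scrapy's scheduler legitimately keeps every queued Request alive until it has been crawled, so tens of thousands of live Requests is normal on a broad crawl, and they are released as the queue drains. A minimal pure-Python sketch of why queued objects show up as "alive" (the Request class here is a stand-in, not Scrapy's):

```python
import gc
import weakref

# Stand-in for scrapy.http.Request -- just an object we can track.
class Request:
    pass

# The scheduler holds every pending Request, so all of them stay alive
# for as long as they sit in the queue.
scheduler_queue = [Request() for _ in range(3)]
trackers = [weakref.ref(r) for r in scheduler_queue]

# While queued, every Request is reachable...
assert all(ref() is not None for ref in trackers)

# ...and once the scheduler drops them, they become collectable.
scheduler_queue.clear()
gc.collect()
assert all(ref() is None for ref in trackers)
```

If the oldest Request were much older than the crawl itself, or the count kept growing after the queue should have drained, that would point to something else (e.g. your own code) holding references.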


keep getting "distributed.utils_perf - WARNING - full garbage collections took 19% CPU time..."

I keep getting the "distributed.utils_perf - WARNING - full garbage collections took 19% CPU time recently" warning message after my Dask code has finished. I am using Dask for a large seismic data computation. After the computation, I write the computed data to disk; the writing takes much longer than the computing. Before writing the data to disk, I call client.close(), at which point I assume I am done with Dask. But the warning keeps coming. During the computation I got this warning 3-4 times, but while writing the data to disk I get it every second. How can I get rid of this annoying warning? Thanks.
The same was happening for me in Colab, where I started the session with
client = Client(n_workers=40, threads_per_worker=2)
I terminated all my Colab sessions, then reinstalled and imported all the Dask libs:
!pip install dask
!pip install cloudpickle
!pip install 'dask[dataframe]'
!pip install 'dask[complete]'
from dask.distributed import Client
import dask.dataframe as dd
import dask.multiprocessing
Now everything is working fine and not facing any issues.
Don't know how this solved my issue :D
I had been struggling with this warning too. I would get many of these warnings and then the workers would die. I was getting them because I had some custom Python functions for aggregating my data that were handling large Python objects (dicts). It makes sense that so much time was being spent on garbage collection if I was creating these large objects.
I refactored my code so that more of the computation was done in parallel before the results were aggregated, and the warnings went away.
I looked at the progress chart on the status page of the Dask dashboard to see which tasks were taking a long time to process (Dask tries to name tasks after the function in your code that called them, which can help, but the names aren't always that descriptive). From there I could figure out which part of my code I needed to optimise.
You can disable garbage collection in Python.
gc.disable()
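For example, with the standard-library gc module you can switch the collector off around the heavy phase and trigger a collection manually at a safe point afterwards. A sketch; whether skipping automatic collection is safe for you depends on how much cyclic garbage your workload creates in the meantime:

```python
import gc

gc.disable()                 # stop automatic cyclic collection
assert not gc.isenabled()

# ... run the phase (e.g. writing results to disk) that was
# triggering constant full collections ...

gc.collect()                 # collect manually at a safe point instead
gc.enable()                  # restore normal behaviour
assert gc.isenabled()
```

Note that gc.disable() only turns off the cyclic collector; ordinary reference counting still frees most objects immediately.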
I found that it was easier to manage Dask worker memory through periodic use of the Dask client restart: Client.restart()
Just create a process to run the Dask cluster and return its IP address, then create the client using that address.

How to determine what's blocking the main thread

So I restructured a central part in my Cocoa application (I really had to!) and I am running into issues since then.
Quick outline: my application controls the playback of QuickTime movies so that they are in sync with external timecode.
Thus, external timecode arrives on a CoreMIDI callback thread and gets posted to the application about 25 times per sec. The sync is then checked and adjusted if it needs to be.
All this checking and adjusting is done on the main thread.
Even if I put all the processing on a background thread, it would be a ton of work: I'm currently using a lot of GCD blocks, and I would need to rewrite many functions so that they can be called from an NSThread. So I would first like to make sure that it would actually solve my problem.
The problem
My Core MIDI callback is always called in time, but the GCD block that is dispatched to the main queue is sometimes blocked for up to 500 ms. Understandable that adjusting the sync does not quite work if that happens. I couldn't find a reason for it, so I'm guessing that I'm doing something that blocks the main thread.
I'm familiar with Instruments, but I couldn't find the right mode to see what keeps my messages from being processed in time.
I would appreciate if anyone could help.
Don't know what I can do about it.
Thanks in advance!
Watchdog
You can use Watchdog, which logs a warning whenever the main thread is blocked for longer than a threshold you define:
https://github.com/wojteklu/Watchdog
You can install it using CocoaPods:
pod 'Watchdog'
You may be blocking the main thread or you might be flooding it with events.
I would suggest three things:
Grab a timestamp when the timecode arrives in the CoreMIDI callback thread (see mach_absolute_time()). Then grab the current time when your main thread work is being done. You can then adjust accordingly based on how much time has elapsed between posting to the main thread and the work actually being processed.
create some kind of coalescing mechanism such that when your main thread is blocked, interim timecode events (that are now out of date) are tossed. This can be as simple as a global NSUInteger that is incremented every time an event is received. The block dispatched to the main queue could capture the current value on creation, then check it when it is processed. If it differs by more than N (N for you to determine), then toss the event because more are in flight.
consider not sending an event to the main thread for every timecode notification. 25 adjustments per second is a lot of work. If processing only 5 per second yields a "good enough" perceptual experience, then that is an awful lot of work saved.
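The coalescing idea in the second suggestion is language-agnostic. Here is a minimal sketch in Python rather than Objective-C/GCD (the Coalescer class and max_lag parameter are illustrative names, not from any framework): each incoming event bumps a generation counter, the dispatched work captures the counter's value, and the handler tosses the event if newer ones have arrived in the meantime:

```python
import threading

class Coalescer:
    """Drop stale events: each post() bumps a generation counter; a
    handler only runs if fewer than max_lag newer events arrived
    while it was waiting in the queue."""
    def __init__(self, max_lag=1):
        self._lock = threading.Lock()
        self._gen = 0
        self.max_lag = max_lag

    def post(self):
        # Called on the producer (e.g. MIDI callback) thread.
        with self._lock:
            self._gen += 1
            return self._gen          # generation captured by the event

    def should_process(self, gen):
        # Called when the handler finally runs on the main thread.
        with self._lock:
            return self._gen - gen < self.max_lag

c = Coalescer(max_lag=2)
g1 = c.post()
g2 = c.post()
g3 = c.post()
assert not c.should_process(g1)  # two newer events arrived: toss it
assert c.should_process(g3)      # newest event: process it
```

The same counter trick works with a plain NSUInteger captured by the GCD block, as the answer describes.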
In general, instrumenting the main event loop is a bit tricky. The CPU profiler in Instruments can be quite helpful. It may come as a surprise, but so can the Allocations instrument. In particular, you can use the Allocations instrument to measure memory throughput. If there are tons of transient (short lived) allocations, it'll chew up a ton of CPU time doing all those allocations/deallocations.

Ordered call stacks - with Instruments Time Profiler?

I am trying to fix a randomly occurring crash in an iOS app (EXC_BAD_ACCESS (SIGSEGV) KERN_INVALID_ADDRESS at 0xe104019e). The application performs many operations at the same time: loading data with NSURLConnection, redrawing a complex layout with many nested UIViews, drawing UIImages, caching downloaded files on the local disk, checking file availability in the local cache, etc. These operations are distributed across a number of threads, and when the crash takes place I am totally confused about which exact steps preceded it.
I would like to use a method to track the call stacks in the order of occurrence distributed into threads and see what objects methods were called during last 1-5ms before the crash, then I could isolate the bug that produces the crash.
At first sight, the Time Profiler in Instruments offers this kind of tracking ability with many details, but the sampled call stacks seem to be presented in a random order - or maybe I'm reading it wrong...
Is there a method that would tell me exactly what happens, in the order of execution?
1) Run with zombies enabled, and remove all compiler and static analyzer warnings.
2) It is likely a threading issue, where you are not protecting your shared data appropriately.
Instruments is quite powerful regarding thread sorting/filtering, but there is no exact template for what you are trying to accomplish.
If you can reduce the problem to a small collection of object types, then you can follow execution based on the reference counts if you add the allocations instrument and enable reference count recording.

Why is playing audio through AV Foundation blocking the UI on a slow connection?

I'm using AV Foundation to play an MP3 file loaded over the network, with code that is almost identical to the playback example here: Putting it all Together: Playing a Video File Using AVPlayerLayer, except without attaching a layer for video playback. I was trying to make my app respond to the playback buffer becoming empty on a slow network connection. To do this, I planned to use key-value observing on the AVPlayerItem's playbackBufferEmpty property, but the documentation did not say whether that was possible. I thought it might be, because the status property can be observed (and is in the example above) even though the documentation doesn't say that either.
So, in an attempt to create conditions where the buffer would empty, I added code on the server to sleep for two seconds after serving up each 8k chunk of the MP3 file. Much to my surprise, this caused my app's UI (updated using NSTimer) to freeze completely for long periods, despite the fact that it shows almost no CPU usage in the profiler. I tried loading the tracks on another queue with dispatch_async, but that didn't help at all.
Even without the sleep on the server, I've noticed that loading streams using AVPlayerItem keeps the UI from updating for the short time that the stream is being downloaded. I can't see why a slow file download should ever block the responsiveness of the UI. Any idea why this is happening or what I can do about it?
Okay, problem solved. It looks like passing AVURLAssetPreferPreciseDurationAndTimingKey in the options to URLAssetWithURL:options: causes the slowdown. This also only happens when the AVURLAsset's duration property or some other property relating to the stream's timing is accessed from the selector fired by the NSTimer. So if you can avoid polling for timing information, this problem may not affect you, but that's not an option for me. If precise timing is not requested, there's still a delay of around 0.75 seconds to 1 second, but that's all.
Looking back through it, the documentation does warn that precise timing might cause slower performance, but I never imagined 10+ second delays. Why the delay should scale with the loading time of the media is beyond me; it seems like it should only scale with the size of the file. Maybe iOS is doing some kind of heavy polling for new data and/or processing the same bytes over and over.
So now, without "precise timing and duration," the duration of the asset is permanently 0.0, even when it's fully loaded. As for my original goal of doing KVO on AVPlayerItem.isPlaybackBufferEmpty: it seems KVO would be useless anyway, since the property starts out NO, changes to YES as soon as I start playback, and stays YES even while the media has been playing for minutes at a time. The documentation says this about the property:
Indicates whether playback has consumed all buffered media and that playback will stall or end.
So I guess that's not accurate, and, at least in this particular case, the property is not very useful.

NSThread Picks Up Queue and Processes It

I have an app that needs to send collected data every X milliseconds (and NOT sooner!). My first thought was to stack up the data on an NSMutableArray (array1) on thread1. When thread2 has finished waiting it's X milliseconds, it will swap out the NSMutableArray with a fresh one (array2) and then process its contents. However, I don't want thread1 to further modify array1 once thread2 has it.
This will probably work, but thread safety is not a field where you want to "just try it out." What are the pitfalls to this approach, and what should I do instead?
(Also, if thread2 is actually an NSTimer instance, how does the problem/answer change? Would it all happen on one thread [which would be fine for me, since the processing takes a tiny fraction of a millisecond]?).
You should use either NSOperationQueue or Grand Central Dispatch. Basically, you'll create an operation that receives your data and uploads it once X milliseconds have passed. Each operation will be independent, and you can configure the queue with respect to how many concurrent operations you allow, operation priority, etc.
The Apple docs on concurrency should help:
http://developer.apple.com/library/ios/#documentation/General/Conceptual/ConcurrencyProgrammingGuide/Introduction/Introduction.html
The pitfalls of this approach have to do with when you "swap out" the array for a fresh one. Imagine that thread1 gets a reference to the array, and at the same time thread2 swaps the arrays and finishes processing. Thread1 is now writing to a dead array (one that will no longer be processed), even if only for a few milliseconds. The way to prevent this, of course, is by using synchronized code blocks (i.e., making your code "thread-safe") in the critical sections, but it's easy to overshoot the mark and synchronize too much of your code, sacrificing performance.
So the risks are you'll:
Make code that is not thread-safe
Make code that overuses synchronize and is slow (and threads already have a performance overhead)
Make some combination of these two: slow, unsafe code.
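The safe version of the swap keeps the critical section tiny: only the pointer exchange happens under the lock, and the drained batch is processed outside it. A language-agnostic sketch in Python (the Collector class is an illustrative name; in Cocoa the same shape works with NSMutableArray plus @synchronized or a serial dispatch queue):

```python
import threading

class Collector:
    """Producer appends items; consumer atomically swaps in a fresh
    list every X milliseconds and processes the old one at leisure."""
    def __init__(self):
        self._lock = threading.Lock()
        self._items = []

    def add(self, item):
        # Called by the producer thread (thread1 in the question).
        with self._lock:
            self._items.append(item)

    def drain(self):
        # Called by the timer/consumer thread (thread2). Only the swap
        # is locked, so the producer is never blocked for long.
        with self._lock:
            batch, self._items = self._items, []
        return batch                 # safe to process without the lock

c = Collector()
c.add(1)
c.add(2)
assert c.drain() == [1, 2]
assert c.drain() == []               # fresh list after the swap
```

Because the producer can never obtain a reference to the drained list (it only ever appends through add(), under the same lock), the "writing to a dead array" race described above cannot occur.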
The idea is to "migrate away from threads", which is what the Concurrency Programming Guide linked above is about.