Lately numerous network requests with Alamofire made from our iOS device fail with the following error:
Error Domain=NSPOSIXErrorDomain Code=28 "No space left on device" UserInfo={_NSURLErrorFailingURLSessionTaskErrorKey=LocalDataTask .<3>, _kCFStreamErrorDomainKey=1, _NSURLErrorRelatedURLSessionTaskErrorKey=( "LocalDataTask .<3>" ), _kCFStreamErrorCodeKey=28}
Our app has a mechanism that sends a network request if the user has moved roughly 10 meters. This is checked every 5 seconds, so in theory a call can be made every five seconds. The network request occasionally fails, returning no status code and the error above.
The message implies the error has to do with available disk/memory space on the device. However, after checking both there is no link to be found since there is plenty of space available. Also, the error occurs on multiple devices, all running iOS 14.4 or higher.
Is there information available regarding error code 28 and what could be the culprit on iOS devices? Even better: how can this error be prevented?
To answer the occurrence of the error itself:
NSPOSIXErrorDomain Code=28 "No space left on device"
With logs in the Xcode terminal:
2021-05-07 15:56:50.873428+0200 MYAPP[21757:7406020] [] nw_path_evaluator_create_flow_inner NECP_CLIENT_ACTION_ADD_FLOW 05CD829A-810D-412F-B86E-7524369359E8 [28: No space left on device]
2021-05-07 15:56:50.877243+0200 MYAPP[21757:7400322] Task <5504BCDF-7DFE-4045-BD4B-E75054636D5B>.<1> finished with error [28] Error Domain=NSPOSIXErrorDomain Code=28 "No space left on device" UserInfo={_NSURLErrorFailingURLSessionTaskErrorKey=LocalUploadTask <5504BCDF-7DFE-4045-BD4B-E75054636D5B>.<1>, _kCFStreamErrorDomainKey=1, _NSURLErrorRelatedURLSessionTaskErrorKey=(
"LocalUploadTask <5504BCDF-7DFE-4045-BD4B-E75054636D5B>.<1>"
), _kCFStreamErrorCodeKey=28}
It appears to be raised when too many NSURLSessions are created and not maintained or closed properly, reaching a limit of (in our tests) 600-700 sessions. The error started appearing with iOS 14, so it would be interesting to know whether a limit was introduced then.
Linked is a GitHub issue reporting the same problem in the ktor microservices framework by JetBrains. It points in the same direction and mentions invalidating sessions to prevent the issue:
https://github.com/ktorio/ktor/issues/1341
In our own project the origin of the problem turned out to be our implementation of the StarScream websocket library. This might not be relevant to the issues others are having, but I explain it anyway to give a complete picture of the problem. It is the cause and fix for our specific situation.
At first we assumed it had something to do with the URLSession created by Alamofire (the networking library we use), since POST requests started to get cancelled and killing the app seemed to be the only way to make requests work again.
However, we also use websocket connections through the StarScream library. Our code attempts to connect to a socket and, if that fails, retries every two seconds for a maximum of two hours. This means that for two hours, every two seconds, we connect to the socket -> receive a failure to connect -> disconnect the socket -> connect again. Because the socket was a singleton and only initialised once, we thought there was no possibility of creating multiple URLSessions. However, each call to connect created a new nw_connection object, since the library did not handle the disconnect properly.
[Image: NWConcrete_nw_connection objects generated by the socket connection]
We validated this with the Instruments app by checking for the creation of new nw_connection objects, which it reported as a "memory leak". The solution was to make sure we disconnect the socket (invalidate the session) properly before connecting again.
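The fix described above (release the previous connection before opening a new one) can be sketched in a language-neutral way. The snippet below is a minimal Python sketch and not StarScream or URLSession code: `FakeConnection`, `ReconnectingClient` and the `open_count` counter are hypothetical names that stand in for the socket and for the nw_connection objects seen in Instruments.

```python
class FakeConnection:
    """Hypothetical stand-in for a socket / nw_connection object."""
    open_count = 0  # number of connections currently alive

    def __init__(self):
        FakeConnection.open_count += 1
        self.closed = False

    def connect(self):
        # Simulate a server that is unreachable, as in the retry scenario.
        raise ConnectionError("server unreachable")

    def close(self):
        # Idempotent: closing twice must not decrement the counter twice.
        if not self.closed:
            self.closed = True
            FakeConnection.open_count -= 1


class ReconnectingClient:
    """Singleton-style client that owns at most one live connection."""

    def __init__(self):
        self.connection = None

    def try_connect(self):
        # Release the previous connection BEFORE creating a new one;
        # skipping this step is what lets objects pile up on every retry.
        if self.connection is not None:
            self.connection.close()
            self.connection = None
        self.connection = FakeConnection()
        try:
            self.connection.connect()
            return True
        except ConnectionError:
            # Failed attempt: free the resources instead of keeping them around.
            self.connection.close()
            self.connection = None
            return False


client = ReconnectingClient()
# Two hours of retries at one attempt every two seconds is 3600 attempts.
results = [client.try_connect() for _ in range(3600)]
```

With the cleanup in place the number of live connections stays bounded no matter how long the retry loop runs. Removing the two `close()` calls in this sketch makes `open_count` grow by one per attempt, which mirrors the growth we saw in Instruments.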
I hope to answer a big part of the issue here, and I will mark my own question answered since this was the solution to the problem at hand. I think Apple should consider giving accurate reports on the number of objects created being limited, instead of giving an error "No space left on device".
Just wanted to chime in with more info, since we're experiencing the same issue.
Based on our analytics, this issue only started happening since iOS 14. We've verified it happening on 14.2, 14.4 and 14.5. Naturally the most straightforward cause for this error would be low memory or disk storage. We've excluded this option with additional logging, as you seem to have done as well.
A possibly related SO post has attributed the issue to a network inspecting framework that was enabled in their release build. It's worth checking if you use a similar tool.
Another report of this issue, this time on the GitHub of AFNetworking (predecessor to the Alamofire library you use), says they were able to fix it by limiting the creation of URLSession objects.
For us personally, neither of these did the trick. We created a support ticket with Apple, but this hasn't led to a solution. They requested a small sample project that reproduces the issue, but the error only manifested after 7 days of continuous use in our app. If you have a faster way to reproduce this, it may be worth submitting your own support ticket.
Hopefully this helps you find a solution. If you do, please add it to your post to help others!
We are in Australia East.
We have an event hub with events coming through from an application. On the morning of Friday 19th March I created a Stream Analytics job to try to read one of the event streams. This worked successfully on the Event Hub and returned results in the "Input preview" window when setting it up. This seems to match the timings on the message below (we are about 12 hours in front of UTC).
However by Friday afternoon, it started failing with one of the error messages "InternalServerError" or "No such host is known". I was working through the drop down boxes available when creating a new input after selecting "Select Event Hub from your subscriptions", so I know I haven't got anything wrong in the setup.
When trying to submit a support request, we get this slightly cryptic message:
The link doesn't work; it claims Stream Analytics is not supported in Resource Health, even though it is. Does this mean "It's down sorry, we are working on it" (as in actually working on it), or is it a canned response and we should escalate it?
Or is anyone else having trouble creating Stream Analytics Jobs and we are suffering an outage? The Azure Status monitor shows they are in good health.
Looks like this was a permission error. I needed to go into the IAM of the Event Hub and set the Stream Analytics Job up with Reader permissions. For some reason, this wasn't automatically added when setting up the Stream Analytics Job, as I thought it would be.
Once the permission was set, the job started successfully.
I integrated spring-cloud-sleuth with GCP support into an application. Under load the app suddenly stops reporting any spans until it is restarted.
The only tracing-related log I can see is the following exception:
Unexpected error flushing spans java.lang.IllegalStateException: timeout waiting for onClose. timeoutMs=5000, resultSet=false
at zipkin2.reporter.stackdriver.internal.AwaitableUnaryClientCallListener.await(AwaitableUnaryClientCallListener.java:49)
at zipkin2.reporter.stackdriver.internal.UnaryClientCall.doExecute(UnaryClientCall.java:50)
at zipkin2.Call$Base.execute(Call.java:380)
at zipkin2.Call$Mapping.doExecute(Call.java:237)
at zipkin2.Call$Base.execute(Call.java:380)
at zipkin2.reporter.AsyncReporter$BoundedAsyncReporter.flush(AsyncReporter.java:285)
at zipkin2.reporter.AsyncReporter$Flusher.run(AsyncReporter.java:354)
at java.base/java.lang.Thread.run(Unknown Source)
This exception happens a few times around the time the traces end and then never again (as if something permanently breaks).
I read in a spring-cloud-gcp issue (see here) that this can be related to too few executor threads, so I already increased the number of threads from 4 to 8.
I got the warning below in WebLogic 9.2:
Warning <Socket> <managed2> <ExecuteThread: '0' for queue: 'weblogic.socket.Muxer'> <<WLS Kernel>> <> <> <> <BEA-000450> <Unable to find internal data record for the socket 7; event received
Can it affect the performance of my application?
If it does, please provide a solution for this.
Thanks in advance.
Regards,
Ganesan Chandrasekaran.
Is this a one-off or occurring frequently?
From the documentation
BEA-000450
Warning: Socket socket internal data record unavailable (probable
closure due idle timeout), event received event
Description Unable to find internal data record for the socket.
Cause Server already closed the idle socket.
Action If situation persists, contact Oracle Customer Support.
This is mostly a harmless message, but you would need Oracle's help if it keeps appearing.
This was also discussed on the Oracle forums:
The root cause of this message is due to fixing the OutOfMemory
message being reported in an incident CR346335 for WebLogic 9.2 MP3.
Code changes were made to explicitly remove the associated objects
from memory for a given socket after the timer thread closed the
socket as OS was not reporting the error on the OS level. As a result
of the removal of the associated objects, the message starts appearing
in the log.
We need to be able to specify a delay before failed messages are retried. NServiceBus retries more or less instantly, up to n times (as configured), before moving the message to the error queue.
What I need is to be able to specify, for a given message type, that it's not to be retried for an arbitrary period of time.
I've read the post here:
NServiceBus Retry Delay
but this doesn't give what I'm looking for.
Kind regards
Ben
This isn't supported as of right now. What you can do is let the messages go to the error queue and set up an endpoint to monitor that queue. Your code could then determine the rules for replaying messages. You could use a Saga to achieve this in combination with the Timeout manager.
Typically you'll have some rules around when to replay messages. In NSB 3.0 we have a better way to do this using the FaultManager. This gives you options on where to put failed messages and includes the exception. One of the options is a DB which you could then set up a job to inspect the exception and determine what to do with it.
Lastly, a low-tech way of getting this is to schedule a job that runs the ReturnToSourceQueue tool periodically to "clean up". We are doing this and including an alert so we don't endlessly cycle messages around.
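The rules for replaying from the error queue could look roughly like the following. This is a minimal Python sketch of the idea only, not NServiceBus code: `FailedMessage`, `RETRY_DELAYS`, `MAX_REPLAYS` and `plan_replays` are hypothetical names, and the message types and delays are made-up example values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical per-message-type delays; unlisted types are never replayed.
RETRY_DELAYS = {
    "OrderPlaced": timedelta(minutes=15),
    "EmailRequested": timedelta(hours=1),
}
MAX_REPLAYS = 3  # cap so messages do not cycle endlessly


@dataclass
class FailedMessage:
    message_type: str
    failed_at: datetime
    replay_count: int = 0


def plan_replays(error_queue, now):
    """Split the error queue into (replay_now, keep_waiting, needs_alert)."""
    replay_now, keep_waiting, needs_alert = [], [], []
    for msg in error_queue:
        delay = RETRY_DELAYS.get(msg.message_type)
        if delay is None or msg.replay_count >= MAX_REPLAYS:
            # No rule for this type, or replayed too often: hand over to a human.
            needs_alert.append(msg)
        elif now - msg.failed_at >= delay:
            # The configured delay has elapsed: return to the source queue.
            replay_now.append(msg)
        else:
            keep_waiting.append(msg)
    return replay_now, keep_waiting, needs_alert


now = datetime(2024, 1, 1, 12, 0)
queue = [
    FailedMessage("OrderPlaced", now - timedelta(minutes=20)),     # delay elapsed
    FailedMessage("OrderPlaced", now - timedelta(minutes=5)),      # still waiting
    FailedMessage("EmailRequested", now - timedelta(hours=2), 3),  # too many replays
]
replay_now, keep_waiting, needs_alert = plan_replays(queue, now)
```

A scheduled job would call something like `plan_replays` periodically, return the `replay_now` messages to their source queue (incrementing `replay_count`), and raise an alert for everything in `needs_alert`.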