Real Time screen grabbing and streaming with libav-tools

Real Time screen grabbing and streaming with libav-tools - live-streaming

For my school project I have to stream screen grabbing from 1 station (i.e. server) to another (i.e. client) in Real Time, both running linux (ubuntu).
I'm using libav-tools (avconv as the encoder on the server side and avplay as the player on the client side)
avconv uses x11grab format to grab from the screen.
My problem is: avconv needs a few seconds to output the encoded video. this wait is too long for RT.
I've tried streaming to localhost to avoid network influence on speed, it still seems that avconv is responsible for the long wait.
Also, streaming a video file seems to be much faster, almost immediately.
The project is implemented in C++ and executes avconv in a fork.
Any suggestions as to shortening the procedure?

This is most likely due to internal buffering. There is often a buffer which is way too big on default. That is because having no delay is not the primary concern of most software, they are more concerned with bad connections and that sort of problems, which is what buffers are for.
See https://libav.org/avconv.html, search for "nobuffer" or "-analyzeduration" or "-rtbufsize" or "-max_delay" or "-fpsprobesize" or "rtmp_buffer" (if you use rtmp) or others and try your luck.
There will always be a noticable delay, especially if you use an encodings like h264 for transfer. But a few seconds it does not need to be in a controlled environment. You should be able to bring it down to fractions of a second.

Related

In a Cloudflare worker why the faster stream waits the slower one when using the tee() operator to fetch to R2?

I want to fetch an asset into R2 and at the same time return the response to the client.
So simultaneously streaming into R2 and to the client too.
Related code fragment:
const originResponse = await fetch(request);
const originResponseBody = originResponse.body!!.tee()
ctx.waitUntil(
env.BUCKET.put(objectName, originResponseBody[0], {
httpMetadata: originResponse.headers
})
)
return new Response(originResponseBody[1], originResponse);
I tested the download of an 1GB large asset with a slower, and a faster internet connection.
In theory the outcome (success or not) of putting to R2 should be the same in both cases. Because its independent of the client's internet connection speed.
However, when I tested both scenarios, the R2 write was successful with the fast connection, and failed with the slower connection. That means that the ctx.waitUntil 30 second timeout was exceeded in case of the slower connection. It was always an R2 put "failure" when the client download took more than 30 sec.
It seems like the R2 put (the reading of that stream) is backpressured to the speed of the slower consumer, namely the client download.
Is this because otherwise the worker would have to enqueue the already read parts from the faster consumer?
Am I missing something? Could someone confirm this or clarify this? Also, could you recommend a working solution for this use-case of downloading larger files?
EDIT:
The Cloudflare worker implementation of the tee operation is clarified here: https://community.cloudflare.com/t/why-the-faster-stream-waits-the-slower-one-when-using-the-tee-operator-to-fetch-to-r2/467416
It explains the experiences.
However, a stable solution for the problem is still missing.

Cloudflare Workers limits the flow of a tee to the slower stream because otherwise it would have to buffer data in memory.
For example, say you have a 1GB file, the client connection can accept 1MB/s while R2 can accept 100MB/s. After 10 seconds, the client will have only received 10MB. If we allowed the faster stream to go as fast as it could, then it would have accepted all 1GB. However, that leaves 990MB of data which has already been received from the origin and needs to be sent to the client. That data would have to be stored in memory. But, a Worker has a memory limit of 128MB. So, your Worker would be terminated for exceeding its memory limit. That wouldn't be great either!
With that said, you are running into a bug in the Workers Runtime, which we noticed recently: waitUntil()'s 30-second timeout is intended to start after the response has finished. However, in your case, the 30-second timeout is inadvertently starting when the response starts, i.e. right after headers are sent. This is an unintended side effect of an optimization I made: when Workers detects that you are simply passing through a response body unmodified, it delegates pumping the stream to a different system so that the Worker itself doesn't need to remain in memory. However, this inadvertently means that the waitUntil() timeout kicks in earlier than expected.
This is something we intend to fix. As a temporary work-around, you could write your worker to use streaming APIs such that it reads each chunk from the tee branch and then writes it to the client connection in JavaScript. This will trick the runtime into thinking that you are not simply passing the bytes through, but trying to perform some modifications on them in JavaScript. This forces it to consider your worker "in-use" until the entire stream completes, and the 30-second waitUntil() timeout will only begin at that point. (Unfortunately this work-around is somewhat inefficient in terms of CPU usage since JavaScript is constantly being invoked.)

What is a typical time for a USB write on an STM32?

I have an STM32f042 and I have loaded the example Custom HID firmware from the STM32F0x2_USB-FS-Device_Lib V1.0.0.
I then did some simple write transfers sending just one or two bytes, and watched the response using wireshark.
After doing about ten transfers it looks like time for a transfer to complete ranges between 15ms and 31ms with the average being somewhere around 25ms.
I've been told in the past that a single fast USB transaction should take around 1ms so this feels to me to be about an order of magnitude slow.
Is this a normal time for this chip? (And how would I go about figuring out what "normal" is?) Or is this abnormally slow?

Please check configuration descriptor in usbd_customhid.c file. The polling interval for each endpoint set but parameter: bInterval, the default value in examples(as I remember) set to 0x20(32ms) try to change it!

Does StackExchange.Redis supports MONITOR?

I recently migrated from Booksleeve to StackExchange.Redis.
For monitoring purposes, I need to use the MONITOR command.
In the wiki I read
From the IServer instance, the Server commands are available
But I can't find any method concerning MONITOR in IServer ; After a quick search in the repository, it seems this command is not mappped even if RedisCommand.MONITOR is defined.
So, is the MONITOR command supported by StackExchange.Redis ?

Support for monitor is not provided, for multiple reasons:
invoking monitor is a path with of no return; a monitor connection can never be anything except a monitor connection - it certainly doesn't play nicely with the multiplexer (although I guess a separate connection could be used)
monitor is not something that is generally encouraged - it has impact; and when you do use it, it would be a good idea to run it as close to the server as possible (typically in a terminal to the server itself)
it should typically be used for short durations
But more importantly, perhaps, I simply haven't seen a suitable user-case or had a request for it. If there is some scenario where monitor makes sense, I'm happy to consider adding some kind of support. What is it that you want to do with it here?
Note that caveat on the monitor page you link to:
In this particular case, running a single MONITOR client can reduce the throughput by more than 50%. Running more MONITOR clients will reduce throughput even more.

How to share the APC user cache between CLI and Web Server instances?

I am using PHP's APC to store a large amount of information (with apc_fetch(), etc.). This information occasionally needs analyzed and dumped elsewhere.
The story goes, I'm getting several hundred hits/sec. These hits increase various counters (with apc_inc(), and friends). Every hour, I would like to iterate over all the values I've accumulated, and do some other processing with them, and then save them on disk.
I could do this as a random or time-based switch in each request, but it's a potentially long operation (may require 20-30 sec, if not several minutes) and I do not want to hang a request for that long.
I thought a simple PHP cronjob would do the task. However, I can't even get it to read back cahe information.
<?php
print_r(apc_cache_info());
?>
Yeilds a seemingly different APC memory segment, with:
[num_entries] => 1
(The single entry seems to be a opcode cache of itself)
While my webserver, powered by nginx/php5-fpm, yields:
[num_entries] => 3175
So, they are obviously not sharing the same chunk of memory. How can I either access the same chunk of memory in the CLI script (preferred), or if that is simply not possible, what would be the absolute safest way to execute a long running sequence on say, a random HTTP request every hour?
For the latter, would using register_shutdown_function() and immediately set_time_limit(0) and ignore_user_abort(true) do the trick to ensure execution completes and doesn't "hang" anyone's browser?
And yes, I am aware of redis, memcache, etc that would not have this problem, but I am stuck to APC for now as neither could demonstrate the same speed as APC.

This is really a design issue and a matter of selecting preferred costs vs. payoffs.
You are thrilled by the speed of APC since you do not spend time to persist the data. You also want to persist the data but now the performance hit is too big. You have to balance these out somehow.
If persistence is important, take the hit and persist (file, DB, etc.) on every request. If speed is all you care about, change nothing - this whole question becomes moot. There are cache systems with persistent storage that can optimize your disk writes by aggregating what gets written to disk and when but you will generally always have a payoff between the two with varying tipping points. You just have to choose which of those suits your objectives.
There will probably never exist an enduring, wholesome technological solution to the wolf being sated and the lamb being whole.
If you really must do it your way, you could have a cron that CURLs a special request to your application which would trigger persisting your cache to disk. That way you control the request, its timeout, etc., and don't have to worry about everything users might do to kill their requests.
Potential risks in this case, however, are data integrity (as you will be writing the cache to disk while it is being updated by other requests in the meantime) as well as requests being served while you are persisting the cache paying the performance hit of your server being busy.
Essentially, we introduced a bundle of hay to the wolf/lamb dilemma ;)

debugging a slow site -> long delay between connection and data sending

i ran a test from pingdom tools to check the loading time of my website... the result is that i have a lot of files that, in spite of being very small (5kB), take a lot of time (1 second or more) to load because there is a big delay between the beginning of the connection and the beginning of data downloading (in pingdom tools, this results in a very large green bar).
Have a look at this for example: http://tools.pingdom.com/default.asp?url=http%3a%2f%2fwww.giochigratis-online.net%2f&id=5691308
How can i lower the "green bar" time? Is this an apache problem (like, i dont know, the number of max. parallel connections, or something similar...), or an hardware problem? Cpu-limited, bandwith-limited, or what else?
I see that many other websites have very little green bars... how do they reduce the delay between the connection and the real data sending?
Thanks!
ps.: the site is made with drupal. Homepage generation takes about 700ms
pps.: i tested 3 other websites on the same server: same problem.

I think it could be the problem with max no. of parallel connections as you mentioned - either on server or client side. For instance, Firefox has default of network.http.max-connections-per-server = 15 (see here) while you have >70 files to be downloaded in your domain and next 40 from Facebook.
You can reduce number of loaded images by generating sprites i.e. the image consisting of multiple small images, and then using CSS to diplay them properly in places that you want. This is widely used e.g. by Google - see http://www.google.com/images/nav_logo83.png

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas