Is there some kind of limit regarding the packet size in an SSL/HTTPS connection?

In the last couple of weeks I've been developing a boot loader that performs a firmware update on a certain device. The setup is as follows:
The firmware binary and its respective SHA1 hash are stored in a web server;
The device is composed of an ESP8266 and an STM32 microcontroller (STM32F401 or STM32F030; there are two hardware versions, but the one I'm using is the F401). The ESP is used only with AT+ commands, i.e., I did not build its firmware, just used the latest version from Espressif.
The idea is that the STM32 bootloader should use the ESP to download the firmware hash and binary from the web server and then boot the firmware if the hash is OK. The download is made using the ESP in passive mode, i.e. the STM has to manually request X bytes to read from the ESP buffer; currently I'm using 1 MTU (1460 bytes).
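A rough sketch of the download-and-verify flow in the list above, written in Python for readability rather than as STM32 code. `read_chunk` is a hypothetical stand-in for the AT-command request that pulls up to `max_len` bytes from the ESP buffer, and the function names are made up for illustration; a real bootloader would also write each chunk to flash.

```python
import hashlib
import io

MTU_PAYLOAD = 1460  # bytes requested per read, as in the setup above


def download_and_hash(read_chunk, total_len, chunk_len=MTU_PAYLOAD):
    """Pull total_len bytes via read_chunk and return (sha1_hex, read_sizes).

    read_chunk(max_len) is a hypothetical callable that returns at most
    max_len bytes and may return fewer.
    """
    sha1 = hashlib.sha1()
    sizes = []
    received = 0
    while received < total_len:
        data = read_chunk(min(chunk_len, total_len - received))
        if not data:
            raise IOError("connection closed before the image was complete")
        sha1.update(data)          # hash incrementally, chunk by chunk
        sizes.append(len(data))    # keep the sizes to spot short reads
        received += len(data)
    return sha1.hexdigest(), sizes


def firmware_ok(computed_hex, expected_hex):
    """Compare the computed hash with the one fetched from the server."""
    return computed_hex.lower() == expected_hex.strip().lower()
```

Logging `sizes` makes it easy to see exactly where reads start coming back shorter than the requested length.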
At first, the connection to the web server was made using HTTP and everything worked perfectly. However, I had to change it to HTTPS, and that's where the problem starts. After the STM has received around 100kB of the firmware (which is 110kB in total), the ESP only provides 30 bytes per request (when it should provide around 1 MTU), which makes the download take extremely long.
I've already done some digging trying to find out if this is related to the ESP, but didn't find anything. Also, the point where this 30-byte download rate starts to happen isn't always at the 100kB mark. I've tested with a 170kB firmware and it started to happen at around 160kB, so it looks like it's always the last 10kB.
I've also added some delays in the firmware when the packet size becomes smaller than 1 MTU, to give more time for the ESP to process the packet, since the SSL decryption stuff takes longer to process; but it did not help.
My question is: is there some characteristic in the HTTPS/SSL protocols that reduces the packet length? What could be the causes of what is happening here?

Related

Trouble with RTMP ingest chunk stream

I am trying to build my own client RTMP library for an app that I am working on. So far everything has gone pretty successfully: I am able to connect to the RTMP server, negotiate the handshake, and then send all the necessary packets (FCPublish, Publish, etc.). From the server I then get the onStatus message NetStream.Publish.Start, which means that I have successfully got the server to allow me to start publishing my live video broadcast. Wireshark also confirms that the information (the data packetizing) is correct, as it shows up correctly there as well.
Where I am having some trouble is RTMP chunking. The Adobe RTMP Specification shows, on pages 17 and 18, an example of how a message is chunked. From this example I can see that it is broken down based on the chunk size (128 bytes). For me the chunk size gets negotiated in the initial connect and exchange, and it is always 4096 bytes. So when I am sending video data that is larger than 4096 bytes, I need to chunk the message: send the RTMP packet header combined with the first 4096 bytes of data, then send a small RTMP header, 0xc4 (0xc0 | packetHeaderType (0x04)), combined with the next 4096 bytes of video data, and repeat until the full packet specified by the header has been sent. Then a new frame comes in and the same process is repeated.
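The chunking procedure described in the paragraph above can be sketched roughly as follows. This is a minimal Python illustration, not libRTMP's code: `chunk_rtmp_message` is a made-up name, `full_header` is assumed to be an already-built first chunk header, and the 0x04 is treated as the chunk stream ID that gets OR-ed into the one-byte type 3 header.

```python
def chunk_rtmp_message(full_header, payload, chunk_size=4096, csid=4):
    """Split one RTMP message into chunks and return the bytes to send.

    full_header: the first chunk's header (basic header plus message
    header), assumed to be built elsewhere.
    Each following chunk starts with a one-byte type 3 basic header,
    0xC0 | csid, which is 0xC4 for chunk stream ID 4.
    """
    if not 2 <= csid <= 63:
        raise ValueError("a one-byte basic header only covers chunk stream IDs 2..63")
    continuation = bytes([0xC0 | csid])
    out = bytearray(full_header)
    out += payload[:chunk_size]                 # first chunk: full header + data
    for offset in range(chunk_size, len(payload), chunk_size):
        out += continuation                     # 0xC4 before every later chunk
        out += payload[offset:offset + chunk_size]
    return bytes(out)
```

Note that the chunk size counts payload bytes only, so the header bytes are not part of the 4096. Also, as far as the specification describes it, the maximum chunk size is maintained independently for each direction, so the value a server announces applies to the chunks it sends, and a client keeps using 128 until it sends its own Set Chunk Size message.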
Checking other RTMP client examples written in different languages, this seems to be what they are all doing. Unfortunately, the ingest server that I am trying to stream to is not picking up the broadcast video data. It doesn't close the connection on me; it just never shows video or any sign that the video is right. Wireshark shows that after the video atom packet is sent, most packets are Unknown (0x0) for a little while, and then they flip-flop between Unknown (0x0) and Video Data. However, if I restrict my maximum payload size to 20000 bytes, Wireshark shows everything as Video Data. Obviously the ingest server will not show video in this situation, as I am cutting out chunks of data to keep each payload at 20k bytes.
Trying to figure out what is going wrong, I started another Xcode project that lets me spoof an RTMP server on my LAN so that I can see what the data looks like from libRTMP on iOS as it comes into the server. With libRTMP I can also make it log the packets it sends, and it seems to inject the byte 0xc4 every 128 bytes even though I have sent the Change Chunk Size message as the server. When I try to replicate this in my RTMP client library by using a chunk size of 128 even though it has been set to 4096 bytes, the server closes the connection on me. However, if I point libRTMP at the live RTMP server, it still prints out that it is sending packets with a chunk size of 128, and the server seems to accept it, as video is showing up. When I look at the data coming in on my RTMP server, I can see that it is all there.
Anyone have any idea what could be going on?
While I haven't worked specifically with RTMP, I have worked with RTSP/RTP/RTCP pretty extensively, so, based on that experience and the bruises I picked up along the way, here are some random, possibly-applicable tips that might help/things to look for that might be causing an issue:
Does your video encoding match what you're telling the server? In other words, if your video is encoded as H.264, is that what you're specifying to the server?
Does the data match the container format that the server is expecting? For example, if the server expects to receive an MPEG-4 movie (.m4v) file but you're sending only an encoded MPEG-4 (.mp4) stream, you'll need to encapsulate the MPEG-4 video stream into an MPEG-4 movie container. Conversely, if the server is expecting only a single MPEG-4 video stream but you're sending an encapsulated MPEG-4 Movie, you'll need to de-mux the MPEG-4 stream out of its container and send only that content.
Have you taken into account the MTU of your transmission medium? Regardless of chunk size, getting an MTU mismatch between the client and server can be hard to debug (and is possibly why you're getting some packets listed as "Unknown" type and others as "Video Data" type). Much of this will be taken care of with most OS' built-in Segmentation-and-Reassembly (SAR) infrastructure so long as the MTU is consistent, but in cases where you have to do your own SAR logic it's very easy to get this wrong.
Have you tried capturing traffic in Wireshark with libRTMP iOS and your own client and comparing the packets side by side? Sometimes a "reference" packet trace can be invaluable in finding that one little bit (or many) that didn't originally seem important.
Good luck!

Having difficulty sending small lwip packets immediately using the lwip API

I am creating a server on a ST Cortex M3 device. I am using the lwip API and FreeRTOS. All is working, but the response time is way off. I am currently using lwip 1.3.2 and FreeRTOS 7.3.
A single client connects to the server and must have some time-critical data sent frequently. These packets are on the order of 6 or so bytes. Other times, I am sending upwards of 20K.
The problem I am having is that these smaller packets seem to be taking forever to be sent. I assume this is because lwip is waiting for more data to be enqueued to make more efficient transmissions. I cannot wait around for 2 or 3 seconds for the data to be sent; the client is expecting the data nominally in a few micro-seconds or milli-seconds.
I have tried using lwip_send and lwip_write. (I understand that one is the same as the other with a flag passed at the end. Just had to try...) I have tried setting TCP_NODELAY on the socket to no avail. I tried to set SO_SNDLOWAT to '1', but this always returned -1, so I do not think it is supported.
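For reference, this is what disabling Nagle's algorithm looks like on a desktop BSD-style socket, sketched in Python. It is not lwIP code, and whether lwIP 1.3.2 treats the option the same way is not confirmed here; the helper names are made up for illustration. Reading the option back is a cheap way to check that the stack actually accepted it.

```python
import socket


def make_low_latency_socket():
    """Create a TCP socket with Nagle's algorithm disabled (TCP_NODELAY)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nodelay_enabled(sock):
    """Read the option back instead of trusting that the set call worked."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
```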
I do not want to redo all of my code using TCP RAW. Is there a way to invoke the tcp_output() function outside of TCP RAW mode? Is there any way to speed things up or is this just how slow lwip TCP with small packets is?
Any and all suggestions are welcome. Thanks.
--EDIT--
I would also like to add that once I am ready to transmit, I make sure that my TX task in FreeRTOS is at the highest priority. There are no other tasks running up to the point at which I call lwip_send/write.
I'm fairly experienced with bare-metal lwIP on Xilinx, and lwIP does not wait to send things out. It will pump packets out as fast as your interrupts are acknowledged by the Ethernet hardware. I've been using UDP only. What comes to mind, though, is that your problem might be on the receive end. If you are doing TCP, maybe those small packets are coming out late because you are having receive issues. What you need to do is find the lowest-level point in the code at which an Ethernet frame is transmitted and put a general-purpose output toggle on that. Then also put a general-purpose output toggle on the point where an Ethernet packet is received. Look at the signals on a scope. If it confirms your hypothesis, move the output toggles around to narrow down the issue. Wash, rinse and repeat until you are down to where the issue is. It's crude and time-consuming, but oftentimes this brute-force approach solves many "impossible" embedded software problems through pure determination. Good luck!

USB CDC device stalling

I'm writing a simple virtual serial port device to report an older serial port. By this point I'm able to enumerate the device and send/receive characters.
After a varying number of bulk-out transmissions from the host to the device, the endpoint appears to give up and stop transferring data. On the PC side I receive a write error, and judging from a USBlyzer trace the music stops on a stall (USBD_STATUS_STALL_PID). However, my code never intentionally issues a STALL condition on that endpoint, and the status flag for having generated one never gets set.
Given the short amount of time elapsed (<300 µs) between issuing the request and the STALL it would appear to be an invalid response of some sort, and not a time-out. On the device side the output endpoint is ready to go, with data in the buffer and proper DATA0/1 synchronization, but nothing further ever happens.
Note that the device appears to work fine even for long periods of time until I start sending "large" quantities of data. As near as I can tell the device enumeration/configuration also appears to complete successfully. Oh, and the bulk-in endpoint continues to work just fine after this.
For the record I'm using the standard Windows usbser.sys driver and an XMega128A4U µP. I'm also seeing the same behaviour across multiple Windows Vista and 7 machines.
Any ideas what I'm doing wrong or what further tests I might run to narrow things down?
USBlyzer log,
USB CDC stack,
test project
For the record this eventually turned out to be an oscillator problem. (Apparently the FLL's reference is always 1,024 Hz even when the 1,000 Hz USB frames are chosen. The slight clock error meant that a packet occasionally got rejected if it happened to contain one too many 1-bits in a row.)
I guess the moral of the story is to check the basics before assuming you've got a problem with the higher-level protocol. Also, in retrospect a hardware USB analyzer would have been a worthwhile investment; the software alternatives mostly seem to spit out a generic error code or nothing at all when something goes awry.
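The remark about "one too many 1-bits in a row" relates to USB's line coding: with NRZI a 1 bit produces no transition on the wire, and the transmitter inserts a 0 after six consecutive 1s so that the receiver regularly gets an edge to resynchronize on. Long runs of 1s are therefore where a clock error has the most time to accumulate. A minimal Python sketch of that stuffing rule follows; it is illustrative only (the function names are made up) and is not how the XMega hardware implements it.

```python
def bit_stuff(bits, max_ones=6):
    """Insert a 0 after every run of max_ones consecutive 1 bits."""
    out = []
    run = 0
    for b in bits:
        out.append(b)
        if b == 1:
            run += 1
            if run == max_ones:
                out.append(0)   # stuffed bit forces a transition under NRZI
                run = 0
        else:
            run = 0
    return out


def longest_run_of_ones(bits):
    """Length of the longest run of 1s, i.e. the longest stretch without an edge."""
    best = run = 0
    for b in bits:
        run = run + 1 if b == 1 else 0
        best = max(best, run)
    return best
```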
Stalling the out-endpoint may happen on an overflow of the output buffer on the host side. Are you sure that the device does fetch the data it receives via out-endpoint - and if so does it fetch the data at least as fast as data is sent to the device?
Note that the device appears to work fine even for long periods of
time until I start sending "large" quantities of data.
This seems to be a hint for an overflow of the output-buffer.

Using WinPCap for UDP receiving

I would like to use the WinPCap library for "reliable" UDP receiving in my C++ application. All the examples that I found use this library for capturing and then processing. Is there any way (or an example) to configure PCap for a streaming mode and receive UDP only, on a user-defined port, or some other way to solve this? At the moment I have a reliable UDP server able to receive 0.5Gb/s. But on a slower PC I have packet loss: I can see the packets in Ethereal but not in the application.
thanks
vsm
I assume that you have already tried all of the more standard methods of increasing the number of datagrams that you are able to process? Things like increasing the recv buffer size, speeding up the processing that you do per datagram and using IOCP to allow you to bring more threads to bear on the problem or using RIO if you can target Windows 8?
If so then using WinPCap might work but it sounds like a bit of an extreme solution.
What you need to do is create a filter so that you only capture the datagrams that you are interested in... The docs include examples which use filters.
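Of the standard methods listed above, enlarging the receive buffer is the cheapest to try. The following is a small Python sketch of the idea using plain sockets rather than WinPCap; the function name and the 4 MiB default are assumptions made for illustration. The OS may clamp or adjust the requested size, so the value is read back rather than assumed.

```python
import socket


def make_udp_receiver(port, host="0.0.0.0", rcvbuf_bytes=4 * 1024 * 1024):
    """Bind a UDP socket to a user-defined port with a larger receive buffer.

    Returns (sock, granted) where granted is the buffer size the OS reports
    after the request, which may differ from rcvbuf_bytes.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    sock.bind((host, port))
    granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return sock, granted
```

A bigger buffer only absorbs bursts; if the per-datagram processing is slower than the arrival rate on average, packets will still be dropped eventually.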
I have a server from here: http://www.gamedev.net/topic/533159-article-using-udp-with-iocp/. This code works with IOCP. It works fine on Windows XP; there is no problem with receiving 0.5Gb/s. But now on Win7 it is a little unreliable. Sometimes there are packet position errors. (My device generates UDP packets, and in the payload there is a PacketNumber, a number that increases with each packet. When an error occurs I write all packet numbers into a file. I can see, for example: 10,11,290,13,14... ). Are there any known differences between WinXP and Win7 for IOCP and multithreading? Or do you know any free UDP server with IOCP processing?
In the processing loop I am only adding packets to a buffer and checking their numbers.
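The packet-number check described above could look roughly like this in Python. It is a sketch under the assumption that PacketNumber increases by exactly one per packet and never wraps; the function name is made up for illustration.

```python
def find_sequence_errors(packet_numbers):
    """Return (index, expected, got) for every packet that breaks the +1 order."""
    errors = []
    expected = None
    for index, number in enumerate(packet_numbers):
        if expected is not None and number != expected:
            errors.append((index, expected, number))
        expected = number + 1   # resynchronize on whatever actually arrived
    return errors
```

For the sample sequence 10,11,290,13,14 this reports two breaks, one where 290 arrives instead of 12 and one where 13 follows 290, so a single stray packet shows up as a pair of entries. With several IOCP worker threads, it is also worth checking whether the numbers are being recorded in completion order rather than arrival order before concluding that packets were lost.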

When do USB Hosts require a zero-length IN packet at the end of a Control Read Transfer?

I am writing code for a USB device. Suppose the USB host starts a control read transfer to read some data from the device, and the amount of data requested (wLength in the Setup Packet) is a multiple of the Endpoint 0 max packet size. Then after the host has received all the data (in the form of several IN transactions with maximum-sized data packets), will it initiate another IN transaction to see if there is more data even though there can't be more?
Here's an example sequence of events that I am wondering about:
USB enumeration process: max packet size on endpoint 0 is reported to be 64.
SETUP-DATA-ACK transaction starts a control read transfer, wLength = 128.
IN-DATA-ACK transaction delivers first 64 bytes of data to host.
IN-DATA-ACK transaction delivers last 64 bytes of data to host.
IN-DATA-ACK with zero-length DATA packet? Does this transaction ever happen?
OUT-DATA-ACK transaction completes Status Phase of the transfer; transfer is over.
I tested this on my computer (Windows Vista, if it matters) and the answer was no: the host was smart enough to know that no more data can be received from the device, even though all the packets sent by the device were full (maximum size allowed on Endpoint 0). I'm wondering if there are any hosts that are not smart enough, and will try to perform another IN transaction and expect to receive a zero-length data packet.
I think I read the relevant parts of the USB 2.0 and USB 3.0 specifications from usb.org but I did not find this issue addressed. I would appreciate it if someone can point me to the right section in either of those documents.
I know that a zero-length packet can be necessary if the device chooses to send less data than the host requested in wLength.
I know that I could make my code flexible enough to handle either case, but I'm hoping I don't have to.
Thanks to anyone who can answer this question!
Read the USB specification carefully:
The Data stage of a control transfer from an endpoint to the host is complete when the endpoint does one of
the following:
Has transferred exactly the amount of data specified during the Setup stage
Transfers a packet with a payload size less than wMaxPacketSize or transfers a zero-length packet
So, in your case, when wLength == transfer size, the answer is NO, you don't need a ZLP.
In case wLength > transfer size, and (transfer size % ep0 size) == 0, the answer is YES, you need a ZLP.
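The two cases above fit in a few lines. This is a minimal Python sketch of the rule as stated in this answer; the function and parameter names are made up for illustration.

```python
def control_read_needs_zlp(w_length, transfer_size, ep0_max_packet):
    """Decide whether the device must end the Data stage with a zero-length packet.

    w_length: bytes requested by the host in the Setup packet.
    transfer_size: bytes the device actually has to send.
    ep0_max_packet: max packet size of endpoint 0.
    """
    if transfer_size > w_length:
        raise ValueError("a device must not send more than wLength bytes")
    if transfer_size == w_length:
        return False    # host already knows the Data stage is complete
    # Shorter than requested: a short last packet ends the stage by itself,
    # so a ZLP is only needed when the data ends on a packet boundary.
    return transfer_size % ep0_max_packet == 0
```

With the numbers from the question (wLength = 128, 128 bytes sent, 64-byte endpoint 0) the function returns False.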
In general, USB uses a less-than-max-length packet to demarcate an end-of-transfer. So in the case of a transfer which is an integer multiple of max-packet-length, a ZLP is used for demarcation.
You see this in bulk pipes a lot. For example, if you have a 4096 byte transfer, that will be broken down into an integer number of max-length packets plus one zero-length-packet. If the SW driver has a big enough receive buffer set up, higher-level SW receives the entire transfer at once, when the ZLP occurs.
Control transfers are a special case because they have the wLength field, so ZLP isn't strictly necessary.
But I'd strongly suggest SW be flexible to both, as you may see variations with different USB host silicon or low-level HCD drivers.
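To make the bulk-pipe example concrete, here is a small Python sketch that lists the packet sizes for a transfer under the rule described above. The 512-byte max packet size is an assumed value for illustration, the function name is made up, and real host or device stacks may handle the trailing zero-length packet differently.

```python
def bulk_packet_sizes(transfer_len, max_packet=512):
    """Return the data packet sizes for one transfer, ending with a short packet.

    A transfer that is an exact multiple of max_packet gets a trailing
    zero-length packet to mark its end.
    """
    full_packets, remainder = divmod(transfer_len, max_packet)
    sizes = [max_packet] * full_packets
    # The remainder is either a genuine short packet or, when it is 0, the ZLP.
    sizes.append(remainder)
    return sizes
```

For a 4096-byte transfer this gives eight 512-byte packets followed by a 0, while a 1000-byte transfer ends on a 488-byte short packet and needs no ZLP.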
I would like to expand on MBR's answer. The USB specification 2.0, in section 5.5.3, says:
The Data stage of a control transfer from an endpoint to the host is
complete when the endpoint does one of the following:
Has transferred exactly the amount of data specified during the Setup stage
Transfers a packet with a payload size less than wMaxPacketSize or transfers a zero-length packet
When a Data stage is complete, the Host Controller advances to the
Status stage instead of continuing on with another data transaction.
If the Host Controller does not advance to the Status stage when the
Data stage is complete, the endpoint halts the pipe as was outlined in
Section 5.3.2. If a larger-than-expected data payload is received from
the endpoint, the IRP for the control transfer will be
aborted/retired.
One of the sentences in that quote (the one beginning "If the Host Controller does not advance to the Status stage") seems to specifically say what the device should do: it should "halt" the pipe if the host tries to continue the data phase after it was done, and it is done if all the requested data has been transmitted (i.e. the number of bytes transferred is greater than or equal to wLength). I think halting refers to sending a STALL packet.
In other words, the device does not need a zero-length packet in this situation and in fact the USB specification says it should not provide one.
You don't have to. (*)
The whole point of wLength is to tell the host the maximum number of bytes it should attempt to read (but it might read less!)
(*) I have seen devices that crash when IN/OUT requests were made at an incorrect time during control transfers (when debugging our host solution). So any host doing what you are worried about would have killed those devices and is hopefully not on the market.