How to stop overflowing file descriptors in C

I am writing an 88 KB message to a file descriptor. The file descriptor on my Linux system can only hold 64 KB. Once the data is written to the file descriptor, it gets read and piped into a TCP connection.
How do I know when I can write more data to the file descriptor, meaning the file descriptor is empty? I need a function that blocks until the file descriptor is empty, or that at least returns a value letting me know how much data I can safely write to it.

I think select() is probably what you want. With the right arguments, you can arrange for it to block until your file descriptor is ready for your I/O operations.

Your question is worded a little strangely, though, since we don't generally refer to file descriptors as having a capacity, or being "empty". It sounds like maybe you're talking about a file descriptor that represents one endpoint of a pipe, and it's the pipe (not the file descriptor) that has a capacity of 64 KB.
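
For example, a minimal sketch in plain POSIX (wait_until_writable is my name for the helper, not a standard call):

    #include <sys/select.h>
    #include <unistd.h>

    /* Block until fd can accept at least some data without blocking.
     * Returns 0 when writable, -1 on error. A real program would also
     * handle EINTR and perhaps a timeout. */
    static int wait_until_writable(int fd)
    {
        fd_set wfds;
        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        return select(fd + 1, NULL, &wfds, NULL, NULL) == 1 ? 0 : -1;
    }

Note that select() only tells you the descriptor can accept some data, not how much; a following write() may still be short.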

Sounds to me more like the OP is trying to send a datagram (AF_UNIX or AF_INET/UDP) larger than the configured maximum packet size. Other possibilities are pipes and FIFOs, which have maximum atomic write sizes; the fix here is to do multiple writes, and successive writes will block until there is room for them in the pipe/FIFO.
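
A sketch of that pattern, looping over short writes and letting the kernel block when the pipe is full (write_all is an illustrative name):

    #include <errno.h>
    #include <unistd.h>

    /* Write the whole buffer, retrying after short writes. On a pipe
     * or FIFO, write() simply blocks until the reader has drained
     * enough room, so no capacity query is needed. */
    static int write_all(int fd, const char *buf, size_t len)
    {
        while (len > 0) {
            ssize_t n = write(fd, buf, len);
            if (n < 0) {
                if (errno == EINTR)
                    continue;   /* interrupted by a signal; retry */
                return -1;      /* real error */
            }
            buf += n;
            len -= (size_t)n;
        }
        return 0;
    }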

Related

A rarely mentioned Vulkan function "vkCmdUpdateBuffer()", what is it used for?

This seems to be a simple Vulkan API question, but I really cannot find an answer after searching the Internet.
I noticed there is a Vulkan function:
    void vkCmdUpdateBuffer(
        VkCommandBuffer    commandBuffer,
        VkBuffer           dstBuffer,
        VkDeviceSize       dstOffset,
        VkDeviceSize       dataSize,
        const void*        pData);
At first glance, I thought it could be used to record into the command buffer, since it has the prefix vkCmd in its name, but the documentation says that
vkCmdUpdateBuffer is only allowed outside of a render pass. This command is treated as “transfer” operation, for the purposes of synchronization barriers.
So I started thinking that it is a convenience function that wraps the buffer data transfer operation, like using memcpy() to copy the data from host to device.
Then my question is: why is there NOT a single Vulkan sample/tutorial (I have searched all of them) that uses vkCmdUpdateBuffer() instead of manually copying data with memcpy()? Did I understand it wrong?
All vkCmd* functions generate commands into a command buffer. This one is no exception. It is a transfer command, and like most transfer commands, you don't get to do them within a render pass. But there are plenty of command buffer generating commands that don't work in render passes.
Normally Vulkan memory transfer operations only happen between device memory. The typical mechanism for the host to put something in device memory is to write to a mapped pointer. But by definition, that requires that the destination memory be mappable. So if you want to write something to non-mappable memory, you have to copy it to mappable memory, then do a transfer operation from the mappable memory to the non-mappable memory via the vkCmdCopy* functions.
And that's fine if you're doing a bunch of transfers all at once. You can copy a bunch of stuff into mapped memory, then submit a batch containing all of the copy operations to copy the data into the appropriate locations.
But sometimes, you're just updating a small piece of device memory. If it's not mappable, then that's a lot of work to do just to get a few kilobytes of data to the GPU. In that case, vkCmdUpdateBuffer may be the better choice, since it can "directly" copy from CPU memory to any device memory.
I say "directly" because that's obviously not what it's doing. It's really doing the same thing you would have done, except it's doing it within the command buffer. You would have copied your CPU data into GPU mappable memory, then created a command that copies from that mappable memory into non-mappable memory.
vkCmdUpdateBuffer does the exact same thing. It copies the data from the pointer/size you give it into mappable memory (which is provided by the command buffer itself; this is why it has an upper limit of 64 KB). This copy happens immediately, just as it would have if you did a memcpy, so when this function returns, you can do whatever you want with the pointer you gave it. Then it creates a command in the command buffer that copies from the mappable memory in the command buffer to the destination memory location.
The documentation for this function explicitly gives warnings about using it for larger transfers. That is, it tells you not to do that. This is for quick, small, one-shot updates of unmappable memory. Nothing more.
That's one reason why tutorials don't talk about it: it's a highly special-case function that many novice users will try to use because it's easier than the explicit code. But in most cases, they should not be using it.
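
For completeness, here is roughly what a small update might look like, recorded outside a render pass and followed by a barrier so later reads see the transfer. The handle names (cmdBuf, uniformBuffer) and the PerFrameData struct are hypothetical:

    #include <vulkan/vulkan.h>

    typedef struct { float mvp[16]; } PerFrameData;  /* 64 bytes, a multiple of 4 */

    /* Assumes cmdBuf is in the recording state, outside a render pass, and
     * uniformBuffer was created with VK_BUFFER_USAGE_TRANSFER_DST_BIT. */
    void record_small_update(VkCommandBuffer cmdBuf, VkBuffer uniformBuffer,
                             const PerFrameData *data)
    {
        /* dataSize must be a multiple of 4 and at most 65536 bytes. */
        vkCmdUpdateBuffer(cmdBuf, uniformBuffer, 0, sizeof(PerFrameData), data);

        /* It is treated as a "transfer" operation, so make the write
         * visible to the stage that will read the buffer. */
        VkBufferMemoryBarrier barrier = {
            .sType = VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER,
            .srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT,
            .dstAccessMask = VK_ACCESS_UNIFORM_READ_BIT,
            .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
            .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
            .buffer = uniformBuffer,
            .offset = 0,
            .size = sizeof(PerFrameData),
        };
        vkCmdPipelineBarrier(cmdBuf,
                             VK_PIPELINE_STAGE_TRANSFER_BIT,
                             VK_PIPELINE_STAGE_VERTEX_SHADER_BIT,
                             0, 0, NULL, 1, &barrier, 0, NULL);
    }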

Where are buffers located?

I hear a lot about flushing buffers, sending to a buffer, etc., but I don't have a visual image of where buffers reside and what they look like.
Are buffers part of the OS kernel or part of each process? If the former, can the same buffers be used by multiple processes?
A buffer is a generic term for a collection of bytes, typically used in the context of either sending, receiving or storing information where the internal data-structure of the information isn't important.
In the case of "flushing" buffers, this typically comes up in the context of sending data to a file or network; the buffer in this case is used to coalesce multiple small writes into one larger and more-efficient-to-transmit buffer. After the final write has been performed (or after some "commit" point), the buffer must be "flushed" to ensure that any data left waiting to coalesce with a future write is committed immediately to the underlying file or sent over the network, rather than left waiting for a future write that might never come.
In both the case of network and file IO, buffers are usually used in multiple places. File IO may well be buffered by a buffer in the application, in a library (for instance, an implementation of fwrite may buffer the output), in the kernel, and even on the device itself: network writes may well be buffered by the device whilst waiting for bandwidth on the wire, and hard-disk drives will buffer output from the OS to ensure that data isn't lost as the physical platters spin to the correct position for the write.
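
To make the application/library layer concrete, here is a minimal C sketch of stdio buffering: bytes accumulate in a user-space buffer until fflush() pushes them down to the kernel (which may buffer them further):

    #include <stdio.h>

    int main(void)
    {
        char buf[BUFSIZ];

        /* Give stdout an explicit user-space buffer; setvbuf must be
           called before any other operation on the stream. */
        setvbuf(stdout, buf, _IOFBF, sizeof buf);

        printf("queued in the stdio buffer, not necessarily written yet\n");

        /* Hand the buffered bytes to the kernel with write(2); the page
           cache and the device may still buffer them after this. */
        fflush(stdout);
        return 0;
    }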

IPC through writing into files?

I have a question about Inter-process-communication in operating systems.
Can 2 processes communicate with each other by both opening the same file (which, say, was created before both processes, so both have a handle to the file) and then communicating by writing into this file?
If yes, what does this method come under? I have heard that the 2 major ways of IPC are shared memory and message passing. Which of these does this method come under?
The reason I am not sure whether it comes under shared memory is that this file is not mapped into the address space of either of these processes, and, from my understanding, in shared memory the shared memory region is part of the address space of both processes.
Assume that the processes write into the file in some pre-agreed protocol/format, so both have no problem knowing where and when the other process writes. This assumption is merely to aid understanding; in the real world it may be too stringent to hold true.
If no, what is wrong with this scenario? Is it that when 2 different processes open the same file, the changes made by the first process are not flushed to persistent storage for others to view until the process terminates? Or something else?
Any real world example from Windows and Linux should also be useful.
Using a file is a kind of shared memory. Instead of allocating a common memory buffer in RAM, a common file is used.
To successfully manage the communication some kind of locking mechanism for different ranges in the file is needed. This could either be locking of ranges provided by the file system (available at least on Windows) or global operating system mutexes.
One real-world scenario where disk storage is used for inter-process communication is the quorum disk used in clusters. It is a common disk resource, accessible over a SAN by all cluster nodes, that stores the cluster's configuration.
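
As an illustration of the range locking mentioned above, a POSIX sketch using fcntl() record locks (Windows offers LockFileEx for the same idea; lock_range is an illustrative name):

    #include <fcntl.h>
    #include <unistd.h>

    /* Take an advisory write lock on bytes [off, off+len) of fd,
     * blocking until any competing lock is released. */
    static int lock_range(int fd, off_t off, off_t len)
    {
        struct flock fl = {
            .l_type   = F_WRLCK,
            .l_whence = SEEK_SET,
            .l_start  = off,
            .l_len    = len,
        };
        return fcntl(fd, F_SETLKW, &fl);  /* F_SETLKW blocks; F_SETLK fails instead */
    }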
The POSIX system call mmap() maps files into virtual memory. If the mapping is shared between two processes, writes to that area in one process will affect the other. Now, coming to your question: yes, a process reading from or writing to the underlying file will not always see the same data as the process that has mapped it, since the segment of the file is copied into RAM and only periodically flushed to disk. Although I believe you can force synchronization with the msync() system call. Do read up on mmap(); it has a host of other memory-sharing options.
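
A minimal sketch of the shared-mapping approach, assuming a hypothetical file name ipc.dat and a one-page region:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        /* Both processes would open (and map) the same file. */
        int fd = open("ipc.dat", O_RDWR | O_CREAT, 0600);
        if (fd < 0) { perror("open"); return 1; }

        /* The file must be large enough to back the mapping,
           or touching the pages raises SIGBUS. */
        if (ftruncate(fd, 4096) < 0) { perror("ftruncate"); return 1; }

        /* MAP_SHARED makes stores visible to every process mapping the file. */
        char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        strcpy(p, "hello from process A");

        /* Force the dirty pages back to the underlying file now. */
        msync(p, 4096, MS_SYNC);

        munmap(p, 4096);
        close(fd);
        return 0;
    }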

process control block vs process descriptor

What is the exact difference between a process control block and a process descriptor?
I was reading about the Linux kernel. It was written that there is a thread_info structure which contains a pointer to the actual process descriptor, and that thread_info lies just above/below the kernel stack. So thread_info is definitely in main memory. But what about the actual process descriptor, task_struct? Where is it located? If the process descriptor resides in main memory, where exactly is it placed?
The thread_info and task_struct structures are just two different structures that hold different pieces of information about a thread, with the thread_info holding more architecture-specific data than the task_struct. It makes more sense to split up the information rather than keep it all in the same structure. (Although you could put them in the same struct; the 2.4 Linux kernel did this.)
How those structs are allocated depends on the architecture you're using. The relevant functions you want to examine are alloc_task_struct() and alloc_thread_info().
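
As a rough illustration (not current kernel source; field lists are abbreviated, and this reflects the older layout in which thread_info sits at the base of each task's kernel stack):

    struct task_struct;  /* the process descriptor; lives in kernel memory,
                            allocated by the slab allocator */

    struct thread_info {
        struct task_struct *task;   /* back-pointer to the owning task_struct */
        unsigned long       flags;  /* low-level TIF_* flags */
        /* ... architecture-specific fields ... */
    };

    /* With an 8 KB kernel stack, masking the stack pointer recovers the
     * thread_info, and through it the process descriptor. */
    #define THREAD_SIZE 8192
    static inline struct thread_info *thread_info_of(unsigned long sp)
    {
        return (struct thread_info *)(sp & ~(unsigned long)(THREAD_SIZE - 1));
    }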
In the kernel, the process descriptor is a structure called task_struct, which keeps track of process attributes and information. All kernel information regarding a process is found there.

Max file size for File.ReadAllLines

I need to read and process a text file. My processing would be easier if I could use the File.ReadAllLines method, but I'm not sure what the maximum size of a file that can be read with this method is, without reading by chunks.
I understand that the limit depends on the computer's memory. But are there still any recommendations for an average machine?
On a 32-bit operating system, you'll get at most a contiguous chunk of memory of around 550 megabytes, allowing you to load a file of half that size. That goes downhill quickly after your program has been running for a while and the virtual memory address space gets fragmented. 100 megabytes is about all you can hope for.
This is not an issue on a 64-bit operating system.
Since reading a text file one line at a time is just as fast as reading all lines, this should never be a real problem.
I've done stuff like this with 1-2GB before, albeit in Python. I do not think .NET would have a problem, though. But I would only do this for one-off processing.
If you are doing this on a regular basis, you might want to go through the file line by line.
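
The thread is about .NET, but the streaming pattern is language-agnostic; as a sketch in C, POSIX getline() keeps only one line in memory at a time (input.txt is a placeholder path):

    #define _POSIX_C_SOURCE 200809L
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        FILE *f = fopen("input.txt", "r");
        if (!f) { perror("fopen"); return 1; }

        char *line = NULL;
        size_t cap = 0;
        ssize_t len;

        /* getline() grows the buffer as needed; the file's total size
           no longer matters because one line is resident at a time. */
        while ((len = getline(&line, &cap, f)) != -1) {
            /* ... process the line ... */
        }

        free(line);
        fclose(f);
        return 0;
    }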
It's bad design unless you know the file sizes versus the computer memory that will be available to the running app.
A better solution would be to consider memory-mapped files. They use themselves as the paging store, so the operating system pages the data in and out on demand instead of holding the whole file in memory.