Pharo: download file in parallel chunks - smalltalk

I'm trying to download a large file in chunks, with each chunk downloaded in parallel with the others. All chunks would be combined at the end.
However, from my understanding, Pharo's basic threading interleaves instead of being truly parallel, meaning one chunk has to wait for the previous to finish.
Is there a way to accomplish something like this?
Note: apologies for lack of code. I had a "working" project, but I don't have access to that code at the moment.

one chunk has to wait for the previous to finish
The key is that with non-blocking sockets you don't have to wait for a chunk to finish. Look for something like SocketStream>>isDataAvailable and SocketStream>>nextAvailable and read only what is available without blocking.

Related

The best way to add a new 3D object at runtime

Until now, all my 3D objects were created at startup. But now I need to add them dynamically. What could be simpler, I thought...
The main issue right now is how to upload the new object's data as fast as possible, and how to find out when the upload has finished.
Here's my setup:
I'm using the Vulkan Memory Allocator library, so I'm free from the memory management burden.
I'm planning to use a separate VkBuffer for every object - this way I don't need to manage offsets and alignments, and it will be easier to add/remove objects.
And here are my thoughts/questions:
How do I upload the data? I want the buffer to be GPU-visible only, which means I need a staging buffer.
If I use a staging buffer, I need to know when the data is ready to use on the GPU. I don't want to flush the pipeline and wait. The only way I see is to use a fence per object and only issue the draw command once this fence is signaled.
If I use a staging buffer and want to upload multiple objects within one short frame, I somehow need to make sure that parts of the staging buffer are not overwritten by different objects. For this, I need to keep it big and handle alignment for the offsets. But how big?
I'm pretty sure I'm overcomplicating. I believe there should be a much simpler pattern. How would you do this?
I believe there should be a much simpler pattern.
It's Vulkan; it's an explicit, low-level API. "Simple" is not its goal.
Overall, your Vulkan code needs to be written to adapt to the capabilities of the hardware. That's the best way to get performance out of it.
The first decision that needs to be made is whether you need staging at all. Staging (for buffer copies) is only necessary if your device's DEVICE_LOCAL memory is not mappable. And yes, there are (integrated) GPUs that allow you to map DEVICE_LOCAL memory. If that is the case, then you can just write directly to where you need the data to go.
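For illustration, here is a minimal sketch of that check, assuming you only need it for buffer uploads (the helper name is mine; the query itself is standard Vulkan):

#include <vulkan/vulkan.h>
#include <cstdint>

// Returns the index of a memory type that is both DEVICE_LOCAL and
// HOST_VISIBLE, or -1 if no such type exists (i.e. staging is required).
int32_t findMappableDeviceLocalMemoryType(VkPhysicalDevice gpu)
{
    VkPhysicalDeviceMemoryProperties props{};
    vkGetPhysicalDeviceMemoryProperties(gpu, &props);

    const VkMemoryPropertyFlags wanted =
        VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT | VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT;

    for (uint32_t i = 0; i < props.memoryTypeCount; ++i)
    {
        if ((props.memoryTypes[i].propertyFlags & wanted) == wanted)
            return static_cast<int32_t>(i);
    }
    return -1; // no mappable DEVICE_LOCAL memory: fall back to a staging buffer
}

If this returns a valid index (and the buffer's memory requirements allow it), you can allocate the object's buffer there, map it, and write the data directly.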
If staging is needed, then you need to decide if the hardware supports an independent transfer-only queue. If so, then you will likely get performance benefits by employing it. Not all hardware supports transfer-only queues, so your application needs to adapt. Also, transfer-only queues can have restrictions on the granularity of memory transfers taking place on those queues, so you need to check to see if your streaming strategy fits within the limits of that particular hardware.
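As a rough example of how that adaptation might look when picking a queue family (again just a sketch, with my own naming):

#include <vulkan/vulkan.h>
#include <vector>
#include <cstdint>

// Returns the index of a queue family that supports transfer but neither
// graphics nor compute, or -1 if the hardware has no dedicated transfer family.
int32_t findDedicatedTransferQueueFamily(VkPhysicalDevice gpu)
{
    uint32_t count = 0;
    vkGetPhysicalDeviceQueueFamilyProperties(gpu, &count, nullptr);
    std::vector<VkQueueFamilyProperties> families(count);
    vkGetPhysicalDeviceQueueFamilyProperties(gpu, &count, families.data());

    for (uint32_t i = 0; i < count; ++i)
    {
        const VkQueueFlags flags = families[i].queueFlags;
        if ((flags & VK_QUEUE_TRANSFER_BIT) != 0 &&
            (flags & (VK_QUEUE_GRAPHICS_BIT | VK_QUEUE_COMPUTE_BIT)) == 0)
        {
            // For image transfers, also check families[i].minImageTransferGranularity
            // against your streaming strategy before committing to this family.
            return static_cast<int32_t>(i);
        }
    }
    return -1; // adapt: use a second graphics/compute queue, or share the main queue
}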
Also, if there is no appropriate transfer queue, you can create the effect of a transfer queue by using a second compute or graphics queue... if the hardware supports multiple queues at all. Being able to submit transfer commands and rendering commands on different queues is a good thing, assuming you are taking advantage of threading (i.e., issuing submits of the batches to the different queues on different threads).
If you are able to use a separate queue for transfers (whether a true transfer queue or just a separate compute/graphics queue), then you get to play around with semaphores. The batch that transfers data must signal a semaphore when it completes; this is part of the batch in the vkQueueSubmit call. The batch on the main queue that uses the transferred data for some process needs to wait on that semaphore. So both threads need to be using the same VkSemaphore object. And the wait on the semaphore should just have a global memory barrier, to make the memory visible.
The tricky part is this: you cannot submit the batch that waits on the semaphore until the submit call for the batch that signals it has been submitted. You don't have to wait until completion, but you do have to wait until the vkQueueSubmit call on the transfer queue has returned. So you need a way to transfer the semaphore between different threads, or you could just issue both submit commands on the same thread.
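A sketch of what the two submits might look like when issued from the same thread, in the order described above (all handles are assumed to have been created elsewhere; the function and variable names are mine):

#include <vulkan/vulkan.h>

// Submit the transfer batch, then the render batch that consumes its output.
// The semaphore links the two batches across the two queues.
void submitTransferThenRender(VkQueue transferQueue, VkQueue graphicsQueue,
                              VkCommandBuffer transferCmd, VkCommandBuffer renderCmd,
                              VkSemaphore transferDone)
{
    // Transfer batch: signals the semaphore when it completes.
    VkSubmitInfo transferSubmit{};
    transferSubmit.sType                = VK_STRUCTURE_TYPE_SUBMIT_INFO;
    transferSubmit.commandBufferCount   = 1;
    transferSubmit.pCommandBuffers      = &transferCmd;
    transferSubmit.signalSemaphoreCount = 1;
    transferSubmit.pSignalSemaphores    = &transferDone;
    vkQueueSubmit(transferQueue, 1, &transferSubmit, VK_NULL_HANDLE);

    // Render batch: waits on the semaphore before the stages that read the
    // transferred data (here, vertex input) are allowed to run.
    VkPipelineStageFlags waitStage = VK_PIPELINE_STAGE_VERTEX_INPUT_BIT;
    VkSubmitInfo renderSubmit{};
    renderSubmit.sType              = VK_STRUCTURE_TYPE_SUBMIT_INFO;
    renderSubmit.waitSemaphoreCount = 1;
    renderSubmit.pWaitSemaphores    = &transferDone;
    renderSubmit.pWaitDstStageMask  = &waitStage;
    renderSubmit.commandBufferCount = 1;
    renderSubmit.pCommandBuffers    = &renderCmd;
    vkQueueSubmit(graphicsQueue, 1, &renderSubmit, VK_NULL_HANDLE);
}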
If you aren't using a second queue, then things are slightly simpler.
You still want to build the transfer command buffer itself on a different thread (to take advantage of threading CB construction). But that CB now needs to be communicated to the thread responsible for submitting the rendering stuff. And this channel of communication needs to know that this CB contains transfer commands, which some of the rendering CB processes ought to wait on.
The simplest and most flexible way to do this is to build the transfer CB so that the last command is a vkCmdSetEvent command (and the first command is a vkCmdResetEvent to reset it from previous frames of usage). The submission thread then only needs to create a small CB that only contains a vkCmdWaitEvents command which waits on the transfer event that will be set. That command should issue a full memory barrier, and that CB should execute between the transfer CB and any rendering CBs that read from the transferred data.
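A rough sketch of that recording, with the actual copy commands elided and all handles created elsewhere (names are mine):

#include <vulkan/vulkan.h>

// Transfer CB: built on the loading thread. Resets the event from previous
// frames of usage, records the copies, then signals the event.
void recordTransferCB(VkCommandBuffer transferCmd, VkEvent transferEvent)
{
    vkCmdResetEvent(transferCmd, transferEvent, VK_PIPELINE_STAGE_TRANSFER_BIT);

    // ... vkCmdCopyBuffer(...) calls for the staged data go here ...

    vkCmdSetEvent(transferCmd, transferEvent, VK_PIPELINE_STAGE_TRANSFER_BIT);
}

// Small "wait" CB: built by the submission thread and executed between the
// transfer CB and any rendering CBs that read the transferred data.
void recordWaitCB(VkCommandBuffer waitCmd, VkEvent transferEvent)
{
    // Global memory barrier so the transfer writes become visible to readers.
    VkMemoryBarrier barrier{};
    barrier.sType         = VK_STRUCTURE_TYPE_MEMORY_BARRIER;
    barrier.srcAccessMask = VK_ACCESS_MEMORY_WRITE_BIT;
    barrier.dstAccessMask = VK_ACCESS_MEMORY_READ_BIT;

    vkCmdWaitEvents(waitCmd, 1, &transferEvent,
                    VK_PIPELINE_STAGE_TRANSFER_BIT,     // where the event gets set
                    VK_PIPELINE_STAGE_ALL_COMMANDS_BIT, // who has to wait
                    1, &barrier, 0, nullptr, 0, nullptr);
}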
The flexibility of this is in the structure of the process. It is structured similarly to how the multi-queue version works. In both cases, a separate thread needs to communicate something to the render submission thread (in one case, a semaphore; in the other, a CB and an event). And the render submission thread needs to do things to wait on that "something", but without disrupting the process of building the rendering commands itself (in one case, you just change the batch to wait on the semaphore; in the other, you insert a CB that waits for the event).
If you want to get a bit smarter about execution dependencies, you can even have the transfer operation forward information about which pipeline stages need to wait on the operation. But that's mostly an optimization.
Here's the thing though: all of the staging cases are not performance-friendly. They're problematic because you can't do anything while the transfer operation is going on. And that is the case because... you're trying to read from the memory in the same frame you're writing to it. That's bad.
You should endeavor instead to delay rendering any objects for which loading is not complete. Or put another way, you want to load the data for new objects before you need them, not on the same frame you need them. This is what streaming systems do: they pre-emptively load data that will be needed soon, but not right now.
But how big?
Only you and your use cases can answer that question. If you are streaming in fixed-sized blocks (which you should do where possible), then it's fairly easy: your staging buffer should be one or maybe two streaming blocks in size. If your rendering system is more flexible, imposing few limitations on the higher-level code, then your staging buffer and your streaming system need to be more flexible. And there's no right answer for that; it depends entirely on how it gets used.
Welcome to using explicit, low-level APIs.

How to resume loading in PhpMyAdmin?

I am using XAMPP and phpMyAdmin and I'm trying to load an English Wikipedia dump. Since the file is so big (1.7GB), it takes a lot of time. I'm wondering if there is any way to resume the loading process. I have no problem with timeouts or anything like that. The problem is that if my Firefox crashes for any reason, the process must start from scratch.
The option which says "allow interrupt" is already checked. But the problem is that, for such a big file, it's really difficult to expect the import to finish without any interruption. If the laptop is shut down or restarted, the process starts over from the beginning. Is there any way to solve this problem?
In the meantime, I am using
$cfg['UploadDir'] = 'upload';
and load the file from the upload directory on my computer.
Thanks in advance
First, I would recommend against using phpMyAdmin for such a large file. You're going to be constrained by PHP/Apache resource limits on things such as execution time and memory used (or, apparently, some Firefox resource on the client side), to the degree that even if it works properly, it will have to be done in so many small chunks that it's just not ideal. Even using the UploadDir functionality, you're going to be limited in ways that make it non-ideal to import your file this way. I suggest using the mysql command-line client for importing a file of this size.
Secondly, if you're going to use phpMyAdmin anyway, it's better to uncompress the file and deal with the raw .sql. This is not intuitive, because of course you think the smaller filesize is better, but phpMyAdmin has to first uncompress the compressed file before it can begin working with it, which can cause problems such as the resource limits (or even running out of disk space). phpMyAdmin can pick up an aborted import, but if you're spending 95% of the execution time uncompressing the file each time, you're going to make very, very slow progress. Actually, I wonder if you're even getting the full file uncompressed on execution before PHP kills the process due to timeout.
phpMyAdmin can pick up execution part way through; you can select which line to begin the import from. If you restart your computer part way through the import, you can use this to resume your partial import.

Pentaho text file input step crashing (out of memory)

I am using Pentaho to read a very large file: 11GB.
The process sometimes crashes with an out-of-memory exception, and sometimes it just says the process was killed.
I am running the job on a machine with 12GB of RAM and giving the process 8GB.
Is there a way to run the Text File Input step with some configuration to use less memory? Maybe use the disk more?
Thanks!
Open up spoon.sh/.bat or pan/kettle .sh or .bat and change the -Xmx figure; search for JAVAMAXMEM. Even though you have spare memory, unless Java is allowed to use it, it won't help. Although, to be fair, in your example above I can't really see why or how it would be consuming much memory anyway!

How to reduce PhantomJS's CPU and memory usage?

I'm using PhantomJS via Python's webdriver lib. It eats lots of RAM and CPU, and it's an issue because I'd like to run as many instances as it's possible.
Some google'ing didn't give me anything helpful. So I'll ask directly:
Does the size matter? If I set driver.set_window_size(1280, 1024), will it eat more memory than 1024x768?
Is there any option in the source code which can be turned off without real issues and which leads to a significant reduction in memory usage? Yes, I still need images, CSS and JS to be loaded and applied, but I can get rid of some other features... For example, I can turn off caching (and load all media files every time). Yes, I do need to speed it up and make it less greedy, and I'm ready to re-compile it... Any ideas here?
Thanks a lot!
I assume you call phantomjs once for every rendering job, which creates a new phantomjs process every time. You could try batching as many jobs as you can into one JS script and calling phantomjs once for the whole batch.

Reading a whole file on a network drive the fast way (Windows, C/C++, C#, ...)

Lately I've been having problems reading big files on a network drive and I just can't pinpoint what I may be doing wrong. I tried both in C++ (unmanaged) and in C# and got about the same performance in both... which was somewhat abysmal.
Sometimes it will read a file on the network at 4 KB/s, but if the same file is located on the local HD it will easily achieve the maximum data rate the HD can output. That is when reading 64 KB chunks at a time... I tried with bigger buffers, up to insane numbers, and with smaller ones, and it doesn't make much difference.
I tried async IO in C# with BeginRead on the FileStream and OVERLAPPED IO in C++ as well as synchronous reads and they all had the same problems, which is being slow on the network.
The only solution we came up with is to copy the file to the local HD using the OS CopyFile function before actually reading it, but I'm not too satisfied with this approach. It just seems like CopyFile is doing something we are not, which makes it incredibly faster than our approach.
Does anyone have a clue as to why this is?
We would have to guess, since you aren't showing us your code. My guess is that the Windows file copy opens the file with the FILE_FLAG_SEQUENTIAL_SCAN flag, which in turn causes the file system/cache manager to choose optimal block sizes and submit read requests in anticipation of read calls that haven't been issued yet.
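As a rough illustration of that guess (an untested sketch, not a drop-in fix), opening the file with FILE_FLAG_SEQUENTIAL_SCAN would look something like this:

#include <windows.h>
#include <vector>

// Read a whole file with a hint that we will scan it sequentially, so the
// cache manager can pick large block sizes and read ahead.
bool ReadWholeFileSequential(const wchar_t* path, std::vector<char>& out)
{
    HANDLE file = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                              OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, nullptr);
    if (file == INVALID_HANDLE_VALUE)
        return false;

    std::vector<char> buffer(1 << 20); // 1 MB reads; large reads help on high-latency links
    DWORD bytesRead = 0;
    while (ReadFile(file, buffer.data(), static_cast<DWORD>(buffer.size()),
                    &bytesRead, nullptr) && bytesRead > 0)
    {
        out.insert(out.end(), buffer.data(), buffer.data() + bytesRead);
    }

    CloseHandle(file);
    return true;
}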
We can only assume that you have really tried all possible methods of reading/writing. Have you been reading synchronously or asynchronously? Did you try I/O completion ports? Or the ReadFileEx() function? I would guess that the Windows CopyFile() function detects that you want to read a file from the network and uses a different method for reading than it would use for local disk access.
If you have really exhausted all possible reading methods, and you really need this solved, then I would suggest looking into what the CopyFile() function is actually doing. There are numerous tools for doing that. E.g.: this one (or some other -- links on the same page).