I was looking at a macro that imports several CSVs from a file server. Running the macro takes about 20 seconds to initialize before the first CSV gets imported. The imports themselves happen fairly quickly. If I run the macro a second time, there is no delay.
When I manually open the folder on the file server with Explorer, it also takes quite a while (30 seconds or so) until all the files are shown, so I assume the macro also has to wait until the relevant files are loaded. So, my question: Is there a way to have Excel automatically index that folder so it opens quicker, or can I start a background process when opening the Excel file that would read the folder in advance?
Cheers,
CE
Edit: I cannot archive the folder to make it slimmer.
The file might be cached in memory, thereby avoiding lengthy disk I/O. You need to monitor your machine activity in terms of CPU, I/O and Network activity to figure out where the time is spent. Launch perfmon.msc and add the relevant counters to do so.
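To make the "read the folder in the background" idea from the question concrete, here is a minimal sketch. It is written in Python rather than VBA purely for brevity, and the function names are made up for illustration. It enumerates a folder once in a background thread and lets you time a listing, so you can check whether a second enumeration really is faster on your file server. Whether this helps at all depends on where the 20 seconds are actually spent, which is what the monitoring above should tell you.

```python
import os
import threading
import time


def list_folder(path):
    """Enumerate a folder once and return the names of its regular files."""
    with os.scandir(path) as entries:
        return [entry.name for entry in entries if entry.is_file()]


def warm_folder(path):
    """Start the enumeration in a background thread and return the thread."""
    worker = threading.Thread(target=list_folder, args=(path,), daemon=True)
    worker.start()
    return worker


def timed_listing(path):
    """Return (file names, seconds taken) for one enumeration of the folder."""
    start = time.perf_counter()
    names = list_folder(path)
    return names, time.perf_counter() - start
```

Calling timed_listing() twice on the share and comparing the two durations is a cheap way to see whether caching explains the difference between the first and second macro run.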
I am using XAMPP and phpMyAdmin and I'm trying to load English Wikipedia. Since the file is so big (1.7 GB), it takes a lot of time. I'm wondering if there is any way to resume the loading process. I have no problem with timeouts or anything like that. The problem is that if Firefox crashes for any reason, the process must start from scratch.
The "allow interrupt" option is already checked. But for a file this big, it's unrealistic to expect the import to finish without any interruption. If the laptop is shut down or restarted, the process starts over from the beginning. Is there any way to solve this problem?
In the meantime, I am using
$cfg['UploadDir'] = 'upload';
and load the file from the upload directory on my computer.
Thanks in advance
First, I would recommend against using phpMyAdmin for such a large file. You're going to be constrained by PHP/Apache resource limits for things such as execution time and memory used (or, apparently, some Firefox resource on the client side), to a degree that even if it works properly will have to be done in so many small chunks that it's just not ideal. Even using the UploadDir functionality, you're going to be limited in ways that make it non-ideal to import your file this way. I suggest using the command-line tool for importing a file of this size.
Secondly, if you're going to use phpMyAdmin anyway, it's better to uncompress the file and deal with the raw .sql. This is not intuitive, because of course you think the smaller filesize is better, but phpMyAdmin has to first uncompress the compressed file before it can begin working with it, which can cause problems such as the resource limits (or even running out of disk space). phpMyAdmin can pick up an aborted import, but if you're spending 95% of the execution time uncompressing the file each time, you're going to make very, very slow progress. Actually, I wonder if you're even getting the full file uncompressed on execution before PHP kills the process due to timeout.
phpMyAdmin can pick up execution part way through; you can select which line to begin the import from. If you restart your computer part way through the import, you can use this to resume the partial import.
I am using Pentaho to read a very large file (11 GB).
The process sometimes crashes with an out-of-memory exception, and sometimes it just says "process killed".
I am running the job on a machine with 12 GB of RAM and giving the process 8 GB.
Is there a way to configure the Text File Input step to use less memory? Maybe use the disk more?
Thanks!
Open up spoon.sh/bat or pan/kettle .sh or .bat and change the -Xmx figure (search for JAVAMAXMEM). Even though you have spare memory, Java can't use it unless it is allowed to. Although, to be fair, in your example above I can't really see why or how it would be consuming much memory anyway!
I realize this number will change based on many factors, but in general, when I write data to a hard-drive (e.g. copy a file), how long does it take for that data to actually be written to the platter after Windows says the copy is done?
Could anyone point me in the right direction to discover more on this topic?
If you are looking for a hard number, that is pretty much unknowable. Generally it is on the order of tens to a few hundred milliseconds for the data to start reaching the disk platters, but it can be as high as several seconds in a large server disk array with RAID and de-duplication.
The flow of events goes something like this.
1. The application calls a function like fwrite().
2. This call is handled by the filesystem layer in your operating system, which has to figure out which specific disk sectors are to be manipulated.
3. The SATA/IDE driver in your OS talks to the hard drive controller hardware. On a modern PC, it typically uses DMA to feed the data to the disk.
4. The data sits in a write cache (RAM) inside the hard disk.
5. When the physical platters and heads have made it into position, the disk begins to transfer the contents of the cache onto the platters.
6. Steps 3-5 may repeat several times depending on how much data is to be written and where on the disk it is to be written. Additionally, there is usually filesystem metadata that must be updated (e.g. free space counters), which will trigger more writes to the disk.
The time it takes for steps 1-3 can be unpredictable in a general-purpose OS like Windows because of task scheduling and background threads, and because your disk write is probably queued up behind writes from a few dozen other processes. I'd say it is usually on the order of 10-100 msec on a typical PC. If you go to the Windows Resource Monitor and click the Disk tab, you can get an idea of the average disk queue length. You can use the Performance Monitor to produce more finely controlled graphs.
Steps 3-4 are largely controlled by the disk controller and disk interface (SATA, SAS, etc). In the server world, you can be talking about a SAN with FC or iSCSI network switches, which impose their own latencies.
Step 5 will be controlled by the physical performance of the disk. Many consumer-grade HDD manufacturers do not post average seek times anymore, but 10-20 msec is common.
Interesting detail about Step 5: Some HDDs lie about flushing their write cache to get better benchmark scores.
Step 6 will depend on your filesystem and how much data you are writing.
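If you control the program doing the writing, you can at least push the data through the software layers yourself instead of waiting for the OS to get around to it. A minimal sketch in Python (the function name is made up for illustration): flush() hands the data from the program's own buffer to the OS, and os.fsync() asks the OS to write its cached copy out to the device. As noted above, the drive's own write cache may still hold the data for a moment after that, so treat this as a sketch of the software side only.

```python
import os


def write_durably(path, data):
    """Write bytes to path and ask the OS to push them out to the device.

    Returns the size of the file as reported by the OS afterwards.
    """
    with open(path, "wb") as f:
        f.write(data)          # data sits in the program's user-space buffer
        f.flush()              # hand the buffer over to the OS file cache
        os.fsync(f.fileno())   # ask the OS to flush its cache to the device
    return os.path.getsize(path)
```

Timing a call to write_durably() against a plain write without the fsync gives a rough feel for how much work is normally still outstanding when the write call returns.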
You are right that there can be a delay between Windows indicating that data writing is finished and the last data actually written. Things to consider are:
Device Manager, Disk Drive, Properties, Policies - Options for disabling Write Caching.
You might be better off using Direct I/O so that Windows does not save it temporarily in File Cache.
If your program writes the data, you can log what has been copied.
If you are sending the data over a network, you are likely to have no control of when the remote system has finished.
To see what is happening, you can set up Perfmon logging. One of my examples of monitoring:
http://www.roylongbottom.org.uk/monitor1.htm#anchor2
I have a problem. Every day I have to upload my whole source code (a directory with several subdirectories and files) to a server over VPN. The size of the source code is around 250 MB. What I do every day is compress it (which reduces its size to around 100 MB), transfer the zipped file over FTP to the server, and finally unzip it there. The transfer takes around 20 minutes.
I am sure there has to be a better way of doing this. Please suggest either a better compression mechanism or a faster upload method.
If you could set up a version control server, that would be great; Mercurial and Git are perfect for this.
The other option is using rsync, which is a synchronizing tool that only uploads the differences between the two versions, avoiding repetitive transmission of data.
I'm assuming a UNIX-like environment here, but on Windows the options are pretty much the same.
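To show why uploading only the differences saves so much, here is a minimal sketch in Python. It is not rsync's actual algorithm (rsync can also transfer only the changed parts inside a file); it is a simpler whole-file comparison, and the function names are made up for illustration. It hashes every file in the source tree and compares the result with the previous day's hashes to decide what needs to go over the wire.

```python
import hashlib
import os


def build_manifest(root):
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, "rb") as f:
                manifest[rel] = hashlib.sha256(f.read()).hexdigest()
    return manifest


def changed_files(old, new):
    """Compare two manifests and return (paths to upload, paths to delete)."""
    to_upload = sorted(p for p, digest in new.items() if old.get(p) != digest)
    to_delete = sorted(p for p in old if p not in new)
    return to_upload, to_delete
```

On a typical day only a handful of source files change, so the upload list is a tiny fraction of the 250 MB tree.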
PS: this question is a better fit for Super User.
Lately I've been having problems reading big files on a network drive and I just can't pinpoint what I may be doing wrong. I tried both in C++ (unmanaged) and in C# and got about the same performance in both, which was somewhat abysmal.
Sometimes it will read a file on the network at 4 KB/s, but if the same file is located on the local HD it easily achieves the maximum data rate the HD can output. That is with reading 64 KB chunks at a time. I tried bigger buffers, up to insane sizes, as well as smaller ones, and it doesn't make much difference.
I tried async IO in C# with BeginRead on the FileStream and OVERLAPPED IO in C++ as well as synchronous reads and they all had the same problems, which is being slow on the network.
The only solution we came up with is to copy the file using the OS CopyFile function on the local HD before actually reading the file but I'm not too satisfied with this approach. It just seems like CopyFile is doing something we are not that makes it incredibly faster than our approach.
Does anyone have a clue as to why this is?
We would have to guess, since you aren't showing us your code. So my guess is that Windows file copy opens the file with the FILE_FLAG_SEQUENTIAL_SCAN flag, which in turn causes the file system/cache to choose optimal block sizes and submit read requests in anticipation of read calls that haven't been submitted yet.
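If you want to test that guess quickly before touching the C++ or C# code, here is a minimal sketch in Python (the function name is made up for illustration). On Windows, Python exposes os.O_SEQUENTIAL, which to my understanding ends up as the sequential-scan hint when the file is opened; on other platforms the flag does not exist, so the sketch falls back to a plain open. Whether the hint is what makes CopyFile fast on your network share is still only a guess.

```python
import os


def read_sequential(path, chunk_size=64 * 1024):
    """Read a whole file front to back in chunks and return the byte count.

    Passes the Windows-only sequential-access hint when it is available.
    """
    flags = os.O_RDONLY
    flags |= getattr(os, "O_BINARY", 0)      # Windows only: no newline translation
    flags |= getattr(os, "O_SEQUENTIAL", 0)  # Windows only: sequential-access hint
    fd = os.open(path, flags)
    total = 0
    try:
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:                    # empty read means end of file
                break
            total += len(chunk)
    finally:
        os.close(fd)
    return total
```

Timing this against the same loop without the O_SEQUENTIAL flag on the network file would show whether the hint matters in your setup.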
We can only assume that you really have tried all possible methods of reading. Have you been reading synchronously or asynchronously? Did you try I/O completion ports? Or the ReadFileEx() function? I would guess that the Windows CopyFile() function detects that you want to read a file from the network and uses a different method for reading than it would use for disk access.
If you have really exhausted all possible reading methods, and if you really need this to be solved, then I would suggest checking what the CopyFile() function is actually doing. There are numerous tools for doing that. E.g.: this one (or some other -- links on the same page).