I have a multitude of macros breaking down information into multiple files, e.g. for each row, create a separate worksheet; or for each row, create a .docx and a .pdf document.
Now, to test these macros, I always need to move outside the folders synced to OneDrive/SharePoint, because whenever a new file is created in a synced location, Office takes its damned time doing synchronisation work, which considerably slows down the macro execution.
This is equally, or even more so, a problem in production, where the macro is run on a much larger sample and by other users, so I have to train them to move the file out of the shared location (dedicated to collaboration) onto their own drive.
Is there a way to defer these actions until after the macro has finished executing (besides disabling the OneDrive app)? This is causing me issues with development, as I am used to the file being autosaved and to having my own version control. It is equally important during testing, when I change a lot of the code.
This is going to be a question with a lot of hypotheticals, but it's been on my mind for a while now and I finally want to get some perspectives on how to tackle this "issue". For the sake of the question, I'll make up an example requirement of how the program I want to make would work on a conceptual level without too many specifics.
The Problem
I want to create a program to keep track of miscellaneous info for files and folders. This miscellaneous info can be anything from comments and authors to more specific info like the original source of the file (a URL, for example), categories, tags, and more. All of this info is tracked in an SQLite database.
Now... how would you create a connection between the file (or folder) and the database? Whatever file is added to the program, the file should continue to exist independently of the program, meaning you should be able to edit, copy, move, rename or do anything else with the file that you would usually do with your OS of choice - even delete it.
You should even be able to archive it, zip it, upload it somewhere or do other things that temporarily or permanently remove the file from your system, without losing the connection to the database. The program itself never actually touches the files, except to generate a new entry in the database, but obviously there should be some kind of reference in the file to a database entry in the program.
Yes, I know that if you delete the file, you would have a dead entry in the database. For now, just treat this as an unfortunate reality that can't be solved unless you incorporate the file more closely into the program.
Possible solutions and why I decided against them
Reference inside Filename
Probably the most obvious choice, you could just have a reference inside the filename to point to a database entry, for example by including the id at the start of the filename:
#1 my-example-file.txt
#12814 this-is-one-of-many-files.txt
Obviously, that goes against what I established earlier, as you would be restricted from freely renaming the file. You would always have to remember not to mess with the id inside the filename, or else the connection to your program is broken. Unfortunately, that is the best bet I currently have, but I would like to avoid this approach if possible.
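If I did end up going this route, at least getting the id back out of a filename like the examples above is trivial; here's a quick sketch in VBA (the helper name is made up, purely for illustration):

' Hypothetical helper: pulls the id out of a name like "#12814 this-is-one-of-many-files.txt"
Function ExtractId(ByVal fileName As String) As Long
    If Left$(fileName, 1) = "#" Then
        ExtractId = CLng(Val(Mid$(fileName, 2)))   ' Val() stops at the first non-numeric character
    Else
        ExtractId = 0   ' no reference found in the name
    End If
End Function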
Alternate Data Streams (ADS)
A pretty cool feature I recently discovered that's available on NTFS file systems: ADS allows you to store additional named streams of data alongside a file, to grossly simplify it. You could attach a data stream to your file that stores the id of the database entry in the program, and a regular user would never be able to mess with it directly.
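To make that concrete, here is a minimal sketch of the idea in VBA (the stream name "dbref" and the helper names are made up; on NTFS a named stream is addressed as filename:streamname, which the Scripting runtime should accept like any other path):

' Sketch only: stash a database id in an alternate data stream of an existing file
Sub TagFileWithId(ByVal filePath As String, ByVal dbId As Long)
    Dim fso As Object, ts As Object
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ts = fso.CreateTextFile(filePath & ":dbref", True)   ' "file.txt:dbref" names the stream
    ts.Write CStr(dbId)
    ts.Close
End Sub

' Sketch only: read the id back from the same stream
Function ReadFileId(ByVal filePath As String) As Long
    Dim fso As Object
    Set fso = CreateObject("Scripting.FileSystemObject")
    ReadFileId = CLng(fso.OpenTextFile(filePath & ":dbref").ReadAll)
End Function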
However, since this is a feature reserved for specific file systems, there are some ugly side effects to ADS, as you can easily lose that part of the file by:
moving/copying it to a file system that doesn't support ADS, such as the file systems most often used in removable drives
uploading it to a cloud service and later downloading it
moving it to another OS that might not support ADS or treats it in an unexpected way
zipping it
Thus I can't really rely on ADS either.
I was just reading this https://en.wikipedia.org/wiki/Clustered_file_system#Distributed_file_systems
The definition of a DFS seems to describe Dropbox exactly, but Dropbox isn't in the list of examples - and I assume it would be, if it were one.
So what is different about Dropbox which makes it not fall into this category?
Usually, when talking about distributed file systems, you expect properties that Dropbox doesn't support. For example, if you and I share a folder, I can create a file called "work.txt" in it and you can create a file called "work.txt" in it, and if we do it fast enough (or while we're not syncing with Dropbox) we'll end up with conflicting copies of the same file.
A similar example would be if we both edit the same file concurrently - we'll have conflicting copies, which is something a distributed file system should prevent. In the link you refer to, this is called "Concurrency transparency; all clients have the same view of the state of the file system".
Another example of a property Dropbox doesn't support: if my computer fails (e.g., my hard drive is corrupted) I might lose data that wasn't yet uploaded to Dropbox. There is a window during which the data has been written to the local disk but not yet synced; if my computer fails in that window, that data is lost.
Lastly, I'm not sure how Dropbox handles file locks. For example, MS Office takes locks on .doc files to ensure no one else is working on them at the same time. I don't think Dropbox supports this feature.
I've written a blog post about some of the complexities of implementing a distributed file system; you might find it helpful as well.
I am working on a project that will copy files to a database every time something is added to a specific directory. The program works fine when I'm testing with a small set of data, but I was wondering if someone could explain how the FileSystemWatcher.Created event works.
My main concern is when I use this on a larger scale the program may slow down when it handles 100,000+ files.
If this is an issue, could anyone explain whether there is some sort of workaround - for example, rather than polling the original folder (let's call it "C:\folder"), maybe polling a temp folder instead?
I have not tested the watcher with 100,000 files. However, in most cases you should not have so many files in a folder awaiting processing. I recommend a structure like
C:\folder
C:\folder\processing
C:\folder\archive
C:\folder\error
As soon as you begin working on a given file, move it into processing. If you successfully process it, move the file again to archive. If there is an error while processing a file, instead move it into error.
This will make it easier for you to keep the files organized and diagnose problems that occur in production.
With that file structure, you will not run into issues with large numbers of files in the folder you are watching, unless you receive files in incredibly large bursts compared to the speed with which they can be moved into the processing state.
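If it helps, here is a rough, language-agnostic sketch of that move-through-states idea (written in VBA purely for illustration; in your program the same moves would happen from the FileSystemWatcher.Created handler, and the folder names match the structure above):

Sub ProcessIncomingFiles()
    Const ROOT As String = "C:\folder\"
    Dim pending As Collection, entry As String, f As Variant
    Set pending = New Collection

    ' Snapshot the incoming file names first, so the moves don't disturb the Dir() walk
    entry = Dir(ROOT & "*.*")
    Do While Len(entry) > 0
        pending.Add entry
        entry = Dir()
    Loop

    For Each f In pending
        Name ROOT & f As ROOT & "processing\" & f               ' claim the file
        On Error Resume Next
        ' ... placeholder: copy the file's contents into the database here ...
        If Err.Number = 0 Then
            Name ROOT & "processing\" & f As ROOT & "archive\" & f   ' success
        Else
            Name ROOT & "processing\" & f As ROOT & "error\" & f     ' failed - keep it for diagnosis
        End If
        On Error GoTo 0
    Next f
End Sub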
I need to find the size of a directory (and its sub-directories). I can do this by iterating through the directory tree and summing up the file sizes etc. There are many examples on the internet but it's a somewhat tedious and slow process, particularly when looking at exceptionally large directory structures.
I notice that Apple's Finder application can instantly display a directory size for any given directory. This implies that the operating system is maintaining this information in real time. However, I've been unable to determine how to access this information. Does anyone know where this information is stored and if it can be retrieved by an Objective-C application?
IIRC the Finder iterates too. In the old days, it used FSGetCatalogInfo (an old File Manager call) to do this quickly. I think there's a newer POSIX call these days that's the fastest, lowest-level API for this, especially if you're not interested in any info other than the size and really need blazing speed over easily maintainable code.
That said, if it is cached somewhere in a publicly accessible place, it is probably Spotlight. Have you checked whether the Spotlight info for a folder includes its size?
PS - One important thing to remember when determining the size of a file: Mac files can have two "forks", the data fork and the resource fork (where, for example, the Finder keeps the info when you set a particular file to open with an application other than the default for its file type, as well as custom icons assigned to files). So make sure you add up the sizes of both forks, or your measurements will be off.
Here is the thing: I need a VBA script that generates PowerPoint presentations from other presentations. The main difficulty is the large file sizes - the final PPTs are going to contain up to 1000 slides each. To make a long story short, I have to open the initial PPTs and re-sort their slides.
A huge factor will be the reopening step: I cannot open all of those files at once, since the machine would run out of memory very quickly.
Is there a time-saving or memory-saving way to accomplish this? Since the task is mainly reorganizational, there might be a way to do what I need.
I would be thankful for any help.
In order to insert slides from one file into another, you either need to open the source file yourself or let PPT do it implicitly behind the scenes.
If you let it do the work (i.e., by using the Slides.InsertFromFile method), you can only insert contiguous ranges, and for each invocation of the method PPT will open and close the file. If you need to work with non-contiguous ranges, you'll save time by opening the file yourself (windowless if you like) and managing the copy process; that way you can do a bit of pre-sorting and open each source file only once.
Also, current versions of PPT will open files considerably faster if they're saved as PPTX rather than PPT, I've noticed. The difference isn't especially apparent with small files but can become quite noticeable as files get larger.
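To make that concrete, here is a rough sketch of the "open once, copy what you need" approach. The file paths and the slide pick order are invented for the example; adapt them to your own sorting logic:

Sub BuildCombinedDeck()
    Dim target As Presentation
    Dim source As Presentation
    Dim pickOrder As Variant
    Dim i As Long

    Set target = Application.Presentations.Add

    ' Open the source once, windowless, instead of letting
    ' InsertFromFile open and close it on every call
    Set source = Application.Presentations.Open( _
        FileName:="C:\decks\source1.pptx", _
        ReadOnly:=msoTrue, WithWindow:=msoFalse)

    ' Non-contiguous, re-sorted selection of slides (example values)
    pickOrder = Array(10, 3, 250, 7)

    For i = LBound(pickOrder) To UBound(pickOrder)
        source.Slides(pickOrder(i)).Copy
        target.Slides.Paste
    Next i

    source.Close

    ' For a contiguous block, InsertFromFile is simpler (it opens/closes the file itself):
    ' target.Slides.InsertFromFile "C:\decks\source2.pptx", target.Slides.Count, 5, 20

    target.SaveAs "C:\decks\combined.pptx"
End Sub

Note that a plain Paste normally takes on the destination template's formatting, so test the result on a small sample before running it against the full 1000-slide decks.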