What exactly is the VBA/VB6 Collection? - vba

For integer keys the speed of indexing is O(N), so it looks like it is a list. But apparently (wqw's comment in Unable to properly loop through keys AND values in a VB6 For Each loop) access time for string keys is O(log(N))... Maybe it is a tree built on top of single-linked list (indexing the list elements)?
I tried hard to get the documentation about this, but all I can find is how to use the Collection, never what is its data model.

I doubt you'll find much of anything regarding the implementation details of VBA.Collection. (Official usage reference at https://learn.microsoft.com/en-us/office/vba/language/reference/user-interface-help/collection-object)
The general advice is to consider using the VBScript Dictionary object instead; certainly use that instead if you need access to the keys. According to this, Dictionary is also faster up to a point: https://fastexcel.wordpress.com/2012/07/10/comparing-two-lists-vba-udf-shootout-between-linear-search-binary-search-collection-and-dictionary/
Another alternative I've done myself was to make use of the .NET Hashtable (making reference to mscorlib.dll). I've not done any performance tests of that versus VBScript Dictionary, so I don't if it's worth the effort (but it was fun to implement). An example is at Cannot iterate Hashtable in VBA (Excel).

Related

How to use Kotlin's sequences and lambdas in a way that's memory efficient

So I'm in the process of writing some code that needs to be both memory efficient and fast. I have a working reference in java already, but was rewriting it in kotlin.
I basically need to load a lot of csv files and load them into a tree once and then traverse them repeatedly once they're loaded.
I originally wrote the whole thing using sequences, but found it cause the GC to spike repeatedly.
I can't really share this code, but was wondering if yall know what would cause this to happen.
I'll be happy to add details as you need them, but here's my basic pattern.
step1: inputStream -> csvLines: List<String>
step2: csvLines.drop(x).fold(emptySequence()) -> callOtherFunctionWithFold -> callOtherFunctionWithFold -> Sequence<OutputObjects>
I keep the csvLines as a seperate list because I'm access specific rows based on the rules I need.
step3: Sequence<OuputObjects> -> nodes
The result is functional, but this code is much less memory efficient and less performant than the java equivalent only using arraylists and modifying them in place.
After looking at the visualvm output, I created a ton of kotlin.*.ArrayIterators. It looks like I create one every time I use a lamda.
So what can I do to make this more efficient? I though sequences were supposed to reduce object creation lazily, but it looks like I'm doing things that break its ability to do so.
Do sequences reevaluate after ever GC run or run in general? If so, that would make them unsuitable to use in objects that are loaded at startup, right?
To use Kotlin sequences, you need to start with asSequence()
csvLines.asSequence()
.drop(x)
.fold(...)
...
If you leave that out, it uses Collection functions instead which creates a new (intermediate) collection after every function.

Dict vs Record in elm

While implementing a simple app I ran into the problem of trying to update a nested record. I found a solution online but it really seems like a whole lot of bloated code.
As I was looking for alternatives I found Dictionaries. This seem like a solution to that problem -- If I use a dictionary inside of a record I can avoid all that bloated code and get nested updates.
Seeing dictionaries and records next to each other made me wonder, why would I use a record instead of a dictionary, or vice versa? The two seem really similar to me, so I am not sure I see the advantage of one or the other. Of course I can see that there is a difference in syntax, but is that all ?
I learned somewhere that the access time complexity of Dict is O(log(n)) -- does it do a binary search on the keys ? -- but I can't find the access time complexity for record, but I am wondering if that is O(1) and that is one of the advantages.
Either way, they both seem to map to 1 single data structure in other languages (e.g Python's dictionaries, JS objects, Java hash-tables), why do we need two in elm ?
Dicts and records might seem very similar when coming from JavaScript, but in a statically typed language they are actually very different. I think just about the only property they have in common is that they are both key-value containers.
The biggest differences, I think, are that Dicts are homogeneous, meaning values must be of the same type, and "dynamically" keyed and sized, meaning keys are not statically checked (ie. at compile-time) and that key-value pairs can be added at runtime. Records on the other hand includes the key names and value types in the record type, which means they can hold values of different types, but also can't have keys added or removed at runtime without changing the type itself.
The benefits of easily being able to insert and update a Dict is something you pay for when you try to get it back out. Dict.get returns a Maybe which you'll then have to handle, because the type doesn't give any guarantee that it contains anything at all. You also won't get a compiler error if you mistype the name of a key.
Overall, a Dict forsakes most of the benefits of static typing. I think a good rule of thumb is that if you know the key names, you should most likely go with records. If you don't, go with Dict.
You also seem about right regarding performance, but I think that's a secondary concern. Record access should be equivalent to accessing the elements of an array by index, since so much information is known at compile time that it can essentially be compiled down to a fixed-size array.

Reassign an interface or allow GC to do its work on temporary variables

I'm very new to Go and am currently porting a PHP program.
I understand that Go is not a dynamically-typed language and I like that about it. It seems very structured and easy to keep track of everything.
But I've been coming across situations that seem to be a little ... ugly. Is there a better way of performing this sort of process:
plyr := builder.matchDetails.plyr[i]
plyrDetails := strings.Split(plyr, ",")
details := map[string]interface{}{
"position": plyrDetails[0], "id": plyrDetails[1],
"xStart": plyrDetails[2], "zStart": plyrDetails[3],
}
EDIT:
Is there a better way to achieve a map containing the strings from plyr than to create two additional variables, to be destroyed straight afterwards? Or is this the correct way?
tl;dr:
If possible, choose a different format and let a library do the string parsing/generation for you
Use structs rather than maps for anything you use a few times, for more compiler checks
The common way of using encoding/json accomplishes both of those.
Meanwhile, don't sweat perf too much because you'll probably vastly improve the old app's speed regardless; there's no indication speed of parsing or GC is a problem yet; and the syntactical differences you mentioned in the first rev. of the post don't necessarily actually relate to GC.
So, I understand you may be porting piece-for-piece, and that may limit what you can change now.
But if/when you can change things, a really clean solution would be to use the encoding/json package and a struct: the json package will parse input/generate output from structs without any manual string manipulation on your part, and using a struct gives you compile-time checking rather than only the runtime checking you get with a map. Lots of Go apps (and others) use JSON to expose their services.
An intermediate step could be to introduce struct types for any internal structure you use at least a few times, rather than maps, so even without updating the parsing, at least the internals of the app get the benefits of compile-time checking. structs are also what things like the gorm object/relational mapper expect to deal with. They happen to use less memory than maps, and be quicker (and more concise syntactically) to access, but those aren't even necessarily the most important considerations here.
On the performance of what you have now, and particularly whether different syntax would make it faster: don't sweat that, for a bunch of reasons: the port's likely to be faster than the PHP was whatever you do; we don't yet have any indication that parsing or GC is actually slow or your bottleneck; and the syntactical differences you talked about in the first revision of your question may not relate to GC much or at all. More/fewer var names in your code may not correspond to more/fewer heap allocations, 'cause often Go can allocate on the stack, briefly discussed under 'escape analysis' in Dave Cheney's Gocon Tokyo slides. And as peterSO said, we seem to be looking at allocations of smallish references, not, say, copying all of the string bytes from the request each time.
Go is NOT PHP. Write Go programs in Go. Write PHP programs in PHP.
Interface values are represented as a two-word pair giving a pointer
to information about the type stored in the interface and a pointer to
the associated data. Go Data Structures:
Interfaces
Reusing Go interface variables to "increase performance" makes no sense.

Arrays in Visual Basic

In declaring an array in VB, would you ever leave the zero element empty and adjust the code to make it more user friendly?
This is for Visual Basic 2008
No, I wouldn't do that. It seems like it might help maintainability, but that's a very short-sighted view.
Think about it this way. It only takes each programmer who has to understand and maintain the code a short amount of time to get comfortable with zero-indexed arrays. But if you're using one-based arrays, which are unlike those found in almost all other VB.NET code, and in fact almost every other common programming language, it will take everyone on the team much longer. They'll be constantly making mistakes, tripping up because their natural assumptions aren't accurate in this one special case.
I know how it feels. When I worked in VB 6, I loved one-based arrays. They were very natural for the type of data that I was storing, and I used them all over the place. Perfectly documentable here, because you have an explicit syntax to specify the upper and lower bounds of the array. That's not the case in VB.NET (which is a newer, but incompatible version of the Visual Basic language), where all arrays have to be zero-indexed. I had a hard time switching to VB.NET's zero-based arrays for the first couple of days. After that initial period of adjustment, I can honestly say I've never looked back.
Some might argue that leaving the first element of every array empty would consume extra memory needlessly. While that's obviously true, I think it's a secondary reason behind the one I presented above. Good developers write code for others to read, so I commend you for considering how to make your code logical and understandable. You're on the right path by asking this question. But in the long run, I don't think this decision is a good one.
There might be a handful of exceptions in very specific cases, depending on the type of data that you're storing in the array. But again, failing to do this across the board seems like it would hurt readability in the aggregate, rather than helping it. It's not particularly counter-intuitive to simply write the following, once you've learned how arrays are indexed:
For i As Integer = 0 To (myArray.Length - 1)
'Do work
Next
And remember that in VB.NET, you can also use the For Each statement to iterate through your array elements, which many people find more readable. For example:
For Each i As Integer In myArray
'Do work
Next
First, it is about programmer friendly, not user friendly. User will never know the code is 0-based or 1-based.
Second, 0-based is the default and will be used more and more.
Third, 0-based is more natural to computer. From the very element, it has two status, 0 and 1, not 1 and 2.
I have upgraded a couple of VB6 projects to vb.net. To modify to 0-based array in the beginning is better than to debug the code a later time.
Most of my VB.Net arrays are 0-based and every element is used. That's usual in VB.Net and code mustn't surprise the reader. Readability is vital.
Any exceptions? Maybe if I had a program ported from VB6, so it used 0-based arrays with unused initial elements, and it needed a small change, I might match the pattern of the existing code. Least surprise again.
99 times out of 100 the question shouldn't arise because you should be using List(Of T) rather than an array!
Who are the "users" that are going to see the array indexes? Any good developer will be able to handle a zero-indexed array and no real user should ever see them. If the user has to interact with the array, then make an actually user-friendly system for doing so (text or a 1-based virtual index or whatever is called for).
In visual basic is it possible to declare an array starting from 1, if you find inconvenient to use a 0 based array.
Dim array(1 to 10) as Integer
It is just a matter of tastes. I use 1 based arrays in visual basic but 0 based arrays in C ;)

how to create a system-wide independent universal counter object primarily for Database keys?

I would like to create/use a system-wide independent universal 'counter object' that can be called via COM in a thread-safe manner.
The counter object will be passed an ID to identify which counter to return, handle the counting, 'persist' the count (occasionally), have reasonable performance (as fast as possible) perhaps capable of 1000 counts per second or better (1mS) and be accessible cross-process/out-of-process. The current count status must be persisted between object restarts/shutdowns.
The counter object is liklely to be a 'singleton' type object implemented in some form of free-threaded dictionary, containing maybe 10 counters (perhaps 50 max). The count needs to be monotonic and consistent, (ie: guaranteed unique sequential values).
Each counter should have a few methods, like reset, inc, dec, set, clear, remove. As a luxury, I would like to have a variable-increment (ie: 'step by' value). To support thread-safefty, perhaps some sorm of critical-section or mutex call. It just needs to return a long/4byte signed integer.
I really want something that can be called from anywhere, including VBScript, so I figure COM is my preferred solution.
The primary use of this is for database keys. I am unable to use autoinc or guid type keys and have ruled out database-generated counting systems at this point.
I've spent days researching this and I have really struggled to find a solution. The best I can find is a free-threaded dictionary object that can be instantiated using COM+ from Motobit - it seems to offer all the 'basics' and I guess I could create some form of wrapper for this.
So, here are my questions:
Does such a 'general purpose
counter-object already exist? Can you direct me to it? (MS did
do an IIS/ASP object called
'MSWC.Counter' but this isn't
'cross-process'/ out-of-process
component and isn't thread-safe. (but if it was, it would do!)
What is the best way of creating such
a Component? (I'd prefer VB6
right-now, [don't ask!] but can do in VB.NET2005
if I had to). I don't have the
skills/knowledge/tools to use
anything else.
I am desparate for a workable solution. I need specific guidance! If anybody can code something up for me I am prepared to pay for it.
Update:
Whats wrong with GUIDs? a) 16bytes if I'm lucky (Binary storage), 32+bytes if I'm not (ANSI without formatting) or even worse(64bytes Unicode). b) I have an high-volume replicated app where the GUID is just too big (compared to the actual row data) and c) the overhead of indexing and inserts d) I want a readable number! - I only need 4 byte integer, so why not try and get that? I know you will say that disc-space is cheap, but for my application the cost is in slow inserts, and guids don't help (and I have tried/tested) but would prefer not to use if I have a choice.
Autonumber/autoincs are evil: a) don't get the value until after the insert, b) session specific, c) easy to lose/screw up on a table alter, d) no good for mutli-table inserts, (its not MS-SQL Svr) plus I have a need for counters outside my DB...
By the sound of it, what you're looking to create is an ActiveX EXE. They run in their own process but can be accessed from any other process by instantiating an object from it as though it is just another COM object. It handles all the marshaling necessary to sync its internal thread with the threads of any process calling it. Since all you planning on using is integers, there's no need to worry about the thread safety of objects passed between the threads.
More than likely you can use the MSWC.Counter object within that ActiveX EXE and let it do the counter work.
A database engine is already very good at generating unique primary key values for a dbase table. Either by marking the column auto-increment or by using a Guid. Trying to create your own is a grave mistake. System wide is just not wide enough, it fails miserably when your app grows and more than one machine starts using the database.
Nevertheless, you can get what you want in VB6 by creating a COM server. It's been to long, I forgot the exact names of the project options, something resembling "single use".
I have implemented a similar solution implemented as a REST web service - accessible from any technology that supports http.
Simple c sharp backend implementation using a singleton pattern and will scale nicely under IIS.
The whole thing sounds like a twisted idea, so why should I not add another twisted one. :P
Host an old-skool ASP page.
You can use Application.Lock with a counter then, just like in the sample.
Added benefit: use it from any platform/language. (e.g. other HTML pages with XMLHttpRequest. :)
If you save the value at say every 100th request to a file, you do not even have to worry about IIS resets.
Just set the starting value to last saved value + 100 in Application_OnStart. :P