Better to use size or count on a collection?

When counting a collection, is it better to do it via size or count?
Size = Ruby (#foobars.size)
Count = SQL (#foobars.count)
I also notice that count makes another trip to the DB.

I tend to suggest using size for everything, just because it's safer. People make fewer silly mistakes using size.
Here's how they work:
length: returns the number of elements in an array or other already-loaded collection - the key point is that the collection will be loaded regardless. So if you're working with an ActiveRecord association, it will pull the elements from the DB into memory and then return the number.
count: issues a database query, so if you already have the array in memory it's a pointless call to your database.
size: the best of both worlds - size checks which type you're using and uses whichever seems more appropriate (so if you have an array, it will use length; if you have an unloaded ActiveRecord association, it will use count; and so on).
Source:
http://blog.hasmanythrough.com/2008/2/27/count-length-size/

It depends on the situation. In the example you show, I would go with size, since you already have the collection loaded and a call to size will just check the length of the array. As you noticed, count would do an extra DB query, and you really want to avoid that.
However, if you only want to display the number of Foobars and not show the objects themselves, I would go with count, because it will not load the instances into memory; it just returns the number of records.

Related

Select nth value of NSArray

How would I go about selecting the nth values of an array and adding them to another array?
For example, if I have an NSArray which has 100 objects and I want to add every 5th object? I understand how to select the 5th object and how to add to a new array etc., but I'm just looking for the best way to do this. This is for image manipulation, so I will be dealing with arrays of up to 2 million pixel values.
Is the best way to just use for loops?
You can use striding:
stride(from: 0, to: 100, by: 5)
So to create a new array:
let newArray = stride(from: 0, to: 10, by: 2).map { myArray[$0] }
UPDATE: As Leo Dabus points out, the above starts at element 0 (and takes every 2nd). If you want to start at the 5th element and take every 5th, you would use:
let newArray = stride(from: 4, to: 100, by: 5).map { myArray[$0] }
Using loops is pretty good: they are easy to read, and they are about as efficient as anything else you might use for this purpose. The only optimization to the for-loop approach is reserving capacity upfront, because you know exactly how many elements you are going to write.
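A rough sketch of that loop in Swift (the pixels array here is just a stand-in for your image data):

let pixels = [UInt8](repeating: 0, count: 2_000_000)  // stand-in for your pixel values
var selected: [UInt8] = []
selected.reserveCapacity(pixels.count / 5)  // reserve upfront: the final size is known
for i in stride(from: 4, to: pixels.count, by: 5) {
    selected.append(pixels[i])  // every 5th value, starting from the 5th (index 4)
}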
If you are going to make the same selection from multiple arrays (e.g. processing an array of arrays), consider creating an NSIndexSet and applying it with objectsAtIndexes: to perform the selection. This may give your code slightly better readability, because the loop that creates the indexes is separate from the selection itself.
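For example (a sketch in Swift, assuming myNSArray is an NSArray with at least 100 elements):

import Foundation

var indexes = IndexSet()
for i in stride(from: 4, to: 100, by: 5) {
    indexes.insert(i)  // build the index set once...
}
let selection = myNSArray.objects(at: indexes)  // ...then reuse it for each array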
Finally, if you need to optimize for speed and your arrays store wrapped primitives, consider using plain Swift arrays instead of NSArray to avoid wrapping and unwrapping. This has the potential to give you the biggest improvement: eliminating the extra memory accesses for unwrapping also significantly improves locality of reference, which is crucial for making good use of the cache.

Fast, efficient method of assigning large array of data to array of clusters?

I'm looking for a faster, more efficient method of assigning data gathered from a DAQ to its proper location in a large cluster containing arrays of subclusters.
My current method relies heavily on the OpenG cluster manipulation tools, but with a large data set the performance is far too slow.
The array and cluster location of each element of data from the DAQ is determined during an initialization phase and doesn't change during acquisition.
Because the data element origin and end points are the same throughout acquisition, I would think an array of memory locations could be created and the data directly assigned to its proper place. I'm just not sure how to implement such a thing.
You can do what you want with a string case structure:
For each of your cluster elements (AMC, ANLG_PM and PA) you should add a case to the string case structure; for the elements AMC and PA you will need to place a second, nested case structure.
This is really more of a comment, but I do not have the reputation to leave those yet, so here it is:
Regarding adding cases for every possible value of Array name, is there any reason why you cannot use an enum here? Since you are placing it into a cluster anyway, I would suggest making a type-defined enum of your possible array names. That way, when you want to add or remove one, you only have to do it in one place.
You will still need to right-click on your case structures that use this enum and select Add item for every value if you are adding a value, or manually delete the obsolete value if you are removing one. I suppose some maintenance is required either way...

Keeping an array sorted - at setting, getting or later?

As an aid to learning Objective-C/OOP, I'm designing an iOS app to store and display periodic body-weight measurements. I've got a singleton which returns an NSMutableArray of the shared store of measurement objects. Each measurement will have at least a date and a body weight, and I want to be able to add historic measurements.
I'd like to display the measurements in date order. What's the best way to do this? As far as I can see, the options are as follows: 1) when adding a measurement, I override addObject: to sort the shared store every time a measurement is added; 2) when retrieving the mutable array, I sort it; or 3) I retrieve the mutable array in whatever order it happens to be in the shared store, then sort it when displaying the table/chart.
It's likely that the data will be retrieved more frequently than a new datum is added, so option 1 will reduce redundant sorting of the shared store - so this is the best way, yes?
You can use a modified version of (1). Instead of sorting the complete array each time a new object is inserted, you use the method described here: https://stackoverflow.com/a/8180369/1187415 to insert the new object into the array at the correct place.
Then for each insert you have only a binary search to find the correct index for the new object, and the array is always in correct order.
Since you said that the data is more frequently retrieved than new data is added, this seems to be more efficient.
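A minimal sketch of that sorted insert in Swift (the BodyMeasurement type is a placeholder for your measurement class):

import Foundation

struct BodyMeasurement {
    let date: Date
    let weight: Double
}

// Binary search for the first index whose date is not earlier than the new one.
func insertionIndex(for new: BodyMeasurement, in array: [BodyMeasurement]) -> Int {
    var low = 0, high = array.count
    while low < high {
        let mid = (low + high) / 2
        if array[mid].date < new.date { low = mid + 1 } else { high = mid }
    }
    return low
}

var store: [BodyMeasurement] = []
let m = BodyMeasurement(date: Date(), weight: 80.5)
store.insert(m, at: insertionIndex(for: m, in: store))  // the store stays sorted after every insert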
Setting your specific case aside for a moment, this question is not so easy to answer. There are two basic solutions:
Keep the array unsorted; when you access an element and the array is not sorted, sort it first. Let's call it "lazy sorting".
Keep the array sorted while inserting elements. Note this is not about appending the new element at the end and then sorting the whole array; it is about finding where the element should be (binary search) and placing it there. Let's call it "sorted insert".
Both techniques are correct and useful and deciding which one is better depends on your use cases.
Example:
You want to insert hundreds of elements into the array, then access the elements, then again insert hundreds of elements, then access. In summary, you will be inserting values in big chunks. In this case, lazy sorting will be better.
You will often insert individual elements and you will access the elements often. Then sorted insert will have better performance.
Something in the middle (between inserting one element and inserting tens of elements at a time): you probably don't care which of the two methods is used.
(Note that you can also use specialized structures to keep a collection sorted, not based on NSArray, e.g. structures based on a balanced tree that track the number of elements in each subtree.)
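A minimal sketch of the lazy-sorting variant, reusing the BodyMeasurement type from the sketch above:

struct LazySortedStore {
    private var items: [BodyMeasurement] = []
    private var needsSort = false

    mutating func add(_ measurement: BodyMeasurement) {
        items.append(measurement)  // cheap append; the sorting is deferred
        needsSort = true
    }

    mutating func sortedItems() -> [BodyMeasurement] {
        if needsSort {  // sort only when the data is actually read
            items.sort { $0.date < $1.date }
            needsSort = false
        }
        return items
    }
}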

Core Data: Get a random row from the fetched result

I'm looking for a memory-efficient way to take only one row from a fetch result set. It must be random.
I thought of using [context countForFetchRequest:fetch error:nil];, getting a random int between 0 and that count, and then using offset + limit to restrict the fetch to 1 result. But I can't find out whether it allocates memory for each item it counts.
Is count a lightweight operation? Or does it need to instantiate objects in the context before being able to count them?
The documentation is somewhat unclear, but it includes the phrase "number of objects a given fetch request would have returned." Furthermore, Core Data tends to make operations like count very lightweight - managed objects, for example, let you call count on a to-many relationship to find out how many objects are at the other end without instantiating all of them or firing the fault. I'd say go for it, but profile it yourself - don't optimize prematurely!
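A sketch of that count + offset + limit approach in modern Swift, where countForFetchRequest: is count(for:) (the "Event" entity name is a placeholder; error handling is up to you):

import CoreData

func randomRow(in context: NSManagedObjectContext) throws -> NSManagedObject? {
    let fetch = NSFetchRequest<NSManagedObject>(entityName: "Event")
    let count = try context.count(for: fetch)  // lightweight count; no objects are materialized
    guard count > 0 else { return nil }
    fetch.fetchOffset = Int.random(in: 0..<count)  // jump to a random row
    fetch.fetchLimit = 1  // and fetch only that one
    return try context.fetch(fetch).first
}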

What is the maximum record count for a DataGrid in VB.NET?

What is the maximum record count for a DataGrid in VB.NET? Is there any limit?
There is no explicit limit in a DataGrid.
However, it is constrained by its internal data structures, which count rows in terms of Integer. This means the hard limit is Integer.MaxValue.
On a 32-bit system, though, you will hit problems long before you reach Integer.MaxValue rows: a 32-bit process only has roughly Integer.MaxValue bytes of usable address space, and every item added to the DataGrid has some amount of overhead. If each item had only 4 bytes of overhead, you would max out at around Integer.MaxValue / 4 rows. This is a simplistic view of the problem, though, because it doesn't take into account other controls, internal WinForms resources, etc.
How many records are you thinking of adding?
I'm not aware of any hard limit beyond the physical limitations of available memory, or perhaps Integer.MaxValue. I wouldn't be too surprised if there were one, though.
What's more important is that a DataGrid is a visual construct shown to real users. Users won't much appreciate a grid with 1,000 items in it, let alone 100,000+ (which I have seen before).
The point is that rather than worrying about the maximum number of records you can stuff in there, you're better off implementing paging or a better search mechanism.
The Count property of the DataGridView.Rows collection is an Integer, so the normal range for 4-byte signed integers should apply as an effective limit: 0 to 2,147,483,647
As to what the Rows collection can actually hold, I'm not sure... it would depend on what your application is actually doing.
Apart from Integer.MaxValue, I guess it depends on the memory available to the application and how much of it is already in use.
Why are you thinking of filling the grid with all of the rows at once?
It doesn't make sense to show users all the records.