Limiting Amount of Rows in List View - vb.net

Simple enough question: how can I limit the number of rows in a ListView to the number of items/rows that actually contain information? I know how to count the rows with items by using this code:
ListView1.Items.Count
But how can I limit the number of rows the ListView has to the number of items?

Assuming a version of .NET that includes LINQ (3.5+), you get some really nice features which help a lot. These apply to any IEnumerable(Of T) or IQueryable(Of T), including List(Of T).
Dim MyList = [Some code to get hundreds of items]
Dim MyShortList = MyList.Take(30)
You can also implement paging very easily by using Skip...
Dim MyShortListPage2 = MyList.Skip(30).Take(30)
You should look into using the Entity Framework or equivalents which implement IQueryable. These reduce memory overhead by using deferred execution (often loosely called lazy loading).
In short, if I were to do the following using the EF:
Dim Users = DBContext.Set(Of Users)
Users won't actually contain all users in the database; instead it will contain the query to get all users. If I did Users.First, it would run the query against SQL to get the first user. If, instead, I did Users.Where(Function(x) x.Age = 30).First, it would only query SQL for the first user whose age is 30.
Thus, IQueryable lets you pare down a dataset quickly using the power of the underlying provider instead of doing it in-memory.
If, instead, I did
Dim Users = DBContext.Set(Of Users).ToList()
It would retrieve all users from the database into memory. The ToList() is what forces this to happen. A List has to be stored in local memory; an IQueryable does not. It can run the appropriate query at the last possible moment and fetch as little as possible to satisfy your request.
Whether you want this to happen or not depends on the use case.
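To make the deferred-versus-materialized distinction concrete, here is a rough in-memory analogy in Python. It is only a sketch: a generator stands in for the query, and all_users and page are made-up names. Unlike an IQueryable, a generator does not translate anything to SQL; it only shows that nothing is produced until something asks for it.

```python
from itertools import islice


def all_users():
    """Hypothetical stand-in for a query: yields rows one at a time, on demand."""
    for user_id in range(1, 1001):
        yield {"id": user_id, "age": 20 + user_id % 30}


def page(rows, page_number, page_size):
    """Skip the earlier pages, then take one page, without building the full list."""
    start = (page_number - 1) * page_size
    return list(islice(rows, start, start + page_size))


first_page = page(all_users(), 1, 30)   # comparable to MyList.Take(30)
second_page = page(all_users(), 2, 30)  # comparable to MyList.Skip(30).Take(30)

# next() stops at the first match, so the remaining rows are never generated.
first_aged_30 = next(user for user in all_users() if user["age"] == 30)
```

Calling list(all_users()) would be the counterpart of ToList(): every row is produced and held in memory at once.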

Related

Resolving performance Issues on Linq with "LIKE"

I have a recognition table containing 25,000 records, and an incoming table of strings that must be recognised using LIKE matching, typically between 200 and 4000 per batch. This used to be in SQL Server, but I am trying to make it faster by doing it all in memory. However, LINQ is much slower, taking 5 seconds instead of 250 ms in SQL when the incoming table has 200 rows.
The recognition table is declared as follows:
Private mRecognition377LK As New SortedDictionary(Of String, RecognitionItem)(StringComparer.CurrentCultureIgnoreCase)
The actual like comparison is here:
r = mRecognition377LK.FirstOrDefault(Function(v As KeyValuePair(Of String, RecognitionItem)) sTitle Like v.Key).Value
So this is executed for every incoming record. I thought that using v.Key would let the LINQ engine skip records that start with a different character, but it seems not.
I can reinvent the wheel and create a collection class that splits the recognition table into its constituent
E.g. if an incoming string is abcdef and we have a recognition record of "abc*" then I could store collection grouped by length of recognition item up to the first star (3), then inside that a collection of recognition items with that length, keyed on the text up to the first star (abc)
So abc* has a string length of 3 so:
r = Itemz(3).Recog("abc")
I think that will work and perform well, but it's a lot of faff, and I'm sure that collection classes and LINQ would have been designed in a way that such a simple thing could be executed quickly without this performance drag.
So my question is: is there a way to make this go fast without going to my proposed solution?
DRAFT ANSWER
So having programmed up several iterations of TRIE and binary searches I realised that all this was excessive processing and that is because....
BOTH LISTS ARE SORTED
... that means we only need one loop to process both lists and join them, i.e. we are doing in C#/VB what SQL Server does when it performs a MERGE join. So now I am pursuing this as a solution and will update here as appropriate.
FINAL UPDATE
The solution is now finished. You can indeed join as many lists as you like in a single loop, as long as they are all sorted ascending or all sorted descending on the attributes you are joining. My code is about 1000 lines and very specific, so I'm not going to post a code solution. For anyone that hits this kind of problem in future: it seems there is nothing in LINQ that will help with a merge join which is not based on equality (we have LIKE matching), so writing your own merge join in a single loop is the way to go when the incoming data is sorted.
The basis of the algorithm is to loop through the table which is your "maintable", and advance a pointer into each other list until the text comparison becomes greater than or equal. When it's equal, you don't advance this list again until it doesn't match the maintable list, since one item on the right could join many items on the left. This can be repeated for multiple arrays.
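The loop described above can be sketched roughly as follows. This is not the author's code (which was not posted); it is a minimal Python illustration under some loud assumptions: every recognition pattern has the form "prefix*" and is stored as just the prefix, both lists are sorted with the same plain ordinal string comparison, and the first matching pattern wins. The name merge_prefix_join is made up.

```python
def merge_prefix_join(titles, prefixes):
    """Join sorted titles to sorted prefixes in a single pass.

    Returns a list of (title, matching_prefix_or_None) pairs.
    """
    results = []
    i = 0  # pointer into prefixes; it only ever moves forward
    for title in titles:
        # A prefix that sorts before this title and does not match it
        # cannot match any later (larger) title either, so skip it for good.
        while (i < len(prefixes)
               and prefixes[i] < title
               and not title.startswith(prefixes[i])):
            i += 1
        if i < len(prefixes) and title.startswith(prefixes[i]):
            # Do not advance here: the same prefix may match the next title too.
            results.append((title, prefixes[i]))
        else:
            results.append((title, None))
    return results


joined = merge_prefix_join(["abcdef", "abd", "b", "zzz"], ["abc", "b", "q"])
```

Each list is walked at most once, so the cost grows with the sum of the two list lengths rather than their product, which is what a FirstOrDefault scan per incoming record costs. Patterns with wildcards in the middle, or culture-aware and case-insensitive ordering, would need more care than this sketch takes.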
It would be nice to see a library where you can pass lambda functions to perform merge joins on multiple sorted arrays. I will consider writing one in future.
The solution runs in 0.007 seconds to join 200 records to a 70,000 record recognition list. With LINQ effectively performing an inner loop, it took 5 seconds. When joining 4000 records to the same 70,000 record recognition list, the performance degrades only slightly to around 0.01 s, showing how effective the merge join logic is. SQL Server took around 250 ms to perform the join.

Efficient Querying Data With Shared Conditions

I have multiple sets of data which are sourced from an Entity Framework code-first context (SQL CE). There's a GUI which displays the number of records in each query set, and upon changing some set condition (e.g. Date), the sets all need to recalculate their "count" value.
While every set's query is slightly different in some way, most of them share common conditions in some way. A simple example:
RelevantCustomers = People.Where(P=>P.Transactions.Where(T=>T.Date>SelectedDate).Count>0 && P.Type=="Customer")
RelevantSuppliers = People.Where(P=>P.Transactions.Where(T=>T.Date>SelectedDate).Count>0 && P.Type=="Supplier")
So the thing is, there's enough of these demanding queries, that each time the user changes some condition (e.g. SelectedDate), it takes a really long time to recalculate the number of records in each set.
I realise that part of the reason for this is the need to query through, for example, the transactions each time to check what is really the same condition for both RelevantCustomers and RelevantSuppliers.
So my question is: given that these sets share common "base conditions" which depend on the same sets of data, is there some more efficient way I could be calculating these sets?
I was thinking something with custom generic classes like this:
QueryGroup<People>(P=>P.Transactions.Where(T=>T.Date>SelectedDate).Count>0)
{
new Query<People>("Customers", P=>P.Type=="Customer"),
new Query<People>("Suppliers", P=>P.Type=="Supplier")
}
I can structure this just fine, but what I'm finding is that it makes basically no difference to the efficiency as it still needs to repeat the "shared condition" for each set.
I've also tried pulling the base condition data out as a static "ToList()" first, but this causes issues when running into navigation entities (i.e. People.Addresses don't get loaded).
Is there some method I'm not aware of here in terms of efficiency?
Thanks in advance!
Give something like this a try: combine "similar" values into fewer queries, then separate the results afterwards. Also, use Any() rather than Count() for an existence check. Your updated attempt goes part-way, but will still result in two hits to the database. When querying, it also helps to make sure you are filtering on indexed fields, and those indexes will be more efficient with numeric IDs than with strings (i.e. a TypeID of 1 vs. 2 for "Customer" vs. "Supplier"). Normalized values are better for indexing and lead to smaller records, at the cost of more verbose queries.
var types = new string[] {"Customer", "Supplier"};
var people = People.Where(p => types.Contains(p.Type)
&& p.Transactions.Any(t => t.Date > selectedDate)).ToList();
var relevantCustomers = people.Where(p => p.Type == "Customer").ToList();
var relevantSuppliers = people.Where(p => p.Type == "Supplier").ToList();
This results in just one hit to the database, and the Any() should be more performant than fetching an entire count. We split the customers and suppliers after the fact from the in-memory set. The caveat here is that any attempt to access details such as transactions on customers and suppliers would result in lazy-load hits, since we didn't eager load them. If you need entire entity graphs, be sure to .Include() the relevant details, or be more selective about the data extracted by the first query, i.e. select anonymous types with the applicable details rather than just the entity.
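The same "query once, split in memory" idea can be shown outside Entity Framework. The following is a small self-contained Python/sqlite3 sketch, not a translation of the EF code: the people and transactions tables, their columns, and the function name relevant_people_by_type are all assumptions made for illustration. EXISTS plays the role of Any().

```python
import sqlite3
from collections import defaultdict


def relevant_people_by_type(conn, types, selected_date):
    """Run one query for all requested types, then split the rows in memory."""
    placeholders = ", ".join("?" for _ in types)
    sql = (
        "SELECT p.name, p.type FROM people AS p "
        f"WHERE p.type IN ({placeholders}) "
        "AND EXISTS (SELECT 1 FROM transactions AS t "
        "            WHERE t.person_id = p.id AND t.date > ?) "
        "ORDER BY p.id"
    )
    groups = defaultdict(list)
    for name, person_type in conn.execute(sql, [*types, selected_date]):
        groups[person_type].append(name)
    return dict(groups)


# Hypothetical sample data; dates are ISO strings so they compare correctly as text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, type TEXT);
    CREATE TABLE transactions (id INTEGER PRIMARY KEY, person_id INTEGER, date TEXT);
    INSERT INTO people VALUES (1, 'Ann', 'Customer'), (2, 'Bob', 'Supplier'),
                              (3, 'Cy', 'Customer'), (4, 'Di', 'Employee');
    INSERT INTO transactions VALUES (1, 1, '2024-03-01'), (2, 2, '2024-01-15'),
                                    (3, 3, '2023-12-31'), (4, 4, '2024-02-02');
""")
groups = relevant_people_by_type(conn, ["Customer", "Supplier"], "2024-01-01")
```

If only the counts are needed for the GUI, len() of each group is enough, and the expensive transaction check still runs only once per change of the date.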

Need for long and dynamic select query/view sqlite

I have a need to generate a long select query of potentially thousands of where conditions like (table1.a = ? OR table1.a = ? OR ...) AND (table2.b = ? OR table2.b = ? ...) AND....
I initially started building a class to make this more bearable, but have since stopped to wonder if this will work well. This query is going to be hammering a table of potentially 10s of millions of rows joined with 2 more tables with thousands of rows.
A number of concerns are stemming from this:
1.) I wanted to use these statements to generate a temp view so I could easily transfer over existing code base, the point here is I want to filter data that I have down for analysis based on selected parameters in a GUI, so how poorly will a view do in this scenario?
2.) Can sqlite even parse a query with thousands of binds?
3.) Isn't there a framework that can make generating this query easier other than with string concatenation?
4.) Is the better solution to dump all of the WHERE variables into hash sets in memory and then just write a wrapper for my DB query object that calls next() until a row is encountered that satisfies all my conditions? My concern here is that the application generates graphs procedurally on scroll, so waiting to draw while calling query.next() x 100,000 might cause an annoying delay. Ideally I don't want to have to wait on the next row that satisfies everything for more than 30 ms at a time.
edit:
New issue: it came to my attention that sqlite3 is limited by default to 999 bind values (host parameters); the limit (SQLITE_MAX_VARIABLE_NUMBER) is set at compile time.
So it seems as if the only way to accomplish what I had originally intended is to
1.) Generate the entire query via string concatenations(my biggest concern being, I don't know how slow parsing all the data inside sqlite3 will be)
or
2.) Do the blanket query method (select * from * where index > ? limit ?) and call next() until I hit valid data, with the filtering done in my compiled code (including updating the index variable and re-querying repeatedly)
I did end up writing a wrapper around the QSqlQuery object that will walk a table using index > variable and limit to allow "walking" the table.
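The table-walking wrapper described above is not shown in the post (and it wraps QSqlQuery, not Python), but the idea can be sketched with Python's built-in sqlite3 module. Everything here is an assumption for illustration: the samples table, its columns a and b, the function name walk_matching, and the assumption that ids are positive integers. Filter values live in in-memory sets, so no bind-parameter limit applies to them.

```python
import sqlite3


def walk_matching(conn, allowed_a, allowed_b, batch_size=1000):
    """Yield rows whose a and b are in the allowed sets, reading in id-keyed batches."""
    last_id = 0  # assumes ids are positive integers
    while True:
        batch = conn.execute(
            "SELECT id, a, b FROM samples WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not batch:
            return
        for row_id, a, b in batch:
            # The filtering happens here, in application code, not in SQL.
            if a in allowed_a and b in allowed_b:
                yield row_id, a, b
        last_id = batch[-1][0]  # resume after the last id seen


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO samples (a, b) VALUES (?, ?)",
    [(i % 5, i % 3) for i in range(1, 21)],
)
matches = list(walk_matching(conn, {1, 2}, {0}, batch_size=4))
```

Because each batch seeks on the primary key rather than using OFFSET, the cost of fetching a batch should not grow as the walk gets deeper into the table. How long the gaps between matching rows are still depends on how selective the filters are.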
Consider dumping the joined results without filters (denormalized) into a flat file and index it with Fastbit, a bitmap index engine.

django objects...values() select only some fields

I'm optimizing the memory load (~2GB, offline accounting and analysis routine) of this line:
l2 = Photograph.objects.filter(**(movie.get_selectors())).values()
Is there a way to convince django to skip certain columns when fetching values()?
Specifically, the routine obtains all rows of the table matching certain criteria (db is optimized and performs it very quickly), but it is a bit too much for python to handle - there is a long string referenced in each row, storing the urls for thumbnails.
I only really need three fields from each row, but, if all the fields are included, it suddenly consumes about 5kB/row which sadly pushes the RAM to the limit.
The values(*fields) function allows you to specify which fields you want.
Check out the QuerySet method only(). When you declare that you only want certain fields to be loaded immediately, the QuerySet will not pull in the other fields of your object until you try to access them.
If you have to deal with ForeignKeys that must also be pre-fetched, then also check out select_related().
The two links above to the Django documentation have good examples, that should clarify their use.
Take a look at Django Debug Toolbar. It comes with a debugsqlshell management command that lets you see the SQL queries being generated, along with the time taken, as you play around with your models in a Django/Python shell.

Lazy Loading on a Collection of Objects

I have a SQL query that can bring back a large number of rows via a DataReader. At the moment I query the DB, transform the result set into a List(Of ), and data bind the Grid to the List.
This occasionally results in a timeout due to the size of the data set.
I currently have a three-tier setup whereby the UI acts on the List of objects in the business layer.
Can anyone suggest the best approach to implementing lazy loading in this scenario? Or is there some other way of implementing this cleanly?
I am currently using Visual Studio 2005, .NET 2.0
EDIT: How would paging be used in this instance?
LINQ to SQL seems to make sense in your situation.
Otherwise if for any reason, you don't want to use LINQ to SQL (e.g. you are on .NET 2.0), consider writing an iterator that reads the DataReader and converts it to the appropriate object:
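// Returns IEnumerable so the result can be used directly in foreach.
IEnumerable<MyObject> ReadDataReader() {
    // A DataReader advances with Read(); it has no MoveNext().
    while (reader.Read())
        yield return FetchObject(reader);
}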
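For comparison, the same streaming-iterator idea looks like this as a Python generator over a database cursor. It is a sketch only, with a made-up things table and Thing class: rows are converted to objects one at a time as the caller iterates, rather than being collected into a list first.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Thing:
    thing_id: int
    name: str


def read_things(conn):
    """Yield one Thing per row; rows are fetched as the caller iterates."""
    cursor = conn.execute("SELECT id, name FROM things ORDER BY id")
    for thing_id, name in cursor:
        yield Thing(thing_id, name)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE things (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO things (name) VALUES (?)", [("a",), ("b",), ("c",)])

things = read_things(conn)
first = next(things)  # only now is the query executed and one row converted
rest = list(things)   # consumes whatever is left
```

Note that a grid bound to such an iterator may still enumerate everything at once, so streaming on its own does not replace paging.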
Do you need to bring back all the data at once? You could consider paging.
Paging might be your best solution. If you are using SQL Server 2005 or greater, there is a new feature for this, ROW_NUMBER():
WITH MyThings AS
(
    SELECT ThingID, DateEntered,
        ROW_NUMBER() OVER (ORDER BY DateEntered) AS RowNumber
    FROM dbo.Things
)
SELECT *
FROM MyThings
WHERE RowNumber BETWEEN 50 AND 60;
There is an example by David Hayden which is very helpful in demonstrating the SQL.
This method would decrease the number of records returned, reducing the overall load time. It does mean that you will have to do a bit more to track where you are in the sequence of records, but it is worth the effort.
The standard paging technique requires everything to come back from the database and then be filtered at the middle tier or client tier (code-behind); this method instead reduces the records to a more manageable subset at the database.
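As a rough illustration of the bookkeeping involved, here is a Python/sqlite3 sketch that turns a page number into a ROW_NUMBER() range. It assumes SQLite 3.25 or later (window function support), and the things table, its columns, and the fetch_page name are made up for the example; it is not SQL Server code.

```python
import sqlite3


def fetch_page(conn, page_number, page_size):
    """Return one page of rows, numbered by date_entered, using ROW_NUMBER()."""
    first_row = (page_number - 1) * page_size + 1
    last_row = page_number * page_size
    sql = """
        WITH numbered AS (
            SELECT thing_id, date_entered,
                   ROW_NUMBER() OVER (ORDER BY date_entered) AS rn
            FROM things
        )
        SELECT thing_id, date_entered
        FROM numbered
        WHERE rn BETWEEN ? AND ?
        ORDER BY rn
    """
    return conn.execute(sql, (first_row, last_row)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE things (thing_id INTEGER PRIMARY KEY, date_entered TEXT)")
conn.executemany(
    "INSERT INTO things VALUES (?, ?)",
    [(i, f"2024-01-{i:02d}") for i in range(1, 26)],
)
page_one = fetch_page(conn, 1, 10)
page_three = fetch_page(conn, 3, 10)  # a short last page: only rows 21 to 25 exist
```

The only state the caller has to track is the current page number and page size; the row range is derived from those on each request.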