Two blocks on the same line in a FlowDocument - xaml

Is it possible to have two blocks (say, two Sections) rendered on the same line in a FlowDocument?
It seems to always start the second section on the next line and I can't seem to work around this.
A wrapper using InlineUIContainer might work, but what do I put in the InlineUIContainer? I need to render tens of thousands of these lines, so it needs to be relatively efficient.

Sounds like you should use a table.
E.g.
<Table>
  <Table.Columns>
    <TableColumn/>
    <TableColumn/>
  </Table.Columns>
  <TableRowGroup>
    <TableRow>
      <TableCell>
        <Section><Paragraph>First section</Paragraph></Section>
      </TableCell>
      <TableCell>
        <Section><Paragraph>Second section</Paragraph></Section>
      </TableCell>
    </TableRow>
  </TableRowGroup>
</Table>
You could either repeat the whole table every time you need this, or just add rows to the row group every time.
I haven't done any performance testing with tens of thousands of tables or a table with tens of thousands of rows. I have gone up to about a thousand tables in a single flow document, and the FlowDocumentReader maintains good performance up to that level.

Related

Resolving performance Issues on Linq with "LIKE"

I have a recognition table containing 25,000 records, and an incoming table of strings that must be recognised using LIKE matching, typically between 200 and 4000 per batch. This used to run in SQL Server, but I am trying to make it faster by doing it all in memory. However, LINQ is much slower: it takes 5 seconds, compared with 250ms in SQL Server, when the incoming table has 200 rows.
The recognition table is declared as follows:
Private mRecognition377LK As New SortedDictionary(Of String, RecognitionItem)(StringComparer.CurrentCultureIgnoreCase)
The actual like comparison is here:
r = mRecognition377LK.FirstOrDefault(Function(v As KeyValuePair(Of String, RecognitionItem)) sTitle Like v.Key).Value
So this is executed for every incoming record. I thought that using v.Key would enable the LINQ engine to skip records that start with a different character, but it seems not.
I can reinvent the wheel and create a collection class that splits the recognition table into its constituent
E.g. if an incoming string is abcdef and we have a recognition record of "abc*", then I could store a collection grouped by the length of the recognition item up to the first star (3), and inside that a collection of recognition items with that length, keyed on the text up to the first star (abc).
So abc* has a string length of 3 so:
r = Itemz(3).Recog("abc")
I think that will work and perform well, but it's a lot of faff, and I'm sure that collection classes and LINQ would have been designed in a way that such a simple thing could be executed quickly without this performance drag.
So my question is: is there a way to make this go fast without resorting to my proposed solution?
DRAFT ANSWER
So having programmed up several iterations of TRIE and binary searches I realised that all this was excessive processing and that is because....
BOTH LISTS ARE SORTED
... that means we only need one loop to process both lists and join them, i.e. we are doing in C#/VB what Sql Server does when it performs a MERGE join. So now I am pursuing this as a solution and will update here as appropriate.
FINAL UPDATE
The solution is now finished, and you can indeed join as many lists as you like as long as they are all sorted ascending or all sorted descending on the attributes you are joining, and you can do this in a single loop (because they are sorted). My code is about 1000 lines and very specific, so I'm not going to post a code solution, but for anyone that hits this kind of problem in future, it seems there is nothing in linq that will help do a merge join which is not based on equality (we have LIKE matching) so writing your own merge join in a single loop is possible when the incoming data is sorted.
The basis of the algorithm is to loop through the table which is your "maintable", and advance a pointer into each other list until the text comparison becomes greater than or equal. When it's equal, you don't advance that list again until it no longer matches the maintable item, since one item on the right could join many items on the left. This can be repeated for multiple arrays.
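To make the idea concrete, here is a minimal sketch in Python, not the author's code (which was not posted). It assumes every recognition pattern has the simple form "prefix*" with the star already stripped, that both lists are sorted ascending with the same ordinal (case-sensitive) ordering, and that only the first matching prefix per title is wanted. The function name merge_prefix_join and the sample data are made up for illustration.

```python
def merge_prefix_join(titles, prefixes):
    """Join sorted titles to sorted prefixes in a single pass.

    Both lists must be sorted ascending using the same ordinal ordering.
    Returns (title, prefix) pairs for each title that starts with a prefix.
    """
    matches = []
    j = 0  # pointer into the prefixes list; it only ever moves forward
    for title in titles:
        # Skip prefixes that sort before this title and do not match it.
        # Because titles are sorted, such a prefix cannot match a later title.
        while (j < len(prefixes)
               and prefixes[j] < title
               and not title.startswith(prefixes[j])):
            j += 1
        if j < len(prefixes) and title.startswith(prefixes[j]):
            # Do not advance j on a match: the same prefix may also
            # join the next title(s).
            matches.append((title, prefixes[j]))
    return matches


titles = sorted(["abcdef", "abcxyz", "abd", "zebra"])
prefixes = sorted(["abc", "xyz", "ze"])
result = merge_prefix_join(titles, prefixes)
print(result)
```

Each list is walked once, so the cost is roughly proportional to the combined length of the two lists rather than their product, which is consistent with the timings reported below. A case-insensitive version would need both lists sorted and compared with the same case-folded key.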
It would be nice to see a library where you can pass lambda functions to perform merge joins on multiple sorted arrays. I will consider writing one in future.
The solution runs in 0.007 seconds to join 200 records to a 70,000 record recognition list. With linq performing effectively an inner loop, it took 5 seconds. When joining 4000 records to the same 70,000 record recognition list, the performance degrades only slightly to around 0.01s, showing the great effectiveness of the merge join logic. Sql server took around 250ms to perform the join.

How to delete multiple rows in wxDataViewCtrl with wxDataViewVirtualListModel

I am using wxDataViewCtrl and wxDataViewVirtualListModel to show a long list of data; the wxDataViewVirtualListModel has 3 wxArrayString members to store the data.
Currently, when I want to delete a row, I delete the data in the 3 wxArrayStrings and call RowDeleted(row) to notify the wxDataViewCtrl.
However, when I want to delete hundreds of rows, I need to use a loop to delete them, which is very slow.
How can I delete multiple rows faster?
Thank you
Sorry to dig up an old thread, but this pops to the top of the search, and I may have a solution that will help. The wxDataView example doesn't exactly show how to clear the entire list. Here is how I did it, and it seems very fast:
In your derived wxDataViewVirtualListModel class, add a function to clear all the column data out of your model. Like this:
void Clear()
{
    m_myDescriptionColValues.clear();
    m_myNumberColValues.clear();
    m_myFooColValues.clear();
    Reset(0); // This is like DeleteRows(), but better.
}
In the wxDataView sample, this would go in the MyListModel class. Call this function when you want to clear out the model and repopulate the control with fresh data. It's really fast in my program with several hundred items.
At the very least, you should use a single RowsDeleted() call instead of multiple RowDeleted() calls. You could also use a more efficient representation than 3 parallel arrays, although I seriously doubt it's the bottleneck for just a few hundred rows -- but as usual, you need to profile to find out whether this is really [not] the case.

Storing trillions of document similarities

I wrote a program to compute similarities among a set of 2 million documents. The program works, but I'm having trouble storing the results. I won't need to access the results often, but will occasionally need to query them and pull out subsets for analysis. The output basically looks like this:
1,2,0.35
1,3,0.42
1,4,0.99
1,5,0.04
1,6,0.45
1,7,0.38
1,8,0.22
1,9,0.76
.
.
.
Columns 1 and 2 are document ids, and column 3 is the similarity score. Since the similarity scores are symmetric I don't need to compute them all, but that still leaves me with 2000000*(2000000-1)/2 ≈ 2,000,000,000,000 lines of records.
A text file with 1 million lines of records is already 9MB. Extrapolating, that means I'd need 17 TB to store the results like this (in flat text files).
Are there more efficient ways to store these sorts of data? I could have one row for each document and get rid of the repeated document ids in the first column. But that'd only go so far. What about file formats, or special database systems? This must be a common problem in "big data"; I've seen papers/blogs reporting similar analyses, but none discuss practical dimensions like storage.
DISCLAIMER: I don't have any practical experience with this, but it's a fun exercise and after some thinking this is what I came up with:
Since you have 2.000.000 documents you're kind of stuck with an integer for the document ids; that makes 4 bytes + 4 bytes. The similarity seems to be between 0.00 and 1.00, so I guess a byte would do by encoding 0.00-1.00 as 0..100.
So your table would be : id1, id2, relationship_value
That brings it to exactly 9 bytes per record. Thus (without any overhead) ((2 * 10^6)^2)*9/2 bytes are needed, that's about 17 TB.
Of course that's if you have just a basic table. Since you don't plan on querying it very often I guess performance isn't that much of an issue. So you could go 'creative' by storing the values 'horizontally'.
Simplifying things, you would store the values in a 2 million by 2 million square and each 'intersection' would be a byte representing the relationship between their coordinates. This would "only" require about 3.6 TB, but it would be a pain to maintain, and it also doesn't make use of the fact that the relations are symmetrical.
So I'd suggest to use a hybrid approach, a table with 2 columns. First column would hold the 'left' document-id (4 bytes), 2nd column would hold a string of all values of documents starting with an id above the id in the first column using a varbinary. Since a varbinary only takes the space that it needs, this helps us win back some space offered by the symmetry of the relationship.
In other words,
record 1 would have a string of (2.000.000-1) bytes as value for the 2nd column
record 2 would have a string of (2.000.000-2) bytes as value for the 2nd column
record 3 would have a string of (2.000.000-3) bytes as value for the 2nd column
etc
That way you should be able to get away with something like 2 TB (incl. overhead) to store the information. Add compression to it and I'm pretty sure you can store it on a modern disk.
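For anyone who wants to check the arithmetic, the three layouts discussed above can be compared with a few lines of Python. This is only a back-of-envelope sketch: it ignores all database overhead, and the dictionary keys are labels I made up. Depending on whether you count in decimal TB or binary TiB, the results land slightly either side of the rounded figures quoted in the text.

```python
N = 2_000_000  # number of documents, from the question


def layout_sizes(n=N):
    """Raw payload size in bytes for each storage layout (no overhead)."""
    pairs = n * (n - 1) // 2  # unordered document pairs
    return {
        "rows": pairs * 9,    # id1 (4 bytes) + id2 (4 bytes) + score (1 byte)
        "square": n * n,      # full n x n matrix, 1 byte per cell
        "triangle": pairs,    # one byte string per document, ids above it only
    }


sizes = layout_sizes()
for name, size in sizes.items():
    print(f"{name}: {size / 10**12:.2f} TB ({size / 2**40:.2f} TiB)")
```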
Of course the system is far from optimal. In fact, querying the information will require some patience as you can't approach things set-based and you'll pretty much have to scan things byte by byte. A nice 'benefit' of this approach would be that you can easily add new documents by adding a new byte to the string of EACH record + 1 extra record at the end. Operations like that will be costly though, as they will result in page-splits; but at least it will be possible without having to completely rewrite the table. It will cause quite a bit of fragmentation over time, though, and you might want to rebuild the table once in a while to make it more 'aligned' again. Ah.. technicalities.
Selecting and Updating will require some creative use of SubString() operations, but nothing too complex..
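To illustrate the kind of offset arithmetic involved, here is a small in-memory sketch in Python that stands in for the varbinary column. It assumes document ids run from 1 to N and that the record for id i holds one byte for each of the ids i+1..N; the function names and the tiny N are made up for the example, and in a real database the same offset would feed a SubString() call instead of a bytearray index.

```python
N = 6  # tiny stand-in for the 2.000.000 documents

# One byte string per 'left' id; record i covers ids i+1..N.
rows = {i: bytearray(N - i) for i in range(1, N)}


def offset(left_id, right_id):
    """0-based position of right_id inside the byte string of left_id."""
    if not left_id < right_id:
        raise ValueError("left_id must be smaller than right_id")
    return right_id - left_id - 1


def set_score(a, b, score):
    lo, hi = sorted((a, b))  # symmetry: always store under the smaller id
    rows[lo][offset(lo, hi)] = round(score * 100)  # 0.00-1.00 -> 0..100


def get_score(a, b):
    lo, hi = sorted((a, b))
    return rows[lo][offset(lo, hi)] / 100


set_score(1, 4, 0.99)
set_score(5, 2, 0.35)
print(get_score(4, 1), get_score(2, 5))
```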
PS: Strictly speaking, for 0..100 you only need 7 bits, so if you really want to squeeze the last bit out of it you could actually store 8 values in 7 bytes and save roughly another eighth of the space (ca 250 GB out of the 2 TB), but it would make things quite a bit more complex... then again, it's not like the data is going to be human-readable anyway =)
PS: this line of thinking is completely geared towards reducing the amount of space needed while remaining practical in terms of updating the data. I'm not saying it's going to be fast; in fact, if you'd go searching for all documents that have a relation-value of 0.89 or above the system will have to scan the entire table and even with modern disks that IS going to take a while.
Mind you that all of this is the result of half an hour brainstorming; I'm actually hoping that someone might chime in with a neater approach =)

VBA: Performance of multidimensional List, Array, Collection or Dictionary

I'm currently writing code to combine two worksheets containing different versions of data.
I first want to sort both by a key column, combine them, and then mark the changes between the versions in the output worksheet.
As the data already amounts to several tens of thousands of lines and might some day exceed Excel's rows-per-worksheet limit, I want these calculations to run outside of a worksheet. That should also perform better.
Currently I'm thinking of quicksorting the first and second data sets and then comparing them per key/line, using the result of the comparison to format the cells accordingly.
Question
I'd just love to know, whether I should use:
List OR Array OR Collection OR Dictionary
OF Lists OR Arrays OR Collections OR Dictionaries
I have so far been unable to determine the differences in codability and performance between these 16 possibilities. Currently I'm implementing an Array OF Arrays approach, constantly wondering whether this makes sense at all.
Thanks in advance, appreciate your input and wisdom!
Some time ago, I had the same problem with a client's macro. In addition to the very large number of rows (over 50000 and growing), it became tremendously slow beyond a certain row number (around 5000) when a "standard approach" was taken, that is, when the inputs for the calculations on each row were read from the same worksheet (a couple of rows above). This reading and writing was what made the process slower and slower (apparently, Excel starts from row 1, and the lower the row, the longer it takes to reach it).
I improved this situation with two changes. First, I set a maximum number of rows per worksheet; once it was reached, a new worksheet was created and the reading/writing continued there (from the first rows). Second, I moved from reading and writing in Excel to reading from temporary .txt files and writing to Excel (all the lines were read right at the start to populate the files). These two modifications improved the speed a lot (from half an hour to a couple of minutes).
Regarding your question, I wouldn't rely too much on arrays in a macro (although I am not sure how much information each of these 10000 lines contains), but I guess that this is a personal decision. I don't like collections much because they are less efficient than arrays, and the same goes for dictionaries.
I hope that this "short" comment will be of any help.

Need for long and dynamic select query/view sqlite

I have a need to generate a long select query of potentially thousands of where conditions like (table1.a = ? OR table1.a = ? OR ...) AND (table2.b = ? OR table2.b = ? ...) AND....
I initially started building a class to make this more bearable, but have since stopped to wonder if this will work well. This query is going to be hammering a table of potentially 10s of millions of rows joined with 2 more tables with thousands of rows.
A number of concerns are stemming from this:
1.) I wanted to use these statements to generate a temp view so I could easily transfer over existing code base, the point here is I want to filter data that I have down for analysis based on selected parameters in a GUI, so how poorly will a view do in this scenario?
2.) Can sqlite even parse a query with thousands of binds?
3.) Isn't there a framework that can make generating this query easier other than with string concatenation?
4.) Is the better solution to dump all of the WHERE variables into hash sets in memory and then just write a wrapper for my DB query object that calls next() until a row is encountered that satisfies all my conditions? My concern here is that the application generates graphs procedurally on scroll, so waiting to draw while calling query.next() x 100,000 might cause an annoying delay. Ideally I don't want to wait more than 30ms at a time for the next row that satisfies everything.
edit:
New issue: it came to my attention that sqlite3 is limited by default to 999 bind values (host parameters), a limit that is set at compile time.
So it seems as if the only way to accomplish what I had originally intended is to
1.) Generate the entire query via string concatenations(my biggest concern being, I don't know how slow parsing all the data inside sqlite3 will be)
or
2.) Do the blanket query method (select * from * where index > ? limit ?) and call next() until I hit valid data in my compiled code (including updating the index variable and re-querying repeatedly)
I did end up writing a wrapper around the QSqlQuery object that will walk a table using index > variable and limit to allow "walking" the table.
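For illustration, the "walking" idea can be sketched with Python's built-in sqlite3 module rather than QSqlQuery. This is not the wrapper itself, just a rough sketch of the same pattern: the table name samples, its columns id, a and b, and the function name walk_filtered are all hypothetical, and the WHERE values live in in-memory sets instead of bind parameters.

```python
import sqlite3


def walk_filtered(conn, allowed_a, allowed_b, batch_size=1000):
    """Walk the table in id order, yielding only rows that pass the sets."""
    last_id = 0
    while True:
        # Only two bind values per query, regardless of the filter size.
        rows = conn.execute(
            "SELECT id, a, b FROM samples WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        for row_id, a, b in rows:
            if a in allowed_a and b in allowed_b:  # set lookups in memory
                yield row_id, a, b
        last_id = rows[-1][0]  # resume after the last id seen


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, a INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO samples (a, b) VALUES (?, ?)",
    [(i % 5, "xy"[i % 2]) for i in range(20)],
)
hits = list(walk_filtered(conn, {1, 3}, {"x"}, batch_size=7))
print(hits)
```

Because the function is a generator, the caller can pull one matching row at a time and stop early, which is where the 30ms concern above would need to be measured on real data.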
Consider dumping the joined results without filters (denormalized) into a flat file and index it with Fastbit, a bitmap index engine.