add a lot of rows with jsreport-core - jsreport

relates to this.
I have a lot of rows to add to the report, which can be in Excel or PDF, and I want to add them in bulks.
How can this be achieved?

The chrome-pdf recipe doesn't support bulks. The HTML generation is typically fast enough, Chrome uses multiple threads to render big tables, and there is no room for further optimization.
The html-to-xlsx recipe includes optimization techniques for big tables. The main point is to use the htmlToXlsxEachRows helper, which splits the rows and runs the conversion in bulks:
<table>
  {{#htmlToXlsxEachRows people}}
  <tr>
    <td>{{name}}</td>
    <td>{{address}}</td>
  </tr>
  {{/htmlToXlsxEachRows}}
</table>
See the performance section for further information.

Related

BigQuery Count Appears to be Processing Data

I noticed that running a SELECT count(*) FROM myTable on my larger BQ tables yields long running times, upwards of 30/40 seconds despite the validator claiming the query processes 0 bytes. This doesn't seem quite right when 500 GB queries run faster. Additionally, total row counts are listed under details -> Table Info. Am I doing something wrong? Is there a way to get total row counts instantly?
When you run a count, BigQuery still needs to allocate resources (such as slot units, shards, etc.). You might be reaching some limit which causes a delay; for example, the default slot allocation per project is 2,000 units.
The BigQuery execution plan provides very detailed information about the process, which can help you better understand the source of the delay.
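As a rough illustration, the Python client library exposes the execution plan on a finished job; this is only a sketch, and the project/dataset/table names are placeholders:

from google.cloud import bigquery

client = bigquery.Client()
job = client.query("SELECT COUNT(*) FROM `my_project.my_dataset.myTable`")  # placeholder names
job.result()  # wait for the job to finish

# Each stage of the plan reports its status and the records read/written,
# which helps pin down where the time is going.
for stage in job.query_plan:
    print(stage.name, stage.status, stage.records_read, stage.records_written)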
One way to overcome this is to use an approximate method described in this link.
This slide deck by Google might also help you.
For more details, see this video about how to understand the execution plan.
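If all you need is the total row count (the same number shown under details -> Table Info), reading the table metadata avoids running a query at all. A minimal sketch with the google-cloud-bigquery Python client, again with placeholder names:

from google.cloud import bigquery

client = bigquery.Client()
# num_rows comes from the table metadata (the figure shown under
# details -> Table Info), so no query is run and no slots are consumed.
table = client.get_table("my_project.my_dataset.myTable")  # placeholder names
print(table.num_rows)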

Large results set from Oracle SELECT

I have a simple statement, pasted below, that runs against an Oracle database. The result set contains names of businesses, but it has 24,000 rows, and these are displayed in a drop-down list.
I am looking for ideas on ways to reduce the result set to speed up the data returned to the user interface, maybe something like Google's search, or a completely different idea. I am open to any thoughts, and any direction is welcome.
SELECT BusinessName FROM MyTable ORDER BY BusinessName;
Idea:
SELECT BusinessName FROM MyTable WHERE BusinessName LIKE 'A%';
I know all about how LIKE clauses are not wise to use, but as I said, this is a LARGE result set. Maybe something along the lines of a binary search?
The last query can perform horribly. String comparisons inside the database can be very slow, and depending on the number of "hits" they can be a huge drag on performance. If that doesn't concern you, that's fine. This is especially true if the company data isn't normalized into its own DB table.
As long as the user knows the company they're looking up, an existing component from a popular JavaScript library that provides a search text field with a dynamic drop-down of matching results would be an effective mechanism. But you might want to use '%A%' if they might look for part of a name. For example, if I'm looking for IBM Rational, LLC, do I want it to show up in the results when I search for "Rational"?
Either way, watch your performance, and if it makes sense, cache that data in the company lookup service that sits on the server in front of the DB. Also, make sure you don't hit the server on every keystroke; use a timeout of 500 ms or so to allow the user to type several characters before going to the server and searching. I would also NOT recommend bringing all of the company names to the client. We're always looking to reduce the size and frequency of round trips between the browser page and the server, and shorter, quicker, very specific queries that perform sufficiently well seem more efficient to me than waiting for 24k company names to come down to the client when the form loads (or even behind the scenes). Again, test it and identify the performance characteristics that fit your use case best.
These are techniques I've used on projects with large data sets, like searching for a user from a base of 100,000+ users. Our code was a custom Dojo widget (dijit); I'm not seeing how to do it directly with the dijit code, but jQuery UI provides an autocomplete widget.
Also limit the number of rows this query returns so that the drop-down only shows a subset of all the matches, forcing the user to further refine the query. Note that Oracle doesn't support LIMIT; use ROWNUM or, on 12c and later, the FETCH FIRST clause:
SELECT BusinessName FROM MyTable ORDER BY BusinessName FETCH FIRST 10 ROWS ONLY;
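A sketch of the server-side lookup that combines the prefix filter, a bind variable, and the row cap, assuming the python-oracledb driver and Oracle 12c+; the connection details and the 10-row cap are placeholders:

import oracledb  # assumed driver; connection details below are placeholders

def lookup_business_names(prefix: str) -> list[str]:
    # Return at most 10 business names starting with the typed prefix.
    # The prefix is passed as a bind variable, so the statement is reusable
    # and not vulnerable to SQL injection (escaping of % and _ in the input
    # is still the caller's responsibility).
    with oracledb.connect(user="app", password="secret", dsn="dbhost/orclpdb") as conn:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT BusinessName FROM MyTable "
                "WHERE BusinessName LIKE :prefix || '%' "
                "ORDER BY BusinessName "
                "FETCH FIRST 10 ROWS ONLY",
                prefix=prefix,
            )
            return [name for (name,) in cur]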

Table Detection Algorithms

Context
I have a bunch of PDF files. Some of them are scanned (i.e. images). They consist of text + pictures + tables.
I want to turn the tables into CSV files.
Current Plan:
1) Run Tesseract OCR to get text of all the documents.
2) ??? Run some type of Table Detection Algorithm ???
3) Extract the rows / columns / cells, and the text in them.
Question:
Is there some standard "Table Extraction Algorithm" to use?
Thanks!
Abbyy FineReader includes table detection and will be the easiest approach. It can scan and import PDFs, TIFFs, etc. You will also be able to manually adjust the tables and columns when the auto-detection fails.
www.abbyy.com - you should be able to download a trial version, and you will also find the OCR results are much more accurate than Tesseract's, which will also save you a lot of time.
Trying to write something yourself will be hit and miss, as there are too many different types of tables to cope with: with lines, without lines, shaded, multiple lines, different alignments, headers, footers, etc.
Good luck.

django objects...values() select only some fields

I'm optimizing the memory load (~2GB, offline accounting and analysis routine) of this line:
l2 = Photograph.objects.filter(**(movie.get_selectors())).values()
Is there a way to convince django to skip certain columns when fetching values()?
Specifically, the routine obtains all rows of the table matching certain criteria (the DB is optimized and performs this very quickly), but it is a bit too much for Python to handle - there is a long string referenced in each row, storing the URLs for thumbnails.
I only really need three fields from each row, but if all the fields are included, it suddenly consumes about 5 kB/row, which sadly pushes the RAM to the limit.
The values(*fields) function allows you to specify which fields you want.
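For example, a minimal sketch (the field names are hypothetical; only the listed columns are selected, so the long thumbnail-URL string never leaves the database):

# Hypothetical field names; adjust to the actual Photograph model.
l2 = (Photograph.objects
      .filter(**movie.get_selectors())
      .values('id', 'title', 'taken_at'))  # each row becomes a small dict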
Check out the QuerySet method only(). When you declare that you want only certain fields to be loaded immediately, the QuerySet manager will not pull in the other fields of your objects until you try to access them.
If you have to deal with ForeignKeys that must also be pre-fetched, then also check out select_related().
The two links above to the Django documentation have good examples that should clarify their use.
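A rough sketch of both, again with hypothetical field and relation names:

# only() loads just the listed columns now and defers the rest; accessing a
# deferred field later triggers an extra query per object.
photos = Photograph.objects.filter(**movie.get_selectors()).only('id', 'title')

# select_related() follows a ForeignKey (here a hypothetical 'album' relation)
# in the same query instead of issuing one extra query per accessed object.
photos = Photograph.objects.filter(**movie.get_selectors()).select_related('album')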
Take a look at the Django Debug Toolbar; it comes with a debugsqlshell management command that allows you to see the SQL queries being generated, along with the time taken, as you play around with your models in a Django/Python shell.

Two blocks on the same line in a FlowDocument

Is it possible to have two blocks (say, two Sections) rendered on the same line in a FlowDocument?
It seems to always start the second section on the next line and I can't seem to work around this.
A wrapper using InlineUIContainer might work, but what do I put in the InlineUIContainer? I need to render tens of thousands of these lines, so it needs to be relatively efficient.
Sounds like you should use a table.
E.g.
<Table>
  <Table.Columns>
    <TableColumn/>
    <TableColumn/>
  </Table.Columns>
  <TableRowGroup>
    <TableRow>
      <TableCell>
        <Section><Paragraph>First section</Paragraph></Section>
      </TableCell>
      <TableCell>
        <Section><Paragraph>Second section</Paragraph></Section>
      </TableCell>
    </TableRow>
  </TableRowGroup>
</Table>
You could either repeat the whole table every time you need this, or just add rows to the row group every time.
I haven't done any performance testing with tens of thousands of tables or a table with tens of thousands of rows. I have gone up to the range of about a thousand tables in a single flow document, and the FlowDocumentReader maintains good performance at that level.