MergeFactor effect on indexes - indexing

My solrconfig.xml configuration is as follows:
<mainIndex>
<useCompoundFile>false</useCompoundFile>
<ramBufferSizeMB>32</ramBufferSizeMB>
<mergeFactor>5</mergeFactor>
<maxMergeDocs>10</maxMergeDocs>
<maxFieldLength>10000</maxFieldLength>
<unlockOnStartup>false</unlockOnStartup>
</mainIndex>
The index size is 12 MB. But when I change mergeFactor, I see no effect on the index: the number of segments stays exactly the same. I don't understand which setting affects the number of segments; I assumed it was mergeFactor.
My next question: which setting defines the number of documents per segment, and how large does a segment get before the next segment is created?
Please clarify these points for me.

To your questions:
mergeFactor: With a mergeFactor of 10, a new segment is created for every 10 documents until there are 10 such segments; these are then merged into a single segment of 100 documents, and so on.
maxMergeDocs: The maximum number of documents a segment can hold through merging; once a segment reaches this size, further documents go into a new segment.
So in the end, both settings influence the number of segments.
Update:
If you use the DataImportHandler, make sure you don't auto-optimize to maxSegments=1 on a full import; otherwise you will not see any effect.
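To make the merge behaviour described above concrete, here is a toy simulation in Python. It is only a sketch of the level-based idea (every mergeFactor equally sized segments collapse into one), not Lucene's actual merge policy: it ignores maxMergeDocs, ramBufferSizeMB and optimize calls, and the function name simulate_segments is made up for this illustration.

```python
def simulate_segments(num_flushes: int, merge_factor: int) -> list[int]:
    """Return the segment sizes (counted in flushes) left after a simplified
    level-based merge. This is a toy model, not Lucene's real merge policy."""
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    segments: list[int] = []  # segment sizes, oldest first
    for _ in range(num_flushes):
        segments.append(1)  # each flush writes one new small segment
        # Merge as long as the newest merge_factor segments are the same size.
        while (len(segments) >= merge_factor
               and len(set(segments[-merge_factor:])) == 1):
            merged = sum(segments[-merge_factor:])
            del segments[-merge_factor:]
            segments.append(merged)
    return segments


if __name__ == "__main__":
    for factor in (2, 5, 10):
        sizes = simulate_segments(100, factor)
        print(f"mergeFactor={factor}: {len(sizes)} segments {sizes}")
```

In this model a smaller mergeFactor merges more often, while a larger one leaves more small segments on disk between merges.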

Is there a way to do string replacement/substitution in sql?

I have some records in a CMS that include HTML fragments with custom tags for a widget tool. The maker of the CMS has apparently updated their CMS without providing proper data conversion. Their widgets use keys for layout based on screen width such as block_lg, block_md, block_sm. The problem kicks in with the fact they used to have a block_xs and they have now shifted them all -- dropping the block_xs and instead placing a block_xl on the other end.
We don't really use these things, but their widget configurations do. What this means for us is that the values for each key are identical. The problem occurs when the updated CMS code looks for 'block_xl' in a widget definition tag: it can't find it and errors out.
What I'm thinking then is that the new code will appear to 'ignore' the block_xs due to how it reads the tags. (and similarly, the old code will ignore block_xl) Since the values for each are identical, I need to basically read any widget definition and add a block_xl value to it matching the value of [any one of] the other width parameters.
Since the best place order-wise would be 'before' the block_lg value, it's probably easiest to do it as follows:
Replace anything matching a regex such as /block_lg="(\d+,\d+,?)"/ with: block_xl="$1" block_lg="$1"
Or whatever the equivalent of that would be.
Example of an existing CMS block with multiple widget definitions:
<div>{{widget type="CleverSoft\CleverBlock\Block\Widget"
widget_title="The Album" classes="highlight-bottom modish greenfont font52 fontlight"
enable_fullwidth="0" block_ids="127" lazyload="0"
block_lg="127,12," block_md="127,12," block_sm="127,12," block_xs="127,12,"
template="widget/block.phtml" scroll="0" background_overlay_o="0"}}</div>
<!-- Image Block -->
<div>{{widget type="CleverSoft\CleverBlock\Block\Widget"
widget_title="What’s Your Favorite Cover Style?"
classes="zoo-widget-style2 modish grey font26 fontlight"
enable_fullwidth="0" block_ids="126" lazyload="0"
block_lg="126,12," block_md="126,12," block_sm="126,12," block_xs="126,12,"
template="widget/block.phtml" scroll="0" background_overlay_o="0"}}</div>
What I would prefer to end up with from the above (adding block_xl):
<div>{{widget type="CleverSoft\CleverBlock\Block\Widget"
widget_title="The Album" classes="highlight-bottom modish greenfont font52 fontlight"
enable_fullwidth="0" block_ids="127" lazyload="0"
block_xl="127,12," block_lg="127,12," block_md="127,12," block_sm="127,12," block_xs="127,12,"
template="widget/block.phtml" scroll="0" background_overlay_o="0"}}</div>
<!-- Image Block -->
<div>{{widget type="CleverSoft\CleverBlock\Block\Widget"
widget_title="What’s Your Favorite Cover Style?"
classes="zoo-widget-style2 modish grey font26 fontlight"
enable_fullwidth="0" block_ids="126" lazyload="0"
block_xl="126,12," block_lg="126,12," block_md="126,12," block_sm="126,12," block_xs="126,12,"
template="widget/block.phtml" scroll="0" background_overlay_o="0"}}</div>
I know how to do it in php and if necessary, I will just replace it on my local DB and write an sql script to update the modified records, but the html blocks can be kind of big in some cases. It would be preferable, if it is possible, to make the substitutions right in the SQL but I'm not sure how to do it or if it's even possible to do.
And yes, there can be more than one instance of a widget in any given cms page or block. (i.e. there may be a need for more than one such substitutions with different local 'values' assigned to the block_lg)
If anyone can help me do it in SQL, it would be greatly appreciated.
For reference, the tables affected are called cms_page and cms_block; the column in both cases is named content.
SW
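As a way to check the substitution before touching the database, here is a minimal Python sketch of the replacement described above. It assumes every widget tag has the form {{widget ... }} and that the block_lg value is two comma-separated numbers with an optional trailing comma, as in the samples; add_block_xl is a hypothetical helper name. If the database is MySQL 8.0 or later, or MariaDB, a REGEXP_REPLACE function exists that may allow the same substitution directly in an UPDATE, but that is untested against this data.

```python
import re

# One widget tag: "{{widget ... }}", possibly spanning several lines.
WIDGET_TAG = re.compile(r"\{\{widget.*?\}\}", re.DOTALL)
# block_lg="<digits>,<digits>" with an optional trailing comma (assumed format).
BLOCK_LG = re.compile(r'block_lg="(\d+,\d+,?)"')


def add_block_xl(content: str) -> str:
    """Insert block_xl before block_lg in every widget tag that lacks it."""

    def fix_tag(match: re.Match) -> str:
        tag = match.group(0)
        if "block_xl=" in tag:
            return tag  # already converted, leave untouched
        return BLOCK_LG.sub(r'block_xl="\1" block_lg="\1"', tag, count=1)

    return WIDGET_TAG.sub(fix_tag, content)


if __name__ == "__main__":
    sample = '<div>{{widget block_ids="127" block_lg="127,12," block_md="127,12,"}}</div>'
    print(add_block_xl(sample))
```

Because tags that already contain block_xl are skipped, running the function twice over the same content does not duplicate the attribute.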

Change images slider step in TensorBoard

This is about the images history in TensorBoard 1.1.0. I would like to set the slider's position (on top of the black image with 7) more precisely, so that I can select any step. Right now I can only jump between certain steps, e.g. 2050 and 2810. Is that possible?
Is there perhaps a place in the sources where the constant 10 is hardcoded?
I answered this in the question "TensorBoard doesn't show all data points", but this one seems to be more popular, so I will quote my answer here.
You don't have to change the source code for this; there is a flag called --samples_per_plugin.
Quoting from the help output:
--samples_per_plugin: An optional comma separated list of plugin_name=num_samples pairs to explicitly
specify how many samples to keep per tag for that plugin. For unspecified plugins, TensorBoard
randomly downsamples logged summaries to reasonable values to prevent out-of-memory errors for long
running jobs. This flag allows fine control over that downsampling. Note that 0 means keep all
samples of that type. For instance, "scalars=500,images=0" keeps 500 scalars and all images. Most
users should not need to set this flag.
(default: '')
So if you want to have a slider of 100 images, use:
tensorboard --samples_per_plugin images=100
I managed to do this by changing this line in TensorBoard backend
This question is covered in the FAQ:
Is my data being downsampled? Am I really seeing all the data?
TensorBoard uses reservoir sampling to downsample your data so that it
can be loaded into RAM. You can modify the number of elements it will
keep per tag in tensorboard/backend/application.py. See this
StackOverflow question for some more information.
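For readers unfamiliar with reservoir sampling, the technique the FAQ mentions, here is a short Python sketch of the classic version (often called Algorithm R). It only illustrates the general idea and is not TensorBoard's implementation; the function name and the seed parameter are assumptions made for this example. Unlike the TensorBoard flag, k=0 here simply returns an empty sample.

```python
import random
from typing import Iterable, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: int = 0) -> list[T]:
    """Keep a uniform random sample of at most k items from a stream."""
    rng = random.Random(seed)
    reservoir: list[T] = []
    for index, item in enumerate(items):
        if index < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace a kept item with probability k / (index + 1).
            j = rng.randint(0, index)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    steps = range(0, 10000, 10)  # pretend these are logged image steps
    print(sorted(reservoir_sample(steps, 10)))
```

This shows why only a handful of steps are selectable on the slider: the stream of summaries is longer than the reservoir, so most steps are dropped.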

querying categorymembers with wikimedia and the size

I am trying to get the page sizes of all category members through the Wikimedia API with only one request (or fewer than 10).
I know I could get the page sizes by:
(1) Requesting every page separately and get the size
or
(2) A search query like this:
http://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=physics
The result is several pages with the size and word count property.
Now, how can I get the size and word count for category members with a query like this one, or with some other trick?
http://en.wikipedia.org/w/api.php?action=query&list=categorymembers&cmtitle=Category:Physics
Any hints shared would be appreciated.
You can use a category query as a generator, using generator=categorymembers and gcmtitle=Category:Physics. This will execute the query action for each and every page in that category:
api.php?action=query
&generator=categorymembers
&gcmtitle=Category:Lakes
&prop=info
In the docs you can see what properties can be used as generators: categories, links and templates. Also, more or less every list module can be used as a generator in the same fashion.
Note that parameter names are prefixed with a g when used for a generator, so cmtitle becomes gcmtitle in the example above. This distinguishes them from the parameters of the query action that is applied to every page returned by the generator, such as prop and inprop.
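As a sketch of how such a request could be assembled, the following Python snippet builds the generator URL without sending it. The endpoint is the one from the question; gcmlimit and format=json are additional parameters assumed here for convenience, and category_info_url is a hypothetical helper name. With prop=info, each returned page should carry a length field (the page size in bytes); as far as I know it does not include a word count.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def category_info_url(category: str, limit: int = 50) -> str:
    """Build a query URL that runs prop=info over all members of a category."""
    params = {
        "action": "query",
        "generator": "categorymembers",  # list module used as a generator
        "gcmtitle": category,            # cmtitle with the generator prefix "g"
        "gcmlimit": limit,               # cmlimit with the generator prefix "g"
        "prop": "info",                  # applied to every generated page
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(category_info_url("Category:Physics"))
```

Large categories will not fit in one response, so the continuation values returned by the API would have to be passed back in follow-up requests.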

Fast Image Retrieval

I have a website that displays products. Each page displays 16 products and there are around 70,000 products on the site. The HTML for each page is generated using PHP.
Product information is stored within a database. Roughly, the first page of results (if I want to show cheapest items first) would be displayed like this (pseudo code only):
// run sql to fetch product titles and image filenames
SELECT itemTitle, itemImageFileName FROM items ORDER BY itemPrice ASC LIMIT 16
// loop through and display items
for (i=0; i<16; i++) {
echo "<p>$itemTitle</p>";
echo "<img src=$itemImageFileName height='100px' width='80px'>";
}
When I do this, the image titles appear first, and then the images are loaded around half a second to one second afterwards. I am wondering how I can accelerate the image loading.
All images are stored in a single folder containing 70,000 images. Nearly all images are less than 50KB in size. Each image filename is of the form: id_width_height.jpg. For example, a filename might be like: 32193_80_100.jpg
I am wondering whether the bottleneck is that it takes the server some time to find the required files because there are 70,000 files in the folder. Is there a way I can accelerate this? Are there any other reasons why images are slow to load?
First of all, I would inspect the image request in a network tab of Chrome/Firefox. That will answer the question of whether the lookup or the download is the critical part.
Do you have a link?
70,000 files in a single folder is, IMHO, too many. I would split the filename up and create subfolders such as:
/htdocs/img/32193/80_100.jpg
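The mapping from the flat filename to the subfolder layout above could look like the following Python sketch. It assumes every filename has the id_width_height.jpg form from the question; sharded_path is a made-up helper name and /htdocs/img is just the root used in the example path.

```python
from pathlib import PurePosixPath


def sharded_path(filename: str, root: str = "/htdocs/img") -> PurePosixPath:
    """Map 'id_width_height.jpg' to '<root>/<id>/<width>_<height>.jpg'."""
    # Split only at the first underscore: the id becomes the subfolder name.
    image_id, rest = filename.split("_", 1)
    return PurePosixPath(root) / image_id / rest


if __name__ == "__main__":
    print(sharded_path("32193_80_100.jpg"))
```

Note that one folder per product id still produces many subfolders under the root, so measuring in the browser's network tab first, as suggested above, is worthwhile before moving any files.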

Break the PDF document after 100 pages

I am working with JasperReports and the iReport tool. One of the client's requirements is that the generated PDF file be limited to 100 pages.
Could you please help me? How can I generate a PDF document of at most 100 pages?
As @WEG mentioned in the answer to the "JasperReport size limit" question, it can be done with the help of these parameters:
net.sf.jasperreports.governor.max.pages.enabled - a flag indicating whether the governor that checks if a report exceeds a specified limit of pages is turned on. With this property enabled, the JR engine will stop the report execution if the number of pages becomes greater than a custom given value;
net.sf.jasperreports.governor.max.pages - if the governor that checks if a report exceeds a specified limit of pages is turned on, this property will indicate the maximum number of pages allowed to be run, in order to prevent a memory overflow error. If the number of pages in the report becomes greater than this value, the report execution will be stopped;
REPORT_MAX_COUNT - an integer that limits the number of records read from the data source.
In iReport you can use the built-in variable PAGE_NUMBER, which holds the current page number (PAGE_COUNT, by contrast, counts the records on the current page). For every element in the detail band you can put the following in the "Print when expression" textbox:
Boolean.valueOf($V{PAGE_NUMBER}.intValue() <= 100)
This will stop printing after page number 100.