How to implement KB id for pages in XWiki? - xwiki

Is there any way to navigate XWiki pages using an automatically incrementing knowledge base (KB) ID, like Microsoft Support (https://support.microsoft.com/kb/825751), where every article has its own unique, numeric, searchable ID?
Requirements:
The KB ID should be incremented automatically when a new page is created (no manual incrementing)
Ability to find pages by their KB ID (from a search form)
The KB ID should be displayed on the page or placed in the page URL
Pages should still have normal text titles

Related

Is it possible/wise to NOT link any pages from index? (SEO, Search Engines)

I have a humble question :)
I plan to set up a rather unusual web project with about a thousand pages, where there won't be classical navigation (only for the about and contact pages) and the pages won't link to one another.
It's: index > opens random page > opens random page > opens random page... all via a small PHP action.
I know from basic SEO understanding that you should then generate a static directory like a sitemap that links to all pages, so that Google finds all pages from the index downwards.
BUT I don't want users to see it. It kills the fun of using the site when you can see all content pages at a glance; this project is all about exploring random things.
Is this somehow possible? To have a dead-end index page and a thousand dead-end HTML pages that are only connected via a PHP script?
Thanks in advance.
From a technical standpoint, there are no issues with what you are planning. From an SEO and Google indexing standpoint, make sure none of the pages you want discovered and indexed by Google are orphans, i.e. pages with no links pointing to them.
These "hidden" pages need not be linked from the home page or a sitemap (one-to-many); instead you can try the breadcrumb method, where a page leads only to the next page, which leads to the next page (one-to-one), and so on.
For example:
Parent 1 > child 1 > child 1a > child 1b .......
Parent 2 > child 2 > child 2a > child 2b .......
Parent 3 > child 3 > child 3a > child 3b .......
Here, the home page and your sitemap will have links ONLY to Parent 1, Parent 2 & Parent 3
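Such a setup might look like the following hypothetical sitemap.xml (the URLs are made up), listing only the parent pages while every other page is reachable through the breadcrumb chain:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sitemap.xml: only the three parent pages are listed;
     all child pages are discovered by following the one-to-one links. -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/parent-1</loc></url>
  <url><loc>https://example.com/parent-2</loc></url>
  <url><loc>https://example.com/parent-3</loc></url>
</urlset>
```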
UPDATE:
Also, not having an HTML sitemap for your users will not affect Google indexing, as long as your XML sitemap is in place for Google to access.
Hope this helps.
There's no technical problem in having one page generate links to other pages on the fly, but I feel there are some issues with the general idea here.
Firstly, why do you want your "sub pages" to be indexed by Google at all? By definition, this defeats the "random page" idea. A Google search over your site (for instance using Google's "site:" operator) will list all your pages, since they're indexed. This means it is easy to navigate between the "secret" pages (even if only via cached versions of them).
Secondly, unless you prevent Google from indexing your pages (via a robots.txt file, for instance), the Googlebot will discover at least a subset of your pages by visiting the index page and following the generated links to sub pages.
To conclude: you can create an index that sends users to random pages, but it probably makes little sense to have the site indexed by a search engine if you'd like the sub pages to stay "secret".
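The questioner's "small PHP action" could be as simple as picking a random page server-side. A minimal sketch of that idea in Python (the page list and paths are hypothetical; a real handler would issue an HTTP redirect to the returned path):

```python
import random

# Hypothetical list of content pages; in practice this might come
# from a directory listing or a database.
PAGES = [f"/pages/page-{i}.html" for i in range(1, 1001)]

def next_random_page(current=None):
    """Return a random page path, avoiding an immediate repeat of
    the page the visitor is currently on."""
    choices = [p for p in PAGES if p != current]
    return random.choice(choices)
```

Because no page stores links to any other page, the only connection between them is this function, which is exactly the "dead-end pages connected via a script" setup described above.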

How to search category and fill site parameters automatically

I have seen sites where you search (in Google) for a particular item category, and when you click the link found in Google, it takes you to the site with the search criteria already filled in, displaying the categorised products.
Hypothetical Example
Go into Google, type in Sony TV, and click search.
Results are displayed.
Clicking one of the links takes me to a website which shows all the Sony TV models beginning with AA.
Looking at the search options on the page, some fields have been filled in automatically (in other words, if you did this search manually the site would prompt you to enter some search criteria). Not sure if this is relevant, but I thought I'd mention it.
How is this done? Do I need to set up something in our Google account to get the same results?
It's fairly simple. You pass parameters in your URLs that identify the product, and then you just read the URL parameters when pre-populating the search form on the page. When building your site / sitemap / internal & external links you use those page URLs and Google will naturally pick them up.
In your example, you search for Sony TV. One of the results may be
example.com/index.php?product=sony-tv
The website reads the value sony-tv from the URL and pre-populates it on the search form.
The important thing to note is that the site has typically built its URL structure this way, and the page you're presented with just happens to look as if the site dynamically searched based on your query (it hasn't).
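As a sketch, assuming the hypothetical example.com/index.php?product=sony-tv URL above, the server-side pre-population might look like this (the function and form-field names are illustrative):

```python
from urllib.parse import urlparse, parse_qs

def prefill_from_url(url):
    """Extract search-form defaults from a landing-page URL.
    'product' is the hypothetical parameter from the example URL."""
    params = parse_qs(urlparse(url).query)
    product = params.get("product", [""])[0]
    # Turn a slug like "sony-tv" into illustrative form fields.
    parts = product.split("-", 1)
    return {
        "brand": parts[0],
        "category": parts[1] if len(parts) > 1 else "",
    }
```

The returned dictionary would then be used to fill in the search form's default values when the page is rendered.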

Ektron 9 - How to set Page Title (<html><head><title>...</title>...)

Noob question, but Google is not giving me the goods:
How do you set the page title in a page in Ektron 9? You know: the text that goes into the title tag in the head of the html document.
In an Ektron 8 site I have used, there was a page title meta value that was used.
Does this work out-of-the-box?
If not, are there best practices?
What I have tried
The title of the page is the "Content Title", not the "Page Title"
The new page widget does not have "page title" on the alias screen, as one PDF suggested
Googling "ektron page title" and variants did not turn up much
Editing the Ektron page's folder properties did not show anything
Ektron's "metadata definitions" settings do not have one for page title
I will keep you posted if I find the answer myself
Ektron does not create the definitions for Title, Description, Keywords, or other SEO-related metadata out of the box (when you set up a minimal site, as is standard).
The site manager / developer defines those. How you set that on the page depends on your implementation and Ektron version.
For example, most 8.0 (and prior) sites will use the CMS:Metadata control in the <head> of the master page (or the page, if there is no master). The control accepts one dynamic parameter, so I used to place three controls: one for content (dynamic param = id), one for forms (dynamic param = ekfrm), and one for PageBuilder (dynamic param = pageid).
It's more common now (versions 8.5+) to see developers retrieving the metadata from the content (whether HTML / smart form, HTML form, or PageBuilder) using the ContentManager API method GetItem. This method accepts two params: the first is the ID of the item you want to retrieve; the second is a boolean which, when set to true, tells the API to retrieve metadata values. Once you have the values, you define the output.
Either method will work in versions 8.5+. The latter gives you more control.
Using Ektron 9 SP2.
We have a single line in the master page:
We use the DefaultContentID of the PageBuilder wireframe that is the front page. The other .aspx templates just get the metadata of the HTML content item; all of our non-PageBuilder pages have a content block, then a bunch of smart form data.

Get page numbers of searchresult of a pdf in solr

I'm building a web application where users can search for PDF documents and view them with pdf.js. I would like to display the search results with a short snippet of the paragraph where the search term was found and a link to open the document at the right page.
So what I need is the page number and a short text snippet of every search result.
I'm using Solr 4.1 to index the PDF documents. The indexing itself works fine, but I don't know how to get the page number and paragraph of a search result.
I found "Indexing PDF with page numbers with Solr" but it wasn't really helpful.
I'm now splitting the PDF and sending each page separately to Solr.
So every page is its own document with an id <id_of_document>_<page_number> and an additional field doc_id which contains only the <id_of_document>, for grouping the results.
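A minimal sketch of that indexing convention in Python, assuming the page text has already been extracted (e.g. with PDFBox or pypdf) and that Solr's JSON update handler is used; field names other than id and doc_id are illustrative:

```python
import json
import urllib.request

def page_docs(doc_id, pages):
    """Build one Solr document per PDF page, following the
    <id_of_document>_<page_number> id convention, with a doc_id
    field shared by all pages for grouping."""
    return [
        {"id": f"{doc_id}_{n}", "doc_id": doc_id, "page": n, "text": text}
        for n, text in enumerate(pages, start=1)
    ]

def index_pages(solr_url, docs):
    """POST the page documents to Solr's JSON update handler."""
    req = urllib.request.Request(
        f"{solr_url}/update?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```

With this layout, a hit on document report42_7 tells you the snippet is on page 7, which is exactly what pdf.js needs to open the document at the right place.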
There is JIRA issue SOLR-380 with a patch, which you can check out.
I also tried getting results with page numbers but could not do it. I used Apache PDFBox to split all the PDFs present in a directory and send the files to the Solr server.
I have not tried it myself, but here is an approach:
A custom Solr connector integrating with the Apache Tika parser for indexing PDFs
Create multiple attributes in Solr like page1, page2, page3, ..., pageN. Alternatively, you can use dynamic attributes in Solr
In the custom connector, read the PDFs page by page and index them into the respective page attributes/dynamic attributes
Enable search on all the "page" attributes
When a user searches, use the highlighter/summary/teaser component to retrieve only the "page" attributes that have hits
The "page" attributes that have a hit (found via the highlighter/summary/teaser) for a given record are the pages that contain the searched phrase
Link the PDF with the "#PageNumber" of the PDF and pop up the page on click
This is a far better approach than splitting the PDFs and indexing them as separate Solr docs.
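A sketch of the query side of this approach in Python, assuming dynamic fields named page_1, page_2, ... and Solr's standard highlighter; the function names are hypothetical:

```python
from urllib.parse import urlencode

def highlight_query_url(solr_url, phrase):
    """Build a Solr select URL that asks the highlighter which
    page_* dynamic fields contain the phrase."""
    params = urlencode({
        "q": f'"{phrase}"',
        "hl": "true",
        "hl.fl": "page_*",  # dynamic page fields, e.g. page_1, page_2
        "wt": "json",
    })
    return f"{solr_url}/select?{params}"

def pages_with_hits(highlighting):
    """Map each doc id to the page numbers whose fields had hits.
    `highlighting` is the 'highlighting' section of Solr's JSON
    response: {doc_id: {field_name: [snippets]}}."""
    return {
        doc_id: sorted(int(field.split("_")[1]) for field in fields)
        for doc_id, fields in highlighting.items()
    }
```

The page numbers recovered from the field names are then appended to the PDF link as the "#PageNumber" fragment mentioned above.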
If you find a flaw in this design, respond to my thread and I will attempt to resolve it.

fetch specific title in every page with nutch and solr

I have Solr and Nutch installed, and my web page structure is such that on every page the title is the same, e.g. Bank Something; but on every page there is a tag with an ID of TITLE, something like:
<div ID="TITLE"><h1>my page specific title</h1></div>
I want to add another field to Solr, like secondTitle, that holds my page-specific title so I can search for words in it. (Right now my page-specific title is in the content field, and I want it in a separate field.)
How can I do this?!
Check out Nutch's plugin system, which should allow you to extract an element from a web page.
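Nutch parse plugins themselves are written in Java, but the extraction logic is simple; here it is sketched with Python's stdlib html.parser for illustration (the class and function names are made up):

```python
from html.parser import HTMLParser

class TitleDivExtractor(HTMLParser):
    """Collect the text inside <div ID="TITLE">...</div>."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # nesting depth inside the TITLE div (0 = outside)
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "div":
                self.depth += 1  # track nested divs so we close correctly
        elif tag == "div" and dict(attrs).get("id", "").upper() == "TITLE":
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)

def extract_page_title(html):
    """Return the text content of the div with ID "TITLE"."""
    parser = TitleDivExtractor()
    parser.feed(html)
    return "".join(parser.chunks).strip()
```

In a Nutch parse filter you would run the equivalent extraction over the fetched page and store the result in an extra field (e.g. secondTitle), which Solr can then index and search separately from the content field.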