Alfresco FTS - why first digit of folder's name should be escaped? - lucene

I have a question regarding Alfresco FTS/Lucene search. It is known that some special characters in a search query have to be escaped, such as a space (as _x0020_).
But it turns out that if the first character of a folder's name is a digit, it must also be escaped. This can easily be tested in the Node Browser by creating a folder such as 123456 and navigating to its parent folder (in my case the folder structure is */2017/123456/):
Primary Path: /app:company_home/st:sites/<some-folders>/cm:_x0032_017/cm:_x0031_23456
(here _x0032_ is the escaped 2 and _x0031_ is the escaped 1)
If I don't escape the first character of the folder name, a 500 error is returned.
Why is that? I tried to find something relevant in the Alfresco documentation, but didn't manage to.
Alfresco v.4.2.0

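Lucene search uses ISO 9075 encoding (from the SQL standard), like similar frameworks, so the path elements have to be encoded. ISO 9075 encoding maps names onto valid XML names, and an XML name cannot start with a digit, which is why a leading digit is escaped. It would be nice if the API hid this requirement, the way a browser does with URLs, but you can use ISO9075Encode to do the job.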
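To make the escaping rule concrete, here is a minimal Python sketch of the idea. It is not Alfresco's encoder: the function name `iso9075_encode_sketch` is hypothetical, and the "valid name character" test is a simplified approximation of the XML name rules. It only illustrates why 2017 becomes _x0032_017 and why a space becomes _x0020_.

```python
import re

def iso9075_encode_sketch(name: str) -> str:
    """Simplified illustration of ISO 9075 style escaping (an assumption,
    not the real Alfresco implementation)."""
    out = []
    for i, ch in enumerate(name):
        if i == 0:
            # Approximation: a name may only start with a letter or underscore.
            ok = ch.isalpha() or ch == "_"
        else:
            # Approximation: later characters may also be digits, '-' or '.'.
            ok = ch.isalnum() or ch in "_-."
        # An underscore that already looks like the start of an escape
        # sequence has to be escaped itself to keep decoding unambiguous.
        if ch == "_" and re.match(r"_x[0-9a-fA-F]{4}_", name[i:]):
            ok = False
        out.append(ch if ok else "_x%04x_" % ord(ch))
    return "".join(out)

encoded_year = iso9075_encode_sketch("2017")
encoded_folder = iso9075_encode_sketch("123456")
encoded_space = iso9075_encode_sketch("My Folder")
```

With this sketch, only the first digit of 2017 and 123456 is escaped, which matches the Primary Path shown in the question.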

Related

Alfresco lucene search cannot find folder

I have a folder in document library of a site. I want to find all content of that folder. Running following lucene/alfresco-fts query in Node Browser returns No items found:
PATH:"/app:company_home/st:sites/cm:mysite/cm:documentLibrary/cm:MyFolder/*"
This is wrong, as I have documents in that folder, and running the same query for a different folder returns the proper result. Another strange thing is that I cannot get the folder itself; the following query also returns No items found:
PATH:"/app:company_home/st:sites/cm:mysite/cm:documentLibrary/cm:MyFolder"
Also, if I query the content of the document library, MyFolder is skipped in the results while its subfolder is returned:
PATH:"/app:company_home/st:sites/cm:mysite/cm:documentLibrary/*"
Name | Parent
--------------|---------------------
cm:MyFolder2 | /app:company_home/st:sites/cm:mysite/cm:documentLibrary
cm:MySubfolder| /app:company_home/st:sites/cm:mysite/cm:documentLibrary/cm:MyFolder
I have checked the aspects and properties of MyFolder and they are the same as MyFolder2. I do not have any custom behaviours/rules/etc.
How can I make the first Lucene query work and return the content of MyFolder?
Try updating metadata on the folder so Solr re-indexes it. You could also get its DB id and then tell Solr to re-index it by that id. If the folder has over 1000 children, an FTS query may fail; this is a known issue. In that case, try using a transactional metadata (txmd) query.
I would suggest getting the node ref of the folder from the folder details page and searching for it in the Node Browser. There you can get the primary path. Cross-check it against the path you use in the Lucene search, or use that primary path directly to search for the folder.
Another possibility is that the locale property (sys:locale) of the folder (MyFolder) differs from the locale of your browser. Check whether the locale of MyFolder is the same as that of the folders for which results are shown. If it is not, that can also be a reason.

Drupal 7 Apache solr faceted search with OR condition on two fields instead of drill down/AND

I have a Drupal 7 website that is running apachesolr search and is using faceting through the facetapi module.
When I use the facets to narrow my searches, everything works perfectly and I can see the filters being added to the search URL, so I can copy them as links (ready-made narrowed searches) elsewhere on the site.
Here is an example of how the apachesolr URL looks after I select several facets/filters:
search_url/search_keyword?f[0]=im_field_tag_term1%3A1&f[1]=im_field_tag_term2%3A100
Where the 'search_keyword' portion is the text I'm searching for and the '%3A' is just the url encoded ':' (colon).
Knowing this format, I can create any number of ready-made searches by creating the correct format for the URL. Perfect!
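As an illustration of building such ready-made links, here is a small Python sketch that reproduces the URL format shown above. The helper name `facet_search_url` is hypothetical, and `search_url` / `search_keyword` are the same placeholders used in the example; it only shows the `f[n]=field%3Avalue` encoding, not any Drupal API.

```python
from urllib.parse import quote

def facet_search_url(base, keyword, filters):
    """Build a search URL with one f[n] parameter per (field, value) filter.

    Each filter is written as field:value and percent-encoded, so the
    colon becomes %3A as in the URL above.
    """
    params = "&".join(
        f"f[{i}]={quote(f'{field}:{value}', safe='')}"
        for i, (field, value) in enumerate(filters)
    )
    return f"{base}/{quote(keyword, safe='')}?{params}"

url = facet_search_url(
    "search_url",
    "search_keyword",
    [("im_field_tag_term1", 1), ("im_field_tag_term2", 100)],
)
```

Filters built this way are still ANDed by the search page; the sketch only covers how the link itself is assembled.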
However, these filters are always ANDed, the same way they are when using the facet interface. Does anyone know if there is a syntax I can use, specifically in the search URL, to OR my filters/facets? Meaning, to make it such that the result is all entries that contains EITHER of the two filters?
New edit:
I do know how to OR terms for one facet through the URL, im_field_tag_term1:(x or y), but I need to know how to apply an OR condition between two facets.
Thanks in advance.

sharepoint crawl rule to exclude AllItems.aspx, but get an item/document in search results if queried in the search box

I followed this blog, Tips 1, and created a crawl rule http://.*forms/allitems.aspx and ran a full crawl. I no longer get results with AllItems.aspx. However, if there is a document named Something.doc in a Document Library, it no longer gets pulled into the search results either.
I think what I desire is a basic functionality, like the user should not get to see Allitems.aspx in the search results but should get the item/document with names entered in the search box.
Please let me know if I am missing anything. I have already put in 24 hours...googled the max I could.
It seems that an Index Reset is required. Here are the steps I followed:
1. Add the following crawl rule to exclude: *://*allitems.aspx.
2. Index Reset.
3. Full Crawl.
I could not find a good way to do this using crawl rules. Instead, I opted to set up a restriction on the search results web part.
In the search results web part properties, select "Change Query".
Add a property filter to exclude anything with "AllItems" (and any other exclusions you want in place).
Used Steve Mann's blog as a reference and for the images: http://stevemannspath.blogspot.com/2013/04/sharepoint-2013-search-removing-junk.html

Selenium IDE : How to use pattern checking for a dynamic id using XPath

On my website there is a "recently uploaded images" section.
In this section all recently uploaded images are displayed in random order.
Using FirePath I traced the XPath of that location:
//div[@id='udtkbdf50']/a/div[2]/div
On each page refresh the @id='udtkbdf50' value changes; the only thing in common is that the value always starts with u.
So I want to use a pattern matching technique (regular expressions or globbing patterns)
for the @id='udtkbdf50' value, while the rest of the path, i.e. /a/div[2]/div, will remain the same.
//div[contains(@id,'u')]/a/div[2]/div will work.
UPDATE:
//div[starts-with(@id,'u')]/a/div[2]/div will be more specific.
All the best.
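The difference between the two predicates can be seen in a small Python sketch. The markup below is made up for illustration, and because the standard library's ElementTree does not support contains() or starts-with(), the id test is modelled with plain Python string checks; only the fixed tail a/div[2]/div is evaluated as a path.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment: one unrelated div whose id merely contains
# a "u", and one div with a generated id that starts with "u".
doc = ET.fromstring("""
<body>
  <div id="menu"><a><div>x</div><div><div>wrong</div></div></a></div>
  <div id="udtkbdf50"><a><div>thumb</div><div><div>caption</div></div></a></div>
</body>
""")

def match(root, id_predicate):
    """Return the text of a/div[2]/div under every div whose id passes the test."""
    hits = []
    for div in root.iter("div"):
        if id_predicate(div.get("id", "")):
            hits.extend(div.findall("a/div[2]/div"))
    return [hit.text for hit in hits]

contains_u = match(doc, lambda value: "u" in value)             # like contains(@id,'u')
starts_with_u = match(doc, lambda value: value.startswith("u"))  # like starts-with(@id,'u')
```

In this made-up fragment the contains-style test also picks up the element under id="menu", while the starts-with-style test selects only the intended one.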

Are colons allowed in URLs?

I thought using colons in URIs was "illegal". Then I saw that vimeo.com is using URIs like http://www.vimeo.com/tag:sample.
What do you feel about the usage of colons in URIs?
How do I make my Apache server work with the "colon" syntax because now it's throwing the "Access forbidden!" error when there is a colon in the first segment of the URI?
Colons are allowed in the URI path. But you need to be careful when writing relative URI paths with a colon since it is not allowed when used like this:
<a href="tag:sample">
In this case tag would be interpreted as the URI’s scheme. Instead you need to write it like this:
<a href="./tag:sample">
Are colons allowed in URLs?
Yes, unless it's in the first path segment of a relative-path reference
So for example you can have a URL like this:
https://en.wikipedia.org/wiki/Template:Welcome
And you can use it normally as an absolute URL or some relative variants:
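<a href="https://en.wikipedia.org/wiki/Template:Welcome">Welcome Template</a>
<a href="/wiki/Template:Welcome">Welcome Template</a>
<a href="wiki/Template:Welcome">Welcome Template</a>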
But this would be invalid:
<a href="Template:Welcome">Welcome Template</a>
because the "Template" here would be mistaken for the protocol scheme.
You would have to use:
<a href="./Template:Welcome">Welcome Template</a>
to use a relative link from a page on the same level in the hierarchy.
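The scheme ambiguity can be checked with Python's standard urllib.parse, used here only as one example of a URI reference resolver. The base page URL https://en.wikipedia.org/wiki/Main_Page is an assumption chosen for illustration.

```python
from urllib.parse import urljoin, urlsplit

# Assumed page on the same level of the hierarchy as the target.
base = "https://en.wikipedia.org/wiki/Main_Page"

# Without "./", the text before the colon is parsed as a URI scheme,
# so the reference is not resolved against the base at all.
parsed_scheme = urlsplit("Template:Welcome").scheme
unresolved = urljoin(base, "Template:Welcome")

# With "./", the first path segment is "." and contains no colon,
# so the reference is treated as a relative path.
resolved = urljoin(base, "./Template:Welcome")
```

Here `unresolved` stays as the bare reference, while `resolved` becomes the full Wikipedia URL from the example.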
The spec
See the RFC 3986, Section 3.3:
https://www.rfc-editor.org/rfc/rfc3986#section-3.3
The path component contains data, usually organized in hierarchical
form, that, along with data in the non-hierarchical query component
(Section 3.4), serves to identify a resource within the scope of the
URI's scheme and naming authority (if any). The path is terminated
by the first question mark ("?") or number sign ("#") character, or
by the end of the URI.
If a URI contains an authority component, then the path component
must either be empty or begin with a slash ("/") character. If a URI
does not contain an authority component, then the path cannot begin
with two slash characters ("//"). In addition, a URI reference
(Section 4.1) may be a relative-path reference, in which case the
first path segment cannot contain a colon (":") character. The ABNF
requires five separate rules to disambiguate these cases, only one of
which will match the path substring within a given URI reference. We
use the generic term "path component" to describe the URI substring
matched by the parser to one of these rules. [emphasis added]
Example URL that uses a colon:
https://en.wikipedia.org/wiki/Template:Welcome
Also note the difference between Apache on Linux and Windows. Apache on Windows somehow doesn't allow colons to be used in the first part of the URL. Linux has no problem with this, however.