Is the forward slash "/" a reserved character in Solr field names?
I'm having trouble writing a Solr sort query that handles field names containing a forward slash "/".
When making an http query to my solr server:
q=*&sort=normal+desc
works, but
q=*&sort=with/slash+desc
q=*&sort=with%2Fslash+desc
both fail with the error "can not use FieldCache on multivalued field: with".
Each Solr document contains two int fields, "normal" and "with/slash". My Solr schema indexes the fields like so:
...
<field name="normal" type="int" indexed="true" stored="true" required="false" />
<field name="with/slash" type="int" indexed="true" stored="true" required="false" />
...
Is there any special way I need to encode forward slashes in Solr? Or are there any other delimiter characters I can use? I'm already using "-" and "." for other purposes.
I just came across the same problem and, after some experimentation, found that if you have a forward slash in the field name, you must escape it with a backslash in the Solr query (though you do not have to do this in the field list parameter). So a search for /my/field/name containing my_value is entered in the "q" field as:
\/my\/field\/name:my_value
I haven't tried the sort field, but try this and let us know :)
This is on Solr 4.0.0 alpha.
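As a rough sketch of the escaping described above, here is how a client could build the "q" value before sending it. The helper names escape_field_name and build_q are made up for illustration, and this only covers the "q" parameter; whether the same escaping helps in the sort parameter was not tested in this answer.

```python
from urllib.parse import urlencode

def escape_field_name(field_name):
    """Backslash-escape every forward slash in a Solr field name."""
    return field_name.replace("/", "\\/")

def build_q(field_name, value):
    """Build the raw text of a field:value clause for the q parameter."""
    return f"{escape_field_name(field_name)}:{value}"

q = build_q("/my/field/name", "my_value")
print(q)                    # \/my\/field\/name:my_value
print(urlencode({"q": q}))  # q=%5C%2Fmy%5C%2Ffield%5C%2Fname%3Amy_value
```

The backslash escaping is for Solr's query parser; the percent-encoding done by urlencode is a separate layer that only gets the string safely through HTTP.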
From the solr wiki at https://wiki.apache.org/solr/SolrQuerySyntax :
Solr 4.0 added regular expression support, which means that '/' is now
a special character and must be escaped if searching for literal
forward slash.
In my case I needed to search for forward slash / with wild card *, e.g.:
+(*/*)
+(*2016/17*)
I tried to escape it like so:
+(*2016\/*)
+(*2016\/17*)
but that didn't work either.
The solution was to wrap the text in double quotes, like so:
+("*\/*")
+("*/*")
+("*2016\/17*")
+("*2016/17*")
Both forms returned the same result, with and without escaping the forward slash.
Is it possible to conditionally replace parts of strings in MySQL?
Introduction to the problem: users stored articles in my database (table called "table", column "value", each row = one article) with wrong links to images. I'd like to repair all of them at once. To do that, I have to replace every address in an "href" link that is followed by an image, i.e.,
<a href="link1"><img src="link2">
should be replaced by
<a href="link2"><img src="link2">
My idea is to search for each "href" tag and, if the tag is followed by an "img", then obtain "link2" from the image and use it to replace "link1".
I know how to do it in bash or python but I do not have enough experience with MySQL.
To be specific, my table contains references to images like
<a href="www.a.cz/b/c"><img class="image image-thumbnail " src="www.d.cz/e/f.jpg" ...
I'd like to replace the first address (href) with the image link, to get
<a href="www.d.cz/e/f.jpg"><img class="image image-thumbnail " src="www.d.cz/e/f.jpg" ...
Is it possible to make a query (queries?) like
UPDATE `table`
SET value = REPLACE(value, 'www.a.cz/b/c', 'XXX')
WHERE `value` LIKE '%www.a.cz/b/c%'
where XXX differs every time and its value is obtained from the database? Moreover, "www.a.cz/b/c" varies.
To make things complicated, not all of the images have the "href" link and not all of the links refer to images. There are three possibilities:
"href" followed by "img" -> replace
"href" not followed by "img" -> keep original link (probably a link to another page)
"img" without "href" -> do nothing (there is no wrong link to replace)
Of course, some of the images may have a correct link. In this case it may be also replaced (original and new will be the same).
Database info from phpMyAdmin
Software: MariaDB
Software version: 10.1.32-MariaDB - Source distribution
Protocol version: 10
Server charset: UTF-8 Unicode (utf8)
Apache
Database client version: libmysql - 5.6.15
PHP extension: mysqli
Thank you in advance
SELECT
regexp_replace(
value,
'^<a href="([^"]+)"><img class="([^"]+)" src="([^"]+)"(.*)$',
'<a href="\\3"><img class="\\2" src="\\3"\\4'
)
FROM
yourTable
The replacement only happens if the pattern is matched.
^ at the start means start of the string
([^"]+) means one or more characters, excluding "
(.*) means zero or more of any character
$ at the end means end of the string
The replacement takes the 3rd capture group (the 3rd pattern enclosed in parentheses, referenced as \\3) and puts it where the 1st capture group was.
The 2nd, 3rd and 4th capture groups are put back where they were (no change).
https://dbfiddle.uk/?rdbms=mariadb_10.2&fiddle=96aef2214f844a1466772f41415617e5
If you have strings that don't exactly match the pattern, it will do nothing. Extra spaces will trip it up, for example.
In which case you need to work out a new regular expression that always matches all of the strings you want to work on. Then you can use the \\n back-references to make replacements.
For example, the following deals with extra spaces in the href tag...
SELECT
regexp_replace(
value,
'^<a[ ]+href[ ]*=[ ]*"([^"]+)"><img class="([^"]+)" src="([^"]+)"(.*)$',
'<a href="\\3"><img class="\\2" src="\\3"\\4'
)
FROM
yourTable
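To see the difference between the strict and the space-tolerant pattern without touching the database, the same two expressions can be tried in Python's re module. This is only a sketch: Python's regex engine is not the one MariaDB uses, the sample strings are invented, and the back-references are written \3 in a raw string instead of \\3 as in the SQL literal.

```python
import re

STRICT = r'^<a href="([^"]+)"><img class="([^"]+)" src="([^"]+)"(.*)$'
TOLERANT = r'^<a[ ]+href[ ]*=[ ]*"([^"]+)"><img class="([^"]+)" src="([^"]+)"(.*)$'
REPLACEMENT = r'<a href="\3"><img class="\2" src="\3"\4'

def fix_link(value, pattern):
    """Return value with the href replaced by the img src, if pattern matches."""
    return re.sub(pattern, REPLACEMENT, value)

# Invented sample rows: one matches exactly, one has extra spaces in the href.
exact = '<a href="www.a.cz/b/c"><img class="image" src="www.d.cz/e/f.jpg" alt="x">'
spaced = '<a  href = "www.a.cz/b/c"><img class="image" src="www.d.cz/e/f.jpg" alt="x">'

print(fix_link(exact, STRICT))     # href now points at www.d.cz/e/f.jpg
print(fix_link(spaced, STRICT))    # unchanged: the strict pattern does not match
print(fix_link(spaced, TOLERANT))  # fixed, and the extra spaces are normalised away
```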
EDIT:
Following comments clarifying that these are actually snippets from the MIDDLE of the string...
https://dbfiddle.uk/?rdbms=mariadb_10.2&fiddle=48ce1cc3df5bf4d3d140025b662072a7
UPDATE
yourTable
SET
value = REGEXP_REPLACE(
value,
'<a href="([^"]+)"><img class="([^"]+)" src="([^"]+)"',
'<a href="\\3"><img class="\\2" src="\\3"'
)
WHERE
value REGEXP '<a href="([^"]+)"><img class="([^"]+)" src="([^"]+)"'
(Though I prefer the syntax RLIKE, it's functionally identical.)
This will also find and replace that pattern multiple times. It's not clear from the question whether that's desired or possible.
Solved, thanks to @MatBailie, but I had to modify his answer. The final query, including the update, is
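The three cases listed in the question (href followed by img, href not followed by img, img without href) can be checked against the unanchored pattern with a small Python sketch. The article text below is invented, and Python's re module is used only to illustrate the pattern logic, not MariaDB's exact behaviour.

```python
import re

PATTERN = r'<a href="([^"]+)"><img class="([^"]+)" src="([^"]+)"'
REPLACEMENT = r'<a href="\3"><img class="\2" src="\3"'

def repair(article):
    """Replace every href that is directly followed by an img with the img src."""
    return re.sub(PATTERN, REPLACEMENT, article)

# Invented article containing all three cases, with the first case twice.
article = (
    'intro <a href="www.a.cz/b/c"><img class="image" src="www.d.cz/e/f.jpg"></a> '
    'text <a href="www.a.cz/page">another page</a> '
    'more <img class="image" src="www.g.cz/h.jpg"> '
    'end <a href="www.x.cz/y"><img class="image" src="www.z.cz/w.png"></a>'
)

print(repair(article))
```

Both image links are rewritten, while the plain page link and the bare image come back untouched.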
UPDATE `table`
SET value = REGEXP_REPLACE(value, '(.*)<a href="([^"]+)"><img class="([^"]+)" src="([^"]+)"(.*)', '\\1<a href="\\4"><img class="\\3" src="\\4"\\5'
)
A wildcard (.*) had to be put at the beginning of the search pattern because the link sits inside an article (long text); consequently, the back-reference numbers in the replacement pattern each increase by one.
How can I request Solr to search for special characters?
E.g. to search for strings containing the '#' character.
When I query
"name_tsi : "#*" AND type_ssi :program"
it gives me all the available entries in the index,
which is the same as what I get with
"type_ssi :program"
I get the same results in both cases, but the first query should filter the results on the basis of (name_tsi : "#*").
Putting a backslash \ before # does not work either.
Is there anything I can do in solrconfig.xml or schema.xml?
I am using Lucene version 5.0.0.
In my search string, there is a minus character like “test-”.
I read that the minus sign is a special character in Lucene. So I have to escape that sign, as in the queryparser documentation:
Escaping Special Characters:
Lucene supports escaping special characters that are part of the query syntax. The current list of special characters is:
+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /
To escape these characters, use \ before the character. For example, to search for (1+1):2 use the query:
\(1\+1\)\:2
To do that I use the QueryParser.escape method:
query = parser.parse(QueryParser.escape(searchString));
I use the classic Analyzer because I noticed that the standard Analyzer has some problems with escaping special characters.
The problem is that the Parser deletes the special characters and so the Query has the term
content:test
How can I set up the parser and searcher to search for the real value “test-”?
I also created my own query with the content test-, but that didn't work either. I received 0 results, but my index has entries like:
Test-VRF
Test-IPLS
I am really confused about this problem.
While escaping special characters for the queryparser deals with part of the problem, it doesn't help with analysis.
Neither classic nor standard analyzer will keep punctuation in the indexed form of the field. For each of these examples, the indexed form will be in two terms:
test and vrf
test and ipls
This is why a manually constructed query for "test-" finds nothing. That term does not exist in the index.
The goal of these analyzers is to attempt to index words. As such, punctuation is mostly eliminated, and is not searchable. A phrase query for "test vrf" or "test-vrf" or "test_vrf" are all effectively identical. If that is not what you need, you'll need to look to other analyzers.
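A very rough way to picture what happens at index time is the sketch below. It is not Lucene's tokenizer (the real analyzers follow more elaborate word-break rules); the function rough_analyze is an invented stand-in that simply lowercases and splits on anything that is not a letter or digit, which is enough to show why the term "test-" never reaches the index.

```python
import re

def rough_analyze(text):
    """Lowercase and split on anything that is not an ASCII letter or digit."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]

# Terms that would end up in the index for the two sample entries.
indexed_terms = set()
for entry in ["Test-VRF", "Test-IPLS"]:
    indexed_terms.update(rough_analyze(entry))

print(sorted(indexed_terms))     # ['ipls', 'test', 'vrf']
print("test-" in indexed_terms)  # False: no term keeps the hyphen
print(rough_analyze("test-"))    # ['test']: the query side loses it as well
```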
The way to fix this issue is to store the value in the index as NOT_ANALYZED:
Field fieldType = new Field(key.toLowerCase(),value, Field.Store.YES, Field.Index.NOT_ANALYZED);
Anyone with the same problem has to take care over how the content is stored in the index.
To query it, escape the search string like this
searchString = QueryParser.escape(searchString);
and use, for example, a WhitespaceAnalyzer.
I have set up Zend Lucene to search products_name and part_number.
This works well, however there are issues with hyphenated part numbers.
For example, if I have the part number: 5130193-00
This will return any part number with '00' at the end.
How can I make Lucene only return the exact part number?
I am using Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive()); when indexing and searching (CaseInsensitive does not work, but that's another issue) and the part numbers are indexed as Text.
Try escaping the dash with a backslash: part_number:5130193\-00.
More information is available here (see Escaping Special Characters).
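If the part numbers come from user input, the escaping can be done in code before the query string is built. The sketch below is in Python rather than PHP, the function name escape_query_text is invented, and the character list is an assumption taken from the Lucene query parser documentation; Zend Search Lucene may treat a slightly different set as special.

```python
# Assumed set of query-syntax characters; & and | are escaped one at a time.
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\/')

def escape_query_text(text):
    """Put a backslash in front of every character listed in SPECIAL_CHARS."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)

print("part_number:" + escape_query_text("5130193-00"))  # part_number:5130193\-00
print(escape_query_text("(1+1):2"))                      # \(1\+1\)\:2
```

Note that only the value is escaped; the colon between the field name and the value has to stay unescaped so the parser still sees a field query.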
I have a query that I run on Oracle that is supposed to allow hyphen characters. The results are supposed to match the exact string, including the hyphen character, as follows:
SELECT <field>
FROM <table>
WHERE LOWER(field) LIKE '%-pa%';
The results however show "web-page", "web -page" as well as "web page". However, I only would like to find "web-page" and "web -page" in this case. I tried to escape the hyphen character with a backslash but that results in no records found. Can anybody give me a hint on how to make this work?
That's not my observation of how Oracle treats hyphens. Here's a brief sample of what I see:
SQL> select * from fb;
ID
----------
Web-Page
Web Page
Web -Page
SQL> select * from fb where lower(id) like '%-pa%';
ID
----------
Web-Page
Web -Page
Are you sure you're not using the underscore instead of the hyphen? The underscore is a single character wild card.
Normally the hyphen shouldn't need to be escaped, but you can try
select <field> from <table> where lower(field) like '%X-pa%' escape 'X';
Instead of 'X', you can use any arbitrary character.
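To illustrate why an underscore would explain the extra "web page" row while a hyphen would not, here is a small Python sketch that translates a LIKE pattern into a regular expression. It models only % (any run of characters), _ (exactly one character) and an optional escape character; like_match is an invented helper, and it does not reproduce Oracle's own validation or error handling.

```python
import re

def like_match(value, pattern, escape=None):
    """Return True if value matches the LIKE pattern under the simplified model."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if escape is not None and ch == escape and i + 1 < len(pattern):
            # The character after the escape character is taken literally.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.fullmatch("".join(parts), value, re.DOTALL) is not None

rows = ["Web-Page", "Web Page", "Web -Page"]
print([r for r in rows if like_match(r.lower(), "%-pa%")])  # hyphen is literal
print([r for r in rows if like_match(r.lower(), "%_pa%")])  # underscore is a wildcard
```

With the hyphen only "Web-Page" and "Web -Page" come back; with the underscore all three rows match, which is the result described in the question.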