What characters are valid in couchdb-lucene keys?

What characters are valid in couchdb-lucene keys? - lucene

I am able to store values in couchdb-lucene with whatever key I like, but it seems that if the key includes any chars outside of [0-9a-zA-Z_] any search fails.
Does anyone know what chars are valid and/or how to properly escape special chars in searches such that special chars can be used?

This shows how to escape special characters and also gives a list of such characters.

All UTF-8 characters should work. I've just verified that I can search for items with é, for example.
A little more information on how you're querying would help, though given the age of this ticket perhaps you've moved on.

Related

Weird character (�) in SQL Server View definition

I have generated the Create statement for a SQL Server view.
Pretty standard, although there is a some replacing happening on a varchar column, such as:
select Replace(txt, '�', '-')
What the heck is '�'?
When I run that against a row that contains that character, I am seeing the literal '?' being replaced.
Any ideas? Do I need some special encoding in my editor?
Edit
If it helps the end point is a Google feed.

You need to read the script in the same encoding as that in which it was written. Even then, if your editor's font doesn't include a glyph for the character, it may still not display correctly.
When the script was created, did you choose an encoding, or accept the default? If the later, you need to find out which encoding was used. UTF-8 is likely.
However, in this case, the character may not be a mis-representation. Unicode replacement character explains that this character is used as a replacement for some other character that cannot be represented. It's possible in your case that the code you are looking at is simply saying, if we have some data that could not be represented, treat it as a hyphen instead. In other words, this may be nothing to do with the script generation/viewing process, but rather a deliberate piece of code.

What's the limitations of PDO named arguments

Using prepared statements with PDO, I understand it as there's two paths,
either ? or :name.
What are the limitations regarding the named parameters? White spaces? Non ASCII-chars?
(I'm well acquainted with the hell of non-ASCII in field names. So please stick to the topic.)

Those are tokens. Limits are probably A-Z, 0-9 characters, not starting with 0-9.

From that switch starting on the line of 304 I would say it is [a-zA-Z0-9_]

How can you query a SQL database for malicious or suspicious data?

Lately I have been doing a security pass on a PHP application and I've already found and fixed one XSS vulnerability (both in validating input and encoding the output).
How can I query the database to make sure there isn't any malicious data still residing in it? The fields in question should be text with allowable symbols (-, #, spaces) but shouldn't have any special html characters (<, ", ', >, etc).
I assume I should use regular expressions in the query; does anyone have prebuilt regexes especially for this purpose?

If you only care about non-alphanumerics and it's SQL Server you can use:
SELECT *
FROM MyTable
WHERE MyField LIKE '%[^a-z0-9]%'
This will show you any row where MyField has anything except a-z and 0-9.
EDIT:
Updated pattern would be: LIKE '%[^a-z0-9!-# ]%' ESCAPE '!'
I had to add the ESCAPE char since you want to allow dashes -.

For the same reason that you shouldn't be validating input against a black-list (i.e. list of illegal characters), I'd try to avoid doing the same in your search. I'm commenting without knowing the intent of the fields holding the data (i.e. name, address, "about me", etc.), but my suggestion would be to construct your query to identify what you do want in your database then identify the exceptions.
Reason being there are just simply so many different character patterns used in XSS. Take a look at the XSS Cheat Sheet and you'll start to get an idea. Particularly when you get into character encoding, just looking for things like angle brackets and quotes is not going to get you too far.

Accented character replacement for search then reinserted afterwards

Basically my issue is that users would like to search for a french word that has accented characters but without typing in the accented characters and then have the actual accented word appeared highlighted if found... So for example they would type in "declare" but in the result sets it would look like "déclare" and if found "déclare" would be highlighted.
My first thought was to just simply replace the characters with a regex but then I remembered that I would need to re-insert the replaced characters after the search... I was thinking of then using some sort of character map that would track position and the character so that when the search was finshed I could put the result set back to the way it was. This seems a little brute force to me and I was wondering if anyone had a better alternative? I'm using Visual Studio 2005 with this app.
Any advice would be much appreciated!
Thanks

A regular expression by default matches text. The "replacement" mode is not the normal mode. So, what you want is in fact the default. The precise syntax will depend on your Regex engine, e.g. in .Net you'd use Regex.IsMatch()

Special Characters in URL

We're currently replacing all special characters and spaces in our URLs with hypens (-). From an SEO and readability point-of-view this works fine. However, in some cases, we are feeding parts of the URL into a search after stripping the hyphens out. The problem occurs when the search term should have hyphens as it returns no results when they get stripped. We could modify the search algorithm we're using but this will slow it down (especially bad as we're using it with an AJAX-ed search box and this needs to be fast).
The best option to deal with this, as far as we can tell is to replace pre-existing hyphens with pipes (|). I have a feeling that this will have a negative impact on SEO for those terms as the pipe character will be treated as a part of the word and not as a separator. As far as I can tell, the only characters that are considered to be separators are hyphens and forward slashes (/).
So my questions are:
Are there alternative characters we can use to represent hyphens?
If we can't use any other characters, how much impact will using a pipe character have on a search engine?
Cheers,
Zac

Would ~ (tilde) work?
Edit: Google now treats underscores and dashes as word separators so you can use dashes as dashes and underscores as spaces.

Why not use Url Encoding? Most frameworks have built in utilities to do this.

I was going to say the same thing about URL encoding, but if you're trying to get rid of the special characters, I suppose you don't want URLs with percent signs, right?
What about altering the algorithm that "feeds parts of the URL into a search"? Couldn't you add some logic to not replace hyphens within the search query part of the URL?

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

What characters are valid in couchdb-lucene keys? - lucene

I am able to store values in couchdb-lucene with whatever key I like, but it seems that if the key includes any chars outside of [0-9a-zA-Z_] any search fails. Does anyone know what chars are valid and/or how to properly escape special chars in searches such that special chars can be used?

This shows how to escape special characters and also gives a list of such characters.

All UTF-8 characters should work. I've just verified that I can search for items with é, for example. A little more information on how you're querying would help, though given the age of this ticket perhaps you've moved on.

Related

Weird character (�) in SQL Server View definition

What's the limitations of PDO named arguments

How can you query a SQL database for malicious or suspicious data?

Accented character replacement for search then reinserted afterwards

Special Characters in URL

Categories

Resources