Accented character replacement for search then reinserted afterwards - vb.net

Basically my issue is that users would like to search for a French word that has accented characters without typing the accents, and then have the actual accented word appear highlighted if it is found. For example, they would type "declare", but in the result set it would appear as "déclare", highlighted.
My first thought was to simply replace the characters with a regex, but then I remembered that I would need to re-insert the replaced characters after the search. I was then thinking of using some sort of character map that would track position and character, so that when the search was finished I could put the result set back the way it was. This seems a little brute force to me, and I was wondering if anyone had a better alternative? I'm using Visual Studio 2005 with this app.
Any advice would be much appreciated!
Thanks

A regular expression matches text by default; "replacement" is not its normal mode. So what you want is in fact the default behavior: matching leaves the original text untouched. The precise syntax depends on your regex engine; in .NET, for example, you'd use Regex.IsMatch().
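As a rough illustration of matching without modifying the original, here is a minimal Python sketch (not VB.NET, and not necessarily what the answer had in mind). It searches an accent-stripped copy of the text and maps the match offsets back to the original string, so nothing has to be re-inserted afterwards. The function names `fold` and `highlight` and the bracket markers are assumptions made up for this example.

```python
import re
import unicodedata


def fold(text):
    """Return (folded, index_map): the text with diacritics removed, plus,
    for each folded character, the index of the original character it came from."""
    folded_chars = []
    index_map = []
    for i, ch in enumerate(text):
        # NFD splits "é" into "e" + a combining accent; keep only the base part.
        for part in unicodedata.normalize("NFD", ch):
            if not unicodedata.combining(part):
                folded_chars.append(part)
                index_map.append(i)
    return "".join(folded_chars), index_map


def highlight(text, term, open_tag="[", close_tag="]"):
    """Wrap every accent-insensitive, case-insensitive match of term in text."""
    folded_text, index_map = fold(text)
    folded_term, _ = fold(term)
    pattern = re.compile(re.escape(folded_term), re.IGNORECASE)
    out = []
    last = 0
    for m in pattern.finditer(folded_text):
        if m.start() == m.end():
            continue  # ignore empty matches (e.g. an empty search term)
        start = index_map[m.start()]
        end = index_map[m.end() - 1] + 1
        if start < last:
            continue  # defensive: never emit overlapping spans
        out.append(text[last:start])
        out.append(open_tag + text[start:end] + close_tag)
        last = end
    out.append(text[last:])
    return "".join(out)


print(highlight("Il déclare ses revenus.", "declare"))
```

The original string is only ever sliced, never rewritten, so the accents in the highlighted result are exactly those of the source text.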

Related

Searching for multiple values using REGEX

I have a set of big, sprawling SQL scripts and need to find and replace a few values in them. I am trying to pass my values as a regex to Notepad++, but I can't seem to make it work. To be more specific, I have around 50 scripts, each with 5000 lines, and I need to look for a list of values, e.g. "[dbo].[livesales]" and "[dbo].[CreditCards]", in each of my scripts separately. I understand that I need to either run this against each script separately or merge them all into one file, but I need the proper regex for it. I need to include the square brackets and dots as well. I ended up building this, but it doesn't work for me:
^(?=.*\b[dbo].[LiveSales]\b)(?=.*\b[dbo].[CreditCards]\b).+$
thanks in advance,
I wouldn't bother using word boundaries, as square brackets in SQL Server are pretty ubiquitous for database object names (e.g. database and column names). I suggest the following pattern:
\[dbo\]\.\[(?:LiveSales|CreditCards)\]
The major changes I have made are dropping the word boundaries, escaping the [ and ] brackets (since the square bracket is a regex metacharacter with a special meaning), escaping the dot, and not trying to match the entire input. Presumably you want to find all such occurrences, so don't bother anchoring your pattern with ^ and $.
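If running the search outside Notepad++ is an option, the same pattern can be tried from a script. The following is only a sketch in Python: the helper names `build_pattern` and `find_references` are made up for illustration, and case-insensitive matching is an assumption based on the question mixing "livesales" and "LiveSales".

```python
import re


def build_pattern(names):
    """Build \\[dbo\\]\\.\\[(?:Name1|Name2)\\] from a list of object names."""
    alternatives = "|".join(re.escape(name) for name in names)
    # Brackets and the dot are escaped so they match literally.
    return re.compile(r"\[dbo\]\.\[(?:" + alternatives + r")\]", re.IGNORECASE)


def find_references(script, names):
    """Return (line_number, matched_text) for every occurrence in the script."""
    pattern = build_pattern(names)
    hits = []
    for lineno, line in enumerate(script.splitlines(), start=1):
        for m in pattern.finditer(line):
            hits.append((lineno, m.group(0)))
    return hits


sample = (
    "SELECT * FROM [dbo].[LiveSales]\n"
    "JOIN [dbo].[Orders] o ON 1 = 1\n"
    "UPDATE [dbo].[creditcards] SET x = 1\n"
)
for hit in find_references(sample, ["LiveSales", "CreditCards"]):
    print(hit)
```

Looping over the 50 files and calling `find_references` on each file's contents would report the hits per script without merging the files.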

SQL2008 fulltext index search without word breakers

I am trying to search a full-text index (FTI) using CONTAINS for Twitter-style usernames, e.g. #username, but word breakers will ignore the # symbol. Is there any way to disable word breakers? From research, there is a way to create a custom word breaker DLL, install it and assign it, but that all seems a bit intensive and, frankly, over my head. I disabled stop words so that dashes are not ignored, but I need that # symbol. Any ideas?
You're not going to like this answer. But full-text indexes only consider the characters _ and ` while indexing. All other characters are ignored, and words get split where those characters occur. This is mainly because full-text indexes are designed to index large documents, where only proper words are considered in order to make the search more refined.
We faced a similar problem. To solve it we had a translation table, where characters like #, - and / were replaced with special sequences like '`at`', '`dash`', '`slash`', etc. When searching the full-text index, you have to replace those characters in the search string with the same special sequences and then search. This should take care of the special characters.
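The translation-table idea could look roughly like the Python sketch below. The token names and the two helper functions are hypothetical, chosen only for illustration; the same encoding would have to be applied both to the text before it is indexed and to each search term before it is passed to CONTAINS.

```python
# Hypothetical translation table; the token names are illustrative only.
TOKENS = {"#": "`hash`", "-": "`dash`", "/": "`slash`"}


def encode_for_fulltext(text):
    """Replace each special character with its token before indexing or searching."""
    return "".join(TOKENS.get(ch, ch) for ch in text)


def decode_from_fulltext(text):
    """Turn tokens back into the original characters for display."""
    for symbol, token in TOKENS.items():
        text = text.replace(token, symbol)
    return text


encoded = encode_for_fulltext("#user-name")
print(encoded)
print(decode_from_fulltext(encoded))
```

One caveat of this sketch: if the stored text can legitimately contain a token string such as `dash`, decoding becomes ambiguous, so the tokens should be sequences that never occur in real data.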

Weird character (�) in SQL Server View definition

I have generated the Create statement for a SQL Server view.
Pretty standard, although there is some replacing happening on a varchar column, such as:
select Replace(txt, '�', '-')
What the heck is '�'?
When I run that against a row that contains that character, I am seeing the literal '?' being replaced.
Any ideas? Do I need some special encoding in my editor?
Edit
If it helps the end point is a Google feed.
You need to read the script in the same encoding as that in which it was written. Even then, if your editor's font doesn't include a glyph for the character, it may still not display correctly.
When the script was created, did you choose an encoding, or accept the default? If the latter, you need to find out which encoding was used. UTF-8 is likely.
However, in this case, the character may not be a misrepresentation. The Unicode replacement character (U+FFFD, displayed as �) is used as a stand-in for some other character that cannot be represented. It's possible in your case that the code you are looking at is simply saying: if we have some data that could not be represented, treat it as a hyphen instead. In other words, this may have nothing to do with the script generation/viewing process, but may rather be a deliberate piece of code.
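To see how such a character can appear, here is a small Python sketch. It assumes, purely for illustration, that the original text contained an en dash stored in Windows-1252 and was later read as UTF-8; the actual encodings involved in the question are unknown.

```python
REPLACEMENT = "\ufffd"  # the Unicode replacement character


def decode_lossy(raw, encoding):
    """Decode bytes, substituting U+FFFD for anything that cannot be decoded."""
    return raw.decode(encoding, errors="replace")


# "10–20" with an en dash (U+2013), stored as Windows-1252 bytes.
raw = "10\u201320".encode("cp1252")

wrong = decode_lossy(raw, "utf-8")   # read with the wrong encoding
right = decode_lossy(raw, "cp1252")  # read with the encoding it was written in

print(repr(wrong))
print(repr(right))
print(REPLACEMENT in wrong, REPLACEMENT in right)
```

Under that assumption, a Replace of the unreadable character with '-' would be a plausible cleanup step for dashes that were mangled somewhere upstream.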

How can you query a SQL database for malicious or suspicious data?

Lately I have been doing a security pass on a PHP application and I've already found and fixed one XSS vulnerability (both in validating input and encoding the output).
How can I query the database to make sure there isn't any malicious data still residing in it? The fields in question should be text with allowable symbols (-, #, spaces) but shouldn't have any special html characters (<, ", ', >, etc).
I assume I should use regular expressions in the query; does anyone have prebuilt regexes especially for this purpose?
If you only care about non-alphanumerics and it's SQL Server you can use:
SELECT *
FROM MyTable
WHERE MyField LIKE '%[^a-z0-9]%'
This will show you any row where MyField has anything except a-z and 0-9.
EDIT:
Updated pattern would be: LIKE '%[^a-z0-9!-# ]%' ESCAPE '!'
I had to add the ESCAPE char since you want to allow dashes -.
For the same reason that you shouldn't be validating input against a black-list (i.e. list of illegal characters), I'd try to avoid doing the same in your search. I'm commenting without knowing the intent of the fields holding the data (i.e. name, address, "about me", etc.), but my suggestion would be to construct your query to identify what you do want in your database then identify the exceptions.
Reason being there are just simply so many different character patterns used in XSS. Take a look at the XSS Cheat Sheet and you'll start to get an idea. Particularly when you get into character encoding, just looking for things like angle brackets and quotes is not going to get you too far.
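A whitelist check along these lines can also be run in application code after fetching the rows. The sketch below is in Python rather than PHP, and the allowed set (ASCII letters, digits, #, - and space) is an assumption taken from the question; the function name `find_suspicious` and the row format are made up for the example.

```python
import re

# Anything outside the allowed set counts as suspicious.
# Assumed whitelist: ASCII letters, digits, '#', '-' and space.
DISALLOWED = re.compile(r"[^A-Za-z0-9#\- ]")


def find_suspicious(rows):
    """rows is an iterable of (row_id, value); return the rows that contain
    characters outside the whitelist, with the offending characters listed."""
    flagged = []
    for row_id, value in rows:
        bad = sorted(set(DISALLOWED.findall(value)))
        if bad:
            flagged.append((row_id, bad))
    return flagged


rows = [
    (1, "Unit #4 - North"),
    (2, "<script>alert('x')</script>"),
    (3, "O'Brien"),
]
for row_id, bad in find_suspicious(rows):
    print(row_id, bad)
```

Row 3 shows the trade-off: a strict whitelist also flags legitimate values, so the flagged rows are candidates for review rather than confirmed attacks.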

Accented character regex

I'm trying to create a regex that will look for French words whether or not the user specifies the accented characters. So if the user is searching for "déclaré" but types in declare instead, I would still like to be able to match the text. I'm having difficulty making this dynamic enough that it can match any French word...
Closest example from another user from a different post was:
d[eèéê]cl[aàáâ]r[eèéê]
Is it even possible to write a regex for something like this?
Any advice would be much appreciated.
I once had to create something like that.
The best thing I could come up with was something akin to having a dictionary of known letters with diacritics and replace them on the search terms, before creating a pattern for a regular expression.
Pretty much like you did on your own example.
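A dictionary-driven version of that idea might look like the Python sketch below. The `VARIANTS` table is a hypothetical, deliberately incomplete list of French letters with diacritics, and the sketch assumes the searched text uses precomposed characters (NFC), which is how such text is usually stored.

```python
import re
import unicodedata

# Hypothetical, incomplete table: base letter -> all spellings to accept.
VARIANTS = {
    "a": "aàâä",
    "c": "cç",
    "e": "eèéêë",
    "i": "iîï",
    "o": "oôö",
    "u": "uùûü",
    "y": "yÿ",
}


def strip_accents(text):
    """Remove diacritics so 'déclaré' and 'declare' produce the same pattern."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_insensitive_pattern(term):
    """Expand each base letter of the search term into a character class."""
    parts = []
    for ch in strip_accents(term).lower():
        if ch in VARIANTS:
            parts.append("[" + VARIANTS[ch] + "]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


pattern = accent_insensitive_pattern("declare")
print(pattern.pattern)
match = pattern.search("Il a déclaré forfait.")
print(match.group(0) if match else None)
```

Because the pattern is generated from the search term, it works for any word whose letters are covered by the table, and the match is taken from the original text with its accents intact.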