How to format keywords in SQL Server Full Text Search - sql

I have a sql function that accepts keywords and returns a full text search table.
How do I format the keyword string when it contains multiple keywords? Do I need to splice the string and insert "AND"? (I am passing the keywords to the method through Linq TO SQL)
Also, how do I best protect myself from sql injection here.? Are the default ASP.NET filters sufficient?
thanks

I would use "AND" and asterisks on each word. The asterisk will help the matching be a bit wider since I believe it is best to return too many rather than too few. For example, a search for "Georgia Peach" would use the keyword string '"Georgia*" AND "Peach*"' (the double quotes around each word are important).
And I believe the ASP.NET Filters are sufficient. Plus, since you are using parameterized queries (which LINQ to SQL does), you are pretty safe.

Related

Does array like function exist in SQL

Using SQL I would like to know if its possible to do the following:
If I have a variable that the user inputs mutiple strings into seperated by a comma for example ('aa','bbb','c','dfd'), is it possible using LIKE with a wilcard at the end of each string in stead of having the user to enter each variations in multiple macros.
So say if user was looking for employee numbers that start with ('F','E','C') is it possible without using OR statements is the question I guess am asking?
It would be similar to that of an array I guess
No, LIKE is its own operator and therefore needs separated by an OR.
You might prefer ILIKE to LIKE, as it is a case-insensitive comparison.
You can also try to use REGEXP_LIKE, which is similar to what you want, except you'll have to use regex expressions instead of 'FEC%'
That depends on your SQL dialect; I don't know Impala at all, but other SQL engines have support for regular expressions in string matches, so that you can build a query string like
SELECT fld FROM tbl WHERE fld REGEXP '^[FEC].*$';
No matter what you do, you will need to build a query from your user's input. Passing through user input unprocessed into your SQL processor is a big "nope" anyways, from a "don't accidentally delete a table" point of view:

Escaping single quotes in the PLACEHOLDER clause of a HANA SQL statement

I noticed an inconsistency in how "HANA SQL" escapes single quotes in the context of the PLACEHOLDER clause. For example, consider the following PLACEHOLDER clause snippet:
('PLACEHOLDER' = ('$$CC_PARAM$$','''foo'',''an escaped single quote \'' '''))
The PLACEHOLDER clause above contains multiple values assigned to the CC_PARAM. parameter. We can see that inside of the second argument we have a single quote that's escaped with a backslash. However, we escape the single quotes outside each argument with another single quote (i.e. we do '' instead of \''. It's possible to use the \'' format for the first case, but it's not possible to use the '' format in the second case.
Why is there this discrepancy? It makes escaping quotes in multi-input input parameters tricky. I'm looking to programmatically craft SQL queries for HANA. Am I missing something here? Is it safe to use \'' over '' in all cases? Or do I need logic that can tell where a single quote occurs and escape as appropriate?
The implicit rule here - given by how the software is implemented - is that for parameter values of calculation views, the backslash \ is used to escape the single quotation mark.
For all standard SQL string occurrences, using the single-quotation mark twice '' is the correct way to differentiate between syntax element and string literal.
As for the why:
the PLACEHOLDER syntax is not SQL, but a HANA-specific command extension. So, there is no general standard that the current implementation violates.
that given, this command extension is embedded into, respectively clamped onto the standard SQL syntax and has to be handled by the same parser.
But the parameters are not only parsed once, by the SQL parser but again by the component that instantiates the calculation scenario based on the calculation view. With a bit of squinting it's not hard to see that the parameters interface is a general key-value interface that allows for all sorts of information to be handed over to the calc. engine.
One might argue that the whole approach of providing parameters via key-value pairs is not consistent with the general SQL syntax approach and be correct. On the flip side, this approach allows for general flexibility for adding new command elements to the HANA-specific parts, without structurally changing the syntax (and with it the parser).
The clear downside of this is that both the key names, as well as the values, are string-typed. To avoid losing the required escaping for the "inner string" an escape string different from the main SQL escape string needs to be used.
And here we are with two different ways of handing over a string value to be used as a filter condition.
Funny enough, both approaches may still lead to the same query execution plan.
As a matter of fact, in many scenarios with input parameters, the string value will be internally converted into a SQL conforming form. This is the case when the input parameter is used for filtering or in expressions in the calc. view that can be converted into SQL expressions.
For example
SELECT
"AAA"
FROM "_SYS_BIC"."sp/ESC"
('PLACEHOLDER' = ('$$IP_TEST$$', 'this is a test\''s test'));
shows the following execution plan on my system
OPERATOR_NAME OPERATOR_DETAILS
PROJECT TEST.AAA
COLUMN TABLE FILTER CONDITION: TEST.AAA = 'this is a test's test'
(DETAIL: ([SCAN] TEST.AAA = 'this is a test's test'))
Note how the escape-\' has been removed.
All in all: when using PLACEHOLDER values, the \' escaping needs to be used and in all other cases, the '' escaping.
That should not be terribly difficult to implement for a query builder as you can consider this when dealing with the PLACEHOLDER syntax.

SQLite function that works like the Oracle's "Translate" function?

Oracle has a function called translate that can be used to replace individual characters of the string by others, in the same order that they appear. It is different than the replace function, which replaces the entire second argument occurence by the entire third argument.
translate('1tech23', '123', '456'); --would return '4tech56'
translate('222tech', '2ec', '3it'); --would return '333tith'
I need this to implement a search on a SQLite database ignoring accents (brazilian portuguese language) on my query string. The data in the table that will be queried could be with or without accents, so, depending on how the user type the query string, the results would be different.
Example:
Searching for "maçã", the user could type "maca", "maça", "macã" or "maçã", and the data in the table could also be in one of the four possibilities.
Using oracle, I would only use this:
Select Name, Id
From Fruits
Where Translate(Name, 'ãç','ac') = Translate(:QueryString, 'ãç','ac')
... and these other character substitutions:
áéíóúÁÉÍÓÚàèìòùÀÈÌÒÙãõÃÕäëïöüÄËÏÖÜâêîôûÂÊÎÔÛñÑçÇ
by:
aeiouAEIOUaeiouAEIOUaoAOaeiouAEIOUaeiouAEIOUnNcC
Of course I could nest several calls to Replace, but this wouldn't be a good choice.
Thanks in advance by some help.
Open-source Oracle functions for SQLite have been written at Kansas State University. They include translate() (full UTF-8 support, by the way) and can be found here.
I don't believe there is anything in sqlite that will translate text in a single pass as you describe.
This wouldn't be difficult to implement as a user defined function however. Here is a decent starting reference.
I used replace
REPLACE(string,pattern,replacement)
https://www.sqlitetutorial.net/sqlite-replace-function/

Preg_replace solution for prepared statements

I have a command class that abstracts almost all specific database functions (We have the exactly same application running on Mssql 2005 (using ODBC and the native mssql library), MySQL and Oracle. But sometimes we had some problems with our prepare method that, when executed, replaces all placeholders with their respective values. But the problem is that I am using the following:
if(is_array($Parameter['Value']))
{
$Statement = str_ireplace(':'.$Name, implode(', ', $this->Adapter->QuoteValue($Parameter['Value'])), $Statement);
}
else
{
$Statement = str_ireplace(':'.$Name, $this->Adapter->QuoteValue($Parameter['Value']), $Statement);
}
The problem arises when we have two or mer similar parameters names, for example, session_browser and session_browse_version... The first one will partially replace the last one.
Course we learned to go around specifying the parameters within a specific order, but now that I have some "free" time I want to make it better, so I am thinking on switching to preg_replace... and I am not good in regular expression, can anyone give any help with a regex to replace a string like ':parameter_name`?
Best Regards,
Bruno B B Magalhaes
You should use the \b metacharacter to match the word boundary, so you don't accidentally match a short parameter name within a longer parameter name.
Also, you don't need to special-case arrays if you coerce a scalar Value to an array of one entry:
preg_replace("/:$Name\b/",
implode(",", $this->Adapter->QuoteValue( (array) $Parameter['Value'] )),
$Statement);
Note, however, that this can make false positive matches when an identifier or a string literal contains a pattern that looks like a parameter placeholder:
SELECT * FROM ":Name";
SELECT * FROM Table WHERE column = ':Name';
This gets even more complicated when quoted identifiers and string literals can contain escaped quotes.
SELECT * FROM Table WHERE column = 'word\':Name';
You might want to reconsider interpolating variables into SQL strings during prepare, because you're defeating any benefits of prepared statements with respect to security or performance.
I understand why you're doing what you're doing, because not all RDBMS back-end supports named parameters, and also SQL parameters can't be used for lists of values in an IN() predicate. But you're creating an awfully leaky abstraction.

fulltext search and highlighting by PHP and MySQL?

MySQL can take care of fulltext search quite well,
but it doesn't highlight the keywords that are searched,
how can I do this most efficiently?
The solutions posed above require retrieval of the entire document in order to search, replace, and highlight text. If the document is large, and many are, this seems like a really bad idea. Better would be for MySQL FTS to return the text offsets directly like SQLITE does, then use an indexed substring operator - that would be significantly more efficient.
Do your sql query and then do a preg_replace on the result, replacing each keyword with KeyWord
$hilitedText = preg_replace ( '/keyword/' , '/<span class="hilite">keyword<\/span>/' , $row['columName']);
And define the hilite class in your css to format however you want the hilighted keywords to appear.
If you have multiple keywords put them in an array and their replacements in a second array in the same order and pass those arrays as the firt two arguments of the function.
Get the result set from mysql. Do a search and replace for each search word, replacing each word with whatever you're doing for highlighting, e.g., <span class='highlight'>word</span>