Is there a standard way of restricting the results of a SPARQL query to belong to a specific namespace.
Short answer - no there is no standard direct way to do this
Long answer - However yes you can do a limited form of this with the string functions and a FILTER clause. What function you use depends on what version of SPARQL your engine supports.
SPARQL 1.1 Solution
Almost all implementations will these days support SPARQL 1.1 and you can use the STRSTARTS() function like so:
FILTER(STRSTARTS(STR(?var), "http://example.org/ns#"))
This is my preferred approach and should be relatively efficient because it is simple string matching.
SPARQL 1.0 Solution
If you are stuck using an implementation that only supports SPARQL 1.0 you can still do this like so but it uses regular expressions via the REGEX() function so will likely be slower:
FILTER(REGEX(STR(?var), "^http://example\\.org/ns#"))
Regular Expressions and Meta-Characters
Note that for the regular expression we have to escape the meta-character . as otherwise it could match any character e.g. http://exampleXorg/ns#foo would be considered a valid match.
As \ is the escape character for both regular expressions and SPARQL strings it has to be double escaped here in order to get the regular expression to have just \. in it and treat . as a literal character.
Recommendation
If you can use SPARQL 1.1 then do so because using the simpler string functions will be more performant and avoids the need to worry about escaping any meta-characters that you have when using REGEX
Related
I am using Firebird 2.5 and I have a field (called identifier) with mixed letters, numbers and special characters. I would like to use regex to extract only the numbers in a new column. I have tried something like below, but it is not working.
Any idea how I can achieve this using regex without using stored procedures or execute block
SELECT ORDER_ID,
ORDER_DATE,
SUBSTRING(IDENTIFIER FROM 1 TO 10) SIMILAR TO '^[0-9]{10}$' --- DESIRED EXTRACTION COLUMN
FROM ORDERS
Example of data
IDENTIFIER DESIRED OUTPUT
ANDRE 02869567995 02869567995
02869567995 MARIA 02869567995
028.695.67.995 02869567995
028695679-95 02869567995
You cannot do this in Firebird 2.5, at least not without help from a UDF, or a (selectable) stored procedure. I'm not aware of third-party UDFs providing regular expressions, so you might have to write this yourself.
In Firebird 3.0, you could also use a UDR or stored function to achieve this. Unfortunately, using the regular expression functionality available in Firebird alone will not be enough to solve this.
NOTE: The rest of the answer is based on the assumption to extract digits if the first 10 characters of string are digits. With the updated question, this assumption is no longer valid.
That said, if your need is exactly as shown in your question, that is only extract the first 10 characters from a string if they are all digits, then you could use:
case
when IDENTIFIER similar to '[[:DIGIT:]]{10}%'
then substring(IDENTIFIER from 1 for 10)
end
(as an aside, the positional SUBSTRING syntax is from <start> for <length>, not from <start> to <end>)
In Firebird 3.0 and higher, you can use SUBSTRING(... SIMILAR ...) with a SQL regular expression pattern. Assuming you want to extract 10 digits from the start of a string, you can do:
substring(IDENTIFIER similar '#"[[:DIGIT:]]{10}#"%' escape '#')
The #" delimits the pattern to extract (where # is a custom escape character as specified in the ESCAPE clause). The remainder of the pattern must match the rest of the string, hence the use of % here (in other cases, you may need to specify a pattern before the first #" as well.
See this dbfiddle for an example.
It is not possible in any version of Firebird.
When trying to list down all tables names in a database having a specific name format, the following query works fine :
show tables like '*case*';
while the following does not
show tables like '%case%';
On the other hand, when comparing the actual data inside string columns its the vice-versa case
Working query :
select column from database.table where column like '%ABC%' limit 5;
Not working query :
select column from database.table where column like '*ABC*' limit 5;
What's the difference between the 2 operators * and % ?
This is the difference between regular expressions and like patterns.
LIKE is built into the SQL language. It has two wildcards:
% represents any number of characters including zero.
_ represents exactly one character.
Regular expressions are much more flexible for matching almost any pattern in a string.
When SQL was invented, I don't think regular expressions were in common use in computer systems -- at the very least, the folks at IBM who worked on relational databases may not have been familiar with the folks at ATT who were inventing Unix.
Regular expressions are much more powerful than LIKE patterns, of course. And Hive supports them via the RLIKE operator (and some other functions).
The SHOW functionality is not standard SQL. So, the developers of Hive chose the more flexible method for pattern matching.
HiveQL attempts to mimic the SQL, but it does not strictly follow its standards.
The usage of the wildcards are not pertinent to the LIKE clause, but to the statement itself. SHOW statements validate the wildcards based on the Java regular expression whereas when it comes to SELECT statements, Hive tries to stick with the SQL's wildcard validation.
I noticed an inconsistency in how "HANA SQL" escapes single quotes in the context of the PLACEHOLDER clause. For example, consider the following PLACEHOLDER clause snippet:
('PLACEHOLDER' = ('$$CC_PARAM$$','''foo'',''an escaped single quote \'' '''))
The PLACEHOLDER clause above contains multiple values assigned to the CC_PARAM. parameter. We can see that inside of the second argument we have a single quote that's escaped with a backslash. However, we escape the single quotes outside each argument with another single quote (i.e. we do '' instead of \''. It's possible to use the \'' format for the first case, but it's not possible to use the '' format in the second case.
Why is there this discrepancy? It makes escaping quotes in multi-input input parameters tricky. I'm looking to programmatically craft SQL queries for HANA. Am I missing something here? Is it safe to use \'' over '' in all cases? Or do I need logic that can tell where a single quote occurs and escape as appropriate?
The implicit rule here - given by how the software is implemented - is that for parameter values of calculation views, the backslash \ is used to escape the single quotation mark.
For all standard SQL string occurrences, using the single-quotation mark twice '' is the correct way to differentiate between syntax element and string literal.
As for the why:
the PLACEHOLDER syntax is not SQL, but a HANA-specific command extension. So, there is no general standard that the current implementation violates.
that given, this command extension is embedded into, respectively clamped onto the standard SQL syntax and has to be handled by the same parser.
But the parameters are not only parsed once, by the SQL parser but again by the component that instantiates the calculation scenario based on the calculation view. With a bit of squinting it's not hard to see that the parameters interface is a general key-value interface that allows for all sorts of information to be handed over to the calc. engine.
One might argue that the whole approach of providing parameters via key-value pairs is not consistent with the general SQL syntax approach and be correct. On the flip side, this approach allows for general flexibility for adding new command elements to the HANA-specific parts, without structurally changing the syntax (and with it the parser).
The clear downside of this is that both the key names, as well as the values, are string-typed. To avoid losing the required escaping for the "inner string" an escape string different from the main SQL escape string needs to be used.
And here we are with two different ways of handing over a string value to be used as a filter condition.
Funny enough, both approaches may still lead to the same query execution plan.
As a matter of fact, in many scenarios with input parameters, the string value will be internally converted into a SQL conforming form. This is the case when the input parameter is used for filtering or in expressions in the calc. view that can be converted into SQL expressions.
For example
SELECT
"AAA"
FROM "_SYS_BIC"."sp/ESC"
('PLACEHOLDER' = ('$$IP_TEST$$', 'this is a test\''s test'));
shows the following execution plan on my system
OPERATOR_NAME OPERATOR_DETAILS
PROJECT TEST.AAA
COLUMN TABLE FILTER CONDITION: TEST.AAA = 'this is a test's test'
(DETAIL: ([SCAN] TEST.AAA = 'this is a test's test'))
Note how the escape-\' has been removed.
All in all: when using PLACEHOLDER values, the \' escaping needs to be used and in all other cases, the '' escaping.
That should not be terribly difficult to implement for a query builder as you can consider this when dealing with the PLACEHOLDER syntax.
The SQL wildcards "%" and "_" are well documented and widely known. However as w3schools explains, there are also "charlist" style wildcards for matching a single character within or outside a given range, for example to find all the people called Carl but not those called Earl:
select * from Person where FirstName like '[A-D]arl'
... or to find the opposite, use either:
select * from Person where FirstName like '[!A-D]arl'
or (depending on the RDBMS, presumably):
select * from Person where FirstName like '[^A-D]arl'
Is this type of wildcard part of the SQL-92 standard, and what databases actually support it? For example:
Oracle 11g doesn't support it
SQL Server 2005 supports it, with the negation operator being "^" (not "!")
The SQL-99 Standard has a SIMILAR TO predicate which uses "charlist" style as well as the "%" and "_" wildcard characters.
Nothing similar (no pun intended) in the SQL-92 Standard, though.
The "charlist" operators look like regular expressions, or a limited subset of them. AFAIK there's no regular expression syntax specified in SQL-92 although many databases support regex's, and HOW they support it varies. Oracle, for example, has functions to do regular expression comparisons and substitutions. Don't know how others do it.
Share and enjoy.
I have a sql function that accepts keywords and returns a full text search table.
How do I format the keyword string when it contains multiple keywords? Do I need to splice the string and insert "AND"? (I am passing the keywords to the method through Linq TO SQL)
Also, how do I best protect myself from sql injection here.? Are the default ASP.NET filters sufficient?
thanks
I would use "AND" and asterisks on each word. The asterisk will help the matching be a bit wider since I believe it is best to return too many rather than too few. For example, a search for "Georgia Peach" would use the keyword string '"Georgia*" AND "Peach*"' (the double quotes around each word are important).
And I believe the ASP.NET Filters are sufficient. Plus, since you are using parameterized queries (which LINQ to SQL does), you are pretty safe.