I'm trying to parse a Cypher query that uses the =~ regex operator, but I get an error when trying to use libcypher-parser.
I've tried to parse the query found in the Neo4j documentation:
WITH ['mouse', 'chair', 'door', 'house'] AS wordlist
UNWIND wordlist AS word
WITH word
WHERE word =~ '.*ous.*'
RETURN word
It can be found here: https://neo4j.com/docs/cypher-manual/3.5/syntax/operators/#syntax-using-a-regular-expression-to-filter-words
I tried this command:
echo "WITH ['mouse', 'chair', 'door', 'house'] AS wordlist UNWIND wordlist AS word WITH word WHERE word =~ '.*ous.*' RETURN word" | cypher-lint -a
I get this error:
<stdin>:1:100: Invalid input '~': expected NOT, '+', '-', TRUE, FALSE, NULL, "...string...", a float, an integer, '[', a parameter, '{', CASE, FILTER, EXTRACT, REDUCE, ALL, ANY, NONE, SINGLE, shortestPath, allShortestPaths, '(', a function name or an identifier
...ordlist UNWIND wordlist AS word WITH word WHERE word =~ '.*ous.*' RETURN word
^
Is it really not supported? Which version of libcypher-parser supports the =~ operator?
This is a bug in libcypher-parser versions prior to 0.6.0; it is resolved in 0.6.0 and later releases.
I want to convert text like this using Snowflake SQL:
A very(!) punctuated sentence! A complicated/confusing regex.
to this:
'A very ( ! ) punctuated sentence ! A complicated / confusing regex . '
Double spaces between punctuation are OK because I can do a second pass to compress whitespace. The punctuation list is
.,&-_()[]{}-/:;%$#!*|?~=+"\'
But if there is a standard shortcut for all punctuation, I would consider that. I have seen answers that use Java regex's \p{Punct}, but in my tests I can't use that punctuation identifier, and I don't see it in the Snowflake docs.
I have a working version that makes my eyes bleed and it's not even fully written out:
select regexp_replace(
'a very(!) punctuated sentence! A complicated/confusing regex?.',
'(\\(|\\)|\\/|!|\\?)', -- only addresses (), /, !, ?, not the full list
' \\1 '
) as "result" from table
result: "a very ( ! ) punctuated sentence ! A complicated / confusing regex ? ."
For some reason there are no double spaces, which makes me question the result as well as the readability of the implementation.
My understanding is that character classes are more performant and simpler to visually parse. But this doesn't work:
select regexp_replace(
'a very(!) punctuated sentence! A complicated/confusing regex?.',
'[.,&-_()[]{}-/:;%$#!*|?~=+"\'`]',
' \\1 '
) as "result" from table
-- Error: no argument for repetition operator: ?
It also doesn't seem that back references are available to character classes.
Is there a way to write this query that is relatively performant and allows the reader to easily visually parse the punctuation list such as in the character classes above?
I see two potential problems in your current approach. First, the hyphen should appear last in the character class, or else it should be escaped; currently your class contains &-_, which means every character between & and _, which is probably not what you intended. Second, your regex doesn't actually have a first capture group, so you could either replace with \0, or put the punctuation character into a capture group and keep using \1 as you already were.
SELECT REGEXP_REPLACE(
'a very(!) punctuated sentence! A complicated/confusing regex?.',
'([][.,&_(){}/:;%$#!*|?~=+"\'`-])', -- ']' placed first and '-' last so both are literal
' \\1 '
) AS "result"
FROM yourTable;
This solution works great and is very readable:
select regexp_replace(
'a very(!) punctuated sentence! A complicated/confusing regex?.',
'[[:punct:]]',
' \\0 '
)
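For reference, the same two-pass transformation (space out punctuation, then compress whitespace) can be sanity-checked outside Snowflake with a small Python sketch; Python's re module does not support POSIX classes like [[:punct:]], so string.punctuation stands in for it here:

```python
import re
import string

s = "a very(!) punctuated sentence! A complicated/confusing regex?."

# pass 1: wrap every punctuation character in spaces
# (string.punctuation approximates [[:punct:]], which Python's re lacks)
pattern = "([" + re.escape(string.punctuation) + "])"
spaced = re.sub(pattern, r" \1 ", s)

# pass 2: compress the double spaces left behind
result = re.sub(r"\s+", " ", spaced).strip()
print(result)  # a very ( ! ) punctuated sentence ! A complicated / confusing regex ? .
```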
I got an error when trying Tim's answer (no argument for repetition operator: ?). That got me to Snowflake's documentation of POSIX basic and extended regex syntax, where [[:punct:]] is a valid character class. That class covers all of the punctuation I had before, plus <>^#, which works for my purposes.
Thank you Tim and Abra
This is my SQL code:
CREATE TABLE country (
id serial NOT NULL PRIMARY KEY,
name varchar(100) NOT NULL CHECK(name ~ '^[-\p{L} ]{2,100}$'),
code varchar(3) NOT NULL
);
Notice the regex constraint at the name attribute. The code above will result in ERROR: invalid regular expression: invalid escape \ sequence.
I tried using escape CHECK(name ~ E'^[-\\p{L} ]{2,100}$') but again resulted in ERROR: invalid regular expression: invalid escape \ sequence.
I am also aware that if I do CHECK(name ~ '^[-\\p{L} ]{2,100}$') or CHECK(name ~ E'^[-\p{L} ]{2,100}$'), the SQL will receive the wrong regex and will therefore throw a constraint violation when inserting valid data.
Does PostgreSQL regex constraints not support regex patterns (\p) or something like that?
Edit #1
The regex ^[-\p{L} ]{2,100}$ basically allows country names that are between 2 and 100 characters, where the allowed characters are hyphen, whitespace, and all letters (including non-Latin letters).
NOTE: The SQL runs perfectly fine during the table creation but will throw the error when inserting valid data.
Additional Note: I am using PostgreSQL 12.1
The \p{L} Unicode category (property) class matches any letter, but it is not supported in PostgreSQL regex.
You may get the same behavior using the [:alpha:] POSIX character class:
'^[-[:alpha:] ]{2,100}$'
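Applied to the table definition from the question, the constraint becomes (a sketch; note that [:alpha:] matches letters according to the database's locale/collation):

```sql
CREATE TABLE country (
    id   serial       NOT NULL PRIMARY KEY,
    name varchar(100) NOT NULL CHECK (name ~ '^[-[:alpha:] ]{2,100}$'),
    code varchar(3)   NOT NULL
);
```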
I am using Solr 3. I can search starting with attributeValue:\hin*, but it fails for attributeValue:\uo*.
The error is:
"error": {
"msg": "org.apache.solr.search.SyntaxError: Non-hex character in Unicode escape sequence: o",
"code": 400
}
The issue is the \u. I cannot exclude u, since the user can type and search for anything.
When your query starts with \u, it's treated as a Unicode escape sequence, and of course o is not a valid character in a Unicode escape. If you want to search for a literal \, you need to escape it. More info on this:
Lucene/Solr supports escaping special characters that are part of the query
syntax. The current list of special characters is:
+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /
To escape these characters, use \ before the character.
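So, assuming the goal is to match values that literally start with \u, a sketch of the escaped query would double the backslash:

```
attributeValue:\\uo*
```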
SQLITE Query question:
I have a query which returns string with the character '#' in it.
I would like to remove all characters after this specific character '#':
select field from mytable;
result :
text#othertext
text2#othertext
text3#othertext
So in my sample I would like to create a query which only returns :
text
text2
text3
I tried something with instr() to get the index, but instr() was not recognized as a function: SQL Error: no such function: instr (probably an old version of the db; sqlite_version() → 3.7.5).
Any hints on how to achieve this?
There are two approaches:
You can rtrim the string of all characters other than the # character.
This assumes, of course, that (a) there is only one # in the string; and (b) that you're dealing with simple strings (e.g. 7-bit ASCII) in which it is easy to list all the characters to be stripped.
You can use sqlite3_create_function to create your own rendition of INSTR. The specifics will vary a bit depending on how you're using SQLite (the C API directly, or a language binding that exposes the same facility).
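Both approaches can be sketched with Python's sqlite3 binding (an assumption for illustration; the question doesn't say which binding is in use). The rtrim version strips everything after the # by listing the characters that may appear in the suffix, and create_function supplies an INSTR substitute for SQLite builds older than 3.7.15, where the built-in instr() first appeared:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (field TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?)",
                 [("text#othertext",), ("text2#othertext",), ("text3#othertext",)])

# Approach 1: rtrim away the suffix characters, then the '#' itself.
# Only works if the suffix contains nothing but the listed characters.
rows = conn.execute(
    "SELECT rtrim(rtrim(field, 'abcdefghijklmnopqrstuvwxyz0123456789'), '#') "
    "FROM mytable").fetchall()
print([r[0] for r in rows])  # ['text', 'text2', 'text3']

# Approach 2: register a user-defined INSTR (1-based index, 0 = not found)
# and cut the string at the first '#'.
conn.create_function("my_instr", 2, lambda s, sub: s.find(sub) + 1)
rows = conn.execute(
    "SELECT substr(field, 1, my_instr(field, '#') - 1) FROM mytable").fetchall()
print([r[0] for r in rows])  # ['text', 'text2', 'text3']
```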
I am parsing SQL in Haskell using Parsec. How can I ensure that a statement with a where clause will not treat the WHERE as a table name? Part of my code is below. The p_Combination parser works, but it consumes the WHERE as part of the table list instead of starting the where clause.
--- from clause
data Table_clause = Table {table_name :: String, alias :: Maybe String} deriving Show
p_Table_clause:: Parser Table_clause
p_Table_clause = do
t <- word
skipMany (space <?> "require space at the Table clause")
a <- optionMaybe (many1 (alphaNum)) <?> "aliase for table or nothing"
return $ Table t a
newtype From_clause = From [Table_clause] deriving Show
p_From_clause :: Parser From_clause
p_From_clause = do
string "FROM" <?> "From";
skipMany1 (space <?> "space in the from clause ")
x <- sepBy p_Table_clause (many1(char ',' <|> space))
return $ From x
-- where clause conditions elements
data WhereClause = WhereFCondi String deriving Show
p_WhereClause :: Parser WhereClause
p_WhereClause = do
string "WHERE"
skipMany1 space
x <- word
return $ WhereFCondi x
data Combination = FromWhere From_clause (Maybe WhereClause) deriving Show
p_Combination :: Parser Combination
p_Combination = do
x <- p_From_clause
skipMany1 space
y <- optionMaybe p_WhereClause
return $ FromWhere x y
Normal SQL parsers have a number of reserved words, and they’re often not context-sensitive. That is, even where a where might be unambiguous, it is not allowed simply because it is reserved. I’d guess most implementations do this by first lexing the source in a conceptually separate stage from parsing the lexed tokens, but we do not need to do that with Parsec.
Usually the way we do this with Parsec is by using Text.Parsec.Token. To use it, you first create a LanguageDef describing some basic characteristics of the language you intend to parse: how comments work, the reserved words, whether it's case-sensitive, etc. Then you use makeTokenParser to get a record full of functions tailored to that language. For example, identifier will not match any reserved word; the token parsers are careful to require whitespace where necessary, and when they skip whitespace, comments are skipped as well.
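A minimal sketch of that approach (assuming the standard Text.Parsec.Token and Text.Parsec.Language modules; the reserved-word list here only covers the keywords used in this question):

```haskell
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as Tok
import Text.Parsec.Language (LanguageDef, emptyDef)

-- Reserved words for a tiny SQL subset; SQL keywords are case-insensitive.
sqlDef :: LanguageDef ()
sqlDef = emptyDef
  { Tok.reservedNames = ["SELECT", "FROM", "WHERE"]
  , Tok.caseSensitive = False
  }

lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser sqlDef

-- identifier now fails on reserved words, so a trailing WHERE
-- can no longer be parsed as a table name or alias.
identifier :: Parser String
identifier = Tok.identifier lexer

reserved :: String -> Parser ()
reserved = Tok.reserved lexer
```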
If you want to stay with your current approach, using only Parsec primitives, you’ll probably want to look into notFollowedBy. This doesn’t do exactly what your code does, but it should provide some inspiration about how to use it:
string "FROM" >> many1 space
tableName <- many1 alphaNum <* many1 space
aliasName <- optionMaybe $ notFollowedBy (string "WHERE" >> many1 space)
>> many1 alphaNum <* many1 space
Essentially:
Parse a FROM, then whitespace.
Parse a table name, then whitespace.
If WHERE followed by whitespace is not next, parse an alias name then whitespace.
I guess the problem is that p_Table_clause accepts "WHERE". To fix this, check for "WHERE" and fail the parser:
p_Table_clause = do
t <- try (do w <- word
if w == "WHERE"
then unexpected "keyword WHERE"
else return w)
...
I guess there might be a try missing in sepBy p_Table_clause (many1 (char ',' <|> space)). I would try sepBy p_Table_clause (try (many1 (char ',' <|> space))).
(Or actually, I would follow the advice from the Parsec documentation and define a lexeme combinator to handle whitespace).
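The lexeme convention mentioned there is tiny to implement yourself: every token parser consumes the whitespace after it, so the grammar rules never have to mention spaces explicitly (a sketch, reusing names from the question):

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Each token eats its own trailing whitespace, so separators like
-- many1 (char ',' <|> space) are no longer needed between rules.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

symbol :: String -> Parser String
symbol = lexeme . string

-- a whitespace-aware replacement for the question's `word`
word :: Parser String
word = lexeme (many1 alphaNum)
```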
I don't really see the combinator you need right away, if there is one. Basically, you need p_Combination to try (string "WHERE" >> skipMany1 space) and if it succeeds, parse a WHERE "body" and be done. If it fails, try p_Table_clause, if it fails be done. If p_Table_clause succeeds, read the separator and loop. After the loop is done prepend the Table_clause to the results.
There are some other problems with your parser, too. many1 (char ',' <|> space) matches " ,,, , ,, ", which is not a valid separator between tables in a from clause, for example. Also, SQL keywords are case-insensitive, IIRC.
In general, you want to exclude keywords from matching identifiers, with something like:
keyword :: Parser Keyword
keyword = (string "WHERE"  >> return KW_Where)
      <|> (string "FROM"   >> return KW_From)
      <|> (string "SELECT" >> return KW_Select)
identifier :: Parser String
identifier = try (keyword >>= \kw -> fail $ "Expected identifier; got: " ++ show kw)
         <|> liftA2 (:) identifierStart (many identifierPart)
If two (or more) of your keywords have common prefixes, you'll want to factor out the shared prefix for efficiency (less backtracking), like:
keyword :: Parser Keyword
keyword = (char 'D' >> (    (string "ROP"   >> return KW_Drop)
                        <|> (string "ELETE" >> return KW_Delete)))
      <|> (string "INSERT" >> return KW_Insert)