ANTLR tokens ambiguity - antlr

I would like to parse a string like "Lorem ipsum AND dolor AND sit amet consectetur".
I need two tokens:
the word "AND"
any other word
Using ANTLR 2, if I define the tokens like this:
AND_WORD: "AND" ;
ANY_OTHER_WORD: ('0'..'9'|'a'..'z'|'A'..'Z'|'_')+ ;
I get "warning:lexical nondeterminism between rules".
How can I solve it? Should I somehow exclude AND_WORD from the ANY_OTHER_WORD definition?
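One common way around this kind of keyword/identifier overlap is to match every word with a single rule and then reclassify the exact text "AND" afterwards. As far as I recall, ANTLR 2 supports this through a literals table (a tokens section plus the testLiterals lexer option), but check the manual for your version. The sketch below only illustrates the principle in Python, not ANTLR syntax; tokenize and LITERALS are hypothetical names, and the token names are borrowed from the question.

```python
import re

# Token type names borrowed from the question's rules.
AND_WORD = "AND_WORD"
ANY_OTHER_WORD = "ANY_OTHER_WORD"

# Literals table: exact texts that should get their own token type.
LITERALS = {"AND": AND_WORD}

# Same character set as ANY_OTHER_WORD in the question.
WORD = re.compile(r"[0-9a-zA-Z_]+")


def tokenize(text):
    """Match every word with one rule, then reclassify known literals."""
    return [
        (LITERALS.get(m.group(), ANY_OTHER_WORD), m.group())
        for m in WORD.finditer(text)
    ]


tokens = tokenize("Lorem ipsum AND dolor AND sit amet consectetur")
for token_type, token_text in tokens:
    print(token_type, token_text)
```

Because the lookup compares the whole matched word, something like "ANDROID" still comes out as ANY_OTHER_WORD, so the two rules never compete for the same input.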

Related

How to Detect Question Mark Invalid Character in SQL

I am working in a database that accepts imported files. When the client enters a registered trademark, copyright, or another invalid symbol, the database imports the symbol as an invalid character in the form of a question mark, like the following:
lorem ipsum dolor sit amet, consectetur � lorem ipsum dolor sit amet, consectetur
When printing this character, it appears as such:
lorem ipsum dolor sit amet, consectetur ? lorem ipsum dolor sit amet, consectetur
Is there a way to detect that symbol? Using a LIKE statement doesn't detect it.
The desired result is to be able to send a warning in a stored procedure that asks the user to check the inserted data to ensure validity.
Note: It is not enough to insert the string into a temp table and then check the temp table for question marks, as a question mark in the string is not uncommon and would create far more false positives than helpful alerts.
Thank you
That special character is NCHAR(65533) but evades normal pattern matching using LIKE, CHARINDEX, PATINDEX, etc. I did find one way to detect it using TRANSLATE, by swapping the Unicode replacement character for a different Unicode character that can't possibly be in the data already. I picked an 8-pointed star (✵, NCHAR(10037)) but there are so many to choose from...
CREATE TABLE dbo.whatever(things nvarchar(32));
INSERT dbo.whatever(things) VALUES
(N'this row is just fine.'),
(N'well, here there is a � rhombus.'),
(N'this row is just fine too.');
SELECT things
FROM dbo.whatever
WHERE TRANSLATE(things, nchar(65533), N'✵') LIKE N'%✵%';
Output:
well, here there is a � rhombus.
Also note the difference between print 'hi � there'; and print N'hi � there'; - don't be lazy, if your string is (or could contain) Unicode, always use the N'prefix'.
As Martin suggests, though, SQL Server can store whatever character is leading to the �. You are most likely seeing it because the file is being treated as ASCII, the data is being inserted into a varchar column, or the character is getting lost somewhere else along the way.
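If the file can be checked on the client side before it is imported, the same detection is simple outside SQL. A minimal sketch in Python, assuming the imported text is available as a str; has_replacement_char is a hypothetical helper and the sample rows are made up.

```python
REPLACEMENT_CHAR = "\ufffd"  # same code point as NCHAR(65533)


def has_replacement_char(text: str) -> bool:
    """Return True if the text contains the Unicode replacement character."""
    return REPLACEMENT_CHAR in text


# One way the character appears: bytes decoded with the wrong encoding.
# 0xAE is the registered trademark sign in Latin-1 but is invalid on its own in UTF-8.
decoded = b"Acme\xae".decode("utf-8", errors="replace")

rows = ["this row is just fine.", decoded, "is this fine too?"]
flagged = [row for row in rows if has_replacement_char(row)]
print(flagged)
```

A plain question mark in the data is not flagged, which avoids the false positives mentioned in the question.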

Which approaches for quickly adding surroundings to text exist in IntelliJ IDEA family IDEs?

Basic example
Consider the code below. I took the Pug preprocessor as an example, but it could be any other declarative language such as HTML, HAML, etc.
p.
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et
dolore magna aliqua.
I need:
p.
Lorem ipsum dolor sit amet, #[+ImprovedUnderline consectetur] adipiscing elit, sed do eiusmod tempor incididunt ut labore et
dolore magna aliqua. #[+ImprovedUnderline ]
The content of the last #[+ImprovedUnderline ] has not been entered yet.
Target
Provide two methods for quickly adding the #[+ImprovedUnderline TARGET_WORDS] surrounding without typing it directly:
Before TARGET_WORDS has been entered.
After TARGET_WORDS has been entered (select TARGET_WORDS and surround it with #[+ImprovedUnderline TARGET_WORDS]).
Why Live Templates cannot handle it
Consider the Live Template below:
#[+ImprovedUnderline $SELECTION$]
In version 2020.2, if there is already something on the line and we apply the above Live Template with Ctrl + Alt + J, all preceding characters on the line get wrapped.
So Live Templates do not satisfy the first target.
Which other methods does IntelliJ IDEA offer?
IntelliJ IDEA makes it very easy to surround a code block with if, while, for and other statements or to make a code block become part of such statements as try/catch or synchronized. Simply select the code block to surround (don’t forget to use Ctrl + W to increase the current selection) and then press Ctrl + Alt + T (or right-click the selection and select Surround with… from the menu). IntelliJ IDEA will show a list of options to choose from.

Replacing Linefeeds in SQL database

I'm in a bit over my head here.
I have an SQL database, and I'm trying to replace all linefeeds (LF) which are NOT preceded by a whitespace with a whitespace + the linefeed. I'm using SQLiteStudio for this. What I have right now is the following:
UPDATE table
SET column = replace( column, '%' + char(10) + '%', ' ' )
When I run the above query, the following data:
<br><strong><font color="2018283286c3">
Lorem ipsum dolor sit amet, consectetur adipiscing[LF]
elit, sed do eiusmod tempor incididunt ut labore et[LF]
<hr size="1px" noshade style="clear:both;margin-top:10px;height:1px;">
... Becomes:
<br><strong><font color="2% %18283286c3">
Lorem ipsum dolor sit amet, consectetur adipiscing[LF]
elit, sed do eiusmod tempor incididunt ut labore et[LF]
<hr size="1px" noshade style="clear:both;margin-top:1% %px;height:1px;">
I have added the [LF]'s in the above for clarity. As can be seen, my query only replaces the zeroes, for some reason, and doesn't match the linefeeds.
What I need to end up with is this:
<br><strong><font color="2018283286c3">
Lorem ipsum dolor sit amet, consectetur adipiscing[WHITESPACE][LF]
elit, sed do eiusmod tempor incididunt ut labore et[WHITESPACE][LF]
<hr size="1px" noshade style="clear:both;margin-top:10px;height:1px;">
... so that only LFs NOT already preceded by a whitespace are matched and replaced with a whitespace + LF. LFs already preceded by a whitespace are left alone, ideally.
Any ideas what I'm doing wrong, or if there is a better method for this? I found the above query online and have tried to tweak it. Not used to working with these things. Thanks for reading!
Not sure if your DB setup supports regular expressions, but if so, you can try to do your search/replace with them. Take a look at this link:
replace a part of a string with REGEXP in sqlite3
Once you get your regexp replace function in place, you can use this as your search pattern:
(?P<mystring>.*\S+)\n$
This will match strings that end with a LF, but no whitespace directly preceding it. You can then use the named group "mystring" to construct the string you want.
You can test/revise your regexp here: https://regex101.com/
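A likely explanation for the replaced zeroes: in SQLite, + is arithmetic (string concatenation is ||), so '%' + char(10) + '%' evaluates to the number 0, and replace() does not treat % as a wildcard anyway. If installing a regexp function is not an option, two nested replace() calls may be enough. The sketch below runs the idea through Python's sqlite3 module on a hypothetical table t with a column body, and assumes "whitespace" means a plain space: first strip one space in front of each LF, then put a space in front of every LF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (body TEXT)")  # hypothetical table and column
# First LF has no space before it, second LF already has one.
conn.execute("INSERT INTO t VALUES (?)", ("adipiscing\net \nend",))

# '+' is numeric addition in SQLite, so this yields 0 rather than a pattern.
plus_result = conn.execute("SELECT '%' + char(10) + '%'").fetchone()[0]

# Inner replace: ' ' + LF -> LF. Outer replace: LF -> ' ' + LF.
conn.execute(
    """
    UPDATE t
    SET body = replace(
        replace(body, ' ' || char(10), char(10)),
        char(10),
        ' ' || char(10)
    )
    """
)
fixed = conn.execute("SELECT body FROM t").fetchone()[0]
print(repr(fixed))
```

Linefeeds that already had a space keep exactly what they had, and the others gain one space. Tabs or other whitespace before an LF are not covered by this sketch.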

SQL to select and extract 10 characters before and 10 characters after a substring in an Oracle CLOB

Let's consider the text example below:
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever
since the 1500s, when an unknown printer took a galley of type and
scrambled it to make a type specimen book. It has survived not only
five centuries, but also the leap into electronic typesetting,
remaining essentially unchanged. It was popularised in the 1960s with
the release of Letraset sheets containing Lorem Ipsum passages, and
more recently with desktop publishing software like Aldus PageMaker
including versions of Lorem Ipsum.
How do I extract 10 characters before and 10 characters after the substring, let's say 'Lorem Ipsum' from the above paragraph in Oracle? The datatype is clob. I have been trying with Oracle functions such as SUBSTR, but no luck so far. Thanks for everyone's help.
You can use regexp_replace with back references:
select
regexp_replace(col, '.*?(.{0,10}Lorem Ipsum.{0,10}).*?','\1')
from t;
Here . matches any character and ? is for lazy matching. For the given input, it produces:
Lorem Ipsum is simplyindustry. Lorem Ipsum has been ontaining Lorem Ipsum passages,rsions of Lorem Ipsum.
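To experiment with the core of that pattern outside the database, here is a small sketch using Python's re.findall as a stand-in for Oracle's regexp_replace. The sample text is made up so that the context windows are easy to count; regex dialects differ in details, so treat it as an illustration of the .{0,10} idea only.

```python
import re

# Made-up sample text with two occurrences of the search term.
text = "0123456789abcde Lorem Ipsum fghijklmnopqrstu Lorem Ipsum."

# Up to 10 characters before and after each occurrence.
pattern = re.compile(r".{0,10}Lorem Ipsum.{0,10}")

# findall returns non-overlapping matches, scanning left to right.
matches = pattern.findall(text)
for match in matches:
    print(match)
```

Because matches do not overlap, context that was consumed by one hit is not available to the next one, which is also why the second hit in the Oracle output above has fewer than 10 leading characters.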

Newline character in Cucumber-JVM parameters

Is there any way of passing parameters containing newline characters into Cucumber-JVM scenarios?
As a workaround I'm putting "\n" strings into the parameter and replacing them with a newline at the beginning of the scenario method but it feels as if there might be a nicer way.
You can use Doc Strings, delimited by triple quotes. Example taken from the official wiki: http://cukes.info/step-definitions.html
Given a blog post named "Random" with Markdown body
"""
Some Title, Eh?
==============
Here is the first paragraph of my blog post. Lorem ipsum dolor sit amet,
consectetur adipiscing elit.
"""