How to Detect Question Mark Invalid Character in SQL

I am working in a database that accepts imported files. When the client enters a registered trademark, copyright, or another invalid symbol, the database imports the symbol as an invalid character in the form of a question mark, like the following:
lorem ipsum dolor sit amet, consectetur � lorem ipsum dolor sit amet, consectetur
When printing this character, it appears as such:
lorem ipsum dolor sit amet, consectetur ? lorem ipsum dolor sit amet, consectetur
Is there a way to detect that symbol? Using a LIKE statement doesn't detect it.
The desired result is to be able to send a warning in a stored procedure that asks the user to check the inserted data to ensure validity.
Note: It is not enough to insert the string into a temp table and then check the temp table for question marks, as a question mark in the string is not uncommon and would create far more false positives than helpful alerts.
Thank you

That special character is NCHAR(65533) but evades normal pattern matching using LIKE, CHARINDEX, PATINDEX, etc. I did find one way to detect it using TRANSLATE, by swapping the Unicode replacement character for a different Unicode character that can't possibly be in the data already. I picked an 8-pointed star (✵, NCHAR(10037)) but there are so many to choose from...
CREATE TABLE dbo.whatever(things nvarchar(32));
INSERT dbo.whatever(things) VALUES
(N'this row is just fine.'),
(N'well, here there is a � rhombus.'),
(N'this row is just fine too.');
SELECT things
FROM dbo.whatever
WHERE TRANSLATE(things, nchar(65533), N'✵') LIKE N'%✵%';
Output:
well, here there is a � rhombus.
Also note the difference between print 'hi � there'; and print N'hi � there'; - don't be lazy: if your string contains (or could contain) Unicode, always use the N'prefix'.
As Martin suggests, though, SQL Server can store whatever character is leading to the � - it is most likely because you are treating the file as ASCII, inserting them into a varchar column, or it is getting lost somewhere else along the way.
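To illustrate where the � tends to come from, here is a minimal Python sketch (not part of the original answer). It assumes the imported file is UTF-8 but gets decoded as ASCII; the sample text and the helper name has_replacement_char are hypothetical. It also shows that checking for U+FFFD (the same code point as NCHAR(65533)) does not trip on ordinary question marks.

```python
REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, the same code point as NCHAR(65533)


def has_replacement_char(text):
    """Return True if the text contains the Unicode replacement character."""
    return REPLACEMENT_CHAR in text


# Hypothetical import: a registered-trademark sign saved as UTF-8 bytes.
raw = "Acme\u00ae widgets? Yes.".encode("utf-8")

# Decoding as ASCII with errors="replace" turns each undecodable byte into U+FFFD.
lossy = raw.decode("ascii", errors="replace")
# Decoding with the correct encoding keeps the original symbol.
clean = raw.decode("utf-8")

print(has_replacement_char(lossy))  # the damaged string is flagged
print(has_replacement_char(clean))  # the ordinary '?' alone is not flagged
```

If the check fires, the data was probably already damaged before it reached the column, so the decoding step is the place to fix it.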

Related

Replacing Linefeeds in SQL database

I'm in a bit over my head here.
I have an SQL database, and I'm trying to replace all linefeeds (LF) which are NOT preceded by a whitespace with a whitespace + the linefeed. I'm using SQLiteStudio for this. What I have right now is the following:
UPDATE table
SET column = replace( column, '%' + char(10) + '%', ' ' )
When I run the above query, the following data:
<br><strong><font color="2018283286c3">
Lorem ipsum dolor sit amet, consectetur adipiscing[LF]
elit, sed do eiusmod tempor incididunt ut labore et[LF]
<hr size="1px" noshade style="clear:both;margin-top:10px;height:1px;">
... Becomes:
<br><strong><font color="2% %18283286c3">
Lorem ipsum dolor sit amet, consectetur adipiscing[LF]
elit, sed do eiusmod tempor incididunt ut labore et[LF]
<hr size="1px" noshade style="clear:both;margin-top:1% %px;height:1px;">
I have added the [LF]'s in the above for clarity. As can be seen, my query only replaces the zeroes, for some reason, and doesn't match the linefeeds.
What I need to end up with is this:
<br><strong><font color="2018283286c3">
Lorem ipsum dolor sit amet, consectetur adipiscing[WHITESPACE][LF]
elit, sed do eiusmod tempor incididunt ut labore et[WHITESPACE][LF]
<hr size="1px" noshade style="clear:both;margin-top:10px;height:1px;">
... so that only LF's NOT already preceded by a whitespace are matched and replaced with a whitespace + LF. LF's already preceded by a whitespace are left alone, ideally.
Any ideas what I'm doing wrong, or if there is a better method for this? I found the above query online and have tried to tweak it. Not used to working with these things. Thanks for reading!
Not sure if your DB setup supports regular expressions, but if so, you can try to do your search/replace with them. Take a look at this link:
replace a part of a string with REGEXP in sqlite3
Once you get your regexp replace function in place, you can use this as your search pattern:
(?P<mystring>.*\S+)\n$
This will match strings that end with a LF, but no whitespace directly preceding it. You can then use the named group "mystring" to construct the string you want.
You can test/revise your regexp here: https://regex101.com/
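If a regexp function is not available, a nested replace() may be enough. In SQLite, replace() matches literal substrings (so % is not a wildcard there), and + is arithmetic rather than concatenation: '%' + char(10) + '%' evaluates to 0, which would explain why only the zeroes were replaced. The sketch below runs against an in-memory database through Python's sqlite3 module. The table docs and column body are hypothetical names, and it only treats a plain space as "whitespace", not tabs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (body TEXT)")  # hypothetical table and column
conn.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [("adipiscing\nelit",), ("labore et \ndolore",)],
)

# Inner replace: strip one space in front of each LF, so every LF looks the same.
# Outer replace: put exactly one space back in front of every LF.
# '||' is SQLite's string concatenation operator; char(10) is the LF character.
conn.execute(
    """
    UPDATE docs
    SET body = replace(
        replace(body, ' ' || char(10), char(10)),
        char(10),
        ' ' || char(10)
    )
    """
)

rows = [r[0] for r in conn.execute("SELECT body FROM docs ORDER BY rowid")]
print(rows)
```

The UPDATE statement itself can be pasted into SQLiteStudio with the real table and column names substituted.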

Sql to select and extract 10 characters before and 10 characters after a substring in Oracle Clob

Let's consider the text example below:
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever
since the 1500s, when an unknown printer took a galley of type and
scrambled it to make a type specimen book. It has survived not only
five centuries, but also the leap into electronic typesetting,
remaining essentially unchanged. It was popularised in the 1960s with
the release of Letraset sheets containing Lorem Ipsum passages, and
more recently with desktop publishing software like Aldus PageMaker
including versions of Lorem Ipsum.
How do I extract 10 characters before and 10 characters after the substring, let's say 'Lorem Ipsum' from the above paragraph in Oracle? The datatype is clob. I have been trying with Oracle functions such as SUBSTR, but no luck so far. Thanks for everyone's help.
You can use regexp_replace with back references:
select
regexp_replace(col, '.*?(.{0,10}Lorem Ipsum.{0,10}).*?','\1')
from t;
Here . matches any character and ? is for lazy matching. For the given input, it produces:
Lorem Ipsum is simplyindustry. Lorem Ipsum has been ontaining Lorem Ipsum passages,rsions of Lorem Ipsum.
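The same pattern idea can be tried outside the database. This is a small Python sketch of the technique, not Oracle code; the function name context_snippets and the sample string are made up for illustration. re.DOTALL is used so that . also matches line breaks, which matters for multi-line text.

```python
import re


def context_snippets(text, needle, width=10):
    """Return each occurrence of needle with up to `width` characters on either side."""
    pattern = re.compile(
        ".{0,%d}%s.{0,%d}" % (width, re.escape(needle), width),
        re.DOTALL,  # let '.' match newlines too
    )
    return pattern.findall(text)


sample = "abcdefghijklmnop Lorem Ipsum qrstuvwxyz0123"
print(context_snippets(sample, "Lorem Ipsum"))
# → ['hijklmnop Lorem Ipsum qrstuvwxy']
```

Matches do not overlap, so when two occurrences sit closer than the context width, the later one gets less leading context.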

Hibernate import.sql error in sql syntax: Unsuccessful: INSERT INTO

In a Spring Boot app, I'm trying to load an import.sql file on startup with my SQL schema for testing the app. The weird part is that the same SQL file works when I add it to my DB by hand.
sample of import.sql:
INSERT INTO car
(name, description, price) VALUES
('Audi Q7', 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse libero ex.', 150),
('Audi A4', 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse libero ex.', 79.99),
...
spring boot startup listing:
Listing on GitHubGist
sample of error:
HHH000388: Unsuccessful: INSERT INTO car
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 1
HHH000388: Unsuccessful: (name, description, price) VALUES
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'name, description, price) VALUES' at line 1
...
That is because the entries in import.sql shouldn't span multiple lines. By default, Hibernate reads the file line by line and executes each line as a separate statement.
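One way to get there without editing the file by hand is to flatten it with a small script. The following is a rough Python sketch, not a Hibernate feature; the function name one_statement_per_line is hypothetical. It assumes every statement ends with ; and that no ; or significant run of whitespace occurs inside a string literal or comment.

```python
def one_statement_per_line(script):
    """Join each ';'-terminated statement onto a single line.

    Naive by design: it splits on every ';' and collapses all whitespace runs,
    so it is only safe when literals contain neither ';' nor meaningful
    repeated whitespace.
    """
    statements = []
    for chunk in script.split(";"):
        flattened = " ".join(chunk.split())  # newlines and runs of spaces become one space
        if flattened:
            statements.append(flattened + ";")
    return "\n".join(statements) + "\n"


script = (
    "INSERT INTO car\n"
    "(name, description, price) VALUES\n"
    "('Audi Q7', 'Lorem ipsum', 150),\n"
    "('Audi A4', 'Lorem ipsum', 79.99);\n"
)
print(one_statement_per_line(script), end="")
```

The result is a single INSERT line that a line-by-line reader can execute as one statement.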

ANTLR tokens ambiguity

I would like to parse a string like "Lorem ipsum AND dolor AND sit amet consectetur".
I need two tokens:
word "AND"
any other word
Using antlr2, if I define tokens like
AND_WORD: "AND" ;
ANY_OTHER_WORD: ('0'..'9'|'a'..'z'|'A'..'Z'|'_')+ ;
I'm getting "warning:lexical nondeterminism between rules"
How can I solve it? Should I somehow exclude AND_WORD from ANY_OTHER_WORD definition?

Newline character in Cucumber-JVM parameters

Is there any way of passing parameters containing newline characters into Cucumber-JVM scenarios?
As a workaround I'm putting "\n" strings into the parameter and replacing them with a newline at the beginning of the scenario method but it feels as if there might be a nicer way.
You can use Doc Strings, delimited by triple quotes. Example taken from the official wiki, http://cukes.info/step-definitions.html:
Given a blog post named "Random" with Markdown body
"""
Some Title, Eh?
==============
Here is the first paragraph of my blog post. Lorem ipsum dolor sit amet,
consectetur adipiscing elit.
"""