The SQL comment marker consists of two hyphens, like this:
-- cannot create table if one already exists
drop table if exists mytable;
When I typeset this with the lstlisting environment from the listings package, the two comment hyphens are converted to an en dash. If I insert a space between them, it renders as [hyphen][space][hyphen] instead of two adjacent hyphens. So, when using the listings package for SQL source code, how do I specify the comment characters?
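For reference, here is a minimal document showing the setup, along with two commonly suggested workarounds (treat them as sketches, not confirmed fixes): the listings 'literate' option, which re-typesets each -- as two separate hyphens, and 'basicstyle=\ttfamily', since the en-dash ligature only occurs in non-typewriter fonts.
\documentclass{article}
\usepackage{listings}

% Sketch: literate replaces each "--" with hyphen, empty group, hyphen,
% which suppresses the en-dash ligature; the trailing 2 is the width.
% Uncommenting basicstyle=\ttfamily would avoid the ligature entirely.
\lstset{
  language=SQL,
  literate={--}{{-{}-}}2,
  % basicstyle=\ttfamily,
}

\begin{document}
\begin{lstlisting}
-- cannot create table if one already exists
drop table if exists mytable;
\end{lstlisting}
\end{document}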
Edit: someone pointed out that the hyphens actually display properly as comment characters. The issue is that the space between them is so small as to be indistinguishable. In a high-quality printed document, with a magnifying glass, you can see the space. On my display, a rather small one, the two hyphens bleed together.
Thanks for looking at this.
I have a table in SQL that contains text data, and sometimes that text data contains emojis. See below for sample table output:
CommentID | Comment_Text
----------|---------------------------
1         | A walk in the park.
2         | A lovely day in the park
3         | A sunny day in the park 🙂
What I'd like to do is separate out the emoji as a column. Ultimately what I'd like to end up with is a binary column indicating whether the comment text contains an emoji.
After some research, I was able to find the following solution:
REGEXP_SUBSTR(Comment_Text, '[^\x00-\x7F]+', 1, 1)
This separates out the first emoji the code finds. In actuality, this regex doesn't find emojis; it just finds non-ASCII characters, and emojis happen to fall into that category. While this solution does work sometimes, it does not work when there are emojis and other non-ASCII characters in the same comment. So, for example, if the comment were 'A lovely walk in the ρaгk 🙂', the code would output not only the emoji but also the ρ and the г.
What I need is a way to separate out the emojis from the other non-ASCII characters.
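For what it's worth, one direction would be to target the emoji code-point blocks directly rather than all non-ASCII characters. A sketch, assuming Oracle on an AL32UTF8 database; the comments table name and the exact block boundaries are assumptions:
-- Flag comments containing any character in U+1F300..U+1FAFF, the main
-- emoji blocks; UNISTR builds the range endpoints from UTF-16 surrogate
-- pairs. Caveat: older emojis in the BMP (e.g. U+2764, the heavy heart)
-- fall outside this range and would need additional ranges.
SELECT CommentID,
       CASE WHEN REGEXP_LIKE(Comment_Text,
                 '[' || UNISTR('\D83C\DF00') || '-' || UNISTR('\D83E\DEFF') || ']')
            THEN 1 ELSE 0
       END AS Has_Emoji
FROM comments;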
Good day, sir.
Can you try this pattern on the Regex101 site?
I think it will work.
[^\x00-\x7F]+[^\x00-\x7F]
I am trying to pull the 'COURSE_TITLE' column from the 'PS_TRAINING' table in PeopleSoft and write it to a UTF-8 text file to be loaded into the Workday system. The file errors out while loading because of bad characters (Ã, â, and many more) present in the column. I have used a procedure that converts non-ASCII values into spaces, but because of this procedure, course titles written in non-English languages like Chinese, Korean, and Spanish are also replaced with spaces.
I even tried using regular expressions (regexp_like(course_title, 'Ã')) to find the bad characters, but since the table has hundreds of thousands of rows, it would be difficult to find all of them. Please suggest a way to solve this.
If you change your approach, this may work.
Define what you want, and retrieve it.
select *
from PS_TRAINING
-- rows whose course_title contains no Latin letters or digits at all
where not regexp_like(course_title, '[0-9A-Za-z]')
If this takes out too much data, just add the missing characters to the regex.
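Another angle, if the bad characters are classic UTF-8-read-as-Latin-1 mojibake: those sequences almost always begin with 'Ã' or 'Â', which genuine Chinese, Korean, or Spanish text does not produce. A sketch (Oracle assumed; review the hits by hand, since a few legitimate words such as Portuguese 'São' also contain 'Ã'):
-- find rows containing the telltale mojibake lead characters
select *
from PS_TRAINING
where regexp_like(course_title, '[ÃÂ]')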
I am seeing an issue here. I have a SQL database with over 10,000 records. There is a description column that contains user input from our support website, and some users put commas into their descriptions for grammar purposes. When I export my SQL results as an Excel file, the commas in the user description text mess up the arrangement of the file. I need the export to keep what is in the SQL cells intact instead of starting a new cell every time it sees a comma. Please help.
I believe if you wrap each output field in quotes, Excel should know to treat that as one field.
I hope this helps.
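For illustration, a sketch of that quoting on the SQL side (standard '||' concatenation assumed, SQL Server would use '+'; the table and column names are placeholders):
-- wrap each field in double quotes and double any embedded double
-- quotes (the usual CSV escaping rule), so a comma inside the text
-- stays within one quoted cell
select '"' || replace(description, '"', '""') || '"' as description_csv
from support_requests;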
Thank you. I also did a replace within the database: I replaced all the commas with a space, and then replaced all the tabs and line breaks with a space as well. The newline delimiter was making Excel think it was a new cell. I opened the Excel file in Notepad++ to see all of the LFs and CRLFs, and then searched and replaced the ASCII sequences of the two in SQL with a space. LFs, commas, and tabs are all characters that are not important to preserve here. Thanks again. -Chris
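For reference, the cleanup described above might look like this (a sketch; char() is SQL Server/MySQL syntax, Oracle would use chr(); the table and column names are placeholders):
-- replace CR (13), LF (10), tab (9), and commas with spaces
update support_requests
set description =
    replace(replace(replace(replace(description,
        char(13), ' '), char(10), ' '), char(9), ' '), ',', ' ');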
I am trying to write a simple query to get all records that contain Unicode/non-ASCII characters in the "description" field:
select *
from products
where description <criterion goes here>
I can search for specific characters, but what I really want is any field that would fail if it were loaded into a system (or read by a driver) that didn't support Unicode.
It would be nice if there were a regex atom for ASCII, like \a, similar to \d for digits and \s for whitespace, but if there is one, it has escaped me.
I suspect I am missing something easy or obvious.
This would be for Oracle, Sybase or PostgreSQL.
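For two of the databases mentioned, here are sketches of criteria that flag any non-ASCII character (treat them as starting points; Sybase is omitted because its regex support varies by version):
-- PostgreSQL: match any character outside the 7-bit ASCII range
select * from products where description ~ '[^\x01-\x7f]';

-- Oracle: ASCIISTR() rewrites every non-ASCII character as a \XXXX
-- escape, so the value changes exactly when the text contains one
select * from products where description <> asciistr(description);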
I am trying to search a full-text index using CONTAINS for Twitter-style usernames, e.g. #username, but the word breakers ignore the # symbol. Is there any way to disable word breakers? From my research, there is a way to create a custom word-breaker DLL, install it, and assign it, but that all seems a bit intensive and, frankly, over my head. I disabled stop words so that dashes are not ignored, but I need that # symbol. Any ideas?
You're not going to like this answer, but full-text indexes only consider the characters _ and ` while indexing. All other characters are ignored, and words get split where those characters occur. This is mainly because full-text indexes are designed to index large documents, where only proper words are considered, to make searching more refined.
We faced a similar problem. To solve it, we kept a translation table where characters like #, -, and / were replaced with special sequences like '`at`', '`dash`', '`slash`', etc. When searching the full text, you have to apply the same replacements to the characters in your search string and then search. This takes care of the special characters.
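A sketch of that translation idea in practice (SQL Server assumed, since CONTAINS is mentioned; the table, column, and token names are all placeholders):
-- keep a searchable copy where '#' is rewritten to a token the word
-- breaker will keep as part of the word
update tweets
set search_text = replace(body, '#', 'xhashx');

-- at query time, apply the same substitution to the user's input:
-- a search for '#username' becomes a search for 'xhashxusername'
select *
from tweets
where contains(search_text, '"xhashxusername"');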