I'm working with JSON values in a database for the first time. I want to use JSON_VALUE or JSON_QUERY to return a value or section of the JSON, but whoever designed this went and used '-' in the keys, which is an illegal character in a JSON path. Research suggests I need to use FOR JSON to escape it, but I can't figure out how.
Attempt at the query:
select
xt.ID,
JSON_QUERY(xt.JSON_DB_VALUE, '$.CusomterQuery.Details.cust-names') as JSON_WITH_NAMES,
JSON_VALUE(xt.JSON_DB_VALUE, '$.CusomterQuery.Details.cust-names.first-name') as FIRST_NAME
from EXAMPLE_TABLE xt
Error received:
JSON path is not properly formatted. Unexpected character '-' is found at position xx.
Thanks
Thanks to Zhorov's quick comment, this is the very simple solution - add double quotation marks around the parts of the path with the illegal character.
select
xt.ID,
JSON_QUERY(xt.JSON_DB_VALUE, '$.CusomterQuery.Details."cust-names"') as JSON_WITH_NAMES,
JSON_VALUE(xt.JSON_DB_VALUE, '$.CusomterQuery.Details."cust-names"."first-name"') as FIRST_NAME
from EXAMPLE_TABLE xt
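For anyone testing without the table, a minimal self-contained sketch (the JSON literal and values below are made up; only the quoted-path syntax matters):
declare @json nvarchar(max) = N'{"CusomterQuery":{"Details":{"cust-names":{"first-name":"Ann","last-name":"Lee"}}}}';
select
JSON_QUERY(@json, '$.CusomterQuery.Details."cust-names"') as JSON_WITH_NAMES,
JSON_VALUE(@json, '$.CusomterQuery.Details."cust-names"."first-name"') as FIRST_NAME;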
When trying to remove all accents from a string in Oracle using the techniques described in this Stack Overflow answer: how replace accented letter in a varchar2 column in oracle, I'm getting mixed results.
select CONVERT('JUAN ROMÄN', 'US7ASCII') from dual;
Returns the original string but replaces characters such as ñ with a question mark (probably because of the chosen character set - tests with different character sets led to different results).
Using the following technique:
select utl_raw.cast_to_varchar2(nlssort(NAME_USER, 'nls_sort=binary_ai')) from YOUR_TABLE;
Returns the complete string but also appends a NUL character to the end of the string.
Is there a character set that I can use with Spanish accents to get a correct result (the original string with the accents removed), or is there a way to avoid the NUL value in the utl_raw.cast_to_varchar2 technique?
Based on the comments, a replace of chr(0) seems to remove the NUL value. For example:
select
upper(utl_raw.cast_to_varchar2((nlssort('this is áà ñew test','nls_sort=binary_ai')))) as test,
replace(upper(utl_raw.cast_to_varchar2((nlssort('this is áà ñew test','nls_sort=binary_ai')))),chr(0),'') as test2
from dual;
If possible, however, I would prefer a more 'straightforward/simpler' solution.
You can use TRANSLATE(your_string, from_chars, to_chars) https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions196.htm
Just put all the accented characters in the from_chars string and their corresponding replacement characters in to_chars.
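For example, a sketch of the TRANSLATE approach (the accent list below is illustrative; extend from_chars/to_chars to cover every accented character you expect):
select translate('JUAN ROMÁN señor über',
                 'ÁÉÍÓÚÄËÏÖÜÑáéíóúäëïöüñ',
                 'AEIOUAEIOUNaeiouaeioun') as no_accents
from dual;
-- JUAN ROMAN senor uber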
I'm trying to perform a REGEXP_SUBSTR on a column that occasionally contains �, which throws the error "The string contains an untranslatable character". Based on my research so far, this character indicates bad encoding between the database and the front end. Unfortunately, I don't have any control over either of those. I'm using Teradata Studio. How can I filter this character out?
Here is my SQL. AIRCFT_POSITN_ID is the column that contains the replacement character.
SELECT DISTINCT AIRCFT_POSITN_ID,
REGEXP_SUBSTR(AIRCFT_POSITN_ID, '[0-9]+') AS AUTOROW
FROM PROD_MAE_MNTNC_VW.FMR_DISCRPNCY_DFRL
WHERE DFRL_CREATE_TMS > CURRENT_DATE -25
Your diagnosis is correct, so first of all you might want to check the Session Character Set (it is part of the connection definition).
If it is ASCII change it to UTF8 and you will be able to see the original characters instead of the substitute character.
And in case the character is indeed part of the data and not just an indication of encoding translation issues:
The substitute character, AKA SUB (DEC: 26, HEX: 1A), is quite unique in Teradata.
You cannot use it directly:
select '�';
-- [6706] The string contains an untranslatable character.
select '1A'XC;
-- [6706] The string contains an untranslatable character.
If you are using version 14.0 or above you can generate it with the CHR function:
select chr(26);
If you're below version 14.0 you can generate it like this:
select translate (_unicode '05D0'XC using unicode_to_latin with error);
Once you have generated the character you can use it with REPLACE or OTRANSLATE:
create multiset table t (i int,txt varchar(100) character set latin) unique primary index (i);
insert into t (i,txt) values (1,translate ('Hello שלום world עולם' using unicode_to_latin with error));
select * from t;
-- Hello ���� world ����
select otranslate (txt,chr(26),'') from t;
-- Hello world
select otranslate (txt,translate (_unicode '05D0'XC using unicode_to_latin with error),'') from t;
-- Hello world
BTW, there are two versions of OTRANSLATE and OREPLACE:
The functions under SYSLIB work with LATIN.
The functions under TD_SYSFNLIB work with UNICODE.
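For instance, a hedged sketch of calling the Unicode version explicitly against the table t created above (the fully qualified name assumes the embedded-services functions of Teradata 14.0+):
select TD_SYSFNLIB.OREPLACE(txt, chr(26), '') as cleaned from t;
-- the SUB characters are stripped, as with the OTRANSLATE calls above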
In addition to Dudu's excellent answer above, I wanted to add the following now that I've encountered the issue again and had more time to experiment. The following SELECT command produced an untranslatable character:
SELECT IDENTIFY FROM PROD_MAE_MNTNC_VW.SCHD_MNTNC;
IDENTIFY
24FEB1747659193DC330A163DCL�ORD
Trying to perform a REGEXP_REPLACE or OREPLACE directly on this character produces an error:
Failed [6706 : HY000] The string contains an untranslatable character.
I changed the CHARSET property in my Teradata connection from UTF8 to ASCII and could then see the offending character; it looks like a tab:
IDENTIFY
Using the TRANSLATE_CHK function with this specific conversion succeeds and identifies the position of the offending character (note that this does not work using the UTF8 charset):
TRANSLATE_CHK(IDENTIFY USING KANJI1_SBC_TO_UNICODE) AS BADCHAR
BADCHAR
28
Now this character can be dealt with using some CASE statements to remove the bad character and retain the remainder of the string:
CASE WHEN TRANSLATE_CHK(IDENTIFY USING KANJI1_SBC_TO_UNICODE) = 0 THEN IDENTIFY
ELSE SUBSTR(IDENTIFY, 1, TRANSLATE_CHK(IDENTIFY USING KANJI1_SBC_TO_UNICODE)-1)
END AS IDENTIFY
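Put together, a sketch of how that CASE might look against the same view as the earlier SELECT (the IDENTIFY_CLEAN alias is mine):
SELECT CASE WHEN TRANSLATE_CHK(IDENTIFY USING KANJI1_SBC_TO_UNICODE) = 0 THEN IDENTIFY
            ELSE SUBSTR(IDENTIFY, 1, TRANSLATE_CHK(IDENTIFY USING KANJI1_SBC_TO_UNICODE) - 1)
       END AS IDENTIFY_CLEAN
FROM PROD_MAE_MNTNC_VW.SCHD_MNTNC;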
Hope this helps someone out.
I'm working with Google BigQuery and am trying to extract some information from a string column into another column using Regexp_extract. In short:
Data in myVariable:
yippie/eggs-spam/?portlet:hungry=1234
yippie/eggs-spam/?portlet:hungry=456&portlet:hungrier=7890
I want a column with:
1234
456
My command:
SELECT Regexp_extract(myVariable, r'SOME_MAGIC') as result
FROM table
I tried for SOME_MAGIC:
hungry=(.*)[&$] - null, 456 (I learned that $ inside the brackets is interpreted as a literal)
hungry=(.*)(&|$) - Error: Exactly one capturing group must be specified
hungry=(.*)^& - null, null
hungry=(&.*)?$ - null, null
I read this, but there the number has a fixed length. I also looked at this, but "?=" (lookahead) is not a recognized construct here.
Does anybody have an idea? Thank you in advance!
I just found an answer to how I can solve my problem differently:
hungry=([0-9]+) - 1234, 456
It isn't an answer to my abstract question (a regex for selecting from character A to [character B or EOL]), so it's not that satisfying. E.g. it won't work with
yippie/eggs-spam/?portlet:hungry=12AB34
However my original problem is solved. I leave the question open for a while in case somebody has a better answer.
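For the abstract 'character A up to character B or end of line' case, a negated character class keeps exactly one capturing group and should also handle values like 12AB34; a sketch against the same column (not verified on the original table):
SELECT Regexp_extract(myVariable, r'hungry=([^&]+)') as result
FROM table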
I think I had a similar problem where I was trying to select the last 6 characters in a string (link_id) as a new column.
I kept getting this error:
Exactly one capturing group must be specified
My code originally was:
SELECT
...
REGEXP_EXTRACT(link_id, r'......$') AS updated_link_id
FROM sometable;
To get rid of the error and retrieve the correct substring as a column, I had to add parentheses around my regex string.
SELECT
...
REGEXP_EXTRACT(link_id, r'(......$)') AS updated_link_id
FROM sometable;
I'm facing a problem: it seems my data is not stored correctly in SQL Server. Simply put, how can I verify that a varchar value has a carriage return and line feed in it? When I try to print it out, the special characters don't show.
Thanks
You can use SQL Server's char(n) function together with LIKE (or CHARINDEX) to match on field contents in your WHERE clause.
carriage return: char(13)
line feed: char(10)
The following SQL will find all rows in some_table where the values of some_field contain carriage return and/or line feed characters:
SELECT * FROM some_table
WHERE some_field LIKE '%' + char(13) + '%' OR some_field LIKE '%' + char(10) + '%'
To remove carriage returns from your some_field values you can use the REPLACE() function along with char(). The SQL would look something like:
UPDATE some_table
SET some_field = REPLACE(some_field, char(13), '')
WHERE some_field LIKE '%' + char(13) + '%'
To remove line feeds you can follow up the last statement with the char(10) version of that update. How you do it all depends on what kinds of newlines your text contains: depending on where the text was inserted or pasted from, the newlines may be \r\n or just \n, so running updates against both \r and \n characters is safer than assuming you're getting one version of newline or the other.
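For instance, a sketch that strips both characters in one pass over the same hypothetical table:
UPDATE some_table
SET some_field = REPLACE(REPLACE(some_field, char(13), ''), char(10), '')
WHERE some_field LIKE '%' + char(13) + '%' OR some_field LIKE '%' + char(10) + '%'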
Note that if the newlines have already been stripped and you want them back, you have to fix the issue at the point of entry. You can't replace or restore what has been removed, so you should save the original, unmodified text in a separate column.
To add to what others have said: when I need to embed newlines in T-SQL, I tend to do
DECLARE @nl CHAR(2) = CHAR(13) + CHAR(10);
..then use @nl as required. That's for Windows line-endings, naturally.
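A quick usage sketch:
DECLARE @nl CHAR(2) = CHAR(13) + CHAR(10);
PRINT 'line one' + @nl + 'line two';  -- prints two lines in the Messages tab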
Take a look at the Char function. See MSDN. This will help you look for the special characters.
I have some rows in a table with an unusual character. When I use ascii() or unicode() on that character, it returns 63. But when I try this -
update MyTable
set MyColumn = replace(MyColumn,char(63),'')
it does not replace. The unusual character still exists after the replace function. Char(63) incidentally looks like a question mark.
For example, my string is 'ddd#dd ddd' where # is my unusual character, and
select unicode('#')
returns 63. But this code
declare @str nvarchar(10) = 'ddd#dd ddd'
declare @char char(1)
set @char = char(unicode('#'))
set @str = replace(@str,@char,'')
is working!
Any ideas how to resolve this?
Additional information:
select ascii('�') returns 63, and so does select ascii('?'). Finally select char(63) returns ? and not the diamond-question-mark.
When this character is pasted into Excel or a text editor, it looks like a space, but in an SQL Server Query window (and, apparently, here on StackOverflow as well), it looks like a diamond containing a question mark.
Not only does char(63) look like a '?', it is actually a '?'.
(As a simple test, ensure you have Num Lock on, hold down the Alt key and type '63' on the number pad - you can have all sorts of fun this way; try Alt-205, then Alt-206 and Alt-205 again: ═╬═)
It's possible that the '?' you are seeing isn't a char(63), however, and is more indicative of a character that SQL Server doesn't know how to display.
What do you get when you run:
select ascii(substring('[yourstring]',[pos],1));
--or
select unicode(substring('[yourstring]',[pos],1));
Where [yourstring] is your string and [pos] is the position of your char in the string
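If you'd rather not hunt for the position by hand, here is a sketch that lists the code of every character in a value (the @s variable and its contents are just placeholders):
DECLARE @s nvarchar(100) = N'ddd#dd ddd';
SELECT n AS pos,
       SUBSTRING(@s, n, 1) AS ch,
       UNICODE(SUBSTRING(@s, n, 1)) AS code
FROM (SELECT TOP (LEN(@s)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
      FROM sys.all_objects) AS nums
ORDER BY pos;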
EDIT
From your comment it seems like it is a question mark. Have you tried:
replace(MyColumn,'?','')
EDIT2
Out of interest, what does the following do for you:
replace(replace(MyColumn,char(146),''),char(63),'')
char(63) is a question mark. It sounds like these "unusual" characters are displayed as a question mark, but are not actually characters with char code 63.
If this is the case, then removing occurrences of char(63) (aka '?') will of course have no effect on these "unusual" characters.
I believe you actually didn't have issues with a literal CHAR(63), because that should be just a normal character and you should be able to work with it properly.
What I think happened is that, by mistake, a Unicode character (for example, a Cyrillic "А") was inserted into the table - and either your:
column setup,
the SQL code,
or the passed-in parameters
were not prepared for that.
In this case, the character might be displayed to you as ?, and ASCII() would indeed return 63 for it, but you should really use UNICODE() to figure out its real code.
Let me give a specific example that I have hit multiple times: issues with that Cyrillic "А", which looks identical to the Latin one but has a Unicode code point of 1040.
If you try to use the non-Unicode ASCII() function on that character, you get code 63, which is not its real code - it is just the code of the '?' substitution character produced by the conversion.
Actually, run this to make the differences in my example obvious:
SELECT NCHAR(65) AS Latin_A, NCHAR(1040) Cyrilic_A, ASCII(NCHAR(1040)) Latin_A_Code, UNICODE(NCHAR(1040)) Cyrilic_A_Code;
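Once the real code is known, a sketch of removing the character with an NCHAR-based replace (1040 is just the code from the example above; substitute whatever UNICODE() reports for your data):
UPDATE MyTable
SET MyColumn = REPLACE(MyColumn, NCHAR(1040), '')
WHERE MyColumn LIKE '%' + NCHAR(1040) + '%';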
That "empty" string, which shows up as '?' in SUBSTRING and gives an ASCII value of 63, can be a zero-width space, which gets appended if you copy data from a UI and insert it into the database.
To replace the data, you can use the query below:
update MyTable
set MyColumn = replace(MyColumn, NCHAR(8203), '')
It's an older question, but I've run into this problem as well. I found the solution somewhere else on the internet and thought it would be good to share it here too. Have a good day.
Replace(YourString, nchar(65533) COLLATE Latin1_General_BIN2, '')
This should work as well:
UPDATE [YourTable]
SET [FieldName] = SUBSTRING([FieldName], 2, LEN([FieldName]))
WHERE ASCII([FieldName]) = 63