Replace ASCII / URI encoded characters in Hive to String

I have some data in Hive that looks something like this -
select "showTitle=October01%2C2019%7C11PM&c3.video.isLive=T&Version=%2817888%29" as input;
I would like to replace all the URI-encoded characters with their string equivalents. For example, %2C means a comma (,), %28 means an opening bracket ( and %29 means a closing bracket ), and so on.
Can I do this with the regexp_replace() function?
The final output would be
showTitle=October01,2019|11PM&c3.video.isLive=T&Version=(17888)
The reference for ASCII encoding is here - https://www.w3schools.com/tags/ref_urlencode.asp

You can use the reflect function to call java.net.URLDecoder:
SELECT reflect("java.net.URLDecoder", "decode", input, "UTF-8")
from input_table
Alternatively, you can build your own Hive UDF for this (if you want).
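
As a quick sanity check outside Hive, the same decoding can be sketched in Python. This is only an illustration, not Hive code. It uses `urllib.parse.unquote_plus` because, like `java.net.URLDecoder.decode`, it also turns `+` into a space; the helper name `decode_uri` is hypothetical.

```python
from urllib.parse import unquote_plus


def decode_uri(value: str) -> str:
    """Decode percent-encoded characters (and '+' as a space), assuming UTF-8."""
    return unquote_plus(value, encoding="utf-8")


sample = "showTitle=October01%2C2019%7C11PM&c3.video.isLive=T&Version=%2817888%29"
print(decode_uri(sample))
```

If a literal `+` in the data must survive, `urllib.parse.unquote` would be the closer fit, but that no longer mirrors what `URLDecoder` does.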

Related

REGEXP_REPLACE URL BIGQUERY

I have two types of URLs that I need to clean. They look like this:
["//xxx.com/se/something?SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"]
["//www.xxx.com/se/car?p_color_car=White?SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"]
The outcome I want is;
SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"
I want to remove the brackets and everything up to SE. The URLs differ, so I want to remove:
First URL
["//xxx.com/se/something?
Second URL:
["//www.xxx.com/se/car?p_color_car=White?
I can't get my head around it. I've tried .*\/ but it still keeps strings I don't want, such as:
(1st URL) something?
(2nd URL) car?p_color_car=White?
You can use
regexp_replace(FinalUrls, r'.*\?|"\]$', '')
Details
.*\? - any zero or more chars other than line break chars, as many as possible, and then a ? char
| - or
"\]$ - a "] substring at the end of the string.
Mind the regexp_replace syntax, you can't omit the replacement argument, see reference:
REGEXP_REPLACE(value, regexp, replacement)
Returns a STRING where all substrings of value that match regular
expression regexp are replaced with replacement.
You can use backslashed-escaped digits (\1 to \9) within the
replacement argument to insert text matching the corresponding
parenthesized group in the regexp pattern. Use \0 to refer to the
entire matching text.
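
To see what the pattern does to the two sample URLs, here is a small Python sketch. It is an approximation: BigQuery uses the RE2 engine and Python uses its own `re` module, but this pattern relies only on syntax the two share. The function name `clean_url` is hypothetical.

```python
import re

# Same alternation as in the answer: everything up to the last '?',
# or a trailing '"]' at the end of the string.
PATTERN = re.compile(r'.*\?|"\]$')


def clean_url(final_url: str) -> str:
    """Remove every match of PATTERN, like REGEXP_REPLACE(value, regexp, '')."""
    return PATTERN.sub("", final_url)


urls = [
    '["//xxx.com/se/something?SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"]',
    '["//www.xxx.com/se/car?p_color_car=White?SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"]',
]
for url in urls:
    print(clean_url(url))
```

Because `.*` is greedy, the first alternative consumes everything up to the last `?`, which is why the second URL loses `car?p_color_car=White?` as well.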

Delimit String off of Backslash SQL/SSIS

I'm trying to delimit a string based off of a backslash, I tried using the token function but then realized that the '\' character is an escape character. Is there any way to delimit the string off of a backslash?
This is what my token function currently looks like.
Token(@[User::DynamicFilename],"\", 7)
First of all, use a double backslash \\ instead of a single \. To retrieve the file name, combine TOKEN with TOKENCOUNT:
TOKEN(@[User::DynamicFilename],"\\", TOKENCOUNT(@[User::DynamicFilename],"\\"))
When you extract a file name from a full file path, TOKENCOUNT returns the number of tokens, which is also the index of the last token. Example:
Consider that the value of @[User::DynamicFilename] is:
C:\My Files\Folder\file.txt
Since TOKENCOUNT() returns 4 (the tokens are C:, My Files, Folder and file.txt), the expression becomes
TOKEN(@[User::DynamicFilename],"\\",4)
And it returns
file.txt
You need to double each backslash.
In your example, it should be
TOKEN(@[User::DynamicFilename],"\\", 7)
If you don't know how deep to go with TOKEN, I suggest the following to get your result:
RIGHT(@[User::DynamicFilename],FINDSTRING(REVERSE(@[User::DynamicFilename]),"\\",1)-1)
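
The logic behind both answers can be sketched in Python to make it easier to follow. This is not SSIS expression syntax, and the helper names are hypothetical. Empty tokens are skipped here, which is an assumption about how TOKEN treats consecutive delimiters.

```python
def tokens(path: str, delimiter: str = "\\") -> list:
    """Split on the delimiter and drop empty pieces (assumed TOKEN behaviour)."""
    return [piece for piece in path.split(delimiter) if piece]


def token_count(path: str, delimiter: str = "\\") -> int:
    """Rough counterpart of TOKENCOUNT: the number of tokens."""
    return len(tokens(path, delimiter))


def last_token(path: str, delimiter: str = "\\") -> str:
    """Rough counterpart of TOKEN(path, delimiter, TOKENCOUNT(path, delimiter))."""
    parts = tokens(path, delimiter)
    return parts[-1] if parts else ""


def after_last_delimiter(path: str, delimiter: str = "\\") -> str:
    """Counterpart of the RIGHT/REVERSE approach: text after the last delimiter.

    Unlike the SSIS expression, this returns the whole string when the
    delimiter is absent.
    """
    position = path.rfind(delimiter)
    return path[position + 1:] if position != -1 else path


sample_path = "C:\\My Files\\Folder\\file.txt"
print(last_token(sample_path))
print(after_last_delimiter(sample_path))
```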

howto cut text from specific character in sqlite query

SQLITE Query question:
I have a query which returns string with the character '#' in it.
I would like to remove all characters after this specific character '#':
select field from mytable;
result :
text#othertext
text2#othertext
text3#othertext
So in my sample I would like to create a query which only returns :
text
text2
text3
I tried something with instr() to get the index, but instr() was not recognized as a function: SQL Error: no such function: instr (probably an old version of the database; sqlite_version() returns 3.7.5).
Any hints on how to achieve this?
There are two approaches:
You can rtrim the string of all characters other than the # character.
This assumes, of course, that (a) there is only one # in the string; and (b) that you're dealing with simple strings (e.g. 7-bit ASCII) in which it is easy to list all the characters to be stripped.
You can use sqlite3_create_function to create your own rendition of INSTR. The specifics here will vary a bit upon how you're using
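
A minimal sketch of the first approach (rtrim), run through Python's built-in `sqlite3` module so that the SQL can be tried directly. The table and column names come from the question. The list of characters to strip is an assumption (ASCII letters, digits and space) and must cover every character that can follow the marker.

```python
import sqlite3
import string

# Assumed set of characters that may appear after the marker.
STRIP_CHARS = string.ascii_letters + string.digits + " "


def cut_after_marker(values, marker="#"):
    """Return each value truncated at the marker, using nested rtrim() calls."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (field TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?)", [(v,) for v in values])
    # Inner rtrim removes the trailing text up to the marker;
    # outer rtrim removes the marker itself.
    rows = conn.execute(
        "SELECT rtrim(rtrim(field, ?), ?) FROM mytable ORDER BY rowid",
        (STRIP_CHARS, marker),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(cut_after_marker(["text#othertext", "text2#othertext", "text3#othertext"]))
```

Note that a value without the marker would be stripped completely by the inner rtrim, which is one more reason the approach only suits data that always contains exactly one marker.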

Remove Special Characters from an Oracle String

From within an Oracle 11g database, using SQL, I need to remove the following special characters from a string:
~!@#$%^&*()_+=\{}[]:”;’<,>./?
If any of these characters exist within a string, I would like them completely removed. The only exceptions are "|" and "-", which I DO NOT want removed.
For example:
From: 'ABC(D E+FGH?/IJK LMN~OP' To: 'ABCD EFGHIJK LMNOP' after removal of special characters.
I have tried this small test which works for this sample, i.e:
select regexp_replace('abc+de)fg','\+|\)') from dual
but is there a better means of using my sequence of special characters above without doing this string pattern of '\+|\)' for every special character using Oracle SQL?
You can replace anything other than letters and spaces with an empty string:
[^a-zA-Z ]
As per below comments
I still need to keep the following two special characters within my string, i.e. "|" and "-".
Just exclude more characters:
[^a-zA-Z |-]
Note: the hyphen - must be placed at the start or the end of the character class, or escaped like \-, because inside a character class it otherwise defines a range.
For more info read about Character Classes or Character Sets
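
The negated character class can be tried quickly in Python. This sketch assumes that spaces should be kept as well, as in the sample from the question, and the function name `strip_special` is hypothetical. Oracle's regex flavor is not identical to Python's, so treat this as an illustration of the idea only.

```python
import re

# Keep letters, spaces, '|' and '-'; the hyphen is last so it stays literal.
KEEP_ONLY = re.compile(r"[^a-zA-Z |-]")


def strip_special(text: str) -> str:
    """Delete every character that is not in the allowed set."""
    return KEEP_ONLY.sub("", text)


print(strip_special("ABC(D E+FGH?/IJK LMN~OP"))
```

One consequence of the whitelist approach is that digits are removed too; adding 0-9 to the class would keep them.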
Consider using this regex replacement instead:
REGEXP_REPLACE('abc+de)fg', '[~!@#$%^&*()_+=\\{}[\]:”;’<,>.\/?]', '')
The replacement will match any character from your list.
The regex to match your sequence of special characters is:
[]~!@#$%^&*()_+=\{}[:”;’<,>./?]+
I suspect you have not yet escaped all of the regex special characters.
To get there, work iteratively:
build a test string, then build up your regex character by character and check that it removes what you expect it to remove.
If the character you just added does not work, you have to escape it.
That should do the trick.
SELECT TRANSLATE('~!@#$%sdv^&*()_+=\dsv{}[]:”;’<,>dsvsdd./?', '~!@#$%^&*()_+=\{}[]:”;’<,>./?',' ')
FROM dual;
result:
TRANSLATE
-------------
sdvdsvdsvsdd
SQL> select translate('abc+de#fg-hq!m', 'a+-#!', 'a') from dual;
TRANSLATE(
----------
abcdefghqm
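
The TRANSLATE idea, deleting a fixed list of characters without any regex escaping, has a close analogue in Python's `str.translate`. The sketch below assumes the character list from the question; `remove_special` is a hypothetical helper name.

```python
# Characters to delete; '|' and '-' are deliberately absent so they survive.
SPECIAL_CHARS = "~!@#$%^&*()_+=\\{}[]:”;’<,>./?"


def remove_special(text: str, chars: str = SPECIAL_CHARS) -> str:
    """Delete every character listed in `chars`; nothing needs escaping."""
    return text.translate(str.maketrans("", "", chars))


print(remove_special("ABC(D E+FGH?/IJK LMN~OP"))
print(remove_special("abc+de#fg-hq!m", "+-#!"))
```

Because the list is treated as plain characters rather than a pattern, there is no need to work out which symbols are special to the regex engine.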

SQL Server LIKE containing bracket characters

I am using SQL Server 2008. I have a table with the following column:
sampleData (nvarchar(max))
The value for this column in some of these rows are lists formatted as follows:
["value1","value2","value3"]
I'm trying to write a simple query that will return all rows with lists formatted like this, by just detecting the opening bracket.
SELECT * from sampleTable where sampleData like '[%'
The above query doesn't work, because '[' is a special character. How can I escape the bracket so my query does what I want?
... like '[[]%'
You use [ ] to surround a special character (or range).
See the section "Using Wildcard Characters As Literals" in SQL Server LIKE
Note: You don't need to escape the closing bracket...
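
When the search text comes from user input, the bracket trick can be applied automatically. The following Python sketch wraps each LIKE wildcard character ([, % and _) in square brackets before the pattern is sent to SQL Server. The helper names are hypothetical, and the result should still be passed as a query parameter rather than concatenated into the SQL text.

```python
import re


def escape_like_literal(text: str) -> str:
    """Wrap each LIKE wildcard character ([, % and _) in square brackets."""
    return re.sub(r"[\[%_]", lambda match: "[" + match.group(0) + "]", text)


def starts_with_pattern(prefix: str) -> str:
    """Build a LIKE pattern that matches values starting with the literal prefix."""
    return escape_like_literal(prefix) + "%"


print(starts_with_pattern("["))
print(escape_like_literal("50%_off"))
```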
Aside from gbn's answer, the other method is to use the ESCAPE option:
SELECT * from sampleTable where sampleData like '\[%' ESCAPE '\'
See the documentation for details.
Just a further note here...
If you want to include the bracket (or other specials) within a set of characters, you only have the option of using ESCAPE (since you are already using the brackets to indicate the set).
Also you must specify the ESCAPE clause, since there is no default escape character (it isn't backslash by default as I first thought, coming from a C background).
E.g., if I want to pull out rows where a column contains anything outside of a set of 'acceptable' characters, for the sake of argument let's say alphanumerics... we might start with this:
SELECT * FROM MyTest WHERE MyCol LIKE '%[^a-zA-Z0-9]%'
So we are returning anything that has any character not in the list (due to the leading caret ^ character).
If we then want to add special characters in this set of acceptable characters, we cannot nest the brackets, so we must use an escape character, like this...
SELECT * FROM MyTest WHERE MyCol LIKE '%[^a-zA-Z0-9\[\]]%' ESCAPE '\'
Preceding each bracket with a backslash, and declaring the backslash as the escape character, lets us escape the brackets inside the outer brackets that delimit the character set.
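
To reason about which values such a pattern selects, it can help to mirror it with a regular expression. The Python sketch below is a rough counterpart of the last LIKE pattern, not a substitute for it: SQL Server ranges such as a-z depend on the column collation, while the regex here is plain ASCII and case-sensitive. The function name is hypothetical.

```python
import re

# Rough regex counterpart of: LIKE '%[^a-zA-Z0-9\[\]]%' ESCAPE '\'
UNACCEPTABLE = re.compile(r"[^a-zA-Z0-9\[\]]")


def has_unacceptable_char(value: str) -> bool:
    """True when the value contains a character outside the accepted set."""
    return UNACCEPTABLE.search(value) is not None


for candidate in ["abc[123]", "abc 123", '["value1"]']:
    print(candidate, has_unacceptable_char(candidate))
```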