How do I tokenise a non-space-separated string?

If I have the string "Hello,I am XYZ", it can be tokenised into the tokens "Hello", "I", "am", "XYZ". But how would I tokenise a string that is not space-separated, e.g. "Hello,IamXYZ"?

If the string has no spaces, you can use a dictionary (a word list) to tokenize it: look for known words inside the string and split at their boundaries.
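A minimal sketch of the dictionary approach, in Python. The word list VOCABULARY and the function names segment and tokenize are hypothetical, chosen only for illustration; a real dictionary would be far larger. The sketch first splits on commas and whitespace, then tries to cover each remaining chunk with dictionary words.

```python
import re

# Hypothetical toy dictionary; a real one would hold many thousands of words.
VOCABULARY = {"hello", "i", "am", "xyz"}


def segment(text, vocabulary):
    """Split text into dictionary words, or return None if no full split exists."""
    lowered = text.lower()  # assumes lowercasing keeps the length (true for ASCII)
    # best[i] is a token list covering text[:i], or None if text[:i] cannot be covered.
    best = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(end):
            if best[start] is not None and lowered[start:end] in vocabulary:
                best[end] = best[start] + [text[start:end]]
                break
    return best[len(text)]


def tokenize(text, vocabulary=VOCABULARY):
    """Split on commas and whitespace, then segment each chunk with the dictionary."""
    tokens = []
    for chunk in re.split(r"[,\s]+", text):
        if not chunk:
            continue
        words = segment(chunk, vocabulary)
        # Keep the chunk whole when the dictionary cannot cover it.
        tokens.extend(words if words is not None else [chunk])
    return tokens


print(tokenize("Hello,IamXYZ"))  # ['Hello', 'I', 'am', 'XYZ']
```

A string can have more than one valid split; this sketch simply returns the first one it finds, so a production tokenizer would probably need word frequencies or some other scoring to choose between candidates.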
Another approach is to use character n-grams, but watch the length of the string, since the number of n-grams grows quickly with it.
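To give a feel for how quickly n-grams pile up, here is a small Python sketch. The function names char_ngrams and all_ngrams are made up for this example, and it only generates the n-grams; deciding which of them are useful tokens is a separate step.

```python
def char_ngrams(text, n):
    """Return every contiguous substring of length n, left to right."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def all_ngrams(text, max_n):
    """Collect n-grams for every n from 1 up to max_n."""
    grams = []
    for n in range(1, max_n + 1):
        grams.extend(char_ngrams(text, n))
    return grams


print(char_ngrams("IamXYZ", 2))      # ['Ia', 'am', 'mX', 'XY', 'YZ']
print(len(all_ngrams("IamXYZ", 6)))  # 21
```

For a string of length L, taking every n from 1 to L yields L(L+1)/2 n-grams, so capping the maximum n is usually a sensible precaution.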

Related

replace backslash in redshift

I am working with a string column in a Redshift database where \" occurs multiple times in the same value.
I want to replace every occurrence of \" with "
For example, if a string = \"name\"
I want the output to be string = "name"
From what I have found, redshift does not allow the existence of a single backslash, and automatically converts it to a double backslash, but that is not happening in this case.
I have tried REPLACE( string, '\"', '"' ), but it had no effect. Can the fact that the string is a JSON string have any bearing on how REPLACE() behaves?
I have been trying to use regexp_replace but maybe I am not using the right regular expression, hence I am not able to solve the problem.
REPLACE( string, '\\"', '"' ) seems to work in this situation. I am guessing it's because Redshift doesn't allow a single backslash, but converts it to a double backslash.
So, even though the string looked like \"name\" it was probably stored as \\"name\\" and hence putting a single backslash in the replace was not working.
EDIT: please read Bill's explanation in the comment below this reply

How to insert delimiter in a string with kotlin

I have a mac string:
mac=7A2918D5434F
And I need to convert to this:
mac=7A:29:18:D5:43:4F
How can I do that in kotlin?
If you are sure that your initial string is correct you can do something like:
"7A2918D5434F".chunked(2).joinToString(":")
chunked(2) splits the string in chunks of size 2 (can be used for any Iterable).
joinToString(":") takes a list and joins its elements into a single string, using : as the delimiter.

Printing Unnecessary escape character [duplicate]

I tried many ways to get a single backslash from an executed (I don't mean an input from html).
I can get special characters as tab, new line and many others then escape them to \\t or \\n or \\(someother character) but I cannot get a single backslash when a non-special character is next to it.
I don't want something like:
str = "\apple"; // I want this, to return:
console.log(str); // \apple
and if I try to get character at 0 then I get a instead of \.
(See ES2015 update at the end of the answer.)
You've tagged your question both string and regex.
In JavaScript, the backslash has special meaning both in string literals and in regular expressions. If you want an actual backslash in the string or regex, you have to write two: \\.
The following string starts with one backslash; the first one you see in the literal is an escape character starting an escape sequence. The \\ escape sequence tells the parser to put a single backslash in the string:
var str = "\\I have one backslash";
The following regular expression will match a single backslash (not two); again, the first one you see in the literal is an escape character starting an escape sequence. The \\ escape sequence tells the parser to put a single backslash character in the regular expression pattern:
var rex = /\\/;
If you're using a string to create a regular expression (rather than using a regular expression literal as I did above), note that you're dealing with two levels: The string level, and the regular expression level. So to create a regular expression using a string that matches a single backslash, you end up using four:
// Matches *one* backslash
var rex = new RegExp("\\\\");
That's because first, you're writing a string literal, but you want to actually put backslashes in the resulting string, so you do that with \\ for each one backslash you want. But your regex also requires two \\ for every one real backslash you want, and so it needs to see two backslashes in the string. Hence, a total of four. This is one of the reasons I avoid using new RegExp(string) whenever I can; I get confused easily. :-)
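The same two-level rule is not specific to JavaScript. As a hedged illustration, here is the equivalent check in Python (the variable names are made up for this sketch): a pattern built from an ordinary string literal needs four backslashes to match one, while a raw string needs only two.

```python
import re

one_backslash = "\\"          # string literal: two typed characters, one real backslash
pattern_from_string = "\\\\"  # four typed characters, two real backslashes for the regex
pattern_from_raw = r"\\"      # raw string: what you type is what the regex engine sees

print(len(one_backslash))                                      # 1
print(len(pattern_from_string))                                # 2
print(pattern_from_string == pattern_from_raw)                 # True
print(bool(re.fullmatch(pattern_from_string, one_backslash)))  # True
```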
ES2015 and ES2018 update
Fast-forward to 2015, and as Dolphin_Wood points out the new ES2015 standard gives us template literals, tag functions, and the String.raw function:
// Yes, this unlikely-looking syntax is actually valid ES2015
let str = String.raw`\apple`;
str ends up having the characters \, a, p, p, l, and e in it. Just be careful there are no ${ in your template literal, since ${ starts a substitution in a template literal. E.g.:
let foo = "bar";
let str = String.raw`\apple${foo}`;
...ends up being \applebar.
Try String.raw method:
str = String.raw`\apple` // "\apple"
Reference here: String.raw()
\ is an escape character; when it is followed by a non-special character, it does not become a literal \. Instead, you have to double it: \\.
console.log("\apple"); //-> "apple"
console.log("\\apple"); //-> "\apple"
In an ordinary string literal, there is no way to get the original, raw string definition or to write the string without escape characters.
Please try the code below; it works for me, and the output includes the backslash:
String sss="dfsdf\\dfds";
System.out.println(sss);

inputmask for documentum

I know that I can validate an input field by adding an inputmaskvalidator tag. I read the Documentum documentation:
The mask character string:
# : numeric characters
& : all characters
A : alphanumeric characters only
? : alphabetic characters only
U : uppercase alphabetic characters only
L : lowercase alphabetic characters only
Example: date mask ##/##/## permits the input date 12/24/95.
To use one of the mask characters as a literal member of the mask string, place a double slash (\\) preceding the character.
Suppose I want to accept only a double value, so that it can be stored as a double in the Content Server. What should the inputmask value be?
Something like this?
<dmf:inputmaskvalidator inputmask="#.#" controltovalidate="my_double" name="my_double_validator"/>
or
<dmf:inputmaskvalidator inputmask="##.##" controltovalidate="my_double" name="my_double_validator"/>
You need a different type of validator. inputmaskvalidator is a poor fit for this purpose. Use, for example, regexpvalidator. You can find an example on this page:
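For illustration only, a regular expression for a plain decimal number could look like the Python sketch below. The pattern and the is_double helper are assumptions made for this example; how an expression is passed to regexpvalidator, and which regex dialect it accepts, is not shown in the source and should be checked against the Documentum documentation.

```python
import re

# Hypothetical pattern: optional sign, one or more digits, optional fractional part.
DOUBLE_PATTERN = re.compile(r"[+-]?[0-9]+(\.[0-9]+)?")


def is_double(value):
    """Return True when the whole input looks like a plain decimal number."""
    return DOUBLE_PATTERN.fullmatch(value) is not None


print(is_double("3.14"))  # True
print(is_double("-42"))   # True
print(is_double("12."))   # False
print(is_double("abc"))   # False
```

Unlike a fixed mask, a pattern like this accepts any number of digits on either side of the decimal point. It deliberately rejects exponent notation such as 1e5; extend it if that form should be allowed.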

what characters should be escaped in sql string parameters

I need a complete list of characters that should be escaped in sql string parameters to prevent exceptions. I assume that I need to replace all the offending characters with the escaped version before I pass it to my ObjectDataSource filter parameter.
No, the ObjectDataSource will handle all the escaping for you. Any parametrized query will also require no escaping.
As others have pointed out, in 99% of the cases where someone thinks they need to ask this question, they are doing it wrong. Parameterization is the way to go. If you really need to escape yourself, try to find out if your DB access library offers a function for this (for example, MySQL has mysql_real_escape_string).
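A minimal sketch of parameterization, using Python's built-in sqlite3 module rather than the ObjectDataSource from the question. The company table, its legal_name column, and the find_companies helper are hypothetical. The point is that the value travels separately from the SQL text, so an apostrophe in it needs no escaping.

```python
import sqlite3


def find_companies(connection, legal_name_pattern):
    """Look up companies with a LIKE filter, passing the value as a bound parameter."""
    cursor = connection.execute(
        "SELECT legal_name FROM company WHERE legal_name LIKE ?",
        (legal_name_pattern,),  # bound by the driver, never pasted into the SQL text
    )
    return [row[0] for row in cursor.fetchall()]


# In-memory database with made-up rows, just for the demonstration.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE company (legal_name TEXT)")
connection.executemany(
    "INSERT INTO company VALUES (?)",
    [("O'Brien & Sons",), ("Acme Ltd",)],
)

result = find_companies(connection, "O'Brien%")
print(result)  # ["O'Brien & Sons"]
```

Other database libraries use different placeholder styles (for example named or numbered parameters), but the principle is the same.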
SQL Books online:
Search for String Literals:
String Literals
A string literal consists of zero or more characters surrounded by quotation marks. If a string contains quotation marks, these must be escaped in order for the expression to parse. Any two-byte character except \x0000 is permitted in a string, because the \x0000 character is the null terminator of a string.
Strings can include other characters that require an escape sequence. The following table lists escape sequences for string literals.
Escape sequence    Meaning
\a                 Alert
\b                 Backspace
\f                 Form feed
\n                 New line
\r                 Carriage return
\t                 Horizontal tab
\v                 Vertical tab
\"                 Quotation mark
\\                 Backslash
\xhhhh             Unicode character in hexadecimal notation
Here's a way I used to get rid of apostrophes. You could do the same thing with other offending characters that you run into. (example in VB.Net)
Dim companyFilter = Trim(Me.ddCompany.SelectedValue)
If (Me.ddCompany.SelectedIndex > 0) Then
    ' Double each apostrophe so it is treated as a literal character inside the filter
    filterString += String.Format("LegalName like '{0}'", companyFilter.Replace("'", "''"))
End If
Me.objectDataSource.FilterExpression = filterString
Me.displayGrid.DataBind()