parse html with single or double quotes - rebol

When using the parse dialect, how to parse tags that have properties enclosed by ' or '"`, as in:
thru <h2 class="txt-medium txt-bold">
thru <h2 class='txt-medium txt-bold'>
One way was to do:
thru {<h2 class=} thru {txt-medium txt-bold} thru ">"
Tried to use the | or operator but with no success. Can I use the | operator to parse the tag?

Yes, you can use | operator, but defining a charset is better in this case:
delimiter: charset [#"^"" #"'"]
single: {<h2 class='txt-medium txt-bold'>}
double: {<h2 class="txt-medium txt-bold">}
>> parse single [thru "class=" delimiter copy values to delimiter thru ">"] values
== "txt-medium txt-bold"
>> parse double [thru "class=" delimiter copy values to delimiter thru ">"] values
== "txt-medium txt-bold"
The golden rule is to avoid to and thru when possible and define what to match.

Related

How Splunk field contains double quote

When use Splunk, if we have log
key="hello"
Search in Splunk by
* | table a
we can see hello value
We might print out value with double quote, if we don't escape
key="hel"lo"
We'll see key value is hel. Value breaks before the position of quote
If try to escape double quote with \,
key="hel\"lo"
We'll see key value is hel\
Is use single quote around
key='hel"lo'
We'll see key value include single quotes, it's 'hello"lo'. In this case, search criteria should be
* key="'ab\"c'" | table a
single quotes are parts of value
Question is how to include double quote as part of value?
Ideally, there should be a way to escape double quotes, input
key="hel\"lo"
should match the query
key="hel\"lo"
But it's not.
I have this problem for many years. Splunk value is dynamic, it could contain double quotes. I'm not going to use JSON my log format.
I'm curious why there is no answer in Splunk's official website.
Someone can help? Thanks.
| makeresults
| eval bub="hell\"o"
| table bub
Puts a double-quote mark right in the middle of the bub field
If you want to search for the double-quote mark, use | where match() like this:
| where match(bub,"\"")
Ideally, the data source would not generate events with embedded quotes without escaping them. Otherwise, how would a reader know the quote is embedded and not mismatched? This is the problem Splunk is struggling with.
The fix is to create your own parser using transforms.
In props.conf:
[mysourcetype]
TRANSFORMS-parseKey = parse_key
In transforms.conf:
[parse_key]
REGEX = (\w+)="(.*\".*)"
FORMAT = $1::$2
Of course, this regex is simplified. You'll need to modify it to match your data.

Extract Strings Using AWS Athena or PrestoDB Regex Function

I have a table named
logs
and column named
url
whose value is like this
https://api.abc.com:443/xyz/live/dashboard_critical/v1?cust_id=-1111%7C-1111%7C-1111%7C-1111%7C-1111%7C-1111%7C-1111%7C-1111%7C-1111%7C-1111%7C-1111&model_id=&startdate=2021-06-29&enddate=2021-07-28&wcombo=&site_id=&instance=&unit=&combo=&source=&priority=&verification=&critical=Yes%7Ctrue&percentile=95&startevent=receiveddatetime&endevent=uploadtohostdatetime
I trying to extract all strings after '/' OR '?' OR '%7C' OR '&'
I am able to extract it individually but not all 4 together
SELECT regexp_extract_all(url, '\d+[/]*')
FROM logs.url
WHERE request_verb='GET'
AND REGEXP_LIKE(url, 'https://api.abc.com:443/xyz/live/');
I am looking for output like below
[https:,api.abc.com:443,xyz,live,dashboard_critical,v1,cust_id=-1111,-1111,-1111,-1111,-1111,-1111,-1111,-1111,-1111,-1111,-1111,model_id=,startdate=2021-06-29,enddate=2021-07-28,wcombo=,site_id=,instance=,unit=,combo=,source=,priority=,verification=,critical=Yes,true,percentile=95,startevent=receiveddatetime,endevent=uploadtohostdatetime]
You can use regexp_split(str, regexp) function, as a regexp pattern concatenate all values by wich string should be splitted using | (OR in regexp), it will produce array required. Note: some characters have special meaning in Presto CLI or regexp and need shielding.
select regexp_split(url,'/+|\?|%%7C|&')
Regexp meaning:
/+ - one or more slash
| - OR
\? - literally ?, needs shielding by backslash because ? has special meaning in regexp
| - OR
%%7C - literally %7C. % has special meaning in Presto CLI and presto is using double-character notation for shielding.
| - OR
& - literally &
After you got an ARRAY, you can concatenate it or explode, etc

REGEXP_REPLACE URL BIGQUERY

I have two types of URL's which I would need to clean, they look like this:
["//xxx.com/se/something?SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"]
["//www.xxx.com/se/car?p_color_car=White?SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"]
The outcome I want is;
SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"
I want to remove the brackets and everything up to SE, the URLS differ so I want to remove:
First URL
["//xxx.com/se/something?
Second URL:
["//www.xxx.com/se/car?p_color_car=White?
I can't get my head around it,I've tried this .*\/ . But it will still keep strings I don't want such as:
(1 url) =
something?
(2 url) car?p_color_car=White?
You can use
regexp_replace(FinalUrls, r'.*\?|"\]$', '')
See the regex demo
Details
.*\? - any zero or more chars other than line breakchars, as many as possible and then ? char
| - or
"\]$ - a "] substring at the end of the string.
Mind the regexp_replace syntax, you can't omit the replacement argument, see reference:
REGEXP_REPLACE(value, regexp, replacement)
Returns a STRING where all substrings of value that match regular
expression regexp are replaced with replacement.
You can use backslashed-escaped digits (\1 to \9) within the
replacement argument to insert text matching the corresponding
parenthesized group in the regexp pattern. Use \0 to refer to the
entire matching text.

Printing Unnecessary escape character [duplicate]

I tried many ways to get a single backslash from an executed (I don't mean an input from html).
I can get special characters as tab, new line and many others then escape them to \\t or \\n or \\(someother character) but I cannot get a single backslash when a non-special character is next to it.
I don't want something like:
str = "\apple"; // I want this, to return:
console.log(str); // \apple
and if I try to get character at 0 then I get a instead of \.
(See ES2015 update at the end of the answer.)
You've tagged your question both string and regex.
In JavaScript, the backslash has special meaning both in string literals and in regular expressions. If you want an actual backslash in the string or regex, you have to write two: \\.
The following string starts with one backslash, the first one you see in the literal is an escape character starting an escape sequence. The \\ escape sequence tells the parser to put a single backslash in the string:
var str = "\\I have one backslash";
The following regular expression will match a single backslash (not two); again, the first one you see in the literal is an escape character starting an escape sequence. The \\ escape sequence tells the parser to put a single backslash character in the regular expression pattern:
var rex = /\\/;
If you're using a string to create a regular expression (rather than using a regular expression literal as I did above), note that you're dealing with two levels: The string level, and the regular expression level. So to create a regular expression using a string that matches a single backslash, you end up using four:
// Matches *one* backslash
var rex = new RegExp("\\\\");
That's because first, you're writing a string literal, but you want to actually put backslashes in the resulting string, so you do that with \\ for each one backslash you want. But your regex also requires two \\ for every one real backslash you want, and so it needs to see two backslashes in the string. Hence, a total of four. This is one of the reasons I avoid using new RegExp(string) whenver I can; I get confused easily. :-)
ES2015 and ES2018 update
Fast-forward to 2015, and as Dolphin_Wood points out the new ES2015 standard gives us template literals, tag functions, and the String.raw function:
// Yes, this unlikely-looking syntax is actually valid ES2015
let str = String.raw`\apple`;
str ends up having the characters \, a, p, p, l, and e in it. Just be careful there are no ${ in your template literal, since ${ starts a substitution in a template literal. E.g.:
let foo = "bar";
let str = String.raw`\apple${foo}`;
...ends up being \applebar.
Try String.raw method:
str = String.raw`\apple` // "\apple"
Reference here: String.raw()
\ is an escape character, when followed by a non-special character it doesn't become a literal \. Instead, you have to double it \\.
console.log("\apple"); //-> "apple"
console.log("\\apple"); //-> "\apple"
There is no way to get the original, raw string definition or create a literal string without escape characters.
please try the below one it works for me and I'm getting the output with backslash
String sss="dfsdf\\dfds";
System.out.println(sss);

what characters should be escaped in sql string parameters

I need a complete list of characters that should be escaped in sql string parameters to prevent exceptions. I assume that I need to replace all the offending characters with the escaped version before I pass it to my ObjectDataSource filter parameter.
No, the ObjectDataSource will handle all the escaping for you. Any parametrized query will also require no escaping.
As others have pointed out, in 99% of the cases where someone thinks they need to ask this question, they are doing it wrong. Parameterization is the way to go. If you really need to escape yourself, try to find out if your DB access library offers a function for this (for example, MySQL has mysql_real_escape_string).
SQL Books online:
Search for String Literals:
String Literals
A string literal consists of zero or more characters surrounded by quotation marks. If a string contains quotation marks, these must be escaped in order for the expression to parse. Any two-byte character except \x0000 is permitted in a string, because the \x0000 character is the null terminator of a string.
Strings can include other characters that require an escape sequence. The following table lists escape sequences for string literals.
\a
Alert
\b
Backspace
\f
Form feed
\n
New line
\r
Carriage return
\t
Horizontal tab
\v
Vertical tab
\"
Quotation mark
\
Backslash
\xhhhh
Unicode character in hexadecimal notation
Here's a way I used to get rid of apostrophes. You could do the same thing with other offending characters that you run into. (example in VB.Net)
Dim companyFilter = Trim(Me.ddCompany.SelectedValue)
If (Me.ddCompany.SelectedIndex > 0) Then
filterString += String.Format("LegalName like '{0}'", companyFilter.Replace("'", "''"))
End If
Me.objectDataSource.FilterExpression = filterString
Me.displayGrid.DataBind()