How to cut out part of a string and insert it into another string in VB.NET?

I have an unknown number of strings (assume 3):
<li><a href="a.html">A</a></li>
<li><a href="b.html">B</a></li>
<li><a href="c.html">C</a></li>
I want to cut out a.html, b.html and c.html. Then, given a string MyLink = http://ccc.com/, I want to put them into the following structure:
randomlinks[0]="http://ccc.com/a.html"
randomlinks[1]="http://ccc.com/b.html"
randomlinks[2]="http://ccc.com/c.html"
What functions in VB.NET allow me to do that?

If your strings always have this exact format, String.Split is your friend.
myString.Split(""""c)
with myString containing the first of your strings will yield a three-element array with the following entries:
<li><a href=
a.html
>A</a></li>
How to proceed from there should be obvious and is left as an exercise to the reader. :-)
If the strings don't always have this exact format, an HTML parsing engine is probably the right tool.
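A minimal sketch of the split approach, written in Java rather than VB.NET (Java's `split("\"")` plays the role of `myString.Split(""""c)`). The class and method names `RandomLinks` and `buildLinks` are hypothetical, and the sketch assumes every string has exactly one quoted attribute, as in the question:

```java
import java.util.ArrayList;
import java.util.List;

public class RandomLinks {
    // Hypothetical helper: splits each <li> string on the double-quote
    // character and takes element 1 (the href value), then prefixes MyLink.
    // Assumes exactly one quoted attribute per string.
    public static List<String> buildLinks(String myLink, List<String> items) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            String[] parts = items.get(i).split("\"");
            String href = parts[1]; // e.g. "a.html"
            lines.add("randomlinks[" + i + "]=\"" + myLink + href + "\"");
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> items = List.of(
                "<li><a href=\"a.html\">A</a></li>",
                "<li><a href=\"b.html\">B</a></li>",
                "<li><a href=\"c.html\">C</a></li>");
        for (String line : buildLinks("http://ccc.com/", items)) {
            System.out.println(line); // randomlinks[0]="http://ccc.com/a.html" ...
        }
    }
}
```

Taking element 1 of the split result relies entirely on the fixed format; for anything less regular, the HTML parser route mentioned above is safer.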

Related

How to write an XPath for the following example?

For example, I have a div tag that has two attributes:
class='hello#123' text='321#he#321llo#321'
<div> class='hello#123' text='321#he#321llo#321'></div>
I want to write an XPath that uses both the class and text attributes, but the numbers may change dynamically, i.e., "hello#123" may become "345" when we reload, and "321#he#321llo#321" may become "567#he#456llo#321".
Note: I need to write the XPath as a single expression, not as separate ones.
Assuming that you have the (corrected) two-attribute HTML
<div class='hello#123' text='321#he#321llo#321'>...</div>
you can select it using the following, for example:
Using the contains() function
//div[contains(@class,'hello') and contains(@text,'#he#')]
This is quite specific and only works if "hello" is always split in the same way.
Using the translate() function to remove everything except the characters of "hello"
//div[translate(@class,'#0123456789','')='hello' and translate(@text,'#0123456789','')='hello']
This removes all # characters and digits from each attribute value and checks whether the remaining string is "hello".
By combining these two approaches, you should be able to create an XPath expression that fits your needs. The patterns you provided were not fully clear, so this may only approximate a good enough solution.
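Both expressions can be checked with the JDK's built-in XPath 1.0 engine (`javax.xml.xpath`). This is a sketch under stated assumptions: it parses the corrected snippet as XML, the class name `DivXPath` is hypothetical, and the "reloaded" attribute values are invented to show that changed digits break neither expression:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DivXPath {
    public static final String BY_CONTAINS =
            "//div[contains(@class,'hello') and contains(@text,'#he#')]";
    public static final String BY_TRANSLATE =
            "//div[translate(@class,'#0123456789','')='hello'"
            + " and translate(@text,'#0123456789','')='hello']";

    // Parses the snippet as XML and returns how many nodes the XPath selects.
    public static int count(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String original = "<div class='hello#123' text='321#he#321llo#321'>...</div>";
        // Hypothetical values after a reload: same shape, different digits.
        String reloaded = "<div class='hello#345' text='567#he#456llo#321'>...</div>";
        for (String xml : new String[] {original, reloaded}) {
            System.out.println(count(xml, BY_CONTAINS) + " " + count(xml, BY_TRANSLATE));
        }
    }
}
```

The translate() version is the stricter of the two: it ignores digits and # entirely and then requires an exact "hello".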

Rich string with cell alignment formatting using xlsxwriter

I'm writing an HTML parser that generates an XLSX file from an HTML table. The table contains colored data such as:
<td>Some <mark color="red"><b>coloured, bolded</b></mark> text</td>
During parsing, I generate an array of tokens ready for passing to write_rich_string or write_string depending on how many strings are generated by the HTML parser.
There are quite a few cases where the HTML parser generates an array of 2 strings and a format, to be written to a cell, like:
['string 1', 'string2', format]
I cannot use write_string because there is more than 1 string. But I cannot use write_rich_string either, because write_rich_string pops the format and chokes on an array of 2 strings. Passing the following data to write_rich_string does not raise any issue, which feels strange in comparison:
['string1', 'string2', 'string3', format]
Am I missing something?
A workaround could have been to join string1 and string2 and then feed that to write_string. I thought this would make the code unnecessarily complex.
Instead, I decided to add a third, user-invisible string. This is easily achievable thanks to the zero-width space (\u200b):
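string_parts = [...]  # tokens produced by the HTML parser
count = len(string_parts)
# worksheet: an xlsxwriter Worksheet (write_rich_string and
# write_string are Worksheet methods, not Workbook methods)
if count > 2:
    worksheet.write_rich_string(row, col, *string_parts)
elif count == 2:
    # Prepend an invisible zero-width space so that write_rich_string
    # receives three fragments instead of two.
    string_parts = ['\u200b'] + string_parts
    worksheet.write_rich_string(row, col, *string_parts)
elif count == 1:
    worksheet.write_string(row, col, string_parts[0])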

How to remove HTML tags from column in redshift? [duplicate]

Closed 10 years ago.
Possible Duplicate:
Regular expression to remove HTML tags
Is there an expression which will get the value between two HTML tags?
Given this:
<td class="played">0</td>
I am looking for an expression which will return 0, stripping the <td> tags.
You should not attempt to parse HTML with regex. HTML is not a regular language, so any regex you come up with will likely fail on some esoteric edge case. Please refer to the seminal answer to this question for specifics. While mostly formatted as a joke, it makes a very good point.
The following examples are Java, but the regex will be similar -- if not identical -- for other languages.
String target = someString.replaceAll("<[^>]*>", "");
This assumes that your non-HTML text does not contain any < or > characters and that your input string is well-formed.
If you know the tags are of a specific type (for example, you know the text contains only <td> tags), you could do something like this, which removes both opening and closing <td> tags:
String target = someString.replaceAll("(?i)</?td[^>]*>", "");
Edit:
Ωmega pointed out in a comment on another post that, if there are multiple tags, this squashes their contents together.
For example, if the input string were <td>Something</td><td>Another Thing</td>, then the above would result in SomethingAnother Thing.
In a situation where multiple tags are expected, we could do something like:
String target = someString.replaceAll("(?i)</?td[^>]*>", " ").replaceAll("\\s+", " ").trim();
This replaces the HTML with a single space, then collapses whitespace, and then trims any on the ends.
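The regex variants above can be exercised side by side in a small Java class. `StripTags`, `stripAll` and `stripTd` are hypothetical names, and the inputs are the examples from this answer:

```java
public class StripTags {
    // Removes every tag; assumes the text between tags contains no '<' or '>'.
    public static String stripAll(String s) {
        return s.replaceAll("<[^>]*>", "");
    }

    // Replaces opening and closing <td> tags (case-insensitive) with a space,
    // collapses runs of whitespace, and trims the ends, so adjacent cells
    // stay separated by a single space.
    public static String stripTd(String s) {
        return s.replaceAll("(?i)</?td[^>]*>", " ")
                .replaceAll("\\s+", " ")
                .trim();
    }

    public static void main(String[] args) {
        System.out.println(stripAll("<td class=\"played\">0</td>"));              // 0
        System.out.println(stripAll("<td>Something</td><td>Another Thing</td>")); // SomethingAnother Thing
        System.out.println(stripTd("<td>Something</td><td>Another Thing</td>"));  // Something Another Thing
    }
}
```

The same caveat applies as above: these patterns only hold for simple, well-formed input.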
A trivial approach would be to replace
<[^>]*>
with nothing. But depending on how ill-structured your input is that may well fail.
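You could also do it with jsoup (http://jsoup.org/):
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;
Whitelist whitelist = Whitelist.none();
String cleanStr = Jsoup.clean(yourText, whitelist);
In jsoup 1.14.1 and later, Whitelist is deprecated in favour of the equivalent Safelist class (Safelist.none()).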

XPath - not able to find an element with children text

<a>
This is <var>Me</var> and That is <var> You</var>
</a>
I can find an element "a" which contains "This is" with the following expression:
//a[contains(text(),'This is')]
But I am not able to find element "a" which contains "This is Me and That is You".
//a[contains(text(),'This is Me and That is You')]
Is there a way to find an element with children text as well?
I am not sure if this is what you need, but you can use string() to get the required result:
//a[string()='This is Me and That is You']
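The caveat, however, is that you need precise information about the string being used: string() compares the exact concatenated text, including whitespace, so with the markup above (the line breaks inside <a> and the leading space in <var> You</var>) this exact comparison would not match.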
This can also be found using XPath's normalize-space() function, which strips leading and trailing whitespace from a string, replaces sequences of whitespace characters with a single space, and returns the resulting string:
//a[normalize-space()='This is Me and That is You']
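The difference between text(), string() and normalize-space() can be checked with the JDK's `javax.xml.xpath` engine. In this sketch the class name `ChildTextXPath` is hypothetical, and the question's markup is parsed as XML with its line breaks intact:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ChildTextXPath {
    // The markup from the question, including the line breaks and the
    // leading space inside the second <var>.
    public static final String XML =
            "<a>\nThis is <var>Me</var> and That is <var> You</var>\n</a>";

    // Returns how many nodes the XPath expression selects in XML.
    public static int count(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] exprs = {
            "//a[contains(text(),'This is')]",                     // first text node only
            "//a[contains(text(),'This is Me and That is You')]",
            "//a[string()='This is Me and That is You']",          // exact, whitespace included
            "//a[normalize-space()='This is Me and That is You']",
        };
        for (String expr : exprs) {
            System.out.println(count(expr) + "  " + expr);
        }
    }
}
```

With that whitespace present, only the contains(text(),'This is') and normalize-space() expressions select the <a> element, which is why normalize-space() is the more robust choice here.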

T-SQL code for converting nvarchar string to UTF-8 (for URL percent-encoding)

I need to generate a URL string for an SSRS report (in order to link it with our CRM software). The report name is in Hebrew. When I send the URL string (with Hebrew) to Internet Explorer, it doesn't recognize the address because it isn't percent-encoded (it works fine in Firefox). Sending a URL containing only English characters works fine.
I tried to perform the encoding myself. I succeeded in converting it to a URI with Unicode (UTF-16) characters, but I need the URI in UTF-8. For example, the letter 'י' should be converted into '%d7%99' and not into '%05%D9'.
I need the conversion/encoding function for a single character; I can build the rest of the script or function for the complete string myself.
I used a script based on the master.sys.fn_varbintohexstr function, but, as I said, the results don't work in IE. The following:
SELECT master.sys.fn_varbintohexstr((CAST (N'י' AS varbinary)))
returns 0xd905, which I formatted into percent-encoding. I should get 'd7 99' instead.
To wrap up:
I convert a Hebrew character into URI percent-encoding and get a Unicode (UTF-16) result, but I want a UTF-8 result.
Input = 'י'. Current output = %d9. Wanted output = %d7%99
How can I get those results?
I have had to deal with a few similar problems, and there are two approaches you may wish to consider. The first is to transform your data into HTML in the query and then render the result as HTML in the RDL. The second is to use jQuery on the client to identify the cells with the incorrect value and then transform those cells (again, using jQuery). The benefit of the second option is that, where the server rendering already works (as in Firefox), the transformation overhead is never incurred. The downside is that it won't work if you are not rendering the report as HTML.
For the first option, in the SELECT statement you would need to alter the appropriate column to produce an nvarchar value that looks like:
<span style="font-family:yourfont;" charset="UTF-8">linkname</span>
With that string as the data, assign it to the appropriate columns (or cells, as needed).
In the RDL designer, drag a placeholder for your field onto the designer, right-click the placeholder and select Placeholder Properties; there you can choose to display the content as HTML.
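The per-character conversion the question asks for (code point to UTF-8 bytes to %xx) is plain bit arithmetic. It is sketched here in Java rather than T-SQL so it can be cross-checked against the JDK's own UTF-8 encoder. `PercentUtf8` and its methods are hypothetical names. A T-SQL port would need the same masks (bitwise & and |) with the shifts expressed as integer division by powers of two. Note that the sketch encodes every byte, including ASCII characters that could be left as-is:

```java
import java.nio.charset.StandardCharsets;

public class PercentUtf8 {
    // Percent-encodes one Unicode code point as UTF-8 using only shifts and
    // masks: 1 byte below U+0080, 2 below U+0800, 3 below U+10000, else 4.
    public static String encodeCodePoint(int cp) {
        int[] bytes;
        if (cp < 0x80) {
            bytes = new int[] {cp};
        } else if (cp < 0x800) {
            bytes = new int[] {0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)};
        } else if (cp < 0x10000) {
            bytes = new int[] {0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F),
                               0x80 | (cp & 0x3F)};
        } else {
            bytes = new int[] {0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                               0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)};
        }
        StringBuilder sb = new StringBuilder();
        for (int b : bytes) {
            sb.append(String.format("%%%02x", b)); // "%" + two lowercase hex digits
        }
        return sb.toString();
    }

    // Cross-check: the same output produced by the JDK's UTF-8 encoder.
    public static String encodeWithJdk(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%%%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String yod = "\u05D9"; // Hebrew letter yod, 'י'
        System.out.println(encodeCodePoint(yod.codePointAt(0))); // %d7%99
        System.out.println(encodeWithJdk(yod));                  // %d7%99
    }
}
```

For 'י' (U+05D9), the two-byte branch gives 0xC0 | 0x17 = 0xD7 and 0x80 | 0x19 = 0x99, which is the '%d7%99' the question expects.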