httpagility pack scraping between broken tag - vb.net

i need to scrape a p tag which has h3 tag after it but does not have a closing p tag. It looks like this :
<script ad>asdasdasd</script>
<p>Translation companies are
-----------------------
-----------------------
<h3 class="this_class">mind blown site</h3>
There is no </p> tag so i cannot parse it completely. Now i have two questions :
1) can this be parsed using httpagility xpath ?
2) i have a function to find text between two strings (getbetween). But i have a doubt - If i use "asdasdasd" and " is it always 100% that vb.net will use the script tag which is just above h3 because there are 2-3 same lines - "asdasdasd"
3) Any other method you guys are aware of ?
(had to write in code so html does not mess up)
Regards,

It might be a good idea to post some more "real" html to really help you, at least the tags between the h3 and the p.
Anyway, this should get you the p-Tag from the h3-Tag.
HtmlDocument doc = new HtmlDocument();
doc.Load(... //Load the Html...
//Either of these lines will do
HtmlNode pNode = doc.DocumentNode.SelectSingleNode("//h3[#class='this_class']/preceding-sibling::p");
//HtmlNode pNode = doc.DocumentNode.SelectSingleNode("//h3[contains(text(),'mind blown site')]/preceding-sibling::p");
string pInnerHtml = pNode.NextSibling.InnerHtml; //Has the text "Translation companies are...."

So in general, to get all the nodes from the opening p tag to the start of a tag you don't want, you could do this:
var p = doc.DocumentNode.SelectSingleNode("//p");
var h3 = p.SelectSingleNode("following-sibling::h3[#class='this_class']");
var following = new List<string>();
for (var current = p.NextSibling; current != h3; current = current.NextSibling)
{
following.Add(current.InnerText);
}
var innerText = String.Concat(following);

Related

Update attributes on existing iframe elements in nvarchar column via SQL

I've implemented code inside a ckeditor that replaces the src attribute of iframe elements with data-src and adds the data-cookieconsent attribute. In addition, a placeholder div is added after the iframe element. Regex is used to match the iframe elements in the string.
var value = CKEDITOR.instances.editor1.getData();
const regex = new RegExp('(?:<iframe[^>]*)(?:(?:\/>)|(?:>.*?<\/iframe>))');
const matches = value.match(regex);
if (typeof matches !== "undefined" && matches != null) {
matches.forEach(element => {
if (!element.includes("data-cookieconsent")) {
value = value.replace(element, element.replace("src=", "data-add-placeholder data-cookieconsent=\"marketing\" data-src="))
value = value.replace("</iframe>", "</iframe><div class=\"row justify - content - center\">" +
"<div class=\"cookieconsent-optout-marketing blocked-media-placeholder\">" +
"<div class=\"col-xs-6 col-xs-offset-3\">" +
"<h3>For at se denne video skal vi bruge dit samtykke til at anvende cookies.Venligst tryk på dette link og vfremvist denne video.</h3>" +
"</div></div></div>")
}
});
}
However, I now need to replace the currently existing iframe elements and add the placeholder div in the database using sql. I'm aware that I can use update with the replace function to update parts of a text column, but this seems to require Regex to work, which as I understand, is quite limited in T-SQL.
What would be the best approach to this problem? How can I ensure only the iframe elements in the column are altered?

AmCharts 4 : Can't customize (color, strokeWidth) my series

EDIT : OK, It was my css page which had a rule on path, 'cause I use svg a lot. Removed that rule and the problem was gone !
I'm facing something pretty annoying and which I do not understand.
I'm using amChart to make a XY chart with multiple series. Not that hard.
The thing is, I can't customize my series ! Bullets and legend are ok, but not series.
Here's a screenshot for better understanding :
MyWeirdChart (new OP can't embed images, sorry)
As you can see I have my custom bullet pushed on my series and my legend is exactly what I want for my chart BUT series are staying unchanged.
Here is my JS draw function :
function drawChart(dateArray, casesArray, deathsArray, healedArray, hospitalizationsArray, reanimationsArray) {
am4core.useTheme(am4themes_animated);
var chart = am4core.create("chartdiv", am4charts.XYChart);
chart.data = generateChartData(dateArray, casesArray, deathsArray, healedArray, hospitalizationsArray, reanimationsArray);
var dateAxis = chart.xAxes.push(new am4charts.DateAxis());
var valueAxis = chart.yAxes.push(new am4charts.ValueAxis());
function pushSeries(field, name, color) {
let series = chart.series.push(new am4charts.LineSeries());
series.dataFields.valueY = field;
series.dataFields.dateX = "date";
series.name = name;
series.tooltipText = name + ": [b]{valueY}[/]";
series.stroke = am4core.color(color);
series.strokeWidth = 3;
series.fill = am4core.color(color);
series.fillOpacity = 0.5;
let bullet = series.bullets.push(new am4charts.CircleBullet());
bullet.circle.stroke = am4core.color(color);
bullet.circle.strokeWidth = 2;
bullet.circle.fill = am4core.color(color);
bullet.circle.fillOpacity = 0.5;
bullet.circle.radius = 3;
}
pushSeries("cases", "Cas confirmés", "#32B3E3");
pushSeries("healed", "Guéris", "#00C750");
pushSeries("hospitalizations", "Hospitalisations", "#FFBB33");
pushSeries("reanimations", "Réanimations", "#FE3446");
pushSeries("deaths", "Morts", "black");
chart.cursor = new am4charts.XYCursor();
chart.scrollbarX = new am4core.Scrollbar();
chart.legend = new am4charts.Legend();
chart.cursor.maxTooltipDistance = 0;
}
Did I miss something ? I crawled forums and documentations and I'm now helpless.
My code is in my webpack app.js file. But I include amCharts with HTML scripts,
<script src="https://www.amcharts.com/lib/4/core.js"></script>
<script src="https://www.amcharts.com/lib/4/charts.js"></script>
<script src="https://www.amcharts.com/lib/4/themes/animated.js"></script>
not with webpack import. But I guess that if this was the problem, I would not be able to draw a chart at all.
OK, It was my css page which had a rule on path, 'cause I use svg a lot. Removed that rule and the problem was gone !

How to add data-attributes to a Bootstrap Treeview?

I am using bootstrap-treeview 1.2.0 (from Jon Miles).
My goal is to add custom data-attributes to my list items' markup, e.g.
<li class="list-group-item node-tree" data-id="100" data-type="user" ...>
I tried to follow these instructions see here and here is part of my JSON:
[{"text":"Root","icon":null,"data-id":1,"data-type":"branch","nodes":[{"text":"Steve","icon":null,"data-id":17, "data-type":"user","nodes":...
To me the JSON looks good. But none of my data-attributes gets rendered in the markup.
Any ideas?
Sorry, I see it's too late. I searched about this and couldn't find anything. But you can
change bootstrap-treeview.js file a bit. There is some attribute set code in buildTree function. It's looking like this:
Tree.prototype.buildTree = function (nodes, level) {
if (!nodes) return;
level += 1;
var _this = this;
$.each(nodes, function addNodes(id, node) {
var treeItem = $(_this.template.item)
.addClass('node-' + _this.elementId)
.addClass(node.state.checked ? 'node-checked' : '')
.addClass(node.state.disabled ? 'node-disabled': '')
.addClass(node.state.selected ? 'node-selected' : '')
.addClass(node.searchResult ? 'search-result' : '')
.attr('data-nodeid', node.nodeId)
.attr('style', _this.buildStyleOverride(node));
......................SOME CODES ........SOME CODES..........................}
You can add :
.attr('data-type', node.dataType)
.attr('data-id', node.dataId)
And then change the object(json) like this:
[{"text":"Root","icon":null,"dataId":1,"dataType":"branch","nodes":
[{"text":"Steve","icon":null,"dataId":17, "dataType":"user","nodes":...
The link provided simply instructs how to expand the node object with additional properties. There is no correlation between a node's properties and the attributes assigned in HTML.
You might want to do something like this:
var allNodes = $('#tree').treeview('getNodes);
$(allNodes).each(function(index, element) {
$(this.$el[0]).attr('data-attribute-name', this.data-attribute-name);
});

The ':' character, hexadecimal value 0x3A, cannot be included in a name

I saw this question already, but I didnt see an answer..
So I get this error:
The ':' character, hexadecimal value 0x3A, cannot be included in a name.
On this code:
XDocument XMLFeed = XDocument.Load("http://feeds.foxnews.com/foxnews/most-popular?format=xml");
XNamespace content = "http://purl.org/rss/1.0/modules/content/";
var feeds = from feed in XMLFeed.Descendants("item")
select new
{
Title = feed.Element("title").Value,
Link = feed.Element("link").Value,
pubDate = feed.Element("pubDate").Value,
Description = feed.Element("description").Value,
MediaContent = feed.Element(content + "encoded")
};
foreach (var f in feeds.Reverse())
{
....
}
An item looks like that:
<rss>
<channel>
....items....
<item>
<title>Pentagon confirms plan to create new spy agency</title>
<link>http://feeds.foxnews.com/~r/foxnews/most-popular/~3/lVUZwCdjVsc/</link>
<category>politics</category>
<dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/" />
<pubDate>Tue, 24 Apr 2012 12:44:51 PDT</pubDate>
<guid isPermaLink="false">http://www.foxnews.com/politics/2012/04/24/pentagon-confirms-plan-to-create-new-spy-agency/</guid>
<content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[|http://global.fncstatic.com/static/managed/img/Politics/panetta_hearing_030712.jpg<img src="http://feeds.feedburner.com/~r/foxnews/most-popular/~4/lVUZwCdjVsc" height="1" width="1"/>]]></content:encoded>
<description>The Pentagon confirmed Tuesday that it is carving out a brand new spy agency expected to include several hundred officers focused on intelligence gathering around the world.&amp;#160;</description>
<dc:date xmlns:dc="http://purl.org/dc/elements/1.1/">2012-04-4T19:44:51Z</dc:date>
<feedburner:origLink>http://www.foxnews.com/politics/2012/04/24/pentagon-confirms-plan-to-create-new-spy-agency/</feedburner:origLink>
</item>
....items....
</channel>
</rss>
All I want is to get the "http://global.fncstatic.com/static/managed/img/Politics/panetta_hearing_030712.jpg", and before that check if content:encoded exists..
Thanks.
EDIT:
I've found a sample that I can show and edit the code that tries to handle it..
EDIT2:
I've done it in the ugly way:
text.Replace("content:encoded", "contentt").Replace("xmlns:content=\"http://purl.org/rss/1.0/modules/content/\"","");
and then get the element in the normal way:
MediaContent = feed.Element("contentt").Value
The following code
static void Main(string[] args)
{
var XMLFeed = XDocument.Parse(
#"<rss>
<channel>
....items....
<item>
<title>Pentagon confirms plan to create new spy agency</title>
<link>http://feeds.foxnews.com/~r/foxnews/most-popular/~3/lVUZwCdjVsc/</link>
<category>politics</category>
<dc:creator xmlns:dc='http://purl.org/dc/elements/1.1/' />
<pubDate>Tue, 24 Apr 2012 12:44:51 PDT</pubDate>
<guid isPermaLink='false'>http://www.foxnews.com/politics/2012/04/24/pentagon-confirms-plan-to-create-new-spy-agency/</guid>
<content:encoded xmlns:content='http://purl.org/rss/1.0/modules/content/'><![CDATA[|http://global.fncstatic.com/static/managed/img/Politics/panetta_hearing_030712.jpg<img src='http://feeds.feedburner.com/~r/foxnews/most-popular/~4/lVUZwCdjVsc' height='1' width='1'/>]]></content:encoded>
<description>The Pentagon confirmed Tuesday that it is carving out a brand new spy agency expected to include several hundred officers focused on intelligence gathering around the world.&amp;#160;</description>
<dc:date xmlns:dc='http://purl.org/dc/elements/1.1/'>2012-04-4T19:44:51Z</dc:date>
<!-- <feedburner:origLink>http://www.foxnews.com/politics/2012/04/24/pentagon-confirms-plan-to-create-new-spy-agency/</feedburner:origLink> -->
</item>
....items....
</channel>
</rss>");
XNamespace contentNs = "http://purl.org/rss/1.0/modules/content/";
var feeds = from feed in XMLFeed.Descendants("item")
select new
{
Title = (string)feed.Element("title"),
Link = (string)feed.Element("link"),
pubDate = (string)feed.Element("pubDate"),
Description = (string)feed.Element("description"),
MediaContent = GetMediaContent((string)feed.Element(contentNs + "encoded"))
};
foreach(var item in feeds)
{
Console.WriteLine(item);
}
}
private static string GetMediaContent(string content)
{
int imgStartPos = content.IndexOf("<img");
if(imgStartPos > 0)
{
int startPos = content[0] == '|' ? 1 : 0;
return content.Substring(startPos, imgStartPos - startPos);
}
return string.Empty;
}
results in:
{ Title = Pentagon confirms plan to create new spy agency, Link = http://feeds.f
oxnews.com/~r/foxnews/most-popular/~3/lVUZwCdjVsc/, pubDate = Tue, 24 Apr 2012 1
2:44:51 PDT, Description = The Pentagon confirmed Tuesday that it is carving out
a brand new spy agency expected to include several hundred officers focused on
intelligence gathering around the world. , MediaContent = http://global
.fncstatic.com/static/managed/img/Politics/panetta_hearing_030712.jpg }
Press any key to continue . . .
A few points:
You never want to treat Xml as text - in your case you removed the namespace declaration but actually if the namespace was declared inline (i.e. without binding to the prefix) or a different prefix would be defined your code would not work even though semantically both documents would be equivalent
Unless you know what's inside CDATA and how to treat it you always want to treat is as text. If you know it's something else you can treat it differently after parsing - see my elaborate on CDATA below for more details
To avoid NullReferenceExceptions if the element is missing I used explicit conversion operator (string) instead of invoking .Value
the Xml you posted was not a valid xml - there was missing namespace Uri for feedburner prefix
This is no longer related to the problem but may be helpful for some folks so I am leaving it
As far as the contents of the encode element is considered it is inside CDATA section. What's inside CDATA section is not an Xml but plain text. CDATA is usually used to not have to encode '<', '>', '&' characters (without CDATA they would have to be encoded as < > and & to not break the Xml document itself) but the Xml processor treat characters in the CDATA as if they were encoded (or to be more correct in encodes them). The CDATA is convenient if you want to embed html because textually the embedded content looks like the original yet it won't break your xml if the html is not a well-formed Xml. Since the CDATA content is not an Xml but text it is not possible to treat it as Xml. You will probably need to treat is as text and use for instance regular expressions. If you know it is a valid Xml you can load the contents to an XElement again and process it. In your case you have got mixed content so it is not easy to do unless you use a little dirty hack. Everything would be easy if you have just one top level element instead of mixed content. The hack is to add the element to avoid all the hassle. Inside the foreach look you can do something like this:
var mediaContentXml = XElement.Parse("<content>" + (string)item.MediaContent + "</content>");
Console.WriteLine((string)mediaContentXml.Element("img").Attribute("src"));
Again it's not pretty and it is a hack but it will work if the content of the encoded element is valid Xml. The more correct way of doing this is to us XmlReader with ConformanceLevel set to Fragment and recognize all kinds of nodes appropriately to create a corresponding Linq to Xml node.
You should use XNamespace:
XNamespace content = "...";
// later in your code ...
MediaContent = feed.Element(content + "encoded")
See more details here.
(Of course, you the string to be assigned to content is the same as in xmlns:content="...").

Content templates rendering in TYPO3

I've got a strange problem connected with content rendering.
I use following code to grab the content:
lib.otherContent = CONTENT
lib.otherContent {
table = tt_content
select {
pidInList = this
orderBy = sorting
where = colPos=0
languageField = sys_language_uid
}
renderObj = COA
renderObj {
10 = TEXT
10.field = header
10.wrap = <h2>|</h2>
20 = TEXT
20.field = bodytext
20.wrap = <div class="article">|</div>
}
}
and everything works fine, except that I'd like to use also predefined column-content templates other than simple text (Text with image, Images only, Bullet list etc.).
The question is: with what I have to replace renderObj = COA and the rest between the brackets to let the TYPO3 display it properly?
Thanks,
I.
The available cObjects are more or less listed in TSRef, chapter 8.
TypoScript for rendering Text w/image can be found in typo3/sysext/css_styled_content/static/v4.3/setup.txt at line 724, and in the neighborhood you'll find e.g. bullets (below) and image (above), which is referenced in textpic line 731. Variants of this is what you'll write in your renderObj.
You will find more details in the file typo3/sysext/cms/tslib/class.tslib_content.php, where e.g. text w/image is found at or around line 897 and is called IMGTEXT (do a case-sensitive search). See also around line 403 in typo3/sysext/css_styled_content/pi1/class.cssstyledcontent_pi1.php, where the newer css-based rendering takes place.