The ':' character, hexadecimal value 0x3A, cannot be included in a name - .net-4.0

I saw this question already, but I didnt see an answer..
So I get this error:
The ':' character, hexadecimal value 0x3A, cannot be included in a name.
On this code:
XDocument XMLFeed = XDocument.Load("http://feeds.foxnews.com/foxnews/most-popular?format=xml");
XNamespace content = "http://purl.org/rss/1.0/modules/content/";
var feeds = from feed in XMLFeed.Descendants("item")
select new
{
Title = feed.Element("title").Value,
Link = feed.Element("link").Value,
pubDate = feed.Element("pubDate").Value,
Description = feed.Element("description").Value,
MediaContent = feed.Element(content + "encoded")
};
foreach (var f in feeds.Reverse())
{
....
}
An item looks like that:
<rss>
<channel>
....items....
<item>
<title>Pentagon confirms plan to create new spy agency</title>
<link>http://feeds.foxnews.com/~r/foxnews/most-popular/~3/lVUZwCdjVsc/</link>
<category>politics</category>
<dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/" />
<pubDate>Tue, 24 Apr 2012 12:44:51 PDT</pubDate>
<guid isPermaLink="false">http://www.foxnews.com/politics/2012/04/24/pentagon-confirms-plan-to-create-new-spy-agency/</guid>
<content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[|http://global.fncstatic.com/static/managed/img/Politics/panetta_hearing_030712.jpg<img src="http://feeds.feedburner.com/~r/foxnews/most-popular/~4/lVUZwCdjVsc" height="1" width="1"/>]]></content:encoded>
<description>The Pentagon confirmed Tuesday that it is carving out a brand new spy agency expected to include several hundred officers focused on intelligence gathering around the world.&amp;#160;</description>
<dc:date xmlns:dc="http://purl.org/dc/elements/1.1/">2012-04-4T19:44:51Z</dc:date>
<feedburner:origLink>http://www.foxnews.com/politics/2012/04/24/pentagon-confirms-plan-to-create-new-spy-agency/</feedburner:origLink>
</item>
....items....
</channel>
</rss>
All I want is to get the "http://global.fncstatic.com/static/managed/img/Politics/panetta_hearing_030712.jpg", and before that check if content:encoded exists..
Thanks.
EDIT:
I've found a sample that I can show and edit the code that tries to handle it..
EDIT2:
I've done it in the ugly way:
text.Replace("content:encoded", "contentt").Replace("xmlns:content=\"http://purl.org/rss/1.0/modules/content/\"","");
and then get the element in the normal way:
MediaContent = feed.Element("contentt").Value

The following code
static void Main(string[] args)
{
var XMLFeed = XDocument.Parse(
#"<rss>
<channel>
....items....
<item>
<title>Pentagon confirms plan to create new spy agency</title>
<link>http://feeds.foxnews.com/~r/foxnews/most-popular/~3/lVUZwCdjVsc/</link>
<category>politics</category>
<dc:creator xmlns:dc='http://purl.org/dc/elements/1.1/' />
<pubDate>Tue, 24 Apr 2012 12:44:51 PDT</pubDate>
<guid isPermaLink='false'>http://www.foxnews.com/politics/2012/04/24/pentagon-confirms-plan-to-create-new-spy-agency/</guid>
<content:encoded xmlns:content='http://purl.org/rss/1.0/modules/content/'><![CDATA[|http://global.fncstatic.com/static/managed/img/Politics/panetta_hearing_030712.jpg<img src='http://feeds.feedburner.com/~r/foxnews/most-popular/~4/lVUZwCdjVsc' height='1' width='1'/>]]></content:encoded>
<description>The Pentagon confirmed Tuesday that it is carving out a brand new spy agency expected to include several hundred officers focused on intelligence gathering around the world.&amp;#160;</description>
<dc:date xmlns:dc='http://purl.org/dc/elements/1.1/'>2012-04-4T19:44:51Z</dc:date>
<!-- <feedburner:origLink>http://www.foxnews.com/politics/2012/04/24/pentagon-confirms-plan-to-create-new-spy-agency/</feedburner:origLink> -->
</item>
....items....
</channel>
</rss>");
XNamespace contentNs = "http://purl.org/rss/1.0/modules/content/";
var feeds = from feed in XMLFeed.Descendants("item")
select new
{
Title = (string)feed.Element("title"),
Link = (string)feed.Element("link"),
pubDate = (string)feed.Element("pubDate"),
Description = (string)feed.Element("description"),
MediaContent = GetMediaContent((string)feed.Element(contentNs + "encoded"))
};
foreach(var item in feeds)
{
Console.WriteLine(item);
}
}
private static string GetMediaContent(string content)
{
int imgStartPos = content.IndexOf("<img");
if(imgStartPos > 0)
{
int startPos = content[0] == '|' ? 1 : 0;
return content.Substring(startPos, imgStartPos - startPos);
}
return string.Empty;
}
results in:
{ Title = Pentagon confirms plan to create new spy agency, Link = http://feeds.f
oxnews.com/~r/foxnews/most-popular/~3/lVUZwCdjVsc/, pubDate = Tue, 24 Apr 2012 1
2:44:51 PDT, Description = The Pentagon confirmed Tuesday that it is carving out
a brand new spy agency expected to include several hundred officers focused on
intelligence gathering around the world. , MediaContent = http://global
.fncstatic.com/static/managed/img/Politics/panetta_hearing_030712.jpg }
Press any key to continue . . .
A few points:
You never want to treat Xml as text - in your case you removed the namespace declaration but actually if the namespace was declared inline (i.e. without binding to the prefix) or a different prefix would be defined your code would not work even though semantically both documents would be equivalent
Unless you know what's inside CDATA and how to treat it you always want to treat is as text. If you know it's something else you can treat it differently after parsing - see my elaborate on CDATA below for more details
To avoid NullReferenceExceptions if the element is missing I used explicit conversion operator (string) instead of invoking .Value
the Xml you posted was not a valid xml - there was missing namespace Uri for feedburner prefix
This is no longer related to the problem but may be helpful for some folks so I am leaving it
As far as the contents of the encode element is considered it is inside CDATA section. What's inside CDATA section is not an Xml but plain text. CDATA is usually used to not have to encode '<', '>', '&' characters (without CDATA they would have to be encoded as < > and & to not break the Xml document itself) but the Xml processor treat characters in the CDATA as if they were encoded (or to be more correct in encodes them). The CDATA is convenient if you want to embed html because textually the embedded content looks like the original yet it won't break your xml if the html is not a well-formed Xml. Since the CDATA content is not an Xml but text it is not possible to treat it as Xml. You will probably need to treat is as text and use for instance regular expressions. If you know it is a valid Xml you can load the contents to an XElement again and process it. In your case you have got mixed content so it is not easy to do unless you use a little dirty hack. Everything would be easy if you have just one top level element instead of mixed content. The hack is to add the element to avoid all the hassle. Inside the foreach look you can do something like this:
var mediaContentXml = XElement.Parse("<content>" + (string)item.MediaContent + "</content>");
Console.WriteLine((string)mediaContentXml.Element("img").Attribute("src"));
Again it's not pretty and it is a hack but it will work if the content of the encoded element is valid Xml. The more correct way of doing this is to us XmlReader with ConformanceLevel set to Fragment and recognize all kinds of nodes appropriately to create a corresponding Linq to Xml node.

You should use XNamespace:
XNamespace content = "...";
// later in your code ...
MediaContent = feed.Element(content + "encoded")
See more details here.
(Of course, you the string to be assigned to content is the same as in xmlns:content="...").

Related

Update attributes on existing iframe elements in nvarchar column via SQL

I've implemented code inside a ckeditor that replaces the src attribute of iframe elements with data-src and adds the data-cookieconsent attribute. In addition, a placeholder div is added after the iframe element. Regex is used to match the iframe elements in the string.
var value = CKEDITOR.instances.editor1.getData();
const regex = new RegExp('(?:<iframe[^>]*)(?:(?:\/>)|(?:>.*?<\/iframe>))');
const matches = value.match(regex);
if (typeof matches !== "undefined" && matches != null) {
matches.forEach(element => {
if (!element.includes("data-cookieconsent")) {
value = value.replace(element, element.replace("src=", "data-add-placeholder data-cookieconsent=\"marketing\" data-src="))
value = value.replace("</iframe>", "</iframe><div class=\"row justify - content - center\">" +
"<div class=\"cookieconsent-optout-marketing blocked-media-placeholder\">" +
"<div class=\"col-xs-6 col-xs-offset-3\">" +
"<h3>For at se denne video skal vi bruge dit samtykke til at anvende cookies.Venligst tryk på dette link og vfremvist denne video.</h3>" +
"</div></div></div>")
}
});
}
However, I now need to replace the currently existing iframe elements and add the placeholder div in the database using sql. I'm aware that I can use update with the replace function to update parts of a text column, but this seems to require Regex to work, which as I understand, is quite limited in T-SQL.
What would be the best approach to this problem? How can I ensure only the iframe elements in the column are altered?

VTD-XML creating an XML document

I am a bit confused on how to do this, all of the docs / examples show how to read and edit xml docs but there doesn't seem to be any clear way of creating an xml from scratch, I would rather not have to ship my program with a dummy xml file in order to edit one. any ideas? thanks.
What you could do instead would be to just hard-code an empty document like this:
byte[] emptyDoc = "<?xml version='1.0' encoding='UTF-8'?><root></root>".getBytes("UTF-8");
And then use that to create your VTDGen and XMLModifier, and start adding elements:
VTDGen vg = new VTDGen();
vg.setDoc(emptyDoc);
vg.parse(true);
VTDNav vn = vg.getNav();
XMLModifier xm = new XMLModifier(vn);
// Cursor is at root, update Root Element Name
xm.updateElementName("employee");
xm.insertAttribute(" id='6'");
xm.insertAfterHead("<name>Bob Smith</name>");
vn = xm.outputAndReparse();
// etc...

httpagility pack scraping between broken tag

i need to scrape a p tag which has h3 tag after it but does not have a closing p tag. It looks like this :
<script ad>asdasdasd</script>
<p>Translation companies are
-----------------------
-----------------------
<h3 class="this_class">mind blown site</h3>
There is no </p> tag so i cannot parse it completely. Now i have two questions :
1) can this be parsed using httpagility xpath ?
2) i have a function to find text between two strings (getbetween). But i have a doubt - If i use "asdasdasd" and " is it always 100% that vb.net will use the script tag which is just above h3 because there are 2-3 same lines - "asdasdasd"
3) Any other method you guys are aware of ?
(had to write in code so html does not mess up)
Regards,
It might be a good idea to post some more "real" html to really help you, at least the tags between the h3 and the p.
Anyway, this should get you the p-Tag from the h3-Tag.
HtmlDocument doc = new HtmlDocument();
doc.Load(... //Load the Html...
//Either of these lines will do
HtmlNode pNode = doc.DocumentNode.SelectSingleNode("//h3[#class='this_class']/preceding-sibling::p");
//HtmlNode pNode = doc.DocumentNode.SelectSingleNode("//h3[contains(text(),'mind blown site')]/preceding-sibling::p");
string pInnerHtml = pNode.NextSibling.InnerHtml; //Has the text "Translation companies are...."
So in general, to get all the nodes from the opening p tag to the start of a tag you don't want, you could do this:
var p = doc.DocumentNode.SelectSingleNode("//p");
var h3 = p.SelectSingleNode("following-sibling::h3[#class='this_class']");
var following = new List<string>();
for (var current = p.NextSibling; current != h3; current = current.NextSibling)
{
following.Add(current.InnerText);
}
var innerText = String.Concat(following);

VTD-XML Exception: Name space qualification Exception: prefixed attribute not qualified

I receive an XML via a web service and I am using legacy code (which uses dom4j) to perform some xml transformation. Loading/parsing the original XML into VTD-XML (VTDGen) works fine, no exceptions thrown. However, after loading the xml into dom4j, I noticed some of the element namespace declarations and attributes are re-arranged. Apparently, this re-arrangement causes VTD-XML to throw the following exception:
Exception:
Name space qualification Exception: prefixed attribute not qualified
Line Number: 101 Offset: 1827
Here is the element at this line number in the original XML:
<RR_PerformanceSite:PerformanceSite_1_4 RR_PerformanceSite:FormVersion="1.4" xmlns:NSF_ApplicationChecklist="http://apply.grants.gov/forms/NSF_ApplicationChecklist-V1.1" xmlns:NSF_CoverPage="http://apply.grants.gov/forms/NSF_CoverPage-V1.1" xmlns:NSF_DeviationAuthorization="http://apply.grants.gov/forms/NSF_DeviationAuthorization-V1.1" xmlns:NSF_Registration="http://apply.grants.gov/forms/NSF_Registration-V1.1" xmlns:NSF_SuggestedReviewers="http://apply.grants.gov/forms/NSF_SuggestedReviewers-V1.1" xmlns:PHS398_CareerDevelopmentAwardSup="http://apply.grants.gov/forms/PHS398_CareerDevelopmentAwardSup_1_1-V1.1" xmlns:PHS398_Checklist="http://apply.grants.gov/forms/PHS398_Checklist_1_3-V1.3" xmlns:PHS398_CoverPageSupplement="http://apply.grants.gov/forms/PHS398_CoverPageSupplement_1_4-V1.4" xmlns:PHS398_ModularBudget="http://apply.grants.gov/forms/PHS398_ModularBudget-V1.1" xmlns:PHS398_ResearchPlan="http://apply.grants.gov/forms/PHS398_ResearchPlan_1_3-V1.3" xmlns:PHS_CoverLetter="http://apply.grants.gov/forms/PHS_CoverLetter_1_2-V1.2" xmlns:RR_Budget="http://apply.grants.gov/forms/RR_Budget-V1.1" xmlns:RR_KeyPersonExpanded="http://apply.grants.gov/forms/RR_KeyPersonExpanded_1_2-V1.2" xmlns:RR_OtherProjectInfo="http://apply.grants.gov/forms/RR_OtherProjectInfo_1_2-V1.2" xmlns:RR_PerformanceSite="http://apply.grants.gov/forms/PerformanceSite_1_4-V1.4" xmlns:RR_PersonalData="http://apply.grants.gov/forms/RR_PersonalData-V1.1" xmlns:RR_SF424="http://apply.grants.gov/forms/RR_SF424_1_2-V1.2" xmlns:RR_SubawardBudget="http://apply.grants.gov/forms/RR_SubawardBudget-V1.2" xmlns:SF424C="http://apply.grants.gov/forms/SF424C-V1.0" xmlns:att="http://apply.grants.gov/system/Attachments-V1.0" xmlns:codes="http://apply.grants.gov/system/UniversalCodes-V2.0" xmlns:globlib="http://apply.grants.gov/system/GlobalLibrary-V2.0">
Here is the same element after loaded into dom4j:
<RR_PerformanceSite:PerformanceSite_1_4 xmlns:RR_PerformanceSite="http://apply.grants.gov/forms/PerformanceSite_1_4-V1.4" xmlns:NSF_ApplicationChecklist="http://apply.grants.gov/forms/NSF_ApplicationChecklist-V1.1" xmlns:NSF_CoverPage="http://apply.grants.gov/forms/NSF_CoverPage-V1.1" xmlns:NSF_DeviationAuthorization="http://apply.grants.gov/forms/NSF_DeviationAuthorization-V1.1" xmlns:NSF_Registration="http://apply.grants.gov/forms/NSF_Registration-V1.1" xmlns:NSF_SuggestedReviewers="http://apply.grants.gov/forms/NSF_SuggestedReviewers-V1.1" xmlns:PHS398_CareerDevelopmentAwardSup="http://apply.grants.gov/forms/PHS398_CareerDevelopmentAwardSup_1_1-V1.1" xmlns:PHS398_Checklist="http://apply.grants.gov/forms/PHS398_Checklist_1_3-V1.3" xmlns:PHS398_CoverPageSupplement="http://apply.grants.gov/forms/PHS398_CoverPageSupplement_1_4-V1.4" xmlns:PHS398_ModularBudget="http://apply.grants.gov/forms/PHS398_ModularBudget-V1.1" xmlns:PHS398_ResearchPlan="http://apply.grants.gov/forms/PHS398_ResearchPlan_1_3-V1.3" xmlns:PHS_CoverLetter="http://apply.grants.gov/forms/PHS_CoverLetter_1_2-V1.2" xmlns:RR_Budget="http://apply.grants.gov/forms/RR_Budget-V1.1" xmlns:RR_KeyPersonExpanded="http://apply.grants.gov/forms/RR_KeyPersonExpanded_1_2-V1.2" xmlns:RR_OtherProjectInfo="http://apply.grants.gov/forms/RR_OtherProjectInfo_1_2-V1.2" xmlns:RR_PersonalData="http://apply.grants.gov/forms/RR_PersonalData-V1.1" xmlns:RR_SF424="http://apply.grants.gov/forms/RR_SF424_1_2-V1.2" xmlns:RR_SubawardBudget="http://apply.grants.gov/forms/RR_SubawardBudget-V1.2" xmlns:SF424C="http://apply.grants.gov/forms/SF424C-V1.0" xmlns:att="http://apply.grants.gov/system/Attachments-V1.0" xmlns:codes="http://apply.grants.gov/system/UniversalCodes-V2.0" xmlns:globlib="http://apply.grants.gov/system/GlobalLibrary-V2.0" RR_PerformanceSite:FormVersion="1.4">
The problem is regarding the attribute (at offset 1827, at the end of the element) in the new XML element: RR_PerformanceSite:FormVersion="1.4"
Here is what removes the exception:
1. Adding the RR_PerformanceSite xmlns declaration for this element to the root element of the XML doc.
2. Replacing new element with original element. This SEEMS to lead me to believe that the order of the attributes/ns declarations affects VTD when parsing.
NOTE: I parse the xml doc setting ns aware to 'true' with both xml docs (original and post-dom4j xml). Also, new VTD objects are created for each xml, original and post-dom4j.
I tried to put 'RR_PerformanceSite:FormVersion="1.4"' at the beginning of the element like the original but that does not remove the exception. The offset in the error message is different due to the change of location of the attribute. Does the order of the xmlns declarations affect VTD?
I have looked at the VTDGen source code and cannot figure out why this exception is being thrown.
Why would dom4j parse the new doc and vtd is unable to? Can anyone can shed some light on this?
It appears to be a bug on VTD-XML, related with namespace declaration order.
Always reproducible using the following Java code
public class SchemaTester {
/**
* #param args
*/
public static void main(String[] args) throws Exception {
String bad = "C:/Temp/VTD_bad.xml"; // XML files to test
String good = "C:/Temp/VTD_good.xml";
StringBuilder sb = new StringBuilder();
char[] buf = new char[4*1024];
FileReader fr = new FileReader(bad);
int readed = 0;
while ((readed = fr.read(buf, 0, buf.length)) != -1) {
sb.append(buf, 0, readed);
}
fr.close();
String x = sb.toString();
//instantiate VTDGen
//and call parse
VTDGen vg = new VTDGen();
vg.setDoc(x.getBytes("UTF-8"));
vg.parse(true); // set namespace awareness to true
VTDNav vn = vg.getNav();
AutoPilot ap = new AutoPilot (vn);
ap.selectXPath("//*/#*");
int i= -1;
while((i=ap.evalXPath()) != -1) {
// i will be attr name, i+1 will be attribute value
System.out.println("\t\tAttribute ==> " + vn.toNormalizedString(i));
System.out.println("\t\tValue ==> " + vn.toNormalizedString(i+1));
}
}
}
The OP has uploaded the XML to https://gist.github.com/2696220

Content templates rendering in TYPO3

I've got a strange problem connected with content rendering.
I use following code to grab the content:
lib.otherContent = CONTENT
lib.otherContent {
table = tt_content
select {
pidInList = this
orderBy = sorting
where = colPos=0
languageField = sys_language_uid
}
renderObj = COA
renderObj {
10 = TEXT
10.field = header
10.wrap = <h2>|</h2>
20 = TEXT
20.field = bodytext
20.wrap = <div class="article">|</div>
}
}
and everything works fine, except that I'd like to use also predefined column-content templates other than simple text (Text with image, Images only, Bullet list etc.).
The question is: with what I have to replace renderObj = COA and the rest between the brackets to let the TYPO3 display it properly?
Thanks,
I.
The available cObjects are more or less listed in TSRef, chapter 8.
TypoScript for rendering Text w/image can be found in typo3/sysext/css_styled_content/static/v4.3/setup.txt at line 724, and in the neighborhood you'll find e.g. bullets (below) and image (above), which is referenced in textpic line 731. Variants of this is what you'll write in your renderObj.
You will find more details in the file typo3/sysext/cms/tslib/class.tslib_content.php, where e.g. text w/image is found at or around line 897 and is called IMGTEXT (do a case-sensitive search). See also around line 403 in typo3/sysext/css_styled_content/pi1/class.cssstyledcontent_pi1.php, where the newer css-based rendering takes place.