How can I convert XHTML nested list to pdf with iText?

I have XHTML content, and I have to create a PDF file from this content on the fly. I use the iText PDF converter.
I tried the simple way, but I always get a bad result after calling the XMLWorkerHelper parser.
XHTML:
<ul>
<li>First
<ol>
<li>Second</li>
<li>Second</li>
</ol>
</li>
<li>First</li>
</ul>
The expected result:
First
    Second
    Second
First
PDF result:
First Second Second
First
In the result there is no nested list. I need a solution for calling the parser, and not creating an iText Document instance.
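To make the expected layout concrete, here is a small sketch that uses only the JDK DOM parser, not iText. It prints each list item indented by its nesting depth, which is the structure the PDF should preserve. The class name NestedListOutline is hypothetical. Requires Java 11+ for String.repeat.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class NestedListOutline {

    // Returns one line per <li>, indented by two spaces per level of list nesting.
    public static List<String> outline(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        List<String> lines = new ArrayList<>();
        walk(doc.getDocumentElement(), -1, lines);
        return lines;
    }

    private static void walk(Node node, int depth, List<String> lines) {
        String name = node.getNodeName();
        int childDepth = depth;
        if (name.equals("ul") || name.equals("ol")) {
            childDepth = depth + 1; // entering a (possibly nested) list
        } else if (name.equals("li")) {
            lines.add("  ".repeat(depth) + ownText(node));
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            walk(c, childDepth, lines);
        }
    }

    // Text placed directly inside the element, ignoring the text of nested elements.
    private static String ownText(Node node) {
        StringBuilder sb = new StringBuilder();
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.TEXT_NODE) {
                sb.append(c.getNodeValue());
            }
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<ul><li>First<ol><li>Second</li><li>Second</li></ol></li><li>First</li></ul>";
        outline(xhtml).forEach(System.out::println);
    }
}
```

The two "Second" items come out indented one level deeper than "First". The flattened PDF output shown above loses exactly this information.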

Please take a look at the example NestedListHtml.
In this example, I take your code snippet, saved as list.html:
<ul>
<li>First
<ol>
<li>Second</li>
<li>Second</li>
</ol>
</li>
<li>First</li>
</ul>
And I parse it into an ElementList:
// CSS
CSSResolver cssResolver =
XMLWorkerHelper.getInstance().getDefaultCssResolver(true);
// HTML
HtmlPipelineContext htmlContext = new HtmlPipelineContext(null);
htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
htmlContext.autoBookmark(false);
// Pipelines
ElementList elements = new ElementList();
ElementHandlerPipeline end = new ElementHandlerPipeline(elements, null);
HtmlPipeline html = new HtmlPipeline(htmlContext, end);
CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);
// XML Worker
XMLWorker worker = new XMLWorker(css, true);
XMLParser p = new XMLParser(worker);
p.parse(new FileInputStream(HTML));
Now I can add this list to the Document:
for (Element e : elements) {
document.add(e);
}
Or I can add this list to a Paragraph:
Paragraph para = new Paragraph();
for (Element e : elements) {
para.add(e);
}
document.add(para);
You will get the desired result, as shown in nested_list.pdf.
You cannot add nested lists to a PdfPCell or to a ColumnText. For instance, this will not work:
PdfPTable table = new PdfPTable(2);
table.addCell("Nested lists don't work in a cell");
PdfPCell cell = new PdfPCell();
for (Element e : elements) {
cell.addElement(e);
}
table.addCell(cell);
document.add(table);
This is due to a limitation in the ColumnText class that has been there for many years. We have evaluated the problem, and the only way to fix this would be to rewrite ColumnText entirely. This is not an item on our current technical road map.

Here's a workaround for nested ordered and unordered lists.
The rich text editor I am using gives li tags a class attribute such as "ql-indent-1", "ql-indent-2", and so on. Based on that attribute, the code below adds the ul/ol starting and ending tags.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.jsoup.Jsoup;
import org.jsoup.select.Elements;

// Wraps list items carrying a "ql-indent-N" class in N extra <ul>/<ol> tags.
public String replaceIndentSubList(String htmlContent) {
    org.jsoup.nodes.Document document = Jsoup.parseBodyFragment(htmlContent);
    Elements ulElements = document.select("ul");
    Elements olElements = document.select("ol");
    if (!ulElements.isEmpty()) {
        htmlContent = replaceIndents(htmlContent, ulElements, "ul");
    }
    if (!olElements.isEmpty()) {
        htmlContent = replaceIndents(htmlContent, olElements, "ol");
    }
    return htmlContent;
}
public String replaceIndents(String htmlContent, Elements element, String tagType) {
    String attributeKey = "class";
    String indentPrefix = "ql-indent-";
    // First and last <li> (serialized HTML) seen for each indent class
    Map<String, String> startingLiTagMap = new HashMap<String, String>();
    Map<String, String> lastLiTagMap = new HashMap<String, String>();
    Pattern regex = Pattern.compile("ql-indent-\\d+");
    Set<String> indentClasses = new HashSet<String>();
    Elements liElements = element.select("li");
    for (org.jsoup.nodes.Element li : liElements) {
        org.jsoup.nodes.Attributes att = li.attributes();
        if (att.hasKey(attributeKey)) {
            Matcher matcher = regex.matcher(att.get(attributeKey));
            if (matcher.find()) {
                // Key by the indent class itself, so extra classes on the <li> don't break the lookups below
                String indentClass = matcher.group(0);
                String liHtml = li.toString();
                if (!startingLiTagMap.containsKey(indentClass)) {
                    startingLiTagMap.put(indentClass, liHtml);
                }
                indentClasses.add(indentClass);
                if (!startingLiTagMap.get(indentClass).equalsIgnoreCase(liHtml)) {
                    lastLiTagMap.put(indentClass, liHtml);
                }
            }
        }
    }
    for (String indentClass : indentClasses) {
        int indentLevel = Integer.parseInt(indentClass.substring(indentPrefix.length()));
        // One opening and one closing list tag per indent level
        StringBuilder startingTags = new StringBuilder();
        StringBuilder endingTags = new StringBuilder();
        for (int i = 0; i < indentLevel; i++) {
            startingTags.append("<").append(tagType).append(">");
            endingTags.append("</").append(tagType).append(">");
        }
        String firstLi = startingLiTagMap.get(indentClass);
        String lastLi = lastLiTagMap.get(indentClass);
        htmlContent = htmlContent.replace(firstLi, startingTags + firstLi);
        // Close after the last item of this level, or after the first one if it is the only item
        String closeAfter = (lastLi != null) ? lastLi : firstLi;
        htmlContent = htmlContent.replace(closeAfter, closeAfter + endingTags);
    }
    return htmlContent;
}
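The workaround above edits the serialized HTML with String.replace. That call replaces every occurrence, so two list items with identical markup can both be affected. A different technique is to walk the flat items in document order and track the current depth, emitting properly nested tags as you go. The sketch below assumes the items have already been extracted (for example with jsoup) as hypothetical (indent, text) pairs, where indent is the N from "ql-indent-N" (0 when absent). It also assumes text is already HTML-escaped. Java 16+ is needed for the record.

```java
import java.util.List;

public class IndentToNested {

    // One flat list item as emitted by the editor: its indent level (0 = none) and its text.
    public record Item(int indent, String text) {}

    // Builds nested <tag> lists; a nested list is always placed inside an <li> of its parent.
    public static String nest(List<Item> items, String tag) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        StringBuilder out = new StringBuilder(open);
        int level = 0;          // depth of the list currently being written into
        boolean liOpen = false; // whether an <li> is open at the current depth
        for (Item item : items) {
            int target = Math.max(0, item.indent());
            if (target > level) {
                // Descend one list per level, opening a placeholder <li> when none is open
                while (level < target) {
                    if (!liOpen) out.append("<li>");
                    out.append(open);
                    level++;
                    liOpen = false;
                }
            } else {
                if (liOpen) out.append("</li>");
                // Ascend: close each nested list and the <li> that contains it
                while (level > target) {
                    out.append(close).append("</li>");
                    level--;
                }
            }
            out.append("<li>").append(item.text());
            liOpen = true;
        }
        if (liOpen) out.append("</li>");
        while (level > 0) {
            out.append(close).append("</li>");
            level--;
        }
        return out.append(close).toString();
    }
}
```

For items First(0), Second(1), Second(1), First(0) this yields `<ul><li>First<ul><li>Second</li><li>Second</li></ul></li><li>First</li></ul>`, which is the same nested shape as the XHTML in the question.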

Related

Syntax Highlighting for go in vb.net

OK, so I have been making a simple code editor for Go in VB.NET (for personal use).
I tried this code:
Dim tokens As String = "(break|default|func|interface|select|case|defer|go|map|struct|chan|else|goto|package|switch|const|fallthrough|if|range|type|continue|for|import|return|var)"
Dim rex As New Regex(tokens)
Dim mc As MatchCollection = rex.Matches(TextBox2.Text)
Dim StartCursorPosition As Integer = TextBox2.SelectionStart
For Each m As Match In mc
Dim startIndex As Integer = m.Index
Dim StopIndex As Integer = m.Length
TextBox2.[Select](startIndex, StopIndex)
TextBox2.SelectionColor = Color.FromArgb(0, 122, 204)
TextBox2.SelectionStart = StartCursorPosition
TextBox2.SelectionColor = Color.RebeccaPurple
Next
But I couldn't handle things like print statements. Say I want to highlight fmt.Println("Hello World"): that is not possible. Can anyone help me?
I want a simple solution that does proper syntax highlighting without the glitching text colors this code currently produces.

Here's code showing how to extend the highlighting to strings and numbers.
You would need to tweak it further to support syntax like comments, etc.
private Regex BuildExpression()
{
string[] exprs = {
"(break|default|func|interface|select|case|defer|go|map|struct|chan|else|goto|package|switch|const|fallthrough|if|range|type|continue|for|import|return|var)",
@"([0-9]+\.[0-9]*(e|E)(\+|\-)?[0-9]+)|([0-9]+\.[0-9]*)|([0-9]+)",
"(\"\")|\"((((\\\\\")|(\"\")|[^\"])*\")|(((\\\\\")|(\"\")|[^\"])*))"
};
StringBuilder sb = new StringBuilder();
for (int i = 0; i < exprs.Length; i++)
{
string expr = exprs[i];
if ((expr != null) && (expr != string.Empty))
sb.Append(string.Format("(?<{0}>{1})", "_" + i.ToString(), expr) + "|");
}
if (sb.Length > 0)
sb.Remove(sb.Length - 1, 1);
RegexOptions options = RegexOptions.ExplicitCapture | RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline | RegexOptions.Compiled | RegexOptions.IgnoreCase;
return new Regex(sb.ToString(), options);
}
private void HighlightSyntax()
{
var colors = new Dictionary<int, Color>();
var expression = BuildExpression();
Color[] clrs = { Color.Teal, Color.Red, Color.Blue };
int[] intarray = expression.GetGroupNumbers();
foreach (int i in intarray)
{
var name = expression.GroupNameFromNumber(i);
if ((name != null) && (name.Length > 0) && (name[0] == '_'))
{
var idx = int.Parse(name.Substring(1));
if (idx < clrs.Length)
colors.Add(i, clrs[idx]);
}
}
foreach (Match match in expression.Matches(richTextBox1.Text))
{
int index = match.Index;
int length = match.Length;
richTextBox1.Select(index, length);
for (int i = 0; i < match.Groups.Count; i++)
{
if (match.Groups[i].Success)
{
if (colors.ContainsKey(i))
{
richTextBox1.SelectionColor = colors[i];
break;
}
}
}
}
}
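The same idea (one alternation with a named group per token kind, then checking which group matched) can be sketched in Java with java.util.regex. This sketch returns spans instead of coloring a control. In the RichTextBox code above, each span would map to one Select/SelectionColor call. The differences from the C# version are deliberate assumptions. Keywords are wrapped in \b word boundaries, so "go" no longer matches inside "goto" or identifiers. The pattern is case-sensitive, because Go keywords are. Group names must start with a letter in Java. GoHighlighter, Span and classify are hypothetical names. Requires Java 16+ for the record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GoHighlighter {

    // A classified span: where the token starts, how long it is, and what kind it is.
    public record Span(int start, int length, String kind) {}

    // Each rule's name doubles as its regex group name.
    private static final String[][] RULES = {
        {"string", "\"(?:\\\\.|[^\"\\\\])*\""},
        {"keyword", "\\b(?:break|default|func|interface|select|case|defer|go|map|struct|chan|else|goto"
                + "|package|switch|const|fallthrough|if|range|type|continue|for|import|return|var)\\b"},
        {"number", "\\b\\d+(?:\\.\\d*)?(?:[eE][+-]?\\d+)?\\b"},
    };

    private static final Pattern PATTERN = build();

    // Joins all rules into one alternation: (?<string>...)|(?<keyword>...)|(?<number>...)
    private static Pattern build() {
        StringBuilder sb = new StringBuilder();
        for (String[] rule : RULES) {
            if (sb.length() > 0) sb.append('|');
            sb.append("(?<").append(rule[0]).append('>').append(rule[1]).append(')');
        }
        return Pattern.compile(sb.toString());
    }

    // Scans left to right; a string match consumes any keywords inside it.
    public static List<Span> classify(String code) {
        List<Span> spans = new ArrayList<>();
        Matcher m = PATTERN.matcher(code);
        while (m.find()) {
            for (String[] rule : RULES) {
                if (m.group(rule[0]) != null) {
                    spans.add(new Span(m.start(), m.end() - m.start(), rule[0]));
                    break;
                }
            }
        }
        return spans;
    }
}
```

For `func main() { fmt.Println("Hello go") }` this reports `func` as a keyword and the whole `"Hello go"` literal as a string. It reports nothing for `fmt.Println`, because the sketch has no rule for identifiers. Comments, raw strings and similar constructs would need further rules.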
What we found while developing our code editor libraries is that regular-expression-based parsers are hard to adapt to fully support advanced syntax such as contextual keywords (LINQ) or interpolated strings.
You might find a bit more information here:
https://www.alternetsoft.com/blog/code-parsing-explained
The most accurate syntax highlighting for VB.NET can be implemented using the Microsoft.CodeAnalysis API. It's the same API used internally by the Visual Studio text editor.
Below is sample code showing how to get classified spans for VB.NET code. Every span contains a start/end position within the text and a classification type, e.g. keyword, string, etc. These spans can then be used to highlight text inside a textbox.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Classification;
using Microsoft.CodeAnalysis.Host.Mef;
using Microsoft.CodeAnalysis.Text;
public class VBClassifier
{
private Workspace workspace;
private static string FileContent = @"
Public Sub Run()
Dim test as TestClass = new TestClass()
End Sub";
public void Classify()
{
var project = InitProject();
var doc = AddDocument(project, "file1.vb", FileContent);
var spans = Classify(doc);
}
protected IEnumerable<ClassifiedSpan> Classify(Document document)
{
var text = document.GetTextAsync().Result;
var span = new TextSpan(0, text.Length);
return Classifier.GetClassifiedSpansAsync(document, span).Result;
}
protected Document AddDocument(Project project, string fileName, string code)
{
var documentId = DocumentId.CreateNewId(project.Id, fileName);
ApplySolutionChanges(s => s.AddDocument(documentId, fileName, code, filePath: fileName));
return workspace.CurrentSolution.GetDocument(documentId);
}
protected virtual void ApplySolutionChanges(Func<Solution, Solution> action)
{
var solution = workspace.CurrentSolution;
solution = action(solution);
workspace.TryApplyChanges(solution);
}
protected MefHostServices GetRoslynCompositionHost()
{
IEnumerable<Assembly> assemblies = MefHostServices.DefaultAssemblies;
var compositionHost = MefHostServices.Create(assemblies);
return compositionHost;
}
protected Project CreateDefaultProject()
{
var solution = workspace.CurrentSolution;
var projectId = ProjectId.CreateNewId();
var projectName = "VBTest";
ProjectInfo projectInfo = ProjectInfo.Create(
projectId,
VersionStamp.Default,
projectName,
projectName,
LanguageNames.VisualBasic,
filePath: null);
ApplySolutionChanges(s => s.AddProject(projectInfo));
return workspace.CurrentSolution.Projects.FirstOrDefault();
}
protected Project InitProject()
{
var host = GetRoslynCompositionHost();
workspace = new AdhocWorkspace(host);
return CreateDefaultProject();
}
}
Update:
Here's a Visual Studio project demonstrating both approaches:
https://drive.google.com/file/d/1LLuzy7yDFAE-v40I7EswECYQSthxheEf/view?usp=sharing

How to write a tag-helper for alphabetical paging

I came across the following article https://www.mikesdotnetting.com/article/256/entity-framework-recipe-alphabetical-paging-in-asp-net-mvc describing how to generate paging links from the data instead of the alphabet in an ASP.NET application.
The solution shown there is based on HTML helpers.
How can I implement this feature using tag helpers instead?
I'm using ASP.NET Core 1.1.
The code I'm referring to is:
public static class HtmlHelpers
{
public static HtmlString AlphabeticalPager(this HtmlHelper html, string selectedLetter, IEnumerable<string> firstLetters, Func<string, string> pageLink)
{
var sb = new StringBuilder();
var numbers = Enumerable.Range(0, 10).Select(i => i.ToString());
var alphabet = Enumerable.Range(65, 26).Select(i => ((char)i).ToString()).ToList();
alphabet.Insert(0, "All");
alphabet.Insert(1, "0-9");
var ul = new TagBuilder("ul");
ul.AddCssClass("pagination");
ul.AddCssClass("alpha");
foreach (var letter in alphabet)
{
var li = new TagBuilder("li");
if (firstLetters.Contains(letter) || (firstLetters.Intersect(numbers).Any() && letter == "0-9") || letter == "All")
{
if (selectedLetter == letter || selectedLetter.IsEmpty() && letter == "All")
{
li.AddCssClass("active");
var span = new TagBuilder("span");
span.SetInnerText(letter);
li.InnerHtml = span.ToString();
}
else
{
var a = new TagBuilder("a");
a.MergeAttribute("href", pageLink(letter));
a.InnerHtml = letter;
li.InnerHtml = a.ToString();
}
}
else
{
li.AddCssClass("inactive");
var span = new TagBuilder("span");
span.SetInnerText(letter);
li.InnerHtml = span.ToString();
}
sb.Append(li.ToString());
}
ul.InnerHtml = sb.ToString();
return new HtmlString(ul.ToString());
}
}
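Whichever rendering mechanism is used (HTML helper or tag helper), the per-letter decision made by the helper above stays the same. Isolating that decision makes the port easier. A tag helper's Process method would then only need to render each entry as an `<li>` containing a `<span>` or an `<a>`. Here is that decision as a small, framework-independent Java sketch. AlphaPager, Entry and State are hypothetical names. Treating a null selectedLetter like an empty one is an assumption. Requires Java 16+ for the record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class AlphaPager {

    // How a pager entry is rendered: current page (span), clickable link, or greyed out.
    public enum State { ACTIVE, LINK, INACTIVE }

    public record Entry(String label, State state) {}

    // Same decision as the C# helper: "All", "0-9", then A-Z; an entry is clickable only if data exists for it.
    public static List<Entry> entries(String selectedLetter, Set<String> firstLetters) {
        List<String> labels = new ArrayList<>(List.of("All", "0-9"));
        for (char c = 'A'; c <= 'Z'; c++) {
            labels.add(String.valueOf(c));
        }
        // Equivalent of firstLetters.Intersect(numbers).Any()
        boolean hasDigits = false;
        for (int d = 0; d <= 9; d++) {
            if (firstLetters.contains(Integer.toString(d))) {
                hasDigits = true;
            }
        }
        boolean noSelection = selectedLetter == null || selectedLetter.isEmpty();
        List<Entry> entries = new ArrayList<>();
        for (String label : labels) {
            boolean available = firstLetters.contains(label)
                    || (hasDigits && label.equals("0-9"))
                    || label.equals("All");
            boolean selected = label.equals(selectedLetter) || (noSelection && label.equals("All"));
            State state = !available ? State.INACTIVE : (selected ? State.ACTIVE : State.LINK);
            entries.add(new Entry(label, state));
        }
        return entries;
    }
}
```

With no selection and first letters {"B", "3"}, "All" is active, "0-9" and "B" are links, and every other letter is inactive.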
Any idea how to proceed?

Obtaining Lucene term vectors for a found term in a string

I am trying to highlight terms in a string. My code searches along a string and looks for equivalent terms in an index. The code returns the found terms fine. However, I would like to return to the user the original string they entered, with the found terms highlighted. I am using Lucene 4 because that is the version used in the book I am learning Lucene from. I have a pitiful attempt to get term vectors and such, but it iterates through the entire field, and I can't figure out how to get just the found terms. Here is my code:
public class TokenArrayTest {
private static final String INDEX_DIR = "C:/ontologies/Lucene/icnpIndex";
//private static List<Float> levScore = new ArrayList<Float>();
//add key and value pairs of tokens to a map to send to a servlet. key 10,11,12 etc
//private static HashMap<Integer, String> hashMap = new HashMap<Integer, String>();
private static List<String> tokens = new ArrayList<String>();
private static int totalResults=0;
public static void main(String[] pArgs) throws IOException, ParseException, InvalidTokenOffsetsException
{
//counters which detect found term changes to advance the html table to the next cell
int b=1;
int c=1;
String searchText="Mrs. smith has limited mobility and fell out of bed. She needs a feeding assessment. She complained of abdominal pains nuring the night. She woke with a headache and she is due for a shower this morning.";
//Get directory reference
Directory dir = FSDirectory.open(new File(INDEX_DIR));
//Index reader - an interface for accessing a point-in-time view of a lucene index
IndexReader reader = DirectoryReader.open(dir);
//Create lucene searcher. It search over a single IndexReader.
IndexSearcher searcher = new IndexSearcher(reader);
//analyzer with the default stop words
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
TokenStream tokenStream = analyzer.tokenStream(null, new StringReader(searchText));
CharTermAttribute termAttribute = tokenStream.getAttribute(CharTermAttribute.class);
//Query parser to be used for creating TermQuery
QueryParser qp = new QueryParser(Version.LUCENE_40, "Preferred Term", analyzer);
/*add all of the words to an array after they have passed through the analyzer.
* The words are used one by one through the query method later on.
*/
while (tokenStream.incrementToken()) {
tokens.add(termAttribute.toString());
}
//print the top half of the html page
System.out.print("<html>\r\n" +
"\r\n" +
"<head>\r\n" +
"<meta http-equiv=\"Content-Type\" content=\"text/html; charset=windows-1252\">\r\n" +
"\r\n" +
"<title>ICNP results</title>\r\n" +
"</head>\r\n" +
"\r\n" +
"<body>\r\n" +
"\r\n" +
"<p>"+
searchText+"<br>"+
"<p>"+
"<div align=\"center\">\r\n" +
" <center>\r\n" +
" <table border=\"1\">\r\n" +
" <tr>\r\n" +
"<td>\r\n"+
"");
//place each word from the previous array into the query
for(int n=0;n<tokens.size();++n) {
//Create the query
Query query = qp.parse(tokens.get(n));
//Search the lucene documents for the hits
TopDocs hits = searcher.search(query, 20);
//Total found documents
totalResults =totalResults+hits.totalHits;
//print out the score for each searched term
//for (ScoreDoc sd : hits.scoreDocs)
//{
//Document d = searcher.doc(sd.doc);
// System.out.println("Score : " + sd.score);
// }
/** Highlighter Code Start ****/
//Put a html code in here for each found term if need be
Formatter formatter = new SimpleHTMLFormatter("", "");
//Scores text fragments by the number of unique query terms found
QueryScorer scorer = new QueryScorer(query);
//used to markup highlighted terms found in the best sections of a text
Highlighter highlighter = new Highlighter(formatter, scorer);
//It breaks text up into same-size texts but does not split up spans
Fragmenter fragmenter = new SimpleSpanFragmenter(scorer, 20);
//set fragmenter to highlighter
highlighter.setTextFragmenter(fragmenter);
//Iterate over found results
for (int i = 0; i < hits.scoreDocs.length; i++)
{
int docid = hits.scoreDocs[i].doc;
Document doc = searcher.doc(docid);
//Get stored text from found document
String text = doc.get("Preferred Term");
//a pitiful attempt to get term vectors and such like
Terms termsVector = reader.getTermVector(docid, "Preferred Term");
TermsEnum termsEnum = termsVector.iterator(null);
DocsAndPositionsEnum docsAndPositionsEnum = null;
BytesRef term;
while ( (term = termsEnum.next()) != null ) {
String val = term.utf8ToString();
System.out.println("DocId: " + docid);
System.out.println(" term: " + val);
System.out.println(" length: " + term.length);
docsAndPositionsEnum = termsEnum.docsAndPositions(null, docsAndPositionsEnum);
if (docsAndPositionsEnum.nextDoc() >= 0) {
int freq = docsAndPositionsEnum.freq();
System.out.println(" freq: " + docsAndPositionsEnum.freq());
for (int j = 0; j < freq; j++) {
System.out.println(" [");
System.out.println(" position: " + docsAndPositionsEnum.nextPosition());
System.out.println(" offset start: " + docsAndPositionsEnum.startOffset());
System.out.println(" offset end: " + docsAndPositionsEnum.endOffset());
System.out.println(" ]");
}
}
}
//Create token stream
TokenStream stream = TokenSources.getAnyTokenStream(reader, docid, "Preferred Term", analyzer);
//Get highlighted text fragments
String[] frags = highlighter.getBestFragments(stream, text,20);
for (String frag : frags)
{
//On the first pass print this html out
if((c==1)&&(b!=c)) {
System.out.println("<select>");
c=b;
}else if((b!=c)) { //and every other time move to the next cell when b changes
System.out.println("</select>"
+ "</td><td>"
+ "<select>");
c=b;
}
System.out.println("<option value='"+frag+"'>"+frag+"</option>");
}
}
b=b+1;
}
dir.close();
b=1;
c=1;
totalResults=0;
//print the bottom half of the html page
System.out.print("</select></td>\r\n" +
" </tr>\r\n" +
" </table>\r\n" +
" </center>\r\n" +
"</div>\r\n" +
"\r\n" +
"</body>\r\n" +
"\r\n" +
"</html>\r\n" +
"");
}
}

I don't know whether this is possible with Lucene v4, but with newer versions it's easily done with a Highlighter or a UnifiedHighlighter.
There are several tutorials that achieve text highlighting in different ways (just google them):
Lucene Search Highlight Example
Lucene UnifiedHighlighter Example
Lucene Highlighter Tutorial with Example
If you are starting a new project, I would strongly suggest using the most recent version, even if your book is based on Lucene v4. The book is good for getting a basic understanding of how Lucene works, but using an old version of the library is instant technical debt that you will have to deal with later on. In addition, a newer version usually provides additional features that may be interesting for you.

For future readers, here is my plain old Java method (POJM) that prints out offsets.
generatePreviewText( analyzer, searchText, tokens, frags );
public static void generatePreviewText(Analyzer analyzer, String inputText, List<String> tokens, String[] frags) throws IOException
{
String contents[]= {inputText};
String[] foundTerms = frags;
//for(int n=0;n<frags.length;++n) {
//System.out.println("Found terms array= "+foundTerms[n]);
// }
Directory directory = new RAMDirectory();
IndexWriterConfig config =
new IndexWriterConfig(Version.LUCENE_40, analyzer);
IndexWriter indexWriter = new IndexWriter(directory, config);
FieldType textFieldType = new FieldType();
textFieldType.setIndexed(true);
textFieldType.setTokenized(true);
textFieldType.setStored(true);
textFieldType.setStoreTermVectors(true);
textFieldType.setStoreTermVectorPositions(true);
textFieldType.setStoreTermVectorOffsets(true);
Document doc = new Document();
Field textField = new Field("content", "", textFieldType);
for (String content : contents) {
textField.setStringValue(content);
doc.removeField("content");
doc.add(textField);
indexWriter.addDocument(doc);
}
indexWriter.commit();
IndexReader indexReader = DirectoryReader.open(directory);
DocsAndPositionsEnum docsAndPositionsEnum = null;
Terms termsVector = null;
TermsEnum termsEnum = null;
BytesRef term = null;
String val = null;
for (int i = 0; i < indexReader.maxDoc(); i++) {
termsVector = indexReader.getTermVector(i, "content");
termsEnum = termsVector.iterator(termsEnum);
while ( (term = termsEnum.next()) != null ) {
val = term.utf8ToString();
// if(foundTerms.get(i)==val) {
System.out.println(" term: " + val);
System.out.println(" length: " + term.length);
docsAndPositionsEnum = termsEnum.docsAndPositions(null, docsAndPositionsEnum);
if (docsAndPositionsEnum.nextDoc() >= 0) {
int freq = docsAndPositionsEnum.freq();
System.out.println(" freq: " + docsAndPositionsEnum.freq());
for (int j = 0; j < freq; j++) {
System.out.println(" [");
System.out.println(" position: " + docsAndPositionsEnum.nextPosition());
System.out.println(" offset start: " + docsAndPositionsEnum.startOffset());
System.out.println(" offset end: " + docsAndPositionsEnum.endOffset());
System.out.println(" ]");
}
}
//}
}
}
indexReader.close();
indexWriter.close();
}
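Once you have start/end offsets like the ones printed above, highlighting the user's original string comes down to splicing markup in at those positions. The sketch below does only that step. It uses plain Java, not a Lucene API. It assumes each offset pair is a [start, end) character range into the same original input string. OffsetHighlighter and highlight are hypothetical names, and the text is not HTML-escaped.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class OffsetHighlighter {

    // Wraps each [start, end) character range of text in pre/post markup.
    public static String highlight(String text, List<int[]> offsets, String pre, String post) {
        List<int[]> sorted = new ArrayList<>(offsets);
        sorted.sort(Comparator.comparingInt(r -> r[0]));
        StringBuilder out = new StringBuilder();
        int last = 0; // end of the previously copied range
        for (int[] r : sorted) {
            if (r[0] < last) continue; // skip a range that overlaps the previous one
            out.append(text, last, r[0])  // untouched text before the match
               .append(pre)
               .append(text, r[0], r[1])  // the matched term itself
               .append(post);
            last = r[1];
        }
        out.append(text.substring(last));
        return out.toString();
    }
}
```

For example, highlighting the range [16, 24) in "She woke with a headache" with `<b>`/`</b>` gives `She woke with a <b>headache</b>`.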

How to fill PDF form with specific font using pdfbox?

I am trying to fill a PDF form, and I am able to fill it using the following approach with the PDFBox library.
val pdf: PDDocument = PDDocument.load(file)
pdf.setAllSecurityToBeRemoved(true)
val docCatalog: PDDocumentCatalog = pdf.getDocumentCatalog
val acroForm: PDAcroForm = docCatalog.getAcroForm
def populateFields(inputJson: String, targetPdfPath: String): Unit = {
val valueMap: Array[Field] = gson.fromJson(inputJson, classOf[Array[Field]])
valueMap.foreach((field) => {
val pdField: PDField = acroForm.getField(field.name)
if (pdField != null) {
pdField.setValue(field.value)
} else {
println(s"No field with name ${field.name}")
}
})
pdf.save(targetPdfPath)
pdf.close()
}
The only problem is that I don't see any option to set the font before filling the PDF. Can you help me here?

You can achieve it by using these methods (note that you have to use PDFBox 1.8.15, not the newer 2.0).
// Set the field with custom font.
private void setField(String name, String value, String fontName) throws IOException {
PDDocumentCatalog docCatalog;
PDAcroForm acroForm;
PDField field;
COSDictionary dict;
COSString defaultAppearance;
docCatalog = pdfTemplate.getDocumentCatalog();
acroForm = docCatalog.getAcroForm();
field = acroForm.getField(name);
dict = (field).getDictionary();
defaultAppearance = (COSString) dict.getDictionaryObject(COSName.DA);
if (defaultAppearance != null)
{
dict.setString(COSName.DA, "/" + fontName + " 10 Tf 0 g");
if (name.equalsIgnoreCase("Field1")) {
dict.setString(COSName.DA, "/" + fontName + " 12 Tf 0 g");
}
}
if (field instanceof PDTextbox)
{
field = new PDTextbox(acroForm, dict);
(field).setValue(value);
}
}
// Add the given fonts to the AcroForm default resources and return their resource names.
private List<String> prepareFont(PDDocument _pdfDocument, List<PDFont> fonts) {
PDDocumentCatalog docCatalog = _pdfDocument.getDocumentCatalog();
PDAcroForm acroForm = docCatalog.getAcroForm();
PDResources res = acroForm.getDefaultResources();
if (res == null)
res = new PDResources();
List<String> fontNames = new ArrayList<>();
for (PDFont font: fonts)
{
fontNames.add(res.addFont(font));
}
acroForm.setDefaultResources(res);
return fontNames;
}
// Load a TrueType font from a .ttf file.
private PDFont loadTrueTypeFont(PDDocument _pdfDocument, String resourceName) throws IOException
{
return PDTrueTypeFont.loadTTF(_pdfDocument, new File(resourceName));
}
Now you only have to call setField with the name of the field, the value you want to insert, and the font resource name that prepareFont returned for the TrueType font you loaded with loadTrueTypeFont.
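The string that setField writes under COSName.DA is a default appearance string built from PDF content-stream operators. It consists of a font resource name and a size followed by Tf, then a gray level followed by g (0 is black). The font name must exist in the AcroForm default resources, which is why prepareFont adds the fonts there first. Here is a small sketch that builds the string. DefaultAppearance and build are hypothetical names, and "F1" is only an example of a name prepareFont might return.

```java
public class DefaultAppearance {

    // Builds e.g. "/F1 10 Tf 0 g": select font resource F1 at 10 points (Tf), then black gray fill (g).
    public static String build(String fontResourceName, int fontSize) {
        if (fontResourceName.startsWith("/")) {
            fontResourceName = fontResourceName.substring(1); // accept both "/F1" and "F1"
        }
        return "/" + fontResourceName + " " + fontSize + " Tf 0 g";
    }
}
```

build("F1", 10) produces the same string that setField writes for ordinary fields, and build("F1", 12) the one it uses for "Field1".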
Hope it helps!

selecting divs fails

I am trying to parse the information inside each div with class="base shortstory":
<div id="dle-content">
<div class="base shortstory">
<h3 class="btl">HTC Jetstream</h3>
</div>
<div class="base shortstory">
<h3 class="btl">Samsung S4</h3>
</div>
<div class="base shortstory">
<h3 class="btl">Dell Streak</h3>
</div>
</div>
Here is the code
const string url = "http://someurl.com/catalogue";
const string rootUrl = "http://someurl.com";
HtmlWeb hw = new HtmlWeb();
HtmlDocument doc = hw.Load(url);
int dealsCount = 0;
HtmlNode root = doc.DocumentNode.SelectSingleNode("//div[@id='dle-content']");
int i = 1;
//this is for the default page
while (i<=10)
{
try
{
string node= String.Format("//div[{0}]", i);
var link =
doc.DocumentNode.SelectSingleNode(node);
var href = link.SelectSingleNode("//div[@class='mlink']//span[@class='argmore']//a[@href]").Attributes["href"].Value;
string title = link.SelectSingleNode("//h3[@class='btl']//a[@href]").InnerText.Trim();
string description = link.SelectSingleNode("//div[@class='maincont']//div[1]").InnerText.Replace("\n", " ").Replace("\r", "").Replace("\t", "").Trim();
description = RemoveHTMLComments(description);
var imageURL = link.SelectSingleNode("//div[@class='maincont']//div[1]//a//img").Attributes["src"].Value;
var price = link.SelectSingleNode("//div[@class='mlink']//span[3]//font").InnerText.Trim();
price = Regex.Match(price, @"\d+").Value;
var partnerdealID = href;
//no information
var isActivesStr = link.SelectSingleNode("//div[@class='mlink']//span[2]/font").InnerText.Trim();
bool isActive;
// "Нет в наличии" means "Out of stock"
if (isActivesStr.Contains("Нет в наличии"))
{
isActive = false;
}
else
{
isActive = true;
}
var dealUrl = href; //requires login - show the page itself
}
catch (Exception)
{
}
i += 1;
}
But after looping, the selected node is still the first one. What am I doing wrong?

All your XPath expressions start with '//', which means "start from the root of the document and search recursively". So when you do this:
link.SelectSingleNode("//div[@class='mlink']//span[@class='argmore']//a[@href]")
You will start not from link, but from the document's root. You probably want to do this instead:
link.SelectSingleNode("div[@class='mlink']...etc...")
which is equivalent to
link.SelectSingleNode("./div[@class='mlink']...etc...")
'.' means the current node. '/' means search only the direct children, not recursively.
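HtmlAgilityPack evaluates these expressions with XPath 1.0 semantics, so the effect can be reproduced with the JDK's javax.xml.xpath. The minimal sketch below uses the sample markup from the question, with attribute quoting made XML-friendly. For each story div, it evaluates the same query once with '//' and once with './/' (which still searches recursively, but starting from the current node). RelativeXPathDemo and titles are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class RelativeXPathDemo {

    static final String XML = "<div id='dle-content'>"
            + "<div class='base shortstory'><h3 class='btl'>HTC Jetstream</h3></div>"
            + "<div class='base shortstory'><h3 class='btl'>Samsung S4</h3></div>"
            + "<div class='base shortstory'><h3 class='btl'>Dell Streak</h3></div>"
            + "</div>";

    // For each story: [0] = result of "//h3..." (absolute), [1] = result of ".//h3..." (relative).
    public static String[][] titles() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
        XPath xp = XPathFactory.newInstance().newXPath();
        NodeList stories = (NodeList) xp.evaluate("//div[@class='base shortstory']", doc, XPathConstants.NODESET);
        String[][] result = new String[stories.getLength()][2];
        for (int i = 0; i < stories.getLength(); i++) {
            // '//' ignores the context node and returns the first h3 in the whole document
            result[i][0] = xp.evaluate("//h3[@class='btl']", stories.item(i));
            // './/' searches only below the current story div
            result[i][1] = xp.evaluate(".//h3[@class='btl']", stories.item(i));
        }
        return result;
    }
}
```

Every absolute result is "HTC Jetstream", which is exactly the symptom described in the question. The relative results are the three different titles.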
