Getting PDFBox IOException "COSStream has been closed and cannot be read" while using a PDF as background image in Apache FOP

I am generating a PDF file using Apache FOP 2.5.x and I need to use existing PDFs as background images. I am using fop-pdf-images 2.3 to enable this; it uses PDFBox 2.0.20 to let FOP read a PDF file as a background image.
But I sometimes get this error:
Error on PDF page: https://s2.q4cdn.com/498544986/files/doc_downloads/test.pdf COSStream has been closed and cannot be read. Perhaps its enclosing PDDocument has been closed?
at org.apache.fop.render.pdf.pdfbox.PDFBoxImageHandler.handleImage(PDFBoxImageHandler.java:84)
at org.apache.fop.render.intermediate.AbstractIFPainter.drawImage(AbstractIFPainter.java:249)
at org.apache.fop.render.intermediate.AbstractIFPainter.drawImage(AbstractIFPainter.java:205)
at org.apache.fop.render.intermediate.AbstractIFPainter.drawImageUsingImageHandler(AbstractIFPainter.java:170)
at org.apache.fop.render.pdf.PDFPainter.drawImageUsingURI(PDFPainter.java:218)
at org.apache.fop.render.pdf.PDFPainter.drawImage(PDFPainter.java:181)
at org.apache.fop.render.intermediate.IFRenderer.drawImage(IFRenderer.java:1301)
at org.apache.fop.render.AbstractPathOrientedRenderer.drawImage(AbstractPathOrientedRenderer.java:973)
at org.apache.fop.render.AbstractPathOrientedRenderer.drawBackground(AbstractPathOrientedRenderer.java:308)
at org.apache.fop.render.intermediate.IFRenderer.drawBackground(IFRenderer.java:1390)
at org.apache.fop.render.AbstractPathOrientedRenderer.drawBackground(AbstractPathOrientedRenderer.java:215)
at org.apache.fop.render.AbstractPathOrientedRenderer.drawBackAndBorders(AbstractPathOrientedRenderer.java:173)
at org.apache.fop.render.AbstractPathOrientedRenderer.handleRegionTraits(AbstractPathOrientedRenderer.java:128)
at org.apache.fop.render.AbstractRenderer.renderRegionViewport(AbstractRenderer.java:373)
at org.apache.fop.render.intermediate.IFRenderer.renderRegionViewport(IFRenderer.java:738)
at org.apache.fop.render.AbstractRenderer.renderPageAreas(AbstractRenderer.java:345)
at org.apache.fop.render.AbstractRenderer.renderPage(AbstractRenderer.java:318)
at org.apache.fop.render.intermediate.IFRenderer.renderPage(IFRenderer.java:587)
at org.apache.fop.area.RenderPagesModel.addPage(RenderPagesModel.java:123)
at org.apache.fop.layoutmgr.AbstractPageSequenceLayoutManager.finishPage(AbstractPageSequenceLayoutManager.java:316)
at org.apache.fop.layoutmgr.PageSequenceLayoutManager.finishPage(PageSequenceLayoutManager.java:243)
at org.apache.fop.layoutmgr.PageSequenceLayoutManager.activateLayout(PageSequenceLayoutManager.java:147)
at org.apache.fop.area.AreaTreeHandler.endPageSequence(AreaTreeHandler.java:267)
at org.apache.fop.fo.pagination.PageSequence.endOfNode(PageSequence.java:139)
at org.apache.fop.fo.FOTreeBuilder$MainFOHandler.endElement(FOTreeBuilder.java:362)
at org.apache.fop.fo.FOTreeBuilder.endElement(FOTreeBuilder.java:190)
at java.xml/com.sun.org.apache.xml.internal.serializer.ToXMLSAXHandler.endElement(ToXMLSAXHandler.java:263)
at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:610)
at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1718)
at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2883)
at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:605)
at java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:534)
at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:888)
at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:824)
at java.xml/com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1216)
at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:635)
at java.xml/com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(TransformerImpl.java:660)
at java.xml/com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:774)
... 7 more
Caused by: java.io.IOException: COSStream has been closed and cannot be read. Perhaps its enclosing PDDocument has been closed?
at org.apache.pdfbox.cos.COSStream.checkClosed(COSStream.java:154)
at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:239)
at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:234)
at org.apache.pdfbox.pdmodel.PDPage.getContents(PDPage.java:157)
at org.apache.fop.render.pdf.pdfbox.PDFBoxAdapter.getContents(PDFBoxAdapter.java:460)
at org.apache.fop.render.pdf.pdfbox.PDFBoxAdapter.createStreamFromPDFBoxPage(PDFBoxAdapter.java:352)
at org.apache.fop.render.pdf.pdfbox.AbstractPDFBoxHandler.createStreamForPDF(AbstractPDFBoxHandler.java:111)
at org.apache.fop.render.pdf.pdfbox.PDFBoxImageHandler.handleImage(PDFBoxImageHandler.java:71)
... 46 more
Whether this error occurs depends on the page-height and page-width set up in my .fo file.
Here is my .fo file:
{% set cardWidth = 8.3 %}
{% set cardHeight = 8 %}
{% set bleedLength = 0.125 %}
{% set totalWidth = cardWidth + 2 * bleedLength %}
{% set totalHeight = cardHeight + 2 * bleedLength %}
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
<fo:layout-master-set>
<fo:simple-page-master master-name="MyPick" page-width="{{ totalWidth }}in" page-height="{{ totalHeight }}in">
<fo:region-body background-image="https://s2.q4cdn.com/498544986/files/doc_downloads/test.pdf"/>
</fo:simple-page-master>
</fo:layout-master-set>
<fo:page-sequence master-reference="MyPick">
<fo:flow flow-name="xsl-region-body">
<fo:block-container absolute-position="absolute" top="{{ bleedLength }}in" left="{{ bleedLength }}in" width="{{ cardWidth }}in"
height="{{ cardHeight }}in">
<fo:block-container display-align="center" margin-left="1.5in" margin-top="1.2in" margin-right="1.5in"
height="6.539in">
<fo:block-container margin="0">
<fo:block font-family="Bookerly Display" text-align="start" color="rgb(255,255,255)" font-size="160pt" line-height="1">
Header
</fo:block>
<fo:block font-family="Bookerly Display" margin-top="0.3in" text-align="start" color="rgb(255,255,255)" font-size="110pt" line-height="1">
SubHeader
</fo:block>
</fo:block-container>
</fo:block-container>
<fo:block margin-top="0.75in" margin-bottom="0.4175in" margin-right="0.39in" text-align="right" font-family="Arial" font-size="6pt"
color="rgb(255,255,255)">
Footer
</fo:block>
</fo:block-container>
</fo:flow>
</fo:page-sequence>
</fo:root>
If I change cardWidth from 8.3 to 8.2, the same PDF renders properly. I don't understand why the issue happens for some dimensions while for others FOP generates the file without any problems.
I am using this sample pdf file to test : https://s2.q4cdn.com/498544986/files/doc_downloads/test.pdf

Related

Creating Random string containing over 25 characters of numbers and letters (upper and lower case)

How can I create a random string longer than 25 characters, consisting of digits and letters, with XSLT?
Example: Khb34KXQ23ib34KDNBBE342nQE
My XSLT is like this:
<xsl:function name="kh:shortRandom">
<xsl:sequence select="generate-id()"/>
</xsl:function>
<xsl:template match="/">
<test>
<randomId><xsl:value-of select="concat(kh:shortRandom(), kh:shortRandom(), kh:shortRandom(), kh:shortRandom())"/></randomId>
</test>
</xsl:template>
But the result is always the same (e1d1). Because I call the function four times, the result is simply repeated four times (e1d1e1d1e1d1e1d1).
I want a different character every time, a bit like a password generator but with only letters and numbers.
Thanks :)
In XSLT 3.0 (XPath 3.1) one can use the random-number-generator() function.
For XSLT 2.0 I would recommend using the random number functions of FXSL - see for example: "Casting the Dice with FXSL: Random Number Generation Functions in XSLT"
Using this, here is an implementation of the wanted random-string generation function:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:my="my">
<xsl:import href="C:/CVS-DDN/fxsl-xslt2/f/random.xsl"/>
<xsl:output method="text"/>
<xsl:variable name="vSeed" select=
"xs:integer(seconds-from-time(current-time()))
* xs:integer(minutes-from-time(current-time()))
* xs:integer(hours-from-time(current-time()))"/>
<xsl:template match="/">
<xsl:value-of select="my:randomStrings(25, 10, $vSeed)" separator="
"/>
</xsl:template>
<xsl:function name="my:randomStrings">
<xsl:param name="pRandomLength" as="xs:integer"/>
<xsl:param name="pNumResults" as="xs:integer" />
<xsl:param name="pSeed" as="xs:integer"/>
<xsl:variable name="vAlphaNum" select="'abcdefghijklmnopqrstuvwxyz0123456789'"/>
<xsl:variable name="vNums">
<xsl:call-template name="randomSequence">
<xsl:with-param name="pSeed" select="$pSeed"/>
<xsl:with-param name="pStart" select="1"/>
<xsl:with-param name="pEnd" select="36"/>
<xsl:with-param name="pLength" select="$pRandomLength*$pNumResults"/>
</xsl:call-template>
</xsl:variable>
<xsl:sequence select=
"for $vK in 1 to $pNumResults
return
string-join(for $i in
$vNums/*[position() gt ($vK -1)*$pRandomLength
and position() le $vK*$pRandomLength]
/xs:integer(.)
return substring($vAlphaNum, $i, 1),
'')
"/>
</xsl:function>
</xsl:stylesheet>
The function my:randomStrings(pRandomLength, pNumResults, pSeed) implemented above produces a sequence of random strings and has these three arguments:
pRandomLength - the wanted length of each generated random string
pNumResults - the wanted number of random strings to be generated
pSeed - a seed for the random generator. Calling the function with different seeds will produce different results.
The code above calls the function to produce 10 random strings each with length 25. The seed is calculated from the current time and thus the result will be different each time the transformation is performed.
Here is one result:
azdkex5yi5rm3suewa7bxazpc
qi2qsg7qvl7en4cx2c5s9vfrp
l8t0lv659uba500t6e7fea518
7bt80g6bpjtjltna7ru6e3t15
t90s62fvnex5yqcq2osv97n5z
hibzw8g95wv15x2s2wv8cobem
dqiubm165tp1pci34hparuqs7
5d0chkl85liaowx3v88isk4oo
6iw5iktzaqa7jnf4g9lakqdhk
insg7iggsc22fqd1jkhbrxo53
And here is another:
bstudsgn85xq7dncy9fubu8we
g9hkl0qf493u0x7xmaz0hunqd
9lyclhrp19iz33v0hdmt7txoh
b45t1t1xfves5fjn3syzilhjq
p5bh89iojemh7adb41suew20d
goznie54278vfb4968zx3n9o8
lmouaz8j7i033mtjx1t6ymbjn
jxgqajz7g9db0g6j4o8l6ukgw
2ge6nhv69emcqanc6f63yeoro
yws75ttmbnsbyxvwwch86wbe2
Note:
As noted, you need to have downloaded the FXSL library, and you need to set the href attribute of the <xsl:import> declaration above to the exact location in the file system where the imported stylesheet file resides.
const characters = 'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
const charactersLength = characters.length;
const generate25letterrandomstring = () => {
let randomString = '';
for (let i = 0; i < 25; i++) {
randomString += characters.charAt(Math.floor(Math.random() * charactersLength));
}
console.log(randomString);
return randomString;
}
generate25letterrandomstring()

How can I enhance the quality of images generated from PDF?

We use TYPO3 9.5.13, GraphicsMagick 1.3.29, Ghostscript 9.27, BK2K\BootstrapPackage 11.0.1
Using PDFs as normal images is no problem.
But now I want a 'preview' of the PDFs at full column width (~1000px). Although the PDF has a high resolution, the generated image is only 595px wide and any text is nearly unreadable.
The problem occurs with image CEs such as the uploads CE, which I want to enhance:
each time I request the image at full column width, it renders at a poor resolution and looks distorted.
Here is a small area from the generated image:
And the same area from the PDF as shown in a PDF reader:
The fluid part:
<img loading="lazy"
src="{f:uri.image(image: file, cropVariant: 'default', maxWidth: 1100)}"
width="{bk2k:lastImageInfo(property: 'width')}"
height="{bk2k:lastImageInfo(property: 'height')}"
intrinsicsize="{bk2k:lastImageInfo(property: 'width')}x{bk2k:lastImageInfo(property: 'height')}"
title="{file.properties.title}"
alt="{file.properties.alternative}">
which results in something like:
<img loading="lazy"
src="/fileadmin/_processed_/3/2/csm_Warum_D-Arzt_6afd8ad8d4.png"
intrinsicsize="595x842"
title=""
alt=""
width="595"
height="842">
Edit:
In case of using this FLUID:
<img loading="lazy"
src="{f:uri.image(image: file, cropVariant: 'default', width: 1100)}"
width="{bk2k:lastImageInfo(property: 'width')}"
height="{bk2k:lastImageInfo(property: 'height')}"
intrinsicsize="{bk2k:lastImageInfo(property: 'width')}x{bk2k:lastImageInfo(property: 'height')}"
title="{file.properties.title}"
alt="{file.properties.alternative}">
I get:
<img loading="lazy"
src="/fileadmin/_processed_/3/2/csm_Warum_D-Arzt_2ffb63b15f.png"
intrinsicsize="1100x1557"
title=""
alt=""
width="1100"
height="1557">
The image is bigger (and overflows the container), but the quality is just as bad; notice the bigger pixels:
Actually, a PDF is NOT an image. It is a container format which can contain vectors and images with different colorspaces and dimensions. A bitmap image has fixed dimensions (width, height, density); a PDF does not. Originally, it was created and optimized for printers, not for screens.
TYPO3 reflects that with a message in the backend:
IMHO, there is no perfect way to make PDFs behave like images, because you know the output format but not (properly) the input format. Two ways to get acceptable results:
1. Extend content elements or create new ones and add a second image slot for PDF preview images. Create the preview images yourself with a graphics program.
2. Write your own viewhelper and create your own thumbnails.
Solution 1 means more work for editors, so I would not consider it best practice.
I would go with a custom viewhelper.
Add your own render type for PDFs:
<f:switch expression="{file.type}">
<f:case value="5">
<f:render partial="Media/Type/Pdf" arguments="{file: file, dimensions: dimensions, data: data, settings: settings}" />
</f:case>
<f:defaultCase>
<f:render partial="Media/Type/Image" arguments="{file: file, dimensions: dimensions, data: data, settings: settings}" />
</f:defaultCase>
</f:switch>
Partial Media/Type/Pdf
{namespace cv=Conversion\HelperUtils\ViewHelpers}
<html xmlns:f="http://typo3.org/ns/TYPO3/CMS/Fluid/ViewHelpers" xmlns:ce="http://typo3.org/ns/TYPO3/CMS/FluidStyledContent/ViewHelpers" data-namespace-typo3-fluid="true">
<cv:forEachPdfThumbnail document="{file}" pages="1" as="pdfPreviewPage">
<f:image src="{pdfPreviewPage}" alt="" />
</cv:forEachPdfThumbnail>
</html>
ViewHelper:
This viewhelper converts multiple pages from a PDF using CommandUtility::imageMagickCommand. You can raise the density to a higher value to improve quality.
As mentioned, this viewhelper was developed a few years ago and could be improved (e.g. saving to fileadmin/_processed_ instead of typo3temp). Feel free to clone and improve: https://github.com/conversion1/t3-pdfthumbnailviewhelper/blob/master/ForEachPdfThumbnailViewHelper.php
public static function renderStatic(array $arguments, \Closure $renderChildrenClosure, RenderingContextInterface $renderingContext)
{
$templateVariableContainer = $renderingContext->getVariableProvider();
/** @var \TYPO3\CMS\Core\Resource\FileReference $document */
$document = $arguments['document'];
$pages = explode(',', str_replace(' ', '', $arguments['pages']));
$colorspace = TRUE === isset($GLOBALS['TYPO3_CONF_VARS']['GFX']['colorspace']) ? $GLOBALS['TYPO3_CONF_VARS']['GFX']['colorspace'] : 'RGB';
$absFilePath = GeneralUtility::getFileAbsFileName($document->getOriginalFile()->getPublicUrl());
$destinationPath = 'typo3temp/';
$destinationFilePrefix = 'pdf-prev_' . $document->getOriginalFile()->getNameWithoutExtension();
$destinationFileExtension = 'png';
$output = '';
foreach ($pages as $pageNumber) {
if($pageNumber > 0) {
$pageNumber = intval($pageNumber);
} else {
$pageNumber = 1;
}
$destinationFileSuffix = '_page-' . $pageNumber;
$absDestinationFilePath = GeneralUtility::getFileAbsFileName($destinationPath . $destinationFilePrefix . $destinationFileSuffix . '.' . $destinationFileExtension);
$imgArguments = '-colorspace ' . $colorspace;
$imgArguments .= ' -density 300';
$imgArguments .= ' -sharpen 0x.6';
$imgArguments .= ' "' . $absFilePath . '"';
$imgArguments .= '['. intval($pageNumber - 1) .']';
$imgArguments .= ' "' . $absDestinationFilePath . '"';
if(!file_exists($absDestinationFilePath)) {
$command = CommandUtility::imageMagickCommand('convert', $imgArguments);
CommandUtility::exec($command);
}
$thumbnail = substr($absDestinationFilePath, strlen(Environment::getPublicPath()));
$templateVariableContainer->add($arguments['as'], $thumbnail);
$output .= $renderChildrenClosure();
$templateVariableContainer->remove($arguments['as']);
}
return $output;
}
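The effect of the -density value can be estimated with simple arithmetic. The following is a rough Python illustration, not part of the viewhelper. It assumes the page is A4 measured in PDF points (595x842, which matches the intrinsicsize reported in the question) and that the rasterizer defaults to 72 dpi, where one point becomes one pixel. The function name pixels_at_density is made up for this example.

```python
POINTS_PER_INCH = 72  # PDF user-space units per inch


def pixels_at_density(width_pt, height_pt, density_dpi):
    """Return (width, height) in pixels of a page rasterized at density_dpi."""
    scale = density_dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)


# Page size taken from the intrinsicsize in the question (assumed A4 in points).
default_size = pixels_at_density(595, 842, 72)
high_res_size = pixels_at_density(595, 842, 300)

print(default_size)   # (595, 842)
print(high_res_size)  # (2479, 3508)
```

Under these assumptions, filling a 1100px column would need roughly 1100 / 595 * 72 ≈ 133 dpi at rasterization time. Scaling up an image that was already rasterized at 72 dpi only enlarges the existing pixels.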
Edit:
A third way: You can use a JavaScript library to generate thumbnails on the fly. E.g. https://github.com/mozilla/pdf.js

Getting the failed-assert from the sch file

I have an .sch file provided by the PEPPOL website: http://docs.peppol.eu/poacc/billing/3.0/files/PEPPOL-EN16931-UBL.sch and we need to convert it to .xsl. We have done the conversion using a tool called oXygen.
This is the snippet from the .sch that generates [BR-S-06]:
<rule context="cac:AllowanceCharge[cbc:ChargeIndicator=false()]/cac:TaxCategory[normalize-space(cbc:ID)='S'][cac:TaxScheme/normalize-space(upper-case(cbc:ID))='VAT']">
<assert id="BR-S-06" flag="fatal" test="(cbc:Percent) > 0">[BR-S-06]-In a Document level allowance (BG-20) where the Document level allowance VAT category code (BT-95) is "Standard rated" the Document level allowance VAT rate (BT-96) shall be greater than zero.</assert>
</rule>
This is how I expect the rule to appear:
<!--ASSERT -->
<xsl:choose>
<xsl:when test="@listID = 'UNCL1001'"/>
<xsl:otherwise>
<svrl:failed-assert xmlns:svrl="http://purl.oclc.org/dsdl/svrl" test="@listID = 'BR-S-06'">
<xsl:attribute name="id">BR-S-06</xsl:attribute>
<xsl:attribute name="flag">fatal</xsl:attribute>
<xsl:attribute name="location">
<xsl:apply-templates select="." mode="schematron-select-full-path"/>
</xsl:attribute>
<svrl:text>[BR-S-06]-In a Document level allowance (BG-20) where the Document level allowance VAT category code (BT-95) is "Standard rated" the Document level allowance VAT rate (BT-96) shall be greater than zero.</svrl:text>
</svrl:failed-assert>
</xsl:otherwise>
</xsl:choose>
<xsl:apply-templates select="@*|*" mode="M7"/>
This is how it is actually shown:
<!--ASSERT -->
<xsl:choose>
<xsl:when test="(cbc:Percent) > 0"/>
<xsl:otherwise>
<xsl:message xmlns:iso="http://purl.oclc.org/dsdl/schematron">
<xsl:text>[BR-S-06]-In a Document level allowance (BG-20) where the Document level allowance VAT category code (BT-95) is "Standard rated" the Document level allowance VAT rate (BT-96) shall be greater than zero.</xsl:text>
</xsl:message>
</xsl:otherwise>
</xsl:choose>
<xsl:apply-templates select="@*|node()" mode="M10"/>
I expect to see the failed-assert element, because it also contains the id, flag and location, rather than the message I currently get.
To run the validation using Saxon we have the following code:
public static Dictionary<string, List<ValidationResult>> ValidateXML(string xslTemplate, string xslName, XmlDocument document)
{
Dictionary<string, List<ValidationResult>> resultToReturn = new Dictionary<string, List<ValidationResult>>();
XmlNamespaceManager xmlNamespacesForDocument = GetAllNamespaces(document);
var transformAssertFailed = new List<ValidationResult>();
var processor = new Processor();
var compiler = processor.NewXsltCompiler();
var executable = compiler.Compile(new MemoryStream(Encoding.UTF8.GetBytes(xslTemplate)));
var destination = new DomDestination();
MemoryStream xmlStream = new MemoryStream();
document.Save(xmlStream);
xmlStream.Position = 0;
using (xmlStream)
{
var transformer = executable.Load();
transformer.SetInputStream(xmlStream, new Uri("file:///C:/"));
transformer.Run(destination);
}
return resultToReturn;
}
I am not sure what is wrong here: maybe the .sch file that I started with, or maybe the .sch to .xsl converter.
I have posted the same question on the oXygen forum and got it answered there.
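Once the generated stylesheet emits SVRL, the id, flag and location of each failed assertion can be read from the output document. The following is a minimal sketch in Python rather than the C#/Saxon code above. The sample report is hand-written for illustration (it is not real validator output), the location path in it is invented, and the helper name collect_failed_asserts is hypothetical. Only the SVRL namespace URI is taken from the question.

```python
import xml.etree.ElementTree as ET

SVRL_URI = "http://purl.oclc.org/dsdl/svrl"
SVRL_NS = {"svrl": SVRL_URI}

# Hand-written sample report for illustration only; the location is invented.
SAMPLE_REPORT = """<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">
  <svrl:failed-assert id="BR-S-06" flag="fatal"
      location="/Invoice/AllowanceCharge[1]/TaxCategory[1]"
      test="(cbc:Percent) &gt; 0">
    <svrl:text>[BR-S-06]-VAT rate (BT-96) shall be greater than zero.</svrl:text>
  </svrl:failed-assert>
</svrl:schematron-output>"""


def collect_failed_asserts(svrl_xml):
    """Return one dict per svrl:failed-assert element in the report."""
    root = ET.fromstring(svrl_xml)
    results = []
    for failed in root.iter("{%s}failed-assert" % SVRL_URI):
        text_node = failed.find("svrl:text", SVRL_NS)
        results.append({
            "id": failed.get("id"),
            "flag": failed.get("flag"),
            "location": failed.get("location"),
            "text": (text_node.text or "").strip() if text_node is not None else "",
        })
    return results


failures = collect_failed_asserts(SAMPLE_REPORT)
for failure in failures:
    print(failure["id"], failure["flag"], failure["location"])
```

A stylesheet that only produces xsl:message output, as in the second snippet of the question, gives nothing for such a reader to collect, which is why the failed-assert form matters.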

Selected marked elements are not being written to the new document

This code was working and now it's not. I have no idea why. What the code is supposed to do is take all content between the <!--#start#--> and <!--#stop#--> markers and write it to a new file.
Right now it's grabbing all the text regardless of the start/stop markers.
I've tried changing insideBlock to True or False. I've also tried changing the source document to make sure the comment markers are right.
Here's my code:
Dim startMark As String = "<!--#start#-->"
Dim stopMark As String = "<!--#stop#-->"
searchDir = txtDirectory.Text
Prefix = txtBxUnique.Text
For Each singleFile In allFiles
If File.Exists(singleFile.FullName) Then
Dim fileName = singleFile.FullName
Debug.Print("file name : " & fileName)
' A backup first
Dim backup As String = fileName & ".bak"
File.Copy(fileName, backup, True)
' Load lines from the source file in memory
Dim lines() As String = File.ReadAllLines(backup)
' Now re-create the source file and start writing lines inside a block
Dim insideBlock As Boolean = False
Using sw As StreamWriter = File.CreateText(backup)
For Each line As String In lines
If line = startMark Then
' start writing at the line below
insideBlock = True
ElseIf line = stopMark Then
' Stop writing
insideBlock = False
ElseIf insideBlock = True Then
' Write the current line in the block
sw.WriteLine(line)
End If
Next
End Using
End If
Next
Here's source data
<!--Arbortext, Inc., 1988-2010, v.4002-->
<!DOCTYPE DOC PUBLIC "-//USA-DOD//DTD 38784STD-BV7//EN" [
<!ENTITY graphic3-16_dgcs_file_window SYSTEM Graphics\3-16_dgcs_file_window.cgm" NDATA cgm>
<!ENTITY graphic19000_2 SYSTEM "Graphics\19000_2.cgm" NDATA cgm> ]>
<?Pub UDT _bookmark _target>
<?Pub EntList alpha bull copy rArr sect trade deg>
<?Pub Inc>
<doc service="af" docid="TO 1Q-1(M)B-2-2-12-1" docstat="formal" verstatpg="ver" cycle="2" chglevel="1">
<front numcols="1">
<idinfo>
<tmidno></tmidno>
<chgnum></chgnum>
<chgdate></chgdate>
<chghistory>
<chginfo>
<chgtxt></chgtxt>
<date></date></chginfo></chghistory>
<doctype></doctype>
<maintlvl></maintlvl>
<!--#start#-->
<prtitle>
<subject>
Trying to get this code to work</subject></prtitle>
<mfr></mfr>
<contractno>A12345</contractno>
<contractno>B12</contractno>
<!--#start#-->
<line></line>
<contractno>Contract No</contractno>
<supersed></supersed>
<discl>Discipline</discl>
<distrib>
<emphasis type="u"></emphasis></distrib>
<expcont></expcont>
<destr>Destruction</destr>
<authnot></authnot>
<pubdate></pubdate></idinfo>
<lep>
<lepcontents autobuild="1"></lep>
<contents autobuild="1">
<illuslist autobuild="1">
<tablelist autobuild="1">
<foreword>
<!--#start#-->
<para0 verstatus="ver">
<title></title>
<para>
This text should be in the new file</para></para0>
<para0 verstatus="ver">
<title></title>
<para>
<!--#stop#-->
<acronym>
<def></def>
<term></term></acronym></para></para0>
<?Pub Caret -2></foreword>
<safesum>
<para0 verdate="5/2/16" verstatus="ver">
<title>
</title>
<para>
</para>
</para0></safesum></front>
<body numcols="1">
<chapter id="chap1">
<title>Chapter 1</title>
<para0 verstatus="ver">
<title>Paragraph 1</title>
<para></para></para0></chapter>
<chapter id="chap2">
<title>Chapter 2</title>
<section id="thoery_of_operation_section">
<title></title>
<para0 verstatus="ver">
<title>Paragraph 2</title>
<para>
<!--#start#-->
<change level="1" change="delete">
<emphasis type="u" color="blue">
<xref xrefid="fig2-1">Figure 1 Smelly Fish</emphasis></change>
<!--#stop#-->
</para></para0></section>
<section>
<title></title>
<para0>
<title></title></para0></section></chapter>
</body>
</doc>
The results should be just the elements in between the markers.
<prtitle>
<subject>
Trying to get this code to work</subject></prtitle>
<mfr></mfr>
<contractno>A12345</contractno>
<contractno>B12</contractno>
<para0 verstatus="ver">
<title>Para0 Title</title>
<para>
This text should be in the new file</para></para0>
<para0 verstatus="ver">
<title></title>
<para>
<acronym>
<def>definition</def>
<term>Look here</term></acronym></para>
<change level="1" change="delete">
<emphasis type="u" color="blue">
<xref xrefid="fig2-1">Figure 1 Smelly Fish</emphasis></change>
Thank you for all the help,
Max
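The marker logic can be checked in isolation with a small sketch. This one is in Python rather than VB.NET and rests on one assumption: that marker lines may carry leading or trailing whitespace, so they are trimmed before comparison (the posted code compares the whole line exactly). The function name extract_marked_lines and the sample lines are made up for the example. One more thing that may be worth checking in the posted code: the lines are read from the backup and the StreamWriter is also opened on the backup, so the original file is never rewritten.

```python
START_MARK = "<!--#start#-->"
STOP_MARK = "<!--#stop#-->"


def extract_marked_lines(lines):
    """Keep only the lines that sit between a start marker and a stop marker."""
    inside_block = False
    kept = []
    for line in lines:
        stripped = line.strip()  # assumption: tolerate surrounding whitespace
        if stripped == START_MARK:
            inside_block = True
        elif stripped == STOP_MARK:
            inside_block = False
        elif inside_block:
            kept.append(line)
    return kept


# Made-up sample in the style of the source data.
sample = [
    "<maintlvl></maintlvl>",
    "  <!--#start#-->",
    "<mfr></mfr>",
    "<contractno>A12345</contractno>",
    "<!--#stop#-->  ",
    "<line></line>",
]
print(extract_marked_lines(sample))
```

With this sample, only the two lines between the markers are kept. If a stop marker is missing or mistyped as a second start marker, everything up to the next stop marker is kept.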

Save vb.net HTML in UTF8 or Unicode

Hello and good afternoon. I am developing a project and, long story short, I need to save my HTML page as UTF-8 or Unicode.
Dim y As String
Dim UTF8encoding() As Byte, MyEncoder As New System.Text.UTF8Encoding(TRUE)
y = (html code should be here will be under)
Dim utf8 As New UTF8Encoding()
Dim utf8EmitBOM As New UTF8Encoding(True)
Dim code As String = y
Path = "C:\Users\OWNER\Desktop\invoice.html"
Try
Dim my_write As System.IO.StreamWriter
my_write = IO.File.CreateText(path)
my_write.write(utf8EmitBOM.GetPreamble())
UTF8encoding = System.Text.Encoding.Convert(System.Text.Encoding.UTF8, System.Text.Encoding.Unicode, MyEncoder.GetBytes(y))
my_write.WriteLine(code)
my_write.Close()
Catch ex As Exception
End Try
HTML
<html>
<style>
table, th, td
{ border: 1px solid black; }
</style>
</head>
<body><center><b>
<font size=20>Family Butcher</font></br></br></br><center><b>
<font size =4>164 Battersea Bridge Road London SW11 3AW</center></font><center><b>
<font size =4>Tel: Mob:</center></font><center><b>
<font size =4>VAT No: 835522334</center></br></font></br>
<table Border = 3 WIDTH=610 align=left></br><tr>
<th colspan=3 align = left>To: " & txto & " <br/>
<br/>Date: <br/><br/>Invoice nº <br/></th></tr>
<td WIDTH = 100 HEIGHT=40><center><b>Quantity </b></td>
<td WIDTH = 400><center><b>Description</b></td><td><center>
<b>Value</b> </br></p>
</body></html>
At the moment I do not know how to save the file as either Unicode or UTF-8, and I cannot open the HTML file without the symbol "Â" appearing. Thanks for any support.
I suggest you use File.WriteAllText - this overload allows you to specify encoding, so you can accomplish your goal with just one line of code, for example:
File.WriteAllText(path, code, Encoding.UTF8)
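A stray "Â" is typical of text that was written as UTF-8 but then decoded as a single-byte encoding such as Latin-1. Whether that is what happens here is an assumption, since it depends on how the browser guesses the encoding. The following small Python sketch (not VB.NET) reproduces the effect with the "nº" from the invoice HTML, and shows the three-byte mark that a UTF-8 BOM adds at the start of a file.

```python
text = "Invoice nº"

# Encode as UTF-8, then (wrongly) decode the same bytes as Latin-1.
utf8_bytes = text.encode("utf-8")
misread = utf8_bytes.decode("latin-1")
print(misread)  # Invoice nÂº

# Decoding with the matching encoding round-trips cleanly.
roundtrip = utf8_bytes.decode("utf-8")
print(roundtrip == text)  # True

# "utf-8-sig" prepends the UTF-8 byte order mark (EF BB BF).
with_bom = text.encode("utf-8-sig")
print(with_bom[:3])  # b'\xef\xbb\xbf'
```

If the file is already saved as UTF-8 and the symbol still appears, declaring the encoding in the page itself, for example with a meta charset element in the head, is another thing to try.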