I'm trying to convert an HTML stream to XML using SgmlReader for further parsing. This conversion is part of an app I'm developing for the Windows 8 Store. Below is the method that converts HTML to XML:
public static void ConvertToXml(string webResponse)
{
StringWriter sWriter = new StringWriter();
XmlWriter xmlWriter = XmlWriter.Create(sWriter);
SgmlReader sgmlReader = new SgmlReader();
sgmlReader.DocType = "HTML";
sgmlReader.WhitespaceHandling = WhitespaceHandling.All;
sgmlReader.CaseFolding = CaseFolding.ToLower;
sgmlReader.InputStream = new StringReader(webResponse);
sgmlReader.IgnoreDtd = true;
while (!sgmlReader.EOF)
{
xmlWriter.WriteNode(sgmlReader, true);
}
xmlWriter.Flush();
XmlString = sWriter.ToString();
}
The line sgmlReader.WhitespaceHandling = WhitespaceHandling.All; is the problem, because System.Xml.WhitespaceHandling is not present there. Is there any other way to do this?
After a lot of reading, testing, and debugging, I found that sgmlReader.WhitespaceHandling = WhitespaceHandling.All is not needed, at least in my case, because sgmlReader.WhitespaceHandling is set to All by default. However, I also removed sgmlReader.IgnoreDtd = true; and now my XML file looks normal ;)
Hope this helps someone.
I have an existing library that I can use to receive a docx file and return it. The software is .Net Core hosted in a Linux Docker container.
It's very limited in scope, though, and I need to perform some actions it can't do. As these are straightforward, I thought I would use Open XML. For my proof of concept, all I need to do is read a docx as a MemoryStream, replace some text, turn it back into a MemoryStream, and return it.
However the docx that gets returned is unreadable. I've commented out the text replacement below to eliminate that, and if I comment out the call to the method below then the docx can be read so I'm sure the issue is in this method.
Presumably I'm doing something fundamentally wrong here but after a few hours googling and playing around with the code I am not sure how to correct this; any ideas what I have wrong?
Thanks for the help
private MemoryStream SearchAndReplace(MemoryStream mem)
{
mem.Position = 0;
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(mem, true))
{
string docText = null;
StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream());
docText = sr.ReadToEnd();
//Regex regexText = new Regex("Hello world!");
//docText = regexText.Replace(docText, "Hi Everyone!");
MemoryStream newMem = new MemoryStream();
newMem.Position = 0;
StreamWriter sw = new StreamWriter(newMem);
sw.Write(docText);
return newMem;
}
}
If your real requirement is to search and replace text in a WordprocessingDocument, you should have a look at this answer.
The following unit test shows how you can make your approach work if the use case really demands that you read a string from a part, "massage" the string, and write the changed string back to the part. It also shows one of the shortcomings of any other approach than the one described in the answer already mentioned above, e.g., by demonstrating that the string "Hello world!" will not be found in this way if it is split across w:r elements.
[Fact]
public void CanSearchAndReplaceStringInOpenXmlPartAlthoughThisIsNotTheWayToSearchAndReplaceText()
{
// Arrange.
using var docxStream = new MemoryStream();
using (var wordDocument = WordprocessingDocument.Create(docxStream, WordprocessingDocumentType.Document))
{
MainDocumentPart part = wordDocument.AddMainDocumentPart();
var p1 = new Paragraph(
new Run(
new Text("Hello world!")));
var p2 = new Paragraph(
new Run(
new Text("Hello ") { Space = SpaceProcessingModeValues.Preserve }),
new Run(
new Text("world!")));
part.Document = new Document(new Body(p1, p2));
Assert.Equal("Hello world!", p1.InnerText);
Assert.Equal("Hello world!", p2.InnerText);
}
// Act.
SearchAndReplace(docxStream);
// Assert.
using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(docxStream, false))
{
MainDocumentPart part = wordDocument.MainDocumentPart;
Paragraph p1 = part.Document.Descendants<Paragraph>().First();
Paragraph p2 = part.Document.Descendants<Paragraph>().Last();
Assert.Equal("Hi Everyone!", p1.InnerText);
Assert.Equal("Hello world!", p2.InnerText);
}
}
private static void SearchAndReplace(MemoryStream docxStream)
{
using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(docxStream, true))
{
// If you wanted to read the part's contents as text, this is how you
// would do it.
string partText = ReadPartText(wordDocument.MainDocumentPart);
// Note that this is not the way in which you should search and replace
// text in Open XML documents. The text might be split across multiple
// w:r elements, so you would not find the text in that case.
var regex = new Regex("Hello world!");
partText = regex.Replace(partText, "Hi Everyone!");
// If you wanted to write changed text back to the part, this is how
// you would do it.
WritePartText(wordDocument.MainDocumentPart, partText);
}
docxStream.Seek(0, SeekOrigin.Begin);
}
private static string ReadPartText(OpenXmlPart part)
{
using Stream partStream = part.GetStream(FileMode.OpenOrCreate, FileAccess.Read);
using var sr = new StreamReader(partStream);
return sr.ReadToEnd();
}
private static void WritePartText(OpenXmlPart part, string text)
{
using Stream partStream = part.GetStream(FileMode.Create, FileAccess.Write);
using var sw = new StreamWriter(partStream);
sw.Write(text);
}
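For what it's worth, two things in the method from the question look likely to produce an unreadable file. It returns a brand-new MemoryStream that could hold, at most, the main part's XML rather than the whole .docx package, and the StreamWriter is never flushed or disposed, so even that text may never reach the stream. The second point can be shown without Open XML at all. This is only a sketch; StreamWriterFlushSketch and its method names are made up for illustration.

```csharp
using System.IO;
using System.Text;

public static class StreamWriterFlushSketch
{
    // Mirrors the pattern in the question: the writer is never flushed or
    // disposed, so text shorter than the writer's buffer stays in that buffer.
    public static long LengthWithoutFlush(string text)
    {
        var stream = new MemoryStream();
        var writer = new StreamWriter(stream);
        writer.Write(text);
        return stream.Length;
    }

    // Disposing the writer flushes it; leaveOpen keeps the MemoryStream usable.
    public static MemoryStream WriteAndRewind(string text)
    {
        var stream = new MemoryStream();
        using (var writer = new StreamWriter(stream, new UTF8Encoding(false), 1024, leaveOpen: true))
        {
            writer.Write(text);
        }
        stream.Position = 0; // rewind so a reader starts at the beginning
        return stream;
    }
}
```

Even with the flush fixed, the stream to hand back is the one the WordprocessingDocument was opened on (after the document is disposed), which is what the SearchAndReplace in the answer does by rewinding docxStream.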
I need to generate the XML without line breaks. The way I generate it now, it comes out fully indented. How do I save it without the line breaks?
document.LoadXml(soapEnvelope);
document.Save(@"E:\nota.xml");
I tried this code below:
XDocument document = XDocument.Load("arquivo.xml");
document.Save("arquivo2.xml", SaveOptions.DisableFormatting);
However, the SaveOptions option does not appear; I am using ASP.NET Core.
With my current code, the data comes out like this:
<Rps>
  <IdentificacaoRps>
    <Numero>1</Numero>
    <Serie>999</Serie>
    <Tipo>1</Tipo>
  </IdentificacaoRps>
  <DataEmissao>2018-11-27</DataEmissao>
  <Status>1</Status>
</Rps>
And I need it to come out like this:
<Rps><IdentificacaoRps><Numero>1</Numero><Serie>999</Serie><Tipo>1</Tipo></IdentificacaoRps><DataEmissao>2018-11-27</DataEmissao><Status>1</Status></Rps>
How can I solve this? Is there any way?
To remove the whitespace efficiently, I create a new XmlDocument with PreserveWhitespace = false. Once the document is loaded from the file, the whitespace-free XML is available from doc.InnerXml, and you can use it for whatever purpose you need. Here I stick with the question and write it to a file, disposing the writer properly.
using (var writer = System.IO.File.CreateText("C:\\b.xml"))
{
var doc = new XmlDocument {PreserveWhitespace = false};
doc.Load("C:\\a.xml");
writer.WriteLine(doc.InnerXml);
writer.Flush();
}
See if this helps:
XDocument document = XDocument.Load("arquivo.xml");
XmlWriterSettings xmlSettings = new XmlWriterSettings();
xmlSettings.Indent = false;
xmlSettings.NewLineChars = String.Empty;
using (XmlWriter xwWriter = XmlWriter.Create(@"c:\YourFileName.xml", xmlSettings))
    document.Save(xwWriter);
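As a side note, SaveOptions lives in the System.Xml.Linq namespace and applies to XDocument, not XmlDocument, so it may simply not show up if that using directive is missing or if document is an XmlDocument (that is a guess about the question, not something it confirms). Below is a minimal in-memory sketch of the XDocument route. CompactXml and Minify are hypothetical names used only for illustration.

```csharp
using System.Xml.Linq;

public static class CompactXml
{
    // Re-serializes XML without indentation or line breaks between elements.
    public static string Minify(string xml)
    {
        // LoadOptions.None drops insignificant whitespace while parsing.
        XDocument doc = XDocument.Parse(xml, LoadOptions.None);

        // DisableFormatting writes the tree back out without re-indenting it.
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

Calling CompactXml.Minify on the indented Rps fragment from the question should give the single-line form. Note that XDocument.ToString does not emit the XML declaration, so use Save with SaveOptions.DisableFormatting if the declaration is needed.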
I am working on PDF generation, which I implemented successfully using itextsharp.dll. It works fine in my local environment, including after publishing. We have our own server at another site.
However, the same code doesn't work on the server: the PDF is not generated, and instead I get the error 'The document has no pages.'
Initially I thought this was because the document had no data, but locally it works with or without data in the document.
I make the web request with the code below. Is there a problem with it?
try
{
var myHttpWebRequest = (HttpWebRequest)WebRequest.Create(strPdfData + "?objpatId=" + patID);
var response = myHttpWebRequest.GetResponse();
myHttpWebRequest.Timeout = 900000;
var stream = response.GetResponseStream();
StreamReader sr = new StreamReader(stream);
content = sr.ReadToEnd();
}
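One thing worth checking in the snippet above: Timeout is assigned after GetResponse() has already been called, so it cannot affect that request. Whether this is related to the 'The document has no pages.' error is not clear from the question. A minimal sketch of the reordered setup follows; PdfRequestFactory and Create are hypothetical names, and escaping the patient ID is an added assumption.

```csharp
using System;
using System.Net;

public static class PdfRequestFactory
{
    // Builds the request and sets Timeout before the request is sent.
    public static HttpWebRequest Create(string baseUrl, string patId)
    {
        // Escape the value so spaces or '&' in the ID do not break the query string.
        string url = baseUrl + "?objpatId=" + Uri.EscapeDataString(patId);

        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Timeout = 900000; // milliseconds (15 minutes)
        return request;
    }
}
```

The caller would then call GetResponse() on the returned request and read the stream as before. On newer .NET versions WebRequest.Create is marked obsolete in favor of HttpClient, so this sketch mainly fits the .NET Framework setup the question appears to use.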
Create a method in the controller:
[HttpGet]
public JsonResult GetFile(string address)
{
    // address is a placeholder for the URL of the file to fetch.
    // DownloadFile saves to disk and returns void, so DownloadString is used
    // here to get the response body as text.
    string json = new WebClient().DownloadString(address);
    // This code just converts the file to JSON; you can keep it in file format and send it to the view instead.
    dynamic result = Newtonsoft.Json.JsonConvert.DeserializeObject(json);
    var oc = Newtonsoft.Json.JsonConvert.DeserializeObject<countdata[]>(Convert.ToString(result.countdata));
    return Json(oc, JsonRequestBehavior.AllowGet);
}
In the view, just call this function:
@Url.Action("genPDF", "GetFile");
I updated my MVC3/.NET 4.5/Azure solution to MVC4.
My code for uploading an image to blob storage fails every time in the upgraded MVC4 solution. However, when I run my MVC3 solution, it works fine. The code that does the uploading, in a DLL, has not changed.
I've uploaded the same image file in both the MVC3 and MVC4 solutions. I've inspected the stream and it appears to be fine. In both instances I am running the code locally on my machine, and my connections point to blob storage in the cloud.
Any pointers for debugging? Any known issues that I may not be aware of when upgrading to MVC4?
Here is my upload code:
public string AddImage(string pathName, string fileName, Stream image)
{
var client = _storageAccount.CreateCloudBlobClient();
client.RetryPolicy = RetryPolicies.Retry(3, TimeSpan.FromSeconds(5));
var container = client.GetContainerReference(AzureStorageNames.ImagesBlobContainerName);
image.Seek(0, SeekOrigin.Begin);
var blob = container.GetBlobReference(Path.Combine(pathName, fileName));
blob.Properties.ContentType = "image/jpeg";
blob.UploadFromStream(image);
return blob.Uri.ToString();
}
I managed to fix it. For some reason, reading the stream directly from the HttpPostedFileBase wasn't working. Simply copying it into a new MemoryStream solved it.
My code:
public string StoreImage(string album, HttpPostedFileBase image)
{
var blobStorage = storageAccount.CreateCloudBlobClient();
var container = blobStorage.GetContainerReference("containerName");
if (container.CreateIfNotExist())
{
// configure container for public access
var permissions = container.GetPermissions();
permissions.PublicAccess = BlobContainerPublicAccessType.Container;
container.SetPermissions(permissions);
}
string uniqueBlobName = string.Format("{0}{1}", Guid.NewGuid().ToString(), Path.GetExtension(image.FileName)).ToLowerInvariant();
CloudBlockBlob blob = container.GetBlockBlobReference(uniqueBlobName);
blob.Properties.ContentType = image.ContentType;
image.InputStream.Position = 0;
using (var imageStream = new MemoryStream())
{
image.InputStream.CopyTo(imageStream);
imageStream.Position = 0;
blob.UploadFromStream(imageStream);
}
return blob.Uri.ToString();
}
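The fix above comes down to buffering the upload into a seekable in-memory stream and rewinding it before handing it to the storage client. Here is a minimal sketch of just that step, with no Azure or MVC types involved. StreamBuffering and ToRewoundMemoryStream are hypothetical names, and the caller is assumed to dispose the returned stream.

```csharp
using System.IO;

public static class StreamBuffering
{
    // Copies the whole input into a MemoryStream positioned at the start.
    public static MemoryStream ToRewoundMemoryStream(Stream input)
    {
        // Rewind the source when possible, in case something already read from it.
        if (input.CanSeek)
        {
            input.Position = 0;
        }

        var buffer = new MemoryStream();
        input.CopyTo(buffer);

        // CopyTo leaves the position at the end; rewind before uploading.
        buffer.Position = 0;
        return buffer;
    }
}
```

This holds the entire file in memory, which is usually fine for images but may not suit very large uploads.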
I have a document stored in a SharePoint document library. I want to open it and read data from it. How can I do that? FileStream does not take a URL as input. Please help.
Try SPFile.OpenBinaryStream
From SharePoint 2007 - Read content from SPFile:
string content = string.Empty;
using (SPSite oSite = new SPSite("http://localhost/"))
{
using (SPWeb oWeb = oSite.OpenWeb())
{
SPDocumentLibrary doclib = (SPDocumentLibrary)oWeb.GetList(DocLibUrl);
SPFile htmlFile = doclib.Items[0].File;
using (System.IO.StreamReader reader = new System.IO.StreamReader(htmlFile.OpenBinaryStream()))
{
content = reader.ReadToEnd();
}
}
}
Sounds like you need to use a HTTPRequest object to retrieve the file. Here is a code example:
http://geeknotes.wordpress.com/2008/01/10/saving-a-possibly-binary-file-from-a-url-in-c/