Pound(£) symbol replaced with some special character in struts FormFile - struts

Am uploading a file using struts FormFile, In uploaded file I specified pound(£) symbol and do uploading, but in the action server while reading the FormFile using InputStream or Byte, the pound symbol got replaced with some special character.
To resolve this issue, i made few changes in my jsp
<%#page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
My action class:
final FormFile file = form.getUploadFile();
StringWriter stringWriter = new StringWriter();
IOUtils.copy(file.getInputStream(), stringWriter);
final String data = stringWriter.toString();
final byte[] bytes = file.getFileData();
final String byteStr = new String(bytes,"UTF-8");
In both string fields and , the pound symbol replaced with some special character.
Can any one please help me to fix this issue?

Related

How to decode data from Content Stream

I created a pdf document using the code looks like the following:
// The text parameter equels 'שדג' it is Hebrew. unicode equivalent is '\u05E9\u05D3\u05D2'
private static void createSimplePdf(String filename, String text) throws Exception {
final String path = RunItextApp.class.getResource("/Arial.ttf").getPath();
final PdfFont font = PdfFontFactory.createFont(path, PdfEncodings.IDENTITY_H);
Style hebrewStyle = new Style()
.setBaseDirection(BaseDirection.RIGHT_TO_LEFT)
.setFontSize(14)
.setFont(font);
final PdfWriter pdfWriter = new PdfWriter(filename);
final PdfDocument pdfDocument = new PdfDocument(pdfWriter);
final Document pdf = new Document(pdfDocument);
pdf.add(
new Paragraph(text)
.setFontScript(Character.UnicodeScript.HEBREW)
.addStyle(hebrewStyle)
);
pdf.close();
System.out.println("The document '" + filename + "' has been created.");
}
and after that, I tried to open this document using pdfbox util and I got the following data:
but I got an unexpected result in the Contents:stream section especially Tj tag. I expected string like the following 05E905D305D2 but I got 02b902a302a2. I tried to convert this hex string to normal string and I got the following result: ʹʣʢ but I expected that string שדג.
What do I wrong? Hot to convert this 02b902a302a2 string and get שדג?
This answer writes in a comment #usr2564301. Thanks for the help!
The numbers you get are not Unicode characters but font indexes instead. (Check how the font is embedded!) The text in a PDF does not specifically care about Unicode – it may or may not be this. Good PDF creators add a /ToUnicode table to help decoding, but it's optional.

How is a PDF supposed to be encoded?

I'm trying to set up an API that generate PDF from web page (provided as URL). The API is gotenberg from thecodingmachine. I have it on Docker, it works just fine, I can't generate PDF through http request send with curl (for now I'm just trying to make it work, so I use the request provided as example in the documentation)
Now I am trying to make it work with my groovy/grails app. So I'm using the java tools to make the request.
Now here is my problem : the PDF file I get is blank (my app opend directly in my browser). It do has the right content, if I open it with the text editor, it's not empty, and it has almost the same content as the one I make using the curl request (which isn't blank).
I am 99% sure the problem come from the encoding. I tried changing the InputStreamReader encoding parameter, but it doesn't change anything. Here I put "X-MACROMAN" because that the encoding inside the pdf file that isn't blank, but it still doesn't change.
Here is my code :
static def execute(def apiURL)
{
def httpClient = HttpClients.createDefault()
// Request parameters and other properties.
def request = new HttpPost(apiURL)
MultipartEntityBuilder builder = MultipartEntityBuilder.create()
builder.addTextBody("remoteURL", 'https://google.com')
builder.addTextBody("marginTop", '0')
builder.addTextBody("marginBottom", '0')
builder.addTextBody("marginLeft", '0')
builder.addTextBody("marginRight", '0')
HttpEntity multipart = builder.build()
request.setEntity(multipart)
def response = httpClient.execute(request)
BufferedReader rd = new BufferedReader(
new InputStreamReader(response.getEntity().getContent(), "X-MACROMAN"))
StringBuffer result = new StringBuffer()
String line = ""
Boolean a = Boolean.FALSE
while ((line = rd.readLine()) != null) {
if(!a){
a = Boolean.TRUE
}
else {
result.append("\n")
}
result.append(line)
}
return result
I am 99% sure the problem come from the encoding. I tried changing the InputStreamReader encoding parameter, but it doesn't change anything. Here I put "X-MACROMAN" because that the encoding inside the pdf file that isn't blank, but it still doesn't change.
Did I made myself clear ? And does those who understands has any ideas why my PDFs are blank ?

newtonsoft SerializeXmlNode trailing nulls

I am creating an XmlDoc in C# and using Newtonsoft to serialize to JSON. It works, but I am getting a bunch of what appear to be "NUL"'s at the end of the JSON. No idea why. Anyone seen this before?
CODE:
XmlDocument xmlDoc = BuildTranslationXML(allTrans, applicationName, language);
// Convert the xml doc to json
// the conversion inserts \" instead of using a single quote, so we need to replace it
string charToReplace = "\"";
string jsonText = JsonConvert.SerializeXmlNode(xmlDoc);
// json to a stream
MemoryStream memoryStream = new MemoryStream();
TextWriter tw = new StreamWriter(memoryStream);
tw.Write(jsonText);
tw.Flush();
tw.Close();
// output the stream as a file
string fileName = string.Format("{0}_{1}.json", applicationName, language);
return File(memoryStream.GetBuffer(), "text/json", fileName);
The file is served up to the calling web page and the browser prompts the user to save the file. When opening the file, it displays the correct JSON but also has all the trailing nulls. See image below (hopefully the stackoverflow link works):
file screenshot
The GetBuffer() method returns the internal representation of the MemoryStream. Use ToArray() instead to get just the part of that internal array that has data Newtonsoft has put in there.

DataContractSerializer encode \n

I am using the following code to serialize my object
DataContractSerializer ser = new DataContractSerializer(obj.GetType());
String text;
using (MemoryStream memoryStream = new MemoryStream())
{
ser.WriteObject(memoryStream, obj);
byte[] data = new byte[memoryStream.Length];
Array.Copy(memoryStream.GetBuffer(), data, data.Length);
text = Encoding.UTF8.GetString(data);
}
My object is serializing like this:
<Meta xmlns:i=\"http://www.w3.org/2001/XMLSchema-instance\"><Description>This is my new file
\n
\nMore Data</Description><Title>My Other Test Document</Title></Meta>
Notice that my \n was not escaped. Why is that? What is the best way to send \r\n through xml.
I searched and I dont see any articles about this. Am I missing some attribute in my serialize code?
Babel -- not at all, you're not missing any special attributes here. The \n is getting serialized. If it were being interpreted as a non-newline special character, you would see \n in the string, not \n itself. Are you not catching the newline on the client end, and have you verified it by spitting it out via an stdout call?

What's the easiest way of converting an xhtml string to PDF using Flying Saucer?

I've been using Flying Saucer for a while now with awesome results.
I can set a document via uri like so
ITextRenderer renderer = new ITextRenderer();
renderer.setDocument(xhtmlUri);
Which is nice, as it will resolve all relative css resources etc relative to the given URI. However, I'm now generating the xhtml, and want to render it directly to a PDF (without saving a file). The appropriate methods in ITextRenderer seem to be:
private Document loadDocument(final String uri) {
return _sharedContext.getUac().getXMLResource(uri).getDocument();
}
public void setDocument(String uri) {
setDocument(loadDocument(uri), uri);
}
public void setDocument(Document doc, String url) {
setDocument(doc, url, new XhtmlNamespaceHandler());
}
As you can see, my existing code just gives the uri and ITextRenderer does the work of creating the Document for me.
What's the shortest way of creating the Document from my formatted xhtml String? I'd prefer to use the existing Flying Saucer libs without having to import another XML parsing jar (just for the sake of consistent bugs and functionality).
The following works:
Document document = XMLResource.load(new ByteArrayInputStream(templateString.getBytes())).getDocument();
Previously, I had tried
final DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
dbf.setValidating(false);
final DocumentBuilder documentBuilder = dbf.newDocumentBuilder();
Document document = documentBuilder.parse(new ByteArrayInputStream(templateString.getBytes()));
but that fails as it attempts to download the HTML docType from http://www.w3.org (which returns 503's for the java libs).
I use the following without problem:
final DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilderFactory.setValidating(false);
DocumentBuilder builder = documentBuilderFactory.newDocumentBuilder();
builder.setEntityResolver(FSEntityResolver.instance());
org.w3c.dom.Document document = builder.parse(new ByteArrayInputStream(doc.toString().getBytes()));
ITextRenderer renderer = new ITextRenderer();
renderer.setDocument(document, null);
renderer.layout();
renderer.createPDF(os);
The key differences here are passing in a null URI, and also provided the DocumentBuilder with an entity resolver.