Convert HTML content to a PDF byte array with Kotlin

val sanitizedHTML = Jsoup.clean(html, whitelist)
val textRenderer = ITextRenderer()
val outputStream = ByteArrayOutputStream()
textRenderer.setDocumentFromString(sanitizedHTML)
textRenderer.layout()
textRenderer.createPDF(outputStream)
textRenderer.finishPDF()
return Base64.getDecoder().decode(outputStream.toByteArray())
I would like to generate a PDF from HTML content and, rather than saving it as a file, upload it to a server that expects a ByteArray.
I tried the above, using jsoup to clean the HTML and ITextRenderer to generate the PDF, but I keep receiving an error about invalid Base64 character 25. Could someone tell me what I am doing wrong here?

return Base64.getDecoder().decode(outputStream.toByteArray())
This line was the problem. outputStream.toByteArray() already returns the raw PDF bytes, so there is nothing to decode. Once I removed the Base64 decoding, it worked.
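The "25" in the error message is a useful clue: OpenJDK's java.util.Base64 decoder reports the offending byte in hexadecimal, and 0x25 is '%', the first byte of a "%PDF" header. The Java sketch below reproduces this with a literal stand-in for the renderer output; the class and method names are hypothetical and ITextRenderer is not involved.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class RawPdfBytes {
    // Stand-in for the renderer (assumption): writes a PDF-like header into the stream.
    static byte[] render() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] header = "%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII);
        out.write(header, 0, header.length);
        return out.toByteArray(); // already the raw bytes; nothing to decode
    }

    // Returns the decoder's error message, or null if decoding succeeded.
    static String decodeError(byte[] data) {
        try {
            Base64.getDecoder().decode(data);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        byte[] pdf = render();
        System.out.println(Integer.toHexString(pdf[0])); // hex value of the first byte
        System.out.println(decodeError(pdf));            // message from the failed decode
    }
}
```

Base64 decoding is only needed when the input is Base64 text; the byte array from a ByteArrayOutputStream can be uploaded as it is.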

Related

Base64 string to pdf in groovy

I am new to Groovy and to this forum.
I am using a middleware tool (SAP CPI) in which I receive a PDF as a Base64 string. I need to send the PDF to another system.
This is what I did:
Store the Base64 input in a string.
Pass it through a Base64 decoder (this is a built-in function).
Use the code below, passing the Base64 string as the input to "body":
import com.sap.gateway.ip.core.customdev.util.Message;
import java.util.HashMap;
def Message processData(Message message){
    def body = message.getBody(String.class);
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream()
    outputStream.write(body.getBytes())
    message.setBody(outputStream.toString())
    return message;
}
What I get as output looks something like this:
%PDF-1.7
%????
5 0 obj
<</Type/Font/Subtype/Type1/BaseFont/Helvetica/Encoding/WinAnsiEncoding>>
endobj
14 0 obj
<</Title(T....
and so on..
Now when I save this output as a pdf file, it is either blank or it says "There was an error opening this document. The file is damaged and could not be repaired."
What am I doing wrong? Thanks in advance!
Note: Due to a limitation of the middleware, I cannot create a whole project; I have to use the processData() function and can only manipulate the string inside it. Thanks!
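No answer is recorded for this question. One likely cause is that the decoded PDF passes through a String (getBody(String.class), getBytes(), toString()), and binary data does not survive a charset round trip. The Java sketch below shows the difference using only java.util.Base64; the class and method names are hypothetical, and the sample bytes are an invented stand-in for the start of a PDF.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

final class Base64ToPdfBytes {
    // Decode Base64 text straight to bytes and keep the result as bytes.
    static byte[] decode(String base64) {
        return Base64.getDecoder().decode(base64);
    }

    // What happens when decoded binary data is pushed through a String.
    static byte[] throughString(byte[] binary) {
        return new String(binary, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "%PDF-1.7", then a comment line with four high bytes (invented sample data).
        byte[] original = {'%', 'P', 'D', 'F', '-', '1', '.', '7', '\n',
                '%', (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3, '\n'};
        String base64 = Base64.getEncoder().encodeToString(original);

        byte[] decoded = decode(base64);
        System.out.println(Arrays.equals(original, decoded));                // true: bytes intact
        System.out.println(Arrays.equals(original, throughString(decoded))); // false: high bytes replaced
    }
}
```

In the script, this would mean decoding the Base64 text inside processData() and setting the resulting byte array as the body instead of a String, assuming the middleware accepts a byte array there.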

How to decode data from Content Stream

I created a PDF document using code like the following:
// The text parameter equals 'שדג' (Hebrew); the Unicode equivalent is '\u05E9\u05D3\u05D2'
private static void createSimplePdf(String filename, String text) throws Exception {
    final String path = RunItextApp.class.getResource("/Arial.ttf").getPath();
    final PdfFont font = PdfFontFactory.createFont(path, PdfEncodings.IDENTITY_H);
    Style hebrewStyle = new Style()
            .setBaseDirection(BaseDirection.RIGHT_TO_LEFT)
            .setFontSize(14)
            .setFont(font);
    final PdfWriter pdfWriter = new PdfWriter(filename);
    final PdfDocument pdfDocument = new PdfDocument(pdfWriter);
    final Document pdf = new Document(pdfDocument);
    pdf.add(
            new Paragraph(text)
                    .setFontScript(Character.UnicodeScript.HEBREW)
                    .addStyle(hebrewStyle)
    );
    pdf.close();
    System.out.println("The document '" + filename + "' has been created.");
}
After that, I opened the document with a PDFBox utility and got the following data:
The Contents stream section was not what I expected, especially the Tj operator. I expected a string like 05E905D305D2 but got 02b902a302a2. Converting this hex string to text gives ʹʣʢ, whereas I expected שדג.
What am I doing wrong? How can I convert the string 02b902a302a2 to get שדג?
This answer was given in a comment by @usr2564301. Thanks for the help!
The numbers you get are not Unicode characters but font indexes instead. (Check how the font is embedded!) The text in a PDF does not specifically care about Unicode – it may or may not be this. Good PDF creators add a /ToUnicode table to help decoding, but it's optional.
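To illustrate the comment, the Java sketch below splits the hex string into 2-byte codes (Identity-H uses two bytes per glyph) and looks each one up in a code-to-Unicode map. The three map entries are hypothetical: they are inferred by pairing the codes from the question with the expected letters. In a real document the mapping would come from the font's /ToUnicode table, if one is present. Class and method names are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

final class GlyphCodes {
    // Split a hex string such as "02b902a302a2" into 2-byte codes.
    static int[] codes(String hex) {
        int[] out = new int[hex.length() / 4];
        for (int i = 0; i < out.length; i++) {
            out[i] = Integer.parseInt(hex.substring(i * 4, i * 4 + 4), 16);
        }
        return out;
    }

    // Translate each code through the map; unknown codes become U+FFFD.
    static String decode(String hex, Map<Integer, Integer> toUnicode) {
        StringBuilder sb = new StringBuilder();
        for (int code : codes(hex)) {
            Integer codePoint = toUnicode.get(code);
            sb.appendCodePoint(codePoint != null ? codePoint : 0xFFFD);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical entries, inferred from the question; not read from a real font.
        Map<Integer, Integer> toUnicode = new HashMap<>();
        toUnicode.put(0x02B9, 0x05E9);
        toUnicode.put(0x02A3, 0x05D3);
        toUnicode.put(0x02A2, 0x05D2);
        System.out.println(decode("02b902a302a2", toUnicode));
    }
}
```

Interpreting the same codes directly as Unicode is what produces ʹʣʢ, because U+02B9, U+02A3 and U+02A2 happen to be valid but unrelated characters.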

How is a PDF supposed to be encoded?

I'm trying to set up an API that generates a PDF from a web page (provided as a URL). The API is Gotenberg from thecodingmachine. I have it running in Docker and it works just fine: I can generate a PDF through an HTTP request sent with curl (for now I'm just trying to make it work, so I use the request given as an example in the documentation).
Now I am trying to make it work with my Groovy/Grails app, so I'm using the Java tools to make the request.
Here is my problem: the PDF file I get is blank (my app opens it directly in my browser). It does have content: if I open it in a text editor, it is not empty, and the content is almost the same as that of the file I get from the curl request (which isn't blank).
I am 99% sure the problem comes from the encoding. I tried changing the InputStreamReader encoding parameter, but it doesn't change anything. Here I put "X-MACROMAN" because that is the encoding of the PDF file that isn't blank, but the result is still the same.
Here is my code:
static def execute(def apiURL)
{
    def httpClient = HttpClients.createDefault()
    // Request parameters and other properties.
    def request = new HttpPost(apiURL)
    MultipartEntityBuilder builder = MultipartEntityBuilder.create()
    builder.addTextBody("remoteURL", 'https://google.com')
    builder.addTextBody("marginTop", '0')
    builder.addTextBody("marginBottom", '0')
    builder.addTextBody("marginLeft", '0')
    builder.addTextBody("marginRight", '0')
    HttpEntity multipart = builder.build()
    request.setEntity(multipart)
    def response = httpClient.execute(request)
    BufferedReader rd = new BufferedReader(
            new InputStreamReader(response.getEntity().getContent(), "X-MACROMAN"))
    StringBuffer result = new StringBuffer()
    String line = ""
    Boolean a = Boolean.FALSE
    while ((line = rd.readLine()) != null) {
        if(!a){
            a = Boolean.TRUE
        }
        else {
            result.append("\n")
        }
        result.append(line)
    }
    return result
}
Did I make myself clear? Does anyone have an idea why my PDFs are blank?
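No answer is recorded for this question. One likely cause is that a PDF is binary data, so reading it through an InputStreamReader and readLine() alters it whatever charset is chosen: bytes are turned into characters, and line terminators are dropped and re-added as "\n". The Java sketch below copies a stream byte for byte instead. The class name is hypothetical and a ByteArrayInputStream stands in for response.getEntity().getContent().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class BinaryBody {
    // Copy the stream byte for byte: no charset, no line splitting.
    static byte[] readFully(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Invented sample body containing CR LF and high byte values.
        byte[] body = {'%', 'P', 'D', 'F', '\r', '\n', (byte) 0x8F, (byte) 0xFF, '\n'};
        byte[] copy = readFully(new ByteArrayInputStream(body));
        System.out.println(Arrays.equals(body, copy)); // true: identical bytes
    }
}
```

With Apache HttpClient 4, EntityUtils.toByteArray(response.getEntity()) does the same copy in one call. The resulting byte array would then need to be written to the HTTP response as bytes rather than as text.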

How do I upload a PDF file?

I am currently making an app for students with which they can upload a PDF file to a server. I am using the Android Volley API but have so far tested the function with JPEG files.
This is my code:
public String getStringImage(Bitmap bmp) {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    bmp.compress(Bitmap.CompressFormat.JPEG, 100, baos);
    byte[] imageBytes = baos.toByteArray();
    String encodedImage = Base64.encodeToString(imageBytes, Base64.DEFAULT);
    return encodedImage;
}
How would I change this code so that a PDF can be uploaded instead?
Do I still use Base64 and imageBytes?
Or is there an alternate method?
You should be able to reuse most of the code you have there. Instead of taking in a Bitmap, you would take in a File. You won't be able to use Bitmap.compress on it, though, so that call needs to be removed.
To upload a PDF, you can Base64-encode it and send it as a string as part of the request, but this might get out of control if you are handling large files.
The other option is to use a multipart form. I would suggest taking a look at this Stack Overflow question and answer for how to do that.
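The first option can be sketched roughly as follows: read the file's bytes and Base64-encode them, with no bitmap step. This version uses java.util.Base64 and java.nio.file so that it runs on a desktop JVM. On Android, the Base64.encodeToString(bytes, Base64.DEFAULT) call from the question would take the place of the encoder here. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

final class PdfEncoder {
    // Base64-encode raw bytes; replaces the bitmap compression step.
    static String encode(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    // Read the whole file into memory and encode it (only suitable for small files).
    static String encodeFile(Path file) {
        try {
            return encode(Files.readAllBytes(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Temporary file with invented content, standing in for the student's PDF.
        Path sample = Files.createTempFile("sample", ".pdf");
        Files.write(sample, "%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII));
        System.out.println(encodeFile(sample));
        Files.delete(sample);
    }
}
```

Because the whole file is held in memory and Base64 grows the data by about a third, the multipart approach is the safer choice for large PDFs.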

Uncompressing a Gzip format?

I am facing a problem with Gzip uncompressing.
The situation is this: I have some text in UTF-8. The text is compressed with PHP's gzdeflate() function and then stored in a BLOB column in MySQL.
I retrieved the blob and tried to uncompress it with Java's GZIPInputStream, but it throws an error saying the data is not in GZIP format.
I also tried Java's Inflater, but then I get "DataFormatException: incorrect header check". The Inflater code is below.
// rs is the result set
// blobAsBytes is the byte array
while(rs.next()){
    blob = rs.getBlob("old_text");
    int blobLength = (int) blob.length();
    blobAsBytes = blob.getBytes(1, blobLength);
}
Inflater decompresser = new Inflater();
decompresser.setInput(blobAsBytes);
byte[] result = new byte[100];
int resultLength = decompresser.inflate(result); // this is the line where the exception is occurring.
decompresser.end();
// Decode the bytes into a String
String outputString = new String(result, 0, resultLength, "UTF-8");
System.out.println(outputString);
I have to do this using Java and get all the text back that is stored in the database.
Can someone please help me with this.
Use gzencode(), not gzdeflate(). The latter does not produce the gzip format, it produces the deflate format. The former does produce the gzip format. The PHP functions are horribly and confusingly named.
Alternatively, use the java.util.zip.Inflater class with nowrap true in the Inflater constructor. That will decode raw deflate data on the Java end.
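A minimal Java sketch of the second suggestion follows. Since PHP is not available here, a Deflater created with nowrap = true stands in for gzdeflate() output (an assumption: both are taken to be raw deflate data). The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class RawDeflate {
    // Stand-in for PHP's gzdeflate(): raw deflate, no zlib or gzip header.
    static byte[] deflateRaw(byte[] input) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inflate raw deflate data of any size by looping until the stream ends.
    static byte[] inflateRaw(byte[] compressed) {
        Inflater inflater = new Inflater(true); // nowrap = true: no header expected
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unsupported input: stop instead of spinning
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not raw deflate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] text = "some UTF-8 text, some UTF-8 text".getBytes(StandardCharsets.UTF_8);
        byte[] restored = inflateRaw(deflateRaw(text));
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```

The loop also addresses a second limitation of the code in the question: a single inflate() call into a 100-byte array returns at most the first 100 bytes of the text.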