PDF to .txt using iTextSharp issues - pdf

I've finally gotten a C# project loaded as a PowerShell module, and am able to convert a PDF to text, sort of.
Some of my PDFs seem to convert fine, others get cut off at the end, and some of them just simply give me output like this:
!nn!
Q9!r
!
!!
-;7!!
H*
Q0-.(;5!!n
#n%-,-Bn
!rn-;7!H+
(-;7
,-;7
,-;79
!-;7nnn;,-
n-n
n>-rn
n!!r
n+*,
,),+I-.n
#55=
!8
(
)%
rr-;7
Q!n-;7
n!Q
!!n
F3Q02
!8nH
#*825
-n-;7
-;7nrQ
&n!!-;7C4-;7
Kn>B)I!!!I
$rn==.=!
r*.//
#5>8636nKnn
I've checked the properties of the PDF files i'm testing with, and I can't find anything that sticks out as the reason a PDF outputs gibberish like above.
Can someone please give me some ideas on what else to look for and/or change?

Related

Problem with line breaks in PDF document generated by BIRT

I have some cell texts in a BIRT report which do not flow as nicely as I hoped.
For example,
The text is Long value resultwithaverylongname whichcannotbreak and I had hoped that it would be displayed like this:
Long value
resultwithaverylongname
whichcannotbreak
The render options are as follows:
renderOptions.setOutputFormat(IPDFRenderOption.OUTPUT_FORMAT_PDF);
renderOptions.setOption(IPDFRenderOption.PAGE_OVERFLOW, IPDFRenderOption.OUTPUT_TO_MULTIPLE_PAGES);
renderOptions.setOption(IPDFRenderOption.PDF_TEXT_WRAPPING, true);
renderOptions.setOption(IPDFRenderOption.PDF_WORDBREAK, true);
It seems to me that my desired output is physically possible but I don't know why BIRT does not break on a whitespace and breaks in the middle of the word.
I am using BIRT 4.16 (from Sourceforge). The texts contain normal whitespace (no non-breakable spaces) and are displayed via a data object.
3.Sep.21
I now have an example project which I am trying to commit to Github. In the meantime here is a screenshot showing breaks which look good and others which are not...
The git repo is here: https://github.com/pramsden/test.wordbreak
If the text "resultwithaverylongname" physically fits, then you are right:
BIRT should not break it in the middle of the word.
Your renderOptions seem right (depending of what BIRT version you are using).
At first glance this looks like a bug.
But: In German language, we often have quite long words, and I've created a lot of (complex) PDF reports with BIRT, but I never saw this issue.
So I guess it is a tiny silly detail which causes this.
Just to double-check:
Are the spaces between "Long", "value", "result..." normal spaces (0x20)? or non-breaking spaces?
Which BIRT release are you using?
Are you using a data item or a dynamic text item and if so, is it HTML or plain text?
Can you create a reproducible simple test case and post the rptdesign file somewhere?
well i don use BIRT , but try to use (\n),
in my case I use PDFFlow library to generate pdf docs, and to make a line-break i just use \n
this is a simple example code to create a pdf file and use line break
var DocumentBuilder.New()
.AddSection()
.AddParagraphToSection("Hello world! \n go to the next line")
.ToDocument()
.Build("Result.PDF");
try it and tell me if it works

A file named Butterfly7198.txt was found and It's boggling me

I happened to come across a file while changing images for a Toontown Rewritten's Context Pack I've been working on and I stumbled across this file marked
Butterfly7198.txt
and upon clicking on it, I was greeted with the following prompt that reads
"Hi. This isn't part of any grand story arc or anything. We're not using this as
some gimmick to announce some bold new feature. There's no big pot of gold at the end of this rainbow. We're looking for people with technical talent who want to help work on
Toontown Rewritten. You can discuss this file and its contents online if you wish. However, it IS meant to be solved alone, so please don't share answers or spoilers. Good luck. -Butterfly 7198"
Then I was greeted with a weird chain of letters and numbers on which I could only assume meant that this was encoded in some language I had no Idea about. The following goes
A/MNCrYNG1ljAAAAAAAAAAADAAAAQAAAAHNEAAAAZAAAZAEAbAAAWgEAZAAAZAEAbAIAWgMAZAIA
hAAAWgQAZAMAhAAAWgUAZAQAZQYAZgEAZAUAhAAAgwAAWVoHAGQBAFMoBgAAAGn/////TmMBAAAA
AwAAAAQAAABDAAAAczoAAABkAQB9AQB4LQB0AABkAwCDAQBEXR8AfQIAdAEAagIAfAEAfAAAF4MB
AGoDAIMAAH0BAHETAFd8AQBTKAQAAABOdAAAAABpAAQAAGkAABAAKAQAAAB0BgAAAHhyYW5nZXQI
AAAAX2hhc2hsaWJ0BgAAAHNoYTI1NnQGAAAAZGlnZXN0KAMAAAB0AQAAAG10AQAAAGh0AQAAAGko
AAAAACgAAAAAcxEAAABzZWNyZXRfbWVzc2FnZS5weXQFAAAAX2lzaGEEAAAAcwgAAAAAAQYBEwAd
AWMCAAAACQAAAAgAAABDAAAAcyMBAAB0AABkAQCDAQB9AgBkAgB9AwB4XwB0AQBkAQCDAQBEXVEA
fQQAfAMAfAIAfAQAGXQCAHwAAHwEAHQDAHwAAIMBABYZgwEAFzd9AwB8AgB8AwBkAwBAGXwCAHwE
ABkCfAIAfAQAPHwCAHwDAGQDAEA8cR8AV2QCAH0EAGQCAH0DAGQEAH0FAHiWAHwBAERdjgB9BgB4
UQB0AQBkBQCDAQBEXUMAfQcAfAQAZAYAF2QDAEB9BAB8AwB8AgB8BAAZF2QDAEB9AwB8AgB8AwAZ
fAIAfAQAGQJ8AgB8BAA8fAIAfAMAPHGgAFd8AgB8AgB8BAAZfAIAfAMAGRdkAwBAGX0IAHwFAHQE
AHQCAHwGAIMBAHwIAEGDAQA3fQUAcY0AV3wFAFMoBwAAAE5pAAEAAGkAAAAAaf8AAABSAAAAAGn9
AwAAaQEAAAAoBQAAAHQFAAAAcmFuZ2VSAQAAAHQDAAAAb3JkdAMAAABsZW50AwAAAGNocigJAAAA
dAMAAABrZXl0BAAAAGRhdGF0AQAAAFN0AQAAAGpSBwAAAHQDAAAAb3V0dAEAAABidAEAAAB4dAEA
AABLKAAAAAAoAAAAAHMRAAAAc2VjcmV0X21lc3NhZ2UucHl0BQAAAF9tcmM0CQAAAHMgAAAAAAEM
AQYBEwEmASkBBgEGAQYBDQETAQ4BEgEhARoBHgF0BQAAAFJvYm90YwAAAAAAAAAAAQAAAEIAAABz
RwAAAGUAAFoBAGQAAIQAAFoCAGQBAIQAAFoDAGQCAIQAAFoEAGQDAIQAAFoFAGQEAIQAAFoGAGQF
AIQAAFoHAGQGAIQAAFoIAFJTKAcAAABjAQAAAAEAAAACAAAAQwAAAHMWAAAAZAMAfAAAXwAAZAIA
fAAAXwEAZAAAUygEAAAATmkAAAAAUgAAAAAoAgAAAGkAAAAAaQAAAAAoAgAAAHQLAAAAX1JvYm90
X19wb3N0CwAAAF9Sb2JvdF9fc3RrKAEAAAB0BAAAAHNlbGYoAAAAACgAAAAAcxEAAABzZWNyZXRf
bWVzc2FnZS5weXQIAAAAX19pbml0X18cAAAAcwQAAAAAAQkBYwEAAAABAAAAAgAAAEMAAABzDQAA
AHwAAGoAAGQBAIMBAFMoAgAAAE50AQAAAGQoAQAAAHQEAAAAbW92ZSgBAAAAUhkAAAAoAAAAACgA
AAAAcxEAAABzZWNyZXRfbWVzc2FnZS5weXQIAAAAbW92ZURvd24gAAAAcwIAAAAAAWMBAAAAAQAA
AAIAAABDAAAAcw0AAAB8AABqAABkAQCDAQBTKAIAAABOdAEAAABsKAEAAABSHAAAACgBAAAAUhkA
AAAoAAAAACgAAAAAcxEAAABzZWNyZXRfbWVzc2FnZS5weXQIAAAAbW92ZUxlZnQjAAAAcwIAAAAA
AWMBAAAAAQAAAAIAAABDAAAAcw0AAAB8AABqAABkAQCDAQBTKAIAAABOdAEAAAByKAEAAABSHAAA
ACgBAAAAUhkAAAAoAAAAACgAAAAAcxEAAABzZWNyZXRfbWVzc2FnZS5weXQJAAAAbW92ZVJpZ2h0
JgAAAHMCAAAAAAFjAQAAAAEAAAACAAAAQwAAAHMNAAAAfAAAagAAZAEAgwEAUygCAAAATnQBAAAA
dSgBAAAAUhwAAAAoAQAAAFIZAAAAKAAAAAAoAAAAAHMRAAAAc2VjcmV0X21lc3NhZ2UucHl0BgAA
AG1vdmVVcCkAAABzAgAAAAABYwIAAAAJAAAABAAAAEMAAABzbgEAAHQAAHwBAIMBAHQBAGsDAHMw
AHQCAHwBAIMBAGQBAGsDAHMwAHwBAGQCAGsHAHI8AHQDAIMAAIIBAG4AAHwAAGoEAFwCAH0CAH0D
AGkEAGQRAGQEADZkEgBkBgA2ZBMAZAcANmQUAGQIADZ8AQAZXAIAfQQAfQUAZAMAfAIAfAQAFwQD
awEAb5IAZAkAawAAbgIAAgFvtABkAwB8AwB8BQAXBANrAQBvsgBkCQBrAABuAgACAXO7AHQFAFN8
AQBkBgBrAwByzQB8AgBuBwB8AgB8BAAXfQYAfAEAZAgAawMAcukAfAMAbgcAfAMAfAUAF30HAGQB
AHwGAGQKABR8AQBkCwBrBgBkCQAUF3wHABc+ZAwAQHIbAXQFAFN8AAAEagYAfAEANwJfBgB8AgB8
BAAXfAMAfAUAF2YCAHwAAF8EAHgmAGQVAERdHgB9CAB8AABqBgBqBwB8CABkEACDAgB8AABfBgBx
SAFXdAgAUygWAAAATmkBAAAAdAQAAABkbHJ1aQAAAABSGwAAAGn/////Uh4AAABSIAAAAFIiAAAA
aSAAAABpQAAAAHQCAAAAZHVsiQAAAMwhqSWNKClirAW/QlF2SzalSZQbJVnGYblNWUQGaxJ3yRqJ
Fyl50iotVjxwYyWCGTF/+WkPFnNwQwtmNJ1UiEiNWtpvp1XmJIBS8UrmIKoZ43ZWaPcLo01iEmEQ
KSUsdKJRv1U+NW4IQi3WerApRUV7KRhweibvL1pwSiDPJMNZhXw5DpN0n2fPPXYt514QI6p4jGJI
HfA1nhjFVwY6YCRSRV1ltX44J3I7dVWhXlIoSgK0aattVFlhRzxZWRSME6kh11FGKWRrZX0XAjsN
4m1qQl0LLgeLJjwUfQqHc+JGoy4teT5iRhbAENNldj3MGkU9xTPZVXJFwESuOXMvwUzsKGYL2AMI
Sfp//3//SCtT1AB0AgAAAHVkdAIAAABscnQCAAAAcmxSAAAAACgCAAAAaQAAAABpAQAAACgCAAAA
af////9pAAAAACgCAAAAaQEAAABpAAAAACgCAAAAaQAAAABp/////ygEAAAAUiYAAABSJQAAAFIn
AAAAUigAAAAoCQAAAHQEAAAAdHlwZXQDAAAAc3RyUgsAAAB0CgAAAFZhbHVlRXJyb3JSFwAAAHQF
AAAARmFsc2VSGAAAAHQHAAAAcmVwbGFjZXQEAAAAVHJ1ZSgJAAAAUhkAAABSGwAAAFITAAAAdAEA
AAB5dAIAAABkeHQCAAAAZHl0AgAAAGN4dAIAAABjeXQCAAAAYnQoAAAAACgAAAAAcxEAAABzZWNy
ZXRfbWVzc2FnZS5weVIcAAAALAAAAHMeAAAAAAEwAAwBDwEsAUAABAEcARwBJAAEAQ8BFwENARwB
YwEAAAAEAAAABAAAAEMAAABzcAAAAHwAAGoAAFwCAH0BAH0CAHwAAGoAAGQHAGsDAHI0AGQCAGQB
AHwBABhkAQB8AgAYZgIAFlN0AQB8AABqAgCDAQB9AwB8AwBqAwBkAwCDAQBzVgBkBABTdAQAfAMA
ZAUAIHQFAGoGAGQGAIMBAIMCAFMoCAAAAE5pHwAAAHM3AAAAWW91IGFyZSBzdGlsbCAlZCBsZWZ0
IGFuZCAlZCBhYm92ZSB3aGVyZSB5b3Ugc2hvdWxkIGJlIXMCAAAABIVzHwAAAENoZWF0ZXIhIEdv
IHNvbHZlIGl0IGNvcnJlY3RseSFp/f///3OAAAAAUWZ2ZlhZbjROVHpCR084eWo5cjBOTjk0ekZ6
VEphczEyUDIvak1Uc2QzUFFlNjJIeVh0WXZIaGxrMXFkbHR4SnhpZ0Fxd1ozczlqK2E4dGhBZVlp
M242TWY3RDU4eCtEZzhDWEkvS1FSRzB6UGhyYVl6TGRnNGJ2TVpJYTl4Yz0oAgAAAGkfAAAAaR8A
AAAoBwAAAFIXAAAAUggAAABSGAAAAHQIAAAAZW5kc3dpdGhSFQAAAHQJAAAAX2JpbmFzY2lpdAoA
AABhMmJfYmFzZTY0KAQAAABSGQAAAFITAAAAUi8AAAB0AQAAAGsoAAAAACgAAAAAcxEAAABzZWNy
ZXRfbWVzc2FnZS5weXQFAAAAc29sdmU6AAAAcxAAAAAAAQ8BDwEWAg8BDwAEAhABKAkAAAB0CAAA
AF9fbmFtZV9fdAoAAABfX21vZHVsZV9fUhoAAABSHQAAAFIfAAAAUiEAAABSIwAAAFIcAAAAUjkA
AAAoAAAAACgAAAAAKAAAAABzEQAAAHNlY3JldF9tZXNzYWdlLnB5UhYAAAAbAAAAcw4AAAAGAQkE
CQMJAwkDCQMJDigIAAAAdAcAAABoYXNobGliUgIAAAB0CAAAAGJpbmFzY2lpUjYAAABSCAAAAFIV
AAAAdAYAAABvYmplY3RSFgAAACgAAAAAKAAAAAAoAAAAAHMRAAAAc2VjcmV0X21lc3NhZ2UucHl0
CAAAADxtb2R1bGU+AQAAAHMIAAAADAEMAgkFCRI=
I was wondering if anyone knew exactly what either language this was encoded with or whether or not there is someone out there that CAN help me decode this.
(Note: I tried using a Decoding website but from all the languages I checked, none of them were matching the language I was trying to find out. This has been boggling me ever since I started working on this 2 years ago.)
It appears to be BASE64 encoded at least in parts. Putting it through a BASE64 decoder gives the text below which includes a few hints as to what this might be.
Yc#sDddlZddlZdZdZdefdYZdS(iNcCs:d}x-tdD]}tj||j}qW|S(Ntii(txranget_hashlibtsha256tdigest(tmthti((ssecret_message.pyt_ishasc Cs#td}d}x_tdD]Q}|||t||t|7}||d#||||<||d#d|dS(NiR(ii(t_Robot__post_Robot__stk(tself((ssecret_message.pytinits cCs
|jdS(Ntd(tmove(R((ssecret_message.pytmoveDown scCs
|jdS(Ntl(R(R((ssecret_message.pytmoveLeft#scCs
|jdS(Ntr(R(R((ssecret_message.pyt moveRight&scCs
|jdS(Ntu(R(R((ssecret_message.pytmoveUp)sc Csnt|tks0t|dks0|dkrd#rtS|j|7||||f|x&dD]}|jj|d|_qHWtS(NitdlruiRiRR R"i i#tdul!%()bBQvK6I%YaMYDkw)y*-V5nB-z)EE{)pz&/ZpJ $Y|9tg=v-^#xbH5W:`$RE]e~8'r;uU^R(JimTYaGbFev=E=3UrED9s/L(fIH+StudtlrtrlR(ii(ii(ii(ii(R&R%R'R(( ttypetstrRt
ValueErrorRtFalseRtreplacetTrue( RRRtytdxtdytcxtcytbt((ssecret_message.pyR,s0,#$
cCsp|j\}}|jdkr4dd|d|fSt|j}|jdsVdSt|d tjdS(Nis7You are still %d left and %d above where you should be!ssCheater! Go solve it correctly!isQfvfXYn4NTzBGO8yj9r0NN94zFzTJas12P2/jMTsd3PQe62HyXtYvHhlk1qdltxJxigAqwZ3s9j+a8thAeYi3n6Mf7D58x+Dg8CXI/KQRG0zPhraYzLdg4bvMZIa9xc=(ii(RRRtendswithRt _binasciit
a2b_base64(RRR/tk((ssecret_message.pytsolve:s( t__name__t
__module__RRRR!R#RR9(((ssecret_message.pyRs (thashlibRtbinasciiR6RRtobjectR(((ssecret_message.pyts

Concat in xslt, unable to get an image from server

I am new to xslt coding, I am trying to fix an issue in existing piece of code. I got stuck up at a point
<fo:external-graphic content-width="150pt"
content-height="50pt"
src="url:{concat('${OA_MEDIA}/',$revised_last_name,',',DOCUMENT_BUYER_FIRST_NAME,'.gif')}" />
The above piece of code is trying to find a .gif file in OA_MEDIA directory. Till that part I can understand fine.
When I am placing a file name as "Eckert,Tim.gif" (excluding the quotes) my program isn't picking that file
In the above piece, I printed $revised_lastname and $document_buyer_first_name..It's coming as Tim and Eckert, but it's still not picking the file. If I am hardcoding a file name like below it's working fine
<fo:external-graphic content-width="150pt"
content-height="50pt"
src="url:{concat('${OA_MEDIA}/','Tim','.gif')}" />
How can I print what value is coming into the src in above piece of code so I can see what file is it trying to look in the $OA_MEDIA.
Any suggestions are appreciated.
Thanks!
Your first problem appears to be that in your constructed URI and your file name, you have put the given name and the family name in different orders: Tim,Eckert.gif vs Eckert,Tim.gif. Choose one.
If you still have problems after that, your next step is to confirm that your concatenation is producing the value you expect. I'd add a line like
<xsl:message>Generating fo:external graphic with URI <xsl:value-of
select="concat('url:',
'${OA_MEDIA}/',
$revised_last_name,
',',
DOCUMENT_BUYER_FIRST_NAME,
'.gif')"/></xsl:message>
And if that did not shed light on what was going wrong, I'd insert individual messages to display the current value of $revised_last_name and DOCUMENT_BUYER_FIRST_NAME.
More than that it's difficult to say, because your question does not provide a short, self-contained, complete example that allows readers to reproduce the problem you are trying to solve. There is good advice on asking effective questions in the SO help files and in Eric Raymond and Rick Moen's essay How to ask questions the smart way.

Converting raw print codes to a printer using a VB.net web page

I've run into a situation I've never encountered before. I'm working with a shipping API for an e-commerce site and I've retrieved a shipping label. The problem is that the code used for the label is what appears to be raw print codes. They look something like this:
N
Q822,24
R40,0
S4
D15
ZB
A760,120,1,1,1,1,N,"Interlink Express"
A735,080,1,1,1,1,N,"www.interlinkexpress.com"
A706,033,1,1,1,1,N,"Sender"
.... more codes like this
LO001,330,765,10
LO001,025,765,1
LO001,192,590,1
LO001,330,765,10
.... more codes like this
P1
What I'd like to be able to do is convert it to a graphic or something else that I can print from a web page. The problem is that I can't find any ways to deal with this particular type of information, since it's not in a web-standard format (it's not HTML, XML, JSON, etc.)
How would I deal with something like this?
Thanks.
In the end, I solved this a different way. Apparently the shipping company was able to use an Accept: text/html header from my HttpWebRequest and I got an HTML-formatted printer label. So we're good. Thanks.

phpexcel pdf rendering library has not been defined

After trying and failing to generate PDFs with PHPExcel 1.7.6 (out of memory errors), I upgraded to 1.7.8. I can't for the life of me figure out how to get it working. I've tried tcPDF and mPDF, and it's the same for both.
Putting it back to Excel output, I can see I'm setting the path correctly. All I can get is "PDF Rendering library has not been defined", and I can't figure out what it wants - I've tried 'mPDF5.4', 'MPDF54' (the actual name of the folder itself), 'mpdf', 'mpdf.php'...same each time.
I've been using PHPExcel for over a year, so I'm not entirely new to it. I've lost way more time than I care to admit on this problem, and I haven't found this problem described anywhere, so I'm feeling more than a little stupid that I appear to be the only one that can't figure this out.
The actual code I'm using is the following:
ini_set('include_path', ini_get('include_path').'\\Classes\\');
$rendererName = PHPExcel_Settings::PDF_RENDERER_MPDF;
$rendererLibrary = 'mPDF5.4';
$rendererLibraryPath = ini_get('include_path') . $rendererLibrary;
(That is, pretty well a copy of the example code.)
In the interest of completeness, the headers I'm using are
echo header("Content-Type: application/pdf");
echo header("Content-Disposition: attachment; filename=".$filename.".pdf" );
echo header('Cache-Control: max-age=0');
These near the top of the file, naturally.
Near the end of the file, the output code is
$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'PDF');
$objWriter->save('php://output');
I got it working. Much as I'd like to say I had a breakthrough moment and understand it perfectly, I have no idea how I got it to work. However, in hopes that it might help someone, let me lay out what I did.
I'm running XAMPP on Windows. My file structure has the folder for PHPExcel itself in xampp\php\PEAR\Classes. domPDF is in the same folder, and I renamed it 'dompdf'.
For reasons I no longer recall, I set the include path like so:
ini_set('include_path', ini_get('include_path').'\\Classes\\');
To set the rendering path, I used the following:
$rendererName = PHPExcel_Settings::PDF_RENDERER_DOMPDF;
$rendererLibrary = 'dompdf';
$rendererLibraryPath = ini_get('include_path') . $rendererLibrary;
For the actual writer creation, I'm using the following:
$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'PDF');
$objWriter->save($path.$fullFileName);
One thing I noticed that may have made things different was doing this:
// include 'PHPExcel/Writer/Excel2007.php';
That is, unlike everything I've done in PHPExcel, I'm not including anything from the Writer folder at all. Best I can remember, that's all that's different this time versus a week ago when I asked the question. Once I'd grabbed the 01simple-download-pdf.php file from the Tests folder in 1.7.8, it was mostly a matter of copying the code from it and tweaking it to my paths.
To summarize, leave $rendererName alone. The $rendererLibrary is the name of the folder that contains the library, 'dompdf' in my case. The $rendererLibraryPath is literally setting the path to that folder, so it ends with the path that contains the pdf library folder.
It should be obvious that I'm no uber-leet hax0r, but SO has answered many, many programming questions for me. I'm hoping this helps someone else, so they're not wasting hours like I did.
The PHPOffice contains also PHPWord. I have had the same error message with PHPWord. This is for LINUX. A replacement of 'PhpWord' by 'PhpExcel' should do the job for this case. You must modify the path $rendererLibraryPath to your needs.
$rendererName = \PhpOffice\PhpWord\Settings::PDF_RENDERER_DOMPDF;
$rendererLibraryPath = realpath(__DIR__ . '/../../../../../dompdf-0.6.1');
\PhpOffice\PhpWord\Settings::setPdfRenderer($rendererName, $rendererLibraryPath);
$objWriter = \PhpOffice\PhpWord\IOFactory::createWriter($phpWord, 'PDF');
$objWriter->save('helloWorld.pdf');
If you are getting this error and you have set the correct path to the TCPDF or DOMPDF folder (you do not have to write the full path), then also make sure you have these lines:
if (!PHPExcel_Settings::setPdfRenderer(
$rendererName,
$rendererLibraryPath
)) {
die(
'NOTICE: Please set the $rendererName and $rendererLibraryPath values' .
EOL .
'at the top of this script as appropriate for your directory structure'
);
}