PDFBox - Invalid characters codes with AR PL Zenkai Uni font

PDFBox 2.0.24.
Hi, I'm developing a PDF writer and I need to use the "AR PL Zenkai Uni" font.
When I try to load it, PDFBox crashes with the following error:
Exception in thread "main" java.io.IOException: Invalid characters codes
at org.apache.fontbox.ttf.CmapSubtable.processSubtype12(CmapSubtable.java:257)
at org.apache.fontbox.ttf.CmapSubtable.initSubtable(CmapSubtable.java:111)
at org.apache.fontbox.ttf.CmapTable.read(CmapTable.java:86)
at org.apache.fontbox.ttf.TrueTypeFont.readTable(TrueTypeFont.java:361)
at org.apache.fontbox.ttf.TTFParser.parseTables(TTFParser.java:173)
at org.apache.fontbox.ttf.TTFParser.parse(TTFParser.java:150)
at org.apache.fontbox.ttf.TTFParser.parse(TTFParser.java:106)
at org.apache.pdfbox.pdmodel.font.PDType0Font.load(PDType0Font.java:97)
at com.vgs.pdf.PDFCreatorSandbox.main(PDFCreatorSandbox.java:166)
To load this font I'm using the following code:
PDType0Font brokenFont = PDType0Font.load(document, new FileInputStream("font/ukai.ttf"), false);
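For completeness, the load call sits in a minimal program roughly like the sketch below; apart from the load line and the font path, everything else (page setup, the drawn text, the out.pdf name) is simplified illustrative filler:
import java.io.File;
import java.io.FileInputStream;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType0Font;

public class PDFCreatorSandbox {
    public static void main(String[] args) throws Exception {
        try (PDDocument document = new PDDocument()) {
            PDPage page = new PDPage();
            document.addPage(page);

            // This load call is the line that throws the IOException in the stack trace.
            PDType0Font brokenFont = PDType0Font.load(
                    document, new FileInputStream("font/ukai.ttf"), false);

            // Never reached with this font; shown only to illustrate the intended usage.
            try (PDPageContentStream cs = new PDPageContentStream(document, page)) {
                cs.beginText();
                cs.setFont(brokenFont, 12);
                cs.newLineAtOffset(50, 700);
                cs.showText("Test");
                cs.endText();
            }
            document.save(new File("out.pdf"));
        }
    }
}
As in the one-line snippet above, the third argument (false) tells PDFBox to embed the full font rather than a subset.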
This code was run on Windows 10 with Java 1.8.0_291.
Any suggestions?
Thanks in advance

Related

PyPDF2: Error -5 while decompressing data: incomplete or truncated stream

I'm having a problem with an "incomplete or truncated stream" error while trying to pull data out of a PDF interactive form. Could anyone help me with this, please?
import PyPDF2 as p2
PDFfile = open(fname, "rb")
pdfread = p2.PdfFileReader(PDFfile)
I'm getting the error below when I execute pdfread:
Error -5 while decompressing data: incomplete or truncated stream
It mostly happens when the PDF is already open in another program or the PDF itself is corrupted. Close it in the other program and try opening it again with open().

Uncompilable source code - Erroneous sym type: org.apache.lucene.document.FieldType.setIndexed error when indexing PDF files

I'm trying to index PDF files with Lucene 6.6.0 and PDFBox 2.0.7.
I'm getting the following errors. (EDITED)
run:
Indexing ke folder: 'D:\Kuliah\rancangan document indexing\dir-index\'...
Indexing PDF document: D:\Kuliah\rancangan document indexing\dir-pdf\dua.pdf
Exception in thread "main" java.lang.ExceptionInInitializerError
at tigasepuluh.Playground.indexDocs(Playground.java:110)
at tigasepuluh.Playground.indexDocs(Playground.java:88)
at tigasepuluh.Playground.main(Playground.java:65)
Caused by: java.lang.RuntimeException: Uncompilable source code - Erroneous sym type: org.apache.lucene.document.FieldType.setIndexed
at org.apache.pdfbox.examples.lucene.LucenePDFDocument.<clinit>(LucenePDFDocument.java:123)
... 3 more
C:\Users\abc\AppData\Local\NetBeans\Cache\8.2\executor-snippets\run.xml:53: Java returned: 1
BUILD FAILED (total time: 5 seconds)
And this is the GitHub link to my complete code.
Change this line in your copy of org.apache.pdfbox.examples.lucene.LucenePDFDocument:
TYPE_STORED_NOT_INDEXED.setIndexed(false);
to
TYPE_STORED_NOT_INDEXED.setIndexOptions(IndexOptions.NONE);
The problem you had occurs because the PDFBox example was written for Lucene 4; FieldType.setIndexed no longer exists in Lucene 6, so you set the index options instead.
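If it helps, this is roughly what a stored-but-not-indexed FieldType looks like against the Lucene 6 API; the constant name mirrors the one in the PDFBox example, but the rest of the setup here is only an illustrative sketch, not a copy of that class:
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexOptions;

public final class FieldTypes {
    // Stored so the value can be retrieved from search results, but never indexed.
    public static final FieldType TYPE_STORED_NOT_INDEXED = new FieldType();

    static {
        TYPE_STORED_NOT_INDEXED.setStored(true);
        // Replaces the setIndexed(false) call that existed in the Lucene 4 API.
        TYPE_STORED_NOT_INDEXED.setIndexOptions(IndexOptions.NONE);
        TYPE_STORED_NOT_INDEXED.setTokenized(false);
        TYPE_STORED_NOT_INDEXED.freeze();
    }
}
Note that IndexOptions lives in org.apache.lucene.index in Lucene 6, so you may also need to add that import to your copy of the example class.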

Unable to extract data with double pipe delimiter in Pig Script

I am trying to extract data that is pipe delimited in Pig. The following is my command:
L = LOAD 'entirepath_in_HDFS/b.txt/part-m*' USING PigStorage('||');
I am getting the following error:
2016-08-04 23:58:21,122 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse:
<line 1, column 4> pig script failed to validate: java.lang.RuntimeException: could not instantiate 'PigStorage' with arguments '[||]'
My sample input file has exactly 5 lines, as follows:
POS_TIBCO||HDFS||POS_LOG||1||7806||2016-07-18||1||993||0
POS_TIBCO||HDFS||POS_LOG||2||7806||2016-07-18||1||0||0
POS_TIBCO||HDFS||POS_LOG||3||7806||2016-07-18||1||0||5
POS_TIBCO||HDFS||POS_LOG||4||7806||2016-07-18||1||0||0
POS_TIBCO||HDFS||POS_LOG||5||7806||2016-07-18||1||0||19.99
I tried several options, like using a backslash before the delimiter (\||, \|\|), but everything failed. I also tried with a schema but got the same error. I am using Hortonworks (HDP 2.2.4) and Pig (0.14.0).
Any help is appreciated. Please let me know if you need any further details.
I have faced this case, and after checking the PigStorage source code, I think the PigStorage argument is parsed as only one character.
So we can use this code instead:
L0 = LOAD 'entirepath_in_HDFS/b.txt/part-m*' USING PigStorage('|');
L = FOREACH L0 GENERATE $0,$2,$4,$6,$8,$10,$12,$14,$16;
It's helpful if you know how many columns you have, and it will not affect performance because the projection is done on the map side.
When you load data using PigStorage, it only expects a single character as the delimiter.
However, if you still want to achieve this, you can use MyRegExLoader from piggybank:
REGISTER '/path/to/piggybank.jar';
A = LOAD '/path/to/dataset' USING org.apache.pig.piggybank.storage.MyRegExLoader('||')
as (movieid:int, title:chararray, genre:chararray);

Voice recognition with Julius. How to make .voca file?

I'm making a voice recognition system, and Julius shows decent results for this task.
Words from the sample .voca file are recognized perfectly, but how do I put my own words and transcriptions into the file?
I've tried the latest VoxForge (http://www.voxforge.org/) release and nightly builds for the acoustic models with their vocabulary, but I get a lot of errors at Julius startup, like this:
Error: voca_load_htkdict: line 19: triphone "r-d+v" not found
Error: voca_load_htkdict: line 19: triphone "d-v+aa" not found
Error: voca_load_htkdict: the line content was: 2 [AARDVARK] aa r d v aa r k
Error: voca_load_htkdict: begin missing phones
Error: voca_load_htkdict: r-d+v
Error: voca_load_htkdict: d-v+aa
Error: voca_load_htkdict: end missing phones
Error: init_voca: error in reading /usr/src/custom/julius/quickstart/grammar/sample.dict
ERROR: failed to read dictionary "/usr/src/custom/julius/quickstart/grammar/sample.dict"
ERROR: m_fusion: some error occured in reading grammars
ERROR: Error in loading model
Does anyone know the rules of word transcription for .voca files?
Error reason:
Julius outputs these messages when your word dictionary contains words that were not trained in the acoustic model: "voca_load_htkdict.c" tries to match the triphones in the dict file against the triphone list in the acoustic model, and when it does not find one, it shows this error and stops the program.
Possible solutions:
1. Enable the -forcedict option (or uncomment it in the jconf file) to skip the erroneous words in the dictionary and force Julius to run.
or...
2. Map the "not found" triphone to the closest physical triphone in the HMM list file "tiedlist".
For example:
b-ey+t v-eh+t
The first column is the name of the triphone (generated from your dictionary), and the second column is the name of the HMM actually defined in your AM.
However, this solution is only practical if the "not found" triphones are few, not too many.
The best solution is not to include words in your dict file that are not in the AM.
Note that the first two solutions are for testing Julius only; for production or commercial projects you must train the acoustic model and the language model on the same corpus.

error when trying to import ps file by grImport in R

I need to create a PDF file with several charts created by ggplot2 arranged on an A4 page, and repeat it 20-30 times.
I export the ggplot2 chart to a PS file and try to PostScriptTrace it as instructed in grImport, but it just keeps giving me an "Unrecoverable error, exit code 1" error.
If I ignore the error and try to import the generated XML file into an R object, I get another error:
attributes construct error
Couldn't find end of Start Tag text line 21
Premature end of data in tag picture line 3
Error: 1: attributes construct error
2: Couldn't find end of Start Tag text line 21
3: Premature end of data in tag picture line 3
What's wrong here?
Thanks!
If you have no time to deal with Sweave, you could also write a simple TeX document from R after generating the plots, which you could later compile to pdf.
E.g.:
ggsave(p, file = paste('filename', id, '.pdf', sep = ''))
cat(paste('\\includegraphics{',
          paste('filename', id, '.pdf', sep = ''), '}\n', sep = ''),
    file = 'report.tex', append = TRUE)
Later, after wrapping report.tex in a minimal \documentclass preamble with \begin{document} and \end{document}, you can easily compile it to PDF with, for example, pdflatex.