Is it possible to get a PDF font's encoding value?


Adobe shows Encoding: Built-in
I would like to get the encoding value 'Built-in'. I am currently using iTextSharp, and there doesn't seem to be any straightforward way to get it.
The only way I have found to get a font's encoding value is:
Dim fonts As PdfDictionary = resources.GetAsDict(PdfName.FONT)
' font is presumably one entry of fonts (for example fonts.GetAsDict(key) for a key in fonts.Keys)
Dim fontEncoding As String = font.GetAsName(PdfName.ENCODING).ToString().Substring(1)
but this does not work for every encoding value.
For the Built-in and Custom encodings, font.GetAsName(PdfName.ENCODING) returns Nothing. For Ansi, it does return a value: 'WinAnsiEncoding'.
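One possible explanation for the missing values, sketched in Python with plain dictionaries standing in for PDF font dictionaries (this is not the iTextSharp API): in the PDF format a font's /Encoding entry is optional, and when present it can be either a name or a dictionary. A name lookup therefore comes back empty both when the entry is missing (the font then uses its built-in encoding) and when the entry is a dictionary. The mapping of those two cases to the labels "Built-in" and "Custom" is an assumption, and describe_encoding is a hypothetical helper name.

```python
def describe_encoding(font: dict) -> str:
    """Classify the /Encoding entry of a font dictionary (hypothetical helper)."""
    encoding = font.get("Encoding")
    if encoding is None:
        # No entry at all: the font falls back to its built-in encoding.
        return "Built-in"
    if isinstance(encoding, str):
        # A name such as /WinAnsiEncoding: strip the leading slash.
        return encoding.lstrip("/")
    if isinstance(encoding, dict):
        # An encoding dictionary (for example with /Differences).
        # Assumption: this is the case a viewer labels "Custom".
        return "Custom"
    raise TypeError("unexpected /Encoding value: %r" % (encoding,))


print(describe_encoding({"Encoding": "/WinAnsiEncoding"}))
print(describe_encoding({}))
print(describe_encoding({"Encoding": {"Differences": [128, "/Euro"]}}))
```

The same three-way check could be written against a real PDF library by asking for the raw /Encoding object first and only then testing whether it is a name or a dictionary.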

Related

VB.net - convert PDF bytes to string

I have the response below, in JSON format, from an API request that prints something; it contains PDF bytes:
%PDF-1.7 %??5 0 obj <</Filter/FlateDecode/Alternate/DeviceRGB/Length 2592/N 3>>stream x???wTS???7??" %? ?H?. ! BB?+?#??4E?A??Q,??O?ADG??F?y??g}k????.
I need to convert that into a string using VB, and I haven't been able to find anything on the web that helps.
Can someone give me a hint on how to do this? Thanks

Use the Base64 methods to convert between Byte() and String.
' pdfByte is the Base64 string taken from the JSON response
Dim b As Byte() = Convert.FromBase64String(pdfByte)
Dim finalString As String = System.Text.Encoding.UTF8.GetString(b)
By the way, I don't know how the JSON response is created, but something is wrong with it. You cannot get anything useful out of x???wTS???7??" %? ?H?. ! BB?+?#??4E?A??Q,??O?ADG??F?y??g}k????.
The problem is on the other side: it should convert the bytes to a Base64 string, and then this will work.
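To illustrate why the sender should Base64-encode the bytes, here is a minimal Python sketch. The sample bytes are a made-up stand-in for a real PDF, not the data from the question. It shows that pushing raw binary through a text decoding loses information, while a Base64 round trip returns the original bytes.

```python
import base64

# Hypothetical sample: a PDF header followed by a few binary marker bytes.
pdf_bytes = b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n5 0 obj\n"

# Treating raw binary as text is lossy: invalid sequences become U+FFFD.
as_text = pdf_bytes.decode("utf-8", errors="replace")
round_trip_text = as_text.encode("utf-8")

# Base64 maps every byte to ASCII, so it survives inside a JSON string.
as_base64 = base64.b64encode(pdf_bytes).decode("ascii")
round_trip_base64 = base64.b64decode(as_base64)

print(round_trip_text == pdf_bytes)    # False: the binary bytes were replaced
print(round_trip_base64 == pdf_bytes)  # True: Base64 is reversible
```

Once the "?" characters are in the response, the original bytes cannot be recovered on the receiving side, which is why the fix has to happen where the JSON is produced.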

Only get DateTimeOriginal with exiftool

Hey community,
For a few days now I've been stuck trying to get the date a .jpg or .png image was taken.
I believe it is called DateTimeOriginal.
What I'm trying to do is get just this one specific piece of info, DateTimeOriginal, no more, no less.
This is part of a self-made project, a program to sort pictures by the date they were taken.
I'm programming in VB, and for the EXIF data I'm calling a batch file.
I know how to use exiftool. Its common usage is:
exiftool file.jpg
But I need something like:
exiftool -DateTimeOriginal file.jpg >> DateTaken.txt
I have tried this, but I'm not getting the date; I only get a list of the JPG files found in the directory, without metadata.
I have searched for a long time for an option like this, but I can't find anything useful. Perhaps there is another, more efficient way to get an image's metadata using only VB.
Does anyone have advice or another idea?
Thanks

You have the correct command to get the DateTimeOriginal tag from a file (exiftool -DateTimeOriginal file.jpg). But you say you are getting a list of filenames in a directory, which sounds like you're passing a directory name, not a file name. If you wish to get DateTimeOriginal for only those files in a directory that have a value in the tag, use exiftool -if "$DateTimeOriginal" -DateTimeOriginal C:/path/to/dir. Any file that doesn't have a DateTimeOriginal will not be listed then.
One thing to note is that the Windows "Date Taken" property is filled from a variety of metadata tags depending on the file type. For example, in PNG files, Windows uses PNG:CreationTime. In JPG files, Windows uses, in order, EXIF:DateTimeOriginal, IPTC:DateCreated + IPTC:TimeCreated, XMP:CreateDate, EXIF:CreateDate, and then XMP:DateTimeOriginal.

After a bit of digging, I found that you can get a list of properties of a bitmap.
Unfortunately, the property IDs are numeric and rather cryptic.
Have a look here to find out more.
After a bit more digging, it seems that property ID &H132 (a hexadecimal number) is the date, stored as an array of bytes holding ASCII characters. This function finds property ID &H132 and returns the date info as a string in year:month:day hour:minute:second format. Note that &H132 is the generic EXIF DateTime tag; DateTimeOriginal has its own ID, &H9003.
You might get variations with localization, for example /, : or - as the date separators, so to parse it as a date type you might need to work around that.
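Public Function GetImageTakenDate(theimage As Bitmap) As String
    ' Requires Imports System.Drawing, System.Drawing.Imaging and System.Linq
    Dim propItems As List(Of PropertyItem) = theimage.PropertyItems.ToList
    ' Property &H132 holds the date as a null-terminated ASCII string
    Dim dt As PropertyItem = propItems.Find(Function(x) x.Id = &H132)
    ' Return an empty string if the image has no such property
    If dt Is Nothing Then Return ""
    Dim datestring As String = ""
    For Each ch As Integer In dt.Value
        datestring += Chr(ch)
    Next
    ' Drop the trailing null terminator
    datestring = datestring.Remove(datestring.Length - 1)
    Return datestring
End Function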
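The string returned above still has to be turned into a real date before pictures can be sorted by it. As a rough sketch in Python (parse_exif_datetime is a hypothetical helper name, and the handling of / and - separators is an assumption based on the variations mentioned above), the parsing step could look like this:

```python
import re
from datetime import datetime


def parse_exif_datetime(raw: str) -> datetime:
    """Parse 'YYYY:MM:DD HH:MM:SS', tolerating / or - in the date part."""
    # Drop a trailing null terminator and surrounding whitespace, if any.
    cleaned = raw.rstrip("\x00").strip()
    date_part, _, time_part = cleaned.partition(" ")
    # Normalize alternative date separators to the colon form.
    date_part = re.sub(r"[/\-]", ":", date_part)
    return datetime.strptime(date_part + " " + time_part, "%Y:%m:%d %H:%M:%S")


# Sorting made-up (file name, date string) pairs by the parsed date:
photos = [("b.jpg", "2021:07:04 18:30:05"), ("a.jpg", "2019/01/15 09:00:00")]
photos.sort(key=lambda item: parse_exif_datetime(item[1]))
print([name for name, _ in photos])
```

A string that does not fit the pattern raises ValueError, which a sorting program would probably want to catch and treat as "no date available".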

Special characters in PDF form fields and global and fieldbased DR

I have a question regarding some weird form field behaviour.
Two PDF documents; both have text field(s) using Helvetica as the font.
Both are filled with values using the same iText logic (see below).
The field value (/V) is correct for both PDFs; however, the field appearance is not.
One PDF works fine; the other scrambles special characters like the euro symbol € or German characters like üöäß.
I tried to define a substitute font (as described in the book), but never got € and ß to work.
The only difference I could find is that a /DR dictionary is defined at field level for the non-working PDF (in addition to the global one). But if I remove it, the € sign still doesn't work. Please note that I am not talking about Asian or some exotic Unicode characters here: all are part of the standard Helvetica font (as the other PDF proves).
Question(s):
Any ideas how to get the non-working PDF to display the characters correctly?
Or does the PDF violate the PDF spec somehow? (It was created using Acrobat, which makes that unlikely but not impossible.)
If you suggest replacing the form field font, how can I differentiate between working and non-working PDF files? I don't want to do that for perfectly valid, working files.
Update: The code is not the problem (I am certain of that, since it's the same code for both); however, for the sake of completeness, here it is:
AcroFields acroFields = stamper.getAcroFields();
try {
    boolean successful = acroFields.setField("Mitarbeiter", "öäüß€#");
    if (!successful) {
        // throw some exception
    }
}
catch (DocumentException de) {
    // some exception handling
}

I didn't find any clues in the PDF reference about this, but the font that is used for the field doesn't define an encoding. However: an encoding is defined at the level of the resource dictionary (/DR). If you use that encoding, then the appearance of the field is created correctly. Note that the ISO specification doesn't say anything about the existence of an /Encoding entry at the level of the resource dictionary.
I've made a small update to iText. You can check the changes in revision 6693. This way, iText will now check if the /DR dictionary has encoding values in case no encoding is defined at the level of the font. With this fix, your form is filled out correctly.
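The fallback described here can be sketched in a few lines of Python. This is only an illustration with plain dictionaries as stand-ins for the font and /DR dictionaries; it is not iText's actual code, and resolve_field_encoding and the shape of the /DR encoding entry are assumptions.

```python
from typing import Any, Optional


def resolve_field_encoding(font: dict, default_resources: dict) -> Optional[Any]:
    """Prefer the font's own /Encoding; otherwise fall back to the one in /DR."""
    own = font.get("Encoding")
    if own is not None:
        return own
    # Assumption: /DR carries its encoding information under an "Encoding" key.
    return default_resources.get("Encoding")


# Hypothetical dictionaries mirroring the situation in the question:
helvetica = {"BaseFont": "Helvetica"}          # no /Encoding on the font
dr = {"Encoding": "PDFDocEncoding"}            # encoding only at /DR level
print(resolve_field_encoding(helvetica, dr))
```

When neither level defines an encoding, the sketch returns None, which corresponds to leaving the font on its built-in encoding.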

VB.NET 2010 - Chr() Function of another code page

When I call Chr(225), I get the character "á", because the Windows code page is 1250 (System.Globalization.CultureInfo.CurrentCulture.TextInfo.ANSICodePage).
Is it possible to use Chr(225) but get the character from another code page?
For example, in code page DOS-852, code 225 represents the character "ß".
I need to convert "á" to "ß".
Is it possible to get a character from DOS code page 852?
For example, Chr(225) should return "ß".
Thanks!

You can get an Encoding for a specific code page, 852 in your case:
Dim enc = Text.Encoding.GetEncoding(852)
Dim str = enc.GetString(New Byte() {225})
Have a look at the Encoding class in general for conversions between text encodings.
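The same idea, that a byte value only becomes a character once a code page is chosen, can be shown in a short Python sketch. It uses Python's built-in cp1250 and cp852 codecs rather than the .NET Encoding class, so it illustrates the principle, not the VB.NET API.

```python
raw = bytes([225])

# The same byte decodes to different characters under different code pages.
as_cp1250 = raw.decode("cp1250")
as_cp852 = raw.decode("cp852")
print(as_cp1250, as_cp852)  # á ß

# Converting "á" to "ß": encode with the code page the character came from,
# then decode the resulting byte with the target code page.
converted = "á".encode("cp1250").decode("cp852")
print(converted)  # ß
```

The conversion only makes sense when the byte value, not the character, is the real data, for example when reading a file that was written under DOS code page 852.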

How do I match non-ASCII characters with RegexKitLite?

I am using RegexKitLite and I'm trying to match a pattern.
The following regex patterns do not capture my word, which includes an N with a tilde: ñ.
Is there a string conversion I am missing?
subjectString = @"define_añadir";
//regexString = @"^define_(.*)"; //this pattern does not match, so I assumed I had to add the ñ
//regexString = @"^define_([.ñ]*)"; //tried this pattern first with a range
regexString = @"^define_((?:\\w|ñ)*)"; //tried second
NSString *captured= [subjectString stringByMatching:regexString capture:1L];
//I want captured == añadir

Looks like an encoding problem to me. Either you're saving the source code in an encoding that can't handle that character (like ASCII), or the compiler is using the wrong encoding to read the source files. Going back to the original regex, try creating the subject string like this:
subjectString = @"define_a\xC3\xB1" @"adir"; // literal split so the \xB1 escape does not absorb the hex digits "ad" that follow
or this:
subjectString = @"define_a\u00F1adir";
If that works, check the encoding of your source code files and make sure it's the same encoding the compiler expects.
EDIT: I've never worked with the iPhone technology stack, but according to this doc you should be using the stringWithUTF8String method to create the NSString, not the @"" literal syntax. In fact, it says you should never use non-ASCII characters (that is, anything not in the range 0x00..0x7F) in your code; that way you never have to worry about the source file's encoding. That's good advice no matter what language or toolset you're using.
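To make the suspected encoding mismatch concrete, here is a small Python sketch (Python's re module stands in for RegexKitLite, so this only illustrates the effect, not the library). It builds the subject string from an escape, shows its UTF-8 bytes, and then compares the capture with what happens when those bytes are misread as Latin-1.

```python
import re

# Built from an escape, so the source file's encoding does not matter.
subject = "define_a\u00F1adir"

# U+00F1 is stored in UTF-8 as the two bytes C3 B1.
utf8_bytes = subject.encode("utf-8")

# Simulate a tool that reads those UTF-8 bytes with the wrong encoding.
misread = utf8_bytes.decode("latin-1")

pattern = re.compile(r"^define_(.*)")
good_capture = pattern.match(subject).group(1)
bad_capture = pattern.match(misread).group(1)

print(good_capture)  # añadir
print(bad_capture)   # aÃ±adir
```

The pattern itself is unchanged in both cases; only the interpretation of the bytes differs, which is why checking the source file encoding comes before changing the regex.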
