How to decompress/deflate PDF Stream

Working with the 2016-W4 PDF, which has 2 large streams (pages 1 & 2), along with a bunch of other objects and smaller streams. I'm trying to decompress (FlateDecode) the stream(s) to work with the source data, but am struggling: I only get corrupt input and invalid checksum errors.
I've written a test script to help debug, and have pulled out smaller streams from the file to test with.
Here are 2 streams from the original pdf, along with their length objects:
stream 1:
149 0 obj
<< /Length 150 0 R /Filter /FlateDecode /Type /XObject /Subtype /Form /FormType
1 /BBox [0 0 8 8] /Resources 151 0 R >>
stream
x+TT(T0�B ,JUWÈS0Ð37±402V(NFJS�þ¶
«
endstream
endobj
150 0 obj
42
endobj
stream 2:
142 0 obj
<< /Length 143 0 R /Filter /FlateDecode /Type /XObject /Subtype /Form /FormType
1 /BBox [0 0 0 0] /Resources 144 0 R >>
stream
x+T�ç�ã
endstream
endobj
143 0 obj
11
endobj
I copied just the stream contents into new files within Vim (excluding the carriage returns after stream and before endstream).
I've tried both:
compress/flate (rfc-1951) – (removing the first 2 bytes (CMF, FLG))
compress/zlib (rfc-1950)
I've converted the streams to []byte for the below:
package main
import (
"bytes"
"compress/flate"
"compress/gzip"
"compress/zlib"
"fmt"
"io"
"os"
)
var (
flateReaderFn = func(r io.Reader) (io.ReadCloser, error) { return flate.NewReader(r), nil }
zlibReaderFn = func(r io.Reader) (io.ReadCloser, error) { return zlib.NewReader(r) }
)
func deflate(b []byte, skip, length int, newReader func(io.Reader) (io.ReadCloser, error)) {
// rfc-1950
// --------
// First 2 bytes
// [120, 1] - CMF, FLG
//
// CMF: 120
// 0111 1000
// ↑ ↑
// | CM(8) = deflate compression method
// CINFO(7) = 32k LZ77 window size
//
// FLG: 1
// 0001 ← FCHECK
// (CMF*256 + FLG) % 31 == 0
// 120 * 256 + 1 = 30721
// 30721 % 31 == 0
stream := bytes.NewReader(b[skip:length])
r, err := newReader(stream)
if err != nil {
fmt.Println("\nfailed to create reader,", err)
return
}
n, err := io.Copy(os.Stdout, r)
if err != nil {
if n > 0 {
fmt.Print("\n")
}
fmt.Println("\nfailed to write contents from reader,", err)
return
}
fmt.Printf("%d bytes written\n", n)
r.Close()
}
func main() {
//readerFn, skip := flateReaderFn, 2 // compress/flate RFC-1951, ignore first 2 bytes
readerFn, skip := zlibReaderFn, 0 // compress/zlib RFC-1950, ignore nothing
// ⤹ This is where the error occurs: `flate: corrupt input before offset 19`.
stream1 := []byte{120, 1, 43, 84, 8, 84, 40, 84, 48, 0, 66, 11, 32, 44, 74, 85, 8, 87, 195, 136, 83, 48, 195, 144, 51, 55, 194, 177, 52, 48, 50, 86, 40, 78, 70, 194, 150, 74, 83, 8, 4, 0, 195, 190, 194, 182, 10, 194, 171, 10}
stream2 := []byte{120, 1, 43, 84, 8, 4, 0, 1, 195, 167, 0, 195, 163, 10}
fmt.Println("----------------------------------------\nStream 1:")
deflate(stream1, skip, 42, readerFn) // flate: corrupt input before offset 19
fmt.Println("----------------------------------------\nStream 2:")
deflate(stream2, skip, 11, readerFn) // invalid checksum
}
I'm sure I'm doing something wrong somewhere, I just can't quite see it.
(The pdf does open in a viewer)

Binary data should never be copied out of or saved from a text editor. There might be cases where this appears to succeed, which only adds fuel to the fire.
The data you eventually "mined out" of the PDF is most likely not identical to the actual data in the PDF. You should take the data from a hex editor (e.g. try hecate for something new), or write a simple app that saves it, treating the file strictly as binary.
Hint #1:
The binary data is displayed spread across multiple lines. Binary data does not contain carriage returns; that's a textual control. If it appears to, that means the editor interpreted the data as text, and so some codes / characters were "consumed" to start a new line. Multiple byte sequences may be interpreted as the same newline (e.g. \n, \r\n). By excluding them you already have data loss; by including them you might already have a different sequence. And since the data was interpreted and displayed as text, more problems can arise: there are other control characters, and some characters may not appear at all when displayed.
Hint #2:
When flateReaderFn is used, decoding the 2nd example succeeds (completes without an error). This means you were barking up the right tree, but success depends on what the actual data is and to what extent it was "distorted" by the text editor.

Okay, confession time...
I was so caught up in trying to understand deflate that I completely overlooked the fact that Vim wasn't saving the stream contents correctly into the new files. So I spent quite a bit of time reading the RFCs and digging through the internals of the Go compress/... packages, assuming the problem was in my code.
Shortly after I posted my question I tried reading the PDF as a whole, finding the stream/endstream locations, and pushing that through deflate. As soon as I saw the content scroll through the screen I realized my dumb mistake.
+1 #icza, that was exactly my issue.
It was good in the end, as I have a much better understanding of the whole process than if it had just worked the first time around.
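For anyone hitting the same wall, here is a rough sketch of that working approach: read the PDF strictly as binary, locate the stream data, and inflate it with compress/zlib. The file name is a placeholder, and a real tool would parse the object dictionary and use its /Length rather than searching for the keywords:
package main

import (
    "bytes"
    "compress/zlib"
    "fmt"
    "io"
    "io/ioutil"
    "os"
)

func main() {
    // Read the whole PDF as binary; no text editor involved.
    data, err := ioutil.ReadFile("fw4.pdf") // placeholder path
    if err != nil {
        fmt.Println(err)
        os.Exit(1)
    }

    // Locate the first stream ... endstream pair (naive keyword search).
    start := bytes.Index(data, []byte("stream"))
    end := bytes.Index(data, []byte("endstream"))
    if start < 0 || end < 0 {
        fmt.Println("no stream found")
        os.Exit(1)
    }
    start += len("stream")
    // Skip the EOL that follows the "stream" keyword (CRLF or LF).
    if data[start] == '\r' {
        start++
    }
    if data[start] == '\n' {
        start++
    }

    // FlateDecode stream content is zlib (RFC 1950) data.
    r, err := zlib.NewReader(bytes.NewReader(data[start:end]))
    if err != nil {
        fmt.Println(err)
        os.Exit(1)
    }
    defer r.Close()
    io.Copy(os.Stdout, r)
}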

Extracting objects from a PDF can be tricky, depending on the filters used. A filter can also have additional options which need to be handled correctly.
For anyone who wants to extract an object without dealing with the low-level details of the process, getting a single object from a PDF and decoding it can be done as follows:
package main
import (
"fmt"
"os"
"strconv"
"github.com/unidoc/unipdf/v3/core"
"github.com/unidoc/unipdf/v3/model"
)
func main() {
objNum := 149 // Get object 149
err := inspectPdfObject("input.pdf", objNum)
if err != nil {
fmt.Printf("Error: %v\n", err)
os.Exit(1)
}
}
func inspectPdfObject(inputPath string, objNum int) error {
f, err := os.Open(inputPath)
if err != nil {
return err
}
defer f.Close()
pdfReader, err := model.NewPdfReader(f)
if err != nil {
return err
}
isEncrypted, err := pdfReader.IsEncrypted()
if err != nil {
return err
}
if isEncrypted {
// If encrypted, try decrypting with an empty one.
// Can also specify a user/owner password here by modifying the line below.
auth, err := pdfReader.Decrypt([]byte(""))
if err != nil {
fmt.Printf("Decryption error: %v\n", err)
return err
}
if !auth {
fmt.Println(" This file is encrypted with opening password. Modify the code to specify the password.")
return nil
}
}
obj, err := pdfReader.GetIndirectObjectByNumber(objNum)
if err != nil {
return err
}
fmt.Printf("Object %d: %s\n", objNum, obj.String())
if stream, is := obj.(*core.PdfObjectStream); is {
decoded, err := core.DecodeStream(stream)
if err != nil {
return err
}
fmt.Printf("Decoded:\n%s", decoded)
} else if indObj, is := obj.(*core.PdfIndirectObject); is {
fmt.Printf("%T\n", indObj.PdfObject)
fmt.Printf("%s\n", indObj.PdfObject.String())
}
return nil
}
A full example: pdf_get_object.go
Disclosure: I am the original developer of UniPDF.

Related

Use Gob to write logs to a file in an append style

Would it be possible to use Gob encoding for appending structs in series to the same file using append? It works for writing, but when reading with the decoder more than once I run into:
extra data in buffer
So I wonder whether that's possible in the first place, or whether I should use something like JSON and append documents on a per-line basis instead. The alternative would be to serialize a slice, but reading it back as a whole would defeat the purpose of appending.
The gob package wasn't designed to be used this way. A gob stream has to be written by a single gob.Encoder, and it also has to be read by a single gob.Decoder.
The reason for this is because the gob package not only serializes the values you pass to it, it also transmits data to describe their types:
A stream of gobs is self-describing. Each data item in the stream is preceded by a specification of its type, expressed in terms of a small set of predefined types.
This is state held by the encoder / decoder (about what types have been transmitted, and how); a subsequent new encoder / decoder will not (cannot) analyze the preceding stream to reconstruct the same state and continue where a previous encoder / decoder left off.
Of course if you create a single gob.Encoder, you may use it to serialize as many values as you'd like to.
Also you can create a gob.Encoder and write to a file, and then later create a new gob.Encoder, and append to the same file, but you must use 2 gob.Decoders to read those values, exactly matching the encoding process.
As a demonstration, let's follow an example. This example will write to an in-memory buffer (bytes.Buffer). 2 subsequent encoders will write to it, then we will use 2 subsequent decoders to read the values. We'll write values of this struct:
type Point struct {
X, Y int
}
For short, compact code, I use this "error handler" function:
func he(err error) {
if err != nil {
panic(err)
}
}
And now the code:
const n, m = 3, 2
buf := &bytes.Buffer{}
e := gob.NewEncoder(buf)
for i := 0; i < n; i++ {
he(e.Encode(&Point{X: i, Y: i * 2}))
}
e = gob.NewEncoder(buf)
for i := 0; i < m; i++ {
he(e.Encode(&Point{X: i, Y: 10 + i}))
}
d := gob.NewDecoder(buf)
for i := 0; i < n; i++ {
var p *Point
he(d.Decode(&p))
fmt.Println(p)
}
d = gob.NewDecoder(buf)
for i := 0; i < m; i++ {
var p *Point
he(d.Decode(&p))
fmt.Println(p)
}
Output (try it on the Go Playground):
&{0 0}
&{1 2}
&{2 4}
&{0 10}
&{1 11}
Note that if we used only 1 decoder to read all the values (looping until i < n + m), we'd get the same error message you posted in your question once the iteration reaches the (n+1)th value, because the subsequent data is not a serialized Point but the start of a new gob stream.
So if you want to stick with the gob package for doing what you want to do, you have to slightly modify your encoding / decoding process: you have to somehow mark the boundaries where a new encoder starts (so that when decoding, you'll know you have to create a new decoder to read the subsequent values).
You may use different techniques to achieve this:
You may write out a number, a count, before you proceed to write values; this number would tell how many values were written using the current encoder (see the sketch after this list).
If you don't want to or can't tell how many values will be written with the current encoder, you may opt to write out a special end-of-encoder value when you don't write more values with the current encoder. When decoding, if you encounter this special end-of-encoder value, you'll know you have to create a new decoder to be able to read more values.
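Here is a minimal sketch of the count-prefix technique (it reuses the Point type, the he() helper and the imports from the example above; it is only an illustration, not the only way to do this):
buf := &bytes.Buffer{}

// writeBatch uses a fresh encoder and records how many values it wrote.
writeBatch := func(points []Point) {
    e := gob.NewEncoder(buf)
    he(e.Encode(len(points))) // the count comes first
    for i := range points {
        he(e.Encode(&points[i]))
    }
}

// readBatch uses a fresh decoder and reads exactly that many values.
readBatch := func() []Point {
    d := gob.NewDecoder(buf)
    var count int
    he(d.Decode(&count))
    ps := make([]Point, count)
    for i := range ps {
        he(d.Decode(&ps[i]))
    }
    return ps
}

writeBatch([]Point{{1, 2}, {3, 4}})
writeBatch([]Point{{5, 6}})
fmt.Println(readBatch()) // [{1 2} {3 4}]
fmt.Println(readBatch()) // [{5 6}]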
Some things to note here:
The gob package is most efficient, most compact if only a single encoder is used, because each time you create and use a new encoder, the type specifications will have to be re-transmitted, causing more overhead, and making the encoding / decoding process slower.
You can't seek in the data stream, you can only decode any value if you read the whole file from the beginning up until the value you want. Note that this somewhat applies even if you use other formats (such as JSON or XML).
If you want seeking functionality, you'd need to manage an index file separately, which would tell at which positions new encoders / decoders start, so you could seek to that position, create a new decoder, and start reading values from there.
Check a related question: Efficient Go serialization of struct to disk
In addition to the above, I suggest using an intermediate structure to exclude the gob header:
package main
import (
"bytes"
"encoding/gob"
"fmt"
"io"
"log"
)
type Point struct {
X, Y int
}
func main() {
buf := new(bytes.Buffer)
enc, _, err := NewEncoderWithoutHeader(buf, new(Point))
if err != nil {
log.Fatal(err)
}
enc.Encode(&Point{10, 10})
fmt.Println(buf.Bytes())
}
type HeaderSkiper struct {
src io.Reader
dst io.Writer
}
func (hs *HeaderSkiper) Read(p []byte) (int, error) {
return hs.src.Read(p)
}
func (hs *HeaderSkiper) Write(p []byte) (int, error) {
return hs.dst.Write(p)
}
func NewEncoderWithoutHeader(w io.Writer, sample interface{}) (*gob.Encoder, *bytes.Buffer, error) {
hs := new(HeaderSkiper)
hdr := new(bytes.Buffer)
hs.dst = hdr
enc := gob.NewEncoder(hs)
// Write sample with header info
if err := enc.Encode(sample); err != nil {
return nil, nil, err
}
// Change writer
hs.dst = w
return enc, hdr, nil
}
func NewDecoderWithoutHeader(r io.Reader, hdr *bytes.Buffer, dummy interface{}) (*gob.Decoder, error) {
hs := new(HeaderSkiper)
hs.src = hdr
dec := gob.NewDecoder(hs)
if err := dec.Decode(dummy); err != nil {
return nil, err
}
hs.src = r
return dec, nil
}
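For completeness, here is a sketch of how the two helpers above might be paired; this usage is my assumption, not part of the original listing (hdr is the header buffer returned by NewEncoderWithoutHeader, and Point is the struct defined above):
buf := new(bytes.Buffer)
enc, hdr, err := NewEncoderWithoutHeader(buf, new(Point))
if err != nil {
    log.Fatal(err)
}
enc.Encode(&Point{10, 10})
enc.Encode(&Point{20, 20})

// buf now holds only header-less value data; hdr holds the header bytes.
dec, err := NewDecoderWithoutHeader(bytes.NewReader(buf.Bytes()), hdr, new(Point))
if err != nil {
    log.Fatal(err)
}
var p Point
for dec.Decode(&p) == nil {
    fmt.Println(p) // {10 10}, then {20 20}
}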
In addition to icza's great answer, you could use the following trick to append to a gob file that already contains data: when appending, first encode and discard a dummy value (which writes the headers):
Create the file and encode gobs as usual (the first encode writes the headers)
Close the file
Open the file for appending
Using an intermediate writer, encode a dummy struct (which writes the headers)
Reset the writer
Encode gobs as usual (no headers are written)
Example:
package main
import (
"bytes"
"encoding/gob"
"fmt"
"io"
"io/ioutil"
"log"
"os"
)
type Record struct {
ID int
Body string
}
func main() {
r1 := Record{ID: 1, Body: "abc"}
r2 := Record{ID: 2, Body: "def"}
// encode r1
var buf1 bytes.Buffer
enc := gob.NewEncoder(&buf1)
err := enc.Encode(r1)
if err != nil {
log.Fatal(err)
}
// write to file
err = ioutil.WriteFile("/tmp/log.gob", buf1.Bytes(), 0600)
if err != nil {
log.Fatal()
}
// encode dummy (which write headers)
var buf2 bytes.Buffer
enc = gob.NewEncoder(&buf2)
err = enc.Encode(Record{})
if err != nil {
log.Fatal(err)
}
// remove dummy
buf2.Reset()
// encode r2
err = enc.Encode(r2)
if err != nil {
log.Fatal(err)
}
// open file
f, err := os.OpenFile("/tmp/log.gob", os.O_WRONLY|os.O_APPEND, 0600)
if err != nil {
log.Fatal(err)
}
// write r2
_, err = f.Write(buf2.Bytes())
if err != nil {
log.Fatal(err)
}
// decode file
data, err := ioutil.ReadFile("/tmp/log.gob")
if err != nil {
log.Fatal(err)
}
var r Record
dec := gob.NewDecoder(bytes.NewReader(data))
for {
err = dec.Decode(&r)
if err == io.EOF {
break
}
if err != nil {
log.Fatal(err)
}
fmt.Println(r)
}
}

Efficient Go serialization of struct to disk

I've been tasked with replacing C++ code with Go and I'm quite new to the Go APIs. I am using gob for encoding hundreds of key/value entries to disk pages, but the gob encoding has too much bloat that's not needed.
package main
import (
"bytes"
"encoding/gob"
"fmt"
)
type Entry struct {
Key string
Val string
}
func main() {
var buf bytes.Buffer
enc := gob.NewEncoder(&buf)
e := Entry { "k1", "v1" }
enc.Encode(e)
fmt.Println(buf.Bytes())
}
This produces a lot of bloat that I don't need:
[35 255 129 3 1 1 5 69 110 116 114 121 1 255 130 0 1 2 1 3 75 101 121 1 12 0 1 3 86 97 108 1 12 0 0 0 11 255 130 1 2 107 49 1 2 118 49 0]
I want to serialize each string's len followed by the raw bytes like:
[0 0 0 2 107 49 0 0 0 2 118 49]
I am saving millions of entries, so the additional bloat in the encoding increases the file size by roughly 10x.
How can I serialize it to the latter form without manual coding?
If you zip a file named a.txt containing the text "hello" (which is 5 characters), the resulting zip will be around 115 bytes. Does this mean the zip format is not efficient for compressing text files? Certainly not. There is an overhead. If the file contains "hello" a hundred times (500 bytes), zipping it results in a file of about 120 bytes. 1x"hello" => 115 bytes, 100x"hello" => 120 bytes! We added 495 bytes, and yet the compressed size only increased by 5 bytes.
Something similar is happening with the encoding/gob package:
The implementation compiles a custom codec for each data type in the stream and is most efficient when a single Encoder is used to transmit a stream of values, amortizing the cost of compilation.
When you "first" serialize a value of a type, the definition of the type also has to be included / transmitted, so the decoder can properly interpret and decode the stream:
A stream of gobs is self-describing. Each data item in the stream is preceded by a specification of its type, expressed in terms of a small set of predefined types.
Let's return to your example:
var buf bytes.Buffer
enc := gob.NewEncoder(&buf)
e := Entry{"k1", "v1"}
enc.Encode(e)
fmt.Println(buf.Len())
It prints:
48
Now let's encode a few more of the same type:
enc.Encode(e)
fmt.Println(buf.Len())
enc.Encode(e)
fmt.Println(buf.Len())
Now the output is:
60
72
Try it on the Go Playground.
Analyzing the results:
Additional values of the same Entry type only cost 12 bytes, while the first is 48 bytes because the type definition is also included (which accounts for the remaining ~36 bytes), but that is a one-time overhead.
So basically you transmit 2 strings, "k1" and "v1", which are 4 bytes, and the lengths of the strings also have to be included; using 4 bytes per length (the size of int on 32-bit architectures) gives you the 12 bytes, which is the "minimum". (Yes, you could use a smaller type for the length, but that would have its limitations. A variable-length encoding would be a better choice for small numbers; see the encoding/binary package and the sketch below.)
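As an illustration of that variable-length idea, here is a standalone sketch using encoding/binary varints (this is not something gob itself does; appendString is a made-up helper, and it assumes "encoding/binary" is imported):
// appendString length-prefixes s with a varint instead of a fixed 4-byte length.
// Lengths below 128 cost only a single byte this way.
func appendString(dst []byte, s string) []byte {
    var lenBuf [binary.MaxVarintLen64]byte
    n := binary.PutUvarint(lenBuf[:], uint64(len(s)))
    dst = append(dst, lenBuf[:n]...)
    return append(dst, s...)
}
For example, appendString(nil, "k1") yields [2 107 49]: 3 bytes instead of 6.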
All in all, encoding/gob does a pretty good job for your needs. Don't get fooled by initial impressions.
If this 12 bytes for one Entry is too "much" for you, you can always wrap the stream into a compress/flate or compress/gzip writer to further reduce the size (in exchange for slower encoding/decoding and slightly higher memory requirement for the process).
Demonstration:
Let's test the following 5 solutions:
Using a "naked" output (no compression)
Using compress/flate to compress the output of encoding/gob
Using compress/zlib to compress the output of encoding/gob
Using compress/gzip to compress the output of encoding/gob
Using github.com/dsnet/compress/bzip2 to compress the output of encoding/gob
We will write a thousand entries, each with a different 4-character key and value ("k  1" / "v  1" through "k999" / "v999"). This means the uncompressed size of an Entry is 4 bytes + 4 bytes + 4 bytes + 4 bytes = 16 bytes (2x4 bytes of text, 2x4-byte lengths).
The code looks like this:
for _, name := range []string{"Naked", "flate", "zlib", "gzip", "bzip2"} {
buf := &bytes.Buffer{}
var out io.Writer
switch name {
case "Naked":
out = buf
case "flate":
out, _ = flate.NewWriter(buf, flate.DefaultCompression)
case "zlib":
out, _ = zlib.NewWriterLevel(buf, zlib.DefaultCompression)
case "gzip":
out = gzip.NewWriter(buf)
case "bzip2":
out, _ = bzip2.NewWriter(buf, nil)
}
enc := gob.NewEncoder(out)
e := Entry{}
for i := 0; i < 1000; i++ {
e.Key = fmt.Sprintf("k%3d", i)
e.Val = fmt.Sprintf("v%3d", i)
enc.Encode(e)
}
if c, ok := out.(io.Closer); ok {
c.Close()
}
fmt.Printf("[%5s] Length: %5d, average: %5.2f / Entry\n",
name, buf.Len(), float64(buf.Len())/1000)
}
Output:
[Naked] Length: 16036, average: 16.04 / Entry
[flate] Length: 4120, average: 4.12 / Entry
[ zlib] Length: 4126, average: 4.13 / Entry
[ gzip] Length: 4138, average: 4.14 / Entry
[bzip2] Length: 2042, average: 2.04 / Entry
Try it on the Go Playground.
As you can see: the "naked" output is 16.04 bytes/Entry, just a little over the calculated size (due to the tiny one-time overhead discussed above).
When you use flate, zlib or gzip to compress the output, you can reduce it to about 4.13 bytes/Entry, which is roughly 26% of the theoretical size; I'm sure that satisfies you. If not, you can reach for libraries providing higher compression efficiency, such as bzip2, which in the above example resulted in 2.04 bytes/Entry, 12.7% of the theoretical size!
(Note that with "real-life" data the compression would probably be less effective, as the keys and values used in this test are very similar and thus compress extremely well; even so, the ratio should be around 50% with real-life data.)
Use protobuf to efficiently encode your data.
https://github.com/golang/protobuf
Your main would look like this:
package main
import (
"fmt"
"log"
"github.com/golang/protobuf/proto"
)
func main() {
e := &Entry{
Key: proto.String("k1"),
Val: proto.String("v1"),
}
data, err := proto.Marshal(e)
if err != nil {
log.Fatal("marshaling error: ", err)
}
fmt.Println(data)
}
You create a file, example.proto like this:
package main;
message Entry {
required string Key = 1;
required string Val = 2;
}
You generate the go code from the proto file by running:
$ protoc --go_out=. *.proto
You can examine the generated file, if you wish.
You can run and see the results output:
$ go run *.go
[10 2 107 49 18 2 118 49]
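Reading the data back is the mirror operation; a minimal sketch using the same github.com/golang/protobuf/proto package (data is the slice from the example above, and GetKey / GetVal are the getters protoc-gen-go generates for these fields):
// Decode the bytes produced above back into an Entry.
var decoded Entry
if err := proto.Unmarshal(data, &decoded); err != nil {
    log.Fatal("unmarshaling error: ", err)
}
fmt.Println(decoded.GetKey(), decoded.GetVal()) // k1 v1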
"Manual coding", you're so afraid of, is trivially done in Go using the standard encoding/binary package.
You appear to store string length values as 32-bit integers in big-endian format, so you can just go on and do just that in Go:
package main
import (
"bytes"
"encoding/binary"
"fmt"
"io"
)
func encode(w io.Writer, s string) (n int, err error) {
var hdr [4]byte
binary.BigEndian.PutUint32(hdr[:], uint32(len(s)))
n, err = w.Write(hdr[:])
if err != nil {
return
}
n2, err := io.WriteString(w, s)
n += n2
return
}
func main() {
var buf bytes.Buffer
for _, s := range []string{
"ab",
"cd",
"de",
} {
_, err := encode(&buf, s)
if err != nil {
panic(err)
}
}
fmt.Printf("%v\n", buf.Bytes())
}
Playground link.
Note that in this example I'm writing to a byte buffer, but that's for demonstration purposes only; since encode() writes to an io.Writer, you can pass it an open file, a network socket, or anything else implementing that interface.
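For symmetry, reading such records back could look like this; a minimal sketch under the same assumptions (a 4-byte big-endian length followed by the raw bytes), reusing the encoding/binary and io imports from the listing above:
// decode reads one length-prefixed string written by encode.
func decode(r io.Reader) (string, error) {
    var hdr [4]byte
    if _, err := io.ReadFull(r, hdr[:]); err != nil {
        return "", err
    }
    n := binary.BigEndian.Uint32(hdr[:])
    buf := make([]byte, n)
    if _, err := io.ReadFull(r, buf); err != nil {
        return "", err
    }
    return string(buf), nil
}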

golang create pdf with cyrillic

I need to create a PDF file in Go with Cyrillic text. I started with https://github.com/jung-kurt/gofpdf but it needs cp1251-encoded text to produce correct Cyrillic.
I've tried
package main
import (
"github.com/jung-kurt/gofpdf"
"fmt"
"os"
)
func main() {
pwd, err := os.Getwd()
if err != nil {
fmt.Println(err)
os.Exit(1)
}
pdf := gofpdf.New("P", "mm", "A4", "")
pdf.AddFont("Helvetica", "", pwd + "/font/helvetica_1251.json")
pdf.AddPage()
pdf.SetFont("Helvetica", "", 16)
tr := pdf.UnicodeTranslatorFromDescriptor("cp1251")
pdf.Cell(15, 50, tr("русский текст"))
pdf.OutputFileAndClose("test.pdf")
}
but it produces only dots instead of text.
Then I tried to use https://github.com/golang/freetype to create an image with the text and then insert it into the PDF. So I tried:
package main
import (
"github.com/jung-kurt/gofpdf"
"github.com/golang/freetype"
"image"
"fmt"
"os"
"bytes"
"image/jpeg"
"io/ioutil"
"image/draw"
)
func main() {
pwd, err := os.Getwd()
if err != nil {
fmt.Println(err)
os.Exit(1)
}
dataFont, err := ioutil.ReadFile(pwd + "/font/luxisr.ttf")
if err != nil {
fmt.Printf("%v",err)
}
f, err := freetype.ParseFont(dataFont)
if err != nil {
fmt.Printf("%v",err)
}
dst := image.NewRGBA(image.Rect(0, 0, 800, 600))
draw.Draw(dst, dst.Bounds(), image.White, image.ZP, draw.Src)
c := freetype.NewContext()
c.SetDst(dst)
c.SetClip(dst.Bounds())
c.SetSrc(image.Black)
c.SetFont(f)
c.DrawString("русский текст", freetype.Pt(0, 16))
pdf := gofpdf.New("P", "mm", "A4", "")
pdf.AddPage()
buf := new(bytes.Buffer)
err = jpeg.Encode(buf, dst, nil)
if err != nil {
fmt.Printf("%v",err)
}
reader := bytes.NewReader(buf.Bytes())
textName := "text1"
pdf.RegisterImageReader(textName, "jpg", reader)
pdf.Image(textName, 15, 15, 0, 0, false, "jpg", 0, "")
pdf.OutputFileAndClose("test.pdf")
}
but as a result I get squares instead of text, because it seems that freetype needs Unicode code points for the text.
Is it possible to convert strings, which are generally UTF-8, to Unicode code points? How can I create a PDF or an image with Cyrillic text?
Thank you.
First, you are ignoring an error in the final line. pdf.OutputFileAndClose returns an error, so you should check it:
err := pdf.OutputFileAndClose("test.pdf")
if err != nil {
log.Fatal(err)
}
Other than that, your first example works for me; the generated PDF shows the Cyrillic text rendered correctly.
Here is the code I used; you'll see it's very similar to yours:
package main
import (
"log"
"github.com/jung-kurt/gofpdf"
)
func main() {
pdf := gofpdf.New("P", "mm", "A4", "")
pdf.AddFont("Helvetica", "", "helvetica_1251.json")
pdf.AddPage()
pdf.SetFont("Helvetica", "", 16)
tr := pdf.UnicodeTranslatorFromDescriptor("cp1251")
pdf.Cell(15, 50, tr("русский текст"))
err := pdf.OutputFileAndClose("test.pdf")
if err != nil {
log.Println(err)
}
}
With the above code, the important thing is to make sure helvetica_1251.z, helvetica_1251.json and cp1251.map (from $GOPATH/src/github.com/jung-kurt/gofpdf/font, or generated by the makefont tool) are all in the current directory. If you can confirm that this works for you, you can proceed to move them into the fonts directory and change the code accordingly. My best guess is that you may be silently ignoring an error warning you about one of these files.
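Alternatively, if you prefer to keep the font files in a separate directory (e.g. the font folder from your original code), you can point gofpdf at it; a minimal sketch, assuming the definition files sit in ./font:
// The fourth argument to gofpdf.New is the font directory; the JSON
// definition file is then referenced by name only.
pdf := gofpdf.New("P", "mm", "A4", "./font")
pdf.AddFont("Helvetica", "", "helvetica_1251.json")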
PS I'm running Mac OS X. If you are on another system, make sure you have a version of Helvetica with cyrillic character support installed.
Update
For others facing this problem in the future, I wanted to add the final solution here. From the comments below:
Thanks to jung-kurt I found solution. You can avoid this bug on Windows by adding pdf.SetCompression(true) – Timur Shahmuratov

golang - How to check multipart.File information

When a user uploads a file using r.FormFile("file") you get a multipart.File, a multipart.FileHeader and an error.
My question is how to obtain information about the uploaded file: for example, its size, its dimensions if it's an image, and so on.
I have literally no idea where to start, so any help would be great.
To get the file size and MIME type:
// Size constants
const (
MB = 1 << 20
)
type Sizer interface {
Size() int64
}
func Sample(w http.ResponseWriter, r *http.Request) error {
if err := r.ParseMultipartForm(5 * MB); err != nil {
return err
}
// Limit upload size
r.Body = http.MaxBytesReader(w, r.Body, 5*MB) // 5 Mb
//
file, multipartFileHeader, err := r.FormFile("file")
if err != nil {
return err
}
// Create a buffer to store the header of the file in
fileHeader := make([]byte, 512)
// Copy the headers into the FileHeader buffer
if _, err := file.Read(fileHeader); err != nil {
return err
}
// set position back to start.
if _, err := file.Seek(0, 0); err != nil {
return err
}
log.Printf("Name: %#v\n", multipartFileHeader.Filename)
log.Printf("Size: %#v\n", file.(Sizer).Size())
log.Printf("MIME: %#v\n", http.DetectContentType(fileHeader))
return nil
}
Sample output:
2016/12/01 15:00:06 Name: "logo_35x30_black.png"
2016/12/01 15:00:06 Size: 18674
2016/12/01 15:00:06 MIME: "image/png"
The file name and MIME type can be obtained from the returned multipart.FileHeader.
Most further meta-data will depend on the file type. If it's an image, you should be able to use the DecodeConfig functions in the standard library, for PNG, JPEG and GIF, to obtain the dimensions (and color model).
There are many Go libraries available for other file types as well, which will have similar functions.
EDIT: There's a good example on the golang-nuts mail group.
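As a rough sketch of the DecodeConfig approach mentioned above (imageDimensions is a hypothetical helper name; the blank imports register the PNG, JPEG and GIF decoders, and the file is assumed to be one of those formats):
import (
    "image"
    _ "image/gif"
    _ "image/jpeg"
    _ "image/png"
    "mime/multipart"
)

// imageDimensions reads just enough of the uploaded file to determine its
// dimensions, then seeks back to the start so the file can still be saved.
func imageDimensions(file multipart.File) (width, height int, err error) {
    cfg, _, err := image.DecodeConfig(file)
    if err != nil {
        return 0, 0, err
    }
    if _, err := file.Seek(0, 0); err != nil {
        return 0, 0, err
    }
    return cfg.Width, cfg.Height, nil
}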
You can get approximate information about the size of the file from the Content-Length header. This is not recommended, because this header can be set to anything by the client.
A better way is to use the ReadFrom method:
clientFile, handler, err := r.FormFile("file") // r is *http.Request
var buff bytes.Buffer
fileSize, err := buff.ReadFrom(clientFile)
fmt.Println(fileSize) // this will return you a file size.
Another way I've found pretty simple for this type of testing is to place test assets in a test_data directory relative to the package. Within my test file I normally create a helper that creates an instance of *http.Request, which allows me to run table tests on multipart.File pretty easily (error checking removed for brevity).
func createMockRequest(pathToFile string) *http.Request {
file, err := os.Open(pathToFile)
if err != nil {
return nil
}
defer file.Close()
body := &bytes.Buffer{}
writer := multipart.NewWriter(body)
part, err := writer.CreateFormFile("file", filepath.Base(pathToFile))
if err != nil {
return nil
}
_, _ = io.Copy(part, file)
err = writer.Close()
if err != nil {
return nil
}
// the body is the only important data for creating a new request with the form data attached
req, _ := http.NewRequest("POST", "", body)
req.Header.Set("Content-Type", writer.FormDataContentType())
return req
}

How to read a binary file in Go

I'm completely new to Go and I'm trying to read a binary file, either byte by byte or several bytes at a time. The documentation doesn't help much and I cannot find any tutorial or simple example (by the way, how could Google give their language such an un-googlable name?). Basically, how can I open a file, then read some bytes into a buffer? Any suggestion?
For manipulating files, the os package is your friend:
f, err := os.Open("myfile")
if err != nil {
panic(err)
}
defer f.Close()
For more control over how the file is open, see os.OpenFile() instead (doc).
For reading files, there are many ways. The os.File type returned by os.Open (the f in the above example) implements the io.Reader interface (it has a Read() method with the right signature), so it can be used directly to read data into a buffer (a []byte), or it can be wrapped in a buffered reader (type bufio.Reader).
Specifically for binary data, the encoding/binary package can be useful for reading a sequence of bytes into some typed structure of data. You can see an example in the Go doc here. The binary.Read() function can be used with the file returned by os.Open(), since, as mentioned, it is an io.Reader.
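For example, a minimal sketch of binary.Read with a made-up fixed-layout header (the struct fields, byte order and file name are illustrative assumptions, not a real format):
package main

import (
    "encoding/binary"
    "fmt"
    "os"
)

// header is a hypothetical fixed-size record layout.
type header struct {
    Magic   uint32
    Version uint16
    Count   uint16
}

func main() {
    f, err := os.Open("myfile")
    if err != nil {
        panic(err)
    }
    defer f.Close()

    var h header
    // binary.Read fills the struct field by field, in declaration order,
    // using the given byte order.
    if err := binary.Read(f, binary.LittleEndian, &h); err != nil {
        panic(err)
    }
    fmt.Printf("%+v\n", h)
}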
And there's also the simple-to-use io/ioutil package, which lets you read the whole file at once into a byte slice (ioutil.ReadFile(), which takes a file name, so you don't even have to open/close the file yourself), or ioutil.ReadAll(), which takes an io.Reader and returns a byte slice containing the whole file. Here's the doc on ioutil.
Finally, as others mentioned, you can google about the Go language using "golang" and you should find all you need. The golang-nuts mailing list is also a great place to look for answers (make sure to search first before posting, a lot of stuff has already been answered). To look for third-party packages, check the godoc.org website.
HTH
This is what I use to read an entire binary file into memory
func RetrieveROM(filename string) ([]byte, error) {
file, err := os.Open(filename)
if err != nil {
return nil, err
}
defer file.Close()
stats, statsErr := file.Stat()
if statsErr != nil {
return nil, statsErr
}
var size int64 = stats.Size()
bytes := make([]byte, size)
bufr := bufio.NewReader(file)
// A single Read call may return fewer bytes than len(bytes);
// io.ReadFull keeps reading until the slice is filled (or an error occurs).
_, err = io.ReadFull(bufr, bytes)
return bytes, err
}
For example, to count the number of zero bytes in a file:
package main
import (
"fmt"
"io"
"os"
)
func main() {
f, err := os.Open("filename")
if err != nil {
fmt.Println(err)
return
}
defer f.Close()
data := make([]byte, 4096)
zeroes := 0
for {
data = data[:cap(data)]
n, err := f.Read(data)
if err != nil {
if err == io.EOF {
break
}
fmt.Println(err)
return
}
data = data[:n]
for _, b := range data {
if b == 0 {
zeroes++
}
}
}
fmt.Println("zeroes:", zeroes)
}
You can't whimsically cast primitive types to (char*) like in C, so for any sort of (de)serializing of binary data use the encoding/binary package.
http://golang.org/pkg/encoding/binary .
I can't improve on the examples there.
Here is an example using the Read method:
package main
import (
"io"
"os"
)
func main() {
f, e := os.Open("a.go")
if e != nil {
panic(e)
}
defer f.Close()
for {
b := make([]byte, 10)
// Read may fill fewer than len(b) bytes; n tells how many are valid.
n, e := f.Read(b)
if e == io.EOF {
break
} else if e != nil {
panic(e)
}
_ = b[:n] // do something here with the n bytes that were read
}
}
https://golang.org/pkg/os#File.Read