golang create pdf with cyrillic

I need to create a PDF file in Go with Cyrillic characters. I started with https://github.com/jung-kurt/gofpdf, but it needs text in the cp1251 (Windows-1251) encoding to produce correct Cyrillic.
I've tried:
package main
import (
"github.com/jung-kurt/gofpdf"
"fmt"
"os"
)
func main() {
pwd, err := os.Getwd()
if err != nil {
fmt.Println(err)
os.Exit(1)
}
pdf := gofpdf.New("P", "mm", "A4", "")
pdf.AddFont("Helvetica", "", pwd + "/font/helvetica_1251.json")
pdf.AddPage()
pdf.SetFont("Helvetica", "", 16)
tr := pdf.UnicodeTranslatorFromDescriptor("cp1251")
pdf.Cell(15, 50, tr("русский текст"))
pdf.OutputFileAndClose("test.pdf")
}
but it produces only dots instead of text.
Then I tried to use https://github.com/golang/freetype to create an image with the text and then insert it into the PDF. So I tried:
package main
import (
"github.com/jung-kurt/gofpdf"
"github.com/golang/freetype"
"image"
"fmt"
"os"
"bytes"
"image/jpeg"
"io/ioutil"
"image/draw"
)
func main() {
pwd, err := os.Getwd()
if err != nil {
fmt.Println(err)
os.Exit(1)
}
dataFont, err := ioutil.ReadFile(pwd + "/font/luxisr.ttf")
if err != nil {
fmt.Printf("%v",err)
}
f, err := freetype.ParseFont(dataFont)
if err != nil {
fmt.Printf("%v",err)
}
dst := image.NewRGBA(image.Rect(0, 0, 800, 600))
draw.Draw(dst, dst.Bounds(), image.White, image.ZP, draw.Src)
c := freetype.NewContext()
c.SetDst(dst)
c.SetClip(dst.Bounds())
c.SetSrc(image.Black)
c.SetFont(f)
c.DrawString("русский текст", freetype.Pt(0, 16))
pdf := gofpdf.New("P", "mm", "A4", "")
pdf.AddPage()
buf := new(bytes.Buffer)
err = jpeg.Encode(buf, dst, nil)
if err != nil {
fmt.Printf("%v",err)
}
reader := bytes.NewReader(buf.Bytes())
textName := "text1"
pdf.RegisterImageReader(textName, "jpg", reader)
pdf.Image(textName, 15, 15, 0, 0, false, "jpg", 0, "")
pdf.OutputFileAndClose("test.pdf")
}
but as a result I get squares instead of text; it seems that freetype needs Unicode symbols for text.
Is it possible to convert strings, which are generally in UTF-8, to Unicode? How can I create a PDF or an image with Cyrillic text?
Thank you.

First, you are ignoring an error in the final line. pdf.OutputFileAndClose returns an error, so you should check it (this needs the "log" import):
if err := pdf.OutputFileAndClose("test.pdf"); err != nil {
log.Fatal(err)
}
Other than that, your first example works for me.
Here is the code I used, you'll see it's very similar to yours:
package main
import (
"log"
"github.com/jung-kurt/gofpdf"
)
func main() {
pdf := gofpdf.New("P", "mm", "A4", "")
pdf.AddFont("Helvetica", "", "helvetica_1251.json")
pdf.AddPage()
pdf.SetFont("Helvetica", "", 16)
tr := pdf.UnicodeTranslatorFromDescriptor("cp1251")
pdf.Cell(15, 50, tr("русский текст"))
err := pdf.OutputFileAndClose("test.pdf")
if err != nil {
log.Println(err)
}
}
With the above code, the important thing is to make sure helvetica_1251.z, helvetica_1251.json and cp1251.map (from $GOPATH/src/github.com/jung-kurt/gofpdf/font, or generated by the makefont tool) are all in the current directory. If you can confirm that this works for you, you can then move them into your font directory and change the code accordingly. My best guess is that you are silently ignoring an error warning you about one of these files.
P.S. I'm running Mac OS X. If you are on another system, make sure you have a version of Helvetica with Cyrillic character support installed.
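To see what the translator returned by UnicodeTranslatorFromDescriptor("cp1251") is for: the 1251 font files use the single-byte Windows-1251 code page, so UTF-8 Go strings have to be turned into cp1251 bytes first. The sketch below is a hypothetical, simplified stand-in (toCP1251 is not a gofpdf function) covering only ASCII and the basic Russian alphabet, U+0410 to U+044F plus Ё and ё; gofpdf's own translator presumably relies on the full cp1251.map table instead.

```go
package main

import "fmt"

// toCP1251 is a hypothetical, simplified stand-in for gofpdf's cp1251
// translator: ASCII passes through unchanged, and the basic Russian
// alphabet is mapped to its Windows-1251 byte values.
func toCP1251(s string) ([]byte, error) {
	out := make([]byte, 0, len(s))
	for _, r := range s {
		switch {
		case r < 0x80:
			out = append(out, byte(r))
		case r >= 0x0410 && r <= 0x044F: // А..я are contiguous at 0xC0..0xFF in cp1251
			out = append(out, byte(r-0x0410+0xC0))
		case r == 0x0401: // Ё
			out = append(out, 0xA8)
		case r == 0x0451: // ё
			out = append(out, 0xB8)
		default:
			return nil, fmt.Errorf("rune %q has no mapping in this sketch", r)
		}
	}
	return out, nil
}

func main() {
	b, err := toCP1251("русский текст")
	if err != nil {
		panic(err)
	}
	fmt.Printf("% X\n", b) // F0 F3 F1 F1 EA E8 E9 20 F2 E5 EA F1 F2
}
```

Each Cyrillic letter shrinks from two UTF-8 bytes to one cp1251 byte, which is the form the 1251 font metrics expect.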
Update
For others facing this problem in the future, I wanted to add the final solution here. From the comments:
Thanks to jung-kurt I found the solution. You can avoid this bug on Windows by adding pdf.SetCompression(true). (Timur Shahmuratov)

Related

Use Gob to write logs to a file in an append style

Would it be possible to use Gob encoding for appending structs in series to the same file using append? It works for writing, but when reading with the decoder more than once I run into:
extra data in buffer
So I wonder whether that's possible in the first place, or whether I should use something like JSON and append one JSON document per line instead. The alternative would be to serialize a slice, but then reading it back as a whole would defeat the purpose of appending.
The gob package wasn't designed to be used this way. A gob stream has to be written by a single gob.Encoder, and it also has to be read by a single gob.Decoder.
The reason for this is because the gob package not only serializes the values you pass to it, it also transmits data to describe their types:
A stream of gobs is self-describing. Each data item in the stream is preceded by a specification of its type, expressed in terms of a small set of predefined types.
This type information is state held by the encoder/decoder (which types have been transmitted, and how). A new encoder/decoder will not (cannot) analyze the preceding stream to reconstruct that state and continue where a previous encoder/decoder left off.
Of course if you create a single gob.Encoder, you may use it to serialize as many values as you'd like to.
Also you can create a gob.Encoder and write to a file, and then later create a new gob.Encoder, and append to the same file, but you must use 2 gob.Decoders to read those values, exactly matching the encoding process.
As a demonstration, let's follow an example. This example will write to an in-memory buffer (bytes.Buffer). 2 subsequent encoders will write to it, then we will use 2 subsequent decoders to read the values. We'll write values of this struct:
type Point struct {
X, Y int
}
For short, compact code, I use this "error handler" function:
func he(err error) {
if err != nil {
panic(err)
}
}
And now the code:
const n, m = 3, 2
buf := &bytes.Buffer{}
e := gob.NewEncoder(buf)
for i := 0; i < n; i++ {
he(e.Encode(&Point{X: i, Y: i * 2}))
}
e = gob.NewEncoder(buf)
for i := 0; i < m; i++ {
he(e.Encode(&Point{X: i, Y: 10 + i}))
}
d := gob.NewDecoder(buf)
for i := 0; i < n; i++ {
var p *Point
he(d.Decode(&p))
fmt.Println(p)
}
d = gob.NewDecoder(buf)
for i := 0; i < m; i++ {
var p *Point
he(d.Decode(&p))
fmt.Println(p)
}
Output (try it on the Go Playground):
&{0 0}
&{1 2}
&{2 4}
&{0 10}
&{1 11}
Note that if we used only 1 decoder to read all the values (looping while i < n + m), we'd get the same error message you posted in your question when the iteration reaches the (n + 1)th value, because the subsequent data is not a serialized Point but the start of a new gob stream.
So if you want to stick with the gob package for this, you have to slightly modify your encoding/decoding process: you have to mark the boundaries where a new encoder is used (so when decoding, you'll know you have to create a new decoder to read the subsequent values).
You may use different techniques to achieve this:
You may write out a number, a count before you proceed to write values, and this number would tell how many values were written using the current encoder.
If you don't want to or can't tell how many values will be written with the current encoder, you may opt to write out a special end-of-encoder value when you don't write more values with the current encoder. When decoding, if you encounter this special end-of-encoder value, you'll know you have to create a new decoder to be able to read more values.
Some things to note here:
The gob package is most efficient and most compact when only a single encoder is used, because each time you create and use a new encoder, the type specifications have to be re-transmitted, causing more overhead and making the encoding/decoding process slower.
You can't seek in the data stream: to decode a value, you have to read the file from the beginning up to the value you want. Note that this applies to some extent to other formats too (such as JSON or XML).
If you want seeking functionality, you'd need to manage an index file separately, which would tell at which positions new encoders / decoders start, so you could seek to that position, create a new decoder, and start reading values from there.
Check a related question: Efficient Go serialization of struct to disk
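In addition to the above, I suggest using an intermediate writer/reader wrapper to exclude the gob header: the encoder first writes a sample value (and with it the type information) into a separate buffer, then the wrapper is switched to the real writer, so only value data reaches it. On the decoding side, that buffer is replayed before switching to the real reader: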
package main
import (
"bytes"
"encoding/gob"
"fmt"
"io"
"log"
)
type Point struct {
X, Y int
}
func main() {
buf := new(bytes.Buffer)
enc, _, err := NewEncoderWithoutHeader(buf, new(Point))
if err != nil {
log.Fatal(err)
}
if err := enc.Encode(&Point{10, 10}); err != nil {
log.Fatal(err)
}
fmt.Println(buf.Bytes())
}
type HeaderSkiper struct {
src io.Reader
dst io.Writer
}
func (hs *HeaderSkiper) Read(p []byte) (int, error) {
return hs.src.Read(p)
}
func (hs *HeaderSkiper) Write(p []byte) (int, error) {
return hs.dst.Write(p)
}
func NewEncoderWithoutHeader(w io.Writer, sample interface{}) (*gob.Encoder, *bytes.Buffer, error) {
hs := new(HeaderSkiper)
hdr := new(bytes.Buffer)
hs.dst = hdr
enc := gob.NewEncoder(hs)
// Write sample with header info
if err := enc.Encode(sample); err != nil {
return nil, nil, err
}
// Change writer
hs.dst = w
return enc, hdr, nil
}
func NewDecoderWithoutHeader(r io.Reader, hdr *bytes.Buffer, dummy interface{}) (*gob.Decoder, error) {
hs := new(HeaderSkiper)
hs.src = hdr
dec := gob.NewDecoder(hs)
if err := dec.Decode(dummy); err != nil {
return nil, err
}
hs.src = r
return dec, nil
}
In addition to icza's great answer, you could use the following trick to append to a gob file that already contains data: when appending, encode a dummy value first and discard its output:
Create the file
Encode gob as usual (the first encode writes the headers)
Close the file
Open the file for appending
Using an intermediate writer, encode a dummy struct (which writes the headers)
Reset the writer
Encode gob as usual (no headers are written)
Example:
package main
import (
"bytes"
"encoding/gob"
"fmt"
"io"
"io/ioutil"
"log"
"os"
)
type Record struct {
ID int
Body string
}
func main() {
r1 := Record{ID: 1, Body: "abc"}
r2 := Record{ID: 2, Body: "def"}
// encode r1
var buf1 bytes.Buffer
enc := gob.NewEncoder(&buf1)
err := enc.Encode(r1)
if err != nil {
log.Fatal(err)
}
// write to file
err = ioutil.WriteFile("/tmp/log.gob", buf1.Bytes(), 0600)
if err != nil {
log.Fatal(err)
}
// encode dummy (which write headers)
var buf2 bytes.Buffer
enc = gob.NewEncoder(&buf2)
err = enc.Encode(Record{})
if err != nil {
log.Fatal(err)
}
// remove dummy
buf2.Reset()
// encode r2
err = enc.Encode(r2)
if err != nil {
log.Fatal(err)
}
// open file
f, err := os.OpenFile("/tmp/log.gob", os.O_WRONLY|os.O_APPEND, 0600)
if err != nil {
log.Fatal(err)
}
// write r2, then close the file before reading it back
_, err = f.Write(buf2.Bytes())
if err != nil {
log.Fatal(err)
}
if err := f.Close(); err != nil {
log.Fatal(err)
}
// decode file
data, err := ioutil.ReadFile("/tmp/log.gob")
if err != nil {
log.Fatal(err)
}
var r Record
dec := gob.NewDecoder(bytes.NewReader(data))
for {
err = dec.Decode(&r)
if err == io.EOF {
break
}
if err != nil {
log.Fatal(err)
}
fmt.Println(r)
}
}

Accessing Data From Interfaces in Go

I am trying to implement a simple API in Go. My backend experience is mostly with Python and Node, so I am having difficulty printing out data held within an interface, since Go won't allow me to index it. I have searched around, and several people have asked similar questions when the interface holds one value, but not when it holds a slice ([]interface{}), I believe. I have tried converting the interface, to no avail.
When I point the browser to /quandl/ddd/10, I would like to fmt.Println the specific numerical data, i.e. ["2017-01-13", 15.67, 16.41, 15.67, 16.11, 3595248, 0, 1, 15.67, 16.41, 15.67, 16.11, 3595248]
package main
import (
"encoding/json"
"fmt"
"io/ioutil"
"net/http"
"net/url"
"github.com/fatih/color"
"github.com/gorilla/mux"
)
type QuandlResponse struct {
SourceCode string `json:"source_code"`
SourceName string `json:"source_name"`
Code string `json:"code"`
Frequency string `json:"frequency"`
FromDate string `json:"from_date"`
ToDate string `json:"to_date"`
Columns []string `json:"column_names"`
Data interface{} `json:"data"`
}
func getContent(w http.ResponseWriter, r *http.Request) {
stock := mux.Vars(r)["stock"]
limit := mux.Vars(r)["limit"]
url := "https://www.quandl.com/api/v1/datasets/WIKI/" + url.QueryEscape(stock) + ".json?&limit=" + url.QueryEscape(limit) + "&auth_token=XXXXX"
response, err := http.Get(url)
if err != nil {
fmt.Println(err)
}
contents, err := ioutil.ReadAll(response.Body)
var result QuandlResponse
json.Unmarshal(contents, &result)
json.NewEncoder(w).Encode(result)
fmt.Println(result.Data[0])
}
func callAll() {
rabbit := mux.NewRouter()
rabbit.HandleFunc("/quandl/{stock}/{limit}", getContent)
http.ListenAndServe(":8000", rabbit)
}
func main() {
color.Blue("Running Server #localhost:8000")
callAll()
}
If you know that the type of Data is []interface{}, you can do a type assertion:
slice := result.Data.([]interface{})
fmt.Println(slice[0])
If there are several possibilities for the type of Data, you can use a type switch:
switch data := result.Data.(type) {
case []interface{}:
fmt.Println(data[0])
case string:
fmt.Println(data)
default:
// unexpected type
}
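Putting the type assertion together with the JSON shape from the question: when the target is interface{}, encoding/json decodes JSON arrays into []interface{} and JSON numbers into float64, so each row needs a second assertion. The sketch uses a shortened sample built from the question's values and a trimmed-down QuandlResponse; firstRow is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Trimmed-down version of the question's struct.
type QuandlResponse struct {
	Columns []string    `json:"column_names"`
	Data    interface{} `json:"data"`
}

// firstRow unmarshals raw and returns the first row of Data,
// asserting the outer and inner slices step by step.
func firstRow(raw []byte) ([]interface{}, error) {
	var result QuandlResponse
	if err := json.Unmarshal(raw, &result); err != nil {
		return nil, err
	}
	rows, ok := result.Data.([]interface{})
	if !ok || len(rows) == 0 {
		return nil, fmt.Errorf("unexpected data: %T", result.Data)
	}
	row, ok := rows[0].([]interface{})
	if !ok {
		return nil, fmt.Errorf("unexpected row: %T", rows[0])
	}
	return row, nil
}

func main() {
	raw := []byte(`{"column_names":["Date","Open","Close"],"data":[["2017-01-13",15.67,16.11]]}`)
	row, err := firstRow(raw)
	if err != nil {
		panic(err)
	}
	date := row[0].(string)
	open := row[1].(float64) // JSON numbers become float64 inside interface{}
	fmt.Println(date, open)  // 2017-01-13 15.67
}
```

If the shape is always an array of rows, declaring the field as Data [][]interface{} `json:"data"` removes the outer assertion.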
You may also want to look at the reflect package if your requirements are more complicated.

Extract words from PDF with golang?

I don't understand type conversion. I know this isn't right: all I get is a bunch of hieroglyphs.
f, _ := os.Open("test.pdf")
defer f.Close()
io.Copy(os.Stdout, f)
I want to work with the strings...
I tried some Go PDF libraries and found that sajari/docconv works as I expect.
It's easy to use; here is an example:
package main
import (
"fmt"
"log"
"code.sajari.com/docconv"
)
func main() {
res, err := docconv.ConvertPath("your-file.pdf")
if err != nil {
log.Fatal(err)
}
fmt.Println(res.Body) // res.Body holds the extracted text
}
That's because a PDF doesn't only contain text: it also contains formatting information (fonts, padding, margins, positions, shapes, images).
In case you need to read the plain text without formatting, I have forked a repository and implemented a function to do that. You can check it out at https://github.com/ledongthuc/pdf
I have also included an example; I hope it is useful for you.
package main
import (
"bytes"
"fmt"
"github.com/ledongthuc/pdf"
)
func main() {
content, err := readPdf("test.pdf") // Read local pdf file
if err != nil {
panic(err)
}
fmt.Println(content)
return
}
func readPdf(path string) (string, error) {
r, err := pdf.Open(path)
if err != nil {
return "", err
}
totalPage := r.NumPage()
var textBuilder bytes.Buffer
for pageIndex := 1; pageIndex <= totalPage; pageIndex++ {
p := r.Page(pageIndex)
if p.V.IsNull() {
continue
}
textBuilder.WriteString(p.GetPlainText("\n"))
}
return textBuilder.String(), nil
}
all I get is a bunch of hieroglyphs.
What you get is the content of a pdf file, which is not clear text.
If you want to read a PDF file in Go, use one of the Go PDF libraries, such as rsc.io/pdf or yob/pdfreader.
As mentioned in a related answer:
I doubt there is any 'solid framework' for this kind of stuff. PDF format isn't meant to be machine-friendly by design, and AFAIK there is no guaranteed way to parse arbitrary PDFs.

golang - How to check multipart.File information

When a user uploads a file using r.FormFile("file") you get a multipart.File, a multipart.FileHeader and an error.
My question is how to obtain information about the uploaded file, for example its size, its dimensions if it's an image, and so on.
I have literally got no idea on where to start so any help would be great.
To get the file size and MIME type:
// Size constants
const (
MB = 1 << 20
)
func Sample(w http.ResponseWriter, r *http.Request) error {
// Limit upload size before parsing, so the limit actually applies
r.Body = http.MaxBytesReader(w, r.Body, 5*MB) // 5 MB
if err := r.ParseMultipartForm(5 * MB); err != nil {
return err
}
file, multipartFileHeader, err := r.FormFile("file")
if err != nil {
return err
}
defer file.Close()
// Create a buffer to store the header of the file in
fileHeader := make([]byte, 512)
// Copy the headers into the FileHeader buffer (small files may be shorter than 512 bytes)
n, err := file.Read(fileHeader)
if err != nil && err != io.EOF {
return err
}
// set position back to start.
if _, err := file.Seek(0, io.SeekStart); err != nil {
return err
}
log.Printf("Name: %#v\n", multipartFileHeader.Filename)
// FileHeader.Size (Go 1.9+) works whether the part is held in memory or on disk
log.Printf("Size: %#v\n", multipartFileHeader.Size)
log.Printf("MIME: %#v\n", http.DetectContentType(fileHeader[:n]))
return nil
}
Sample output:
2016/12/01 15:00:06 Name: "logo_35x30_black.png"
2016/12/01 15:00:06 Size: 18674
2016/12/01 15:00:06 MIME: "image/png"
The file name and MIME type can be obtained from the returned multipart.FileHeader.
Most further meta-data will depend on the file type. If it's an image, you should be able to use the DecodeConfig functions in the standard library, for PNG, JPEG and GIF, to obtain the dimensions (and color model).
There are many Go libraries available for other file types as well, which will have similar functions.
EDIT: There's a good example on the golang-nuts mailing list.
You can get approximate information about the file size from the Content-Length header. This is not recommended, because the header is set by the client and can be changed; it also covers the whole request body, not just the file.
A better way is to use the ReadFrom method of bytes.Buffer:
clientFile, _, err := r.FormFile("file") // r is *http.Request, inside a func returning error
if err != nil {
return err
}
defer clientFile.Close()
var buff bytes.Buffer
fileSize, err := buff.ReadFrom(clientFile) // number of bytes read = file size
fmt.Println(fileSize, err)
Another way I've found pretty simple for this type of testing is to place test assets in a testdata directory relative to the package (the go tool ignores directories with that name). Within my test file, I normally create a helper that creates an instance of *http.Request, which makes it easy to run table-driven tests on multipart.File (for brevity, errors simply make the helper return nil).
func createMockRequest(pathToFile string) *http.Request {
file, err := os.Open(pathToFile)
if err != nil {
return nil
}
defer file.Close()
body := &bytes.Buffer{}
writer := multipart.NewWriter(body)
part, err := writer.CreateFormFile("file", filepath.Base(pathToFile))
if err != nil {
return nil
}
_, _ = io.Copy(part, file)
err = writer.Close()
if err != nil {
return nil
}
// the body is the only important data for creating a new request with the form data attached
req, _ := http.NewRequest("POST", "", body)
req.Header.Set("Content-Type", writer.FormDataContentType())
return req
}
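A usage sketch for such a helper in a table-driven style. To keep it self-contained, the file content comes from a byte slice instead of a testdata file, and it runs from main with panics instead of a *testing.T; newUploadRequest and uploadedSize are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
)

// newUploadRequest builds a POST request whose multipart body contains
// one file field named "file" with the given content.
func newUploadRequest(name string, content []byte) (*http.Request, error) {
	body := &bytes.Buffer{}
	writer := multipart.NewWriter(body)
	part, err := writer.CreateFormFile("file", name)
	if err != nil {
		return nil, err
	}
	if _, err := part.Write(content); err != nil {
		return nil, err
	}
	if err := writer.Close(); err != nil {
		return nil, err
	}
	req, err := http.NewRequest("POST", "/upload", body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", writer.FormDataContentType())
	return req, nil
}

// uploadedSize plays the role of the code under test: it reads the
// "file" field and returns its name and size in bytes.
func uploadedSize(r *http.Request) (string, int64, error) {
	if err := r.ParseMultipartForm(1 << 20); err != nil {
		return "", 0, err
	}
	f, hdr, err := r.FormFile("file")
	if err != nil {
		return "", 0, err
	}
	defer f.Close()
	n, err := io.Copy(io.Discard, f)
	return hdr.Filename, n, err
}

func main() {
	cases := []struct {
		name    string
		content []byte
	}{
		{"empty.txt", nil},
		{"hello.txt", []byte("hello")},
	}
	for _, c := range cases {
		req, err := newUploadRequest(c.name, c.content)
		if err != nil {
			panic(err)
		}
		name, size, err := uploadedSize(req)
		if err != nil {
			panic(err)
		}
		fmt.Println(name, size) // empty.txt 0, then hello.txt 5
	}
}
```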

How to read a binary file in Go

I'm completely new to Go and I'm trying to read a binary file, either byte by byte or several bytes at a time. The documentation doesn't help much and I cannot find any tutorial or simple example (by the way, how could Google give their language such an un-googlable name?). Basically, how can I open a file, then read some bytes into a buffer? Any suggestion?
For manipulating files, the os package is your friend:
f, err := os.Open("myfile")
if err != nil {
panic(err)
}
defer f.Close()
For more control over how the file is opened, see os.OpenFile() instead.
For reading files, there are many ways. The *os.File returned by os.Open (f in the example above) implements the io.Reader interface (it has a Read() method with the right signature), so it can be used directly to read data into a buffer (a []byte), or it can be wrapped in a buffered reader (bufio.Reader).
Specifically for binary data, the encoding/binary package can be useful to read a sequence of bytes into a typed data structure. You can find an example in the encoding/binary package documentation. The binary.Read() function can be used with the file opened by os.Open(), since, as I mentioned, it is an io.Reader.
And there's also the simple-to-use io/ioutil package, which lets you read a whole file at once into a byte slice: ioutil.ReadFile() takes a file name, so you don't even have to open/close the file yourself, and ioutil.ReadAll() takes an io.Reader and returns a slice of bytes containing everything read from it. Since Go 1.16, these functions are also available as os.ReadFile() and io.ReadAll(), and the io/ioutil versions are deprecated.
Finally, as others mentioned, you can search for Go topics using "golang" and you should find all you need. The golang-nuts mailing list is also a great place to look for answers (make sure to search before posting; a lot has already been answered). To look for third-party packages, check the godoc.org website (now pkg.go.dev).
HTH
This is what I use to read an entire binary file into memory
func RetrieveROM(filename string) ([]byte, error) {
file, err := os.Open(filename)
if err != nil {
return nil, err
}
defer file.Close()
stats, statsErr := file.Stat()
if statsErr != nil {
return nil, statsErr
}
data := make([]byte, stats.Size())
// io.ReadFull keeps reading until data is full; a single Read call may return fewer bytes
_, err = io.ReadFull(file, data)
return data, err
}
For example, to count the number of zero bytes in a file:
package main
import (
"fmt"
"io"
"os"
)
func main() {
f, err := os.Open("filename")
if err != nil {
fmt.Println(err)
return
}
defer f.Close()
data := make([]byte, 4096)
zeroes := 0
for {
data = data[:cap(data)]
n, err := f.Read(data)
if err != nil {
if err == io.EOF {
break
}
fmt.Println(err)
return
}
data = data[:n]
for _, b := range data {
if b == 0 {
zeroes++
}
}
}
fmt.Println("zeroes:", zeroes)
}
You can't whimsically cast primitive types to (char*) like in C, so for any sort of (de)serializing of binary data use the encoding/binary package.
http://golang.org/pkg/encoding/binary
I can't improve on the examples there.
Here is an example using Read method:
package main
import (
"io"
"os"
)
func main() {
f, e := os.Open("a.go")
if e != nil {
panic(e)
}
defer f.Close()
b := make([]byte, 10) // allocate the buffer once, outside the loop
for {
n, e := f.Read(b)
if n > 0 {
// do something with b[:n]; a read may fill only part of b
}
if e == io.EOF {
break
} else if e != nil {
panic(e)
}
}
}
https://golang.org/pkg/os#File.Read