How to read a binary file in Go

I'm completely new to Go and I'm trying to read a binary file, either byte by byte or several bytes at a time. The documentation doesn't help much and I cannot find any tutorial or simple example (by the way, how could Google give their language such an un-googlable name?). Basically, how can I open a file, then read some bytes into a buffer? Any suggestion?

For manipulating files, the os package is your friend:
f, err := os.Open("myfile")
if err != nil {
panic(err)
}
defer f.Close()
For more control over how the file is opened, see os.OpenFile() instead.
There are many ways to read a file. The os.File type returned by os.Open (the f in the example above) implements the io.Reader interface (it has a Read() method with the right signature), so it can be used directly to read data into a buffer (a []byte), or it can be wrapped in a buffered reader (type bufio.Reader).
Specifically for binary data, the encoding/binary package is useful for reading a sequence of bytes into a typed data structure; see the examples in the package documentation. The binary.Read() function can be used with the file returned by os.Open(), since, as mentioned, it is an io.Reader.
There is also the simple-to-use io/ioutil package, which lets you read the whole file at once into a byte slice: ioutil.ReadFile() takes a file name, so you don't even have to open and close the file yourself, and ioutil.ReadAll() takes an io.Reader and returns a byte slice containing everything read from it. See the io/ioutil package documentation.
Finally, as others mentioned, you can search for information about the Go language using "golang" and you should find all you need. The golang-nuts mailing list is also a great place to look for answers (make sure to search before posting; a lot of questions have already been answered). To look for third-party packages, check the godoc.org website (which now redirects to pkg.go.dev).
HTH

This is what I use to read an entire binary file into memory
func RetrieveROM(filename string) ([]byte, error) {
file, err := os.Open(filename)
if err != nil {
return nil, err
}
defer file.Close()
stats, statsErr := file.Stat()
if statsErr != nil {
return nil, statsErr
}
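size := stats.Size()
bytes := make([]byte, size)
// A single Read call may return fewer bytes than requested;
// io.ReadFull keeps reading until the slice is full (needs the "io" import).
_, err = io.ReadFull(file, bytes)
return bytes, err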
}

For example, to count the number of zero bytes in a file:
package main
import (
"fmt"
"io"
"os"
)
func main() {
f, err := os.Open("filename")
if err != nil {
fmt.Println(err)
return
}
defer f.Close()
data := make([]byte, 4096)
zeroes := 0
for {
data = data[:cap(data)]
n, err := f.Read(data)
if err != nil {
if err == io.EOF {
break
}
fmt.Println(err)
return
}
data = data[:n]
for _, b := range data {
if b == 0 {
zeroes++
}
}
}
fmt.Println("zeroes:", zeroes)
}

You can't whimsically cast primitive types to (char*) like in C, so for any sort of (de)serializing of binary data use the encoding/binary package.
http://golang.org/pkg/encoding/binary
I can't improve on the examples there.
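For readers who already have the bytes in a slice, here is a minimal sketch of the same idea using the ByteOrder helpers from encoding/binary instead of pointer casts. The 6-byte record layout and the encode/decode function names are assumptions made up for illustration.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// encode packs a hypothetical 6-byte record: a big-endian uint32 ID
// followed by a big-endian uint16 length.
func encode(id uint32, length uint16) []byte {
	b := make([]byte, 6)
	binary.BigEndian.PutUint32(b[0:4], id)
	binary.BigEndian.PutUint16(b[4:6], length)
	return b
}

// decode is the inverse of encode; it checks the slice length first,
// because the Uint32/Uint16 helpers panic on slices that are too short.
func decode(b []byte) (uint32, uint16, error) {
	if len(b) < 6 {
		return 0, 0, errors.New("record too short")
	}
	return binary.BigEndian.Uint32(b[0:4]), binary.BigEndian.Uint16(b[4:6]), nil
}

func main() {
	raw := encode(0xCAFEBABE, 513)
	fmt.Printf("% x\n", raw)

	id, length, err := decode(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("id=%#x length=%d\n", id, length)
}
```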

Here is an example using Read method:
package main
import (
"io"
"os"
)
func main() {
f, e := os.Open("a.go")
if e != nil {
panic(e)
}
defer f.Close()
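b := make([]byte, 10) // allocate the buffer once, outside the loop
for {
n, e := f.Read(b)
if e == io.EOF {
break
} else if e != nil {
panic(e)
}
chunk := b[:n] // only the first n bytes are valid
_ = chunk // do something with chunk here
}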
}
https://golang.org/pkg/os#File.Read


How to check a log/output in go test?

I have this function that logs the error in some cases:
func readByte(/*...*/){
// ...
if err != nil {
fmt.Println("ERROR")
log.Print("Couldn't read first byte")
return
}
// ...
}
Now, in the test file, I want to check the output error from this function:
c.Assert(OUTPUT, check.Matches, "teste")
How can I access the log? I tried to put a buffer but it didn't work. What is the right way to capture this log without changing my readByte function code?
For example,
readbyte_test.go:
package main
import (
"bytes"
"fmt"
"io"
"log"
"os"
"testing"
)
func readByte( /*...*/ ) {
// ...
err := io.EOF // force an error
if err != nil {
fmt.Println("ERROR")
log.Print("Couldn't read first byte")
return
}
// ...
}
func TestReadByte(t *testing.T) {
var buf bytes.Buffer
log.SetOutput(&buf)
defer func() {
log.SetOutput(os.Stderr)
}()
readByte()
t.Log(buf.String())
}
Output:
$ go test -v readbyte_test.go
=== RUN TestReadByte
ERROR
--- PASS: TestReadByte (0.00s)
readbyte_test.go:30: 2017/05/22 16:41:00 Couldn't read first byte
PASS
ok command-line-arguments 0.004s
$
Answer for Concurrent Tests
If your test is running concurrently (for example, when testing an http Server or Client), you may encounter a race between writing to the buffer and reading from it. Instead of the buffer, we can redirect output to an os.Pipe and use a bufio.Scanner to block until output has been written by using the Scan() method.
Here is an example of creating an os.Pipe and setting the stdlib log package to use the pipe. Note my use of the testify/assert package here:
func mockLogger(t *testing.T) (*bufio.Scanner, *os.File, *os.File) {
reader, writer, err := os.Pipe()
if err != nil {
t.Fatalf("couldn't get os.Pipe: %v", err) // stop here: the pipe ends are unusable
}
log.SetOutput(writer)
return bufio.NewScanner(reader), reader, writer
}
The *os.File objects are returned so they can be properly closed with a deferred function. Here I'm just printing to stdout since if there was some strange error on close I personally wouldn't want to fail the test. However, this could easily be another call to t.Errorf or similar if you wanted:
func resetLogger(reader *os.File, writer *os.File) {
err := reader.Close()
if err != nil {
fmt.Println("error closing reader was ", err)
}
if err = writer.Close(); err != nil {
fmt.Println("error closing writer was ", err)
}
log.SetOutput(os.Stderr)
}
And then in your test you would have this pattern:
scanner, reader, writer := mockLogger(t) // turn this off when debugging or developing as you will miss output!
defer resetLogger(reader, writer)
// other setup as needed, getting some value for thing below
go concurrentAction()
scanner.Scan() // blocks until a new line is written to the pipe
got := scanner.Text() // the last line written to the scanner
msg := fmt.Sprintf("your log message with the thing %v you care about", thing)
assert.Contains(t, got, msg)
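And finally, the concurrentAction() function calls a log function as shown below. If it logs through its own log.Logger created with log.New rather than the package-level functions, redirect that logger with its own SetOutput method, since the log.SetOutput() call above only affects the standard logger: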
// doing something, getting value for thing
log.Printf("your log message with the thing %v you care about", thing)

Use Gob to write logs to a file in an append style

Would it be possible to use Gob encoding for appending structs in series to the same file using append? It works for writing, but when reading with the decoder more than once I run into:
extra data in buffer
So I wonder if that's possible in the first place or whether I should use something like JSON to append JSON documents on a per line basis instead. Because the alternative would be to serialize a slice, but then again reading it as a whole would defeat the purpose of append.
The gob package wasn't designed to be used this way. A gob stream has to be written by a single gob.Encoder, and it also has to be read by a single gob.Decoder.
The reason for this is because the gob package not only serializes the values you pass to it, it also transmits data to describe their types:
A stream of gobs is self-describing. Each data item in the stream is preceded by a specification of its type, expressed in terms of a small set of predefined types.
This is state held by the encoder / decoder (which types have been transmitted, and how). A subsequent new encoder / decoder will not (and cannot) analyze the preceding stream to reconstruct the same state and continue where the previous encoder / decoder left off.
Of course if you create a single gob.Encoder, you may use it to serialize as many values as you'd like to.
Also you can create a gob.Encoder and write to a file, and then later create a new gob.Encoder, and append to the same file, but you must use 2 gob.Decoders to read those values, exactly matching the encoding process.
As a demonstration, let's follow an example. This example will write to an in-memory buffer (bytes.Buffer). 2 subsequent encoders will write to it, then we will use 2 subsequent decoders to read the values. We'll write values of this struct:
type Point struct {
X, Y int
}
For short, compact code, I use this "error handler" function:
func he(err error) {
if err != nil {
panic(err)
}
}
And now the code:
const n, m = 3, 2
buf := &bytes.Buffer{}
e := gob.NewEncoder(buf)
for i := 0; i < n; i++ {
he(e.Encode(&Point{X: i, Y: i * 2}))
}
e = gob.NewEncoder(buf)
for i := 0; i < m; i++ {
he(e.Encode(&Point{X: i, Y: 10 + i}))
}
d := gob.NewDecoder(buf)
for i := 0; i < n; i++ {
var p *Point
he(d.Decode(&p))
fmt.Println(p)
}
d = gob.NewDecoder(buf)
for i := 0; i < m; i++ {
var p *Point
he(d.Decode(&p))
fmt.Println(p)
}
Output (try it on the Go Playground):
&{0 0}
&{1 2}
&{2 4}
&{0 10}
&{1 11}
Note that if we used only 1 decoder to read all the values (looping until i < n + m), we'd get the same error message you posted in your question when the iteration reaches n + 1, because the subsequent data is not a serialized Point but the start of a new gob stream.
So if you want to stick with the gob package for doing what you want to do, you have to slightly modify, enhance your encoding / decoding process. You have to somehow mark the boundaries when a new encoder is used (so when decoding, you'll know you have to create a new decoder to read subsequent values).
You may use different techniques to achieve this:
You may write out a number, a count before you proceed to write values, and this number would tell how many values were written using the current encoder.
If you don't want to or can't tell how many values will be written with the current encoder, you may opt to write out a special end-of-encoder value when you don't write more values with the current encoder. When decoding, if you encounter this special end-of-encoder value, you'll know you have to create a new decoder to be able to read more values.
Some things to note here:
The gob package is most efficient, most compact if only a single encoder is used, because each time you create and use a new encoder, the type specifications will have to be re-transmitted, causing more overhead, and making the encoding / decoding process slower.
You can't seek in the data stream, you can only decode any value if you read the whole file from the beginning up until the value you want. Note that this somewhat applies even if you use other formats (such as JSON or XML).
If you want seeking functionality, you'd need to manage an index file separately, which would tell at which positions new encoders / decoders start, so you could seek to that position, create a new decoder, and start reading values from there.
Check a related question: Efficient Go serialization of struct to disk
In addition to the above, I suggest using an intermediate structure to exclude the gob header:
package main
import (
"bytes"
"encoding/gob"
"fmt"
"io"
"log"
)
type Point struct {
X, Y int
}
func main() {
buf := new(bytes.Buffer)
enc, _, err := NewEncoderWithoutHeader(buf, new(Point))
if err != nil {
log.Fatal(err)
}
enc.Encode(&Point{10, 10})
fmt.Println(buf.Bytes())
}
type HeaderSkiper struct {
src io.Reader
dst io.Writer
}
func (hs *HeaderSkiper) Read(p []byte) (int, error) {
return hs.src.Read(p)
}
func (hs *HeaderSkiper) Write(p []byte) (int, error) {
return hs.dst.Write(p)
}
func NewEncoderWithoutHeader(w io.Writer, sample interface{}) (*gob.Encoder, *bytes.Buffer, error) {
hs := new(HeaderSkiper)
hdr := new(bytes.Buffer)
hs.dst = hdr
enc := gob.NewEncoder(hs)
// Write sample with header info
if err := enc.Encode(sample); err != nil {
return nil, nil, err
}
// Change writer
hs.dst = w
return enc, hdr, nil
}
func NewDecoderWithoutHeader(r io.Reader, hdr *bytes.Buffer, dummy interface{}) (*gob.Decoder, error) {
hs := new(HeaderSkiper)
hs.src = hdr
dec := gob.NewDecoder(hs)
if err := dec.Decode(dummy); err != nil {
return nil, err
}
hs.src = r
return dec, nil
}
In addition to icza's great answer, you can use the following trick to append to a gob file that already contains data: when appending for the first time, encode a dummy value first and discard its output:
Create the file
Encode gob as usual (the first encode writes the headers)
Close the file
Open the file for append
Using an intermediate writer, encode a dummy struct (which writes the headers)
Reset the writer
Encode gob as usual (writes no headers)
Example:
package main
import (
"bytes"
"encoding/gob"
"fmt"
"io"
"io/ioutil"
"log"
"os"
)
type Record struct {
ID int
Body string
}
func main() {
r1 := Record{ID: 1, Body: "abc"}
r2 := Record{ID: 2, Body: "def"}
// encode r1
var buf1 bytes.Buffer
enc := gob.NewEncoder(&buf1)
err := enc.Encode(r1)
if err != nil {
log.Fatal(err)
}
// write to file
err = ioutil.WriteFile("/tmp/log.gob", buf1.Bytes(), 0600)
if err != nil {
log.Fatal(err)
}
// encode dummy (which write headers)
var buf2 bytes.Buffer
enc = gob.NewEncoder(&buf2)
err = enc.Encode(Record{})
if err != nil {
log.Fatal(err)
}
// remove dummy
buf2.Reset()
// encode r2
err = enc.Encode(r2)
if err != nil {
log.Fatal(err)
}
// open file
f, err := os.OpenFile("/tmp/log.gob", os.O_WRONLY|os.O_APPEND, 0600)
if err != nil {
log.Fatal(err)
}
// write r2
_, err = f.Write(buf2.Bytes())
if err != nil {
log.Fatal(err)
}
// decode file
data, err := ioutil.ReadFile("/tmp/log.gob")
if err != nil {
log.Fatal(err)
}
var r Record
dec := gob.NewDecoder(bytes.NewReader(data))
for {
err = dec.Decode(&r)
if err == io.EOF {
break
}
if err != nil {
log.Fatal(err)
}
fmt.Println(r)
}
}

Extract words from PDF with golang?

I don't understand type conversion. I know this isn't right, all I get is a bunch of hieroglyphs.
f, _ := os.Open("test.pdf")
defer f.Close()
io.Copy(os.Stdout, f)
I want to work with the strings....
I tried some Go PDF libraries and found that sajari/docconv works the way I expect.
It is easy to use; here is an example:
package main
import (
"fmt"
"log"
"code.sajari.com/docconv"
)
func main() {
res, err := docconv.ConvertPath("your-file.pdf")
if err != nil {
log.Fatal(err)
}
fmt.Println(res)
}
It's because the PDF doesn't contain only the text; it also contains formatting information (fonts, padding, margins, positions, shapes, images).
If you need to read the plain text without formatting, I have forked a repository and implemented a function to do that. You can check it at https://github.com/ledongthuc/pdf
Here is an example; I hope it is useful to you.
package main
import (
"bytes"
"fmt"
"github.com/ledongthuc/pdf"
)
func main() {
content, err := readPdf("test.pdf") // Read local pdf file
if err != nil {
panic(err)
}
fmt.Println(content)
return
}
func readPdf(path string) (string, error) {
r, err := pdf.Open(path)
if err != nil {
return "", err
}
totalPage := r.NumPage()
var textBuilder bytes.Buffer
for pageIndex := 1; pageIndex <= totalPage; pageIndex++ {
p := r.Page(pageIndex)
if p.V.IsNull() {
continue
}
textBuilder.WriteString(p.GetPlainText("\n"))
}
return textBuilder.String(), nil
}
all I get is a bunch of hieroglyphs.
What you get is the content of a pdf file, which is not clear text.
If you want to read a pdf file in Go, use one of the golang pdf libraries like rsc.io/pdf, or one of those libraries like yob/pdfreader.
As mentioned here:
I doubt there is any 'solid framework' for this kind of stuff. PDF format isn't meant to be machine-friendly by design, and AFAIK there is no guaranteed way to parse arbitrary PDFs.

golang upload file err runtime error index out of range

I've put together a golang func that takes an uploaded file and saves it to folder.
Just before os.Create() I am getting the following error :
http: panic serving [::1]:64373: runtime error: index out of range
My golang function is:
func webUploadHandler(w http.ResponseWriter, r *http.Request) {
file, header, err := r.FormFile("file") // the FormFile function takes in the POST input id file
if err != nil {
fmt.Fprintln(w, err)
return
}
defer file.Close()
// My error comes here
messageId := r.URL.Query()["id"][0]
out, err := os.Create("./upload/" + messageId + ".mp3")
if err != nil {
fmt.Fprintf(w, "Unable to create the file for writing. Check your write access privilege")
return
}
defer out.Close()
// write the content from POST to the file
_, err = io.Copy(out, file)
if err != nil {
fmt.Fprintln(w, err)
}
fmt.Fprintf(w,"File uploaded successfully : ")
fmt.Fprintf(w, header.Filename)
}
any ideas? much appreciate
You should at least check that r.URL.Query()["id"] actually has an element.
If len(r.URL.Query()["id"]) is 0, do not access index 0.
More simply, Ainar-G suggests in the comments using the Get() method:
Get gets the first value associated with the given key.
If there are no values associated with the key, Get returns the empty string.
To access multiple values, use the map directly.

golang - How to check multipart.File information

When a user uploads a file using r.FormFile("file") you get a multipart.File, a multipart.FileHeader and an error.
My question is how to just obtain information about the uploaded file . For example, its size, its dimensions if it's an image, and so on and so forth.
I have literally got no idea on where to start so any help would be great.
To get the file size and MIME type:
// Size constants
const (
MB = 1 << 20
)
type Sizer interface {
Size() int64
}
func Sample(w http.ResponseWriter, r *http.Request) error {
// Limit the upload size before the body is parsed.
r.Body = http.MaxBytesReader(w, r.Body, 5*MB) // 5 MB
if err := r.ParseMultipartForm(5 * MB); err != nil {
return err
}
file, multipartFileHeader, err := r.FormFile("file")
if err != nil {
return err
}
defer file.Close()
// Create a buffer to store the first bytes of the file in
fileHeader := make([]byte, 512)
// Copy the first bytes into the buffer; a short file is not an error (needs the "io" import)
n, err := file.Read(fileHeader)
if err != nil && err != io.EOF {
return err
}
// set position back to start.
if _, err := file.Seek(0, 0); err != nil {
return err
}
// Files spilled to disk do not implement Sizer, so start from the header's Size field
size := multipartFileHeader.Size
if s, ok := file.(Sizer); ok {
size = s.Size()
}
log.Printf("Name: %#v\n", multipartFileHeader.Filename)
log.Printf("Size: %#v\n", size)
log.Printf("MIME: %#v\n", http.DetectContentType(fileHeader[:n]))
return nil
}
Sample output:
2016/12/01 15:00:06 Name: "logo_35x30_black.png"
2016/12/01 15:00:06 Size: 18674
2016/12/01 15:00:06 MIME: "image/png"
The file name and MIME type can be obtained from the returned multipart.FileHeader.
Most further meta-data will depend on the file type. If it's an image, you should be able to use the DecodeConfig functions in the standard library, for PNG, JPEG and GIF, to obtain the dimensions (and color model).
There are many Go libraries available for other file types as well, which will have similar functions.
EDIT: There's a good example on the golang-nuts mail group.
You can get approximate information about the size of the file from the Content-Length header. This is not recommended, because the header is set by the client and can be wrong or spoofed.
A better way is to use the ReadFrom method:
clientFile, handler, err := r.FormFile("file") // r is *http.Request
var buff bytes.Buffer
fileSize, err := buff.ReadFrom(clientFile)
fmt.Println(fileSize) // this will return you a file size.
Another approach I've found simple for this kind of testing is to place test assets in a test_data directory relative to the package. Within my test file I normally create a helper that builds an *http.Request, which makes it easy to run table-driven tests on multipart.File (some error checks are reduced to returning nil for brevity).
func createMockRequest(pathToFile string) *http.Request {
file, err := os.Open(pathToFile)
if err != nil {
return nil
}
defer file.Close()
body := &bytes.Buffer{}
writer := multipart.NewWriter(body)
part, err := writer.CreateFormFile("file", filepath.Base(pathToFile))
if err != nil {
return nil
}
_, _ = io.Copy(part, file)
err = writer.Close()
if err != nil {
return nil
}
// the body is the only important data for creating a new request with the form data attached
req, _ := http.NewRequest("POST", "", body)
req.Header.Set("Content-Type", writer.FormDataContentType())
return req
}