Testing Go Web Scraper with Go-VCR

I'm new to Go and its resources, and I have been looking around for quite some time without finding what I'm looking for. If a resource for this already exists, I apologize for the duplicate question and would appreciate being pointed to it.
My goal is simply to build a web scraper. I'm using chromedp, which has features to focus on elements, fill in text, etc. I want to create a test environment/server to test against during development. The main reasons are that I don't want to keep sending GET requests to a live website (out of common courtesy), that I want to be able to work offline, and that it should make testing a little faster. I stumbled across the go-vcr library and have been trying to get it to work, but to no avail. I can get it to record and create a .yaml file, but I can't figure out how to test anything beyond the raw HTML that comes back and gets stored in the .yaml file. My understanding is that it's possible to replicate the website and its functionality using the library, but I'm unable to piece together how to do that.
Is what I'm trying to do possible, or is the go-vcr library (or any test/fake server for that matter) only capable of returning static data, therefore rendering anything I want to test with the web scraper not possible?
I haven't posted any code simply because I haven't pieced together much more than the examples given from the repository for the go-vcr.
I hope I was able to explain that in a way that made sense. If not I'd be happy to answer questions to clarify.
Update: Adding the example code for sake of ease. I understand how this part of it works (I think) and I can use it for testing whether or not I grabbed the proper elements of a static page, but ideally (as an example) I want to be able to fill in a text box with my program and test whether or not I successfully found the text box and filled it in without hitting the live webpage.
package vcr_test

import (
    "io/ioutil"
    "net/http"
    "strings"
    "testing"

    "github.com/dnaeon/go-vcr/recorder"
)

func TestSimple(t *testing.T) {
    // Start our recorder
    r, err := recorder.New("fixtures/golang-org")
    if err != nil {
        t.Fatal(err)
    }
    defer r.Stop() // Make sure recorder is stopped once done with it

    // Create an HTTP client and inject our transport
    client := &http.Client{
        Transport: r, // Inject as transport!
    }

    url := "http://golang.org/"
    resp, err := client.Get(url)
    if err != nil {
        t.Fatalf("Failed to get url %s: %s", url, err)
    }

    body, err := ioutil.ReadAll(resp.Body)
    if err != nil {
        t.Fatalf("Failed to read response body: %s", err)
    }

    wantTitle := "<title>The Go Programming Language</title>"
    bodyContent := string(body)
    if !strings.Contains(bodyContent, wantTitle) {
        t.Errorf("Title %s not found in response", wantTitle)
    }
}
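For the "fill in a text box without hitting the live page" case, one possible direction is a small local server built with net/http/httptest from the standard library. The sketch below is only an illustration under stated assumptions: the form markup, the /search path, the field name q, and the helper names newFakeSite and submit are all hypothetical stand-ins, not taken from any real site. As far as I can tell, go-vcr replays traffic that goes through a Go http.Client transport, while chromedp drives a real browser, so a local server like this may be the easier thing to point the browser at.

```go
package main

import (
	"fmt"
	"html"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
)

// formPage is a hypothetical stand-in for the page being scraped.
const formPage = `<html><body>
<form method="POST" action="/search">
  <input type="text" id="q" name="q">
  <input type="submit" value="Go">
</form>
</body></html>`

// newFakeSite returns a local server that serves the form and echoes back
// whatever was submitted, so a test can check that the text box was filled in.
func newFakeSite() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, formPage)
	})
	mux.HandleFunc("/search", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, `<p id="result">%s</p>`, html.EscapeString(r.FormValue("q")))
	})
	return httptest.NewServer(mux)
}

// submit posts a value the way a browser would after the form is filled in.
func submit(baseURL, value string) (string, error) {
	resp, err := http.PostForm(baseURL+"/search", url.Values{"q": {value}})
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	srv := newFakeSite()
	defer srv.Close()

	body, err := submit(srv.URL, "gopher")
	if err != nil {
		panic(err)
	}
	fmt.Println(strings.Contains(body, "gopher"))
}
```

In a chromedp test, the scraper would presumably navigate to srv.URL instead of the live address, type into #q, submit, and then read #result to confirm the text arrived. The plain http.PostForm call above only stands in for that browser interaction.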

Related

How to make a post request using the gogearbox framework

I am currently stuck because I do not know how to complete the POST request for a project I am working on. I am using a framework called gearbox (I wanted to try something new). The main problem is that I don't know how to bind the JSON body to a new variable. The package is "github.com/gogearbox/gearbox".
I looked through the documentation and tried a few different functions, but none of them worked, so any help would be appreciated.

You should provide some code even if it doesn't work. It's usually a good starting point. This way we can avoid trying things you already tested out. I've briefly read the doc and didn't test the code below but you may try to look at the ParseBody function:
type Payload struct {
    FirstName string `json:"firstname"`
    LastName  string `json:"lastname"`
}

// As far as I can tell from the docs, gearbox handlers receive a gearbox.Context
requestHandler := func(ctx gearbox.Context) {
    var payload Payload
    // ParseBody decodes the JSON request body into payload
    if err := ctx.ParseBody(&payload); err != nil {
        ctx.Status(gearbox.StatusInternalServerError).SendString("Something went wrong when parsing your payload!")
        return
    }
    // do something with your payload
}
reference here

Using encoding.com golang wrapper

I'm rather new to both golang and encoding.com and I'm trying to use the encoding.com API wrapper to transcode a simple video file, but I'm rather confused by the format to use.
When looking at the tests I can see how to call the AddMedia function (https://github.com/nytimes/encoding-wrapper/blob/master/encodingcom/media_test.go#L9-L39) but unfortunately it doesn't work for me.
package main

import (
    "log"

    "github.com/NYTimes/encoding-wrapper/encodingcom"
)

func main() {
    client, err := encodingcom.NewClient("https://manage.encoding.com", "123", "key")
    if err != nil {
        log.Fatal(err)
    }
    format := encodingcom.Format{
        Output:       []string{"https://key:secret#bucket.s3.amazonaws.com/aladin.ogg"},
        VideoCodec:   "libtheora",
        AudioCodec:   "libvorbis",
        Bitrate:      "400k",
        AudioBitrate: "64k",
    }
    addMediaResponse, err := client.AddMedia([]string{"https://samples.mplayerhq.hu/h264/Aladin.mpg"},
        []encodingcom.Format{format}, "us-east-1")
    if err != nil {
        log.Fatal(err) // this is where the APIError below shows up
    }
    log.Printf("%+v", addMediaResponse)
}
The error returned is
APIError.Errors.Errors0: Output format 'https://key:secret#bucket.s3.amazonaws.com/aladin.aac' is not allowed! (format #0)
APIError.Message:
and I really don't get it. The Output element in the Format looks misplaced. Am I reading the test wrong? According to the API builder, the format parameter should receive only the format, for example "ogg", and there is a separate "destination" parameter for S3. The docs also don't say whether the URL must be URL-encoded. I don't think it has to be, but keys and secrets can contain characters such as '/'.
Can any more experienced gopher help?

TDD Constructors Golang

Even though there are a few posts on this, I haven't found one with much substance. So hopefully a few people will share opinions on this.
One thing holding me back from a true TDD workflow is that I can't figure out a clean way to test things that have to connect to networked services like a database.
For example:
type DB struct {
    conn *sql.DB
}

func NewDB(URL string) (*DB, error) {
    conn, err := sql.Open("postgres", URL)
    if err != nil {
        return nil, err
    }
    return &DB{conn: conn}, nil
}
I know I could pass the sql connection to NewDB, or directly to the struct and assign it to an interface that has all the methods I need, and that would be easily testable. But somewhere, I'm going to have to connect. The only way to test this that I've been able to find is...
var sqlOpen = sql.Open

func CreateDB() *DB {
    conn, err := sqlOpen("postgres", "url...")
    if err != nil {
        log.Fatal(err)
    }
    return &DB{
        conn: conn,
    }
}
Then in the test you swap out sqlOpen for a function with the same signature that returns an error in one test case and no error in another. But this feels like a hack, especially if you're doing it for several functions in the same file. Is there a better way? The codebase I'm working with has a lot of functions in packages and a lot of network connections. Because I'm struggling to test things in a clean way, it's driving me away from TDD.
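To make the swap concrete, here is a minimal, self-contained sketch of that pattern. It uses a NewDB variant that returns the error so both cases can be observed; withStub, failingOpen, workingOpen, and errBoom are hypothetical names introduced only for this illustration, and the stubbed *sql.DB is a placeholder that is never used for real queries.

```go
package main

import (
	"database/sql"
	"errors"
	"fmt"
)

type DB struct {
	conn *sql.DB
}

// sqlOpen is the package-level seam: production code uses sql.Open,
// tests replace it with a stub that has the same signature.
var sqlOpen = sql.Open

func NewDB(url string) (*DB, error) {
	conn, err := sqlOpen("postgres", url)
	if err != nil {
		return nil, err
	}
	return &DB{conn: conn}, nil
}

// withStub (hypothetical helper) swaps sqlOpen for the duration of fn
// and restores the original afterwards, so tests do not leak state.
func withStub(stub func(driverName, dsn string) (*sql.DB, error), fn func()) {
	orig := sqlOpen
	defer func() { sqlOpen = orig }()
	sqlOpen = stub
	fn()
}

// errBoom is a made-up sentinel error for the failing case.
var errBoom = errors.New("connection failed")

func failingOpen(string, string) (*sql.DB, error) { return nil, errBoom }

// workingOpen returns an empty placeholder; it must not be used for queries.
func workingOpen(string, string) (*sql.DB, error) { return new(sql.DB), nil }

func main() {
	withStub(failingOpen, func() {
		_, err := NewDB("url...")
		fmt.Println("error case:", errors.Is(err, errBoom))
	})
	withStub(workingOpen, func() {
		db, err := NewDB("url...")
		fmt.Println("success case:", db != nil, err)
	})
}
```

Restoring the original in a defer keeps the hack contained to one helper, which may make it more tolerable when several functions in the same file need the same treatment.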

A typical business application has a lot of logic in its queries. If those queries are not tested, we significantly reduce test coverage and leave room for regressions. So mocking DB repositories is not the best option. Instead, we can mock the database itself and test how we work with it at the SQL level.
Below is sample code using DATA-DOG/go-sqlmock, but there could be other libraries that mock SQL databases.
First of all, we need to inject the SQL connection into our code. The name is misleading: Go's *sql.DB is actually a connection pool, not a single DB connection. That is why it makes sense to create a single *sql.DB in your composition root and reuse it throughout your code, even if you do not write tests.
The sample below shows how to mock the database behind a web service handler.
First, we create a new handler with the injected connection:
// New creates new handler
func New(db *sql.DB) http.Handler {
    return &handler{
        db: db,
    }
}
Handler code:
type handler struct {
    db *sql.DB
}

func (h handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    // some code that loads person name from database using id
}
A unit test for that code that mocks the DB. It uses stretchr/testify for assertions:
func TestHandler(t *testing.T) {
    db, sqlMock, err := sqlmock.New()
    require.NoError(t, err)
    defer db.Close()

    rows := sqlmock.NewRows([]string{"name"}).AddRow("John")
    // regex is used to match query
    // assert that we execute SQL statement with parameter and return data
    sqlMock.ExpectQuery(`select name from person where id \= \?`).WithArgs(42).WillReturnRows(rows)

    sut := mypackage.New(db)
    r, err := http.NewRequest(http.MethodGet, "https://example.com", nil)
    require.NoError(t, err, fmt.Sprintf("Failed to create request: %v", err))
    w := httptest.NewRecorder()
    sut.ServeHTTP(w, r)

    // make sure that all DB expectations were met
    err = sqlMock.ExpectationsWereMet()
    assert.NoError(t, err)
    // other assertions that check DB data should be here
    assert.Equal(t, http.StatusOK, w.Code)
}
Our test asserts a simple SQL statement against the DB, but with go-sqlmock it is possible to test all CRUD operations and database transactions.
The test above still has one weak point. We tested that our SQL statement is executed from code, but we did not test whether it works against our real DB. That issue cannot be solved with unit tests. The only solution is an integration test against a real DB.
We are in a better position now, though. Our business logic is already covered by unit tests. We do not need to create lots of integration tests to cover different scenarios and parameters. Instead, we need just one test per query to verify that the SQL syntax is valid and matches our DB schema.
Happy testing!

Limiting simultaneous downloads using RxAlamofire

My app downloads files from a server, and I want only one download to be in progress at a time. How can this be done with RxAlamofire? I might simply be missing an Rx operator.
Here's the rough code:
Observable
    .from(paths)
    .flatMapWithIndex({ (ip, idx) -> Observable<(Int, Video)> in
        let v = self.files![ip.row] as! Video
        return Observable.from([(idx, v)])
    })
    .flatMap { (item) -> Observable<Video> in
        let req = URLRequest(url: item.1.downloadURL())
        return Api.alamofireManager()
            .rx
            .download(req, to: { (url, response) -> (destinationURL: URL, options: DownloadRequest.DownloadOptions) in
                ...
            })
            .flatMap({ $0.rx.progress() })
            .flatMap { (progress) -> Observable<Float> in
                // Update a progress bar
                ...
            }
            // Only propagate finished items
            .filter { $0 >= 1.0 }
            // Return the item itself
            .flatMap { _ in Observable.from([item.1]) }
    }
    .subscribe(
        onNext: { (res) in
            ...
        },
        onError: { (error) in
            ...
        },
        onCompleted: {
            ...
        }
    )
My problem is that a) RxAlamofire downloads multiple items at the same time and b) the (progress) block is called multiple times for those various items (with different progress info for each, causing the UI to behave a bit weirdly).
How can I ensure the downloads are done one by one instead of simultaneously?

Does alamofireManager().rx.download() download concurrently or serially?
I'm not sure which it does, so test that first. Isolate this code and check whether it runs multiple downloads at once.
If it downloads one at a time, then something in your Rx code is triggering the progress bar update issue. If it does not, read Alamofire's documentation on how to make downloads serial instead of concurrent.
Complex transformations and side effects
Something to consider is that your data stream is becoming complex and difficult to debug because so many things happen in it. Because of the multiple flatMaps, many more emissions can come out and affect the progress bar update. It is also possible that the numerous flatMap operations that each acquire an Observable are what triggers the progress bar updates multiple times.
Complex data streams
In one data stream you (a) performed the network call (b) updated the progress bar (c) filtered finished videos (d) and went back to the video you wanted by using flatMapWithIndex at the start to pair together id and the video model so that you can return back to the model at the end. Kind of complicated... My guess is that the weird progress bar updates might be caused by creating a hot observable on call of $0.rx.progress().
I made a github gist of my Rx Playground that tries to model what you're trying to do.
In functional reactive programming, it would be much more readable and easier to debug if you first define your data streams/observables. In my gist, I began with the observables and how I planned to model the download progress.
This code will avoid the concurrency issues if the RxAlamofire query downloads 1 at a time, and it properly presents the progress value for a UIProgressBar.
Side note
Do you need to track the individual progress downloads per download item? Or do you want your progress bar to just increment per finished download item?
Also, be wary of the possible dangers of misusing a chain of multiple flatMaps as explained here.

Go error handling, type assertion, and the net package

I'm learning Go and trying to understand how to get more detailed error information out of the generic error type. The example I'll use is from the net package, specifically the DialTimeout function.
The signature is
func DialTimeout(network, address string, timeout time.Duration) (Conn, error)
The error type only defines an Error() string function. If I want to find out exactly why DialTimeout failed, how can I get that information? I found out that I can use type assertion to get the net.Error specific error:
con, err := net.DialTimeout("tcp", net.JoinHostPort(address, "22"),
    time.Duration(5) * time.Second)
if err != nil {
    netErr, ok := err.(net.Error)
    if ok && netErr.Timeout() {
        // ...
    }
}
but that only tells me whether or not I had a timeout. For example, say I wanted to differentiate between a refused connection and no route to host. How can I do that?
Maybe DialTimeout is too high-level to give me that kind of detail, but even looking at syscall.Connect, I don't see how to get the specific error. It just says it returns the generic error type. Compare that to the Posix connect, which will let me know why it failed with the various return codes.
My general question is: how am I supposed to pull out error details from the generic error type if the golang docs don't tell me what type of errors may be returned?

Most networking operations return a *net.OpError, which holds detailed information about the error and implements the net.Error interface. So for most use cases it is sufficient to use net.Error, as you already did.
But for your case you'd want to assert that the returned error is a *net.OpError and use its internal error:
if err != nil {
    if oerr, ok := err.(*net.OpError); ok {
        // Do something with oerr.Err
    }
}
As soon as you're doing this you are in the land of platform dependency as syscalls under Linux
can fail differently to those under Windows. For Linux you'd do something like this:
if oerr.Err == syscall.ECONNREFUSED {
    // Connection was refused :(
}
The syscall package contains the important error constants for your platform. Unfortunately
the golang website only shows the syscall package for Linux amd64. See here for ECONNREFUSED.
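The two steps above (asserting *net.OpError, then comparing the inner error) can also be written with errors.As and errors.Is, which exist since Go 1.13. The following is a minimal sketch, assuming a Unix-like platform. The helper names classify and sampleRefused are hypothetical, and sampleRefused builds an error by hand so the example runs without touching the network. In newer Go versions the inner error is typically wrapped in an *os.SyscallError, so a direct == comparison may not match; errors.Is unwraps it.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"syscall"
)

// classify is a hypothetical helper that maps a dial error to a short label.
func classify(err error) string {
	var opErr *net.OpError
	if !errors.As(err, &opErr) {
		return "not a net.OpError"
	}
	switch {
	case opErr.Timeout():
		return "timeout"
	case errors.Is(opErr.Err, syscall.ECONNREFUSED):
		return "connection refused"
	case errors.Is(opErr.Err, syscall.EHOSTUNREACH):
		return "no route to host"
	default:
		return "other network error"
	}
}

// sampleRefused builds, by hand, an error shaped like the one a refused
// dial is assumed to return on Linux; it is only here for demonstration.
func sampleRefused() error {
	return &net.OpError{
		Op:  "dial",
		Net: "tcp",
		Err: os.NewSyscallError("connect", syscall.ECONNREFUSED),
	}
}

func main() {
	fmt.Println(classify(sampleRefused()))
	fmt.Println(classify(errors.New("something else")))
}
```

In real code, classify would be called with the error returned by net.DialTimeout. The platform caveat still applies: which errno values actually show up depends on the operating system.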
Finding out types
The next time you're wondering what is actually returned by some function and you can't make
heads or tails of it, try using the %#v format specifier in fmt.Printf (and friends):
fmt.Printf("%#v\n", err)
// &net.OpError{Op:"dial", Net:"tcp", Addr:(*net.TCPAddr)(0xc20006d390), Err:0x6f}
It will print detailed type information and is generally quite helpful.
