How to Unit Test Gorm Golang Preload Query - eager-loading

Recently, I was trying to query data with the Preload function that GORM provides.
Here's what the actual implementation looks like:
func (repo *SectorRepository) GetSectorsByStockCode(stockCode string) *model.Sector {
	var foundStocks []*model.Stock
	repo.DB.Where("code = ?", stockCode).Preload("Sectors").Find(&foundStocks)
	if foundStocks == nil {
		return nil
	} else if len(foundStocks) == 0 {
		return nil
	} else if foundStocks[0].Sectors == nil {
		return nil
	} else if len(foundStocks[0].Sectors) == 0 {
		return nil
	} else {
		return foundStocks[0].Sectors[0]
	}
}
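For context, the models involved look roughly like the sketch below. The field names, tags, and the many2many join table are inferred from the SQL that GORM generates in the log further down, not copied from the real code:
// Rough sketch only — reconstructed from the generated queries, so the real
// structs may differ.
type Stock struct {
	Code         string `gorm:"primaryKey"`
	ExchangeCode string `gorm:"primaryKey"`
	Name         string
	Sectors      []*Sector `gorm:"many2many:stock_sectors"`
	CreatedAt    time.Time
	UpdatedAt    time.Time
}

type Sector struct {
	Code      string `gorm:"primaryKey"`
	Name      string
	CreatedAt time.Time
	UpdatedAt time.Time
}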
This is what I've done in my unit test:
func Test_SectorRepo_GetSectorsByStockCode(t *testing.T) {
	db, gdb, sqlMock, err := InitTestSectorRepo()
	require.NoError(t, err)
	defer db.Close()

	repository := repository.SectorRepository{DB: gdb}
	var stockCode = "AAPL"
	var expectedCode = "code1"
	var expectedName = "name1"
	var exchangeCode = "exchange_code1"
	var sectorCode = "sector_code1"
	var now = time.Now()

	sqlMock.ExpectQuery(regexp.QuoteMeta(`SELECT * FROM "stocks" WHERE code = $1`)).WithArgs(stockCode).WillReturnRows(
		sqlMock.NewRows([]string{"code", "name", "exchange_code", "created_at", "updated_at"}).
			AddRow(expectedCode, expectedName, exchangeCode, now, now),
	)
	sqlMock.ExpectQuery(regexp.QuoteMeta(`SELECT * FROM "stock_sectors" WHERE ("stock_sectors"."stock_code","stock_sectors"."stock_exchange_code") IN (($1,$2))`)).
		WithArgs(expectedCode, exchangeCode).
		WillReturnRows(sqlMock.NewRows([]string{"stock_code", "stock_exchange_code", "sector_code"}).AddRow(expectedCode, exchangeCode, sectorCode))
	sqlMock.ExpectQuery(regexp.QuoteMeta(`SELECT * FROM "sectors" WHERE "sectors"."code" = $1'`)).WithArgs(stockCode).WillReturnRows(
		sqlMock.NewRows([]string{"code", "name", "created_at", "updated_at"}).AddRow(sectorCode, "sector_name1", now, now),
	)

	res := repository.GetSectorsByStockCode(stockCode)
	require.Equal(t, res.Name, expectedName)
}
But I got the failure message below:
Running tool: /usr/local/go/bin/go test -timeout 30s -run ^Test_SectorRepo_GetSectorsByStockCode$ gitlab.com/xxx/yyy-service/test/repository
2022/07/02 20:38:10 /Users/aaa/yyy-service/api/repository/sector_repository.go:38
[0.045ms] [rows:1] SELECT * FROM "stock_sectors" WHERE ("stock_sectors"."stock_code","stock_sectors"."stock_exchange_code") IN (('code1','exchange_code1'))
2022/07/02 20:38:10 /Users/aaa/yyy-service/api/repository/sector_repository.go:38 Query: could not match actual sql: "SELECT * FROM "sectors" WHERE "sectors"."code" = $1" with expected regexp "SELECT \* FROM "sectors" WHERE "sectors"\."code" = \$1'"
[0.025ms] [rows:0] SELECT * FROM "sectors" WHERE "sectors"."code" = 'sector_code1'
2022/07/02 20:38:10 /Users/aaa/yyy-service/api/repository/sector_repository.go:38 Query: could not match actual sql: "SELECT * FROM "sectors" WHERE "sectors"."code" = $1" with expected regexp "SELECT \* FROM "sectors" WHERE "sectors"\."code" = \$1'"
[1.300ms] [rows:1] SELECT * FROM "stocks" WHERE code = 'AAPL'
--- FAIL: Test_SectorRepo_GetSectorsByStockCode (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x14499f6]
goroutine 4 [running]:
testing.tRunner.func1.2({0x1497620, 0x1896c40})
/usr/local/go/src/testing/testing.go:1209 +0x24e
testing.tRunner.func1()
/usr/local/go/src/testing/testing.go:1212 +0x218
panic({0x1497620, 0x1896c40})
/usr/local/go/src/runtime/panic.go:1038 +0x215
gitlab.com/xxx/yyy-service/test/repository_test.Test_SectorRepo_GetSectorsByStockCode(0x0)
/Users/aaa/yyy-service/test/repository/sector_repository_test.go:131 +0x8b6
testing.tRunner(0xc00011b380, 0x1526fd8)
/usr/local/go/src/testing/testing.go:1259 +0x102
created by testing.(*T).Run
/usr/local/go/src/testing/testing.go:1306 +0x35a
FAIL gitlab.com/xxx/yyy-service/test/repository 0.533s
FAIL
I thought at first that GORM's Preload function would create subsequent queries, hence in the code above I was trying to set expectations for all of those subsequent queries.
I'm still trying to figure out whether it's possible to test a GORM Preload query using sqlmock.
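Based on the log above, the preload does issue a separate query (SELECT * FROM "sectors" WHERE "sectors"."code" = $1, bound to 'sector_code1'), so my guess is that the third expectation needs the stray trailing quote removed from its pattern and sectorCode passed instead of stockCode — something like the sketch below (reusing the variables from the test), though I haven't verified that this makes the test pass:
sqlMock.ExpectQuery(regexp.QuoteMeta(`SELECT * FROM "sectors" WHERE "sectors"."code" = $1`)).
	WithArgs(sectorCode). // the preload binds the sector code from stock_sectors, not the stock code
	WillReturnRows(
		sqlMock.NewRows([]string{"code", "name", "created_at", "updated_at"}).
			AddRow(sectorCode, "sector_name1", now, now),
	)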
Any help with this problem would be much appreciated :)

Related

Submitting an SQL query with a slice parameter

I have a Snowflake query where I'm trying to update a field on all items where another field is in a list which is submitted to the query as a variable:
UPDATE my_table SET download_enabled = ? WHERE provider_id = ? AND symbol IN (?)
I've tried doing this query using the gosnowflake.Array function like this:
enable := true
provider := 1
query := "UPDATE my_table SET download_enabled = ? WHERE provider_id = ? AND symbol IN (?)"
if _, err := client.db.ExecContext(ctx, query, enable, provider,
	gosnowflake.Array(assets)); err != nil {
	fmt.Printf("Error: %v", err)
}
However, this code fails with the following error:
002099 (42601): SQL compilation error: Batch size of 1 for bind variable 1 not the same as previous size of 2.
So then, how can I submit a variable representing a list of values to an SQL query?
I found a potential workaround, which is to submit each item in the list as a separate parameter explicitly:
func Delimit(s string, sep string, count uint) string {
	return strings.Repeat(s+sep, int(count)-1) + s
}

func doQuery(enable bool, provider int, assets ...string) error {
	query := fmt.Sprintf("UPDATE my_table SET download_enabled = ? "+
		"WHERE provider_id = ? AND symbol IN (%s)", Delimit("?", ", ", uint(len(assets))))
	params := []interface{}{enable, provider}
	for _, asset := range assets {
		params = append(params, asset)
	}
	if _, err := client.db.ExecContext(ctx, query, params...); err != nil {
		return err
	}
	return nil
}
Needless to say, this is a less elegant solution than what I wanted, but it does work.
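For illustration, here's a hypothetical call (the symbols are made up) and the statement it produces. Note that Delimit will panic if assets is empty, because strings.Repeat is given a negative count, so a guard for the zero-asset case is worth adding:
// With two assets the generated statement is:
//   UPDATE my_table SET download_enabled = ? WHERE provider_id = ? AND symbol IN (?, ?)
// and the bound params are [true, 1, "AAPL", "MSFT"].
if err := doQuery(true, 1, "AAPL", "MSFT"); err != nil {
	log.Printf("update failed: %v", err)
}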

How to implement a PATCH with `database/sql`?

Let’s say you have a basic API (GET/POST/PATCH/DELETE) backed by an SQL database.
The PATCH call should only update the fields in the JSON payload that the user sends, without touching any of the other fields.
Imagine the table (let's call it sample) has id, string_a and string_b columns, and the struct which corresponds to it looks like:
type Sample struct {
	ID      int    `json:"id"`
	StringA string `json:"stringA"`
	StringB string `json:"stringB"`
}
Let's say the user passes in { "stringA": "patched value" } as payload. The json will be unmarshalled to something that looks like:
&Sample{
	ID:      0,
	StringA: "patched value",
	StringB: "",
}
For a project using database/sql, you’d write the query to patch the row something like:
// `id` is from the URL params
query := `UPDATE sample SET string_a=$1, string_b=$2 WHERE id=$3`
row := db.QueryRow(query, sample.StringA, sample.StringB, id)
...
That query would update the string_a column as expected, but it’d also update the string_b column to "", which is undesired behavior in this case. In essence, I’ve just created a PUT instead of a PATCH.
My immediate thought was - OK, that’s fine, let’s use strings.Builder to build out the query and only add a SET statement for those that have a non-nil/empty value.
However, in that case, if a user wanted to make string_a empty, how would they accomplish that?
Eg. the user makes a PATCH call with { "stringA": "" } as payload. That would get unmarshalled to something like:
&Sample{
	ID:      0,
	StringA: "",
	StringB: "",
}
The “query builder” I was theorizing about would look at that and say “ok, those are all nil/empty values, don’t add them to the query” and no columns would be updated, which again, is undesired behavior.
I’m not sure how to write my API and the SQL queries it runs in a way that satisfies both cases. Any thoughts?
I think a reasonable solution for smaller queries is to build the UPDATE query and the list of bound parameters dynamically while processing the payload, with logic that recognizes what was updated and what was left empty.
In my experience this is clear and readable (if it gets repetitive you can always iterate over struct members that share the same logic, or employ reflection and look at struct tag hints, etc.). Every attempt of mine to write a universal solution for this ended up as convoluted overkill supporting all sorts of corner cases and behavioral differences between endpoints.
func patchSample(s Sample) {
	var query strings.Builder
	params := make([]interface{}, 0, 2)
	// TODO: check that the patch makes sense (e.g. id is non-zero, at least one patched value provided, etc.)
	query.WriteString("UPDATE sample SET")
	if s.StringA != "" {
		query.WriteString(" stringA = ?")
		params = append(params, s.StringA)
	}
	if s.StringB != "" {
		if len(params) > 0 {
			query.WriteString(",") // separate multiple SET clauses
		}
		query.WriteString(" stringB = ?")
		params = append(params, s.StringB)
	}
	query.WriteString(" WHERE id = ?")
	params = append(params, s.ID)
	fmt.Println(query.String(), params)
	// _, err := db.Exec(query.String(), params...)
}
func main() {
	patchSample(Sample{1, "Foo", ""})
	patchSample(Sample{2, "", "Bar"})
	patchSample(Sample{3, "Foo", "Bar"})
}
EDIT: In case "" is a valid value for patching, it needs to be distinguishable from the default empty value. One way to solve that for strings is to use a pointer, which will default to nil if the value is not present in the payload:
type Sample struct {
	ID      int     `json:"id"`
	StringA *string `json:"stringA"`
	StringB *string `json:"stringB"`
}
and then modify the condition(s) to check whether the field was sent, like this:
if s.StringA != nil {
	query.WriteString(" stringA = ?")
	params = append(params, *s.StringA)
}
See full example in playground: https://go.dev/play/p/RI7OsNEYrk6
For what it's worth, I solved the issue by:
Converting the request payload to a generic map[string]interface{}.
Implementing a query builder that loops through the map's keys to create a query.
Part of the reason I went this route is it fit all my requirements, and I didn't particularly like having *strings or *ints laying around.
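For context, here's a minimal sketch of how the payload becomes that map before it reaches the builder. The w and r variables are assumed from the surrounding HTTP handler (not part of my original code), and encoding/json does the decoding:
var patch map[string]interface{}
if err := json.NewDecoder(r.Body).Decode(&patch); err != nil {
	http.Error(w, "invalid JSON payload", http.StatusBadRequest)
	return
}
// A body of {"someString": "patched value"} yields map[someString:patched value];
// keys the client never sent are simply absent from the map, so "" and "not sent"
// stay distinguishable without pointer fields.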
Here is what the query builder looks like:
func patchQueryBuilder(id string, patch map[string]interface{}) (string, []interface{}, error) {
	var query strings.Builder
	params := make([]interface{}, 0)
	query.WriteString("UPDATE some_table SET")
	for k, v := range patch {
		switch k {
		case "someString":
			if someString, ok := v.(string); ok {
				query.WriteString(fmt.Sprintf(" some_string=$%d,", len(params)+1))
				params = append(params, someString)
			} else {
				return "", []interface{}{}, fmt.Errorf("could not process some_string")
			}
		case "someBool":
			if someBool, ok := v.(bool); ok {
				query.WriteString(fmt.Sprintf(" some_bool=$%d,", len(params)+1))
				params = append(params, someBool)
			} else {
				return "", []interface{}{}, fmt.Errorf("could not process some_bool")
			}
		}
	}
	if len(params) > 0 {
		// Remove trailing comma to avoid syntax errors
		queryString := fmt.Sprintf("%s WHERE id=$%d RETURNING *", strings.TrimSuffix(query.String(), ","), len(params)+1)
		params = append(params, id)
		return queryString, params, nil
	} else {
		return "", []interface{}{}, nil
	}
}
Note that I'm using PostgreSQL, so I needed to provide numbered parameters to the query, e.g. $1, which is what params is used for. It's also returned from the function so that it can be used as follows:
// Build the patch query based on the payload
query, params, err := patchQueryBuilder(id, patch)
if err != nil {
	return nil, err
}
// Use the query/params and get output
row := tx.QueryRowContext(ctx, query, params...)

gorm raw sql query execution

I am running a query to check whether a table exists using the GORM ORM for Golang. Below is my code.
package main

import (
	"fmt"
	"log"

	"gorm.io/driver/postgres"
	"gorm.io/gorm"

	_ "github.com/lib/pq"
)

// App sets up and runs the app
type App struct {
	DB *gorm.DB
}

const tableCreationQuery = `SELECT count (*)
FROM information_schema.TABLES
WHERE (TABLE_SCHEMA = 'api_test') AND (TABLE_NAME = 'Users')`

func ensureTableExists() {
	if err := a.DB.Exec(tableCreationQuery); err != nil {
		log.Fatal(err)
	}
}
The expected response should be either 1 or 0 (I got this approach from another SO answer). Instead I get this:
2020/09/03 00:27:18 &{0xc000148900 1 0xc000119ba0 0}
exit status 1
FAIL go-auth 0.287s
My untrained mind says it's a pointer, but how do I reference the returned values to determine what was contained within?
If you want to check if your SQL statement was successfully executed in GORM you can use the following:
tx := DB.Exec(sqlStr, args...)
if tx.Error != nil {
	return false
}
return true
However, since your example uses a SELECT statement, you need to check the result, which is better suited to the DB.Raw() method, as below:
var exists bool
DB.Raw(sqlStr).Row().Scan(&exists)
return exists
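If your driver won't scan the count directly into a bool, a variant of the same idea (just a sketch, reusing tableCreationQuery and the gorm DB from the question) is to scan it into an integer first:
// Scan the count, then derive existence from it.
var count int64
if err := DB.Raw(tableCreationQuery).Row().Scan(&count); err != nil {
	log.Fatal(err)
}
return count > 0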

Golang SQL query variable substitution

I have an SQL query that needs variable substitution for better consumption by my go-kit service.
I have dep and org as user inputs, which are part of my REST service; for instance dep = 'abc' and org = 'def'.
I've tried few things like:
rows, err := db.Query(
	"select name from table where department='&dep' and organisation='&org'",
)
And:
rows, err := db.Query(
	"select name from table where department=? and organisation=?", dep, org,
)
That led to the error: sql: statement expects 0 inputs; got 2
Only hard-coded values work and substitution fails.
I haven't found much help from Oracle blogs regarding this and am wondering if there is any way to approach this.
Parameter Placeholder Syntax (reference: http://go-database-sql.org/prepared.html )
The syntax for placeholder parameters in prepared statements is
database-specific. For example, comparing MySQL, PostgreSQL, and
Oracle:
MySQL               PostgreSQL            Oracle
=====               ==========            ======
WHERE col = ?       WHERE col = $1        WHERE col = :col
VALUES(?, ?, ?)     VALUES($1, $2, $3)    VALUES(:val1, :val2, :val3)
For Oracle you need to use :dep, :org as placeholders.
As @dakait stated, in your prepared statement you should use : placeholders.
So, for completeness, you would get it working with something like:
package main

import (
	"database/sql"
	"fmt"
	"log"
)

// Output is an example struct
type Output struct {
	Name string
}

const (
	dep = "abc"
	org = "def"
)

func main() {
	// db is assumed to be an open *sql.DB (e.g. created with sql.Open and an Oracle driver)
	query := "SELECT name from table WHERE department = :1 and organisation = :2"
	q, err := db.Prepare(query)
	if err != nil {
		log.Fatal(err)
	}
	defer q.Close()

	var out Output
	if err := q.QueryRow(dep, org).Scan(&out.Name); err != nil {
		log.Fatal(err)
	}
	fmt.Println(out.Name)
}

BigQuery UDF memory exceeded error on multiple rows but works fine on single row

I'm writing a UDF to process Google Analytics data, and getting the "UDF out of memory" error message when I try to process multiple rows. I downloaded the raw data and found the largest record and tried running my UDF query on that, with success. Some of the rows have up to 500 nested hits, and the size of the hit record (by far the largest component of each row of the raw GA data) does seem to have an effect on how many rows I can process before getting the error.
For example, the query
select
  user.ga_user_id,
  ga_session_id,
  ...
from
  temp_ga_processing(
    select
      fullVisitorId,
      visitNumber,
      ...
    from [79689075.ga_sessions_20160201] limit 100)
returns the error, but
from [79689075.ga_sessions_20160201] where totals.hits = 500 limit 1)
does not.
I was under the impression that any memory limitations were per-row? I've tried several techniques, such as setting row = null; before emit(return_dict); (where return_dict is the processed data) but to no avail.
The UDF itself doesn't do anything fancy; I'd paste it here but it's ~45 kB in length. It essentially does a bunch of things along the lines of:
function temp_ga_processing(row, emit) {
  topic_id = -1;
  hit_numbers = [];
  first_page_load_hits = [];
  return_dict = {};
  return_dict["user"] = {};
  return_dict["user"]["ga_user_id"] = row.fullVisitorId;
  return_dict["ga_session_id"] = row.fullVisitorId.concat("-".concat(row.visitNumber));
  for (i = 0; i < row.hits.length; i++) {
    hit_dict = {};
    hit_dict["page"] = {};
    hit_dict["time"] = row.hits[i].time;
    hit_dict["type"] = row.hits[i].type;
    hit_dict["page"]["engaged_10s"] = false;
    hit_dict["page"]["engaged_30s"] = false;
    hit_dict["page"]["engaged_60s"] = false;
    add_hit = true;
    for (j = 0; j < row.hits[i].customMetrics.length; j++) {
      if (row.hits[i].customDimensions[j] != null) {
        if (row.hits[i].customMetrics[j]["index"] == 3) {
          metrics = {"video_play_time": row.hits[i].customMetrics[j]["value"]};
          hit_dict["metrics"] = metrics;
          metrics = null;
          row.hits[i].customDimensions[j] = null;
        }
      }
    }
    hit_dict["topic"] = {};
    hit_dict["doctor"] = {};
    hit_dict["doctor_location"] = {};
    hit_dict["content"] = {};
    if (row.hits[i].customDimensions != null) {
      for (j = 0; j < row.hits[i].customDimensions.length; j++) {
        if (row.hits[i].customDimensions[j] != null) {
          if (row.hits[i].customDimensions[j]["index"] == 1) {
            hit_dict["topic"] = {"name": row.hits[i].customDimensions[j]["value"]};
            row.hits[i].customDimensions[j] = null;
            continue;
          }
          if (row.hits[i].customDimensions[j]["index"] == 3) {
            if (row.hits[i].customDimensions[j]["value"].search("doctor") > -1) {
              return_dict["logged_in_as_doctor"] = true;
            }
          }
          // and so on...
        }
      }
    }
    if (row.hits[i]["eventInfo"]["eventCategory"] == "page load time" && row.hits[i]["eventInfo"]["eventLabel"].search("OUTLIER") == -1) {
      elre = /(?:onLoad|pl|page):(\d+)/.exec(row.hits[i]["eventInfo"]["eventLabel"]);
      if (elre != null) {
        if (parseInt(elre[0].split(":")[1]) <= 60000) {
          first_page_load_hits.push(parseFloat(row.hits[i].hitNumber));
          if (hit_dict["page"]["page_load"] == null) {
            hit_dict["page"]["page_load"] = {};
          }
          hit_dict["page"]["page_load"]["sample"] = 1;
          page_load_time_re = /(?:onLoad|pl|page):(\d+)/.exec(row.hits[i]["eventInfo"]["eventLabel"]);
          if (page_load_time_re != null) {
            hit_dict["page"]["page_load"]["page_load_time"] = parseFloat(page_load_time_re[0].split(':')[1])/1000;
          }
        }
        // and so on...
      }
    }
  }
  row = null;
  emit(return_dict);
}
The job ID is realself-main:bquijob_4c30bd3d_152fbfcd7fd
Update Aug 2016: We have pushed out an update that will allow the JavaScript worker to use twice as much RAM. We will continue to monitor jobs that have failed with JS OOM to see if more increases are necessary; in the meantime, please let us know if you have further jobs failing with OOM. Thanks!
Update: this issue was related to limits we had on the size of the UDF code. It looks like V8's optimize+recompile pass of the UDF code generates a data segment that was bigger than our limits, but this was only happening when the UDF runs over a "sufficient" number of rows. I'm meeting with the V8 team this week to dig into the details further.
@Grayson - I was able to run your job over the entire 20160201 table successfully; the query takes 1-2 minutes to execute. Could you please verify that this works on your side?
We've gotten a few reports of similar issues that seem related to # rows processed. I'm sorry for the trouble; I'll be doing some profiling on our JavaScript runtime to try to find if and where memory is being leaked. Stay tuned for the analysis.
In the meantime, if you're able to isolate any specific rows that cause the error, that would also be very helpful.
A UDF will fail on anything but very small datasets if it has a lot of if/then levels, such as:
if () {
    if () {
        if () {
            etc
We had to track down and remove the deepest if/then statement.
But that is not enough. In addition, when you pass the data into the UDF, run a "GROUP EACH BY" on all the variables. This will force BQ to send the output to multiple "workers". Otherwise it will also fail.
I've wasted 3 days of my life on this annoying bug. Argh.
I love the concept of parsing my logs in BigQuery, but I've got the same problem; I get
Error: Resources exceeded during query execution.
The Job Id is bigquery-looker:bquijob_260be029_153dd96cfdb, if that at all helps.
I wrote a very basic parser that does a simple match and returns rows. It works just fine on a 10K-row data set, but I run out of resources when trying to run it against a 3M-row logfile.
Any suggestions for a work around?
Here is the JavaScript code.
function parseLogRow(row, emit) {
  r = (row.logrow ? row.logrow : "") + (typeof row.l2 !== "undefined" ? " " + row.l2 : "") + (row.l3 ? " " + row.l3 : "")
  ts = null
  category = null
  user = null
  message = null
  db = null
  found = false
  if (r) {
    m = r.match(/^(\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d\.\d\d\d (\+|\-)\d\d\d\d) \[([^|]*)\|([^|]*)\|([^\]]*)\] :: (.*)/)
    if (m) {
      ts = new Date(m[1])/1000
      category = m[3] || null
      user = m[4] || null
      db = m[5] || null
      message = m[6] || null
      found = true
    } else {
      message = r
      found = false
    }
  }
  emit({
    ts: ts,
    category: category,
    user: user,
    db: db,
    message: message,
    found: found
  });
}
bigquery.defineFunction(
  'parseLogRow',              // Name of the function exported to SQL
  ['logrow', "l2", "l3"],     // Names of input columns
  [                           // Output schema
    {'name': 'ts', 'type': 'timestamp'},
    {'name': 'category', 'type': 'string'},
    {'name': 'user', 'type': 'string'},
    {'name': 'db', 'type': 'string'},
    {'name': 'message', 'type': 'string'},
    {'name': 'found', 'type': 'boolean'},
  ],
  parseLogRow                 // Reference to JavaScript UDF
);