Is it a good idea to initialize a database as a global variable? Can it work?
I'm thinking about something like this:
func MustDB(d *sql.DB, err error) *sql.DB {
    if err != nil {
        log.Panic(err)
    }
    return d
}
// what I don't know is how to call db.Close()
// username and password can also be read in a similar way
var db *sql.DB = MustDB(sql.Open(...))
func MustPrepare(db *sql.DB, query string) *sql.Stmt {
    stmt, err := db.Prepare(query)
    if err != nil {
        log.Panic(err)
    }
    return stmt
}
The advantage is that I can simply create prepared SQL statements as global variables. I don't have to create and manage a store where all the SQL commands are kept. I just write:
var s1 *sql.Stmt = MustPrepare(db, "SELECT * FROM MyTable")
var s2 *sql.Stmt = MustPrepare(db, "INSERT INTO MyTable(col1, col2) VALUES(?,?)")
var s3 *sql.Stmt = MustPrepare(db, "DELETE FROM MyTable WHERE col1=?")
Do you think this pattern is useful, or can it not work at all?
In Go you typically initialize a global *sql.DB using Open (at least global to your database access package). That does not open an actual connection to the DB, but creates a connection pool. Therefore there should be only one instance of it. You can initialize it in the init function of your package.
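A minimal sketch of that, assuming a Postgres driver (the package name, driver import, and connection string here are placeholders, not something prescribed by database/sql):
package store

import (
    "database/sql"
    "log"

    _ "github.com/lib/pq" // or whatever driver you actually use
)

var db *sql.DB

func init() {
    var err error
    // sql.Open only validates its arguments and prepares the pool;
    // no connection is made until the first query (or a Ping).
    db, err = sql.Open("postgres", "host=localhost user=app dbname=app sslmode=disable")
    if err != nil {
        log.Panic(err)
    }
}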
See
http://go-database-sql.org/
or
https://www.vividcortex.com/blog/2015/01/14/the-ultimate-guide-to-building-database-driven-apps-with-go/
for a good introductory guide.
Yes, it is a good approach. When you go through the Go documentation it clearly tells you:
It is rare to Close a DB, as the DB handle is meant to be long-lived
and shared between many go routines.
Go maintains its own pool of idle connections. Thus, the Open function should be called just once. It is rarely necessary to close a DB.
As a rule of thumb I don't think it's good practice to use database connections this way; you should keep the handle private and just open/close it as you need it :)
But if it works and you like it, then there's nothing wrong with doing it that way.
Related
I am trying to upload a bunch of files to the storage service the company provides, using their API (basically to my account). I have a lot of files, around 40-50.
I have the full paths of the files and use os.Open so that I can pass an io.Reader. I tried client.Files.Upload() without goroutines, but it took too long to upload them, so I decided to use goroutines. Here is the implementation I tried. When I run the program it only uploads one file, the one with the smallest size, or it just waits for a long time. What is wrong with it? Isn't it the case that each iteration of the for loop creates a goroutine, continues the cycle, and creates one for every file? How can I make it as fast as possible with goroutines?
var filePaths []string
var wg sync.WaitGroup
// fills the slice of strings with the full paths of the files.
func fill() {
filepath.Walk(rootpath, func(path string, info os.FileInfo, err error) error {
if !info.IsDir() {
filePaths = append(filePaths, path)
}
if err != nil {
fmt.Println("ERROR:", err)
}
return nil
})
}
func main() {
fill()
tokenSource := oauth2.StaticTokenSource(&oauth2.Token{AccessToken: token})
oauthClient := oauth2.NewClient(context.TODO(), tokenSource)
client := putio.NewClient(oauthClient)
for _, path := range filePaths {
wg.Add(1)
go func() {
defer wg.Done()
f, err := os.Open(path)
if err != nil {
log.Println("err:OPEN", err)
}
upload, err := client.Files.Upload(context.TODO(), f, path, 0)
if err != nil {
log.Println("error uploading file:", err)
}
fmt.Println(upload)
}()
}
wg.Wait()
}
Consider a worker pool pattern like this: https://go.dev/play/p/p6SErj3L6Yc
In this example application, I've taken out the API call and just list the file names. That makes it work on the playground.
A fixed number of worker goroutines are started. We'll use a channel to distribute their work and we'll close the channel to communicate the end of the work. This number could be 1 or 1000 routines, or more. The number should be chosen based on how many concurrent API operations your putio API can reasonably be expected to support.
paths is a chan string we'll use for this purpose.
Workers range over the paths channel to receive new file paths to upload.
package main
import (
"fmt"
"os"
"path/filepath"
"sync"
)
func main() {
paths := make(chan string)
var wg = new(sync.WaitGroup)
for i := 0; i < 10; i++ {
wg.Add(1)
go worker(paths, wg)
}
if err := filepath.Walk("/usr", func(path string, info os.FileInfo, err error) error {
if err != nil {
return fmt.Errorf("Failed to walk directory: %T %w", err, err)
}
if info.IsDir() {
return nil
}
paths <- path
return nil
}); err != nil {
panic(fmt.Errorf("failed Walk: %w", err))
}
close(paths)
wg.Wait()
}
func worker(paths <-chan string, wg *sync.WaitGroup) {
defer wg.Done()
for path := range paths {
// do upload.
fmt.Println(path)
}
}
This pattern can handle an indefinitely large number of files without having to load the entire list into memory before processing it. As you can see, this doesn't make the code more complicated - actually, it's simpler.
When I run the program it just uploads one file which is the one
Function literals inherit the scope in which they were defined. This is why your code only uploaded one file: the path variable in the for loop was shared with every goroutine, so when that variable changed, all the goroutines picked up the change.
Avoid function literals unless you actually want to inherit scope. Functions defined at the global scope don't inherit any scope, and you must pass all relevant variables to those functions instead. This is a good thing - it makes the functions more straightforward to understand and makes variable "ownership" transitions more explicit.
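Applied to the original loop from the question, that could look roughly like this (a sketch only; it assumes the client type is *putio.Client as returned by putio.NewClient in the question, and the upload body is elided - see the note on Close below):
// upload is defined at package scope: it inherits nothing, so everything it
// needs - the client, the path, and the wait group - must be passed in.
func upload(client *putio.Client, path string, wg *sync.WaitGroup) {
    defer wg.Done()
    // open the file and call client.Files.Upload here
}

// in main:
for _, path := range filePaths {
    wg.Add(1)
    go upload(client, path, &wg) // each call receives its own copy of path
}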
An appropriate case for a function literal is the filepath.Walk parameter; its arguments are defined by filepath.Walk, so the definition scope is one way to access other values - such as the paths channel, in our case.
Speaking of scope, global variables should be avoided unless their scope of usage is truly global. Prefer passing variables between functions to sharing global variables. Again, this makes variable ownership explicit and makes it easy to understand which functions do and don't access which variables. Neither your wait group nor your filePaths have any cause to be global.
f, err := os.Open(path)
Don't forget to close any files you open. When you're dealing with 40 or 50 files, letting all those open file handles pile up until the program ends isn't so bad, but it's a time bomb in your program that will go off when the number of files exceeds the ulimit of allowed open files. Because the function execution greatly exceeds the part where the file needs to be open, defer doesn't make sense in this case. I would use an explicit f.Close() after uploading the file.
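Inside the worker loop, for example, that might look roughly like this (a sketch; the Open and Upload calls are the ones from the question, and it assumes the client has been passed to the worker):
for path := range paths {
    f, err := os.Open(path)
    if err != nil {
        log.Println("err:OPEN", err)
        continue
    }
    upload, err := client.Files.Upload(context.TODO(), f, path, 0)
    f.Close() // close explicitly here; a deferred Close would only run when the worker exits
    if err != nil {
        log.Println("error uploading file:", err)
        continue
    }
    fmt.Println(upload)
}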
I want to do something like this pseudocode:
let mut vec = Vec<u8>::new();
vec.resize(1024); // Some large-enough size
serialize(&vec, || { .... });
// ... other code blah blah ...
deserialize(&vec); // This will execute the closure
Ideally, I would be able to run deserialize in another thread, which is the whole point of doing this really.
I do not want to send an opcode plus data, because this approach opens up a remarkably clean programming model where you don't create messages and send them; instead you just run arbitrarily complex code on another thread.
Why "remarkably clean"?
There are no opcodes (messages) that need to be created. i.e., less code.
There is no switch statement for opcode dispatch on the receiving end. i.e., less code.
Since there is no protocol, there is no need to version the messages. i.e., less code.
This idea cannot be used across processes, but that's OK for my needs.
Without using unsafe? No. No no no.
The problem is that since Vec<u8>s can be trivially modified, you can easily violate Rust's safety invariants. Consider the following code:
let mut vec: Vec<u8> = Vec::new();
vec.resize(1024, 0);
// Our theoretical serialize function.
serialize(&vec, || { /* ... */ });
vec[0] = 0x90; // What happens to the closure now?
deserialize(&vec); // There goes any memory safety...
However, if all you want to do is send closures between threads, consider using something like std::sync::mpsc, which supports sending closures:
use std::thread;
use std::sync::mpsc::channel;
let (tx, rx) = channel();
thread::spawn(move || {
tx.send(|| { "hi" }).unwrap();
});
let f = rx.recv().unwrap();
assert_eq!(f(), "hi");
My guess, however, is that this is not actually what you want to do. Like Netwave said in the comments, you most likely actually want to send the data and a tag of the operation; for example:
// On one thread...
tx.send((Op::REMOVE_FILE, path));
// and on the other thread...
let (op, path) = rx.recv();
match op {
Op::REMOVE_FILE => remove_file(path),
/* ... */
}
What is the best practice for storing a connection to a database in Go language?
In Java, for example, you can use singletons or IoC containers like Spring.
What is the best practice for its lifecycle?
How do I release it when the application closes?
There is nothing wrong with using a singleton pattern here too.
I would use something like this:
var db *sql.DB = nil
func GetDB() (*sql.DB, error) {
if db == nil {
conn := fmt.Sprintf("host=%s user=%s password=%s dbname=%s sslmode=require",
DB_HOST, DB_USER, DB_PASSWORD, DB_NAME)
log.Printf("Creating a new connection: %v", conn)
d, err := sql.Open("postgres", conn)
if err != nil {
return nil, err
}
db = d
}
return db, nil
}
With this exported function you can receive a connection from all other packages.
Update to the answer according to the comments (thanks @all for the information):
The returned DB is safe for concurrent use by multiple goroutines and
maintains its own pool of idle connections. Thus, the Open function
should be called just once. It is rarely necessary to close a DB.¹
It is rare to Close a DB, as the DB handle is meant to be long-lived
and shared between many goroutines.²
I would say that there is no compelling reason to call Close on the database handle; I found no statements to the contrary. Despite this I would defer a db.Close() in the main function - just for the completeness of the code.
Another thing I would like to note is that the connection should be verified with db.Ping(); otherwise Open may succeed even though the database does not exist or is unreachable.
With this new information I wouldn't bother with mutexes to ensure the pool is created only once. I would create a DBInit() function and run it inside the init() function of the main package.
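A rough sketch of that, reusing the global db variable and the connection string from the snippet above (DBInit is just the name suggested here):
func DBInit() {
    conn := fmt.Sprintf("host=%s user=%s password=%s dbname=%s sslmode=require",
        DB_HOST, DB_USER, DB_PASSWORD, DB_NAME)
    d, err := sql.Open("postgres", conn)
    if err != nil {
        log.Fatal(err)
    }
    // Ping actually reaches the database, so a missing or unreachable
    // database is caught at startup rather than on the first query.
    if err := d.Ping(); err != nil {
        log.Fatal(err)
    }
    db = d
}

func init() {
    DBInit()
}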
I have a Go function which will do some logic after the channel has received a message.
My problem is that I want the function to stay alive after it has done the logic. My thinking is that I will add an endless loop to the function. However, I am wondering whether this is a good technique. My code is as follows:
func process(channel chan string, sid string) {
inputSid := <-channel
// check if sid exist in process pool
if strings.EqualFold(sid, inputSid) {
fmt.Println("Got message", sid)
//the code that I added to make this function alive
for {}
} else {
channel <- sid
//the code that I added to make this function alive
for {}
}
}
For future reference:
A better way to keep a function alive is to use an empty select statement, which will block a goroutine indefinitely. Unlike an empty for loop, it does not consume CPU time while doing so.
select { }
Just use the standard "Go way". You can range over a channel until it is closed:
for sid := range channel {
// do stuff
}
It will continue until the channel is closed. Adding a "wait loop" like that is asking for trouble.
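Applied to the function from the question, that could look roughly like this (a sketch; the sid check and the send-back are kept exactly as in the original code, so whether the send-back is sensible depends on who else reads the channel):
func process(channel chan string, sid string) {
    for inputSid := range channel {
        if strings.EqualFold(sid, inputSid) {
            fmt.Println("Got message", sid)
        } else {
            // put the sid back for some other consumer, as in the original code
            channel <- sid
        }
    }
    // the function returns only once the channel is closed
}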
I am just testing PetaPoco transactions in a multithreaded way...
I have a simple test case:
-- A simple value object; call it MediaDevice
-- Insert a record and update it, 1000 times
void TransactionThread(Object object)
{
Database db = (Database) object;
for(int i= 0; i < 1000;i++)
{
Transaction transaction = db.GetTransaction();
MediaDevice device = new MediaDevice();
device.Name = "Name";
device.Brand = "Brand";
db.Insert(device);
device.Name = "Name_Updated";
device.Brand = "Brand_Updated";
db.Update(device);
transaction.Complete();
}
long count = db.ExecuteScalar<long>("SELECT Count(*) FROM MediaDevices");
Console.WriteLine("Number of all records:" + count);
}
And I call this from two threads like this: [a single Database object for both threads]
void TransactionTest()
{
Database db = GetDatabase();
Thread tThread1 = ... // thread running TransactionThread()
Thread tThread2 = ... // thread running TransactionThread()
tThread1.Start(db); // pass Database to TransactionThread()
tThread2.Start(db); // pass same Database to TransactionThread()
}
I get a null reference error, or sometimes an "object disposed" error, for the Database..
But when I supply two Database instances,
void TransactionTest()
{
Database db = GetDatabase();
Database db2 = GetDatabase();
Thread tThread1 = ... // thread running TransactionThread()
Thread tThread2 = ... // thread running TransactionThread()
tThread1.Start(db); // pass Database instance db to TransactionThread()
tThread2.Start(db2); // pass Database instance db2 to TransactionThread()
}
Everything is OK...
Well, when I check the PetaPoco source code for transactions, I see this in Transaction.Complete:
public virtual void Complete()
{
_db.CompleteTransaction();
_db = null;
}
My question is: to be able to use transactions from multiple threads, do I have to use a new copy of the Database object? Or what am I doing wrong?
And to make it thread-safe, do I have to open and close a NEW Database for every update query?
Yes, you need a separate PetaPoco Database instance per-thread. See this quote from the PetaPoco documentation:
Note: for transactions to work, all operations need to use the same
instance of the PetaPoco database object. So you'll probably want to
use a per-http request, or per-thread IOC container to serve up a
shared instance of this object. Personally StructureMap is my
favourite for this.
The phrase that gives the clue is "per-thread": it is saying that one instance of the PetaPoco database object should be used per thread.
Hi, use WITH (NOLOCK) in the select query because the table may be locked: long count = db.ExecuteScalar<long>("SELECT COUNT(*) FROM MediaDevices WITH (NOLOCK)");
Sorry dude.. yes, you are right. They set the object to null, so you cannot use the same object across threads. You have to use it the way they describe, like db = GetDatabase(); db2 = GetDatabase();
Otherwise you can change the source code for your requirements. I think their license allows it, but I am not sure.