Creating generic code using the database/sql package?

I've recently implemented a package that uses the database/sql package. By limiting the SQL to very simple select/update/insert statements I assumed the package would work with all the DBMS supported by database/sql.
However, it turns out that some databases use ? as the placeholder while others use $1, $2, etc., which means the prepared statements will work with some DBMS but not with others.
So I'm wondering: is there any technique to make this work in a generic way with all the supported drivers? Or is it necessary to have DBMS-specific code everywhere (which I think would make the abstraction provided by database/sql a bit pointless)? I guess using non-prepared statements is not an option either, since different DBMS have different ways of escaping parameters.
Any suggestion?

I assume this behaviour was left out deliberately because SQL dialects vary significantly between databases, and the Go team wanted to avoid writing a preprocessor for each driver to translate 'GoSQL' into native SQL. The database/sql package mostly provides connection wrangling, an abstraction that falls under 'pretty much necessary', whereas statement translation is more of a 'nice to have'.
That said, I agree that re-writing every statement is a major nuisance. It shouldn't be too hard, though, to wrap the database/sql/driver Prepare() method with a regex that substitutes the standard placeholder with the native one, or to provide a new interface with an additional PrepareGeneric method that guesses a wrapped sql.DB's flavour and performs a similar translation.
Gorp uses a dialect type for this, which may be worth a look.
Just throwing out ideas.
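For what it's worth, a minimal sketch of that substitution idea in Go (using a simple scan rather than a regex); the rebind() helper is a hypothetical name, and it naively ignores ? characters inside string literals:

package main

import (
	"fmt"
	"strconv"
	"strings"
)

// rebind rewrites ? placeholders into the $1, $2, ... form used by
// PostgreSQL-style drivers, so queries can be written once with ?
// and rewritten per driver before being prepared.
func rebind(query string, dollar bool) string {
	if !dollar {
		return query // MySQL/SQLite-style drivers take ? as-is
	}
	var b strings.Builder
	n := 0
	for _, r := range query {
		if r == '?' {
			n++
			b.WriteString("$" + strconv.Itoa(n))
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	q := "SELECT id, name FROM users WHERE age > ? AND city = ?"
	fmt.Println(rebind(q, true))
	// Output: SELECT id, name FROM users WHERE age > $1 AND city = $2
}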

Related

Performance of SQL standards vs T-SQL extensions

Articles on the internet say user-defined functions can either hurt or improve performance.
Now, I know that standard SQL is pretty limited; however, some of the behavior of the T-SQL built-in functions can still be written in standard SQL.
For example, adddays() vs. dateadd(). Another point I've heard is that it's also better to use coalesce(), the ANSI standard function, rather than isNull().
What is the performance difference between using the ANSI SQL standard functions and the T-SQL functions?
Does T-SQL add any burden whatsoever on performance in trying to make the job easier, or not?
My research does not seem to indicate a trend.
You will need to approach this on a case-by-case basis and do actual testing. There is no general rule, other than that Microsoft tries to make the entire stack perform as well as possible. TESTING is what you need to do - we can't tell you that a certain thing will always be faster. That would be really bad advice.
It is important to do this testing on your actual production data, preferably a copy of it. Do not rely on tests done against data sets that aren't yours. When you're talking about performance differences between functions, some very subtle things can make a big difference. Things like the size of the table, the data types involved, the indexing, and the SQL Server version can change the result of these tests. That is why "no one has done this" for you. We can't.
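If it helps as a starting point, here is a rough end-to-end timing harness in Go; the driver, DSN, table, and the two query variants are all placeholder assumptions to be swapped for your own, and since it measures round-trip time it should be run against a realistic copy of your data:

package main

import (
	"database/sql"
	"fmt"
	"log"
	"time"

	_ "github.com/denisenkom/go-mssqldb" // one possible SQL Server driver
)

// timeQuery runs a query n times and returns the total elapsed time.
func timeQuery(db *sql.DB, query string, n int) (time.Duration, error) {
	start := time.Now()
	for i := 0; i < n; i++ {
		rows, err := db.Query(query)
		if err != nil {
			return 0, err
		}
		for rows.Next() {
			// drain the result set so the query fully executes
		}
		rows.Close()
	}
	return time.Since(start), nil
}

func main() {
	// Placeholder DSN - point this at a copy of your production data.
	db, err := sql.Open("sqlserver", "sqlserver://user:pass@localhost?database=mydb")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Hypothetical table/column; substitute the pair of functions under test.
	queries := map[string]string{
		"COALESCE": "SELECT COALESCE(middle_name, '') FROM people",
		"ISNULL":   "SELECT ISNULL(middle_name, '') FROM people",
	}
	for name, q := range queries {
		d, err := timeQuery(db, q, 100)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%s: %v\n", name, d)
	}
}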

Why is SQL text-based?

I suppose this isn't a particularly "good fit for our Q&A format", but I have no idea where else to ask it.
Why is SQL querying text-based? Is there a historical motivation, or just laziness?
mysqli_query("SELECT * FROM users WHERE Name=$name");
It opens the door to an incredible number of stupid mistakes like the one above, encouraging SQL injection. It's basically exec for SQL, and exec'ing code, especially dynamically generated code, is widely discouraged.
Plus, to be honest, prepared statements seem like just a verbose workaround.
Why don't we have a more object-oriented, noninjectable way of doing it, like having a directly-accessible database object with certain methods:
$DB->setDB("mysite");
$table='users';
$col='username';
$DB->selectWhereEquals($table,$col,$name);
Which the database itself natively implements, completely eliminating all resemblance to exec and nullifying the entire basis of SQL injection.
Culminating in the real question, is there any database framework that does anything like this?
Why are programming languages text-based? Quite simply: because text can be used to represent powerful human-readable/editable DSLs (domain-specific languages), such as SQL.
The better (?) question is: Why do programmers refuse to use placeholders?
Modern database providers (e.g. ADO.NET, PDO) support placeholders across an appropriate range of database adapters¹. Programmers who fail to take advantage of this support have only themselves to blame.
Besides ubiquitous support for placeholders in direct database providers, there are also many different APIs available including:
"Fluent database libraries"; these generally map an AST, such as LINQ, onto a dynamically generated SQL query and take care of the details such as correctly using placeholders.
A wide variety of ORMs; may or may not include a fluent API.
Primitive CRUD wrappers as found in the Android SQLite API, which looks suspiciously similar to the proposal.
I love the power of SQL and almost none of the queries I write can be expressed in such a trivial API. Even LINQ, which is a powerful provider, can sometimes get in my way (although the prudent use of Views helps).
¹ Even though SQL is text-based (and as such is used to pass the shape of the query to the back-end), bound data need not be included in-line, which is where the issue of SQL injection (being able to change the shape of the query) comes in. The database adapter can, and does if it is smart enough, use the API exposed by the target database. In the case of SQL Server this amounts to sending the data separately [via RPC], and for SQLite this means creating and binding separate prepared-statement objects.
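To make the placeholder point concrete, here is a minimal sketch using Go's database/sql (the driver choice and the users table are assumptions for illustration):

package main

import (
	"database/sql"
	"log"

	_ "github.com/mattn/go-sqlite3" // assumed driver; any database/sql driver works the same way
)

func main() {
	db, err := sql.Open("sqlite3", ":memory:")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if _, err := db.Exec("CREATE TABLE users (id INTEGER, name TEXT)"); err != nil {
		log.Fatal(err)
	}

	name := "Robert'); DROP TABLE users;--"

	// DANGEROUS: concatenation lets the data change the shape of the query:
	//   "SELECT id FROM users WHERE name = '" + name + "'"

	// SAFE: the query shape is fixed; the value travels separately as bound data.
	rows, err := db.Query("SELECT id FROM users WHERE name = ?", name)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()
}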

When will I ever need more than what MySqli can give me?

I use mysqli for everything. I'm adding features to a small system I built, and some of the examples are PDO. I was about to convert them to mysqli to match my system, but I realized it might be less work to change what I've already built to PDO. I've been reading about PDO vs. mysqli all over the web.
But here is my question. When will I ever need more than mysqli? PDO offers three major things that mysqli doesn't: twelve different drivers, named parameters, and prepared statements.
As a web dev, I don't really need the ability to use my web application over 18 database types.
The only major advantage I see is prepared statements.
What are the major reasons to switch when the only language I am using is PHP? Does PHP even support named parameters now?
Is there even a way to get the number of results from a select using PDO?
You managed to confuse everything.
On your statement
mysqli does offer native prepared statement support as well;
named parameters actually just bloat your code to no purpose, yet they are very easy to implement manually. There are some people called programmers who can actually program and implement whatever feature they need;
twelve different drivers are actually a bubble: only a few of them are actually workable, and you cannot switch databases by only changing a DSN. You need a higher level of abstraction, such as an ORM.
On what you have missed
mysqli's way of handling prepared statements is indeed too complex for a newcomer;
as a matter of fact, mysqli is just an API that should never be used as-is, but only as source material for a higher-level driver;
yet it has a lot of mysql-specific nitty-gritty that is obviously absent in a generalized driver like PDO.
So, in brief
If you are going to create a database abstraction library for use with mysql, go for mysqli.
If your only idea of using the API is to call its methods directly in your code, then PDO is the only choice, as it's already a semi-DAL and its support for prepared statements is far easier to use.
On your other questions
Does PHP even support named parameters now?
PHP itself has nothing to do with named parameters; they are a feature of the database API (PDO supports them, mysqli does not).
Is there even a way to get the number of results from a select using pdo?
You don't need it. Once you have results, you can simply count them.
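As an aside on the "very easy to implement manually" claim: here is a rough sketch of named-parameter emulation, written in Go (the language used elsewhere on this page) rather than PHP; the expand() helper and the :name syntax are illustrative assumptions, not part of any standard API.

package main

import (
	"fmt"
	"regexp"
)

// namedRe matches :name-style placeholders. This is deliberately naive:
// it does not skip ':' characters inside string literals.
var namedRe = regexp.MustCompile(`:([A-Za-z_][A-Za-z0-9_]*)`)

// expand rewrites :name placeholders into positional ? placeholders
// and returns the bind values in the order they appear.
func expand(query string, params map[string]interface{}) (string, []interface{}) {
	var args []interface{}
	out := namedRe.ReplaceAllStringFunc(query, func(m string) string {
		args = append(args, params[m[1:]]) // m[1:] strips the leading ':'
		return "?"
	})
	return out, args
}

func main() {
	q, args := expand(
		"SELECT * FROM users WHERE name = :name AND age > :age",
		map[string]interface{}{"name": "alice", "age": 30},
	)
	fmt.Println(q, args)
	// Output: SELECT * FROM users WHERE name = ? AND age > ? [alice 30]
}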

Is it possible to write a SQL statement in plain assembly language processor-level code?

Just recently a friend suggested it is possible and achievable (though very difficult) to write a SQL statement in assembly code, since every programming operation eventually gets down to processor-level execution.
I did a bit of research on SQL's behaviour, and although it follows relational algebra theory and platform-independent execution, I still believe that the level of abstraction and the semantics are too distant to even consider a way of translating a SQL statement into assembly code (a very operations/memory/resources-specific set of instructions).
Perhaps you could mimic the result of a SQL statement's processor operations and try to replicate it with a pure assembly set of instructions. You would come to realise, though, that you still would not be writing/translating SQL statements.
Take for instance, MonetDB's SQL Reference page, they state the following in the third paragraph:
"The architecture is based on a compiler, which translates SQL
statements into the MonetDB Assembly Language (MAL). In this process
common optimization heuristics, specific to the relational algebra are
performed."
The SQL language, however, does not even allow raw assembly instructions to be typed in, whereas common languages such as the C family and C# do allow such typing/imports.
What do you guys think? Thanks for sharing your thoughts!
Anything that runs on your computer can be coded using an assembly language. If a SQL database can run on your machine, then it can be coded in assembly.
It can be ridiculously hard to do though.
The SQL example you mention isn't that far removed from what happens when C or other compiled languages are translated to machine code. Modern optimizing compilers don't translate your C code directly to assembly. They use one (or more) intermediate representations that are easier to perform optimizations on. It's a multi-step process, and the actual assembly output isn't the main part of it, complexity-wise.
If you look at it that way, your SQL case is not very different. You could imagine a SQL pre-processor that produces native code from the MAL, given a sufficiently fixed environment (notably the schema). With something like that, adding extensions to that SQL dialect to allow inline assembly (for aggregate functions, for instance) could make sense. And doing all that manually (i.e. without the pre-processor itself) would be possible.
You lose all the portability and flexibility you get from a runtime SQL interpreter, though: you would have to recompile every time your schema changes, data-dependent optimizations become nearly impossible, etc. So the situations where this would be useful are, I believe, very limited. (The same goes for other languages that are usually run through a VM or interpreter - compiling them down to native code usually carries heavy restrictions.)
The SQL language, however, does not even allow raw assembly instructions to be typed in, whereas common languages such as the C family and C# do allow such typing/imports.
No, SQL does not allow this, because it is a higher-level language than C (or C#). In SQL, the code describes what should be done, not how, nor any details of how to do it. The implementation has to parse the code and compile it into a set of low-level instructions that do what the SQL code describes.
For example, for a SELECT we have no guarantee on what the plan to access the tables will be, in what order they will be accessed, which (if any) indices will be used, what type of operations will be used for joins, if temporary tables will be used or the sorting is done in memory, etc...
So, something like this would be ill-defined and extremely dangerous to be allowed:
SELECT *
FROM a_table AS a
JOIN another_table AS b
ON b.aid = a.id
WHERE b_data LIKE 'Alex%'
( .CODE
getRSP PROC
mov rax, rsp
add rax, 8
ret
getRSP ENDP
END
)
AND a_date BETWEEN '2000-01-01'
AND '2099-12-31'
ORDER BY b_year
If you are interested in the compilation of relational queries/operations to assembler, you might want to check out this paper: http://www.vldb.org/pvldb/vol4/p539-neumann.pdf. In this DBMS, components of LLVM are used to produce CPU instructions (which I assume is what you mean when you say assembler) from a query within the DBMS.
Also, even though I might be preaching to the choir, I want to make clear that MAL has nothing to do with CPU-instruction assembler. Every single MAL statement is backed by an implementation in C. MAL is used only (ta-da!) as an intermediate representation that is easy to optimize and interpret.
Well, the machine executes instructions you could have written in assembly. However, I wouldn't call writing the assembly language directly doing a SQL query. SQL could be interpreted very differently... e.g. by librarians consulting encyclopediae, in contexts where raw assembly might have little meaning.
No. SQL is an abstraction that can be interpreted* by different SQL implementations with different SQL environments with different physical layouts. Maybe the layouts even change over time, as you ALTER TABLE and now you have a mixture of old and new tuple layouts. Also, there's more you can do with SQL than just run it. You can also type-check it, analyze it to see what kind of effects it has, put it in a view definition or stored procedure, etc.
Here's another way to put it. Can you "write" HTML as assembly language? Maybe you can write a program that, when executed, has the same effect as a browser rendering a particular page. But can your program be processed by AdBlock, NoScript, and whatever other filters I have installed? Anything that supports all of the relevant operations on HTML is going to be isomorphic to HTML itself. Similarly with SQL, and any other language. Any other data structure, in fact: a change in representation must preserve the meaning of all the relevant operations on that data structure. And languages tend to have lots of relevant operations.
(* I don't mean "interpreted" as in "vs compiled"; I mean "given meaning".)

What, if any, are the disadvantages of SQL::Interp over SQL::Abstract?

I'm currently looking at some light-weight SQL abstraction modules. My workflow is such that I usually write SELECT queries manually, and INSERT/UPDATE queries via subs which take hashes.
Both of these modules seem perfect for my needs, and I have a hard time deciding between them. SQL::Interp claims SQL::Abstract cannot provide full expressivity in SQL, but discusses no other differences.
Does it have any disadvantages? If so, which?
I can't speak to SQL::Interp, but I use SQL::Abstract and it's pretty good. In conjunction with DBIx::Connector and plain old DBI, I was able to totally eliminate the use of an ORM in my system with very little downside.
The only limitations I have run into are that it's not possible to write GROUP BY queries directly (although that's easy to work around by simply appending to the generated query), and that LIMIT queries are handled by the extension SQL::Abstract::Limit.
I used SQL::Abstract for over a year, and then switched to SQL::Interp, which I've stuck with since.
SQL::Abstract had trouble with complex clauses. For the ones it could support, you would end up with a nest of "(", "[", and "{" characters, which you were left to mentally translate back into "AND", "OR", or actual parentheses.
SQL::Interp has no such limitations and uses no intermediate representation. Your SQL looks like SQL, with bind variables where you want them. It works for complex queries as well as simple ones. I find SQL::Interp especially pleasant to use in combination with DBIx::Simple's built-in support for it. DBIx::Simple + SQL::Interp is a friendly and intuitive replacement for raw DBI. I use the combination in a 100,000+ LoC mod_perl web app.