Does DuckDB support triggers?

I suspect the answer is no, but I just wanted to check if anyone has a way to implement triggers in DuckDB?
I have a SQLite database that relies heavily on views with INSTEAD OF INSERT/UPDATE/DELETE triggers to mask the underlying table structure from applications. Having heard good things about DuckDB, I was hoping to try a port, but the lack of triggers (or an alternate mechanism for achieving the same goal) is a showstopper. I've no great desire to reimplement all of the functionality in the various host languages that access the database.
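For context, here is a minimal sketch of the SQLite pattern in question -- a view plus an INSTEAD OF trigger that routes writes to the hidden base table (all table, view, and column names are made up for illustration):

    -- SQLite: the application sees only the view; an INSTEAD OF
    -- trigger translates inserts on the view into inserts on the
    -- underlying table.
    CREATE TABLE person_t (id INTEGER PRIMARY KEY, first TEXT, last TEXT);

    CREATE VIEW person AS
      SELECT id, first || ' ' || last AS full_name FROM person_t;

    CREATE TRIGGER person_ins INSTEAD OF INSERT ON person
    BEGIN
      INSERT INTO person_t (first, last)
      VALUES (substr(NEW.full_name, 1, instr(NEW.full_name, ' ') - 1),
              substr(NEW.full_name, instr(NEW.full_name, ' ') + 1));
    END;

It is this last CREATE TRIGGER statement that has no DuckDB equivalent.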

No.
The DuckDB documentation (as of October 2022) doesn't contain any reference to database triggers.
That said, given the nature of the project -- an analytical (OLAP) database rather than a transactional (OLTP) one -- I would say the developers have little incentive to implement this feature anyway.
See also this issue on the DuckDB GitHub repository.
SQLite and DuckDB are both fantastic tools, but very different beasts. Use the best tool for your job.

Related

For a data analytical web app, would it be better to use django's auto-generated relational objects or raw SQL queries?

I am making an application that will serve as a demonstration of the data analysis capabilities our team can provide. I am very new to Django, and the auto-generated APIs seem pretty cool, but I worry about scalability and about the speed that only a carefully constructed and queried database can provide. Has anyone been in this situation and regretted/been satisfied with their choice of raw queries vs. Django APIs?
Django's ORM is great. It is one of the most complete and easy-to-use ORMs I have ever encountered, but like every ORM, it has its limitations.
If your application requires full control of the database and very efficient queries, you might consider other approaches and compare them to Django to see which one fits better. You'll have to do some research.
Django is great for developing complicated database applications very quickly, but I'm sure that -- if the application grows large enough -- sooner or later you'll have to start working directly with your database engine for optimization reasons. ORMs are generic tools, so database-engine-specific functions will not be available.
There is no rule for deciding whether or not Django is going to work for you, but one thing I can tell you is that its ORM helps you get your app started very quickly, and if you find some specific circumstance where you need to customize your SQL, you can do that in Django as well. If not, just create a Python module that handles the database as you like in those specific circumstances and use it from your Django code. That is probably the best way to show off your very efficient data analysis capabilities.
I hope this brings some light; it is a very broad question. One thing I'm sure of is that you won't regret the choice until your app grows big enough, and when it does, you'll have the resources to find great programmers who can twist Python to handle every specific situation that behaves oddly with Django's database-access tools.
Found this link which may be helpful.
Good luck!
I'd consider this a case of premature optimization.
Django ORM is good enough in a general case, provided that your database is reasonably designed, has appropriate indexes, etc.
In more than 90% of cases this will be adequate, and often optimal.
When you have identified specific complicated and slow queries, reviewed the ORM-generated SQL, and come up with a better query, then you may add a special case for it.
Maybe you will have more than one such special case. I still think the ORM will save you a lot of legwork in the 90% of database-access cases where it is adequate.
Besides querying, an ORM allows you to describe the DB schema, its constraints, ways to recreate it and migrate it between versions, etc. Even if the ORM would not let you query the DB, these management capabilities would be enough reason to use an ORM.

Are modifiable join views a reasonable design choice?

To be clear, by modifiable join view I mean a view constructed from the joining of two or more tables that allows insert/update/delete actions that modify any/all of the component tables.
This may be a Postgres-specific question, not sure. I am also interested in whether other DBMSs have idiosyncratic features for modifiable join views, since as far as I can tell they are not possible in standard SQL.
I'm working on a Postgres schema, and some of my recent reading has suggested that it is possible to construct modifiable join views using instead rules (CREATE RULE ... DO INSTEAD ...). Modifiable join views seem desirable since they would allow hiding strong normalization behind an interface, providing a mechanism for classic abstraction. Rules are the only option for implementation, since currently triggers cannot be set on views.
However, the first modifiable view I tried to design ran into problems, and I found out that many consider non-trivial rules to be harmful (see links in comments to this SO answer). Also, I can't find any examples of modifiable join views on the web.
Questions (Edit to put finer points on the questions):
Do you have any experience with modifiable join views and can you provide a concrete example with select/insert/delete/update ability?
Are they practical, i.e. can they be treated transparently without having to tiptoe around mines/black holes?
Are they ever a good design choice, in terms of functionality/effort ratio and maintainability?
Would greatly appreciate links to any examples/discussions on this topic. Thanks.
Yes, I have some experience with updatable views in general. I think they're practical in PostgreSQL. Like all design choices, they can be a good choice, and they can be a bad choice.
I find them particularly useful in dealing with supertype/subtype tables. I create one view for each subtype; the view joins the subtype to the supertype. Revoke permissions on the base tables, write rules for the view, and give client code access only to the views. All data manipulation done by client code then goes through the view and the rules defined on them.
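A minimal sketch of that supertype/subtype pattern (all names here are hypothetical, and only the INSERT rule is shown; UPDATE and DELETE rules follow the same shape):

    -- Supertype and one subtype table.
    CREATE TABLE party (
      party_id   integer PRIMARY KEY,
      party_type char(1) NOT NULL CHECK (party_type IN ('P', 'O'))
    );
    CREATE TABLE person (
      party_id  integer PRIMARY KEY REFERENCES party,
      full_name text NOT NULL
    );

    -- One view per subtype, joining the subtype to the supertype.
    CREATE VIEW person_v AS
      SELECT p.party_id, s.full_name
      FROM party p JOIN person s USING (party_id);

    -- Rewrite an insert on the view into inserts on both base tables.
    CREATE RULE person_v_ins AS ON INSERT TO person_v DO INSTEAD (
      INSERT INTO party (party_id, party_type) VALUES (NEW.party_id, 'P');
      INSERT INTO person (party_id, full_name) VALUES (NEW.party_id, NEW.full_name)
    );

With REVOKE on the base tables and GRANT on the view, client code only ever sees person_v.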
I don't think rules are really different from any other feature in any other environment. And by environment, I mean C, C++, Java, Ruby, Python, Erlang, and BASIC, not just dbms environments.
Use the good features of a language. Avoid the bad ones.
"Don't use malloc()" is bad advice. "Always check the return value of malloc()" is good advice. "Never use rules" is bad advice. "Avoid using rules in ways that are known to have questionable behavior" is good advice. The rules you need for views on supertype/subtype tables are simple and easy to understand. They don't misbehave.
At the theoretical level, views provide logical data independence. But that's only possible if the views are updatable. (And many views should be updatable directly by the database engine, without any need of rules or triggers.)
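(For what it's worth, newer PostgreSQL releases -- 9.3 and later -- make simple single-table views updatable automatically, with no rule or trigger at all; the names below are illustrative:)

    -- Auto-updatable view: PostgreSQL maps the UPDATE straight
    -- through to the underlying accounts table.
    CREATE VIEW active_accounts AS
      SELECT account_id, balance FROM accounts WHERE active;

    UPDATE active_accounts SET balance = balance + 10 WHERE account_id = 1;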
I use them as a replacement for ORMs. I think as long as you do not run amok sprinkling them everywhere through the database, they can be easy enough to understand. I define a schema for an application, and then whatever views are in that schema are the methods and operations of that app. The client code can be mostly automated after that, since the views give the abstraction I need to write generic client code.
People point out that the rule rewrite is not a real table (though it poses as one), which makes it possible to write things that will break. This is possible, but I have yet to come across it. The idea is to hide the complexity in the rewrite and then only do simple deletes and updates with no joins. If it turns out that a join is needed, it is time to rewrite the rule, not the top-level query.
In the end, I find it a very compact way to write the database. All the ways of interfacing with it are written as rules. No connection should have access to a real table. Your business logic is very explicit. If a view does not have an UPDATE rule, it cannot be updated, period. Since you have written all this at the database level instead of the client level, it is not tied to a web framework or a particular language. This leads to a lot of flexibility in how you want to connect to the database. Imagine you used a web framework, but as time goes on you need direct access to the database from another source. Direct access would also bypass all the ORM business rules you worked so hard on. With a rule-written interface, you can expose the interface without fear that the new connection will corrupt the data.
If people say you can really F up a database with them -- then sure, of course you can. But you can with everything else too. If people say you cannot use them at all without mucking things up, then I would disagree.
Two quick links:
Why using rules is a bad idea
Triggers on views
My personal preference is to use views only for reading data, and (virtually) never for inserting or updating. By essentially re-normalizing data in your database (which sounds like what you are doing), you are likely creating a system that will be very difficult to test and maintain in the long term.
If at all possible, look at mapping your denormalized data back to a normal schema somewhere in your application code, and providing it to the database that way (to individual tables, IMHO) in a single transaction.
I know that in SQL Server, if you update a view you must limit the change to only one table anyway, which makes using views for updating useless in my mind, as you have to know which fields go with which tables anyway.
If you want to abstract the information out and not have to worry about the database structure for inserts and updates, an ORM might do a better job for you than views.
I have never used modifiable views of any sort, but since you are asking whether they are a "reasonable design choice", may I suggest an alternative design choice with many benefits, where modifiable views are not needed: a transactional API.
Basically what this amounts to is:
Users have no access to tables and cannot issue insert, update, delete statements at all
Users have access to functions that represent well-defined transactions -- at the simplest level these may just perform a single DML statement, but often they would not. The important thing is that they map to transactions in the 'business' sense rather than in the 'database' sense (see the sketch after this list)
For querying, users have access to (non-modifiable) views
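A minimal sketch of one such function in PL/pgSQL (the table, column, and parameter names are invented):

    -- One 'business' transaction exposed as a function; the client
    -- calls this instead of issuing DML against the tables.
    CREATE FUNCTION place_order(p_customer integer,
                                p_product  integer,
                                p_qty      integer)
    RETURNS integer
    LANGUAGE plpgsql SECURITY DEFINER AS $$
    DECLARE
      v_order_id integer;
    BEGIN
      INSERT INTO orders (customer_id) VALUES (p_customer)
      RETURNING order_id INTO v_order_id;
      INSERT INTO order_lines (order_id, product_id, qty)
      VALUES (v_order_id, p_product, p_qty);
      RETURN v_order_id;
    END;
    $$;

Grant EXECUTE on the function to the application role and revoke everything on the tables; the function body runs as one transaction, so it either fully succeeds or fully rolls back.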
I usually do use views, in the form of "last valid record" views that simply hide and track modifications (like a wiki).
The only drawback I see is performance: once you treat the view as a table -- joining it with anything, using it in WHERE clauses, inserting records through it -- you incur a significant performance loss compared to the same actions against a real table (which would be bigger and more complex). I think it depends on how many people need to understand the schema. It's true that some DBMSs also allow indexing views, but I think you still lose a noticeable amount of performance.
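A sketch of such a "last valid record" view in PostgreSQL (hypothetical names), using DISTINCT ON to surface only the newest revision of each row:

    -- Every modification is kept as a new revision; the view
    -- exposes just the latest one per page.
    CREATE TABLE page_revision (
      page_id    integer,
      revised_at timestamptz NOT NULL DEFAULT now(),
      body       text,
      PRIMARY KEY (page_id, revised_at)
    );

    CREATE VIEW page AS
      SELECT DISTINCT ON (page_id) page_id, revised_at, body
      FROM page_revision
      ORDER BY page_id, revised_at DESC;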

How to maintain SQL scripts when developing an application working against many databases

Imagine an application which is supposed to work with different database vendors. As we all know, SQL syntax (especially for DDL) is not portable. How do you deal with maintaining the SQL scripts?
Until now I see three options:
to store the SQL in the format of one of the databases and have a tool which automatically converts from one syntax to another (do you know of such tools?)
to store the SQL in some artificial language and have a tool which can generate vendor-specific SQL on demand (any recommendations here?)
to store the SQL in many database formats, accepting the redundancy (this is the worst one, isn't it?)
Do you recommend any of them? Do you have a better idea?
The development environment tries to follow the continuous integration principles, so automation is a key feature here.
Have a look at Liquibase (that's essentially the second item on your list):
http://www.liquibase.org
It's not perfect (e.g. it does not support check constraints), but it is quite useful.
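To illustrate the idea, here is a minimal changelog sketch in Liquibase's own XML format (ids and names made up): the change is described once, vendor-neutrally, and Liquibase emits the vendor-specific DDL for each target database.

    <databaseChangeLog
        xmlns="http://www.liquibase.org/xml/ns/dbchangelog">
      <changeSet id="1" author="alice">
        <!-- Rendered as the appropriate CREATE TABLE for each vendor. -->
        <createTable tableName="customer">
          <column name="id" type="int" autoIncrement="true">
            <constraints primaryKey="true"/>
          </column>
          <column name="name" type="varchar(100)"/>
        </createTable>
      </changeSet>
    </databaseChangeLog>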
This video shows a solution using the Subsonic project (http://subsonicproject.com/docs/Using_SimpleRepository) and its data-migration capabilities. The strategy is to use a general language and apply it to different databases.
Hope this is what you were looking for
Use some kind of ORM framework with schema generation capability.

ORM: Handwritten schema or auto-generated?

Should I use a hand-written schema for my project developed in a high-level language (such as Python or Ruby), or should I let my ORM solution auto-generate it?
Eventually I will need to migrate without destroying all the data. It's okay to be tied to a specific RDBMS but it would be nice if features such as constraints and procedures could be supported somehow.
I never go with ORM-generated schema.
I find that the way the ORM wants to generate the schema is often at total odds with how I want my database to be structured. Also, and I know this is trivial, the nomenclature scheme is usually poor.
Database structure has its own constraints, which I find the ORM autogeneration tools usually don't consider fully. And if you're going to want to run reports on your database later (and you will), then having good database structure and design is very important.
See this Coding Horror article and links for discussion on that migration you'll eventually need to do. Plan for it now.
Also see Martin Fowler on database evolution; I particularly recommend the notion that test data generation is part of database set-up. The idea may be a little underdeveloped, in that there is not a clear delineation of the different problems in different environments, development versus QA versus production.
Let the ORM generate the schema it wants. You can always change things that are too slow or that you want done differently later. But it allows you to get started quickly and have something working, plus the ORM people usually know what they're doing when it comes to generating schemas.
Let your ORM solution generate it, but don't just blindly use it; read through it and sanity-check it.

Languages other than SQL in postgres

I've been using PostgreSQL a little bit lately, and one of the things that I think is cool is that you can use languages other than SQL for scripting functions and whatnot. But when is this actually useful?
For example, the documentation says that the main use for PL/Perl is that it's pretty good at text manipulation. But isn't that more of something that should be programmed into the application?
Secondly, is there any valid reason to use an untrusted language? It seems like making it so that any user can execute any operation would be a bad idea on a production system.
PS. Bonus points if someone can make PL/LOLCODE seem useful.
@Mike: this kind of thinking makes me nervous. I've heard too many times "this should be infinitely portable", but when the question is asked -- do you actually foresee any porting? -- the answer is: no.
Sticking to the lowest common denominator can really hurt performance, as can the introduction of abstraction layers (ORMs, PHP PDO, etc.). My opinion is:
Evaluate realistically whether there is a need to support multiple RDBMSs. For example, if you are writing an open-source web application, chances are that you need to support at least MySQL and PostgreSQL (if not MSSQL and Oracle)
After the evaluation, make the most of the platform you decided upon
And BTW: you are mixing relational with non-relational databases (CouchDB is not an RDBMS comparable with Oracle, for example), further exemplifying the point that the perceived need for portability is often greatly overestimated.
"isn't that [text manipulation] more of something that should be programmed into the application?"
Usually, yes. The generally accepted "three-tier" application design for databases says that your logic should be in the middle tier, between the client and the database. However, sometimes you need some logic in a trigger or need to index on a function, requiring that some code be placed into the database. In that case all the usual "which language should I use?" questions come up.
If you only need a little logic, the most-portable language should probably be used (pl/pgSQL). If you need to do some serious programming though, you might be better off using a more expressive language (maybe pl/ruby). This will always be a judgment call.
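Two tiny PostgreSQL examples of each case (all identifiers invented): a function-based index, and a little logic kept in the database as a PL/pgSQL trigger.

    -- Index on a function: queries filtering on lower(email) can use it.
    CREATE INDEX users_email_lc_idx ON users (lower(email));

    -- Logic in a trigger, written in PL/pgSQL.
    CREATE FUNCTION touch_updated_at() RETURNS trigger
    LANGUAGE plpgsql AS $$
    BEGIN
      NEW.updated_at := now();
      RETURN NEW;
    END;
    $$;

    CREATE TRIGGER users_touch
    BEFORE UPDATE ON users
    FOR EACH ROW EXECUTE PROCEDURE touch_updated_at();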
"is there any valid reason to use an untrusted language?"
As above, yes. Again, putting direct file access (for example) into your middle tier is best when possible, but if you need to fire things off based on triggers (that might need access to data not available directly to your middle tier), then you need untrusted languages. It's not ideal, and should generally be avoided. And you definitely need to guard access to it.
These days, any "unique" or "cool" feature in a DBMS makes me incredibly nervous. I break out in a rash and have to stop work until the itching goes away.
I just hate to be locked into a platform unnecessarily. Suppose you build a big chunk of your system in PL/Perl inside the database, or in C# within SQL Server, or in PL/SQL within Oracle; there are plenty of examples.*
Now you suddenly discover that your chosen platform doesn't scale. Or isn't fast enough. Or something. Worse, there's a new kid on the database block (something like MonetDB, CouchDB, or Caché, say, but much cooler) that would solve all your problems (even if your only problem, like mine, is having an uncool database platform). And you can't switch to it without recoding half your application.
(*Admittedly, the paid-for products are to some extent seeking to lock you in by persuading you to use their unique features, which is not an accusation that can directly be levelled at the free providers, but the effect is the same).
So that's a rant on the first part of the question. Heart-felt, though.
"is there any valid reason to use an untrusted language? It seems like making it so that any user can execute any operation would be a bad idea"
My goodness, yes it does! A sort of "Perl injection attack"? Almost worth doing just to see what happens, I'd have thought.
For philosophical reasons outlined above I think I'll pass on the PL/LOLCODE challenge. Although I was somewhat amazed to discover it was a link to something extant.
From my perspective, I guess the answer is 'it depends'.
There is an argument that manipulation of the data belongs in the database layer, so that the business logic does not need to be overly concerned about how the manipulation happens, it just knows that it has.
Another very good reason to process data at the DB layer is when the volume of data being crunched means that network bandwidth would become an issue. I once had to categorise very large amounts of data. Processing it in the application layer was severely restricted by the time required to transfer all the data across the network for processing.
I then wrote a binning algorithm in PL/pgSQL and it worked much faster.
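Something along these lines -- a sketch, not the original code, with an invented measurements(reading) table -- keeps the crunching next to the data:

    -- Count rows per equal-width bucket server-side, so only the
    -- bucket counts cross the network, not the raw rows.
    CREATE FUNCTION bin_counts(n integer)
    RETURNS TABLE (bucket integer, cnt bigint)
    LANGUAGE plpgsql AS $$
    DECLARE
      lo numeric;
      hi numeric;
    BEGIN
      SELECT min(reading), max(reading) INTO lo, hi FROM measurements;
      RETURN QUERY
        SELECT width_bucket(reading, lo, hi, n), count(*)
        FROM measurements
        GROUP BY 1
        ORDER BY 1;
    END;
    $$;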
Regarding untrusted languages: I heard a podcast from Josh Berkus (a Postgres advocate) who discussed an application of PostgreSQL that brought in data from MySQL as part of its processing, so that the communication itself was handled by the Postgres server. I don't remember the full details; I think it was on the FLOSS Weekly podcast, which has quite an interesting discussion of the history of PostgreSQL and some of the uses it is put to.
The untrusted versions of the procedural languages allow you to access I/O on the system.
This can come in handy if you need a trigger or something similar to send an email or connect to a socket server to send a popup notification. There are tons of uses for this type of thing, and because of PostgreSQL's isolation levels you can safely do things like this.
You can put checkpoints in the function so that if the transaction fails, the email or whatever won't go out. The nice thing about doing this is that it removes the logic from the client and puts it on the server.
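For instance, a minimal plperlu sketch (the mailer path and address are made up) of the kind of function a trigger could call to fire off a notification:

    -- Untrusted language: free to do arbitrary I/O, here piping a
    -- message to the local mailer.
    CREATE FUNCTION notify_ops(msg text) RETURNS void
    LANGUAGE plperlu AS $$
      my ($msg) = @_;
      open(my $mail, '|-', '/usr/sbin/sendmail ops@example.com') or die $!;
      print $mail "Subject: database alert\n\n$msg\n";
      close($mail);
    $$;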
I think most additional languages are offered so that if you develop in that language on a regular basis, you can feel comfortable writing DB functions, triggers, etc. The usefulness of these features is to provide control over the data as close to the data as possible.
An example of a useful stored procedure I recently wrote in an external language, which would not have been possible in PL/pgSQL, is a version of 'df' that allowed SQL table generators to pick the tablespace with the most free space available at runtime.
I used plperlu, and it was relatively simple, although I had to be careful with data typing.
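A sketch of that idea (not the original code): shelling out to df from plperlu, something plain PL/pgSQL cannot do.

    -- Ask the OS how much space is free under a path (in KB).
    CREATE FUNCTION fs_free_kb(path text) RETURNS bigint
    LANGUAGE plperlu AS $$
      my ($path) = @_;
      my @out = `df -Pk $path`;    # untrusted language: may run commands
      my @f = split ' ', $out[1];  # second line holds the stats
      return $f[3];                # the 'Available' column
    $$;

A table generator can then compare fs_free_kb() across tablespace locations and pick the roomiest one.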