Design patterns for a large analysis-oriented SQL script library? [closed] - sql

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
I am writing a large number of SQL scripts to be shared by a number of people, who will generally work with the scripts by editing them slightly for their specific purpose, then using the modified script to generate a dataset for further analysis.
I've struggled with the best way to build and organize such a library. I want more structure than just a folder with a long list of files like "signups_and_spend_by_week.sql." I see basically two, inter-related issues:
Parameterization. Things like dates and sample sizes are relatively easy to pull out and make into variables, but what about parameters that alter the nature of the query---say changing a left join to an inner join? Is it wise to just (a) comment these kinds of possibilites, (b) create two versions (hence creating serious DRY problems), or maybe (c) wrap queries with a higher-lever programming language that can more easily represent these kinds of possibilites, e.g.,
q = "SELECT * FROM plants"
if want_all:
q = _q + "LEFT JOIN fruits ON fruits.plant_id = plants.id"
else:
q = _q + "INNER JOIN fruit ON fruits.plant_id = plants.id"
run_query(q)
DRY. I find that the same patterns continually appear, with similar or identical views being created as intermediate steps for more complex queries. My thinking is to split these re-usable bits out and call them as needed. However, my concern is that this then creates dependencies all over the library, and that seemingly innocuous changes to a "base" query might might inadvertently break someone else's query. Other than just discipline and good documentation/rules, is their some compromise possible, e.g., automated testing?

This looks like a problem that should not be treated in a purely technical way.
Basically you will offer a base of SQL's that other people can refine.
Some of your base SQL's will have bugs, run slow in certain circumstances, solve problems that don't exist, are spot on, ... and they will be improved, altered, neglected, praised, ...
It is hard to foresee what will work best for you and your co-workers. At least I don't have an idea.
I would suggest that you first start very basic, very simple: with a directory with SQL scripts, giving the files meaningful names.
- Allow people to look at them, use them, change them, derive from them, comment on them, give points for their usefulness, ....
- Have frequent meetings with all involved.
- Try to find out what works well, what your group needs, what things don't work at all, ....
- Only when you, as a group, start to see clearly what tools you need to support what you're doing, start thinking and designing a system to supports your needs. Only then store your scripts in a database, if that still makes sense.
Don't start designing now, you will probably throw 80% away, and the 20% you keep will be to spare the feelings of the one who has spent so much time up front trying to come up with something useful for the group.
For me this is really a situation where a SCRUM approach works best: a situation that no one has a clear view of on how it should be best build. Communication, short sprints that try to address problems, interactivity, change the things that didn't work out as planned, ... those seem to me the key words and phrases for this project.
Let it grow, don't assume you can grok now how it will be.
(I wrote this out of the assumption that it is as clear to you as it is to me how your project will evolve. If you have a clear view of what it will look like and have done this before, my comments will be way off.)

My few thoughts:
Separate application with:
-tags for scripts marking
-search functionality (Google like)
-scripts verifications (with metadata)
-script runner with GUI for parameters entry
Scripts:
Some metalanguage over SQL is a must, but still it will be a DSL, with all disadvantages. So I will try to build your DSL in spirit of minimalism. My recomendation is unobstrucive style of DSL: all meta informations in comments, control statements too. Something like this:
/*
%%Parameters:
%%StartDate:Date:Please enter date for blah blah blah
%%EndDate:Date:And enter date for blah blah blah
%%Verions:
%%AllProducts:Defalt
%%Friuts:Optional
*/
SELECT * FROM plants
--%%Version:AllProducts
LEFT JOIN fruits ON fruits.plant_id = plants.id
--%%EndOfVersion
--%%Version:Fruits
-- INNER JOIN fruit ON fruits.plant_id = plants.id
--%%EndOfVersion
WHERE
StartDate >= %%StartDate AND
EndDate <= %%EndDate
DRY: It's highly subjective but I will recoment not to build infrastructure for subscripts, incuding mechanism, etc. IMO there is more costs than profits.

Related

Pros and cons of putting logic in SQL? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
At a new job, I've just been exposed to the concept of putting logic into SQL statements.
In MySQL, a dumb example would be like this:
SELECT
P.LastName, IF(P.LastName='Baldwin','Michael','Bruce') AS FirstName
FROM
University.PhilosophyProfessors P
// This is like a ternary operator; if the condition is true, it returns
// the first value; else the second value. So if a professor's last name
// is 'Baldwin', we will get their first name as "Michael"; otherwise, "Bruce"**
For a more realistic example, maybe you're deciding whether a salesperson qualifies for a bonus. You could grab various sales numbers and do some calculations in your SQL query, and return true / false as a column value called "qualifies."
Previously, I would have gotten all the sales data back from the query, then done the calculation in my application code.
To me, this seems better, because if necessary, I can walk through the application logic step-by-step with a debugger, but whatever the database is doing is a black box to me. But I'm a junior developer, so I don't know what's normal.
What are the pros and cons of having the database server do some of your calculations / logic?
**Code example based on Monty Python sketch.
This way SQL becomes part of your domain model. It's one more (and not necessarily obvious) place where domain knowledge is implemented. Such leaks result in tighter coupling between business logic / application code and database, what usually is a bad idea.
One exception is views, report queries etc. But these usually are so isolated that it's obvious what role they play.
One of the most persuasive reasons to push logic out to the database is to minimise traffic. In the example given, there is little gain, since you are fetching the same amount of data whether the logic is in the query or in your app.
If you want to fetch only users with a first name of Michael, then it makes more sense to implement the logic on the server. Actually, in this simple example, it doesn't make much difference, since you could specify users who's lastname is Baldwin. But consider a more interesting problem, whereby you give each user a "popularity" score based on how common their first and last names are, and you want to fetch the 10 most "popular" users. Calculating "popularity" in the app would mean that you have to fetch every single user before ranking, sorting and choosing them locally. Calculating it on the server means you can fetch just 10 rows across the wire.
There aren't a lot of absolute pros and cons to this argument, so the answer is 'it depends.' Some scenarios with different conditions that affect this decision might be:
Client-server app
One example of a place where it might be appropriate to do this is an older 4GL or rich client application where all database operations were done through stored procedure based update, insert, delete sprocs. In this case the gist of the architecture was to have the sprocs act as the main interface for the database and all business logic relating to particular entities lived in the one place.
This type of architecture is somewhat unfashionable these days but at one point it was considered to be the best way to do it. Many VB, Oracle Forms, Informix 4GL and other client-server apps of the era were done like this and it actually works fairly well.
It's not without its drawbacks, however - SQL is not particularly good at abstraction, so it's quite easy to wind up with fairly obtuse SQL code that presents a maintenance issue through being hard to understand and not as modular as one might like.
Is it still relevant today? Quite often a rich client is the right platform for an application and there's certainly plenty of new development going on with Winforms and Swing. We do have good open-source ORMs today where a 1995 vintage Oracle Forms app might not have had the option of using this type of technology. However, the decision to use an ORM is certainly not a black and white one - Fowler's Patterns of Enterprise Application Architecture does quite a good job of running through a range of data access strategies and discussing their relative merits.
Three tier app with rich object model
This type of app takes the opposite approach, and places all of the business logic in the middle tier model object layer with a relatively thin database layer (or perhaps an off-the-shelf mechanism like an ORM). In this case you are attempting to place all the application logic in the middle-tier. The data access layer has relatively little intelligence, except perhaps for a handful of stored procedured needed to get around limits of an ORM.
In this case, SQL based business logic is kept to a minimum as the main repository of application logic is the middle-tier.
Overhight batch processes
If you have to do a periodic run to pick out records that match some complex criteria and do something with them it may be appropriate to implement this as a stored procedure. For something that may have to go over a significant portion of a decent sized database a sproc based approch is probably going to be the only reasonably performant way to do this sort of thing.
In this case SQL may well be the appropriate way to do this, although traditional 3GLs (particularly COBOL) were designed specifically for this type of processing. In really high volume environments (particularly mainframes) doing this type of processing with flat or VSAM files outside a database may be the fastest way to do it. In addition, some jobs may be inherently record-oriented and procedural, or may be much more transparent and maintanable if implemented in this way.
To paraphrase Ed Post, 'you can write COBOL in any language' - although you might not want to. If you want to keep it in the database, use SQL, but it's certainly not the only game in town.
Reporting
The nature of reporting tools tends to dictate the means of encoding business logic. Most are designed to work with SQL based data sources so the nature of the tool forces the choice on you.
Other domains
Some applications like ETL processing may be a good fit for SQL. ETL tools start to get unwiedly if the transformation gets too complex, so you may want to go for a stored procedure based architecture. Mixing Queries and transformations across extraction, ETL processing and stored-proc based processing can lead to a transformation process that is hard to test and troubleshoot.
Where you have a significant portion of your logic in sprocs it may be better to put all of the logic in this as it gives you a relatively homogeneous and modular code base. In fact I have it on fairly good authority that around half of all data warehouse projects in the banking and insurance sectors are done this way as an explicit design decision - for precisely this reason.
Many times the answer to this type of question is going to depend a great deal on deployment approach. Where it makes the most sense to place your logic depends on what you'll need to be able to get access to when making changes.
In the case of web applications that aren't compiled, it can be easier to deal with changes to a page or file than it is to work with queries (depending on query complexity, programming backgrounds / expertise, etc). In these kinds of situations, logic in the scripting language is typically ok and make make it easier to revise later.
In the case of desktop applications that require more effort to modify, placing this kind of logic in the database where it can be adjusted without requiring a recompilation of the application may benefit you. If there was a decision made that people used to qualify for bonuses at 20k, but now must make 25k, it'd be much easier to adjust that on the SQL Server than to recompile your accounting application for all of your users, for example.
I'm a strong advocate of putting as much logic as possible directly into the database. That means incorporating it in views and stored procedures. I believe that most follows the DRY principle.
For example, consider a table with FirstName and LastName columns, and an application that frequently makes use of a FullName field. You have three choices:
Query first and last name and compute the full name in application code.
Query first, last, and (first || last) in your application's SQL whenever you query the table.
Define a view CustomerExt that includes the first and last columns, and a computed full name column and then query against that view, rather than the customer table.
I believe option 3 is clearly correct. Consider the addition of a MiddleInitial field to the table and the full name computation. Using option 3, you simply need to replace the view and every application across your company will instantly use the new format for FullName. The view still makes the base columns available for those instances in which you need to do some special formatting, but for the standard instance everything works "automatically".
That's a simple case, but the principle is the same for more complex situations. Perform application- or company-wide data logic directly in the database and you do not need to concern yourself with keeping different applications up to date.
The answer depends on your expertise and your familiarity with the technologies involved. Also, if you're a technical manager, it depends on your analysis of the skills of the people working on your team and whom you intend on hiring / keeping on staff to support, extend and maintain the application in future.
If you are not literate and proficient in the database , (as you are not) then stick with doing it in code. If otoh, you are literate and proficient in database coding (as you should be), then there is nothing wrong (and a lot right) abput doing it in the database.
Two other considerations that might influence your decision are whether the logic is of such a complex nature that doing it in database code would be inordinately more complex or more abstract than in code, and second, if the process involved requires data from outside the database (from some other source) In either of these scenarios I would consider moving the logic to a code module.
The fact that you can step through the code in your IDE more easily is really the only advantage to your post-processing solution. Doing the logic in the database server reduces the sizes of result sets, often drastically, which leads to less network traffic. It also allows the query optimizer to get a much better picture of what you really want done, again often allowing better performance.
Therefore I would nearly always recommend SQL logic. If you treat a database as a mere dumb store, it will return the favor by behaving dumb, and depending on the situation, that can absolutely kill your performance - if not today, possibly next year when things have taken off...
That particular first example is a bad idea. Per-row functions do not scale well as the table gets bigger. In fact, a (likely) better way to do it would be to index LastName and use something like:
SELECT P.LastName, 'Michael' AS FirstName
FROM University.PhilosophyProfessors P
WHERE P.LastName = 'Baldwin'
UNION ALL SELECT P.LastName, 'Bruce' AS FirstName
FROM University.PhilosophyProfessors P
WHERE P.LastName <> 'Baldwin'
On databases where data are read more often than written (and that's most of them), these sorts of calculations should be done at write time such as using an insert/update trigger to populate a real FirstName field.
Databases should be used for storing and retrieving data, not doing massive non-databasey calculations that will slow down everything.
One big pro: a query may be all you can work with. Reports have been mentioned: many reporting tools or reporting plugins to existing programs only allow users to make their own queries (the results of which they will display).
If you cannot alter the code (because it isn't yours), you may yet be able to alter a query. And in some cases (data migration), you'll be writing queries to do migration as well.
I like to distinguish data vs business rules, and push the data rules into the stored procs as much as possible. There is not always a hard and fast distinction between the two, but in your example of calculating sales bonuses, the formula itself might be a business rule but the work of gathering and aggregating the various figures used in the formula is a data rule.
Sometimes, though, it depends on the deployment model and change control procedures. If the sales formula changes frequently and deployment of the business layer code is cumbersome, then tweaking just one function/stored proc in the database would be a great solution.
I'm a big fan of elegant database queries because the code is closer to the data and SQL works very well. But such queries, whether they're text in you app, generated by an OR mapper or stored in the database are harder to test, especially in the cloud, because you need a database to run against.
Database is exactly what it's called. DATABASE.
You should not mix the business logic with data layer.
Keep it separate as any close coupling between data and business makes impossible to follow best standards in programming.
I was working recently on a project where all logic was in MS SQL. Horrible idea, that back-fired after few years (energy company), no easy way to scale-out, no easy way to follow up CI/CD, Agile or code repos. Very difficult to co-work, very slow and very inefficient.
Company basically was reaching hardware limits in order to make it work (they've spent £100k on SSD SAN), while you could reach the same performance with C# for business and keep the database for data, with perhaps 3-4 cheap servers, that could easily scale-out.
Horrible, horrible idea. Guess what ? Company went under, as one time SQL server has reached it's potential (sometimes some queries were running for hours (very well written, but SQL is not for business logic. End of story)) when one time failed to bill all DD customers and basically didn't took the monthly payment that they needed to survive till next month (millions of pounds).

maintaining query-oriented applications [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
I am currently doing some kind of reporting system.the figures, tables, graphs are all based on the result of queries. somehow i find that complex queries are not easy to maintain, especially when there are a lot of filtering. this makes the query very long and not easy to understand. And also, sometimes, queries with similar filters are executed, making a lot of redundant code, e.g. when i am going to select something between '2010-03-10' and '2010-03-15' and the location is 'US', customer group is "ZZ", i need to rewrite these conditions each time i make a query in this scope. does the dbms (in my case, mysql) support any "scope/context" to make the coding more maintainable as well as the speed faster?
also, is there a industrial standard or best practice for designing such applications?
i guess what I am doing is called data mining, right?
Learn how to create views to eliminate redundant code from queries. http://dev.mysql.com/doc/refman/5.0/en/create-view.html
No, this isn't data mining, it's plain old reporting. Sometimes called "decision support". The bread and butter of information technology. Ultimately, play old reporting is the reason we write software. Someone needs information to make a decision and take action.
Data mining is a little more specialized in that the relationships aren't easily defined yet. Someone is trying to discover the relationships so they can then write a proper query to make use of the relationship they found.
You won't make a very flexible reporting tool if you are hand coding the queries. Every time a requirement changes you are up to your neck in fiddly code trying to satisfy it - that way lies madness.
Instead you should start thinking about a meta-layer above your query infrastructure and generating the sql in response to criteria expressed by the user. You could present them with a set of choices from which you could generate your queries. If you give a bit of thought to making those choices extensible you'll be well on your way down the path of the many, many BI and reporting products that already exist.
You might also want to start looking for infrastructure that does this already, such as Crystal Reports (swallowed by Business Objects, swallowed by SAP) or Eclipse's BIRT. Depending on whether you are after a programming exercise or a solution to your users' reporting problems you might just want to grab an off the shelf product which has already had tens of thousands of man years of development, such as one of those above or even Cognos (swallowed by IBM) or Hyperion (swallowed by Oracle).
Best of luck.

How to write a simple database engine [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
The community reviewed whether to reopen this question 12 months ago and left it closed:
Original close reason(s) were not resolved
Improve this question
I am interested in learning how a database engine works (i.e. the internals of it). I know most of the basic data structures taught in CS (trees, hash tables, lists, etc.) as well as a pretty good understanding of compiler theory (and have implemented a very simple interpreter) but I don't understand how to go about writing a database engine. I have searched for tutorials on the subject and I couldn't find any, so I am hoping someone else can point me in the right direction. Basically, I would like information on the following:
How the data is stored internally (i.e. how tables are represented, etc.)
How the engine finds data that it needs (e.g. run a SELECT query)
How data is inserted in a way that is fast and efficient
And any other topics that may be relevant to this. It doesn't have to be an on-disk database - even an in-memory database is fine (if it is easier) because I just want to learn the principals behind it.
Many thanks for your help.
If you're good at reading code, studying SQLite will teach you a whole boatload about database design. It's small, so it's easier to wrap your head around. But it's also professionally written.
SQLite 2.5.0 for Code Reading
http://sqlite.org/
The answer to this question is a huge one. expect a PHD thesis to have it answered 100% ;)
but we can think of the problems one by one:
How to store the data internally:
you should have a data file containing your database objects and a caching mechanism to load the data in focus and some data around it into RAM
assume you have a table, with some data, we would create a data format to convert this table into a binary file, by agreeing on the definition of a column delimiter and a row delimiter and make sure such pattern of delimiter is never used in your data itself. i.e. if you have selected <*> for example to separate columns, you should validate the data you are placing in this table not to contain this pattern. you could also use a row header and a column header by specifying size of row and some internal indexing number to speed up your search, and at the start of each column to have the length of this column
like "Adam", 1, 11.1, "123 ABC Street POBox 456"
you can have it like
<&RowHeader, 1><&Col1,CHR, 4>Adam<&Col2, num,1,0>1<&Col3, Num,2,1>111<&Col4, CHR, 24>123 ABC Street POBox 456<&RowTrailer>
How to find items quickly
try using hashing and indexing to point at data stored and cached based on different criteria
taking same example above, you could sort the value of the first column and store it in a separate object pointing at row id of items sorted alphabetically, and so on
How to speed insert data
I know from Oracle is that they insert data in a temporary place both in RAM and on disk and do housekeeping on periodic basis, the database engine is busy all the time optimizing its structure but in the same time we do not want to lose data in case of power failure of something like that.
so try to keep data in this temporary place with no sorting, append your original storage, and later on when system is free resort your indexes and clear the temp area when done
good luck, great project.
There are books on the topic a good place to start would be Database Systems: The Complete Book by Garcia-Molina, Ullman, and Widom
SQLite was mentioned before, but I want to add some thing.
I personally learned a lot by studying SQlite. The interesting thing is, that I did not go to the source code (though I just had a short look). I learned much by reading the technical material and specially looking at the internal commands it generates. It has an own stack based interpreter inside and you can read the P-Code it generates internally just by using explain. Thus you can see how various constructs are translated to the low-level engine (that is surprisingly simple -- but that is also the secret of its stability and efficiency).
I would suggest focusing on www.sqlite.org
It's recent, small (source code 1MB), open source (so you can figure it out for yourself)...
Books have been written about how it is implemented:
http://www.sqlite.org/books.html
It runs on a variety of operating systems for both desktop computers and mobile phones so experimenting is easy and learning about it will be useful right now and in the future.
It even has a decent community here: https://stackoverflow.com/questions/tagged/sqlite
Okay, I have found a site which has some information on SQL and implementation - it is a bit hard to link to the page which lists all the tutorials, so I will link them one by one:
http://c2.com/cgi/wiki?CategoryPattern
http://c2.com/cgi/wiki?SliceResultVertically
http://c2.com/cgi/wiki?SqlMyopia
http://c2.com/cgi/wiki?SqlPattern
http://c2.com/cgi/wiki?StructuredQueryLanguage
http://c2.com/cgi/wiki?TemplateTables
http://c2.com/cgi/wiki?ThinkSqlAsConstraintSatisfaction
may be you can learn from HSQLDB. I think they offers small and simple database for learning. you can look at the codes since it is open source.
If MySQL interests you, I would also suggest this wiki page, which has got some information about how MySQL works. Also, you might want to take a look at Understanding MySQL Internals.
You might also consider looking at a non-SQL interface for your Database engine. Please take a look at Apache CouchDB. Its what you would call, a document oriented database system.
Good Luck!
I am not sure whether it would fit to your requirements but I had implemented a simple file oriented database with support for simple (SELECT, INSERT , UPDATE ) using perl.
What I did was I stored each table as a file on disk and entries with a well defined pattern and manipulated the data using in built linux tools like awk and sed. for improving efficiency, frequently accessed data were cached.

What simple guidelines would you give your developers for writing good SQL against Oracle? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
I work in a group of about 25 developers. I'm responsible for coming up with the database design (tables, views, etc) and am called apon for performance tuning when necessary.
There are a couple of different applications that connect. Database access is via JDBC, hibernate, and iBatis SQL maps. Developers with various levels of experience write SQL statements.
What guidelines would you give to developers to write good SQL?
By good I mean: correct, performs well, easy to understand and maintain.
These are just meant to be easy to follow guidelines - I want to get people onto the right track for the majority of situations. We will break these guidelines when it makes sense.
EDIT: We have in place code reviews for all source commits (SQL, java, etc) enforced through a jira workflow.
If you have 25 developers writing SQL queries against your database you are in quite a bit of trouble. Guidelines are not worth much when your junior developers are learning SQL and checking in a mess.
I would like to offer 4 recommendations
Use an ORM of sorts so your all your devs write less SQL.
Invest in training, buy books, send people to courses.
Have all the SQL reviewed by the senior SQL developers, by all, I mean every SQL statement, no exceptions. This way your senior guys will be able to teach the juniors over time.
Have a single person, who lives and breaths Oracle, responsible for the database. By responsible I mean knows every query, understands all the structure and is able to give expert advice.
Here are some additional things you may add to your existing guidelines/checklist.
Have you tested your queries on a large data set? How was performance?
Have you performed a quick index review on the tables that are being accessed? Are all the right indexes in place? Do you recommend and new indexes?
For high volume queries, are any covering indexes required?
Are you using "NOT IN" in cases where a "LEFT JOIN" should be used?
Is your work transactionally sound? Are you missing a transaction somewhere?
Here's what I already have in my guidelines.
Work in sets, not row by row
The best way to make something go quicker is to avoid doing work you do not have to do
Databases love to join
Fully qualify and specify column names (so SQL does not break when additional columns are added)
Select only the data you need (never select *, never more rows than you require, never every column just becaues it's there)
How to use rownum to limit resultsets
Bind Variables vs Literals (use bind variables in all but a few special cases related to skewed data)
Avoid functions or calculations on columns in the WHERE clause (except for a special case of function based index)
Use ORDER BY for all queries returning more than one row (this is mostly for testability)
Each of these points is expanded a bit in the actual guidelines I've written out with an example relevant to our database schema.
Read Tom Kyte's books. He explains how you can write fast code and how you can measure performance and scalability. If you have a problem you can probably find the answer on the "ask tom"-site.
Introduce basic style guide that covers:
naming (of everything - tables, columns, procedures, aliases, ...) .
formatting style
line width
what reserved words require new line (e.g where)
are reserved word capitalized or small caps
indenting
...
Here are some examples:
Oracle PL/SQL Programming, Fourth Edition. There is older, 2nd edition - available online
SQL and PL/SQL Coding Standards
Be very strict about naming, it will be easier for you to read other people's code.
As formatting is concerned, there are tools available that can format automatically, so maybe you don't need very detailed description here.
If you are a database developer, you need to know what an EXECUTION PLAN is. If you don't then go mine coal or something.
Before developing:
first, you think what best EXECUTION PLAN will be,
second you create tables and indexes, and
third you use hints to persuade the optimizer to come out with the plan you made.
You do use hints. Forget automatic optimization, it's a marketing myth. No optimizer knows your data better than you and never will.
There are no "programmers who create queries" and "system administrators who create indexes". Programmers program, system administrators make backups (or whatever they make).
Triggers are evil.
Prefix you columns, tables and views (SELECT prs_name FROM t_person)
Make lines and indent
An hour long presentation on some Oracle fundamentals (eg parsing, SGA vs PGA). "Do this" rules may or may not apply to your situation. Give them an understanding of what the DB side does, and they at least have a basis on which to make a decision.
Plus Code reviews.
Pair-program. Any advantange it provides for agile development in general, at least doubles for SQL development.
Second choice, code reviews for all SQL.
Along with the recommendation to have queries reviewed by senior programmers, if you can get the buy-in, have code reviews which involve as many team members as possible.
I'm by no means a guru but here are my tips:
Don't use ORDER BY unless you really need an ordered list as it incurs a performance hit.
Understand the explain plan and also recognise that the plan on your development environment is often different from your production environment. Don't expect it to accurately reflect real life performance
The pros of using hints is that you get to choose your explain plan, the cons of using hints is that the optimal plan may change over time and you might be choosing a plan that is suboptimal in the long term
Make sure the developers know when to use INNER JOIN, OUTER JOIN, [NOT] IN, [NOT] EXISTS - you can put in place a lot of processes but one or two Cartesian products will bring production performance to its knees
Ensure your developers understand indexes - what they are, when they should be used, when they should be avoided
Have a DBA monitor the most executed queries and the most expensive queries and highlight these as candidates for optimisation
Peer review
Coding standards (especially code comments on particularly long/complex queries)
Unit testing
Don't write SQL if you can help it, use HQL (or JPQL is on Java EE) whenever possible
Don't use SELECT *
Pick your internet sources wisely (e.g. asktom.oracle.com)
Don't use cursors
Don't do string concatenation in SQL
Write queries such that they use indexes (fundamentally this means base WHERE predicates on the indexes that exist)
use MERGE instead of other awkward 'upsert' type logic
When working with dates, make sure you understand how they're stored in Oracle vs. how they are stored in Java, especially when it relates to TimeZone. Depending on the Calendar/Date types, this information can be stripped out, remapped to the TZ of the default locale, etc.
Most importantly: Don't use the excuse of being a developer for not knowing how to write good SQL, and how the database works. You don't have to be a DBA, but you need to invest in your own training to make yourself suitable for the task. By the same token, your company needs to invest in that as well.
I don't mean to say that these "Don'ts" always apply. It's just that, if you're talking about a developer who is not comfortable with Oracle, they need to know what they're doing before they start deciding whether those types of things are necessary and appropriate.

A beginner's guide to SQL database design [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
Improve this question
Do you know a good source to learn how to design SQL solutions?
Beyond the basic language syntax, I'm looking for something to help me understand:
What tables to build and how to link them
How to design for different scales (small client APP to a huge distributed website)
How to write effective / efficient / elegant SQL queries
I started with this book: Relational Database Design Clearly Explained (The Morgan Kaufmann Series in Data Management Systems) (Paperback) by Jan L. Harrington and found it very clear and helpful
and as you get up to speed this one was good too Database Systems: A Practical Approach to Design, Implementation and Management (International Computer Science Series) (Paperback)
I think SQL and database design are different (but complementary) skills.
I started out with this article
http://en.tekstenuitleg.net/articles/software/database-design-tutorial/intro.html
It's pretty concise compared to reading an entire book and it explains the basics of database design (normalization, types of relationships) very well.
Experience counts for a lot, but in terms of table design you can learn a lot from how ORMs like Hibernate and Grails operate to see why they do things. In addition:
Keep different types of data separate - don't store addresses in your order table, link to an address in a separate addresses table, for example.
I personally like having an integer or long surrogate key on each table (that holds data, not those that link different tables together, e,g., m:n relationships) that is the primary key.
I also like having a created and modified timestamp column.
Ensure that every column that you do "where column = val" in any query has an index. Maybe not the most perfect index in the world for the data type, but at least an index.
Set up your foreign keys. Also set up ON DELETE and ON MODIFY rules where relevant, to either cascade or set null, depending on your object structure (so you only need to delete once at the 'head' of your object tree, and all that object's sub-objects get removed automatically).
If you want to modularise your code, you might want to modularise your DB schema - e.g., this is the "customers" area, this is the "orders" area, and this is the "products" area, and use join/link tables between them, even if they're 1:n relations, and maybe duplicate the important information (i.e., duplicate the product name, code, price into your order_details table). Read up on normalisation.
Someone else will recommend exactly the opposite for some or all of the above :p - never one true way to do some things eh!
I really liked this article..
http://www.codeproject.com/Articles/359654/important-database-designing-rules-which-I-fo
Head First SQL is a great introduction.
These are questions which, in my opionion, requires different knowledge from different domains.
You just can't know in advance "which" tables to build, you have to know the problem you have to solve and design the schema accordingly;
This is a mix of database design decision and your database vendor custom capabilities (ie. you should check the documentation of your (r)dbms and eventually learn some "tips & tricks" for scaling), also the configuration of your dbms is crucial for scaling (replication, data partitioning and so on);
again, almost every rdbms comes with a particular "dialect" of the SQL language, so if you want efficient queries you have to learn that particular dialect --btw. much probably write elegant query which are also efficient is a big deal: elegance and efficiency are frequently conflicting goals--
That said, maybe you want to read some books, personally I've used this book in my datbase university course (and found a decent one, but I've not read other books in this field, so my advice is to check out for some good books in database design).
It's been a while since I read it (so, I'm not sure how much of it is still relevant), but my recollection is that Joe Celko's SQL for Smarties book provides a lot of info on writing elegant, effective, and efficient queries.