How to think in SQL?

How to think in SQL? - sql

How do I stop thinking every query in terms of cursors, procedures and functions and start using SQL as it should be? Do we make the transition to thinking in SQL just by practise or is there any magic to learning the set based query language? What did you do to make the transition?

A few examples of what should come to your mind first if you're real SQL geek:
Bible concordance is a FULLTEXT index to the Bible
Luca Pacioli's Summa de arithmetica which describes double-entry bookkeeping is in fact a normalized database schema
When Xerxes I counted his army by walling an area that 10,000 of his men occupied and then marching the other men through this enclosure, he used HASH AGGREGATE method.
The House That Jack Built should be rewritten using a self-join.
The Twelve Days of Christmas should be rewritten using a self-join and a ROWNUM
There Was An Old Woman Who Swallowed a Fly should be rewritten using CTE's
If the European Union were called European Union All, we would see 27 spellings for the word euro on a Euro banknote, instead of 2.
And finally you can read a lame article in my blog on how I stopped worrying and learned to love SQL (I almost forgot I wrote it):
Click
And one more article just on the subject:
Double-thinking in SQL

The key thing is you're manipulating SETS & elements of sets; and relating different sets (and corresponding elements) together. That's really the heart of it, imho. That's why every table should have a primary key; why you see set operators in the language; and why set operators like UNION won't (by defualt) return duplicate rows.
Of course in practice, the rules of sets are bent or broken but it's not that hard to see when this is necessary (otherwise, SQL would be TOO limited). Imho, just crack open your discrete math book and reacquaint yourself with some set exercises.

Best advice I can give you is that every time you think about processing something row-by-row, that you stop and ask yourself if there is a set-based way to do this.

Joe Celko's Thinking in Sets (book)
Perfectly intelligent programmers
often struggle when forced to work
with SQL. Why? Joe Celko believes the
problem lies with their procedural
programming mindset, which keeps them
from taking full advantage of the
power of declarative languages. The
result is overly complex and
inefficient code, not to mention lost
productivity.
This book will change the way you
think about the problems you solve
with SQL programs.. Focusing on three
key table-based techniques, Celko
reveals their power through detailed
examples and clear explanations. As
you master these techniques, you’ll
find you are able to conceptualize
problems as rooted in sets and
solvable through declarative
programming. Before long, you’ll be
coding more quickly, writing more
efficient code, and applying the full
power of SQL.

When people ask me about joins I send them here it has a great visual representation on what they are!

The way that I learned was by doing a lot of queries, and working at a job that required you to think in terms of result sets.
From your question, it seems like you've been writing lots of front-end code that uses sequential/procedural/iterative data manipulation. If you don't get on any projects that require you to use result set skills, I personally wouldn't worry about it.
One thing you might want to try is by trying to write analytical queries, e.g., generating simplistic reports on your data. In those cases you are trying to summarize large amounts of data by cordoning them off into sets.
Another good way would be to read a book on the theoretical/mathematical foundations to RDBMSes. Those deal strictly with set theory and how parts of the SQL query syntax relate directly with the math behind it. Of course, this requires you to like math. :)

I found that the Art Of SQL was a useful kick in the head for getting into the right mindset.
Part of this, however, comes down to style.
Obviously, you need to start thinking in result sets and not just procedurally.
However, once you've start that, you will often find decisions have to be made.
Do you write the incredibly complex update statement that may be difficult to understand by anyone but yourself, and difficult to maintain, or do you write a less efficient, but easier to manage procedure?
I would HIGHLY suggest that you remember that SQL statements can have comments in them to clarifiy what they are doing, not just stored procedures.
link: The Art Of SQL

One exercise you might want to try is this:
Take some of your existing reporting code from your application layer, preferably something that produces a single, tabular data set. Starting with the most basic elements, port it over to an SQL View.
Take all of the columns pulled from a single table and write the SQL statement to select that data. Then join on one table at a time and start figuring out the appropriate conditions and logic for your output.
You might come up against some particular task that at first seems impossible in SQL, but depending on the implementation you are programming against, there is almost always a way to get the result you're looking for. Check the documentation for your SQL implementation, or try Google.
This exercise has the benefit of giving you an original report to test against, so you know if you're getting the output you expect.
A few things to watch out for:
Recursion and graphs are fairly advanced techniques; you might want to start with something easier. (Joe Celko has a good book on the topic, if you're interested.)
There's often a big difference between a BIT and a C-style bool. At the very least, you may have to explicitly cast your output from INT to BIT.
OUTER JOINs are useful when a portion of the data might be empty, but try not to abuse them.

I think it takes a while to adjust (it was long ago for me, so I don't remember too well). But perhaps the key point is that SQL is declarative - i.e. you specify what you want done, not precisely how it should be done procedurally. So for a simple example:
"Get me the names and salaries of employees in departments located in London"
The relevant SQL is almost natural:
select name, salary
from employees
join departments on departments.deptno = employees.deptno
where departments.location = 'London';
We have "told" SQL how to join departments to employees, but only declaratively (NATURAL JOIN removes the need to do that, but is dangerous so not used in practice). We haven't defined procedurally how it should be done (e.g. "for each department, find all employees...") SQL is free to choose the optimal method to perform the query.

Thinking of rows makes sense when you use SQL to dump a table to your file system and then do whatever has to be done in your favorite programming language.
Not too much leverage of SQL; waste of disk, memory, cpu and human resources.
Think of SQL as of English (or whatever human language you prefer).
Show me all customers who ride bulls and get drunk every day but never visited Indonesia with their mother-in-law whose phone number is the same as my friend Doug's except for the area code.
You can do it (and much more) in one SQL statement, just learn how to. It's very lucrative.

Related

What is the relational database equivalent of Factorial and the Fibonacci function?

When learning a new programming language there are always a couple of traditional problems that are good to get yourself moving. For example, Hello world and Fibonacci will show how to read input, print output and compute functions (the bread and butter that will solve basically everything) and while they are really simple they are nontrivial enough to be worth their time (and there is always some fun to be had by calculating the factorial of a ridiculously large number in a language with bignums)
So now I'm trying to get to grips with some SQL system and all the textbook examples I can think of involve mind-numbingly boring tables like "Student" or "Employee". What nice alternate datasets could I use instead? I am looking for something that (in order of importance) ...
The data can be generated by a straightforward algorithm.
I don't want to have to enter things by hand.
I want to be able to easily increase the size of my tables to stress efficiency, etc
Can be used to showcase as much stuff as possible. Selects, Joins, Indexing... You name it.
Can be used to get back some interesting results.
I can live with "boring" data manipulation if the data is real and has an use by itself but I'd rather have something more interesting if I am creating the dataset from scratch.
In the worst case, I at least presume there should be some sort of benchmark dataset out there that would at least fit the first two criteria and I would love to hear about that too.

The benchmark database in the Microsoft world is Northwind. One similar open source (EPL) one is Eclipse's Classic Models database.
You can't autogenerate either as far as I know.
However, Northwind "imports and exports specialty foods from around the world", while Classic Models sells "scale models of classic cars". Both are pretty interesting. :)

SQL is a query language, not a procedural language, so unless you will be playing with PL/SQL or something similar, your examples will be manipulating data.
So here is what was fun for me -- data mining! Go to:
http://usa.ipums.org/usa/
And download their micro-data (you will need to make an account, but its free).
You'll need to write a little script to inject the fixed width file into your db, which in itself should be fun. And you will need to write a little script to auto create the fields (since there are many) based on parsing their meta-file. That's fun, too.
Then, you can start asking questions. Suppose the questions are about house prices:
Say you want to look at the evolution of house price values by those with incomes in the top 10% of the population over the last 40 years. Then restrict to if they are living in california. See if there is a correlation between income and the proportion of mortgage payments as a percentage of income. Then group this by geographic area. Then see if there is a correlation between those areas with the highest mortgage burden and the percentage of units occupied by renters. Your db will have some built-in statistical functions, but you can always program your own as well -- so correl might be the equivalent of fibonnacci. Then write a little script to do the same thing in R, importing data from your db, manipulating it, and storing the result.
The best way to learn about DBs is to use them for some other purpose.
Once you are done playing with iPUMS, take a look at GEO data, with (depending on your database) something like PostGis -- the only difference is that iPUMS gives you resolution in terms of tracts, whereas GIS data has latitude/longitude coordinates. Then you can plot a heat map of mortgage burdens for the U.S., and evolve this heat map over different time scales.

Perhaps you can do something with chemistry. Input the 118 elements, or extract them for an online source. Use basic rules to combine them into molecules, which you can store in the database. Combine molecules into bigger molecules and perform more complex queries upon them.

You will have a hard time finding database agnostic tutorials. The main reason for that is that the SQL-92 standard on which most examples are based on is plain old boring. There are updated standards, but most database agnostic tutorials will dumb-it-down to the lowest common denomiator: SQL-92.
If you want to learn about databases as a software engineer, I would definitely recommend starting with Microsoft SQL Server. There are many reasons for that, some are facts, some are opinions. The primary reason though is that it's a lot easier to get a lot further with SQL Server.
As for sample data, Northwind has been replaced by AdventureWorks. You can get the latest versions from codeplex. This is a much more realistic database and allows demonstrating way more than basic joins, filtering and roll-ups. The great thing too, is that it is actually maintained for each release of SQL Server and updated to showcase some of the new features of the database.
Now, for your goal #1, well, I would consider the scaling out an exercise. After you go through the basic and boring stuff, you should gradually be able to perform efficient large-scale data manipulation and while not really generating data, at least copy/paste/modify your SQL data to take it to the size you think.
Keep in mind though that benchmarking databases is not trivial. The performance and efficiency of a database depends on many aspect of your application. How it is used is just as important as how it is setup.
Good luck and do let us know if you find a viable solution outside this forum.

Implement your genealogical tree within a single table and print it. In itself is not a very general problem, but the approach certainly is, and it should prove reasonably challenging.

Geographic data can showcase a lot of SQL capabilities while being somewhat complicated (but not too complicated). It's also readily available from many sources online - international organizations, etc.
You could create a database with countries, cities, zip codes, etc. Mark capitals of countries (remember that some countries have more than one capital city...). Include GIS data if you want to get really fancy. Also, consider how you might model different address information. Now what if the address information had to support international addresses? You can do the same with phone numbers as well. Once you get the hang of things you could even integrate with Google Maps or something similar.
You'd likely have to do the database design and import work yourself, but really that's a pretty huge part of working with databases.

Eclipse's Classic Model database is the best open source database equivalent of Factorial and the Fibonacci function .And Microsoft's Northwind is the another powerful alternative that you can use .

Are there any open-source query-language agnostic relational storage engines?

I've been studying proper relational algebra, from Christopher Date's book Database in Depth: Relational Theory for Practitioners. Throughout the book he uses the language he and Hugh Darwen came up with in order to convey the theory — Tutorial D. In general I think Tutorial D is a very workable query language, much more flexible than SQL and so I (just for fun) was keen to take a stab at writing a (poor performing, undoubtedly) little RDBMS based on Tutorial D, rather than SQL.
Realizing this is a mammoth of a task even just to make something basic, I wonder if there are existing storage systems available that don't represent tables in the SQL sense, but represent relations in the relational sense and don't assume any particular query language is used to access the data, but rather just provide low-level functions like product, join, intersect, union, project etc (at the C-level, not at a query language level).
Am I making sense? :) Basically I'd like to take something like this and stick a Tutorial D (or similar) query interface in front of it.
It's really easy to do everything in memory, but representing the data structures on disk in a fashion that is even mildly efficient is pretty tricky and probably over my head without some serious research.

General SQL-based RDBMS that use SQL as an interface for structured input between the user and the database engine use what is called a Query Optimizer which takes the query expression and generates a set of Execution Plans.
The most optimal execution plan is then executed on the database; that's what generates result sets.
So, if you took an open source RDBMS implementation and wanted to modify it to accept a different query language, all you would have to do would be to translate the query language of your choice into an execution plan.
That's not to say that what you're trying to do is easy. Just that it should be possible, without having to write your own RDBMS. You would need to write a lexer and interpreter for your query language and then figure out how to transfer your interpreted query expression to the database engine's optimizer so that it can generate the execution plans, and execute the most efficient of them.
Take a look a SQLite as a compact open source relational database engine.

Dave Voorhis' Rel already does what you seem to want to build.
http://dbappbuilder.sourceforge.net/Rel.php
Unless of course it is your express purpose to try and build for yourself ...

Note that a front end for Tutorial D would not be query-language agnostic ;)
My vote also goes for Rel.
Hugh Darwen maintains a list of projects related to TTM (the spec for a D language of which Tutorial D is an implementation), I'm sure he would love to hear of your efforts if they come to anything.

For really complex reports, do people sometimes code in their language rather than in sql?

I have some pretty complex reports to write. Some of them... I'm not sure how I could write an sql query for just one of the values, let alone stuff them in a single query.
Is it common to just pull a crap load of data and figure it all via code instead? Or should I try and find a way to make all the reports rely on sql?
I have a very rich domain model. In fact, parts of code can be expanded on to calculate exactly what they want. The actual logic is not all that difficult to write - and it's nicer to work my domain model than with SQL. With SQL, writing the business logic, refactoring it, testing it and putting it version control is a royal pain because it's separate from your actual code.
For example, one the statistics they want is the % of how much they improved, especially in relation to other people in the same class, the same school, and compared to other schools. This requires some pretty detailed analysis of how they performed in the past to their latest information, as well as doing a calculation for the groups you are comparing against as a whole. I can't even imagine what the sql query would even look like.
The thing is, this % improvement is not a column in the database - it involves a big calculation in of itself by analyzing all the live data in real-time. There is no way to cache this data in a column as doing this calculation for every row it's needed every time the student does something is CRAZY.
I'm a little afraid about pulling out hundreds upon hundreds of records to get these numbers though. I may have to pull out that many just to figure out 1 value for 1 user... and if they want a report for all the users on a single screen, it's going to basically take analyzing the entire database. And that's just 1 column of values of many columns that they want on the report!
Basically, the report they want is a massive performance hog no matter what method I choose to write it.
Anyway, I'd like to ask you what kind of solutions you've used to these kind of a problems.

Sometimes a report can be generated by a single query. Sometimes some procedural code has to be written. And sometimes, even though a single query CAN be used, it's much better/faster/clearer to write a bit of procedural code.
Case in point - another developer at work wrote a report that used a single query. That query was amazing - turned a table sideways, did some amazing summation stuff - and may well have piped the output through hyperspace - truly a work of art. I couldn't have even conceived of doing something like that and learned a lot just from readying through it. It's only problem was that it took 45 minutes to run and brought the system to its knees in the process. I loved that query...but in the end...I admit it - I killed it. ((sob!)) I dismembered it with a chainsaw while humming "Highway To Hell"! I...I wrote a little procedural code to cover my tracks and...nobody noticed. I'd like to say I was sorry, but...in the end the job ran in 30 seconds. Oh, sure, it's easy enough to say "But performance matters, y'know"...but...I loved that query... ((sniffle...)) Anybody seen my chainsaw..? >;->
The point of the above is "Make Things As Simple As You Can, But No Simpler". If you find yourself with a query that covers three pages (I loved that query, but...) maybe it's trying to tell you something. A much simpler query and some procedural code may take up about the same space, page-wise, but could possibly be much easier to understand and maintain.
Share and enjoy.

Sounds like a challenging task you have ahead of you. I don't know all the details, but I think I would go at it from several directions:
Prioritize: You should try to negotiate with the "customer" and prioritize functionality. Chances are not everything is equally useful for them.
Manage expectations: If they have unrealistic expectations then tell them so in a nice way.
IMHO SQL is good in many respects, but it's not a brilliant programming language. So I'd rather just do calculations in the application rather than in the database.
I think I'd go for some delay in the system .. perhaps by caching calculated results for some minutes before recalculating. This is with a mind towards performance.

The short answer: for analysing large quantities of data, a SQL database is probably the best tool around.
However, that does not mean you should analyse this straight off your production database. I suggest you look into Datawarehousing.

For a one-off report, I'll write the code to produce it in whatever I can best reason about it in.
For a report that'll be generated more than once, I'll check on who is going to be producing it the next time. I'll still write the code in whatever I can best reason about it in, but I might add something to make it more attractive to use to that other person.

People usually use a third party report writing system rather than writing SQL. As an application developer, if you're spending a lot of time writing complex reports, I would severely question your manager's actions in NOT buying an off-the-shelf solution and letting less-skilled people build their own reports using some GUI.

How do you think while formulating Sql Queries. Is it an experience or a concept?

I have been working on sql server and front end coding and have usually faced problem formulating queries.
I do understand most of the concepts of sql that are needed in formulating queries but whenever some new functionality comes into the picture that can be dont using sql query, i do usually fails resolving them.
I am very comfortable with select queries using joins and all such things but when it comes to DML operation i usually fails
For every query that i never done before I usually finds uncomfortable with that while creating them. Whenever I goes for an interview I usually faces this problem.
Is it their some concept behind approaching on formulating sql queries.
Eg.
I need to create an sql query such that
A table contain single column having duplicate record. I need to remove duplicate records.
I know i can find the solution to this query very easily on Googling, but I want to know how everyone comes to the desired result.
Is it something like Practice Makes Man Perfect i.e. once you did it, next time you will be able to formulate or their is some logic or concept behind.
I could have get my answer of solving above problem simply by posting it on stackoverflow and i would have been with an answer within 5 to 10 minutes but I want to know the reason. How do you work on any new kind of query. Is it a major contribution of experience or some an implementation of concepts.
Whenever I learns some new thing in coding section I tries to utilize it wherever I can use it. But here scenario seems to be changed because might be i am lagging in some concepts.
EDIT
How could I test my knowledge and
concepts in Sql and related sql
queries ?

Typically, the first time you need to open a child proof bottle of pills, you have a hard time, but after that you are prepared for what it might/will entail.
So it is with programming (me thinks).
You find problems, research best practices, and beat your head against a couple of rocks, but in the process you will come to have a handy set of tools.
Also, reading what others tried/did, is a good way to avoid major obsticles.
All in all, with a lot of practice/coding, you will see patterns quicker, and learn to notice where to make use of what tool.

I have a somewhat methodical method of constructing queries in general, and it is something I use elsewhere with any problem solving I need to do.
The first step is ALWAYS listing out any bits of information I have in a request. Information is essentially anything that tells me something about something.
A table contain single column having
duplicate record. I need to remove
duplicate
I have a table (I'll call it table1)
I have a
column on table table1 (I'll call it col1)
I have
duplicates in col1 on table table1
I need to remove
duplicates.
The next step of my query construction is identifying the action I'll take from the information I have.
I'll look for certain keywords (e.g. remove, create, edit, show, etc...) along with the standard insert, update, delete to determine the action.
In the example this would be DELETE because of remove.
The next step is isolation.
Asnwer the question "the action determined above should only be valid for ______..?" This part is almost always the most difficult part of constructing any query because it's usually abstract.
In the above example you're listing "duplicate records" as a piece of information, but that's really an abstract concept of something (anything where a specific value is not unique in usage).
Isolation is also where I test my action using a SELECT statement.
Every new query I run gets thrown through a select first!
The next step is execution, or essentially the "how do I get this done" part of a request.
A lot of times you'll figure the how out during the isolation step, but in some instances (yours included) how you isolate something, and how you fix it is not the same thing.
Showing duplicated values is different than removing a specific duplicate.
The last step is implementation. This is just where I take everything and make the query...
Summing it all up... for me to construct a query I'll pick out all information that I have in the request. Using the information I'll figure out what I need to do (the action), and what I need to do it on (isolation). Once I know what I need to do with what I figure out the execution.
Every single time I'm starting a new "query" I'll run it through these general steps to get an idea for what I'm going to do at an abstract level.
For specific implementations of an actual request you'll have to have some knowledge (or access to google) to go further than this.
Kris

I think in the same way I cook dinner. I have some ingredients (tables, columns etc.), some cooking methods (SELECT, UPDATE, INSERT, GROUP BY etc.) then I put them together in the way I know how.
Sometimes I will do something weird and find it tastes horrible, or that it is amazing.
Occasionally I will pick up new recipes from the internet or friends, then use parts of these in my own.
I also save my recipes in handy repositories, broken down into reusable chunks.

On the "Delete a duplicate" example, I'd come to the result by googling it. This scenario is so rare if the DB is designed properly that I wouldn't bother keeping this information in my head. Why bother, when there is a good resource is available for me to look it up when I need it?
For other queries, it really is practice makes perfect.
Over time, you get to remember frequently used patterns just because they ARE frequently used. Rare cases should be kept in a reference material. I've simply got too much other stuff to remember.

Find a good documentation to your software. I am using Mysql a lot and Mysql has excellent documentation site with decent search function so you get many answers just by reading docs. If you do NOT get your answer at least you are learning something.
Than I set up an example database (or use the one I am working on) and gradually build my SQL. I tend to separate the problem into small pieces and solve it step by step - this is very successful if you are building queries including many JOINS - it is best to start with some particular case and "polute" your SQL with many conditions like WHEN id = "123" which you are taking out as you are working towards your solution.
The best and fastest way to learn good SQL is to work with someone else, preferably someone who knows more than you, but it is not necessarry condition. It can be replaced by studying mature code written by others.

Your example is a test of how well you understand the DISTINCT keyword and the GROUP BY clause, which are SQL's ways of dealing with duplicate data.

Examples and experience. You look at other peoples examples and you create your own code and once it groks, you don't need to think about it again.

I would have a look at the Mere Mortals book - I think it's the one by Hernandez. I remember that when I first started seriously with SQL Server 6.5, moving from manual ISAM databases and Access database systems using VB4, that it was difficult to understand the syntax, the joins and the declarative style. And the SQL queries, while powerful, were very intimidating to understand - because typically, I was looking at generated code in Microsoft Access.
However, once I had developed a relatively systematic approach to building queries in a consistent and straightforward fashion, my skills and confidence quickly moved forward.

From seeing your responses you have two options.
Have a copy of the specification for whatever your working on (SQL spec and the documentation for the SQL implementation (SQLite, SQL Server etc..)
Use Google, SO, Books, etc.. as a resource to find answers.
You can't formulate an answer to a problem without doing one of the above. The first option is to become well versed into the capabilities of whatever you are working on.
The second option allows you to find answers that you may not even fully know how to ask. You example is fairly simplistic, so if you read the spec/implementation documentaion you would know the answer right away. But there are times, where even if you read the spec/documentation you don't know the answer. You only know that it IS possible, just not how to do it.
Remember that as far as jobs and supervisors go, being able to resolve a problem is important, but the faster you can do it the better which can often be done with option 2.

Fastest way to become a MySQL expert?

I have been using MySQL for years, mainly on smaller projects until the last year or so. I'm not sure if it's the nature of the language or my lack of real tutorials that gives me the feeling of being unsure if what I'm writing is the proper way for optimization purposes and scaling purposes.
While self-taught in PHP I'm very sure of myself and the code I write, easily can compare it to others and so on.
With MySQL, I'm not sure whether (and in what cases) an INNER JOIN or LEFT JOIN should be used, nor am I aware of the large amount of functionality that it has. While I've written code for databases that handled tens of millions of records, I don't know if it's optimum. I often find that a small tweak will make a query take less than 1/10 of the original time... but how do I know that my current query isn't also slow?
I would like to become completely confident in this field in the ability to optimize databases and be scalable. Use is not a problem -- I use it on a daily basis in a number of different ways.
So, the question is, what's the path? Reading a book? Website/tutorials? Recommendations?

EXPLAIN is your friend for one. If you learn to use this tool, you should be able to optimize your queries very effectively.
Scan the the MySQL manual and read Paul DuBois' MySQL book.
Use EXPLAIN SELECT, SHOW VARIABLES, SHOW STATUS and SHOW PROCESSLIST.
Learn how the query optimizer works.
Optimize your table formats.
Maintain your tables (myisamchk, CHECK TABLE, OPTIMIZE TABLE).
Use MySQL extensions to get things done faster.
Write a MySQL UDF function if you notice that you would need some
function in many places.
Don't use GRANT on table level or column level if you don't really need
it.
http://dev.mysql.com/tech-resources/presentations/presentation-oscon2000-20000719/index.html

The only way to become an expert in something is experience and that usually takes time. And a good mentor(s) that are better than you to teach you what you are missing. The problem is you don't know what you don't know.

Research and experience - if you don't have the projects to warrant the research, make them. Make three tables with related data and make up scenarios.
E.g.
Make a table of movies their data
make a table of user
make a table of ratings for users
spend time learning how joins work, how to get movies of a particular rating range in one query, how to search the movies table ( like, regex) - as mentioned, use explain to see how different things affect speed. Make a day of it; I guarantee your
handle on it will be greatly increased.
If you're still struggling for case-scenarios, start looking here on SO for questions and try out those scenarios yourself.

I don't know if MIT open courseware has anything about databases... Well whaddya know? They do: http://ocw.mit.edu/OcwWeb/Electrical-Engineering-and-Computer-Science/6-830Fall-2005/CourseHome/
I would recommend that as one source based only on MITs reputation. If you can take a formal course from a university you may find that helpful. Also a good understanding of the fundamental discrete mathematics/logic certainly would do no harm.

As others have said, time and practice is the only real approach.
More practically, I found that EXPLAIN worked wonders for me personally. Learning to read the output of that was probably the biggest single leap I made in being able to write efficient queries.
The second thing I found really helpful was SQL Tuning by Dan Tow, which describes a fairly formal methodology for extracting performance. It's a bit involved, but works well in lots of situations. And if nothing else, it will give you a much better understanding of the way joins are processed.

Start with a class like this one: https://www.udemy.com/sql-mysql-databases/
Then use what you've learned to create and manage a number of SQL databases and run queries. Getting to the expert level is really about practice. But of course you need to learn the pieces before you can practice.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas