What is a good 'FizzBuzz' question for a SQL programmer? - sql

We are looking to hire a SQL programmer and need a good screening question similar to the FizzBuzz question but for SQL.
While it is certainly possible to write a FizzBuzz solution using SQL, I think the effort is misplaced. The FizzBuzz question assesses coding fundamentals such as looping, conditionals, output, and basic math. With SQL, I think something related to queries, joins, projections, and the like would be more appropriate. But, just as with FizzBuzz, it should be simple enough that 'good' SQL programmers can write a solution on paper in a couple minutes.
What is a good 'FizzBuzz' question for a SQL programmer?

We typically use something like this as a bare minimum for SQL:
Given the tables:
Customers: CustomerID, CustomerName
Orders: OrderID, CustomerID, ProductName, UnitPrice, Quantity
Calculate the total value of orders for each customer showing CustomerName and TotalPrice.
In our view, this is a pretty simple question requiring a join on two tables, grouping, and an aggregate function. We're amazed at how many people we talk to that presumably write database code in their job can't remember join syntax (and we never care which syntax they use, MSSQL style or Oracle style or something else).
What I like about this question is it lends itself to follow up questions like
How would you find all customers that ordered more than $1000 total?
How would you normalize these tables?
How would you optimize the queries?

A "FizzBuzz" is supposed to be so simple that anyone who can program at all should be able to solve it, and a good programmer should be able to solve it almost without thinking, right?
So maybe something like this:
First, take two tables, Employees and Departments, with a foreign key from Employees that shows which department each employee works for. (Typical boring example, from almost any database textbook.) Then let them write a query that involves both tables, such as "give me the names of all employees who work for the Cleaning department".
Then do exactly the same thing, but not with employees that work for departments, but with mice that are eaten by cats, or something else that is not an exact copy of the employee-department or student-course examples in the database textbook.
If they can find who works at the Cleaning department, but have no idea how to find which mice were eaten by the cat Tom, don't hire!

I would probably do something that requires an inner join, a left join and a where clause with both an AND and an OR condition. Also specify what fields you want returned. You would be looking to see if they recognize that they need a left join fromthe problem description, that they use explicit join syntax and that they use () to make the meaning of the and/or clear. You would also be looking to see if they used select * even though you specified what fields you wanted.

Stick with fizzbuzz, just change the number from 100 to 10000000 and say that the solution has to be reasonably efficient.

SQL Developer or SQL DBA ? for a developer something about cursors; the syntax is a pain and a good one would question why you need to use it. For a dba give them a cursor and ask them to fix it ;)

Related

linked table or sub queries?

I was looking up different ways to write a query, and I'm just wondering which way do you all think is better way to go with the following options:
SELECT a.salary
FROM emp a
JOIN emp b ON a.salary < b.salary
WHERE b.id = 200
or
SELECT salary
FROM emp
WHERE salary < (SELECT salary
FROM emp
WHERE id = 200)
I did some execution times with about 300 records in the table, and they come out about the same, so this is really just more about preference and accepted standards. I personally like the 2nd way better (just easier to read to me). I have a feeling the 1st is standard though. What do you all think?
I don't think there is a definite answer to this question as it's really dependent on the context of the query itself & the RDMS.
So in your case, you've done the research with 300 records. How about 3,000,000? Does it make a difference?
Personally I like Joins over Sub-Queries if I can help it, but like I said, this is really determined on a case by case basis.
SQL is a declarative language, which broadly means you tell the database what you want, and its up to the database to decide to how best get it.
For that reason, a lot is to be said for writing your query in the way that makes its goal most obvious/readable (which is why a subquery may make sense here).
However, not all RDBMSs are equal, and the ability of the query optimizer to "equate" queries which are technically the same, but which are written differently, varies from db to db.
For instance, MySQL only has a nested-loop join algorithm at its disposal, and that can be an issue when dealing with subqueries for large data sets. You'll have to try it out with different data sets, taking a look at what the optimizer is doing behind the scenes.

Strategy for avoiding a common sql development error (misleading result on join bug)

Sometimes when i'm writing moderately complex SELECT statements with a few JOINs, wrong key columns are sometimes used in the JOIN statement that still return valid-looking results.
Because the auto numbering values (especially early in development) all tend to fall in similar ranges (sub 100s or so) the SELECT sill produces some results. These results often look valid at first glance and a problem is not detected until much, much later making debugging much more difficult because familiarity with the data structures and code has staled. (Gone stale in the dev's mind.)
i just spent several hours tracking down yet another of this issue that i've run into a too many times before. i name my tables and columns carefully, write my SQL statements methodically but this is an issue i can't seem to competely avoid. It comes back and bites me for hours of productivity about twice a year on average.
My question is: Has anyone come up with a clever method for avoiding this; what i assume is probably a common SQL bug/mistake?
i have thought of trying to auto-number starting with different start values but this feels cludgy and would get ugly trying to keep such a scheme straight for data models with dozens of tables... Any better ideas?
P.S.
i am very careful and methodical in naming my tables and columns. Patient table gets PatientId column, Facility get a FacilityId etc. This issues tends to arise when there are join tables involved where the linkage takes on extra meaning such as: RelatedPatientId, ReferingPatientId, FavoriteItemId etc.
When writing long complex SELECT statements try to limit the result to one record.
For instance, assume you have this gigantic enormous awesome CMS system and you have to write internal reports because the reports that come with it are horrendous. You notice that there are about 500 tables. Your select statement joins 30 of these tables. Your result should limit your row count by using a WHERE clause.
My advice is to rather then get all this code written and generalized for all cases, break the problem up and use WHERE and limit the row count to only say a record. Check all fields, if they look ok, break it up and let your code return more rows. Only after further checking should you generalize.
It bites a lot of us who keep adding more and more joins until it seems to look ok, but only after Joe Blow the accountant runs the report does he realize that the PO for 4 million was really the telephone bill for the entire year. Somehow that join got messed up!
One option would be to use your natural keys.
More practically, Red Gate SQL Prompt picks the FK columns for me.
I also tend to build up one JOIN at a time to see how things look.
If you have a visualization or diagramming tool for your SQL statements, you can follow the joins visually, and any errors will become immediately apparent, provided you have followed a sensible naming scheme for your primary and foreign keys.
Your column names should take care of this unless you named them all "ID". Are you writing multiple select statement using the same tables? You may want to create views for the more common ones.
If you're using SQL Server, you can use GUID columns as primary keys (that's what we do). You won't have problems with collisions again.
You could use GUIDs as your primary keys, but it has its pros and cons.
This pro is actually not mentioned on that page.
I have never tried doing this myself - I use a tool on top of SQL that makes incorrect joins very unlikely, so I don't have this problem. I just thought I'd mention it as another option though!
For IDs use TableNameID, for example for table Person, use PersonID
Use db model and look at the drawing when writing queries.
This way join looks like:
... ON p.PersonID = d.PersonID
as opposed to:
... ON p.ID = d.ID
Auto-increment integer PKs are among your best friends.

Has anyone written a higher level query langage (than sql) that generates sql for common tasks, on limited schemas

Sql is the standard in query languages, however it is sometime a bit verbose. I am currently writing limited query language that will make my common queries quicker to write and with a bit less mental overhead.
If you write a query over a good database schema, essentially you will be always joining over the primary key, foreign key fields so I think it should be unnecessary to have to state them each time.
So a query could look like.
select s.name, region.description from shop s
where monthly_sales.amount > 4000 and s.staff < 10
The relations would be
shop -- many to one -- region,
shop -- one to many -- monthly_sales
The sql that would be eqivilent to would be
select distinct s.name, r.description
from shop s
join region r on shop.region_id = region.region_id
join monthly_sales ms on ms.shop_id = s.shop_id
where ms.sales.amount > 4000 and s.staff < 10
(the distinct is there as you are joining to a one to many table (monthly_sales) and you are not selecting off fields from that table)
I understand that original query above may be ambiguous for certain schemas i.e if there the two relationship routes between two of the tables. However there are ways around (most) of these especially if you limit the schema allowed. Most possible schema's are not worth considering anyway.
I was just wondering if there any attempts to do something like this?
(I have seen most orm solutions to making some queries easier)
EDIT: I actually really like sql. I have used orm solutions and looked at linq. The best I have seen so far is SQLalchemy (for python). However, as far as I have seen they do not offer what I am after.
Hibernate and LinqToSQL do exactly what you want
I think you'd be better off spending your time just writing more SQL and becoming more comfortable with it. Most developers I know have gone through just this progression, where their initial exposure to SQL inspires them to bypass it entirely by writing their own ORM or set of helper classes that auto-generates the SQL for them. Usually they continue adding to it and refining it until it's just as complex (if not more so) than SQL. The results are sometimes fairly comical - I inherited one application that had classes named "And.cs" and "Or.cs", whose main functions were to add the words " AND " and " OR ", respectively, to a string.
SQL is designed to handle a wide variety of complexity. If your application's data design is simple, then the SQL to manipulate that data will be simple as well. It doesn't make much sense to use a different sort of query language for simple things, and then use SQL for the complex things, when SQL can handle both kinds of thing well.
I believe that any (decent) ORM would be of help here..
Entity SQL is slightly higher level (in places) than Transact SQL. Other than that, HQL, etc. For object-model approaches, LINQ (IQueryable<T>) is much higher level, allowing simple navigation:
var qry = from cust in db.Customers
select cust.Orders.Sum(o => o.OrderValue);
etc
Martin Fowler plumbed a whole load of energy into this and produced the Active Record pattern. I think this is what you're looking for?
Not sure if this falls in what you are looking for but I've been generating SQL dynamically from the definition of the Data Access Objects; the idea is to reflect on the class and by default assume that its name is the table name and all properties are columns. I also have search criteria objects to build the where part. The DAOs may contain lists of other DAO classes and that directs the joins.
Since you asked for something to take care of most of the repetitive SQL, this approach does it. And when it doesn't, I just fall back on handwritten SQL or stored procedures.

SQL Table Aliases - Good or Bad? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
What are the pros and cons of using table aliases in SQL? I personally try to avoid them, as I think they make the code less readable (especially when reading through large where/and statements), but I'd be interested in hearing any counter-points to this. When is it generally a good idea to use table aliases, and do you have any preferred formats?
Table aliases are a necessary evil when dealing with highly normalized schemas. For example, and I'm not the architect on this DB so bear with me, it can take 7 joins in order to get a clean and complete record back which includes a person's name, address, phone number and company affiliation.
Rather than the somewhat standard single character aliases, I tend to favor short word aliases so the above example's SQL ends up looking like:
select person.FirstName
,person.LastName
,addr.StreetAddress
,addr.City
,addr.State
,addr.Zip
,phone.PhoneNumber
,company.CompanyName
from tblPeople person
left outer join tblAffiliations affl on affl.personID = person.personID
left outer join tblCompany company on company.companyID = affl.companyID
... etc
Well, there are some cases you must use them, like when you need to join to the same table twice in one query.
It also depends on wether you have unique column names across tables. In our legacy database we have 3-letter prefixes for all columns, stemming from an abbreviated form from the table, simply because one ancient database system we were once compatible with didn't support table aliases all that well.
If you have column names that occur in more than one table, specifying the table name as part of the column reference is a must, and thus a table alias will allow for a shorter syntax.
Am I the only person here who really hates them?
Generally, I don't use them unless I have to. I just really hate having to read something like
select a.id, a.region, a.firstname, a.blah, b.yadda, b.huminahumina, c.crap
from table toys as a
inner join prices as b on a.blah = b.yadda
inner join customers as c on c.crap = something else
etc
When I read SQL, I like to know exactly what I'm selecting when I read it; aliases actually confuse me more because I've got to slog through lines of columns before I actually get to the table name, which generally represents information about the data that the alias doesn't. Perhaps it's okay if you made the aliases, but I commonly read questions on StackOverflow with code that seems to use aliases for no good reason. (Additionally, sometimes, someone will create an alias in a statement and just not use it. Why?)
I think that table aliases are used so much because a lot of people are averse to typing. I don't think that's a good excuse, though. That excuse is the reason we end up with terrible variable naming, terrible function acronyms, bad code...I would take the time to type out the full name. I'm a quick typer, though, so maybe that has something to do with it. (Maybe in the future, when I've got carpal tunnel, I'll reconsider my opinion on aliases. :P ) I especially hate running across table aliases in PHP code, where I believe there's absolutely no reason to have to do that - you've only got to type it once!
I always use column qualifiers in my statements, but I'm not averse to typing a lot, so I will gladly type the full name multiple times. (Granted, I do abuse MySQL's tab completion.) Unless it's a situation where I have to use an alias (like some described in other answers), I find the extra layer of abstraction cumbersome and unnecessary.
Edit: (Over a year later) I'm dealing with some stored procedures that use aliases (I did not write them and I'm new to this project), and they're kind of painful. I realize that the reason I don't like aliases is because of how they're defined. You know how it's generally good practice to declare variables at the top of your scope? (And usually at the beginning of a line?) Aliases in SQL don't follow this convention, which makes me grind my teeth. Thus, I have to search the entire code for a single alias to find out where it is (and what's frustrating is, I have to read through the logic before I find the alias declaration). If it weren't for that, I honestly might like the system better.
If I ever write a stored procedure that someone else will have to deal with, I'm putting my alias definitions in a comment block at the beginning of the file, as a reference. I honestly can't understand how you guys don't go crazy without it.
Good
As it has been mentioned multiple times before, it is a good practice to prefix all column names to easily see which column belongs to which table - and aliases are shorter than full table names so the query is easier to read and thus understand. If you use a good aliasing scheme of course.
And if you create or read the code of an application, which uses externally stored or dynamically generated table names, then without aliases it is really hard to tell at the first glance what all those "%s"es or other placeholders stand for. It is not an extreme case, for example many web apps allow to customize the table name prefix at installation time.
Microsoft SQL's query optimiser benefits from using either fully qualified names or aliases.
Personally I prefer aliases, and unless I have a lot of tables they tend to be single letter ones.
--seems pretty readable to me ;-)
select a.Text
from Question q
inner join Answer a
on a.QuestionId = q.QuestionId
There's also a practical limit on how long a Sql string can be executed - aliases make this limit easier to avoid.
If I write a query myself (by typing into the editor and not using a designer) I always use aliases for the table name just so I only have to type the full table name once.I really hate reading queries generated by a designer with the full table name as a prefix to every column name.
I suppose the only thing that really speaks against them is excessive abstraction. If you will have a good idea what the alias refers to (good naming helps; 'a', 'b', 'c' can be quite problematic especially when you're reading the statement months or years later), I see nothing wrong with aliasing.
As others have said, joins require them if you're using the same table (or view) multiple times, but even outside that situation, an alias can serve to clarify a data source's purpose in a particular context. In the alias's name, try to answer why you are accessing particular data, not what the data is.
I LOVE aliases!!!! I have done some tests using them vs. not and have seen some processing gains. My guess is the processing gains would be higher when you're dealing with larger datasets and complex nested queries than without. If I'm able to test this, I'll let you know.
You need them if you're going to join a table to itself, or if you use the column again in a subquery...
Aliases are great if you consider that my organization has table names like:
SchemaName.DataPointName_SubPoint_Sub-SubPoint_Sub-Sub-SubPoint...
My team uses a pretty standard set of abbreviations, so the guesswork is minimized. We'll have say ProgramInformationDataPoint shortened to pidp, and submissions to just sub.
The good thing is that once you get going in this manner and people agree with it, it makes those HAYUGE files just a little smaller and easier to manage. At least for me, fewer characters to convey the same info seems to go a little easier on my brain.
I like long explicit table names (it's not uncommon to be more than 100 characters) because I use many tables and if the names aren't explicit, I might get confused as to what each table stores.
So when I write a query, I tend to use shorter aliases that make sense within the scope of the query and that makes the code much more readable.
I always use aliases in my queries and it is part of the code guidebook in my company. First of all you need aliases or table names when there are columns with identical names in the joining tables. In my opinion the aliases improve readability in complex queries and allow me to see quickly the location of each columns. We even use aliases with single table queries, because experience has shown that single table queries donĀ“t stay single table for long.
IMHO, it doesn't really matter with short table names that make sense, I have on occasion worked on databases where the table name could be something like VWRECOFLY or some other random string (dictated by company policy) that really represents users, so in that case I find aliases really help to make the code FAR more readable. (users.username makes a lot more sence then VWRECOFLY.username)
I always use aliases, since to get proper performance on MSSQL you need to prefix with schema at all times. So you'll see a lot of
Select
Person.Name
From
dbo.Person As Person
I always use aliases when writing queries. Generally I try and abbreviate the table name to 1 or 2 representative letters. So Users becomes u and debtor_transactions becomes dt etc...
It saves on typing and still carries some meaning.
The shorter names makes it more readable to me as well.
If you do not use an alias, it's a bug in your code just waiting to happen.
SELECT Description -- actually in a
FROM
table_a a,
table_b b
WHERE
a.ID = b.ID
What happens when you do a little thing like add a column called Description to Table_B. That's right, you'll get an error. Adding a column doesn't need to break anything. I never see writing good code, bug free code, as a necessary evil.
Aliases are required when joining tables with columns that have identical names.

SQL Query Help - Scoring Multiple Choice Tests

Say I have a Student table, it's got an int ID. I have a fixed set of 10 multiple choice questions with 5 possible answers. I have a normalized answer table that has the question id, the Student.answer (1-5) and the Student.ID
I'm trying to write a single query that will return all scores over a certain pecentage. To this end I wrote a simple UDF that accepts the Student.answers and the correct answer, so it has 20 parameters.
I'm starting to wonder if it's better to denormalize the answer table, bring it into my applcation and let my application do the scoring.
Anyone ever tackle something like this and have insight?
If I understand your schema and question correctly, how about something like this:
select student_name, score
from students
join (select student_answers.student_id, count(*) as score
from student_answers, answer_key
group by student_id
where student_answers.question_id = answer_key.question_id
and student_answers.answer = answer_key.answer)
as student_scores on students.student_id = student_scores.student_id
where score >= 7
order by score, student_name
That should select the students with a score of 7 or more, for example. Just adjust the where clause for your purposes.
I would probably leave it up to your application to perform the scoring. Check out Maybe Normalizing Isn't Normal by Jeff Atwood.
The architecture you are talking about could become very cumbersome in the long run, and if you need to change the questions it means more changes to the UDF you are using.
I would think you could probably do your analysis in code without necessarily de-normalizing your database. De-normalization could also lend to inflexibility, or at least added expense to update, down the road.
No way, you definitely want to keep it normalized. It's not even that hard of a query.
Basically, you want to left join the students correct answers with the total answers for that question, and do a count. This will give you the percent correct. Do that for each student, and put the minimum percent correct in a where clause.
Denormalization is generally considered a last resort. The problem seems very similar to survey applications, which are very common. Without seeing your data model, it's difficult to propose a solution, but I will say that it is definitely possible. I'm wondering why you need 20 parameters to that function?
A relational set-based solution will be simpler and faster in most cases.
This query should be quite easy... assuming you have the correct answer stored in the question table. You do have the correct answer stored in the question table, right?