SQL Query Help - Scoring Multiple Choice Tests

Say I have a Student table with an int ID. I have a fixed set of 10 multiple-choice questions, each with 5 possible answers. I have a normalized answer table that has the question ID, the student's answer (1-5), and the Student.ID.
I'm trying to write a single query that will return all scores over a certain percentage. To this end I wrote a simple UDF that accepts the student's 10 answers and the 10 correct answers, so it has 20 parameters.
I'm starting to wonder if it's better to denormalize the answer table, bring it into my application, and let my application do the scoring.
Has anyone ever tackled something like this, and do you have any insight?

If I understand your schema and question correctly, how about something like this:
select students.student_name, student_scores.score
from students
join (select student_answers.student_id, count(*) as score
      from student_answers
      join answer_key
        on student_answers.question_id = answer_key.question_id
       and student_answers.answer = answer_key.answer
      group by student_answers.student_id)
  as student_scores on students.student_id = student_scores.student_id
where student_scores.score >= 7
order by student_scores.score, students.student_name
For example, that should select the students with a score of 7 or more (70% on a 10-question test). Just adjust the where clause for your purposes.
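To check the corrected query, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The table and column names follow the query above; the answer key and the two students (Alice, Bob) are made-up sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (student_id INTEGER PRIMARY KEY, student_name TEXT);
    CREATE TABLE answer_key (question_id INTEGER PRIMARY KEY, answer INTEGER);
    CREATE TABLE student_answers (student_id INTEGER, question_id INTEGER, answer INTEGER);
""")

# Hypothetical answer key: 10 questions; the correct answer is (question_id % 5) + 1.
conn.executemany("INSERT INTO answer_key VALUES (?, ?)",
                 [(q, q % 5 + 1) for q in range(1, 11)])
conn.executemany("INSERT INTO students VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])

# Alice answers all 10 correctly; Bob gets only the first 5 right.
for q in range(1, 11):
    correct = q % 5 + 1
    conn.execute("INSERT INTO student_answers VALUES (1, ?, ?)", (q, correct))
    bob = correct if q <= 5 else correct % 5 + 1  # always a different (wrong) choice
    conn.execute("INSERT INTO student_answers VALUES (2, ?, ?)", (q, bob))

PASSING_SCORE = 7  # e.g. 70% on a 10-question test

rows = conn.execute("""
    SELECT students.student_name, student_scores.score
    FROM students
    JOIN (SELECT student_answers.student_id, COUNT(*) AS score
          FROM student_answers
          JOIN answer_key
            ON student_answers.question_id = answer_key.question_id
           AND student_answers.answer = answer_key.answer
          GROUP BY student_answers.student_id) AS student_scores
      ON students.student_id = student_scores.student_id
    WHERE student_scores.score >= ?
    ORDER BY student_scores.score, students.student_name
""", (PASSING_SCORE,)).fetchall()

print(rows)  # [('Alice', 10)]
```

Note that WHERE has to come before GROUP BY inside the derived table; the original ordering is a syntax error.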

I would probably leave it up to your application to perform the scoring. Check out Maybe Normalizing Isn't Normal by Jeff Atwood.

The architecture you are talking about could become very cumbersome in the long run, and if you need to change the questions, you will have to change the UDF you are using as well.
I would think you could probably do your analysis in code without necessarily denormalizing your database. Denormalization could also lead to inflexibility, or at least added expense to update, down the road.

No way, you definitely want to keep it normalized. It's not even that hard of a query.
Basically, you want to left join each student's answers to the correct answers for each question and count the matches; dividing by the total number of answers gives the percent correct. Do that for each student, and put the minimum percent correct in a where clause (or a having clause, since it filters on an aggregate).
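A sketch of that approach, assuming the same hypothetical student_answers and answer_key tables (run through Python's sqlite3 with invented rows). The LEFT JOIN keeps every answer; COUNT(ak.question_id) counts only the matched (correct) ones, COUNT(*) counts all answers given, and HAVING applies the threshold after grouping:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE answer_key (question_id INTEGER PRIMARY KEY, answer INTEGER);
    CREATE TABLE student_answers (student_id INTEGER, question_id INTEGER, answer INTEGER);
    INSERT INTO answer_key VALUES (1, 3), (2, 1), (3, 5), (4, 2);
    -- Student 1 gets 3 of 4 right, student 2 gets 1 of 4 right.
    INSERT INTO student_answers VALUES
        (1, 1, 3), (1, 2, 1), (1, 3, 5), (1, 4, 4),
        (2, 1, 3), (2, 2, 2), (2, 3, 1), (2, 4, 4);
""")

MIN_PERCENT = 70

rows = conn.execute("""
    SELECT sa.student_id,
           100.0 * COUNT(ak.question_id) / COUNT(*) AS percent_correct
    FROM student_answers AS sa
    LEFT JOIN answer_key AS ak
      ON ak.question_id = sa.question_id
     AND ak.answer = sa.answer          -- no match (NULL) means a wrong answer
    GROUP BY sa.student_id
    HAVING 100.0 * COUNT(ak.question_id) / COUNT(*) >= ?
""", (MIN_PERCENT,)).fetchall()

print(rows)  # [(1, 75.0)]
```

If students can skip questions, divide by the number of rows in answer_key instead of COUNT(*), since an unanswered question has no row to count.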

Denormalization is generally considered a last resort. The problem seems very similar to survey applications, which are very common. Without seeing your data model, it's difficult to propose a solution, but I will say that it is definitely possible. I'm also wondering why that function needs 20 parameters.
A relational set-based solution will be simpler and faster in most cases.

This query should be quite easy... assuming you have the correct answer stored in the question table. You do have the correct answer stored in the question table, right?


Hibernate: queries with multiple tables

I am new to Hibernate and working on a project where I need to extract data from a DB using complex queries. Just as an example:
Suppose there are tables Student, Attendance, Subject, and so on.
Student contains (name (assume primary key), class, age, sex, and other student data).
Attendance contains (student name, % attendance).
Subject contains (student, subjects).
I need to extract data for queries like:
q1: (age > 20 && age < 22)
q2: class == Engineering
q3: has algorithm as one of its subjects.
Students with (q1 || q2) && q3.
The query can be even more complex, like ((q1 && q2) || (q3 && q4)) && q5.
I have a few questions:
1. Assuming all tables have the same primary key (and I am joining on that), what is the best and most efficient way to do it?
2. Is it possible to write a single query for such a complex expression, and if so, is it recommended?
3. If it is not possible to write a single query for it, I can think of evaluating it like a postfix expression, but that seems dirty.
4. My understanding is that if q1 and q2 belong to the same table, AND/OR in the where clause will work, but if they belong to different tables, I have to join them first and then apply the condition. Right?
Sorry if anything in my question looks stupid; I only started working on this two days ago.
Any good reading resources would be helpful.
Yes, you can create quite complex queries with a single statement. You might want to look into subqueries.
As for whether it is recommended or not, that would all depend on the execution plan, which would require a peek at the actual datasets.
Hope it helps a bit.
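A sketch of the single-statement approach, written as plain SQL and run through Python's sqlite3 (roughly the same SQL could be issued from Hibernate as a native SQL query). Table and column names follow the question; the rows are invented. The condition on the other table (q3) is expressed as an EXISTS subquery, which also avoids duplicate rows for students with several subjects:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (name TEXT PRIMARY KEY, class TEXT, age INTEGER, sex TEXT);
    CREATE TABLE subject (student TEXT REFERENCES student(name), subject TEXT);
    INSERT INTO student VALUES
        ('ann', 'Engineering', 19, 'F'),
        ('bob', 'Arts',        21, 'M'),
        ('cat', 'Arts',        25, 'F'),
        ('dan', 'Engineering', 30, 'M');
    INSERT INTO subject VALUES
        ('ann', 'algorithm'), ('ann', 'databases'),
        ('bob', 'algorithm'),
        ('cat', 'algorithm'),
        ('dan', 'databases');
""")

# (q1 OR q2) AND q3 in one statement; parentheses mirror the boolean expression.
rows = conn.execute("""
    SELECT s.name
    FROM student AS s
    WHERE ((s.age > 20 AND s.age < 22)          -- q1
           OR s.class = 'Engineering')          -- q2
      AND EXISTS (SELECT 1 FROM subject AS sub
                  WHERE sub.student = s.name
                    AND sub.subject = 'algorithm')  -- q3
    ORDER BY s.name
""").fetchall()

print(rows)  # [('ann',), ('bob',)]
```

Deeper expressions such as ((q1 && q2) || (q3 && q4)) && q5 nest the same way, so there is no need to evaluate them as a postfix expression in application code.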
I believe the Hibernate Criteria API is perfect for your purpose. You can see some examples in these articles:
http://www.javalobby.org/articles/hibernatequery102/
http://www.tutorialspoint.com/hibernate/hibernate_criteria_queries.htm
Regards

SQL Best Practice: count(1) or count(*) [duplicate]

Closed 11 years ago.
Possible Duplicate:
Count(*) vs Count(1)
I remember anecdotally being told:
never use count(*) when count(1) will do
Recently I passed this advice on to another developer, and was challenged to prove it was true. My argument was the reasoning I was given along with the advice: that the database would only return the first column, which would then be counted. The counterargument was that the database wouldn't evaluate anything in the brackets.
From some (unscientific) testing on small tables, there certainly seems to be no difference. I don't currently have access to any large tables to experiment on.
I was given this advice when I was using Sybase, and tables had hundreds of millions of rows. I'm now working with Oracle and considerably less data.
So I guess in summary, my two questions are:
Which is faster, count(1) or count(*)?
Would this vary in different database vendors?
According to another similar question (Count(*) vs Count(1)), they are the same.
In Oracle, according to Ask Tom, count(*) is the correct way to count the number of rows, because the optimizer changes count(1) to count(*). count(1) actually means to count rows where 1 is non-null (it always is, so the optimizer rewrites it for you).
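The semantics are easy to check: COUNT(*) counts rows, while COUNT(expr) counts rows where expr is not null. The literal 1 is never null, so COUNT(1) always equals COUNT(*); only counting a nullable column gives a different answer. A small sketch using Python's sqlite3 (it demonstrates meaning only, not performance on any particular vendor):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (3,)])

# COUNT(*) and COUNT(1) both count all three rows; COUNT(x) skips the NULL.
row = conn.execute("SELECT COUNT(*), COUNT(1), COUNT(x) FROM t").fetchone()
print(row)  # (3, 3, 2)
```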
See:
For MySQL (no difference between count(*) and count(1)): What is better in MYSQL count(*) or count(1)?
For MS SQL Server (no difference): Count(*) vs Count(1), http://beyondrelational.com/blogs/dave_ballantyne/archive/2010/07/27/count-or-count-1.aspx
For Sybase (no difference): http://dbaspot.com/sybase/349079-count-vs-count-1-a.html
In reading books specifically on TSQL and Microsoft SQL Server, I have read that using * is better because it lets the optimizer decide what is best to do. I'll try to find the names of the specific books and post those here.
This is such a basic query pattern, and the meaning is identical. I've read more than once that the optimizer treats them identically - can't find a specific reference right now but put this in the category of "institutional knowledge".
(should have searched first...http://stackoverflow.com/questions/1221559/count-vs-count1)
I can only speak to SQL Server, but testing on a 5 GB table with 11 million records, both the number of reads and the execution plan were identical.
I'd say there is no difference.
As far as I know, using count(*) should be faster, because when that function is called the engine counts only indexes. From another point of view, count(*) and count(1) probably look very similar in binary code, so there should be no difference.
count(1)
No, generally speaking this will always have slightly better performance.
The difference would only show up at a drastically larger scale, but it is good practice.

Linked table or subqueries?

I was looking up different ways to write a query, and I'm just wondering which of the following options you all think is the better way to go:
SELECT a.salary
FROM emp a
JOIN emp b ON a.salary < b.salary
WHERE b.id = 200
or
SELECT salary
FROM emp
WHERE salary < (SELECT salary
FROM emp
WHERE id = 200)
I timed both with about 300 records in the table, and they came out about the same, so this is really just about preference and accepted standards. I personally like the second way better (it's just easier for me to read). I have a feeling the first is standard, though. What do you all think?
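A quick way to confirm the two forms return the same rows, sketched with Python's sqlite3 and a few invented salaries. They are only interchangeable because id is unique here: if several rows had id = 200, the join would return duplicates, and most databases reject a scalar subquery that returns more than one row. ORDER BY is added so the results can be compared:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, salary INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [(100, 3000), (200, 5000), (300, 4000), (400, 7000)])

# Self-join version: pair every row with employee 200 and keep lower salaries.
join_version = conn.execute("""
    SELECT a.salary
    FROM emp a
    JOIN emp b ON a.salary < b.salary
    WHERE b.id = 200
    ORDER BY a.salary
""").fetchall()

# Scalar-subquery version: look up employee 200's salary once, then filter.
subquery_version = conn.execute("""
    SELECT salary
    FROM emp
    WHERE salary < (SELECT salary FROM emp WHERE id = 200)
    ORDER BY salary
""").fetchall()

print(join_version)                      # [(3000,), (4000,)]
print(join_version == subquery_version)  # True
```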
I don't think there is a definite answer to this question, as it really depends on the context of the query itself and the RDBMS.
So in your case, you've done the research with 300 records. How about 3,000,000? Does it make a difference?
Personally I like Joins over Sub-Queries if I can help it, but like I said, this is really determined on a case by case basis.
SQL is a declarative language, which broadly means you tell the database what you want, and it's up to the database to decide how best to get it.
For that reason, a lot is to be said for writing your query in the way that makes its goal most obvious/readable (which is why a subquery may make sense here).
However, not all RDBMSs are equal, and the ability of the query optimizer to "equate" queries which are technically the same, but which are written differently, varies from db to db.
For instance, MySQL (at least before version 8.0.18, which added hash joins) only has a nested-loop join algorithm at its disposal, and that can be an issue when dealing with subqueries over large data sets. You'll have to try it out with different data sets, taking a look at what the optimizer is doing behind the scenes.
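One way to look behind the scenes is to ask the database for its plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN via Python's sqlite3 on the two queries from the question; MySQL and PostgreSQL have EXPLAIN, and SQL Server can display the execution plan, but the output format differs by product and version, so treat the printed text as illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, salary INTEGER)")

queries = {
    "join": "SELECT a.salary FROM emp a JOIN emp b ON a.salary < b.salary WHERE b.id = 200",
    "subquery": "SELECT salary FROM emp WHERE salary < "
                "(SELECT salary FROM emp WHERE id = 200)",
}

plans = {}
for name, sql in queries.items():
    # EXPLAIN QUERY PLAN is SQLite's syntax; the last column of each result row
    # describes one step of the plan the optimizer chose.
    plans[name] = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    print(name)
    for step in plans[name]:
        print("   ", step)
```

Comparing the plans on realistic data volumes (and with the indexes you actually have) says more than timings on 300 rows.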

What is a good 'FizzBuzz' question for a SQL programmer?

We are looking to hire a SQL programmer and need a good screening question similar to the FizzBuzz question but for SQL.
While it is certainly possible to write a FizzBuzz solution using SQL, I think the effort is misplaced. The FizzBuzz question assesses coding fundamentals such as looping, conditionals, output, and basic math. With SQL, I think something related to queries, joins, projections, and the like would be more appropriate. But, just as with FizzBuzz, it should be simple enough that 'good' SQL programmers can write a solution on paper in a couple minutes.
We typically use something like this as a bare minimum for SQL:
Given the tables:
Customers: CustomerID, CustomerName
Orders: OrderID, CustomerID, ProductName, UnitPrice, Quantity
Calculate the total value of orders for each customer showing CustomerName and TotalPrice.
In our view, this is a pretty simple question requiring a join on two tables, grouping, and an aggregate function. We're amazed at how many people we talk to who presumably write database code in their jobs but can't remember join syntax (and we never care which syntax they use, MSSQL style or Oracle style or something else).
What I like about this question is it lends itself to follow up questions like
How would you find all customers that ordered more than $1000 total?
How would you normalize these tables?
How would you optimize the queries?
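A sketch of an acceptable answer and of the first follow-up, run through Python's sqlite3 with invented rows. The table and column names come from the question; the >$1000 filter belongs in HAVING because it tests an aggregate:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER,
                         ProductName TEXT, UnitPrice REAL, Quantity INTEGER);
    INSERT INTO Customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO Orders VALUES
        (1, 1, 'Widget', 250.0, 3),
        (2, 1, 'Gadget', 400.0, 1),
        (3, 2, 'Widget', 250.0, 2);
""")

# Base question: total order value per customer.
totals = conn.execute("""
    SELECT c.CustomerName, SUM(o.UnitPrice * o.Quantity) AS TotalPrice
    FROM Customers c
    JOIN Orders o ON o.CustomerID = c.CustomerID
    GROUP BY c.CustomerID, c.CustomerName
    ORDER BY c.CustomerName
""").fetchall()

# Follow-up: customers whose orders total more than $1000.
big = conn.execute("""
    SELECT c.CustomerName
    FROM Customers c
    JOIN Orders o ON o.CustomerID = c.CustomerID
    GROUP BY c.CustomerID, c.CustomerName
    HAVING SUM(o.UnitPrice * o.Quantity) > 1000
""").fetchall()

print(totals)  # [('Acme', 1150.0), ('Globex', 500.0)]
print(big)     # [('Acme',)]
```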
A "FizzBuzz" is supposed to be so simple that anyone who can program at all should be able to solve it, and a good programmer should be able to solve it almost without thinking, right?
So maybe something like this:
First, take two tables, Employees and Departments, with a foreign key from Employees that shows which department each employee works for. (Typical boring example, from almost any database textbook.) Then let them write a query that involves both tables, such as "give me the names of all employees who work for the Cleaning department".
Then do exactly the same thing, but instead of employees that work for departments, use mice that are eaten by cats, or something else that is not an exact copy of the employee-department or student-course examples in the database textbook.
If they can find who works at the Cleaning department, but have no idea how to find which mice were eaten by the cat Tom, don't hire!
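For example, a hypothetical Cats/Mice schema (the table and column names below are invented for illustration) makes the same join-plus-filter question look unfamiliar. Sketched with Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Cats (CatID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Mice (MouseID INTEGER PRIMARY KEY, Name TEXT,
                       EatenByCatID INTEGER REFERENCES Cats(CatID));
    INSERT INTO Cats VALUES (1, 'Tom'), (2, 'Felix');
    INSERT INTO Mice VALUES (1, 'Jerry', NULL), (2, 'Speedy', 1),
                            (3, 'Mickey', 2), (4, 'Minnie', 1);
""")

# Same shape as "employees in the Cleaning department", different nouns.
eaten_by_tom = conn.execute("""
    SELECT m.Name
    FROM Mice m
    JOIN Cats c ON c.CatID = m.EatenByCatID
    WHERE c.Name = 'Tom'
    ORDER BY m.Name
""").fetchall()

print(eaten_by_tom)  # [('Minnie',), ('Speedy',)]
```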
I would probably do something that requires an inner join, a left join, and a where clause with both an AND and an OR condition. Also specify which fields you want returned. You would be looking to see whether they recognize from the problem description that they need a left join, that they use explicit join syntax, and that they use parentheses to make the meaning of the AND/OR clear. You would also be looking to see whether they used select * even though you specified the fields you wanted.
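A sketch of what such a question and its expected answer might look like. The Customers/Orders/OrderReturns schema, the city names, and the data are all hypothetical; the second query shows the mistake the parentheses are meant to catch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT, City TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, Quantity INTEGER);
    CREATE TABLE OrderReturns (ReturnID INTEGER PRIMARY KEY, OrderID INTEGER, ReturnDate TEXT);
    INSERT INTO Customers VALUES (1, 'Ann', 'Oslo'), (2, 'Bo', 'Bergen'), (3, 'Cy', 'Paris');
    INSERT INTO Orders VALUES (10, 1, 5), (11, 2, 3), (12, 3, 9), (13, 1, 1);
    INSERT INTO OrderReturns VALUES (100, 10, '2024-01-05');
""")

# Task: "List orders of more than 2 units from customers in Oslo or Bergen,
# with the return date if the order was returned."
answer = conn.execute("""
    SELECT c.CustomerName, o.OrderID, r.ReturnDate   -- explicit columns, no SELECT *
    FROM Orders o
    INNER JOIN Customers c ON c.CustomerID = o.CustomerID
    LEFT JOIN OrderReturns r ON r.OrderID = o.OrderID -- most orders have no return
    WHERE (c.City = 'Oslo' OR c.City = 'Bergen')      -- parentheses matter
      AND o.Quantity > 2
    ORDER BY o.OrderID
""").fetchall()
print(answer)  # [('Ann', 10, '2024-01-05'), ('Bo', 11, None)]

# Without the parentheses, AND binds tighter than OR, so order 13
# (Oslo, 1 unit) sneaks in.
sloppy = conn.execute("""
    SELECT o.OrderID
    FROM Orders o
    INNER JOIN Customers c ON c.CustomerID = o.CustomerID
    WHERE c.City = 'Oslo' OR c.City = 'Bergen' AND o.Quantity > 2
    ORDER BY o.OrderID
""").fetchall()
print(sloppy)  # [(10,), (11,), (13,)]
```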
Stick with fizzbuzz, just change the number from 100 to 10000000 and say that the solution has to be reasonably efficient.
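One set-based SQL answer generates the numbers with a recursive common table expression instead of a loop. The sketch runs it through Python's sqlite3 with N = 100; SQLite, PostgreSQL, and MySQL 8.0 accept WITH RECURSIVE, while SQL Server writes the CTE without the RECURSIVE keyword and caps recursion at 100 levels unless OPTION (MAXRECURSION ...) is set. For very large N, a pre-built numbers table is a common alternative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
N = 100  # the "reasonably efficient" variant would raise this a lot

rows = conn.execute("""
    WITH RECURSIVE nums(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM nums WHERE n < ?
    )
    SELECT CASE
             WHEN n % 15 = 0 THEN 'FizzBuzz'
             WHEN n % 3 = 0 THEN 'Fizz'
             WHEN n % 5 = 0 THEN 'Buzz'
             ELSE CAST(n AS TEXT)
           END
    FROM nums
    ORDER BY n
""", (N,)).fetchall()

output = [r[0] for r in rows]
print(output[:15])
```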
SQL developer or SQL DBA? For a developer, ask something about cursors; the syntax is a pain, and a good one would question why you need to use it. For a DBA, give them a cursor and ask them to fix it ;)

SQL Table Aliases - Good or Bad? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
What are the pros and cons of using table aliases in SQL? I personally try to avoid them, as I think they make the code less readable (especially when reading through large where/and statements), but I'd be interested in hearing any counter-points to this. When is it generally a good idea to use table aliases, and do you have any preferred formats?
Table aliases are a necessary evil when dealing with highly normalized schemas. For example, and I'm not the architect on this DB so bear with me, it can take 7 joins in order to get a clean and complete record back which includes a person's name, address, phone number and company affiliation.
Rather than the somewhat standard single character aliases, I tend to favor short word aliases so the above example's SQL ends up looking like:
select person.FirstName
,person.LastName
,addr.StreetAddress
,addr.City
,addr.State
,addr.Zip
,phone.PhoneNumber
,company.CompanyName
from tblPeople person
left outer join tblAffiliations affl on affl.personID = person.personID
left outer join tblCompany company on company.companyID = affl.companyID
... etc
Well, there are some cases where you must use them, like when you need to join to the same table twice in one query.
It also depends on whether you have unique column names across tables. In our legacy database we have 3-letter prefixes for all columns, derived from an abbreviated form of the table name, simply because one ancient database system we were once compatible with didn't support table aliases all that well.
If you have column names that occur in more than one table, specifying the table name as part of the column reference is a must, and thus a table alias will allow for a shorter syntax.
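A minimal self-join sketch, using a hypothetical employees table with a manager_id column, run through Python's sqlite3. Because the table appears twice, at least one copy must be aliased for the column references to be unambiguous:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES (1, 'Alice', NULL), (2, 'Bob', 1), (3, 'Carol', 1);
""")

# The same table appears twice, so each copy gets its own alias; the aliases
# are named for the role each copy plays in this query.
rows = conn.execute("""
    SELECT emp.name AS employee, mgr.name AS manager
    FROM employees AS emp
    LEFT JOIN employees AS mgr ON mgr.id = emp.manager_id
    ORDER BY emp.id
""").fetchall()

print(rows)  # [('Alice', None), ('Bob', 'Alice'), ('Carol', 'Alice')]
```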
Am I the only person here who really hates them?
Generally, I don't use them unless I have to. I just really hate having to read something like
select a.id, a.region, a.firstname, a.blah, b.yadda, b.huminahumina, c.crap
from toys as a
inner join prices as b on a.blah = b.yadda
inner join customers as c on c.crap = something else
etc
When I read SQL, I like to know exactly what I'm selecting when I read it; aliases actually confuse me more because I've got to slog through lines of columns before I actually get to the table name, which generally represents information about the data that the alias doesn't. Perhaps it's okay if you made the aliases, but I commonly read questions on StackOverflow with code that seems to use aliases for no good reason. (Additionally, sometimes, someone will create an alias in a statement and just not use it. Why?)
I think that table aliases are used so much because a lot of people are averse to typing. I don't think that's a good excuse, though. That excuse is the reason we end up with terrible variable naming, terrible function acronyms, bad code...I would take the time to type out the full name. I'm a quick typer, though, so maybe that has something to do with it. (Maybe in the future, when I've got carpal tunnel, I'll reconsider my opinion on aliases. :P ) I especially hate running across table aliases in PHP code, where I believe there's absolutely no reason to have to do that - you've only got to type it once!
I always use column qualifiers in my statements, but I'm not averse to typing a lot, so I will gladly type the full name multiple times. (Granted, I do abuse MySQL's tab completion.) Unless it's a situation where I have to use an alias (like some described in other answers), I find the extra layer of abstraction cumbersome and unnecessary.
Edit: (Over a year later) I'm dealing with some stored procedures that use aliases (I did not write them and I'm new to this project), and they're kind of painful. I realize that the reason I don't like aliases is because of how they're defined. You know how it's generally good practice to declare variables at the top of your scope? (And usually at the beginning of a line?) Aliases in SQL don't follow this convention, which makes me grind my teeth. Thus, I have to search the entire code for a single alias to find out where it is (and what's frustrating is, I have to read through the logic before I find the alias declaration). If it weren't for that, I honestly might like the system better.
If I ever write a stored procedure that someone else will have to deal with, I'm putting my alias definitions in a comment block at the beginning of the file, as a reference. I honestly can't understand how you guys don't go crazy without it.
Good
As has been mentioned multiple times before, it is good practice to prefix all column names so you can easily see which column belongs to which table, and aliases are shorter than full table names, so the query is easier to read and thus understand. That assumes you use a good aliasing scheme, of course.
And if you create or read the code of an application that uses externally stored or dynamically generated table names, then without aliases it is really hard to tell at first glance what all those "%s"es or other placeholders stand for. It is not an extreme case; for example, many web apps let you customize the table name prefix at installation time.
Microsoft SQL's query optimiser benefits from using either fully qualified names or aliases.
Personally I prefer aliases, and unless I have a lot of tables they tend to be single letter ones.
--seems pretty readable to me ;-)
select a.Text
from Question q
inner join Answer a
on a.QuestionId = q.QuestionId
There's also a practical limit on the length of a SQL string that can be executed; aliases make this limit easier to avoid.
If I write a query myself (by typing into the editor and not using a designer), I always use aliases for the table name just so I only have to type the full table name once. I really hate reading queries generated by a designer with the full table name as a prefix to every column name.
I suppose the only thing that really speaks against them is excessive abstraction. If you will have a good idea what the alias refers to (good naming helps; 'a', 'b', 'c' can be quite problematic especially when you're reading the statement months or years later), I see nothing wrong with aliasing.
As others have said, joins require them if you're using the same table (or view) multiple times, but even outside that situation, an alias can serve to clarify a data source's purpose in a particular context. In the alias's name, try to answer why you are accessing particular data, not what the data is.
I LOVE aliases!!!! I have done some tests using them vs. not and have seen some processing gains. My guess is the processing gains would be higher when you're dealing with larger datasets and complex nested queries than without. If I'm able to test this, I'll let you know.
You need them if you're going to join a table to itself, or if you use the column again in a subquery...
Aliases are great if you consider that my organization has table names like:
SchemaName.DataPointName_SubPoint_Sub-SubPoint_Sub-Sub-SubPoint...
My team uses a pretty standard set of abbreviations, so the guesswork is minimized. We'll have say ProgramInformationDataPoint shortened to pidp, and submissions to just sub.
The good thing is that once you get going in this manner and people agree with it, it makes those HAYUGE files just a little smaller and easier to manage. At least for me, fewer characters to convey the same info seems to go a little easier on my brain.
I like long explicit table names (it's not uncommon to be more than 100 characters) because I use many tables and if the names aren't explicit, I might get confused as to what each table stores.
So when I write a query, I tend to use shorter aliases that make sense within the scope of the query and that makes the code much more readable.
I always use aliases in my queries, and it is part of the code guidebook in my company. First of all, you need aliases or table names when there are columns with identical names in the joined tables. In my opinion, aliases improve readability in complex queries and let me see quickly where each column comes from. We even use aliases in single-table queries, because experience has shown that single-table queries don't stay single-table for long.
IMHO, it doesn't really matter with short table names that make sense. I have on occasion worked on databases where the table name could be something like VWRECOFLY or some other random string (dictated by company policy) that really represents users, so in that case I find aliases really help to make the code FAR more readable (users.username makes a lot more sense than VWRECOFLY.username).
I always use aliases, since to get proper performance on MSSQL you need to prefix with schema at all times. So you'll see a lot of
Select
Person.Name
From
dbo.Person As Person
I always use aliases when writing queries. Generally I try and abbreviate the table name to 1 or 2 representative letters. So Users becomes u and debtor_transactions becomes dt etc...
It saves on typing and still carries some meaning.
The shorter names makes it more readable to me as well.
If you do not use an alias, it's a bug in your code just waiting to happen.
SELECT Description -- actually in a
FROM
table_a a,
table_b b
WHERE
a.ID = b.ID
What happens when you do a little thing like add a column called Description to table_b? That's right, you'll get an error. Adding a column shouldn't have to break anything. I never see writing good, bug-free code as a necessary evil.
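The failure is easy to reproduce. This sketch uses Python's sqlite3 with the table_a/table_b names from the example (the columns and rows are made up): the unqualified query works until table_b gains a Description column, while the alias-qualified version keeps working:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (ID INTEGER PRIMARY KEY, Description TEXT);
    CREATE TABLE table_b (ID INTEGER PRIMARY KEY);
    INSERT INTO table_a VALUES (1, 'from a');
    INSERT INTO table_b VALUES (1);
""")

unqualified = "SELECT Description FROM table_a a, table_b b WHERE a.ID = b.ID"
qualified = "SELECT a.Description FROM table_a a, table_b b WHERE a.ID = b.ID"

print(conn.execute(unqualified).fetchall())  # [('from a',)]

# Someone later adds a column with the same name to table_b.
conn.execute("ALTER TABLE table_b ADD COLUMN Description TEXT")

try:
    conn.execute(unqualified).fetchall()
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)  # an "ambiguous column name" error

print(error)
print(conn.execute(qualified).fetchall())  # [('from a',)]
```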
Aliases are required when joining tables with columns that have identical names.