Hibernate : queries with multiple tables - sql

I am new to Hibernate and working on a project where I need to extract data from a database using complex queries. For example:
Suppose there are tables Student, Attendance, Subject and so on.
Student contains (name (assume primary key), class, age, sex and other student data).
Attendance contains (student name, % attendance).
Subject contains (student, subjects).
I need to extract data for queries like
q1: (age > 20 && age < 22)
q2: class == Engineering
q3: has algorithm as one of its subjects.
Students matching (q1||q2) && q3.
The query can be even more complex, like ((q1&&q2)||(q3&&q4)) && q5 ..
I have a few questions, assuming all tables have the same primary key (and I am joining on that):
1. What is the best and most efficient way to do it?
2. Is it possible to write a single query for such a complex expression, and if so, is it recommended?
3. If it is not possible to write a single query for it, I can think of evaluating it like a postfix expression, but that seems dirty?
4. My understanding is that if q1 and q2 belong to the same table, AND/OR in the WHERE clause will work, but if they belong to different tables I have to take a join and then apply the condition. Right?
If anything looks stupid in my question, I am sorry; I started working on this just 2 days ago.
Any good resources to read would be helpful.

Yes, you can create quite complex queries with a single statement. You might want to look into subqueries.
As for whether it is recommended or not, that depends on the execution plan, which would require a peek at the actual datasets.
Hope it helps a bit.
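As a minimal sketch of the single-statement approach (plain SQL run through Python's sqlite3 rather than Hibernate), the (q1||q2) && q3 example might look like the following. The table layout, the column names (class_name, student_name) and the sample rows are assumptions made up for illustration, not the asker's real schema.

```python
import sqlite3

# Assumed schema and sample data, loosely following the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (name TEXT PRIMARY KEY, class_name TEXT, age INTEGER);
    CREATE TABLE subject (student_name TEXT, subject TEXT);
    INSERT INTO student VALUES
        ('Ann', 'Engineering', 25),
        ('Bob', 'Arts', 21),
        ('Cid', 'Arts', 30),
        ('Dee', 'Engineering', 19);
    INSERT INTO subject VALUES
        ('Ann', 'algorithm'), ('Bob', 'algorithm'),
        ('Cid', 'algorithm'), ('Dee', 'history');
""")

# (q1 OR q2) AND q3 in one statement. q1 and q2 test columns of student;
# q3 lives in another table, so it is written as a correlated subquery.
query = """
    SELECT s.name
    FROM student s
    WHERE ((s.age > 20 AND s.age < 22)          -- q1
           OR s.class_name = 'Engineering')     -- q2
      AND EXISTS (SELECT 1 FROM subject sub     -- q3
                  WHERE sub.student_name = s.name
                    AND sub.subject = 'algorithm')
    ORDER BY s.name
"""
matching = [row[0] for row in conn.execute(query)]
print(matching)
```

Using EXISTS instead of a join keeps one row per student even when a student has several subject rows; a join would work too but may need DISTINCT.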

I believe that the Hibernate Criteria API is perfect for your purpose. You can see some examples in these articles:
http://www.javalobby.org/articles/hibernatequery102/
http://www.tutorialspoint.com/hibernate/hibernate_criteria_queries.htm
Regards

Related

Why not have a JOINONE keyword in SQL to hint and enforce that each record has at most one match?

I encounter this a lot when writing SQL. I have two tables that are meant to be in a one-to-one relationship with each other, and I wish I could easily assert that fact in my query. For example, the simplified query:
SELECT Person.ID, Person.Name, Location.Address1
FROM Person
LEFT JOIN Location ON Person.LocationID = Location.ID
When I read this query I think to myself, well what if the Location table fails to enforce uniqueness on its ID column? Suddenly you could have the same Person multiple times in your resultset. Sure, I can go look at the schema to assure myself it's unique so everything will be okay, but why shouldn't I simply be able to put it right here in my query, a la:
SELECT Person.ID, Person.Name, Location.Address1
FROM Person
LEFT JOINONE Location ON Person.LocationID = Location.ID
Not only would a keyword like this (made up "JOINONE") make it 100% clear to a human reading this query that we are guaranteed to get exactly one row for each Person record, but it lets the db engine optimize its execution plan because it knows there won't be more than one match each, even if the foreign key relationship isn't defined in the schema.
Another advantage of this would be that the db engine could enforce it, so if the data actually did have more than one match, an error could be thrown. This happens for subqueries already, e.g.:
SELECT Person.ID, Person.Name
, (
SELECT Location.Address1
FROM Location
WHERE Location.ID = Person.LocationID
) AS Address1
FROM Person
This is nice and spiffy, 100% clear to the human reader, neatly optimizable, and enforced by the db engine. In fact I often end up doing things this way for all those reasons. The problem is, besides the distracting syntax, you can only select one field this way. (What if I want City, State, and Zip too?) How nice it would be if you could flow this table right along with the rest of your JOINs and select any fields from it you wish in your SELECT clause just like all the rest of your tables.
I couldn't find any other question like this around StackOverflow, though I did find lots of repeats of a close question: people wanting to choose a single record. Close but really quite a different kind of goal, and less meaningful in my opinion.
I'm posting this question to see if there's some mechanism already in the SQL language that I'm missing, or an efficient workaround anyone has come up with. The concept of a one-to-one vs. one-to-many relationship is so fundamental to relational database design, I'm just so surprised at the absence of this language element.
SQL is two languages in one. Constraints, including uniqueness constraints, are set using the data definition language (DDL) in SQL. This is a layer above the data manipulation language (DML), where SELECT statements live, and it's understood that statements issued in the DDL might invalidate statements in the DML.
There's no way for a query to prevent someone from executing an ALTER TABLE command and changing the name of a field that the query refers to between query runs.
And there isn't much more of a way for a query to be written defensively against uncertain constraints; it's OK if you need to ask someone for information outside of the database environment to address this. The information may also be available within the environment; in most engines, you get to it by querying the data dictionary. This is the INFORMATION_SCHEMA in MySQL, for instance.
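To illustrate the data dictionary route, here is a small sketch that asks the database whether a join column is a single-column primary key before trusting a one-to-one join. It uses SQLite's PRAGMA table_info as a stand-in for INFORMATION_SCHEMA (an assumption, chosen only so the example runs anywhere); the helper name is_unique_key is hypothetical, and it does not look at separate UNIQUE constraints.

```python
import sqlite3

def is_unique_key(conn, table, column):
    """Return True if `column` is the sole primary-key column of `table`.

    Hypothetical helper: `table` is interpolated into the statement, so it
    must be a trusted identifier, never user input.
    """
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk);
    # pk is 0 for non-key columns, else the column's position in the key.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    pk_columns = [r[1] for r in rows if r[5] > 0]
    return pk_columns == [column]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Location (ID INTEGER PRIMARY KEY, Address1 TEXT);
    CREATE TABLE Person (ID INTEGER PRIMARY KEY, Name TEXT, LocationID INTEGER);
""")

id_is_unique = is_unique_key(conn, "Location", "ID")
address_is_unique = is_unique_key(conn, "Location", "Address1")
print(id_is_unique, address_is_unique)
```

A check like this runs outside the query itself, which is the point made above: the guarantee comes from the schema, not from the SELECT statement.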

Is it possible there is a faster way to perform this SELECT query?

UPDATE (based on everyone's responses):
I'm thinking of changing my structure so that I have a new table called prx_tags_sportsitems. I will be removing prx_lists entirely. prx_tags_sportsitems will act as a reference of ID's table to replace the prx_lists.ListString which used to be storing the ID's of tags belonging to each prx_sportsitem.
The new relation will be like so:
prx_tags_sportsitems.TagID <--> prx_tags.ID
prx_sportsitems.ID <--> prx_tags_sportsitems.OwnerID
prx_tags will contain the TagName. This is so I can still maintain each "tag" as a separate unique entity.
My new query for finding all sportsitems with the tag "aerobic" will become something similar to as follows:
SELECT prx_sportsitems.* FROM prx_sportsitems, prx_tags_sportsitems
WHERE prx_tags_sportsitems.OwnerID = prx_sportsitems.ID
AND prx_tags_sportsitems.TagID = (SELECT ID FROM prx_tags WHERE TagName = 'aerobic')
ORDER BY prx_sportsitems.DateAdded DESC LIMIT 0,30;
Or perhaps I can do something with the "IN" clause, but I'm unsure about that just yet.
Before I go ahead with this huge modification to my scripts, does everyone approve? comments? Many thanks!
ORIGINAL POST:
When it comes to MySQL queries, I'm rather a novice. When I originally designed my database I did something rather silly, because it was the only solution I could find. Now it appears to be putting too much stress on my MySQL server, since it takes 0.2 seconds to perform each of these queries where I believe it could be more like 0.02 seconds with a better query (or table design, if it comes to it!). I want to avoid rebuilding my entire site structure, since it's deeply tied to the current design, so I'm hoping there's a faster MySQL query possible.
I have three tables in my database:
Sports Items Table
Tags Table
Lists Table
Each sports item has multiple tag names (categories) assigned to them. Each "tag" is stored as a separate result in prx_tags. I create a "list" in prx_lists for the sports item in prx_sportsitems and link them through prx_lists.OwnerID which links to prx_sportsitems.ID
This is my current query (which finds all sports items which have the tag 'aerobic'):
SELECT prx_sportsitems.*
FROM prx_sportsitems, prx_lists
WHERE prx_lists.ListString LIKE (CONCAT('%',(SELECT prx_tags.ID
FROM prx_tags
WHERE prx_tags.TagName = 'aerobic'
limit 0,1),'#%'))
AND prx_lists.ListType = 'Tags-SportsItems'
AND prx_lists.OwnerID = prx_sportsitems.ID
ORDER BY prx_sportsitems.DateAdded DESC
LIMIT 0,30
To help clarify more, the list that contains all of the tag ids is inside a single field called ListString and I structure it like so: " #1 #2 #3 #4 #5" ...and from that, the above query "concats" the prx_tags.ID which tagname is 'aerobic'.
My thoughts are that, there probably isn't a faster query existing and that I need to simply accept I need to do something simpler, such as putting all the Tags in a list, directly inside prx_sportsitems in a new field called "TagsList" and then I can simply run a query which does Select * from prx_sportsitems Where TagsList LIKE '%aerobic%' - however, I want to avoid needing to redesign my entire site. I'm really regretting not looking into optimization beforehand :(
Whenever I am writing a query, and think I need to use LIKE, an alarm goes off in my head that maybe there is a better design. This is certainly the case here.
You need to redesign the prx_lists table. From what you've said, it's hard to say what the exact schema should be, but here is my best guess:
prx_lists should have three columns: OwnerID, ListType, and TagName. Then you would have one row for each tag an OwnerID has. Your above query would now look something like this:
SELECT prx_sportsitems.*
FROM prx_sportsitems, prx_lists
where prx_lists.TagName = 'aerobic'
AND prx_lists.OwnerID = prx_sportsitems.ID
This is a MUCH more efficient query. Maybe ListType doesn't belong in that table either, but it's hard to say without more info about what that column is used for.
Don't forget to create the appropriate indexes either! This will improve performance.
Refactoring your database schema might be painful, but it seems to me the only way to fix your long-term problem.
To help clarify more, the list that contains all of the tag ids is inside a single field called ListString and I structure it like so: " #1 #2 #3 #4 #5" ...and from that, the above query "concats" the prx_tags.ID which tagname is 'aerobic'.
There's your problem right there. Don't store delimited data in a DB field (ListString). Modeling data this way is going to make it extremely difficult/impossible to write performant queries against it.
Suggestion: Break the contents of ListString out into a related table with one row for each item.
the list that contains all of the tag ids is inside a single field called ListString and I structure it like so: " #1 #2 #3 #4 #5" ...and from that, the above query "concats" the prx_tags.ID which tagname is 'aerobic'.
Not only is that bad, storing denormalized data, but the separator character is uncommon.
Interim Improvement
The quickest way to improve things is to change the separator character you're currently using ("#") to a comma:
UPDATE PRX_LISTS
SET liststring = REPLACE(liststring, '#', ',')
Then, you can use MySQL's FIND_IN_SET function:
SELECT si.*
FROM PRX_SPORTSITEMS si
JOIN PRX_LISTS l ON l.ownerid = si.id
JOIN PRX_TAGS t ON FIND_IN_SET(t.id, l.liststring) > 0
WHERE t.tagname = 'aerobic'
AND l.listtype = 'Tags-SportsItems'
ORDER BY si.DateAdded DESC
LIMIT 0, 30
Long Term Solution
As you've experienced, searching for specifics in denormalized data does not perform well, and makes queries overly complicated. You need to change the PRX_LISTS table so one row contains a unique combination of the SPORTSITEM.ownerid and PRX_TAGS.id, and whatever other columns you might need. I'd recommend renaming as well - lists of what, exactly? The name is too generic:
CREATE TABLE SPORTSITEM_TAGS_XREF (
sportsitem_ownerid INT,
tag_id INT,
PRIMARY KEY (sportsitem_ownerid, tag_id)
)
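A runnable sketch of this cross-reference design, using Python's sqlite3 in place of MySQL, could look like this. The sample rows, the Name column and the extra index on tag_id are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prx_sportsitems (ID INTEGER PRIMARY KEY, Name TEXT, DateAdded TEXT);
    CREATE TABLE prx_tags (ID INTEGER PRIMARY KEY, TagName TEXT UNIQUE);
    CREATE TABLE sportsitem_tags_xref (
        sportsitem_ownerid INTEGER,
        tag_id INTEGER,
        PRIMARY KEY (sportsitem_ownerid, tag_id)
    );
    -- Lookups start from the tag, so index the xref by tag_id as well.
    CREATE INDEX idx_xref_tag ON sportsitem_tags_xref (tag_id);

    INSERT INTO prx_sportsitems VALUES
        (1, 'Step class', '2010-01-03'),
        (2, 'Yoga mat', '2010-01-02'),
        (3, 'Spin bike', '2010-01-01');
    INSERT INTO prx_tags VALUES (1, 'aerobic'), (2, 'indoor');
    INSERT INTO sportsitem_tags_xref VALUES (1, 1), (3, 1), (2, 2), (3, 2);
""")

# One row per (item, tag) pair turns the tag search into plain equality joins.
query = """
    SELECT si.Name
    FROM prx_sportsitems si
    JOIN sportsitem_tags_xref x ON x.sportsitem_ownerid = si.ID
    JOIN prx_tags t ON t.ID = x.tag_id
    WHERE t.TagName = ?
    ORDER BY si.DateAdded DESC
    LIMIT 30
"""
aerobic_items = [row[0] for row in conn.execute(query, ("aerobic",))]
print(aerobic_items)
```

Because every comparison is an equality on an indexed column, no LIKE or FIND_IN_SET over a delimited string is needed.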
Don't make any changes without looking at the execution plan. (And post that here, too, by editing your original question.)
The way your LIKE clause is constructed, MySQL can't use an index.
The LIKE clause is a symptom. Your table structure is more likely the problem.
You'll probably get at least one order of magnitude improvement by building sane tables.
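The effect of a leading-wildcard LIKE on index use can be seen in a small sketch. This uses SQLite's EXPLAIN QUERY PLAN through Python, not MySQL's EXPLAIN (whose output format differs), and the item_tags table and index name are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item_tags (OwnerID INTEGER, TagName TEXT, Note TEXT);
    CREATE INDEX idx_item_tags_name ON item_tags (TagName);
""")

def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

# A pattern starting with '%' cannot be matched against a sorted index.
like_plan = plan("SELECT * FROM item_tags WHERE TagName LIKE '%aerobic%'")
# An equality test can seek straight into the index.
equal_plan = plan("SELECT * FROM item_tags WHERE TagName = 'aerobic'")
print(like_plan)
print(equal_plan)
```

The exact wording of the plan text varies between SQLite versions, but the LIKE query should be reported as a scan and the equality query as a search using the index.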
I'm really regretting not looking into optimization beforehand
That's not what caused your problem. Being ignorant of the fundamentals of database design caused your problem. (That's an observation, not a criticism. You can fix ignorance. You can't fix stupid.)
Later:
Post your existing table structure and your proposed changes. You'll be a lot happier with our ability to predict what your code will do than with our ability to predict what your description of a piece of your code will do.

How would you give the user a preference for how a name from an SQL table is to be printed?

I've been given a task by a prospective employer that involves SQL tables. One requirement they mentioned is that they want the name retrieved from a table called "Employees" to come in the format of either "<LastName>, <FirstName>" OR "<FirstName> <MiddleName> <LastName> <Suffix>".
This is confusing to me because it sounds like they're asking me to write a function or something. I could probably do this in a programming language and retrieve the information that way, but doing this exclusively in SQL seems weird to me. I'm rather new to SQL, and my familiarity with it doesn't exceed simple tasks such as creating databases, tables, and fields, inserting data, updating fields in records, deleting records that meet a specific condition, and selecting fields from tables.
I hope that this isn't considered cheating since I mentioned that this was for a prospective employer, but if I was still in school then I could just outright ask a professor where I can find a clue for this or he would've outright told me in class. But, for a prospective job, I'm not sure who I would ask about any confusion. Thanks in advance for anyone's help.
A SQL query has a fixed column output: you can't change it. To achieve this, you could concatenate the parts inside a CASE expression to make one varchar column, but then you need something (a parameter) to switch the CASE.
So, this is presentation, not querying SQL.
I'd return all 4 columns mentioned and decide how I want them in the client.
Unless you have just been asked for 2 different queries on the same SQL table.
You haven't specified the RDBMS, but in SQL Server you could accomplish this using Computed Columns.
Typically, you would use a View over the table..
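A sketch of the CASE-with-a-parameter idea, run through Python's sqlite3: the :style parameter and its values 'last_first' and 'full' are hypothetical, the column names are taken from the placeholders in the question, and the || concatenation operator is SQLite's (SQL Server would use + or CONCAT).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (FirstName TEXT, MiddleName TEXT, LastName TEXT, Suffix TEXT);
    INSERT INTO Employees VALUES ('John', 'Quincy', 'Adams', 'Jr.'),
                                 ('Ada', NULL, 'Lovelace', NULL);
""")

# One output column whose format is switched by the :style parameter.
# ' ' || NULL is NULL, so COALESCE drops missing middle names and suffixes.
query = """
    SELECT CASE :style
             WHEN 'last_first' THEN LastName || ', ' || FirstName
             ELSE FirstName
                  || COALESCE(' ' || MiddleName, '')
                  || ' ' || LastName
                  || COALESCE(' ' || Suffix, '')
           END AS DisplayName
    FROM Employees
    ORDER BY LastName
"""

def display_names(style):
    """Return the formatted names for the given (hypothetical) style key."""
    return [row[0] for row in conn.execute(query, {"style": style})]

print(display_names("last_first"))
print(display_names("full"))
```

The same CASE expression could be placed in a view or computed column, but returning the four raw columns and formatting them in the client remains the simpler option.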

Has anyone written a higher level query language (than SQL) that generates SQL for common tasks, on limited schemas

SQL is the standard in query languages; however, it is sometimes a bit verbose. I am currently writing a limited query language that will make my common queries quicker to write, with a bit less mental overhead.
If you write a query over a good database schema, you will essentially always be joining over the primary key and foreign key fields, so I think it should be unnecessary to state them each time.
So a query could look like this:
select s.name, region.description from shop s
where monthly_sales.amount > 4000 and s.staff < 10
The relations would be
shop -- many to one -- region,
shop -- one to many -- monthly_sales
The SQL equivalent would be
select distinct s.name, r.description
from shop s
join region r on s.region_id = r.region_id
join monthly_sales ms on ms.shop_id = s.shop_id
where ms.amount > 4000 and s.staff < 10
(the distinct is there because you are joining to a one-to-many table (monthly_sales) and you are not selecting any fields from that table)
I understand that the original query above may be ambiguous for certain schemas, i.e. if there are two relationship routes between two of the tables. However, there are ways around most of these, especially if you limit the schemas allowed. Most possible schemas are not worth considering anyway.
I was just wondering whether there have been any attempts to do something like this?
(I have seen how most ORM solutions make some queries easier.)
EDIT: I actually really like SQL. I have used ORM solutions and looked at LINQ. The best I have seen so far is SQLAlchemy (for Python). However, as far as I have seen, they do not offer what I am after.
Hibernate and LinqToSQL do exactly what you want
I think you'd be better off spending your time just writing more SQL and becoming more comfortable with it. Most developers I know have gone through just this progression, where their initial exposure to SQL inspires them to bypass it entirely by writing their own ORM or set of helper classes that auto-generates the SQL for them. Usually they continue adding to it and refining it until it's just as complex (if not more so) than SQL. The results are sometimes fairly comical - I inherited one application that had classes named "And.cs" and "Or.cs", whose main functions were to add the words " AND " and " OR ", respectively, to a string.
SQL is designed to handle a wide variety of complexity. If your application's data design is simple, then the SQL to manipulate that data will be simple as well. It doesn't make much sense to use a different sort of query language for simple things, and then use SQL for the complex things, when SQL can handle both kinds of thing well.
I believe that any (decent) ORM would be of help here..
Entity SQL is slightly higher level (in places) than Transact SQL. Other than that, HQL, etc. For object-model approaches, LINQ (IQueryable<T>) is much higher level, allowing simple navigation:
var qry = from cust in db.Customers
select cust.Orders.Sum(o => o.OrderValue);
etc
Martin Fowler plumbed a whole load of energy into this and produced the Active Record pattern. I think this is what you're looking for?
Not sure if this falls in what you are looking for but I've been generating SQL dynamically from the definition of the Data Access Objects; the idea is to reflect on the class and by default assume that its name is the table name and all properties are columns. I also have search criteria objects to build the where part. The DAOs may contain lists of other DAO classes and that directs the joins.
Since you asked for something to take care of most of the repetitive SQL, this approach does it. And when it doesn't, I just fall back on handwritten SQL or stored procedures.
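The idea of letting declared relationships direct the joins can be sketched in a few lines of Python. This is only an illustration, not the DAO reflection code described above: the FOREIGN_KEYS mapping, the function names, and the assumption that both sides of a relationship use the same column name are all hypothetical.

```python
# Hypothetical schema metadata: (child table, parent table) -> join column.
FOREIGN_KEYS = {
    ("shop", "region"): "region_id",
    ("monthly_sales", "shop"): "shop_id",
}

def join_clause(base, other):
    """Build the JOIN for two directly related tables from the metadata."""
    # The relationship may be declared in either direction.
    for child, parent in ((base, other), (other, base)):
        column = FOREIGN_KEYS.get((child, parent))
        if column is not None:
            return f"JOIN {other} ON {other}.{column} = {base}.{column}"
    raise ValueError(f"no declared relationship between {base} and {other}")

def build_query(columns, base, joined, where):
    """Assemble a SELECT that joins `base` to each table in `joined`."""
    lines = [f"SELECT DISTINCT {', '.join(columns)}", f"FROM {base}"]
    lines += [join_clause(base, table) for table in joined]
    lines.append(f"WHERE {where}")
    return "\n".join(lines)

sql = build_query(
    ["shop.name", "region.description"],
    base="shop",
    joined=["region", "monthly_sales"],
    where="monthly_sales.amount > 4000 AND shop.staff < 10",
)
print(sql)
```

This handles only tables directly related to the base table; the ambiguous case with two routes between tables would need extra metadata, and anything more complex falls back to handwritten SQL.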

SQL Query Help - Scoring Multiple Choice Tests

Say I have a Student table, it's got an int ID. I have a fixed set of 10 multiple choice questions with 5 possible answers. I have a normalized answer table that has the question id, the Student.answer (1-5) and the Student.ID
I'm trying to write a single query that will return all scores over a certain percentage. To this end I wrote a simple UDF that accepts the Student.answers and the correct answers, so it has 20 parameters.
I'm starting to wonder if it's better to denormalize the answer table, bring it into my application and let my application do the scoring.
Anyone ever tackle something like this and have insight?
If I understand your schema and question correctly, how about something like this:
select student_name, score
from students
join (select student_answers.student_id, count(*) as score
from student_answers, answer_key
where student_answers.question_id = answer_key.question_id
and student_answers.answer = answer_key.answer
group by student_answers.student_id)
as student_scores on students.student_id = student_scores.student_id
where score >= 7
order by score, student_name
That should select the students with a score of 7 or more, for example. Just adjust the where clause for your purposes.
I would probably leave it up to your application to perform the scoring. Check out Maybe Normalizing Isn't Normal by Jeff Atwood.
The architecture you are talking about could become very cumbersome in the long run, and if you need to change the questions it means more changes to the UDF you are using.
I would think you could probably do your analysis in code without necessarily de-normalizing your database. De-normalization could also lend to inflexibility, or at least added expense to update, down the road.
No way, you definitely want to keep it normalized. It's not even that hard of a query.
Basically, you want to left join the student's correct answers with the total answers for that question, and do a count. This will give you the percent correct. Do that for each student, and put the minimum percent correct in a where clause.
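A set-based version of this percent-correct idea might look like the sketch below, run through Python's sqlite3. The table names follow the earlier answer (student_answers, answer_key); the sample data, the four-question key and the :minimum parameter are assumptions, and a plain join with a CASE count is used here rather than a left join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE answer_key (question_id INTEGER PRIMARY KEY, answer INTEGER);
    CREATE TABLE student_answers (student_id INTEGER, question_id INTEGER, answer INTEGER);
    INSERT INTO answer_key VALUES (1, 3), (2, 1), (3, 5), (4, 2);
    INSERT INTO student_answers VALUES
        (10, 1, 3), (10, 2, 1), (10, 3, 5), (10, 4, 4),   -- 3 of 4 correct
        (11, 1, 3), (11, 2, 2), (11, 3, 1), (11, 4, 4),   -- 1 of 4 correct
        (12, 1, 3), (12, 2, 1), (12, 3, 5), (12, 4, 2);   -- 4 of 4 correct
""")

# Count matches against the key per student, divide by the number of
# questions in the key, then filter on the resulting percentage.
query = """
    SELECT student_id, percent_correct
    FROM (SELECT sa.student_id,
                 100.0 * SUM(CASE WHEN sa.answer = ak.answer THEN 1 ELSE 0 END)
                       / (SELECT COUNT(*) FROM answer_key) AS percent_correct
          FROM student_answers sa
          JOIN answer_key ak ON ak.question_id = sa.question_id
          GROUP BY sa.student_id) AS scores
    WHERE percent_correct >= :minimum
    ORDER BY percent_correct DESC
"""
passing = conn.execute(query, {"minimum": 70}).fetchall()
print(passing)
```

Dividing by the size of the answer key, rather than by the number of rows a student submitted, means unanswered questions count as wrong. The tables stay normalized and no 20-parameter function is needed.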
Denormalization is generally considered a last resort. The problem seems very similar to survey applications, which are very common. Without seeing your data model, it's difficult to propose a solution, but I will say that it is definitely possible. I'm wondering why you need 20 parameters to that function?
A relational set-based solution will be simpler and faster in most cases.
This query should be quite easy... assuming you have the correct answer stored in the question table. You do have the correct answer stored in the question table, right?