I've been trying to come up with a good design pattern for mapping data contained in relational databases to the business objects I've created but I keep hitting a wall.
Consider the following tables:
TYPE: typeid, description
USER: userid, username, usertypeid->TYPE.typeid, imageid->IMAGE.imageid
IMAGE: imageid, location, imagetypeid->TYPE.typeid
I would like to gather all the information regarding a specific user. Creating a query for this isn't too difficult.
SELECT u.*, ut.*, i.*, it.* FROM user u
INNER JOIN type ut ON ut.typeid = u.usertypeid
INNER JOIN image i ON i.imageid = u.imageid
INNER JOIN type it ON it.typeid = i.imagetypeid
WHERE u.userid = #userid
The problem is that the field names collide and then I'm forced to alias every single field which gets out of hand very quickly.
Does anyone have a decent design pattern for this kind of thing?
I've thought about retrieving multiple results from a single stored procedure and then using a dataset to iterate through each one but I'm worried that some performance issues might bite me in the butt later. For example instead of the above query something like:
SELECT u.*, t.* FROM user u
INNER JOIN type t ON t.typeid = u.usertypeid
WHERE u.userid = #userid;
SELECT i.*, t.* FROM image i
INNER JOIN type t ON t.typeid = i.imagetypeid
INNER JOIN user u ON u.imageid = i.imageid
WHERE u.userid = #userid;
Does that seem like a decent solution? Can anyone foresee any issues with this approach?
Never use the SQL * wildcard in production code. Always spell out all the columns you want to retrieve.
Then aliasing some of them doesn't seem like such a huge amount of extra work.
Re your comment asking for background and reasoning:
Sometimes you don't really need every column from all tables, and fetching them can be needlessly costly (especially for large strings and blobs). There is no SQL syntax for "all columns except the following exceptions."
You can't alias columns that you fetch using the wildcard. Once you need to alias any of the columns, you need to expand the wildcard to list all the columns explicitly.
If the table structure changes, e.g. columns are renamed, reordered, dropped, or added, then the wildcard fetches them all, by position as defined in the tables. This may seem like a convenience, but not when your application depends on columns being in the result set by a given name or in a given position. You can get mysterious bugs where your application displays columns in the wrong order (if referencing columns by position), or shows them as blank (if referencing columns by name).
However, if the SQL query names columns explicitly, you can employ the "Fail Early" principle. This helps debugging, because it leads you directly to the SQL query that needs to be edited to account for the schema change.
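A minimal sketch of what explicit, aliased columns buy you when mapping a row to a business object. It uses Python's sqlite3 module with an in-memory database purely as a stand-in for the real engine; the aliases (user_type_description, image_location, image_type_description), the load_user helper, and the sample rows are assumptions for illustration, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be read by column name
conn.executescript("""
    CREATE TABLE type  (typeid INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE image (imageid INTEGER PRIMARY KEY, location TEXT,
                        imagetypeid INTEGER REFERENCES type(typeid));
    CREATE TABLE user  (userid INTEGER PRIMARY KEY, username TEXT,
                        usertypeid INTEGER REFERENCES type(typeid),
                        imageid INTEGER REFERENCES image(imageid));
    -- sample data (assumed)
    INSERT INTO type  VALUES (1, 'admin'), (2, 'avatar');
    INSERT INTO image VALUES (10, '/img/a.png', 2);
    INSERT INTO user  VALUES (100, 'alice', 1, 10);
""")

# Only the colliding columns (the two descriptions) need an alias.
USER_QUERY = """
    SELECT u.userid, u.username,
           ut.description AS user_type_description,
           i.location     AS image_location,
           it.description AS image_type_description
    FROM user u
    INNER JOIN type  ut ON ut.typeid = u.usertypeid
    INNER JOIN image i  ON i.imageid = u.imageid
    INNER JOIN type  it ON it.typeid = i.imagetypeid
    WHERE u.userid = ?
"""

def load_user(userid):
    """Return the user as a dict keyed by column alias, or None if not found."""
    row = conn.execute(USER_QUERY, (userid,)).fetchone()
    return dict(row) if row is not None else None

user = load_user(100)
```

If a column is later renamed or dropped, the query itself fails at the database, which points straight at the statement that needs editing.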
I'm trying to select just one account using SQL Server but am getting the following error:
ERROR: The text data type cannot be selected as DISTINCT because
it is not comparable. Error Code: 421
with the following statement:
select DISTINCT ad.*,
acc.companyname,
acc.accountnumber
from address ad
join AddressLink al on al.AddressID = ad.id
join account acc on acc.ID = al.ParentID
where acc.accountnumber like '11227'
What have I done wrong?
Edit:
New query:
select address.ID,
address.StreetAddress1,
address.StreetAddress2,
address.City,
Address.State,
Address.PostalCode,
Address.ClassTypeID,
account.companyname,
account.accountnumber,
addresslink.ID as addressLinkID,
addresslink.addresstypeid
from address
join AddressLink on address.id = addresslink.AddressID
join account on addresslink.ParentID = account.ID
where account.CompanyName like 'company name'
All the company names that I've had to blur are identical.
Try:
select ad.*,
l.companyname,
l.accountnumber
from address ad
join (select DISTINCT al.AddressID,
acc.companyname,
acc.accountnumber
from account acc
join AddressLink al on acc.ID = al.ParentID
where acc.accountnumber like '11227') l
on l.AddressID = ad.id
"Distinct", in the context you have, is trying to do a distinct on ALL columns. That said, some data types are NOT comparable, such as TEXT. So, if your table has any columns of these non-"Distinctable" types, that is what is crashing your query.
However, to get around this, if you do something like
CONVERT( char(60), YourTextColumn ) as YourTextColumn,
It should get that for you... at least it's now thinking the final column type is "char"acter and CAN compare it.
You should check the data types of the columns in the address table. My guess is that one or more of them has the data type text, ntext or image.
One of the restrictions of using text, ntext or image data types is that columns defined of these data types cannot be used as part of a SELECT statement that includes the DISTINCT clause.
For what it's worth, the MSDN article for ntext, text, and image (Transact-SQL) recommends avoiding these data types and using nvarchar(max), varchar(max), and varbinary(max) instead. You may want to consider changing how that table is defined.
The accepted answer from Mark B shows a subquery (good idea to limit the domain of the DISTINCT) on AddressLink.AddressId, Account.CompanyName, and Account.AccountNumber.
Let me ask this: Does AddressLink allow more than one record to have the same value in the ParentId and AddressId fields?
If not, and assuming that Mark B's answer works, then just remove the DISTINCT because you're never going to get any duplicates inside of that subquery.
Leaving the DISTINCT in causes a performance hit because the DB has to create a temporary table that is either indexed with a btree or a hash and it has to insert every value returned by the subquery into that table to check if it invalidates the uniqueness constraint on those three fields. Note that the "optimizer" doesn't know that there won't be any dupes... if you tell it to check for DISTINCT, it will check it... With a btree index this is going to cause O(n log n) work on the number of rows returned; with a hash it would cause O(n) work but who knows how big the constant factor is in relation to the other work you're doing (it's probably larger than everything else you're doing meaning this could make this run half as fast as without the DISTINCT).
So my answer is Mark B's answer without the DISTINCT in the subquery. Let me know if AddressLink does allow repeats (can't imagine why it would).
I have an order system. Users can be attached to different orders as different types of user. They can download documents associated with an order. Documents are only given to certain types of users on the order. I'm having trouble writing the query to check a user's permission to view a document and select the info about the document.
I have the following tables and (applicable) fields:
Docs: DocNo, FileNo
DocAccess: DocNo, UserTypeWithAccess
FileUsers: FileNo, UserType, UserNo
I have the following query:
SELECT Docs.*
FROM Docs
WHERE DocNo = 1000
AND EXISTS (
SELECT * FROM DocAccess
LEFT JOIN FileUsers
ON FileUsers.UserType = DocAccess.UserTypeWithAccess
AND FileUsers.FileNo = Docs.FileNo /* Errors here */
WHERE DocAccess.UserNo = 2000 )
The trouble is that in the Exists Select, it does not recognize Docs (at Docs.FileNo) as a valid table. If I move the second on argument to the where clause it works, but I would rather limit the initial join rather than filter them out after the fact.
I can get around this a couple ways, but this seems like it would be best. Anything I'm missing here? Or is it simply not allowed?
I think this is a limitation of your database engine. In most databases, docs would be in scope for the entire subquery, including both the where and on clauses.
However, you do not need to worry about where you put the particular clause. SQL is a descriptive language, not a procedural language. The purpose of SQL is to describe the output. The SQL engine, parser, and compiler should choose the optimal execution path. That is not always true, but move the condition to the where clause and don't worry about it.
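Moving the correlation into the WHERE clause of the subquery can be sketched as follows, using Python's sqlite3 module and an in-memory database as a stand-in for the real engine. This sketch assumes UserNo lives on FileUsers (as the table list in the question suggests) and that DocAccess should also be correlated to the outer row on DocNo; the visible_doc helper and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Docs      (DocNo INTEGER PRIMARY KEY, FileNo INTEGER);
    CREATE TABLE DocAccess (DocNo INTEGER, UserTypeWithAccess TEXT);
    CREATE TABLE FileUsers (FileNo INTEGER, UserType TEXT, UserNo INTEGER);
    -- sample data (assumed): only 'buyer' users may see document 1000
    INSERT INTO Docs      VALUES (1000, 1);
    INSERT INTO DocAccess VALUES (1000, 'buyer');
    INSERT INTO FileUsers VALUES (1, 'buyer', 2000), (1, 'seller', 3000);
""")

CAN_VIEW = """
    SELECT d.DocNo, d.FileNo
    FROM Docs d
    WHERE d.DocNo = ?
      AND EXISTS (
          SELECT 1
          FROM DocAccess a
          INNER JOIN FileUsers u ON u.UserType = a.UserTypeWithAccess
          WHERE a.DocNo  = d.DocNo   -- correlate access rules to the outer document
            AND u.FileNo = d.FileNo  -- correlation kept in WHERE, not in ON
            AND u.UserNo = ?
      )
"""

def visible_doc(doc_no, user_no):
    """Return (DocNo, FileNo) if the user may view the document, else None."""
    return conn.execute(CAN_VIEW, (doc_no, user_no)).fetchone()
```

With the sample data, user 2000 (a buyer on file 1) gets the row back, while user 3000 (a seller) gets nothing.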
It is not clear to me why you need to join with FileUsers at all in your subquery.
What is the purpose and idea of the query (in plain English)?
In any case, if you do need to join with FileUsers, then I suggest using an inner join and moving the second filter to the WHERE condition. I don't think you can use it in the JOIN condition of a subquery; at least I've never seen it used this way before. I believe you can only correlate through the WHERE clause.
You have to use aliases to get this working:
SELECT
doc.*
FROM
Docs doc
WHERE
doc.DocNo = 1000
AND EXISTS (
SELECT
*
FROM
DocAccess acc
LEFT OUTER JOIN
FileUsers usr
ON
usr.UserType = acc.UserTypeWithAccess
AND usr.FileNo = doc.FileNo
WHERE
acc.UserNo = 2000
)
This also makes it more clear which table each field belongs to (think about using the same table twice or more in the same query with different aliases).
If you would only like to limit the output to one row you can use TOP 1:
SELECT TOP 1
doc.*
FROM
Docs doc
INNER JOIN
FileUsers usr
ON
usr.FileNo = doc.FileNo
INNER JOIN
DocAccess acc
ON
acc.UserTypeWithAccess = usr.UserType
WHERE
doc.DocNo = 1000
AND acc.UserNo = 2000
Of course the second query works a bit differently than the first one (both JOINs are INNER). Depending on your data model, you might even leave the TOP 1 out of that query.
I am wondering which is the best way to query data from a large database.
Say I have a requirement to get a list of all users who live in the United States, along with their orders and the products belonging to their orders. For simplicity, we have a user table, which has a CountryId in that table... and then an Order table, with a userId.. then maybe an OrderProduct table to list many products to an order (and many orders can contain the same product).
My question is, would it be better to maybe create a temp table by
SELECT userId FROM dbo.User WHERE CountryId = #CountryId
We now have the relevant users in a temp table.
Then, do a
Select p.ProductDescription ...
From #TempTable tmp
INNER JOIN Order o
ON o.UserId = tmp.UserId
INNER JOIN OrderProduct op
ON op.OrderID = o.OrderId
INNER JOIN Product p
ON p.ProductId = op.ProductId
So, what I am doing is getting the users I need.... and moving that into a temp table, then using that temp table to filter the data for the main query.
Or, is it as efficient, if not more, to just do it all in one...
Select ... from User u
INNER JOIN Order o
....
WHERE u.UserId = #UserId
?
In general you want to write your entire request in one query, because that gives the database query optimizer the best possible chance of coming up with the most efficient possibility. With a decent database, it will generally do a wonderful job at this, and any effort on your part to help it along is more likely to hurt than help.
If the database is insufficiently fast, first look into things like whether you have the right indexes, tuning your database, etc. Those are the most common causes of problems and should clear up most remaining problems quite promptly.
Only after you have given the database every chance to get the right answer in the right way, should you consider trying to use temp tables to force a particular query plan. (There are other reasons to use temp tables. But for getting good query plans, it should be a last resort.)
There is an old pair of rules about optimization that applies here in spades:
1. Don't.
2. (For experts only.) Not yet.
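A minimal sketch of the single-query form, run against an in-memory SQLite database through Python's sqlite3 module as a stand-in for the real engine. The tables are named Users and Orders here (Order is a reserved word), and the column list, the orders_for_country helper, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users        (UserId INTEGER PRIMARY KEY, CountryId INTEGER);
    CREATE TABLE Orders       (OrderId INTEGER PRIMARY KEY, UserId INTEGER);
    CREATE TABLE Product      (ProductId INTEGER PRIMARY KEY, ProductDescription TEXT);
    CREATE TABLE OrderProduct (OrderId INTEGER, ProductId INTEGER);
    -- sample data (assumed)
    INSERT INTO Users        VALUES (1, 1), (2, 2);
    INSERT INTO Orders       VALUES (10, 1), (11, 2);
    INSERT INTO Product      VALUES (100, 'widget'), (101, 'gadget');
    INSERT INTO OrderProduct VALUES (10, 100), (10, 101), (11, 100);
""")

# One statement: the country filter and all joins are visible to the
# optimizer at once, with no intermediate temp table.
SINGLE_QUERY = """
    SELECT u.UserId, o.OrderId, p.ProductDescription
    FROM Users u
    INNER JOIN Orders o        ON o.UserId = u.UserId
    INNER JOIN OrderProduct op ON op.OrderId = o.OrderId
    INNER JOIN Product p       ON p.ProductId = op.ProductId
    WHERE u.CountryId = ?
    ORDER BY o.OrderId, p.ProductId
"""

def orders_for_country(country_id):
    """Return (UserId, OrderId, ProductDescription) rows for one country."""
    return conn.execute(SINGLE_QUERY, (country_id,)).fetchall()

rows = orders_for_country(1)
```

If this turns out to be slow on real data, indexes on the join and filter columns would be the first thing to check, in line with the advice above.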
You could create a view that holds the data you need.
A view is created like this:
CREATE VIEW view_name AS
SELECT column_name(s)
FROM table_name
WHERE condition
You can then query the view you created just as you would query a table.
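For illustration, here is a small sketch of that idea using Python's sqlite3 module with an in-memory database. The Users table, the view name UsersInCountry1, and the sample rows are hypothetical; the real view would select whatever columns and joins the report needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (UserId INTEGER PRIMARY KEY, UserName TEXT, CountryId INTEGER);
    INSERT INTO Users VALUES (1, 'alice', 1), (2, 'bob', 2), (3, 'carol', 1);

    -- hypothetical view: users whose CountryId is 1
    CREATE VIEW UsersInCountry1 AS
        SELECT UserId, UserName
        FROM Users
        WHERE CountryId = 1;
""")

# The view is queried exactly like a table.
names = [name for (name,) in conn.execute(
    "SELECT UserName FROM UsersInCountry1 ORDER BY UserId")]
```

Note that an ordinary (non-materialized) view stores only the query definition, so it tidies up the calling code rather than changing how much work the database does.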
Say I have a select statement that goes..
select * from animals
That gives a query result of all the columns in the table.
Now, suppose the 42nd column of the table animals is is_parent, and I want to return that in my results, just after gender, so I can see it more easily. But I also want all the other columns.
select is_parent, * from animals
This returns ORA-00936: missing expression.
The same statement will work fine in Sybase, and I know that you need to add a table alias to the animals table to get it to work ( select is_parent, ani.* from animals ani), but why must Oracle need a table alias to be able to work out the select?
Actually, it's easy to solve the original problem. You just have to qualify the *.
select is_parent, animals.* from animals;
should work just fine. Aliases for the table names also work.
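A quick sketch of the result shape the qualified form produces, using Python's sqlite3 module and a cut-down, hypothetical animals table. SQLite is only a stand-in here: it also accepts the unqualified form, so this does not reproduce the ORA-00936 error, it just shows that is_parent comes first and then appears again among the table's own columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- cut-down animals table (assumed columns)
    CREATE TABLE animals (id INTEGER PRIMARY KEY, name TEXT,
                          gender TEXT, is_parent INTEGER);
    INSERT INTO animals VALUES (1, 'Rex', 'M', 1);
""")

cursor = conn.execute("SELECT is_parent, animals.* FROM animals")
columns = [d[0] for d in cursor.description]  # column names, in result order
row = cursor.fetchone()
```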
There is no merit in doing this in production code. We should explicitly name the columns we want rather than using the SELECT * construct.
As for ad hoc querying, get yourself an IDE - SQL Developer, TOAD, PL/SQL Developer, etc - which allows us to manipulate queries and result sets without needing extensions to SQL.
Good question, I've often wondered this myself but have then accepted it as one of those things...
Similar problem is this:
sql>select geometrie.SDO_GTYPE from ngg_basiscomponent
ORA-00904: "GEOMETRIE"."SDO_GTYPE": invalid identifier
where geometrie is a column of type mdsys.sdo_geometry.
Add an alias and the thing works.
sql>select a.geometrie.SDO_GTYPE from ngg_basiscomponent a;
Lots of good answers so far on why select * shouldn't be used and they're all perfectly correct. However, I don't think any of them answers the original question on why the particular syntax fails.
Sadly, I think the reason is... "because it doesn't".
I don't think it's anything to do with single-table vs. multi-table queries:
This works fine:
select *
from
person p inner join user u on u.person_id = p.person_id
But this fails:
select p.person_id, *
from
person p inner join user u on u.person_id = p.person_id
While this works:
select p.person_id, p.*, u.*
from
person p inner join user u on u.person_id = p.person_id
It might be some historical compatibility thing with 20-year old legacy code.
Another for the "but why!!!" bucket, along with why can't you group by an alias?
The use case for the alias.* format is as follows
select parent.*, child.col
from parent join child on parent.parent_id = child.parent_id
That is, selecting all the columns from one table in a join, plus (optionally) one or more columns from other tables.
The fact that you can use it to select the same column twice is just a side-effect. There is no real point to selecting the same column twice and I don't think laziness is a real justification.
Select * in the real world is only dangerous when referring to columns by index number after retrieval rather than by name; the bigger problem is inefficiency when not all columns are required in the result set (network traffic, CPU and memory load).
Of course, if you're adding columns from other tables (as is the case in this example), it can be dangerous, as these tables may over time have columns with matching names. select *, x in that case would fail if a column x is added to the table that previously didn't have it.
why must Oracle need a table alias to be able to work out the select
Teradata requires the same. As both are quite old (maybe better to call them mature :-) DBMSes, this might be for historical reasons.
My usual explanation is: an unqualified * means everything/all columns and the parser/optimizer is simply confused because you request more than everything.
Due to a variety of design decisions, we have a table, 'CustomerVariable'. CustomerVariable has three bits of information--its own id, an id to Variable (a list of possible settings the customer can have), and the value for that variable. The Variable table, on the other hand, has the information on a default--in case the CustomerVariable is not set.
This works well in most situations, allowing us not to have to create an insanely long list of information--especially in a case where there are 16 similar variables that need to be handled for a customer.
The problem comes in trying to get this information into a select. So far, our 'best' solution involves far too many joins to be efficient--we get a list of the 16 VariableIds we need information on, setting them into variables, first. Later on, however, we have to do this:
CROSS JOIN dbo.Variable v01
LEFT JOIN dbo.CustomerVariable cv01 ON cv01.customerId = c.id
AND cv01.variableId = v01.id
CROSS JOIN dbo.Variable v02
LEFT JOIN dbo.CustomerVariable cv02 ON cv02.customerId = c.id
AND cv02.variableId = v02.id
-- snip --
CROSS JOIN dbo.Variable v16
LEFT JOIN dbo.CustomerVariable cv16 ON cv16.customerId = c.id
AND cv16.variableId = v16.id
WHERE
v01.id = #cv01VariableId
AND v02.id = #cv02VariableId
-- snip --
AND v16.id = #cv16VariableId
I know there has to be a better way, but we can't seem to find it amidst crunch time. Any help would be greatly appreciated.
If your data set is relatively small and not too volatile, you may want to use materialized views (assuming your database supports them) to optimize the lookup.
If materialized views are not an option, consider writing a stored procedure that retrieves that data in two passes:
First retrieve all of the CustomerVariables available for a particular customer (or set of customers)
Next, retrieve all of the default values from the Variables table
Perform a non-distinct union on the results merging the defaults in wherever a CustomerVariable record is missing.
Essentially, this is the equivalent of:
SELECT variableId,
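-- NULL must be tested with IS NULL; "= NULL" is never true
CASE WHEN CV.variableId IS NULL THEN VR.defaultValue ELSE CV.value END
FROM Variable VR
-- the customer filter belongs in the ON clause; in WHERE it would
-- discard the unmatched rows that are supposed to fall back to the default
LEFT JOIN CustomerVariable CV on CV.variableId = VR.variableId
AND CV.customerId = c.id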
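The default-or-override lookup can be sketched end to end as follows, using Python's sqlite3 module and an in-memory database. The column names follow the question (Variable.id, CustomerVariable.customerId and variableId); the name and defaultValue columns, the effective_values helper, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Variable (id INTEGER PRIMARY KEY, name TEXT, defaultValue TEXT);
    CREATE TABLE CustomerVariable (id INTEGER PRIMARY KEY, customerId INTEGER,
                                   variableId INTEGER REFERENCES Variable(id),
                                   value TEXT);
    -- sample data (assumed): customer 42 overrides only 'color'
    INSERT INTO Variable VALUES (1, 'color', 'blue'), (2, 'size', 'medium');
    INSERT INTO CustomerVariable VALUES (1, 42, 1, 'red');
""")

# The customer filter sits in the ON clause so that variables without a
# customer-specific row survive the LEFT JOIN and pick up the default.
EFFECTIVE_VALUES = """
    SELECT v.name, COALESCE(cv.value, v.defaultValue) AS value
    FROM Variable v
    LEFT JOIN CustomerVariable cv
           ON cv.variableId = v.id
          AND cv.customerId = ?
    ORDER BY v.id
"""

def effective_values(customer_id):
    """Return {variable name: customer value, or the default if unset}."""
    return dict(conn.execute(EFFECTIVE_VALUES, (customer_id,)).fetchall())
```

One design caveat: COALESCE also falls back to the default when a customer row exists but its value is NULL, whereas a CASE on cv.variableId IS NULL would keep the NULL. Which behavior is right depends on what a NULL setting means in your data.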
The type of query you want is called a pivot table or crosstab query.
By far the easiest way of dealing with this is to create a view based off of a crosstab query. This will flip the columns from being vertical to being horizontal like a regular sql table. Once that is done, just query the view. Easy. ;)
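One portable way to write such a crosstab view is conditional aggregation, sketched below with Python's sqlite3 module and an in-memory database. This is only one possible formulation (some engines, SQL Server among them, also offer a dedicated PIVOT operator). The view name CustomerSettings, the two example variables, and the sample rows are hypothetical; a real view would have one output column per variable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Variable (id INTEGER PRIMARY KEY, name TEXT, defaultValue TEXT);
    CREATE TABLE CustomerVariable (customerId INTEGER, variableId INTEGER, value TEXT);
    -- sample data (assumed)
    INSERT INTO Variable VALUES (1, 'color', 'blue'), (2, 'size', 'medium');
    INSERT INTO CustomerVariable VALUES (42, 1, 'red'), (43, 2, 'large');

    -- One row per customer, one column per variable. MAX(CASE ...) picks the
    -- customer's value for that variable; COALESCE supplies the default.
    CREATE VIEW CustomerSettings AS
        SELECT cv.customerId,
               COALESCE(MAX(CASE WHEN cv.variableId = 1 THEN cv.value END),
                        (SELECT defaultValue FROM Variable WHERE id = 1)) AS color,
               COALESCE(MAX(CASE WHEN cv.variableId = 2 THEN cv.value END),
                        (SELECT defaultValue FROM Variable WHERE id = 2)) AS size
        FROM CustomerVariable cv
        GROUP BY cv.customerId;
""")

settings = conn.execute(
    "SELECT customerId, color, size FROM CustomerSettings ORDER BY customerId"
).fetchall()
```

Because the view groups CustomerVariable rows, a customer with no rows at all does not appear in it; joining from the customer table instead would cover that case.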