Ok, first of all thanks in advance if you read through this whole thing as it may be quite painful on several levels.
It's a long post
It's gross
It's going to probably make your brain hurt
But on the plus side, after reading through this whole thing I have a feeling the answer is very obvious and simple, so you have that going for you.
So I'll tell you the problem in a nutshell, and then in more detail:
Nutshell
I have a query in SQL Server 2008r2 that is taking a very long time to complete.
I have several tables that contain information about a child and its parent.
A child in one table can have a parent in another table which then could have a parent in another table (there are only 3 tables).
I want to be able to take a child's name as a string, figure out its hierarchy of ancestors, and return that as a period-delimited string. So Grandpappy.Grandpa.Dad.Me.
I have this all working, it's just taking forever so I'm doing something stupid, or poorly performant, or most likely both.
I have NO control over the tables, they are what they are and I can't do anything to them. I created a view and a function (which you will see below) and that is all I can control.
The table names and values below are obviously fictitious.
Detailed description
Here are the tables that indicate children and parents. In this example we will be dealing with Fruits, Vegetables, and Planets.
A Planet has no parents.
A Fruit has a Parent who is a Planet, or a Fruit.
A Vegetable has a Parent who is a Fruit, or a Planet, or a Vegetable.
Let's take a look at them...
Table 1 = Planets (I have no parents)
ID, Name
1, Earth
2, Saturn
Table 2 = Fruits (my parent is either a planet or a fruit)
ID, Name, PlanetName, FruitName
1, Kiwi, Earth, null
2, Strawberry, Saturn, null
3, Banana, null, Strawberry
Table 3 = Vegetables (my parent is planet or a fruit or a vegetable)
ID, Name, FruitName, PlanetName, VegetableName
1, Potato, Kiwi, null, null
2, Squash, null, Earth, null
3, Pumpkin, null, null, Potato
Table 4 = BigTable (this will be the one the main slow query is using. It has a column that contains just a child's name and it could be a planet or a fruit or a vegetable)
ID, Name, OneOfTheThree
1, John, Earth
2, Steve, Kiwi
3, Joe, Saturn
4, Jane, Potato
We have our tables and we have our data, what do I want to do now?
I want to create a query that looks at all of the OneOfTheThree values in the BigTable, finds out what their lineage is (who their dads, grandparents, etc. are), and returns that to the caller.
So my thought was to do this:
Create a view that pulls the three tables (Planet, Fruit, Vegetable) into one single view that shows Name and Parent.
Create a function that takes in a Name. It then uses that view to find out who the Parent is for that Name. It then looks to see who the Parent is for that Parent, and on and on until the Parent is null and it stops (because that's the top of the ancestry chain... we made it all the way to Planet, who has no parents).
Create a query to query BigTable and then use the above function on BigTable's OneOfTheThree column to get the ancestry of the name in OneOfTheThree.
So I did it as follows:
My view
View = vwEverybodyAndTheirParents
-- Planets
SELECT Name, null AS Parent
FROM Planets
UNION
-- Fruits
SELECT Name, PlanetName AS Parent
FROM Fruits
UNION
-- Vegetables
SELECT Name, CASE WHEN FruitName IS NOT NULL THEN FruitName WHEN PlanetName IS NOT NULL THEN PlanetName ELSE NULL END AS Parent
FROM Vegetables
Ok, that gives me everything and its parents. Now for the function to crawl that view and give me the period-delimited string of the full ancestry:
My function
CREATE FUNCTION dbo.fnGetMyParent(@NameToGetParentsFor varchar(255))
RETURNS varchar(255)
AS
BEGIN
DECLARE @InternalName varchar(255)
DECLARE @ParentName varchar(255)
DECLARE @ConcatenatedParentStringToReturn varchar(max)
-- Start with the name itself and look up its immediate parent
SELECT @ParentName = Parent
,@ConcatenatedParentStringToReturn = Name
FROM vwEverybodyAndTheirParents
WHERE Name = @NameToGetParentsFor
-- Walk up the chain until a row with no parent is reached
WHILE @ParentName IS NOT NULL
BEGIN
SELECT @InternalName = Name,
@ParentName = Parent
FROM vwEverybodyAndTheirParents
WHERE Name = @ParentName
SET @ConcatenatedParentStringToReturn = RTRIM(@InternalName) + '.' + RTRIM(@ConcatenatedParentStringToReturn)
END
RETURN @ConcatenatedParentStringToReturn
END
This function works fine (though could be poorly coded and poorly performing?), so with all the above examples if I were to call it like so:
dbo.fnGetMyParent('Potato')
I get back the concatenated string of:
Earth.Kiwi.Potato
The problem
Ok, so now to finally get to the problem... the big query that takes forever:
SELECT Name,
OneOfTheThree,
dbo.fnGetMyParent(OneOfTheThree) as HeirarchyOfParents
FROM BigTable
I can see why it could take so long as for each value it executes the function which needs to then crawl a view. So...
My questions to you
How can I speed this up?
Do I need to put an index on the view?
Is my approach off, and should I do this differently?
If so, what do you recommend?
A BIG THANK YOU if you made it this far!
First of all, when using SQL you should avoid loops as much as you can (unless the situation really calls for one).
Second, there is no need for the view or the function, as your query can easily be written in one go.
select
bt.Name
,bt.OneOfTheThree
,p.Name+'.'+isnull(f.Name+'.','')+isnull(v.Name+'.','')+bt.Name as HeirarchyOfParents
from BigTable bt
left join Vegetables v
on bt.OneOfTheThree = v.name
left join Fruits f
on coalesce(v.FruitName,bt.OneOfTheThree) = f.Name
left join Planets p
on coalesce(f.PlanetName,v.PlanetName,bt.OneOfTheThree) = p.Name
The last join you can remove if the table is consistent with the others, as it does not bring new information (the planet name is already there).
The improvements that you can bring here are with indexes on the tables, if you are able to do that.
Ok, with the new information, the easiest way I can think of is the following:
;with ftemp as (
select
name as path
,PlanetName
,name as root
,name as name
,FruitName as parent
,0 as cnt
from fruits
union all
select
fruits.name + '.' + ftemp.path
,ftemp.PlanetName
,ftemp.root
,fruits.name
,fruits.FruitName -- next parent to follow; keeps the column count equal to the anchor
,ftemp.cnt+1
from fruits
join ftemp
on fruits.name= ftemp.parent
)
,fg as (
select
name
,max(cnt) as cnt
from ftemp
group by name
)
,f as (
select
ftemp.*
from ftemp
join fg
on ftemp.cnt = fg.cnt
and ftemp.name = fg.name
)
,vtemp (same idea)
,vg (same idea)
,v (same idea)
select
bt.Name
,bt.OneOfTheThree
,p.Name+'.'+isnull(f.Path+'.','')+isnull(v.Path+'.','')+bt.Name as HeirarchyOfParents
from BigTable bt
left join v
on bt.OneOfTheThree = v.name
left join f
on coalesce(v.FruitName,bt.OneOfTheThree) = f.Name
left join Planets p
on coalesce(f.PlanetName,v.PlanetName,bt.OneOfTheThree) = p.Name
With this approach, though, I have no idea what performance it will yield. So it's up to you to complete the query and test it.
Hope it helps.
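To make the recursive-CTE idea concrete, here is a minimal sketch that runs against the sample data from the question. It uses SQLite through Python's sqlite3 module purely as a stand-in, so the syntax differs from SQL Server 2008 R2 (no RECURSIVE keyword there, + instead of ||, and the path column would probably need an explicit cast). The CTE names everybody and walk are made up for illustration, and the sketch assumes names are unique across the three tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Planets (ID INTEGER, Name TEXT);
CREATE TABLE Fruits (ID INTEGER, Name TEXT, PlanetName TEXT, FruitName TEXT);
CREATE TABLE Vegetables (ID INTEGER, Name TEXT, FruitName TEXT,
                         PlanetName TEXT, VegetableName TEXT);
CREATE TABLE BigTable (ID INTEGER, Name TEXT, OneOfTheThree TEXT);
INSERT INTO Planets VALUES (1,'Earth'),(2,'Saturn');
INSERT INTO Fruits VALUES (1,'Kiwi','Earth',NULL),(2,'Strawberry','Saturn',NULL),
                          (3,'Banana',NULL,'Strawberry');
INSERT INTO Vegetables VALUES (1,'Potato','Kiwi',NULL,NULL),(2,'Squash',NULL,'Earth',NULL),
                              (3,'Pumpkin',NULL,NULL,'Potato');
INSERT INTO BigTable VALUES (1,'John','Earth'),(2,'Steve','Kiwi'),
                            (3,'Joe','Saturn'),(4,'Jane','Potato');
""")

LINEAGE_SQL = """
WITH RECURSIVE
-- One (Name, Parent) row per planet, fruit and vegetable
everybody(Name, Parent) AS (
    SELECT Name, NULL FROM Planets
    UNION ALL
    SELECT Name, COALESCE(PlanetName, FruitName) FROM Fruits
    UNION ALL
    SELECT Name, COALESCE(FruitName, PlanetName, VegetableName) FROM Vegetables
),
-- Climb from every name towards its top-level ancestor, prepending as we go
walk(Leaf, Parent, Path) AS (
    SELECT Name, Parent, Name FROM everybody
    UNION ALL
    SELECT w.Leaf, e.Parent, e.Name || '.' || w.Path
    FROM walk AS w
    JOIN everybody AS e ON e.Name = w.Parent
)
-- Parent IS NULL marks the row where the climb reached the top
SELECT bt.Name, bt.OneOfTheThree, w.Path
FROM BigTable AS bt
JOIN walk AS w ON w.Leaf = bt.OneOfTheThree AND w.Parent IS NULL
ORDER BY bt.ID
"""

rows = conn.execute(LINEAGE_SQL).fetchall()
for row in rows:
    print(row)
```

The whole lineage is computed as one set-based statement instead of one function call per BigTable row. Whether that is actually faster on the real tables is something only testing there can tell.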
I'm creating a database for a hypothetical video rental store.
All I need is a procedure that checks the availability of a specific movie (obviously the movie can have several copies). So I have to check whether there is a copy available to rent, and get the number of that copy (because it will affect another trigger later).
I already did everything with cursors and it actually works very well, but I must do it without cursors, using only "pure SQL" (i.e. queries).
I'll explain briefly the scheme of my DB:
The tables that this procedure is going to use are 3: 'Copia Film' (Movie Copy) , 'Include' (Includes) , 'Noleggio' (Rent).
Copia Film Table has these attributes:
idCopia
Genere (FK references to Film)
Titolo (FK references to Film)
dataUscita (FK references to Film)
Include Table:
idNoleggio (FK references to Noleggio. Means idRent)
idCopia (FK references to Copia film. Means idCopy)
Noleggio Table:
idNoleggio (PK)
dataNoleggio (dateOfRent)
dataRestituzione (dateReturn)
dataRestituito (dateReturned)
CF (FK to Person)
Prezzo (price)
Every movie can have more than one copy.
Every copy can be available in two cases:
The copy ID is not present in the Include Table (that means that the specific copy has never been rented)
The copy ID is present in the Include Table and the dataRestituito (dateReturned) is not null (that means that the specific copy has been rented but has already been returned)
The query I've tried to do is the following and is not working at all:
SELECT COUNT(*)
FROM NOLEGGIO
WHERE dataNoleggio IS NOT NULL AND dataRestituito IS NOT NULL AND idNoleggio IN (
SELECT N.idNoleggio
FROM NOLEGGIO N JOIN INCLUDE I ON N.idNoleggio=I.idNoleggio
WHERE idCopia IN (
SELECT idCopia
FROM COPIA_FILM
WHERE titolo='Pulp Fiction')) -- Of course the title is just an example
Well, from the query above I can't tell whether a copy of the selected movie is available, and I can't get the copy ID when one is.
(If you want, I can paste the cursors lines that work properly)
------ USING THE 'WITH SOLUTION' ----
I modified your code slightly to this:
WITH film
as
(
SELECT idCopia,titolo
FROM COPIA_FILM
WHERE titolo = 'Pulp Fiction'
),
copy_info as
(
SELECT N.idNoleggio, N.dataNoleggio, N.dataRestituito, I.idCopia
FROM NOLEGGIO N JOIN INCLUDE I ON N.idNoleggio = I.idNoleggio
),
avl as
(
SELECT film.titolo, copy_info.idNoleggio, copy_info.dataNoleggio,
copy_info.dataRestituito,film.idCopia
FROM film LEFT OUTER JOIN copy_info
ON film.idCopia = copy_info.idCopia
)
SELECT COUNT(*),idCopia FROM avl
WHERE(dataRestituito IS NOT NULL OR idNoleggio IS NULL)
GROUP BY idCopia
As I said in the comment, this code works properly if I use it just in a query, but once I try to make a procedure from this, I got errors.
The problem is the final SELECT:
SELECT COUNT(*), idCopia INTO CNT,COPYFILM
FROM avl
WHERE (dataRestituito IS NOT NULL OR idNoleggio IS NULL)
GROUP BY idCopia
The error is:
ORA-01422: exact fetch returns more than requested number of rows
ORA-06512: at "VIDEO.PR_AVAILABILITY", line 9.
So it seems the Into clause is wrong because obviously the query returns more rows. What can I do ? I need to take the Copy ID (even just the first one on the list of rows) without using cursors.
You can try this -
WITH film
as
(
SELECT idCopia, titolo
FROM COPIA_FILM
WHERE titolo='Pulp Fiction'
),
copy_info as
(
select N.idNoleggio, N.dataNoleggio , N.dataRestituito , I.idCopia
FROM NOLEGGIO N JOIN INCLUDE I ON N.idNoleggio=I.idNoleggio
),
avl as
(
select film.titolo, copy_info.idNoleggio, copy_info.dataNoleggio,
copy_info.dataRestituito
from film LEFT OUTER JOIN copy_info
ON film.idCopia = copy_info.idCopia
)
select * from avl
where (dataRestituito IS NOT NULL OR idNoleggio IS NULL);
You should think in terms of sets, rather than records.
If you find the set of all the films that are out, you can exclude them from your stock, and the rest is rentable.
select copiafilm.* from #f copiafilm
left join
(
select idCopia from #r Noleggio
inner join #i include on Noleggio.idNoleggio = include.idNoleggio
where dataRestituito is null
) out
on copiafilm.idCopia = out.idCopia
where out.idCopia is null
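As a rough, runnable illustration of the "exclude what is out" idea, the sketch below uses SQLite through Python's sqlite3 module as a stand-in for Oracle. The sample rows and the NOT EXISTS form of the anti-join are assumptions made for the example, not the original schema in full. It also shows one way to fetch a single copy ID without a cursor: an aggregate query with no GROUP BY always returns exactly one row, so COUNT(*) and MIN(idCopia) fit a single-row fetch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE COPIA_FILM (idCopia INTEGER PRIMARY KEY, titolo TEXT);
CREATE TABLE NOLEGGIO (idNoleggio INTEGER PRIMARY KEY,
                       dataNoleggio TEXT, dataRestituito TEXT);
-- Table name quoted defensively; the sample rows below are invented
CREATE TABLE "INCLUDE" (idNoleggio INTEGER, idCopia INTEGER);
INSERT INTO COPIA_FILM VALUES (1,'Pulp Fiction'),(2,'Pulp Fiction'),
                              (3,'Pulp Fiction'),(5,'Pulp Fiction'),(4,'Jackie Brown');
-- Rental 10 has been returned, rental 11 is still out
INSERT INTO NOLEGGIO VALUES (10,'2020-01-01','2020-01-05'),(11,'2020-02-01',NULL);
INSERT INTO "INCLUDE" VALUES (10,1),(10,3),(11,1),(11,2);
""")

# A copy is available when no rental that includes it is still open
AVAILABLE_SQL = """
SELECT c.idCopia
FROM COPIA_FILM AS c
WHERE c.titolo = :title
  AND NOT EXISTS (
      SELECT 1
      FROM "INCLUDE" AS i
      JOIN NOLEGGIO AS n ON n.idNoleggio = i.idNoleggio
      WHERE i.idCopia = c.idCopia
        AND n.dataRestituito IS NULL)
"""

params = {"title": "Pulp Fiction"}
available = [r[0] for r in conn.execute(AVAILABLE_SQL + " ORDER BY c.idCopia", params)]

# Aggregates without GROUP BY yield exactly one row, even when nothing matches
cnt, first_copy = conn.execute(
    f"SELECT COUNT(*), MIN(idCopia) FROM ({AVAILABLE_SQL})", params
).fetchone()
print(available, cnt, first_copy)
```

Copy 1 appears in both a returned rental and an open one. A filter that only looks for any returned rental would treat it as available, while the anti-join correctly leaves it out.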
I solved the problem editing the last query into this one:
SELECT COUNT(*),idCopia INTO CNT,idCopiaFilm
FROM avl
WHERE (dataRestituito IS NOT NULL OR idNoleggio IS NULL) AND rownum = 1
GROUP BY idCopia;
IF CNT > 0 THEN
-- FOUND AVAILABLE COPY
END IF;
EXCEPTION
WHEN NO_DATA_FOUND THEN
-- NOT FOUND AVAILABLE COPY
Thank you @Aditya Kakirde! Your suggestion almost solved the problem.
I have a table in SQL Server that contains the following columns :
Id Name ParentId LevelOrder
8 vehicle 0 0/8/
9 car 8 0/8/9/
10 bike 8 0/8/10/
11 House 0 0/11/
...
This creates a tree.
Say that I have the LevelOrder 0/8/, this should return only the car and bike rows, but how do I handle this in SQL Server?
I have tried :
Select * FROM MyTable WHERE LevelOrder >= '0/8/'
but that does not work.
The underscore character will guarantee at least one character comes after '0/8/', so you don't get a match on the "vehicle" row.
SELECT *
FROM MyTable
WHERE LevelOrder LIKE '0/8/_%'
This code selects rows whose LevelOrder starts with 0/8/. Note that it also matches the 0/8/ row itself (vehicle), because % matches zero or more characters:
Select * FROM MyTable WHERE LevelOrder like '0/8/%'
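The difference between the two patterns is easy to check on the sample rows. This is a small sketch using SQLite via Python's sqlite3 module as a stand-in for SQL Server; the helper names() is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyTable (Id INTEGER, Name TEXT, ParentId INTEGER, LevelOrder TEXT);
INSERT INTO MyTable VALUES (8,'vehicle',0,'0/8/'),(9,'car',8,'0/8/9/'),
                           (10,'bike',8,'0/8/10/'),(11,'House',0,'0/11/');
""")

def names(pattern):
    """Return the names whose LevelOrder matches the LIKE pattern, in Id order."""
    cur = conn.execute(
        "SELECT Name FROM MyTable WHERE LevelOrder LIKE ? ORDER BY Id", (pattern,)
    )
    return [row[0] for row in cur]

with_underscore = names("0/8/_%")   # needs at least one character after '0/8/'
without_underscore = names("0/8/%") # '%' may match nothing, so '0/8/' itself matches
print(with_underscore, without_underscore)
```

Both patterns would also return deeper descendants (for example a row with LevelOrder 0/8/9/12/), so they select the whole subtree rather than only direct children.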
Okay -
While @Joe's answer is the simplest and easiest to implement (and possibly better performing than what I'm about to propose...), there are some issues with update anomalies.
Specifically:
You already have a parentId column. You need to synchronize both this and the levelOrder column, or risk inconsistent data. (I believe this also violates 1NF, although my understanding of the exact definition is a little sketchy...)
levelOrder contains the entire hierarchy. If any one parent is moved, all children rows must have levelOrder modified to reflect this (potentially very messy).
In light of this, here's what I recommend:
Drop the levelOrder column, as its existence will (generally) cause problems.
Use a recursive CTE and the parentId column to build the hierarchy dynamically. Either leave the column where it is, or move it to a dedicated relationship table. Moving one parent then requires only one cell to be updated, and cannot result in any (data, not semantic) anomalies. The CTE should look similar to this form (will need to be adjusted for purpose):
WITH heir_parent (parentId, id) as (SELECT parentId, id
FROM table
WHERE id =
UNION ALL
SELECT b.parentId, b.id
FROM heir_parent as a
JOIN table as b
ON b.parentId = a.id)
At the moment, the CTE returns a list of all children of the given id, with their id and their immediate parent. It can be adjusted to return a number of other things as well - although I recommend that the CTE be used only to generate the relationship, and join externally to get the remaining data.
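A runnable variation of that recursive CTE might look like the sketch below, again using SQLite through Python's sqlite3 module as a stand-in for SQL Server. Two things are assumptions for the example: the extra row "sedan" (added so there is a grandchild), and the anchor, which starts at the children of the given id rather than at the id itself so the root row is excluded. The remaining data is joined externally, as recommended above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyTable (Id INTEGER PRIMARY KEY, Name TEXT, ParentId INTEGER);
-- 'sedan' is an invented grandchild row
INSERT INTO MyTable VALUES (8,'vehicle',0),(9,'car',8),(10,'bike',8),
                           (11,'House',0),(12,'sedan',9);
""")

DESCENDANTS_SQL = """
WITH RECURSIVE descendants(Id, ParentId) AS (
    -- Anchor: direct children of the requested node
    SELECT Id, ParentId FROM MyTable WHERE ParentId = :root
    UNION ALL
    -- Recursive step: children of rows found so far
    SELECT b.Id, b.ParentId
    FROM descendants AS a
    JOIN MyTable AS b ON b.ParentId = a.Id
)
SELECT t.Name
FROM descendants AS d
JOIN MyTable AS t ON t.Id = d.Id
ORDER BY t.Id
"""

result = [row[0] for row in conn.execute(DESCENDANTS_SQL, {"root": 8})]
print(result)
```

Moving "car" under another parent here would be a single update of its ParentId, and the query would pick up the new shape without any other rows changing.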
Using Postgres. Here's my scenario:
I have three different tables. One is a title table. The second is a genre table. The third table is used to join the two. When I designed the database, I expected that each title would have one top level genre. After filling it with data, I discovered that there were titles that had two, sometimes, three top level genres.
I wrote a query that retrieves titles and their top level genres. This obviously requires that I join the two tables. For those that only have one top level genre, there is one record. For those that have more, there are multiple records.
I realize I'll probably have to write a custom function of some kind that will handle this for me, but I thought I'd ask if it's possible to do this without doing so just to make sure I'm not missing anything.
Is it possible to write a query that will allow me to select all of the distinct titles regardless of the number of genres that it has, but also include the genre? Or even better, a query that would give me a comma delimited string of genres when there are multiples?
Thanks in advance!
Sounds like a job for array_agg to me. With tables like this:
create table t (id int not null, title varchar not null);
create table g (id int not null, name varchar not null);
create table tg (t int not null, g int not null);
You could do something like this:
SELECT t.title, array_agg(g.name)
FROM t, tg, g
WHERE t.id = tg.t
AND tg.g = g.id
GROUP BY t.title, t.id
to get:
title | array_agg
-------+-----------------------
one | {g-one,g-two,g-three}
three | {g-three}
two | {g-two}
Then just unpack the arrays as needed. If for some reason you really want a comma delimited string instead of an array, then string_agg is your friend:
SELECT t.title, string_agg(g.name, ',')
FROM t, tg, g
WHERE t.id = tg.t
AND tg.g = g.id
GROUP BY t.title, t.id
and you'll get something like this:
title | string_agg
-------+---------------------
one | g-one,g-two,g-three
three | g-three
two | g-two
I'd go with the array approach so that you wouldn't have to worry about reserving a character for the delimiter or having to escape (and then unescape) the delimiter while aggregating.
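For anyone who wants to try the aggregation without a Postgres instance at hand, here is a small sketch using SQLite via Python's sqlite3 module. SQLite has no array_agg, so group_concat stands in for string_agg; the sample rows are assumptions chosen to match the output shown above. Because the concatenation order is not guaranteed, the sketch splits and sorts the result before using it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER NOT NULL, title TEXT NOT NULL);
CREATE TABLE g (id INTEGER NOT NULL, name TEXT NOT NULL);
CREATE TABLE tg (t INTEGER NOT NULL, g INTEGER NOT NULL);
INSERT INTO t VALUES (1,'one'),(2,'two'),(3,'three');
INSERT INTO g VALUES (1,'g-one'),(2,'g-two'),(3,'g-three');
INSERT INTO tg VALUES (1,1),(1,2),(1,3),(2,2),(3,3);
""")

rows = conn.execute("""
SELECT t.title, group_concat(g.name, ',')
FROM t
JOIN tg ON t.id = tg.t
JOIN g ON tg.g = g.id
GROUP BY t.id, t.title
ORDER BY t.id
""").fetchall()

# One row per title; split the delimited string back into a sorted list
genres = {title: sorted(names.split(",")) for title, names in rows}
print(genres)
```

The split step is exactly where a genre name containing a comma would break, which is the delimiter problem the array approach avoids.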
Have a look at this thread which might answer your question.
This is a dumbed-down version of the real table data, so it may look a bit silly.
Table 1 (users):
id INT
username TEXT
favourite_food TEXT
food_pref_id INT
Table 2 (food_preferences):
id INT
food_type TEXT
The logic is as follows:
Let's say I have this in my food preference table:
1, 'VEGETARIAN'
and this in the users table:
1, 'John', NULL, 1
2, 'Pete', 'Curry', 1
In which case John defaults to be a vegetarian, but Pete should show up as a person who enjoys curry.
Question, is there any way to combine the query into one select statement, so that it would get the default from the preferences table if the favourite_food column is NULL?
I can obviously do this in application logic, but would be nice just to offload this to SQL, if possible.
DB is SQLite3...
You could use COALESCE(X,Y,...) to select the first item that isn't NULL.
If you combine this with an inner join, you should be able to do what you want.
It should go something like this:
SELECT u.id AS id,
u.username AS username,
COALESCE(u.favourite_food, p.food_type) AS favourite_food,
u.food_pref_id AS food_pref_id
FROM users AS u INNER JOIN food_preferences AS p
ON u.food_pref_id = p.id
I don't have a SQLite database handy to test on, however, so the syntax might not be 100% correct, but it's the gist of it.
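Since the question is about SQLite, the idea can be checked directly with Python's built-in sqlite3 module. This is a minimal sketch using the two sample users from the question. It uses a LEFT JOIN instead of the INNER JOIN above, which is a deliberate variation so that a user without a matching preference row would still be returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE food_preferences (id INTEGER PRIMARY KEY, food_type TEXT);
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT,
                    favourite_food TEXT, food_pref_id INTEGER);
INSERT INTO food_preferences VALUES (1,'VEGETARIAN');
INSERT INTO users VALUES (1,'John',NULL,1),(2,'Pete','Curry',1);
""")

# COALESCE returns its first non-NULL argument: the user's own favourite
# when set, otherwise the default from the preference row
rows = conn.execute("""
SELECT u.username,
       COALESCE(u.favourite_food, p.food_type) AS favourite_food
FROM users AS u
LEFT JOIN food_preferences AS p ON u.food_pref_id = p.id
ORDER BY u.id
""").fetchall()
print(rows)
```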
The goal is to replace an integer value that is returned in a SQL query with the char value that the number represents. For example:
A table attribute labeled ‘Sport’ is defined as an integer value between 1-4. 1 = Basketball, 2 = Hockey, etc. Below is the database table and then the desired output.
Database Table:
Player Team Sport
--------------------------
Bob Blue 1
Roy Red 3
Sarah Pink 4
Desired Outputs:
Player Team Sport
------------------------------
Bob Blue Basketball
Roy Red Soccer
Sarah Pink Kickball
What is best practice to translate these integer values for String values? Use SQL to translate the values prior to passing to program? Use scripting language to change the value within the program? Change database design?
The database should hold the values and you should perform a join to another table which has that data in it.
So you should have a table which has say a list of people
ID Name FavSport
1 Alex 4
2 Gnats 2
And then another table which has a list of the sports
ID Sport
1 Basketball
2 Football
3 Soccer
4 Kickball
Then you would do a join between these tables
select people.name, sports.sport
from people, sports
where people.favsport = sports.ID
which would give you back
Name Sport
Alex Kickball
Gnats Football
You could also use a CASE expression. For example, using just the people table from above, you could write something like
select name,
case
when favsport = 1 then 'Basketball'
when favsport = 2 then 'Football'
when favsport = 3 then 'Soccer'
else 'Kickball'
end as "Sport"
from people
But that is certainly not best practice.
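One way to see why the lookup table is preferred over the hard-coded CASE is to add a new sport and compare the two queries. The sketch below uses SQLite via Python's sqlite3 module; the sport "Tennis" and the person "Sam" are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sports (ID INTEGER PRIMARY KEY, Sport TEXT);
CREATE TABLE people (ID INTEGER PRIMARY KEY, Name TEXT, FavSport INTEGER);
INSERT INTO sports VALUES (1,'Basketball'),(2,'Football'),(3,'Soccer'),(4,'Kickball');
INSERT INTO people VALUES (1,'Alex',4),(2,'Gnats',2);
""")

JOIN_SQL = """
SELECT p.Name, s.Sport
FROM people AS p
JOIN sports AS s ON p.FavSport = s.ID
ORDER BY p.ID
"""
CASE_SQL = """
SELECT Name,
       CASE FavSport
           WHEN 1 THEN 'Basketball'
           WHEN 2 THEN 'Football'
           WHEN 3 THEN 'Soccer'
           ELSE 'Kickball'
       END AS Sport
FROM people
ORDER BY ID
"""

# With the original data both approaches agree
before_join = conn.execute(JOIN_SQL).fetchall()
before_case = conn.execute(CASE_SQL).fetchall()

# Add a fifth sport (invented) and someone who likes it
conn.execute("INSERT INTO sports VALUES (5,'Tennis')")
conn.execute("INSERT INTO people VALUES (3,'Sam',5)")

after_join = conn.execute(JOIN_SQL).fetchall()
after_case = conn.execute(CASE_SQL).fetchall()
print(after_join[-1], after_case[-1])
```

The join picks up the new row automatically, while the CASE expression silently mislabels it until every query containing that CASE is edited.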
MySQL has a CASE statement. The following works in SQL Server:
SELECT
CASE MyColumnName
WHEN 1 THEN 'First'
WHEN 2 THEN 'Second'
WHEN 3 THEN 'Third'
ELSE 'Other'
END
In Oracle you can use the DECODE function, which provides a solution when the design of the database is beyond your control.
Directly from the oracle documentation:
Example: This example decodes the value warehouse_id. If warehouse_id is 1, then the function returns 'Southlake'; if warehouse_id is 2, then it returns 'San Francisco'; and so forth. If warehouse_id is not 1, 2, 3, or 4, then the function returns 'Non domestic'.
SELECT product_id,
DECODE (warehouse_id, 1, 'Southlake',
2, 'San Francisco',
3, 'New Jersey',
4, 'Seattle',
'Non domestic') "Location"
FROM inventories
WHERE product_id < 1775
ORDER BY product_id, "Location";
The CASE expression could help. However, it may be even faster to have a small table with an int primary key and a name string such as
1 baseball
2 football
etc, and JOIN it appropriately in the query.
Do you think it would be helpful to store these relationships between integers and strings in the database itself? As long as you have to store these relationships, it makes sense to store it close to your data (in the database) instead of in your code where it can get lost. If you use this solution, this would make the integer a foreign key to values in another table. You store integers in another table, say sports, with sport_id and sport, and join them as part of your query.
Instead of a plain SELECT * FROM my_table, you would select from my_table joined to the sports table. If not every row in your main table has a corresponding sport, you could use a left join; otherwise selecting from both tables and using = in the where clause is probably sufficient.
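The difference between the two join types shows up as soon as one row has no sport. Here is a small sketch using SQLite via Python's sqlite3 module; the column names and the player "Ann" (who has no sport) are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sports (sport_id INTEGER PRIMARY KEY, sport TEXT);
CREATE TABLE my_table (player TEXT, team TEXT, sport_id INTEGER);
INSERT INTO sports VALUES (1,'Basketball'),(3,'Soccer'),(4,'Kickball');
-- 'Ann' is an invented row with no sport assigned
INSERT INTO my_table VALUES ('Bob','Blue',1),('Roy','Red',3),
                            ('Sarah','Pink',4),('Ann','Gray',NULL);
""")

# Inner join: rows without a matching sport disappear
inner_rows = conn.execute("""
SELECT m.player, s.sport
FROM my_table AS m
JOIN sports AS s ON m.sport_id = s.sport_id
ORDER BY m.rowid
""").fetchall()

# Left join: every row of my_table is kept, sport is NULL when missing
left_rows = conn.execute("""
SELECT m.player, s.sport
FROM my_table AS m
LEFT JOIN sports AS s ON m.sport_id = s.sport_id
ORDER BY m.rowid
""").fetchall()
print(inner_rows, left_rows)
```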
Definitely have the DB hold the string values. I am not a DB expert by any means, but I would recommend that you create a table that holds the strings and their corresponding integer values. From there, you can define a relationship between the two tables and then do a JOIN in the select to pull the string version of the integer.
tblSport Columns
------------
SportID int (PK, eg. 12)
SportName varchar (eg. "Tennis")
tblFriend Columns
------------
FriendID int (PK)
FriendName (eg. "Joe")
LikesSportID (eg. 12)
In this example, you can get the following result from the query below:
SELECT FriendName, SportName
FROM tblFriend
INNER JOIN tblSport
ON tblFriend.LikesSportID = tblSport.SportID
Man, it's late - I hope I got that right. By the way, you should read up on the different types of joins - this is the simplest example of one.