Create a report with a query - SQL
I have a problem. Consider the following fact and dimension tables in a ROLAP system that collects values of harmful substances measured in foods that are sold in supermarkets.
Fact table:
• Contaminants (TimeID, ShopID, FoodID, substance, quantityPerOunce)
This describes which harmful substance in which quantity was found on a given
food in a given supermarket at a given time.
Dimension tables:
• Time (TimeID, dayNr, dayName, weekNr, monthNr, year)
• Food (FoodID, foodName, brand, foodType)
Example data: (43, egg, Bioland, animalProduct)
• Place (ShopID, name, street1, region, country)
Write one SQL statement to create a report that answers the following query:
List the minimum quantities of the substance "PCB" in animal products and
vegetables (both are foodTypes) that were measured per year in the regions Sachsen,
Thüringen, and Hessen in Germany.
The result should contain years, regions, and the minimum values.
With the same statement, also list
the minimum values per year (i.e. aggregating over all regions in each year)
as well as a grand total with the minimum quantity of PCB in the mentioned regions for animal products and vegetables over all years and all regions.
SQL query
SELECT years, regions, min(quantityPerOunce)
FROM Contaminants as c, Time as t, Food as f, Place as p
WHERE c.TimeID = t.TimeID
AND c.FoodID = f.FoodID
AND c.ShopdID = p.ShopID
AND substance = "PCB"
AND foodType = "vegetables"
AND foodType = "animalProducts"
GROUP BY regions;
I don't know how to solve this kind of exercise. I tried, but I'm stuck. The joins should be equi-joins, even if that is not the best way.
You are close. First, remember that in GROUP BY queries, the non-aggregate fields in your SELECT must also appear on the GROUP BY line. So, you should have:
GROUP BY years, regions;
Further, if you use this:
foodType = 'vegetables' AND foodType = 'animalProducts'
the query will return nothing, because the foodType can't be both at the same time.
As such, you need this:
(foodType = 'vegetables' OR foodType = 'animalProducts')
or alternatively:
foodType IN ('vegetables','animalProducts')
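A quick way to see the difference is to run both predicates against a few rows. The sketch below uses Python's built-in sqlite3 module and a made-up three-row Food table; the rows, and the 'vegetables' / 'animalProducts' spellings, are assumptions carried over from the query above rather than real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Food (FoodID INTEGER, foodName TEXT, foodType TEXT)")
conn.executemany(
    "INSERT INTO Food VALUES (?, ?, ?)",
    [
        (1, "egg", "animalProducts"),   # hypothetical sample rows
        (2, "carrot", "vegetables"),
        (3, "bread", "bakery"),
    ],
)

# AND: a single row can never hold two different foodType values at once,
# so no row satisfies both comparisons.
and_rows = conn.execute(
    "SELECT foodName FROM Food "
    "WHERE foodType = 'vegetables' AND foodType = 'animalProducts'"
).fetchall()

# IN: shorthand for OR, so a row matches when it has either value.
in_rows = conn.execute(
    "SELECT foodName FROM Food "
    "WHERE foodType IN ('vegetables', 'animalProducts') "
    "ORDER BY foodName"
).fetchall()

print(and_rows)  # []
print(in_rows)   # [('carrot',), ('egg',)]
```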
Your query assumes that regions only contains the three listed regions. If you aren't 100% sure about that, it would be better to specify them explicitly with:
AND regions IN ('Sachsen', 'Thüringen', 'Hessen')
This alone also assumes that these regions are only in Germany. This may be true. It might not be though, so it would be safest to also add:
AND country = 'Germany'
So, something along these lines (using the column names year and region from the schema, and ShopID for the join key):
SELECT t.year, p.region, MIN(c.quantityPerOunce) AS min_quantityPerOunce
FROM Contaminants as c, Time as t, Food as f, Place as p
WHERE c.TimeID = t.TimeID
AND c.FoodID = f.FoodID
AND c.ShopID = p.ShopID
AND c.substance = 'PCB'
AND f.foodType IN ('vegetables','animalProducts')
AND p.region IN ('Sachsen', 'Thüringen', 'Hessen')
AND p.country = 'Germany'
GROUP BY t.year, p.region;
Forgive me if I'm mistaken, but it does seem like this might be a school assignment, so it may help to think about general principles in the future:
Identify ALL the nouns in the problem statement (the names of the regions, the name of the country, the names of the food types, the name of the substance) and make sure they are all represented in the query. They likely wouldn't be mentioned in the problem statement / client request if they weren't important. This is a good rule of thumb for professional settings as well as educational settings.
As a rule, fields in the SELECT which aren't aggregates must also be in the GROUP BY. You can have fields in the GROUP BY which are not in the SELECT, but this is far less common.
For parts of the request which list some items from the same field (regions, for example), use field IN (item1,item2,...,itemX) to allow an OR operator on each of the items.
As an addendum, if you have a dimension table called Time, you may want to enclose the name in double-quotes in some systems to avoid confusion with what is normally a system name of some kind.
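The exercise also asks for per-year minimums and a grand total in the same statement, which a plain GROUP BY does not produce. In engines that support it (Oracle, SQL Server and PostgreSQL, for example), GROUP BY ROLLUP(year, region) adds those subtotal rows. The sketch below shows the same three levels with UNION ALL instead, because it runs on SQLite through Python's sqlite3 module, and SQLite, as far as I know, has no ROLLUP. All rows are invented sample data, the dimension tables are cut down to the columns the query needs, and the foodType spellings are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "Time" (TimeID INTEGER, year INTEGER);
CREATE TABLE Food (FoodID INTEGER, foodType TEXT);
CREATE TABLE Place (ShopID INTEGER, region TEXT, country TEXT);
CREATE TABLE Contaminants (TimeID INTEGER, ShopID INTEGER, FoodID INTEGER,
                           substance TEXT, quantityPerOunce REAL);

-- Invented sample data.
INSERT INTO "Time" VALUES (1, 2020), (2, 2021);
INSERT INTO Food VALUES (1, 'animalProducts'), (2, 'vegetables'), (3, 'bakery');
INSERT INTO Place VALUES (1, 'Sachsen', 'Germany'), (2, 'Hessen', 'Germany'),
                         (3, 'Bayern', 'Germany');
INSERT INTO Contaminants VALUES
  (1, 1, 1, 'PCB', 0.5),
  (1, 1, 2, 'PCB', 0.3),
  (1, 2, 1, 'PCB', 0.7),
  (2, 1, 2, 'PCB', 0.2),
  (2, 2, 2, 'PCB', 0.9),
  (2, 3, 1, 'PCB', 0.01),   -- Bayern: filtered out by region
  (1, 1, 3, 'PCB', 0.05),   -- bakery: filtered out by foodType
  (1, 2, 1, 'Lead', 0.02);  -- other substance: filtered out
""")

REPORT = """
WITH filtered AS (
    SELECT t.year AS year, p.region AS region, c.quantityPerOunce AS q
    FROM Contaminants AS c, "Time" AS t, Food AS f, Place AS p
    WHERE c.TimeID = t.TimeID
      AND c.FoodID = f.FoodID
      AND c.ShopID = p.ShopID
      AND c.substance = 'PCB'
      AND f.foodType IN ('vegetables', 'animalProducts')
      AND p.region IN ('Sachsen', 'Thüringen', 'Hessen')
      AND p.country = 'Germany'
)
SELECT year, region, MIN(q) FROM filtered GROUP BY year, region  -- per year and region
UNION ALL
SELECT year, NULL, MIN(q) FROM filtered GROUP BY year            -- per year
UNION ALL
SELECT NULL, NULL, MIN(q) FROM filtered                          -- grand total
"""

rows = conn.execute(REPORT).fetchall()
for row in rows:
    print(row)
```

A NULL (None in Python) in the region column marks a per-year row, and NULL in both columns marks the grand total, which is also how ROLLUP labels its subtotal rows.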
Related
SQL: Calculate the rating based on different columns and use it as an argument
I'm trying to calculate a rating from a table that has three columns with different ratings, each ranging from 1 to 5. I want to calculate the average of these three values and then use it as a filter in queries, for example: WHERE Rating > 3.5
At the moment I have this, which gives me the average for all suppliers:
SELECT c.Name,
       (SELECT CAST(AVG(rat) AS DECIMAL(5, 2))
        FROM (VALUES (b.Qty_Price), (b.Quality), (b.DeliveryTime)) A (rat)) AS Rating
FROM Order a
JOIN Evaluation b ON b.ID_Evaluation = a.ID_Evaluation
JOIN Supplier c ON c.NIF_Supplier = a.NIF_Supplier
What I would like now is, for example, to filter the suppliers that have a rating higher than 3, but I don't know how to do that. If anyone can help, I would be grateful.
If the query works the way you want, it gives you the average for every row. WHERE rating > 3.5 cannot simply be added to it, because rating is an alias defined in the SELECT list: it does not exist in the tables we JOIN, and the WHERE clause is evaluated before the SELECT list.
To overcome this, keep the query you have, give it a name using WITH, and SELECT from that named query WHERE rating > 3.5. It should look something like this:
WITH Averages (name, rating) AS (
    SELECT c.name,
           (SELECT CAST(AVG(rat) AS DECIMAL(5, 2))
            FROM (VALUES (b.qty_Price), (b.quality), (b.deliveryTime)) AS A (rat)) AS rating
    FROM Order a
    JOIN Evaluation b ON b.ID_Evaluation = a.ID_Evaluation
    JOIN Supplier c ON c.NIF_Supplier = a.NIF_Supplier
)
SELECT name, rating
FROM Averages
WHERE rating > 3.5;
Here the query you provided is named Averages, and we SELECT from it as if it were a table. You can have multiple WITH definitions to make things easier, but remember that a comma (,) is needed to separate them. In our case there is only one WITH ... AS, so no comma or semicolon is needed after ...= a.NIF_Supplier)
Note that the derived table needs an alias before its column list: A (rat), or equivalently AS A (rat). I'd also suggest writing column names in lowercase, which makes it easier to distinguish tables from attributes. Cheers!
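The WITH pattern can be tried on a small scale. This is a minimal sketch using Python's sqlite3 module, with a single hypothetical Evaluation table that already carries the supplier name (the joins to Order and Supplier are left out), invented scores, and the average written as a plain sum divided by 3.0 instead of the VALUES construct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Evaluation ("
    "supplier TEXT, qty_price INTEGER, quality INTEGER, delivery_time INTEGER)"
)
conn.executemany(
    "INSERT INTO Evaluation VALUES (?, ?, ?, ?)",
    [
        ("Acme", 5, 4, 3),  # average 4.0
        ("Bolt", 3, 3, 3),  # average 3.0
        ("Core", 4, 4, 3),  # average about 3.67
    ],
)

# The CTE computes the rating once; the outer query can then filter on it,
# because by that point rating is an ordinary column of Averages.
rows = conn.execute("""
    WITH Averages (name, rating) AS (
        SELECT supplier, (qty_price + quality + delivery_time) / 3.0
        FROM Evaluation
    )
    SELECT name, rating
    FROM Averages
    WHERE rating > 3.5
    ORDER BY name
""").fetchall()

for name, rating in rows:
    print(name, round(rating, 2))
```

Bolt is filtered out because its rating of 3.0 is not above 3.5.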
Complex query in Oracle SQL
I have the following tables and their fields.
I have been asked for a query that seems quite complex to me. I have been going around in circles for two days trying things. The task says:
It is desired to obtain the average age of female athletes, medal winners (gold, silver or bronze), for the different modalities of 'Artistic Gymnastics'. Analyze the possible contents of the result field in order to return only the expected values, even when there is no data of any specific value for the set of records displayed by the query. Specifically, we want to show the gender indicator of the athletes, the medal obtained, and the average age of these athletes. The age will be calculated by subtracting from the system date (SYSDATE), the date of birth of the athlete, dividing said value by 365. In order to avoid showing decimals, truncate (TRUNC) the result of the calculation of age. Order the results by the average age of the athletes.
Right now I have this:
select person.gender, score.score
from person, athlete, score, competition, sport
where person.idperson = athlete.idathlete
and athlete.idathlete = score.idathlete
and competition.idsport = sport.idsport
and person.gender = 'F'
and competition.idsport = 18
and score.score in ('Gold','Silver','Bronze')
group by person.gender, score.score;
And this is the output I got.
When I add the person.birthdate field, instead of the 18 records for the 18 people who have a medal, I get many more records. Apart from that, I still have to compute the average age with SYSDATE and TRUNC, which I have tried in many ways without success. Either it is very complicated or I'm just worn out from going around in circles. I need some help.
Reading the task you were given, it seems that you're quite close to the solution. Have a look at the following query and its explanation, note the differences from your query, and see if it helps.
select p.gender,
       ((sysdate - p.birthdate) / 365) age,
       s.score
from person p
join athlete a on a.idathlete = p.idperson
left join score s on s.idathlete = a.idathlete
left join competition c on c.idcompetition = s.idcompetition
where p.gender = 'F'
and s.score in ('Gold', 'Silver', 'Bronze')
and c.idsport = 18
order by age;
• When two dates are subtracted, the result is the number of days. Dividing it by 365, you - roughly - get the number of years (treating every year as 365 days long, which is a simplification, as not all years have that many days (hint: leap years)). The result is usually a decimal number, e.g. 23.912874918724. You were told to remove the decimals, so use TRUNC and get 23 as the result.
• Although the data model contains 5 tables, you don't have to use all of them in a query.
Maybe the best approach is to go step by step. The first step would be to simply select all female athletes and calculate their age:
select p.gender,
       (sysdate - p.birthdate) / 365 age
from person p
where p.gender = 'F'
Note that I've used a table alias - I'd suggest you use them too, as they make queries easier to read (table names can be really long, which doesn't help readability). Also, always qualify columns with the table alias to avoid confusion about which column belongs to which table.
Once you're satisfied with that result, move on to another table - athlete. It is here just as a joining mechanism with the score table, which contains ... well, scores. Note that I've used an outer join for the score table because not all athletes have won a medal. I presume that this is what the task you've been given means by:
... even when there is no data of any specific value for the set of records displayed by the query.
Be aware that filtering on s.score and c.idsport in the WHERE clause discards the rows the outer joins preserved; to keep them, move those conditions into the ON clauses.
It is suggested that we - as developers - use explicit table joins, which let you see all the joins separated from the filters (which should be part of the WHERE clause). So:
NO : from person p, athlete a
     where a.idathlete = p.idperson
     and p.gender = 'F'
YES: from person p join athlete a on a.idathlete = p.idperson
     where p.gender = 'F'
Then move on to yet another table, and so forth. Test frequently, all the time - don't skip steps. Move on to the next one only when you're sure that the previous step's result is correct, as - in most cases - it won't automagically fix itself.
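The age arithmetic described above (days between two dates, divided by 365, decimals cut off) can be checked outside the database. The sketch below mirrors it in Python with a fixed date standing in for SYSDATE; the birthdates are invented, and age_in_years is a hypothetical helper name, not something from the task.

```python
import math
from datetime import date


def age_in_years(birthdate: date, today: date) -> int:
    """Days between the two dates, divided by 365, with the decimals cut off."""
    days = (today - birthdate).days
    return math.trunc(days / 365)


today = date(2024, 6, 1)  # fixed stand-in for SYSDATE so the result is repeatable

# Invented birthdates for three athletes.
birthdates = [date(2000, 6, 15), date(1998, 1, 10), date(2003, 12, 1)]
ages = [age_in_years(b, today) for b in birthdates]
average_age = sum(ages) / len(ages)

print(ages)         # [23, 26, 20]
print(average_age)  # 23.0

# The "roughly" caveat in practice: leap days accumulate, so someone whose
# 24th birthday is still four days away already counts as 24.
print(age_in_years(date(2000, 6, 5), today))  # 24
```

In Oracle the same steps correspond to TRUNC((SYSDATE - birthdate) / 365), with AVG applied over the truncated values per medal.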
Sorting with many to many relationship
I have three tables: person, person_speaks_language and language.
person has 80 records
language has 2 records
I have the following data:
the first 10 persons speak one language
the first 70 persons (including the first group) speak 2 languages
the last 10 persons don't speak any language
Continuing with this example, I want to sort the persons by language. How can I do it correctly?
I'm trying to use the following SQL, but the result seems quite strange:
SELECT "person".*
FROM "person"
LEFT JOIN "person_speaks_language" ON "person"."id" = "person_speaks_language"."person_id"
LEFT JOIN "language" ON "person_speaks_language"."language_id" = "language"."id"
ORDER BY "language"."name" ASC
dataset
71,Catherine,Porter,male,NULL 72,Isabelle,Sharp,male,NULL 73,Scott,Chandler,male,NULL 74,Jean,Graham,male,NULL 75,Marc,Kennedy,male,NULL 76,Marion,Weaver,male,NULL 77,Melvin,Fitzgerald,male,NULL 78,Catherine,Guerrero,male,NULL 79,Linnie,Strickland,male,NULL 80,Ann,Henderson,male,NULL 11,Daniel,Boyd,female,English 12,Ora,Beck,female,English 13,Hulda,Lloyd,female,English 14,Jessie,McBride,female,English 15,Marguerite,Andrews,female,English 16,Maurice,Hamilton,female,English 17,Cecilia,Rhodes,female,English 18,Owen,Powers,female,English 19,Ivan,Butler,female,English 20,Rose,Bishop,female,English 21,Franklin,Mann,female,English 22,Martha,Hogan,female,English 23,Francis,Oliver,female,English 24,Catherine,Carlson,female,English 25,Rose,Sanchez,female,English 26,Danny,Bryant,female,English 27,Jim,Christensen,female,English 28,Eric,Banks,female,English 29,Tony,Dennis,female,English 30,Roy,Hoffman,female,English 31,Edgar,Hunter,female,English 32,Matilda,Gordon,female,English 33,Randall,Cruz,female,English 34,Allen,Brewer,female,English 35,Iva,Pittman,female,English 36,Garrett,Holland,female,English 37,Johnny,Russell,female,English 38,Nina,Richards,female,English 39,Mary,Ballard,female,English 40,Adrian,Sparks,female,English 41,Evelyn,Santos,female,English 42,Bess,Jackson,female,English
43,Nicholas,Love,female,English 44,Fred,Perkins,female,English 45,Cynthia,Dunn,female,English 46,Alan,Lamb,female,English 47,Ricardo,Sims,female,English 48,Rosie,Rogers,female,English 49,Susan,Sutton,female,English 50,Mary,Boone,female,English 51,Francis,Marshall,male,English 52,Carl,Olson,male,English 53,Mario,Becker,male,English 54,May,Hunt,male,English 55,Sophie,Neal,male,English 56,Frederick,Houston,male,English 57,Edwin,Allison,male,English 58,Florence,Wheeler,male,English 59,Julia,Rogers,male,English 60,Janie,Morgan,male,English 61,Louis,Hubbard,male,English 62,Lida,Wolfe,male,English 63,Alfred,Summers,male,English 64,Lina,Shaw,male,English 65,Landon,Carroll,male,English 66,Lilly,Harper,male,English 67,Lela,Gordon,male,English 68,Nina,Perry,male,English 69,Dean,Perez,male,English 70,Bertie,Hill,male,English 1,Nelle,Gill,female,Spanish 2,Lula,Wright,female,Spanish 3,Anthony,Jensen,female,Spanish 4,Rodney,Alvarez,female,Spanish 5,Scott,Holmes,female,Spanish 6,Daisy,Aguilar,female,Spanish 7,Elijah,Olson,female,Spanish 8,Alma,Henderson,female,Spanish 9,Willie,Barrett,female,Spanish 10,Ada,Huff,female,Spanish 11,Daniel,Boyd,female,Spanish 12,Ora,Beck,female,Spanish 13,Hulda,Lloyd,female,Spanish 14,Jessie,McBride,female,Spanish 15,Marguerite,Andrews,female,Spanish 16,Maurice,Hamilton,female,Spanish 17,Cecilia,Rhodes,female,Spanish 18,Owen,Powers,female,Spanish 19,Ivan,Butler,female,Spanish 20,Rose,Bishop,female,Spanish 21,Franklin,Mann,female,Spanish 22,Martha,Hogan,female,Spanish 23,Francis,Oliver,female,Spanish 24,Catherine,Carlson,female,Spanish 25,Rose,Sanchez,female,Spanish 26,Danny,Bryant,female,Spanish 27,Jim,Christensen,female,Spanish 28,Eric,Banks,female,Spanish 29,Tony,Dennis,female,Spanish 30,Roy,Hoffman,female,Spanish 31,Edgar,Hunter,female,Spanish 32,Matilda,Gordon,female,Spanish 33,Randall,Cruz,female,Spanish 34,Allen,Brewer,female,Spanish 35,Iva,Pittman,female,Spanish 36,Garrett,Holland,female,Spanish 37,Johnny,Russell,female,Spanish 
38,Nina,Richards,female,Spanish 39,Mary,Ballard,female,Spanish 40,Adrian,Sparks,female,Spanish 41,Evelyn,Santos,female,Spanish 42,Bess,Jackson,female,Spanish 43,Nicholas,Love,female,Spanish 44,Fred,Perkins,female,Spanish 45,Cynthia,Dunn,female,Spanish 46,Alan,Lamb,female,Spanish 47,Ricardo,Sims,female,Spanish 48,Rosie,Rogers,female,Spanish 49,Susan,Sutton,female,Spanish 50,Mary,Boone,female,Spanish 51,Francis,Marshall,male,Spanish 52,Carl,Olson,male,Spanish 53,Mario,Becker,male,Spanish 54,May,Hunt,male,Spanish 55,Sophie,Neal,male,Spanish 56,Frederick,Houston,male,Spanish 57,Edwin,Allison,male,Spanish 58,Florence,Wheeler,male,Spanish 59,Julia,Rogers,male,Spanish 60,Janie,Morgan,male,Spanish 61,Louis,Hubbard,male,Spanish 62,Lida,Wolfe,male,Spanish 63,Alfred,Summers,male,Spanish 64,Lina,Shaw,male,Spanish 65,Landon,Carroll,male,Spanish 66,Lilly,Harper,male,Spanish 67,Lela,Gordon,male,Spanish 68,Nina,Perry,male,Spanish 69,Dean,Perez,male,Spanish 70,Bertie,Hill,male,Spanish
Update
The expected result is: each person must appear only once, using the language order.
To explain the case further, I'll take a new, small dataset, using only the person id and the language name:
1,English 2,English 3,English 4,English 19,English 1,Spanish 2,Spanish 3,Spanish 4,Spanish 5,Spanish 14,Spanish 15,Spanish 16,Spanish 19,Spanish 21,Spanish 25,Spanish
I'm using the same order, but if I use a limit, for example LIMIT 8, the results will be:
1,English 2,English 3,English 4,English 19,English 1,Spanish 2,Spanish 3,Spanish
And the expected result is:
1,English 2,English 3,English 4,English 19,English 5,Spanish 14,Spanish 15,Spanish
What I'm trying to do
What I'm trying to do is sort, paginate and filter a list of X that may have a many-to-many relationship with Y; in this case X is a person and Y is a language. I need to do it in a general way. I run into trouble when I want to order the list by some property of Y.
The list will be shown like this:
firstname, lastname, gender, languages
Daniel, Boyd, female, English Spanish
Ora, Beck, female, English
Anthony, Jensen, female, Spanish
....
I only need to return an array with the IDs in the correct order. The main reason each person must appear only once in the results is that the ORM I'm using tries to hydrate each result, so if I paginate the results using offset and limit, the results may not be what is expected.
I'm making these assumptions:
• many-to-many relationships
• I can't use string_agg or group_concat, because I don't know the real data; I don't know whether the values are integers or strings
If you want each person to appear only once, then you need to aggregate by person. If you then want the list of languages, you need to combine them in some way; concatenation comes to mind. The use of double quotes suggests Postgres or Oracle to me. Here is the Postgres syntax for this (string_agg() takes the delimiter as its second argument):
SELECT p.id, string_agg(l.name, ', ') AS languages
FROM person p
LEFT JOIN person_speaks_language psl ON p.id = psl.person_id
LEFT JOIN language l ON psl.language_id = l.id
GROUP BY p.id
ORDER BY COUNT(l.name) DESC, languages;
Functionality similar to string_agg() exists in most databases.
There is nothing wrong with Bertie Hill appearing in two rows, with one language each; that is the Tabular View of Data per the Relational Model. There are no dependencies on data values or the number of data values. It is completely correct and un-confused.
But here, the requirement is confused, because you really want three separate lists:
• speaks one language
• speaks two languages [or the number of languages currently in the language file]
• speaks no language [on file]
... But you want those three lists in one list.
Concatenating data values is never, ever a good idea. It is a breach of rudimentary standards, specifically 1NF. It may be common, but it is a gross error. It may be taught by the so-called "theoreticians", but it remains a gross error. Even in a result set, yes.
It creates confusion, such as I have detailed at the top.
With concatenated strings, as the number of languages changes, the width of that concatenated field will grow, and eventually exceed the available space wherever it appears (eg. the width of the field on the screen).
These are just two of the many reasons why it is incorrect, not expandable, sub-standard.
By the way, in your "dataset" (it isn't the result set produced by your code), the sexes appear to be nicely mixed up.
Therefore the answer, and the only correct one, even if it isn't popular, is that your code is correct (it can be cleaned up, sure), and you have to educate the user re the dangers of sub-standard code or reports.
You can sort by person.name (rather than by language.name) and then write smarter SQL such that (eg) the person.name is not repeated on the second and subsequent row for persons who speak more than one language, etc. That is just pretty printing.
The non-answer, for those who insist on sub-standard code that will break one day, is Gordon's response.
Response to Comments
In the Relational Model:
There is no order to the rows; that is deemed a physical or implementation aspect, which we have no control over, which changes anyway, and which we are warned not to rely upon. If order is sought in the output result set, then we must use ORDER BY; that is its purpose in life.
The data has meaning, and that meaning is carried in Relational Keys. Meaning cannot be carried in surrogates (ie. ID columns).
Limiting myself to the files (they are not tables) that you have given, there is no such thing in the data as:
the first 10 persons who speaks one language
Obtaining persons who speak one language is simple; I believe you already understand that:
SELECT P.first_name, P.last_name
FROM person P,
     (SELECT person_id
      FROM person_speaks_language
      GROUP BY person_id
      HAVING COUNT(*) = 1 -- change this for 2 languages, etc
     ) AS PL
WHERE P.id = PL.person_id
But "first" ? "first" by what criteria ? Record creation date ?
ORDER BY date_created -- if it exists in the data
Record ID does not give first anything: as records are added and deleted, any "order" that may exist initially is completely lost. You cannot extract meaning out of, or assign meaning to, something that, by definition, has no meaning. If the Record ID is relevant, ie. you are going to use it for some purpose, then it is not a Record ID; name the field for what it actually is.
I fail to see, I do not understand, the relevance of the difference between the "dataset" and the updated "small dataset". The "dataset" size is irrelevant, the field headings are irrelevant; what the result set means is relevant.
The problem is not some "limitation" in the Relational Model; the problem is (a) your fixed view of data values, and (b) your lack of understanding about what the Relational Model is and what it does, understanding of which makes this whole question disappear, and we are left with a simple SQL (as tagged) "how to" question.
Eg. if I had a Relational Database, with persons and languages, with no ID columns, there is nothing that I cannot do with it, no report that I cannot produce from it, from the data.
Please try to use an example that conveys the meaning in the data, in what you are trying to do.
the expect results are: each person must be appear only one time
They already appear only once (for each language).
using the language order
Well, there is no order in the language file. We can give it some order, whatever order is meaningful to you, in the result set, based on the data. Eg. language.name. Of course, many persons speak each language, so what order would you like within language.name? How about last_name, first_name. The Record IDs are meaningless to the user, so I won't display them in the result set. NULL is also meaningless, and ambiguous, so I will make the meaning here explicit. This is pretty much what you have, tidied up:
SELECT [language] = CASE WHEN name IS NULL THEN '[None]' ELSE name END,
       last_name,
       first_name
FROM person P
LEFT JOIN person_speaks_language PL ON P.id = PL.person_id
LEFT JOIN language L ON PL.language_id = L.id
ORDER BY name, last_name, first_name
But then you have:
And the expected result is
The example data of which contradicts your textual descriptions:
the expect results are: each person must be appear only one time using the language order
So now, if I ignore the text, and examine the example data re what you want (which is a horrible thing to do, because I am joining you in the incorrect activity of focussing on the data values, rather than understanding the meaning), it appears you want the person to appear only once, full stop, regardless of how many languages they speak. Your example data is meaningless, so I cannot be asked to reproduce it. See if this has some meaning.
SELECT last_name,
       first_name,
       [language] = COALESCE((        -- make meaning of null explicit
           SELECT TOP 1 name          -- correlated subquery: get the "first" language
           FROM person_speaks_language PL
           JOIN language L ON PL.language_id = L.id
           WHERE P.id = PL.person_id  -- the subject person
           ORDER BY name              -- id would be meaningless
       ), '[None]')
FROM person P                         -- vector for person, once
ORDER BY last_name, first_name
Now if you wanted only persons who speak a language (on file):
SELECT last_name,
       first_name,
       [language] = (                 -- correlated subquery
           SELECT TOP 1 name          -- get the "first" language
           FROM person_speaks_language PL
           JOIN language L ON PL.language_id = L.id
           WHERE P.id = PL.person_id  -- the subject person
           ORDER BY name              -- id would be meaningless
       )
FROM person P,
     (SELECT DISTINCT person_id       -- just one occ, thanks
      FROM person_speaks_language PL  -- vector for speakers
     ) AS PL_1
WHERE P.id = PL_1.person_id           -- join them to person fields
There, not an outer join anywhere to be seen, in either solution. LEFT or RIGHT will confuse you. Do not attempt to "get everything", so that you can "see" the data values, and then mangle, hack and chop away at the result set, in order to get what you want from that. No, forget about the data values and get only what you want from the record filing system.
Response to Update
I was trying to explain the case with a data set, I think I made things tougher than they actually were
Yes, you did. Reviewing the update then ...
The short answer is, get rid of the ORM. There is nothing in it of value: you can access the RDB from the queries that populate your objects directly. The way we did for decades before the flatulent beast came along. Especially if you understand and implement Open Architecture Standards.
Further, as evidenced, it creates masses of problems. Here, you are trying to work around the insane restrictions of the ORM.
Pagination is a straightforward issue, if you have your data Normalised, and Relational Keys.
The long answer is ... please read this Answer. I trust you will understand that the approach you take to designing your app components, your design of windows, will change. All your queries will be simplified; you get only what you require for the specific window or object. The problem may well disappear entirely (except for possibly the pagination, for which you might need a method). Then please think about those architectural issues carefully, and make specific comments or questions.
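The "each person once, ordered by their first language" result can be reproduced with the small dataset from the question. The sketch below runs on SQLite through Python's sqlite3 module and flattens the join of person_speaks_language and language into one hypothetical table called speaks; MIN(language) plays the role of the TOP 1 ... ORDER BY name subquery above. It is an illustration on the sample ids only, not a general pagination scheme.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE speaks (person_id INTEGER, language TEXT)")

# The small dataset from the question: (person id, language name).
english = [1, 2, 3, 4, 19]
spanish = [1, 2, 3, 4, 5, 14, 15, 16, 19, 21, 25]
conn.executemany(
    "INSERT INTO speaks VALUES (?, ?)",
    [(pid, "English") for pid in english] + [(pid, "Spanish") for pid in spanish],
)

# One row per person: GROUP BY removes the duplicates, MIN picks the
# alphabetically first language, and the ORDER BY is fully determined,
# so LIMIT/OFFSET pages are stable.
page = conn.execute("""
    SELECT person_id, MIN(language) AS first_language
    FROM speaks
    GROUP BY person_id
    ORDER BY first_language, person_id
    LIMIT 8
""").fetchall()

for row in page:
    print(row)
```

With LIMIT 8 this returns persons 1, 2, 3, 4 and 19 under English, followed by 5, 14 and 15 under Spanish, which is the expected result listed in the question.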
Problems with distinct in SQL query
Okay, I've been trying this for a while and haven't succeeded yet. It's kind of mystifying, so please help.
Here is my table.
I need to select all distinct models and group/order them by vehicle_type. Everything is OK until I start using DISTINCT.
I'm using Postgres.
A little help with the query, please?
Assuming a model could be shared between several vehicle types:
SELECT vehicle_type, model
FROM vehicle
GROUP BY vehicle_type, model
ORDER BY vehicle_type, model
The data model does not adequately capture your reporting requirements, as the column data needs to be inspected to categorise it. Something like the following (extrapolating a possible relationship from your description):
SELECT CASE (vt.description ~ 'car$') WHEN TRUE THEN 'car' ELSE 'van' END AS vehicle_group,
       vt.description AS vehicle_sub_group,
       COUNT(*) -- or whatever aggregates you might need
FROM vehicle v
INNER JOIN vehicle_type vt ON vt.vehicle_type = v.vehicle_type
GROUP BY 1, 2;
might get you towards what you need in the stated case. However, it is a fragile way of dealing with data and will not cope well with additional complexities, e.g. if you need to further split car into saloon car, sports car, 4WD, or van into flatbed, 7.5 ton, 15 ton, etc.
SQL sub queries - is there a better way
This is an SQL efficiency question.
A while back I had to write a collection of queries to pull data from an ERP system. Most of these were simple enough, but one of them resulted in a rather inefficient query, and it's been bugging me ever since, as there's got to be a better way.
The problem is not complex. You have rows of sales data. In each row you have quantity, sales price and the salesman code, among other information.
Commission is paid based on a stepped sliding scale. The more they sell, the better the commission. Steps might be 1000, 10000, 10000$ and so forth. The real-world problem is more complex, but that's essentially it.
The only way I found of doing this was to do something like this (obviously not the real query):
select qty, price, salesman,
       (select top 1 percentage
        from comissions
        where comissions.salesman = saleslines.salesman
        and saleslines.qty > comissions.qty
        order by comissions.qty desc
       ) percentage
from saleslines
This results in the correct commission but is horrendously heavy.
Is there a better way of doing this? I'm not looking for someone to rewrite my SQL, more of a 'take a look at foobar queries' pointer, and I can take it from there.
The real-life commission structure can be specified for different salesmen, articles and clients, and even sales dates. It also changes from time to time, so everything has to be driven by the data in the tables, i.e. I can't put fixed ranges in the SQL. The current query returns some 300,000 to 400,000 rows and takes around 20-30 secs. Luckily it's only used monthly, but the slowness is kind of bugging me.
This is on MSSQL.
Ian
edit:
I should have given a more complex example from the beginning. I realize now that my initial example is missing a few essential elements of the complexity, apologies to all.
This may better capture it:
select client-code, product, product-family, qty, price, discount, salesman,
       (select top 1 percentage
        from comissions
        where comissions.salesman = saleslines.salesman
        and saleslines.qty > comissions.qty
        and [
            a collection of conditions which may or may not apply:
            Exclude rows if the salesman has offered discounts above the max discounts which appear in each row in the commissions table
            There may be a special scale for the product family
            There may be a special scale for the product
            There may be a special scale for the client
            A few more cases
        ]
        order by [
            The user can control the order through a table which can prioritize by client, family or product
            It normally goes from most to least specific.
        ]
       ) percentage
from saleslines
Needless to say, the real query is not easy to follow. Just to make life more interesting, its naming is multi-language.
Thus, for every row of saleslines the commission can be different.
It may sound overly complex, but if you think about how you would pay commission it makes sense. You don't want to pay someone for selling stuff at high discounts; you also want to be able to offer a particular client a discount on a particular product if they buy X units. The salesman should earn more if they sell more.
In all the above I'm excluding date-limited special offers.
I think partitions may be the solution, but I need to explore this in more depth, as I know nothing about partitions. It's given me a few ideas.
If you are using a version of SQL Server that supports common table expressions (SQL Server 2005 and later), a more efficient solution might be:
With RankedCommissions As
(
    Select SL.qty, SL.price, SL.salesman, C.percentage
        , Row_Number() Over ( Partition By SL.salesman, SL.qty, SL.price
                              Order By C.qty Desc ) As CommissionRank
    From SalesLines As SL
    Join Commissions As C
        On SL.salesman = C.salesman
        And SL.qty > C.qty
)
Select qty, price, salesman, percentage
From RankedCommissions
Where CommissionRank = 1
The partition has to identify a single sales line so that every line gets its own rank 1. Salesman, qty and price are used here only because they are the columns shown in the question; substitute the line's real key if it has one, since identical lines would otherwise share one partition.
If you needed to account for the possibility that there are no Commissions values for a given salesman where SalesLines.qty > Commissions.qty, then you could do something like:
With RankedCommissions As
(
    Select SL.qty, SL.price, SL.salesman, C.percentage
        , Row_Number() Over ( Partition By SL.salesman, SL.qty, SL.price
                              Order By C.qty Desc ) As CommissionRank
    From SalesLines As SL
    Join Commissions As C
        On SL.salesman = C.salesman
        And SL.qty > C.qty
)
Select SL.qty, SL.price, SL.salesman, RC.percentage
From SalesLines As SL
Left Join RankedCommissions As RC
    On RC.salesman = SL.salesman
    And RC.qty = SL.qty
    And RC.price = SL.price
    And RC.CommissionRank = 1
select saleslines.qty, saleslines.price, saleslines.salesman, max(comissions.percentage)
from saleslines
inner join comissions
on comissions.salesman = saleslines.salesman
and saleslines.qty > comissions.qty
group by saleslines.qty, saleslines.price, saleslines.salesman
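The join-and-aggregate idea above can be checked on a few rows. The sketch below runs on SQLite through Python's sqlite3 module with invented tiers and sales lines and hypothetical table names (commissions, saleslines). It uses a LEFT JOIN so that lines below every tier are kept with a NULL percentage. Two assumptions to keep in mind: MAX(percentage) only equals the percentage of the highest qualifying tier if percentages never go down as qty goes up, and the GROUP BY collapses sales lines that are identical in all three columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE commissions (salesman TEXT, qty INTEGER, percentage REAL);
CREATE TABLE saleslines (qty INTEGER, price REAL, salesman TEXT);

-- Invented commission tiers: percentage rises with qty.
INSERT INTO commissions VALUES
  ('ann', 0, 1.0), ('ann', 1000, 2.0), ('ann', 10000, 3.5),
  ('bob', 500, 1.5);

-- Invented sales lines.
INSERT INTO saleslines VALUES
  (1500, 10.0, 'ann'),   -- above the 0 and 1000 tiers
  (20000, 5.0, 'ann'),   -- above all of ann's tiers
  (100, 8.0, 'bob');     -- below bob's only tier
""")

rows = conn.execute("""
    SELECT s.qty, s.price, s.salesman, MAX(c.percentage) AS percentage
    FROM saleslines AS s
    LEFT JOIN commissions AS c
           ON c.salesman = s.salesman
          AND s.qty > c.qty
    GROUP BY s.qty, s.price, s.salesman
    ORDER BY s.salesman, s.qty
""").fetchall()

for row in rows:
    print(row)
```

Each sales line is joined once to all of its qualifying tiers and then reduced by the GROUP BY, instead of running one correlated subquery per line.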