Data Driven Restrictions of Plan Selections - sql
I have a complex data structure I am working with and I am not quite sure how to tackle it in a single SQL query, although my gut tells me this should be possible to do.
The essence of what I am doing is trying to display the results of available plans for a given vendor based on the selected hardware model. The results should adhere to only possible combinations, and the plans contain restrictions which are currently stored as key/value pairs in a restrictions table. Below is a simplification of what I am working with:
(I will use a wireless device analogy since almost everyone is familiar with cell phones.)
models Table
model_id
vendor_id
is_data
is_voice
is_4g
is_3g
Sample Data:
model_id,vendor_id,is_data,is_voice,is_4g,is_3g
DeviceA,Sprint,1,1,0,1
DeviceB,Sprint,1,0,1,0
DeviceC,Sprint,0,1,0,0
DeviceD,Sprint,0,1,0,0
DeviceE,Sprint,0,1,0,0
DeviceF,Verizon,1,1,0,1
DeviceG,Verizon,1,0,1,0
DeviceH,Verizon,0,1,0,0
DeviceI,Verizon,0,1,0,0
DeviceJ,Verizon,0,1,0,0
DeviceK,Tmobile,1,1,0,1
DeviceL,Tmobile,1,0,1,0
DeviceM,Tmobile,0,1,0,0
DeviceN,Tmobile,0,1,0,0
DeviceO,Tmobile,0,1,0,0
plans Table
plan_id
vendor_id
name
Sample Data:
plan_id,vendor_id,name
PlanA,Sprint,Big Data Only Plan
PlanB,Verizon,Small Data Only Plan
PlanC,Sprint,300 Min Plan
PlanD,Verizon,900 Min Plan
PlanE,Verizon,Big Data Only Plan
PlanF,Tmobile,Small Data Only Plan
PlanG,Tmobile,300 Min Plan
PlanH,Tmobile,1000 Min Plan
plan_restrictions Table
restriction_id
vendor_id
plan_id
type
value
Sample Data:
restriction_id,vendor_id,plan_id,type,value
1,Sprint,PlanA,radio,3G
2,Sprint,PlanA,device_type,data
3,Verizon,PlanB,radio,4G
4,Sprint,PlanC,radio,3G
5,Sprint,PlanC,device_type,voice
6,Verizon,PlanD,radio,3G
7,Verizon,PlanD,device_type,voice
8,Verizon,PlanE,radio,3G
9,Verizon,PlanE,device_type,voice
10,Tmobile,PlanF,device_type,data
11,Tmobile,PlanG,device_type,voice
12,Tmobile,PlanH,device_type,voice
Restriction keys (I actually have closer to 50; this is a representative sample):
type / value possibilities
radio / 3g, 4g
device_type / data, voice
I am open to the possibility of restructuring the tables to make it easier to re-query, however I need to retain a certain amount of flexibility since I do have about 1000 models, 1000 plans, and about 2000 restrictions.
I personally think there is some sort of structure issue here, ie. models perhaps should have their elements as key/value pairs in a separate table, but that is even more complexity, and I haven't determined yet how to properly apply data driven restrictions in the first place.
Something like this should get you started:
SELECT p.name
FROM plans AS p
INNER JOIN plan_restrictions AS pr
ON p.plan_id = pr.plan_id
INNER JOIN models AS m
ON m.vendor_id = p.vendor_id
WHERE p.vendor_id = 1 AND m.is_data = 1 AND m.is_4g = 1 AND ...
I kicked this around for about the last hour with the other DBAs here and think I solved it. I am posting this for anyone who finds themselves in a similar situation. The biggest problem was that I was too close to the data and was trying to enforce "meaningful" properties and restrictions between the plans' needs and the models' properties, which isn't really necessary.
I can restructure my data to be in the following tables:
Plans
Restrictions
Models
Plans would have a many to many relationship to Restrictions
Models would have a many to many relationship to Restrictions
I would resolve the many-to-many relationships with interim tables
Plans_Restrictions
Models_Restrictions
This would allow me to have stupid "Restrictions" such as a "Red Thing"
I would query as a chain:
Plans
Plans_Restrictions
Restrictions
Models_Restrictions
Models
ie. To get all models with their properties information (restriction info) that are eligible for a plan I could use:
SELECT
M.*
,R.*
FROM (
SELECT P1.*
FROM Plans P1
WHERE id_vendor = #id_vendor
) P
INNER JOIN Plans_Restrictions PR
ON P.plan_id = PR.plan_id
INNER JOIN Restrictions R
ON PR.property = R.property
INNER JOIN Models_Restrictions MR
ON R.property = MR.property
INNER JOIN Models M
ON MR.model_id = M.model_id
And to get all the plans that are eligible for a model, I would reverse the five-table chained join.
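A minimal sketch of how the restructured tables could be queried, written as Python driving an in-memory SQLite database so that it runs on its own. The column layout, the sample rows, and the rule that a plan is eligible only when the model carries every property the plan lists are assumptions made for illustration (the chained join above returns any model that shares at least one property with a plan). The helper names build_db and eligible_plans are hypothetical.

```python
import sqlite3


def build_db():
    """Create a tiny in-memory copy of the restructured schema (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE plans (plan_id TEXT PRIMARY KEY, vendor_id TEXT);
        CREATE TABLE models (model_id TEXT PRIMARY KEY, vendor_id TEXT);
        CREATE TABLE plans_restrictions (plan_id TEXT, property TEXT);
        CREATE TABLE models_restrictions (model_id TEXT, property TEXT);

        INSERT INTO plans VALUES
            ('PlanA', 'Sprint'), ('PlanC', 'Sprint'), ('PlanB', 'Verizon');
        INSERT INTO models VALUES
            ('DeviceA', 'Sprint'), ('DeviceB', 'Sprint'), ('DeviceG', 'Verizon');
        INSERT INTO plans_restrictions VALUES
            ('PlanA', '3g'), ('PlanA', 'data'),
            ('PlanC', '3g'), ('PlanC', 'voice'),
            ('PlanB', '4g');
        INSERT INTO models_restrictions VALUES
            ('DeviceA', 'data'), ('DeviceA', 'voice'), ('DeviceA', '3g'),
            ('DeviceB', 'data'), ('DeviceB', '4g'),
            ('DeviceG', 'data'), ('DeviceG', '4g');
    """)
    return conn


# Relational division: keep a plan only if no required property is missing
# from the model (double NOT EXISTS).
ELIGIBLE_PLANS_SQL = """
SELECT p.plan_id
FROM plans p
JOIN models m ON m.vendor_id = p.vendor_id
WHERE m.model_id = ?
  AND NOT EXISTS (
        SELECT 1
        FROM plans_restrictions pr
        WHERE pr.plan_id = p.plan_id
          AND NOT EXISTS (
                SELECT 1
                FROM models_restrictions mr
                WHERE mr.model_id = m.model_id
                  AND mr.property = pr.property))
ORDER BY p.plan_id
"""


def eligible_plans(conn, model_id):
    """Return the plan ids whose every restriction is met by the given model."""
    return [row[0] for row in conn.execute(ELIGIBLE_PLANS_SQL, (model_id,))]


if __name__ == "__main__":
    db = build_db()
    for device in ("DeviceA", "DeviceB", "DeviceG"):
        print(device, eligible_plans(db, device))
```

Swapping the roles of the two link tables in the same query would give the models eligible for a plan. If "shares any property" is the intended rule instead, the plain chained join is enough.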
Thanks Abe.. writing this all down in detail to explain it, and understanding why your suggestion didn't solve my problem really helped me understand what my problem was and what I really needed to do. I don't think I would have solved it so fast without you.
Related
My Joins in query not pulling through correctly
Good evening. Could someone please help me with the following? I am trying to join two tables. The first is wbr_global.gl_ap_details, which stores historic GL information. The second table, sandbox.utr_fixed_mapping_na, is where the account mapping is stored. For example, account number 60820 is mapped as Employee relation. The first table needs the mapping from the second table, linked on the account number. The output I am getting is not right and is way too big. Any help would be appreciated!
select sandbox.utr_fixed_mapping_na.new_mapping_1,
       sum(wbr_global.gl_ap_details.amount)
from wbr_global.gl_ap_details
LEFT JOIN sandbox.utr_fixed_mapping_na
ON wbr_global.gl_ap_details.account_number = sandbox.utr_fixed_mapping_na.account_number
Where gl_ap_details.cost_center = '1172'
and gl_ap_details.period_name = 'JUL-21'
and gl_ap_details.ledger_name = 'Amazon.com, Inc.'
Group by 1;
I tried adding the cast function, but after 5000 seconds of the query running I canceled it.
The query itself appears OK, with minor changes. Learn to use table aliases. That way you don't have to keep typing the long database.table.column everywhere, and the SQL is easier to read as well. Notice the aliases "gl" and "fm" after the tables are declared; these aliases are then used to qualify the columns. I also added the GL account number, as described below the query.
select gl.account_number, fm.new_mapping_1, sum(gl.amount)
from wbr_global.gl_ap_details gl
LEFT JOIN sandbox.utr_fixed_mapping_na fm
ON gl.account_number = fm.account_number
Where gl.cost_center = '1172'
and gl.period_name = 'JUL-21'
and gl.ledger_name = 'Amazon.com, Inc.'
Group by gl.account_number, fm.new_mapping_1
Now, as for your query and getting null: this just means that there are records in the gl_ap_details table with an account number that is not found in the utr_fixed_mapping_na table. So, to see WHICH GL account numbers do NOT exist, I have added the account number to the query. It is possible that MULTIPLE records in gl_ap_details are not found in the mapping table. So, you may get
GLAccount    Description   SumOfAmount
glaccount1   null          $someAmount
glaccount37  null          $someAmount
glaccount49  null          $someAmount
glaccount72  Depreciation  $someAmount
glaccount87  Real Estate   $someAmount
glaccount92  Building      $someAmount
glaccount99  Salaries      $someAmount
I obviously made up the GL accounts just to show the purpose. You may have several, and the null row's total amount is actually masking how many different GL account numbers were NOT found. Once you find which are missing, you can check and confirm that they SHOULD be in the mapping table.
FEEDBACK. Since you are aware of the missing numbers, let's consider a Cartesian result. If there are multiple entries in the mapping table for the same G/L account number, you will get a Cartesian result, which bloats your numbers. To clarify, let's say your mapping table has
GL  Descr1    NewMapping
1   test      Salaries
1   testView  Buildings
1   Another   Depreciation
and your GL_AP_Details has
GL  Amount
1   $100
Your total for the query would be $300, because the query joins the AP Details row for GL #1 to EACH of the entries in the mapping table, bloating the amount.
You could also add COUNT(*) as NumberOfEntries to the query to see how many transactions it THINKS it is processing. Is there some "unique ID" in the GL_AP_Details table? If so, you could also count the DISTINCT ID values. If the two differ (the distinct count is lower than the number of entries), I think THAT is your culprit.
select fm.new_mapping_1,
       sum(gl.amount),
       count(*) as NumberOfEntries,
       count( distinct gl.UniqueIdField ) as DistinctTransactions
from wbr_global.gl_ap_details gl
LEFT JOIN sandbox.utr_fixed_mapping_na fm
ON gl.account_number = fm.account_number
Where gl.cost_center = '1172'
and gl.period_name = 'JUL-21'
and gl.ledger_name = 'Amazon.com, Inc.'
Group by fm.new_mapping_1
Might you also need to limit the mapping table for a specific prophecy or mec view?
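The $100 to $300 inflation described above can be reproduced in a few lines. This is a sketch using Python's sqlite3 module with made-up table contents that mirror the example; the shortened table name mapping, the id column, and the choice of MIN(new_mapping_1) to collapse duplicates are assumptions for illustration, not a recommendation for the real schema.

```python
import sqlite3


def fan_out_demo():
    """Show how duplicate mapping rows multiply SUM() and how the counts expose it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE gl_ap_details (id INTEGER PRIMARY KEY,
                                    account_number TEXT,
                                    amount REAL);
        CREATE TABLE mapping (account_number TEXT, new_mapping_1 TEXT);

        INSERT INTO gl_ap_details VALUES (1, '1', 100.0);
        -- three mapping rows for the same account number
        INSERT INTO mapping VALUES
            ('1', 'Salaries'), ('1', 'Buildings'), ('1', 'Depreciation');
    """)

    # Joining straight to the mapping table repeats the detail row three times.
    inflated = conn.execute("""
        SELECT SUM(gl.amount), COUNT(*), COUNT(DISTINCT gl.id)
        FROM gl_ap_details gl
        LEFT JOIN mapping fm ON gl.account_number = fm.account_number
    """).fetchone()

    # Collapsing the mapping to one row per account first removes the fan-out.
    deduplicated = conn.execute("""
        SELECT SUM(gl.amount), COUNT(*), COUNT(DISTINCT gl.id)
        FROM gl_ap_details gl
        LEFT JOIN (SELECT account_number, MIN(new_mapping_1) AS new_mapping_1
                   FROM mapping
                   GROUP BY account_number) fm
               ON gl.account_number = fm.account_number
    """).fetchone()
    return inflated, deduplicated


if __name__ == "__main__":
    print(fan_out_demo())
```

When COUNT(*) is larger than the distinct count of detail ids, the join is multiplying rows, which is the symptom described in the answer.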
If you "think" that the result of an aggregate is wrong, the easiest way to verify this is to select the individual rows that correspond to one record in the aggregate output and inspect them, looking for duplications. For instance, pick 'Building Management':
SELECT fixed.new_mapping_1, details.amount, *
FROM wbr_global.gl_ap_details details
LEFT JOIN sandbox.utr_fixed_mapping_na fixed
ON details.account_number = fixed.account_number
WHERE details.cost_center = '1172'
AND details.period_name = 'JUL-21'
AND details.ledger_name = 'Amazon.com, Inc.'
AND fixed.new_mapping_1 = 'Building Management'
Notice that we tack a ,* onto the end of the projection. This shows you everything that the query has access to. Look for repeating sections of data that you were not expecting; then, depending on which table they originate from, you might add additional criteria to the JOIN or to the WHERE, or you might need to group by additional columns.
This type of issue is really hard to comment on in a forum like this because it is highly specific to your schema and the data contained within it, which makes solutions dependent on criteria you are not likely to publish online.
Generally, if you think a calculation is wrong, you need to compute it manually to verify. The advice above helps you inspect the data your query is using. You should either construct your own query or use other tools to build the data set that lets you compute the correct values by hand, then work them back into, or replace, your original query.
The speed issues are out of scope here. We can comment on the poor schema design, but I suspect you don't have a choice. In the utr_fixed_mapping_na table you should give account_number the same column type as the source data, or add a new column that holds the data in the original type; then you can set up indexes on the columns to improve the speed of the join.
How to select data with a complex condition?
Using Microsoft Access, I normally use conditions (mostly WHERE) to obtain the data I want to display. So far, it has gone well. However, now I have a complex filter and I'm not sure of the best way to do it. I will explain how I do it with several queries, and I'd like to know if there is something simpler, since I feel like it's doing too much for what I accomplish.
I have Building and Energy tables. Between them, I have a link table, since a Building has a list of possible energies. My goal is to display ALL energies not already associated with the building.
I first have a simple query that displays all the IDs of the energies that are in the link table where the building is the one of interest. Once I have that, I have another query built on this one, which displays an energy if it is absent from the previous list. This takes 2 queries and I feel like there could be a better way to do this. I'm fairly new to MS Access, so any suggestion is welcome.
Here is the first query, which obtains the list of energies:
SELECT Batiments.ID, Energies.ID, Energies.Type
FROM Energies
INNER JOIN (Batiments INNER JOIN Batiment_Energie ON Batiments.ID = Batiment_Energie.Batiment_ID)
ON Energies.ID = Batiment_Energie.Energie_ID
WHERE (((Batiments.ID) = " & cbxBatiments.Column(0) & "));"
You can query the non-associated energy types with
SELECT ID, Type
FROM Energies
WHERE ID NOT IN (SELECT Energie_ID FROM Batiment_Energie WHERE Batiment_ID = 123)
where 123 is to be replaced by the ID coming from cbxBatiments.Column(0).
You can use NOT EXISTS:
select e.*
from Energies as e
where not exists (select 1
                  from Batiment_Energie as be
                  where be.energie_id = e.id and be.batiment_id = <your id>
                 );
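The two anti-join forms suggested above can be compared side by side. The sketch below uses Python's sqlite3 module rather than Access, with invented rows (the energy names and building IDs 123 and 456 are assumptions). It also illustrates a general SQL caveat: if the subquery behind NOT IN can yield a NULL, NOT IN returns no rows at all, while NOT EXISTS is unaffected.

```python
import sqlite3


def unassociated_energies(with_null_link=False):
    """Return (NOT IN result, NOT EXISTS result) for building 123."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Energies (ID INTEGER PRIMARY KEY, Type TEXT);
        CREATE TABLE Batiment_Energie (Batiment_ID INTEGER, Energie_ID INTEGER);

        INSERT INTO Energies VALUES (1, 'Gas'), (2, 'Electricity'), (3, 'Solar');
        INSERT INTO Batiment_Energie VALUES (123, 1), (123, 3), (456, 2);
    """)
    if with_null_link:
        # A link row whose energy id is missing (NULL).
        conn.execute("INSERT INTO Batiment_Energie VALUES (123, NULL)")

    not_in = conn.execute("""
        SELECT ID, Type FROM Energies
        WHERE ID NOT IN (SELECT Energie_ID
                         FROM Batiment_Energie
                         WHERE Batiment_ID = ?)
        ORDER BY ID
    """, (123,)).fetchall()

    not_exists = conn.execute("""
        SELECT e.ID, e.Type FROM Energies AS e
        WHERE NOT EXISTS (SELECT 1
                          FROM Batiment_Energie AS be
                          WHERE be.Energie_ID = e.ID
                            AND be.Batiment_ID = ?)
        ORDER BY e.ID
    """, (123,)).fetchall()
    return not_in, not_exists


if __name__ == "__main__":
    print(unassociated_energies())
    print(unassociated_energies(with_null_link=True))
```

With clean link data both forms return the same rows, so the choice is mostly about readability and about how the engine at hand optimizes each form.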
Access query speed differs
I have a local Access database and in it a query which takes values from a form to populate a drop-down menu. The weird (to me) thing is that with most options this query is quick (blink of an eye), but with a few options it's very slow (>10 seconds).
What the query does is as follows: it populates a drop-down menu to record animals seen at a specific sighting, but only those animals which have not been recorded at that specific sighting yet (to avoid duplicate entries).
SELECT DISTINCT tblAnimals.AnimalID, tblAnimals.Nickname, tblAnimals.Species
FROM tblSightings
INNER JOIN (tblAnimals INNER JOIN tblAnimalsatSighting ON tblAnimals.AnimalID = tblAnimalsatSighting.AnimalID)
ON tblSightings.SightingID = tblAnimalsatSighting.SightingID
WHERE (((tblAnimals.Species)=[form]![Species])
AND ((tblAnimals.CurrentGroup)=[form]![AnimalGroup2])
AND ((tblAnimals.[Dead?])=False)
AND ((Exists (select tblAnimalsatSighting.AnimalID FROM tblAnimalsatSighting WHERE tblAnimals.AnimalID = tblAnimalsatSighting.AnimalID AND tblAnimalsatSighting.SightingID = [form]![SightingID]))=False));
It performs well for all groups of 2 of the 4 possible species; for 1 species it performs well for 4 of the 5 groups, but not for the last group; and for the last species it performs very slowly for both groups.
Does anybody have an idea what could cause this kind of behavior? Is it a problem with the query? Or could duplicate entries in the tables cause this? I don't think it's duplicates in the tables. I've checked that, and there are some, but they appear both for groups where there are problems and for groups where there aren't. Could I rewrite the query so it performs faster?
As noted in our comments above, you confirmed that the extra joins were not really needed and were in fact going to limit the results to animals that had already had a sighting. Those joins would also likely contribute to a slowdown. I know that Access probably added most of the parentheses automatically, but I've removed them and converted the subquery to a NOT EXISTS form that's a lot more readable.
SELECT tblAnimals.AnimalID, tblAnimals.Nickname, tblAnimals.Species
FROM tblAnimals
WHERE tblAnimals.Species = [form]![Species]
AND tblAnimals.CurrentGroup = [form]![AnimalGroup2]
AND tblAnimals.[Dead?] = False
AND NOT EXISTS (
    SELECT tblAnimalsatSighting.AnimalID
    FROM tblAnimalsatSighting
    WHERE tblAnimals.AnimalID = tblAnimalsatSighting.AnimalID
    AND tblAnimalsatSighting.SightingID = [form]![SightingID]
);
Sorting with many to many relationship
I have 3 tables: person, person_speaks_language and language.
person has 80 records
language has 2 records
I have the following records:
the first 10 persons speak one language
the first 70 persons (including the first group) speak 2 languages
the last 10 persons don't speak any language
Following the example, I want to sort the persons by language. How can I do it correctly? I'm trying to use the following SQL, but the result seems quite strange:
SELECT "person".*
FROM "person"
LEFT JOIN "person_speaks_language" ON "person"."id" = "person_speaks_language"."person_id"
LEFT JOIN "language" ON "person_speaks_language"."language_id" = "language"."id"
ORDER BY "language"."name" ASC
dataset
71,Catherine,Porter,male,NULL 72,Isabelle,Sharp,male,NULL 73,Scott,Chandler,male,NULL 74,Jean,Graham,male,NULL 75,Marc,Kennedy,male,NULL 76,Marion,Weaver,male,NULL 77,Melvin,Fitzgerald,male,NULL 78,Catherine,Guerrero,male,NULL 79,Linnie,Strickland,male,NULL 80,Ann,Henderson,male,NULL 11,Daniel,Boyd,female,English 12,Ora,Beck,female,English 13,Hulda,Lloyd,female,English 14,Jessie,McBride,female,English 15,Marguerite,Andrews,female,English 16,Maurice,Hamilton,female,English 17,Cecilia,Rhodes,female,English 18,Owen,Powers,female,English 19,Ivan,Butler,female,English 20,Rose,Bishop,female,English 21,Franklin,Mann,female,English 22,Martha,Hogan,female,English 23,Francis,Oliver,female,English 24,Catherine,Carlson,female,English 25,Rose,Sanchez,female,English 26,Danny,Bryant,female,English 27,Jim,Christensen,female,English 28,Eric,Banks,female,English 29,Tony,Dennis,female,English 30,Roy,Hoffman,female,English 31,Edgar,Hunter,female,English 32,Matilda,Gordon,female,English 33,Randall,Cruz,female,English 34,Allen,Brewer,female,English 35,Iva,Pittman,female,English 36,Garrett,Holland,female,English 37,Johnny,Russell,female,English 38,Nina,Richards,female,English 39,Mary,Ballard,female,English 40,Adrian,Sparks,female,English 41,Evelyn,Santos,female,English 42,Bess,Jackson,female,English 
43,Nicholas,Love,female,English 44,Fred,Perkins,female,English 45,Cynthia,Dunn,female,English 46,Alan,Lamb,female,English 47,Ricardo,Sims,female,English 48,Rosie,Rogers,female,English 49,Susan,Sutton,female,English 50,Mary,Boone,female,English 51,Francis,Marshall,male,English 52,Carl,Olson,male,English 53,Mario,Becker,male,English 54,May,Hunt,male,English 55,Sophie,Neal,male,English 56,Frederick,Houston,male,English 57,Edwin,Allison,male,English 58,Florence,Wheeler,male,English 59,Julia,Rogers,male,English 60,Janie,Morgan,male,English 61,Louis,Hubbard,male,English 62,Lida,Wolfe,male,English 63,Alfred,Summers,male,English 64,Lina,Shaw,male,English 65,Landon,Carroll,male,English 66,Lilly,Harper,male,English 67,Lela,Gordon,male,English 68,Nina,Perry,male,English 69,Dean,Perez,male,English 70,Bertie,Hill,male,English 1,Nelle,Gill,female,Spanish 2,Lula,Wright,female,Spanish 3,Anthony,Jensen,female,Spanish 4,Rodney,Alvarez,female,Spanish 5,Scott,Holmes,female,Spanish 6,Daisy,Aguilar,female,Spanish 7,Elijah,Olson,female,Spanish 8,Alma,Henderson,female,Spanish 9,Willie,Barrett,female,Spanish 10,Ada,Huff,female,Spanish 11,Daniel,Boyd,female,Spanish 12,Ora,Beck,female,Spanish 13,Hulda,Lloyd,female,Spanish 14,Jessie,McBride,female,Spanish 15,Marguerite,Andrews,female,Spanish 16,Maurice,Hamilton,female,Spanish 17,Cecilia,Rhodes,female,Spanish 18,Owen,Powers,female,Spanish 19,Ivan,Butler,female,Spanish 20,Rose,Bishop,female,Spanish 21,Franklin,Mann,female,Spanish 22,Martha,Hogan,female,Spanish 23,Francis,Oliver,female,Spanish 24,Catherine,Carlson,female,Spanish 25,Rose,Sanchez,female,Spanish 26,Danny,Bryant,female,Spanish 27,Jim,Christensen,female,Spanish 28,Eric,Banks,female,Spanish 29,Tony,Dennis,female,Spanish 30,Roy,Hoffman,female,Spanish 31,Edgar,Hunter,female,Spanish 32,Matilda,Gordon,female,Spanish 33,Randall,Cruz,female,Spanish 34,Allen,Brewer,female,Spanish 35,Iva,Pittman,female,Spanish 36,Garrett,Holland,female,Spanish 37,Johnny,Russell,female,Spanish 
38,Nina,Richards,female,Spanish 39,Mary,Ballard,female,Spanish 40,Adrian,Sparks,female,Spanish 41,Evelyn,Santos,female,Spanish 42,Bess,Jackson,female,Spanish 43,Nicholas,Love,female,Spanish 44,Fred,Perkins,female,Spanish 45,Cynthia,Dunn,female,Spanish 46,Alan,Lamb,female,Spanish 47,Ricardo,Sims,female,Spanish 48,Rosie,Rogers,female,Spanish 49,Susan,Sutton,female,Spanish 50,Mary,Boone,female,Spanish 51,Francis,Marshall,male,Spanish 52,Carl,Olson,male,Spanish 53,Mario,Becker,male,Spanish 54,May,Hunt,male,Spanish 55,Sophie,Neal,male,Spanish 56,Frederick,Houston,male,Spanish 57,Edwin,Allison,male,Spanish 58,Florence,Wheeler,male,Spanish 59,Julia,Rogers,male,Spanish 60,Janie,Morgan,male,Spanish 61,Louis,Hubbard,male,Spanish 62,Lida,Wolfe,male,Spanish 63,Alfred,Summers,male,Spanish 64,Lina,Shaw,male,Spanish 65,Landon,Carroll,male,Spanish 66,Lilly,Harper,male,Spanish 67,Lela,Gordon,male,Spanish 68,Nina,Perry,male,Spanish 69,Dean,Perez,male,Spanish 70,Bertie,Hill,male,Spanish
Update
The expected results are: each person must appear only one time, using the language order.
To explain the case further, I'll take a new, small dataset, using only the person id and the language name:
1,English 2,English 3,English 4,English 19,English 1,Spanish 2,Spanish 3,Spanish 4,Spanish 5,Spanish 14,Spanish 15,Spanish 16,Spanish 19,Spanish 21,Spanish 25,Spanish
I'm using the same order, but if I use a limit, for example LIMIT 8, the results will be
1,English 2,English 3,English 4,English 19,English 1,Spanish 2,Spanish 3,Spanish
And the expected result is
1,English 2,English 3,English 4,English 19,English 5,Spanish 14,Spanish 15,Spanish
What I'm trying to do
What I'm trying to do is sort, paginate and filter a list of X that may have a many-to-many relationship with Y; in this case X is a person and Y is the language. I need to do it in a general way. I run into trouble if I want to order the list by some property of Y.
The list will be shown in this way:
firstname, lastname, gender, languages
Daniel, Boyd, female, English Spanish
Ora, Beck, female, English
Anthony, Jensen, female, Spanish
....
I only need to return an array with the IDs in the correct order. The main reason each person must appear only once in the results is that the ORM I'm using tries to hydrate each result, and if I paginate the results using offset and limit, the results may not be the expected ones.
I'm making assumptions about many-to-many relationships. I can't use string_agg or group_concat because I don't know the real data; I don't know whether the values are integers or strings.
If you want each person to appear only once, then you need to aggregate by that person. If you then want the list of languages, you need to combine them in some way; concatenation comes to mind. The use of double quotes suggests Postgres or Oracle to me. Here is Postgres syntax for this (string_agg() takes the delimiter as its second argument):
SELECT p.id, string_agg(l.name, ', ') as languages
FROM person p
LEFT JOIN person_speaks_language psl ON p.id = psl.person_id
LEFT JOIN language l ON psl.language_id = l.id
GROUP BY p.id
ORDER BY COUNT(l.name) DESC, languages;
Similar functionality to string_agg() exists in most databases.
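If only the ordered person IDs are needed for pagination, aggregating the sort key itself is another option that avoids concatenation. The sketch below, using Python's sqlite3 module, orders each person by the alphabetically first language they speak and reproduces the asker's small dataset; person 30 (no language) is an invented extra row, and sorting people without a language last, as well as the helper names build_people and page_of_ids, are assumptions for illustration.

```python
import sqlite3

# (person_id, language) pairs from the small dataset in the question.
SPEAKS = [(1, "English"), (2, "English"), (3, "English"), (4, "English"),
          (19, "English"), (1, "Spanish"), (2, "Spanish"), (3, "Spanish"),
          (4, "Spanish"), (5, "Spanish"), (14, "Spanish"), (15, "Spanish"),
          (16, "Spanish"), (19, "Spanish"), (21, "Spanish"), (25, "Spanish")]


def build_people():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person (id INTEGER PRIMARY KEY);
        CREATE TABLE language (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE person_speaks_language (person_id INTEGER, language_id INTEGER);
        INSERT INTO language VALUES (1, 'English'), (2, 'Spanish');
    """)
    person_ids = sorted({pid for pid, _ in SPEAKS} | {30})  # 30 speaks nothing
    conn.executemany("INSERT INTO person VALUES (?)", [(pid,) for pid in person_ids])
    lang_id = {"English": 1, "Spanish": 2}
    conn.executemany("INSERT INTO person_speaks_language VALUES (?, ?)",
                     [(pid, lang_id[name]) for pid, name in SPEAKS])
    return conn


PAGE_SQL = """
SELECT p.id
FROM person p
LEFT JOIN person_speaks_language psl ON p.id = psl.person_id
LEFT JOIN language l ON psl.language_id = l.id
GROUP BY p.id                       -- one row per person
ORDER BY MIN(l.name) IS NULL,       -- people with no language go last
         MIN(l.name),               -- then by their first language
         p.id                       -- stable tie-breaker for paging
LIMIT ? OFFSET ?
"""


def page_of_ids(conn, limit, offset=0):
    return [row[0] for row in conn.execute(PAGE_SQL, (limit, offset))]


if __name__ == "__main__":
    db = build_people()
    print(page_of_ids(db, 8))
    print(page_of_ids(db, 8, 8))
```

Because MIN() works on numbers as well as strings, this does not depend on knowing the type of the ordering column in advance.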
There is nothing wrong with Bertie Hill appearing in two rows, with one language each; that is the Tabular View of Data per the Relational Model. There are no dependencies on data values or on the number of data values. It is completely correct and un-confused.
But here, the requirement is confused, because you really want three separate lists:
speaks one language
speaks two languages [or the number of languages currently in the language file]
speaks no language [on file]
But you want those three lists in one list.
Concatenating data values is never, ever a good idea. It is a breach of rudimentary standards, specifically 1NF. It may be common, but it is a gross error. It may be taught by the so-called "theoreticians", but it remains a gross error. Even in a result set, yes. It creates confusion, such as I have detailed at the top.
With concatenated strings, as the number of languages changes, the width of that concatenated field will grow and eventually exceed the space wherever it appears (e.g. the width of the field on the screen). These are just two of the many reasons why it is incorrect, not expandable, sub-standard.
By the way, in your "dataset" (it isn't the result set produced by your code), the sexes appear to be nicely mixed up.
Therefore the answer, and the only correct one, even if it isn't popular, is that your code is correct (it can be cleaned up, sure), and you have to educate the user about the dangers of sub-standard code or reports.
You can sort by person.name (rather than by language.name) and then write smarter SQL such that (e.g.) the person.name is not repeated on the second and subsequent rows for persons who speak more than one language, etc. That is just pretty printing.
The non-answer, for those who insist on sub-standard code that will break one day, is Gordon's response.
Response to Comments
In the Relational Model:
There is no order to the rows. That is deemed a physical or implementation aspect, which we have no control over, which changes anyway, and which we are warned not to rely upon. If order is sought in the output result set, then we must use ORDER BY; that is its purpose in life.
The data has meaning, and that meaning is carried in Relational Keys. Meaning cannot be carried in surrogates (i.e. ID columns).
Limiting myself to the files (they are not tables) that you have given, there is no such thing in the data as:
the first 10 persons who speaks one language
Obtaining persons who speak one language is simple, and I believe you already understand that:
SELECT P.first_name, P.last_name
FROM person P,
    ( SELECT person_id
      FROM person_speaks_language
      GROUP BY person_id
      HAVING COUNT(*) = 1 -- change this for 2 languages, etc
    ) AS PL
WHERE P.id = PL.person_id
But "first"? "First" by what criteria? Record creation date?
ORDER BY date_created -- if it exists in the data
Record ID does not give first anything: as records are added and deleted, any "order" that may exist initially is completely lost. You cannot extract meaning out of, or assign meaning to, something that, by definition, has no meaning. If the Record ID is relevant, i.e. you are going to use it for some purpose, then it is not a Record ID; name the field for what it actually is.
I fail to see the relevance of the difference between the "dataset" and the updated "small dataset". The "dataset" size is irrelevant and the field headings are irrelevant; what the result set means is relevant.
The problem is not some "limitation" in the Relational Model. The problem is (a) your fixed view of data values, and (b) your lack of understanding of what the Relational Model is and what it does, an understanding of which makes this whole question disappear, leaving us with a simple SQL (as tagged) "how to" question. Eg. 
If I had a Relational Database, with persons and languages, with no ID columns, there is nothing that I cannot do with it, no report that I cannot produce from it, from the data.
Please try to use an example that conveys the meaning in the data, in what you are trying to do.
the expect results are: each person must be appear only one time
They already appear only once (for each language).
using the language order
Well, there is no order in the language file. We can give it some order, whatever order is meaningful to you, in the result set, based on the data, e.g. language.name.
Of course, many persons speak each language, so what order would you like within language.name? How about last_name, first_name.
The Record IDs are meaningless to the user, so I won't display them in the result set.
NULL is also meaningless, and ambiguous, so I will make the meaning here explicit.
This is pretty much what you have, tidied up (note that the null test has to be written as IS NULL; a simple CASE name WHEN NULL never matches):
SELECT [language] = CASE WHEN name IS NULL THEN '[None]' ELSE name END,
       last_name,
       first_name
FROM person P
LEFT JOIN person_speaks_language PL ON P.id = PL.person_id
LEFT JOIN language L ON PL.language_id = L.id
ORDER BY name, last_name, first_name
But then you have:
And the expected result is
The example data of which contradicts your textual descriptions:
the expect results are: each person must be appear only one time using the language order
So now, if I ignore the text and examine the example data regarding what you want (which is a horrible thing to do, because I am joining you in the incorrect activity of focussing on the data values rather than understanding the meaning), it appears you want the person to appear only once, full stop, regardless of how many languages they speak.
Your example data is meaningless, so I cannot be asked to reproduce it. See if this has some meaning. 
SELECT last_name,
       first_name,
       [language] = COALESCE( ( -- correlated subquery
           SELECT TOP 1 -- get the "first" language
               name
           FROM person_speaks_language PL
           JOIN language L ON PL.language_id = L.id
           WHERE P.id = PL.person_id -- the subject person
           ORDER BY name -- id would be meaningless
           ), '[None]' ) -- make meaning of null explicit
FROM person P -- vector for person, once
ORDER BY last_name, first_name
Now if you wanted only persons who speak a language (on file):
SELECT last_name,
       first_name,
       [language] = ( -- correlated subquery
           SELECT TOP 1 -- get the "first" language
               name
           FROM person_speaks_language PL
           JOIN language L ON PL.language_id = L.id
           WHERE P.id = PL.person_id -- the subject person
           ORDER BY name -- id would be meaningless
           )
FROM person P,
    ( SELECT DISTINCT person_id -- just one occ, thanks
      FROM person_speaks_language PL -- vector for speakers
    ) AS PL_1
WHERE P.id = PL_1.person_id -- join them to person fields
There, not an outer join anywhere to be seen, in either solution. LEFT or RIGHT will confuse you. Do not attempt to "get everything" so that you can "see" the data values, and then mangle, hack and chop away at the result set in order to get what you want from that. No, forget about the data values and get only what you want from the record filing system.
Response to Update
I was trying to explain the case with a data set, I think I made things tougher than they actually were
Yes, you did. Reviewing the update then ...
The short answer is, get rid of the ORM. There is nothing in it of value: you can access the RDB from the queries that populate your objects directly, the way we did for decades before the flatulent beast came along. Especially if you understand and implement Open Architecture Standards.
Further, as evidenced, it creates masses of problems. Here, you are trying to work around the insane restrictions of the ORM. Pagination is a straightforward issue if you have your data Normalised, and Relational Keys.
The long answer is ... please read this Answer. I trust you will understand that the approach you take to designing your app components, your design of windows, will change. All your queries will be simplified; you get only what you require for the specific window or object. The problem may well disappear entirely (except possibly for the pagination, for which you might need a method). Then please think about those architectural issues carefully, and make specific comments or questions.
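One detail worth checking whenever a missing language is relabelled as "[None]": in standard SQL a simple CASE of the form CASE name WHEN NULL compares with =, and NULL = NULL is not true, so that branch never fires. A searched CASE with IS NULL, or COALESCE, does what is intended. The sketch below shows the difference using Python's sqlite3 module; the column name x and the label text are placeholders.

```python
import sqlite3


def null_label_demo():
    """Compare three ways of labelling a NULL value."""
    conn = sqlite3.connect(":memory:")
    return conn.execute("""
        SELECT CASE x WHEN NULL THEN '[None]' ELSE 'fell through' END,  -- never matches
               CASE WHEN x IS NULL THEN '[None]' ELSE x END,            -- searched CASE
               COALESCE(x, '[None]')                                    -- shorthand
        FROM (SELECT NULL AS x)
    """).fetchone()


if __name__ == "__main__":
    print(null_label_demo())
```

The same applies to a scalar subquery that finds no row: it yields NULL, so the relabelling has to wrap the subquery from the outside.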
Multiple Joins + Lots of Data Optimization
I am working on a massive join at work and have very limited resources in terms of being able to add indexes and such, as well as in what I can do in the query itself, due to the environment (i.e. I can only select data; no variables or table creation allowed). I have read somewhere that a subquery will automatically index the result. Is this true? Also, each of my major join tables (3) has ~140K rows. I have to join 2 extra tables to ensure the filtering is correct. The query is listed below; I currently have the criteria on the JOIN clause. Another question: if I move my criteria to a WHERE clause, either inside or outside the subquery, will it help?
SELECT *
FROM (SELECT NULL AS A1,
             DFS_ROHEADER.TECHID,
             DFS_ROHEADER.RONUMBER,
             DFS_ROHEADER.CUSTOMERNUMBER,
             DFS_CUSTOMER.BNAME,
             DFS_ROHEADER.UNITNUMBER,
             DFS_ROHEADER.MILEAGE,
             DFS_ROHEADER.OPENEDDATE,
             DFS_ROHEADER.CLOSEDDATE,
             DFS_ROHEADER.STATUS,
             DFS_ROHEADER.PONUMBER,
             DFS_TECH.REGION,
             DFS_TECH.RSM,
             DFS_ROPART.PARTID,
             CONVERT(NVARCHAR(max), DFS_RODETAIL.STORY) AS STORY
      FROM DFS_ROHEADER
      LEFT JOIN DFS_CUSTOMER
        ON DFS_ROHEADER.CUSTOMERNUMBER = DFS_CUSTOMER.CUST_NO
      LEFT JOIN DFS_TECH
        ON DFS_ROHEADER.TECHID = DFS_TECH.TECHID
      INNER JOIN DFS_RODETAIL
        ON DFS_ROHEADER.RONUMBER = DFS_RODETAIL.RONUMBER
      INNER JOIN DFS_ROPART
        ON DFS_RODETAIL.RONUMBER = DFS_ROPART.RONUMBER
       AND DFS_RODETAIL.LINENUMBER = DFS_ROPART.LINENUMBER
       AND DFS_ROHEADER.RONUMBER LIKE '%$FF_RONumber%'
       AND DFS_ROHEADER.UNITNUMBER LIKE '%$FF_UnitNumber%'
       AND DFS_ROHEADER.PONUMBER LIKE '%$FF_PONumber%'
       AND ( DFS_CUSTOMER.BNAME LIKE '%$FF_Customer%' OR DFS_CUSTOMER.BNAME IS NULL )
       AND DFS_ROHEADER.TECHID LIKE '%$FF_TechID%'
       AND DFS_ROHEADER.CLOSEDDATE BETWEEN FF_ClosedBegin AND FF_ClosedEnd
       AND ( DFS_TECH.REGION LIKE '%$FilterRegion%' OR DFS_TECH.REGION IS NULL )
       AND ( DFS_TECH.RSM LIKE '%$FF_RSM%' OR DFS_TECH.RSM IS NULL )
       AND DFS_RODETAIL.STORY LIKE '%$FF_Story%'
       AND DFS_ROPART.PARTID LIKE '%$FF_PartID%'
      WHERE DFS_ROHEADER.DELETED_BY < 0
        AND DFS_RODETAIL.DELETED_BY < 0
        AND DFS_ROPART.DELETED_BY < 0) T
ORDER BY T.RONUMBER
This query works; however, it can take forever to run and can time out. I have other queries that also run in the environment, and I will take whatever suggestions you can give me and apply them to those. I am using SQL Server 2000. Thanks for the help.
EDIT: Execution Plan: https://dl.dropboxusercontent.com/u/99733863/ExecutionPlan.sqlplan
UPDATE: I have come to the conclusion that the environment I'm working in is the cause of the problem. My query works as intended and is not slow at all (1 sec. for 18,000 rows). As stated in the comments, I have to fill grids with limited flexibility, and I believe that these grids are filled by first filling a temporary grid with the SQL statement and then copying row by row into the desired grid. There is a good chance that this is the cause of my issues. Thanks for the help.
I have come to the conclusion the environment I'm working in is the cause of the problem. My query works as intended and is not slow at all (1 sec. for 18,000 rows). As stated in the comments I have to fill grids with limited flexibility and I believe that these grids fill by first filling a temporary grid with the SQL statement and then copying row by row into the desired grid. There is a good chance that this is the cause of my issues. Thanks for the help everyone.
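On the earlier question of whether criteria belong in the JOIN clause or in WHERE: for an INNER JOIN the two placements are logically equivalent, but for a LEFT JOIN they can return different rows, so moving a condition is not only a performance matter. The sketch below shows this with Python's sqlite3 module; the two tiny tables and their rows are invented stand-ins for the header and technician tables, not the real schema.

```python
import sqlite3


def on_versus_where():
    """Return rows for the same filter placed in ON and in WHERE of a LEFT JOIN."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ro_header (ronumber INTEGER, techid TEXT);
        CREATE TABLE tech (techid TEXT, region TEXT);
        INSERT INTO ro_header VALUES (1, 'T1'), (2, 'T2');
        INSERT INTO tech VALUES ('T1', 'East'), ('T2', 'West');
    """)

    # Filter in ON: every header row survives; non-matching techs become NULL.
    in_on = conn.execute("""
        SELECT h.ronumber, t.region
        FROM ro_header h
        LEFT JOIN tech t ON h.techid = t.techid AND t.region = 'East'
        ORDER BY h.ronumber
    """).fetchall()

    # Filter in WHERE: rows whose tech does not match are removed afterwards.
    in_where = conn.execute("""
        SELECT h.ronumber, t.region
        FROM ro_header h
        LEFT JOIN tech t ON h.techid = t.techid
        WHERE t.region = 'East'
        ORDER BY h.ronumber
    """).fetchall()
    return in_on, in_where


if __name__ == "__main__":
    print(on_versus_where())
```

Any condition that is moved should therefore be checked against the kind of join it is attached to before timings are compared.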
My 2 cents here.. In general LIKE is not very well optimized. In your case you also seem to be using LIKE with '%value%'. In that case the query optimizer has to scan the entire index. At a minimum I would see if there is a way to avoid using this.
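The effect of a leading wildcard can be observed in any engine that exposes its query plan. The sketch below uses SQLite through Python's sqlite3 module purely as a stand-in (the question concerns SQL Server 2000, whose optimizer and plan output differ); the table, the index name idx_ro_number, and the literal values are made up. It compares a contains-style LIKE with a range predicate that expresses a prefix match.

```python
import sqlite3


def plan_details(conn, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


def like_versus_range():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ro_header (ronumber TEXT, unitnumber TEXT);
        CREATE INDEX idx_ro_number ON ro_header (ronumber);
    """)
    # Leading wildcard: the index cannot be used to narrow the rows.
    contains = plan_details(
        conn, "SELECT ronumber FROM ro_header WHERE ronumber LIKE '%123%'")
    # Prefix match written as a range: the index can seek to the start.
    prefix = plan_details(
        conn, "SELECT ronumber FROM ro_header "
              "WHERE ronumber >= '123' AND ronumber < '124'")
    return contains, prefix


if __name__ == "__main__":
    for label, plan in zip(("contains", "prefix"), like_versus_range()):
        print(label, plan)
```

In SQLite the first plan is reported as a SCAN and the second as a SEARCH on the index. If the filters can be restricted to prefix matches, that is the kind of change that lets an index help.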