Need Input | SQL Dynamic Query - sql

Have a requirement where I need to build a dynamic query based on user input and send the count of records from result set.
So there are 6 tables which I needs to make a join Inner for sure and rest table join will be based on user input and this should be performance oriented.
Here is the requirement
select count(A.A1) from table A
INNER JOIN table B on B.B1=A.A1
INNER JOIN table B on C.C1=B.B1
INNER JOIN table D on D.D1=C.C1
INNER JOIN table E on E.E1=D.D1
INNER JOIN table F on F.F1=E.E1
Now if user select some value in UI , then have to execute query as
select count(A.A1) from table A
INNER JOIN table B on B.B1=A.A1
INNER JOIN table B on C.C1=B.B1
INNER JOIN table D on D.D1=C.C1
INNER JOIN table E on E.E1=D.D1
INNER JOIN table F on F.F1=E.E1
INNER JOIN table B on G.G1=F.F1
Where G.Name like '%Germany%'
User can send 1- 5 choices and have to build the query and accordingly and send the result set
So if I add all the joins first and then add where clause as per the choice , then query will be easy and serve the purpose, but if user did not select any query then I am creating unnecessary join for the user choices.
So which will be better way to write having all the joins in advance and then filtering it or on demand join and with filters using dynamic query.
Could be great if someone can provide valuable inputs.

When SQL Server executes a query, there is a first step which is planning the query, i.e. deciding an strategy to get the query result.
If you use "inner joins" you're making it compulsory to include all the tables, becasuse "inner join" means that there must be matching rows on both tables of the join, so the query planner can't dicard any tables.
However, if you change the inner joins by left outer joins, it's not compulsory that there are matching rows on both sides of the join, so the query planner can decide if it includes or not the tables on the right. So, if you use left outer joins, and you don't select, or filter, or do any operation on fields on the right side of the joins, the query planner can discard then when executing the query. That's the easiest way to get rid of your concerns.
On the other hand, if you want to control what tables to inclued or not to include, and create a custom query for each case, you can use several techniques:
making a graph that includes the definition of the table relations, and using some graph manipulation library that allows you to get the necessary tables from the graph.I did this one, but is quite hard to achieve if you don't have experience with graps.
using Entity Framework. You must build a simple model including all the tables. And then, to run each query, you can programmatically build the query in LINQ, and EF will take care to generate and execute the SQL query for you.

Related

Optimizing OUTER JOIN queries using filters from WHERE clause.(Query Planner)

I am writing a distributed SQL query planner(Query Engine). Data will be fetched from RDBMS(PostgreSQL) nodes involving network I/O.
I want to optimize JOIN queries.
Logical Order of Execution is:
Do JOIN(make use of ON clause)
Apply WHERE clause on the joined result.
I was thinking about applying Filter(WHERE clause specific to a table) first itself, and then do join.
In what cases would that result in wrong results?
Example:
SELECT *
FROM tableA
LEFT JOIN tableB ON(tableA.col1 = tableB.col1)
LEFT JOIN tableC ON(tableB.col2 = tableC.col1)
WHERE tableA.colY < 100 AND tableB.colX > 50
Logical Execution:
joinResult = (tableA left join tableB ON() ) left join tableC ON()
Filter joinResult using given WHERE clause.
Proposed Execution:
filteredA = tableA WHERE tableA.colY < 100
filteredB = tableB WHERE tableB.colX > 50
Result = (filteredA left join filteredB ON(..))left join tableC ON(..)
Can I optimize any query like this? That is filtering the table first and then applying join above that.
Edit:
Some people are confusing and talking about this specific example. I am not talking about this specific example query, I am writing a query planner and I want to handle all type of queries
Please note that, each of the tables is sharded and stored in different machines, and the current execution model is to fetch each of the tables and then do join locally. So if I apply the WHERE filter before fetching, it would be better.
This is actually a complex topic.
We can filter the table in some cases. We can also reorder outer joins and then push the filter quals inside.
I was going through a research paper regarding this, but I haven't completed it yet(may not complete it also).
So for now, for those who are looking for answers, you could probably go through this research paper particularly section 2.2. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.43.2531&rep=rep1&type=pdf
For now I'm relying on PostgreSQL's planner and taking its output and reconstructing the query for my requirements.

How to join two tables first then join to another table

I have a DB2 database and I have to join three tables. I would like to join first two tables and after joining firs two tables, I would like to join the joined table to another third table. I tried using left join but couldn't find a result as I expected. I tried the following:
select AFJKAR as "ELR_Elig_Redirect_SchdID",
AFEZAM as "Priority",
AFTSAS as "ELC_Status",
AFT7CE as "ELC_From_Date",
AFT8CE as "ELC_Thru_Date",
AFTTAS as "ELC_Redirect_Action",
AFJLAR as "GPI_List",
AIZAHA as "GPI_List_ID",
AILUIG as "GPI_Number",
AICXHG as "GPI_From_Date",
AICYHG as "GPI_Thru_Date",
SUEFC4 as "GPI_ID",
SUB4T3 as "Drug_Name"
from CLMPRDFIL.RCELCP as RCE
left join CLMPRDFIL.RCGP2P as RCG on RCE.AFJLAR = RCG.AIZAHA
left join CLMPRDFIL.RCGPIP as RCGP on RCG.AIZAHA = RCGP.SUEFC4;
Basically, I would like to join RCE and RCGP2P tables first. After joining this, I would like to join it by RCGPIP.
Use a corresponding optimization profile / guideline for this.
Optimization profiles and guidelines.
You may specify the desired join order there, if you believe, that you may achieve better performance with particular join order.
Note, that you should try to collect statistics on these tables first to make the Db2 optimizer use correct join order. For example, try to create a statistical view on first 2 tables using their join keys and collect statistics on it. Look at the access plan of your original query afterwards to check, if you get desired join order.

Access SQL subquery access field from parent

I have a SQL query that works in Access 2016:
SELECT
Count(*) AS total_tests,
Sum(IIf(score>=securing_threshold And score<mastering_threshold,1,0)) AS total_securing,
Sum(IIf(score>=mastering_threshold,1,0)) AS total_mastering,
total_securing/Count(*) AS percent_securing,
total_mastering/Count(*) AS percent_mastering,
(Count(*)-total_securing-total_mastering)/Count(*) AS percent_below,
subjects.subject,
students.year_entered,
IIf(Month(Date())<9,Year(Date())-students.year_entered+6,Year(Date())-students.year_entered+7) AS current_form,
groups.group
FROM
((subjects
INNER JOIN tests ON subjects.ID = tests.subject)
INNER JOIN (students
INNER JOIN test_results ON students.ID = test_results.student) ON tests.ID = test_results.test)
LEFT JOIN
(SELECT * FROM group_membership LEFT JOIN groups ON group_membership.group = groups.ID) As g
ON students.ID = g.student
GROUP BY subjects.subject, students.year_entered, groups.group;
However, I wish to filter out irrelevant groups before joining them to my table. The table groups has a column subject which is a foreign key.
When I try changing ON students.ID = g.student to ON students.ID = g.student And subjects.ID = g.subject I get the error 'JOIN expression not supported'.
Alternatively, when I try adding WHERE subjects.ID = groups.subject to the subquery, it asks me for the parameter value of subjects.ID, although it is a column in the parent query.
Googling reveals many similar errors but they were all resolved by changing the brackets. That didn't help.
Just in case the table relationships help:
Thank you.
EDIT: Sample database at https://www.dropbox.com/s/yh80oooem6gsni7/student%20tracker.ACCDB?dl=0
MS Access queries with many joins are difficult to update by SQL alone as parenetheses pairings are required unlike other RDBMS's and these pairings must follow an order. Moreover, some pairings can even be nested. Hence, for beginners it is advised to build queries with complex and many joins using the query design GUI in the MS Office application and let it build out the SQL.
For a simple filter on the g derived table, you could filter subject on the derived table, g, but likely you want want all subjects:
...
(SELECT * FROM group_membership
LEFT JOIN groups ON group_membership.group = groups.ID
WHERE groups.subject='Earth Science') As g
...
So for all subjects, consider re-building query from scratch in GUI that nearly mirrors your table relationships which actually auto-links joins in the GUI. Then, drop unneeded tables.
Usually you want to begin with the join table or set like groups and group_membership or tests and test_results. In fact, consider saving the g derived table as its own query.
Then add the distinct record primary source tables like students and subjects.
You may even need to play around with order in FROM and JOIN clauses to attain desired results, and maybe even add the same table in query. And be careful with adding join tables like group_membership (two one-to-many links), to GROUP BY queries as it leads to the duplicate record aggregation. So you may need to join aggregates queries by subject.
Unless you can post content of all tables, from our perspective it is difficult to help from here.
Your subquery g uses a LEFT JOIN, but there is a enforced 1:n relation between the two tables, so there will always be a matching group. Use a INNER JOIN instead.
With g.subject you are trying to join on a column that is on the right side of a left join, that cannot really work.
Also you shouldn't use SELECT * on a join of tables with identical column names. Include only the qualified column names that you need.
LEFT JOIN
(SELECT group_membership.student, groups.group, groups.subject
FROM group_membership INNER JOIN groups
ON group_membership.group = groups.ID) As g
ON (students.ID = g.student AND subjects.ID = g.subject)
I would call the columns in group_membership group_ID and student_ID to avoid confusion.
I don't have the database to test, but I would use subject table as subquery:
(SELECT * FROM subject WHERE filter out what you don't need) Subj
Then INNER JOIN this new Subj Table in your query which would exclude irrelevant groups.
Also I would never create join in WHERE clause (WHERE subjects.ID = groups.subject), what this does it creates cartesian product (table with all the possible combinations of subjects.ID and groups.subject) then it filters out records to satisfy your join. When dealing with huge data it might take forever or crash.
Error related to "Join expression may not be supported"; do datatypes match in those fields?
I solved it by (a lot of trial and error and) taking the advice here to make queries in the GUI and joining them. The end result is 4 queries deep! If the database was bigger, performance would be awful... but now its working I can tweak it eventually.
Thank you everybody!

Is the order of joining tables indifferent as long as we chose proper join types?

Can we achieve desired results of joining tables by executing joins in whatever order? Suppose we want to left join two tables A and B (order AB). We can get the same results with right join of B and A (BA).
What about 3 tables ABC. Can we get whatever results by only changing order and joins types? For example A left join B inner join C. Can we get it with BAC order? What about if we have 4 or more tables?
Update.
The question Does the join order matter in SQL? is about inner join type. Agreed that then the order of join doesn't matter. The answer provided in that question does not answer my question whether it is possible to get desired results of joining tables with whatever original join types (join types here) by choosing whatever order of tables we like, and achieve this goal only by manipulating with join types.
In an inner join, the ordering of the tables in the join doesn't matter - the same rows will make up the result set regardless of the order they are in the join statement.
In either a left or right outer join, the order DOES matter. In A left join B, your result set will contain one row for every record in table A, irrespective of whether there is a matching row in table B. If there are non matching rows, this is likely to be a different result set to B left join A.
In a full outer join, the order again doesn't matter - rows will be produced for each row in each joined table no matter what their order.
Regarding A left join B vs B right join A - these will produce the same results. In simple cases with 2 tables, swapping the tables and changing the direction of the outer join will result in the same result set.
This will also apply to 3 or more tables if all of the outer joins are in the same direction - A left join B left join C will give the same set of results as C right join B right join A.
If you start mixing left and right joins, then you will need to start being more careful. There will almost always be a way to make an equivalent query with re-ordered tables, but at that point sub-queries or bracketing off expressions might be the best way to clarify what you are doing.
As another commenter states, using whatever makes your purpose most clear is usually the best option. The ordering of the tables in your query should make little or no difference performance wise, as the query optimiser should work this out (although the only way to be sure of this would be to check the execution plans for each option with your own queries and data).

Successive LEFT SQL joins back to original

I need to redo sql statement in legacy Foxpro application and don't understand whether it is meaningful at all. Syntax is a bit specific - it extracts data from temporary table into the same temporary table ( overwriting) with some joins.
SELECT aa.*,b.spa_date FROM (ALIAS()) aa INNER JOIN jobs ON aa.seq=jobs.seq ;
LEFT JOIN job2 ON jobs.job_no=job2.rucjob;
left join jobs b on b.job_no=job2.job_no;
WHERE jobs.qty1<>0 INTO CURSOR (ALIAS())
Since only one field is added from joined tables ( spa_date ) is there any point in 2 left joins or I am missing something. Isn't it equivalent to
SELECT aa.*,jobs.spa_date FROM (ALIAS()) aa INNER JOIN jobs ON aa.seq=jobs.seq ;
WHERE jobs.qty1<>0 INTO CURSOR (ALIAS())
They are different because b.spa_date come from the second left join. You may be missing filtered rows without both left joins.
You would need to know the intent of the original query and perhaps rewrite it to make more sense but I'd say the two queries are different.