Successive LEFT SQL joins back to original - sql

I need to redo an SQL statement in a legacy FoxPro application and don't understand whether it is meaningful at all. The syntax is a bit specific: it extracts data from a temporary table into the same temporary table (overwriting it) with some joins.
SELECT aa.*,b.spa_date FROM (ALIAS()) aa INNER JOIN jobs ON aa.seq=jobs.seq ;
LEFT JOIN job2 ON jobs.job_no=job2.rucjob;
left join jobs b on b.job_no=job2.job_no;
WHERE jobs.qty1<>0 INTO CURSOR (ALIAS())
Since only one field (spa_date) is added from the joined tables, is there any point in the two left joins, or am I missing something? Isn't it equivalent to
SELECT aa.*,jobs.spa_date FROM (ALIAS()) aa INNER JOIN jobs ON aa.seq=jobs.seq ;
WHERE jobs.qty1<>0 INTO CURSOR (ALIAS())

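They are different because b.spa_date comes from the second left join: it is the spa_date of the jobs row that job2 links to, not of the jobs row matched on seq. Without both left joins you would read the date from a different row, and you may also end up with a different set of rows.
You would need to know the intent of the original query, and perhaps rewrite it so that it makes more sense, but I'd say the two queries are different.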
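A minimal sketch of that difference, run against SQLite from Python rather than FoxPro. The table contents are invented for illustration, and the temporary table behind ALIAS() is modelled as a plain table named aa:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE aa   (seq INTEGER);
    CREATE TABLE jobs (seq INTEGER, job_no TEXT, qty1 INTEGER, spa_date TEXT);
    CREATE TABLE job2 (job_no TEXT, rucjob TEXT);

    INSERT INTO aa VALUES (1), (2);
    -- J1 and J2 are the jobs referenced by aa; J9 is a linked job.
    INSERT INTO jobs VALUES (1, 'J1', 5, '2020-01-01'),
                            (2, 'J2', 3, '2020-02-02'),
                            (9, 'J9', 1, '2021-09-09');
    -- job2 links J1 (as rucjob) to J9. J2 has no link.
    INSERT INTO job2 VALUES ('J9', 'J1');
""")

# Shape of the original query: spa_date is taken from the linked job (alias b).
original = con.execute("""
    SELECT aa.seq, b.spa_date
    FROM aa
    INNER JOIN jobs ON aa.seq = jobs.seq
    LEFT JOIN job2 ON jobs.job_no = job2.rucjob
    LEFT JOIN jobs b ON b.job_no = job2.job_no
    WHERE jobs.qty1 <> 0
    ORDER BY aa.seq
""").fetchall()

# Shape of the proposed simplification: spa_date is taken from the job itself.
simplified = con.execute("""
    SELECT aa.seq, jobs.spa_date
    FROM aa
    INNER JOIN jobs ON aa.seq = jobs.seq
    WHERE jobs.qty1 <> 0
    ORDER BY aa.seq
""").fetchall()

print(original)    # [(1, '2021-09-09'), (2, None)]
print(simplified)  # [(1, '2020-01-01'), (2, '2020-02-02')]
```

With this sample data the row count happens to match, but the dates do not, which is why the intent of the original query matters.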

Related

There are a few questions about the select statement

I have a few questions about the select statement.
First of all, I have normalized 15 tables for this select query.
The problem is invisible because there is not much data right now.
However, since I try to process many tables in one select query, it seems to cause problems later.
So I want to add a few more select statements to divide the tables to search, but I want to know how different it is from doing it at once.
Secondly, if I use joins, I will use outer joins. When joining multiple tables with outer joins, I'm not sure when to use left outer join and when to use right outer join.
The current select query refers to 8 tables, and only one of them is joined.
That is, the other tables are read through subqueries, and the remaining eight tables will probably be joined.
I would appreciate it if you could point me in the right direction for multiple outer joins.
Let me briefly show you some of the current select queries.
select
a.cal1,a.cal2,a.cal3,...,
(select b.cal1 from b
where a.cal4=b.cal2)
as "bcals",
(select c.cal1 from c
where a.cal5=c.cal2)
as "ccals",
....,
(select e.cal1 from e
where a.caln=e.cal2)
as "ecals",
(select sum(extract(year from age(f.endday,f.startday)))
from f
where e.cal1=a.cal1)
as "fcals",
g.cal1,g.cal2,g.cal3,...,
(select h.cal1 from h
where g.cal4=h.cal2)
as "hcals"
from a left outer join g on a.cal1=g.cal5
where a.cal1=?;
Result:
a.cal1|a.cal2|a.cal3|...|hcals
var1 |var2 |var3 |...|varn
After this, I wonder how to join the rest of the tables.
To sum up
If there are many tables that need to be included in a select query, what is the difference in performance when this complex query is divided into multiple select statements?
If we write inside a select statement, how should outer join be?
Is there a problem with the query?
Actually your code is correct, but it looks very complex. People will find it difficult to understand it. Using joins you can minimize the lines of code and also make it more readable.
SELECT
TBL1.AMOUNT T1,
TBL2.AMOUNT T2,
TBL3.AMOUNT T3
FROM TBL1
LEFT JOIN TBL2 ON TBL2.ID = TBL1.ID
LEFT JOIN TBL3 ON TBL3.ID = TBL1.ID
In the above code, there are three tables and two joins. It is easy to understand, debug, and change. Please try this for your code.
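As a rough check that the rewrite is safe, the sketch below compares a correlated scalar subquery with the equivalent LEFT JOIN on SQLite. The table and column names follow the question, the data is invented, and it assumes b.cal2 is unique (if it were not, the join would return extra rows and the two forms would no longer match):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE a (cal1 INTEGER, cal4 INTEGER);
    -- Assumption: cal2 is unique in b, so each row of a matches at most one row.
    CREATE TABLE b (cal1 TEXT, cal2 INTEGER PRIMARY KEY);
    INSERT INTO a VALUES (1, 10), (2, 20), (3, 99);
    INSERT INTO b VALUES ('ten', 10), ('twenty', 20);
""")

# Lookup written as a correlated scalar subquery, as in the question.
with_subquery = con.execute("""
    SELECT a.cal1,
           (SELECT b.cal1 FROM b WHERE a.cal4 = b.cal2) AS bcals
    FROM a
    ORDER BY a.cal1
""").fetchall()

# The same lookup written as a LEFT JOIN.
with_join = con.execute("""
    SELECT a.cal1, b.cal1 AS bcals
    FROM a
    LEFT JOIN b ON a.cal4 = b.cal2
    ORDER BY a.cal1
""").fetchall()

print(with_subquery)  # [(1, 'ten'), (2, 'twenty'), (3, None)]
print(with_join)      # [(1, 'ten'), (2, 'twenty'), (3, None)]
```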

SAS Enterprise: left join and right join difference?

I joined a new company that uses SAS Enterprise Guide.
I have 2 tables, table A has 100 row, and table B has over 30M rows (50-60 columns).
I tried to do a right join from A (100) to B (30M); it ran for over 2 hours and no result came back. Will it help if I do a left join instead? I used the GUI and created the following query.
30M Record <- 100 Record ?
or
100 Record -> 30M Record ?
PROC SQL;
CREATE TABLE WORK.QUERY_FOR_CASE_NUMBER AS
SELECT t2.EMPGRPCOM,
t2.SEQINVNUM,
t2.SBSID,
t2.SBSLASTNAME,
t2.SBSFIRSTNAME,
t2.PMTDUEDATE,
t2.PREMAMT,
t2.ITEMDESC,
t2.EFFDATE,
t2.PAYAMT,
t2.MCAIDRATECD,
t2.REBILLIND,
t2.BILLTYPE
FROM WORK.'CASE NUMBER'n t1
LEFT JOIN DW.BILLING t2 ON (t1.CaseNumber = t2.SBSID)
WHERE t2.LOB = 'MD' AND t2.PMTDUEDATE BETWEEN '1Jan2015:0:0:0'dt AND '31Dec2017:0:0:0'dt AND t2.SITEID = '0001';
QUIT;
Left join and Right join, all other things aside, are equivalent - if you implement them the same way, anyway. I.E.,
select a.*
from a
left join
b
on a.id=b.id
;
vs
select a.*
from b
right join
a
on b.id=a.id
;
Same exact query, no difference, same time used. SQL is a declarative language, meaning the query optimizer looks at what you send it and figures out what the best way to do it is, so it sees both queries and knows to do the same thing in both cases.
You can read about this in all sorts of articles, this one is a good starting point, or if that link ages just search for "right join vs left join".
Now, what you might want to consider is writing this in a different way, namely not using SQL; this kind of query SQL should be good at but sometimes isn't for some reason. I would write it as a hash table search, where the smaller case_number dataset is loaded to memory, then data step iterate over the larger table and check if it's found in the smaller dataset - if so, then great, return it.
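The idea can be sketched outside SAS as well. The block below is not a SAS hash object or data step; it is a Python analogue that uses a set for the small table and streams over the large one. The function name and sample rows are invented for illustration:

```python
def filter_by_small_table(small_keys, large_rows, key_field):
    """Yield rows of the large table whose key appears in the small table."""
    wanted = set(small_keys)          # small table loaded into memory once
    for row in large_rows:            # large table is scanned a single time
        if row[key_field] in wanted:  # average O(1) membership test
            yield row


# Stand-ins for WORK.'CASE NUMBER'n (small) and DW.BILLING (large).
case_numbers = ["A1", "C3"]
billing = [
    {"SBSID": "A1", "PREMAMT": 100.0},
    {"SBSID": "B2", "PREMAMT": 250.0},
    {"SBSID": "C3", "PREMAMT": 75.5},
    {"SBSID": "A1", "PREMAMT": 20.0},
]

matches = list(filter_by_small_table(case_numbers, billing, "SBSID"))
print(matches)
```

Only the 100 keys need to fit in memory; the 30M-row table is read once, row by row, and never sorted.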
I'd also think about whether left/right join is what you want, vs. inner join. Seems to me that if you're returning solely t2 values, right/left join isn't correct (when t1 is the "primary"): you'll just get empty rows for the non-matches. Either return a t1 variable, or use inner join.
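A small illustration of those empty rows, using SQLite from Python with invented data (t1 plays the case-number table, t2 the billing table):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t1 (case_number TEXT);
    CREATE TABLE t2 (sbsid TEXT, premamt REAL);
    INSERT INTO t1 VALUES ('A1'), ('Z9');   -- Z9 has no billing rows
    INSERT INTO t2 VALUES ('A1', 100.0);
""")

# Left join, but only t2 columns are returned.
left_rows = con.execute("""
    SELECT t2.sbsid, t2.premamt
    FROM t1
    LEFT JOIN t2 ON t1.case_number = t2.sbsid
    ORDER BY t1.case_number
""").fetchall()

# Inner join with the same select list.
inner_rows = con.execute("""
    SELECT t2.sbsid, t2.premamt
    FROM t1
    INNER JOIN t2 ON t1.case_number = t2.sbsid
    ORDER BY t1.case_number
""").fetchall()

print(left_rows)   # [('A1', 100.0), (None, None)]
print(inner_rows)  # [('A1', 100.0)]
```

The unmatched case number survives the left join only as a row of NULLs, because no t1 column is selected to identify it.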

Need Input | SQL Dynamic Query

I have a requirement to build a dynamic query based on user input and return the count of records in the result set.
There are 6 tables that always need to be inner joined; the joins to the remaining tables depend on user input, and the query should perform well.
Here is the requirement
select count(A.A1) from table A
INNER JOIN table B on B.B1=A.A1
INNER JOIN table C on C.C1=B.B1
INNER JOIN table D on D.D1=C.C1
INNER JOIN table E on E.E1=D.D1
INNER JOIN table F on F.F1=E.E1
Now if user select some value in UI , then have to execute query as
select count(A.A1) from table A
INNER JOIN table B on B.B1=A.A1
INNER JOIN table C on C.C1=B.B1
INNER JOIN table D on D.D1=C.C1
INNER JOIN table E on E.E1=D.D1
INNER JOIN table F on F.F1=E.E1
INNER JOIN table G on G.G1=F.F1
Where G.Name like '%Germany%'
The user can send 1 to 5 choices, and I have to build the query accordingly and send back the result.
If I add all the joins up front and then add the where clause according to the choices, the query will be easy to write and will serve the purpose, but if the user did not select anything, I am creating unnecessary joins.
So which is the better way: writing all the joins in advance and then filtering, or joining on demand with filters using a dynamic query?
It would be great if someone could provide some input.
When SQL Server executes a query, the first step is planning the query, i.e. deciding on a strategy to get the query result.
If you use inner joins, you make it compulsory to include all the tables, because an inner join means that there must be matching rows in both tables of the join, so the query planner generally can't discard any tables.
However, if you replace the inner joins with left outer joins, matching rows are no longer required on both sides of the join, so the query planner can decide whether or not to include the tables on the right. So, if you use left outer joins and you don't select, filter, or otherwise use any columns from the right side of a join, the query planner can discard that table when executing the query, provided it can prove that the join cannot change the number of rows (for example, because the join column on the right side is unique). Keep in mind that a left outer join also keeps left-side rows that have no match, so the count can differ from the inner-join version. That's the easiest way to get rid of your concerns.
On the other hand, if you want to control which tables to include and create a custom query for each case, you can use several techniques:
- making a graph that holds the definition of the table relations, and using a graph manipulation library to get the necessary tables from the graph. I did this one, but it is quite hard to achieve if you don't have experience with graphs.
- using Entity Framework. You must build a simple model including all the tables. Then, to run each query, you can programmatically build the query in LINQ, and EF will take care of generating and executing the SQL query for you.
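A third, simpler option is to assemble the SQL text on demand. The sketch below is only one possible shape for that, written in Python against SQLite rather than SQL Server. The filter names, the tables G and H, and the helper build_count_query are hypothetical. User choices pick entries from a fixed whitelist and user values travel as bound parameters, so no user text is ever concatenated into the SQL:

```python
import sqlite3

# Hypothetical whitelist: UI choice -> (JOIN clause, WHERE condition).
OPTIONAL_FILTERS = {
    "country": ("INNER JOIN G ON G.G1 = B.B1", "G.Name LIKE ?"),
    "status": ("INNER JOIN H ON H.H1 = B.B1", "H.Status = ?"),
}

# The joins that are always required (shortened to two tables here).
BASE_SQL = "SELECT COUNT(A.A1) FROM A INNER JOIN B ON B.B1 = A.A1"


def build_count_query(choices):
    """Return (sql, params), adding joins only for the filters actually chosen."""
    joins, conditions, params = [], [], []
    for name, value in choices.items():
        join_sql, condition_sql = OPTIONAL_FILTERS[name]  # KeyError if not whitelisted
        joins.append(join_sql)
        conditions.append(condition_sql)
        params.append(value)
    sql = " ".join([BASE_SQL, *joins])
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE A (A1 INTEGER);
    CREATE TABLE B (B1 INTEGER);
    CREATE TABLE G (G1 INTEGER, Name TEXT);
    CREATE TABLE H (H1 INTEGER, Status TEXT);
    INSERT INTO A VALUES (1), (2), (3);
    INSERT INTO B VALUES (1), (2), (3);
    INSERT INTO G VALUES (1, 'Germany'), (2, 'France');
    INSERT INTO H VALUES (1, 'open'), (3, 'open');
""")

no_filter_sql, no_filter_params = build_count_query({})
count_all = con.execute(no_filter_sql, no_filter_params).fetchone()[0]

germany_sql, germany_params = build_count_query({"country": "%Germany%"})
count_germany = con.execute(germany_sql, germany_params).fetchone()[0]

print(no_filter_sql)
print(count_all, count_germany)
```

When the user selects nothing, the generated statement contains only the base joins, so no optional table is touched.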

multiple sql joins not producing desired results

I'm new to sql and trying to tweak someone else's huge stored procedure to get a subset of the results. The code below is maybe 10% of the whole procedure. I added the lp.posting_date, last left join, and the where clause. Trying to get records where the posting date is between the start date and the end date. Am I doing this right? Apparently not because the results are unaffected by the change. UPDATE: I CHANGED THE LAST JOIN. The results are correct if there's only one area allocation term. If there is more than one area allocation term, the results are duplicated for each term.
SELECT Distinct
l.lease_id ,
l.property_id as property_id,
l.lease_number as LeaseNumber,
l.name as LeaseName,
lty.name as LeaseType,
lst.name as LeaseStatus,
l.possession_date as PossessionDate,
l.rent as RentCommencementDate,
l.store_open_date as StoreOpenDate,
msr.description as MeasureUnit,
l.comments as Comments ,
lat.start_date as atStartDate,
lat.end_date as atEndDate,
lat.rentable_area as Rentable,
lat.usable_area as Usable,
laat.start_date as aatStartDate,
laat.end_date as aatEndDate,
MK.Path as OrgPath,
CAST(laa.percentage as numeric(9,2)) as Percentage,
laa.rentable_area as aaRentable,
laa.usable_area as aaUsable,
laa.headcounts as Headcount,
laa.area_allocation_term_id,
lat.area_term_id,
laa.area_allocation_id,
lp.posting_date
INTO #LEASES FROM la_tbl_lease l
INNER JOIN #LEASEID on l.lease_id=#LEASEID.lease_id
INNER JOIN la_tbl_lease_term lt on lt.lease_id=l.lease_id and lt.IsDeleted=0
LEFT JOIN la_tlu_lease_type lty on lty.lease_type_id=l.lease_type_id and lty.IsDeleted=0
LEFT JOIN la_tlu_lease_status lst on lst.status_id= l.status_id
LEFT JOIN la_tbl_area_group lag on lag.lease_id=l.lease_id
LEFT JOIN fnd_tlu_unit_measure msr on msr.unit_measure_key=lag.unit_measure_key
LEFT JOIN la_tbl_area_term lat on lat.lease_id=l.lease_id and lat.isDeleted=0
LEFT JOIN la_tbl_area_allocat_term laat on laat.area_term_id=lat.area_term_id and laat.isDeleted=0
LEFT JOIN dbo.la_tbl_area_allocation laa on laa.area_allocation_term_id=laat.area_allocation_term_id and laa.isDeleted=0
LEFT JOIN vw_FND_TLU_Menu_Key MK on menu_type_id_key=2 and isActive=1 and id=laa.menu_id_key
INNER JOIN la_tbl_lease_projection lp on lp.lease_projection_id = #LEASEID.lease_projection_id
where lp.posting_date <= laat.end_date and lp.posting_date >= laat.start_date
As may have already been hinted at you should be careful when using the WHERE clause with an OUTER JOIN.
The idea of the OUTER JOIN is to optionally join that table and provide access to the columns.
The JOINS will generate your set and then the WHERE clause will run to restrict your set. If you are using a condition in the WHERE clause that says one of the columns in your outer joined table must exist / equal a value then by the nature of your query you are no longer doing a LEFT JOIN since you are only retrieving rows where that join occurs.
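The effect is easy to reproduce on a toy schema. This sketch uses SQLite from Python, with invented table names and data. It puts the same date condition once in the WHERE clause and once in the ON clause of a LEFT JOIN:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE lease (lease_id INTEGER);
    CREATE TABLE term  (lease_id INTEGER, start_date TEXT);
    INSERT INTO lease VALUES (1), (2);          -- lease 2 has no term
    INSERT INTO term  VALUES (1, '2020-05-01');
""")

# Condition in WHERE: rows whose outer-joined columns are NULL are removed,
# so the LEFT JOIN behaves like an INNER JOIN.
filter_in_where = con.execute("""
    SELECT l.lease_id, t.start_date
    FROM lease l
    LEFT JOIN term t ON t.lease_id = l.lease_id
    WHERE t.start_date >= '2020-01-01'
    ORDER BY l.lease_id
""").fetchall()

# Condition in ON: it only limits which term rows can match,
# and every lease is still returned.
filter_in_on = con.execute("""
    SELECT l.lease_id, t.start_date
    FROM lease l
    LEFT JOIN term t ON t.lease_id = l.lease_id
                    AND t.start_date >= '2020-01-01'
    ORDER BY l.lease_id
""").fetchall()

print(filter_in_where)  # [(1, '2020-05-01')]
print(filter_in_on)     # [(1, '2020-05-01'), (2, None)]
```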
Shorten it and copy it out as a new query in ssms or whatever you are using for testing. Use an inner join unless you want to preserve the left side set even when there is no matching lp.lease_id. Try something like
if object_id('tempdb..#leases') is not null
drop table #leases;
select distinct
l.lease_id
,l.property_id as property_id
,lp.posting_date
into #leases
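from la_tbl_lease as l
inner join la_tbl_lease_projection as lp on lp.lease_id = l.lease_id
-- laat has to be joined as well, otherwise the where clause cannot reference it
inner join la_tbl_area_term as lat on lat.lease_id = l.lease_id
inner join la_tbl_area_allocat_term as laat on laat.area_term_id = lat.area_term_id
where lp.posting_date <= laat.end_date and lp.posting_date >= laat.start_date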
select * from #leases
drop table #leases
If this gets what you want then you can work from there and add the other left joins to the query (getting rid of the select * and 'drop table' if you copy it back into your proc). If it doesn't then look at your Boolean date logic or provide more detail for us. If you are new to sql and its procedural extensions, try using the object explorer to examine the properties of the columns you are querying, and try selecting the top 1000 * from the tables you are using to get a feel for what the data looks like when building queries. -Mike
You can try the BETWEEN operator as well
Where lp.posting_date BETWEEN laat.start_date AND laat.end_date
Reasoning: You can have issues where there are no matching values in a table. In that case a left join fills that table's columns with NULL. A BETWEEN condition is only true when the value actually lies within the range, so rows with NULL dates are filtered out (the same holds for the equivalent pair of >= and <= comparisons).
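A quick way to see how NULL range bounds behave, sketched on SQLite from Python with invented rows. SQL Server uses the same three-valued logic here, but this has only been tried on SQLite:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t (id INTEGER, posting_date TEXT, start_date TEXT, end_date TEXT);
    INSERT INTO t VALUES
        (1, '2020-05-01', '2020-01-01', '2020-12-31'),  -- inside the range
        (2, '2021-05-01', '2020-01-01', '2020-12-31'),  -- outside the range
        (3, '2020-05-01', NULL, NULL);                  -- unmatched outer-join row
""")

# Evaluate both spellings of the range test side by side.
comparison = con.execute("""
    SELECT id,
           (posting_date BETWEEN start_date AND end_date) AS between_result,
           (posting_date >= start_date AND posting_date <= end_date) AS pair_result
    FROM t
    ORDER BY id
""").fetchall()

# Only rows where the condition is true pass a WHERE clause.
kept = con.execute("""
    SELECT id FROM t
    WHERE posting_date BETWEEN start_date AND end_date
""").fetchall()

print(comparison)  # [(1, 1, 1), (2, 0, 0), (3, None, None)]
print(kept)        # [(1,)]
```

Both spellings evaluate to NULL for the row with missing dates, and WHERE drops rows whose condition is NULL.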
As it turns out, the problem was easier to solve and it was in a different place in the stored procedure. All I had to do was add one line to one of the cursors to include area term allocations by date.

Group by in SQL Server giving wrong count

I have a query which works, goes like this:
Select
count(InsuranceOrderLine.AntallPotensiale) as potensiale,
COUNT(InsuranceOrderLine.AntallSolgt) as Solgt,
InsuranceProduct.Name,
InsuranceProductCategory.Name as Kategori
From
InsuranceOrderLine, InsuranceProduct, InsuranceProductCategory
where
InsuranceOrderLine.FKInsuranceProductId = InsuranceProduct.InsuranceProductID
and InsuranceProduct.FKInsuranceProductCategory = InsuranceProductCategory.InsuranceProductCategoryID
Group by
InsuranceProduct.name, InsuranceProductCategory.Name
This query returns what I need, but when I try to add one more table (InsuranceOrder) to be able to get the RegardingUser column, all the count values are way too high.
Select
count(InsuranceOrderLine.AntallPotensiale) as Potensiale,
COUNT(InsuranceOrderLine.AntallSolgt) as Solgt,
InsuranceProduct.Name,
InsuranceProductCategory.Name as Kategori,
RegardingUser
From
InsuranceOrderLine, InsuranceProduct, InsuranceProductCategory, InsuranceSalesLead
where
InsuranceOrderLine.FKInsuranceProductId = InsuranceProduct.InsuranceProductID
and InsuranceProduct.FKInsuranceProductCategory = InsuranceProductCategory.InsuranceProductCategoryID
Group by
InsuranceProduct.name, InsuranceProductCategory.Name,RegardingUser
Thanks in advance
You're adding one more table to your FROM clause, but you don't specify any JOIN condition for that table, so your previous result set will be combined with every row of the new table (a CROSS JOIN, i.e. a Cartesian product)! Of course you'll get duplication of data....
That's one of the reasons that I'm recommending never to use that old, legacy style JOIN - do not simply list a comma-separated bunch of tables in your FROM statement.
Always use the new ANSI standard JOIN syntax with INNER JOIN, LEFT OUTER JOIN and so on:
SELECT
count(iol.AntallPotensiale) as Potensiale,
COUNT(iol.AntallSolgt) as Solgt,
ip.Name,
ipc.Name as Kategori,
isl.RegardingUser
FROM
dbo.InsuranceOrderLine iol
INNER JOIN
dbo.InsuranceProduct ip ON iol.FKInsuranceProductId = ip.InsuranceProductID
INNER JOIN
dbo.InsuranceProductCategory ipc ON ip.FKInsuranceProductCategory = ipc.InsuranceProductCategoryID
INNER JOIN
dbo.InsuranceSalesLead isl ON ???????? -- JOIN condition missing here !!
GROUP BY
ip.Name, ipc.Name, isl.RegardingUser
When you do this, you first of all see right away that you're missing a JOIN condition here - how is this new table InsuranceSalesLead linked to any of the other tables already used in this SQL statement??
And secondly, your intent is much clearer, since the JOIN conditions linking the tables are where they belong - right with the JOIN - and don't clutter up your WHERE clauses ...
It looks like the table you added multiplies the number of rows, so make sure that you are joining it properly. And be careful with aggregate functions over several joined tables: joins very often lead to duplicates.
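The inflation is easy to reproduce. This sketch uses SQLite from Python with invented, simplified tables. It adds a fourth row source to a comma-style FROM list without giving it a join condition:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE order_line (product_id INTEGER);
    CREATE TABLE product    (product_id INTEGER, name TEXT);
    CREATE TABLE sales_lead (regarding_user TEXT);
    INSERT INTO order_line VALUES (1), (1), (1);                     -- 3 lines
    INSERT INTO product    VALUES (1, 'Car');
    INSERT INTO sales_lead VALUES ('ann'), ('bob'), ('cy'), ('dee'); -- 4 leads
""")

# Every table in the FROM list has a condition: one row per order line.
correct = con.execute("""
    SELECT p.name, COUNT(*)
    FROM order_line ol, product p
    WHERE ol.product_id = p.product_id
    GROUP BY p.name
""").fetchall()

# sales_lead has no condition, so each order line is paired with every lead.
inflated = con.execute("""
    SELECT p.name, COUNT(*)
    FROM order_line ol, product p, sales_lead sl
    WHERE ol.product_id = p.product_id
    GROUP BY p.name
""").fetchall()

print(correct)   # [('Car', 3)]
print(inflated)  # [('Car', 12)]
```

The second count is 3 × 4: every order line is counted once per sales lead.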