Substitute join to leave only one 'Table Scan' - sql

I have financial data and want to calculate Shareholders' Equity. This is basically what it looks like:
I have the following query which works:
SELECT a.Ticker, a.Value - l.Value as 'ShareholdersEquity'
FROM FinData a
JOIN FinData l
ON a.Ticker = l.Ticker AND a.Date = l.Date
WHERE a.Type = 'assets'
AND l.Type = 'liabilities'
But for a table with many records this will be slow: when I check the query with Explain (I use Azure Data Studio), it performs 2 table scans, which means more time. How can I rewrite it to be faster?

You could try conditional aggregation rather than a self-join:
select ticker, date,
sum(case when type = 'assets' then value else -value end) as ShareholdersEquity
from findata
where type in ('assets', 'liabilities')
group by ticker, date
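To check that the single-pass version agrees with the self-join, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory database and the sample rows are assumptions made up for illustration; the table, column, and type names follow the question.

```python
import sqlite3

# In-memory database with made-up sample rows (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FinData (Ticker TEXT, Date TEXT, Type TEXT, Value REAL)")
conn.executemany(
    "INSERT INTO FinData VALUES (?, ?, ?, ?)",
    [
        ("AAA", "2020-12-31", "assets", 100.0),
        ("AAA", "2020-12-31", "liabilities", 60.0),
        ("BBB", "2020-12-31", "assets", 50.0),
        ("BBB", "2020-12-31", "liabilities", 20.0),
    ],
)

# Original approach: self-join, which reads the table twice.
self_join = conn.execute(
    """
    SELECT a.Ticker, a.Date, a.Value - l.Value AS ShareholdersEquity
    FROM FinData a
    JOIN FinData l ON a.Ticker = l.Ticker AND a.Date = l.Date
    WHERE a.Type = 'assets' AND l.Type = 'liabilities'
    ORDER BY a.Ticker, a.Date
    """
).fetchall()

# Conditional aggregation: one pass, liabilities counted as negative.
single_pass = conn.execute(
    """
    SELECT Ticker, Date,
           SUM(CASE WHEN Type = 'assets' THEN Value ELSE -Value END) AS ShareholdersEquity
    FROM FinData
    WHERE Type IN ('assets', 'liabilities')
    GROUP BY Ticker, Date
    ORDER BY Ticker, Date
    """
).fetchall()

print(single_pass)
```

One behavioral difference to keep in mind: a ticker/date that has only an assets row (or only a liabilities row) is dropped by the self-join but is still returned by the aggregate version.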

Related

Optimizing SQL Query speed

I am trying to optimize my SQL query below as I am using a very old RDBMS called Firebird. I tried rearranging the items in my where clause and removing the order by statement but the query still seems to take forever to run. Unfortunately Firebird doesn't support Explain Execution Plan functionality and therefore I cannot identify the code that is holding up the query.
select T.veh_reg_no,T.CON_NO, sum(T.pos_gpsunlock) as SUM_GPS_UNLOCK,
count(T.pos_gpsunlock) as SUM_REPORTS, contract.con_name
from
(
select veh_reg_no,CON_NO,
case when pos_gpsunlock = upper('T') then 1 else 0 end as pos_gpsunlock
from vehpos
where veh_reg_no in
( select regno
from fleetvehicle
where fleetno in (97)
) --DS5
and pos_timestamp > '2022-07-01'
and pos_timestamp < '2022-08-01'
) T
join contract on T.con_no = contract.con_no
group by T.veh_reg_no, T.con_no,contract.con_name
order by SUM_GPS_UNLOCK desc;
If anyone can help it would be greatly appreciated.
I'd either comment out some of the sub-queries or remove a join or aggregation and see if that improves it. Once you find the offending code maybe you can move it or re-write it. I know nothing of Firebird but I'd approach that query with the below code, wrapping the aggregation outside of the joins and removing the "Where in" clause.
If nothing works can you create an aggregation table or pre-filtered table and use that?
select
x.*
,sum(case when x.pos_gpsunlock = upper('T') then 1 else 0 end) as SUM_GPS_UNLOCK
,count(*) as SUM_REPORTS
FROM (
select
a.veh_reg_no
,a.pos_gpsunlock
,a.CON_NO
,c.con_name
FROM vehpos a
JOIN fleetvehicle b on a.veh_reg_no = b.regno and b.fleetno = 97 and a.pos_timestamp between '2022-07-01' and '2022-08-01'
JOIN contract c on a.con_no = c.con_no
) x
Group By....
This might help by converting subqueries to joins and reducing nesting. Also an = instead of IN() operation.
select vp.veh_reg_no,vp.con_no,c.con_name,
count(*) as SUM_REPORTS,
sum(case when pos_gpsunlock = upper('T') then 1 else 0 end) as SUM_GPS_UNLOCK
from vehpos vp
inner join fleetvehicle fv on fv.fleetno = 97 and fv.regno = vp.veh_reg_no
inner join contract c on vp.con_no = c.con_no
where vp.pos_timestamp >= '2022-07-01'
and vp.pos_timestamp < '2022-08-01'
group by vp.veh_reg_no, vp.con_no, c.con_name
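As a rough check of the join-based rewrite, the sketch below runs it against a tiny in-memory SQLite database through Python's sqlite3 module. The sample rows are invented, SQLite stands in for Firebird, and the 'T' literal replaces upper('T'); only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE fleetvehicle (regno TEXT, fleetno INTEGER);
    CREATE TABLE contract (con_no INTEGER, con_name TEXT);
    CREATE TABLE vehpos (veh_reg_no TEXT, con_no INTEGER,
                         pos_gpsunlock TEXT, pos_timestamp TEXT);
    INSERT INTO fleetvehicle VALUES ('V1', 97), ('V2', 97), ('V3', 12);
    INSERT INTO contract VALUES (1, 'Contract A');
    INSERT INTO vehpos VALUES
        ('V1', 1, 'T', '2022-07-05'),
        ('V1', 1, 'F', '2022-07-06'),
        ('V2', 1, 'T', '2022-07-10'),
        ('V2', 1, 'T', '2022-08-01'),  -- outside the half-open July range
        ('V3', 1, 'T', '2022-07-07');  -- vehicle in another fleet
    """
)

rows = conn.execute(
    """
    SELECT vp.veh_reg_no, vp.con_no, c.con_name,
           COUNT(*) AS sum_reports,
           SUM(CASE WHEN vp.pos_gpsunlock = 'T' THEN 1 ELSE 0 END) AS sum_gps_unlock
    FROM vehpos vp
    INNER JOIN fleetvehicle fv ON fv.fleetno = 97 AND fv.regno = vp.veh_reg_no
    INNER JOIN contract c ON vp.con_no = c.con_no
    WHERE vp.pos_timestamp >= '2022-07-01'
      AND vp.pos_timestamp <  '2022-08-01'
    GROUP BY vp.veh_reg_no, vp.con_no, c.con_name
    ORDER BY sum_gps_unlock DESC, vp.veh_reg_no
    """
).fetchall()
print(rows)
```

Note that the join form only matches the IN (...) form if fleetvehicle holds at most one row per regno within the fleet; duplicate rows there would multiply the counts.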

Joining multiple CTEs

I am working on a database of a large retail store.
I have to query data from multiple tables to get numbers such as revenue, raw proceeds and compare different time periods.
Most of it is quite easy but I was struggling to work out a way of joining multiple CTEs.
I made a fiddle so you know what I am talking about.
I simplified the structure a lot and left out quite a few columns in the subqueries because they do not matter in this case.
As you can see every row in every table has country and brand in it.
The final query has to be grouped by those.
What I first tried was to FULL JOIN all the tables, but that didn't work in some cases as you can see here: SQLfiddle #1. Note the two last rows which did not group correctly.
Select Coalesce(incoming.country, revenue.country, revcompare.country,
openord.country) As country,
Coalesce(incoming.brand, revenue.brand, revcompare.brand,
openord.brand) As brand,
incoming.OrdersNet,
openord.OpenOrdersNet,
revenue.Revenue,
revenue.RawProceeds,
revcompare.RevenueCompare,
revcompare.RawProceedsCompare
From incoming
Full Join openord On openord.country = incoming.country And
openord.brand = incoming.brand
Full Join revenue On revenue.country = incoming.country And
revenue.brand = incoming.brand
Full Join revcompare On revcompare.country = incoming.country And
revcompare.brand = incoming.brand
Group By incoming.OrdersNet,
openord.OpenOrdersNet,
revenue.Revenue,
revenue.RawProceeds,
revcompare.RevenueCompare,
revcompare.RawProceedsCompare,
incoming.country,
revenue.country,
openord.country,
revcompare.country,
incoming.brand,
revenue.brand,
revcompare.brand,
openord.brand
Order By country,
brand
I then rewrote the query keeping all the CTEs. I added another CTE (basis) which UNIONs all the possible country and brand combinations and left joined on that one.
Now it works fine (check it out here -> SQLfiddle #2) but it just seems so complicated. Isn't there an easier way to achieve this? The only thing I probably won't be able to change are the CTEs as in real life they are way more complex.
WITH basis AS (
SELECT Country, Brand FROM incoming
UNION
SELECT Country, Brand FROM openord
UNION
SELECT Country, Brand FROM revenue
UNION
SELECT Country, Brand FROM revcompare
)
SELECT
basis.Country,
basis.Brand,
incoming.OrdersNet,
openord.OpenOrdersNet,
revenue.Revenue,
revenue.RawProceeds,
revcompare.RevenueCompare,
revcompare.RawProceedsCompare
FROM basis
LEFT JOIN incoming On incoming.Country = basis.Country AND incoming.Brand = basis.Brand
LEFT JOIN openord On openord.Country = basis.Country AND openord.Brand = basis.Brand
LEFT JOIN revenue On revenue.Country = basis.Country AND revenue.Brand = basis.Brand
LEFT JOIN revcompare On revcompare.Country = basis.Country AND revcompare.Brand = basis.Brand
Thank you all for your help!
Since you only work with two tables, orders and rev, consider conditional aggregation: move the WHERE conditions into CASE logic so that a single aggregate query does the work. Also, consider using only one CTE that holds all possible country/brand pairs and LEFT JOIN the two tables to it.
WITH cb AS (
SELECT Country, Brand FROM orders
UNION
SELECT Country, Brand FROM rev
)
SELECT cb.Country
, cb.Brand
, SUM(o.netprice) AS OrdersNet
, SUM(CASE
WHEN o.isopen = 1
THEN o.netprice
END) AS OpenOrdersNet
, SUM(CASE
WHEN r.bdate BETWEEN '2020-12-01' AND '2020-12-31'
THEN r.netprice
END) AS Revenue
, SUM(CASE
WHEN r.bdate BETWEEN '2020-12-01' AND '2020-12-31'
THEN r.rpro
END) AS RawProceeds
, SUM(CASE
WHEN r.bdate BETWEEN '2020-11-01' AND '2020-11-30'
THEN r.netprice
END) AS RevenueCompare
, SUM(CASE
WHEN r.bdate BETWEEN '2020-11-01' AND '2020-11-30'
THEN r.rpro
END) AS RawProceedsCompare
FROM cb
LEFT JOIN orders o
ON cb.Country = o.Country
AND cb.Brand = o.Brand
LEFT JOIN rev r
ON cb.Country = r.Country
AND cb.Brand = r.Brand
GROUP BY cb.Country
, cb.Brand
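One thing worth verifying before adopting the query above: when a country/brand pair has several rows in both orders and rev, joining both detail tables before aggregating pairs every order row with every rev row, which inflates the sums. The sketch below shows the effect and one possible way around it (aggregating each table first). It uses Python's sqlite3 module with invented sample rows; the column names are taken from the answer, and the real tables may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (Country TEXT, Brand TEXT, netprice INTEGER, isopen INTEGER);
    CREATE TABLE rev (Country TEXT, Brand TEXT, netprice INTEGER, bdate TEXT);
    INSERT INTO orders VALUES ('DE', 'X', 10, 1), ('DE', 'X', 20, 0);
    INSERT INTO rev VALUES ('DE', 'X', 100, '2020-12-05'), ('DE', 'X', 200, '2020-12-20');
    """
)

# Joining both detail tables first pairs every order row with every rev row.
joined_first = conn.execute(
    """
    SELECT SUM(o.netprice), SUM(r.netprice)
    FROM orders o
    JOIN rev r ON r.Country = o.Country AND r.Brand = o.Brand
    """
).fetchone()

# Aggregating each table to one row per Country/Brand before joining avoids that.
aggregated_first = conn.execute(
    """
    SELECT o.OrdersNet, r.Revenue
    FROM (SELECT Country, Brand, SUM(netprice) AS OrdersNet
          FROM orders GROUP BY Country, Brand) o
    JOIN (SELECT Country, Brand, SUM(netprice) AS Revenue
          FROM rev
          WHERE bdate BETWEEN '2020-12-01' AND '2020-12-31'
          GROUP BY Country, Brand) r
      ON r.Country = o.Country AND r.Brand = o.Brand
    """
).fetchone()
print(joined_first, aggregated_first)
```

With two rows on each side, the join-first totals come out doubled (60 and 600) while the pre-aggregated totals are the true sums (30 and 300).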

SQL advanced: creating and filling a new column with a conditional value

Could you please help with the question below? I am trying to fill a new column in my SQL query based on a conditional value.
Pseudocode:
IF ap_id exists in visit_table and visit_type = 'first'
then first_visit_id = visit_table.visit_id and first_visit_user = visit_table.Username
The same logic applies to the second and third visits.
This is visit_table:
And this is ap_table (which holds the key values):
And below is the expected table.
Is SQL able to do this kind of data manipulation, and if so, how can I get this result?
I already tried left joins, inner joins, and full outer joins. However, I was not able to create a new column and fill it based on a condition.
One method is multiple joins:
select a.*,
v1.visit_id, v1.user_name,
v2.visit_id, v2.user_name,
v3.visit_id, v3.user_name
from ap_table a left join
visit_table v1
on v1.ap_id = a.ap_id and v1.type = 'first' left join
visit_table v2
on v2.ap_id = a.ap_id and v2.type = 'second' left join
visit_table v3
on v3.ap_id = a.ap_id and v3.type = 'third'
You can join and do conditional aggregation:
select
a.ap_id,
a.location,
max(case when visit_type = 'first' then visit_id end) first_visit_id,
max(case when visit_type = 'second' then visit_id end) second_visit_id,
...
max(case when visit_type = 'first' then username end) first_visit_username,
max(case when visit_type = 'second' then username end) second_visit_username,
...
from ap_table a
inner join visit_table v on v.ap_id = a.ap_id
group by a.ap_id, a.location
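A runnable sketch of this pivot, using Python's sqlite3 module, is shown below. The sample rows are invented, only the first and second visit types are spelled out, and the column names (visit_type, username) are assumed to match the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ap_table (ap_id INTEGER, location TEXT);
    CREATE TABLE visit_table (visit_id INTEGER, ap_id INTEGER,
                              visit_type TEXT, username TEXT);
    INSERT INTO ap_table VALUES (1, 'north'), (2, 'south');
    INSERT INTO visit_table VALUES
        (100, 1, 'first', 'ann'),
        (101, 1, 'second', 'bob'),
        (102, 2, 'first', 'cy');
    """
)

# Each CASE yields a value only for its visit type; MAX collapses the
# group to that single non-NULL value (or NULL if there is no such visit).
pivot = conn.execute(
    """
    SELECT a.ap_id, a.location,
           MAX(CASE WHEN v.visit_type = 'first'  THEN v.visit_id END) AS first_visit_id,
           MAX(CASE WHEN v.visit_type = 'first'  THEN v.username END) AS first_visit_username,
           MAX(CASE WHEN v.visit_type = 'second' THEN v.visit_id END) AS second_visit_id,
           MAX(CASE WHEN v.visit_type = 'second' THEN v.username END) AS second_visit_username
    FROM ap_table a
    INNER JOIN visit_table v ON v.ap_id = a.ap_id
    GROUP BY a.ap_id, a.location
    ORDER BY a.ap_id
    """
).fetchall()
print(pivot)
```

With the inner join, ap_table rows that have no visits at all do not appear; a LEFT JOIN would keep them with NULLs in every visit column.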
It looks like you have some potential data issues here that must be answered for a proper solution. Most specifically, what should happen if the visit_table has multiple records for the same ap_id and visit_type? Or should we assume that there is a unique key on those two columns?
The best answer to this question as posted is actually likely to be dependent on the flavor of SQL you are using and the nuance of what you are really trying to do. For example, PostgreSQL has the ability to "SELECT DISTINCT ON" columns without requiring grouping. Oracle, SQL Server, and others have the ability to simplify complex queries using the with clause. And if other methods fail you can always write nested queries to try and grab everything in one monster query.
If you are able to use a temp table or table variable you may be better off selecting information into a constructed table as needed. If you need more than 3 visits, you may have to write dynamic sql. And if this is information you want to have readily available for other queries to access you might consider writing it as a view instead.
One quick note about simply selecting min(visit_id) and min(username) for your query - if you have multiple grouped rows you are likely to get the id from one visit incorrectly paired with the username from another visit.
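The mismatch described in that note can be reproduced with a small sketch in Python's sqlite3 module. The two sample rows are invented, and the join-back fix assumes visit_id is unique in visit_table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE visit_table (visit_id INTEGER, ap_id INTEGER,
                              visit_type TEXT, username TEXT);
    INSERT INTO visit_table VALUES
        (1, 10, 'first', 'zoe'),
        (2, 10, 'first', 'adam');
    """
)

# Independent aggregates: each MIN is computed on its own column.
mismatched = conn.execute(
    """
    SELECT ap_id, MIN(visit_id), MIN(username)
    FROM visit_table
    WHERE visit_type = 'first'
    GROUP BY ap_id
    """
).fetchone()

# Pick one visit_id per group, then join back to read that row's username.
paired = conn.execute(
    """
    SELECT v.ap_id, v.visit_id, v.username
    FROM visit_table v
    JOIN (SELECT ap_id, MIN(visit_id) AS visit_id
          FROM visit_table
          WHERE visit_type = 'first'
          GROUP BY ap_id) m
      ON m.visit_id = v.visit_id
    """
).fetchone()
print(mismatched, paired)
```

The first query returns visit 1 together with 'adam', a combination that exists in no row; the second returns visit 1 with its actual user, 'zoe'.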
Here is an attempt at creating a monster query for just the 3 visit types included:
SELECT a.ap_id,
a.location,
v1.visit_id AS first_visit_id,
v1.username AS first_visit_username,
v2.visit_id AS second_visit_id,
v2.username AS second_visit_username,
v3.visit_id AS third_visit_id,
v3.username AS third_visit_username
FROM (
SELECT ai.ap_id,
(SELECT MIN(v1i.visit_id) FROM visit_table v1i WHERE v1i.ap_id = ai.ap_id AND v1i.type = 'first') AS v1_id,
(SELECT MIN(v2i.visit_id) FROM visit_table v2i WHERE v2i.ap_id = ai.ap_id AND v2i.type = 'second') AS v2_id,
(SELECT MIN(v3i.visit_id) FROM visit_table v3i WHERE v3i.ap_id = ai.ap_id AND v3i.type = 'third') AS v3_id
FROM ap_table ai
) x
JOIN ap_table a ON x.ap_id = a.ap_id
LEFT JOIN visit_table v1 ON x.v1_id = v1.visit_id
LEFT JOIN visit_table v2 ON x.v2_id = v2.visit_id
LEFT JOIN visit_table v3 ON x.v3_id = v3.visit_id
WHERE
x.v1_id IS NOT NULL
OR x.v2_id IS NOT NULL
OR x.v3_id IS NOT NULL

How to fix "Conversion from string "August" to type 'Date' is not valid" in SSRS

SELECT
a.ItemCode,
SUM(a.NoOfApplication) AS NoOfApplication,
SUM(a.NoOfAccomplished) AS NoOfAccomplished,
SUM(a.NoOfPending) AS NoOfPending,
SUM(a.NoOfDocumentCompliance) AS NoOfDocumentCompliance,
a.[Year]
FROM
(SELECT
ItemCode,
COUNT(am.ReferenceNumber) AS NoOfApplication,
COUNT(TNA.NoOfAccomplished) AS NoOfAccomplished,
COUNT(TNP.NoOfPending) AS NoOfPending,
SUM(FDC.NoOfDocumentCompliance) AS NoOfDocumentCompliance,
DATENAME(month, ad.applicationdate) AS [Year]
FROM
AppTypes at
INNER JOIN
AssessmentMainDetails am ON at.Category = am.Category
INNER JOIN
InspectionProcesses i ON am.ReferenceNumber = i.ReferenceNo
LEFT JOIN
(SELECT
COUNT(Status) AS NoOfDocumentCompliance,
ReferenceNumber, Status
FROM
ApplicationStatus
WHERE
Status = 'For Document Compliance'
GROUP BY
ReferenceNumber, Status) AS FDC ON FDC.ReferenceNumber = i.ReferenceNo
LEFT JOIN
(SELECT
COUNT(ReferenceNo) AS NoOfAccomplished,
ReferenceNo
FROM
InspectionProcesses
WHERE
DateOfInspection <> ''
GROUP BY
ReferenceNo) AS TNA ON TNA.ReferenceNo = i.ReferenceNo
LEFT JOIN
(SELECT
COUNT(ReferenceNo) AS NoOfPending, ReferenceNo
FROM
InspectionProcesses
WHERE
DateOfInspection = ''
GROUP BY
ReferenceNo) AS TNP ON TNP.ReferenceNo = i.ReferenceNo
INNER JOIN
ApplicationDetails ad on i.ReferenceNo = ad.ReferenceNumber
INNER JOIN
Companies c on ad.CompanyId = c.CompanyID
INNER JOIN
Zones z on c.zonecode = z.zonecode
INNER JOIN
ZoneGroups zg on z.ZoneGroup = zg.ZoneGroupId
WHERE
DateOfInspection = ''
AND ad.ApplicationDate BETWEEN '2017-08-01' AND '2017-09-30'
AND zg.ZoneGroupCode = 'HO'
AND z.ZoneCode = 'VIDC'
GROUP BY
ItemCode, DATENAME(month, ad.applicationdate)) a
GROUP BY
a.ItemCode, a.[Year]
This is my code. I already converted my date to get the month name. Please, I need help.
Look carefully. That giant derived table (a - nice meaningful name btw) has the same group by clause as the outermost query. So that means that [a] has a single row per ItemCode and datename(month, ad.applicationdate). Therefore, there is nothing to sum in your outer query since it is grouping by the same columns.
You also have the expression:
DateOfInspection = ''
which is highly suspicious based on the name of the column. What datatype is the DateOfInspection column? Doesn't sound like it should be string-based.
And lastly, the error message you posted sounds like it comes from SSRS and not sql server. Is that the case? Does your query run correctly from SSMS? Then the problem is in your report - and it would seem that you attempt to manipulate or interpret the Year column as a date (perhaps for sorting?). It also seems a bit short-sighted in your report design that your "Year" column is actually the name of a month and that your resultset does not include the year in some fashion. What happens when your data spans more than twelve months? And how do you intend to sort your report when you have month name but not month number?
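To illustrate the sorting point, here is a minimal sketch that groups on a numeric year and month and attaches the month name only for display. It uses Python's sqlite3 module with invented rows and a cut-down ApplicationDetails table; in SQL Server the corresponding expressions would be YEAR(...) and MONTH(...) next to DATENAME(month, ...). The column aliases yr and mon are hypothetical.

```python
import calendar
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ApplicationDetails (ReferenceNumber TEXT, ApplicationDate TEXT);
    INSERT INTO ApplicationDetails VALUES
        ('R1', '2017-08-03'), ('R2', '2017-08-19'),
        ('R3', '2017-09-02'), ('R4', '2018-04-11');
    """
)

# Group on numeric year and month so the result sorts chronologically.
rows = conn.execute(
    """
    SELECT CAST(strftime('%Y', ApplicationDate) AS INTEGER) AS yr,
           CAST(strftime('%m', ApplicationDate) AS INTEGER) AS mon,
           COUNT(*) AS NoOfApplication
    FROM ApplicationDetails
    GROUP BY yr, mon
    ORDER BY yr, mon
    """
).fetchall()

# Attach the month name only for display; sorting still uses the numbers.
report = [(yr, mon, calendar.month_name[mon], n) for yr, mon, n in rows]
print(report)
```

Sorting on the name alone would put April 2018 ahead of August 2017; with the numeric columns in the result set, the report can sort and label independently.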

SQL Server "ORDER BY" optimization - massive performance decrease

Using SQL Server 2000. I have a table that receives a dump from a legacy system once a day, and I am trying to write a query that will process this table with a few reference table joins and an order by clause.
This is the SQL I have:
select d.acct_no,
d.associate_id,
d.first_name,
d.last_name,
d.acct_bal,
plr.long_name p_lvl,
tlr.long_name t_lvl,
d.category,
d.status,
tm.site_name,
d.addr1 + ' ' + isnull(d.addr2,'') address,
d.city,
d.state,
d.country,
d.post_code,
CASE WHEN d.home_phone_ok = 1 THEN d.home_phone END home_phone,
CASE WHEN d.work_phone_ok = 1 THEN d.work_phone END work_phone,
CASE WHEN d.alt_phone_ok = 1 THEN d.alt_phone END alt_phone,
CASE WHEN d.email_ok = 1 THEN d.email END email,
d.last_credit last_paid,
d.service,
d.quantity,
d.amount,
ar.area_desc area
from item_dump d
left outer join territory_map tm on tm.short_postcode = left(post_code,3) and country in ('United States','Canada')
left outer join p_level_ref plr on plr.p_level_id = d.p_lvl_id
left outer join t_level_ref tlr on tlr.t_level_id = d.t_lvl_id
left outer join (select distinct master_item_id, site_item_id from invoice_detail) as map on map.item_id = d.item_no
left outer join item_ref i on i.item_id = map.master_item_id
left outer join area_ref ar on ar.area_id = i.area_id
where (d.cat_id > 80 or d.cat_id < 70)
and d.standing < 4
and d.status not like 'DECEASED'
and d.paid = 1
order by d.associate_id
Most of these columns are straight from the legacy system dump table item_dump. All the joins are only reference tables with few rows. The legacy table itself has about 17000 records but with the where statements the query comes out to 3000.
I have a non-clustered index on the associate_id column.
When I run this query without the order by associate_id clause it takes about 2 seconds. With the order by clause it takes a full minute!
I've tried adding the where clause columns to the index along with associate_id but that didn't change the performance at all.
The end of the execution plan without the order by looks like this:
Using order by, parallelism kicks in on the order by argument and it looks like this:
I thought maybe it was weird SQL Server 2000 parallelism handling, but adding the (maxdop 1) hint made the query take 3 minutes instead!
It isn't really sensible for me to put sorting in the application code because this query caches for about 6 hours before it gets run again and I would have to sort it in the application code many times a minute.
I must be missing something very basic but after straining at the query for an hour i.e. running it 10 times, I can't see what it is anymore.
What happens when you remove all the outer joins, and of course the selected columns that come from them?
select d.acct_no,
d.associate_id,
d.first_name,
d.last_name,
d.acct_bal,
d.category,
d.status,
d.addr1 + ' ' + isnull(d.addr2,'') address,
d.city,
d.state,
d.country,
d.post_code,
CASE WHEN d.home_phone_ok = 1 THEN d.home_phone END home_phone,
CASE WHEN d.work_phone_ok = 1 THEN d.work_phone END work_phone,
CASE WHEN d.alt_phone_ok = 1 THEN d.alt_phone END alt_phone,
CASE WHEN d.email_ok = 1 THEN d.email END email,
d.last_credit last_paid,
d.service,
d.quantity,
d.amount
from item_dump d
where (d.cat_id > 80 or d.cat_id < 70)
and d.standing < 4
and d.status not like 'DECEASED'
and d.paid = 1
order by d.associate_id
If that runs fast, then I would go for subselects inside the select list:
select d.acct_no,
d.associate_id,
d.first_name,
d.last_name,
d.acct_bal,
plr.long_name p_lvl,
tlr.long_name t_lvl,
d.category,
d.status,
(select tm.site_name
from territory_map tm
where tm.short_postcode = left(post_code,3)
and country in ('United States','Canada')) as site_name
etc. It should be much faster than left outer joining them in the from clause.
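Whether the subselect form is actually faster depends on the engine and the plan it picks, so it is worth measuring. The sketch below only shows that the two forms return the same rows for a lookup table with at most one match per key. It uses Python's sqlite3 module with invented data and a cut-down item_dump; substr(post_code, 1, 3) stands in for left(post_code, 3), which SQLite does not provide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE item_dump (acct_no INTEGER, associate_id INTEGER,
                            post_code TEXT, country TEXT);
    CREATE TABLE territory_map (short_postcode TEXT, site_name TEXT);
    INSERT INTO item_dump VALUES
        (1, 30, '90210', 'United States'),
        (2, 10, 'K1A0B1', 'Canada'),
        (3, 20, '75001', 'France');
    INSERT INTO territory_map VALUES ('902', 'West'), ('K1A', 'Ottawa'), ('750', 'Paris');
    """
)

# Reference lookup written as a left outer join in the FROM clause.
with_join = conn.execute(
    """
    SELECT d.acct_no, tm.site_name
    FROM item_dump d
    LEFT OUTER JOIN territory_map tm
      ON tm.short_postcode = substr(d.post_code, 1, 3)
     AND d.country IN ('United States', 'Canada')
    ORDER BY d.associate_id
    """
).fetchall()

# Same lookup written as a correlated scalar subselect in the select list.
with_subselect = conn.execute(
    """
    SELECT d.acct_no,
           (SELECT tm.site_name
            FROM territory_map tm
            WHERE tm.short_postcode = substr(d.post_code, 1, 3)
              AND d.country IN ('United States', 'Canada')) AS site_name
    FROM item_dump d
    ORDER BY d.associate_id
    """
).fetchall()
print(with_subselect)
```

The equivalence relies on territory_map having no more than one row per short_postcode; with duplicates, the join returns extra rows while a scalar subselect cannot.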