I am trying to write a SQL Statement for Interbase.
What's wrong with this SQL?
md_master (trm) = Master Table
cd_Med (cdt) = Detail table
SELECT trm.seq_no, trm.recipient_id, trm.payee_fullname, trm.payee_address1, trm.payee_address2, trm.payee_address3, trm.payee_address_city, trm.payee_address_state, trm.recip_zip, trm.recip_zip_4, trm.recip_zip_4_2, trm.check_no, trm.check_date, trm.check_amount,
cdt.com_ss_source_sys, cdt.cd_pay_date, cdt.com_set_amount,
bnk.name, bnk.address, bnk.transit_routing,
act.acct_no
FROM md_master trm, cd_med cdt, accounts act, banks bnk
join cd_med on cdt.master_id = trm.id
join accounts on act.acct_id = trm.account_tag
join banks on bnk.bank_id = act.bank_id
ORDER BY cdt.master_id
I don't get an error, the computer just keeps crunching away and hangs.
I don't know about Interbase specifically, but that FROM clause seems a little strange (perhaps just some syntax I'm not familiar with though). Does this help?
...
FROM md_master trm
join cd_med cdt on cdt.master_id = trm.id
join accounts act on act.acct_id = trm.account_tag
join banks bnk on bnk.bank_id = act.bank_id
By the way, you have no WHERE clause so if any of these tables is large, I wouldn't be overly surprised that it takes a long time to run.
You have been bitten by an anti-pattern called implicit join syntax. For example:
SELECT * FROM table_with_a_1000rows, othertable_with_a_1000rows
With no WHERE clause relating the two tables, this does a cross join and returns 1 million rows.
You are doing:
FROM md_master trm, cd_med cdt, accounts act, banks bnk
That is a cross join on 4 tables (combined with normal joins afterwards), which could easily generate many billions of rows.
No wonder Interbase hangs; it is working until the end of time to generate more rows than there are atoms in the universe.
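To make the row explosion concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than Interbase. The table names master and detail and the helper count_rows are made up for the illustration; only the shape of the two FROM clauses matters.

```python
import sqlite3


def count_rows(n):
    """Return (rows from a comma join with no WHERE, rows from an explicit join)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE master (id INTEGER PRIMARY KEY)")
    con.execute("CREATE TABLE detail (id INTEGER PRIMARY KEY, master_id INTEGER)")
    con.executemany("INSERT INTO master (id) VALUES (?)", [(i,) for i in range(n)])
    con.executemany(
        "INSERT INTO detail (id, master_id) VALUES (?, ?)",
        [(i, i) for i in range(n)],
    )
    # Implicit (comma) join with no relating condition: every row pairs with every row.
    cross = con.execute("SELECT COUNT(*) FROM master m, detail d").fetchone()[0]
    # Explicit join: only rows whose keys match are paired.
    joined = con.execute(
        "SELECT COUNT(*) FROM master m JOIN detail d ON d.master_id = m.id"
    ).fetchone()[0]
    con.close()
    return cross, joined


cross, joined = count_rows(100)
print(cross, joined)
```

With 100 rows per table the comma join already produces 100 * 100 rows, while the explicit join produces 100. Each additional unrelated table multiplies the total again.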
The solution
Never separate tables with a comma in the FROM clause; that is an implicit join and it is evil.
Only use explicit joins, like so:
SELECT
trm.seq_no, trm.recipient_id, trm.payee_fullname, trm.payee_address1
, trm.payee_address2, trm.payee_address3, trm.payee_address_city
, trm.payee_address_state, trm.recip_zip, trm.recip_zip_4, trm.recip_zip_4_2
, trm.check_no, trm.check_date, trm.check_amount
, cdt.com_ss_source_sys, cdt.cd_pay_date, cdt.com_set_amount
, bnk.name, bnk.address, bnk.transit_routing
, act.acct_no
FROM md_master trm
join cd_med cdt on cdt.master_id = trm.id
join accounts act on act.acct_id = trm.account_tag
join banks bnk on bnk.bank_id = act.bank_id
ORDER BY cdt.master_id
The error lies in the FROM clause. Half of it uses comma-separated tables with no relating condition in a WHERE clause, and half uses joins.
Just use joins and all should work fine.
Related
This code is taking a significant amount of time to run. It's returning every single transaction within the date range but I just need to know if the customer has had at least one transaction, then include the CustomerID, CustomerName, Type, Sign, ReportingName.
I think I need to GROUP BY 'CustomerID' but again only if there was a transaction within the date range. And of course, I'm sure there is an optimal way to execute the below TSQL because it's quite slow at present.
Thanks in advance for any help!
SELECT [ABC].[dbo].[vwPrimary].[RelatedNameId] AS CustomerID
,[ABC].[dbo].[vwPrimary].[RelatedName] AS CustomerName
,[AFGPurchase].[IvL].[TaxTreatment].[ParticluarType] AS Type
,[AFGPurchase].[IvL].[Product].[Sign] AS [Sign]
,[AFGPurchase].[IvL].[Product].[ReportingName] AS ReportingName
,[AFGPurchase].[IvL].[Transaction].[EffectiveDate] AS 'Date'
FROM (((([AFGPurchase].[IvL].[Account]
INNER JOIN [AFGPurchase].[IvL].[Position] ON [AFGPurchase].[IvL].[Account].[AccountId] = [AFGPurchase].[IvL].[Position].[AccountId])
INNER JOIN [AFGPurchase].[IvL].[Product] ON [AFGPurchase].[IvL].[Position].[ProductID] = [AFGPurchase].[IvL].[Product].[ProductId])
INNER JOIN [ABC].[dbo].[vwPrimary] ON [AFGPurchase].[IvL].[Account].[ReportingEntityId] = [ABC].[dbo].[vwPrimary].[RelatedNameId])
INNER JOIN [AFGPurchase].[IvL].[TaxTreatment] ON [AFGPurchase].[IvL].[Account].[TaxTreatmentId] = [AFGPurchase].[IvL].[TaxTreatment].[TaxTreatmentId])
INNER JOIN [AFGPurchase].[IvL].[Transaction] ON [AFGPurchase].[IvL].[Position].[PositionId] = [AFGPurchase].[IvL].[Transaction].[PositionId]
WHERE ((([AFGPurchase].[IvL].[TaxTreatment].[RegistrationType]) LIKE 'NON%')
AND (([AFGPurchase].[IvL].[Product].[Sign])='XYZ2')
AND (([AFGPurchase].[IvL].[Position].[Quantity])<>0)
AND (([AFGPurchase].[IvL].[Transaction].[EffectiveDate]) between '2021-12-31' and '2022-12-31'))
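Make sure the columns you join and filter on are indexed, and check those indexes for fragmentation; both can speed up your query.
If you only need a single result row, just use TOP 1 (note that this returns one row in total, not one row per customer):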
SELECT TOP 1 [ABC].[dbo].[vwPrimary].[RelatedNameId] AS CustomerID
,[ABC].[dbo].[vwPrimary].[RelatedName] AS CustomerName
,[AFGPurchase].[IvL].[TaxTreatment].[ParticluarType] AS Type
,[AFGPurchase].[IvL].[Product].[Sign] AS [Sign]
,[AFGPurchase].[IvL].[Product].[ReportingName] AS ReportingName
,[AFGPurchase].[IvL].[Transaction].[EffectiveDate] AS 'Date'
FROM (((([AFGPurchase].[IvL].[Account]
INNER JOIN [AFGPurchase].[IvL].[Position] ON [AFGPurchase].[IvL].[Account].[AccountId] = [AFGPurchase].[IvL].[Position].[AccountId])
INNER JOIN [AFGPurchase].[IvL].[Product] ON [AFGPurchase].[IvL].[Position].[ProductID] = [AFGPurchase].[IvL].[Product].[ProductId])
INNER JOIN [ABC].[dbo].[vwPrimary] ON [AFGPurchase].[IvL].[Account].[ReportingEntityId] = [ABC].[dbo].[vwPrimary].[RelatedNameId])
INNER JOIN [AFGPurchase].[IvL].[TaxTreatment] ON [AFGPurchase].[IvL].[Account].[TaxTreatmentId] = [AFGPurchase].[IvL].[TaxTreatment].[TaxTreatmentId])
INNER JOIN [AFGPurchase].[IvL].[Transaction] ON [AFGPurchase].[IvL].[Position].[PositionId] = [AFGPurchase].[IvL].[Transaction].[PositionId]
WHERE ((([AFGPurchase].[IvL].[TaxTreatment].[RegistrationType]) LIKE 'NON%')
AND (([AFGPurchase].[IvL].[Product].[Sign])='XYZ2')
AND (([AFGPurchase].[IvL].[Position].[Quantity])<>0)
AND (([AFGPurchase].[IvL].[Transaction].[EffectiveDate]) between '2021-12-31' and '2022-12-31'))
If you only need to check for the existence of a row, and not actually get any data from it then use EXISTS() rather than INNER JOIN, e.g.
SELECT vpr.[RelatedNameId] AS CustomerID
,vpr.[RelatedName] AS CustomerName
,tt.[ParticluarType] AS Type
,prd.[Sign]
,prd.ReportingName
FROM [AFGPurchase].[IvL].[Account] AS acc
INNER JOIN [AFGPurchase].[IvL].[Position] AS pos ON acc.[AccountId] = pos.[AccountId]
INNER JOIN [AFGPurchase].[IvL].[Product] AS prd ON pos.[ProductID] = prd.[ProductId]
INNER JOIN [ABC].[dbo].[vwPrimary] AS vpr ON acc.[ReportingEntityId] = vpr.[RelatedNameId]
INNER JOIN [AFGPurchase].[IvL].[TaxTreatment] AS tt ON acc.[TaxTreatmentId] = tt.[TaxTreatmentId]
WHERE tt.[RegistrationType] LIKE 'NON%'
AND prd.[Sign]='XYZ2'
AND pos.[Quantity]<>0
AND EXISTS
( SELECT 1
FROM [AFGPurchase].[IvL].[Transaction] AS tr
WHERE tr.[PositionId] = pos.[PositionId]
AND tr.[EffectiveDate] BETWEEN '2021-12-31' AND '2022-12-31'
);
N.B. I have added table aliases and removed all the unnecessary parentheses for readability. You may disagree that it is more readable, but I would expect most people to agree.
This may not offer any performance benefit over simply grouping by the columns you are selecting and keeping your joins as they are. SQL is, after all, a declarative language where you tell the engine what you want, not how to get it, so you may find that the two plans are the same because you are requesting the same result. Using EXISTS does have the advantage of being more semantically tied to what you are trying to do, though, so it gives the optimiser the best chance of getting to the right plan. If you are still having performance issues, you may need to inspect the execution plan and see whether it suggests any indexes.
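To illustrate the difference in result shape between joining to the transactions and testing for them with EXISTS, here is a small sketch using Python's built-in sqlite3 module instead of SQL Server. The tables pos and txn and their contents are invented for the example and are much simpler than the real schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE pos (position_id INTEGER PRIMARY KEY, customer TEXT);
CREATE TABLE txn (txn_id INTEGER PRIMARY KEY, position_id INTEGER, effective_date TEXT);
INSERT INTO pos VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO txn VALUES
    (10, 1, '2022-01-15'),
    (11, 1, '2022-06-30'),
    (12, 1, '2022-11-01'),
    (13, 2, '2020-05-05'),
    (14, 3, '2022-03-03');
""")

# INNER JOIN: one output row per matching transaction.
join_rows = con.execute("""
    SELECT p.customer
    FROM pos p
    JOIN txn t ON t.position_id = p.position_id
    WHERE t.effective_date BETWEEN '2021-12-31' AND '2022-12-31'
    ORDER BY p.customer
""").fetchall()

# EXISTS: one output row per position that has at least one matching transaction.
exists_rows = con.execute("""
    SELECT p.customer
    FROM pos p
    WHERE EXISTS (
        SELECT 1
        FROM txn t
        WHERE t.position_id = p.position_id
          AND t.effective_date BETWEEN '2021-12-31' AND '2022-12-31'
    )
    ORDER BY p.customer
""").fetchall()

print(join_rows)
print(exists_rows)
```

In this toy data alice has three transactions in range, so the join lists her three times while EXISTS lists her once; bob has no transaction in range and appears in neither result.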
Finally, if you are really still using SQL Server 2008 then you really need to start thinking about your upgrade path. It has been completely unsupported for over 3 years now.
I have written this SQL query to fetch data from a Greenplum data lake. The primary table has only about 800,000 rows, which I am joining with other tables. The query below takes an insane amount of time to return a result. What might be the reason for the long query time, and how can I resolve it?
select
a.pole,
t.country_name,
a.service_area,
a.park_name,
t.turbine_platform_name,
a.turbine_subtype,
a.pad as "turbine_name",
t.system_number as "turbine_id",
a.customer,
a.service_contract,
a.component,
c.vendor_mfg as "component_manufacturer",
a.case_number,
a.description as "case_description",
a.rmd_diagnosis as "case_rmd_diagnostic_description",
a.priority as "case_priority",
a.status as "case_status",
a.actual_rootcause as "case_actual_rootcause",
a.site_trends_feedback as "case_site_feedback",
a.added as "date_case_added",
a.start as "date_case_started",
a.last_flagged as "date_case_flagged_by_algorithm_latest",
a.communicated as "date_case_communicated_to_field",
a.field_visible_date as "date_case_field_visbile_date",
a.fixed as "date_anamoly_fixed",
a.expected_clse as "date_expected_closure",
a.request_closure_date as "date_case_request_closure",
a.validation_date as "date_case_closure",
a.production_related,
a.estimated_value as "estimated_cost_avoidance",
a.cms,
a.anomaly_category,
a.additional_information as "case_additional_information",
a.model,
a.full_model,
a.sent_to_field as "case_sent_to_field"
from app_pul.anomaly_stage a
left join ge_cfg.turbine_detail t on a.scada_number = t.system_number and a.added > '2017-12-31'
left join tbwgr_v.pmt_wmf_tur_component_master_t c on a.component = c.component_name
Your query is basically:
select . . .
from app_pul.anomaly_stage a left join
ge_cfg.turbine_detail t
on a.scada_number = t.system_number and
a.added > '2017-12-31' left join
tbwgr_v.pmt_wmf_tur_component_master_t c
on a.component = c.component_name
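First, the condition on a does not filter anything: a is the first (preserved) table in the left join and the condition is in the on clause, so rows of a that fail it are still returned, just without a match from t. I assume you actually intend it to filter, so write the query as: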
select . . .
from app_pul.anomaly_stage a left join
ge_cfg.turbine_detail t
on a.scada_number = t.system_number left join
tbwgr_v.pmt_wmf_tur_component_master_t c
on a.component = c.component_name
where a.added > '2017-12-31'
That might help with performance. Then in Postgres, you would want indexes on turbine_detail(system_number) and pmt_wmf_tur_component_master_t(component_name). It is doubtful that an index would help on the first table, because you are already selecting a large amount of data.
I'm not sure if indexes would be appropriate in Greenplum.
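The ON-versus-WHERE behaviour described above can be checked on a toy example. This is only a sketch using Python's built-in sqlite3 module, not Greenplum, and the two tiny tables a and t are made up for the illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY, added TEXT);
CREATE TABLE t (id INTEGER PRIMARY KEY);
INSERT INTO a VALUES (1, '2017-01-01'), (2, '2018-06-01');
INSERT INTO t VALUES (1), (2);
""")

# Condition on the preserved table placed in ON: row 1 is kept, it just loses its match.
on_rows = con.execute("""
    SELECT a.id, t.id
    FROM a
    LEFT JOIN t ON a.id = t.id AND a.added > '2017-12-31'
    ORDER BY a.id
""").fetchall()

# Same condition placed in WHERE: row 1 is actually filtered out.
where_rows = con.execute("""
    SELECT a.id, t.id
    FROM a
    LEFT JOIN t ON a.id = t.id
    WHERE a.added > '2017-12-31'
    ORDER BY a.id
""").fetchall()

print(on_rows)
print(where_rows)
```

The first query still returns both rows of a, with a NULL in place of the match for the old row; the second returns only the row added after 2017-12-31.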
Verify that the joins use the respective primary and foreign keys.
Try executing the query while removing one left join after the other, so you can see where the problem is.
Try inspecting the execution plan (EXPLAIN).
I'm trying to join 4 tables that have a somewhat complex relationship. Because of where this will be used, it needs to be contained in a single query, but I'm having trouble since the primary query and the IN clause query both join 2 tables together and the lookup is on two columns.
The goal is to input a SalesNum and SalesType and have it return the Price
Tables and relationships:
sdShipping
SalesNum[1]
SalesType[2]
Weight[3]
sdSales
SalesNum[1]
SalesType[2]
Zip[4]
spZones
Zip[4]
Zone[5]
spPrices
Zone[5]
Price
Weight[3]
Here's my latest attempt in T-SQL:
SELECT
spp.Price
FROM
spZones AS spz
LEFT OUTER JOIN
spPrices AS spp ON spz.Zone = spp.Zone
WHERE
(spp.Weight, spz.Zip) IN (SELECT ship.Weight, sales.Zip
FROM sdShipping AS ship
LEFT OUTER JOIN sdSales AS sales ON sales.SalesNum = ship.SalesNum
AND sales.SalesType = ship.SalesType
WHERE sales.SalesNum = (?)
AND ship.SalesType = (?));
SQL Server Management Studio says I have an error in my syntax near ',' (appropriately useless error message). Does anybody have any idea whether this is even allowed in Microsoft's version of SQL? Is there perhaps another way to accomplish it? I've seen the multi-key IN questions answered on here, but never in the case where both sides require a JOIN.
Many databases do support IN on tuples. SQL Server is not one of them.
Use EXISTS instead:
SELECT spp.Price
FROM spZones spz LEFT OUTER JOIN
spPrices spp
ON spz.Zone = spp.Zone
WHERE EXISTS (SELECT 1
FROM sdShipping ship LEFT JOIN
sdSales sales
ON sales.SalesNum = ship.SalesNum AND
sales.SalesType = ship.SalesType
WHERE spp.Weight = ship.Weight AND spz.Zip = sales.Zip AND
sales.SalesNum = (?) AND
ship.SalesType = (?)
);
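As a rough check that the correlated EXISTS form returns the expected price, here is a sketch against the four tables from the question, run through Python's built-in sqlite3 module rather than SQL Server. The column types, the sample rows, and the helper price_for are all assumptions made for the illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sdShipping (SalesNum INTEGER, SalesType TEXT, Weight INTEGER);
CREATE TABLE sdSales (SalesNum INTEGER, SalesType TEXT, Zip TEXT);
CREATE TABLE spZones (Zip TEXT, Zone INTEGER);
CREATE TABLE spPrices (Zone INTEGER, Price REAL, Weight INTEGER);
INSERT INTO sdShipping VALUES (100, 'A', 5);
INSERT INTO sdSales VALUES (100, 'A', '90210');
INSERT INTO spZones VALUES ('90210', 3), ('10001', 1);
INSERT INTO spPrices VALUES (3, 12.5, 5), (3, 20.0, 10), (1, 7.0, 5);
""")


def price_for(con, sales_num, sales_type):
    """Look up prices for a sale by matching both Weight and Zip inside EXISTS."""
    return con.execute("""
        SELECT spp.Price
        FROM spZones spz
        LEFT OUTER JOIN spPrices spp ON spz.Zone = spp.Zone
        WHERE EXISTS (
            SELECT 1
            FROM sdShipping ship
            LEFT JOIN sdSales sales
                   ON sales.SalesNum = ship.SalesNum
                  AND sales.SalesType = ship.SalesType
            WHERE spp.Weight = ship.Weight
              AND spz.Zip = sales.Zip
              AND sales.SalesNum = ?
              AND ship.SalesType = ?
        )
    """, (sales_num, sales_type)).fetchall()


prices = price_for(con, 100, 'A')
print(prices)
```

Both columns of the original tuple comparison become ordinary equality conditions inside the subquery, so only the price row whose zone and weight both match the sale is returned.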
I'm trying to use the aggregation features of the Django ORM to run a query on an MSSQL 2008R2 database, but I keep getting a timeout error. The query (generated by Django) which fails is below. I've tried running it directly in SQL Server Management Studio and it works, but takes 3.5 min.
It does look like it's aggregating over a bunch of fields which it doesn't need to, but I wouldn't have thought that should really cause it to take that long. The database isn't that big either: auth_user has 9 records, ticket_ticket has 1210, and ticket_watchers has 1876. Is there something I'm missing?
SELECT
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined],
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined]
HAVING
(COUNT([tickets_ticket].[id]) > 0 OR COUNT(T3.[id]) > 0 )
EDIT:
Here are the relevant indexes (excluding those not used in the query):
auth_user.id (PK)
auth_user.username (Unique)
tickets_ticket.id (PK)
tickets_ticket.capturer_id
tickets_ticket.responsible_id
tickets_ticket_watchers.id (PK)
tickets_ticket_watchers.user_id
tickets_ticket_watchers.ticket_id
EDIT 2:
After a bit of experimentation, I've found that the following query is the smallest that results in the slow execution:
SELECT
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id]
The weird thing is that if I comment out any two lines in the above, it runs in less than 1 s, but it doesn't seem to matter which lines I remove (although obviously I can't remove a join without also removing the relevant SELECT line).
EDIT 3:
The python code which generated this is:
User.objects.annotate(
Count('tickets_captured'),
Count('assigned_tickets'),
Count('tickets_watched')
)
A look at the execution plan shows that SQL Server is first doing a cross join on all the tables, resulting in about 280 million rows and 6 GB of data. I assume that this is where the problem lies, but why is it happening?
SQL Server is doing exactly what it was asked to do. Unfortunately, Django is not generating the right query for what you want. It looks like you need to count distinct (in the ORM, Count(..., distinct=True)) instead of just count: Django annotate() multiple times causes wrong answers
As for why the query works that way: the query says to join the four tables together. So if a user has 2 captured tickets, 3 assigned tickets, and 4 watched tickets, the join returns 2*3*4 = 24 rows for that user, one for each combination of tickets. The distinct part removes all the duplicates.
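The multiplication is easy to reproduce. Below is a minimal sketch with Python's built-in sqlite3 module, not SQL Server or Django; the tables users, tickets and watchers are simplified stand-ins for the real ones, loaded with exactly the 2 / 3 / 4 case from the explanation above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE tickets (id INTEGER PRIMARY KEY, capturer_id INTEGER, responsible_id INTEGER);
CREATE TABLE watchers (id INTEGER PRIMARY KEY, user_id INTEGER, ticket_id INTEGER);
INSERT INTO users VALUES (1);
-- user 1 captured tickets 1-2 and is responsible for tickets 3-5
INSERT INTO tickets VALUES (1, 1, 99), (2, 1, 99), (3, 99, 1), (4, 99, 1), (5, 99, 1);
-- user 1 watches four tickets
INSERT INTO watchers VALUES (1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 1, 4);
""")

fan_out = con.execute("""
    SELECT COUNT(*),
           COUNT(t1.id),
           COUNT(DISTINCT t1.id),
           COUNT(DISTINCT t3.id),
           COUNT(DISTINCT w.ticket_id)
    FROM users u
    LEFT JOIN tickets t1 ON u.id = t1.capturer_id
    LEFT JOIN tickets t3 ON u.id = t3.responsible_id
    LEFT JOIN watchers w ON u.id = w.user_id
    GROUP BY u.id
""").fetchone()

print(fan_out)
```

The plain COUNT sees all 24 combined rows, while the DISTINCT counts recover 2, 3 and 4. Note that the engine still has to build the combined rows before de-duplicating them, so DISTINCT fixes the numbers but not necessarily the running time.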
What about this?
SELECT auth_user.*,
C1.tickets_captured__count,
C2.assigned_tickets__count,
C3.tickets_watched__count
FROM
auth_user
LEFT JOIN
( SELECT capturer_id, COUNT(*) AS tickets_captured__count
FROM tickets_ticket GROUP BY capturer_id ) AS C1 ON auth_user.id = C1.capturer_id
LEFT JOIN
( SELECT responsible_id, COUNT(*) AS assigned_tickets__count
FROM tickets_ticket GROUP BY responsible_id ) AS C2 ON auth_user.id = C2.responsible_id
LEFT JOIN
( SELECT user_id, COUNT(*) AS tickets_watched__count
FROM tickets_ticket_watchers GROUP BY user_id ) AS C3 ON auth_user.id = C3.user_id
WHERE C1.tickets_captured__count > 0 OR C2.assigned_tickets__count > 0
--WHERE C1.tickets_captured__count is not null OR C2.assigned_tickets__count is not null -- also works (I think with better performance)
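Here is a sketch of the same pre-aggregation idea on toy data, using Python's built-in sqlite3 module instead of SQL Server. The simplified tables users, tickets and watchers and their contents are assumptions for the illustration; each derived table is grouped down to one row per user before it is joined, so nothing gets multiplied.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE tickets (id INTEGER PRIMARY KEY, capturer_id INTEGER, responsible_id INTEGER);
CREATE TABLE watchers (id INTEGER PRIMARY KEY, user_id INTEGER, ticket_id INTEGER);
-- user 2 has no tickets at all and should be filtered out
INSERT INTO users VALUES (1), (2);
INSERT INTO tickets VALUES (1, 1, 99), (2, 1, 99), (3, 99, 1), (4, 99, 1), (5, 99, 1);
INSERT INTO watchers VALUES (1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 1, 4);
""")

rows = con.execute("""
    SELECT u.id, c1.n, c2.n, c3.n
    FROM users u
    LEFT JOIN (SELECT capturer_id, COUNT(*) AS n
               FROM tickets GROUP BY capturer_id) AS c1 ON u.id = c1.capturer_id
    LEFT JOIN (SELECT responsible_id, COUNT(*) AS n
               FROM tickets GROUP BY responsible_id) AS c2 ON u.id = c2.responsible_id
    LEFT JOIN (SELECT user_id, COUNT(*) AS n
               FROM watchers GROUP BY user_id) AS c3 ON u.id = c3.user_id
    WHERE c1.n > 0 OR c2.n > 0
    ORDER BY u.id
""").fetchall()

print(rows)
```

User 1 comes back with counts 2, 3 and 4, and user 2 is dropped because both of its counts are NULL.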
We have a lot of star schemas in our data warehouse. I thought I can create views in order to simplify SQL analysis of data.
Example SQL for profit&loss star:
select
month_number,
sum(amount)
from
bizdata.dw_daily_pl_fact dwdpf
join bizdata.dw_distance dwdis on (dwdis.distance_key= dwdpf.distance_key)
join bizdata.dw_ledger_account dwled on (dwled.ledger_account_key= dwdpf.ledger_account_key)
join bizdata.dw_party dwpar on (dwpar.party_key= dwdpf.company_key)
join bizdata.dw_party dwpar2 on (dwpar2.party_key= dwdpf.supplier_key)
join bizdata.dw_budget_code dwbud on (dwbud.budget_code_key= dwdpf.budget_code_key)
join bizdata.dw_time dwtim on (dwtim.time_key= dwdpf.time_key)
join bizdata.dw_project dwpro on (dwpro.project_key= dwdpf.project_key)
where
year_number = 2012
and budget_code = 'SALARIES'
group by
month_number
(There are approx 200 columns and 100k rows in this star)
If I have a view:
create or replace view bizdata.dwv_pl_fact as (
select
*
from
bizdata.dw_daily_pl_fact dwdpf
join bizdata.dw_distance dwdis on (dwdis.distance_key= dwdpf.distance_key)
join bizdata.dw_ledger_account dwled on (dwled.ledger_account_key= dwdpf.ledger_account_key)
join bizdata.dw_party dwpar on (dwpar.party_key= dwdpf.company_key)
join bizdata.dw_party dwpar2 on (dwpar2.party_key= dwdpf.supplier_key)
join bizdata.dw_budget_code dwbud on (dwbud.budget_code_key= dwdpf.budget_code_key)
join bizdata.dw_time dwtim on (dwtim.time_key= dwdpf.time_key)
join bizdata.dw_project dwpro on (dwpro.project_key= dwdpf.project_key)
);
I can simplify the statement to the following:
select
month_number,
sum(amount)
from
bizdata.dwv_pl_fact
where
year_number = 2012
and budget_code = 'SALARIES'
group by
month_number
My question is: are there any performance or other issues with such an approach?
A view in PostgreSQL is just a query rewriting mechanism. So you can basically assume your user-supplied criteria get merged into the view's definition and the resulting query gets run.
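Since 9.0 the planner should even notice that some joins in the resulting query are unnecessary and skip them. That seems particularly useful in your case, although as far as I know this join removal only applies to left joins against a unique key, so check the plan to see whether your joins qualify.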
However, it is possible that some criteria may not be pushed "inside" clauses in the view definition - these would be the same as you might see with a sub-query though. For example, a subquery with order-by + limit can present a boundary the planner can't see through.
HTH
You haven't mentioned your environment. In general your approach seems OK, but you missed one important point: you are bringing every column into that view. If there are hundreds of columns and row chaining occurs, it will be a nightmare. So rewrite the query and build the view (dwv_pl_fact) with only the columns you need, like the following, and you should be OK.
create or replace view bizdata.dwv_pl_fact as (
select
<table_name>.month_number,
<table_name>.amount
from
bizdata.dw_daily_pl_fact dwdpf
join bizdata.dw_distance dwdis on (dwdis.distance_key= dwdpf.distance_key)
join bizdata.dw_ledger_account dwled on (dwled.ledger_account_key= dwdpf.ledger_account_key)
join bizdata.dw_party dwpar on (dwpar.party_key= dwdpf.company_key)
join bizdata.dw_party dwpar2 on (dwpar2.party_key= dwdpf.supplier_key)
join bizdata.dw_budget_code dwbud on (dwbud.budget_code_key= dwdpf.budget_code_key)
join bizdata.dw_time dwtim on (dwtim.time_key= dwdpf.time_key)
join bizdata.dw_project dwpro on (dwpro.project_key= dwdpf.project_key)
);