SQL-Query (with subquery, group and order by) optimization - sql

coud you help me optimizing the following statement. It has a bad prerformance when dealing with huge amount of data (in my case 3Mio Messages and 25Mio MessageWorkItems).
Does anybody have any suggestions? Thank you in advance.
select distinct msg.id, msgWorkItem_1.description
from message msg
left outer join message_work_item msgWorkItem_1 on msg.id=msgWorkItem_1.message_id
and ( msgWorkItem_1.id in (
select max(msgWorkItem_2.id)
from message_work_item msgWorkItem_2
inner join message_work_item_type msgWorkItem_Type on msgWorkItem_2.message_work_item_type_id=msgWorkItem_Type.id
where
msgWorkItem_2.creation_type= 'mobile'
and msgWorkItem_2.description is not null
and msgWorkItem_Type.code <> 'sent-to-app-manually'
-- Is it possible to avoid this correlation to the outer query ? )
and msgWorkItem_2.message_id = msg.id)
)
where msg.deactivation_time > ?
order by msgWorkItem_1.description asc

STEP 1 : Lay out the query so I have any hope of reading it
SELECT
DISTINCT
msg.id,
msgWorkItem_1.description
FROM
message msg
LEFT OUTER JOIN
message_work_item AS msgWorkItem_1
ON msgWorkItem_1.message_id = msg.id
AND msgWorkItem_1.id =
(
SELECT
MAX(msgWorkItem_2.id)
FROM
message_work_item AS msgWorkItem_2
INNER JOIN
message_work_item_type AS msgWorkItem_Type
ON msgWorkItem_2.message_work_item_type_id=msgWorkItem_Type.id
WHERE
msgWorkItem_2.creation_type= 'mobile'
AND msgWorkItem_2.description IS NOT NULL
AND msgWorkItem_Type.code <> 'sent-to-app-manually'
-- Is it possible to avoid this correlation to the outer query ?
AND msgWorkItem_2.message_id = msg.id
)
WHERE
msg.deactivation_time > ?
ORDER BY
msgWorkItem_1.description ASC
STEP 2 : rewrite using analytic functions instead of MAX()
SELECT
DISTINCT
message.id,
message_work_item_sorted.description
FROM
message
LEFT OUTER JOIN
(
SELECT
message_work_item.message_id,
message_work_item.description,
ROW_NUMBER() OVER (PARTITION BY message_work_item.message_id
ORDER BY message_work_item.id DESC
)
AS row_ordinal
FROM
message_work_item
INNER JOIN
message_work_item_type
ON message_work_item.message_work_item_type_id = message_work_item_type.id
WHERE
message_work_item.creation_type= 'mobile'
AND message_work_item.description IS NOT NULL
AND message_work_item_type.code <> 'sent-to-app-manually'
)
message_work_item_sorted
ON message_work_item_sorted.message_id = message.id
AND message_work_item_sorted.row_ordinal = 1
WHERE
message.deactivation_time > ?
ORDER BY
message_work_item_sorted.description ASC
With more information we could probably help further, but as you gave no definition of the tables, constraints, or business logic, this is just a re-write of what you're already implemented.
For example, I strongly doubt you need the DISTINCT (provided that the id columns in your tables are unique).

Related

SELECT DISTINCT, ORDER BY expressions must appear in target list

I have done this SQL request :
SELECT DISTINCT
*, j_start_date, typeNP.j_value, g_challenge.j_title
FROM g_challenge
INNER JOIN g_challenge_catset typeNP ON g_challenge.j_row_id = typeNP.j_item_id AND typeNP.j_value IN ('jbm_5233','jbm_5232','bfr_8227')
INNER JOIN g_challenge_catset as tps ON g_challenge.j_row_id = tps.j_item_id AND tps.j_value IN ('aga_7777','aga_7778','aga_7776')
LEFT OUTER JOIN g_dbsocialgroup gdb ON g_challenge.j_dbsocialgroup_id = gdb.j_row_id || '_DBSocialGroup'
WHERE (j_start_date >= '2018-08-13' OR (j_start_date <= '2018-08-13' AND j_end_date >= '2018-08-13')) AND g_challenge.j_pstatus = 0 AND g_challenge.j_mutualiste_type = false
ORDER BY
CASE typeNP.j_value
WHEN 'bfr_8227' THEN '1'
WHEN 'jbm_5233' THEN '2'
WHEN 'jbm_5232' THEN '3'
END,
j_start_date ASC, g_challenge.j_title ASC
But I got this error : SELECT DISTINCT, ORDER BY expressions must appear in target list and I don't know why ?
Help me please
Ok, I solved this with a group by
GROUP BY g_challenge.j_start_date, typeNP.j_value, g_challenge.j_title, g_challenge.j_row_id, typeNP.j_item_id, tps.j_item_id, tps.j_value, gdb.j_row_id
and with removing the DISTINCT and named columns in the select

Only month end values for each identifier

For the code below I am attempting to select only the month end values for all unique FundId's. The code below keeps giving me the error of
Msg 164, Level 15, State 1, Line 16
Each GROUP BY expression must contain at least one column that is not an outer reference.
How can I fix the where statement to pull all month end values for each fundid
SELECT TOP 10000 a.[PerformanceId]
,[InvestmentType]
,[EndDate]
,a.[CurrencyId]
,[AssetValue]
,c.FundId
FROM [StatusData_DMWkspaceDB].[dbo].[NetAssetsValidationFailure] a
LEFT JOIN MappingData_GAPortDB.dbo.PerformanceLevelMapping b
ON a.PerformanceId = b.PerformanceId
LEFT JOIN MappingData_GAPortDB.dbo.FundClassMatching c
ON b.SecId = c.SecId
WHERE a.EndDate IN (
SELECT MAX(a.EndDate)
From [StatusData_DMWkspaceDB].[dbo].[NetAssetsValidationFailure]
GROUP BY c.FundId, Month(a.EndDate), YEAR(a.EndDate))
This is your query:
SELECT TOP 10000 navf.[PerformanceId], [InvestmentType], [EndDate],
navf.[CurrencyId], [AssetValue], fcm.FundId
FROM [StatusData_DMWkspaceDB].[dbo].[NetAssetsValidationFailure] navf LEFT JOIN
MappingData_GAPortDB.dbo.PerformanceLevelMapping plm
ON navf.PerformanceId = plm.PerformanceId LEFT JOIN
MappingData_GAPortDB.dbo.FundClassMatching fcm
ON l.m.SecId = fcm.SecId
WHERE navf.EndDate IN (SELECT MAX(navf.EndDate)
From [StatusData_DMWkspaceDB].[dbo].[NetAssetsValidationFailure] navf
GROUP BY fcm.FundId, Month(navf.EndDate), YEAR(navf.EndDate)
);
Learn to use sensible table aliases, so the query is easier to write and to read.
In any case, your WHERE clause is only referencing outer tables in the GROUP BY. The message is quite clear.
I'm not even sure what you want to do, but I am guessing that this is a working version of what you want:
SELECT x.*
FROM (SELECT navf.[PerformanceId], [InvestmentType], [EndDate],
navf.[CurrencyId], [AssetValue], fcm.FundId,
ROW_NUMBER() OVER (PARTITION BY fcm.FundId, Month(navf.EndDate), YEAR(navf.EndDate)
ORDER BY navf.EndDate DESC
) as seqnum
FROM [StatusData_DMWkspaceDB].[dbo].[NetAssetsValidationFailure] navf LEFT JOIN
MappingData_GAPortDB.dbo.PerformanceLevelMapping plm
ON navf.PerformanceId = plm.PerformanceId LEFT JOIN
MappingData_GAPortDB.dbo.FundClassMatching fcm
ON l.m.SecId = fcm.SecId
) x
WHERE seqnum = 1;

Teradata using Top 10 and Distinct

I am getting an error saying that I cannot use Top 10 with distinct I am wondering if there is any way I can get my query to work on that fashion. this is what the error is saying .
[Teradata Database] [6916] TOP N Syntax error: Top N option is not supported with DISTINCT option.
Query Below:
Thank you.
Select Distinct TOP 10 t1.Adjustment_ID, t1.OfficeNum, t1.InvoiceNum, t1.PatientNum,
t1.CurrentStatus, t1.AdjustmentTotal, t1.SubmittedOn, t1.UserSubmitted,
t1.Invoice_Type, t1.Pat_First_Name, t1.Pat_Last_Name,t2.Reason_Code FROM App_UnityAdj_AdjInfo_Tbl t1
Left Join RCM_WORK_PRD.App_UnityAdj_AdjRecord_Tbl t2
On t1.Adjustment_ID = t2.Adjustment_ID
Where t1.UserSubmitted = 'Name' AND (t1.CurrentStatus = 'Pending' OR t1.CurrentStatus = 'Deny')
What are you trying to do? You have columns in the SELECT that are not in the GROUP BY. You also have TOP without ORDER BY, which is suspicious.
One simple method is to move all the SELECT columns to the GROUP BY:
select TOP 10 t1.Adjustment_ID, t1.OfficeNum, t1.InvoiceNum, t1.PatientNum,
t1.CurrentStatus, t1.AdjustmentTotal, t1.SubmittedOn, t1.UserSubmitted,
t1.Invoice_Type, t1.Pat_First_Name, t1.Pat_Last_Name, t2.Reason_Code
from App_UnityAdj_AdjInfo_Tbl t1 Left Join
RCM_WORK_PRD.App_UnityAdj_AdjRecord_Tbl t2
On t1.Adjustment_ID = t2.Adjustment_ID
where t1.UserSubmitted = 'Name' AND
t1.CurrentStatus in ('Pending', 'Deny')
group by t1.Adjustment_ID, t1.OfficeNum, t1.InvoiceNum, t1.PatientNum,
t1.CurrentStatus, t1.AdjustmentTotal, t1.SubmittedOn, t1.UserSubmitted,
t1.Invoice_Type, t1.Pat_First_Name, t1.Pat_Last_Name,t2.Reason_Code;
I think I figured it out, in case anyone wants to know it is sample 10
Select Distinct t1.Adjustment_ID, t1.OfficeNum, t1.InvoiceNum, t1.PatientNum,
t1.CurrentStatus, t1.AdjustmentTotal, t1.SubmittedOn, t1.UserSubmitted,
t1.Invoice_Type, t1.Pat_First_Name, t1.Pat_Last_Name,t2.Reason_Code FROM App_UnityAdj_AdjInfo_Tbl t1
Left Join RCM_WORK_PRD.App_UnityAdj_AdjRecord_Tbl t2
On t1.Adjustment_ID = t2.Adjustment_ID
Where t1.AssignedTo IS null AND (t1.CurrentStatus = 'Pending')
sample 10

Column is invalid in the ORDER BY clause because it is not contained in either an aggregate function or the GROUP BY clause

Ok here's my View (vw_LiftEquip)
SELECT dbo.tbl_equip_swl_unit.unit_id,
dbo.tbl_equip_swl_unit.unit_name,
dbo.tbl_equip_swl_unit.archived,
dbo.tbl_categories.category_id,
dbo.tbl_categories.categoryName,
dbo.tbl_categories.parentCategory,
dbo.tbl_categories.sub_category,
dbo.tbl_categories.desc_category,
dbo.tbl_categories.description,
dbo.tbl_categories.miscellaneous,
dbo.tbl_categories.category_archived,
dbo.tbl_equip_swl_unit.unit_name AS Expr1,
dbo.tbl_categories.categoryName AS Expr2,
dbo.tbl_categories.description AS Expr3,
dbo.tbl_equip_depts.dept_name,
dbo.tbl_equip_man.man_name,
dbo.tbl_Lifting_Gear.e_defects AS Expr7,
dbo.tbl_Lifting_Gear.e_defects_desc AS Expr8,
dbo.tbl_Lifting_Gear.e_defects_date AS Expr9,
dbo.tbl_equipment.equipment_id,
dbo.tbl_equipment.e_contract_no,
dbo.tbl_equipment.slID,
dbo.tbl_equipment.e_entered_by,
dbo.tbl_equipment.e_serial,
dbo.tbl_equipment.e_model,
dbo.tbl_equipment.e_description,
dbo.tbl_equipment.e_location_id,
dbo.tbl_equipment.e_owner_id,
dbo.tbl_equipment.e_department_id,
dbo.tbl_equipment.e_manafacture_id,
dbo.tbl_equipment.e_manDate1,
dbo.tbl_equipment.e_manDate2,
dbo.tbl_equipment.e_manDate3,
dbo.tbl_equipment.e_dimensions,
dbo.tbl_equipment.e_test_no,
dbo.tbl_equipment.e_firstDate1,
dbo.tbl_equipment.e_firstDate2,
dbo.tbl_equipment.e_firstDate3,
dbo.tbl_equipment.e_prevDate1,
dbo.tbl_equipment.e_prevDate2,
dbo.tbl_equipment.e_prevDate3,
dbo.tbl_equipment.e_insp_frequency,
dbo.tbl_equipment.e_swl,
dbo.tbl_equipment.e_swl_unit_id,
dbo.tbl_equipment.e_swl_notes,
dbo.tbl_equipment.e_cat_id,
dbo.tbl_equipment.e_sub_id,
dbo.tbl_equipment.e_parent_id,
dbo.tbl_equipment.e_last_inspector,
dbo.tbl_equipment.e_last_company,
dbo.tbl_equipment.e_deleted AS Expr11,
dbo.tbl_equipment.e_deleted_desc AS Expr12,
dbo.tbl_equipment.e_deleted_date AS Expr13,
dbo.tbl_equipment.e_deleted_insp AS Expr14,
dbo.tbl_Lifting_Gear.e_defects_action AS Expr15,
dbo.tbl_equipment.e_rig_location,
dbo.tbl_Lifting_Gear.e_add_type AS Expr17,
dbo.tbl_Lifting_Gear.con_id,
dbo.tbl_Lifting_Gear.lifting_date,
dbo.tbl_Lifting_Gear.lifting_ref_no,
dbo.tbl_Lifting_Gear.e_id,
dbo.tbl_Lifting_Gear.inspector_id,
dbo.tbl_Lifting_Gear.lift_testCert,
dbo.tbl_Lifting_Gear.lift_rig_location,
dbo.tbl_Lifting_Gear.inspected,
dbo.tbl_Lifting_Gear.lifting_through,
dbo.tbl_Lifting_Gear.liftingNDT,
dbo.tbl_Lifting_Gear.liftingTest,
dbo.tbl_Lifting_Gear.e_defects,
dbo.tbl_Lifting_Gear.e_defects_desc,
dbo.tbl_Lifting_Gear.e_defects_date,
dbo.tbl_Lifting_Gear.e_defects_action,
dbo.tbl_Lifting_Gear.lift_department_id,
dbo.tbl_Lifting_Gear.lifting_loc
FROM dbo.tbl_equipment
INNER JOIN dbo.tbl_equip_swl_unit
ON dbo.tbl_equipment.e_swl_unit_id = dbo.tbl_equip_swl_unit.unit_id
INNER JOIN dbo.tbl_categories
ON dbo.tbl_equipment.e_cat_id = dbo.tbl_categories.category_id
INNER JOIN dbo.tbl_equip_depts
ON dbo.tbl_equipment.e_department_id = dbo.tbl_equip_depts.dept_id
INNER JOIN dbo.tbl_equip_man
ON dbo.tbl_equipment.e_manafacture_id = dbo.tbl_equip_man.man_id
INNER JOIN dbo.vwSubCategory
ON dbo.tbl_equipment.e_sub_id = dbo.vwSubCategory.category_id
INNER JOIN dbo.vwDescCategory
ON dbo.tbl_equipment.e_cat_id = dbo.vwDescCategory.category_id
INNER JOIN dbo.tbl_Lifting_Gear
ON dbo.tbl_equipment.equipment_id = dbo.tbl_Lifting_Gear.e_id
And here's the select statement with subquery that I am using:
SELECT *
FROM vw_LiftEquip
WHERE lifting_loc = ? AND
con_id = ? AND
EXPR11 =
'N'(
SELECT MAX(lifting_date) AS maxLift
FROM vw_LiftEquip
WHERE e_id = equipment_id
)
ORDER BY lifting_ref_no,
category_id,
e_swl,
e_serial
I get the error :
Column "vw_LiftEquip.category_id" is invalid in the ORDER BY clause because it is not contained in either an aggregate function or the GROUP BY clause.
Can't see why its returning that error, this is admittedly the first time I've ran a subquery on such a complex view, and I am a bit lost, thanks in advance for any help. I have looked through the similar posts and can find no answers to this one, sorry if I am just being dumb.
You are missing AND between EXPR11 = 'N' and (SELECT MAX(...
Otherwise, it looks OK. MAX without GROUP BY is allowed if you have no other columns in the SELECT
Update: #hvd also noted that you have nothing to compare to MAX(lifting_date). See comment
Update 2,
SELECT *
FROM vw_LiftEquip v1
CROSS JOIN
(
SELECT MAX(lifting_date) AS maxLift
FROM vw_LiftEquip
WHERE e_id = equipment_id
) v2
WHERE v1.lifting_loc = ? AND
v1.con_id = ? AND
v1.EXPR11 = 'N'
ORDER BY v1.lifting_ref_no,
v1.category_id,
v1.e_swl,
v1.e_serial

SQL - Derived tables issue

I have the following SQL query:
SELECT VehicleRegistrations.ID, VehicleRegistrations.VehicleReg,
VehicleRegistrations.Phone, VehicleType.VehicleTypeDescription,
dt.ID AS 'CostID', dt.IVehHire, dt.FixedCostPerYear, dt.VehicleParts,
dt.MaintenancePerMile, dt.DateEffective
FROM VehicleRegistrations
INNER JOIN VehicleType ON VehicleRegistrations.VehicleType = VehicleType.ID
LEFT OUTER JOIN (SELECT TOP (1) ID, VehicleRegID, DateEffective, IVehHire,
FixedCostPerYear, VehicleParts, MaintenancePerMile
FROM VehicleFixedCosts
WHERE (DateEffective <= GETDATE())
ORDER BY DateEffective DESC) AS dt
ON dt.VehicleRegID = VehicleRegistrations.ID
What I basically want to do is always select the top 1 record from the 'VehicleFixedCosts' table, where the VehicleRegID matches the one in the main query. What is happening here is that it's selecting the top row before the join, so if the vehicle registration of the top row doesn't match the one we're joining to it returns nothing.
Any ideas? I really don't want to have use subselects for each of the columns I need to return
Try this:
SELECT vr.ID, vr.VehicleReg,
vr.Phone, VehicleType.VehicleTypeDescription,
dt.ID AS 'CostID', dt.IVehHire, dt.FixedCostPerYear, dt.VehicleParts,
dt.MaintenancePerMile, dt.DateEffective
FROM VehicleRegistrations vr
INNER JOIN VehicleType ON vr.VehicleType = VehicleType.ID
LEFT OUTER JOIN (
SELECT ID, VehicleRegID, DateEffective, IVehHire, FixedCostPerYear, VehicleParts, MaintenancePerMile
FROM VehicleFixedCosts vfc
JOIN (
select VehicleRegID, max(DateEffective) as DateEffective
from VehicleFixedCosts
where DateEffective <= getdate()
group by VehicleRegID
) t ON vfc.VehicleRegID = t.VehicleRegID and vfc.DateEffective = t.DateEffective
) AS dt
ON dt.VehicleRegID = vr.ID
Subquery underneath dt might need some grouping but without schema (and maybe sample data) it's hard to say which column should be involved in that.