Optimizing select query with DISTINCT - sql

My query is shown below. Without the DISTINCT it runs in about 11 seconds; with the DISTINCT it takes 46 seconds. Any advice on how this can be optimized?
SELECT * FROM (
SELECT DISTINCT ELIGIBLE_TO_SIGN_DT
, LAST_NAME
, FIRST_NAME
, EXTENDED_LAST_NAME
, DT_OF_BIRTH
, BIRTH_COUNTRY_ID
, REG_BY
, LAST_UPDATED_BY
, REVIEW_STATUS_1
, REVIEW_STATUS_2
, REVIEW_STATUS_3
, MLSB_MATCH_FILTER
, REG_STATUS_ID
, REG_STATUS
, HAS_TRAVELED
, PLAYER_ID_SHOW
, MLSB_MATCH
, TRAINER_AGENT_NAME
, NATIONAL_ID
, RES_FOLLOW_UP
, ATTACHMENT
, COMMENTS
, PLAYER_ID
, CHECKBOX
, INTL_AMA_ENTRY_ID
, ALSO_REG_BY
, MIDDLE_NAME
, BIRTH_COUNTRY_NAME
FROM AS_INTL_ADMIN_REG_VIEW
where VARCHAR_FORMAT(ELIGIBLE_TO_SIGN_DT, 'YYYY-MM-DD') >= date('07/02/2015') AND VARCHAR_FORMAT(ELIGIBLE_TO_SIGN_DT, 'YYYY-MM-DD') <= date('08/31/2015')
) order by case when UPPER(LAST_NAME) is null or trim(UPPER(LAST_NAME)) = '' then 'ZZZZZZ' else UPPER(LAST_NAME) end ASC, case when FIRST_NAME is null or trim(FIRST_NAME) = '' then 'ZZZZZZ' else FIRST_NAME end ASC
limit 200 offset 0

If the example as written is your actual code, then the difference is accounted for by the LIMIT you have on the query. When you run it without the DISTINCT, the query engine can just take the first 200 rows. But with the DISTINCT, it must first run on the entire table to find the distinct rows, then select the first 200.

DISTINCT is costly, and an ORDER BY on an expression such as UPPER(...) can be costly too.
In the query given, it seems likely to me that DISTINCT is being used unnecessarily or to fix duplicates that shouldn't be there. If the data is bad, consider fixing it rather than working around it.
Make sure you have a case-insensitive index built:
create index myindex on mytable (UPPER(LAST_NAME))
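As a rough illustration of the expression-index idea above, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database. That is an assumption made only so the example is runnable: the asker's database is a different engine, and its optimizer may behave differently. The table players, the index name and the helper function are all hypothetical.

```python
import sqlite3


def ordered_last_names(rows, limit=200):
    """Return (names, plan) for a case-insensitive ORDER BY with a LIMIT.

    rows is a list of (last_name, first_name) tuples (made-up sample data).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE players (last_name TEXT, first_name TEXT)")
    conn.executemany("INSERT INTO players VALUES (?, ?)", rows)
    # Index on the same expression that the ORDER BY uses.
    conn.execute(
        "CREATE INDEX players_upper_last ON players (UPPER(last_name))"
    )
    sql = "SELECT last_name FROM players ORDER BY UPPER(last_name) LIMIT ?"
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    plan = [r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql, (limit,))]
    names = [r[0] for r in conn.execute(sql, (limit,))]
    conn.close()
    return names, plan


if __name__ == "__main__":
    names, plan = ordered_last_names(
        [("smith", "a"), ("Adams", "b"), ("brown", "c")]
    )
    print(names)
    print(plan)  # inspect this to see whether the index is being used
```

Printing the plan is the useful part: whatever engine you are on, check its equivalent of the query plan to confirm that the index on the expression is actually picked up for the sort.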

Related

Group by - Non-group-by expression in select clause

Sorry, you will have to bear with me as I am relatively new to SQL. I am querying an ODBC source from Excel. At the moment my dataset is massive, so I am looking to narrow it down by grouping it by company name and date. Not all of my columns are calculated fields; I have put Sum on the two I need added up. When I try to return the data, I get the error "Non-group-by expression in select clause".
Please can someone help me out?
SELECT
SopOrder_0.SooOrderNumber
, Company_0.CoaCompanyName
, InvoiceCreditItem_0.InvoiceCreditItemID
, InvoiceCreditItem_0.IciInvoiceApproved
, InvoiceCreditItem_0.InvoiceCreditID
, InvoiceCreditItem_0.CompanySiteID
, InvoiceCreditItem_0.VatID
, InvoiceCreditItem_0.NominalID
, InvoiceCreditItem_0.IciCreatedDate
, Sum(InvoiceCreditItem_0.IciTotalNettValue)
, Sum(InvoiceCreditItem_0.IciVatValue)
FROM
SBS.PUB.Company Company_0
, SBS.PUB.Customer Customer_0
, SBS.PUB.InvoiceCreditItem InvoiceCreditItem_0
, SBS.PUB.SopOrder SopOrder_0
WHERE
SopOrder_0.SopOrderID = InvoiceCreditItem_0.SopOrderID
AND InvoiceCreditItem_0.CompanyID = Customer_0.CompanyID
AND InvoiceCreditItem_0.CompanyID = Company_0.CompanyID
AND (Company_0.CoaCompanyName<>'ATOS')
AND InvoiceCreditItem_0.IciCreatedDate >= ?
GROUP BY
Company_0.CoaCompanyName, InvoiceCreditItem_0.IciCreatedDate

How to split a column into two columns based on the value in the another column

I have the data below in an MS SQL Server table.
I would like to get output like the following.
I have tried two queries, but neither of them helped.
The first query gives me NULL values.
Query
SELECT
[id]
, [sav]
, [cat]
, [tech]
, [asset]
, CASE
WHEN [objname] = 'FieldName'
THEN [stringvalue]
END AS [fieldname]
, CASE
WHEN [objname] = 'FieldValue'
THEN [stringvalue]
END AS [fieldvalue]
FROM [test].[dbo].[sample];
Output
The second query gives me 0 as the field value, because I have hard-coded it.
Query
SELECT
ROW_NUMBER() OVER(ORDER BY [fieldname]) AS 'id'
, [sav]
, [cat]
, [tech]
, [asset]
, [fieldname]
, 0 AS [fieldvalue]
FROM [test].[dbo].[sample] PIVOT(MAX([stringvalue]) FOR [objname] IN(
[fieldname])) [p]
WHERE [fieldname] IS NOT NULL;
Output
How can I achieve this?
You have a very arcane data structure. SQL tables are inherently unordered. From what I can tell, the field value is in the "next" row based on the id.
If so, you can use lead():
select . . .,
stringvalue as fieldname, next_string_value as fieldvalue
from (select t.*, lead(t.stringvalue) over (order by id) as next_string_value
from t
) t
where t.objname = 'FieldName';
If you are really using SQL Server 2008, lead() is not available (it was added in SQL Server 2012), but you can use a self-join instead. This does assume that the ids have no gaps in them.
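The self-join alternative could look roughly like the sketch below. It runs against Python's built-in sqlite3 rather than SQL Server (an assumption made for runnability), and the sample rows are invented, since the original data was not reproduced in the question. It relies on the same assumption as above: each FieldName row is immediately followed, by id, by its FieldValue row.

```python
import sqlite3


def pair_names_with_values(rows):
    """Pair each 'FieldName' row with the 'FieldValue' row at id + 1.

    rows is a list of (id, objname, stringvalue) tuples with gap-free ids.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sample ("
        "id INTEGER PRIMARY KEY, objname TEXT, stringvalue TEXT)"
    )
    conn.executemany("INSERT INTO sample VALUES (?, ?, ?)", rows)
    sql = """
        SELECT n.stringvalue AS fieldname,
               v.stringvalue AS fieldvalue
        FROM sample AS n
        JOIN sample AS v
          ON v.id = n.id + 1            -- the "next" row by id
        WHERE n.objname = 'FieldName'
          AND v.objname = 'FieldValue'
        ORDER BY n.id
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample_rows = [
        (1, "FieldName", "Color"),
        (2, "FieldValue", "Red"),
        (3, "FieldName", "Size"),
        (4, "FieldValue", "Large"),
    ]
    print(pair_names_with_values(sample_rows))
```

If the ids can have gaps, this join silently drops the affected pairs, which is why the lead() version is preferable where it is available.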

Oracle SQL UNION ALL with NULL causes performance issues when trying to outer join the data

I am trying to build a query that returns the customers from Oracle and their notes. Unfortunately, the notes tables from which I am selecting the data do not have any 1-1 joins with the customers so I am joining the data by using the party id and looking for a particular string in the notes that contain the customer contract number.
What I want to do is return the customer, the contract and the note information whether or not the notes exist.
I know the code below is lengthy, but I am particularly interested in how to handle the very last part of it (where I join with the notes info at the end). The issue in the current version of the query is that if I join the FORCE_NOTE_GUAR and FORCE_NOTE_CUST subqueries by adding the UNION ALL with nulls, the performance is very poor.
If I remove that UNION ALL the performance is good, however I only get the customers that do have the notes and I don't have the customers that do not have the notes.
I know that it is a long query and a long post so please ping me if I can give more info.
SELECT QUERY_MAIN.*
, FORCE_NOTE_CUST.NOTE_CREATION_DATE AS FORCE_ACCEPT_DATE_CUST
, FORCE_NOTE_GUAR.NOTE_CREATION_DATE AS FORCE_ACCEPT_DATE_GUAR
, FORCE_NOTE_CUST.ENTERED_BY_NAME AS USER_FORCE_ACCEPT_CUST
, FORCE_NOTE_GUAR.ENTERED_BY_NAME AS USER_FORCE_ACCEPT_GUAR
, FORCE_NOTE_CUST.NOTES AS NOTES_CUST
, FORCE_NOTE_GUAR.NOTES AS NOTES_GUAR
FROM (SELECT HP.PARTY_ID
, HCA_CUSTOMER.ACCOUNT_NUMBER AS ACCOUNT_NUMBER
, OKH.CONTRACT_NUMBER AS CONTRACT_NUMBER
, DECODE(OKP.ATTRIBUTE5, 'F', 'Y', 'N') AS CUSTOMER_FORCE
, DECODE(GUAR_FORCE.FORCE_FLAG, 'F', 'Y', 'N') AS GUARANTOR_FORCE
--------------------------------------------------------------------------
FROM ... customer tables) QUERY_MAIN
--------------------------------------------------------------------------------
, (SELECT* FROM(SELECT JII.PARTY_ID AS PARTY_ID
, TO_CHAR(DECODE( JIHA.ACTION, 'Converted'
, SUBSTR(JNV.NOTES_DETAIL,1,2000)
, NVL( JNV.NOTES
, SUBSTR( JNV.NOTES_DETAIL
, 1
, 2000)))) AS NOTES
, JNV.CREATION_DATE AS NOTE_CREATION_DATE
, NVL(PEP.FULL_NAME, FU_INT.USER_NAME) AS ENTERED_BY_NAME
----------------------------------------------------------------
FROM ... notes tables)
WHERE NOTES LIKE '%Guarantor acceptance manually progressed%'
UNION ALL
SELECT NULL AS PARTY_ID
, NULL AS NOTES
, NULL AS NOTE_CREATION_DATE
, NULL AS ENTERED_BY_NAME
FROM DUAL) FORCE_NOTE_GUAR
--------------------------------------------------------------------------------
, (SELECT* FROM(SELECT JII.PARTY_ID AS PARTY_ID
, TO_CHAR(DECODE( JIHA.ACTION, 'Converted'
, SUBSTR(JNV.NOTES_DETAIL,1,2000)
, NVL( JNV.NOTES
, SUBSTR( JNV.NOTES_DETAIL
, 1
, 2000)))) AS NOTES
, JNV.CREATION_DATE AS NOTE_CREATION_DATE
, NVL(PEP.FULL_NAME, FU_INT.USER_NAME) AS ENTERED_BY_NAME
----------------------------------------------------------------
FROM ... notes tables)
WHERE NOTES LIKE '%Customer acceptance manually progressed%'
UNION ALL
SELECT NULL AS PARTY_ID
, NULL AS NOTES
, NULL AS NOTE_CREATION_DATE
, NULL AS ENTERED_BY_NAME
FROM DUAL) FORCE_NOTE_CUST
--------------------------------------------------------------------------------
-- Outer logic to select the appropriate notes
WHERE 1 = 1
AND (( CUSTOMER_FORCE = 'N' AND FORCE_NOTE_CUST.PARTY_ID IS NULL)
--If CUSTOMER_FORCE = 'Y'
--If the customer has force accepted, we need to find the note
OR ( CUSTOMER_FORCE = 'Y'
AND QUERY_MAIN.PARTY_ID = FORCE_NOTE_CUST.PARTY_ID
AND INSTR(FORCE_NOTE_CUST.NOTES, CONTRACT_NUMBER) > 0))
AND (( GUARANTOR_FORCE = 'N' AND FORCE_NOTE_GUAR.PARTY_ID IS NULL)
--If GUARANTOR_FORCE = 'Y'
--If the guarantor has force accepted, we need to find the note
OR ( GUARANTOR_FORCE = 'Y'
AND QUERY_MAIN.PARTY_ID = FORCE_NOTE_GUAR.PARTY_ID
AND INSTR(FORCE_NOTE_GUAR.NOTES, CONTRACT_NUMBER) > 0));
Remove the unions with nulls and change your query to a left join version:
SELECT QUERY_MAIN.*,
FORCE_NOTE_CUST.NOTES,
FORCE_NOTE_GUAR.NOTES
FROM QUERY_MAIN
LEFT JOIN FORCE_NOTE_CUST on FORCE_NOTE_CUST.PARTY_ID = QUERY_MAIN.PARTY_ID
and FORCE_NOTE_CUST.NOTES like '%'||CONTRACT_NUMBER||'%'
LEFT JOIN FORCE_NOTE_GUAR on FORCE_NOTE_GUAR.PARTY_ID = QUERY_MAIN.PARTY_ID
and FORCE_NOTE_GUAR.NOTES like '%'||CONTRACT_NUMBER||'%'
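To illustrate why the left join keeps customers that have no notes, here is a small sketch using Python's built-in sqlite3 in place of Oracle (an assumption for runnability). The two tables are cut-down, hypothetical versions of QUERY_MAIN and FORCE_NOTE_CUST with invented rows; only the shape of the join is taken from the answer.

```python
import sqlite3


def contracts_with_optional_notes(contracts, notes):
    """Left-join contracts to notes that mention the contract number.

    contracts: list of (party_id, contract_number)
    notes:     list of (party_id, notes_text)
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE query_main (party_id INTEGER, contract_number TEXT)"
    )
    conn.execute("CREATE TABLE force_note_cust (party_id INTEGER, notes TEXT)")
    conn.executemany("INSERT INTO query_main VALUES (?, ?)", contracts)
    conn.executemany("INSERT INTO force_note_cust VALUES (?, ?)", notes)
    sql = """
        SELECT m.party_id, m.contract_number, c.notes
        FROM query_main AS m
        LEFT JOIN force_note_cust AS c
               ON c.party_id = m.party_id
              AND c.notes LIKE '%' || m.contract_number || '%'
        ORDER BY m.party_id
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    rows = contracts_with_optional_notes(
        contracts=[(1, "C100"), (2, "C200")],
        notes=[(1, "Customer acceptance manually progressed for C100")],
    )
    for row in rows:
        print(row)  # party 2 is still listed, with None for the note
```

Because the matching conditions sit in the ON clause rather than in WHERE, a contract without a matching note still produces one row with NULL note columns, which is what the UNION ALL with nulls was trying to emulate.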

SQL between clause returns rows for values outside bounds

As the title says, when I use the SQL BETWEEN clause against an address range from low to high and pass in a number that is outside that range, I still get rows returned. I would like to know why.
http://www.sqlfiddle.com/#!6/49467/2
Quick hit on the query, I am using an address number of 4929 and getting rows returned to me where the address range low and high numbers are 400 and 498 respectively.
Here is the query:
SELECT
ZipCodeLow ,
ZipCodeHigh ,
ZipExtensionLow ,
EndingEffectiveDate ,
BeginningEffectiveDate ,
AddressRangeLow ,
AddressRangeHigh ,
StreetName ,
City ,
ZipCode ,
Zip4,
Zip4High
FROM BoundTable
WHERE
('68503' BETWEEN ZipCodeLow AND ZipCodeHigh) AND
('4929' BETWEEN [AddressRangeLow] AND [AddressRangeHigh]) AND
([StreetName] = '32ND') AND
(GETDATE() BETWEEN [BeginningEffectiveDate] AND [EndingEffectiveDate])
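'4929' is a string literal, so if the AddressRangeLow and AddressRangeHigh columns are also character types, BETWEEN compares the values as strings, and '4929' sorts between '400' and '498'. Convert 4929 to numeric first and then run the query: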
SELECT
ZipCodeLow ,
ZipCodeHigh ,
ZipExtensionLow ,
EndingEffectiveDate ,
BeginningEffectiveDate ,
AddressRangeLow ,
AddressRangeHigh ,
StreetName ,
City ,
ZipCode ,
Zip4,
Zip4High
FROM BoundTable
WHERE
('68503' BETWEEN ZipCodeLow AND ZipCodeHigh) AND
(convert(numeric,'4929') BETWEEN [AddressRangeLow] AND [AddressRangeHigh]) AND
([StreetName] = '32ND') AND
(GETDATE() BETWEEN [BeginningEffectiveDate] AND [EndingEffectiveDate])
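The string-versus-number difference can be reproduced with a small sketch in Python's built-in sqlite3 (an assumption for runnability; the question is about SQL Server). The table bound and its TEXT columns are hypothetical and assume the range columns are stored as strings. Note that SQLite does not apply SQL Server's implicit conversion rules, so this sketch casts both the value and the columns explicitly.

```python
import sqlite3


def between_text_vs_number(address, low, high):
    """Evaluate BETWEEN once as text and once as integers.

    Returns (text_result, numeric_result) as booleans.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bound (lo TEXT, hi TEXT)")
    conn.execute("INSERT INTO bound VALUES (?, ?)", (low, high))
    # Text comparison: values are compared character by character.
    as_text = conn.execute(
        "SELECT ? BETWEEN lo AND hi FROM bound", (address,)
    ).fetchone()[0]
    # Numeric comparison: cast everything to INTEGER first.
    as_number = conn.execute(
        "SELECT CAST(? AS INTEGER) "
        "BETWEEN CAST(lo AS INTEGER) AND CAST(hi AS INTEGER) FROM bound",
        (address,),
    ).fetchone()[0]
    conn.close()
    return bool(as_text), bool(as_number)


if __name__ == "__main__":
    print(between_text_vs_number("4929", "400", "498"))
```

As text, '4929' falls between '400' and '498' because the comparison stops at the first differing character; as integers, 4929 is well above 498.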

Why I need Group by in this simple query?

UPDATE :
-----
the error might be in sum(si.amt_pd) from item table (as there is no relation) :
select SUM(si.amt_pd)amt_pd from [HMIS_REPORTING].HMIS_RPT_ME.dbo.item i
where
is there a work around?
----------
I am trying to run this query. The query just fetches the amount of a month based on some tables. It is just a part of a big query.
select s.sales_Contract_Nbr
, s.Sales_Id
, s.Sale_Dt
, YEAR(s.Sale_Dt) 'YEAR'
, MONTH(s.Sale_Dt) 'MONTH'
, s.Sales_Need_TYpe_Cd
, s.Sales_Status_Cd
, si.Posted
, s.location_Cd
, jan2011 = (
select SUM(si.amt_pd)amt_pd
from [HMIS_REPORTING].HMIS_RPT_ME.dbo.item i
where i.Item_Id = si.Product_Item_ID
and i.Item_Cd <> '*INT'
and convert(varchar(10),SI.Sales_Item_Dt,126) >= '2011-01-01'
and convert(varchar(10),SI.Sales_Item_Dt,126) <= '2011-01-31'
) INTO dbo.#a_acomparision
FROM [HMIS_REPORTING].HMIS_RPT_ME.dbo.Sales S
, [HMIS_REPORTING].HMIS_RPT_ME.dbo.Sales_Item SI
WHERE SI.Sales_Id = S.Sales_Id
and s.Sales_Contract_Nbr in (
select distinct (Sales_Contract_Nbr)
from mountainviewContracts
where Sales_Contract_Nbr <> '')
but I am getting the following error message.
Msg 8120, Level 16, State 1, Line 1
Column 'HMIS_REPORTING.HMIS_RPT_ME.dbo.Sales.Sales_Contract_Nbr' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I just can't understand why my query should have a group by for sales_contract_nbr and even if I put in the group by clause it tells me that inner query si.Product_item_id and SI.sales_item_dt should also be contained in group by clause.
Please help me out.
Thanks in advance
This is a very subtle problem. However, I think the subquery should be:
select SUM(i.amt_pd)amt_pd from [HMIS_REPORTING].HMIS_RPT_ME.dbo.item i
That is, the alias should be i not si.
What is happening is that the sum in the subquery is on a value in the outer query. So, the SQL compiler assumes an aggregation query. As soon as the first column is found that is not an aggregation, it complains with the message that you have.
By the way, you should use proper join syntax, so your FROM clause looks like:
FROM [HMIS_REPORTING].HMIS_RPT_ME.dbo.Sales S join
[HMIS_REPORTING].HMIS_RPT_ME.dbo.Sales_Item SI
on SI.Sales_Id = S.Sales_Id
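A quick way to convince yourself that the explicit JOIN returns the same rows as the comma-separated FROM with a WHERE condition is the sketch below. It uses Python's built-in sqlite3 (an assumption for runnability) with two cut-down, hypothetical tables and invented rows.

```python
import sqlite3


def compare_join_styles(sales, sales_items):
    """Run the same inner join in both syntaxes and return both results.

    sales:       list of (sales_id, contract)
    sales_items: list of (sales_id, amt_pd)
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (sales_id INTEGER, contract TEXT)")
    conn.execute("CREATE TABLE sales_item (sales_id INTEGER, amt_pd REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", sales)
    conn.executemany("INSERT INTO sales_item VALUES (?, ?)", sales_items)
    # Old style: join condition mixed into the WHERE clause.
    comma_sql = """
        SELECT s.contract, si.amt_pd
        FROM sales s, sales_item si
        WHERE si.sales_id = s.sales_id
        ORDER BY s.contract, si.amt_pd
    """
    # Explicit style: join condition lives in ON.
    join_sql = """
        SELECT s.contract, si.amt_pd
        FROM sales s
        JOIN sales_item si ON si.sales_id = s.sales_id
        ORDER BY s.contract, si.amt_pd
    """
    comma_rows = conn.execute(comma_sql).fetchall()
    join_rows = conn.execute(join_sql).fetchall()
    conn.close()
    return comma_rows, join_rows


if __name__ == "__main__":
    old, new = compare_join_styles(
        sales=[(1, "A"), (2, "B")],
        sales_items=[(1, 10.0), (1, 5.0), (3, 7.0)],
    )
    print(old == new, new)
```

The results are identical; the benefit of the explicit form is readability, since join conditions and filters are kept apart and a forgotten join condition is much harder to miss.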
As @Gordon Linoff says, this is almost certainly because the query optimizer is treating this like a SUM operation, normalizing away the subquery for "jan2011".
If the amt_pd column is present in the ITEM table, Gordon's solution is the right one.
If not, you have to add the group by statement, as below.
select s.sales_Contract_Nbr
, s.Sales_Id
, s.Sale_Dt
, YEAR(s.Sale_Dt) 'YEAR'
, MONTH(s.Sale_Dt) 'MONTH'
, s.Sales_Need_TYpe_Cd
, s.Sales_Status_Cd
, si.Posted
, s.location_Cd
, jan2011 = (
select SUM(si.amt_pd)amt_pd
from [HMIS_REPORTING].HMIS_RPT_ME.dbo.item i
where i.Item_Id = si.Product_Item_ID
and i.Item_Cd <> '*INT'
and convert(varchar(10),SI.Sales_Item_Dt,126) >= '2011-01-01'
and convert(varchar(10),SI.Sales_Item_Dt,126) <= '2011-01-31'
) INTO dbo.#a_acomparision
FROM [HMIS_REPORTING].HMIS_RPT_ME.dbo.Sales S
, [HMIS_REPORTING].HMIS_RPT_ME.dbo.Sales_Item SI
WHERE SI.Sales_Id = S.Sales_Id
and s.Sales_Contract_Nbr in (
select distinct (Sales_Contract_Nbr)
from mountainviewContracts
where Sales_Contract_Nbr <> '')
GROUP BY s.sales_Contract_Nbr
, s.Sales_Id
, s.Sale_Dt
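, YEAR(s.Sale_Dt)
, MONTH(s.Sale_Dt)
, s.Sales_Need_TYpe_Cd
, s.Sales_Status_Cd
, si.Posted
, s.location_Cd
-- outer columns referenced inside the subquery, per the error the asker reports
, si.Product_Item_ID
, si.Sales_Item_Dt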