select
<here I have functions like to_char, nvl, rtrim, ltrim, sum, decode>
from
table1
table2
where
joining conditions 1
joining conditions 2
group by
<here I have functions like to_char, nvl, rtrim, ltrim, sum, decode>
I got this query from production and need to propose a few ways to tune it. I am thinking of using function-based indexes for the GROUP BY columns; I think the SELECT columns need not be indexed. I will get access to the environment in a couple of days, but before that I need to come up with different approaches. What should I check to determine whether a function-based index would be useful? Also, apart from the explain plan, what other documents should I ask the DBAs for?
I am adding the actual SQL here. I have asked for the explain plan, which I will get soon:
SELECT
D_E_TRADE.DATE_VALUE,
to_char(D_E_TRADE.DATE_VALUE,'Mon-yyyy'),
NVL(P_DIM.P_NAME,' '),
rtrim(ltrim(P_DIM.C_CTRY)),
D_E_TRADE.YEAR,
L_E_DIM.L_CODE,
NVL(D_DIM.DESCR,' '),
( decode(D_DIM.DEPT_ID,'-1',' ',D_DIM.DEPT_ID) ),
sum(A_CGE.TOTAL_CALC_NET_FEES),
L_E_DIM.L_NAME,
decode(A_CGE.E_M_CENTER,-9,0,A_CGE.E_M_CENTER),
NVL(F_DIM.S_DESC,'-1'),
sum(A_CGE.C_TOTAL_SHARES)
FROM
DATE_D D_E_TRADE,
P_DIM,
L_E_DIM,
D_DIM,
A_CGE,
F_DIM
WHERE
( D_E_TRADE.DATE_KEY=A_CGE.T_KEY )
AND ( P_DIM.PARTY_KEY=A_CGE.E_P_KEY )
AND ( F_DIM.F_T_KEY=A_CGE.F_T_KEY )
AND ( L_E_DIM.L_E_KEY=A_CGE.L_E_KEY )
AND ( D_DIM.DEPT_KEY=A_CGE.DEPT_KEY )
AND
(
rtrim(ltrim(P_DIM.C_CTRY)) = 'AC'
AND
( A_CGE.T_KEY >= (SELECT DATE_D_PROMPTS.DATE_KEY FROM DATE_D DATE_D_PROMPTS WHERE ( DATE_D_PROMPTS.DATE_VALUE = '01-01-2012 00:00:00' ) )
AND
A_CGE.T_KEY <= (SELECT DATE_D_PROMPTS.DATE_KEY FROM DATE_D DATE_D_PROMPTS WHERE ( DATE_D_PROMPTS.DATE_VALUE = '31-08-2012 00:00:00' ))
AND
A_CGE.TRANS_REGION_KEY IN (SELECT REGION_KEY FROM REGION_DIM WHERE REGION_DIM.REGION_NAME IN ('Americas') ) )
AND
( A_CGE.T_KEY >= (SELECT DATE_D_PROMPTS.DATE_KEY FROM DATE_D DATE_D_PROMPTS WHERE ( DATE_D_PROMPTS.DATE_VALUE = '01-01-2012 00:00:00' ) )
AND
A_CGE.T_KEY <= (SELECT DATE_D_PROMPTS.DATE_KEY FROM DATE_D DATE_D_PROMPTS WHERE ( DATE_D_PROMPTS.DATE_VALUE = '31-08-2012 00:00:00' ))
AND
A_CGE.TRANS_REGION_KEY IN (SELECT REGION_KEY FROM REGION_DIM WHERE REGION_DIM.REGION_NAME IN ('Americas') ) )
AND
( 'All Fees' IN ('2 - E','3 - P','4 - F','5 - C,') OR A_CGE.F_T_KEY IN (SELECT F_T_KEY FROM F_DIM WHERE (F_DIM.s_id ) || ' - ' || ( F_DIM.CHARGE_LVL1_NAME ) IN ('2 - E','3 - P','4 - F','5 - C')) )
)
GROUP BY
D_E_TRADE.DATE_VALUE,
to_char(D_E_TRADE.DATE_VALUE,'Mon-yyyy'),
NVL(P_DIM.P_NAME,' '),
rtrim(ltrim(P_DIM.C_CTRY)),
D_E_TRADE.YEAR,
L_E_DIM.L_CODE,
NVL(D_DIM.DESCR,' '),
( decode(D_DIM.DEPT_ID,'-1',' ',D_DIM.DEPT_ID) ),
L_E_DIM.L_NAME,
decode(A_CGE.E_M_CENTER,-9,0,A_CGE.E_M_CENTER),
NVL(F_DIM.S_DESC,'-1')
Generally, indexes help with fast retrieval of data when you have filtering conditions which can use them.
(Another case would be when you retrieve only columns that are in the index, so the engine does not need to read anything from the table.)
In your case, you may need indexes on the filtering/join conditions in the following part:
joining conditions 1
joining conditions 2
But keep in mind: if the query retrieves more than about 15%-20% of a table's rows, it is usually better to read from the table directly than to go through the index. In that case the optimizer may not use the index at all.
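One way to check whether a function-based index would be picked up at all is to compare the access path before and after creating it. The following is a minimal sketch, not the production setup: it uses Python's built-in sqlite3 as a stand-in because the Oracle environment is not available yet, and the table contents and the index name p_dim_ctry_fx are made up for illustration. SQLite's planner is not Oracle's, so this only illustrates the principle that the indexed expression has to match the expression in the WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE p_dim (party_key INTEGER PRIMARY KEY, c_ctry TEXT)")
# Hypothetical data: 2% of the rows equal 'AC' once the padding is trimmed.
conn.executemany(
    "INSERT INTO p_dim VALUES (?, ?)",
    [(i, " AC " if i % 50 == 0 else " ZZ ") for i in range(1000)],
)

QUERY = "SELECT party_key FROM p_dim WHERE rtrim(ltrim(c_ctry)) = 'AC'"


def plan(sql):
    """Return the access path SQLite chooses for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan text


plan_before = plan(QUERY)  # no usable index exists yet
# The indexed expression is spelled exactly like the one in the filter.
conn.execute("CREATE INDEX p_dim_ctry_fx ON p_dim (rtrim(ltrim(c_ctry)))")
plan_after = plan(QUERY)

print(plan_before)
print(plan_after)
matches = conn.execute(QUERY).fetchall()
```

On the real system the same comparison would be done with the explain plan you have requested, and the selectivity of the filter (how many rows satisfy it) decides whether the index is worth having.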
I am attempting to create an IF statement in BigQuery. I have built a concept that will work but it does not select the data from a table, I can only get it to display 1 or 0
Example:
SELECT --AS STRUCT
CASE
WHEN (
Select Count(1) FROM ( -- If the records are the same, then return = 0, if the records are not the same then > 1
Select Distinct ESCO, SOURCE, LDCTEXT, STATUS,DDR_DATE, TempF, HeatingDegreeDays, DecaTherms
from `gas-ddr.gas_ddr_outbound.LexingtonDDRsOutbound_onchange_Prior_Filtered`
Except Distinct
Select Distinct ESCO, SOURCE, LDCTEXT, STATUS,DDR_DATE, TempF, HeatingDegreeDays, DecaTherms
from `gas-ddr.gas_ddr_outbound.LexingtonDDRsOutbound_onchange_Latest_Filtered`
)
)= 0
THEN
(Select * from `gas-ddr.gas_ddr_outbound.LexingtonDDRsOutbound_onchange_Latest`) -- This does not work: Scalar subquery cannot have more than one column unless using SELECT AS STRUCT to build STRUCT values at [16:4]
END
SELECT --AS STRUCT
CASE
WHEN (
Select Count(1) FROM ( -- If the records are the same, then return = 0, if the records are not the same then > 1
Select Distinct ESCO, SOURCE, LDCTEXT, STATUS,DDR_DATE, TempF, HeatingDegreeDays, DecaTherms
from `gas-ddr.gas_ddr_outbound.LexingtonDDRsOutbound_onchange_Prior_Filtered`
Except Distinct
Select Distinct ESCO, SOURCE, LDCTEXT, STATUS,DDR_DATE, TempF, HeatingDegreeDays, DecaTherms
from `gas-ddr.gas_ddr_outbound.LexingtonDDRsOutbound_onchange_Latest_Filtered`
)
)= 0
THEN 1 --- This does work
Else
0
END
How can I Get this query to return results from an existing table?
Your question is still a little generic, so my answer is generic as well; it mimics your use case to the extent I can reverse engineer it from your comments.
So, in the code below, project.dataset.yourtable mimics your table, whereas
project.dataset.yourtable_Prior_Filtered and project.dataset.yourtable_Latest_Filtered mimic your respective views
#standardSQL
WITH `project.dataset.yourtable` AS (
SELECT 'aaa' cols, 'prior' filter UNION ALL
SELECT 'bbb' cols, 'latest' filter
), `project.dataset.yourtable_Prior_Filtered` AS (
SELECT cols FROM `project.dataset.yourtable` WHERE filter = 'prior'
), `project.dataset.yourtable_Latest_Filtered` AS (
SELECT cols FROM `project.dataset.yourtable` WHERE filter = 'latest'
), check AS (
SELECT COUNT(1) > 0 changed FROM (
SELECT DISTINCT cols FROM `project.dataset.yourtable_Latest_Filtered`
EXCEPT DISTINCT
SELECT DISTINCT cols FROM `project.dataset.yourtable_Prior_Filtered`
)
)
SELECT t.* FROM `project.dataset.yourtable` t
CROSS JOIN check WHERE check.changed
The result is
Row cols filter
1 aaa prior
2 bbb latest
If you change your table to
WITH `project.dataset.yourtable` AS (
SELECT 'aaa' cols, 'prior' filter UNION ALL
SELECT 'aaa' cols, 'latest' filter
) ......
The result will be
Row cols filter
Query returned zero records.
I hope this points you in the right direction.
Added more explanations:
I may be wrong, but from your question it looks like you have one table, project.dataset.yourtable, and two views, project.dataset.yourtable_Prior_Filtered and project.dataset.yourtable_Latest_Filtered, which present the state of your table before and after some event.
So the first three CTEs in the answer above just mimic the table and views you described in your question.
They are there so you can see the concept and play with it without any extra work before adjusting it to your real use case.
For your real use case you should omit them and use your real table and view names and whatever columns they have.
So the query for you to play with is:
#standardSQL
WITH check AS (
SELECT COUNT(1) > 0 changed FROM (
SELECT DISTINCT cols FROM `project.dataset.yourtable_Latest_Filtered`
EXCEPT DISTINCT
SELECT DISTINCT cols FROM `project.dataset.yourtable_Prior_Filtered`
)
)
SELECT t.* FROM `project.dataset.yourtable` t
CROSS JOIN check WHERE check.changed
It should be a very simple IF statement in any language.
Unfortunately, no. It cannot be done with just a simple IF. If you see fit, you can submit a feature request to the BigQuery team for whatever you think makes sense.
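The "compute a one-row flag, then CROSS JOIN it and filter on it" pattern is not specific to BigQuery. Below is a rough sketch that runs the same idea on Python's built-in sqlite3 so it can be tried locally; the helper name rows_if_changed, the CTE name chk and the column name state are made up for this illustration, and SQLite's plain EXCEPT stands in for EXCEPT DISTINCT.

```python
import sqlite3


def rows_if_changed(prior_rows, latest_rows):
    """Return every row of the base table, but only if latest has rows prior lacks."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourtable (cols TEXT, state TEXT)")
    conn.executemany("INSERT INTO yourtable VALUES (?, 'prior')", [(c,) for c in prior_rows])
    conn.executemany("INSERT INTO yourtable VALUES (?, 'latest')", [(c,) for c in latest_rows])
    sql = """
        WITH chk AS (
            -- One row, one column: 1 if anything differs, 0 otherwise.
            SELECT COUNT(*) > 0 AS changed FROM (
                SELECT cols FROM yourtable WHERE state = 'latest'
                EXCEPT
                SELECT cols FROM yourtable WHERE state = 'prior'
            )
        )
        SELECT t.cols, t.state
        FROM yourtable AS t CROSS JOIN chk
        WHERE chk.changed          -- the flag switches the whole result on or off
        ORDER BY t.state DESC, t.cols
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


print(rows_if_changed(["aaa"], ["bbb"]))
print(rows_if_changed(["aaa"], ["aaa"]))
```

Note that EXCEPT is one-directional: rows that exist only in the prior view are not detected by this check. If that matters, run the comparison both ways and combine the two flags.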
I have an SQL query in which I need to take the output of a subquery and use it more than once. My existing query works, but only if I repeat the subquery each time I need it. Unfortunately the subquery is complex, and takes time to execute - meaning that multiple iterations really slow the whole thing down.
I have read that you can use the "WITH" statement to assign a subquery output to a variable, in order to re-use that variable. However the problem I'm having is that within the subquery, I need to reference values from the main query. And it appears that if I use WITH - before the main query SELECT - then those references are not recognised. I'll give you a simplified example:
WITH
DateX AS
(
SELECT
MAX(TableSub.Date)
FROM
TableA TableSub
WHERE
TableSub.ID = TableMain.ID
AND TableSub.Event = 'AnotherEvent'
AND TableSub.Date BETWEEN '01-Jan-2015' AND '31-Dec-2015'
)
SELECT
TableMain.ID
FROM
TableA TableMain
WHERE
TableMain.Event = 'MainEvent'
AND TableMain.Date >= DateX
AND (
SELECT
TableSub2.ID
FROM
TableA TableSub2
WHERE
TableSub2.ID = TableMain.ID
AND TableSub2.Event = 'ThirdEvent'
AND TableSub2.Date <= DateX
) IS NULL
I hope this is clear. It's a simplified version of what I have, but you can see that DateX is used in more than one place: within the main query, and within a subquery. However the problem is that when DateX is defined by WITH, I need to link the ID back to the ID of the main query. And it's not working...
I would be grateful for any advice on this. Am I doing it wrong? Is there a way, or is it just impossible? If so, then should I be using another approach entirely? Thanks.
A better way:
SELECT ID
FROM (
SELECT ID,
"Date",
Event,
LAST_VALUE( CASE Event WHEN 'AnotherEvent' THEN "Date" END IGNORE NULLS )
OVER ( PARTITION BY ID ORDER BY "Date"
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS another_date,
FIRST_VALUE( CASE Event WHEN 'ThirdEvent' THEN "Date" END IGNORE NULLS )
OVER ( PARTITION BY ID ORDER BY "Date"
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS third_date
FROM TableA
WHERE Event IN ( 'MainEvent', 'ThirdEvent' )
OR ( Event = 'AnotherEvent' AND EXTRACT( YEAR FROM "Date" ) = 2015 )
)
WHERE Event = 'MainEvent'
AND "Date" >= another_date
AND ( third_date IS NULL OR third_date > another_date );
You need to join your DateX CTE on the ID column. Something like:
WITH
DateX AS
(
SELECT
TableSub.ID,
MAX(TableSub.Date) AS MaxDate
FROM
TableA TableSub
WHERE
TableSub.Event = 'AnotherEvent'
AND TableSub.Date >= DATE '2015-01-01'
AND TableSub.Date < DATE '2016-01-01'
GROUP BY
TableSub.ID
)
SELECT
TableMain.ID
FROM
TableA TableMain
JOIN
DateX
ON
DateX.ID = TableMain.ID
WHERE
TableMain.Event = 'MainEvent'
AND TableMain.Date >= DateX.MaxDate
AND (
SELECT
TableSub2.ID
FROM
TableA TableSub2
JOIN
DateX
ON
DateX.ID = TableSub2.ID
WHERE
TableSub2.ID = TableMain.ID
AND TableSub2.Event = 'ThirdEvent'
AND TableSub2.Date <= DateX.MaxDate
) IS NULL
The CTE also needs a column alias for the aggregate; and as you need to join in the ID, you need to include that and group by it.
The last subquery looks odd; you might want NOT EXISTS rather than IS NULL if you're looking for no record. Perhaps your real query is using an aggregate, but even so that might be quicker.
This still may not be the best approach but it's hard to tell from your example. Hitting the same table three times may be unnecessarily expensive.
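To make the "group the CTE by ID, join it, and use NOT EXISTS" idea concrete, here is a small sketch run on Python's built-in sqlite3 rather than Oracle. The sample rows are invented, the column is called EventDate instead of Date to avoid any keyword clash, and dates are stored as ISO strings so that string comparison orders them correctly; none of that comes from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (ID INTEGER, Event TEXT, EventDate TEXT)")
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?, ?)",
    [
        (1, "AnotherEvent", "2015-03-01"), (1, "MainEvent", "2015-06-01"),
        (2, "AnotherEvent", "2015-03-01"), (2, "MainEvent", "2015-06-01"),
        (2, "ThirdEvent", "2015-02-01"),   # on or before MaxDate: excludes ID 2
        (3, "AnotherEvent", "2015-03-01"), (3, "MainEvent", "2015-01-15"),  # too early
        (4, "MainEvent", "2015-06-01"),    # no AnotherEvent: dropped by the join
        (5, "AnotherEvent", "2015-03-01"), (5, "MainEvent", "2015-06-01"),
        (5, "ThirdEvent", "2015-09-01"),   # after MaxDate: ID 5 is kept
    ],
)

sql = """
    WITH DateX AS (
        -- One row per ID, computed once and reused below.
        SELECT ID, MAX(EventDate) AS MaxDate
        FROM TableA
        WHERE Event = 'AnotherEvent'
          AND EventDate >= '2015-01-01' AND EventDate < '2016-01-01'
        GROUP BY ID
    )
    SELECT m.ID
    FROM TableA AS m
    JOIN DateX AS d ON d.ID = m.ID
    WHERE m.Event = 'MainEvent'
      AND m.EventDate >= d.MaxDate
      AND NOT EXISTS (
            SELECT 1 FROM TableA AS t
            WHERE t.ID = m.ID
              AND t.Event = 'ThirdEvent'
              AND t.EventDate <= d.MaxDate
      )
    ORDER BY m.ID
"""
ids = [row[0] for row in conn.execute(sql)]
print(ids)
```

Because the joined DateX row is already in scope, the NOT EXISTS subquery can refer to d.MaxDate directly and does not need to join the CTE a second time.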
How can I make this SQL query more efficient? The CteFinal code shown below is a portion of my query which adds up to 6 minutes to its run time. The cteMonth is shown below. The cteDetail is another CTE which pulls information directly from the database, and it takes less than a second to run.
What CteFinal is doing is creating missing fiscal period rows while including some of the column data from the row where f.FiscalPeriod=0.
I cannot add, delete, or change any of the indexes on the tables, as this is an ERP database and I'm not allowed to make those types of changes.
CteFinal:
SELECT Account,Month, CONVERT(DATETIME, CAST(@Year as varchar(4)) + '-' + CAST(Month as VARCHAR(2)) + '-' + '01', 102) JEDate
,accountdesc,'' Description,'' JournalCode,NULL JournalNum,NULL JournalLine
,'' LegalNumber,'' CurrencyCode,0.00 DebitAmount,0.00 CreditAmount,fiscalcalendarid,company,bookid,SegValue2,SegValue1,SegValue3,SegValue4
FROM cteDetail f
CROSS JOIN cteMonths m
WHERE f.FiscalPeriod=0 and not exists(select * from cteDetailADDCreatedZero x where x.Account=f.Account and x.FiscalPeriod=Month)
CteMonth:
cteMonths (Month) AS(
select 0 as Month
UNION select 1 as Month
UNION select 2 as Month
UNION select 3 as Month
UNION select 4 as Month
UNION select 5 as Month
UNION select 6 as Month
UNION select 7 as Month
UNION select 8 as Month
UNION select 9 as Month
UNION select 10 as Month
UNION select 11 as Month
UNION select 12 as Month)
Thank you!
Here's a slightly more efficient way to generate the 12 months of a given year (even more efficient if you have your own Numbers table):
DECLARE @year INT = 2013;
;WITH cteMonths([Month],AsDate) AS
(
SELECT n-1,DATEADD(YEAR, @Year-1900, DATEADD(MONTH,n-1,0)) FROM (
SELECT TOP (13) RANK() OVER (ORDER BY [object_id]) FROM sys.all_objects
) AS c(n)
)
SELECT [Month], AsDate FROM cteMonths;
So now, you can say:
;WITH cteMonths([Month],AsDate) AS
(
SELECT n,DATEADD(YEAR, @Year-1900, DATEADD(MONTH,n-1,0)) FROM (
SELECT TOP (13) RANK() OVER (ORDER BY [object_id]) FROM sys.all_objects
) AS c(n)
),
cteDetail AS
(
...no idea what is here...
),
cteDetailADDCreatedZero AS
(
...no idea what is here...
)
SELECT f.Account, m.[Month], JEDate = m.AsDate, f.accountdesc, Description = '',
JournalCode = '', JournalNum = NULL, JournalLine = NULL, LegalNumber = '',
CurrencyCode = '', DebitAmount = 0.00, CreditAmount = 0.00, f.fiscalcalendarid,
f.company, f.bookid, f.SegValue2, f.SegValue1, f.SegValue3, f.SegValue4
FROM cteMonths AS m
LEFT OUTER JOIN cteDetail AS f
ON ... some clause I am not clear on ...
WHERE f.FiscalPeriod = 0
AND NOT EXISTS
(
SELECT 1 FROM cteDetailADDCreatedZero AS x
WHERE x.Account = f.Account
AND x.FiscalPeriod = m.[Month]
);
I suspect this won't solve your problem though: it is likely that this is forcing an entire table scan on either whatever tables are mentioned in cteDetail or cteDetailADDCreatedZero or both. You should inspect the actual execution plan for this query and see if there are any scans or other expensive operations that could guide you towards better indexing. It also might just be that you have a bunch of inefficient CTEs stacked up together - we can't really help with that unless you show everything. CTEs are like views - if you start stacking them up on top of each other, you really limit the optimizer's ability to generate an efficient plan for you. At some point it will just throw its hands in the air...
One possibility is to physicalize (materialize) the SQL view, if the query is a view. Views with complex queries are sometimes slow.
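For reference, the gap-filling logic itself (a generated list of periods 0 to 12, cross joined to the period-0 rows, minus the periods that already exist) can be tried in isolation. This is only a sketch on Python's built-in sqlite3 with an invented detail table; it uses a recursive CTE to generate the periods, and it checks for existing periods in the same table, whereas the question uses a separate cteDetailADDCreatedZero. It shows the shape of the query and says nothing about where the 6 minutes go on the real system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE detail (account TEXT, fiscal_period INTEGER)")
# Hypothetical data: account A has periods 0, 1, 3; account B has 0 and 12.
conn.executemany(
    "INSERT INTO detail VALUES (?, ?)",
    [("A", 0), ("A", 1), ("A", 3), ("B", 0), ("B", 12)],
)

sql = """
    WITH RECURSIVE months(month) AS (
        SELECT 0
        UNION ALL
        SELECT month + 1 FROM months WHERE month < 12
    )
    SELECT f.account, m.month
    FROM detail AS f
    CROSS JOIN months AS m
    WHERE f.fiscal_period = 0          -- one template row per account
      AND NOT EXISTS (                 -- keep only periods that are missing
            SELECT 1 FROM detail AS x
            WHERE x.account = f.account
              AND x.fiscal_period = m.month
      )
    ORDER BY f.account, m.month
"""
missing = conn.execute(sql).fetchall()
print(len(missing), missing[:3])
```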
I need to create an aggregate function in Advantage-Database to calculate the median value.
SELECT
group_field
, MEDIAN(value_field)
FROM
table_name
GROUP BY
group_field
Seems the solutions I am finding are quite specific to the sql engine used.
There is no built-in median aggregate function in ADS as you can see in the help file:
http://devzone.advantagedatabase.com/dz/webhelp/Advantage10.1/index.html
I'm afraid that you have to write your own stored procedure or sql script to solve this problem.
The accepted answer to the following question might be a solution for you:
Simple way to calculate median with MySQL
I've updated this answer with a solution that avoids the join in favor of storing some data in a json object.
SOLUTION #1 (two selects and a join, one to get counts, one to get rankings)
This is a little lengthy, but it does work, and it's reasonably fast.
SELECT x.group_field,
avg(
if(
x.rank - y.vol/2 BETWEEN 0 AND 1,
value_field,
null
)
) as median
FROM (
SELECT group_field, value_field,
@r:= IF(@current=group_field, @r+1, 1) as rank,
@current:=group_field
FROM (
SELECT group_field, value_field
FROM table_name
ORDER BY group_field, value_field
) z, (SELECT @r:=0, @current:='') v
) x, (
SELECT group_field, count(*) as vol
FROM table_name
GROUP BY group_field
) y WHERE x.group_field = y.group_field
GROUP BY x.group_field
SOLUTION #2 (uses a json object to store the counts and avoids the join)
SELECT group_field,
avg(
if(
rank - json_extract(@vols, path)/2 BETWEEN 0 AND 1,
value_field,
null
)
) as median
FROM (
SELECT group_field, value_field, path,
@rnk := if(@curr = group_field, @rnk+1, 1) as rank,
@vols := json_set(
@vols,
path,
coalesce(json_extract(@vols, path), 0) + 1
) as vols,
@curr := group_field
FROM (
SELECT p.group_field, p.value_field, concat('$.', p.group_field) as path
FROM table_name p
JOIN (SELECT @curr:='', @rnk:=1, @vols:=json_object()) v
ORDER BY group_field, value_field DESC
) z
) y GROUP BY group_field;
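On engines that offer window functions, the same "compare each row's rank with the group's count" idea can be written without user variables. The sketch below is an assumption-laden illustration, not an ADS solution: it runs on Python's built-in sqlite3 (window functions need SQLite 3.25 or newer), the sample values are invented, and whether Advantage Database supports these functions would have to be checked in its help file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (group_field TEXT, value_field INTEGER)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [("a", 10), ("a", 1), ("a", 3),              # odd count: median is 3
     ("b", 100), ("b", 2), ("b", 6), ("b", 4)],  # even count: median is (4 + 6) / 2
)

sql = """
    SELECT group_field, AVG(value_field) AS median
    FROM (
        SELECT group_field, value_field,
               ROW_NUMBER() OVER (PARTITION BY group_field
                                  ORDER BY value_field) AS rn,
               COUNT(*) OVER (PARTITION BY group_field) AS cnt
        FROM table_name
    )
    -- Integer division picks the middle row (odd cnt) or the two middle rows (even cnt).
    WHERE rn IN ((cnt + 1) / 2, (cnt + 2) / 2)
    GROUP BY group_field
    ORDER BY group_field
"""
medians = conn.execute(sql).fetchall()
print(medians)
```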
Pardon the convoluted example, but I believe there is something fundamental about sql I am missing and I'm not sure what it is. I have this crazy query...
SELECT *
FROM (
SELECT *
FROM (
SELECT @t1 := @t1 +1 AS leaderboard_entry_youngness_rank, 1 - @t1 /100 AS
leaderboard_entry_youngness_based_on_expiry, leaderboard_entry . * ,
NOW( ) - leaderboard_entry_timestamp AS leaderboard_entry_age_in_some_units,
TO_DAYS( NOW( ) ) - TO_DAYS( leaderboard_entry_timestamp )
AS leaderboard_entry_age_in_days
FROM leaderboard_entry) AS inner_temp
NATURAL JOIN leaderboard
NATURAL JOIN user
WHERE (
leaderboard_load_key = 'sk-en-adjectives-1'
OR leaderboard_load_key = '-sk-en-adjectives-1'
)
AND leaderboard_quiz_mode = '0'
ORDER BY leaderboard_entry_age_in_some_units ASC , leaderboard_entry_timestamp ASC
LIMIT 0 , 100
) AS outer_temp
ORDER BY leaderboard_entry_elapsed_time_ms ASC , leaderboard_entry_timestamp ASC
LIMIT 0 , 50
I added the second nested SELECT statement because the user_name in the user table was not being returned in the outermost query. But now the leaderboard_entry_youngness_based_on_expiry field, which is being generated based on a row index ratio, is not working correctly.
If I remove the second nested SELECT statement, the leaderboard_entry_youngness_based_on_expiry works as expected, but the user_name column is not returned.
How can I satisfy both? Why is this happening?
Thanks!
This stems from the following question:
Add a numbered list column to a returned MySQL query
Your inner SELECT statement does not include user.user_name; that is why user_name is not returned. Remove the outer query and do it as before, but with user.user_name added, like this:
....
SELECT @t1 := @t1 +1 AS leaderboard_entry_youngness_rank, 1 - @t1 /100 AS
leaderboard_entry_youngness_based_on_expiry, leaderboard_entry . * ,
NOW( ) - leaderboard_entry_timestamp AS leaderboard_entry_age_in_some_units,
TO_DAYS( NOW( ) ) - TO_DAYS( leaderboard_entry_timestamp )
AS leaderboard_entry_age_in_days, user.user_name
....
Try putting an ORDER BY in the innermost query. Since there currently is no ORDER BY clause there, the row order is undefined, so it is wrong to say that it "is not working correctly".
If you take away the outer SELECT * FROM..., check whether there are duplicate user_name columns.
By the way, since you are not using the row index columns in your query, why not just put this logic in the application itself? It will be more reliable.
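As a rough idea of what moving the numbering into the application could look like, here is a sketch in Python. The function name add_youngness and the dictionary keys are made up for illustration, and it assumes the rows have already been fetched with the joins and filters applied; the formula 1 - rank / 100 mirrors the one in the query.

```python
from datetime import datetime


def add_youngness(entries, window=100):
    """Number entries from newest to oldest and derive the youngness ratio."""
    newest_first = sorted(entries, key=lambda e: e["timestamp"], reverse=True)
    ranked = []
    for rank, entry in enumerate(newest_first, start=1):
        ranked.append({**entry, "youngness_rank": rank, "youngness": 1 - rank / window})
    return ranked


# Hypothetical rows, as they might come back from the database.
entries = [
    {"user_name": "ann", "elapsed_ms": 900, "timestamp": datetime(2012, 1, 3)},
    {"user_name": "bob", "elapsed_ms": 700, "timestamp": datetime(2012, 1, 1)},
    {"user_name": "cy", "elapsed_ms": 800, "timestamp": datetime(2012, 1, 2)},
]

ranked = add_youngness(entries)
# Second step of the original query: best times first, keep at most 50.
top = sorted(ranked, key=lambda e: (e["elapsed_ms"], e["timestamp"]))[:50]
print([(e["user_name"], e["youngness_rank"]) for e in top])
```

Here the rank is assigned after an explicit sort, so it does not depend on the order in which the database happens to evaluate a user variable.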