Query optimization with 3,000,000 rows for a single date in Oracle - sql

Table x contains millions of rows, and I have to fetch the data for a single date using a function-based index (trunc).
A single date holds a lot of data; e.g., for 22-07-16 we have 3000000 rows. I am also using CASE to sum over columns. The query takes 18 seconds. How can I reduce the time?
EDIT
QUERY:
SELECT SUM(
CASE
WHEN cssgoldenc1_.impact='Low'
THEN 1
ELSE 0
END) AS col_0_0_,
SUM(
CASE
WHEN cssgoldenc1_.impact='High'
THEN 1
ELSE 0
END) AS col_1_0_
FROM CSSCOMPLIANCEDETAIL csscomplia0_,
CSSGoldenConfiguration cssgoldenc1_,
CSS css7_
WHERE csscomplia0_.cssGoldenConfigurationID_FK=cssgoldenc1_.CSSGoldenConfigurationId_PK
AND csscomplia0_.cssID_FK =css7_.cssId_PK
AND (cssgoldenc1_.cmcategory IN ('Access List','Application of QoS Policy','Archive','BFD','BGP', 'CPU','Clock','Debug','Default settings','Entity Check','IGP Routing','Inclusion in VRF', 'Interface Parameters','LDP','LDP Establishment','License','Logging/Syslog/Debug','MTU Size', 'Multicast','Multilink','NodeReadiness','Nomenclature Related','Performance Optimization', 'QoS','Router OAM','Routing','SNMP','Security','Services','System Recovery', 'Type of Interface','Unicast','Unrequired Services','mBGP'))
AND TRUNC(csscomplia0_.creationDate) =to_Date('22-07-16','dd-mm-yy')
AND (css7_.softwareVersion IN ('/asr920-universalk9.V155_1S2_SR635680903_6.bin', '/asr920-universalk9_npe.03.13.00.S.154-3.S-ext.bin','/asr920-universalk9_npe.03.14.02.S.155-1.S2-std.bin', '/asr920-universalk9_npe.V155_1_S2_SR635680903_2.bin','/asr920-universalk9_npe.V155_1_S2_SR635680903_6.bin', '/bootflash','asr901-universalk9-mz.155-3.S1a.bin','asr903rsp1-universalk9_npe.V155_1_S2_SR635680903_10.bin', 'asr920-universalk9.V155_1S2_SR635680903_6.bin','asr920-universalk9_npe.03.13.00.S.154-3.S-ext.bin', 'asr920-universalk9_npe.03.13.00z.S.154-3.S0z-ext.bin','asr920-universalk9_npe.03.14.02.S.155-1.S2-', 'asr920-universalk9_npe.03.14.02.S.155-1.S2-std.bin','asr920-universalk9_npe.03.15.01.S.155-2.S1-std.bin', 'asr920-universalk9_npe.03.16.01a.S.155-3.S1a-ext.bin','asr920-universalk9_npe.2016-05-10_07.53_saappuku.bin' ,'asr920-universalk9_npe.V155_1_S2_SR635680903_2.bin','asr920-universalk9_npe.V155_1_S2_SR635680903_6.bin',
'asr920-universalk9_npe.V155_1_S2_SR635680903_6.binn','bootflash'));
Index :
create index idx_fnc on CSSCOMPLIANCEDETAIL(trunc(creationDate));

Try this. Basically, I moved the CASE into a subquery so that it shouldn't be evaluated 3M times. I also changed the query to use JOIN syntax:
with cssgoldenc1_ as
(select /*+ Materialize */ CASE WHEN impact='Low' THEN 1
ELSE 0 END AS col_0_0_,
CASE WHEN impact='High' THEN 1
ELSE 0 END AS col_1_0_,
CSSGoldenConfigurationId_PK
from CSSGoldenConfiguration
where cmcategory IN ('Access List','Application of QoS Policy','Archive','BFD','BGP', 'CPU','Clock','Debug','Default settings','Entity Check','IGP Routing','Inclusion in VRF', 'Interface Parameters','LDP','LDP Establishment','License','Logging/Syslog/Debug','MTU Size', 'Multicast','Multilink','NodeReadiness','Nomenclature Related','Performance Optimization', 'QoS','Router OAM','Routing','SNMP','Security','Services','System Recovery', 'Type of Interface','Unicast','Unrequired Services','mBGP')
)
SELECT SUM(col_0_0_) AS col_0_0_,
SUM(col_1_0_) AS col_1_0_
FROM CSSCOMPLIANCEDETAIL csscomplia0_ join cssgoldenc1_ on csscomplia0_.cssGoldenConfigurationID_FK = cssgoldenc1_.CSSGoldenConfigurationId_PK
join CSS css7_ on csscomplia0_.cssID_FK = css7_.cssId_PK
WHERE TRUNC(csscomplia0_.creationDate) =to_Date('22-07-16','dd-mm-yy')
AND css7_.softwareVersion IN ('/asr920-universalk9.V155_1S2_SR635680903_6.bin', '/asr920-universalk9_npe.03.13.00.S.154-3.S-ext.bin','/asr920-universalk9_npe.03.14.02.S.155-1.S2-std.bin', '/asr920-universalk9_npe.V155_1_S2_SR635680903_2.bin','/asr920-universalk9_npe.V155_1_S2_SR635680903_6.bin', '/bootflash','asr901-universalk9-mz.155-3.S1a.bin','asr903rsp1-universalk9_npe.V155_1_S2_SR635680903_10.bin', 'asr920-universalk9.V155_1S2_SR635680903_6.bin','asr920-universalk9_npe.03.13.00.S.154-3.S-ext.bin', 'asr920-universalk9_npe.03.13.00z.S.154-3.S0z-ext.bin','asr920-universalk9_npe.03.14.02.S.155-1.S2-', 'asr920-universalk9_npe.03.14.02.S.155-1.S2-std.bin','asr920-universalk9_npe.03.15.01.S.155-2.S1-std.bin', 'asr920-universalk9_npe.03.16.01a.S.155-3.S1a-ext.bin','asr920-universalk9_npe.2016-05-10_07.53_saappuku.bin' ,'asr920-universalk9_npe.V155_1_S2_SR635680903_2.bin','asr920-universalk9_npe.V155_1_S2_SR635680903_6.bin',
'asr920-universalk9_npe.V155_1_S2_SR635680903_6.binn','bootflash');
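To illustrate why the rewrite above returns the same totals, here is a minimal Python sketch. The ids and impact values are made up for illustration and are not taken from the real tables. It compares evaluating the CASE once per joined fact row with computing the 0/1 flags once per configuration row and summing them across the join.

```python
# Hypothetical miniature data: configuration id -> impact, and the
# configuration ids referenced by the fact rows (one entry per fact row).
config_impact = {1: "Low", 2: "High", 3: "Medium"}
fact_config_ids = [1, 1, 2, 3, 2, 1]

# Original shape: the CASE is evaluated once per joined fact row.
low_per_row = sum(1 if config_impact[c] == "Low" else 0 for c in fact_config_ids)
high_per_row = sum(1 if config_impact[c] == "High" else 0 for c in fact_config_ids)

# Rewritten shape: flags are computed once per configuration row,
# then simply summed across the join.
flags = {cid: (int(imp == "Low"), int(imp == "High"))
         for cid, imp in config_impact.items()}
low_pre = sum(flags[c][0] for c in fact_config_ids)
high_pre = sum(flags[c][1] for c in fact_config_ids)

print(low_per_row, high_per_row, low_pre, high_pre)
```

Whether this actually saves time in Oracle depends on the execution plan, so it is worth comparing the plans of both forms.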

Related

Dynamic SQL: CASE expression in HAVING clause for SSRS dataset query

One of my tables contains 6 bit flags:
tblDocumentFact.useCase1
tblDocumentFact.useCase2
tblDocumentFact.useCase3
tblDocumentFact.useCase4
tblDocumentFact.useCase5
tblDocumentFact.useCase6
The bit flags are used to restrict the returned data via a HAVING clause, for example:
HAVING tblDocumentFact.useCase4 = 1 /* '1' means 'True' */
That works in a static query. The query is for a dataset for a SQL Server Reporting Services report. Rather than have 6 reports, one per bit flag, I'd like to have 1 report with an #UserChoice input parameter. I'm trying to write a dynamic query to structure the HAVING clause in accordance with the #UserChoice parameter. I'm thinking that #UserChoice could be set to an integer value (1, 2, 3, 4, 5 or 6) when the user clicks a 1-of-6 option button. I've tried to do this via CASE expressions as shown below, but it doesn't work--the query returns no rows. What's the correct approach here?
HAVING (
(CASE WHEN #UserChoice =1 THEN 'dbo.tblDocumentFact.useCase1' END) = '1'
OR (CASE WHEN #UserChoice =2 THEN 'dbo.tblDocumentFact.useCase2' END) = '1'
OR (CASE WHEN #UserChoice =3 THEN 'dbo.tblDocumentFact.useCase3' END) = '1'
OR (CASE WHEN #UserChoice =4 THEN 'dbo.tblDocumentFact.useCase4' END) = '1'
OR (CASE WHEN #UserChoice =5 THEN 'dbo.tblDocumentFact.useCase5' END) = '1'
OR (CASE WHEN #UserChoice =6 THEN 'dbo.tblDocumentFact.useCase6' END) = '1'
)
You need to rephrase your logic slightly:
HAVING
(#UserChoice = 1 AND dbo.tblDocumentFact.useCase1 = 1) OR
(#UserChoice = 2 AND dbo.tblDocumentFact.useCase2 = 1) OR
(#UserChoice = 3 AND dbo.tblDocumentFact.useCase3 = 1) OR
(#UserChoice = 4 AND dbo.tblDocumentFact.useCase4 = 1) OR
(#UserChoice = 5 AND dbo.tblDocumentFact.useCase5 = 1) OR
(#UserChoice = 6 AND dbo.tblDocumentFact.useCase6 = 1);
A CASE expression can't be used the way you were using it, because what follows THEN or ELSE has to be an expression that yields a value, not a logical condition.
To expand a bit on the comment under Tim's post: I think the reason it doesn't work is that your CASE expressions emit strings containing column names, not the values of the columns. Without the quotes:
HAVING
CASE WHEN #UserChoice = 1 THEN dbo.tblDocumentFact.useCase1 END = 1
OR CASE WHEN #UserChoice = 2 THEN dbo.tblDocumentFact.useCase2 END = 1
...
It might even clean up to this:
HAVING
CASE #UserChoice
WHEN 1 THEN dbo.tblDocumentFact.useCase1
WHEN 2 THEN dbo.tblDocumentFact.useCase2
...
END = 1
The problem (I believe; in SQL Server at least, not totally sure about SSRS) is that when you say:
CASE WHEN #UserChoice = 1 THEN 'dbo.tblDocumentFact.useCase1' END = '1'
your CASE WHEN emits the literal string dbo.tblDocumentFact.useCase1, not the value of that column on that row. And of course this literal string is never equal to the literal string 1.
Overall I prefer Tim's solution; I think the query optimizer is more likely to be able to use an index on the bit columns in that form. Be aware, though, that ORs can cause SQL Server to ignore indexes; the DBAs at my old place frequently rewrote queries like:
SELECT * FROM Person WHERE FirstName = 'john' OR LastName = 'Smith'
Into this:
SELECT * FROM Person WHERE FirstName = 'john'
UNION
SELECT * FROM Person WHERE LastName = 'Smith'
Because the server wouldn't combine the index on FirstName and the other index on LastName when we used OR, but it would execute in parallel using both indexes in the UNION form.
Consider, as an alternative, combining those bit flags into a single integer and indexing it: either as a bitmask (if you want to be able to express choices 1 and 2 by searching for 3, or choices 2, 4 and 6 by searching for 42 [2^(2-1) + 2^(4-1) + 2^(6-1)]) or as a plain int you can compare to #UserChoice.
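The bit-packing arithmetic above can be sketched in Python. The function names flags_to_mask and has_flag are hypothetical helpers for illustration only; flag number n maps to bit n-1, as in the 2^(n-1) formula.

```python
def flags_to_mask(choices):
    """Pack 1-based flag numbers into one integer (flag n -> bit n-1)."""
    mask = 0
    for n in choices:
        mask |= 1 << (n - 1)
    return mask


def has_flag(mask, n):
    """Return True if flag number n (1-based) is set in mask."""
    return ((mask >> (n - 1)) & 1) == 1


print(flags_to_mask([1, 2]))     # choices 1 and 2
print(flags_to_mask([2, 4, 6]))  # choices 2, 4 and 6
```

With this packing, flags_to_mask([1, 2]) gives 3 and flags_to_mask([2, 4, 6]) gives 42, matching the numbers in the text.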

Oracle SQL Statement - Identify & Count Unique Callers

I'm looking to make some improvements to our telephony call data and have a requirement to identify whether a CALLER is unique: if they call more than once on a given date (CALL_DATE), the row is flagged with a 1; if only once, with a 0.
Any ideas how I can modify this existing statement to reflect this?
SELECT /*+ PARALLEL (4) */
A.CALL_ID,
A.CALL_DATE,
O.OT_OUTLET_CODE,
A.CALL_TIME,
TO_CHAR(TO_DATE(A.CALL_TIME, 'HH24:MI:SS')+A.TALK_TIME/(24*60*60),'HH24:MI:SS') "CALL_END_TIME",
A.TALK_TIME,
A.RING_TIME,
A.OUTCOME,
CASE WHEN A.TRANSFER_TO = '10000' THEN 1 ELSE 0 END AS "VOICEMAIL"
FROM
OWBI.ODS_FACT_TIGER_TELEPHONY A,
OWBI.WHS_DIM_CAL_DATE C,
OWBI.WHS_DIM_OUTLET O
WHERE
A.CALL_DATE = C.CD_DAY_DATE
AND A.WHS_DIM_OUTLET = O.DIMENSION_KEY
AND C.EY_YEAR_CODE IN ('2019')
AND C.EW_WEEK_IN_YEAR IN ('1') -- **FILTER ON PREVIOUS BUSINESS WEEK NUMBER**
ORDER BY A.CALL_DATE DESC;
What you are describing sounds like a job for the analytic count(*) function.
Add this to the SELECT clause and don't change anything else:
case when count(*) over (partition by a.call_id, a.call_date) = 1 then 0
else 1 end as unique_flag
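As a rough illustration of what the analytic count does, here is a Python sketch that counts rows per partition key and attaches a 0/1 flag to every row without collapsing the rows. The caller field is a hypothetical column standing in for whatever identifies the caller; the original query does not show one, and the flag is only meaningful if the partition keys identify the caller rather than the individual call.

```python
from collections import Counter

# Hypothetical sample rows; "caller" is an assumed column name.
calls = [
    {"call_id": 1, "caller": "A", "call_date": "2019-01-02"},
    {"call_id": 2, "caller": "A", "call_date": "2019-01-02"},
    {"call_id": 3, "caller": "B", "call_date": "2019-01-02"},
    {"call_id": 4, "caller": "A", "call_date": "2019-01-03"},
]


def add_repeat_flag(rows, keys):
    """Mimic COUNT(*) OVER (PARTITION BY keys): flag is 0 when the
    partition holds exactly one row, 1 otherwise."""
    counts = Counter(tuple(r[k] for k in keys) for r in rows)
    return [
        dict(r, unique_flag=0 if counts[tuple(r[k] for k in keys)] == 1 else 1)
        for r in rows
    ]


flagged = add_repeat_flag(calls, ["caller", "call_date"])
print([r["unique_flag"] for r in flagged])
```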

Fetch rows based on condition

I am using PostgreSQL on Amazon Redshift.
My table is :
drop table APP_Tax;
create temp table APP_Tax(APP_nm varchar(100),start timestamp,end1 timestamp);
insert into APP_Tax values('AFH','2016-01-26 00:39:51','2016-01-26 00:39:55'),
('AFH','2016-01-26 00:39:56','2016-01-26 00:40:01'),
('AFH','2016-01-26 00:40:05','2016-01-26 00:40:11'),
('AFH','2016-01-26 00:40:12','2016-01-26 00:40:15'), --row x
('AFH','2016-01-26 00:40:35','2016-01-26 00:41:34') --row y
Expected output:
'AFH','2016-01-26 00:39:51','2016-01-26 00:40:15'
'AFH','2016-01-26 00:40:35','2016-01-26 00:41:34'
I have to compare the end time of each record with the start time of the next one and, if the difference is < 10 seconds, take the next record's end time, continuing until the last record.
I.e., datediff(seconds, 2016-01-26 00:39:55, 2016-01-26 00:39:56) is < 10 seconds.
I tried this:
SELECT a.app_nm
,min(a.start)
,max(b.end1)
FROM APP_Tax a
INNER JOIN APP_Tax b
ON a.APP_nm = b.APP_nm
AND b.start > a.start
WHERE datediff(second, a.end1, b.start) < 10
GROUP BY 1
It works, but it doesn't return row y when the condition fails.
There are two reasons that row y is not returned:
b.start > a.start means that a row will never join with itself
The GROUP BY will return only one record per APP_nm value, yet all rows have the same value.
However, there are further logic errors that the query will not handle successfully. For example, how does it know when a "new" session begins?
The logic you seek can be achieved in normal PostgreSQL with the help of the DISTINCT ON clause, which returns one row per distinct value in a specific column. However, DISTINCT ON is not supported by Redshift.
Some potential workarounds: DISTINCT ON like functionality for Redshift
The output you seek would be trivial using a programming language (which can loop through results and store variables) but is difficult to apply to an SQL query (which is designed to operate on rows of results). I would recommend extracting the data and running it through a simple script (eg in Python) that could then output the Start & End combinations you seek.
This is an excellent use-case for a Hadoop Streaming function, which I have successfully implemented in the past. It would take the records as input, then 'remember' the start time and would only output a record when the desired end-logic has been met.
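The "loop through the results and remember the start time" approach described above could look roughly like the following Python sketch. It assumes the rows have already been extracted as (APP_nm, start, end1) string tuples, that all sample rows fall on 2016-01-26 as in the expected output, and that a gap of less than 10 seconds continues a session. The function name merge_sessions is hypothetical.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def merge_sessions(rows, max_gap_seconds=10):
    """Merge consecutive rows of the same app into one session while the
    gap between one row's end and the next row's start stays under the limit."""
    parsed = sorted(
        (app, datetime.strptime(s, FMT), datetime.strptime(e, FMT))
        for app, s, e in rows
    )
    sessions = []
    for app, start, end in parsed:
        last = sessions[-1] if sessions else None
        if (last and last[0] == app
                and (start - last[2]).total_seconds() < max_gap_seconds):
            last[2] = max(last[2], end)        # extend the open session
        else:
            sessions.append([app, start, end])  # begin a new session
    return [(a, s.strftime(FMT), e.strftime(FMT)) for a, s, e in sessions]


rows = [
    ("AFH", "2016-01-26 00:39:51", "2016-01-26 00:39:55"),
    ("AFH", "2016-01-26 00:39:56", "2016-01-26 00:40:01"),
    ("AFH", "2016-01-26 00:40:05", "2016-01-26 00:40:11"),
    ("AFH", "2016-01-26 00:40:12", "2016-01-26 00:40:15"),
    ("AFH", "2016-01-26 00:40:35", "2016-01-26 00:41:34"),
]
for session in merge_sessions(rows):
    print(session)
```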
Sounds like what you are after is "sessionisation" of the activity events. You can achieve that in Redshift using window functions.
The complete solution might look like this:
SELECT
start AS session_start,
session_end
FROM (
SELECT
start,
end1,
lead(end1, 1)
OVER (
ORDER BY end1) AS session_end,
session_boundary
FROM (
SELECT
start,
end1,
CASE WHEN session_switch = 0 AND reverse_session_switch = 1
THEN 'start'
ELSE 'end' END AS session_boundary
FROM (
SELECT
start,
end1,
CASE WHEN datediff(seconds, end1, lead(start, 1)
OVER (
ORDER BY end1 ASC)) > 10
THEN 1
ELSE 0 END AS session_switch,
CASE WHEN datediff(seconds, lead(end1, 1)
OVER (
ORDER BY end1 DESC), start) > 10
THEN 1
ELSE 0 END AS reverse_session_switch
FROM app_tax
)
AS sessioned
WHERE session_switch != 0 OR reverse_session_switch != 0
UNION
SELECT
start,
end1,
'start'
FROM (
SELECT
start,
end1,
row_number()
OVER (PARTITION BY APP_nm
ORDER BY end1 ASC) AS row_num
FROM APP_Tax
) AS with_row_number
WHERE row_num = 1
) AS with_boundary
) AS with_end
WHERE session_boundary = 'start'
ORDER BY start ASC
;
Here is the breakdown (by subquery name):
sessioned - we first identify the switch rows (out and in), i.e. the rows where the gap between end and start exceeds the limit.
with_row_number - just a patch to extract the first row, because there is no switch into it (there is an implicit switch that we record as 'start').
with_boundary - then we identify the rows where specific switches occur. If you run the subquery by itself, it is clear that a session starts when session_switch = 0 AND reverse_session_switch = 1, and ends when the opposite occurs. All other rows are in the middle of sessions, so they are ignored.
with_end - finally, we combine each 'start' row with the end of the following 'end' row (thus defining the session duration), and remove the 'end' rows.
The with_boundary subquery answers your initial question, but typically you'd want to combine those rows to get the final result, which is the session duration.

How to add a column on the fly?

I am facing a different kind of problem. In a SELECT query, I want to add a temporary column on the fly, based on another column's value.
I have 2 columns:
IsOpeningClosingDateToo (tinyint),
HearingDate Date
Now I want to check: if IsOpeningClosingDate = 1, then
Select HearingDate, HearingDate as 'OpeningDate'
If IsOpeningClosingDate = 2
Select HearingDate, HearingDate as 'ClosingDate'
I have tried to do this, but it failed:
SELECT
,[HearingDate]
,CASE [IsOpeningClosingDate]
when 1 then [HearingDate] as OpeningDate
When 0 then [HearingDate] as ClosingDate
end as 'test'
]
FROM [LitMS_MCP].[dbo].[CaseHearings]
I would suggest returning three columns. Then you can pick out the values on the application side:
SELECT HearingDate,
(CASE WHEN IsOpeningClosingDate = 1 THEN HearingDate END) as OpeningDate,
(CASE WHEN IsOpeningClosingDate = 0 THEN HearingDate END) as ClosingDate
FROM [LitMS_MCP].[dbo].[CaseHearings];
Alternatively, you could just fetch HearingDate and IsOpeningClosingDate and do the comparison in Python.
The important point is that the columns in a SQL query are fixed by the SELECT. You cannot vary the names or types of the columns conditionally within the query.
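The application-side alternative mentioned above might look like this in Python. It is a sketch that assumes each fetched row arrives as a dict with the keys HearingDate and IsOpeningClosingDate, and that 1 means opening and 0 means closing, as in the CASE expressions above. The helper name split_hearing_date is hypothetical.

```python
def split_hearing_date(row):
    """Derive OpeningDate / ClosingDate from the flag; the unused one is None,
    mirroring the NULL a CASE without ELSE would produce."""
    flag = row["IsOpeningClosingDate"]
    return {
        "HearingDate": row["HearingDate"],
        "OpeningDate": row["HearingDate"] if flag == 1 else None,
        "ClosingDate": row["HearingDate"] if flag == 0 else None,
    }


print(split_hearing_date({"HearingDate": "2015-03-01", "IsOpeningClosingDate": 1}))
```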

SQL multiple SELECT too slow (7 min)

This query works, but it is too slow.
Goal:
Select all rows where mlap_type is SC, calendar_id does not match %%5, and 2013.07.11 < mlap_date < 2013.07.18,
and
for which some older rows describe the same address.
Method:
Find the X candidate rows,
then check them one by one for matching rows within the preceding 28 days.
select efi_name, efi_id, count(*) as dupes, id, mlap_date
from address m
where
mlap_date > "2013.07.11"
and mlap_date < "2013.07.18"
and mlap_type = "SC"
and calendar_id not like "%%5"
and concat(efi_id,irsz,ucase(city), ucase(address)) in (
select concat(k.efi_id,k.irsz,ucase(k.city), ucase(k.address)) as dupe
from address k
where k.mlap_date > adddate(m.`mlap_date`,-28)
and k.mlap_date < m.mlap_date
and k.mlap_type = "SC"
and k.calendar_id not like "%%5"
and k.status = 'Befejezett'
group by concat(k.efi_id,k.irsz,ucase(k.city), ucase(k.address))
having (count(*) > 1)
)
group by concat(efi_id,irsz,ucase(city), ucase(address))
Thanks for helping!
NOT LIKE plus wildcard-prefixed terms are index-usage killers.
You could also try replacing the IN + inline table with an inner join: does the optimizer run the NOT LIKE query twice (see your explain plan)?
It looks like you might be using MySQL, in which case you could build a hash column based on
efi_id
irsz
ucase(city)
ucase(address)
and compare that column directly. This is a way of emulating a hash join in MySQL.
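To show what such a hash column would contain, here is a small Python sketch that normalises the four fields and hashes them into one fixed-length key. The separator character and the choice of MD5 are assumptions made for illustration; the original query concatenates the fields without a separator.

```python
import hashlib


def address_hash(efi_id, irsz, city, address):
    """Build one comparable key from the four fields, upper-casing
    city and address the way ucase() does in the query."""
    key = "|".join([str(efi_id), str(irsz), city.upper(), address.upper()])
    return hashlib.md5(key.encode("utf-8")).hexdigest()


a = address_hash(17, "1011", "Budapest", "Fo utca 1")
b = address_hash(17, "1011", "BUDAPEST", "fo utca 1")
print(a == b)
```

Using a separator avoids accidental collisions between inputs such as ("1", "23") and ("12", "3"), which plain concatenation would treat as equal.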
I don't think you need a subquery to do this. You should be able to do it just with the outer group by and conditional aggregations.
select efi_name, efi_id,
sum(case when mlap_date > "2013.07.11" and mlap_date < "2013.07.18" then 1 else 0 end) as dupes,
id, mlap_date
from address m
where mlap_type = 'SC' and calendar_id not like '%%5'
group by efi_id,irsz, ucase(city), ucase(address)
having sum(case when m.status = 'Befejezett' and
m.mlap_date <= '2013.07.11' and
m.mlap_date > adddate(date('2013.07.11'), -28)
then 1
else 0
end) > 1
This produces a slightly different result from your query. Instead of looking at the 28 days before each record, it looks at all records in the week period and then at the four weeks before that period. Despite this subtle difference, it is still identifying dupes in the four-week period before the one-week period.
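The conditional-aggregation idea can be sketched in Python as a single pass that keeps two counters per group. It assumes the mlap_type and calendar_id filters have already been applied, and uses a hypothetical key field in place of the (efi_id, irsz, city, address) grouping columns.

```python
from collections import defaultdict
from datetime import date, timedelta

WEEK_START, WEEK_END = date(2013, 7, 11), date(2013, 7, 18)
LOOKBACK_START = WEEK_START - timedelta(days=28)


def find_dupes(rows):
    """One pass, two conditional counters per group: rows inside the week,
    and completed rows in the 28 days before it. Groups with more than one
    earlier completed row are returned with their in-week count."""
    in_week = defaultdict(int)
    earlier_done = defaultdict(int)
    for r in rows:
        d = r["mlap_date"]
        if WEEK_START < d < WEEK_END:
            in_week[r["key"]] += 1
        if r["status"] == "Befejezett" and LOOKBACK_START < d <= WEEK_START:
            earlier_done[r["key"]] += 1
    return {k: in_week[k] for k, n in earlier_done.items() if n > 1}


sample = [
    {"key": "a", "mlap_date": date(2013, 6, 20), "status": "Befejezett"},
    {"key": "a", "mlap_date": date(2013, 7, 1), "status": "Befejezett"},
    {"key": "a", "mlap_date": date(2013, 7, 15), "status": "Uj"},
    {"key": "b", "mlap_date": date(2013, 7, 2), "status": "Befejezett"},
    {"key": "b", "mlap_date": date(2013, 7, 14), "status": "Uj"},
]
print(find_dupes(sample))
```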