Oracle Select Query, distributing frequency of labels - sql

If I know how much data I am receiving, I can distribute the appearance of a label in this way:
select
value_column
, CASE WHEN TO_CHAR(VALUE_DATE, 'mm') = '01' and MOD(extract(year from VALUE_DATE), 3) = 0
THEN TO_CHAR(VALUE_DATE, 'MON-yyyy')
else ' '
END VALUE_DATE_STRING
from SomeTable
This shows the data label on January of every 3rd year.
Now, if I don't know how many years are coming back, I'd like to figure this out in the same select and display a total of 5 labels.
I reckon I'd need something like this (pseudo code):
CASE WHEN MOD(allRows / 5, ROW_NUM) = 0
I guess the only challenging part is getting allRows in the same select. Since I'm calling this SQL from a Telerik report, there's limited support for declaring variables and running multiple statements.

This might do what you want:
select value_column,
(CASE WHEN mod(rownum, trunc(cnt / 5)) = 0
THEN TO_CHAR(VALUE_DATE, 'MON-yyyy')
else ' '
END) as VALUE_DATE_STRING
from (select t.*, count(*) over () as cnt
from SomeTable t
) t;
It might be trunc((cnt - 1) / 5).
EDIT:
If this does what you want, you don't need a CTE or subquery. I just thought that it made more sense this way (and the subquery does not affect performance). You can do:
select value_column,
(CASE WHEN mod(row_number() over (order by value_date),
trunc(count(*) over () / 5)) = 0
THEN TO_CHAR(VALUE_DATE, 'MON-yyyy')
else ' '
END) as VALUE_DATE_STRING
from SomeTable t;
By the way, you should include order by if you are using rownum. Oracle does not guarantee the ordering of rows in a result set without an order by.
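To see how the trunc(cnt / 5) step behaves for different row counts, here is a small Python sketch (not Oracle code) that mimics the mod(row_number, step) = 0 test. The function name labelled_rows and the round_up option are made up for illustration; rounding the step up is only one possible way to cap the number of labels at 5 and is not part of the answer above.

```python
def labelled_rows(cnt, labels=5, round_up=False):
    """Return the 1-based row numbers that would receive a label.

    Mimics: mod(row_number, trunc(cnt / labels)) = 0.
    round_up=True is a hypothetical variant that uses ceil(cnt / labels).
    """
    if round_up:
        step = -(-cnt // labels)  # integer ceiling division
    else:
        step = cnt // labels      # trunc(cnt / labels) for non-negative counts
    if step == 0:
        # Oracle's MOD(n, 0) returns n, so no positive row number yields 0
        # and no row is labelled.
        return []
    return [rn for rn in range(1, cnt + 1) if rn % step == 0]


for cnt in (4, 10, 12, 14, 23):
    truncated = labelled_rows(cnt)
    rounded_up = labelled_rows(cnt, round_up=True)
    print(cnt, len(truncated), len(rounded_up))
```

With 12 rows the truncated step is 2, so 6 rows get a label rather than 5, and with fewer than 5 rows the step is 0 and nothing is labelled. Rounding the step up never produces more than 5 labels, but it can produce fewer.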

Here's my solution. It could still be optimized, and I believe the rounding will not always work in my favor.
WITH base as (
select
value_column
, value_date
, cnt.c
, case mod(c,2) when 0 then 6 else 5 end as divider -- get 5 or 6 dates.. try to minimize truncation consequences.. this may need work.
, row_number() over (order by value_date) as row_number
from MyView
left join (select count(*) c from MyView where id = 250170) cnt on 1=1
where id = 250170
order by value_date
)
select
value_column
, value_date
, CASE WHEN MOD(row_number, trunc(c/divider)) = 0
THEN TO_CHAR(VALUE_DATE, 'MON-yyyy')
else ' '
END VALUE_DATE_STRING
from base
order by value_date
UPDATE
Simplified based on Gordon's answer:
select
value_column
, value_date
, (
CASE WHEN
mod(
row_number() over (order by value_date),
trunc(count(*) over () / 5)) = 0
THEN TO_CHAR(VALUE_DATE, 'MON-yyyy')
else ' '
END) as LABEL_STRING
from MyView

Related

SQL - Adding conditions to SELECT

I have a table which has a timestamp and inCycle status of a machine. I'm using two CTEs and doing an INNER JOIN on row number so I can easily compare the timestamp of one row to the next. I have the DATEDIFF working and now I need to look at the inCycle status. Basically, if the inCycleThis and inCycleNext both = 1, I need to add it to an InCycle total.
Similarly (Shown table will make this clear):
incycleThis/next = 0,1 = not in cycle
incycleThis/next = 0,0 = not in cycle
incycleThis/next = 1,1 = in cycle
If I was doing this client side, this would be pretty simple. I need to do this in a stored procedure though due to there being a lot of records. I'd love to use an 'IF' in the SELECT section, but it seems that's not how it works.
The result I'm looking for at the end is simply: InCycle = Xtime. Something like:
SUM(Diff_seconds if((InCycleThis = 1 AND InCycleNext = 1) OR (InCycleThis = 1 AND InCycleNext = 0))
This is what I have so far:
WITH History_CTE (DT, MID, FRO, IC, RowNum)
AS
(
SELECT DateAndTime
,MachineID
,FeedRateOverride
,InCycle
,ROW_NUMBER()OVER(ORDER BY MachineID, DateAndTime) AS "row number"
FROM History
WHERE DateAndTime >= '2020-11-15'
AND DateAndTime < '2020-11-16'
),
History2_CTE (DT2, MID2, FRO2, IC2, RowNum2)
AS
(
SELECT DateAndTime
,MachineID
,FeedRateOverride
,InCycle
,ROW_NUMBER()OVER(ORDER BY MachineID, DateAndTime) AS "row number"
FROM History
WHERE DateAndTime >= '2020-11-15'
AND DateAndTime < '2020-11-16'
)
SELECT DT as 'TimeStamp'
,DT2 as 'TimeStamp Next Row'
,MID
,FRO
,IC as 'InCycle this'
,IC2 as 'InCycle next'
,RowNum
,DATEDIFF(s, History2_CTE.DT2, History_CTE.DT) AS 'Diff_seconds'
FROM History_CTE
INNER JOIN
History2_CTE ON History_CTE.RowNum = History2_CTE.RowNum2 + 1
Consider adding a third CTE to first conditionally calculate your needed value, then aggregate in the final statement. Recall that CTEs can reference previously defined CTEs. Be sure to always qualify columns with table aliases in JOIN queries.
WITH
... first two ctes...
, sub AS (
SELECT h1.DT AS 'TimeStamp'
, h2.DT2 AS 'TimeStamp Next Row'
, h1.MID
, h1.FRO
, h1.IC AS 'InCycle this'
, h2.IC2 AS 'InCycle next'
, h1.RowNum
, DATEDIFF(s, h2.DT2, h1.DT) AS 'Diff_seconds'
, CASE
WHEN (h1.IC = 1 AND h2.IC2 = 1) OR (h1.IC= 1 AND h2.IC2 = 0)
THEN DATEDIFF(s, h2.DT2, h1.DT)
END AS 'IC_Diff_seconds'
FROM History_CTE h1
INNER JOIN History2_CTE h2
ON h1.RowNum = h2.RowNum2 + 1
)
SELECT SUM([Diff_seconds]) AS Diff_seconds_Total
, SUM([IC_Diff_seconds]) AS IC_Diff_seconds_Total
FROM sub
And if needing to add groupings, incorporate GROUP BY:
SELECT sub.MID
, sub.FRO
, SUM([Diff_seconds]) AS Diff_seconds_Total
, SUM([IC_Diff_seconds]) AS IC_Diff_seconds_Total
FROM sub
GROUP BY sub.MID
, sub.FRO
Even aggregate calculations by day:
SELECT CONVERT(date, [TimeStamp]) AS [Day]
, SUM([Diff_seconds]) AS Diff_seconds_Total
, SUM([IC_Diff_seconds]) AS IC_Diff_seconds_Total
FROM sub
GROUP BY CONVERT(date, [TimeStamp])
The result I'm looking for at the end is simply: InCycle = Xtime. Something like:
SUM(Diff_seconds if((InCycleThis = 1 AND InCycleNext = 1) OR (InCycleThis = 1 AND InCycleNext = 0))
As I understand your question, you just need to sum the difference between the timestamp of "in cycle" rows and the timestamp of the next row.
select machineid,
sum(datediff(s, dateandtime, lead_dateandtime)) as total_in_time
from (
select h.*,
lead(dateandtime) over(partition by machineid order by dateandtime) as lead_dateandtime
from history h
) h
where incycle = 1
group by machineid
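The pairing that lead() performs can be sketched outside the database. The following Python snippet is only an illustration of the same idea: the sample rows, the tuple layout, and the function name in_cycle_seconds are assumptions, not part of the original table or query.

```python
from datetime import datetime
from itertools import groupby

# Hypothetical sample rows: (machine_id, timestamp, in_cycle)
rows = [
    (1, datetime(2020, 11, 15, 8, 0, 0), 1),
    (1, datetime(2020, 11, 15, 8, 0, 30), 1),
    (1, datetime(2020, 11, 15, 8, 1, 0), 0),
    (1, datetime(2020, 11, 15, 8, 5, 0), 1),
    (2, datetime(2020, 11, 15, 9, 0, 0), 0),
    (2, datetime(2020, 11, 15, 9, 0, 10), 1),
    (2, datetime(2020, 11, 15, 9, 0, 25), 0),
]


def in_cycle_seconds(rows):
    """Sum, per machine, the seconds between each in-cycle row and the next row."""
    totals = {}
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for machine_id, group in groupby(ordered, key=lambda r: r[0]):
        group = list(group)
        total = 0
        # zip pairs each row with its successor, like lead() within a partition.
        # The last row of a machine has no successor and contributes nothing.
        for current, nxt in zip(group, group[1:]):
            if current[2] == 1:
                total += int((nxt[1] - current[1]).total_seconds())
        totals[machine_id] = total
    return totals


print(in_cycle_seconds(rows))
```

One difference from the query: a machine with no in-cycle rows would appear here with a total of 0, whereas the where clause in the SQL drops it from the result.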

COUNT from DISTINCT values in multiple columns

If this has been asked before, I apologize, I wasn't able to find a question/solution like it before breaking down and posting. I have the below query (using Oracle SQL) that works fine in a sense, but not fully what I'm looking for.
SELECT
order_date,
p_category,
CASE
WHEN ( issue_grp = 1 ) THEN '1'
ELSE '2/3 '
END AS issue_group,
srt AS srt_level,
COUNT(*) AS total_orders
FROM
database.t_con
WHERE
order_date IN (
'&Enter_Date_YYYYMM'
)
GROUP BY
p_category,
CASE
WHEN ( issue_grp = 1 ) THEN '1'
ELSE '2/3 '
END,
srt,
order_date
ORDER BY
p_category,
issue_group,
srt_level,
order_date
Current Return (12 rows):
Needed Return (8 rows without the tan rows being shown):
Here is the logic of total_order column that I'm expecting:
count of order_date where (srt_level = 80 + 100 + Late) ... 'Late' counts needed to be added to the total, just not be displayed
I'm eventually adding a filled_orders column that will go before the total_orders column, but I'm just not there yet.
Sorry I wasn't as descriptive earlier. Thanks again!
You don't appear to need a subquery; if you want the count for each combination of values then group by those, and aggregate at that level; something like:
SELECT
t1.order_date,
t1.p_category,
CASE
WHEN ( t1.issue_grp = 1 ) THEN '1'
ELSE '2/3 '
END AS issue_group,
t1.srt AS srt_level,
COUNT(*) AS total_orders
FROM
database.t_con t1
WHERE
t1.order_date = TO_DATE ( '&Enter_Date_YYYYMM', 'YYYYMM' )
GROUP BY
t1.p_category,
CASE
WHEN ( t1.issue_grp = 1 ) THEN '1'
ELSE '2/3 '
END,
t1.srt,
t1.order_date
ORDER BY
p_category,
issue_group,
srt_level,
order_date;
You shouldn't be relying on implicit conversion and NLS settings for your date argument (assuming order_date is actually a date column, not a string), so I've used an explicit TO_DATE() call, using the format suggested by your substitution variable name and prompt.
However, that will give you the first day of the supplied month, since a day number isn't being supplied. It's more likely that you either want to prompt for a full date, or (possibly) just the year/month but want to include all days in that month - which IN() will not do, if that was your intention. It also implies that stored dates all have their time portions set to midnight, as that is all it will match on. If those values have non-midnight times then you need a range to pick those up too.
I got it working to the extent of what my question was. Just needed to nest each column where counts/calculations were happening.
SELECT
order_date,
p_category,
issue_group,
srt_level,
order_count,
SUM(order_count) OVER(
PARTITION BY order_date, issue_group, p_category
) AS total_orders
FROM
(
SELECT
order_date,
p_category,
CASE
WHEN ( issue_grp = 1 ) THEN '1'
ELSE '2/3 '
END AS issue_group,
srt AS srt_level,
COUNT(*) AS order_count
FROM
database.t_con
WHERE
order_date IN (
'&Enter_Date_YYYYMM'
)
GROUP BY
p_category,
CASE
WHEN ( issue_grp = 1 ) THEN '1'
ELSE '2/3 '
END,
srt,
order_date
)
ORDER BY
order_date,
p_category,
issue_group
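As a rough illustration of what the nested query computes, the sketch below does the same two steps in Python: a per-combination count first, then a total per (order_date, issue_group, p_category) attached to every row, which is what SUM(...) OVER (PARTITION BY ...) does. The sample orders are invented and the label "2/3" is written without the trailing space.

```python
from collections import Counter

# Hypothetical sample rows: (order_date, p_category, issue_grp, srt)
orders = [
    ("202001", "A", 1, "80"),
    ("202001", "A", 1, "80"),
    ("202001", "A", 1, "100"),
    ("202001", "A", 1, "Late"),
    ("202001", "A", 2, "80"),
    ("202001", "A", 3, "Late"),
]


def issue_group(issue_grp):
    """Same bucketing as the CASE expression in the query."""
    return "1" if issue_grp == 1 else "2/3"


# Inner query: COUNT(*) grouped by date, category, issue group and srt level.
order_count = Counter(
    (order_date, p_category, issue_group(grp), srt)
    for order_date, p_category, grp, srt in orders
)

# Window step: sum the counts over each (order_date, issue_group, p_category).
total_orders = Counter()
for (order_date, p_category, grp, srt), n in order_count.items():
    total_orders[(order_date, grp, p_category)] += n

result = [
    (order_date, p_category, grp, srt, n, total_orders[(order_date, grp, p_category)])
    for (order_date, p_category, grp, srt), n in sorted(order_count.items())
]

for row in result:
    print(row)
```

Because the total is attached to every row, rows such as the 'Late' ones could be filtered out afterwards while their counts still contribute to total_orders.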

Speed up execution of query to find sequential rows that have a changed value

My goal is to go through my dataset, compare each ITEM_NO/LOC day-by-day, and identify days where the VAL has changed from the day before. Right now, I do that by sorting, creating a column of row numbers, joining the table to itself offset by a row, and then only picking rows where VAL has changed.
Each month has about half a billion records. In total there's around 2.7 billion records. The data is stored in DB2 BLU. The table already has indices for ITEM_NO, LOC, and ARCV_DATE. I only have select access to the table.
I think the big bottleneck is the order by in the select statement given that n is so large. One idea I had was to try to do the sorting month-by-month and then union each of the months together.
Here's what I have so far:
with x as (
select ITEM_NO, LOC, ARCV_DATE, VAL, ROW_NUMBER() over (order by ITEM_NO, LOC, ARCV_DATE) as RN
from MY_SCHEMA.MY_TABLE a
where
ARCV_DATE >= '2017-06-01'
and ARCV_DATE < '2017-07-01'
)
SELECT
x.ITEM_NO,
x.LOC,
y.ARCV_DATE as CHANGE_DATE,
y.VAL,
x.VAL as OLD_VAL
FROM x
INNER JOIN x AS y
ON x.rn = y.rn + 1
WHERE
x.VAL <> y.VAL
and x.ITEM_NO = y.ITEM_NO
and x.LOC = y.LOC
What could I do to improve performance on this for such a dataset?
Without any write access your options are very limited because the query isn't that complex. You could try avoiding the join altogether by using LAG() OVER() such as this:
SELECT
*
FROM (
SELECT
ITEM_NO
, LOC
, ARCV_DATE
, VAL
, LAG(ARCV_DATE, 1) OVER (PARTITION BY ITEM_NO, LOC ORDER BY ARCV_DATE DESC) AS CHANGE_DATE
, LAG(VAL, 1) OVER (PARTITION BY ITEM_NO, LOC ORDER BY ARCV_DATE DESC) AS OLD_VAL
FROM MY_SCHEMA.MY_TABLE
WHERE ARCV_DATE >= '2017-06-01'
AND ARCV_DATE < '2017-07-01'
) d
WHERE ( VAL <> OLD_VAL OR OLD_VAL IS NULL )
But tuning this further could require adding or changing indexes.
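For readers unfamiliar with LAG, here is a minimal Python sketch of the same row-to-previous-row comparison within each ITEM_NO/LOC group. It sorts ascending by date, so "previous" means the earlier day, which is not the DESC ordering used in the query above. It also skips the first row of each group, unlike the OLD_VAL IS NULL branch. The sample data and the function name changes are assumptions.

```python
from itertools import groupby

# Hypothetical sample rows: (item_no, loc, arcv_date, val)
rows = [
    ("A", "X", "2017-06-01", 10),
    ("A", "X", "2017-06-02", 10),
    ("A", "X", "2017-06-03", 12),
    ("A", "Y", "2017-06-01", 5),
    ("A", "Y", "2017-06-02", 7),
    ("B", "X", "2017-06-01", 3),
]


def changes(rows):
    """Return (item_no, loc, change_date, val, old_val) where val differs from the prior row."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1], r[2]))
    # Each (item_no, loc) group plays the role of a PARTITION BY partition.
    for (item_no, loc), group in groupby(ordered, key=lambda r: (r[0], r[1])):
        previous = None
        for _, _, arcv_date, val in group:
            if previous is not None and val != previous:
                out.append((item_no, loc, arcv_date, val, previous))
            previous = val
    return out


for change in changes(rows):
    print(change)
```

The single pass over sorted data is the point of the LAG approach: no self-join on row numbers is needed.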
SELECT currentval.ITEM_NO,
currentval.LOC,
currentval.ARCV_DATE currentdate,
prevval.ARCV_DATE Previousdate,
currentval.val currentval,
prevval.val Previousval
FROM MY_SCHEMA.MY_TABLE currentval JOIN
MY_SCHEMA.MY_TABLE prevval ON
currentval.ITEM_NO = prevval.ITEM_NO
WHERE currentval.loc = prevval.loc
AND currentval.val <> prevval.val
AND currentval.ARCV_DATE = prevval.ARCV_DATE+1
AND currentval.ARCV_DATE >= '2017-06-01'
AND prevval.ARCV_DATE < '2017-07-01'
Assuming that values change from one day to the next, this query retrieves the values that changed from the previous day to the current day. The key condition is:
AND currentval.ARCV_DATE = prevval.ARCV_DATE+1

Oracle SQL LAG Function

I'd appreciate some help with this code; I'm getting a 'missing keyword' error. I've never used the LAG function before, so hopefully I'm using it correctly. Thanks for your help. Gav
CREATE VIEW GS_Date AS
SELECT
DATE_DATE,
DATE_FLAG,
CASE WHEN LAG ( DATE_FLAG) OVER ( ORDER BY DATE_DATE ) = '1' THEN DATE_STEP = ( LAG ( DATE_FLAG) OVER ( ORDER BY DATE_DATE ) ) + '1'
WHEN LAG ( DATE_FLAG) OVER ( ORDER BY DATE_DATE ) = '0' AND LAG ( DATE_FLAG) OVER ( ORDER BY DATE_DATE ) = '-1' THEN DATE_STEP = ( LAG ( DATE_FLAG) OVER ( ORDER BY DATE_DATE ) ) + '1'
ELSE DATE_STEP = LAG ( DATE_FLAG) OVER ( ORDER BY DATE_DATE ) END AS DATE_STEP
FROM DATE_GROUP
The problem is with the CASE expression; you were using LAG correctly.
Other points: Don't add strings like '1' and '-1' to numbers. Add numbers - you don't need the single quotes.
Also, if in a computation something is common and only the "last part" is different, you can use the CASE expression "at the end". Like below:
Note: On re-reading the original post, the formula needs to be more complicated (I didn't get it exactly right). Not changing the answer, since it still illustrates the same ideas I meant to share. BUT: Looking at the original post, there is a condition "when LAG = 0 and LAG = -1" - that can never be true. What was meant is probably "OR" instead of "AND". In the formula I wrote below, this means one more WHEN...THEN... branch.
LAG(DATE_FLAG) OVER (ORDER BY DATE_DATE)
+ CASE LAG(DATE_FLAG) OVER (ORDER BY DATE_DATE) WHEN 1 THEN 1
WHEN 0 THEN -1
ELSE 0 END AS DATE_STEP
Further edit: Looking at it again, it seems that when the flag is 1, 0 or -1 we must add 1, otherwise add 0. In that case it's easier to use a "searched CASE expression" instead of a "simple CASE expression" as I did. Something like:
LAG(...) ...
+ CASE WHEN LAG(...) ... IN (-1, 0, 1) THEN 1
ELSE 0 END AS DATE_STEP
Try like this
CREATE VIEW GS_Date AS
SELECT DATE_DATE,
DATE_FLAG,
CASE
WHEN LAG(DATE_FLAG) OVER(ORDER BY DATE_DATE) = '1' THEN
(LAG(DATE_FLAG) OVER(ORDER BY DATE_DATE)) + '1'
WHEN LAG(DATE_FLAG) OVER(ORDER BY DATE_DATE) = '0' AND LAG(DATE_FLAG) OVER(ORDER BY DATE_DATE) = '-1' THEN
(LAG(DATE_FLAG) OVER(ORDER BY DATE_DATE)) + '1'
ELSE
LAG(DATE_FLAG) OVER(ORDER BY DATE_DATE)
END AS DATE_STEP
FROM DATE_GROUP
So you don't have to keep writing LAG( ... ) OVER ( ... ) statements, get the LAG value in a sub-query and then use CASE or DECODE in the outer query:
CREATE VIEW GS_Date AS
SELECT DATE_DATE,
DATE_FLAG,
DECODE(
DATE_STEP,
1, 2,
0, 1,
-1, 0,
DATE_STEP
) AS DATE_STEP
FROM (
SELECT DATE_DATE,
DATE_FLAG,
LAG ( DATE_FLAG ) OVER ( ORDER BY DATE_DATE ) AS DATE_STEP
FROM DATE_GROUP
)
Also, your second WHEN clause will never be true:
WHEN LAG ( DATE_FLAG ) OVER ( ORDER BY DATE_DATE ) = '0'
AND LAG ( DATE_FLAG ) OVER ( ORDER BY DATE_DATE ) = '-1'
THEN ...
Since the value can never be both -1 and 0. I've assumed you meant to use OR rather than AND.
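The subquery-plus-DECODE approach amounts to "take the previous row's flag, then map it". A small Python sketch of that logic, assuming the flags are already ordered by DATE_DATE (the function name date_steps and the sample flags are made up):

```python
def date_steps(flags):
    """For flags ordered by date, return the DATE_STEP values.

    Step 1 mirrors LAG(DATE_FLAG): each row sees the previous row's flag,
    and the first row sees None (NULL).
    Step 2 mirrors DECODE(lag, 1, 2, 0, 1, -1, 0, lag).
    """
    if not flags:
        return []
    lagged = [None] + list(flags[:-1])
    mapping = {1: 2, 0: 1, -1: 0}
    # Any other value, including None, falls through to itself,
    # like the DECODE default.
    return [mapping.get(previous, previous) for previous in lagged]


print(date_steps([1, 0, -1, 5, 2]))
```

The first row has no previous row, so its step stays NULL (None here), the same as in the view.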

CASE Statement inside a subquery

I was able to create the following query after help from the post below
select * from duppri t
where exists (
select 1
from duppri
where symbolUP = t.symbolUP
AND date = t.date
and price <> t.price)
ORDER BY date
SQL to check when pairs don't match
I have now realized that I need to add a case statement to indicate when all the above criteria fits, but the type value is equal between duppri and t.duppri. This occurs because of case sensitivity. This query is an attempt to clean up a portfolio accounting system that unfortunately allowed numerous duplicates because it didn't have strong referential integrity or constraints.
I would like the case statement to produce the column 'isMatch'
Date |Type|Symbol |SymbolUP |Concatt |Price |IsMatch
6/30/1995 |gaus|313586U72|313586U72|gaus313586U72|109.25|Different
6/30/1995 |gbus|313586U72|313586U72|gbus313586U72|108.94|Different
6/30/1995 |agus|SRR |SRR |agusSRR |10.25 |Different
6/30/1995 |lcus|SRR |SRR |lcusSRR |0.45 |Different
11/27/1996|lcus|LLY |LLY |lcusLLY |76.37 |Matched
11/27/1996|lcus|lly |LLY |lcusLLY |76 |Matched
11/28/1996|lcus|LLY |LLY |lcusLLY |76.37 |Matched
11/28/1996|lcus|lly |LLY |lcusLLY |76 |Matched
I tried the following CASE statement but it is creating errors
SELECT * from duppri t
where exists (
select 1,
CASE IsMatch WHEN [type] = [t.TYPE] THEN 'Matched' ELSE 'Different' END
from duppri
where symbolUP = t.symbolUP
AND date = t.date
and price <> t.price)
ORDER BY date
You could just use window functions, if I understand correctly:
select d.*,
(case when mint = maxt
then 'Matched' else 'Different'
end) as IsMatch
from (select d.*,
min(type) over (partition by symbolup, date) as mint,
max(type) over (partition by symbolup, date) as maxt,
min(price) over (partition by symbolup, date) as minp,
max(price) over (partition by symbolup, date) as maxp
from duppri d
) d
where minp <> maxp
order by date;
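To check the logic against the sample rows from the question, here is a Python sketch of the same min/max comparison per (symbolUP, date) group. The tuple layout and the function name flag_matches are assumptions, and Python string comparison is case-sensitive, whereas the SQL comparison depends on the database collation.

```python
from collections import defaultdict

# Rows from the sample table in the question:
# (date, type, symbol, symbol_up, price)
rows = [
    ("6/30/1995", "gaus", "313586U72", "313586U72", 109.25),
    ("6/30/1995", "gbus", "313586U72", "313586U72", 108.94),
    ("6/30/1995", "agus", "SRR", "SRR", 10.25),
    ("6/30/1995", "lcus", "SRR", "SRR", 0.45),
    ("11/27/1996", "lcus", "LLY", "LLY", 76.37),
    ("11/27/1996", "lcus", "lly", "LLY", 76.0),
    ("11/28/1996", "lcus", "LLY", "LLY", 76.37),
    ("11/28/1996", "lcus", "lly", "LLY", 76.0),
]


def flag_matches(rows):
    """Keep groups whose prices differ and label them by whether the types agree."""
    groups = defaultdict(list)
    for row in rows:
        date, _type, _symbol, symbol_up, _price = row
        groups[(symbol_up, date)].append(row)  # partition by symbolup, date

    flagged = []
    for group in groups.values():
        types = [r[1] for r in group]
        prices = [r[4] for r in group]
        if min(prices) == max(prices):
            continue  # corresponds to: where minp <> maxp
        label = "Matched" if min(types) == max(types) else "Different"
        flagged.extend(r + (label,) for r in group)
    return flagged


result = flag_matches(rows)
for row in result:
    print(row)
```

On this data the first four rows come out as Different and the four LLY rows as Matched, which is the IsMatch column the question asks for.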
The subquery used with the exists predicate can't and won't return anything other than true/false but you can accomplish what you want using a subquery like this, which should work:
select
*,
(select
CASE when count(distinct type) = 1 THEN 'Matched' ELSE 'Different' END
from duppri
where symbol = t.symbol and date = t.date
) IsMatch
from duppri t
where exists (
select 1
from duppri
where symbol = t.symbol
and price <> t.price);