Get max(timestamp) for each group with repetition (IBM DB2) - sql

ANSWER: Use the LEAD window function with a PARTITION BY clause. Please refer to Gordon's answer.
I have the following dataset:
I want to get rows 1, 3, 5 and 7 in the result set.
The result set should look like:
11/10/2020 19:36:11.548955 IN_REVIEW
11/8/2020 19:36:11.548955 EXPIRED
11/6/2020 19:36:11.548955 IN_REVIEW
11/4/2020 19:36:11.548955 ACTIVE

Use window functions. LEAD() gets the value from the "next" row, so keep only the rows where the value changes (or where there is no next row):
SELECT t.*
FROM (SELECT t.*,
             LEAD(interac_Reg_stat) OVER (PARTITION BY Acct_No ORDER BY xcn_tmstmp) AS next_interac_Reg_stat
      FROM your_table t  -- placeholder table name; the alias lets the inner t.* resolve
     ) t
WHERE interac_Reg_stat <> next_interac_Reg_stat OR
      next_interac_Reg_stat IS NULL;
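To see the LEAD() filter in action, here is a minimal sketch that runs the same idea against SQLite from Python. The table name reg_status and the eight sample rows are assumptions (the original dataset was posted as an image), chosen so that rows 1, 3, 5 and 7 match the expected result in the question. It needs SQLite 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reg_status (acct_no INTEGER, xcn_tmstmp TEXT, interac_reg_stat TEXT)"
)
# Assumed sample rows: the original dataset was not included in the post.
conn.executemany(
    "INSERT INTO reg_status VALUES (?, ?, ?)",
    [
        (1, "2020-11-10 19:36:11.548955", "IN_REVIEW"),
        (1, "2020-11-09 19:36:11.548955", "IN_REVIEW"),
        (1, "2020-11-08 19:36:11.548955", "EXPIRED"),
        (1, "2020-11-07 19:36:11.548955", "EXPIRED"),
        (1, "2020-11-06 19:36:11.548955", "IN_REVIEW"),
        (1, "2020-11-05 19:36:11.548955", "IN_REVIEW"),
        (1, "2020-11-04 19:36:11.548955", "ACTIVE"),
        (1, "2020-11-03 19:36:11.548955", "ACTIVE"),
    ],
)

# Keep a row when the next row (by timestamp, per account) has a different
# status, or when there is no next row at all.
query = """
SELECT xcn_tmstmp, interac_reg_stat
FROM (SELECT t.*,
             LEAD(interac_reg_stat) OVER (PARTITION BY acct_no
                                          ORDER BY xcn_tmstmp) AS next_stat
      FROM reg_status t) t
WHERE interac_reg_stat <> next_stat OR next_stat IS NULL
ORDER BY xcn_tmstmp DESC
"""
latest_per_run = conn.execute(query).fetchall()
for row in latest_per_run:
    print(row)
```

Each returned row is the latest timestamp of one uninterrupted run of a status, which is why IN_REVIEW can appear twice.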

Well, since you are grouping by interac_Reg_stat, you will never get 2 separate rows for the IN_REVIEW status. To get the result you want, you will need to add a sub-query to find all items with a certain interac_Reg_stat that are before another specific interac_Reg_stat.

Related

How to implement lag function in Teradata

Input :
Output :
I want the output as shown in the image below.
In the output image, the 4 in 'behind' is computed as tot_cnt - tot, and each subsequent value in 'behind' (e.g. 2) is computed as lag(behind) - tot. As long as 'rank' stays the same, 'behind' should stay the same too.
Can anyone please help me implement this in Teradata?
You appear to want:
select t.*, (select count(*)
from table t1
where t1.rank > t.rank
) as behind
from table t;
I would summarize the data and do:
select id, max(tot_cnt), max(tot),
(max(tot_cnt) -
sum(max(tot)) over (order by id rows between unbounded preceding and current row)
) as diff
from t
group by id;
This provides one row per id, which makes a lot more sense to me. If you want the original data rows (which are all duplicates anyway), you can join this back to your table.
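As a rough illustration of the summarised query, the sketch below runs it against SQLite from Python. The input table was only shown as an image, so the rows here are assumed: tot_cnt is 6 everywhere and each id is duplicated. It needs SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, tot_cnt INTEGER, tot INTEGER)")
# Assumed sample rows (duplicates per id, as the answer describes).
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 6, 2), (1, 6, 2), (2, 6, 2), (2, 6, 2), (3, 6, 1)],
)

# Collapse to one row per id, then subtract the running sum of tot
# (a window function applied to the aggregated rows) from tot_cnt.
query = """
SELECT id, MAX(tot_cnt), MAX(tot),
       MAX(tot_cnt) -
       SUM(MAX(tot)) OVER (ORDER BY id
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS diff
FROM t
GROUP BY id
ORDER BY id
"""
behind_per_id = conn.execute(query).fetchall()
for row in behind_per_id:
    print(row)
```

With these assumed rows the diff column goes 4, 2, 1: the first value is tot_cnt - tot, and each later value is the previous diff minus the current tot.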

Teradata error 3504 (non-aggregate values must be part of group) when using windowing function

So I wrote a query that uses a window function and I keep getting error 3504 in Teradata, even though I'm sure I have the correct columns in the group by clause (all non-aggregate columns). It has something to do with the windowing function I'm using, because when I comment it out I don't get the error, but I have no idea how to resolve it.
This is the query:
select
n.acct_id as bd_acct_id
,n.tran_nr as tran_order
,t.trade_dt - n.tran_dt as days_until_trade
,n.n_total
,sum(t.trade_ct) as trades_ct
,sum(t.trade_gross_am) as tot_trades
,sum(t.trade_gross_am) over (partition by bd_acct_id, tran_order order by tran_order) as running_total
from nnae n
left join trades t
on n.acct_id = t.acct_id
having days_until_trade > 0
group by 1,2,3,4
order by 1,2,3
Would appreciate any help. Thanks!
Presumably, you intend something like this:
sum(sum(t.trade_gross_am)) over (partition by n.acct_id, n.tran_nr
order by min(n.tran_dt)
rows between unbounded preceding and current row
) as running_total
It seems odd to have a running total, without the date column explicitly in the result set.
Also, I replaced the aliases with the original column names. Not all databases support aliases in window functions, so this is just a habit I'm used to.
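The nesting is the key point: the inner SUM() is the GROUP BY aggregate and the outer SUM() ... OVER is the window function applied to the grouped rows. Below is a minimal sketch of that pattern, run against SQLite from Python rather than Teradata. Several things are assumptions made for the demo: the sample rows, dates stored as integer day numbers so they can be subtracted, a WHERE filter instead of HAVING, and ordering the window by the day offset so the running total is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nnae (acct_id INTEGER, tran_nr INTEGER, tran_dt INTEGER)")
conn.execute(
    "CREATE TABLE trades (acct_id INTEGER, trade_dt INTEGER, "
    "trade_ct INTEGER, trade_gross_am INTEGER)"
)
# Assumed sample rows; dates are plain day numbers for simplicity.
conn.execute("INSERT INTO nnae VALUES (1, 1, 10)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?, ?)",
    [(1, 12, 1, 50), (1, 12, 1, 25), (1, 15, 2, 10), (1, 9, 1, 99)],
)

# Inner SUM = aggregate per group; outer SUM ... OVER = running total
# across the grouped rows of each (acct_id, tran_nr) partition.
query = """
SELECT n.acct_id, n.tran_nr,
       t.trade_dt - n.tran_dt AS days_until_trade,
       SUM(t.trade_ct) AS trades_ct,
       SUM(t.trade_gross_am) AS tot_trades,
       SUM(SUM(t.trade_gross_am)) OVER (
           PARTITION BY n.acct_id, n.tran_nr
           ORDER BY t.trade_dt - n.tran_dt
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_total
FROM nnae n
JOIN trades t ON n.acct_id = t.acct_id
WHERE t.trade_dt - n.tran_dt > 0
GROUP BY n.acct_id, n.tran_nr, t.trade_dt - n.tran_dt
ORDER BY 1, 2, 3
"""
running = conn.execute(query).fetchall()
for row in running:
    print(row)
```

The trade dated before the transaction (day 9) is filtered out, and the two remaining groups accumulate 75 and then 85 in running_total.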

How to compare ordered datasets with the dataset before?

I have the following query:
select * from events order by Source, DateReceived
This gives me something like this:
I would like to get the rows I marked in blue, i.e. those where two or more equal ErrorNr entries follow each other within the same Source.
So I have to compare every row with the row before it. How can I achieve that?
This is what I want to get:
Apply the row number over partition by option on your table:
SELECT
ROW_NUMBER() OVER(PARTITION BY Source ORDER BY datereceived)
AS Row,
* FROM events
If you only need to know which sources qualify, you can group the result and keep those with HAVING MAX(Row) > 1. If you need the details, run the same query a second time with 1 subtracted from the row number.
Then join the two results on the source and the row number; wherever the ErrorNr is the same, you have a hit.
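The join on source and row number can be sketched roughly as follows, using SQLite from Python. The column names come from the question; the sample rows are assumed, since the original data was an image. Note that this version returns only the repeating rows, not the first row of each run. It needs SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (Source TEXT, DateReceived TEXT, ErrorNr INTEGER)")
# Assumed sample rows.
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("A", "2020-01-01", 5),
        ("A", "2020-01-02", 5),
        ("A", "2020-01-03", 7),
        ("A", "2020-01-04", 5),
        ("B", "2020-01-01", 7),
        ("B", "2020-01-02", 7),
        ("B", "2020-01-03", 7),
    ],
)

# Number the rows per source, then join each row to its predecessor
# (rn - 1) and keep it when both carry the same ErrorNr.
query = """
WITH numbered AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY Source ORDER BY DateReceived) AS rn,
           Source, DateReceived, ErrorNr
    FROM events
)
SELECT cur.Source, cur.DateReceived, cur.ErrorNr
FROM numbered cur
JOIN numbered prev
  ON prev.Source = cur.Source
 AND prev.rn = cur.rn - 1
 AND prev.ErrorNr = cur.ErrorNr
ORDER BY cur.Source, cur.DateReceived
"""
repeats = conn.execute(query).fetchall()
for row in repeats:
    print(row)
```

The row ("A", "2020-01-04", 5) is not returned, because its direct predecessor has ErrorNr 7; only entries that immediately follow an equal entry count as hits.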
You can use the partition by as below.
select * from(select
*,row_number()over(partition by source,errornr order by Source, DateReceived) r
from
[yourtable])t
where r>1
You can specify your column names in the outer select.

Select finishes where athlete didn't finish first for the past 3 events

Suppose I have a database of athletic meeting results with a schema as follows
DATE,NAME,FINISH_POS
I wish to do a query to select all rows where an athlete has competed in at least three events without winning. For example with the following sample data
2013-06-22,Johnson,2
2013-06-21,Johnson,1
2013-06-20,Johnson,4
2013-06-19,Johnson,2
2013-06-18,Johnson,3
2013-06-17,Johnson,4
2013-06-16,Johnson,3
2013-06-15,Johnson,1
The following rows:
2013-06-20,Johnson,4
2013-06-19,Johnson,2
Would be matched. I have only managed to get started at the following stub:
select date,name FROM table WHERE ...;
I've been trying to wrap my head around the WHERE clause, but I can't even get started.
I think this can be even simpler / faster:
SELECT day, place, athlete
FROM (
SELECT *, min(place) OVER (PARTITION BY athlete
ORDER BY day
ROWS 3 PRECEDING) AS best
FROM t
) sub
WHERE best > 1
Uses the aggregate function min() as window function to get the minimum place of the last three rows plus the current one.
The then-trivial check for "no win" (best > 1) has to be done on the next query level, since window functions are applied after the WHERE clause. So you need at least one CTE or sub-select for a condition on the result of a window function.
Details about window function calls in the manual here. In particular:
If frame_end is omitted it defaults to CURRENT ROW.
If place (finishing_pos) can be NULL, use this instead:
WHERE best IS DISTINCT FROM 1
min() ignores NULL values, but if all rows in the frame are NULL, the result is NULL.
Don't use type names and reserved words as identifiers, I substituted day for your date.
This assumes at most 1 competition per day, else you have to define how to deal with peers in the time line or use timestamp instead of date.
@Craig already mentioned the index to make this fast.
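For a quick check of the min() window approach, here is a small sketch that loads the sample data from the question into SQLite via Python. The table name results is an assumption; the columns follow the day / athlete / place naming used in the query above. It needs SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (day TEXT, athlete TEXT, place INTEGER)")
# Sample data taken from the question.
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [
        ("2013-06-22", "Johnson", 2),
        ("2013-06-21", "Johnson", 1),
        ("2013-06-20", "Johnson", 4),
        ("2013-06-19", "Johnson", 2),
        ("2013-06-18", "Johnson", 3),
        ("2013-06-17", "Johnson", 4),
        ("2013-06-16", "Johnson", 3),
        ("2013-06-15", "Johnson", 1),
    ],
)

# best = minimum place over the current row and the three rows before it;
# the filter on best sits one level up because WHERE runs before windows.
query = """
SELECT day, athlete, place
FROM (
    SELECT *, min(place) OVER (PARTITION BY athlete
                               ORDER BY day
                               ROWS 3 PRECEDING) AS best
    FROM results
) sub
WHERE best > 1
ORDER BY day
"""
no_recent_win = conn.execute(query).fetchall()
for row in no_recent_win:
    print(row)
```

This returns the two rows the question expects (2013-06-19 and 2013-06-20), since every other row has a first place within its four-row frame.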
Here's an alternative formulation that does the work in two scans without subqueries:
SELECT
"date", athlete, place
FROM (
SELECT
"date",
place,
athlete,
1 <> ALL (array_agg(place) OVER w) AS include_row
FROM Table1
WINDOW w AS (PARTITION BY athlete ORDER BY "date" ASC ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
) AS history
WHERE include_row;
See: http://sqlfiddle.com/#!1/fa3a4/34
The logic here is pretty much a literal translation of the question. Get the last four placements - current and the previous 3 - and return any rows in which the athlete didn't finish first in any of them.
Because the window frame is the only place where the number of rows of history to consider is defined, you can parameterise this variant unlike my previous effort (obsolete, http://sqlfiddle.com/#!1/fa3a4/31), so it works for the last n for any n. It's also a lot more efficient than the last try.
I'd be really interested in the relative efficiency of this vs @Andomar's query when executed on a dataset of non-trivial size. They're pretty much exactly the same on this tiny dataset. An index on Table1(athlete, "date") would be required for this to perform optimally on a large data set.
; with CTE as
(
select row_number() over (partition by athlete order by date) rn
, *
from Table1
)
select *
from CTE cur
where not exists
(
select *
from CTE prev
where prev.place = 1
and prev.athlete = cur.athlete
and prev.rn between cur.rn - 3 and cur.rn
)
Live example at SQL Fiddle.

Oracle Group by issue

I have the query below. The problem is that the subquery for the last column, productdescr, returns two distinct records, so the query fails. I need to add one more column to the WHERE clause of that subquery so that it returns one record. The issue is that the column I need to add should not be part of the GROUP BY clause.
SELECT product_billing_id,
billing_ele,
SUM(round(summary_net_amt_excl_gst/100)) gross,
(SELECT DISTINCT description
FROM RES.tariff_nt
WHERE product_billing_id = aa.product_billing_id
AND billing_ele = aa.billing_ele) productdescr
FROM bil.bill_sum aa
WHERE file_id = 38613 --1=1
AND line_type = 'D'
AND (product_billing_id, billing_ele) IN (SELECT DISTINCT
product_billing_id,
billing_ele
FROM bil.bill_l2 )
AND trans_type_desc <> 'Change'
GROUP BY product_billing_id, billing_ele
I want to modify the subquery as shown below, adding a new filter to its WHERE clause so that it returns one record.
(SELECT DISTINCT description
FROM RES.tariff_nt
WHERE product_billing_id = aa.product_billing_id
AND billing_ele = aa.billing_ele
AND (rate_structure_start_date <= TO_DATE(aa.p_effective_date,'yyyymmdd')
AND rate_structure_end_date > TO_DATE(aa.p_effective_date,'yyyymmdd'))
) productdescr
The aa.p_effective_date should not be a part of GROUP BY clause. How can I do it? Oracle is the Database.
So there are multiple RES.tariff_nt records for a given product_billing_id/billing_ele, differentiated by their start/end dates.
You want the description for the record that encompasses the p_effective_date from bil.bill_sum. The kicker is that you can't (or don't want to) include that column in the GROUP BY. That suggests you've got multiple rows in bil.bill_sum with different effective dates.
The question is what you want to happen when you summarise those multiple rows with different dates: which of those dates should be used to look up the description?
If it doesn't matter, simply use MIN(aa.p_effective_date), or MAX.
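One way to apply the MIN() idea is to aggregate first and then look up the description for the chosen date. The sketch below does that in SQLite from Python rather than Oracle, so it is only an approximation: the schema prefixes are dropped, the sample rows are assumed, and all dates are stored as 'yyyymmdd' strings (which compare correctly as text) instead of Oracle DATE values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bill_sum (product_billing_id INTEGER, billing_ele TEXT, "
    "summary_net_amt_excl_gst INTEGER, p_effective_date TEXT)"
)
conn.execute(
    "CREATE TABLE tariff_nt (product_billing_id INTEGER, billing_ele TEXT, "
    "description TEXT, rate_structure_start_date TEXT, rate_structure_end_date TEXT)"
)
# Assumed sample rows: two bill lines with different effective dates,
# and two tariff periods for the same product/element.
conn.executemany(
    "INSERT INTO bill_sum VALUES (?, ?, ?, ?)",
    [(1, "E1", 10000, "20200115"), (1, "E1", 5000, "20200720")],
)
conn.executemany(
    "INSERT INTO tariff_nt VALUES (?, ?, ?, ?, ?)",
    [
        (1, "E1", "Old plan", "20200101", "20200701"),
        (1, "E1", "New plan", "20200701", "20210101"),
    ],
)

# Step 1: group and pick one effective date per group with MIN().
# Step 2: use that single date to select the matching tariff period.
query = """
WITH summed AS (
    SELECT product_billing_id, billing_ele,
           SUM(ROUND(summary_net_amt_excl_gst / 100.0)) AS gross,
           MIN(p_effective_date) AS eff_date
    FROM bill_sum
    GROUP BY product_billing_id, billing_ele
)
SELECT s.product_billing_id, s.billing_ele, s.gross,
       (SELECT tn.description
        FROM tariff_nt tn
        WHERE tn.product_billing_id = s.product_billing_id
          AND tn.billing_ele = s.billing_ele
          AND tn.rate_structure_start_date <= s.eff_date
          AND tn.rate_structure_end_date > s.eff_date) AS productdescr
FROM summed s
"""
summary = conn.execute(query).fetchall()
for row in summary:
    print(row)
```

Because p_effective_date is aggregated rather than grouped, it never has to appear in the GROUP BY. Swapping MIN for MAX in this assumed data would pick "New plan" instead.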
Have you looked into the Oracle analytic functions? A good starting point is "Analytical Functions by Example".