Aggregating values in SQL

I'm trying to aggregate a customer's bankruptcy status as yes (Y), no (N), or no data (N/D) using a window function in the subquery below. In an edge case where one record classifies the customer as not bankrupt (N) and another record on the same CDate classifies it as no data (N/D), the final aggregated value should be N/D, but I get N instead, because I partition the customer's records ordered by IsBankrupt ascending (asc). The logic that is supposed to be implemented:
Y and Y = Y;
Y and N = Y;
N and N = N;
Y and N/D = Y;
N and N/D = N/D
with sample as (
select date('2020-12-31') as CDate, 123 as CustomerID, 'N/D' as IsBankrupt
union all
select date('2020-12-31') as CDate, 123 as CustomerID, 'N' as IsBankrupt)
select CDate, CustomerID, IsBankrupt, case when CustomerID = 123 then 'N/D' end as ExpectedResult
from
(
select CDate, CustomerID, IsBankrupt,
row_number() over (partition by CustomerID, CDate order by IsBankrupt asc) as flag
from sample
) subsample
where flag = 1
output:
CDate       CustomerID  IsBankrupt  ExpectedOutput
2020-12-31  123         N           N/D
All the other cases of the logic above work. So the question is: how can I update my row_number() over (partition by ...) clause so that the logic doesn't break down?

I would suggest aggregation:
select cdate, customerid,
(case when sum(case when IsBankrupt = 'Y' then 1 else 0 end) > 0
then 'Y'
when sum(case when IsBankrupt = 'N/D' then 1 else 0 end) > 0
then 'N/D'
else 'N'
end) as new_flag
from t
group by cdate, customerid;
If you don't like the nested case expressions, you can actually do this based on the ordering of the values:
select cdate, customerid,
max(IsBankrupt) as new_flag
from t
group by cdate, customerid;
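To sanity-check both variants, here is a small sqlite3 sketch (table and column names follow the question; the sample date is assumed to be the 2020-12-31 shown in the expected output):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample (CDate TEXT, CustomerID INTEGER, IsBankrupt TEXT);
INSERT INTO sample VALUES
    ('2020-12-31', 123, 'N/D'),
    ('2020-12-31', 123, 'N');
""")

# Nested-CASE variant: Y wins over N/D, which wins over N.
nested = conn.execute("""
    SELECT CDate, CustomerID,
           CASE WHEN SUM(CASE WHEN IsBankrupt = 'Y'   THEN 1 ELSE 0 END) > 0 THEN 'Y'
                WHEN SUM(CASE WHEN IsBankrupt = 'N/D' THEN 1 ELSE 0 END) > 0 THEN 'N/D'
                ELSE 'N' END AS new_flag
    FROM sample GROUP BY CDate, CustomerID
""").fetchall()

# MAX() variant: relies on the strings sorting as 'N' < 'N/D' < 'Y'.
shortcut = conn.execute("""
    SELECT CDate, CustomerID, MAX(IsBankrupt) AS new_flag
    FROM sample GROUP BY CDate, CustomerID
""").fetchall()

print(nested)    # [('2020-12-31', 123, 'N/D')]
print(shortcut)  # [('2020-12-31', 123, 'N/D')]
```

Both produce N/D for the edge case from the question, since 'N/D' sorts after 'N' and before 'Y'.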

Related

SQL to return 1 or 0 depending on values in a column's audit trail

If I were to have a table such as the one below:
id_  last_updated_by
1    robot
1    human
1    robot
2    robot
3    robot
3    human
Using SQL, how could I group by the ID and create a new column to indicate whether a human has ever updated the record like this:
id_  last_updated_by  updated_by_human
1    robot            1
2    robot            0
3    robot            1
UPDATE
I'm currently doing the following, though I'm not sure how efficient this is. Selecting the latest record and then merging it with my calculated column via a sub-select.
SELECT MAIN.TRANSACTION_ID,
MAIN.CREATED_DATE,
MAIN.CREATED_BY_USER_ID,
MAIN.OWNER_USER_ID,
STP.TOUCHED_BY_HUMAN
FROM (
SELECT TRANSACTION_ID,
CREATED_DATE,
CREATED_BY_USER_ID,
OWNER_USER_ID
FROM TABLE_NAME
WHERE CREATED_DATE >= CAST('{start_date} 00:00:00' AS TIMESTAMP)
AND CREATED_DATE <= CAST('{end_date} 23:59:59' AS TIMESTAMP)
QUALIFY row_number() OVER (partition by TRANSACTION_ID order by End_Dt desc) = 1
) MAIN
LEFT JOIN (
SELECT TRANSACTION_ID,
CASE
WHEN CREATED_BY_USER_ID IN ('ROBOT', 'MACHINE') OR
CREATED_BY_USER_ID LIKE 'N%' OR
CREATED_BY_USER_ID IS NULL
THEN 0
ELSE 1 END AS CREATED_BY_HUMAN,
CASE
WHEN OWNER_USER_ID IN ('ROBOT', 'MACHINE') OR
OWNER_USER_ID LIKE 'N%' OR
OWNER_USER_ID IS NULL
THEN 0
ELSE 1 END AS OWNED_BY_HUMAN,
CASE
WHEN CREATED_BY_HUMAN = 0 AND
OWNED_BY_HUMAN = 0
THEN 0
ELSE 1 END AS TOUCHED_BY_HUMAN
FROM TABLE_NAME
WHERE CREATED_DATE >= CAST('{start_date} 00:00:00' AS TIMESTAMP)
AND CREATED_DATE <= CAST('{end_date} 23:59:59' AS TIMESTAMP)
QUALIFY row_number() OVER (partition by TRANSACTION_ID order by TOUCHED_BY_HUMAN desc) = 1
) STP
ON MAIN.TRANSACTION_ID = STP.TRANSACTION_ID
If I'm following your problem, then something like this should work.
SELECT
t.*
,CASE WHEN a.id IS NOT NULL THEN 1 ELSE 0 END AS updated_by_human
FROM table t
LEFT JOIN (SELECT DISTINCT id FROM table WHERE last_updated_by = 'human') a ON t.id = a.id
That takes care of the updated_by_human field, but if you also need to reduce the records in table (only keeping a subset) then you need more information to do that.
EXISTS clauses are usually not that performant, but if your data isn't big this should work.
select id_,
IF (EXISTS (SELECT 1 FROM table_name t2 WHERE t2.last_updated_by = 'human' and t2.id_ = t1.id_), 1, 0) AS updated_by_human
from table_name t1;
Here is another way:
SELECT t1.id_
FROM table_name t1
GROUP BY t1.id_
HAVING MAX(CASE t1.last_updated_by WHEN 'human' THEN 1 ELSE 0 END) = 1;
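To see the flag-per-id shape the question asks for (rather than a filter), here is a sqlite3 sketch with the sample data from the question, using MAX over a CASE in the select list:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_name (id_ INTEGER, last_updated_by TEXT);
INSERT INTO table_name VALUES
    (1, 'robot'), (1, 'human'), (1, 'robot'),
    (2, 'robot'),
    (3, 'robot'), (3, 'human');
""")

# MAX(CASE ...) in the SELECT list keeps every id and attaches the flag,
# instead of filtering ids out with HAVING.
rows = conn.execute("""
    SELECT id_,
           MAX(CASE last_updated_by WHEN 'human' THEN 1 ELSE 0 END) AS updated_by_human
    FROM table_name
    GROUP BY id_
    ORDER BY id_
""").fetchall()

print(rows)  # [(1, 1), (2, 0), (3, 1)]
```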
You didn't specify which column determines the newest record for a given id, so I assume there is a column tracking the insert/modify timestamp (which is pretty standard table design); let's call it last_update_timestamp. If you don't have one, I'd still urge you to add it, since an audit trail without a timestamp does not make much sense.
Given your table name is updating_trail
SELECT updating_trail.*, last_update_trail.modified_by_human
FROM updating_trail
INNER JOIN (
-- determine the id_, the latest modified timestamp, and a flag that is 1 if any record has last_updated_by = 'human'
SELECT updating_trail.id_, MAX(last_update_timestamp) AS most_recent_update_ts, MAX(CASE WHEN updating_trail.last_updated_by = 'human' THEN 1 ELSE 0 END) AS modified_by_human
FROM updating_trail
GROUP BY updating_trail.id_
) last_update_trail
ON updating_trail.id_ = last_update_trail.id_ AND updating_trail.last_update_timestamp = last_update_trail.most_recent_update_ts;
Giving:
id_  last_updated_by  last_update_timestamp     modified_by_human
1    robot            2021-10-19T20:00:00.000Z  1
2    robot            2021-10-19T17:00:00.000Z  0
3    robot            2021-10-19T16:00:00.000Z  1
Check out this sample db fiddle I created for you
This is a 1:1 translation of your query to conditional aggregation:
SELECT TRANSACTION_ID,
CREATED_DATE,
CREATED_BY_USER_ID,
OWNER_USER_ID,
Max(CASE
WHEN CREATED_BY_USER_ID IN ('ROBOT', 'MACHINE') OR
CREATED_BY_USER_ID LIKE 'N%' OR
CREATED_BY_USER_ID IS NULL
THEN 0
ELSE 1
END) Over (PARTITION BY TRANSACTION_ID) AS CREATED_BY_HUMAN
FROM Table_Name
WHERE CREATED_DATE >= Cast('{start_date} 00:00:00' AS TIMESTAMP)
AND CREATED_DATE <= Cast('{end_date} 23:59:59' AS TIMESTAMP)
QUALIFY Row_Number() Over (PARTITION BY TRANSACTION_ID ORDER BY End_Dt DESC) = 1
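The shape of this conditional window aggregate can be illustrated with sqlite3 (the sample rows and end_dt values are invented for the demo; sqlite has no QUALIFY, so a wrapper subquery stands in for the row filter):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE txns (transaction_id INTEGER, end_dt TEXT, created_by_user_id TEXT);
INSERT INTO txns VALUES
    (1, '2021-01-01', 'ROBOT'),
    (1, '2021-01-02', 'alice'),
    (2, '2021-01-01', 'MACHINE');
""")

# MAX(CASE ...) OVER (PARTITION BY ...) computes the per-transaction flag
# on every row; the outer query then keeps only the latest row.
rows = conn.execute("""
    SELECT transaction_id, end_dt, created_by_human
    FROM (
        SELECT transaction_id, end_dt,
               MAX(CASE WHEN created_by_user_id IN ('ROBOT', 'MACHINE')
                          OR created_by_user_id LIKE 'N%'
                          OR created_by_user_id IS NULL
                        THEN 0 ELSE 1 END)
                   OVER (PARTITION BY transaction_id) AS created_by_human,
               ROW_NUMBER() OVER (PARTITION BY transaction_id
                                  ORDER BY end_dt DESC) AS rn
        FROM txns
    )
    WHERE rn = 1
    ORDER BY transaction_id
""").fetchall()

print(rows)  # [(1, '2021-01-02', 1), (2, '2021-01-01', 0)]
```

Transaction 1 was touched by a human on one of its rows, so its surviving latest row carries flag 1; transaction 2 only ever saw a machine, so it carries 0.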

Why error: 01428. 00000 - "argument '%s' is out of range" in SQL Developer

I have the following SQL script,
Select * From
(Select To_Char(Bmret.Pricedate, 'dd-mm-yyyy') As Pricedate, Bmret.Bmval, Bmret.id
, Cast(Exp(Sum(Ln(Cast(Bmret.Bmval As number))) Over (Partition By bmret.id)) As Number) As Twr
, RANK() OVER (PARTITION BY bmret.id ORDER BY bmret.pricedate asc) AS rank
From Tab_A Bmret
Where 1=1
) B
Where 1=1
And B.Rank=1
;
This provides me with the desired result: a column, twr, containing the product of the elements in column Bmval across pricedates, grouped by id.
However, I obtain the following error: 01428. 00000 - "argument '%s' is out of range".
I am aware that the error stems from the part Cast(Exp(Sum(Ln(Cast(Bmret.Bmval As number))) Over (Partition By bmret.id)) As Number) of the code and in particular that the "parameter passed into the function was not a valid value". Hence, my question is, is there any way to identify the id with values that are not valid?
I am not allowed to share the sample data, sorry.
Please check the value of Cast(Bmret.Bmval As number). It must be greater than 0.
For further read:
https://www.techonthenet.com/oracle/functions/ln.php
Oracle / PLSQL: LN Function This Oracle tutorial explains how to use
the Oracle/PLSQL LN function with syntax and examples.
Description The Oracle/PLSQL LN function returns the natural logarithm
of a number.
Syntax The syntax for the LN function in Oracle/PLSQL is:
LN( number ) Parameters or Arguments number The numeric value used to
calculate the natural logarithm. It must be greater than 0.
You need to define what Ln(Cast(Bmret.Bmval As number)) should be when Bmret.Bmval <= 0. If you define it as 0 (which might not be correct for the calculation), then your query would be:
Select * From
(Select To_Char(Bmret.Pricedate, 'dd-mm-yyyy') As Pricedate, Bmret.Bmval, Bmret.id
, Cast(Exp(Sum(case when Cast(Bmret.Bmval As number)>0 then Ln(Cast(Bmret.Bmval As number)) else 0 end) Over (Partition By bmret.id)) As Number) As Twr
, RANK() OVER (PARTITION BY bmret.id ORDER BY bmret.pricedate asc) AS rank
From Tab_A Bmret
Where 1=1
) B
Where 1=1
And B.Rank=1;
As #Kazi said, and as earlier answers had already mentioned, the issue is with using ln() with a negative number or zero. The documentation says:
LN returns the natural logarithm of n, where n is greater than 0.
so you can identify the IDs with out-of-range values with:
select id from tab_a where bmval <= 0
As you want the product of several numbers, you probably still want to include those values; but then having a zero amongst them should make the result zero, one negative number should make the result negative, two should make it positive, etc.
You can use the absolute value of your numbers for the calculation, and at the same time count how many negative values there are - then if that count of negatives is an odd number, multiply the whole result by -1.
Adapting the answer to your previous question, and changing the table and column names to match this question, that would be:
select to_char(a1.pricedate, 'dd-mm-yyyy') as pricedate, b1.bm, a1.bmval,
round(cast(exp(sum(ln(cast(abs(a1.bmval) as binary_double))) over (partition by b1.bmik)) as number))
*
case
when mod(count(case when a1.bmval < 0 then pricedate end) over (partition by b1.bmik), 2) = 0
then 1
else -1
end as product
from tab_a a1
inner join benchmarkdefs b1 on (a1.id = b1.bmik);
db<>fiddle with a group that has two negatives (which cancel out), one negative (which is applied), and one with a zero - where the product ends up as zero, as you'd hopefully expect.
The point of the cast() calls was to improve performance, as noted in the old question I linked to, by performing the exp/ln part as binary_double; there is no point casting a number to number. If you don't want the binary_double part then you can take the casts out completely; but then you do also have to deal with zeros as well as negative values, e.g. keeping track of whether you have any of those too:
select to_char(a1.pricedate, 'dd-mm-yyyy') as pricedate, b1.bm, a1.bmval,
round(exp(sum(ln(abs(nullif(a1.bmval, 0)))) over (partition by b1.bmik)))
*
case when min(abs(a1.bmval)) over (partition by b1.bmik) = 0 then 0 else 1 end
*
case
when mod(count(case when a1.bmval < 0 then pricedate end) over (partition by b1.bmik), 2) = 0
then 1
else -1
end as product
from tab_a a1
inner join benchmarkdefs b1 on (a1.id = b1.bmik);
db<>fiddle
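The sign and zero bookkeeping is easier to follow outside the database; here is a small Python sketch of the same identity the query uses (exp of a sum of logs equals a product, with zeros and the sign handled separately since ln() requires arguments greater than 0):

```python
import math

def product_via_logs(values):
    # Mirror of the SQL: exp(sum(ln(abs(x)))) reproduces the product of the
    # magnitudes; zeros short-circuit to 0, and an odd count of negative
    # inputs flips the sign of the result.
    if any(v == 0 for v in values):
        return 0
    negatives = sum(1 for v in values if v < 0)
    magnitude = round(math.exp(sum(math.log(abs(v)) for v in values)))
    return -magnitude if negatives % 2 else magnitude

print(product_via_logs([1, 2, 3, 4, 5]))  # 120
print(product_via_logs([2, -3, 4]))       # -24
print(product_via_logs([7, 0, 9]))        # 0
```

The round() matters because exp/ln work in floating point, so the recovered product is only approximately an integer.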
For this query, which just gets values for the first date and product across all dates, that would translate (with casting) to:
select * from
(
select to_char(bmret.pricedate, 'dd-mm-yyyy') as pricedate, bmret.bmval, bmret.id
, round(exp(sum(ln(abs(nullif(bmret.bmval, 0)))) over (partition by bmret.id)))
*
case when min(abs(bmret.bmval)) over (partition by bmret.id) = 0 then 0 else 1 end
*
case
when mod(count(case when bmret.bmval < 0 then pricedate end) over (partition by bmret.id), 2) = 0
then 1
else -1
end as twr
, rank() over (partition by bmret.id order by bmret.pricedate asc) as rank
from tab_a bmret
) b
where b.rank=1
PRICEDATE   BMVAL  ID  TWR       RANK
11-08-2021  1      1   120       1
11-08-2021  12     2   524160    1
11-08-2021  22     3   -7893600  1
11-08-2021  1      4   0         1
db<>fiddle
As you were told in an old answer, if you don't want to see the (not very interesting) rank column then change select * from to select pricedate, bmval, id, twr from in the outer query.
You could also use aggregation with keep to avoid needing an inline view:
select to_char(min(pricedate), 'dd-mm-yyyy') as pricedate
, min(bmret.bmval) keep (dense_rank first order by pricedate) as bmval
, min(bmret.id) keep (dense_rank first order by pricedate) as id
, round(exp(sum(ln(abs(nullif(bmret.bmval, 0))))))
*
case when min(abs(bmret.bmval)) = 0 then 0 else 1 end
*
case
when mod(count(case when bmret.bmval < 0 then pricedate end), 2) = 0
then 1
else -1
end as twr
from tab_a bmret
group by bmret.id
PRICEDATE   BMVAL  ID  TWR
11-08-2021  1      1   120
11-08-2021  12     2   524160
11-08-2021  22     3   -7893600
11-08-2021  1      4   0
db<>fiddle

Check whether an employee is present on three consecutive days

I have a table called tbl_A with the following schema:
After insert, I have the following data in tbl_A:
Now the question is how to write a query for the following scenario:
Put (1) in front of any employee who was present three days consecutively
Put (0) in front of employee who was not present three days consecutively
The output screenshot:
I think we should use a case expression, but I am not able to check for three consecutive days from the date column. I hope someone can help with this.
Thank you
select name, case when max(cons_days) >= 3 then 1 else 0 end as presence
from (
select name, count(*) as cons_days
from tbl_A, (values (0),(1),(2)) as a(dd)
group by name, adate + dd
)x
group by name
With a self-join on name and available = 'Y', we create an inner table with different combinations of dates for a given name, then count the pairs whose dates are at most 2 units apart. That is, for each date adate, it checks for entries with the values adate, adate + 1, and adate + 2; if all 3 entries are present, the count is 3, and the outer query sets the flag to 1 for such names. Try the query below:
SELECT Z.NAME,
CASE WHEN Z.CONSEQ_AVAIL >= 3 THEN 1 ELSE 0 END AS YOUR_FLAG
FROM
(
SELECT A.NAME,
SUM(CASE WHEN B.ADATE >= A.ADATE AND B.ADATE <= A.ADATE + 2 THEN 1 ELSE 0 END) AS CONSEQ_AVAIL
FROM
TBL_A A INNER JOIN TBL_A B
ON A.NAME = B.NAME AND A.AVAILABLE = 'Y' AND B.AVAILABLE = 'Y'
GROUP BY A.NAME
) Z;
Due to the complexity of the problem, I have not been able to test it out. If something is really wrong, please let me know and I will be happy to take down my answer.
Below is my approach:
select Name,
case when Max_Count >= 3 then 1 else 0 end as Presence
from
(
select Name, max(Coun) as Max_Count
from
(
select Name, count(*) over (partition by Name, Ref_Date) as Coun
from
(
select Name, adate + row_number() over (partition by Name order by Adate desc) as Ref_Date
from temp
where available = 'Y'
)
) group by Name
);
select name as employee, case when sum(diff) >= 2 then 1 else 0 end as presence
from
(select id, name, Available, Adate, lead(Adate, 1) over (partition by name order by Adate) as next_date,
case when datediff(day, Adate, lead(Adate, 1) over (partition by name order by Adate)) = 1 then 1 else 0 end as diff
from tbl_A
where Available = 'Y') A
group by name;
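The "date minus row number" islands trick used above can be checked with sqlite3 (the sample names and dates are invented; julianday() turns the text dates into numbers so the subtraction works):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_A (name TEXT, adate TEXT, available TEXT);
INSERT INTO tbl_A VALUES
    ('Alice', '2021-03-01', 'Y'),
    ('Alice', '2021-03-02', 'Y'),
    ('Alice', '2021-03-03', 'Y'),
    ('Bob',   '2021-03-01', 'Y'),
    ('Bob',   '2021-03-03', 'Y');
""")

# Islands trick: date minus an ascending row number is constant across a run
# of consecutive dates, so counting rows per (name, anchor) gives run length.
rows = conn.execute("""
    SELECT name, MAX(run_len) >= 3 AS presence
    FROM (
        SELECT name, COUNT(*) AS run_len
        FROM (
            SELECT name,
                   julianday(adate)
                     - ROW_NUMBER() OVER (PARTITION BY name ORDER BY adate) AS anchor
            FROM tbl_A
            WHERE available = 'Y'
        )
        GROUP BY name, anchor
    )
    GROUP BY name
    ORDER BY name
""").fetchall()

print(rows)  # [('Alice', 1), ('Bob', 0)]
```

Alice's three dates collapse onto one anchor (run of length 3); Bob's gap on 2021-03-02 splits his rows into two runs of length 1.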

Looping in select query

I want to do something like this:
select id,
count(*) as total,
FOR temp IN SELECT DISTINCT somerow FROM mytable ORDER BY somerow LOOP
sum(case when somerow = temp then 1 else 0 end) temp,
END LOOP;
from mytable
group by id
order by id
I created working select:
select id,
count(*) as total,
sum(case when somerow = 'a' then 1 else 0 end) somerow_a,
sum(case when somerow = 'b' then 1 else 0 end) somerow_b,
sum(case when somerow = 'c' then 1 else 0 end) somerow_c,
sum(case when somerow = 'd' then 1 else 0 end) somerow_d,
sum(case when somerow = 'e' then 1 else 0 end) somerow_e,
sum(case when somerow = 'f' then 1 else 0 end) somerow_f,
sum(case when somerow = 'g' then 1 else 0 end) somerow_g,
sum(case when somerow = 'h' then 1 else 0 end) somerow_h,
sum(case when somerow = 'i' then 1 else 0 end) somerow_i,
sum(case when somerow = 'j' then 1 else 0 end) somerow_j,
sum(case when somerow = 'k' then 1 else 0 end) somerow_k
from mytable
group by id
order by id
This works, but it is 'static': if a new value is added to somerow, I will have to change the SQL manually to pick up all the values from the somerow column, which is why I'm wondering whether something like a for loop is possible.
So what I want to get is this:
id somerow_a somerow_b ....
0 3 2 ....
1 2 10 ....
2 19 3 ....
. ... ...
. ... ...
. ... ...
So what I'd like to do is count all the rows containing each specific letter, grouped by id (this id isn't a primary key, but it repeats; there are about 80 different possible values for id).
http://sqlfiddle.com/#!15/18feb/2
Are arrays good for you? (SQL Fiddle)
select
id,
sum(totalcol) as total,
array_agg(somecol) as somecol,
array_agg(totalcol) as totalcol
from (
select id, somecol, count(*) as totalcol
from mytable
group by id, somecol
) s
group by id
;
id | total | somecol | totalcol
----+-------+---------+----------
1 | 6 | {b,a,c} | {2,1,3}
2 | 5 | {d,f} | {2,3}
In 9.2 it is possible to have a set of JSON objects (Fiddle)
select row_to_json(s)
from (
select
id,
sum(totalcol) as total,
array_agg(somecol) as somecol,
array_agg(totalcol) as totalcol
from (
select id, somecol, count(*) as totalcol
from mytable
group by id, somecol
) s
group by id
) s
;
row_to_json
---------------------------------------------------------------
{"id":1,"total":6,"somecol":["b","a","c"],"totalcol":[2,1,3]}
{"id":2,"total":5,"somecol":["d","f"],"totalcol":[2,3]}
In 9.3, with the addition of lateral, a single object (Fiddle)
select to_json(format('{%s}', (string_agg(j, ','))))
from (
select format('%s:%s', to_json(id), to_json(c)) as j
from
(
select
id,
sum(totalcol) as total_sum,
array_agg(somecol) as somecol_array,
array_agg(totalcol) as totalcol_array
from (
select id, somecol, count(*) as totalcol
from mytable
group by id, somecol
) s
group by id
) s
cross join lateral
(
select
total_sum as total,
somecol_array as somecol,
totalcol_array as totalcol
) c
) s
;
to_json
---------------------------------------------------------------------------------------------------------------------------------------
"{1:{\"total\":6,\"somecol\":[\"b\",\"a\",\"c\"],\"totalcol\":[2,1,3]},2:{\"total\":5,\"somecol\":[\"d\",\"f\"],\"totalcol\":[2,3]}}"
In 9.2 it is also possible to produce a single object, in a more convoluted way, using subqueries instead of lateral.
SQL is very rigid about the return type. It demands to know what to return beforehand.
For a completely dynamic number of resulting values, you can only use arrays like #Clodoaldo posted. Effectively a static return type, you do not get individual columns for each value.
If you know the number of columns at call time ("semi-dynamic"), you can create a function taking (and returning) polymorphic parameters. Closely related answer with lots of details:
Dynamic alternative to pivot with CASE and GROUP BY
(You also find a related answer with arrays from #Clodoaldo there.)
Your remaining option is to use two round-trips to the server: the first to determine the actual query with the actual return type, the second to execute the query built by the first call.
Else, you have to go with a static query. While doing that, I see two nicer options for what you have right now:
1. Simpler expression
select id
, count(*) AS total
, count(somerow = 'a' OR NULL) AS somerow_a
, count(somerow = 'b' OR NULL) AS somerow_b
, ...
from mytable
group by id
order by id;
How does it work?
Compute percents from SUM() in the same SELECT sql query
SQL Fiddle.
2. crosstab()
crosstab() is more complex at first, but written in C, optimized for the task and shorter for long lists. You need the additional module tablefunc installed. Read the basics here if you are not familiar:
PostgreSQL Crosstab Query
SELECT * FROM crosstab(
$$
SELECT id
, count(*) OVER (PARTITION BY id)::int AS total
, somecol
, count(*)::int AS ct -- casting to int, don't think you need bigint?
FROM mytable
GROUP BY 1,3
ORDER BY 1,3
$$
,
$$SELECT unnest('{a,b,c,d}'::text[])$$
) AS f (id int, total int, a int, b int, c int, d int);
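The two-round-trip option mentioned earlier can be sketched with sqlite3: first fetch the distinct values, then build and run the static pivot query from them (a sketch only; column names are derived from the data here, so in real code the values would need sanitizing before interpolation):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (id INTEGER, somerow TEXT);
INSERT INTO mytable VALUES (0,'a'),(0,'a'),(0,'b'),(1,'b'),(1,'c');
""")

# Round-trip 1: discover the distinct values.
values = [r[0] for r in conn.execute(
    "SELECT DISTINCT somerow FROM mytable ORDER BY somerow")]

# Round-trip 2: build the static conditional-aggregation query and run it.
cols = ",\n  ".join(
    f"SUM(CASE WHEN somerow = '{v}' THEN 1 ELSE 0 END) AS somerow_{v}"
    for v in values)
sql = f"SELECT id, COUNT(*) AS total,\n  {cols}\nFROM mytable GROUP BY id ORDER BY id"
rows = conn.execute(sql).fetchall()

print(rows)  # [(0, 3, 2, 1, 0), (1, 2, 0, 1, 1)]
```

This is exactly the static query from the question, except generated mechanically, so it keeps up when new somerow values appear.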

Divide derived rank, count columns from subquery to find percentile

I'm having trouble dividing two columns from a subquery. The only answer that's returned is 0. I've tried multiplying the two columns just to see if it works and it does. I cannot figure out what the problem is.
SELECT cert, repdte, NAMEFULL, Rnk, Cnt, (Cnt - Rnk) / Cnt as 'Perc'
FROM
(
SELECT STRU.cert, STRU.repdte, STRU.NAMEFULL,
CASE
WHEN ISNULL(BAL.DEPI5,0) = 0 THEN NULL
ELSE (ISNULL(INC.EINTEXPA,0) / ISNULL(BAL.DEPI5,0))*100
END AS 'CoF', RANK() OVER (Partition by STRU.repdte ORDER BY
CASE
WHEN ISNULL(BAL.DEPI5,0) = 0 THEN NULL
ELSE (ISNULL(INC.EINTEXPA,0) / ISNULL(BAL.DEPI5,0))*100
END DESC) AS 'Rnk', COUNT(*) OVER (PARTITION BY STRU.repdte) as 'Cnt'
FROM MODEL_RIS_RMS_FDIC.dbo.STRU as STRU
JOIN MODEL_RIS_FDIC.dbo.CDI_RC_BAL as BAL
ON STRU.cert = BAL.cert AND STRU.callYMD = BAL.callYMD
JOIN MODEL_RIS_FDIC.dbo.CDI_RI_INC as INC
ON STRU.cert = INC.cert and STRU.callYMD = INC.callYMD
WHERE
CASE
WHEN ISNULL(BAL.DEPI5,0) = 0 THEN NULL
ELSE (ISNULL(INC.EINTEXPA,0) / ISNULL(BAL.DEPI5,0))*100
END IS NOT NULL AND STRU.callYMD >= '2008-03-31'
) A
WHERE Perc < .11
The datatypes of Cnt and Rnk are INT, so (Cnt - Rnk) / Cnt is integer division, which truncates any result below 1 down to 0. Cast the operands to FLOAT:
CAST(Cnt - Rnk AS FLOAT) / CAST(Cnt AS FLOAT)
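The truncation is easy to reproduce; sqlite behaves the same way for integer operands (using REAL, its floating-point type, in place of SQL Server's FLOAT):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Integer / integer truncates: (10 - 3) / 10 yields 0, not 0.7.
# Casting one operand to a floating type fixes the division.
int_div, float_div = conn.execute(
    "SELECT (10 - 3) / 10, CAST(10 - 3 AS REAL) / 10").fetchone()

print(int_div)    # 0
print(float_div)  # 0.7
```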