SQL: conditioning in aggregation function

I have the following SQL:
SELECT LISTAGG(TO_CHAR(ch.count), '|') WITHIN GROUP (ORDER BY ch.count)
FROM ChG cg
JOIN Ch ch on ch.GroupID = cg.GroupID
WHERE cg.PartyID = cp.PartyID
I would like to add a condition; in pseudocode:
if (ch.TYPECODE = 1) then ch.count = 'A' + ch.count. What is the best way to achieve this in a stored procedure?

listagg(case when ch.typecode = 1 then 'A' end || to_char(ch.count), '|') .....
Before aggregation, each row is inspected for the condition ch.typecode = 1. If it is true, an 'A' is pre-pended (concatenated in front of) to_char(ch.count). I am just guessing that's what you need.
If you need this 'A' to be pre-pended to ch.count for the ORDER BY condition also, then you can do the same thing there. You will need to wrap within to_char as well. (If you don't, Big Brother Oracle will do it for you anyway, but try to avoid implicit conversions whenever possible.)
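To make the per-row logic concrete, here is a small Python model of what the CASE expression inside LISTAGG does. It is only a sketch, not Oracle code: the function name and the sample rows are made up for illustration, and the sort mimics ORDER BY ch.count on the numeric value.

```python
def listagg_with_prefix(rows, sep="|"):
    """rows: iterable of (typecode, count) pairs (hypothetical sample shape)."""
    # Mimics WITHIN GROUP (ORDER BY ch.count): sort on the numeric count.
    ordered = sorted(rows, key=lambda r: r[1])
    # Mimics: case when typecode = 1 then 'A' end || to_char(count)
    parts = [("A" if typecode == 1 else "") + str(count)
             for typecode, count in ordered]
    return sep.join(parts)


sample = [(1, 30), (2, 10), (1, 20)]
result = listagg_with_prefix(sample)
print(result)  # 10|A20|A30
```

Note that the model relies on the missing ELSE branch contributing nothing to the string, which is how Oracle treats NULL in concatenation; other databases may propagate the NULL instead, so an explicit ELSE '' is safer there.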

Related

SQL How to make nested query with substring/case statement/ trim

I'm new to SQL and trying to understand nested queries and how to use them. I have a substring, a case statement, and a trim statement that I'm trying to put together, but I'm unsure how. The substring has to be done first, then the case statement, then the trim. This is what I have at the moment, but I'm unsure how to get it working. The code uses made-up names and tables as an example.
SELECT dtXYZ.*
FROM
(
SELECT dt,
SUBSTRING_INDEX(SUBSTRING_INDEX(dt, ..................... ) as lioness,
SUBSTRING_INDEX(SUBSTRING_INDEX(dt, .....................) as tiger,
SUBSTRING_INDEX(dt, .................) as bear
FROM Animaltab
) dtXYZ
SELECT
CASE WHEN length(bear) = 4 THEN bear
ELSE concat('0', bear)
END AS bear_corr,
CASE WHEN length(lion) = 7 THEN lioness
ELSE concat('0', lioness)
END AS lion_corr
trim(lion_corr) || '_' || trim(tiger) || '_' || trim(bear_corr) as new_imp_animal
Spark supports CTEs: https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-cte.html
This also works on Databricks; see Common Table Expressions (CTEs) in Databricks and Spark.
And you can chain them like this:
WITH dtXYZ(dt,lioness,tiger,bear) AS ( SELECT dt,
SUBSTRING_INDEX(SUBSTRING_INDEX(dt, ..................... ) as lioness,
SUBSTRING_INDEX(SUBSTRING_INDEX(dt, .....................) as tiger,
SUBSTRING_INDEX(dt, .................) as bear
FROM Animaltab),
dtcorrected (dt,bear_corr,lion_corr,tiger) as (
SELECT
dt,
CASE WHEN length(bear) = 4 THEN bear
ELSE concat('0', bear)
END AS bear_corr,
CASE WHEN length(lioness) = 7 THEN lioness
ELSE concat('0', lioness)
END AS lion_corr
,tiger
FROM dtXYZ)
SELECT
dt,
trim(lion_corr) || '_' || trim(tiger) || '_' || trim(bear_corr) as new_imp_animal FROM dtcorrected
Order of operations can be tricky in SQL if you're used to sequencing steps in procedural code. As nbk commented, CTEs (Common Table Expressions) are your best bet. CTEs are introduced by the WITH keyword and are very similar to nested subqueries (you could write the same query nested if you wanted), but they are better suited to cases like this one, where the nesting structure of the code doesn't mirror the nesting of the data. I always use CTEs if I'm joining tables that each need independent grouping or filtering. The SQL in the parentheses essentially creates a view, and the outer SQL is a second, higher-order select statement that produces the result set. If I'm working with hierarchical data (parent, child, grandchild), I'll nest the query to follow that path, but usually a CTE makes it easier to organize your ideas. Here's how that would work:
with dtXYZ as
(
SELECT dt,
SUBSTRING_INDEX(SUBSTRING_INDEX(dt, ..................... ) as lioness,
SUBSTRING_INDEX(SUBSTRING_INDEX(dt, .....................) as tiger,
SUBSTRING_INDEX(dt, .................) as bear
FROM Animaltab
)
SELECT
CASE WHEN length(bear) = 4 THEN bear
ELSE concat('0', bear)
END AS bear_corr,
CASE WHEN length(lioness) = 7 THEN lioness
ELSE concat('0', lioness)
END AS lion_corr,
trim(lion_corr) || '_' || trim(tiger) || '_' || trim(bear_corr) as new_imp_animal
from
dtXYZ
As for 'order of operations': in engines that support lateral column aliases (recent Spark and Databricks releases do), CASE expressions and functions in a select list can be referenced by later items in the same select list as inputs. In engines that don't, compute the aliases in a CTE or subquery first. Things can get hairy when you use 'if' logic that resolves to illogical or error-causing conditions, but otherwise I've had no issues with having many parts of a select refer to each other. It's an excellent way to test out nesting functions.
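For engines where a select list cannot refer to its own aliases, the same steps can be staged through a CTE so that each stage only reads columns from the previous one. Below is a minimal sketch that uses SQLite through Python's sqlite3 module as a stand-in for Spark. The table name parts and its sample values are hypothetical, and the substring step is skipped because its arguments are elided in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, already-split pieces; the SUBSTRING_INDEX step is not reproduced.
conn.execute("CREATE TABLE parts (lioness TEXT, tiger TEXT, bear TEXT)")
conn.executemany(
    "INSERT INTO parts VALUES (?, ?, ?)",
    [("1234567", " t1 ", "0042"), ("234567", "t2", "042")],
)

query = """
WITH corrected AS (
    -- Stage 1: pad the short values with a leading zero.
    SELECT
        CASE WHEN length(bear) = 4 THEN bear ELSE '0' || bear END AS bear_corr,
        CASE WHEN length(lioness) = 7 THEN lioness ELSE '0' || lioness END AS lion_corr,
        tiger
    FROM parts
)
-- Stage 2: trim and concatenate the corrected columns.
SELECT trim(lion_corr) || '_' || trim(tiger) || '_' || trim(bear_corr) AS new_imp_animal
FROM corrected
ORDER BY new_imp_animal
"""
animals = [row[0] for row in conn.execute(query)]
print(animals)  # ['0234567_t2_0042', '1234567_t1_0042']
```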

How to easily remove count=1 on aliased field in SQL?

I have the following data in a table:
GROUP1|FIELD
Z_12TXT|111
Z_2TXT|222
Z_31TBT|333
Z_4TXT|444
Z_52TNT|555
Z_6TNT|666
And I engineer in a field that removes the leading numbers after the '_'
GROUP1|GROUP_ALIAS|FIELD
Z_12TXT|Z_TXT|111
Z_2TXT|Z_TXT|222
Z_31TBT|Z_TBT|333 <- to be removed
Z_4TXT|Z_TXT|444
Z_52TNT|Z_TNT|555
Z_6TNT|Z_TNT|666
How can I easily query the original table for only the GROUP1 values whose GROUP_ALIAS contains more than one distinct FIELD, i.e. drop the aliases that occur only once?
Desired result:
GROUP1|GROUP_ALIAS|FIELD
Z_12TXT|Z_TXT|111
Z_2TXT|Z_TXT|222
Z_4TXT|Z_TXT|444
Z_52TNT|Z_TNT|555
Z_6TNT|Z_TNT|666
This is how I get all the GROUP_ALIAS's I don't want:
SELECT GROUP_ALIAS
FROM
(SELECT
GROUP1,FIELD,
case when instr(GROUP1, '_') = 2
then
substr(GROUP1, 1, 2) ||
ltrim(substr(GROUP1, 3), '0123456789')
else
substr(GROUP1 , 1, 1) ||
ltrim(substr(GROUP1, 2), '0123456789')
end GROUP_ALIAS
FROM MY_TABLE)
GROUP BY GROUP_ALIAS
HAVING COUNT(FIELD)=1
Probably I could make the engineered field a second time simply on the original table and check that it isn't in the result from the latter, but want to avoid so much nesting. I don't know how to partition or do anything more sophisticated on my case statement making this engineered field, though.
UPDATE
Thanks for all the great replies below. Something about the SQL used must differ from what I thought because I'm getting info like:
GROUP1|GROUP_ALIAS|FIELD
111,222|,111|111
111,222|,222|222
etc.
Not sure why since the solutions work on my unabstracted data in db-fiddle. If anyone can spot what db it's actually using that would help but I'll also check on my end.
Here is one way, using analytic count. If you are not familiar with the with clause, read up on it - it's a very neat way to make your code readable. The way I declare column names in the with clause works since Oracle 11.2; if your version is older than that, the code needs to be re-written just slightly.
I also computed the "engineered field" in a more compact way. Use whatever you need to.
I used sample_data for the table name; adapt as needed.
with
add_alias (group1, group_alias, field) as (
select group1,
substr(group1, 1, instr(group1, '_')) ||
ltrim(substr(group1, instr(group1, '_') + 1), '0123456789'),
field
from sample_data
)
, add_counts (group1, group_alias, field, ct) as (
select group1, group_alias, field, count(*) over (partition by group_alias)
from add_alias
)
select group1, group_alias, field
from add_counts
where ct > 1
;
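The analytic-count idea (count the rows per alias, then keep the rows whose alias count is above one) can be checked outside the database with a short Python model. This is a sketch under assumptions: the helper name group_alias is made up, the rows are the sample data from the question, and the alias is derived with a regular expression rather than the substr/ltrim expression above.

```python
import re
from collections import Counter

# Sample data from the question: (GROUP1, FIELD)
rows = [
    ("Z_12TXT", 111), ("Z_2TXT", 222), ("Z_31TBT", 333),
    ("Z_4TXT", 444), ("Z_52TNT", 555), ("Z_6TNT", 666),
]


def group_alias(group1):
    # Remove the first run of digits that directly follows an underscore.
    return re.sub(r"_\d+", "_", group1, count=1)


# Plays the role of count(*) over (partition by group_alias).
counts = Counter(group_alias(g) for g, _ in rows)

# Plays the role of "where ct > 1".
kept = [(g, group_alias(g), f) for g, f in rows if counts[group_alias(g)] > 1]
for row in kept:
    print(row)
```

With the sample data, only the Z_31TBT row is dropped, since Z_TBT is the one alias that occurs once.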
With Oracle you can use REGEXP_REPLACE and analytic functions:
select Group1, group_alias, field
from (select group1, REGEXP_REPLACE(group1,'_\d+','_') group_alias, field,
count(*) over (PARTITION BY REGEXP_REPLACE(group1,'_\d+','_')) as count from test) a
where count > 1

Is there any way to add a unique identifier to every replacement that REGEXP_REPLACE performs?

I have a large text-CLOB that needs some converting done.
A lot of the lines in my CLOB are preceded by a variable name in brackets like so:
[VARIABLE_NAME_ONE] variable_one = 1 + variable_two;
[VARIABLE_NAME_TWO] variable_two = 2 + variable_three;
[VARIABLE_NAME_ONE] variable_one = variable_four - 4;
The problem is that some of the variable names in brackets are not unique, but they need to be unique after I'm done converting.
What I would like is to extend all the variable names in brackets with something like a counter, in order to ensure uniqueness. Because of the brackets, my initial thought was a simple regexp_replace, but is there any way to incorporate a counter in that?
To complete my explanation, I would like the previous example lines converted into this:
[VARIABLE_NAME_ONE_1] variable_one = 1 + variable_two;
[VARIABLE_NAME_TWO_2] variable_two = 2 + variable_three;
[VARIABLE_NAME_ONE_3] variable_one = variable_four - 4;
You can use a hierarchical query that splits the text on semicolons with REGEXP_SUBSTR and appends the level number just before each closing square bracket, then recombine the pieces with the LISTAGG() function:
UPDATE tab
SET col = (
WITH t AS
(
SELECT REPLACE(REGEXP_SUBSTR(col,'[^;]+',1,level),']','_'||level||']')
AS col, level AS lvl
FROM TAB t
CONNECT BY level <= REGEXP_COUNT(col,';')
)
SELECT LISTAGG(col,';') WITHIN GROUP (ORDER BY lvl)||';'
FROM t)
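If the text can be processed in application code instead of in the database, a replacement callback gives the counter directly. The sketch below is a different technique from the SQL above, written in Python; the function name number_bracket_names is hypothetical, and it assumes every bracketed tag should be numbered in order of appearance.

```python
import re
from itertools import count


def number_bracket_names(text):
    counter = count(1)
    # Append _<n> inside each [...] tag, numbering tags in order of appearance.
    return re.sub(
        r"\[([^\]]+)\]",
        lambda m: f"[{m.group(1)}_{next(counter)}]",
        text,
    )


source = (
    "[VARIABLE_NAME_ONE] variable_one = 1 + variable_two;\n"
    "[VARIABLE_NAME_TWO] variable_two = 2 + variable_three;\n"
    "[VARIABLE_NAME_ONE] variable_one = variable_four - 4;\n"
)
converted = number_bracket_names(source)
print(converted)
```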

Check if any substring from array appear in string?

I have the following query:
select case when count(*)>0 then true else false end
from tab
where param in ('a','b') and position('T' in listofitem)>0
This checks whether 'T' exists in the column listofitem; if it does, the count is > 0. Basically it's a substring search.
This works well in this particular case. However, in my real case I have a text[] called sub_array, meaning there are multiple values to check. How can I modify the query to handle the sub_array type? I prefer to have it in a query rather than a function with a LOOP.
What I actually need is:
select case when count(*)>0 then true else false end
from tab
where param in ('a','b') and position(sub_array in listofitem)>0
This is not working since sub_array is of type Text[]
Use the unnest() function to expand your array and bool_and() (or bool_or(), depending on whether you want to match all array elements or at least one) to aggregate:
select count(*) > 0
from tab
where param in ('a','b')
and (select bool_and(position(u in listofitem) > 0)
from unnest(sub_array) u)
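The choice between bool_and() and bool_or() is the same as the choice between "all" and "any". A rough Python model of the subquery for a single row, with a made-up function name and sample values, could look like this:

```python
def matches(listofitem, sub_array, require_all=True):
    """Model of the unnest() subquery for one row (names are illustrative)."""
    if not sub_array:
        # Aggregating over zero rows yields NULL, which WHERE treats as not true.
        return False
    # position(u in listofitem) > 0 is a substring test for each element u.
    hits = (u in listofitem for u in sub_array)
    # bool_and -> every element must be found; bool_or -> one is enough.
    return all(hits) if require_all else any(hits)


print(matches("STU", ["S", "T"]))                     # True
print(matches("STU", ["S", "X"]))                     # False
print(matches("STU", ["S", "X"], require_all=False))  # True
```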
A brute force method would be to convert the array to a string:
select (count(*) > 0) as flag
from tab
where param in ('a','b') and
array_to_string(listofitem, '') like '%T%';
I should note that comparing count(*) is not the most efficient way of doing this. I would suggest instead:
select exists (select 1
from tab
where param in ('a','b') and
array_to_string(listofitem, '') like '%T%'
) as flag;
This stops the logic at the first match, rather than counting all matching rows.

sql with <> and substring function

The query must return the records whose company is neither 'CABS' nor 'CABS' followed by a space and more text (e.g. CABS NUTS), so I compare both the full name and the substring up to the first space. The company name can be CABS, COBS, CABST, CABS NUTS, or CAB.
SELECT *
FROM records
WHERE UPPER(SUBSTR(company, 0, (INSTR(company,' ')-1))) <> 'CABS'
OR COMPANY <> 'CABS'
But the above query returns CABS NUTS along with COBS and CAB.
I tried using "LIKE CABS"; it looks fine, but if the company name is "CAB" it will not return "CABS" and CABS NUTS because of the LIKE. So LIKE is completely ruled out.
Can anyone please suggest a solution?
So you want all records where the first 4 characters of the Company field are not "CABS". Okay.
WHERE left(company, 4) != 'CABS'
SELECT
*
FROM
Records
WHERE
LEFT(Company, 4) <> 'CABS'
AND Company <> 'CABS'
Note: with the default collation, basic T-SQL string comparison is case-insensitive.
I can't quite work out which ones you want returned, but have you considered LIKE 'CABS %'?
select * from records where company NOT IN (SELECT company
FROM records
WHERE UPPER(SUBSTR(company, 0, (INSTR(company,' ')-1))) = 'CABS'
OR COMPANY = 'CABS')
I think this will fetch the desired records from the records table
RECORDS:
COMPANY
=====================
CAB
CABST
COBS
First, I think you should use AND instead of OR in your compound condition.
Second, you could simplify the condition this way:
WHERE UPPER(SUBSTR(company, 0, (INSTR(company || ' ',' ') - 1))) <> 'CABS'
That is, the company <> 'CABS' part is not needed in this case.
The problem you are getting comes about because the result of the SUBSTR is null if there is no space. And thanks to three-valued logic, the result of some_var <> NULL is NULL, rather than TRUE as you might expect.
An example of this is shown by the query below:
with mytab as (
select 1 as myval from dual union all
select 2 as myval from dual union all
select null as myval from dual
)
select *
from mytab
where myval = 1
union all
select *
from mytab
where myval <> 1
This example will only return two rows rather than three rows that you might expect.
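The same behaviour can be reproduced in any engine that follows standard NULL semantics. As a quick check, here is the query adapted to SQLite (no dual table needed) and run through Python's sqlite3 module; it is a sketch on a stand-in database, not an Oracle session.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
query = """
WITH mytab AS (
    SELECT 1 AS myval UNION ALL
    SELECT 2 UNION ALL
    SELECT NULL
)
SELECT myval FROM mytab WHERE myval = 1
UNION ALL
SELECT myval FROM mytab WHERE myval <> 1
"""
rows = conn.execute(query).fetchall()
print(sorted(rows))  # the NULL row satisfies neither condition

# Comparing NULL with <> yields NULL (None in Python), not TRUE.
null_cmp = conn.execute("SELECT NULL <> 1").fetchone()[0]
print(null_cmp)
```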
There are several ways to rewrite the condition to make it ignore the null result from the substr function. These are listed below. However, as mentioned by one of the other respondents, the two conditions need to be joined using the AND operator rather than OR.
Firstly, you could explicitly check that the column has a space in it using the set of conditions below:
(INSTR(company,' ') = 0 or
UPPER(SUBSTR(company, 0, (INSTR(company,' ')-1))) <> 'CABS') and
COMPANY <> 'CABS'
Another option would be to use the LNNVL function, which I only recently found out about. It returns TRUE when the condition provided as its input evaluates to FALSE or NULL.
lnnvl(UPPER(SUBSTR(company, 0, (INSTR(company,' ')-1))) = 'CABS') and
COMPANY <> 'CABS'
And another option (which would probably be my preferred option) is to use the REGEXP_LIKE function. This is simple, to the point and easy to read.
WHERE not regexp_like(company, '^CABS( |$)')
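The pattern can be sanity-checked against the company names from the question. The snippet below uses Python's re module as a stand-in for REGEXP_LIKE, which is an assumption of convenience; for this simple pattern the two regex flavours should behave the same.

```python
import re

# Company names listed in the question.
companies = ["CABS", "COBS", "CABST", "CABS NUTS", "CAB"]

# 'CABS' at the start, followed by either a space or the end of the string.
pattern = re.compile(r"^CABS( |$)")

kept = [c for c in companies if not pattern.search(c)]
print(kept)  # ['COBS', 'CABST', 'CAB']
```

CABST survives because the character after CABS is neither a space nor the end of the string.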