Assume there are ten employee records, each containing a salary value of 100, except for one, which has a null value in the salary field.
SELECT SUM((AVG(LENGTH(NVL(SALARY,0)))))
FROM DUCK
GROUP BY SALARY;
First inner bracket: NVL(SALARY, 0) -> the first 9 employees' salaries are 100 and the last one is 0.
Second inner bracket: LENGTH() -> the first 9 values will be 3 and the last one 0.
Third inner bracket: AVG() averages the lengths of the 10 salaries, which is ((3*9)+0)/10 = 2.7.
So what does the outermost bracket do? SUM computes over a list of data, but AVG has already reduced the data to a single value.
Your query runs like this:
NULL is considered as 0 (because of NVL).
The length of each salary is calculated: for 100 it is 3, and for the NULL salary (0) it is 1 (the length of 0 is 1).
The lengths are averaged per group of the GROUP BY SALARY. There are two groups, one for 100 and one for NULL, so the average is 3 for the first group and 1 for the second.
SUM adds up all the group averages: 3 + 1 = 4.
So for the sample data mentioned in your case, the result is 4.
See this db<>fiddle to get an idea.
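If you want to verify this yourself, here is a minimal, self-contained sketch (Oracle syntax; the DUCK table from the question is recreated in a WITH clause):
with duck (salary) as (
select 100 from dual connect by level <= 9
union all
select null from dual
)
select sum(avg(length(nvl(salary, 0)))) -- SUM adds up the per-group averages
from duck
group by salary; -- two groups: SALARY = 100 and SALARY IS NULL
-- result: 3 + 1 = 4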
In my view, the GROUP BY clause at the end is what's to be noted. Without the GROUP BY you are not able to nest group functions. Trying to do it results in:
With sal as
(
Select 100 salary from dual union all
Select 200 from dual union all
Select 300 from dual union all
Select 10 from dual union all
Select 20 from dual union all
Select 40 from dual union all
Select 50 from dual union all
Select 50 from dual union all
Select null from dual union all
Select null from dual
)
Select SUM(AVG(LENGTH(coalesce(SALARY,0))))
--,AVG(LENGTH(coalesce(SALARY,0)))
from sal
ORA-00978: nested group function without GROUP BY
00978. 00000 - "nested group function without GROUP BY"
*Cause:
*Action:
Now when you add the GROUP BY, the inner AVG(LENGTH(coalesce(SALARY,0))) is evaluated once per group, and the outer SUM adds up those per-group averages. Note that GROUP BY treats all NULLs as equal, so the two NULL rows collapse into a single group, just like the two 50s do; that leaves 8 groups for the data above, and the result is 3 + 3 + 3 + 2 + 2 + 2 + 2 + 1 = 18.
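To see it, add the GROUP BY to the query above (a sketch, reusing the same sal data):
Select SUM(AVG(LENGTH(coalesce(SALARY, 0)))) as total
from sal
group by SALARY;
-- 8 groups after the two 50s and the two NULLs each collapse into one:
-- 3 + 3 + 3 + 2 + 2 + 2 + 2 + 1 = 18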
Hope this helps.
By the way, what does this do in your application? What is the use case? Just curious.
I would like to create a dynamic column in Redshift which adds a new value, incremented by 1, every month. Basically it should calculate the month distance from a specific date, let's say 1 Jan 2020. So for the current month it should be 23, next month it should be 24, etc. Is it possible to somehow replace what I currently have hard-coded in the WITH statement? The counter stops at 12 and I would have to increment it manually every month.
with months as (
select 1 as mon union all select 2 union all select 3 union all select 4 union all
select 5 as mon union all select 6 union all select 7 union all select 8 union all
select 9 as mon union all select 10 union all select 11 union all select 12
),
I think you should use the DATEDIFF function, as it gives you the difference in months between two dates. Simply put in the dates you want: https://docs.aws.amazon.com/redshift/latest/dg/r_DATEDIFF_function.html
Example:
select datediff(mon,'2020-01-01',current_date) as mon_diff
Depending on the size of your table, you could save the query as a view, so that every time you run it you get the current difference.
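If you need the whole list of month numbers (1 up to the current month distance) rather than a single value, here is a minimal sketch that replaces the static CTE - assuming your Redshift version supports WITH RECURSIVE, and using the +1 offset implied by your example (Jan 2020 counts as 1):
with recursive months (mon) as (
select 1
union all
select mon + 1
from months
where mon < datediff(month, '2020-01-01', current_date) + 1
)
select * from months;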
Try this
Alter table tablename
Add new_column int Default
datediff(mon, date_col, current_date);
Or
With data as
(Select row_number() over (order by 1) rn
from some_big_table)
Select rn as mon
from data
where rn <= datediff(mon, '2020-01-01', current_date) + 1;
Note: replace some_big_table with any table that has at least as many rows as the number of months you will ever need; the WHERE clause then limits the results as required.
I have two tables.
Input:
I have joined with the calendar table to bring the data up to the current period.
I need the output shown below.
I tried a query with UNION and aggregation, but it queries and aggregates the same table twice, and the table is very big. Is there an option to do this a different way?
SELECT ID, PERIOD, SUM(AMOUNTYTD) AMOUNTYTD, SUM(AMOUNT) AMOUNT
FROM (
SELECT a.ID, b.PERIOD, SUM(a.AMOUNT) AMOUNTYTD, 0 AMOUNT
FROM transaction a RIGHT OUTER JOIN CALENDAR b
ON a.PERIOD <= b.PERIOD
GROUP BY a.ID, b.PERIOD
UNION ALL
SELECT ID, PERIOD, 0, SUM(AMOUNT)
FROM transaction
GROUP BY ID, PERIOD
)
GROUP BY ID, PERIOD
Showing the periodic amount side by side with the cumulative amount is easy; in fact you only need to build the correct rows with the periodic amounts, because the cumulative amounts are then a simple application of analytic SUM.
The key to joining the calendar table to the "input" data is to use a partitioned outer join - notice the partition by (id) clause in the join of the two tables. This causes the "input" data to be partitioned into separate sub-tables, one for each distinct id; the outer join to the calendar table is done separately for each such sub-table, and then the results are combined with a logical "union all".
with
input (id, period, amount) as (
select 1, 202010, 100 from dual union all
select 1, 202011, 50 from dual union all
select 2, 202011, 400 from dual
)
, calendar (period) as (
select 202010 from dual union all
select 202011 from dual union all
select 202012 from dual union all
select 202101 from dual
)
select id, period, amountytd, amount
from (
select i.id, period, i.amount,
sum(i.amount) over (partition by i.id order by period)
as amountytd
from calendar c left outer join input i partition by (id)
using (period)
)
where amountytd is not null
order by id, period
;
ID PERIOD AMOUNTYTD AMOUNT
--- ---------- ---------- ----------
1 202010 100 100
1 202011 150 50
1 202012 150
1 202101 150
2 202011 400 400
2 202012 400
2 202101 400
You have the query - "I have joined with the calendar table and bring the data till current" - let us call it your_query.
You can use an analytic function on it as follows:
Select t.*,
nullif(amountytd - lag(amountytd, 1, 0)
over (partition by id order by period), 0) as amount
From (your_query) t
The LAG difference recovers each period's amount from the running total, and NULLIF blanks it out in the periods where nothing changed; for the sample data above this gives 100 and 50 for id 1, then NULL for the trailing periods.
The Customer and Acct tables share a global sequence; both tables increment this value.
Below is the customer table: SEQ_NO 1 is the beginning of one customer's data and SEQ_NO 238 is the beginning of another customer's data.
The other is the account table. All accounts whose SEQ_NOs fall inside the boundaries of one customer should get the same group (I want to group those accounts with their customer, so that I can use LISTAGG to concatenate the account ids). For example, below, SEQ_NO 2 through SEQ_NO 224 (inclusive) should be assigned to the same group.
Is there a SQL way to do that? The worst case I was thinking of is to define an Oracle type and do it with a function.
Any help is appreciated.
If I understand your question correctly, you want to be able to assign rows in the account table to groups, one per customer, so that you can then aggregate based on these groups.
So, the question is how to identify to which customer each account belongs, based on the sequence boundaries given in the first table ("customer") and the specific account numbers in the second table ("account").
This can be done in plain SQL, and relatively easily. You need a join between the accounts table and a subquery based on the customers table. The subquery must show the first and the last sequence number allocated to each client; to do that, you can use the lead analytic function. A bit of care must be taken regarding the last customer, for whom there is no upper limit for the sequence numbers.
You didn't provide test data in a usable format, so I created sample data in the with clause below (which is not part of the query - it's just there as a placeholder for test data).
with
customer (cust_id, seq_no) as (
select 101, 1 from dual union all
select 102, 34 from dual union all
select 200, 58 from dual union all
select 130, 90 from dual
)
, account (acct_id, seq_no) as (
select 1003, 3 from dual union all
select 1005, 11 from dual union all
select 1007, 33 from dual union all
select 1008, 60 from dual union all
select 1103, 77 from dual union all
select 1140, 92 from dual union all
select 1145, 99 from dual
)
select c.cust_id,
listagg(a.acct_id, ',') within group (order by a.acct_id) as acct_list
from (
select cust_id, seq_no as lower_no,
lead(seq_no) over (order by seq_no) - 1 as upper_no
from customer
) c
left outer join account a
on a.seq_no between c.lower_no and nvl(c.upper_no, a.seq_no)
group by c.cust_id
order by c.cust_id
;
OUTPUT
CUST_ID ACCT_LIST
------- --------------------
101 1003,1005,1007
102
130 1140,1145
200 1008,1103
Is there any difference between COUNT(*) and COUNT(attribute_name)?
I used count(attribute_name) because I thought it would be more specific and hence make the search easier. Is that true?
It would be great to see an example with SQL code for my issue, to help me understand better.
Imagine this table, Calls, where one of the four rows has a NULL TelephoneNumber (values invented for illustration):
TelephoneNumber
---------------
555-0101
555-0102
(null)
555-0104
select Count(TelephoneNumber) from Calls -- returns 3
select Count(*) from Calls -- returns 4
count(column_name) also counts duplicate values. Consider this data, with a duplicate number but no NULLs:
TelephoneNumber
---------------
555-0101
555-0101
555-0102
555-0103
select Count(TelephoneNumber) from Calls -- returns 4
COUNT(*) counts all the records in the group.
COUNT(column_name) only counts non-null values.
There is also another typical expression, COUNT(DISTINCT column_name), that counts non-null distinct values.
Since you asked for it, here is a demo on DB Fiddle:
with t as (
select 1 x from dual
union all select 1 from dual
union all select null from dual
)
select count(*), count(x), count(distinct x) from t
COUNT(*) | COUNT(X) | COUNT(DISTINCT X)
-------: | -------: | ---------------:
3 | 2 | 1
COUNT(*) will count all the rows.
COUNT(column) will count non-NULLs only.
Your choice between COUNT(*) and COUNT(column) should be based only on the desired output.
Consider the example employee table below:
ID Name Description
1 Raji Smart
2 Rahi Positive
3
4 Falle Smart
select count(*) from employee;
Count(*)
4
select count(name) from employee;
Count(Name)
3
count(<expression>) only counts non-null values. * references the complete row and as such never excludes any rows, whereas count(attribute_name) only counts rows where that column is not null.
So this:
select count(attribute_name)
from the_table
is equivalent to:
select count(*)
from the_table
where attribute_name is not null
The difference is simple: COUNT(*) counts the number of rows produced by the query, whereas COUNT(1) counts the number of 1 values. Note that when you include a literal such as a number or a string in a query, this literal is "appended" or attached to every row that is produced by the FROM clause.
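A quick way to see that literal in action (reusing the Calls table from the earlier example):
select 1 from Calls
-- produces the value 1 once per row of Calls;
-- COUNT(1) counts those values, and since the literal 1 is never NULL,
-- it always gives the same result as COUNT(*)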
I need a way to put results into a number of groups that I specify.
I tried the NTILE() function, which I thought would do this, but it's not working the way I expected:
WITH CTE AS (
SELECT 1 as Number
UNION ALL
SELECT Number+1
FROM CTE
WHERE Number < 100
)
SELECT *, ntile(80) over (order by number desc) as 'test'
FROM CTE
For the expected results, the test column should output one number for every 2 entries (which is what I expected from NTILE(80)), but it could be every 2, 4, 10, or any number I specify.
Maybe NTILE() is not the right function, but is there a function that does what I want?
So, if I specify 3, the result should group every 3 records; if I specify 15, it should group every 15 records and then move on to the next group.
Hope I'm being clear.
...should output a number for every 2 entries...
No, you have 100 entries and you want to divide them into 80 groups. You'll get some groups with 1 entry and other groups with 2 entries.
Read the definition of NTILE(). If you want groups with 2 entries each, you can do it as shown below by dividing the 100 rows into 50 groups:
WITH recursive
CTE AS (
SELECT 1 as Number
UNION ALL
SELECT Number + 1
FROM CTE
WHERE Number < 100
)
SELECT *,
ntile(50) -- changed here
over (order by number desc) as test
FROM CTE
You didn't say what database engine you are using, so I assumed PostgreSQL.
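With 100 rows in 50 buckets, each bucket then gets exactly two consecutive rows:
number | test
-------+-----
   100 |   1
    99 |   1
    98 |   2
    97 |   2
...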
I think you simply want the modulus operator:
WITH CTE AS (
SELECT 1 as Number
UNION ALL
SELECT Number+1
FROM CTE
WHERE Number < 100
)
SELECT cte.*,
(ROW_NUMBER() OVER (ORDER BY Number DESC) - 1) % 3 AS grp -- or however many groups you want
FROM CTE
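Note that the modulus distributes rows round-robin (rows 1, 4, 7, ... all land in group 0). If you instead want consecutive runs kept together - every 3 adjacent rows in one group, then the next 3 in the next group, as the question describes - integer division does that. A sketch under the same assumptions as the query above (both SQL Server and PostgreSQL truncate integer division):
SELECT cte.*,
(ROW_NUMBER() OVER (ORDER BY Number DESC) - 1) / 3 AS grp -- rows 1-3 -> group 0, rows 4-6 -> group 1, ...
FROM CTE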