mean, std and counts in one record - sql

I have data that looks like this:
id res res_q
1 12.9 normal
2 11.5 low
3 13.2 normal
4 9.7 low
5 12.0 low
6 15.5 normal
7 13.5 normal
8 13.3 normal
9 13.5 normal
10 13.1 normal
11 13.4 normal
12 12.9 normal
13 11.8 low
14 11.9 low
15 12.8 normal
16 13.1 normal
17 12.2 normal
18 11.9 low
19 12.5 normal
20 16.5 normal
res_q can take the values 'low', 'normal' and 'high'.
I want to aggregate it so that a single record holds the mean and std of res together with the counts of low, normal and high, like this:
mean sd low normal high
12.9 1.41 6 14 0
Of course I can do it by first aggregating the mean and std using AVG and STDEV, and then using COUNT subqueries to get the low/normal/high counts, like this:
SELECT AVG(res) AS mean,
STD(res) AS sd,
(SELECT COUNT(1) FROM temp1 WHERE res_q='low') AS low,
(SELECT COUNT(1) FROM temp1 WHERE res_q='normal') AS normal,
(SELECT COUNT(1) FROM temp1 WHERE res_q='high') AS high
FROM temp1
But, is there a more efficient way to do it?
One possibility I can think of is to first get the mean and sd using AVG and STDEV, then get the counts using GROUP BY, and then attach the counts using UPDATE. Is this really more efficient? Anything else?
Thank you for your help.

Use conditional aggregation:
SELECT AVG(res) AS mean,
       STD(res) AS sd,
       COUNT(CASE WHEN res_q = 'low' THEN 1 END) AS low,
       COUNT(CASE WHEN res_q = 'normal' THEN 1 END) AS normal,
       COUNT(CASE WHEN res_q = 'high' THEN 1 END) AS high
FROM temp1
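This computes everything in a single pass over temp1 instead of one scan for the aggregates plus one scan per subquery. If you are on MySQL (which the STD() function suggests), the CASE expressions can also be written as boolean sums; a minimal equivalent sketch, assuming the same temp1 table:
SELECT AVG(res) AS mean,
       STD(res) AS sd,
       SUM(res_q = 'low') AS low,       -- the comparison yields 1 or 0 in MySQL
       SUM(res_q = 'normal') AS normal,
       SUM(res_q = 'high') AS high
FROM temp1;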

Related

Individuals in multiple departments affecting grand total count

I have a report I am trying to simplify but I am running into an issue.
(Undesired) The rows/columns of the report currently look like the following.
Department            Total   Probation (%)   Suspended (%)
All Employees         32      16.3            1.4
All Teams             30      23.5            2.2
Total Men's Teams     10      14.8            2.8
Total Women's Teams   10      34.3            1.4
Men's Wear            10      5.9             0.0
Women's Wear          10      21.4            0.0
UniSex Wear           10      15.0            6.3
This is happening because two people work on two teams: one person works in Men's Wear and UniSex Wear, and one person works in Women's Wear and UniSex Wear. The table below has records like this.
Col1   Col2
1234   Men's Wear
1234   UniSex Wear
9876   Women's Wear
9876   UniSex Wear
(Desired) I'm looking for something like this.
Department            Total   Probation (%)   Suspended (%)
All Employees         30      16.3            1.4
All Teams             30      23.5            2.2
Total Men's Teams     10      14.8            2.8
Total Women's Teams   10      34.3            1.4
Men's Wear            10      5.9             0.0
Women's Wear          10      21.4            0.0
UniSex Wear           10      15.0            6.3
I have thought about using LISTAGG() on Col2 to get this effect.
Col1   Col2
1234   Men's Wear,UniSex Wear
9876   Women's Wear,UniSex Wear
Using LISTAGG() gives me the correct count for "All Employees" but then I get groupings of "Men's Wear,UniSex Wear" instead of a separate one for "Men's Wear" and one for "UniSex Wear". Is it possible to group by the individual comma separated values in Col2 after they have been LISTAGG()'ed, or is there a better way of achieving my end results?
Any assistance on achieving this would be greatly appreciated.
I would advise correcting the All_Employees data alone instead of doing the LISTAGG.
OR
Use a separate table for the LISTAGG/un-LISTAGG step, distinct from the original table used to calculate the Total, Probation and Suspended figures.
For un-LISTAGG you can use the below example where table_two is your source table.
with d2 as (
    select distinct
           id,
           -- pull out the column_value-th comma-separated item
           regexp_substr(products, '[^,]+', 1, column_value) as products
    from   table_two
    cross join table(
               cast(
                   multiset(
                       -- one row per comma-separated item in products
                       select level
                       from   dual
                       connect by level <= regexp_count(products, '[^,]+')
                   ) as sys.odcinumberlist
               )
           )
)
select id,
       products
from   d2;
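For the first suggestion (fixing only the All Employees figure rather than reshaping the data), a distinct count avoids double-counting people who sit on two teams. A minimal sketch, assuming a hypothetical assignments table with employee_id and department columns:
-- counts each person once even if they appear on several teams
SELECT COUNT(DISTINCT employee_id) AS total_employees
FROM assignments;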

LAG function with sequential calculation

I come to you today because I'm struggling with a query that involves the LAG function (FYI, I am using PostgreSQL).
I have a table that contains the quantities of a product sold by country to another one on a monthly basis. The table is defined like this:
create table market_research.test_tonnage(
origin text, -- Origin country
desti text, -- Destination country
yr int, -- Year
mt int, -- Month
q numeric -- quantity sold (always > 0)
)
Here is the content:
origin   desti    yr     mt   q
toto     coucou   2019   1    1.4
toto     coucou   2019   2    2.5
toto     coucou   2019   3    1.2
tata     yoyo     2018   11   5.4
tata     yoyo     2018   12   5.5
tata     yoyo     2019   1    5.2
I am trying to create a view that will add 2 calculated fields, as follows:
beginning_stock : Initial value of 0, then beginning_stock = ending_stock of the previous month
ending_stock : ending_stock = beginning_stock - q
origin   desti    yr     mt   q     beginning_stock   ending_stock
toto     coucou   2019   1    1.4   0                 -1.4
toto     coucou   2019   2    2.5   -1.4              -3.9
toto     coucou   2019   3    1.2   -3.9              -5.1
tata     yoyo     2018   11   5.4   0                 -5.4
tata     yoyo     2018   12   5.5   -5.4              -10.9
tata     yoyo     2019   1    5.2   -10.9             -16.1
I have tried many queries using the LAG function, but I think the problem comes from the sequential nature of the calculation over time. Here is an example of my attempt:
select origin,
desti,
yr,
mt,
q,
COALESCE(lag(ending_stock, 1) over (partition by origin order by yr, mt), 0) beginning_stock,
beginning_stock - q ending_stock
from market_research.test_tonnage
Thank you for your help!
Max
You need a cumulative SUM() function instead of LAG():
demo:db<>fiddle
SELECT
*,
SUM(-q) OVER (PARTITION BY origin ORDER BY yr, mt) + q as beginning, -- 2
SUM(-q) OVER (PARTITION BY origin ORDER BY yr, mt) as ending -- 1
FROM my_table
Summing all quantities up to and including the current row (negated, since you want negative values) gives the running total, i.e. ending (marked -- 1).
The same sum without the current value (add q back, because the SUM() has already subtracted it) gives beginning (marked -- 2).
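Applied to the table from the question, the view could look like the sketch below. The view name is an assumption, and it partitions by both origin and desti so that each origin/destination pair gets its own running total:
CREATE VIEW market_research.test_tonnage_stock AS
SELECT origin,
       desti,
       yr,
       mt,
       q,
       -- running total of -q with the current month's effect removed (add q back)
       SUM(-q) OVER (PARTITION BY origin, desti ORDER BY yr, mt) + q AS beginning_stock,
       -- running total of -q up to and including the current month
       SUM(-q) OVER (PARTITION BY origin, desti ORDER BY yr, mt) AS ending_stock
FROM market_research.test_tonnage;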

SQL Server : Segregate data into dynamic buckets

Please help me with a SQL Server Query that can bucket data dynamically into ranges.
Here is my source data:
Value
=======
45
33.5
33.1
33
32.8
25.3
25.2
25.1
25
21.3
21.2
21.1
20.9
12.3
12.2
12.15
12.1
12
11.8
Expected output:
Value Rank
=============
45 1
(mean value in this range is 45)
33.5 2
33.1 2
33 2
32.8 2
(mean value is 33.1 - any value in the range (-10%) 29.79 to 36.41 (+10%) should be given a rank of 2)
25.3 3
25.2 3
25.1 3
25 3
21.3 4
21.2 4
21.1 4
20.9 4
12.3 5
12.2 5
12.15 5
12.1 5
12 5
11.8 5
DENSE_RANK, RANK and NTILE do not seem to give me a ranking like this. The range is dynamic and not known in advance. Any help is highly appreciated.
The bucketing rule is:
Each bucket contains a data set with 10% variation from the mean value
Here's one way:
select val, dense_rank() over (order by cast(val/10 as int) desc) ntile
from yourtable
Use dense_rank but specify your buckets in the order by clause. (I'm assuming this is how it works for your sample data)
First convert the value to a number having 2 decimal places.
Then, use a CASE expression for doing FLOOR or ROUND function based on the first number after decimal point.
Then use DENSE_RANK function for giving rank based on the rounded value.
Query
select z.[Value],
       dense_rank() over (order by z.[val_rounded] desc) as [Rank]
from (
    select t.[Value],
           case when substring(t.[Value2], charindex('.', t.[Value2], 1) + 1, 1) > 5
                then round(t.[Value], 0)
                else floor(t.[Value])
           end as [val_rounded]
    from (
        select [Value],
               cast(cast([Value] as decimal(6, 2)) as varchar(50)) as [Value2]
        from [your_table_name]
    ) t
) z;
Demo
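For the sample data this works out as intended: 32.8, 20.9 and 11.8 have a first decimal digit greater than 5, so they round up to 33, 21 and 12, while values such as 33.5, 25.3, 21.3 and 12.15 floor to 33, 25, 21 and 12. Each group therefore collapses to a single rounded value (45, 33, 25, 21, 12), and DENSE_RANK assigns ranks 1 through 5 in descending order.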

Sequence / serial no in Oracle sql

My question is similar to how to generate Serial numbers +Add 1 in select statement
But I need the sequence as below in Oracle SQL.
table 1 data:
facility store stop_seq
32 729 1
32 380 2
32 603 3
12 722 4
12 671 5
48 423 6
I need result as below:
facility res_seq
32 1
12 2
48 3
Here res_seq should be ordered based on stop_seq in table 1.
Please help
select facility, row_number() over(order by max(stop_seq)) res_seq
from your_tab group by facility;
ROW_NUMBER is explained in the link posted in the question
Analytic functions are performed after GROUP BY, so in this query the data is aggregated by facility and then row numbers are assigned
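To see why this gives the expected result for the sample data, look at the intermediate aggregate that the row numbers are ordered by (same your_tab name as above):
-- MAX(stop_seq) per facility drives the ordering
select facility, max(stop_seq) as max_stop
from your_tab
group by facility;
-- facility 32 -> 3, facility 12 -> 5, facility 48 -> 6,
-- so ROW_NUMBER() assigns res_seq 1, 2 and 3 in that order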

Transpose groups/subgroups in sql oracle

I have a date column which I have to divide into 6 quarters and, by State, calculate count, ratio (A/Count) and Avg(colC). I have converted the date column to YYYY Q format and I am using Oracle 11g. I am trying to write a SQL query that returns results in the format shown below; I am able to group results by quarter but unable to further subgroup them to show count, Ratio and Avg under each quarter.
I have 2 tables that I need to use to get the data below.
Table 1
Customer_id   St_Nme       St_Cd
1             Alabama      AL
2             California   CA

Table 2
Customer_id   No_of_sales   Time_spent   Date
1             4             4.5          01122012
2             7.5           9.33         03062012
Desired Output
Count-Count of sales
Ratio-Time_spent/Count of sales
Avg - Average of time spent
          Q42012               Q32012               Q22012               Q12012   Q42011   Q32012
State     count   Ratio  Avg   count   Ratio  Avg   count   Ratio  Avg
Alabama   3       4.5    1.2   8       7.4    3.2   65      21.1   34.4
A..       4       7.5    3.2   5       9.4    5.2   61      25.1   39.4
A..       9       6.5    5.2   4       3.4    3.7   54      41.1   44.4
Boston
Cali..
Den..
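On Oracle 11g the per-quarter count/Ratio/Avg sub-columns can be produced with conditional aggregation. The sketch below is only an illustration under several assumptions: the tables are called table1 and table2, the date column is called sale_dt and holds DDMMYYYY text, and "count of sales" means the number of Table 2 rows per state and quarter (use SUM(No_of_sales) instead if that is what is meant). Only the Q4 2012 block is shown.
SELECT t1.St_Nme AS state,
       COUNT(CASE WHEN t2.qtr = '2012 Q4' THEN 1 END) AS q42012_count,
       SUM(CASE WHEN t2.qtr = '2012 Q4' THEN t2.Time_spent END)
         / NULLIF(COUNT(CASE WHEN t2.qtr = '2012 Q4' THEN 1 END), 0) AS q42012_ratio,
       AVG(CASE WHEN t2.qtr = '2012 Q4' THEN t2.Time_spent END) AS q42012_avg
       -- repeat the three expressions above for each remaining quarter
FROM table1 t1
JOIN (SELECT Customer_id,
             Time_spent,
             TO_CHAR(TO_DATE(sale_dt, 'DDMMYYYY'), 'YYYY "Q"Q') AS qtr
      FROM table2) t2
  ON t2.Customer_id = t1.Customer_id
GROUP BY t1.St_Nme;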