How to calculate mean value per group in Teradata SQL?

I have a table in Teradata SQL like the one below:
SMS_ID | PRODUCT
-------------------
11 | A
22 | A
33 | A
87 | B
89 | B
14 | C
Column "SMS_ID" is the ID of an SMS sent to a client.
Column "PRODUCT" is the ID of the product that was the subject of the SMS.
My question is: how can I calculate, in Teradata SQL, the mean number of SMS per PRODUCT, i.e. each product's share of all SMS?
As a result I need something like below:
AVG | PRODUCT
-------
0.5 | A -> because 3 / 6 = 0.5
0.33 | B -> because 2 / 6 = 0.33
0.16 | C -> because 1 / 6 = 0.16

You want fractions of the total count:
SELECT
product
,COUNT(*) -- count per product
/ CAST(SUM(COUNT(*)) OVER () AS FLOAT) -- total count = sum of counts per product
FROM yourTable
GROUP BY PRODUCT
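To sanity-check the arithmetic outside the database, here is a minimal Python sketch of the same idea (count per product divided by the total count). It is not Teradata code; the list `products` simply mirrors the PRODUCT column from the question, and the variable names are made up for illustration.

```python
from collections import Counter

# PRODUCT column from the question's sample table (one entry per SMS)
products = ["A", "A", "A", "B", "B", "C"]

counts = Counter(products)      # count per product, like COUNT(*) ... GROUP BY
total = sum(counts.values())    # total count, like SUM(COUNT(*)) OVER ()

# Float division, which is what the CAST(... AS FLOAT) achieves in the query
fractions = {product: count / total for product, count in counts.items()}

for product, fraction in sorted(fractions.items()):
    print(product, round(fraction, 2))
```

The cast matters in the SQL version because dividing two integer counts would otherwise truncate the result.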

Related

Sum dataset in SQL with conditional expressions

I'm working with a dataset of items with different values and I would like a SQL query to calculate the total USD value of the dataset.
Example Dataset:
id | type | numOrdered
0 | apple | 1
1 | orange | 3
2 | apple | 10
3 | apple | 5
4 | orange | 2
5 | apple | 1
Consider this dataset of fruit orders. Let's say apples are worth $1 and oranges are worth $2. I would like to know how much total USD in fruit orders we have.
I'd like to perform the same operation as this example Javascript function, but using SQL:
let sum = 0;
for(let fruitOrder of fruitOrders) {
if(fruitOrder.type == "orange"){
sum += fruitOrder.numOrdered*2;
} else {
sum += fruitOrder.numOrdered*1;
}
}
return sum;
So the correct answer for this dataset would be $27 USD total since there are 17 apples worth $1 and 5 oranges worth $2.
I know how to break it down into two distinct queries giving me the number I want split by type
SELECT
sum("public"."fruitOrders"."numOrdered"*2) AS "sum"
FROM "public"."fruitOrders"
WHERE "public"."fruitOrders"."type" = 'orange';
which would return $10, the total USD value of oranges
SELECT
sum("public"."fruitOrders"."numOrdered") AS "sum"
FROM "public"."fruitOrders"
WHERE "public"."fruitOrders"."type" = 'apple';
which would return $17, the total USD value of apples
I just don't know how to sum those numbers together in SQL to get $27, the total USD value of the dataset.
If you want the values 1 and 2 hardcoded then you can use a CASE statement with SUM():
SELECT
sum(case type
when 'apple' then 1
when 'orange' then 2
end * numordered
) AS "sum"
FROM "public"."fruitOrders"
Result:
| sum |
| --- |
| 27 |
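If you want to try the CASE approach without a Postgres instance, the following sketch runs an equivalent query against an in-memory SQLite database from Python. SQLite stands in for Postgres here as an assumption; the table and column names follow the question, and the schema prefix is dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruitOrders (id INTEGER, type TEXT, numOrdered INTEGER)")
conn.executemany(
    "INSERT INTO fruitOrders VALUES (?, ?, ?)",
    [(0, "apple", 1), (1, "orange", 3), (2, "apple", 10),
     (3, "apple", 5), (4, "orange", 2), (5, "apple", 1)],
)

# Multiply each row's quantity by a per-type price, then sum over all rows
(total,) = conn.execute(
    """
    SELECT SUM(CASE type
                 WHEN 'apple'  THEN 1
                 WHEN 'orange' THEN 2
               END * numOrdered)
    FROM fruitOrders
    """
).fetchone()

print(total)
```

Note that a type not listed in the CASE yields NULL for that row, which SUM() skips, so adding an ELSE branch may be worth considering if other fruit types can appear.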

Percentage of variable corresponding to percentage of other variable

I have two numerical variables, and would like to calculate the percentage of one variable corresponding to at least 50% of the other variable's sum.
For example:
A | B
__________
2 | 8
1 | 20
3 | 12
5 | 4
2 | 7
1 | 11
4 | 5
Here, the sum of column B is 67, so I'm looking for the rows (in descending order of B) whose cumulative sum is at least half of that, i.e. 33.5.
In this case, they are rows 2, 3 & 6 (cumulative sum of 43). The sum of column A for these rows is 5, which I want to compare to the total sum of column A (18).
Therefore, the result I'm looking for is 5 / 18 * 100 = 27.78%
I'm looking for a way to implement this in QlikSense, or in SQL.
Here's one way you can do it - there is probably some optimisation to be done, but this gives what you want.
Source:
LOAD
*,
RowNo() as RowNo_Source
Inline [
A , B
2 , 8
1 , 20
3 , 12
5 , 4
2 , 7
1 , 11
4 , 5
];
SourceSorted:
NoConcatenate LOAD *,
RowNo() as RowNo_SourceSorted
Resident Source
Order by B asc;
drop table Source;
BTotal:
LOAD sum(B) as BTotal
Resident SourceSorted;
let BTotal=peek('BTotal',0);
SourceWithCumu:
NoConcatenate LOAD
*,
rangesum(peek('BCumu'),B) as BCumu,
$(BTotal) as BTotal,
rangesum(peek('BCumu'),B)/$(BTotal) as BCumuPct,
if(rangesum(peek('BCumu'),B)/$(BTotal)>=0.5,A,0) as AFiltered
Resident SourceSorted;
Drop Table SourceSorted;
I included a few debug fields that might be useful, but you could of course remove them.
Then in the front end you do your calculation of sum(AFiltered)/sum(A) to get the stat you want and format it as a percentage.
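For checking the numbers independently of Qlik, here is a small Python sketch of the calculation as the question describes it: sort by B descending, take rows until the running sum of B reaches the chosen fraction of the total, then compare the A values of those rows with the total of A. The function name and the `fraction` parameter are hypothetical, chosen only for this illustration.

```python
# (A, B) pairs from the question's example table
rows = [(2, 8), (1, 20), (3, 12), (5, 4), (2, 7), (1, 11), (4, 5)]


def share_of_a_for_top_b(rows, fraction=0.5):
    """Percentage of sum(A) held by the largest-B rows that together
    reach at least `fraction` of sum(B)."""
    total_a = sum(a for a, _ in rows)
    total_b = sum(b for _, b in rows)
    cumulative_b = 0
    selected_a = 0
    for a, b in sorted(rows, key=lambda r: r[1], reverse=True):
        if cumulative_b >= fraction * total_b:
            break  # threshold already reached by the rows taken so far
        cumulative_b += b
        selected_a += a
    return 100 * selected_a / total_a


result = share_of_a_for_top_b(rows)
print(f"{result:.2f}%")
```

The load script above sorts ascending and flags rows once the running share passes 0.5, which picks the same rows from the other end of the ordering, apart from the edge case where a running sum lands exactly on the threshold.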

Find referenced value of multiple columns

I have a table Setpoints with three columns, Base, Effective and Actual, each containing an id that refers to an item in the io table.
I would like to write a query that returns the io_value from the io table for each id found in Setpoints.
Currently my query returns multiple ids, and then I query the io table to find the io_value for each id.
Example: the query returning the ids in each row
row # | base | effective | actual
1 | 24 | 30 | 40
2 | 25 | 31 | 41
3 | 26 | 32 | 42
But I want it to return the value instead of the id.
Example: returning the values for the ids instead
row # | base | effective | actual
1 | 2.3 | 4.5 | 3.44
2 | 4.2 | 7.7 | 4.41
3 | 3.9 | 8.12 | 5.42
Here are the table fields
IO
io_value
io_id
Setpoints
stpt_base
stpt_effective
stpt_actual
I'm using Postgres 9.5.
What I'm using now:
SELECT * from setpoints
For each row
SELECT io_id, io_value
from io
where io_id in
(stpt_effective, stpt_actual, stpt_base);
-- these values come from the previous query
You can solve this by joining the io table to the setpoints table three times, using one of the three columns in each JOIN:
SELECT a.io_value AS base,
b.io_value AS effective,
c.io_value AS actual
FROM setpoints s
JOIN io a ON a.io_id = s.stpt_base
JOIN io b ON b.io_id = s.stpt_effective
JOIN io c ON c.io_id = s.stpt_actual;
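Here is a runnable sketch of the triple join, using an in-memory SQLite database from Python instead of Postgres (an assumption made so it runs anywhere). The ids and values are taken from the two example result tables in the question; the ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE io (io_id INTEGER PRIMARY KEY, io_value REAL);
    CREATE TABLE setpoints (stpt_base INTEGER, stpt_effective INTEGER, stpt_actual INTEGER);
    INSERT INTO io VALUES (24, 2.3), (30, 4.5),  (40, 3.44),
                          (25, 4.2), (31, 7.7),  (41, 4.41),
                          (26, 3.9), (32, 8.12), (42, 5.42);
    INSERT INTO setpoints VALUES (24, 30, 40), (25, 31, 41), (26, 32, 42);
    """
)

# One alias of io per id column, so each id is resolved to its own value
rows = conn.execute(
    """
    SELECT a.io_value AS base,
           b.io_value AS effective,
           c.io_value AS actual
    FROM setpoints s
    JOIN io a ON a.io_id = s.stpt_base
    JOIN io b ON b.io_id = s.stpt_effective
    JOIN io c ON c.io_id = s.stpt_actual
    ORDER BY s.stpt_base
    """
).fetchall()

for row in rows:
    print(row)
```

With inner joins, a setpoints row disappears from the result if any of its three ids has no match in io; if that can happen in your data, LEFT JOIN would keep the row and return NULL for the missing value.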

Microsoft Report Builder - Row totals percent of column total

I'd really appreciate some help with Report Builder. As seen below, I have a report that shows the number of items. In my SQL query I have used a CASE statement to tag some of the items with a y or a n.
What I want to do is add a calculated cell that sums all the values of the items tagged with y and divide by the total and * 100 to find the percent of the rows tagged y of the total amount.
The answer I'm looking for is:
Apple | Y | 100
Pear | Y | 200
Orange| N | 500
Total | 800
Percent of Ys = 37.5% ((100 + 200) / 800 * 100)
I'm new to report builder so please let me know if this doesn't make sense.
Many thanks.
You could add two more columns to your query, using similar logic as your CASE statement for the Y/N column. The first column is populated with the value only when the condition for "Y" is true, otherwise it is zero. The second column is populated with the value only when the condition for "N" is true, otherwise it is zero. This would give you a result set similar to this:
       |   | All | Y   | N
Apple  | Y | 100 | 100 | 0
Pear   | Y | 200 | 200 | 0
Orange | N | 500 | 0   | 500
Total  |   | 800 | 300 | 500
Then your calculation is something like this:
Percent of Ys = (Sum(Y) / Sum(All)) * 100
i.e.
Percent of Ys = (300 / 800) * 100 = 37.5%
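The two extra columns can be sketched in SQL as conditional sums. The example below runs against an in-memory SQLite database from Python; the table name `items` and the columns `name`, `tag` and `qty` are hypothetical, since the original query is not shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, tag TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [("Apple", "Y", 100), ("Pear", "Y", 200), ("Orange", "N", 500)],
)

# Each CASE keeps the quantity only when the tag matches, otherwise 0
sum_all, sum_y, sum_n, percent_y = conn.execute(
    """
    SELECT SUM(qty)                                       AS sum_all,
           SUM(CASE WHEN tag = 'Y' THEN qty ELSE 0 END)   AS sum_y,
           SUM(CASE WHEN tag = 'N' THEN qty ELSE 0 END)   AS sum_n,
           SUM(CASE WHEN tag = 'Y' THEN qty ELSE 0 END) * 100.0 / SUM(qty)
                                                          AS percent_y
    FROM items
    """
).fetchone()

print(sum_all, sum_y, sum_n, percent_y)
```

Multiplying by 100.0 before dividing keeps the division in floating point, which avoids the integer truncation you would get from 300 / 800.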

Tabulate Command Stata

I don't know if Stata can do this, but I use the tabulate command a lot to find frequencies. For instance, I have a success variable which takes the values 0 or 1, and I would like to know the success rate for a certain group of observations, i.e. tab success if group==1. I was wondering if I can do the inverse of this operation. That is, I would like to know if I can find the values of "group" for which the frequency is greater than or equal to 15%, for example.
Is there a command that does this?
Thanks
As an example
sysuse auto
gen success=mpg<29
Now I want to find the value of price such that the frequency of the success variable is greater than 75% for example.
According to @Nick:
ssc install groups
sysuse auto
count
74
// return list (optional)
local nobs = r(N) // r(N) holds the total number of observations
groups rep78, sel(f > (0.15*`nobs')) // groups whose frequency exceeds 15%
+---------------------------------+
| rep78 Freq. Percent % <= |
|---------------------------------|
| 3 30 43.48 57.97 |
| 4 18 26.09 84.06 |
+---------------------------------+
groups rep78, sel(f > (0.10*`nobs')) // frequency of more than 10%
+----------------------------------+
| rep78 Freq. Percent % <= |
|----------------------------------|
| 2 8 11.59 14.49 |
| 3 30 43.48 57.97 |
| 4 18 26.09 84.06 |
| 5 11 15.94 100.00 |
+----------------------------------+
I'm not sure if I fully understand your question/situation, but I believe this might be useful. You can egen a variable that is equal to the mean of success, by group, and then see which observations have the value for mean(success) that you're looking for.
egen avgsuccess = mean(success), by(group)
tab group if avgsuccess >= 0.15
list group if avgsuccess >= 0.15
Does that accomplish what you want?
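In case it helps to see the logic of the egen approach spelled out, here is a rough Python equivalent: compute the mean of success within each group, then keep the groups whose mean meets the threshold. The data below is invented purely for illustration and is not from the auto dataset.

```python
from collections import defaultdict

# Hypothetical (group, success) observations; success is 0 or 1
observations = [(1, 1), (1, 0), (2, 0), (2, 0), (3, 1), (3, 1)]

# Collect the success values per group, like by(group)
by_group = defaultdict(list)
for group, success in observations:
    by_group[group].append(success)

# Mean of success within each group, like egen ... = mean(success), by(group)
avg_success = {g: sum(values) / len(values) for g, values in by_group.items()}

# Groups whose success rate meets the threshold, like "if avgsuccess >= 0.15"
threshold = 0.15
selected = sorted(g for g, mean in avg_success.items() if mean >= threshold)

print(avg_success)
print(selected)
```

Because success is coded 0/1, the group mean is exactly the success rate, which is why filtering on the mean answers the "inverse tabulate" question.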