SQL - how to get both a column and the sum of that column's values as a query result

I have a complicated stored procedure that calculates a column of numeric values and returns it as part of a data set containing other columns as well. I am trying to find a way to also return the SUM of that column in the same query. I use SQL Server Management Studio and was thinking of using an OUT parameter or even a RETURN value, but if there is a more SQL-ish way to do it, I would definitely prefer it.
SELECT
OrID, QN, PRID, PCKID, Person, Price, CSID,
CASE
WHEN (COUNT(*) OVER (PARTITION BY OrID)) > 1
THEN Price * 0.2
ELSE Price * 0.1
END AS Commission
FROM
( < my subquery > )
I would also like to add SUM(Commission) to the results of the above statement.
If my data is (partial)
OrID|Price
----+-----
1 | 100
2 | 100
2 | 50
3 | 80
I will get the following result
OrID|Price|Commission
----+-----+----------
1 | 100 | 10
2 | 100 | 20
2 | 50 | 10
3 | 80 | 8
And somewhere I would also like to see the SUM of the last column - 48
Something like Excel's SUM function at the end of the Commission column

You can use a subquery:
SELECT s.*, SUM(Commission) OVER (PARTITION BY OrID) AS sum_commission
FROM (SELECT OrID, QN, PRID, PCKID, Person, Price, CSID,
             (CASE WHEN (COUNT(*) OVER (PARTITION BY OrID)) > 1
                   THEN Price * 0.2
                   ELSE Price * 0.1
              END) AS Commission
      FROM ( < my subquery > ) q
     ) s;
I assume you want the sum per OrID. If you want the total over all rows instead (48 in your example), remove the PARTITION BY and use an empty OVER ().
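As a rough sketch of the grand-total variant, the following runs the same pattern against the question's sample rows. It assumes SQLite (3.25 or later, for window functions) through Python's sqlite3 module rather than SQL Server, and the table name orders is hypothetical, standing in for the asker's subquery.

```python
import sqlite3

# In-memory SQLite database; "orders" is a hypothetical stand-in
# for the asker's subquery, loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (OrID INTEGER, Price REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 100), (2, 100), (2, 50), (3, 80)])

query = """
SELECT s.*,
       SUM(Commission) OVER () AS total_commission  -- empty OVER () spans all rows
FROM (SELECT OrID, Price,
             CASE WHEN COUNT(*) OVER (PARTITION BY OrID) > 1
                  THEN Price * 0.2
                  ELSE Price * 0.1
             END AS Commission
      FROM orders) s
ORDER BY OrID, Price DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)  # (OrID, Price, Commission, total_commission)
```

Every row carries the same total in its last column, so the caller can read it from any row without a second query.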

Try the WITH ROLLUP option of GROUP BY; it adds summary (total) rows to the result.
https://technet.microsoft.com/en-us/library/ms189305%28v=sql.90%29.aspx?f=255&MSPPError=-2147217396

Related

Calculating an average of averages in SQL Server

I want to do something very simple but I'm obviously missing a trick! I want to get an average of average values but I want to include the weighting of the original average calculation. I'll use a stripped back version of what I'm attempting to do.
So let's say I have the following table
Product date RunInterval AvgDuration_secs Executions
--------------------------------------------------------------------
A 29/12/19 1 1 100
A 29/12/19 2 2 10
What I want to find out is the average duration for Product A on 29/12. Everything I've tried so far gives me an average of 1.5 secs, i.e. it adds together the durations of 1 and 2 secs (3) and divides by the number of rows (2). What I want is the average that takes into account how often each interval runs, so ((100*1) + (10*2)) / 110 = 1.09 secs. I've tried various attempts with GROUP BY statements and CURSORS but I'm not getting there.
I'm evidently tackling it the wrong way! Any help welcome :-)
You can do it like this:
select product, date,
round(1.0 * sum([Executions] * [AvgDuration_secs]) / sum([Executions]), 2) result
from tablename
group by product, date
I'm not sure if you want RunInterval or AvgDuration_secs in the 1st sum.
See the demo.
Results:
> product | date | result
> :------ | :---------| :-----
> A | 29/12/2019| 1.09
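To make the difference between the plain and the weighted average concrete, here is a small sketch on the two sample rows. It assumes SQLite via Python's sqlite3 module instead of SQL Server; the table name runs and the column name RunDate are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "runs" and "RunDate" are hypothetical names for the question's table.
conn.execute("""CREATE TABLE runs (
                    Product TEXT, RunDate TEXT, RunInterval INTEGER,
                    AvgDuration_secs REAL, Executions INTEGER)""")
conn.executemany("INSERT INTO runs VALUES (?, ?, ?, ?, ?)",
                 [("A", "2019-12-29", 1, 1, 100),
                  ("A", "2019-12-29", 2, 2, 10)])

weighted = conn.execute("""
SELECT Product, RunDate,
       AVG(AvgDuration_secs) AS plain_avg,           -- ignores the weights
       SUM(Executions * AvgDuration_secs) * 1.0
           / SUM(Executions) AS weighted_avg         -- weights by Executions
FROM runs
GROUP BY Product, RunDate
""").fetchall()
print(weighted)
```

The plain average treats both rows equally, while the weighted one divides the total duration (120) by the total number of executions (110).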
If those results come from a query or view that selects from some underlying table and groups by Product, date and RunInterval, then you could simply query that table directly and group only by Product and date.
An example:
--
-- Sample data
--
CREATE TABLE sometable
(
Product varchar(30),
ExecutionDatetime datetime,
RunInterval int
);
WITH RCTE_NUMS AS
(
SELECT 1 AS n
UNION ALL
SELECT n+1
FROM RCTE_NUMS
WHERE n < 110
)
INSERT INTO sometable
(Product, ExecutionDatetime, RunInterval)
SELECT
'A' p,
DATEADD(minute,n*12,'2019-12-29 01:00:00') dt,
IIF(n<=100,1,2) ri
FROM RCTE_NUMS
OPTION (MAXRECURSION 1000);
110 rows affected
select
Product,
cast(ExecutionDatetime as date) as [Date],
AVG(1.0*RunInterval) AS AvgDuration_secs,
COUNT(*) AS Executions
from sometable t
group by
Product,
cast(ExecutionDatetime as date)
ORDER BY Product, [Date]
Product | Date | AvgDuration_secs | Executions
:------ | :------------------ | :--------------- | ---------:
A | 29/12/2019 00:00:00 | 1.090909 | 110
db<>fiddle here

SQL Finding sum of rows and returning count of keys

For a database table looking something like this:
id | year | stint | sv
----+------+-------+---
mk1 | 2001 | 1 | 30
mk1 | 2001 | 2 | 20
ml0 | 1999 | 1 | 43
ml0 | 2000 | 1 | 44
hj2 | 1993 | 1 | 70
I want to get the following output:
count
-------
3
The condition is: count the ids that have sv > 40 in a single year later than 1994. If there is more than one stint in the same year, add up the sv points and check whether the sum is > 40.
This is what I have written so far but it is obviously not right:
SELECT COUNT(DISTINCT id),
SUM(sv) as SV
FROM public.pitching
WHERE (year > 1994 AND sv >40);
I know the syntax is completely wrong and some of the conditions' information is missing but I'm not familiar enough with SQL and don't know how to properly do the summing of two rows in the same table with a condition (maybe with a subquery?). Any help would be appreciated! (using postgres)
You could use a nested query to get the aggregations, and wrap that for getting the count. Note that the condition on the sum must be in a having clause:
SELECT COUNT(id)
FROM (
SELECT id,
year,
SUM(sv) as SV
FROM public.pitching
WHERE year > 1994
GROUP BY id,
year
HAVING SUM(sv) > 40 ) years
If an id should only count once even if it fulfils the condition in more than one year, then use COUNT(DISTINCT id) instead of COUNT(id).
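The difference between the two counts can be checked on the question's sample rows. This is only a sketch: it assumes SQLite via Python's sqlite3 module instead of Postgres, and it drops the public schema prefix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pitching (id TEXT, year INTEGER, stint INTEGER, sv INTEGER)")
conn.executemany("INSERT INTO pitching VALUES (?, ?, ?, ?)",
                 [("mk1", 2001, 1, 30), ("mk1", 2001, 2, 20),
                  ("ml0", 1999, 1, 43), ("ml0", 2000, 1, 44),
                  ("hj2", 1993, 1, 70)])

# Inner query: one row per (id, year) whose summed sv exceeds 40.
inner = """
SELECT id, year, SUM(sv) AS sv
FROM pitching
WHERE year > 1994
GROUP BY id, year
HAVING SUM(sv) > 40
"""
# Counts qualifying (id, year) pairs.
count_pairs = conn.execute(f"SELECT COUNT(id) FROM ({inner}) years").fetchone()[0]
# Counts each id once, however many years qualify.
count_ids = conn.execute(f"SELECT COUNT(DISTINCT id) FROM ({inner}) years").fetchone()[0]
print(count_pairs, count_ids)
```

With this data the pairs are (mk1, 2001), (ml0, 1999) and (ml0, 2000), so the first count matches the expected output of 3, while the distinct count is 2 because ml0 qualifies twice.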
You can also try it with a windowed SUM partitioned by id and year:
select count(*) from
(
select distinct id, year, sum(sv) over (partition by id, year) s
from public.pitching
where year > 1994
) t where s > 40
Online Demo

Count results in SQL statement additional row

I am trying to get 3% of total membership, which the code below does, but the result has two rows: one has the percentage and the other is 0. I'm not sure why, or how to get rid of it.
select
sum(Diabetes_FLAG) * 100 / (select round(count(medicaid_no) * 0.03) as percent
from membership) AS PERCENT_OF_Dia
from
prefinal
group by
Diabetes_Flag
I only need the percentage, not the second row. What am I doing wrong?
Output:
PERCENT_OF_DIA
1 11.1111111111111
2 0
SELECT sum(Diabetes_FLAG)*100 / (SELECT round(count(medicaid_no)*0.03) as percentt
FROM membership) AS PERCENT_OF_Dia
FROM prefinal
WHERE Diabetes_FLAG = 1
-- GROUP BY Diabetes_Flag  -- not needed, because the WHERE clause already restricts the rows to a single flag value
Remove the group by if you want one row:
select sum(Diabetes_FLAG)*100/( SELECT round(count(medicaid_no)*0.03) as percentt
from membership) AS PERCENT_OF_Dia
from prefinal;
When you include group by Diabetes_FLAG, it creates a separate row for each value of Diabetes_FLAG. Based on your results, I'm guessing that it takes on the values 0 and 1.
Not sure why it brought back a second row
This is how a GROUP BY query works. The GROUP BY clause groups data by the given column: it collects all values of that column, reduces them to a distinct set, and returns one row for each distinct value.
Please consider this simple demo: http://sqlfiddle.com/#!9/3a38df/1
SELECT * FROM prefinal;
| Diabetes_Flag |
|---------------|
| 1 |
| 1 |
| 5 |
Usually the GROUP BY column is listed in the SELECT clause too, like this:
SELECT Diabetes_Flag, sum(Diabetes_Flag)
FROM prefinal
GROUP BY Diabetes_Flag;
| Diabetes_Flag | sum(Diabetes_Flag) |
|---------------|--------------------|
| 1 | 2 |
| 5 | 5 |
As you can see, GROUP BY returns two rows, one for each unique value of the Diabetes_Flag column.
If you remove the Diabetes_Flag column from the SELECT clause, you will get the same result as above, but without this column:
SELECT sum(Diabetes_Flag)
FROM prefinal
GROUP BY Diabetes_Flag;
| sum(Diabetes_Flag) |
|--------------------|
| 2 |
| 5 |
So the reason you get 2 rows is that Diabetes_Flag has 2 distinct values in the table.
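The extra 0 row from the question can be reproduced with a tiny sketch. It assumes, as one of the answers guesses, that Diabetes_Flag only takes the values 0 and 1; the five sample rows are made up for illustration, and SQLite via Python's sqlite3 module stands in for the asker's database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prefinal (Diabetes_Flag INTEGER)")
# Hypothetical data: two members with the flag set, three without.
conn.executemany("INSERT INTO prefinal VALUES (?)",
                 [(1,), (1,), (0,), (0,), (0,)])

# One output row per distinct flag value; the flag-0 group sums to 0.
grouped = conn.execute("""
    SELECT SUM(Diabetes_Flag) FROM prefinal
    GROUP BY Diabetes_Flag
    ORDER BY Diabetes_Flag
""").fetchall()

# No GROUP BY: the whole table is a single group, so a single row.
ungrouped = conn.execute("SELECT SUM(Diabetes_Flag) FROM prefinal").fetchall()

print(grouped)
print(ungrouped)
```

The group of rows where the flag is 0 necessarily sums to 0, which is where the unwanted second row comes from.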

Only one customer from table

I need help using Teradata SQL and I hope you can help.
I have a table that looks like this:
email | article number | discount | price
customer01#test.de | 123 | 15 | 999
customer01#test.de | 456 | 30 | 1999
customer01#test.de | 789 | 30 | 999
From this table I want only the row for each customer that has the highest discount and (if there are multiple rows with the same discount) the lowest price.
So in the example above, I only want the 3rd line. How can I write a SQL query for this?
The most flexible way utilizes ROW_NUMBER:
select * from myTable
QUALIFY
ROW_NUMBER()
OVER (PARTITION BY email -- for each customer, otherwise remove it
ORDER BY discount DESC, price ASC) = 1
The simplest way, if you only need a single row overall rather than one per customer, is a plain select ordered by discount (descending) and then by price (ascending). Teradata uses TOP rather than LIMIT:
SELECT TOP 1 * FROM customers
ORDER BY discount DESC, price ASC
Use NOT EXISTS to return a row only if there is no other row with a higher discount, and no other row with the same discount and a lower price.
select *
from tablename t1
where not exists (select 1 from tablename t2
where t2.discount > t1.discount
or (t2.discount = t1.discount and t2.price < t1.price))
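On engines without QUALIFY, the ROW_NUMBER approach can be written with a derived table and an outer WHERE. The sketch below assumes SQLite (3.25 or later) via Python's sqlite3 module instead of Teradata; the table name offers, the column name article_number, and the second customer's rows are hypothetical additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE offers (
                    email TEXT, article_number INTEGER,
                    discount INTEGER, price INTEGER)""")
conn.executemany("INSERT INTO offers VALUES (?, ?, ?, ?)", [
    ("customer01#test.de", 123, 15, 999),
    ("customer01#test.de", 456, 30, 1999),
    ("customer01#test.de", 789, 30, 999),
    # Hypothetical second customer, to show the per-customer partition.
    ("customer02#test.de", 111, 10, 500),
    ("customer02#test.de", 222, 10, 300),
])

best = conn.execute("""
SELECT email, article_number, discount, price
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY email
                                ORDER BY discount DESC, price ASC) AS rn
      FROM offers t) ranked
WHERE rn = 1          -- plays the role of QUALIFY ... = 1
ORDER BY email
""").fetchall()
for row in best:
    print(row)
```

For customer01 this keeps article 789, the third line of the question's table: highest discount, and the lower price among the two rows tied at 30.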

calculating percentages for repeat rate cohorts

I wrote a repeat rate query that gives me cohort repeat rate data in the following format:
cohort_join_day | repeat_day | repeat_users
11/15/16 | 0 | 10000
11/15/16 | 1 | 6000
11/15/16 | 2 | 3000
repeat_day 0 represents the total cohort size for that day
I'm trying to skip an Excel step and add a fourth column with daily repeat rate percentages, like so:
cohort_join_day | repeat_day | repeat_users | repeat_percentage
11/15/16 | 0 | 10000 | 100%
11/15/16 | 1 | 6000 | 60%
11/15/16 | 2 | 3000 | 30%
The calculation for this column should be pretty simple, e.g.:
day 1 cohort repeat percentage on day 6 = (day 1 cohort repeat users on day 6) / (day 1 cohort repeat users on day 0)
(day 1 cohort repeat users on day 0) is the total size of the cohort
What's the best way to accomplish this?
Here's the daily repeat rate query I wrote:
SELECT
to_char(cohort_join_day, 'YYYY-MM-DD') AS cohort_join_day,
EXTRACT(DAY FROM (current_day - cohort_join_day)) AS repeat_day,
COUNT(DISTINCT unique_id) AS repeat_users
FROM
(
SELECT
auu.unique_id,
date_trunc('day', auu.ds) AS current_day,
date_trunc('day', fsb.ds) AS cohort_join_day
FROM rust.a_unique_users AS auu
JOIN mobile.first_seen_byos AS fsb
ON fsb.unique_id = auu.unique_id
WHERE
auu.os_type = 'iphone_native_app'
AND fsb.ds >= '2016-11-01'
) AS uniques_by_day
WHERE
cohort_join_day <= current_day
GROUP BY
cohort_join_day,
repeat_day;
SQL DEMO
WITH boo AS (
SELECT *
FROM foo -- here go your query
), base as (
SELECT "repeat_users"
FROM boo
WHERE "repeat_day" = 0
)
SELECT boo.cohort_join_day,
boo.repeat_day,
boo.repeat_users,
100* ((boo.repeat_users * 1.0) / base.repeat_users) as repeat_percentage
FROM boo
CROSS JOIN base
SELECT
*
,(repeat_users * 100.0) /
MAX(CASE WHEN repeat_day = 0 THEN repeat_users END) OVER () as repeat_percentage
FROM
Table
Conditional aggregation and window functions make this much easier.
And if you are trying to do this calculation for every day, then PARTITION the window function by cohort_join_day:
SELECT
*
,(repeat_users * 100.0) /
MAX(CASE WHEN repeat_day = 0 THEN repeat_users END) OVER (PARTITION BY cohort_join_day) as repeat_percentage
FROM
Table
MAX(column) OVER () simply provides the MAX value of the column across the entire data set.
MAX(column) OVER (PARTITION BY column2) will provide the MAX value in that column for the matching column2 value. You can think of PARTITION BY similar to GROUP BY.
Replacing column with a CASE expression gives you conditional aggregation. For example, when you only want repeat_users where repeat_day = 0, the CASE expression yields that single value per partition and NULL for every other row, and MAX ignores the NULLs.
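A quick way to see the partitioned conditional MAX at work is to run it on the question's sample rows plus one more cohort. This is a sketch only: it assumes SQLite (3.25 or later) via Python's sqlite3 module rather than Postgres, the table name cohort_repeats is hypothetical, and the 2016-11-16 cohort is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE cohort_repeats (
                    cohort_join_day TEXT, repeat_day INTEGER, repeat_users INTEGER)""")
conn.executemany("INSERT INTO cohort_repeats VALUES (?, ?, ?)", [
    ("2016-11-15", 0, 10000), ("2016-11-15", 1, 6000), ("2016-11-15", 2, 3000),
    # Hypothetical second cohort, to show that each partition uses its own day 0.
    ("2016-11-16", 0, 8000), ("2016-11-16", 1, 2000),
])

pct_rows = conn.execute("""
SELECT cohort_join_day, repeat_day, repeat_users,
       repeat_users * 100.0 /
       MAX(CASE WHEN repeat_day = 0 THEN repeat_users END)
           OVER (PARTITION BY cohort_join_day) AS repeat_percentage
FROM cohort_repeats
ORDER BY cohort_join_day, repeat_day
""").fetchall()
for row in pct_rows:
    print(row)
```

Each cohort is divided by its own day-0 size, so the first cohort yields 100, 60 and 30 percent, as in the question's desired output.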
So if you wanted to do the same thing in a straight query without the window function you would do something like this:
SELECT
t.*
,(t.repeat_users * 100.0) / (SELECT t2.repeat_users
FROM
Table t2
WHERE
t.cohort_join_day = t2.cohort_join_day
AND t2.repeat_day = 0) as repeat_percentage
FROM
Table t
And to show you how to do it with Juan Carlo's method when you have multiple days involved you could do it like so:
WITH cte AS (
SELECT
cohort_join_day
,repeat_users
FROM
Table
WHERE
repeat_day = 0
)
SELECT
t.*
,(t.repeat_users * 100.0) / c.repeat_users as repeat_percentage
FROM
Table t
CROSS JOIN cte c
WHERE
t.cohort_join_day = c.cohort_join_day
If you ever want a running total, try something like
SUM(column) OVER (PARTITION BY column2 ORDER BY column3)
Definitely get familiar with window functions; they are life savers these days.
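For completeness, here is what the running-total pattern might look like on the question's sample cohort, again as a sketch assuming SQLite (3.25 or later) via Python's sqlite3 module and a hypothetical table name cohort_repeats.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE cohort_repeats (
                    cohort_join_day TEXT, repeat_day INTEGER, repeat_users INTEGER)""")
conn.executemany("INSERT INTO cohort_repeats VALUES (?, ?, ?)", [
    ("2016-11-15", 0, 10000), ("2016-11-15", 1, 6000), ("2016-11-15", 2, 3000),
])

running = conn.execute("""
SELECT cohort_join_day, repeat_day, repeat_users,
       SUM(repeat_users) OVER (PARTITION BY cohort_join_day
                               ORDER BY repeat_day) AS running_total
FROM cohort_repeats
ORDER BY cohort_join_day, repeat_day
""").fetchall()
for row in running:
    print(row)
```

Adding ORDER BY inside OVER turns the sum into a cumulative one: each row gets the sum of its own value and all earlier repeat_day values in the same cohort.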