Calculating percentage by group using Teradata - sql

I'm trying to create a table that displays the percentage of counts per state dependent on the indicator.
Here's an example of the dataset I'm using to create my new table.
+-------------+-------+-------+
| Indicator | State | Count |
+-------------+-------+-------+
| Registered | CA | 25 |
| Registered | FL | 12 |
| Total | CA | 50 |
| Total | FL | 36 |
+-------------+-------+-------+
I'm trying to create a new table that would have a Percentage for each corresponding row like this:
+-------------+-------+-------+------------+
| Indicator | State | Count | Percentage |
+-------------+-------+-------+------------+
| Registered | CA | 25 | 50 |
| Registered | FL | 12 | 33.3 |
| Total | CA | 50 | . |
| Total | FL | 36 | . |
+-------------+-------+-------+------------+
So far, I've tried the query below:
select indicator, state, count
, case when (select count from table where indicator='Registered') * 100 / (select count from table where indicator='Total')
when indicator = 'Total' then . end as Percentage
from table;
This doesn't work because I get an error: "Subquery evaluated more than one row." I'm guessing it's because I'm not taking the state into account in the case when statement, but I'm not sure how I would go about that.
What would be the best way to do this?

Just join the table back with itself.
select a.indicator, a.state, a.count
, case when (a.indicator='Total') then null
else 100.0 * a.count/b.count
end as Percentage
from table a
inner join (select state,count from table where indicator='Total') b
on a.state = b.state
;
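As a rough check of the self-join idea, here is a minimal sketch that runs it against the sample data in an in-memory SQLite database via Python's sqlite3 module (not Teradata, so syntax details may differ). The table name counts and the column name cnt are hypothetical stand-ins for the reserved words table and count, and ROUND(..., 1) is added only to match the one-decimal output in the question.

```python
import sqlite3

# Hypothetical names: "counts" and "cnt" replace the reserved words
# "table" and "count" used in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counts (indicator TEXT, state TEXT, cnt INTEGER)")
conn.executemany(
    "INSERT INTO counts VALUES (?, ?, ?)",
    [("Registered", "CA", 25), ("Registered", "FL", 12),
     ("Total", "CA", 50), ("Total", "FL", 36)],
)

# Join every row to the 'Total' row of the same state, then divide.
# 100.0 forces decimal arithmetic instead of integer division.
join_rows = conn.execute("""
    SELECT a.indicator, a.state, a.cnt,
           CASE WHEN a.indicator = 'Total' THEN NULL
                ELSE ROUND(100.0 * a.cnt / b.cnt, 1)
           END AS percentage
    FROM counts a
    JOIN (SELECT state, cnt FROM counts WHERE indicator = 'Total') b
      ON a.state = b.state
    ORDER BY a.indicator, a.state
""").fetchall()

for row in join_rows:
    print(row)
```

The 'Total' rows come back with a NULL percentage (None in Python), which corresponds to the "." placeholder in the desired output.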

You can use window functions:
select t.*,
(case when indicator <> 'Total'
then count * 100.0 / sum(case when indicator = 'Total' then count end) over (partition by state)
end) as percentage
from t;
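The window-function variant can be tried the same way. This is a sketch against SQLite (window functions need SQLite 3.25 or later), not Teradata; the table name counts and column name cnt are hypothetical replacements for the reserved words in the question, and the rounding is only there to mirror the expected output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counts (indicator TEXT, state TEXT, cnt INTEGER)")
conn.executemany(
    "INSERT INTO counts VALUES (?, ?, ?)",
    [("Registered", "CA", 25), ("Registered", "FL", 12),
     ("Total", "CA", 50), ("Total", "FL", 36)],
)

# The conditional SUM picks out the 'Total' count of each state and the
# OVER clause makes that value visible on every row of the same state.
window_rows = conn.execute("""
    SELECT indicator, state, cnt,
           CASE WHEN indicator <> 'Total'
                THEN ROUND(cnt * 100.0 /
                           SUM(CASE WHEN indicator = 'Total' THEN cnt END)
                               OVER (PARTITION BY state), 1)
           END AS percentage
    FROM counts
    ORDER BY indicator, state
""").fetchall()

for row in window_rows:
    print(row)
```

Compared with the self-join, this reads the table once and needs no second alias, at the cost of requiring window-function support.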

Related

SQL how to calculate median not based on rows

I have a sample of cars in my table and I would like to calculate the median price for my sample with SQL. What is the best way to do it?
+-----+-------+----------+
| Car | Price | Quantity |
+-----+-------+----------+
| A | 100 | 2 |
| B | 150 | 4 |
| C | 200 | 8 |
+-----+-------+----------+
I know that I can use percentile_cont (or percentile_disc) if my table is like this:
+-----+-------+
| Car | Price |
+-----+-------+
| A | 100 |
| A | 100 |
| B | 150 |
| B | 150 |
| B | 150 |
| B | 150 |
| C | 200 |
| C | 200 |
| C | 200 |
| C | 200 |
| C | 200 |
| C | 200 |
| C | 200 |
| C | 200 |
+-----+-------+
But in the real world, my first table has about 100 million rows and the second table would have about 3 billion rows (and moreover I don't know how to transform my first table into the second).
Here is a way to do this in SQL Server.
First, I calculate the positions of the lower and upper bounds for the median. If the total number of elements x is odd, both bounds are the same position; otherwise they are the (x/2)th and (x/2 + 1)th values.
Then I compute the cumulative sum of the quantity and use it to pick the prices at the lower and upper bounds, as follows:
with median_dt
as (
select case when sum(quantity)%2=0 then
sum(quantity)/2
else
sum(quantity)/2 + 1
end as lower_limit
,case when sum(quantity)%2=0 then
(sum(quantity)/2) + 1
else
sum(quantity)/2 + 1
end as upper_limit
from t
)
,data
as (
select *,sum(quantity) over(order by price asc) as cum_sum
from t
)
,rnk_val
as(select *
from (
select price,row_number() over(order by d.cum_sum asc) as rnk
from data d
join median_dt b
on b.lower_limit<=d.cum_sum
)x
where x.rnk=1
union all
select *
from (
select price,row_number() over(order by d.cum_sum asc) as rnk
from data d
join median_dt b
on b.upper_limit<=d.cum_sum
)x
where x.rnk=1
)
select avg(price) as median
from rnk_val
+--------+
| median |
+--------+
| 200 |
+--------+
db fiddle link
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=c5cfa645a22aa9c135032eb28f1749f6
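The same idea (find the two middle positions, then look them up in the running quantity) can be sketched more compactly. The version below is an illustration run on SQLite through Python's sqlite3, not the SQL Server query above; the CTE names bounds and running and the columns lower_pos, upper_pos and cum_qty are made up for this example. It relies on integer division in SQLite and needs SQLite 3.25 or later for the window function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (car TEXT, price INTEGER, quantity INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("A", 100, 2), ("B", 150, 4), ("C", 200, 8)],
)

# bounds:  1-based positions of the middle element(s) in the expanded
#          sample; both are equal when the total quantity is odd.
# running: cumulative quantity ordered by price.
median = conn.execute("""
    WITH bounds AS (
        SELECT (SUM(quantity) + 1) / 2 AS lower_pos,
               SUM(quantity) / 2 + 1   AS upper_pos
        FROM t
    ),
    running AS (
        SELECT price, SUM(quantity) OVER (ORDER BY price) AS cum_qty
        FROM t
    )
    SELECT (
        (SELECT MIN(price) FROM running, bounds WHERE cum_qty >= lower_pos) +
        (SELECT MIN(price) FROM running, bounds WHERE cum_qty >= upper_pos)
    ) / 2.0
""").fetchone()[0]

print(median)
```

For the sample data the total quantity is 14, so the middle positions are 7 and 8; both fall inside car C's block (cumulative quantity 7 to 14), which is why the result is 200. The expanded table is never materialized.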
This looks right on a few results, but try it on a larger set to double-check.
First create a table which has the running quantity (ordered by price) and the overall quantity for each car (or use a CTE or sub-query, your choice). I'm just creating a separate table here.
create table table2 as
(
select car,
price,
sum(quantity) over (order by price) as rollsum,
sum(quantity) over () as total
from table1
)
Then run this query, which looks for the lowest price whose running quantity reaches the middle of the sample.
select min(price) as median
from table2
where rollsum >= total / 2.0
For the sample data this returns 200.

Getting a distinct value from one column if all rows matches a certain criteria

I'm trying to find a performant and easy-to-read query to get a distinct value from one column, if all rows for that value match a certain criterion.
I have a table that tracks e-commerce orders and whether they're delivered on time, with contents and schema as follows:
> select * from orders;
+----+--------------------+-------------+
| id | delivered_on_time | customer_id |
+----+--------------------+-------------+
| 1 | 1 | 9 |
| 2 | 0 | 9 |
| 3 | 1 | 10 |
| 4 | 1 | 10 |
| 5 | 0 | 11 |
+----+--------------------+-------------+
I would like to get all distinct customer_id's which have had all their orders delivered on time. I.e. I would like an output like this:
+-------------+
| customer_id |
+-------------+
| 10 |
+-------------+
What's the best way to do this?
I've found a solution, but it's a bit hard to read and I doubt it's the most efficient way to do it (using double CTE's):
> with hits_all as (
select customer_id,count(*) as count from orders group by customer_id
),
hits_true as
(select customer_id,count(*) as count from orders where delivered_on_time = 1 group by customer_id)
select
*
from
hits_true
inner join
hits_all on
hits_all.customer_id = hits_true.customer_id
and hits_all.count = hits_true.count;
+-------------+-------+-------------+-------+
| customer_id | count | customer_id | count |
+-------------+-------+-------------+-------+
| 10 | 2 | 10 | 2 |
+-------------+-------+-------------+-------+
You can use group by and having as follows:
select customer_id
from orders
group by customer_id
having sum(delivered_on_time) = count(*)
This works because an on-time delivery is identified by delivered_on_time = 1, so the sum of delivered_on_time equals the number of records for a customer only when every one of that customer's orders was on time.
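A quick way to convince yourself is to run the query on the sample rows. The sketch below uses an in-memory SQLite database through Python's sqlite3 and assumes delivered_on_time only ever holds 0 or 1 and is never NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, delivered_on_time INTEGER, customer_id INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 9), (2, 0, 9), (3, 1, 10), (4, 1, 10), (5, 0, 11)],
)

# A customer qualifies only if the number of on-time orders (the sum of
# the 0/1 flag) equals the total number of orders.
on_time = [
    row[0]
    for row in conn.execute("""
        SELECT customer_id
        FROM orders
        GROUP BY customer_id
        HAVING SUM(delivered_on_time) = COUNT(*)
    """)
]

print(on_time)
```

Customer 9 has a sum of 1 against a count of 2 and customer 11 has 0 against 1, so only customer 10 is returned.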
You can use aggregation and having:
select customer_id
from orders
group by customer_id
having min(delivered_on_time) = 1;

Transposing lines containing Text to columns

I have a table just like this one:
+----+---------+-------------+------------+
| ID | Period | Total Units | Source |
+----+---------+-------------+------------+
| 1 | Past | 400 | Competitor |
| 1 | Present | 250 | PAWS |
| 2 | Past | 3 | BP |
| 2 | Present | 15 | BP |
+----+---------+-------------+------------+
And I'm trying to transpose the lines into columns, so that for each ID I have one unique line that compares past and present numbers and attributes, like this:
+----+------------------+---------------------+-------------+----------------+
| ID | Total Units Past | Total Units Present | Source Past | Source Present |
+----+------------------+---------------------+-------------+----------------+
| 1 | 400 | 250 | Competitor | PAWS |
| 2 | 3 | 15 | BP | BP |
+----+------------------+---------------------+-------------+----------------+
Transposing the total units is not a problem, as I use SUM(CASE WHEN Period = 'Past' THEN Total_Units ELSE 0 END) AS Total_Units_Past.
However, I don't know how to proceed with text columns. I've seen pivot and unpivot clauses used, but they all rely on an aggregate function at some point.
You can do conditional aggregation:
select id,
sum(case when period = 'Past' then units else 0 end) as unitspast,
sum(case when period = 'Present' then units else 0 end) as unitspresent,
max(case when period = 'Past' then source end) as sourcepast,
max(case when period = 'Present' then source end) as sourcepresent
from table t
group by id;
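MAX works on text because, within each ID, the CASE expression yields the source for the matching period and NULL otherwise, and aggregates skip NULLs. A minimal sketch on SQLite via Python's sqlite3 follows; the table name unit_history and the column name units are assumptions for illustration.

```python
import sqlite3

# Hypothetical table/column names: unit_history, units.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE unit_history (id INTEGER, period TEXT, units INTEGER, source TEXT)"
)
conn.executemany(
    "INSERT INTO unit_history VALUES (?, ?, ?, ?)",
    [(1, "Past", 400, "Competitor"), (1, "Present", 250, "PAWS"),
     (2, "Past", 3, "BP"), (2, "Present", 15, "BP")],
)

# Each CASE leaves exactly one non-NULL value per id and period, so
# MAX simply returns that value, whether it is a number or a string.
pivoted = conn.execute("""
    SELECT id,
           SUM(CASE WHEN period = 'Past'    THEN units ELSE 0 END) AS units_past,
           SUM(CASE WHEN period = 'Present' THEN units ELSE 0 END) AS units_present,
           MAX(CASE WHEN period = 'Past'    THEN source END)       AS source_past,
           MAX(CASE WHEN period = 'Present' THEN source END)       AS source_present
    FROM unit_history
    GROUP BY id
    ORDER BY id
""").fetchall()

for row in pivoted:
    print(row)
```

If an ID could have several rows for the same period, MAX would return the alphabetically last source, so this relies on one row per ID and period.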
Assuming you only have two rows per ID, you could also join:
Select a.ID, a.units as UnitsPast, a.source as SourcePast
, b.units as UnitsPresent, b.source as SourcePresent
from MyTable a
left join MyTable b
on a.ID = b.ID
and b.period = 'Present'
where a.period = 'Past'

An SQL query that combines aggregate and non-aggregate values in one row

The following query gives me the information that I need, but I want to take it a step further. In the table at the bottom (only showing a subset of the fields), I want to group by cust_line in an unusual way (at least to me it's unusual).
Let's look at the items with a cust_line of 2 as an example. I would like these to be represented by one line, not 5. For this line, I would like to select all the fields except the price field from the row where cust_part = "GROUPINV". For the total field I would like 'sum(total) as new_total', and for the price I would like new_total / qty_invoiced, where qty_invoiced is the value on the line where cust_part = "GROUPINV".
Is what I am asking for completely ridiculous? Is it even possible? I'm not advanced at SQL so it may also be easy and I just don't know how to approach it. I thought of using 'partition by' but I couldn't imagine how I would get it to work as I figured it would still return 5 rows where I only want 1.
I've also looked at these questions with similar titles but not really what I am looking for:
SQL query that returns aggregate AND non aggregate results
Combined aggregated and non-aggregate query in SQL
SELECT L.CUST_LINE, I.LINE_NO, I.ORDER_NO, I.STAGE, I.ORDER_LINE_POS, I.CUST_PART,
I.LINE_ITEM_NO, I.QTY_INVOICED, I.CUST_DESC, I.DESCRIPTION, I.SALE_UNIT_PRICE, I.PRICE_TOTAL,
I.INVOICE_NO, I.CUSTOMER_PO_NO, I.ORDER_NO, I.CUSTOMER_NO, I.CATALOG_DESC, I.ORDER_LINE_NOTES
FROM
(SELECT CUST_LINE, ORDER_NO, LINE_NO
FROM CUSTOMER_ORDER_LINE
GROUP BY CUST_LINE, ORDER_NO, LINE_NO
) L
INNER JOIN CUSTOMER_ORDER_IVC_REP I
ON I.ORDER_NO = L.ORDER_NO
WHERE RESULT_KEY = 999999
AND I.LINE_NO = L.LINE_NO
ORDER BY L.CUST_LINE;
| cust_line | line_no | cust_part | qty_invoiced | cust_desc | price | total |
| 1 | 4 | ... | 1 | ... | 55 | 55 |
| 2 | 1 | GROUPINV | 1 | some part | 0 | 0 |
| 2 | 6 | ... | 3 | ... | 0 | 0 |
| 2 | 2 | ... | 1 | ... | 0 | 0 |
| 2 | 3 | ... | 1 | ... | 0 | 0 |
| 2 | 7 | ... | 2 | ... | 10 | 20 |
| 3 | 7 | ... | 1 | ... | 67 | 67 |
You can use an analytic function to calculate a total over multiple rows of a result set, then filter out the rows you don't want.
Leaving out all the extra columns for sake of brevity:
SELECT cust_line, qty_invoiced, order_total/qty_invoiced AS price
FROM (
SELECT l.cust_line, i.cust_part, i.qty_invoiced,
SUM(total) OVER (PARTITION BY l.cust_line) AS order_total,
COUNT(cust_line) OVER (PARTITION BY l.cust_line) AS group_count
FROM
(SELECT CUST_LINE, ORDER_NO, LINE_NO
FROM CUSTOMER_ORDER_LINE
GROUP BY CUST_LINE, ORDER_NO, LINE_NO
) L
INNER JOIN CUSTOMER_ORDER_IVC_REP I
ON I.ORDER_NO = L.ORDER_NO
WHERE RESULT_KEY = 999999
AND I.LINE_NO = L.LINE_NO
)
WHERE ( cust_part = 'GROUPINV' OR group_count = 1 )
ORDER BY cust_line
I am guessing on what you want in the PARTITION BY clause; this is essentially a GROUP BY that applies only to the SUM function. Not sure if you might also want order_no in the partition.
The trick is to select all the rows in the inner query, applying SUM across them all; then filter out the rows you are not interested in in the outermost query.
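To see the "aggregate over everything, then filter" trick in isolation, here is a simplified sketch on SQLite via Python's sqlite3. It collapses the two source tables into a single hypothetical table invoice_lines holding only the columns shown in the question, and the part codes 'PART-x' are placeholders for the values elided as "..." there.

```python
import sqlite3

# Hypothetical single table standing in for the joined result set.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE invoice_lines (
        cust_line INTEGER, line_no INTEGER, cust_part TEXT,
        qty_invoiced INTEGER, price INTEGER, total INTEGER
    )
""")
conn.executemany(
    "INSERT INTO invoice_lines VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 4, "PART-x", 1, 55, 55),
     (2, 1, "GROUPINV", 1, 0, 0),
     (2, 6, "PART-x", 3, 0, 0),
     (2, 2, "PART-x", 1, 0, 0),
     (2, 3, "PART-x", 1, 0, 0),
     (2, 7, "PART-x", 2, 10, 20),
     (3, 7, "PART-x", 1, 67, 67)],
)

# Inner query: every row keeps its own columns and also sees the group
# total and group size. Outer query: keep the GROUPINV row of each
# multi-row group, or the only row of a single-row group.
grouped = conn.execute("""
    SELECT cust_line, qty_invoiced, order_total,
           order_total * 1.0 / qty_invoiced AS price
    FROM (
        SELECT cust_line, cust_part, qty_invoiced,
               SUM(total) OVER (PARTITION BY cust_line) AS order_total,
               COUNT(*)   OVER (PARTITION BY cust_line) AS group_count
        FROM invoice_lines
    ) x
    WHERE cust_part = 'GROUPINV' OR group_count = 1
    ORDER BY cust_line
""").fetchall()

for row in grouped:
    print(row)
```

The five rows for cust_line 2 collapse to the GROUPINV row with a total of 20 and a price of 20 / 1, while the single-row groups pass through unchanged.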

Inserting results of two sums together to a table

Following my previous question, I've managed to get things to work using Microsoft SQL Server by importing the Excel file into a new database. Now my question is about writing the right SQL commands. I'm trying to sum the same column of a table twice using different criteria, and then insert those numbers into a different table. I know how to insert a sum into a table, but I wonder how I can insert two sums that come from the same column into the same row of a table (since each time I insert, a new row is created).
Additional question: how should I organize the query when my results depend on values from another table? Sample data as follows.
Some sample data:
| DeptID | Department |
+--------+------------+
| 15 | Eng |
| 16 | Eng |
| 17 | Mkt |
| 18 | Mkt |
| Person | DeptID | Type | Number |
+--------+------------+------+---------+
| A | 15 | p1 | 1 |
| B | 18 | p2 | 5 |
| C | 16 | p2 | 10 |
| D | 17 | p1 | 7 |
| E | 18 | p1 | 11 |
| F | 16 | p2 | 12 |
So the result I should give is as such:
| Department | Sum of p1 | Sum of p2 |
+------------+------------+-----------+
| Eng | 1 | 22 |
| Mkt | 18 | 5 |
What I've tried is as follows:
select sum(Amount) as engsump1 from Sheet1$
where Department = 'Eng' and Type = 'p1'
select sum(Amount) as engsump2 from Sheet1$
where Department = 'Eng' and Type = 'p2'
select sum(Amount) as mktsump1 from Sheet1$
where Department = 'Mkt' and Type = 'p1'
select sum(Amount) as mktsump2 from Sheet1$
where Department = 'Mkt' and Type = 'p2'
I run these once and, after I see the results, perform the insert. I'm just wondering if I can do these two steps in one.
To sum the 'number' column in each group, you can use the SUM() function. Just put a case statement inside it, so you can sum the cases when type is p1, and sum the cases when type is p2.
To separate the departments, just use a GROUP BY clause. Try this:
SELECT department,
SUM(CASE WHEN type = 'p1' THEN number ELSE 0 END) AS totalP1,
SUM(CASE WHEN type = 'p2' THEN number ELSE 0 END) AS totalP2
FROM myTable
GROUP BY department;
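Since the department name lives in a separate lookup table in the sample data, the query needs a join, and the whole thing can feed an INSERT ... SELECT so that both sums land in one row per department in a single step. The sketch below runs on SQLite via Python's sqlite3 rather than SQL Server; the table names departments, staff and dept_totals and the column name amount are hypothetical.

```python
import sqlite3

# Hypothetical table names: departments, staff, dept_totals.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departments (deptid INTEGER, department TEXT)")
conn.execute(
    "CREATE TABLE staff (person TEXT, deptid INTEGER, type TEXT, amount INTEGER)"
)
conn.execute(
    "CREATE TABLE dept_totals (department TEXT, total_p1 INTEGER, total_p2 INTEGER)"
)
conn.executemany(
    "INSERT INTO departments VALUES (?, ?)",
    [(15, "Eng"), (16, "Eng"), (17, "Mkt"), (18, "Mkt")],
)
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?, ?)",
    [("A", 15, "p1", 1), ("B", 18, "p2", 5), ("C", 16, "p2", 10),
     ("D", 17, "p1", 7), ("E", 18, "p1", 11), ("F", 16, "p2", 12)],
)

# One statement: join to get the department name, compute both
# conditional sums per department, and insert one row per department.
conn.execute("""
    INSERT INTO dept_totals (department, total_p1, total_p2)
    SELECT d.department,
           SUM(CASE WHEN s.type = 'p1' THEN s.amount ELSE 0 END),
           SUM(CASE WHEN s.type = 'p2' THEN s.amount ELSE 0 END)
    FROM staff s
    JOIN departments d ON d.deptid = s.deptid
    GROUP BY d.department
""")

totals = conn.execute(
    "SELECT department, total_p1, total_p2 FROM dept_totals ORDER BY department"
).fetchall()

for row in totals:
    print(row)
```

The stored rows match the expected result in the question: Eng with 1 and 22, Mkt with 18 and 5.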