Calculate % of total - Redshift / SQL

I'm trying to calculate the percentage of one column over a secondary total column.
I wrote:
create temporary table screenings_count_2018 as
select guid,
datepart(y, screening_screen_date) as year,
sum(case when screening_package = 4 then 1 end) as count_package_4,
sum(case when screening_package = 3 then 1 end) as count_package_3,
sum(case when screening_package = 2 then 1 end) as count_package_2,
sum(case when screening_package = 1 then 1 end) as count_package_1,
sum(case when screening_package in (1, 2, 3, 4) then 1 end) as count_total_packages
from prod.leasing_fact
where year = 2018
group by guid, year;
That table establishes the initial count and total count columns. All columns look correct.
Then, I'm using ratio_to_report to calculate the percentage (referencing this tutorial):
create temporary table screenings_percentage as
select
guid,
year,
ratio_to_report(count_package_1) over (partition by count_total_packages) as percentage_package_1
from screenings_count_2018
group by guid, year,count_package_1,count_total_packages
order by percentage_package_1 desc;
I also tried:
select
guid,
year,
sum(count_package_1/count_total_packages) as percentage_package_1
-- ratio_to_report(count_package_1) over (partition by count_total_packages) as percentage_package_1
from screenings_count_2018
group by guid, year,count_package_1,count_total_packages
order by percentage_package_1 desc;
Unfortunately, percentage_package_1 returns only null values (this is not correct; I'm expecting percentages). Neither approach works.
What am I doing wrong?
Thanks!

Since you have already laid out the columns with the components and a total when creating screenings_count_2018, do you actually need ratio_to_report?
select guid
, year
, 1.0 * count_package_1 / count_total_packages as percentage_package_1
, 1.0 * count_package_2 / count_total_packages as percentage_package_2
, 1.0 * count_package_3 / count_total_packages as percentage_package_3
, 1.0 * count_package_4 / count_total_packages as percentage_package_4
from screenings_count_2018;
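That should work. Make sure the division is not integer division. If the count columns are integers, Redshift truncates the result (for example, 3/4 returns 0), so multiply by 1.0 or cast first. NB: are you guaranteed that count_total_packages is never zero? If it can be zero, you'll need to handle it. One way is with a CASE expression.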
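The following minimal sketch shows the integer-division and zero-total issues side by side. It uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for Redshift. This assumes that SQLite, like Redshift, truncates integer / integer division. The rows are hypothetical, and the CASE expression is just one way to guard against a zero total.

```python
import sqlite3

# In-memory SQLite database standing in for Redshift (assumption: both
# truncate integer / integer division, which is the pitfall shown here).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE screenings_count_2018 ("
    "guid TEXT, year INTEGER, "
    "count_package_1 INTEGER, count_total_packages INTEGER)"
)
# Hypothetical rows; guid 'c' has a zero total.
conn.executemany(
    "INSERT INTO screenings_count_2018 VALUES (?, ?, ?, ?)",
    [("a", 2018, 1, 4), ("b", 2018, 3, 3), ("c", 2018, 0, 0)],
)

rows = conn.execute(
    """
    SELECT guid,
           -- integer / integer: 1 / 4 truncates to 0
           CASE WHEN count_total_packages = 0 THEN NULL
                ELSE count_package_1 / count_total_packages
           END AS int_division,
           -- multiplying by 1.0 first forces non-integer division;
           -- the CASE guard returns NULL instead of dividing by zero
           -- (Redshift raises an error on division by zero)
           CASE WHEN count_total_packages = 0 THEN NULL
                ELSE 1.0 * count_package_1 / count_total_packages
           END AS percentage_package_1
    FROM screenings_count_2018
    ORDER BY guid
    """
).fetchall()
print(rows)  # [('a', 0, 0.25), ('b', 1, 1.0), ('c', None, None)]
```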
If you want the per-package percentages to appear in a single column, you can use ratio_to_report. It is a window (analytic) function, and against the original table it would look something like this:
with count_table as (
select guid
, datepart(y, screening_screen_date) as year
, screening_package
, count(1) as screening_count
from prod.leasing_fact
where datepart(y, screening_screen_date) = 2018
group by guid
, datepart(y, screening_screen_date)
, screening_package
)
select guid
, year
, screening_package
-- partition by guid and year only; adding screening_package would make every ratio 1
, ratio_to_report(screening_count) over (partition by guid, year) as perc_of_total
from count_table
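SQLite has no ratio_to_report, so the sketch below swaps in the equivalent expression cnt / SUM(cnt) OVER (...). It runs through Python's sqlite3 module on hypothetical counts and needs SQLite 3.25 or later for window functions.

The sketch also shows why the partition should contain only guid and year. Adding screening_package puts each row in its own partition, so every ratio comes out as 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite 3.25+ needed for window functions
conn.execute(
    "CREATE TABLE count_table ("
    "guid TEXT, year INTEGER, screening_package INTEGER, cnt INTEGER)"
)
# Hypothetical per-package counts, as the CTE would produce them.
conn.executemany(
    "INSERT INTO count_table VALUES (?, ?, ?, ?)",
    [("a", 2018, 1, 1), ("a", 2018, 2, 3), ("b", 2018, 4, 2)],
)

rows = conn.execute(
    """
    SELECT guid, year, screening_package,
           -- stands in for ratio_to_report(cnt) OVER (PARTITION BY guid, year)
           1.0 * cnt / SUM(cnt) OVER (PARTITION BY guid, year) AS perc_of_total,
           -- partitioning by the package too leaves one row per partition
           1.0 * cnt / SUM(cnt) OVER (PARTITION BY guid, year, screening_package)
               AS per_package_partition
    FROM count_table
    ORDER BY guid, screening_package
    """
).fetchall()
for row in rows:
    print(row)
```

Guid "a" gets 0.25 and 0.75 for its two packages, while the per_package_partition column is 1.0 on every row.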

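You will need round(100.0*count_package_1/count_total_packages,1) and so on, since you have already calculated the subtotals and the total. Multiplying by 100.0 first also avoids integer division, which would otherwise truncate the ratio to 0.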

Related

using case when in sum function is returning wrong results for an aggregated table that is joined to itself in SQL

I have a table of transactions from customers who buy credits for our products. Let's just say for this example it's for pizza products. When a customer buys credits a new row is added to the table with their customer_id, amount uploaded, date_time, note describing whether credits were bought or another type of transaction, their previous balance, voucher balance, and condition explaining whether the process was confirmed or not.
I want to make a new table with some stats for each user, so the table should contain only one row per user. Along with the stats, I want to include each user's last balance. To do this, I have to get the last row for each user and join it back to the table, or at least that was my impression from the answers I saw online. This is my attempt, along with some sample data:
CREATE TABLE pizza_transactions
(customer_id int,
amount int,
date_time date,
note varchar,
previous_balance int,
previous_voucher_balance int,
condition1 varchar)
;
INSERT INTO pizza_transactions
(customer_id, amount, date_time, note,previous_balance, previous_voucher_balance, condition1)
VALUES
(1, 10, '2022-01-01','Pizza credits bought',100,50,'confirmed'),
(1, -45, '2022-02-02','something else',110,50, 'processing'),
(2, 70, '2022-05-01','Pizza credits bought',20,5,'confirmed'),
(3, 20, '2022-09-01','Pizza credits bought',10,15,'confirmed'),
(3, 10, '2022-09-02','Pizza credits bought',30,15,'confirmed'),
(3, -15, '2022-09-03','something else',40,15,'processing');
select u.customer_id,
sum(case when note like '%Pizza credits bought%' and condition1 = 'confirmed' then amount else 0 end) as total_bought,
avg(case when note like '%Pizza credits bought%' and condition1 = 'confirmed' then amount end) as avg_bought,
min(case when note like '%Pizza credits bought%' then date_time end) as first_purchased_date,
max(case when note like '%Pizza credits bought%' then date_time end) as last_purchased_date,
max(case when hu.rn1 = 1 then hu.previous_balance end) as last_balance,
max(case when hu.rn1 = 1 then hu.previous_voucher_balance end) as last_voucher_balance
from pizza_transactions as u
inner join (
select row_number() over (partition by customer_id order by date_time desc) as rn1,
previous_balance,
previous_voucher_balance,
customer_id
from pizza_transactions
) as hu
on u.customer_id = hu.customer_id
group by u.customer_id;
This query returns mostly correct information, except for the new total_bought column. After experimenting with the query, I realized the join was creating duplicate rows, which is why the sum of amount was wrong. I then tried to get rid of the duplicate rows by changing my SQL query to this:
select u.customer_id,
sum(case when u.note like '%Pizza credits bought%' and u.condition1 = 'confirmed' then amount else 0 end) as total_bought,
avg(case when u.note like '%Pizza credits bought%' and u.condition1 = 'confirmed' then amount end) as avg_bought,
min(case when u.note like '%Pizza credits bought%' then u.date_time end) as first_purchased_date,
max(case when u.note like '%Pizza credits bought%' then u.date_time end) as last_purchased_date,
max(hu.previous_balance) as last_balance,
max(hu.previous_voucher_balance) as last_voucher_balance
from pizza_transactions as u
left join (select *
from (
select row_number() over (partition by customer_id order by date_time desc) as rn1,
previous_balance,
previous_voucher_balance,
customer_id
from pizza_transactions )t
where t.rn1 = 1
) as hu
on u.customer_id = hu.customer_id;
group by u.customer_id
But this returned ERROR: column "u.customer_id" must appear in the GROUP BY clause or be used in an aggregate function Position: 8. It did, however, get rid of the duplicate rows.
So my question is how can I aggregate a table and group by users and then add their last balances to this table? I can't seem to figure this out.
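The error most likely comes from the semicolon after the ON clause in the second query. That semicolon ends the statement before GROUP BY u.customer_id, so the aggregates run without any grouping. Position 8 points at u.customer_id in the select list.

The sketch below reruns the question's sample data with the semicolon moved to the end. It uses Python's sqlite3 module as a stand-in for PostgreSQL, which needs SQLite 3.25 or later for ROW_NUMBER(). Because the subquery keeps only rn1 = 1, each customer joins to a single row, so the sums are no longer inflated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite 3.25+ needed for ROW_NUMBER()
conn.execute(
    """
    CREATE TABLE pizza_transactions (
        customer_id INTEGER, amount INTEGER, date_time TEXT, note TEXT,
        previous_balance INTEGER, previous_voucher_balance INTEGER,
        condition1 TEXT)
    """
)
# Sample data from the question; dates are zero-padded so that
# text comparison sorts them correctly.
conn.executemany(
    "INSERT INTO pizza_transactions VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 10, "2022-01-01", "Pizza credits bought", 100, 50, "confirmed"),
        (1, -45, "2022-02-02", "something else", 110, 50, "processing"),
        (2, 70, "2022-05-01", "Pizza credits bought", 20, 5, "confirmed"),
        (3, 20, "2022-09-01", "Pizza credits bought", 10, 15, "confirmed"),
        (3, 10, "2022-09-02", "Pizza credits bought", 30, 15, "confirmed"),
        (3, -15, "2022-09-03", "something else", 40, 15, "processing"),
    ],
)

rows = conn.execute(
    """
    SELECT u.customer_id,
           SUM(CASE WHEN u.note LIKE '%Pizza credits bought%'
                     AND u.condition1 = 'confirmed'
                    THEN u.amount ELSE 0 END) AS total_bought,
           AVG(CASE WHEN u.note LIKE '%Pizza credits bought%'
                     AND u.condition1 = 'confirmed'
                    THEN u.amount END) AS avg_bought,
           MAX(hu.previous_balance) AS last_balance,
           MAX(hu.previous_voucher_balance) AS last_voucher_balance
    FROM pizza_transactions AS u
    LEFT JOIN (
        SELECT *
        FROM (
            SELECT ROW_NUMBER() OVER (PARTITION BY customer_id
                                      ORDER BY date_time DESC) AS rn1,
                   previous_balance, previous_voucher_balance, customer_id
            FROM pizza_transactions) AS t
        WHERE t.rn1 = 1  -- one row per customer, so no duplicates
    ) AS hu
      ON u.customer_id = hu.customer_id
    GROUP BY u.customer_id  -- no semicolon before GROUP BY
    ORDER BY u.customer_id
    """
).fetchall()
for row in rows:
    print(row)
```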

Oracle SQL Count and Avg in the same query

I have the table PATIENT_SESSIONS with these fields:
PATIENT_ID,
Session_Date,
Session_Status (Scheduled, Completed, Canceled),
PATIENT_Paid_Date,
Amount
For each PATIENT_ID, I want to get the following in a single query: the last Session_Date, the average difference between PATIENT_Paid_Date and Session_Date, the max(Amount), and the count of Completed sessions.
Is it possible?
Guessing PATIENT_ID is what you mean by "for each student_id"?
SELECT
PATIENT_ID
, MAX(Session_Date) AS last_session_date
, AVG(Session_Date - PATIENT_Paid_Date) AS avg_between_dates
-- not sure if this is what you want without seeing sample data
, MAX(Amount) AS max_amount
, SUM(CASE WHEN Session_Status = 'Completed' THEN 1 ELSE 0 END)
AS count_complete_sessions
FROM PATIENT_SESSIONS
GROUP BY PATIENT_ID
Should be possible.
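The conditional-count pattern can be checked with a small sketch using Python's sqlite3 module and hypothetical rows. SQLite has no Oracle-style date subtraction, so julianday() differences stand in for Session_Date - PATIENT_Paid_Date. In Oracle, subtracting two DATE values already gives the difference in days.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PATIENT_SESSIONS (PATIENT_ID INTEGER, Session_Date TEXT, "
    "Session_Status TEXT, PATIENT_Paid_Date TEXT, Amount INTEGER)"
)
# Hypothetical sessions; patient 2 has not paid yet.
conn.executemany(
    "INSERT INTO PATIENT_SESSIONS VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2023-01-10", "Completed", "2023-01-08", 100),
        (1, "2023-01-20", "Canceled", "2023-01-20", 50),
        (1, "2023-02-01", "Completed", "2023-01-28", 120),
        (2, "2023-03-01", "Scheduled", None, 80),
    ],
)

rows = conn.execute(
    """
    SELECT PATIENT_ID,
           MAX(Session_Date) AS last_session_date,
           -- julianday() difference in days stands in for Oracle DATE - DATE
           AVG(julianday(Session_Date) - julianday(PATIENT_Paid_Date))
               AS avg_between_dates,
           MAX(Amount) AS max_amount,
           SUM(CASE WHEN Session_Status = 'Completed' THEN 1 ELSE 0 END)
               AS count_complete_sessions
    FROM PATIENT_SESSIONS
    GROUP BY PATIENT_ID
    ORDER BY PATIENT_ID
    """
).fetchall()
for row in rows:
    print(row)
```

Patient 2 gets NULL for avg_between_dates because AVG ignores the NULL difference from the missing payment date.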

SQL query to return all columns of max date for group by but exclude records where a different column for that group is not null

For my example data set in the image below, I need to return only records for the MAX DueDate while Grouping on CompletedCertificationChecklist_Id. But if there is data in the CompletedDate, filter that CompletedCertificationChecklist_Id out.
The report should show only open (not completed) records with the most recent DueDate.
Table Query
SELECT [Id_CertificationHandsOnAssesment]
,[CompletedCertificationChecklist_Id]
,[DueDate]
,[CompletedDate]
FROM [sccCertificationHandsOnAssesments]
If I understand correctly:
select * from (
select *
, row_number() over (partition by CompletedCertificationChecklist_Id order by DueDate desc) rn
, max(case when CompletedDate is not null then 1 else 0 end) over (partition by CompletedCertificationChecklist_Id) IsCompleted
from sccCertificationHandsOnAssesments
) t
where IsCompleted = 0 and rn = 1
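Here is a minimal sketch of the same approach, run against hypothetical rows in an in-memory SQLite database via Python's sqlite3 module. SQLite 3.25+ supports both ROW_NUMBER() and MAX() as window functions.

Checklist 20 has one completed row, so the windowed MAX flags the whole checklist. It is therefore dropped, even though its latest row is still open.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite 3.25+ for window functions
conn.execute(
    "CREATE TABLE sccCertificationHandsOnAssesments ("
    "Id_CertificationHandsOnAssesment INTEGER, "
    "CompletedCertificationChecklist_Id INTEGER, DueDate TEXT, CompletedDate TEXT)"
)
# Hypothetical rows: checklist 20 has one completed row.
conn.executemany(
    "INSERT INTO sccCertificationHandsOnAssesments VALUES (?, ?, ?, ?)",
    [
        (1, 10, "2023-01-01", None),
        (2, 10, "2023-02-01", None),
        (3, 20, "2023-01-15", "2023-01-20"),
        (4, 20, "2023-03-01", None),
        (5, 30, "2023-04-01", None),
    ],
)

rows = conn.execute(
    """
    SELECT Id_CertificationHandsOnAssesment,
           CompletedCertificationChecklist_Id, DueDate
    FROM (
        SELECT *,
               -- 1 for the latest DueDate within each checklist
               ROW_NUMBER() OVER (PARTITION BY CompletedCertificationChecklist_Id
                                  ORDER BY DueDate DESC) AS rn,
               -- 1 if any row of the checklist has a CompletedDate
               MAX(CASE WHEN CompletedDate IS NOT NULL THEN 1 ELSE 0 END)
                   OVER (PARTITION BY CompletedCertificationChecklist_Id)
                   AS IsCompleted
        FROM sccCertificationHandsOnAssesments
    ) AS t
    WHERE IsCompleted = 0 AND rn = 1
    ORDER BY CompletedCertificationChecklist_Id
    """
).fetchall()
print(rows)  # [(2, 10, '2023-02-01'), (5, 30, '2023-04-01')]
```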

How to get the difference between (multiple) two different rows?

I have a set of data containing some fields: month, customer_id, row_num (RANK), and verified_date.
The rank field (row_num) indicates the first (1) and second (2) purchase of each customer. I would like to know the time difference between the first and second purchase for each customer, showing only the customer's first month (the month where row_num = 1).
https://i.ibb.co/PjJk5Y0/Capture.png
So my expected result is like below image:
https://i.ibb.co/y5Mww7k/Capture-2.png
I'm using Standard SQL in Google BigQuery.
row_num, verified_date
from table
GROUP BY 1, 2
We can try using a pivot query here, aggregating by the customer_id:
SELECT
MAX(CASE WHEN row_num = 1 THEN month END) AS month,
customer_id,
1 AS row_num,
DATE_DIFF(MAX(CASE WHEN row_num = 2 THEN verified_date END),
MAX(CASE WHEN row_num = 1 THEN verified_date END), DAY) AS difference
FROM yourTable
GROUP BY
customer_id;
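The pivot can be tried outside BigQuery with Python's sqlite3 module. SQLite has no DATE_DIFF, so this sketch swaps in the difference of julianday() values, cast to an integer. The table name yourTable and its rows are hypothetical. A customer without a second purchase gets NULL in difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (month TEXT, customer_id TEXT, "
    "row_num INTEGER, verified_date TEXT)"
)
# Hypothetical purchases; c3 has no second purchase yet.
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?)",
    [
        ("2023-01", "c1", 1, "2023-01-05"),
        ("2023-02", "c1", 2, "2023-02-04"),
        ("2023-01", "c2", 1, "2023-01-10"),
        ("2023-01", "c2", 2, "2023-01-15"),
        ("2023-03", "c3", 1, "2023-03-01"),
    ],
)

rows = conn.execute(
    """
    SELECT MAX(CASE WHEN row_num = 1 THEN month END) AS month,
           customer_id,
           1 AS row_num,
           -- stands in for DATE_DIFF(second_date, first_date, DAY)
           CAST(julianday(MAX(CASE WHEN row_num = 2 THEN verified_date END))
                - julianday(MAX(CASE WHEN row_num = 1 THEN verified_date END))
                AS INTEGER) AS difference
    FROM yourTable
    GROUP BY customer_id
    ORDER BY customer_id
    """
).fetchall()
for row in rows:
    print(row)
```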

SQL Server 2008 multiple column filter

I am stuck on a SQL query to bring back the data I want. I have a table as follows:
I have tried a CTE, but it did not work. I need to get source 'O' if available, otherwise 'T', with the max sequence, as in the result table above.
select district
, id
, building
, year
, date
, period
, sequence
, source from GetAttData gt with (nolock) where sequence in (select max(sequence) from GetAttData with (nolock)
where district = gt.district
and building = gt.building
and year = gt.year
and id= gt.id
group by district, id, building, year, date, period)
and source = 'O'
select
district, id, building, year, date, period, sequence, source
from (
select district, id, building, year, date, period, sequence, source,
row_number() over(partition by district, id, building, year, date, period
order by case when source = 'O' then 0 else 1 end, sequence desc
) as takeme
from GetAttData
) foo
where takeme = 1
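Here is a sketch of the row_number() approach with hypothetical rows. It runs through Python's sqlite3 module as a stand-in for SQL Server; SQLite needs version 3.25+ for window functions, and the nolock hint is dropped.

The CASE in the ORDER BY ranks 'O' ahead of 'T', so sequence desc only decides the winner among rows with the same source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite 3.25+ for ROW_NUMBER()
conn.execute(
    "CREATE TABLE GetAttData (district INTEGER, id INTEGER, building TEXT, "
    "year INTEGER, date TEXT, period INTEGER, sequence INTEGER, source TEXT)"
)
# Hypothetical rows: period 1 has an 'O' row, period 2 only has 'T' rows.
conn.executemany(
    "INSERT INTO GetAttData VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 100, "B1", 2010, "2010-09-01", 1, 1, "T"),
        (1, 100, "B1", 2010, "2010-09-01", 1, 2, "O"),
        (1, 100, "B1", 2010, "2010-09-01", 1, 3, "T"),
        (1, 100, "B1", 2010, "2010-09-01", 2, 1, "T"),
        (1, 100, "B1", 2010, "2010-09-01", 2, 4, "T"),
    ],
)

rows = conn.execute(
    """
    SELECT period, sequence, source
    FROM (
        SELECT district, id, building, year, date, period, sequence, source,
               ROW_NUMBER() OVER (
                   PARTITION BY district, id, building, year, date, period
                   -- 'O' rows sort first; then the highest sequence wins
                   ORDER BY CASE WHEN source = 'O' THEN 0 ELSE 1 END,
                            sequence DESC
               ) AS takeme
        FROM GetAttData
    ) AS foo
    WHERE takeme = 1
    ORDER BY period
    """
).fetchall()
print(rows)  # [(1, 2, 'O'), (2, 4, 'T')]
```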