I'm trying to calculate the percentage of one column over a secondary total column.
I wrote:
create temporary table screenings_count_2018 as
select guid,
datepart(y, screening_screen_date) as year,
sum(case when screening_package = 4 then 1 end) as count_package_4,
sum(case when screening_package = 3 then 1 end) as count_package_3,
sum(case when screening_package = 2 then 1 end) as count_package_2,
sum(case when screening_package = 1 then 1 end) as count_package_1,
sum(case when screening_package in (1, 2, 3, 4) then 1 end) as count_total_packages
from prod.leasing_fact
where year = 2018
group by guid, year;
That table establishes the initial count and total count columns. All columns look correct.
Then, I'm using ratio_to_report to calculate the percentage (referencing this tutorial):
create temporary table screenings_percentage as
select
guid,
year,
ratio_to_report(count_package_1) over (partition by count_total_packages) as percentage_package_1
from screenings_count_2018
group by guid, year,count_package_1,count_total_packages
order by percentage_package_1 desc;
I also tried:
select
guid,
year,
sum(count_package_1/count_total_packages) as percentage_package_1
-- ratio_to_report(count_package_1) over (partition by count_total_packages) as percentage_package_1
from screenings_count_2018
group by guid, year,count_package_1,count_total_packages
order by percentage_package_1 desc;
Unfortunately, percentage_package_1 returns only null values (this is not correct - I'm expecting percentages). Neither query works.
What am I doing wrong?
Thanks!
Since you have already laid out columns with the components and a total when creating screenings_count_2018, do you actually need ratio_to_report?
select guid
, year
, 1.0 * count_package_1/count_total_packages as percentage_package_1 -- 1.0 * avoids integer division
, 1.0 * count_package_2/count_total_packages as percentage_package_2
, 1.0 * count_package_3/count_total_packages as percentage_package_3
, 1.0 * count_package_4/count_total_packages as percentage_package_4
from screenings_count_2018
That should work. Note: are you guaranteed never to have count_total_packages equal to zero? If it can be zero, you'll need to handle it. One way is with a case expression.
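The zero-total case, together with the integer-division pitfall, can be reproduced outside the warehouse. Below is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the real database (an assumption made only so the example runs anywhere); the two rows are invented. Multiplying by 100.0 forces decimal arithmetic, and the case expression returns null instead of attempting a division by zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table screenings_count_2018 "
    "(guid text, year integer, count_package_1 integer, count_total_packages integer)"
)
# Hypothetical rows: 'a' has 1 of 4 screenings in package 1, 'b' has no screenings.
conn.executemany(
    "insert into screenings_count_2018 values (?, ?, ?, ?)",
    [("a", 2018, 1, 4), ("b", 2018, 0, 0)],
)

rows = conn.execute(
    """
    select guid,
           -- integer / integer truncates, so 1 / 4 yields 0
           count_package_1 / count_total_packages as int_division,
           -- guard the zero total, then force decimal arithmetic with 100.0
           case when count_total_packages = 0 then null
                else round(100.0 * count_package_1 / count_total_packages, 1)
           end as percentage_package_1
    from screenings_count_2018
    order by guid
    """
).fetchall()

for row in rows:
    print(row)
```

The unguarded int_division column is included only to show the truncation. SQLite quietly returns null for 0 / 0, which other engines may treat as an error, so the explicit guard is the safer habit.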
If you want the per-package percentages in a single column instead, you can use ratio_to_report. It is a window (analytic) function, and against the original table it would look something like this:
with count_table as (
select guid
, datepart(y, screening_screen_date) as year
, screening_package
, count(1) as count
from prod.leasing_fact
where year = 2018
group by guid
, datepart(y, screening_screen_date)
, screening_package
)
select guid
, year
, screening_package
, ratio_to_report(count) over(partition by guid, year) as perc_of_total -- partitioning by screening_package as well would leave one row per partition, so every ratio would be 1
from count_table
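To see what ratio_to_report computes, here is a rough equivalent written as a plain windowed sum and run through Python's sqlite3 module. SQLite has no ratio_to_report, so the division below is a stand-in; the rows are invented, and the column is called cnt here rather than count. Each count is divided by the total of its guid/year partition, so the ratios within one partition add up to 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25 or newer
conn.execute(
    "create table count_table "
    "(guid text, year integer, screening_package integer, cnt integer)"
)
# Hypothetical counts: guid 'a' has 1 + 3 screenings, guid 'b' has 2.
conn.executemany(
    "insert into count_table values (?, ?, ?, ?)",
    [("a", 2018, 1, 1), ("a", 2018, 2, 3), ("b", 2018, 1, 2)],
)

rows = conn.execute(
    """
    select guid, year, screening_package,
           -- share of this package within its guid/year partition
           1.0 * cnt / sum(cnt) over (partition by guid, year) as perc_of_total
    from count_table
    order by guid, screening_package
    """
).fetchall()

for row in rows:
    print(row)
```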
You will need round(100.0*count_package_1/count_total_packages,1) and so on, since you have already calculated the subtotals and the total; the 100.0 also avoids integer division.
Related
I have a table of transactions from customers who buy credits for our products. Let's just say for this example it's for pizza products. When a customer buys credits a new row is added to the table with their customer_id, amount uploaded, date_time, note describing whether credits were bought or another type of transaction, their previous balance, voucher balance, and condition explaining whether the process was confirmed or not.
I want to build a new table with some stats for each user, with exactly one row per user. Along with the stats, I want to include each user's last balance. To do this I have to get the last row for each user and join it back to the table itself, or at least that was my impression from the answers I saw online. This is my attempt, along with some sample data.
CREATE TABLE pizza_transactions
(customer_id int,
amount int,
date_time date,
note varchar,
previous_balance int,
previous_voucher_balance int,
condition1 varchar)
;
INSERT INTO pizza_transactions
(customer_id, amount, date_time, note,previous_balance, previous_voucher_balance, condition1)
VALUES
(1, 10, '2022-01-01','Pizza credits bought',100,50,'confirmed'),
(1, -45, '2022-02-02','something else',110,50, 'processing'),
(2, 70, '2022-05-1','Pizza credits bought',20,5,'confirmed'),
(3, 20, '2022-09-01','Pizza credits bought',10,15,'confirmed'),
(3, 10, '2022-09-02','Pizza credits bought',30,15,'confirmed'),
(3, -15, '2022-09-03','something else',40,15,'processing')
select u.customer_id,
sum(case when note like '%Pizza credits bought%' and condition1 = 'confirmed' then amount else 0 end) as total_bought,
avg(case when note like '%Pizza credits bought%' and condition1 = 'confirmed' then amount end) as avg_bought,
min(case when note like '%Pizza credits bought%' then date_time end) as first_purchased_date,
max(case when note like '%Pizza credits bought%' then date_time end) as last_purchased_date,
max(case when hu.rn1 = 1 then hu.previous_balance end) as last_balance,
max(case when hu.rn1 = 1 then hu.previous_voucher_balance end) as last_voucher_balance
from pizza_transactions as u
inner join (
select row_number() over (partition by customer_id order by date_time desc) as rn1,
previous_balance,
previous_voucher_balance,
customer_id
from pizza_transactions
) as hu
on u.customer_id = hu.customer_id
group by u.customer_id;
This query, however, returns a table in which the information is right except for the newly created column total_bought. After playing around with the query I realized the join was producing duplicate rows, which is why the sum of the amount was wrong. I then tried to get rid of the duplicate rows by changing my SQL query to look like this:
select u.customer_id,
sum(case when u.note like '%Pizza credits bought%' and u.condition1 = 'confirmed' then amount else 0 end) as total_bought,
avg(case when u.note like '%Pizza credits bought%' and u.condition1 = 'confirmed' then amount end) as avg_bought,
min(case when u.note like '%Pizza credits bought%' then u.date_time end) as first_purchased_date,
max(case when u.note like '%Pizza credits bought%' then u.date_time end) as last_purchased_date,
max(hu.previous_balance) as last_balance,
max(hu.previous_voucher_balance) as last_voucher_balance
from pizza_transactions as u
left join (select *
from (
select row_number() over (partition by customer_id order by date_time desc) as rn1,
previous_balance,
previous_voucher_balance,
customer_id
from pizza_transactions )t
where t.rn1 = 1
) as hu
on u.customer_id = hu.customer_id;
group by u.customer_id
But this returned ERROR: column "u.customer_id" must appear in the GROUP BY clause or be used in an aggregate function Position: 8. I did however get rid of the duplicate rows.
So my question is how can I aggregate a table and group by users and then add their last balances to this table? I can't seem to figure this out.
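One possible way out, sketched here with Python's sqlite3 module and the sample rows above: compute the per-customer aggregates in one subquery, pick the latest row per customer in another, and join the two so that each side contributes exactly one row per customer. Note also that the semicolon after the on clause in the second attempt ends the statement before group by is reached, which would explain the error message. The aliases agg and last_row are names chosen for this sketch, and only two of the output columns are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25 or newer
conn.executescript("""
create table pizza_transactions (
    customer_id int, amount int, date_time date, note varchar,
    previous_balance int, previous_voucher_balance int, condition1 varchar);
-- Dates are written in ISO form so that text ordering matches date ordering in SQLite.
insert into pizza_transactions values
    (1, 10, '2022-01-01', 'Pizza credits bought', 100, 50, 'confirmed'),
    (1, -45, '2022-02-02', 'something else', 110, 50, 'processing'),
    (2, 70, '2022-05-01', 'Pizza credits bought', 20, 5, 'confirmed'),
    (3, 20, '2022-09-01', 'Pizza credits bought', 10, 15, 'confirmed'),
    (3, 10, '2022-09-02', 'Pizza credits bought', 30, 15, 'confirmed'),
    (3, -15, '2022-09-03', 'something else', 40, 15, 'processing');
""")

rows = conn.execute("""
select agg.customer_id, agg.total_bought, last_row.previous_balance as last_balance
from (
    -- one row per customer: the aggregates, computed before any join
    select customer_id,
           sum(case when note like '%Pizza credits bought%' and condition1 = 'confirmed'
                    then amount else 0 end) as total_bought
    from pizza_transactions
    group by customer_id
) as agg
join (
    -- rn1 = 1 marks the most recent transaction of each customer
    select customer_id, previous_balance,
           row_number() over (partition by customer_id order by date_time desc) as rn1
    from pizza_transactions
) as last_row
  on last_row.customer_id = agg.customer_id and last_row.rn1 = 1
order by agg.customer_id
""").fetchall()

for row in rows:
    print(row)
```

Because the aggregation happens before the join, the sum is never inflated by duplicated rows. The remaining stats (avg_bought, the purchase dates, last_voucher_balance) could be added to the two subqueries in the same way.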
I have the table PATIENT_SESSIONS with these fields:
PATIENT_ID,
Session_Date,
Session_Status (Scheduled, Completed, Canceled),
PATIENT_Paid_Date,
Amount
From this table I want to get, for each patient_id, the last session_date, the average difference between PATIENT_Paid_Date and Session_Date, the max(Amount), and the count of Completed sessions, all in a single query.
Is it possible?
Assuming PATIENT_ID is the column you want to group by:
SELECT
PATIENT_ID
, MAX(Session_Date) AS last_session_date
, AVG(Session_Date - PATIENT_Paid_Date) AS avg_between_dates
-- not sure if this is what you want without seeing sample data
, MAX(Amount) AS max_amount
, SUM(CASE WHEN Session_Status = 'Completed' THEN 1 ELSE 0 END)
AS count_complete_sessions
FROM PATIENT_SESSIONS
GROUP BY PATIENT_ID
Should be possible.
For my example data set in the image below, I need to return only records for the MAX DueDate while Grouping on CompletedCertificationChecklist_Id. But if there is data in the CompletedDate, filter that CompletedCertificationChecklist_Id out.
The report will show me only open records (not completed) with the most recent DueDate
Table Query
SELECT [Id_CertificationHandsOnAssesment]
,[CompletedCertificationChecklist_Id]
,[DueDate]
,[CompletedDate]
FROM [sccCertificationHandsOnAssesments]
If I understand correctly:
select * from (
select *
, row_number() over (partition by CompletedCertificationChecklist_Id order by DueDate desc) rn
, max(case when CompletedDate is not null then 1 else 0 end) over (partition by CompletedCertificationChecklist_Id) IsCompleted
from sccCertificationHandsOnAssesments
) t
where IsCompleted = 0 and rn = 1
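To check this logic without the original data, here is a small sketch using Python's sqlite3 module. The table and column names are shortened stand-ins and the four rows are invented: checklist 10 has two open records, while checklist 20 has one completed record and should therefore be filtered out entirely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25 or newer
conn.executescript("""
create table assessments (id int, checklist_id int, due_date text, completed_date text);
insert into assessments values
    (1, 10, '2023-01-01', null),
    (2, 10, '2023-02-01', null),
    (3, 20, '2023-01-15', '2023-01-10'),
    (4, 20, '2023-03-01', null);
""")

rows = conn.execute("""
select id, checklist_id, due_date
from (
    select *,
           -- rn = 1 is the record with the latest due date in its checklist
           row_number() over (partition by checklist_id order by due_date desc) as rn,
           -- 1 if any record of the checklist has a completed date
           max(case when completed_date is not null then 1 else 0 end)
               over (partition by checklist_id) as is_completed
    from assessments
) as t
where is_completed = 0 and rn = 1
""").fetchall()

print(rows)
```

Only the record with id 2 survives: it has the latest due date in a checklist that has no completed records.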
I have a set of data containing some fields: month, customer_id, row_num (RANK), and verified_date.
The rank field indicates the first (1) and second (2) purchase of each customer. I would like to know the time difference between the first and second purchase for each customer, shown only against the month of the first purchase (the month where row_num = 1).
https://i.ibb.co/PjJk5Y0/Capture.png
So my expected result is like below image:
https://i.ibb.co/y5Mww7k/Capture-2.png
I'm using StandardSQL in Google Bigquery.
row_num, verified_date
from table
GROUP BY 1, 2
We can try using a pivot query here, aggregating by the customer_id:
SELECT
MAX(CASE WHEN row_num = 1 THEN month END) AS month,
customer_id,
1 AS row_num,
DATE_DIFF(MAX(CASE WHEN row_num = 2 THEN verified_date END),
MAX(CASE WHEN row_num = 1 THEN verified_date END), DAY) AS difference
FROM yourTable
GROUP BY
customer_id;
I am stuck on a SQL query that should return the data I want. I have a table like the following:
I tried a CTE, but it did not work. I need to get the row with source 'O' if one is available, otherwise 'T', with the max sequence, as in the result table above.
select district
, id
, building
, year
, date
, period
, sequence
, source
from GetAttData gt with (nolock)
where sequence in (select max(sequence) from GetAttData with (nolock)
where district = gt.district
and building = gt.building
and year = gt.year
and id= gt.id
group by district, id, building, year, date, period)
and source = 'O'
select
district, id, building, year, date, period, sequence, source
from (
select district, id, building, year, date, period, sequence, source,
row_number() over(partition by district, id, building, year, date, period
order by case when source = 'O' then 0 else 1 end, sequence desc
) as takeme
from GetAttData
) foo
where takeme = 1
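The ordering trick in that query (prefer source 'O', then take the highest sequence) can be tried on a cut-down table. This is a sketch with Python's sqlite3 module; the table att_data, its reduced column list, and the rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25 or newer
conn.executescript("""
create table att_data (id int, period text, sequence int, source text);
insert into att_data values
    (1, 'P1', 1, 'O'),
    (1, 'P1', 2, 'T'),
    (1, 'P2', 1, 'T'),
    (1, 'P2', 2, 'T'),
    (2, 'P1', 1, 'O'),
    (2, 'P1', 3, 'O');
""")

rows = conn.execute("""
select id, period, sequence, source
from (
    select id, period, sequence, source,
           row_number() over (
               partition by id, period
               -- 'O' rows sort first; within the same source, highest sequence first
               order by case when source = 'O' then 0 else 1 end, sequence desc
           ) as takeme
    from att_data
) as foo
where takeme = 1
order by id, period
""").fetchall()

for row in rows:
    print(row)
```

For (1, 'P1') the 'O' row wins even though the 'T' row has a higher sequence; for (1, 'P2') there is no 'O' row, so the 'T' row with the highest sequence is returned.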