How to calculate a percentage of a percentage in a single query - apache-spark-sql

Using databricks SQL I created a table that looks like this:
tablename | RunDate | Quality | qaflag | Status
blah | 2022-06-02 | bronze | 1 | Passed
blah1 | 2022-06-02 | silver | -1 | failed
I can write a query that calculates the percentage of each Quality type in the table, i.e.:
SELECT Quality,
COUNT(Status)*100/(SELECT COUNT(Status) FROM test) as Percentage
FROM test
GROUP BY Quality
That will give me this output:
Quality | PercentageTotal
Silver | 64
Gold | 3
Bronze | 33
I would like to add a percentage of each Quality that passed or failed as well.
Basically, I need to try to get it to look like this:
Quality | PercentageTotal | PercentagePassed | PercentageFailed
Silver | 64 | 99 | 1
Gold | 3 | 87 | 3
Bronze | 33 | 60 | 40
What the table is saying is:
Silver Tables constitute 64 percent of all tables tested, and 99 percent of them passed and 1 percent failed. (and so on for the other ones)
I am stuck trying to figure out how to calculate PercentagePassed and PercentageFailed. Can anyone help?

Use window functions with DISTINCT in place of the GROUP BY, and a CASE expression to count only the passed or failed rows:
SELECT distinct Quality
, count(Status) over (partition by Quality) * 100 / count(status) over (partition by null) PercentageTotal
, count(case when Status = 'Passed' then 1 end) over (partition by Quality) * 100 / count(status) over (partition by Quality) PercentagePassed
, count(case when Status = 'Failed' then 1 end) over (partition by Quality) * 100 / count(status) over (partition by Quality) PercentageFailed
FROM test
You may want to consider switching to another data type that will support decimals, but I've left it as is for your example.
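The same three percentages can also be computed with a plain GROUP BY and conditional aggregation. The following is a minimal sketch of that alternative, not the method used in the answer above. It assumes Python's built-in sqlite3 module instead of Databricks, the rows in `rows` are made up for illustration, and the status comparison is lower-cased on the assumption that values may differ in case, as 'Passed' and 'failed' do in the question's sample.

```python
import sqlite3

# Hypothetical sample rows (tablename, Quality, Status); not the asker's real data.
rows = [
    ("t1", "silver", "Passed"),
    ("t2", "silver", "Passed"),
    ("t3", "silver", "failed"),
    ("t4", "silver", "Passed"),
    ("t5", "bronze", "Passed"),
    ("t6", "bronze", "failed"),
    ("t7", "gold", "Passed"),
    ("t8", "gold", "Passed"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (tablename TEXT, Quality TEXT, Status TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)

# One row per Quality: share of all tables, then pass/fail share within that Quality.
# Multiplying by 100.0 keeps the division in floating point.
query = """
SELECT Quality,
       COUNT(*) * 100.0 / (SELECT COUNT(*) FROM test) AS PercentageTotal,
       SUM(CASE WHEN lower(Status) = 'passed' THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS PercentagePassed,
       SUM(CASE WHEN lower(Status) = 'failed' THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS PercentageFailed
FROM test
GROUP BY Quality
ORDER BY Quality
"""
result = conn.execute(query).fetchall()
for quality, total, passed, failed in result:
    print(f"{quality}: {total:.1f}% of tables, {passed:.1f}% passed, {failed:.1f}% failed")
```

Because each Quality collapses to one row in the GROUP BY, no DISTINCT is needed in this variant.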

Related

select top 5 max records in "High" column and 5 min records from "Low" Column in same query and from same table partitioned by stock name

We have 6 months of historical data and need to find the top 2 highs and the bottom 2 lows for each stock, across all stocks. Below is the sample data:
Stock High Low Date prevclose ....
------------------------------------
ABB 100 75 29/12/2019 90
ABB 83 50 30/12/2019 87
ABB 73 45 30/12/2019 87
infy 1000 675 29/12/2019 900
infy 830 650 30/12/2019 810
infy 730 645 30/12/2019 788
I tried the following query but did not get the expected results. I need the top 2 High rows and the bottom 3 Low rows per stock in one result set:
select * into SRTrend from (
--- Resistance
select * from (Select top (5) with ties 'H' as 'Resistance', RowN=Row_Number() over(partition by name order by High desc),* from Historic
order by Row_Number() over(partition by name order by High desc))B
Union all
--Support
select * from (Select top (5) with ties 'L' as 'Support', RowN=Row_Number() over(partition by name order by Low asc),* from Historic
--where name='ABB'
order by Row_Number() over(partition by name order by Low asc))C
)D
PS: The hurdle I faced is that when I export the data to another table, the results are messed up: instead of the top 2 highs and bottom 3 lows, I get single rows.
You can use rank() as follows:
select *
from (
select
t.*,
rank() over(partition by stock order by high desc) rn_high,
rank() over(partition by stock order by low asc) rn_low
from mytable t
) t
where rn_high <= 2 or rn_low <= 3
The inner query ranks records twice, by descending high and ascending low within groups of stocks. Then the outer query filters on top 2 and bottom 3 per stock (ties included).
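To see which rows the filter keeps, the query can be run against the question's sample rows. This is a small sketch under some assumptions: it uses Python's built-in sqlite3 module (window functions need SQLite 3.25 or later), the helper name `top_and_bottom` is hypothetical, and the limits are passed as parameters so they can be varied. With only three rows per stock, `rn_low <= 3` would keep every row, so the example call uses 1 and 1 instead.

```python
import sqlite3

# Sample rows from the question (stock, high, low).
rows = [
    ("ABB", 100, 75), ("ABB", 83, 50), ("ABB", 73, 45),
    ("infy", 1000, 675), ("infy", 830, 650), ("infy", 730, 645),
]

def top_and_bottom(rows, n_high, n_low):
    """Return rows ranked within the n_high highest highs or n_low lowest lows per stock."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (stock TEXT, high INTEGER, low INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    query = """
    SELECT stock, high, low, rn_high, rn_low
    FROM (
        SELECT t.*,
               rank() OVER (PARTITION BY stock ORDER BY high DESC) AS rn_high,
               rank() OVER (PARTITION BY stock ORDER BY low ASC)  AS rn_low
        FROM mytable t
    ) ranked
    WHERE rn_high <= ? OR rn_low <= ?
    ORDER BY stock, high DESC
    """
    result = conn.execute(query, (n_high, n_low)).fetchall()
    conn.close()
    return result

# Keep the single highest high and the single lowest low per stock.
picked = top_and_bottom(rows, 1, 1)
for row in picked:
    print(row)
```

Each returned row carries both ranks, so it is easy to tell whether it qualified through its high, its low, or both.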

SQL counting with condition

I have a table called Buildings:
Room_No Bldg Capacity
112 SCEN 23
242 JBHT 25
542 SCEN 4
324 JBHT 24
What I want is to print out the Bldg name and the total number of rooms that have a capacity more than 20 in each building. So it is supposed to look like:
Bldg Total
SCEN 1
JBHT 2
Am I on the right track?
Select Bldg, Count(Capacity > 20) as Total from Buildings Group By Total Desc
You could use CASE:
Select Bldg, Count(CASE WHEN Capacity > 20 THEN 1 END) as Total
from Buildings
Group By Bldg
ORDER BY Total DESC;
If you are using Postgresql you could rewrite it as:
Select Bldg, Count(1) FILTER(WHERE Capacity>20) as Total
from Buildings
Group By Bldg
ORDER BY Total DESC;
The other answers seem overly complicated for this problem. The solution is rather simple:
SELECT Bldg, COUNT(*) AS count
FROM Buildings
WHERE Capacity > 20
GROUP BY Bldg
Here is the Fiddle: http://sqlfiddle.com/#!9/308a6/1
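The CASE version and the WHERE version agree on the question's data, but they differ for a building that has no room above the threshold: the CASE version still lists it with a total of 0, while the WHERE version drops it. A minimal sketch of that difference, assuming Python's built-in sqlite3 module and one made-up building (`HILL`) that is not in the original table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Buildings (Room_No INTEGER, Bldg TEXT, Capacity INTEGER)")
conn.executemany(
    "INSERT INTO Buildings VALUES (?, ?, ?)",
    [
        (112, "SCEN", 23), (242, "JBHT", 25), (542, "SCEN", 4), (324, "JBHT", 24),
        (101, "HILL", 10),  # hypothetical building with no room above 20
    ],
)

# Conditional count: every building appears, even with zero qualifying rooms.
case_counts = conn.execute("""
    SELECT Bldg, COUNT(CASE WHEN Capacity > 20 THEN 1 END) AS Total
    FROM Buildings
    GROUP BY Bldg
    ORDER BY Total DESC, Bldg
""").fetchall()

# Filter first: buildings without a qualifying room disappear from the result.
where_counts = conn.execute("""
    SELECT Bldg, COUNT(*) AS Total
    FROM Buildings
    WHERE Capacity > 20
    GROUP BY Bldg
    ORDER BY Total DESC, Bldg
""").fetchall()

print(case_counts)
print(where_counts)
```

Which behaviour is wanted depends on whether buildings with a zero count should appear in the report.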

SQL Query to get sums among multiple payments which are greater than or less than 10k

I am trying to write a query to get sums of payments from accounts for a month. I have been able to get most of it working but have hit a roadblock. I need a count of the accounts whose total payments are either < 10000 or >= 10000. The business rules are that a single payment may not exceed 10000, but multiple payments can total more than 10000. A simple mock database might look like this:
ID | AccountNo | Payment
1 | 1 | 5000
2 | 1 | 6000
3 | 2 | 5000
4 | 3 | 9000
5 | 3 | 5000
So the results I would expect would be something like
NumberOfPaymentsBelow10K | NumberOfPayments10K+
1 | 2
I would like to avoid doing a function or stored procedure and would prefer a sub query.
Any help with this query would be greatly appreciated!
I suggest avoiding sub-queries where possible because they can hurt performance, especially with a large amount of data. You can use a Common Table Expression instead:
;WITH CTE
AS
(
SELECT AccountNo, SUM(Payment) AS TotalPayment
FROM Payments
GROUP BY AccountNo
)
SELECT
SUM(CASE WHEN TotalPayment < 10000 THEN 1 ELSE 0 END) AS 'NumberOfPaymentsBelow10K',
SUM(CASE WHEN TotalPayment >= 10000 THEN 1 ELSE 0 END) AS 'NumberOfPayments10K+'
FROM CTE
You can get the totals per account using SUM and GROUP BY...
SELECT AccountNo, SUM(Payment) AS TotPay
FROM payments
GROUP BY AccountNo
You can use that result to count the accounts whose total is 10000 or more:
SELECT COUNT(*)
FROM (
SELECT AccountNo, SUM(Payment) AS TotPay
FROM payments
GROUP BY AccountNo
) t
WHERE TotPay >= 10000
You can get the number below and the number at or above 10000 in a single query if you want, but that's a bit more complicated:
SELECT
COUNT(CASE WHEN TotPay <  10000 THEN 1 END) AS Below10K,
COUNT(CASE WHEN TotPay >= 10000 THEN 1 END) AS AtLeast10K
FROM (
SELECT AccountNo, SUM(Payment) AS TotPay
FROM payments
GROUP BY AccountNo
) t
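The two-step idea (sum per account, then count the sums on each side of the threshold) can be checked against the question's mock data. A short sketch, assuming Python's built-in sqlite3 module; the derived-table alias `per_account` and the variable names are illustrative only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Payments (ID INTEGER, AccountNo INTEGER, Payment INTEGER)")
conn.executemany(
    "INSERT INTO Payments VALUES (?, ?, ?)",
    [(1, 1, 5000), (2, 1, 6000), (3, 2, 5000), (4, 3, 9000), (5, 3, 5000)],
)

# Inner query: one total per account. Outer query: count totals on each side of 10000.
below_10k, at_least_10k = conn.execute("""
    SELECT COUNT(CASE WHEN TotPay <  10000 THEN 1 END),
           COUNT(CASE WHEN TotPay >= 10000 THEN 1 END)
    FROM (
        SELECT AccountNo, SUM(Payment) AS TotPay
        FROM Payments
        GROUP BY AccountNo
    ) AS per_account
""").fetchone()

print(below_10k, at_least_10k)  # prints: 1 2
```

Accounts 1 and 3 total 11000 and 14000, and account 2 totals 5000, which matches the expected result in the question.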

SQL Get percent of bad records from total

I am relatively new to SQL. Each employee accesses an account for testing with a tech; sometimes it's a good attempt, sometimes it's bad. I mostly need to calculate the percentage of bad attempts, and my report should look something like this:
SELECT
employee, event, total, percentage
FROM my_table
employee | event | total | percentage|
user1 | good | 50 | 50% |
user1 | bad | 50 | 50% |
Calculate the total in a subquery and then JOIN to calculate percentage on each row.
SELECT m.employee, m.event, COUNT(*) AS total, COUNT(*) * 100.0 / t.total AS percentage
FROM my_table m
JOIN (SELECT employee, COUNT(*) AS total
FROM my_table
GROUP BY employee) t
ON m.employee = t.employee
GROUP BY m.employee, m.event, t.total
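Try something like this to calculate the bad-event percentage for each employee. Multiplying by 100.0 before dividing avoids integer division, which truncates the result to 0 in many databases:
select employee, sum(case when event = 'bad' then 1 else 0 end) * 100.0 / count(*) as percentage
From Yourtable
Group by employee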
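Whether a ratio of two counts keeps its fractional part depends on the engine: several, including SQLite and SQL Server, truncate integer division. The sketch below compares dividing first with multiplying by 100.0 first. It assumes Python's built-in sqlite3 module and made-up attempts for two employees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (employee TEXT, event TEXT)")
# Hypothetical data: user1 has 1 bad attempt out of 4, user2 has 1 out of 2.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [
        ("user1", "good"), ("user1", "good"), ("user1", "good"), ("user1", "bad"),
        ("user2", "good"), ("user2", "bad"),
    ],
)

bad_count = "SUM(CASE WHEN event = 'bad' THEN 1 ELSE 0 END)"

# Dividing two integers first: SQLite truncates 1/4 and 1/2 to 0.
divide_first = conn.execute(
    f"SELECT employee, {bad_count} / COUNT(*) * 100 "
    "FROM my_table GROUP BY employee ORDER BY employee"
).fetchall()

# Multiplying by 100.0 first forces floating-point arithmetic.
multiply_first = conn.execute(
    f"SELECT employee, {bad_count} * 100.0 / COUNT(*) "
    "FROM my_table GROUP BY employee ORDER BY employee"
).fetchall()

print(divide_first)    # [('user1', 0), ('user2', 0)]
print(multiply_first)  # [('user1', 25.0), ('user2', 50.0)]
```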

Grouping of Similar data by amount in Oracle

I have a txn table with columns ac_id and txn_amt. It stores transaction amounts along with account ids. Below is an example of the data:
AC_ID TXN_AMT
10 1000
10 1000
10 1010
10 1030
10 5000
10 5010
10 10000
20 32000
20 32200
20 5000
I want to write a query in such a way that all the amounts which are within 10% range of the previous amounts should be grouped together. Output should be something like this:
AC_ID TOTAL_AMT TOTAL_CNT GROUP
10 4040 4 1
10 10010 2 2
20 64200 2 3
20 5000 1 4
I tried with LAG function but still clueless. This is the code snippet I tried:
select ac_id, txn_amt, round((((txn_amt - lag(txn_amt, 1) over (partition by ac_id order by ac_id, txn_amt))/txn_amt)*100,2) as amt_diff_pct from txn;
Any clue or help will be highly appreciated.
If by previous you mean "the largest amount less than", then you can do this. You can find where the gaps are (i.e. larger than a 10% difference). Then you can assign a group by counting the number of gaps:
select ac_id, sum(txn_amt) as total_amt, count(*) as total_cnt, grp
from (select t.*,
sum(case when prev_txn_amt * 1.1 > txn_amt then 0 else 1 end) over
(partition by ac_id order by txn_amt) as grp
from (select t.*,
lag(txn_amt) over (partition by ac_id order by txn_amt) as prev_txn_amt
from txn t
) t
) t
group by ac_id, grp;
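The gap-counting query can be traced on the question's data. This is a sketch under assumptions: it runs on Python's built-in sqlite3 module (SQLite 3.25 or later for window functions) instead of Oracle, and the subquery aliases `s` and `g` are illustrative. Two details show up in the output: grp restarts at 1 for each ac_id, so it is not the global numbering shown in the question, and the 10000 row of account 10 forms a group of its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txn (ac_id INTEGER, txn_amt INTEGER)")
conn.executemany(
    "INSERT INTO txn VALUES (?, ?)",
    [
        (10, 1000), (10, 1000), (10, 1010), (10, 1030),
        (10, 5000), (10, 5010), (10, 10000),
        (20, 32000), (20, 32200), (20, 5000),
    ],
)

# Innermost: previous amount per account. Middle: running count of gaps
# (a gap is a jump of 10% or more, or the first row). Outer: aggregate per group.
groups = conn.execute("""
    SELECT ac_id, SUM(txn_amt) AS total_amt, COUNT(*) AS total_cnt, grp
    FROM (
        SELECT s.*,
               SUM(CASE WHEN prev_txn_amt * 1.1 > txn_amt THEN 0 ELSE 1 END)
                   OVER (PARTITION BY ac_id ORDER BY txn_amt) AS grp
        FROM (
            SELECT t.*,
                   LAG(txn_amt) OVER (PARTITION BY ac_id ORDER BY txn_amt) AS prev_txn_amt
            FROM txn t
        ) s
    ) g
    GROUP BY ac_id, grp
    ORDER BY ac_id, grp
""").fetchall()

for row in groups:
    print(row)
```

For the first row of each account, prev_txn_amt is NULL, the comparison is not true, and the CASE falls through to 1, which is what starts group 1.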