Calculating a percentage in a Hive query

With the query below I am able to get the approved transactions per client per day.
select
q1.client_id,
q1.receive_day,
count(q1.client_id) as cnt
from
(select * from sale where response=00) q1
group by
q1.client_id, q1.receive_day
I want to get the approval percentage, approval_per, defined as 100*(count(client_id)/response), where count(client_id) is the count of client rows for approved transactions.
Here response is the count of all responses, approved and not approved. I can get it with select count(response) from sale, but I don't know how to use it to calculate the percentage in the same query. I tried a few options that didn't work, so I am asking the user group.
My expected output format is client_id, receive_day, count(client_id), approval_per.
Any help is really appreciated.
Thanks & Regards,
dti

You could simply add another subquery that calculates that count. The subquery (q1) in your current query also seems unnecessary.
I am assuming the percentage you want is the share of all transactions that each client is responsible for, i.e. the number of transactions for a given client divided by the total number of transactions.
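SELECT
s.client_id,
s.receive_day,
count(s.client_id) as cnt,
100 * (count(s.client_id) / q1.total) as approval_per
FROM
sale s,
(select count(*) total from sale) q1
WHERE
s.response = 00
GROUP BY
-- q1.total is a single constant value, but it is not aggregated, so it has to be listed here
s.client_id, s.receive_day, q1.total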
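
If what you want is instead the approval rate within each client and day (approved transactions divided by all transactions for that same client and day), conditional aggregation can do it without a second subquery. This is only a sketch under assumptions: `sale_demo` and its rows are made up for illustration, and `response` is assumed to be stored as an integer where 0 means approved. It is written in generic SQL; in Hive the insert may need to be spelled `INSERT INTO TABLE sale_demo VALUES ...`.

```sql
-- Hypothetical sample table; names and column types are assumptions.
CREATE TABLE sale_demo (
  client_id   INT,
  receive_day VARCHAR(10),
  response    INT
);

INSERT INTO sale_demo VALUES
  (1, '2021-01-01', 0),
  (1, '2021-01-01', 0),
  (1, '2021-01-01', 5),
  (1, '2021-01-01', 5),
  (2, '2021-01-01', 0);

-- Count approved rows with a CASE expression and divide by all rows in the group.
-- The 100.0 literal forces non-integer division on engines that would otherwise truncate.
SELECT
  client_id,
  receive_day,
  SUM(CASE WHEN response = 0 THEN 1 ELSE 0 END) AS cnt,
  100.0 * SUM(CASE WHEN response = 0 THEN 1 ELSE 0 END) / COUNT(*) AS approval_per
FROM sale_demo
GROUP BY client_id, receive_day;
```

With these rows, client 1 has 2 approved transactions out of 4 (50%) and client 2 has 1 out of 1 (100%).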

Related

Calculating the percentage of different types of customer feedback in each quarter

The problem statement is: I have a table (order_t) which has customer feedback (one column) and quarter number (as another column).
Using a CTE, I need to calculate the number of customer feedback entries in each category as a percentage, as well as the total number of feedback entries in each quarter.
After that, I need the percentage of each type of customer feedback (good, bad, ok, very good, very bad), again using a CTE.
How can I solve this?
I tried to compute the total customer feedback as follows:
WITH total_feedback AS
(
SELECT *
COUNT(CUSTOMER_FEEDBACK), QUARTER NUMBER
FROM
table1
GROUP BY
2
)
But I'm unable to calculate the first half portion, i.e. percentage of different types of customer feedback in each quarter using CTE.
How can I do that?
What you could do, and I'll keep the example as close to the code you provided as possible, is the following, using two CTEs:
WITH total_feedback AS (
SELECT COUNT(CUSTOMER_FEEDBACK) AS total_feedback, QUARTER_NUMBER
FROM table1
GROUP BY 2
),
category_feedback AS (
SELECT COUNT(CUSTOMER_FEEDBACK) AS feedback_count, CUSTOMER_FEEDBACK, QUARTER_NUMBER
FROM table1
GROUP BY 2, 3
)
SELECT
category_feedback.CUSTOMER_FEEDBACK,
category_feedback.QUARTER_NUMBER,
100.0 * category_feedback.feedback_count / total_feedback.total_feedback AS feedback_percentage -- 100.0 avoids integer division on engines that truncate
FROM category_feedback
INNER JOIN total_feedback
ON category_feedback.QUARTER_NUMBER = total_feedback.QUARTER_NUMBER
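
Where window functions are available, the per-quarter total can also be computed next to the per-category count, so one CTE and no join is enough. A minimal sketch, assuming a hypothetical `order_demo` table with invented rows (the real table and column names may differ):

```sql
-- Hypothetical sample table standing in for the real feedback table.
CREATE TABLE order_demo (
  customer_feedback VARCHAR(20),
  quarter_number    INT
);

INSERT INTO order_demo VALUES
  ('Good', 1), ('Good', 1), ('Bad', 1), ('Okay', 1),
  ('Good', 2), ('Bad', 2);

WITH category_feedback AS (
  -- One row per quarter and feedback category.
  SELECT quarter_number, customer_feedback, COUNT(*) AS feedback_count
  FROM order_demo
  GROUP BY quarter_number, customer_feedback
)
SELECT
  quarter_number,
  customer_feedback,
  feedback_count,
  -- Window sum over the category counts gives the total for the quarter.
  SUM(feedback_count) OVER (PARTITION BY quarter_number) AS total_feedback,
  100.0 * feedback_count
    / SUM(feedback_count) OVER (PARTITION BY quarter_number) AS feedback_percentage
FROM category_feedback
ORDER BY quarter_number, customer_feedback;
```

For this sample data, quarter 1 has 4 entries (Good 50%, Bad 25%, Okay 25%) and quarter 2 has 2 entries (Good 50%, Bad 50%).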

SQL inner query trying to use alias in the where clause

I have a join of two tables that represents a list of payments made on contracts.
Sample Query: (https://www.db-fiddle.com/f/iXGxgDTopsBBgXGUJsXpa/13)
The sample data consists of 5 contracts, some of which are behind on payments. I want the list of contracts that haven't made a payment in the last 7 days, taking the current date to be 9 May 2021.
In the example, contracts 121, 300, 321 and 400 have made a payment in the last 7 days, so no records from them should appear in the final query. However:
Contract 321 made a payment in the last 7 days but also had a reversal equal to the total of its credits in that period. This is equivalent to 0 payments, so I want this contract to appear in my final query.
Contract 121 should not appear in the final result because, despite the reversal, it has total credits of 20 (100 credit - 80 reversal).
Contract 400 should appear in my results because one of its rows has the codename Special Delete.
In the fiddle I was able to create the query that filters out all records with payments in the last 7 days, but I need help adding the extra filtering:
If the sum of a contract's credits and debits is 0, it should appear in the final result (as if no payments had been sent). This is the case for contract 321.
If the credits are positive but one of the rows has the codename "Special Delete", display it in the final result (this is the case for contract 400).
Total Debits against total credits greater than 0
I will be using this query with AWS Athena.
I am guessing the part I need to amend is (WHERE Payments.ContractID NOT IN ....):
SELECT PaymentID,
Payments.ContractID,
PaymentDate,
Credit,
Debit,
Code,
CodeName,
amount,
city
FROM Payments
LEFT JOIN Info ON Info.ContractID = Payments.ContractID
WHERE Payments.ContractID NOT IN (
SELECT Payments.ContractID
FROM Payments
WHERE PaymentDate >= '20210501'
)
ORDER BY PaymentDate DESC
;
Your guess is correct. Here is what you need (if I didn't miss anything):
SELECT p.ContractID,PaymentDate,Credit,Debit,Code,CodeName,amount,city
FROM Payments p
LEFT JOIN Info ON Info.ContractID = p.ContractID
WHERE p.ContractID NOT IN (
SELECT p2.ContractID
FROM Payments p2
WHERE p2.PaymentDate >= '20210501'
group by p2.ContractID
having sum(p2.credit - p2.debit) > 0
) or codename = 'Special Delete'
ORDER BY PaymentDate DESC;

Query the average requests completed in a year/month/week/day

I have a query built below on the total amount of requests created in a year grouped by the task name (c.status_text) and the year they were worked(a.finish_date). In the SR_WORKFLOW table in my query below, there is a field called 'assigned_to' which is the employee number of the user that finished a ticket. I need to find the average number of requests our employees complete in a year/month, etc grouped by the task name (c.status_text). If I can figure out how to write this query for a year, I should be able to do the rest of them. So essentially what I am doing is trying to count the number of employees that have worked a specific task and then get the average number of tasks a person worked. Sorry if that sounds confusing. If so I will try to explain better.
I am able to get the totals of each employee id but that is over 5000 lines of results and does not give me an average.
select count(*) as [Total SRs],
c.status_text,
DATEPART(YEAR, a.finish_date) as [Year]
from SR_WORKFLOW a
inner join SR_SERVICE_REQUEST b on a.sr_id = b.sr_id
inner join WORKFLOW c on a.WORKFLOW_GUID = c.WORKFLOW_GUID
where b.type = 'Fed Wire'
and a.finish_date >= '2018-01-01'
and a.assigned_to is not null
group by rollup(c.STATUS_TEXT, DATEPART(YEAR, a.finish_date))
order by c.status_text
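
One possible approach, sketched here under assumptions rather than tested against the real schema: first count the requests per employee, task and year, then average those counts per task and year. The `workflow_demo` table and its rows are hypothetical and flatten the three joined tables into one for brevity; the syntax follows SQL Server, as in the question.

```sql
-- Hypothetical flattened stand-in for the joined SR_WORKFLOW / WORKFLOW data.
CREATE TABLE workflow_demo (
  assigned_to INT,
  status_text VARCHAR(30),
  finish_date DATE
);

INSERT INTO workflow_demo VALUES
  (101, 'Review',  '2018-03-01'),
  (101, 'Review',  '2018-05-10'),
  (101, 'Review',  '2018-09-22'),
  (102, 'Review',  '2018-07-04'),
  (101, 'Approve', '2018-08-15');

WITH per_employee AS (
  -- Step 1: how many requests each employee completed per task and year.
  SELECT
    status_text,
    DATEPART(YEAR, finish_date) AS yr,
    assigned_to,
    COUNT(*) AS completed
  FROM workflow_demo
  WHERE assigned_to IS NOT NULL
  GROUP BY status_text, DATEPART(YEAR, finish_date), assigned_to
)
-- Step 2: average those per-employee counts for each task and year.
SELECT
  status_text,
  yr,
  COUNT(*)              AS employees,
  SUM(completed)        AS total_srs,
  AVG(1.0 * completed)  AS avg_per_employee  -- 1.0 * avoids an integer average
FROM per_employee
GROUP BY status_text, yr
ORDER BY status_text, yr;
```

In this sample, 'Review' in 2018 has 4 requests spread over 2 employees (average 2) and 'Approve' has 1 request from 1 employee (average 1). Swapping `DATEPART(YEAR, ...)` for month, week or day in both places should give the other granularities.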

Find most recent date of purchase in user day table

I'm trying to put together a query that will fetch the date, purchase amount, and number of transactions of the last time each user made a purchase. I am pulling from a user day table that contains a row for each time a user does anything in the app, purchase or not. Basically all I am trying to get is the most recent date in which the number of transactions field was greater than zero. The below query returns all days of purchase made by a particular user when all I'm looking for is the last purchase so just the 1st row shown in the attached screenshot is what I am trying to get.
select tuid, max(event_day),
purchases_day_rev as last_dop_rev,
purchases_day_num as last_dop_quantity,
purchases_day_rev/nullif(purchases_day_num,0) as last_dop_spend_pp
from
(select tuid, event_day,purchases_day_rev,purchases_day_num
from
app.user_day
where purchases_day_num > 0
and tuid='122d665e-1d71-4319-bb0d-05c7f37a28b0'
group by 1,2,3,4) a
group by 1,3,4,5
I'm not going to comment on the logic of your query... if all you want is the first row of your result set, you can try:
<your query here> ORDER BY 2 DESC LIMIT 1 ;
Where ORDER BY 2 DESC orders the result set on max(event_day) and LIMIT 1 extracts only the first row.
I don't know all of the ins and outs of your data, but I don't understand why you are grouping within the subquery without any aggregate function (sum, average, min, max, etc). With that said, I would try something like this:
select a.tuid -- qualified, because tuid exists in both a and b
,a.event_day
,purchases_day_rev as last_dop_rev
,purchases_day_num as last_dop_quantity
,purchases_day_rev/nullif(purchases_day_num,0) as last_day_spend_pp
from app.user_day a
inner join
(
select tuid
,max(event_day) as MAX_DAY
from app.user_day
where purchases_day_num > 0
and tuid='122d665e-1d71-4319-bb0d-05c7f37a28b0'
group by 1
) b
on a.tuid = b.tuid
and a.event_day = b.max_day;
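
If the engine supports window functions, another option is to rank each user's purchase days and keep only the latest one, which also works for all users at once instead of a single tuid. A sketch, assuming a hypothetical `user_day_demo` table with invented rows and column types:

```sql
-- Hypothetical sample table standing in for app.user_day.
CREATE TABLE user_day_demo (
  tuid              VARCHAR(36),
  event_day         DATE,
  purchases_day_rev DECIMAL(10, 2),
  purchases_day_num INT
);

INSERT INTO user_day_demo VALUES
  ('user-a', '2021-01-05', 10.00, 2),
  ('user-a', '2021-02-10', 30.00, 3),
  ('user-a', '2021-03-01',  0.00, 0),
  ('user-b', '2021-01-20',  5.00, 1);

SELECT
  tuid,
  event_day         AS last_dop,
  purchases_day_rev AS last_dop_rev,
  purchases_day_num AS last_dop_quantity,
  purchases_day_rev / NULLIF(purchases_day_num, 0) AS last_dop_spend_pp
FROM (
  -- rn = 1 marks the most recent day with at least one purchase for each user.
  SELECT
    tuid, event_day, purchases_day_rev, purchases_day_num,
    ROW_NUMBER() OVER (PARTITION BY tuid ORDER BY event_day DESC) AS rn
  FROM user_day_demo
  WHERE purchases_day_num > 0
) ranked
WHERE rn = 1;
```

With these rows the result is one row per user: 2021-02-10 for user-a (the 2021-03-01 row has no purchases and is filtered out) and 2021-01-20 for user-b.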

STDEVP for calculated fields

I have a table that looks like this:
ID CHANNEL VENDOR num_PERIOD SALES.A SALES.B
000001 Business Shop 1 40 30
000001 Business Shop 2 60 20
000001 Business Shop 3 NULL 30
With many combinations of ID, CHANNEL and VENDOR, and sales records for each of them over time (num_PERIOD).
I want to get the average Standard Deviation of a new field, which returns the sum of SALES.A + SALES.B: sum(ISNULL(SALES.A,0) + ISNULL(SALES.B,0)).
The problem I have is that STDEVP seems to fail with calculated fields, and the result it returns is invalid.
I have been trying with:
select ID, CHANNEL, VENDOR, stdevp(sum(isnull(SALES.A,0) + ISNULL(QSALES.B,0))) OVER (PARTITION BY ID, CHANNEL, VENDOR) as STDEV_SALES
FROM TABLE
GROUP BY ID, CHANNEL, VENDOR
However, the results I'm obtaining are always 0 or NULL.
What I want to obtain is the Average Standard Deviation of each ID, CHANNEL and VENDOR over time (num_PERIOD).
Can someone find an approximation for this please?
Your query doesn't match the sample data.
I can see the problem, though. The SUM() is calculating a single value for each group, and then you are taking the standard deviation of that one value. Because you cannot nest aggregation functions, you have turned it into a window function.
Get rid of the sum(). The following should work in SQL Server:
SELECT ID, CHANNEL, VENDOR,
STDEVP(COALESCE(SALES.A, 0) + COALESCE(QSALES.B, 0)) as STDEV_SALES
FROM SALES . . .
QSALES
GROUP BY ID, CHANNEL, VENDOR;
I would also return the COUNT(*) . . . the standard deviation doesn't make sense if you have fewer than 3 rows. (Okay, it is defined for two values, but not very useful.)
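
To make that concrete, here is a self-contained sketch using the three sample rows from the question. The table name `sales_demo` and the column names `sales_a` and `sales_b` are assumptions (the dotted names in the question are replaced for simplicity), and STDEVP is SQL Server syntax.

```sql
-- Hypothetical table holding the sample rows from the question.
CREATE TABLE sales_demo (
  id         CHAR(6),
  channel    VARCHAR(20),
  vendor     VARCHAR(20),
  num_period INT,
  sales_a    INT,
  sales_b    INT
);

INSERT INTO sales_demo VALUES
  ('000001', 'Business', 'Shop', 1, 40,   30),
  ('000001', 'Business', 'Shop', 2, 60,   20),
  ('000001', 'Business', 'Shop', 3, NULL, 30);

-- STDEVP is applied to the per-row total, not to a SUM() of it,
-- so it sees one value per period within each group.
SELECT
  id,
  channel,
  vendor,
  COUNT(*) AS periods,
  STDEVP(COALESCE(sales_a, 0) + COALESCE(sales_b, 0)) AS stdev_sales
FROM sales_demo
GROUP BY id, channel, vendor;
```

The per-period totals here are 70, 80 and 30, so the population standard deviation for the group is about 21.6 over 3 periods.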