Hive query to find percentage value - hive

The table I am working on has the columns customer_id, operating_system, device_type, transaction_id, transaction_time.
I want to find the percentage of each operating system used by a customer for transactions done on a mobile/tablet device in the past 360 days.
The basic approach is: (number of transactions where the device type is mobile/tablet and the timestamp is within the past 360 days, grouped by customer_id and operating_system) * 100 / (total number of transactions done by that customer on mobile/tablet devices, regardless of operating system).
How can I write a query that returns: customer_id, operating_system, % of operating system used?
Thank you in advance!

In the subquery s below, the total count per customer and the count per operating system are calculated. Since analytic functions are used, the number of rows stays the same as in the source dataset, which is why you need to aggregate by customer_id and operating_system. Use max or min:
select --group by customer_id and operating_system
customer_id,
operating_system,
max(operating_system_cnt) operating_system_cnt,
max(total_cnt) total_cnt,
max(operating_system_cnt)*100/max(total_cnt) operating_system_percent
from
(
select --calculate total count and operating_system_count
customer_id,
operating_system,
count(transaction_id) over(partition by customer_id, operating_system) operating_system_cnt,
count(transaction_id) over(partition by customer_id) total_cnt
from your_table
where --your filter conditions here for mobile/tablet and last 360 days
)s
group by
customer_id,
operating_system
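To see the shape of the result, here is a minimal sketch that runs the same query pattern on an in-memory SQLite database from Python. The rows are made up, only the device-type filter is applied (the 360-day condition is left out), and the table name your_table is just the placeholder from the answer. One difference worth noting: as far as I know Hive's `/` returns a floating-point result, whereas SQLite truncates integer division, so the sketch multiplies by 100.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table your_table (
    customer_id integer, operating_system text, device_type text,
    transaction_id integer, transaction_time text)""")

# Hypothetical sample rows, not taken from the question.
rows = [
    (1, "android", "mobile", 101, "2024-01-05"),
    (1, "android", "mobile", 102, "2024-01-06"),
    (1, "android", "tablet", 103, "2024-01-07"),
    (1, "ios", "tablet", 104, "2024-01-08"),
    (1, "windows", "desktop", 105, "2024-01-09"),  # excluded by the filter
    (2, "ios", "mobile", 201, "2024-01-05"),
    (2, "ios", "mobile", 202, "2024-01-06"),
]
conn.executemany("insert into your_table values (?, ?, ?, ?, ?)", rows)

query = """
select customer_id,
       operating_system,
       max(operating_system_cnt) * 100.0 / max(total_cnt) as operating_system_percent
from (
    select customer_id,
           operating_system,
           count(transaction_id) over (partition by customer_id, operating_system) as operating_system_cnt,
           count(transaction_id) over (partition by customer_id) as total_cnt
    from your_table
    where device_type in ('mobile', 'tablet')  -- date filter omitted in this sketch
) s
group by customer_id, operating_system
order by customer_id, operating_system
"""
result = conn.execute(query).fetchall()
print(result)
```

Customer 1 has three android and one ios transaction on mobile/tablet, so the split is 75% / 25%; the desktop row does not count towards the total.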

Related

want to calculate 2 different aggregations on different criteria in bigquery

I have customer payments and want to find the top 10 customers per day based on the sum of amount per day per customer. Eventually I want to display those 10 customers and their payments per hour (sum of the amount per hour).
I tried to create two window functions in BigQuery: one per customer and per hour (Value_Hr), and one for the sum of values per customer (Value_customer).
with base as (
select Name, sum(amount) over W1 as Value_Hr, Hour, sum(amount) over w2 as Value_customer
from
(SELECT trim(cast(format('%t',Name) as string) ) as Name,
cast(round(amount) as numeric) as amount , extract(hour from SettlementTimestamp) as Hr
FROM Payments
where length(trim(Name))>0
)
qualify row_number() over (partition by Name,hr )=1
window w1 as (partition by Name,hr ),
w2 as (partition by Name)
)
select Name,Value_Hr,Hour ,Value_customer
from base
qualify row_number() over (partition by Value_customer order by Value_customer desc )<=10
I expect data as below
but row_number is being calculated within the group of customers and hourly amounts instead of per customer and its total value.
Can anyone help?

How to return date of reaching a certain threshold

Using SQL Server Management Studio.
Let's say I have a table with transactions that contains User, Date, Transaction amount. I want a query that will return the date when a certain amount is reached - let's say 100.
For example the same user performs 10 transactions for 10 EUR. I want the query to select the date of the last transaction because that's when his volume reached 100. Of course, once 100 is reached, the query shouldn't change the date with the latest transaction anymore, but leave it at when 100 was reached.
I wrote this in pgAdmin (PostgreSQL), but I think the syntax should be the same.
with cumulative as
(
select customer_id,
sum(amount) over (partition by customer_id order by payment_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) cum_amt,
payment_date
from payment
)
select customer_id
, min(payment_date) as threshold_reached
from cumulative
where cum_amt>=100
group by customer_id
case when sum(amt) over (partition by user order by date) - amt < 100
and sum(amt) over (partition by user order by date) >= 100
then 1 else 0 end
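As a quick check of the running-total approach, the sketch below runs the cumulative CTE on an in-memory SQLite database from Python. The sample rows are hypothetical: customer 1 makes twelve payments of 10, customer 2 pays 60 and then 50, and customer 3 never reaches 100. It assumes an SQLite build with window-function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table payment (customer_id integer, amount integer, payment_date text)")

# Hypothetical data: twelve payments of 10 for customer 1, one per day.
rows = [(1, 10, f"2024-01-{day:02d}") for day in range(1, 13)]
rows += [(2, 60, "2024-02-01"), (2, 50, "2024-02-03"), (3, 20, "2024-03-01")]
conn.executemany("insert into payment values (?, ?, ?)", rows)

query = """
with cumulative as (
    select customer_id,
           sum(amount) over (
               partition by customer_id
               order by payment_date
               rows between unbounded preceding and current row
           ) as cum_amt,
           payment_date
    from payment
)
select customer_id, min(payment_date) as threshold_reached
from cumulative
where cum_amt >= 100
group by customer_id
order by customer_id
"""
thresholds = conn.execute(query).fetchall()
print(thresholds)
```

Because min(payment_date) is taken over all rows at or above the threshold, later payments (days 11 and 12 for customer 1) do not move the date, and customers who never reach 100 are simply absent from the result.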

Calculate percentage using SQL

I am trying to find out how to calculate a percentage from the table below.
From the above table, I am hoping to get the data with percentages like below. However, my SQL statement below returns 0 for every percentage value. Please advise what I missed. Thanks
Select SupplierID
, ProductID
, Month
, Year
, Count(SupplierID) AS TotalSales
, (count(SupplierID)* 100/(Select Count (*) From ProductOrder)) AS Percentage
From ProductOrder
Order by Year DESC, Month DESC
That's integer division. Both operands to the division are integers, so the result is given as integer as well; 3/4 is 0, not 0.75. To work around this, just turn one of the arguments to a decimal.
I would also recommend using window functions rather than a subquery:
select supplierid, month, year,
count(*) as totalsales,
100.0 * count(*) / sum(count(*)) over(partition by month, year) as percentage
from productorder
group by supplierid, month, year
order by year desc, month desc, supplierid
Notes:
your original code is not a valid aggregation query to start with: it is missing a group by clause
your sample data indicates that you want the ratio per supplier against the total of the month and year, while your query (attempts to) compute the overall ratio; I fixed that too
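The integer-division point is easy to verify on literals. The sketch below uses SQLite through Python only because it is easy to run; SQLite happens to truncate integer division in the same way, but other engines may behave differently, so treat it as an illustration rather than a statement about your database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Integer / integer truncates; making one operand a decimal avoids that.
int_div, dec_div, percent = conn.execute(
    "select 3 / 4, 3 * 1.0 / 4, 100.0 * 3 / 4"
).fetchone()
print(int_div, dec_div, percent)
```

Writing the multiplier as 100.0 (as in the query above) is enough to force decimal arithmetic for the whole expression.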

i am trying to use the avg() function in a subquery after using a count in the inner query but i cannot seem to get it work in SQL

my table name is CustomerDetails and it has the following columns:
customer_id, login_id, session_id, login_date
I am trying to write a query that calculates the average number of customers logging in per day.
I tried this:
select avg(session_id)
from CustomerDetails
where exists (select count(session_id) from CustomerDetails as 'no_of_entries')
.
But then I realized it was going straight to the column and just calculating the average of that column, which is not what I want. Can someone help me?
Thanks
The first thing you need to do is get logins per day:
SELECT login_date, COUNT(*) AS loginsPerDay
FROM CustomerDetails
GROUP BY login_date
Then you can use that to get average logins per day:
SELECT AVG(loginsPerDay)
FROM (
SELECT login_date, COUNT(*) AS loginsPerDay
FROM CustomerDetails
GROUP BY login_date
) AS t -- most databases require an alias on a derived table
If your login_date is a DATE type you're all set. If it has a time component then you'll need to truncate it to date only:
SELECT AVG(loginsPerDay)
FROM (
SELECT CAST(login_date AS DATE) AS login_day, COUNT(*) AS loginsPerDay
FROM CustomerDetails
GROUP BY CAST(login_date AS DATE)
) AS t -- most databases require an alias on a derived table
i am trying to write a query that calculates the average number of customers login in per day.
Count the number of customers. Divide by the number of days. I think that is:
select count(*) * 1.0 / count(distinct cast(login_date as date))
from customerdetails;
I understand that you want to count the number of visitors per day, not the number of visits. So if a customer logged in twice on the same day, you want to count them only once.
If so, you can use distinct and two levels of aggregation, like so:
select avg(cnt_visitors) avg_cnt_visitors_per_day
from (
select count(distinct customer_id) cnt_visitors
from CustomerDetails
group by cast(login_date as date)
) t
The inner query computes the count of distinct customers for each day; the outer query gives you the overall average.
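The difference between "logins per day" and "distinct visitors per day" shows up on even a tiny dataset. The sketch below is a hedged illustration using SQLite from Python with made-up rows; it uses SQLite's date() function in place of cast(... as date), since a cast to DATE in SQLite does not truncate a timestamp the way it does in most other engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table CustomerDetails (
    customer_id integer, login_id integer, session_id integer, login_date text)""")

# Hypothetical rows: customer 1 logs in twice on the first day.
rows = [
    (1, 1, 1001, "2024-01-01 09:00:00"),
    (1, 1, 1002, "2024-01-01 18:30:00"),
    (2, 2, 1003, "2024-01-01 12:00:00"),
    (1, 1, 1004, "2024-01-02 08:15:00"),
]
conn.executemany("insert into CustomerDetails values (?, ?, ?, ?)", rows)

# Average number of distinct customers per day (two levels of aggregation).
avg_visitors = conn.execute("""
    select avg(cnt_visitors)
    from (
        select count(distinct customer_id) as cnt_visitors
        from CustomerDetails
        group by date(login_date)
    ) t
""").fetchone()[0]

# Average number of logins per day (every row counts).
avg_logins = conn.execute("""
    select count(*) * 1.0 / count(distinct date(login_date))
    from CustomerDetails
""").fetchone()[0]

print(avg_visitors, avg_logins)
```

Here the first day has 2 distinct visitors and the second has 1, giving 1.5 visitors per day, while the 4 logins over 2 days give 2.0 logins per day. Which number is wanted depends on how "customers logging in per day" is meant.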

Can a daily customer frequency filter be reduced to less than 2 subqueries when multiple timestamps exist for a given day and customer?

I am trying to figure out if there is a more efficient way to get a count of frequent customers. The tricky part is that I want to filter the customers based on payments per day while ignoring additional records for a given customer on the same day. The dataset includes records for customers on the same day but at different times. I want to count one and only one payment per day.
For example, given the following values for (payment_id, customer_id, payment_date), I want a count of 2:
(17504, 341, '2007-02-16 17:23:14'),
(17505, 341, '2007-02-16 22:41:45'),
(17506, 341, '2007-02-19 19:39:56')
Once the records are grouped by customer and day, I want to filter on customers having more than 3 records and I want to return the count.
My current query is below. Is there another way to do this without so many nested subqueries?
SELECT (COUNT(*)) AS count_for_customers_with_more_than_3_visits
FROM (
SELECT customer_id
FROM (
SELECT customer_id, date_trunc('day', payment_date) AS day
FROM payments
GROUP BY customer_id, day
) visits_by_day
GROUP BY customer_id
HAVING COUNT(day) > 3
) sub;
I'm using Postgres v9.6
Data and query on SQL fiddle
This may not be more efficient, but it is shorter:
SELECT COUNT(*) AS count_for_customers_with_more_than_3_visits
FROM (SELECT customer_id
FROM payments
GROUP BY customer_id
HAVING COUNT(DISTINCT date_trunc('day', payment_date)) > 3
) sub;
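A small runnable check of the COUNT(DISTINCT ...) version, sketched with SQLite from Python. SQLite has no date_trunc, so date(payment_date) stands in for date_trunc('day', payment_date); the rows for customer 341 come from the question, while customer 342 is a hypothetical addition with payments on four different days.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table payments (payment_id integer, customer_id integer, payment_date text)")

rows = [
    # From the question: three payments on two distinct days.
    (17504, 341, "2007-02-16 17:23:14"),
    (17505, 341, "2007-02-16 22:41:45"),
    (17506, 341, "2007-02-19 19:39:56"),
    # Hypothetical customer: five payments on four distinct days.
    (20001, 342, "2007-02-14 10:00:00"),
    (20002, 342, "2007-02-15 10:00:00"),
    (20003, 342, "2007-02-16 10:00:00"),
    (20004, 342, "2007-02-17 10:00:00"),
    (20005, 342, "2007-02-17 21:00:00"),
]
conn.executemany("insert into payments values (?, ?, ?)", rows)

# Distinct payment days per customer.
days_per_customer = conn.execute("""
    select customer_id, count(distinct date(payment_date))
    from payments
    group by customer_id
    order by customer_id
""").fetchall()

# Number of customers with more than 3 distinct payment days.
frequent_count = conn.execute("""
    select count(*)
    from (
        select customer_id
        from payments
        group by customer_id
        having count(distinct date(payment_date)) > 3
    ) sub
""").fetchone()[0]

print(days_per_customer, frequent_count)
```

Customer 341 collapses to 2 days, matching the count the question expects, and only customer 342 passes the "more than 3" filter.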