Calculation of weighted average counts in SQL - sql

I have a query that I am currently using to find counts:
select Name, Count(Distinct ID), Status, Team, Date from list group by Name, Status, Team, Date
In addition to the counts, I need to calculate a goal based on weighted average of counts per status and team, for each day.
For example, if Name 1 counts are divided into 50% Status1-Team1(X) and 50% Status2-Team2(Y) yesterday, then today's goal for Name1 needs to be (X+Y)/2.
The table would look like this, with the 'Goal' field needed as the output:
What is the best way to do this in the same query?

I'm partly guessing here, since you did not provide more details, but maybe you want something like this:
SELECT name, status, team, date, data,
       (SELECT SUM(data) / (SELECT COUNT(*) FROM list WHERE name = q.name)) AS goal
FROM (
    SELECT Name, COUNT(DISTINCT ID) AS data, Status, Team, Date
    FROM list
    GROUP BY Name, Status, Team, Date
) AS q

Related

Finding the initial sampled time window after using SAMPLE BY again

I can't seem to find a perhaps easy solution to what I'm trying to accomplish here, using SQL and, more importantly, QuestDB. I also find it hard to put my exact question into words so bear with me.
Input
My real input is different of course but a similar dataset or case is the gas_prices table on the demo page of QuestDB. On https://demo.questdb.io, you can directly write and run queries against some sample database, so it should be easy enough to follow.
The main task I want to accomplish is to find out which month was responsible for each year's highest gallon price.
Output
Using the following query, I can get the average gallon price per month just fine.
SELECT timestamp, avg(galon_price) as avg_per_month FROM 'gas_prices' SAMPLE BY 1M
timestamp | avg_per_month
2000-06-05T00:00:00.000000Z | 1.6724
2000-07-05T00:00:00.000000Z | 1.69275
2000-08-05T00:00:00.000000Z | 1.635
... | ...
Then I take all these monthly averages, group them by year, and return the maximum gallon price per year by wrapping the above query in a subquery, like so:
SELECT timestamp, max(avg_per_month) as max_per_year FROM (
SELECT timestamp, avg(galon_price) as avg_per_month FROM 'gas_prices' SAMPLE BY 1M
) SAMPLE BY 12M
timestamp | max_per_year
2000-01-05T00:00:00.000000Z | 1.69275
2001-01-05T00:00:00.000000Z | 1.767399999999
2002-01-05T00:00:00.000000Z | 1.52075
... | ...
Wanted output
I want to know which month was responsible for the maximum price of a year.
Looking at the output of the above query, we see that the maximum gallon price for the year 2000 was 1.69275. Which month of the year 2000 had this amount as its average price? I'd like to display this month in an additional column.
For the first row, July 2000 is shown in the additional column for year 2000 because it is responsible for the highest average price in 2000. For the second row, it was May 2001 as that month had the highest average price of 2001.
timestamp | max_per_year | which_month_is_responsible
2000-01-05T00:00:00.000000Z | 1.69275 | 2000-07-05T00:00:00.000000Z
2001-01-05T00:00:00.000000Z | 1.767399999999 | 2001-05-05T00:00:00.000000Z
... | ... | ...
What did I try?
I tried adding a subquery to the SELECT to get a "duplicate" of sorts for the timestamp column, but that is apparently never valid in QuestDB (?). So the solution is probably to add even more subqueries in the FROM, or perhaps a UNION?
Who can help me out with this? The data is there in the database and it can be calculated. It's just a matter of getting it out.
I think 'wanted output' can be achieved with window functions.
Please have a look at:
CREATE TABLE electricity (ts TIMESTAMP, consumption DOUBLE) TIMESTAMP(ts);

INSERT INTO electricity
SELECT (x*1000000)::timestamp, rnd_double()
FROM long_sequence(10000000);

-- Rank the 15-minute averages within each day and keep the highest one
SELECT day, ts, max_per_day
FROM
(
    SELECT timestamp_floor('d', ts) as day,
           ts,
           avg_in_15_min as max_per_day,
           row_number() OVER (PARTITION BY timestamp_floor('d', ts) ORDER BY avg_in_15_min desc) as rn_per_day
    FROM
    (
        SELECT ts, avg(consumption) as avg_in_15_min
        FROM electricity
        SAMPLE BY 15m
    )
) WHERE rn_per_day = 1
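To show how the same ranking idea maps onto the month-per-year question, here is a minimal sketch. It is not QuestDB: it uses Python's built-in sqlite3 (window functions need SQLite 3.25 or newer) so that it runs anywhere. The table name monthly_avg, the text month keys, and the 2001 rows are made up for illustration; only the three 2000 values come from the question.

```python
import sqlite3

# Monthly averages standing in for the SAMPLE BY 1M result.
# The 2000 values come from the question; the 2001 rows are hypothetical.
monthly = [
    ("2000-06", 1.6724),
    ("2000-07", 1.69275),
    ("2000-08", 1.635),
    ("2001-04", 1.70),
    ("2001-05", 1.7674),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_avg (month TEXT, avg_per_month REAL)")
conn.executemany("INSERT INTO monthly_avg VALUES (?, ?)", monthly)

# Rank the months within each year by average price, then keep rank 1.
query = """
SELECT year, avg_per_month AS max_per_year, month AS which_month_is_responsible
FROM (
    SELECT substr(month, 1, 4) AS year,
           month,
           avg_per_month,
           row_number() OVER (
               PARTITION BY substr(month, 1, 4)
               ORDER BY avg_per_month DESC
           ) AS rn
    FROM monthly_avg
)
WHERE rn = 1
ORDER BY year
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In QuestDB, the inner table would presumably be the SAMPLE BY 1M subquery and the partition key something like timestamp_floor('y', timestamp), mirroring the per-day query above.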

How to write an SQL query to get max number of counts for the most number of travelling of a user within a month

My manager has asked me to write a SQL query that returns the maximum number of records for the user who has travelled the most within a month, with the condition that if a user travels to multiple places on the same date, it counts as one. For instance, in the following table design, my query must return a count of 2: although traveller_id "1" travelled three times within the month, he travelled to Thailand and the USA on the same date, so his count is reduced to 2.
I have also developed my logic for this query but I am unable to write it due to lack of syntax knowledge. I split up this query into 3 parts:
1. Select all records from the table within a month, using the MONTH function of SQL.
2. Select all distinct DateTime records from the above result, so that repeated DateTime values are eliminated.
3. Select the maximum count for the traveller who visited the most places.
Please help me in completing my query. You can also use a different approach from mine.
You can use the count aggregation in a CTE, then select top(1):
with u as (
    select traveller_id,
           count(distinct visit_date) as n
    from travellers_log
    where visit_date between '2022-03-01' and '2022-03-31'
    group by traveller_id
)
select top(1) traveller_id, name, n
from u
inner join table_travellers on u.traveller_id = table_travellers.id
order by n desc;
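One caveat worth checking is what happens if visit_date carries a time part, as the question's mention of DateTime suggests. The sketch below assumes that it does, and uses Python's built-in sqlite3 rather than SQL Server, so LIMIT 1 stands in for top(1) and date() stands in for a cast to date. The place column and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE travellers_log (traveller_id INTEGER, place TEXT, visit_date TEXT);
INSERT INTO travellers_log VALUES
    (1, 'Thailand', '2022-03-05 09:00:00'),
    (1, 'USA',      '2022-03-05 18:30:00'),
    (1, 'Japan',    '2022-03-20 12:00:00'),
    (2, 'France',   '2022-03-31 23:15:00'),
    (2, 'Spain',    '2022-04-02 08:00:00');
""")

# Count distinct calendar dates (not distinct datetimes) per traveller.
# The half-open range keeps late-evening visits on the last day of March.
query = """
SELECT traveller_id, COUNT(DISTINCT date(visit_date)) AS n
FROM travellers_log
WHERE visit_date >= '2022-03-01' AND visit_date < '2022-04-01'
GROUP BY traveller_id
ORDER BY n DESC
LIMIT 1
"""
top = conn.execute(query).fetchone()
print(top)
```

With these rows, traveller 1 has three visits but only two distinct dates, which matches the count of 2 described in the question.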

Optimize Average of Averages SQL Query

I have a table where each row is a vendor with a sale made on some date.
I'm trying to compute average daily sales per vendor for the year 2019 and get a single number, which I think means I want to compute an average of averages.
This is the query I'm considering, but it takes a very long time on this large table. Is there a smarter way to compute this average without this much nesting? I have a feeling I'm scanning rows more times than I need to.
-- Average of all vendors' average daily sale counts
SELECT AVG(vendor_avgs.avg_daily_sales) avg_of_avgs
FROM (
    -- Get average number of daily sales for each vendor
    SELECT vendor_daily_totals.vendorid, AVG(vendor_daily_totals.cnt) avg_daily_sales
    FROM (
        -- Get total number of sales per day for each vendor
        SELECT vendorid, COUNT(*) cnt
        FROM vendor_sales
        WHERE year = 2019
        GROUP BY vendorid, month, day
    ) vendor_daily_totals
    GROUP BY vendor_daily_totals.vendorid
) vendor_avgs;
I'm curious if there is in general a way to compute an average of averages more efficiently.
This is running in Impala, by the way.
I think you can just do the calculation in one shot:
SELECT AVG(t.avgs)
FROM (
    SELECT vendorid,
           COUNT(*) * 1.0 / COUNT(DISTINCT month, day) as avgs
    FROM vendor_sales
    WHERE year = 2019
    GROUP BY vendorid
) t
This gets the total and divides by the number of days. However, COUNT(DISTINCT) might be even slower than nested GROUP BYs in Impala, so you need to test this.
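As a quick sanity check that the two formulations agree, here is a small sketch using Python's built-in sqlite3 instead of Impala. SQLite does not accept a multi-column COUNT(DISTINCT ...), so the sketch combines month and day into one number (month * 100 + day); that substitution and the sample rows are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vendor_sales (vendorid INTEGER, year INTEGER, month INTEGER, day INTEGER)"
)
rows = [
    # Vendor 1: 3 sales on Jan 1 and 1 sale on Jan 2 -> 4 sales over 2 days
    (1, 2019, 1, 1), (1, 2019, 1, 1), (1, 2019, 1, 1), (1, 2019, 1, 2),
    # Vendor 2: 1 sale on Feb 1
    (2, 2019, 2, 1),
    # A 2018 row that the year filter should exclude
    (2, 2018, 2, 1),
]
conn.executemany("INSERT INTO vendor_sales VALUES (?, ?, ?, ?)", rows)

# Nested version: daily counts -> per-vendor average -> overall average.
nested = """
SELECT AVG(avg_daily_sales) FROM (
    SELECT vendorid, AVG(cnt) AS avg_daily_sales FROM (
        SELECT vendorid, COUNT(*) AS cnt
        FROM vendor_sales WHERE year = 2019
        GROUP BY vendorid, month, day
    ) GROUP BY vendorid
)
"""

# One-shot version: total sales divided by the number of distinct sale days.
one_shot = """
SELECT AVG(avgs) FROM (
    SELECT vendorid,
           COUNT(*) * 1.0 / COUNT(DISTINCT month * 100 + day) AS avgs
    FROM vendor_sales WHERE year = 2019
    GROUP BY vendorid
)
"""
nested_result = conn.execute(nested).fetchone()[0]
one_shot_result = conn.execute(one_shot).fetchone()[0]
print(nested_result, one_shot_result)
```

Both versions only count days on which a vendor had at least one sale, so days with zero sales are ignored in either form.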

How to Calculate data per year per employee

I need to find a way to calculate the total salary per employee per year.
I have a list of employee data, and I need to sum the salaries for each employee per year.
Below is sample data:
This is basic use of SQL that would be covered in any class, book, or online course. In general, you should not ask these "how do you code it" questions on this site; instead, ask about code you have already written. We are not here to write it for you; we are here to help if you have problems. On the other hand, it is easy:
select employee_no, year, sum(sal_amount)
from table_name_you_did_not_say
group by employee_no, year
You could make yourself a formula like this:
salary_per_annum = salary_per_month * 12
What you really need to do is to first categorize your staff based on the salary they earn.
If they earn allowances or bonuses, you could do a separate calculation for those and then add both values.
If you're talking about an SQL query, for each employee, you could loop:
select sum(salary) * 12
If you're looking to calculate and group, you could have:
select employee_num, sal_year, sum(salary) as total_salary from tbl_payrol group by employee_num, sal_year
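For a concrete picture of the grouped sum, here is a minimal sketch using Python's built-in sqlite3. The table name payroll, its columns, and the rows are all hypothetical, since the question's sample data is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payroll (employee_no INTEGER, year INTEGER, sal_amount REAL)")
conn.executemany(
    "INSERT INTO payroll VALUES (?, ?, ?)",
    [
        (1, 2020, 1000.0), (1, 2020, 1200.0), (1, 2021, 1500.0),
        (2, 2020, 900.0),
    ],
)

# One output row per (employee, year), with that pair's salaries summed.
totals = conn.execute("""
SELECT employee_no, year, SUM(sal_amount) AS total_salary
FROM payroll
GROUP BY employee_no, year
ORDER BY employee_no, year
""").fetchall()
for row in totals:
    print(row)
```

Every non-aggregated column in the SELECT list also appears in the GROUP BY, which is what makes the query valid in most SQL dialects.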

MDX- Divide Each row by a value based on parent

I need to calculate a percentage for every fiscal year based on the distinct count of rows.
I have the distinct count for each year by city (a fairly simple task) and have arrived at these two listings in the cube.
The first listing is the state-wide distinct count for a given year.
The second listing is the city-wise distinct count for a given year, with each city's percentage based on the state-wide count for that year.
My problem is that I need to prepare a calculated member for the percentage column for each given year.
For example, in 2009, City 1 has a distinct count of 2697 and a percentage of 32.94% (formula used: 2697/8187).
I tried with ([Measures].[Distinct Count])/(SUM(ROOT(),[Measures].[Distinct Count])) but no luck.
Any help is highly appreciated.
Thanks in advance.
PS: City wide sum of year 2009 can never be equal to statewide distinct count of that year. This is because we are calculating the distinct count for city and state both.
You need to create a Region hierarchy for this, such as State -> City. Then create a calculation like the one below. In the browser, put your hierarchy on the left and the sales and calculated percentage in the values.
([Dim].[Region].CurrentMember, [Measures].[Salesamt]) /
iif(
    ([Dim].[Region].CurrentMember.Parent, [Measures].[Salesamt]) = 0,
    ([Dim].[Region].CurrentMember, [Measures].[Salesamt]),
    ([Dim].[Region].CurrentMember.Parent, [Measures].[Salesamt])
)
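The arithmetic behind that calculated member can be sketched outside MDX. The Python function below is a hypothetical stand-in, not part of any cube API: it divides a member's value by its parent's value and falls back to the member's own value when the parent is 0, as the iif does.

```python
def share_of_parent(child_value, parent_value):
    """Return child / parent, dividing by the child itself when the parent is 0."""
    denominator = child_value if parent_value == 0 else parent_value
    return child_value / denominator


# City 1 in 2009, using the figures from the question.
city_share = round(share_of_parent(2697, 8187) * 100, 2)
print(city_share)
```

Note that the guard only helps when the parent is 0 and the child is not; if both are 0, this sketch still divides by zero.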