How to join to inner query and calculate column based on different groupings? - sql

I have a table that contains data about a series of visits to shops.
The raw data for these visits can be found here.
My main table will have 1 row per Country, and will use something along the lines of:
Select Distinct o.Country from OtherTable as o
I need to add a new column to my main table that uses the following calculation:
"Avg Visits by User" = (Sum of (No. Call IDs / No. unique User IDs) for each day) / No. of unique days (based on Actual Start) for the row.
I have formed this additional select statement to get the number of calls and users by day - but I am struggling to join this to my main table:
Select DATEPART(DAY, c.ActualStart) As 'Day',
CAST(CAST(COUNT(c.CallID) AS DECIMAL (5,1))/CAST(COUNT(Distinct c.UserID) AS DECIMAL (5,1)) AS DECIMAL (5,1)) as 'Value' from CallInfo as c
where (c.Status = 3)
Group by DATEPART(DAY, c.ActualStart)
For the country GB, I would expect to see the following output:
Day      Calls   Users   Calls / Users
13-Jun   29      8       3.625
14-Jun   31      7       4.428571429
So, in my main table, the calculation for my new column would be:
8.053571 / 2
Therefore, if I somehow add this to my table, I would expect the following output:
Country   Unique Days   Sum of (Calls / Users) for each day   Final Calc
GB        2             8.053571429                           4.026785714
I have tried adding this as a join, but I don't know how to join this to my main table. I could for example join on Call Id - but this would require the addition of a callID column in my inner query, and this would mean that the values are incorrect.

You can use a subquery to make the calculations by day, and then aggregate those results by country. The resulting SQL query could look like this:
-- Make calculation by country, from the subquery
SELECT Country, UniqueDays = count(TheDay), CallsUserPerDay = sum(CallsPerUser),
FinalCalc = sum(CallsPerUser) / cast(count(TheDay) as DECIMAL)
FROM (
-- SUBQUERY: Make calculations by day
SELECT c.Country, CAST(c.ActualStart AS DATE) as TheDay, -- strip the time part so rows group per calendar day
Calls = COUNT(c.CallID),
Users = COUNT(Distinct c.UserID),
COUNT(c.CallID)
/CAST(COUNT(Distinct c.UserID) AS DECIMAL) as CallsPerUser
FROM CallInfo as c
WHERE (c.Status = 3)
GROUP BY c.Country, CAST(c.ActualStart AS DATE)
) data
GROUP BY Country
Note: I omit the precision in the DECIMAL casts to avoid rounding the final result.
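
As a rough check of the two-level aggregation described above, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. SQLite stands in for SQL Server here, so `date(...)` and `* 1.0` replace `CAST(... AS DATE)` and the DECIMAL cast. The rows are synthetic and only reproduce the GB counts quoted in the question (29 calls by 8 users, 31 calls by 7 users); the year and the `per_day` alias are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CallInfo ("
    "CallID INTEGER, UserID INTEGER, Country TEXT, ActualStart TEXT, Status INTEGER)"
)

# Synthetic rows matching the GB figures in the question:
# 29 calls by 8 users on 13 June, 31 calls by 7 users on 14 June.
# The year is made up for the sake of the example.
rows = []
call_id = 0
for day, calls, users in [("2017-06-13", 29, 8), ("2017-06-14", 31, 7)]:
    for i in range(calls):
        call_id += 1
        rows.append((call_id, i % users, "GB", f"{day} 09:{i:02d}:00", 3))
conn.executemany("INSERT INTO CallInfo VALUES (?, ?, ?, ?, ?)", rows)

query = """
SELECT Country,
       COUNT(TheDay)                     AS UniqueDays,
       SUM(CallsPerUser)                 AS CallsUserPerDay,
       SUM(CallsPerUser) / COUNT(TheDay) AS FinalCalc
FROM (
    -- inner level: one row per country and calendar day
    SELECT Country,
           date(ActualStart) AS TheDay,
           COUNT(CallID) * 1.0 / COUNT(DISTINCT UserID) AS CallsPerUser
    FROM CallInfo
    WHERE Status = 3
    GROUP BY Country, date(ActualStart)
) AS per_day
GROUP BY Country
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

With this data the single GB row should agree with the expected figures in the question (2 days, roughly 8.0536 and 4.0268).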

Related

I'm trying to find the average months between two dates and round the number to one decimal place, in Oracle SQL.

CREATE VIEW AVGMNTHSBETWEEN
AS
SELECT
VENDOR_NAME,
AVG(INVOICE_DUE_DATE, INVOICE_DATE) AS MONTHS_BETWN
FROM
VENDORS
INNER JOIN
INVOICES ON VENDORS.VENDOR_ID = INVOICES.VENDOR_ID
GROUP BY
VENDOR_NAME
HAVING
AVG(ROUND(CONVERT(DECIMAL(5, 4 (INVOICE_DUE_DATE, INVOICE_DATE)) >= 1.5
ORDER BY
MONTHS_BETWN DESC;
I get errors when sorting the result set in descending order by average_months_between and when restricting the results to only those vendors whose “average_months_between” is greater than or equal to 1.5 months.
If you're looking for months between, then include that function. If you just subtract two DATE datatype values, you'll get number of days between them.
Round result where you selected it (not in the having clause, although you'll probably want to do that as well).
Something like this:
CREATE OR REPLACE VIEW avgmnthsbetween
AS
SELECT
vendor_name,
ROUND(AVG(months_between(invoice_due_date, invoice_date)), 1) AS avg_months_betwn
FROM vendors INNER JOIN invoices
ON vendors.vendor_id = invoices.vendor_id
GROUP BY
vendor_name
HAVING
ROUND(AVG(months_between(invoice_due_date, invoice_date)), 1) >= 1.5
ORDER BY avg_months_betwn DESC;

SQL-How to Sum Data of Clients Over Time?

Goal: SUM/AVG Client Data over multiple dates/transactions.
Detailed Question: How do I properly Group clients ('PlayerID') then SUM the int(MinsPlayed), then AVG (AvgBet)?
Current Issue: my Results are giving individual transactions day by day over the 90 day time period instead of the SUM/AVG over the 90 days.
Current Script/Results: FirstName-Riley is showing each individual daily transaction instead of 1 total SUM/AVG over set time period
Firstly, you don't need to use DISTINCT as you are going to be aggregating the results using GROUP BY, so you can take that out.
The reason you are returning a row for each transaction is that your GROUP BY clause includes the column you are trying to aggregate (e.g. TimePlayed). Typically, you only want to GROUP BY the columns that are not being aggregated, so remove all the columns from the GROUP BY clause that you are aggregating using SUM or AVG (TimePlayed, PlayerSkill etc.).
Here's your current SQL:
SELECT DISTINCT CDS_StatDetail.PlayerID,
StatType,
FirstName,
LastName,
Email,
SUM(TimePlayed)/60 AS MinsPlayed,
SUM(CashIn) AS AvgBet,
SUM(PlayerSkill) AS AvgSkillRating,
SUM(PlayerSpeed) AS Speed,
CustomFlag1
FROM CDS_Player INNER JOIN CDS_StatDetail
ON CDS_Player.Player_ID = CDS_StatDetail.PlayerID
WHERE StatType='PIT' AND CDS_StatDetail.GamingDate >= '1/02/17' and CDS_StatDetail.GamingDate <= '4/02/2017' AND CustomFlag1='N'
GROUP BY CDS_StatDetail.PlayerID, StatType, FirstName, LastName, Email, TimePlayed, CashIn, PlayerSkill, PlayerSpeed, CustomFlag1
ORDER BY CDS_StatDetail.PlayerID
You want something like:
SELECT CDS_StatDetail.PlayerID,
SUM(TimePlayed)/60 AS MinsPlayed,
AVG(CashIn) AS AvgBet,
AVG(PlayerSkill) AS AvgSkillRating,
SUM(PlayerSpeed) AS Speed
FROM CDS_Player INNER JOIN CDS_StatDetail
ON CDS_Player.Player_ID = CDS_StatDetail.PlayerID
WHERE StatType='PIT' AND CDS_StatDetail.GamingDate BETWEEN '2017-01-02' AND '2017-04-02' AND CustomFlag1='N'
GROUP BY CDS_StatDetail.PlayerID
Next time, please copy and paste your text, not just linking to a screenshot.
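
To illustrate the GROUP BY point made above, here is a small sketch using Python's sqlite3 module. The `StatDetail` table and its rows are hypothetical and much simpler than the real CDS tables. It only shows that including the aggregated column in GROUP BY yields one row per distinct value, while grouping by the player alone yields one total per player.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for CDS_StatDetail.
conn.execute(
    "CREATE TABLE StatDetail (PlayerID INTEGER, GamingDate TEXT, TimePlayed INTEGER)"
)
conn.executemany(
    "INSERT INTO StatDetail VALUES (?, ?, ?)",
    [
        (1, "2017-01-05", 60),
        (1, "2017-01-06", 120),
        (1, "2017-01-07", 180),
        (2, "2017-01-05", 90),
    ],
)

# Grouping by the aggregated column as well: one row per distinct TimePlayed.
per_transaction = conn.execute(
    "SELECT PlayerID, SUM(TimePlayed) / 60.0 AS MinsPlayed "
    "FROM StatDetail GROUP BY PlayerID, TimePlayed "
    "ORDER BY PlayerID, TimePlayed"
).fetchall()

# Grouping only by the non-aggregated column: one total per player.
per_player = conn.execute(
    "SELECT PlayerID, SUM(TimePlayed) / 60.0 AS MinsPlayed "
    "FROM StatDetail GROUP BY PlayerID ORDER BY PlayerID"
).fetchall()

print(len(per_transaction), "rows when TimePlayed is in GROUP BY")
print(per_player)
```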

Access: Having trouble with getting average movies per day

I have a database project at my school and I am almost finished. The only thing that I still need is the average number of movies per day. I have a watchhistory table where you can find the users who have watched a movie. The instruction is to select from the watchhistory the people who have an average of 2 movies per day.
I wrote the following SQL statement. But every time I get errors. Can someone help me?
SQL:
SELECT
customer_mail_address,
COUNT(movie_id) AS AantalBekeken,
COUNT(movie_id) / SUM(GETDATE() -
(SELECT subscription_start FROM Customer)) AS AveragePerDay
FROM
Watchhistory
GROUP BY
customer_mail_address
The error:
Msg 130, Level 15, State 1, Line 1
Cannot perform an aggregate function on an expression containing an aggregate or a subquery.
I tried something different, and this query counts the total movies per day. Now I need the average of everything, and the SQL should only show the customers who have an average of more than 2 movies per day.
SELECT
Count(movie_id) as AantalPerDag,
Customer_mail_address,
Cast(watchhistory.watch_date as Date) as Date
FROM
Watchhistory
GROUP BY
customer_mail_address, Cast(watch_date as Date)
The big problem that I see is that you're trying to use a subquery as if it's a single value. A subquery could potentially return many values, and unless you have only one customer in your system it will do exactly that. You should be JOINing to the Customer table instead. Hopefully the JOIN only returns one customer per row in WatchHistory. If that's not the case then you'll have more work to do there.
SELECT
C.customer_mail_address,
COUNT(movie_id) AS AantalBekeken,
CAST(COUNT(movie_id) AS DECIMAL(10, 4)) / DATEDIFF(dy, C.subscription_start, GETDATE()) AS AveragePerDay
FROM
WatchHistory WH
INNER JOIN Customer C ON C.customer_id = WH.customer_id -- I'm guessing at the join criteria here since no table structures were provided
GROUP BY
C.customer_mail_address,
C.subscription_start
HAVING
CAST(COUNT(movie_id) AS DECIMAL(10, 4)) / DATEDIFF(dy, C.subscription_start, GETDATE()) <> 2 -- same cast as in the SELECT list, to avoid integer division
I'm guessing that the criteria isn't exactly 2 movies per day, but either less than 2 or more than 2. You'll need to adjust based on that. Also, you'll need to adjust the precision for the average based on what you want.
What the error message is telling you is that you can't apply an aggregate such as SUM to an expression that contains a subquery or another aggregate.
try putting SUM(GETDATE()-(SELECT subscription_start FROM Customer)) as your second aggregate variable, and
try using HAVING & FILTER at the end of your query to select only the users that have count/sum = 2
maybe this is what you need?
Let's join the two tables Watchhistory and Customer:
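select Watchhistory.customer_mail_address,
COUNT(movie_id) AS AantalBekeken,
-- DATEDIFF takes (datepart, startdate, enddate); multiply by 1.0 to avoid integer division
COUNT(movie_id) * 1.0 / datediff(Day, Customer.subscription_start, GETDATE()) AS AveragePerDay
from Watchhistory inner join Customer
on Watchhistory.customer_mail_address = Customer.customer_mail_address
GROUP BY
Watchhistory.customer_mail_address, Customer.subscription_start
-- SQL Server does not allow a SELECT alias in HAVING, so the expression is repeated
having COUNT(movie_id) * 1.0 / datediff(Day, Customer.subscription_start, GETDATE()) = 2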
change the last line of code according to what you need (I did not understand if you want it in or out)
I got it guys. Finally :)
SELECT customer_mail_address, SUM(AveragePerDay) / COUNT(customer_mail_address) AS gemiddelde
FROM (SELECT DISTINCT customer_mail_address, COUNT(CAST(watch_date AS date)) AS AveragePerDay
FROM dbo.Watchhistory
GROUP BY customer_mail_address, CAST(watch_date AS date)) AS d
GROUP BY customer_mail_address
HAVING SUM(AveragePerDay) / COUNT(customer_mail_address) >= 2
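
The approach in the final query above (count movies per customer per day, then average those daily counts) can be sketched against SQLite from Python as follows. This is only an illustration: the rows are invented, `AVG` is swapped in for the `SUM / COUNT` pair, and the names `per_day`, `watch_day` and `movies_that_day` are assumptions. Note that the average runs over days on which the customer watched at least one movie, not over all calendar days.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Watchhistory ("
    "customer_mail_address TEXT, movie_id INTEGER, watch_date TEXT)"
)
# Invented sample rows: a@ watches 3 movies on one day and 1 on the next,
# b@ watches a single movie.
conn.executemany(
    "INSERT INTO Watchhistory VALUES (?, ?, ?)",
    [
        ("a@example.com", 1, "2017-03-01 18:00:00"),
        ("a@example.com", 2, "2017-03-01 20:00:00"),
        ("a@example.com", 3, "2017-03-01 22:00:00"),
        ("a@example.com", 4, "2017-03-02 19:00:00"),
        ("b@example.com", 1, "2017-03-01 21:00:00"),
    ],
)

query = """
SELECT customer_mail_address, AVG(movies_that_day) AS gemiddelde
FROM (
    -- one row per customer and calendar day
    SELECT customer_mail_address,
           date(watch_date) AS watch_day,
           COUNT(movie_id)  AS movies_that_day
    FROM Watchhistory
    GROUP BY customer_mail_address, date(watch_date)
) AS per_day
GROUP BY customer_mail_address
HAVING AVG(movies_that_day) >= 2
"""
heavy_watchers = conn.execute(query).fetchall()
print(heavy_watchers)
```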

SQL: Average value per day

I have a table called 'tweets'. It includes (amongst others) the columns 'tweet_id', 'created_at' (dd/mm/yyyy hh/mm/ss), 'classified' and 'processed_text'. Within the 'processed_text' column there are certain strings such as '{TICKER|IBM}', to which I will refer as ticker-strings.
My target is to get the average value of 'classified' per ticker-string per day. The column 'classified' contains the numerical values -1, 0 and 1.
At this moment, I have a working SQL query for the average value of ‘classified’ for one ticker-string per day. See the script below.
SELECT Date( `created_at` ) , AVG( `classified` ) AS Classified
FROM `tweets`
WHERE `processed_text` LIKE '%{TICKER|IBM}%'
GROUP BY Date( `created_at` )
There are however two problems with this script:
It does not include days on which there were zero ‘processed_text’s like {TICKER|IBM}. I would however like it to spit out the value zero in this case.
I have 100+ different ticker-strings and would thus like to have a script which can process multiple strings at the same time. I can also do them manually, one by one, but this would cost me a terrible lot of time.
When I had a similar question for counting the ‘tweet_id’s per ticker-string, somebody else suggested using the following:
SELECT d.date, coalesce(IBM, 0) as IBM, coalesce(GOOG, 0) as GOOG,
coalesce(BAC, 0) AS BAC
FROM dates d LEFT JOIN
(SELECT DATE(created_at) AS date,
COUNT(DISTINCT CASE WHEN processed_text LIKE '%{TICKER|IBM}%' then tweet_id
END) as IBM,
COUNT(DISTINCT CASE WHEN processed_text LIKE '%{TICKER|GOOG}%' then tweet_id
END) as GOOG,
COUNT(DISTINCT CASE WHEN processed_text LIKE '%{TICKER|BAC}%' then tweet_id
END) as BAC
FROM tweets
GROUP BY date
) t
ON d.date = t.date;
This script worked perfectly for counting the tweet_ids per ticker-string. As I stated, however, I am now looking to find the average classified scores per ticker-string. My question is therefore: Could someone show me how to adjust this script in such a way that I can calculate the average classified scores per ticker-string per day?
SELECT d.date, t.ticker, COALESCE(COUNT(DISTINCT tweet_id), 0) AS tweets
FROM dates d
LEFT JOIN
(SELECT DATE(created_at) AS date,
-- extract the text between '{TICKER|' and the next '}'
SUBSTR(processed_text,
LOCATE('{TICKER|', processed_text) + 8,
LOCATE('}', processed_text, LOCATE('{TICKER|', processed_text))
- LOCATE('{TICKER|', processed_text) - 8) AS ticker,
tweet_id
FROM tweets) t
ON d.date = t.date
GROUP BY d.date, t.ticker
This will put each ticker on its own row, not a column. If you want them moved to columns, you have to pivot the result. How you do this depends on the DBMS. Some have built-in features for creating pivot tables. Others (e.g. MySQL) do not and you have to write tricky code to do it; if you know all the possible values ahead of time, it's not too hard, but if they can change you have to write dynamic SQL in a stored procedure.
See MySQL pivot table for how to do it in MySQL.
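
For the average the question asks about, one possible variation is sketched below in Python with SQLite. It is not the answer's method: it assumes a hypothetical `tickers` lookup table (one row per symbol) in addition to the `dates` table, cross joins the two so that every date and ticker pair appears, and matches tweets with LIKE. The sample rows and ISO-style timestamps are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dates (date TEXT);
CREATE TABLE tickers (symbol TEXT);   -- hypothetical lookup table
CREATE TABLE tweets (tweet_id INTEGER, created_at TEXT,
                     classified INTEGER, processed_text TEXT);
""")
conn.executemany("INSERT INTO dates VALUES (?)",
                 [("2020-01-01",), ("2020-01-02",)])
conn.executemany("INSERT INTO tickers VALUES (?)", [("IBM",), ("GOOG",)])
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?, ?)",
    [
        (1, "2020-01-01 10:00:00", 1, "buy {TICKER|IBM} now"),
        (2, "2020-01-01 11:00:00", 0, "{TICKER|IBM} is flat"),
        (3, "2020-01-02 09:00:00", -1, "sell {TICKER|GOOG}"),
    ],
)

query = """
SELECT d.date, k.symbol, COALESCE(AVG(t.classified), 0) AS classified
FROM dates d
CROSS JOIN tickers k
LEFT JOIN tweets t
       ON date(t.created_at) = d.date
      AND t.processed_text LIKE ('%{TICKER|' || k.symbol || '}%')
GROUP BY d.date, k.symbol
ORDER BY d.date, k.symbol
"""
averages = conn.execute(query).fetchall()
for row in averages:
    print(row)
```

One design caveat: with COALESCE, a day with no matching tweets and a day whose scores average to exactly zero both show 0, so drop the COALESCE if that distinction matters.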

Aggregated data from transactional table for sparklines

I'm working on a Ruby-on-Rails app which contains a list type of report. Two columns within that table are an aggregation from a transactional table.
So let's say we have these two tables:
**items**
id
name
group
price
**transactions**
id
item_id
type
date
qty
These two tables are connected with item_id in the transactions table.
Now I want to show a subset of rows from the items table in a table, with two calculated columns:
Calculated column 1 (Sparkline data):
Sparkline for transactions for the item with type="actuals" for the last 12 months. The result from the database should be text with the aggregated qty for each month, separated by commas. Example:
15,20,0,12,44,33,6,4,33,23,11,65
Calculated column 2 (6m total sale):
Total qty for the item over the last 6 months, multiplied by its price.
So the results would show columns like these:
Item name - Sparkline data - 6m total sale
The result could be many thousands of lines, but would probably be paged.
So the question is: what is the most straightforward way of doing this in Rails models without sacrificing too much performance? Although this is a Ruby on Rails question, the solution might be more SQL-oriented.
The core sql could be something similar:
select
i.id,
i.name,
y.sparkline,
i.price*s.sumqtd totalsale6m
from
items i left join
(select
x.item_id,
GROUP_CONCAT(x.sumqtd order by datemonth asc SEPARATOR ',') sparkline
from
(select
t.item_id,
date_format(date, '%Y-%m') datemonth, -- include the year so months sort correctly across a year boundary
sum(qty) sumqtd
from
transactions t
where
t.type='actuals' and
t.date>date_sub(now(), interval 1 year)
group by
t.item_id, datemonth
) x
group by
x.item_id
) y on i.id=y.item_id
left join
(select
t.item_id,
sum(qty) sumqtd
from
transactions t
where
t.date>date_sub(now(), interval 6 month)
group by
t.item_id
) s on i.id=s.item_id
group by
i.id, i.name
A few comments:
I wasn't able to test it without real data.
If there are gaps in the sales (no sales in a given month), the list will not contain 12 elements. In that case you need to adjust the x and y subqueries.
If you need the result for only a few given items, you can probably push the item id filter deeper into the subqueries to save time.
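
One way to handle the gap caveat above is to fill the missing months in application code rather than in SQL. The sketch below does this in Python with SQLite for a single item. The helper `last_n_months`, the fixed reference date, and the sample rows are all assumptions made for illustration; a Rails app would do the equivalent in Ruby.

```python
import sqlite3
from datetime import date


def last_n_months(end, n):
    """Return 'YYYY-MM' labels for the n months ending at end's month, oldest first."""
    labels = []
    year, month = end.year, end.month
    for _ in range(n):
        labels.append(f"{year:04d}-{month:02d}")
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return labels[::-1]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "id INTEGER PRIMARY KEY, item_id INTEGER, type TEXT, date TEXT, qty INTEGER)"
)
conn.executemany(
    "INSERT INTO transactions (item_id, type, date, qty) VALUES (?, ?, ?, ?)",
    [
        (1, "actuals", "2023-07-15", 15),
        (1, "actuals", "2023-07-20", 5),
        (1, "actuals", "2023-09-01", 12),
        (1, "forecast", "2023-09-02", 99),  # ignored: not 'actuals'
        (1, "actuals", "2024-06-10", 65),
    ],
)

# A fixed reference date keeps the example deterministic.
months = last_n_months(date(2024, 6, 30), 12)
rows = conn.execute(
    "SELECT strftime('%Y-%m', date) AS ym, SUM(qty) "
    "FROM transactions "
    "WHERE item_id = ? AND type = 'actuals' "
    "AND strftime('%Y-%m', date) BETWEEN ? AND ? "
    "GROUP BY ym",
    (1, months[0], months[-1]),
).fetchall()

by_month = dict(rows)
# Months without any transaction become 0, so there are always 12 values.
sparkline = ",".join(str(by_month.get(m, 0)) for m in months)
print(sparkline)  # 20,0,12,0,0,0,0,0,0,0,0,65
```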