SQL Group By + Count with multiple tables - sql

I'm studying for an interview next week that has a small data analysis component. The recruiter gave me the following sample SQL question, and I'm having trouble working out a solution. I'm hoping that I'm not biting off more than I can chew ;)
SAMPLE QUESTION:
You are given two tables:
AdClick Table (columns: ClickID, AdvertiserID, UserID, and other
fields) and AdConversion Table (columns: ClickID, UserID and other
fields).
You have to find the total conversion rate (# of conversions/# of
clicks) for users with 1 click, 2 clicks, etc.
I've been playing with this for about an hour and keep hitting road blocks. I understand COUNT and GROUP BY but suspect I'm missing a simple SQL feature that I'm unaware of. This also makes it difficult for me to find any possible pointers/solutions via Google: not knowing the magic keywords to search on.
Example Input
dbo.AdConversion
----------------
ClickID UserID
1 1
2 1
4 1
5 3
6 2
7 2
12 1
9 4
10 4
dbo.AdClick
-----------
ClickID AdvertiserID UserID
1 1 1
2 2 1
3 1 2
4 1 1
5 1 3
6 2 2
7 3 2
8 1 1
9 4 4
10 2 4
11 3 4
12 2 1
Expected Result:
----------------
UserClickCount ConversionRate
4 80.00%
2 66.67%
1 100.00%
Explanation/Clarification:
Users with 4 AdConversion.ClickIDs (aka conversions) have an 80% conversion rate.
Here there's just one such user, UserID 1, who has 5 AdClicks and 4 AdConversions.
Users with 2 conversions have a combined 6 AdClicks and 4 conversions, for a conversion rate of 66.67%. Here, that'd be UserID 2 and 4.
Users with 1 conversion (here only UserID 3) have 1 conversion against 1 AdClick, for a 100% conversion rate.
Here's one possible solution I've come up with after some direction from Zack's comment. I doubt it's the ideal solution, and I don't know whether it has bugs in it:
-- Conversions per user (table variables are declared with @ in T-SQL)
DECLARE @Conversions TABLE
(
UserID int NOT NULL,
AdConversions int
)
INSERT INTO @Conversions (UserID, AdConversions)
SELECT adc.UserID, COUNT(adc.UserID)
FROM dbo.AdConversion adc
GROUP BY adc.UserID;
-- Clicks per user
DECLARE @Clicks TABLE
(
UserID int NOT NULL,
AdClicks int
)
INSERT INTO @Clicks (UserID, AdClicks)
SELECT UserID, COUNT(ClickID)
FROM dbo.AdClick
GROUP BY UserID;
-- Conversion rate for each group of users with the same conversion count
SELECT co.AdConversions, CONVERT(decimal(6,3), (CAST(SUM(co.AdConversions) AS float) / SUM(cl.AdClicks))) * 100
FROM @Conversions co
INNER JOIN @Clicks cl
ON co.UserID = cl.UserID
GROUP BY co.AdConversions;
Any advice would be greatly appreciated!
Thanks,
Michael

Your logic seems good. Here is a version with common table expressions and a little update with the numeric conversion:
WITH tConversions as
(SELECT UserID, COUNT(ClickID) as AdConversions
FROM AdConversion
GROUP BY UserID),
tClicks as
(SELECT UserID, COUNT(ClickID) as AdClicks
FROM AdClick
GROUP BY UserID)
SELECT co.AdConversions, CONVERT(decimal(10,2),CAST(SUM(co.AdConversions) as float) / SUM(cl.AdClicks) * 100) as ConversionRate
FROM tConversions co
INNER JOIN tClicks cl
ON co.UserID = cl.UserID
GROUP BY co.AdConversions
You can also use subqueries directly:
SELECT co.AdConversions, CONVERT(decimal(10,2),CAST(SUM(co.AdConversions) as float) / SUM(cl.AdClicks) * 100) as ConversionRate
FROM
(SELECT UserID, COUNT(ClickID) as AdConversions
FROM AdConversion
GROUP BY UserID)
as co
INNER JOIN
(SELECT UserID, COUNT(ClickID) as AdClicks
FROM AdClick
GROUP BY UserID)
as cl
ON co.UserID = cl.UserID
GROUP BY co.AdConversions
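As a sanity check, the CTE version can be run against the sample data from the question. This is only a sketch: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for SQL Server (my assumption, not part of the original setup), so CONVERT/CAST is swapped for ROUND(100.0 * ...), and the ORDER BY is added just to get a stable row order.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server here (an assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AdClick (ClickID INTEGER, AdvertiserID INTEGER, UserID INTEGER);
CREATE TABLE AdConversion (ClickID INTEGER, UserID INTEGER);
INSERT INTO AdClick VALUES
  (1,1,1),(2,2,1),(3,1,2),(4,1,1),(5,1,3),(6,2,2),
  (7,3,2),(8,1,1),(9,4,4),(10,2,4),(11,3,4),(12,2,1);
INSERT INTO AdConversion VALUES
  (1,1),(2,1),(4,1),(5,3),(6,2),(7,2),(12,1),(9,4),(10,4);
""")

query = """
WITH tConversions AS (
    SELECT UserID, COUNT(ClickID) AS AdConversions
    FROM AdConversion
    GROUP BY UserID),
tClicks AS (
    SELECT UserID, COUNT(ClickID) AS AdClicks
    FROM AdClick
    GROUP BY UserID)
SELECT co.AdConversions,
       -- 100.0 forces floating-point division instead of integer division
       ROUND(100.0 * SUM(co.AdConversions) / SUM(cl.AdClicks), 2) AS ConversionRate
FROM tConversions co
INNER JOIN tClicks cl ON co.UserID = cl.UserID
GROUP BY co.AdConversions
ORDER BY co.AdConversions DESC
"""

rates = conn.execute(query).fetchall()
for conversions, rate in rates:
    print(conversions, f"{rate:.2f}%")
```

On the sample data this prints 80.00%, 66.67% and 100.00% for 4, 2 and 1 conversions. Note that the INNER JOIN drops any user who has clicks but no conversions at all; if such users should count toward a "0 conversions" group, a LEFT JOIN from the clicks side would be needed.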

Related

Find missing dates from multi table query

I'm looking to write a query that can count missing entries from a table of dates based on skills that a resource has to forecast availability of resource for booking. I'm not sure if it can be done and I'm certainly struggling with the logic!!
Tables
Dates
ID dateFrom StaffID
1 01-06-2014 1
2 02-06-2014 1
3 03-06-2014 1
4 04-06-2014 1
5 05-06-2014 1
6 01-06-2014 2
7 03-06-2014 2
8 04-06-2014 2
9 05-06-2014 2
10 06-06-2014 2
(Free dates on the 6th for staffID 1 and 2nd for staffID 2)
Staff
StaffID Name
1 John
2 Paul
Skills
ID StaffID SkillID
1 1 1
2 1 2
3 1 3
4 2 2
5 2 3
6 2 4
So I want to write a query that says, for June, how many days are available to book for each skill. Is this even possible, looking for records that don't exist and joining them with a staff table?
I've put together a calendar table that can identify days without bookings but I'm struggling from there on to be honest.
Any help would be greatly appreciated!!
Steve
EDIT: DB is SQL 2005.
Expected output (if possible)
SkillID Number of days available
1 20
2 22
3 14
etc
Create a calendar table with all possible dates (booked or not), then count the dates that have no matching booking:
select count(distinct ad.calendarDate), s.SkillID
from all_dates ad
cross join skills s
where not exists (
select 1 from
dates where dateFrom = ad.calendarDate
and StaffID = s.StaffID
)
group by s.SkillID
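To see what the NOT EXISTS query returns, here is a small sketch run on the sample rows from the question. The assumptions are mine: SQLite (via Python's sqlite3) instead of SQL 2005, ISO-formatted date strings, and a hypothetical all_dates calendar that covers only 1 to 6 June 2014 so the numbers stay easy to check by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE all_dates (calendarDate TEXT);  -- hypothetical calendar table
CREATE TABLE dates (ID INTEGER, dateFrom TEXT, StaffID INTEGER);
CREATE TABLE skills (ID INTEGER, StaffID INTEGER, SkillID INTEGER);
INSERT INTO all_dates VALUES
  ('2014-06-01'),('2014-06-02'),('2014-06-03'),
  ('2014-06-04'),('2014-06-05'),('2014-06-06');
INSERT INTO dates VALUES
  (1,'2014-06-01',1),(2,'2014-06-02',1),(3,'2014-06-03',1),
  (4,'2014-06-04',1),(5,'2014-06-05',1),(6,'2014-06-01',2),
  (7,'2014-06-03',2),(8,'2014-06-04',2),(9,'2014-06-05',2),
  (10,'2014-06-06',2);
INSERT INTO skills VALUES (1,1,1),(2,1,2),(3,1,3),(4,2,2),(5,2,3),(6,2,4);
""")

# For every (calendar date, staff skill) pair, keep the pair only if that
# staff member has no booking on that date, then count distinct dates per skill.
free_days = conn.execute("""
SELECT s.SkillID, COUNT(DISTINCT ad.calendarDate) AS daysAvailable
FROM all_dates ad
CROSS JOIN skills s
WHERE NOT EXISTS (
    SELECT 1 FROM dates d
    WHERE d.dateFrom = ad.calendarDate
      AND d.StaffID = s.StaffID)
GROUP BY s.SkillID
ORDER BY s.SkillID
""").fetchall()
print(free_days)
```

With this six-day calendar, skills 1 and 4 each have one free day (the 6th for John, the 2nd for Paul), while skills 2 and 3 have two, because COUNT(DISTINCT ...) counts a date as available if at least one member of staff with that skill is free.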
If I understand your problem, your query will be something like:
Select sum(temp.nbrDate), temp.SkillID from
(Select s.SkillID, count (d.ID) as nbrDate from Skills s, Dates d
where s.StaffID = d.StaffID
Group by SkillID) temp
group by SkillID
If you want to add a date range, add this to your where clause:
and d.dateFrom between '01-06-2014' and '30-06-2014'

SQL - Overall average Points

I have a table like this:
[challenge_log]
User_id | challenge | Try | Points
==============================================
1 1 1 5
1 1 2 8
1 1 3 10
1 2 1 5
1 2 2 8
2 1 1 5
2 2 1 8
2 2 2 10
I want the overall average points. To do so, I believe I need 3 steps:
Step 1 - Get the MAX value (of points) of each user in each challenge:
User_id | challenge | Points
===================================
1 1 10
1 2 8
2 1 5
2 2 10
Step 2 - SUM all the MAX values of one user
User_id | Points
===================
1 18
2 15
Step 3 - The average
AVG = SUM (Points from step 2) / number of users = 16.5
Can you help me find a query for this?
You can get the overall average by dividing the total number of points by the number of distinct users. However, you need the maximum per challenge, so the sum is a bit more complicated. One way is with a subquery:
-- 1.0 * avoids integer division on engines that truncate int / int
select 1.0 * sum(Points) / count(distinct user_id)
from (select user_id, challenge, max(Points) as Points
from challenge_log
group by user_id, challenge
) cl;
You can also do this with one level of aggregation, by finding the maximum in the where clause:
-- 1.0 * avoids integer division on engines that truncate int / int
select 1.0 * sum(Points) / count(distinct user_id)
from challenge_log cl
where not exists (select 1
from challenge_log cl2
where cl2.user_id = cl.user_id and
cl2.challenge = cl.challenge and
cl2.points > cl.points
);
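Both approaches can be checked against the eight sample rows. The sketch below is an illustration under my own assumptions: SQLite through Python's sqlite3, the column spelled user_id as in the question's table, and a 1.0 * factor so the division is not truncated to an integer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE challenge_log (user_id INTEGER, challenge INTEGER, try INTEGER, points INTEGER);
INSERT INTO challenge_log VALUES
  (1,1,1,5),(1,1,2,8),(1,1,3,10),(1,2,1,5),(1,2,2,8),
  (2,1,1,5),(2,2,1,8),(2,2,2,10);
""")

# Variant 1: take the best score per user/challenge in a subquery first.
via_subquery = conn.execute("""
SELECT 1.0 * SUM(points) / COUNT(DISTINCT user_id)
FROM (SELECT user_id, challenge, MAX(points) AS points
      FROM challenge_log
      GROUP BY user_id, challenge) cl
""").fetchone()[0]

# Variant 2: keep only rows for which no better try exists.
via_not_exists = conn.execute("""
SELECT 1.0 * SUM(points) / COUNT(DISTINCT user_id)
FROM challenge_log cl
WHERE NOT EXISTS (SELECT 1
                  FROM challenge_log cl2
                  WHERE cl2.user_id = cl.user_id
                    AND cl2.challenge = cl.challenge
                    AND cl2.points > cl.points)
""").fetchone()[0]

print(via_subquery, via_not_exists)
```

Both return 16.5 here. One difference worth knowing: if a user reaches the same top score on two tries of one challenge, the NOT EXISTS variant keeps both rows and counts that score twice, whereas the subquery variant does not.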
Try these on for size.
Overall Mean
select avg( Points ) as mean_score
from challenge_log
Per-Challenge Mean
select challenge ,
avg( Points ) as mean_score
from challenge_log
group by challenge
If you want to compute the mean of each user's highest score per challenge, you're not exactly raising the level of complexity very much:
Overall Mean
select avg( high_score )
from ( select user_id ,
challenge ,
max( Points ) as high_score
from challenge_log
group by user_id , challenge
) t
Per-Challenge Mean
select challenge ,
avg( high_score )
from ( select user_id ,
challenge ,
max( Points ) as high_score
from challenge_log
group by user_id , challenge
) t
group by challenge
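For comparison, here is what the "mean of high scores" queries give on the sample rows, assuming the inner query groups by user_id and challenge. As before this is a sketch on SQLite via Python's sqlite3, not the original database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE challenge_log (user_id INTEGER, challenge INTEGER, try INTEGER, points INTEGER);
INSERT INTO challenge_log VALUES
  (1,1,1,5),(1,1,2,8),(1,1,3,10),(1,2,1,5),(1,2,2,8),
  (2,1,1,5),(2,2,1,8),(2,2,2,10);
""")

# Best score per user and challenge, reused by both queries below.
high_scores = """
SELECT user_id, challenge, MAX(points) AS high_score
FROM challenge_log
GROUP BY user_id, challenge
"""

overall_mean = conn.execute(
    f"SELECT AVG(high_score) FROM ({high_scores}) t"
).fetchone()[0]

per_challenge = conn.execute(
    f"SELECT challenge, AVG(high_score) FROM ({high_scores}) t "
    "GROUP BY challenge ORDER BY challenge"
).fetchall()

print(overall_mean, per_challenge)
```

The overall mean of the four high scores is 8.25, and the per-challenge means are 7.5 and 9.0. That is a different figure from the 16.5 in the question, which sums each user's high scores before averaging over users.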
After step 1 do
SELECT USER_ID, AVG(POINTS)
FROM STEP1
GROUP BY USER_ID
You can combine step 1 and 2 into a single query/subquery as follows:
Select BestShot.[User_ID], AVG(cast (BestShot.MostPoints as money))
from (select tLog.Challenge, tLog.[User_ID], MostPoints = max(tLog.points)
from dbo.tmp_Challenge_Log tLog
Group by tLog.User_ID, tLog.Challenge
) BestShot
Group by BestShot.User_ID
The subquery determines the most points for each user/challenge combo, and the outer query takes these max values and uses the AVG function to return the average value of them. The last Group By tells SQL to average all the values across each User_ID.

SQL - Order by amount of occurrences

It's my first question here so I hope I can explain it well enough,
I want to order my data by amount of occurrences in the table.
My table is like this:
id Daynr
1 2
1 4
2 4
2 5
2 6
3 1
4 2
4 5
And I want to sort it like this:
id Daynr
3 1
1 2
1 4
4 2
4 5
2 4
2 5
2 6
Player #3 has one day in the table, and Player #1 has 2.
My table is named "dayid"
Both id and Daynr are foreign keys; together they make up the primary key.
I hope this explains my problem well enough. Please ask if you need more information; it's my first time here.
Thanks in advance
You can do this by counting the number of times that things occur for each id. Most databases support window functions, so you can do this as:
select id, daynr
from (select t.*, count(*) over (partition by id) as cnt
from dayid t
) t
order by cnt, id;
You can also express this as a join:
select t.id, t.daynr
from dayid as t inner join
(select id, count(*) as cnt
from dayid
group by id
) as tg
on t.id = tg.id
order by tg.cnt, t.id;
Note that both of these include the id in the order by. That way, if two ids have the same count, all rows for the id will appear together.
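A quick way to confirm the ordering is to run the join form on the rows from the question. This sketch assumes SQLite via Python's sqlite3 and the table name dayid given by the asker; I also added t.daynr as a final sort key (my addition) so rows within one id come out in a fixed order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dayid (id INTEGER, daynr INTEGER, PRIMARY KEY (id, daynr));
INSERT INTO dayid VALUES (1,2),(1,4),(2,4),(2,5),(2,6),(3,1),(4,2),(4,5);
""")

rows = conn.execute("""
SELECT t.id, t.daynr
FROM dayid AS t
INNER JOIN (SELECT id, COUNT(*) AS cnt   -- occurrences per id
            FROM dayid
            GROUP BY id) AS tg
    ON t.id = tg.id
ORDER BY tg.cnt, t.id, t.daynr
""").fetchall()

for row in rows:
    print(row)
```

The result starts with id 3 (one row), then ids 1 and 4 (two rows each, tie broken by id), then id 2 (three rows), which matches the desired output in the question.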

Need hierarchical data from 3 tables in SQL Server

I have following tables:
UserMaster:
UserId Int, UserName Varchar(200),AddedBy Int
UserId EmpName AddedBy
1 admin 0
2 SubAdmin1 1
3 SubAdmin2 1
4 Vikas 2
5 Mohit 4
6 Atul 5
7 Vishal 6
8 Mani 3
9 Sunny 8
SalesMaster:
SalesId Int, UserId Int (FK_UserMaster_UserId) , Price Int
SalesId UserId Price
1 1 100
2 2 200
3 3 300
4 4 500
5 5 100
6 6 200
7 7 111
8 8 222
9 9 333
Case 1: I want the total price for one particular user plus all the users under that user.
That is, if I consider UserId=1, the price should be totalled for that user, for all users where AddedBy=1,
and for their lower-level employees.
So the total price is calculated over the users with UserId 1,2,3,4,5,6,7,8,9.
Case 2: Similarly, if I want the total price under UserId=3 (SubAdmin2), the total from SalesMaster is calculated over the users with UserId 3,8,9.
The Result of first Case should be:
UserId Price
1 2066
The Result of Second Case should be:
UserId Price
3 300+222+333
Please Help
Thanks & Regards
Nitin
-- @UserId is the user at the top of the subtree, e.g. 1 or 3
with cte as (
select @UserId as UserId
union all
select um.UserId
from UserMaster as um
inner join cte as c on c.UserId = um.AddedBy
)
select sum(s.Price)
from cte as c
inner join SalesMaster as s on s.UserId = c.UserId
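The recursive CTE can be tried on the sample data from the question. This is a sketch under assumptions of mine: SQLite through Python's sqlite3 rather than SQL Server, which means the CTE is written WITH RECURSIVE and the root user is passed as a bound parameter instead of a T-SQL variable. The helper name total_price is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserMaster (UserId INTEGER, EmpName TEXT, AddedBy INTEGER);
CREATE TABLE SalesMaster (SalesId INTEGER, UserId INTEGER, Price INTEGER);
INSERT INTO UserMaster VALUES
  (1,'admin',0),(2,'SubAdmin1',1),(3,'SubAdmin2',1),(4,'Vikas',2),
  (5,'Mohit',4),(6,'Atul',5),(7,'Vishal',6),(8,'Mani',3),(9,'Sunny',8);
INSERT INTO SalesMaster VALUES
  (1,1,100),(2,2,200),(3,3,300),(4,4,500),(5,5,100),
  (6,6,200),(7,7,111),(8,8,222),(9,9,333);
""")

def total_price(user_id):
    """Sum Price for user_id and everyone below them in the AddedBy tree."""
    row = conn.execute("""
    WITH RECURSIVE cte(UserId) AS (
        SELECT ?                      -- anchor: the starting user
        UNION ALL
        SELECT um.UserId              -- step: users added by someone already found
        FROM UserMaster AS um
        INNER JOIN cte AS c ON c.UserId = um.AddedBy)
    SELECT SUM(s.Price)
    FROM cte AS c
    INNER JOIN SalesMaster AS s ON s.UserId = c.UserId
    """, (user_id,)).fetchone()
    return row[0]

print(total_price(1), total_price(3))
```

For UserId 1 this gives 2066, and for UserId 3 it gives 855 (300 + 222 + 333), matching the two cases in the question.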

SQL RANDOM ORDER BY ON JOINED TABLE

I have 2 tables: Persons(idPerson INT) and Questions(idQuestion INT).
I want to insert the data into a 3rd table: OrderedQuestions(idPerson INT, idQuestion INT, questionRank INT)
I want to assign all the questions to all the persons but in a random order.
I thought of doing a CROSS JOIN, but then I get the same order of questions for every person.
INSERT INTO OrderedQuestions
SELECT idPerson, idQuestion, questionRank FROM Persons
CROSS JOIN
(SELECT idQuestion,ROW_NUMBER() OVER (ORDER BY NEWID()) as questionRank
FROM Questions) as t
How can I achieve such a random, distinct ordering for every person?
Obviously, I want the solution to be as fast as possible.
(It can be done using TSQL or Linq to SQL)
Desired results for 3 persons and 5 questions:
idPerson idQuestion questionRank
1. 1 18 1
2. 1 14 2
3. 1 25 3
4. 1 31 4
5. 1 2 5
6. 2 2 1
7. 2 25 2
8. 2 31 3
9. 2 18 4
10. 2 14 5
11. 3 31 1
12. 3 18 2
13. 3 14 3
14. 3 25 4
15. 3 2 5
I just edited the results (Since the IDs are autogenerated, they can't be used to order the questions).
This could probably be written more efficiently, but it meets all the requirements.
SELECT
idPerson,
idQuestion,
ROW_NUMBER() OVER (PARTITION BY idPerson ORDER BY ordering) as questionRank
FROM (
SELECT idPerson, idQuestion, ordering
FROM Persons
CROSS JOIN
(
SELECT idQuestion, NewID() as ordering FROM Questions
) as t
) as a
order by idPerson, questionRank
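The same idea, a row number partitioned by person and ordered by a random value, can be sketched in a form that is easy to run and check. Assumptions on my part: SQLite via Python's sqlite3 (window functions need SQLite 3.25 or later), with random() playing the role that NEWID() plays in T-SQL, and the random ordering placed directly in the OVER clause rather than in a derived table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Persons (idPerson INTEGER);
CREATE TABLE Questions (idQuestion INTEGER);
CREATE TABLE OrderedQuestions (idPerson INTEGER, idQuestion INTEGER, questionRank INTEGER);
INSERT INTO Persons VALUES (1),(2),(3);
INSERT INTO Questions VALUES (18),(14),(25),(31),(2);

-- Every person gets every question; the rank restarts for each person
-- and follows a random value drawn per (person, question) row.
INSERT INTO OrderedQuestions
SELECT p.idPerson,
       q.idQuestion,
       ROW_NUMBER() OVER (PARTITION BY p.idPerson ORDER BY random())
FROM Persons p
CROSS JOIN Questions q;
""")

assigned = conn.execute("""
SELECT idPerson, idQuestion, questionRank
FROM OrderedQuestions
ORDER BY idPerson, questionRank
""").fetchall()

for row in assigned:
    print(row)
```

The actual order differs on every run, but each person always receives all five questions with ranks 1 to 5 and no repeats.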