SQL - DataExplorer Query Top Unsung Users

As previously discussed on meta:
I want to create a Data Explorer query to show the top 100 most unsung users on Stack Overflow.
By top 100 I mean a list ordered by the highest percentage of zero-score accepted answers, in descending order.
This is my first time working with SQL. I looked at other queries and thought this would do it:
SELECT TOP 100
u.Id as [User Link],
count(a.Id) as [Answers],
(select sum(CASE WHEN a.Score = 0 then 1 else 0 end) * 1000 / count(a.Id) / 10.0) as [Percentage]
from
Users u
inner join
Posts q on q.AcceptedAnswerId = u.Id
inner join
Posts a
on a.Id = q.AcceptedAnswerId
where
a.CommunityOwnedDate is null
and a.postTypeId = 2
and u.Reputation > 1000
group by u.Id
order by Percentage DESC
Result: https://data.stackexchange.com/stackoverflow/query/218910
The results show users with only one answer, which isn't true when you check their profiles.

You can pull this information using Sam Saffron's The true unsung heros query. I've modified it slightly to include only the top 100.
select top 100
X.*, u.Reputation from (
select a.OwnerUserId [User Link],
sum(case when a.Score = 0 then 0 else 1 end) as [Non Zero Score Answers],
sum(case when a.Score = 0 then 1 else 0 end) as [Zero Score Answers]
from Posts q
join Posts a on a.Id = q.AcceptedAnswerId
where a.CommunityOwnedDate is null and a.OwnerUserId is not null
and a.OwnerUserId <> isnull(q.OwnerUserId,-1)
group by a.OwnerUserId
having sum(case when a.Score = 0 then 1 else 0 end) > 10
) as X
join Users u on u.Id = [User Link]
order by --[Zero Score Answers] desc,
([Zero Score Answers]+ 0.0) / ([Zero Score Answers]+ [Non Zero Score Answers]+ 0.0) desc
This sorts by the ratio of Zero Score answers to total answers.
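To see the ordering on a small scale, here is a minimal sketch using Python's sqlite3 module. The `answers` table and its rows are made up for illustration and are not the Data Explorer schema; the point is only the `+ 0.0` trick, which forces a decimal division instead of an integer one.

```python
import sqlite3

# Hypothetical toy data: one row per accepted answer (not the real SEDE schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE answers (owner_id INTEGER, score INTEGER);
INSERT INTO answers VALUES
  (1, 0), (1, 0), (1, 5),          -- user 1: 2 of 3 answers have score 0
  (2, 0), (2, 3), (2, 4), (2, 1),  -- user 2: 1 of 4
  (3, 0), (3, 0);                  -- user 3: 2 of 2
""")

# Order by the share of zero-score answers; "+ 0.0" avoids integer division.
rows = conn.execute("""
SELECT owner_id,
       SUM(CASE WHEN score = 0 THEN 1 ELSE 0 END) AS zero_score,
       COUNT(*) AS total
FROM answers
GROUP BY owner_id
ORDER BY (SUM(CASE WHEN score = 0 THEN 1 ELSE 0 END) + 0.0) / COUNT(*) DESC
""").fetchall()

for owner_id, zero_score, total in rows:
    print(owner_id, zero_score, total)
```

Without the `+ 0.0`, every ratio below 1 would truncate to 0 and the ordering among those users would be arbitrary.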

Related

SQL Query not finding all rows in SUM and COUNT

select coalesce(ratings.positive,0) as positive,coalesce(ratings.negative,0) as negative,articles.id,x.username,commentnumb,
articles.category,
articles."createdAt",
articles.id,
articles.title,
articles."updatedAt"
FROM articles
LEFT JOIN (SELECT id AS userId,username,about FROM users) x ON articles.user_id = x.userId
LEFT JOIN (SELECT id,
article_id,
sum(case when rating = '1' then 1 else 0 end) as positive,
sum(case when rating = '0' then 1 else 0 end) as negative
from article_ratings
GROUP by id
) as ratings ON ratings.article_id = articles.id
LEFT JOIN (SELECT article_id,id,
count(article_id) as commentNumb
from article_comments
GROUP by id
) as comments ON comments.article_id = articles.id
WHERE articles."createdAt" <= :date
group by ratings.positive,ratings.negative,articles.id,x.username,commentnumb
order by articles."createdAt" desc
LIMIT 10
The query runs, but I have many more comments and ratings than the SUM and COUNT functions report.
How do I fix this query?
This is on PostgreSQL.
After some experimentation, it seems that the third join, the one for comments, is causing the issue.
The derived tables should be grouped by article_id, but you are grouping by id. Assuming id is unique per rating or comment, each derived table then returns one row per rating or comment instead of one row per article, so every count comes out as 1. I have modified the query accordingly:
SELECT COALESCE(ratings.positive,0) AS positive,COALESCE(ratings.negative,0) AS negative,articles.id,x.username,commentnumb,
articles.category,
articles."createdAt",
articles.id,
articles.title,
articles."updatedAt"
FROM articles
LEFT OUTER JOIN (SELECT id AS userId,username,about FROM users) x ON articles.user_id = x.userId
LEFT OUTER JOIN (SELECT article_id,
SUM(case when rating = '1' then 1 else 0 end) as positive,
SUM(case when rating = '0' then 1 else 0 end) as negative
FROM article_ratings
GROUP by article_id
) AS ratings ON ratings.article_id = articles.id
LEFT OUTER JOIN (SELECT article_id,
count(article_id) as commentNumb
FROM article_comments
GROUP by article_id
) AS comments ON comments.article_id = articles.id
WHERE articles."createdAt" <= :date
ORDER BY articles."createdAt" desc
LIMIT 10;
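The difference between the two groupings can be reproduced on a few rows. This is a sketch with sqlite3 standing in for PostgreSQL, and the sample rows are invented; only the `article_comments` table name and its `id` and `article_id` columns come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article_comments (id INTEGER PRIMARY KEY, article_id INTEGER);
-- Three comments on article 10, one on article 20 (made-up data).
INSERT INTO article_comments (id, article_id)
VALUES (1, 10), (2, 10), (3, 10), (4, 20);
""")

# Grouping by the comment's own id: one group per comment, so each count is 1.
by_id = conn.execute("""
SELECT article_id, COUNT(article_id)
FROM article_comments
GROUP BY id
ORDER BY id
""").fetchall()

# Grouping by article_id: one group per article, so the count is the real total.
by_article = conn.execute("""
SELECT article_id, COUNT(article_id)
FROM article_comments
GROUP BY article_id
ORDER BY article_id
""").fetchall()

print(by_id)
print(by_article)
```

The first result has four rows with a count of 1 each, which is why the joined query under-reports comments; the second has one row per article.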

Return multiple counts in a group by query

I have two tables: a user table and a preference table.
Users
-UserId
Preferences
-PreferenceId
-UserId
-PreferenceType
-Enabled
I have a query to get the number of users grouped by location:
SELECT u.Location, u.Count
FROM Users u
GROUP BY u.Location
I want to make a report of the Users by Location, but also include a column for each PreferenceType (say there are 3 types: 'pref1', 'pref2', 'pref3').
So currently I am running a separate query like this for each PreferenceType:
SELECT u.Location, u.Count
From Users u
inner join Preferences p ON p.UserId = u.UserId
WHERE
p.PreferenceType = 'pref1'
and p.Enabled = 1
GROUP BY u.Location
Would it be possible to combine all of these and get a result set like:
SELECT u.Location, u.Count, Pref1Count, Pref2Count, Pref3Count
From Users u
inner join Preferences p ON p.UserId = u.UserId
WHERE
You can use conditional aggregation:
select u.Location, count(distinct u.Userid) as cnt,
sum(case when p.PreferenceType = 'pref1' and p.Enabled = 1 then 1 else 0 end) as pref1,
sum(case when p.PreferenceType = 'pref2' and p.Enabled = 1 then 1 else 0 end) as pref2,
sum(case when p.PreferenceType = 'pref3' and p.Enabled = 1 then 1 else 0 end) as pref3
From Users u left join
Preferences p
on p.UserId = u.UserId
group by u.Location;
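Here is a small runnable sketch of the same conditional aggregation, using sqlite3 and invented rows. The snake_case table and column names are my own stand-ins for the Users and Preferences tables in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, location TEXT);
CREATE TABLE preferences (user_id INTEGER, preference_type TEXT, enabled INTEGER);
INSERT INTO users VALUES (1, 'NY'), (2, 'NY'), (3, 'LA'), (4, 'LA');
INSERT INTO preferences VALUES
  (1, 'pref1', 1), (1, 'pref2', 1),
  (2, 'pref1', 0), (2, 'pref3', 1),
  (3, 'pref1', 1);  -- user 4 has no preferences at all
""")

# One pass over the join; each SUM(CASE ...) counts only the rows it cares about.
rows = conn.execute("""
SELECT u.location,
       COUNT(DISTINCT u.user_id) AS cnt,
       SUM(CASE WHEN p.preference_type = 'pref1' AND p.enabled = 1 THEN 1 ELSE 0 END) AS pref1,
       SUM(CASE WHEN p.preference_type = 'pref2' AND p.enabled = 1 THEN 1 ELSE 0 END) AS pref2,
       SUM(CASE WHEN p.preference_type = 'pref3' AND p.enabled = 1 THEN 1 ELSE 0 END) AS pref3
FROM users u
LEFT JOIN preferences p ON p.user_id = u.user_id
GROUP BY u.location
ORDER BY u.location
""").fetchall()

for row in rows:
    print(row)
```

The LEFT JOIN keeps user 4, who has no preference rows, in the user count, and COUNT(DISTINCT ...) stops users with several preferences from being counted more than once.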

SQL count number of users that have a value > 1

I need a select that returns two rows: one with the number of people whose "number of hits" is > 0, and the other with the number of people whose "number of hits" = 0.
SELECT u.name as 'Usuário',u.full_name as 'Nome do Usuário',count(l.referer) as 'Número de Acessos'
FROM mmp_user u
LEFT JOIN MMP_MMPUBLISH_LOG l
on u.id=l.user_id
AND l.event_date between '2015-08-01' and '2015-08-08'
group by u.name,u.full_name
order by count(l.referer) desc
I have:
151 users,
9 who accessed, and
142 who did not.
But my select doesn't return these values. Please help.
Table mmp_user fields (ID,CREATED_BY,AVATAR_ID,CREATION_DATE,EMAIL,FULL_NAME,LAST_EDITED_BY,LAST_EDITION_DATE,NAME,OBSERVATION,USER_PASSWORD,PASSWORD_REMINDER,SIGNATURE,STATUS,ADMINISTRATOR,DESIGNER,SECURITY_OFFICE,PUBLISHER,BRANCH_ID,DEPARTMENT_ID,EXTENSION,PHONE,COMPANY_ID,POSITION,ADMISSION_DATE,PASSWORD_LAST_EDITION_DATE,DISMISSED_DATE,NEWSLETTER,EXPIRE_DATE,COMPANY,BRANCH,DEPARTMENT,AREA_ID,SITE,USER_NUMBER,PREFIX_HOME_PHONE,PREFIX_MOBILE_PHONE,ADDRESS,ADDRESS_COMPLEMENT,ADDRESS_TYPE,CITY,NEIGHBORHOOD,STATE,ZIP_CODE,BIRTHDATE,GENDER,HOME_PHONE,MOBILE_PHONE,CPF,MARIAGE_STATUS,NATIONALITY,RG,EDUCATION,URL_SITE,FIRST_NAME,LAST_NAME,ID_SAP,PASSWORD_GAFISA,NICKNAME,CODE_POSITION,CREATION_USER_ORIGIN,LEVEL_POSITION,BIRTH_DATE_VISIBILITY,HOME_PHONE_COUNTRY_PREFIX,HOME_PHONE_VISIBILITY,MOBILE_PHONE_COUNTRY_PREFIX,MOBILE_PHONE_VISIBILITY,AREA_PREFIX,COUNTRY_PREFIX,PHONE_OBSERVATION,RESPONSIBLE,RESOURCE_ID,AVATAR_RF_ID,RESOURCE_AVATAR_ID,AVATAR_URL_LUCENE,avatarurl,PASSWORD_EXCHANGE,USER_NAME_EXCHANGE,DOMAIN_EXCHANGE,I18N,LAST_IMPORT_FILE,HIERARCHY_POSITION,SECRET_NICKNAME,PROFILE_TYPE,NOT_VIEW_USER,CHANGE_POSITION_DATE,DISTINGUISHED_NAME,OU_USER,AUTH_TOKEN,AUTH_TOKEN_EXPIRATION)
Table MMP_MMPUBLISH_LOG fields (ID,MMPUBLISH_LOG_TYPE,EVENT_DATE,USER_ID,TRANSACTION_NAME,USER_IP,USER_LOGIN,USER_NAME,SESSION_ID,REFERER,PUBLISHING_OBJECT_ID,PUBLISHING_OBJECT_NAME,PHASE_ID,PHASE_NAME,PHASE_COMMENT,ACCESS_URL,HOME_PAGE_ID,HOMEPAGE_ID,phaseComment,phaseId,phaseName,PO_VERSION_NUMBER)
Thanks
You could wrap this query with another query and apply a case expression to the count:
SELECT access_code, COUNT(*)
FROM (SELECT u.name,
u.full_name,
CASE WHEN COUNT(l.referer) > 0 THEN 'access'
ELSE 'no access'
END as access_code
FROM mmp_user u
LEFT JOIN mmp_mmpublish_log l ON
u.id=l.user_id AND
l.event_date BETWEEN '2015-08-01' AND '2015-08-08'
GROUP BY u.name, u.full_name) t
GROUP BY access_code
ORDER BY access_code ASC
Alternatively, compute the number of hits per user in a derived table and count the users with and without hits:
SELECT Sum(case when NumberOfHits = 0 then 1 else 0 end) ZeroHitsCount,
Sum(case when NumberOfHits > 0 then 1 else 0 end) HasSomeHitsCount
FROM (SELECT u.name, u.full_name,
count(l.referer) NumberOfHits
FROM mmp_user u
LEFT JOIN MMP_MMPUBLISH_LOG l
on u.id=l.user_id
AND l.event_date between '2015-08-01' and '2015-08-08'
group by u.name, u.full_name) t
Use a case expression:
SELECT (case when l.referer is null then 'Not Accessed'
else 'Accessed'
end) as which,
count(*) as 'Número de Acessos'
FROM mmp_user u LEFT JOIN
MMP_MMPUBLISH_LOG l
on u.id = l.user_id AND
l.event_date between '2015-08-01' and '2015-08-08'
group by (case when l.referer is null then 'Not Accessed'
else 'Accessed'
end)
order by count(l.referer) desc;
Actually, the above counts the number of accesses. One way to get the number of users is to use count(distinct u.id). Another way uses a subquery:
select AccessType, count(*)
from (select u.*,
(case when exists (select 1
from MMP_MMPUBLISH_LOG l
where u.id = l.user_id AND
l.event_date between '2015-08-01' and '2015-08-08'
)
then 'Accessed' else 'Not Accessed'
end) as AccessType
from mmp_user u
) u
group by AccessType;
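The gap between counting accesses and counting users shows up on a handful of rows. This is a sketch with sqlite3 and invented data; the `users` and `log` tables are simplified stand-ins for mmp_user and MMP_MMPUBLISH_LOG, keeping only the columns the queries touch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE log (user_id INTEGER, event_date TEXT);
INSERT INTO users VALUES (1), (2), (3), (4);
INSERT INTO log VALUES
  (1, '2015-08-02'), (1, '2015-08-03'),  -- user 1: two hits in range
  (2, '2015-08-05'),                     -- user 2: one hit in range
  (3, '2015-09-01');                     -- user 3: only a hit outside the range
""")

# Join-based version: COUNT(*) counts log rows, COUNT(DISTINCT u.id) counts users.
joined = conn.execute("""
SELECT CASE WHEN l.user_id IS NULL THEN 'Not Accessed' ELSE 'Accessed' END AS access_type,
       COUNT(*) AS row_count,
       COUNT(DISTINCT u.id) AS user_count
FROM users u
LEFT JOIN log l
  ON l.user_id = u.id
 AND l.event_date BETWEEN '2015-08-01' AND '2015-08-08'
GROUP BY access_type
ORDER BY access_type
""").fetchall()

# EXISTS-based version: one row per user before grouping, so COUNT(*) is users.
per_user = conn.execute("""
SELECT access_type, COUNT(*)
FROM (SELECT u.id,
             CASE WHEN EXISTS (SELECT 1 FROM log l
                               WHERE l.user_id = u.id
                                 AND l.event_date BETWEEN '2015-08-01' AND '2015-08-08')
                  THEN 'Accessed' ELSE 'Not Accessed' END AS access_type
      FROM users u) t
GROUP BY access_type
ORDER BY access_type
""").fetchall()

print(joined)
print(per_user)
```

In the join-based result the 'Accessed' row count is 3 because user 1 has two log rows, while both the distinct count and the EXISTS version report 2 users.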

Join table on conditions, count on conditions

SELECT *, null AS score,
'0' AS SortOrder
FROM products
WHERE datelive = -1
AND hidden = 0
UNION
SELECT e.*, (SUM(r.a)/(COUNT(*)*1.0)+
SUM(r.b)/(COUNT(*)*1.0)+
SUM(r.c)/(COUNT(*)*1.0)+
SUM(r.d)/(COUNT(*)*1.0))/4 AS score,
'1' AS SortOrder
FROM products e
LEFT JOIN reviews r
ON r.productID = e.productID
WHERE e.hidden = 0
AND e.datelive != -1
GROUP BY e.productID
HAVING COUNT(*) >= 5
UNION
SELECT e.*, (SUM(r.a)/(COUNT(*)*1.0)+
SUM(r.b)/(COUNT(*)*1.0)+
SUM(r.c)/(COUNT(*)*1.0)+
SUM(r.d)/(COUNT(*)*1.0))/4 AS score,
'2' AS SortOrder
FROM products e
LEFT JOIN reviews r
ON r.productID = e.productID
WHERE e.hidden = 0
AND e.datelive != -1
GROUP BY e.productID
HAVING COUNT(*) < 5
ORDER BY SortOrder ASC, score DESC
This creates an SQL object for displaying products on a page. The first request grabs items of type datelive = -1, the second of type datelive != -1 but r.count(*) >= 5, and the third of type datelive != -1 and r.count(*) < 5. The reviews table is structured similar to the below:
reviewID | productID | a | b | c | d | approved
-------------------------------------------------
1 1 5 4 5 5 1
2 5 3 2 5 5 0
3 2 5 5 4 3 1
... ... ... ... ... ... ...
I'm trying to work it such that r.count(*) only cares for rows of type approved = 1, since tallying data based on unapproved reviews isn't ideal. How can I join these tables such that the summations of scores and the number of rows is dependent only on approved = 1?
I've tried adding in AND r.approved = 1 in the WHERE conditional for the joins and it doesn't do what I'd like. It does sort it properly, but then it no longer includes items with zero reviews.
You seem to be nearly there.
In your question you talked about adding the AND r.approved = 1 to the join criteria but by the sounds of it you are actually adding it to the WHERE clause.
If you instead properly add it to the join criteria like below then it should work fine:
SELECT *, null AS score,
'0' AS SortOrder
FROM products
WHERE datelive = -1
AND hidden = 0
UNION
SELECT e.*, (SUM(r.a)/(COUNT(*)*1.0)+
SUM(r.b)/(COUNT(*)*1.0)+
SUM(r.c)/(COUNT(*)*1.0)+
SUM(r.d)/(COUNT(*)*1.0))/4 AS score,
'1' AS SortOrder
FROM products e
LEFT JOIN reviews r ON r.productID = e.productID AND r.approved = 1
WHERE e.hidden = 0
AND e.datelive != -1
GROUP BY e.productID
HAVING COUNT(*) >= 5
UNION
SELECT e.*, (SUM(r.a)/(COUNT(*)*1.0)+
SUM(r.b)/(COUNT(*)*1.0)+
SUM(r.c)/(COUNT(*)*1.0)+
SUM(r.d)/(COUNT(*)*1.0))/4 AS score,
'2' AS SortOrder
FROM products e
LEFT JOIN reviews r ON r.productID = e.productID AND r.approved = 1
WHERE e.hidden = 0
AND e.datelive != -1
GROUP BY e.productID
HAVING COUNT(*) < 5
ORDER BY SortOrder ASC, score DESC
Notice again how I have simply put the AND r.approved = 1 directly after LEFT JOIN reviews r ON r.productID = e.productID, which adds an extra criterion to the join.
As I mentioned in my comment, the WHERE clause will filter rows out of the combined record set after the join has been made. In some cases the RDBMS may optimise it out and put it into the join criteria but only where that would make no difference to the result set.
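The effect of where the filter sits can be checked on a tiny data set. This is a sketch using sqlite3 with invented rows and snake_case names of my own; it also counts r.review_id rather than COUNT(*), an assumption on my part so that products without approved reviews show 0 instead of 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY);
CREATE TABLE reviews (review_id INTEGER PRIMARY KEY, product_id INTEGER, approved INTEGER);
INSERT INTO products VALUES (1), (2), (3);
INSERT INTO reviews VALUES
  (1, 1, 1),  -- product 1: one approved review
  (2, 1, 0),  -- product 1: one unapproved review
  (3, 2, 0);  -- product 2: only an unapproved review; product 3 has none
""")

# Filter in the ON clause: unapproved reviews are ignored, all products survive.
filter_in_on = conn.execute("""
SELECT p.product_id, COUNT(r.review_id)
FROM products p
LEFT JOIN reviews r ON r.product_id = p.product_id AND r.approved = 1
GROUP BY p.product_id
ORDER BY p.product_id
""").fetchall()

# Filter in WHERE: rows where r.approved is 0 or NULL are removed after the join.
filter_in_where = conn.execute("""
SELECT p.product_id, COUNT(r.review_id)
FROM products p
LEFT JOIN reviews r ON r.product_id = p.product_id
WHERE r.approved = 1
GROUP BY p.product_id
ORDER BY p.product_id
""").fetchall()

print(filter_in_on)
print(filter_in_where)
```

With the filter in WHERE, products 2 and 3 disappear entirely, which matches the symptom described in the question.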
Calculating the non-zero sums in a derived table and joining them to your result may solve it:
SELECT a.productID,
NULL AS score,
'0' AS SortOrder
FROM products a
WHERE datelive = -1
AND hidden = 0
UNION
SELECT e.productID,
(min(x.a)/(min(x.cnt)*1.0)+ min(x.b)/(min(x.cnt)*1.0)+ min(x.c)/(min(x.cnt)*1.0)+ min(x.d)/(min(x.cnt)*1.0))/4 AS score,
'1' AS SortOrder
FROM products e
JOIN reviews r ON r.productID = e.productID
LEFT JOIN
(SELECT ee.productID,
sum(rr.a) AS a,
sum(rr.b) AS b,
sum(rr.c) AS c,
sum(rr.d) AS d,
count(*) AS cnt
FROM products ee
LEFT JOIN reviews rr ON ee.productID = rr.productID
GROUP BY ee.productID) x ON e.productID = x.productID
WHERE e.hidden = 0
AND e.datelive != -1
GROUP BY e.productID HAVING COUNT(*) >= 5
UNION
SELECT e.productID,
(min(x.a)/(min(x.cnt)*1.0)+ min(x.b)/(min(x.cnt)*1.0)+ min(x.c)/(min(x.cnt)*1.0)+ min(x.d)/(min(x.cnt)*1.0))/4 AS score,
'2' AS SortOrder
FROM products e
LEFT JOIN reviews r ON r.productID = e.productID
LEFT JOIN
(SELECT ee.productID,
sum(rr.a) AS a,
sum(rr.b) AS b,
sum(rr.c) AS c,
sum(rr.d) AS d,
count(*) AS cnt
FROM products ee
LEFT JOIN reviews rr ON ee.productID = rr.productID
GROUP BY ee.productID) x ON e.productID = x.productID
WHERE e.hidden = 0
AND e.datelive != -1
GROUP BY e.productID HAVING COUNT(*) < 5
ORDER BY SortOrder ASC,
score DESC
You could create a temp table that only contains rows where approved = 1, and then join on the temp table instead of reviews.
create table tt_reviews like reviews;
insert into tt_reviews
select * from reviews
where approved = 1;
alter table tt_reviews add index(productID);
Then replace reviews with tt_reviews in your above query.

HAVING clause on SUM column

I want to put a condition on my score column, which I get from a SUM, but HAVING score =< 1 does not work when I put it after the GROUP BY. It should show me projects that have a good score. I am using hsqldb; what's going wrong? I get 'user lacks privilege or object not found: SCORE'
SELECT p.id, p.project_name, SUM(CASE r.type_code
WHEN 'GOOD' THEN 1
WHEN 'VERY_GOOD' THEN 1
WHEN 'BAD' THEN -1
WHEN 'VERY_BAD' THEN -1
ELSE 0 END) AS score
FROM record_project AS rp
JOIN project AS p ON p.id = rp.project_id
JOIN record AS r ON r.id = rp.record_id
GROUP BY p.id, p.project_name
HAVING score =< 1 <<<---- wrong?!
ORDER BY score DESC LIMIT 1
The column alias is not visible in the HAVING clause, so you need to repeat the whole calculated expression there:
SELECT p.id, p.project_name,
SUM(CASE WHEN r.type_code IN ('GOOD','VERY_GOOD') THEN 1
WHEN r.type_code IN ('BAD','VERY_BAD') THEN -1
ELSE 0 END) score
FROM record_project AS rp
JOIN project AS p ON p.id = rp.project_id
JOIN record AS r ON r.id = rp.record_id
GROUP BY p.id, p.project_name
HAVING SUM(CASE WHEN r.type_code IN ('GOOD','VERY_GOOD') THEN 1
WHEN r.type_code IN ('BAD','VERY_BAD') THEN -1
ELSE 0 END) <= 1
ORDER BY score DESC
-- LIMIT 1
You can incorporate the HAVING as a WHERE over a subquery:
SELECT * FROM (
SELECT p.id, p.project_name, SUM(CASE r.type_code
WHEN 'GOOD' THEN 1
WHEN 'VERY_GOOD' THEN 1
WHEN 'BAD' THEN -1
WHEN 'VERY_BAD' THEN -1
ELSE 0 END) AS score
FROM record_project AS rp
JOIN project AS p ON p.id = rp.project_id
JOIN record AS r ON r.id = rp.record_id
GROUP BY p.id, p.project_name) x
WHERE score <= 1
ORDER BY score DESC
LIMIT 1
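Both approaches can be compared side by side. This is a sketch run on sqlite3 rather than hsqldb, with a single invented `record` table that folds the three tables of the question into one. Note that SQLite itself happens to accept an alias in HAVING, so the sketch only shows that the two portable forms agree, not the hsqldb error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE record (project_id INTEGER, type_code TEXT);
INSERT INTO record VALUES
  (1, 'GOOD'), (1, 'VERY_GOOD'), (1, 'BAD'),   -- score  1
  (2, 'GOOD'), (2, 'GOOD'), (2, 'VERY_GOOD'),  -- score  3
  (3, 'BAD'), (3, 'VERY_BAD');                 -- score -2
""")

SCORE = """SUM(CASE WHEN type_code IN ('GOOD', 'VERY_GOOD') THEN 1
                WHEN type_code IN ('BAD', 'VERY_BAD') THEN -1
                ELSE 0 END)"""

# Form 1: repeat the aggregate expression in HAVING instead of using the alias.
with_having = conn.execute(f"""
SELECT project_id, {SCORE} AS score
FROM record
GROUP BY project_id
HAVING {SCORE} <= 1
ORDER BY score DESC
""").fetchall()

# Form 2: compute the score in a derived table and filter it with WHERE.
with_subquery = conn.execute(f"""
SELECT * FROM (
  SELECT project_id, {SCORE} AS score
  FROM record
  GROUP BY project_id) x
WHERE score <= 1
ORDER BY score DESC
""").fetchall()

print(with_having)
print(with_subquery)
```

Project 2 is filtered out in both forms because its score of 3 is above the threshold.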