SQL SUM and COUNT returning wrong values - sql

I found a bunch of similar questions but nothing worked for me, or I am too stupid to get how to do it right.
The visit count works fine if I use COUNT(DISTINCT visits.id) but then the vote count goes totally wrong - it displays a value 3 to 4 times larger than it should be.
So this is the query
SELECT SUM(votes.rating), COUNT(visits.id)
FROM topics
LEFT JOIN visits ON ( visits.content_id = topics.id )
LEFT JOIN votes ON ( votes.content_id = topics.id )
WHERE topics.id='1'
GROUP BY topics.id
The votes table looks like this
id int(11) | rating tinyint(4) | content_id int(11) | uid int(11)
visits table
id int(11) | content_id int(11) | uid int(11)
topics table
id int(11) | name varchar(128) | message varchar(512) | uid int(11)
help?

Basically, you're summing or counting the total number of rows potentially returned. So, if there are three visits and four votes for each id, then the visits will be multiplied by four and the votes by three.
I think what you want can most easily be accomplished by using subqueries:
SELECT (SELECT SUM(v.rating) FROM votes v WHERE v.content_id = t.id),
(SELECT COUNT(vi.id) FROM visits vi WHERE vi.content_id = t.id)
FROM topics t
WHERE t.id=1
GROUP BY t.id
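To see the multiplication for yourself, here is a small sketch using Python's built-in sqlite3 module with made-up data (three visits and four votes, each rated 1, for topic 1). The table contents are assumptions for illustration only; the queries are the ones from the question and from this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topics (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE visits (id INTEGER PRIMARY KEY, content_id INTEGER);
    CREATE TABLE votes  (id INTEGER PRIMARY KEY, rating INTEGER, content_id INTEGER);
    INSERT INTO topics VALUES (1, 'demo topic');
    -- hypothetical sample data: three visits and four votes (each rated 1)
    INSERT INTO visits (content_id) VALUES (1), (1), (1);
    INSERT INTO votes (rating, content_id) VALUES (1, 1), (1, 1), (1, 1), (1, 1);
""")

# Query from the question: both joins fan out, giving 3 x 4 = 12 rows to aggregate.
joined = conn.execute("""
    SELECT SUM(votes.rating), COUNT(visits.id), COUNT(DISTINCT visits.id)
    FROM topics
    LEFT JOIN visits ON visits.content_id = topics.id
    LEFT JOIN votes  ON votes.content_id = topics.id
    WHERE topics.id = 1
    GROUP BY topics.id
""").fetchone()

# Correlated subqueries: each aggregate only ever sees its own table.
subqueries = conn.execute("""
    SELECT (SELECT SUM(v.rating) FROM votes v WHERE v.content_id = t.id),
           (SELECT COUNT(vi.id) FROM visits vi WHERE vi.content_id = t.id)
    FROM topics t
    WHERE t.id = 1
""").fetchone()

print(joined)      # (12, 12, 3)
print(subqueries)  # (4, 3)
```

COUNT(DISTINCT visits.id) repairs the visit count, but there is no equivalent trick for SUM(votes.rating), which is why the vote total stays inflated in the joined query.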

I suspect the problem is in the join with the votes table.
If votes has more than one row for the topic, every visit row is repeated once per vote, so the count includes those duplicated rows.
If you use DISTINCT, you skip the duplicated ids (caused by the join with votes).
As a first trial, I would temporarily disable the join with votes and see what happens.
Hope it helps

Without seeing the data it is a bit tough to debug, but I would guess it is because there are more visits than votes. The following should work for you:
SELECT (SELECT SUM(rating) FROM votes WHERE votes.content_id = topics.id),
(SELECT COUNT(1) FROM visits WHERE visits.content_id = topics.id)
FROM topics
WHERE topics.id = 1

You need to do this as two separate subqueries:
SELECT sumrating, numvisits
FROM (select visits.content_id, count(*) as numvisits
from visits
group by visits.content_id
) tvisit left outer join
(select votes.content_id, SUM(votes.rating) as sumrating
from votes
group by votes.content_id
) v
ON ( v.content_id = tvisit.content_id )
WHERE tvisit.content_id='1'
As it turns out, you don't need to join in the topic table at all.
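As a rough check of the pre-aggregation idea, the sketch below runs the two grouped subqueries against invented data in SQLite (via Python's sqlite3). The rows are hypothetical; the point is only that each table is collapsed to one row per content_id before the join, so nothing can multiply.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visits (id INTEGER PRIMARY KEY, content_id INTEGER);
    CREATE TABLE votes  (id INTEGER PRIMARY KEY, rating INTEGER, content_id INTEGER);
    -- hypothetical data: topic 1 has 3 visits and 2 votes, topic 2 has 1 visit and no votes
    INSERT INTO visits (content_id) VALUES (1), (1), (1), (2);
    INSERT INTO votes (rating, content_id) VALUES (5, 1), (3, 1);
""")

rows = conn.execute("""
    SELECT tvisit.content_id, v.sumrating, tvisit.numvisits
    FROM (SELECT content_id, COUNT(*) AS numvisits
          FROM visits
          GROUP BY content_id) tvisit
    LEFT OUTER JOIN
         (SELECT content_id, SUM(rating) AS sumrating
          FROM votes
          GROUP BY content_id) v
      ON v.content_id = tvisit.content_id
    ORDER BY tvisit.content_id
""").fetchall()

print(rows)  # [(1, 8, 3), (2, None, 1)]
```

One thing to keep in mind with this shape: the visits subquery drives the join, so a topic that has votes but no visits would not show up at all.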

Related

Postgres many to one relationship join multiple tables and select all rows, provided that at least one row matches some criteria

Suppose I have a schema something like
create table if not exists user (
id serial primary key,
name text not null
);
create table if not exists post (
id serial primary key,
user_id integer not null references user (id),
score integer not null
);
I want to run a query that selects a row from the user table by ID, and all the rows that reference it from the post table, provided that at least one row in the post table has a score of greater than some number n (e.g. 50). I'm not exactly sure how to do this though.
You can use window functions. Let me assume that post has a user_id column so the tables can be tied together:
select u.*
from user u join
(select p.*, max(score) over (partition by user_id) as max_score
from post p
) p
on p.user_id = u.id
where p.max_score > 50;
If you just wanted all scores, then aggregation with filtering might be sufficient:
select u.*, array_agg(p.score order by p.score desc)
from user u join
post p
on p.user_id = u.id
group by u.id
having max(p.score) > 50;
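For a quick experiment, here is a sketch in SQLite (through Python's sqlite3) with two invented users. It shows the HAVING filter from above, plus an EXISTS variant, which is a different technique from the window function but returns the same kind of result: every post row for users who have at least one post above the threshold. The sample rows are assumptions, and the table name is quoted because user is a reserved word in Postgres.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "user" (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES "user" (id),
        score INTEGER NOT NULL
    );
    -- hypothetical data: only alice has a post scoring above 50
    INSERT INTO "user" VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO post (user_id, score) VALUES (1, 10), (1, 70), (2, 20), (2, 30);
""")

threshold = 50

# All post rows of users with at least one post above the threshold (EXISTS variant).
matching_rows = conn.execute("""
    SELECT u.id, u.name, p.score
    FROM "user" u
    JOIN post p ON p.user_id = u.id
    WHERE EXISTS (SELECT 1 FROM post q
                  WHERE q.user_id = u.id AND q.score > ?)
    ORDER BY u.id, p.score
""", (threshold,)).fetchall()

# One row per qualifying user, filtered with HAVING as in the answer.
aggregated = conn.execute("""
    SELECT u.id, u.name, COUNT(*) AS post_count, MAX(p.score) AS max_score
    FROM "user" u
    JOIN post p ON p.user_id = u.id
    GROUP BY u.id, u.name
    HAVING MAX(p.score) > ?
""", (threshold,)).fetchall()

print(matching_rows)  # [(1, 'alice', 10), (1, 'alice', 70)]
print(aggregated)     # [(1, 'alice', 2, 70)]
```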

INNER JOIN of pageviews, contacts and companies - duplicated entries

In short: 3 table inner join duplicates records
I have data in BigQuery in 3 tables:
Pageviews with columns:
timestamp
user_id
title
path
Contacts with columns:
website_user_id
email
company_id
Companies with columns:
id
name
I want to display all recorded pageviews and, if user and/or company is known, display this data next to pageview.
First, I join contact and pageviews data (SQL is generated by Metabase business intelligence tool):
SELECT
`analytics.pageviews`.`timestamp` AS `timestamp`,
`analytics.pageviews`.`title` AS `title`,
`analytics.pageviews`.`path` AS `path`,
`Contacts`.`email` AS `email`
FROM `analytics.pageviews`
INNER JOIN `analytics.contacts` `Contacts` ON `analytics.pageviews`.`user_id` = `Contacts`.`website_user_id`
ORDER BY `timestamp` DESC
It works as expected and I can see pageviews attributed to known contacts.
Next, I'd like to show pageviews of contacts with a known company, along with the company name:
SELECT
`analytics.pageviews`.`timestamp` AS `timestamp`,
`analytics.pageviews`.`title` AS `title`,
`analytics.pageviews`.`path` AS `path`,
`Contacts`.`email` AS `email`,
`Companies`.`name` AS `name`
FROM `analytics.pageviews`
INNER JOIN `analytics.contacts` `Contacts` ON `analytics.pageviews`.`user_id` = `Contacts`.`website_user_id`
INNER JOIN `analytics.companies` `Companies` ON `Contacts`.`company_id` = `Companies`.`id`
ORDER BY `timestamp` DESC
With this query I would expect to see only pageviews where associated contact AND company are known (just another column for company name). The problem is, I get duplicate rows for every pageview (sometimes 5, sometimes 20 identical rows).
I want to avoid selecting DISTINCT timestamps because it can lead to excluding valid pageviews from different users but with identical timestamp.
How to approach this?
Your description sounds like you have duplicates in companies. This is easy to test for:
select c.id, count(*)
from `analytics.companies` c
group by c.id
having count(*) >= 2;
You can get the details using window functions:
select c.*
from (select c.*, count(*) over (partition by c.id) as cnt
from `analytics.companies` c
) c
where cnt >= 2
order by cnt desc, id;
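Here is a small reproduction of that effect, sketched in SQLite via Python's sqlite3 rather than BigQuery. The rows are invented, and the column ts stands in for timestamp; the idea is only to show one pageview turning into two rows because its company id appears twice, and the GROUP BY / HAVING check finding the culprit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pageviews (ts TEXT, user_id INTEGER, path TEXT);
    CREATE TABLE contacts  (website_user_id INTEGER, email TEXT, company_id INTEGER);
    CREATE TABLE companies (id INTEGER, name TEXT);
    -- hypothetical data: a single pageview by a single known contact
    INSERT INTO pageviews VALUES ('2020-01-01 10:00:00', 7, '/pricing');
    INSERT INTO contacts  VALUES (7, 'a@example.com', 100);
    -- company 100 is stored twice, company 200 once
    INSERT INTO companies VALUES (100, 'Acme'), (100, 'Acme'), (200, 'Globex');
""")

joined = conn.execute("""
    SELECT p.ts, p.path, c.email, co.name
    FROM pageviews p
    INNER JOIN contacts c   ON p.user_id = c.website_user_id
    INNER JOIN companies co ON c.company_id = co.id
""").fetchall()

dupes = conn.execute("""
    SELECT id, COUNT(*)
    FROM companies
    GROUP BY id
    HAVING COUNT(*) >= 2
""").fetchall()

print(len(joined))  # 2 identical rows for one pageview
print(dupes)        # [(100, 2)]
```

If companies turns out to be clean, the same check can be pointed at contacts.website_user_id.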

SQL: Select count of a record in right table with joins

I have 2 tables, one for mobiles and the other for reviews. The reviews table stores the reviews of a specific mobile against its mobile id.
Structure of mobiles table.
mobile_id | mobile_name
Structure of reviews table.
review_id | mobile_id | review_body
So far I have written this query.
SELECT c.*, p.review_body
FROM ((select mobile_id, mobile_name from mobiles
WHERE brand_id=1 limit 0,5) c)
left JOIN
(
SELECT mobile_id,
MAX(review_id) MaxDate
FROM reviews
GROUP BY mobile_id
) MaxDates ON c.mobile_id = MaxDates.mobile_id left JOIN
reviews p ON MaxDates.mobile_id = p.mobile_id
AND MaxDates.MaxDate = p.review_id
This query returns the first 5 mobiles from mobile table and their latest (one) review from review table. This is the result it returns.
mobile_id | mobile_name | review_body
Question: I also want review_count with it. review_count should be equal to the total number of reviews a mobile has in the reviews table against its mobile_id.
So please tell me how it can be done with the single query that I already have. Any help would be appreciated, as I have been trying to do this for 24 hours.
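I think this would work. The subquery already groups reviews by mobile_id, so you can add count(review_id) to it and select that count in the outer query: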
SELECT c.*, p.review_body, MaxDates.review_count
FROM ((select mobile_id, mobile_name from mobiles
WHERE brand_id=1 limit 0,5) c)
left JOIN
(
SELECT mobile_id,count(review_id) review_count,
MAX(review_id) MaxDate
FROM reviews
GROUP BY mobile_id
) MaxDates ON c.mobile_id = MaxDates.mobile_id left JOIN
reviews p ON MaxDates.mobile_id = p.mobile_id
AND MaxDates.MaxDate = p.review_id
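A quick way to try this out is the sketch below, which runs essentially the same query in SQLite through Python's sqlite3. The sample mobiles and reviews are invented. Two small additions of mine that are not in the query above: an ORDER BY inside the limited subquery so the result is deterministic, and COALESCE so that a mobile without reviews reports 0 instead of NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mobiles (mobile_id INTEGER PRIMARY KEY, mobile_name TEXT, brand_id INTEGER);
    CREATE TABLE reviews (review_id INTEGER PRIMARY KEY, mobile_id INTEGER, review_body TEXT);
    -- hypothetical data: Alpha has three reviews, Beta has none
    INSERT INTO mobiles VALUES (1, 'Alpha', 1), (2, 'Beta', 1);
    INSERT INTO reviews VALUES (1, 1, 'ok'), (2, 1, 'great'), (3, 1, 'meh');
""")

rows = conn.execute("""
    SELECT c.mobile_id, c.mobile_name, p.review_body,
           COALESCE(MaxDates.review_count, 0) AS review_count
    FROM (SELECT mobile_id, mobile_name
          FROM mobiles
          WHERE brand_id = 1
          ORDER BY mobile_id
          LIMIT 5) c
    LEFT JOIN (SELECT mobile_id,
                      COUNT(review_id) AS review_count,
                      MAX(review_id)   AS MaxDate
               FROM reviews
               GROUP BY mobile_id) MaxDates
           ON c.mobile_id = MaxDates.mobile_id
    LEFT JOIN reviews p
           ON MaxDates.mobile_id = p.mobile_id
          AND MaxDates.MaxDate = p.review_id
    ORDER BY c.mobile_id
""").fetchall()

print(rows)  # [(1, 'Alpha', 'meh', 3), (2, 'Beta', None, 0)]
```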

selecting records from parent / child tables

I have parent / child tables that look like this:
CREATE TABLE users(
id integer primary key AUTOINCREMENT,
pnum varchar(10),
dloc varchar(100),
cc varchar(10),
name varchar(255),
group active bit(1)
);
CREATE TABLE group_members(
id integer primary key AUTOINCREMENT,
group_id integer,
member_id integer,
FOREIGN KEY (group_id) REFERENCES users(id),
FOREIGN KEY (member_id) REFERENCES users(id)
);
Users Data looks like:
ID PNUM DLOC CC NAME GRP
86|23101|dloc 89| | |0
87|23101|dloc 90| | |0
88|23102|dloc 91| | |0
590|12345|Group | |Test Group|1
591|90000|dloc 1 | | |0
group_members data looks like:
ID GROUP_ID MEMBER_ID
1 |590 | 87
2 |590 | 88
Based on the PNUM, I would like to be able to get the dloc values for all users, whether it's a group or not.
So for example, if someone requests pnum 23101, I would like to get back
"dloc 89" and
"dloc 90"
But if they request 12345, I would like to get back
"dloc 90", and
"dloc 91"
So far, I have come up with this query:
SELECT users.dloc
FROM users
WHERE users.id IN
(SELECT group_members.member_id
FROM group_members
INNER JOIN users on users.id = group_members.group_id
WHERE users.pnum='12345');
That works for groups, but it won't return any results if I run the same query with pnum 23101.
What I've tried so far
I tried to see if I could use OR like so:
SELECT users.dloc
FROM users
WHERE users.id in
(SELECT group_members.member_id
FROM group_members
INNER JOIN users on users.id = group_members.group_id
WHERE users.pnum='12345');
OR
(SELECT users.dloc
FROM users
WHERE pnum='12345')
Any suggestions would be appreciated.
You may use a UNION to execute both queries, one of which matches directly on PNUM for values where GROUP <> 1 and the other which matches on PNUM and joins through group_members for values where GROUP = 1.
In the second part, you need to join twice to users to get the members of the group matching the original PNUM.
Since the GROUP condition is opposite in each, only one part of the UNION will ever return results.
/* First part of UNION directly matches PNUM for GROUP = 0 */
SELECT dloc
FROM users
WHERE
PNUM = 23101
AND `group` <> 1
UNION
/**
Second part of UNION matches GROUP = 1
and joins through group_members back to users
to get member dloc (from the second users join)
*/
SELECT uu.dloc
FROM users u
INNER JOIN group_members m ON u.`GROUP` = 1 AND u.id = m.group_id
INNER JOIN users uu ON m.member_id = uu.id
WHERE
u.PNUM = 23101
This does unfortunately require placing the PNUM value twice in the query, once per UNION part, but that isn't so bad.
I tested this using MySQL rather than SQLite, but that doesn't really matter.
Using your original method with OR and an IN() subquery, it can also be done, but I've added WHERE conditions for GROUP = 0 and GROUP = 1.
SELECT users.dloc
FROM users
WHERE users.id in (
SELECT group_members.member_id
FROM group_members
INNER JOIN users on users.id = group_members.group_id
WHERE users.pnum='23101' AND `GROUP` = 1
)
OR users.id IN (
SELECT users.ID
FROM users
WHERE pnum='23101' AND `GROUP` = 0
);
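To check the UNION query against the sample data from the question, here is a sketch in SQLite driven from Python's sqlite3. The table layout is trimmed to the columns the query needs, the group column is written as "group" in double quotes, and the helper dlocs_for is a hypothetical name of mine. Using a named parameter also means the PNUM value only has to be supplied once, even though it appears in both halves of the UNION.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        pnum TEXT,
        dloc TEXT,
        "group" INTEGER
    );
    CREATE TABLE group_members (
        id INTEGER PRIMARY KEY,
        group_id INTEGER REFERENCES users(id),
        member_id INTEGER REFERENCES users(id)
    );
    -- sample rows taken from the question
    INSERT INTO users (id, pnum, dloc, "group") VALUES
        (86,  '23101', 'dloc 89', 0),
        (87,  '23101', 'dloc 90', 0),
        (88,  '23102', 'dloc 91', 0),
        (590, '12345', 'Group',   1),
        (591, '90000', 'dloc 1',  0);
    INSERT INTO group_members (id, group_id, member_id) VALUES
        (1, 590, 87),
        (2, 590, 88);
""")

QUERY = """
    SELECT dloc
    FROM users
    WHERE pnum = :pnum AND "group" <> 1
    UNION
    SELECT uu.dloc
    FROM users u
    INNER JOIN group_members m ON u."group" = 1 AND u.id = m.group_id
    INNER JOIN users uu ON m.member_id = uu.id
    WHERE u.pnum = :pnum
    ORDER BY dloc
"""

def dlocs_for(pnum):
    """Return the dloc values for a plain user pnum or for a group's members."""
    return [row[0] for row in conn.execute(QUERY, {"pnum": pnum})]

print(dlocs_for("23101"))  # ['dloc 89', 'dloc 90']
print(dlocs_for("12345"))  # ['dloc 90', 'dloc 91']
```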

Can I SQL join a table twice?

I have two entities: Proposal and Vote.
Proposal: A user can make a proposition.
Vote: A user can vote for a proposition.
CREATE TABLE `proposal` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
);
CREATE TABLE `vote` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`p_id` int(11) NOT NULL,
`updated` datetime NOT NULL,
PRIMARY KEY (`id`)
);
Now I want to fetch rising Proposals, which means:
Proposal title
Total number of all time votes
has received votes within the last 3 days
I am trying to fetch without a subSELECT because I am using doctrine which doesn't allow subSELECTs. So my approach is to fetch by joining the votes table twice (first for fetching the total amount of votes, second to be able to create a WHERE clause to filter last 3 days) and do an INNER JOIN:
SELECT
p.title,
COUNT(v.p_id) AS votes,
DATEDIFF(NOW(), DATE(x.updated))
FROM proposal p
JOIN vote v ON p.id = v.p_id
INNER JOIN vote x ON p.id = x.p_id
WHERE DATEDIFF(NOW(), DATE(x.updated)) < 3
GROUP BY p.id
ORDER BY votes DESC;
It's clear that this will return a wrong vote count, as it triples the votes' COUNT(). The count is actually multiplied by the number of matching rows in x, because joining vote twice creates a cartesian product per proposal, just as a CROSS JOIN does.
Is there any way I can get the proper amount without using a subSELECT?
Instead, you can create a kind of COUNTIF function using this pattern:
- COUNT(CASE WHEN <condition> THEN <field> ELSE NULL END)
For example...
SELECT
p.title,
COUNT(v.p_id) AS votes,
COUNT(CASE WHEN v.updated >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY) THEN v.p_id ELSE NULL END) AS new_votes
FROM
proposal p
JOIN
vote v
ON p.id = v.p_id
GROUP BY
p.title
ORDER BY
COUNT(v.p_id) DESC
;
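Here is the conditional-count pattern as a runnable sketch in SQLite (through Python's sqlite3) with invented votes. To keep the result reproducible, the cutoff is passed in as a fixed parameter instead of being derived from the current date, and I have added a HAVING clause (my assumption about what "rising" should mean) so that only proposals with at least one recent vote are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE proposal (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE vote (id INTEGER PRIMARY KEY, p_id INTEGER NOT NULL, updated TEXT NOT NULL);
    -- hypothetical data: 'Rising' has two recent votes, 'Stale' has none
    INSERT INTO proposal VALUES (1, 'Rising'), (2, 'Stale');
    INSERT INTO vote (p_id, updated) VALUES
        (1, '2024-01-01 09:00:00'),
        (1, '2024-01-09 09:00:00'),
        (1, '2024-01-10 09:00:00'),
        (2, '2024-01-01 09:00:00'),
        (2, '2024-01-02 09:00:00');
""")

# Pretend "today" is 2024-01-11, so the three-day cutoff is 2024-01-08.
cutoff = "2024-01-08 00:00:00"

rows = conn.execute("""
    SELECT p.title,
           COUNT(v.p_id) AS votes,
           COUNT(CASE WHEN v.updated >= :cutoff THEN v.p_id END) AS new_votes
    FROM proposal p
    JOIN vote v ON p.id = v.p_id
    GROUP BY p.id, p.title
    HAVING COUNT(CASE WHEN v.updated >= :cutoff THEN v.p_id END) > 0
    ORDER BY votes DESC
""", {"cutoff": cutoff}).fetchall()

print(rows)  # [('Rising', 3, 2)]
```

Because vote is joined only once, the total stays at 3 for 'Rising' instead of being multiplied by the number of recent votes.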