Can I SQL join a table twice? - sql

I have two entities: Proposal and Vote.
Proposal: A user can make a proposition.
Vote: A user can vote for a proposition.
CREATE TABLE `proposal` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
);
CREATE TABLE `vote` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`p_id` int(11) NOT NULL,
`updated` datetime NOT NULL,
PRIMARY KEY (`id`)
);
Now I want to fetch rising proposals, which means:
Proposal title
Total number of all-time votes
Has received votes within the last 3 days
I am trying to fetch this without a subSELECT because I am using Doctrine, which doesn't allow subSELECTs. So my approach is to join the vote table twice (once to get the total number of votes, and a second time so I can add a WHERE clause that filters on the last 3 days) using an INNER JOIN:
SELECT
p.title,
COUNT(v.p_id) AS votes,
DATEDIFF(NOW(), DATE(x.updated))
FROM proposal p
JOIN vote v ON p.id = v.p_id
INNER JOIN vote x ON p.id = x.p_id
WHERE DATEDIFF(NOW(), DATE(x.updated)) < 3
GROUP BY p.id
ORDER BY votes DESC;
It's clear that this returns the wrong vote count, as it triples the votes' COUNT(). That happens because the two joins create a Cartesian product of the vote rows for each proposal, just as a CROSS JOIN does.
Is there any way I can get the proper amount without using a subSELECT?

Instead of joining the table twice, you can create a kind of COUNTIF function using this pattern:
- COUNT(CASE WHEN <condition> THEN <field> ELSE NULL END)
For example...
SELECT
p.title,
COUNT(v.p_id) AS votes,
COUNT(CASE WHEN v.updated >= CURRENT_DATE() - INTERVAL 3 DAY THEN v.p_id ELSE NULL END) AS new_votes
FROM
proposal p
JOIN
vote v
ON p.id = v.p_id
GROUP BY
p.title
ORDER BY
COUNT(v.p_id) DESC
;
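As a rough illustration of the conditional-count pattern, here is a minimal sketch that uses Python's built-in sqlite3 module instead of MySQL. The sample rows, the fixed cutoff date passed as a parameter, and the HAVING clause (which keeps only proposals with at least one recent vote) are assumptions added for the demo, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE proposal (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE vote (
    id INTEGER PRIMARY KEY,
    p_id INTEGER NOT NULL,
    updated TEXT NOT NULL
);
-- Hypothetical sample data: proposal A has one old and two recent votes,
-- proposal B has a single old vote.
INSERT INTO proposal (id, title) VALUES (1, 'A'), (2, 'B');
INSERT INTO vote (p_id, updated) VALUES
    (1, '2024-01-01 12:00:00'),
    (1, '2024-01-09 12:00:00'),
    (1, '2024-01-10 12:00:00'),
    (2, '2023-12-01 12:00:00');
""")

# Assumed "three days ago" cutoff, fixed so the demo is deterministic.
cutoff = "2024-01-07 12:00:00"

rows = conn.execute(
    """
    SELECT p.title,
           COUNT(v.p_id) AS votes,
           COUNT(CASE WHEN v.updated >= :cutoff THEN v.p_id END) AS new_votes
    FROM proposal p
    JOIN vote v ON p.id = v.p_id
    GROUP BY p.id
    HAVING COUNT(CASE WHEN v.updated >= :cutoff THEN v.p_id END) > 0
    ORDER BY votes DESC
    """,
    {"cutoff": cutoff},
).fetchall()

print(rows)  # [('A', 3, 2)]
```

A single join is enough here: COUNT ignores NULL, so the CASE expression counts only the recent rows while the plain COUNT still sees every vote.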

Related

Multiple selects on joined tables with group by?

I have three tables with the structures outlined below:
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
username VARCHAR(255) UNIQUE
);
CREATE TABLE posts (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT REFERENCES users(id) NOT NULL,
category BIGINT REFERENCES categories(id) NOT NULL,
text TEXT NOT NULL
);
CREATE TABLE posts_votes (
user_id BIGINT REFERENCES users(id) NOT NULL,
post_id BIGINT REFERENCES posts(id) NOT NULL,
value SMALLINT NOT NULL,
PRIMARY KEY(user_id, post_id)
);
I was able to compose a query that gets each post with its user and its total value using the below query:
SELECT p.id, p.text, u.username, COALESCE(SUM(v.value), 0) AS vote_value
FROM posts p
LEFT JOIN posts_votes v ON p.id=v.post_id
JOIN users u ON p.user_id=u.id
WHERE p.category=1337
GROUP BY p.id, p.text, u.username
But now I want to also return a column that returns the result of SELECT COALESCE((SELECT value FROM posts_votes WHERE user_id=1234 AND post_id=n), 0) for each post_id n in the above query. What would be the best way to do this?
I think an additional LEFT JOIN is a reasonable approach:
SELECT p.id, p.text, u.username, COALESCE(SUM(v.value), 0) AS vote_value,
COALESCE(pv.value, 0) AS user_vote
FROM posts p JOIN
users u
ON p.user_id = u.id LEFT JOIN
posts_votes v
ON p.id = v.post_id LEFT JOIN
posts_votes pv
ON pv.user_id = 1234 AND pv.post_id = p.id
WHERE p.category = 1337
GROUP BY p.id, p.text, u.username, pv.value;
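To see why the extra LEFT JOIN does not inflate the sum, here is a small sketch run through Python's sqlite3 module (an assumption; the question targets PostgreSQL). The table layout is simplified, the sample rows are made up, and the column alias user_vote is only a name chosen for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT UNIQUE);
CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    category INTEGER NOT NULL,
    text TEXT NOT NULL
);
CREATE TABLE posts_votes (
    user_id INTEGER NOT NULL,
    post_id INTEGER NOT NULL,
    value INTEGER NOT NULL,
    PRIMARY KEY (user_id, post_id)
);
-- Hypothetical sample data.
INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (1234, 'carol');
INSERT INTO posts VALUES
    (10, 1, 1337, 'first'),
    (11, 2, 1337, 'second'),
    (12, 2, 1337, 'third');
INSERT INTO posts_votes VALUES
    (1, 10, 1), (2, 10, 1), (1234, 10, -1), (1, 11, 1);
""")

rows = conn.execute(
    """
    SELECT p.id, u.username,
           COALESCE(SUM(v.value), 0) AS vote_value,
           COALESCE(pv.value, 0)     AS user_vote
    FROM posts p
    JOIN users u ON p.user_id = u.id
    LEFT JOIN posts_votes v ON v.post_id = p.id
    LEFT JOIN posts_votes pv ON pv.user_id = ? AND pv.post_id = p.id
    WHERE p.category = ?
    GROUP BY p.id, p.text, u.username, pv.value
    ORDER BY p.id
    """,
    (1234, 1337),
).fetchall()

print(rows)  # [(10, 'alice', 1, -1), (11, 'bob', 1, 0), (12, 'bob', 0, 0)]
```

Because (user_id, post_id) is the primary key of posts_votes, the pv join matches at most one row per post, so it cannot multiply the rows that feed SUM.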

Writing a query to combine results from multiple tables with all possible combinations

I have this database schema:
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name char(50) NOT NULL UNIQUE
);
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name char(50) NOT NULL
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
uid INTEGER REFERENCES users (id) NOT NULL,
pid INTEGER REFERENCES products (id) NOT NULL,
quantity INTEGER NOT NULL,
price FLOAT NOT NULL CHECK (price >= 0)
);
I am trying to write a query that will give me all combinations of users and products, as well as the total amount spent by the user on that product. Specifically, if I have 5 products and 5 users, there should be 25 rows in the table. Right now I have a query that almost gets the job done, however, if the user has never purchased that product then there is no row printed at all.
Here's what I've written so far:
SELECT u.name as username, p.name as productname, SUM(o.quantity * o.price) as totalPrice
FROM users u, orders o, products p
WHERE u.id = o.uid
AND p.id = o.pid
GROUP BY u.name, p.name
ORDER BY u.name, p.name
I figure that this requires some sort of join, but my SQL knowledge is limited and I am not sure what would be the best way to go about doing this. I think if somebody can help me figure this out then I will have a much better understanding.
You can do this using cross join and left join:
select u.name as username, p.name as productname,
sum(o.quantity * o.price) as totalPrice
from users u cross join
products p left join
orders o
on o.uid = u.id and o.pid = p.id
group by u.name, p.name;
The cross join generates all the rows. The left join brings in the matching rows. A simple rule when using SQL is: Never use commas in the FROM clause. Always use explicit JOIN syntax.
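The following sketch runs the same cross join plus left join shape through Python's sqlite3 module (an assumption; the schema above looks like PostgreSQL). The two users, two products, and three orders are invented sample data. Note that pairs with no orders come back with a NULL total, which Python shows as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    uid INTEGER NOT NULL REFERENCES users (id),
    pid INTEGER NOT NULL REFERENCES products (id),
    quantity INTEGER NOT NULL,
    price REAL NOT NULL CHECK (price >= 0)
);
-- Hypothetical sample data.
INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
INSERT INTO products VALUES (1, 'pen'), (2, 'cup');
INSERT INTO orders (uid, pid, quantity, price) VALUES
    (1, 1, 2, 1.5),
    (1, 1, 1, 1.5),
    (2, 2, 1, 4.0);
""")

rows = conn.execute(
    """
    SELECT u.name AS username, p.name AS productname,
           SUM(o.quantity * o.price) AS totalPrice
    FROM users u
    CROSS JOIN products p
    LEFT JOIN orders o ON o.uid = u.id AND o.pid = p.id
    GROUP BY u.name, p.name
    ORDER BY u.name, p.name
    """
).fetchall()

for row in rows:
    print(row)
# ('ann', 'cup', None)
# ('ann', 'pen', 4.5)
# ('bob', 'cup', 4.0)
# ('bob', 'pen', None)
```

If a zero is preferred over NULL for pairs without orders, wrapping the sum in COALESCE(..., 0) is one option.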

SQL SUM and COUNT returning wrong values

I found a bunch of similar questions but nothing worked for me, or I am too stupid to get how to do it right.
The visit count works fine if I use COUNT(DISTINCT visits.id) but then the vote count goes totally wrong - it displays a value 3 to 4 times larger than it should be.
So this is the query
SELECT SUM(votes.rating), COUNT(visits.id)
FROM topics
LEFT JOIN visits ON ( visits.content_id = topics.id )
LEFT JOIN votes ON ( votes.content_id = topics.id )
WHERE topics.id='1'
GROUP BY topics.id
The votes table looks like this
id int(11) | rating tinyint(4) | content_id int(11) | uid int(11)
visits table
id int(11) | content_id int(11) | uid int(11)
topics table
id int(11) | name varchar(128) | message varchar(512) | uid int(11)
help?
Basically, you're summing or counting the total number of rows potentially returned. So, if there are three visits and four votes for each id, then the visits will be multiplied by four and the votes by three.
I think what you want can most easily be accomplished by using subqueries:
SELECT (SELECT SUM(v.rating) FROM votes v WHERE v.content_id = t.id),
(SELECT COUNT(vi.id) FROM visits vi WHERE vi.content_id = t.id)
FROM topics t
WHERE t.id=1
GROUP BY t.id
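The multiplication described above is easy to reproduce. This sketch uses Python's sqlite3 module (an assumption; the question is about MySQL) with made-up data: one topic, three visits, and four votes rated 5 each. It runs the original double join first and the correlated subqueries second.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE topics (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE visits (id INTEGER PRIMARY KEY, content_id INTEGER, uid INTEGER);
CREATE TABLE votes (
    id INTEGER PRIMARY KEY, rating INTEGER, content_id INTEGER, uid INTEGER
);
-- Hypothetical sample data: 3 visits and 4 votes for topic 1.
INSERT INTO topics VALUES (1, 'demo');
INSERT INTO visits (content_id, uid) VALUES (1, 1), (1, 2), (1, 3);
INSERT INTO votes (rating, content_id, uid) VALUES
    (5, 1, 1), (5, 1, 2), (5, 1, 3), (5, 1, 4);
""")

# Joining both child tables yields 3 * 4 = 12 rows for the topic.
inflated = conn.execute(
    """
    SELECT SUM(votes.rating), COUNT(visits.id), COUNT(DISTINCT visits.id)
    FROM topics
    LEFT JOIN visits ON visits.content_id = topics.id
    LEFT JOIN votes ON votes.content_id = topics.id
    WHERE topics.id = 1
    GROUP BY topics.id
    """
).fetchone()

# Each correlated subquery aggregates its own table independently.
correct = conn.execute(
    """
    SELECT (SELECT SUM(v.rating) FROM votes v WHERE v.content_id = t.id),
           (SELECT COUNT(vi.id) FROM visits vi WHERE vi.content_id = t.id)
    FROM topics t
    WHERE t.id = 1
    """
).fetchone()

print(inflated)  # (60, 12, 3)
print(correct)   # (20, 3)
```

COUNT(DISTINCT visits.id) repairs the visit count, but the rating sum is still three times too large, which matches the symptom in the question.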
I suspect the problem is in the join with the votes table.
If votes has more than one row per topic, the count will include those duplicated rows as well.
If you use DISTINCT, you skip the duplicated ids (caused by the join with votes).
As a first trial, I would temporarily disable the join with votes and see what happens.
Hope it helps
Without seeing the data it is a bit tough to debug, but I would guess it is because there are more visits than votes. The following should work for you:
SELECT (SELECT SUM(rating) FROM votes WHERE votes.content_id = topics.id),
(SELECT COUNT(1) FROM visits WHERE visits.content_id = topics.id)
FROM topics
WHERE topics.id = 1
You need to do this as two separate subqueries:
SELECT sumrating, numvisits
FROM (select visits.content_id, count(*) as numvisits
from visits
group by visits.content_id
) tvisit left outer join
(select votes.content_id, SUM(votes.rating) as sumrating
from votes
group by votes.content_id
) v
ON ( v.content_id = tvisit.content_id )
WHERE tvisit.content_id='1'
As it turns out, you don't need to join in the topic table at all.
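A sketch of the pre-aggregate-then-join idea, run with Python's sqlite3 module (an assumption, since the question targets MySQL). The sample data is invented: three visits and four votes of rating 5 for content id 1, plus a second content id to show that the grouping keeps them apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visits (id INTEGER PRIMARY KEY, content_id INTEGER, uid INTEGER);
CREATE TABLE votes (
    id INTEGER PRIMARY KEY, rating INTEGER, content_id INTEGER, uid INTEGER
);
-- Hypothetical sample data.
INSERT INTO visits (content_id, uid) VALUES (1, 1), (1, 2), (1, 3), (2, 1);
INSERT INTO votes (rating, content_id, uid) VALUES
    (5, 1, 1), (5, 1, 2), (5, 1, 3), (5, 1, 4), (1, 2, 1);
""")

# Each derived table collapses to one row per content_id before the join,
# so the join can no longer multiply rows.
row = conn.execute(
    """
    SELECT v.sumrating, tvisit.numvisits
    FROM (SELECT content_id, COUNT(*) AS numvisits
          FROM visits
          GROUP BY content_id) tvisit
    LEFT OUTER JOIN
         (SELECT content_id, SUM(rating) AS sumrating
          FROM votes
          GROUP BY content_id) v
      ON v.content_id = tvisit.content_id
    WHERE tvisit.content_id = 1
    """
).fetchone()

print(row)  # (20, 3)
```

One caveat of this shape: because visits is on the left side of the join, a content id with votes but no visits would not appear at all.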

Getting sum() on a different distinct row MySQL

I was looking on different questions on this issue, but couldn't find an answer for my problem.
This is my query:
SELECT SUM( lead_value ) AS lead_value_sum, count( DISTINCT phone ) AS SUM, referer
FROM leads t1
INNER JOIN leads_people_details t2 ON t1.lead_id = t2.lead_id
INNER JOIN user_to_leads t3 ON t1.lead_id = t3.lead_id
WHERE lead_date
BETWEEN 20100716000000
AND 20100716235959
AND t1.site_id =8
GROUP BY t1.referer
I am trying to sum the lead_value only for unique phone numbers. The COUNT(DISTINCT phone) works and gives me the number of unique phones for each referer, but I can't figure out how I should SUM the lead_value for the unique phone numbers of each referer.
Would appreciate any help you can give me,
Eden
Edit: Table Structures
CREATE TABLE user_to_leads
(
user_id INT(10) NOT NULL,
lead_id INT(10) NOT NULL,
site_id INT(10) NOT NULL,
lead_value INT(10) NOT NULL
)
CREATE TABLE leads
(
lead_id INT(100) NOT NULL auto_increment ,
site_id INT(10) NOT NULL ,
lead_date TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ,
vaild_date TIMESTAMP NOT NULL DEFAULT '0000-00-00 00:00:00',
referer VARCHAR(255) NOT NULL,
KEYWORD VARCHAR(255) NOT NULL,
upsale INT(11) NOT NULL DEFAULT '0' ,
vaild INT(2) NOT NULL,
PRIMARY KEY (lead_id),
KEY lead_date (lead_date)
)
CREATE TABLE leads_people_details
(
lead_id INT(100) NOT NULL auto_increment ,
fullname VARCHAR(255) NOT NULL,
phone VARCHAR(12) NOT NULL ,
email VARCHAR(255) NOT NULL,
home VARCHAR(255) NOT NULL,
browser VARCHAR(255) NOT NULL,
browser_version VARCHAR(100) NOT NULL,
resolution VARCHAR(255) NOT NULL,
IP VARCHAR(255) NOT NULL,
status VARCHAR(255) NOT NULL DEFAULT '0',
COMMENT text NOT NULL,
PRIMARY KEY (lead_id)
)
You say: "For a particular referer, phone, the lead_value will always be the same."
Based on the limited information you have given I think this should return the right answer. If you update your question with the requested information it will probably be possible to improve upon it though.
SELECT SUM(lead_value ) AS lead_value_sum, count(phone ) AS phone_count, referer
FROM
(
SELECT DISTINCT lead_value, phone, referer
FROM leads t1
INNER JOIN leads_people_details t2 ON t1.lead_id = t2.lead_id
INNER JOIN user_to_leads t3 ON t1.lead_id = t3.lead_id
WHERE lead_date
BETWEEN 20100716000000
AND 20100716235959
AND t1.site_id =8
) derived
GROUP BY referer
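To illustrate what the DISTINCT derived table buys, here is a sketch using Python's sqlite3 module (an assumption; the question is about MySQL). The tables are cut down to the columns the query touches and the rows are invented: two leads have two user_to_leads rows each, which is what duplicates the lead_value in a plain join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE leads (lead_id INTEGER PRIMARY KEY, referer TEXT NOT NULL);
CREATE TABLE leads_people_details (
    lead_id INTEGER PRIMARY KEY, phone TEXT NOT NULL
);
CREATE TABLE user_to_leads (
    user_id INTEGER NOT NULL, lead_id INTEGER NOT NULL,
    lead_value INTEGER NOT NULL
);
-- Hypothetical sample data.
INSERT INTO leads VALUES (1, 'google'), (2, 'google'), (3, 'bing');
INSERT INTO leads_people_details VALUES (1, '111'), (2, '222'), (3, '333');
INSERT INTO user_to_leads VALUES
    (1, 1, 10), (2, 1, 10),   -- lead 1 appears twice
    (1, 2, 20),
    (1, 3, 5), (2, 3, 5);     -- lead 3 appears twice
""")

join_sql = """
    FROM leads t1
    INNER JOIN leads_people_details t2 ON t1.lead_id = t2.lead_id
    INNER JOIN user_to_leads t3 ON t1.lead_id = t3.lead_id
"""

# Plain join: duplicated user_to_leads rows are summed more than once.
naive = conn.execute(
    "SELECT SUM(t3.lead_value), COUNT(DISTINCT t2.phone), t1.referer"
    + join_sql + "GROUP BY t1.referer ORDER BY t1.referer"
).fetchall()

# DISTINCT first, then aggregate: one row per (lead_value, phone, referer).
deduped = conn.execute(
    "SELECT SUM(lead_value), COUNT(phone), referer FROM ("
    "SELECT DISTINCT t3.lead_value, t2.phone, t1.referer" + join_sql +
    ") derived GROUP BY referer ORDER BY referer"
).fetchall()

print(naive)    # [(10, 1, 'bing'), (40, 2, 'google')]
print(deduped)  # [(5, 1, 'bing'), (30, 2, 'google')]
```

This only works under the assumption quoted above, that lead_value is always the same for a given referer and phone; if it can vary, DISTINCT would keep both values and the sum would again be too high.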
Updated after table structure posted
I don't really understand why both leads_people_details and leads have a primary key and auto_increment column, lead_id, that you are joining on. That would imply a 1-1 relationship between leads and leads_people_details. If so, one of them probably shouldn't be an auto_increment, to avoid the possibility of the ids getting out of sync without you realising.
Also, there is no primary key on the user_to_leads table. Should there be one on user_id, lead_id, site_id? Additionally, you are not currently filtering by site_id on that table. Is that intentional? If not, does adding that filter stop the duplicate records from coming back? If it doesn't, can you describe the significance of user_id in that table? You said earlier that for a particular referer, phone, the lead_value will always be the same. Can it differ by user_id? If so, which should be used? If not, why is user_id in that table?
A provisional query that might be closer is below, but the questions above are still unresolved.
SELECT SUM(lead_value ) AS lead_value_sum, count(phone ) AS phone_count, referer
FROM leads t1
INNER JOIN leads_people_details t2 ON t1.lead_id = t2.lead_id
INNER JOIN user_to_leads t3 ON t1.lead_id = t3.lead_id
and t1.site_id = t3.site_id
WHERE lead_date
BETWEEN 20100716000000
AND 20100716235959
AND t1.site_id =8
GROUP BY referer

Select newest entry from a joined MySQL table

I have stock quantity information in my database.
One table, "stock", holds the product id (sku) along with the quantity and the id of the file it came from.
The other table, "stockfile", contains all the processed filenames along with dates.
Now I need to get all the products with their latest stock quantity values.
This gives me ALL the products multiple times with all their stock quantity (resulting in 300.000 records)
SELECT stock.stockid, stock.sku, stock.quantity, stockfile.filename, stockfile.date
FROM stock
INNER JOIN stockfile ON stock.stockfileid = stockfile.stockfileid
ORDER BY stock.sku ASC
I already tried this:
SELECT * FROM stock
INNER JOIN stockfile ON stock.stockfileid = stockfile.stockfileid
GROUP BY sku
HAVING stockfile.date = MAX( stockfile.date )
ORDER BY stock.sku ASC
But it did not work
SHOW CREATE TABLE stock:
CREATE TABLE stock (
stockid bigint(20) NOT NULL AUTO_INCREMENT,
sku char(25) NOT NULL,
quantity int(5) NOT NULL,
creationdate datetime NOT NULL,
stockfileid smallint(5) unsigned NOT NULL,
touchdate datetime NOT NULL,
PRIMARY KEY (stockid)
) ENGINE=MyISAM AUTO_INCREMENT=315169 DEFAULT CHARSET=latin1
SHOW CREATE TABLE stockfile:
CREATE TABLE stockfile (
stockfileid smallint(5) unsigned NOT NULL AUTO_INCREMENT,
filename varchar(25) NOT NULL,
creationdate datetime DEFAULT NULL,
touchdate datetime DEFAULT NULL,
date datetime DEFAULT NULL,
begindate datetime DEFAULT NULL,
enddate datetime DEFAULT NULL,
PRIMARY KEY (stockfileid)
) ENGINE=MyISAM AUTO_INCREMENT=265 DEFAULT CHARSET=latin1
This is an example of the frequently asked "greatest-n-per-group" question that we see every week on Stack Overflow. Follow that tag to see other similar solutions.
SELECT s.*, f1.*
FROM stock s
INNER JOIN stockfile f1
ON (s.stockfileid = f1.stockfileid)
LEFT OUTER JOIN stockfile f2
ON (s.stockfileid = f2.stockfileid AND f1.date < f2.date)
WHERE f2.stockfileid IS NULL;
If there are multiple rows in stockfile that have the max date, you'll get them both in the result set. To resolve this, you'd have to add some tie-breaker conditions into the join on f2.
Thanks for adding the CREATE TABLE info. That's very helpful when you're asking SQL questions.
I see from the AUTO_INCREMENT table options that you have 315k rows in stock and only 265 rows in stockfile. Your stockfile table is the parent in the relationship, and the stock table is the child, with a column stockfileid that references the primary key of stockfile.
So your original question was misleading. You want the latest row from stock, not the latest row from stockfile.
SELECT f.*, s1.*
FROM stockfile f
INNER JOIN stock s1
ON (f.stockfileid = s1.stockfileid)
LEFT OUTER JOIN stock s2
ON (f.stockfileid = s2.stockfileid AND (s1.touchdate < s2.touchdate
OR s1.touchdate = s2.touchdate AND s1.stockid < s2.stockid))
WHERE s2.stockid IS NULL;
I'm assuming you want "latest" to be relative to touchdate, so if you want to use creationdate instead, you can do the edit.
I've added a term to the join so that it resolves ties. I know you said the dates are "practically unique" but as the saying goes, "one in a million is next Tuesday."
Okay, I think I understand what you're trying to do now. You want the most recent row per sku, but the date by which to compare them is in the referenced table stockfile.
SELECT s1.*, f1.*
FROM stock s1
JOIN stockfile f1 ON (s1.stockfileid = f1.stockfileid)
LEFT OUTER JOIN (stock s2 JOIN stockfile f2 ON (s2.stockfileid = f2.stockfileid))
ON (s1.sku = s2.sku AND (f1.date < f2.date OR f1.date = f2.date AND f1.stockfileid < f2.stockfileid))
WHERE s2.sku IS NULL;
This does a self-join of stock to itself, looking for a row with the same sku and a more recent date. When none is found, then s1 contains the most recent row for its sku. And each instance of stock has to join to its stockfile to get the date.
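The anti-join idea can be tried out on a toy data set. This sketch uses Python's sqlite3 module (an assumption; the question is about MySQL), trimmed-down tables, and invented rows. As a further assumption made for the demo, the parenthesized join is written as a derived table aliased newer, which expresses the same "is there a more recent row for this sku?" test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stockfile (
    stockfileid INTEGER PRIMARY KEY, filename TEXT NOT NULL, date TEXT
);
CREATE TABLE stock (
    stockid INTEGER PRIMARY KEY, sku TEXT NOT NULL,
    quantity INTEGER NOT NULL, stockfileid INTEGER NOT NULL
);
-- Hypothetical sample data: three files, two skus.
INSERT INTO stockfile VALUES
    (1, 'f1.csv', '2024-01-01'),
    (2, 'f2.csv', '2024-01-02'),
    (3, 'f3.csv', '2024-01-03');
INSERT INTO stock VALUES
    (1, 'A', 5, 1), (2, 'A', 7, 3), (3, 'B', 2, 2),
    (4, 'B', 9, 1), (5, 'A', 6, 2);
""")

rows = conn.execute(
    """
    SELECT s1.sku, s1.quantity, f1.filename
    FROM stock s1
    JOIN stockfile f1 ON s1.stockfileid = f1.stockfileid
    LEFT OUTER JOIN (
        SELECT s.sku, f.date, f.stockfileid
        FROM stock s
        JOIN stockfile f ON s.stockfileid = f.stockfileid
    ) newer
      ON s1.sku = newer.sku
     AND (f1.date < newer.date
          OR (f1.date = newer.date AND f1.stockfileid < newer.stockfileid))
    -- Keep only rows for which no more recent row exists.
    WHERE newer.sku IS NULL
    ORDER BY s1.sku
    """
).fetchall()

print(rows)  # [('A', 7, 'f3.csv'), ('B', 2, 'f2.csv')]
```

Each sku keeps exactly the row whose file has the latest date; the stockfileid comparison only matters when two files share a date.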
Re comment about optimization: It's hard for me to test because I don't have tables populated with data matching yours, but I'd guess you should have the following indexes:
CREATE INDEX stock_sku ON stock(sku);
CREATE INDEX stock_stockfileid ON stock(stockfileid);
CREATE INDEX stockfile_date ON stockfile(date);
I'd suggest using EXPLAIN to analyze the query without the indexes, and then create one index at a time and re-analyze with EXPLAIN to see which one gives the most direct benefit.
Use:
SELECT DISTINCT s.stockid,
s.sku,
s.quantity,
sf.filename,
sf.date
FROM STOCK s
JOIN STOCKFILE sf ON sf.stockfileid = s.stockfileid
JOIN (SELECT t.stockfileid,
MAX(t.date) AS max_date
FROM STOCKFILE t
GROUP BY t.stockfileid) x ON x.stockfileid = sf.stockfileid
AND x.max_date = sf.date
select *
from stock
where stockfileid = (
select stockfileid
from stockfile
order by date desc
limit 1
)
There are two common ways to accomplish this: a subquery or a self-join.
See this example of selecting the group-wise maximum at the MySQL site.
Edit, an example using a subquery:
SELECT stock.stockid, stock.sku, stock.quantity,
stockfile.filename, stockfile.date
FROM stock
INNER JOIN stockfile ON stock.stockfileid = stockfile.stockfileid
WHERE stockfile.date = (SELECT MAX(date) FROM stockfile);