I am using Firebird 2.5.8 and Delphi 10.2.3 and I want to fill a DBGrid with a query:
SELECT c.ID, l.ID,
(
SELECT COUNT(pl.ID)
FROM Tbl_ProtocolLicense AS pl
WHERE (pl.ReferenceId=l.ID)
) AS ReferenceCount
FROM Tbl_License AS l, tbl_client AS c
WHERE l.ClientId=c.Id;
How can I add a value (ReferenceCount > 0) as a boolean or (0/1) to that query?
Why even use a correlated sub-query that would be re-calculated again and again for every row?
(The first query below does not actually work. I was too hasty; see the updates further down.)
SELECT
c.ID,
l.ID,
IIF( r.CNT > 0, 1, 0 )
FROM Tbl_License AS l
JOIN tbl_client AS c ON l.ClientId=c.Id
JOIN (
SELECT COUNT(*) as CNT, ReferenceId as ID
FROM Tbl_ProtocolLicense
GROUP BY 2
) as r ON r.ID = l.ID
Note: this assumes the Tbl_ProtocolLicense.ID column is never NULL.
UPD. I gave a bit of a lecture about COUNT and other aggregates at http://stackoverflow.com/a/51159126/976391 - but here I missed it myself.
SELECT COUNT(*) as CNT, ReferenceId as ID
FROM Tbl_ProtocolLicense
GROUP BY 2
Do run the query and see the result. Notice anything fishy?
This query only returns rows for IDs that actually exist in the table; it cannot produce rows for IDs that do not.
The intermediate grouping query will never contain a row where the count is 0!
And thus the whole INNER JOIN based query will not contain those rows either!
What we should do is use an outer join, which keeps a row even when there is no matching row in the other table. Read: https://en.wikipedia.org/wiki/Join_(SQL)
SELECT
c.ID,
l.ID,
IIF( r.CNT is not NULL, 1, 0 )
FROM Tbl_License AS l
JOIN tbl_client AS c ON l.ClientId=c.Id
LEFT JOIN (
SELECT COUNT(*) as CNT, ReferenceId as ID
FROM Tbl_ProtocolLicense
GROUP BY 2
) as r ON r.ID = l.ID
Compare the output with the first query and see the difference.
UPD 2. However, even that was probably not good enough. The problem here is that you are asking for things you do not actually want.
You demand that Firebird COUNT all the rows, when you really do not care about the count. All you care about is whether there is at least one row or none at all. If there is one row, you do not care whether there are 10 or 100 or 1000 more. So actually counting objects when you do not want their count is extra work done for nothing.
That is especially wasteful in the InterBase/Firebird family, where counting over a table can trigger garbage collection and slow down the work. But it would be true even in pure Delphi: you do not want to loop through ALL of an array if you would be satisfied with finding its first suitable element.
And then we can move back to the correlated sub-query.
SELECT
c.ID,
l.ID,
IIF( EXISTS (
SELECT * FROM Tbl_ProtocolLicense AS pl
WHERE pl.ReferenceId=l.ID
), 1, 0 )
FROM Tbl_License AS l, tbl_client AS c
WHERE l.ClientId=c.Id;
The bitter side of the correlated sub-query is that it is run again and again for every result row.
The bitter side of calculating grouped total counts is that you compute data you do not actually need - the exact counts.
Which is worse? Who knows. Depending on the real data and the real tables/indexes, either approach can turn out faster. A human would not notice the difference on small data; it is a question of scaling up to thousands and millions of rows, where the difference shows.
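If in doubt, measure on your own data. A minimal sketch, assuming you can run both variants in Firebird's isql command-line tool (SET PLAN and SET STATS are standard isql commands):
SET PLAN ON;  -- print the optimizer's plan for each statement
SET STATS ON; -- print reads, fetches and elapsed time after each one
-- now execute the grouped LEFT JOIN variant and the EXISTS variant
-- one after the other and compare the plans and the fetch counts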
UPD 3. Can we have the best of both approaches? I hope we can. The trick is asking for exactly what we need and not anything more. Can we ask Firebird to list all the IDs that exist in the table without actually counting them? Yes, we can.
SELECT DISTINCT ReferenceId FROM Tbl_ProtocolLicense
Run the query and see the result!
Notice that it still will NOT list the IDs that are not in the table. Obvious? Well, I missed it in my first approach, and then two persons who upvoted me missed it too. Stupid errors are the hardest to spot, because you cannot believe such stupidity.
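To make the pitfall concrete, a small hypothetical example: suppose Tbl_ProtocolLicense contains ReferenceId values 1, 1 and 3.
SELECT DISTINCT ReferenceId FROM Tbl_ProtocolLicense
-- returns only 1 and 3; a license with ID = 2 never appears,
-- which is why the outer join (and its NULLs) is still needed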
So now we plug it in, in place of the "counting" query of the 2nd attempt.
SELECT
c.ID,
l.ID,
IIF( r.ReferenceId is NULL, 0, 1 )
FROM Tbl_License AS l
JOIN tbl_client AS c ON l.ClientId=c.Id
LEFT JOIN (
SELECT DISTINCT ReferenceId
FROM Tbl_ProtocolLicense
) as r ON r.ReferenceId = l.ID
UPD 4. One last trick. If I am correct, this query should produce exactly the same result as the one above, without using IIF/CASE. Try it and compare. If the results are the same, then try to understand why and how it works and which extra assumptions about the data it requires.
SELECT
c.ID,
l.ID,
COUNT( r.ReferenceId )
FROM Tbl_License AS l
JOIN tbl_client AS c ON l.ClientId=c.Id
LEFT JOIN (
SELECT DISTINCT ReferenceId
FROM Tbl_ProtocolLicense
) as r ON r.ReferenceId = l.ID
GROUP BY c.ID, l.ID
This query is not better than the one in UPD 3; it is just a puzzle to think about, in order to understand SQL better.
Now do some work to actually check and compare the results, because blindly trusting an unknown person on the internet is not good. Even if that person is not malicious, he can make stupid mistakes too.
Whatever you pick up from Internet forums is only an "example" and an "idea demonstration"; it is always your responsibility to understand and check that example. And maybe to reject it.
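One way to do such a check (a sketch, shown here for the UPD 3 variant; run the same wrapper over each variant and compare the totals):
SELECT COUNT(*) AS total_rows, SUM(has_ref) AS flagged_rows
FROM (
    SELECT c.ID AS client_id, l.ID AS license_id,
           IIF(r.ReferenceId IS NULL, 0, 1) AS has_ref
    FROM Tbl_License AS l
    JOIN tbl_client AS c ON l.ClientId = c.Id
    LEFT JOIN (SELECT DISTINCT ReferenceId FROM Tbl_ProtocolLicense) AS r
           ON r.ReferenceId = l.ID
) AS t
If every variant reports the same pair of numbers on your data, that is at least weak evidence they agree.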
To read and to understand:
Conditional functions: https://www.firebirdsql.org/file/documentation/reference_manuals/fblangref25-en/html/fblangref25-functions-scalarfuncs.html#fblangref25-functions-conditional
Grouping: https://www.firebirdsql.org/file/documentation/reference_manuals/fblangref25-en/html/fblangref25-dml-select.html#fblangref25-dml-select-groupby
Joins: https://www.firebirdsql.org/file/documentation/reference_manuals/fblangref25-en/html/fblangref25-dml-select.html#fblangref25-dml-select-joins
Additionally, it would really be useful for you to read a good book on general SQL, such as Martin Gruber's.
Background
I've got this PostgreSQL join that works pretty well for me:
select m.id,
m.zodiac_sign,
m.favorite_color,
m.state,
c.combined_id
from people."People" m
LEFT JOIN people.person_to_person_composite_crosstable c on m.id = c.id
As you can see, I'm joining two tables to bring in a combined_id, which I need for later analysis elsewhere.
The Goal
I'd like to write a query that, for each combined_id, picks the row with the lowest value of m.id (along with the other variables too). This ought to result in a new table with unique/distinct values of combined_id.
The Problem
The issue is that the current query returns ~300 records, but I need it to return ~100. Why? Each combined_id has, on average, 3 different m.id's. I don't actually care about the m.id's; I care about getting a unique combined_id. Because of this, I decided that a good "selection criterion" would be to select rows based on the lowest value m.id for rows with the same combined_id.
What I've tried
I've consulted several posts on this and I feel like I'm fairly close. See for instance this one or this one. This other one does exactly what I need (with MAX instead of MIN) but he's asking for it in Unix Bash 😞
Here's an example of something I've tried:
select m.id,
m.zodiac_sign,
m.favorite_color,
m.state,
c.combined_id
from people."People" m
LEFT JOIN people.person_to_person_composite_crosstable c on m.id = c.id
WHERE m.id IN (select min(m.id))
This returns the error ERROR: aggregate functions are not allowed in WHERE.
Any ideas?
Postgres's DISTINCT ON is probably the best approach here:
SELECT DISTINCT ON (c.combined_id)
m.id,
m.zodiac_sign,
m.favorite_color,
m.state,
c.combined_id
FROM people."People" m
LEFT JOIN people.person_to_person_composite_crosstable c
ON m.id = c.id
ORDER BY
c.combined_id,
m.id;
As for performance, the following index on the crosstable might speed up the query:
CREATE INDEX idx ON people.person_to_person_composite_crosstable (id, combined_id);
If used, the above index should let the join happen faster. Note that it also covers the combined_id column, which the SELECT requires.
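To confirm the planner actually uses it, a sketch with EXPLAIN ANALYZE (run in psql against your real data):
EXPLAIN ANALYZE
SELECT DISTINCT ON (c.combined_id)
       m.id, m.zodiac_sign, m.favorite_color, m.state, c.combined_id
FROM people."People" m
LEFT JOIN people.person_to_person_composite_crosstable c ON m.id = c.id
ORDER BY c.combined_id, m.id;
An "Index Only Scan using idx" node in the output would indicate the covering index is being used.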
I've written the following query:
WITH m2 AS (
SELECT m.id, m.original_title, m.votes, l.name as lang
FROM movies m
JOIN movie_languages ml ON m.id = ml.movie_id
JOIN languages l ON l.id = ml.language_id
)
SELECT m.original_title
FROM movies m
WHERE NOT EXISTS (
SELECT 1
FROM m2
WHERE m.id = m2.id AND m2.lang <> 'English'
)
The results appear after 1.5 seconds.
After adding the following line at the end of the query, it takes at least 5 minutes to run it:
ORDER BY votes DESC;
It's not the size of the data: an ORDER BY over the entire table returns results in no time.
What am I doing wrong?
Why does the ORDER BY add so much time? (The query SELECT * FROM movies ORDER BY votes DESC returns immediately.)
The order by in the CTE is irrelevant. But I would suggest aggregation for this purpose:
SELECT m.original_title
FROM movies m
JOIN movie_languages ml ON m.id = ml.movie_id
JOIN languages l ON l.id = ml.language_id
GROUP BY m.original_title, m.id
HAVING SUM(l.name = 'English') = 0;
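If you still need the output ordered by votes (the original goal), the ORDER BY can be appended to the aggregated form. A sketch; MAX(m.votes) is used because votes is not part of the GROUP BY:
SELECT m.original_title
FROM movies m
JOIN movie_languages ml ON m.id = ml.movie_id
JOIN languages l ON l.id = ml.language_id
GROUP BY m.original_title, m.id
HAVING SUM(l.name = 'English') = 0
ORDER BY MAX(m.votes) DESC;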
To examine your queries you can turn on the timer by entering .timer on at the SQLite prompt. More importantly, use EXPLAIN (or EXPLAIN QUERY PLAN) to see the details of how your query is executed.
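For example, at the sqlite3 prompt (a sketch):
.timer on
EXPLAIN QUERY PLAN
SELECT m.original_title FROM movies m ORDER BY m.votes DESC;
-- "USE TEMP B-TREE FOR ORDER BY" in the plan means SQLite must sort;
-- "USING INDEX" means the order comes for free from an index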
The query as initially written does seem rather more complex than necessary, as already pointed out above. It is not apparent why the movie_languages and languages tables are needed in general, and especially in this particular query. That would require more explanation on your part, but I believe at least one of them could be removed, speeding up your query.
The ORDER BY clause in SQLite is handled as described below.
SQLite attempts to use an index to satisfy the ORDER BY clause of a query when possible. When faced with the choice of using an index to satisfy WHERE clause constraints or satisfying an ORDER BY clause, SQLite does the same cost analysis described above and chooses the index that it believes will result in the fastest answer.
SQLite will also attempt to use indices to help satisfy GROUP BY clauses and the DISTINCT keyword. If the nested loops of the join can be arranged such that rows that are equivalent for the GROUP BY or for the DISTINCT are consecutive, then the GROUP BY or DISTINCT logic can determine if the current row is part of the same group or if the current row is distinct simply by comparing the current row to the previous row. This can be much faster than the alternative of comparing each row to all prior rows.
Since no index on votes is stated (nor its type), the above logic applies: SQLite chooses 'the index that it believes will result in the fastest answer'. With an over-complicated query and no index on votes, the column used in the ORDER BY, there is much more for SQLite to figure out than necessary. The simple query with ORDER BY executes quickly, so it is the complexity of the query that gives SQLite far more to compute than necessary.
Additionally, the type of the column, most likely INTEGER, is important when sorting (and joining). Attempting to sort on a character type would not only give you wrong results once votes go above single digits; it would simply be the wrong type to use (I am not assuming you did this, just mentioning it).
So simplify the query, ensure your PRIMARY KEYs are properly set, and test it. If it still does not return in time, try an index on votes. This will give you much better insight into what is going on and how different changes affect your queries.
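For example (a sketch; the index name is illustrative):
CREATE INDEX IF NOT EXISTS idx_movies_votes ON movies(votes);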
SQLite Documentation - check all and note 6. Sorting, Grouping and Compound SELECTs
SQLite Documentation - check 10. ORDER BY optimizations
You can do it with NOT EXISTS, without joins and aggregation (assuming that there is always at least 1 row for each movie in the table movie_languages):
SELECT m.*
FROM movies m
WHERE NOT EXISTS (
SELECT 1 FROM movie_languages ml
WHERE m.id = ml.movie_id
AND ml.language_id <> (SELECT l.id FROM languages l WHERE l.name = 'English')
)
ORDER BY m.votes DESC
or with a LEFT join to languages to get the unmatched rows:
SELECT m.*
FROM movies m
INNER JOIN movie_languages ml ON m.id = ml.movie_id
LEFT JOIN languages l ON l.id = ml.language_id AND l.name <> 'English'
WHERE l.id IS NULL
ORDER BY m.votes DESC
Refer to this link for more information:
here
In a nutshell, when you include an ORDER BY clause, the database builds a list of the rows in the correct order and then returns the data in that order.
Creating that ordered list takes extra processing, which translates into a longer execution time.
I need to use INNER JOINs to get a series of information and then I need to COUNT this info. I need to be able to "View all courses and the instructor taking them, the capacity of the course, and the number of members currently booked on the course."
To get all the info I have done the following query:
SELECT
C.coursename, Instructors.fname, Instructors.lname, C.maxNo, Membercourse.memno
FROM Courses AS C
INNER JOIN Instructors ON C.instructorNo = Instructors.instructorNo
INNER JOIN Membercourse ON C.courseID = Membercourse.courseID;
but no matter where I put the COUNT it always tells me that whatever is outside the COUNT should be in the GROUP BY
I have worked out how to COUNT/GROUP BY the necessary info e.g.:
SELECT courseID, COUNT (DISTINCT MC.memno)
FROM Membercourse AS MC
GROUP BY MC.courseID;
but I don't know how to combine the two!
I think what you're looking for is a subquery. I'm a SQL-Server guy (not postgresql) but the concept looks to be almost identical after some crash-course postgresql googling.
Anyway, basically, when you write a SELECT statement, you can use a subquery instead of an actual table. So your SQL would look something like:
select count(*)
from
(
    select stuff from table
    inner join someOtherTable on someCondition
) sub
... hopefully that makes sense. Instead of trying to write one big query where you're doing both the inner join and count, you're writing two: an inner one that gets your inner-join'ed data, and then an outer one to actually count the rows.
EDIT: To help explain a bit more on the thought process behind subqueries.
Subqueries are a way of logically breaking down the steps/processes on the data. Instead of trying to do everything in one big step, you do it in steps.
In this case, what's step one? It's to get a combined data source for your combined, inner-join'ed data.
Step 1: Write the Inner Join query
SELECT
C.coursename, Instructors.fname, Instructors.lname,C.maxNo,
membercourse.memno
FROM Courses AS C
INNER JOIN Instructors ON C.instructorNo = Instructors.instructorNo
INNER JOIN Membercourse ON C.courseID = Membercourse.courseID;
Okay, now, what next?
Well, let's say we want to get a count of how many entries there are for each 'memno' in that result above.
Instead of trying to figure out how to modify that query above, we instead use it as a data source, like it was a table itself.
Step 2 - Make it A Subquery
select * from
(
SELECT
C.coursename, Instructors.fname, Instructors.lname,C.maxNo,
membercourse.memno
FROM Courses AS C
INNER JOIN Instructors ON C.instructorNo = Instructors.instructorNo
INNER JOIN Membercourse ON C.courseID = Membercourse.courseID
) mySubQuery
Step 3 - Modify your outer query to get the data you want.
Well, we wanted to group by 'memno', and get the count, right? So...
select memno, count(*)
from
(
-- all that same subquery stuff
) mySubQuery
group by memno
... make sense? Once you've got your subquery written out, you don't need to worry about it any more - you just treat it like a table you're working with.
This is actually incredibly important, and makes it much easier to read more intricate queries - especially since you can name your subqueries in a way that explains what the subquery represents data-wise.
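Put together, the assembled query might look like this (a sketch, counting entries per memno as described above):
SELECT memno, COUNT(*) AS entries
FROM
(
    SELECT
        C.coursename, Instructors.fname, Instructors.lname, C.maxNo,
        Membercourse.memno
    FROM Courses AS C
    INNER JOIN Instructors ON C.instructorNo = Instructors.instructorNo
    INNER JOIN Membercourse ON C.courseID = Membercourse.courseID
) mySubQuery
GROUP BY memno;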
There are many ways to solve this, such as using window functions. But you can also achieve it with a simple subquery:
SELECT
C.coursename,
Instructors.fname,
Instructors.lname,
C.maxNo,
(SELECT
COUNT(*)
FROM
membercourse
WHERE
C.courseID = Membercourse.courseID) AS members
FROM
Courses AS C
INNER JOIN Instructors ON C.instructorNo = Instructors.instructorNo;
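For comparison, the window-function route mentioned above might look like this (a sketch; it assumes a PostgreSQL version with window functions, and the LEFT JOIN keeps courses that have no bookings, giving them a count of 0):
SELECT DISTINCT
    C.coursename,
    Instructors.fname,
    Instructors.lname,
    C.maxNo,
    COUNT(Membercourse.memno) OVER (PARTITION BY C.courseID) AS members
FROM Courses AS C
INNER JOIN Instructors ON C.instructorNo = Instructors.instructorNo
LEFT JOIN Membercourse ON C.courseID = Membercourse.courseID;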
I get a different result set for this query intermittently when I run it... sometimes it gives 1363, sometimes 1365 and sometimes 1366 results. The data doesn't change. What could be causing this, and is there a way to prevent it? The query looks something like this:
SELECT *
FROM
(
SELECT
RC.UserGroupId,
RC.UserGroup,
RC.ClientId AS CLID,
CASE WHEN T1.MultipleClients = 1 THEN RC.Salutation1 ELSE RC.DisplayName1 END AS szDisplayName,
T1.MultipleClients,
RC.IsPrimaryRecord,
RC.RecordTypeId,
RC.ClientTypeId,
RC.ClientType,
RC.IsDeleted,
RC.IsCompany,
RC.KnownAs,
RC.Salutation1,
RC.FirstName,
RC.Surname,
Relationship,
C.DisplayName Client,
RC.DisplayName RelatedClient,
E.Email,
RC.DisplayName + ' is the ' + R.Relationship + ' of ' + C.DisplayName Description,
ROW_NUMBER() OVER (PARTITION BY E.Email ORDER BY Relationship DESC) AS sequence_id
FROM
SSDS.Client.ClientExtended C
INNER JOIN
SSDS.Client.ClientRelationship R WITH (NOLOCK) ON C.ClientId = R.ClientID
INNER JOIN
SSDS.Client.ClientExtended RC WITH (NOLOCK) ON R.RelatedClientId = RC.ClientId
LEFT OUTER JOIN
SSDS.Client.Email E WITH (NOLOCK) ON RC.ClientId = E.ClientId
LEFT OUTER JOIN
SSDS.Client.UserDefinedData UD WITH (NOLOCK) ON C.ClientId = UD.ClientId AND C.UserGroupId = UD.UserGroupId
INNER JOIN
(
SELECT
E.Email,
CASE WHEN (COUNT(DISTINCT RC.DisplayName) > 1) THEN 1 ELSE 0 END AS MultipleClients
FROM
SSDS.Client.ClientExtended C
INNER JOIN
SSDS.Client.ClientRelationship R WITH (NOLOCK) ON C.ClientId = R.ClientID
INNER JOIN
SSDS.Client.ClientExtended RC WITH (NOLOCK) ON R.RelatedClientId = RC.ClientId
LEFT OUTER JOIN
SSDS.Client.Email E WITH (NOLOCK) ON RC.ClientId = E.ClientId
LEFT OUTER JOIN
SSDS.Client.UserDefinedData UD WITH (NOLOCK) ON C.ClientId = UD.ClientId AND C.UserGroupId = UD.UserGroupId
WHERE
Relationship IN ('z-Group Principle', 'z-Group Member ')
AND E.Email IS NOT NULL
GROUP BY E.Email
) T1 ON E.Email = T1.Email
WHERE
Relationship IN ('z-Group Principle', 'z-Group Member ')
AND E.Email IS NOT NULL
) T
WHERE
sequence_id = 1
AND T.UserGroupId IN (Select * from iCentral.dbo.GetSubUserGroups('471b9cbd-2312-4a8a-bb20-35ea53d30340',0))
AND T.IsDeleted = 0
AND T.RecordTypeId = 1
AND T.ClientTypeId IN
(
'1', --Client
'-1652203805' --NTU
)
AND T.CLID NOT IN
(
SELECT DISTINCT
UDDF.CLID
FROM
SLacsis_SLM.dbo.T_UserDef UD WITH (NOLOCK)
INNER JOIN
SLacsis_SLM.dbo.T_UserDefData UDDF WITH (NOLOCK)
ON UD.UserDef_ID = UDDF.UserDef_ID
INNER JOIN
SLacsis_SLM.dbo.T_Client CLL WITH (NOLOCK)
ON CLL.CLID = UDDF.CLID AND CLL.UserGroup_CLID = UD.UserID
WHERE
UD.UserDef_ID in
(
'F68F31CE-525B-4455-9D50-6DA77C66FEE5',
'A7CECB03-866C-4F1F-9E1A-CEB09474FE47'
)
AND UDDF.Data = 'NO'
)
ORDER BY T.Surname
EDIT:
I have removed all NOLOCK's (including the ones in views and UDFs) and I'm still having the same issue. I get the same results every time for the nested select (T) and if I put the result set of T into a temp table in the beginning of the query and join onto the temp table instead of the nested select then the final result set is the same every time I run the query.
EDIT2:
I have been doing some more reading on ROW_NUMBER(). I'm partitioning by email (of which there are duplicates) and ordering by Relationship (which has only two possible values). Could this cause the query to be non-deterministic, and would there be a way to fix that?
EDIT3:
Here are the actual execution plans if anyone is interested http://www.mediafire.com/?qo5gkh5dftxf0ml. Is it possible to see that it is running as read committed from the execution plan? I've compared the files using WinMerge and the only differences seem to be the counts (ActualRows="").
EDIT4:
This works:
SELECT * FROM
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY B.Email ORDER BY Relationship DESC) AS sequence_id
FROM
(
SELECT DISTINCT
RC.UserGroupId,
...
) B...
EDIT5:
When running the same ROW_NUMBER() query (T in the original question, just selecting RC.DisplayName and ROW_NUMBER) twice in a row, I get different ranks for some people:
Does anyone have a good explanation/example of why or how ROW_NUMBER() over a result set that contains duplicates can rank differently each time it is run and ultimately change the number of results?
EDIT6:
OK, I think this makes sense to me now. This occurs when two people have the same email address (e.g. a husband and wife pair) and the same relationship. I guess in this case their ROW_NUMBER() ranking is arbitrary and can change every time it is run.
Your use of NOLOCK all over means you are doing dirty reads, and you will see uncommitted data, data that will be rolled back, transient and inconsistent data, etc.
Take these off, try again, report back please.
Edit: some options with NOLOCKs removed:
Data is really changing
Some parameter or filter is changing (e.g. GETDATE)
Some float comparisons running on different cores each time
See this on dba.se https://dba.stackexchange.com/q/4810/630
Embedded NOLOCKs in udfs or views (eg iCentral.dbo.GetSubUserGroups)
...
As I said yesterday in the comments, the row numbering for rows with duplicate E.Email, Relationship values will be arbitrary.
To make it deterministic you would need to do PARTITION BY B.Email ORDER BY Relationship DESC, SomeUniqueColumn. Interesting that it changes between runs, though, while using the same execution plan. I assume this is a consequence of the hash join.
I think your problem is that the first row over the partition is not deterministic. I suspect that Email and Relationship are not unique.
ROW_NUMBER() OVER (PARTITION BY E.Email ORDER BY Relationship DESC) AS sequence_id
Later you examine the first row of the partition.
WHERE T.sequence_id = 1
AND T.UserGroupId ...
If that first row is arbitrary, then you are going to get an arbitrary WHERE comparison. You need to extend the ORDER BY to include a complete unique key. If there is no unique key, then you need to make one or live with arbitrary results. Even on a table with a clustered PK, the selected row order is not guaranteed unless the entire PK is in the sort clause.
This probably has to do with ordering. You have a sequence_id defined as a ROW_NUMBER ordered by Relationship. You'll always get a sensible order by Relationship, but beyond that your row numbers will be random, so you can get different rows with sequence_id = 1 each time. That in turn affects your WHERE clause, and you can get different numbers of results. To fix this and get a consistent result, add another field to your ROW_NUMBER's ORDER BY. Use a primary key to be certain of consistent results.
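Concretely, the fix might look like this (a sketch; RC.ClientId is assumed to be unique):
ROW_NUMBER() OVER (PARTITION BY E.Email
                   ORDER BY Relationship DESC, RC.ClientId) AS sequence_id
With a unique tie-breaker in the ORDER BY, two runs over the same data must number the rows identically, so sequence_id = 1 always picks the same row.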
There's a recent KB that addresses problems with ROW_NUMBER() ... see FIX: You receive an incorrect result when you run a query that uses the row_number function in SQL Server 2008 for the details.
However this KB indicates that it's a problem when parallelism is invoked for execution, and looking at your execution plans I can't see this kicking in. But the fact that MS have found a problem with it in one situation makes me a little bit wary - i.e., could the same issue occur for a sufficiently complicated query (and your execution plan does look sufficiently large).
So it may be worth checking your patch levels of SQL Server 2008.
You must only use ORDER BY, without PARTITION BY:
ROW_NUMBER() OVER (ORDER BY Relationship DESC) AS sequence_id
I am using SQLite and will port to MySQL (5) later.
I wanted to know if I am doing something I shouldn't be doing. I purposely designed it so I compare to 0 instead of 1 (I changed hasApproved to NotApproved to do this; not a big deal, and I haven't written any code yet). I was told I never need to write a subquery, but I do here. My Votes table is just id, ip, postid (I don't think I can write that subquery as a join instead?), and that's pretty much all that is on my mind.
Naming conventions I don't really care about, since the tables are created via reflection and are all over the place.
select
id,
name,
body,
upvotes,
downvotes,
(select 1 from UpVotes where IPAddr=? AND post=Post.id) as myup,
(select 1 from DownVotes where IPAddr=#0 AND post=Post.id) as mydown
from Post
where
flag = '0'
limit ?, ?
Since you're asking about good practices... the "upvotes" and "downvotes" columns in your Post table look like duplicated data. That's a problem, because now you always have to worry about whether the data is in sync and correct. If you want to know the number of upvotes, count them; don't also store them in the Post table. I'm not positive that's what you're doing, but it's my guess.
Onto your query... You will probably get better performance using a JOINed subquery instead of how you have it. With the scalar subqueries as columns, they have to be run once for every row that is returned. That could be a pretty big performance hit if you're returning many rows. Instead, try:
SELECT
P.id,
P.name,
P.body,
P.upvotes,
P.downvotes,
COALESCE(UV.cnt, 0) AS upvotes2,
COALESCE(DV.cnt, 0) AS downvotes2
FROM
Post P
LEFT OUTER JOIN (SELECT post, COUNT(*) AS cnt FROM UpVotes GROUP BY post) AS UV ON UV.post = P.id
LEFT OUTER JOIN (SELECT post, COUNT(*) AS cnt FROM DownVotes GROUP BY post) AS DV ON DV.post = P.id
Compare it to your own query and see if it gives you better performance.
EDIT: A couple of other posters have advocated a single table for up/down votes. They are absolutely correct. That makes the query even easier and also probably much faster:
SELECT
P.id,
P.name,
P.body,
P.upvotes,
P.downvotes,
SUM(CASE WHEN V.vote_type = 'UP' THEN 1 ELSE 0 END) AS upvotes2,
SUM(CASE WHEN V.vote_type = 'DOWN' THEN 1 ELSE 0 END) AS downvotes2
FROM
Post P
LEFT OUTER JOIN Votes V ON
V.post = P.id
GROUP BY
P.id,
P.name,
P.body,
P.upvotes,
P.downvotes
I'm guessing that you're trying to ensure that a user only votes once on each post here.
I wouldn't - I don't - use separate tables for up votes and down votes. Add a vote type to your votes table and you won't need correlated subqueries.
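A sketch of that single-table design (SQLite syntax; the names are illustrative, not taken from the question):
CREATE TABLE Votes (
    id        INTEGER PRIMARY KEY,
    post      INTEGER NOT NULL REFERENCES Post(id),
    IPAddr    TEXT    NOT NULL,
    vote_type TEXT    NOT NULL CHECK (vote_type IN ('UP', 'DOWN')),
    UNIQUE (post, IPAddr)  -- one vote per IP per post
);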
Here are my opinions:
It seems that the tables "UpVotes" and "DownVotes" have the same structure and can be merged into one table.
The relation between the "Post" table and the votes table can be constrained by a foreign key.
Although I am not sure about the performance difference, I think it would be better to use a join rather than nesting two SELECT statements inside another SELECT.
You can use joins to achieve the same thing, and I would expect joins to work a lot more efficiently than embedded selects.