How to write the T-SQL code for a query - sql

I need to use t-sql to query two tables. The first table is Books. The second table is Authors. For each Book record there could be multiple child Author records. I want to write a query that only returns the first Author record found for the current Book record. There are hundreds of thousands of records in the tables so I need the query to be efficient.
select a.FirstName, a.LastName, b.BookName
from Books b
left join
(
select TOP 1 t.BookID, t.FirstName, t.LastName
from Authors t
) a
on a.BookID = b.BookID
where b.BookClassification = 2
This query is not right. I only want to select the top 1 record in the Authors which match the BookID. How can I get the results I am looking for?

You were close:
select a.FirstName, a.LastName, b.BookName
from Books b
outer apply
(
select TOP 1 t.BookID, t.FirstName, t.LastName
from Authors t
WHERE t.BookID = b.BookID
-- uncomment the next line to control which author to prefer
-- ORDER BY t.<someColumn>...
) a
where b.BookClassification = 2
Though it seems odd to me that Authors would be a child of Books... :)

See if this is more efficient. By looking for min(authorID) just once you might get better performance.
select author.FirstName, author.LastName, author.BookName
from Books with (nolock)
join
( select min(authorID) as authorID, bookID
from Authors with (nolock)
group by bookID
) as Author1
on Author1.authorID = Books.authorID
join Authors with (no lock)
on Authors.authorID = Author1.authorID
and Authors.bookID = Author1.bookID
where Books.BookClassification = 2

In the spirit of TIMTOWTDI.
You can use a CTE, a fancy subquery but helpful if the subquery is used more than once. And one the rank functions, row_number().
with bookAuthors as (
select a.FirstName, a.LastName, b.BookName, BookClassification,
row_number() over(partition by b.BookName order by a.lastName ) as rank
from Books b
left join Authors a
on a.BookID = b.BookID
)
select a.FirstName, a.LastName, b.BookName
from bookAuthors
where rank = 1
and BookClassification = 2

Related

Combine a CROSS JOIN and a LEFT JOIN

I have two tables named author and commit_metrics. Both of them have an id field. Author has author_name and author_email. Commit_metrics has author_id and author_date.
I am trying to write a query that will get the number of commits that each author had in a given week, even if that number is 0. Here's what I have so far:
SELECT a.id, a.author_name, a.author_email, c.week_num, COUNT(c.id)
FROM author AS a
CROSS JOIN generate_series(1, 610) AS s(n)
LEFT JOIN (SELECT c.id,
c.author_id,
c.author_date,
WEEK_NUMBER(c.author_date) AS week_num
FROM commit_metrics c) AS c ON s.n = c.week_num AND a.id = c.author_id
WHERE c.week_num IS NOT NULL
GROUP BY a.id, a.author_name, a.author_email, c.week_num
ORDER BY c.week_num DESC, a.author_name;
WEEK_NUMBER is a function I wrote for this query:
CREATE OR REPLACE FUNCTION WEEK_NUMBER(date TIMESTAMP) RETURNS INTEGER AS
$$
SELECT TRUNC(DATE_PART('day', date - '2008-01-01') / 7)::INTEGER;
$$ LANGUAGE SQL;
Currently, the query works like a charm with one major caveat. It doesn't properly calculate 0 when the author made no commits in a given week. I'm not sure why it doesn't. When I do the query with just the FROM and CROSS JOIN, it properly prints the many thousand combined authors/weeks. However, when I add the LEFT JOIN, it loses any week where the author did not make a commit.
Any help would be greatly appreciated. I'm open to doing away with the generate_series call if it's unnecessary.
Also, I found this post, but I don't think it's helpful for my case.
Although you are using a left join, "WHERE c.week_num IS NOT NULL" filters out all of the cases where there is no post. Try this:
SELECT a.id, a.author_name, a.author_email, s.n as week_num, COUNT(c.id) as post_count
FROM author AS a
CROSS JOIN generate_series(1, 610) AS s(n)
LEFT JOIN (SELECT c.id,
c.author_id,
c.author_date,
WEEK_NUMBER(c.author_date) AS week_num
FROM commit_metrics c) AS c ON s.n = c.week_num AND a.id = c.author_id
GROUP BY a.id, a.author_name, a.author_email, s.n
ORDER BY s.n DESC, a.author_name;
Your WHERE clause is excluding the records on commit_metrics that are null, which is the case when the author has no commits during the week selected. You should just remove this from the WHERE clause to get your desired output.
If you need the WHERE clause to eliminate some of the CROSS JOIN records based on your data, you will need that CROSS JOIN and WHERE to be in a sub-select that you LEFT JOIN to, or create some more complicated logic in the current WHERE clause.
Remove the filtering condition. Also a subquery is not needed and you want to select s.n instead of c.week_num:
SELECT a.id, a.author_name, a.author_email, s.n as week_num, COUNT(c.id)
FROM author a CROSS JOIN
generate_series(1, 610) AS s(n) LEFT JOIN
commit_metrics c
ON s.n = WEEK_NUMBER(c.author_date) AND a.id = c.author_id
GROUP BY a.id, a.author_name, a.author_email, c.week_num
ORDER BY c.week_num DESC, a.author_name;

list all table elements and count() even if count is 0

Suppose the following tables:
table book(
id,
title,
deleted
)
table invoice(
id,
book_id,
settled
)
I need a list off all the books and the number of settled invoices for each book.
I tried this:
select book.id, title, count(invoice.id)
from book LEFT OUTER JOIN invoice ON book.id=invoice.book_id
where deleted=0
and settled=1
group by book.id
This works only if a book has at least 1 settled invoice or if it doesn't have any invoce at all. However it fails when a book has unsettled invoices and it doesn't have any settled invoice.
Any idea how to query it ?
The following will list all books, but only join and count settled invoices.
SELECT
b.id, b.title, COUNT(i.id) AS settled
FROM
book b
LEFT JOIN invoice i
ON b.id = i.book_id
AND i.settled = 1
WHERE
b.deleted = 0
GROUP BY
b.id
The condition and settled = 1 on your WHERE is effectively turning your LEFT JOIN into an INNER JOIN. You can add a CASE expression to your COUNT:
SELECT b.id,
b.title,
COUNT(CASE WHEN i.settled = 1 THEN 1 END)
FROM book b
LEFT JOIN invoice i
ON b.id = i.book_id
WHERE b.deleted=0
GROUP BY b.id;
Or use the LEFT JOIN with the invoice table already filtered:
SELECT b.id,
b.title,
COUNT(i.id)
FROM book b
LEFT JOIN ( SELECT *
FROM invoice
WHERE settled = 1) i
ON b.id = i.book_id
WHERE b.deleted=0
GROUP BY b.id;

Combine Two Tables in Select (SQL Server 2008)

If I have two tables, like this for example:
Table 1 (products)
id
name
price
agentid
Table 2 (agent)
userid
name
email
How do I get a result set from products that include the agents name and email, meaning that products.agentid = agent.userid?
How do I join for example SELECT WHERE price < 100?
Edited to support price filter
You can use the INNER JOIN clause to join those tables. It is done this way:
select p.id, p.name as ProductName, a.userid, a.name as AgentName
from products p
inner join agents a on a.userid = p.agentid
where p.price < 100
Another way to do this is by a WHERE clause:
select p.id, p.name as ProductName, a.userid, a.name as AgentName
from products p, agents a
where a.userid = p.agentid and p.price < 100
Note in the second case you are making a natural product of all rows from both tables and then filtering the result. In the first case you are directly filtering the result while joining in the same step. The DBMS will understand your intentions (regardless of the way you choose to solve this) and handle it in the fastest way.
This is a very rudimentary INNER JOIN:
SELECT
products.name AS productname,
price,
agent.name AS agentname
email
FROM
products
INNER JOIN agent ON products.agentid = agent.userid
I recommend reviewing basic JOIN syntax and concepts. Here's a link to Microsoft's documentation, though what you have above is pretty universal as standard SQL.
Note that the INNER JOIN here assumes every product has an associated agentid that isn't NULL. If there are NULL agentid in products, use LEFT OUTER JOIN instead to return even the products with no agent.
select p.name productname, p.price, a.name as agent_name, a.email
from products p
inner join agent a on (a.userid = p.agentid)
This is my join for slightly larger tables in Prod.Hope it helps.
SELECT TOP 1000 p.[id]
,p.[attributeId]
,p.[name] as PropertyName
,p.[description]
,p.[active],
a.[appId],
a.[activityId],
a.[Name] as AttributeName
FROM [XYZ.Gamification.V2B13.Full].[dbo].[ADM_attributeProperty] p
Inner join [XYZ.Gamification.V2B13.Full].[dbo].[ADM_activityAttribute] a
on a.id=p.attributeId
where a.appId=23098;
select ProductName=p.[name]
, ProductPrice=p.price
, AgentName=a.[name]
, AgentEmail=a.email
from products p
inner join agent a on a.userid=p.agentid
If you don't want to use inner join (or don't have possibility to do it!) and would combine rows, you can use a cross join :
SELECT *
FROM table1
CROSS JOIN table2
or simply
SELECT *
FROM table1, table2

sql triple join: ambigious attribute name on a count

So I want to count a number of books, but the books are stored in 2 different tables with the same attribute name.
I want to get a result that looks like:
name1 [total number of books of 1]
name2 [total number of books of 2]
I tried this triple join;
SELECT DISTINCT name, count(book)
FROM writes w
LEFT JOIN person p on p.id = w.author
LEFT JOIN book b on b.title = w.book
LEFT JOIN controls l on l.controller=p.id
GROUP BY name
ORDER BY name DESC
but since book exists as an attribute in writes and in controls, it cant execute the query.
It can only do it if I leave out one of joins so it can identify book.
How can I tell the sql engine to count the number of both book attributes together for each person?
As a result of database design that you interested in, you should issue 2 different sql and then merge them to handle single output.
A)
SELECT DISTINCT w.name as 'Name', count(w.book) as 'Cnt'
FROM writes w
LEFT JOIN person p on p.id = w.author
LEFT JOIN book b on b.title = w.book
B)
SELECT DISTINCT l.name as 'Name', count(l.book) as 'Cnt'
FROM controls l
LEFT JOIN person p on p.id = l.controller
LEFT JOIN book b on b.title = l.book
For your purpose, you can get UNION of A and B.
or you can use them as data source on a third SQL
select A.Name, sum(A.Cnt+B.Cnt)
from A, B
where A.Name = B.Name
group by A.Name
order by A.Name
WITH T AS
(
SELECT DISTINCT 'WRITES' FROMTABLE, w.name, w.count(book)
FROM writes w
LEFT JOIN person p on p.id = w.author
LEFT JOIN book b on b.title = w.book
GROUP BY name
UNION ALL
SELECT DISTINCT 'CONTROLLS' FROMTABLE, c.name, count(c.book)
FROM controlls c
LEFT JOIN person p on p.id = c.author
LEFT JOIN book b on b.title = c.book
GROUP BY name
)
SELECT * FROM T ORDER BY NAME
Should work.
HTH
This will work on a per distinct author's ID to how many books they've written. The pre-aggregation will return one record per author with how many books by that author. THEN, join to the person table to get the name. The reason I am leaving it by ID and Name of the author is... what if you have two authors "John Smith", but they have respective IDs of 123 and 389. You wouldn't want these rolled-up to the same person (or do you).
select
P.ID,
P.Name,
PreAgg.BooksPerAuthor
from
( select
w.author,
count(*) BooksPerAuthor
from
writes w
group by
w.author ) PreAgg
JOIN Person P
on PreAgg.Author = P.id
order by
P.Name

Simple SQL question about getting rows and associated counts

this oughta be an easy one.
My question is very similar to this one; basically, I've got a table of posts, a table of comments with a foreign key for the post_id, and a table of votes with a foreign key for the post id. I'd like to do a single query and get back a result set containing one row per post, along with the count of associated comments and votes.
From the question I've linked to above, it seems that for getting a table back containing just a row for each post and a comment count, this is the right approach:
SELECT a.ID, a.Title, COUNT(c.ID) AS NumComments
FROM Articles a
LEFT JOIN Comments c ON c.ParentID = a.ID
GROUP BY a.ID, a.Title
I thought adding vote count would be as easy as adding another left join, as in
SELECT a.ID, a.Title, COUNT(c.ID) AS NumComments, COUNT(v.id AS NumVotes)
FROM Articles a
LEFT JOIN Comments c ON c.ParentID = a.ID
LEFT JOIN Votes v ON v.ParentID = a.ID
GROUP BY a.ID, a.Title
but I'm getting bad numbers back. What am I missing?
SELECT
a.ID,
a.Title,
COUNT(DISTINCT c.ID) AS NumComments,
COUNT(DISTINCT v.id) AS NumVotes
FROM
Articles a
LEFT JOIN Comments c ON c.ParentID = a.ID
LEFT JOIN Votes v ON v.ParentID = a.ID
GROUP BY
a.ID,
a.Title
SELECT id, title,
(
SELECT COUNT(*)
FROM comments c
WHERE c.ParentID = a.ID
) AS NumComments,
(
SELECT COUNT(*)
FROM votes v
WHERE v.ParentID = a.ID
) AS NumVotes
FROM articles a
try:
COUNT(DISTINCT c.ID) AS NumComments
You are thinking in trees, not recordsets.
In the recordset the you get each Comment and each Vote returned multiple times combined with each other. Run the query without the group by and the count to see what I mean.
The solution is simple: use COUNT(DISCTINCT c.ID) and COUNT(DISTINCT v.ID)