List the number of copies per book - SQL

For a school project, I should do queries on this Library model.
I've done well so far, but now I'm stuck on this question: "List the number of copies per book".
Here's what I've done, but it isn't working as it should:
SELECT liv_titulo, (
SELECT count(exe_cod)
FROM exemplar
GROUP BY liv_cod
ORDER BY COUNT(exe_cod) desc) Quantidade
FROM livro INNER JOIN exemplar USING (liv_cod);

I think the technique you wanted to use is a correlated subquery:
you need a WHERE clause in the subquery so that it only takes into account the liv_cod of the current row in the outer query
no GROUP BY clause is needed in the subquery; it should return a scalar value (just one row and one column, containing the count of matching rows from table exemplar)
there is no need for a join in the outer query
Code:
SELECT
liv_titulo,
(
SELECT count(*)
FROM exemplar e
WHERE e.liv_cod = l.liv_cod -- correlation
) Quantidade
FROM livro l;
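A minimal sketch of the correlated subquery above, run against an in-memory SQLite database through Python's sqlite3 module. The table layouts and rows are hypothetical, built only from the column names in the question. It also shows the equivalent LEFT JOIN + GROUP BY form; both report 0 for a book that has no copies.

```python
import sqlite3

# Hypothetical sample data; only the table/column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE livro (liv_cod INTEGER PRIMARY KEY, liv_titulo TEXT);
    CREATE TABLE exemplar (exe_cod INTEGER PRIMARY KEY, liv_cod INTEGER);
    INSERT INTO livro VALUES (1, 'Dom Casmurro'), (2, 'Iracema'), (3, 'O Guarani');
    INSERT INTO exemplar VALUES (10, 1), (11, 1), (12, 1), (13, 2);
""")

# Correlated subquery: evaluated per livro row, counting only that book's copies.
correlated = conn.execute("""
    SELECT liv_titulo,
           (SELECT count(*) FROM exemplar e WHERE e.liv_cod = l.liv_cod) AS Quantidade
    FROM livro l
    ORDER BY l.liv_cod
""").fetchall()

# LEFT JOIN + GROUP BY: count(e.exe_cod) skips the NULL produced for a book
# without copies, so that book reports 0 rather than 1.
grouped = conn.execute("""
    SELECT l.liv_titulo, count(e.exe_cod) AS Quantidade
    FROM livro l
    LEFT JOIN exemplar e ON e.liv_cod = l.liv_cod
    GROUP BY l.liv_cod, l.liv_titulo
    ORDER BY l.liv_cod
""").fetchall()

print(correlated)
```

The ORDER BY clauses are only there to make the output order predictable.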


Query with Left outer join and group by returning duplicates

To begin with, I have a table in my db that is fed with SalesForce info. When I run this example query it returns 2 rows:
select * from SalesForce_INT_Account__c where ID_SAP_BAYER__c = '3783513'
When I run this next query on the same table I obtain one of the rows, which is what I need:
SELECT MAX(ID_SAP_BAYER__c) FROM SalesForce_INT_Account__c where ID_SAP_BAYER__c = '3783513' GROUP BY ID_SAP_BAYER__c
Now, I have another table (PedidosEspecialesZarateCabeceras) which has a field (NroClienteDireccionEntrega) that I can match with the field I've been using in the SalesForce table (ID_SAP_BAYER__c). This table has a key that consists of just 1 field (NroPedido).
What I need to do is join these 2 tables to obtain a row from PedidosEspecialesZarateCabeceras with additional fields coming from the SalesForce table; if those additional fields are not available, they should come back as NULL values, which is why I'm using a LEFT OUTER JOIN.
The problem is that, since I have to match NroClienteDireccionEntrega with ID_SAP_BAYER__c and there are 2 rows in the SalesForce table with the same ID_SAP_BAYER__c, my query returns 2 duplicate rows from PedidosEspecialesZarateCabeceras (both have the same NroPedido).
This is an example query that returns duplicates:
SELECT
cab.CUIT AS CUIT,
convert(nvarchar(4000), cab.NroPedido) AS NroPedido,
sales.BillingCity__c as Localidad,
sales.BillingState__c as IdProvincia,
sales.BillingState__c_Desc as Provincia,
sales.BillingStreet__c as Calle,
sales.Billing_Department__c as Distrito,
sales.Name as RazonSocial,
cab.NroCliente as ClienteId
FROM PedidosEspecialesZarateCabeceras AS cab WITH (NOLOCK)
LEFT OUTER JOIN
SalesForce_INT_Account__c AS sales WITH (NOLOCK) ON
cab.NroClienteDireccionEntrega = sales.ID_SAP_BAYER__c
and sales.ID_SAP_BAYER__c in
( SELECT MAX(ID_SAP_BAYER__c)
FROM SalesForce_INT_Account__c
GROUP BY ID_SAP_BAYER__c
)
WHERE cab.NroPedido ='5320'
Even though the join uses MAX and GROUP BY, this returns 2 duplicate rows with different SalesForce information (because of the 2 SalesForce rows with the same ID_SAP_BAYER__c), which should not be possible.
What I need is for the left outer join in my query to pick only ONE of the SalesForce rows, to prevent the duplication that is happening right now. For some reason the SELECT MAX with the GROUP BY is not working.
Maybe I should join these tables in a different way. Can anyone give me some other ideas on how to join the two tables so that just 1 row is returned? It doesn't matter if the SalesForce row that gets picked out of the 2 isn't the correct one; I just need it to pick one of them.
Your IN clause is not actually doing anything, since...
SELECT MAX(ID_SAP_BAYER__c)
FROM SalesForce_INT_Account__c
GROUP BY ID_SAP_BAYER__c
... returns all possible ID_SAP_BAYER__c values. (The GROUP BY says you want one row per unique ID_SAP_BAYER__c, and since your MAX operates on exactly one unique value per group, it simply returns that value.)
You will want to change your query to operate on a value that is actually different between the two rows you are trying to differentiate (probably the MAX(ID) for the relevant ID_SAP_BAYER__c). Plus, you will want to link that inner query to your outer query.
You could probably do something like:
...
LEFT OUTER JOIN
SalesForce_INT_Account__c sales
ON cab.NroClienteDireccionEntrega = sales.ID_SAP_BAYER__c
and sales.ID in
(
SELECT MAX(ID)
FROM SalesForce_INT_Account__c sales2
WHERE sales2.ID_SAP_BAYER__c = cab.NroClienteDireccionEntrega
)
WHERE cab.NroPedido ='5320'
By using sales.ID in ... SELECT MAX(ID) ... instead of sales.ID_SAP_BAYER__c in ... SELECT MAX(ID_SAP_BAYER__c) ... this ensures you only match one of the two rows for that ID_SAP_BAYER__c. The WHERE sales2.ID_SAP_BAYER__c = cab.NroClienteDireccionEntrega condition links the inner query to the outer query.
There are multiple ways of doing the above, especially if you don't care which of the relevant rows you match on. You can use the above as a starting point and make it match your preferred style.
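A minimal sketch reproducing the duplicate and the MAX(ID) fix in an in-memory SQLite database via Python's sqlite3. It assumes, as the answer does, that the SalesForce table has a distinct ID column; the rows and the Name values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SalesForce_INT_Account__c (
        ID INTEGER PRIMARY KEY,          -- assumed surrogate key
        ID_SAP_BAYER__c TEXT,
        Name TEXT
    );
    CREATE TABLE PedidosEspecialesZarateCabeceras (
        NroPedido TEXT PRIMARY KEY,
        NroClienteDireccionEntrega TEXT
    );
    INSERT INTO SalesForce_INT_Account__c VALUES
        (1, '3783513', 'Account A'), (2, '3783513', 'Account B');
    INSERT INTO PedidosEspecialesZarateCabeceras VALUES ('5320', '3783513');
""")

# Original filter: the IN list holds every distinct ID_SAP_BAYER__c,
# so it removes nothing and both accounts still match.
original = conn.execute("""
    SELECT cab.NroPedido, sales.Name
    FROM PedidosEspecialesZarateCabeceras cab
    LEFT OUTER JOIN SalesForce_INT_Account__c sales
      ON cab.NroClienteDireccionEntrega = sales.ID_SAP_BAYER__c
     AND sales.ID_SAP_BAYER__c IN (SELECT MAX(ID_SAP_BAYER__c)
                                   FROM SalesForce_INT_Account__c
                                   GROUP BY ID_SAP_BAYER__c)
    WHERE cab.NroPedido = '5320'
""").fetchall()

# Correlated filter: keep only the account with the highest ID for this customer.
fixed = conn.execute("""
    SELECT cab.NroPedido, sales.Name
    FROM PedidosEspecialesZarateCabeceras cab
    LEFT OUTER JOIN SalesForce_INT_Account__c sales
      ON cab.NroClienteDireccionEntrega = sales.ID_SAP_BAYER__c
     AND sales.ID IN (SELECT MAX(sales2.ID)
                      FROM SalesForce_INT_Account__c sales2
                      WHERE sales2.ID_SAP_BAYER__c = cab.NroClienteDireccionEntrega)
    WHERE cab.NroPedido = '5320'
""").fetchall()

print(len(original), fixed)
```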
An alternative might be to use OUTER APPLY with TOP 1. Something like:
SELECT
...
FROM PedidosEspecialesZarateCabeceras AS cab
OUTER APPLY(
SELECT TOP 1 *
FROM SalesForce_INT_Account__c s1
WHERE cab.NroClienteDireccionEntrega = s1.ID_SAP_BAYER__c
) sales
WHERE cab.NroPedido ='5320'
Without an ORDER BY, the match that TOP 1 chooses will be arbitrary, but I think that's what you want anyway. (If not, you could add an ORDER BY.)
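SQLite has neither OUTER APPLY nor TOP, so the sketch below swaps in a different technique with the same effect: ROW_NUMBER() numbers the accounts within each ID_SAP_BAYER__c, and the join keeps only row number 1. It assumes an SQLite build with window functions (3.25 or later) and a hypothetical ID column; the ORDER BY inside OVER plays the role of the optional ORDER BY mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SalesForce_INT_Account__c (ID INTEGER PRIMARY KEY, ID_SAP_BAYER__c TEXT, Name TEXT);
    CREATE TABLE PedidosEspecialesZarateCabeceras (NroPedido TEXT PRIMARY KEY, NroClienteDireccionEntrega TEXT);
    INSERT INTO SalesForce_INT_Account__c VALUES (1, '3783513', 'Account A'), (2, '3783513', 'Account B');
    -- order 5321 points at a customer with no SalesForce row
    INSERT INTO PedidosEspecialesZarateCabeceras VALUES ('5320', '3783513'), ('5321', '9999999');
""")

rows = conn.execute("""
    SELECT cab.NroPedido, sales.Name
    FROM PedidosEspecialesZarateCabeceras cab
    LEFT OUTER JOIN (
        -- rn = 1 marks the first account per customer, like TOP 1 ... ORDER BY ID
        SELECT s1.*,
               ROW_NUMBER() OVER (PARTITION BY s1.ID_SAP_BAYER__c ORDER BY s1.ID) AS rn
        FROM SalesForce_INT_Account__c s1
    ) sales
      ON sales.ID_SAP_BAYER__c = cab.NroClienteDireccionEntrega
     AND sales.rn = 1
    ORDER BY cab.NroPedido
""").fetchall()

print(rows)
```

Putting rn = 1 in the ON clause rather than in WHERE keeps the LEFT JOIN semantics, so order 5321 still appears with a NULL name.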

How do I display only one result (the highest) with SQL query? (Beginner)

I need help making the following query display only one result: the one with the MAX ProcurementCost.
Currently the query works, but it displays all results, not just the one matching the output of the MAX function:
SELECT SalesPeople.SalesPersonID, FirstName, LastName, Region, SalesRevenueYear1, ProcurementCost
FROM ProductRevenueAndCosts
INNER JOIN SalesPeople
ON ProductRevenueAndCosts.SalesPersonID = SalesPeople.SalesPersonID
WHERE SalesPeople.Region = 'Central' AND (
SELECT MAX (ProcurementCost)
FROM ProductRevenueAndCosts
WHERE SalesPeople.Region = 'Central'
)
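If you add a LIMIT 1 clause at the end of your SQL, then only the first record will be shown. If you add an ORDER BY column_name, then the results will be ordered by that column (add DESC to put the highest value first). Using these two together, e.g. ORDER BY ProcurementCost DESC LIMIT 1, is a quick way to get the max or min without having to worry about aggregate functions. Note that LIMIT is MySQL/PostgreSQL/SQLite syntax (SQL Server and MS Access use SELECT TOP 1 instead), and that LIMIT 1 returns only one row even if several rows tie for the max.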
https://www.w3schools.com/mysql/mysql_limit.asp
Otherwise, you can try aggregating the results with a max function:
https://www.w3schools.com/mysql/mysql_min_max.asp
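A minimal sketch of the ORDER BY ... DESC LIMIT 1 approach, run against an in-memory SQLite database from Python's sqlite3 module. The table layouts and rows are hypothetical, reconstructed only from the column names in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SalesPeople (SalesPersonID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT, Region TEXT);
    CREATE TABLE ProductRevenueAndCosts (SalesPersonID INTEGER, SalesRevenueYear1 REAL, ProcurementCost REAL);
    INSERT INTO SalesPeople VALUES (1, 'Ana', 'Ruiz', 'Central'), (2, 'Ben', 'Cole', 'Central'), (3, 'Cy', 'Diaz', 'North');
    INSERT INTO ProductRevenueAndCosts VALUES (1, 1000, 300), (2, 1500, 450), (3, 2000, 900);
""")

# Sort the Central rows by cost, highest first, and keep only the first one.
top = conn.execute("""
    SELECT sp.SalesPersonID, sp.FirstName, sp.LastName, prc.ProcurementCost
    FROM ProductRevenueAndCosts prc
    INNER JOIN SalesPeople sp ON prc.SalesPersonID = sp.SalesPersonID
    WHERE sp.Region = 'Central'
    ORDER BY prc.ProcurementCost DESC
    LIMIT 1
""").fetchall()

print(top)
```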
As mentioned, you need to correlate the subquery with the outer query. Be sure to use aliases to tell same-named columns apart, and make a habit of qualifying every column with a table name or alias, especially in JOIN queries:
SELECT sp.SalesPersonID, sp.FirstName, sp.LastName, sp.Region, sp.SalesRevenueYear1,
prc.ProcurementCost
FROM ProductRevenueAndCosts prc
INNER JOIN SalesPeople sp
ON prc.SalesPersonID = sp.SalesPersonID
WHERE sp.Region = 'Central'
AND prc.ProcurementCost = ( -- CORRELATE OUTER QUERY WITH SUBQUERY
SELECT MAX(prc2.ProcurementCost)
FROM ProductRevenueAndCosts prc2
INNER JOIN SalesPeople sp2
ON prc2.SalesPersonID = sp2.SalesPersonID
WHERE sp2.Region = sp.Region
)
Note: if you run this in MS Access, remove the comment, since Access SQL does not support comments.
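A minimal sketch of why the correlation matters, using an in-memory SQLite database via Python's sqlite3 and the same hypothetical tables and rows as before. An uncorrelated subquery compares against the maximum over all regions, so when that maximum belongs to another region no Central row qualifies; correlating on Region restricts the maximum to the outer row's region. Unlike LIMIT 1, this form also keeps every row that ties for the max.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SalesPeople (SalesPersonID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT, Region TEXT);
    CREATE TABLE ProductRevenueAndCosts (SalesPersonID INTEGER, SalesRevenueYear1 REAL, ProcurementCost REAL);
    INSERT INTO SalesPeople VALUES (1, 'Ana', 'Ruiz', 'Central'), (2, 'Ben', 'Cole', 'Central'), (3, 'Cy', 'Diaz', 'North');
    INSERT INTO ProductRevenueAndCosts VALUES (1, 1000, 300), (2, 1500, 450), (3, 2000, 900);
""")

# Global maximum (900) belongs to the North region, so nothing in Central matches.
uncorrelated = conn.execute("""
    SELECT sp.SalesPersonID, prc.ProcurementCost
    FROM ProductRevenueAndCosts prc
    INNER JOIN SalesPeople sp ON prc.SalesPersonID = sp.SalesPersonID
    WHERE sp.Region = 'Central'
      AND prc.ProcurementCost = (SELECT MAX(ProcurementCost) FROM ProductRevenueAndCosts)
""").fetchall()

# Correlated on sp.Region: the maximum is computed within the outer row's region.
correlated = conn.execute("""
    SELECT sp.SalesPersonID, prc.ProcurementCost
    FROM ProductRevenueAndCosts prc
    INNER JOIN SalesPeople sp ON prc.SalesPersonID = sp.SalesPersonID
    WHERE sp.Region = 'Central'
      AND prc.ProcurementCost = (
          SELECT MAX(prc2.ProcurementCost)
          FROM ProductRevenueAndCosts prc2
          INNER JOIN SalesPeople sp2 ON prc2.SalesPersonID = sp2.SalesPersonID
          WHERE sp2.Region = sp.Region)
""").fetchall()

print(uncorrelated, correlated)
```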

How To Get the Count of Records in SQL

I have 2 tables in my database: tblNews and tblNewsComments.
I want to select the 10 records from tblNews that have the most comments in tblNewsComments.
I used this query, but it gives an error:
SELECT tblNews.id,
tblNews.newsTitle,
tblNews.createdate,
tblNews.viewcount,
COUNT(tblNewsComments.id) AS comcounts
FROM tblNews
INNER JOIN tblNewsComments ON tblNews.id = tblNewsComments.newsID
GROUP BY tblNews.id
Try to replace
GROUP BY tblNews.id
With
GROUP BY tblNews.id,
tblNews.newsTitle,
tblNews.createdate,
tblNews.viewcount
All the expressions in the SELECT list should be in the GROUP BY or inside an aggregate function.
I've always found this to be an annoyance in SQL. There's nothing logically wrong with your query; you're grouping by news item and selecting various attributes of the news item, and then selecting the count of comments linked to the news item. That makes sense.
The error arises because the SQL engine isn't smart enough to realize that all the columns in tblNews are in the same data context, and that grouping by tblNews.id effectively guarantees that there will be only one newsTitle, createdate, and viewcount for each group. It should be able to realize that, I think, and carry out the query. But it doesn't; the only column it considers unique within the group is the exact column you grouped by, id. (Some engines do make this inference when id is the primary key: PostgreSQL 9.1 and later, and MySQL, accept such queries.)
One solution, as Multisync just posted, is to group by ALL the columns you want to include in the SELECT clause. I don't think this is the best solution, however: you shouldn't have to list all those columns in the GROUP BY clause, and it forces you to keep extending that list whenever you add another tblNews column to the SELECT clause.
The solution I've always used is to wrap each such column in an otherwise ineffectual aggregate function in the SELECT clause; I always use max():
select
tblNews.id,
max(tblNews.newsTitle),
max(tblNews.createdate),
max(tblNews.viewcount),
count(tblNewsComments.id) comcounts
from
tblNews
inner join tblNewsComments on tblNews.id=tblNewsComments.newsID
group by
tblNews.id
;
Or with a correlated subquery:
SELECT n.id,
n.newsTitle,
n.createdate,
n.viewcount,
(SELECT COUNT(*) FROM tblNewsComments c WHERE c.newsID = n.id) AS comcounts
FROM tblNews n
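The question actually asks for the 10 news items with the most comments, which the queries above do not yet restrict. A minimal sketch, using an in-memory SQLite database via Python's sqlite3 and hypothetical rows, sorts by the correlated count and limits the result. It uses LIMIT (SQLite/MySQL/PostgreSQL syntax); in SQL Server the equivalent would be SELECT TOP 10 ... ORDER BY comcounts DESC.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblNews (id INTEGER PRIMARY KEY, newsTitle TEXT, createdate TEXT, viewcount INTEGER);
    CREATE TABLE tblNewsComments (id INTEGER PRIMARY KEY, newsID INTEGER);
    INSERT INTO tblNews VALUES
        (1, 'First', '2024-01-01', 10),
        (2, 'Second', '2024-01-02', 20),
        (3, 'Third', '2024-01-03', 30);
    INSERT INTO tblNewsComments (newsID) VALUES (1), (2), (2), (2), (3), (3);
""")

TOP_N = 2  # the question asks for 10; 2 keeps this sample small

# Count comments per news item, most-commented first, and keep the first TOP_N.
top_news = conn.execute("""
    SELECT n.id, n.newsTitle,
           (SELECT COUNT(*) FROM tblNewsComments c WHERE c.newsID = n.id) AS comcounts
    FROM tblNews n
    ORDER BY comcounts DESC
    LIMIT ?
""", (TOP_N,)).fetchall()

print(top_news)
```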
Select only the column you group by, together with the aggregate; the other columns will not work because they are neither in the GROUP BY nor inside an aggregate function.
SELECT tblNews.id, COUNT(tblNewsComments.newsID) AS comcounts
FROM tblNews
INNER JOIN tblNewsComments ON tblNews.id = tblNewsComments.newsID
GROUP BY tblNews.id

COUNT(*) vs. COUNT(1) vs. COUNT(pk): which is better? [duplicate]

This question already has answers here:
Count(*) vs Count(1) - SQL Server
Closed 8 years ago.
I often find these three variants:
SELECT COUNT(*) FROM Foo;
SELECT COUNT(1) FROM Foo;
SELECT COUNT(PrimaryKey) FROM Foo;
As far as I can see, they all do the same thing, and I find myself using all three in my codebase. However, I don't like doing the same thing different ways. Which one should I stick with? Is any one of them better than the other two?
Bottom Line
Use either COUNT(field) or COUNT(*), and stick with it consistently, and if your database allows COUNT(tableHere) or COUNT(tableHere.*), use that.
In short, don't use COUNT(1) for anything. It's a one-trick pony: it rarely does what you want, and in those rare cases it is equivalent to count(*).
Use count(*) for counting
Use * for all queries that need to count everything, even for joins:
SELECT boss.boss_id, COUNT(subordinate.*)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.boss_id
But don't use COUNT(*) for LEFT JOINs, as it will return 1 for a boss even if the subordinate table has no matching row:
SELECT boss.boss_id, COUNT(*)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.boss_id
Don't be fooled by those who say that * in COUNT fetches the entire row from your table and is therefore slow. The * in SELECT COUNT(*) and the * in SELECT * have no bearing on each other; they are entirely different things that just share a common token, *.
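A minimal sketch of the LEFT JOIN pitfall, using an in-memory SQLite database via Python's sqlite3 with hypothetical rows. It compares COUNT(*), COUNT(1) and COUNT(subordinate.boss_id) for a boss with subordinates and one without; the PostgreSQL-specific COUNT(subordinate.*) form is left out because SQLite does not accept it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE boss (boss_id INTEGER PRIMARY KEY);
    CREATE TABLE subordinate (subordinate_id INTEGER PRIMARY KEY, boss_id INTEGER);
    INSERT INTO boss VALUES (1), (2);
    INSERT INTO subordinate VALUES (10, 1), (11, 1);   -- boss 2 has none
""")

# For boss 2 the LEFT JOIN yields one NULL-extended row:
# COUNT(*) and COUNT(1) count that row, COUNT(column) skips the NULL.
rows = conn.execute("""
    SELECT boss.boss_id,
           COUNT(*)                   AS count_star,
           COUNT(1)                   AS count_one,
           COUNT(subordinate.boss_id) AS count_col
    FROM boss
    LEFT JOIN subordinate ON subordinate.boss_id = boss.boss_id
    GROUP BY boss.boss_id
    ORDER BY boss.boss_id
""").fetchall()

print(rows)
```

COUNT(1) behaves exactly like COUNT(*) here, which is why only a column (or whole-row) reference gives 0 for the boss without subordinates.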
An alternate syntax
In fact, if it were not permitted to give a field the same name as its table, RDBMS language designers could have given COUNT(tableNameHere) the same semantics as COUNT(*). Example:
For counting rows we could have this:
SELECT COUNT(emp) FROM emp
And they could make it simpler:
SELECT COUNT() FROM emp
And for LEFT JOINs, we could have this:
SELECT boss.boss_id, COUNT(subordinate)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.boss_id
But they cannot do that (COUNT(tableNameHere)), since the SQL standard permits naming a field with the same name as its table:
CREATE TABLE fruit -- ORM-friendly name
(
fruit_id int NOT NULL,
fruit varchar(50), /* same name as table name,
and let's say, someone forgot to put NOT NULL */
shape varchar(50) NOT NULL,
color varchar(50) NOT NULL
)
Counting with null
Also, it is not good practice to make a field nullable if its name matches the table name. Say the fruit field holds the values 'Banana', 'Apple', NULL, 'Pears'. The following will not count all rows; it yields only 3, not 4:
SELECT count(fruit) FROM fruit
Some RDBMSs do follow that principle (accepting a table name as COUNT's argument to count the table's rows). This works in PostgreSQL, as long as there is no subordinate field in either of the two tables below, i.e. no name conflict between a field name and a table name:
SELECT boss.boss_id, COUNT(subordinate)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.boss_id
But that could cause confusion later if we add a subordinate field to the table, as it would then count the field (which could be nullable), not the table rows.
So to be on the safe side, use:
SELECT boss.boss_id, COUNT(subordinate.*)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.boss_id
count(1): The one-trick pony
COUNT(1) in particular is a one-trick pony: it works well only on single-table queries:
SELECT COUNT(1) FROM tbl
But when you use joins, that trick doesn't carry over to multi-table queries without muddling its semantics; in particular, you cannot write:
-- count the subordinates that belongs to boss
SELECT boss.boss_id, COUNT(subordinate.1)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.boss_id
So what's the meaning of COUNT(1) here?
SELECT boss.boss_id, COUNT(1)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.boss_id
Is it this...?
-- counting all the subordinates only
SELECT boss.boss_id, COUNT(subordinate.boss_id)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.boss_id
Or this...?
-- or is that COUNT(1) will also count 1 for boss regardless if boss has a subordinate
SELECT boss.boss_id, COUNT(*)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.boss_id
With careful thought, you can infer that COUNT(1) is the same as COUNT(*), regardless of the type of join. But for LEFT JOIN results, we cannot mold COUNT(1) to work like COUNT(subordinate.boss_id) or COUNT(subordinate.*).
So just use either of the following:
-- count the subordinates that belongs to boss
SELECT boss.boss_id, COUNT(subordinate.boss_id)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.boss_id
Works on PostgreSQL, and makes it clear that you want to count the cardinality of the set:
-- count the subordinates that belongs to boss
SELECT boss.boss_id, COUNT(subordinate.*)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.boss_id
Another way to count the cardinality of the set, very English-like (just don't create a column with the same name as its table): http://www.sqlfiddle.com/#!1/98515/7
select boss.boss_name, count(subordinate)
from boss
left join subordinate on subordinate.boss_code = boss.boss_code
group by boss.boss_name
You cannot do this: http://www.sqlfiddle.com/#!1/98515/8
select boss.boss_name, count(subordinate.1)
from boss
left join subordinate on subordinate.boss_code = boss.boss_code
group by boss.boss_name
You can do this, but this produces wrong result: http://www.sqlfiddle.com/#!1/98515/9
select boss.boss_name, count(1)
from boss
left join subordinate on subordinate.boss_code = boss.boss_code
group by boss.boss_name
Two of them always produce the same answer:
COUNT(*) counts the number of rows
COUNT(1) also counts the number of rows
Assuming the pk is a primary key and that no nulls are allowed in the values, then
COUNT(pk) also counts the number of rows
However, if pk is not constrained to be not null, then it produces a different answer:
COUNT(possibly_null) counts the number of rows with non-null values in the column possibly_null.
COUNT(DISTINCT pk) also counts the number of rows (because a primary key does not allow duplicates).
COUNT(DISTINCT possibly_null_or_dup) counts the number of distinct non-null values in the column possibly_null_or_dup.
COUNT(DISTINCT possibly_duplicated) counts the number of distinct (necessarily non-null) values in the column possibly_duplicated when that has the NOT NULL clause on it.
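A minimal sketch of the variants listed above, using an in-memory SQLite database via Python's sqlite3. The fruit table and its rows are hypothetical: fruit_id is the primary key, fruit is nullable and contains a duplicate, and color is NOT NULL with duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fruit (
        fruit_id INTEGER PRIMARY KEY,
        fruit TEXT,             -- nullable, may repeat
        color TEXT NOT NULL
    );
    INSERT INTO fruit VALUES
        (1, 'Banana', 'yellow'), (2, 'Apple', 'red'),
        (3, NULL, 'green'), (4, 'Pears', 'green'), (5, 'Apple', 'green');
""")

counts = conn.execute("""
    SELECT COUNT(*),               -- all rows
           COUNT(1),               -- all rows, same as COUNT(*)
           COUNT(fruit_id),        -- primary key is never NULL, so all rows
           COUNT(fruit),           -- skips the NULL
           COUNT(DISTINCT fruit),  -- distinct non-NULL values
           COUNT(DISTINCT color)   -- distinct values of a NOT NULL column
    FROM fruit
""").fetchone()

print(counts)
```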
Normally, I write COUNT(*); it is the original recommended notation for SQL. Similarly, with the EXISTS clause, I normally write WHERE EXISTS(SELECT * FROM ...) because that was the original recommended notation. There should be no benefit to the alternatives; the optimizer should see through the more obscure notations.
Asked and answered before...
Books Online says "COUNT ( { [ [ ALL | DISTINCT ] expression ] | * } )"
"1" is a non-null expression so it's the same as COUNT(*).
The optimiser recognises it as trivial so gives the same plan. A PK is unique and non-null (in SQL Server at least) so COUNT(PK) = COUNT(*)
This is a similar myth to the one about EXISTS (SELECT * ...) vs. EXISTS (SELECT 1 ...).
And see the ANSI-92 spec, section 6.5, General Rules, case 1:
a) If COUNT(*) is specified, then the result is the cardinality of T.
b) Otherwise, let TX be the single-column table that is the result of applying the <value expression> to each row of T and eliminating null values. If one or more null values are eliminated, then a completion condition is raised: warning-null value eliminated in set function.
At least on Oracle they are all the same: http://www.oracledba.co.uk/tips/count_speed.htm
I feel the performance characteristics change from one DBMS to another; it all depends on how each one chooses to implement it. Since I have worked extensively on Oracle, I'll answer from that perspective.
COUNT(*) - Fetches entire row into result set before passing on to the count function, count function will aggregate 1 if the row is not null
COUNT(1) - Will not fetch any row, instead count is called with a constant value of 1 for each row in the table when the WHERE matches.
COUNT(PK) - The PK in Oracle is indexed. This means Oracle has to read only the index. Normally one row in the index B+ tree is many times smaller than the actual row. So considering the disk IOPS rate, Oracle can fetch many times more rows from Index with a single block transfer as compared to entire row. This leads to higher throughput of the query.
From this you can see the first count is the slowest and the last count is the fastest in Oracle.

SQL GROUP BY/COUNT even if no results

I am attempting to get the information from one table (games) and count the entries in another table (tickets) that correspond to each entry in the first. I want each entry in the first table to be returned even if there aren't any entries in the second. My query is as follows:
SELECT g.*, count(*)
FROM games g, tickets t
WHERE (t.game_number = g.game_number
OR NOT EXISTS (SELECT * FROM tickets t2 WHERE t2.game_number=g.game_number))
GROUP BY t.game_number;
What am I doing wrong?
You need to do a left-join:
SELECT g.Game_Number, g.PutColumnsHere, count(t.Game_Number)
FROM games g
LEFT JOIN tickets t ON g.Game_Number = t.Game_Number
GROUP BY g.Game_Number, g.PutColumnsHere
Alternatively, I think this is a little clearer with a correlated subquery:
SELECT g.Game_Number, g.PutColumnsHere,
(SELECT COUNT(*) FROM Tickets t WHERE t.Game_Number = g.Game_Number) Tickets_Count
FROM Games g
Just make sure you check the query plan to confirm that the optimizer interprets this well.
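A minimal sketch comparing the approaches, using an in-memory SQLite database via Python's sqlite3. The name column and all rows are hypothetical stand-ins for PutColumnsHere. The comma join from the question drops the game without tickets, while the LEFT JOIN and the correlated subquery both report it with a count of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (game_number INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tickets (ticket_id INTEGER PRIMARY KEY, game_number INTEGER);
    INSERT INTO games VALUES (1, 'Final'), (2, 'Semi'), (3, 'Friendly');
    INSERT INTO tickets (game_number) VALUES (1), (1), (2);
""")

# Comma join behaves as an inner join: game 3 has no pair, so it vanishes.
inner = conn.execute("""
    SELECT g.game_number, COUNT(*)
    FROM games g, tickets t
    WHERE t.game_number = g.game_number
    GROUP BY g.game_number
    ORDER BY g.game_number
""").fetchall()

# LEFT JOIN keeps every game; COUNT(t.game_number) ignores the NULL-extended row.
left_join = conn.execute("""
    SELECT g.game_number, g.name, COUNT(t.game_number) AS tickets_count
    FROM games g
    LEFT JOIN tickets t ON g.game_number = t.game_number
    GROUP BY g.game_number, g.name
    ORDER BY g.game_number
""").fetchall()

# Correlated subquery: one count per game, 0 when no tickets match.
subquery = conn.execute("""
    SELECT g.game_number, g.name,
           (SELECT COUNT(*) FROM tickets t WHERE t.game_number = g.game_number) AS tickets_count
    FROM games g
    ORDER BY g.game_number
""").fetchall()

print(inner, left_join, subquery, sep="\n")
```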
You need to learn more about how to use joins in SQL:
SELECT g.*, count(t.game_number) -- count(*) would report 1 for games with no tickets
FROM games g
LEFT OUTER JOIN tickets t
USING (game_number)
GROUP BY g.game_number;
Note that unlike some database brands, MySQL permits you to list many columns in the select-list even if you only GROUP BY their primary key. As long as the columns in your select-list are functionally dependent on the GROUP BY column, the result is unambiguous.
Other brands of database (Microsoft, Firebird, etc.) give you an error if you list any columns in the select-list without including them in GROUP BY or in an aggregate function.
"FROM games g, tickets t" is the problem line. This comma join produces every game/ticket pair, and a WHERE clause can only filter those pairs, never add rows, so a game without tickets never gets a NULL-extended row of its own the way an outer join would give it. I think you want a LEFT OUTER JOIN.