SQL: different results when using wildcard? - sql

Using PostgreSQL 9.6.12.
Given an author has many blog posts.
When I run the following query I get a row for each associated post.
SELECT authors.id
FROM authors
LEFT JOIN posts ON authors.id = posts.author_id
When I run the following, I only get a row for each author:
SELECT authors.*
FROM authors
LEFT JOIN posts ON authors.id = posts.author_id
When I run a count on either one, however, I get the higher row count. E.g. the count of all the posts.
Why don't I get the higher row count result when I use the wildcard to select all the columns?

The problem could be caused by how you are running the query, and the settings of the IDE. These queries should return the same row count. Please run the following queries to check.
select count(*) from (SELECT authors.id
FROM authors
LEFT JOIN posts ON authors.id = posts.author_id)
select count(*) from (SELECT authors.*
FROM authors
LEFT JOIN posts ON authors.id = posts.author_id)

Why don't I get a cartesian product result when I use the wildcard to
select all the columns?
You do not get a cartesian product in either of the two SQL queries.
When I run a count on either one, however, I get the cartesian product
number of rows. E.g. the count of all the posts.
You are not calculating the count of all the posts. You are retrieving all posts that have an author in the authors table.
I am afraid you are confusing the term cartesian product. A cartesian product is the number of rows in the first table times the number of rows in the second table, without and limiting clause/condition. In simple SQL it would correspond to the following e.g.:
SELECT * FROM authors, posts
The two queries in your question return the exact same rows, except that the first query displays only the column id of the authors table while the second displays all the columns of the authors table.
This is standard SQL and I am very confident that every technology respecting the SQL standard would respect the above said.
I hope you see what I mean and suggest that you review the question. It may help if you can show some concrete example, in particular you would have to clarify:
what do you mean by "cartesian product"? (your definition differs from the common usage)
how do you count rows? (according to your example I find it hard to believe you count different number of rows; they must be equal)

Related

Counting Unique IDs In a LEFT/RIGHT JOIN Query in Access

I am working on a database to track staff productivity. Two of the ways we do that is by monitoring the number of orders they fulfil and by tracking their error rate.
Each order they finish is recorded in a table. In one day they can complete many orders.
It is also possible for a single order to have multiple errors.
I am trying to create a query that provides a summary of their results. This query should have one column with "TotalOrders" and another with "TotalErrors".
I connect the two tables with a LEFT/RIGHT join since not all orders will have errors.
The problem comes when I want to total the number of orders. If someone made multiple mistakes on an order, that order gets counted multiple times; once for each error.
I want to modify my query so that when counting the number of orders it only counts records with distinct OrderID's; yet, in the same query, also count the total errors without losing any.
Is this possible?
Here is my SQL
SELECT Count(tblTickets.TicketID) AS TotalOrders,
Count(tblErrors.ErrorID) AS TotalErrors
FROM tblTickets
LEFT JOIN tblErrors ON tblTickets.TicketID = tblErrors.TicketID;
I have played around with SELECT DISTINCT and UNION but am struggling with the correct syntax in Access. Also, a lot of the examples I have seen are trying to total a single field rather than two fields in different ways.
To be clear when totalling the OrderCount field I want to only count records with DISTINCT TicketID's. When totalling the ErrorCount field I want to count ALL errors.
Ticket = Order.
Query Result: Order Count Too High
Ticket/Order Table: Total of 14 records
Error Table: You can see two errors for the same order on 8th
do a query that counts orders by staff from Orders table and a query that counts errors by staff from Errors table then join those two queries to Staff table (queries can be nested for one long SQL statement)
correlated subqueries
SELECT Staff.*,
(SELECT Count(*) FROM Orders WHERE StaffID = Staff.ID) AS CntOrders,
(SELECT Count(*) FROM Errors WHERE StaffID = Staff.ID) AS CntErrors
FROM Staff;
use DCount() domain aggregate function
Option 1 is probably the most efficient and option 3 the least.

SQL Query across two tables only show most recently updated result per tag address

I have two tables: violator_state and violator_tags
violator_state:
m_state_id
is_violating
m_translatedid
m_tag
m_violator_tag
This table holds the "tags" which has an unchanging row count of 10 in this case. The purpose is to list out each tag present, connect the full tag address (m_violator_tag) with its shorthand name (m_tag) and state whether it is in "violation". I need to use this table as reference because of the link between m_violator_tag and m_tag.
violator_tags
m_violator_id
m_eval_time_from
m_eval_time_to
m_tag
m_tag_peers
m_tag_position
This table is constantly having new rows added to it holding the information of what tags are in violation with a specific tag. So it would show T6 in violation with T1,T2,T9 ect.
I am looking to create a query which joins the two tables to show only the most recently updated (largest m_eval_time_from) for each tag.
I am using the following query to join the two tables but I expect m_translatedid and m_tag to match but they do not. Unsure why.
SELECT violator_state.m_violator_tag, violator_state.is_violating, violator_state.m_translatedid, violator_tags.m_tag, violator_tags.m_eval_time_to, violator_tags.m_tag_peers,
violator_tags.m_tag_position, violator_tags.m_eval_time_from
FROM violator_tags CROSS JOIN
violator_state
Violation_state table
violation_tags table
results of my (incorrect) query
Any suggestions on what I should try?
Your CROSS JOIN will give you a cartesian product where EVERY row in the first table is paired with ALL the rows in the second table e.g. if you have 10 rows in each, you will get 10 x 10 = 100 rows in the result! I believe you need to join the tables on the m_tag column and select the violator_tags row with the latest date. The query below should do this for you (though you haven't provided your question in a manner that makes it easy for me to double-check my code - see the link provided by a_horse_with_no_name for more on this or use a website like db-fiddle to set up your example).
SELECT vs.m_violator_tag,
vs.is_violating,
vs.m_translatedid,
vt.m_tag,
vt.m_eval_time_to,
vt.m_tag_peers,
vt.m_tag_position,
vt.m_eval_time_from
FROM violator_tags vt
JOIN violator_state vs
ON vt.m_tag = vs.m_tag
AND vt.m_eval_time_from = (SELECT MAX(vt.m_eval_time_from)
FROM violator_tags
WHERE m_tag = vt.m_tag)

Difference in where clause and join

im very new to SQL and currently working with joins the first time in my life. What I am trying to figure out right now is trying to get the difference between to queries.
Query 1:
SELECT name
FROM actor
JOIN casting ON id = actorid
where (SELECT COUNT(ord) FROM casting join actor on actorid = actor.id AND ord=1) >= 30
GROUP BY name
Query 2:
SELECT name
FROM actor
JOIN casting ON id = actorid
AND (SELECT COUNT(ord) FROM casting WHERE actorid = actor.id AND ord=1)>=30)
GROUP BY name
So I would think that doing
FROM casting join actor on actorid = actor.id
in the subquery is the same as
FROM casting WHERE actorid = actor.id.
But apparently it is not. Could anyone help me out and explain why?
Edit: If anyone is wondering: The queries are based on question 13 from http://sqlzoo.net/wiki/More_JOIN_operations
Actually, the part that really looks like a "where" statement is only what's after the keyword ON. We sometimes fall on queries performing some data filtering directly at this stage, but its actual purpose is to specify the criteria used
A "join" is a very common operation that consists of associating the rows of two distinct tables according to a common criteria. For example, if you have, on one side, a table containing a client list in which each of them has a unique client number, and on a other side a order list table in which each order contains the client's number, then you may want to "resolve" the number of the latter table into its name, address, and so on.
Before SQL92 (26 years ago), the only way to achieve this was to write something like this :
SELECT client.name,
client.adress,
orders.product,
orders.totalprice
FROM client,orders
WHERE orders.clientNumber = client.clientNumber
AND orders.totalprice > 100.00
Selecting something from two (or more) tables induces a "cartesian product" which actually consists of associating every row from the first set which every row of the second one. This means that if your first table contains 3 rows and the second one 8 rows, the resulting set would be 24-row wide. And out of these, you use the WHERE clause to exclude basically everything and retain only rows in which the client number is the same on both side.
We understand that the size of the resulting set before filtering can grow exponentially if the contents of the different tables exceed a few rows (which is always the case) and it can get even worse if you imply more than two tables. Also, on the programmer's side, it rapidly becomes rather unreadable.
Therefore, if this is what you actually want to do, you now can explicitly tell the server about it, and specify the criteria at first, which will avoid unnecessary growing temporary subsets, while still letting you filter the results with WHERE if needed.
SELECT client.name,
client.adress,
orders.product,
orders.totalprice
FROM client
JOIN orders
ON orders.clientNumber = client.clientNumber
WHERE orders.totalprice > 100.00
It becomes critical when performing multiple JOIN in a single query, especially when performing both INNER and OUTER joins.
In the 2nd query your nested query takes the actor.id from its root query and only counts the results from that. In the 1st query your nested query counts results from all actors instead of only the specified one.

Linking Three Tables together

I'm creating an archive for Academic Papers. Each paper may have one author, or multiple authors. I've created the tables in the following manner:
Table 1: PaperInfo - Each row contains information on the paper
Table 2: PaperAuthor - Only Two Columns: contains PaperID, and AuthorID
Table 3: AuthorList - Contains Author Information.
There is also a Table 4 which is linked to Table 4, which contains a list of Universities which the author belongs to, but I'm going to leave it out for now in case it gets too complicated.
I wish to have a Query which will link all three tables together, and display Paper Information of the recordset in a table, with columns such as these:
Paper Title
Paper Authors
The column "Paper Authors" is going to contain more than one authors in some cases.
I've wrote the following query:
SELECT a.*,b.*,c.*
FROM PaperInfo a, PaperAuthor b, AuthorList c
WHERE a.PaperID = b.PaperID AND b.AuthorID = c.AuthorID
So far, the results I've been getting for each row is one author per row. I wish to contain more authors in one column. Can this be done in anyway?
Note: I'm using Access 2010 as my database.
In straight SQL the answer unfortunately is that it isn't possible. You would need to use a processing language in order to get the result you are after.
Since you mention you are using Access 2010 please refer to this question: is there a group_concat function in ms-access?
Particularly, read the post which points to http://www.rogersaccesslibrary.com/forum/generic-function-to-concatenate-child-records_topic16&SID=453fabc6-b3z9-34z6zb14-a78f832z-19z89a2c.html
You probably need to implement a custom function but the 2nd url does what you are looking for.
This functionality is not part of the SQL standard, but different vendors have solutions for it, see for instance Pivot Table with many to many table, MySQL pivot table.
If you know the maximum number of authors per paper (for example 3 or 4), you could get away with a triple or quadruple left join.
What you are after is an inner join.
An SQL JOIN clause is used to combine rows from two or more tables, based on a common field between them.
The most common type of join is: SQL INNER JOIN (simple join). An SQL INNER JOIN return all rows from multiple tables where the join
condition is met.
http://www.w3schools.com/sql/sql_join.asp
You may want to combine the inner join with a group to give you 1 paper to many authors in your results.
The GROUP BY statement is used in conjunction with the aggregate
functions to group the result-set by one or more columns.
http://www.w3schools.com/sql/sql_groupby.asp

Inner join sql statement

I have two tables, Invoices and members, connected by PK/FK relationship through the field InvoiceNum. I have created the following sql and it works fine, and pulls 44 records as expected.
SELECT
INVOICES.InvoiceNum,
INVOICES.GroupNum,
INVOICES.DivisionNum,
INVOICES.DateBillFrom,
INVOICES.DateBillTo
FROM INVOICES
INNER JOIN MEMBERS ON INVOICES.InvoiceNum = MEMBERS.InvoiceNum
WHERE MEMBERS.MemberNum = '20032526000'
Now, I want to replace INVOICES.GroupNum and INVOICES.DivisionNum in the above query with GroupName and DivisionName. These values are present in the Groups and Divisions tables which also have the corresponding Group_num and Division_num fields. I have created the following sql. The problem is that it now pulls 528 records instead of 44!
SELECT
INVOICES.InvoiceNum,
INVOICES.DateBillFrom,
INVOICES.DateBillTo,
DIVISIONS.DIVISION_NAME,
GROUPS.GROUP_NAME
FROM INVOICES
INNER JOIN MEMBERS ON INVOICES.InvoiceNum = MEMBERS.InvoiceNum
INNER JOIN GROUPS ON INVOICES.GroupNum = GROUPS.Group_Num
INNER JOIN DIVISIONS ON INVOICES.DivisionNum = DIVISIONS.Division_Num
WHERE MEMBERS.MemberNum = '20032526000'
Any help is greatly appreciated.
You have at least one relation between your tables which is missing in your query. It gives you extra records. Find all common fields. Say, are divisions related to groups?
The statement is fine, as far as the SQL syntax goes.
But the question you have to ask yourself (and answer it):
How many rows in Groups do you get for that given GroupNum?
Ditto for Divisions - how many rows exist for that DivisionNum?
It would appear that those numbers aren't unique - multiple rows exist for each number - therefore you get multiple rows returned