Linking Three Tables together - sql

I'm creating an archive for Academic Papers. Each paper may have one author, or multiple authors. I've created the tables in the following manner:
Table 1: PaperInfo - Each row contains information on the paper
Table 2: PaperAuthor - Contains only two columns: PaperID and AuthorID
Table 3: AuthorList - Contains Author Information.
There is also a Table 4, linked to Table 3, which contains a list of the universities each author belongs to, but I'm going to leave it out for now in case it gets too complicated.
I want a query that links all three tables together and displays the paper information as a table, with columns such as these:
Paper Title
Paper Authors
The "Paper Authors" column will contain more than one author in some cases.
I've written the following query:
SELECT a.*,b.*,c.*
FROM PaperInfo a, PaperAuthor b, AuthorList c
WHERE a.PaperID = b.PaperID AND b.AuthorID = c.AuthorID
So far, each row of the results contains only one author. I want all the authors of a paper in one column. Can this be done in any way?
Note: I'm using Access 2010 as my database.

In straight SQL, unfortunately, this isn't possible. You would need to use a procedural language to get the result you are after.

Since you mention you are using Access 2010, please refer to this question: is there a group_concat function in ms-access?
In particular, read the post that points to http://www.rogersaccesslibrary.com/forum/generic-function-to-concatenate-child-records_topic16&SID=453fabc6-b3z9-34z6zb14-a78f832z-19z89a2c.html
You will probably need to implement a custom function, but the second URL does what you are looking for.
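Since the answer points to doing the concatenation outside plain SQL, here is a minimal sketch of the same idea in Python rather than VBA: run the question's join, then collapse the one-author-per-row result into one row per paper. It assumes an in-memory SQLite database standing in for Access and hypothetical PaperTitle and AuthorName columns; it is not the function from the linked Access library.

```python
import sqlite3

# In-memory SQLite stands in for Access; PaperTitle and AuthorName are
# hypothetical column names, only PaperID and AuthorID come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PaperInfo   (PaperID INTEGER PRIMARY KEY, PaperTitle TEXT);
CREATE TABLE AuthorList  (AuthorID INTEGER PRIMARY KEY, AuthorName TEXT);
CREATE TABLE PaperAuthor (PaperID INTEGER, AuthorID INTEGER);
INSERT INTO PaperInfo   VALUES (1, 'Paper A'), (2, 'Paper B');
INSERT INTO AuthorList  VALUES (1, 'Smith'), (2, 'Jones'), (3, 'Lee');
INSERT INTO PaperAuthor VALUES (1, 1), (1, 2), (2, 3);
""")

# The question's three-table join: one row per (paper, author) pair.
rows = conn.execute("""
    SELECT a.PaperID, a.PaperTitle, c.AuthorName
    FROM PaperInfo a, PaperAuthor b, AuthorList c
    WHERE a.PaperID = b.PaperID AND b.AuthorID = c.AuthorID
    ORDER BY a.PaperID, c.AuthorName
""").fetchall()


def concat_authors(rows, sep=", "):
    """Collapse (paper_id, title, author) rows into one (title, authors) pair per paper."""
    papers = {}  # dicts keep insertion order, so papers stay in query order
    for paper_id, title, author in rows:
        papers.setdefault(paper_id, (title, []))[1].append(author)
    return [(title, sep.join(authors)) for title, authors in papers.values()]


result = concat_authors(rows)
for title, authors in result:
    print(f"{title} | {authors}")
```

The ORDER BY keeps each paper's rows together and the author names sorted, so the grouping loop only has to append.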

This functionality is not part of the SQL standard, but different vendors have their own solutions for it; see, for instance, Pivot Table with many to many table and MySQL pivot table.
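As an illustration of such a vendor-specific solution, here is a sketch using SQLite's group_concat aggregate through Python's sqlite3 module (MySQL's GROUP_CONCAT and PostgreSQL's string_agg play a similar role; Access has no built-in equivalent). The PaperTitle and AuthorName columns and the sample rows are hypothetical. SQLite does not guarantee the order of the concatenated names, so don't rely on it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PaperInfo   (PaperID INTEGER PRIMARY KEY, PaperTitle TEXT);
CREATE TABLE AuthorList  (AuthorID INTEGER PRIMARY KEY, AuthorName TEXT);
CREATE TABLE PaperAuthor (PaperID INTEGER, AuthorID INTEGER);
INSERT INTO PaperInfo   VALUES (1, 'Paper A'), (2, 'Paper B');
INSERT INTO AuthorList  VALUES (1, 'Smith'), (2, 'Jones'), (3, 'Lee');
INSERT INTO PaperAuthor VALUES (1, 1), (1, 2), (2, 3);
""")

# group_concat(X, separator) joins all AuthorName values of each group
# into a single string: one row per paper instead of one per author.
rows = conn.execute("""
    SELECT a.PaperTitle, group_concat(c.AuthorName, ', ') AS PaperAuthors
    FROM PaperInfo a
    JOIN PaperAuthor b ON b.PaperID = a.PaperID
    JOIN AuthorList  c ON c.AuthorID = b.AuthorID
    GROUP BY a.PaperID, a.PaperTitle
    ORDER BY a.PaperID
""").fetchall()

for title, authors in rows:
    print(f"{title} | {authors}")
```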

If you know the maximum number of authors per paper (for example 3 or 4), you could get away with a triple or quadruple left join.
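A sketch of the fixed-maximum approach for up to three authors, assuming a hypothetical AuthorOrder column (1, 2, 3) in PaperAuthor so that each LEFT JOIN can pick one author slot. It runs on SQLite via Python; Access additionally expects multiple joins to be nested in parentheses, so an Access version would need them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PaperInfo   (PaperID INTEGER PRIMARY KEY, PaperTitle TEXT);
CREATE TABLE AuthorList  (AuthorID INTEGER PRIMARY KEY, AuthorName TEXT);
-- AuthorOrder is a hypothetical column giving each author's position on the paper.
CREATE TABLE PaperAuthor (PaperID INTEGER, AuthorID INTEGER, AuthorOrder INTEGER);
INSERT INTO PaperInfo   VALUES (1, 'Paper A'), (2, 'Paper B');
INSERT INTO AuthorList  VALUES (1, 'Smith'), (2, 'Jones'), (3, 'Lee');
INSERT INTO PaperAuthor VALUES (1, 1, 1), (1, 2, 2), (2, 3, 1);
""")

# One pair of LEFT JOINs per author slot; empty slots come back as NULL (None).
rows = conn.execute("""
    SELECT p.PaperTitle, c1.AuthorName, c2.AuthorName, c3.AuthorName
    FROM PaperInfo p
    LEFT JOIN PaperAuthor b1 ON b1.PaperID = p.PaperID AND b1.AuthorOrder = 1
    LEFT JOIN AuthorList  c1 ON c1.AuthorID = b1.AuthorID
    LEFT JOIN PaperAuthor b2 ON b2.PaperID = p.PaperID AND b2.AuthorOrder = 2
    LEFT JOIN AuthorList  c2 ON c2.AuthorID = b2.AuthorID
    LEFT JOIN PaperAuthor b3 ON b3.PaperID = p.PaperID AND b3.AuthorOrder = 3
    LEFT JOIN AuthorList  c3 ON c3.AuthorID = b3.AuthorID
    ORDER BY p.PaperID
""").fetchall()

# Merge the author slots into one display column, skipping empty slots.
papers = [(title, ", ".join(n for n in names if n is not None)) for title, *names in rows]
print(papers)
```

A fourth author on any paper would silently be dropped, which is the trade-off of hard-coding the maximum.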

What you are after is an inner join.
An SQL JOIN clause is used to combine rows from two or more tables, based on a common field between them.
The most common type of join is SQL INNER JOIN (simple join). An SQL INNER JOIN returns all rows from multiple tables where the join condition is met.
http://www.w3schools.com/sql/sql_join.asp
You may want to combine the inner join with a GROUP BY to get one row per paper instead of one row per author.
The GROUP BY statement is used in conjunction with the aggregate functions to group the result-set by one or more columns.
http://www.w3schools.com/sql/sql_groupby.asp
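A sketch of the inner join combined with GROUP BY, run against an in-memory SQLite database instead of Access (PaperTitle and AuthorName are hypothetical column names). GROUP BY needs an aggregate to say what to do with the many authors; COUNT(*) is used here, which gives one row per paper with its number of authors but not the names themselves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PaperInfo   (PaperID INTEGER PRIMARY KEY, PaperTitle TEXT);
CREATE TABLE AuthorList  (AuthorID INTEGER PRIMARY KEY, AuthorName TEXT);
CREATE TABLE PaperAuthor (PaperID INTEGER, AuthorID INTEGER);
INSERT INTO PaperInfo   VALUES (1, 'Paper A'), (2, 'Paper B');
INSERT INTO AuthorList  VALUES (1, 'Smith'), (2, 'Jones'), (3, 'Lee');
INSERT INTO PaperAuthor VALUES (1, 1), (1, 2), (2, 3);
""")

# INNER JOINs produce one row per (paper, author); GROUP BY folds them per paper.
rows = conn.execute("""
    SELECT a.PaperTitle, COUNT(*) AS AuthorCount
    FROM PaperInfo a
    INNER JOIN PaperAuthor b ON b.PaperID = a.PaperID
    INNER JOIN AuthorList  c ON c.AuthorID = b.AuthorID
    GROUP BY a.PaperID, a.PaperTitle
    ORDER BY a.PaperID
""").fetchall()

print(rows)
```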

Related

Refer to different id's within the same query in SQL

I need to bring a lot of columns from several tables using LEFT JOIN. My starting point is the orders table, and I bring in the vendor name from "Address_table". Then I add another table with order details, and then the shipping information for each order detail.
My problem is that I need to bring in different records from "Address_table" for the other IDs stored in the shipment table, "origin_id" and "destination_id".
In other words, "address_id", "origin_id" and "destination_id" all refer to records in "Address_table". I brought in the first one, related to the vendor; how can I retrieve the other two?
Thanks in advance
Your question is not exactly clear in terms of the tables and their relationships. It is, however, clear what the problem is. You need to join against the same table twice using different columns.
In order to do that you need to use table aliases. For example, you can do:
select *
from shipment s
left join address_table a on a.address_id = s.origin_id
left join address_table b on b.address_id = s.destination_id
In this example, address_table is joined twice against shipment: the first time with the alias a, the second time with the alias b. The aliases let you tell the two copies apart, pick the right columns from each, and make each join use the column you need.
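To see the two aliases in action, here is a small runnable sketch on SQLite through Python. The city and shipment_id columns and the sample data are hypothetical; only address_id, origin_id and destination_id come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE address_table (address_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE shipment (shipment_id INTEGER PRIMARY KEY, origin_id INTEGER, destination_id INTEGER);
INSERT INTO address_table VALUES (1, 'Lisbon'), (2, 'Oslo'), (3, 'Rome');
INSERT INTO shipment VALUES (10, 1, 2), (11, 3, NULL);
""")

rows = conn.execute("""
    SELECT s.shipment_id,
           a.city AS origin_city,      -- from the first copy (alias a)
           b.city AS destination_city  -- from the second copy (alias b)
    FROM shipment s
    LEFT JOIN address_table a ON a.address_id = s.origin_id
    LEFT JOIN address_table b ON b.address_id = s.destination_id
    ORDER BY s.shipment_id
""").fetchall()

print(rows)
```

Because both joins are LEFT JOINs, a shipment with no destination still appears, with None in destination_city.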

SQL Query across two tables only show most recently updated result per tag address

I have two tables: violator_state and violator_tags
violator_state:
m_state_id
is_violating
m_translatedid
m_tag
m_violator_tag
This table holds the "tags" and always contains 10 rows in this case. Its purpose is to list each tag present, connect the full tag address (m_violator_tag) with its shorthand name (m_tag), and state whether it is in "violation". I need to use this table as a reference because of the link between m_violator_tag and m_tag.
violator_tags
m_violator_id
m_eval_time_from
m_eval_time_to
m_tag
m_tag_peers
m_tag_position
This table constantly has new rows added to it, holding information on which tags are in violation with a specific tag. So it would show T6 in violation with T1, T2, T9, etc.
I am looking to create a query which joins the two tables to show only the most recently updated (largest m_eval_time_from) for each tag.
I am using the following query to join the two tables, but although I expect m_translatedid and m_tag to match, they do not. I'm unsure why.
SELECT violator_state.m_violator_tag, violator_state.is_violating, violator_state.m_translatedid, violator_tags.m_tag, violator_tags.m_eval_time_to, violator_tags.m_tag_peers,
violator_tags.m_tag_position, violator_tags.m_eval_time_from
FROM violator_tags CROSS JOIN
violator_state
[Screenshots: Violation_state table, violation_tags table, results of my (incorrect) query]
Any suggestions on what I should try?
Your CROSS JOIN will give you a cartesian product where EVERY row in the first table is paired with ALL the rows in the second table e.g. if you have 10 rows in each, you will get 10 x 10 = 100 rows in the result! I believe you need to join the tables on the m_tag column and select the violator_tags row with the latest date. The query below should do this for you (though you haven't provided your question in a manner that makes it easy for me to double-check my code - see the link provided by a_horse_with_no_name for more on this or use a website like db-fiddle to set up your example).
SELECT vs.m_violator_tag,
vs.is_violating,
vs.m_translatedid,
vt.m_tag,
vt.m_eval_time_to,
vt.m_tag_peers,
vt.m_tag_position,
vt.m_eval_time_from
FROM violator_tags vt
JOIN violator_state vs
ON vt.m_tag = vs.m_tag
AND vt.m_eval_time_from = (SELECT MAX(vt2.m_eval_time_from)
FROM violator_tags vt2
WHERE vt2.m_tag = vt.m_tag)
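A runnable sketch of this latest-row-per-tag join on SQLite, with made-up data and integer timestamps standing in for real times (only a subset of the columns is used). The subquery's table gets its own alias (vt2) so that MAX is computed over the inner rows for the outer row's tag; if two rows share the same maximum m_eval_time_from, both are returned. The CROSS JOIN count at the end illustrates the Cartesian product described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE violator_state (m_tag TEXT, m_violator_tag TEXT, is_violating INTEGER);
CREATE TABLE violator_tags  (m_tag TEXT, m_eval_time_from INTEGER, m_tag_peers TEXT);
INSERT INTO violator_state VALUES ('T1', 'site/area/T1', 1), ('T6', 'site/area/T6', 1);
INSERT INTO violator_tags VALUES
    ('T1', 100, 'T6'),
    ('T6', 100, 'T1'),
    ('T6', 200, 'T1,T2,T9');
""")

# Keep only the violator_tags row with the largest m_eval_time_from per tag.
rows = conn.execute("""
    SELECT vs.m_violator_tag, vt.m_tag, vt.m_eval_time_from, vt.m_tag_peers
    FROM violator_tags vt
    JOIN violator_state vs
      ON vt.m_tag = vs.m_tag
     AND vt.m_eval_time_from = (SELECT MAX(vt2.m_eval_time_from)
                                FROM violator_tags vt2
                                WHERE vt2.m_tag = vt.m_tag)
    ORDER BY vt.m_tag
""").fetchall()

# For comparison: CROSS JOIN pairs every row with every row (3 x 2 here).
cross_count = conn.execute(
    "SELECT COUNT(*) FROM violator_tags CROSS JOIN violator_state"
).fetchone()[0]

print(rows)
print(cross_count)
```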

Select names of authors who haven't written a book

We have to select authors who haven't written a book, but there are 3 different tables, which confuses me about how to write the join expression.
We have tables:
authors: author_id
authorships: author_id, book_id
books: book_id.
Obviously I selected the names from authors and tried an inner join, but it won't work for me. Help would be appreciated!
Since this sounds like a school assignment I won't give the full answer.
Try using an outer join between authors and authorships. Make sure you retrieve the book ID from authorships.
Then work out what a row looks like for an author who has not published anything. You can use this to write the query you are looking for with an appropriate WHERE clause.
This is a good spot to use the LEFT JOIN anti-join pattern:
SELECT a.*
FROM authors a
LEFT JOIN authorships s ON s.author_id = a.author_id
WHERE s.author_id IS NULL
Rationale: when the LEFT JOIN comes up empty, the author has no corresponding record in the authorships table. The WHERE clause keeps only those unmatched author records (i.e. authors with no books). This is called an anti-join because the purpose of a JOIN is usually to match records, whereas here we use it to detect unmatched records.
It's really easy: check which columns hold common values across these three tables. Where a column is shared by at least two tables, put an inner join between those two, and use an outer join for the table whose data may not match.
Remember that aliases always matter when you join different tables, and that the ON and WHERE clauses must be written correctly.
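A small SQLite check of the LEFT JOIN anti-join, next to the equivalent NOT EXISTS form; the name column and sample rows are hypothetical. The books table isn't needed for this question, since authorships alone records who has written something.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors     (author_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE books       (book_id INTEGER PRIMARY KEY);
CREATE TABLE authorships (author_id INTEGER, book_id INTEGER);
INSERT INTO authors     VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO books       VALUES (100), (101);
INSERT INTO authorships VALUES (1, 100), (1, 101), (3, 101);
""")

# LEFT JOIN anti-join: unmatched authors get NULL in s.author_id.
anti_join = conn.execute("""
    SELECT a.name
    FROM authors a
    LEFT JOIN authorships s ON s.author_id = a.author_id
    WHERE s.author_id IS NULL
    ORDER BY a.name
""").fetchall()

# Same question phrased with NOT EXISTS.
not_exists = conn.execute("""
    SELECT a.name
    FROM authors a
    WHERE NOT EXISTS (SELECT 1 FROM authorships s WHERE s.author_id = a.author_id)
    ORDER BY a.name
""").fetchall()

print(anti_join, not_exists)
```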

SQL: different results when using wildcard?

Using PostgreSQL 9.6.12.
Given an author has many blog posts.
When I run the following query I get a row for each associated post.
SELECT authors.id
FROM authors
LEFT JOIN posts ON authors.id = posts.author_id
When I run the following, I only get a row for each author:
SELECT authors.*
FROM authors
LEFT JOIN posts ON authors.id = posts.author_id
When I run a count on either one, however, I get the higher row count. E.g. the count of all the posts.
Why don't I get the higher row count result when I use the wildcard to select all the columns?
The problem could be caused by how you are running the query, and the settings of the IDE. These queries should return the same row count. Please run the following queries to check.
select count(*) from (SELECT authors.id
FROM authors
LEFT JOIN posts ON authors.id = posts.author_id) AS t1;
select count(*) from (SELECT authors.*
FROM authors
LEFT JOIN posts ON authors.id = posts.author_id) AS t2;
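The same check can be reproduced outside an IDE, for example with Python's sqlite3 module (SQLite rather than PostgreSQL 9.6, with made-up sample data). Both counts come out equal, and they differ from the number of posts because the LEFT JOIN also keeps authors without posts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE posts   (id INTEGER PRIMARY KEY, author_id INTEGER);
INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO posts   VALUES (1, 1), (2, 1), (3, 2);
""")


def count(sql):
    """Count the rows a query returns by wrapping it in an aliased subquery."""
    return conn.execute(f"SELECT COUNT(*) FROM ({sql}) AS t").fetchone()[0]


ids_only = count("SELECT authors.id FROM authors LEFT JOIN posts ON authors.id = posts.author_id")
all_cols = count("SELECT authors.* FROM authors LEFT JOIN posts ON authors.id = posts.author_id")
post_count = count("SELECT * FROM posts")

print(ids_only, all_cols, post_count)
```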
Why don't I get a cartesian product result when I use the wildcard to select all the columns?
You do not get a cartesian product in either of the two SQL queries.
When I run a count on either one, however, I get the cartesian product number of rows. E.g. the count of all the posts.
You are not calculating the count of all the posts. You are counting one row for each post that has a matching author, plus one row for each author without posts (the LEFT JOIN keeps those with NULL post columns).
I am afraid you are misusing the term Cartesian product. A Cartesian product pairs every row of the first table with every row of the second table, without any limiting clause or condition, so its row count is the first table's row count times the second's. In simple SQL it corresponds, for example, to:
SELECT * FROM authors, posts
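For contrast, a sketch counting the Cartesian product against the LEFT JOIN on the same hypothetical data (three authors, four posts) in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE posts   (id INTEGER PRIMARY KEY, author_id INTEGER);
INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO posts   VALUES (1, 1), (2, 1), (3, 2), (4, 2);
""")

# No join condition: every author paired with every post (3 x 4).
cartesian = conn.execute("SELECT COUNT(*) FROM authors, posts").fetchone()[0]

# LEFT JOIN: one row per matching (author, post), plus Cy with NULL post columns.
left_join_rows = conn.execute("""
    SELECT COUNT(*) FROM authors LEFT JOIN posts ON authors.id = posts.author_id
""").fetchone()[0]

print(cartesian, left_join_rows)
```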
The two queries in your question return the exact same rows, except that the first query displays only the column id of the authors table while the second displays all the columns of the authors table.
This is standard SQL, and I am very confident that any database that respects the SQL standard behaves as described above.
I hope you see what I mean, and I suggest you review the question. It may help to show a concrete example; in particular, you would have to clarify:
what do you mean by "cartesian product"? (your definition differs from the common usage)
how do you count rows? (according to your example I find it hard to believe you count different number of rows; they must be equal)

How do you JOIN tables to a view using a Vertica DB?

Good morning/afternoon! I was hoping someone could help me out with something that probably should be very simple.
Admittedly, I’m not the strongest SQL query designer. That said, I’ve spent a couple hours beating my head against my keyboard trying to get a seemingly simple three way join working.
NOTE: I'm querying a Vertica DB.
Here is my query:
SELECT A.CaseOriginalProductNumber, A.CaseCreatedDate, A.CaseNumber, B.BU2_Key as BusinessUnit, C.product_number_desc as ModelNumber
FROM pps_sfdc.v_Case A
INNER JOIN reference_data.DIM_PRODUCT_LINE_HIERARCHY B
ON B.PL_Key = A.CaseOriginalProductLine
INNER JOIN reference_data.DIM_PRODUCT C
ON C.product_line_code = A.CaseOriginalProductLine
WHERE B.BU2_Key = 'XWT'
LIMIT 20
I have a view (v_Case) that I’m trying to join to two other tables so I can lookup a value from each of them. The above query returns identical data on everything EXCEPT the last column (see below). It's like it's iterating through the last column to pull out the unique entries, sort of like a "GROUP BY" clause. What SHOULD be happening is that I get unique rows with specific "BusinessUnit" and "ModelNumber" for that record.
DUMEPRINT 5/2/2014 8:56:27 AM 3002845327 JJT Product 1
DUMEPRINT 5/2/2014 8:56:27 AM 3002845327 JJT Product 2
DUMEPRINT 5/2/2014 8:56:27 AM 3002845327 JJT Product 3
DUMEPRINT 5/2/2014 8:56:27 AM 3002845327 JJT Product 4
I modeled my solution after this post:
How to deal with multiple lookup tables for beginners of SQL?
What am I doing wrong?
Thank you for any help you can provide.
This is a data issue. A general rule when troubleshooting these: the column that differs between otherwise identical records (in this case C.product_number_desc as ModelNumber) is generally where the issue is, which is why I pointed you toward dim_product.
If you receive duplicates, the query below will help identify whether this table is causing them. Remember that the key in this statement can be multiple fields, i.e. whatever you are joining the table on:
SELECT product_line_code, COUNT(*) FROM reference_data.DIM_PRODUCT GROUP BY product_line_code HAVING COUNT(*) > 1
Another tip for the future: don't assume it's your code. Duplicates like this almost always point to dirty data (the other possibility is that you are causing cross joins because the join keys are not correct). If you comment out the 'c' table and the column referencing it in the SELECT clause, you would get one row, which shows that your duplicates were coming from the 'c' table.
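A minimal SQLite reproduction of the fan-out, with hypothetical table names, data and product line code modelled loosely on the query: when the lookup table has several rows per join key, each case row is repeated once per match, and a GROUP BY ... HAVING COUNT(*) > 1 query flags that key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE v_case      (CaseNumber TEXT, CaseOriginalProductLine TEXT);
CREATE TABLE dim_product (product_number_desc TEXT, product_line_code TEXT);
INSERT INTO v_case VALUES ('3002845327', 'PL01');
INSERT INTO dim_product VALUES
    ('Product 1', 'PL01'), ('Product 2', 'PL01'),
    ('Product 3', 'PL01'), ('Product 4', 'PL01');
""")

# Joining on a non-unique key repeats the single case once per product.
joined = conn.execute("""
    SELECT A.CaseNumber, C.product_number_desc
    FROM v_case A
    INNER JOIN dim_product C ON C.product_line_code = A.CaseOriginalProductLine
""").fetchall()

# Duplicate-key check on the lookup table's join column.
dupes = conn.execute("""
    SELECT product_line_code, COUNT(*)
    FROM dim_product
    GROUP BY product_line_code
    HAVING COUNT(*) > 1
""").fetchall()

print(len(joined), dupes)
```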
Good luck with it