Terminology am I doing a one to many or many to many? Left join? Right? - sql

I need to research this but am confused about the terminology of what I should be researching.
Table1 fields:
salenumber
category
quantity
price
Table2 fields:
field
category
requirement
Table3 fields:
field
salenumber
value
I need to combine these.
Essentially one (salenumber, category, quantity, Price) can have a dynamic number of "fields" containing unique "data associated with it. I'm a little confused as to the terminology of what I am doing here. I'm all mixed up with left and right joins and many to one and many to many databases. If I simply knew the term for what I am trying to do it would help me to narrow down my research.

Joins are for your queries - which you run to find specific records from your database. You aren't there yet. First, you want to do table design, which is where you consider many-to-many and one-to-many relationships.
Start by thinking about what the tables represent. For example, a single sale can involve multiple different products (e.g. you buy a fork, a spoon, a knife, and a new car). Each product can be in a different category (utensils or motor vehicles, in this case). In your table design, you would decide whether a product belongs to only a single category or to multiple categories.
Let's assume there's just one category per product - then you can have many products in one category (fork, spoon and knife are all utensils), but a product can have only one category. In this case, Category to Product is a One-to-Many relationship.
How these connect is where the related fields in tables come in to play - so in the Product table you have 'Category', which refers to the Category table ('Fork' is of Category 'Utensil', and in the Category table 'Utensil' is an entry with additional information).
You probably want to look up some basic database lessons to help you out. There are some good free online classes and resources - just search for info about databases.

This may help understand joins: http://blog.codinghorror.com/a-visual-explanation-of-sql-joins/
If you have a row in Table_A with some id, and multiple rows in Table_B related referencing that id you can do a LEFT OUTER JOIN (often just LEFT JOIN). This will match the data in Table_A to Table_B - but you will have as many rows as there were in Table_B!
If you want all of the Table_B data related to the Table_A row in one row result you need an aggregate function. In SQL this is normally something like array_agg. When you do this you need to tell it what to GROUP BY, in my example it would be Table_A.id

Related

How to understand this query?

SELECT DISTINCT
...
...
...
FROM Reviews Rev
INNER JOIN Reviews SubRev ON Subrev.W_ID=Rev.ID
WHERE Rev.Status='Approved'
This is a small part of a long query that I've been trying to understand for a day now. What is happening with the join? Reviews table appears to be joined with itself, under different aliases. Why is this done? What does it achieve? Also, ID field of the Reviews table is null for the entries that are nevertheless selected and returned. This is correct, but I don't understand how that can happen if the W_ID field is not null.
It allows you to join one row from the table to a different row in the table.
I've both seen this done, and used it myself, in cases where you maybe have a relationship between those rows.
Real-world examples:
An old version of a record and a newer version
Some sort of hierarchical relationship (e.g. if the table contains records of people, you can record that someone is a parent of someone else). There are probably plenty of other possible use cases, too.
SQL allows you to create a foreign key which relates between two different columns in the same table.

I am making a SQL database of categories and subcategories. What is the best way to link these tables?

The database has a table called "categories" with columns CATEGORY_ID(primary key) and CATEGORY_NAME.
I have subcategories for each category.
For better accessing which is the best method from the below methods.
Method 1: The "CATEGORY_ID" column in the "categories" table is a FOREIGN KEY in the "subcategories " table.
Method 2: Maintaining a separate table for each category representing the subcategories.
I prefer to use same table for category and sub category
like
Table Categories
[CATEGORY_ID, CATEGORY_NAME, PARENT_CATEGORY_ID]
In case you don't know how many sub categories are there.
This scenario is just an example, the scenario is as follows: We have a product table where all the records of products are stored. Same way we will have customer table where records of customers are stored. The daily sales keep the record of all the sales. This sales table will keep record of which product who has purchased. So linking is to be done from Sales table to product table and customer table.
The query to link the two tables is as follows:SELECT product_name, customer.name, date_of_sale FROM sales, product, customer WHERE product.product_id = sales.product_id and customer.customer_id >= sales.customer_id LIMIT 0, 30
It is better to go with Method 1 since it is more scalable.
Let me elaborate on this. If we go with method 1, we need to maintain 2 tables only that is Categories and Subcategories. In future if we have new categories or subcategories we can directly deal with this 2 tables.
If we consider same situation with Method2 then we need to create new tables every time, this may become maintenance overhead.
Let me be a bit more direct. You explain in a comment that Method 2 is a separate table for each category. If so, then Method 2 -- in general -- is just wrong.
There are two methods for storing this type of information. One is a Categories table with a (single) Subcategories table. The Subcategories table would have CategoryId, a foreign key reference back to Categories. This is the normalized data model.
The second method is to store everything in one table. Each row would be a category/subcategory combination. Information about a given category would be duplicated across multiple rows, so this is not a normalized approach. However, this is a typical approach when doing dimensional modeling for decision support systems.
If the subcategories are just names of things, there is a third approach, which would be to store a list of the subcategories within each Category row. The list would not be a delimited string. It would be JSON, a nested table, XML, array, or similar collection data type supported by the database you are using. I am mentioning this as a possibility, but not recommending it.

Retrieve data from two different table in a single report

I have two table Employee and Salary table, salary consists Salary of employee in a field named Salary_employee.
Second one is Extra Expense, Extra expense consists records related to extra expenses of a company like electricity bills,office maintenance in a field named extra_expense.
(Their is no relationship between these two table).
Finally, I just wanted to show all the expenses of company in a report, for this i need to group both the table. what to use here join or union ??.
If there is no relationship between the two tables, then this really cannot work since you dont know where the expense is supposed to tie into. You should redesign the database if possible as this sounds impossible based on your description.
UPDATE
OK, by the look of your screenshots, I am guessing that this database only stores one companies info? And not multiple?
IF that is correct, AND if all you want to do is squish the data together into one flowing report of expenses, then I would indeed suggest a UNION. A JOIN would not give you the flow you are looking for. A UNION will just smash the two outputs together into one...which I think is what you are asking for?
SELECT ext_amount AS amount, ext_date AS date_of_trans
FROM extra_expenses
UNION
SELECT sal_cash AS amount, sal_dateof_payment AS date_of_trans
FROM employee_salary
It sounds like you don't need to use group or join. Simply query both tables separately within a script and handle them both accordingly to their structure to produce a report.
Join and union are functions which you can use to extract different information on a common thing from separate tables. E.g. if you have a user whose private details are stored in one table, but their profile information is in another table. If you want to display both private details as well as profile info, you can join the two tables by the common user name in order to combine and gather all info on the user in one query.

SQL join basic questions

When I have to select a number of fields from different tables:
do I always need to join tables?
which tables do I need to join?
which fields do I have to use for the join/s?
do the joins effects reflect on fields specified in select clause or on where conditions?
Thanks in advance.
Think about joins as a way of creating a new table (just for the purposes of running the query) with data from several different sources. Absent a specific example to work with, let's imagine we have a database of cars which includes these two tables:
CREATE TABLE car (plate_number CHAR(8),
state_code CHAR(2),
make VARCHAR(128),
model VARCHAR(128),);
CREATE TABLE state (state_code CHAR(2),
state_name VARCHAR(128));
If you wanted, say, to get a list of the license plates of all the Hondas in the database, that information is already contained in the car table. You can simply SELECT * FROM car WHERE make='Honda';
Similarly, if you wanted a list of all the states beginning with "A" you can SELECT * FROM state WHERE state_name LIKE 'A%';
In either case, since you already have a table with the information you want, there's no need for a join.
You may even want a list of cars with Colorado plates, but if you already know that "CO" is the state code for Colorado you can SELECT * FROM car WHERE state_code='CO'; Once again, the information you need is all in one place, so there is no need for a join.
But suppose you want a list of Hondas including the name of the state where they're registered. That information is not already contained within a table in your database. You will need to "create" one via a join:
car INNER JOIN state ON (car.state_code = state.state_code)
Note that I've said absolutely nothing about what we're SELECTing. That's a separate question entirely. We also haven't applied a WHERE clause limiting what rows are included in the results. That too is a separate question. The only thing we're addressing with the join is getting data from two tables together. We now, in effect, have a new table called car INNER JOIN state with each row from car joined to each row in state that has the same state_code.
Now, from this new "table" we can apply some conditions and select some specific fields:
SELECT plate_number, make, model, state_name
FROM car
INNER JOIN state ON (car.state_code = state.state_code)
WHERE make = 'Honda'
So, to answer your questions more directly, do you always need to join tables? Yes, if you intend to select data from both of them. You cannot select fields from car that are not in the car table. You must first join in the other tables you need.
Which tables do you need to join? Whichever tables contain the data you're interested in querying.
Which fields do you have to use? Whichever fields are relevant. In this case, the relationship between cars and states is through the state_code field in both table. I could just as easily have written
car INNER JOIN state ON (state.state_code = car.plate_number)
This would, for each car, show any states whose abbreviations happen to match the car's license plate number. This is, of course, nonsensical and likely to find no results, but as far as your database is concerned it's perfectly valid. Only you know that state_code is what's relevant.
And does the join affect SELECTed fields or WHERE conditions? Not really. You can still select whatever fields you want and you can still limit the results to whichever rows you want. There are two caveats.
First, if you have the same column name in both tables (e.g., state_code) you cannot select it without clarifying which table you want it from. In this case I might write SELECT car.state_code ...
Second, when you're using an INNER JOIN (or on many database engines just a JOIN), only rows where your join conditions are met will be returned. So in my nonsensical example of looking for a state code that matches a car's license plate, there probably won't be any states that match. No rows will be returned. So while you can still use the WHERE clause however you'd like, if you have an INNER JOIN your results may already be limited by that condition.
Very broad question, i would suggest doing some reading on it first but in summary:
1. joins can make life much easier and queries faster, in a nut shell try to
2. the ones with the data you are looking for
3. a field that is in both tables and generally is unique in at least one
4. yes, essentially you are createing one larger table with joins. if there are two fields with the same name, you will need to reference them by table name.columnname
do I always need to join tables?
No - you could perform multiple selects if you wished
which tables do I need to join?
Any that you want the data from (and need to be related to each other)
which fields do I have to use for the
join/s?
Any that are the same in any tables within the join (usually primary key)
do the joins effects reflect on fields specified in select clause or on where conditions?
No, however outerjoins can cause problems
(1) what else but tables would you want to join in mySQL?
(2) those from which you want to correlate and retrieve fields (=data)
(3) best use indexed fields (unique identifiers) to join as this is fast. e.g. join
retrieve user-email and all the users comments in a 2 table db
(with tables: tableA=user_settings, tableB=comments) and both having the column uid to indetify the user by
select * from user_settings as uset join comments as c on uset.uid = c.uid where uset.email = "test#stackoverflow.com";
(4) both...

Architecture of SQL tables

I am wondering is it more useful and practical (size of DB) to create multiple tables in sql with two columns (one column containing foreign key and one column containing random data) or merge it and create one table containing multiple columns. I am asking this because in my scenario one product holding primary key could have sufficient/applicable data for only one column while other columns would be empty.
example a. one table
productID productname weight no_of_pages
1 book 130 500
2 watch 50 null
3 ring null null
example b. three tables
productID productname
1 book
2 watch
3 ring
productID weight
1 130
2 50
productID no_of_pages
1 500
The multi-table approach is more "normal" (in database terms) because it avoids columns that commonly store NULLs. It's also something of a pain in programming terms because you have to JOIN a bunch of tables to get your original entity back.
I suggest adopting a middle way. Weight seems to be a property of most products, if not all (indeed, a ring has a weight even if small and you'll probably want to know it for shipping purposes), so I'd leave that in the Products table. But number of pages applies only to a book, as do a slew of other unmentioned properties (author, ISBN, etc). In this example, I'd use a Products table and a Books table. The books table would extend the Products table in a fashion similar to class inheritance in object oriented program.
All book-specific properties go into the Books table, and you join only Products and Books to get a complete description of a book.
I think this all depends on how the tables will be used. Maybe your examples are oversimplifying things too much but it seems to me that the first option should be good enough.
You'd really use the second example if you're going to be doing extremely CPU intensive stuff with the first table and will only need the second and third tables when more information about a product is needed.
If you're going to need the information in the second and third tables most times you query the table, then there's no reason to join over every time and you should just keep it in one table.
I would suggest example a, in case there is a defined set of attributes for product, and an example c if you need variable number of attributes (new attributes keep coming every now and then) -
example c
productID productName
1 book
2 watch
3 ring
attrID productID attrType attrValue
1 1 weight 130
2 1 no_of_pages 500
3 2 weight 50
The table structure you have shown in example b is not normalized - there will be separate id columns required in second and third tables, since productId will be an fk and not a pk.
It depends on how many rows you are expecting on your PRODUCTS table. I would say that it would not make sense to normalize your tables to 3N in this case because product name, weight, and no_of_pages each describe the products. If you had repeating data such as manufacturers, it would make more sense to normalize your tables at that point.
Without knowing the background (data model), there is no way to tell which variant is more "correct". both are fine in certain scenarios.
You want three tables, full stop. That's best because there's no chance of watches winding up with pages (no pun intended) and some books without. If you normalize, the server works for you. If you don't, you do the work instead, just not as well. Up to you.
I am asking this because in my scenario one product holding primary key could have sufficient/applicable data for only one column while other columns would be empty.
That's always true of nullable columns. Here's the rule: a nullable column has an optional relationship to the key. A nullable column can always be, and usually should be, in a separate table where it can be non-null.