Matching Entities - Self many to many - sql

Working with a new master data management product, specifically people matching.
I have two tables: Person and PersonMatch. PersonMatch is a join table that matches rows from Person to another row in Person.
Person: 1,2,3,4 (Per the PersonMatch, these are all the same Person).
PersonMatch: 1+2, 2+3, 3+4, 4+1
I can't wrap my head around a query to treat all four entities from the Person table as the same. Thanks for any help!

It sounds to me like you need a specific column, say PersonUniqueId. Each Person row that represents the same person keeps its own unique id to identify its differences, but they all share the same PersonUniqueId, which reflects that they are really the same person, just with different or additional information.
If you do not go with a PersonUniqueId, then you would need to figure out what in each record identifies two rows as the same person, perhaps the name or some other attribute. Without knowing more about your overall structure it is hard to say what the best direction is, but hopefully this is a start.
It might be that the PersonMatch table is not even needed. Alternatively, you could put the PersonUniqueId column in that table and then do something like:
SELECT pm.PersonUniqueId, p.*
FROM Person p
INNER JOIN PersonMatch pm ON p.PersonId = pm.PersonId
ORDER BY pm.PersonUniqueId
Hope this helps.
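If you would rather derive the shared id from the existing PersonMatch pairs than maintain it by hand, one option is a recursive query that follows the matches transitively and labels each group with its smallest PersonId. This is only a sketch run through Python's sqlite3 module: the column names PersonId1 and PersonId2 are my assumption about PersonMatch, and person 5 is an extra unmatched row added for contrast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (PersonId INTEGER PRIMARY KEY);
-- PersonId1 / PersonId2 are assumed column names for the two matched rows.
CREATE TABLE PersonMatch (PersonId1 INTEGER, PersonId2 INTEGER);
INSERT INTO Person VALUES (1), (2), (3), (4), (5);
INSERT INTO PersonMatch VALUES (1, 2), (2, 3), (3, 4), (4, 1);
""")

query = """
WITH RECURSIVE
edges(a, b) AS (
    -- A match works in both directions.
    SELECT PersonId1, PersonId2 FROM PersonMatch
    UNION
    SELECT PersonId2, PersonId1 FROM PersonMatch
),
reach(PersonId, Linked) AS (
    -- Every person is linked to itself ...
    SELECT PersonId, PersonId FROM Person
    UNION  -- UNION (not UNION ALL) drops repeats, so cycles terminate
    -- ... and to anything matched with a row it is already linked to.
    SELECT r.PersonId, e.b FROM reach r JOIN edges e ON e.a = r.Linked
)
SELECT PersonId, MIN(Linked) AS PersonUniqueId
FROM reach
GROUP BY PersonId
ORDER BY PersonId
"""
groups = conn.execute(query).fetchall()
for person_id, unique_id in groups:
    print(person_id, unique_id)
```

Persons 1 to 4 all come back with PersonUniqueId 1, while person 5 keeps its own id. On a large table this transitive walk can get expensive, so storing the computed id in a column may still be the better long-term choice.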

Related

How to understand this query?

SELECT DISTINCT
...
...
...
FROM Reviews Rev
INNER JOIN Reviews SubRev ON SubRev.W_ID=Rev.ID
WHERE Rev.Status='Approved'
This is a small part of a long query that I've been trying to understand for a day now. What is happening with the join? The Reviews table appears to be joined with itself, under different aliases. Why is this done? What does it achieve? Also, the ID field of the Reviews table is null for entries that are nevertheless selected and returned. This is correct, but I don't understand how that can happen if the W_ID field is not null.
It allows you to join one row from the table to a different row in the table.
I've both seen this done, and used it myself, in cases where there is a relationship between rows of the same table.
Real-world examples:
An old version of a record and a newer version
Some sort of hierarchical relationship (e.g. if the table contains records of people, you can record that someone is a parent of someone else).
There are probably plenty of other possible use cases, too.
SQL also allows you to create a foreign key that relates two different columns in the same table.
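To make the self-join concrete, here is a small sketch using Python's sqlite3 module. The ID, W_ID and Status columns come from the question; the sample rows, and the reading of W_ID as "the review this row belongs to", are my assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- W_ID is assumed to point at the ID of a parent review in the same table.
CREATE TABLE Reviews (
    ID     INTEGER PRIMARY KEY,
    W_ID   INTEGER REFERENCES Reviews(ID),
    Status TEXT
);
INSERT INTO Reviews VALUES
    (1, NULL, 'Approved'),
    (2, 1,    'Pending'),
    (3, 1,    'Approved'),
    (4, NULL, 'Rejected'),
    (5, 4,    'Approved');
""")

# Rev and SubRev are two aliases for the same table: Rev plays the parent
# role, SubRev plays the child role.
pairs = conn.execute("""
SELECT DISTINCT Rev.ID AS parent_id, SubRev.ID AS child_id
FROM Reviews Rev
INNER JOIN Reviews SubRev ON SubRev.W_ID = Rev.ID
WHERE Rev.Status = 'Approved'
ORDER BY parent_id, child_id
""").fetchall()
print(pairs)
```

Only review 1 is both approved and referenced by other rows, so the result pairs it with reviews 2 and 3. Review 5 is dropped because its parent (review 4) is not approved.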

Terminology: am I doing one-to-many or many-to-many? Left join? Right?

I need to research this but am confused about the terminology of what I should be researching.
Table1 fields:
salenumber
category
quantity
price
Table2 fields:
field
category
requirement
Table3 fields:
field
salenumber
value
I need to combine these.
Essentially one (salenumber, category, quantity, price) row can have a dynamic number of "fields", each containing unique data associated with it. I'm a little confused as to the terminology of what I am doing here. I'm all mixed up with left and right joins and many-to-one and many-to-many databases. If I simply knew the term for what I am trying to do, it would help me narrow down my research.
Joins are for your queries - which you run to find specific records from your database. You aren't there yet. First, you want to do table design, which is where you consider many-to-many and one-to-many relationships.
Start by thinking about what the tables represent. For example, a single sale can involve multiple different products (e.g. you buy a fork, a spoon, a knife, and a new car). Each product can be in a different category (utensils or motor vehicles, in this case). In your table design, you would decide whether a product belongs to only a single category or to multiple categories.
Let's assume there's just one category per product - then you can have many products in one category (fork, spoon and knife are all utensils), but a product can have only one category. In this case, Category to Product is a One-to-Many relationship.
How these connect is where the related fields in tables come into play: in the Product table you have 'Category', which refers to the Category table ('Fork' is of Category 'Utensil', and in the Category table 'Utensil' is an entry with additional information).
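As a rough illustration of that one-to-many design, here is a sketch using Python's sqlite3 module. The table and column names (Category.name, Product.category) and the descriptions are made up for the example; the point is only that Product carries a column referring to Category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE Category (
    name        TEXT PRIMARY KEY,
    description TEXT
);
-- Each product names exactly one category; a category can have many products.
CREATE TABLE Product (
    name     TEXT PRIMARY KEY,
    category TEXT NOT NULL REFERENCES Category(name)
);
INSERT INTO Category VALUES
    ('Utensil', 'things you eat with'),
    ('Motor vehicle', 'things you drive');
INSERT INTO Product VALUES
    ('Fork', 'Utensil'), ('Spoon', 'Utensil'),
    ('Knife', 'Utensil'), ('Car', 'Motor vehicle');
""")

counts = conn.execute("""
SELECT category, COUNT(*) FROM Product
GROUP BY category ORDER BY category
""").fetchall()
print(counts)

# A product pointing at a category that does not exist is rejected.
try:
    conn.execute("INSERT INTO Product VALUES ('Sofa', 'Furniture')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

If a product could belong to several categories, you would drop the category column and add a third table of (product, category) pairs instead, which is the many-to-many case.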
You probably want to look up some basic database lessons to help you out. There are some good free online classes and resources - just search for info about databases.
This may help understand joins: http://blog.codinghorror.com/a-visual-explanation-of-sql-joins/
If you have a row in Table_A with some id, and multiple rows in Table_B referencing that id, you can do a LEFT OUTER JOIN (often written just LEFT JOIN). This will match the data in Table_A to Table_B, but you will get one result row for every matching row in Table_B (plus a single row padded with NULLs for any Table_A row that has no match)!
If you want all of the Table_B data related to a Table_A row in one result row, you need an aggregate function. In PostgreSQL this is normally something like array_agg; other engines have their own equivalents. When you do this you need to tell it what to GROUP BY; in my example it would be Table_A.id.
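Here is a small sketch of both steps using Python's sqlite3 module. SQLite has no array_agg, so group_concat stands in for it here; the column names (label, a_id, value) and the sample rows are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table_A (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE Table_B (a_id INTEGER REFERENCES Table_A(id), value TEXT);
INSERT INTO Table_A VALUES (1, 'sale 1'), (2, 'sale 2');
INSERT INTO Table_B VALUES (1, 'red'), (1, 'large');
""")

# Plain LEFT JOIN: one row per matching Table_B row, NULL when nothing matches.
joined = conn.execute("""
SELECT a.id, b.value
FROM Table_A a
LEFT JOIN Table_B b ON b.a_id = a.id
ORDER BY a.id, b.value
""").fetchall()
print(joined)

# Aggregated: one row per Table_A row, with the related values collapsed.
grouped = conn.execute("""
SELECT a.id, COUNT(b.value), group_concat(b.value)
FROM Table_A a
LEFT JOIN Table_B b ON b.a_id = a.id
GROUP BY a.id
ORDER BY a.id
""").fetchall()
print(grouped)
```

The first query returns three rows (two for id 1, one NULL-padded row for id 2). The second returns exactly one row per id; the order of values inside the concatenated string is not guaranteed.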

Table with set of 3 columns maximum which have same meaning

I have a table Person where each person has at most 3 opinions. Every person's opinions are different; in other words, you will never find 2 persons with the same opinion, so there is no many-to-many relationship between person and opinion.
I will never validate opinions (for example, to check that no 2 persons share an opinion); they are just for information.
The question is:
should I make just one table
Person ( #id_person , ... , opinion1 , opinion2 , opinion3 , ... )
or add a new table:
Person ( #id_person , ... )
opinions ( #id_opinion , opinion , *id_person* ) // id_person FK
I would rather not create a new opinions table, because it seems to add nothing: I would just add new rows to it every time I add a new Person.
Also, if I keep everything in one table and a person has just one opinion, will the unused columns waste space, even if I declare the opinion columns as varchar?
And if I create a new opinions table, will it need a surrogate primary key id_opinion?
An opinion can be a varchar(50).
I would recommend two tables. The first table is the persons table. This would have a PersonId and all sorts of other information about the person.
The second table would be the PersonOpinions table. This would have one row per opinion, with information such as:
PersonId
Opinion
Date and time of the opinion
Topic of the opinion (if appropriate)
Method for inputting the opinion (if appropriate)
From what you say, there is no need for a separate lookup table of distinct opinions, because each opinion is basically "unique". However, you probably do want to store the opinions themselves in a separate table, with a separate row per opinion.
You can use a trigger to enforce the constraint that a person only has three opinions. If you decide to change this in the future, then it will be easy with a two-table solution.
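A minimal sketch of such a trigger, written in SQLite syntax and run through Python's sqlite3 module. Trigger syntax differs between database engines, and the trigger name and error message here are made up, so treat this as an illustration rather than something to paste into your own database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (PersonId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE PersonOpinions (
    PersonId INTEGER NOT NULL REFERENCES Person(PersonId),
    Opinion  VARCHAR(50) NOT NULL
);
-- Refuse the insert when the person already has three opinions.
CREATE TRIGGER max_three_opinions
BEFORE INSERT ON PersonOpinions
WHEN (SELECT COUNT(*) FROM PersonOpinions
      WHERE PersonId = NEW.PersonId) >= 3
BEGIN
    SELECT RAISE(ABORT, 'a person may have at most three opinions');
END;
INSERT INTO Person VALUES (1, 'Example person');
""")

conn.executemany(
    "INSERT INTO PersonOpinions VALUES (1, ?)",
    [("first",), ("second",), ("third",)],
)

try:
    conn.execute("INSERT INTO PersonOpinions VALUES (1, 'fourth')")
    fourth_allowed = True
except sqlite3.DatabaseError:
    fourth_allowed = False

stored = conn.execute(
    "SELECT COUNT(*) FROM PersonOpinions WHERE PersonId = 1"
).fetchone()[0]
print(fourth_allowed, stored)
```

Raising the limit later means changing the number in the trigger; the table structure stays the same.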
I would suggest going for 3 tables (2 base tables and 1 intersection table). Person and Opinion are 2 different entities and they might have an M:M relationship, hence the need for an intersection table. I believe it will simplify things instead of cluttering the info in a single table.
You should have two tables. (Whether you want opinion ids is a separate issue. You wouldn't need to.)
// person [id_person] has name [name] and ...
person(id_person,name,...)
// person [id_person] has opinion [opinion]
opinion(id_person,opinion)
fk (id_person) references person
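For comparison, this is roughly what the two-table design looks like in practice, as a sketch using Python's sqlite3 module. The sample names and opinions, and the composite primary key on opinion, are my assumptions. With this shape, "who holds opinion X" and "who does not hold opinion X" are both short queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id_person INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE opinion (
    id_person INTEGER NOT NULL REFERENCES person(id_person),
    opinion   VARCHAR(50) NOT NULL,
    PRIMARY KEY (id_person, opinion)  -- no separate opinion id needed
);
INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO opinion VALUES
    (1, 'likes tea'), (1, 'likes jazz'), (2, 'likes tea');
""")

# Persons who hold a given opinion.
holders = [row[0] for row in conn.execute(
    "SELECT id_person FROM opinion WHERE opinion = ? ORDER BY id_person",
    ("likes tea",),
)]

# Persons who do not hold it, including persons with no opinions at all.
non_holders = [row[0] for row in conn.execute("""
SELECT p.id_person
FROM person p
WHERE NOT EXISTS (
    SELECT 1 FROM opinion o
    WHERE o.id_person = p.id_person AND o.opinion = ?
)
ORDER BY p.id_person
""", ("likes tea",))]

print(holders, non_holders)
```

With opinion1, opinion2 and opinion3 columns, each of these queries would have to test all three columns separately.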
Your design is bad partly because it is not clear what the answers to the following questions are, yet the answers must be known for someone to use the database. When you have given the answers (i.e. the rest of a full design of your table) they will illustrate other problems.
Suppose:
person p1 has opinion o11.
person p1 has opinion o12.
person p1 has no other opinions.
person p2 has opinion o21.
person p2 has no other opinions.
A. What row(s) go in the table?
B. What are queries (without knowing the table value) for rows where
person p has opinion [opinion]
person [person] does not have opinion o
C. Every table needs a statement parameterized by its columns where the rows in the table are the rows that make the statement true.
What is the statement for your table?
Why isn't it
person [id_person] is named [name] and ...
AND person [id_person] holds opinion [opinion1]
AND person [id_person] holds opinion [opinion2]
AND person [id_person] holds opinion [opinion3]
...
You should experience the further badness in answering the questions. Besides being unclear, the description so far is also bad because the answers are ugly and your table is difficult to use. So much so that I don't even want to type examples of what I know they will be like. So I will leave the questions rhetorical rather than with examples until you give more info.

Retrieve data from two different table in a single report

I have two tables. The first is the Employee and Salary table, which holds each employee's salary in a field named Salary_employee.
The second is Extra Expense, which holds records of the company's extra expenses, such as electricity bills and office maintenance, in a field named extra_expense.
(There is no relationship between these two tables.)
Finally, I just want to show all the expenses of the company in one report, and for this I need to combine both tables. Which should I use here, join or union?
If there is no relationship between the two tables, then this really cannot work, since you don't know what the expense is supposed to tie into. You should redesign the database if possible, as this sounds impossible based on your description.
UPDATE
OK, by the look of your screenshots, I am guessing that this database only stores one company's info? And not multiple?
IF that is correct, AND if all you want to do is squish the data together into one flowing report of expenses, then I would indeed suggest a UNION. A JOIN would not give you the flow you are looking for. A UNION will just stack the two outputs into one result, which I think is what you are asking for? (Plain UNION drops duplicate rows, so use UNION ALL if two transactions can share the same amount and date.)
SELECT ext_amount AS amount, ext_date AS date_of_trans
FROM extra_expenses
UNION
SELECT sal_cash AS amount, sal_dateof_payment AS date_of_trans
FROM employee_salary
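One thing worth checking before relying on that query for totals: plain UNION removes duplicate rows, while UNION ALL keeps every row. The sketch below, run through Python's sqlite3 module, uses the column names from the query above with made-up sample rows in which an expense and a salary payment happen to share the same amount and date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE extra_expenses (ext_amount INTEGER, ext_date TEXT);
CREATE TABLE employee_salary (sal_cash INTEGER, sal_dateof_payment TEXT);
-- Sample data (made up): one expense and one salary coincide exactly.
INSERT INTO extra_expenses VALUES (100, '2020-01-05'), (250, '2020-01-10');
INSERT INTO employee_salary VALUES (100, '2020-01-05'), (900, '2020-01-31');
""")

template = """
SELECT ext_amount AS amount, ext_date AS date_of_trans
FROM extra_expenses
{op}
SELECT sal_cash AS amount, sal_dateof_payment AS date_of_trans
FROM employee_salary
ORDER BY date_of_trans, amount
"""

rows_union = conn.execute(template.format(op="UNION")).fetchall()
rows_union_all = conn.execute(template.format(op="UNION ALL")).fetchall()

total_union = sum(amount for amount, _ in rows_union)
total_union_all = sum(amount for amount, _ in rows_union_all)
print(len(rows_union), total_union)
print(len(rows_union_all), total_union_all)
```

With UNION the coinciding rows collapse into one, so the report shows three rows and under-counts the total by 100. UNION ALL returns all four rows.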
It sounds like you don't need to use group or join. Simply query both tables separately within a script and handle each according to its structure to produce a report.
Join and union are operations for combining data from separate tables. A join, for example, pulls together different information about a common thing: if a user's private details are stored in one table but their profile information is in another, and you want to display both, you can join the two tables on the common user name to gather all the info on the user in one query.

SQL join basic questions

When I have to select a number of fields from different tables:
do I always need to join tables?
which tables do I need to join?
which fields do I have to use for the join/s?
do the joins effects reflect on fields specified in select clause or on where conditions?
Thanks in advance.
Think about joins as a way of creating a new table (just for the purposes of running the query) with data from several different sources. Absent a specific example to work with, let's imagine we have a database of cars which includes these two tables:
CREATE TABLE car (plate_number CHAR(8),
state_code CHAR(2),
make VARCHAR(128),
model VARCHAR(128));
CREATE TABLE state (state_code CHAR(2),
state_name VARCHAR(128));
If you wanted, say, to get a list of the license plates of all the Hondas in the database, that information is already contained in the car table. You can simply SELECT * FROM car WHERE make='Honda';
Similarly, if you wanted a list of all the states beginning with "A" you can SELECT * FROM state WHERE state_name LIKE 'A%';
In either case, since you already have a table with the information you want, there's no need for a join.
You may even want a list of cars with Colorado plates, but if you already know that "CO" is the state code for Colorado you can SELECT * FROM car WHERE state_code='CO'; Once again, the information you need is all in one place, so there is no need for a join.
But suppose you want a list of Hondas including the name of the state where they're registered. That information is not already contained within a table in your database. You will need to "create" one via a join:
car INNER JOIN state ON (car.state_code = state.state_code)
Note that I've said absolutely nothing about what we're SELECTing. That's a separate question entirely. We also haven't applied a WHERE clause limiting what rows are included in the results. That too is a separate question. The only thing we're addressing with the join is getting data from two tables together. We now, in effect, have a new table called car INNER JOIN state with each row from car joined to each row in state that has the same state_code.
Now, from this new "table" we can apply some conditions and select some specific fields:
SELECT plate_number, make, model, state_name
FROM car
INNER JOIN state ON (car.state_code = state.state_code)
WHERE make = 'Honda'
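If you want to try this yourself, here is the same example as a small runnable sketch using Python's sqlite3 module. The table definitions and the query follow the ones above; the sample cars and states are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE car (plate_number CHAR(8),
                  state_code CHAR(2),
                  make VARCHAR(128),
                  model VARCHAR(128));
CREATE TABLE state (state_code CHAR(2),
                    state_name VARCHAR(128));
-- Sample rows, invented for the example.
INSERT INTO car VALUES
    ('ABC123', 'CO', 'Honda', 'Civic'),
    ('XYZ789', 'AZ', 'Honda', 'Accord'),
    ('JKL456', 'CO', 'Ford',  'Focus');
INSERT INTO state VALUES
    ('CO', 'Colorado'), ('AZ', 'Arizona'), ('AK', 'Alaska');
""")

hondas = conn.execute("""
SELECT plate_number, make, model, state_name
FROM car
INNER JOIN state ON (car.state_code = state.state_code)
WHERE make = 'Honda'
ORDER BY plate_number
""").fetchall()
for row in hondas:
    print(row)
```

The two Hondas come back with their state names attached; the Ford is filtered out by the WHERE clause, and Alaska never appears because no car is registered there.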
So, to answer your questions more directly, do you always need to join tables? Yes, if you intend to select data from both of them. You cannot select fields from car that are not in the car table. You must first join in the other tables you need.
Which tables do you need to join? Whichever tables contain the data you're interested in querying.
Which fields do you have to use? Whichever fields are relevant. In this case, the relationship between cars and states is through the state_code field in both tables. I could just as easily have written
car INNER JOIN state ON (state.state_code = car.plate_number)
This would, for each car, show any states whose abbreviations happen to match the car's license plate number. This is, of course, nonsensical and likely to find no results, but as far as your database is concerned it's perfectly valid. Only you know that state_code is what's relevant.
And does the join affect SELECTed fields or WHERE conditions? Not really. You can still select whatever fields you want and you can still limit the results to whichever rows you want. There are two caveats.
First, if you have the same column name in both tables (e.g., state_code) you cannot select it without clarifying which table you want it from. In this case I might write SELECT car.state_code ...
Second, when you're using an INNER JOIN (or on many database engines just a JOIN), only rows where your join conditions are met will be returned. So in my nonsensical example of looking for a state code that matches a car's license plate, there probably won't be any states that match. No rows will be returned. So while you can still use the WHERE clause however you'd like, if you have an INNER JOIN your results may already be limited by that condition.
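Both caveats are easy to see in a quick sketch with Python's sqlite3 module. The tables are the car and state tables from above; the sample rows are invented, including one car with a state code ('ZZ') that has no matching state. The exact error raised for an ambiguous column depends on the database engine; in SQLite it surfaces as an OperationalError.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE car (plate_number CHAR(8), state_code CHAR(2),
                  make VARCHAR(128), model VARCHAR(128));
CREATE TABLE state (state_code CHAR(2), state_name VARCHAR(128));
INSERT INTO car VALUES
    ('ABC123', 'CO', 'Honda', 'Civic'),
    ('QRS000', 'ZZ', 'Honda', 'Fit');   -- no state with code 'ZZ'
INSERT INTO state VALUES ('CO', 'Colorado');
""")

# Caveat 1: an unqualified column that exists in both tables is ambiguous.
try:
    conn.execute("""
        SELECT state_code
        FROM car INNER JOIN state ON (car.state_code = state.state_code)
    """)
    ambiguous = False
except sqlite3.OperationalError:
    ambiguous = True

# Qualifying the column with its table name resolves it.
# Caveat 2: the car with code 'ZZ' is missing, because the INNER JOIN
# only keeps rows where the join condition is met.
qualified = conn.execute("""
    SELECT car.plate_number, car.state_code
    FROM car INNER JOIN state ON (car.state_code = state.state_code)
""").fetchall()

# The nonsensical join condition is valid SQL but matches nothing.
nonsense = conn.execute("""
    SELECT car.plate_number, state.state_name
    FROM car INNER JOIN state ON (state.state_code = car.plate_number)
""").fetchall()

print(ambiguous, qualified, nonsense)
```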
Very broad question; I would suggest doing some reading on it first, but in summary:
1. joins can make life much easier and queries faster, in a nutshell try to
2. the ones with the data you are looking for
3. a field that is in both tables and generally is unique in at least one
4. yes, essentially you are creating one larger table with joins. If there are two fields with the same name, you will need to reference them as tablename.columnname
do I always need to join tables?
No - you could perform multiple selects if you wished
which tables do I need to join?
Any that you want the data from (and need to be related to each other)
which fields do I have to use for the join/s?
Any that are the same in any tables within the join (usually primary key)
do the joins effects reflect on fields specified in select clause or on where conditions?
No, however outer joins can cause problems
(1) what else but tables would you want to join in MySQL?
(2) those from which you want to correlate and retrieve fields (=data)
(3) best to join on indexed fields (unique identifiers), as this is fast. E.g. to retrieve a user's email and all of that user's comments in a 2-table db (with tables tableA=user_settings and tableB=comments, both having the column uid to identify the user by):
select * from user_settings as uset join comments as c on uset.uid = c.uid where uset.email = "test#stackoverflow.com";
(4) both...