About DB Relations And Indexes - sql

I have three tables with a total of 2 million rows.
I'm writing SQL queries that inner join two, sometimes three, of these tables. Sometimes I query just one table.
I want to create indexes for these queries, but I'm not sure how. Should I index the three columns (the ID columns of the tables) together or separately? And which kind of index will work well?
My last question is about DB relations. These three tables don't have a PK-FK relationship. Can that affect my query time?
Thanks in advance for any help :)

With so little information about your model, I can only suggest this:
Create an index on each column, in each table, that is used in a join.
If you want more suggestions, you need to give more information.
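As a rough illustration of the suggestion above, here is a minimal sketch using Python's built-in sqlite3 module. The tables `customers` and `orders`, their columns, and the index names are hypothetical, since the question does not show its schema, and other database engines may plan the query differently.

```python
import sqlite3

# Hypothetical schema: the question does not show its real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, total REAL);

    -- One index per column that takes part in a join.
    CREATE INDEX idx_customers_id ON customers (id);
    CREATE INDEX idx_orders_customer_id ON orders (customer_id);
""")

index_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' ORDER BY name"
    )
]

# Ask SQLite how it would execute the join; the fourth column of each
# plan row is a human-readable description of one step.
plan_rows = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT c.name, o.total
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id
""").fetchall()
plan_details = [row[3] for row in plan_rows]

for detail in plan_details:
    print(detail)
```

Checking the plan before and after creating an index is a cheap way to see whether the index is actually used for your query.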

Related

Why isn't "union all" doing what I expect?

I created 2 summary tables from the same source data for different date ranges.
Now that I have these multiple summary tables, I want to put those tables together
so that I will be able to run a summary on the combined table.
It's creating the summary table that is presenting the problem.
scratch.table_1 has 809,598 records.
scratch.table_2 has 1,228,176 records.
They both have the same set of fields from the source table,
plus a "record_number" field I created on each table using count(1).
The code I used to put these two tables together was:
create table scratch.table_1_and_2
select * from scratch.table_1
union all
select * from scratch.table_2
I assumed that there would be 809,598 + 1,228,176 records in the new table (2,037,774 records).
But there are only 1,960,769 records in the new table.
What am I doing wrong?
One way to troubleshoot would be to identify some of the missing records and see what might be different about the data in those that would cause them to be left out. A UNION ALL should include duplicate records so duplicates shouldn't be the issue. Maybe there is some data issue that's causing those records to be dropped. Also I'm assuming there isn't any funny business with Views going on in the underlying tables and that no data loads are affecting your record counts.
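The troubleshooting idea above can be sketched as follows with Python's built-in sqlite3 module. The column list and sample rows are made up for illustration, and the `scratch.` schema prefix is dropped because the in-memory database has no such schema. The sketch checks that UNION ALL keeps every row (unlike UNION) and looks for rows of the first table that did not make it into the combined table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Made-up columns and rows; the real tables are not shown in the question.
    CREATE TABLE table_1 (record_number INTEGER, value TEXT);
    CREATE TABLE table_2 (record_number INTEGER, value TEXT);
    INSERT INTO table_1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table_2 VALUES (1, 'a'), (2, 'x');

    CREATE TABLE table_1_and_2 AS
        SELECT * FROM table_1
        UNION ALL
        SELECT * FROM table_2;
""")


def count(sql):
    """Run a query that returns a single number and return that number."""
    return conn.execute(sql).fetchone()[0]


n1 = count("SELECT COUNT(*) FROM table_1")
n2 = count("SELECT COUNT(*) FROM table_2")
n_all = count("SELECT COUNT(*) FROM table_1_and_2")

# For comparison: plain UNION removes duplicate rows.
n_union = count(
    "SELECT COUNT(*) FROM (SELECT * FROM table_1 UNION SELECT * FROM table_2)"
)

# Rows of table_1 that are absent from the combined table. EXCEPT compares
# distinct rows, so it will not reveal a duplicate that was dropped.
missing = conn.execute(
    "SELECT * FROM table_1 EXCEPT SELECT * FROM table_1_and_2"
).fetchall()

print(n1, n2, n_all, n_union, missing)
```

If the combined count is lower than the sum of the two source counts, the `missing` query (run once per source table) is one place to start looking.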

How to understand this query?

SELECT DISTINCT
...
...
...
FROM Reviews Rev
INNER JOIN Reviews SubRev ON Subrev.W_ID=Rev.ID
WHERE Rev.Status='Approved'
This is a small part of a long query that I've been trying to understand for a day now. What is happening with the join? The Reviews table appears to be joined with itself, under different aliases. Why is this done? What does it achieve? Also, the ID field of the Reviews table is null for entries that are nevertheless selected and returned. The output is correct, but I don't understand how that can happen if the W_ID field is not null.
It allows you to join one row of the table to a different row of the same table.
I've both seen this done, and used it myself, in cases where rows in the same table are related to each other.
Real-world examples:
An old version of a record and a newer version.
Some sort of hierarchical relationship (e.g. if the table contains records of people, you can record that someone is a parent of someone else).
There are probably plenty of other possible use cases, too.
SQL also allows you to create a foreign key that relates one column to another column in the same table.
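The hierarchical example can be sketched like this with Python's built-in sqlite3 module. The `people` table, its columns, and the sample rows are hypothetical and are not taken from the query in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table: parent_id is a foreign key to id in the same table.
    CREATE TABLE people (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        parent_id INTEGER REFERENCES people (id)
    );
    INSERT INTO people VALUES
        (1, 'Alice', NULL),
        (2, 'Bob',   1),
        (3, 'Carol', 1);
""")

# The same table appears twice under two aliases, so each child row
# can be matched with the row that describes its parent.
pairs = conn.execute("""
    SELECT child.name, parent.name
    FROM people child
    INNER JOIN people parent ON child.parent_id = parent.id
    ORDER BY child.id
""").fetchall()

print(pairs)
```

Alice has no parent recorded, so the inner join leaves her out of the child column; a LEFT JOIN would keep her with a NULL parent.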

SQL JOIN OPTIMIZATION

I am working on a generalized problem where I am given only the schema definitions of multiple tables.
I have to retrieve certain columns by joining tables in such a way that the number of joins is minimized.
Example: suppose I have 3 tables with the following columns.
Table 1:(1,2,3,4,5),
Table 2:(5,6,7),
Table 3:(5,6,7,8)
Now suppose I have a query that needs all of the columns 1,2,3,4,5,6,7,8.
I can join either table 1, table 2 and table 3, or just table 1 and table 3. I would get the required information in both cases, but joining table 1 and table 3 requires only 1 join rather than 2.
What I tried was a greedy algorithm: first pick the table that has the maximum number of required columns, then remove the columns it covers from the required set (and from the remaining tables), then repeat with the updated required columns and tables, and so on. But I suspect it would be slow.
So is there a generalized algorithm, or can anyone give me a hint in this direction?
First of all, I have to mention that this is not a "join" but a "union".
Second, if you want to use the greedy algorithm, combine the two shortest tables first: each combine step is O(n), so two steps cost about 2n operations, and it is better to keep n as small as possible.
Besides that, the following link may be useful:
Merging 3 tables/queries using MS Access Union Query
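For reference, the greedy approach described in the question can be sketched in Python as below. The function name `greedy_table_cover` and the table names are made up for illustration. The sketch treats the task as a set cover problem, so the greedy choice is a heuristic that is not guaranteed to give the minimum number of tables in general, and it does not check that the chosen tables actually share a join column.

```python
def greedy_table_cover(required, tables):
    """Pick tables greedily until every required column is covered.

    required: iterable of column identifiers the query needs.
    tables:   dict mapping a table name to the set of columns it has.
    Returns the chosen table names in the order they were picked.
    """
    remaining = set(required)
    chosen = []
    while remaining:
        # Take the table that covers the most still-missing columns.
        best = max(tables, key=lambda name: len(tables[name] & remaining))
        gained = tables[best] & remaining
        if not gained:
            raise ValueError(
                f"columns not available in any table: {sorted(remaining)}"
            )
        chosen.append(best)
        remaining -= gained
    return chosen


# The example from the question.
tables = {
    "table1": {1, 2, 3, 4, 5},
    "table2": {5, 6, 7},
    "table3": {5, 6, 7, 8},
}
chosen = greedy_table_cover(range(1, 9), tables)
print(chosen)  # ['table1', 'table3']
```

Each round scans every table once, so the cost is roughly the number of chosen tables times the total size of the schema, which should be small compared with running the joins themselves.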

SSRS 2008 R2 Data Region Embedded in Another Data Region

I have two unrelated tables (Table A and Table B) that I would like to join to create a unique list of pairings of the two. So, each row in Table A will pair with each row in Table B creating a list of unique pairings between the two tables.
My ideas of what can be done:
I can either do this in the query (SQL) by creating one dataset and having two fields outputted (each row equaling a unique pairing).
Or by creating two different datasets (one for each table) and have a data region embedded within a different data region; each data region pulling from a different dataset (of the two created for each table).
I have tried implementing the second method but it would not allow me to select a different dataset for the embedded data region from the parent data region.
I have not tried the first method, because I do not understand how, or even whether, it can be done in SQL.
Any help or guidance in this matter would be greatly appreciated!
The first is called a cross join:
select t1.*, t2.*
from t1 cross join
t2;
Whether you should do this in the application or in the database is open to question. It depends on the size of the tables and the bandwidth to the database -- there is an overhead to pulling rows from a database.
If each table has 2 rows, this is a non-issue. If each table has 100 rows, then you would be pulling 10,000 rows from the database and it might be faster to pull 2*100 rows and do the looping in the application.
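To make the trade-off concrete, here is a small sketch using Python's built-in sqlite3 module. The column names `a` and `b` and the sample rows are made up. It builds the pairings once in the database with a cross join and once in the application from two separate result sets.

```python
import sqlite3
from itertools import product

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Made-up columns and rows for the two unrelated tables.
    CREATE TABLE t1 (a TEXT);
    CREATE TABLE t2 (b INTEGER);
    INSERT INTO t1 VALUES ('x'), ('y');
    INSERT INTO t2 VALUES (1), (2), (3);
""")

# Option 1: let the database build every pairing (2 * 3 = 6 rows returned).
db_pairs = conn.execute("""
    SELECT t1.a, t2.b
    FROM t1 CROSS JOIN t2
    ORDER BY t1.a, t2.b
""").fetchall()

# Option 2: fetch 2 + 3 rows and build the pairings in the application.
rows_a = [row[0] for row in conn.execute("SELECT a FROM t1 ORDER BY a")]
rows_b = [row[0] for row in conn.execute("SELECT b FROM t2 ORDER BY b")]
app_pairs = list(product(rows_a, rows_b))

print(db_pairs)
print(app_pairs)
```

Both options give the same pairings; the difference is only in how many rows travel between the database and the application.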

Joining multiple Tables in Oracle gives out duplicated records

I am a newbie to SQL. I have three tables: mr1, mr2 and mr3. Caseid is the primary key in all of these tables. I need to join the columns of all these tables and display the result.
The problem is that I don't know which join to use.
When I joined them with the query below:
select mr1.col1,mr1.col2,mr2.col1,mr2.col2,mr3.col1,mr3.col2
from mr1,mr2,mr3
where mr1.caseid = mr2.caseid
and mr2.caseid = mr3.caseid;
it displays 4 records, even though the maximum number of records in any table is two (in mr2).
The records are duplicated. Can anyone help me with this?
DISTINCT will do it, but it's not the correct approach.
You need to add another join condition (mr1.caseid = mr3.caseid), because the mr2 and mr3 rows must be related to the same row in mr1; otherwise you end up with 2 pairs, one for each table joined to your primary table (mr2).
First answer on SO, so forgive me if it wasn't that clear.
Your problem is that your tables are in a one-to-many relationship. When you join them, it is expected that the number of rows will go up unless you take steps to limit the records returned. How to fix it depends on the meaning of the data.
If all the fields are exactly the same, then adding DISTINCT will fix the problem. However, depending on the size of the tables and the number of records you are returning, it may be faster to use a derived table that limits the join to only one record from the table with multiple records.
If at least one of the fields is different, however, then you need to know the business rule that tells you which record is the correct one to pick. It might be implemented with a WHERE clause, with an aggregate function and GROUP BY, or even both. This really depends on the meaning of the result set, which is why you need to ask further questions within your own organization: the people who will use the results of the query are the only ones who know which of the multiple records is the right one. It is also possible that the business actually wants to see all of the records, in which case you have no problem at all.
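The row multiplication and the derived-table fix can be sketched with Python's built-in sqlite3 module. The sample rows are made up, and the sketch assumes that caseid repeats in mr2 and mr3, which is what the reported row count suggests. MIN(col1) stands in for whatever business rule picks the right record and is only a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Made-up data: one case with two related rows in mr2 and in mr3.
    CREATE TABLE mr1 (caseid INTEGER, col1 TEXT);
    CREATE TABLE mr2 (caseid INTEGER, col1 TEXT);
    CREATE TABLE mr3 (caseid INTEGER, col1 TEXT);
    INSERT INTO mr1 VALUES (1, 'case');
    INSERT INTO mr2 VALUES (1, 'note A'), (1, 'note B');
    INSERT INTO mr3 VALUES (1, 'file A'), (1, 'file B');
""")

# Plain join: every mr2 row pairs with every mr3 row of the same case,
# so 1 * 2 * 2 = 4 rows come back.
joined = conn.execute("""
    SELECT mr1.col1, mr2.col1, mr3.col1
    FROM mr1
    INNER JOIN mr2 ON mr2.caseid = mr1.caseid
    INNER JOIN mr3 ON mr3.caseid = mr2.caseid
""").fetchall()

# Derived tables: reduce each "many" side to one row per caseid before
# joining. MIN(col1) is a placeholder for the real business rule.
one_per_case = conn.execute("""
    SELECT mr1.col1, m2.col1, m3.col1
    FROM mr1
    INNER JOIN (SELECT caseid, MIN(col1) AS col1 FROM mr2 GROUP BY caseid) m2
        ON m2.caseid = mr1.caseid
    INNER JOIN (SELECT caseid, MIN(col1) AS col1 FROM mr3 GROUP BY caseid) m3
        ON m3.caseid = mr1.caseid
""").fetchall()

print(len(joined), one_per_case)
```

Which row each derived table should keep is exactly the business question raised above; the aggregate here only shows where that rule would go.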