maximum and minimum number of tuples in natural join - sql

I came across a question that states
Consider the following relation schema pertaining to a students
database: Student (rollno, name, address)
Enroll (rollno, courseno, coursename)
where the primary keys are shown underlined. The number of tuples in the
Student and Enroll tables are 120 and 8 respectively. What are the maximum
and minimum number of tuples that can be present in (Student * Enroll),
where '*' denotes natural join ?
I have seen several solutions on Internet like this or this
As per my understanding, the maximum should be 8 and the minimum should be 8 as well, since for each (rollno, courseno) in Enroll there should be a matching rollno in Student. Can anyone help with this?

I hope you understand what a natural join is. You can review it here.
If tables R (with n tuples) and S (with m tuples) share a common attribute, and that attribute has the same value in every tuple of both tables, then the natural join yields n*m tuples, because every combination of tuples matches.
Consider following two tables
Table R (With attributes A and C)
A | C
----+----
1 | 2
3 | 2
Table S (With attributes B and C)
B | C
----+----
4 | 2
5 | 2
6 | 2
Result of natural join R * S (if the domain of attribute C is the same in both tables)
A | B | C
---+---+----
1 | 4 | 2
1 | 5 | 2
1 | 6 | 2
3 | 4 | 2
3 | 5 | 2
3 | 6 | 2
You can see that both R and S contain the attribute C, whose value is 2 in each and every tuple. Table R contains 2 tuples and table S contains 3 tuples, so the result table contains 2*3=6 tuples.
Moreover, if the two relations have no common attributes, the natural join behaves as a Cartesian product. In that case you again get m x n tuples, which is the maximum.
Consider following two tables
Table R (With attributes A and B)
A | B
----+----
1 | 2
3 | 2
Table S (With attributes C and D)
C | D
----+----
4 | 2
5 | 2
Result of natural join R * S
A | B | C | D
---+---+----+----
1 | 2 | 4 | 2
1 | 2 | 5 | 2
3 | 2 | 4 | 2
3 | 2 | 5 | 2
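Both examples above can be checked with a small script. This is only a sketch using Python's built-in sqlite3 module; the table names R1, S1, R2 and S2 are made up here to keep the two cases apart, and other database engines may treat NATURAL JOIN slightly differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Case 1: R and S share attribute C, and C is 2 in every tuple.
    CREATE TABLE R1 (A INTEGER, C INTEGER);
    CREATE TABLE S1 (B INTEGER, C INTEGER);
    INSERT INTO R1 VALUES (1, 2), (3, 2);
    INSERT INTO S1 VALUES (4, 2), (5, 2), (6, 2);

    -- Case 2: R and S have no attribute in common.
    CREATE TABLE R2 (A INTEGER, B INTEGER);
    CREATE TABLE S2 (C INTEGER, D INTEGER);
    INSERT INTO R2 VALUES (1, 2), (3, 2);
    INSERT INTO S2 VALUES (4, 2), (5, 2);
""")

# Every R1 tuple matches every S1 tuple on C, so we get 2 * 3 rows.
shared = conn.execute(
    "SELECT A, B, C FROM R1 NATURAL JOIN S1 ORDER BY A, B"
).fetchall()

# With no common column the natural join degenerates to a cross product.
disjoint = conn.execute(
    "SELECT * FROM R2 NATURAL JOIN S2 ORDER BY A, C"
).fetchall()

print(len(shared), len(disjoint))
```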
Hope this helps.

If there was a referential constraint in place ensuring that every rollno in Enroll must also appear in Student then your answer of 8 for both minimum and maximum would be correct. The question doesn't actually mention any such constraint however. There's no need to assume that the RI constraint exists just because the rollno attribute appears in both tables. So the best answer is 0 minimum and 8 maximum. If it's a multiple-choice question and 0,8 isn't one of the given answers then answer 8,8 instead - and tell your teacher that the question is unclear.
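To see the two extremes concretely, here is a rough sketch with Python's sqlite3 module. The helper name join_size and the generated names and course numbers are invented for illustration; only the schema and the sizes 120 and 8 come from the question. No foreign key is declared, which is exactly the situation in which 0 becomes possible.

```python
import sqlite3

def join_size(enrolled_rollnos):
    """Build Student (120 rows) and Enroll (one row per given rollno),
    then return the number of tuples in Student NATURAL JOIN Enroll."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Student (rollno INTEGER PRIMARY KEY, name TEXT, address TEXT)"
    )
    conn.execute(
        "CREATE TABLE Enroll (rollno INTEGER, courseno INTEGER, coursename TEXT, "
        "PRIMARY KEY (rollno, courseno))"
    )
    conn.executemany(
        "INSERT INTO Student VALUES (?, ?, ?)",
        [(r, f"student{r}", "addr") for r in range(1, 121)],
    )
    conn.executemany(
        "INSERT INTO Enroll VALUES (?, ?, ?)",
        [(r, c, f"course{c}") for c, r in enumerate(enrolled_rollnos)],
    )
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM Student NATURAL JOIN Enroll"
    ).fetchone()
    conn.close()
    return count

# All 8 enrolled rollnos exist in Student: each Enroll row matches exactly one student.
max_tuples = join_size(range(1, 9))
# None of the 8 enrolled rollnos exist in Student (no foreign key stops this).
min_tuples = join_size(range(1001, 1009))
print(max_tuples, min_tuples)
```

Because rollno is the key of Student, an Enroll row can never match more than one Student row, which is why the count cannot exceed 8.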

If you are asking about the maximum number of tuples that could appear in the natural join of R and S in general,
then it is the size of the Cartesian product of the two relations.

Yes, the answer should be 8,8.
This is because rollno is the key of the Student table and (rollno, courseno) is the compound key of Enroll.
The relationship between the Student and Enroll tables is 1:M.
So the maximum number of tuples is the same as the size of the many side, i.e. 8.
The minimum number of tuples is 8 if a foreign key exists, otherwise 0.
So, assuming the foreign key, the answer is 8,8.

Related

Flatten tree structure represented in SQL [duplicate]

I'm using an engineering calculation package and trying to extract some information from it with a built-in reporting tool that allows SQL queries.
An abbreviated example of the SQL table is as follows:
Id | Description | Ref
---|---------------------
1 | system 1 |
3 | block 4 | 6
3 | block 4 | 1
5 | formula1 | 3
6 | f |
7 | something | 1
9 | cheese | 5
The "Ref" column identifies rows that are subrecords of other items.
What I want to do is run a query that will produce a list showing all items that appear on each page. As you can see from the table above, "Id" is not a unique key; each item can appear in multiple locations within the table. In the example above:
ID 5 is a subitem of ID 3
ID 3 is a subitem of ID 1 AND ID 6
ID 1 and ID 6 aren't subitems of anything
So effectively it is representing a tree structure:
ID 1
+---- ID 7
+---- ID 3
      +---- ID 5
            +---- ID 9
ID 6
+---- ID 3
      +---- ID 5
            +---- ID 9
What I'm hoping to do is work out which items appear under each top level item (so the end result should be a table where only top level items appear in the "Ref" column):
Id | Description | Ref
---|---------------------
1 | system 1 |
3 | block 4 | 6
3 | block 4 | 1
5 | formula1 | 1
5 | formula1 | 6
6 | f |
9 | cheese | 1
9 | cheese | 6
7 | something | 1
The tree structure can be a total of 5 levels deep
I've been trying to use left joins to build up a list of page references, but I think I'm also going to need to union results tables (because obviously rows like ID=9, ID=5, and ID = 6 have to be duplicated in the final results set). It starts to get a bit messy!
WITH A
AS (SELECT *
    FROM [RbdBlocks]),
B
AS (SELECT [x].[Id],
           [x].[Description],
           [x].[Page] AS Page1,
           [y].[Page] AS Page2
    FROM A AS x
    LEFT OUTER JOIN
    A AS y
    ON y.Id = x.Page)
SELECT *
FROM B
The above gives me some of the nested references, but I'm not sure if there's a better way to get this data together, and to manage the recursion rather than just duplicating the set of queries 4 times?
Have a look at Recursive Common Table Expressions (CTEs). They should be able to accomplish exactly what you need.
Have a look at Example D on the SQL Docs page.
Basically what you'd do in your case is:
In the "anchor member" of the CTE, select all top-level items
In the "recursive member" of the CTE, join all of the nested children to the top-level item
Recursive CTEs are not really trivial to understand, so be sure to read the docs carefully.
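As a rough sketch of the anchor/recursive split described above, the following runs a recursive CTE against the sample data using Python's sqlite3 module. The table name blocks and the Root column are assumptions made for this example. SQLite spells it WITH RECURSIVE; SQL Server uses plain WITH and requires UNION ALL between the anchor and recursive members, so the query would need minor changes there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blocks (Id INTEGER, Description TEXT, Ref INTEGER)")
conn.executemany(
    "INSERT INTO blocks VALUES (?, ?, ?)",
    [
        (1, "system 1", None),
        (3, "block 4", 6),
        (3, "block 4", 1),
        (5, "formula1", 3),
        (6, "f", None),
        (7, "something", 1),
        (9, "cheese", 5),
    ],
)

query = """
WITH RECURSIVE tree (Id, Description, Root) AS (
    -- Anchor member: top-level items are their own root.
    SELECT Id, Description, Id
    FROM blocks
    WHERE Ref IS NULL
    UNION ALL
    -- Recursive member: a child inherits the root of the row it refers to.
    SELECT b.Id, b.Description, t.Root
    FROM blocks AS b
    JOIN tree AS t ON b.Ref = t.Id
)
SELECT Id,
       Description,
       CASE WHEN Id = Root THEN NULL ELSE Root END AS Ref
FROM tree
ORDER BY Id, Ref
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The recursion stops on its own once no more children are found, so the 5-level depth does not need to be hard-coded. It does assume the data contains no cycles.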

Trying to find non-duplicate entries in mostly identical tables(access)

I have 2 different databases. They track different things about inventory. In essence they share 3 common fields: location, item number and quantity. I've extracted these into 2 tables with only those fields. Every answer I find doesn't cover all the test cases, only some of the fields.
Items can be in multiple locations, and in turn each location can have multiple items. The primary key would be location and item number.
I need to flag when an entry doesn't match on all three fields.
I've only been able to find queries that match on an ID or so, or whose queries are beyond my comprehension. In the example below, I'd need a query that would show that rows 1, 2, and 5 had issues. I'd run it on each table and then verify the result with a physical inventory.
Please refrain from commenting on it being silly to have information in 2 different databases; all I get in response is to deal with it =P
Table A
Location | ItemNum | QTY
-------------------------
1a1a | as1001 | 5
1a1b | as1003 | 10
1a1b | as1004 | 2
1a1c | as1005 | 15
1a1d | as1005 | 15
Table B
Location | ItemNum | QTY
-------------------------
1a1a | as1001 | 10
1a1d | as1003 | 10
1a1b | as1004 | 2
1a1c | as1005 | 15
1a1e | as1005 | 15
This article seemed to do what I wanted but I couldn't get it to work.
To find entries in Table A that don't have an exactly matching entry in Table B:
select A.*
from A
left join B on A.location = B.location and A.ItemNum = B.ItemNum and A.qty = B.qty
where B.location Is Null
Just swap all the A's and B's to get the list of entries in B with no matching entry in A.
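Here is a quick check of that query against the sample data, run in both directions. It is a sketch using Python's sqlite3 module rather than Access, so the exact syntax Access expects (for example around parentheses in the ON clause) may differ; the UNMATCHED template and the aliases x and y are just conveniences for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (Location TEXT, ItemNum TEXT, QTY INTEGER);
    CREATE TABLE B (Location TEXT, ItemNum TEXT, QTY INTEGER);
    INSERT INTO A VALUES
        ('1a1a', 'as1001', 5), ('1a1b', 'as1003', 10), ('1a1b', 'as1004', 2),
        ('1a1c', 'as1005', 15), ('1a1d', 'as1005', 15);
    INSERT INTO B VALUES
        ('1a1a', 'as1001', 10), ('1a1d', 'as1003', 10), ('1a1b', 'as1004', 2),
        ('1a1c', 'as1005', 15), ('1a1e', 'as1005', 15);
""")

# Rows of {left} with no row in {right} that agrees on all three fields.
UNMATCHED = """
SELECT x.*
FROM {left} AS x
LEFT JOIN {right} AS y
  ON x.Location = y.Location AND x.ItemNum = y.ItemNum AND x.QTY = y.QTY
WHERE y.Location IS NULL
ORDER BY x.Location, x.ItemNum
"""

only_in_a = conn.execute(UNMATCHED.format(left="A", right="B")).fetchall()
only_in_b = conn.execute(UNMATCHED.format(left="B", right="A")).fetchall()
print(only_in_a)
print(only_in_b)
```

On the sample data this flags rows 1, 2 and 5 of each table, which is what the question asked for.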

PostgreSQL: Distribute rows evenly and according to frequency

I have trouble with a complex ordering problem. I have the following example data:
table "categories"
id | frequency
1 | 0
2 | 4
3 | 0
table "entries"
id | category_id | type
1 | 1 | a
2 | 1 | a
3 | 1 | a
4 | 2 | b
5 | 2 | c
6 | 3 | d
I want to put entries rows in an order so that category_id,
and type are distributed evenly.
More precisely, I want to order entries in a way that:
category_ids that refer to a category that has frequency=0 are
distributed evenly - so that a row is followed by a different category_id
whenever possible. e.g. category_ids of rows: 1,2,1,3,1,2.
Rows with category_ids of categories with frequency<>0 should
be inserted from ca. the beginning with a minimum of frequency rows between them
(the gaps should vary). In my example these are rows with category_id=2.
So the result could start with row id #1, then #4, then a minimum of 4 rows of other
categories, then #5.
in the end result rows with same type should not be next to each other.
Example result:
id | category_id | type
1 | 1 | a
4 | 2 | b
2 | 1 | a
6 | 3 | d
.. some other row ..
.. some other row ..
.. some other row ..
5 | 2 | c
entries are like a stream of things the user gets (one at a time).
The whole ordering should give users some variation. It's just there to not
present them similar entries all the time, so it doesn't have to be perfect.
The query also does not have to give the same result on each call - using
random() is totally fine.
frequencies are there to give entries of certain categories a higher
priority so that they are not distributed across the whole range, but are placed more
at the beginning of the result list. Even if there are a lot of these entries, they
should not completely crowd out the frequency=0 entries at the beginning, though.
I'm not sure how to start this. I think I can use window functions and
ntile() to distribute rows by category_id and type.
But I have no idea how to insert the non-0-category-entries afterwards.

Don't understand why inner join is necessary for filtering in sql

I have the following tables:
Basically I have a many2many relation between students and courses using the junction table students_courses
Here is some data populated into the tables:
students:
courses
students_courses:
So basically I would like to select the full_name and c_id for a given student. For example, for the student with id=3 I would get Aurica 5 and Aurica 6.
My first approach was to write:
select s.full_name,sc.c_id from students s, students_courses sc
where sc.s_id=3
But I obtain this:
Aurica 5
Aurica 6
Aurica 5
Aurica 6
Aurica 5
Aurica 6
So it is duplicated by the number of rows of the students_courses table. Now I'm not sure why this happens.
If I would be an SQL parser, I would parse it like this:
"take the c_id from students_courses, full_name from students, and display them if the students_course row respects the where filter"
Now it works using a join, but I don't really understand why the inner join is necessary.
select s.full_name, sc.c_id from students s
inner join students_courses sc
on sc.s_id=s.id and s.id=3;
Please explain how the first query is interpreted by the SQL parser and why it works with a join.
Thanks,
When you select from two tables, the engine forms the cross product of all their records and then keeps only the records that satisfy the WHERE clause. You have 3 records in the students table:
id | full_name
---+----------
3 | Aurica
4 | Aurica
5 | Aurica
And 6 records in the students_courses table:
s_id | c_id
-----+-----
3 | 5
3 | 6
4 | 7
4 | 8
5 | 9
5 | 10
So before your WHERE clause is applied, it creates 18 different records. To make this easy to see, I will include all of the columns.
s.id | s.full_name | sc.s_id | sc.c_id
-----+-------------+---------+--------
3 | Aurica | 3 | 5
3 | Aurica | 3 | 6
3 | Aurica | 4 | 7
3 | Aurica | 4 | 8
3 | Aurica | 5 | 9
3 | Aurica | 5 | 10
4 | Aurica | 3 | 5
4 | Aurica | 3 | 6
4 | Aurica | 4 | 7
4 | Aurica | 4 | 8
4 | Aurica | 5 | 9
4 | Aurica | 5 | 10
5 | Aurica | 3 | 5
5 | Aurica | 3 | 6
5 | Aurica | 4 | 7
5 | Aurica | 4 | 8
5 | Aurica | 5 | 9
5 | Aurica | 5 | 10
From there it only displays the ones where sc.s_id=3
s.full_name | sc.c_id
------------+--------
Aurica | 5
Aurica | 6
Aurica | 5
Aurica | 6
Aurica | 5
Aurica | 6
The second query you had compares sc.s_id with s.id and only displays the rows where those values are the same and where s.id=3.
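The row counts above can be reproduced with a short script. This is a sketch using Python's sqlite3 module; the three students all named Aurica are the same guess at the data that the tables above make, since the original table contents are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, full_name TEXT);
    CREATE TABLE students_courses (s_id INTEGER, c_id INTEGER);
    INSERT INTO students VALUES (3, 'Aurica'), (4, 'Aurica'), (5, 'Aurica');
    INSERT INTO students_courses VALUES
        (3, 5), (3, 6), (4, 7), (4, 8), (5, 9), (5, 10);
""")

# No join condition at all: every student is paired with every enrollment.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM students s, students_courses sc"
).fetchone()[0]

# The original query: the filter only restricts students_courses,
# so each of the 2 matching enrollments is still paired with all 3 students.
filtered = conn.execute(
    "SELECT s.full_name, sc.c_id "
    "FROM students s, students_courses sc "
    "WHERE sc.s_id = 3"
).fetchall()

print(cross_count, len(filtered))
```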
The SQL parser doesn't try to guess how your two tables are related. It would seem like the database engine has enough information to figure this out itself by following constraints, but SQL intentionally doesn't use the FK relationships to decide how to join your tables; you might want to remove constraints at a future date for some reason (such as to improve performance), and you wouldn't want dropping a constraint to alter how joins are made. The DBA needs freedom to change indexes and constraints without having to worry about changing what results queries return.
Since it can't count on having complete information to go on, the SQL engine is not in the business of deducing/guessing relationships. It's up to the person writing the SQL to specify what they are joining on. If you don't give it any instructions telling it how to hook up the tables (using a JOIN ON clause or WHERE clause) then it creates a cross join, which gives you the duplicated results.
First of all, SQL is a set-based language, you operate on sets of data, not on single (rows of) data.
If I would be an SQL parser, I would parse it like this: "take the
c_id from students_courses, full_name from students, and display them
if the students_course row respects the where filter"
Here, you're overlooking the sets students_courses and students, and just thinking about each row of data, as in: if this row respects the filter, give me all the information.
The JOIN doesn't filter data (that's what WHERE does), but instead it puts it together.
When you SELECT from table A, you ask for the set of rows in A, all of them.
When you SELECT from table A WHERE some condition, you ask for the set of rows in A that respect the condition (so the SQL engine discards rows from A that do not belong to the set you described with your query).
When you JOIN table_a and table_b, you ask to join the set of rows in a with the set of rows in b, obtaining a new set whose rows are the "concatenation" (let me use that term) of the columns from a row in A and the columns from a row in B; this, without giving any other information about how to join the rows, simply results in each row of table_a joined with each row of table_b.
That's why you don't get what you expect.
Finally, from a conceptual point of view, I'd like to point out that the SQL engine doesn't take the columns you request from one table or another. Instead, after (1) having joined the rows of all the tables you requested and (2) having filtered out any row that doesn't match the WHERE condition, it just returns the columns you requested from the rows of the set resulting from (1) and (2).
In real life, an RDBMS may reorder these operations and apply any optimization it finds possible based on indexes and other information it has about the query and the tables.
This should give you a rough idea of what's going on. But as @GordonLinoff suggested, I think you should get a stronger grounding in SQL and relational databases before you go any further, or it will get harder than this.
As a side note, what you had in your FROM clause, is a sort of implicit join, a former join syntax in which the FROM clause specifies the tables involved, and the WHERE clause the join predicate (the columns whose values should match to join the data).
If you had done something like
select s.full_name,sc.c_id
from students s, students_courses sc
where sc.s_id = s.id --<-- you left this out
AND sc.s_id=3
you would have got the same results as your INNER JOIN query. An inner join is not strictly necessary for this statement, but it is good practice to use the newer INNER JOIN syntax to retrieve data.
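A small comparison of the two syntaxes, sketched with Python's sqlite3 module. The sample rows are assumed, since the original table contents are not shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, full_name TEXT);
    CREATE TABLE students_courses (s_id INTEGER, c_id INTEGER);
    INSERT INTO students VALUES (3, 'Aurica'), (4, 'Aurica'), (5, 'Aurica');
    INSERT INTO students_courses VALUES
        (3, 5), (3, 6), (4, 7), (4, 8), (5, 9), (5, 10);
""")

# Old-style join: tables in FROM, join predicate in WHERE.
implicit = conn.execute("""
    SELECT s.full_name, sc.c_id
    FROM students s, students_courses sc
    WHERE sc.s_id = s.id
      AND sc.s_id = 3
    ORDER BY sc.c_id
""").fetchall()

# Explicit join: the join predicate lives in ON, the filter in WHERE.
explicit = conn.execute("""
    SELECT s.full_name, sc.c_id
    FROM students s
    INNER JOIN students_courses sc ON sc.s_id = s.id
    WHERE s.id = 3
    ORDER BY sc.c_id
""").fetchall()

print(implicit == explicit, explicit)
```

Keeping the join predicate in ON makes it harder to forget, which is the main practical argument for the explicit form.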
Both of your queries are in fact joins; in your first example the word "join" just doesn't appear (but it is there, trust me).
However, that is an old-style join and its use is no longer recommended. In short, because the join condition lives in the WHERE clause, it is easy to leave it out, which is what happened here and why you got the wrong result.
For more details see here.

Advance Query with Join

I'm trying to convert a product table that contains all the details of each product into separate tables in SQL. I've got everything done except for the duplicated descriptor details.
The problem I am having is that all the products have size/color/style/other descriptors that many other products share. I want to have only one size or color descriptor for all the items and reuse its "ID" for every product, which I believe makes it a parent key to the Product ID, which is a ... foreign key. The only problem is that every descriptor would have multiple foreign keys assigned to it. So I was thinking of skipping the step of figuring out a parent key for each descriptor on the fly, and instead just checking whether that descriptor exists and, if it does, using its key.
Data Table
PI | Colo | Sz | OTHER
1 | Blue | 5 | Vintage
2 | Blue | 6 | Vintage
3 | Blac | 5 | Simple
4 | Blac | 6 | Simple
===================================
Its destination table is this
===================================
DI | Description
1 | Blue
2 | Blac
3 | 5
4 | 6
6 | Vintage
7 | Simple
=============================
Select Data.Table
Unique.Data.Table.Colo
Unique.Data.Table.Sz
Unique.Data.Table.Other
=======================================
Then the second part of the question: after we create all the descriptors, how do I write a new query that assigns the product IDs to the descriptors?
PI | DI
1 | 1
1 | 3
1 | 6
2 | 1
2 | 4
2 | 6
By figuring out how to do this I should be able to repeat the pattern for all 300+ columns in the product table. Some of these fields are 60+ characters long, so it's going to save a ton of space.
Do I use an array?
Okay, if I understand you correctly, you want all unique attributes converted from columns into rows in a single table (detailstable) that has an id and a description field:
Assuming the schema:
datatable
------------------
PI [PK]
Colo
Sz
OTHER
detailstable
------------------
DI [PK]
Description
You can first get all of the unique attributes into its own table with:
INSERT INTO detailstable (Description)
SELECT
a.description
FROM
(
SELECT DISTINCT Colo AS description
FROM datatable
UNION
SELECT DISTINCT Sz AS description
FROM datatable
UNION
SELECT DISTINCT OTHER AS description
FROM datatable
) a
Then to link up the datatable to the detailstable, I'm assuming you have a cross-reference table defined like:
datadetails
------------------
PI [PK]
DI [PK]
You can then do:
INSERT INTO datadetails (PI, DI)
SELECT
a.PI,
b.DI
FROM
datatable a
INNER JOIN
detailstable b ON b.Description IN (a.Colo, a.Sz, a.OTHER)
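The two INSERT statements can be tried end to end on the sample data. The sketch below uses Python's sqlite3 module and assumes every descriptor column is stored as text and that DI is an auto-assigned integer key; the UNIQUE constraint on Description is an addition for this example, not part of the schema above. Since UNION gives no ordering guarantee, the DI values assigned here need not match the numbering in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE datatable (PI INTEGER PRIMARY KEY, Colo TEXT, Sz TEXT, OTHER TEXT);
    CREATE TABLE detailstable (DI INTEGER PRIMARY KEY, Description TEXT UNIQUE);
    CREATE TABLE datadetails (PI INTEGER, DI INTEGER, PRIMARY KEY (PI, DI));

    INSERT INTO datatable VALUES
        (1, 'Blue', '5', 'Vintage'), (2, 'Blue', '6', 'Vintage'),
        (3, 'Blac', '5', 'Simple'),  (4, 'Blac', '6', 'Simple');

    -- Step 1: one row per distinct descriptor (UNION removes duplicates).
    INSERT INTO detailstable (Description)
    SELECT Colo FROM datatable
    UNION
    SELECT Sz FROM datatable
    UNION
    SELECT OTHER FROM datatable;

    -- Step 2: link each product to the descriptors it uses.
    INSERT INTO datadetails (PI, DI)
    SELECT a.PI, b.DI
    FROM datatable AS a
    INNER JOIN detailstable AS b
        ON b.Description IN (a.Colo, a.Sz, a.OTHER);
""")

descriptors = sorted(
    d for (d,) in conn.execute("SELECT Description FROM detailstable")
)
product_1 = sorted(
    d for (d,) in conn.execute(
        "SELECT b.Description FROM datadetails AS x "
        "JOIN detailstable AS b ON b.DI = x.DI WHERE x.PI = 1"
    )
)
link_count = conn.execute("SELECT COUNT(*) FROM datadetails").fetchone()[0]
print(descriptors, product_1, link_count)
```

One thing to watch with this layout: if the same text ever appears in two different columns (say a size and a style both called "6"), the two are merged into a single descriptor.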
I reckon you want to split the description table into separate tables per category, like colorDescription, sizeDescription, etc.
If that is not practical, then I would recommend adding an extra column that holds a category attribute:
DI | Description | Category
1 | Blue | Color
2 | Blac | Color
3 | 5 | Size
4 | 6 | Size
6 | Vintage | Other
7 | Simple | Other
Then make the primary key of this table the combination of the ID and Category columns.
This leaves less chance of introducing data errors, and makes any errors easier to track down.