What are "descendant tables" in PostgreSQL?

Database dumps from PostgreSQL use ALTER TABLE ONLY tablename instead of ALTER TABLE tablename, which is the form I am familiar with. I was curious what the ONLY keyword does, so I looked it up in the PostgreSQL documentation, and it says the following:
name
The name (optionally schema-qualified) of an existing table to alter. If ONLY is specified before the table name, only that table is altered. If ONLY is not specified, the table and all its descendant tables (if any) are altered. Optionally, * can be specified after the table name to explicitly indicate that descendant tables are included.
What are descendant tables?

PostgreSQL implements table inheritance, which can be a useful tool
for database designers. (SQL:1999 and later define a type inheritance
feature, which differs in many respects from the features described
here.)
Let's start with an example: suppose we are trying to build a data
model for cities. Each state has many cities, but only one capital. We
want to be able to quickly retrieve the capital city for any
particular state. This can be done by creating two tables, one for
state capitals and one for cities that are not capitals. However, what
happens when we want to ask for data about a city, regardless of
whether it is a capital or not? The inheritance feature can help to
resolve this problem. We define the capitals table so that it inherits
from cities:
CREATE TABLE cities (
    name       text,
    population float,
    altitude   int     -- in feet
);
CREATE TABLE capitals (
    state      char(2)
) INHERITS (cities);
In this case, the capitals table inherits all the columns of its
parent table, cities. State capitals also have an extra column, state,
that shows their state.
In PostgreSQL, a table can inherit from zero or more other tables, and
a query can reference either all rows of a table or all rows of a
table plus all of its descendant tables. The latter behavior is the
default.
Source: https://www.postgresql.org/docs/8.4/static/ddl-inherit.html

Descendant tables of a table are all tables that inherit from it, either directly or indirectly. So if table B inherits from table A, and table C inherits from table B, then:
Tables B and C are descendant tables of A.
Table C is a descendant table of B.
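As a small sketch (assuming PostgreSQL and throwaway tables literally named a, b and c), the hierarchy above can be built and then walked through the pg_inherits catalog, which stores one row per direct parent/child link (inhparent, inhrelid). The recursive query is what turns those direct links into the full set of descendants:

```sql
-- Throwaway three-level hierarchy: c inherits from b, b inherits from a.
CREATE TABLE a (id int);
CREATE TABLE b () INHERITS (a);
CREATE TABLE c () INHERITS (b);

-- pg_inherits holds one row per *direct* link (inhparent -> inhrelid);
-- the recursive step follows those links down to indirect descendants.
WITH RECURSIVE descendants AS (
    SELECT inhrelid
    FROM pg_inherits
    WHERE inhparent = 'a'::regclass
  UNION ALL
    SELECT i.inhrelid
    FROM pg_inherits i
    JOIN descendants d ON i.inhparent = d.inhrelid
)
SELECT inhrelid::regclass AS descendant_table
FROM descendants;
```

It should list b and c; starting from 'b'::regclass instead would list only c.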
A query against a table (without ONLY) is a query against the table and all of its descendant tables. So, for example, a SELECT on a table with descendant tables is effectively a UNION ALL of SELECT ... FROM ONLY over that table and each of its descendant tables. (In fact, if you inspect the query plan for a SELECT on a table with descendants, you'll see that it is nearly identical to the plan for such a UNION ALL query.)
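A minimal sketch of that equivalence, reusing the documentation's cities/capitals tables with two sample rows made up for illustration:

```sql
-- The documentation's example tables.
CREATE TABLE cities (
    name       text,
    population float,
    altitude   int      -- in feet
);
CREATE TABLE capitals (
    state      char(2)
) INHERITS (cities);

-- Sample rows: one plain city, one capital.
INSERT INTO cities   VALUES ('Las Vegas', 258300, 2174);
INSERT INTO capitals VALUES ('Madison', 191300, 845, 'WI');

-- Default: scans cities and its descendant capitals (2 rows).
SELECT name FROM cities;

-- ONLY: scans cities alone (1 row, Las Vegas).
SELECT name FROM ONLY cities;

-- What the first query effectively does (2 rows).
SELECT name FROM ONLY cities
UNION ALL
SELECT name FROM ONLY capitals;

-- The plan for the first query shows an Append over both tables.
EXPLAIN SELECT name FROM cities;
```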
If you are not using table inheritance, then the ONLY keyword has no effect on queries, as the set of descendant tables is empty.
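The same rule is what makes ALTER TABLE ONLY matter in dumps. A minimal sketch with hypothetical tables measurements and measurements_2024, using a column default because ALTER COLUMN ... SET DEFAULT recurses into descendant tables unless ONLY is given:

```sql
-- Hypothetical parent/child pair.
CREATE TABLE measurements (id int, unit text);
CREATE TABLE measurements_2024 () INHERITS (measurements);

-- Without ONLY: the default is set on measurements AND measurements_2024.
ALTER TABLE measurements ALTER COLUMN unit SET DEFAULT 'kg';

-- With ONLY: only the parent's default changes; the child keeps 'kg'.
ALTER TABLE ONLY measurements ALTER COLUMN unit SET DEFAULT 'g';

INSERT INTO measurements (id) VALUES (1);        -- unit = 'g'
INSERT INTO measurements_2024 (id) VALUES (2);   -- unit = 'kg'
```

My reading (an assumption, not something stated in the documentation quoted above) is that pg_dump writes ALTER TABLE ONLY so that each table's own defaults and constraints are restored exactly as dumped, instead of letting a statement on a parent overwrite a child's settings.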

Related

I want one table to inherit from another

I have two SQL tables, say all_cities and regular_cities (and capitol_cities).
all_cities has 2 columns : name, population
regular_cities has 2 columns : name, population
capitol_cities has 3 columns : name, population, state
These tables are already created. I want to connect them using table partitioning but first I need to make sure that tables regular_cities and capitol_cities inherit from all_cities.
Is there any way to make these tables inherit from another table after they have been created, or is that only possible when creating a new table?
After searching around, I found that the correct way to do this is with a simple ALTER statement:
ALTER TABLE <child_table> INHERIT <parent_table>;
That's it!
Documentation here
https://www.postgresql.org/docs/9.6/static/sql-altertable.html
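A sketch of that answer applied to the question's tables, assuming PostgreSQL, simplified column types (text/int) and made-up sample rows. ALTER TABLE ... INHERIT only succeeds if the child already has every parent column with a matching type; extra columns such as state are fine:

```sql
-- Simplified versions of the three existing tables.
CREATE TABLE all_cities     (name text, population int);
CREATE TABLE regular_cities (name text, population int);
CREATE TABLE capitol_cities (name text, population int, state char(2));

-- Made-up sample rows.
INSERT INTO regular_cities VALUES ('Springfield', 1000);
INSERT INTO capitol_cities VALUES ('Albany', 2000, 'NY');

-- Attach the existing tables as children of all_cities.
ALTER TABLE regular_cities INHERIT all_cities;
ALTER TABLE capitol_cities INHERIT all_cities;

-- The parent now returns both children's rows.
SELECT name, population FROM all_cities;

-- To undo: ALTER TABLE capitol_cities NO INHERIT all_cities;
```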

SQL - Selecting columns based on attributes of the column

I am currently designing a SQL database to house a large amount of biological data. The main table has over 100 columns, where each row is a particular sampling event and each column is a species name. Values are the number of individuals found of that species for that sampling event.
Often, I would like to aggregate species together based on their taxonomy. For example: suppose Sp1, Sp2, and Sp3 belong to Family1; Sp4, Sp5, and Sp6 belong to Family2; and Family1 and Family2 belong to Class1. How do I structure the database so I can simply query a particular Family or Class, instead of listing 100+ columns each time?
My first thought was to create a second table that lists the attributes of each column of the first table, so that the primary key of the second table corresponds to the column headers in table 1, and the columns of table 2 are the categories I would want to select by (such as Family, feeding type, life stage, etc.). However, I'm not sure how to write a query that joins tables in such a way.
I'm a newbie to SQL, and am not sure if I'm going about this in completely the wrong way. How can I structure my data/write queries to accomplish my goal?
No, no, no. Don't make species columns in the table.
Instead, where you have one row now, you want multiple rows. It would have columns such as:
id: auto generated sequential number
sampleId: whatever each row in the current table belongs to
speciesId: reference to the species table
columns of data for that species in that sampling event
The species table could then hold the entire taxonomic hierarchy: genus, family, order, and so on.
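A minimal sketch of that layout, assuming PostgreSQL; the table and column names (species, species_counts, family, class_name) and the sample rows are hypothetical. Because the taxonomy lives in the species table, grouping by family or class replaces listing 100+ columns:

```sql
-- One row per species, carrying its place in the taxonomy.
CREATE TABLE species (
    species_id   int  PRIMARY KEY,
    species_name text NOT NULL,
    family       text NOT NULL,
    class_name   text NOT NULL
);

-- One row per (sampling event, species) instead of one column per species.
CREATE TABLE species_counts (
    id          serial PRIMARY KEY,
    sample_id   int NOT NULL,
    species_id  int NOT NULL REFERENCES species (species_id),
    individuals int NOT NULL CHECK (individuals >= 0),
    UNIQUE (sample_id, species_id)
);

INSERT INTO species VALUES
    (1, 'Sp1', 'Family1', 'Class1'),
    (2, 'Sp2', 'Family1', 'Class1'),
    (4, 'Sp4', 'Family2', 'Class1');

INSERT INTO species_counts (sample_id, species_id, individuals) VALUES
    (100, 1, 3), (100, 2, 5), (100, 4, 7);

-- Totals per family for each sampling event.
SELECT c.sample_id, s.family, sum(c.individuals) AS individuals
FROM species_counts c
JOIN species s ON s.species_id = c.species_id
GROUP BY c.sample_id, s.family
ORDER BY c.sample_id, s.family;
```

Adding a new species then becomes an INSERT into species rather than an ALTER TABLE.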

Strategies to store extra information about models without too many column names (alternatives to DB normalization and model subclassing)

Say you had a model called Forest. Each object represents a forest on your continent. There is a set of data that is common to all these forests, such as forest type, area, etc., and these can easily be represented by columns on the SQL table forest.
However, imagine that these forests had additional data about them that does not apply to every forest. For example, the 20 coniferous forests have a pine-fir split ratio, whereas the deciduous forests have an autumn duration. One way would be to store all these columns on the main table itself, but then each row would have too many columns, many of which remain unfilled by definition.
The most obvious way around this is to make subclasses of the Forest model and have a separate table for each subclass. I feel that's a heavy-handed approach that I would rather not follow: if I need some data about the generic forest, I'll have to consult another table.
Is there a pattern to solve this problem? What solution do you usually prefer?
NOTE: I have seen the other questions about this. The solutions proposed were:
Subtyping, same as I proposed above.
Have all the columns on the same table.
Have separate tables for each kind of forest, with common data such as area and rainfall duplicated in each.
Is there an inventive solution that I don't know of?
UPDATE: I have run into the EAV model, and also a modified version where the unpredictable fields are stored in a NoSQL/JSON store and the ID of that record is held in the relational database. I like both, but welcome suggestions in this direction.
On the database side, the best approach is often to store attributes common to all forests in one table, and to store unique attributes in other tables. Build updatable views for clients to use.
create table forests (
    forest_id integer primary key,
    -- Assumes forest names are not unique on a continent.
    forest_name varchar(45) not null,
    forest_type char(1) not null
        check (forest_type in ('c', 'd')),
    area_sq_km integer not null
        check (area_sq_km > 0),
    -- Other columns common to all forests go here.
    --
    -- This constraint lets foreign keys target the pair
    -- of columns, guaranteeing that a row in each subtype
    -- table references a row here having the same subtype.
    unique (forest_id, forest_type)
);
create table coniferous_forests_subtype (
    forest_id integer primary key,
    forest_type char(1) not null
        default 'c'
        check (forest_type = 'c'),
    pine_fir_ratio float not null
        check (pine_fir_ratio >= 0),
    foreign key (forest_id, forest_type)
        references forests (forest_id, forest_type)
);
create table deciduous_forests_subtype (
    forest_id integer primary key,
    forest_type char(1) not null
        default 'd'
        check (forest_type = 'd'),
    autumn_duration_days integer not null
        check (autumn_duration_days between 20 and 100),
    foreign key (forest_id, forest_type)
        references forests (forest_id, forest_type)
);
Clients usually use updatable views, one for each subtype, instead of using the base tables. (You can revoke privileges on the base subtype tables to guarantee this.) You might want to omit the "forest_type" column.
create view coniferous_forests as
select t1.forest_id, t1.forest_type, t1.forest_name, t1.area_sq_km,
       t2.pine_fir_ratio
from forests t1
inner join coniferous_forests_subtype t2
        on t1.forest_id = t2.forest_id;
create view deciduous_forests as
select t1.forest_id, t1.forest_type, t1.forest_name, t1.area_sq_km,
       t2.autumn_duration_days
from forests t1
inner join deciduous_forests_subtype t2
        on t1.forest_id = t2.forest_id;
What you have to do to make these views updatable varies a little with the DBMS, but expect to write some triggers (not shown). You'll need triggers to handle all the DML actions: insert, update, and delete.
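Since the triggers are not shown, here is a hedged sketch of the insert case only, assuming PostgreSQL 11 or later (older versions spell the last clause EXECUTE PROCEDURE). It re-creates trimmed copies of the tables above, and its view includes forest_name because forests.forest_name is NOT NULL and the trigger needs a value for it; the function and trigger names are hypothetical:

```sql
-- Trimmed copies of the tables above (deciduous subtype omitted).
CREATE TABLE forests (
    forest_id   integer PRIMARY KEY,
    forest_name varchar(45) NOT NULL,
    forest_type char(1) NOT NULL CHECK (forest_type IN ('c', 'd')),
    area_sq_km  integer NOT NULL CHECK (area_sq_km > 0),
    UNIQUE (forest_id, forest_type)
);
CREATE TABLE coniferous_forests_subtype (
    forest_id      integer PRIMARY KEY,
    forest_type    char(1) NOT NULL DEFAULT 'c' CHECK (forest_type = 'c'),
    pine_fir_ratio float NOT NULL CHECK (pine_fir_ratio >= 0),
    FOREIGN KEY (forest_id, forest_type)
        REFERENCES forests (forest_id, forest_type)
);
CREATE VIEW coniferous_forests AS
SELECT t1.forest_id, t1.forest_name, t1.area_sq_km, t2.pine_fir_ratio
FROM forests t1
JOIN coniferous_forests_subtype t2 ON t1.forest_id = t2.forest_id;

-- Split one row inserted into the view into one row per base table,
-- supplying the subtype code 'c' itself.
CREATE FUNCTION coniferous_forests_insert() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO forests (forest_id, forest_name, forest_type, area_sq_km)
    VALUES (NEW.forest_id, NEW.forest_name, 'c', NEW.area_sq_km);

    INSERT INTO coniferous_forests_subtype (forest_id, forest_type, pine_fir_ratio)
    VALUES (NEW.forest_id, 'c', NEW.pine_fir_ratio);

    RETURN NEW;
END;
$$;

CREATE TRIGGER coniferous_forests_insert_trg
    INSTEAD OF INSERT ON coniferous_forests
    FOR EACH ROW EXECUTE FUNCTION coniferous_forests_insert();

-- Clients insert through the view only.
INSERT INTO coniferous_forests (forest_id, forest_name, area_sq_km, pine_fir_ratio)
VALUES (1, 'Example Forest', 120, 0.6);
```

Update and delete triggers would follow the same pattern: write the common columns to forests and the rest to the subtype table.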
If you need to report only on columns that appear in "forests", then just query the table "forests".
Well, the easiest way is putting all the columns into one table and then having a "type" field to decide which columns to use. This works for smaller tables, but for more complicated cases it can lead to a big messy table and issues with database constraints (such as NULLs).
My preferred method would be something like this:
A generic "Forests" table with: id, type, [generic_columns, ...]
"Coniferous_Forests" table with: id, forest_id (FK to Forests), ...
So, in order to get all the data for a Coniferous Forest with id of 1, you'd have a query like so:
SELECT * FROM Coniferous_Forests INNER JOIN Forests
ON Coniferous_Forests.forest_id = Forests.id
AND Coniferous_Forests.id = 1
As for inventive solutions, there is such a thing as an OODBMS (Object-Oriented Database Management System).
The most popular alternatives to relational SQL databases are document-oriented NoSQL databases such as MongoDB. Storing data there is comparable to storing JSON objects, and lets you be more flexible with your database fields.

In Oracle - ALTER TABLE can I cascade data type changes?

I have a parent table (ParentId) and a child table (ParentId, ChildId). ParentId in the parent table is defined as VARCHAR2(25 characters), and so is ParentId in the child table. I want to change the data type to VARCHAR2(50 characters). Can I do it in such a way that this change is cascaded to the child table?
No. There is no way to cascade the changes. (Well, there might be some obscure way if you had defined your columns based on the same user-defined type but that would undoubtedly create more issues than it would solve). You'd need to issue one ALTER for the parent and one for the child table.
If you have a more complicated schema where you're trying to trace changes from a parent table through many child and grandchild tables, you could use the data dictionary to walk the hierarchy. Here is one query to return the table hierarchy based on one of Frank Kulash's OTN Forums posts. If the column is named the same in every table, you can do something even easier and just query dba_tab_columns for all the columns with that name and a data_length of 25 rather than 50.

How to write a SELECT statement when I don't know in advance what columns I want?

My application has several tables: a master OBJECT table, and several
tables for storing specific kinds of objects: CAT, SHOE and BOOK.
Here's an idea of what the table columns look like:
object
    object_id (primary key)
    object_type (string)
cat
    cat_id (primary key)
    object_id (foreign key)
    name (string)
    color (string)
shoe
    shoe_id (primary key)
    object_id (foreign key)
    model (string)
    size (string)
book
    book_id (primary key)
    object_id (foreign key)
    title (string)
    author (string)
From the user's point of view, each specific object is primarily identified
by its name, which is a different column for each table. For the CAT table
it's name, for the SHOE table it's model, and for the BOOK table it's
title.
Let's say I'm handed an object_id without knowing in advance what kind of
object it represents -- a cat, a shoe or a book. How do I write a
SELECT statement to get this information?
Obviously it would look a little like this:
SELECT object_type,name FROM object WHERE object_id = 12345;
But how do I get the right contents in the "name" column?
It seems like you're describing a scenario where the user's view on the data (objects have names, I don't care what type they are) is different from the model you're using to store the data.
If that is the case, and assuming you have some control over the database objects, I'd probably create a VIEW, allowing you to coalesce similar data for each type of object.
Example on SQL Server:
CREATE VIEW object_names AS
SELECT object_id, name FROM cat
UNION ALL
SELECT object_id, model AS name FROM shoe
UNION ALL
SELECT object_id, title AS name FROM book
GO
You can then SELECT name FROM object_names WHERE object_id = 12345, without concerning yourself with the underlying column names.
Your only real solutions basically boil down to the same thing: writing explicit statements for each specific table and unioning them into a single result set.
You can either do this in a view (giving you a dynamic database object that you can query) or as part of the query (whether it's straight SQL or a stored procedure). You don't mention which database you're using, but the basic query is something like this:
select object_id, name from cat where object_id = 12345 union all
select object_id, model from shoe where object_id = 12345 union all
select object_id, title from book where object_id = 12345
For SQL Server, the syntax for creating the view would be:
create view object_view as
select 'cat' as type, object_id, name from cat union all
select 'shoe', object_id, model from shoe union all
select 'book', object_id, title from book
And you could query like:
select type, name from object_view where object_id = 12345
However, what you have is a basic table inheritance pattern, but it's implemented improperly since:
The primary key of child tables (cat, shoe, book) should also be a foreign key to the parent table (object). You should not have a different key for this, unless two cat records can represent the same object (in which case this is not inheritance at all).
Common elements, such as a name, should be represented at the highest level of the hierarchy as appropriate (in this case in object, since all of the objects have the concept of a "name").
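A sketch of that corrected design, assuming PostgreSQL; the sample row is invented. Each subtype table's primary key doubles as its foreign key to object, and the user-facing name moves up into object:

```sql
-- Re-runnable: drop the sketch's tables if they already exist.
DROP TABLE IF EXISTS cat, shoe, book, object;

CREATE TABLE object (
    object_id   int  PRIMARY KEY,
    object_type text NOT NULL CHECK (object_type IN ('cat', 'shoe', 'book')),
    name        text NOT NULL  -- a cat's name, a shoe's model, a book's title
);

-- Each subtype's primary key is also its foreign key to object,
-- so one object row has at most one row per subtype table.
CREATE TABLE cat  (object_id int PRIMARY KEY REFERENCES object, color  text);
CREATE TABLE shoe (object_id int PRIMARY KEY REFERENCES object, size   text);
CREATE TABLE book (object_id int PRIMARY KEY REFERENCES object, author text);

INSERT INTO object VALUES (12345, 'shoe', 'Runner 2000');
INSERT INTO shoe   VALUES (12345, '42');

-- The name no longer depends on which subtype the object is.
SELECT object_type, name FROM object WHERE object_id = 12345;
```

This alone does not stop a shoe row from pointing at an object whose object_type is 'cat'; the composite (forest_id, forest_type) foreign key in the forests answer above is one way to enforce that.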
Do a join:
http://www.tizag.com/sqlTutorial/sqljoin.php
You can't. Why not name them the same thing, or pull that name back to the OBJECT table? You can very easily create a column called Name and put it in the OBJECT table. This would still be normalized.
It seems like you have a few options. You could use a config file or a 'schema' table. You could rename your columns so that the name column is always called the same thing. You could have the class in your code know its table. You could make your architecture a little less generic, and allow the data access layer to understand the data it's accessing.
Which to choose? What problem are you solving? What problem were you solving, whose solution created this problem?
There's really no way to do this without first SELECTing to find out the kind, then SELECTing a second time to get the actual data. If you only have a few different kinds of objects, you could do it with a single SELECT and a bunch of LEFT JOINs to join all the tables at once, but that doesn't scale well if you've got lots of joiner tables.
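For the few-kinds case, a single-SELECT sketch against the question's original schema (PostgreSQL syntax, invented sample row), assuming each object has at most one row in its subtype table. The CASE uses object_type to pick the right name-like column:

```sql
-- Re-runnable: drop the sketch's tables if they already exist.
DROP TABLE IF EXISTS cat, shoe, book, object;

-- The question's original schema (types simplified).
CREATE TABLE object (object_id int PRIMARY KEY, object_type text NOT NULL);
CREATE TABLE cat  (cat_id  int PRIMARY KEY, object_id int REFERENCES object, name  text, color  text);
CREATE TABLE shoe (shoe_id int PRIMARY KEY, object_id int REFERENCES object, model text, size   text);
CREATE TABLE book (book_id int PRIMARY KEY, object_id int REFERENCES object, title text, author text);

INSERT INTO object VALUES (12345, 'shoe');
INSERT INTO shoe   VALUES (1, 12345, 'Runner 2000', '42');

-- One round trip: LEFT JOIN every subtype table, then let
-- object_type decide which name-like column to return.
SELECT o.object_type,
       CASE o.object_type
           WHEN 'cat'  THEN c.name
           WHEN 'shoe' THEN s.model
           WHEN 'book' THEN b.title
       END AS name
FROM object o
LEFT JOIN cat  c ON c.object_id = o.object_id
LEFT JOIN shoe s ON s.object_id = o.object_id
LEFT JOIN book b ON b.object_id = o.object_id
WHERE o.object_id = 12345;
```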
But just thinking outside the box a bit, does the "identifier" that users see have to correspond exactly to the primary key in the table? Could you encode the "kind" of the object in the identifier itself? So for example, if object_id 12345 is a shoe you could "encode" this as "S12345" from the user's point of view. A book would be "B4567" and a cat "C2578". Then in your code, just separate out the first letter and use that to decide which table to join on, and the remaining numbers are your primary key.
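The decoding step could also live in the database. A sketch with two hypothetical helper functions, object_kind and object_key, assuming PostgreSQL and the one-letter prefixes from the answer (C, S, B):

```sql
-- Hypothetical helpers: external IDs look like 'S12345' (kind prefix + object_id).
CREATE FUNCTION object_kind(ext_id text) RETURNS text
LANGUAGE sql IMMUTABLE AS $$
    SELECT CASE left(ext_id, 1)
               WHEN 'C' THEN 'cat'
               WHEN 'S' THEN 'shoe'
               WHEN 'B' THEN 'book'
           END  -- unknown prefix -> NULL
$$;

CREATE FUNCTION object_key(ext_id text) RETURNS int
LANGUAGE sql IMMUTABLE AS $$
    SELECT substr(ext_id, 2)::int
$$;

SELECT object_kind('S12345'), object_key('S12345');  -- shoe, 12345
```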
If you cannot alter the original table due to dependencies, you could probably create a view of the table with uniform column name. More information on how to create views can be found here.
There is a table you can look at that tells you the properties of all the tables (and their columns) in your DB.
In Postgres the column definitions are in information_schema.columns (pg_stat_all_tables holds per-table statistics rather than column properties), and SQL Server has an information_schema.columns view as well. You could query this to work out what you require, then construct a query based on that info...
EDIT: Sorry, re-reading the question, I don't think that is what you require. I've solved a similar problem before with a surrogate key table: one table holding all the IDs and a type ID, plus a serial/identity column that serves as the primary key for that table. That is the ID you should use; you can then create a view that looks up the other information based on the type ID in that table.
The 'entity ref' table would have columns such as 'entityref' (PK), 'id' and 'type id' (this assumes you can't restructure to use inheritance).