Select (retrieve) all records from multiple schemas using Postgres - sql

I have a PostgreSQL database with some schemas, like below:
My_Database
|-> Schemas
    |-> AccountA
    |-> AccountB
    |-> AccountC
    |-> AccountD
    |-> AccountE
    ...
    |-> AccountZ
All schemas have a table called product, which has a column called title. I would like to know if it is possible to execute a single SELECT statement that retrieves all records from all schemas that match a certain condition.
The only way I have found so far is to run a query account by account, like below:
SET search_path TO AccountA;
SELECT title FROM product WHERE title ILIKE '%test%';
Schemas are created dynamically, so I don't know their names or how many of them exist.

With inheritance, as @Denis mentioned, this would be very simple. It works for Postgres 8.4, too. Be sure to consider the limitations.
Basically, you would have a master table, I suppose in a master schema:
CREATE TABLE master.product (title text);
And all other tables in various schemata inherit from it, possibly adding more local columns:
CREATE TABLE a.product (product_id serial PRIMARY KEY, col2 text)
INHERITS (master.product);
CREATE TABLE b.product (product_id serial PRIMARY KEY, col2 text, col3 text)
INHERITS (master.product);
etc.
Tables don't have to share the same name or schema.
Then you can query all tables in one fell swoop:
SELECT title, tableoid::regclass::text AS source
FROM master.product
WHERE title ILIKE '%test%';
tableoid::regclass::text is a handy way to tell the source of each row. But it interacts with the search_path. See:
Find out which schema based on table values

You basically want a union all:
SELECT title FROM AccountA.product WHERE title ILIKE '%test%'
UNION ALL
SELECT title FROM AccountB.product WHERE title ILIKE '%test%'
UNION ALL
...;
You can do so automatically by using dynamic SQL and the catalog to locate all AccountXYZ schemas that have a product table.
Alternatively, create an AllAccounts schema with tables similar to the ones in the individual schemas, and use table inheritance.
Note that neither approach will tell you which schema the data comes from. In the former case this is easy enough to add; not so much in the latter, unless you add an extra column.
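The dynamic-SQL route can be sketched as a set-returning PL/pgSQL function. This is an illustrative sketch, not from the question: the function name all_products is made up, and it assumes title is of type text in every schema's product table.

```sql
-- Find every schema that has a product table and query them all at once.
CREATE OR REPLACE FUNCTION all_products(pattern text)
RETURNS TABLE (source_schema text, title text)
LANGUAGE plpgsql AS
$$
DECLARE
    s text;
BEGIN
    FOR s IN
        SELECT table_schema
        FROM information_schema.tables
        WHERE table_name = 'product'
    LOOP
        -- %I quotes the schema as an identifier, %L as a literal
        RETURN QUERY EXECUTE
            format('SELECT %L::text, title FROM %I.product WHERE title ILIKE %L',
                   s, s, pattern);
    END LOOP;
END
$$;

-- Usage:
-- SELECT * FROM all_products('%test%');
```

Because the schema list comes from information_schema at call time, newly created account schemas are picked up automatically.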

Related

Generate dynamic GraphQL Schema

I'm creating a system where the number of columns in a table is not fixed. I'll explain with an example.
Consider a people table with 3 columns.
people
----------
id | name | email
This table will be exposed to a GraphQL API and I can query the table. In my system, the user will be able to add custom columns to the people table. Let's say they add a nationality column. When they do this, the nationality won't be available in the API because it is not defined in the Schema.
So how can I make my schema dynamic that allows the user to query the people table with every extra column they add?
I can query the information_schema views or a fields table to get the extra fields for the people table, and then use GraphQLObjectType to build out my schema rather than using SDL.
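That column lookup against information_schema could look like this (standard SQL; the table name people comes from the question above):

```sql
-- List every column of the people table, including user-added ones,
-- in the order they were defined:
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'people'
ORDER BY ordinal_position;
```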
This is not a direct answer to your question but a suggestion, because creating dynamic columns in SQL doesn't look like a good idea to me. Instead of thinking about how to create a dynamic schema, I think you should rethink your DB structure.
Instead of adding new columns to the people table, you should have a separate table for your custom columns, e.g. a person_columns table like:
person_columns
----------
id | people_id | column_name | column_value
So for every person you can have multiple columns and their corresponding values, and you don't have to worry about dynamically creating your GraphQL schema. (Depending on your requirements, you can add more columns to the person_columns table for more control.)
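As a sketch, the suggested table could be created like this (PostgreSQL syntax; the REFERENCES clause assumes people has an id primary key):

```sql
CREATE TABLE person_columns (
    id           serial PRIMARY KEY,
    people_id    int NOT NULL REFERENCES people (id),
    column_name  text NOT NULL,
    column_value text
);

-- All custom fields for one person:
SELECT column_name, column_value
FROM person_columns
WHERE people_id = 1;
```

This is the classic entity-attribute-value (EAV) pattern: flexible, at the cost of losing per-column types and constraints.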

What are "descendant tables" in Postgresql?

Database dumps from Postgresql use ALTER TABLE ONLY tablename instead of ALTER TABLE tablename which I am familiar with. I was curious what the ONLY keyword does, so I looked it up in the Postgresql documentation, and it says the following:
name
The name (optionally schema-qualified) of an existing table to alter. If ONLY is specified before the table name, only that table is altered. If ONLY is not specified, the table and all its descendant tables (if any) are altered. Optionally, * can be specified after the table name to explicitly indicate that descendant tables are included.
What are descendant tables?
PostgreSQL implements table inheritance, which can be a useful tool
for database designers. (SQL:1999 and later define a type inheritance
feature, which differs in many respects from the features described
here.)
Let's start with an example: suppose we are trying to build a data
model for cities. Each state has many cities, but only one capital. We
want to be able to quickly retrieve the capital city for any
particular state. This can be done by creating two tables, one for
state capitals and one for cities that are not capitals. However, what
happens when we want to ask for data about a city, regardless of
whether it is a capital or not? The inheritance feature can help to
resolve this problem. We define the capitals table so that it inherits
from cities:
CREATE TABLE cities (
name text,
population float,
altitude int -- in feet
);
CREATE TABLE capitals (
state char(2)
) INHERITS (cities);
In this case, the capitals table inherits all the columns of its
parent table, cities. State capitals also have an extra column, state,
that shows their state.
In PostgreSQL, a table can inherit from zero or more other tables, and
a query can reference either all rows of a table or all rows of a
table plus all of its descendant tables. The latter behavior is the
default.
Source: https://www.postgresql.org/docs/8.4/static/ddl-inherit.html
Descendant tables of a table are all tables that inherit from it, either directly or indirectly. So if table B inherits table A, and table C inherits table B, then:
Tables B and C are descendant tables of A.
Table C is a descendant table of B.
A query against a table (without ONLY) is a query against the table and all descendant tables. So, for example, a SELECT on a table with descendant tables is effectively a UNION of SELECT ... FROM ONLY across that table and all of its descendant tables. (In fact, if you inspect the query plan for a SELECT query on a table with descendants, you'll see that the plan is nearly identical to such a UNION query.)
If you are not using table inheritance, then the ONLY keyword has no effect on queries, as the set of descendant tables is empty.
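Using the cities/capitals example above, the difference looks like this:

```sql
-- Default: rows from cities plus all descendant tables (here, capitals):
SELECT name, altitude FROM cities;

-- Only rows stored in cities itself, excluding capitals:
SELECT name, altitude FROM ONLY cities;
```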

in postgres, is it possible to optimize a VIEW of UNIONs

in the database there are many identical schemas, cmp01..cmpa0
each schema has a users table
each schema's users table's primary key has its own unique range
for example, in cmp01.users the usr_id is between 0x01000000 and 0x01ffffffff.
is there any way I could define a VIEW global.users that is a union of each of the cmp*.users tables in such a way that, if querying by usr_id, the optimizer would head for the correct schema?
was thinking something like:
create view global.users as
select * from cmp01.users where usr_id between 0x01000000 and 0x01ffffffff
union all
select * from cmp02.users where usr_id between 0x02000000 and 0x02ffffffff
....
Would this work? No: EXPLAIN ANALYZE shows that all schemas are scanned.
Is there an approach that might give good hints to the optimizer?
Why not create a table in a public schema that holds all users, with an extra column to store the source schema? Since the ids are globally unique, you can keep the id column as the primary key:
create table all_users (
source_schema varchar(32),
usr_id int primary key,
-- other columns as per existing table(s)
);
Populate the table by inserting all rows:
insert into all_users
select 'cmp01', u.* from cmp01.users u
union all
select 'cmp02', u.* from cmp02.users u; -- etc.
Use triggers to keep the table up to date.
It's not that hard to set up, and it will perform very well.
What about creating a partitioned table? The master table would be created as global.users and it would be partitioned by the schema name.
That way you'd get small user tables in each schema (including fast retrieval), provided you can write queries that PostgreSQL can optimize, i.e. queries that include the schema name in the WHERE condition. You could also create a view in each schema that hides the needed schema name when querying the partitioned tables. I don't think it would work by specifying only the usr_id; I fear that PostgreSQL's partitioning features are not smart enough for that.
Or use just one single table, and create views in each schema with an instead of trigger and limiting the result to that schema's users.
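On PostgreSQL 10 or later, the partitioning idea sketched above can be expressed with declarative range partitioning on usr_id. The partition names, the bigint type, and the exact bounds below are my assumptions, chosen to follow the 0x01000000-style prefixes in the question:

```sql
CREATE TABLE global.users (
    usr_id bigint PRIMARY KEY,
    name   text
) PARTITION BY RANGE (usr_id);

-- Upper bound is exclusive in FOR VALUES ranges:
CREATE TABLE global.users_cmp01 PARTITION OF global.users
    FOR VALUES FROM (16777216) TO (33554432);  -- 0x01000000 .. 0x01ffffff
CREATE TABLE global.users_cmp02 PARTITION OF global.users
    FOR VALUES FROM (33554432) TO (50331648);  -- 0x02000000 .. 0x02ffffff

-- A lookup by usr_id is pruned to a single partition:
SELECT * FROM global.users WHERE usr_id = 16777300;
```

With this layout the planner does the range routing itself, so no per-schema WHERE hints are needed.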
Try something like:
create view global.users as
select u.*
from (select 'cmp01' sel_schema, 0x01000000 usr_id_start, 0x01ffffffff usr_id_end
      union all
      select 'cmp02' sel_schema, 0x02000000 usr_id_start, 0x02ffffffff usr_id_end) s
join (select u1.*, 'cmp01' schema from cmp01.users u1
      union all
      select u2.*, 'cmp02' schema from cmp02.users u2) u
  on s.sel_schema = u.schema
and include a condition like specified_usr_id between usr_id_start and usr_id_end when querying the view by a specified user ID.

SQLite create pre-populated FTS table

Is there a way to create an FTS table in SQLite that is pre-populated with data from a SELECT query?
I know it’s possible to create a regular table that is prepopulated with data from a SELECT:
CREATE TABLE foo AS SELECT ref_id, name FROM other_table
And we can create an FTS table like so:
CREATE VIRTUAL TABLE bar USING FTS3(ref_id, name)
The point of doing this is to update my app’s SQLite database schema while avoiding reading in all of the data from other_table. I’m really hoping there’s some way to let SQLite do all the heavy lifting here (which is what it's really good at!).
I'm not sure if you can do it in one statement, but you can do it in two... after your CREATE VIRTUAL TABLE statement, you can do: INSERT INTO bar SELECT * FROM other_table
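Put together, the two-statement version looks like this (using FTS3 as in the question; naming the columns explicitly guards against column-order mismatches, and the MATCH term is just an example):

```sql
CREATE VIRTUAL TABLE bar USING fts3(ref_id, name);
INSERT INTO bar (ref_id, name) SELECT ref_id, name FROM other_table;

-- Full-text queries then work as usual:
SELECT ref_id FROM bar WHERE name MATCH 'sometext';
```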

How to write a SELECT statement when I don't know in advance what columns I want?

My application has several tables: a master OBJECT table, and several
tables for storing specific kinds of objects: CAT, SHOE and BOOK.
Here's an idea of what the table columns look like:
object
object_id (primary key)
object_type (string)
cat
cat_id (primary key)
object_id (foreign key)
name (string)
color (string)
shoe
shoe_id (primary key)
object_id (foreign key)
model (string)
size (string)
book
book_id (primary key)
object_id (foreign key)
title (string)
author (string)
From the user's point of view, each specific object is primarily identified
by its name, which is a different column for each table. For the CAT table
it's name, for the SHOE table it's model, and for the BOOK table it's
title.
Let's say I'm handed an object_id without knowing in advance what kind of
object it represents -- a cat, a shoe or a book. How do I write a
SELECT statement to get this information?
Obviously it would look a little like this:
SELECT object_type,name FROM object WHERE object_id = 12345;
But how do I get the right contents in the "name" column?
It seems like you're describing a scenario where the user's view on the data (objects have names, I don't care what type they are) is different from the model you're using to store the data.
If that is the case, and assuming you have some control over the database objects, I'd probably create a VIEW, allowing you to coalesce similar data for each type of object.
Example on SQL Server:
CREATE VIEW object_names AS
SELECT object_id, name FROM cat
UNION ALL
SELECT object_id, model AS name FROM shoe
UNION ALL
SELECT object_id, title AS name FROM book
GO
You can then SELECT name FROM object_names WHERE object_id = 12345, without concerning yourself with the underlying column names.
Your only real solutions basically boil down to the same thing: writing explicit statements for each specific table and unioning them into a single result set.
You can either do this in a view (giving you a dynamic database object that you can query) or as part of the query (whether it's straight SQL or a stored procedure). You don't mention which database you're using, but the basic query is something like this:
select object_id, name from cat where object_id = 12345 union all
select object_id, model from shoe where object_id = 12345 union all
select object_id, title from book where object_id = 12345
For SQL Server, the syntax for creating the view would be:
create view object_view as
select 'cat' as type, object_id, name from cat union all
select 'shoe', object_id, model from shoe union all
select 'book', object_id, title from book
And you could query like:
select type, name from object_view where object_id = 12345
However, what you have is a basic table inheritance pattern, but it's implemented improperly since:
The primary key of child tables (cat, shoe, book) should also be a foreign key to the parent table (object). You should not have a different key for this, unless two cat records can represent the same object (in which case this is not inheritance at all)
Common elements, such as a name, should be represented at the highest level of the hierarchy as appropriate (in this case in object, since all of the objects have the concept of a "name").
do a join
http://www.tizag.com/sqlTutorial/sqljoin.php
You can't. Why not name them the same thing, or pull that name back into the OBJECT table? You can very easily create a column called name and put it in the OBJECT table. This would still be normalized.
It seems like you have a few options. You could use a config file or a 'schema' table. You could rename your columns so that the name column is always the same. You could have the class in your code know its table. You could make your architecture a little less generic, and allow the data access layer to understand the data it's accessing.
Which to choose? What problem are you solving? What problem were you solving, whose solution created this problem?
There's really no way to do this without first SELECTing to find out the kind, then SELECTing a second time to get the actual data. If you only have a few different kinds of objects, you could do it with a single SELECT and a bunch of LEFT JOINs to join all the tables at once, but that doesn't scale well if you've got lots of joiner tables.
But just thinking outside the box a bit, does the "identifier" that users see have to correspond exactly to the primary key in the table? Could you encode the "kind" of the object in the identifier itself? So for example, if object_id 12345 is a shoe you could "encode" this as "S12345" from the user's point of view. A book would be "B4567" and a cat "C2578". Then in your code, just separate out the first letter and use that to decide which table to join on, and the remaining numbers are your primary key.
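A sketch of that decoding step (PostgreSQL syntax; the literal 'S12345' is just an example identifier):

```sql
-- Split a user-facing id like 'S12345' into its type and numeric key:
SELECT CASE left('S12345', 1)
           WHEN 'C' THEN 'cat'
           WHEN 'S' THEN 'shoe'
           WHEN 'B' THEN 'book'
       END                      AS object_type,
       substr('S12345', 2)::int AS object_id;
```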
If you cannot alter the original table due to dependencies, you could probably create a view of the table with uniform column name. More information on how to create views can be found here.
There is a table you can look at that tells you the properties of all the tables (and columns) in your DB.
In Postgres this is something like pg_stat_all_tables; I think there is something similar in SQL Server. You could query this to work out what you need, then construct a query based on that info...
EDIT: Sorry, re-reading the question, I don't think that is what you require. I've solved a similar problem before by having a surrogate key table: one table with all the ids in it and a type id, plus a serial/identity column that contains the primary key for that table. That is the id you should use. Then you can create a view which looks up the other information based on the type id in that table.
The 'entity ref' table would have columns 'entityref' (PK), 'id', 'type id' etc... (that is assuming you can't restructure to use inheritance)
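A sketch of that surrogate key table (column names taken from the answer above, types assumed):

```sql
CREATE TABLE entity_ref (
    entityref serial PRIMARY KEY,  -- the single id handed out to callers
    id        int NOT NULL,        -- key within the type-specific table
    type_id   int NOT NULL         -- which table: cat, shoe or book
);
```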