in postgres, is it possible to optimize a VIEW of UNIONs - sql

in the database there are many identical schemas, cmp01..cmpa0
each schema has a users table
each schema's users table's primary key has its own unique range
for example, in cmp01.users the usr_id is between 0x01000000 and 0x01ffffff.
is there any way I could define a VIEW global.users that is a union of each of the cmp*.users tables in such a way that, if querying by usr_id, the optimizer would head for the correct schema?
was thinking something like:
create view global.users as
select * from cmp01.users where usr_id between 0x01000000 and 0x01ffffff
union all
select * from cmp02.users where usr_id between 0x02000000 and 0x02ffffff
....
would this work? NO. EXPLAIN ANALYZE shows that every schema is scanned.
Is there an approach that might give good hints to the optimizer?

Why not create a table in a public schema that has all users in it, possibly with an extra column to store the source schema. Since the ids are globally unique, you could keep the id column unique:
create table all_users (
source_schema varchar(32),
usr_id int primary key,
-- other columns as per existing table(s)
);
Populate the table by inserting all rows:
insert into all_users
select 'cmp01', * from cmp01.users union
select 'cmp02', * from cmp02.users union ...; -- etc
Use triggers to keep the table up to date.
It's not that hard to set up, and it will perform very well.

What about creating a partitioned table? The master table would be created as global.users and it would be partitioned by the schema name.
That way you'd get the small user tables in each schema (including fast retrievals) provided you can create queries that PostgreSQL can optimize i.e. including the schema name in the where condition. You could also create a view in each schema that would hide the needed schema name to query the partitioned tables. I don't think it would work by specifying only the user_id. I fear that PostgreSQL's partitioning features are not smart enough for that.
Or use just one single table, and create views in each schema with an instead of trigger and limiting the result to that schema's users.

Try something like:
create view global.users as
select *
from (select 'cmp01' sel_schema, 0x01000000 usr_id_start, 0x01ffffff usr_id_end
union all
select 'cmp02' sel_schema, 0x02000000 usr_id_start, 0x02ffffff usr_id_end) s
join (select u1.*, 'cmp01' schema from cmp01.users u1
union all
select u2.*, 'cmp02' schema from cmp02.users u2) u
on s.sel_schema = u.schema
and include a condition like specified_usr_id between usr_id_start and usr_id_end when querying the view by a specified user ID.
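Since each schema owns a fixed usr_id range, the target schema can also be worked out on the client side before the query is sent. The following is only a rough sketch of that idea, not something from the answers above. It assumes the ranges follow the 0xNN000000..0xNNffffff pattern from the question, that the schema suffixes cmp01..cmpa0 are two-digit hex, and that the driver accepts $1-style placeholders; the function names are made up for illustration.

```python
def schema_for_usr_id(usr_id: int) -> str:
    """Map a usr_id to its schema, assuming the top byte of the id selects it."""
    prefix = usr_id >> 24  # e.g. 0x01abcdef -> 0x01
    # Assumption: schemas are numbered cmp01..cmpa0 in hex, as in the question.
    if not 0x01 <= prefix <= 0xA0:
        raise ValueError(f"usr_id {usr_id:#x} is outside every known schema range")
    return f"cmp{prefix:02x}"


def users_query(usr_id: int):
    """Build a query that touches only the schema owning usr_id."""
    schema = schema_for_usr_id(usr_id)
    # The schema name is derived from an integer, so interpolating it is safe here.
    return f"SELECT * FROM {schema}.users WHERE usr_id = $1", (usr_id,)
```

Calling users_query(0x02000005) yields a statement against cmp02.users only, so the planner never has to consider the other schemas.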

Related

How to delete customer information from hdfs

Suppose I have several customers today, and I am storing their information such as customer_id, customer_name, customer_emailid etc. If a customer leaves and wants their personal information removed from my HDFS, how can I delete it?
So I have below two approaches to achieve the same.
Approach 1:
1. Create an internal table on top of HDFS
2. Create an external table from the first table using filter logic
3. While creating the second table, apply UDFs on specific columns for more column filtering
Approach 2:
Spark=> Read, filter, write
Is there any other solution?
Approach 2 is possible in Hive - select, filter, write
Create a table on top of directory in hdfs (external or managed, does not matter in this context, better external if you are going to drop table later and keep the data as is). Insert overwrite table or partition from select with filter.
insert overwrite table mytable
select *
from mytable --the same table
where customer_id not in (...) --filter rows

Select (retrieve) all records from multiple schemas using Postgres

I have a PostgreSQL database with some schemas, like below:
My_Database
|-> Schemas
|-> AccountA
|-> AccountB
|-> AccountC
|-> AccountD
|-> AccountE
.
.
.
|-> AccountZ
All schemas have a table called product which has a column called title. I would like to know if it is possible to execute a select statement to retrieve all records from all schemas that match a certain condition.
The only way I have found so far is to run a query account by account, like below.
SET search_path TO AccountA;
SELECT title FROM product WHERE title ILIKE '%test%';
Schemas are created dynamically, so I don't know their names or how many of them exist.
With inheritance like @Denis mentioned, this would be very simple. Works for Postgres 8.4, too. Be sure to consider the limitations.
Basically, you would have a master table, I suppose in a master schema:
CREATE TABLE master.product (title text);
And all other tables in various schemata inherit from it, possibly adding more local columns:
CREATE TABLE a.product (product_id serial PRIMARY KEY, col2 text)
INHERITS (master.product);
CREATE TABLE b.product (product_id serial PRIMARY KEY, col2 text, col3 text)
INHERITS (master.product);
etc.
Tables don't have to share the same name or schema.
Then you can query all tables in a single fell swoop:
SELECT title, tableoid::regclass::text AS source
FROM master.product
WHERE title ILIKE '%test%';
tableoid::regclass::text is a handy way to tell the source of each row. But it interacts with the search_path. See:
Find out which schema based on table values
You basically want a union all:
SELECT title FROM AccountA.product WHERE title ILIKE '%test%'
UNION ALL
SELECT title FROM AccountB.product WHERE title ILIKE '%test%'
UNION ALL
...;
You can do so automatically by using dynamic SQL and the catalog to locate all AccountXYZ schemas that have a product table.
Alternatively, create an AllAccounts schema with similar tables as the ones in individual schemas, and use table inheritance.
Note that neither will tell you which schema the data is from, however. In the former case, this is easy enough to add; not so much in the latter unless you add an extra column.
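As a rough sketch of the dynamic SQL route, the snippet below builds the UNION ALL statement from a list of schema names and adds the source schema as an extra column. The names CATALOG_QUERY, quote_ident, quote_literal and build_union_query are made up for this example. It assumes the schema list is fetched first with the catalog query shown, and that the driver accepts $1-style placeholders (as PREPARE does); adapt both to your setup.

```python
# Catalog lookup (run this first; its result feeds build_union_query).
CATALOG_QUERY = """
SELECT table_schema
FROM information_schema.tables
WHERE table_name = 'product'
ORDER BY table_schema
"""


def quote_ident(name: str) -> str:
    """Double-quote an identifier so mixed-case names like AccountA survive."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Single-quote a string literal (assumes standard_conforming_strings = on)."""
    return "'" + value.replace("'", "''") + "'"


def build_union_query(schemas):
    """Return one SELECT per schema, glued together with UNION ALL."""
    if not schemas:
        raise ValueError("no schemas with a product table were found")
    parts = [
        f"SELECT {quote_literal(s)} AS source_schema, title "
        f"FROM {quote_ident(s)}.product WHERE title ILIKE $1"
        for s in schemas
    ]
    return "\nUNION ALL\n".join(parts)


example_sql = build_union_query(["AccountA", "AccountB"])
```

The generated statement takes the search pattern (for example '%test%') as its single parameter, and needs to be rebuilt whenever schemas are added or dropped.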

Alter Physical Structure of table Oracle11gr2

I have a table
Name Age RollNo.
A 1 10
B 2 20
Now I want to alter the table permanently so that, after altering, it looks as below
RollNo. Name Age
10 A 1
20 B 2
How should I alter this table? All I want to do is change the physical structure of the table.
Why do you want to do it?
If it's just because you'd like to have a correct order of columns when using SELECT *, then you should not have used * in the first place. Always use the exact list of columns in your queries.
If it's because you think it would improve the performance, have you done the actual measurements? I doubt you'll find many scenarios where changing the physical column order influences performance in a significant way. There are some scenarios with chained rows where it might (see the "Row Chaining" section in this article), but that doesn't apply to narrow rows such as yours.
That being said, you could:
CREATE TABLE NEW_TABLE AS SELECT <different column order> FROM OLD_TABLE.
Recreate all the relevant constraints (such as keys, FKs), indexes and triggers/procedures on the NEW_TABLE.
DROP TABLE OLD_TABLE.
ALTER TABLE NEW_TABLE RENAME TO OLD_TABLE.
You might also want to look at the DBMS_REDEFINITION package if you need to do that while accepting updates.
You can drop and recreate the table without losing the data in Oracle by first copying it with the statement
create table YOUR_TABLE_BU as select * from YOUR_TABLE
Please go through the link - How can I create a copy of an Oracle table without copying the data? for more details. Try:
CREATE TABLE YOUR_TABLE_BU AS SELECT * FROM YOUR_TABLE;
DROP TABLE YOUR_TABLE;
CREATE TABLE YOUR_TABLE AS SELECT RollNo, Name, Age FROM YOUR_TABLE_BU; -- assumes the column is named RollNo
DROP TABLE YOUR_TABLE_BU;

SQLite create pre-populated FTS table

Is there a way to create an FTS table in SQLite that is pre-populated with data from a SELECT query?
I know it’s possible to create a regular table that is prepopulated with data from a SELECT:
CREATE TABLE foo AS SELECT ref_id, name FROM other_table
And we can create an FTS table like so:
CREATE VIRTUAL TABLE bar USING FTS3(ref_id, name)
The point of doing this is to update my app’s SQLite database schema while avoiding reading in all of the data from other_table. I’m really hoping there’s some way to let SQLite do all the heavy lifting here (which is what it's really good at!).
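I'm not sure if you can do it in one statement, but you can do it in two. After your CREATE VIRTUAL TABLE statement, you can run: INSERT INTO bar (ref_id, name) SELECT ref_id, name FROM other_table. Listing the columns explicitly keeps the insert valid even if other_table has more columns than the FTS table.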
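A minimal sketch of the two-statement approach, run through Python's sqlite3 module so it can be tried against an in-memory database. It assumes the SQLite build has FTS3 enabled; the sample rows and the extra column on other_table are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for the existing source table (sample data is made up).
conn.execute("CREATE TABLE other_table (ref_id INTEGER, name TEXT, extra TEXT)")
conn.executemany(
    "INSERT INTO other_table VALUES (?, ?, ?)",
    [(1, "red apple", "x"), (2, "green pear", "y")],
)

# Statement 1: create the empty FTS table.
conn.execute("CREATE VIRTUAL TABLE bar USING fts3(ref_id, name)")
# Statement 2: let SQLite copy the rows; nothing is read into the application.
conn.execute("INSERT INTO bar (ref_id, name) SELECT ref_id, name FROM other_table")
conn.commit()

# Full-text lookup against the freshly populated table.
matches = conn.execute(
    "SELECT ref_id, name FROM bar WHERE name MATCH 'apple'"
).fetchall()
```

Both statements execute inside SQLite, so the application only issues the SQL and never iterates over the rows of other_table.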

Grant access to subset of table to user on PostgreSQL

I know that I can use views to grant access to a subset of attributes in a table. But how can I grant access to particular tuples only? Say I have a table of registered students with a username attribute and some others like degree_status. How do I grant access so that user A can only select from the table the tuple corresponding to username A? I have a database exam and I'm studying some past papers. I came across this question but I don't know how to answer it, and I can't find how to do it in my book "Database System: A Practical Approach to Database Design, Implementation and Management".
Thanks any help is much appreciated!
Matt
Say that you have:
Table items (item_id, ...)
Table users (user_id, ...)
Table users_permissions( user_id, item_id, perm_type )
You could create a VIEW like this:
SELECT i.*, p.perm_type
FROM items i JOIN users_permissions p USING (item_id)
WHERE p.user_id = get_current_user_id();
Users can select from this view but not remove the WHERE and JOIN restricting the permissions.
The get_current_user_id() function is likely to be the major problem ;)
Along the lines of peufeu's answer, in PostgreSQL the current user name is available through the function current_user. So a view
CREATE VIEW available_bigtable AS
SELECT * FROM bigtable
WHERE username = current_user;
looks like it does what you need. Grant SELECT to everyone on the view, but to no one (except admins) on the underlying bigtable.
The Veil project provides a framework for row-level access control in PostgreSQL.
How about creating a function that takes the user id and returns the subset of rows he has access to?
CREATE FUNCTION user_items(integer) RETURNS SETOF items AS $$
SELECT * FROM items WHERE user_id = $1
$$ LANGUAGE SQL;
SELECT * FROM user_items(55); -- 55 being the user id
Edit: Thinking about it more, this could cause quite a performance hit, as the user_id condition would be applied to the whole data set, prior to any other "user-land" conditions.
For example, SELECT * FROM user_items(55) WHERE id=45 would first filter the entire table for the user's items, and only then find the ID in that subset.
With views, the query planner can decide on the optimal order to evaluate the conditions (it will probably filter on the ID first, then on the user ID). When using a function like the one I suggested, Postgres can't do that.