Index on a Base64-encoded column (Oracle)

The requirement is to fetch data from a remote DB using a fast-refresh materialized view.
Certain columns need to be encrypted, so we applied STANDARD_HASH.
BASE64_ENCODE is not supported in a fast-refresh materialized view, so we created a view V_TEST and applied the function there.
The problem is that we want to create a function-based index on the MVIEW, as we need to fetch the ID based on ENCRY_TEXT.
The function-based index we tried to create on the MVIEW, using UTL_RAW.cast_to_varchar2 (
UTL_ENCODE.base64_encode (mv.ENCRY_TEXT)), throws an error saying the function must be deterministic.
We created a deterministic function and used it in the index creation script, but now it throws another error:
ORA-01450: maximum key length (6397) exceeded
The data type of ENCRY_TEXT in the MVIEW is VARCHAR2(64).
CREATE MATERIALIZED VIEW TEST
BUILD IMMEDIATE
REFRESH FAST ON DEMAND
WITH PRIMARY KEY
AS
SELECT TC.ROWID TC_ROWID,
TU.ROWID TU_ROWID,
TU.ID AS ID,
RAWTOHEX (STANDARD_HASH (TC.VALUE, 'SHA256')) AS ENCRY_TEXT
FROM TABLE1@DBLINK1 TC
INNER JOIN TABLE2@DBLINK1 TU ON TC.ID = TU.ID
WHERE 1=1;
CREATE OR REPLACE FORCE VIEW V_TEST
AS
SELECT mv.ID,
UTL_RAW.cast_to_varchar2 (
UTL_ENCODE.base64_encode (mv.ENCRY_TEXT)) ENCRY_TEXT
FROM TEST mv;
-- request query
SELECT ID
FROM V_TEST
WHERE ENCRY_TEXT = <INPUT_AS_BASE_64>; -- causes a performance issue because this column is not indexed
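A common way around ORA-01450 in this situation (a sketch, not a verified fix; the function and index names below are made up) is a deterministic wrapper combined with a SUBSTR bound in the index expression. Without the bound, Oracle reserves space for the function's maximum possible VARCHAR2 return length, which can exceed the key limit even though the actual values are short:

```sql
-- Hypothetical deterministic wrapper around the Base64 expression
CREATE OR REPLACE FUNCTION f_b64 (p_text IN VARCHAR2)
   RETURN VARCHAR2
   DETERMINISTIC
IS
BEGIN
   RETURN UTL_RAW.cast_to_varchar2 (UTL_ENCODE.base64_encode (p_text));
END f_b64;
/

-- SUBSTR caps the key length the index must reserve;
-- SHA-256 hex is 64 chars, so its Base64 form fits comfortably in 100
CREATE INDEX ix_test_b64 ON TEST (SUBSTR (f_b64 (ENCRY_TEXT), 1, 100));
```

Note that for the optimizer to use this index, the predicate (or the V_TEST definition) has to use the same SUBSTR(f_b64(...)) expression as the index.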

Related

"Partitioned tables cannot have ROW triggers" when adding a FOR EACH ROW trigger that calls a function which creates a partition before inserting (Postgres)

I am new to databases, so I am seeking some advice or help.
I have a table that is partitioned by list on host, as shown below.
CREATE TABLE public.services
(
id integer,
service_name character varying(128),
host character varying(128)
) PARTITION BY LIST (host);
To insert into the table, I created a function that checks whether the partition exists and, if not, creates it before inserting; then I tried to attach a trigger to the function.
CREATE OR REPLACE FUNCTION service_function()
RETURNS TRIGGER AS $$
DECLARE
partition_name TEXT;
BEGIN
partition_name := 'services_' || NEW.host;
IF NOT EXISTS
(SELECT 1
FROM information_schema.tables
WHERE table_name = partition_name)
THEN
RAISE NOTICE 'A partition has been created %', partition_name;
EXECUTE format('CREATE TABLE %I PARTITION OF services FOR VALUES IN (%L)', partition_name, NEW.host);
END IF;
EXECUTE format('INSERT INTO %I (id, service_name, host) VALUES ($1, $2, $3)', partition_name) USING NEW.id, NEW.service_name, NEW.host;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
Now when I try to add the trigger,
CREATE TRIGGER insert_service_trigger
BEFORE INSERT ON services
FOR EACH ROW EXECUTE PROCEDURE service_function();
the following error is thrown:
ERROR: "services" is a partitioned table
DETAIL: Partitioned tables cannot have ROW triggers.
Any suggestions or solutions for it?
What I am trying to achieve: the services table will be 100 GB+ and host will always be in the WHERE clause of all the SELECT queries, so I thought of partitioning by host. Is this the right approach?
You must be using PostgreSQL v10, which is the only release that ever sported that error message. Triggers on partitioned tables were introduced in v11.
But you won't be able to achieve what you want in any PostgreSQL version. By the time your trigger has started processing, it is too late to change the table definition. You'd get the following error:
ERROR: cannot CREATE TABLE .. PARTITION OF "services" because it is being used by active queries in this session
There is no way to achieve what you want in PostgreSQL. For a thorough discussion of this and possible workarounds, read my article.
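The usual workaround (a sketch, not part of the answer above; the partition names are illustrative) is to create the per-host partitions ahead of time, outside the insert path, and on v11 or later to add a DEFAULT partition so rows for not-yet-partitioned hosts don't error out:

```sql
-- Create the known partitions up front, outside any trigger
CREATE TABLE services_host_a PARTITION OF services FOR VALUES IN ('host_a');
CREATE TABLE services_host_b PARTITION OF services FOR VALUES IN ('host_b');

-- v11+: rows whose host has no dedicated partition land here instead of failing
CREATE TABLE services_default PARTITION OF services DEFAULT;
```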

Limit inserts to 6 rows per id

I am learning Hibernate by creating a basic console app, using Oracle as the back end. I have a table where, if a student tries to enter a 7th record, he should not be permitted to do so. How do I do this?
Well, besides triggers, you can create a materialized view, then a check constraint on it.
create materialized view log on test_table
with rowid (id) including new values;
create materialized view mv_test_table
refresh FAST on COMMIT
ENABLE QUERY REWRITE
as
select id, count(*) cnts
from test_table
group by id;
alter table mv_test_table
add constraint check_userid
check (cnts < 7);
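With the constraint on the materialized view, the 7th row for a given id is rejected when the ON COMMIT fast refresh recomputes the count. A sketch (assuming test_table has columns id and val, and six committed rows already exist for id 1):

```sql
-- The INSERT itself succeeds; the failure surfaces at COMMIT,
-- when the refresh pushes cnts for id 1 to 7 and violates check_userid
INSERT INTO test_table (id, val) VALUES (1, 'seventh');
COMMIT;
```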
You can also use a simple trigger (provided that your table has an ID column):
create or replace trigger trg_limit_row
after insert on your_table
for each row
begin
if :new.id > 5 then -- assume that you have id in range (0-5) -> 6 rows
-- note: row-level DML against the same table normally raises ORA-04091 (mutating table)
execute immediate 'delete from your_table t where t.id = :1' using :new.id;
end if;
end;
/

Oracle: Materialized View not refreshed when using a Prebuilt Table

I'm having problems when using the Prebuilt Table option in an MV in Oracle 12. This code works fine:
CREATE TABLE empt
( ename VARCHAR2(20),
empno INTEGER PRIMARY KEY);
CREATE MATERIALIZED VIEW LOG ON empt
WITH SEQUENCE , rowid (empno)
INCLUDING NEW VALUES;
CREATE MATERIALIZED VIEW empt_MV
REFRESH FAST ON COMMIT
WITH ROWID
AS
SELECT count(*) numberofemps
FROM empt ;
INSERT INTO empt VALUES ('A',1);
COMMIT;
SELECT * FROM empt_MV;
The previous SELECT returns, as expected:
NUMBEROFEMPS
------------
1
But, if I use the ON PREBUILT TABLE option, nothing happens. I mean, the MV remains empty:
drop materialized view empt_mv;
drop materialized view log on empt;
drop table empt;
CREATE TABLE empt
( ename VARCHAR2(20),
empno INTEGER PRIMARY KEY);
CREATE MATERIALIZED VIEW LOG ON empt
WITH SEQUENCE , rowid (empno)
INCLUDING NEW VALUES;
CREATE TABLE empt_MV (
numberofemps NUMBER);
CREATE MATERIALIZED VIEW empt_MV
ON PREBUILT TABLE
REFRESH FAST ON COMMIT
WITH ROWID
AS
SELECT count(*) numberofemps
FROM empt ;
INSERT INTO empt VALUES ('A',1);
COMMIT;
SELECT * FROM empt_MV;
The previous SELECT returns no rows.
Does anyone know what is happening?
You can't use REFRESH FAST while employing WITH ROWID on a prebuilt table (see the documentation for the WITH ROWID clause).
This varies greatly by Oracle version.
Use the instructions in the relevant documentation, which show how to determine the fast refresh capabilities in your particular situation using DBMS_MVIEW.EXPLAIN_MVIEW.
12.1: https://docs.oracle.com/database/121/REPLN/repmview.htm#REPLN304
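A sketch of the DBMS_MVIEW.EXPLAIN_MVIEW call mentioned above (MV_CAPABILITIES_TABLE has to exist first; it is created by the utlxmv.sql script shipped with the database):

```sql
-- Analyze the MV and populate MV_CAPABILITIES_TABLE
EXEC DBMS_MVIEW.EXPLAIN_MVIEW('EMPT_MV');

-- Each row says whether a capability (e.g. REFRESH_FAST_AFTER_INSERT)
-- is possible for this MV, and if not, why
SELECT capability_name, possible, msgtxt
FROM mv_capabilities_table
WHERE mvname = 'EMPT_MV';
```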

Find out which schema based on table values

My database is separated into schemas by client (i.e. each client has their own schema, with the same data structure).
I also have an external action that does not know which schema it should target. It comes from another part of the system that has no concept of clients and does not know which client's data set it is operating on. Before I process it, I have to find out which schema the request needs to target.
To find the right schema, I have to find out which one holds the record R with a particular unique ID (string).
From my understanding, the following
SET search_path TO schema1,schema2,schema3,...
will only look through the tables in schema1 (or the first schema in the list that contains the table) and will not do a global search.
Is there a way for me to do a global search across all schemas, or am I just going to have to use a for loop and iterate through all of them, one at a time?
You could use inheritance for this. (Be sure to consider the limitations.)
Consider this little demo:
CREATE SCHEMA master; -- no access of others ..
CREATE SEQUENCE master.myseq; -- global sequence for globally unique ids
CREATE table master.tbl (
id int primary key DEFAULT nextval('master.myseq')
, foo text);
CREATE SCHEMA x;
CREATE table x.tbl() INHERITS (master.tbl);
INSERT INTO x.tbl(foo) VALUES ('x');
CREATE SCHEMA y;
CREATE table y.tbl() INHERITS (master.tbl);
INSERT INTO y.tbl(foo) VALUES ('y');
SELECT * FROM x.tbl; -- returns 'x'
SELECT * FROM y.tbl; -- returns 'y'
SELECT * FROM master.tbl; -- returns 'x' and 'y' <-- !!
Now, to actually identify the table a particular row lives in, use the tableoid:
SELECT *, tableoid::regclass AS table_name
FROM master.tbl
WHERE id = 2;
Result:
 id | foo | table_name
----+-----+------------
  2 | y   | y.tbl
You can derive the source schema from the tableoid, best by querying the system catalogs with the tableoid directly. (The displayed name depends on the setting of search_path.)
SELECT n.nspname
FROM master.tbl t
JOIN pg_class c ON c.oid = t.tableoid
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE t.id = 2;
This is also much faster than looping through many separate tables.
You will have to iterate over all namespaces. You can get a lot of this information from the pg_* system catalogs. In theory, you should be able to resolve the client -> schema mapping at request time without talking to the database so that the first SQL call you make is:
SET search_path = client1,global_schema;
While I think Erwin's solution is probably preferable if you can re-structure your tables, an alternative that doesn't require any schema changes is to write a PL/PgSQL function that scans the tables using dynamic SQL based on the system catalog information.
Given:
CREATE SCHEMA a;
CREATE SCHEMA b;
CREATE TABLE a.testtab ( searchval text );
CREATE TABLE b.testtab (LIKE a.testtab);
INSERT INTO a.testtab(searchval) VALUES ('ham');
INSERT INTO b.testtab(searchval) VALUES ('eggs');
The following PL/PgSQL function searches all schemas containing tables named _tabname for values in _colname equal to _value and returns the first matching schema.
CREATE OR REPLACE FUNCTION find_schema_for_value(_tabname text, _colname text, _value text) RETURNS text AS $$
DECLARE
cur_schema text;
foundval integer;
BEGIN
FOR cur_schema IN
SELECT nspname
FROM pg_class c
INNER JOIN pg_namespace n ON (c.relnamespace = n.oid)
WHERE c.relname = _tabname AND c.relkind = 'r'
LOOP
EXECUTE
format('SELECT 1 FROM %I.%I WHERE %I = $1',
cur_schema, _tabname, _colname
) INTO foundval USING _value;
IF foundval = 1 THEN
RETURN cur_schema;
END IF;
END LOOP;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
If there are no matches then NULL is returned. If there are multiple matches the result will be one of them, but no guarantee is made about which one. Add an ORDER BY clause to the schema query if you want to return, say, the first in alphabetical order. The function is also trivially modified to return setof text and RETURN NEXT cur_schema if you want to return all the matches.
regress=# SELECT find_schema_for_value('testtab','searchval','ham');
find_schema_for_value
-----------------------
a
(1 row)
regress=# SELECT find_schema_for_value('testtab','searchval','eggs');
find_schema_for_value
-----------------------
b
(1 row)
regress=# SELECT find_schema_for_value('testtab','searchval','bones');
find_schema_for_value
-----------------------
(1 row)
By the way, you can re-use the table definitions without inheritance if you want, and you really should. Either use a common composite data type:
CREATE TYPE public.testtab AS ( searchval text );
CREATE TABLE a.testtab OF public.testtab;
CREATE TABLE b.testtab OF public.testtab;
in which case they share the same data type but not any data; or via LIKE:
CREATE TABLE public.testtab ( searchval text );
CREATE TABLE a.testtab (LIKE public.testtab);
CREATE TABLE b.testtab (LIKE public.testtab);
in which case they're completely unconnected to each other after creation.

What would be the right steps for horizontal partitioning in Postgresql?

We have an e-commerce portal with a PostgreSQL 9.1 database. One very important table currently has 32 million records. If we want to deliver all items, this table would grow to 320 million records, mostly dates, which would be too heavy.
So we are thinking about horizontal partitioning/sharding. We can divide the items in this table into 12 pieces horizontally (1 per month). What would be the best steps and techniques to do so? Would horizontal partitioning within the database be good enough, or do we have to start thinking about sharding?
While 320 million is not small, it's not really huge either.
It largely depends on the queries you run on the table. If you always include the partition key in your queries then "regular" partitioning would probably work.
An example for this can be found in the PostgreSQL wiki:
http://wiki.postgresql.org/wiki/Month_based_partitioning
The manual also explains some of the caveats of partitioning:
http://www.postgresql.org/docs/current/interactive/ddl-partitioning.html
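In the 9.1-era approach described at those links, partitions are child tables carrying CHECK constraints on the partition key, and constraint_exclusion lets the planner skip children whose constraint contradicts the WHERE clause. A minimal month-based sketch (table and column names are made up):

```sql
CREATE TABLE orders (id bigint, created date, amount numeric);

-- One child table per month; the CHECK constraint is what enables pruning
CREATE TABLE orders_2012_01
    (CHECK (created >= DATE '2012-01-01' AND created < DATE '2012-02-01'))
    INHERITS (orders);
CREATE TABLE orders_2012_02
    (CHECK (created >= DATE '2012-02-01' AND created < DATE '2012-03-01'))
    INHERITS (orders);

-- With constraint_exclusion enabled, this query scans only orders_2012_02
SET constraint_exclusion = partition;
SELECT * FROM orders
 WHERE created >= DATE '2012-02-01' AND created < DATE '2012-03-01';
```

Inserts are usually routed into the right child by a BEFORE INSERT trigger on the parent, as the wiki article shows.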
If you are thinking about sharding, you might read how Instagram (which is powered by PostgreSQL) has implemented that:
http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram
If you have mostly read queries, another option might be to use streaming replication to set up multiple servers and distribute the read queries by connecting to the hot standby for read access and to the master for write access. I think Pgpool-II can do that (somewhat) automatically. That can be combined with partitioning to further reduce the query runtime.
If you are adventurous and don't have very immediate needs to do so, you might also consider Postgres-XC which promises to support transparent horizontal scaling:
http://postgres-xc.sourceforge.net/
There is no final release yet, but it looks like it won't take too long.
Here is my sample code for partitioning:
t_master is a view that your application selects from, inserts into, updates and deletes from.
t_1 and t_2 are the underlying tables that actually store the data.
CREATE TABLE t_1
(
id bigint PRIMARY KEY,
col1 text
);
CREATE TABLE t_2
(
id bigint PRIMARY KEY,
col1 text
);
create or replace view t_master(id, col1)
as
select id, col1 from t_1
union all
select id, col1 from t_2;
CREATE OR REPLACE FUNCTION t_insert_partition_function()
returns TRIGGER AS $$
begin
raise notice '%', 'hello';
execute 'insert into t_'
|| ( mod(NEW.id, 2)+ 1 )
|| ' values ( $1, $2 )' USING NEW.id, NEW.col1 ;
RETURN NULL;
end;
$$
LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION t_update_partition_function()
returns TRIGGER AS $$
begin
raise notice '%', 'hello';
-- route by the old id so the existing row is found; this assumes the
-- update does not move the row to a different partition
execute 'update t_'
|| ( mod(OLD.id, 2)+ 1 )
|| ' set id = $1, col1 = $2 where id = $3'
USING NEW.id, NEW.col1, OLD.id ;
RETURN NULL;
end;
$$
LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION t_delete_partition_function()
returns TRIGGER AS $$
begin
raise notice '%', 'hello';
execute 'delete from t_'
|| ( mod(OLD.id, 2)+ 1 )
|| ' where id = $1'
USING OLD.id;
RETURN NULL;
end;
$$
LANGUAGE plpgsql;
CREATE TRIGGER t_insert_partition_trigger instead of INSERT
ON t_master FOR each row
execute procedure t_insert_partition_function();
CREATE TRIGGER t_update_partition_trigger instead of update
ON t_master FOR each row
execute procedure t_update_partition_function();
CREATE TRIGGER t_delete_partition_trigger instead of delete
ON t_master FOR each row
execute procedure t_delete_partition_function();
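A quick usage check of the sketch above (rows are routed by mod(id, 2) + 1):

```sql
INSERT INTO t_master VALUES (1, 'one'); -- mod(1,2)+1 = 2, stored in t_2
INSERT INTO t_master VALUES (2, 'two'); -- mod(2,2)+1 = 1, stored in t_1
SELECT * FROM t_1; -- id 2
SELECT * FROM t_2; -- id 1
```

Note that because the INSTEAD OF triggers return NULL, the client sees "INSERT 0 0" even though the row was stored in the underlying table.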
If you don't mind upgrading to PostgreSQL 9.4, then you could use the pg_shard extension, which lets you transparently shard a PostgreSQL table across many machines. Every shard is stored as a regular PostgreSQL table on another PostgreSQL server and replicated to other servers. It uses hash-partitioning to decide which shard(s) to use for a given query. pg_shard would work well if your queries have a natural partition dimension (e.g., customer ID).
More info: https://github.com/citusdata/pg_shard