I have a set of 4 tables that contain different partitions of some data. I also have a plpgsql function that will take an id and return the name of the table containing that data as a 'character varying'.
However, when I try to use this function to select from the correct table, e.g.
SELECT f.a FROM getTable(someID) AS f;
it doesn't seem to work. It doesn't throw an error on the SELECT, but it doesn't return the fields I'm expecting either (i.e. it says f.a doesn't exist).
How do I select data from a table where the table name is given by the return of a function?
I'm using PostgreSQL 9.1. This is going to be run over millions of records, so I don't want to have to do it as two separate calls.
For the problem described in the question, I think you should use table inheritance.
Answering the question - use execute:
execute "SELECT f.a FROM " || getTable(someID) || " AS f";
We have a large amount of data in an Oracle 11g server. Most of the engineers use Tableau for visualizing data, but there is currently not a great solution for visualizing straight from the Oracle server because of the structure of the database. Unfortunately, this cannot be changed, as it's very deeply integrated with the rest of our systems. There is a "dictionary" table, let's call it tab_keys:
name | key
---------------
AB-7 | 19756
BG-0 | 76519
FY-10 | 79513
JB-2 | 18765
...
...
And there are also the tables actually containing the data. Each entry in tab_keys has a corresponding data table named by prefixing the key with an identifier, in this case, we'll use "dat_". So AB-7 will store all its data in a table called dat_19756. These keys are not known to the user, and are only used for tracking "behind the scenes". The user only knows the AB-7 moniker.
Tableau allows communication with Oracle servers using standard SQL select statements, but because the user doesn't know the key value, they cannot write a SQL statement to query the data.
Tableau recently added the ability for users to query Oracle Table Functions, so I started going down the road of writing a table function to query for the key, and return a table of the results for Tableau to use. The problem is that each dat_ table is basically unique, with a different number of columns, labels, records, and datatypes from the next dat_ table.
What is the right way to handle this problem? Can I:
1) Write a function (which Tableau can call inline in regular SQL) to return a bona fide table name which is dynamically generated? I tried this:
create or replace FUNCTION TEST_FUNC
(
    V_NAME IN VARCHAR2
) RETURN user_tables.table_name%type AS
    V_KEY   VARCHAR(100);
    V_TABLE user_tables.table_name%type;
BEGIN
    select KEY into V_KEY from my_schema.tab_keys where NAME = V_NAME;
    V_TABLE := dbms_assert.sql_object_name('my_schema.dat_' || V_KEY);
    RETURN V_TABLE;
END TEST_FUNC;
and then SELECT * from TABLE(TEST_FUNC('AB-7')); but I get:
ORA-22905: cannot access rows from a non-nested table item
22905. 00000 - "cannot access rows from a non-nested table item"
*Cause: attempt to access rows of an item whose type is not known at
parse time or that is not of a nested table type
*Action: use CAST to cast the item to a nested table type
I couldn't figure out a good way to CAST the table as the table type I needed. Could this be done in the function before returning?
2) Write a table function? Tableau can supposedly query these like tables, but then I run into the problem of dynamically generating types (which I understand isn't easy) but with the added complication of this needing to be used by multiple users simultaneously, so each user would need a data type generated for them each time they connect to a table (if I'm understanding this correctly).
I have to think I'm missing something simple. How do I cast the return of this query as this other table's datatype?
There is no simple way to have a single generic function return a dynamically configurable nested table. With other products you could use a Ref Cursor (which maps to an ODBC or JDBC ResultSet object), but my understanding is that Tableau does not support that option.
One thing you can do is generate views from your data dictionary. You can use this query to produce a one-off script.
select 'create or replace view "' || name || '" as select * from dat_' || key || ';'
from tab_keys;
The double-quotes are necessary because AB-7 is not a valid object name in Oracle, due to the dash.
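For the sample keys above, the generated script would look something like this:
create or replace view "AB-7"  as select * from dat_19756;
create or replace view "BG-0"  as select * from dat_76519;
create or replace view "FY-10" as select * from dat_79513;
create or replace view "JB-2"  as select * from dat_18765;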
This would allow your users to query their data like this:
select * from "AB-7";
Note they would have to use the double-quotes too.
Obviously, any time you inserted a row in tab_keys you'd need to create the required view. That could be done through a trigger.
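A rough sketch of such a trigger (an assumption on my part, not tested; DDL inside a trigger needs an autonomous transaction, and the dat_ table must already exist when the key row is inserted):
CREATE OR REPLACE TRIGGER trg_tab_keys_create_view
AFTER INSERT ON tab_keys
FOR EACH ROW
DECLARE
    PRAGMA AUTONOMOUS_TRANSACTION;  -- DDL commits, so isolate it from the triggering transaction
BEGIN
    EXECUTE IMMEDIATE
        'create or replace view "' || :new.name || '" as select * from dat_' || :new.key;
END;
/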
You can build dynamic SQL in SQL using the open source program Method4:
select * from table(method4.dynamic_query(
q'[
select 'select * from dat_'||key
from tab_keys
where name = 'AB-7'
]'
));
A
-
1
The program combines Oracle Data Cartridge Interface with ANYDATASET to create a function that can return dynamic types.
There might be a way to further simplify the interface but I haven't figured it out yet. These Oracle Data Cartridge Interface functions are very picky and are not easy to repackage.
Here's the sample schema I used:
create table tab_keys(name varchar2(100), key varchar2(100));
insert into tab_keys
select 'AB-7' , '19756' from dual union all
select 'BG-0' , '76519' from dual union all
select 'FY-10', '79513' from dual union all
select 'JB-2' , '18765' from dual;
create table dat_19756 as select 1 a from dual;
create table dat_76519 as select 2 b from dual;
create table dat_79513 as select 3 c from dual;
create table dat_18765 as select 4 d from dual;
I need a more efficient way to update rows of a single table in Postgres 9.5.
I am currently doing it with pg_dump and re-importing with updated values after search-and-replace operations in a Linux environment.
table_a has 300000 rows with 2 columns: id bigint and json_col jsonb.
json_col has about 30 keys: "C1" to "C30" like in this example:
Table_A
id,json_col
1 {"C1":"Paris","C2":"London","C3":"Berlin","C4":"Tokyo", ... "C30":"Dallas"}
2 {"C1":"Dublin","C2":"Berlin","C3":"Kiev","C4":"Tokyo", ... "C30":"Phoenix"}
3 {"C1":"Paris","C2":"London","C3":"Berlin","C4":"Ankara", ... "C30":"Madrid"}
...
The requirement is to mass-search all keys from C1 to C30, look in
them for the value "Berlin" and replace it with "Madrid", but only if
"Madrid" is not already present; i.e. id:1 matches with key C3, and id:2 with C2. id:3
will be skipped because C30 already holds this value.
It has to be in a single SQL command in PostgreSQL 9.5, one time and considering all keys from the jsonb column.
The fastest and simplest way is to modify the column as text:
update table_a
set json_col = replace(json_col::text, '"Berlin"', '"Madrid"')::jsonb
where json_col::text like '%"Berlin"%'
and json_col::text not like '%"Madrid"%'
It's a practical choice. The above query is more of a find-and-replace operation (like in a text editor) than a modification of object attributes. The second option is more complicated and surely much more expensive. Even using the fast JavaScript engine (example below), a more formal solution would be many times slower.
You can try Postgres Javascript:
create extension if not exists plv8;
create or replace function replace_item(data jsonb, from_str text, to_str text)
returns jsonb language plv8 as $$
    var found = 0;
    Object.keys(data).forEach(function(key) {
        if (data[key] == to_str) {
            found = 1;
        }
    })
    if (found == 0) {
        Object.keys(data).forEach(function(key) {
            if (data[key] == from_str) {
                data[key] = to_str;
            }
        })
    }
    return data;
$$;
update table_a
set json_col = replace_item(json_col, 'Berlin', 'Madrid');
What makes this hard is that you are looking for unknown keys holding values of interest. Postgres infrastructure is optimized to find keys (or array values).
Possibly caused by a sub-optimal table design. The many top-level objects of your jsonb column might be replaced by an array, discarding irrelevant key names altogether. (Or maybe another array for key names.) Or, ideally with a full normalized DB schema to begin with.
Be that as it may, here is a proof of concept, how this can be fast and clean with stock Postgres 9.5 or later anyway.
Additional difficulty 1: it's unknown whether duplicate values are possible.
Additional difficulty 2: value frequencies are unknown, too.
Additional difficulty 3: only the first value found is to be replaced and only if the target value is not there yet. Implementing this with set-based operations is possible, but unwieldy. I wrote a plpgsql function instead:
CREATE OR REPLACE FUNCTION jsonb_replace_value(_j jsonb, _old jsonb, _new jsonb)
  RETURNS jsonb AS
$func$
DECLARE
   _key text;
   _val jsonb;
BEGIN
   FOR _key, _val IN
      SELECT * FROM jsonb_each(_j)
   LOOP
      IF _val = _old THEN
         RETURN jsonb_set(_j, ARRAY[_key], _new);  -- update 1st key
      END IF;
   END LOOP;

   RETURN _j;  -- nothing found, return original
END
$func$ LANGUAGE plpgsql IMMUTABLE;
COMMENT ON FUNCTION jsonb_replace_value(jsonb, jsonb, jsonb) IS '
Replace the first occurrence of _old value with _new.
Call:
SELECT jsonb_replace_value(''{"C1":"Paris","C3":"Berlin","C4":"Berlin"}'', ''"Berlin"'', ''"Madrid"'')';
Could be enhanced to optionally replace all occurrences etc. but that's beyond the scope of this question.
Now this would be simple:
UPDATE table_a
SET json_col = jsonb_replace_value(json_col, '"Berlin"', '"Madrid"'); -- note jsonb literal syntax!
If all rows need an update, we can stop here. Won't get faster. (Except possibly with alternatives like the one demonstrated by @klin.)
If a large percentage of all rows need an update, add a WHERE condition to avoid empty updates:
...
WHERE json_col <> jsonb_replace_value(json_col, '"Berlin"', '"Madrid"');
See:
How do I (or can I) SELECT DISTINCT on multiple columns?
Typically, only very few rows actually need an update. Then iterating through all rows with above query is expensive. We need index support to make it fast. Not easy for the case. I suggest an expression index based on an IMMUTABLE function extracting the array of values:
CREATE OR REPLACE FUNCTION jsonb_object_val_arr(jsonb)
RETURNS text[] LANGUAGE sql IMMUTABLE AS
'SELECT ARRAY (SELECT value FROM jsonb_each_text($1))';
COMMENT ON FUNCTION jsonb_object_val_arr(jsonb) IS '
Generates text array of values in outermost jsonb object.
Of limited use if there can be nested objects.';
CREATE INDEX table_a_val_arr_idx ON table_a USING gin (jsonb_object_val_arr(json_col));
Related, with more explanation:
Find rows containing a key in a JSONB array of records
Query making use of this index:
UPDATE table_a a
SET    json_col = jsonb_replace_value(a.json_col, '"Berlin"', '"Madrid"')
WHERE  jsonb_object_val_arr(json_col) @> '{Berlin}'  -- has Berlin, possibly > 1x ..
-- AND NOT jsonb_object_val_arr(json_col) @> '{Madrid}'
AND    NOT EXISTS (  -- .. but not Madrid
          SELECT FROM table_a b
          WHERE  jsonb_object_val_arr(json_col) @> '{Madrid}'  -- note array literal syntax
          AND    b.id = a.id
       );
The NOT EXISTS semi-anti-join is carefully drafted to utilize the index a 2nd time.
The commented simpler alternative is faster if there are few rows with 'Berlin' and 'Madrid' - then a filter step in the query plan will be cheaper.
Should be very fast.
db<>fiddle here for Postgres 9.5 demonstrating all.
OK, I have tested all the methods and I can say you did a great job.
This helped me a lot. Let me share my feedback with you.
Method 1, suggested by klin, works perfectly and is totally fine, except if
a key is named like a value; then both the key and the value get replaced,
i.e. "Berlin":"Berlin" becomes "Madrid":"Madrid".
Method 2 with the plv8 extension did not work because I am missing the control file;
I would have had to install it, so I just skipped this method and have no
feedback regarding it.
The error I was getting was this:
ERROR: could not open extension control file
"/usr/pgsql-9.5/share/extension/plv8.control": No such file or directory
Method 3, similar to method 2 but with the jsonb_replace_value function,
works perfectly; it replaces rows that contain the specific value regardless
of the key. Adding the condition
WHERE json_col <> jsonb_replace_value(json_col, '"Berlin"', '"Madrid"')
will avoid empty updates and skip rows that do not need to be updated.
And something like this:
{"Berlin":"Berlin"} becomes {"Berlin":"Madrid"}, i.e. the key is not touched, just the value.
Method 4 is a little more complicated; it uses Method 3 plus indexes.
It works totally awesome and is super speedy.
And the NOT EXISTS semi-anti-join indeed forced the index to be used again.
I was shocked how fast it performed!
However, I discovered that all these methods only work if the JSON string looks like this:
{"key":"value"}
If I have, for example, to update a value that is itself a JSON object, it will not update
something like this: {"C30":{"id":10044,"value":"Berlin","created_by":"John Doe"}}
MANY THANKS to you guys, @klin and @erwin-brandstetter. This helped me learn something new!
I want to do something along the lines of:
SELECT some_things
FROM `myproject.mydataset.mytable_@suffix`
But this doesn't work because the parameter isn't expanded inside the table name.
This does work, using wildcard tables:
SELECT some_things
FROM `myproject.mydataset.mytable_*`
WHERE _TABLE_SUFFIX = @suffix
However, it has some problems:
If I mistype the parameter, this query silently returns zero rows, rather than yelling at me loudly.
Query caching stops working when querying with a wildcard.
If other tables exist with the mytable_ prefix, they must have the same schema, even if they don't match the suffix. Otherwise, weird stuff happens. It seems like BigQuery either computes the union of all columns, or takes the schema of an arbitrary table; it's not documented and I didn't look at it in detail.
Is there a better way to query a single table whose name depends on a query parameter?
Yes, you can, here's a working example:
DECLARE tablename STRING;
DECLARE tableQuery STRING;
##get list of tables
CREATE TEMP TABLE tableNames as
  select table_name
  from nomo_nausea.INFORMATION_SCHEMA.TABLES
  where table_name not in ('_sdc_primary_keys', '_sdc_rejected', 'fba_all_order_report_data');

WHILE (select count(*) from tableNames) >= 1 DO
  SET tablename = (select table_name from tableNames LIMIT 1);

  ##build dataset + table name
  SET tableQuery = CONCAT('nomo_nausea.', tablename);

  ##use concat to build string and execute
  EXECUTE IMMEDIATE CONCAT('SELECT * from `', tableQuery, '` where _sdc_deleted_at is not null');

  DELETE FROM tableNames where table_name = tablename;
END WHILE;
In order to answer your stated problems:
Table scanning happens in the FROM clause and filtering happens in the WHERE clause [1], so if the WHERE condition does not match, an empty result is returned.
"Currently, Cached results are not supported when querying with wildcard" [2].
"BigQuery uses the schema for the most recently created table that matches the wildcard as the schema" [3]. What kind of weird stuff you have faced in your use case? "A wildcard table represents a union of all the tables that match the wildcard expression" [4].
In BigQuery, parameterized queries can be run, but table names cannot be parameterized [5]. Your wildcard solution seems to be the only way.
You can actually use tables as parameters if you use the Python API, but it's not documented yet. If you pass the tables as parameters through a formatted text string vs. a docstring, your query should work.
SQL example:
sql = "SELECT max(_last_updt) FROM `{0}.{1}.{2}` WHERE _last_updt >= TIMESTAMP(" +
"CURRENT_DATE('-06:00'))".format(project_id, dataset_name, table_name)
SQL in context of Python API:
from google.cloud import bigquery

bigquery_client = bigquery.Client()     # set up the client
query_job = bigquery_client.query(sql)  # run the query
results = query_job.result()            # waits for job to complete
for row in results:
    print(row)
I'm trying to create a function which takes in 3 parameters and returns one. The value of the returned parameter is obtained by querying two tables; however, one of the tables is in a different schema. The SQL I'm using is below:
create or replace FUNCTION tester
(
    originaltext IN VARCHAR2, lang IN VARCHAR2, category IN VARCHAR2 DEFAULT NULL)
RETURN VARCHAR2 AS
    translatedttextval VARCHAR2(255) := '';
BEGIN
    --if category is null, disregard category
    if category is null then
        SELECT distinct nvl(trans.translatedtext, originaltext)
        INTO translatedttextval
        FROM tbl_translations trans, SECDEVSCHEMA.tbl_instanceids ids
        WHERE trans.translatedlang = ids.id_a
        AND ids.name = lang
        AND trans.originaltext = originaltext;
    end if;
    RETURN translatedttextval;
END;
I've removed a bit of the query here which searches with category because it does something similar and has the same issue.
So, when I run this and pass in the params, I get an error which reads:
Error(16,46): PL/SQL: ORA-00942: table or view does not exist
If I do the following query, it works fine when not on SECDEVSCHEMA and returns the results from tbl_instanceids, which is in the SECDEVSCHEMA schema:
SELECT * FROM SECDEVSCHEMA.tbl_instanceids ids WHERE ids.id_a = 1234555;
I don't have DBA access to the DB, but if I need some SELECT privilege granted or something I can get it done. Not sure if this is the case though, as the above query works.
Small additional question also:
where it says in the query
nvl(trans.translatedtext, originaltext)
If I wanted to surround the original text value with brackets when no translatedtext value exists, how would I go about this?
I'm using SQL Developer by the way in case that's important.
Thanks
For your smaller question:
nvl(trans.translatedtext, '<'|| originaltext ||'>' )
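Illustrative only: NVL falls back to the second argument when the first is NULL, so the original text comes back wrapped in brackets:
-- translatedtext present: NVL returns it unchanged
SELECT nvl('Hallo', '<' || 'Hello' || '>') FROM dual;   -- Hallo
-- translatedtext missing (NULL): the original text comes back in brackets
SELECT nvl(NULL, '<' || 'Hello' || '>') FROM dual;      -- <Hello>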
This was down to a privileges issue.
In SQL Developer, I expanded the tables package and selected the table I needed access to in the function, right-clicked and selected Privileges > Grant.
As this is just a development DB, I just assigned all the privileges to the appropriate schema or user, and it's working now.
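The equivalent in plain SQL would be something like this (the grantee name here is illustrative; run it as the table owner or a DBA):
GRANT SELECT ON SECDEVSCHEMA.tbl_instanceids TO appschema;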
Still unsure about this part though:
Small additional question also: where it says in the query nvl(trans.translatedtext, originaltext)
If I wanted to surround the original text value with brackets when no translatedtext value exists, how would I go about this?
Any takers for this?
Thanks
If I have two queries, which I will call horrible_query_1 and ugly_query_2, and I want to perform the following two minus operations on them:
(horrible_query_1) minus (ugly_query_2)
(ugly_query_2) minus (horrible_query_1)
Or maybe I have a terribly_large_and_useful_query, and I want to use the result set it produces as part of several future queries.
How can I avoid copying and pasting the same queries in multiple places? How can I "not repeat myself," and follow DRY principles. Is this possible in SQL?
I'm using Oracle SQL. Portable SQL solutions are preferable, but if I have to use an Oracle specific feature (including PL/SQL) that's OK.
create view horrible_query_1_VIEW as
select .. ...
from .. .. ..
create view ugly_query_2_VIEW as
select .. ...
from .. .. ..
Then
(horrible_query_1_VIEW) minus (ugly_query_2_VIEW)
(ugly_query_2_VIEW) minus (horrible_query_1_VIEW)
Or, maybe, with a with clause:
with horrible_query_1 as (
select .. .. ..
from .. .. ..
) ,
ugly_query_2 as (
select .. .. ..
.. .. ..
)
(select * from horrible_query_1 minus select * from ugly_query_2 ) union all
(select * from ugly_query_2 minus select * from horrible_query_1)
If you want to reuse the SQL text of the queries, then defining views is the best way, as described earlier.
If you want to reuse the result of the queries, then you should consider global temporary tables. These temporary tables store data for the duration of session or transaction (whichever you choose). These are really useful in case you need to reuse calculated data many times over, especially if your queries are indeed "ugly" and "horrible" (meaning long running). See Temporary tables for more information.
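A rough sketch of a global temporary table (the table, columns, and stand-in query are made up for illustration; use ON COMMIT PRESERVE ROWS for session scope or ON COMMIT DELETE ROWS for transaction scope):
CREATE GLOBAL TEMPORARY TABLE tmp_query_result (
    id     NUMBER,
    label  VARCHAR2(100)
) ON COMMIT PRESERVE ROWS;

-- populate it once per session with the expensive query's output
INSERT INTO tmp_query_result (id, label)
SELECT 1, 'example' FROM dual;   -- stand-in for terribly_large_and_useful_query

-- then reuse the cached result as often as needed
SELECT * FROM tmp_query_result;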
If you need to keep the data longer than a session, you can consider materialized views.
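A minimal materialized view sketch, again with illustrative names and a stand-in query:
CREATE MATERIALIZED VIEW mv_large_result
    BUILD IMMEDIATE
    REFRESH COMPLETE ON DEMAND
AS
SELECT 1 AS id, 'example' AS label FROM dual;  -- stand-in for the expensive query

-- refresh when the underlying data changes:
-- EXEC DBMS_MVIEW.REFRESH('MV_LARGE_RESULT');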
Since you're using Oracle, I'd create Pipelined TABLE functions.
The function takes parameters and returns an object (which you have to create), and then you SELECT * (or even specific columns) from it using the TABLE() function; you can use it with a WHERE clause or with JOINs. If you want a unit of reuse (a function), you're not restricted to just returning values (i.e. a scalar function): you can write a function that returns rows or recordsets.
something like this:
FUNCTION RETURN_MY_ROWS(Param1 IN type...ParamX IN Type)
RETURN PARENT_OBJECT PIPELINED
IS
local_curs cursor_alias; --you need a cursor alias if this function is in a Package
out_rec ROW_RECORD_OF_CUSTOM_OBJECT := ROW_RECORD_OF_CUSTOM_OBJECT(NULL, NULL, NULL); --one NULL for each field in the record sub-object
BEGIN
OPEN local_curs FOR
--the SELECT query that you're trying to encapsulate goes here
-- and it can be very detailed/complex and even have WITH () etc..
SELECT * FROM baseTable WHERE col1 = x;
-- now that you have captured the SELECT into a Cursor
-- here you put a LOOP to take what's in the cursor and put it in the
-- child object (that holds the individual records)
LOOP
FETCH local_curs -- fetch the next row from the ref-cursor
INTO out_rec.COL1,
out_rec.COL2,
out_rec.COL3;
EXIT WHEN local_curs%NOTFOUND;
PIPE ROW(out_rec); --piping out the Object
END LOOP;
CLOSE local_curs; -- always do this
RETURN; -- we're now done
END RETURN_MY_ROWS;
After you've done that, you can use it like so:
SELECT * FROM TABLE(RETURN_MY_ROWS(val1, val2));
You can INSERT ... SELECT from it, or even CREATE TABLE ... AS SELECT out of it, and you can use it in joins.
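For example (the target table name is illustrative):
CREATE TABLE my_rows AS
SELECT * FROM TABLE(RETURN_MY_ROWS(val1, val2));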
two more things to mention:
--ROW_RECORD_OF_CUSTOM_OBJECT is something along these lines
CREATE or REPLACE TYPE ROW_RECORD_OF_CUSTOM_OBJECT AS OBJECT
(
    col1 type,
    col2 type,
    ...
    colx type
);
and PARENT_OBJECT is a table of the other object (with the field definitions) we just made
create or replace TYPE PARENT_OBJECT IS TABLE OF ROW_RECORD_OF_CUSTOM_OBJECT;
so this function needs two OBJECTs to support it, but one is a record, the other is a table of that record (you have to create the record first).
In a nutshell, the function is easy to write: you need a child object (with fields) and a parent object that houses that child object and is of type TABLE of the child object. You open the original base-table fetching SQL into a SYS_REFCURSOR (which you may need to alias if you're in a package) and read from that cursor in a loop into the individual records.
The function returns a type of PARENT_OBJECT but inside it packs the records sub-object with values from the cursor.
I hope this works for you (there may be permissioning issues with your DBA if you want to create OBJECTs and table functions).
If you operate with values, you could write functions.
Here you can find information on how to do it. It basically works like writing a function in any other language. You can define parameters and return values.
That gives you the nice ability to write code just once. Here is how you do it:
http://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_5009.htm
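A minimal sketch of such a reusable scalar function (the name and logic here are made up for illustration):
CREATE OR REPLACE FUNCTION add_vat(p_amount IN NUMBER) RETURN NUMBER IS
BEGIN
    RETURN p_amount * 1.2;  -- apply a 20% VAT rate
END add_vat;
/

-- usage: SELECT add_vat(100) FROM dual;  -- 120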
Have you tried using the RESULT_CACHE hint in your queries? Also, you could
ALTER SESSION SET RESULT_CACHE_MODE=FORCE
and see if it helps.
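The hint goes directly into the query text, for example (the query body is illustrative):
SELECT /*+ RESULT_CACHE */ col1, col2
FROM   some_large_table
WHERE  some_condition = 'Y';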