Postgres 9.1 - I have a schema that has tables "partitioned" by month (a new table is created each month, all columns the same). It is not set up as normal partitioning with a "master" table. I am currently writing a fairly large query that I will have to run a few times each month.
Schema: augmented_events
tables:
p201301 (January 2013)
p201302 (Feb 2013)
p201303 (March 2013)
...
p201312 (December 2013)
p201401 (January 2014)
Right now I have to write my (simplified) query as:
select *
from augmented_events.p201301
union
select *
from augmented_events.p201302
union
select *
from augmented_events.p201303
union
select *
from augmented_events.p201312
union
select *
from augmented_events.p201401
And every month I need to add in the new month. I would like to make this a little more scalable without me having to revisit it every month. Is there a function I can create (or one that exists) that loops through each table in the augmented_events schema, and treats it as if I was to union these tables?
Proper solution
... would be partitioning via inheritance. It's rather simple actually. Consider this related answer:
Select (retrieve) all records from multiple schemas using Postgres
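Since all the month tables already share the same columns, inheritance can even be retrofitted onto the existing design. A minimal sketch, assuming a new, empty parent table (the name events_master is my invention):
-- empty parent with the common row layout
CREATE TABLE augmented_events.events_master (LIKE augmented_events.p201301);
-- attach each existing month table as a child
ALTER TABLE augmented_events.p201301 INHERIT augmented_events.events_master;
ALTER TABLE augmented_events.p201302 INHERIT augmented_events.events_master;
-- ... one ALTER per existing month
-- new months are created as children from the start
CREATE TABLE augmented_events.p201402 () INHERITS (augmented_events.events_master);
-- and the whole set is queried through the parent
SELECT * FROM augmented_events.events_master;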
For now
While stuck with your unfortunate design, you can use dynamic SQL in a plpgsql function with EXECUTE.
Create this function once:
CREATE OR REPLACE FUNCTION f_all_in_schema_foo()
  RETURNS SETOF t  -- "t": any table or composite type with the common row layout, e.g. foo.p201301
  LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE (
   SELECT string_agg(format('SELECT * FROM %s', c.oid::regclass)
                   , E'\nUNION ALL\n'
                     ORDER BY relname)
   FROM   pg_namespace n
   JOIN   pg_class c ON c.relnamespace = n.oid
   WHERE  n.nspname = 'foo'
   AND    c.relkind = 'r'
   );
END
$func$;
Note how I carefully avoid any possibility of SQL injection (table names have to be treated as "user input"!). See:
Table name as a PostgreSQL function parameter
Generates and executes a query of the form:
SELECT * FROM foo.p201301
UNION ALL
SELECT * FROM foo.p201302
UNION ALL
SELECT * FROM foo.p201303
UNION ALL
...
Tables are ordered by name due to the ORDER BY clause in string_agg().
You can use this table function just like a table. Call:
SELECT * FROM f_all_in_schema_foo();
Performance should be good.
You can find similar examples with explanation and links here on SO with this search.
I doubt there's anything like this possible in straight SQL, but you could use outside code (PHP, Perl, .NET; whatever's familiar to you) to query the schema, drop the old view, and create a new one. Schedule it to run daily and you'll be able to use the view without giving a thought to which tables are included.
This is a Band-Aid: better would be to correct this pseudo-partitioning.
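That said, the rebuild itself doesn't strictly need outside code; a minimal sketch using a DO block with dynamic SQL (the view name all_events is my invention):
DO
$$
BEGIN
   EXECUTE (
      -- build "CREATE OR REPLACE VIEW ..." over every regular table in the schema
      SELECT 'CREATE OR REPLACE VIEW augmented_events.all_events AS '
          || string_agg(format('SELECT * FROM %s', c.oid::regclass)
                      , E'\nUNION ALL\n' ORDER BY c.relname)
      FROM   pg_namespace n
      JOIN   pg_class c ON c.relnamespace = n.oid
      WHERE  n.nspname = 'augmented_events'
      AND    c.relkind = 'r'
   );
END
$$;
Schedule that statement and the view always covers the current set of month tables. (CREATE OR REPLACE works here because every month table has the same columns.)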
Related
In Postgres I have a query that uses a table-valued function
SELECT
forecast.*
FROM (
SELECT
generate_series(begin_date, end_date, '1 mon'::interval)::date zdate
) zdate
LEFT JOIN LATERAL forecast_f(zdate.zdate)
forecast(forecast_version , source, forecast_date, gl_date, customer, program, rev) ON true
where 1=1;
and forecast_f looks like:
A lot of boilerplate and then:
BEGIN
return query
select * from table where a lot of parameters are pulled in.
I'm trying to do the same thing in BigQuery and have googled around a few uncommon concepts:
Generating a series
Passing parameters into a UDF with the right data type (always an adventure)
Table-valued functions (TVFs)
The documentation did not have a lot on TVFs, so I thought maybe BigQuery can't handle them and I'd have to pack everything into a column and split it somehow when it comes out of the function. When I googled, others complained about special cases where TVFs don't work, which suggests there are cases where they do work, maybe like mine. So I made this:
create or replace temp function snap(t timestamp)
as
(select * from forecast_stuff.forecast_full_practice where zfrom <= t and (zto> t or zto is null));
select * from snap(current_time())
which didn't work. Also, this fancy number:
create or replace temp function snap(t timestamp)
as
((select intersection from forecast_stuff.forecast_full_practice where zfrom <= t and (zto> t or zto is null)));
select * from snap(current_time())
Didn't work either: something about IF NOT EXISTS not being supported for temporary functions. I remember doing something like this in F1 or Dremel a few years back. Did they not bring the technology forward?
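(For what it's worth: BigQuery's table functions, where available, have to be persistent and dataset-qualified rather than temporary. A sketch of that form, assuming the function may live in the forecast_stuff dataset and that zfrom/zto are TIMESTAMP columns:)
create or replace table function forecast_stuff.snap(t timestamp)
as (
  select *
  from forecast_stuff.forecast_full_practice
  where zfrom <= t and (zto > t or zto is null)
);
-- note: current_timestamp() returns a TIMESTAMP; current_time() returns a TIME
select * from forecast_stuff.snap(current_timestamp());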
Oracle 12cR1 - I have a complex business process I am putting into a query.
In general, the process will be
with t1 as (select CATEGORY, PRODUCT from ... )
select <some manipulation> from t1;
t1 (aka the output of the first line) will look like this:
CATEGORY   PRODUCT
Database   Oracle, MS SQL Server, DB2
Language   C, Java, Python
I need the 2nd line of the SQL query (aka the manipulation) to keep the CATEGORY column, and to split the PRODUCT column on the comma. The output needs to look like this:
CATEGORY   PRODUCT
Database   Oracle
Database   MS SQL Server
Database   DB2
Language   C
Language   Java
Language   Python
I have looked at a couple of different CSV-splitting options. I cannot use the DBMS_UTILITY.comma_to_table procedure, as it has restrictions around special characters and elements starting with numbers. I found a nice TABLE function which will convert a string to separate rows, called f_convert. This function is on StackOverflow, about 1/3 of the way down the page here.
Since this is a table function, it is called like so, and will give me 3 rows, as expected:
SELECT * FROM TABLE(f_convert('Oracle, MS SQL Server, DB2'));
How do I treat this TABLE function as if it were a "column function"? Although this is totally improper SQL, I am looking for something like:
with t1 as (select CATEGORY, PRODUCT from ... )
select CATEGORY, TABLE(f_convert(PRODUCT)) as PRODUCT from t1;
Any help appreciated...
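For reference, Oracle allows a TABLE() expression in the FROM list to be correlated with the tables before it, which is effectively the "column function" behaviour being asked for. A sketch, assuming f_convert returns a collection of VARCHAR2 (so its elements surface as the COLUMN_VALUE pseudocolumn):
with t1(category, product) as (
  select 'Database', 'Oracle, MS SQL Server, DB2' from dual union all
  select 'Language', 'C, Java, Python' from dual
)
select t1.category, p.column_value as product
from t1, TABLE(f_convert(t1.product)) p;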
Use connect by to "loop" through the elements of the list where a comma-space is the delimiter. regexp_substr gets the list elements (the regex allows for NULL list elements) and the prior clauses keep the categories straight.
with t1(category, product) as (
select 'Database', 'Oracle, MS SQL Server, DB2' from dual union all
select 'Language', 'C, Java, Python' from dual
)
select category,
regexp_substr(product, '(.*?)(, |$)', 1, level, NULL, 1) product
from t1
connect by level <= regexp_count(product, ', ')+1
and prior category = category
and prior sys_guid() is not null;
CATEGORY PRODUCT
-------- --------------------------
Database Oracle
Database MS SQL Server
Database DB2
Language C
Language Java
Language Python
6 rows selected.
SQL Server 2014 database. Table with 200 million rows.
Very large query with HUGE IN clause.
I originally wrote this query for them, but they have grown the IN clause to over 700 entries. The CTE may look unnecessary; that's because I have omitted all the select columns and their substring() transformations for simplicity.
The focus is on the IN clause: 700+ of these pairs.
WITH cte AS (
SELECT *
FROM [AODS-DB1B]
WHERE
Source+'-'+Target
IN
(
'ACY-DTW',
'ACY-ATL',
'ACY-ORD',
:
: 700+ of these pairs
:
'HTS-PGD',
'PIE-BMI',
'PGD-HTS'
)
)
SELECT *
FROM cte
order by Source, Target, YEAR, QUARTER
When running, this query shoots CPU to 100% for hours - not unexpectedly.
There are indexes on all columns involved.
Question 1: Is there a better or more efficient way to accomplish this query than the huge IN clause? Would 700 UNION ALLs be better?
Question 2: When this query runs, it creates a Session_ID that contains 49 "threads" (49 processes that all have the same Session_ID), every one of them an instance of this query with its "Command" being this query text.
21 of them SUSPENDED,
14 of them RUNNING, and
14 of them RUNNABLE.
This changes rapidly as the task is running.
WHAT the heck is going on there? Is this SQL Server breaking the query up into pieces to work on it?
I recommend you store your 700+ strings in a permanent table, as it is generally considered bad practice to hard-code that much data in a script. You can create the table like this:
CREATE TABLE dbo.Lookup (Source varchar(250), Target varchar(250))
CREATE INDEX IX_Lookup_Source_Target ON dbo.Lookup (Source, Target)
INSERT INTO dbo.Lookup (Source, Target)
SELECT 'ACY','DTW'
UNION
SELECT 'ACY','ATL'
.......
and then you can simply join on this table:
SELECT a.*
FROM [AODS-DB1B] a
INNER JOIN dbo.Lookup lt ON lt.Source = a.Source
                        AND lt.Target = a.Target
ORDER BY a.Source, a.Target, a.YEAR, a.QUARTER
However, even better would be to normalise the AODS-DB1B table and store SourceId and TargetId INT values instead, with the VARCHAR values stored in Source and Target tables. You can then write a query that only performs integer comparisons rather than string comparisons and this should be much faster.
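A sketch of what that normalisation might look like (all names below are my inventions; source and target can share one airport dimension):
-- one row per airport code
CREATE TABLE dbo.Airport (
    AirportId INT IDENTITY(1,1) PRIMARY KEY,
    Code      varchar(250) NOT NULL UNIQUE
)
-- AODS-DB1B and the lookup table then carry SourceId/TargetId INT columns,
-- and the filter becomes a pure integer join:
SELECT a.*
FROM [AODS-DB1B] a
INNER JOIN dbo.Lookup lt ON lt.SourceId = a.SourceId
                        AND lt.TargetId = a.TargetId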
Put all of your codes into a temporary table (or a permanent one, if suitable)...
SELECT *
FROM [AODS-DB1B]
INNER JOIN NEW_TABLE ON Source+'-'+Target = NEW_TABLE.Code
WHERE
...
...
You can create a temp table with all those values and then JOIN to that table; it would make the process a lot faster.
I like the answer from Jaco. Have an index on (source, target).
It may be worth giving this a try:
where ( source = 'ACY' and target in ('DTW', 'ATL', 'ORD') )
or ( source = 'HTS' and target in ('PGD') )
I'm trying to find which rows are missing from one database to another. I already have a link to both DBs, and I have already found out that I can't just join the separate tables, so what I'm trying right now is to select the IDs from one table and paste them into the select statement for the other DB. However, I don't know how to parse a CLOB into a condition.
Let me explain further:
I've got a collection of VARCHAR2s with all the IDs I need to check on the other DB, and I can iterate through that collection, so I end up with a CLOB of the form: 'id1','id2','id3'
I want to run this query on the other DB
SELECT * FROM atable@db2
WHERE id NOT IN (clob_with_ids)
but I don't know how to tell PL/SQL to evaluate that CLOB as part of the statement.
The id field on atable@db2 is an integer, and the VARCHAR2 IDs I got are from running a regex on a CLOB.
Edit:
I've been asked to add the example I was trying to run:
SELECT *
FROM myTable@db1
WHERE ( (creation_date BETWEEN to_date('14-JUN-2011 00:00:00','DD-MON-YYYY HH24:MI:SS') AND to_date('14-JUN-2011 23:59:59','DD-MON-YYYY HH24:MI:SS')) )
AND acertain_id NOT IN ( SELECT to_number(REGEXP_REPLACE(REGEXP_REPLACE(REGEXP_SUBSTR(payload,'<xmlTag>([[:alnum:]]+)-'),'<xmlTag>',''),'-','')) as sameIDasOtherTable
FROM anotherTable@db2
WHERE condition1 ='bla'
AND condition2 ='blabla'
AND ( (creation_date BETWEEN to_date('14-JUN-2011 00:00:00','DD-MON-YYYY HH24:MI:SS') AND to_date('14-JUN-2011 23:59:59','DD-MON-YYYY HH24:MI:SS')) ) )
ORDER BY TO_CHAR(creation_date, 'MM/DD/YYYY') ASC;
I get error ORA-22992.
Any suggestions?
It seems to me you have invested a lot of time in developing the wrong solution. A way simpler solution would be to use a SET operator. This query retrieves all the IDs in the local instance of ATABLE which are missing in the remote instance of the same table:
select id from atable
minus
select id from atable@db2
If your heart is set on retrieving the whole local row, you could try an anti-join:
select loc.*
from atable loc
left join atable@db2 rem
on (loc.id = rem.id )
where rem.id is null
/
I don't believe you can do that, but I've been proved wrong on many occasions... Even if you could find a way to get it to treat the contents of the CLOB as individual values for the IN, you'd probably hit the 1000-item limit (ORA-01795) fairly quickly.
I'm not sure what you mean by 'I already found out that I can't just join separate tables'. Why can't you do something like:
SELECT * FROM atable@db2 WHERE id NOT IN (SELECT id FROM atable@db1)
Or:
SELECT * from atable@db2 WHERE id IN (
SELECT id FROM atable@db2 MINUS SELECT id FROM atable@db1)
(Or use #APC's anti-join, which is probably more performant!)
There may be performance issues with joining large tables on remote databases, but it looks like you have to do that at some point, and if it's a one-off task then it might be bearable.
Edited after question updated with join error
The ORA-22992 is because you're trying to pull a CLOB from the remote database, which doesn't seem to work. From this I assume your reference to not being able to join is because you're joining two remote tables.
The simple option is not to pull all the columns: specify the ones you need rather than doing a select *. If you do need the CLOB value, the only thing I can suggest trying is a CTE (WITH tmp_ids AS (SELECT <regex> FROM anotherTable@db2) ...), but I really have no idea if that avoids the two-link restriction. Or pull the IDs into a local temporary table; or run the query on one of the remote databases.
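A sketch of the temporary-table route (the table name tmp_remote_ids is my invention; the INSERT reuses the regex from the question, and INSERT ... SELECT can often move LOB-derived values across a link where a plain SELECT cannot):
CREATE GLOBAL TEMPORARY TABLE tmp_remote_ids (id NUMBER)
ON COMMIT PRESERVE ROWS;
INSERT INTO tmp_remote_ids (id)
SELECT to_number(REGEXP_REPLACE(REGEXP_REPLACE(
         REGEXP_SUBSTR(payload, '<xmlTag>([[:alnum:]]+)-'), '<xmlTag>', ''), '-', ''))
FROM   anotherTable@db2
WHERE  condition1 = 'bla'
AND    condition2 = 'blabla';
-- the main query then only touches one link
SELECT *
FROM   myTable@db1
WHERE  acertain_id NOT IN (SELECT id FROM tmp_remote_ids);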
Is there a better solution to the problem of looking up multiple known IDs in a table:
SELECT * FROM some_table WHERE id='1001' OR id='2002' OR id='3003' OR ...
I can have several hundred known items. Ideas?
SELECT * FROM some_table WHERE ID IN ('1001', '1002', '1003')
and if your known IDs are coming from another table
SELECT * FROM some_table WHERE ID IN (
SELECT KnownID FROM some_other_table WHERE someCondition
)
The first (naive) option:
SELECT * FROM some_table WHERE id IN ('1001', '2002', '3003' ... )
However, we should be able to do better. IN performs poorly with long literal lists, and you mentioned hundreds of these ids. What creates them? Where do they come from? Can you write a query that returns this list? If so:
SELECT *
FROM some_table
INNER JOIN ( your query here) filter ON some_table.id=filter.id
See Arrays and Lists in SQL Server 2005
Long chains of ORs are notoriously slow in SQL.
Your question is short on specifics, but depending on your requirements and constraints I would build a look-up table with your IDs and use the EXISTS predicate:
select t.id from some_table t
where EXISTS (select * from lookup_table l where t.id = l.id)
For a fixed set of IDs you can do:
SELECT * FROM some_table WHERE id IN (1001, 2002, 3003);
For a set that changes each time, you might want to create a table to hold them and then query:
SELECT * FROM some_table WHERE id IN
(SELECT id FROM selected_ids WHERE key=123);
Another approach is to use collections - the syntax for this will depend on your DBMS.
Finally, there is always this "kludgy" approach:
SELECT * FROM some_table WHERE '|1001|2002|3003|' LIKE '%|' || id || '|%';
In Oracle, I always put the id's into a TEMPORARY TABLE to perform massive SELECT's and DML operations:
CREATE GLOBAL TEMPORARY TABLE t_temp (id INT)
SELECT *
FROM mytable
WHERE mytable.id IN
(
SELECT id
FROM t_temp
)
You can fill the temporary table in a single client-server roundtrip using Oracle collection types.
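A sketch of that single-roundtrip fill (the type name number_list is my invention; in practice the client binds the collection as a parameter, shown here with a literal constructor):
CREATE TYPE number_list AS TABLE OF NUMBER;
/
INSERT INTO t_temp (id)
SELECT column_value FROM TABLE(number_list(1001, 2002, 3003));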
We have a similar issue in an application written for MS SQL Server 7. Although I dislike the solution used, we're not aware of anything better...
'Better' solutions exist in 2008 as far as I know, but we have zero clients using that :)
We created a table-valued user-defined function that takes a comma-delimited string of IDs and returns a table of IDs. The SQL then reads reasonably well, and none of it is dynamic, but there is still the annoying double overhead:
1. Client concatenates the IDs into the string
2. SQL Server parses the string to create a table of IDs
There are lots of ways of turning '1,2,3,4,5' into a table of IDs, but the Stored Procedure which uses the function ends up looking like...
CREATE PROCEDURE my_road_to_hell @IDs AS VARCHAR(8000)
AS
BEGIN
    SELECT
        *
    FROM
        myTable
    INNER JOIN
        dbo.fn_split_list(@IDs) AS [IDs]
            ON [IDs].id = myTable.id
END
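fn_split_list itself isn't shown above; a sketch of one possible implementation, as a multi-statement table-valued function (which requires SQL Server 2000 or later):
CREATE FUNCTION dbo.fn_split_list (@IDs VARCHAR(8000))
RETURNS @out TABLE (id INT)
AS
BEGIN
    DECLARE @pos INT, @next INT
    SET @IDs = @IDs + ','   -- trailing sentinel so the last item is found
    SET @pos = 1
    SET @next = CHARINDEX(',', @IDs, @pos)
    WHILE @next > 0
    BEGIN
        INSERT INTO @out (id)
        VALUES (CAST(SUBSTRING(@IDs, @pos, @next - @pos) AS INT))
        SET @pos = @next + 1
        SET @next = CHARINDEX(',', @IDs, @pos)
    END
    RETURN
END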
The fastest is to put the ids in another table and JOIN
SELECT some_table.*
FROM some_table INNER JOIN some_other_table ON some_table.id = some_other_table.id
where some_other_table would have just one field (ids) and all values would be unique
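For example (a primary key on the id column both enforces that uniqueness and gives the join an index to work with):
CREATE TABLE some_other_table (id INT PRIMARY KEY)
INSERT INTO some_other_table (id) VALUES (1001), (2002), (3003)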