Evaluate a varchar2 string into a condition for a SQL statement

I'm trying to find which rows are missing from one database to another. I already have links to both DBs, and I've found out that I can't just join the separate tables, so what I'm trying now is to select the IDs from one table and paste them into the SELECT statement for the other DB. However, I don't know how to parse a CLOB into a condition.
Let me explain further:
I have a collection of VARCHAR2s with all the IDs I need to check on the other DB, and I can iterate through that collection, so I end up with a CLOB of the form: 'id1','id2','id3'
I want to run this query on the other DB
SELECT * FROM atable#db2
WHERE id NOT IN (clob_with_ids)
but I don't know how to tell PL/SQL to evaluate that CLOB as part of the statement.
The id field on atable#db2 is an integer, and the VARCHAR2 IDs I have come from running a regex on a CLOB.
Edit:
I've been asked to add the example I was trying to run:
SELECT *
FROM myTable#db1
WHERE ( (creation_date BETWEEN to_date('14-JUN-2011 00:00:00','DD-MON-YYYY HH24:MI:SS') AND to_date('14-JUN-2011 23:59:59','DD-MON-YYYY HH24:MI:SS')) )
AND acertain_id NOT IN ( SELECT to_number(REGEXP_REPLACE(REGEXP_REPLACE(REGEXP_SUBSTR(payload,'<xmlTag>([[:alnum:]]+)-'),'<xmlTag>',''),'-','')) as sameIDasOtherTable
FROM anotherTable#db2
WHERE condition1 ='bla'
AND condition2 ='blabla'
AND ( (creation_date BETWEEN to_date('14-JUN-2011 00:00:00','DD-MON-YYYY HH24:MI:SS') AND to_date('14-JUN-2011 23:59:59','DD-MON-YYYY HH24:MI:SS')) ) )
ORDER BY TO_CHAR(creation_date, 'MM/DD/YYYY') ASC;
I get error ORA-22992.
Any suggestions?

It seems to me you have invested a lot of time in developing the wrong solution. A far simpler approach would be to use a set operator. This query retrieves all the IDs in the local instance of ATABLE which are missing from the remote instance of the same table:
select id from atable
minus
select id from atable#db2
If your heart is set on retrieving the whole local row, you could try an anti-join:
select loc.*
from atable loc
left join atable#db2 rem
on (loc.id = rem.id )
where rem.id is null
/

I don't believe you can do that, but I've been proved wrong on many occasions... Even if you could find a way to get it to treat the contents of the CLOB as individual values for the IN, you'd probably hit the 1000-item limit (ORA-01795) fairly quickly.
I'm not sure what you mean by 'I already found out that I can't just join separate tables'. Why can't you do something like:
SELECT * FROM atable#db2 WHERE id NOT IN (SELECT id FROM atable#db1)
Or:
SELECT * from atable#db2 WHERE id IN (
SELECT id FROM atable#db2 MINUS SELECT id FROM atable#db1)
(Or use @APC's anti-join, which is probably more performant!)
There may be performance issues with joining large tables on remote databases, but it looks like you have to do that at some point, and if it's a one-off task then it might be bearable.
Edited after question updated with join error
The ORA-22992 is because you're trying to pull a CLOB from the remote database, which doesn't seem to work. From this I assume your reference to not being able to join is because you're joining two remote tables.
The simple option is not to pull all the columns: specify the ones you need rather than doing a SELECT *. If you do need the CLOB value, the only thing I can suggest trying is a CTE (WITH tmp_ids AS (SELECT <regex> FROM anotherTable#db2) ...), but I really have no idea whether that avoids the two-link restriction. Or pull the IDs into a local temporary table; or run the query on one of the remote databases.
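For illustration, here is a rough sketch of the CTE shape I mean, reusing the table and column names from your example; whether it actually dodges ORA-22992 is something you'd have to test:
WITH tmp_ids AS (
  SELECT to_number(REGEXP_REPLACE(REGEXP_REPLACE(
           REGEXP_SUBSTR(payload, '<xmlTag>([[:alnum:]]+)-'),
           '<xmlTag>', ''), '-', '')) AS remote_id
  FROM anotherTable#db2
  WHERE condition1 = 'bla'
    AND condition2 = 'blabla'
)
SELECT m.*
FROM myTable#db1 m
WHERE m.acertain_id NOT IN (SELECT remote_id FROM tmp_ids);
Note the usual NOT IN caveat: if the regex ever produces a NULL, the NOT IN will return no rows at all.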

Related

How do I pass a series of parameters into a TVF to get a series of tables in BigQuery

In Postgres I have a query that uses a table-valued function:
SELECT
forecast.*
FROM (
SELECT
generate_series(begin_date, end_date, '1 mon'::interval)::date zdate
) zdate
LEFT JOIN LATERAL forecast_f(zdate.zdate)
forecast(forecast_version , source, forecast_date, gl_date, customer, program, rev) ON true
where 1=1;
and forecast_f is a lot of boilerplate and then:
BEGIN
return query
select * from table where a lot of parameters are pulled in.
I'm trying to do the same thing in BigQuery and have googled around a few uncommon concepts:
Generating a series
Passing parameters into a UDF with the right data type (always an adventure)
TVFs
The documentation did not have a lot on TVFs, so I thought maybe BigQuery can't handle them and I'd have to join everything into a single column and split it somehow when it comes out of the function. When I googled, others complained about special cases where TVFs don't work, but that would suggest there are cases where they do work, like maybe mine. So I made this:
create or replace temp function snap(t timestamp)
as
(select * from forecast_stuff.forecast_full_practice where zfrom <= t and (zto> t or zto is null));
select * from snap(current_time())
which didn't work. Also, this fancy number:
create or replace temp function snap(t timestamp)
as
((select intersection from forecast_stuff.forecast_full_practice where zfrom <= t and (zto> t or zto is null)));
select * from snap(current_time())
Didn't work either; something about IF NOT EXISTS not being supported in temporary functions. I remember doing something like this in F1 or Dremel a few years back. Did they not bring the technology forward?
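For what it's worth, two hedged observations: CURRENT_TIME() in BigQuery returns a TIME rather than a TIMESTAMP, so CURRENT_TIMESTAMP() is probably what the parameter wants; and BigQuery does support table-valued functions, but only as persistent CREATE TABLE FUNCTION objects in a dataset, not as temp functions. A minimal sketch reusing the names from the question (untested against this schema):
CREATE OR REPLACE TABLE FUNCTION forecast_stuff.snap(t TIMESTAMP)
AS (
  SELECT *
  FROM forecast_stuff.forecast_full_practice
  WHERE zfrom <= t AND (zto > t OR zto IS NULL)
);
SELECT * FROM forecast_stuff.snap(CURRENT_TIMESTAMP());
The generate_series half has a direct equivalent too: UNNEST(GENERATE_DATE_ARRAY(begin_date, end_date, INTERVAL 1 MONTH)) yields one row per month.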

SQL NOT IN failed

I am working on a query that checks the temp table for records that do not exist in the main table. My query looks like this:
SELECT * FROM [Telemarketing].[dbo].[PDCampaignBatch_temp]
WHERE [StartDateTime] NOT IN (SELECT [StartDateTime] FROM [Telemarketing].[dbo].PDCampaignBatch GROUP BY [StartDateTime])
but the problem is that it does not display the row in question, even though that data does not exist in my main table. What seems to be the problem?
NOT IN has strange semantics. If any values in the subquery are NULL, then the query returns no rows at all. For this reason, I strongly recommend using NOT EXISTS instead:
SELECT t.*
FROM [Telemarketing].[dbo].[PDCampaignBatch_temp] t
WHERE NOT EXISTS (SELECT 1
FROM [Telemarketing].[dbo].PDCampaignBatch cb
WHERE t.StartDateTime = cb.StartDateTime
);
If the set evaluated by the SQL NOT IN condition contains any values that are NULL, then the outer query here will return an empty set, even if there are many [StartDateTime]s in the temp table that have no match in the PDCampaignBatch table.
To avoid this issue:
SELECT *
FROM [Telemarketing].[dbo].[PDCampaignBatch_temp]
WHERE [StartDateTime] NOT IN (
SELECT DISTINCT [StartDateTime]
FROM [Telemarketing].[dbo].PDCampaignBatch
WHERE [StartDateTime] IS NOT NULL
);
Let's say PDCampaignBatch_temp and PDCampaignBatch happen to have the same structure (same columns in the same order) and you're tasked with getting the set of all rows in PDCampaignBatch_temp that aren't in PDCampaignBatch. The most effective way to do that is to make use of the EXCEPT operator, which will deal with NULL in the expected way as well:
SELECT * FROM [Telemarketing].[dbo].[PDCampaignBatch_temp]
EXCEPT
SELECT * FROM [Telemarketing].[dbo].[PDCampaignBatch]
In production code that is not a one-off, don't use SELECT *, write out the column names instead.
Most likely your issue is with the datetime. You may be displaying only a certain degree of precision, like year/month/day, while the data is stored as year/month/day/hour/minute/second/millisecond. If so, you have to match down to the most granular measurement of the data. And if one field is a date while the other is a datetime, they will likely never match up either. Thus you always get no rows back.
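If that is the case, here is a sketch of one way to compare on the date part only (this assumes SQL Server 2008 or later, where datetime can be cast to date; adjust the granularity to whatever the data actually needs):
SELECT t.*
FROM [Telemarketing].[dbo].[PDCampaignBatch_temp] t
WHERE NOT EXISTS (SELECT 1
                  FROM [Telemarketing].[dbo].PDCampaignBatch cb
                  WHERE CAST(t.StartDateTime AS date) = CAST(cb.StartDateTime AS date)
                 );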

SQL Server : compare two tables with UNION and Select * plus additional label column

I've been playing around with the sample on Jeff's SQL Server blog to compare two tables and find the differences.
In my case the tables are a backup and the current data. I can get what I want with this SQL statement (simplified by removing most of the columns). I can then see the rows from each table that don't have an exact match, and I can see which table each row comes from.
SELECT
MIN(TableName) as TableName
,[strCustomer]
,[strAddress1]
,[strCity]
,[strPostalCode]
FROM
(SELECT
'Old' as TableName
,[JAS001].[dbo].[AR_CustomerAddresses].[strCustomer]
,[JAS001].[dbo].[AR_CustomerAddresses].[strAddress1]
,[JAS001].[dbo].[AR_CustomerAddresses].[strCity]
,[JAS001].[dbo].[AR_CustomerAddresses].[strPostalCode]
FROM
[JAS001].[dbo].[AR_CustomerAddresses]
UNION ALL
SELECT
'New' as TableName
,[JAS001new].[dbo].[AR_CustomerAddresses].[strCustomer]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strAddress1]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strCity]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strPostalCode]
FROM
[JAS001new].[dbo].[AR_CustomerAddresses]) tmp
GROUP BY
[strCustomer]
,[strAddress1]
,[strCity]
,[strPostalCode]
HAVING
COUNT(*) = 1
This Stack Overflow Answer gives me a much cleaner SQL query but does not tell me from which table the rows come.
SELECT * FROM [JAS001new].[dbo].[AR_CustomerAddresses]
UNION
SELECT * FROM [JAS001].[dbo].[AR_CustomerAddresses]
EXCEPT
SELECT * FROM [JAS001new].[dbo].[AR_CustomerAddresses]
INTERSECT
SELECT * FROM [JAS001].[dbo].[AR_CustomerAddresses]
I could use the first version, but I have many tables to compare, and I think there has to be an easy way to add the source-table column to the second query. I've tried several things and googled to no avail; I suspect I'm just not searching for the right thing, since I'm sure it's been answered before.
Maybe I'm going down the wrong trail and there is a better way to compare the databases?
Could you use the following setup to accomplish your goal?
SELECT 'New not in Old' Descriptor, *
FROM
(
SELECT * FROM [JAS001new].[dbo].[AR_CustomerAddresses]
EXCEPT
SELECT * FROM [JAS001].[dbo].[AR_CustomerAddresses]
) a
UNION
SELECT 'Old not in New' Descriptor, *
FROM
(
SELECT * FROM [JAS001].[dbo].[AR_CustomerAddresses]
EXCEPT
SELECT * FROM [JAS001new].[dbo].[AR_CustomerAddresses]
) b
You can't add the table name there because UNION, EXCEPT, and INTERSECT compare all columns: tagging each row with its source table would mean rows from the two tables never match. A GROUP BY gives you control over which columns are considered when finding duplicates, so you can exclude the table name.
To help with the large number of tables you need to compare, you could write a SQL query against the metadata tables that hold table and column names, and generate the SQL commands dynamically from those values.
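A rough sketch of that idea (this assumes SQL Server 2017+ for STRING_AGG; older versions would need the FOR XML PATH trick instead): build a column list per table from INFORMATION_SCHEMA.COLUMNS, then splice each list into your comparison template.
SELECT TABLE_SCHEMA,
       TABLE_NAME,
       STRING_AGG(QUOTENAME(COLUMN_NAME), ', ')
         WITHIN GROUP (ORDER BY ORDINAL_POSITION) AS column_list
FROM INFORMATION_SCHEMA.COLUMNS
GROUP BY TABLE_SCHEMA, TABLE_NAME;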
You can derive an extra column from the table names, as below:
SELECT MIN(TableName) as TableName
,[strCustomer]
,[strAddress1]
,[strCity]
,[strPostalCode]
,table_name_came
FROM
(SELECT 'Old' as TableName
,[JAS001].[dbo].[AR_CustomerAddresses].[strCustomer]
,[JAS001].[dbo].[AR_CustomerAddresses].[strAddress1]
,[JAS001].[dbo].[AR_CustomerAddresses].[strCity]
,[JAS001].[dbo].[AR_CustomerAddresses].[strPostalCode]
,'[JAS001].[dbo].[AR_CustomerAddresses]' as table_name_came
FROM [JAS001].[dbo].[AR_CustomerAddresses]
UNION ALL
SELECT 'New' as TableName
,[JAS001new].[dbo].[AR_CustomerAddresses].[strCustomer]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strAddress1]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strCity]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strPostalCode]
,'[JAS001new].[dbo].[AR_CustomerAddresses]' as table_name_came
FROM [JAS001new].[dbo].[AR_CustomerAddresses]
) tmp
GROUP BY [strCustomer]
,[strAddress1]
,[strCity]
,[strPostalCode]
,table_name_came
HAVING COUNT(*) = 1

Find the tables affected by user error & reverse the mistake

I'm working on an Oracle database where a user made an error: a number of person records were moved into a different "round". Each round has "episodes", so the wrong "round" means all the episode processing has been affected (episodes were skipped over), and these people won't receive mails they were supposed to receive as a result of the missed "episodes".
I have a query put together that identifies all the records that have been mistakenly updated. I need a way to modify the query to find all the tables whose rows were wrongly moved into "round 2".
(All the tables I need to identify are ones featuring the "round_no" value.)
EDIT: There are over 70 tables with a "ROUND_NO" column! I need to identify only the ones that contain these person records.
I also need to then take this data and return it to round 1 from the incorrect round 2.
Here is the query that identifies persons that have been "skipped" into round 2 in error:
SELECT p.person_id
, p.name
, ep2.open_date
, ( SELECT ep1.open_date
FROM Person_ep ep1
WHERE ep1.person_id = er2.person_id
AND ep1.round_no = 1
)
r1epiopen /* Round 1 episode open date */
FROM person p
join region r
on r.region_code = p.region_code
and r.location_id = 50
join Person_ep er2
ON er2.person_id = p.person_id
AND er2.round_no = 2
ORDER
BY p.person_id
Using SQL Developer 3.2.20.09 on an Oracle 11G RDBMS.
Sorry to see this post so late... hope it's not too late...
I suppose you are using Oracle 10+, and that you know approximately the hour of the crime (!).
I see 2 possibilities:
1) Use Log Miner to review the executed SQL: http://docs.oracle.com/cd/B19306_01/server.102/b14215/logminer.htm
2) Use flashback queries to review the data of a table in the past. But for this one you need to test it on every suspected table (70+) :( http://docs.oracle.com/cd/E11882_01/appdev.112/e41502/adfns_flashback.htm#ADFNS01001
On a suspected table you could run this kind of SQL to see if updates occurred in the timeframe:
SELECT versions_startscn, versions_starttime,
versions_endscn, versions_endtime,
versions_xid, versions_operation,
description
FROM my_table
VERSIONS BETWEEN TIMESTAMP TO_TIMESTAMP('2014-01-29 14:59:08', 'YYYY-MM-DD HH24:MI:SS')
AND TO_TIMESTAMP('2014-01-29 14:59:36', 'YYYY-MM-DD HH24:MI:SS')
WHERE id = 1;
I have no practical experience using Log Miner, but I think it would be the best solution, especially if you have archived logging activated :D
You can access the data values of the affected table before the update (if you know the time of the update) using a query like this one:
SELECT COUNT(*) FROM myTable AS OF TIMESTAMP TO_TIMESTAMP('2014-01-29 13:34:12', 'YYYY-MM-DD HH24:MI:SS');
Of course, the data will only be available while it is still retained in the undo tablespace.
You could then create a temp table with data before the update:
create table tempTableA as SELECT * FROM myTable AS OF TIMESTAMP TO_TIMESTAMP('2014-01-29 13:34:12', 'YYYY-MM-DD HH24:MI:SS');
Then update your table with the values coming from tempTableA.
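For example, a hypothetical corrective update (person_id and round_no are taken from the question; the rest is guesswork about your schema):
UPDATE myTable t
SET t.round_no = 1
WHERE t.round_no = 2
AND EXISTS (SELECT 1
            FROM tempTableA a
            WHERE a.person_id = t.person_id
            AND a.round_no = 1);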
If you want to find all tables with a "round_no" column, you should probably use this query (unquoted identifiers are stored in upper case in the Oracle data dictionary):
select table_name from all_tab_columns where column_name = 'ROUND_NO'
If you want to get only the tables you can update:
SELECT table_name
FROM user_tab_columns c, user_tables t
WHERE c.column_name = 'ROUND_NO'
AND t.table_name = c.table_name;
should work
or for the purists
SELECT table_name
FROM user_tab_columns c
JOIN user_tables t ON t.table_name = c.table_name
WHERE c.column_name = 'ROUND_NO';
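To go a step further and identify only the tables that actually contain affected rows, here is a hedged PL/SQL sketch; it assumes the damaged rows can be recognised by round_no = 2, so refine the generated WHERE clause with your person IDs as needed:
DECLARE
  v_count PLS_INTEGER;
BEGIN
  FOR t IN (SELECT table_name
            FROM user_tab_columns
            WHERE column_name = 'ROUND_NO')
  LOOP
    EXECUTE IMMEDIATE
      'SELECT COUNT(*) FROM ' || t.table_name || ' WHERE round_no = 2'
      INTO v_count;
    IF v_count > 0 THEN
      DBMS_OUTPUT.PUT_LINE(t.table_name || ': ' || v_count || ' rows in round 2');
    END IF;
  END LOOP;
END;
/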

Alternative SQL ways of looking up multiple items of known IDs?

Is there a better solution to the problem of looking up multiple known IDs in a table:
SELECT * FROM some_table WHERE id='1001' OR id='2002' OR id='3003' OR ...
I can have several hundred known items. Ideas?
SELECT * FROM some_table WHERE ID IN ('1001', '1002', '1003')
and if your known IDs are coming from another table
SELECT * FROM some_table WHERE ID IN (
SELECT KnownID FROM some_other_table WHERE someCondition
)
The first (naive) option:
SELECT * FROM some_table WHERE id IN ('1001', '2002', '3003' ... )
However, we should be able to do better. IN is very bad when you have a lot of items, and you mentioned hundreds of these ids. What creates them? Where do they come from? Can you write a query that returns this list? If so:
SELECT *
FROM some_table
INNER JOIN ( your query here) filter ON some_table.id=filter.id
See Arrays and Lists in SQL Server 2005
ORs are notoriously slow in SQL.
Your question is short on specifics, but depending on your requirements and constraints I would build a look-up table with your IDs and use the EXISTS predicate:
select t.id from some_table t
where EXISTS (select * from lookup_table l where t.id = l.id)
For a fixed set of IDs you can do:
SELECT * FROM some_table WHERE id IN (1001, 2002, 3003);
For a set that changes each time, you might want to create a table to hold them and then query:
SELECT * FROM some_table WHERE id IN
(SELECT id FROM selected_ids WHERE key=123);
Another approach is to use collections - the syntax for this will depend on your DBMS.
Finally, there is always this "kludgy" approach:
SELECT * FROM some_table WHERE '|1001|2002|3003|' LIKE '%|' || id || '|%';
In Oracle, I always put the IDs into a TEMPORARY TABLE to perform massive SELECTs and DML operations:
CREATE GLOBAL TEMPORARY TABLE t_temp (id INT)
SELECT *
FROM mytable
WHERE mytable.id IN
(
SELECT id
FROM t_temp
)
You can fill the temporary table in a single client-server roundtrip using Oracle collection types.
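For example, a minimal sketch of the SQL side (this assumes a nested table type created beforehand with CREATE TYPE t_id_tab AS TABLE OF INT; from a client you would bind the collection as a parameter instead of building it inline):
INSERT INTO t_temp (id)
SELECT COLUMN_VALUE
FROM TABLE(t_id_tab(1001, 2002, 3003));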
We have a similar issue in an application written for MS SQL Server 7. Although I dislike the solution used, we're not aware of anything better...
'Better' solutions exist in SQL Server 2008 as far as I know, but we have zero clients using that :)
We created a table-valued user-defined function that takes a comma-delimited string of IDs and returns a table of IDs. The SQL then reads reasonably well, and none of it is dynamic, but there is still the annoying double overhead:
1. Client concatenates the IDs into the string
2. SQL Server parses the string to create a table of IDs
There are lots of ways of turning '1,2,3,4,5' into a table of IDs, but the Stored Procedure which uses the function ends up looking like...
CREATE PROCEDURE my_road_to_hell @IDs AS VARCHAR(8000)
AS
BEGIN
SELECT
*
FROM
myTable
INNER JOIN
dbo.fn_split_list(@IDs) AS [IDs]
ON [IDs].id = myTable.id
END
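On modern SQL Server (2016 and later) the built-in STRING_SPLIT can stand in for a hand-rolled fn_split_list; a hedged sketch of the same join:
SELECT myTable.*
FROM myTable
INNER JOIN STRING_SPLIT(@IDs, ',') AS ids
ON CAST(ids.value AS INT) = myTable.id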
The fastest is to put the IDs in another table and JOIN:
SELECT some_table.*
FROM some_table INNER JOIN some_other_table ON some_table.id = some_other_table.id
where some_other_table would have just one field (ids) and all values would be unique