Oracle: "grep" across multiple columns? - sql

I would like to perform a like or regexp across several columns. These columns contain tags, keywords, etc. Something that's the equivalent of:
sqlplus scott/tiger @myquery.sql | grep somestring
Right now I have something like
select * from foo where c1 || c2 || c3 like '%somestring%'
but I'm hoping I can get something a bit more general purpose and/or optimized. Any clues appreciated!

Have you thought about using the Concatenated Datastore feature of Oracle Text?
Oracle Text lets you create full-text indexes on multiple columns in the same table. Or at least there's a process by which you can do this.
There's a good example document on the Oracle site I've used before:
http://www.oracle.com/technology/sample_code/products/text/htdocs/concatenated_text_datastore/cdstore_readme.html
Oracle Text searches are ridiculously fast. I think I'd look at keeping separate context indexes on each individual column so that you could apply relevance and priority to each column match.
Let me know if you'd like an example and I'll add something to the answer.
Hope this helps.

On 11G you could create a virtual column:
alter table foo add all_text varchar2(4000) generated always as (c1 ||','|| c2 ||','|| c3);
(See Oracle 11G new features).
Then query:
select * from foo where all_text like '%somestring%'
You could add an index on all_text if it helps performance too (see this answer for when it might help and when not).
Prior to 11G you could do the same thing but with a normal column, maintained via a trigger.
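The ',' separator in the virtual column matters: without it, a match can span the boundary between two columns. Below is a small Python sketch of that effect. The function name concat_matches is made up for illustration, and None stands in for NULL, which Oracle's || treats as an empty string.

```python
def concat_matches(columns, needle, sep=""):
    """Mimic `c1 || sep || c2 || ... LIKE '%needle%'` for a single row.

    None stands in for NULL, which Oracle's || treats as an empty string.
    """
    joined = sep.join(value or "" for value in columns)
    return needle in joined


row = ["red", "dog", None]

# Without a separator, "dd" matches across the c1/c2 boundary ("reddog").
print(concat_matches(row, "dd"))           # True
# With a separator the false positive disappears ("red,dog,").
print(concat_matches(row, "dd", sep=","))  # False
```

The separator should be a character that cannot appear in the search string, otherwise the same kind of cross-column match is still possible.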

As a consultant I often have to search poorly documented databases, so I keep some handy scripts for finding data. Here are two examples of how to generate a SELECT statement that searches all VARCHAR2 columns in a table:
Example 1. Search for part of a string:
SELECT 'SELECT * FROM ' || min(TABLE_NAME) ||' WHERE ' || LISTAGG(COLUMN_NAME, ' like ''%somestring%'' or ') WITHIN GROUP (ORDER BY COLUMN_ID) || ' like ''%somestring%'';'
from ALL_TAB_COLUMNS
WHERE OWNER = 'YOUR_SCHEMA_NAME' -- Uppercase
AND TABLE_NAME = 'YOUR_TABLE_NAME' --Uppercase
AND DATA_TYPE LIKE 'VARCHAR2';
Example 2. Search for the entire value:
SELECT 'SELECT * FROM ' || min(TABLE_NAME) ||' WHERE ''somestring'' in (' || LISTAGG(COLUMN_NAME, ', ') WITHIN GROUP (ORDER BY COLUMN_ID) || ');'
from ALL_TAB_COLUMNS
WHERE OWNER = 'YOUR_SCHEMA_NAME' -- Uppercase
AND TABLE_NAME = 'YOUR_TABLE_NAME' --Uppercase
AND DATA_TYPE LIKE 'VARCHAR2';
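The same generate-from-the-catalog idea can be sketched outside the database. The example below is only an illustration: it uses Python's sqlite3 module and PRAGMA table_info as a stand-in for ALL_TAB_COLUMNS, the helper names are hypothetical, and it passes the search string as a bind parameter instead of pasting it into the statement.

```python
import sqlite3


def quote_ident(name):
    """Double-quote an identifier so unusual names survive."""
    return '"' + name.replace('"', '""') + '"'


def build_like_search(conn, table, needle):
    """Return (sql, params) that searches every text column of `table`."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    info = conn.execute(f"PRAGMA table_info({quote_ident(table)})").fetchall()
    text_cols = [r[1] for r in info
                 if r[2].upper().startswith(("VARCHAR", "TEXT"))]
    if not text_cols:
        raise ValueError(f"no text columns found in {table}")
    where = " OR ".join(f"{quote_ident(c)} LIKE ?" for c in text_cols)
    sql = f"SELECT * FROM {quote_ident(table)} WHERE {where}"
    return sql, [f"%{needle}%"] * len(text_cols)


# Throwaway in-memory table, only for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER, c1 VARCHAR2(20), c2 VARCHAR2(20))")
conn.executemany("INSERT INTO foo VALUES (?, ?, ?)",
                 [(1, "alpha", "beta"), (2, "gamma", "some somestring here")])

sql, params = build_like_search(conn, "foo", "somestring")
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

Against Oracle the column list would come from ALL_TAB_COLUMNS as in the queries above; only the string-building part carries over unchanged.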

Does regexp_like help?
http://www.psoug.org/reference/regexp.html

SELECT * FROM table WHERE REGEXP_LIKE(col1, <pattern>)
union
SELECT * FROM table WHERE REGEXP_LIKE(col2, <pattern>)
union
SELECT * FROM table WHERE REGEXP_LIKE(col3, <pattern>)
This should work, but I doubt it would perform any better than your query. You might want to compare the performance of both; I'd love to hear your findings. :-)

Related

Is there a way to execute a query on a database schema instead of a table

Thanks for reading my post. In our organisation we use an IBM DB2 database with multiple schemas, each with its own tables, procedures, views, etc. We would like a quick way to query one of these schemas based on the 'changed_by' field, which exists in every table of the schema.
One of our users had write access to our database. We want an overview of exactly which tables he has updated in the past few days. It is too much work to query every table of the schema individually.
The schema name is S_ORDER_SUMM, the schema contains 182 tables.
Something like this is what we need:
select (ALL TABLES) from S_ORDER_SUMM
where CHANGED_BY = 'Our_User'
Any help would be highly appreciated.
SELECT
-- 'UNION ALL ' ||
'SELECT ''' || T.TABNAME || ''' FROM SYSIBM.SYSDUMMY1 '
||'WHERE EXISTS (SELECT 1 FROM "' || T.TABSCHEMA || '"."' || T.TABNAME || '" '
||'WHERE CHANGE_DATE > CURRENT TIMESTAMP - 2 DAY AND CHANGED_BY=''Our_User'')'
FROM SYSCAT.TABLES T
JOIN SYSCAT.COLUMNS C ON C.TABSCHEMA=T.TABSCHEMA AND C.TABNAME=T.TABNAME
WHERE T.TABSCHEMA='S_ORDER_SUMM' AND T.TYPE='T'
AND C.COLNAME IN ('CHANGE_DATE', 'CHANGED_BY')
GROUP BY T.TABSCHEMA, T.TABNAME
HAVING COUNT(1)=2;
The query above returns a list of SELECT statements on every table of schema S_ORDER_SUMM containing both CHANGE_DATE and CHANGED_BY columns.
It's a series of the following statements (one line per statement in reality, I've formatted it just for demo):
SELECT 'MYTABLE'
FROM SYSIBM.SYSDUMMY1
WHERE EXISTS
(
SELECT 1
FROM "S_ORDER_SUMM"."MYTABLE"
WHERE CHANGE_DATE > CURRENT TIMESTAMP - 2 DAY AND CHANGED_BY='Our_User'
)
If you save the output to some file, for example, you may run this script afterwards.
You may generate a single statement for all tables as well. But you need to uncomment the commented out line and wrap the output into a final SELECT statement manually.
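The "single statement for all tables" variant can be sketched as follows. This is an illustration under stated assumptions, not the DB2 procedure itself: SQLite's sqlite_master and PRAGMA table_info stand in for SYSCAT.TABLES and SYSCAT.COLUMNS, the CHANGE_DATE filter is left out for brevity, and the function name is made up.

```python
import sqlite3


def tables_changed_by(conn, user):
    """Return the names of tables containing a row with changed_by = user."""
    names = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    probes = []
    for name in names:
        quoted = '"' + name.replace('"', '""') + '"'
        literal = name.replace("'", "''")
        cols = {r[1].upper()
                for r in conn.execute(f"PRAGMA table_info({quoted})")}
        if "CHANGED_BY" in cols:  # skip tables without the audit column
            probes.append(
                f"SELECT '{literal}' AS tabname WHERE EXISTS "
                f"(SELECT 1 FROM {quoted} WHERE changed_by = :u)")
    if not probes:
        return []
    # Joining with UNION ALL gives one statement for all tables.
    sql = "\nUNION ALL\n".join(probes)
    return [r[0] for r in conn.execute(sql, {"u": user})]


# Throwaway in-memory schema, only for the demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, changed_by TEXT);
    CREATE TABLE items  (id INTEGER, changed_by TEXT);
    CREATE TABLE lookup (id INTEGER);
    INSERT INTO orders VALUES (1, 'Our_User');
    INSERT INTO items  VALUES (1, 'someone_else');
""")
print(tables_changed_by(conn, "Our_User"))
```

Each probe stops at the first matching row because of EXISTS, so the combined statement reports which tables were touched without returning the rows themselves.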

SQL inner join based on table name pattern

On this legacy SQL database with hundreds of tables, I need to do an inner join on all tables whose names follow this format:
barX_foo_bazX
barX_foo_bazY
barZ_foo_bazZ
I would like to inner join all tables with foo in their name.
I am not sure this is possible at all.
Clearly, with this syntax it is not (but it may help understand what I'm aiming at):
USE [LegacyDB_Name]
SELECT *
FROM '%_foo_%' inner join '%_foo_%'
where my_stuff_is(some condition)
Any Suggestions? Ideas on how I can do this? Maybe there is an easier path this young padawan is not seeing...
Many Thanks!
"I am not sure this is possible at all."
Nope, table names cannot contain or use wildcards, they must be strings.
My advice would be to find whatever program makes these select queries and include whatever pattern matching you need in the queries in there.
But your finished query must contain table names as strings.
Maybe the simplest way to do this is to declare a cursor based on the below query and build a dynamic sql query. Research tsql cursor and dynamic sql execution and it should be fairly simple.
SELECT *
FROM information_schema.tables
WHERE table_type = 'BASE TABLE' AND table_name LIKE '%[_]foo[_]%' -- [_] matches a literal underscore; a bare _ matches any single character
If your tables all have the same structure (i.e. columns), then you could do this in two steps.
Generate the SQL statement:
select 'UNION ALL SELECT ''' + table_name + ''' AS table_name, * FROM '
+ table_name AS stmt
from information_schema.tables
where table_type = 'BASE TABLE'
and table_catalog = 'LegacyDB_Name'
and table_name LIKE '%foo%';
The output will be something like:
stmt
--------------------------------------------------------------------
UNION ALL SELECT 'barX_foo_bazX' AS table_name, * FROM barX_foo_bazX
UNION ALL SELECT 'barX_foo_bazY' AS table_name, * FROM barX_foo_bazY
UNION ALL SELECT 'barX_foo_bazZ' AS table_name, * FROM barX_foo_bazZ
From this output, copy the SQL rows and remove the first two words (UNION ALL) from the first line. The result is a valid SQL statement.
Then execute the SQL statement derived above.
If you need this SQL more often, then create a view for it:
CREATE VIEW all_foo AS
SELECT 'barX_foo_bazX' AS table_name, * FROM barX_foo_bazX
UNION ALL SELECT 'barX_foo_bazY' AS table_name, * FROM barX_foo_bazY
UNION ALL SELECT 'barX_foo_bazZ' AS table_name, * FROM barX_foo_bazZ;
Now you can query like
SELECT * FROM all_foo WHERE ...
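If the statement is assembled in a script rather than by hand, joining the per-table SELECTs with UNION ALL avoids the step of deleting the leading UNION ALL. The sketch below is a hedged illustration only: it uses Python's sqlite3 module with sqlite_master in place of information_schema.tables, the function name is hypothetical, and the LIKE pattern escapes the underscores so that they match literally.

```python
import sqlite3


def union_all_for(conn, like_pattern):
    """Build one UNION ALL statement over the tables matching the pattern."""
    names = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name LIKE ? ESCAPE '\\' ORDER BY name",
        (like_pattern,))]
    parts = [f"SELECT '{n}' AS table_name, * FROM \"{n}\"" for n in names]
    # join() puts UNION ALL only between statements, never in front.
    return "\nUNION ALL\n".join(parts)


# Throwaway in-memory tables; xfoox is a decoy without underscores.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE barX_foo_bazX (v TEXT);
    CREATE TABLE barX_foo_bazY (v TEXT);
    CREATE TABLE xfoox (v TEXT);
    INSERT INTO barX_foo_bazX VALUES ('a');
    INSERT INTO barX_foo_bazY VALUES ('b');
    INSERT INTO xfoox VALUES ('c');
""")

sql = union_all_for(conn, r"%\_foo\_%")
print(sql)
rows = conn.execute(sql).fetchall()
print(rows)
```

With the unescaped pattern '%_foo_%' the decoy table xfoox would also be selected, because _ matches any single character.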

one query for many similar tables

I have an Oracle database with many tables that have identical structure (columns are all the same). The table names are similar also. The names of the tables are like table_1, table_2, table_3...
I know this isn't the most efficient design, but I don't have the option of changing this at this time.
In this case, is it possible to make a single sql query, to extract all rows with the same condition across multiple tables (hundreds of tables) without explicitly using the exact table name?
I realize I could use something like
select * from table_1 UNION select * from table_2 UNION select * from table_3...select * from table_1000
But is there a more elegant SQL statement that extracts from all matching table names into one result, without having to name each table explicitly?
Something like
select * from table_%
Is something like that possible? If not, what is the most efficient way to write this query?
You can use dbms_xmlgen to query tables using a pattern, which generates an XML document as a CLOB:
select dbms_xmlgen.getxml('select * from ' || table_name
|| ' where some_col like ''%Test%''') as xml_clob
from user_tables
where table_name like 'TABLE_%';
You said you wanted a condition, so I've included a dummy one, where some_col like '%Test%'.
You can then use XMLTable to extract the values back as relational data, converting the CLOB to XMLType on the way:
select x.*
from (
select xmltype(dbms_xmlgen.getxml('select * from ' || table_name
|| ' where some_col like ''%Test%''')) as xml
from user_tables
where table_name like 'TABLE_%'
) t
cross join xmltable('/ROWSET/ROW'
passing t.xml
columns id number path 'ID',
some_col varchar2(10) path 'SOME_COL'
) x;
SQL Fiddle demo which retrieves one matching row from each of two similar tables. Of course, this assumes your table names follow a useful pattern like table_%, but you suggest they do.
This is the only way I know to do something like this without resorting to PL/SQL (and having searched back a bit, was probably inspired by this answer to count multiple tables). Whether it's efficient (enough) is something you'd need to test with your data.
This is kind of messy and best performed in a middle-tier, but I suppose you could basically loop over the tables and use EXECUTE IMMEDIATE to do it.
Something like:
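begin
  for t in (select table_name from all_tables where table_name like 'TABLE_%') loop -- unquoted names are stored in uppercase
    -- add an INTO or BULK COLLECT INTO clause to actually fetch rows
    execute immediate 'select blah from ' || t.table_name;
  end loop;
end;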
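The middle-tier version of that loop might look like the sketch below. It is an illustration under assumptions rather than Oracle code: Python's sqlite3 module stands in for the database driver, sqlite_master stands in for all_tables, and the function name and sample tables are made up.

```python
import sqlite3


def select_from_matching(conn, prefix, where_sql, params=()):
    """Run the same SELECT against every table whose name starts with prefix."""
    names = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
        if r[0].startswith(prefix)]
    results = []
    for name in names:
        quoted = '"' + name.replace('"', '""') + '"'
        query = f"SELECT * FROM {quoted} WHERE {where_sql}"
        for row in conn.execute(query, params):
            # Tag each row with the table it came from.
            results.append((name,) + tuple(row))
    return results


# Throwaway in-memory tables, only for the demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id INTEGER, some_col TEXT);
    CREATE TABLE table_2 (id INTEGER, some_col TEXT);
    CREATE TABLE other_table (id INTEGER, some_col TEXT);
    INSERT INTO table_1 VALUES (1, 'Test one'), (2, 'nope');
    INSERT INTO table_2 VALUES (3, 'a Test');
    INSERT INTO other_table VALUES (4, 'Test');
""")

results = select_from_matching(conn, "table_", "some_col LIKE ?", ("%Test%",))
print(results)
```

This issues one query per table, so with hundreds of tables it trades a single large statement for many small round trips.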
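Note that "select * from table_1 and table_2 and table_3;" is not valid SQL: the FROM clause does not accept AND between table names, so the tables have to be combined with UNION or UNION ALL.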

PostgreSQL: change order of columns in query

I have a huge query with about 30 columns.
I ordered the query with:
Select *
From
.
.
.
order by id,status
Now I want the result to present the columns in a certain order.
The id column should come first, followed by the status column and then all the rest.
Is there a way to do that without manually specifying 30 column names in the SELECT? Something like: Select id, status, REST
This generates a SELECT statement that lists id and status first, followed by all the other columns:
SELECT 'SELECT o.id, o.status, ' || array_to_string(ARRAY(SELECT 'o' || '.' || c.column_name
FROM information_schema.columns As c
WHERE c.table_name = 'officepark'
AND c.column_name NOT IN('id', 'status')
ORDER BY c.ordinal_position
), ', ') || ' FROM officepark As o' As sqlstmt
The "select *" will return the fields in the order in which they were listed when the table was created. If you want them returned in a particular order, just be sure to create the table with that order.
If you have to do it repeatedly, you could create a new table:
CREATE TABLE FOO as
SELECT id, status, mydesiredorder
Or just create a view. If you go the new-table route, don't forget to move the indexes, constraints and foreign keys. If you only need to do this once, typing out the 30 columns would have been faster than asking here.
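Another option is to let client code build the column list. The sketch below is an illustration only: it uses Python's sqlite3 module rather than PostgreSQL, reads the column names from the cursor's description attribute (part of the Python DB-API) after a zero-row query, and the function name and sample columns are hypothetical.

```python
import sqlite3


def select_with_leading(conn, table, leading):
    """Build a SELECT that lists `leading` columns first, then the rest."""
    # A zero-row query is enough to read the column names from the cursor.
    cur = conn.execute(f'SELECT * FROM "{table}" LIMIT 0')
    all_cols = [d[0] for d in cur.description]
    missing = [c for c in leading if c not in all_cols]
    if missing:
        raise ValueError(f"unknown columns: {missing}")
    rest = [c for c in all_cols if c not in leading]
    col_list = ", ".join(f'"{c}"' for c in list(leading) + rest)
    order_by = ", ".join(f'"{c}"' for c in leading)
    return f'SELECT {col_list} FROM "{table}" ORDER BY {order_by}'


# Throwaway in-memory table, only for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE officepark (name TEXT, status TEXT, id INTEGER, city TEXT)")

sql = select_with_leading(conn, "officepark", ["id", "status"])
print(sql)
```

The remaining columns keep the order in which the table defines them, which matches what "select *" would return.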

excluding duplicate fields in a join

I have a dataset I'm doing analysis on. It turns out it can easily be enriched with demographic and community data which vastly improves the analytical results.
In order to do this I'm joining in demographic and community data before doing analysis. I need to exclude some fields from my core sample set, so my join looks something like this:
select sampledata.c1,
sampledata.c2,
demographics.*,
community.*
from sampledata
join demographics using (zip)
join community using (fips)
This gets me multiple zip or fips columns in the output which my analysis engine can't deal with. I can't specify each field by hand - the enrichment tables result in hundreds of columns in the end.
I could do select *, but then I'd have all the columns from my sample data which I don't want.
How can I join in my enrichment data without duplicating fields, whilst still selecting the columns I want from my sample table?
One thought I had, was if postgres (my database) could fully qualify each column in the output (like sample.c1, demographics.c1, etc) I would be perfectly happy with this.
There is no column exclusion syntax in SQL, there is only column inclusion syntax (via the * operator for all columns, or listing the column names explicitly).
Generate list of only columns you want
However, you could generate the SQL statement with its hundreds of column names, minus the few duplicate columns you do not want, using schema tables and some built-in functions of your database.
SELECT
'SELECT sampledata.c1, sampledata.c2, ' || ARRAY_TO_STRING(ARRAY(
SELECT 'demographics' || '.' || column_name
FROM information_schema.columns
WHERE table_name = 'demographics'
AND column_name NOT IN ('zip')
UNION ALL
SELECT 'community' || '.' || column_name
FROM information_schema.columns
WHERE table_name = 'community'
AND column_name NOT IN ('fips')
), ',') || ' FROM sampledata JOIN demographics USING (zip) JOIN community USING (fips)'
AS statement
This only prints out the statement, it does not execute it. Then you just copy the result and run it.
If you want to both generate and run the statement dynamically in one go, then you may read up on how to run dynamic SQL in the PostgreSQL documentation.
Prepend column names with table name
Alternately, this generates a select list of all the columns, including those with duplicate data, but then aliases them to include the table name of each column as well.
SELECT
'SELECT ' || ARRAY_TO_STRING(ARRAY(
SELECT table_name || '.' || column_name || ' AS ' || table_name || '_' || column_name
FROM information_schema.columns
WHERE table_name in ('sampledata', 'demographics', 'community')
), ',') || ' FROM sampledata JOIN demographics USING (zip) JOIN community USING (fips)'
AS statement
Again, this only generates the statement. If you want to both generate and run the statement dynamically, then you'll need to brush up on dynamic SQL execution for your database, otherwise just copy and run the result.
If you really want a dot separator in the column aliases, then you'll have to use double-quoted aliases such as SELECT table_name || '.' || column_name || ' AS "' || table_name || '.' || column_name || '"'. However, double-quoted aliases can cause extra complications (case sensitivity, etc.), so I used the underscore character instead to separate the table name from the column name within the alias; the aliases can then be treated like regular column names.
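The aliasing logic is plain string building, so it can also be done in client code once the column names are known. A minimal sketch follows; the function name is made up, and the income and parks columns are hypothetical placeholders for whatever the enrichment tables actually contain.

```python
def aliased_select_list(columns_by_table, skip=()):
    """Return 'table.col AS table_col' items, omitting (table, col) pairs in skip."""
    items = []
    for table, columns in columns_by_table.items():
        for col in columns:
            if (table, col) in skip:
                continue  # drop the duplicated join keys
            items.append(f"{table}.{col} AS {table}_{col}")
    return ", ".join(items)


# Hypothetical column lists; in practice they would come from
# information_schema.columns as in the queries above.
columns = {
    "sampledata": ["c1", "c2"],
    "demographics": ["zip", "income"],
    "community": ["fips", "parks"],
}
select_list = aliased_select_list(
    columns, skip={("demographics", "zip"), ("community", "fips")})
statement = (f"SELECT {select_list} FROM sampledata "
             "JOIN demographics USING (zip) JOIN community USING (fips)")
print(statement)
```

Passing an empty skip set gives the second variant, where every column is kept and the alias prefix tells the copies apart.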