excluding duplicate fields in a join - sql

I have a dataset I'm doing analysis on. It turns out it can easily be enriched with demographic and community data which vastly improves the analytical results.
In order to do this I'm joining in demographic and community data before doing analysis. I need to exclude some fields from my core sample set, so my join looks something like this:
select sampledata.c1,
sampledata.c2,
demographics.*,
community.*
from sampledata
join demographics using (zip)
join community using (fips)
This gets me multiple zip or fips columns in the output which my analysis engine can't deal with. I can't specify each field by hand - the enrichment tables result in hundreds of columns in the end.
I could do select *, but then I'd have all the columns from my sample data which I don't want.
How can I join in my enrichment data without duplicating fields, whilst still selecting the columns I want from my sample table?
One thought I had: if Postgres (my database) could fully qualify each column in the output (like sampledata.c1, demographics.c1, etc.), I would be perfectly happy with that.

SQL has no column exclusion syntax, only column inclusion syntax (the * wildcard for all columns, or an explicit list of column names).
Generate list of only columns you want
However, you could generate the SQL statement with its hundreds of column names, minus the few duplicate columns you do not want, using schema tables and some built-in functions of your database.
SELECT
'SELECT sampledata.c1, sampledata.c2, ' || ARRAY_TO_STRING(ARRAY(
SELECT 'demographics' || '.' || column_name
FROM information_schema.columns
WHERE table_name = 'demographics'
AND column_name NOT IN ('zip')
UNION ALL
SELECT 'community' || '.' || column_name
FROM information_schema.columns
WHERE table_name = 'community'
AND column_name NOT IN ('fips')
), ',') || ' FROM sampledata JOIN demographics USING (zip) JOIN community USING (fips)'
AS statement
This only prints the statement; it does not execute it. Copy the result and run it.
If you want to both generate and run the statement dynamically in one go, then you may read up on how to run dynamic SQL in the PostgreSQL documentation.
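As a rough sketch of that dynamic route, a PostgreSQL DO block can build the column list and then create a view from it with EXECUTE. The view name enriched_sample and the public schema are assumptions made for illustration. It also assumes that no two selected columns share a name, because a view cannot contain duplicate column names.

```sql
DO $$
DECLARE
    col_list text;
BEGIN
    -- Collect every enrichment column except the duplicated join keys.
    SELECT string_agg(format('%I.%I', table_name, column_name), ', '
                      ORDER BY table_name, ordinal_position)
      INTO col_list
      FROM information_schema.columns
     WHERE table_schema = 'public'          -- assumed schema
       AND (   (table_name = 'demographics' AND column_name <> 'zip')
            OR (table_name = 'community'    AND column_name <> 'fips'));

    IF col_list IS NULL THEN
        RAISE EXCEPTION 'no enrichment columns found';
    END IF;

    -- enriched_sample is a hypothetical view name.
    EXECUTE 'DROP VIEW IF EXISTS enriched_sample';
    EXECUTE 'CREATE VIEW enriched_sample AS '
         || 'SELECT sampledata.c1, sampledata.c2, ' || col_list
         || ' FROM sampledata'
         || ' JOIN demographics USING (zip)'
         || ' JOIN community USING (fips)';
END
$$;

-- The analysis engine can then read from the view.
SELECT * FROM enriched_sample;
```

A view is used here because a DO block cannot return rows itself. The view has to be recreated whenever columns are added to the enrichment tables.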
Prepend column names with table name
Alternately, this generates a select list of all the columns, including those with duplicate data, but then aliases them to include the table name of each column as well.
SELECT
'SELECT ' || ARRAY_TO_STRING(ARRAY(
SELECT table_name || '.' || column_name || ' AS ' || table_name || '_' || column_name
FROM information_schema.columns
WHERE table_name in ('sampledata', 'demographics', 'community')
), ',') || ' FROM sampledata JOIN demographics USING (zip) JOIN community USING (fips)'
AS statement
Again, this only generates the statement. If you want to both generate and run the statement dynamically, then you'll need to brush up on dynamic SQL execution for your database, otherwise just copy and run the result.
If you really want a dot separator in the column aliases, then you'll have to use double-quoted aliases such as SELECT table_name || '.' || column_name || ' AS "' || table_name || '.' || column_name || '"'. However, double-quoted aliases can cause extra complications (case sensitivity, etc.), so I used an underscore to separate the table name from the column name within the alias. The aliases can then be treated like regular column names.
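A possible variant of the alias generator, sketched with string_agg and format so that identifiers needing quotes are quoted automatically (%I) and the columns come out in table order. The public schema filter is an assumption. Keep in mind that PostgreSQL truncates identifiers longer than 63 bytes by default, so very long table_column aliases could collide.

```sql
SELECT 'SELECT '
    || string_agg(
           format('%I.%I AS %I',
                  table_name,
                  column_name,
                  table_name::text || '_' || column_name::text),
           ', ' ORDER BY table_name, ordinal_position)
    || ' FROM sampledata JOIN demographics USING (zip) JOIN community USING (fips)'
       AS statement
FROM information_schema.columns
WHERE table_schema = 'public'   -- assumed schema
  AND table_name IN ('sampledata', 'demographics', 'community');
```

Like the queries above, this only produces the statement text; it does not run it.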

Related

Is there a way to execute a query on a database schema instead of a table

Thanks for reading my post. In our organisation we use an IBM DB2 database with multiple schemas, each of which has its own tables, procedures, views, etc. We would like to find a quick way to query one of these schemas based on the 'changed_by' field, which exists in every table of the schema.
One of our users had write access to our database. We want an overview of exactly which tables he has updated in the past few days. It is too much work to query every table of the schema individually.
The schema name is S_ORDER_SUMM, the schema contains 182 tables.
Something like this is what we need:
select (ALL TABLES) from S_ORDER_SUMM
where CHANGED_BY = 'Our_User'
Any help would be highly appreciated.
SELECT
-- 'UNION ALL ' ||
'SELECT ''' || T.TABNAME || ''' FROM SYSIBM.SYSDUMMY1 '
||'WHERE EXISTS (SELECT 1 FROM "' || T.TABSCHEMA || '"."' || T.TABNAME || '" '
||'WHERE CHANGE_DATE > CURRENT TIMESTAMP - 2 DAY AND CHANGED_BY=''Our_User'')'
FROM SYSCAT.TABLES T
JOIN SYSCAT.COLUMNS C ON C.TABSCHEMA=T.TABSCHEMA AND C.TABNAME=T.TABNAME
WHERE T.TABSCHEMA='S_ORDER_SUMM' AND T.TYPE='T'
AND C.COLNAME IN ('CHANGE_DATE', 'CHANGED_BY')
GROUP BY T.TABSCHEMA, T.TABNAME
HAVING COUNT(1)=2;
The query above returns a list of SELECT statements on every table of schema S_ORDER_SUMM containing both CHANGE_DATE and CHANGED_BY columns.
It's a series of the following statements (one line per statement in reality, I've formatted it just for demo):
SELECT 'MYTABLE'
FROM SYSIBM.SYSDUMMY1
WHERE EXISTS
(
SELECT 1
FROM "S_ORDER_SUMM"."MYTABLE"
WHERE CHANGE_DATE > CURRENT TIMESTAMP - 2 DAY AND CHANGED_BY='Our_User'
)
If you save the output to some file, for example, you may run this script afterwards.
You may generate a single statement for all tables as well. But you need to uncomment the commented out line and wrap the output into a final SELECT statement manually.
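With the UNION ALL line uncommented, every generated line starts with UNION ALL, so the leading one has to be dropped from the first line by hand. A sketch of what the combined statement could look like for two hypothetical tables, ORDERS and ORDER_LINES (both names are invented for illustration):

```sql
SELECT 'ORDERS' FROM SYSIBM.SYSDUMMY1
WHERE EXISTS (SELECT 1 FROM "S_ORDER_SUMM"."ORDERS"
              WHERE CHANGE_DATE > CURRENT TIMESTAMP - 2 DAY
                AND CHANGED_BY = 'Our_User')
UNION ALL
SELECT 'ORDER_LINES' FROM SYSIBM.SYSDUMMY1
WHERE EXISTS (SELECT 1 FROM "S_ORDER_SUMM"."ORDER_LINES"
              WHERE CHANGE_DATE > CURRENT TIMESTAMP - 2 DAY
                AND CHANGED_BY = 'Our_User');
```

The result is one row per table that the user changed in the last two days.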

Improve query performance of a union search view when querying just one table

I have implemented a simple full-text search on my postgresql database using a view. E.g. something like
create view searches as
(
select id as searchable_id, 'Person' as searchable_type,
coalesce(last_name, '') || ' ' || coalesce(first_name, '') || ' ' || coalesce(organization_name, '') || ' ' || coalesce(comments, '') as term
from people
union
select id as searchable_id, 'Community' as searchable_type,
name || ' ' || coalesce(comments, '') as term
from communities
union
select id as searchable_id, 'Street' as searchable_type,
name || ' ' || coalesce(comments, '') as term
from streets
)
This is a simplified example: it contains a union of 14 tables.
This way I can easily query all "records" matching a certain search term. Nice. I can easily count them, and then I present the number of matches per "searchable_type" to the user.
The user then has the option to retrieve a specific kind, and that is where I bump into performance problems: querying the view always does a full scan over all the tables, even if I specify a single "table" (searchable_type).
So to give an indication:
select * from searches where term like '%something%' and searchable_type='Person'
takes about 5 seconds, and if I run the same query
select * from (
select id as searchable_id, 'Person' as searchable_type,
coalesce(last_name, '') || ' ' || coalesce(first_name, '') || ' ' || coalesce(organization_name, '') || ' ' || coalesce(comments, '') as term
from people) as searches
where term like '%something%'
it returns in about 40 ms.
So how can I solve this? I would want to use the view but with the performance of the single query. In other words: how can I avoid the duplication of defining the query twice? (once in the view and once separately).
Ideas to improve the speed:
Use a materialized view with indexes. It should soar, but the data is of course dynamic, so we would need refreshes, and I'm not sure how costly those are.
Use some kind of hint (does that exist?) so PostgreSQL knows it only needs to check one table.
Instead of defining a single big view, define smaller views and then union the selects of the per-table search views, so we have less duplication.
Does it actually make sense to use a view in that case? Why not build a query ad hoc every time from all the per-table queries? Is there any performance benefit in using a view over a big query?
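The third idea could be sketched roughly as follows, with one small view per table and a combined view built on top of them. The per-table view names are invented, and the sketch assumes the original searches view has been dropped first. It uses UNION ALL rather than the UNION of the original view. That is a deliberate substitution, on the assumption that rows coming from different tables never need de-duplicating against each other.

```sql
-- One view per searchable table (names are hypothetical).
create view person_searches as
select id as searchable_id, 'Person'::text as searchable_type,
       coalesce(last_name, '') || ' ' || coalesce(first_name, '') || ' '
       || coalesce(organization_name, '') || ' ' || coalesce(comments, '') as term
from people;

create view community_searches as
select id as searchable_id, 'Community'::text as searchable_type,
       name || ' ' || coalesce(comments, '') as term
from communities;

-- Combined view for searching everything at once.
create view searches as
select * from person_searches
union all
select * from community_searches;

-- Searching a single kind goes straight to the small view.
select * from person_searches where term like '%something%';
```

Each term expression is then defined only once. Whether the combined view performs well enough for the counting step is something to check with EXPLAIN ANALYZE on real data.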
If you used an integer instead of a string for "searchable_type", performance should improve considerably.
You can add a lookup table that maps each string ("Person", for example) to its integer id.

one query for many similar tables

I have an Oracle database with many tables that have identical structure (columns are all the same). The table names are similar also. The names of the tables are like table_1, table_2, table_3...
I know this isn't the most efficient design, but I don't have the option of changing this at this time.
In this case, is it possible to make a single sql query, to extract all rows with the same condition across multiple tables (hundreds of tables) without explicitly using the exact table name?
I realize I could use something like
select * from table_1 UNION select * from table_2 UNION select * from table_3...select * from table_1000
But is there a more elegant SQL statement that extracts from all matching table names into one result, without having to name each table explicitly?
Something like
select * from table_%
Is something like that possible? If not, what is the most efficient way to write this query?
You can use dbms_xmlgen to query tables using a pattern, which generates an XML document as a CLOB:
select dbms_xmlgen.getxml('select * from ' || table_name
|| ' where some_col like ''%Test%''') as xml_clob
from user_tables
where table_name like 'TABLE_%';
You said you wanted a condition, so I've included a dummy one, where some_col like '%Test%'.
You can then use XMLTable to extract the values back as relational data, converting the CLOB to XMLType on the way:
select x.*
from (
select xmltype(dbms_xmlgen.getxml('select * from ' || table_name
|| ' where some_col like ''%Test%''')) as xml
from user_tables
where table_name like 'TABLE_%'
) t
cross join xmltable('/ROWSET/ROW'
passing t.xml
columns id number path 'ID',
some_col varchar2(10) path 'SOME_COL'
) x;
SQL Fiddle demo which retrieves one matching row from each of two similar tables. Of course, this assumes your table names follow a useful pattern like table_%, but you suggest they do.
This is the only way I know to do something like this without resorting to PL/SQL (and having searched back a bit, was probably inspired by this answer to count multiple tables). Whether it's efficient (enough) is something you'd need to test with your data.
This is kind of messy and best performed in a middle-tier, but I suppose you could basically loop over the tables and use EXECUTE IMMEDIATE to do it.
Something like:
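begin
for t in (select table_name from all_tables where table_name like 'TABLE_%') loop -- dictionary names are upper case
execute immediate 'select blah from ' || t.table_name; -- placeholder query; add INTO (or open a cursor) to actually fetch rows
end loop;
end;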
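A fuller sketch of the loop idea, under some assumptions: every table has the id and some_col columns used in the XML example earlier in this thread, the tables live in the current schema, and printing through dbms_output is enough for a demonstration (it needs SET SERVEROUTPUT ON in SQL*Plus). The block builds one UNION ALL query across all matching tables and reads it through a ref cursor.

```sql
declare
    v_sql  clob;
    v_cur  sys_refcursor;
    v_id   number;
    v_col  varchar2(4000);
begin
    -- Build "select ... from TABLE_1 union all select ... from TABLE_2 ...".
    for t in (select table_name
                from user_tables
               where table_name like 'TABLE\_%' escape '\'
               order by table_name) loop
        v_sql := v_sql
              || case when v_sql is not null then ' union all ' end
              || 'select id, some_col from "' || t.table_name || '"'
              || ' where some_col like ''%Test%''';
    end loop;

    if v_sql is null then
        return;  -- no matching tables
    end if;

    -- Run the combined query and fetch the rows one by one.
    open v_cur for v_sql;
    loop
        fetch v_cur into v_id, v_col;
        exit when v_cur%notfound;
        dbms_output.put_line(v_id || ': ' || v_col);
    end loop;
    close v_cur;
end;
/
```

The escape clause makes the underscore in the pattern match literally, since an unescaped underscore in LIKE matches any single character.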
You can write "select * from table_1 union all select * from table_2 union all select * from table_3;", although that still names each table explicitly.

PostgreSQL: change order of columns in query

I have a huge query with about 30 columns.
I ordered the query with:
Select *
From
.
.
.
order by id,status
Now I want the result to present the columns in a certain order.
The id column should come first, followed by the status column and then all the rest.
Is there a way to do that (without manually specifying 30 column names in the select)? Something like: Select id, status, REST
This generates a statement that selects id and status first, followed by all the remaining columns:
SELECT 'SELECT o.id, o.status, ' || array_to_string(ARRAY(SELECT 'o' || '.' || c.column_name
FROM information_schema.columns AS c
WHERE c.table_name = 'officepark' -- replace officepark with your own table name
AND c.column_name NOT IN ('id', 'status')
ORDER BY c.ordinal_position
), ', ') || ' FROM officepark AS o' AS sqlstmt
The "select *" will return the fields in the order in which they were listed when the table was created. If you want them returned in a particular order, just be sure to create the table with that order.
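If you have to do it repeatedly, you could create a new table:
CREATE TABLE foo AS
SELECT id, status, mydesiredorder -- placeholder for the remaining columns, in the order you want
FROM mytable; -- placeholder for the source table
Or just create a view. If you create a new table, don't forget to move the indexes, constraints and foreign keys over. If you only need to do it once, specifying the 30 columns by hand would have been faster than asking here.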
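A minimal sketch of the view option, using an invented four-column table in place of the real 30-column one (items, items_ordered, name and price are all hypothetical names):

```sql
-- Hypothetical table whose columns were declared in an inconvenient order.
CREATE TABLE items (name text, price numeric, status text, id integer);

-- The view fixes the column order once.
CREATE VIEW items_ordered AS
SELECT id, status, name, price
FROM items;

-- "select *" on the view now returns id and status first.
SELECT * FROM items_ordered ORDER BY id, status;
```

The column list still has to be written out once, but only in the view definition rather than in every query.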

Oracle: "grep" across multiple columns?

I would like to perform a like or regexp across several columns. These columns contain tags, keywords, etc. Something that's the equivalent of:
sqlplus scott/tiger @myquery.sql | grep somestring
Right now I have something like
select * from foo where c1 || c2 || c3 like '%somestring%'
but I'm hoping I can get something a bit more general purpose and/or optimized. Any clues appreciated!
Have you thought about using the Concatenated Datastore feature of Oracle Text?
Oracle Text lets you create full-text indexes on multiple columns in the same table. Or at least there's a process by which you can do this.
There's a good example document on the Oracle site I've used before:
http://www.oracle.com/technology/sample_code/products/text/htdocs/concatenated_text_datastore/cdstore_readme.html
Oracle Text searches are ridiculously fast. I think I'd look at keeping separate context indexes on each individual column so that you could apply relevance and priority to each column match.
Let me know if you'd like an example and I'll add something to the answer.
Hope this helps.
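Since an example was offered, here is a minimal sketch. It uses Oracle Text's built-in MULTI_COLUMN_DATASTORE rather than the concatenated datastore sample linked above, which is a substitution made for brevity. The preference name foo_mcds and the index name foo_text_idx are invented; foo, c1, c2 and c3 come from the question. It assumes the user is allowed to execute ctx_ddl.

```sql
-- Tell Oracle Text to index c1, c2 and c3 together as one document.
begin
    ctx_ddl.create_preference('foo_mcds', 'MULTI_COLUMN_DATASTORE');
    ctx_ddl.set_attribute('foo_mcds', 'columns', 'c1, c2, c3');
end;
/

-- The index is created on one column but covers all three through the datastore.
create index foo_text_idx on foo (c1)
    indextype is ctxsys.context
    parameters ('datastore foo_mcds');

-- Search all three columns with a single predicate.
select * from foo where contains(c1, 'somestring') > 0;
```

Two caveats: CONTAINS matches indexed words rather than arbitrary substrings, so it is not a drop-in replacement for like '%somestring%', and a CONTEXT index has to be synchronized after the data changes unless it is created with a sync option.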
On 11G you could create a virtual column:
alter table foo add all_text varchar2(4000) generated always as (c1 ||','|| c2 ||','|| c3);
(See Oracle 11G new features).
Then query:
select * from foo where all_text like '%somestring%'
You could add an index on all_text if it helps performance too (see this answer for when it might help and when not).
Prior to 11G you could do the same thing but with a normal column, maintained via a trigger.
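The trigger-maintained version could look roughly like this. The trigger name foo_all_text_trg is invented, and the sketch assumes the concatenated value fits in 4000 bytes (otherwise the assignment raises an error).

```sql
-- A normal column instead of a virtual one.
alter table foo add all_text varchar2(4000);

-- Keep all_text in step with c1, c2 and c3 on every insert or update.
create or replace trigger foo_all_text_trg
before insert or update of c1, c2, c3 on foo
for each row
begin
    :new.all_text := :new.c1 || ',' || :new.c2 || ',' || :new.c3;
end;
/

-- Backfill the rows that existed before the trigger was created.
update foo set all_text = c1 || ',' || c2 || ',' || c3;
```

The query against all_text then stays the same as in the virtual column case.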
As a consultant I often have to search in poorly documented databases, and I need some handy scripts to find data. Here are two examples of how to generate a select statement that searches all 'VARCHAR2' columns in a table:
Example1. Search part of string:
SELECT 'SELECT * FROM ' || min(TABLE_NAME) ||' WHERE ' || LISTAGG(COLUMN_NAME, ' like ''%somestring%'' or ') WITHIN GROUP (ORDER BY COLUMN_ID) || ' like ''%somestring%'';'
from ALL_TAB_COLUMNS
WHERE OWNER = 'YOUR_SCHEMA_NAME' -- Uppercase
AND TABLE_NAME = 'YOUR_TABLE_NAME' --Uppercase
AND DATA_TYPE LIKE 'VARCHAR2';
Example2. Search the entire value:
SELECT 'SELECT * FROM ' || min(TABLE_NAME) ||' WHERE ''somestring'' in (' || LISTAGG(COLUMN_NAME, ', ') WITHIN GROUP (ORDER BY COLUMN_ID) || ');'
from ALL_TAB_COLUMNS
WHERE OWNER = 'YOUR_SCHEMA_NAME' -- Uppercase
AND TABLE_NAME = 'YOUR_TABLE_NAME' --Uppercase
AND DATA_TYPE LIKE 'VARCHAR2';
Does regexp_like help?
http://www.psoug.org/reference/regexp.html
SELECT * FROM table WHERE REGEXP_LIKE(col1, <pattern>)
union
SELECT * FROM table WHERE REGEXP_LIKE(col2, <pattern>)
union
SELECT * FROM table WHERE REGEXP_LIKE(col3, <pattern>)
This should work, but I doubt it would perform any better than your query. You might want to compare the performance of both. I would really love to hear your findings. :-)
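An alternative shape for the same idea, sketched against the foo table and the c1, c2, c3 columns from the question: the three conditions are combined with OR in one statement instead of three unioned selects. The 'i' match parameter makes the match case-insensitive, which is an assumption about what is wanted.

```sql
select *
from foo
where regexp_like(c1, 'somestring', 'i')
   or regexp_like(c2, 'somestring', 'i')
   or regexp_like(c3, 'somestring', 'i');
```

Unlike the UNION form, this does not remove duplicate rows from the result. Whether it is actually faster would need to be measured on real data.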