Distinct Join SQL with n number of Databases

This may seem silly, but I'm still working through SQL and I've hit the limit of what I can write on my own.
Here is the scenario: I have many SQL Servers, each with multiple databases. I need to search every database, return results from a particular table wherever it exists, and then join those results together with a secondary database.
I also need the name of the database each of these tables lives in.
Query #1, which returns the data that is important:
SELECT DISTINCT
    datafeed_id AS 'Datafeed ID',
    MAX(start_time) AS 'Start Time',
    MAX(end_time) AS 'End Time',
    MAX(status_id) AS 'Status',
    MAX(DATEDIFF(mi, [start_time], [end_time])) AS 'Run Time'
FROM
    dbo.tblDataFeedHistory
GROUP BY
    datafeed_id
Query #2, which returns the supporting information I need. Note that I was calling each database out individually here, but there could be n databases:
CREATE OR ALTER VIEW [vAllDatabasesDataFeedResult]
AS
SELECT
    Name = 'Instance',
    [datafeed_name], [status], [is_active]
FROM
    Instance.dbo.tblDataFeed
UNION ALL
SELECT
    Name = 'Instance2',
    [datafeed_name], [status], [is_active]
FROM
    Instance2.dbo.tblDataFeed;
What I'd like is output from ALL databases on the server IF they contain the table dbo.tblDataFeed:
| Database Name | Datafeed_name | Is_active | Status | Start_time | End_Time | Status_id | Runtime |
I'd prefer that it return the results directly rather than creating a new view.
Thanks in advance

You can use sp_MSForEachDB.
For example: say I have ten databases, two of which have my recurring "LogEntry" table. If I wanted to get the count of rows per database, I could use the query below.
NOTE: I have to check for "if exists" on the table, or the query will fail on databases without it.
Instead of a SELECT, I could also push my counts to an aggregate database/table/column(s).
You can do whatever you need; the key is the sp_MSForEachDB call.
EXECUTE sp_MSForEachDB
'USE ?;
IF EXISTS (SELECT *
           FROM INFORMATION_SCHEMA.TABLES
           WHERE TABLE_SCHEMA = ''dbo''
             AND TABLE_NAME = ''LogEntry'')
BEGIN
    SELECT DB_NAME() AS DBName, COUNT(*) AS RowCnt FROM dbo.LogEntry
END'
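Applied to your case, a minimal sketch might look like the following. It assumes every database that contains dbo.tblDataFeed also contains dbo.tblDataFeedHistory, and that the two join on datafeed_id; the column types in the temp table are guesses, so adjust them to your schema.
CREATE TABLE #Results
(
    DatabaseName  sysname,
    datafeed_name nvarchar(255),
    is_active     bit,
    [status]      nvarchar(50),
    start_time    datetime,
    end_time      datetime,
    status_id     int,
    runtime_min   int
);

EXECUTE sp_MSForEachDB
'USE ?;
IF EXISTS (SELECT *
           FROM INFORMATION_SCHEMA.TABLES
           WHERE TABLE_SCHEMA = ''dbo''
             AND TABLE_NAME = ''tblDataFeed'')
BEGIN
    INSERT INTO #Results
    SELECT DB_NAME(),
           f.datafeed_name,
           f.is_active,
           f.[status],
           MAX(h.start_time),
           MAX(h.end_time),
           MAX(h.status_id),
           MAX(DATEDIFF(mi, h.start_time, h.end_time))
    FROM dbo.tblDataFeed AS f
    LEFT JOIN dbo.tblDataFeedHistory AS h
           ON h.datafeed_id = f.datafeed_id   -- assumed join column
    GROUP BY f.datafeed_name, f.is_active, f.[status];
END';

SELECT * FROM #Results;
DROP TABLE #Results;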

Related

Is there a way I can create a SQL view of multiple tables based on a query of the table schema?

I have a SQL database with multiple tables, maintained by other people, which I would like to combine into a view. The trouble is that the number of tables keeps expanding! The columns and character lengths are the same across all of them.
I can get as far as creating a list of the tables by using
SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME LIKE 'Table%'.
At the moment I have a union all query like the below
SELECT * FROM table1
UNION ALL
SELECT * FROM table2;
but the table list keeps growing. Is there any way I can create something to loop through the tables? Something like
SELECT * FROM (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME LIKE 'Table%') UNION ALL
I know that wouldn't work, but I'm hoping there's some sort of trick to get it to go all the way! This is on SQL Server 2012 if that helps.
Thanks
declare @s nvarchar(max);
set @s = '';

select
    @s = @s + '(select * from [' + s.name + '])' +
         case when ROW_NUMBER() over (order by s.name) != count(*) over ()
              then ' UNION ALL '
              else ''
         end
from
    sys.tables as s
where
    s.name like 't%';

print @s;
There is a bit more to think about: you need to make sure the field counts are the same beforehand, and to be on the safe side it may be best to avoid SELECT * and list the field names you require. This produces the SQL statement as text, which you can then run, for example from a stored procedure, as detailed in the comments. Spend some time error-trapping this and doing the necessary checks for continuity in the field names, as mentioned.
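If you want the results directly rather than just the printed statement, a minimal sketch (assuming @s was built as above and all matched tables share the same column layout):
if len(@s) > 0
    exec sp_executesql @s;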

How to get a list of tables used for each query in the query history in Snowflake

I am doing analysis on table usage. I need to use the query history table and extract all tables that were used in a query. The query itself could use 10 separate tables, or even use the same table multiple times. I need to be able to parse through the query (using a query itself, not Python or any other language, just SQL) and produce a list of all tables that the query hit.
Is there any way to do this?
An example of where I am getting the histories would be this query:
select query_text
from table(information_schema.query_history())
An alternative approach uses rlike and information_schema.tables.
You could extend this further by looking at the number of rows per table (high = fact, low = dimension) and the number of times each table is accessed.
select query_text, array_agg(DISTINCT TABLE_NAME::string)
from
    (select top 100 query_text
     from table(information_schema.query_history())
     where EXECUTION_STATUS = 'SUCCESS') a
left outer join
    (select TABLE_NAME from INFORMATION_SCHEMA.TABLES group by TABLE_NAME) b
on
    upper(a.query_text) rlike '.*('||upper(b.table_name)||').*'
group by
    query_text
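As a quick sketch of the rows-per-table idea above (ROW_COUNT is a documented column of INFORMATION_SCHEMA.TABLES; the one-million-row cutoff is purely illustrative):
select TABLE_NAME,
       max(ROW_COUNT) as row_count,
       iff(max(ROW_COUNT) > 1000000, 'likely fact', 'likely dimension') as likely_role
from INFORMATION_SCHEMA.TABLES
where TABLE_TYPE = 'BASE TABLE'
group by TABLE_NAME;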
Extended Version:
I noticed there are some issues with the answer above. First, it does not let you run the explain plan on more than one query at a time. Second, if the query hit the cache, it fails to return any objects.
So I'm extending my initial answer as follows.
Create a couple of views that read all the databases and provide a central authority on all tables/views/objects/query histories.
Run the generated SQL, which creates the two views. It again uses rlike, but substitutes database and schema names from the query history when they are not present.
I've added credits and elapsed time to the two views for further extensions.
You can validate this yourself by checking the explain plan as above; if you don't see identical tables, check the SQL and you'll most likely find the cache was used.
It would be great to hear if anyone finds this useful.
Step 1: Create the two views:
show databases;
select RTRIM( 'create or replace view bob as ( '||listagg('select CONCAT_WS(\'.\',TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME) table_name_3,CONCAT_WS(\'.\',TABLE_SCHEMA, TABLE_NAME) table_name_2,TABLE_NAME, ROW_COUNT, BYTES from ' ||"name" ||'.INFORMATION_SCHEMA.TABLES union all ') ,' union all')||')' tabs,
RTRIM( 'create or replace view bobby as ( '||listagg('select QUERY_ID, query_text ,DATABASE_NAME, SCHEMA_NAME, CREDITS_USED_CLOUD_SERVICES , TOTAL_ELAPSED_TIME from table( '||"name" ||'.information_schema.query_history()) where EXECUTION_STATUS = \'SUCCESS\' union all ') ,' union all')||')' tabs2
from table(result_scan( LAST_QUERY_ID()));
Step 2: Once the BOB and BOBBY views exist, run this SQL:
select
    QUERY_TEXT,
    QUERY_ID,
    CREDITS_USED,
    TOTAL_ELAPSED,
    array_agg(TABLE_NAME_3) tables_used
from
(
    select
        QUERY_TEXT,
        QUERY_ID,
        TABLE_NAME,
        rlike((a.query_text), '.*(\\s.|\\.){1}('||(bob.TABLE_NAME)||'(\\s.*|$))', 'is') aa,
        rlike((a.query_text), '.*(\\s.|\\.){1}('||(bob.TABLE_NAME_2)||'(\\s.*|$))', 'is') bb,
        rlike((a.query_text), '.*(\\s.){1}('||(bob.TABLE_NAME_3)||'(\\s.*|$))', 'is') cc,
        bob.TABLE_NAME_3,
        count(1) cnt,
        max(CREDITS_USED_CLOUD_SERVICES) CREDITS_USED,
        max(TOTAL_ELAPSED_TIME) TOTAL_ELAPSED
    from
        BOBBY a
    left outer join
        BOB
    on
        rlike((a.query_text), '.*(\\s.|\\.){1}('||(bob.TABLE_NAME)||'(\\s.*|$))', 'is')
        or rlike((a.query_text), '.*(\\s.|\\.){1}('||(bob.TABLE_NAME_2)||'(\\s.*|$))', 'is')
        or rlike((a.query_text), '.*(\\s.|\\.){1}('||(bob.TABLE_NAME_3)||'(\\s.*|$))', 'is')
    where
        TABLE_NAME is not null
        and ( cc
              or iff(bb, upper(DATABASE_NAME||'.'||TABLE_NAME) = bob.TABLE_NAME_3, false)
              or iff(aa, upper(DATABASE_NAME||'.'||SCHEMA_NAME||'.'||TABLE_NAME) = bob.TABLE_NAME_3, false)
            )
    group by
        1,2,3,4,5,6,7
)
group by
    1,2,3,4;
The ACCESS_HISTORY view will tell you what tables are used in a query:
https://docs.snowflake.com/en/sql-reference/account-usage/access_history.html
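For example, a minimal sketch that flattens the documented DIRECT_OBJECTS_ACCESSED array into one row per table per query (note the account-usage view can lag by a few hours):
select ah.query_id,
       f.value:"objectName"::string as table_name
from snowflake.account_usage.access_history ah,
     lateral flatten(input => ah.direct_objects_accessed) f;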
ACCESS_HISTORY is an enterprise-level feature, though. You could also run EXPLAIN on any query:
SELECT *, "objects"
FROM TABLE(EXPLAIN_JSON(SYSTEM$EXPLAIN_PLAN_JSON('SELECT * FROM a.b.any_table_or_view')))
WHERE "operation"='TableScan'
See more at https://stackoverflow.com/a/64343564/132438

Can you force SQL Server to send the WHERE clause to Linked Server?

I'm trying to determine if a table in my SQL Server 2012 database has any records that don't exist in a table that's on a linked Oracle 11g database.
I tried to do this with the following:
select 1
from my_order_table ord
where not exists (select 1
from LINK_ORA..[SCHEMA1].[ORDERS]
where doc_id = ord.document_id)
and document_id = 'N2324JKL3511'
The issue is that it never completes, because the ORDERS table on the linked server has about 100 million rows and, per the explain plan on SQL Server, it is trying to pull the entire ORDERS table back from the linked server and then apply the WHERE clause.
Per the explain plan, it treats the remote table as having an estimated 10,000 rows; I assume that's some kind of default used when it can't get statistics?
Even running something as simple as this:
select 1 from LINK_ORA..[SCHEMA1].[ORDERS] where doc_id = 'N2324JKL3511'
causes SQL Server not to send the WHERE clause, and the query never completes.
I tried to use OPENQUERY; however, it won't let me concatenate the doc_id into the WHERE clause of the query string.
Then I tried to build a SELECT ... FROM OPENQUERY string in a function, but I can't use sp_executesql in a function to run it.
Any help is greatly appreciated.
I think this would logically work for you, but it may take too long as well.
SELECT sql_ord.*
FROM my_order_table sql_ord
LEFT JOIN LINK_ORA..[SCHEMA1].[ORDERS] ora_ord ON sql_ord.document_id = ora_ord.doc_id
WHERE sql_ord.document_id = 'N2324JKL3511'
AND ora_ord.doc_id IS NULL
Since you have a problem with something as simple as select 1 from LINK_ORA..[SCHEMA1].[ORDERS] where doc_id = 'N2324JKL3511', have you tried creating a table on the remote server to hold the doc_id you want to look at? That way your SELECT involves a remote table containing only one row. I'm just not sure about the INSERT, since I can't test it for now; I'm assuming everything will be done on the remote server.
So, something like:
CREATE TABLE LINK_ORA..[SCHEMA1].linked_server_doc_id (
doc_id nvarchar(12));
INSERT INTO LINK_ORA..[SCHEMA1].linked_server_doc_id (doc_id)
SELECT doc_id
FROM LINK_ORA..[SCHEMA1].[ORDERS] WHERE doc_id = 'N2324JKL3511';
select 1
from my_order_table ord
where not exists (select 1
from LINK_ORA..[SCHEMA1].[linked_server_doc_id]
where doc_id = ord.document_id)
and document_id = 'N2324JKL3511';
DROP TABLE LINK_ORA..[SCHEMA1].linked_server_doc_id
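Another option worth sketching: EXEC ... AT sends the statement text to the linked server verbatim, so the filter runs on the Oracle side. It requires the linked server's RPC Out option to be enabled, and the ? placeholder carries the parameter, so no string concatenation is needed:
EXECUTE ('SELECT 1 FROM SCHEMA1.ORDERS WHERE doc_id = ?',
         'N2324JKL3511') AT LINK_ORA;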

Find the tables affected by user error & reverse the mistake

I'm working on an Oracle database where a user made an error. The issue is that a number of person records were moved into a different "round". Each round has "episodes"; the wrong "round" means all the episode processing has been affected (episodes skipped over). These users won't receive the mails they were supposed to receive as a result of the missed "episodes".
I have a query put together that identifies all the records that were mistakenly updated. I need a way to modify the query to help find all tables that have been wrongly moved into "round 2".
(All the tables I need to identify are ones featuring the "round_no" value.)
EDIT: There are 70+ tables with a "ROUND_NO" column! I need to identify only the ones in which these person records are found.
I also need to take this data and return it to round 1 from the incorrect round 2.
Here is the query that identifies persons that have been "skipped" into round 2 in error:
SELECT p.person_id
     , p.name
     , ep2.open_date
     , ( SELECT ep1.open_date
           FROM Person_ep ep1
          WHERE ep1.person_id = ep2.person_id
            AND ep1.round_no = 1
       ) r1epiopen /* Round 1 episode open date */
  FROM person p
  JOIN region r
    ON r.region_code = p.region_code
   AND r.location_id = 50
  JOIN Person_ep ep2
    ON ep2.person_id = p.person_id
   AND ep2.round_no = 2
 ORDER BY p.person_id
Using SQL Developer 3.2.20.09 on an Oracle 11g RDBMS.
Sorry to see this post that late... Hope it's not too late...
I suppose you are using Oracle 10+, and that you know approximately the hour of the crime (!).
I see two possibilities:
1) Use Log Miner to review the executed SQL: http://docs.oracle.com/cd/B19306_01/server.102/b14215/logminer.htm
2) Use a flashback query to review the data of a table in the past. But for this one you need to test it on every suspected table (70+) :( http://docs.oracle.com/cd/E11882_01/appdev.112/e41502/adfns_flashback.htm#ADFNS01001
On a suspected table you could run this kind of SQL to see if an update occurred in the timeframe:
SELECT versions_startscn, versions_starttime,
versions_endscn, versions_endtime,
versions_xid, versions_operation,
description
FROM my_table
VERSIONS BETWEEN TIMESTAMP TO_TIMESTAMP('2014-01-29 14:59:08', 'YYYY-MM-DD HH24:MI:SS')
AND TO_TIMESTAMP('2014-01-29 14:59:36', 'YYYY-MM-DD HH24:MI:SS')
WHERE id = 1;
I have no practical experience using Log Miner, but I think it would be the best solution, especially if you have archive logging activated :D
You can access the data values of an affected table before the update (if you know the time of the update) using a query like this one:
SELECT COUNT(*) FROM myTable AS OF TIMESTAMP TO_TIMESTAMP('2014-01-29 13:34:12', 'YYYY-MM-DD HH24:MI:SS');
Of course, the data will only be available if it is still there (retention in the undo tablespace).
You could then create a temp table with the data from before the update:
create table tempTableA as SELECT * FROM myTable AS OF TIMESTAMP TO_TIMESTAMP('2014-01-29 13:34:12', 'YYYY-MM-DD HH24:MI:SS');
Then update your table with the values coming from tempTableA.
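As a sketch of that final step (assuming person_id identifies the rows, round_no is the damaged column, and tempTableA holds the pre-error image):
UPDATE myTable t
   SET t.round_no = 1
 WHERE t.round_no = 2
   AND EXISTS (SELECT 1
                 FROM tempTableA a
                WHERE a.person_id = t.person_id
                  AND a.round_no = 1);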
If you want to find all tables with a "round_no" column, you should probably use this query (the data dictionary stores column names in uppercase):
select table_name from all_tab_columns where column_name = 'ROUND_NO';
and if you want only the tables you can update:
SELECT table_name
FROM user_tab_columns c, user_tables t
WHERE c.column_name = 'ROUND_NO'
AND t.table_name = c.table_name;
should work
or for the purists
SELECT table_name
FROM user_tab_columns c
JOIN user_tables t ON t.table_name = c.table_name
WHERE c.column_name = 'ROUND_NO';
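To narrow the 70+ candidates down to the tables that actually contain the affected person records, you could generate a probe per table. A sketch, where affected_persons is a hypothetical staging table holding the person_ids from the query above, and each candidate table is assumed to also have a PERSON_ID column (strip the trailing UNION ALL before running the generated output):
SELECT 'SELECT ''' || c.table_name || ''' AS table_name, COUNT(*) AS hits '
       || 'FROM ' || c.table_name
       || ' WHERE round_no = 2 AND person_id IN (SELECT person_id FROM affected_persons)'
       || ' UNION ALL'
  FROM user_tab_columns c
 WHERE c.column_name = 'ROUND_NO';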

OPENQUERY giving different results

I have 2 similar queries
select *
from openquery(powerschool,
'select *
from TEACHERS
where teachernumber is not null
and schoolid=''1050''
and teacherloginid is not null
order by teachernumber')
and
SELECT *
from openquery(powerschool,
'SELECT NVL(teachernumber,'''')
from TEACHERS
where teachernumber is not null
and schoolid=''1050''
and teacherloginid is not null
order by teachernumber')
The first one is giving me 182 rows while the second one gives me 83.
What's wrong with the queries?
The second query would never return a null for the teachers table because of the NVL(), so it could return more records depending on the data.
Basically, the "and teacherloginid is not null" never gets hit, because you replace the nulls with ''.
Just thoughts...
Same server? That is, is the linked server different in target or credentials, so you are reading a different "TEACHERS" table?
What does running both linked SQL statements directly on the linked server (not locally) give you?
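For example, a quick sketch that pushes just the count through the linked server, to compare against a count run directly on the source:
select * from openquery(powerschool,
    'select count(*) as cnt
     from TEACHERS
     where teachernumber is not null
       and schoolid=''1050''
       and teacherloginid is not null');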