SQL server find dependencies between tables that are not declared

I am using SQL Server 2008 R2.
Recently I got a database that contains the live data of a web-application.
While reviewing it, I found many tables with dependencies that are implied but not declared.
For example:
TableA has columns [Id], [Name], [Address]. Here [Id] is the primary key.
TableB has columns [Id], [TableAId], [Salary]. Here [Id] is the primary key, and column [TableAId] contains only values of [TableA].[Id] (no value other than a TableA Id), but it is not declared as a foreign key.
By reviewing the code, I found that the records of both tables are inserted in the same event, so the [TableB].[TableAId] column only ever holds values that exist in [TableA].[Id].
Now I want to find the other dependencies like this one.
Is that possible using a SQL Server query, a tool, or third-party software?

In the general case, I don't think you can count on a column named TableAId to imply a foreign key reference to TableA. It might be referring to a different table that also stores id numbers from table A. I know you've reviewed the code in this specific case, but you're looking for a solution that doesn't require reviewing the code.
Anyway . . .
You can join tables on expressions. This query (PostgreSQL syntax) joins part of the column name to table names. In PostgreSQL, the function call left(t1.column_name, -2) returns all but the last two characters of t1.column_name; left(t1.column_name, -3) returns all but the last three. That's intended to match names like "TableAid" and "TableA_id".
select t1.table_catalog, t1.table_schema, t1.table_name,
t1.column_name, left(t1.column_name, -2),
t2.table_catalog, t2.table_schema, t2.table_name
from information_schema.columns t1
inner join information_schema.tables t2
on left(t1.column_name, -2) = t2.table_name or
left(t1.column_name, -3) = t2.table_name
where t1.column_name like '%id';
I believe this query returns the rows the first one is intended to find. It uses SQL Server syntax (len() rather than length(), and the join appends the suffix to the table name so that left() never receives a negative length for a column named just "id"), but I haven't tested it in SQL Server.
select t1.table_catalog, t1.table_schema, t1.table_name,
t1.column_name, left(t1.column_name, len(t1.column_name) - 2),
t2.table_catalog, t2.table_schema, t2.table_name
from information_schema.columns t1
inner join information_schema.tables t2
on t1.column_name = t2.table_name + 'id' or
t1.column_name = t2.table_name + '_id'
where t1.column_name like '%id';
Both these might return rows incorrectly, mainly because joins probably need to take into account the "table_catalog" column at the very least. I debated whether to include it. I finally decided to leave it out. I think that, if I were in your shoes, I'd want this query to have the maximum chance of returning some surprising rows.
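The name-matching idea above can also be tried outside the database. This is only a minimal sketch in Python, assuming you have already exported the (table, column) pairs and the table names; the function name guess_references and the sample data are hypothetical, and a match is a guess to be verified against the data, not proof of a relationship.

```python
def guess_references(columns, tables):
    """Pair each column with a table whose name equals the column name minus 'id' or '_id'.

    columns: iterable of (table_name, column_name); tables: iterable of table names.
    Matching is case-insensitive. A column named just 'id' is ignored.
    """
    by_lower = {t.lower(): t for t in tables}
    guesses = []
    for owner, column in columns:
        name = column.lower()
        for suffix in ("_id", "id"):
            # Require at least one character in front of the suffix.
            if name.endswith(suffix) and len(name) > len(suffix):
                target = by_lower.get(name[: -len(suffix)])
                if target is not None:
                    guesses.append((owner, column, target))
                    break
    return guesses


# Hypothetical catalog rows modelled on the question's TableA / TableB.
columns = [("TableA", "Id"), ("TableB", "Id"), ("TableB", "TableAId"), ("TableB", "Salary")]
print(guess_references(columns, ["TableA", "TableB"]))
# [('TableB', 'TableAId', 'TableA')]
```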

Try some dependency checks.
--- Get the source objects, columns and dependent objects using the data.
select so.name as sourceObj
, so.type as sourceType
, c.name as colname
, st.name as coltype
, u.name as DependentObj
, d.selall as is_select_all
, d.resultobj as is_updated
, d.readobj as is_read
--, d.*
from sys.columns c
----- object that owns the column
inner join sys.objects so on so.object_id = c.object_id
inner join sys.types st on c.system_type_id = st.system_type_id
----- holds dependencies
inner join sysdepends d on d.id = c.object_id
----- object that uses the column
inner join sys.objects u on u.object_id = d.depid
You can list all the tables, views, procs, etc. The bits that are really useful for your case are:
, d.selall as is_select_all
, d.resultobj as is_updated
, d.readobj as is_read
If any of these fields is 1, the object is referenced through SELECT *, updated, or read, respectively.
I hope this may help a bit.
Enjoy

Related

How to get the min value of a column after selecting it. Cannot perform an aggregate function on a column

My error is in this part: (select min(sc.name) from so.name). How can I solve it?
In the select I am getting the table and column names; at the same time I want to get the minimum value of that column from that table. Is that possible?
select so.name table_name , sc.name Column_name,(select min(sc.name) from so.name )
from sysindexes si, syscolumns sc, sysobjects so
where si.indid < 2 -- 0 = if a table. 1 = if a clustered index on an allpages-locked table. >1 = if a nonclustered index or a clustered index on a data-only-locked table.
and so.type = 'U' --U – user table
and sc.status & 128 = 128 --(value 128) – indicates an identity column.
and so.id = sc.id
and so.id = si.id

The problem is that you are basically trying to write dynamic code: selecting from a table whose name comes out of a system table.
SQL Server doesn't know that the so.name you are referencing is a table (furthermore, sysobjects also contains procedures and functions); in a FROM clause it is read as an object name, not as a column value.
Rather than that, you should do an inner join between sys.columns and sys.tables on object_id.
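Because a table name held in a column cannot be used directly in a FROM clause, the per-table minimum has to come from generated statements. The following is a rough Python sketch of that generation step, assuming the (table, column) pairs were already fetched from the catalog; min_value_statements and the sample names are hypothetical, and the generated text would still have to be executed against the server.

```python
def min_value_statements(identity_columns):
    """Return one 'SELECT MIN(...)' statement per (table, column) pair.

    Names are wrapped in square brackets, with any ']' doubled, so that
    they are treated as identifiers rather than as arbitrary SQL text.
    """
    def quote(name):
        return "[" + name.replace("]", "]]") + "]"

    return [
        f"SELECT MIN({quote(column)}) AS min_value FROM {quote(table)};"
        for table, column in identity_columns
    ]


# Hypothetical identity columns, as the catalog query might return them.
for statement in min_value_statements([("Orders", "OrderId"), ("Odd]Name", "Id")]):
    print(statement)
```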

How do i determine the related column between tables?

I have two tables. I need to find the column(s) that relate them. There are no foreign keys. The primary key for 'CUSTTABLE' is 'ACCOUNTNUM' and for 'DIRPARTYTABLE' is 'RECID'. The related columns are 'CUSTTABLE.PARTY' and 'DIRPARTYTABLE.RECID'. I need this to return 'CUSTTABLE.ACCOUNTNUM', 'DIRPARTYTABLE.NAME', 'DIRPARTYTABLE.PARTYNUMBER' on the same line. The code I have returns column names that match. However, in my case the names are often not the same.
Is there a similar way to return the columns with matching values, not names? Even if it returns multiple columns, that would help. In reality, these tables have 100+ columns and I have many tables for which to determine the same type of relationship.
CUSTTABLE:
ACCOUNTNUM  CUSTGROUP  PARTY
305342      LTL        5637459693
305343      LTL        5637468513
305345      LTL        5637472531
305398      LTL        5637468514
305405      LTL        5637468515
DIRPARTYTABLE:
NAME              PARTYNUMBER  RECID
ZEP MFG           1500121      5637459693
TABER EXTRUSIONS  1500122      5637459694
LAWSON PRODUCTS   1500123      5637459695
KIMRAY            1500124      5637459696
ANCHOR PAINT MFG  1500125      5637459697
RESULT:
ACCOUNTNUM  NAME     PARTYNUMBER
305342      ZEP MFG  1500121
Query:
select A.COLUMN_NAME
from INFORMATION_SCHEMA.COLUMNS A
join INFORMATION_SCHEMA.COLUMNS B
on A.COLUMN_NAME = B.COLUMN_NAME
where A.TABLE_NAME = 'CUSTTABLE'
and B.TABLE_NAME = 'DIRPARTYTABLE'

It seems you are working on a D365 database. In that case, DIRPARTYTABLE usually contains all parties (employees, customers, vendors, etc.), distinguished by DIRPARTYTABLE.INSTANCERELATIONTYPE. For customers, CUSTTABLE.PARTY = DIRPARTYTABLE.RECID with INSTANCERELATIONTYPE = 1155 or 5767. This is standard information; you may want to verify that INSTANCERELATIONTYPE matches your setup.
Regarding relationship identification, I suggest going through the form properties and queries in the D365 application itself. Otherwise it is difficult to predict a relationship without verifying the actual data, unless you have a metadata document available, because sometimes one column alone cannot be relied on as the key. For example, VENDOR.ACCOUNTID is not a key column unless you also consider VENDOR.ENTITYID or VENDOR.DATAAREAID.
However, the following query might be helpful as a starting point. It finds columns with similar names across tables, excluding tables that have no rows.
with CTE as
(
SELECT DB_ID () DatabaseID,
t.Name AS TableName,
p.rows AS RowCounts,
SUM(a.total_pages) * 8 AS TblTotalSizeKB,
SUM(a.used_pages) * 8 AS UsedSpaceKB
-- CAST(ROUND(((SUM(a.used_pages) * 8) / 1024.00), 2) AS NUMERIC(36, 2)) AS UsedSpaceMB,
FROM
sys.tables as t
INNER JOIN
sys.indexes i ON t.Object_ID = i.object_id
INNER JOIN
sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
INNER JOIN
sys.allocation_units a ON p.partition_id = a.container_id
--LEFT OUTER JOIN
-- sys.schemas s ON t.schema_id = s.schema_id
WHERE t.OBJECT_ID > 255 and p.rows > 0 --and T.name like '%WORKFLOWTRACKINGSTATUSTABLE%' -----------Enter TABLE name (search key word) here
GROUP BY T.name, p.Rows
)
select C.COLUMN_NAME,DATA_TYPE, CTE.*
from CTE
JOIN INFORMATION_SCHEMA.COLUMNS AS C on CTE.TableName = C.TABLE_NAME
WHERE COLUMN_NAME LIKE '%Hire%' -----------Enter COLUMN name (search key word) here
ORDER BY TABLE_NAME
GO
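For the "matching values, not names" part of the question, one brute-force approach is to compare every column pair and count shared values. Below is a minimal sketch in Python, with an in-memory SQLite database standing in for SQL Server purely so that it runs anywhere; overlap_report is a hypothetical helper, and on SQL Server the column names would come from INFORMATION_SCHEMA.COLUMNS rather than PRAGMA table_info. The work grows with the product of the column counts, so on 100+ column tables you would likely restrict it to columns of compatible types.

```python
import sqlite3


def overlap_report(conn, left_table, right_table):
    """For every column pair, count distinct left values that also occur in the right column."""
    def column_names(table):
        # Field 1 of each PRAGMA table_info row is the column name.
        return [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]

    report = []
    for lcol in column_names(left_table):
        for rcol in column_names(right_table):
            sql = (
                f'SELECT COUNT(DISTINCT l."{lcol}") FROM "{left_table}" AS l '
                f'WHERE l."{lcol}" IN (SELECT r."{rcol}" FROM "{right_table}" AS r)'
            )
            matches = conn.execute(sql).fetchone()[0]
            if matches:
                report.append((lcol, rcol, matches))
    # Pairs with the most shared values first.
    return sorted(report, key=lambda item: -item[2])


# A few of the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTTABLE (ACCOUNTNUM TEXT, CUSTGROUP TEXT, PARTY INTEGER);
CREATE TABLE DIRPARTYTABLE (NAME TEXT, PARTYNUMBER TEXT, RECID INTEGER);
INSERT INTO CUSTTABLE VALUES
  ('305342', 'LTL', 5637459693), ('305343', 'LTL', 5637468513), ('305345', 'LTL', 5637472531);
INSERT INTO DIRPARTYTABLE VALUES
  ('ZEP MFG', '1500121', 5637459693), ('TABER EXTRUSIONS', '1500122', 5637459694);
""")
print(overlap_report(conn, "CUSTTABLE", "DIRPARTYTABLE"))
# [('PARTY', 'RECID', 1)]
```

A high count is only a hint: small integer columns can overlap by coincidence, so candidates still need to be checked against the application.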

How to determine if a specific set of tables in a database are empty

I have database A which contains a table (CoreTables) that stores a list of active tables within database B that the organization's users are sending data to.
I would like to be able to have a set-based query that can output a list of only those tables within CoreTables that are populated with data.
Dynamically, I normally would do something like:
For each row in CoreTables
    Get the table name
    If table is empty
        Do nothing
    Else
        Print table name
Is there a way to do this without a cursor or other dynamic methods? Thanks for any assistance...

Probably the most efficient option is:
SELECT c.name
FROM dbo.CoreTables AS c
WHERE EXISTS
(
SELECT 1
FROM sys.partitions
WHERE index_id IN (0,1)
AND rows > 0
AND [object_id] = OBJECT_ID(c.name)
);
Just note that the counts in sys.sysindexes, sys.partitions and sys.dm_db_partition_stats are not guaranteed to be completely in sync, due to in-flight transactions.
While you could just run this query in the context of the database, you could do this for a different database as follows (again assuming that CoreTables does not include schema in the name):
SELECT c.name
FROM DatabaseA.dbo.CoreTables AS c
WHERE EXISTS
(
SELECT 1
FROM DatabaseB.sys.partitions AS p
INNER JOIN DatabaseB.sys.tables AS t
ON p.[object_id] = t.object_id
WHERE t.name = c.name
AND p.rows > 0
);
If you need to do this for multiple databases that all contain the same schema (or at least overlapping schema that you're capturing in aggregate in a central CoreTables table), you might want to construct a view, such as:
CREATE VIEW dbo.CoreTableCounts
AS
SELECT db = 'DatabaseB', t.name, rows = MAX(p.rows)
FROM DatabaseB.sys.partitions AS p
INNER JOIN DatabaseB.sys.tables AS t
ON p.[object_id] = t.[object_id]
INNER JOIN DatabaseA.dbo.CoreTables AS ct
ON t.name = ct.name
WHERE p.index_id IN (0,1)
GROUP BY t.name
UNION ALL
SELECT db = 'DatabaseC', t.name, rows = MAX(p.rows)
FROM DatabaseC.sys.partitions AS p
INNER JOIN DatabaseC.sys.tables AS t
ON p.[object_id] = t.[object_id]
INNER JOIN DatabaseA.dbo.CoreTables AS ct
ON t.name = ct.name
WHERE p.index_id IN (0,1)
GROUP BY t.name
-- ...
GO
Now your query isn't going to be quite as efficient, but it doesn't need to hard-code database names as object prefixes. Instead it can be:
SELECT name
FROM dbo.CoreTableCounts
WHERE db = 'DatabaseB'
AND rows > 0;
If that is painful to execute you could create a view for each database instead.

In SQL Server, you can do something like:
SELECT o.name, st.row_count
FROM sys.dm_db_partition_stats st join
sys.objects o
on st.object_id = o.object_id
WHERE index_id < 2 and st.row_count > 0
By the way, this specifically does not use OBJECT_ID() or OBJECT_NAME() because these are evaluated in the current database. The above code continues to work for another database, using 3-part naming. This version also takes into account multiple partitions:
SELECT o.name, sum(st.row_count)
FROM <dbname>.sys.dm_db_partition_stats st join
<dbname>.sys.objects o
on st.object_id = o.object_id
WHERE index_id < 2
group by o.name
having sum(st.row_count) > 0

something like this?
//
foreach (System.Data.DataTable dt in yourDataSet.Tables)
{
if (dt.Rows.Count != 0) { PrintYourTableName(dt.TableName); }
}
//

This is a way you can do it, that relies on system tables, so be AWARE it may not always work in future versions of SQL. With that strong caveat in mind.
select distinct OBJECT_NAME(si.id) as tabName, si.rowcnt
from sys.sysindexes si
join sys.objects so on si.id = so.object_id
where si.indid < 2 and so.type = 'U' -- indid 0 = heap, 1 = clustered index
You would add to the where clause the tables you are interested in and rowcnt <1

Find the real column name of an alias used in a view?

Suppose I have a view in which some of the column names are aliases, like "surName" in this example:
CREATE VIEW myView AS
SELECT
firstName,
middleName,
you.lastName surName
FROM
myTable me
LEFT OUTER JOIN yourTable you
ON me.code = you.code
GO
I'm able to retrieve some information about the view using the INFORMATION_SCHEMA views.
For example, the query
SELECT column_name AS ALIAS, data_type AS TYPE
FROM information_schema.columns
WHERE table_name = 'myView'
yields:
------------------
|ALIAS     |TYPE |
------------------
|firstName |nchar|
|middleName|nchar|
|surName   |nchar|
------------------
However, I would like to know the actual column name as well. Ideally:
-----------------------------
|ALIAS     |TYPE |REALNAME  |
-----------------------------
|firstName |nchar|firstName |
|middleName|nchar|middleName|
|surName   |nchar|lastName  |
-----------------------------
How can I determine what the real column name is based on the alias? There must be some way to use the sys tables and/or INFORMATION_SCHEMA views to retrieve this information.
EDIT:
I can get close with this abomination, which is similar to Arion's answer:
SELECT
c.name AS ALIAS,
ISNULL(type_name(c.system_type_id), t.name) AS DATA_TYPE,
tablecols.name AS REALNAME
FROM
sys.views v
JOIN sys.columns c ON c.object_id = v.object_id
LEFT JOIN sys.types t ON c.user_type_id = t.user_type_id
JOIN sys.sql_dependencies d ON d.object_id = v.object_id
AND c.column_id = d.referenced_minor_id
JOIN sys.columns tablecols ON d.referenced_major_id = tablecols.object_id
AND tablecols.column_id = d.referenced_minor_id
AND tablecols.column_id = c.column_id
WHERE v.name ='myView'
This yields:
-----------------------------
|ALIAS     |TYPE |REALNAME  |
-----------------------------
|firstName |nchar|firstName |
|middleName|nchar|middleName|
|surName   |nchar|code      |
|surName   |nchar|lastName  |
-----------------------------
but the third record is wrong -- this happens with any view created using a "JOIN" clause, because there are two columns with the same "column_id", but in different tables.

Given this view:
CREATE VIEW viewTest
AS
SELECT
books.id,
books.author,
Books.title AS Name
FROM
Books
As far as I can see, you can get the columns and tables used by the view like this:
SELECT *
FROM INFORMATION_SCHEMA.VIEW_COLUMN_USAGE AS UsedColumns
WHERE UsedColumns.VIEW_NAME='viewTest'
SELECT *
FROM INFORMATION_SCHEMA.VIEW_TABLE_USAGE AS UsedTables
WHERE UsedTables.VIEW_NAME='viewTest'
This works in SQL Server 2005 and later.
Edit
Given the same view, try this query:
SELECT
c.name AS columnName,
columnTypes.name as dataType,
aliases.name as alias
FROM
sys.views v
JOIN sys.sql_dependencies d
ON d.object_id = v.object_id
JOIN sys.objects t
ON t.object_id = d.referenced_major_id
JOIN sys.columns c
ON c.object_id = d.referenced_major_id
JOIN sys.types AS columnTypes
ON c.user_type_id=columnTypes.user_type_id
AND c.column_id = d.referenced_minor_id
JOIN sys.columns AS aliases
on c.column_id=aliases.column_id
AND aliases.object_id = object_id('viewTest')
WHERE
v.name = 'viewTest';
It returns this for me:
columnName  dataType  alias
id          int       id
author      varchar   author
title       varchar   Name
This is also tested in sql 2005+

Having spent a number of hours trying to find an answer to this, and repeatedly running into solutions that didn't work and posters that appeared to eventually give up, I eventually stumbled across an answer here that appears to work:
https://social.msdn.microsoft.com/Forums/windowsserver/en-US/afa2ed2b-62de-4a5e-ae70-942e75f887a1/find-out-original-columns-name-when-used-in-a-view-with-alias?forum=transactsql
The following SQL returns, I believe, exactly what you're looking for, it's certainly doing what I need and appears to perform well too.
SELECT name
, source_database
, source_schema
, source_table
, source_column
, system_type_name
, is_identity_column
FROM sys.dm_exec_describe_first_result_set (N'SELECT * from ViewName', null, 1)
Documentation on the sys.dm_exec_describe_first_result_set function can be found here, it's available in SQL Server 2012 and later:
https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-exec-describe-first-result-set-transact-sql
Full credit to the poster on the link, I didn't work this out myself, but I wanted to post this here in case it's useful to anyone else searching for this information as I found this thread much more easily than the one I linked to.

I think you can't.
A SELECT query hides the actual data source it was run against, because you can query anything: a view, a table, even a linked remote server.

Not a perfect solution, but it is possible to parse the view_definition with a high degree of accuracy, especially if the code is well organized with consistent aliasing by 'as'. Additionally, one can parse for the comma ',' after the alias.
Of note: the final field in the select clause will not have the comma, and I was unable to exclude items inside comments (for example, interlaced in the view text with --).
I wrote the query below for a table named 'My_Table' and a view correspondingly called 'vMy_Table'.
select alias, t.COLUMN_name
from
(
select VC.COLUMN_NAME,
case when
ROW_NUMBER () OVER (
partition by C.COLUMN_NAME order by
CHARINDEX(',',VIEW_DEFINITION,CHARINDEX(C.COLUMN_NAME,VIEW_DEFINITION))-
CHARINDEX(VC.COLUMN_NAME,VIEW_DEFINITION)
) = 1
then 1
else 0 end
as lenDiff
,C.COLUMN_NAME as alias
,CHARINDEX(',',VIEW_DEFINITION,CHARINDEX(C.COLUMN_NAME,VIEW_DEFINITION)) diff1
, CHARINDEX(VC.COLUMN_NAME,VIEW_DEFINITION) diff2
from INFORMATION_SCHEMA.VIEW_COLUMN_USAGE VC
inner join INFORMATION_SCHEMA.VIEWS V on V.TABLE_NAME = 'v'+VC.TABLE_Name
inner join information_schema.COLUMNS C on C.TABLE_NAME = 'v'+VC.TABLE_Name
where VC.TABLE_NAME = 'My_Table'
and CHARINDEX(',',VIEW_DEFINITION,CHARINDEX(C.COLUMN_NAME,VIEW_DEFINITION))-
CHARINDEX(VC.COLUMN_NAME,VIEW_DEFINITION) >0
)
t
where lenDiff = 1
Hope this helps and I look forward to your feedback
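The same parsing idea is easier to see outside SQL. Here is a small Python sketch, assuming you have already cut the select list out of the view definition; select_list_aliases is a hypothetical helper and it shares the limits mentioned above (no comments, no commas inside function calls, no quoted names).

```python
import re


def select_list_aliases(select_list):
    """Map each alias in a simple select list to the expression it names.

    Handles 'expr AS alias', 'expr alias' and a bare 'expr', where the
    alias defaults to the part after the last dot.
    """
    pattern = re.compile(r"^(?P<expr>\S+)(?:\s+(?:AS\s+)?(?P<alias>\w+))?$", re.IGNORECASE)
    aliases = {}
    for item in select_list.split(","):
        match = pattern.match(item.strip())
        if match is None:
            continue  # anything more complex is skipped, not guessed at
        expr = match.group("expr")
        alias = match.group("alias") or expr.split(".")[-1]
        aliases[alias] = expr
    return aliases


# The select list of myView from the question.
print(select_list_aliases("firstName, middleName, you.lastName surName"))
# {'firstName': 'firstName', 'middleName': 'middleName', 'surName': 'you.lastName'}
```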

Joining on one of Two Tables Based on Parameter

Not sure if this can be done, but here is what I am trying to do.
I have two tables:
Table 1 is called Task and it contains all of the possible Task Names
Table 2 is called Task_subset and it contains only a subset of the Task Names included in Table 1
I have a variable called @TaskControl that is passed in as a parameter; it is equal to either Table1 or Table2.
Based on the value of the @TaskControl variable I want to join one of my Task tables.
For example:
If @TaskControl = 'Table1':
Select * From Orders O Join Task T on T.id = O.id
If @TaskControl = 'Table2':
Select * From Orders O Join Task_subset T on T.id = O.id
How would I do this in SQL Server 2008?

Don't overcomplicate it. Put it into a stored proc like so:
CREATE PROCEDURE dbo.MyProcedure (@TaskControl varchar(20))
AS
IF @TaskControl = 'Table1'
Select * From Orders O Join Task T on T.id = O.id
ELSE IF @TaskControl = 'Table2'
Select * From Orders O Join Task_subset T on T.id = O.id
ELSE SELECT 'Invalid Parameter'
Or just straight TSQL with no proc:
IF @TaskControl = 'Table1'
Select * From Orders O Join Task T on T.id = O.id
ELSE IF @TaskControl = 'Table2'
Select * From Orders O Join Task_subset T on T.id = O.id

Doing it exactly as you do it right now is the best way. Having one single statement that attempts to somehow dynamically join one of two tables is the last thing you want. T-SQL is a language for data access, not for DRY code-reuse programming. If you attempt to have a single statement, then the optimizer has to come up with a plan that always works, no matter the value of @TaskControl, and so the plan will always have to join both tables.
A more lengthy discussion on this topic is Dynamic Search Conditions in T-SQL (your dynamic join falls into the same topic as dynamic search).

If they are UNION compatible you could give this a shot. From a quick test this end it only appears to access the relevant table.
I do agree more with JNK's and Remus's answers however. This does have a recompilation cost for every invocation and not much benefit.
;WITH T AS
(
SELECT 'Table1' AS TaskControl, id
FROM Task
UNION ALL
SELECT 'Table2' AS TaskControl, id
FROM Task_subset
)
SELECT *
FROM T
JOIN Orders O on T.id = O.id
WHERE TaskControl = @TaskControl
OPTION (RECOMPILE)

I don't know how good performance would be, and this would not scale well as you add on additional optional tables, but this should work in the situation that you present.
SELECT
O.some_column,
COALESCE(T.some_task_column, TS.some_task_subset_column)
FROM
Orders O
LEFT OUTER JOIN Tasks T ON
@task_control = 'Tasks' AND
T.id = O.id
LEFT OUTER JOIN Task_Subsets TS ON
@task_control = 'Task Subsets' AND
TS.id = O.id

Try the following. It should avoid the stored procedure plan getting bound based on the value of the parameter passed during the first execution of the stored procedure (See SQL Server Parameter Sniffing for details):
create proc dbo.foo
@TaskControl varchar(32)
as
declare @selection varchar(32)
set @selection = @TaskControl
select *
from dbo.Orders t
join dbo.Task t1 on t1.id = t.id
where @selection = 'Table1'
UNION ALL
select *
from dbo.Orders t
join dbo.Task_subset t1 on t1.id = t.id
where @selection = 'Table2'
return 0
go
The stored procedure shouldn't get recompiled for each invocation, either, as @Martin suggested might happen, and the parameter value first passed in should not influence the execution plan that gets bound. But if performance is an issue, run a SQL trace with the profiler and see if the cached execution plan is reused or if a recompile is triggered.
One thing, though: you will need to ensure that each individual select in the UNION returns the exact same columns. Each select in a UNION must have the same number of columns and each column must have a common type (or default conversion to the common type). The 1st select defines the number, types and names of the columns in the result set.
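To see that the UNION ALL form returns rows from only the selected branch, here is a small runnable sketch in Python with an in-memory SQLite database standing in for SQL Server. The name column and the sample rows are assumptions made for illustration, and the sketch only demonstrates the result set; it says nothing about how SQL Server compiles or caches the plan.

```python
import sqlite3

# Both branches list the same columns, as UNION ALL requires.
QUERY = """
SELECT o.id, t.name FROM Orders AS o JOIN Task AS t ON t.id = o.id
WHERE :task_control = 'Table1'
UNION ALL
SELECT o.id, t.name FROM Orders AS o JOIN Task_subset AS t ON t.id = o.id
WHERE :task_control = 'Table2'
ORDER BY 1
"""


def orders_with_tasks(conn, task_control):
    """Run the two-branch query; only the branch matching task_control yields rows."""
    return conn.execute(QUERY, {"task_control": task_control}).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (id INTEGER PRIMARY KEY);
CREATE TABLE Task (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Task_subset (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO Orders VALUES (1), (2);
INSERT INTO Task VALUES (1, 'pick'), (2, 'pack');
INSERT INTO Task_subset VALUES (2, 'pack');
""")
print(orders_with_tasks(conn, "Table1"))  # [(1, 'pick'), (2, 'pack')]
print(orders_with_tasks(conn, "Table2"))  # [(2, 'pack')]
```

Any other parameter value returns no rows, which matches the behaviour of the procedure above when neither WHERE clause is true.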
