How do I determine the related column between tables? - sql

I have two tables and need to find the column(s) that relate them. There are no foreign keys. The primary key of 'CUSTTABLE' is 'ACCOUNTNUM' and the primary key of 'DIRPARTYTABLE' is 'RECID'. The related columns are 'CUSTTABLE.PARTY' and 'DIRPARTYTABLE.RECID'. I need a query that returns 'CUSTTABLE.ACCOUNTNUM', 'DIRPARTYTABLE.NAME' and 'DIRPARTYTABLE.PARTYNUMBER' on the same row. The code I have returns columns whose names match. However, in my case the names are rarely the same.
Is there a similar way to return the columns with matching values rather than matching names? Even a result that lists several candidate columns would help. In reality, these tables have 100+ columns, and I have many tables for which I need to determine the same type of relationship.
CUSTTABLE:
ACCOUNTNUM  CUSTGROUP  PARTY
305342      LTL        5637459693
305343      LTL        5637468513
305345      LTL        5637472531
305398      LTL        5637468514
305405      LTL        5637468515
DIRPARTYTABLE:
NAME              PARTYNUMBER  RECID
ZEP MFG           1500121      5637459693
TABER EXTRUSIONS  1500122      5637459694
LAWSON PRODUCTS   1500123      5637459695
KIMRAY            1500124      5637459696
ANCHOR PAINT MFG  1500125      5637459697
RESULT:
ACCOUNTNUM  NAME     PARTYNUMBER
305342      ZEP MFG  1500121
Query:
select A.COLUMN_NAME
from INFORMATION_SCHEMA.COLUMNS A
join INFORMATION_SCHEMA.COLUMNS B
on A.COLUMN_NAME = B.COLUMN_NAME
where A.TABLE_NAME = 'CUSTTABLE'
and B.TABLE_NAME = 'DIRPARTYTABLE'

It seems you are working with a D365 database. In that case, DIRPARTYTABLE usually holds all party records (employees, customers, vendors, etc.), distinguished by DIRPARTYTABLE.INSTANCERELATIONTYPE. For customers, CUSTTABLE.PARTY = DIRPARTYTABLE.RECID, with INSTANCERELATIONTYPE = 1155 or 5767. This is standard information, but you may want to verify that the INSTANCERELATIONTYPE values match your setup.
Regarding relationship identification, I suggest going through the form properties and queries in the D365 application itself. Otherwise it is difficult to predict a relationship without verifying the actual data, unless you have a metadata document available, because sometimes you cannot rely on a single column as the key. For example, VENDOR.ACCOUNTID is not a key column unless you also consider VENDOR.ENTITYID or VENDOR.DATAAREAID.
However, the following query might be helpful as a starting point. It finds columns with similar names across tables, excluding tables that do not have any records.
with CTE as
(
SELECT DB_ID () DatabaseID,
t.Name AS TableName,
p.rows AS RowCounts,
SUM(a.total_pages) * 8 AS TblTotalSizeKB,
SUM(a.used_pages) * 8 AS UsedSpaceKB
-- CAST(ROUND(((SUM(a.used_pages) * 8) / 1024.00), 2) AS NUMERIC(36, 2)) AS UsedSpaceMB,
FROM
sys.tables as t
INNER JOIN
sys.indexes i ON t.Object_ID = i.object_id
INNER JOIN
sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
INNER JOIN
sys.allocation_units a ON p.partition_id = a.container_id
--LEFT OUTER JOIN
-- sys.schemas s ON t.schema_id = s.schema_id
WHERE t.OBJECT_ID > 255 and p.rows > 0 --and T.name like '%WORKFLOWTRACKINGSTATUSTABLE%' -----------Enter TABLE name (search key word) here
GROUP BY T.name, p.Rows
)
select C.COLUMN_NAME,DATA_TYPE, CTE.*
from CTE
JOIN INFORMATION_SCHEMA.COLUMNS AS C on CTE.TableName = C.TABLE_NAME
WHERE COLUMN_NAME LIKE '%Hire%' -----------Enter COLUMN name (search key word) here
ORDER BY TABLE_NAME
GO
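The question asks for matching values rather than matching names. One way to approach that is to count, for each pair of same-typed columns, how many CUSTTABLE rows have a value that also appears in DIRPARTYTABLE. The following is a rough sketch, not a tested D365 solution. It assumes SQL Server 2017 or later (for STRING_AGG), that both tables live in the default schema, and that only int/bigint columns are worth comparing; the temp table #overlap and its column names are made up for illustration.

```sql
-- Sketch only: assumes SQL Server 2017+ and tables in the default schema.
-- #overlap and its column names are hypothetical.
CREATE TABLE #overlap
(
    cust_column   sysname,
    party_column  sysname,
    matching_rows bigint
);

DECLARE @sql nvarchar(max);

-- Build one INSERT ... SELECT per pair of integer columns with the same data type.
SELECT @sql = STRING_AGG(
    CAST(N'INSERT INTO #overlap SELECT '
         + QUOTENAME(a.COLUMN_NAME, '''') + N', '
         + QUOTENAME(b.COLUMN_NAME, '''') + N', COUNT(*) '
         + N'FROM CUSTTABLE AS c WHERE EXISTS (SELECT 1 FROM DIRPARTYTABLE AS d WHERE d.'
         + QUOTENAME(b.COLUMN_NAME) + N' = c.' + QUOTENAME(a.COLUMN_NAME) + N');'
         AS nvarchar(max)),
    NCHAR(10))
FROM INFORMATION_SCHEMA.COLUMNS AS a
JOIN INFORMATION_SCHEMA.COLUMNS AS b
    ON a.DATA_TYPE = b.DATA_TYPE
WHERE a.TABLE_NAME = 'CUSTTABLE'
  AND b.TABLE_NAME = 'DIRPARTYTABLE'
  AND a.DATA_TYPE IN ('int', 'bigint');

-- Run the generated statements; the temp table is visible to the dynamic batch.
EXEC sys.sp_executesql @sql;

-- Pairs with the most matching rows are the likeliest join candidates.
SELECT cust_column, party_column, matching_rows
FROM #overlap
WHERE matching_rows > 0
ORDER BY matching_rows DESC;

DROP TABLE #overlap;
```

With 100+ columns per table this generates many scans, so it may be worth running against a copy or a sample. Low-cardinality columns (flags, enum values) will overlap by coincidence, so a high count is only a hint that still needs checking against the data.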

Related

Table Documenting: Query to return basic stats (Min, Max, Count(*), Count(distinct *), etc) for every column in a table?

I am working on a project to document the tables in a database. One of the features I would like to include is a meta-table that includes some facts and statistics for every column in the table. I already have a query that does the following:
Column  Data Type  Nullable  Primary Key  Foreign Key
Col A   string     N         PK
Col B   string     N                      FK
Etc.....
I would also like to get some common statistics of each column as either another table (or preferably as part of the above table) that would look like:
Column  Min  Max     Count   Count Distinct
Col A   1    486842  486842  486842
Col B   1    756     486842  395
Col C   1    5       210523  3
Perhaps also "Most Common Value" and "# of Most Common Value". Basic information that will give me a quick idea as to the distribution and uniqueness of values in a column. The calculations aren't the hard part. I don't understand how to connect actual tables to meta data. It also has a confusing back-and-forth pivot aspect.
Can this be done dynamically without typing out each and every column?
The following query produces the first table:
select
col.name as column_name,
t.name as data_type,
case when col.is_nullable = 0 then 'N'
else 'Y' end as nullable,
case when pk.column_id is not null then 'PK'
else '' end as primary_key,
case when fk.parent_column_id is not null then 'FK'
else '' end as foreign_key
from sys.tables as tab
left join sys.columns as col
on tab.object_id = col.object_id
left join sys.types as t
on col.user_type_id = t.user_type_id
left join (
select index_columns.object_id,
index_columns.column_id
from sys.index_columns
inner join sys.indexes
on index_columns.object_id = indexes.object_id
and index_columns.index_id = indexes.index_id
where indexes.is_primary_key = 1
) as pk
on col.object_id = pk.object_id
and col.column_id = pk.column_id
left join (
select fc.parent_column_id,
fc.parent_object_id
from sys.foreign_keys as f
inner join sys.foreign_key_columns as fc
on f.object_id = fc.constraint_object_id
group by fc.parent_column_id, fc.parent_object_id
) as fk
on fk.parent_object_id = col.object_id
and fk.parent_column_id = col.column_id
where tab.name = 'table_products'
order by primary_key desc, foreign_key desc, nullable asc, column_name asc;
Edit: Solved via comments showing that SQL Server Column Statistics already provide this type of information.
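For readers who still want the per-column numbers from a query, the metadata can be connected to the actual table by generating one aggregate SELECT per column and running the result as dynamic SQL. This is a minimal sketch under several assumptions: SQL Server 2017 or later (for STRING_AGG), a table in the default schema, and a hand-picked list of data types that support MIN, MAX and COUNT(DISTINCT). The output column names are illustrative only.

```sql
-- Sketch only: assumes SQL Server 2017+; the type list is an assumption.
DECLARE @table sysname = N'table_products';
DECLARE @sql nvarchar(max);

-- One SELECT per column, glued together with UNION ALL.
SELECT @sql = STRING_AGG(
    CAST(N'SELECT ' + QUOTENAME(c.name, '''') + N' AS column_name, '
         + N'CONVERT(nvarchar(4000), MIN(' + QUOTENAME(c.name) + N')) COLLATE DATABASE_DEFAULT AS min_value, '
         + N'CONVERT(nvarchar(4000), MAX(' + QUOTENAME(c.name) + N')) COLLATE DATABASE_DEFAULT AS max_value, '
         + N'COUNT(' + QUOTENAME(c.name) + N') AS non_null_count, '
         + N'COUNT(DISTINCT ' + QUOTENAME(c.name) + N') AS distinct_count '
         + N'FROM ' + QUOTENAME(@table)
         AS nvarchar(max)),
    N' UNION ALL ')
FROM sys.columns AS c
JOIN sys.types AS t
    ON t.user_type_id = c.user_type_id
WHERE c.object_id = OBJECT_ID(@table)
  AND t.name IN (N'tinyint', N'smallint', N'int', N'bigint',
                 N'decimal', N'numeric', N'date', N'datetime', N'datetime2',
                 N'char', N'nchar', N'varchar', N'nvarchar');

EXEC sys.sp_executesql @sql;
```

MIN and MAX are converted to text so that columns of different types can share one result set. Each generated SELECT scans the table once, so on large tables the built-in column statistics mentioned in the edit above are likely the cheaper route.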

Check the existence of statistic with same columns before creating a statistic on a SQL table

I need to create some statistics on a table with different combinations of columns, but a statistic with the same combination of columns may already exist. So, before creating a statistic, I want to check whether one with the same combination of columns exists. If it does, I will not create the statistic; only if it does not will I create it.
For example, create a table and a statistic on it as follows:
CREATE TABLE Gift
(
Gift_Id INTEGER IDENTITY (1,1) PRIMARY KEY,
Person_Id INTEGER,
Event_Id INTEGER,
Agent_Id INTEGER,
Fund_Id INTEGER,
Amount FLOAT
)
CREATE STATISTICS [Stats1_1_2_3]
ON [dbo].[Gift]([Gift_Id], [Person_Id], [Event_Id])
So we have a table Gift and a statistic on the columns Gift_Id, Person_Id and Event_Id.
Now, if I create another statistic as follows:
CREATE STATISTICS [Stats2_1_2_3]
ON [dbo].[Gift]([Gift_Id], [Person_Id], [Event_Id])
the second statistic is a duplicate of the first (it has the same columns).
So, to avoid the duplication, I need to check whether a statistic with the same columns already exists.
Is there any way to do this?
Please help.
The query below joins sys.stats, sys.stats_columns and sys.columns to find all statistics on the given table and the columns of each statistic.
The grouping and counting check whether a single statistic exists that refers to all of the columns.
Note that the query explicitly names the table, the columns and the number of columns being checked.
The updated query also checks that ONLY the listed columns are present (counting = 3) and that ALL the listed columns are present (counting3 = 3). Using it for other statistics means modifying the CASE expression and the '3' values in the last WHERE line.
SELECT * FROM (
SELECT s.name AS statistics_name,
count(*) as counting,
sum(case
when c.name = 'Gift_Id' then 1
when c.name = 'Person_Id' then 1
when c.name = 'Event_Id' then 1
else 0
end) as counting3
FROM sys.stats AS s
INNER JOIN sys.stats_columns AS sc
ON s.object_id = sc.object_id AND s.stats_id = sc.stats_id
INNER JOIN sys.columns AS c
ON sc.object_id = c.object_id AND c.column_id = sc.column_id
WHERE s.object_id = OBJECT_ID('Gift')
Group by s.name) T
WHERE T.COUNTING = 3 AND T.COUNTING3 = 3;
The following query returns the name of the statistic (if one exists) with the given combination of columns.
SELECT s.name
FROM sys.stats s inner join
sys.stats_columns sc on s.stats_id = sc.stats_id and s.object_id = sc.object_id
JOIN sys.columns c ON c.[object_id] = sc.[object_id] AND c.column_id = sc.column_id
WHERE OBJECT_NAME(s.OBJECT_ID) = 'Gift'
and s.stats_id not in
(
SELECT distinct s.stats_id
FROM sys.stats s inner join
sys.stats_columns sc on s.stats_id = sc.stats_id and s.object_id = sc.object_id
JOIN sys.columns c ON c.[object_id] = sc.[object_id] AND c.column_id = sc.column_id
WHERE OBJECT_NAME(s.OBJECT_ID) = 'Gift' and c.name not in ('Gift_Id','Person_Id','Event_Id')
)
group by s.name
having count(1)= 3
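Either check can be wrapped in IF NOT EXISTS so that the statistic is only created when no equivalent one is present. The following is a sketch that reuses the table and statistic names from the question; it ignores column order within the statistic, which is an assumption about what counts as a duplicate.

```sql
-- Sketch: create the statistic only if no statistic on dbo.Gift
-- covers exactly these three columns (column order is ignored).
IF NOT EXISTS
(
    SELECT 1
    FROM sys.stats AS s
    INNER JOIN sys.stats_columns AS sc
        ON s.object_id = sc.object_id AND s.stats_id = sc.stats_id
    INNER JOIN sys.columns AS c
        ON sc.object_id = c.object_id AND sc.column_id = c.column_id
    WHERE s.object_id = OBJECT_ID(N'dbo.Gift')
    GROUP BY s.stats_id
    HAVING COUNT(*) = 3   -- the statistic has exactly three columns
       AND SUM(CASE WHEN c.name IN (N'Gift_Id', N'Person_Id', N'Event_Id')
                    THEN 1 ELSE 0 END) = 3   -- and all three are the listed ones
)
BEGIN
    CREATE STATISTICS [Stats2_1_2_3]
    ON [dbo].[Gift]([Gift_Id], [Person_Id], [Event_Id]);
END;
```

Because the leading column determines the histogram, statistics on the same columns in a different order are not strictly identical; if that matters, the check would also need to compare sc.stats_column_id.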

SQL server find dependencies between tables that are not declared

I am using SQL Server 2008 R2.
Recently I got a database that contains the live data of a web-application.
While reviewing it, I found many tables with dependencies that are implied but not declared.
For example:
TableA has columns [Id], [Name], [Address]. Here [Id] is the primary key.
TableB has columns [Id], [TableAId], [Salary]. Here [Id] is the primary key, and column [TableAId] contains only values of [TableA].[Id] (no values other than TableA's Id), but it is not declared as a foreign key.
By reviewing the code, I found that records for both tables are inserted in the same event, so the [TableB].[TableAId] column will only have values that [TableA].[Id] contains.
Now I want to find other dependencies like this one.
Is this possible using a SQL Server query, a tool or any third-party software?
In the general case, I don't think you can count on a column named TableAId to imply a foreign key reference to TableA. It might refer to a different table that also stores id numbers from table A. I know you've reviewed the code in this specific case, but you're looking for a solution that doesn't require reviewing the code.
Anyway . . .
You can join tables on expressions. This query (PostgreSQL syntax) joins part of the column name to table names. In PostgreSQL, the function call left(t1.column_name, -2) returns all but the last two characters of t1.column_name; left(t1.column_name, -3) returns all but the last three. That's intended to match names like "TableAid" and "TableA_id".
select t1.table_catalog, t1.table_schema, t1.table_name,
t1.column_name, left(t1.column_name, -2),
t2.table_catalog, t2.table_schema, t2.table_name
from information_schema.columns t1
inner join information_schema.tables t2
on left(t1.column_name, -2) = t2.table_name or
left(t1.column_name, -3) = t2.table_name
where t1.column_name like '%id';
I believe this query will return the same rows. It uses SQL Server syntax, comparing the column name against the table name plus a suffix, because SQL Server's left() raises an error for a negative length (as it would get for a column named just "id"). I haven't tested it in SQL Server.
select t1.table_catalog, t1.table_schema, t1.table_name,
t1.column_name,
t2.table_catalog, t2.table_schema, t2.table_name
from information_schema.columns t1
inner join information_schema.tables t2
-- matches names like "TableAid" and "TableA_id"
on t1.column_name = t2.table_name + 'id' or
t1.column_name = t2.table_name + '_id';
Both these might return rows incorrectly, mainly because joins probably need to take into account the "table_catalog" column at the very least. I debated whether to include it. I finally decided to leave it out. I think that, if I were in your shoes, I'd want this query to have the maximum chance of returning some surprising rows.
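A name match only produces candidates. Each candidate can then be checked against the data by counting child rows whose value has no parent row. A minimal sketch for the TableA/TableB example from the question (the alias orphan_rows is just for illustration):

```sql
-- Counts TableB rows whose TableAId does not exist in TableA.Id.
-- A result of 0 is consistent with an undeclared foreign key;
-- it does not prove that one was intended.
SELECT COUNT(*) AS orphan_rows
FROM TableB AS b
WHERE b.TableAId IS NOT NULL
  AND NOT EXISTS
  (
      SELECT 1
      FROM TableA AS a
      WHERE a.Id = b.TableAId
  );
```

On small tables a zero count can happen by coincidence, so the result is more convincing when TableB has many distinct TableAId values.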
Try some dependency checks.
--- Get the source objects, columns and dependent objects using the data.
select so.name as sourceObj
, so.type as sourceType
, c.name as colname
, st.name as coltype
, u.name as DependentObj
, d.selall as is_select_all
, d.resultobj as is_updated
, d.readobj as is_read
--, d.*
from sys.columns c
----- object that owns the column
inner join sys.objects so on so.object_id = c.object_id
inner join sys.types st on c.system_type_id = st.system_type_id
----- holds dependencies
inner join sysdepends d on d.id = c.object_id
----- object that uses the column
inner join sys.objects u on u.object_id = d.depid
You can list all the tables, views, procedures, etc. The bits that are really good for your use case are:
, d.selall as is_select_all
, d.resultobj as is_updated
, d.readobj as is_read
If any of these columns is 1, the object is used in a SELECT *, updated, or read, respectively.
I hope this may help a bit.
Enjoy

How to determine if a specific set of tables in a database are empty

I have database A which contains a table (CoreTables) that stores a list of active tables within database B that the organization's users are sending data to.
I would like to be able to have a set-based query that can output a list of only those tables within CoreTables that are populated with data.
Dynamically, I normally would do something like:
For each row in CoreTables
    Get the table name
    If table is empty
        Do nothing
    Else
        Print table name
Is there a way to do this without a cursor or other dynamic methods? Thanks for any assistance...
Probably the most efficient option is:
SELECT c.name
FROM dbo.CoreTables AS c
WHERE EXISTS
(
SELECT 1
FROM sys.partitions
WHERE index_id IN (0,1)
AND rows > 0
AND [object_id] = OBJECT_ID(c.name)
);
Just note that the count in sys.sysindexes, sys.partitions and sys.dm_db_partition_stats are not guaranteed to be completely in sync due to in-flight transactions.
While you could just run this query in the context of the database, you could do this for a different database as follows (again assuming that CoreTables does not include schema in the name):
SELECT c.name
FROM DatabaseA.dbo.CoreTables AS c
WHERE EXISTS
(
SELECT 1
FROM DatabaseB.sys.partitions AS p
INNER JOIN DatabaseB.sys.tables AS t
ON p.[object_id] = t.object_id
WHERE t.name = c.name
AND p.rows > 0
);
If you need to do this for multiple databases that all contain the same schema (or at least overlapping schema that you're capturing in aggregate in a central CoreTables table), you might want to construct a view, such as:
CREATE VIEW dbo.CoreTableCounts
AS
SELECT db = 'DatabaseB', t.name, rows = MAX(p.rows)
FROM DatabaseB.sys.partitions AS p
INNER JOIN DatabaseB.sys.tables AS t
ON p.[object_id] = t.[object_id]
INNER JOIN DatabaseA.dbo.CoreTables AS ct
ON t.name = ct.name
WHERE p.index_id IN (0,1)
GROUP BY t.name
UNION ALL
SELECT db = 'DatabaseC', t.name, rows = MAX(p.rows)
FROM DatabaseC.sys.partitions AS p
INNER JOIN DatabaseC.sys.tables AS t
ON p.[object_id] = t.[object_id]
INNER JOIN DatabaseA.dbo.CoreTables AS ct
ON t.name = ct.name
WHERE p.index_id IN (0,1)
GROUP BY t.name
-- ...
GO
Now your query isn't going to be quite as efficient, but doesn't need to hard-code database names as object prefixes, instead it can be:
SELECT name
FROM dbo.CoreTableCounts
WHERE db = 'DatabaseB'
AND rows > 0;
If that is painful to execute you could create a view for each database instead.
In SQL Server, you can do something like:
SELECT o.name, st.row_count
FROM sys.dm_db_partition_stats st join
sys.objects o
on st.object_id = o.object_id
WHERE index_id < 2 and st.row_count > 0
By the way, this specifically does not use OBJECT_ID() or OBJECT_NAME() because these are evaluated in the current database. The above code continues to work for another database, using 3-part naming. This version also takes into account multiple partitions:
SELECT o.name, sum(st.row_count)
FROM <dbname>.sys.dm_db_partition_stats st join
<dbname>.sys.objects o
on st.object_id = o.object_id
WHERE index_id < 2
group by o.name
having sum(st.row_count) > 0
something like this?
//
foreach (System.Data.DataTable dt in yourDataSet.Tables)
{
if (dt.Rows.Count != 0) { PrintYourTableName(dt.TableName); }
}
//
Here is one way to do it. It relies on system tables, so be aware that it may not work in future versions of SQL Server. With that strong caveat in mind:
select distinct OBJECT_NAME(si.id) as tabName, si.rowcnt
from sys.sysindexes si
join sys.objects so on si.id = so.object_id
where si.indid in (0, 1) and so.type = 'U' -- 0 = heap, 1 = clustered index
You would add the tables you are interested in and rowcnt < 1 to the where clause.

Where are seed_value and increment_value for IDENTITY columns?

I'm collecting metadata using the sys.* views, and according to the documentation, the sys.identity_columns view will return the seed and increment values like so.
CREATE TABLE ident_test (
test_id int IDENTITY(1000,10),
other int
)
SELECT name, seed_value, increment_value
FROM sys.identity_columns
WHERE object_id = OBJECT_ID( 'ident_test' )
However, the above query just returns one column. Is it just me?
(Note: I've had to change this question somewhat from its earlier version.)
Shouldn't you reverse the from and join, like this:
SELECT c.name, i.seed_value, i.increment_value
from sys.identity_columns i
join sys.columns c
ON i.object_id = c.object_id
AND i.column_id = c.column_id
You are missing the Where clause. Your query is effectively saying 'Give me all of sys.columns and any matching rows from sys.identity_columns you have (but give me null if there are no matching rows)'.
By adding the Where clause below you'll change it to only return where an exact match is returned, which is the same as an inner join in this instance really.
SELECT
c.name, i.seed_value, i.increment_value
FROM
sys.columns c
LEFT OUTER JOIN sys.identity_columns i
ON i.object_id = c.object_id
AND i.column_id = c.column_id
Where I.seed_value is not null
So I think your data is correct, there are no results to view though.
Are you sure you are running this in a database with tables with IDENTITY columns?
SELECT c.name, i.seed_value, i.increment_value
FROM sys.columns c
INNER JOIN sys.identity_columns i
ON i.object_id = c.object_id
AND i.column_id = c.column_id
Returns rows for me in a regular production database with a few identities.
Using a LEFT JOIN returns these rows as well as many which are not IDENTITY
I ran this on another database, and I noticed some NULLs are returned (even in the INNER JOIN case). This is because some of the columns are in VIEWs.
Try adding:
INNER JOIN sys.tables t
ON t.object_id = c.object_id
To filter only to actual IDENTITY columns in tables.
Your query returns what I'd expect (see below): it returns the single metadata row for the single identity column (test_id) in the table (ident_test). The other column (other) has no row in sys.identity_columns because it is not an identity column.
SELECT name, seed_value, increment_value
FROM sys.identity_columns
WHERE object_id = OBJECT_ID( 'ident_test' )
select name, is_identity, is_nullable
from sys.columns
WHERE object_id = OBJECT_ID( 'ident_test' )
Which gives
name     seed_value  increment_value
-----------------------------------------
test_id  1000        10
(1 row(s) affected)
name     is_identity  is_nullable
-------------------------------------
test_id  1            0
other    0            1
(2 row(s) affected)
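As a cross-check that does not go through the catalog views, the built-in functions IDENT_SEED and IDENT_INCR report the same two values for a table. A short sketch against the ident_test table from the question:

```sql
-- For the table defined with IDENTITY(1000,10) this reports
-- a seed of 1000 and an increment of 10.
SELECT IDENT_SEED(N'ident_test') AS seed_value,
       IDENT_INCR(N'ident_test') AS increment_value;
```

Both functions take a table or view name and return NULL when the object has no identity column, which makes them handy for a quick check on a single table; sys.identity_columns remains the better fit for collecting metadata across many tables.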