I am attempting to provide a general solution for the migration of data from one schema version to another. A problem arises when a column's data type in the source schema does not match that of the destination. I would like to create a query that performs a preliminary comparison of the columns' data types, returning which columns need to be fixed before migration is possible.
My current approach is to return the table and column names from information_schema.columns where the DATA_TYPE values differ between catalogs. However, querying information_schema directly only returns results from the catalog of the current connection.
Has anyone written a query like this?
I do this by querying the system tables directly. Look into the syscolumns and sysobjects tables. You can join across linked servers, too.
select t1.name as tname,c1.name as cname
from adventureworks.dbo.syscolumns c1
join adventureworks.dbo.sysobjects t1 on c1.id = t1.id
where t1.type = 'U'
order by t1.name,c1.colorder
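For the cross-catalog comparison the question actually asks for, three-part naming also works with INFORMATION_SCHEMA as long as both databases are on the same instance. A sketch (SourceDb and DestDb are hypothetical database names; substitute your own):

```sql
-- Hypothetical catalog names: replace SourceDb and DestDb.
-- Three-part naming lets one connection see both catalogs.
SELECT s.TABLE_NAME,
       s.COLUMN_NAME,
       s.DATA_TYPE AS source_type,
       d.DATA_TYPE AS dest_type
FROM SourceDb.INFORMATION_SCHEMA.COLUMNS AS s
JOIN DestDb.INFORMATION_SCHEMA.COLUMNS AS d
  ON  s.TABLE_NAME  = d.TABLE_NAME
  AND s.COLUMN_NAME = d.COLUMN_NAME
WHERE s.DATA_TYPE <> d.DATA_TYPE;   -- only columns whose types disagree
```

You may also want to compare CHARACTER_MAXIMUM_LENGTH and NUMERIC_PRECISION in the same way, since two columns can share a DATA_TYPE but still truncate data on migration.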
I have always been in the fortunate position of having Red Gate Schema Compare, which I think would do what you ask. Cheap at twice the price!
I have a SQL Server database with a lot of tables (several hundred), related to each other in various ways. All of them have primary keys (GUIDs), but only a few of them have actual foreign key constraints defined.
I need to find all tables related to a certain table (let's call it TargetTable), both directly and indirectly (through 1, 2 or more intermediate tables), on any column.
My end goal is to get SQL queries (one for each related table) which JOIN all tables between TargetTable and that related table.
For example, suppose 5 tables related to TargetTable are found:
TargetTable - Table1
TargetTable - Table1 - Table2
TargetTable - Table3
TargetTable - Table3 - Table4
TargetTable - Table3 - Table4 - Table5
I need to get 5 separate JOINs.
Is there any SQL query, software, utility, or other way to get the desired SQL code? It would even be enough to get the relations in some convenient graph form, so I could parse them with my favourite scripting language and generate the SQL myself.
You can certainly generate code by looping through information_schema.columns or sys.columns but I doubt this is going to work as well as you would like.
If they didn't bother to put in FKs then they have probably done some other awful things, like no standard naming conventions or generic tables.
You are probably better off looking through the SQL queries/procedures in the database to see where most of the relationships are... then you will have to decide for yourself if tables are related or not.
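As a starting point for that manual review, a naming-convention heuristic can narrow the search. The sketch below is purely illustrative: it assumes the common pattern where a referencing column is named after the referenced table plus an ID suffix (e.g. Table1ID pointing at Table1), which the database may or may not follow, so every hit needs to be verified by hand:

```sql
-- Guess candidate FK relationships by matching column names like 'Table1ID'
-- against table names. A naming heuristic only; verify each hit manually.
SELECT c.TABLE_NAME  AS referencing_table,
       c.COLUMN_NAME AS referencing_column,
       t.TABLE_NAME  AS referenced_table
FROM INFORMATION_SCHEMA.COLUMNS AS c
JOIN INFORMATION_SCHEMA.TABLES  AS t
  ON c.COLUMN_NAME = t.TABLE_NAME + 'ID'
WHERE t.TABLE_TYPE = 'BASE TABLE'
  AND c.TABLE_NAME <> t.TABLE_NAME;  -- skip a table's own PK column
```

With GUID keys you could go further and match on actual values between candidate column pairs, but that is expensive and still only circumstantial evidence of a relationship.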
You can use SQL Server Management Studio for both: a graph via database diagrams (not ideal but useful) https://www.mssqltips.com/sqlservertip/1816/getting-started-with-sql-server-database-diagrams/ and the SQL with JOINs via the Query Designer (also in SSMS) https://www.mssqltips.com/sqlservertip/1086/sql-server-management-studio-query-designer/
Hope this helps,
You cannot infer relations from the tables alone. To do that you need knowledge of the domain. For example, suppose you have two tables: T1, which contains an int field X, and T2, which has an int field Y. Then there is a relationship R between the rows of T1 and T2, where (r1, r2) is in R if and only if r1.X = r2.Y.
So I would suggest that you construct a model (ER-model for example) using your knowledge of the domain. Then add the foreign key constraints manually.
Maybe this question is redundant, but I am posting it as I could not find an exact solution (please read the Actual Scenario below).
I have the following script which returns all the tables and their corresponding number of rows.
SELECT
sysobjects.Name, sysindexes.Rows
FROM
sysobjects
INNER JOIN sysindexes
ON sysobjects.id = sysindexes.id
WHERE
type = 'U'
    AND sysindexes.IndId < 2
ORDER BY [Rows]
Now, I want to join this result set with a similar result set from a different database (with the same structure). I am not able to use four-part naming with sysobjects. It gives the error: The multi-part identifier "My_Database1.sysobjects.Name" could not be bound.
Actual Scenario: I have a duplicate database and want to know which tables' data has not been moved over from the original database.
Any alternate solution would also help.
Put .dbo between My_Database1 and sysobjects.Name, as in:
My_Database1.dbo.sysobjects
You should be able to query the sys tables on a different database than the one you are connected to (as long as they are on the same instance, of course). Check your syntax; I believe you are missing the schema name sys, so it would be:
SELECT * FROM My_Database1.sys.sysobjects
Use the sys schema to reference the system tables:
select sys_obj.* from DatabaseName.Sys.sysobjects sys_obj
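Putting the schema-qualification fix together with the original script, a cross-database version for the Actual Scenario might look like the sketch below. My_Database1 and My_Database2 are placeholders for the original and duplicate databases, and both must live on the same instance:

```sql
-- Compare row counts per table across two same-structure databases,
-- listing tables whose counts disagree (i.e. data not yet moved).
SELECT o1.name AS table_name,
       i1.rows AS rows_in_db1,
       i2.rows AS rows_in_db2
FROM My_Database1.sys.sysobjects AS o1
JOIN My_Database1.sys.sysindexes AS i1
  ON o1.id = i1.id AND i1.indid < 2
JOIN My_Database2.sys.sysobjects AS o2
  ON o2.name = o1.name AND o2.type = 'U'
JOIN My_Database2.sys.sysindexes AS i2
  ON o2.id = i2.id AND i2.indid < 2
WHERE o1.type = 'U'
  AND i1.rows <> i2.rows   -- only tables still out of sync
ORDER BY o1.name;
```

Note the sysobjects.id values are not comparable across databases, which is why the join between the two sides is on name rather than id.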
I want to get Informix database table information and column information such as
table names
column names of the table
data types of the columns
data type length (ex: if column is varchar)
constraints on the columns
I can find the table names using this query:
select tabname from systables
and the column names using:
SELECT TRIM(c.colname) AS table_dot_column
FROM "informix".systables AS t, "informix".syscolumns AS c
WHERE t.tabname = 'agent_status'
AND t.tabtype = 'T'
and t.tabid = c.tabid
AND t.tabid >= 100 ;
but I am not able to find the data types and constraints on the columns.
Can anyone tell me the SQL query to get the total details of the table mentioned above?
Wow! That's a complex query - best treated as at least two, probably three queries; or maybe that's what you had in mind anyway.
You might want to select tabid and owner in the first query, and it is good form to use "informix".systables rather than just systables (though that only really matters in a MODE ANSI database, but then it really does matter).
The query on syscolumns is fine, though the t.tabid >= 100 clause is probably redundant, unless you definitively want to prevent people learning about the columns in system catalog tables. Also, it can be helpful to know about the columns in a view, so the tabtype = 'T' might be too stringent.
Decoding the data types is fiddly. For the built-in types, it is not too difficult; for user defined types, it is considerably harder work. The coltype and collength (and extended_d) tell you about the type. You can find C code to translate the basic types in my SQLCMD package, in sqltypes.ec. You can find some simple SQL (that may not be complete) in $INFORMIXDIR/etc/xpg4_is.sql.
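As an illustration of the basic decoding, here is a deliberately partial sketch covering only a handful of the common built-in types (the full mapping, including user-defined types, is what the sqltypes.ec code handles). The MOD(coltype, 256) strips the NOT NULL flag, which coltype carries as an added 256:

```sql
-- Partial decode of Informix coltype for a few common built-in types.
-- MOD(coltype, 256) removes the NOT NULL flag (+256); see sqltypes.ec
-- in the SQLCMD package for a complete treatment.
SELECT TRIM(c.colname) AS colname,
       c.collength,
       CASE MOD(c.coltype, 256)
           WHEN  0 THEN 'CHAR'
           WHEN  1 THEN 'SMALLINT'
           WHEN  2 THEN 'INTEGER'
           WHEN  3 THEN 'FLOAT'
           WHEN  5 THEN 'DECIMAL'
           WHEN  7 THEN 'DATE'
           WHEN 10 THEN 'DATETIME'
           WHEN 13 THEN 'VARCHAR'
           ELSE 'other'
       END AS type_name,
       CASE WHEN c.coltype >= 256 THEN 'NOT NULL' ELSE '' END AS nullability
FROM "informix".systables AS t, "informix".syscolumns AS c
WHERE t.tabid = c.tabid
  AND t.tabname = 'agent_status';
```

For DECIMAL and MONEY, collength itself is encoded (precision and scale packed into one value), so even this sketch understates how fiddly a complete decoder gets.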
Constraint information is stored in sysconstraints and related tables. You can find code for some constraints in the SQLCMD source already mentioned (file sqlinfo.ec).
Given a database (Sybase) with many tables, I would like to write a SQL query that calculates, for each table, the number of rows and the number of columns.
Unfortunately, my SQL is a bit rusty. I can generate the table names:
select name from sysobjects where type = 'U'
but how to bind the table names returned to T in:
select count(*) from T
is beyond me. Is it even possible to do this kind of thing?
I don't use Sybase, but the online docs indicate the row counts are in systabstats and the columns are in syscolumns.
SELECT sysobjects.name,
(SELECT COUNT(*) FROM syscolumns WHERE syscolumns.id = sysobjects.id) AS cols,
systabstats.rowcnt
FROM sysobjects
JOIN systabstats
ON (sysobjects.id = systabstats.id AND sysobjects.type = 'U' AND systabstats.indid = 0)
As fredt has given the answer, I'll just provide some extra info.
The built in procedure sp_spaceused "tablename" will give you the number of rows for a selected table, along with details of how much storage space it's using. Used without the parameter it provides storage usage for the current database as a whole.
You can look at the SQL in the various system stored procedures to see where they get their information from. sp_spaceused and sp_help would both be useful for you in this. They live in the sybsystemprocs database. Just be careful not to modify any of those procedures.
There are various versions of a stored procedure called sp_rowcount floating around the internet that provide what you ask for (the row count, anyway), but inside they are equivalent to the select statement from fredt. The one I use also provides the index count and table locking scheme. I don't recall exactly where I got mine, so I don't want to just distribute it in case I upset someone's copyright.
I'm trying to write code for a batch import of lots of rows into the database.
Currently I bulk copy the raw data (from a .csv file) into a staging table, so that it's all at the database side. That leaves me with a staging table full of rows that identify 'contacts'. These now need to be moved into other tables of the database.
Next I copy over the rows from the staging table that I don't already have in the contacts table, and for the ones I do already have, I need to update the column named "GroupToBeAssignedTo", indicating a later operation I will perform.
I have a feeling I'm going about this wrong. The query isn't efficient and I'm looking for advice of how I could do this better.
update [t1]
set [t1].GroupToBeAssignedTo = [t2].GroupToBeAssignedTo from Contacts [t1]
inner join ContactImportStaging [t2] on [t1].UserID = [t2].UserID AND [t1].EmailAddress = [t2].EmailAddress AND [t2].GUID = #GUID
where not exists
(
select GroupID, ContactID from ContactGroupMapping
where GroupID = [t2].GroupToBeAssignedTo AND ContactID = [t1].ID
)
Might it be better to just import all the rows without checking for duplicates first and then 'clean' the data afterwards? Looking for suggestions of where I'm going wrong. Thanks.
EDIT: To clarify, the question is regarding MS SQL.
This answer is slightly "I wouldn't start from here", but it's the way I'd do it ;)
If you've got the Standard or Enterprise editions of MS SQL Server 2005, and you have access to SQL Server Integration Services, this kind of thing is a doddle to do with a Data Flow.
Create a data source linked to the CSV file (it's faster if it's sorted by some field)
...and another to your existing contacts table (using ORDER BY to sort it by the same field)
Do a Merge Join on their common field -- you'll need to use a Sort transformation if either of the two sources isn't already sorted
Do a Conditional split to focus only on rows that aren't already in your table (i.e. a table-unique field is "null", i.e. the merge join didn't actually merge for that row)
Use an OLEDB destination to input to the table.
Probably more individual steps than a single insert-with-select statement, but it'll save your staging table, and it's pretty intuitive to follow. Plus, you're probably already licensed to use it, and it's pretty easy :)
Next I copy over the rows from the staging table that I don't already have in the contacts table
It seems that implies that ContactGroupMapping does not have records matching Contacts.ID, in which case you can just omit the NOT EXISTS clause:
UPDATE [t1]
SET [t1].GroupToBeAssignedTo = [t2].GroupToBeAssignedTo
FROM Contacts [t1]
INNER JOIN
ContactImportStaging [t2]
ON [t1].UserID = [t2].UserID
AND [t1].EmailAddress = [t2].EmailAddress
AND [t2].GUID = #GUID
Or am I missing something?
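For the other half of the job described in the question (copying over the staging rows that aren't in Contacts yet), a NOT EXISTS insert along the same lines would look like this sketch. The column list is an assumption based on the columns mentioned in the question; the real Contacts table presumably has more:

```sql
-- Insert staging rows not already present in Contacts,
-- matching on UserID + EmailAddress like the UPDATE above.
INSERT INTO Contacts (UserID, EmailAddress, GroupToBeAssignedTo)
SELECT s.UserID, s.EmailAddress, s.GroupToBeAssignedTo
FROM ContactImportStaging AS s
WHERE s.GUID = #GUID
  AND NOT EXISTS (
      SELECT 1
      FROM Contacts AS c
      WHERE c.UserID = s.UserID
        AND c.EmailAddress = s.EmailAddress
  );
```

On SQL Server 2008 or later, both the UPDATE and this INSERT could be folded into a single MERGE statement keyed on the same columns, which avoids scanning the staging table twice.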