U-SQL Catalog metadata views - azure-data-lake

In the mature RDBMS world we have a lot of catalog views that describe metadata and can be used for code generation of maintenance scripts.
Does something like this exist in U-SQL? For example, I want to generate a U-SQL script that creates statistics for some columns in some U-SQL tables.
SELECT
'CREATE STATISTICS st__' + t.name + '_' + c.name + ' ON ' + t.name + '(' +
c.name + ') WITH FULLSCAN;'
FROM
sys.tables t
INNER JOIN
sys.columns c ON t... = c....
This would generate a script that creates the statistics for me. Do such system views exist?
(It looks like I could use the PowerShell API, but I'm not sure it is powerful enough.)

Catalog views are now available; please see Catalog Views (U-SQL).

The PowerShell scripts are supposed to provide you with all the information you need to generate such scripts. If something is missing or not working, please let us know.
We have catalog views on our roadmap for later this year.

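$adlaAccount = "myacc";           # Data Lake Analytics account name
$tablePath = "dbname.schemaname"; # database.schema whose tables are enumerated
# Fetch the catalog entries for every table in that schema
$tables = Get-AzureRmDataLakeAnalyticsCatalogItem -Account $adlaAccount -ItemType Table -Path $tablePath
ForEach ($t in $tables)
{
    # Emit one CREATE STATISTICS statement per column of the table
    ForEach ($c in $t.ColumnList)
    {
        "CREATE STATISTICS st__$($t.Name)__$($c.Name) ON $($t.Name)($($c.Name)) WITH FULLSCAN;"
    }
}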
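If you would rather keep the text generation separate from the catalog lookup (for example, to unit-test the naming scheme), the same loop can be sketched in Python. This is a minimal sketch that assumes you have already collected the table and column names, from the cmdlet above or elsewhere, into a plain dictionary; the `statistics_script` name and the sample tables are hypothetical.

```python
def statistics_script(tables: dict[str, list[str]]) -> list[str]:
    """Build one CREATE STATISTICS statement per (table, column) pair.

    `tables` maps a table name to its column names; it is a hypothetical
    stand-in for the metadata returned by the catalog cmdlet.
    """
    statements = []
    for table, columns in tables.items():
        for column in columns:
            # Same naming scheme as the PowerShell loop: st__<table>__<column>
            statements.append(
                f"CREATE STATISTICS st__{table}__{column} "
                f"ON {table}({column}) WITH FULLSCAN;"
            )
    return statements


if __name__ == "__main__":
    # Hypothetical sample metadata
    sample = {"Orders": ["OrderId", "CustomerId"], "Customers": ["CustomerId"]}
    print("\n".join(statistics_script(sample)))
```

Keeping the generator a pure function means the naming convention can be checked without an Azure account.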

Related

How can I query a database of tables for a specific row value?

I'm working with a database with ~20 tables, and need to find the table which has a string value 'fees' in it.
How would I go about searching for the table which contains this?
I tried using
Select t.name as table_name, c.name as column_name
from sys.columns c
inner join sys.tables t
on c.object_id=t.object_id
Where c.name like '%fees%'
Obviously, though, this only returned the tables whose column names contained the word 'fees'.
Is there a way to find the table containing row values = 'fees' ?
SQL databases store structured data, so you need to look in the right place in that structure for the data you are after. That said, you can try several approaches depending on the total amount of data you have.
If the amount of data is relatively small, say a few hundred megabytes up to a gigabyte or so, you can dump each table into a text file and then use a tool like grep to find the files that contain the value.
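The dump-and-grep idea can be sketched in Python. To keep it self-contained, this sketch uses the standard-library `sqlite3` module and SQLite's `sqlite_master` catalog as stand-ins for your real database and export tool, and it searches the rendered text in memory instead of writing files. `dump_and_grep` is a hypothetical name, and the match is a plain case-sensitive substring test.

```python
import sqlite3


def quote_ident(name: str) -> str:
    # Double-quote an SQLite identifier, doubling any embedded quotes.
    return '"' + name.replace('"', '""') + '"'


def dump_and_grep(conn: sqlite3.Connection, needle: str) -> list[str]:
    """Return the names of user tables whose text dump contains `needle`."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    matches = []
    for table in tables:
        # Render each row as one tab-separated line, like a plain-text export.
        for row in conn.execute(f"SELECT * FROM {quote_ident(table)}"):
            line = "\t".join("" if v is None else str(v) for v in row)
            if needle in line:
                matches.append(table)
                break  # one hit is enough to report this table
    return matches


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE charges (description TEXT, amount INTEGER)")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.execute("INSERT INTO charges VALUES ('late fees', 25)")
    conn.execute("INSERT INTO users VALUES ('alice')")
    print(dump_and_grep(conn, "fees"))
```

This reads every row of every table, which is why it only makes sense for small databases.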
If the amount of data is considerably larger, you can query the system catalog as you did, but write the queries so that they return SQL SELECT statements as output. You can then execute those statements to see whether any of them return what you are looking for. E.g.:
select
    'select ' + quotename(t.name, '''') + ' as table_name, ' + quotename(c.name)
    + ' from ' + quotename(schema_name(t.schema_id)) + '.' + quotename(t.name)
    + ' where ' + quotename(c.name) + ' like ''%fees%'';'
from
    sys.columns as c
inner join
    sys.tables as t
    on c.object_id = t.object_id
where
    c.system_type_id in (167, 231) -- varchar, nvarchar; get the rest of these values by inspecting sys.types
    and t.type_desc = 'USER_TABLE'
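The generate-then-execute approach can also be sketched end to end. This sketch again assumes SQLite via Python's `sqlite3` module: `PRAGMA table_info` plays the role of `sys.columns`, and a declared type containing CHAR, CLOB or TEXT stands in for the `system_type_id in (167, 231)` filter. The function names are hypothetical, and the pattern is passed as a bound parameter rather than spliced into the generated SQL.

```python
import sqlite3


def quote_ident(name: str) -> str:
    # Double-quote an SQLite identifier, doubling any embedded quotes.
    return '"' + name.replace('"', '""') + '"'


def generate_searches(conn: sqlite3.Connection) -> list[tuple[str, str, str]]:
    """Build one SELECT per text-typed column, like the catalog query above."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    searches = []
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for _cid, column, col_type, *_rest in conn.execute(
                f"PRAGMA table_info({quote_ident(table)})"):
            if any(word in col_type.upper() for word in ("CHAR", "CLOB", "TEXT")):
                sql = (f"SELECT 1 FROM {quote_ident(table)} "
                       f"WHERE {quote_ident(column)} LIKE ? LIMIT 1")
                searches.append((table, column, sql))
    return searches


def find_value(conn: sqlite3.Connection, pattern: str) -> list[tuple[str, str]]:
    """Run every generated SELECT and report the (table, column) pairs that match."""
    return [(table, column)
            for table, column, sql in generate_searches(conn)
            if conn.execute(sql, (pattern,)).fetchone() is not None]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE charges (description TEXT, amount INTEGER)")
    conn.execute("CREATE TABLE users (name VARCHAR(20))")
    conn.execute("INSERT INTO charges VALUES ('Late Fees', 25)")
    print(find_value(conn, "%fees%"))
```

Note that SQLite's `LIKE` is case-insensitive for ASCII by default, so `'%fees%'` also matches `'Late Fees'`; in SQL Server the behavior depends on the column collation.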

Use T-SQL to detect XML dependency

I have an XML schema bound to a table. However, sometimes testers piggyback and bind to this schema too. When there is such a "ninja" XML table reference, any alteration to this schema is painful.
I'd like to run a query before altering the schema and raise an exception if the XML schema is bound to more than one table. I've looked at sys.sql_dependencies and a few of the other sys.xml_XXXX tables, but it's not clear how to do this in T-SQL. Is something like this possible?
Something like this might be helpful:
select object_name(object_id) as TableName,
col_name(object_id, column_id) as ColumnName
from sys.column_xml_schema_collection_usages as U
inner join sys.xml_schema_collections as S
on U.xml_collection_id = S.xml_collection_id
where S.name = 'YourXMLSchemaCollectionName'
This one finds where the schema is used by an XML parameter:
select object_name(object_id)
from sys.parameter_xml_schema_collection_usages as P
inner join sys.xml_schema_collections as S
on P.xml_collection_id = S.xml_collection_id
where S.name = 'YourXMLSchemaCollectionName'
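To turn the first query's results into the guard described in the question, you could fail as soon as the rows name more than one distinct table. A minimal sketch in Python, assuming you fetch the (table name, column name) rows of the first query with whatever database driver you already use; `assert_single_binding` is a hypothetical helper and the table names are made up. The same check could also be written in T-SQL as a `COUNT(DISTINCT object_id)` over `sys.column_xml_schema_collection_usages`.

```python
from collections.abc import Iterable


def assert_single_binding(usages: Iterable[tuple[str, str]]) -> list[str]:
    """Raise if an XML schema collection is bound to columns of more than one table.

    `usages` holds (table_name, column_name) rows, as returned by the
    sys.column_xml_schema_collection_usages query.
    """
    tables = sorted({table for table, _column in usages})
    if len(tables) > 1:
        raise RuntimeError(
            f"XML schema collection is bound to {len(tables)} tables: "
            + ", ".join(tables)
        )
    return tables


if __name__ == "__main__":
    # Hypothetical rows: two XML columns of the same table
    print(assert_single_binding([("Orders", "Payload"), ("Orders", "Audit")]))
```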

Accessing Database Meta Data

I'm running into a tough issue. I have a database on Microsoft SQL Server 2008, and in this database there are many tables. The tables were auto-generated and do not have meaningful names. There is one particular table that I need, and I cannot seem to find it.
I know the names of a few of the columns in the table. Is there a way I can go through all the tables one at a time, looking at the names of the columns and seeing whether they match the ones I know?
If they do, then I can look further into the table to see if it is the one I am looking for. Does this sound like a good approach to the problem? Is it possible? Any ideas of where to start?
SELECT DISTINCT OBJECT_SCHEMA_NAME([object_id]),
    OBJECT_NAME([object_id])
FROM sys.columns
WHERE name IN ('column 1', 'column 2'
/* , ... other columns */);
EDIT by request, in case the OP meant tables that have ALL of the columns rather than ANY of them:
SELECT OBJECT_SCHEMA_NAME([object_id]), name
FROM sys.tables AS t
WHERE EXISTS
(
    SELECT 1 FROM sys.columns
    WHERE name = 'column 1'
    AND [object_id] = t.[object_id]
)
AND EXISTS
(
    SELECT 1 FROM sys.columns
    WHERE name = 'column 2'
    AND [object_id] = t.[object_id]
)
/* ... repeat for other columns ... */
An alternative to Aaron's answer, using INFORMATION_SCHEMA.COLUMNS instead of sys.columns:
SELECT TABLE_SCHEMA, TABLE_NAME
FROM
    INFORMATION_SCHEMA.COLUMNS
WHERE
    COLUMN_NAME IN ('column 1', 'column 2')
GROUP BY TABLE_SCHEMA, TABLE_NAME
HAVING COUNT(COLUMN_NAME) = 2 -- 2 = number of column names listed above
See this Data.SE query for a working example
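The ALL-versus-ANY distinction in the queries above can be illustrated with a small Python sketch. It assumes SQLite via the standard-library `sqlite3` module, with `PRAGMA table_info` standing in for `sys.columns`/`INFORMATION_SCHEMA.COLUMNS`; `tables_with_columns` and its `match_all` flag are hypothetical names, and column names are compared case-sensitively here, whereas SQL Server usually compares them case-insensitively.

```python
import sqlite3


def quote_ident(name: str) -> str:
    # Double-quote an SQLite identifier, doubling any embedded quotes.
    return '"' + name.replace('"', '""') + '"'


def tables_with_columns(conn: sqlite3.Connection, wanted: list[str],
                        match_all: bool = True) -> list[str]:
    """Return user tables having all (or, with match_all=False, any) of `wanted`."""
    wanted_set = set(wanted)
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    result = []
    for table in tables:
        # Column names are the second field of each PRAGMA table_info row.
        columns = {row[1] for row in conn.execute(
            f"PRAGMA table_info({quote_ident(table)})")}
        hit = (wanted_set <= columns) if match_all else bool(wanted_set & columns)
        if hit:
            result.append(table)
    return result


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (col1, col2, x)")
    conn.execute("CREATE TABLE t2 (col1)")
    conn.execute("CREATE TABLE t3 (y)")
    print(tables_with_columns(conn, ["col1", "col2"]))
    print(tables_with_columns(conn, ["col1", "col2"], match_all=False))
```

The subset test plays the role of the chained `EXISTS` clauses (or `HAVING COUNT(...) = 2`), while the intersection test corresponds to the simpler `IN (...)` query.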
With the scripts above, you are limited to SQL wildcarding, which can be quite restrictive. You can use SchemaCrawler grep to search your database more powerfully, using regular expressions. SchemaCrawler also has features for finding tables related by foreign keys; for example, you can ask for all tables that have a customer address column, along with the tables that refer to them. SchemaCrawler is a command-line tool that is bundled with a Microsoft SQL Server database driver.
Sualeh Fatehi, SchemaCrawler
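To show what regular-expression matching over catalog metadata buys you compared with `LIKE` wildcards, here is a rough Python sketch. It is not SchemaCrawler and does not reproduce its options; it assumes SQLite via the standard-library `sqlite3` module as a stand-in catalog, uses Python's `re.search` semantics, and `grep_columns` plus the sample tables are hypothetical.

```python
import re
import sqlite3


def quote_ident(name: str) -> str:
    # Double-quote an SQLite identifier, doubling any embedded quotes.
    return '"' + name.replace('"', '""') + '"'


def grep_columns(conn: sqlite3.Connection, regex: str) -> list[tuple[str, str]]:
    """Return (table, column) pairs whose column name matches `regex`."""
    pattern = re.compile(regex)
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    hits = []
    for table in tables:
        for row in conn.execute(f"PRAGMA table_info({quote_ident(table)})"):
            if pattern.search(row[1]):  # row[1] is the column name
                hits.append((table, row[1]))
    return hits


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id, name, billing_address)")
    conn.execute("CREATE TABLE orders (id, ship_address, customer_id)")
    # Anchors and inline flags like (?i) have no direct LIKE equivalent.
    print(grep_columns(conn, r"(?i)address$"))
```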

What is syncobj in SQL Server

When I run this script to search for particular text in sys.columns, I get a lot of rows like "dbo.syncobj_0x3934443438443332".
SELECT c.name, s.name + '.' + o.name
FROM sys.columns c
INNER JOIN sys.objects o ON c.object_id=o.object_id
INNER JOIN sys.schemas s ON o.schema_id=s.schema_id
WHERE c.name LIKE '%text%'
If I understand correctly, they are replication objects. Is that right? Can I just exclude them from my query with something like o.name NOT LIKE '%syncobj%', or is there another way?
Thank you.
I've found a solution, though I don't know whether it's the best one.
SELECT c.name, s.name + '.' + o.name
FROM sys.columns c
INNER JOIN sys.objects o ON c.object_id=o.object_id
INNER JOIN sys.schemas s ON o.schema_id=s.schema_id
WHERE c.name LIKE '%text%' AND o.type = 'U'
The result is fine now. As I said, syncobj objects are replication objects and have no meaning for us; they're used for replication purposes only.
http://www.developmentnow.com/g/114_2007_12_0_0_443938/syncobj-views.htm
EDIT:
I forgot to add: syncobj objects are stored in the database as views, so if you need a list of views, you'll probably need to filter them out as I did in my question.
While checking the difference between the syncobj views and my own views, the only difference I found is the is_ms_shipped column: it's 1 for syncobj views and 0 for the others, which means the syncobj views are created by the system.
P.S. I'll wait for some time and if nobody gives another answer, I'll accept mine.
These views are created when you set up replication for an article that does not include all the fields of the original table, or that otherwise changes its metadata. If you generate a script from the publication, it shows how the view is created (see below). The view provides an object from which the bcp extracts are generated during the initial snapshot.
Here is an example:
-- Adding the article synchronization object
exec sp_articleview @publication = N'publication_data', @article = N'tablename',
    @view_name = N'syncobj_0x4239373642443436', @filter_clause = N'',
    @force_invalidate_snapshot = 1, @force_reinit_subscription = 1
GO
P.S. I recently had a problem when I dropped replication: it failed to drop these views, and I had to drop them manually before I could reuse a replication script. The error message was:
Msg 2714, Level 16, State 3: There is already an object named 'syncobj_0x3437324238353830' in the database.
This caused the bcp to fail during the snapshot.

Given a column name how can I find which tables in database contain that column?

Given a column name, how can I find which tables in the database contain that column?
Or alternatively:
How can I find out whether that particular column exists in all tables in the database?
Note: Please explain answers with examples, as that is how I learn the most from an answer.
Edit: I am using MySQL Database.
SELECT TABLE_SCHEMA, TABLE_NAME
FROM information_schema.columns
WHERE COLUMN_NAME = 'mycolumn'
  AND TABLE_SCHEMA = DATABASE(); -- restrict to the current database
It depends on the database you are using. Many database systems expose a set of tables or views that contain details of the schema. For example, you can get schema information from the SYSTABLE and SYSCOLUMN views in Sybase ASA.
In SQL Server:
select distinct t.name
from sys.Columns c
inner join sys.tables t on c.object_id = t.object_id
where c.name = 'YOUR_COLUMNNAME'
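The second half of the question (does the column exist in every table?) can be answered by checking each table's column list rather than only listing the matches. A minimal sketch, assuming SQLite via Python's standard-library `sqlite3` module as a stand-in for MySQL's `information_schema.columns`; `column_presence` and the sample tables are hypothetical, and names are compared case-insensitively.

```python
import sqlite3


def quote_ident(name: str) -> str:
    # Double-quote an SQLite identifier, doubling any embedded quotes.
    return '"' + name.replace('"', '""') + '"'


def column_presence(conn: sqlite3.Connection, column: str) -> dict[str, bool]:
    """Map each user table to whether it has a column named `column`."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    presence = {}
    for table in tables:
        names = {row[1].lower() for row in conn.execute(
            f"PRAGMA table_info({quote_ident(table)})")}
        presence[table] = column.lower() in names
    return presence


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (id, MyColumn)")
    conn.execute("CREATE TABLE b (id)")
    presence = column_presence(conn, "mycolumn")
    print(presence)
    print("in every table:", all(presence.values()))
```

Returning a per-table flag answers both questions at once: the `True` entries are the tables that contain the column, and `all(...)` tells you whether every table has it.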