How to build a query that shows the size of segments - sql

I have a metadata table, and I need to use the data in it to tell me about segment sizes for the tables it lists.
Here's an example of the kind of data in there; it's broken up into 4 columns, and there are many rows:
TABLE_A  TABLE_B  JOIN_COND          WHERE_CLAUSE
AZ       AT       A.AR_ID = B.AR_ID  A.DE = 'AJS'
AZ1      AT1      A.AR_ID = B.AR_ID  A.DE = 'AJS' AND B.END_DATE > '30-NOV-2015'
AZ2      AT3      A.AR_ID = B.AR_ID  A.DE = 'AJS' AND B.END_DATE > '30-NOV-2015'
Here's what I need to accomplish:
1. Some sort of loop, perhaps? One that finds the size of each single "TABLE_A" in kilobytes.
2. Build a query that estimates the space that would be needed to create a new table based on a subset of the data, from a query something like this:
...
SELECT *
FROM TABLE_A a, TABLE_B b
WHERE A.AR_ID = B.AR_ID
AND A.FININS_CDE = 'AJS'
AND B.END_DTE > '30-NOV-2015'
... but for every row in the metadata table. So at the end of the process, if there were 100 rows in the table, I would get 200 results:
100 rows telling me the size of each table A
100 results telling me the size that would be taken up by the subset with the WHERE clause.

You're going to need to use dynamic SQL for this; see the Oracle documentation on dynamic SQL (EXECUTE IMMEDIATE).
You'll need to build some dynamic SQL for each of your tables:
SELECT TABLE_A,
       'select segment_name, segment_type, bytes/1024/1024 MB
        from dba_segments
        where segment_type = ''TABLE''
        and segment_name = ''' || TABLE_A || ''''
FROM <your metadata table>
Then you'll need to loop over the result set, execute each of the statements, and capture the results.
Once you've executed all of the statements, you'll have the answer for 1.
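A minimal PL/SQL sketch of that loop (META_TABLES is a stand-in name for your metadata table; since the question asks for kilobytes, this divides bytes by 1024 rather than 1024*1024):
DECLARE
  v_kb NUMBER;
BEGIN
  FOR r IN (SELECT table_a FROM meta_tables) LOOP
    -- NVL covers tables that have no segment allocated yet.
    EXECUTE IMMEDIATE
      'SELECT NVL(SUM(bytes) / 1024, 0)
         FROM dba_segments
        WHERE segment_type = ''TABLE''
          AND segment_name = :t'
      INTO v_kb
      USING r.table_a;
    DBMS_OUTPUT.PUT_LINE(r.table_a || ': ' || v_kb || ' KB');
  END LOOP;
END;
/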
The next part is a little more tricky: you'll need to find the data type size of each column, then add these together to get the size of one row for one table. You can use VSIZE to get the stored size of each column value.
Using more dynamic SQL, you can then build your actual statements and execute each one as a SELECT COUNT(*) to get the actual number of rows. Multiply the number of rows by the size of a full row for each table and you'll have your answer. You'll obviously need another loop for that too.
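A rough sketch of that second loop, which assembles each subset query from the metadata columns and counts the rows it would return (META_TABLES is again a stand-in; multiply each count by the per-row size you derived from VSIZE to approximate the space needed):
DECLARE
  v_rows NUMBER;
BEGIN
  FOR r IN (SELECT table_a, table_b, join_cond, where_clause FROM meta_tables) LOOP
    -- Build the subset query from the metadata row and count its rows.
    EXECUTE IMMEDIATE
      'SELECT COUNT(*) FROM ' || r.table_a || ' a, ' || r.table_b || ' b'
      || ' WHERE ' || r.join_cond || ' AND ' || r.where_clause
      INTO v_rows;
    DBMS_OUTPUT.PUT_LINE(r.table_a || '/' || r.table_b || ': ' || v_rows || ' rows');
  END LOOP;
END;
/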
Does that all make sense?

Related

Snowflake sql table name wildcard

What is a good way to "select" from multiple tables at once when the list of tables is not known in advance in snowflake sql?
Something that simulates
Select * from mytable*
which would fetch same results as
Select * from mytable_1
union
Select * from mytable_2
...
I tried doing this in multiple steps.
show tables like 'mytable%';
set mytablevar =
  (select listagg("name", ' union ') as table_
   from table(result_scan(last_query_id())));
The idea was to use the variable mytablevar to store the union of all tables in a subsequent query, but the variable exceeded the 256-character size limit, as the list of tables is quite large.
Even if you did not hit the 256-character limit, it would not help you query all these tables. How would you use that session variable?
If you have multiple tables which have the same structure and hold similar data that you need to query together, why is the data not in one big table? You can use Snowflake's clustering feature to distribute data across micro-partitions based on a specific column.
https://docs.snowflake.com/en/user-guide/tables-clustering-micropartitions.html
Anyway, you may create a stored procedure which will create/replace a view.
https://docs.snowflake.com/en/sql-reference/stored-procedures-usage.html#dynamically-creating-a-sql-statement
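A rough sketch of such a procedure, written as a JavaScript stored procedure in the style of the linked doc (the procedure name matches the CALL below; it assumes the matching tables all share the same structure, since it unions them, and note that unquoted table names are stored in upper case, so the pattern should be upper case too):
CREATE OR REPLACE PROCEDURE UPDATE_MY_VIEW(VIEW_NAME STRING, TABLE_PATTERN STRING)
RETURNS STRING
LANGUAGE JAVASCRIPT
EXECUTE AS CALLER
AS
$$
  // Find all tables whose names match the pattern.
  var rs = snowflake.execute({
    sqlText: "SELECT table_name FROM information_schema.tables" +
             " WHERE table_name LIKE ? AND table_type = 'BASE TABLE'",
    binds: [TABLE_PATTERN]
  });
  var parts = [];
  while (rs.next()) {
    parts.push("SELECT * FROM " + rs.getColumnValue(1));
  }
  if (parts.length === 0) {
    return "No tables match " + TABLE_PATTERN;
  }
  // Rebuild the view as the UNION of every matching table.
  snowflake.execute({
    sqlText: "CREATE OR REPLACE VIEW " + VIEW_NAME + " AS " + parts.join(" UNION ALL ")
  });
  return "View " + VIEW_NAME + " now covers " + parts.length + " tables";
$$;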
And then you can query that view:
CALL UPDATE_MY_VIEW( 'myview_name', 'table%' );
SELECT * FROM myview_name;

Will "Where 0=1" parse full table or just return column names

I came across this question:
SQL Server: Select Top 0?
I want to ask if I use the query
SELECT * FROM table WHERE 0=1
or
SELECT TOP 0 * FROM table
will it return just the column names instantly, or will it keep on parsing the whole table and in the end return zero results?
I have a production table with 10,000 rows - will it check the WHERE condition on each row?
The SQL Server query optimizer is smart enough to figure out that this WHERE condition can never ever produce a true result on any row, so it doesn't bother actually scanning the table.
If you look at the actual execution plan for such a query, it's easy to see that nothing is being done and the query returns immediately.
MySQL is likewise smart enough to detect that the condition is impossible:
desc SELECT * FROM table WHERE 0=1;
The plan output reports "Impossible WHERE", confirming that no rows are examined.
In the query
SELECT * FROM table WHERE 0=1
the WHERE clause can never be true, so SQL Server is not going to scan your whole table.
And in the query
SELECT TOP 0 * FROM table
you are asking for zero rows, so SQL Server is smart enough never to scan the table at all.
Both queries return only the column headers.
Both queries are used to get an empty result set from a table:
SELECT TOP 0 * FROM table
SELECT * FROM table WHERE 0=1
They are also useful when you want to:
get the same column structure as the source table (see the sketch below)
return column details but no data
run a cheap query to check connectivity
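For example, the 0=1 trick is a common way to clone a table's structure without copying any data (the table names here are just illustrations):
-- Creates an empty copy of dbo.Orders with the same columns
-- (constraints and indexes are not copied).
SELECT *
INTO dbo.Orders_Empty
FROM dbo.Orders
WHERE 0 = 1;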

Select * from n tables

Is there a way to write a query like:
select * from <some number of tables>
...where the number of tables is unknown? I would like to avoid using dynamic SQL. I would like to select all rows from all of the tables that have a specific prefix:
select * from t1
select * from t2
select * from t3
...
I don't know how many t(n) there might be (might be 1, might be 20, etc.). The t tables' column structures are not the same: some of them have 2 columns, some of them 3 or 4.
It would not be hard using dynamic SQL, but I wanted to know if there is a way to do this using something like sys.tables.
UPDATE
Basic database design explained
N companies will register/log in to my application
Each company will set up ONE table with x columns
(x depends on the type of business the company is, can be different, for example think of two companies: one is a Carpenter and the other is a Newspaper)
Each company will fill its own table using an API built by me
What I do with the data:
I have a "processor", that will be SQL or C# or whatever.
If there is at least one row for one company, I will generate a record in a COMMON table.
So the final results will be all in one table.
Anybody from any of those N companies will log in and will see the COMMON table filtered for his own company.
There would be no way to do that without dynamic SQL, and having different table structures does not help at all.
Update
There would be no easy way to return the desired output in one single result set (the result set would need at least as many columns as the table with the most columns, and don't even get me started on data type compatibility).
However, you should check @KM.'s answer. That will bring back multiple result sets.
To list ALL tables you could try:
EXEC sp_msforeachtable 'SELECT * FROM ?'
You can programmatically include/exclude tables by doing something like:
EXEC sp_msforeachtable 'IF LEFT(''?'',9)=''[dbo].[xy'' BEGIN SELECT * FROM ? END ELSE PRINT LEFT(''?'',9)'
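If you do end up accepting dynamic SQL, a minimal sketch built on sys.tables (the 't' prefix is just an illustration) could look like this:
-- Build one SELECT per table whose name starts with 't', then run them all.
-- Each SELECT comes back as its own result set, like sp_msforeachtable.
DECLARE @sql NVARCHAR(MAX) = N'';

SELECT @sql = @sql + N'SELECT * FROM ' + QUOTENAME(s.name) + N'.' + QUOTENAME(t.name)
            + N';' + CHAR(10)
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE t.name LIKE N't%';

EXEC sys.sp_executesql @sql;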

Limit number of rows - TSQL - Merge - SQL Server 2008

Hi all, I have the following MERGE SQL script, which works fine for a relatively small number of rows (up to about 20,000, I've found). However, sometimes the data I have in Table B can be up to 100,000 rows, and it has to be merged into Table A (which is currently at 60 million rows). This takes quite a while to process, which is understandable as it has to merge 100,000 rows with 60 million existing records!
I was just wondering if there was a better way to do this. Or is it possible to have some sort of count, so that I merge 20,000 rows from Table B into Table A, then delete those merged rows from Table B, then do the next 20,000 rows, and so on until Table B has no rows left?
Script:
MERGE Table_A AS [target]
USING Table_B AS [source]
ON ([target].recordID = [source].recordID)
WHEN NOT MATCHED BY TARGET THEN
    INSERT ([recordID], [Field 1], [Field 2], [Field 3], [Field 4], [Field 5])
    VALUES ([source].[recordID], [source].[Field 1], [source].[Field 2],
            [source].[Field 3], [source].[Field 4], [source].[Field 5]);
MERGE is overkill for this since all you want is to INSERT missing values.
Try:
INSERT INTO Table_A
    ([recordID], [Field 1], [Field 2], [Field 3], [Field 4], [Field 5])
SELECT B.[recordID],
       B.[Field 1], B.[Field 2], B.[Field 3], B.[Field 4], B.[Field 5]
FROM Table_B AS B
WHERE NOT EXISTS (SELECT 1 FROM Table_A AS A
                  WHERE A.recordID = B.recordID)
In my experience MERGE can perform worse for simple operations like this. I try to reserve it for when you need varying operations depending on conditions, like an UPSERT.
You can definitely do (SELECT TOP 20000 * FROM B ORDER BY [some_column]) AS [source] in USING, and then delete those records after the MERGE. So your pseudo-code will look like:
1. Merge the top 20,000 rows
2. Delete those 20,000 records from the source table
3. Check @@ROWCOUNT; if it's 0, exit, otherwise go to step 1
I'm not sure if it runs any faster than merging all the records at the same time.
Also, are you sure you need MERGE? From what I see in your code INSERT INTO ... SELECT should also work for you.
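A minimal batching sketch along those lines (the 20,000 batch size, the table names, and the INT key type are assumptions; the NOT EXISTS filter replaces the MERGE, as suggested above):
-- Process Table_B in batches until it is empty.
DECLARE @batch TABLE (recordID INT PRIMARY KEY); -- assumes recordID is an INT key

WHILE 1 = 1
BEGIN
    DELETE FROM @batch;

    -- Pick the next batch of source keys.
    INSERT INTO @batch (recordID)
    SELECT TOP (20000) recordID
    FROM Table_B
    ORDER BY recordID;

    IF @@ROWCOUNT = 0 BREAK;

    -- Insert the batch rows that are not already in the target.
    INSERT INTO Table_A ([recordID], [Field 1], [Field 2], [Field 3], [Field 4], [Field 5])
    SELECT B.[recordID], B.[Field 1], B.[Field 2], B.[Field 3], B.[Field 4], B.[Field 5]
    FROM Table_B AS B
    JOIN @batch AS k ON k.recordID = B.recordID
    WHERE NOT EXISTS (SELECT 1 FROM Table_A AS A WHERE A.recordID = B.recordID);

    -- Remove the processed batch from the source.
    DELETE B
    FROM Table_B AS B
    JOIN @batch AS k ON k.recordID = B.recordID;
END;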

How do I limit the rowcount in an SSIS data flow task?

I have an Oracle source, and I'm getting the entire table, and it is being copied to a SQL Server 2008 table that looks the same. Just for testing, I would like to only get a subset of the table.
In the old DTS packages, under Options on the data transform, I could set a first and last record number, and it would only get that many records.
If I were doing a query, I could change it to a select top 5000 or set rowcount 5000 at the top (maybe? This is an Oracle source). But I'm grabbing the entire table.
How do I limit the rowcount when selecting an Oracle table?
You can use the Row Count component in the data flow, and after the component put User::rowCount <= 500 in the precedence constraint condition while loading into the target. Whenever the count exceeds 500, the process stops inserting data into the target table.
It's been a while since I've touched PL/SQL, but I would think that you could simply add a WHERE condition of "rownum <= n", where n is the number of rows you want in your sample. ROWNUM is a pseudo-column available in every Oracle query . . . it's a handy feature for problems like this (it's roughly equivalent to T-SQL's row_number() function, without the ability to partition and sort, I think). This would keep you from having to bring the whole table into memory:
select col1, col2
from tableA
where rownum <= 10;
For future reference (and only because I've been working with it lately), DB2's equivalent for this is the clause "fetch first n rows only" at the end of the statement:
select col1, col2
from tableA
fetch first 10 rows only;
Hope I've not been too off base.
The Row Sampling component in the data flow restricts the number of rows. Just insert it between your source and destination and set the number of rows. It is very useful for a large amount of data, and when you cannot modify the query. In my case, I execute a stored procedure in the source.