What is the maximum number of table names under a single schema in a DB2 subsystem?

What is the maximum number of results possible from the following SQL query for DB2 on z/OS?
SELECT NAME FROM SYSIBM.SYSTABLES WHERE TYPE='T' AND CREATOR=? ORDER BY NAME ASC
This query is intended to fetch a list of all table names under a specific schema/creator in a DB2 subsystem.
I am having trouble finding a definitive answer. According to IBM's "Limits in DB2 for z/OS" article, the maximum number of internal objects for a DB2 database is 32767. Objects include views, indexes, etc.
I would prefer a more specific answer for maximum number of table names under one schema. For instance, here is an excerpt from an IDUG thread for a related question:
Based on the limit of 32767 objects in one database, where each tablespace takes two entries, and tables and indexes take one entry each, then the theoretical max would seem to be, with one tablespace per database,
32767 - 2 (for the single tablespace) = 32765, and 32765 / 2 = 16382 tables, assuming you need at least one index per table.
Are these assumptions valid (each tablespace takes two entries, at least one index per table)?

assuming you need at least one index per table.
That assumption doesn't seem valid. Tables don't always have indexes. And you are thinking about edge cases where someone is already doing something weird, so I definitely wouldn't presume there will be indexes on each table.*
If you really want to handle all possible cases, I think you need to assume that you can have up to 32765 tables (two object identifiers are needed for a table space, as mentioned in the quote).
*Also, the footnote in the documentation you linked indicates that an index takes up two internal object descriptors. So the math in that quote is also incorrect: with one OBD per table and two per index, that is 32765 / 3 ≈ 10921 tables if each of them had an index. But I don't think that is relevant anyway.

I'm not sure your assumptions are appropriate, because there are just too many possibilities to consider, and in the grand scheme of things it probably doesn't make much difference to the answer from your point of view.
I'll rephrase your question to make sure I understand you correctly: you are after the maximum number of rows (i.e. the worst-case scenario) that could possibly be returned by your SQL query?
DB2 System Limits
Maximum databases: limited by system storage and EDM pool size
Maximum number of databases: 65217
Maximum number of internal objects for each database: 32767
The number of internal object descriptors (OBDs) for external objects is as follows:
Table space: 2 (minimum required)
Table: 1
Therefore, the maximum number of rows from your SQL query is:
65217 * (32767 - 2) = 2,136,835,005
N.B. DB2 for z/OS does not have a 1:1 ratio between schemas and databases.
N.N.B. This figure assumes 32,765 tables per table space per database, i.e. a 32765:1:1 ratio.
I'm sure ~2 billion rows is NOT a "reasonable" expectation for the maximum number of table names that might show up under a schema, but it is possible.
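If you want to see how far a real subsystem is from that ceiling, a catalog query along these lines should give actual per-schema counts (a rough sketch only; adjust for your catalog access and authorizations):
-- Rough sketch: count the actual tables per creator/schema in the DB2 catalog
SELECT CREATOR, COUNT(*) AS TABLE_COUNT
FROM SYSIBM.SYSTABLES
WHERE TYPE = 'T'
GROUP BY CREATOR
ORDER BY TABLE_COUNT DESC;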

Related

Selecting one column from a table that has 100 columns

I have a table with 100 columns (yes, it's a code smell and arguably a less than optimal design). The table has an 'id' as PK. No other column is indexed.
So, if I fire a query like:
SELECT first_name from EMP where id = 10
Will SQL Server (or any other RDBMS) have to load the entire row (all columns) in memory and then return only the first_name?
(In other words, will it load the page that contains the row with id = 10, if it isn't in memory already?)
I think the answer is yes, unless it has column markers within a row. I understand there might be optimization techniques, but is that the default behavior?
[EDIT]
After reading some of your comments, I realized I asked an XY question unintentionally. Basically, we have tables with 100s of millions of rows with 100 columns each and receive all sorts of SELECT queries on them. The WHERE clause also changes but no incoming request needs all columns. Many of those cell values are also NULL.
So, I was thinking of exploring a column-oriented database to achieve better compression and faster retrieval. My understanding is that column-oriented databases will load only the requested columns. Yes, compression will help save space and hopefully improve performance as well.
For MySQL: Indexes and data are stored in "blocks" of 16KB. Each level of the B+Tree holding the PRIMARY KEY in your case needs to be accessed. For example, with a million rows that is about 3 blocks, one per level. Within the leaf block, there are probably dozens of rows, with all their columns (unless a column is "too big"; but that is a different discussion).
For MariaDB's ColumnStore: The contents of one column for 64K rows are held in a packed, compressed structure that varies in size and layout. Before getting to that, the clump of 64K rows must be located. After getting it, it must be unpacked.
In both cases, the structure of the data on disk is a compromise between speed and space, for both simple and complex queries.
Your simple query is easy and efficient for a regular RDBMS to perform, but messier to do in a columnstore. Columnstore is a niche market in which your query is atypical.
Be aware that fetching blocks is typically the slowest part of performing the query, especially when I/O is required. There is a cache of blocks in RAM.
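To make the difference concrete, here is a minimal sketch (the table and column names are made up, and it assumes a MariaDB server with the ColumnStore engine installed):
-- Row store: the whole row lives together inside a 16KB InnoDB page
CREATE TABLE emp_innodb (
  id INT NOT NULL PRIMARY KEY,
  first_name VARCHAR(50),
  last_name VARCHAR(50)
  -- imagine ~100 more columns here
) ENGINE=InnoDB;

-- Column store: each column is stored (and compressed) separately
CREATE TABLE emp_columnstore (
  id INT,
  first_name VARCHAR(50),
  last_name VARCHAR(50)
) ENGINE=ColumnStore;

-- InnoDB walks the PRIMARY KEY B+Tree and reads the leaf page that holds the
-- entire row; ColumnStore has to locate the 64K-row extents for id and
-- first_name, then unpack them.
SELECT first_name FROM emp_innodb WHERE id = 10;
SELECT first_name FROM emp_columnstore WHERE id = 10;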

Multiple tables vs one table with more columns

My chosen database is MongoDB, but the question should be database-independent.
So, for example, each record will have a flag that can take one of two possible values.
What are the pros and cons of:
Having 1 table with a column to hold the value of this flag,
versus:
Having 2 tables to hold the two different types of records, distinguished by the aforementioned flag?
Would this be cheaper in terms of storage, since you don't have that extra column?
Would this also be faster in queries, since you know exactly which table to look without having to perform a filter?
What is the common practice in industry?
Storage for a single column holding just a flag (e.g. active and archived) should be negligible. The query could be faster with two tables; however, your application becomes more complex, since you have to write 2 queries.
When you have only 2 distinct values and these values are more or less evenly distributed, an index will not be used, so the performance should be equal - unless you select the entire table.
It might be useful to have 2 tables if the flag values are not evenly distributed. For example, you have a rather small active data set which is queried frequently, and an archive data set which is much bigger but hardly ever queried.
If available, you can also work with partitioning, which is actually a good combination of both approaches; see the sketch below.
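In MariaDB/MySQL syntax, such a setup might look roughly like this (a sketch only; the table and column names are made up, and note that the partitioning column has to be part of the primary key):
CREATE TABLE records (
  id      BIGINT NOT NULL,
  flag    TINYINT NOT NULL,   -- e.g. 0 = active, 1 = archived
  payload VARCHAR(255),
  PRIMARY KEY (id, flag)      -- the partition key must be part of every unique key
)
PARTITION BY LIST (flag) (
  PARTITION p_active   VALUES IN (0),
  PARTITION p_archived VALUES IN (1)
);

-- Queries that filter on the flag only touch one partition,
-- while the application still sees a single logical table.
SELECT COUNT(*) FROM records WHERE flag = 0;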

What is the max number of columns per table for MariaDB?

Hi, I would like to know the maximum number of columns allowed per table for the different storage engines, as well as the max row size. I searched the MariaDB website documentation and could not find the information. Thank you.
MariaDB in its current form is still close enough to MySQL that the same limits apply. The MariaDB fork may diverge further as time goes on.
The actual answer for the maximum number of columns per table in MySQL is complex, because of differences in data types, storage engines, and metadata storage. Sorry there's not just a simple answer.
As #Manquer cites, there's an absolute limit of 64KB per row, but BLOB/TEXT columns don't count toward this limit.
InnoDB pages must fit at least two rows per page, and a page is 16KB minus some header information. So regardless of the number of columns, the row must fit in about 8000 bytes. But VARCHAR/BLOB/TEXT columns can overflow to additional pages in interesting ways. See http://www.mysqlperformanceblog.com/2010/02/09/blob-storage-in-innodb/ for the details.
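As an aside, the 65,535-byte declared row limit is easy to hit with just a few wide character columns. A rough sketch (the table name is made up): four utf8mb4 VARCHAR(5000) columns can need up to 4 * (5000 * 4 + 2) = 80,008 bytes, so the CREATE statement itself is rejected.
CREATE TABLE too_wide (
  c1 VARCHAR(5000) CHARACTER SET utf8mb4,
  c2 VARCHAR(5000) CHARACTER SET utf8mb4,
  c3 VARCHAR(5000) CHARACTER SET utf8mb4,
  c4 VARCHAR(5000) CHARACTER SET utf8mb4
) ENGINE=InnoDB;
-- Fails with ER_TOO_BIG_ROWSIZE ("Row size too large"); changing some columns
-- to TEXT/BLOB works because those don't count toward the 65,535-byte limit.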
But there are even further restrictions, based on the .FRM metadata file that applies to all storage engines. It gets really complex, read http://www.mysqlperformanceblog.com/2013/04/08/understanding-the-maximum-number-of-columns-in-a-mysql-table/ for the details and examples about that. There's too much to copy down here.
Given the latter blog article, I was able to design a table that failed at 59 columns.
MariaDB, being originally a fork of and drop-in replacement for MySQL, mostly follows the same design constraints as MySQL, although the MariaDB documentation does not explicitly say how many columns are allowed.
This number is highly dependent on a number of factors, including the storage engine used and the way the columns are structured. For InnoDB this is 1,000 columns.
See explanation below from the official documentation (Ref: Column Count Limit)
There is a hard limit of 4096 columns per table, but the effective maximum may be less for a given table. The exact limit depends on several interacting factors.
Every table (regardless of storage engine) has a maximum row size of 65,535 bytes. Storage engines may place additional constraints on this limit, reducing the effective maximum row size.
The maximum row size constrains the number (and possibly size) of columns because the total length of all columns cannot exceed this size.
...
Individual storage engines might impose additional restrictions that limit table column count.
InnoDB permits up to 1000 columns.
This applies to MariaDB as well.
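If you want to check how close an existing table is to these limits, information_schema works in both MySQL and MariaDB (a quick sketch; the schema name is a placeholder):
SELECT TABLE_NAME, COUNT(*) AS column_count
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'your_schema'
GROUP BY TABLE_NAME
ORDER BY column_count DESC;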

How does a database system come to know how many different values a particular column has?

At the following link:
http://www.programmerinterview.com/index.php/database-sql/selectivity-in-sql-databases/
the author has written that, since the "SEX" column has only two possible values, its selectivity for 10000 records would be, according to the formula given, 0.02%.
But my question is: how does a database system come to know that a particular column has this many unique values? Wouldn't the database system need to scan the entire table at least once? Or is there some other way the database system comes to know about those unique values?
First, you are applying the formula wrong. The selectivity for sex (in the example given) would be 50% not 0.02%. That means that each value appears about 50% of the time.
The general way that databases keep track of this is using something called "statistics". These are measures that are kept about all tables and used by the optimizer. Sometimes, the information can also be provided by an index on the column.
Coming back to your actual question: yes, the database frequently scans all table data and saves some statistics (e.g. max value, min value, number of distinct keys, number of rows in a table, etc.) in an internal table. These statistics are used to estimate the size of the result of your query (or other DML operations) in order to evaluate the optimal execution plan. You can manually trigger the generation of statistics by running the command EXEC DBMS_STATS.GATHER_DATABASE_STATS; or one of the other DBMS_STATS procedures. You can also advise Oracle to read only a sample of all data (e.g. 10% of all rows).
Usually the data content does not change drastically, so it does not matter if the numbers are not absolutely exact; they are (usually) sufficient to estimate an execution plan.
Oracle has many processes related to calculating the number of distinct values (NDV).
Manual Statistics Gathering: Statistics gathering can be triggered manually, through many different procedures in DBMS_STATS (see the sketch after this list).
AUTOTASK: Since 10g Oracle has a default AUTOTASK job, "auto optimizer stats collection". It will only gather statistics if the current stats are stale.
Bulk Load: In 12c statistics can be gathered during a bulk load.
Sample: The NDV can be computed from 100% of the data or can be estimated based on a sample. The sample can be either based on blocks or rows.
One-pass distinct sampling: 11g introduced a new AUTO_SAMPLE_SIZE algorithm. It scans the entire table but only uses one pass. It's much faster to scan the whole table than to have to sort even a small part of it. There are several more in-depth descriptions of the algorithm, such as this one.
Incremental Statistics: For partitioned tables Oracle can store extra information about the NDV, called a synopsis. With this information, if only a single partition is modified, only that one partition needs to be analyzed to generate both partition and global statistics.
Index NDV: Index statistics are created by default when an index is created. Also, the information can be periodically re-gathered from DBMS_STATS.GATHER_INDEX_STATS or the cascade option in other procedures in DBMS_STATS.
Custom Statistics: The NDV can be manually set with DBMS_STATS.SET_* or ASSOCIATE STATISTICS.
Dynamic Sampling: Right before a query is executed, Oracle can automatically sample a small number of blocks from the table to estimate the NDV. This usually only happens when statistics are missing.
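As a concrete illustration of the manual path above, something along these lines gathers statistics for one table and then shows the stored NDV per column (a sketch only; the HR/EMPLOYEES owner and table names are placeholders):
-- Gather statistics for one table, optionally from a 10% row sample
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => 'HR', tabname => 'EMPLOYEES', estimate_percent => 10);

-- The optimizer's stored view of the number of distinct values per column
SELECT COLUMN_NAME, NUM_DISTINCT, LAST_ANALYZED
FROM ALL_TAB_COL_STATISTICS
WHERE OWNER = 'HR' AND TABLE_NAME = 'EMPLOYEES';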
The database scans the data set in a table so it can use the most efficient method to retrieve data. The database measures the uniqueness of values using the following formula:
Index Selectivity = number of distinct values / total number of values
The result will be between zero and one. For the SEX column from the question, that is 2 / 10,000 = 0.0002, i.e. 0.02%. An index selectivity near zero means that there are hardly any distinct values; in these cases indexes actually reduce performance, so the database uses sequential scanning instead of seek operations.
For more information on indexes read https://dba.stackexchange.com/questions/42553/index-seek-vs-index-scan

Largest table size with SQL Server 2008 R2?

For example, a website offers the ability to create mobile surveys. Each survey ID is a FK in the survey response table, which contains ALL of the survey responses.
What is the size limitation of this table in a SQL Server 2008 db, if the table contains, say 20 varchar(255) fields including the bigint PK & FK?
I realize this would depend on the file size limitation as well, but I would like a more educated answer rather than just my guess on this.
In terms of searchability, some fields that contain geo-related details such as the survey ID, city, state, and two comments fields would have to be searchable, and thus indexed ... should I index only these fields?
Also, aged responses would expire after a given amount of time - thus deleted from the table. Does the table, at this point being very large, need to be re-indexed/cleaned up, after the deletions (which would be an automated process)?
Thanks.
Maximum Capacity Specifications for SQL Server
Bytes per row: 8,060
Rows per table: Limited by available storage
Note: SQL Server supports row-overflow storage, which enables variable length columns to be pushed off-row. Only a 24-byte root is stored in the main record for variable length columns pushed out of row; because of this, the effective row limit is higher than in previous releases of SQL Server. For more information, see the "Row-Overflow Data Exceeding 8 KB" topic in SQL Server Books Online.
You mention 'table size' -- does this mean number of rows?
Maximum Capacity Specifications for SQL Server
Rows per table : Limited by available storage
As per this Reference, the max size of a table is limited by the available storage.
It sounds like you are going to have a high-traffic, high-volume table. You should consider performance and storage enhancements like table partitioning. Also, because this table will see frequent INSERTs/UPDATEs/DELETEs, carefully plan out your indexing, as indexes add overhead for DML statements on the table.
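A partitioning-plus-indexing setup for the expiring responses could look roughly like this (a sketch only; the partition boundaries, column names, and filegroup placement are assumptions, not a definitive design):
-- Partition by month so aged responses can be switched out rather than deleted row by row
CREATE PARTITION FUNCTION pfResponseDate (datetime2)
    AS RANGE RIGHT FOR VALUES ('2012-01-01', '2012-02-01', '2012-03-01');

CREATE PARTITION SCHEME psResponseDate
    AS PARTITION pfResponseDate ALL TO ([PRIMARY]);

CREATE TABLE dbo.SurveyResponse (
    ResponseId   bigint       NOT NULL,
    SurveyId     bigint       NOT NULL,
    City         varchar(255) NULL,
    State        varchar(255) NULL,
    Comments1    varchar(255) NULL,
    Comments2    varchar(255) NULL,
    ResponseDate datetime2    NOT NULL,
    CONSTRAINT PK_SurveyResponse PRIMARY KEY CLUSTERED (ResponseId, ResponseDate)
) ON psResponseDate (ResponseDate);

-- Index only the searchable fields mentioned in the question
CREATE NONCLUSTERED INDEX IX_SurveyResponse_Search
    ON dbo.SurveyResponse (SurveyId, State, City)
    ON psResponseDate (ResponseDate);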