This is the DB2 create table statement:
CREATE TABLE SCHEMA_F.TABLE_A (
LEAFDESCRIPTION FOR F7UY2 CHAR(10) CCSID 1141 NOT NULL WITH DEFAULT
...
What does LEAFDESCRIPTION FOR F7UY2 mean exactly?
tl;dr
It allows you to specify the short name usable from (older) RPG programs on the i and from some system utilities. In this case, LeafDescription is the long name usable from SQL, and F7UY2 is the short name usable inside RPG, etc. (and from SQL too, but long names are usually preferred for obvious reasons).
It's part of the system-specific behavior available on the i.
See, the i (iSeries, System i, AS/400) started out as a very different machine, several decades ago, before SQL was really a thing. At the time, field and table names were limited to around 6 characters, and programs were written with this in mind.
Now, flash forward to when SQL comes around. On the i, DB2 SQL got retrofitted (of a sort) on top of the existing file system. This meant that, among other things, you could query over your existing files with SQL without recreating or repopulating them.
Unfortunately, it also meant they were stuck with the original, semi-cryptic names, in part because you still had all the original programs too. If you created (or re-created, as the case may be) a table via SQL, RPG programs couldn't reference it. So this allowed you to specify a name it could reference - which meant you could switch to SQL-based tables (which had some additional, new, advantages) without needing to change your programs. Even with the name-length restriction being lifted in recent versions of RPG, the ability to avoid renaming everything all at once is a boon.
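To make the mechanics concrete, here is a sketch using the names from the question's DDL: both names address the same column, so SQL accepts either, while old-style RPG record formats see only the short system name.

```sql
-- LEAFDESCRIPTION is the long SQL name, F7UY2 the short system name.
-- Both of these return the same column:
SELECT LEAFDESCRIPTION FROM SCHEMA_F.TABLE_A;
SELECT F7UY2 FROM SCHEMA_F.TABLE_A;
```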
Related
Trying to develop something which should be portable between the bigger RDBMSes.
The issue is around generating and using auto-increment numbers as the primary key for a table.
There are two topics here
The mechanism used to generate the auto-increment numbers.
How to specify that you want to use this as the primary key on a
table.
I'm looking for verification for what I think is the current state of affairs:
Unfortunately, standardization came late to this area and in some respects is still not implemented (as a mandatory standard). This means it is, still in 2013, impossible to write a CREATE TABLE statement in a portable way ... if you want it with an auto-generated primary key.
Can this really be so?
Re (1). This is standardized, because it came in SQL:2003. As far as I understand, the way to go is SEQUENCEs. I believe these are a mandatory part of SQL:2003, right? The other possibility is the IDENTITY keyword, which is also defined in SQL:2003, but that one is - as far as I can tell - an optional part of the standard ... which means a key player like Oracle doesn't implement it ... and can still claim compliance. OK, so SEQUENCEs are the designated portable method for this, right?
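For reference, a sketch of the two SQL:2003 forms mentioned here (table, column, and sequence names are illustrative):

```sql
-- The SEQUENCE approach (mandatory part of SQL:2003):
CREATE SEQUENCE order_seq START WITH 1 INCREMENT BY 1;

-- The IDENTITY approach (optional feature of SQL:2003):
CREATE TABLE orders (
    id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    descr VARCHAR(100)
);
```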
Re (2). Database vendors implement this in different ways. In PostgreSQL you can link the CREATE TABLE statement directly with the sequence, in Oracle you would have to create a trigger to ensure the SEQUENCE is used with the table.
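A sketch of that divergence (names illustrative; the Oracle form shown is the pre-12c sequence-plus-trigger idiom):

```sql
-- PostgreSQL: the sequence can be linked directly in CREATE TABLE
CREATE TABLE t (
    id serial PRIMARY KEY,   -- creates and owns a sequence behind the scenes
    name text
);

-- Oracle: separate sequence, applied via a trigger
CREATE SEQUENCE t_seq;
CREATE TABLE t (id NUMBER PRIMARY KEY, name VARCHAR2(100));
CREATE OR REPLACE TRIGGER t_bi
BEFORE INSERT ON t
FOR EACH ROW
BEGIN
  SELECT t_seq.NEXTVAL INTO :NEW.id FROM dual;
END;
/
```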
So my conclusion is that without a standardized solution to (2) it really doesn't help much that all the major players now support SEQUENCEs. I would still have to write db-specific code for something as simple as a CREATE TABLE statement.
Is this right?
Standards and their implementation aside, I would also be interested if anyone has a portable solution to the problem, no matter if it is a hack from an RDBMS best-practice perspective. For such a solution to work, it would have to be independent from any application, i.e. it must be the database that solves the issue, not the application layer. Perhaps, if both the concept of TRIGGERs and SEQUENCEs can be said to be standardized, then a solution that combines the two of them would be portable?
As for "portable create table statements": It starts with the data types: Whether boolean, int or long data types are part of any SQL standard or not, I really appreciate these types. PostgreSql supports these data types, Oracle does not. Ironically, Oracle supports boolean in PL/SQL, but not as a data type in a table. Even the length of table/column names etc. is restricted in Oracle to 30 characters. So not even the most simple "create table" is always portable.
As for auto-generated primary keys: I am not aware of a syntax which is portable, so I do not define this in the "create table". Of course this only delays the problem, and leaves it to the insert statements. This topic is connected with another problem: Getting the generated key after an insert using JDBC in the most efficient way. This differs substantially between Oracle and PostgreSql, and if you have ever dared to use case sensitive table/column names in Oracle, it won't be funny.
As for constraints, I prefer to add them in separate statements after "create table". The set of constraints may differ, if you implement a boolean data type in Oracle using char(1) together with a check constraint whereas PostgreSql supports this data type directly.
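For example, the boolean emulation described above might look like this (table/column names illustrative):

```sql
-- Oracle: no boolean column type, so emulate it with CHAR(1) + CHECK
CREATE TABLE account (
    id        NUMBER PRIMARY KEY,
    is_active CHAR(1) DEFAULT 'N' NOT NULL,
    CONSTRAINT account_is_active_ck CHECK (is_active IN ('Y', 'N'))
);

-- PostgreSQL: native boolean, no extra constraint needed
CREATE TABLE account (
    id        integer PRIMARY KEY,
    is_active boolean NOT NULL DEFAULT false
);
```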
As for "standards": One example
SQL99 standard: for SELECT DISTINCT, ORDER BY expressions must appear in select list
This message is from PostgreSql, Oracle 11g does not complain. After 14 years, will they change it?
Generally speaking, you still have to write database specific code.
As for your conclusion: In our scenario we implemented a portable database application using a model driven approach. This logical meta data is used by the application, and there are different back ends for different database types. We do not use any ORM, just "direct SQL", because this simplifies tuning of SQL statements, and it gives full access to all SQL features. We wrote our own library, and later we found out that the key ideas match these of "Anorm".
The good news is that while there are tons of small annoyances, it works pretty well, even with complex queries. For example, window aggregate functions are quite portable (row_number(), partition by). You have to use listagg on Oracle, whereas you need string_agg on PostgreSql. Recursive common table expressions require "with recursive" in PostgreSql; Oracle does not like it. PostgreSql supports "limit" and "offset" in queries; you need to wrap this in Oracle. It drives you crazy if you use SQL arrays both in Oracle and PostgreSql (arrays as columns in tables). There are materialized views on Oracle, but they do not exist in PostgreSql. Surprisingly enough, it is possible to write database stored procedures not only in Java, but in Scala, and this works amazingly well in both Oracle and PostgreSql. This list is not complete. But so far we managed to find an acceptable (= fast) solution for any "portability problem".
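Two of the differences mentioned above, side by side (table/column names illustrative; the Oracle paging form is the pre-12c idiom):

```sql
-- String aggregation
SELECT string_agg(name, ', ') FROM t;                            -- PostgreSQL
SELECT LISTAGG(name, ', ') WITHIN GROUP (ORDER BY name) FROM t;  -- Oracle

-- Paging: LIMIT/OFFSET vs. ROWNUM wrapping
SELECT * FROM t ORDER BY id LIMIT 10 OFFSET 20;                  -- PostgreSQL
SELECT * FROM (
    SELECT x.*, ROWNUM rn
    FROM (SELECT * FROM t ORDER BY id) x
    WHERE ROWNUM <= 30
) WHERE rn > 20;                                                 -- Oracle
```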
Does it pay off? In our scenario, there is a central Oracle installation (RAC, read/write), but there are distributed PostgreSql installations as localhost databases on each application server (only readonly). This gives a big performance and scalability boost, without the cost penalty.
If you really want to have it solved in the database only, there is one possibility: Put anything in stored procedures, write these in Java/Scala, and restrict yourself in the application to call these procedures, and to read the result sets. This of course just moves the complexity from the application layer into the database, but you accepted hacks :-)
Triggers are quite standardized, if you use Java stored procedures. And if it is supported by your databases, by your management, your data center people, and your colleagues. The non-technical/social aspects are to be considered as well. I have even heard of database tuning people who do not accept the general "left outer join" syntax; they insisted on the Oracle way of using "(+)".
So even if triggers (PL/SQL) and sequences were standardized, there would be so many other things to consider.
Update
As for returning the generated primary keys I can only judge the situation from JDBC's perspective.
PostgreSql returns it, if you use Statement.getGeneratedKeys (I consider this the normal way).
Oracle requires you to specify explicitly, when you create the prepared statement, the (primary key) column(s) whose values you want to get back. This works, but only if you are not using case sensitive table names. In that case, all you receive is a misleading ORA-00942: table or view does not exist, thrown in Oracle's JDBC driver. There was/is a bug in Oracle's JDBC driver, and I have not found a way to get the value using a portable JDBC method. So, at the cost of an additional proprietary "select sequence.currVal from dual" within the same transaction right after the insert, you can get the primary key back. The additional time was acceptable in our case; we compared the times to insert 100000 rows: PostgreSql is faster until the 10000th row, after that Oracle performs better.
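The workaround mentioned above is just this one extra statement (sequence name illustrative; currval is scoped to the session, so it is safe against concurrent inserts):

```sql
-- Oracle: run in the same session, right after the INSERT
SELECT t_seq.currval FROM dual;
```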
See a stackoverflow question regarding the ways to get the primary key and
the bug report with case sensitive table names from 2008
This example shows pretty well the problems. Normally PostgreSql follows the way you expect it to work, but you may have to find a special way for Oracle.
I'm trying to find an efficient way to migrate tables with DB2 on the mainframe using JCL. When we update our application such that the schema changes, we need to migrate the database to match.
What we've been doing in the past is basically creating a new table, selecting from the old table into that, deleting the original and renaming the new table to the original name.
Needless to say, that's not a very high-performance solution when the tables are big (and some of them are very big).
With latter versions of DB2, I know you can do simple things like alter column types but we have migration jobs which need to do more complicated things to the data.
Consider for example the case where we want to combine two columns into one (firstname + lastname -> fullname). Never mind that it's not necessarily a good idea to do that, just take it for granted that this is the sort of thing we need to do. There may be arbitrarily complicated transformations to the data, basically anything you can do with a select statement.
My question is this. The DB2 unload utility can be used to pull all of the data out of a table into a couple of data sets (the load JCL used for reloading the data, and the data itself). Is there an easy way (or any way) to massage this output of unload so that these arbitrary changes are made when reloading the data?
I assume that I could modify the load JCL member and the data member somehow to achieve this but I'm not sure how easy that would be.
Or, better yet, can the unload/load process itself do this without having to massage the members directly?
Does anyone have any experience of this, or have pointers to redbooks or redpapers (or any other sources) that describe how to do this?
Is there a different (better, obviously) way of doing this other than unload/load?
As you have noted, SELECTing from the old table into the new table will have very poor performance. Poor performance here is generally due to the relatively high cost of insertion INTO the target table (index building and RI enforcement). The SELECT itself is generally not a performance issue. This is why the LOAD utility is generally preferred when large tables need to be populated from scratch: indices may be built more efficiently and RI may be deferred.
The UNLOAD utility allows unrestricted use of SELECT. If you can SELECT data using scalar and/or column functions to build a result set that is compatible with your new table's column definitions, then UNLOAD can be used to do the data conversion. Specify a SELECT statement in SYSIN for the UNLOAD utility. Something like:
//SYSIN DD *
SELECT CONCAT(FIRST_NAME, LAST_NAME) AS "FULLNAME"
FROM OLD_TABLE
/*
The resulting SYSRECxx file will contain a single column that is a concatenation of the two identified columns (result of the CONCAT function), and SYSPUNCH will contain a compatible column definition for FULLNAME - the converted column name for the new table. All you need to do is edit the new table name in SYSPUNCH (this will have defaulted to TBLNAME) and LOAD it. Try not to fiddle with the SYSRECxx data or the SYSPUNCH column definitions - a goof here could get ugly.
Use the REPLACE option when running the LOAD utility to create the new table (I think the default is LOAD RESUME, which won't work here). Often it is a good idea to leave RI off when running the LOAD; this will improve performance and save the headache of figuring out the order in which LOAD jobs need to be run. Once finished, you need to verify the RI and build the indices.
The LOAD utility is documented here
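After editing the table name, the LOAD control statement from SYSPUNCH might look roughly like this (names illustrative; the exact options depend on your DB2 version and the generated member):

```
//SYSIN DD *
  LOAD DATA REPLACE
  INTO TABLE NEW_SCHEMA.NEW_TABLE
/*
```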
I assume that I could modify the load JCL member and the data member somehow to achieve this but I'm not sure how easy that would be.
I believe you have provided the answer within your question. As to the question of "how easy that would be," it would depend on the nature of your modifications.
SORT utilities (DFSORT, SyncSort, etc.) now have very sophisticated data manipulation functions. We use these to move data around, substitute one value for another, combine fields, split fields, etc. albeit in a different context from what you are describing.
You could do something similar with your load control statements, but that might not be worth the trouble. It will depend on the extent of your changes. It may be worth your time to attempt to automate modification of the load control statements if you have a repetitive modification that is necessary. If the modifications are all "one off" then a manual solution may be more expedient.
I have a field that should contain the place of a seminar. A seminar can be held:
In-house
In another city
In another country
At first, I had only one table, so I have used an enum. But now I have three tables which can't be merged and they all should have this information and customer wants this field to be customizable to add or remove options in the future. But the number of options will be limited they say, probably 5 or so.
Now, my question is, should I use an enum or a table for this field? More importantly, what would be the proper way to decide between an enum or a table?
PS: enum fields are dynamically retrieved from the database, they are not embedded in the code.
As a general rule of thumb, in an application, use an enum if:
Your values change infrequently, and
There are only a few values.
Use a lookup table if:
Values changes frequently, or
Your customer expects to be able to add, change, or delete values in realtime, or
There are a lot of values.
Use both if the prior criteria for an enum are met and:
You need to be able to use a tool external to your application to report on the data, or
You believe you may need to eliminate the enum in favor of a pure lookup table approach later.
You'll need to pick what you feel is a reasonable cutoff for the number of values; I often hear somewhere between 10 and 15 suggested.
If you are using an enum construct provided by your database engine, the first 2 groups of rules still apply.
In your specific case I'd go with the lookup table.
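A sketch of the lookup-table approach (names illustrative), which all three of your tables can share:

```sql
CREATE TABLE seminar_location (
    id   integer PRIMARY KEY,
    name varchar(50) NOT NULL UNIQUE
);

INSERT INTO seminar_location (id, name) VALUES
    (1, 'In-house'),
    (2, 'Another city'),
    (3, 'Another country');

-- Each of the three tables references it the same way:
ALTER TABLE seminar
    ADD COLUMN location_id integer
    REFERENCES seminar_location (id);
```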
If the set of elements is fixed, an enum can be used; if the data is variable, an enum should not be used, and a table is preferable.
I think in your case a table should be OK.
I agree with the top answers about enums being lean, slightly faster, and reducing the complexity of your database when used reasonably (that is, without a lot of fast-changing values).
BUT: There are some enormous caveats to consider:
The ENUM data type isn't standard SQL, and beyond MySQL not many other DBMS's have native support for it. PostgreSQL, MariaDB, and Drizzle (the latter two are forks of MySQL anyway), are the only three that I know of. Should you or someone else want to move your database to another system, someone is going to have to add more steps to the migration procedure to deal with all of your clever ENUMs.
(Source and even more points on http://komlenic.com/244/8-reasons-why-mysqls-enum-data-type-is-evil/)
ENUMs aren't supported by most ORMs, such as Doctrine, exactly because they're not standard SQL. So if you ever consider using an ORM in your project, or even something like the Doctrine Migrations Bundle,
you'll probably end up writing a complex extension bundle (which I've tried) or using an existing one like this, which for PostgreSQL, for example, cannot add more than pseudo-support for enums: it treats them as a string type with a check constraint. An example:
CREATE TABLE test (id SERIAL NOT NULL, pseudo_enum_type VARCHAR(255) CHECK(pseudo_enum_type IN ('value1', 'value2', 'value3')) , ...
So the gain of using enums in even slightly more complex setups really is below zero.
Seriously: If I don't absolutely have to (and I don't) I'd always avoid enums in a database.
What is the purpose of system table master..spt_values?
Why was it provided and how one should use it?
What are the meanings of its type, low, high values?
Update:
A Google search gives thousands of "its uses", for example:
to split column using master..spt_values
it contains numbers from 0 to 2047. It is very useful, for example, if you need to populate a table with 100 numbers from this range
for building indexes
creating virtual calendars
getting descriptions of used objects in somewhat non-intuitive and complicated way
Enumerate all the indexes in a SQL Server database (Giuseppe Dimauro, devx.com)
"SQL Server 2005 script to display disk space usage"
"It's not worth talking about. This table is a design nightmare of a "super" lookup table"
Some posts warn against its use since it can be removed in future versions of SQL Server but there are already code scripts using it to illustrate new features of new SQL Server 11 (Denali):
Using OFFSET N ROWS FETCH NEXT N ROWS ONLY In SQL Server Denali for easy paging
The spt_values table is not mentioned in the SQL Server documentation, but it goes back to the Sybase days, and there is some extremely minimal documentation in the Sybase online docs that can be summed up in this comment:
To see how it is used, execute sp_helptext and look at the text for
one of the system procedures that references it.
In other words, read the code and work it out for yourself.
If you poke around the system stored procedures and examine the table data itself, it's fairly clear that the table is used to translate codes into readable strings (among other things). Or as the Sybase documentation linked above puts it, "to convert internal system values [...] into human-readable format"
The table is also sometimes used in code snippets in MSDN blogs - but never in formal documentation - usually as a convenient source of a list of numbers. But as discussed elsewhere, creating your own source of numbers is a safer and more reliable solution than using an undocumented system table. There is even a Connect request to provide a 'proper', documented number table in SQL Server itself.
Anyway, the table is entirely undocumented, so there is no significant value in knowing anything about it, at least from a practical standpoint: the next service pack or upgrade might change it completely. Intellectual curiosity is something else, of course :-)
Mostly answered in another question.
What are the meanings of its type, low, high values?
Most Types (i.e. each specific lookup table, if they were separate) require either Name or Number. Some Types require Low and High columns as well. In Sybase, we have an ErrorNumber column, so the sproc can RAISERROR an error number that is not hard-coded (it can be changed between versions, etc).
It is not "cryptic" unless an enumerated list is "cryptic" to you; it is just a lookup table (all of them, differentiated by the Type column).
One usage, definitely not its primary usage, is to generate a series of numbers:
--Generate a series from 1 to 100
SELECT TOP 100
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RowNum
FROM master.dbo.spt_values
ORDER BY RowNum
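If all you need is a number series, a documented alternative that avoids spt_values entirely is a recursive CTE (SQL Server 2005 and later; the default MAXRECURSION limit of 100 covers this range):

```sql
--Generate a series from 1 to 100 without spt_values
WITH nums AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1 FROM nums WHERE n < 100
)
SELECT n FROM nums;
```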
I'm working on an application developed by another mob and am confounded by the use of a char field instead of bit for all the boolean columns in the database. It uses "Y" for true and "N" for false (these have to be uppercase). The type name itself is then aliased with some obscure name like ybln.
This is very annoying to work with for a lot of reasons, not the least of which is that it just looks downright aesthetically unpleasing.
But maybe it's me that's stupid - why would anyone do this? Is it a database compatibility issue or some design pattern that I am not aware of?
Can anyone enlighten me?
I've seen this practice in older database schemas quite often. One advantage I've seen is that using CHAR(1) fields provides support for more than Y/N options, like "Yes", "No", "Maybe".
Other posters have mentioned that Oracle might have been used. The schema I referred to was in fact deployed on Oracle and SQL Server. It limited the usage of data types to a common subset available on both platforms.
They did diverge in a few places between Oracle and SQL Server but for the most part they used a common schema between the databases to minimize the development work needed to support both DBs.
Welcome to brownfield. You've inherited an app designed by old-schoolers. It's not a design pattern (at least not a design pattern with something good going for it), it's a vestige of coders who cut their teeth on databases with limited data types. Short of refactoring the DB and lots of code, grit your teeth and gut your way through it (and watch your case)!
Other platforms (e.g. Oracle) do not have a bit SQL type. In which case, it's a choice between NUMBER(1) and a single character field. Maybe they started on a different platform or wanted cross platform compatibility.
I don't like the Y/N char(1) field as a replacement to a bit column too, but there is one major down-side to a bit field in a table: You can't create an index for a bit column or include it in a compound index (at least not in SQL Server 2000).
Sure, you could discuss if you'll ever need such an index. See this request on a SQL Server forum.
They may have started development back with Microsoft SQl 6.5
Back then, adding a bit field to an existing table with data in place was a royal pain in the rear. Bit fields couldn't be null, so the only way to add one to an existing table was to create a temp table with all the existing fields of the target table plus the bit field, then copy the data over, populating the bit field with a default value. Then you had to delete the original table and rename the temp table to the original name. Throw in some foreign key relationships and you've got a long script to write.
Having said that, there were always 3rd party tools to help with the process. If the previous developer chose to use char fields in lieu of bit fields, the reason, in a nutshell, was probably laziness.
The reasons are as follows (btw, they are not good reasons):
1) Y/N can quickly become "X" (for unknown), "L" (for likely), etc. - What I mean by this is that I have personally worked with programmers who were so used to not collecting requirements correctly that they just started with Y/N as a sort of 'flag', with the superstition that it might need to expand (for which they should have used an int as a status ID).
2) "Performance" - but as was mentioned above, SQL indexes are ruled out if they are not 'selective' enough... a field that only has 2 possible values will never use that index.
3) Laziness. - Sometimes developers want to output directly to some visual display with the letter "Y" or "N" for human readability, and they don't want to convert it themselves :)
There are all 3 bad reasons that I've heard/seen before.
I can't imagine any disadvantage in not being able to index a "BIT" column, as it would be unlikely to have enough different values to help the execution of a query at all.
I also imagine that in most cases the storage difference between BIT and CHAR(1) is negligible (is that CHAR a NCHAR? does it store a 16bit, 24bit or 32bit unicode char? Do we really care?)
This is terribly common in mainframe files, COBOL, etc.
If you only have one such column in a table, it's not that terrible in practice (no real bit-wasting); after all, SQL Server will not let you say the natural WHERE BooleanColumn - you have to say WHERE BitColumn = 1, and IF @BitFlag = 1 instead of the far more natural IF @BooleanFlag. When you have multiple bit columns, SQL Server will pack them. The case of the Y/N should only be an issue if a case-sensitive collation is used, and to stop invalid data there is always the option of a constraint.
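The constraint mentioned at the end can be as simple as this (table/column names illustrative):

```sql
ALTER TABLE orders
    ADD CONSTRAINT ck_orders_shipped_flag
    CHECK (shipped_flag IN ('Y', 'N'));
```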
Having said all that, my personal preference is for bits and only allowing NULLs after careful consideration.
Apparently, bit columns aren't a good idea in MySQL.
They probably were used to using Oracle and didn't properly read up on the available datatypes for SQL Server. I'm in exactly that situation myself (and the Y/N field is driving me nuts).
I've seen worse ...
One O/R mapper I had occasion to work with used 'true' and 'false' as they could be cleanly cast into Java booleans.
Also, On a reporting database such as a data warehouse, the DB is the user interface (metadata based reporting tools notwithstanding). You might want to do this sort of thing as an aid to people developing reports. Also, an index with two values will still get used by index intersection operations on a star schema.
Sometimes such quirks are more associated with the application than the database. For example, handling booleans between PHP and MySQL is a bit hit-and-miss and makes for non-intuitive code. Using CHAR(1) fields and 'Y' and 'N' makes for much more maintainable code.
I don't have any strong feelings either way. I can't see any great benefit to doing it one way over another. I know philosophically the bit fields are better for storage. My reality is that I have very few databases that contain a lot of logical fields in a single record. If I had a lot then I would definitely want bit fields. If you only have a few I don't think it matters. I currently work with Oracle and SQL server DB's and I started with Cullinet's IDMS database (1980) where we packed all kinds of data into records and worried about bits and bytes. While I do still worry about the size of data, I long ago stopped worrying about a few bits.