How to document SQL queries

For a client of mine I'm documenting an existing database with a few tables and some queries.
For the tables I'm using an E-R diagram to show tables and relationships, which I draw with the Dia diagram editor.
How do I describe and visualize the queries? Is there some sort of UML object for that?
So far, I've created a table listing each query's name, a description of what it does, and an example screenshot of the data it retrieves. I'm doing that in Word.
I don't like the result. Is there something more professional?
I don't want to install new software; I just want to know how database designers accomplish this task.
EDIT 1
As @Serg suggested, I could use a view and diagram the view as an entity.
If I've understood correctly, something like this:
What the client needs is to understand, for each query, where the data comes from.
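Once each query is saved as a view, one way to answer "where does the data come from" is to let the database report it. The sketch below is only an illustration under stated assumptions: it uses SQLite through Python's `sqlite3` module rather than the client's DBMS, and the `customers`/`orders` tables and `orders_per_customer` view are hypothetical. It installs an authorizer callback, which SQLite calls while compiling a statement, and records every table/column pair reported with the `SQLITE_READ` action.

```python
import sqlite3
from collections import defaultdict


def trace_sources(conn: sqlite3.Connection, sql: str) -> dict:
    """Map each table that `sql` reads to the set of its columns that are read."""
    reads = defaultdict(set)

    def authorizer(action, arg1, arg2, db_name, source):
        # For SQLITE_READ, arg1 is the table name and arg2 the column name;
        # `source` names the innermost view or trigger responsible (or None).
        if action == sqlite3.SQLITE_READ and arg1 is not None:
            reads[arg1].add(arg2)
        return sqlite3.SQLITE_OK  # only observe, never deny

    conn.set_authorizer(authorizer)
    try:
        # The callback fires while the statement is being compiled.
        conn.execute(sql).fetchall()
    finally:
        # Install a permissive no-op (passing None needs Python 3.11+).
        conn.set_authorizer(lambda *args: sqlite3.SQLITE_OK)
    return dict(reads)


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
CREATE VIEW orders_per_customer AS
  SELECT c.name AS customer, COUNT(o.id) AS n_orders
    FROM customers c JOIN orders o ON o.customer_id = c.id
   GROUP BY c.name;
""")
for table, columns in sorted(trace_sources(conn, "SELECT * FROM orders_per_customer").items()):
    print(table, sorted(columns))
```

Because the view is expanded during compilation, reads of the underlying base tables are reported, and that map could be printed next to each query in the documentation.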
EDIT 2
I'm doing something like the following:
Where
driver_tabella_utenti is a TABLE
driver_imp_causali_preparazione is a QUERY
driver_query_riepilogo_prsp is a QUERY
I don't think that's best practice! How can I visually design SQL queries the way I do with entities and relationships?

Short summary
I see two options based on the UML specification. This is based only on the specification taken literally, not on any additional research on the web.
Model a class providing an operation that represents what the SQL query does. Present a behavior describing this operation as a class stereotyped with <> and with a linked object of type OpaqueBehavior, providing values for its attributes body (the SQL query) and language ('SQL').
Model a class providing an operation that represents what the SQL query does. Provide a note linked to the class containing a description of the respective OpaqueBehavior (body and language).
I've found another solution in the specification:
Model the SQL query as an action. The action can be depicted as usual in an activity diagram (a rectangle with rounded corners), with the SQL placed directly inside the rectangle. The action needs to be part of an activity diagram (that is, a description of a behaviour that uses this SQL).
Explanation
You can use a BehavioralFeature (e.g. an Operation) to state that some SQL is available (it should belong to a class that exposes this operation), and then define its method as an OpaqueBehavior that contains a body (the SQL statement) and a language ('SQL').
As the specification does not say anything about notation, you can show this either by representing the method as an object (a specific instance of OpaqueBehavior) or by using a note. In both cases it should be linked to the respective object describing the Behavior provided by the SQL statement as such (e.g. an activity).
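To make that structure concrete, here is a minimal Python sketch of a heavily simplified model, not any UML tool's API: a class owns an operation, the operation's `method` holds an OpaqueBehavior, and the behavior stores parallel `body`/`language` lists (in UML they are ordered, each body paired with the language at the same index). `ModelClass`, `note_text` and all the names in the example are hypothetical; the output is the kind of text you might put in the note of the second option.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OpaqueBehavior:
    name: str
    # Parallel ordered lists: body[i] is written in language[i].
    body: List[str] = field(default_factory=list)
    language: List[str] = field(default_factory=list)


@dataclass
class Operation:
    # A BehavioralFeature; `method` lists the behaviors that implement it.
    name: str
    method: List[OpaqueBehavior] = field(default_factory=list)


@dataclass
class ModelClass:
    name: str
    operations: List[Operation] = field(default_factory=list)


def note_text(cls: ModelClass) -> str:
    """Text for a note linked to `cls`, one entry per (language, body) pair."""
    lines = []
    for op in cls.operations:
        for behavior in op.method:
            for language, body in zip(behavior.language, behavior.body):
                lines.append(f"{cls.name}::{op.name}() implemented by {behavior.name}")
                lines.append(f"language = {language}")
                lines.append(f"body = {body}")
    return "\n".join(lines)


reports = ModelClass("OrderReports", [
    Operation("totalsByCustomer", [
        OpaqueBehavior(
            "TotalsByCustomerSQL",
            body=["SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id"],
            language=["SQL"],
        )
    ])
])
print(note_text(reports))
```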
See section 13.2.3.3 of the UML specification. Below is just the diagram describing this area.
Edit as a result of further research:
The UML specification describes your case literally in section 16.2 (as OpaqueAction). The example in Figure 16.8 (section 16.2.5.1) is exactly the case I described as the third possible answer.
Note, however (as already mentioned in the comments), that UML is not always the most suitable solution. While I strongly recommend modelling the system using UML, the SQL code itself should be part of the textual documentation that is created alongside the UML diagrams. That gives you more clarity, searchability, the ability to copy and paste the code, etc. Also, if your query is more than 2-3 short lines, it might even be hard to notice that it is still part of a UML diagram.
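If the queries are stored as views, that textual documentation can be partly generated so the SQL in it never drifts from the database. A minimal sketch, assuming SQLite (it reads `sqlite_master` and `PRAGMA table_info`; other DBMSs expose view definitions through their own catalogs) and a hand-maintained `descriptions` dictionary, which is a hypothetical convention:

```python
import sqlite3


def document_views(conn: sqlite3.Connection, descriptions: dict) -> str:
    """Plain-text entry per view: name, description, output columns, SQL text."""
    sections = []
    views = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'view' ORDER BY name"
    ).fetchall()
    for name, sql in views:
        # PRAGMA table_info also works on views; field 1 is the column name.
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{name}")')]
        sections.append("\n".join([
            f"Query: {name}",
            f"Description: {descriptions.get(name, '(not documented yet)')}",
            f"Columns: {', '.join(columns)}",
            "SQL:",
            sql,
        ]))
    return "\n\n".join(sections)
```

The description stays human-written, while the column list and SQL body come straight from the catalog.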

Related

Dynamic TableView columns from a result set (kotlin)

As a preface: I am a programmer with very limited experience outside COBOL-like languages, which means I am very new to JVM-based programming languages. But I have been using Python for personal use and am not completely new to modern programming concepts. Please give a newbie the benefit of the doubt and don't shoot down the post.
I am looking to write a query tool. When a query is executed, the data is presented to the user in a table view. So, as you can imagine, the list of columns is going to be as dynamic as it gets. The closest thing I found about dynamic table view creation is here:
Dynamic table columns
To write an MVC-based query tool, how should the model be created so that when the data in the table view is updated, it can be written back to the database, and vice versa?
I will assume that when you say 'dynamic table columns' you just mean table columns whose data you can edit, with the edits relayed back to the view model. If you look at the guide on this page, go to the section Using "Property" properties; it explains how you can make columns editable. In terms of handling data, this section covers the basic structure of MVVM. In your case, your database data would be represented by a class that reads from and writes to its class members. That implementation is up to you, however.
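The answer leaves the model open, so here is one language-neutral illustration, written in Python with `sqlite3` rather than Kotlin/JavaFX; `QueryResultModel` and its methods are hypothetical names. The model takes its column list from the cursor metadata at run time and remembers which cells were edited, so only those are written back. It assumes a single-table query with a known, unedited key column.

```python
import sqlite3


class QueryResultModel:
    """Editable result of `SELECT * FROM table`; columns are discovered at run time."""

    def __init__(self, conn: sqlite3.Connection, table: str, key: str):
        self.conn, self.table, self.key = conn, table, key
        cur = conn.execute(f'SELECT * FROM "{table}"')
        # cursor.description has one 7-tuple per column; item 0 is the name.
        self.columns = [d[0] for d in cur.description]
        self.rows = [list(r) for r in cur.fetchall()]
        self.dirty = set()  # (row index, column index) pairs edited since last save

    def set_value(self, row: int, column: str, value) -> None:
        if column == self.key:
            raise ValueError("editing the key column is not supported in this sketch")
        col = self.columns.index(column)
        self.rows[row][col] = value
        self.dirty.add((row, col))

    def save(self) -> int:
        """Write each edited cell back with a parameterized UPDATE; return the count."""
        key_col = self.columns.index(self.key)
        for row, col in sorted(self.dirty):
            # Identifiers come from the catalog; values are bound as parameters.
            self.conn.execute(
                f'UPDATE "{self.table}" SET "{self.columns[col]}" = ? WHERE "{self.key}" = ?',
                (self.rows[row][col], self.rows[row][key_col]),
            )
        self.conn.commit()
        saved = len(self.dirty)
        self.dirty.clear()
        return saved
```

In a JavaFX `TableView`, each entry of `columns` would become one column and the cell-edit handler would call the equivalent of `set_value`. Results of joins or aggregates generally cannot be written back this simply.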

What are the origins of terms DDL, DML and DCL?

I'm familiar with the definitions of DDL, DML, and DCL as applied to SQL. There are lots of web sites and books that define and explain them. But no one seems to give an authoritative reference.
I'm interested in the origin of these terms. Did SQL invent them? Were they already used historically for other databases? Did some other standard create them, and they were used by SQL? Or did SQL even use them at all in the ISO specifications?
One book indicates that SQL92 included these terms, but I can't find them in the draft available online. (Maybe I'll have to purchase the final SQL92 ISO specification to know for sure.) SQL:1999 switched to a different classification system.
(One reason I'm curious is that, if these were general industry terms not invented by SQL, then it wouldn't necessarily be incorrect to continue using them, complementary to the new SQL classifications.)
Can anyone provide more insight on the origin of these terms, along with authoritative references to any standards or specifications that might have originally defined them?
According to Wikipedia's "Data definition language" entry:
The concept of the data definition language and its name was first
introduced in relation to the Codasyl database model, where the schema
of the database was written in a language syntax describing the
records, fields, and sets of the user data model.[1] Later it was used
to refer to a subset of Structured Query Language (SQL) for declaring
tables, columns, data types and constraints. SQL-92 introduced a
schema manipulation language and schema information tables to query
schemas. These information tables were specified as SQL/Schemata in
SQL:2003. The term DDL is also used in a generic sense to refer to any
formal language for describing data or information structures.
A bit more detail is provided in Wikipedia's "Codasyl" entry:
In October 1969 the DBTG published its first language specifications
for the network database model which became generally known as the
CODASYL Data Model. This specification in fact defined several
separate languages: a data definition language (DDL) to define the
schema of the database, another DDL to create one or more subschemas
defining application views of the database; and a data manipulation
language (DML) defining verbs for embedding in the COBOL programming
language to request and update data in the database. Although the work
was focused on COBOL, the idea of a host-language independent database
was starting to emerge, prompted by IBM's advocacy of PL/I as a COBOL
replacement.
And the "Data Base Task Group" Wikipedia entry says it published a final report 2 years later:
In April 1971, the DBTG published a report containing specifications
of a Data Manipulation Language (DML) and a Data Definition Language
(DDL) for standardization of network database model. The first DBTG
proposals had already been published in 1969. The specification was
subsequently modified and developed in various committees and
published by other reports in 1973 and 1978. The specification is
often referred to as the DBTG database model or the CODASYL database
model. As well as the data model, many basic concepts of database
terminology were introduced by this group, notably the concepts of
schema and subschema.
The above covers DDL and DML. Unfortunately, Wikipedia's "Data control language" entry doesn't have much detail at the time of writing, and I was unable to find the origins of this term elsewhere. But given the above and the Google Ngram graph shown below, I would suspect it came later, possibly some time in the mid-1970s.
And here is another graph showing all three terms that appears to back this up:

Embeddable vs one to many

I have seen an article on DZone about Post and Post Details (two different entities) and the relationship between them. There, the post and its details are in different tables. But as I see it, Post Detail is an embeddable part because it cannot be used without the "parent" Post. So what is the logic to separate it in another table?
Please give me a more clear explanation when to use which one?
Embeddable classes represent the state of their parent classes. So to take your example, a StackOverflow POST has an ID which is invariant and used in an unbreakable URL for sharing, e.g. http://stackoverflow.com/q/44017535/146325. There is a series of other attributes (state, votes, etc.) which are scalar properties. When the post gets edited we have various versions of the text (which are kept and visible to people with sufficient rep). Those are your POST DETAILS.
"what is the logic to separate it in another table?"
Because keeping different things in separate tables is what relational databases do. The standard way of representing this data model is a parent table POST and child table POST_DETAIL with a defined relationship enforced through a foreign key.
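The parent/child design described above can be sketched with SQLite through Python's `sqlite3` module; the table and column names (`post`, `post_detail`, `version`, `body`) are hypothetical stand-ins. Getting the latest text is an ordinary join plus a filter on version, and the foreign key stops orphaned details.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE post (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE post_detail (
    post_id INTEGER NOT NULL REFERENCES post(id),  -- child -> parent
    version INTEGER NOT NULL,
    body    TEXT NOT NULL,
    PRIMARY KEY (post_id, version)
);
"""


def connect_posts() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def latest_body(conn: sqlite3.Connection, post_id: int) -> Optional[str]:
    """Join parent to child and keep the highest version of the text."""
    row = conn.execute(
        """SELECT d.body
             FROM post p JOIN post_detail d ON d.post_id = p.id
            WHERE p.id = ?
            ORDER BY d.version DESC
            LIMIT 1""",
        (post_id,),
    ).fetchone()
    return row[0] if row else None
```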
Embeddable is a concept from object-oriented programming. Oracle does support object-relational constructs in the database. So it would be possible to define a POST_DETAIL Type and create a POST Table which has a column declared as a nested table of that Type. However, that would be a bad design for two reasons:
The SQL for working with nested tables is clunky. For instance, to get the POST and the latest version of its text would require unnesting the collection of details every time we need to display it. Computationally not much different from joining to a child table and filtering on latest version flag, but harder to optimise.
Children can have children themselves. In the case of Posts, Tags are details because they can vary due to editing. But if you embed TAG in POST_DETAIL embedded in POST how easy would it be to find all the Posts with an [oracle] tag?
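For the tag question, a relational layout keeps tags in their own child table, so "all posts with an [oracle] tag" is one indexed, set-based query with no unnesting. A minimal SQLite sketch; `post_tag` and the index name are hypothetical, and the per-version history of tags is left out for brevity:

```python
import sqlite3


def connect_tags() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE post_tag (
        post_id INTEGER NOT NULL REFERENCES post(id),
        tag     TEXT NOT NULL,
        PRIMARY KEY (post_id, tag)      -- a post carries each tag at most once
    );
    CREATE INDEX post_tag_by_tag ON post_tag (tag);  -- serves "all posts with tag X"
    """)
    return conn


def posts_with_tag(conn: sqlite3.Connection, tag: str) -> list:
    return [r[0] for r in conn.execute(
        "SELECT post_id FROM post_tag WHERE tag = ? ORDER BY post_id", (tag,))]
```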
This is the difference between Object-Oriented design and relational design.
OO is strongly hierarchical: everything belongs to something, and the way to get to the detail is through the parent. This approach works well when dealing with single instances of things, and so is appropriate for UI design.
Relational prioritises commonality: everything of the same type is grouped together with links to other things. This approach is suited for dealing with sets of things, and so is appropriate for data management tasks (do you want to find all the employees who work in BERLIN or whose job is ENGINEER or who are managed by ELLIOTT?)
"give me a more clear explanation when to use which one"
Always store the data relationally in separate tables. Build APIs using OO patterns when it makes sense to do so.

Context based (parameterised) Queries in Sparx EA

Sparx Enterprise Architect searches are great; however, I would like to search for all requirements linked to a specific object (an Activity) that I have included on a specific diagram. On that diagram, I have added a Model View (which can display the result of an SQL query).
My question is: is there any way to obtain some sort of contextual information to use in the query? Essentially, I would like to know the GUID of the diagram on which the Model View is being run.
Personally I don't use Model Views, but AFAIK there's nothing extending SQL in that direction. You might want to send a feature request. But don't hold your breath.
On the other hand, if you hard-code a GUID in the query, it will work for individual diagrams only, which leads to maintenance issues. Instead, you could stereotype diagrams and use that information in your query.
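A sketch of the stereotype idea, run against a stripped-down mock in SQLite. The table and column names are modelled on EA's repository tables (`t_object`, `t_diagram`, `t_diagramobjects`, `t_connector`), but treat them as assumptions and check them against your own repository; the stereotype value `traced` is hypothetical. The query finds requirements connected, in either direction, to Activities placed on any diagram carrying the stereotype, with no GUID hard-coded.

```python
import sqlite3

# Simplified stand-ins for EA repository tables (real ones have many more columns).
MOCK_SCHEMA = """
CREATE TABLE t_object (Object_ID INTEGER PRIMARY KEY, Object_Type TEXT, Name TEXT);
CREATE TABLE t_diagram (Diagram_ID INTEGER PRIMARY KEY, Name TEXT, StereoType TEXT);
CREATE TABLE t_diagramobjects (Diagram_ID INTEGER, Object_ID INTEGER);
CREATE TABLE t_connector (Connector_ID INTEGER PRIMARY KEY,
                          Start_Object_ID INTEGER, End_Object_ID INTEGER);
"""

# Requirements linked to Activities shown on diagrams with the given stereotype.
REQUIREMENTS_BY_DIAGRAM_STEREOTYPE = """
SELECT DISTINCT d.Name AS diagram, req.Name AS requirement
  FROM t_diagram d
  JOIN t_diagramobjects dobj ON dobj.Diagram_ID = d.Diagram_ID
  JOIN t_object act ON act.Object_ID = dobj.Object_ID AND act.Object_Type = 'Activity'
  JOIN t_connector c ON act.Object_ID IN (c.Start_Object_ID, c.End_Object_ID)
  JOIN t_object req ON req.Object_ID IN (c.Start_Object_ID, c.End_Object_ID)
                   AND req.Object_Type = 'Requirement'
 WHERE d.StereoType = ?
 ORDER BY diagram, requirement
"""


def requirements_for_stereotype(conn: sqlite3.Connection, stereotype: str) -> list:
    return conn.execute(REQUIREMENTS_BY_DIAGRAM_STEREOTYPE, (stereotype,)).fetchall()
```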

How to document a database [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
This post was edited and submitted for review 12 months ago and failed to reopen the post:
Original close reason(s) were not resolved
(Note: I realize this is close to How do you document your database structure? , but I don't think it's identical.)
I've started work at a place with a database with literally hundreds of tables and views, all with cryptic names with very few vowels, and no documentation. They also don't allow gratuitous changes to the database schema, nor can I touch any database except the test one on my own machine (which gets blown away and recreated regularly), so I can't add comments that would help anybody.
I tried using "Toad" to create an ER diagram, but after leaving it running for 48 hours straight it still hadn't produced anything visible, and I needed my computer back. I was talking to some other recent hires, and we agreed that whenever one of us puzzles out what a particular table or some of its columns mean, we should record it in the developers' wiki.
So what's a good way to do this? Just list tables/views and their columns and fill them in as we go? The basic tools I've got to hand are Toad, Oracle's "SQL Developer", MS Office, and Visio.
In my experience, ER (or UML) diagrams aren't the most useful artifact - with a large number of tables, diagrams (especially reverse engineered ones) are often a big convoluted mess that nobody learns anything from.
For my money, some good human-readable documentation (perhaps supplemented with diagrams of smaller portions of the system) will give you the most mileage. This will include, for each table:
Descriptions of what the table means and how it's functionally used (in the UI, etc.)
Descriptions of what each attribute means, if it isn't obvious
Explanations of the relationships (foreign keys) from this table to others, and vice-versa
Explanations of additional constraints and / or triggers
Additional explanation of major views & procs that touch the table, if they're not well documented already
With all of the above, don't document for the sake of documenting - documentation that restates the obvious just gets in people's way. Instead, focus on the stuff that confused you at first, and spend a few minutes writing really clear, concise explanations. That'll help you think it through, and it'll massively help other developers who run into these tables for the first time.
As others have mentioned, there are a wide variety of tools to help you manage this, like Enterprise Architect, Red Gate SQL Doc, and the built-in tools from various vendors. But while tool support is helpful (and even critical, in bigger databases), doing the hard work of understanding and explaining the conceptual model of the database is the real win. From that perspective, you can even do it in a text file (though doing it in Wiki form would allow several people to collaborate on adding to that documentation incrementally - so, every time someone figures out something, they can add it to the growing body of documentation instantly).
One thing to consider is the COMMENT facility built into the DBMS. If you put comments on all of the tables and all of the columns in the DBMS itself, then your documentation will be inside the database system.
Using the COMMENT facility does not change the structure of the schema itself; it only adds data to the data dictionary, where table comments are visible in the USER_TAB_COMMENTS view and column comments in USER_COL_COMMENTS.
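Since the question says changes must go through a controlled process, one option is to keep the descriptions in a script under version control and generate the Oracle `COMMENT ON TABLE` / `COMMENT ON COLUMN` statements from it. A minimal Python sketch; the table and column names are hypothetical examples of the cryptic style described. Once applied (e.g. to the local test database), the text can be read back with `SELECT table_name, comments FROM user_tab_comments` and `SELECT table_name, column_name, comments FROM user_col_comments`.

```python
def sql_literal(text: str) -> str:
    # In Oracle SQL a single quote inside a string literal is written twice.
    return "'" + text.replace("'", "''") + "'"


def comment_statements(table: str, table_comment: str, column_comments: dict) -> list:
    """Build COMMENT ON statements for one table and its columns."""
    statements = [f"COMMENT ON TABLE {table} IS {sql_literal(table_comment)}"]
    for column, text in column_comments.items():
        statements.append(f"COMMENT ON COLUMN {table}.{column} IS {sql_literal(text)}")
    return statements


# Hypothetical names; print a script that can be reviewed and checked in.
for stmt in comment_statements(
    "CSTMR_ORD",
    "One row per customer order",
    {"STS_CD": "Order status: A - Active, C - Cancelled",
     "CRT_DT": "Creation date (customer's local time)"},
):
    print(stmt + ";")
```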
In our team we arrived at a useful approach to documenting large legacy Oracle and SQL Server databases. We use Dataedo to document database schema elements (a data dictionary) and to create ERD diagrams. Dataedo comes with a documentation repository, so your whole team can work on the documentation and read the latest version online. And you don’t need to touch the database itself (Oracle comments or SQL Server MS_Description).
First you import the schema (all tables, views, stored procedures and functions, with triggers, foreign keys etc.). Then you define logical domains/modules and group all objects (drag & drop) into them, so you can analyze and work on smaller chunks of the database. For each module you create an ERD diagram and write a top-level description. Then, as you discover the meaning of tables and views, write a short description for each. Do the same for each column. Dataedo lets you add a meaningful title for each object and column, which is useful if object names are vague or invalid. The Pro version lets you describe foreign keys, unique keys/constraints and triggers, which is useful but not essential for understanding a database.
You can access documentation through UI or you can export it to PDF or interactive HTML (the latter is available only in Pro version).
What is described here is a continuous process rather than a one-time job. If your database changes (e.g. new columns, views), you should sync your documentation on a regular basis (a couple of clicks with Dataedo).
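Whatever tool holds the documentation, the "sync on a regular basis" step boils down to comparing the live catalog with what has been documented. A tool-independent sketch, assuming SQLite's catalog (`sqlite_master`, `PRAGMA table_info`) and a hypothetical `{table: {column: description}}` dictionary exported from wherever the documentation lives:

```python
import sqlite3


def documentation_gaps(conn: sqlite3.Connection, documented: dict) -> dict:
    """Compare live tables/columns with a {table: {column: description}} dictionary."""
    live = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    for (table,) in tables:
        live[table] = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
    # New in the database but missing from the documentation.
    undocumented = sorted(
        f"{t}.{c}" for t, cols in live.items() for c in cols if c not in documented.get(t, {})
    )
    # Still documented but no longer present in the database.
    stale = sorted(
        f"{t}.{c}" for t, cols in documented.items() for c in cols if c not in live.get(t, set())
    )
    return {"undocumented": undocumented, "stale": stale}
```

Run after each schema change, the two lists show exactly which entries need writing or retiring.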
See sample documentation:
http://dataedo.com/download/Dataedo%20repository.pdf
Some guidelines on documentation process:
Diagrams:
Keep your diagrams small and readable: include only the important tables, relations and columns, i.e. the ones that help convey the big picture (primary/business keys, important attributes and relations),
Use a different color for key tables in a diagram,
You can have more than one diagram per module,
You can add a diagram to the description of the most important tables (those with the most relations).
Descriptions:
Don’t document the obvious – don’t write description “Document date” for document.date column. If there’s nothing meaningful to add just leave it blank,
If objects stored in tables have types or statuses it’s good to list them in general description of a table,
Define format that is expected, eg. “mm/dd/yy” for a date that is stored in text field,
List all known/important values and their meaning, e.g. for a status column something like this: “Document status: A – Active, C – Cancelled, D – Deleted”,
If there’s any API to a table – a view that should be used to read data and function/procedures to insert/update data – list it in the description of table,
Describe where the rows’/columns’ values come from (procedure, form, interface etc.),
Use “[deprecated]” mark (or similar) for columns that should not be used (title column is useful for this, explain which field should be used instead in description field).
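The description guidelines can also be applied mechanically when entries are rendered, for instance in a wiki export. A small Python sketch; `column_doc` and its parameters are hypothetical, not a Dataedo feature. It leaves obvious columns blank, lists known codes with their meaning, and marks deprecated columns with a pointer to the replacement.

```python
from typing import Dict, Optional


def column_doc(name: str, description: str = "", values: Optional[Dict[str, str]] = None,
               replaced_by: Optional[str] = None) -> str:
    """One data-dictionary line following the description guidelines."""
    title = f"{name} [deprecated]" if replaced_by else name
    pieces = [description.strip()] if description.strip() else []
    if values:
        # List every known code with its meaning.
        pieces.append(", ".join(f"{code} - {meaning}" for code, meaning in values.items()))
    if replaced_by:
        pieces.append(f"Use {replaced_by} instead.")
    # Nothing meaningful to say: leave the entry blank rather than restate the name.
    return f"{title}: {' '.join(pieces)}" if pieces else title


print(column_doc("status", "Document status:", {"A": "Active", "C": "Cancelled", "D": "Deleted"}))
```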
We use Enterprise Architect for our DB definitions. We include stored procedures, triggers, and all table definitions defined in UML. The three brilliant features of the program are:
Import UML Diagrams from an ODBC Connection.
Generate SQL Scripts (DDL) for the entire DB at once
Generate Custom Templated Documentation of your DB.
You can edit your class/table definitions within the UML tool and generate a fully descriptive document with pictures included. The autogenerated document can be in multiple formats, including MS Word. We have just under 100 tables in our schema, and it's quite manageable.
I've never been more impressed with any other tool in my 10+ years as a developer. EA supports Oracle, MySQL, SQL Server (multiple versions), PostgreSQL, Interbase, DB2, and Access in one fell swoop. Any time I've had problems, their forums have answered my questions promptly. Highly recommended!!
When DB changes come in, we make them in EA, generate the SQL, and check it into our version control (svn). We use Hudson for building, and it auto-builds the database from scripts when it sees you've modified the checked-in SQL.
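The auto-build step could look roughly like the following. This is a hypothetical sketch, not the poster's actual Hudson job: SQLite stands in for the real DBMS, and it assumes the checked-in scripts are named so that alphabetical order is execution order (e.g. `001_tables.sql`, `002_views.sql`).

```python
import sqlite3
from pathlib import Path


def rebuild_database(script_dir: str, db_path: str = ":memory:") -> sqlite3.Connection:
    """Run every *.sql file in name order against a fresh database."""
    conn = sqlite3.connect(db_path)
    for script in sorted(Path(script_dir).glob("*.sql")):
        # executescript runs all statements in the file; a failure raises and
        # should fail the CI build.
        conn.executescript(script.read_text(encoding="utf-8"))
    conn.commit()
    return conn
```

A build server would call this from a job triggered by commits touching the scripts, then run any smoke tests against the resulting database.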
(Mostly stolen from another answer of mine)
This answer extends Kieveli's above, which I upvoted. If your version of EA supports Object Role Modeling (conceptual design, vs. logical design = ERD), reverse engineer to that and then fill out the model with the expressive richness it gives you.
The cheap and lighter-weight option is to download Visiomodeler for free from MS, and do the same with that.
The ORM (call it ORMDB) is the only tool I've ever found that supports and encourages database design conversations with non-IS stakeholders about BL objects and relationships.
Reality check - on the way to generating your DDL, it passes through a full-stop ERD phase where you can satisfy your questions about whether it does anything screwy. It doesn't. It will probably show you weaknesses in the ERD you designed yourself.
ORMDB is a classic case of the principle that the more conceptual the tool, the smaller the market. Girls just want to have fun, and programmers just want to code.
A wiki solution supports hyperlinks and collaborative editing, but a wiki is only as good as the people who keep it organized and up to date. You need someone to take ownership of the document project, regardless of what tool you use. That person may involve other knowledgeable people to fill in the details, but one person should be responsible for organizing the information.
If you can't use a tool to generate an ERD by reverse engineering, you'll have to design one by hand using Toad or Visio.
Any ERD with hundreds of objects is probably useless as a guide for developers, because it'll be unreadable with so many boxes and lines. In a database with so many objects, it's likely that there are "sub-systems" of a few dozen tables and views each. So you should make custom diagrams of these sub-systems, instead of expecting a tool to do it for you.
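Finding those sub-systems can be given a head start by treating foreign keys as edges of a graph and taking its connected components. A sketch against SQLite's catalog (`PRAGMA foreign_key_list`); in Oracle the same information sits in `ALL_CONSTRAINTS` rows with constraint type `'R'`. Relationships that exist only by convention, without a declared foreign key, will not show up, so treat the groups as candidates to refine by hand.

```python
import sqlite3


def subsystems(conn: sqlite3.Connection) -> list:
    """Group tables into connected components of the foreign-key graph."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()]
    graph = {t: set() for t in tables}
    for t in tables:
        # PRAGMA foreign_key_list: field 2 is the referenced (parent) table.
        for fk in conn.execute(f'PRAGMA foreign_key_list("{t}")').fetchall():
            graph[t].add(fk[2])
            graph.setdefault(fk[2], set()).add(t)
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:  # iterative depth-first search
            node = stack.pop()
            group.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(group))
    return groups
```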
You can also design a pseudo-ERD, where groups of tables are represented by a single object in one diagram, and that group is expanded in another diagram.
A single ERD or set of ERDs is not sufficient to document a system of this complexity, any more than a class diagram would be adequate to document an OO system. You'll have to write a document, using the ERDs as illustrations. You need text descriptions of the meaning and use of each table, each column, and the relationships between tables (especially where such relationships are implicit instead of represented by referential integrity constraints).
All of this is a lot of work, but it will be worth it. If there's a clear and up-to-date place where the schema is documented, the whole team will benefit from it.
Since you have the luxury of working with fellow developers who are in the same boat, I would suggest asking them what they feel would convey the needed information most easily. My company has over 100 tables, and my boss gave me an ERD for a specific set of tables that all connect. So you might also want to try breaking one massive ERD into a bunch of smaller, manageable ERDs.
Well, a picture tells a thousand words so I would recommend creating ER diagrams where you can view the relationship between tables at a glance, something that is hard to do with a text-only description.
You don't have to do the whole database in one diagram; break it up into sections. We use Visual Paradigm at work, but EA is a good alternative, as is ERwin, and no doubt there are lots of others that are just as good.
If you have the patience, then using html to document the tables and columns makes your documentation easier to access.
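Hand-written HTML gets tedious with hundreds of tables, so generating it from the same descriptions is one option. A minimal sketch with Python's standard `html.escape`; `table_html` and the `(name, type, description)` tuple layout are hypothetical. The `id` on each heading lets other pages link to a table as `#TABLE_NAME`.

```python
from html import escape


def table_html(table: str, description: str, columns: list) -> str:
    """Render one table's documentation; `columns` holds (name, type, description)."""
    rows = "\n".join(
        f"  <tr><td>{escape(name)}</td><td>{escape(col_type)}</td><td>{escape(text)}</td></tr>"
        for name, col_type, text in columns
    )
    return (
        f'<h2 id="{escape(table)}">{escape(table)}</h2>\n'
        f"<p>{escape(description)}</p>\n"
        "<table>\n"
        "  <tr><th>Column</th><th>Type</th><th>Description</th></tr>\n"
        f"{rows}\n"
        "</table>"
    )
```

Escaping matters because descriptions often contain `<`, `>` or `&` (e.g. "amount < 0 means refund").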
If describing your databases to your end users is your primary goal Ooluk Data Dictionary Manager can prove useful. It is a web-based multi-user software that allows you to attach descriptions to tables and columns and allows full text searches on those descriptions. It also allows you to logically group tables using labels and browse tables using those labels. Tables as well as columns can be tagged to find similar data items across your database/databases.
The software allows you to import metadata information such as table name, column name, column data type, foreign keys into its internal repository using an API. Support for JDBC data sources comes built-in and can be extended further as the API source is distributed under ASL 2.0. It is coded to read the COMMENTS/REMARKS from many RDBMSs. You can always manually override the imported information. The information you can store about tables and columns can be extended using custom fields.
The Data Dictionary Manager uses the "data object" and "attribute" terminology instead of table and column because it isn't designed specifically for relational databases.
Notes
If describing technical aspects of your database such as triggers, indexes and statistics is important, this software isn't the best option. It is, however, possible to combine a technical solution with this software using hyperlink custom fields.
The software doesn't produce an ERD
Disclosure: I work at the company that develops this product.