How to show that something increases relational expressive power?

How do I show that something increases relational expressive power? For example I have been given a problem in which I need to show whether adding some certain functionality to the select-project-join queries of SQL increases the expressive power. Do I give an example and show that it is not expressible?

First you must decide what it is that is being expressed by the two notations (ie what they are expressing, are expressive of, are denoting). Otherwise, the problem doesn't make much sense.
Eg: As long as two notations' sets of expressions are countably infinite they can be set in 1:1 correspondence. So anything that an expression in one set can express, the corresponding expression from the other set can be assigned to express. So they are in this trivial sense equally expressive. (Which sense is, essentially, equally expressive of each other's expressions.)
In being told what our two notations are expressing we are generally given for each:
some primitive expressions
some rules for generating expressions
some primitive things
some rules for generating things
a mapping from expressions to things
Sometimes the mapping is from terminal expressions to primitive things and from non-terminal expressions to structured things, but it doesn't have to be like that.
To show that one notation is more expressive (of whatever they are expressing) is to show that it can express all the things that the other can, plus some that the other cannot.
It is ok for the "things" to actually be expressions of one of the notations, with a trivial mapping from each of its expressions to itself, and the other (the less expressive) notation mapping only to a proper subset of it (the more expressive). (The reason that expressibility here is able to differ from the example above is that here each expression of the two notations is being defined to express something different from what it expresses in that example.)
See discussions in the Alice book or Maier's book. These deal with database querying languages. Eg expressively equivalent versions of relational algebra, relational tuple calculus and relational domain calculus, and also other languages like predicate logic and versions of Datalog.
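The comparison described above can be tried out on a toy case. The following is only a sketch under invented assumptions: a single unary relation R, a notation A with R and equality selections, and a notation B that adds one level of union (all of these names are hypothetical, not taken from any textbook). It enumerates the things each notation denotes and checks that B's set is a proper superset. For real query languages the sets of expressions are infinite, so enumeration like this can only illustrate the idea; showing that something is not expressible needs an argument.

```python
from itertools import product

# Hypothetical toy setting: one unary relation R; the "things" are sets of values.
R = frozenset({1, 2, 3})

def denote(expr):
    """Map an expression (a nested tuple) to the set of values it denotes."""
    op = expr[0]
    if op == "R":
        return R
    if op == "select":            # ("select", v): the values of R equal to v
        return frozenset(x for x in R if x == expr[1])
    if op == "union":             # ("union", e1, e2)
        return denote(expr[1]) | denote(expr[2])
    raise ValueError(f"unknown operator: {op!r}")

# Notation A: R and equality selections on R.
exprs_a = [("R",)] + [("select", v) for v in sorted(R)]

# Notation B: everything in A, plus a union of any two A-expressions.
exprs_b = exprs_a + [("union", e1, e2) for e1, e2 in product(exprs_a, repeat=2)]

things_a = {denote(e) for e in exprs_a}
things_b = {denote(e) for e in exprs_b}

# B is strictly more expressive here if it denotes a proper superset.
print(things_a < things_b)
print(sorted(sorted(s) for s in things_b - things_a))
```

The second line printed lists the sets that only notation B can denote, which are the witnesses for "plus some that the other cannot".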

Related

Why are operators different in different languages?

Why do operators have different functions in different languages?
Because different languages exist to solve different problems, are developed at different times by different people with different levels of knowledge, under different outside constraints.
Depending on what problems a language tries to solve (or tries to solve first), some of the easier characters to type might have already been used for more common or newer concepts by the time a new operator is added.
E.g. PHP wasn't an object-oriented language at first, so it used . as the concatenation operator. When object-oriented features were added later, a different operator was needed for accessing fields.
OTOH in a language like HyperTalk, which doesn't have data structures, you do not need a field-resolution operator at all.

Are there conventional synonyms used to replace keywords reserved in programming languages?

The main examples of the words I mean are "object", "value" etc. On some occasions at least, you may find yourself wanting to name a variable etc. of yours this way.
Another example I have stumbled upon in my practice is "try" which represents both the keyword (in many C-like and other languages) used in exception handling and the currency of Turkey. But this is an example just for fun, I doubt there are any common practices known for this particular case (though I feel like there may be for the previous).
What do people do in such cases? What are some synonyms for an object, a value etc reasonable in the programming and data modelling context?
For example imagine you are developing an object database, manipulating objects, properties and values (rather than documents, fields and... eh... values) is, for some reason, among the key ideas of its philosophy and you really don't want to use words too distinct from these semantically. What words would you use to replace the reserved ones while keeping the sense very close to that of theirs?
The easiest solution that has come to my mind so far is to use misspelled (or spelled in a different language's orthography) varieties of the same words like "objekt", "walue" etc. Although this can do the job, it disgusts me so much that I really don't want to accept going this way ever.
UPDATE: Indeed, in some specific cases (particular languages) using a different case (which may sometimes go against the community's and/or the company's naming convention, by the way) and/or namespaces (which were introduced almost exactly for this) may solve the problem at least partially. I am still interested in alternatives, though, as I believe duplicating a system keyword is something one should at least think about avoiding in every case (provided there is a way to do it easily, without accepting compromises considered too serious).
I am even considering writing a script that would scrape through GitHub to analyse the common naming vocabulary of code elements, but I think it is always a good idea to ask first rather than to "reinvent the bicycle"; perhaps somebody has done something like this already.
UPDATE2: Please do me a favour and consider the following with an applicable degree of objectivity before voting to close. With all due respect, I would like to emphasise that the actual degree of subjectivity of this question is excusably low (though, I admit, somewhat above zero). Its only real flaw is that it might fit the English site better, but I believe the audience of StackOverflow is much more relevant (informed in a much more relevant way) to the context. The actual goal of publishing this question is to highlight a problem that is fairly easy to understand and whose existence cannot be denied (though its importance may be questionable so far), but which is spoken of too little (as the importance of code clarity and semantic relevance increases, IMHO, code as a medium is quickly moving towards greater cultural importance, in the broad meaning of the word, than books), and to let people share the ways of addressing it that they know of in practice.
Capitalization: Often, a different capitalization instead of a synonym does the trick, as most language keywords are case-sensitive. E.g. object = new Object();
Prefix / Postfix: Another often encountered solution is to write myObject = new Object(). Which one you choose really depends on the naming conventions you follow. For private class fields, some developers use an underscore, e.g. this._object indicating a private access modifier.
Specification: In most cases however, you can find a more specific word describing the role - such as instance, parent, child or argument - or the subclass - such as integer or n instead of a generic number datatype - of your object.
In addition to the above, many language communities follow de-facto conventions such as cls for Class, obj for Object, me or self for this etc.
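As one concrete instance of the prefix/postfix approach: Python's style guide (PEP 8) suggests a single trailing underscore for names that would collide with a keyword, e.g. class_. A minimal sketch of applying that rule mechanically follows; the helper name safe_name is made up for illustration, while keyword.iskeyword is part of the standard library.

```python
import keyword

def safe_name(name: str) -> str:
    """Return name unchanged unless it is a Python keyword; then append "_"."""
    return name + "_" if keyword.iskeyword(name) else name

print(safe_name("class"))   # a keyword, so it gets the suffix
print(safe_name("value"))   # not a keyword, so it is kept as-is
```

Note that this only catches reserved words. Names such as object are built-ins rather than keywords in Python, so shadowing them is legal but still worth avoiding.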

What is the best way to represent a form with hundreds of questions in a model

I am trying to design income tax return software.
What is the best way to represent/store a form with hundreds of questions in a model?
Just for this example, I need at least 6 models (T4, T4A(OAS), T4A(P), T1032, UCCB, T4E) which possibly contain hundreds of fields.
Is it by creating hundreds of fields? Storing values in a map? An array?
One very generic approach could be XML
XML allows you to
nest your data to any degree
combine values and meta information (attributes and elements)
describe your data in detail with XSD
store it externally
maintain it easily
even combine it with additional information (look at processing instructions)
and (last but not least) store the real data in almost the same format as the model...
and (laster but even not leaster :-) ) there is XSLT to transform your XML data into any other format (such as HTML for nice presentation)
There is high support for XML in all major languages and database systems.
Another way could be a typical parts list (or bill of materials/BOM)
This tree structure is - typically - implemented as a table with a self-referenced parentID. Working with such a table needs a lot of recursion...
It is very highly recommended to store your data type-safe. Either use a character storage format and a type identifier (that means you have to cast all your values here and there), or you use different type-safe side tables via reference.
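A rough sketch of the parts-list idea, using SQLite through Python's sqlite3 module: one table with a self-referencing parent_id, values kept as text next to a type identifier, and a recursive CTE to walk the tree. The table name form_item, its columns and the sample fields are all assumptions made up for this illustration (only the form name T4 comes from the question).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE form_item (
    id         INTEGER PRIMARY KEY,
    parent_id  INTEGER REFERENCES form_item(id),  -- NULL for a root node
    name       TEXT NOT NULL,
    value      TEXT,                              -- character storage
    value_type TEXT                               -- type identifier used for casting
);
INSERT INTO form_item VALUES
    (1, NULL, 'T4',         NULL,       NULL),
    (2, 1,    'employer',   NULL,       NULL),
    (3, 2,    'name',       'ACME Ltd', 'str'),
    (4, 1,    'income',     '52000.50', 'float'),
    (5, 1,    'dependants', '2',        'int');
""")

CASTS = {"str": str, "int": int, "float": float}

def load_tree(root_id):
    """Walk the subtree under root_id with a recursive CTE and cast the leaf values."""
    rows = conn.execute("""
        WITH RECURSIVE subtree(id, path, value, value_type) AS (
            SELECT id, name, value, value_type FROM form_item WHERE id = ?
            UNION ALL
            SELECT c.id, s.path || '/' || c.name, c.value, c.value_type
            FROM form_item AS c JOIN subtree AS s ON c.parent_id = s.id
        )
        SELECT path, value, value_type FROM subtree WHERE value IS NOT NULL
    """, (root_id,)).fetchall()
    return {path: CASTS[vtype](value) for path, value, vtype in rows}

form = load_tree(1)
print(form)
```

The cast on the way out is the "here and there" cost mentioned above; separate typed side tables would move that work into the schema instead.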
Furthermore - if your data is to be filled from lists - you should define a datasource to load a selection list dynamically.
Conclusion
What is best for you mainly depends on your needs: How often will the model change? How many rules are there to guarantee the data's integrity? Are you using an RDBMS? Which language/tools are you using?
With a case like this, the monolithic aggregate is probably unavoidable (unless you can deduce common fields). I'm going to exclude RDBMS since the topic seems to focus more on lower-level data structures and a more proprietary-style solution, though that could be a very valid option that can manage all these fields.
In this case, I think it ceases to become so much about formalities as just daily practicalities.
Probably worst from that standpoint in this case is a formal object aggregating fields, like a class or struct with a boatload of data members. Those tend to be the most awkward and the most unattractive as monoliths, since they tend to have a static nature about them. Depending on the language, declaration/definition/initialization could be separate, which means 2-3 lines of code to maintain per field. If you want to read/write these fields from a file, you have to write a separate line of code for each and every field, and maintain and update all that code if new fields are added or existing ones removed. If you start approaching anything resembling polymorphic needs in this case, you might have to write a boatload of branching code for each and every field, and that too has to be maintained.
So I'd say hundreds of fields in a static kind of aggregate is, by far, the most unmaintainable.
Arrays and maps are effectively the same thing to me here in a very language-agnostic sense provided that you need those key/value pairs, with only potential differences in where you store the keys and what kind of algorithmic complexity is involved. Whatever you do, probably a key search in this monolith should be logarithmic time or better. 'Maps/associative arrays' in most languages tend to inherently have this quality.
Those can be far more suitable, and you can achieve the kind of runtime flexibility that you like on top of those (like being able to manage these from a file and add the fields on the fly with no pre-existing knowledge). They'll be far more forgiving here.
So if the choice is between a bunch of fields in a class and something resembling a map, I'd suggest going for a map. The dynamic nature of it will be far more forgiving for these kinds of cases and will typically far outweigh the compile-time benefits of, say, checking to make sure a field actually exists and producing a syntax error otherwise. That kind of checking is easy to add back in and more if we just accept that it will occur at runtime.
An exception that might make the field solution more appealing is if you involve reflection and more dynamic techniques to generate an object with the appropriate fields on the fly. Then you get back those dynamic benefits and flexibility at runtime. But that might be more unwieldy to initialize the structure, could involve leaning a lot more heavily on heavy-duty (and possibly very computationally-expensive) introspection and type manipulation and code generation mechanisms, and also end up with more funky code that's hard to maintain.
So I think the safest bet is the map or associative array, and a language that lets you easily add new fields, inspect existing ones, etc. with very fast turnaround. If the language doesn't inherently have that quality, you could look to an external file to dynamically add fields, and just maintain the file.
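To make the map-based option concrete, here is a minimal sketch in Python. The field names, the JSON layout of the definitions and the Form class are all hypothetical; the JSON string stands in for the external file mentioned above. The check that a field exists, which a class with fixed members would give at compile time, is done at runtime instead.

```python
import json

# Stand-in for an external definition file; in practice this text would be read from disk.
FIELD_DEFINITIONS = json.loads("""
{
    "employment_income": {"type": "float", "required": true},
    "income_tax_deducted": {"type": "float", "required": false},
    "province": {"type": "str", "required": true}
}
""")

CASTS = {"float": float, "int": int, "str": str}

class Form:
    """A form backed by a dict, with field checks done at runtime."""

    def __init__(self, definitions):
        self.definitions = definitions
        self.values = {}

    def set(self, name, raw):
        if name not in self.definitions:          # replaces the compile-time check
            raise KeyError(f"unknown field: {name}")
        self.values[name] = CASTS[self.definitions[name]["type"]](raw)

    def missing(self):
        """Names of required fields that have no value yet."""
        return [n for n, d in self.definitions.items()
                if d["required"] and n not in self.values]

form = Form(FIELD_DEFINITIONS)
form.set("employment_income", "52000.50")
print(form.values)
print(form.missing())
```

Adding or removing a field then means editing the definitions, not the code that reads, writes or validates the form.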

Modularizing SQL even if only syntactic sugar

Is there a way to modularize SQL code so that it is more readable and testable?
My SQL code often becomes a long complicated series of nested joins, inner joins, etc. that are hard to write and hard to debug. By contrast, in a procedural language like Javascript or Java, one would pinch off discrete elements as separate functions you would call by name.
Yes, one could write each as an entirely separate query, stored in the database, or as a stored procedure, but often I don't want to change/clutter the database; just querying it is fine, especially if the DBA doesn't wish to grant write permissions to all users.
For instance, conceptually a complex query might be easily described in pseudocode like this:
(getCustomerProfile)
left join
(getSummarizedCustomerTransactionHistory)
using (customerId)
left join
(getGeographicalSummaries)
using (region, vendor)
...
I realize that a lot is written on the topic from a theoretical vantage (a few links below), but I'm just looking for a way to make the code easier to write correctly, and easier to read once written. Perhaps just syntactic sugar to abstract the complexity from sight, if not from execution, that compiles down to the literal SQL I'm trying to not look at. By analogy...
Stylus: CSS ::
CoffeeScript : Javascript ::
SAS Macro language: SAS language ::
? : SQL
And if the specific SQL flavor matters, most of my work is in PostgreSQL.
http://lambda-the-ultimate.org/node/2440
Code reuse and modularity in SQL
Are Databases and Functional Programming at odds?
In most databases, you can do what you want using CTEs (Common Table Expressions):
with CustomerProfile as (
getCustomerProfile
),
SummarizedCustomerTransactionHistory as (
getSummarizedCustomerTransactionHistory
),
GeographicalSummaries as (
getGeographicalSummaries
)
select <whatever>
This works for a single query. It has the advantage that you can define a CTE once, but use it multiple times. Also, I often define a CTE called const that has constant values.
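A runnable version of that pattern, as a sketch only: it uses SQLite through Python's sqlite3 module so it can be tried without a server, and the tables customer and txn, their columns and the threshold value are invented for the illustration. It includes a const CTE of the kind mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customerId INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE txn (customerId INTEGER, amount REAL);
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO txn VALUES (1, 10.0), (1, 15.0), (2, 200.0);
""")

QUERY = """
WITH const AS (
    SELECT 100.0 AS big_spender_threshold
),
CustomerProfile AS (
    SELECT customerId, name FROM customer
),
SummarizedCustomerTransactionHistory AS (
    SELECT customerId, SUM(amount) AS total FROM txn GROUP BY customerId
)
SELECT p.name, h.total, h.total >= const.big_spender_threshold AS big_spender
FROM CustomerProfile AS p
LEFT JOIN SummarizedCustomerTransactionHistory AS h USING (customerId)
CROSS JOIN const
ORDER BY p.customerId
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

Each named block can be tested on its own by replacing the final SELECT with a SELECT from just that CTE.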
The next step is to take these constructs and create views from them. This is especially useful when sharing code among multiple modules, to ensure constant definitions. In some databases, you can put indexes on the views to "instantiate" them, further optimizing processing.
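Continuing in the same spirit, a small sketch of moving one such definition into a view so that several queries share it. Again this uses SQLite via Python's sqlite3 module with an invented txn table; indexed or materialized views are not available in SQLite, so that part is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE txn (customerId INTEGER, amount REAL);
INSERT INTO txn VALUES (1, 10.0), (1, 15.0), (2, 200.0);

-- The same definition a CTE would hold, stored once under a name.
CREATE VIEW SummarizedCustomerTransactionHistory AS
    SELECT customerId, SUM(amount) AS total, COUNT(*) AS n
    FROM txn GROUP BY customerId;
""")

# Two separate queries share the single definition.
totals = conn.execute(
    "SELECT customerId, total FROM SummarizedCustomerTransactionHistory ORDER BY customerId"
).fetchall()
busiest = conn.execute(
    "SELECT customerId FROM SummarizedCustomerTransactionHistory ORDER BY n DESC LIMIT 1"
).fetchone()
print(totals, busiest)
```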
Finally, I recommend wrapping inserts/updates/deletes in stored procedures. This allows you to have a consistent framework.
Two more comments though. First, SQL is often used for transactional or reporting systems. Often, once you get the data in the right format for the purpose, the data speaks for itself. Your example might just be asking for a data mart that has three tables devoted to those three subject areas, which get populated once per week or once per day.
Second, SQL is not an ideal language for abstraction. With good practice, naming conventions, and indentation style, you can make it useful. I sorely miss certain things from "real" languages, such as macros, error handling (why data errors are so hard to identify and handle is beyond me), consistent methods for common functionality (can someone say group string concatenation), and some other features. That said, because it is data centric and readily parallelizable, it is more useful for me than most other languages.
The issue here is that you need to think about data in a relational way. I do not believe this type of abstraction correctly fits into the relational model. In terms of making SQL modular, that is what stored procedures and/or functions are for. Notice how these have the same characteristics as methods do in Java. You can abstract out that way. Another way is to abstract the data you care about into materialized views. By doing this you can put a regular view (compare a virtual function) on top of these materialized views, which allows you to test the structure of the data without touching the "raw" tables.

In non-procedural languages, what specifies how things are to be done?

If you compare C vs SQL, this is the argument:
"In contrast to procedural languages such as C, which describe how things should be done, SQL is nonprocedural and describes what should be done."
So, the how part for languages like SQL is specified by the language itself, is it? What if I want to change the way some query works. Suppose I want to change the way a SELECT is handled. Is that possible?
"So, the how part for languages like SQL is specified by the language itself, is it?"
Not strictly by the language (ie. SQL), but normally by the database and its optimiser. As such, even where the same data is being queried from tables with the same structures and the same indexes, some databases will build the resultset in a different way to others.
"Suppose I want to change the way a SELECT is handled. Is that possible?"
To some degree, yes. You can either:
Rewrite the query, to achieve the same result a different way, or
Use hinting - http://en.wikipedia.org/wiki/Hint_%28SQL%29
Neither of these directly instructs the database engine which approach to use, but both of them will affect how the resultset is built - this is likely to vary between databases.
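A small experiment along these lines, offered as a sketch rather than a recipe: hint syntax is vendor-specific, so this uses SQLite (through Python's sqlite3 module) and its INDEXED BY / NOT INDEXED clauses as a stand-in for hints in other databases. The orders table and index name are invented. EXPLAIN QUERY PLAN shows that the same result is produced in two different ways.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
CREATE INDEX idx_orders_customer ON orders(customer_id);
INSERT INTO orders (customer_id, amount) VALUES (1, 10.0), (2, 20.0), (1, 30.0);
""")

WITH_INDEX = ("SELECT amount FROM orders INDEXED BY idx_orders_customer "
              "WHERE customer_id = 1 ORDER BY amount")
NO_INDEX = ("SELECT amount FROM orders NOT INDEXED "
            "WHERE customer_id = 1 ORDER BY amount")

def plan(sql):
    """Return the planner's description of how it would run sql."""
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

print(plan(WITH_INDEX))
print(plan(NO_INDEX))

result_with_index = conn.execute(WITH_INDEX).fetchall()
result_no_index = conn.execute(NO_INDEX).fetchall()
print(result_with_index == result_no_index)
```

The exact wording of the plans differs between SQLite versions, so it is worth reading the printed output on your own installation rather than relying on a particular text.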
Additionally, I understand that some databases have additional interfaces that allow more low-level interaction with the database engine, enabling greater control over how a query is built than is possible from plain SQL. (However, your question did specify SQL.)
This is actually exaggerating the difference. There is no clear-cut point at which one is telling how things are done and the other only telling what is done. Rather, one may have to specify what/how things are done at a greater level of detail than the other. A typical SQL implementation allows the user to control such things as what indexes are used (or ignored), what kind of locking to do, and so on.
If you were to do the same job in C, you would (at some point) have to specify a great deal more detail (unless you used something like ODBC). Nonetheless, you're still telling what should be done, not all the details of how it should be done (e.g., despite being about as low-level as possible short of assembly language, C will still do some type conversions automatically, so you don't have to tell it how to do something like adding an integer to a floating point number -- you just tell it to add them, and it handles the details).
Bottom line: trying to talk about one as procedural and the other as non-procedural is misleading. SQL doesn't always require as much detail, but it's a difference of degree, not really "how" versus "what".