Is it a data-type? And what language is it?
A real data type is a data type used in a computer program to
represent an approximation of a real number. Because the real numbers
are not countable, computers cannot represent them exactly using a
finite amount of information. Most often, a computer will use a
rational approximation to a real number.
https://en.wikipedia.org/wiki/Real_data_type
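To make the "rational approximation" point concrete, here is a small Python illustration (any language with IEEE 754 doubles behaves the same way): the value actually stored is a nearby dyadic fraction, not the real number itself.

from fractions import Fraction

x = 1 / 3                              # an IEEE 754 double cannot hold 1/3 exactly
print(x.as_integer_ratio())            # (6004799503160661, 18014398509481984): the stored rational
print(Fraction(1, 3) == Fraction(x))   # False: the double is only an approximation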
REAL is known from SQL, for example, but other languages have corresponding data types.
The specification says: "DataTypes model Types whose instances are distinguished only by their value."
As I understand it, this means each instance has an identifier (technically, for me, it could be seen as the address in memory if no other identifier is available), but two instances can have the same attribute values.
For example, you can have a class Person with an attribute name.
Two different instances of Person may have the same name, because they are distinguished by another identifier (they are not at the same address).
For a DataType this is not possible, because the identifier is the value itself.
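Outside of UML the same distinction shows up as identity-based versus value-based equality. A small Python analogy (illustrative only, not UML semantics):

from dataclasses import dataclass

class Person:                      # instances are distinguished by identity
    def __init__(self, name):
        self.name = name

a, b = Person("Alice"), Person("Alice")
print(a is b)                      # False: two distinct instances
print(a.name == b.name)            # True: yet their attribute values match

@dataclass(frozen=True)            # a value-like type: distinguished only by its value
class Money:
    amount: int
    currency: str

print(Money(5, "EUR") == Money(5, "EUR"))   # True: same value means "the same" instance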
A DataType is not the same as a PrimitiveType: a PrimitiveType is a predefined DataType without any substructure. A PrimitiveType may have an algebra and operations defined outside of UML, for example mathematically (see 10.5.7 of the specification document).
Real is a PrimitiveType, defined as follows (see 21.1 of the specification document):
An instance of Real is a value in the (infinite) set of real numbers. Typically an implementation
will internally represent Real numbers using a floating point standard such as ISO/IEC/IEEE
60559:2011 (whose content is identical to the predecessor IEEE 754 standard).
Hope this helps.
Related
It is my understanding that the memory layout of a Common Lisp object (its bitwise tagging) is defined by CLOS (classes).
I understand that every class has a corresponding type, but not every type has a corresponding class, because types can be compound (lists). I think that types are like logical constraints, as opposed to classes that are concrete "types" with a tagging scheme.
If this is correct, does the type system serve any other purpose other than being a logical constraint (such as specifying that an integer must be within a certain range, or that an array contains a particular type)?
If this is not correct, what purpose does the type system actually serve in light of CLOS? Thanks.
An object has only one class at a time, whereas it can satisfy multiple types.
The type system is a lattice, where you can compute a least upper bound and a greatest lower bound of two types (using or and and, respectively), and which admits a top type (T) and a bottom type (the NIL type, which is not the same as the NULL type).
An implementation of Common Lisp must be able to determine if a value belongs to a type, and that starts with atomic type specifiers, like character or integer, and grows with compound type specifiers (which can be defined by the user).
But whether this is done using tags or by static analysis is left to the implementation; in practice, CL is such that there are cases where you cannot statically determine the type of an object precisely (other than T), simply because an object can be redefined at a later point: you cannot assume its type is fixed (say: a function; that's why inlining or global declarations may help with type inference).
But if you have a scope in which a type can be guaranteed to be invariant, the compiler is free to use unboxed data types to store values. Then you don't have tagged data. That is the case for local declarations of types for variables, but also for specialized arrays: once an array is built, its element type does not change over time, and in some cases knowing that an array contains only (integer 0 15) elements can be used to pack data more efficiently.
CLOS was added to CL fairly late in the game (and it was not the only object system designed for CL).
Even with CLOS, the type system can be used by the compiler for optimizations and by users to reason about their code.
I think it's important to get away from the implementation of things, and instead concentrate on how the language thinks about them. Clearly the implementation needs enough information to know what sort of thing a given object is, and it's going to do that with some kind of 'tag' (which may or may not be some extra bits attached to the object -- some of it might be the leading bits of the address, for instance). Below I've called this the 'representational type'. But you really have almost no access to that implementation detail from the language. It's tempting to think that type-of tells you something which maps 1-1 onto the representational type, but that's not true: (type-of (cons 1 2)) is permitted to return (cons integer integer), for instance, and I think it is probably allowed to return (cons integer number) or (cons (integer 1 1) (integer 2 2)). It's unlikely that there are distinct representational types for all of these: indeed there can't be, since (type-of 1) can return (integer m n) for an infinite number of values of m & n.
So here's a take on how the language thinks about things, and the differences between classes and types, in CL.
Both the type system and the class system consist of a bounded lattice of types / classes. Being a lattice means that for any pair of elements there is a unique supremum (so, for types, a unique type of which both types are subtypes and which has no proper subtype with that property) and a unique infimum (the reverse). Being bounded means there is a top and a bottom type / class.
Classes
Classes are first-class objects (you can store a class in a variable for instance).
All objects (including classes) belong to a class, and there is a well-defined operator to find the immediate class to which any object belongs.
There are a finite number of classes.
The class of an object corresponds fairly closely to its representational type, but not completely (there may be specialized array types which do not have corresponding classes for instance).
Classes can serve as types: (typep 1 (class-of 1)) works, as does (subtypep (class-of 1) '(integer 0 1)) (the answers being t, and nil, t, respectively).
Types
Types are ways to denote collections of objects with common properties, but they are not themselves objects: they are, if anything, just names for collections of things -- the language specification calls these 'type specifiers'. In particular there are an infinite number of types: think of the type (integer m n) for instance. A small number of this infinitude of types correspond to representational types -- the actual information that tells the system what sort of thing something is -- but obviously most of them do not. There may be representational types which do not have corresponding types.
Types in practice serve three purposes I think.
Type information can tell the system what representational types to use, which can help it check that things are the right representational type and optimise things.
Type information can let the system make inferences which can help things significantly.
Type information can let programmers talk about what sort of things they are dealing with, even when that information is not helpful to the system. The system can treat such declarations as assertions about types, which can make programs safer and easier to debug. This is an important reason for types: even if the system does not check them, it is useful for the person reading your code to know that it expects, say, an integer in [0, 30], i.e. an (integer 0 30). Indeed, even if the system does not automatically check declarations, you can force checks with, say, (check-type x (integer 0 30) ...).
The second case is interesting. Let's say I have something which I have told the system is of type (double-float 0.0d0). This is very unlikely to be more useful in terms of representational type than double-float would be. But if I take the square root of this thing then knowing this type might be very useful indeed: the system can know that the result is a double-float, rather than a (complex double-float), and those types are extremely unlikely to be representationally the same. So the system can use my type declaration to make inferences in this way (and these inferences can cascade through the program). Note that classes can't do this (at least CL's classes can't), and neither can the representational type of an object: you need more information than that.
So yes, types serve a number of very useful purposes which aren't satisfied by classes.
A type is a set of values.
A type specifier is some way to succinctly represent a type.
Implementations may do all kinds of markings and registering in order to help them sort out the types of things, but that is not inherent to the concept of types.
A class is an object describing a set of other objects. Since having a succinct name for such a set (type) is quite useful, Common Lisp registers the class name as a type specifier for the corresponding set of objects. That is the whole relation of types to classes.
The type system describes the different kinds of objects that do different things. The CLOS system is used more for methods that define special behaviours for those types, in a way that is more logical for some programmers. Coming from Java, the CLOS system felt more logical and systematic to me, so it has a role for some programmers. I like to think of a CLOS class as being like a Java class such as Integer, and the type system as similar to primitives in Java. The CLOS system simply helps you extend your objects with methods in a more systematic way than building on structures, imho.
I'm learning software testing now and am just wondering what the difference is between equivalence class testing and input domain partitioning; it seems like both of them are about partitioning the input domain.
Frankly speaking, during my career as a software testing engineer I haven't come across many mentions of input domain partitioning.
But the term nevertheless exists, so let's take a look at whether there is a difference between equivalence class testing and input domain partitioning.
The equivalence class technique divides the possible test data for, let's say, an application module into partitions of equivalent data. They're "equivalent" because any member of a partition can represent every other member of that partition, so in theory a single test using one member of a partition is sufficient to cover that partition. Moreover, the partitions should not overlap.
Yes, I know, that's a little cumbersome, so let's look at an example: you have an input field on a web page which accepts all kinds of characters, but at most 256 of them. That gives you the following equivalence partitions (simplified):
Char types:
only letters
only numbers
only special chars
mixed chars (letters + numbers + spec. chars)
Char quantity:
0
>0
<256
256
Each of those equivalence partitions has sub-partitions, e.g. "only letters":
Uppercase letters
Lowercase letters
Mixed-case letters
That means that in order to sufficiently test the "letters" partition you have to design a test case which covers at least one of those sub-partitions. Let's say it is "letters -> uppercase letters": "TEST INPUT STRING". Note that here we've also combined our test string with the "Char quantity > 0" equivalence partition.
So basically, by combining sub-partitions of the "Char types" and "Char quantity" partitions, you'll be able to design a minimal test set for the input data of that field (a sketch of this follows below).
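A rough sketch of that combination step (Python; the representative values are just illustrative choices): pick one member per sub-partition and combine the partitions, instead of testing every possible string.

from itertools import product

char_types = ["TESTINPUT", "testinput", "12345", "!@#$%", "Te5t!nput"]   # one sample per sub-partition
char_quantities = [1, 20, 255, 256]                                      # boundaries plus a typical length

test_inputs = []
for sample, length in product(char_types, char_quantities):
    value = (sample * (length // len(sample) + 1))[:length]   # repeat/trim the sample to the target length
    test_inputs.append(value)

print(len(test_inputs), "test inputs cover the combined partitions")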
On the other side, the input domain of a program contains all possible inputs to that program, which is fairly close to the set of equivalence classes of possible inputs of an application module.
Those who speak about the input domain of a program sometimes also speak about regions, which are the same thing as the sub-partitions of equivalence partitions. Moreover, those input domain partitions (and accordingly regions) must not overlap (just as they must not in equivalence partition testing).
With all that said, I would consider these two terms to describe the same matter in different words.
I want to have real-valued exponents (not just integers) for the terminal variables.
For example, let's say I want to evolve the function y = x^3.5 + x^2.2 + 6. How should I proceed? I haven't seen any GP implementations which can do this.
I tried using the power function, but sometimes the initial solutions have so many exponents that the evaluated value exceeds 'double' bounds!
Any suggestion would be appreciated. Thanks in advance.
DEAP (in Python) supports this; in fact there is an example for it. By adding Python's math.pow to the primitive set you can achieve what you want.
pset.addPrimitive(math.pow, 2)
But using the pow operator you risk getting something like x^(x^(x^(x))), which is probably not desired. You should add a restriction (by what means, I am not sure) on where in your tree pow is allowed (just before a leaf, or something like that).
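One common workaround is to register a "protected" power instead of the raw math.pow, so nested powers can never leave the double range or go complex. A sketch using DEAP's standard gp.PrimitiveSet API (protected_pow and the clamping limits are my own illustrative choices, not part of DEAP):

import math
from deap import gp

def protected_pow(base, exponent):
    # Power that never raises and never overflows a double (illustrative limits).
    try:
        result = math.pow(abs(base), exponent)   # abs() avoids complex results for negative bases
    except (OverflowError, ValueError):
        return 1.0
    return max(min(result, 1e6), -1e6)           # clamp so nested powers stay bounded

pset = gp.PrimitiveSet("MAIN", 1)                # one input variable, as in DEAP's symbolic regression example
pset.addPrimitive(protected_pow, 2)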
OpenBeagle (in C++) also allows it, but you will need to develop your own primitive using pow from <math.h>; you can use the Sin or Cos primitive as an example.
If only some of the initial population are suffering from the overflow problem then just penalise them with a poor fitness score and they will probably be removed from the population within a few generations.
But, if the problem is that virtually all individuals suffer from it, then you will have to add some constraints. The simplest thing to do would be to constrain the exponent child of the power function to be a real literal, which would mean powers could not be nested. It depends on whether this is sufficient for your needs, though. There are a few ways to add constraints like these (or more complex ones) -- try looking into Constrained Syntactic Structures and grammar-guided GP.
A few other simple thoughts: can you use a data-type with a larger range? Also, you could reduce the maximum depth parameter, so that there will be less room for nested exponents. Of course that's only possible to an extent, and it depends on the complexity of the function.
Integers have a different binary representation than reals, so you have to use a slightly different bitstring representation and recombination/mutation operator.
For an excellent demonstration, see slide 24 of www.cs.vu.nl/~gusz/ecbook/slides/Genetic_Algorithms.ppt or check out the Eiben/Smith book "Introduction to Evolutionary Computing". This describes how to map a bit string to a real number. You can then create a representation where x only lies within an interval [y, z]. In this case, choose y and z to be of smaller magnitude than the capacity of the data type you are using (e.g. 10^308 for a double) so you don't run into the overflow issue you describe.
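The decoding itself is only a few lines; here is a hedged Python sketch of mapping an n-bit string onto a real number in an interval [y, z]:

def decode(bits, y, z):
    # Map a bit string (sequence of 0/1) to a real number in the interval [y, z].
    as_int = int("".join(str(b) for b in bits), 2)   # bit string -> non-negative integer
    max_int = 2 ** len(bits) - 1                     # largest value the bit string can encode
    return y + (z - y) * as_int / max_int            # scale into [y, z]

print(decode([1, 0, 1, 1, 0, 0, 1, 0], -10.0, 10.0))   # e.g. 8 bits decoded into [-10, 10]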
You have to consider that with real-valued exponents and a negative base you will not obtain a real, but a complex number. For example, the Math.Pow implementation in .NET says that you get NaN if you attempt to calculate the power of a negative base to a non-integer exponent. You have to make sure all your x values are positive. I think that's the problem that you're seeing when you "exceed double bounds".
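The same behaviour is easy to reproduce in other languages; for instance, in Python (analogous to .NET's Math.Pow returning NaN):

import math

try:
    math.pow(-8.0, 1.0 / 3.0)                 # negative base, non-integer exponent
except ValueError as err:
    print("math.pow refuses:", err)           # "math domain error"

print((-8.0) ** (1.0 / 3.0))                  # the ** operator returns a complex number instead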
Btw, you can try the HeuristicLab GP implementation. It is very flexible with a configurable grammar.
In Eiffel it is said that we should "loosen the pre-conditions and tighten the post-conditions", but I am not sure what this means. How does this benefit, or how is it benefited by, sub-classing?
Thank you
In Design by Contract, you specify a set of pre-conditions and a set of post-conditions for a function. For example, let's say you were writing a memory allocation function. You require that it accept a positive integer as input and produce an evenly aligned pointer as its result.
Loosening the precondition means that when you create a derived class, it has to accept any input that the base class could accept, but might accept other inputs as well. Using the example above, a derived class could be written to accept a non-negative integer instead of just positive integers.
On the result side, you have to ensure that the result from a derived function meets all the requirements placed on the base function -- but it can also add more restrictions. For example, a derived version of the function above could decide to only produce results that were multiples of 8. Every multiple of 8 is clearly even, so it still meets the requirement of the base function, but has imposed an additional restriction as well.
The opposite would not work: if the base class function allows non-negative integers as input, then the derived class must continue to accept all non-negative integers as input. Attempting to change it to accept only positive integers (i.e., reject 0, which is allowed by the base class) would not be allowed -- your derived class can no longer be substituted for the base version under all circumstances.
Likewise with results: if the base class imposed a "multiple of 8" requirement on a result, the derived version must also ensure that all results are multiples of 8. Returning 2 or 4 would violate that requirement.
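The rule is easiest to see with plain assertions. Here is a hedged Python sketch of the allocator example (illustrative names, not Eiffel's require/ensure syntax): the derived class accepts more inputs (weaker precondition) and promises more about its result (stronger postcondition), so it can stand in for the base class anywhere.

class Allocator:
    def allocate(self, size):
        assert size > 0                 # precondition: positive sizes only
        address = 2 * size              # stand-in for a real allocation
        assert address % 2 == 0         # postcondition: evenly aligned result
        return address

class GenerousAllocator(Allocator):
    def allocate(self, size):
        assert size >= 0                # looser precondition: zero is accepted too
        address = 8 * max(size, 1)      # stand-in for a real allocation
        assert address % 8 == 0         # tighter postcondition: multiple of 8, hence still even
        return address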
I'm working with data that is natively supplied as rational numbers. I have a slick generic C# class which beautifully represents this data in C# and allows conversion to many other forms. Unfortunately, when I turn around and want to store this in SQL, I've got a couple solutions in mind but none of them are very satisfying.
Here is an example. I have the raw value 2/3 which my new Rational<int>(2, 3) easily handles in C#. The options I've thought of for storing this in the database are as follows:
Just as a decimal/floating point, i.e. value = 0.66666667 of various precisions and exactness.
Pros: this allows me to query the data, e.g. find values < 1.
Cons: it has a loss of exactness and it is ugly when I go to display this simple value back in the UI.
Store as two exact integer fields, e.g. numerator = 2, denominator = 3 of various precisions and exactness.
Pros: This allows me to precisely represent the original value and display it in its simplest form later.
Cons: I now have two fields to represent this value and querying is now complicated/less efficient as every query must perform the arithmetic, e.g. find numerator / denominator < 1.
Serialize as string data, i.e. "2/3". I would be able to know the max string length and have a varchar that could hold this.
Pros: I'm back to one field but with an exact representation.
Cons: querying is pretty much busted, and I pay a serialization cost (a quick sketch of the round-trip follows the list).
A combination of #1 & #2.
Pros: easily/efficiently query for ranges of values, and have precise values in the UI.
Cons: three fields (!?!) to hold one piece of data, must keep multiple representations in sync which breaks D.R.Y.
A combination of #1 & #3.
Pros: easily/efficiently query for ranges of values, and have precise values in the UI.
Cons: back down to two fields to hold one piece of data, must keep multiple representations in sync which breaks D.R.Y., and must pay extra serialization costs.
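(As a side note on option #3, the round-trip itself is cheap and exact; a Python sketch of the idea, which a C# Rational<T> would mirror:)

from fractions import Fraction

value = Fraction(2, 3)
stored = str(value)           # "2/3" -- what would go into the varchar column
restored = Fraction(stored)   # parses "2/3" back to the exact rational
assert restored == value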
Does anyone have another out-of-the-box solution which is better than these? Are there other things I'm not considering? Is there a relatively easy way to do this in SQL that I'm just unaware of?
If you're using SQL Server 2005 or 2008, you have the option to define your own CLR data types:
Beginning with SQL Server 2005, you can use user-defined types (UDTs) to extend the scalar type system of the server, enabling storage of CLR objects in a SQL Server database. UDTs can contain multiple elements and can have behaviors, differentiating them from the traditional alias data types which consist of a single SQL Server system data type.
Because UDTs are accessed by the system as a whole, their use for complex data types may negatively impact performance. Complex data is generally best modeled using traditional rows and tables. UDTs in SQL Server are well suited to the following:
Date, time, currency, and extended numeric types
Geospatial applications
Encoded or encrypted data
If you can live with the limitations, I can't imagine a better way to map data you're already capturing in a custom class.
I would probably go with Option #4, but use a computed column for the third column to avoid the sync/DRY issue (which also means you actually only store two columns, avoiding the "three fields" issue).
In SQL Server, a computed column is defined like so:
CREATE TABLE dbo.Whatever(
    Numerator INT NOT NULL,
    Denominator INT NOT NULL,
    -- CAST avoids integer division; NULLIF guards against a zero denominator
    Value AS (CAST(Numerator AS FLOAT) / NULLIF(Denominator, 0)) PERSISTED
)
(note the CAST handles the type conversion and NULLIF(Denominator, 0) yields NULL instead of a division-by-zero error; adjust the types to whatever precision you need).
The PERSISTED option (added in SQL Server 2005) stores the computed value, so the division is not redone at query time.
How much precision do you need?
The language, C# or otherwise, will round 2/3 at some position in the available precision. If it's acceptable for whatever you are working on to use decimal values of, say, 10 digits of precision, then set the precision accordingly in the db.
If the precision is really a concern, then separate the numerator & denominator. This would ensure you always have access to whatever precision you want, and you can use a computed column to represent the value for quick filtering:
numerator INT,
denominator INT,
-- CAST avoids integer division; the CASE guards against a zero denominator
result AS CASE WHEN denominator <> 0 THEN CAST(numerator AS FLOAT) / denominator ELSE NULL END
I have experimented a little bit with using the geometry data type in SQL Server 2008 to store and manipulate rational numbers. Basically, I assume that the numerator goes in the X slot and the denominator goes in the Y slot of a fictitious geometry point.
This was good for my needs, but it might be useless for yours. That will depend on what your priorities are (performance, code readability, etc.). I personally found that T-SQL for geometry data manipulation is hard to write and read.
How much precision are you looking at? double/float provide decent precision (in my opinion). I'm pretty sure scientific/astronomical data needs a lot more precision than that. I do know that libraries like MATLAB and Mathematica are good at this. I found that you can use Mathematica with your .NET program. Here is the link
Edit: adding more links and quotes
"When Mathematica operates on rational numbers, it gives an exact result no matter how many digits are required" from here
Another good read, but you would have to implement it I guess