Are extensible records useless in Elm 0.19?

Extensible records were one of Elm's most amazing features, but since v0.16 adding and removing fields is no longer possible. This puts me in an awkward position.
Consider an example. I want to give a name to a random thing t, and extensible records provide the perfect tool for this:
type alias Named t = { t | name: String }
"Okay," says the compiler. Now I need a constructor, i.e. a function that equips a thing with a specified name:
equip : String -> t -> Named t
equip name thing = { thing | name = name } -- Oops! Type mismatch
Compilation fails, because the { thing | name = ... } syntax assumes thing is a record that already has a name field, but the type system can't guarantee this. In fact, with Named t I was trying to express the opposite: t should be a record type without its own name field, and the function adds that field to the record. Either way, field addition is necessary to implement the equip function.
So it seems impossible to write equip polymorphically, but that's probably not such a big deal. After all, any time I want to give a name to some concrete thing, I can do it by hand. Much worse, the inverse function extract : Named t -> t (which erases the name of a named thing) requires a field-removal mechanism, and thus is not implementable either:
extract : Named t -> t
extract thing = thing -- Error: No implicit upcast
This would be an extremely important function, because I have tons of routines that accept old-fashioned unnamed things, and I need a way to use them on named things. Of course, massive refactoring of those functions is not a viable solution.
Finally, after this long introduction, let me state my questions:
1. Does modern Elm provide some substitute for the old, deprecated field addition/removal syntax?
2. If not, is there some built-in function like equip and extract above? For every custom extensible record type, I would like to have a polymorphic analyzer (a function that extracts its base part) and a polymorphic constructor (a function that combines the base part with the additions and produces the record).
Negative answers to both (1) and (2) would force me to implement Named t in a more traditional way:
type Named t = Named String t
In this case, I can't see the purpose of extensible records. Is there a positive use case, a scenario in which extensible records play a critical role?

The type { t | name : String } means a record that has a name field. It does not extend the t type; rather, it extends the compiler's knowledge about t itself.
So in fact the type of equip is String -> { t | name : String } -> { t | name : String }.
What is more, as you noticed, Elm no longer supports adding fields to records, so even if the type system allowed what you want, you still could not do it. The { thing | name = name } syntax only supports updating records of type { t | name : String }.
Similarly, there is no support for deleting fields from a record.
If you really need types from which you can add or remove fields, you can use a Dict. The other options are writing the transformers manually, or creating and using a code generator (this was the recommended solution for JSON decoding boilerplate for a while).
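For instance, a rough sketch of the Dict route (all field values must share one type, and you trade away the compiler's field checks):
import Dict exposing (Dict)

-- "Fields" can now be added and removed freely at runtime.
equip : String -> Dict String String -> Dict String String
equip name thing =
    Dict.insert "name" name thing

extract : Dict String String -> Dict String String
extract thing =
    Dict.remove "name" thing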
And regarding extensible records: Elm does not really support the "extensible" part much any more – the only remaining piece is the { t | name : u } -> u projection, so perhaps they should be called just scoped records. The Elm docs themselves acknowledge that the extensibility is not very useful at the moment.

You could just wrap the t type together with the name, but it wouldn't make a big difference compared to the approach with a custom type:
type alias Named t = { val: t, name: String }
equip : String -> t -> Named t
equip name thing = { val = thing, name = name }
extract : Named t -> t
extract thing = thing.val
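Usage is then straightforward (a minimal sketch):
named : Named Int
named =
    equip "answer" 42

-- extract named == 42
-- named.name == "answer"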

Is there a positive use case, a scenario in which extensible records play critical role?
Yes, they are useful when your application's Model grows too large and you face the question of how to scale your application out. Extensible records let you slice up the model in arbitrary ways, without committing to particular slices long term. If you sliced it up by splitting it into several smaller nested records, you would be committed to that particular arrangement, which tends to lead to nested TEA and the 'out message' pattern, usually a bad design choice.
Instead, use extensible records to describe slices of the model, and group functions that operate on particular slices into their own modules. If you later need to work across different areas of the model, you can create a new extensible record for that.
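For illustration, a minimal sketch of that pattern (the Model and its fields here are assumptions):
type alias Model =
    { name : String
    , count : Int
    , items : List String
    }

-- A slice: any record with a count field, Model included.
type alias Counted r =
    { r | count : Int }

-- Compiles against the slice rather than the whole Model, so it can
-- live in its own module and be reused on any record with a count.
increment : Counted r -> Counted r
increment model =
    { model | count = model.count + 1 }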
It's described by Richard Feldman in his Scaling Elm Apps talk:
https://www.youtube.com/watch?v=DoA4Txr4GUs&ab_channel=ElmEurope
I agree that extensible records can seem a bit useless in Elm, but it is a very good thing they exist, because they solve this scaling issue well.

Related

Golang SQL rows.Scan function for all fields of generic type

I want to use the Scan() function from the sql package for executing a select statement that might (or might not) return multiple rows, and return these results from my function.
I'm new to Golang generics, and am confused about how to achieve this.
Usually, we would use the Scan function on a *sql.Rows and provide references to all fields of the expected result type we want to read the rows into, e.g.:
var alb Album
rows.Scan(&alb.ID, &alb.Title, &alb.Artist,
    &alb.Price, &alb.Quantity)
where Album is a struct type with those five fields shown.
Now, to avoid writing a similar function N times for every SQL table I have, I want to use a generic type R instead. R is constrained by the generic interface type Result, which I will define as a union of N different structs:
type Result interface {
    StructA | StructB | StructC
}

func ExecSelect[R Result](conn *sql.DB, cmd Command, template R) []R
How can I now write rows.Scan(...) to apply the Scan operation to all fields of R's concrete type? E.g. I would want rows.Scan(&res.Field1, &res.Field2, ...) where res is of type R, and Scan should receive all fields of the current concrete type of R. And do I actually need to provide a 'template' argument of R's concrete type, so that at runtime it becomes clear which struct is relevant?
Please correct me on any mistakes I'm making with the generics.
This is a poor use case for generics.
The arguments to sql.Rows.Scan are supposed to be the scan destinations, i.e. your struct fields, one for each column in the result set, and within the generic function body you do not have access to the fields of the R type parameter.
Even if you did, the structs in your Result constraint likely have different fields, so how do you envision writing generic code that works with all of them?
You might accomplish what you want with a package that provides arbitrary struct scanning, like sqlx with facilities such as StructScan, but that uses reflection under the hood to map the struct fields onto sql.Rows.Scan arguments, so you are not getting any benefit from generics at all.
If anything, you are making it worse, because now you also pay the additional performance overhead of type parameters.
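For reference, a rough sketch of the sqlx route mentioned above, reusing the Album struct from the question (the db tags, driver, and schema are assumptions):
package main

import (
    "log"

    "github.com/jmoiron/sqlx" // maps columns onto struct fields via reflection
    _ "github.com/mattn/go-sqlite3"
)

type Album struct {
    ID       int     `db:"id"`
    Title    string  `db:"title"`
    Artist   string  `db:"artist"`
    Price    float64 `db:"price"`
    Quantity int     `db:"quantity"`
}

func main() {
    db, err := sqlx.Open("sqlite3", "albums.db")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    // Select scans every result row into the slice, one struct per row,
    // without listing the fields by hand.
    var albums []Album
    if err := db.Select(&albums, "SELECT id, title, artist, price, quantity FROM album"); err != nil {
        log.Fatal(err)
    }
}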

Spock Extension - Extracting variable names from Data Tables

In order to extract data table values to use in a reporting extension for Spock, I am using the following code:
@Override
public void beforeIteration(IterationInfo iteration) {
    Object[] values = iteration.getDataValues();
}
This returns to me the reference to the objects in the data table. However, I would like to get the name of the variable that references the value.
For example, in the following test:
private static User userAge15 = instantiateUserByAge(15);
private static User userAge18 = instantiateUserByAge(18);
private static User userAge19 = instantiateUserByAge(19);
private static User userAge40 = instantiateUserByAge(40);
def "Should show popup if user is 18 or under"(User user, Boolean shouldShowPopup) {
    given: "the user <user>"

    when: "the user does whatever"
    ...something here...

    then: "the popup is shown is <showPopup>"
    showPopup == shouldShowPopup

    where:
    user      | shouldShowPopup
    userAge15 | true
    userAge18 | true
    userAge19 | false
    userAge40 | false
}
Is there a way to receive the strings "userAge15", "userAge18", "userAge19", "userAge40" instead of their values?
The motivation for this is that the User object is complex, with lots of information (name, surname, etc.), and its toString() method would make the where clause unreadable in the report I generate.
You can use specificationContext.currentFeature.dataVariables. It returns a list of strings containing the data variable names. This should work both in Spock 1.3 and 2.0.
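A minimal sketch of how that could look inside the listener from the question, pairing each name with its value for the current iteration (the pairing logic is my own illustration):
@Override
void beforeIteration(IterationInfo iteration) {
    List<String> names = iteration.feature.dataVariables  // e.g. ["user", "shouldShowPopup"]
    Object[] values = iteration.dataValues
    // Pair each data variable name with its value for this row of the table.
    def row = [names, values.toList()].transpose().collectEntries { name, value -> [(name): value] }
}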
Edit: Oh sorry, you do not want the data variable names ["a", "b", "expected"] but ["test1", "test1", "test2"]. Sorry, I cannot help you with that and would not if I could because that is just a horrible way to program IMO. I would rather make sure the toString() output gets shortened or trimmed in an appropriate manner, if necessary, or to (additionally or instead) print the class name and/or object ID.
Last but not least, writing tests is a design tool uncovering potential problems in your application. You might want to ask yourself why toString() results are not suited to print in a report and refactor those methods. Maybe your toString() methods use line breaks and should be simplified to print a one-line representation of the object. Maybe you want to factor out the multi-line representation into other methods and/or have a set of related methods like toString(), toShortString(), toLongString() (all seen in APIs before) or maybe something specific like toMultiLineString().
Update after OP significantly changed the question:
If the user of your extension feels that the report is not clear enough, she could add a column userType to the data table, containing values like "15 years old".
Or maybe simpler, just add an age column with values like 15, 18, 19, 40 and instantiate users directly via instantiateUserByAge(age) in the user column or in the test's given section instead of creating lots of static variables. The age value would be reported by your extension. In combination with an unrolled feature method name using #age this should be clear enough.
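Rewritten that way, the feature might look roughly like this (showPopupFor stands in for whatever the test actually checks):
def "Should show popup if user is #age or under"() {
    given: "a user of the given age"
    User user = instantiateUserByAge(age)

    expect:
    showPopupFor(user) == shouldShowPopup

    where:
    age | shouldShowPopup
    15  | true
    18  | true
    19  | false
    40  | false
}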
Is creating users so super expensive you have to put them into static variables? You want to avoid statics if not really necessary because they tend to bleed over side effects to other tests if those objects are mutable and their internal state changes in one test, e.g. because someone conveniently uses userAge15 in order to test setAge(int). Try to avoid premature optimisations via static variables which often just save microseconds. Even if you do decide to pre-create a set of users and re-use them in all tests, you could put them into a map with the age being the key and conveniently retrieve them from your feature methods, again just using the age in the data table as an input value for querying the map, either directly or via a helper method.
Bottom line: I think you do not have to change your extension in order to cater to users writing bad tests. Those users ought to learn how to write better tests. As a side effect, the reports will also look more comprehensive. 😀

Common return type for all ANTLR visitor methods

I'm writing a parser for an old proprietary report specification with ANTLR, and I'm currently trying to implement a visitor over the generated parse tree by extending the autogenerated abstract visitor class.
I have little experience both with ANTLR (which I learned only recently) and with the visitor pattern in general, but if I understood it correctly, the visitor should encapsulate one single operation on the whole data structure (in this case the parse tree), thus sharing the same return type between all the Visit*() methods.
Taking an example from The Definitive ANTLR 4 Reference book by Terence Parr: to visit a parse tree generated by a grammar that parses a sequence of arithmetic expressions, it feels natural to choose the int return type, as each node of the tree is actually part of the arithmetic operation that contributes to the final result computed by the calculator.
Considering my current situation, I don't have a common type: my grammar parses the whole document, which is actually split into different sections with different responsibilities (variable declarations, print options, actual text for the rows, etc.), and I can't find a common type between the results of visiting such different nodes, besides object of course.
I tried to think of some possible solutions:
1. I first tried implementing a stateless visitor using object as the common type, but the amount of type casts needed looks like a big red flag to me. I also considered using JSON, but I think the problem remains, potentially with some extra overhead added by the serialization process.
2. I also thought about splitting the visitor into several smaller visitors, each with a specific purpose (get all the variables, get all the rows, etc.), but with this solution each visitor would implement only a small subset of the methods of the autogenerated interface (which is meant to support visiting the whole tree), because each visiting operation would probably focus only on a specific subtree. Is that normal?
3. Another possibility could be to redesign the data structure so that it can be used at every level of the tree or, better, to define a generic specification of the nodes that can be used later to build the data structure. This solution sounds good, but I think it would be difficult to apply in this domain.
4. A final option could be to switch to a stateful visitor, which encapsulates one or more builders for the different sections, which each Visit*() method could use to build the data structure step by step. This solution seems clean and doable, but I have difficulty seeing how to scope the result of each visit operation into the parent scope when needed.
What solution is generally used to visit complex ANTLR parse trees?
ANTLR4 parse trees are often complex because of recursion.
I would define a class ParsedDocumentModel whose properties would be added to or modified as your project evolves (which is normal; no program is set in stone).
Assuming your grammar is called Parser and lives in the file Parser.g4, here is some sample C# code:
public class ParsedDocumentModel
{
    public string Title { get; set; }
    // other properties ...
}

public class ParserVisitor : ParserBaseVisitor<ParsedDocumentModel>
{
    public override ParsedDocumentModel VisitNounz(NounzContext context)
    {
        var res = "unknown";
        var s = context.GetText();
        if (s == "products")
            res = "<<products>>"; // for example
        var model = new ParsedDocumentModel();
        model.Title = res; // add more info...
        return model;
    }
}
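Driving the visitor could then look roughly like this (the lexer/parser class names follow from a grammar named Parser; the file path and the document start rule are assumptions):
using Antlr4.Runtime;

var input = new AntlrInputStream(System.IO.File.ReadAllText("report.spec")); // path assumed
var lexer = new ParserLexer(input);        // generated lexer (name assumed)
var tokens = new CommonTokenStream(lexer);
var parser = new ParserParser(tokens);     // generated parser (name assumed)
var tree = parser.document();              // assumed start rule
ParsedDocumentModel model = new ParserVisitor().Visit(tree);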

F# Record vs Class

I used to think of a Record as a container for (immutable) data, until I came across some enlightening reading.
Given that functions can be seen as values in F#, record fields can hold function values as well. This offers possibilities for state encapsulation.
module RecordFun =
    type CounterRecord = { GetState : unit -> int; Increment : unit -> unit }

    // Constructor
    let makeRecord () =
        let count = ref 0
        { GetState = (fun () -> !count); Increment = (fun () -> incr count) }

module ClassFun =
    // Equivalent
    type CounterClass() =
        let count = ref 0
        member x.GetState() = !count
        member x.Increment() = incr count
Usage:
counter.GetState()
counter.Increment()
counter.GetState()
It seems that, apart from inheritance, there's not much you can do with a Class that you couldn't do with a Record and a helper function, and the latter play better with functional concepts such as pattern matching, type inference, higher-order functions, and generic equality.
Analyzing further, the Record could be seen as an interface implemented by the makeRecord() constructor. This applies (a sort of) separation of concerns, where the logic in the makeRecord function can be changed without risk of breaking the contract, i.e. the record fields.
This separation becomes apparent when replacing the makeRecord function with a module that matches the type’s name (ref Christmas Tree Record).
module RecordFun =
    type CounterRecord = { GetState : unit -> int; Increment : unit -> unit }

    // Module showing allowed operations
    [<CompilationRepresentation(CompilationRepresentationFlags.ModuleSuffix)>]
    module CounterRecord =
        let private count = ref 0

        let create () =
            { GetState = (fun () -> !count); Increment = (fun () -> incr count) }
Questions: Should records be looked upon as simple containers for data, or does state encapsulation make sense? Where should we draw the line; when should we use a Class instead of a Record?
Note the model from the linked post is pure, whereas the code above is not.
I do not think there is a single universal answer to this question. It is certainly true that records and classes overlap in some of their potential uses and you can choose either of them.
The one difference that is worth keeping in mind is that the compiler automatically generates structural equality and structural comparison for records, which is something you do not get for free for classes. This is why records are an obvious choice for "data types".
The rules that I tend to follow when choosing between records & classes are:
Use records for data types (to get structural equality for free)
Use classes when I want to provide C#-friendly or .NET-style public API (e.g. with optional parameters). You can do this with records too, but I find classes more straightforward
Use records for types used locally - I think you often end up using records directly (e.g. creating them) and so adding/removing fields is more work. This is not a problem for records that are used within just a single file.
Use records if I need to create clones using the { ... with ... } syntax (see the sketch below). This is particularly nice if you are writing some recursive processing and need to keep state.
I don't think everyone would agree with this and it is not covering all choices - but generally speaking, using records for data and local types and classes for the rest seems like a reasonable method for choosing between the two.
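To illustrate the copy-and-update point above, a tiny sketch:
type Person = { Name : string; Age : int }

let original = { Name = "Ada"; Age = 36 }
let older = { original with Age = 37 }  // clone with one field changed

printfn "%b" (older = { Name = "Ada"; Age = 37 })  // true, by structural equality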
If you want to achieve data hiding in a record, I feel there are better ways of going about it, like the abstract data type "pattern".
Take a look at this:
type CounterRecord =
    private { mutable count : int }

    member this.Count = this.count
    member this.Increment() = this.count <- this.count + 1
    static member Make() = { count = 0 }
The record constructor is private, so the only way of constructing an instance is through the static Make member.
The count field is mutable – not something to be proud of, but I'd say fair game for your counter example. It's also not accessible from outside the module where it's defined, due to the private modifier; to access it from outside, you have the read-only Count property.
Like in your example, there's an Increment function on the record that mutates the internal state.
Unlike your example, you can compare CounterRecord instances using auto-generated structural comparisons - as Tomas mentioned, the selling point of records.
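Usage, as a rough sketch (equality works from outside the module even though the representation is private):
let a = CounterRecord.Make()
let b = CounterRecord.Make()
printfn "%b" (a = b)  // true: auto-generated structural equality
a.Increment()
printfn "%d" a.Count  // 1
printfn "%b" (a = b)  // false: the internal state now differs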
As for records-as-interfaces, you might see that sometimes in the field, though I think it's more of a JavaScript/Haskell idiom. Unlike those languages, F# has the interface system of .NET, made even stronger when coupled with object expressions. I feel there's not much reason to repurpose records for that.

Efficient way to define a class with multiple, optionally-empty slots in S4 of R?

I am building a package to handle data that arrives with up to 4 different types. Each of these types is a legitimate class in the form of a matrix, data.frame, or tree. Depending on the way the data was processed and other experimental factors, some of these data components may be missing, but it is still extremely useful to be able to store this information as an instance of a special class and have methods that recognize the different component data.
Approach 1:
I have experimented with an incremental inheritance structure that looks like a nested tree, where each combination of data types has its own explicitly defined class. This seems difficult to extend for additional data types in the future, and it is also challenging for new developers to learn all the class names, however well-organized those names might be.
Approach 2:
A second approach is to create a single "master class" that includes a slot for each of the 4 data types. In order to allow the slots to be NULL for instances with missing data, it appears necessary to first define a virtual class union between the NULL class and the new data type class, and then use that virtual class union as the expected class for the relevant slot in the master class. Here is an example (assuming each data type class is already defined):
################################################################################
# Use setClassUnion to define the unholy NULL-data union as a virtual class.
################################################################################
setClassUnion("dataClass1OrNULL", c("dataClass1", "NULL"))
setClassUnion("dataClass2OrNULL", c("dataClass2", "NULL"))
setClassUnion("dataClass3OrNULL", c("dataClass3", "NULL"))
setClassUnion("dataClass4OrNULL", c("dataClass4", "NULL"))
################################################################################
# Now define the master class with all 4 slots, allowing empty (NULL) slots,
# with an explicit prototype that sets slots to NULL if they are not provided
# at instantiation.
################################################################################
setClass(Class = "theMasterClass",
    representation = representation(
        slot1 = "dataClass1OrNULL",
        slot2 = "dataClass2OrNULL",
        slot3 = "dataClass3OrNULL",
        slot4 = "dataClass4OrNULL"),
    prototype = prototype(slot1 = NULL, slot2 = NULL, slot3 = NULL, slot4 = NULL)
)
################################################################################
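Instantiation then works with any subset of the slots; a minimal sketch, assuming dataClass1 is defined:
m <- new("theMasterClass", slot1 = new("dataClass1"))
is.null(m@slot2)  # TRUE -- unsupplied slots fall back to the NULL prototype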
So the question might be rephrased as:
Are there more efficient and/or flexible alternatives to either of these approaches?
This example is modified from an answer to a SO question about setting the default value of a slot to NULL. This question differs in that I am interested in knowing the best options in R for creating classes with slots that can be empty if needed, despite requiring a specific complex class in all other, non-empty cases.
In my opinion...
Approach 2
It sort of defeats the purpose of adopting a formal class system to then create a class that contains ill-defined slots ('A' or NULL). At a minimum I would try to make DataClass1 have a 'NULL'-like default. As a simple example, the default here is a zero-length numeric vector.
setClass("DataClass1", representation = representation(x = "numeric"))

DataClass1 <- function(x = numeric(), ...) {
    new("DataClass1", x = x, ...)
}
Then
setClass("MasterClass1", representation = representation(dataClass1 = "DataClass1"))

MasterClass1 <- function(dataClass1 = DataClass1(), ...) {
    new("MasterClass1", dataClass1 = dataClass1, ...)
}
One benefit of this is that methods don't have to test whether the instance in the slot is NULL or 'DataClass1'
setMethod(length, "DataClass1", function(x) length(x@x))
setMethod(length, "MasterClass1", function(x) length(x@dataClass1))
> length(MasterClass1())
[1] 0
> length(MasterClass1(DataClass1(1:5)))
[1] 5
In response to your comment about warning users when they access 'empty' slots, and remembering that users usually want functions to do something rather than tell them they're doing something wrong, I'd probably return the empty object DataClass1() which accurately reflects the state of the object. Maybe a show method would provide an overview that reinforced the status of the slot -- DataClass1: none. This seems particularly appropriate if MasterClass1 represents a way of coordinating several different analyses, of which the user may do only some.
A limitation of this approach (or your Approach 2) is that you don't get method dispatch -- you can't write methods that are appropriate only for an instance with DataClass1 instances that have non-zero length, and are forced to do some sort of manual dispatch (e.g., with if or switch). This might seem like a limitation for the developer, but it also applies to the user -- the user doesn't get a sense of which operations are uniquely appropriate to instances of MasterClass1 that have non-zero length DataClass1 instances.
Approach 1
When you say that the names of the classes in the hierarchy are going to be confusing to your user, it seems like this is maybe pointing to a more fundamental issue -- you're trying too hard to make a comprehensive representation of data types; a user will never be able to keep track of ClassWithMatrixDataFrameAndTree because it doesn't represent the way they view the data. This is maybe an opportunity to scale back your ambitions to really tackle only the most prominent parts of the area you're investigating. Or perhaps an opportunity to re-think how the user might think of and interact with the data they've collected, and to use the separation of interface (what the user sees) from implementation (how you've chosen to represent the data in classes) provided by class systems to more effectively encapsulate what the user is likely to do.
Putting the naming and number of classes aside, when you say "difficult to extend for additional data types in the future" it makes me wonder if perhaps some of the nuances of S4 classes are tripping you up? The short solution is to avoid writing your own initialize methods, and rely on the constructors to do the tricky work, along the lines of
setClass("A", representation(x="numeric"))
setClass("B", representation(y="numeric"), contains="A")
A <- function(x = numeric(), ...) new("A", x=x, ...)
B <- function(a = A(), y = numeric(), ...) new("B", a, y=y, ...)
and then
> B(A(1:5), 10)
An object of class "B"
Slot "y":
[1] 10
Slot "x":
[1] 1 2 3 4 5