Data Type Encompassing Two Sum Types? - idris

Given these 2 sum types:
data Foo = A Int | B String
data Bar = C Int | D String
I'd like to define a function that returns Either (Foo or Bar) String.
So, I attempted to make:
data Higher = Foo | Bar
But it failed to compile:
*ADT> :r
Type checking ./ADT.idr
ADT.idr:3:6:Main.Foo is already defined
ADT.idr:4:6:Main.Bar is already defined
How can I create a Higher data type, which consists of Foo or Bar?

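Yes, you can. Writing data Higher = Foo | Bar declares two new constructors named Foo and Bar, which clash with the existing types of the same name. Give Higher constructors that wrap each type instead: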
data Foo = A Int | B String
data Bar = C Int | D String
data Higher : Type where
  InjFoo : Foo -> Higher
  InjBar : Bar -> Higher
Now you can do InjFoo (B "Hello") or InjBar (C 5).

Related

Fortran derived type inheritance

Let's say I have a derived type bar_a that is included in derived type foo_a as variable bar.
Now I want to extend bar_a and create a new derived type named bar_b. I tried the following:
program main
implicit none
! Base types -----------
type :: bar_a
integer :: a
end type bar_a
type :: foo_a
type(bar_a) :: bar
end type foo_a
! Extended types -------
type, extends(bar_a) :: bar_b
integer :: b
end type bar_b
type, extends(foo_a) :: foo_b
type(bar_b) :: bar ! <-- Component ‘bar’ at (1) already in the parent type
end type foo_b
! ----------------------
type(foo_b) :: foo
print *, foo%bar%a
print *, foo%bar%b
end program main
but I get a compiler error: "the component ‘bar’ at (1) already in the parent type".
Is there a way to extend foo_a so that it includes the new derived type bar_b as I tried, or is there any way to "override" the bar variable declaration? I would like to inherit the type bound procedures that would be a part of foo_a in foo_b.
When I try to compile I get a better message:
aa.f90:21:22:
10 | type :: foo_a
| 2
......
21 | type(bar_b) :: bar ! <-- Component ‘bar’ already in the parent type
| 1
Error: Component ‘bar’ at (1) already in the parent type at (2)
This is logical. You are extending foo_a with a component named bar, but the type being extended (defined at line 10) already declares a component bar at line 11, and line 21 tries to add another one. An extended type inherits all components of its parent and cannot redeclare one of them.

Dividing column of Dataframe by constant value

I have a DataFrame in the format below.
| Occupation | wa_rating | Genre |
| engineer | 935 | Musical |
Now I want to divide the wa_rating column of this DataFrame by totalRating.
But when I run
resultDF = joinedDF.select(col("wa_rating")/totalRating)
I get the error below.
Unsupported literal type class java.util.ArrayList
Your totalRating variable is most likely a list, for example [100], and a column cannot be divided by a list. This raises your error:
resultDF = joinedDF.select(col("wa_rating")/[100])
but this does not
resultDF = joinedDF.select(col("wa_rating")/100)
Check that totalRating is an actual number (a float or integer). If it's a list containing a number, simply extract the number from it.
EDIT:
From your comments, we now know that your totalRating is a list. You can transform it to a number with:
totalRating = joinedDF3.groupBy().sum("Rating").collect()[0][0]
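To see why the `[0][0]` indexing is needed, here is a plain-Python sketch that does not use Spark at all. It assumes that `collect()` hands back a list of rows and that each row can be indexed by position; the list of tuples, the sample values, and the `divide_all` helper are hypothetical stand-ins.

```python
# Stand-in for the result of collect(): a list holding one row,
# and that row holds one field (the aggregated sum). Values are hypothetical.
collected = [(935.0,)]

# First row, first field: this is a plain number, not a list.
total_rating = collected[0][0]


def divide_all(values, divisor):
    """Divide every value by a scalar; reject a list divisor explicitly."""
    if isinstance(divisor, list):
        raise TypeError("divisor must be a number, not a list")
    return [v / divisor for v in values]


ratings = [935, 187]  # hypothetical wa_rating values
shares = divide_all(ratings, total_rating)
```

Passing `collected` itself as the divisor fails in this sketch, in the same way that the list fails in the original select.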

Changing names of variables using the values of another variable

I am trying to rename around 100 dummy variables with the values from a separate variable.
I have a variable products, which stores information on what products a company sells and have generated a dummy variable for each product using:
tab products, gen(productid)
However, the variables are named productid1, productid2 and so on. I would like these variables to take the values of the variable products instead.
Is there a way to do this in Stata without renaming each variable individually?
Edit:
Here is an example of the data that will be used. There will be duplications in the product column.
And then I have run the tab command to create a dummy variable for each product to produce the following table.
sort product
tab product, gen(productid)
I noticed it updates the labels to show what each variable represents.
What I would like to do is to assign the value to be the name of the variable such as commercial to replace productid1 and so on.
Using your example data:
clear
input companyid str10 product
1 "P2P"
2 "Retail"
3 "Commercial"
4 "CreditCard"
5 "CreditCard"
6 "EMFunds"
end
tabulate product, generate(productid)
list, abbreviate(10)
sort product
levelsof product, local(new) clean
tokenize `new'
ds productid*
local i 0
foreach var of varlist `r(varlist)' {
    local ++i
    rename `var' ``i''
}
Produces the desired output:
list, abbreviate(10)
+---------------------------------------------------------------------------+
| companyid product Commercial CreditCard EMFunds P2P Retail |
|---------------------------------------------------------------------------|
1. | 3 Commercial 1 0 0 0 0 |
2. | 5 CreditCard 0 1 0 0 0 |
3. | 4 CreditCard 0 1 0 0 0 |
4. | 6 EMFunds 0 0 1 0 0 |
5. | 1 P2P 0 0 0 1 0 |
6. | 2 Retail 0 0 0 0 1 |
+---------------------------------------------------------------------------+
Arbitrary strings might not be legal Stata variable names. This will happen if they (a) are too long; (b) start with any character other than a letter or an underscore; (c) contain characters other than letters, numeric digits and underscores; or (d) are identical to existing variable names. You might be better off making the strings into variable labels, where only an 80 character limit bites.
This code loops over the variables and does its best:
gen long obs = _n
foreach v of var productid? productid?? productid??? {
    su obs if `v' == 1, meanonly
    local tryit = product[r(min)]
    capture rename `v' `=strtoname("`tryit'")'
}
Note: code not tested.
EDIT: Here is a test. I added code for variable labels. The data example and code show that repeated values and values that could not be variable names are accommodated.
clear
input str13 products
"one"
"two"
"one"
"three"
"four"
"five"
"six something"
end
tab products, gen(productsid)
gen long obs = _n
foreach v of var productsid* {
    su obs if `v' == 1, meanonly
    local value = products[r(min)]
    local tryit = strtoname("`value'")
    capture rename `v' `tryit'
    if _rc == 0 capture label var `tryit' "`value'"
    else label var `v' "`value'"
}
drop obs
describe
Contains data
obs: 7
vars: 7
size: 133
-------------------------------------------------------------------------------
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
products str13 %13s
five byte %8.0g five
four byte %8.0g four
one byte %8.0g one
six_something byte %8.0g six something
three byte %8.0g three
two byte %8.0g two
-------------------------------------------------------------------------------
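The naming rules (a) to (d) listed earlier in this answer can be sketched outside Stata as well. The Python helper below is only an approximation of what strtoname() does, not its actual implementation. The name `to_stata_name`, the `existing` parameter, and the 32-character limit are assumptions made for illustration.

```python
import re

MAX_NAME_LEN = 32  # assumed limit on the length of a Stata variable name


def to_stata_name(text, existing=()):
    """Return an approximately legal variable name for `text`, or None.

    Hypothetical helper; it only mimics the rules described in the answer.
    """
    # (c) replace anything that is not a letter, digit, or underscore
    name = re.sub(r"[^A-Za-z0-9_]", "_", text)
    # (b) a name must start with a letter or an underscore
    if name and name[0].isdigit():
        name = "_" + name
    # (a) cut names that are too long
    name = name[:MAX_NAME_LEN]
    # (d) give up on empty names and on clashes with existing variables
    if not name or name in existing:
        return None
    return name
```

With this sketch, "six something" maps to six_something, matching the describe output above, and a clash with an existing name returns None, which corresponds to the case where the Stata loop keeps the original variable name and only sets the label.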
Another solution is to use the extended macro function, which reads the variable label of varname into a local macro:
local varlabel: variable label varname
The tested code is:
clear
input companyid str10 product
1 "P2P"
2 "Retail"
3 "Commercial"
4 "CreditCard"
5 "CreditCard"
6 "EMFunds"
end
tab product, gen(product_id)
* get the list of product id variables
ds product_id*
* loop through the product id variables and change the
* variable name to its label
foreach var of varlist `r(varlist)' {
    local varlabel: variable label `var'
    display "`varlabel'"
    local pos = strpos("`varlabel'","==")+2
    local varlabel = substr("`varlabel'",`pos',.)
    display "`varlabel'"
    rename `var' `varlabel'
}

How to filter after group by and aggregate in Spark dataframe?

I have a Spark dataframe df with the following schema:
[id:string, label:string, tag:string]
id | label | tag
---|-------|-----
1 | h | null
1 | w | x
1 | v | null
1 | v | x
2 | h | x
3 | h | x
3 | w | x
3 | v | null
3 | v | null
4 | h | null
4 | w | x
5 | w | x
(h,w,v are labels. x can be any non-empty values)
For each id, there is at most one label "h" and at most one label "w", but there might be multiple "v". I would like to select all the ids that satisfy the following conditions:
Each id has:
1. one label "h" and its tag = null,
2. one label "w" and its tag != null,
3. at least one label "v" for each id.
I think I need to create three columns, one checking each of the conditions above, and then group by "id".
val hCheck = (label: String, tag: String) => {if (label=="h" && tag==null) 1 else 0}
val udfHCheck = udf(hCheck)
val wCheck = (label: String, tag: String) => {if (label=="w" && tag!=null) 1 else 0}
val udfWCheck = udf(wCheck)
val vCheck = (label: String) => {if (label==null) 1 else 0}
val udfVCheck = udf(vCheck)
dfx = df.withColumn("hCheck", udfHCheck(col("label"), col("tag")))
.withColumn("wCheck", udfWCheck(col("label"), col("tag")))
.withColumn("vCheck", udfVCheck(col("label")))
.select("id","hCheck","wCheck","vCheck")
.groupBy("id")
Somehow I need to combine the three columns {"hCheck","wCheck","vCheck"} into a vector such as [x,0,0], [0,x,0] or [0,0,x], and then check whether the vectors for an id include all three of {[1,0,0],[0,1,0],[0,0,1]}.
I have not been able to solve this problem yet, and there might be a better approach than this one. I hope someone can give me suggestions. Thanks
To convert the three checks into a vector, wrap them in an array column:
val df1 = df.withColumn("hCheck", udfHCheck(col("label"), col("tag")))
.withColumn("wCheck", udfWCheck(col("label"), col("tag")))
.withColumn("vCheck", udfVCheck(col("label")))
.select($"id",array($"hCheck",$"wCheck",$"vCheck").as("vec"))
Next, groupBy returns a grouped object on which you need to perform an aggregation. To collect all the vectors for each id, you should do something like:
.groupBy("id").agg(collect_list($"vec"))
Also, you do not need UDFs for the various checks; you can express them with column operations. For example, udfHCheck can be written as:
when($"label" === "h" && $"tag".isNull, 1).otherwise(0)
BTW, you said you wanted at least one label "v" for each id, but vCheck only checks whether the label is null.
Update: Alternative solution
Looking at this question again, I would do something like this:
val grouped = df.groupBy("id", "label").agg(count($"label").as("cnt"), first($"tag").as("tag"))
val filtered1 = grouped.filter($"label" === "v" || $"cnt" === 1)
val filtered2 = filtered1.filter($"label" === "v" || ($"label" === "h" && $"tag".isNull) || ($"label" === "w" && $"tag".isNotNull))
val ids = filtered2.groupBy("id").count.filter($"count" === 3)
The idea is to first group by both id and label, so that we have information on each combination. For each combination we collect the number of rows (cnt) and the first tag (it does not matter which one, since the tag is only checked for "h" and "w", which must occur exactly once).
Now we do two filtering steps:
1. we need exactly one h and one w and any number of v so the first filter gets us these cases.
2. we make sure all the rules are met for each of the cases.
Now we have only combinations of id and label which match the rules so in order for the id to be legal we need to have exactly three instances of label. This leads to the second groupby which simply counts the number of labels which matched the rules. We need exactly three to be legal (i.e. matched all the rules).
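The same three steps can be traced in plain Python on the sample rows from the question. This is only a sketch of the logic, not Spark code: the tuple layout, the use of None for null, and the helper name `ok` are assumptions made for illustration.

```python
from collections import defaultdict

# Sample rows from the question as (id, label, tag); None stands for null.
rows = [
    (1, "h", None), (1, "w", "x"), (1, "v", None), (1, "v", "x"),
    (2, "h", "x"),
    (3, "h", "x"), (3, "w", "x"), (3, "v", None), (3, "v", None),
    (4, "h", None), (4, "w", "x"),
    (5, "w", "x"),
]

# Step 1: group by (id, label), keeping a row count and the first tag seen.
groups = {}
for id_, label, tag in rows:
    cnt, first_tag = groups.get((id_, label), (0, tag))
    groups[(id_, label)] = (cnt + 1, first_tag)


# Step 2: decide whether an (id, label) combination satisfies its rule.
def ok(label, cnt, tag):
    if label == "v":
        return True          # any number of "v" rows is fine
    if cnt != 1:
        return False         # "h" and "w" must occur exactly once
    return (label == "h" and tag is None) or (label == "w" and tag is not None)


# Step 3: an id is valid when all three labels survived the filter.
surviving = defaultdict(int)
for (id_, label), (cnt, tag) in groups.items():
    if ok(label, cnt, tag):
        surviving[id_] += 1

valid_ids = sorted(i for i, n in surviving.items() if n == 3)
print(valid_ids)
```

On this sample only id 1 passes: ids 2 and 3 have an "h" row with a non-null tag, id 4 has no "v" row, and id 5 has only a "w" row.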

What is the structure of a node for this B-Tree specification?

I am trying to create a B-tree with the following properties:
Every node x contains following attributes:
x.n is the number of keys present in node x
x.key_1, x.key_2, ..., x.key_{x.n} are the keys present in the node
x.c_1, x.c_2, ..., x.c_{x.n}, x.c_{x.n+1} are the pointers to the child nodes
x.leaf is a boolean variable that shows whether the node is a leaf node or not
Based on this specification, how would I implement the structure for a node:
struct Node{
...?
}
The notional structure when drawn is something like this.
a b c d
/ | | | \
la bab bbc bcd gd
la = less than a
bab = between a and b
bbc = between b and c
bcd = between c and d
gd = greater than d
Where there are more pointers than elements.
A B-tree of order N has at most N children per node, so define BTREE_ORDER as this value and ensure that BTREE_ORDER is greater than 1.
The structure is most efficiently done as
struct Node {
    size_t numNodes;                     /* number of child pointers in use */
    KEY_TYPE Key[BTREE_ORDER - 1];       /* keys, kept in sorted order */
    struct Node *Children[BTREE_ORDER];  /* child pointers */
};
So it has space for BTREE_ORDER-1 keys and BTREE_ORDER child nodes. The arrangement is up to the code, and is typically
Children[0] Key[0] Children[1] Key[1] .... Key[numNodes - 2] Children[ numNodes - 1]
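The interleaved layout above can also be sketched in Python, which makes the "one more child than keys" relationship easy to check. This is a minimal sketch under stated assumptions: integer keys, a `BTREE_ORDER` of 5, and a hypothetical `child_index` helper. Unlike the C struct, it derives the counts from the list lengths instead of storing numNodes, and it derives the leaf flag from the absence of children.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List

BTREE_ORDER = 5  # assumed: maximum number of children per node


@dataclass
class Node:
    # At most BTREE_ORDER - 1 keys, kept sorted.
    keys: List[int] = field(default_factory=list)
    # Empty for a leaf, otherwise len(keys) + 1 child nodes.
    children: List["Node"] = field(default_factory=list)

    @property
    def leaf(self) -> bool:
        return not self.children


def child_index(node, key):
    """Index of the child to descend into when `key` is not in this node."""
    return bisect_left(node.keys, key)
```

For a node with keys a, b, c, d, a search key smaller than a gives index 0 (the "la" subtree), a key between a and b gives index 1 (the "bab" subtree), and a key greater than d gives index 4 (the "gd" subtree).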