why both transform and map methods in scala? - api

I'm having trouble understanding the difference between / reason for, for example, immutable.Map.transform and immutable.Map.map. It looks like transform won't change the key, but that just seems like a trivial variation of the map method. Am I missing something?
I was expecting to find a method that applied a function to the (key,value) of the map when/if that element was accessed (rather than having to iterate through the map eagerly with the map function). Does such a method exist?

You can do exactly that with mapValues. Here is the explanation from the docs:
def mapValues[C](f: (B) ⇒ C): Map[A, C]
Transforms this map by applying a function to every retrieved value.
f - the function used to transform values of this map.
returns - a map view which maps every key of this map to f(this(key)). The resulting map wraps the original map without copying any elements.
edit:
Although extending classes of the collection API is not often a good idea, it could work like this:
class LazilyModifiedMap[A,B,C](underlying: Map[A,B])(f: (A,B) => C) extends Map[A,C] {
def get(key: A) = underlying.get(key).map( x => f(key, x))
def iterator = underlying.iterator.map { case (k,v) => (k, f(k,v)) }
def -(key: A) = iterator.toMap - key
def +[C1 >: C](kv: (A,C1)) = iterator.toMap + kv
}

If you only need the interface of PartialFunction, you can exploit the fact that Map inherits from PartialFunction:
val m = Map(1 -> "foo", 2 -> "bar")
val n = m.andThen(_.reverse)
n(1) // --> oof

Related

How to get size of specfic value inside array Kotlin

here is example of the list. I want to make dynamic where maybe the the value will become more.
val list = arrayListOf("A", "B", "C", "A", "A", "B") //Maybe they will be more
I want the output like:-
val result = list[i] + " size: " + list[i].size
So the output will display every String with the size.
A size: 3
B size: 2
C size: 1
If I add more value, so the result will increase also.
You can use groupBy in this way:
val result = list.groupBy { it }.map { it.key to it.value.size }.toMap()
Jeoffrey's way is better actually, since he is using .mapValues() directly, instead of an extra call to .toMap(). I'm just leaving this answer her since
I believe that the other info I put is relevant.
This will give a Map<String, Int>, where the Int is the count of the occurences.
This result will not change when you change the original list. That is not how the language works. If you want something like that, you'd need quite a bit of work, like overwriting the add function from your collection to refresh the result map.
Also, I see no reason for you to use an ArrayList, especially since you are expecting to increase the size of that collection, I'd stick with MutableList if I were you.
I think the terminology you're looking for is "frequency" here: the number of times an element appears in a list.
You can usually count elements in a list using the count method like this:
val numberOfAs = list.count { it == "A" }
This approach is pretty inefficient if you need to count all elements though, in which case you can create a map of frequencies the following way:
val freqs = list.groupBy { it }.mapValues { (_, g) -> g.size }
freqs here will be a Map where each key is a unique element from the original list, and the value is the corresponding frequency of that element in the list.
This works by first grouping elements that are equal to each other via groupBy, which returns a Map<String, List<String>> where each key is a unique element from the original list, and each value is the group of all elements in the list that were equal to the key.
Then mapValues will transform that map so that the values are the sizes of the groups instead of the groups themselves.
An improved approach, as suggested by #broot is to make use of Kotlin's Grouping class which has a built-in eachCount method:
val freqs = list.groupingBy { it }.eachCount()

Kotlin map function combined with put to map

Im trying to do following in simplest way.
I'd need to map list of objects into hashMap so that value1 is the key and value2 is entry object.
val map = mutableMapOf<String, Account>()
accountInfo.map {
map.put( it.value1 to it.value2 )
}
or
val map2 = accountInfo.map {
it.value1 to it.value2
}
but both are somehow wrong
first one says it only wants string as input.
Second one tells me its going to return List<Pair<String, Account>>
Why ?! thank you!
use put(x, y)
Not put( x to y)

generating DataFrames in for loop in Scala Spark cause out of memory

I'm generating small dataFrames in for loop. At each round of for loop, I pass the generated dataFrame to a function which returns double. This simple process (which I thought could be easily taken care of by garbage collector) blow up my memory. When I look at Spark UI at each round of for loop it adds a new "SQL{1-500}" (my loop runs 500 times). My question is how to drop this sql object before generating a new one?
my code is something like this:
Seq.fill(500){
val data = (1 to 1000).map(_=>Random.nextInt(1000))
val dataframe = createDataFrame(data)
myFunction(dataframe)
dataframe.unpersist()
}
def myFunction(df: DataFrame)={
df.count()
}
I tried to solve this problem by dataframe.unpersist() and sqlContext.clearCache() but neither of them worked.
You have two places where I suspect something fishy is happening:
in the definition of myFunction : you really need to put the = before the body of the definition. I had typos like that compile, but produce really weird errors (note I changed your myFunction for debugging purposes)
it is better to fill your Seq with something you know and then apply foreach or some such
(You also need to replace random.nexInt with Random.nextInt, and also, you can only create a DataFrame from a Seq of a type that is a subtype of Product, such as tuple, and need to use sqlContext to use createDataFrame)
This code works with no memory issues:
Seq.fill(500)(0).foreach{ i =>
val data = {1 to 1000}.map(_.toDouble).toList.zipWithIndex
val dataframe = sqlContext.createDataFrame(data)
myFunction(dataframe)
}
def myFunction(df: DataFrame) = {
println(df.count())
}
Edit: parallelizing the computation (across 10 cores) and returning the RDD of counts:
sc.parallelize(Seq.fill(500)(0), 10).map{ i =>
val data = {1 to 1000}.map(_.toDouble).toList.zipWithIndex
val dataframe = sqlContext.createDataFrame(data)
myFunction(dataframe)
}
def myFunction(df: DataFrame) = {
df.count()
}
Edit 2: the difference between declaring function myFunction with = and without = is that the first is (a usual) function definition, while the other is procedure definition and is only used for methods that return Unit. See explanation. Here is this point illustrated in Spark-shell:
scala> def myf(df:DataFrame) = df.count()
myf: (df: org.apache.spark.sql.DataFrame)Long
scala> def myf2(df:DataFrame) { df.count() }
myf2: (df: org.apache.spark.sql.DataFrame)Unit

Access Instance-property in Coffeescript within nested

so I am using express within a node-app. As my app is getting bigger I want to put my routes into extra files. I seem to be able to get hold of the bugDB if I just get rid of the intermediate get object. But I can't access the bugDB in the inner object. Any suggestions? Maybe there is even a more nice code pattern for how to accomplish this more elegantly.
I would appreachate your help. Thanks in advance. (As I am not a native speaker I couldn't find others with a similar problem, if you know how to phrase the question better, please show me the way :) )
BUGROUTER.COFFEE
class BugsRouter
constructor: (#bugDB)-> // instance-variable with databaselink
return
get:{
allBugs: (req, res)=>
console.log "db", #bugDB // this gives me undefined
// is "this" in the get context?
#bugDB.allDocs {include_docs: true}, (err, response)->
res.json 200, response
}
module.exports = BugsRouter
SERVER.COFFEE
BugsRouter = require "./routes/BUGROUTER"
bugsRouter = new BugsRouter(bugDB)
console.log bugsRouter.bugDB # this is working
app.get "/bugs/all", bugsRouter.get.allBugs
Sub-objects don't work like that. When you say this:
class C
p:
f: ->
Then p is just a plain object that happens to be a property on C's prototype, it will have no special idea of what # should be inside f. And if you try to use a fat-arrow instead:
class C
p:
f: =>
then you're accidentally creating a namespaced class function called f so # will be C when f is called. In either case, saying:
c = new C
c.p.f()
is the same as:
c = new C
p = c.p
p.f()
so f will be called in the context of p rather than c.
You can get around this if you don't mind manually binding the functions inside get when your constructor is called:
constructor: (#bugDB) ->
#get = { }
for name, func of #constructor::get
#get[name] = func.bind(#)
This assumes that you have Function.bind available. If you don't then you can use any of the other binding techniques (_.bind, $.proxy, ...). The #get = { } trick is needed to ensure that you don't accidentally modify the prototype's version of #get; if you're certain that you'll only be creating one instance of your BugsRouter then you could use this instead:
constructor: (#bugDB) ->
for name, func of #get
#get[name] = func.bind(#)
to bind the functions inside the prototype's version of get rather than the instance's local copy.
You can watch this simplified demo to see what's going on with # in various cases, keep an eye on the #flag values to see the accidental prototype modification caused by not using #get = { } and #constructor::get:
class C1
get:
f: -> console.log('C1', #)
class C2
get:
f: => console.log('C2', #)
class C3
constructor: ->
#flag = Math.random()
for name, func of #get
#get[name] = func.bind(#)
get:
f: -> console.log('C3', #)
class C4
constructor: ->
#flag = Math.random()
#get = { }
for name, func of #constructor::get
#get[name] = func.bind(#)
get:
f: -> console.log('C4', #)
for klass in [C1, C2, C3, C3, C4, C4]
o = new klass
o.get.f()
Live version of the above: http://jsfiddle.net/ambiguous/8XR7Z/
Hmm, seems like I found a better solution after all:
class Test
constructor: ->
#testVariable = "Have a nice"
return
Object.defineProperties #prototype,
get:
enumerable :true
get:->
{
day: => #testVariable + " day"
week: => #testVariable + " day"
}
console.log (new Test()).get.day()
This allows me to call (new Test()).get.day() the way I wanted.
Live version at: JSFiddle

overriding a method in R, using NextMethod

how dows this work in R...
I am using a package (zoo 1.6-4) that defines a S3 class for time series sets.
I am writing a derived class where I want to override a few methods and can't get past this one:[.zoo!
in my derived class rows are indexed by timestamp, like in zoo, but differently from zoo, I allow only POSIXct values in the index. my users will be selecting columns all of the time, while slicing series only occasionally so I want to offer obj[name] instead of obj[, name].
my objects have class c("delftfews", "zoo").
but...
how do I override a method?
I tried this:
"[.delftfews" <- function(x, i, j, drop=TRUE, ...) {
if (missing(i)) return(NextMethod())
if (all(class(i) == "character") && missing(j)) {
return(NextMethod('[', x=x, i=1:NROW(x), j=i, drop=drop, ...))
}
NextMethod()
}
but I get this error: Error in rval[i, j, drop = drop., ...] : incorrect number of dimensions.
I have solved by editing the source from zoo: I removed those ..., but I don't get why that works. anybody can explain what is going on here?
The problem is that with the above definition of [.delftfews this code:
library(zoo)
z <- structure(zoo(cbind(a = 1:3, b = 4:6)), class = c("delftfews", "zoo"))
z["a"]
# generates this call: `[.zoo`(x = 1:6, i = 1:3, j = "a", drop = TRUE, z, "a")
Your code does work as is if you write the call like this:
z[j = "a"]
# generates this call: `[.zoo`(x = z, j = "a")
I think what you want is to change the relevant line in [.delftfews to this:
return(NextMethod(.Generic, object = x, i = 1:NROW(x), drop = drop))
# z["a"] now generates this call: `[.zoo`(x = z, i = 1:3, j = "a", drop = TRUE)
A point of clarification: allowing only POSIXct index values does not allow indexing columns by name only. I'm not sure how you arrived at that conclusion.
You're overriding zoo correctly, but I think you misunderstand NextMethod. The error is caused by if (missing(i)) return(NextMethod()), which calls [.zoo if i is missing, but [.zoo requires i because zoo's internal data structure is a matrix. Something like this should work:
if (missing(i)) i <- 1:NROW(x)
though I'm not sure if you have to explicitly pass this new i to NextMethod...
You may be interested in the xts package, if you haven't already taken a look at it.