Kotlin lets-plot: minimal example

I would like to write a completely minimal example of lets-plot that just saves a PNG and doesn't use any frontend. For this, I created a "helloworld" Kotlin project in IntelliJ IDEA and added the Maven dependency org.jetbrains.lets-plot:lets-plot-common:2.1.0. Now if I try to import jetbrains.letsPlot.letsPlot, I get the error "Unresolved reference: letsPlot". So the question is: how do I write the most minimal lets-plot example, without any frontend and without Gradle?

The right dependencies are org.jetbrains.lets-plot:lets-plot-kotlin-jvm:3.0.2 (the API) and org.jetbrains.lets-plot:lets-plot-image-export:2.1.0 (to be able to export to raster). With these it works, and the resulting image is saved in the lets-plot-images directory.
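For reference, assuming a standard Maven pom.xml (only the coordinates come from the answer above; the rest is ordinary Maven boilerplate), the dependency block would look like:
<dependencies>
    <!-- Kotlin API for lets-plot -->
    <dependency>
        <groupId>org.jetbrains.lets-plot</groupId>
        <artifactId>lets-plot-kotlin-jvm</artifactId>
        <version>3.0.2</version>
    </dependency>
    <!-- raster (PNG) export used by ggsave -->
    <dependency>
        <groupId>org.jetbrains.lets-plot</groupId>
        <artifactId>lets-plot-image-export</artifactId>
        <version>2.1.0</version>
    </dependency>
</dependencies>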
The code:
import jetbrains.letsPlot.export.ggsave
import jetbrains.letsPlot.geom.geomPoint
import jetbrains.letsPlot.letsPlot

fun main() {
    val xs = listOf(0, 0.5, 1, 2)
    val ys = listOf(0, 0.25, 1, 4)
    val data = mapOf<String, Any>("x" to xs, "y" to ys)
    val fig = letsPlot(data) + geomPoint(
        color = "dark-green",
        size = 4.0
    ) { x = "x"; y = "y" }
    ggsave(fig, "plot.png")
}
The resulting image is a scatter plot of the four points, written to lets-plot-images/plot.png.

Related

Pipeline generation - passing in simple datastructures like lists/arrays

For a code repository project in Palantir Foundry, I am struggling with re-using some of my transformation logic.
It seems almost trivial, but: is there a way to send an Input to a Transform that is not a dataset/dataframe reference?
In my case I want to pass in strings or lists/arrays.
This is my code:
from pyspark.sql import functions as F
from transforms.api import Transform, Input, Output

def my_computation(result, customFilter, scope, my_categories, my_mappings):
    scope_df = scope.dataframe()
    my_categories_df = my_categories.dataframe()
    my_mappings_df = my_mappings.dataframe()
    filtered_cat_df = (
        my_categories_df
        .filter(F.col('CAT_NAME').isin(customFilter))
    )
    # ... more logic

def generateTransforms(config):
    transforms = []
    for key, value in config.items():
        o = {}
        for outKey, outValue in value['outputs'].items():
            o[outKey] = Output(outValue)
        i = {}
        for inpKey, inpValue in value['inputs'].items():
            i[inpKey] = Input(inpValue)
        i['customFilter'] = Input(value['my_custom_filter'])
        transforms.append(Transform(my_computation, inputs=i, outputs=o))
    return transforms

config = {
    "transform_one": {
        "my_custom_filter": {
            "foo",
            "bar"
        },
        "inputs": {
            "scope": "/my-project/input/scope",
            "my_categories": "/my-project/input/my_categories",
            "my_mappings": "/my-project/input/my_mappings"
        },
        "outputs": {
            "result": "/my-project/output/result"
        }
    }
}

TRANSFORMS = generateTransforms(config)
The concrete question is: how can I send in the values from my_custom_filter into customFilter in the transformation function my_computation?
If I execute it like above, I get the error "TypeError: unhashable type: 'set'"
This looks like a Python issue; any chance you can point out which line is causing the error?
Reading through your code, I would guess it's this line:
i['customFilter'] = Input(value['my_custom_filter'])
Your Python logic is wrong: if we unpack your code, you're effectively making this call:
i['customFilter'] = Input({"foo", "bar"})
i.e. passing a set of plain values where Input() expects a dataset path.
Edit, to answer the comment on how to create a Python transform that locks a variable in a closure:
from transforms.api import transform
from pyspark.sql import functions as F

def create_transform(inputs, outputs, my_other_var):
    @transform(**inputs, **outputs)
    def compute(input_foo, input_bar, output_foobar, ctx):
        df = input_foo.dataframe()
        # my_other_var is captured by the closure, no Input() needed
        df = df.withColumn("mycol", F.lit(my_other_var))
        output_foobar.write_dataframe(df)
    return compute
and now you can call this:
transforms.append(create_transform(inputs, outputs, "foobar"))
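As a sketch of how that ties back to the question's generateTransforms (hypothetical wiring, names assumed): hand the raw filter values to the factory as a plain Python object instead of wrapping them in Input(), and make sure the parameter names of the inner compute() match the keys of the inputs/outputs dicts.
def generateTransforms(config):
    transforms = []
    for key, value in config.items():
        # compute()'s parameter names must match these dict keys
        inputs = {k: Input(v) for k, v in value['inputs'].items()}
        outputs = {k: Output(v) for k, v in value['outputs'].items()}
        # plain Python values captured by the closure, not an Input
        custom_filter = list(value['my_custom_filter'])
        transforms.append(create_transform(inputs, outputs, custom_filter))
    return transforms

TRANSFORMS = generateTransforms(config)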

Convert Image File to a Negative

I want to get the negative of a specific image based on the following command-line input:
-in xyz.png -out xyz_negative.png
The files loaded are stored in a specific folder.
Here is what I was/am trying to do so far.
Read a png file
Calculate RGB values for each pixel
Calculate Negative RGB Values
Set new RGB Values for png file to get the negative
I am kinda stuck on the last step, since the new .png is not the negative. Here is my code; if anybody could help me or point me in the right direction I would be much obliged.
import java.awt.Color
import java.io.File
import javax.imageio.ImageIO

fun main(args: Array<String>) {
    if (args.isNotEmpty()) {
        val myFile = File(
            "/Users/xyz/IdeaProjects/Seam Carving/Seam Carving/task/${args[1]}")
        val importedImage = ImageIO.read(myFile)
        for (x in 0 until importedImage.width) {
            for (y in 0 until importedImage.height) {
                val pixel = importedImage.getRGB(x, y)
                var color = Color(pixel, true)
                val redNegative = 255 - color.red
                val greenNegative = 255 - color.green
                val blueNegative = 255 - color.blue
                color = Color(redNegative, blueNegative, greenNegative)
                importedImage.setRGB(x, y, color.rgb)
            }
        }
        ImageIO.write(importedImage, "png", File("/Users/xyz/IdeaProjects/Seam Carving/Seam Carving/task/${args[3]}"))
    }
}
What it's supposed to look like vs. what my image actually looks like: (screenshots omitted)
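For comparison, a minimal sketch of the per-pixel inversion as a standalone function, keeping the channels in java.awt.Color's (red, green, blue) constructor order:
import java.awt.Color
import java.awt.image.BufferedImage

// Build the negative: invert each channel and keep the (r, g, b) order intact.
fun negative(src: BufferedImage): BufferedImage {
    val out = BufferedImage(src.width, src.height, BufferedImage.TYPE_INT_RGB)
    for (x in 0 until src.width) {
        for (y in 0 until src.height) {
            val c = Color(src.getRGB(x, y))
            out.setRGB(x, y, Color(255 - c.red, 255 - c.green, 255 - c.blue).rgb)
        }
    }
    return out
}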

Failed to build Kotlin code with proto models using Bazel

I'm trying to use compiled proto models in my Kotlin code. The project is managed by Bazel, so I reproduced the problem with a simple "HelloWorld" project.
WORKSPACE
load("#bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
RULES_KOTLIN_VERSION = "9051eb053f9c958440603d557316a6e9fda14687"
http_archive(
name = "io_bazel_rules_kotlin",
sha256 = "c36e71eec84c0e17dd098143a9d93d5720e81b4db32bceaf2daf939252352727",
strip_prefix = "rules_kotlin-%s" % RULES_KOTLIN_VERSION,
url = "https://github.com/bazelbuild/rules_kotlin/archive/%s.tar.gz" % RULES_KOTLIN_VERSION,
)
load("#io_bazel_rules_kotlin//kotlin:kotlin.bzl", "kotlin_repositories", "kt_register_toolchains")
kotlin_repositories()
kt_register_toolchains()
http_archive(
name = "com_google_protobuf",
strip_prefix = "protobuf-master",
urls = ["https://github.com/protocolbuffers/protobuf/archive/master.zip"],
)
load("#com_google_protobuf//:protobuf_deps.bzl", "protobuf_deps")
protobuf_deps()
BUILD
load("#io_bazel_rules_kotlin//kotlin:kotlin.bzl", "kt_jvm_library")
load("#rules_java//java:defs.bzl", "java_binary", "java_lite_proto_library", "java_proto_library")
load("#rules_proto//proto:defs.bzl", "proto_library")
package(default_visibility = ["//visibility:public"])
proto_library(
name = "clicklocation_proto",
srcs = ["ClickLocation.proto"],
)
java_proto_library(
name = "clicklocation_java_lite_proto",
deps = [":clicklocation_proto"],
)
kt_jvm_library(
name = "app_lib",
srcs = ["Main.kt"],
deps = [":clicklocation_java_lite_proto"]
)
java_binary(
name = "myapp",
main_class = "MyApp",
runtime_deps = [":app_lib"],
)
Proto file
syntax = "proto2";
package objectrecognition;
option java_package = "com.kshmax.objectrecognition.proto";
option java_outer_classname = "ClickLocationProtos";
message ClickLocation {
required float x = 1;
required float y = 2;
}
Main.kt
import com.kshmax.objectrecognition.proto.ClickLocationProtos

class MyApp {
    companion object {
        @JvmStatic
        fun main(args: Array<String>) {
            val location = ClickLocationProtos.ClickLocation.newBuilder()
            location.x = 0.1f
            location.y = 0.2f
            location.build()
        }
    }
}
I did this as described in the protocolbuffers/protobuf repository examples.
But I got an error:
error: supertypes of the following classes cannot be resolved. Please make sure you have the required dependencies in the classpath:
class com.kshmax.objectrecognition.proto.ClickLocationProtos.ClickLocation, unresolved supertypes: com.google.protobuf.GeneratedMessageV3
class com.kshmax.objectrecognition.proto.ClickLocationProtos.ClickLocationOrBuilder, unresolved supertypes: com.google.protobuf.MessageOrBuilder
class com.kshmax.objectrecognition.proto.ClickLocationProtos.ClickLocation.Builder, unresolved supertypes: com.google.protobuf.GeneratedMessageV3.Builder
What am I doing wrong?
The code that uses protobuf needs to depend on the protobuf library itself.
In theory, this could be exported by the rule, but since it doesn't work, I would add a dep on protobuf directly, similarly to BuildEventServiceTest:
deps = [
    "@com_google_protobuf//:protobuf_java",
    "my_foo_java_proto",
]
or, for java_lite_proto_library, "@com_google_protobuf//:protobuf_java_lite", where the remote repository was defined in the WORKSPACE as:
http_archive(
    name = "com_google_protobuf",
    patch_args = ["-p1"],
    patches = ["@io_bazel//third_party/protobuf:3.11.3.patch"],
    patch_cmds = EXPORT_WORKSPACE_IN_BUILD_FILE,
    patch_cmds_win = EXPORT_WORKSPACE_IN_BUILD_FILE_WIN,
    sha256 = "cf754718b0aa945b00550ed7962ddc167167bd922b842199eeb6505e6f344852",
    strip_prefix = "protobuf-3.11.3",
    urls = [
        "https://mirror.bazel.build/github.com/protocolbuffers/protobuf/archive/v3.11.3.tar.gz",
        "https://github.com/protocolbuffers/protobuf/archive/v3.11.3.tar.gz",
    ],
)
(I'm not sure you need the patch, and there might be a more recent version; look at the example)
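Applied to the BUILD file above, that boils down to adding the protobuf runtime next to the generated proto library (a sketch; only the extra dep is new, the rest is unchanged from the question):
kt_jvm_library(
    name = "app_lib",
    srcs = ["Main.kt"],
    deps = [
        ":clicklocation_java_lite_proto",
        # provides com.google.protobuf.GeneratedMessageV3 and friends
        "@com_google_protobuf//:protobuf_java",
    ],
)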

How to convert a Spark DataFrame[Double, String] to LabeledPoint?

Following is the code I am experimenting with. I am trying to convert SalesData in CSV to a DataFrame and then to LabeledPoints. However, in the last step I am getting the following compilation error:
package macros contains object and package with same name: blackbox
Can you please give me pointers on what I am doing wrong here? Thank you.
--EDIT--
The compilation issue was solved by adding the 2.11 mllib dependency to build.gradle, but mlData.show fails with:
ERROR: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.ml.linalg.Vector
val path = "SalesData.csv"

val conf = new SparkConf().setMaster("local[2]").set("deploy-mode", "client").set("spark.driver.bindAddress", "127.0.0.1")
  .set("spark.broadcast.compress", "false")
  .setAppName("local-spark-kafka-consumer-client")

val sparkSession = SparkSession
  .builder()
  .config(conf)
  .getOrCreate()

val data = sparkSession.read.format("csv").option("header", "true").option("inferSchema", "true").load(path)
data.cache()

import org.apache.spark.sql.DataFrameNaFunctions
data.na.drop()
data.show

// get monthly sales totals
val summary = data.select("OrderMonthYear", "SaleAmount").groupBy("OrderMonthYear").sum().orderBy("OrderMonthYear").toDF("OrderMonthYear", "SaleAmount")
summary.show

// convert OrderMonthYear to integer type
//val results = summary.map(df => (df.getAs[String]("OrderMonthYear").replace("-", "") , df.getAs[String]("SaleAmount"))).toDF(["OrderMonthYear","SaleAmount"])
import org.apache.spark.sql.functions._
val test = summary.withColumn("OrderMonthYear", (regexp_replace(col("OrderMonthYear").cast("String"), "-", ""))).toDF("OrderMonthYear", "SaleAmount")
test.printSchema()
test.show

import sparkSession.implicits._
val mlData = test.select("OrderMonthYear", "SaleAmount").
  map(row => org.apache.spark.ml.feature.LabeledPoint(
    row.getAs[Double](1),
    row.getAs[org.apache.spark.ml.linalg.Vector](0))).toDF
mlData.show
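The ClassCastException comes from row.getAs[org.apache.spark.ml.linalg.Vector](0): after the regexp_replace, OrderMonthYear is still a plain String column, not a Vector. One possible sketch (assuming OrderMonthYear is meant to be the single feature and SaleAmount the label) builds the features column explicitly with Vectors.dense:
import org.apache.spark.ml.feature.LabeledPoint
import org.apache.spark.ml.linalg.Vectors

val mlData = test.map { row =>
  LabeledPoint(
    row.getAs[Double]("SaleAmount"),                            // label
    Vectors.dense(row.getAs[String]("OrderMonthYear").toDouble) // features must be a Vector
  )
}.toDF("label", "features")
mlData.show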

Case Class serialization in Spark

In a Spark app (Spark 2.1) I'm trying to send a case class as an input parameter of a function that is meant to run on the executors:
object TestJob extends App {

  val appName = "TestJob"
  val out = "out"
  val p = Params("my-driver-string")

  val spark = SparkSession.builder()
    .appName(appName)
    .getOrCreate()

  import spark.implicits._

  (1 to 100).toDF.as[Int].flatMap(i => Dummy.process(i, p))
    .write
    .option("header", "true")
    .csv(out)
}

object Dummy {
  def process(i: Int, v: Params): Vector[String] = {
    Vector { if (i % 2 == 1) v + "_odd" else v + "_even" }
  }
}

case class Params(v: String)
When I run it with master local[*] everything goes well, but when running on a cluster the Params state is not getting serialized and the output results in
null_even
null_odd
...
Could you please help me understanding what I'm doing wrong?
Googling around, I found this post that gave me the solution: Spark broadcasted variable returns NullPointerException when run in Amazon EMR cluster.
In the end, the problem is due to extending App: with scala.App the object's vals are initialized in a delayed-init block that only runs on the driver, so on the executors p is still null.
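A minimal sketch of the fix under that assumption: drop extends App and use an explicit main, so p is a plain local value captured and serialized with the closure (Params and Dummy as defined above):
import org.apache.spark.sql.SparkSession

object TestJob {
  def main(args: Array[String]): Unit = {
    val p = Params("my-driver-string")

    val spark = SparkSession.builder()
      .appName("TestJob")
      .getOrCreate()
    import spark.implicits._

    (1 to 100).toDF.as[Int]
      .flatMap(i => Dummy.process(i, p))
      .write
      .option("header", "true")
      .csv("out")
  }
}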