How to produce RDD[Result] for unit testing - serialization

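For unit-testing purposes, I am building my own HBase Result object as follows:
import org.apache.hadoop.hbase.{Cell, KeyValue}
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.util.Bytes
val row = Bytes.toBytes( "row01" )
val cf = Bytes.toBytes( "cf" )
// Both cells use the same row key (row2 was undefined); a Result holds a single row
val cell1 = new KeyValue( row, cf, "v1".getBytes(), Bytes.toBytes( "file1" ) )
val cell2 = new KeyValue( row, cf, "v2".getBytes(), Bytes.toBytes( "file2" ) )
// Result.create expects a Java array (or java.util.List) of Cell, not a Scala List
val cells = Array[Cell]( cell1, cell2 )
val result = Result.create( cells )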
Now I want to turn this into an RDD with a SparkContext, like:
val sparkContext = new org.apache.spark.SparkContext( conf )
val rdd = sparkContext.parallelize( List( result ) )
However, once I try to access the RDD via foreach, like:
rdd.foreach{x=>x}
I get the famous Spark "Task not serializable" error.
Does anyone know of a better way to create an RDD[Result]?

Result is not serializable, so if you want an RDD[Result], you have to produce the Results on the executors themselves from some other, serializable input. Of course, actions such as collect or first, which would send Results between nodes, still won't work. For example:
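import org.apache.hadoop.hbase.{Cell, KeyValue}
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.util.Bytes
val rdd0 = sparkContext.parallelize( List( ("row", "cf") ) )
// Build each Result inside the task (on the executor) from serializable Strings
val rdd = rdd0.map { case (str1, str2) =>
  val row = Bytes.toBytes( str1 )
  val cf = Bytes.toBytes( str2 )
  val cell1 = new KeyValue( row, cf, "v1".getBytes(), Bytes.toBytes( "file1" ) )
  val cell2 = new KeyValue( row, cf, "v2".getBytes(), Bytes.toBytes( "file2" ) )
  Result.create( Array[Cell]( cell1, cell2 ) )
}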

How to use a Dataframe, which is created from Dstream, outside of foreachRDD block?

I've been trying to work with Spark Streaming. My problem is that I want to use wordCountsDataFrame again outside of the foreachRDD block.
I want to conditionally join wordCountsDataFrame with another DataFrame that is created from a DStream. Is there any way to do that, or another approach?
Thanks.
My Scala code is below:
val Seq(projectId, subscription) = args.toSeq
val sparkConf = new SparkConf().setAppName("PubsubWordCount")
val ssc = new StreamingContext(sparkConf, Milliseconds(5000))
val credential = SparkGCPCredentials.builder.build()
val pubsubStream: ReceiverInputDStream[SparkPubsubMessage] = PubsubUtils.createStream(ssc, projectId, None, subscription, credential, StorageLevel.MEMORY_AND_DISK_SER_2)
val stream1= pubsubStream.map(message => new String(message.getData()))
stream1.foreachRDD{ rdd =>
val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
import spark.implicits._
// Convert RDD[String] to DataFrame
val wordsDataFrame = rdd.toDF("word")
wordsDataFrame.createOrReplaceTempView("words")
val wordCountsDataFrame =
spark.sql("select word, count(*) from words group by word")
wordCountsDataFrame.show()
}

Difference Between Two List Elements in Kotlin

I have the following data class in Kotlin:
data class ProductData(
val code: String,
var value: Double)
There are two lists of this data class:
lstToday: List<ProductData>, containing for example:
("P1", 110)
("P2", 109)
("P3", 102)
("P4", 110)
..... 100+ records
and
lstYesterday: List<ProductData>, containing for example:
("P1", 112)
("P2", 109)
("P3", 110)
("P4", 90)
..... 100+ records
Both have exactly the same number of records.
The output I am looking for is as follows.
Output 1: Difference between today and yesterday (today minus yesterday)
lstDifference: List<ProductData>, containing for example:
("P1", -2)
("P2", 0)
("P3", -8)
("P4", 20)
..... 100+ records
Output 2: Today's price and the difference between today and yesterday,
using the data class below:
data class ProductDisplayData(
val code: String,
var value: Double,
var diff: Double
)
The result would be a list like this:
lstDifference: List<ProductDisplayData>, containing for example:
("P1", 110, -2)
("P2", 109, 0)
("P3", 102, -8)
("P4", 110, 20)
..... 100+ records
Can this be achieved using Kotlin's built-in collection functions, or do we have to loop over each element to get the result?
Thanks
I would say this is not the most efficient solution, although it is still roughly O(n): building each map is linear, and each lookup is constant time on average. It could definitely be optimized further. Note that it tolerates a code that appears on only one of the two days, and assumes 0 for the missing day's value.
Trying to guarantee that both data sets always contain the same codes is going to be more maintenance than writing code that tolerates that mismatch.
data class ProductData(
val code: String,
var value: Double
)
val dayOne = listOf(
ProductData("P1", 110.0),
ProductData("P2", 109.0),
ProductData("P3", 102.0),
ProductData("P4", 110.0),
ProductData("P5", 105.0),
ProductData("P6", 104.0),
ProductData("P8", 32.0) // Not in set 2
)
val dayTwo = listOf(
ProductData("P1", 110.0),
ProductData("P2", 109.0),
ProductData("P3", 102.0),
ProductData("P4", 90.0),
ProductData("P5", 49.0),
ProductData("P6", 123.0),
ProductData("P7", 239.0) // Not in set 1
)
fun periodDataDifference(dayOne: List<ProductData>, dayTwo: List<ProductData>): List<ProductData> {
val mapOne = dayOne.associate { it.code to it.value }
val mapTwo = dayTwo.associate { it.code to it.value }
val keys = mapOne.keys + mapTwo.keys
return keys.map { key ->
val first = mapOne[key] ?: 0.0
val second = mapTwo[key] ?: 0.0
ProductData(key, second - first)
}
}
val out = periodDataDifference(dayOne, dayTwo)
println(out)
/*
[ProductData(code=P1, value=0.0),
ProductData(code=P2, value=0.0),
ProductData(code=P3, value=0.0),
ProductData(code=P4, value=-20.0),
ProductData(code=P5, value=-56.0),
ProductData(code=P6, value=19.0),
ProductData(code=P8, value=-32.0), // Set 1 only
ProductData(code=P7, value=239.0)] // Set 2 only
*/
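The code above covers Output 1. For Output 2, a minimal sketch using the same map lookup might look like the following. It assumes the ProductDisplayData class from the question and, unlike the answer above, skips any code that has no value for yesterday rather than treating it as 0; the function name displayData is hypothetical.

```kotlin
data class ProductData(val code: String, var value: Double)

data class ProductDisplayData(val code: String, var value: Double, var diff: Double)

// Sketch: pair each of today's entries with yesterday's entry for the same code.
// Codes with no value for yesterday are skipped (an assumption of this sketch).
fun displayData(today: List<ProductData>, yesterday: List<ProductData>): List<ProductDisplayData> {
    // Index yesterday's values by code for constant-time lookups
    val yesterdayByCode = yesterday.associate { it.code to it.value }
    return today.mapNotNull { t ->
        val y = yesterdayByCode[t.code] ?: return@mapNotNull null
        // Keep today's value and store today minus yesterday as the diff
        ProductDisplayData(t.code, t.value, t.value - y)
    }
}
```

With the four sample products from the question (today 110, 109, 102, 110; yesterday 112, 109, 110, 90), displayData(lstToday, lstYesterday) gives diffs of -2.0, 0.0, -8.0 and 20.0.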
If the today and yesterday lists are guaranteed to be the same size, with the products in the same order, this is the quickest solution I have:
val lstYesterday: List<ProductData> = listOf(
ProductData("P1", 112.0),
ProductData("P2", 109.0),
ProductData("P3", 110.0)
)
val lstToday: List<ProductData> = listOf(
ProductData("P1", 110.0),
ProductData("P2", 109.0),
ProductData("P3", 102.0)
)
val lstDifference: MutableList<ProductData> = mutableListOf()
for ((index, today) in lstToday.withIndex()) {
    // The same position in lstYesterday is assumed to hold the same product
    val value = today.value - lstYesterday[index].value
    lstDifference.add(ProductData(today.code, value))
}
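To answer the "without a loop" part of the question: under the same assumption as the loop above (equal length, same product order), zip can pair the two lists by position. This is a sketch; the function names are hypothetical, and the require check is an extra safeguard added here because zip silently stops at the shorter list and never checks that the codes match.

```kotlin
data class ProductData(val code: String, var value: Double)

data class ProductDisplayData(val code: String, var value: Double, var diff: Double)

// Output 1: today minus yesterday, matched by position.
fun differences(today: List<ProductData>, yesterday: List<ProductData>): List<ProductData> =
    today.zip(yesterday) { t, y ->
        // Fail fast if the lists are not in the same product order
        require(t.code == y.code) { "Mismatched codes: ${t.code} vs ${y.code}" }
        ProductData(t.code, t.value - y.value)
    }

// Output 2: today's value plus the difference, matched by position.
fun displayDifferences(today: List<ProductData>, yesterday: List<ProductData>): List<ProductDisplayData> =
    today.zip(yesterday) { t, y ->
        require(t.code == y.code) { "Mismatched codes: ${t.code} vs ${y.code}" }
        ProductDisplayData(t.code, t.value, t.value - y.value)
    }
```

The transform form of zip builds the result directly without creating intermediate Pair objects, and it returns a read-only List rather than a MutableList.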

Share Function Not Working But Same Code Between Apps

OK, so I am trying to figure out what I am doing wrong here and I'm at a loss. I created one app with a share function so that the entered data can be emailed for later use. That code works fine and is below:
val shareButton1 = findViewById<Button>(R.id.shareButton)
shareButton1.setOnClickListener {
val contractNumber = findViewById<EditText>(R.id.contractNumber)
val conNumber = findViewById<TextView>(R.id.contractNum)
val desNumber = findViewById<EditText>(R.id.desNumber)
val desNum = findViewById<TextView>(R.id.desNum)
val lotNum = findViewById<EditText>(R.id.lotNumber)
val sublotNum = findViewById<EditText>(R.id.sublotNumber)
val genNum = findViewById<TextView>(R.id.genNum)
val ranTonnage = findViewById<TextView>(R.id.ranTonnage)
val sublotTonnage = findViewById<EditText>(R.id.sublotTonnage)
val sampleTonnage = findViewById<TextView>(R.id.sampleTonnage)
val conNumber1 = conNumber.text.toString()
val contractNumber1 = contractNumber.text.toString()
val desNum1 = desNum.text.toString()
val desNumber1 = desNumber.text.toString()
val lotNum1 = lotNum.text.toString()
val sublotNum1 = sublotNum.text.toString()
val genNum1 = genNum.text.toString()
val ranTonnage1 = ranTonnage.text.toString()
val sublotTonnage1 = sublotTonnage.text.toString()
val sampleTonnage1 = sampleTonnage.text.toString()
val shareIntent = Intent()
shareIntent.action = Intent.ACTION_SEND
shareIntent.type = "text/plain"
shareIntent.putExtra(Intent.EXTRA_TEXT, "$conNumber1 $contractNumber1 \n$desNum1 $desNumber1 \nLot #: $lotNum1 \nSublot #: $sublotNum1" +
"\nRandom Number Generated: $genNum1 \nRandom Tonnage: $ranTonnage1 \nSublot: $sublotTonnage1" +
"\nSample Tonnage: $sampleTonnage1 ")
startActivity(Intent.createChooser(shareIntent, "Share via"))
}
Now I've created a second app that does the same thing with some different data, and it doesn't work. I've reviewed the share function and everything looks exactly the same. That code is posted below:
val shareButton1 = findViewById<Button>(R.id.share_button)
shareButton1.setOnClickListener {
val conNumber = findViewById<TextView>(R.id.contractNum)
val conNumber1 = conNumber.text.toString()
val contractNumber = findViewById<EditText>(R.id.contractNumInput)
val contractNumber1 = contractNumber.text.toString()
val desNumber = findViewById<TextView>(R.id.desNum)
val desNumber1 = desNumber.text.toString()
val desNum = findViewById<EditText>(R.id.desNumInput)
val desNum1 = desNum.text.toString()
val truckNumber = findViewById<TextView>(R.id.truckNum)
val truckNumber1 = truckNumber.text.toString()
val truckNum = findViewById<EditText>(R.id.truckNumInput)
val truckNum1 = truckNum.text.toString()
val cemDeliveredText = findViewById<TextView>(R.id.cementType1)
val cemDeliveredCalc = findViewById<TextView>(R.id.cementType1Calculated)
val shareIntent = Intent()
shareIntent.action = Intent.ACTION_SEND
shareIntent.type = "text/plain"
shareIntent.putExtra(Intent.EXTRA_INTENT, "$conNumber1 $contractNumber1 \n$desNumber1 $desNum1 \nTruck #: $truckNum1" +
"\nCement/Type1: $cemDeliveredCalc")
startActivity(Intent.createChooser(shareIntent, "Share via"))
}
I know there has to be something simple I am missing. Like I said, the first app's code works exactly as expected and pulls in all the information. The second app's code doesn't pull in anything at all. Any help would be greatly appreciated, as I seem to be running in circles.
The issue is in the second app's code. Look at what you send:
shareIntent.putExtra(Intent.EXTRA_INTENT, "$conNumber1 $contractNumber1 \n$desNumber1 $desNum1 \nTruck #: $truckNum1" +
"\nCement/Type1: $cemDeliveredCalc")
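There are two problems here. First, the text is stored under the key Intent.EXTRA_INTENT, but apps that handle ACTION_SEND read plain text from Intent.EXTRA_TEXT (as your first app does), so the receiving app gets no text at all. Second, cemDeliveredCalc is the TextView object itself, not its text, so the template inserts the view's toString() description instead of the calculated value.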
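To see the TextView problem in isolation: a Kotlin string template calls toString() on whatever expression it contains. The sketch below uses a hypothetical plain class, FakeView, as a stand-in for an Android TextView so that it runs without Android; a real TextView likewise produces a description of the view object rather than its text.

```kotlin
// Hypothetical stand-in for an Android TextView; it does not override toString().
class FakeView(val text: CharSequence)

fun shareTextWrong(view: FakeView): String =
    // Interpolates the object itself, so Kotlin calls FakeView.toString()
    "Cement/Type1: $view"

fun shareTextRight(view: FakeView): String =
    // Interpolates the view's text content instead
    "Cement/Type1: ${view.text}"
```

shareTextWrong produces something like "Cement/Type1: FakeView@1b6d3586", while shareTextRight produces the actual value.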
You also forgot the line below in your code. Add it:
val cemDeliveredCalc1 = cemDeliveredCalc.text.toString()
and then pass cemDeliveredCalc1 instead of cemDeliveredCalc, with Intent.EXTRA_TEXT as the key:
shareIntent.putExtra(Intent.EXTRA_TEXT, "$conNumber1 $contractNumber1 \n$desNumber1 $desNum1 \nTruck #: $truckNum1" +
"\nCement/Type1: $cemDeliveredCalc1")

Replace multiple chars with multiple chars in string

I am looking for a way to replace several different characters with corresponding replacement characters in Kotlin.
For example, I am looking for a function similar to this one in PHP:
str_replace(["ā", "ē", "ī", "ō", "ū"], ["a","e","i","o","u"], word)
In Kotlin, right now I am just calling the same function five times (once per vowel), like this:
var newWord = word.replace("ā", "a")
newWord = newWord.replace("ē", "e")
newWord = newWord.replace("ī", "i")
newWord = newWord.replace("ō", "o")
newWord = newWord.replace("ū", "u")
This is probably not the best option when I have to do it for a whole list of words rather than just one word. Is there a better way to do that?
You can keep a character mapping and replace the required characters by iterating over each character in the word:
val map = mapOf('ā' to 'a', 'ē' to 'e' ......)
val newword = word.map { map.getOrDefault(it, it) }.joinToString("")
If you need to do this for many words, an extension function makes it more readable:
fun String.replaceChars(replacement: Map<Char, Char>) =
map { replacement.getOrDefault(it, it) }.joinToString("")
val map = mapOf('ā' to 'a', 'ē' to 'e', .....)
val newword = word.replaceChars(map)
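Since the question is about a whole list of words, here is a minimal sketch that applies the same extension function to every word with map. The vowel mapping is spelled out from the question, normalizeAll is a hypothetical name, and replacement[it] ?: it is used as an equivalent of getOrDefault(it, it).

```kotlin
// Char-to-Char mapping for the five vowels from the question
val macronMap: Map<Char, Char> = mapOf('ā' to 'a', 'ē' to 'e', 'ī' to 'i', 'ō' to 'o', 'ū' to 'u')

// Replace each character that has an entry in the map; keep all others unchanged
fun String.replaceChars(replacement: Map<Char, Char>): String =
    map { replacement[it] ?: it }.joinToString("")

// Apply the same mapping to every word in a list
fun normalizeAll(words: List<String>): List<String> =
    words.map { it.replaceChars(macronMap) }
```

This only matches precomposed characters (for example U+0101 for ā). If the input may contain a base letter followed by a combining macron, normalize it first, for example with java.text.Normalizer.normalize(s, Normalizer.Form.NFC).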
Just adding another way, using zip with a transform function:
val l1 = listOf("ā", "ē", "ī", "ō", "ū")
val l2 = listOf("a", "e", "i", "o", "u")
l1.zip(l2) { a, b -> word = word.replace(a, b) }
Without a transform, l1.zip(l2) builds a List<Pair<String, String>>:
[(ā, a), (ē, e), (ī, i), (ō, o), (ū, u)]
With the transform function { a, b -> word = word.replace(a, b) }, you instead get direct access to the matching items of each list (a from l1, b from l2). Note that word must be declared as a var for this to compile.
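The zip version works only because the transform reassigns word as a side effect (the List<Unit> that zip returns is thrown away). As a sketch of a side-effect-free alternative, swapping in fold instead: the (from, to) pairs are applied one after another to an accumulator that starts as the original word. The function name replaceAll is hypothetical.

```kotlin
val l1 = listOf("ā", "ē", "ī", "ō", "ū")
val l2 = listOf("a", "e", "i", "o", "u")

// Start from the original word and apply each (from, to) replacement in turn
fun replaceAll(word: String): String =
    l1.zip(l2).fold(word) { acc, (from, to) -> acc.replace(from, to) }
```

Because it uses String.replace, this version also works when the search or replacement strings are longer than one character.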

Kotlin: split string from end

I want to split I_have_a_string into I_have_a and string. Is there a built-in function in Kotlin to split from the end? This is what I am doing now:
val words = myString.split("_")
val first = words.dropLast(1).joinToString("_")
val second = words.last()
Use substringBeforeLast and substringAfterLast:
val myString = "I_have_a_string"
val first = myString.substringBeforeLast("_")
val second = myString.substringAfterLast("_")
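substringBeforeLast returns everything before the last occurrence of the delimiter, and substringAfterLast returns everything after it. If the delimiter is not found, both return the original string by default.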
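If both halves are needed together, a small helper can return them as a Pair. This is a minimal sketch; the name splitAtLast is hypothetical (not a standard-library function), and returning null when the delimiter is absent is a choice made here instead of falling back to the whole string.

```kotlin
// Hypothetical helper: split around the LAST occurrence of the delimiter.
// Returns null when the delimiter does not occur at all.
fun String.splitAtLast(delimiter: String): Pair<String, String>? {
    val index = lastIndexOf(delimiter)
    if (index < 0) return null
    // Everything before the delimiter, and everything after it
    return substring(0, index) to substring(index + delimiter.length)
}
```

Returning null makes the missing-delimiter case explicit; substringBeforeLast and substringAfterLast instead accept an optional missingDelimiterValue argument for that case.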