How can I reformat a list of items from Lua to Kotlin? - kotlin

I am working on a project that originally started in Lua and I want to update it to Kotlin.
I have about 5000 questions/answers that look like:
['Question'][1] = "'7x' was used to refer to the secret ingredient of what drink";
['Answers'][1] = {"coca cola"};
['Question'][2] = "'And the big wheel keep on turning neon burning up above and I'm just high on the world come on and take the low ride with me girl on the.....' What's the Dire Straits song title?";
['Answers'][2] = {"tunnel of love"};
I want to change the format of these without manually going through all 5000, so that they look like:
val que1 = Question(
    1, "'7x' was used to refer to the secret ingredient of what drink",
    "coca cola"
)
val que2 = Question(
    2, "'And the big wheel keep on turning neon burning up above and I'm just high on the world come on and take the low ride with me girl on the.....' What's the Dire Straits song title?",
    "tunnel of love"
)
Please help me figure out how to reformat these questions/answers. Thanks.

You can write a script like this to get the desired output.
Step 1: Create input.txt and paste the Lua code you want to convert into it:
['Question'][1] = "'7x' was used to refer to the secret ingredient of what drink";
['Answers'][1] = {"coca cola"};
['Question'][2] = "'And the big wheel keep on turning neon burning up above and I'm just high on the world come on and take the low ride with me girl on the.....' What's the Dire Straits song title?";
['Answers'][2] = {"tunnel of love"};
Step 2: Create an empty file named output.txt. This file will contain the converted code.
Step 3: Run this main function. You might need to modify the file paths based on your directory structure. Note: this code is written in Kotlin.
import java.io.File

fun main() {
    val input = File("src/main/kotlin", "input.txt").readLines()
    val outputWriter = File("src/main/kotlin", "output.txt").printWriter()
    // Each question/answer pair occupies two consecutive lines.
    val lines = input.windowed(2, 2)
    val questions = mutableListOf<String>()
    lines.forEachIndexed { index, str ->
        val que = str[0].split("\"")[1] // text between the first pair of double quotes
        val ans = str[1].split("\"")[1]
        val question = "val que${index + 1} = Question(${index + 1}, \"$que\", \"$ans\")"
        println(question)
        questions.add(question)
    }
    outputWriter.use { out ->
        questions.forEach {
            out.println(it)
        }
    }
}
After running the script you will get the desired output in output.txt; alternatively, you can copy it from the console output.
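The generated lines assume a Question class with a matching three-argument constructor. The original post does not show its definition, but a minimal sketch could look like this (the class shape and property names are assumptions):

data class Question(
    val id: Int,        // question number (assumed)
    val text: String,   // question text
    val answer: String  // accepted answer
)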

Related

Scalding Unit Test - How to Write A Local File?

I work at a place where Scalding writes are augmented with a specific API to track dataset metadata. When converting from normal writes to these special writes, there are some intricacies with respect to Key/Value, TSV/CSV, Thrift ... datasets. I would like to verify that the binary file is the same before the conversion and after the conversion to the special API.
Given that I cannot share the specific API for the metadata-inclusive writes, I am only asking: how can I write a unit test for the .write method on a TypedPipe?
implicit val timeZone: TimeZone = DateOps.UTC
implicit val dateParser: DateParser = DateParser.default
implicit def flowDef: FlowDef = new FlowDef()
implicit def mode: Mode = Local(true)
val fileStrPath = root + "/test"
println("writing data to " + fileStrPath)
TypedPipe
  .from(Seq[Long](1, 2, 3, 4, 5))
  // .map((x: Long) => { println(x.toString); System.out.flush(); x })
  .write(TypedTsv[Long](fileStrPath))
  .forceToDisk
The above doesn't seem to write anything to local (OSX) disk.
So I wonder if I need to use a MiniDFSCluster something like this:
def setUpTempFolder: String = {
  val tempFolder = new TemporaryFolder
  tempFolder.create()
  tempFolder.getRoot.getAbsolutePath
}
val root: String = setUpTempFolder
println(s"root = $root")
val tempDir = Files.createTempDirectory(setUpTempFolder).toFile
val hdfsCluster: MiniDFSCluster = {
  val configuration = new Configuration()
  configuration.set(MiniDFSCluster.HDFS_MINIDFS_BASEDIR, tempDir.getAbsolutePath)
  configuration.set("io.compression.codecs", classOf[LzopCodec].getName)
  new MiniDFSCluster.Builder(configuration)
    .manageNameDfsDirs(true)
    .manageDataDfsDirs(true)
    .format(true)
    .build()
}
hdfsCluster.waitClusterUp()
val fs: DistributedFileSystem = hdfsCluster.getFileSystem
val rootPath = new Path(root)
fs.mkdirs(rootPath)
However, my attempts to get this MiniCluster to work haven't panned out either - somehow I need to link the MiniCluster with the Scalding write.
Note: The Scalding JobTest framework for unit testing isn't going to work, because the actual data written is sometimes wrapped in a bijection codec or set up with case class wrappers before the metadata-inclusive write APIs perform the writes.
Any ideas how I can write a local file (without using the Scalding REPL) with either Scalding alone or a MiniCluster? (If using the latter, I need a hint on how to read the file.)
Answering ... there is an example of how to use a mini cluster for exactly this: reading from and writing to HDFS. I will be able to cross-read with my different writes and examine them. It is in the tests for Scalding's TypedParquet type.
HadoopPlatformJobTest is an extension of JobTest that uses a MiniCluster.
With some hand-waving over the details in the link, the bulk of the code is this:
"TypedParquetTuple" should {
"read and write correctly" in {
import com.twitter.scalding.parquet.tuple.TestValues._
def toMap[T](i: Iterable[T]): Map[T, Int] = i.groupBy(identity).mapValues(_.size)
HadoopPlatformJobTest(new WriteToTypedParquetTupleJob(_), cluster)
.arg("output", "output1")
.sink[SampleClassB](TypedParquet[SampleClassB](Seq("output1"))) {
toMap(_) shouldBe toMap(values)
}
.run()
HadoopPlatformJobTest(new ReadWithFilterPredicateJob(_), cluster)
.arg("input", "output1")
.arg("output", "output2")
.sink[Boolean]("output2")(toMap(_) shouldBe toMap(values.filter(_.string == "B1").map(_.a.bool)))
.run()
}
}

Kotlin Comparison between BufferedReader::readText and String always false

I read the stdout and stderr of a command-line process using:
fun runCommand(vararg commands: String): Pair<String, String> {
    val proc = Runtime.getRuntime().exec(commands)
    val stdIn = BufferedReader(InputStreamReader(proc.inputStream))
    val stdErr = BufferedReader(InputStreamReader(proc.errorStream))
    val p = Pair(stdIn.use(BufferedReader::readText).trim(), stdErr.use(BufferedReader::readText).trim())
    stdIn.close()
    stdErr.close()
    return p
}
This gives me a Pair<String, String> with the output of stdout and stderr.
However, no matter how I try to compare these Strings to another String, the comparison always returns false.
Things I've tried:
runCommand("nordvpn", "account").first.compareTo("You are not logged in.")
runCommand("nordvpn", "account").first == "You are not logged in."
runCommand("nordvpn", "account").first.equals("You are not logged in.")
Might this have something to do with the encoding?
Or am I just reading the output incorrectly?
Any help would be appreciated!
Thanks to @gidds' comment I was able to find that for some reason the output of the command (stdout) had "-CR SP SP CR" (a dash, carriage return, space, space, carriage return) prepended, which I removed with a simple String.drop(5).
Edit: After some more thinking, I assume that the aforementioned characters were responsible for making the output of the command in the terminal colored (yellow).
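drop(5) handles that exact five-character prefix. A more general cleanup (just a sketch, not part of the original answer) is to strip ANSI colour escape sequences and stray control characters before comparing; the regex and the clean helper below are assumptions:

// Sketch: remove ANSI escape sequences (e.g. colour codes) and other
// control characters from command output before comparing it to a String.
val ansiRegex = Regex("\u001B\\[[0-9;]*[A-Za-z]")

fun clean(raw: String): String =
    ansiRegex.replace(raw, "")                        // drop colour/formatting codes
        .filter { it == '\n' || !it.isISOControl() }  // drop stray CRs, keep newlines
        .trim()

// Usage (hypothetical):
// clean(runCommand("nordvpn", "account").first) == "You are not logged in."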

mongoengine slow serialization of embedded documents with reference fields

I have a small DB with about 500 records. I'm trying to implement a versioning scheme where I save the form along with its current version to my Record collection. Ideally, I would like to store the form along with its version number in an embedded document to keep things nice and tidy:
class Structure(db.EmbeddedDocument):
    form = db.ReferenceField(Form, required=True)
    version = db.IntField(required=True)

    @property
    def short(self):
        return {
            'form': self.form,
            'version': self.version
        }

class Record(db.Document):
    structure = db.EmbeddedDocumentField(Structure)

    @property
    def short(self):
        return {
            'structure': self.structure.short
        }
This way when I recall a record I can grab the form and the version that was used at the time. Running some timing tests:
start = time.clock()
records = Record.objects.select_related()
print ('Time: ', time.clock() - start)
response = [i.short for i in records]
print ('Time: ', time.clock() - start)
I find the query time for all records, Record.objects.select_related(), to be reasonable at ~1.12 s; however, I'm finding that serialization for the purpose of JSON transfer is extremely expensive at ~24.1 s!
If I make a slight modification by removing use of the EmbeddedDocument:
class Record(db.Document):
    form = db.ReferenceField(Form, required=True)
    version = db.IntField(required=True)

    @property
    def short(self):
        return {
            'form': self.form,
            'version': self.version
        }
Running the same test, I find the query time to be pretty much unchanged at ~1.36 s; however, the serialization time improved by 24x to 1.14 s. I really do not understand why use of an embedded document would lead to such a massive penalty in serialization time. Is dereferencing in an embedded object more difficult?

Programmatically creating DStreams in Apache Spark

I am writing some self contained integration tests around Apache Spark Streaming.
I want to test that my code can ingest all kinds of edge cases in my simulated test data.
When I was doing this with regular RDDs (not streaming), I could use my inline data and call "parallelize" on it to turn it into a Spark RDD.
However, I can find no such method for creating DStreams. Ideally I would like to call some "push" function once in a while and have the tuple magically appear in my DStream.
ATM I'm doing this by using Apache Kafka: I create a temp queue, and I write to it. But this seems like overkill. I'd much rather create the test-dstream directly from my test data without having to use Kafka as a mediator.
For testing purposes, you can create an input stream from a queue of RDDs.
Pushing more RDDs in the queue will simulate having processed more events in the batch interval.
val sc = SparkContextHolder.sc
val ssc = new StreamingContext(sc, Seconds(1))
val inputData: mutable.Queue[RDD[Int]] = mutable.Queue()
val inputStream: InputDStream[Int] = ssc.queueStream(inputData)
inputData += sc.makeRDD(List(1, 2)) // Emulate the RDD created during the first batch interval
inputData += sc.makeRDD(List(3, 4)) // 2nd batch interval
// etc
val result = inputStream.map(x => x*x)
result.foreachRDD(rdd => assertSomething(rdd))
ssc.start() // Don't forget to start the streaming context
In addition to Raphael's solution, I think you might also want to choose between processing one batch at a time or everything that is available at once. To do so, set the oneAtATime flag accordingly on queueStream's optional method argument, as shown below:
val slideDuration = Milliseconds(100)
val conf = new SparkConf().setAppName("NetworkWordCount").setMaster("local[8]")
val sparkSession: SparkSession = SparkSession.builder.config(conf).getOrCreate()
val sparkContext: SparkContext = sparkSession.sparkContext
val queueOfRDDs = mutable.Queue[RDD[String]]()
val streamingContext: StreamingContext = new StreamingContext(sparkContext, slideDuration)
val rddOneQueuesAtATimeDS: DStream[String] = streamingContext.queueStream(queueOfRDDs, oneAtATime = true)
val rddFloodOfQueuesDS: DStream[String] = streamingContext.queueStream(queueOfRDDs, oneAtATime = false)
rddOneQueuesAtATimeDS.print(120)
rddFloodOfQueuesDS.print(120)
streamingContext.start()
for (i <- (1 to 10)) {
  queueOfRDDs += sparkContext.makeRDD(simplePurchase(i))
  queueOfRDDs += sparkContext.makeRDD(simplePurchase((i + 3) * (i + 3)))
  Thread.sleep(slideDuration.milliseconds)
}
Thread.sleep(1000L)
I found this base example:
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/CustomReceiver.scala
The key here is calling the "store" command. Replace the contents of store with whatever you want.
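The linked example is in Scala; as a rough, untested sketch of the same idea in Kotlin (the language of the main question on this page), using Spark's Java-friendly Receiver API, a test receiver simply calls store() with whatever data it wants to inject. The class name and constructor below are assumptions:

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Hypothetical receiver that injects a fixed list of test items into a DStream.
class TestDataReceiver(private val items: List<String>) :
    Receiver<String>(StorageLevel.MEMORY_ONLY()) {

    override fun onStart() {
        // Push the simulated test data on a background thread;
        // store() hands each item to Spark for the next batch.
        Thread { items.forEach { store(it) } }.start()
    }

    override fun onStop() {
        // Nothing to clean up in this sketch.
    }
}

// Usage with a JavaStreamingContext (jssc is assumed to exist):
// jssc.receiverStream(TestDataReceiver(listOf("event1", "event2")))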

AVAudioUnitEQ / .BandPass filter doesn't work

I can't get the AVAudioUnitEQ to work.
Here's a piece of code that should filter out everything except 659.255Hz +/-0.05 octaves:
// Create Audio Engine
var audioEngine = AVAudioEngine()
// Create Equalizer Node
var equalizerNode = AVAudioUnitEQ(numberOfBands: 1)
var epualizerParameters: AVAudioUnitEQFilterParameters = equalizerNode.bands.first as AVAudioUnitEQFilterParameters
epualizerParameters.filterType = .BandPass
epualizerParameters.frequency = 659.255
epualizerParameters.bandwidth = 0.05
epualizerParameters.bypass = false
audioEngine.attachNode(equalizerNode)
// Configure Audio Engine
var format = audioEngine.inputNode.inputFormatForBus(0)
audioEngine.connect(audioEngine.inputNode, to: equalizerNode, format: format)
audioEngine.connect(equalizerNode, to: audioEngine.outputNode, format: format)
// Start Audio Engine
var error:NSError?
audioEngine.startAndReturnError(&error)
However, when I run it, put on my headphones and sing into the microphone, I can hear myself loud and clear.
Now, according to Wikipedia, the Band Pass filter is:
... a device that passes frequencies within a certain range and
rejects (attenuates) frequencies outside that range.
What am I doing wrong? I want to filter out everything except given frequency range.
It was your EQ params.
I created a github project with sliders and switches. You can hear the difference.
Try it.
This works in my project which uses a playerNode.
var format = engine.mainMixerNode.outputFormatForBus(0)
engine.connect(playerNode, to: EQNode, format: format )
engine.connect(EQNode, to: engine.mainMixerNode, format: format)
I see you're using the engine's inputNode. Try swapping out these few lines (hook into the mixer instead of the outputNode) and let us know if it works.