Add custom operations to Spark SQL

I want to add a new operation to Spark SQL. I have already used user-defined functions of the form
dataframe.filter(udf($"a", $"b"))
I need to add a similar function that operates on two DataFrames, for example a method like:
dataframe1.udf(dataframe2)
To be more precise, the function is an optimized join on two DataFrames.
The actual code is
CustomJoin(dataframe1, dataframe2)
Is this possible using user-defined functions? Are there any other solutions or examples?

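You can use an implicit conversion for this:
import org.apache.spark.sql.DataFrame
import scala.language.implicitConversions
class AugmentedDataFrame(val df: DataFrame) {
  // the optimized join logic (elided in the original) goes here
  def CustomJoin(df2: DataFrame): DataFrame = ???
}
object DataFrameImplicits {
  implicit def dfToAugmentedDataFrame(df: DataFrame): AugmentedDataFrame = new AugmentedDataFrame(df)
}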
and then:
import DataFrameImplicits._
df.CustomJoin(df2)
To learn more about using implicits to add custom methods to an existing class, see:
Add Your Own Methods to the String Class


Object aggregations in Querydsl-jpa

I would like to ask whether there is any way to use object aggregation functions, such as json_agg(), in JPA (which uses HQL).
The goal is to take an entity and transform it into a string, with something like:
Expressions.stringTemplate("jsonb_agg(json_build_object('entity', {0}))", qEntity.id)
When I try this, I get an org.hibernate.QueryException: No data type for node: org.hibernate.hql.internal.ast.tree.MethodNode error. I've read that the problem is that I cannot use HQL, because HQL object properties cannot be used in JSON aggregation functions.
I would like to avoid using querydsl-sql as much as I can (it complicates deploying the Docker app, it needs to be connected to the database, etc.). So is there any way to aggregate objects like this using HQL? I am using spring-data-jpa, so I could also use that if it offers a better solution.
Your QueryDSL snippet looks just fine, but you need to register custom functions for JSONB_AGG and JSON_BUILD_OBJECT as well as custom types for the JSONB result.
For the custom JSONB type you can use the JsonBinaryType from the hibernate-types library.
For the custom function, you need to create a MetadataBuilderInitializer that registers the SQL functions with Hibernate. You can take inspiration from my hibernate-types-querydsl-apt library (for example ArrayFunctionInitializer). Applied to JSON functions specifically, you would end up with something along the lines of:
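import com.vladmihalcea.hibernate.type.json.JsonBinaryType;
import org.hibernate.boot.MetadataBuilder;
import org.hibernate.boot.registry.StandardServiceRegistry;
import org.hibernate.boot.spi.MetadataBuilderInitializer;
import org.hibernate.dialect.function.StandardSQLFunction;
// Discovered via java.util.ServiceLoader: list this class in META-INF/services/org.hibernate.boot.spi.MetadataBuilderInitializer
public class JsonFunctionInitializer implements MetadataBuilderInitializer {
    @Override
    public void contribute(MetadataBuilder metadataBuilder, StandardServiceRegistry standardServiceRegistry) {
        // Make both PostgreSQL functions callable from HQL, returning the JSONB type
        metadataBuilder.applySqlFunction("json_build_object", new StandardSQLFunction("json_build_object", JsonBinaryType.INSTANCE));
        metadataBuilder.applySqlFunction("jsonb_agg", new StandardSQLFunction("jsonb_agg", JsonBinaryType.INSTANCE));
    }
}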

How can I avoid manually checking types in a list?

I seem to recall reading that manually checking types in an object-oriented language (e.g., Kotlin) usually means you are doing something "bad." I'm wondering if there is a better pattern for my situation than what I am currently doing: I use when to check the type of the generic parameter, and then call the correct function based on that type.
I have a set of data classes that store arrays of different types:
interface Data<T>
class AData : Data<Int>
class BData : Data<Double>
class CData : Data<ComplexType>
class DData : Data<Foo>
I can use these classes for various kinds of math, and sometimes I want to display them in a table. To do that (using JavaFX) I need to create a TableColumn<Int, *>, but I want to create a differently styled table column depending on the type contained in the Data class.
I could think of two solutions. What I currently do is:
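fun makeCol(data: Data<*>): TableColumn<Int, *> {
    return when {
        data[0] is Int -> makeBasicCol(data)
        data[0] is Double -> makeBasicCol(data)
        data[0] is ComplexType -> makeComplexCol(data)
        data[0] is Foo -> makeFooCol(data)
        // `when` used as an expression must be exhaustive
        else -> throw IllegalArgumentException("Unsupported element type")
    }
}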
The other option I considered was creating a subclass of Data, something like FxData, which would have a function like fun provideFxCol(): TableColumn<Int, *>. The problem I see with that is that I would then be mixing UI functionality with the math functionality.
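A common alternative, shown here as a sketch rather than as the asker's code: make Data a sealed class and let a UI-side function switch on the concrete subclass instead of inspecting data[0]. The compiler then checks that every subclass is handled, and the math classes never reference JavaFX. To keep the sketch runnable without JavaFX, the hypothetical ColumnStyle enum stands in for the different TableColumn factories; the values property and the ComplexType and Foo definitions are assumptions as well.

```kotlin
// Hypothetical stand-ins for the question's element types.
data class ComplexType(val re: Double, val im: Double)
data class Foo(val label: String)

// Sealed, so a `when` over its subclasses can be checked for exhaustiveness.
sealed class Data<T> {
    abstract val values: List<T> // assumed storage; the question only says "arrays"
}

class AData(override val values: List<Int>) : Data<Int>()
class BData(override val values: List<Double>) : Data<Double>()
class CData(override val values: List<ComplexType>) : Data<ComplexType>()
class DData(override val values: List<Foo>) : Data<Foo>()

// Hypothetical UI-side description of a column, standing in for TableColumn<Int, *>.
enum class ColumnStyle { BASIC, COMPLEX, FOO }

// Lives in the UI layer: dispatches on the container type, not on element values.
// No `else` branch is needed because Data is sealed.
fun columnStyleFor(data: Data<*>): ColumnStyle = when (data) {
    is AData, is BData -> ColumnStyle.BASIC
    is CData -> ColumnStyle.COMPLEX
    is DData -> ColumnStyle.FOO
}
```

Because the dispatch is on the container type, this also works for an empty Data instance, where data[0] would throw. In real code the branches would call makeBasicCol, makeComplexCol and makeFooCol instead of returning enum constants.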

Function reference for class constructor nested within generic parent class

If I have
sealed class Foo<A> {
data class Bar<A>(val value: Int): Foo<A>()
}
and I want to refer to the Bar<Int> constructor as an implicit lambda using the :: operator, then none of the following are accepted as valid syntax:
Foo<Int>::Bar<Int>
::Foo.Bar<Int>
::(Foo.Bar<Int>) (the compiler tells me that this syntax is reserved for future use).
I can refer to it if I explicitly import the nested class constructor into the scope using
import com.package.Foo.Bar
which allows me to write ::Bar for the constructor and Bar<Int>::value for the property getter. But I have to do this for every nested constructor, and it kind of defeats the advantage of using the :: operator to save typing.
Is there a notation that I have missed which allows me to avoid having to import all nested class names and constructors?
Edit
My original example did not involve generics, which, it turns out, was an oversimplification of the problem in the actual code I am working on, which does use generics.
It turns out that for nested classes without generic parameters, the Foo::Bar notation actually works, so my original question had a false premise. It is, however, not possible to create callable references to constructors within generic classes. This is documented in the following bug report: https://youtrack.jetbrains.com/issue/KT-15952
This bug report did, however, lead me to another workaround using type aliases, which is equivalent to adding aliased imports but has the advantage that you can put the alias wherever you want and even share it between modules. In summary, these are the only two viable solutions I know of so far:
// Method one: Import Foo.Bar
import Foo.Bar as BarImported
sealed class Foo<A> {
data class Bar<A>(val value: A): Foo<A>()
}
val ctor: (Int) -> Foo<Int> = ::BarImported
val getter: (BarImported<Int>) -> Int = BarImported<Int>::value
// Method two: Alias Foo.Bar
typealias BarAlias<A> = Foo.Bar<A>
val ctor2: (Int) -> Foo<Int> = ::BarAlias
val getter2: (Foo.Bar<Int>) -> Int = BarAlias<Int>::value
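The edit's point about the non-generic case can be checked with a small sketch. It uses a non-generic variant of the question's Foo (an assumption made for illustration): the constructor and property references are written through the outer class name, with no import of Foo.Bar.

```kotlin
// Non-generic variant of the question's sealed class.
sealed class Foo {
    data class Bar(val value: Int) : Foo()
}

// Constructor reference qualified by the outer class; no import of Foo.Bar needed.
val ctor: (Int) -> Foo = Foo::Bar

// Property reference on the nested class, also qualified by the outer class.
val getter: (Foo.Bar) -> Int = Foo.Bar::value
```

Making Foo and Bar generic again is what runs into KT-15952.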
What about wildcard imports?
import com.package.Foo.*

Serialize list of objects with json_serializable without creating extra class

I saw a tutorial where an extra class is created just to be able to serialize a list of objects, instead of a single object:
I'm using json_serializable to generate serialization code for my class Preference, but now I want to save a list of preferences using shared_preferences, and I obviously get an error:
var sSavedPrefs = json.encode(PreferenceRepo.getSavedPrefs());
prefs.setString(saved_prefs_key, sSavedPrefs );
I used
@JsonSerializable()
class Preference{...}
to make it serializable, but I don't want to create an extra class like
@JsonSerializable()
class Preferences{...}
just to make it work - is there a better way?
I found a way:
Using the setStringList method, I could create a List to which I add each serialized object one by one, without needing an extra list class. I also switched from json.encode to jsonEncode, which I saw in another tutorial (in dart:convert, jsonEncode is simply a shorthand for json.encode, so either works):
List<String> savedPrefsJson = [];
for (Preference savedPref in PreferenceRepo.savedPrefs) {
String savedPrefJson = jsonEncode(savedPref);
savedPrefsJson.add(savedPrefJson);
}
prefs.setStringList(saved_prefs_key, savedPrefsJson);

Kotlin - lambda to return list of member variables

Preface: This is something I'm not sure Kotlin can do, but I feel like it should be able to do.
Question: Is it possible to return a list built from another list's member variables without creating a separate function, via a lambda, mapping, or otherwise?
I have a Kotlin inner class that has a name string representing a physical COM port. I have a routine that will poll for available COM ports on a device, and will return a list of the available port name strings for selection.
inner class ComPort() {
    val portName: String = "something"
    ...
}
...
ComPortSelectBox.setItems(*getComPortNames())
...
private fun getComPortNames(): Array<String> {
    val names: ArrayList<String> = ArrayList()
    for (comPort in availableComPorts) {
        names.add(comPort.portName) // add in place; `names + x` would return a new list and discard it
    }
    return names.toTypedArray()
}
Because getComPortNames() is only used in the one location, I would love to simplify this call into something equivalent to getComPortNames that I can use inline within .setItems(...). Is this possible within Kotlin? If so, how would one do it?
I'm not sure what availableComPorts actually is, but it looks like Iterable. If so then you may do something like:
ComPortSelectBox.setItems(*availableComPorts.map(ComPort::portName).toTypedArray())
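A self-contained sketch of that one-liner, assuming availableComPorts is a List<ComPort>. ComPort, SelectBox and showPorts below are hypothetical stand-ins for the question's classes, with setItems taking a vararg String as the spread operator in the question implies.

```kotlin
// Hypothetical stand-in for the question's inner ComPort class.
class ComPort(val portName: String)

// Hypothetical stand-in for the type of ComPortSelectBox: records what it was given.
class SelectBox {
    var items: List<String> = emptyList()
        private set

    fun setItems(vararg names: String) {
        items = names.toList()
    }
}

fun showPorts(box: SelectBox, availableComPorts: Iterable<ComPort>) {
    // map each port to its name via a property reference,
    // convert to an Array and spread it into the vararg parameter
    box.setItems(*availableComPorts.map(ComPort::portName).toTypedArray())
}
```

The explicit ArrayList and helper function disappear because map already builds the new list.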
Update: You didn't mention which Java version you're using; I assumed Java 8.