What is a good alternative for Spark CatalystSqlParser? - sql

I have previously used CatalystSqlParser to parse input strings to DataType like this:
private def convertToDataType(inputType: String): DataType = CatalystSqlParser.parseDataType(inputType)
It was very convenient and easy to implement. However, as far as I can tell, CatalystSqlParser is no longer available for use: the import org.apache.spark.sql.catalyst.parser.CatalystSqlParser does not work.
Is there any alternative similar to CatalystSqlParser?

You can get the functionality of CatalystSqlParser.parseDataType by calling DataType.fromDDL().

Related

SQLAlchemy Column Types As Generic (BIT Type)

I am trying to list out the columns of each table in an MSSQL database. I can get them fine, but I need to turn them into generic types for use elsewhere. (Python types would be more ideal but not sure how to do that?)
My code so far works until I come across a BIT type column.
from sqlalchemy import MetaData, create_engine
from sqlalchemy.engine import URL
# auth is a dict holding the server, database, username and password
connection_string = f"DRIVER={{ODBC Driver 17 for SQL Server}};SERVER={auth['server']};DATABASE={auth['database']};UID={auth['username']};PWD={auth['password']}"
connection_url = URL.create("mssql+pyodbc", query={"odbc_connect": connection_string})
engine = create_engine(connection_url)
metadata = MetaData(schema="Audit")
metadata.reflect(bind=engine)
for table in metadata.sorted_tables:
    print('\n' + table.name)
    for col in table.columns:
        name = col.name
        # as_generic() maps a dialect-specific type to a generic SQLAlchemy type
        generic_type = col.type.as_generic()
        print(name, generic_type)
I get the error:
NotImplementedError: Default TypeEngine.as_generic() heuristic method was unsuccessful for sqlalchemy.dialects.mssql.base.BIT. A custom as_generic() method must be implemented for this type class.
I have tried a number of things to work around this, but I'd like to learn what I need to do to fix it properly.
Can I make a custom method that turns BIT to INTEGER? And if so how can I implement it?
Thanks,
Solved. Was a bug, see comments on first post.
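On SQLAlchemy versions where as_generic() still raises for a type, one possible workaround is to catch the NotImplementedError and substitute a type you choose. The sketch below is a minimal, duck-typed illustration: as_generic_or and the fallbacks mapping are hypothetical names, not SQLAlchemy API, and the BIT class here is a stand-in so the snippet runs without a database.

```python
from typing import Any, Callable, Mapping


def as_generic_or(col_type: Any, fallbacks: Mapping[str, Callable[[], Any]]) -> Any:
    """Return col_type.as_generic(), or a fallback chosen by class name."""
    try:
        return col_type.as_generic()
    except NotImplementedError:
        factory = fallbacks.get(type(col_type).__name__)
        if factory is None:
            raise  # no fallback registered: surface the original error
        return factory()


# Stand-in for a dialect type whose as_generic() heuristic fails (hypothetical).
class BIT:
    def as_generic(self):
        raise NotImplementedError("no generic equivalent found")


# The string "Boolean" stands in for a real generic type object.
result = as_generic_or(BIT(), {"BIT": lambda: "Boolean"})
print(result)
```

With SQLAlchemy installed, the call inside the loop might become as_generic_or(col.type, {"BIT": Boolean}), assuming Boolean (imported from sqlalchemy) is the generic type you want for BIT columns; Integer could be passed the same way.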

Does Kotlin have a keyword `None`, like Python?

I use python almost everyday. Now I am learning Kotlin. I wonder if there is None in Kotlin so that we can do something like:
v1 = None
if v1 is None:
    pass
    # then we do something
I did some research and found that there is a none in kotlin-stdlib/kotlin.collection, but that does not seem to be what I am looking for.
If there is a keyword like None in Kotlin, how can it be used? If not, how does Kotlin deal with a situation like the code shown above?
Kotlin's equivalent to Python's None is null.
Note that Kotlin has something called "null safety", so any variables that can receive null must be declared as nullable.
null is Kotlin's alternative to Python's `None`.
https://kotlinlang.org/docs/reference/null-safety.html
Use the keyword "null" instead. The documentation has some examples of how to use it. It is slightly different from using None in Python, but it is not complex if you know some C#, I think, because C# also has this kind of null-safety typing feature.

Dask DataFrame to_parquet return bytes instead of writing to file

Is it possible to write a dask/pandas DataFrame to parquet and then return a bytes string? I know that this is not possible with the to_parquet() function, which accepts a file path. Maybe you have some other way to do it. If there is no way to do something like this, does it make sense to add such functionality? Ideally, it would look like this:
parquet_bytes = df.to_parquet() # bytes string is returned
Thanks!
There has been work undertaken to allow such a thing, but it's not currently a one-line thing like you suggest.
Firstly, if your data fits in memory, you can use fastparquet's write() function and supply an open_with= argument. This must be a function that creates a file-like object in binary-write mode; in your case, one that returns a BytesIO() would do.
To make this work directly with dask, you could make use of the MemoryFileSystem from the filesystem_spec project. You would need to add the class to Dask and write as follows:
import dask.bytes.core
import fsspec.implementations.memory
dask.bytes.core._filesystems['memory'] = fsspec.implementations.memory.MemoryFileSystem
df.to_parquet('memory://name.parquet')
When done, MemoryFileSystem.store, which is a class attribute, will contain keys that are like filenames, and values which are BytesIO objects containing data.
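The first suggestion above, passing a file-opening function that returns an in-memory buffer, can be sketched with the standard library alone. This is only an illustration under stated assumptions: write_records is a hypothetical stand-in for a writer that accepts such a hook, and the sketch assumes the writer closes the file it opened, which would normally discard a plain BytesIO's contents.

```python
import io


class KeepBytesIO(io.BytesIO):
    """BytesIO that snapshots its contents when closed."""

    def close(self):
        if not self.closed:
            # Save the buffer before the base class releases it.
            self.final_bytes = self.getvalue()
        super().close()


captured = {}  # maps the requested path to the buffer handed out for it


def open_in_memory(path, mode="wb"):
    """File-opening hook: returns an in-memory buffer instead of a real file."""
    buf = KeepBytesIO()
    captured[path] = buf
    return buf


def write_records(path, records, open_with=open):
    """Hypothetical writer that opens its target through the open_with hook."""
    with open_with(path, "wb") as f:
        for rec in records:
            f.write(rec)


write_records("name.bin", [b"abc", b"def"], open_with=open_in_memory)
data = captured["name.bin"].final_bytes  # bytes written by the writer
```

Keeping the snapshot on the buffer object means the caller does not need to know when, or how often, the writer closes the file.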

How should I form a list property type with my own type

I am trying to generate the following Kotlin code:
val participants: List<AbstractParty>
I tried the code below in KotlinPoet, but it shows an error. I think it is not correct, but I don't know how to fix it. Can anyone help? Thanks.
PropertySpec.builder("participants", List<ClassName("AbstractParty">)
Depending on whether you have a reference to a class or if you need to create its name from Strings, you can do either this:
PropertySpec.builder("participants",
    ParameterizedTypeName.get(List::class, AbstractParty::class)
).build()
Or this:
PropertySpec.builder("participants",
    ParameterizedTypeName.get(
        List::class.asClassName(),
        ClassName("some.pckg.name", "AbstractParty"))
).build()
A hint for finding out these sorts of things: KotlinPoet has pretty extensive tests, and you can find examples of almost anything in there.
You can use the parameterizedBy() extension:
PropertySpec.builder(
    "participants",
    List::class.asClassName().parameterizedBy(ClassName("some.pckg.name", "AbstractParty"))
).build()
https://square.github.io/kotlinpoet/1.x/kotlinpoet/kotlinpoet/com.squareup.kotlinpoet/-parameterized-type-name/-companion/parameterized-by.html

return a computed field in a serialized django object?

I'm writing an API using Django, and I'm running into some issues around returning data that isn't stored in the database directly, or in other cases organized differently than the database schema.
In particular, given a particular data request, I want to add a field of computed data to my model before I serialize and return it. However, if I just add the field to the model, the built-in serializer (I'm using json) ignores it, presumably because it's getting the list of fields from the model definition.
I could write my own serializer, but what a pain. Or I guess I could run model_to_dict, then serialize the dict instead of the model. Anyone have any better ideas?
Here's what the code vaguely looks like right now:
squidlets = Squidlet.objects.filter(stuff)
for i in range(len(squidlets)):
    squidlets[i].newfield = do_some_computation(squidlets[i])
return HttpResponse(json_serializer.serialize(squidlets,ensure_ascii=False),
                    'text/json')
But newfield ain't in the returned json.
I think you should serialize using simplejson instead; the data doesn't have to be a queryset. To escape it as JSON, also use mark_safe:
from django.utils.safestring import mark_safe
from django.utils import simplejson
simplejson.dumps(mark_safe(your_data_structure))
I went with the dict solution, which turned out to be fairly clean.
Here's what the code looks like:
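# stdlib json replaces django.utils.simplejson, which newer Django versions no longer ship
import json
from django.forms.models import model_to_dict
from django.http import HttpResponse
squiddicts = []
squidlets = Squidlet.objects.filter(stuff)
for squidlet in squidlets:
    squiddict = model_to_dict(squidlet)
    # add the computed value alongside the model's own fields
    squiddict["newfield"] = do_some_computation(squidlet)
    squiddicts.append(squiddict)
return HttpResponse(json.dumps(squiddicts, ensure_ascii=False),
                    'text/json')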
This is maybe slightly more verbose than necessary but I think it's clearer this way.
This does still feel somewhat unsatisfying, but seems to work just fine.