How to work with schema returned by 'get_catalog_schema_as_spark_schema'? - aws-glue-spark

Example:
schema = glueContext.get_catalog_schema_as_spark_schema(database=args['Database'], table_name=args['Table'])
if I simply print the returned schema I can see the StructType/StructField structure, something similar to:
StructType(
StructField(column1,StringType,true),
StructField(column2,StringType,true)
)
The object itself is java object and it does not seem to match StructType described in https://github.com/awslabs/aws-glue-libs/blob/master/awsglue/gluetypes.py
if I try to iterate through fields property, it throws error that fields is not iterable.
How do I work with this object? Ideally I want to either be able to convert it into JSON, or atleast get the list of columns.
I appreciate any help here.

Related

Convert LinkedHasSet from one type to another

I have a very simple problem, I need to convert a LinkedHashSet that holds one type of object, into another.
So basically what I want to do is something like this(if map could return anything else than TypeB:
LinkedHashSet<TypeA> firstSet
LinkedHashSet<TypeB> secondSet = firstSet.map {
TypeB(firstSet.value1, firstSet.value2)
}
This is mostly written to signalize what I want to achieve, of course it doesn't work. Could someone help me write this in Kotlin?
map returns a List, but you can use mapTo to insert the resulting elements directly into a collection that you provide as its first argument. This collection is also returned so you can assign it to secondSet:
val secondSet: LinkedHashSet<TypeB> = firstSet.mapTo(LinkedHashSet<TypeB>()) {
TypeB(it.value1, it.value2)
}
This is more efficient than using map because it avoids creating an intermediate List to hold the results.

how can I serialize tuples as list in F#?

I have a library that sends me results that include tuples. I need to process some of the data, serialize it and then it goes on its way to another system.
the tuples are ALWAYS made of 2 values but they are extremely wasteful when serialized:
(3, 4)
will serialize as:
{"Item1":3,"Item2":4}
whereas
[3; 4]
will serialize as:
[3,4]
I would like to avoid rebuilding the whole data structure and copying all the data to change this part.
Is there a way, at the serializer level, to convert the tuples into list?
the next process' parser can be easily changed to accommodate a list instead of tuples, so it seems like the best scenario.
the ugly option would be to fix the serialized string with a regex, but I would really like to avoid doing this.
You can override the default serialization behaviour by specifying your own JsonConverter. The following example shows a formatter that writes int * int tuples as two-element JSON arrays.
open Newtonsoft.Json
type IntIntConverter() =
inherit JsonConverter<int * int>()
override x.WriteJson(writer:JsonWriter, (a:int,b:int), serializer:JsonSerializer) =
writer.WriteStartArray()
writer.WriteValue(a)
writer.WriteValue(b)
writer.WriteEndArray()
override x.ReadJson(reader, objectType, existingValue, hasExistingValue, serializer) =
(0, 0)
let sample = [ (1,2); (3,4) ]
let json = JsonConvert.SerializeObject(sample, Formatting.None, IntIntConverter())
The result of running this will be [[1,2],[3,4]]. Note that I have not implemented the ReadJson method, so you cannot yet parse the tuples. This will involve some extra work, but you can look at existing JsonConverters to see how this should be done.
Also note that this is for a specific tuple type containing two integers. If you need to support other tuples, you will probably need to provide several variants of the converter.

AliasToBean DTO with known type

All the examples I am finding for using the AliasToBean transformer use the sessions CreateSqlQuery method rather than the CreateQuery method. They also only return the basic value types, and not any object's of the existing mapped types.
I was hoping it would be possible that my DTO have a property of one of my mapped Domain objects, like below, but I am not getting traction. I get the following exception:
Could not find a setter for property '0' in class 'namespace.DtoClass'
My select looks like the following on my mapped classes (I have confirmed the mappings pull correctly):
SELECT
fcs.MeasurementPoint,
fcs.Form,
fcs.MeasurementPoint.IsUnscheduled as ""IsVisitUnscheduled"",
fcs.MultipleEntryAllowed
FROM FormCollectionSchedule fcs
My end query will be more complex, but I wanted to confirm if this AliasToBean method can return mapped domain objects as well as basic field values from tables retrieved via sql.
the query execution looks like the following:
var result = session.CreateQuery(hqlQuery.ToString())
.SetResultTransformer(NHibernate.Transform.Transformers.AliasToBean(typeof (VisitFormCollectionResult)))
.List<VisitFormCollectionResult>();
note: the VisitFormCollectionResult DTO has more properties, but I wanted to know if I could populate the domain object properties matching the names
update found my problem! I have to explicitly alias each of the fields. once I added an alias, even though the member property on the class matched my DTO's property name, the hydration of the object worked correctly.
The answer to my own question was that each of the individual fields in the select needed an explicit alias matching the property, regardless if the field name already matched the property name of the DTO object:
SELECT
fcs.MeasurementPoint as "MeasurementPoint",
fcs.Form as "Form",
fcs.MeasurementPoint.IsUnscheduled as "IsVisitUnscheduled",
fcs.MultipleEntryAllowed as "MultipleEntryAllowed"
FROM FormCollectionSchedule fcs

A function that will convert input string to the actual object

I am unsure how to describe what I am looking for, so hopefully the situation will make it somewhat clear.
I have an object with a number of properties (let's say object.one, object.two, object.three). There are about 30 of these properties and they all hold a string ("Pass" or "Fail").
Right now the existing code checks whether the property has value "Pass" or "Fail" and then runs some code that prints stuff out. That is, the same snippet of code is duplicated 30 times, one for each of these properties.
The code looks something like this
If (object.one = ... )
...
End if
If (object.two = ... )
...
End if
If (object.three = ... )
...
End if
I want to use a loop to clean this mess up (each block is huge), but am not sure how to do it. I was thinking perhaps there was a way such that I might be able to construct a string like "object.one" and run some function that will tell the compiler that this is actually an object's property?
That way I could create an array containing the object's name like my array = {"object.one", "object.two", "object.three"} and then do something like, in pseudocode
For each string in my array
If (some_function(string) = ...)
...
End If
Essentially, it would take those massive blocks of duplicated code and reduce it to just one block. Is there such a some_function that I am looking for?
This is in VB.net.
I'm not sure i've understand but you can use reflection and get object properties runtime ?
With reflection you can access public object properties, use PropertyFiled.GetValue() to get one, two, etc. and build an array (i suppose that one, two, etc are object Properties, true?)
Here you can find more information: http://msdn.microsoft.com/en-us/library/system.reflection.aspx
Sorry for my bad english, i'm italian.
You seem to be describing serialization, which is the act of converting object state to a format that can be stored/transmitted and deserialization, which is the opposite.
The .NET framework has several different serializers that can work with text - either XML or JSON - the DataContractSerializer for XML and the DataContractJsonSerizlizer for JSON amongst them.

VB.NET problem converting DataTable to JSON

Ok so I'm trying to use the JavaScriptSerializer to work with this code. However it crashes when it reaches the last line;
Dim json As New String(sr.Serialize(dt))
I get this error message;
A circular reference was detected
while serializing an object of type
'System.Reflection.Module'.
I would really appreciate any insights that could help solve this problem.
Circular reference means that serialising the object would result in an infinite loop.
For example if you would try to serialize object "A" having a 1 to 1 reference to object "B".
Declare a class containg the data you want to serialize with JSON to solve this issue.
As hkda150 has already said, you could use a class specifically tailored for being serialized.
This will furthermore enable you to have foreign key values serialized instead of having related full objects serialized. Thus, if you are serializing object a which has a property a.SomeB of type B, then you will often want the ID of a.someB to be present in your webpage. Obviously I don't know enough to be able to say if this is relevant in your specific use case.
BTW, If you find yourself doing a lot of mapping between "business objects" and "objects meant for serialization" you may want to consider using AutoMapper.