@JsonTypeInfo and UnrecognizedPropertyException "type" deserializing GeoJSON - Jackson

I need to deserialize GeoJSON.
I got the "Polygon" JSON example from http://geojson.org/geojson-spec.html#id4:
String polygonJson = "{ \"type\": \"Polygon\",\n" +
" \"coordinates\": [\n" +
" [ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0] ]\n" +
" ]\n" +
" }";
Polygon pol = new ObjectMapper().readValue(polygonJson, Polygon.class); // fails :-(
The readValue method fails with:
com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized field "type" (class org.geojson.Polygon), not marked as ignorable (5 known properties: "interiorRings", "crs", "bbox", "coordinates", "exteriorRing"])
at [Source: { "type": "Polygon",
"coordinates": [
[ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0] ]
]
}; line: 1, column: 12] (through reference chain: org.geojson.Polygon["type"])
Implementation of the parent class in the org.geojson library:
@JsonTypeInfo(property = "type", use = Id.NAME)
@JsonSubTypes({ @Type(Feature.class), @Type(Polygon.class), @Type(MultiPolygon.class), @Type(FeatureCollection.class),
    @Type(Point.class), @Type(MultiPoint.class), @Type(MultiLineString.class), @Type(LineString.class),
    @Type(GeometryCollection.class) })
@JsonInclude(Include.NON_NULL)
public abstract class GeoJsonObject implements Serializable {...}
It uses a property called 'type' to find out what actual class is represented by the JSON string.
Why is this property unrecognized?

Okay, so your problem is that you are not giving enough info to the @Type annotation. You need to add a name, so change it to look like this and it should work:
@JsonSubTypes({
    @Type(value = Feature.class, name = "Feature"),
    @Type(value = Polygon.class, name = "Polygon"),
    // ... and so on for the remaining subtypes
})

Related

Cannot cast dataframe column containing an array to String

I have the following dataframe:
I would like to transform the results column into another dataframe.
This is the code I am trying to execute:
val JsonString = df.select(col("results")).as[String]
val resultsDF = spark.read.json(JsonString)
But the first line returns this error:
AnalysisException: Cannot up cast `results` from array<struct<auctions:bigint,bid_price_sum:double,bid_selected_price_sum:double,bids_cancelled:bigint,bids_done:bigint,bids_fail_currency:bigint,bids_fail_parsing:bigint,bids_failed:bigint,bids_filtered_blockrule:bigint,bids_filtered_duration:bigint,bids_filtered_floor_price:bigint,bids_lost:bigint,bids_selected:bigint,bids_timeout:bigint,clicks:bigint,content_owner_id:string,content_owner_name:string,date:bigint,impressions:bigint,intext_inventory:bigint,ivt_blocked:struct<blocked_reason_automated_browsing:bigint,blocked_reason_data_center:bigint,blocked_reason_false_representation:bigint,blocked_reason_irregular_pattern:bigint,blocked_reason_known_crawler:bigint,blocked_reason_manipulated_behavior:bigint,blocked_reason_misleading_uer_interface:bigint,blocked_reason_undisclosed_classification:bigint,blocked_reason_undisclosed_classification_ml:bigint,blocked_reason_undisclosed_use_of_incentives:bigint,ivt_blocked_requests:bigint>,no_bid:bigint,requests:bigint,requests_country:bigint,revenue:double,vtr0:bigint,vtr100:bigint,vtr25:bigint,vtr50:bigint,vtr75:bigint>> to string.
The type path of the target object is:
- root class: "java.lang.String"
You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object
This means that results is not a String.
For example
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
object Main extends App {
  val spark = SparkSession.builder
    .master("local")
    .appName("Spark app")
    .getOrCreate()

  import spark.implicits._

  case class MyClass(auctions: Int, bid_price_sum: Double)

  val df: DataFrame =
    Seq(
      ("xxx", "yyy", 1, """[{"auctions":9343, "bid_price_sum":1.062}, {"auctions":1225, "bid_price_sum":0.153}]"""),
      ("xxx1", "yyy1", 2, """{"auctions":1111, "bid_price_sum":0.111}"""),
    )
    .toDF("col1", "col2", "col3", "results")

  df.show()

  val JsonString = df.select(col("results")).as[String]
  val resultsDF = spark.read.json(JsonString)
  resultsDF.show()
}
produces
+----+----+----+--------------------+
|col1|col2|col3| results|
+----+----+----+--------------------+
| xxx| yyy| 1|[{"auctions":9343...|
|xxx1|yyy1| 2|{"auctions":1111,...|
+----+----+----+--------------------+
+--------+-------------+
|auctions|bid_price_sum|
+--------+-------------+
| 9343| 1.062|
| 1225| 0.153|
| 1111| 0.111|
+--------+-------------+
while
// ........................
val df: DataFrame =
  Seq(
    ("xxx", "yyy", 1, Seq(MyClass(9343, 1.062), MyClass(1225, 0.153))),
    ("xxx1", "yyy1", 2, Seq(MyClass(1111, 0.111))),
  )
  .toDF("col1", "col2", "col3", "results")
// ........................
produces your exception
org.apache.spark.sql.AnalysisException: Cannot up cast results
from "ARRAY<STRUCT<auctions: INT, bid_price_sum: DOUBLE>>" to "STRING".
The type path of the target object is:
- root class: "java.lang.String"
You can either add an explicit cast to the input data
or choose a higher precision type of the field in the target object
You can fix the exception with
df.select(col("results")).as[Seq[MyClass]]
instead of
df.select(col("results")).as[String]
Similarly,
val df = spark.read.json("src/main/resources/file.json")
in the above code produces the correct result for the following file.json:
{"col1": "xxx", "col2": "yyy", "col3": 1, "results": "[{\"auctions\":9343, \"bid_price_sum\":1.062}, {\"auctions\":1225, \"bid_price_sum\":0.153}]"}
{"col1": "xxx1", "col2": "yyy1", "col3": 2, "results": "{\"auctions\":1111, \"bid_price_sum\":0.111}"}
but throws for the following file
{"col1": "xxx", "col2": "yyy", "col3": 1, "results": [{"auctions":9343, "bid_price_sum":1.062}, {"auctions":1225, "bid_price_sum":0.153}]}
{"col1": "xxx1", "col2": "yyy1", "col3": 2, "results": [{"auctions":1111, "bid_price_sum":0.111}]}

Merge two GeoJSON into one in a dataframe

I have a dataframe containing GeoJSON:
import pandas as pd

data = {'geojson': {0: '{"type":"LineString","coordinates":[[1,4],[2,5]]}',
                    1: '{"type":"LineString","coordinates":[[3,6],[4,7]]}'},
        'checkpoint': {0: 6, 1: 0},
        'lom_name': {0: 'marathon19', 1: 'marathon19'}}
df = pd.DataFrame.from_dict(data)
The desired result is:
geojson                                                          lom_name
{"type":"LineString","coordinates":[[1,4],[2,5],[3,6],[4,7]]}   marathon19
I tried df = df.groupby(['geojson']).apply(list), but it is not really giving me what I need.
You could use the following approach, using the geopandas dissolve function on the GeoJSON data (a non-geometric approach would involve aggregating, concatenating and parsing the df; a sketch of that is shown at the end of this answer):
import json
import geopandas as gpd
data = { "type": "FeatureCollection",
"features": [
{ "type": "Feature",
"geometry": {
"type": "LineString",
"coordinates": [
[3,6],[4,7]
]
},
"properties": {
"checkpoint": 0,
"prop1": 'marathon19'
}
},
{ "type": "Feature",
"geometry": {
"type": "LineString",
"coordinates": [
[1,4],[2,5]
]
},
"properties": {
"checkpoint": 0,
"prop1": 'marathon19'
}
}
]
}
# create geodataframe from geojson-features
gdf = gpd.GeoDataFrame.from_features(data)
# dissolve line strings
gdf = gdf.dissolve()
# convert back to geojson
gdf.to_json()
If the input lines are non-contiguous (like in your example data) you still need to create a LineString out of the resulting MultiLineString (see here for an example and caveats):
import shapely

# Put the sub-line coordinates into a list of sublists
outcoords = [list(line.coords) for line in gdf.iloc[0].geometry.geoms]
# Flatten the list of sublists and use it to make a new line
outline = shapely.geometry.LineString([pt for sublist in outcoords for pt in sublist])
# Set the geometry of the gdf to the merged LineString
gdf = gdf.set_geometry([outline])
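For reference, here is a minimal sketch of the non-geometric approach mentioned above, working directly on the question's dataframe of GeoJSON strings with plain pandas and json (merge_linestrings is a hypothetical helper defined here, not part of any library):

import json
import pandas as pd

data = {'geojson': {0: '{"type":"LineString","coordinates":[[1,4],[2,5]]}',
                    1: '{"type":"LineString","coordinates":[[3,6],[4,7]]}'},
        'checkpoint': {0: 6, 1: 0},
        'lom_name': {0: 'marathon19', 1: 'marathon19'}}
df = pd.DataFrame.from_dict(data)

def merge_linestrings(geojson_strings):
    # Parse each GeoJSON string and concatenate the coordinate lists in row order
    coords = []
    for s in geojson_strings:
        coords.extend(json.loads(s)["coordinates"])
    return json.dumps({"type": "LineString", "coordinates": coords})

merged = df.groupby("lom_name")["geojson"].apply(merge_linestrings).reset_index()
print(merged)

Note that this simply concatenates coordinates in row order and does not handle overlapping or out-of-order segments, which the dissolve-based approach above addresses more robustly.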

How to save DNNRegressor model and load it to predict

I tried this and got an error like this:
Node: 'dnn/hiddenlayer_0/Relu'
In[0] and In[1] has different ndims: [1] vs. [1,1]
[[{{node dnn/hiddenlayer_0/Relu}}]] [Op:__inference_pruned_1112]
Here is my code:
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split

data = pd.DataFrame(
    {
        'age': [40.0, 20.0, 18.0, 15.0, 20.0, 30.0, 21.0, 23.0],
        'click': [0, 0, 1, 1, 0, 0, 1, 0]
    }
)
train, test = train_test_split(data, test_size=0.2)
age = tf.feature_column.numeric_column("age")
feature_columns = [age]
model = tf.estimator.DNNRegressor(feature_columns=feature_columns, hidden_units=[1],
model_dir='./models/dnnregressor')
serving_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(tf.feature_column.make_parse_example_spec([age]))
export_path = model.export_saved_model("./models/dnnregressor_serve", serving_input_fn)
age_feature = tf.train.Feature(float_list=tf.train.FloatList(value=[40]))
simple_feature = tf.train.Features(feature={
"age": age_feature
})
simple_example = tf.train.Example(features=simple_feature)
serving_model = tf.saved_model.load(export_path)
f = serving_model.signatures['serving_default']
f(inputs=tf.constant(simple_example.SerializeToString()))
That is a very simple demo; I just want to check the usage of loading a saved model to predict new examples. Then I got the error shown at the head of the post.
Can anyone help me?
Change the input from tf.constant(simple_example.SerializeToString()) to
tf.constant([simple_example.SerializeToString()]) and it will work.
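In other words (a minimal sketch, reusing f and simple_example from the question above): the parsing serving input receiver expects a batch of serialized tf.train.Example protos, so the input tensor needs shape [1] rather than being a scalar:

# Scalar input (shape []) triggers the "In[0] and In[1] has different ndims" error:
# f(inputs=tf.constant(simple_example.SerializeToString()))

# Batched input (shape [1]) works:
pred = f(inputs=tf.constant([simple_example.SerializeToString()]))
print(pred)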

Python Pandas How to read json without sorting by the index?

If I have a JSON file like this and import it into a dataframe, the column order always sorts to -0.8, -0.9. I want to maintain the order as it is defined in the JSON, which is -0.9, -0.8.
{
"-0.90": {
"A": 1.0,
"B": 0.4935585804
},
"-0.80": {
"A": 1.0,
"B": 0.4935585804
}
}
You can load your JSON data as an OrderedDict to preserve the order of the keys, and then use the DataFrame.from_dict constructor:
import json
from collections import OrderedDict
import pandas as pd
s = """{
"-0.90": {
"A": 1.0,
"B": 0.4935585804
},
"-0.80": {
"A": 1.0,
"B": 0.4935585804
}
}"""
data = json.loads(s, object_pairs_hook=OrderedDict)
pd.DataFrame.from_dict(data)
-0.90 -0.80
A 1.000000 1.000000
B 0.493559 0.493559
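As a side note (assuming Python 3.7 or newer, where plain dicts already preserve insertion order), the OrderedDict hook is optional and you can read the file directly; data.json is a placeholder for your file name:

import json
import pandas as pd

# Plain dicts keep insertion order on Python 3.7+, so no object_pairs_hook is needed
with open("data.json") as f:
    data = json.load(f)

print(pd.DataFrame.from_dict(data))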

Trouble setting up the SimpleVector encoder

I am using the commits from breznak for the encoders (I wasn't able to figure out "git checkout ..." with GitHub, so I just carefully copied over the three files: base.py, multi.py, and multi_test.py).
I ran multi_test.py without any problems.
Then I adjusted my model parameters (MODEL_PARAMS), so that the encoders portion of 'sensorParams' looks like this:
'encoders': {
'frequency': {
'fieldname': u'frequency',
'type': 'SimpleVector',
'length': 5,
'minVal': 0,
'maxVal': 210
}
},
I also adjusted the modelInput portion of my code, so it looked like this:
model = ModelFactory.create(model_params.MODEL_PARAMS)
model.enableInference({'predictedField': 'frequency'})
y = [1,2,3,4,5]
modelInput = {"frequency": y}
result = model.run(modelInput)
But I get the following error, regardless of whether I instantiate 'y' as a list or a numpy.ndarray:
File "nta/eng/lib/python2.7/site-packages/nupic/encoders/base.py", line 183, in _getInputValue
return getattr(obj, fieldname)
AttributeError: 'list' object has no attribute 'idx0'
I also tried initializing a SimpleVector encoder inline with my modelInput, directly encoding my array, then passing it through modelInput. That violated the input parameters of my SimpleVector, because I was now double encoding. So I removed the encoders portion of my model parameters dictionary. That caused an error as well, because some part of my model was looking for that portion of the dictionary.
Any suggestions on what I should do next?
Edit: Here are the files I'm using with the OPF.
sendAnArray.py
import numpy
from nupic.frameworks.opf.modelfactory import ModelFactory
import model_params

class sendAnArray():

    def __init__(self):
        self.model = ModelFactory.create(model_params.MODEL_PARAMS)
        self.model.enableInference({'predictedField': 'frequency'})
        for i in range(100):
            self.run()

    def run(self):
        y = [1,2,3,4,5]
        modelInput = {"frequency": y}
        result = self.model.run(modelInput)
        anomalyScore = result.inferences['anomalyScore']
        print y, anomalyScore

sAA = sendAnArray()
model_params.py
MODEL_PARAMS = {
'model': "CLA",
'version': 1,
'predictAheadTime': None,
'modelParams': {
'inferenceType': 'TemporalAnomaly',
'sensorParams': {
'verbosity' : 0,
'encoders': {
'frequency': {
'fieldname': u'frequency',
'type': 'SimpleVector',
'length': 5,
'minVal': 0,
'maxVal': 210
}
},
'sensorAutoReset' : None,
},
'spEnable': True,
'spParams': {
'spVerbosity' : 0,
'globalInhibition': 1,
'columnCount': 2048,
'inputWidth': 5,
'numActivePerInhArea': 60,
'seed': 1956,
'coincInputPoolPct': 0.5,
'synPermConnected': 0.1,
'synPermActiveInc': 0.1,
'synPermInactiveDec': 0.01,
},
'tpEnable' : True,
'tpParams': {
'verbosity': 0,
'columnCount': 2048,
'cellsPerColumn': 32,
'inputWidth': 2048,
'seed': 1960,
'temporalImp': 'cpp',
'newSynapseCount': 20,
'maxSynapsesPerSegment': 32,
'maxSegmentsPerCell': 128,
'initialPerm': 0.21,
'permanenceInc': 0.1,
'permanenceDec' : 0.1,
'globalDecay': 0.0,
'maxAge': 0,
'minThreshold': 12,
'activationThreshold': 16,
'outputType': 'normal',
'pamLength': 1,
},
'clParams': {
'regionName' : 'CLAClassifierRegion',
'clVerbosity' : 0,
'alpha': 0.0001,
'steps': '5',
},
'anomalyParams': {
u'anomalyCacheRecords': None,
u'autoDetectThreshold': None,
u'autoDetectWaitRecords': 2184
},
'trainSPNetOnlyIfRequested': False,
},
}
The problem seems to be that the SimpleVector class is accepting an array instead of a dict as its input, and then reconstructs that internally as {'list': {'idx0': 1, 'idx1': 2, ...}} (i.e. as if this dict had been the input). This is fine if it is done consistently, but your error shows that it breaks down somewhere. Have a word with @breznak about this.
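For illustration only, here is a hedged sketch of the per-index dict form described above; this is not a confirmed fix, and whether the encoder accepts this shape depends on the breznak branch you copied:

# Build the {'idx0': ..., 'idx1': ...} form that the encoder appears to expect internally
y = [1, 2, 3, 4, 5]
modelInput = {"frequency": {"idx%d" % i: v for i, v in enumerate(y)}}
result = model.run(modelInput)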
Working through the OPF was difficult. I wanted to input an array of indices into the temporal pooler, so I opted to interface directly with the algorithms (I relied heavily on hello_tp.py). I ignored SimpleVector altogether, and instead worked through the BitmapArray encoder.
Subutai has a useful email on the nupic-discuss listserv, where he breaks down the three main areas of the NuPIC API: algorithms, networks/regions, and the OPF. That helped me understand my options better.