Pandas json normalize

Pandas json normalize - pandas

Can someone help on how to put this json structure into pandas dataframe ? I would like the final structure to have date, high, low, open, low, close as the column.
The issues I have with this structure is that the key is a running numbers (eg 1607956200, 1607956500, 1607956800 ...) and I can't possibly lay out all the running numbers.
I have tried with the code below but I get all the data in the column with only 1 row.
response=requests.get(url).text
data = json.loads(response)
df = pd.json_normalize(data['history'] )
Below is an example of data
data = {"meta": {"regularMarketPrice": 9.78, "chartPreviousClose": 9.02, "previousClose": 9.78, "scale": 3, "dataGranularity": "5m", "range": "" },
"history": {"1607956200": { "date": "14-12-2020", "open": 9.13, "high": 9.18, "low": 9.12, "close": 9.14 },
"1607956500": { "date": "14-12-2020", "open": 9.14, "high": 9.14, "low": 9.08, "close": 9.1 },
"1607956800": { "date": "14-12-2020", "open": 9.1, "high": 9.11, "low": 9.09, "close": 9.1 },}}

Create the dataframe and then transpose
df = pd.DataFrame(data['history']).T
date open high low close
1607956200 14-12-2020 9.13 9.18 9.12 9.14
1607956500 14-12-2020 9.14 9.14 9.08 9.10
1607956800 14-12-2020 9.10 9.11 9.09 9.10

Related

Laravel query sum group by week and get 0 for weeks not existent in the dataset

Hi I am trying to get sum of quantity group by week of current year.
Here is my query which is working
Sale::selectRaw('sum(qty) as y')
->selectRaw(DB::raw('WEEK(order_date,1) as x'))
->whereYear('order_date',Carbon::now()->format('Y'))
->groupBy('x')
->orderBy('x', 'ASC')
->get();
The response I get is look like this. where x is the week number and y is the sum value.
[
{
"y": "50",
"x": 2
},
{
"y": "4",
"x": 14
}
]
I want to get 0 values for the week that doesn't have any value for y
My desired result should be like this
[
{
"y": "0",
"x": 1
},
{
"y": "50",
"x": 2
},
...
...
...
{
"y": "4",
"x": 14
}
]

is there's function to extract value from dictionary in list?

kindly need to extract name value, even it's Action or adventure from this column in new column
in pandas
'[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]'

You want from_records:
import pandas as pd
data = [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]
df = pd.DataFrame.from_records(data)
df
you get
id name
0 28 Action
1 12 Adventure
2 14 Fantasy
3 878 Science Fiction

Combining separate temporal measurement series

I have a data set that combines two temporal measurement series with one row per measurement
time: 1, measurement: a, value: 5
time: 2, measurement: b, value: false
time: 10, measurement: a, value: 2
time: 13, measurement: b, value: true
time: 20, measurement: a, value: 4
time: 24, measurement: b, value: true
time: 30, measurement: a, value: 6
time: 32, measurement: b, value: false
in a visualization using Vega lite, I'd like to combine the measurement series and encode measurement a and b in a single visualization without simply layering their representation on a temporal axis but representing their value in a single encoding spec.
either measurement a values need to be interpolated and added as a new value to rows of measurement b
eg:
time: 2, measurement: b, value: false, interpolatedMeasurementA: 4.6667
or the other way around, which leaves the question of how to interpolate a boolean. maybe closest value by time, or simpler: last value
eg:
time: 30, measurement: a, value: 6, lastValueMeasurementB: true
I suppose this could be done either query side in which case this question would be regarding indexDB Flux query language
or this could be done on the visualization side in which case this would be regarding vega-lite

There's not any true linear interpolation schemes built-in to Vega-Lite (though the loess transform comes close), but you can achieve roughly what you wish with a window transform.
Here is an example (view in editor):
{
"data": {
"values": [
{"time": 1, "measurement": "a", "value": 5},
{"time": 2, "measurement": "b", "value": false},
{"time": 10, "measurement": "a", "value": 2},
{"time": 13, "measurement": "b", "value": true},
{"time": 20, "measurement": "a", "value": 4},
{"time": 24, "measurement": "b", "value": true},
{"time": 30, "measurement": "a", "value": 6},
{"time": 32, "measurement": "b", "value": false}
]
},
"transform": [
{
"calculate": "datum.measurement == 'a' ? datum.value : null",
"as": "measurement_a"
},
{
"window": [
{"op": "mean", "field": "measurement_a", "as": "interpolated"}
],
"sort": [{"field": "time"}],
"frame": [1, 1]
},
{"filter": "datum.measurement == 'b'"}
],
"mark": "line",
"encoding": {
"x": {"field": "time"},
"y": {"field": "interpolated"},
"color": {"field": "value"}
}
}
This first uses a calculate transform to isolate the values to be interpolated, then a window transform that computes the mean over adjacent values (frame: [1, 1]), then a filter transform to isolate interpolated rows.
If you wanted to go the other route, you could do a similar sequence of transforms targeting the boolean value instead.

BigQuery GIS table to GeoJSON type Feature Collection with SQL and/or Python

I have some BigQuery tables with spatial data (lon, lat, point, and linestring) representing boat tracks.
I'm trying to get them into a GeoJSON for an API output.
I have one method that uses Python to query my table into a pandas dataframe, which I can then use a function to create a feature collection.
But, there's so much data/day that it'd be ideal if I could perform some kind of efficient linestring union and simplify.
My current method:
# Query BigQuery table
sql = """SELECT
lon, lat, lead_lon, lead_lat,
CAST(CAST(timestamp AS DATE) as string) as date,
UNIX_SECONDS(timestamp) as unix_secs,
CAST(ship_num AS STRING) AS ship_num,
op,
CAST(knots AS FLOAT64) AS knots,
point,
linestring
FROM `ship_segments`
WHERE timestamp BETWEEN '2020-04-16' AND '2020-04-17';"""
# Make into pandas dataframe
df = client.query(sql).to_dataframe()
#df to geojson fn
def data2geojson(df):
features = []
insert_features = lambda X: features.append(
geojson.Feature(geometry=geojson.LineString(([X["lead_lon"], X["lead_lat"], X["knots"], X["unix_secs"]],
[X["lon"], X["lat"], X["knots"], X["unix_secs"]])),
properties=dict(date=X["date"],
mmsi=X["ship_num"],
operator=X["op"]
)))
df.apply(insert_features, axis=1)
geojson_obj = geojson.dumps(geojson.FeatureCollection(features, indent=2, sort_keys=True), sort_keys=True, ensure_ascii=False)
return(geojson_obj)
results = data2geojson(df)
This returns a GeoJSON:
{"features": [{"geometry": {"coordinates": [[-119.049945, 33.983277, 10.5502, 1587104709], [-119.034677, 33.975823, 10.5502, 1587104709]], "type": "LineString"}, "properties": {"date": "2020-04-17", "mmsi": "235098383", "operator": "Evergreen Marine Corp"}, "type": "Feature"}, {"geometry": {"coordinates": [[-120.176933, 34.282107, 22.7005, 1587114969], [-120.144453, 34.275147, 22.7005, 1587114969]], "type": "LineString"}, "properties": {"date": "2020-04-17", "mmsi": "235098383", "operator": "Evergreen Marine Corp"}, "type": "Feature"}, {"geometry": {"coordinates": [[-118.361737, 33.64647, 11.3283, 1587096305], [-118.356308, 33.643713, 11.3283, 1587096305]], "type": "LineString"}, "properties": {"date": "2020-04-17", "mmsi": "538005412", "operator": "Scorpio MR Pool Ltd"}, "type": "Feature"}, {"geometry": {"coordinates": [[-118.414667, 33.673013, 12.7684, 1587097278], [-118.411707, 33.671493, 12.7684, 1587097278]], "type": "LineString"}, "properties": {"date": "2020-04-17", "mmsi": "538005412", "operator": "Scorpio MR Pool Ltd"}, "type": "Feature"}, {"geometry": {"coordinates": [[-119.377783, 34.062612, 10.5456, 1587102119], [-119.384212, 34.064217, 10.5456, 1587102119]], "type": "LineString"}, "properties": {"date": "2020-04-17", "mmsi": "636018225", "operator": "Ocean Network Express Pte Ltd"}, "type": "Feature"}], "indent": 2, "sort_keys": true, "type": "FeatureCollection"}
But I'm trying something like:
select
ship_num,
date(timestamp) as date,
AVG(speed_knots) as avg_speed_knots,
st_union_agg(linestring) as multiline
from(
SELECT
*,
row_number() OVER w AS num,
ST_GeogPoint(lon,lat) as geom,
LEAD(ST_GeogPoint(lon,lat)) OVER w AS geom2,
ST_MAKELINE((ST_GeogPoint(lon,lat)), (LEAD(ST_GeogPoint(lon,lat)) OVER w)) AS linestring,
LEAD(STRING(timestamp), 0) OVER w AS t1,
LEAD(STRING(timestamp), 1) OVER w AS t2,
FROM
`ship_data`
where timestamp >= '2020-04-10'
WINDOW w AS (PARTITION BY ship_num ORDER BY timestamp)) AS q
group by
ship_num, date(timestamp);
This gives me multilinestrings in a table, but then I need to simplify and get them in a GeoJSON FeatureCollection output.
Any ideas that don't use PostGIS?

Rendering multi series line chart with time on x axis in AnyChart

I am trying Anychart line chart with multi series with date time on x axis. Not able to render the chart perfectly. Its drawing the series 1 with the given data then for the second series its rendering the date time freshly on x axis after the 1st series values then plotting the series 2.
data is like:
"data": [
{"x": "10/2/2016 01:00:00 AM", "value": "128.14"},
{"x": "10/2/2016 01:10:00 AM", "value": "112.61"}
]
},{
// second series data
"data": [
{"x": "10/2/2016 01:01:00 AM", "value": "90.54"},
{"x": "10/2/2016 01:02:00 AM", "value": "104.19"},
{"x": "10/2/2016 01:11:00 AM", "value": "150.67"}
]
It has to plot on x axis like 10/2/2016 01:00:00 AM, 10/2/2016 01:01:00 AM, 10/2/2016 01:02:00 AM, 10/2/2016 01:10:00 AM, 10/2/2016 01:11:00 AM
but it plotting like 10/2/2016 01:00:00 AM, 10/2/2016 01:10:00 AM, 10/2/2016 01:01:00 AM, 10/2/2016 01:02:00 AM, 10/2/2016 01:11:00 AM
Updating the code:
anychart.onDocumentReady(function() {
// JSON data
var json = {
// chart settings
"chart": {
// chart type
"type": "line",
// chart title
"title": "Axes settings from JSON",
// series settings
"series": [{
// first series data
"data": [
{"x": "10/2/2016 01:00:00 AM", "value": 128.14},
{"x": "10/2/2016 01:10:00 AM", "value": 112.61},
{"x": "10/3/2016 01:00:00 AM", "value": 12.14},
{"x": "10/3/2016 01:10:00 AM", "value": 152.61},
]},{
"data": [
{"x": "10/2/2016 01:09:00 AM", "value": 28.14},
{"x": "10/2/2016 01:11:00 AM", "value": 12.61},
{"x": "10/3/2016 01:01:00 AM", "value": 1.14},
{"x": "10/3/2016 01:12:00 AM", "value": 15.61},
]
}],
// x scale settings
"xScale": {
ticks:
{scale: "DateTime"}
},
xAxes: [{
title: "Basic X Axis"
}],
// chart container
"container": "container"
}
};
// get JSON data
var chart = anychart.fromJson(json);
// draw chart
chart.draw();
});

With this type of data you need to use scatter chart: http://docs.anychart.com/7.12.0/Basic_Charts_Types/Scatter_Chart
Datetime scale should be set like this in JSON:
"xScale": {
type: "datetime",
minimum: "10/02/2016 00:00:00",
maximum: "10/03/2016 12:00:00",
}
Here is a sample: https://jsfiddle.net/3ewcnp5j/102/

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Pandas json normalize - pandas

Create the dataframe and then transpose df = pd.DataFrame(data['history']).T date open high low close 1607956200 14-12-2020 9.13 9.18 9.12 9.14 1607956500 14-12-2020 9.14 9.14 9.08 9.10 1607956800 14-12-2020 9.10 9.11 9.09 9.10

Related

Laravel query sum group by week and get 0 for weeks not existent in the dataset

is there's function to extract value from dictionary in list?

Combining separate temporal measurement series

BigQuery GIS table to GeoJSON type Feature Collection with SQL and/or Python

Rendering multi series line chart with time on x axis in AnyChart

Categories

Resources