How to extract separate values from GeoJSON in BigQuery - sql

I have a GeoJSON string for a multipoint geometry. I want to extract each of those points to a table of individual point geometries in BigQuery
I have been able to achieve point geometry for one of the points. I want to do it for all the others as well in a automated fashion. I've already tried converting the string to an array but it remains an array of size 1 with the entire content as a single string.
This is what worked for me that I was able to extract one point and convert it to a geometry
WITH temp_table as (select '{ "type": "MultiPoint", "coordinates": [ [ 20, 10 ], [ 30, 5 ], [ 90, 50 ], [ 40, 80 ] ] }' as string)
select ST_GEOGPOINT(CAST(JSON_EXTRACT(string, '$.coordinates[0][0]') as FLOAT64), CAST(JSON_EXTRACT(string, '$.coordinates[0][1]') as FLOAT64)) from temp_table
This results in POINT(20 10)
I can write manual queries for each of these points and do a UNION ALL but that won't scale or work every time. I want to achieve this such that it is able to do it in a automated fashion. For architectural purposes, we can't do string manipulation in languages like Python.

Below is for BigQuery Standard SQL
#standardSQL
SELECT
ARRAY(
SELECT ST_GEOGPOINT(
CAST(SPLIT(pair)[OFFSET(0)] AS FLOAT64), CAST(SPLIT(pair)[SAFE_OFFSET(1)] AS FLOAT64))
FROM UNNEST(REGEXP_EXTRACT_ALL(JSON_EXTRACT(STRING, '$.coordinates'), r'\[(\d+,\d+)\]')) pair
) points
FROM `project.dataset.temp_table`
You can test, play with above using sample data from your question as in below example
#standardSQL
WITH `project.dataset.temp_table` AS (
SELECT '{ "type": "MultiPoint", "coordinates": [ [ 20, 10 ], [ 30, 5 ], [ 90, 50 ], [ 40, 80 ] ] }' AS STRING
)
SELECT
ARRAY(
SELECT ST_GEOGPOINT(
CAST(SPLIT(pair)[OFFSET(0)] AS FLOAT64), CAST(SPLIT(pair)[SAFE_OFFSET(1)] AS FLOAT64))
FROM UNNEST(REGEXP_EXTRACT_ALL(JSON_EXTRACT(STRING, '$.coordinates'), r'\[(\d+,\d+)\]')) pair
) points
FROM `project.dataset.temp_table`
with result
Row points
1 POINT(20 10)
POINT(30 5)
POINT(90 50)
POINT(40 80)
Note: in above version - array of points is produced for each respective original row. Obviously you can adjust it to flatten as in below example
#standardSQL
WITH `project.dataset.temp_table` AS (
SELECT '{ "type": "MultiPoint", "coordinates": [ [ 20, 10 ], [ 30, 5 ], [ 90, 50 ], [ 40, 80 ] ] }' AS STRING
)
SELECT
ST_GEOGPOINT(
CAST(SPLIT(pair)[OFFSET(0)] AS FLOAT64), CAST(SPLIT(pair)[SAFE_OFFSET(1)] AS FLOAT64)
) points
FROM `project.dataset.temp_table`, UNNEST(REGEXP_EXTRACT_ALL(JSON_EXTRACT(STRING, '$.coordinates'), r'\[(\d+,\d+)\]')) pair
with result
Row points
1 POINT(20 10)
2 POINT(30 5)
3 POINT(90 50)
4 POINT(40 80)

Related

Linear Diophantine Equations with Restriction in the GAP System

I am searching for a way to use the GAP System to find a solution of a linear Diophantine equation over the non-negative integers. Explicitly, I have a list L of positive integers for each of which there exists a solution of the linear Diophantine equation s = 11*a + 7*b such that a and b are non-negative integers. I would like to have the GAP System return for each element s of L the ordered pair(s) [a, b] corresponding to the above solution(s).
I am familiar already with the command SolutionIntMat in the GAP System; however, this produces only some solution of the linear Diophantine equation s = 11*a + 7*b. Particularly, it is possible (and far more likely) that one of the coefficients a and b is negative. For instance, I obtain the solution [-375, 600] when I use the aforementioned command on the linear Diophantine equation 75 = 11*a + 7*b.
For additional context, this query arises when working with numerical semigroups generated by generalized arithmetic sequences. Use the command LoadPackage("numericalsgps"); to implement computations with such objects. For instance, if S := NumericalSemigroup(11, 29, 36, 43, 50, 57, 64, 71);, then each of the minimal generators of S other than 11 is of the form 2*11 + 7*i for some integer i in [1..7]. One can ask the GAP System for the SmallElements(S);, and the GAP System will return all elements of S up to FrobeniusNumber(S) + 1. Clearly, every element of S is of the form 11*a + 7*b for some non-negative integers a and b; I would like to investigate what coefficients a and b arise. In fact, the answer is known (cf. Proposition 2.5 of this paper); I am just trying to get an understanding of the intuition behind the proof.
Thank you in advance for your time and consideration.
Dylan, thank you for your query and for using GAP and numericalsgps.
You can probably use in this setting Factorizations from the package numericalsgps. It internally rewrites the output of RestrictedPartitions.
For instance, in your example, you can get all possible "factorizations" of the small elements of S, with respect to the generators of S, by typing List(SmallElements(S), x->[x,Factorizations(x,S)]). A particular example:
gap> Factorizations(104,S);
[ [ 1, 0, 0, 1, 1, 0, 0, 0 ], [ 1, 0, 1, 0, 0, 1, 0, 0 ],
[ 1, 1, 0, 0, 0, 0, 1, 0 ], [ 3, 0, 0, 0, 0, 0, 0, 1 ] ]
If you want to see the factorizations of the elements of S in terms of 11 and 7, then you can do the following:
gap> FactorizationsIntegerWRTList(29,[11,7]);
[ [ 2, 1 ] ]
So, for all minimal generators of S you would do
gap> List(MinimalGenerators(S), g-> FactorizationsIntegerWRTList(g,[11,7]));
[ [ [ 1, 0 ] ], [ [ 2, 1 ] ], [ [ 2, 2 ] ], [ [ 2, 3 ] ],
[ [ 2, 4 ] ], [ [ 2, 5 ] ], [ [ 2, 6 ] ], [ [ 2, 7 ] ] ]
For the set of small elements of S, try List(SmallElements(S), g-> FactorizationsIntegerWRTList(g,[11,7])). If you only want up to some integer, just replace SmallElements(S) with Intersection([1..200], S); or if you want the first, say 200, elements of S, use S{[1..200]}.
You may want to have a look at Chapter 9 of the manual, and in particular to FactorizationsElementListWRTNumericalSemigroup.
I hope this helps.

BigQuery string-formatting to json

Is the following a full list of all value types as they're passed to json in BigQuery? I've gotten this by trial and error but haven't been able to find this in the documentation:
select
NULL as NullValue,
FALSE as BoolValue,
DATE '2014-01-01' as DateValue,
INTERVAL 1 year as IntervalValue,
DATETIME '2014-01-01 01:02:03' as DatetimeValue,
TIMESTAMP '2014-01-01 01:02:03' as TimestampValue,
"Hello" as StringValue,
B"abc" as BytesValue,
123 as IntegerValue,
NUMERIC '3.14' as NumericValue,
3.14 as FloatValue,
TIME '12:30:00.45' as TimeValue,
[1,2,3] as ArrayValue,
STRUCT('Mark' as first, 'Thomas' as last) as StructValue,
[STRUCT(1 as x, 2 as y), STRUCT(5 as x, 6 as y)] as ArrayStructValue,
STRUCT(1 as x, [1,2,3] as y, ('a','b','c') as z) as StructNestedValue
{
"NullValue": null,
"BoolValue": "false", // why not just false without quotes?
"DateValue": "2014-01-01",
"IntervalValue": "1-0 0 0:0:0",
"DatetimeValue": "2014-01-01T01:02:03",
"TimestampValue": "2014-01-01T01:02:03Z",
"StringValue": "Hello",
"BytesValue": "YWJj",
"IntegerValue": "123",
"NumericValue": "3.14",
"FloatValue": "3.14",
"TimeValue": "12:30:00.450000",
"ArrayValue": ["1", "2", "3"],
"StructValue": {
"first": "Mark",
"last": "Thomas"
},
"ArrayStructValue": [
{"x": "1", "y": "2"},
{"x": "5", "y": "6"}
],
"StructNestedValue": {
"x": "1",
"y": ["1", 2", "3"],
"z": {"a": "a", b": "b", "c": "c"}
}
}
Honestly, it seems to me that other than the null value and the array [] or struct {} container, everything is string-enclosed, which seems a bit odd.
According to this document, json is built on two structures:
A collection of name/value pairs. In various languages, this is
realized as an object, record, struct, dictionary, hash table, keyed
list, or associative array.
An ordered list of values. In most
languages, this is realized as an array, vector, list, or sequence.
The result of the SELECT query is in json format, wherein [] depicts an array datatype, {} depicts an object datatype and double quotes(" ") depicts a string value as in the query itself.

In PostgreSQL, what's the best way to select an object from a JSONB array?

Right now, I have an an array that I'm able to select off a table.
[{"_id": 1, "count: 3},{"_id": 2, "count: 14},{"_id": 3, "count: 5}]
From this, I only need the count for a particular _id. For example, I need the count for
_id: 3
I've read the documentation but I haven't been able to figure out the correct way to get the object.
WITH test_array(data) AS ( VALUES
('[
{"_id": 1, "count": 3},
{"_id": 2, "count": 14},
{"_id": 3, "count": 5}
]'::JSONB)
)
SELECT val->>'count' AS result
FROM
test_array ta,
jsonb_array_elements(ta.data) val
WHERE val #> '{"_id":3}'::JSONB;
Result:
result
--------
5
(1 row)

Python folium GeoJSON map not displaying

I'm trying to use a combination of geopandas, Pandas and Folium to create a polygon map that I can embed incorporate into a web page.
For some reason, it's not displaying.
The steps I've taken:
Grabbed a .shp from the UK's OS for Parliamentary boundaries.
I've then used geopandas to change the projection to epsg=4326 and then exported as GeoJSON which takes the following format:
{ "type": "Feature", "properties": { "PCON13CD": "E14000532", "PCON13CDO": "A03", "PCON13NM": "Altrincham and Sale West" }, "geometry": { "type": "Polygon", "coordinates": [ [ [ -2.313999519326579, 53.357408280545918 ], [ -2.313941776174758, 53.358341455420039 ], [ -2.31519699483377, 53.359035664493433 ], [ -2.317953152796459, 53.359102954309151 ], [ -2.319855973429864, 53.358581917200119 ],... ] ] ] } },...
Then what I'd like to do is mesh this with a dataframe of constituencies in the following format, dty:
constituency count
0 Burton 667
1 Cannock Chase 595
2 Cheltenham 22
3 Cheshire East 2
4 Congleton 1
5 Derbyshire Dales 1
6 East Staffordshire 4
import folium
mapf = folium.Map(width=700, height=370, tiles = "Stamen Toner", zoom_start=8, location= ["53.0219392","-2.1597434"])
mapf.geo_json(geo_path="geo_json_shape2.json",
data_out="data.json",
data=dty,
columns=["constituency","count"],
key_on="feature.properties.PCON13NM.geometry.type.Polygon",
fill_color='PuRd',
fill_opacity=0.7,
line_opacity=0.2,
reset="True")
The output from mapf looks like:
mapf.json_data
{'../../Crime_data/staffs_data92.json': [{'Burton': 667,
'Cannock Chase': 595,
'Cheltenham': 22,
'Cheshire East': 2,
'Congleton': 1,
'Derbyshire Dales': 1,
'East Staffordshire': 4,
'Lichfield': 438,
'Newcastle-under-Lyme': 543,
'North Warwickshire': 1,
'Shropshire': 17,
'South Staffordshire': 358,
'Stafford': 623,
'Staffordshire Moorlands': 359,
'Stoke-on-Trent Central': 1053,
'Stoke-on-Trent North': 921,
'Stoke-on-Trent South': 766,
'Stone': 270,
'Tamworth': 600,
'Walsall': 1}]}
Although the mapf.create_map() function successfully creates a map, the polygons don't render.
What debugging steps should I take?
#elksie5000, Try mplleaflet it is extremely straightforward.
pip install mplleaflet
in Jupyter/Ipython notebook:
import mplleaflet
ax = geopandas_df.plot(column='variable_to_plot', scheme='QUANTILES', k=9, colormap='YlOrRd')
mplleaflet.show(fig=ax.figure)

Using a MultiIndex value in a boolean selection (while setting)

There is a similar question here: Pandas using row labels in boolean indexing
But that one uses a simple index and I can't figure out how to generalize it to a MultiIndex:
df = DataFrame( { 'ssn' : [ 489, 489, 220, 220 ],
'year': [ 2009, 2010, 2009, 2010 ],
'tax' : [ 300, 600, 800, 900 ],
'flag': [ 0, 0, 0, 0 ] } )
df.set_index( ['ssn','year'], inplace=True )
Semi-solutions:
df.flag[ (df.year ==2010) & (df.tax<700) ] = 9 (works if drop=False in set_index)
df.flag[ (df.index==2010) & (df.tax<700) ] = 9 (works for a simple index)
I've tried several things but I just can't figure out how to generalize from simple index to multi. E.g. df.index.year=2010 and 20 other guesses...
You can use index.get_level_values(), e.g.
df.flag[(df.index.get_level_values('year') == 2010) & (df.tax < 700)] = 9