How to generate a Digraph Tree from a big XML


I have a big AML/XML file with many layers of nested child elements. I need to display the ElementTree tags as a digraph, and I only want to use lxml, networkx and matplotlib.
I've run into two issues:
How do I iterate through all the subtrees and child elements automatically when I don't know how many layers there are? Is there a smarter way than writing endless nested for loops?
How do I format the digraph so that it looks like a tree instead of a mess, and how do I get rid of the self-referencing edges?
I hope somebody can give me advice and a smarter solution.
Here's what I wrote and my result so far:
from lxml import etree
import networkx as nx
import matplotlib.pyplot as plt
G=nx.DiGraph()
tree = etree.parse("Test_AML.aml")
root = tree.getroot()
for node in root.getchildren():
    G.add_node(node.tag)
    G.add_edge(root.tag, node.tag)
    print(node.tag)
    for element in node.getchildren():
        G.add_node(element.tag)
        G.add_edge(node.tag, element.tag)
        for knot in element.getchildren():
            G.add_node(knot.tag)
            G.add_edge(element.tag, knot.tag)
            for leaf in knot.getchildren():
                G.add_node(leaf.tag)
                G.add_edge(knot.tag, leaf.tag)
nx.draw(G)
plt.show()
[1]: https://i.stack.imgur.com/D7qte.png
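One possible approach, offered as a sketch rather than a definitive answer: the self-referencing edges most likely appear because the tag name is used as the node key, so a child with the same tag as its parent collapses into the same node. Giving every element its own integer id avoids that, and an explicit stack handles any nesting depth without nested loops. The example below uses the standard library's xml.etree.ElementTree as a stand-in so that it runs anywhere; lxml elements can be iterated the same way. The helper names tree_edges and tree_layout and the tiny demo document are made up for illustration.

```python
import xml.etree.ElementTree as ET  # stand-in for lxml; iteration over children works the same


def tree_edges(root):
    """Walk the whole tree with an explicit stack.

    Returns (labels, edges): labels maps a unique integer id to the tag,
    edges is a list of (parent_id, child_id) pairs.
    """
    labels = {0: root.tag}
    edges = []
    stack = [(0, root)]
    next_id = 1
    while stack:
        parent_id, parent = stack.pop()
        for child in parent:  # with lxml, comments may show up here too; filter if needed
            labels[next_id] = child.tag
            edges.append((parent_id, next_id))
            stack.append((next_id, child))
            next_id += 1
    return labels, edges


def tree_layout(edges, root_id=0):
    """Compute simple tree positions: leaves get consecutive x values,
    parents sit above the mean x of their children, y is minus the depth."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
    pos = {}
    next_x = 0

    def place(node, depth):
        nonlocal next_x
        kids = children.get(node, [])
        if kids:
            xs = [place(kid, depth + 1) for kid in kids]
            x = sum(xs) / len(xs)
        else:
            x = next_x
            next_x += 1
        pos[node] = (x, -depth)
        return x

    place(root_id, 0)
    return pos


# Hypothetical demo document; replace with the root of the parsed AML file.
demo_root = ET.fromstring("<a><b><c/><c/></b><b/></a>")
labels, edges = tree_edges(demo_root)
pos = tree_layout(edges)
print(edges)
print(pos)
```

The result can then be handed to networkx, for example with G.add_edges_from(edges) followed by nx.draw(G, pos=pos, labels=labels, with_labels=True). Note that tree_layout is recursive, so an extremely deep document could hit Python's recursion limit.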

Related

optimize an array for interpolation

I am using an array (51x51x181) to make a 3D interpolation in Python (and I can calculate any point in between if needed).
I need to reduce the size of the array and would like to do this with the least possible error.
Below is an example with the error function I would like to improve on. The number of values in the array should stay the same, but the Angles and Shifts in the example do not have to be equally spaced.
import numpy as np
from scipy.interpolate import RegularGridInterpolator
import itertools
Data=np.zeros((5,180))
Angles=np.linspace(0,360,10)
Shifts=np.linspace(0,100,10)
Data=np.sin(np.deg2rad(Angles[:,None]+Shifts[None,:]))
interp = RegularGridInterpolator((Angles, Shifts),Data, bounds_error=False, fill_value=None)
def errorfunc():
    Angles=np.linspace(0,360,50)
    Shifts=np.linspace(0,100,50)
    Function_Results=np.sin(np.deg2rad(Angles[:,None]+Shifts[None,:]).flatten())
    Data_interp=interp(np.array(list(itertools.product(Angles,Shifts))))
    Error=np.sqrt(np.mean(np.square(Function_Results-Data_interp)))
    return(Error)
I could not find a feasible optimizer in scipy (tried some with poor performance). Is there a standard way to do this?
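One way to set this up, as a rough sketch and not a standard recipe: let the optimizer work on unconstrained parameters that are mapped to strictly increasing node positions with fixed end points, so every trial grid is valid for RegularGridInterpolator. The parametrization (exponentiated interval widths), the choice of Nelder-Mead, the iteration limit and all helper names below are assumptions made for illustration, using the 2D sine example from the question.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator
from scipy.optimize import minimize

N_ANGLES, N_SHIFTS = 10, 10  # grid size stays fixed, only node positions move


def func(angles, shifts):
    """The example function from the question, evaluated on a grid."""
    return np.sin(np.deg2rad(angles[:, None] + shifts[None, :]))


# Dense reference grid used to score a coarse grid.
ref_angles = np.linspace(0, 360, 50)
ref_shifts = np.linspace(0, 100, 50)
ref_values = func(ref_angles, ref_shifts).ravel()
ref_points = np.stack(
    np.meshgrid(ref_angles, ref_shifts, indexing="ij"), axis=-1
).reshape(-1, 2)


def nodes_from_params(params, lo, hi):
    """Map unconstrained parameters to increasing nodes from lo to hi."""
    widths = np.exp(params)  # positive interval widths
    cum = np.concatenate(([0.0], np.cumsum(widths)))
    return lo + (hi - lo) * cum / cum[-1]


def split(params):
    angles = nodes_from_params(params[:N_ANGLES - 1], 0.0, 360.0)
    shifts = nodes_from_params(params[N_ANGLES - 1:], 0.0, 100.0)
    return angles, shifts


def rmse(params):
    """RMS error of the coarse-grid interpolation on the reference grid."""
    angles, shifts = split(params)
    interp = RegularGridInterpolator(
        (angles, shifts), func(angles, shifts), bounds_error=False, fill_value=None
    )
    return np.sqrt(np.mean(np.square(interp(ref_points) - ref_values)))


x0 = np.zeros(N_ANGLES - 1 + N_SHIFTS - 1)  # all zeros = equally spaced nodes
result = minimize(rmse, x0, method="Nelder-Mead", options={"maxiter": 500})
opt_angles, opt_shifts = split(result.x)
print("equal spacing:", rmse(x0), "optimized:", result.fun)
```

With a real data array instead of a known function, func would have to be replaced by an interpolation of the full-resolution array. A derivative-free method is slow for many nodes, so this is only meant to show the problem setup.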

Use folium Map as holoviews DynamicMap

I have a folium.Map that contains custom HTML Popups with clickable URLs. These Popups open when clicking on the polygons of the map. This is a feature that doesn't seem to be possible to achieve using holoviews.
My ideal example of the final application that I want to build with holoviews/geoviews is here with the source code here, but I would like to exchange the main map with my folium Map and plot polygons instead of rasterized points. Now when I would like to create the holoviews.DynamicMap from the folium.Map, holoviews complains (of course) that the data type "map" is not accepted. Is this somehow still possible?
I have found a notebook on GitHub where a holoviews plot is embedded in a folium map using a workaround that writes the HTML and reads it back, but it seems impossible to embed a folium map into holoviews such that other plots can be updated from this figure using Streams!?
Here is some toy data (from here) for the datasets that I use. For simplicity, let's assume I just had point data instead of polygons:
import folium as fm
def make_map():
    m = fm.Map(location=[20.59,78.96], zoom_start=5)
    green_p1 = fm.map.FeatureGroup()
    # row and fill_color come from the toy dataset, which is not shown here
    green_p1.add_child(
        fm.CircleMarker(
            [row.Latitude, row.Longitude],
            radius=10,
            fill=True,
            fill_color=fill_color,
            fill_opacity=0.7
        )
    )
    m.add_child(green_p1)
    return m
If I understand it correctly, this now needs to be tweaked so that it can be passed as the first argument to a holoviews.DynamicMap:
hv.DynamicMap(make_map, streams=my_streams)
where my_streams are some other plots that should be updated with the extent of the folium map.
Is that somehow possible or is my strategy wrong?

GeoViews saving inline HTML file is very large

I have created a geo-dataframe using a combination of geopandas and geoviews. The libraries I'm using are below:
import pandas as pd
import numpy as np
import geopandas as gpd
import holoviews as hv
import geoviews as gv
import matplotlib.pyplot as plt
import matplotlib
import panel as pn
from cartopy import crs
gv.extension('bokeh')
I have concatenated 3 shapefiles to build a polygon picture of UK healthcare boundaries (links to files provided if needed). Unfortunately, from what I have found, the UK doesn't produce one file that combines all of those, so I have had to merge the shapefiles from the 3 individual countries I'm interested in. The 3 shapefiles have the following sizes:
shape file 1 = 5mb (https://www.opendatani.gov.uk/dataset/department-of-health-trust-boundaries)
shape file 2 = 204kb (https://geoportal.statistics.gov.uk/datasets/5252644ec26e4bffadf9d3661eef4826_4)
shape file 3 = 22kb (https://data.gov.uk/dataset/31ab16a2-22da-40d5-b5f0-625bafd76389/local-health-boards-december-2016-ultra-generalised-clipped-boundaries-in-wales)
I have merged them all successfully to build the picture I am looking for using:
Test = gv.Polygons(Merged_Shapes, vdims=[('Data'), ('CCG_Name')], crs=crs.OSGB()).options(tools=['hover'], width=550, height=700)
Test_2 = gv.Polygons(Merged_Shapes, vdims=[('Data'), ('CCG_Name')], crs=crs.OSGB()).options(tools=['hover'], width=550, height=700)
However, I would like to include these charts in a shareable HTML file. The issue I'm running into is that when I save the HTML using:
from bokeh.resources import INLINE
layout = hv.Layout(Test + Test_2)
Final_report = pn.Tabs(('Test',layout)).save('Map_test.html', resources=INLINE)
I get an HTML file that displays the charts, but its size is 80 MB, which is far too large, especially if I want to include more polygon charts and other charts in the same HTML.
Does anyone know of a more space-efficient way to store my polygon charts within an HTML file for sharing?

You can make the file smaller by rasterizing or by decimating the shapes. For rasterizing you can call hv.operation.datashader.rasterize(obj), and I think there is something in Shapely or GeoPandas for simplifying the shapes.
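To illustrate what decimating a shape means, here is a small pure-Python sketch of the Ramer-Douglas-Peucker idea: drop every vertex that lies within a tolerance of the straight line between the vertices that are kept. This is a stand-in written for illustration, not the code the libraries run; the function names and the sample coordinates are made up.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (all points are (x, y) tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:  # degenerate segment, e.g. a closed ring
        return math.hypot(px - ax, py - ay)
    # Projection of p onto the line, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Return a subset of points that deviates at most `tolerance` from the input."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [first, last]  # everything in between is close enough to drop
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right  # the split vertex appears in both halves


# Hypothetical boundary fragment with a few nearly collinear vertices.
line = [(0, 0), (1, 0.05), (2, -0.04), (3, 0.02), (4, 0), (4, 3)]
print(simplify(line, 0.1))
```

In practice you would not write this yourself: GeoPandas offers GeoSeries.simplify(tolerance), with the tolerance given in the units of the coordinate reference system. Fewer vertices per polygon means less data embedded in the saved HTML.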

Prim's algorithm on existing graph

I plotted a graph from a text input file, and now I have to apply Prim's algorithm to it. How can I do that? Below is my code for generating the graph from the text file:
import matplotlib.pyplot as plt
import networkx as nx
f= open('input10.txt')
G=nx.Graph()
x=f.read()
x=x.split()
y=[float(i) for i in x]
for i in range(1,30,3):
    G.add_node(y[i],pos=(y[i+1],y[i+2]))
def last_index(y):
    return len(y)-1
z=last_index(y)
for i in range(31,z-3,5):
    G.add_edge(y[i],y[i+1],weight=(y[i+2]))
pos=nx.get_node_attributes(G,'pos')
weight=nx.get_edge_attributes(G,'weight')
plt.figure()
nx.draw(G,pos)

Take the nodes u=y[i], v=y[i+1] and weight=y[i+2] of each edge, build an adjacency matrix or adjacency list of the graph, and then apply Prim's algorithm to it. You can find a good and easy tutorial here: Prim's Minimum Spanning Tree
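The adjacency-list variant could look roughly like the sketch below, which keeps the candidate edges in a heap. The function names and the small sample edge list are assumptions for illustration; in the question's setting the (u, v, weight) triples would come from y as described above.

```python
import heapq


def build_adjacency(edges):
    """Turn (u, v, weight) triples into an undirected adjacency list."""
    adjacency = {}
    for u, v, w in edges:
        adjacency.setdefault(u, []).append((v, w))
        adjacency.setdefault(v, []).append((u, w))
    return adjacency


def prim_mst(adjacency, start):
    """Prim's algorithm: grow the tree from `start`, always taking the
    cheapest edge that leads to a node not yet in the tree."""
    visited = {start}
    frontier = [(w, start, v) for v, w in adjacency[start]]
    heapq.heapify(frontier)
    mst = []
    while frontier and len(visited) < len(adjacency):
        w, u, v = heapq.heappop(frontier)
        if v in visited:
            continue  # stale entry, both ends are already in the tree
        visited.add(v)
        mst.append((u, v, w))
        for nxt, w2 in adjacency[v]:
            if nxt not in visited:
                heapq.heappush(frontier, (w2, v, nxt))
    return mst


# Hypothetical sample graph with float node labels, as in the question.
edges = [(1.0, 2.0, 4.0), (1.0, 3.0, 1.0), (2.0, 3.0, 2.0),
         (2.0, 4.0, 5.0), (3.0, 4.0, 8.0)]
mst = prim_mst(build_adjacency(edges), start=1.0)
print(mst)
```

If the graph is already a networkx Graph with a weight attribute, nx.minimum_spanning_tree(G, algorithm="prim") should give the same kind of result without hand-written code.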

Matplotlib namespace issues?

I have a question regarding the Matplotlib.pyplot and namespaces.
See the following code:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.pyplot import cm
x=np.linspace(0,1,28)
color=iter(cm.gist_rainbow_r(np.linspace(0,1,28)))
plt.clf()
for s in range(28):
    c=next(color)
    plt.plot(x,x*s, c=c)
plt.show()
The idea was to have the plots in different colors of the rainbow map.
Now what happens is that it works on the first execution, but then things get weird.
After several consecutive executions the colormap stops being used and the default colors are used instead.
I suspect that the problem lies in the "c=c" in the plot call, but I have played around with different names ("c", "color", ...) and could not find a pattern.
Can someone reproduce the problem (run the code at least 5 times or so consecutively) and explain what is going on here?
Thanks

This is a known issue with mpl on Python 3.4+ that has been fixed in mpl v1.5+.
Many of the style parameters have multiple aliases (e.g. 'c' vs 'color'), which mpl was not merging properly, so the artists were essentially being told two different colors; internally this means there is a dictionary with both 'c' and 'color' in it.
In Python 3.4+ the order of dictionary iteration varies from process to process by default, because the seed for the underlying hash table is randomized (this was done to prevent a possible DoS attack based on intentional hash table collisions). In older versions of Python it so happened that the user-supplied color always came later in the iteration order, so things coincidentally worked.
The simple workaround (iirc) is to use plot(x, y, color=c) or to update to mpl 1.5.1.
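Applied to the code from the question, the workaround might look like the sketch below. It spells out the full keyword color instead of the alias c and takes the colors directly from the colormap array. The Agg backend line is an assumption added so the snippet also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to get a window
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.pyplot import cm

x = np.linspace(0, 1, 28)
colors = cm.gist_rainbow_r(np.linspace(0, 1, 28))  # one RGBA row per line

fig, ax = plt.subplots()
lines = []
for s, c in enumerate(colors):
    # Use the full keyword "color" rather than the alias "c".
    lines.append(ax.plot(x, x * s, color=c)[0])
```

With an interactive backend, calling plt.show() afterwards displays the figure.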
