Appropriate data structure for storing data read from a large text file


I need to read data from a very large (~a million entries) text file and am trying to decide which data structure is most appropriate. Each entry in the file contains two integers that represent an edge in a directed graph (the tail and the head vertices), and the vast majority of vertices have at least one outgoing edge. My "naive" solution is to use a vector of vectors, so if the tail vertex was 1 and the head vertex was 2 I'd just do something like graph[1].push_back(2) to read in the entry "1 2". Once the graph is read in I'll be using Kosaraju's algorithm to compute the strongly-connected components, so I figure it will be handy to be able to access each element via the [] operator in constant time.
What are the "typical" choices in terms of data structures in a situation like this? Also, assuming the vector of vectors idea is a bad one, why is it bad? I guess the fact that the vector will need to resize itself will slow things down, but the number of edges/vertices isn't known until runtime, so I'm not sure of a way around that.
Thanks

Do you know the number of vertices?
A vector of vectors isn't as bad an idea as you think, because you can resize the outer vector before reading the edges. That way, copying of the whole graph is prevented.
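A minimal C++ sketch of that idea, assuming each line holds two non-negative integer vertex ids separated by whitespace (`Graph` and `readEdges` are illustrative names, not from the question). If the vertex count is known, passing it sizes the outer vector once; otherwise the outer vector grows only when a larger id appears. Since C++17, `std::vector`'s move constructor is required to be `noexcept`, so reallocating the outer vector moves the inner vectors rather than copying them.

```cpp
#include <algorithm>
#include <istream>
#include <sstream>  // std::istringstream, handy for testing without a file
#include <vector>

// Adjacency list: graph[tail] holds every head reachable by one edge from tail.
using Graph = std::vector<std::vector<int>>;

// Reads "tail head" pairs until the stream is exhausted.
// numVertices: pass the vertex count if it is known, to size the outer vector once.
Graph readEdges(std::istream& in, int numVertices = 0) {
    Graph graph(numVertices);
    int tail = 0, head = 0;
    while (in >> tail >> head) {
        const int needed = std::max(tail, head) + 1;
        if (needed > static_cast<int>(graph.size())) {
            graph.resize(needed);  // only when an id beyond the current size appears
        }
        graph[tail].push_back(head);
    }
    return graph;
}
```

With a real file, an `std::ifstream` opened on the edge file can be passed in place of the `std::istringstream`.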
As far as I know, a vector of vectors is a good structure for a graph. It is often used in computer science olympiads.
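Since the question mentions Kosaraju's algorithm, here is a hedged C++ sketch of it running directly on the vector-of-vectors representation. It uses explicit stacks instead of recursion, on the assumption that a recursive DFS over about a million edges could overflow the call stack on long paths; the function name `kosaraju` and the component numbering are illustrative choices.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using Graph = std::vector<std::vector<int>>;

// Returns comp[v] = index of the strongly connected component containing v.
std::vector<int> kosaraju(const Graph& g) {
    const int n = static_cast<int>(g.size());

    // Pass 1: iterative DFS on g, recording vertices in order of finish time.
    std::vector<int> order;
    order.reserve(n);
    std::vector<bool> visited(n, false);
    for (int s = 0; s < n; ++s) {
        if (visited[s]) continue;
        visited[s] = true;
        // Each frame is (vertex, index of the next outgoing edge to examine).
        std::vector<std::pair<int, std::size_t>> stack{{s, 0}};
        while (!stack.empty()) {
            const int v = stack.back().first;
            std::size_t& next = stack.back().second;
            if (next < g[v].size()) {
                const int w = g[v][next++];
                if (!visited[w]) {
                    visited[w] = true;
                    stack.push_back({w, 0});  // `next` may dangle now, but is not used again
                }
            } else {
                order.push_back(v);  // all successors done: v is finished
                stack.pop_back();
            }
        }
    }

    // Build the transpose graph (every edge reversed).
    Graph rev(n);
    for (int v = 0; v < n; ++v)
        for (int w : g[v]) rev[w].push_back(v);

    // Pass 2: DFS on the transpose in decreasing finish time; each tree is one SCC.
    std::vector<int> comp(n, -1);
    int c = 0;
    for (int k = n - 1; k >= 0; --k) {
        const int s = order[k];
        if (comp[s] != -1) continue;
        comp[s] = c;
        std::vector<int> stack{s};
        while (!stack.empty()) {
            const int v = stack.back();
            stack.pop_back();
            for (int w : rev[v]) {
                if (comp[w] == -1) {
                    comp[w] = c;
                    stack.push_back(w);
                }
            }
        }
        ++c;
    }
    return comp;
}
```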

Related

Basics of face Sculpting in Blender

I mean, the basics.
1) I have seen in online videos that they model a character (or anything) from one object only: they extrude, loop cut, scale, etc. to model the character. Why don't they design different objects separately (like the hands, legs and body separately) and then join them together into one object?
2) What does the texturing department need to see so that they don't send the model back to the modelling department? I mean, for example, that the meshes (polygons) over the model's face must be quads, not triangles, etc., while modelling a character.
What basics should I know? Is there a checklist, or are there any basics I should check before modelling a character?
Please correct me if I am wrong, and answer both of my questions. Thanks

It may be common, but it definitely isn't mandatory to have a model as one solid mesh. Some models have the parts of the body underneath clothing removed to reduce the poly count. How the model is to be used will be a big factor in how you model it: for a single image it is easy to get away with multiple parts, while a character that will be animated in a cartoony style could be stretched and distorted in ways that show holes in a model made of multiple pieces. When working in a team, there may be rules in place determining whether a solid or multi-part model is considered acceptable.
An example of an animated model made from multiple parts is Sintel, the main character in the Sintel short animation.
There is nothing stopping you from making a library of separate body parts and joining them together when you make your model. Be aware that this can bring complications: if you model an arm with 12 verts and then make your hand with 15, you have to fiddle around to merge them together.
You will also find some extra freedom to work with multiple body parts during the sculpting phase as you are creating a high density mesh that is used as a template to model a clean mesh over. This step is called retopology.
It is more likely that the rigging department will send a model back for fixing than the texturing department. When adding a rig and deforming the mesh in different ways, any parts that deform badly will be revealed and need fixing.

[...] (like hands separately, legs separately, body separate and then join them together and make one object) [...]
Some modelers I know do precisely this and they do it in a way where they block in the design using broad primitive shapes, start slicing some edge loops and add broad details, then merge everything together, then sculpt it a bit further with high-res sculpting tools, and finally retopologize everything.
The main modelers I know who do this, however, model in a way that tries to adhere as close as possible to the concept artist's illustration. They're not creating their own models from scratch but are instead given top/front/back/side illustrations of a character, for example, and are just trying to match it as closely as possible.
When you start modeling everything in small pieces, it helps to have that concept illustration since you can get lost in the topology otherwise and fusing organic meshes together can be difficult to do in a clean way.
[...] why don't they design different objects separately? [...]
Again they sometimes do, but one of the appeals of creating organic meshes by keeping it seamless the entire time is that you can start to focus on how edge loops propagate across the entire model. It helps to know that the base of a finger is a hexagon, for example, in figuring out how to cleanly propagate and terminate the edge loops for a hand, and likewise have a strategy for the hand to cleanly propagate and terminate edge loops as it joins into the forearm.
It can be hard to get the topology to match up cleanly if you designed everything in small pieces and then had to figure out how to merge it all together. Polygonal modeling is very topology-oriented. It tends to require as much thinking about the wireframe and edge flows as it does the shape of the model, since it needs to be a certain way for everything to subdivide cleanly and smoothly and animate predictably with subdivision surfaces.
I used to work with developers who took one glance at the topology-dominated workflow of polygonal modeling and immediately wanted to seek alternatives, like voxel sculpting. With voxels you can potentially model everything in pieces and fuse it all together in a nice, smooth, organic way without thinking about topology whatsoever.
However, that loses sight of the key appeal of polygonal meshes. Their wire flow forms a control lattice with a very finite number of control points for the artist to animate and move around to predictably control the shape of their model. You immediately lose that with a voxel representation: while voxels free the artist from thinking about how the topology works and how the wireframe flows through the model, they also lose all the control benefits that come with it. So people who use voxel sculpting often end up meticulously retopologizing everything at the end anyway, to regain the coarse and predictable control they have with polygonal meshes.
I mean like the meshes(polygons) over the model face must be quad, etc not triangle. while modelling a character..
This is all in the context of subdivision surfaces, the most popular of which are variants of Catmull-Clark. That scheme favors quads to get the most predictable subdivision. It's much easier for the artist to predict how everything will look and deform if they favor, as much as possible, uniform grids of quadrangles wrapped around their model, with 4-valence vertices and every polygon having 4 points. Only where they need to "join" these quad grids together might they create some funky topology: a 5-valence vertex here, a 3-valence vertex there, a 5-sided polygon here, a triangle there. Those cases tend to deform a bit unpredictably (or at least unintuitively), so artists try to avoid them as much as possible.
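To make the quad/valence vocabulary concrete, here is a small C++ sketch (not a Blender feature; `checkQuadTopology` and `TopologyReport` are hypothetical names) that counts non-quad faces and lists poles, i.e. vertices whose valence is not 4. It assumes a closed mesh given as lists of vertex indices per face; on an open boundary a regular vertex has valence 3, so boundaries would need separate handling.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

using Face = std::vector<int>;  // vertex indices in order around the face

struct TopologyReport {
    int nonQuadFaces = 0;    // triangles, n-gons
    std::vector<int> poles;  // vertices whose valence is not 4
};

// Assumes a closed mesh (no boundary edges), where a regular vertex has valence 4.
TopologyReport checkQuadTopology(const std::vector<Face>& faces, int numVertices) {
    TopologyReport report;
    std::set<std::pair<int, int>> edges;  // each undirected edge stored once
    for (const Face& f : faces) {
        if (f.size() != 4) ++report.nonQuadFaces;
        for (std::size_t i = 0; i < f.size(); ++i) {
            const int a = f[i], b = f[(i + 1) % f.size()];
            edges.insert({std::min(a, b), std::max(a, b)});
        }
    }
    // Valence = number of distinct edges touching the vertex.
    std::vector<int> valence(numVertices, 0);
    for (const auto& e : edges) {
        ++valence[e.first];
        ++valence[e.second];
    }
    for (int v = 0; v < numVertices; ++v)
        if (valence[v] != 4) report.poles.push_back(v);
    return report;
}
```

For example, a plain cube is all quads, yet every one of its 8 corners is a 3-valence pole.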
Because when artists model polygonal meshes this way, they are not just trying to create a statue with a nice shape. If that's all they wanted, they'd save themselves a lot of grief by not dealing with individual vertices/edges/polygons in the first place and using something like Sculptris instead. Instead they are designing not only shapes but also a control lattice, a wire flow and a set of control points they can easily move around in the future to get predictable behavior out of their control cage. They're basically designing controls, almost an "interactive GUI/rig" for themselves, with how they design the topology.
2) Like What the texturing department has to see so that they should not return the model back to the modelling department.
Generally, how a mesh is modeled in a direct sense shouldn't affect the texture department's work much at all if they're working with UV maps and painting textures over them (at that point it doesn't really matter whether a model has clean wire flows or not, since all the texture artists do is paint images over the 2D UV map or directly onto the 3D model).
However, if the modeler does the UV mapping, then regardless of whether he uses quad meshes and clean wire flows or not, if the UV mapping is poor, then the resulting texture images will look all distorted. So the UV maps need to be made well with minimal distortion, though that's usually easy to do automatically these days.
The other exception is if the department doesn't use UV maps and instead uses, say, PTex from Disney. PTex really favors quads. In the original paper at least, it only worked with quads.

Automatic feature extraction from chess board positions

I am working on a project where I take a chess board position (a FEN string converted to binary) and its evaluation score and feed them to a neural network. My aim is to make the neural network differentiate between good and bad positions.
How I encode the position: there are 12 unique pieces in chess, i.e. pawn, rook, knight, bishop, queen and king for both white and black. I encode each piece using 4 bits, with 0000 denoting an empty square. So the 64 squares are encoded into 256 bits, and I use 6 more bits to denote game state, like whose turn it is to move, castling status, etc.
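A C++ sketch of the board part of that encoding, assuming a hypothetical code table (1 to 6 for white P, N, B, R, Q, K, 7 to 12 for the black pieces, 0 for an empty square). Only the piece-placement field of the FEN is handled; the extra game-state bits are left out.

```cpp
#include <bitset>
#include <cctype>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical code table: 1..6 = white P N B R Q K, 7..12 = black p n b r q k.
int pieceCode(char c) {
    const std::string order = "PNBRQKpnbrqk";
    const std::size_t pos = order.find(c);
    if (pos == std::string::npos) throw std::invalid_argument("unknown piece letter");
    return static_cast<int>(pos) + 1;
}

// Encodes the piece-placement field of a FEN as 64 squares x 4 bits = 256 bits.
// Square 0 is a8 and square 63 is h1, following FEN's rank 8 -> rank 1 order.
std::bitset<256> encodeBoard(const std::string& fen) {
    std::istringstream fields(fen);
    std::string placement;
    fields >> placement;     // first space-separated FEN field
    std::bitset<256> bits;   // all zero: empty squares stay 0000
    int square = 0;
    for (char c : placement) {
        if (c == '/') continue;  // rank separator
        if (std::isdigit(static_cast<unsigned char>(c))) {
            square += c - '0';   // run of empty squares
            continue;
        }
        if (square >= 64) throw std::invalid_argument("too many squares in FEN");
        const int code = pieceCode(c);
        for (int b = 0; b < 4; ++b)
            bits[square * 4 + b] = ((code >> b) & 1) != 0;  // least significant bit first
        ++square;
    }
    if (square != 64) throw std::invalid_argument("FEN board does not cover 64 squares");
    return bits;
}
```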
Problem: since the input space for chess positions is neither smooth nor unimodal (one small change in the board position can result in a huge change in the evaluation score), the neural network doesn't learn well. The next logical step is to somehow extract useful features (like material difference, center control, etc.) and feed them to the network.
I do not want to hand pick the features as I want the network to learn everything by itself. Therefore I am thinking of extracting features automatically using autoencoders. Is there any better way to accomplish this?
Summary : What is the best way to automatically extract features from a chess board position so that it can be fed into a neural network?
UPDATE: To generate training data, I have modified Stockfish to dump its evaluation process into a log file. So every new move (position) it considers is written to a file as a FEN string along with its eval score.

Neural networks can approximate any function. The main consideration is the dimensionality of the search space, which determines how much data you need to get a good approximation.
For a supervised network (you use autoencoders, so I think you use some variant of backpropagation), it's difficult for me to imagine how you plan to do the training using single positions, because you need similar positions in your training set. Maybe your approach is different, but I'm convinced that the second strategy (using features) is more promising. I think using raw positions requires a huge amount of training data to get good results.
For features, take a look here, and at the classic work of Shannon.
I also took useful information from the source code of Crafty.
But you have to extract this information from the FEN string.
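As an example of extracting one classic hand-crafted feature from a FEN string, here is a C++ sketch of material balance using the conventional 1/3/3/5/9 pawn-unit piece values. The function name and the choice of values are assumptions for illustration, not taken from Crafty or Shannon.

```cpp
#include <cctype>
#include <sstream>
#include <string>

// White material minus black material, in pawn units, from a FEN string.
// Uses the conventional values P=1, N=3, B=3, R=5, Q=9; kings are not counted.
int materialBalance(const std::string& fen) {
    std::istringstream fields(fen);
    std::string placement;
    fields >> placement;  // piece-placement field only
    int balance = 0;
    for (char c : placement) {
        const unsigned char uc = static_cast<unsigned char>(c);
        int value = 0;
        switch (std::tolower(uc)) {
            case 'p': value = 1; break;
            case 'n': case 'b': value = 3; break;
            case 'r': value = 5; break;
            case 'q': value = 9; break;
            default: value = 0;  // kings, digits and '/'
        }
        balance += std::isupper(uc) ? value : -value;  // uppercase letters are white
    }
    return balance;
}
```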
Autoencoders are a way to reduce the data (good, because it improves performance). Principal Component Analysis seems to work better, as reported here.
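For the PCA route, a minimal C++ sketch that finds only the first principal component, by power iteration on the sample covariance matrix. It assumes a dense matrix with at least two samples and a starting vector that is not orthogonal to the top eigenvector. For real feature sets, a linear algebra library would be the practical choice.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// First principal component of `data` (rows = samples, columns = features).
std::vector<double> firstPrincipalComponent(const Matrix& data, int iterations = 100) {
    const std::size_t n = data.size(), d = data[0].size();
    // Mean of each feature column.
    std::vector<double> mean(d, 0.0);
    for (const auto& row : data)
        for (std::size_t j = 0; j < d; ++j) mean[j] += row[j] / n;
    // Sample covariance of the centered data.
    Matrix cov(d, std::vector<double>(d, 0.0));
    for (const auto& row : data)
        for (std::size_t i = 0; i < d; ++i)
            for (std::size_t j = 0; j < d; ++j)
                cov[i][j] += (row[i] - mean[i]) * (row[j] - mean[j]) / (n - 1);
    // Power iteration: v <- C v / |C v| converges to the top eigenvector.
    std::vector<double> v(d, 1.0);
    for (int it = 0; it < iterations; ++it) {
        std::vector<double> w(d, 0.0);
        for (std::size_t i = 0; i < d; ++i)
            for (std::size_t j = 0; j < d; ++j) w[i] += cov[i][j] * v[j];
        double norm = 0.0;
        for (double x : w) norm += x * x;
        norm = std::sqrt(norm);
        if (norm == 0.0) break;  // no variance in the data
        for (std::size_t i = 0; i < d; ++i) v[i] = w[i] / norm;
    }
    return v;
}
```

Projecting each centered sample onto this vector gives a one-number summary. Further components would need deflation or a proper eigen-solver.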
I hope this can help you.

Titan vertex centric indices vs Neo4j labels

I was trying to make a comparison between these two technologies when approaching this, and I was wondering if any of you already have experience with either or both of them.
I am mainly interested in performance numbers when dealing with similar use cases.

The difference between the two concepts is the difference between global and local indexing.
As I understand it, Neo4j vertex labels allow you to break up your index space by "categories" of vertices. In this way, an O(log(|V|)) lookup becomes an O(log(|V|/c)) lookup, where c is the number of categories/labels you have over your vertex set (the equation assumes an equal number of vertices in each category). As such, vertex labels aid global index calls, as these are a function of V.
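A toy C++ model of the label-partitioned lookup described above. It is not Neo4j's actual implementation; `LabelPartitionedIndex` is a hypothetical name. Each label gets its own ordered index, so a lookup searches only that label's vertices.

```cpp
#include <map>
#include <string>
#include <unordered_map>

class LabelPartitionedIndex {
public:
    void add(const std::string& label, const std::string& key, int vertexId) {
        byLabel_[label][key] = vertexId;
    }
    // Average O(1) to pick the partition, then O(log(|V_label|)) inside it,
    // instead of O(log(|V|)) in one index over all vertices. Returns -1 if absent.
    int find(const std::string& label, const std::string& key) const {
        auto part = byLabel_.find(label);
        if (part == byLabel_.end()) return -1;
        auto it = part->second.find(key);
        return it == part->second.end() ? -1 : it->second;
    }

private:
    std::unordered_map<std::string, std::map<std::string, int>> byLabel_;
};
```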
Next, Titan's vertex-centric indices sort and index the incident edges of a vertex. The cost to find a particular edge by its label/properties incident to a vertex is O(log(inc(v))), where inc(v) is the size of the incident edge set to vertex v. As such, vertex-centric indices are local indices as this is a function of v.
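A toy C++ model of a vertex-centric (local) index, for contrast. It is illustrative only, not Titan's storage format, and `sortKey` stands in for a hypothetical indexed edge property such as a timestamp. Each vertex keeps its incident edges sorted by (label, sort key), so finding matching edges is a binary search costing O(log(inc(v))) rather than a scan of all inc(v) edges. Insertion here is linear for simplicity.

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

struct Edge {
    std::string label;
    int sortKey;  // hypothetical indexed edge property, e.g. a timestamp
    int target;   // id of the vertex at the other end
};

class Vertex {
public:
    // Keeps the incident edges sorted by (label, sortKey).
    void addEdge(Edge e) {
        auto pos = std::lower_bound(edges_.begin(), edges_.end(), e, byLabelThenKey);
        edges_.insert(pos, std::move(e));
    }

    // Edges with `label` and lo <= sortKey <= hi: binary search to the first match,
    // then walk forward while edges still match.
    std::vector<Edge> edgesBetween(const std::string& label, int lo, int hi) const {
        const Edge probe{label, lo, 0};
        auto it = std::lower_bound(edges_.begin(), edges_.end(), probe, byLabelThenKey);
        std::vector<Edge> out;
        for (; it != edges_.end() && it->label == label && it->sortKey <= hi; ++it)
            out.push_back(*it);
        return out;
    }

private:
    static bool byLabelThenKey(const Edge& a, const Edge& b) {
        return std::tie(a.label, a.sortKey) < std::tie(b.label, b.sortKey);
    }
    std::vector<Edge> edges_;  // the incident edges of this vertex
};
```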
As I understand it, Neo4j does not support vertex-centric indices. You currently see this concept in Titan, OrientDB, and TinkerGraph (…and RDF stores sort in this manner as well, via spog pairings). Next, all known graph databases support global indices, though (I believe) only Neo4j and OrientDB support a vertex set partition via the concept of a label.
Again, assuming my understanding of vertex labels in Neo4j is correct, we are talking about two different use cases: global vs. local indexing. From the perspective of the supernode problem, global indices do not quell the issue of traversing through a large vertex, while this is the sole purpose of the local vertex-centric indices.
You can read about the supernode problem and vertex-centric indices here:
http://thinkaurelius.com/2012/10/25/a-solution-to-the-supernode-problem/

Agreeing with everything Marko said, one could take it further and argue that in the graph database world local indexes can (and even should) substitute global ones. In my opinion, the single greatest advantage of a graph data model is that it lets you encode your data model into the graph topology, gaining qualitative advantages in terms of flexibility, ease of evolution and performance. With this in mind, I'd argue that labels in Neo4j actually detract from all this; reifying a label into a node with adjacent edges pointing to the source having that label is much more in line with the "schema is the graph" philosophy.
Of course, if your engine lacks local indexes we are back at the supernode problem. But if you do have them (something which I'd say should be a requirement for something to be called a graph database), you can easily transform your label into a node L, and create relationships pointing to that node for those vertices which you want labeled with L:
v -[L]-> L
meaning that v has label L. Now if you want this in Titan to behave like a Neo4j label, just make the -[L]-> relation "manyToOne" (see Titan cardinality constraints) and create a vertex-centric index. This pattern gives you everything you could get with labels and much more; you can:
effectively use this as a namespace for properties relating to that label
sort your elements inside one label
nest labels easily without losing performance (just use a composite key)
separate the declaration of a label L with how elements labeled with it are accessed

Labels may afford some design patterns that improve performance by de-densifying the graph. For example: they eliminate the need for type nodes, which can often get quite dense. Labels can optionally be associated with a unique index. Here, the ability to index a property isn't new, but the ability to constrain it uniquely is. If you were previously doing work in your application, you may experience some performance gains by letting the database handle this. (It's certainly much more convenient to do so.) Finally, if you don't assign a unique index to a label, it will still be indexed, in order to help performance for certain kinds of queries (e.g. "give me all of the nodes having label ")
All that said, while labels may help with performance in certain cases, they were introduced more with ease-of-use in mind. We're just getting started with Neo4j 2.1, which specifically addresses dense node performance (something I know you've been waiting for), along with other performance & scalability improvements... including removing (for all practical purposes eliminating) the upper size limits.
Philip

Computational complexity and shape nesting

I have arbitrary SVG paths which I need to pack as efficiently as possible within a given rectangle (with as little wasted space as possible). After some research I found the bin packing algorithms, which seem to deal with boxes rather than curved, random shapes (my SVG shapes are quite complex and include Beziers, etc.).
AFAIK, there is no deterministic algorithm for actually packing abstract shapes.
I wish to be proven wrong here, which would be ideal (having a deterministic mathematical method for packing them). If I am right, however, and there is none, what would be the best approach to this problem?
The subject name is Shape Nesting, Nesting Problem or Nesting Process.
In Shape Nesting there is no single/uniform algorithm or mathematical method for nesting shapes and getting the least space waste possible.
The 1st method is the packing algorithm (it creates an imaginary bounding box for each shape and uses a rectangular 2D algorithm to pack the bounding boxes).
This method is fast but the least efficient with regard to wasted space.
The 2nd method is some kind of incremental rotation. The algorithm rotates the shape in incremental steps and checks whether it fits in the space. This is better than the packing method with regard to wasted space, but it is painstakingly slow.
What are some other classroom examples for achieving a solution to this problem?
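The first method needs a bounding box per shape. For SVG paths with cubic Beziers, the box of the control points is only an upper bound. A tight box comes from the segment's endpoints plus the points where dx/dt or dy/dt is zero, which means solving one quadratic per axis. Here is a C++ sketch for one cubic segment; the path's box is the union over its segments, and the type and function names are illustrative.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Point { double x, y; };
struct Box { double minX, minY, maxX, maxY; };

// One coordinate of B(t) = (1-t)^3 p0 + 3(1-t)^2 t p1 + 3(1-t) t^2 p2 + t^3 p3.
double bezier1D(double p0, double p1, double p2, double p3, double t) {
    const double u = 1.0 - t;
    return u * u * u * p0 + 3 * u * u * t * p1 + 3 * u * t * t * p2 + t * t * t * p3;
}

// Parameters t in (0,1) where the derivative a t^2 + b t + c of one coordinate is zero.
std::vector<double> extremaParams(double p0, double p1, double p2, double p3) {
    const double a = 3 * (-p0 + 3 * p1 - 3 * p2 + p3);
    const double b = 6 * (p0 - 2 * p1 + p2);
    const double c = 3 * (p1 - p0);
    const double eps = 1e-12;
    std::vector<double> roots;
    if (std::fabs(a) < eps) {  // derivative is linear
        if (std::fabs(b) > eps) roots.push_back(-c / b);
    } else {
        const double disc = b * b - 4 * a * c;
        if (disc >= 0) {
            const double s = std::sqrt(disc);
            roots.push_back((-b + s) / (2 * a));
            roots.push_back((-b - s) / (2 * a));
        }
    }
    std::vector<double> inside;
    for (double t : roots)
        if (t > 0 && t < 1) inside.push_back(t);
    return inside;
}

// Tight axis-aligned bounding box of one cubic segment.
Box cubicBounds(Point p0, Point p1, Point p2, Point p3) {
    Box box{std::min(p0.x, p3.x), std::min(p0.y, p3.y),
            std::max(p0.x, p3.x), std::max(p0.y, p3.y)};
    for (double t : extremaParams(p0.x, p1.x, p2.x, p3.x)) {
        const double x = bezier1D(p0.x, p1.x, p2.x, p3.x, t);
        box.minX = std::min(box.minX, x);
        box.maxX = std::max(box.maxX, x);
    }
    for (double t : extremaParams(p0.y, p1.y, p2.y, p3.y)) {
        const double y = bezier1D(p0.y, p1.y, p2.y, p3.y, t);
        box.minY = std::min(box.minY, y);
        box.maxY = std::max(box.maxY, y);
    }
    return box;
}
```

For the arch with control points (0,0), (0,1), (1,1), (1,0), the control-point box reaches y = 1, but the curve itself peaks at y = 0.75.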

[Edit1] new answer
As mentioned before, bin packing is NP-complete (hard), so forget about an algebraic solution.
Known approaches are:
generate and test
Either test all possible arrangements and remember the best solution, or add items incrementally (not all at once), one by one, in the same way. This is basically what you are doing now; without a proper heuristic it is unusably slow, but it has the best space efficiency (the exhaustive variant is much better but much slower): O(N!).
take advantage of sorting items by size
Something like this is much faster, almost O(N·log(N)) (depending on the sorting algorithm used). Space efficiency strongly depends on the range of item sizes and on the item count. For rectangular shapes this is the best approach (fastest, and usable even for N>1000). For complex shapes it is not a good way, but look at it anyway; maybe you will get some ideas...
use a neural network
This is an extremely vague approach without any guarantee of a solution, but with possibly the best space-efficiency/runtime ratio.
I think there could be some field approach out there
I saw a few used for generating graph layouts. All items create fields (both attractive and repulsive), so they move toward a semi-stable state.
At first all items are at random locations.
When the movement stops, remember the best solution and shake all items a little or randomize their positions again.
Repeat this a few times.
This approach is much faster than generate and test and can provide a solution very close to it, but it can get stuck in a local min/max or oscillate if the fields are not well chosen. For example, all items can have a constant attractive force toward each other and a repulsive force that gets stronger only when the items are very close. You have to prevent items from overlapping (either by stronger repulsion or by collision tests). You also have to create some rotational moment, for example with that repulsive force: it differs at each vertex, so it creates a rotational moment (which can automatically align similar sides closer together). You can also have a semi-stable state with big distances between items and, after finding the best solution, just turn off the repulsion fields so they stick together. Sometimes this gives better results, sometimes not... here is a nice example of graph layout computation:
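A heavily simplified C++ sketch of the field idea, assuming each shape is approximated by its enclosing circle (the `Disc` type is hypothetical). Every disc takes a fixed-size step toward the common centroid (constant attraction), and overlapping pairs push apart in proportion to the overlap depth (short-range repulsion). Rotation, random restarts and shaking are left out.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Disc { double x, y, r; };  // a shape approximated by its enclosing circle

void relax(std::vector<Disc>& discs, int steps = 2000, double pull = 0.01) {
    const std::size_t n = discs.size();
    if (n == 0) return;
    for (int s = 0; s < steps; ++s) {
        double cx = 0, cy = 0;
        for (const Disc& d : discs) { cx += d.x; cy += d.y; }
        cx /= n;
        cy /= n;
        std::vector<double> fx(n, 0.0), fy(n, 0.0);
        for (std::size_t i = 0; i < n; ++i) {
            // Attraction: fixed-size step toward the centroid.
            const double dx = cx - discs[i].x, dy = cy - discs[i].y;
            const double len = std::hypot(dx, dy);
            if (len > 1e-9) { fx[i] += pull * dx / len; fy[i] += pull * dy / len; }
            // Repulsion: only between overlapping discs, proportional to the overlap.
            for (std::size_t j = 0; j < n; ++j) {
                if (i == j) continue;
                const double ex = discs[i].x - discs[j].x, ey = discs[i].y - discs[j].y;
                const double dist = std::hypot(ex, ey);
                const double overlap = discs[i].r + discs[j].r - dist;
                if (overlap > 0 && dist > 1e-9) {  // exactly coincident centres are not separated
                    fx[i] += 0.5 * overlap * ex / dist;
                    fy[i] += 0.5 * overlap * ey / dist;
                }
            }
        }
        for (std::size_t i = 0; i < n; ++i) { discs[i].x += fx[i]; discs[i].y += fy[i]; }
    }
}
```

The system settles where each disc's attraction step balances its repulsion, so a small residual overlap (about 2 × `pull` for a pair of discs) remains. That residual is one reason to switch repulsion to hard collision tests near the end, as suggested above.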
Logic to strategically place items in a container with minimum overlapping connections
Demo from the same QA
And here is a solver for placing sliders in 2D:
How to implement a constraint solver for 2-D geometry?
[Edit0] old answer before reformulating the question
It is not clear to me what you want to achieve.
1) You have an SVG picture and want to separate its parts into rectangular regions:
as filled as possible
with the least empty space in them
with no shape change in the picture
2) You have an SVG picture and want to change its shapes according to some purpose.
If this is the case, some additional info is needed.
[solution for 1]
create a list of points for the whole SVG in global SVG space (all points are transformed)
for a line you need to add 2 points
for rectangles, 4 points
for a circle/ellipse/Bezier/elliptical arc, 8 points
find local centres of mass
use the classical approach
or speed things up by computing the average density of points along the x and y axes separately, and after that just check every combination of the found positions of local density maxima to see whether they really are sub-cluster centers or not.
each sub-cluster center is the center of one of your regions
now find the farthest points that are still part of your cluster (they are close enough to neighbouring points)
create a rectangular area that covers all points of the sub-cluster.
you can also remove all used points from the list
repeat for all valid sub-clusters
until all points are used
Another, less precise but simpler approach is:
find the SVG size
create a planar map of the SVG with some precision, for example int map[256][256].
the map size can be constant or have the same aspect ratio as the SVG
clear the map with 0
for every point of the SVG, set the related map point to 1 (or increment it, or whatever)
now just segment the map and you will have found your objects
after segmentation you have the position and size of all objects
so finding the bounding boxes should be easy
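The segmentation step can be sketched in C++ as a flood fill over the occupancy map that returns one bounding box per connected group of marked cells. This assumes 4-connectivity and a map already filled from the SVG points (that rasterization is not shown); `segment` and `Region` are illustrative names.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

struct Region { int minX, minY, maxX, maxY; };  // bounding box in map cells

// Labels 4-connected groups of non-zero cells in `map` (indexed map[y][x])
// with an iterative flood fill and returns one bounding box per group.
std::vector<Region> segment(const std::vector<std::vector<int>>& map) {
    const int h = static_cast<int>(map.size());
    const int w = h ? static_cast<int>(map[0].size()) : 0;
    std::vector<std::vector<bool>> seen(h, std::vector<bool>(w, false));
    std::vector<Region> regions;
    const int dx[] = {1, -1, 0, 0}, dy[] = {0, 0, 1, -1};
    for (int y0 = 0; y0 < h; ++y0)
        for (int x0 = 0; x0 < w; ++x0) {
            if (map[y0][x0] == 0 || seen[y0][x0]) continue;
            Region r{x0, y0, x0, y0};
            std::vector<std::pair<int, int>> stack{{x0, y0}};
            seen[y0][x0] = true;
            while (!stack.empty()) {
                const int x = stack.back().first, y = stack.back().second;
                stack.pop_back();
                r.minX = std::min(r.minX, x); r.maxX = std::max(r.maxX, x);
                r.minY = std::min(r.minY, y); r.maxY = std::max(r.maxY, y);
                for (int k = 0; k < 4; ++k) {
                    const int nx = x + dx[k], ny = y + dy[k];
                    if (nx >= 0 && ny >= 0 && nx < w && ny < h &&
                        map[ny][nx] != 0 && !seen[ny][nx]) {
                        seen[ny][nx] = true;
                        stack.push_back({nx, ny});
                    }
                }
            }
            regions.push_back(r);
        }
    return regions;
}
```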

You can start with a variant of the rectangle bin-packing algorithm and add rotation. There is a method called "Guillotine bin packer", and you can download a paper and a library on GitHub.
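A minimal C++17 sketch of a guillotine packer with 90 degree rotation. It is a simplified illustration, not the library mentioned above, and `GuillotinePacker` is a hypothetical name. It keeps a list of free rectangles, places each item in the free rectangle that leaves the smallest leftover area (trying both orientations), and then cuts the remaining L-shaped space into two free rectangles.

```cpp
#include <initializer_list>
#include <optional>
#include <vector>

struct Rect { int x, y, w, h; };

class GuillotinePacker {
public:
    GuillotinePacker(int width, int height) : free_{{0, 0, width, height}} {}

    // Returns the placed rectangle (possibly rotated), or std::nullopt if nothing fits.
    std::optional<Rect> insert(int w, int h) {
        int best = -1;
        bool rotate = false;
        long long bestWaste = -1;
        for (int i = 0; i < static_cast<int>(free_.size()); ++i) {
            const Rect& f = free_[i];
            for (bool rot : {false, true}) {
                const int iw = rot ? h : w, ih = rot ? w : h;
                if (iw > f.w || ih > f.h) continue;
                const long long waste = 1LL * f.w * f.h - 1LL * iw * ih;
                if (best < 0 || waste < bestWaste) { best = i; rotate = rot; bestWaste = waste; }
            }
        }
        if (best < 0) return std::nullopt;
        const Rect f = free_[best];
        free_.erase(free_.begin() + best);
        const int iw = rotate ? h : w, ih = rotate ? w : h;
        const int leftW = f.w - iw, leftH = f.h - ih;
        // Guillotine cut: the larger leftover dimension gets the full-length strip.
        Rect right, top;
        if (leftW < leftH) {
            right = {f.x + iw, f.y, leftW, ih};   // beside the item only
            top = {f.x, f.y + ih, f.w, leftH};    // full-width strip above
        } else {
            right = {f.x + iw, f.y, leftW, f.h};  // full-height strip to the right
            top = {f.x, f.y + ih, iw, leftH};     // above the item only
        }
        if (right.w > 0 && right.h > 0) free_.push_back(right);
        if (top.w > 0 && top.h > 0) free_.push_back(top);
        return Rect{f.x, f.y, iw, ih};
    }

private:
    std::vector<Rect> free_;  // disjoint free rectangles inside the bin
};
```

For SVG shapes, the items would be the bounding boxes of each path, which inherits the space waste of the 1st method described in the question.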

Elegant representations of graphs in R^3

If I have a graph of a reasonable size (e.g. ~100 nodes, ~40 edges coming out of each node) and I want to represent it in R^3 (i.e. map each node to a point in R^3 and draw a straight line between any two nodes which are connected in the original graph) in a way which would make it easy to understand its structure, what do you think would make a good drawing criterion?
I know this question is ill-posed; it's not objective. The idea behind it is easier to understand with an extreme case. Suppose you have a connected graph in which each node connects to two and only two other nodes, except for two nodes which only connect to one other node. It's not difficult to see that this graph, when drawn in R^3, can be drawn as a straight line (with nodes sprinkled over the line). Nevertheless, it is possible to draw it in a way which makes it almost impossible to see its very simple structure, e.g. by "twisting" it as much as possible around some fixed point in R^3. So, for this simple case, it's clear that a simple 3D representation is that of a straight line. However, it is not clear what this simplicity property is in the general case.
So, the question is: how would you define this simplicity property?
I'm happy with any kind of answer, be it a definition of "simplicity" computable for graphs, or a greedy approximated algorithm which transforms graphs and that converges to "simpler" 3D representations.
Thanks!
EDITED
In the meantime, I've put the force-based graph drawing ideas suggested in the answer into practice and written an OCaml/OpenGL program to simulate how imposing an electrical repulsive force between nodes (Coulomb's law) and spring-like behaviour on edges (Hooke's law) would turn out. I've posted the video on YouTube. The video starts with an initial graph of 100 nodes, each with approximately 1-2 outgoing edges, and places the nodes randomly in 3D space. Then all the forces I mentioned are put into place and the system is left to move around subject to those forces. In the beginning, the graph is a mess and it's very difficult to see the structure. Closer to the end, it is clear that the graph is almost linear. I also have experience with larger graphs, but sometimes the geometry of the graph is just a mess and no matter how you plot it, you won't be able to visualise anything. And here is an even more extreme example with 500 nodes.

One simple approach is described, e.g., at http://en.wikipedia.org/wiki/Force-based_algorithms_%28graph_drawing%29 . The underlying notion of "simplicity" is something like "minimal potential energy", which doesn't really correspond to simplicity in any useful sense but might be good enough in practice.
(If you have 100 nodes of degree 40, I have some doubt as to whether any way of drawing them is going to reveal much in the way of human-accessible structure. That's a lot of edges. Still, good luck!)
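A minimal C++ sketch of the Coulomb-plus-Hooke model described in the question's edit. The asker's OCaml/OpenGL program is not reproduced here. The constants `kr`, `ks`, `rest`, `dt` and `maxStep` are arbitrary assumptions, and the per-step displacement cap is an added stabilizer, not part of the original description.

```cpp
#include <array>
#include <cmath>
#include <random>
#include <utility>
#include <vector>

using Vec3 = std::array<double, 3>;

// Force-directed layout in R^3: every pair of nodes repels with force kr / d^2,
// every edge acts as a spring with force ks * (d - rest).
std::vector<Vec3> layout3D(int n, const std::vector<std::pair<int, int>>& edges,
                           int steps = 500, double kr = 1.0, double ks = 0.1,
                           double rest = 1.0, double dt = 0.05, double maxStep = 0.5,
                           unsigned seed = 42) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> uni(-5.0, 5.0);
    std::vector<Vec3> pos(n);
    for (Vec3& p : pos) p = {uni(rng), uni(rng), uni(rng)};  // random start

    for (int s = 0; s < steps; ++s) {
        std::vector<Vec3> force(n, Vec3{0.0, 0.0, 0.0});
        // Coulomb repulsion between every pair (O(n^2) per step).
        for (int i = 0; i < n; ++i)
            for (int j = i + 1; j < n; ++j) {
                Vec3 d;
                double dist2 = 1e-9;  // avoids division by zero for coincident nodes
                for (int k = 0; k < 3; ++k) { d[k] = pos[i][k] - pos[j][k]; dist2 += d[k] * d[k]; }
                const double dist = std::sqrt(dist2), f = kr / dist2;
                for (int k = 0; k < 3; ++k) {
                    force[i][k] += f * d[k] / dist;
                    force[j][k] -= f * d[k] / dist;
                }
            }
        // Hooke springs along edges: stretched edges pull, compressed ones push.
        for (const auto& e : edges) {
            const int a = e.first, b = e.second;
            Vec3 d;
            double dist2 = 1e-9;
            for (int k = 0; k < 3; ++k) { d[k] = pos[b][k] - pos[a][k]; dist2 += d[k] * d[k]; }
            const double dist = std::sqrt(dist2), f = ks * (dist - rest);
            for (int k = 0; k < 3; ++k) {
                force[a][k] += f * d[k] / dist;
                force[b][k] -= f * d[k] / dist;
            }
        }
        // Move each node along its net force, capping the displacement for stability.
        for (int i = 0; i < n; ++i) {
            const double len = std::sqrt(force[i][0] * force[i][0] + force[i][1] * force[i][1] +
                                         force[i][2] * force[i][2]);
            const double scale = dt * len > maxStep ? maxStep / len : dt;
            for (int k = 0; k < 3; ++k) pos[i][k] += scale * force[i][k];
        }
    }
    return pos;
}
```

As a sanity check, two nodes joined by one edge settle where kr/d^2 = ks(d - rest), which is about d ≈ 2.5 with the defaults. A path graph should likewise stretch toward the nearly straight line described in the question.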
