XML Schema Binding / Object Model Framework in Cocoa - objective-c

New to XSD here.
Has anyone found or written a framework for validating XML with an XML schema in Cocoa/Obj-C?
What I really need is the ability to define permitted types of modifications to an NSXMLDocument, as described in an XSD file. This includes defining sequences of child elements, lists of attributes and their permitted values, and so on. I need to expose these modification rules in my UI. For example:
I want to constrain the names of the new child elements added to an existing NSXMLElement node in my NSOutlineView
If the XSD says that Node A has required child elements (Nodes Aa and Ab) then when the user adds Node A to the XML tree, I want to automatically create Nodes Aa & Ab and add them to the just-created Node A.
And so on.
It seems to me that a good solution would be a Cocoa counterpart of JAXB. XSOM (which doesn't create schema-derived classes, but rather gives a queryable object model of the XSD) would work too.
My question is similar to this one, but I don't want to limit myself to a JAXB-like solution. I'm interested in finding out about other solutions that people have come up with for this problem.
Cheers!!

You can create a DTD and validate against it, or write a recursive parser based on your XSD, as has been done for existing RSS or Atom parsers built from their specs.

Related

I want to load a YAML file, possibly edit the data, and then dump it again. How can I preserve formatting?

This question tries to collect information spread over questions about different languages and YAML implementations in a mostly language-agnostic manner.
Suppose I have a YAML file like this:
first:
  - foo: {a: "b"}
  - "bar": [1, 2, 3]
second: | # some comment
  some long block scalar value
I want to load this file into a native data structure, possibly change or add some values, and dump it again. However, when I dump it, the original formatting is not preserved:
The scalars are formatted differently, e.g. "b" loses its quotation marks, the value of second is not a literal block scalar anymore, etc.
The collections are formatted differently, e.g. the mapping value of foo is written in block style instead of the given flow style; similarly, the sequence value of "bar" is written in block style.
The order of mapping keys (e.g. first/second) changes
The comment is gone
The indentation level differs, e.g. the items in first are not indented anymore.
How can I preserve the formatting of the original file?
Preface: Throughout this answer, I mention some popular YAML implementations. Those mentions are never exhaustive since I do not know all YAML implementations out there.
I will use YAML terms for data structures: Atomic text content (even numbers) is a scalar. Item sequences, known elsewhere as arrays or lists, are sequences. A collection of key-value pairs, known elsewhere as dictionary or hash, is a mapping.
If you are using Python, ruamel will help you preserve quite a lot of formatting, since it implements round-tripping up to native structures. However, it isn't perfect and cannot preserve all formatting.
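For illustration, here is a minimal round-trip sketch with ruamel (the file name and the edit are hypothetical):

from ruamel.yaml import YAML
import sys

yaml = YAML()                    # round-trip mode ("rt") is the default
with open("config.yaml") as f:   # hypothetical input file
    data = yaml.load(f)
data["second"] = "new value"     # edit the data like a normal dict
yaml.dump(data, sys.stdout)      # comments, key order and most styles survive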
Background
The process of loading YAML is also a process of losing information. Let's have a look at the stages of loading/dumping YAML, as given in the spec:

Native (Data Structure) <-> Representation (Node Graph) <-> Serialization (Event Tree) <-> Presentation (Character Stream)

(Dump proceeds from left to right through these stages; Load proceeds from right to left.)
When you are loading a YAML file, you are executing some or all of the steps in the Load direction, starting at the Presentation (Character Stream). YAML implementations usually promote their most high-level APIs, which load the YAML file all the way to Native (Data Structure). This is true for most common YAML implementations, e.g. PyYAML/ruamel, SnakeYAML, go-yaml, and Ruby's YAML module. Other implementations, such as libyaml and yaml-cpp, only provide deserialization up to the Representation (Node Graph), possibly due to restrictions of their implementation languages (loading into native data structures requires either compile-time or runtime reflection on types).
The important information for us is what is contained in those boxes. Each box mentions information which is no longer available in the box to its left. So, according to the YAML specification, styles and comments are only present in the actual YAML file content and are discarded as soon as the YAML file is parsed. For you, this means that once you have loaded a YAML file into a native data structure, all information about how it originally looked in the input file is gone, and when you dump the data, the YAML implementation chooses a representation it deems useful. Some implementations let you give general hints/options, e.g. that all scalars should be quoted, but that doesn't help you restore the original formatting.
Thankfully, this diagram only describes the logical process of loading YAML; a conforming YAML implementation does not need to slavishly conform to it. Most implementations actually preserve data longer than they need to. This is true for PyYAML/ruamel, SnakeYAML, go-yaml, yaml-cpp, libyaml and others. In all these implementations, the style of scalars, sequences and mappings is remembered up until the Representation (Node Graph) level.
On the other hand, comments are discarded rather early since they do not belong to an event or node (the exceptions here are ruamel, which links comments to the following event, and go-yaml, which remembers comments before, at and after the line that created a node). Some YAML implementations (libyaml, SnakeYAML) provide access to a token stream which is even more low-level than the Event Tree. This token stream does contain comments; however, it is only usable for things like syntax highlighting, since the APIs do not contain methods for consuming the token stream again.
So what to do?
Loading & Dumping
If you need to only load your YAML file and then dump it again, use one of the lower-level APIs of your implementation to only load the YAML up until the Representation (Node Graph) or Serialization (Event Tree) level. The API functions to search for are compose/parse and serialize/present respectively.
It is preferable to use the Event Tree instead of the Node Graph as some implementations already forget the original order of mapping keys (due to internally using hashmaps) when composing. This question, for example, details loading / dumping events with SnakeYAML.
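As a minimal sketch of such a round-trip with PyYAML (the file name is hypothetical; other implementations expose analogous APIs under the names given above):

import yaml  # PyYAML

with open("in.yaml") as f:
    events = list(yaml.parse(f))  # load only up to the Serialization (Event Tree) level
text = yaml.emit(events)          # present the events as a character stream again
# Styles and key order survive this round-trip; comments do not.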
Information that is already lost in the event stream of your implementation, for example comments in most implementations, is impossible to preserve. Also impossible to preserve is scalar layout, like in this example:
"1 \x2B 1"
This loads as the string "1 + 1" after resolving the escape sequence. Even in the event stream, the information about the escape sequence has already been lost in all implementations I know of. The event only remembers that it was a double-quoted scalar, so writing it back will result in:
"1 + 1"
Similarly, a folded block scalar (starting with >) will usually not remember where line breaks in the original input have been folded into space characters.
To sum up, loading to the Event Tree and dumping again will usually preserve:
Style: unquoted/quoted/block scalars, flow/block collections (sequences & mappings)
Order of keys in mappings
YAML tags and anchors
You will usually lose:
Information about escape sequences and line breaks in flow scalars
Indentation and non-content spacing
Comments – unless the implementation specifically supports putting them in events and/or nodes
If you use the Node Graph instead of the Event Tree, you will likely lose anchor representations (i.e. that &foo may be written out as &a later with all aliases referring to it using *a instead of *foo). You might also lose key order in mappings. Some APIs, like go-yaml, don't provide access to the Event Tree, so you have no choice but to use the Node Graph instead.
Modifying Data
If you want to modify data and still preserve what you can of the original formatting, you need to manipulate your data without loading it to a native structure. This usually means that you operate on YAML scalars, sequences and mappings, instead of strings, numbers, lists or whatever structures the target programming language provides.
You have the option to either process the Event Tree or the Node Graph (assuming your API gives you access to it). Which one is better usually depends on what you want to do:
The Event Tree is usually provided as a stream of events. It may be better for large data since you do not need to load the complete data into memory; instead, you inspect each event, track your position in the input structure, and place your modifications accordingly. The answer to this question shows how to append items, given a path and a value, to a YAML file with PyYAML's event API.
The Node Graph is better for highly structured data. If you use anchors and aliases, they will be resolved there, but you will probably lose information about their names (as explained above). Unlike with events, where you need to track the current position yourself, the data is presented as a complete graph here, and you can just descend into the relevant sections.
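To make the event-based approach concrete, here is a sketch with PyYAML (the helper name and the injection point are my own choices, not a fixed API):

import yaml  # PyYAML

def add_top_level_key(stream, key, value):
    """Inject a key/value pair at the start of the top-level mapping,
    passing all other events through untouched (so styles are preserved)."""
    out, injected = [], False
    for event in yaml.parse(stream):
        out.append(event)
        if not injected and isinstance(event, yaml.MappingStartEvent):
            # implicit=(True, False): plain scalars, type left to the resolver
            out.append(yaml.ScalarEvent(None, None, (True, False), key))
            out.append(yaml.ScalarEvent(None, None, (True, False), value))
            injected = True
    return yaml.emit(out)

print(add_top_level_key("a: 1\nb: [2, 3]\n", "c", "4"))
# prints:
# c: 4
# a: 1
# b: [2, 3]   <- the flow style of the sequence is kept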
In any case, you need to know a bit about YAML type resolution to work with the given data correctly. When you load a YAML file into a declared native structure (typical in languages with a static type system, e.g. Java or Go), the YAML processor will map the YAML structure to the target type if that's possible. However, if no target type is given (typical in scripting languages like Python or Ruby, but also possible in Java), types are deduced from node content and style.
Since we are not working with native loading because we need to preserve formatting information, this type resolution will not be executed. However, you need to know how it works in two cases:
When you need to decide on the type of a scalar node or event, e.g. you have a scalar with content 42 and need to know whether that is a string or integer.
When you need to create a new event or node that should later be loaded as a specific type. E.g. if you create a scalar containing 42, you might want to control whether it is loaded as the integer 42 or the string "42" later.
I won't discuss all the details here; in most cases, it suffices to know that if a string is encoded as a scalar but looks like something else (e.g. a number), you should use a quoted scalar.
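Both cases can be sketched with PyYAML's resolver and event API (other implementations expose similar, though differently named, facilities):

import yaml
from yaml.resolver import Resolver

# Case 1: what type would the plain scalar 42 resolve to?
print(Resolver().resolve(yaml.ScalarNode, "42", (True, False)))
# -> tag:yaml.org,2002:int, i.e. a plain 42 loads as an integer

# Case 2: emit 42 as a quoted scalar so that it will load as a string later
events = [
    yaml.StreamStartEvent(), yaml.DocumentStartEvent(),
    yaml.ScalarEvent(None, None, (False, True), "42", style='"'),
    yaml.DocumentEndEvent(), yaml.StreamEndEvent(),
]
print(yaml.emit(events))  # -> "42"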
Depending on your implementation, you may come in touch with YAML tags. Seldom used in YAML files (they look like e.g. !!str, !!map, !!int and so on), they contain type information about a node which can be used in collections with heterogeneous data. More importantly, YAML defines that every node without an explicit tag will be assigned one as part of type resolution. This may or may not have already happened at the Node Graph level. So in your node data, you may see a node's tag even when the original node did not have one.
Tags starting with two exclamation marks are actually shorthands, e.g. !!str is a shorthand for tag:yaml.org,2002:str. You may see either in your data, since implementations handle them quite differently.
Important for you is that when you create a node or event, you may be able to, and may also need to, assign a tag. If you don't want the output to contain an explicit tag, use the non-specific tags: ! for non-plain scalars and ? for everything else at the event level. At the node level, consult your implementation's documentation about whether you need to supply resolved tags. If not, the same rule for the non-specific tags applies. If the documentation does not mention it (few do), try it out.
So to sum up: You modify data by loading either the Event Tree or the Node Graph, you add, delete or modify events or nodes in the data you get, and then you present the modified data as YAML again. Depending on what you want to do, it may help you to create the data you want to add to your YAML file as native structure, serialize it to YAML and then load it again as Node Graph or Event Tree. From there, you can include it in the structure of the YAML file you want to modify.
Conclusion / TL;DR
YAML has not been designed for this task. In fact, it has been defined as a serialization language, assuming that your data is authored as native data structures in some programming language and from there dumped to YAML. However, in reality, YAML is used a lot for configuration, meaning that you typically write YAML by hand and then load it into native data structures.
This contrast is the reason why it is so difficult to modify YAML files while preserving formatting: the YAML format has been designed as a transient data format, to be written by one application and then loaded by another (or the same) application. In that process, preserving formatting does not matter. It does matter, however, for data that is checked in to version control (you want your diff to contain only the lines with data you actually changed) and in other situations where you write your YAML by hand, because you want to keep the style consistent.
There is no perfect solution for changing exactly one data item in a given YAML file and leaving everything else intact. Loading a YAML file does not give you a view of the YAML file, it gives you the content it describes. Therefore, everything that is not part of the described content – most importantly, comments and whitespace – is extremely hard to preserve.
If format preservation is important to you and you can't live with the compromises made by the suggestions in this answer, YAML is not the right tool for you.
I would like to challenge the accepted answer. Whether you can preserve comments, the order of map keys, or other features depends on the YAML parsing library that you use. For starters, the library needs to give you access to the parsed YAML as a YAML document, which is a collection of YAML nodes. These nodes can contain metadata besides the actual key/value pairs. The kinds of metadata your library chooses to store determine how much of the initial YAML document you can preserve. I will not speak for all languages and all libraries, but Go's most popular YAML parsing library, go-yaml, supports parsing YAML into a document tree and serializing it back, and preserves:
comments
the order of keys
anchors and aliases
scalar blocks
However, it does not preserve indentation, insignificant whitespace, and some other minor things. On the plus side, it allows modifying the YAML document, and there is another library, yaml-jsonpath, that simplifies browsing the YAML node tree. Example:
import (
    "github.com/stretchr/testify/assert"
    "gopkg.in/yaml.v3"
    "testing"
)

func Test1(t *testing.T) {
    var n yaml.Node
    y := []byte(`# Comment
t: &t
    - x: 1 # anchor
a:
    b: *t # alias
b: |
    cccc
    dddd
`)
    err := yaml.Unmarshal(y, &n)
    assert.NoError(t, err)
    y2, _ := yaml.Marshal(&n)
    assert.Equal(t, y, y2)
}

Whats the difference between tiles:insertTemplate and tiles:insertDefinition

Both tiles:insertDefinition and tiles:insertTemplate have putAttribute; I don't understand the difference between the two. I am using Tiles 2.x.
Thanks in advance,
kranthi
A template is a view which expects to be supplied attributes, while definitions are named instances of a template defined in tiles.xml (or programmatically via the API).
tiles:insertDefinition requires the name attribute to be set, because you are inserting a definition you have laid out in tiles.xml.
tiles:insertTemplate creates a new definition on the spot from a view, and expects you to insert values at that point. It requires the template attribute to be set; there is no name attribute.
In general I don't think you should need to use either of these tags often (you can build Tiles-based applications without ever using either). Avoiding their use means having all definitions clearly laid out in one place AND being able to see how all definitions fit together.
This central view is Tiles' greatest strength, which these tags can undermine.
tiles:insertDefinition still means using named definitions; there is still one central location where all layout is controlled, but because we are inserting the definition within a view, we lose our overview of how everything fits together.
tiles:insertTemplate is akin to a JSP include: you are creating a new definition at that moment in the view and using it. This tile is not part of the overarching view.
In case the argument was not clear: JSP includes can achieve the same reduction in boilerplate code as Tiles can. It is the overarching view that Tiles provides which allows you to easily change page structure across the whole application. Carefully consider whether this is being undermined.

Showing table hierarchy through table names

I am working on redesigning a database for a product called Project Billing. I am having trouble coming up with table names. In the old database the names were super obscure (PRB_PROJ_LVL), so the old design is of no help. The database is small - 10 tables or so - but will grow over time.
Here's the problem: Projects are an entity (and table), but the word is also used as an adjective. For example:
Project - a table containing projects.
ProjectTask - a table containing project tasks; this is a child of Projects.
ProjectTemplate - a table for project templates, which is not a child of Projects. Project templates just serve as a model for creating a bunch of ProjectTasks.
So, how do I show that ProjectTask is a child of Project but ProjectTemplate isn't? Thanks as always.
Internal documentation of your schema and its intended use is one of the better ways to do this. Relying on naming convention alone always leaves open the possibility of interpretation; explicit definitions don't. That said, we have defined some objects which are intended for use as models (templates in your case). These model objects are not to be used or directly manipulated by the production application, and over time they are mutable, with new objects being based on modified models. One way we tried to make the schema self-descriptive was to introduce schemas. Since we had different departments that could make use of the same model objects, we had something along the lines of (adjusted to apply to your question without assuming too much):
[dept_X].[projects]
[dept_X].[project_tasks]
And for templates, which are never directly used by the application or users (per se):
[model].[projects]
[model].[project_tasks]
As a programming reference for our developers, schema definition scripts contain documentation describing object relationships (as the objects do internally via foreign keys, etc.). As an added measure, a wiki article is generated for all new objects, sorted by project. Objects existing prior to this new system (my onboarding) get documented as they get modified or as time permits, whichever comes first.

Loading XML Files into VB.Net Structures

I have many (15-20) different XML files that I need to load into VB.Net. They're designed as they would be in a database: built in Access and bulk-exported into XML files. Each file represents a different table in the database.
Now, I need to load this information into VB.Net. Initially, I'd love to use DAO and access the MDB directly via queries, but this won't be possible, as I'm making sure the project can be easily ported to XNA/C# down the road. (The Xbox 360 cannot use MDBs, so I'd rather deal with this problem now than later.)
So, I'm stuck now trying to figure out how to wrangle together all of these XML files. I've tried using factories to parse each one individually. E.g., if three XML files contain data for a 'character' class, I'd pass an instance of Character to each XML factory and the classes would apply the necessary data.
I'm trying to get past this, though, as maintaining many different classes with redundant code is a pain, and it is hard to debug as well. So I'm trying to figure out a new solution.
The only thing I can think of right now is using System.Reflection: parse through each member of the class/structure I'm instantiating, and then use the names of those members to read in the data from the corresponding element of the XML file.
However, this makes the assumption that each member of the structure/class has a matching element in the XML file, and vice-versa.
If you know the schema of the XML files, you could create .NET classes that can deserialize one of those XML files into an instance of a .NET object.
You can also use xsd.exe (which comes with the Windows SDK download) to generate the .NET class definition for you if you have an XSD file (or can write an XSD more easily than you can write a serializable .NET class).
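For example, given a hypothetical Character.xsd, a command along the lines of

xsd.exe Character.xsd /classes /language:VB

should generate a Character.vb containing a serializable class that you can then fill from your XML files with XmlSerializer.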
LINQ to XML is a good solution (and even better in VB.NET, with things like XML Literals and Global Namespaces). Treating multiple XML files as DB tables can be a rough road sometimes, but it is certainly not impossible. I guess I'd start with JOIN (even though it has "C#" in the title, the samples are also in VB).

How are value objects saved and loaded?

Since there are no repositories for value objects, how can I load all value objects?
Suppose we are modeling a blog application and we have these classes:
Post (Entity)
Comment (Value object)
Tag (Value object)
PostsRepository (Repository)
I know that when I save a new post, its tags are saved with it in the same table. But how could I load all tags of all posts? Should PostsRepository have a method to load all tags?
That is what I usually do, but I want to know others' opinions.
I'm looking for a better solution for this question and I found this post:
http://gojko.net/2009/09/30/ddd-and-relational-databases-the-value-object-dilemma/
This post explains very well why there is a lot of confusion with value objects and databases.
Here is a phrase I liked very much:
"Persistence is not an excuse to turn everything to entities."
Gojko Adzic gives us three alternatives for saving our value objects.
I am currently working through a similar example. Once you need to uniquely refer to tags, they are no longer simple value objects and may continue to grow in complexity. I decided to make them their own entities and create a separate repository to retrieve them. In most scenarios they are loaded or saved with the post, but when they are required alone, the other repository is used.
I hope this helps.
EDIT:
In part thanks to this post, I decided to restructure my application slightly. You are right that I was probably incorrectly making tags an entity. I have since changed my application so that tags are just strings and the post repository handles all the storage requirements around tags. For operations that need posts, the tags are loaded with them. For any operation that just requires tags or lists of tags, the repository has methods for that.
Here is my take on how I might solve this type of problem in the way that I'm currently practicing DDD.
If you are editing something that tags are added to and removed from, such as a Post, then tags may be entities, but perhaps they could be value objects; either way, they are loaded and saved along with the post. I personally tend to favor value objects unless the object needs to be modified, though I do realize there is a difference between entity objects modeled as read-only "snapshots" and actual value objects that lack identity. The tricky part is that sometimes what you would normally think of as a key can be part of a value object, as long as it is not used as identity in that context, and I think tags fall into this category. (A small sketch of this case follows after these three points.)
If you are editing the tags themselves, then that is probably a separate bounded context, or at least a separate aggregate, in which tags themselves are the aggregate root and are persisted through a repository. Note that the entity class that represents tags in this context doesn't have to be the same entity class used for tags in the Post aggregate.
If you're listing available tags on the display for read-only purposes, such as to provide a selection list, then that is probably a list of value objects. These value objects can, but don't have to, be in the Domain Model, since they are mainly about supporting the UI and not about the actual domain.
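To make the first case concrete, here is a minimal sketch (the class and member names are mine, not from any particular framework) of tags as value objects owned by the Post aggregate:

from dataclasses import dataclass

@dataclass(frozen=True)
class Tag:
    """Value object: compared by value, no identity of its own."""
    name: str

class Post:
    """Entity and aggregate root: identified by its id, owns its tags."""
    def __init__(self, post_id, title, tags=()):
        self.id = post_id
        self.title = title
        self.tags = set(tags)  # value objects stored and loaded with the aggregate

    def add_tag(self, name):
        self.tags.add(Tag(name))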
Please chime in if anybody has thoughts on why my take on this might be wrong, but this is the way I've been doing it.