Differences between BP1 and BP2 or BP3 - W3C Data on the Web Best Practices - semantic-web

I'm starting to study the W3C Best Practices on Linked Open Data.
And I'm stuck on an issue regarding the initial best practices:
Best Practice 1: Provide metadata -> Provide metadata for both human users and computer applications.
vs.
Best Practice 2: Provide descriptive metadata -> Provide metadata that describes the overall features of datasets and distributions (both machine- and human-readable).
Best Practice 3: Provide structural metadata -> Provide metadata that describes the schema and internal structure of a distribution (both machine- and human-readable).
By providing evidence on BP2 or BP3, can I assume that BP1 is also met (i.e., BP1 is redundant)?

By my reading, satisfying either BP2 or BP3 would also satisfy BP1, but BP1 might also be satisfied without satisfying either BP2 or BP3.
So, BP1 is not strictly redundant.
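To make the distinction a bit more concrete, here is a minimal sketch (my own illustration, not taken from the spec) of BP2-style descriptive metadata, built with Apache Jena and serialized as Turtle. The dataset URI, titles and dates are hypothetical placeholders. Structural metadata in the sense of BP3 would instead describe the schema of the CSV distribution, while BP1 is the umbrella requirement that some such metadata exists in both human- and machine-readable form.

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.DCTerms;
import org.apache.jena.vocabulary.RDF;

public class DescriptiveMetadataSketch {
    private static final String DCAT = "http://www.w3.org/ns/dcat#";

    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();
        model.setNsPrefix("dcat", DCAT);
        model.setNsPrefix("dct", DCTerms.getURI());

        Property distribution = model.createProperty(DCAT + "distribution");
        Property mediaType = model.createProperty(DCAT + "mediaType");

        // Descriptive metadata about the dataset as a whole (BP2): title,
        // description, publication date -- machine-readable as RDF, and
        // human-readable through the literals.
        Resource dataset = model.createResource("http://example.org/dataset/bus-stops")
                .addProperty(RDF.type, model.createResource(DCAT + "Dataset"))
                .addProperty(DCTerms.title, "Bus stops of Example City", "en")
                .addProperty(DCTerms.description, "Locations of all bus stops.", "en")
                .addProperty(DCTerms.issued, "2015-12-17");

        // One distribution of that dataset, with its media type.
        Resource dist = model.createResource("http://example.org/dataset/bus-stops.csv")
                .addProperty(RDF.type, model.createResource(DCAT + "Distribution"))
                .addProperty(mediaType, "text/csv");
        dataset.addProperty(distribution, dist);

        model.write(System.out, "TURTLE");
    }
}
```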

Related

Is this an intelligent use case for OptaPlanner?

I'm trying to build an intelligent field mapper and would like to know whether OptaPlanner is the correct fit. The requirement is as follows:
I have a UI which accepts a source and a target XML field schema.
The source and target schemas contain multiple business fields, which can be further classified into multiple business groups. There are certain rules (which we can treat as constraints, in OptaPlanner terms) that need to be considered during the field mapping. The objective of the tool is to find the field mapping (each source field needs to find its best-fit target field).
Can OptaPlanner be used to solve this problem? I'm confused about whether this is a mathematical optimization problem or a machine learning predictive-model problem (for the latter to work, I would need to build up sufficient labelled mapping data).
Any help will be much appreciated.
Off the top of my head, there are 3 potential situations to be in, going from easy to hard:
A) You can use a decision table to figure out the mapping decision. Don't use OptaPlanner; use Drools.
B) Given two mapping proposals, you can score which one is better through a formal scoring function. Use OptaPlanner (see the sketch below).
C) You don't know your scoring function in detail yet, you only have some vague ideas, and you want to use training data of historical decisions to build one. Don't use OptaPlanner; use machine learning.
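To make option (B) a bit more concrete, here is a rough sketch of how the field mapping could be modelled as a planning problem, assuming a recent OptaPlanner 8.x with constraint streams. The SourceField/TargetField/FieldMapping classes and the two constraints are hypothetical examples; the real scoring function has to come from your own mapping rules, and you would still need a @PlanningSolution class that exposes the target fields as the "targetFields" value range.

```java
import java.util.Objects;
import org.optaplanner.core.api.domain.entity.PlanningEntity;
import org.optaplanner.core.api.domain.variable.PlanningVariable;
import org.optaplanner.core.api.score.buildin.hardsoft.HardSoftScore;
import org.optaplanner.core.api.score.stream.Constraint;
import org.optaplanner.core.api.score.stream.ConstraintFactory;
import org.optaplanner.core.api.score.stream.ConstraintProvider;
import org.optaplanner.core.api.score.stream.Joiners;

/** Problem facts: fields read from the two XML schemas (hypothetical classes). */
class SourceField {
    String name;
    String businessGroup;
    String getBusinessGroup() { return businessGroup; }
}

class TargetField {
    String name;
    String businessGroup;
    String getBusinessGroup() { return businessGroup; }
}

/** Planning entity: one mapping decision per source field. */
@PlanningEntity
class FieldMapping {
    SourceField source;                          // fixed input

    @PlanningVariable(valueRangeProviderRefs = "targetFields")
    TargetField target;                          // chosen by the solver

    SourceField getSource() { return source; }
    TargetField getTarget() { return target; }
}

/** The "formal scoring function" of option (B), expressed as constraints. */
class MappingConstraintProvider implements ConstraintProvider {
    @Override
    public Constraint[] defineConstraints(ConstraintFactory factory) {
        return new Constraint[] {
                // Hard: two source fields must not be mapped to the same target field.
                factory.forEachUniquePair(FieldMapping.class,
                                Joiners.equal(FieldMapping::getTarget))
                        .penalize(HardSoftScore.ONE_HARD)
                        .asConstraint("Target field used twice"),
                // Soft: prefer mappings whose business groups match.
                factory.forEach(FieldMapping.class)
                        .filter(m -> m.getTarget() != null
                                && !Objects.equals(m.getSource().getBusinessGroup(),
                                                   m.getTarget().getBusinessGroup()))
                        .penalize(HardSoftScore.ONE_SOFT)
                        .asConstraint("Business group mismatch")
        };
    }
}
```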

cTAKES indication that category > 0 sources are about to be used?

In appendix 1 of the UMLS license agreement, there is a listing of all sources within the current version of the UMLS Metathesaurus with an indication of any additional restrictions and notices that apply. Loosely speaking, it seems like you can generally have your way with the Metathesaurus data sources that fall within category-0 of the license, but things get more restrictive at categories above that.
For example (likely a bad example, as I am not a lawyer), looking at section 12.2 of the main license:
LICENSEE is prohibited from using the vocabulary source in operational
applications that create records or information containing data from
the vocabulary source. Use for data creation research or product
development is allowed.
My question then is: (since cTAKES already has my UMLS credentials) is there any way to tell when a certain action with cTAKES is going to instruct it to use/access data from the Metathesaurus that has a category > 0 (e.g. some popup warning or a header comment in the binary files)? Thanks
** The reason I'm interested is: suppose that a certain cTAKES process uses a category-2 data source to do something on some input that populates data into some XMI output (I don't know much about cTAKES' full implementation, but for the sake of argument let's assume this is true), which then gets post-processed and stored as some report for an organization. It would seem that the organization has inadvertently violated the category-2 restriction (since they were never warned about the underlying data being used to generate the outputs). I may be grossly misunderstanding something here, so please let me know if this is the case.

"Best practice" for HBase data "serialization"

Hi, I am new to HBase and I wonder what the best approach is to serialize and store data in HBase. Is there any convenient way to transform "business objects" at the application level into HBase objects (Put), i.e. the transformation to byte[]? I doubt that it has to be converted manually via helper methods like Bytes.toBytes(), etc.
What are the best practices and experiences?
I read about Avro, Thrift, n-orm, ...
Can someone share his knowledge?
I would go with the default Java API and enable compression on the HBase column families (the data ends up on HDFS anyway), rather than using a framework for serializing/deserializing efficiently during RPC calls.
Apparently, updates such as adding a column to records serialized with Avro/Thrift would be difficult, as you are forced to delete and recreate them.
Secondly, I don't see support for HBase Filters on Thrift/Avro-encoded values, in case you need to filter data at the source.
My two cents.
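For what it's worth, here is a minimal sketch of that "default Java API" approach: the business object is mapped to a Put by hand with Bytes.toBytes(). The Customer class, table name and column family are hypothetical examples, not something from the question.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CustomerHBaseWriter {

    static final byte[] CF = Bytes.toBytes("d");   // single short column family

    /** Plain business object at the application level. */
    static class Customer {
        String id;
        String name;
        long createdAtMillis;
    }

    /** Manual "serialization": one column per field, all via Bytes.toBytes(). */
    static Put toPut(Customer c) {
        Put put = new Put(Bytes.toBytes(c.id));                  // row key
        put.addColumn(CF, Bytes.toBytes("name"), Bytes.toBytes(c.name));
        put.addColumn(CF, Bytes.toBytes("created"), Bytes.toBytes(c.createdAtMillis));
        return put;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("customers"))) {
            Customer c = new Customer();
            c.id = "cust-0001";
            c.name = "Acme Ltd";
            c.createdAtMillis = System.currentTimeMillis();
            table.put(toPut(c));
        }
    }
}
```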
For an ORM solution, kindly have a look at https://github.com/impetus-opensource/Kundera .

WCF Service to store data dynamically from XML to Database

I'm trying to build a WCF service that gets info from another service via XML. The XML usually has 4 elements, with types ranging from int and string to DateTime. I want to build the service dynamically, so that when it gets an XML message it stores it in the database. I don't want to hardcode the types and element names in the code; if there is a change, I want it to be added to the database dynamically. What is the best way of doing this? Is this a good practice? Or should I stick with using Entity Framework, having a set model for the database, and hardcoding the element names and types?
Thanks :)
As usual, "it depends...".
If the nature of the data you're receiving is that it has no fixed schema, or a schema that changes frequently as a part of its business logic/domain logic, it's a good idea to design a solution to manage that. Storing data whose schema you don't know at design time is a long-running debate on StackOverflow - look for "property bag", "entity attribute value" or EAV to see various questions and answers.
The thing you give up with this approach is - at the web service layer - the ability to check that the data you're receiving meets the agreed interface contract; this in turn helps to avoid all kinds of exciting bugs - what should your system do if it receives invalid XML? Without an agreed schema/dtd, you have to build all kinds of other checks into your code.
At the database level, you usually end up giving up aspects of the relational model (and therefore the power of SQL) to store your data without "traditional" row-and-column relationships. That often makes queries harder, and may sacrifice standards compliance (e.g. by using vendor specific extensions).
If the data changes for technical reasons, i.e. it's not in the nature of the data to change, it's just the technology chain that makes you worry about this, I'd suggest you instead build in the concept of versioning and have "strongly typed" services/data with different versions. Whilst this seems like more work, the benefits of relying on schema validation, the relational database model, and simplicity usually make it a good trade-off...
Put the XML into the database as XML. Many modern databases support XML directly; if not, just store it as text.

Anyone using Change Data Capture for Replication?

I'm soon going to be involved in a project to move our current replication away from transactional replication to some other method (for various reasons). I'm considering using CDC (Change Data Capture) as an alternative. I'm envisioning CDC capturing all the changes, and then another process reading those changes and applying them to a target database.
I don't know much about CDC or whether it's suitable for this task. Can anyone share their experience doing this, or something closely related? Pros and cons, pitfalls, etc.
Thanks very much.
We've been looking into the same thing. Our issue with replication has been the tight schema coupling; CDC allows for mostly the same functionality without the schema coupling.
I think it's doable, but you may be reinventing the wheel. The typical use case for CDC is OLTP -> data warehouse, where the schemas are completely different and the data only moves in one direction.
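In case it helps, here is a rough sketch of the "separate process reads the changes and applies them" idea, assuming SQL Server CDC (since you're coming from transactional replication) and a hypothetical capture instance dbo_Orders with columns id and status. The connection string and the apply step are placeholders, and a real implementation would persist the last processed LSN rather than re-reading from the minimum LSN each time.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CdcPollingSketch {

    // Reads all pending changes for the hypothetical capture instance dbo_Orders.
    private static final String CHANGES_QUERY =
            "DECLARE @from binary(10) = sys.fn_cdc_get_min_lsn(N'dbo_Orders'); " +
            "DECLARE @to   binary(10) = sys.fn_cdc_get_max_lsn(); " +
            "SELECT __$operation, id, status " +
            "FROM cdc.fn_cdc_get_all_changes_dbo_Orders(@from, @to, N'all') " +
            "ORDER BY __$start_lsn, __$seqval;";

    public static void main(String[] args) throws Exception {
        // Placeholder connection string for the source database.
        String sourceUrl = "jdbc:sqlserver://localhost;databaseName=Sales;integratedSecurity=true";
        try (Connection source = DriverManager.getConnection(sourceUrl);
             Statement stmt = source.createStatement();
             ResultSet rs = stmt.executeQuery(CHANGES_QUERY)) {

            while (rs.next()) {
                int op = rs.getInt("__$operation");   // 1=delete, 2=insert, 4=update (after image)
                String id = rs.getString("id");
                String status = rs.getString("status");

                switch (op) {
                    case 2: /* INSERT the row into the target database   */ break;
                    case 4: /* UPDATE the matching row in the target      */ break;
                    case 1: /* DELETE the matching row from the target    */ break;
                    default: /* 3 = before image, only with 'all update old' */ break;
                }
            }
        }
    }
}
```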