Defining a schema for RDF/OWL metadata

I am using RDF as the data model for my metadata.
The metadata will contain the IOPE as well as other parameters, like efficiency/accuracy, which are not part of the IOPE but are part of the metadata for a process.
There is a schema for the process (http://www.daml.org/services/owl-s/1.0/Process.owl) which includes the IOPE.
1) How can I define a schema for the other parameters to be present in my RDF file?
2) I am using Protégé, but I can't find a way to use the Process schema or any existing schema.
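For illustration, here is a minimal rdflib sketch of how such extra parameters could be declared in your own namespace and attached to OWL-S processes; the ex: namespace, the hasEfficiency/hasAccuracy properties, and the MyProcess instance are hypothetical names, not part of OWL-S.

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS, OWL, XSD

# OWL-S Process vocabulary plus a hypothetical namespace for the extra parameters
PROCESS = Namespace("http://www.daml.org/services/owl-s/1.0/Process.owl#")
EX = Namespace("http://example.org/process-metadata#")  # hypothetical

g = Graph()
g.bind("process", PROCESS)
g.bind("ex", EX)

# Declare the extra parameters as datatype properties in our own small schema
for prop in (EX.hasEfficiency, EX.hasAccuracy):
    g.add((prop, RDF.type, OWL.DatatypeProperty))
    g.add((prop, RDFS.domain, PROCESS.Process))  # attach them to OWL-S processes
    g.add((prop, RDFS.range, XSD.double))

# Describe one process instance with the extra metadata alongside the OWL-S terms
g.add((EX.MyProcess, RDF.type, PROCESS.AtomicProcess))
g.add((EX.MyProcess, EX.hasEfficiency, Literal(0.92, datatype=XSD.double)))
g.add((EX.MyProcess, EX.hasAccuracy, Literal(0.88, datatype=XSD.double)))

print(g.serialize(format="turtle"))
```

In Protégé, the usual way to reuse an existing schema such as Process.owl is to add it as an import of your own ontology (via its ontology IRI) and then declare the new properties against the imported classes.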

Related

How to auto-detect the schema from a file in GCS and load it into BigQuery?

I'm trying to load a file from GCS into BigQuery, with the schema auto-generated from the file in GCS. I'm using Apache Airflow to do this. The problem I'm having is that when I use auto-detect schema from file, BigQuery creates the schema based on only some ~100 initial values.
For example, in my case there is a column, say X, whose values are mostly of Integer type, but some values are of String type, so bq load will fail with a schema mismatch. In such a scenario the data type needs to be changed to STRING.
So what I could do is manually create a new table by generating the schema on my own, or I could set the max_bad_records value to something like 50, but neither seems like a good solution. An ideal solution would be like this:
Try to load the file from GCS into BigQuery; if the table is created successfully in BQ without any data mismatch, then I don't need to do anything.
Otherwise I need to be able to update the schema dynamically and complete the table creation.
As you cannot change a column's type in bq (see this link):
BigQuery natively supports the following schema modifications:
* Adding columns to a schema definition
* Relaxing a column's mode from REQUIRED to NULLABLE
All other schema modifications are unsupported and require manual workarounds
So as a workaround I suggest:
* Use --max_rows_per_request = 1 in your script
* Use one line that is best suited to your case, with the optimized field types.
This will create the table with the correct schema and 1 line, and from there you can load the rest of the data.
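A rough Python sketch of that workaround with the google-cloud-bigquery client (the bucket, file, table, and column names are placeholders; the seed file is assumed to be a single hand-picked line whose values produce the desired types, and header handling is omitted):

```python
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my_project.my_dataset.my_table"    # placeholder
seed_uri = "gs://my_bucket/seed_one_line.csv"  # one line where X clearly looks like a string
full_uri = "gs://my_bucket/full_file.csv"      # the rest of the data

# Step 1: create the table from the single representative line so that
# autodetect infers the optimized field types (e.g. X as STRING).
seed_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
client.load_table_from_uri(seed_uri, table_id, job_config=seed_config).result()

# Step 2: append the full file; without autodetect, the existing table
# schema (with X as STRING) is used for the load.
append_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)
client.load_table_from_uri(full_uri, table_id, job_config=append_config).result()
```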

Is there a way to apply a NEW Avro schema to an existing schema in NiFi without inferring order?

I am using NiFi to load CSVs, apply a NEW schema, and load them into a SQL db. Currently I am writing an Avro schema and applying it to each CSV. I am writing the schema based on the order of the incoming CSV: the first field = the first column in the CSV. Is there a way to map one schema to another based on column name? I.e. can I say 'csv.name -> sql.username'?
I know this can be done manually before uploading the CSVs; I am wondering if there is a way within NiFi to map a schema to data based on the data's current schema, not knowing the order of the current schema, just the fields.
I have read about RecordPaths and UpdateRecord. I am looking for something to match the whole incoming schema to a new schema, not based on order.
Avro Schema Settings:
PutDatabaseRecord settings
As I see it, you have two options:
Option 1 (the better one):
Add a header line to your records and set Treat First Line as Header to True in your CSVReader
Option 2:
Set Schema Access Strategy in your CSVReader to Infer Schema (available since NiFi 1.9.0)
The first one can guarantee a correct mapping of your fields and their types.
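Outside of NiFi, the idea behind Option 1 is simply that a header line lets fields be matched by name instead of by position. A small Python sketch of that name-based mapping (the column names and the 'csv.name -> sql.username' style mapping are made up for illustration):

```python
import csv
import io

# Map CSV header names to SQL column names, regardless of column order
name_map = {"name": "username", "mail": "email"}  # hypothetical mapping

csv_text = "mail,name\nalice@example.org,alice\n"  # columns arrive in a different order

rows = []
for record in csv.DictReader(io.StringIO(csv_text)):
    # DictReader keys each record by header name, so position no longer matters
    rows.append({sql_col: record[csv_col] for csv_col, sql_col in name_map.items()})

print(rows)  # [{'username': 'alice', 'email': 'alice@example.org'}]
```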

What is the meaning of schema evolution for the Parquet and Avro file formats in Hive?

Can anyone explain the meaning of schema evolution for the Parquet and Avro file formats in Hive?
Schema evolution is simply a term for how storage behaves when the schema changes. Users can start with a simple schema and gradually add more columns to the schema as needed. In this way, users may end up with multiple Parquet/Avro files with different but mutually compatible schemas.
So let's say you have one Avro/Parquet file and you want to change its schema; you can rewrite that file with a new schema inside. But what if you have terabytes of Avro/Parquet files and you want to change their schema? Will you rewrite all of the data every time the schema changes?
Schema evolution allows you to update the schema used to write new data, while maintaining backwards compatibility with the schema(s) of your old data. Then you can read it all together, as if all of the data has one schema. Of course there are precise rules governing the changes allowed, to maintain compatibility. Those rules are listed under Schema Resolution.
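As a concrete illustration of that resolution step, here is a small Python sketch using the fastavro library (one of several Avro implementations): data written with an old schema is read back through a newer schema that adds a defaulted field.

```python
import io
from fastavro import writer, reader, parse_schema

# Old (writer) schema: the schema the data was originally written with
old_schema = parse_schema({
    "type": "record", "name": "User",
    "fields": [{"name": "id", "type": "int"},
               {"name": "name", "type": "string"}],
})

# New (reader) schema: adds an "email" field with a default value
new_schema = parse_schema({
    "type": "record", "name": "User",
    "fields": [{"name": "id", "type": "int"},
               {"name": "name", "type": "string"},
               {"name": "email", "type": "string", "default": ""}],
})

buf = io.BytesIO()
writer(buf, old_schema, [{"id": 1, "name": "alice"}])

buf.seek(0)
# Schema resolution: the old data is read as if it had the new schema
for record in reader(buf, reader_schema=new_schema):
    print(record)  # {'id': 1, 'name': 'alice', 'email': ''}
```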

Making XML according to an XML Schema from data taken from a SQL VIEW

Soon I will be generating an XML file from my database according to a given XML schema in an .xsd file. I have done it before, but on simple tables with elements that had no children. Now the schema is not that simple: elements have children, and those children have children of their own. I don't know how I can map data from my database into XML using that XML schema. For simple data it was very easy: I loaded the schema and used it while creating the XML file, giving database fields the names of the tags from the .xsd file. What happens when there are complex elements in the schema? How can I map a database field onto third-level elements?
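One common approach is to group the flat rows coming from the view by their parent keys and build the nested elements level by level, naming the tags after the elements in the .xsd. A rough Python sketch, assuming a hypothetical orders/items view and element names:

```python
import xml.etree.ElementTree as ET
from itertools import groupby

# Hypothetical flat rows from the SQL view: one row per (order, item)
rows = [
    {"order_id": 1, "customer": "Alice", "item_name": "pen",  "qty": 2},
    {"order_id": 1, "customer": "Alice", "item_name": "book", "qty": 1},
    {"order_id": 2, "customer": "Bob",   "item_name": "mug",  "qty": 3},
]

root = ET.Element("Orders")  # 1st-level element named after the .xsd
for order_id, group in groupby(rows, key=lambda r: r["order_id"]):
    group = list(group)
    order = ET.SubElement(root, "Order", id=str(order_id))  # 2nd level
    ET.SubElement(order, "Customer").text = group[0]["customer"]
    items = ET.SubElement(order, "Items")
    for r in group:
        item = ET.SubElement(items, "Item")                 # 3rd level
        ET.SubElement(item, "Name").text = r["item_name"]
        ET.SubElement(item, "Qty").text = str(r["qty"])

print(ET.tostring(root, encoding="unicode"))
```

The rows must be ordered by the parent key (here order_id) for groupby to work; in practice the ORDER BY of the view can take care of that.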

Schema while storing a relation in Pig Latin

When I STORE a relation in Pig Latin, the data is stored, but when a LOAD is performed again I need to specify the schema. Is there a way to store the schema along with the data? I am using Pig 0.8.2.