How to Update the Input File of an OntoRefine project in GraphDB

I'm trying to script RDF/OWL data (re)loading into a GraphDB store, and I wonder how I can reprocess a CSV file through the OntoRefine component, keeping the column modifications and the RDF mapping, using only the REST API.

One way to script this is to use the rdf-mapper REST API, which accepts a column mapping file and a tabular file and streams the resulting RDF to a file, which you can save in the server's import location.
This file can then be imported into GraphDB using the import server file REST API (more information can be found here: https://graphdb.ontotext.com/free/devhub/workbench-rest-api/curl-commands.html#data-import ).
Keep in mind that when starting GraphDB, you need to specify the directory from which you plan to import the RDF file by using this property:
-Dgraphdb.workbench.importDirectory=/import/location/
Here is a small example script showing how you can convert a CSV file into an RDF .ttl document and import it using cURL:
curl -X POST -sL \
--url "http://address:port/rest/rdf-mapper/rdf/stream:csv:separator={CSV-SEPARATOR}" \
-F mapping=@mapping.json \
-F data=@import_file.csv \
-H 'accept: text/turtle' \
-o export_file.ttl
curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d '{
"fileNames": [
"export_file.ttl"
],
"importSettings": {
"baseURI": "",
"context": "",
"parserSettings": {
"failOnUnknownDataTypes": true,
"failOnUnknownLanguageTags": true,
"normalizeDataTypeValues": true,
"normalizeLanguageTags": true,
"preserveBNodeIds": true,
"stopOnError": true,
"verifyDataTypeValues": true,
"verifyLanguageTags": true
}
}
}' 'http://address:port/rest/data/import/server/{REPOSITORY-NAME}'
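If you would rather drive the import step from Python than from cURL, here is a minimal sketch that builds (but does not send) the same import-server-file request with the standard library. The endpoint path and parser settings are copied from the cURL command above; the function name is hypothetical, and the host, port 7200 and repository name in the usage line are placeholders you should adapt.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Parser settings copied from the cURL example above.
PARSER_SETTINGS = {
    "failOnUnknownDataTypes": True,
    "failOnUnknownLanguageTags": True,
    "normalizeDataTypeValues": True,
    "normalizeLanguageTags": True,
    "preserveBNodeIds": True,
    "stopOnError": True,
    "verifyDataTypeValues": True,
    "verifyLanguageTags": True,
}


def build_server_import_request(base_url: str, repository: str, file_names: list[str]) -> Request:
    """Build (but do not send) a POST to GraphDB's server-file import endpoint."""
    payload = {
        "fileNames": file_names,  # files must already sit in graphdb.workbench.importDirectory
        "importSettings": {
            "baseURI": "",
            "context": "",
            "parserSettings": PARSER_SETTINGS,
        },
    }
    # Percent-encode the repository ID so unusual characters cannot break the path
    url = f"{base_url.rstrip('/')}/rest/data/import/server/{quote(repository, safe='')}"
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
```

Usage (placeholders): `urllib.request.urlopen(build_server_import_request("http://localhost:7200", "my-repo", ["export_file.ttl"]))`. Building the request separately from sending it makes the payload easy to inspect before anything hits the server.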
P.S. Here is how to create the needed mapping.json using GraphDB Workbench:
Go to OntoRefine -> Select and import a tabular file -> Select Create Project -> Select RDF Mapping / Edit RDF Mapping -> A new window opens where you can configure the mapping -> After configuring the mapping, select "Download JSON". The downloaded JSON mapping can then be used with the example provided above.
For more information take a look at https://graphdb.ontotext.com/free/loading-data-using-ontorefine.html?highlight=mapping

Related

Trying to Use CFEXECUTE with Curl to Post via API to Rumble

I am trying to run this example script (from Rumble's Upload API) with curl through CFEXECUTE:
curl -F "access_token=XXXXX" \
-F "title=A cool video" \
-F "description=Some detailed description" \
-F "license_type=0" \
-F "channel_id=123" \
-F "video=@video.mp4" \
-F "thumb=@thumbnail.jpg" \
"https://rumble.com/api/simple-upload.php"
I've been trying to get the following code working to upload videos via API to our Rumble account. I'm new to Curl and CFEXECUTE but not new to Coldfusion:
<cfexecute name = "/usr/bin/curl" arguments = "-X POST --insecure https://rumble.com/api/simple-upload.php -F access_token=#access# -F title=#titletouse# - F description=#descript# -F license_type=0 -F channel_id=#channel# -F video=@#form.video#" variable="response" timeout = "999"> </cfexecute>
Most of the time the response is: [empty string]
The variables listed are required. I'm pretty sure it IS connecting to Rumble, because I tried a bunch of different versions: one without POST got back JSON response data, and another without an -F before description got back: { "success": false, "errors": { "description": { "code": "MISSING_OR_INVALID_VALUE" } } }
For example, I tried:
with single quotes around each form field '-F license_type=0'
with -H 'Content-Type:multipart/form-data'
with semicolons between fields: -F license_type=0;-F channel_id=#channel#
Any suggestions on what I'm doing wrong? I have tried about 30 different things and am out of ideas. Thank you!
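One thing worth checking: when a whole command line is passed as a single string, values containing spaces (such as a description) may be split into several arguments, and curl only uploads a file's contents when the value is prefixed with @. Here is a minimal Python sketch (not ColdFusion; the function name is hypothetical) that builds curl's argument list with one element per argument, using --form-string for text fields so values starting with @ or < are sent literally, and -F name=@path for files.

```python
def build_curl_args(url: str, fields: dict[str, str], files: dict[str, str]) -> list[str]:
    """Build a curl argv list: one element per argument, so no shell quoting is involved."""
    args = ["curl", "-X", "POST", url]
    for name, value in fields.items():
        # --form-string sends the value literally, even if it starts with '@' or '<'
        args += ["--form-string", f"{name}={value}"]
    for name, path in files.items():
        # '@' tells curl to upload the file's contents rather than its name
        args += ["-F", f"{name}=@{path}"]
    return args
```

The resulting list can be passed as-is to `subprocess.run(args, capture_output=True, text=True)`. Whether cfexecute can take a similar argument array is something to verify in the ColdFusion documentation.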

Confluent schema-registry how to http post json-schema

Confluent 5.5.0 understands not just Avro schemas, but also JSON Schema and Protobuf. I have a valid JSON Schema that I'm trying to curl to the schema registry server, but this request:
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
--data @$tmpfile ${schemaregistry}/subjects/${topic}-value/versions?schemaType=JSONSCHEMA
keeps returning:
{"error_code":42201,"message":"Either the input schema or one its references is invalid"}
The manual is unclear about how to use the schemaType parameter. I've tried it as a query parameter, as a field in the JSON, ...
The $tmpfile I'm posting is a json with one top-level field named schema that contains a quote-escaped json-schema. The same mechanism works perfectly for Avro schemas.
Looking in the logging from the schema registry, I see that it tries to parse the provided data as an Avro schema, so no wonder it fails.
Any help? And Confluent: please clarify and fix your documentation!

Ah I got it. The documentation is unclear and wrong!
You have to add a field to the posted JSON. The field name is schemaType, and its value must be JSON, not JSONSCHEMA (as the documentation says).
For others, here's an example that shows how to register local files containing an Avro and a JSON schema with the schema registry:
#!/bin/bash
schemaregistry="$1"
tmpfile=$(mktemp)
topic=avro-topic
export SCHEMA=$(cat schema.avsc)
echo '{"schema":""}' | jq --arg schema "$SCHEMA" '.schema = $schema' \
> $tmpfile
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
--data @$tmpfile ${schemaregistry}/subjects/${topic}-value/versions
topic=json-topic
export SCHEMA=$(cat schema.json)
echo '{"schema":"","schemaType":"JSON"}' | jq --arg schema "$SCHEMA" '.schema = $schema' \
> $tmpfile
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
--data @$tmpfile ${schemaregistry}/subjects/${topic}-value/versions
rm $tmpfile
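For those who prefer Python over jq, here is a minimal sketch that builds the same request body. The function name is hypothetical; it relies only on what this thread reports: Avro is used when schemaType is omitted, and a JSON Schema needs "schemaType": "JSON".

```python
import json

# Types reported by GET /schemas/types on Schema Registry 5.5
SUPPORTED_TYPES = ("AVRO", "JSON", "PROTOBUF")


def build_registration_payload(schema_text: str, schema_type: str = "AVRO") -> str:
    """Wrap a schema (as raw text) in the JSON body expected by POST /subjects/<subject>/versions."""
    if schema_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported schemaType: {schema_type}")
    # json.dumps quote-escapes the embedded schema, like jq --arg does in the script above
    payload = {"schema": schema_text}
    if schema_type != "AVRO":
        payload["schemaType"] = schema_type
    return json.dumps(payload)
```

Write the result to a file and post it with `--data @payload.json`, exactly as in the shell script. Rejecting JSONSCHEMA up front avoids the confusing 42201 error.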

According to the Confluent schema registry API documentation one can check the supported schema types by calling
curl --silent -X GET http://localhost:8081/schemas/types
This should result in
["JSON","PROTOBUF","AVRO"]
for the current version (5.5), so the output can help you set a valid schemaType attribute.
Nonetheless, as @bart van deenen pointed out here, the API doc is (still) wrong.

How to use markdown for description from a file in gitlab CI using release API

I'm using the Gitlab release API in the gitlab-ci.yml to be able to automatically create a new release when deploying.
Simply making a curl request like the one in the docs works just fine. For the description, the docs state that markdown is allowed, which is great. However, I can't figure out how to load the description from a markdown file within the curl request. I've already tried storing the content of the markdown file in a variable in the gitlab-ci.yml before the curl call, and then passing and expanding it within the curl like so:
# gitlab-ci.yml
...
- DESCRIPTION=`cat ./description.md`
I've also tried putting cat ./description.md directly in the curl request as the value of "description".
Here is the example from the docs:
curl --header 'Content-Type: application/json' --header "PRIVATE-TOKEN: gDybLx3yrUK_HLp3qPjS" \
--data '{ "name": "New release", "tag_name": "v0.3", "description": "Super nice release", "milestones": ["v1.0", "v1.0-rc"], "assets": { "links": [{ "name": "hoge", "url": "https://google.com" }] } }' \
--request POST https://gitlab.example.com/api/v4/projects/24/releases
And for the "description" key I would like to pass the contents of a markdown file as the value.
I was surprised to not have found a post or discussion about this already, so I suspect I'm either missing something (very basic/obvious) or folks don't really use this function (yet)?
Any help will be much appreciated.

Using the variable like you did, this .gitlab-ci.yml works:
create_release:
  script:
    - DESCRIPTION=$(cat description.md)
    - |
      curl --silent --request POST --header "Content-Type:application/json" \
        --header "PRIVATE-TOKEN: TOKEN" \
        --data '{"name":"New release","tag_name":"v0.3", "description":"'"$DESCRIPTION"'","assets":{"links":[{"name":"hoge","url":"https://google.com"}]}}' \
        https://gitlab.bankassembly.com/api/v4/projects/369/releases
The variable is expanded inside double quotes (see https://superuser.com/a/835589).
Example of the content of my description.md :
## CHANGELOG\r\n\r\n- Escape label and milestone titles to prevent XSS in GFM autocomplete. !2740\r\n- Prevent private snippets from being embeddable.\r\n- Add subresources removal to member destroy service.
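Splicing $DESCRIPTION into a JSON string only works while the file contains no raw line breaks or double quotes, which is why the example file above uses literal \r\n sequences. A minimal sketch of an alternative, assuming Python is available in the CI image: let json.dumps do the escaping. The function names are hypothetical.

```python
import json
from pathlib import Path


def load_description(path: str) -> str:
    """Read the markdown release notes as-is (real newlines and quotes allowed)."""
    return Path(path).read_text(encoding="utf-8")


def build_release_payload(name: str, tag_name: str, description: str) -> str:
    """Return a JSON body for the GitLab release API with the description safely escaped."""
    return json.dumps({"name": name, "tag_name": tag_name, "description": description})
```

For example, write `build_release_payload("New release", "v0.3", load_description("description.md"))` to release.json in one script step and send it with `curl ... --data @release.json` in the next.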

Any way to use presigned URL uploads and enforce tagging?

Is there any way to issue a presigned URL to a client to upload a file to S3, and ensure that the uploaded file has certain tags? Using the Python SDK here as an example, this generates a URL as desired:
s3.generate_presigned_url('put_object',
ExpiresIn=3600,
Params=dict(Bucket='foo',
Key='bar',
ContentType='text/plain',
Tagging='foo=bar'))
This is satisfactory when uploading while explicitly providing tags:
$ curl 'https://foo.s3.amazonaws.com/bar?AWSAccessKeyId=...&Signature=...&content-type=text%2Fplain&x-amz-tagging=foo%3Dbar&Expires=1538404508' \
-X PUT \
-H 'Content-Type: text/plain' \
-H 'x-amz-tagging: foo=bar' \
--data-binary foobar
However, S3 also accepts the request when omitting -H 'x-amz-tagging: foo=bar', which uploads the object without tags. Since I don't have control over the client, that's… bad.
I've tried creating an empty object first and tagging it, then issuing the presigned URL to it, but PUTting the object replaces it entirely, including removing any tags.
I've tried issuing a presigned POST URL, but that doesn't seem to support the tagging parameter at all:
s3.generate_presigned_post('foo', 'bar', {'tagging': '<Tagging><TagSet><Tag><Key>Foo</Key><Value>Bar</Value></Tag></TagSet></Tagging>'})
$ curl https://foo.s3.amazonaws.com/ \
-F key=bar \
-F 'tagging=<Tagging><TagSet><Tag><Key>Foo</Key><Value>Bar</Value></Tag></TagSet></Tagging>' \
-F AWSAccessKeyId=... \
-F policy=... \
-F signature=... \
-F file=@/tmp/foo
<Error><Code>AccessDenied</Code><Message>Invalid according to Policy:
Extra input fields: tagging</Message>...
I simply want to let a client upload a file directly to S3, and ensure that it's tagged a certain way in the process. Any way to do that?

Try the following code:
import copy
import boto3

s3_client = boto3.client("s3")
fields = {
    "x-amz-meta-u1": "value1",
    "x-amz-meta-u2": "value2"
}
conditions = [
    {"x-amz-meta-u1": "value1"},
    {"x-amz-meta-u2": "value2"}
]
presignedurl = s3_client.generate_presigned_post(
    Bucket="YOUR_BUCKET_NAME",
    Key="YOUR_OBJECT_KEY",  # placeholder object key
    Fields=copy.deepcopy(fields),
    Conditions=copy.deepcopy(conditions)
)

Python code:
import copy
import boto3

s3_client = boto3.client("s3")
fields = {
    'tagging': '<Tagging><TagSet><Tag><Key>Foo</Key><Value>Bar</Value></Tag></TagSet></Tagging>',
}
conditions = [
    {'tagging': '<Tagging><TagSet><Tag><Key>Foo</Key><Value>Bar</Value></Tag></TagSet></Tagging>'}
]
presigned_url = s3_client.generate_presigned_post(
    Bucket="foo",
    Key="file/key.json",
    Fields=copy.deepcopy(fields),
    Conditions=copy.deepcopy(conditions)
)
curl command:
$ curl -v --form-string "tagging=<Tagging><TagSet><Tag><Key>Foo</Key><Value>Bar</Value></Tag></TagSet></Tagging>" \
-F key=file/key.json \
-F x-amz-algorithm=... \
-F x-amz-credential=... \
-F x-amz-date=... \
-F x-amz-security-token=... \
-F policy=... \
-F x-amz-signature=... \
-F file=@key.json \
https://foo.s3.amazonaws.com/
Explanation
It is imperative to use --form-string in the curl command; otherwise curl will interpret the =< as an instruction to read the field's content from a file!
Also make sure that key.json is in your current working directory so that curl can upload it to S3 using the presigned URL.
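Because the tagging XML must appear identically in Fields and in Conditions, it can help to generate it once from a dict. A minimal sketch (the helper name is hypothetical) that uses xml.etree.ElementTree so characters such as & are escaped; it does not call AWS itself.

```python
import xml.etree.ElementTree as ET


def build_tagging_post_fields(tags: dict[str, str]) -> tuple[dict[str, str], list[dict[str, str]]]:
    """Return (fields, conditions) that pin the S3 object's tags for a presigned POST."""
    root = ET.Element("Tagging")
    tag_set = ET.SubElement(root, "TagSet")
    for key, value in tags.items():
        tag = ET.SubElement(tag_set, "Tag")
        ET.SubElement(tag, "Key").text = key
        ET.SubElement(tag, "Value").text = value
    # encoding="unicode" returns a str without an XML declaration
    xml = ET.tostring(root, encoding="unicode")
    # Exact-match condition: the policy rejects uploads whose tagging field differs
    return {"tagging": xml}, [{"tagging": xml}]
```

Usage: `fields, conditions = build_tagging_post_fields({"Foo": "Bar"})`, then `s3_client.generate_presigned_post(Bucket="foo", Key="file/key.json", Fields=fields, Conditions=conditions)`.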

Where do I find the project ID for the GitLab API?

I use GitLab on their servers. I would like to download my latest built artifacts (build via GitLab CI) via the API like this:
curl --header "PRIVATE-TOKEN: 9koXpg98eAheJpvBs5tK" "https://gitlab.com/api/v3/projects/1/builds/8/artifacts"
Where do I find this project ID? Or is this way of using the API not intended for hosted GitLab projects?

I just found an even easier way to get the project id: look at the HTML source of the GitLab page hosting your project. There is a hidden input called project_id, e.g.:
<input type="hidden" name="project_id" id="project_id" value="335" />

As of GitLab 11.4 (the latest version at the time of writing), the Project ID is shown at the top of your repository's front page.

On the Edit Project page, there is a Project ID field in the top right corner.
(You can also see the ID on the CI/CD pipelines page, in the example code of the Triggers section.)
In older versions, you can see it on the Triggers page, in the URLs of the example code.

You can query for your owned projects:
curl -XGET --header "PRIVATE-TOKEN: XXXX" "https://gitlab.com/api/v4/projects?owned=true"
You will receive JSON with each owned project:
[
{
"id":48,
"description":"",
"default_branch":"master",
"tag_list":[
...
You are also able to get the project ID from the triggers configuration in your project which already has some sample code with your ID.
From the Triggers page:
curl -X POST \
-F token=TOKEN \
-F ref=REF_NAME \
https://<GitLab Installation>/api/v3/projects/<ProjectID>/trigger/builds

As mentioned here, all the project scoped APIs expect either an ID or the project path (URL encoded).
So just use https://gitlab.com/api/v4/projects/gitlab-org%2Fgitlab-foss directly when you want to interact with a project.
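The only subtle part is the encoding: the / between namespace and project must become %2F. A minimal sketch using urllib.parse.quote with safe="" (the function name is hypothetical):

```python
from urllib.parse import quote


def project_api_url(base_url: str, project_path: str) -> str:
    """Build a v4 project URL from a 'namespace/project' path instead of a numeric ID."""
    # safe="" makes quote() encode '/' as %2F as well
    return f"{base_url.rstrip('/')}/api/v4/projects/{quote(project_path, safe='')}"
```

`project_api_url("https://gitlab.com", "gitlab-org/gitlab-foss")` gives the URL shown above.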

Enter the project.
In the left-hand menu, click Settings -> General -> expand General Settings.
The Project ID label is next to the project name.
This is on GitLab version 10.2.

Here is a solution that actually solves the problem, using the API to get the project ID for a specific GitLab project:
curl -XGET -H "Content-Type: application/json" --header "PRIVATE-TOKEN: $GITLAB_TOKEN" http://<YOUR-GITLAB-SERVER>/api/v3/projects/<YOUR-NAMESPACE>%2F<YOUR-PROJECT-NAME> | python -mjson.tool
Or maybe you just want the project id:
curl -XGET -H "Content-Type: application/json" --header "PRIVATE-TOKEN: $GITLAB_TOKEN" http://<YOUR-GITLAB-SERVER>/api/v3/projects/<YOUR-NAMESPACE>%2F<YOUR-PROJECT-NAME> | python -c 'import sys, json; print(json.load(sys.stdin)["id"])'
Note that the repository path (namespace/repo name) is URL-encoded.

If you know your project name, you can get the project id by using the following API:
curl --header "Private-Token: <your_token>" -X GET https://gitlab.com/api/v4/projects?search=<exact_project_name>
This will return a JSON that includes the id:
[
{
"id":<project id>, ...
}
]
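The search parameter may return more than one project (for example, projects whose names merely contain the search term), so it can be worth filtering for an exact name on the client side. A minimal sketch that takes the raw JSON response text; the function name is hypothetical.

```python
import json


def exact_project_ids(response_text: str, name: str) -> list[int]:
    """Return the IDs of projects in a /projects?search=... response whose name matches exactly."""
    projects = json.loads(response_text)
    return [p["id"] for p in projects if p.get("name") == name]
```

An empty list means no exact match; more than one ID means the name exists in several namespaces, and path_with_namespace is then the better key.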

Just for the record, in case someone else needs to download artifacts from gitlab.com created via GitLab CI:
Create a private token within your browser
Get the project id via curl -XGET --header "PRIVATE-TOKEN: YOUR_AD_HERE?" "https://gitlab.com/api/v3/projects/owned"
Download the last artifact from your master branch created via a gitlab-ci step called release curl -XGET --header "PRIVATE-TOKEN: YOUR_AD_HERE?" -o myapp.jar "https://gitlab.com/api/v3/projects/4711/builds/artifacts/master/download?job=release"
I am very impressed by the beauty of GitLab.

You can view it under the repository name

You can query projects with search attribute e.g:
http://gitlab.com/api/v3/projects?private_token=xxx&search=myprojectname

As of Gitlab API v4, the following API returns all projects that you own:
curl --header 'PRIVATE-TOKEN: <your_token>' 'https://gitlab.com/api/v4/projects?owned=true'
The response contains the project id. GitLab access tokens can be created on this page: https://gitlab.com/profile/personal_access_tokens

None of the answers suits generic needs; the closest one only works for gitlab.com, not for other GitLab servers. The following can be used, for example, to find the ID of the project named streamer on the GitLab server my-server.com:
$ curl --silent --header 'Authorization: Bearer MY-TOKEN-XXXX' \
'https://my-server.com/api/v4/projects?per_page=100&simple=true'| \
jq -rc '.[]|select(.name|ascii_downcase|startswith("streamer"))'| \
jq .id
168
Note that:
this returns only the first 100 projects; if you have more, request the following pages (&page=2, 3, ...) or use a different API (e.g. groups/:id/projects).
jq is quite flexible. Here we're just filtering for one project, but you can do much more with it.
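The paging advice above can be sketched as a loop that keeps requesting pages until one comes back short. This is a minimal sketch: fetch_page is a hypothetical callable you would implement with your HTTP client (calling /api/v4/projects?per_page=...&page=...), and the stop rule is a simple heuristic; GitLab also returns pagination headers such as X-Next-Page that you could use instead.

```python
from typing import Callable, Iterable, Iterator


def iter_all_projects(fetch_page: Callable[[int, int], list[dict]], per_page: int = 100) -> Iterator[dict]:
    """Yield projects from every page, stopping at the first page shorter than per_page."""
    page = 1
    while True:
        items = fetch_page(page, per_page)
        yield from items
        if len(items) < per_page:
            break
        page += 1


def find_ids_by_prefix(projects: Iterable[dict], prefix: str) -> list[int]:
    """Same filter as the jq example: case-insensitive name prefix match."""
    return [p["id"] for p in projects if p["name"].lower().startswith(prefix.lower())]
```

Keeping the HTTP call behind fetch_page lets you test the paging logic without a server.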

There appears to be no way to retrieve only the project ID using the GitLab API. Instead, retrieve all of the owner's projects, find the one that matches, and return its ID. I wrote a script to get the project ID:
#!/bin/bash
projectName="$1"
namespace="$2"
default=$(sudo cat .namespace)
namespace="${namespace:-$default}"
# Fetch the projects owned by the token's user (first page only, 20 projects by default)
json=$(curl --silent --header "PRIVATE-TOKEN: $(sudo cat .token)" -X GET \
'https://gitlab.com/api/v4/projects?owned=true')
# Compare "namespace/project" paths case-insensitively
pathToMatch=$(echo "$namespace/$projectName" | tr '[:upper:]' '[:lower:]')
id=$(echo "$json" | jq -r --arg path "$pathToMatch" \
'.[] | select((.path_with_namespace | ascii_downcase) == $path) | .id')
if [[ -n "$id" ]]; then
  echo "$id"
  exit 0
fi
echo 'Error! Could not retrieve projectID.'
exit 1
It expects the default namespace to be stored in a file .namespace and the private token in a file .token. For increased security, it's best to run chmod 000 .token; chmod 000 .namespace; chown root .namespace; chown root .token

If your project name is unique, it is handy to follow shunya's answer and search by name (see the API doc).
If you have a token with broader access and the GitLab instance contains several projects with the same name in different groups, searching within a group is more convenient (API doc here), e.g.:
curl --header "PRIVATE-TOKEN: <token>" -X GET "https://gitlab.com/api/v4/groups/<group_id>/search?scope=projects&search=<project_name>"
The group ID can be found on the group's Settings page.
And to extract the project id from the output, you can do:
curl --header "PRIVATE-TOKEN: <token>" -X GET "https://gitlab.com/api/v4/groups/<group_id>/search?scope=projects&search=<project_name>" | jq '.[0].id'

To get the IDs of all your projects, use:
curl --header 'PRIVATE-TOKEN: XXXXXXXXXXXXXXXXXXXXXXX' 'https://gitlab.com/api/v4/projects?owned=true' > curloutput
grep -oPz 'name\":\".*?\"|{\"id\":[0-9]+' curloutput | sed 's/{\"/\n/g' | sed 's/name//g' |sed 's/id\"://g' |sed 's/\"//g' | sort -u -n

Not specific to the question, but since I somehow ended up here, this might help others.
I used Chrome to get a project ID:
Go to the desired project, for example gitlab.com/username/project1
Open the developer tools and inspect the Network tab
Look at the first GraphQL request in the Network tab

You can search for the project path
curl -s 'https://gitlab.com/api/v4/projects?search=my/path/to/my/project&search_namespaces=true' --header "PRIVATE-TOKEN: $GITLAB_TOKEN" |python -mjson.tool |grep \"id\"
https://docs.gitlab.com/ee/api/projects.html
This should match only your project rather than other, unrelated projects.

My favorite method is to take it from the CI/CD pipeline, so that the project ID is assigned dynamically at build time.
Simply set a variable in your code to the predefined CI_PROJECT_ID variable.
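Inside a GitLab CI job this can look like the following minimal sketch. It reads the predefined CI_PROJECT_ID and CI_API_V4_URL variables; the function names are hypothetical, and passing a mapping instead of always using os.environ keeps it testable outside CI.

```python
import os
from typing import Mapping


def current_project_id(env: Mapping[str, str] = os.environ) -> int:
    """Return the numeric ID of the project the CI job runs for."""
    return int(env["CI_PROJECT_ID"])


def current_project_api_url(env: Mapping[str, str] = os.environ) -> str:
    """Build the v4 API URL of the current project from predefined CI variables."""
    return f"{env['CI_API_V4_URL']}/projects/{current_project_id(env)}"
```

Outside a pipeline these variables are not set, so both functions raise KeyError there.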

We Keep Coding
