Importing Data to Contentful programatically from a json file - api

I am trying to import some data programatically into contentful:
I am following the docs here
And running the command inside my integrated terminal
contentful space import --config config.json
Where the config file is
{
"spaceId": "abc123",
"managementToken": "112323132321adfWWExample",
"contentFile": "./dataToImport.json"
}
And the dataToImport.json file is
{
"data": [
{
"address": "11234 New York City"
},
{
"address": "1212 New York City"
}
]
}
The thing is I don't understand what format my dataToImport.json should be and what is missing inside this file or in my config file so that the array of addresses from the .json file get added as new entries to an already created content model inside the Contentful UI show in the screenshot below
I am not specifying the content model for the data to go into so I believe that is one issue, and I don't know how I do that. An example or repo would help me out greatly

The types of data you can import are listed : in their documentation
your json top level should say "entries" and not data, if new content of a content type is what you would like to import.
This is an example of a blog post as per content model of the tutorial they provide.
The only thing i didn't work out yet is where the user id is :D so i substituted for one of the content type 'person' also provided in their tutorial (I think it's called Gatsby Starter)
{"entries": [
{
"sys": {
"space": {
"sys": {
"type": "Link",
"linkType": "Space",
"id": "theSpaceIdToReceiveYourImport"
}
},
"type": "Entry",
"createdAt": "2019-04-17T00:56:24.722Z",
"updatedAt": "2019-04-27T09:11:56.769Z",
"environment": {
"sys": {
"id": "master",
"type": "Link",
"linkType": "Environment"
}
},
"publishedVersion": 149, -- these are not compulsory, you can skip
"publishedAt": "2019-04-27T09:11:56.769Z", -- you can skip
"firstPublishedAt": "2019-04-17T00:56:28.525Z", -- you can skip
"publishedCounter": 3, -- you can skip
"version": 150,
"publishedBy": { -- this is an example of a linked content
"sys": {
"type": "Link",
"linkType": "person",
"id": "personId"
}
},
"contentType": {
"sys": {
"type": "Link",
"linkType": "ContentType",
"id": "blogPost" -- here should be your content type 'RealtorProperties'
}
}
},
"fields": { -- here should go your content type fields, i can't see it in your post
"title": {
"en-US": "Test 1"
},
"slug": {
"en-US": "Test-1"
},
"description": {
"en-US": "some description"
},
"body": {
"en-US": "some body..."
},
"publishDate": {
"en-US": "2016-12-19"
},
"heroImage": { -- another example of a linked content
"en-US": {
"sys": {
"type": "Link",
"linkType": "Asset",
"id": "idOfTHisImage"
}
}
}
}
},
--another entry, ...]}

Have a look at this repo. I am also trying to figure this out. Looks like there's quite a lot of fields that need to be included in the json file. I was hoping there'd be a simple solution but it seems you (me too actually) will need to create scripts to "convert" your json file to data contentful can read and import.
I'll let you know if I find anything better.

Related

GraphQL query - Query by ID

I have installed the strapi-starter-blog locally and I'm trying to understand how I can query article by ID (or slug). When I open the GraphQL Playground, I can get all the article using:
query Articles {
articles {
id
title
content
image {
url
}
category {
name
}
}
}
The response is:
{
"data": {
"articles": [
{
"id": "1",
"title": "Thanks for giving this Starter a try!",
"content": "\n# Thanks\n\nWe hope that this starter will make you want to discover Strapi in more details.\n\n## Features\n\n- 2 Content types: Article, Category\n- Permissions set to 'true' for article and category\n- 2 Created Articles\n- 3 Created categories\n- Responsive design using UIkit\n\n## Pages\n\n- \"/\" display every articles\n- \"/article/:id\" display one article\n- \"/category/:id\" display articles depending on the category",
"image": {
"url": "/uploads/blog_header_network_7858ad4701.jpg"
},
"category": {
"name": "news"
}
},
{
"id": "2",
"title": "Enjoy!",
"content": "Have fun!",
"image": {
"url": "/uploads/blog_header_balloon_32675098cf.jpg"
},
"category": {
"name": "trends"
}
}
]
}
}
But when I try to get the article using the ID with variable, like here github code in the GraphQL Playground with the following
Query:
query Articles($id: ID!) {
articles(id: $id) {
id
title
content
image {
url
}
category {
name
}
}
}
Variables:
{
"id": 1
}
I get an error:
...
"message": "Unknown argument \"id\" on field \"articles\" of type \"Query\"."
...
What is the difference and why can't I get the data like in the example of the Github repo.
Thanks for your help.
It's the difference between articles and article as the query. If you use the singular one you can use the ID as argument

How to use definitions from external files in JSON Schema?

I'm trying to import the definitions from another json schema using $ref but getting the following error:
can't resolve reference ../base/definitions.schema.json#/definitions/datetime from id #
{
"$schema": "http://json-schema.org/draft-06/schema#",
"definitions": {
"datetime": {
"type": "string"
},
"name": {
"type": "string"
},
}
}
{
"$schema": "http://json-schema.org/draft-06/schema#",
"properties": {
"active": {"type": "boolean"},
"created_at": { "$ref": "../base/definitions.schema.json#/definitions/datetime" },
"name": { "$ref": "../base/base/definitions.schema.json#/definitions/name" },
"updated_at": { "$ref": "../base/definitions.schema.json#/definitions/datetime" }
},
"required": ["name"],
"type": "object"
}
Directory structure:
api
-- base
-- definitions.schema.json
-- country
-- country.schema.json
I have tried several combinations by using an absolute path, a file url and several other combinations of the path. Not sure what's going on.
Schema validator: ajv#5.1.1
You need to add schemas using "addSchema" method. $ref is resolved relative to "id" attribute ("$id" in draft-06), ajv doesn't (and can't) use file paths.
EDIT: added $ref section to docs.

Is it possible to inline JSON schemas into a JSON document? [duplicate]

For example a schema for a file system, directory contains a list of files. The schema consists of the specification of file, next a sub type "image" and another one "text".
At the bottom there is the main directory schema. Directory has a property content which is an array of items that should be sub types of file.
Basically what I am looking for is a way to tell the validator to look up the value of a "$ref" from a property in the json object being validated.
Example json:
{
"name":"A directory",
"content":[
{
"fileType":"http://x.y.z/fs-schema.json#definitions/image",
"name":"an-image.png",
"width":1024,
"height":800
}
{
"fileType":"http://x.y.z/fs-schema.json#definitions/text",
"name":"readme.txt",
"lineCount":101
}
{
"fileType":"http://x.y.z/extended-fs-schema-video.json",
"name":"demo.mp4",
"hd":true
}
]
}
The "pseudo" Schema note that "image" and "text" definitions are included in the same schema but they might be defined elsewhere
{
"id": "http://x.y.z/fs-schema.json",
"definitions": {
"file": {
"type": "object",
"properties": {
"name": { "type": "string" },
"fileType": {
"type": "string",
"format": "uri"
}
}
},
"image": {
"allOf": [
{ "$ref": "#definitions/file" },
{
"properties": {
"width": { "type": "integer" },
"height": { "type": "integer"}
}
}
]
},
"text": {
"allOf": [
{ "$ref": "#definitions/file" },
{ "properties": { "lineCount": { "type": "integer"}}}
]
}
},
"type": "object",
"properties": {
"name": { "type": "string"},
"content": {
"type": "array",
"items": {
"allOf": [
{ "$ref": "#definitions/file" },
{ *"$refFromProperty"*: "fileType" } // the magic thing
]
}
}
}
}
The validation parts of JSON Schema alone cannot do this - it represents a fixed structure. What you want requires resolving/referencing schemas at validation-time.
However, you can express this using JSON Hyper-Schema, and a rel="describedby" link:
{
"title": "Directory entry",
"type": "object",
"properties": {
"fileType": {"type": "string", "format": "uri"}
},
"links": [{
"rel": "describedby",
"href": "{+fileType}"
}]
}
So here, it takes the value from "fileType" and uses it to calculate a link with relation "describedby" - which means "the schema at this location also describes the current data".
The problem is that most validators do not take any notice of any links (including "describedby" ones). You need to find a "hyper-validator" that does.
UPDATE: the tv4 library has added this as a feature
I think cloudfeet answer is a valid solution. You could also use the same approach described here.
You would have a file object type which could be "anyOf" all the subtypes you want to define. You would use an enum in order to be able to reference and validate against each of the subtypes.
If the sub-types schemas are in the same Json-Schema file you don't need to reference the uri explicitly with the "$ref". A correct draft4 validator will find the enum value and will try to validate against that "subschema" in the Json-Schema tree.
In draft5 (in progress) a "switch" statement has been proposed, which will allow to express alternatives in a more explicit way.

Finding similar documents with Elasticsearch

I'm using ElasticSearch to develop service that will store uploaded files or web pages as attachment (file is one field in document). This part works fine as I can search these files using like_text as input. However, the second part of this service should compare the file that is just uploaded with the existing files in order to find duplicates or very similar files, so it doesn't recommend users same files or same web pages. The problem is that I can't get expected results for documents that are the same. Similarity between same files varies, but is never more then 0.4. Even worse, sometimes I get better scores for files which are not the same then for two exactly the same files. The java code give bellow gives me always the set of documents which are in the same order, regardless of the input. It looks like like_text extracted from uploaded file is always the same.
String mapping = copyToStringFromClasspath("/org/prosolo/services/indexing/documents- mapping.json");
byte[] txt = org.elasticsearch.common.io.Streams.copyToByteArray(file);
Client client = ElasticSearchFactory.getClient();
client.admin().indices().putMapping(putMappingRequest(indexName).type(indexType).source(mapping)).actionGet();
IndexResponse iResponse = client.index(indexRequest(indexName).type(indexType)
.source(jsonBuilder()
.startObject()
.field("file", txt)
.field("title",title)
.field("visibility",visibilityType.name().toLowerCase())
.field("ownerId",ownerId)
.field("description",description)
.field("contentType",DocumentType.DOCUMENT.name().toLowerCase())
.field("dateCreated",dateCreated)
.field("url",link)
.field("relatedToType",relatedToType)
.field("relatedToId",relatedToId)
.endObject()))
.actionGet();
client.admin().indices().refresh(refreshRequest()).actionGet();
MoreLikeThisRequestBuilder mltRequestBuilder=new MoreLikeThisRequestBuilder(client, ESIndexNames.INDEX_DOCUMENTS, ESIndexTypes.DOCUMENT, iResponse.getId());
mltRequestBuilder.setField("file");
SearchResponse response = client.moreLikeThis(mltRequestBuilder.request()).actionGet();
SearchHits searchHits= response.getHits();
System.out.println("getTotalHits:"+searchHits.getTotalHits());
Iterator<SearchHit> hitsIter=searchHits.iterator();
while(hitsIter.hasNext()){
SearchHit searchHit=hitsIter.next();
System.out.println("FOUND DOCUMENT:"+searchHit.getId()+" title:"+searchHit.getSource().get("title")+" score:"+searchHit.score());
}
And the query from browser which looks like:
http://localhost:9200/documents/document/m2HZM3hXS1KFHOwvGY1pVQ/_mlt?mlt_fields=file&min_doc_freq=1
Gives me results:
{"took":120,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},
"hits":{"total":4,
"max_score":0.41059873,
"hits":
[{"_index":"documents","_type":"document",
"_id":"gIe6NDEWRXWTMi4kMPRbiQ",
"_score":0.41059873,
"_source" :
{"file":"PCFET0NUWVBFIGh..._skiping_the_file_content_here...",
"title":"Univariate Analysis",
"visibility":"public",
"description":"Univariate Analysis Simple Tools for Description ",
"contentType":"webpage",
"dateCreated":"null",
"url":"http://www.slideshare.net/christineshearer/univariate-analysis"}}
This is exactly the same web page, so I'm expecting the score to be 1.0, not 0.41 as there is not difference between two documents except in _id. The results are even worse with files.
Mapping I was using is:
{
"document":{
"properties":{
"title":{
"type":"string",
"store":true
},
"description":{
"type":"string",
"store":"yes"
},
"contentType":{
"type":"string",
"store":"yes"
},
"dateCreated":{
"store":"yes",
"type":"date"
},
"url":{
"store":"yes",
"type":"string"
},
"visibility": {
"store":"yes",
"type":"string"
},
"ownerId": {
"type": "long",
"store":"yes"
},
"relatedToType": {
"type": "string",
"store":"yes"
},
"relatedToId": {
"type": "long",
"store":"yes"
},
"file":{
"path": "full",
"type":"attachment",
"fields":{
"author": {
"type": "string"
},
"title": {
"store": true,
"type": "string"
},
"keywords": {
"type": "string"
},
"file": {
"store": true,
"term_vector": "with_positions_offsets",
"type": "string"
},
"name": {
"type": "string"
},
"content_length": {
"type": "integer"
},
"date": {
"format": "dateOptionalTime",
"type": "date"
},
"content_type": {
"type": "string"
}
} } } } }
Does anyone have an idea what could be wrong here?
Thanks

Specific analyzers for sub-documents in lucene / elasticsearch

After reading the documentation, testing and reading a lot of other questions here on stackoverflow:
We have documents that have titles and description in multiple languages. There are also tags that are translated to the same languages. There might be up to 30-40 different languages in the system, but probably only 3 or 4 translations for a single document.
This is the planned document structure:
{
"luck": {
"id": 10018,
"pub": 0,
"pr": 100002,
"loc": {
"lat": 42.7,
"lon": 84.2
},
"t": [
{
"lang": "en-analyzer",
"title": "Forest",
"desc": "A lot of trees.",
"tags": [
"Wood",
"Nature",
"Green Mouvement"
]
},
{
"lang": "fr-analyzer",
"title": "ForĂȘt",
"desc": "A grand nombre d'arbre.",
"tags": [
"Bois",
"Nature",
"Mouvement Vert"
]
}
],
"dates": [
"2014-01-01T20:00",
"2014-06-06T20:00",
"2014-08-08T20:00"
]
}
}
Possible queries are "arbre" or "wood" or "forest" or "nature" combined with a date and a geo_distance filter, furthermore there will be some facets over the tags array (that obviously include counting).
We can produce any document structure that fits best for elasticsearch (or for lucene). It's crucial that each language is analyzed specifically, so we use "_analyzer" in order to distinguish the languages.
{
"luck": {
"properties": {
"id": {
"type": "long"
},
"pub": {
"type": "long"
},
"pr": {
"type": "long"
},
"loc": {
"type": "geo_point"
},
"t": {
"_analyzer": {
"path": "t.lang"
},
"properties": {
"lang": {
"type": "string"
},
"properties": {
"title": {
"type": "string"
},
"desc": {
"type": "string"
},
"tags": {
"type": "string"
}
}
}
}
}
}
A) Apparently, this idea does not work: after PUTting the mapping, we retrieve the same mapping ("GET") and it seems to ignore the specific analyzers (A test with a top-level "_analyzer" worked fine). Does "_analyzer" work for sub-documents and if yes how to should we refer to it? We also tested declaring the sub-document as "object" or "nested". How is multi-language document indexing supposed to work.
B) One possibility would be to put each language in its own document: In that case how do we manage the id? Finally both documents should refer to the same id. For example if the user searches for "nature" (and we don't know if the user intends to find "nature" in English or French), this document would appear twice in the result set, and the counting and paging would be very wrong (also facet counting).
Any ideas?