Query a TriG file with SPARQL

I have a .trig file which I want to query without pushing it to Jena Fuseki.
However, when I try to load the model using:
Model model = FileManager.get().loadModel("filepath/demo.trig");
certain links in the original TriG file are getting lost.
This is the code snippet:
FileManager.get().addLocatorClassLoader(RDFProject.class.getClassLoader());
Model model = FileManager.get().loadModel("filePath/demo.trig");
model.write(System.out);
Is there any alternate way to do this?

Use RDFDataMgr to load a dataset (not a model) and query that.
Dataset ds = RDFDataMgr.loadDataset("filepath/demo.trig");
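For example, here is a minimal sketch of querying the loaded dataset with SPARQL (assuming Apache Jena 3.x package names; the SELECT query and the class name are placeholders, not from the original post):

import org.apache.jena.query.Dataset;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSet;
import org.apache.jena.query.ResultSetFormatter;
import org.apache.jena.riot.RDFDataMgr;

public class TrigQueryExample {
    public static void main(String[] args) {
        // Load the default graph and all named graphs from the TriG file
        Dataset ds = RDFDataMgr.loadDataset("filepath/demo.trig");

        // Placeholder query: list a few triples from every named graph
        String queryString = "SELECT * WHERE { GRAPH ?g { ?s ?p ?o } } LIMIT 10";

        QueryExecution qe = QueryExecutionFactory.create(queryString, ds);
        ResultSet results = qe.execSelect();
        ResultSetFormatter.out(System.out, results);
        qe.close();
    }
}

Loading a Dataset rather than a Model preserves the named graphs in the TriG file, which is why the GRAPH pattern is usable here.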

How do I set the region for BigQuery ML.FORECAST()?

Using BigQuery ML in a local Jupyter notebook (%%bigquery), I create a model in an EU dataset:
CREATE OR REPLACE MODEL mydataset.mymodel...
It evaluates fine:
ML.EVALUATE(MODEL mydataset.mymodel)...
but when I try to predict:
ML.FORECAST(MODEL mydataset.mymodel,...
I get:
Dataset myproject:mydataset was not found in location US
Why is FORECAST so xenophobic and how can I make it right?
I got rid of %%bigquery and instead used:
client = bigquery.Client(location=REGION)
Try the code below. I suspect the project ID is missing from your model path:
CREATE OR REPLACE MODEL myproject.mydataset.mymodel...
ML.EVALUATE(MODEL myproject.mydataset.mymodel)...
ML.FORECAST(MODEL myproject.mydataset.mymodel,...

Is there a way to execute a text Gremlin query with PartitionStrategy?

I'm looking for an implementation to run a text query, e.g. "g.V().limit(1).toList()", while using the PartitionStrategy in Apache TinkerPop.
I'm attempting to build a REST interface to run queries on selected graph partitions only. I know how to run a raw query using Client, but I'm looking for an implementation where I can create a multi-tenant graph (https://tinkerpop.apache.org/docs/current/reference/#partitionstrategy) and query only selected tenants using a raw text query instead of a GLV. I'm able to query only selected partitions using gremlin-python, but there is no reference implementation I could find to run a text query on a tenant.
Here is the tenant query implementation:
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.strategies import PartitionStrategy

connection = DriverRemoteConnection('ws://megamind-ws:8182/gremlin', 'g')
g = traversal().withRemote(connection)
partition = PartitionStrategy(partition_key="partition_key",
                              write_partition="tenant_a",
                              read_partitions=["tenant_a"])
partitioned_g = g.withStrategies(partition)
x = partitioned_g.V().limit(1).next()  # query on the selected partition only
Here is how I execute a raw query on the entire graph, but I'm looking for an implementation to run text-based queries on only selected partitions.
from gremlin_python.driver import client
client = client.Client('ws://megamind-ws:8182/gremlin', 'g')
results = client.submitAsync("g.V().limit(1).toList()").result().one()  # runs on the entire graph
print(results)
client.close()
Any suggestions appreciated. TIA
It depends on how the backend store handles text-mode queries, but for the query itself you essentially just need to use the Groovy/Java style formulation. This will work with Gremlin Server and Amazon Neptune; for other backends you will need to make sure that this syntax is supported. So from Python you would use something like:
client.submit('''
    g.withStrategies(new PartitionStrategy(partitionKey: "_partition",
                                           writePartition: "b",
                                           readPartitions: ["b"])).V().count()''')

How to use slim.dataset_data_provider when I have multiple TFRecords?

I am using slim.dataset_data_provider. For example,
my_dataset = slim.dataset.Dataset(
    data_sources='datasets/my_data.tfrecord',
    reader=reader,
    decoder=decoder,
    ...)
provider = slim.dataset_data_provider.DatasetDataProvider(
    my_dataset,
    ...)
I found this very convenient. However, my_data.tfrecord is now around 15 GB, and I am supposed to receive more data. Instead of re-creating one huge TFRecord file, I want to keep several TFRecord files such as my_data_A.tfrecord, my_data_B.tfrecord, and so on.
If I have multiple TFRecord files, how can I use slim.dataset_data_provider? Or is there another way to do this?
After experimenting, I found that one can pass several TFRecord files, such as:
my_dataset = slim.dataset.Dataset(
    data_sources=['a.tfrecord', 'b.tfrecord'],
    reader=reader,
    decoder=decoder,
    ...)

Converting map into tuple

I am loading an HBase table using Pig.
product = LOAD 'hbase://product' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('a:*', '-loadKey true') AS (id:bytearray, a:map[]);
The relation product has tuples that contain a map. I want to convert the map data into individual tuples.
Here is a sample:
grunt> dump product;
06:177602927,[cloud_service#true,wvilnk#true,cmpgeo#true,cmplnk#true,webvid_standard#true,criteria_search#true,typeahead_search#true,aasgbr#true,lnkmin#false,aasdel#true,aasmcu#true,aasvia#true,lnkalt#false,aastlp#true,cmpeel#true,aasfsc#true,aasser#true,aasdhq#true,aasgbm#true,gboint#true,lnkupd#true,aasbig#true,webvid_basic#true,cmpelk#true]
06:177927527,[cloud_service#true,wvilnk#true,cmpgeo#true,cmplnk#true,webvid_standard#true,criteria_search#true,typeahead_search#true,aasgbr#false,lnkmin#false,aasdel#false,aasmcu#false,aasvia#false,lnkalt#false,aastlp#true,cmpeel#true,aasfsc#false,aasser#false,aasdhq#true,aasgbm#false,gboint#true,lnkupd#true,aasbig#false,webvid_basic#true,cmpelk#true,blake#true]
I want to convert each tuple into individual records like below:
177602927,cloud_service,true
177602927,wvilnk,true
177602927,cmpgeo,true
177602927,cmplnk,true
I am pretty new to Pig and this is perhaps my first time doing anything with Pig Latin. Any help is much appreciated.
I was able to find a fix for my problem.
I used a UDF called MapEntriesToBag, which converts all the maps into bags.
Here is my code:
> REGISTER /your/path/to/this/Jar/Pigitos-1.0-SNAPSHOT.jar;
> DEFINE MapEntriesToBag pl.ceon.research.pigitos.pig.udf.MapEntriesToBag();
> product = LOAD 'hbase://product' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('a:*', '-loadKey true') AS (id:bytearray, a:map[]);
> b = FOREACH product GENERATE FLATTEN(SUBSTRING($0,3,12)), FLATTEN(MapEntriesToBag($1));
The UDF is available in the jar Pigitos-1.0-SNAPSHOT.jar. You can download this jar from here.
For more information you can refer to this link. It has more interesting UDFs related to the map datatype.

Jena SPARQL API using an inference rules file

I am working with the Jena SPARQL API, and I want to execute queries on my RDF files after applying inference rules. I created a .rul file that contains all my rules; now I want to run those rules and execute my queries. When I used OWL, I proceeded this way:
OntModel model1 = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_MICRO_RULE_INF);
// read the RDF/XML file
model1.read( "./files/ontology.owl", "RDF/XML" );
model1.read( "./files/data.rdf", "RDF/XML" );
// Create a new query
String queryString =
".....my query";
Query query = QueryFactory.create(queryString);
QueryExecution qe = QueryExecutionFactory.create(query, model1);
ResultSet results = qe.execSelect();
ResultSetFormatter.out(System.out, results, query);
I want to do the same thing with inference rules, i.e., load my .rul file like this:
model1.read( "./files/rules.rul", "RDF/XML" );
This didn't work with .rul files; the rules were not executed. Any ideas on how to load a .rul file? Thanks in advance.
Jena rules aren't RDF, and you don't read them into a model.
RDFS is RDF, and it is implemented internally using rules.
To build an inference model:
Model baseData = ... ;
List<Rule> rules = Rule.rulesFromURL("file:YourRulesFile");
Reasoner reasoner = new GenericRuleReasoner(rules);
Model infModel = ModelFactory.createInfModel(reasoner, baseData);
See ModelFactory for other ways to build models (e.g., RDFS inference) directly.
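For reference, here is a minimal end-to-end sketch combining this with the query code from the question (assuming Apache Jena 3.x package names; the SELECT query is a placeholder, and the file paths are reused from the question):

import java.util.List;

import org.apache.jena.query.Query;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.ResultSet;
import org.apache.jena.query.ResultSetFormatter;
import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.reasoner.Reasoner;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;

public class RuleInferenceQuery {
    public static void main(String[] args) {
        // Load the base RDF data
        Model baseData = ModelFactory.createDefaultModel();
        baseData.read("./files/data.rdf", "RDF/XML");

        // Parse the rules file and wrap the data in an inference model
        List<Rule> rules = Rule.rulesFromURL("file:./files/rules.rul");
        Reasoner reasoner = new GenericRuleReasoner(rules);
        InfModel infModel = ModelFactory.createInfModel(reasoner, baseData);

        // Query the inference model just like the OWL example in the question
        String queryString = "SELECT * WHERE { ?s ?p ?o } LIMIT 10"; // placeholder query
        Query query = QueryFactory.create(queryString);
        QueryExecution qe = QueryExecutionFactory.create(query, infModel);
        ResultSet results = qe.execSelect();
        ResultSetFormatter.out(System.out, results, query);
        qe.close();
    }
}

Note that Rule.rulesFromURL parses the Jena rules syntax directly, which is why the .rul file is never read into a model.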