How to export N-Quads from Blazegraph - sparql

I want to export all data from Blazegraph in the typical formats. I've found a curl command for the triple formats, and another for the XML/JSON/... SPARQL result formats:
curl -X POST http://localhost:9999/bigdata/namespace/YOUR_NAMESPACE/sparql --data-urlencode 'query=SELECT * WHERE{ ?s ?p ?o } LIMIT 1' --data-urlencode 'format=json' > outputfile
However, what I really need is an export in N-Quads format; the graph information is crucial for my systems. How do I do this?
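A sketch of one possible workaround that stays within plain SPARQL 1.1 (so it assumes nothing Blazegraph-specific beyond the endpoint URL already used above): select the quads explicitly and request the tab-separated results format, whose term encoding is close to N-Quads. This is an approximation rather than a real N-Quads writer.

# Hedged sketch: dump all named-graph statements as SPARQL TSV results and roughly
# reshape them into N-Quads. Assumes the endpoint honours Accept: text/tab-separated-values;
# default-graph triples and any abbreviated literals (numbers, booleans) need extra handling.
curl -X POST http://localhost:9999/bigdata/namespace/YOUR_NAMESPACE/sparql \
  -H 'Accept: text/tab-separated-values' \
  --data-urlencode 'query=SELECT ?s ?p ?o ?g WHERE { GRAPH ?g { ?s ?p ?o } }' \
  | awk 'NR > 1 { print $0 " ." }' > export.nq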

Related

How can I get the media files of a product in Shopware 6 using store-api?

We have a problem with the store-api. When I request a specific product, the media is empty. I need all of the product's images. Could you help me?
Hm, it would be best to start any question with what you have attempted already, to see if you are missing some part of the puzzle.
Here is a short gist of the request you need:
curl --location --request POST 'http://website.com/store-api/product' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'sw-access-key: XXX' \
--data-raw '{
    "limit": 1,
    "includes": {
        "product": ["cover", "media"]
    },
    "associations": {
        "media": {}
    }
}'
Sometimes the data you are looking for requires an extra database JOIN. In your case it's the media table, which is why you need to use associations to make sure the query joins that table into your response.
P.S. you don't have to use includes, I just used it to shorten the response object.
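If you then want just the image URLs on the command line, a small jq sketch like the one below could help. The field path .elements[].media[].media.url is an assumption about the response shape, not something confirmed by the answer, so check it against your actual payload.

# Hedged sketch: pull media URLs out of the store-api response with jq.
# The .elements[].media[].media.url path is assumed; adjust it to the real payload.
curl -s --location --request POST 'http://website.com/store-api/product' \
  --header 'Content-Type: application/json' \
  --header 'Accept: application/json' \
  --header 'sw-access-key: XXX' \
  --data-raw '{ "limit": 1, "associations": { "media": {} } }' \
  | jq -r '.elements[]?.media[]?.media.url'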

Get full list of Groups and Projects in Gitlab Cloud

I'm trying to get a full list of projects and groups in our GitLab cloud account.
I'm currently using their documentation as a reference (bear in mind I'm no developer) and the Linux command line to do so. Here's the documentation I'm trying to use:
https://docs.gitlab.com/ee/api/projects.html
https://docs.gitlab.com/ee/api/groups.html#list-a-groups-projects
I'm using the following command to get the data and parse it into a readable format that I will export to CSV or a spreadsheet afterwards:
curl --header "PRIVATE-TOKEN: $TOKEN" "https://gitlab.com/api/v4/projects/?owned=yes&per_page=1000&page=1" | python -m json.tool | grep -E "http_url_to_repo|visibility" | awk '!(NR%2){print$0p}{p=$0}' | awk '{print $4,$2}' | sed -E 's/\"|\,//g' > gitlab.txt
My problem is that the command only returns about 100 of the 280 repositories we have. It doesn't seem to fetch them recursively from all the groups and subgroups.
Any ideas on how I can improve this search to get everything?
Thank you
GitLab caps the results at 100 per page (per_page maxes out at 100), so you will have to run it twice - first with page=1 and then with page=2. For the second page you will need >> to append to the existing file gitlab.txt:
curl --header "..." "https://...&per_page=100&page=1" | ... > gitlab.txt
curl --header "..." "https://...&per_page=100&page=2" | ... >> gitlab.txt
Or you could write a script that first fetches all pages and then sends them down the pipe. You may also try a for loop in bash, as sketched below.
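A rough sketch of that loop, reusing the pipeline from the question; it walks the page numbers until GitLab returns an empty JSON array, and assumes $TOKEN is set as above:

# Hedged sketch: page through /projects until an empty page comes back.
> gitlab.txt
page=1
while true; do
    body=$(curl --silent --header "PRIVATE-TOKEN: $TOKEN" \
        "https://gitlab.com/api/v4/projects/?owned=yes&per_page=100&page=$page")
    [ "$body" = "[]" ] && break   # no more results
    echo "$body" | python -m json.tool \
        | grep -E "http_url_to_repo|visibility" \
        | awk '!(NR%2){print$0p}{p=$0}' | awk '{print $4,$2}' \
        | sed -E 's/\"|\,//g' >> gitlab.txt
    page=$((page+1))
done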

bash: /usr/bin/bq: Argument list too long

I am trying to load a 7 MB dump.sql file into BigQuery using the following command:
while read -r q;
do
bq query --destination_table '[project].[table]' --allow_large_results "$q"
done < <(grep '^INSERT' dump.sql);
But I get this error: "bash: /usr/bin/bq: Argument list too long"
What could be the solution for this? Kindly help.
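The error itself comes from the kernel rather than from bq: each "$q" is a single INSERT statement passed as one huge command-line argument, and it exceeds the argument-size limit enforced by execve(). A hedged workaround is to feed the statement to bq on standard input instead of on the command line; this assumes your bq version reads the query from stdin when no query argument is given, so verify that first.

# Hedged sketch: pass each INSERT via stdin instead of argv to avoid "Argument list too long".
# Assumes bq reads the query from standard input when none is supplied as an argument.
while read -r q; do
    printf '%s\n' "$q" | bq query --destination_table '[project].[table]' --allow_large_results
done < <(grep '^INSERT' dump.sql)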

how to pass variable from shell script to sqlplus

I have a shell script that calls file.sql
I am looking for a way to pass some parameters to my file.sql.
If I can't pass a variable with some value to the SQL script, I will have to create multiple .sql files with SELECT statements where all that changes is a few words.
My shell script calls file.sql:
sqlplus -S user/pass@localhost
echo " Processing triples"
./load_triples.sh BUILDING/Mapping_File BUILDING Y >> load_semantic.log
@/opt/D2RQ/file.sql
exit;
EOF
And this is how my file.sql looks like:
SET ECHO ON;
SPOOL count.log
SELECT COUNT(*) as total_count
FROM TABLE(SEM_MATCH(
'{
?s rdf:type :ProcessSpec .
?s ?p ?o
}',SEM_Models('BUILDING'),NULL,
SEM_ALIASES(SEM_ALIAS('','http://VISION/DataSource/SEMANTIC_CACHE#')),NULL));
SPOOL OFF;
Can I modify my shell script so that it passes variable names?
I.e.: model = "BUILDING"
and pass this on to file.sql?
Is there anything like that?
You appear to have a heredoc containing a single SQL*Plus command, though it doesn't look right as noted in the comments. You can either pass a value in the heredoc:
sqlplus -S user/pass@localhost << EOF
@/opt/D2RQ/file.sql BUILDING
exit;
EOF
or if BUILDING is $2 in your script:
sqlplus -S user/pass@localhost << EOF
@/opt/D2RQ/file.sql $2
exit;
EOF
If your file.sql had an exit at the end then it would be even simpler as you wouldn't need the heredoc:
sqlplus -S user/pass@localhost @/opt/D2RQ/file.sql $2
In your SQL you can then refer to the position parameters using substitution variables:
...
}',SEM_Models('&1'),NULL,
...
The &1 will be replaced with the first value passed to the SQL script, BUILDING; because that is a string it still needs to be enclosed in quotes. You might want to set verify off to stop it showing you the substitutions in the output.
You can pass multiple values and refer to them sequentially, just as you would positional parameters in a shell script - the first passed parameter is &1, the second is &2, and so on. You can use substitution variables anywhere in the SQL script, so they can be used as column aliases with no problem - you just have to be careful when adding an extra parameter to either add it to the end of the list (which can leave the numbering out of order within the script) or adjust everything to match:
sqlplus -S user/pass@localhost << EOF
@/opt/D2RQ/file.sql total_count BUILDING
exit;
EOF
or:
sqlplus -S user/pass@localhost << EOF
@/opt/D2RQ/file.sql total_count $2
exit;
EOF
If total_count is being passed to your shell script then just use its positional parameter, $4 or whatever. And your SQL would then be:
SELECT COUNT(*) as &1
FROM TABLE(SEM_MATCH(
'{
?s rdf:type :ProcessSpec .
?s ?p ?o
}',SEM_Models('&2'),NULL,
SEM_ALIASES(SEM_ALIAS('','http://VISION/DataSource/SEMANTIC_CACHE#')),NULL));
If you pass a lot of values you may find it clearer to use the positional parameters to define named parameters, so any ordering issues are all dealt with at the start of the script, where they are easier to maintain:
define MY_ALIAS = &1
define MY_MODEL = &2
SELECT COUNT(*) as &MY_ALIAS
FROM TABLE(SEM_MATCH(
'{
?s rdf:type :ProcessSpec .
?s ?p ?o
}',SEM_Models('&MY_MODEL'),NULL,
SEM_ALIASES(SEM_ALIAS('','http://VISION/DataSource/SEMANTIC_CACHE#')),NULL));
From your separate question, maybe you just wanted:
SELECT COUNT(*) as &1
FROM TABLE(SEM_MATCH(
'{
?s rdf:type :ProcessSpec .
?s ?p ?o
}',SEM_Models('&1'),NULL,
SEM_ALIASES(SEM_ALIAS('','http://VISION/DataSource/SEMANTIC_CACHE#')),NULL));
... so the alias will be the same value you're querying on (the value in $2, or BUILDING in the original part of the answer). You can refer to a substitution variable as many times as you want.
That might not be easy to use if you're running it multiple times, as it will appear as a header above the count value in each bit of output. Maybe this would be more parsable later:
select '&1' as QUERIED_VALUE, COUNT(*) as TOTAL_COUNT
If you set pages 0 and set heading off, your repeated calls might appear in a neat list. You might also need to set tab off and possibly use rpad('&1', 20) or similar to make that column always the same width. Or get the results as CSV with:
select '&1' ||','|| COUNT(*)
Depends what you're using the results for...
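Pulling those suggestions together, here is a minimal sketch of what the calling script could look like if the SQL is inlined in the heredoc so the shell expands $2 itself (an alternative to the @file.sql plus &1 route); the connection string and query are the placeholders used above, and the exact SET options you need may vary:

# Hedged sketch: one bare CSV line per run, appended to counts.csv.
sqlplus -S user/pass@localhost << EOF >> counts.csv
SET PAGESIZE 0
SET HEADING OFF
SET FEEDBACK OFF
SET VERIFY OFF
SELECT '$2' ||','|| COUNT(*)
FROM TABLE(SEM_MATCH(
  '{ ?s rdf:type :ProcessSpec . ?s ?p ?o }',
  SEM_Models('$2'), NULL,
  SEM_ALIASES(SEM_ALIAS('','http://VISION/DataSource/SEMANTIC_CACHE#')), NULL));
EXIT;
EOF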

awk doesn't work in hadoop's mapper

This is my hadoop job:
hadoop streaming \
-D mapred.map.tasks=1\
-D mapred.reduce.tasks=1\
-mapper "awk '{if(\$0<3)print}'" \ # doesn't work
-reducer "cat" \
-input "/user/***/input/" \
-output "/user/***/out/"
This job always fails with an error saying:
sh: -c: line 0: syntax error near unexpected token `('
sh: -c: line 0: `export TMPDIR='..../work/tmp'; /bin/awk { if ($0 < 3) print } '
But if I change the -mapper into this:
-mapper "awk '{print}'"
it works without any error. What's the problem with the if(..) ?
UPDATE:
Thanks @paxdiablo for your detailed answer.
What I really want to do is filter out rows whose 1st column is greater than x before piping the input data to my custom binary, so the -mapper actually looks like this:
-mapper "awk -v x=$x{if($0<x)print} | ./bin"
Is there any other way to achieve that?
The problem's not with the if per se, it's to do with the fact that the quotes have been stripped from your awk command.
You'll realise this when you look at the error output:
sh: -c: line 0: `export TMPDIR='..../work/tmp'; /bin/awk { if ($0 < 3) print } '
and when you try to execute that quote-stripped command directly:
pax> echo hello | awk {if($0<3)print}
bash: syntax error near unexpected token `('
pax> echo hello | awk {print}
hello
The reason the {print} one works is because it doesn't contain the shell-special ( character.
One thing you might want to try is to escape the special characters to ensure the shell doesn't try to interpret them:
{if\(\$0\<3\)print}
It may take some effort to get the correctly escaped string but you can look at the error output to see what is generated. I've had to escape the () since they're shell sub-shell creation commands, the $ to prevent variable expansion, and the < to prevent input redirection.
Also keep in mind that there may be other ways to filter depending on your needs, ways that can avoid shell-special characters. If you specify what your needs are, we can possibly help further.
For example, you could create a shell script (e.g., pax.sh) to do the actual awk work for you:
#!/bin/bash
awk -v x="$1" '$1 < x { print }'
then use that shell script in the mapper without any special shell characters:
hadoop streaming \
-D mapred.map.tasks=1 -D mapred.reduce.tasks=1 \
-mapper "pax.sh 3" -reducer "cat" \
-input "/user/***/input/" -output "/user/***/out/"