Extract alias from Freebase dump

I have downloaded the Freebase dump from https://developers.google.com/freebase/data?hl=en, but I am confused about the structure of the file.
I know the format of the dump is <subject> <predicate> <object> . — if I want to extract the alias subset of Freebase, like http://www.freebase.com/common/topic/alias?instances&lang=en, how can I do this?
I have tried to filter the lines that contain the mid or '/common/topic/alias', but the result is not what I want.
Is there any library to parse Freebase? Thanks!
Follow up:
I have two more questions.
Is there a list that shows all the namespaces in Freebase? (e.g. type.object.name is the name of an object)
How can I extract all the 'type of (IS A)' relations? (e.g. C++ IS A programming language)

The Freebase data dump is RDF, so any RDF parsing library should work, but zgrep would be a lot quicker. One little twist is that the predicate for the Freebase property /common/topic/alias is <http://rdf.freebase.com/ns/common.topic.alias> with the slashes converted to periods/dots.
To filter just the English aliases, you can use a command like:
$ zgrep -E "common.topic.alias>.*@en\t\.$" freebase-rdf-2015-04-19-00-00.gz
Which will give you output looking like:
<http://rdf.freebase.com/ns/m.0100c5g> <http://rdf.freebase.com/ns/common.topic.alias> "Pulska yo"@en .
<http://rdf.freebase.com/ns/m.0101107q> <http://rdf.freebase.com/ns/common.topic.alias> "Unforgiven 2002"@en .
<http://rdf.freebase.com/ns/m.01016v4g> <http://rdf.freebase.com/ns/common.topic.alias> "Ain't Nuthin' But A \"G\" Thang, Rene"@en .
...
If you want aliases in all languages, you can just use:
$ zgrep -E "common.topic.alias>" freebase-rdf-2015-04-19-00-00.gz
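On the follow-up about 'IS A' relations: in the Freebase RDF dump, type membership is asserted with the predicate type.object.type, so the same zgrep approach works — zgrep "type.object.type>" freebase-rdf-2015-04-19-00-00.gz against the real file. The sketch below runs the filter on two hypothetical inline triples (the mid and values are made up) so it can be tried without the multi-gigabyte dump:

```shell
# Filter "IS A" (type membership) triples by their predicate.
# The two sample lines are hypothetical stand-ins for dump content.
sample='<http://rdf.freebase.com/ns/m.0000x1> <http://rdf.freebase.com/ns/type.object.type> <http://rdf.freebase.com/ns/computer.programming_language> .
<http://rdf.freebase.com/ns/m.0000x1> <http://rdf.freebase.com/ns/type.object.name> "C++"@en .'
printf '%s\n' "$sample" | grep "type.object.type>"
```

Only the first line survives the filter; the type.object.name triple is dropped.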

Related

BAT or Powershell For loop through CSV to build a URL

Solved. My first go at this post was a very poorly structured question that tried (badly) to obfuscate proprietary company information, and didn't ask the question well.
Once Walter got me thinking in the correct direction, I worked through the issue. Below was the second issue I ran into: the @{key=value} text was being passed into my URL because, for some reason, my script did not like the header in my CSV file. In hindsight, perhaps because I was naming my variable the same as my header. Regardless, I worked around it just by using Get-Content rather than Import-Csv.
$aliases = Import-Csv -Path .\aliases.csv
foreach ($alias in $aliases) {
    Write-Output ('http://www.' + $($alias) + '.mydomain.com') >> urls.txt
}
where the contents of aliases.csv is:
alias
Matthew
Mable
Mark
Mary
This is giving me:
http://www.@{alias=Matthew}.mydomain.com
http://www.@{alias=Mable}.mydomain.com
http://www.@{alias=Mark}.mydomain.com
http://www.@{alias=Mary}.mydomain.com
When successful urls.txt should contain:
http://www.Matthew.mydomain.com
http://www.Mable.mydomain.com
http://www.Mark.mydomain.com
http://www.Mary.mydomain.com
NOTE: Edited to clarify use case
In Powershell
Get-Content names.txt | %{"Hello, my name is $_. How are you?"} >> results.txt
By the way, with just a little more effort, you can read more than one variable from a csv file, and substitute all of them for named variables in the text. This turns out to be very useful in a variety of situations.
Edit to conform to your edit
Import-csv ./aliases.csv | %{ "http://www.$($_.alias).mydomain.com"}
Notes:
Once you get used to them, pipelines are the easiest way to process a stream of just about anything.
% is an alias for ForEach-Object (not to be confused with the foreach statement).
The loop will be done once for each object coming out of the pipe. Each object will be a PSCustomObject with a single property named alias.
$() allows evaluation of a subexpression within a double quoted string.
$_ is the current object.
The dot, in this context, separates an object from one of its named properties.

Reflection to grab strings from an Objective-C file

I have a large Objective-C file filled with hardcoded objects, each with a description string field. They are all constructed like so:
Item *43 = [[Item alloc] initWithFieldId:@"43" description:@"This is test 43 of 100"];
I would like to know if there is a way for me to extract all of these strings from this .m file and write them to a text file. Is there some kind of reflection library that would let me walk this file and grab all the strings starting from description:@ and ending at the closing quotation mark?
Not 100% sure if this is what you're looking for, but you could do it from the terminal pretty easily with a find command and some regex, like Bob recommends (edited to echo only the descriptions, not the entire file):
find . -type f -name \*.m -exec sed -n 's/Item .*description:@"\(.*\)"].*/\1/p' {} \;
This will return
This is test 43 of 100
for all of the matches.
For the entire line that the search finds use:
find . -type f -name \*.m -exec grep 'Item.*description:@' {} \;
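To sanity-check the sed expression without scanning a whole project, you can pipe a single sample line (the one from the question) through the same substitution:

```shell
# One sample Item line; sed keeps only the captured description text.
echo 'Item *43 = [[Item alloc] initWithFieldId:@"43" description:@"This is test 43 of 100"];' \
  | sed -n 's/Item .*description:@"\(.*\)"].*/\1/p'
```

which prints just: This is test 43 of 100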
Since there's a specific pattern to the file you can use Regular Expressions to extract what you need.
There are various editors that internally support regex, but you can always use regex through the Unix command line tools.
Here's a place to start.

Find merge arrows pointing to a version in ClearCase

I want to find all the merge arrows pointing to a certain version in a script. When I describe the version of the element with the following command:
ct describe filename@@/main/some_branch/3
I get in the result the following:
Hyperlinks:
Merge <- filename@@/main/other_branch/2
I want ct describe to output only the relevant information for use in my script, i.e. the versions the merge arrows come from. In my case, the output should look simply like this:
filename@@/main/other_branch/2
I didn't find any relevant parameters in the -fmt from the man page. Is there any way of doing it?
The only option in the fmt_ccase man page would be
%[hlink:filter]p
Displays the hyperlink source and target, with an arrow pointing from the source to the target. The optional H argument lists only the hyperlink names.
You can optionally specify a filter string, preceded by a colon. This filter if present, restricts the output to names that match the filter string. Case is considered when matching the string.
If this doesn't work, you have to resort to grep/awk commands in order to extract those version from the cleartool describe output.
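A minimal sketch of that grep/sed fallback, run here against a pasted describe listing rather than live cleartool output (the hyperlink line follows the format shown in the question):

```shell
# Keep only incoming merge arrows ("Merge <- version") from a
# cleartool describe listing and strip everything but the version.
describe_output='Hyperlinks:
  Merge <- filename@@/main/other_branch/2'
printf '%s\n' "$describe_output" | sed -n 's/^ *Merge <- //p'
```

which prints filename@@/main/other_branch/2 — the exact output format the question asks for.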
The cleartool describe -ahlink option restricts the output a bit.
-ahlink
The listing includes the path names of the objects hyperlinked to pname, annotated with → (listed object is the to-object) or ← (listed object is the from-object).
For example:
-> M:\gamma\vob1\proj\include\db.c@@\main\52 <- M:\gamma\vob1\proj\bin\vega@@\main\5
Besides the full script option, you can have a look at external third-party tools like R&D Reporter, which can visualize and export those same hyperlinks.
However:
this is a commercial tool
depending on the export output and what you want, you might end up parsing just another output to extract what you need.
For more on that tool, contact Tamir Gefen.

Finding files in subdirectories created after a certain date

I'm in the process of writing a bash script (just learning it) which needs to find files in subdirectories created after a certain date. I have a folder /images/ with jpegs in various subfolders - I want to find all jpegs uploaded to that directory (or any subdirectories) after a certain date. I know about the -mtime flag, but my "last import" date is stored in %Y-%m-%d format and it'd be nice to use that if possible?
Also, each file/pathname will then be used to generate a MySQL SELECT query. I know find generally outputs the filenames found, line-by-line. But if find isn't actually the command that I should be using, it'd be nice to have a similar output format I could use to generate the SELECT query (WHERE image.file_name IN (...))
Try the script below:
DATE=<<date>>
SEARCH_PATH=/images/
DATE=$(echo "$DATE" | sed 's/-//g')
DATE="${DATE}0000"
FILE=~/timecheck_${RANDOM}_$(date +"%Y%m%d%H%M")
touch -t "$DATE" "$FILE"
find "$SEARCH_PATH" -newer "$FILE" 2>/dev/null | awk 'BEGIN{f=0}{if(f==1)printf("\"%s\", ",l);l=$0;f=1}END{printf("\"%s\"",l)}'
rm -f "$FILE"
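The awk stage of that script can be checked in isolation: given one filename per line, it emits the quoted, comma-separated list a WHERE image.file_name IN (...) clause needs (a trailing newline is added here for readability):

```shell
# Join filenames into a quoted, comma-separated list for SQL IN (...).
printf '%s\n' a.jpg b.jpg c.jpg \
  | awk 'BEGIN{f=0}{if(f==1)printf("\"%s\", ",l);l=$0;f=1}END{printf("\"%s\"\n",l)}'
```

which prints: "a.jpg", "b.jpg", "c.jpg"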
You can convert your date into the "last X days" format that find -mtime expects.
find is the correct command for this task. Send its output somewhere, then parse the file into the query.
Beware of SQL injection attacks if the files were uploaded by users. Beware of special-character quoting even if they weren't.
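If your find is GNU find (an assumption about your platform), there is a simpler route: -newermt accepts the %Y-%m-%d date directly, so no temporary reference file is needed. A throwaway demo with back-dated files:

```shell
# GNU find: list files modified after midnight on the given date.
dir=$(mktemp -d)
touch -d "2015-01-01" "$dir/old.jpg"   # before the cutoff
touch "$dir/new.jpg"                   # now, i.e. after the cutoff
find "$dir" -name '*.jpg' -newermt "2020-06-15"
rm -rf "$dir"
```

Only the new.jpg path is printed, since old.jpg predates the cutoff.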

Script for Testing with Input files and Output Solutions

I have a set of *.in files and a set of *.soln files with matching files names. I would like to run my program with the *.in file as input and compare the output to the ones found in the *.soln files. What would be the best way to go about this? I can think of 3 options.
Write some driver in Java to list the files in the folder, run the program, and compare. This would be hard and tedious.
Write a bash script to do this. How?
Write a Python script to do this?
I would go for the bash solution. Also, given that what you are doing is a test, I would always save the output of your program so that if there are failures, you always have the output to compare against.
#!/bin/bash
for infile in *.in; do
    basename=${infile%.*}
    myprogram "$infile" > "$basename.output"
    diff "$basename.output" "$basename.soln"
done
Add checking of exit statuses etc. as required by your report.
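One way to do that bookkeeping, sketched with a stand-in program (tr a-z A-Z) and generated fixture files so it runs anywhere — your real myprogram and *.in/*.soln files would replace them:

```shell
#!/bin/bash
# Count passes and failures by checking diff's exit status.
workdir=$(mktemp -d) && cd "$workdir"
printf 'hello\n' > t1.in; printf 'HELLO\n' > t1.soln   # will pass
printf 'world\n' > t2.in; printf 'nope\n'  > t2.soln   # will fail
pass=0; fail=0
for infile in *.in; do
    base=${infile%.*}
    tr a-z A-Z < "$infile" > "$base.output"    # stand-in for myprogram
    if diff -q "$base.output" "$base.soln" > /dev/null; then
        pass=$((pass + 1))
    else
        fail=$((fail + 1))
    fi
done
echo "passed: $pass, failed: $fail"
```

With the two fixtures above, this prints: passed: 1, failed: 1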
If the program already exists, I suspect the bash script is the best bet.
If your soln files are named right, some kind of loop like
for file in base*.soln
do
    myprogram "${file%.soln}.in" > "new_$file"
    diff "$file" "new_$file"
done
Of course, you can check the exit code of diff and
do various other things to create a test report . . .
That looks simplest to me . . .
Karl
This is primarily a problem that requires the use of the file-system with minimal logic. Bash isn't a bad choice for such problems. If it turns out you want to do something more complicated than just comparing for equality Python becomes a more attractive choice. Java doesn't seem like a good choice for a throwaway script such as this.
Basic bash implementation might look something like this:
cd dir_with_files
program=your_program
input_ext=".in"
compare_to_ext=".soln"
for file in *"$input_ext"; do
    diff <("$program" "$file") "${file%$input_ext}$compare_to_ext"
done
Untested.