Objective C HTML parser error "expected expression before xmlNode" - objective-c

I was following the answer "Simple libxml2 HTML parsing example, using Objective-c, Xcode, and HTMLparser.h" together with http://benreeves.co.uk/objective-c-hmtl-parser/
The author notes that there's something wrong with rawContentsOfNode method.
NSArray *bodytext = [bodyNode findChildTags:@"td"];
for (HTMLNode *inputBody in bodytext) {
//NSLog(@"%@", [inputBody getAttributeNamed:@"class"]);
NSString *test = rawContentsOfNode(xmlNode *bodytext, htmlDocPtr doc);
}
There doesn't seem to be any example of using the updated version, and I can't figure out what's wrong. Any help with fixing this would be great.

The example in the StackOverflow answer won't even compile because he has just copy-pasted the note in the original example.
This:
rawContentsOfNode(xmlNode *bodytext, htmlDocPtr doc);
is part of a function prototype, not a function call. It's a C function that requires an xmlNode and an htmlDocPtr as parameters. Looking at the interface of HTMLNode, we see that the prototype given in the comment is wrong; it should be:
NSString* rawContentsOfNode(xmlNode *node);
There's no mention in the source code of a function matching the prototype recommended in the blog post. I have no idea what they were talking about, unless it has been removed since the comment was made.
The XML node is a public member of the HTML node, so you could do:
test = rawContentsOfNode(inputBody->_node);
But the method rawContents does that anyway so you might as well use it.
test = [inputBody rawContents];
Note that (again checking the source code) there is an issue: the content of the node is assumed to be encoded in UTF-8. This may be true, but the default encoding for HTTP is ISO-8859-1, so it may not be.
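If the page does turn out to be ISO-8859-1, the raw bytes would need converting before being handed to code that assumes UTF-8. The following is only a minimal sketch of that conversion in C++; the helper name latin1ToUtf8 is hypothetical and is not part of HTMLNode or libxml2.

```cpp
#include <cassert>
#include <string>

// Convert ISO-8859-1 (Latin-1) bytes to UTF-8.
// Every Latin-1 byte maps to the Unicode code point with the same value,
// so bytes below 0x80 are copied and bytes from 0x80 up become two bytes.
std::string latin1ToUtf8(const std::string& latin1) {
    std::string utf8;
    utf8.reserve(latin1.size() * 2);
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            utf8.push_back(static_cast<char>(c));
        } else {
            utf8.push_back(static_cast<char>(0xC0 | (c >> 6)));    // leading byte
            utf8.push_back(static_cast<char>(0x80 | (c & 0x3F)));  // continuation byte
        }
    }
    return utf8;
}
```

For example, the single Latin-1 byte 0xE9 (é) becomes the two UTF-8 bytes 0xC3 0xA9, while plain ASCII text passes through unchanged.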

SLICC indexing syntax

I have been digging through the cache-related parts of Gem5 (particularly the parts related to directories), and I've hit a bit of a snag.
This is the code for getDirectoryEntry(Addr addr), in src/mem/ruby/protocol/MESI_Two_Level-dir.sm:
Entry getDirectoryEntry(Addr addr), return_by_pointer="yes" {
    Entry dir_entry := static_cast(Entry, "pointer", directory[addr]);
    if (is_valid(dir_entry)) {
        return dir_entry;
    }
    dir_entry := static_cast(Entry, "pointer",
                             directory.allocate(addr, new Entry));
    return dir_entry;
}
Note the first line inside the function, where it says directory[addr].
directory was previously defined like so:
machine(MachineType:Directory, "MESI Two Level directory protocol")
: DirectoryMemory * directory;
...
I am trying to understand exactly what that directory[addr] bit of code means. Intuitively, it may be calling the C++ DirectoryMemory::lookup(Addr address) method, but I haven't found any code or documentation that supports that guess.
The DirectoryMemory class doesn't define an indexing operator, and there's also nothing in the SLICC page on the wiki that describes an indexing operator.
tl;dr: what does the indexing operator mean in SLICC? If it's defined for particular objects somewhere in the SLICC code, what should I be looking for to find its definition?
Thanks in advance!
Figured it out myself. It calls the lookup method in DirectoryMemory, as I suspected.

Compile and run C code using clang API

I would like to use the clang/llvm APIs to compile a c-function, defined in a string and immediately execute it.
Something like:
int main() {
    std::string codestr = "int foo(int bar) { return bar * 2; }";
    clang::??? *code = clang::???.compile(codestr);
    int result = code->call("foo", 5);
}
I am looking for tutorials, but what I found so far does not quite match my goal or does not work, because it refers to an outdated version of LLVM.
Currently, I am using LLVM 3.5.
Does anyone have a good tutorial at hand?
I followed this blog post with good results. The clang API has changed, so you may have to make adjustments. With LLVM 3.6.1, I got good results with the following code:
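#include <memory>
#include <vector>
#include "clang/Basic/DiagnosticOptions.h"
#include "clang/CodeGen/CodeGenAction.h"
#include "clang/Frontend/CompilerInstance.h"
#include "clang/Frontend/CompilerInvocation.h"
#include "clang/Frontend/TextDiagnosticPrinter.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/raw_ostream.h"

// Compile the C source file at `filename` into an in-memory LLVM module.
llvm::Module* compile(const char* filename) {
    clang::CompilerInstance compiler;
    clang::CompilerInvocation* invocation = new clang::CompilerInvocation();
    llvm::IntrusiveRefCntPtr<clang::DiagnosticIDs> DiagID(new clang::DiagnosticIDs());
    auto diagOptions = new clang::DiagnosticOptions();
    clang::DiagnosticsEngine Diags(DiagID, diagOptions,
        new clang::TextDiagnosticPrinter(llvm::errs(), diagOptions));
    std::vector<const char *> arguments = {filename};
    // data() + size() avoids dereferencing the end iterator, which is undefined behavior.
    clang::CompilerInvocation::CreateFromArgs(*invocation,
        arguments.data(), arguments.data() + arguments.size(),
        Diags);
    compiler.setInvocation(invocation);
    compiler.setDiagnostics(new clang::DiagnosticsEngine(DiagID, diagOptions,
        new clang::TextDiagnosticPrinter(llvm::errs(), diagOptions)));
    // Run the frontend and generate LLVM IR without emitting an object file.
    std::unique_ptr<clang::CodeGenAction> action(new clang::EmitLLVMOnlyAction());
    if (!compiler.ExecuteAction(*action))
        return nullptr; // compilation failed; diagnostics went to stderr
    std::unique_ptr<llvm::Module> result = action->takeModule();
    llvm::errs() << *result; // dump the IR for inspection
    return result.release();
}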
I was very careless with the pointers, so it's very possible there's a memory leak or a double free (although it didn't crash).
I couldn't figure out how to take the source from a memory buffer, so I dumped it in a temporary file using mkstemp.
I didn't get around to executing the result, but I think you can follow @michael-haidi's response, or check out the LLVM Kaleidoscope tutorial (this is the JIT chapter).
I recommend using MCJIT because the old JIT infrastructure will be removed in a future release.
I can't point you to a full tutorial and cannot promise that the API hasn't changed since the blog post, but here you'll find a guide on how to use MCJIT with the Kaleidoscope example from LLVM. Examples and tutorials are hard to find for LLVM/Clang. However, I suggest trying it, and maybe you can document your journey with a short example.
The Julia project also uses MCJIT for JIT compilation of C++ code inside the Julia language. Maybe you can peek at the code and find out how they use MCJIT.
Good luck ;)

Methods with multiple arguments in objective C

Take this method definition, for instance (from another post):
- (int)methodName:(int)arg1 withArg2:(int)arg2
{
// Do something crazy!
return someInt;
}
Is withArg2 actually ever used for anything inside this method?
withArg2 is part of the method name (it is usually written without arguments as methodName:withArg2: if you want to refer to the method in the documentation), so no, it is not used for anything inside the method.
As Tamás points out, withArg2 is part of the method name. If you write a function with the exact same name in C, it will look like this:
int methodNamewithArg2(int arg1, int arg2)
{
// Do something crazy!
return someInt;
}
Coming from other programming languages, the Objective-C syntax at first might appear weird, but after a while you will start to understand how it makes your whole code more expressive. If you see the following C++ function call:
anObject.subString("foobar", 2, 3, true);
and compare it to a similar Objective-C method invocation
[anObject subString:"foobar" startingAtCharacter:2 numberOfCharacters:3 makeResultUpperCase:YES];
it should become clear what I mean. The example may be contrived, but the point is to show that embedding the meaning of the next parameter into the method name allows you to write very readable code. Even if you choose horrible variable names or use literals (as in the example above), you will still be able to make sense of the code without having to look up the method documentation.
You would call this method as follows:
int i = [self methodName:arg1 withArg2:arg2];
This is just Objective-C's way of making the code easier to read.

Cpp . NET: "a->Methodname " vs "a.MethodName"

I would like to know the difference between these two (sorry I do not know the name of this subject).
I come from C#, where I was used to writing System.Data as well as classA.MethodA. I have already found out that in C++ I need to use :: with namespaces and -> with class members. But what about a simple "."?
I have created System::Data::Odbc::OdbcConnection^ connection. Later I was able to use connection.Open. Why not connection->Open?
I'm sorry, I am sure it's something easily findable on the net, but I don't know the English term for these.
Thank you guys
If you have a pointer to an object, you use:
MyClass *a = new MyClass();
a->MethodName();
On the other hand, if you have an actual object, you use dotted notation:
MyClass a;
a.MethodName();
To clarify the previous answers slightly, the caret character ^ in VC++ can be thought of as a * for most intents and purposes. It is a 'handle' to a class, and means something slightly different, but similar. See this short Googled explanation:
http://blogs.msdn.com/branbray/archive/2003/11/17/51016.aspx
So, in your example there, if you initialize your connection like:
System::Data::Odbc::OdbcConnection connect;
//You should be able to do this:
connect.Open();
Conversely, if you do this:
System::Data::Odbc::OdbcConnection^ connect1 = gcnew System::Data::Odbc::OdbcConnection();
connect1.Open(); // should be an error
connect1->Open(); //correct
The short answer: C++ allows you to manage your own memory. As such, you can create and manipulate memory, through usage of pointers (essentially integer variables containing memory addresses, rather than a value).
a.Method() means a is an instance of a class, from which you call Method.
a->Method() means a is a pointer to an instance of a class, from which you call Method.
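The difference can be sketched in a few lines of standard C++ (plain C++ rather than C++/CLI, so there is no ^ or gcnew here). Connection is a hypothetical stand-in class invented for this illustration, not the real OdbcConnection.

```cpp
#include <cassert>

// Hypothetical stand-in for a class with a method and a member.
struct Connection {
    bool opened = false;
    void Open() { opened = true; }
};

// Opens a connection through the object itself, then reads its state
// back through a pointer in both equivalent spellings.
bool dotAndArrowAgree() {
    Connection a;        // an actual object
    Connection* p = &a;  // a pointer to that same object

    a.Open();                     // object: use the dot
    bool viaArrow = p->opened;    // pointer: use the arrow
    bool viaDeref = (*p).opened;  // p->x is shorthand for (*p).x

    return viaArrow && viaDeref;
}
```

Both reads see the same object, so the function returns true. Writing p.Open() or a->Open() here would be rejected by the compiler.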
When you use syntax like a->member, you are using a pointer to a structure or object.
When you use syntax like a.member, you are using the structure or object and not a pointer to the structure or object.
I did a quick Google search for you, and this looks like a fairly quick and decent explanation.

Writing a TemplateLanguage/ViewEngine

Aside from getting any real work done, I have an itch. My itch is to write a view engine that closely mimics a template system from another language (Template Toolkit/Perl). This is one of those if I had time/do it to learn something new kind of projects.
I've spent time looking at CoCo/R and ANTLR, and honestly, it makes my brain hurt, but some of CoCo/R is sinking in. Unfortunately, most of the examples are about creating a compiler that reads source code, but none seem to cover how to create a processor for templates.
Yes, those are the same thing, but I can't wrap my head around how to define the language for templates where most of the source is the html, rather than actual code being parsed and run.
Are there any good beginner resources out there for this kind of thing? I've taken a gander at Spark, which didn't appear to have the grammar in the repo.
Maybe that is overkill, and one could just text-replace template syntax with C# in the file and compile it. http://msdn.microsoft.com/en-us/magazine/cc136756.aspx#S2
If you were in my shoes and weren't a language creating expert, where would you start?
The Spark grammar is implemented with a kind-of-fluent domain specific language.
It's declared in a few layers. The rules which recognize the html syntax are declared in MarkupGrammar.cs - those are based on grammar rules copied directly from the xml spec.
The markup rules refer to a limited subset of C# syntax rules declared in CodeGrammar.cs. They are a subset because Spark only needs to recognize enough C# to adjust single quotes around strings to double quotes, match curly braces, etc.
The individual rules themselves are of the delegate type ParseAction<TValue>, which accepts a Position and returns a ParseResult. The ParseResult is a simple class which contains the TValue data item parsed by the action and a new Position instance which has been advanced past the content which produced the TValue.
That isn't very useful on its own until you introduce a small number of operators, as described in Parsing expression grammar, which can combine single parse actions to build very detailed and robust expressions about the shape of different syntax constructs.
The technique of using a delegate as a parse action came from Luke H's blog post Monadic Parser Combinators using C# 3.0. I also wrote a post about Creating a Domain Specific Language for Parsing.
It's also entirely possible, if you like, to reference the Spark.dll assembly and inherit a class from the base CharGrammar to create an entirely new grammar for a particular syntax. It's probably the quickest way to start experimenting with this technique, and an example of that can be found in CharGrammarTester.cs.
Step 1. Use regular expressions (regexp substitution) to split your input template string into a token list, for example, split
hel<b>lo[if foo]bar is [bar].[else]baz[end]world</b>!
to
write('hel<b>lo')
if('foo')
write('bar is ')
substitute('bar')
write('.')
else()
write('baz')
end()
write('world</b>!')
Step 2. Convert your token list to a syntax tree:
* Sequence
** Write
*** ('hel<b>lo')
** If
*** ('foo')
*** Sequence
**** Write
***** ('bar is ')
**** Substitute
***** ('bar')
**** Write
***** ('.')
*** Write
**** ('baz')
** Write
*** ('world</b>!')
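The tree nodes can be represented by classes like these:
class Instruction {
}
class Write : Instruction {
    string text;
}
class Substitute : Instruction {
    string varname;
}
class Sequence : Instruction {
    Instruction[] items;
}
class If : Instruction {
    string condition;
    Instruction thenBranch;
    Instruction elseBranch; // "else" is a reserved word in C#, so it cannot be a field name
}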
Step 3. Write a recursive function (called the interpreter), which can walk your tree and execute the instructions there.
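The three steps can be sketched compactly in C++. This is only an illustration under some assumptions: it uses a hand-written scan for [tags] instead of regular expressions, it folds steps 1 and 2 into one recursive parser, and it treats a condition as true when the named variable exists and is non-empty. The names Vars, parseSequence and render are hypothetical, chosen for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using Vars = std::map<std::string, std::string>;

// Step 2: node types. Each instruction appends its own output to `out`.
struct Instruction {
    virtual ~Instruction() = default;
    virtual void run(const Vars& vars, std::string& out) const = 0;
};
using Node = std::unique_ptr<Instruction>;

struct Write : Instruction {
    std::string text;
    explicit Write(std::string t) : text(std::move(t)) {}
    void run(const Vars&, std::string& out) const override { out += text; }
};

struct Substitute : Instruction {
    std::string varname;
    explicit Substitute(std::string v) : varname(std::move(v)) {}
    void run(const Vars& vars, std::string& out) const override {
        auto it = vars.find(varname);
        if (it != vars.end()) out += it->second;  // unknown names render as ""
    }
};

struct Sequence : Instruction {
    std::vector<Node> items;
    void run(const Vars& vars, std::string& out) const override {
        for (const auto& item : items) item->run(vars, out);
    }
};

struct If : Instruction {
    std::string condition;
    Sequence thenBranch, elseBranch;
    void run(const Vars& vars, std::string& out) const override {
        auto it = vars.find(condition);
        bool truthy = it != vars.end() && !it->second.empty();
        (truthy ? thenBranch : elseBranch).run(vars, out);
    }
};

// Steps 1 and 2 combined: scan for [tags] and build the tree recursively.
// Returns the tag that closed this sequence ("else" or "end"), or "" at end of input.
std::string parseSequence(const std::string& src, std::size_t& pos, Sequence& seq) {
    while (pos < src.size()) {
        std::size_t open = src.find('[', pos);
        if (open == std::string::npos) open = src.size();
        if (open > pos) seq.items.push_back(Node(new Write(src.substr(pos, open - pos))));
        pos = open;
        if (open == src.size()) break;
        std::size_t close = src.find(']', open);
        if (close == std::string::npos) {  // unterminated tag: keep it as literal text
            seq.items.push_back(Node(new Write(src.substr(open))));
            pos = src.size();
            break;
        }
        std::string tag = src.substr(open + 1, close - open - 1);
        pos = close + 1;
        if (tag == "else" || tag == "end") return tag;
        if (tag.compare(0, 3, "if ") == 0) {
            std::unique_ptr<If> node(new If);
            node->condition = tag.substr(3);
            if (parseSequence(src, pos, node->thenBranch) == "else")
                parseSequence(src, pos, node->elseBranch);
            seq.items.push_back(std::move(node));
        } else {
            seq.items.push_back(Node(new Substitute(tag)));
        }
    }
    return "";
}

// Step 3: parse the template, then walk the tree to produce the output.
std::string render(const std::string& tmpl, const Vars& vars) {
    Sequence root;
    std::size_t pos = 0;
    parseSequence(tmpl, pos, root);
    std::string out;
    root.run(vars, out);
    return out;
}
```

With foo and bar set, the example template from step 1 takes the "then" branch and substitutes bar; with no variables set, it takes the "else" branch. The sketch does no error reporting: a stray [end] at the top level simply stops parsing.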
Another, alternative approach (instead of steps 1--3) if your language supports eval() (such as Perl, Python, Ruby): use a regexp substitution to convert the template to an eval()-able string in the host language, and run eval() to instantiate the template.
There are so many things to do. But it does work for one simple GET statement plus a test. That's a start.
http://github.com/claco/tt.net/
In the end, I already had too much time invested in ANTLR to give loudejs' method a go. I wanted to spend a little more time on the whole process rather than the parser/lexer. Maybe in version 2 I can have a go at the Spark way when my brain understands things a little more.
Vici Parser (formerly known as LazyParser.NET) is an open-source tokenizer/template parser/expression parser which can help you get started.
If it's not what you're looking for, then you may get some ideas by looking at the source code.