How to create a compiler in vb.net - vb.net

Before answering this question, understand that I am not asking how to create my own programming language, I am asking how, using vb.net code, I can create a compiler for a language like vb.net itself. Essentially, the user inputs code, they get a .exe. By NO MEANS do I want to write my own language, as it seems other compiler related questions on here have asked. I also do not want to use the vb.net compiler itself, nor do I wish to duplicate the IDE.
The exact purpose of what I wish to do is rather hard to explain, but all I need is a nudge in the right direction for writing a compiler (from scratch if possible) which can simply take input and create a .exe. I have opened .exe files as plain text before (my own programs) to see if I could derive some meaning from what I assumed would be human readable text, yet I was obviously sorely disappointed to see the random ascii, though it is understandable why this is all I found.
I know that a .exe file is simply lines of code, being parsed by the computer it is on, but my question here really boils down to this: What code makes up a .exe? How could I go about making one in a plain text editor if I wanted to? (No, I do not want to do that, but if I understand the process my goals will be much easier to achieve.) What makes an executable file an executable file? Where does the logic of the code fit in?
This is intended to be a programming question as opposed to a computer question, which is why I did not post it on SuperUser. I know plenty of information about the System.IO namespace, so I know how to create a file and write to it, I simply do not know what exactly I would be placing inside this file to get it to work as an executable file.
I am sorry if this question is "confusing", "dumb", or "obvious", but I have not been able to find any information regarding the actual contents of an executable file anywhere.
One of my google searches
Something that looked promising
EDIT: The second link here, while it looked good, was an utter fail. I am not going to waste hours of my time hitting keys and recording the results. "Use the "Alt" and the 3-digit combinations to create symbols that do not appear on the keyboard but that you need in the program." (step 4) How the heck do I know what symbols I need???
Thank you very much for your help, and my apologies if this question is a nooby or "bad" one.
To sum this up simply: I want to create a program in vb.net that can compile code in a particular language to a single executable file. What methods exist that can allow me to do this, and if there are none, how can I go about writing my own from scratch?

What you're asking is a pretty complex question. Sure, at its core it seems pretty basic:
Interpret the code itself
Write out the interpreted code
but each of those steps can be pretty intense. Step 1 should be somewhat achievable with some time and a LOT of elbow grease - you need to parse the code into a number of control statements based upon the specification of the language. Check out http://en.wikipedia.org/wiki/Parsing for more information on this step. In essence you're converting the typed code into a common format that represents the functionality you want.
Once you've parsed the code, the next step would be to take that parsed code and convert it into machine-runnable code (typically assembly, though with VB.NET you can write Microsoft Intermediate Language code as the output and then run it in the CLR). This is what will actually create the executable file in a manner that lets the computer run the program.
Unfortunately, the best advice for solving this problem is to either:
Go purchase several books on a programming language, machine code, assembly language, compilers, and so on, then spend several months or years reading and experimenting until the knowledge you gain from the books results in your writing a successful compiler.
Enroll in a computer science program at a local university. Writing compilers and programming languages are usually covered at a rudimentary level during the second or third year of a BS in computer science, and then in much more depth at a graduate level.
Good luck!
EDIT: If all you're looking for is a way to write code and then write a way to execute it, you might try looking at writing an interpreter for one of the scripting languages that already exist - Ruby, Python, Lua, etc.

Process.Start(String.Format("vbc.exe {0}", sourceFilePath))

Here is one book that explains it all at a very basic level. Including sample code that you can get working in a matter of hours.
As with many programming endevors, it will take you many months of study to accomplish what you want. Good luck!!

Let me start by saying this is totally doable. Ignore the naysayers.
OK, so it seems you bit off more than you could chew here. I would suggest slightly re-evaluating your goal. It seems that you want to learn how compilers work, and writing a VB.net compiler is your project to get started.
Try instead to write an interpreter, which is a much smaller and simpler task. Here's how you do it:
Write a parser for a very very small language, maybe only supporting assignment statements. Use a parser toolkit, like Antlr. This will build an AST ("abstract syntax tree" - ie a few classes or objects representing the program's syntax, without the crap like semi-colons, as a tree), which will look something like this:
then you go through the AST, and just execute it. Google for "AST walker". So for the assignment x = 5, create a hashtable entry for 'x', assign 5 to it, then move on to the next statement.
keep adding features as you go.
Once you've got the full language, you'll probably have learned enough on the way to understand the compiler books. Don't use the dragon book, try Appel's book instead, or Cooper/Torczon. There are online books if you prefer, I've never tried them.
When you go to write the compiler, you'll just be changing the bit that executes the AST into one which generates assembly (or C if you prefer) which will do the same action when it's run.
I'll grant that it seems daunting, but if you stick to a something simple to start, you'll get something working in a few days at most. In a few months, you'll have built it up to something like what you're looking for. Good luck.
To the final part of your question:
It seems that you don't know how executables are made from code. People haven't messed with hex in their compilers since the 80s, and hopefully not much even then.
Basically, after parsing the code, you go through a series of steps which make the code progressively simpler. At the end of that, you have something that is quite close to assembly. You then generate assembly, and the assembler and linker conspire to make it into an executable.

The Visual Basic .NET compiler is shipped for free as part of the .NET Framework - you don't even need the SDK or Express Editions. The compiler for VB (and C#) is located at c:\windows\microsoft.net\framework[version]\vbc.exe (or csc.exe for C#). Therefore, any computer which can run a VB.NET program can compile one. The .NET Framework also includes the System.CodeDom namespace which provides a way from within a program to compile a program, either from a document model or from a string (or file) into a .NET assembly (.exe or .dll) and generate code in both VB and C#.
Regards,
Anthony D. Green | Program Manager | Visual Basic Compiler

You create a compiler for vb.net the same way you create any other compiler. You need a lexer/parser. Entire books have been written on this topic, the most famous probably being The Dragon Book.
To provide a definitive answer: No, you cannot create a decent compiler that will generate an executable file using Notepad. You need a compiler to convert from human-readable text into the machine language (assembly or IL) that a linker or interpreter can then execute.

You can try checking out my tutorial at http://www.icemanind.com
It is a tutorial on creating your own virtual machine and assembler, written in 100% C#.

Cyclone, I'm wondering exactly what are you trying to achieve? You say "the exact purpose of what I wish to do is rather hard to explain" and "essentially, the user inputs code, they get a .exe".
If you just want the user to be able to enter code and then execute it, you might consider an existing scripting language. VBScript is built into Windows and the language is fairly similar to VB.Net, or there are various excellent free languages you can download like Python.
If the user really needs to be able to create a .exe - I think it's likely scripting might do - then why not use an existing free compiler like FreeBasic, or even Visual Basic.Net Express Edition.

I am making a tutorial for that in my website, http://dgblogs.weebly.com. It is written with C# and you can create your own computer programming language!
You will never see this syntax in my tutorials:
Console.Readline();
My website is currently offline by now so, GOOD LUCK!
Thanks,
DgBlogs

Im quite shocked that this question was never fully answered ; Recently i came across some tutorials on the subject of developing a programming language from scratch https://www.youtube.com/c/DmitrySoshnikov-education/playlists
Although the person uses Java Script the technique used for creating the tokenizer and the parser to consume the output from the tokenizer producing the AST for the transcribed language which i would consider to be GOLD! ...
Another tutorial or person https://youtube.com/playlist?list=PLSq9OFrD2Q3DasoOa54Vm9Mr8CATyTbLF
Toby Ho! Has also produce various methods using the Nearly parser (plugin) to build a language much easier but not a Pure code coder like myself(does not like extensions to do the work which i can do myself)....
https://github.com/spydaz/SpydazWeb-AI-_Emulators
I was able to do some different experiments designing a stack machine (runs mini assembly language). for my "Toy" Programming language(basic) could be executed on. (the VM) - Using Visual Basic(My personal Lang - I think in VB)
Since revisiting the tutorials ;
https://youtube.com/playlist?list=PLRAdsfhKI4OWNOSfS7EUu5GRAVmze1t2y
Immo Landwerth! Had a few more experiments This time with C#
A bit more Spaghetti to mull over; but actually revealing (after knowing a lot more);
Yes IT is possible to design a Compiler / VM with VB.net Quite easily ... without external tools .
For myself as an AI developer (CONVERSATIONAL AI / NLP) i was interested in building a compiler for the English language if it would be possible to use the technique for parsing syntax trees and building syntax trees from text using the Tokenizer/Parser to AST Method , as well as designing the transpiler to allow for the syntax tree to be used for translation to other languages. as with today's programming languages merely being a Higher lever language interfacing with a low level(VM) language. in-fact we should be developing more natural language - programming languages and dispelling the more cryptic languages such as C# (Lol) and Java and Go and the like (so many brackets and semi colons / tabs etc - truly not needed!) and that is to say that we programmers still think in code! the journey as a AI developer crosses many domains. forces many off topic research pathways .. So Again a big YES! and we Still could say that ANy programming language can be used to create another programming language it just depends on which language you "THINK IN" .... hence not having too many languages using (i understand most programming languages - but i will always think in VB)...

Related

Quick Basic Decompilation

We are looking for quick basic decompiler. The program is very old, written in DOS now we wish to enhance that code in Windows with additional functionalities. Unfortunately the developer is not traceable and only hope is decompilation.
Please suggest the best way to achieve this challenge.
Thank you
[ By Dan in the QB forum on http://qbasicnews.com , May 04, 2003 ]
Here's Microsoft's response to that question:
Microsoft does not currently offer any product capable of "decompiling" an object (.OBJ) or executable (.EXE) file back to the original source code (.BAS). The following are several reasons for this:
 
No decompiler could exactly reproduce the original source code.
When a program is compiled to an object and linked to produce an executable, most of the "names" used in the original program are converted to addresses. This loss of names means that a decompiler would have to create unique names for all the variables, procedures, and labels, and these names would not be meaningful in the context of the program.
 
Obviously, source language syntax no longer exists in the compiled object file or executable. It would be very difficult for a decompiler to interpret the series of machine language instructions that exist in an object or executable file and decide what the original source language instruction was.
 
If such a decompiler did exist and was available, anyone could use it to decompile any executable program produced in the language the decompiler was designed for.
 
For instance, if a Microsoft BASIC decompiler existed, anyone with that decompiler could use it on an executable that you had produced and from that executable obtain a copy of your source code. The source code to any program you wrote in Microsoft BASIC would be available to anyone with the decompiler. Few developers of commercial software would want to use a language product that could be deciphered, thus allowing others to obtain their source code.
There's some talk from ex-microsoft programmers who swear there was one made for thier private use. I've also seen a Basic decompiler service on a web site which does provide some working BAS code from your exe, but it's a mess and not worth the money asked for the service (http://02c1289.netsolhost.com).
I can tell you as a BASIC software developer for nearly all of my career that this is not what you want. When it came time to "port" my accounting software from DOS BASIC to Visual BASIC having the source code and a complete understanding of the code, since it was my own code, did not help in the least. It may be an over-used expression, but DOS and Windows are apples and oranges. You cannot simply convert the code, you must redesign and code the system.
All you need to do that is what you already have - a working version of the compiled code. Use it's screens to design your databases and screens in Windows, then write the underlying code. The design of those two things is often half to 90% of the work anyway. Now it did help me to have my own source for things like "how did I calculate those taxes again?" I could copy the pure logic code from DOS to Windows with little or no change, but anything involving the database or user interface had to be completely redone.
If you're not up to it, look for someone who is.
Again, you do NOT need the source code unless there's some kind of secret algorithm that you need to duplicate and don't understand yourself.

Learning - Extensibility: Dynamic loading, and any other no-recompile software updates

I'm planning on writing a program but I am stuck in a conundrum. I don't want to start writing something and then have to rewrite it all when I find out that my program is not extensible. The other problem, is I do not have enough programming knowledge to know where to begin designing my program so that it is extensible.
I have done some reading on DLLs (or delayed loading for unix), dynamic loading, run-time library loading/deloading, etc.. but I still cannot quite comprehend what I need to do. I will give a sample program example, and if someone could lead me in the right direction for what to learn so my bigger program can begin, I will be extremely grateful!
Let's say I create a console program 'iAmDog' where you can type in commands, and the dog will respond accordingly with output to the console. Now let's assume when I create this program, the dog only has 1 command, 'bark' which produces the output 'roof roof!'. How would I go about writing this program so that while the user is still running the program, I can edit a library, or code somewhere else, to add a 'sit' command to the dogs repertoire.
Again, Ideally my plan is to have no downtime or as minimal downtime as possible, all while being able to code updates to functionality of the always running program.
Thank you for reading!
Russell aka SgtPooki
Are you using .NET? Then MEF, Managed Extensibility Framework, is probably suited well for you.
There is a podcast on Hanselminutes about MEF. You find an mp3 as well as a PDF transcript, that lets you search the show. They even discuss the possibility to continuously watch a folder for updates, to achieve what you descripbe, although they discourage to do this.

Any tools to check for duplicate VB.NET code?

I wish to get a quick feeling for how much “copy and paste” coding we have, there are many tools for C# / Java to check for this type of thing. Are there any such tools that work well with VB.NET?
(I have seen what looks like lots of repeated code, but wish to get some number to help me make a case for sorting it out)
Update on progress.
I have just tried Simian.
It does not seem to be able to produce a nicely formatted report I can sent by email
It does not cope when the names of local variables, or parameters etc may have been changed, e.g it just matches on lines of text being the same.
Clone Doctor does not support VB.NET (only C# and VB 6 and lot of other)
October 2010: VB.net added to langauges supported by CloneDR
Clone Detective for Visual Studio only supports C#
SolidSDD - Source Code Duplication Detector only supports C, C++, C# and Java
DuplicateFinder is open source, but otherwise looks very match like Simian, e.g it just works on lines of text
ConQAT - Continuous Quality Assessment Toolkit seems to have a clone detector that works for VB.NET (not tried it yet)
Gendarme is a bit like FXCop and has a AvoidCodeDuplicatedInSameClassRule rule, this looks very promising, as it avoids the problem of working at the text level. Just tried it, it is the best solution so far, pity it does not search with a greater scope.
Before claiming that this question is a duplicate, please check that the other question addresses VB.NET, as a lot of tools that work well for C# don't work so well for VB.NET. (However it would not surprise me if this question is a real duplicate)
CodeRush 11.2 introduced a new feature called Duplicate Detection and Consolidation (DDC)
http://community.devexpress.com/blogs/markmiller/archive/2011/11/29/duplicate-detection-and-consolidation-in-coderush-for-visual-studio.aspx
Make sure to check out the options for it as well, as you can have it run when so many lines are changed, certainly time has passed, etc.
They've posted some decent videos on the DevExpress site too.
Simian: http://www.redhillconsulting.com.au/products/simian/
[I'm the author of CloneDR ("Clone Doctor").]
CloneDR is parameterized by a full grammar for the programming language in question. So it doesn't just match lines. Rather, it can find clones which are syntactically well-formed, with variations that are more than just identifier changes, regardless of where they stop or start in a line.
The engine on which CloneDR rests, The DMS Software Reengineering Toolkit" is a tool for analyzing large scale systems in any programming language, and uses language descriptions to drive the analysis. DMS has a wide variety of language front ends already available.
Presently it has VBScript and VB6 (as dialects of "Visual Basic"). It doesn't have VB.net, but that would be pretty straightforward to do given the DMS infrastructure and our experience with lots of other languages.
So, CloneDR could do this just fine, with a small bit of effort on our part.
EDIT October 2010: VB.net added as a language CloneDR can process.
Atomiq supports vb.net amongst other languages, and the results are nicely presented.
JetBrains published console tool set Resharper Console Tools to run duplication analysis. Once installed it allows you to do the same analysis as TeamCity does and generate duplicates report locally and even include duplicates search into custom build process with MSBuild. This tool does exactly what you need. More details you can find here at JetBrains blog post
Try Simian:
Simian (Similarity Analyser) identifies duplication in Java, C#, C, C++, COBOL, Ruby, JSP, ASP, HTML, XML, Visual Basic, Groovy source code and even plain text files.
I once saw an impressive demo of Pattern Insight; its CP Miner may be what you’re looking for: http://patterninsight.com/products/cp-miner.php. It seems to be language-independent, though I couldn’t find anything explicit about languages other than C/C++.
Roll up your sleeves and write your own parser to use it with CPD?
See the question for the tools I found.

Can statically compiled languages replace scripting language?

Assuming you can get a dynamic interpreter; can statically compiled languages replace scripting language? I never quite understood why anyone would use a scripting language? I am talking about on PC, not a limited system which needs a simplistic interpreter. I seen some python install scripts and seen similar python and C# solutions to a problem. So why use a scripting language?
NOTE: There are things that bother me about C#, i am not asking why not use C# instead. I am asking why use a scripting language? I find static compiled languages much easier to debug and often easier to code in.
There is very little distinction these days between compiling and interpreting. Look at how an interpreted language is executed - the first step is to convert the script into some kind of internal executable form, like byte code that can be executed by a simpler instruction set. This is essentially compilation to a virtual machine format. This is exactly what modern compiled languages do. And when compiled languages are deployed in server-side web apps, they even recompile from the source on the fly. So there's practically no difference in terms of the compile/execute technique.
The only difference is in the details of the instruction set, specifically in the type system. Scripting languages are usually (but not always) dynamically typed. But many large applications are also written in dynamically typed languages too. So again, there is no clear distinction here.
Personally I think static typing, far from being "extra unnecessary effort" (as it is often described) is actually a huge productivity booster, making it much easier to write short snippets correctly on the first attempt, thanks to intellisense/autocompletion. To underline this, look at how Microsoft has improved the jQuery library simply by adding static type information to it (in specially formatted comments) so we can have intellisense in the IDE.
And meanwhile, static languages (including C# and Java) are bringing in more dynamic typing features.
So I see these categories as eventually merging and the distinction being meaningless.
Wikipedia says that a Scripting Language is a language that controls other software. You can do that with C#, but true scripting languages like Powershell are designed specifically for this.
I tend to think of a scripting language in more "interactive" terms than C#. With a scripting language, you can write a line or two of code, execute it and see the results immediately. That's not so easy in C#, where you have to put your code in a Console Application, or fire it off from a unit test, or type it into the Immediate window where you don't have intellisense.
That rapid cycle of write, execute allows rapid prototyping of complete "scripts" in a scripting language, because it gives you immediate feedback on each line of code.
This kind of question often starts flame wars as people are passionate about their respective camps.
In the computer olden days, Unix command line tools and console shells provided a rich scripting environment where all sorts of processing could be done. You didn't need to be an expert programmer in any specific language and could string (pun intended) various programs (other people wrote) together using the pipe structure to massage your data which was mostly text not binary related. It is quick and easy to make changes to your batch command file. You don't have a source file that has to be edited, compiled linked with external static or shared libries/DLLS in the case of Windows.
One thing scripting does not have normally have is speed. You don't write device drives and live internet trading AI systems in scripting. But if you run a script once a day on some data received via e-mail or ftp you don't normally care how long it takes as it can run it background anyway.
Rewind back to the present and the waters become muddy. Some scripting enviroments offer a kind of speed up facility where they will read you script and almost compile and link in modules the same a normal C++ or VB program might use for speed puposes. But this very iffy and can't be relied on.
So how do you choose which route to go. Start doing tasks using scripting. If it runs too slow or you are having to do stuff every 5 minutes then parts of your script might benifit from a section written in a traditional language or the whole thing could be written in a language.
Like anything dabble and learn
Each is used for different purposes. Programs written in scripting languages are often not self-contained; they often function as "glue code" or (as Robert Harvey mentions) to automate a task. You often find scripting language interpreters embedded within an application (cf Python in Blender; Guile, Perl and Python in GIMP; JS in umpteen different browsers; Lua in countless games). Compiled languages, on the other hand, are used to produce self-contained applications. Scripts are mostly cross-platform; compiled applications usually aren't.
Note that a scripting language doesn't necessarily use an interactive interpreter (e.g. Perl), and an interpreted language isn't necessarily use for scripts (e.g. games made using PyGame). Note also that there's nothing about the languages themselves that make them interpreted or compiled. You could have a C# interpreter or a Ruby compiler. There have been a number of Lisp systems that offered both interpreters and compilers.
I would call my shell (bash) a scripting language, and I don't see a replacement comming, which is compiled.
I like to use scala, which is a statically typed language which comes with an interpreter-like REPL-interface, and due to type interference looks pretty much like a scripting language; have a look here: http://www.simplyscala.com/ .
But it isn't meant to be the glue between other programs as the shell is, so for small jobs, which are easily verified by hand and eye, which are just a few lines of code, I prefer to use the shell. And jumping from directory to directory is comfortable in a shell, where the prompt shows where I am.
Before we begin, I don't think that I've ever met a static language user who "got" scripting language without trying them, including myself. It is a different experience.
So no. Basically, you can add features to static languages which makes them superficially seem like scripting languages (like simple type inference), but its not the same:
Many scripting language users hate static languages. They feel constrained. Scripting languages are typically very good at not getting in the users way, which is sacrificed in static languages for speed/correctness.
Duck typing will not appear in static languages.
Scripting language users don't like type annotations. Its not really possible to provide a type-inference system for scripting languages, and the simple type inference appearing in some languages now only works for static types.
Techniques like monkey patching (which to my mind is a very bad idea) is pervasive in Ruby, and allows for very powerful techniques, which won't become available soon in static languages either.
Which isn't to say that a yet-to-be-designed language can't handle scripting language features in a relatively static way, but it would be difficult for it to become popular relative to the entrenched Python/PHP/Perl/Ruby/Javascript set. Factor is the closest thing, AFAICT.
What will happen is that scripting language implementations will get faster by using JITs.
Can a screw driver replace a hammer ? No, because you just don't use them for the same purpose. And if both exist, and if such a lot of people use either one or the other, there must be a reason...
Same anwser for :
class inheritance vs prototype;
imperative vs oo;
static vs dynamic typing;
strongly vs weakly typed;
manual memory management vs GC;
C# vs Java;
blue vs red;
man vs woman;
batman vs superman (but I do think superman would win... wait, there is kryptonite... oh man, I don't know...)
etc...
Because it is shorter to write since it is a higher level language, and it doesn't need the compilation cycle which also makes thing shorter.
I am asking why use a scripting
language? I find static compiled
languages much easier to debug and
often easier to code in.
Because I find loosely-typed dynamic languages without an explicit compile-run cycle much easier to debug and generally easier to code in.

Creating your own language

If I were looking to create my own language are there any tools that would help me along? I have heard of yacc but I'm wondering how I would implement features that I want in the language.
Closely related questions (all taken by searching on [compiler] on stackoverflow):
Learning Resources on Parsers, Interpreters, and Compilers
Learning to write a compiler
Constructing a simple interpreter
...
And similar topics (from the same search):
Bootstrapping a language
How much of the compiler should we know?
Writing a compiler in its own language
...
Edit: I know the stackoverflow related question search isn't what we'd like it to be, but
did we really need the nth iteration of this topic? Meh!
The first tool I would recommend is the Dragon Book. That is the reference for building compilers. Designing a language is no easy task, implementing it is even more difficult. The dragon book helps there. The book even reference to the standard unix tools lex and yacc. The gnu equivalent tools are called flex and bison. They both generate lexer and parser. There exist also more modern tools for generating lexer and parser, e.g. for java there are ANTLR (I also remember javacc and CUP, but I used myself only ANTLR). The fact that ANTLR combines parser and lexer and that eclipse plugin is availabe make it very comfortable to use. But to compare them, the type of parser you need, and know for what you need them, you should read the Dragon book. There are also other things you have to consider, like runtime environment, programming paradigm, ....
If you have already certain design ideas and need help for a certain step or detail the anwsers could be more helpful.
ANTLR is a very nice parser generator written in Java. There's a terrific book available, too.
I like Flex (Fast Lex) [Lexical scanner]
and Bison (A Hairy Yacc) [Yet another compiler compiler]
Both are free and available on all *NIX installations. For Windows just install cygwin.
But I old school.
By using these tools you can also find the lex rules and yacc gramers for a lot of popular languages on the internet. Thus providing you with a quick way to get up and running and then you can customize the grammers as you go.
Example: Arithmetic expression handling [order of precedence etc is a done to death problem] you can quickly get the grammer for this from the web.
An alternative to think about is to write a front-end extension to GCC.
Non Trivial but if you want a compiled language it saves a lot of work in the code generation section (you will still need to know love and understand flex/bison).
I never finished the complete language, I had used rply and llvmlite implements a simple foxbase language, in https://github.com/acekingke/foxbase_compiler
so if you want use python, rply or llvmlite is helpful.
if you want use golang, goyacc maybe useful. But you should write a lexical analyzer by hard coding by hand. Or you can use https://github.com/acekingke/lexergo to simplify it.