Plagiarism with Jplag for different languages - plagiarism-detection

I have a set of student submissions for a coding question, and I am using JPlag to find the similarities between them.
java -jar jplag-yourVersion.jar -l java17 -r /tmp/jplag_results_exerise1/ -s /path/to/exercise1
This syntax works for a single language, but I have submissions in multiple languages such as C, C++, Java, Python, and Ruby.
Can someone suggest a way to process all the submissions, given that they are in different languages?

According to these references, it is not possible to detect cross-language plagiarism with JPlag.
Detecting source code reuse across programming languages:
"JPlag is able to detect source code reuse in different programming languages although at monolingual level, that is, one programming language at a time."
(CLSCR) CROSS LANGUAGE SOURCE CODE REUSE DETECTION USING INTERMEDIATE LANGUAGE:
"Some of the tools are Sherlock, MOSS, JPLAG etc. All of these tools detect mono language plagiarism"
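Since JPlag compares one language at a time, one possible workaround is to split the submissions by language and run JPlag once per group. Below is a minimal sketch of that idea, not a tested recipe. It only builds the command lines, mirroring the flags from the question. The extension-to-language mapping, the per-language directory layout, and the results path are assumptions made for illustration; the names accepted by `-l` depend on your JPlag version, and files with an unmapped extension (Ruby here) are simply skipped.

```python
from collections import defaultdict
from pathlib import Path

# Assumed mapping from file extension to the value passed to JPlag's -l option.
# The accepted names differ between JPlag versions; check your version's help output.
EXTENSION_TO_LANGUAGE = {
    ".java": "java17",   # taken from the command in the question
    ".c": "c/c++",       # assumption
    ".cpp": "c/c++",     # assumption
    ".py": "python3",    # assumption
}


def group_by_language(paths):
    """Group submission files by the JPlag language guessed from their extension."""
    groups = defaultdict(list)
    for path in map(Path, paths):
        language = EXTENSION_TO_LANGUAGE.get(path.suffix.lower())
        if language is not None:  # unmapped extensions are skipped
            groups[language].append(path)
    return dict(groups)


def build_commands(groups, jar="jplag-yourVersion.jar", results_root="/tmp/jplag_results"):
    """Build one JPlag command per language, reusing the flags from the question."""
    commands = []
    for language in sorted(groups):
        # Hypothetical layout: each language's files live in their own directory.
        safe = language.replace("/", "_").replace("+", "p")
        commands.append([
            "java", "-jar", jar,
            "-l", language,
            "-r", f"{results_root}/{safe}/",
            "-s", f"/path/to/exercise1/{safe}",
        ])
    return commands
```

Each resulting list could be passed to `subprocess.run` once the files have actually been sorted into per-language directories. Note that this still only finds similarities within each language, not across languages.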

Related

Are there programming languages that directly translate into another?

Is there a programming language that doesn't compile, but rather just translates into another language? I apologize if this is a stupid question to ask, but I was just wondering if this would be a literal shortcut in creating a programming language. Wouldn't it be easier (probably not speedy) but still doable?
Is there a programming language that doesn't compile, but rather just translates into another language?
That makes no sense to me. My definition of compilation is "translating from one language (the source language) to another (the target language)".
Usually the source language is something written by humans and the target language is machine code (or asm), but that's not a requirement. In fact, many compilers are structured as multiple layers, each translating to another intermediate language (until the final layer emits code in the target language).
And it's not directly related to a language, but a particular implementation. We can take C, for example: There are C interpreters, C compilers that target assembler code, C compilers that target machine code (of various platforms), C compilers that target JavaScript, C compilers that target Perl, etc.
As for simplifying the implementation of a language: Yes, there are various kinds of code reuse that apply.
One way is to separate compiler front-ends (translate from source language to an internal abstract representation) and back-ends (translate from the internal abstract representation to machine code for a particular platform). This way you can keep the front-end and only write a new back-end if you want to support another target platform. You can also keep the back-end and only write a new front-end if you want to add support for another source language.
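To make the front-end/back-end split concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the front-end borrows Python's own `ast` module to parse integer arithmetic into a small tuple-based internal representation, and the two back-ends emit a C-style expression and instructions for a hypothetical stack machine. Real compilers use far richer representations, but the shape is the same.

```python
import ast


def front_end(source):
    """Front-end: parse an arithmetic expression into a small internal representation.

    The IR is a nested tuple: ("num", value) or ("bin", op, left, right).
    """
    ops = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}

    def lower(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return ("num", node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in ops:
            return ("bin", ops[type(node.op)], lower(node.left), lower(node.right))
        raise ValueError("unsupported expression")

    return lower(ast.parse(source, mode="eval").body)


def back_end_c(ir):
    """Back-end 1: emit a C-style expression string."""
    if ir[0] == "num":
        return str(ir[1])
    _, op, left, right = ir
    return f"({back_end_c(left)} {op} {back_end_c(right)})"


def back_end_stack(ir):
    """Back-end 2: emit instructions for a hypothetical stack machine."""
    if ir[0] == "num":
        return [f"PUSH {ir[1]}"]
    _, op, left, right = ir
    names = {"+": "ADD", "-": "SUB", "*": "MUL"}
    return back_end_stack(left) + back_end_stack(right) + [names[op]]
```

Both back-ends consume the same IR, so adding a third target means writing one more function and leaving `front_end` alone. Likewise, a second source language would only need a new front-end that produces the same tuples.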
Another way is to use a full-blown programming language as the intermediate representation. For example, your new compiler might produce C code, which can then be compiled to machine code by any C compiler. The first implementation of C++ did exactly this. C has a number of drawbacks as a compiler target language; there have been efforts to create languages better suited for the task (see e.g. C--, which is used internally by GHC (a Haskell compiler)).
Today the most commonly translated language is JavaScript. Newer ECMAScript constructs are translated to older versions of the language for compatibility with older browsers; Babel is one tool that does this translation.
There are also other languages, like TypeScript and CoffeeScript, that are translated to JavaScript.
f2c translates Fortran 77 to C code, so it is probably an example of what you are looking for.
All general-purpose programming languages are Turing complete. That means any one of them can be translated into another.
When creating a new programming language, designers often have their first prototypes translate the new language into one they are familiar with. This makes it easier to check that the translation is correct and that the new language is working correctly, and to share ideas with colleagues, since the output is machine independent.
When their design becomes stable, they make a front end to an existing compiler to do the compiling. Using an existing compiler has several advantages. Optimization is instantly available. The new language can access existing libraries. Compiling can be targeted to all the existing back ends, making the language available on different architectures.
Yes, this is one technique for creating new languages. The first experiments in what became C++ were translated to C for compilation. Taken from http://wiki.c2.com/?CeeAsAnIntermediateLanguage:
Examples of using C in this fashion:
CeeFront; the original implementation of C++, translated to C.
Comeau C++ (http://www.comeaucomputing.com/) translates C++ to C. It
is the first C++ compiler to provide full core language support for
standard C++.
Several Java-to-C translators out there (some translate Java source;
others translate JavaByteCode to C)
Many experimental language compilers use C as a backend, rather than
emitting assembly language directly.
SqueakSmalltalk's VirtualMachine is written in a subset of Smalltalk
which gets translated to C and fed to the C compiler. The
VirtualMachine used by Scheme48 is written in a StaticallyTyped
SchemeLanguage dialect called PreScheme which is compiled to C. (The
PreScheme compiler itself is written in full Scheme.)
Several SchemeImplementations compile to C (e.g. RScheme, Bigloo and
Chicken). These Schemes often use the technique described in
CheneyOnTheMta to provide support for ProperTailRecursion.
More recently, compilers targeting a subset of JavaScript capable of efficient on-the-fly compilation have been created, such as Emscripten.
And if you count assembly language as well as high level languages, WebAssembly or other bytecode languages fit.

Difference between scripting and non scripting language

I am wondering what the difference is between a scripting and a non-scripting language, for example Lua and C++. In game development I often read that companies are hiring programmers who must know a scripting language. Thank you!
Some of this is somewhat historical in nature.
Non-scripted languages like C and C++ are compiled into "raw machine code" (RMC).
That RMC is then run directly on the machine. Note that RMC is typically
very specific to the underlying CPU/hardware AND to the supporting Operating
System. So if you want to run a C program on both Linux and Windows, it has to be
compiled for each (two copies to maintain and distribute).
A scripted language is typically NOT compiled. Instead, the source
code is passed to an interpreter that understands the language. The
interpreter itself is typically written in a language that is
itself compiled to RMC. The interpreter's task is to read the
scripted language, and translate that into operations done by RMC.
The line has blurred in recent years (decades?) with the advent of
systems like Java. With languages like Java, source code is
compiled to an intermediate/portable language, and the Java Virtual
Machine handles the translation of that portable language into
operations for the target CPU/OS.

Best scripting language for cross compiling to ARM

I am looking for the best scripting language interpreter for cross compiling to an ARM processor. Here are the requirements for "best":
It's small. Ideally, I'd like to be able to decide which parts of the language and "standard" libraries are supported. (For instance: file system, nah, don't want that. Floating point math, nope, don't want that either.)
It's easy. Ideally, I'd like some documentation/tutorial/examples on how to do the cross-compile.
The goal: I'm writing a small, simple web server in an embedded ARM device and I'd like to do some string processing easily. The code is currently written in C.
I'd like the server and system-level code to be in C. I'd like to write the web "application" in a scripting language. The language features I'm most interested in are:
built-in string support
built-in regex support
built-in map support (i.e. key-value pairs)
I'd like the "best" to come from the following list: Perl, Python, Ruby, Lua. But I would be open to other language suggestions.
I would consider Lua.
Lua is distributed in a small package and builds out-of-the-box in all
platforms that have an ANSI/ISO C compiler. Lua runs on all flavors of
Unix and Windows, and also on mobile devices (such as handheld
computers and cell phones that use BREW, Symbian, Pocket PC, etc.) and
embedded microprocessors (such as ARM and Rabbit) for applications
like Lego MindStorms.
Pattern matching in Lua
Lua tables should meet your map requirement.
Lua string manipulation
Lua compared to Python
(Note: I really like Python and for general-purpose scripting would prefer it to Lua, but in terms of portability and performance on embedded processors I'd lean towards Lua.)

Can statically compiled languages replace scripting language?

Assuming you can get a dynamic interpreter, can statically compiled languages replace scripting languages? I never quite understood why anyone would use a scripting language. I am talking about the PC, not a limited system which needs a simplistic interpreter. I have seen some Python install scripts and have seen similar Python and C# solutions to a problem. So why use a scripting language?
NOTE: There are things that bother me about C#; I am not asking why not use C# instead. I am asking why use a scripting language? I find statically compiled languages much easier to debug and often easier to code in.
There is very little distinction these days between compiling and interpreting. Look at how an interpreted language is executed - the first step is to convert the script into some kind of internal executable form, like byte code that can be executed by a simpler instruction set. This is essentially compilation to a virtual machine format. This is exactly what modern compiled languages do. And when compiled languages are deployed in server-side web apps, they even recompile from the source on the fly. So there's practically no difference in terms of the compile/execute technique.
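As a small illustration of the "convert the script into some internal executable form" step, CPython exposes it directly through the built-in `compile` and `exec`. The snippet below is only a sketch of that two-step process; the source string and the `<example>` filename are arbitrary.

```python
import types

source = "result = 6 * 7"

# Step 1: translate the source text into a code object (CPython bytecode).
code_object = compile(source, "<example>", "exec")

# Step 2: have the interpreter's virtual machine execute that code object.
namespace = {}
exec(code_object, namespace)

# The compiled form is an ordinary object that can be inspected.
is_code = isinstance(code_object, types.CodeType)
```

After this runs, `namespace["result"]` holds the computed value, and `code_object.co_code` contains the raw bytecode. The `dis` module can print that bytecode in readable form.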
The only difference is in the details of the instruction set, specifically in the type system. Scripting languages are usually (but not always) dynamically typed. But many large applications are also written in dynamically typed languages too. So again, there is no clear distinction here.
Personally I think static typing, far from being "extra unnecessary effort" (as it is often described) is actually a huge productivity booster, making it much easier to write short snippets correctly on the first attempt, thanks to intellisense/autocompletion. To underline this, look at how Microsoft has improved the jQuery library simply by adding static type information to it (in specially formatted comments) so we can have intellisense in the IDE.
And meanwhile, static languages (including C# and Java) are bringing in more dynamic typing features.
So I see these categories as eventually merging and the distinction being meaningless.
Wikipedia says that a scripting language is a language that controls other software. You can do that with C#, but true scripting languages like PowerShell are designed specifically for this.
I tend to think of a scripting language in more "interactive" terms than C#. With a scripting language, you can write a line or two of code, execute it and see the results immediately. That's not so easy in C#, where you have to put your code in a Console Application, or fire it off from a unit test, or type it into the Immediate window where you don't have intellisense.
That rapid cycle of write, execute allows rapid prototyping of complete "scripts" in a scripting language, because it gives you immediate feedback on each line of code.
This kind of question often starts flame wars as people are passionate about their respective camps.
In the olden days of computing, Unix command line tools and console shells provided a rich scripting environment where all sorts of processing could be done. You didn't need to be an expert programmer in any specific language and could string (pun intended) together various programs (that other people wrote) using pipes to massage your data, which was mostly text rather than binary. It is quick and easy to make changes to your batch command file. You don't have a source file that has to be edited, compiled, and linked with external static or shared libraries (DLLs in the case of Windows).
One thing scripting does not normally have is speed. You don't write device drivers and live internet trading AI systems in a scripting language. But if you run a script once a day on some data received via e-mail or FTP, you don't normally care how long it takes, as it can run in the background anyway.
Fast forward to the present and the waters become muddy. Some scripting environments offer a kind of speed-up facility where they will read your script and more or less compile it and link in modules, the same way a normal C++ or VB program might, for speed purposes. But this is very iffy and can't be relied on.
So how do you choose which route to go? Start by doing tasks with scripting. If the script runs too slowly, or you are having to run it every 5 minutes, then parts of it might benefit from a section written in a traditional language, or the whole thing could be rewritten in one.
Like anything, dabble and learn.
Each is used for different purposes. Programs written in scripting languages are often not self-contained; they often function as "glue code" or (as Robert Harvey mentions) to automate a task. You often find scripting language interpreters embedded within an application (cf Python in Blender; Guile, Perl and Python in GIMP; JS in umpteen different browsers; Lua in countless games). Compiled languages, on the other hand, are used to produce self-contained applications. Scripts are mostly cross-platform; compiled applications usually aren't.
Note that a scripting language doesn't necessarily use an interactive interpreter (e.g. Perl), and an interpreted language isn't necessarily used for scripts (e.g. games made using PyGame). Note also that there's nothing about the languages themselves that makes them interpreted or compiled. You could have a C# interpreter or a Ruby compiler. There have been a number of Lisp systems that offered both interpreters and compilers.
I would call my shell (bash) a scripting language, and I don't see a compiled replacement coming.
I like to use Scala, which is a statically typed language that comes with an interpreter-like REPL interface and, thanks to type inference, looks pretty much like a scripting language; have a look here: http://www.simplyscala.com/ .
But it isn't meant to be the glue between other programs as the shell is, so for small jobs, which are easily verified by hand and eye, which are just a few lines of code, I prefer to use the shell. And jumping from directory to directory is comfortable in a shell, where the prompt shows where I am.
Before we begin, I don't think that I've ever met a static language user who "got" scripting languages without trying them, including myself. It is a different experience.
So no. Basically, you can add features to static languages which make them superficially seem like scripting languages (like simple type inference), but it's not the same:
Many scripting language users hate static languages. They feel constrained. Scripting languages are typically very good at not getting in the user's way, which is sacrificed in static languages for speed/correctness.
Duck typing will not appear in static languages.
Scripting language users don't like type annotations. It's not really possible to provide a type-inference system for scripting languages, and the simple type inference appearing in some languages now only works for static types.
Techniques like monkey patching (which to my mind is a very bad idea) are pervasive in Ruby and allow for very powerful techniques, which won't become available soon in static languages either.
Which isn't to say that a yet-to-be-designed language can't handle scripting language features in a relatively static way, but it would be difficult for it to become popular relative to the entrenched Python/PHP/Perl/Ruby/JavaScript set. Factor is the closest thing, AFAICT.
What will happen is that scripting language implementations will get faster by using JITs.
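For readers who haven't run into the two techniques named in the list above, here is a minimal Python sketch of duck typing and monkey patching. The class and function names are invented for the example, and it is only meant to show what the terms refer to, not to settle whether static languages can offer the same thing.

```python
class Duck:
    def speak(self):
        return "Quack"


class Robot:
    def speak(self):
        return "Beep"


def announce(thing):
    # Duck typing: no shared base class or declared interface is required;
    # any object with a speak() method works.
    return thing.speak().upper()


def patched_speak(self):
    return "Honk"


def monkey_patch_duck():
    # Monkey patching: replace a method on an existing class at run time.
    Duck.speak = patched_speak
```

`announce` accepts both `Duck()` and `Robot()` even though they are unrelated classes. After calling `monkey_patch_duck()`, every `Duck` instance, including ones created earlier, uses the replacement method.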
Can a screw driver replace a hammer ? No, because you just don't use them for the same purpose. And if both exist, and if such a lot of people use either one or the other, there must be a reason...
Same answer for:
class inheritance vs prototype;
imperative vs oo;
static vs dynamic typing;
strongly vs weakly typed;
manual memory management vs GC;
C# vs Java;
blue vs red;
man vs woman;
batman vs superman (but I do think superman would win... wait, there is kryptonite... oh man, I don't know...)
etc...
Because it is shorter to write, since it is a higher-level language, and it doesn't need the compilation cycle, which also makes things shorter.
I am asking why use a scripting
language? I find static compiled
languages much easier to debug and
often easier to code in.
Because I find loosely-typed dynamic languages without an explicit compile-run cycle much easier to debug and generally easier to code in.

Why have language interpreters be written in the target language? [duplicate]

Closed 10 years ago.
Possible Duplicate:
Bootstrapping a language
What's the importance of having an interpreter for a given language written in the target language (for example, PyPy)?
It's not so much about writing the interpreter in itself; it's more about writing the interpreter in a high-level language rather than in C. Ideally, doing so makes it possible to change details of the implementation and makes the interpreter more modular.
For the specific case of PyPy, writing the interpreter and the core objects in (R)Python allows retargeting PyPy to different targets (C, JVM, .NET, JavaScript, etc.), and also allows replacing aspects such as the garbage collector.
I'm sure there are many different reasons for doing it. In some cases, it's because you truly believe the language is the best tool... so writing the language interpreter or compiler in the language itself can be seen as a form of dogfooding. If you are really interested in this subject, the following article is a really amazing read about the development of Squeak. The current version of Squeak is a Smalltalk runtime written in Smalltalk.
http://users.ipa.net/~dwighth/squeak/oopsla_squeak.html
An added benefit is that if you implement good debuggers and IDEs for your target language, they also work for your source language.
This way, you can prove that the target language is serious business, because being able to make it compile something is a sign that it is a good language.
OK, C++ and Java produce compilers as well... so maybe that argument is only half as good as it may seem.