Writing Binary To file in Visual Basic with the intent of unreadability - vb.net

I am working on a program that creates a "license" file. This file is expected to be binary, containing a name, today's date, a warning date, an expiration date, and a preference of Metric or Imperial units of measurement, and essentially authorizes programs to work until the expiration date is reached, before which the warning date notifies the user that the license will expire. For this functionality to be fully utilized, the dates must not be able to be easily edited so as to prevent people from setting the date to whatever they want and keeping the program.
What I have now writes each field from a String or Integer into whatever the BinaryWriter class deems should be written when I use its "write" method. I have been experimenting with the difference between Big and Little Endian encoding, which is selectable in the form.
[code redacted]
If the entered name has no spaces, the file looks somewhat unreadable, but not enough. With Big Endian, most of the Expiration Date is still showing; with Little Endian, the other two dates are mostly visible. However, using spaces in the entered name changes the format of the output quite a bit, leaving all characters delimited by a space and therefore incredibly easy to change. My apologies that I cannot actually show you what the files look like.
Is there a better/more accepted way of storing this data? I would like the license files to work with existing FORTRAN programs, which read unformatted files in the general structure I've detailed, but reverse-engineering this sounds a bit difficult from what I've read, and my employer has offered to rewrite the FORTRAN files to accept this new license creation program if need be.

Create your license structure as text, containing whatever data you need (XML is a convenient format).
Encrypt that using public-key cryptography, with your private key (in effect, a digital signature).
Embed the public key in your app. Decrypt the license file with the public key. Deal with it as you need to.
Easy!

Most license managers I've seen tend to show the license information in plain text, followed by a checksum code that the program checks against, most likely the data hashed with some other random stuff. That provides the benefit of having a human-readable license file while being hard to change.
Be advised that license managers like these will make casual copying difficult, but someone determined to run your program without a license will still be able to crack it with a disassembler and some time.
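The "plain text followed by a checksum" layout described above can be sketched as follows. This is only a minimal illustration, assuming an HMAC-SHA256 over the license text with a secret embedded in the program; `SECRET_KEY`, `sign_license`, `verify_license`, and the `Checksum:` line format are hypothetical names chosen for the example, not part of any particular license manager.

```python
import hashlib
import hmac

# Hypothetical secret baked into both the license generator and the program.
# Anyone who extracts it from the binary can forge licenses.
SECRET_KEY = b"replace-with-your-own-secret"


def sign_license(body: str, key: bytes = SECRET_KEY) -> str:
    """Return the readable license text followed by a 'Checksum:' line."""
    tag = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{body}\nChecksum: {tag}\n"


def verify_license(text: str, key: bytes = SECRET_KEY) -> bool:
    """True only if the checksum line matches the text above it."""
    body, sep, tag = text.rstrip("\n").rpartition("\nChecksum: ")
    if not sep:
        return False  # no checksum line at all
    expected = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison of the two hex strings.
    return hmac.compare_digest(expected.encode("ascii"), tag.strip().encode("utf-8"))
```

Editing any date in the text changes the expected checksum, so a casual edit is detected, but as noted above this does not stop someone who disassembles the program and recovers the secret.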

The most secure way to do that would probably be to encrypt the license file and have the programs using the licenses decrypt the file and display the info in it as necessary.

How to create your own package for interaction with word, pdf etc

I know that there are a lot of packages around which allow you to create or read e.g. PDF, Word and other files.
What I'm interested in (and never learned at the university) is how you create such a package? Are you always relying on source code being given by the original company (such as Adobe or Microsoft), or is there another clever way of working around it? Should I analyze the individual bytes I see in e.g. PDF files?
It varies.
Some companies provide an SDK ("Software Development Kit") for their own data format, others only a specification (e.g., Adobe for PDF, Microsoft for Word), and it's up to the software developer to make sure to write a correct implementation.
Since that can be a lot of work – the PDF specification, for example, runs to over 700 pages and doesn't go deep into practically required material such as LZW, JPEG/JPEG2000, color theory, and math transformations – and you need a huge set of data to test against, it's way easier to use the work that others have done on it.
If you are interested in writing a support library for a certain file format which
is not legally protected,
has no, or only sparse, (official) documentation,
and is not already under deconstruction elsewhere (see note a below),
then yes: you need to
1. gather as many different files as possible;
2. from as many sources as possible;
   (ideally, you should have at least one program that can both read and create the files)
3. inspect them on the byte level;
4. create a 'reader' which works on all of the test files;
5. if possible, interesting, and/or required, create a 'writer' that can create a new file in that format from scratch or can convert data in another format to this one.
There is 'cleverness' involved, mainly in #3, as you need to be very well versed in how data representation works in general. You should be able to tell code from data, and string data from floating point, and UTF8 encoded strings from MacRoman-encoded strings (and so on).
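As a rough illustration of what byte-level inspection involves, the sketch below prints a hex dump and shows how the same four bytes read under several common interpretations. The helper names (`hexdump`, `interpretations`, `looks_like_utf8`) are made up for this example, and a real investigation needs far more than these three checks.

```python
import struct


def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex, and printable-ASCII columns."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)


def interpretations(four_bytes: bytes) -> dict:
    """Read the same 4 bytes as integers and floats in both byte orders."""
    return {
        "uint32_le": struct.unpack("<I", four_bytes)[0],
        "uint32_be": struct.unpack(">I", four_bytes)[0],
        "float32_le": struct.unpack("<f", four_bytes)[0],
        "float32_be": struct.unpack(">f", four_bytes)[0],
    }


def looks_like_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8 (a hint, not proof)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

For instance, the bytes `00 00 80 3f` are the float 1.0 in little-endian order but an implausibly large integer, which is the kind of clue that helps tell one data type from another.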
I've done this a couple of times, primarily to inspect the data of various games, mainly because it's huge fun! (Fair warning: it can also be incredibly frustrating.) See Reverse Engineering's Reverse engineering file containing sprites for an example approach; notably, at the bottom of my answer in there I admit defeat and start using the phrases "possibly" and "may" and "probably", which is an indication I did not get any further on that.
Note a: Not necessarily, of course. You can cooperate with others whose expertise lies elsewhere, or even do "grunt work" for existing projects: finding out and codifying fairly trivial subcases.
There are also advantages of working independently on existing projects. For example, with the experience of my own PDF reader (written from scratch), I was able to point out a bug in PDFBox.

VB.net quarantine techniques

I was thinking of an efficient way to add quarantining abilities to my antivirus application:
copy the file into a specified directory and change its extension to none (*.).
save the file's binary code in an XML database.
Which way is better?
However, I have no idea how I will recompile the binary code once the user wants to restore the file.
One way to do this is to encrypt the file with an encryption engine and move it into a quarantine folder. You could generate a random password, encrypt the file with it, and store the password somewhere (it could itself be encrypted with a master key). That is probably the easiest way of quarantining. To unquarantine, just write the exact opposite of the quarantining code. Enumerate the quarantined files into a list; when the user selects an item and presses unquarantine, call the unquarantine function with the file path as its argument.
If I had to do this (and again, I wouldn't want to be in this situation in the first place, per my comment), I would use an in-process database engine with native support for encryption and large-format binary data. I think SQL Server Compact or SQLite both fit this.
I would not use XML, because it's plain text and the binary data could be easily extracted, and I would not just change the extension, because the file could still easily be executed. Neither is much of a quarantine.
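On the question of how to get the file back: nothing needs to be "recompiled", because the stored bytes are simply written out again. A minimal sketch of the in-process database idea, assuming SQLite through Python's built-in `sqlite3` module, is shown here. The table layout and the function names `open_store`, `quarantine`, and `restore` are hypothetical, and encryption is deliberately left out because the built-in module does not provide it; a real quarantine would need to add it.

```python
import sqlite3
from pathlib import Path


def open_store(db_path: str) -> sqlite3.Connection:
    """Open (or create) the quarantine database."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS quarantine ("
        "id INTEGER PRIMARY KEY, original_path TEXT NOT NULL, content BLOB NOT NULL)"
    )
    return conn


def quarantine(conn: sqlite3.Connection, path: str) -> int:
    """Copy the file's bytes into the database, then delete the file."""
    src = Path(path)
    data = src.read_bytes()
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO quarantine (original_path, content) VALUES (?, ?)",
            (str(src), data),
        )
    src.unlink()  # remove the original only after the row is committed
    return cur.lastrowid


def restore(conn: sqlite3.Connection, item_id: int) -> str:
    """Write the stored bytes back to the original path and drop the row."""
    row = conn.execute(
        "SELECT original_path, content FROM quarantine WHERE id = ?", (item_id,)
    ).fetchone()
    if row is None:
        raise KeyError(item_id)
    original_path, content = row
    Path(original_path).write_bytes(content)
    with conn:
        conn.execute("DELETE FROM quarantine WHERE id = ?", (item_id,))
    return original_path
```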
Note that the renaming option is probably the most "efficient" of what I've seen discussed so far, but when dealing with security software correctness should always be your first concern over efficiency. There are times when you can compromise correctness for performance (3D game rendering software does this all the time, to great effect), but security software is not in this category.
What you can do is optimize later. For example, anti-virus engines use heuristics (rules of thumb that will only hold most of the time) to make their software faster; they do this in a way that favors false positives, which must then be checked more closely, rather than potentially missing a threat. This only works because the code that checks each item more closely was written and battle-tested first.

How bad is the idea of letting users upload and store files with national characters in the filename?

Our CMS accepts files with national characters in their names and stores them on the server without a problem. But how bad is this approach in the long run? For example, is it possible to store files with filenames in Hebrew, Arabic, or any other language with a non-Latin alphabet? Is there a standard, established way to handle these?
A standard way would be to generate unique names yourself and store the original file name somewhere else. Typically, even if your underlying OS and file system allow arbitrary Unicode characters in the file name, you don't want users to decide about file names on your server. Doing so may impose certain risks and lead to problems, e.g. caused by overly long names or file system collisions. Examples of sites that do this would be Facebook, Flickr and many others.
For generating the unique file name Guid values would be a good choice.
Store the original filename in a database of some sort, in case you ever need to use it.
Then, rename the file using a unique alphanumeric id, keeping the original file extension.
If you expect many files then you should create directories to group the files. Using the year, month, day, hour and minute is usually enough for most. For example:
.../2010/12/02/10/28/1a2b3c4d5e.mp3
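A minimal sketch of that naming scheme is shown below, assuming a GUID-style id from `uuid.uuid4()` and date-based directories. The function name `storage_path` and its parameters are hypothetical, and saving the original name to a database is left out.

```python
import uuid
from datetime import datetime, timezone
from pathlib import PurePosixPath
from typing import Optional


def storage_path(
    original_name: str,
    now: Optional[datetime] = None,
    unique_id: Optional[str] = None,
) -> str:
    """Build a server-side path that ignores the user's file name."""
    now = now or datetime.now(timezone.utc)
    unique_id = unique_id or uuid.uuid4().hex
    # Keep the extension only if it is plain ASCII letters and digits.
    ext = PurePosixPath(original_name).suffix.lower()
    if not (ext[1:].isascii() and ext[1:].isalnum()):
        ext = ""
    return f"{now:%Y/%m/%d/%H/%M}/{unique_id}{ext}"
```

Called as `storage_path("שיר.mp3", datetime(2010, 12, 2, 10, 28), "1a2b3c4d5e")`, this yields the same shape as the example path above, while the Hebrew name itself never touches the file system.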
Yes, I've had experience with massive mp3 collections which are notorious for being named in the language of the country where the song originates which can cause trouble in several places.
It's fine as long as you detect the charset it's in from the headers in the request, and use a consistent charset (such as UTF-8) internally.
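As a small sketch of that idea, assuming the raw filename bytes and the charset declared by the request are already available from your framework (how to obtain them is framework-specific and not shown), a strict conversion might look like this. The name `filename_to_utf8` is hypothetical.

```python
def filename_to_utf8(raw: bytes, declared_charset: str = "utf-8") -> bytes:
    """Decode with the charset the request declared, then re-encode as UTF-8.

    Decoding is strict on purpose: a mismatch raises UnicodeDecodeError
    instead of silently storing a mangled name.
    """
    text = raw.decode(declared_charset)
    return text.encode("utf-8")
```

Failing loudly at upload time is a design choice here; it is usually easier to diagnose than a file that later appears to have vanished under a garbled name.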
On a Unix server, it's technically feasible and easy to accept any Unicode character in the filename, and then convert filenames to UTF-8 before saving them. However, there might be bugs in the conversion (in the HTML templating engine or web framework you are using, or the user's web browser), so it might be possible that some users will complain that some files they have uploaded disappeared. The root cause might be buggy filename conversion. If all characters in the filename are non-Latin, and you (as a software developer) don't speak that foreign language, then good luck figuring out what has happened to the file.
It is an excellent idea. Being Hungarian, I'm pretty annoyed when I'm not allowed to use characters like áÉŰÖÜúÓÚŰÉÍí :)
There is a lot of software out there that has bugs regarding dealing with such file names, especially on Windows.
Update:
Example: I couldn't use the Android SDK (without creating a new user), because I had an é in my user name. I also ran into a similar problem with the Intel C++ compiler.
Software usually isn't tested properly with such file names. The Windows API still offers "ANSI" encoded versions of functions, and many developers don't seem to understand its potential problems. I also keep on coming across webpages that mess up my name.
I don't say don't allow such file names, in fact in the 21st century I would expect to be able to use such characters everywhere. But be prepared that you may run into problems.

How to display or view encrypted data in encrypted form?

In the Wikipedia article on Block Cipher Modes they have a neat little diagram of an unencrypted image, the same image encrypted using ECB mode and another version of the same image encrypted using another method.
At university I have developed my own implementation of DES (you can find it here) and we must demo our implementation in a presentation.
I would like to display a similar example as shown above using our implementation. However most image files have header blocks associated with them, which when encrypting the file with our implementation, also get encrypted. So when you go to open them in an image viewer, they are assumed to be corrupted and can't be viewed.
I was wondering if anybody knew of a simple header-less image format which we could use to display these? Or if anyone had any ideas as to how the original creator of the images above achieved the above result?
Any help would be appreciated,
Thanks
Note: I realise rolling your own cryptography library is stupid, and DES is considered broken, and ECB mode is very flawed for any useful cryptography, this was purely an academic exercise for school. So please, no lectures, I know the drill.
If you are using a high-level language, like Java, Python, etc., one thing you could do is load an image and read the pixel data into an array in memory. Then perform the encryption on those raw bytes, and save the image when you are done. Let all of the header data be handled by the libraries of whatever language you are using. In other words, don't treat the file as a raw sequence of bytes. Hope that helps.
Just cut off the headers before you encrypt (save them somewhere). Then encrypt only the rest. Then add the headers in front of the result.
This is especially easy with the Netpbm formats, because you only have to know how many lines to cut off. In the plain (ASCII) variants the pixel data is stored as decimal numbers, so you should take that into account when encrypting (convert them to binary first).
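The "cut off the header, encrypt the rest, put the header back" approach can be sketched as below. To sidestep the decimal-text issue, this sketch assumes the binary Netpbm variants (P5/P6) with no `#` comment lines in the header. The block function is passed in as a parameter so your own DES routine can be plugged in; `toy_block_cipher` is only a stand-in (a fixed XOR, not DES and not secure), and all function names are hypothetical.

```python
import re
from typing import Callable, Tuple

BLOCK_SIZE = 8  # DES operates on 8-byte blocks


def split_pnm(data: bytes) -> Tuple[bytes, bytes]:
    """Split a binary PGM/PPM file into (header, pixel_bytes)."""
    match = re.match(rb"P[56]\s+\d+\s+\d+\s+\d+\s", data)
    if match is None:
        raise ValueError("expected a binary PGM/PPM file without comment lines")
    return data[:match.end()], data[match.end():]


def ecb_transform(pixels: bytes, encrypt_block: Callable[[bytes], bytes]) -> bytes:
    """Apply the block function to each full block independently (ECB)."""
    full = len(pixels) - len(pixels) % BLOCK_SIZE
    out = bytearray()
    for start in range(0, full, BLOCK_SIZE):
        out += encrypt_block(pixels[start:start + BLOCK_SIZE])
    out += pixels[full:]  # a trailing partial block is left unchanged
    return bytes(out)


def encrypt_image(data: bytes, encrypt_block: Callable[[bytes], bytes]) -> bytes:
    """Encrypt only the pixel data so viewers can still open the file."""
    header, pixels = split_pnm(data)
    return header + ecb_transform(pixels, encrypt_block)


def toy_block_cipher(block: bytes) -> bytes:
    """Stand-in for a real DES block function: XOR with a fixed 8-byte key."""
    key = b"\x5a\xc3\x19\x7e\x04\xb8\x66\xd1"
    return bytes(b ^ k for b, k in zip(block, key))
```

Because each block is processed independently, identical plaintext blocks produce identical ciphertext blocks, which is why the outline of the original image stays visible in ECB mode.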

Functional testing of output files, when output is non-deterministic (or with low control)

A long time ago, I had to test a program generating a PostScript file image. One quick way to figure out if the program was producing the correct, expected output was to do an MD5 of the result to compare against the MD5 of a "known good" output I checked beforehand.
Unfortunately, PostScript contains the current time within the file. This time is, of course, different depending on when the test runs, therefore changing the MD5 of the result even if the expected output is obtained. As a fix, I just stripped off the date with sed.
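The same strip-then-hash idea can be sketched in a few lines. This assumes the timestamp lives in a `%%CreationDate:` header comment, which is where PostScript generators commonly put it; the original sed expression is not shown, so the pattern here is an assumption, and `digest_without_date` is a hypothetical name.

```python
import hashlib
import re


def digest_without_date(ps_bytes: bytes) -> str:
    """MD5 of a PostScript file with any %%CreationDate line removed."""
    stripped = re.sub(rb"(?m)^%%CreationDate:.*$\n?", b"", ps_bytes)
    return hashlib.md5(stripped).hexdigest()
```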
This is a nice and simple scenario. We are not always so lucky. For example, now I am programming a writer program, which creates a big fat RDF file containing a bunch of anonymous nodes and uuids. It is basically impossible to check the functionality of the whole program with a simple md5, and the only way would be to read the file with a reader, and then validate the output through this reader. As you probably realize, this opens a new can of worms: first, you have to write a reader (which can be time consuming), second, you are assuming the reader is functionally correct and at the same time in sync with the writer. If both the reader and the writer are in sync, but on incorrect assumptions, the reader will say "no problem", but the file format is actually wrong.
This is a general issue when you have to perform functional testing of a file format, and the file format is not completely reproducible through the input you provide. How do you deal with this case?
In the past I have used a third party application to validate such output (preferably converting it into some other format which can be mechanically verified). The use of a third party ensures that my assumptions are at least shared by others, if not strictly correct. At the very least this approach can be used to verify syntax. Semantic correctness will probably require the creation of a consumer for the test data which will likely always be prone to the "incorrect assumptions" pitfall you mention.
Is the randomness always in the same places? I.e. is most of the file fixed but there are some parts that always change? If so, you might be able to take several outputs and use a programmatic diff to determine the nondeterministic parts. Once those are known, you could use the information to derive a mask and then do a comparison (md5 or just a straight compare). Think about pre-processing the file to remove (or overwrite with deterministic data) the parts that are non-deterministic.
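A rough sketch of deriving such a mask is given below. It assumes every output has the same length, so that offsets line up; files whose random parts vary in length would need a real diff instead. The function names are hypothetical.

```python
from typing import Sequence, Set


def unstable_offsets(samples: Sequence[bytes]) -> Set[int]:
    """Offsets where at least two sample outputs disagree."""
    first = samples[0]
    if any(len(sample) != len(first) for sample in samples):
        raise ValueError("all samples must have the same length")
    return {
        i for i in range(len(first))
        if any(sample[i] != first[i] for sample in samples[1:])
    }


def apply_mask(data: bytes, offsets: Set[int], filler: int = 0) -> bytes:
    """Overwrite the non-deterministic offsets with a fixed byte."""
    out = bytearray(data)
    for i in offsets:
        out[i] = filler
    return bytes(out)


def matches_reference(candidate: bytes, reference: bytes, offsets: Set[int]) -> bool:
    """Compare two outputs while ignoring the masked offsets."""
    if len(candidate) != len(reference):
        return False
    return apply_mask(candidate, offsets) == apply_mask(reference, offsets)
```

The masked bytes can then be hashed or compared directly. The more sample runs feed `unstable_offsets`, the less likely it is that a varying byte happens to coincide across all of them and escapes the mask.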
If the whole file is non-deterministic then you'll have to come up with a different solution. I did testing of MPEG-2 decoders, which are non-deterministic. In that case we were able to compute a PSNR against a reference and fail the test if it was below some threshold (a higher PSNR means the output is closer to the reference). That may or may not work depending on your data, but something similar might be possible.
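For reference, a PSNR check over 8-bit samples can be sketched as follows. PSNR grows as the decoded output gets closer to the reference, so this sketch treats values below the threshold as failures. The 40 dB default is an arbitrary placeholder, not a value from the original test setup, and the function names are hypothetical.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[int], decoded: Sequence[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length sample sequences."""
    if len(reference) != len(decoded) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical outputs
    return 10 * math.log10(max_value ** 2 / mse)


def close_enough(reference: Sequence[int], decoded: Sequence[int],
                 threshold_db: float = 40.0) -> bool:
    """Pass when the decoded output is within the chosen quality threshold."""
    return psnr(reference, decoded) >= threshold_db
```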