I was thinking of an efficient way to add quarantining abilities to my antivirus application:
1. Copy the file into a specified directory and change its extension to none (*.).
2. Save the file's binary code in an XML database.
Which way is better?
However, I have no idea how I would reconstruct the file from the stored binary code once the user wants to restore it.
One way to do this is to encrypt the file with an encryption engine and move it into a quarantine folder. You could generate a random password, encrypt the file with that password, and store the password somewhere (that password could itself be encrypted with a master key). That is probably the easiest way of quarantining. To unquarantine, just do the reverse of the quarantining steps. Enumerate the quarantined files into a list and filter it as needed; when the user clicks an item and presses Unquarantine, call the unquarantine function with the file path as the argument.
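For illustration, here is a rough Python sketch of that scheme. The folder layout, file names, and the use of the third-party cryptography package are assumptions of mine, not something the answer specifies, and error handling is omitted:

import os
from cryptography.fernet import Fernet

QUARANTINE_DIR = r"C:\AV\Quarantine"              # hypothetical locations
KEY_DIR = os.path.join(QUARANTINE_DIR, "keys")

def quarantine(path):
    # Encrypt with a random per-file key, move the result into quarantine,
    # and keep the key on the side (a real product would encrypt this key
    # with a master key, as suggested above).
    os.makedirs(QUARANTINE_DIR, exist_ok=True)
    os.makedirs(KEY_DIR, exist_ok=True)
    key = Fernet.generate_key()
    with open(path, "rb") as f:
        token = Fernet(key).encrypt(f.read())
    name = os.path.basename(path)
    qpath = os.path.join(QUARANTINE_DIR, name + ".quarantined")
    with open(qpath, "wb") as f:
        f.write(token)
    with open(os.path.join(KEY_DIR, name + ".key"), "wb") as f:
        f.write(key)
    os.remove(path)                               # remove the original file
    return qpath

def unquarantine(qpath, restore_to):
    # The exact reverse: load the key, decrypt, write the original back.
    name = os.path.basename(qpath)[:-len(".quarantined")]
    with open(os.path.join(KEY_DIR, name + ".key"), "rb") as f:
        key = f.read()
    with open(qpath, "rb") as f:
        data = Fernet(key).decrypt(f.read())
    with open(restore_to, "wb") as f:
        f.write(data)
    os.remove(qpath)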
If I had to do this (and again, I wouldn't want to be in this situation in the first place, per my comment), I would use an in-process database engine with native support for encryption and large-format binary data. I think SQL Server Compact or SQLite would both fit the bill.
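To make the idea concrete, here is a minimal sketch using Python's built-in sqlite3. Note that plain SQLite does not encrypt anything by itself, so you would either encrypt the bytes before inserting them or use something like SQLCipher to get the encrypted-database behaviour this answer has in mind; the table and file names are invented:

import sqlite3

conn = sqlite3.connect("quarantine.db")           # hypothetical database file
conn.execute("""CREATE TABLE IF NOT EXISTS quarantine (
                    id INTEGER PRIMARY KEY,
                    original_path TEXT NOT NULL,
                    data BLOB NOT NULL)""")

def quarantine(path):
    # Store the raw bytes as a BLOB; encrypt them first (or use SQLCipher)
    # to get actual quarantine-grade protection.
    with open(path, "rb") as f:
        blob = f.read()
    conn.execute("INSERT INTO quarantine (original_path, data) VALUES (?, ?)",
                 (path, blob))
    conn.commit()

def restore(record_id, target_path):
    row = conn.execute("SELECT data FROM quarantine WHERE id = ?",
                       (record_id,)).fetchone()
    with open(target_path, "wb") as f:
        f.write(row[0])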
I would not use XML, because it's plain text and the binary data could be easily extracted, and I would not just change the extension, because the file could still easily be executed. Neither is much of a quarantine.
Note that the renaming option is probably the most "efficient" of what I've seen discussed so far, but when dealing with security software correctness should always be your first concern over efficiency. There are times when you can compromise correctness for performance (3D game rendering software does this all the time, to great effect), but security software is not in this category.
What you can do is optimize later. For example, anti-virus engines use heuristics (rules of thumb that will only hold most of the time) to make their software faster; they do this in a way that favors false positives that must then be more closely checked rather than potentially missing a threat. This only works because the code that more closely checks each item was written and battle-tested first.
Related
I know that there are a lot of packages around which allow you to create or read e.g. PDF, Word and other files.
What I'm interested in (and never learned at the university) is how you create such a package? Are you always relying on source code being given by the original company (such as Adobe or Microsoft), or is there another clever way of working around it? Should I analyze the individual bytes I see in e.g. PDF files?
It varies.
Some companies provide an SDK ("Software Development Kit") for their own data format; others provide only a specification (e.g., Adobe for PDF, Microsoft for Word), and it's up to the software developer to make sure to write a correct implementation.
Since that can be a lot of work – the PDF specification, for example, runs to over 700 pages and doesn't go deep into practically required material such as LZW, JPEG/JPEG2000, color theory, and math transformations – and you need a huge set of data to test against, it's way easier to use the work that others have done on it.
If you are interested in writing a support library for a certain file format which
is not legally protected,
has no, or only sparse (official) documentation,
and is not already under deconstruction elsewhere,(a)
then yes: you need to
1. gather as many different files as possible, from as many sources as possible (ideally, you should have at least one program that can both read and create the files);
2. inspect them on the byte level;
3. create a 'reader' which works on all of the test files;
4. if possible, interesting, and/or required, create a 'writer' that can create a new file in that format from scratch or can convert data in another format to this one.
There is 'cleverness' involved, mainly in #3, as you need to be very well versed in how data representation works in general. You should be able to tell code from data, and string data from floating point, and UTF8 encoded strings from MacRoman-encoded strings (and so on).
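As a rough example of what inspecting on the byte level looks like in practice, a throwaway script along these lines (file name, offsets, and field guesses are invented) is often the starting point:

import struct

with open("unknown.dat", "rb") as f:              # hypothetical sample file
    chunk = f.read(64)

print(chunk[:16].hex(" "))                        # raw hex view of the header

# Is there a little-endian 32-bit length/offset field at byte 4?
(maybe_length,) = struct.unpack_from("<I", chunk, 4)
print("possible length field:", maybe_length)

# Telling UTF-8 text from MacRoman text: try both and see which looks sane.
for codec in ("utf-8", "mac_roman"):
    try:
        print(codec, "->", chunk[16:32].decode(codec))
    except UnicodeDecodeError:
        print(codec, "-> not valid")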
I've done this a couple of times, primarily to inspect the data of various games, mainly because it's huge fun! (Fair warning: it can also be incredibly frustrating.) See Reverse Engineering's Reverse engineering file containing sprites for an example approach; notably, at the bottom of my answer in there I admit defeat and start using the phrases "possibly" and "may" and "probably", which is an indication I did not get any further on that.
(a) Not necessarily, of course. You can cooperate with others whose expertise lies elsewhere, or even do "grunt work" for existing projects – finding out and codifying fairly trivial subcases.
There are also advantages to working independently of existing projects. For example, with the experience of my own PDF reader (written from scratch), I was able to point out a bug in PDFBox.
The Project
I've been asked to work on an interesting project -- what amounts to a basic Web CMS -- that uses HTML/CSS/jQuery with PHP. However, one requirement is that there won't be a database to house the data (they want flat files for the documents/pages -- preferably in JSON format).
In a very basic sense, it'll be used to generate HTML pages via a very "non-techie" interface. Each installation would only have around 20 pages, but a few may get up to 100. It has to be fairly easy to drop onto a PHP capable server and run, with very little setup needed.
What's Out There
There are tons of CMS options and quite a few flat-file versions. But an OSS or other existing CMS is not an option. They need a simple proprietary system.
Initial Thoughts
So flat files it is... but I'd really like to get some feedback on the drawbacks, and if it is worth the effort to try and convince them to use something like MySQL (SQLite or CouchDB are out since none of the servers can be configured to run them at the present time).
Of course the document files are pretty straightforward, but we're also talking about login info for 1 or 2 admins per installation, a few lists, as well as configs/settings (which also can easily be stored in a file with protection).
The Dilemma
If there are benefits to using MySQL rather than JSON-formatted files and some arrays in a simple project like this -- beyond my own pre-conceived notions :) -- I'll be sure to argue them.
But honestly I can't see any that outweigh their need to not have a database system.
I'd appreciate your insight and opinions.
If you can't cite a specific need for relational table design, then you're good with flat files. Build as specified. The moment you can cite a specific need, let them know; upgrading isn't that hard, if your perception is timely (that is, if you aren't in the position of having to normalize data that should have been integrated earlier).
It's a shame you can't use CouchDB; this seems like the perfect application for it. Keep in mind that using flat files severely constrains your architecture and, especially, scalability.
What's the best case scenario for your CMS app? It's successful and people want to use it more? If you're using flat-files it'll be harder to service and improve your system (e.g. make it more robust, and add new features for future versions) and performance will not scale well. So "success" in this case is at best short-lived, as success translates into more and more work for less and less gains in feature-set and performance.
Then again, if the CMS is designed right, switching from flat files to an RDBMS should be as simple as swapping in a different data-access layer.
Will this be installed on any shared hosting sites? For this to work somewhat safely, a mechanism like suEXEC needs to be set up properly, as the web server will need write permissions to various directories.
What would be cool with a simple site that was fed via JSON and jQuery is that the site wouldn't need to reload on each click. Just the relevant data would change. You could then use hashes in the location bar to keep track of where you were (e.g. http://localhost/#about).
The problem being that if they are editing the raw JSON file they can mess it up pretty quickly. I think your admin tools would have to generate the JSON files based on the input so that you can ensure nothing breaks. The admin tools would be more involved than the site (though isn't that always the case with dynamic sites).
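A rough sketch of that "admin tools generate the JSON" idea, in Python rather than the project's PHP and with invented names; the atomic rename is what keeps a half-written file from ever breaking the site:

import json
import os
import tempfile

PAGES_DIR = "data/pages"                          # hypothetical flat-file store

def save_page(slug, title, body_html):
    # Called by the admin interface only; users never touch the JSON by hand.
    os.makedirs(PAGES_DIR, exist_ok=True)
    page = {"slug": slug, "title": title, "body": body_html}
    fd, tmp = tempfile.mkstemp(dir=PAGES_DIR)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(page, f, ensure_ascii=False, indent=2)
    os.replace(tmp, os.path.join(PAGES_DIR, slug + ".json"))   # atomic swap

def load_page(slug):
    with open(os.path.join(PAGES_DIR, slug + ".json"), encoding="utf-8") as f:
        return json.load(f)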
What are the predicted data sizes for the CMS?
A large reason for the use of an RDBMS is quick, specific access to large amounts of data. The data format might not be large, but if there is a lot of data, then an RDBMS might be better in the long run.
While an RDBMS may be necessary for a very large CMS, a small one could run off flat files very well. A lot of CMS products out there fall down in that regard, I think, by throwing an RDBMS into the mix when there's no real need.
However, if you are using flat files, there are security issues which others have highlighted. Another issue I've come across is hosting providers using the disable_functions directive in php.ini to disable file I/O functions like fopen() and friends. If you're hosting your CMS on a box you control, you won't have this problem but if you're using a third-party provider, check first.
As the original poster, I wasn't signed in, so I'm following up to the answers so far in an answer (sorry if this is bad form).
There may be instances where this is on a shared host.
Though the JSON files can technically be edited, this won't be the case. The admin interface will be robust enough to do all of the creating/editing of pages.
The size for each install will be relatively small: 1-2 admins, 10-100 pages. A few lists of common items may run longer (snippets of copy, for example).
Security will be a big issue -- any other options or suggestions on this specifically?
Well, isn't the problem that they are distrustful of any database system? Isn't the problem more in their thinking than in the technology? Maybe they are afraid of a database because it sounds complex to them. In that case, just present them with some very simple CMS (like CMS Made Simple, which I've heard is really simple and quick to learn); if they see everything is easy, then maybe they just won't care what's behind it, whether it's a database or whatever!
They might listen to arguments like better maintainability, lower maintenance cost, and a much easier handover to another webmaster than with a proprietary solution (they are not dependent on you), etc.
I'm working on an ASP.net web application that uses SQL as a database back-end. One issue that I have is that it sometimes takes a while to get my DBA to create or modify tables in the database which under no circumstance am I allowed to modify on my own.
Here is something that I do when I expect users to upload files along with their data.
Suppose the user uploads a new record for a table called Student_Records. The user uploads a record with fname Bob and lname Smith. The record is assigned primary key 123. The user also uploads two files: attendance_record.pdf and homework_record.pdf. Let's suppose that I have a network share: \\foo\bar where the files are saved.
One way of handling this situation would be to have a table Student_Records_Files that associates the key 123 with Bob Smith. However, since I have trouble getting tables created, I've gone and done something different: when I save the files on the server, I call them 123_attendance_record.pdf and 123_homework_record.pdf. That way, I can easily identify what table record each file is associated with without having to create a new SQL table. I am, in essence, using the file system itself as a join table (obviously, the file system is a type of database).
In my code for retrieving the files, I scan the directory \\foo\bar and look for files that begin with each primary key number from Student_Records.
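In Python terms (the real code is ASP.NET), the retrieval described above amounts to roughly this prefix scan; the share path and key come from the example above:

import glob
import os

SHARE = r"\\foo\bar"

def files_for_record(primary_key):
    # Return every uploaded file whose name starts with "<key>_".
    pattern = os.path.join(SHARE, f"{primary_key}_*")
    return glob.glob(pattern)

# e.g. files_for_record(123) finds 123_attendance_record.pdf and
# 123_homework_record.pdf on the share.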
It seems to work very well, but is it good practice?
There is nothing wrong with using the file system to store files. It's what it is used for.
There are a few things to keep in mind though.
I would consider a better method of storing the files - perhaps a directory for each user, rather than simply appending the user id to the filename.
Ensure that the file store is resilient and backed up with the same regularity as your database. If your database is configured to give you a backup every 10 minutes, but your file store only does a backup every day (or worse week) then you might be in for a world of pain.
Also consider what would happen if the user uploads two documents that have the same name.
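A small sketch of those two suggestions (a directory per record instead of a filename prefix, plus a rename step so duplicate names can't clobber each other), with hypothetical paths; the real application would do this in .NET:

import os
import shutil

STORE = r"\\foo\bar"

def save_upload(record_id, uploaded_path):
    target_dir = os.path.join(STORE, str(record_id))
    os.makedirs(target_dir, exist_ok=True)

    name = os.path.basename(uploaded_path)
    target = os.path.join(target_dir, name)
    counter = 1
    while os.path.exists(target):                 # duplicate name: add a suffix
        base, ext = os.path.splitext(name)
        target = os.path.join(target_dir, f"{base}({counter}){ext}")
        counter += 1

    shutil.copy2(uploaded_path, target)
    return target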
First of all, I think it's a bad practice, in general, to design your architecture based on how responsive your DBA is. Any given compromise based on this approach may or may not be a big deal, but over time it will result in a poorly designed system.
Second, making the file name this critical seems dangerous to me; there's no protection against a person or application modifying the filename without realizing its importance.
Third, one of the advantages of having a table to maintain the join between the person and the file is that you can add additional data, such as: when was the file uploaded, what is the MIME type, has the file been read by anyone through the system, is this file a newer version of a previous file, etc. etc. Metadata can be very powerful, and the filesystem offers only limited ways to store it.
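For example, such a join table might look something like this (shown with Python's sqlite3 purely for brevity; the real backend in the question is SQL Server, and the column names are illustrative, not from the question):

import sqlite3

conn = sqlite3.connect("files.db")
conn.execute("""CREATE TABLE IF NOT EXISTS student_record_files (
                    id            INTEGER PRIMARY KEY,
                    student_id    INTEGER NOT NULL,  -- FK to Student_Records
                    file_path     TEXT    NOT NULL,
                    mime_type     TEXT,
                    uploaded_at   TEXT,              -- when it was uploaded
                    replaces_file INTEGER            -- newer version of which file
                )""")
conn.commit()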
There are really two questions here. One is, given that for administrative reasons you cannot get changes made to the database schema, is it acceptable to devise some workaround? To that I'd have to say yes. What else can you do? In theory, if it takes two weeks to get the DBA to make a schema change for you, then this two weeks should be added to any deadline that you are given. In practice, this almost never happens. I've often worked places where some paperwork or whatever required two weeks before I could even begin work, and then I'd be given two weeks and one day to do the project. Sometimes you just have to put it together with rubber bands and bandaids.
Two is, is it a good idea to build a naming convention into file names and use this to identify files and their relationship to other data? I've done this at times and it's generally worked for me, though I have a perhaps irrational emotional feeling that it's not a good idea.
On the plus side, (a) By building information into a file name, you make it easy for both the computer and a human being to identify file associations. (Human readable as long as the naming convention is straightforward enough, anyway.) (b) By eliminating the separate storage of a link, you eliminate the possibility of a bad link. A file with the appropriate name may not exist, of course, but a database record with appropriate keys may not exist, or the file reference in such a record may be null or invalid. So it seems to solve one problem there without creating any new problems.
Potential minuses are: (a) You may have characters in the key that are not legal in file names. You may be able to just strip such characters out, or this may cause duplicates. The only safe thing to do is to escape them in some way, which is a pain. (b) You may exceed the legal length of a file name. Not as much of an issue as it was in the bad old 8.3 days. (c) You can't share files. If a database record points to a file, then two db records could point to the same file. If you must make two copies of a file, not only does this waste disk space, but it also means that if the file is updated, you must be sure to update all copies. If in your application it would make no sense to share files, then this isn't an issue.
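For point (a), one reversible way to "escape them in some way" is percent-encoding, sketched here in Python with hypothetical helper names; unlike stripping characters, it cannot make two different keys collide:

from urllib.parse import quote, unquote

def key_to_filename(key, suffix):
    # Percent-encode anything that isn't filename-safe; reversible by design.
    return quote(str(key), safe="") + "_" + suffix

def filename_to_key(filename):
    # Assumes the key itself contains no underscore (the question's keys are
    # plain numeric primary keys like 123).
    return unquote(filename.split("_", 1)[0])

# key_to_filename("AB/12:3", "attendance_record.pdf")
#   -> "AB%2F12%3A3_attendance_record.pdf"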
You have to manage the files in some way, but you had to do that anyway.
I really can't think of any over-riding minuses. As I say, I've done this on occasion and didn't run into any particular problems. I'm interested in seeing others' responses.
I think it is not good practice because you are making your working application very dependent on specific implementation details, which will make it pretty hard to maintain in the future, or for other people who later need access to your code/API.
Now, whether you should do this or not is a whole different question. If you are really taking that much of a performance hit and it is significantly easier to work with how you have it, then I would say go ahead and break the rules. Ideally it's good to follow best-practice methods, but sometimes you have to bend the rules a little to make things work.
First, why is this a table change as opposed to a data change? Once you have the tables set up you should only need to update rows in that table every time that a user adds new files. If you have to put up with this one-time, two-week delay then bite the bullet and just get it done right.
Second, instead of trying to work around the problem why don't you try to fix the problem? Why is the process of implementing table changes so slow? Are you at least able to work on a development database (in which you have control to test and try out these changes)? Even if it's your own laptop you can at least continue on with development. Work with your manager, the DBA, and whoever else you need to, in order to improve the process. Would it help to speed things up if your scripts went through a formal testing process before you handed them off to the DBA so that he doesn't need to test the scripts, etc. himself?
Third, if this is a production database then you should probably build this two-week delay into your development cycle. You know that it takes two weeks for the DBA to review and implement changes in production, so if you have a deadline for releasing functionality, make sure you have enough lead time for it.
Building this kind of "data" into a filename has inherent problems as others have pointed out. You have no relational integrity guarantees and the "data" can be changed without knowledge of the rest of the application/database.
It's best to keep everything in the database.
Network file I/O is spotty at best. In addition, it's slower than DB I/O.
If the DBA is being difficult about getting small changes into the database, you may be dealing with:
1. A political control issue. Maybe he just knows DB stuff and is threatened when he perceives others moving in on his turf. Whatever his reasons, you need to GET WORK DONE. Period. Document all the extra time / communication / work you need to do for each small change and take that up with management. If the first level of management is unwilling to see things your way (it does not matter what their reasons are), escalate the issue to the next level of management. In the past, I've gotten results this way. It was more of a political territory problem than a technical problem. The DBA eventually gave up and gave me full access to the TEST system, BUT he also stipulated that I would need to learn his testing process, naming conventions, his DB standards and practices, his way of testing, etc. I was game. I would also need to fix any database problems arising from changes I introduced. This was fair, and I got to wear the DBA hat in addition to the developer hat. I got the freedom I needed and he got one less thing to worry about.
2. A process issue. Maybe the DBA needs to put every small DB change you submit through a gauntlet of testing and performance analysis. Maybe he has a highly normalized DB schema and, because he has the big picture, he needs to normalize or denormalize your requested DB changes to fit into the existing schema. Ask to work with him. Ask him for a full DB design diagram. Get a good sense of his DB design philosophy, and implement your DB changes with that philosophy in mind. Show that you understand that he's trying to keep the DB in good order (understand normalization, relational constraints, check constraints). Give him less to worry about. He needs to trust that you will not muck up his database.
Accumulate all the small changes into a single lengthy script and submit it to the DBA. This way, you won't have to wait for each small change to go through all of his process / testing. In addition, you're giving him a bigger-picture view of your development planning (one that is in step with his DB design philosophy) instead of just the play-by-play.
I created this simple textpad program in WPF/VB.NET 2008 that automatically saves the content of the forms to an XML file on every keystroke.
Now, I'm trying to make the program see changes to the XML file in real time. For example, if I open two of my textpads and write in the first one, it will automatically be reflected in the other textpad.
How can I do this?
One of my colleagues told me to read about INotifyPropertyChanged (which I did), but how can I apply it to my application?
:( help~
btw, I got the idea from a Google Wave demo, and I'm actually trying to do something bigger..
Note - this approach will be really, really expensive in terms of disk I/O, memory usage and CPU time. Why are you using XML? Is that the native format of the data you are editing? You may want to look at a more compact format - one that will use less memory, generate fewer I/Os and use less CPU.
Also note that your writer may need to flush the file for the watcher to notice any changes. This is expensive as well - especially if you're doing it on every keystroke.
Be sure to use the correct file open attributes (sharing, reading and writing).
You may want to consider using shared memory to communicate between your processes. This will be less expensive. You can avoid large amounts of disk I/O by only writing changes to disk when the user asks to commit them, or when there is a hint to do so. I suggest avoiding doing this on every keystroke.
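As a rough illustration of the shared-memory route (using Python's stdlib for the sketch; in the actual WPF/VB.NET app a named memory-mapped file would play the same role, and the segment name and size here are arbitrary):

import struct
from multiprocessing import shared_memory

SEG_NAME = "textpad_shared"                       # hypothetical segment name
SEG_SIZE = 64 * 1024

def create_segment():
    # The first instance creates the segment; others attach to it by name.
    return shared_memory.SharedMemory(name=SEG_NAME, create=True, size=SEG_SIZE)

def write_text(shm, text):
    data = text.encode("utf-8")[:SEG_SIZE - 4]
    shm.buf[:4] = struct.pack("<I", len(data))    # length prefix
    shm.buf[4:4 + len(data)] = data

def read_text():
    shm = shared_memory.SharedMemory(name=SEG_NAME)   # attach, don't create
    (length,) = struct.unpack("<I", bytes(shm.buf[:4]))
    text = bytes(shm.buf[4:4 + length]).decode("utf-8")
    shm.close()
    return text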
Remember, your app needs to be a good system citizen and consume a reasonable amount of system resources. This is especially true running on netbooks and other 'low spec' systems.
You will probably need to use the FileSystemWatcher to watch the file on the disk rather than a property in the running instance of the application.
Or you could use some custom message passing between different instances of your application.
INotifyPropertyChanged isn't going to work for your application. That interface is used when data binding some element to a UI object.
Your best bet is going to be to attach a FileSystemWatcher to the file when you open it for editing. You can then use the change events to reload the file as needed in each instance of your application.
This will also load changes made from external editors.
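For a feel of the reload logic, here is a minimal polling sketch in Python; FileSystemWatcher gives you push notifications instead of a polling loop, but the "reload on change" part is the same, and the file name and interval are arbitrary:

import os
import time

def watch_and_reload(path, on_change, interval=0.5):
    # Poll the file's modification time and hand the new contents to a
    # callback whenever it changes.
    last_mtime = None
    while True:
        try:
            mtime = os.path.getmtime(path)
        except FileNotFoundError:
            mtime = None
        if mtime != last_mtime:
            last_mtime = mtime
            if mtime is not None:
                with open(path, encoding="utf-8") as f:
                    on_change(f.read())           # e.g. refresh the editor pane
        time.sleep(interval)

# watch_and_reload("notes.xml", lambda text: print("reloaded", len(text), "chars"))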
It sounds like you are using file IO as a form of interprocess communication. If so, IMO you need to rethink your design, especially if you are doing something "bigger" than Google Wave (whatever bigger means in this context), as what you are proposing is terribly inefficient.
Do some searching on interprocess communication and you will get a whole bunch of ideas. #foredecker's idea (+1) of shared memory is a good possibility, for example.
We have a Delphi 2006 application that drives a MS SQL Server database.
We have found a vulnerability where it is possible to load the executable into a hex editor and modify the SQL.
Our long term plan is to move this SQL to CLR stored procedures but this is some way off since many of our clients still use SQL 2000.
We've thought about obfuscating the strings, does anyone have a recommendation for a tool for doing this?
Is there a better solution, maybe code signing?
Sorry for being blunt, but if you are thinking of applying "security" measures in your executable you are doomed. No scrambling scheme will deter an average hacker.
You also haven't explained how your app is designed. Is the database hosted by you, or does it reside on your clients' premises? If the latter, then just forget about security and start hiring a lawyer to get a good confidentiality contract so your clients behave. If the former, then using stored procedures is the easiest way.
If embedded SQL is being hacked, then it implies that your database is quite open and anyone with MSQRY32.EXE (that is, MS Office) can get your data.
If you are a vendor, then you can't rely on CLR being enabled at your clients. So, why not use non-CLR stored procedures and correct permissioning in the database that is version independent?
This is not a vulnerability. If your machines are vulnerable to having people locally modify EXEs, that is your vulnerability.
All EXEs can be hacked, if someone has local admin account access, your game is over long before they get near your resource strings.
It will never be possible to protect completely, but you can make a "casual attack" harder. The simple system that I use is a "ROT47"-type system, which is like ROT13 but wider-ranging. The code then ends up looking like the following:
frmLogin.Caption := xIniFile.ReadString(Rot47('$JDE6>' {CODEME'System'}),
The key here is that I have a comment which includes the string, so that both I can see it and, more importantly, so can the utility that I run in my FinalBuilder build script. This allows me to ensure that strings are up-to-date at all times in release code. The utility looks for {CODEME in the lines and, if found, knows the format of the data to output appropriately.
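For reference, this is all a ROT47-style scrambler does: rotate every printable ASCII character (codes 33-126) by 47 places. The Python sketch below implements a standard ROT47, which matches the '$JDE6>' example above and is its own inverse, so the same function scrambles and unscrambles:

def rot47(text):
    out = []
    for ch in text:
        code = ord(ch)
        if 33 <= code <= 126:
            out.append(chr(33 + (code - 33 + 47) % 94))
        else:
            out.append(ch)                        # leave spaces/控制 chars alone
    return "".join(out)

assert rot47("System") == "$JDE6>"
assert rot47(rot47("SELECT * FROM Users")) == "SELECT * FROM Users"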
A solution that would require a deep restructuring of the application would be to use a multi-tier approach - most of the SQL code would be in the application server module, which, being on a server, should be better protected than a client-side exe.
Can't you encrypt all your queries and put them into a resource file?
At runtime, you would first have to:
Load your query string from resource.
Decrypt it.
Then you just run your query as before.
That should not be a big problem. Of course, if you are not storing your queries in some resource / folder then you need to refactor your application a bit. But you should store them in some organized manner anyway. So you will be hitting two birds with one stone here ;-)
For encryption of the strings you could use a free library called DCPCrypt.
I think you should use an exe packer, which makes it hard for anyone to modify the stuff using a hex editor.
First - do an analysis of your threat. Who would exploit this vulnerability, and why is it a problem? Then act accordingly.
If your application is Win32 and your threat is some kids who are just having fun, a free exe packer (e.g. upx) might be the solution. For .NET applications, signing might be what you want.
If you need more than that, it's going to be expensive and it's going to be more difficult to develop your application. Perhaps you even need to restructure it. Commercial protection schemes are available (perhaps with dongle?) - even protection schemes where you store your strings on some external hardware. If the hardware is not present, no SQL-Strings. But, as I said, that's more expensive.
Move DB interface to stored procedures. Normal regular stored procedures without any CLR. It's not a big deal if you already have queries to put inside.
If you don't want to learn T-SQL for some reason, simply move all your query strings into the database and store only a single query in the application, whose sole purpose is to read the SQL code for a given query ID from the database.
All tricks with encoding produce a lot of trouble but don't give any real security, because you must use reversible encryption (dictated by the nature of the problem) and all the keys for decoding are placed in the application executable too.
There are "protection" suites that encrypt and/or validate your exe before running. searching for "encrypt exe" or "validate exe" or so will probably help. Usually they are payware, but sub $100.
The principle is the same as an exe packer (and has some of its downsides, like cheaper antivirus heuristics sometimes reacting on them, a slightly elevated memory load), just more focussed on security. A problem is also that for most exe packers, depackers exist.
I use dinkeydongle's wares, but that is a kind that also ties into a hardware dongle, so that might be a bridge too far for you.