algorithm to crack steganography - steganography

what is the basic idea behind steganography?ie ,how do you get the hidden information?
suppose if it is an image and some text is within that image...how do you get that text?..

Every stenography algorithm is different in that respect. Every algorithm hides the information differently and thus getting the information back is different.
A simple example goes like this - Each pixel of the image is composed of 3 bytes, one for red, green and blue. Most people can't detect a difference of one bit in the color in an image so one option is to use the least significant bit of each color channel for your data. This way you can store 3 bits of information in every pixel with very little effect on the general quality of the image.
To get the information back you'll need to read the first bit of every color channel of every pixel and gather all the bits together.
This is just a very simple and almost trivial way to do stenography. Real stenography algorithms are somewhat more involved. Like in cryptography, there is no way to generically "unhide" all stenography. you need to know which algorithm you're trying to decode.

The very basic idea is that images contain tons of redundant information that your eye cannot see. For instance if you changed the last bit on each pixel there would be no visible change as almost all of the information about the color is the other bits. So you can encode messages using the last bit (the most basic algorithm). The histogram however will be changed and a large message will easily be detectable. As far a decoding the message itself, well, the message itself is probably using public key encryption so you will never know what the actual payload was.
Steganography unlike cryptography is considered broken if Eve (who is eavesdropping and practising steganalysis) knows that there is a message at all. The assumptions are based on that Alice and Bob are being watched and any communication is sign that they are up to something (aka prisoners, restrictive governments, all governments in the future hehe ;-))
And of course the algorithms become much more complex that just flipping the last bits, but encoding data that will not affect the structure of image (and become vulnerable to statistical attacks.) :
I read this book last summer and I thought is was an excellent introduction (it has a lot a psuedocode of the algorithms used)
http://www.amazon.com/Steganography-Digital-Media-Principles-Applications/dp/0521190193

Steganography, coming from greek Steganos(i'm greek :P) is the art of hiding messages. While cryptography is about scrambling a message, steganography is about a person not being able to locate the message.
There are many tools that do this procedure for you. Writing a tool like that can be a complicated procedure i think, though i haven't tried to do that. You would need to create a sophisticated approach of correctly using unused or seemingly not important image pixels or data, in order to add your own message, file etc. For some more information, please take a look at : http://www.symantec.com/connect/articles/steganography-revealed

Related

QPSK works in simulation but not with SDR

I'm going to start off by saying that I'm very new to SDR and GNU Radio. This may be a dumb question, but I have been googling and testing things for about two months now trying to get this to work without success. Any help or pointers would be appreciated!
I'm attempting to use GNU Radio 3.8 to transfer a file using differential QPSK. I've tried to follow the tutorials on the wiki as well as several similar academic papers I found on the internet (which also seem to be based off the wiki tutorial). None of them worked on their own but combining what actually works from each one, I managed to create a flowgraph sans hardware that does indeed send and receive the data from a file. Here's the Flowgraph and here is a screenshot of the results. The results show the four constellation points, and the data from the file source matches up perfectly with the data having gone though the entire transmit+receive chain. In the simulation I have a throttle block and a channel model block where the LimeSDR Source and LimeSDR Sink block would be. So far so good (at least as far as I can tell).
When I actually start transmitting this signal with the SDR, the received data no longer matches up with what is transmitted. Here's the flowgraph I've been using for the transmission. I added a protocol formatter and some FEC blocks that I could have removed for this illustration, but the point is that simply looking at what bits are going into the modulator vs what's being recovered, the two do not match up. The constellation looks good (as far as I can tell) but the bits are all wrong. Here's a screenshot showing the bits being transmitted. You'll notice in the screenshot of the transmitted signal that the signal has a repeating series of three flat top "1's" surrounded on both sides by a period of "0's" (at time 1.5ms and 3.5ms). This is a screenshot of the received bits. At time 1ms and 3ms you can see how it is has significantly more transitions between 1 and 0 than it should.
So at this point I'm stumped. The simulation worked but the real world test does not. I've messed around with the RRC filter properties a significant amount. I have no clue if the values I have chosen are correct as I have not found a tutorial or explanation on how to do so. I just looked at some of the example flowgraphs and made some guesses as to how those values were derived and applied those guesses to my use case. It worked well in the simulation so I thought it would be fine in the real world test. I've tried a variety of samples per symbol but my goal is for a 4800 bit per second transfer speed, and using different samples per symbol didn't help anyway. What should I change in order to get this to work?
Bonus question: The constellation object has QPSK and DQPSK, and the constellation modulator has a differential checkbox. What is the best practice combination of selections to get a differential QPSK modulation?

QR code-like alternative with extremely low error rate and ability to read bent codes

I'm trying to find an alternative to QR codes (I'd also be willing to accept an entirely novel solution and implement it myself) that meets certain specifications.
First, the codes will often end up on thin pipes, and so need to be readable around a cylinder. The advantage to this is that the effect on the image from wrapping it around a cylinder is easy to express geometrically, and the codes will never be placed on a very irregular shape.
Second, read accuracy must be very high, as any read mistake would be extremely costly. If this means larger codes with more redundancy for better error correction, so be it.
Third, ability to be read by the average smartphone camera from a few inches out.
Fourth, storage space of around half a kilobyte per code.
Do you know of such a code?
The Data Matrix Rectangular Extension (DMRE) improves upon the standard set of rectangular Data Matrix symbol sizes in an algorithmically compatible manner, thus increasing the range of suitable applications with no real downsides.
Reliable cylindrical marking is a primary use case.
Regardless of symbology you will be unable to approach sufficient data density to achieve 0.5KB of binary data in a single compact, narrow symbol scanned using a standard camera phone. However, most 2D symbologies (DMRE included) support a feature called Structured Append that allows chaining of multiple symbols that can be scanned in any order to produce a single read when all components are accounted for.
If the data to be encoded is known to be highly structured (e.g. mostly numeric or alphanumeric) then the internal encoding process of Data Matrix will be more optimised than for general binary data. For example, the largest DMRE symbol (26×64) will provide up to 236 numeric characters, ~175 alphanumeric characters and only 116 bytes.
If the default error recovery rate is insufficient then including a checksum in the data may be appropriate.
DMRE has just been voted to be accepted as an ISO/IEC project and will likely become an international standard enjoying broad hardware and software support in due course.
Another option may be to investigate PDF417 which has a broader range of symbols sizes, however the data density is somewhat less than Data Matrix.
DMRE references: AIM specification and explanatory notes.

hole filling algorthim for VB 6

I'm trying to find a way to implement fill holes algorithm for binary image in VB 6.0. I found these sites so far
http://answers.opencv.org/question/9863/fill-holes-of-a-binary-image/
http://www.mathworks.com/help/images/ref/imfill.html
I'm totally confused by the algorithm to find holes. Can someone please refer me to a better site or simple code so can assist me to implement the algorithm
It is really easy to fill holes in the binary image with a morphological closing operation that typically consists of a dilation followed by an erosion. A side effect of these operations is a small distortion of the shape. morhplogyEx() with parameter MORPH_CLOSE will do it for you in one operation.
A more sophisticated algorithm to use is flood fill which will return the area of the hole as well as will mark it with a specified color. It starts at a seed area specified with a Point seedPoint.
I personally prefer to invert the image contrast and find holes as small blobs or connected components. I then erase blobs with a certain size and invert the image again. OpenCV has a simple blob detector.

Good library for Digital watermarking

Can somebody help me, to find a library, or a detailed description of algorithm, that could embed a Digital watermark(invisible watermark, just a kind of steganography) to a jpeg/png file. But the quality of algorithm, should be great. It should be possible to extract this mark after rotation and expansion(if possible) of image.
Mark is just a key 32bytes.
I found a good site, but the algorithm are made for the NetPBM format, that is dead...
I know that there is a LSB method, but it is not stable to the expansion. Are there something better?
Changing metadata, is not suitable, because it is visible changes.
This maybe won't really be an answer, as I don't think it would be easy to give a magical, precise answer on this question.Watermarking is complex, and the best way to do it is by yourself : this will make things more hard for an attacker trying to reverse engineer your code. One could even read your question here, guess what library you used, and attack your system more easily.
Making Steganography resist to expansion in JPEG images is also very hard, because the JPEG compression is reapplied after the expansion. There are in fact a bunch of JPEG steganography algorithms. Which one you should use, depends on what exactly do you require :
Data confidentiality ?
Message presence confidentiality ?
Message coherence after JPEG changes ?
Resistance to "Known Cover" attacks (when attackers try to find the message, based on the steganographic system) ?
Resistance to "Known Message" attacks (when attackers try to find the steganographic system used, based on the message) ?
From what I know, usually, algorithm that resist to JPEG changes (picture recompression) are often really easier to attack, whereas algorithms that run the "encode" stage during the JPEG compression (after the DCT (lossy) transform, and before the Huffmann (non-lossy) transform) are more prone to resist.
Also, one key factor about steganography is scale : if you have only 32bytes of data to encode in a, say, 256*256px image, don't use an algo that can encode 512bytes of data in the same size. Either use a scalable algorithm, either use an algorithm at its efficient scale.
Also, the best way to do good steganography is to know its limitations,and to know how steganalyzers work. Try these tools, so you can understand what attackers will do to your picture.^
Now, I cannot tell you what steganographic system will be the best for you, but I can give you some indications :
jSteg - Quite old, I don't think it will resist to JPEG changes
OutGuess - Quite old too, but one of the best algorithms
F5 (and F3/F4) - More recent, good algorithm, scientifical research behind.
stegHide
I think all of these are LSB based : the encoding is done during the JPEG compression, after the DCT and Quantization. The only non LSB-based steganography system I heard of was mentionned in this research paper, however, I did not read it to the end yet, so I cannot tell if this will meet your needs.
However, I'm not sure there exists a real steganography algorithm resisting to JPEG compression, to JPEG resize and rotation, resisting to visual and statisticals attacks. Or I'm not aware of it.
Sorry for the lack of precise answer, I tried to give you what I know on the subject, as it's always better to be more informed. Sorry also for the lack of proper English, I'm French, nobody's perfect :)
Pistache is right in what he told you regarding the watermarking implementation algorithms. I will try to help you by showing one algorithm for the given requirements.
Before explaining you the algorithms first I guess that the distinction between the JPG and PNG formats should be done.
JPEG is a lossy format, i.e, the images are susceptible to compression that could remove the watermark. When you open an image for manipulation purposes and you save it, upon the writing procedure, a compression is made by using DCT filtering that removes some important components of the image.
On the other hand, PNG format is lossless, and that means that images are not susceptible to such kind of compression when stored after manipulation.
As a matter of fact, JPEG is used as a watermarking scheme attack due to its compressing characteristic that could remove the watermark if an attacker performed the compression.
Now that you know the difference between both formats, I can tell you a suitable algorithm resistant to the attacks that you mentioned.
Regarding methods to embed a watermark message for PNG files you can use the histogram embedding method. The histogram embedding method changes values on the histogram by changing the values of the neighbor bins. For example imagine that you have a PNG image in grayscale.
Therefore, you'll have only one channel for embedding and that means that you have one histogram with 256 bins. By selecting the neighbor bins x and x+1, you change the values of x and x+1 by moving the pixels with the bright x to x+1 or the other way around, so that (x/(x+1))>T for embedding a '1' or ((x+1)/x)>T for embedding a '0'.
You can repeat the same procedure for the whole histogram length and therefore you can embed in the best case up to 128bits. However this payload is less than what you asked. Therefore I suggest you to split the image into parts, for example blocks, and if you split one image into 4 components you'd be able to embed in the best case up to
512 bits which means 64 bytes.
This method although is very, susceptible to filtering and compression if applied straight in the space domain. Therefore, I suggest you to compute before the DWT of the image and use its low-frequency sub-band. This will provide you better transparency and robustness increased for the warping, resizing etc attacks and compression or filtering as well.
There are other approaches such as LPM (Log Polar Maps) but they are very complex to implement and I think for your case this approach would be fine.
I can suggest you two papers, the first is:
Watermarking digital image and video data. A state-of-the-art overview
This paper will give you some basic notions of watermarking and explain more in detail the LSB algorithm. And the second paper is:
Real-Time Compressed- Domain Video Watermarking Resistance to Geometric Distortions
This paper will explain the algorithm that I just explained now.
Cheers,
I do not know if you are considering approaches different to steganography. Instead of storing data hidden in the pixel data you could create a new data block in the JPEG file and store encripted data.
Take a look at the JPEG file structure on Wikipedia
You can create an application specific data block, using the marker 0xFF 0xEn. Doing so, any change in the image pixels do not change the information stored in the image. Moreover, many image editing software respect custom data blocks and will keep them even after image manipulation.

Harder, Better, Faster, Stronger... Techniques for an image-based CAPTCHA?

There are lots of non-image-based CAPTCHA ideas floating around. But what about the old-fashioned way?
What are the elements of a good image CAPTCHA? What visual elements are hard for computers, but easier for humans? What about mistakes, elements that are easier for computers than they are for humans? What are good techniques for increasing the speed of a CAPTCHA generator?
Here's an example of a CAPCHA I've been working on. It generates the functions for two sine waves, then stretches a text between them. It lays that over a background drawn from a pool of images.
How could this be improved? (Specifically, I'm using PHP GD.) Things that come to mind are:
Change the color of the text, possibly making it multicolored.
Add "scratches" or marks that mildly obscure the text.
Add to the distortion so that it's affected by sine waves horizontally as well.
What goes into a superb image CAPTCHA?
Edit:
I know that there are some very worthy third-party CAPTCHA resources. I'm looking for attributes that make them good. I'd like to use my own CAPTCHAs, just for the purpose of self-improvement. So, you can talk about reCAPTCHA, but it's not exactly what I'm looking for.
Also, it has been brought up that not only the image, but also the experience matters, so feel free to comment on that.
Make each letter/number out of a pattern, I.E. unconnected dots. Meaning the computer has no way of knowing that a dot is part of a letter other than pattern recognition (which they don't have yet.) Then the usual distortions and random lines.
How you do this is the challenge.
EDIT: Also, bonus points for patterns of different shapes, and try alpha transparency on the characters (on the edges or the whole character), so they merge with the background.
Make letters difficult to separate. Use handwriting-like font or add lines that join letters. Decrease and randomize spacing between letters.
Add wave distortion in other axis too. Distortion in one axis only can be relatively easily analyzed and reversed.
Don't bother with color background at all. It's super-easy to automatically filter black from other colors. Your background hinders only humans.
Don't add scratches or other noise unless it has the same thickness as letters. Noise-removal algorithms can easily remove things that are thinner than letters.
What if the color of the letters faded into other colors... for instance the 5 can start off as yellow on top and fade into blue or something. The colors chosen should be random.
With the multicolored background it might make it hard for the computer to pickup where the background ends and the character begins.. and hopefully it would not be too difficult for the human to actually pick up the pattern.
Instead of generating captcha you can create a captcha table in your database and you yourself create the table by search on google for good captcha images.
So no need to worry "Will this generation method work?"
I really hate CAPTCHA on sites, they just annoy me, but if you want to try and make a robust one try the following:
Ability to get a new image without submitting
Spoken version for the visually impaired
Non-uniform characters
I've used Recaptcha on a few sites, it's a nice and robust solution.
Or if you want to be really funky about it check out this: http://research.microsoft.com/asirra/
Algorithms that try to break captcha are pattern matchers that work by a few different ways: scaling and skewing the symbols that they already know about, finding and tracing edges, and counting interior holes to help. If you can break the letter up into pieces, vary the letter quality, or add strong lines or “scratches” along the letters these techniques will help. However all of this is fairly moot considering we have recaptcha for this purpose and it’s a wonderful third party app for this. Additionally captcha will help the security of your site, but will not stop those who are truly enticed.
I like the idea of KittenAuth and Microsoft's Asirra project. The idea is that, while OCR will eventually evolve to break your traditional captcha, the ability to distinguish a kitten from a dog is many orders of magnitude more complex a problem, while absolutely trivial for humans.
This solution, while probably the sexiest captcha idea ever, has the limitation of not being easily portable to hearing-impaired methods.
What about shearing and shuffling bands to mangle display and mouse-only input?
Start by taking your sine-wave morphed text, divide into horizontal bands or maybe even a grid.
That makes optical recognition harder and might allow you to avoid the kind of nasty background games that make some captchas hard for humans.
For a site where you can rely on local drag in the browser, instead of typing in an entry use shuffling requiring the user to re-order pieces (just in sloppy order, not like one of those puzzles). Or, if you wanted to use clicks alone, the classic sliding tile puzzle.
Note, I've run into a captcha where you had to identify which of N cartoons had an animal in them which succeeded in blocking me!
Wellington Grey sums up the AI CAPTCHA race nicely.
You could add a random array of fonts so that GD renders each character using a different one.
Be wary of suggestions of ReCaptcha. I have submitted incorrect input into it a couple few dozen times, and have had success each time. Several of those times I have submitted incorrect input for both words rather than just the most obscured word; the success rate, as I said, has been 100%.
I also think that image-based CAPTCHAs are user-hostile and should be avoided wherever possible. The advantage of text-based solutions is that you can tailor them to your site's audience, adding a level of obscurity that may trip up machines as they become more savvy with text-based solutions.
At the very least, don't use this all the time:
(source: codinghorror.com)