Ever wondered how data compression works?


All of us who use computers have used data compression, whether by compressing something ourselves or by opening a compressed file. Sometimes the compression is obvious and sometimes it is not. For example, when you see a ZIP archive, you know the data in it is compressed. But when you look at an image or a video, you rarely notice, or care to notice, that it is compressed at all. That is the point of good compression: the result shouldn’t look like it has been compressed. JPEG is an extremely popular image compression format, and so are PNG for images, MP3 for audio, and the video codecs used inside MP4 or MKV files.

The amount of data is always increasing, which in turn requires more storage space. If we were to store all data in its original format, we would need many times more disk space than we do now. For example, an uncompressed image can take tens of MB, and in some cases even GB. The same image converted to JPG shrinks to a few hundred KB or a few MB with seemingly no loss of data.
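To see where a figure like "tens of MB" comes from, here is a rough back-of-the-envelope sketch. It assumes an uncompressed image that stores 3 bytes per pixel (one each for red, green and blue). The 4000 x 3000 resolution and the function name `raw_image_size_bytes` are illustrative choices, not figures taken from any particular camera or file format.

```python
def raw_image_size_bytes(width, height, bytes_per_pixel=3):
    """Size of an uncompressed image: one fixed-size value per pixel."""
    return width * height * bytes_per_pixel


# Assumed example: a 12-megapixel photo with 8 bits per color channel.
size = raw_image_size_bytes(4000, 3000)
print(f"{size / 1_000_000:.0f} MB")  # 36 MB
```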

Ideally, compression would shrink data without any loss of information. But that is not always the case. There are two basic types of compression techniques: lossless and lossy. As the names suggest, lossless compression is one in which no data is lost. In other words, the original data can be perfectly reconstructed from the compressed data. ZIP and PNG are lossless formats, and camera RAW files are typically stored uncompressed or with lossless compression. Lossy compression, on the other hand, sacrifices some data in order to reduce the file size further. The loss may not be visible at a glance, but some data is gone for good. JPG and WMV are examples of lossy formats.
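The difference between the two types can be sketched in a few lines of Python. The lossless half uses the standard library's `zlib` module (the same DEFLATE family of compression that ZIP and PNG use). The lossy half is only a toy illustration, not how JPEG or any real codec works: it simply drops the low 4 bits of each value. The sample data in both halves is made up for the example.

```python
import zlib

# Made-up sample data: a repetitive byte string standing in for a file.
original = b"education is the key to education " * 200

# Lossless: decompressing returns exactly the bytes we started with.
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)
print(len(original), "->", len(compressed), "bytes")
print("identical after round trip:", restored == original)

# Lossy (toy version): keep only the 4 highest bits of each value.
# Every value is now at most 15 away from the original, but the
# exact original values can no longer be recovered.
samples = bytes(range(0, 256, 5))  # stand-in for brightness values
quantized = bytes(value & 0xF0 for value in samples)
print("identical after quantizing:", quantized == samples)
```

The first check prints `True` and the second prints `False`: the lossless round trip returns every byte, while the quantized values are close to the originals but not equal to them.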

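Lossless compression	Lossy compression
PNG, FLAC, ALAC, ZIP	JPEG, MP3, H.264, WMV

A few caveats about other common formats: RAW, BMP and WAV files usually hold uncompressed (or, for RAW, losslessly compressed) data, so they lose nothing but save little space. GIF compresses losslessly but is limited to 256 colors, so converting a photo to GIF still loses information. MP4, MKV and OGG are containers rather than compression techniques; the video and audio inside them are usually compressed with lossy codecs. Very few lossless techniques are in common use for video.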

Basic idea of how compression works:

Let’s say we want to compress a text file. Text contains lots of repeated words and phrases. A 9-letter word (e.g. “education”) takes 72 bits at 8 bits per character. If the word “education” is repeated many times in a text, it could be stored once, and only the places where it occurs would need to be noted. This would drastically decrease the size, since a small number takes much less space than a whole word.

While compressing images and videos (which are basically sequences of images) is not as easy as compressing text, a similar technique can be applied to them. Pixels with the same or almost the same RGB values can be treated in the same manner as the repeated words in text. An image that is the same color all over (fully white, fully black or whatever) could in theory be compressed almost perfectly. The more colors and complex textures an image has, the less it can be compressed. Treating almost-identical pixels as identical throws away the small differences between them, so some information is lost along the way for the sake of better compression, hence “lossy”.
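Both ideas can be sketched in a few lines of Python. This is a minimal illustration of the principle, not what ZIP, PNG or JPEG do internally; real compressors use more sophisticated variants. The function names `index_words`, `restore_words` and `run_length_encode` are made up for this example. The first stores each distinct word once and replaces the text with a list of small numbers. The second collapses a run of identical pixels into a single value and a count.

```python
def index_words(text):
    """Store each distinct word once; represent the text as a list of indices."""
    vocabulary = []   # each distinct word, stored once
    positions = {}    # word -> its index in vocabulary
    indices = []      # the text, as numbers pointing into vocabulary
    for word in text.split():
        if word not in positions:
            positions[word] = len(vocabulary)
            vocabulary.append(word)
        indices.append(positions[word])
    return vocabulary, indices


def restore_words(vocabulary, indices):
    """Rebuild the text (words separated by single spaces) from the index list."""
    return " ".join(vocabulary[i] for i in indices)


def run_length_encode(pixels):
    """Collapse runs of identical values into [value, count] pairs."""
    runs = []
    for pixel in pixels:
        if runs and runs[-1][0] == pixel:
            runs[-1][1] += 1
        else:
            runs.append([pixel, 1])
    return runs


text = "education needs education and more education"
vocabulary, indices = index_words(text)
print(vocabulary)  # ['education', 'needs', 'and', 'more']
print(indices)     # [0, 1, 0, 2, 3, 0]
print(restore_words(vocabulary, indices) == text)  # True

# A row of 1000 identical white pixels collapses into a single run.
white = (255, 255, 255)
print(run_length_encode([white] * 1000))  # [[(255, 255, 255), 1000]]
```

The word “education” is stored once and referred to three times by the number 0, and the all-white row shrinks from 1000 pixel values to one value plus a count. A row in which every pixel differs would produce 1000 runs and gain nothing, which is why complex images compress less well.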