Philosophical Hurdles II
where k < n . If there were an algorithm that compressed all of $T_n$ into the smaller files,then at least two files would end up being compressed into the same smaller file.
How could a decompression routine tell which was which when it decompressed them? How will the decompression routine tell a file compressed from one from an $T_n$ uncompressed version from a $T_k$ ? Remember, it isn' t possible to add extra bits to indicate the original size of the file or whether it is compressed because this overhead will make the files larger. At least half of $T_n$ is being compressed by only one bit in this scheme, and adding even one bit of overhead will alleviate any gain from this procedure. This is an informal proof, and it has some nasty implications.
The worst is that half of all files can' t be compressed by more than one bit. Three-quarters of the files can' t be compressed by more than two bits. Seven-eighths can' t be
compressed by more than three. That' s pretty depressing news.
The good news is that while the compression is difˇ cultif not impossible in the general case, it is usually quite useful for most of the things we do all day. Most of the files,for instance, are filled with such rich patterns and repetition that the simplest program can identify them and use them to shrink the file. Text is filled with endless streams of only a few words. Images have huge chunks devoted to the same or similar colors.
This means that it' s possible to design compression algorithms that do little with the text files with strings like “jaeujxm kjsaA9kk3*” and do a lot with strings like “Bobbie bobbed her hair and bobbed for apples.” There are many different ways that words can be combined to form sentences, paragraphs, and books, but there are many, many, many more ways that the letters can be combined in random ways. It' s not hard to exclude these cases.
This lack of guarantees may be hard for some people to grasp. Most users are accustomed to using popular programs like PkZip or Stufilt that regularily squeeze files down to half their size. Compressing most common files is fairly easy because most data are largely repetitive and filled with plenty of opportunities to find some shorthand representation. Image files are often quite large, and there is usually plenty of opportunity to find a shorthand approximation for parts of the file.
Another common mistake that many people make is assuming that files can be continually compressed. For instance, if some algorithm cuts the size of files in half, then what happens if it is applied again to the result? Will the resulting file be one-quarter of the size? Probably not. A good compression function should do all of the work in the first pass. This isn' t always the case, but it should be.
0 Response to "Philosophical Hurdles II"
Posting Komentar