When to use tar+gz, zip or 7z

Tar, zip, gzip and 7zip are packing and compression tools that have been available for many years across different platforms. Some ship by default with certain OSes, while others have to be installed separately. Linux tends to ship with tar, gzip and zip, while Windows traditionally ships with built-in support for zip only. These tools are very useful, well supported and will continue to be in the future.

(Warning, text only article)

Instead of getting technical or wasting text diving into history, I will just say that packing and compression are very helpful. They let us get more out of our storage, which saves money, and they can significantly reduce bandwidth use (both you and your ISP need this, as bandwidth is actually expensive). Nowadays CPUs are so fast that compressing content can make slow hard drives effectively faster, because less data has to be read and written (Windows offers this for NTFS file systems, although it uses none of the methods above).

Although these methods and programs differ, it is the tools at their core that are more impressive than you might think. WinZip is built mainly around zip files, while 7zip works with zip, gzip, tar and many more. WinRAR handles more formats than WinZip but fewer than 7zip. On Linux you would typically install a separate utility for each format, unless you install 7z and work with its command line.

In general, compression:

  • Reduces bandwidth needs, especially for admin work, making transfers faster and cheaper. For example, gzipping an SQL dump can cut the transfer size by around 10x, which matters more as the database grows, because utilities like phpMyAdmin inherit their upload limits from the PHP settings and will not accept huge files.
  • With a fast CPU, reduces bottlenecks when transferring between slow drives or components. This also applies to VPNs.
  • Gets you more storage space for your money.
  • Assists with detecting errors during transmission and storage, but also adds the risk of the entire archive failing from a single uncorrectable error.
  • Significantly reduces storage overhead by packing everything into 1 file. On Windows, if you right click a file and open its properties, notice that the size on disk is usually bigger than the file size. This adds up if you have many small files, as any large, comprehensive piece of software would, because the file system allocates space in fixed-size blocks and stores extra information for each file.
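
To get a feel for the bandwidth point, here is a minimal sketch in Python using only the standard library. The table name, columns and row count are made up for illustration, and a real database dump will compress differently, so treat the printed ratio as indicative only.

```python
import gzip


def build_fake_sql_dump(rows: int) -> bytes:
    """Build a repetitive, SQL-dump-like text (hypothetical table and columns)."""
    lines = [
        f"INSERT INTO customers (id, name, city) "
        f"VALUES ({i}, 'Customer {i}', 'Springfield');"
        for i in range(rows)
    ]
    return "\n".join(lines).encode("utf-8")


dump = build_fake_sql_dump(20_000)
packed = gzip.compress(dump)        # default compression level
restored = gzip.decompress(packed)  # round trip to show nothing was lost

ratio = len(dump) / len(packed)
print(f"original: {len(dump)} bytes, gzipped: {len(packed)} bytes, ratio: {ratio:.1f}x")
```

The gzipped bytes are what you would actually upload or send over the wire; the receiving side decompresses them back to the identical dump.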

With all this, what's not to like about compression? However, not all compression formats are created equal.

Zip is a decent format that is available as standard pretty much everywhere. It is not the best or the fastest, but it is good all around, with the big advantage of being able to work with individual files without decompressing the whole archive. This means you don't need huge amounts of space to work with archives. For example, if your entire disk contains only a single 1TB zip file, you would not need another 1TB of free space just to get at a 1KB file inside it. Adding new files works similarly: you do not need to rebuild the dictionary or add to it inefficiently. Overall zip is a good balance of performance, features and workability, and should be used everywhere, with the exceptions below.
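
The per-file access described above can be sketched with Python's standard zipfile module. The member names are invented for the example, and an in-memory buffer stands in for an archive file on disk.

```python
import io
import zipfile

buffer = io.BytesIO()  # in-memory stand-in for an archive file on disk

# Create an archive with one small and one large member.
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("docs/readme.txt", "hello from the archive\n")
    archive.writestr("data/big.csv", "id,value\n" + "1,42\n" * 50_000)

# Append a new member later; existing members are not recompressed.
with zipfile.ZipFile(buffer, "a", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("docs/notes.txt", "added afterwards\n")

# Read one small member; the large CSV is never decompressed.
with zipfile.ZipFile(buffer, "r") as archive:
    names = archive.namelist()
    readme = archive.read("docs/readme.txt").decode("utf-8")

print(names)
print(readme)
```

This works because each zip member is compressed on its own and located through a central directory at the end of the archive.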

Tar+gzip has been the standard on Linux since long before zip became available everywhere. Gzip is related to zip and uses the same DEFLATE family of compression, but it compresses the whole tar stream as one unit rather than each file separately, so it often ends up smaller than zip on collections of many small files. It also lets you trade speed for space through its compression level. Tar itself only packages and does not compress, which is why it is almost always used together with gzip. Tar does reduce overhead on file systems, but more importantly it preserves Linux/Unix file metadata such as permissions and ownership. This supports a long-standing practice on Linux/Unix of running software and owning its files under a dedicated user for security (think Windows UAC without the annoyance). Without this metadata, files extracted on Linux can cause security-related issues. That is why even software packages use tar+gzip: tar to store the metadata, gzip to compress and save bandwidth and space. For everything Linux, even mail and websites, please use tar.gz whenever files are being transferred.
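
To show the metadata point in practice, here is a small sketch with Python's standard tarfile module. The file name, user and group are hypothetical; the example builds a tar.gz in memory and reads the stored permissions and ownership back.

```python
import io
import tarfile

script = b"#!/bin/sh\necho deployed\n"

buffer = io.BytesIO()  # in-memory stand-in for app.tar.gz
with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
    info = tarfile.TarInfo(name="app/deploy.sh")
    info.size = len(script)
    info.mode = 0o750        # rwxr-x---
    info.uname = "appuser"   # hypothetical owner
    info.gname = "appgroup"  # hypothetical group
    archive.addfile(info, io.BytesIO(script))

# Reopen the archive and inspect what tar recorded for the member.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:gz") as archive:
    member = archive.getmember("app/deploy.sh")
    stored_mode = member.mode
    stored_owner = (member.uname, member.gname)
    content = archive.extractfile(member).read()

print(oct(stored_mode), stored_owner)
```

Whether the owner is actually restored on extraction depends on who runs the extraction (typically only root can assign files to other users), but the information travels with the archive.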

7zip is a very comprehensive and configurable compression tool. While it supports all of the formats above, I'm going to talk mainly about the compression methods it offers that the others don't. With 7zip you get LZMA and LZMA2 (the 7zip defaults), bzip2 (a different algorithm from zip and gzip despite the similar name; not as tweakable, but with improvements of its own), PPMd and a few more, with settings you can adjust based on the type, size and number of files you are compressing. It is best used for long-term storage, where you can expect to have the space and time to unpack a large part of the archive even if you only need a single file. This extra configurability and choice of algorithms means that more hardware resources (CPU and RAM) and more knowledge of the compression used are needed to make full use of it, but it pays off if you are archiving for the long term and want to save the most space.
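
As a rough way to compare these algorithm families, the sketch below runs the same data through Python's standard zlib (DEFLATE, as in zip and gzip), bz2 and lzma modules. Note the assumptions: the sample is made-up log-like text, and Python's lzma module writes the .xz container rather than a .7z archive, so this compares algorithms, not the 7zip tool itself.

```python
import bz2
import lzma
import zlib

# Hypothetical sample: repetitive, log-like text.
sample = "".join(
    f"2024-01-01 12:00:{i % 60:02d} INFO request {i} served in {i % 97} ms\n"
    for i in range(50_000)
).encode("utf-8")

results = {
    "deflate (zip/gzip family), level 6": zlib.compress(sample, 6),
    "bzip2, level 9": bz2.compress(sample, 9),
    "lzma (7z/xz family), preset 6": lzma.compress(sample, preset=6),
}

for name, blob in results.items():
    print(f"{name}: {len(blob)} bytes ({len(sample) / len(blob):.1f}x smaller)")
```

Try it on your own files and with other levels or presets; the ranking and the time taken can change a lot with the type of content.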

In conclusion, when dealing with any software-related transfers and archives for Linux, use tar.gz, even when transferring files to your Linux web server. When you need every bit of space saved at the expense of compression time and accessibility, use 7zip. For everything else use zip, whether you are uploading to a file sharing site, sending to a friend and so on. Single media files are the main kind of content where compression won't help, since they are already compressed.
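
The point about already-compressed content can be checked with a quick sketch. Random bytes are used here as a stand-in for a compressed media file (an assumption: real media is not perfectly random, but it behaves similarly under a general-purpose compressor).

```python
import os
import zlib

media_like = os.urandom(200_000)  # stand-in for already-compressed media
plain_text = b"the quick brown fox jumps over the lazy dog\n" * 5_000

media_packed = zlib.compress(media_like, 9)
text_packed = zlib.compress(plain_text, 9)

print(f"media-like: {len(media_like)} -> {len(media_packed)} bytes")
print(f"plain text: {len(plain_text)} -> {len(text_packed)} bytes")
```

The text shrinks dramatically, while the media-like data stays about the same size, so compressing it again only costs CPU time.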
