LZW Compression Support

Overview

HotPDF includes LZW (Lempel-Ziv-Welch) decompression support for PDF streams encoded with the LZWDecode filter. LZW is a lossless, dictionary-based compression algorithm that is particularly effective for text and simple graphics. PDF documents, especially older ones, use it to compress content streams and image data.

Key Features

  • Full LZW decompression support for PDF streams
  • Predictor support for image data that was predictor-filtered before compression
  • Selectable bit fill order (foTop, foBottom)
  • EarlyChange support, matching the PDF /EarlyChange parameter
  • Memory-efficient stream processing
  • PDF parameter integration (Predictor, Colors, BitsPerComponent, Columns)

Technical Implementation

The LZW compression support is implemented through the HPDFLZW.pas unit, which provides:

  • TPDFLZWDecompressor: Main decompression class
  • TPDFLZWParms: Parameter structure for PDF-specific settings
  • TPDFLZWFillOrder: Enumeration for bit order processing

Class Reference

TPDFLZWDecompressor

The main class for LZW decompression operations.

Properties:

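  • FillOrder: Bit fill order used when reading codes from each byte (foBottom, foTop)
  • EarlyChange: When True, the code width grows one code earlier than strictly necessary, as with /EarlyChange 1 (the PDF default)
  • InitialCodeSize: Initial code width in bits (9 for PDF LZWDecode streams)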

Methods:

  • Decompress(Input: AnsiString): AnsiString - Basic decompression
  • Decompress(Input: AnsiString; Parms: TPDFLZWParms): AnsiString - Decompression with PDF parameters
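The following standalone Object Pascal program (Delphi or Free Pascal) illustrates what FillOrder, EarlyChange and InitialCodeSize control at the bit level. It is a sketch, not HotPDF code: ReadCode and CodeWidth are hypothetical helpers. It assumes that the reversed fill order behaves like TIFF FillOrder 2, where the bits of each byte are consumed starting from the least significant bit. This page does not say which of foTop and foBottom corresponds to which order. CodeWidth follows the PDF rule: the width grows from InitialCodeSize up to 12 bits, and it grows one code early when EarlyChange is set.

```pascal
program LZWCodeReadingSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: reads Count bits starting at bit offset BitPos and
// advances BitPos. The code value is always assembled high-order bit first;
// MSBFirst only selects which end of each byte is consumed first.
function ReadCode(const Data: TBytes; var BitPos: Integer; Count: Integer;
  MSBFirst: Boolean): Integer;
var
  I, ByteIndex, BitIndex, Bit: Integer;
begin
  if BitPos + Count > Length(Data) * 8 then
    raise Exception.Create('Not enough data for the next code');
  Result := 0;
  for I := 0 to Count - 1 do
  begin
    ByteIndex := (BitPos + I) shr 3;
    BitIndex := (BitPos + I) and 7;
    if MSBFirst then
      Bit := (Data[ByteIndex] shr (7 - BitIndex)) and 1  // PDF LZWDecode order
    else
      Bit := (Data[ByteIndex] shr BitIndex) and 1;        // assumed reversed order
    Result := (Result shl 1) or Bit;
  end;
  Inc(BitPos, Count);
end;

// Hypothetical helper: width of the next code, given the next free table
// slot. With EarlyChange the width grows one code earlier, so 10-bit codes
// start once NextCode reaches 511 instead of 512.
function CodeWidth(NextCode, InitialCodeSize: Integer; EarlyChange: Boolean): Integer;
begin
  Result := InitialCodeSize;
  while (Result < 12) and (NextCode + Ord(EarlyChange) >= (1 shl Result)) do
    Inc(Result);
end;

var
  Normal, Reversed: TBytes;
  BitPos: Integer;
begin
  // $80 $0B: the 9-bit Clear code (256) followed by 7 more bits.
  SetLength(Normal, 2);
  Normal[0] := $80;
  Normal[1] := $0B;
  // The same two bytes with the bit order inside each byte reversed.
  SetLength(Reversed, 2);
  Reversed[0] := $01;
  Reversed[1] := $D0;

  BitPos := 0;
  WriteLn(ReadCode(Normal, BitPos, 9, True));     // 256
  WriteLn(ReadCode(Normal, BitPos, 7, True));     // 11
  BitPos := 0;
  WriteLn(ReadCode(Reversed, BitPos, 9, False));  // 256

  WriteLn(CodeWidth(511, 9, True));   // 10
  WriteLn(CodeWidth(511, 9, False));  // 9
  WriteLn(CodeWidth(512, 9, False));  // 10
end.
```

Reading the reversed bytes LSB-first gives the same code as reading the original bytes MSB-first. This is why a fill-order setting is enough to support both layouts without changing the rest of the decoder.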

Usage Examples

Basic LZW Decompression


// Delphi example - Basic LZW decompression
// Requires HPDFLZW in the uses clause. ProcessDecompressedData is an
// application-defined placeholder.
procedure DecompressLZWData(const CompressedData: AnsiString);
var
  Decompressor: TPDFLZWDecompressor;
  DecompressedData: AnsiString;
begin
  Decompressor := TPDFLZWDecompressor.Create;
  try
    // Configure decompressor
    Decompressor.FillOrder := foTop;
    Decompressor.EarlyChange := True;     // matches /EarlyChange 1, the PDF default
    Decompressor.InitialCodeSize := 9;    // PDF LZW codes start at 9 bits

    // Decompress the raw (still encoded) stream bytes
    DecompressedData := Decompressor.Decompress(CompressedData);

    // Use decompressed data
    ProcessDecompressedData(DecompressedData);
  finally
    Decompressor.Free;
  end;
end;

Advanced LZW Decompression with PDF Parameters


// Delphi example - Advanced LZW decompression with PDF parameters
// Requires HPDFLZW in the uses clause. ProcessImageData is an
// application-defined placeholder.
procedure DecompressLZWWithParms(const CompressedData: AnsiString);
var
  Decompressor: TPDFLZWDecompressor;
  DecompressedData: AnsiString;
  Parms: TPDFLZWParms;
begin
  Decompressor := TPDFLZWDecompressor.Create;
  try
    // Configure PDF parameters; Predictor, Colors, BitsPerComponent and
    // Columns mirror the stream's /DecodeParms entries
    Parms.Predictor := 2;           // TIFF Predictor 2 (horizontal differencing)
    Parms.Colors := 3;              // RGB color space
    Parms.BitsPerComponent := 8;    // 8 bits per component
    Parms.Columns := 100;           // Image width in pixels
    Parms.ExpandedTo8Bit := True;   // Expand to 8-bit components
    Parms.ColorSpace := 'DeviceRGB';

    // Decompress with parameters
    DecompressedData := Decompressor.Decompress(CompressedData, Parms);

    // Process the decompressed image data
    ProcessImageData(DecompressedData, Parms);
  finally
    Decompressor.Free;
  end;
end;

PDF Parameter Support

TPDFLZWParms Structure

  • Predictor: Predictor function (0 = none, 2 = TIFF Predictor 2, horizontal differencing). In a PDF /DecodeParms dictionary, the value 1 also means no prediction and is the default
  • Colors: Number of color components (1=grayscale, 3=RGB, 4=CMYK)
  • BitsPerComponent: Bits per color component (1, 2, 4, 8, 16)
  • Columns: Number of samples per row
  • ExpandedTo8Bit: Whether to expand to 8-bit components
  • ColorSpace: Color space identifier (for example, 'DeviceRGB')
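When Predictor is 2, the decompressed bytes are differences rather than samples, and they must be summed back along each row. The standalone Object Pascal sketch below shows that step for the common BitsPerComponent = 8 case only. UndoHorizontalDifferencing is a hypothetical name, not part of HPDFLZW. Other cases need different handling: 16-bit samples, sub-byte samples, and the PNG predictors (Predictor 10 to 15) that PDF also defines.

```pascal
program TIFFPredictor2Sketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: reverses TIFF Predictor 2 (horizontal differencing)
// in place for BitsPerComponent = 8. Each row holds Colors * Columns bytes,
// and every component after the first pixel was stored as the difference
// from the same component of the pixel to its left, modulo 256.
procedure UndoHorizontalDifferencing(var Data: TBytes; Colors, Columns: Integer);
var
  RowLen, Row, I, Base: Integer;
begin
  RowLen := Colors * Columns;
  if RowLen <= 0 then
    raise Exception.Create('Colors and Columns must be positive');
  // Any incomplete trailing row is left untouched.
  for Row := 0 to Length(Data) div RowLen - 1 do
  begin
    Base := Row * RowLen;
    // Left to right, so the left neighbor has already been restored.
    for I := Colors to RowLen - 1 do
      Data[Base + I] := Byte(Data[Base + I] + Data[Base + I - Colors]);
  end;
end;

var
  Pixels: TBytes;
  I: Integer;
begin
  // One row of two RGB pixels (Colors = 3, Columns = 2) as decompressed.
  SetLength(Pixels, 6);
  Pixels[0] := 10; Pixels[1] := 20;  Pixels[2] := 30;  // first pixel, stored as-is
  Pixels[3] := 5;  Pixels[4] := 250; Pixels[5] := 1;   // differences from the left pixel
  UndoHorizontalDifferencing(Pixels, 3, 2);
  for I := 0 to High(Pixels) do
    Write(Pixels[I], ' ');
  WriteLn;  // 10 20 30 15 14 31
end.
```

The second pixel's green value shows the modulo-256 wraparound: 20 + 250 = 270, which is stored as 14.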

Algorithm Details

The LZW implementation includes:

  • Dictionary-based Decoding: Rebuilds the encoder's string table on the fly from the code stream, so no dictionary is stored in the file
  • Variable Code Length: Supports code widths from 9 to 12 bits
  • Clear Code Handling: Handles the Clear code (256), which resets the string table, and the end-of-data code (257)
  • String Table Management: Maintains a string table of up to 4096 entries (codes 0 to 4095)
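The steps above can be made concrete with a compact LZW decoder for the MSB-first PDF code layout. It is written as a standalone Object Pascal program. This is a sketch of the general algorithm, not HotPDF's implementation. LZWDecode is a hypothetical name, and the code favors clarity over speed because it appends each decoded string to the result individually. The sample bytes encode the text -----A---B as the 9-bit codes 256 45 258 258 65 259 66 257.

```pascal
program LZWDecodeSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  ClearCode = 256;      // resets the string table
  EodCode = 257;        // end of data
  FirstFreeCode = 258;  // first table slot filled by the decoder
  MaxCodes = 4096;      // 12-bit codes at most

// Hypothetical sketch: decodes MSB-first LZW data with 9- to 12-bit codes.
function LZWDecode(const Input: TBytes; EarlyChange: Boolean): AnsiString;
var
  Table: array of AnsiString;
  NextCode, Width, Code, PrevCode, BitPos, I: Integer;
  Entry: AnsiString;
begin
  Result := '';
  SetLength(Table, MaxCodes);
  for I := 0 to 255 do
    Table[I] := AnsiChar(I);  // single-byte strings for codes 0..255
  NextCode := FirstFreeCode;
  PrevCode := -1;
  BitPos := 0;
  while True do
  begin
    // Width grows from 9 to 12 bits; one code early when EarlyChange is set.
    Width := 9;
    while (Width < 12) and (NextCode + Ord(EarlyChange) >= (1 shl Width)) do
      Inc(Width);
    if BitPos + Width > Length(Input) * 8 then
      Break;  // data ended without an EOD code; keep what was decoded
    Code := 0;
    for I := BitPos to BitPos + Width - 1 do
      Code := (Code shl 1) or ((Input[I shr 3] shr (7 - (I and 7))) and 1);
    Inc(BitPos, Width);

    if Code = EodCode then
      Break;
    if Code = ClearCode then
    begin
      NextCode := FirstFreeCode;
      PrevCode := -1;
      Continue;
    end;

    if (PrevCode < 0) and (Code > 255) then
      raise Exception.CreateFmt('Invalid first code %d after Clear', [Code]);
    if Code < NextCode then
      Entry := Table[Code]
    else if (Code = NextCode) and (PrevCode >= 0) then
      // Code not in the table yet: previous string plus its own first byte.
      Entry := Table[PrevCode] + Table[PrevCode][1]
    else
      raise Exception.CreateFmt('Invalid LZW code %d', [Code]);

    Result := Result + Entry;
    // Each code after the first defines: previous string + first byte of this one.
    if (PrevCode >= 0) and (NextCode < MaxCodes) then
    begin
      Table[NextCode] := Table[PrevCode] + Entry[1];
      Inc(NextCode);
    end;
    PrevCode := Code;
  end;
end;

const
  Sample: array[0..8] of Byte = ($80, $0B, $60, $50, $22, $0C, $0C, $85, $01);
var
  Data: TBytes;
begin
  // Codes 256 45 258 258 65 259 66 257, all 9 bits wide.
  SetLength(Data, Length(Sample));
  Move(Sample[0], Data[0], Length(Sample));
  WriteLn(LZWDecode(Data, True));  // -----A---B
end.
```

The branch for Code = NextCode covers the one case where the encoder emits a code before the decoder has defined it. That happens with runs such as the repeated dashes in the sample, where code 258 ("--") appears right after the single "-".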

Performance Characteristics

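  • Memory Efficient: The LZW string table is capped at 4096 entries, so dictionary memory does not grow with stream size
  • Fast Decompression: LZW decoding needs only a table lookup and a string copy per code, with no pattern searching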
  • Predictable Performance: Consistent performance across different data types
  • Low Memory Footprint: Minimal memory overhead during processing

Common Use Cases

  • PDF content stream decompression
  • Image data decompression in PDF files
  • Text stream decompression
  • Form data decompression
  • PostScript stream decompression

Error Handling

The LZW decompressor includes robust error handling for:

  • Invalid code sequences
  • Corrupted data streams
  • Memory allocation failures
  • Dictionary overflow conditions

Standards Compliance

  • PDF 1.2+ LZWDecode filter compliance
  • PostScript Level 2 LZW compatibility
  • TIFF LZW compression compatibility
  • Adobe LZW implementation compatibility

See Also