Image Codec Introduction

Introduction

Here’s a breakdown of the process of encoding a YUV buffer into a JPEG file:

1. Preparation:

Access YUV buffer: Retrieve the YUV buffer containing the image data.
Determine format: Identify the specific YUV format (e.g., YUV420, YUV422, YV12).
Set parameters: Choose desired JPEG settings like quality level (affecting compression rate) and subsampling (affecting color resolution).

2. Color Space Conversion (if necessary):

Check for RGB requirement: If the JPEG encoder only accepts RGB, convert the YUV data to RGB using a color space transformation algorithm.
Direct YUV support: If the encoder supports YUV directly, skip this step.

3. Sampling (often included in JPEG libraries):

Apply subsampling: For common YUV420, reduce the resolution of chroma components (Cb and Cr) to half the size of luma (Y), as human eyes are less sensitive to color detail.

4. Quantization:

Divide into blocks: Split the image data into 8x8 pixel blocks.
Apply quantization table: Divide each frequency coefficient in each block by a corresponding value in a quantization table, reducing information and introducing some loss.

5. Zigzag Encoding:

Rearrange coefficients: Rearrange the quantized coefficients in a zigzag pattern to group similar frequencies together for efficient encoding.
Apply run-length encoding (RLE): Compress the zigzag-ordered data using RLE to reduce redundancy.

6. Huffman Coding:

Assign codes: Assign variable-length codes to different patterns of values based on their frequency, further reducing file size.

7. Header Information:

Add metadata: Include metadata like image dimensions, color space, quantization table, and Huffman table in the JPEG file header.

8. File Writing:

Write encoded data: Write the encoded image data and header information to the JPEG file.

Key Points:

Lossy compression: JPEG is a lossy compression method, meaning some image information is discarded during encoding to achieve smaller file sizes.
Library usage: Most practical implementations leverage libraries like libjpeg, FFmpeg, or OpenCV that handle most of these steps internally.
Encoder capabilities: Specific steps and options depend on the chosen JPEG encoder and its capabilities.

Discrete Cosine Transform

Here’s an example of a 2D DCT implementation in NumPy:

import numpy as np

def dct2(x):
    """Performs the 2D Discrete Cosine Transform (DCT) on a NumPy array."""

    # Define the DCT matrix
    W = np.zeros((x.shape[0], x.shape[1]))
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            W[i, j] = np.cos(np.pi * j * (2 * i + 1) / (2 * x.shape[0]))

    W /= np.sqrt(2 * x.shape[0])

    # Perform the forward DCT
    return np.real(np.fft.ifft2(W * x * W.T))

# Example usage
image = np.random.rand(8, 8)  # Example image data
dct_coefficients = dct2(image)

print(dct_coefficients)

Key Points:

Discrete Cosine Transform: DCT decomposes an image into its frequency components, concentrating energy in fewer coefficients.
NumPy Implementation: The code constructs the DCT matrix and applies it using FFT-based techniques.
Alternative Libraries: For optimized DCT implementations, consider SciPy (scipy.fftpack.dct) or OpenCV (cv2.dct).

Key Considerations:

Performance: Specialized libraries often offer faster DCT implementations.
Accuracy: Ensure numerical precision for sensitive applications.
Dimensionality: Adapt for higher-dimensional DCTs if needed.

Marker

import struct

with open("image.jpg", "rb") as f:
    data = f.read()

# JPEG markers are 2-byte values starting with 0xFF
while True:
    marker, = struct.unpack(">H", data[:2])
    if marker != 0xFFD8:  # Start of Image (SOI) marker
        break

    data = data[2:]
    length, = struct.unpack(">H", data[:2])
    data = data[2:length + 2]

    # Process the marker data based on its type
    # ... (replace with appropriate logic for each marker)

Here’s a list of common JPEG markers and their data types:

**Marker	Mnemonic	Description	Data Type**
0xFFD8	SOI	Start of Image	None
0xFFE0	APP0	Application-specific (JFIF)	Variable
0xFFE1	APP1	Application-specific (EXIF)	Variable
0xFFE2	APP2	Application-specific (ICC profile)	Variable
0xFFDB	DQT	Define Quantization Table(s)	Quantization table data
0xFFC0	SOF0	Start of Frame (Baseline DCT)	Frame header
0xFFC2	SOF2	Start of Frame (Progressive DCT)	Frame header
0xFFC4	DHT	Define Huffman Table(s)	Huffman table data
0xFFDA	SOS	Start of Scan	Scan header
0xFFD9	EOI	End of Image	None
0xFFDD	DRI	Define Restart Interval	Restart interval
0xFFDE	COM	Comment	Comment text

Key Points:

Marker Format: All markers start with 0xFF, followed by a 1-byte identifier.
Data Types: Data types vary from simple integers to tables and structures.
Variable-Length Segments: Most markers have a 2-byte length field indicating the segment’s size.
Reference: For a complete list and detailed descriptions, consult the JPEG standard (ISO/IEC 10918-1).

Additional Notes:

Restart Markers: 0xFFD0-0xFFD7 are used for error recovery.
Application-Specific Markers: 0xFFE0-0xFFEF can contain custom data.
Data Size: The length field specifies the size of marker-specific data, excluding the marker and length bytes themselves. Here’s an overview of EXIF structure and how to use Python’s struct module to unpack it (with caveats):

EXIF

EXIF Structure:

APP1 Marker: EXIF data is typically embedded within the APP1 marker segment of a JPEG file.
TIFF-based Format: EXIF data is structured using the TIFF (Tagged Image File Format) format, consisting of IFDs (Image File Directories).
IFD: Each IFD contains a list of tags, each representing a specific image attribute (e.g., exposure time, camera model, GPS coordinates).
Tags: Each tag has a unique 16-bit ID, data type (e.g., ASCII string, integer, rational number), and value.

Unpacking with Python struct (Cautions):

Locate EXIF Data: After parsing JPEG markers, find the APP1 marker with a length indicating EXIF presence.
Parse TIFF Header: Extract TIFF header information (byte order, offset to first IFD).
Iterate through IFDs: Use struct to unpack IFD entries (tag count, tag entries).
Unpack Tags: For each tag, unpack its ID, data type, and value using appropriate struct format strings.

Example (Simplified):

import struct

# ... (code to locate EXIF data within APP1 marker)

# Assuming TIFF header has already been parsed
ifd_offset = ...  # Offset to first IFD

# Read first IFD
ifd_data = exif_data[ifd_offset:]
num_tags, = struct.unpack(">H", ifd_data[:2])
ifd_data = ifd_data[2:]

# Iterate through tags
for _ in range(num_tags):
    tag_id, tag_type, value_count, value_offset = struct.unpack(">HHII", ifd_data[:12])
    ifd_data = ifd_data[12:]

    # Extract tag value based on data type
    # ... (replace with appropriate logic for each data type)

Key Points:

Complexity: Handling TIFF structures and variable-length data with struct can be intricate.
Error Handling: Ensure robust error handling for unexpected data or inconsistencies.
Specialized Libraries: Consider using Pillow or piexif for easier and more reliable EXIF handling.

Here’s a guide on parsing a TIFF header using Python’s struct module:

1. Read the Header Bytes:

with open("image.tif", "rb") as f:
    header = f.read(8)

2. Extract Basic Information:

byte_order, magic_number = struct.unpack(">HH", header[:4])

if byte_order == 0x4949:  # "II" for little-endian
    byte_order = "<"
elif byte_order == 0x4D4D:  # "MM" for big-endian
    byte_order = ">"
else:
    raise ValueError("Invalid TIFF byte order")

if magic_number != 42:
    raise ValueError("Invalid TIFF magic number")

3. Extract Offset to First IFD:

ifd_offset, = struct.unpack(byte_order + "I", header[4:])

Key Points:

Byte Order: TIFF files can be little-endian or big-endian; determine the correct format for unpacking.
Magic Number: Check for the valid TIFF magic number (42).
IFD Offset: The first IFD (Image File Directory) contains tags with image metadata.

Additional Information in the Header:

Version (2 bytes): TIFF version number.

4. Accessing IFDs:

Use ifd_offset to read the first IFD using struct.unpack and process its tags.
Iterate through IFDs as needed, following links to subsequent IFDs.

Recommendations:

Specialized Libraries: For comprehensive TIFF handling, consider libraries like Pillow, Tifffile, or tifftools.
Error Handling: Implement robust error handling to address unexpected data or inconsistencies.
TIFF Specifications: Refer to the TIFF specification for detailed understanding and handling of complex structures.

IFD

Within an IFD (Image File Directory) in a TIFF file, tag types are represented by 2-byte numerical values, not strings. Here’s a breakdown of the key points:

Tag Structure: Each tag in an IFD consists of four fields:
- Tag ID (2 bytes): A unique numerical identifier that specifies the tag’s meaning.
- Tag Type (2 bytes): A numerical code indicating the data type of the tag’s value.
- Value Count (4 bytes): The number of values associated with the tag.
- Value Offset (4 bytes): The location of the tag’s value(s) within the TIFF file.
Tag Type Codes: The TIFF specification defines a set of numerical codes for common data types, including:
- 1 = BYTE (8-bit unsigned integer)
- 2 = ASCII (7-bit ASCII string, null-terminated)
- 3 = SHORT (16-bit unsigned integer)
- 4 = LONG (32-bit unsigned integer)
- 5 = RATIONAL (two LONGs, numerator and denominator)
- …and others
Accessing Tag Type: To determine a tag’s data type, you would read the 2-byte value in the Tag Type field and interpret it using the TIFF specification’s codes.
Library Handling: Libraries like Pillow, Tifffile, or tifftools often handle these numerical tag types internally and provide convenient ways to access and interpret tag values using more intuitive data types in your code.

Important Note: The term “tag type” in this context specifically refers to the data type of the tag’s value, not the category or classification of the tag itself. The tag ID (not the tag type) is what determines the tag’s meaning.

Here’s a list of common Tag Type Codes defined in the TIFF specification:

**Code	Data Type**
1	BYTE (8-bit unsigned integer)
2	ASCII (7-bit ASCII string, null-terminated)
3	SHORT (16-bit unsigned integer)
4	LONG (32-bit unsigned integer)
5	RATIONAL (two LONGs, numerator and denominator)
6	SBYTE (8-bit signed integer)
7	UNDEFINED (variable-length uninterpreted byte array)
8	SSHORT (16-bit signed integer)
9	SLONG (32-bit signed integer)
10	SRATIONAL (two SLONGs, numerator and denominator)
11	FLOAT (32-bit IEEE floating-point number)
12	DOUBLE (64-bit IEEE floating-point number)
13	IFD (offset to another IFD)
14	UNICODE (null-terminated Unicode string)
16	SBYTE (same as code 6)
17	UNDEFINED (same as code 7)
18	SSHORT (same as code 8)
19	SLONG (same as code 9)
20	SRATIONAL (same as code 10)
21	FLOAT (same as code 11)
22	DOUBLE (same as code 12)

Key Points:

Numerical Codes: These codes represent data types used for tag values within IFDs.
Library Handling: TIFF libraries often handle these codes internally, providing more intuitive data access.
Reference: Consult the TIFF specification for a complete list and detailed descriptions.
Extended Types: Some TIFF files might use custom or extended tag types beyond this list.

EXIF tag

Here’s information about EXIF tag IDs and required tags:

List of Common EXIF Tag IDs:

Image Description Tags:
- ImageWidth (0x0100)
- ImageLength (0x0101)
- BitsPerSample (0x0102)
- Compression (0x0103)
- PhotometricInterpretation (0x0106)
- ImageDescription (0x010E)
- Make (0x010F)
- Model (0x0110)
- Orientation (0x0112)
- XResolution (0x011A)
- YResolution (0x011B)
- ResolutionUnit (0x0128)
- Software (0x0131)
- DateTime (0x0132)
- YCbCrPositioning (0x0213)
Camera Settings Tags:
- ExposureTime (0x829A)
- FNumber (0x829D)
- ExposureProgram (0x8822)
- ISOSpeedRatings (0x8827)
- ExifVersion (0x9000)
- DateTimeOriginal (0x9003)
- DateTimeDigitized (0x9004)
- ComponentsConfiguration (0x9101)
- ShutterSpeedValue (0x9201)
- ApertureValue (0x9202)
- BrightnessValue (0x9203)
- ExposureBiasValue (0x9204)
- MeteringMode (0x9207)
- Flash (0x9209)
- FocalLength (0x920A)
- SubjectArea (0x9214)
- MakerNote (0x927C)
GPS Tags (Optional):
- GPSVersionID (0x0000)
- GPSLatitudeRef (0x0001)
- GPSLatitude (0x0002)
- GPSLongitudeRef (0x0003)
- GPSLongitude (0x0004)
- GPSAltitudeRef (0x0005)
- GPSAltitude (0x0006)
- GPSTimeStamp (0x0007)
- GPSSatellites (0x0008)
- GPSStatus (0x0009)
- GPSMeasureMode (0x00A0)
- GPSDOP (0x00A1)
- GPSSpeedRef (0x00A2)
- GPSSpeed (0x00A3)
- GPSTrackRef (0x00A4)
- GPSTrack (0x00A5)
- GPSImgDirectionRef (0x00A6)
- GPSImgDirection (0x00A7)
- GPSMapDatum (0x00A8)
- GPSDestLatitudeRef (0x00A9)
- GPSDestLatitude (0x00AA)
- GPSDestLongitudeRef (0x00AB)
- GPSDestLongitude (0x00AC)
- GPSDestBearingRef (0x00AD)
- GPSDestBearing (0x00AE)
- GPSDestDistanceRef (0x00AF)
- GPSDestDistance (0x00B0)

Required Tags:

Only two tags are strictly mandatory in EXIF:
- ImageWidth (0x0100)
- ImageLength (0x0101)
However, most cameras include a broader set of tags, providing valuable image metadata.

Here’s information about the thumbnail tag in EXIF:

Tag ID:

The thumbnail tag’s ID is typically 0x0103 (Image Compression).

Tag Type Code:

The Tag Type Code for the thumbnail can vary depending on its format:
- 7 (UNDEFINED): Most commonly used for embedded thumbnails, indicating a variable-length byte array containing the compressed image data.
- 1 (BYTE): Occasionally used for uncompressed thumbnails.

JPEG Format:

Most embedded thumbnails in EXIF are indeed JPEG-compressed images.
However, it’s technically possible to embed thumbnails in other formats as well, although less common.

Accessing Thumbnail:

1. Locate the thumbnail tag within the EXIF IFD using its ID.
1. Check the Tag Type Code to determine the format of the data.
1. If the Tag Type Code is 7 (UNDEFINED) and you’re expecting a JPEG thumbnail:
- Extract the byte array from the Value Offset.
- Decompress it using a JPEG decoder library.
1. If the Tag Type Code is 1 (BYTE), the thumbnail is likely uncompressed and can be directly interpreted as image data.

Recommendations:

Use libraries like Pillow or piexif to simplify EXIF handling, including thumbnail extraction.
These libraries handle the format variations and decoding complexities for you.

Essential Marker

Here are the markers that must exist in a valid JPEG file:

Essential Markers:

SOI (Start of Image, 0xFFD8): Marks the beginning of the JPEG file.
APP0 (Application-specific 0, 0xFFE0): Typically contains JFIF (JPEG File Interchange Format) metadata.
SOF0 (Start of Frame 0, 0xFFC0) or SOF2 (Start of Frame 2, 0xFFC2): Specifies baseline or progressive DCT compression parameters, respectively.
DQT (Define Quantization Table, 0xFFDB): Defines quantization tables used for compression.
DHT (Define Huffman Table, 0xFFC4): Defines Huffman coding tables used for compression.
SOS (Start of Scan, 0xFFDA): Marks the beginning of compressed image data.
EOI (End of Image, 0xFFD9): Marks the end of the JPEG file.

EXIF:

Not strictly mandatory: A valid JPEG file can exist without EXIF data.
Commonly included: Most digital cameras and imaging software embed EXIF metadata within the APP1 (Application-specific 1, 0xFFE1) marker.

Additional Markers:

JPEG files can contain other optional markers, such as comments, restart markers, or application-specific data.
However, the markers listed above are essential for the basic structure and compression of a JPEG image.

Even without EXIF, a JPEG decoder can determine image width and height directly from the SOF (Start of Frame) marker:

Here’s how it works:

SOF Marker: The SOF marker (either SOF0 for baseline DCT or SOF2 for progressive DCT) is a crucial part of the JPEG structure. It contains essential information about the image’s dimensions and compression parameters.
Width and Height Fields: Within the SOF marker’s data, there are specific fields dedicated to storing the image’s width and height in pixels:
- Sample precision (1 byte): Specifies the number of bits per sample (usually 8).
- Image height (2 bytes): Stores the vertical image resolution in pixels.
- Image width (2 bytes): Stores the horizontal image resolution in pixels.
Decoder Interpretation: When a JPEG decoder encounters the SOF marker, it reads these width and height values directly from the marker’s data. This information is used to allocate memory for the decoded image data and establish the correct image dimensions for display or further processing.

APP0 Marker:

The APP0 marker, often containing JFIF metadata, can optionally provide image dimensions as well.
However, the SOF marker is the primary and more reliable source for this information, as it’s mandatory in all valid JPEG files.

Key Points:

SOF marker is essential for JPEG decoding, providing width, height, and compression details.
Decoders prioritize SOF for dimensions even if APP0 contains them.
EXIF, while valuable for additional metadata, is not strictly required for basic image display.

The APP0 marker in a JPEG file usually contains JFIF (JPEG File Interchange Format) metadata, which typically includes:

Essential Information:

JFIF identifier (4 bytes): “JFIF\0” to indicate JFIF compatibility.
Major version (1 byte): Usually 1 or 2 for common JFIF versions.
Minor version (1 byte): 0 in most cases.
Units (1 byte): 0 for pixel aspect ratio specified as a fraction, 1 for as a density.
X density (2 bytes): Horizontal pixel density (dots per inch).
Y density (2 bytes): Vertical pixel density (dots per inch).
Thumbnail horizontal offset (2 bytes): Offset to thumbnail data within the APP0 segment, if present.
Thumbnail vertical offset (2 bytes): Offset to thumbnail data within the APP0 segment, if present.

Optional Information:

Image resolution: Width and height in pixels (although SOF markers are the primary source).
Thumbnail: A small, embedded version of the image (JFIF specifies formats for uncompressed, 8-bit grayscale, or 24-bit color thumbnails).

Key Points:

Not Strictly Mandatory: While common, APP0 is not strictly required for a valid JPEG file.
JFIF Compatibility: It primarily ensures compatibility with JFIF-compliant decoders and software.
Thumbnail Access: If present, decoders can extract and display the thumbnail.
Image Resolution: While APP0 might contain image dimensions, decoders primarily rely on SOF markers for accuracy.
Application-Specific Data: It’s possible to use APP0 for custom metadata, but this is less common.

Example

Here’s a Python file that demonstrates how to extract and print EXIF tag values from a JPEG file using the struct module, although it’s generally recommended to use specialized libraries like Pillow for more robust and efficient EXIF handling:

import struct

def print_exif_with_struct(filename):
    """
    Prints EXIF tag values from a JPEG file using the struct module.

    Args:
        filename (str): Path to the JPEG file.
    """

    try:
        with open(filename, "rb") as f:
            data = f.read()

        # Find the APP1 marker (where EXIF data is usually stored)
        app1_marker = data.find(b"\xFF\xE1")
        if app1_marker == -1:
            print(f"No APP1 marker found in {filename}")
            return

        # Parse TIFF header within APP1
        header = data[app1_marker + 4:app1_marker + 12]
        byte_order, magic_number, ifd_offset = struct.unpack(">HHI", header)

        if magic_number != 42:  # Check for valid TIFF magic number
            print(f"Invalid TIFF magic number in {filename}")
            return

        # Process IFDs
        ifd_offset = app1_marker + ifd_offset
        while ifd_offset != 0:
            ifd_data = data[ifd_offset:]
            num_tags, = struct.unpack(">H", ifd_data[:2])
            ifd_data = ifd_data[2:]

            for _ in range(num_tags):
                tag_id, tag_type, value_count, value_offset = struct.unpack(">HHII", ifd_data[:12])
                ifd_data = ifd_data[12:]

                # Extract tag value based on type (simplified for brevity)
                if tag_type == 2:  # ASCII
                    tag_value = data[value_offset:value_offset + value_count].decode("ascii")
                else:
                    tag_value = struct.unpack(">" + "B" * value_count, data[value_offset:value_offset + value_count])

                print(f"Tag ID: {tag_id}, Value: {tag_value}")

            # Move to next IFD (if any)
            next_ifd_offset, = struct.unpack(">I", ifd_data[:4])
            ifd_offset = ifd_offset + next_ifd_offset

    except IOError:
        print(f"Error opening file: {filename}")

if __name__ == "__main__":
    filename = "your_image.jpg"  # Replace with your JPEG file path
    print_exif_with_struct(filename)

Important Caveats:

Complexity: Handling TIFF structures and variable-length data with struct can be intricate.
Error Handling: Incomplete error handling for unexpected data or inconsistencies.
Tag Interpretation: Requires manual mapping of tag IDs to names for meaningful output.
Library Preference: Specialized libraries like Pillow or piexif generally offer more robust and convenient EXIF handling.