The Impact of Microsoft's ActiveMovie Multimedia Architecture on the Professional Video Marketplace
March 4, 1996
The desire to use economical PCs in professional video applications continues to grow. It started with nonlinear editing in the late 1980s, but technology is rapidly advancing to make PCs usable in a wide variety of professional applications.
Manufacturers of these systems have been held back by the lack of a robust software architecture standard that can meet the demanding requirements of handling multimedia data types in real time.
Currently the PC standard is Microsoft's Video for Windows (VFW) multimedia architecture and the Mac standard is Apple's QuickTime. Both architectures lack important features that would make them usable by the professional video industry. For example, both schemes limit I/O throughput, and their digital video file format structures and media synchronization methods are inadequate. VFW does not even support alpha-channel keying. Developers have been forced to come up with proprietary software approaches to overcome these limitations of VFW and QuickTime, which makes interoperability among vendors difficult, if not impossible. The need for proprietary code also limits the ability of developers to support multiple hardware platforms (video adapters, host processors) under multiple environments (Windows NT, Windows 95 and Mac OS).
Effort is wasted porting the same software source code over and over, limiting the time available to respond to users' requests for new features. Also, the development of massive pieces of proprietary code leads to longer development and testing (SQA) cycles and less reliable applications. It is well known to programmers that software reliability is directly related to the amount of testing the code undergoes and the size of its user base. More reusable code with a larger user base would mean more reliable products with less downtime -- a real boon to professionals who cannot tolerate system crashes. And who hasn't noticed that every editing system vendor takes much, much longer than promised to release the products they announce each year at NAB! This is an industry-wide problem due in large part to the software architecture limitations described above.
This paper focuses on Microsoft's new ActiveMovie multimedia architecture, which promises to overcome these limitations and deliver a standard, reliable software framework robust enough to meet the demands of the professional video industry.
First we review the limitations of Video for Windows, then we introduce the general theory of the Component Object Model (COM) programming structure upon which ActiveMovie is based. A discussion of the specific implementation of ActiveMovie as it applies to professional video applications follows, then we lay the groundwork for future enhancements to the ActiveMovie model. Finally, we assess the impact that this new software architecture is expected to have on the video industry.
Throughout this discussion we use real-world examples of the benefits of the ActiveMovie architecture based on its implementation in Matrox's new DigiSuite family of hardware modules and software development tools.
Limitations of Video for Windows
The limitations of VFW have been clear to professional video application developers for some time. They include:
Inadequate audio/video file format
In its original form, the AVI file format did not support the performance required by professional audio/video/film applications. The definition of enhancements to make the VFW AVI file format more useful for professional applications was the first project undertaken by the Open Digital Media (OpenDML) coalition. Established in late 1994, OpenDML is a group of software and hardware vendors dedicated to making Windows the platform of choice for professional video, audio and film producers.
The enhanced AVI file format specification was released by OpenDML in November '95 and has been incorporated by Microsoft into the ActiveMovie architecture. The enhanced AVI file format has the following features:
Applications with conventional AVI file format support can work with new codecs, and applications that support the extended AVI file format can work with conventional codecs, improving the performance of all VFW systems.
Until now, there have been as many motion-JPEG formats as there are motion-JPEG codecs on the market, preventing interoperability of any sort. The extended AVI file format provides interoperability among different hardware and software vendors' motion-JPEG codecs by adopting the ISO 10918 motion-JPEG DIB (Device Independent Bitmap) format as the standard.
The 1 GB practical limit on standard AVI files restricts D1-quality video playback to under 2 minutes at a 10 MB/sec. data rate (1 GB / 10 MB/sec. is about 100 seconds). The OpenDML file format allows practically unlimited playback time, subject only to the size of the storage media attached to the system.
AVI was originally designed around a frame index. The index, or reference list of video locations in the AVI file, provided access to frames as the smallest discrete element. Professional video applications, however, require the ability to sequence individual fields, not frames. The new field indexing scheme offers a number of important advantages, including improved video effects support, film and video frame-rate support, disk seek-time minimization, support for incremental file growth and references to source information.
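To make the idea concrete, the following minimal C++ sketch shows what a field-level index entry might look like. The names and layout are illustrative only, not the published OpenDML structure definitions.

    // Hypothetical sketch of a field-level index entry, in the spirit of
    // the OpenDML AVI extensions; not the published specification.
    struct FieldIndexEntry
    {
        unsigned long dwOffset;   // byte offset of the compressed field's data
        unsigned long dwSize;     // size of the compressed field in bytes
    };

    // Indexing fields rather than frames lets an editor seek directly to
    // either field of an interlaced frame -- the granularity professional
    // applications require.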
Limited I/O throughput
VFW transfers (copies) compressed video data, frame by frame, between different steps in the video processing operation using the conventional host CPU memory copying capability. Typically, these transfers need to occur between the storage device and the system memory, then from system memory to the video codec and finally from the video codec to the display device. The overall throughput of the system is thus limited by the performance of the CPU. In addition, VFW limits transfers to a maximum of 64 KBytes, which is very small relative to the enormous quantity of data needed to represent video. Even near-Betacam-quality compressed video requires at least a 4 MB/sec. bandwidth. System performance increases dramatically with larger data transfers.
The need for larger transfers introduces the concept of data streaming. A good analogy is to imagine transporting water to put out a fire. We could use water buckets (representing video frames) and a human chain (representing the CPU) from the source to the destination but that would be slow (buckets are small) and require a huge effort from the carriers (CPU). It is easy to see that using a hose (data stream) to accomplish this task would be much more efficient.
Inconsistent driver models
The number of data types supported by VFW is limited to audio, video, graphics and MIDI. The driver models for these devices lack uniformity because VFW was designed to be backward compatible with legacy code for audio (WAV driver) and graphics (GDI driver). Inconsistency in the driver models greatly complicates the developer's job.
Limited software - hardware interoperability
There are two major barriers to hardware - software interoperability inherent in VFW -- software drivers are monolithic and the Installable Compression Module (ICM) interface is limiting.
Monolithic software drivers make it difficult to replace a specific function inside a software driver without major work. Multimedia device drivers provide services for a specific data type such as digital video playback, audio playback, graphics animation and video effects. Conceptually, the drivers are made up of a large number of individual operations called primitives. In the case of a motion-JPEG video codec, for example, these primitives are memory/storage data management, color space conversion, raster block conversion, discrete cosine transform (DCT), quantization, Huffman decoding and display.
Current software drivers associated with each media type are not broken down into these sub-elements of functionality; they are monolithic in nature. This all but prevents an individual function that was originally coded in software, say color-space conversion, from being enhanced by hardware control and acceleration.
The ICM interface of VFW allows some hardware-software interoperability, but it is limited to the replacement of a software-only codec with a hardware codec. A much more generic approach to software-hardware interoperability, for all media types, is needed to enable a range of price/performance products.
No system-level synchronization of various media drivers
VFW ensures synchronization between audio and video playback by dropping or repeating video frames. This is a major shortcoming since professional video applications depend on timing accuracy down to the video field level. Synchronization for other media types is not supported.
Drivers are incompatible between Windows 95 and Windows NT
Incompatibility between the Windows 95 and Windows NT environments practically doubles the amount of work for vendors who want to support both operating systems. One example of the incompatibility is the 32-bit driver interface structure of Windows NT vs. the 16-bit driver code used under Windows 95.
Component Object Model (COM)
The Component Object Model (COM) architecture is the basic software foundation of ActiveMovie.
This extensible software architecture exploits object-oriented programming techniques to offer interoperability between software and hardware and among different vendors' products. Perhaps the best-known implementation of COM is Microsoft's OLE (Object Linking and Embedding), which provides a powerful means for applications to interact and interoperate. Microsoft's strategy is now to use COM in all new software architectures, including ActiveMovie.
What is a component object?
A component object is software code and associated data that perform a specific processing function in a system. In multimedia applications, we care about data processing, and data processing involves transformations. In electrical engineering terms, transformations are produced by filters, so a component object is called a "filter" in ActiveMovie terminology. For example, reading an AVI file from disk, decompressing a motion-JPEG stream and controlling the volume of audio playback are all functions that can be performed by filters.
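In C++ terms, a filter exposes its functionality as one or more interfaces: tables of virtual functions derived from the COM base interface IUnknown. The sketch below declares a hypothetical interface for the audio-volume filter mentioned above; the interface name and its methods are illustrative, not part of ActiveMovie.

    #include <objbase.h>   // IUnknown, HRESULT, STDMETHODCALLTYPE

    // Hypothetical interface for an audio-volume filter (illustrative only).
    struct IVolumeControl : public IUnknown
    {
        virtual HRESULT STDMETHODCALLTYPE SetVolume(long lLevel) = 0;    // e.g. 0-100
        virtual HRESULT STDMETHODCALLTYPE GetVolume(long *plLevel) = 0;
    };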
COM offers multiple advantages beyond the current VFW model:
Conventional programming techniques result in incompatibility problems when interfacing software modules compiled from different languages (C, Basic, Pascal). Similar incompatibilities occur between different CPUs (Alpha, Pentium, PowerPC, MIPS) and various operating systems (Windows NT, Windows 95, UNIX, Mac OS).
COM defines a binary standard that eliminates this problem -- binary descriptors are the lowest common denominator of all.
One basic problem hindering multi-vendor interoperability is: How can we ensure interoperability between different pieces of code that were designed at different times, in different parts of the world, by different software developers?
COM solves this problem by assigning a globally unique identifier (called a GUID) to every interface and component object. This 128-bit integer code is guaranteed to be unique in the world across space and time. (There are 3.4 x 10^38 different codes possible.) Because each component object is uniquely identified, there is no risk of confusion in trying to interface to the wrong component object.
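In practice, a developer generates the 128-bit value once with a tool such as uuidgen and declares it with the SDK's DEFINE_GUID macro. The identifier below, for the hypothetical IVolumeControl interface sketched earlier, is for illustration only.

    #include <objbase.h>
    #include <initguid.h>   // makes DEFINE_GUID emit the actual definition

    // Hypothetical interface identifier, produced once by uuidgen and then
    // fixed forever; no other interface in the world shares this value.
    DEFINE_GUID(IID_IVolumeControl,
        0x902a9d10, 0x3a1c, 0x11cf, 0x82, 0x0b, 0x00, 0xaa, 0x00, 0x6c, 0x4a, 0x01);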
COM provides the ability for an application to dynamically query the capability of a given filter and its interface. Thus, application software can adapt (without any code changes) to the availability of certain features in a particular system at a given time. For example, an editing application can query a particular video effects plug-in module, understand the specific effects that are provided in the module and allow the user to access these capabilities.
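The query mechanism is COM's QueryInterface call. As a minimal sketch, an application could probe a filter for the hypothetical IVolumeControl interface declared above and adapt if it is absent:

    IVolumeControl *pVol = NULL;
    HRESULT hr = pFilter->QueryInterface(IID_IVolumeControl, (void **)&pVol);
    if (SUCCEEDED(hr))
    {
        pVol->SetVolume(75);   // the filter supports volume control; use it
        pVol->Release();
    }
    // else: the capability is missing -- the application simply hides the
    // corresponding control, with no code changes required.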
COM allows components to communicate across different hardware processors, different operating systems, different software processes in the same machine and even across networks.
COM provides the facility to reuse code. In fact, existing component objects can be easily incorporated into new ones, and the new component object inherits the features of the parent component. This speeds development and also ensures more reliable software: new code gains an immediate level of reliability by virtue of being built on an existing, thoroughly tested component object.
ActiveMovie is a specific implementation of COM for multimedia that overcomes all of the limitations of VFW. The improved audio/video file format described above, originally proposed by OpenDML, has been incorporated in ActiveMovie. High-performance I/O throughput is ensured by the new data streaming architecture. Driver models now have a consistent Application Program Interface (API); they support more data types and are modular, leading to a high level of software/hardware interoperability. Accurate, system-level synchronization is provided for all media types, and the ActiveMovie API is identical under Windows 95 and Windows NT, minimizing the work required for software developers to support both operating systems.
ActiveMovie also benefits from all of the COM advantages defined above.
ActiveMovie Data Types
ActiveMovie allows the system designer to define a wider variety of data types than was possible under VFW. The Matrox DigiSuite implementation of ActiveMovie, for example, defines its own set of professional multimedia data types.
The ActiveMovie data streaming architecture allows the optimal interchange of time-based data of all of these types between the various ActiveMovie filters. There are three major benefits to the data streaming approach. Data throughput is maximized by the ability to use large data buffers (more than 64 KBytes). CPU-intensive memory copying operations are eliminated by the use of shared memory buffers. And system-level synchronization is ensured through the use of time stamps on all data streams.
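As a sketch of the time-stamp mechanism: every buffer travelling between filters carries a start and stop time. The call below uses the IMediaSample::SetTime method and the 100-nanosecond time units of the released DirectShow API; the exact ActiveMovie-era names should be treated as assumptions.

    // REFERENCE_TIME is a 64-bit count of 100-nanosecond units.
    REFERENCE_TIME tStart = 0;            // this buffer begins at stream time 0
    REFERENCE_TIME tStop  = 166667;       // one NTSC field later (1/60 sec.)
    pSample->SetTime(&tStart, &tStop);    // pSample: an IMediaSample buffer
    // The renderer compares these stamps against the reference clock to
    // present the data at exactly the right moment.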
ActiveMovie Filters
A filter is a COM object that performs a single task in a multimedia system. For example, in a complex sub-system such as the Matrox DigiMix digital video mixer, individual filters perform such tasks as background and wipe generation, chroma keying, layer priority selection, proc amp adjustments, 2D DVE, etc.
More complex functions can be performed by interfacing multiple filters to act on a specific data type. Filters are connected together by interfaces called "pins" in ActiveMovie terminology, and it will typically take multiple filters grouped together to replace the functionality of a monolithic VFW driver. An ActiveMovie filter can provide, transform or consume data. These three functions are accomplished using source filters, transform filters and renderer filters.
A source filter provides data such as digital audio, digital video or graphics to filters downstream. Alternatively, it can provide control information such as video and audio keyframes. Typical source filters include an AVI file reader, a WAV audio file reader and a title animator. A source filter can get its control information from a file or from interaction with the user via a user-interface device like a scroll bar or fader control. A source filter typically has only an output pin.
A transform filter accepts a data stream at its input pin, performs a transformation on the data and provides the processed data at its output pin. In Matrox DigiSuite, for example, the video codec, the 2D DVE processor, the audio equalizer and the Movie-2 bus interconnect are just a few of the many hardware-assisted transform filters.
A renderer filter is responsible for consuming the processed data and relaying it to a presentation device such as a video display or a speaker. The renderer ensures presentation of each media stream at the correct time, based on the system-level synchronization mechanism. In Matrox DigiSuite, for example, the "correct time" means each media stream is presented accurately synchronized at the video field level. Typically, a renderer filter has only an input pin.
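Putting the three filter types together, the sketch below asks the filter graph manager to assemble a complete source -> transform -> renderer chain for an AVI file. The interface and class names follow the released DirectShow API and are assumptions for the ActiveMovie beta.

    #include <objbase.h>

    IGraphBuilder *pGraph = NULL;
    CoInitialize(NULL);
    HRESULT hr = CoCreateInstance(CLSID_FilterGraph, NULL, CLSCTX_INPROC_SERVER,
                                  IID_IGraphBuilder, (void **)&pGraph);
    if (SUCCEEDED(hr))
    {
        // RenderFile inserts an AVI source filter, whatever transform filters
        // the streams need (e.g. a motion-JPEG decompressor) and the audio
        // and video renderers, negotiating every pin connection on the way.
        hr = pGraph->RenderFile(L"clip.avi", NULL);
        pGraph->Release();
    }
    CoUninitialize();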
Two types of pins are defined in ActiveMovie: shared memory buffers and implicit hardware connections. Shared memory buffer pins are used whenever a filter interfaces to another through the use of computer memory. For example, a source filter reading an AVI file from hard disk will provide the video data to the video codec through a shared memory buffer, given that the data read from disk is stored somewhere in system memory. ActiveMovie provides the ability to share these memory buffers from one filter to the next, without the need to perform the expensive CPU memory copying operations needed by VFW. In addition, the shared memory buffer can be used to transfer large blocks of memory (easily in the MByte range as opposed to VFW's 64K limit) at once, thus achieving significantly higher I/O throughput than the VFW approach.
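A minimal sketch of the shared-buffer idea, using the allocator and sample interfaces of the released DirectShow API (IMemAllocator and IMediaSample); the exact ActiveMovie-era names are assumptions:

    IMediaSample *pSample = NULL;
    if (SUCCEEDED(pAllocator->GetBuffer(&pSample, NULL, NULL, 0)))
    {
        BYTE *pData = NULL;
        pSample->GetPointer(&pData);   // direct pointer into the shared buffer
        // The source filter fills pData in place; the downstream filter is
        // handed the same buffer, so no CPU memory copy ever takes place.
        pSample->Release();
    }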
Hardware connections are made when a filter employs hardware acceleration. For example, a connection made between two video adapters through the Matrox Movie-2 bus would occur through a hardware pin. Because pins are standardized for a given data type, a shared memory buffer pin can be easily replaced by a hardware connection without affecting the interface with the application software. This transparency between hardware and software allows the same application to easily migrate from software-only operation to higher performance using hardware accelerators.
ActiveMovie Filter Graph
A multimedia system requires the connection of multiple filters to accomplish system-level functions. The representation of such a system is called a filter graph: a customized set of connections between filters that accomplishes a specific task.
In order to establish a connection between the input pin of one filter and the output pin of another, a negotiation of data type takes place. A connection can be established between two filters that share the same data type (e.g., motion-JPEG digital video) but cannot be established between two inconsistent data types. This prevents feeding a digital video stream into an audio equalizer, for example. One output pin can feed many filter inputs using a tee filter.
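As a sketch of the negotiation step, modeled on the CBasePin class of the DirectShow base-class library (the class, constant and error names are from the released SDK and are assumptions for the ActiveMovie beta), an audio equalizer's input pin simply refuses any non-audio media type, so connecting a video stream to it fails cleanly:

    HRESULT CEqualizerInputPin::CheckMediaType(const CMediaType *pmt)
    {
        // Accept only audio; a video output pin can never connect here.
        if (!IsEqualGUID(pmt->majortype, MEDIATYPE_Audio))
            return VFW_E_TYPE_NOT_ACCEPTED;
        return S_OK;
    }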
The goal of system level synchronization is to present all streams of data to the presentation device in a timely manner. The multimedia system designer must decide what the timebase or reference clock will be for his application. For example, in Matrox DigiSuite the timebase is the video field (60/sec. NTSC, 50/sec. PAL) so the reference clock is derived from the hardware vertical synchronization signal provided by the DigiSuite video board. In the film industry, a system designer would likely choose a reference clock of 24 frames/sec.
The rendering filters for all the data streams in the multimedia system must be controlled by this single reference clock. ActiveMovie accomplishes this by employing the concepts described below.
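As a sketch, the application selects the reference clock through the graph's IMediaFilter interface (a released DirectShow call; pVideoClock stands for a hypothetical IReferenceClock exposed by the video board's driver):

    IMediaFilter *pMF = NULL;
    if (SUCCEEDED(pGraph->QueryInterface(IID_IMediaFilter, (void **)&pMF)))
    {
        pMF->SetSyncSource(pVideoClock);   // all renderers now slave to this clock
        pMF->Release();
    }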
Each data stream is made up of samples. A sample is the smallest time element into which the system designer divides a particular data type in his multimedia system. For example, Matrox has determined that, in DigiSuite, there will be 48,000 audio samples/sec. and 60 NTSC video fields/sec.
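Assuming the 100-nanosecond REFERENCE_TIME units of the released API, converting sample numbers to stream time is simple arithmetic:

    const __int64 UNITS = 10000000;   // 100-ns units per second

    __int64 AudioTime(__int64 n) { return n * UNITS / 48000; }  // 48 kHz audio
    __int64 FieldTime(__int64 n) { return n * UNITS / 60; }     // NTSC fields

    // One NTSC field spans exactly 48000 / 60 = 800 audio samples, which is
    // what allows audio to be locked to video at the field level.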
Any media segment that is available to the multimedia system has a finite length. The media position interface in a renderer filter allows the application to seek to any position inside the media segment at any time.
At any given time a media segment is in a given state such as STOP, PAUSE, PLAY, FF, etc. The application can change the state of the media by sending information to the media control interface. The state of the media determines the rate at which the renderer filter consumes data; media in the STOP state, for example, consumes no data.
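In sketch form, the application drives both interfaces on the filter graph manager (IMediaControl and IMediaPosition in the released API; the ActiveMovie-era names are assumptions):

    IMediaControl  *pCtrl = NULL;
    IMediaPosition *pPos  = NULL;
    pGraph->QueryInterface(IID_IMediaControl,  (void **)&pCtrl);
    pGraph->QueryInterface(IID_IMediaPosition, (void **)&pPos);

    pPos->put_CurrentPosition(10.0);   // seek 10 seconds into the media segment
    pCtrl->Run();                      // PLAY: renderers consume data in real time
    pCtrl->Pause();                    // PAUSE: renderers hold the current sample
    pCtrl->Stop();                     // STOP: no data is consumed at all

    pPos->Release();
    pCtrl->Release();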
Since there is a single, unchanging reference clock in an ActiveMovie-based multimedia system, one could envision a scenario where a renderer filter, for some reason, receives data either too quickly or too slowly. ActiveMovie provides a mechanism that resynchronizes the data stream if necessary by dropping or repeating samples. In a professional video system such as the Matrox DigiSuite this phenomenon is avoided, but in some multimedia systems this graceful degradation is an important benefit of ActiveMovie. For example, a media server designed to deliver digital video streams to multiple users over a network might encounter more simultaneous requests for data than it can handle. Each user would experience a degradation in the quality of the MPEG stream he receives until the server catches up to the demand placed upon it.
The Next Step -- Standardized ActiveMovie Media Types
The next project being undertaken by the OpenDML group is the definition of standard media types and device driver models. A Device Driver Workgroup made up of individuals from interested companies is currently being formed. The goal of this group is to define a standard Application Program Interface (API) for drivers, allowing the interchange of various vendors' hardware and software without the need to modify applications. The driver models will define interface characteristics and address the various data types used in professional video production.
What Will ActiveMovie Do for the Video Professional?
ActiveMovie is an important enabling technology for professional video systems. As vendors complete development using this model and introduce products based on ActiveMovie, users will realize many significant benefits. Individual products will be more reliable and offer higher performance than ever before. Development cycles will accelerate because ActiveMovie is such a modular, flexible architecture. Product upgrades will be easier for manufacturers to introduce, so we may finally see NAB delivery promises met. A wider variety of price/performance solutions will be available, and systems will be easily upgradeable. Perhaps the most significant benefit of all will be a high level of multi-vendor, multi-platform interoperability.
For more information: firstname.lastname@example.org