Ryan Howard is Founder and Principal at Storied Systems, a New York-based thought leader in experiential systems. He's obsessed with storytelling at human scale and driving a collective conversation about experience.

The Foundation of Responsive Audiovisual Design

In “Introducing Responsive Audiovisual Design,” I described Responsive Audiovisual Design (RAD). Simply put, RAD asks that we introduce HTML5, CSS3, and JavaScript on top of AV signal distribution architectures in order to adapt Responsive Web Design to the built environment. This creates dependencies and opportunities. To understand those, let’s focus for now on typical presentation environments.

Consider how browser-based content is typically integrated into AV systems. Figure 1 illustrates how a PC or mobile device running a browser or other graphics engine, the renderer, converts instructions and assets into AV signals. The instructions are HTML/CSS markup and JavaScript. Assets come in the form of text, images, fonts, videos, and vector graphics, among others.

Figure 1
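
To make the distinction concrete, here is a minimal sketch of what a renderer consumes. The markup, styles, and script are instructions; the referenced image and font files are assets. All of the file and element names here are purely illustrative.

```html
<!DOCTYPE html>
<html>
<head>
  <style>
    /* Instructions: how the assets should be presented */
    @font-face { font-family: "BrandFace"; src: url("brand.woff2"); } /* font asset */
    h1 { font-family: "BrandFace", sans-serif; color: #202830; }
  </style>
</head>
<body>
  <h1>Quarterly Review</h1>            <!-- text asset in semantic markup -->
  <img src="chart.png" alt="Q3 chart"> <!-- image asset -->
  <script>
    // Instructions: behavior attached to the assets
    document.querySelector("h1")
      .addEventListener("click", () => console.log("Title clicked"));
  </script>
</body>
</html>
```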

That rendered audio and video is output over some form of digital audio or video transport, whether it’s a single cable or an IP router. The signal is sent to a display or an audio system: “Human-Computer Interface” (HCI) devices, which translate the digital into the perceptual or translate human actions into the digital. We can already build small AV systems that consist only of software and HCI devices on a network, and as AV-over-IP offerings proliferate, this capability will continue to expand.

In typical AV distribution systems, the same content can be shown on any number of displays if each of those displays knows how to handle the signal. Figure 2 illustrates this “one to many” approach.

Figure 2

Displays can stretch or crop content but can’t recompose it. A windowing processor has significant power available for manipulating the image, but it cannot recompose the content fed to it either. Only the renderer applies instructions to assets, against the capabilities at hand, to compose the content.

So, are those instructions and assets content? Yes and no. Philosophically, I consider them actual content only when rendered, although I refer to the collective assets and instructions as “content” just like most people. Recognizing the distinction is important, however. If content is rendered from assets based on instructions and the capabilities of the renderer, then the same assets and instructions applied to different capabilities result in different content and a different experience. That’s a critical opportunity to control how the story unfolds. In the AV world of built environments, the experience of a client pitch in a huddle space is nothing like the same argument presented in a three-hundred-seat auditorium. Not only will the system capabilities differ, but the rest of the characteristics of the spaces will be quite different. The instructions are the smart bit that gives us control over how that happens.
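
As a small illustration in today’s web vocabulary (the class name here is hypothetical), the very same markup and assets can compose quite differently depending on the capabilities of the canvas that renders them:

```css
/* One set of assets and instructions; two different compositions. */
.pitch-deck {
  display: grid;
  grid-template-columns: 1fr; /* huddle-space canvas: a single column */
}

/* A much larger canvas, such as an auditorium video wall */
@media (min-width: 3840px) {
  .pitch-deck {
    grid-template-columns: 1fr 2fr 1fr; /* recompose across three columns */
    font-size: 2.5em;                   /* scale type for viewing distance */
  }
}
```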

Responsive Web Design

Essentially, that is Responsive Web Design: the content intelligently looks at the capabilities available and adjusts itself accordingly. There are four key criteria for creating responsive, interactive content.

  • The information and assets must be modular and heuristic so that they can be reconfigured. HTML5 describes the information and assets semantically so that the instructions can act on the appropriate elements.

  • There must be instructions that describe how the information and assets are presented and how they behave. CSS3 and JavaScript describe how the content should be displayed and how it should behave, including any interactions.

  • The instructions must be able to determine the capabilities of the environment in which they are executed. JavaScript can query the browser through standard APIs to find out considerable detail of the environment in which it is running.

  • The instructions must be able to act on user actions. The browser passes user actions on to JavaScript through standard APIs, and JavaScript can pass messages back and act upon the physical hardware. (A minimal sketch of the last two criteria follows this list.)
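
Every call in this sketch is a standard browser API; it shows, in miniature, querying the environment and reacting to user actions:

```javascript
// Query the capabilities of the environment (criterion three).
const canvas = {
  width: window.innerWidth,
  height: window.innerHeight,
  pixelRatio: window.devicePixelRatio,
  canHover: window.matchMedia("(hover: hover)").matches,
  portrait: window.matchMedia("(orientation: portrait)").matches,
};
console.log("Rendering environment:", canvas);

// Re-compose when the environment changes.
window.addEventListener("resize", () => {
  document.body.classList.toggle("compact", window.innerWidth < 1280);
});

// Act on user actions (criterion four).
document.addEventListener("pointerdown", (event) => {
  console.log(`User input at ${event.clientX}, ${event.clientY}`);
});
```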

This capability exists entirely within the browser. Content is usually fetched from a web server, though anyone who has experimented with web coding knows that a web server isn’t necessary for creating simple content (or even complex content within certain parameters). Sometimes the server just passes along files and the browser handles all of the logic, but servers generally act to extend the browser’s power, providing data synchronization, additional services, and another layer of logic to the overall application.

Figure 3 breaks this down. Notice there are two HCI blocks in this diagram: there is physical IO, which connects HCI devices directly to the renderer, and there are HCI devices in the AV system.

Figure 3

In AV systems, the IO on the renderer (any PC, for example) is connected to the signal distribution system. What control information is conveyed across a distribution system is limited to signal management. The image, sound, or HID data is serialized and sent from point A to point B. It is already whole. It is not broken apart into component parts (from a content point of view), and there is nothing like CSS, which describes how content should look in terms of color, size, placement, or font. Where HID data connects human input devices such as a mouse or touchscreen, no logic is applied.

Enabling Responsive Audiovisual Design

If we want the content to have knowledge or control of the AV system (for example, to cause a microphone to unmute), then we need to connect it to the AV control system, as shown in Figure 4.

Figure 4

This is how proprietary presentation graphics and show control systems work, often also assuming the AV control system role. However, content designed in one proprietary system cannot easily be used in another. If we create a common API in the browser that in turn connects to an API in the AV control system, content becomes highly portable. The content becomes aware of the system’s capabilities and a first-class member of the total system. This allows us to expand RWD to RAD.

  • The information and assets must be modular and heuristic so that they can be reconfigured. HTML5 describes the information and assets semantically so that the instructions can act on the appropriate elements.

  • There must be instructions that describe how the information and assets are presented and how they behave. CSS3 and JavaScript describe how the content should be displayed and how it should behave, including any interactions.

  • The instructions must be able to determine the capabilities of the environment in which they are executed. JavaScript can query the AV system (via the browser) through standard APIs to find out considerable detail of the environment in which it is running.

  • The instructions must be able to act on user actions. The AV system (via the browser) passes user actions on to JavaScript through standard APIs, and JavaScript can pass messages back and act upon the physical hardware. (A speculative sketch of such an API follows this list.)
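
No such common API exists today, so the sketch below is speculative: the WebSocket endpoint, the JSON message schema, and the device names are all invented, simply to show the shape such a browser-to-control-system bridge might take.

```javascript
// Hypothetical bridge to the AV control system. The endpoint and the
// message schema are invented for illustration only.
const av = new WebSocket("wss://control.example.local/av-api");

av.addEventListener("open", () => {
  // Ask the control system what the venue offers (criterion three).
  av.send(JSON.stringify({ type: "describe-venue" }));
});

av.addEventListener("message", (event) => {
  const msg = JSON.parse(event.data);
  if (msg.type === "venue-description") {
    // e.g. { canvases: [...], microphones: ["podium", "lav-1"] }
    console.log("Venue capabilities:", msg);
  }
});

// Act on the physical hardware (criterion four), e.g. unmute a microphone.
function unmuteMicrophone(id) {
  av.send(JSON.stringify({ type: "set-mute", device: id, mute: false }));
}
```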

The Evolution of AV Signal Distribution

There is a consequence of these dependencies: we need to render specifically for a particular mix of HCI devices in a system, just as the browser on an iPad renders specifically for the specifications and orientation of that single, physical tablet.

AV distribution systems evolved to connect one source of content to any number of displays or speakers. For example, they make any number of cameras available to any number of other devices that record or process those signals. As the AV ecosystem has become entirely digital, the role of the distribution system distills down to one purpose: connect compute with HCI devices. As digital distribution systems mature into IP networks, they differ little, if at all, from the IP data network that provides the instructions and assets to the compute, the renderer.

This is one part of the convergence of AV and IT that we have long heard echoed throughout the industry. Increasingly, all devices simply plug into a network or, more commonly, several networks. As anyone who has worked in or closely with a major corporate enterprise can attest, there are strong rationales for keeping certain networks separate. Data security is better (or at least simpler) when it is contained within a single domain and access to that domain is rigorously maintained. The introduction of IoT devices increases traffic and security risks substantially. Even moderate-quality video requires significant bandwidth, with different network performance requirements than typical data applications. As our world becomes more connected, it will happen across increasingly fragmented networks. This means content will increasingly need to aggregate and traverse multiple network domains.

Canvases and Venues

HTML-based delivery allows assets to be retrieved from different locations and in different forms, and so RAD addresses this tension between network convergence and fragmentation. The renderer allows content to be aggregated across domains and composed specifically for an HCI device (Figure 5) or device cluster (Figure 6); a small sketch of that aggregation follows Figure 6. The combined renderer and IO cluster form a canvas.

Figure 5

Figure 6
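
In that aggregation sketch (the origins and the element ID below are placeholders for assets living on separate network domains), the renderer simply fetches from each domain and composes the results for its own canvas:

```javascript
// Hypothetical origins on two different network domains.
const sources = [
  "https://media.example-corp.com/logo.svg",
  "https://schedule.example-venue.com/today.json",
];

// Aggregate across domains, then compose for this canvas.
Promise.all(sources.map((url) => fetch(url)))
  .then(async ([logoRes, scheduleRes]) => {
    const logo = await logoRes.text();
    const schedule = await scheduleRes.json();
    // "#brand" is a placeholder element in this canvas's markup.
    document.querySelector("#brand").innerHTML = logo;
    console.log("Composing schedule for this canvas:", schedule);
  })
  .catch((err) => console.error("Asset aggregation failed:", err));
```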

This architecture should look familiar, as it is the same as digital signage. Digital signage also needs to pull assets from multiple locations and render content specifically for a screen location or role. As with digital signage, in a browser-based approach the hardware requirements vary with the canvas: on the low end, very inexpensive compute can suffice, while a canvas larger than UHD may require more powerful GPUs.

The benefits of pairing renderers to HCI devices become more apparent as we look at how multiple canvases can become a venue in Figure 7. Not only can the content adapt to the performance characteristics of a canvas, it can adapt to the assigned role of a canvas. This is one more opportunity to fine-tune the simultaneous delivery of our story across multiple touchpoints.

Figure 7

Here is where RAD begins to reveal its power. In Figure 7 we have a primary display, a confidence monitor for the presenter, and we leverage the audience’s personal devices. What if a poll were to be conducted? We could have one HTML5 applet (running on a server) that presents in different forms across the three canvases. The presenter would see a countdown timer and an indication of how many people had voted. The audience could use their devices to vote, and they would see the countdown timer on the primary display. When the timer reaches zero, the voting ability ends, the main display reveals the results, and the confidence monitor provides speaker notes on the results to the presenter. Certainly, this could be achieved purely through hardware, but it would be more complex. Within the RAD pattern, this same content could be moved to another venue, and it would automatically adapt to the characteristics of that venue. Perhaps our second venue has built-in voting buttons at each seat, so the audience devices aren’t required. Or maybe there is a second confidence monitor that provides additional cues to the presenter. A rough sketch of such role-aware content follows.
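
In this sketch, assume each canvas learns its assigned role somehow; a URL query parameter stands in for whatever mechanism the venue provides, and the rendering helper is deliberately minimal:

```javascript
// One applet, three presentations, keyed on the canvas role.
// The role-discovery mechanism is an assumption; a URL parameter stands in.
const role = new URLSearchParams(location.search).get("role") || "audience";

// Minimal stand-in renderer; a real applet would build proper UI.
const show = (text) => (document.body.textContent = text);

function renderPoll(poll) {
  const secondsLeft = Math.ceil(Math.max(0, poll.endsAt - Date.now()) / 1000);
  if (secondsLeft === 0) {
    // Timer expired: the confidence monitor gets speaker notes;
    // every other canvas reveals the results.
    show(role === "confidence" ? poll.speakerNotes
                               : `Results: ${JSON.stringify(poll.tally)}`);
    return;
  }
  switch (role) {
    case "primary":    show(`Voting closes in ${secondsLeft}s`); break;
    case "confidence": show(`${poll.votes} votes so far, ${secondsLeft}s left`); break;
    default:           show(`Vote now: ${poll.options.join(" | ")}`); break;
  }
}

// Example poll object; in practice this state would come from the server.
renderPoll({
  endsAt: Date.now() + 30000,
  votes: 12,
  options: ["Yes", "No"],
  tally: { Yes: 9, No: 3 },
  speakerNotes: "Most of the room agreed; address the dissenters next.",
});
```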

The Potential for Platforms

This may sound like a lot of coding and configuration, but that’s entirely the point. The storyteller shouldn’t have to code anything. RWD helps to maintain the modularity of applications from the very bottom to the user interface. RAD builds onto that mature architecture. We will need to develop all kinds of new features and specialized applications to aggressively activate the environments we build, but we can do that without inventing anything and while leveraging the work of the many developers before us. We can extend platforms and build new ones.

Figure 8

The biggest experiential and commercial impact of RAD is the ability to build platforms that fully integrate content with the systems that enable it. Concurrent multi-channel delivery of content, telling a story across several simultaneous touchpoints such as displays, printed material, and a live speaker, is a hallmark of rich business experiences. With the explosive growth in remote workforces and small teams, business users face the need to engage across diverse venues economically and in real time. AV systems do this at a human scale, allowing face-to-face social interaction with technology acting both foundationally and peripherally. Yet they have not seen significant experiential benefit from the platform paradigm offered by the web, because the content is not responsive to the hardware.

End users, both individuals and organizations, will find commercial value in reduced operational and content-creation costs and in higher utilization of the systems. On the left side of Figure 8, the Content Author and the Event Director both connect to the templating engine within the platform. The templates unlock the ability for the storyteller to create their own multi-channel content without always requiring other specialists or specialized tools.

RAD Extends Proprietary Systems

As I pointed out in the previous article, “Introducing Responsive Audiovisual Design,” the browser has performance limits. Sometimes the power of a GPU- or FPGA-based visualization engine is required. But we can still achieve responsiveness. All of those tools work on some sort of template principle, and nearly all of the GPU-based tools will render HTML, often multiple instances of HTML in the same template. Since we are delivering component parts to the browser, we can deliver component parts to the visualization engine.

Figure 9

Figure 9 shows one possible implementation, where a parsing agent acts as an interface between the content platform and the template within the proprietary canvas. The parsing agent acts like a browser to the platform and has a native integration to the visualization engine. Vendors of those engines could provide these agents, or the platform vendor could provide them. There is complexity here, and there will certainly be practical limits to the responsiveness of such a solution. But the more advanced proprietary capabilities will be available when required, while the responsive framework can drive higher utilization of a critical investment.
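
A parsing agent might be structured like the following sketch. The engine interface here is entirely invented, a stand-in for whatever native SDK a visualization-engine vendor exposes, and the platform URL is a placeholder:

```javascript
// Hypothetical parsing agent: it speaks HTTP/JSON to the platform the way
// a browser canvas would, and forwards composed values to an engine SDK.
async function syncTemplate(platformUrl, engine) {
  // Fetch the same instructions and assets a browser canvas would receive.
  const res = await fetch(platformUrl);
  const content = await res.json(); // e.g. { title: "...", mediaUrl: "..." }

  // Translate each content field into the engine's template parameters.
  // setTemplateField and commit stand in for vendor-native calls.
  engine.setTemplateField("headline", content.title);
  engine.setTemplateField("media", content.mediaUrl);
  engine.commit(); // push the updated template to the proprietary canvas
}

// A stub engine so the sketch runs; a real agent would load the vendor SDK.
const stubEngine = {
  setTemplateField: (name, value) => console.log(`set ${name} =`, value),
  commit: () => console.log("template committed"),
};

syncTemplate("https://platform.example.com/canvas/7/content", stubEngine)
  .catch((err) => console.error("sync failed:", err));
```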

Hopefully, I’ve clearly laid out the basic architecture for achieving RAD. There is still much to be said in a future article about the API itself.
