
DeviceML was created and is maintained by Ideum.

All content © 2018 Ideum.




Welcome to the DeviceML Wiki. This wiki contains DML knowledge-base articles designed for users and developers of the Device IO Markup Language.

Device Input Output Markup Language (DML) is an XML-based open standard for defining device initialization, configuration, and management within immersive multiuser, multimodal and crossmodal HCI environments. It is the official markup language for authoring Gestureworks HCI applications. The goals of DML are to create reusable and shareable UX models that are programming language-agnostic and to accelerate the development of applications with rich multiuser, multimodal natural user interfaces.


DML provides an infrastructure for rapidly developing and sharing rich multimodal device input output models.

Feature List

  • Input support
    • Touch input device support (3M, PQ, Displax, Android)
      • finger, fiducial, stylus (configuration)
      • Native Multitouch HID driver compatible Windows 7/8/10
      • TUIO protocol compatible
    • Motion input device support (Leap Motion, RealSense, Duo3D, Kinect, SoftKinetic, Structure)
      • finger, hand, head, eyes, body and object tracking (configuration)
      • ranging configuration
      • viewable area configuration
      • view orientation configuration
      • cooperative illumination controls
    • Sensor input support (Tobii, Eyetribe, Android, IMUs, Wiimote, PS3 Move)
      • eyes, controllers, tools (configuration)
      • sampling rate control
    • Wearables (Myo, Nod, Basis)
      • armband, ring, watch, glove (configuration)
    • Microphones
      • headmounted, wearable
      • microphone array (configuration)
    • RFID
      • range (configuration)
      • array (configuration)
  • Output support
    • Multi-monitor display output
    • HMD output feedback
    • Audio output (7.1 surround sound)
    • Haptic output (Myo, Android, Gloves)
  • Fusion support
    • multi-device input fusion
    • modal input fusion
    • fusion pre-processing (configuration)
    • module (configuration)
  • Virtual Control Support (VCML)
    • device management
    • device calibration


The number of devices a typical user interacts with each day keeps increasing. Each new smart device brings groups of internal sensors and communication methods with IO capabilities. The current generation of mobile devices, peripherals and wearables offers many ways to capture high-fidelity gestures for control and to create rich output. Application developers often face situations where more than one device of the same type is used as a control device, and increasingly complex applications require an array of sensors to act as a coordinated input control.

Device Input-Output Markup Language has been designed to make it easy for developers to manage arrays of distinct commodity sensors or a list of multimodal input devices. For example, the following DML shows how to initialize a set of Leap Motion devices and a contiguous registration space:

DML Example Multidevice Management

<DeviceMarkupLanguage xmlns:gml="">
    <devices mode="server" protocol="tcp" port="49191" host="localhost" frame_rate="60">
        <motion active="true">
            <!-- One leap entry per physical Leap Motion device (ids 0 to 2) -->
            <leap type="first gen" active="true" mode="precision" frame_rate="60" input_mode="3d">
                <device id="0" active="true">
                    <attributes mode="precision" frame_rate="60" input_mode="3d" normalize="true"/>
                </device>
            </leap>
            <leap type="first gen" active="true" mode="precision" frame_rate="60" input_mode="3d">
                <device id="1" active="true">
                    <attributes mode="precision" frame_rate="60" input_mode="3d" normalize="true"/>
                </device>
            </leap>
            <leap type="first gen" active="true" mode="precision" frame_rate="60" input_mode="3d">
                <device id="2" active="true">
                    <attributes mode="precision" frame_rate="60" input_mode="3d" normalize="true"/>
                </device>
            </leap>
        </motion>
    </devices>
</DeviceMarkupLanguage>
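
As a rough illustration of how an application might consume a multi-device definition like the one above, the following Python sketch reads the device list with the standard library's xml.etree.ElementTree. It assumes that every element is closed so the markup is well-formed, and it leaves out the namespace declaration because its URI is not given here. The function name list_leap_devices is hypothetical and is not part of DML or Gestureworks.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the multi-device example (closing tags are assumed;
# the namespace declaration is left out because its URI is not given).
LEAP = '<leap type="first gen" active="true" mode="precision" frame_rate="60" input_mode="3d">'
ATTRS = '<attributes mode="precision" frame_rate="60" input_mode="3d" normalize="true"/>'
DML = f"""
<DeviceMarkupLanguage>
  <devices mode="server" protocol="tcp" port="49191" host="localhost" frame_rate="60">
    <motion active="true">
      {LEAP}<device id="0" active="true">{ATTRS}</device></leap>
      {LEAP}<device id="1" active="true">{ATTRS}</device></leap>
      {LEAP}<device id="2" active="true">{ATTRS}</device></leap>
    </motion>
  </devices>
</DeviceMarkupLanguage>
"""


def list_leap_devices(dml_text):
    """Return (id, frame_rate, normalize) for each device under an active <leap>."""
    root = ET.fromstring(dml_text)
    found = []
    for leap in root.iter("leap"):
        if leap.get("active") != "true":
            continue  # skip Leap entries that are switched off
        for device in leap.findall("device"):
            if device.get("active") != "true":
                continue  # skip individual devices that are switched off
            attrs = device.find("attributes")
            frame_rate = int(attrs.get("frame_rate"))
            normalize = attrs.get("normalize") == "true"
            found.append((device.get("id"), frame_rate, normalize))
    return found


leap_devices = list_leap_devices(DML)
print(leap_devices)
```

The sketch only reads the configuration. Opening the devices and building the shared registration space is left to the runtime that interprets the DML.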

The following DML example shows how to activate and configure a basis for rich multimodal input. In this example, multitouch input is activated along with motion input via a Leap Motion device and voice input via a microphone to create a multimodal input capability.

DML Example Multimodal Device Management

<DeviceMarkupLanguage xmlns:gml="">
    <devices mode="server" protocol="tcp" port="49191" host="localhost" frame_rate="60">
        <!-- Touch input: multitouch screen -->
        <touch active="true">
            <screen type="mmm" active="false" mode="native">
                <device id="screen1" active="true">
                    <attributes mode="native" resolution="" refresh_rate="120" palm_rejection="false"/>
                </device>
            </screen>
        </touch>
        <!-- Motion input: Leap Motion and Tobii entries -->
        <motion active="true">
            <leap type="first gen" active="true" mode="precision" frame_rate="60" input_mode="3d">
                <device id="0" active="true">
                    <attributes mode="precision" frame_rate="60" input_mode="3d" normalize="true"/>
                </device>
            </leap>
            <tobii type="" active="false" mode="">
                <device id="0" active="true">
                    <attributes mode="precision" frame_rate="60" normalize="true"/>
                </device>
            </tobii>
        </motion>
        <!-- Sensor input: voice via microphone -->
        <sensor active="true">
            <voice type="classic" active="false">
                <device id="0" active="true">
                    <attributes type="microphone" input_type="continuous"/>
                </device>
            </voice>
        </sensor>
    </devices>
</DeviceMarkupLanguage>

The above DML lays the foundation to enable rich multimodal gesture controls via GML or rich dynamic content using CML.
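
To make the modality / device type / device nesting of the multimodal example concrete, here is a small Python sketch that flattens an abbreviated, well-formed copy of it into one row per device. This page does not state how the active flags at the three levels combine, so the sketch reports them side by side rather than deciding whether a device is enabled. The function name summarize_devices is hypothetical, the closing tags are assumed, and the namespace declaration is left out because its URI is not given.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the multimodal example (the Tobii entry is left out for brevity).
DML = """
<DeviceMarkupLanguage>
  <devices mode="server" protocol="tcp" port="49191" host="localhost" frame_rate="60">
    <touch active="true">
      <screen type="mmm" active="false" mode="native">
        <device id="screen1" active="true">
          <attributes mode="native" resolution="" refresh_rate="120" palm_rejection="false"/>
        </device>
      </screen>
    </touch>
    <motion active="true">
      <leap type="first gen" active="true" mode="precision" frame_rate="60" input_mode="3d">
        <device id="0" active="true">
          <attributes mode="precision" frame_rate="60" input_mode="3d" normalize="true"/>
        </device>
      </leap>
    </motion>
    <sensor active="true">
      <voice type="classic" active="false">
        <device id="0" active="true">
          <attributes type="microphone" input_type="continuous"/>
        </device>
      </voice>
    </sensor>
  </devices>
</DeviceMarkupLanguage>
"""


def summarize_devices(dml_text):
    """Return one dict per <device>, with the active flag of each enclosing level."""
    root = ET.fromstring(dml_text)
    rows = []
    for modality in root.find("devices"):      # touch, motion, sensor
        for kind in modality:                  # screen, leap, voice
            for device in kind.findall("device"):
                rows.append({
                    "modality": modality.tag,
                    "kind": kind.tag,
                    "id": device.get("id"),
                    # Flags in order: modality, device type, device.
                    "active": (modality.get("active"), kind.get("active"), device.get("active")),
                })
    return rows


rows = summarize_devices(DML)
for row in rows:
    print(row)
```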

Proposed Expansion of Schema

  • Advanced inter-module communications
    • Multi-level feature fusion
    • Elastic input synchronization methods
    • Module processing configuration
  • Advanced network control methods
    • Node based network configuration
    • Explicit COMS definition and management for each node
  • Expanded IO Methods
    • NFC
    • ZigBee
    • Z-wave
    • BLE

Multi-device and Multi-instance Feature Fusion

Inter-modal device-based feature fusion enables the direct association of mid- to high-level feature data. It is markedly different from low-level data fusion techniques, such as the now-common “Kinect fusion” or point cloud fusion techniques, which rely on direct depth map integration. Fusing skeletal data from multiple identical devices within the same modal type is a form of “feature fusion”, in which the qualified feature points are those identified as object or body skeletons.

One approach is the strategic use of multiple identical 3D motion sensors to independently track object skeletons while intelligently avoiding optical interference through device coordination and complementary positioning. This allows the features identified and tracked from each device to be directly compared and used to build a more complete object skeleton with significantly more robust features.
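
One simple way to picture skeleton-level feature fusion is a confidence-weighted average of matching joints. The Python sketch below is only an illustration under stated assumptions: every device already reports joints in a shared coordinate frame, joints are matched by name, and each joint carries a confidence value. The function fuse_skeletons, the joint names and the weighting scheme are hypothetical and are not taken from DML or Gestureworks.

```python
def fuse_skeletons(skeletons):
    """Fuse per-device skeletons into one skeleton.

    skeletons: list of dicts mapping joint name -> (x, y, z, confidence),
    all expressed in the same (assumed) shared coordinate frame.
    Returns a dict mapping joint name -> fused (x, y, z).
    """
    sums = {}
    for skeleton in skeletons:
        for joint, (x, y, z, conf) in skeleton.items():
            if conf <= 0:
                continue  # ignore joints the device could not track
            acc = sums.setdefault(joint, [0.0, 0.0, 0.0, 0.0])
            acc[0] += conf * x
            acc[1] += conf * y
            acc[2] += conf * z
            acc[3] += conf
    # Divide the weighted sums by the total weight of each joint.
    return {j: (sx / w, sy / w, sz / w) for j, (sx, sy, sz, w) in sums.items()}


# Two devices see overlapping but different parts of the same hand.
device_a = {"index_tip": (10.0, 0.0, 0.0, 0.75), "thumb_tip": (0.0, 4.0, 0.0, 1.0)}
device_b = {"index_tip": (14.0, 0.0, 0.0, 0.25), "pinky_tip": (1.0, 2.0, 3.0, 0.5)}

fused = fuse_skeletons([device_a, device_b])
print(fused)
```

The fused result contains joints that only one device could see as well as a weighted estimate for joints seen by both, which is the sense in which the combined skeleton is more complete.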

Single User 3D motion body skeleton fusion (multiple Leap Motion type devices)

Multiple short-range tracking devices can be integrated into a single, purpose-built device to substantially increase coverage in a single-user interaction environment. The image below shows several Leap Motion devices arranged so that hand views are established along each axis. Even with the wide viewing angle of the Leap Motion cameras, orthogonal devices tend not to interfere with each other. However, devices that point towards each other along the same axis require cooperative illumination methods.
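
The distinction between orthogonal and opposed devices can be sketched geometrically. The Python example below flags pairs of devices that look at each other, which would be the candidates for cooperative illumination. It is an assumption-laden illustration: the function facing_pairs, the 30 degree cone and the device positions are hypothetical, and real interference depends on the hardware and its field of view.

```python
import math
from itertools import combinations


def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def _unit(v):
    norm = math.sqrt(_dot(v, v))
    return tuple(c / norm for c in v)


def facing_pairs(devices, max_angle_deg=30.0):
    """Return pairs of device ids that point towards each other.

    devices: dict mapping id -> (position, view_direction), with distinct positions.
    A pair is reported when each device lies within max_angle_deg of the
    other device's view direction (an illustrative criterion).
    """
    cos_limit = math.cos(math.radians(max_angle_deg))
    pairs = []
    for (id_a, (pos_a, dir_a)), (id_b, (pos_b, dir_b)) in combinations(sorted(devices.items()), 2):
        a_to_b = _unit(tuple(q - p for p, q in zip(pos_a, pos_b)))
        b_to_a = tuple(-c for c in a_to_b)
        a_sees_b = _dot(_unit(dir_a), a_to_b) >= cos_limit
        b_sees_a = _dot(_unit(dir_b), b_to_a) >= cos_limit
        if a_sees_b and b_sees_a:
            pairs.append((id_a, id_b))
    return pairs


# Three devices around a hand volume: two on the vertical axis, one from the side.
devices = {
    "bottom": ((0.0, 0.0, 0.0), (0.0, 1.0, 0.0)),    # looks up
    "top": ((0.0, 30.0, 0.0), (0.0, -1.0, 0.0)),     # looks down at "bottom"
    "side": ((-15.0, 15.0, 0.0), (1.0, 0.0, 0.0)),   # orthogonal to both
}

needs_cooperation = facing_pairs(devices)
print(needs_cooperation)
```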

Multi-Leap Motion Frame Curatio: Hand Scanner

The images below show different setups that allow tracking devices to offer complementary views of a user. The first and second images show two distinct depth mapping cameras embedded into a laptop to provide coverage for multiple regions of the user's face and hands. The third image shows a user-facing and a world-facing camera. The fourth shows a head-mounted display with embedded depth-map devices being used with a wearable necklace. The final image shows a laptop with a set of dual depth-map devices configured as a stereoscopic pair. In each of these configurations, focused coverage can be achieved with a much larger total field of view. Within regions of strong overlap, mutexed tracking methods can be leveraged to increase tracking rates and precision.

Multiuser 3D motion body skeleton fusion (multiple Kinect type devices)

Within extended room-scale spaces, multiple users can be tracked using the strategic placement, orientation and calibration of multiple long-range depth mapping devices such as the Microsoft Kinect.

Using multiple long-range and short-range sensors that are intelligently aligned and managed increases coverage of the user space and reduces blind spots and areas of poor resolution. This provides greater opportunity for high-fidelity gestures from multiple locations within a space and promotes natural free-form interactions.
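
Tracking users across several calibrated sensors relies on expressing every sensor's measurements in one shared room frame. The Python sketch below shows the idea with a deliberately simplified calibration: each sensor is described only by a yaw angle about the vertical axis and a position offset. The function to_room_frame, the axis convention and the sample values are assumptions for illustration, and a real calibration would normally use a full rotation and translation.

```python
import math


def to_room_frame(point, yaw_deg, offset):
    """Map a sensor-local (x, y, z) point into the shared room frame.

    The point is rotated by yaw_deg about the vertical y axis and then
    translated by the sensor's position (offset) in the room.
    """
    x, y, z = point
    yaw = math.radians(yaw_deg)
    c, s = math.cos(yaw), math.sin(yaw)
    x_rot = c * x + s * z
    z_rot = -s * x + c * z
    ox, oy, oz = offset
    return (x_rot + ox, y + oy, z_rot + oz)


# Two sensors at opposite ends of a 6 m walkway, facing each other.
sensor_a = {"yaw_deg": 0.0, "offset": (0.0, 0.0, 0.0)}
sensor_b = {"yaw_deg": 180.0, "offset": (0.0, 0.0, 6.0)}

# The same user joint as reported by each sensor in its own local frame.
seen_by_a = to_room_frame((0.5, 1.0, 2.0), **sensor_a)
seen_by_b = to_room_frame((-0.5, 1.0, 4.0), **sensor_b)

# After the transform both observations land on (approximately) the same room position.
print(seen_by_a, seen_by_b)
```

Once observations share a frame, a user can be handed from one sensor to the next as they move through the space.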

Multi-Kinect Walkway Multi-Kinect Surround Dual-Kinect

Using multiple devices creates larger interaction spaces and can allow users to be tracked around large objects or from one room to another. Understanding where a user came from and where a user is going can add valuable context to interactions.

Other Types of Feature Fusion

Other types of feature fusion also exist. For example, “cross-modal feature fusion” (also known as context fusion) can be used to integrate feature data from different types of input devices to improve tracking and gesture recognition confidence. More information about context fusion for multimodal gesture analysis applications can be found at GML Types of Input Fusion.

Frameworks & SDKs

GestureWorks Core: C++ framework for use with C++, C#.NET, Java and Python (Uses GML and DML)

deviceml.txt · Last modified: 2018/03/07 19:02 by glass