## Unicode Technical Standard #35
# Unicode Locale Data Markup Language (LDML)
Part 7: Keyboards
For the full header, summary, and status, see [Part 1: Core](tr35.md).
#### _Important Note_
> The CLDR [Keyboard Workgroup](https://cldr.unicode.org/index/keyboard-workgroup) is currently
> developing major changes to the CLDR keyboard specification. These changes are targeted for
> CLDR version 41. Please see [CLDR-15034](https://unicode-org.atlassian.net/browse/CLDR-15034) for
> the latest information.
### _Summary_
This document describes parts of an XML format (_vocabulary_) for the exchange of structured locale data. This format is used in the [Unicode Common Locale Data Repository](https://unicode.org/cldr/).
This is a partial document, describing keyboard mappings. For the other parts of the LDML see the [main LDML document](tr35.md) and the links above.
### _Status_
_This is a draft document which may be updated, replaced, or superseded by other documents at any time. Publication does not imply endorsement by the Unicode Consortium. This is not a stable document; it is inappropriate to cite this document as other than a work in progress._
> _**A Unicode Technical Standard (UTS)** is an independent specification. Conformance to the Unicode Standard does not imply conformance to any UTS._
_Please submit corrigenda and other comments with the CLDR bug reporting form [[Bugs](tr35.md#Bugs)]. Related information that is useful in understanding this document is found in the [References](tr35.md#References). For the latest version of the Unicode Standard see [[Unicode](tr35.md#Unicode)]. For a list of current Unicode Technical Reports see [[Reports](tr35.md#Reports)]. For more information about versions of the Unicode Standard, see [[Versions](tr35.md#Versions)]._
## Parts
The LDML specification is divided into the following parts:
* Part 1: [Core](tr35.md#Contents) (languages, locales, basic structure)
* Part 2: [General](tr35-general.md#Contents) (display names & transforms, etc.)
* Part 3: [Numbers](tr35-numbers.md#Contents) (number & currency formatting)
* Part 4: [Dates](tr35-dates.md#Contents) (date, time, time zone formatting)
* Part 5: [Collation](tr35-collation.md#Contents) (sorting, searching, grouping)
* Part 6: [Supplemental](tr35-info.md#Contents) (supplemental data)
* Part 7: [Keyboards](tr35-keyboards.md#Contents) (keyboard mappings)
## Contents of Part 7, Keyboards
* 1 [Keyboards](#Introduction)
* 2 [Goals and Non-goals](#Goals_and_Nongoals)
* 3 [Definitions](#Definitions)
* 4 [File and Directory Structure](#File_and_Dir_Structure)
* 5 [Element Hierarchy - Layout File](#Element_Heirarchy_Layout_File)
* 5.1 [Element: keyboard](#Element_Keyboard)
* 5.2 [Element: version](#Element_version)
* 5.3 [Element: generation](#Element_generation)
* 5.4 [Element: info](#Element_info)
* 5.5 [Element: names](#Element_names)
* 5.6 [Element: name](#Element_name)
* 5.7 [Element: settings](#Element_settings)
* 5.8 [Element: keyMap](#Element_keyMap)
* Table: [Possible Modifier Keys](#Possible_Modifier_Keys)
* 5.9 [Element: map](#Element_map)
* 5.9.1 [Element: flicks, flick](#Element_flicks)
* 5.10 [Element: import](#Element_import)
* 5.11 [Element: displayMap](#Element_displayMap)
* 5.12 [Element: display](#Element_display)
* 5.13 [Element: layer](#Element_layer)
* 5.14 [Element: row](#Element_row)
* 5.15 [Element: switch](#Element_switch)
* 5.16 [Element: vkeys](#Element_vkeys)
* 5.17 [Element: vkey](#Element_vkey)
* 5.18 [Element: transforms](#Element_transforms)
* 5.19 [Element: transform](#Element_transform)
* 5.20 [Element: reorders, reorder](#Element_reorder)
* 5.21 [Element: transform final](#Element_final)
* 5.22 [Element: backspaces](#Element_backspaces)
* 5.23 [Element: backspace](#Element_backspace)
* 6 [Element Hierarchy - Platform File](#Element_Heirarchy_Platform_File)
* 6.1 [Element: platform](#Element_platform)
* 6.2 [Element: hardwareMap](#Element_hardwareMap)
* 6.3 [Element: map](#Element_hardwareMap_map)
* 7 [Invariants](#Invariants)
* 8 [Data Sources](#Data_Sources)
* Table: [Key Map Data Sources](#Key_Map_Data_Sources)
* 9 [Keyboard IDs](#Keyboard_IDs)
* 9.1 [Principles for Keyboard Ids](#Principles_for_Keyboard_Ids)
* 10 [Platform Behaviors in Edge Cases](#Platform_Behaviors_in_Edge_Cases)
## 1 Keyboards
The CLDR keyboard format provides for the communication of keyboard mapping data between different modules, and the comparison of data across different vendors and platforms. The standardized identifier for keyboards can be used to communicate, internally or externally, a request for a particular keyboard mapping that is to be used to transform either text or keystrokes. The corresponding data can then be used to perform the requested actions.
For example, a web-based virtual keyboard may transform text in the following way. Suppose the user types a key that produces a "W" on a QWERTY keyboard. A web-based tool using an AZERTY virtual keyboard can map that text ("W") to the text that would have resulted from typing the same key on an AZERTY keyboard, by transforming "W" to "Z". Such transforms are in fact performed in existing web applications.
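The text-level transform described above can be sketched as follows. This is only an illustration, not part of the format: the table covers just the four letter keys that differ between the two top-left letter positions, and the names `QWERTY_TO_AZERTY` and `remap_text` are hypothetical.

```python
# Partial QWERTY -> AZERTY mapping (illustrative assumption: only the
# Q/A and W/Z swaps are listed; a real layout covers every key).
QWERTY_TO_AZERTY = {"q": "a", "a": "q", "w": "z", "z": "w"}


def remap_text(text, table):
    """Map each character through the table, preserving letter case."""
    out = []
    for ch in text:
        mapped = table.get(ch.lower())
        if mapped is None:
            out.append(ch)  # no entry: pass the character through unchanged
        elif ch.isupper():
            out.append(mapped.upper())
        else:
            out.append(mapped)
    return "".join(out)


print(remap_text("W", QWERTY_TO_AZERTY))
```

A production tool would derive such a table from the key maps of the two layouts rather than hard-coding it.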
The data can also be used in analysis of the capabilities of different keyboards. It also allows better interoperability by making it easier for keyboard designers to see which characters are generally supported on keyboards for given languages.
To illustrate this specification, here is an abridged layout representing the English US 101 keyboard on the Mac OS X operating system (with an inserted long-press example). For more complete examples, and information collected about keyboards, see keyboard data in XML.
```xml
…
…
…
…
```
And its associated platform file (which includes the hardware mapping):
```xml
```
* * *
## 2 Goals and Non-goals
Some goals of this format are:
1. Make the XML as readable as possible.
2. Faithfully represent keyboard data from major platforms: it should be possible to create a functionally equivalent data file (such that, given any input, it produces the same output).
3. Maximize commonality in the data across platforms to make comparison easy.
Some non-goals (outside the scope of the format) currently are:
1. Display names or symbols for keycaps (e.g., the German name for "Return"). If that were added to LDML, it would be in a different structure, outside the scope of this section.
2. Advanced IME features, handwriting recognition, etc.
3. Roundtrip mappings: the ability to recover precisely the same format as an original platform's representation. In particular, the internal structure may have no relation to the internal structure of external keyboard source data; the only goal is functional equivalence.
Note: During development of this section, it was considered whether the modifier RAlt (=AltGr) should be merged with Option. In the end, they were kept separate, but for comparison across platforms implementers may choose to unify them.
Note that in parts of this document, the format `@x` is used to indicate the _attribute_ **x**.
* * *
## 3 Definitions
**Arrangement** is the term used to describe the relative position of the rectangles that represent keys, either physically or virtually. A physical keyboard has a static arrangement while a virtual keyboard may have a dynamic arrangement that changes per language and/or layer. While the arrangement of keys on a keyboard may be fixed, the mapping of those keys may vary.
**Base character:** The character emitted by a particular key when no modifiers are active. In ISO terms, this is group 1, level 1.
**Base map:** A mapping from the ISO positions to the base characters. There is only one base map per layout. The characters on this map are output when no modifier keys are used.
**Core keyboard layout:** also known as “alpha” block. The primary set of key values on a keyboard that are used for typing the target language of the keyboard. For example, the three rows of letters on a standard US QWERTY keyboard (QWERTYUIOP, ASDFGHJKL, ZXCVBNM) together with the most significant punctuation keys. Usually this equates to the minimal keyset for a language as seen on mobile phone keyboards.
**Hardware map:** A mapping between key codes and ISO layout positions.
**Input Method Editor (IME):** a component or program that supports input of large character sets. Typically, IMEs employ contextual logic and candidate UI to identify the Unicode characters intended by the user.
**ISO position:** The corresponding position of a key using the ISO layout convention where rows are identified by letters and columns are identified by numbers. For example, "D01" corresponds to the "Q" key on a US keyboard. For the purposes of this document, an ISO layout position is depicted by a one-letter row identifier followed by a two-digit column number (like "B03", "E12" or "C00"). The following diagram depicts a typical US keyboard layout superimposed with the ISO layout indicators (the number of keys and their physical placement relative to each other in this diagram are irrelevant; what matters is their logical placement using the ISO convention):
![keyboard layout example showing ISO key numbering](images/keyPositions.png)
One may also extend the notion of the ISO layout to support keys that don't map directly to the diagram above (such as the Android device - see diagram). Per the ISO standard, the space bar is mapped to "A03", so the period and comma keys are mapped to "A02" and "A04" respectively based on their relative position to the space bar. Also note that the "E" row does not exist on the Android keyboard.
![keyboard layout example showing extension of ISO key numbering](images/androidKeyboard.png)
If it becomes necessary in the future, the format could extend the ISO layout to support keys that are located to the left of the "00" column by using negative column numbers "-01", "-02" and so on, or 100's complement "99", "98",...
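As a minimal sketch of the notation described above, the following parses an ISO position into its row letter and column number. The function name `parse_iso_position` is hypothetical, and the possible future extensions to columns left of "00" are deliberately not handled.

```python
import re

# One uppercase row letter followed by exactly two ASCII digits, e.g. "D01".
_ISO_POSITION = re.compile(r"([A-Z])([0-9]{2})")


def parse_iso_position(position):
    """Return (row, column) for a position such as "D01"; raise ValueError otherwise."""
    match = _ISO_POSITION.fullmatch(position)
    if match is None:
        raise ValueError(f"not an ISO position: {position!r}")
    return match.group(1), int(match.group(2))


print(parse_iso_position("D01"))
```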
**Key:** A key on a physical keyboard.
**Key code:** The integer code sent to the application on pressing a key.
**Key map:** The basic mapping between ISO positions and the output characters for each set of modifier combinations associated with a particular layout. There may be multiple key maps for each layout.
**Keyboard:** The physical keyboard.
**Keyboard layout:** A layout is the overall keyboard configuration for a particular locale. Within a keyboard layout, there is a single base map, one or more key maps and zero or more transforms.
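The relationship between the hardware map, the base map, and the key maps defined above can be sketched as a two-step lookup: key code to ISO position, then ISO position to character for the active modifiers. The key codes, table contents, and the name `resolve_key` below are assumptions made for illustration; they are not taken from any platform file.

```python
# Hypothetical hardware map: platform key code -> ISO position.
HARDWARE_MAP = {12: "D01", 13: "D02"}

# Key maps keyed by the set of active modifiers.
# The entry for the empty set plays the role of the base map.
KEY_MAPS = {
    frozenset(): {"D01": "q", "D02": "w"},
    frozenset({"shift"}): {"D01": "Q", "D02": "W"},
}


def resolve_key(key_code, modifiers=frozenset()):
    """Return the character for a key press, or None if nothing is mapped."""
    iso_position = HARDWARE_MAP.get(key_code)
    if iso_position is None:
        return None
    key_map = KEY_MAPS.get(frozenset(modifiers))
    if key_map is None:
        return None
    return key_map.get(iso_position)


print(resolve_key(12), resolve_key(13, {"shift"}))
```

Keeping the hardware map separate from the key maps mirrors the split between platform and layout data: the same key maps can be reused with a different key-code table.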
**Layer** is an arrangement of keys on a virtual keyboard. Because a virtual keyboard is often not intended to be used with two hands, which holding a modifier key while pressing another key would require, modifier keys are made sticky: one presses the modifier, the visual representation (and even the arrangement) of the keys changes, and one then presses the desired key. This visual representation is a layer. Thus a virtual keyboard is made up of a set of layers.
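The idea of a virtual keyboard as a set of layers can be sketched as below. Everything here is an assumption made for illustration: the `VirtualKeyboard` class, the `layer:` prefix used to mark a layer-switching key, and the choice to return to the base layer after one key press are not defined by this specification.

```python
class VirtualKeyboard:
    """A toy virtual keyboard: a dict of layers, one of which is active."""

    def __init__(self, layers, base="base"):
        self.layers = layers
        self.base = base
        self.current = base

    def press(self, key):
        """Press a key; return the emitted text ('' for a layer switch)."""
        action = self.layers[self.current][key]
        if action.startswith("layer:"):
            # Sticky modifier: switch the visible layer instead of emitting text.
            self.current = action[len("layer:"):]
            return ""
        # Assumption: the sticky modifier applies to the next key only.
        self.current = self.base
        return action


keyboard = VirtualKeyboard({
    "base": {"shift": "layer:shift", "D01": "q"},
    "shift": {"shift": "layer:base", "D01": "Q"},
})
print(keyboard.press("shift") + keyboard.press("D01") + keyboard.press("D01"))
```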
**Long-press key:** also known as a “child key”. A secondary key that is invoked from a top level key on a software keyboard. Secondary keys typically provide access to variants of the top level key, such as accented variants (a => á, à, ä, ã).
**Modifier:** A key that is held to change the behavior of a keyboard. For example, the "Shift" key allows access to upper-case characters on a US keyboard. Other modifier keys include, but are not limited to: Ctrl, Alt, Option, Command and Caps Lock.
**Physical keyboard** is a keyboard that has individual keys that are pressed. Each key has a unique identifier and the arrangement doesn't change, even if the mapping of those keys does.
**Transform:** A transform is an element that specifies a set of conversions from sequences of code points into one (or more) other code points. For example, in most Latin keyboards hitting the "^" dead-key followed by the "e" key produces "ê".
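The dead-key example above can be sketched as a table of sequence conversions applied to a string. This is a simplified illustration under stated assumptions: the `TRANSFORMS` table and the `apply_transforms` function are hypothetical, the whole string is processed at once rather than keystroke by keystroke, and context conditions are not modeled.

```python
# Hypothetical transform table: source sequence -> replacement.
TRANSFORMS = {
    "^e": "\u00ea",  # ê
    "^a": "\u00e2",  # â
}


def apply_transforms(text, transforms):
    """Replace matching sequences, preferring the longest match at each position."""
    lengths = sorted({len(source) for source in transforms}, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for length in lengths:
            candidate = text[i:i + length]
            if len(candidate) == length and candidate in transforms:
                out.append(transforms[candidate])
                i += length
                break
        else:
            # No transform starts here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(apply_transforms("f^ete", TRANSFORMS))
```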
**Virtual keyboard** is a keyboard that is rendered on a surface, typically a touch surface. It has a dynamic arrangement and contrasts with a physical keyboard. This term has many synonyms: touch keyboard, software keyboard, SIP (Software Input Panel). This usage differs from other uses of the term virtual keyboard, where it refers to an on-screen keyboard for reference or accessibility data entry.
### 3.1 Escaping
When explicitly specified, attributes can contain escaped characters. This specification uses two methods of escaping, the _UnicodeSet_ notation and the `\u{...}` notation.
The _UnicodeSet_ notation is described in [UTS#35 section 5.3.3](tr35.md#Unicode_Sets) and allows for comprehensive character matching, including by character range, properties, names, or codepoints. Currently, the following attributes allow _UnicodeSet_ notation:
* `from`, `before`, `after` on the `<transform>` element
* `from`, `before`, `after` on the `<reorder>` element
* `from`, `before`, `after` on the `<backspace>` element
The `\u{...}` notation, a subset of hex notation, is described in [UTS#18 section 1.1](http://www.unicode.org/reports/tr18/#Hex_notation). It can refer to one or multiple individual codepoints. Currently, the following attributes allow the `\u{...}` notation:
* `to`, `longPress`, `multitap`, `hint` on the `