# Korean strategy — type jamo, let the host IME compose

This is the central architectural decision of Tybe. The README's three-strategy framing landed on "ASCII-only v1, host helper for v2." This doc justifies a fourth strategy that keeps the dongle dumb, requires zero host-side software, and works for Korean from v1.

## The strategy in one paragraph

The dongle is a generic USB HID keyboard. Korean text is typed by emitting the same scancodes a Korean user would press on a 2-set (두벌식) keyboard layout. The host operating system, with its built-in Korean input source enabled, runs an Input Method Editor that composes those scancodes into Hangul syllables in real time. The client app — not the dongle — handles all the Hangul-to-jamo decomposition and 2-set mapping.

## Why this works

Korean keyboard input relies on an OS-level input method: the OS, not the application, owns the composition logic. When you set "Korean" as your input source on macOS, Windows, or ChromeOS, or on Linux (X11 or Wayland) through an input method framework such as IBus or Fcitx, the OS intercepts keystrokes and runs them through a Korean IME before delivering composed syllables to the focused application. This means:

- **Any USB keyboard can produce Korean text.** Real Korean keyboards have the jamo printed on the keys (plus dedicated 한/영 and 한자 keys), but the usage codes their letter keys send over USB are identical to those of a US QWERTY keyboard. The OS does the rest.
- **No host software required beyond the standard input source.** This is a stock OS feature, not a download. macOS users go to System Settings > Keyboard > Text Input > Input Sources > Edit > + and add Korean. Windows users add Korean from Settings > Time & Language > Language. Total setup time: under a minute, one-time.
- **It works in SSH, remote desktop, terminals, browsers, and every native app.** Composition happens at the OS input-routing layer, before the application receives the text, so the text arriving in an SSH session is the already-composed Korean string.

## Hangul syllable decomposition (the algorithm)

A Hangul syllable in Unicode (U+AC00 to U+D7A3) is a precomposed combination of three jamo: an initial consonant (choseong), a medial vowel (jungseong), and an optional final consonant (jongseong).

The decomposition is purely arithmetic:

```
syllable_index = codepoint - 0xAC00          // valid only for U+AC00..U+D7A3
initial_index  = syllable_index / 588        // integer division; 588 = 21 medials × 28 finals
medial_index   = (syllable_index % 588) / 28
final_index    = syllable_index % 28         // 0 means "no final"
```

There are 19 initials, 21 medials, and 28 final slots (slot 0 = no final), giving 19 × 21 × 28 = 11172 syllables, exactly the size of the Hangul Syllables block.

Worked example for "한":
- codepoint = U+D55C = 54620
- syllable_index = 54620 - 44032 (0xAC00) = 10588
- initial_index = 10588 / 588 = 18 → ㅎ (18 × 588 = 10584)
- medial_index = (10588 % 588) / 28 = 4 / 28 = 0 → ㅏ
- final_index = 10588 % 28 = 4 → ㄴ

Indices follow Unicode jamo order: ㅎ is the last (19th) initial, ㅏ is the first medial, and ㄴ occupies final slot 4 (slots 1-3 are ㄱ, ㄲ, ㄳ). Composing in reverse confirms the result: (18 × 588) + (0 × 28) + 4 = 10588, and 10588 + 0xAC00 = 0xD55C = 한 ✓.

The Swift implementation in `TybeHangul/HangulDecomposer.swift` has unit tests on real syllables.
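A minimal Swift sketch of the same arithmetic, for illustration only: `JamoIndices` and `decomposeHangul(_:)` are hypothetical names and may not match the API in `TybeHangul/HangulDecomposer.swift`. It returns nil for scalars outside the Hangul Syllables block instead of producing out-of-range indices.

```swift
/// Jamo indices of one precomposed Hangul syllable.
struct JamoIndices: Equatable {
    let choseong: Int   // initial consonant, 0..<19
    let jungseong: Int  // medial vowel, 0..<21
    let jongseong: Int  // final consonant slot, 0..<28 (0 = no final)
}

let hangulBase: UInt32 = 0xAC00
let choseongCount = 19
let jungseongCount = 21
let jongseongCount = 28
let syllablesPerChoseong = jungseongCount * jongseongCount      // 588
let hangulSyllableCount = choseongCount * syllablesPerChoseong  // 11172

/// Splits a syllable in U+AC00...U+D7A3 into jamo indices; nil for anything else.
func decomposeHangul(_ scalar: Unicode.Scalar) -> JamoIndices? {
    guard scalar.value >= hangulBase,
          scalar.value < hangulBase + UInt32(hangulSyllableCount) else { return nil }
    let index = Int(scalar.value - hangulBase)
    return JamoIndices(
        choseong: index / syllablesPerChoseong,
        jungseong: (index % syllablesPerChoseong) / jongseongCount,
        jongseong: index % jongseongCount
    )
}
```

`decomposeHangul("한")` yields choseong 18, jungseong 0, jongseong 4, matching the worked example.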

## 2-set keymap (jamo → QWERTY key)

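The 2-set keyboard layout (두벌식) maps each basic jamo to one QWERTY key. Tense consonants (ㄲ ㄸ ㅃ ㅆ ㅉ) and the vowels ㅒ and ㅖ are typed with Shift, shown as uppercase letters below; compound vowels and compound finals take two keys: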

### Initials (choseong)
| Jamo | Key | Jamo | Key | Jamo | Key |
|---|---|---|---|---|---|
| ㄱ | r | ㄴ | s | ㄷ | e |
| ㄹ | f | ㅁ | a | ㅂ | q |
| ㅅ | t | ㅇ | d | ㅈ | w |
| ㅊ | c | ㅋ | z | ㅌ | x |
| ㅍ | v | ㅎ | g |     |     |
| ㄲ | R | ㄸ | E | ㅃ | Q |
| ㅆ | T | ㅉ | W |     |     |

### Medials (jungseong)
| Jamo | Key | Jamo | Key | Jamo | Key |
|---|---|---|---|---|---|
| ㅏ | k | ㅐ | o | ㅑ | i |
| ㅒ | O | ㅓ | j | ㅔ | p |
| ㅕ | u | ㅖ | P | ㅗ | h |
| ㅘ | hk | ㅙ | ho | ㅚ | hl |
| ㅛ | y | ㅜ | n | ㅝ | nj |
| ㅞ | np | ㅟ | nl | ㅠ | b |
| ㅡ | m | ㅢ | ml | ㅣ | l |

### Finals (jongseong)
| Jamo | Key | Jamo | Key | Jamo | Key |
|---|---|---|---|---|---|
| ㄱ | r | ㄲ | R | ㄳ | rt |
| ㄴ | s | ㄵ | sw | ㄶ | sg |
| ㄷ | e | ㄹ | f | ㄺ | fr |
| ㄻ | fa | ㄼ | fq | ㄽ | ft |
| ㄾ | fx | ㄿ | fv | ㅀ | fg |
| ㅁ | a | ㅂ | q | ㅄ | qt |
| ㅅ | t | ㅆ | T | ㅇ | d |
| ㅈ | w | ㅊ | c | ㅋ | z |
| ㅌ | x | ㅍ | v | ㅎ | g |

Compound jamo (ㅘ, ㄺ, etc.) are represented as a sequence of two simpler jamo presses; the IME composes them.
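The three tables can be stored as arrays in Unicode jamo order, so the indices produced by the decomposition arithmetic above index them directly. This is a hypothetical Swift sketch; the array and function names are illustrative and not necessarily those of `KoreanKeymap`. Uppercase letters mean the key is typed with Shift held, and final slot 0 maps to an empty string.

```swift
// 2-set (두벌식) keys in Unicode index order. Uppercase = Shift + key.
// Initials: ㄱ ㄲ ㄴ ㄷ ㄸ ㄹ ㅁ ㅂ ㅃ ㅅ ㅆ ㅇ ㅈ ㅉ ㅊ ㅋ ㅌ ㅍ ㅎ
let choseongKeys = ["r", "R", "s", "e", "E", "f", "a", "q", "Q", "t",
                    "T", "d", "w", "W", "c", "z", "x", "v", "g"]
// Medials: ㅏ ㅐ ㅑ ㅒ ㅓ ㅔ ㅕ ㅖ ㅗ ㅘ ㅙ ㅚ ㅛ ㅜ ㅝ ㅞ ㅟ ㅠ ㅡ ㅢ ㅣ
let jungseongKeys = ["k", "o", "i", "O", "j", "p", "u", "P", "h", "hk",
                     "ho", "hl", "y", "n", "nj", "np", "nl", "b", "m", "ml", "l"]
// Finals (slot 0 = no final): ㄱ ㄲ ㄳ ㄴ ㄵ ㄶ ㄷ ㄹ ㄺ ㄻ ㄼ ㄽ ㄾ ㄿ ㅀ ㅁ ㅂ ㅄ ㅅ ㅆ ㅇ ㅈ ㅊ ㅋ ㅌ ㅍ ㅎ
let jongseongKeys = ["", "r", "R", "rt", "s", "sw", "sg", "e", "f", "fr",
                     "fa", "fq", "ft", "fx", "fv", "fg", "a", "q", "qt", "t",
                     "T", "d", "w", "c", "z", "x", "v", "g"]

/// QWERTY keys for one syllable, given its jamo indices.
func keys(choseong: Int, jungseong: Int, jongseong: Int) -> String {
    choseongKeys[choseong] + jungseongKeys[jungseong] + jongseongKeys[jongseong]
}
```

For 안 (indices 11, 0, 4) this gives `dks`; for 녕 (2, 6, 21) it gives `sud`.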

So "안녕" decomposes:
- 안 → ㅇ + ㅏ + ㄴ → keys `dks`
- 녕 → ㄴ + ㅕ + ㅇ → keys `sud`
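- Combined: `dkssud`. No separator is typed between syllables. The IME composes greedily: here the second `s` cannot join the final ㄴ of 안 (ㄴㄴ is not a valid compound final), so the IME commits 안 and starts a new syllable with ㄴ. When a vowel follows a consonant the IME has provisionally attached as a final, it moves that consonant to the start of the next syllable instead.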

The dongle types those QWERTY scancodes in order. The host IME (running because the Korean input source is active) sees `d k s s u d` and emits `안녕` to the focused app.
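On the USB side, each QWERTY key becomes a usage ID on the HID Keyboard/Keypad page (0x07), where `a` through `z` are 0x04 through 0x1D, and uppercase keys add the Left Shift bit (0x02) in the modifier byte. The sketch below assumes the 8-byte boot-protocol report and is written in Swift only for consistency with the client code; `KeyboardReport`, `hidUsage(for:)`, and `reports(forKeys:)` are hypothetical. It sends an all-released report after every press: reports describe key state, so without the release the doubled `s` in `dkssud` would look like one held key.

```swift
/// 8-byte boot-protocol keyboard report: modifiers, reserved, six key slots.
struct KeyboardReport: Equatable {
    var modifiers: UInt8 = 0
    var keys: [UInt8] = [0, 0, 0, 0, 0, 0]
    var bytes: [UInt8] { [modifiers, 0] + keys }
}

let leftShiftBit: UInt8 = 0x02  // bit 1 of the modifier byte

/// HID usage ID (Keyboard/Keypad page 0x07) for a 2-set key letter.
/// Uppercase letters map to the same usage with Shift held.
func hidUsage(for key: Character) -> (usage: UInt8, shifted: Bool)? {
    guard let ascii = key.asciiValue else { return nil }
    switch ascii {
    case 0x61...0x7A: return (ascii - 0x61 + 0x04, false)  // 'a'...'z'
    case 0x41...0x5A: return (ascii - 0x41 + 0x04, true)   // 'A'...'Z'
    default: return nil
    }
}

/// Press/release report pairs for a key string such as "dkssud".
func reports(forKeys keys: String) -> [KeyboardReport]? {
    var out: [KeyboardReport] = []
    for key in keys {
        guard let mapped = hidUsage(for: key) else { return nil }
        var press = KeyboardReport()
        press.modifiers = mapped.shifted ? leftShiftBit : 0
        press.keys[0] = mapped.usage
        out.append(press)
        out.append(KeyboardReport())  // all keys and modifiers released
    }
    return out
}
```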

## Input source toggling

The Korean input source must be active whenever the dongle types a Korean run. There are three strategies:

1. **Manual.** User toggles input source themselves (e.g., Caps Lock on macOS) before dictating Korean. v1 default.
2. **Auto-toggle via hotkey.** The client emits a `TOGGLE_INPUT_SOURCE` opcode before and after each Korean run, and the dongle translates it into a configured hotkey (default: Caps Lock on macOS). Risk: if the actual input source state is out of sync with what the client assumes, every toggle inverts it, and the Korean keystrokes land in the English input source as literal Latin letters (`dkssud` instead of `안녕`). Mitigation: the client sends Caps Lock at the start of every Korean run regardless of its tracked state, accepting that English text typed after a Korean run may briefly go to the Korean source until the user's next ASCII input corrects it.
3. **OS-specific deterministic switch.** macOS has the `TISSelectInputSource` API; Windows has `LoadKeyboardLayout`. This requires per-OS code in the client, not the dongle. The Phase 0 `tybe-validate` CLI uses the macOS API for deterministic testing.
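Strategy 2 needs the client to split text into runs before emitting opcodes. A hypothetical Swift sketch: only `TOGGLE_INPUT_SOURCE` comes from this design, while `PlannedOp` and `plan(_:)` are illustrative. It assumes the English source is active at the start, wraps each Hangul run in toggles, keeps spaces in whichever run is current (an assumption, since a space types the same in both sources), and drops other non-ASCII scalars such as standalone jamo. It does not implement the Caps Lock resync mitigation.

```swift
/// Hypothetical client-side plan for strategy 2.
enum PlannedOp: Equatable {
    case typeASCII(String)    // typed as-is with the English source active
    case typeKorean(String)   // later decomposed into 2-set keys
    case toggleInputSource    // TOGGLE_INPUT_SOURCE
}

func isHangulSyllable(_ scalar: Unicode.Scalar) -> Bool {
    (0xAC00...0xD7A3).contains(scalar.value)
}

/// Splits text into runs and wraps each Hangul run in toggles.
func plan(_ text: String) -> [PlannedOp] {
    var ops: [PlannedOp] = []
    var run = ""
    var inKorean = false

    func flush() {
        guard !run.isEmpty else { return }
        ops.append(inKorean ? PlannedOp.typeKorean(run) : .typeASCII(run))
        run = ""
    }

    for scalar in text.unicodeScalars {
        let korean = isHangulSyllable(scalar)
        guard korean || scalar.isASCII else { continue }  // drop everything else
        if scalar != " " && korean != inKorean {
            flush()
            ops.append(.toggleInputSource)
            inKorean = korean
        }
        run.unicodeScalars.append(scalar)
    }
    flush()
    if inKorean { ops.append(.toggleInputSource) }  // restore the English source
    return ops
}
```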

For v1 the default is **manual** (strategy 1). Strategy 2 lands in v1.1 once we have user feedback.
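Strategy 3 on macOS can be sketched with Text Input Services: look up an enabled input source by ID with `TISCreateInputSourceList`, then select it with `TISSelectInputSource`. Selecting by ID is idempotent, so unlike a toggle hotkey it cannot invert an out-of-sync state. The two source IDs below are assumptions to confirm on the test machine, and `selectInputSource(id:)` is a hypothetical helper, not the `tybe-validate` code. The Carbon-only part compiles out on other platforms.

```swift
#if canImport(Carbon)
import Carbon
#endif

// Assumed input source IDs; confirm on the test Mac by listing
// TISCreateInputSourceList(nil, false) and reading kTISPropertyInputSourceID.
let korean2SetSourceID = "com.apple.inputmethod.Korean.2SetKorean"
let abcSourceID = "com.apple.keylayout.ABC"

#if canImport(Carbon)
/// Selects an enabled input source by its ID. Returns false when no enabled
/// source has that ID or when TISSelectInputSource reports an error.
func selectInputSource(id: String) -> Bool {
    let filter = [kTISPropertyInputSourceID as String: id] as CFDictionary
    guard let list = TISCreateInputSourceList(filter, false)?.takeRetainedValue(),
          let sources = list as? [TISInputSource],
          let source = sources.first else { return false }
    return TISSelectInputSource(source) == OSStatus(noErr)
}
#endif
```

A validation run would call `selectInputSource(id: korean2SetSourceID)` before a Korean run and `selectInputSource(id: abcSourceID)` after it.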

## Limitations and edge cases

- **Hangul Jamo block (U+1100..U+11FF) and Hangul Compatibility Jamo (U+3130..U+318F)**: standalone jamo characters. These are rare in practice; v1 strips them, and support can be added in v1.1 if requested.
- **Old Hangul (옛한글)**: not supported. The 2-set layout doesn't have keys for archaic jamo.
- **Apps that bypass IME**: rare on modern OSes. Some games and BIOS screens don't honor the IME — Korean won't work there. ASCII still does.
- **3-set (세벌식) keyboard layout users**: would need a different keymap. We default to 2-set because it is the default Korean layout on every major OS. 3-set is a v2 config option.
- **IME suggestion popups**: typing fast may trigger the IME's suggestion bar. Should not affect committed text. Worth verifying in Phase 0.
- **Cursor positioning during composition**: while a syllable is being composed, most apps show it underlined. If the user moves the cursor mid-composition, the IME usually commits the half-syllable. We assume no concurrent input from the human user during dictation.

## Validation plan

Phase 0 validates this strategy on a Mac before any hardware is built:

1. Implement `HangulDecomposer` and `KoreanKeymap` in Swift with unit tests.
2. Build `tybe-validate` CLI: takes a string, plans an opcode sequence, uses `CGEventPost` to inject the equivalent keystrokes, and (for testing) toggles input source via `TISSelectInputSource`.
3. With TextEdit focused and Korean input source enabled, run `tybe-validate "안녕하세요"`. The text should appear in TextEdit.
4. Verify with several inputs: pure English, pure Korean, mixed, syllables with all three jamo positions, syllables with compound vowels (가위, 봐), syllables with compound finals (밟, 닭).
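Step 2's injection can be sketched with Core Graphics: each 2-set key becomes a key-down/key-up pair posted at the HID event tap, using Carbon's `kVK_ANSI_*` virtual key codes. This is a hypothetical sketch, not the actual `tybe-validate` source; `Keystroke`, `keystrokes(for:)`, and `post(_:)` are illustrative names. It sets the Shift flag on each event instead of posting separate Shift key events, an assumption worth checking against the Korean IME. macOS may also require granting the terminal Accessibility permission before posted events take effect.

```swift
#if canImport(Carbon)
import Carbon
import CoreGraphics
#endif

/// One key press: the unshifted letter plus whether Shift is held.
struct Keystroke: Equatable {
    let letter: Character
    let shifted: Bool
}

/// Splits a 2-set key string such as "dkssud" or "Rk" into keystrokes.
func keystrokes(for keys: String) -> [Keystroke] {
    keys.map { key in
        Keystroke(letter: Character(key.lowercased()), shifted: key.isUppercase)
    }
}

#if canImport(Carbon)
// ANSI virtual key codes from Carbon's HIToolbox.
let virtualKeyCodes: [Character: Int] = [
    "a": kVK_ANSI_A, "b": kVK_ANSI_B, "c": kVK_ANSI_C, "d": kVK_ANSI_D,
    "e": kVK_ANSI_E, "f": kVK_ANSI_F, "g": kVK_ANSI_G, "h": kVK_ANSI_H,
    "i": kVK_ANSI_I, "j": kVK_ANSI_J, "k": kVK_ANSI_K, "l": kVK_ANSI_L,
    "m": kVK_ANSI_M, "n": kVK_ANSI_N, "o": kVK_ANSI_O, "p": kVK_ANSI_P,
    "q": kVK_ANSI_Q, "r": kVK_ANSI_R, "s": kVK_ANSI_S, "t": kVK_ANSI_T,
    "u": kVK_ANSI_U, "v": kVK_ANSI_V, "w": kVK_ANSI_W, "x": kVK_ANSI_X,
    "y": kVK_ANSI_Y, "z": kVK_ANSI_Z,
]

/// Posts key-down and key-up events for each keystroke at the HID event tap.
func post(_ strokes: [Keystroke]) {
    let source = CGEventSource(stateID: .hidSystemState)
    for stroke in strokes {
        guard let code = virtualKeyCodes[stroke.letter] else { continue }
        for isDown in [true, false] {
            guard let event = CGEvent(keyboardEventSource: source,
                                      virtualKey: CGKeyCode(code),
                                      keyDown: isDown) else { continue }
            if stroke.shifted { event.flags.insert(.maskShift) }
            event.post(tap: .cghidEventTap)
        }
    }
}
#endif
```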

If step 3 produces correct Korean, the strategy is sound and Phase 2 can build firmware against the same opcode contract. If it doesn't, we fall back to OS-specific Unicode injection (worse: per-OS code in dongle or host helper required).

Phase 2 re-validates with real USB HID hardware. The CGEventPost path bypasses USB entirely, so a positive result in Phase 0 doesn't guarantee USB will work — but a negative result kills the strategy regardless.

## References

- Hangul syllable decomposition algorithm: The Unicode Standard, Chapter 3, Section 3.12 "Conjoining Jamo Behavior"; Unicode Standard Annex #15 (Normalization Forms) relies on it.
- 2-set Korean keyboard: KS X 5002 standard.
- Apple Text Input Services (TISSelectInputSource): macOS Carbon API, headers in `Carbon/Carbon.h`.
- USB HID Usage Tables for Universal Serial Bus, version 1.4, table "Keyboard/Keypad Page (0x07)".
