A speculative paradigm for brass instruments on the web: lip grid, harmonic series + valves, virtual cone, and circular brassy spectrometry.

Prototype Core

This prototype translates the organology of brass instruments into a web interface structured around three axes:

Lip + hand complex as interface
Harmonic series + valves as pitch engine
Brassiness as a live timbral metric

The scene is designed with the lips centered and cropped, over a sensing grid (pressure, position, emission) and a visual cone along the Z-axis representing the virtual horn uncoiled.

I. Camera Space → Mask Space Mapping

The object-fit: cover Problem

MediaPipe FaceMesh returns landmarks in normalized coordinates $[0,1]^2$ relative to the native video stream frame.

The <video> element, however, is rendered with object-fit: cover, which scales the stream to fill the element and crops the overflowing edges symmetrically. This introduces a non-trivial mismatch between stream space and viewport space.

Let:

$(v_w, v_h)$ = native stream resolution in pixels
$(e_w, e_h)$ = video element size in pixels (= viewport size)

The cover scale factor is:

$s_\text{cover} = \max\left(\frac{e_w}{v_w},\ \frac{e_h}{v_h}\right)$

The cropped margin per side, in stream pixels, is:

$c_x = \frac{v_w - e_w / s_\text{cover}}{2}, \quad c_y = \frac{v_h - e_h / s_\text{cover}}{2}$

Normalized crop fractions: $f_x = \frac{c_x}{v_w} = \frac{v_w - e_w/s_\text{cover}}{2v_w}, \quad f_y = \frac{c_y}{v_h}$

The visible stream range in normalized coordinates is:

$[f_x,\ 1-f_x] \times [f_y,\ 1-f_y]$

with visible width and height:

$w_\text{vis} = 1 - 2f_x, \quad h_\text{vis} = 1 - 2f_y$
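The crop fractions above can be sketched as a small helper. This is a minimal illustration, not the prototype's actual code; the function name `coverCrop` and the example resolutions are assumptions made for the worked example.

```javascript
// Normalized crop fractions introduced by object-fit: cover.
// (vw, vh): native stream size in pixels; (ew, eh): element size in pixels.
// The name coverCrop is hypothetical.
function coverCrop(vw, vh, ew, eh) {
  const sCover = Math.max(ew / vw, eh / vh);   // cover scale factor
  const fx = (vw - ew / sCover) / (2 * vw);    // cropped fraction per side, x
  const fy = (vh - eh / sCover) / (2 * vh);    // cropped fraction per side, y
  return { sCover, fx, fy, wVis: 1 - 2 * fx, hVis: 1 - 2 * fy };
}

// Worked example: a 1280x720 stream shown in a 1000x1000 element.
// The stream is scaled by 1000/720, so only the central 720 stream pixels
// are visible horizontally: fx = (1280 - 720) / 2560 = 0.21875, fy = 0.
const exampleCrop = coverCrop(1280, 720, 1000, 1000);
console.log(exampleCrop);
```

Only one axis is ever cropped: the axis whose ratio determines the maximum has a crop fraction of zero.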

Full Transformation Pipeline

Given a MediaPipe landmark $(m_x, m_y) \in [0,1]^2$, the final canvas pixel position $(p_x, p_y)$ is computed as follows:

Step 0 — Center Calibration

$c_x' = m_x - \delta_x, \quad c_y' = m_y - \delta_y$

where $(\delta_x, \delta_y)$ is the calibration offset (see Section V).

Step 1 — Cover Crop Correction

$e_x = \frac{c_x' - f_x}{w_\text{vis}}, \quad e_y = \frac{c_y' - f_y}{h_\text{vis}}$

This maps the visible stream region linearly to $[0,1]$.

Step 2 — Horizontal Mirror (scaleX(-1))

$\mu_x = 1 - e_x$

Step 3 — Zoom Around Center

$p_x = \left(0.5 + (\mu_x - 0.5)\cdot z\right)\cdot W, \quad p_y = \left(0.5 + (e_y - 0.5)\cdot z\right)\cdot H$

where:

$z$ = CSS zoom factor
$(W, H) = (e_w, e_h)$ = canvas size in pixels
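Steps 0 to 3 can be chained in a single function. The sketch below is illustrative only; the function name `landmarkToCanvas` and the shapes of its arguments (`cal`, `crop`) are assumptions, and the crop fractions are taken as given inputs.

```javascript
// Map a normalized MediaPipe landmark to canvas pixels.
// m:    { x, y } landmark in [0,1]^2
// cal:  { dx, dy } calibration offset (delta_x, delta_y)
// crop: { fx, fy } normalized crop fractions from object-fit: cover
// z:    CSS zoom factor; (W, H): canvas size in pixels
// All names are hypothetical.
function landmarkToCanvas(m, cal, crop, z, W, H) {
  // Step 0: center calibration
  const cx = m.x - cal.dx;
  const cy = m.y - cal.dy;
  // Step 1: cover crop correction (visible region -> [0,1])
  const ex = (cx - crop.fx) / (1 - 2 * crop.fx);
  const ey = (cy - crop.fy) / (1 - 2 * crop.fy);
  // Step 2: horizontal mirror, matching scaleX(-1)
  const mux = 1 - ex;
  // Step 3: zoom around the center, then scale to pixels
  return {
    px: (0.5 + (mux - 0.5) * z) * W,
    py: (0.5 + (ey - 0.5) * z) * H,
  };
}
```

A landmark at the stream center with zero calibration offset lands at the canvas center for any zoom, which is a convenient sanity check.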

Why Work in Normalized Space?

All operations are performed in normalized coordinates. Earlier, incorrect formulations mixed pixel-space and normalized-space corrections, which effectively applied the crop compensation twice. Working in normalized space applies the crop correction exactly once, as a fraction of the frame.

II. Lip Model — 1-DOF Oscillator

Physical Basis

The lips in brass playing can be modeled as a damped mass-spring oscillator:

$\ddot{y} + \frac{\omega_l}{Q_l}\dot{y} + \omega_l^2(y - y_0) = \frac{F_\text{eff}}{m_l}$

where:

$\omega_l = 2\pi f_l$
$Q_l$ = quality factor
$y_0$ = resting aperture
$F_\text{eff}$ = net force (air pressure − muscular tension)
$m_l$ = effective lip mass
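To make the oscillator concrete, here is a minimal time-stepping sketch. It uses semi-implicit Euler integration, which is a choice made here for simplicity and is not stated in the text; the function name `lipStep` and all parameter values are illustrative assumptions, not measured lip data.

```javascript
// One semi-implicit Euler step of the 1-DOF lip oscillator.
// state: { y, v } aperture and its velocity.
// params: { fl, Ql, y0, ml } lip frequency, quality factor, resting aperture, mass.
// Names and values are hypothetical.
function lipStep(state, params, Feff, dt) {
  const w = 2 * Math.PI * params.fl;
  // Acceleration from the equation of motion, solved for y''.
  const acc = Feff / params.ml
    - (w / params.Ql) * state.v
    - w * w * (state.y - params.y0);
  const v = state.v + acc * dt; // update velocity first
  const y = state.y + v * dt;   // then position, using the new velocity
  return { y, v };
}

// Usage: hold a constant force for one second at a 48 kHz step rate.
const lipParams = { fl: 100, Ql: 5, y0: 0.001, ml: 1e-4 };
let lip = { y: lipParams.y0, v: 0 };
for (let i = 0; i < 48000; i++) {
  lip = lipStep(lip, lipParams, 0.1, 1 / 48000);
}
console.log(lip.y); // settles near y0 + Feff / (ml * w^2)
```

With a constant force the aperture rings at $f_l$ and decays toward the static displacement $y_0 + F_\text{eff}/(m_l\omega_l^2)$, which follows from setting both derivatives to zero.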

Embouchure as Continuous Parameter

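Lip geometry yields: $\xi = \frac{d_{13,14}}{d_{61,291}}, \quad \xi \in [0,1]$

where $d_{i,j}$ is the distance between FaceMesh landmarks $i$ and $j$ (13 and 14 on the inner lips, 61 and 291 at the mouth corners).

Small $\xi$ → tight lips → high register
Large $\xi$ → open lips → low/pedal register

Lip frequency follows: $f_l = f_{l,0}\sqrt{T_l/T_{l,0}}$

Tension $T_l$ is approximated as inversely related to $\xi$.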

Fractional Harmonic Interpolation

Harmonic series:

$f_n = n f_\text{tube}$

The user trains calibration pairs $(\xi_k, n_k)$.

Interpolation:

Sort by $\xi_k$
Find the interval $[\xi_k, \xi_{k+1}]$ containing $\xi$
Linear parameter: $t = \frac{\xi - \xi_k}{\xi_{k+1} - \xi_k}$
Smooth S-curve: $\tilde{t} = t^2(3 - 2t)$
Fractional partial: $n_\xi = n_k + \tilde{t}(n_{k+1} - n_k)$

First-order smoothing: $\dot{n} = \frac{n_\xi - n}{\tau}, \quad \tau = 80\text{ ms}$

Resulting synthesis frequency: $f_0 = f_\text{tube} \cdot n$

This enables continuous glissandi.
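The interpolation and smoothing steps can be sketched as two small functions. The names `fractionalPartial` and `smoothPartial`, the pair shape `{ xi, n }`, and the clamping outside the trained range are assumptions made for this illustration; the text does not say how out-of-range values are handled.

```javascript
// Smoothstep interpolation of the partial number between trained pairs.
// pairs: array of { xi, n }; at least one pair is required.
function fractionalPartial(xi, pairs) {
  const sorted = [...pairs].sort((a, b) => a.xi - b.xi);
  const first = sorted[0];
  const last = sorted[sorted.length - 1];
  // Assumption: hold the end values outside the trained range.
  if (xi <= first.xi) return first.n;
  if (xi >= last.xi) return last.n;
  for (let k = 0; k < sorted.length - 1; k++) {
    const a = sorted[k];
    const b = sorted[k + 1];
    if (xi <= b.xi) {
      const t = (xi - a.xi) / (b.xi - a.xi); // linear parameter
      const s = t * t * (3 - 2 * t);         // smooth S-curve
      return a.n + s * (b.n - a.n);
    }
  }
  return last.n;
}

// First-order smoothing toward the target partial.
// Uses the exact solution of dn/dt = (target - n) / tau over one step dt.
function smoothPartial(n, target, dt, tau = 0.08) {
  return target + (n - target) * Math.exp(-dt / tau);
}

// Usage: three trained pairs, tight lips mapped to higher partials.
const trained = [{ xi: 0.05, n: 8 }, { xi: 0.2, n: 4 }, { xi: 0.5, n: 2 }];
const nTarget = fractionalPartial(0.125, trained); // halfway between 8 and 4
console.log(nTarget, 116.5 * smoothPartial(4, nTarget, 1 / 60));
```

Because the S-curve has zero slope at both ends of each interval, the partial number lingers near the trained values and moves faster in between.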

III. Additive Synthesis Engine

Audio Graph

$\text{oscillators} \rightarrow \text{HP} \rightarrow \text{peak EQ} \rightarrow \text{LP} \rightarrow \text{master} \rightarrow \text{analyser} \rightarrow \begin{cases} \text{dry}\\ \text{wet} \end{cases}$

Register-Dependent Brassiness

Normalized partial: $p_n = \text{clamp}(n/8,\ 0.125,\ 1.5)$

Brassiness factor: $B_p = \text{clamp} \left( 0.9 e^{-2.5(p_n-0.125)} + 0.2 p_n^{1.8} + 0.15 \right)$

Octave Darkening

$D_\text{oct} = \text{clamp}(1.5 - 0.8 p_n,\ 0.5,\ 1.5)$

Lowpass Cutoff

$f_\text{cut} = 280 + P_m \cdot 3200 D_\text{oct} + B_s \cdot 800 B_p + [\text{boost}] \cdot 1400$

Spectral Roll-Off

$\alpha_h = 0.3 + 0.5P_m - 0.25B_p, \quad A_h \propto h^{-\alpha_h}$

Odd Harmonic Emphasis

$A_h^\text{adj} = \begin{cases} A_h (1 + 0.6 B_p) & h \text{ odd} \\ A_h (1 - 0.2 B_p) & h \text{ even} \end{cases}$
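The register-dependent formulas above can be combined into one sketch. Several points are assumptions: the text gives no clamp bounds for $B_p$, so a hypothetical default of $[0, 1]$ is passed as a parameter; $P_m$ and $B_s$ are treated as plain numeric inputs; and $[\text{boost}]$ is read as 1 when a boost flag is set and 0 otherwise. Function names are illustrative.

```javascript
function clamp(x, lo, hi) {
  return Math.min(hi, Math.max(lo, x));
}

// Timbre parameters for partial number n.
// Pm, Bs: numeric inputs (not defined further in the text); boost: boolean.
// bpRange: ASSUMED clamp bounds for Bp, since the text leaves them out.
function partialTimbre(n, Pm, Bs, boost, bpRange = [0, 1]) {
  const pn = clamp(n / 8, 0.125, 1.5);
  const BpRaw = 0.9 * Math.exp(-2.5 * (pn - 0.125)) + 0.2 * Math.pow(pn, 1.8) + 0.15;
  const Bp = clamp(BpRaw, bpRange[0], bpRange[1]);
  const Doct = clamp(1.5 - 0.8 * pn, 0.5, 1.5);
  const fCut = 280 + Pm * 3200 * Doct + Bs * 800 * Bp + (boost ? 1400 : 0);
  const alpha = 0.3 + 0.5 * Pm - 0.25 * Bp;
  return { pn, Bp, Doct, fCut, alpha };
}

// Relative amplitudes of harmonics 1..count with odd-harmonic emphasis.
function harmonicAmplitudes(count, alpha, Bp) {
  const amps = [];
  for (let h = 1; h <= count; h++) {
    const base = Math.pow(h, -alpha);
    const weight = h % 2 === 1 ? 1 + 0.6 * Bp : 1 - 0.2 * Bp;
    amps.push(base * weight);
  }
  return amps;
}

// Usage: first partial, moderate inputs, no boost.
const timbre = partialTimbre(1, 0.5, 0.5, false);
console.log(timbre, harmonicAmplitudes(8, timbre.alpha, timbre.Bp));
```

The amplitudes are relative; a real oscillator bank would typically normalize them before setting gains.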

IV. Cubic-Root Companding for Mic Gate

Motivation

RMS values typically lie in:

$[0.001, 0.06]$

A linear mapping confines these values to a small fraction of the usable $[0,1]$ range.

Companding

$\rho_\text{comp} = \text{RMS}^{1/3}$

Intensity normalization: $\rho_\text{int} = \text{clamp}(1.8\rho_\text{comp} - 0.04,\ 0,\ 1)$

Raw weight: $w_\text{raw} = \text{clamp}(0.78\rho_\text{int} + 0.22\chi,\ 0,\ 1)$

Adaptive Noise Floor

$\eta_{k+1} = \eta_k + 0.025(\min(0.25,w_\text{raw}) - \eta_k)$

Final Gate

$d_0 = \text{clamp}(w_\text{raw} - \eta - 0.005,\ 0,\ 1), \quad d = d_0^{0.65}$

User threshold $T$: $a = \frac{d - T}{1 - T}$

Exponent: $\varepsilon = 0.45 + 0.55T, \quad g_\text{open} = a^\varepsilon$

Soft pre-trigger: $p = \frac{d - 0.6T}{0.4T}, \quad g_\text{below} = 0.18 p^{1.4}$
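The gate chain can be sketched per analysis frame as follows. This is a hedged illustration: the function name `micGate` is hypothetical, $\chi$ is treated as an opaque input in $[0,1]$ because the text does not define it, the updated noise floor is used when computing $d_0$ (the text does not say which value applies), and $a$ and $p$ are clamped to $[0,1]$ so the fractional powers stay real. How the two gate branches are combined is not specified, so both are returned.

```javascript
function clamp(x, lo, hi) {
  return Math.min(hi, Math.max(lo, x));
}

// One gate update. Assumes 0 < T < 1.
// rms: frame RMS; chi: second input in [0,1] (undefined in the text);
// eta: current adaptive noise floor; T: user threshold.
function micGate(rms, chi, eta, T) {
  const comp = Math.cbrt(rms);                        // cubic-root companding
  const intensity = clamp(1.8 * comp - 0.04, 0, 1);
  const wRaw = clamp(0.78 * intensity + 0.22 * chi, 0, 1);
  const etaNext = eta + 0.025 * (Math.min(0.25, wRaw) - eta);
  const d0 = clamp(wRaw - etaNext - 0.005, 0, 1);     // ASSUMPTION: uses updated floor
  const d = Math.pow(d0, 0.65);
  const a = clamp((d - T) / (1 - T), 0, 1);           // ASSUMPTION: clamped
  const gOpen = Math.pow(a, 0.45 + 0.55 * T);
  const p = clamp((d - 0.6 * T) / (0.4 * T), 0, 1);   // ASSUMPTION: clamped
  const gBelow = 0.18 * Math.pow(p, 1.4);
  return { eta: etaNext, d, gOpen, gBelow };
}

// Usage: a moderately loud frame (RMS 0.008) against an empty noise floor.
console.log(micGate(0.008, 0, 0, 0.2));
```

The cube root maps the typical RMS range $[0.001, 0.06]$ to roughly $[0.1, 0.39]$, which is why the subsequent linear stage has room to work.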

V. Mouth Center Calibration

Calibration key [C]: $\delta_x = \bar{x}_\text{mouth} - 0.5, \quad \delta_y = \bar{y}_\text{mouth} - 0.5$

CSS shift: $T_x = \delta_x z W, \quad T_y = -\delta_y z H$
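A minimal sketch of the calibration step, assuming the mouth center is the arithmetic mean of a set of normalized mouth landmarks. The function names and the landmark array shape are illustrative assumptions.

```javascript
// Calibration offset from the mean mouth position (normalized coordinates).
// mouthLandmarks: non-empty array of { x, y } in [0,1]^2.
function calibrateMouthCenter(mouthLandmarks) {
  let sx = 0;
  let sy = 0;
  for (const p of mouthLandmarks) {
    sx += p.x;
    sy += p.y;
  }
  const n = mouthLandmarks.length;
  return { dx: sx / n - 0.5, dy: sy / n - 0.5 };
}

// CSS translation in pixels for zoom z and viewport (W, H).
function cssShift(cal, z, W, H) {
  return { tx: cal.dx * z * W, ty: -cal.dy * z * H };
}

// Usage: a mouth sitting right of and above the frame center.
const cal = calibrateMouthCenter([{ x: 0.6, y: 0.4 }, { x: 0.7, y: 0.5 }]);
console.log(cal, cssShift(cal, 2, 1000, 500));
```

The signs differ between the two axes because the video is mirrored horizontally, so a positive $\delta_x$ in stream space needs a positive shift on screen.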

VI. Synthetic Impulse Response

$\text{IR}(t) = (2u-1)e^{-6t/\tau_r} + \sum_k a_k\delta(t - t_k)$

Early reflections:

k	t_k (ms)	a_k (L / R)
1	18	0.70 / 0.60
2	32	0.43 / 0.50
3	55	0.35 / 0.35
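The impulse response can be sketched as two sample arrays, one per channel. The sketch assumes that $u$ is uniform noise in $[0,1]$ drawn independently for each sample and channel, and that each reflection is written as a single-sample spike; neither detail is stated in the text. The function name, the injectable `random` parameter, and the duration and sample-rate values are illustrative.

```javascript
// Early reflections from the table: time in ms, gain per channel.
const EARLY_REFLECTIONS = [
  { tMs: 18, left: 0.70, right: 0.60 },
  { tMs: 32, left: 0.43, right: 0.50 },
  { tMs: 55, left: 0.35, right: 0.35 },
];

// Build a stereo impulse response as two Float32Arrays.
// tauR: decay parameter in seconds; random: source of u in [0,1).
function makeImpulseResponse(sampleRate, duration, tauR, reflections, random = Math.random) {
  const length = Math.floor(sampleRate * duration);
  const left = new Float32Array(length);
  const right = new Float32Array(length);
  for (let i = 0; i < length; i++) {
    const env = Math.exp((-6 * (i / sampleRate)) / tauR); // exponential decay
    left[i] = (2 * random() - 1) * env;
    right[i] = (2 * random() - 1) * env;
  }
  for (const r of reflections) {
    const idx = Math.round((r.tMs / 1000) * sampleRate);
    if (idx < length) {
      left[idx] += r.left;   // discrete stand-in for a_k * delta(t - t_k)
      right[idx] += r.right;
    }
  }
  return { left, right };
}

// Usage: one second at 48 kHz with a one-second decay parameter.
const ir = makeImpulseResponse(48000, 1.0, 1.0, EARLY_REFLECTIONS);
console.log(ir.left.length);
```

In a browser the two arrays could be copied into the channels of an `AudioBuffer` and assigned to a `ConvolverNode`.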

VII. Mobile Camera & Mic Acquisition

Fallback order:

exact deviceId
→ ideal deviceId
→ facingMode: user
→ video: true
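The fallback order can be expressed as a list of `getUserMedia` constraint objects tried in sequence. This is a sketch under assumptions: the function names are hypothetical, and `getUserMedia` is passed in as a parameter so the logic can be exercised outside a browser.

```javascript
// Constraint objects in fallback order for a preferred camera.
function cameraConstraintChain(deviceId) {
  const chain = [];
  if (deviceId) {
    chain.push({ video: { deviceId: { exact: deviceId } } });
    chain.push({ video: { deviceId: { ideal: deviceId } } });
  }
  chain.push({ video: { facingMode: 'user' } });
  chain.push({ video: true });
  return chain;
}

// Try each constraint set until one succeeds; rethrow the last error otherwise.
async function openCamera(
  deviceId,
  getUserMedia = (c) => navigator.mediaDevices.getUserMedia(c)
) {
  let lastError;
  for (const constraints of cameraConstraintChain(deviceId)) {
    try {
      return await getUserMedia(constraints);
    } catch (err) {
      lastError = err; // fall through to the next, looser constraint
    }
  }
  throw lastError;
}
```

The `exact` form rejects when the device is unavailable, while `ideal` lets the browser pick the closest match, which is why the two are tried in that order.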

Unified device listener:

// Re-enumerate cameras and microphones whenever a device is plugged in or removed.
navigator.mediaDevices.addEventListener('devicechange', async () => {
  await this.refreshCameraInputs();
  await this.refreshMicInputs();
});

VIII. Valve Mapping

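Effective tube length: $L_\text{eff} = L_0 \prod_{k=1}^{4} 2^{v_k s_k/12}$

where $v_k \in \{0, 1\}$ indicates whether valve $k$ is pressed and $s_k$ is the number of semitones by which it lowers the pitch.

Tube frequency: $f_\text{tube} = \frac{c}{2L_\text{eff}}$

where $c$ is the speed of sound.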
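A minimal sketch of the valve mapping. The semitone values `[2, 1, 3, 5]` follow the common brass convention for valves 1 to 4 and are an assumption here, as are the function name, the open tube length in the usage line, and the speed of sound of 343 m/s.

```javascript
// Effective tube length and tube frequency for a valve combination.
// L0: open tube length in metres; valves: array of booleans (pressed or not);
// semitones: pitch drop per valve (ASSUMED default); c: speed of sound in m/s.
function tubeFrequency(L0, valves, semitones = [2, 1, 3, 5], c = 343) {
  let Leff = L0;
  for (let k = 0; k < valves.length; k++) {
    if (valves[k]) Leff *= Math.pow(2, semitones[k] / 12);
  }
  return { Leff, fTube: c / (2 * Leff) };
}

// Usage: valves 1, 2, and 3 pressed lower the pitch by six semitones,
// so the tube is longer by a factor of sqrt(2).
console.log(tubeFrequency(1.48, [true, true, true, false]));
```

Each pressed valve multiplies the length by $2^{s_k/12}$, so combinations add in semitones and multiply in length.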

IX. Web Instrument Paradigm

The innovation is not additive synthesis. The innovation is the control channel: Camera → Lip Geometry → Fractional Harmonics → Timbral Field

The web browser becomes:

A lip amplifier
A high-dimensional gestural interface
A programmable brass organology

X. Spectral Metrics

Centroid: $C = \frac{\sum f_k |X_k|^2}{\sum |X_k|^2}$

Harmonicity:

$H = \frac{\text{harmonic energy}}{\text{total energy}}$

Brassiness index: $B_s = 0.6\frac{C}{C_\text{ref}} + 0.4(1-H)$
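The three metrics can be sketched over a magnitude spectrum. The text does not define how harmonic energy is measured, so this sketch assumes a bin counts as harmonic when it lies within a tolerance of an integer multiple of $f_0$. The function name, the `tolHz` parameter, and the reference centroid value in the usage line are illustrative assumptions.

```javascript
// Spectral centroid, harmonicity, and brassiness index.
// mags: magnitude per FFT bin; binHz: bin spacing in Hz;
// f0: fundamental in Hz; Cref: reference centroid in Hz;
// tolHz: ASSUMED half-width of the harmonic window.
function spectralMetrics(mags, binHz, f0, Cref, tolHz = binHz / 2) {
  let weighted = 0;
  let total = 0;
  let harmonic = 0;
  for (let k = 0; k < mags.length; k++) {
    const energy = mags[k] * mags[k];
    const f = k * binHz;
    weighted += f * energy;
    total += energy;
    const h = Math.round(f / f0); // nearest harmonic number
    if (h >= 1 && Math.abs(f - h * f0) <= tolHz) harmonic += energy;
  }
  if (total === 0) return { C: 0, H: 0, Bs: 0 }; // silent frame
  const C = weighted / total;
  const H = harmonic / total;
  return { C, H, Bs: 0.6 * (C / Cref) + 0.4 * (1 - H) };
}

// Usage: energy at 100 Hz and 200 Hz (harmonic) plus 150 Hz (inharmonic).
const spectrum = new Array(32).fill(0);
spectrum[10] = 1;
spectrum[15] = 1;
spectrum[20] = 1;
console.log(spectralMetrics(spectrum, 10, 100, 1500));
```

In a browser the magnitudes would typically come from an `AnalyserNode`, converted from decibels to linear values first.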

XI. Roadmap

Full FaceMesh integration
Calibrated tube-length tables
Instrument switching (tuba / trumpet)
Karplus–Strong mode
Session export (JSON / extended MIDI)
