NOTES

WebBrass / Tuba Proto

A speculative paradigm for brass instruments on the web: lip grid, harmonic series + valves, virtual cone, and brassy circular spectrometry.

Prototype Core

This prototype translates the organology of brass instruments into a web interface structured around three axes:

  1. Lip + hand complex as interface
  2. Harmonic series + valves as pitch engine
  3. Brassness as a live timbral metric

The scene is designed with the lips centered and cropped, over a sensing grid (pressure, position, emission) and a visual cone along the Z-axis representing the virtual horn uncoiled.

I. Camera Space → Mask Space Mapping

The object-fit: cover Problem

MediaPipe FaceMesh returns landmarks in normalized coordinates $[0,1]^2$ relative to the native video stream frame.

The <video> element, however, is rendered using object-fit: cover, which scales the stream to fill the element and crops the overflowing edges symmetrically. This introduces a non-trivial mismatch between stream space and viewport space.

Let:

  • $(v_w, v_h)$ = native stream resolution in pixels
  • $(e_w, e_h)$ = video element size (= viewport size)

The cover scale factor is:

$$s_\text{cover} = \max\left(\frac{e_w}{v_w},\ \frac{e_h}{v_h}\right)$$

The cropped pixels per side are:

$$c_x = \frac{v_w - e_w / s_\text{cover}}{2}, \quad c_y = \frac{v_h - e_h / s_\text{cover}}{2}$$

Normalized crop fractions:

$$f_x = \frac{c_x}{v_w} = \frac{v_w - e_w/s_\text{cover}}{2v_w}, \quad f_y = \frac{c_y}{v_h}$$

The visible stream range in normalized coordinates is:

$$[f_x,\ 1-f_x] \times [f_y,\ 1-f_y]$$

with visible width and height:

$$w_\text{vis} = 1 - 2f_x, \quad h_\text{vis} = 1 - 2f_y$$
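The crop quantities above can be sketched as a small helper. The function name `coverCrop` and the returned field names are illustrative, not taken from the prototype.

```javascript
// Crop fractions for a stream of (vw x vh) px displayed with object-fit: cover
// in an element of (ew x eh) px. Names are hypothetical.
function coverCrop(vw, vh, ew, eh) {
  const sCover = Math.max(ew / vw, eh / vh); // cover scale factor
  const cx = (vw - ew / sCover) / 2;         // stream pixels cropped per side (x)
  const cy = (vh - eh / sCover) / 2;         // stream pixels cropped per side (y)
  const fx = cx / vw;                        // normalized crop fractions
  const fy = cy / vh;
  return { sCover, fx, fy, wVis: 1 - 2 * fx, hVis: 1 - 2 * fy };
}

// A 1280x720 landscape stream shown in a 720x1280 portrait element:
// fx is about 0.342 and fy about 0, so only the left and right edges are cropped.
console.log(coverCrop(1280, 720, 720, 1280));
```

Along the axis that determines the scale factor, the crop fraction is zero, so at most one axis is cropped at a time.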

Full Transformation Pipeline

Given a MediaPipe landmark $(m_x, m_y) \in [0,1]^2$, the final canvas pixel position $(p_x, p_y)$ is computed as follows:

Step 0 — Center Calibration

$$c_x' = m_x - \delta_x, \quad c_y' = m_y - \delta_y$$

where $(\delta_x, \delta_y)$ is the calibration offset (Section V).

Step 1 — Cover Crop Correction

$$e_x = \frac{c_x' - f_x}{w_\text{vis}}, \quad e_y = \frac{c_y' - f_y}{h_\text{vis}}$$

This maps the visible stream region linearly to [0,1].

Step 2 — Horizontal Mirror (scaleX(-1))

$$\mu_x = 1 - e_x$$

Step 3 — Zoom Around Center

$$p_x = \left(0.5 + (\mu_x - 0.5)\cdot z\right)\cdot W$$

$$p_y = \left(0.5 + (e_y - 0.5)\cdot z\right)\cdot H$$

where:

  • $z$ = CSS zoom factor
  • $(W, H) = (e_w, e_h)$ = canvas size in pixels
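Steps 0 to 3 can be chained in one function. This is a minimal sketch: the name `landmarkToCanvas` and the object shapes (`crop` with `fx`, `fy`, `wVis`, `hVis`; `delta` with `x`, `y`) are assumptions made for illustration.

```javascript
// Maps one normalized landmark m = { x, y } to canvas pixels.
// crop:  { fx, fy, wVis, hVis } from the cover-crop derivation
// delta: { x, y } calibration offset; z: zoom factor; W, H: canvas size in px
function landmarkToCanvas(m, crop, delta, z, W, H) {
  // Step 0: center calibration
  const cxp = m.x - delta.x;
  const cyp = m.y - delta.y;
  // Step 1: cover crop correction (visible region -> [0, 1])
  const ex = (cxp - crop.fx) / crop.wVis;
  const ey = (cyp - crop.fy) / crop.hVis;
  // Step 2: horizontal mirror, matching scaleX(-1) on the video
  const mux = 1 - ex;
  // Step 3: zoom around the center, then scale to pixels
  return {
    px: (0.5 + (mux - 0.5) * z) * W,
    py: (0.5 + (ey - 0.5) * z) * H,
  };
}

// With no crop, no offset and z = 1, a point at x = 0.25 lands mirrored at 75% of the width.
const uncropped = { fx: 0, fy: 0, wVis: 1, hVis: 1 };
console.log(landmarkToCanvas({ x: 0.25, y: 0.5 }, uncropped, { x: 0, y: 0 }, 1, 800, 600));
// { px: 600, py: 300 }
```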

Why Work in Normalized Space?

All operations are performed entirely in normalized coordinates. Previous incorrect formulations mixed pixel-space and normalized-space corrections, effectively applying the crop compensation twice. The correct approach applies scaling exactly once as a fraction.

II. Lip Model — 1-DOF Oscillator

Physical Basis

The lips in brass playing can be modeled as a damped mass-spring oscillator:

$$\ddot{y} + \frac{\omega_l}{Q_l}\dot{y} + \omega_l^2(y - y_0) = \frac{F_\text{eff}}{m_l}$$

where:

  • $\omega_l = 2\pi f_l$ = angular lip resonance frequency
  • $Q_l$ = quality factor
  • $y_0$ = resting aperture
  • $m_l$ = effective lip mass
  • $F_\text{eff}$ = net force (air pressure − muscular tension)
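One way to advance this oscillator numerically is a semi-implicit Euler step, sketched below. The integration scheme, the function name `stepLip`, and the parameter values in the example are assumptions; the notes do not say how (or whether) the prototype integrates the equation.

```javascript
// One semi-implicit Euler step of the 1-DOF lip oscillator.
// state:  { y, v } aperture and its velocity
// params: { fl, Ql, y0, ml } lip frequency (Hz), quality factor, rest aperture, mass
// Feff: net force; dt: time step in seconds (needs 2*pi*fl*dt well below 2 for stability)
function stepLip(state, params, Feff, dt) {
  const wl = 2 * Math.PI * params.fl;
  const acc = Feff / params.ml
    - (wl / params.Ql) * state.v
    - wl * wl * (state.y - params.y0);
  const v = state.v + acc * dt; // update velocity first
  const y = state.y + v * dt;   // then position, using the new velocity
  return { y, v };
}

// Example with made-up values: a constant force settles at y0 + Feff / (ml * wl^2).
let lip = { y: 0.001, v: 0 };
const lipParams = { fl: 100, Ql: 3, y0: 0.001, ml: 1e-4 };
for (let i = 0; i < 48000; i++) lip = stepLip(lip, lipParams, 0.01, 1 / 48000);
console.log(lip.y);
```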

Embouchure as Continuous Parameter

Lip geometry yields:

$$\xi = \frac{d_{13,14}}{d_{61,291}}, \quad \xi \in [0,1]$$

where $d_{i,j}$ denotes the distance between FaceMesh landmarks $i$ and $j$.

  • Small $\xi$ → tight lips → high register
  • Large $\xi$ → open lips → low/pedal register

Lip frequency follows:

$$f_l = f_{l,0}\sqrt{T_l/T_{l,0}}$$

Tension $T_l$ is approximated inversely to $\xi$.
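A minimal sketch of the aperture ratio, assuming the subscripts are FaceMesh landmark indices (13/14 for the inner lip centers, 61/291 for the mouth corners) and that landmarks arrive as an array of `{ x, y }` objects. The function name `lipAperture` is hypothetical.

```javascript
// Aperture ratio xi = d(13,14) / d(61,291), clamped to [0, 1].
// landmarks: array indexed by FaceMesh landmark id, each entry { x, y }.
function lipAperture(landmarks) {
  const dist = (a, b) => Math.hypot(a.x - b.x, a.y - b.y);
  const width = dist(landmarks[61], landmarks[291]); // mouth width
  if (width === 0) return 0;                         // degenerate input: avoid dividing by zero
  const xi = dist(landmarks[13], landmarks[14]) / width;
  return Math.min(1, Math.max(0, xi));
}
```

Distances taken directly in normalized coordinates are distorted by the stream's aspect ratio, so scaling `x` by $v_w$ and `y` by $v_h$ before calling the function may be preferable.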

Fractional Harmonic Interpolation

Harmonic series:

$$f_n = n f_\text{tube}$$

The user trains pairs $(\xi_k, n_k)$, each linking an aperture ratio to a partial number.

Interpolation:

  1. Sort by $\xi_k$
  2. Find the interval $[\xi_k, \xi_{k+1}]$ containing $\xi$
  3. Linear parameter: $t = \frac{\xi - \xi_k}{\xi_{k+1} - \xi_k}$
  4. Smooth S-curve: $\tilde{t} = t^2(3 - 2t)$
  5. Fractional partial: $n_\xi = n_k + \tilde{t}(n_{k+1} - n_k)$

First-order smoothing:

$$\dot{n} = \frac{n_\xi - n}{\tau}, \quad \tau = 80\text{ ms}$$

Resulting synthesis frequency:

$$f_0 = f_\text{tube} \cdot n$$

This enables continuous glissandi.
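The interpolation and smoothing steps can be sketched as follows. Clamping to the end points when $\xi$ falls outside the trained range, and the exponential form of the discrete smoothing, are assumptions; the function names are hypothetical.

```javascript
// pairs: non-empty array of trained { xi, n } pairs. Returns the fractional partial.
function fractionalPartial(pairs, xi) {
  if (pairs.length === 0) throw new Error('no trained pairs');
  const sorted = [...pairs].sort((a, b) => a.xi - b.xi);   // 1. sort by xi
  const last = sorted[sorted.length - 1];
  if (xi <= sorted[0].xi) return sorted[0].n;              // assumption: clamp outside range
  if (xi >= last.xi) return last.n;
  let k = 0;
  while (xi > sorted[k + 1].xi) k++;                       // 2. find the interval
  const t = (xi - sorted[k].xi) / (sorted[k + 1].xi - sorted[k].xi); // 3. linear parameter
  const ts = t * t * (3 - 2 * t);                          // 4. smooth S-curve
  return sorted[k].n + ts * (sorted[k + 1].n - sorted[k].n); // 5. fractional partial
}

// Exact discrete solution of dn/dt = (target - n) / tau for a constant target.
// dt and tau are in seconds; tau defaults to 80 ms.
function smoothPartial(n, target, dt, tau = 0.08) {
  return n + (target - n) * (1 - Math.exp(-dt / tau));
}

// Example pairs (made up): tighter lips map to higher partials.
const trained = [{ xi: 0.05, n: 8 }, { xi: 0.2, n: 4 }, { xi: 0.5, n: 2 }];
console.log(fractionalPartial(trained, 0.125)); // halfway between partials 8 and 4, about 6
```

Calling `smoothPartial` once per animation frame with the measured frame time keeps the 80 ms response independent of the frame rate.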

III. Additive Synthesis Engine

Audio Graph

$$\text{oscillators} \rightarrow \text{HP} \rightarrow \text{peak EQ} \rightarrow \text{LP} \rightarrow \text{master} \rightarrow \text{analyser} \rightarrow \begin{cases} \text{dry}\\ \text{wet} \end{cases}$$

Register-Dependent Brassiness

Normalized partial: $p_n = \text{clamp}(n/8,\ 0.125,\ 1.5)$

Brassiness factor: $B_p = \text{clamp}\left(0.9 e^{-2.5(p_n-0.125)} + 0.2 p_n^{1.8} + 0.15\right)$

Octave Darkening

$$D_\text{oct} = \text{clamp}(1.5 - 0.8 p_n,\ 0.5,\ 1.5)$$

Lowpass Cutoff

$$f_\text{cut} = 280 + P_m \cdot 3200 D_\text{oct} + B_s \cdot 800 B_p + [\text{boost}] \cdot 1400$$
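The register terms and the cutoff can be combined as in the sketch below. Two points are assumptions: the clamp range of $B_p$ is not stated in these notes, so it is passed in explicitly as `bpBounds`, and $[\text{boost}]$ is read as a 0/1 switch. Function names are hypothetical.

```javascript
const clamp = (x, lo, hi) => Math.min(hi, Math.max(lo, x));

// n: current (possibly fractional) partial number.
// bpBounds: { lo, hi } clamp range for B_p (unspecified in the notes, so supplied by the caller).
function registerTerms(n, bpBounds) {
  const pn = clamp(n / 8, 0.125, 1.5);
  const raw = 0.9 * Math.exp(-2.5 * (pn - 0.125)) + 0.2 * Math.pow(pn, 1.8) + 0.15;
  const Bp = clamp(raw, bpBounds.lo, bpBounds.hi);
  const Doct = clamp(1.5 - 0.8 * pn, 0.5, 1.5);
  return { pn, Bp, Doct };
}

// Pm and Bs are the control values from the cutoff formula; boost is a boolean.
function lowpassCutoff(n, Pm, Bs, boost, bpBounds) {
  const { Bp, Doct } = registerTerms(n, bpBounds);
  return 280 + Pm * 3200 * Doct + Bs * 800 * Bp + (boost ? 1400 : 0);
}

// At n = 8 (pn = 1, Doct = 0.7) with Pm = 1 and Bs = 0 the cutoff is 280 + 3200 * 0.7 Hz.
console.log(lowpassCutoff(8, 1, 0, false, { lo: 0, hi: 1 }));
```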

Spectral Roll-Off

$$\alpha_h = 0.3 + 0.5P_m - 0.25B_p$$

$$A_h \propto h^{-\alpha_h}$$

Odd Harmonic Emphasis

$$A_h^\text{adj} = \begin{cases} A_h (1 + 0.6 B_p) & h \text{ odd} \\ A_h (1 - 0.2 B_p) & h \text{ even} \end{cases}$$
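The roll-off and the odd/even adjustment together give one relative amplitude per oscillator. A minimal sketch, assuming the proportionality constant is 1 and that $P_m$ and $B_p$ are already computed; the function name is hypothetical.

```javascript
// Relative amplitudes for harmonics h = 1..count.
function harmonicAmplitudes(count, Pm, Bp) {
  const alpha = 0.3 + 0.5 * Pm - 0.25 * Bp;       // spectral roll-off exponent
  const amps = [];
  for (let h = 1; h <= count; h++) {
    const base = Math.pow(h, -alpha);             // A_h proportional to h^(-alpha)
    const adj = h % 2 === 1 ? 1 + 0.6 * Bp : 1 - 0.2 * Bp; // odd boost, even cut
    amps.push(base * adj);
  }
  return amps;
}

console.log(harmonicAmplitudes(4, 0.5, 1));
```

The result is not normalized; dividing by the sum of the amplitudes before assigning oscillator gains keeps the overall level steady as $B_p$ changes.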

IV. Cubic-Root Companding for Mic Gate

Motivation

Microphone RMS values typically lie in:

$$[0.001,\ 0.06]$$

A linear mapping would confine them to a small fraction of the usable control range.

Companding

$$\rho_\text{comp} = \text{RMS}^{1/3}$$

Intensity normalization: $\rho_\text{int} = \text{clamp}(1.8\rho_\text{comp} - 0.04,\ 0,\ 1)$

Raw weight: $w_\text{raw} = \text{clamp}(0.78\rho_\text{int} + 0.22\chi,\ 0,\ 1)$

Adaptive Noise Floor

$$\eta_{k+1} = \eta_k + 0.025(\min(0.25, w_\text{raw}) - \eta_k)$$

Final Gate

$$d_0 = \text{clamp}(w_\text{raw} - \eta - 0.005,\ 0,\ 1)$$

$$d = d_0^{0.65}$$
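The chain from RMS to the gate drive $d$ can be sketched as one per-frame function. The symbol $\chi$ is not defined in these notes, so it is simply passed through as an argument; using the noise floor from before the update when computing $d_0$ is also an assumption. The function name is hypothetical.

```javascript
const clamp01 = (x) => Math.min(1, Math.max(0, x));

// rms: microphone RMS for this frame
// chi: secondary term of the raw weight (left undefined in the notes)
// eta: noise floor carried over from the previous frame
function gateDrive(rms, chi, eta) {
  const rhoComp = Math.cbrt(rms);                    // cubic-root companding
  const rhoInt = clamp01(1.8 * rhoComp - 0.04);      // intensity normalization
  const wRaw = clamp01(0.78 * rhoInt + 0.22 * chi);  // raw weight
  const etaNext = eta + 0.025 * (Math.min(0.25, wRaw) - eta); // adaptive floor
  const d0 = clamp01(wRaw - eta - 0.005);            // assumption: uses the previous floor
  return { wRaw, etaNext, d: Math.pow(d0, 0.65) };
}

// The typical RMS range 0.001..0.06 maps to rhoInt of roughly 0.14..0.66.
console.log(gateDrive(0.001, 0, 0), gateDrive(0.06, 0, 0));
```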

User threshold $T$: $a = \frac{d - T}{1 - T}$

Exponent: $\varepsilon = 0.45 + 0.55T$

Open gain: $g_\text{open} = a^\varepsilon$

Soft pre-trigger: $p = \frac{d - 0.6T}{0.4T}$, $g_\text{below} = 0.18\,p^{1.4}$
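The two gain curves can be sketched as separate functions. How the prototype switches or blends between them is not stated here, so the sketch deliberately does not combine them. Clamping $a$ and $p$ to $[0,1]$ is an assumption that avoids raising a negative base to a fractional power; $0 < T < 1$ is assumed.

```javascript
const clamp01 = (x) => Math.min(1, Math.max(0, x));

// Gain above the user threshold T.
function gateOpen(d, T) {
  const a = clamp01((d - T) / (1 - T));
  return Math.pow(a, 0.45 + 0.55 * T);
}

// Soft pre-trigger gain in the band 0.6*T .. T.
function gateBelow(d, T) {
  const p = clamp01((d - 0.6 * T) / (0.4 * T));
  return 0.18 * Math.pow(p, 1.4);
}

console.log(gateOpen(0.75, 0.5), gateBelow(0.4, 0.5));
```

Note that at $d = T$ the pre-trigger curve reaches 0.18 while the open curve starts at 0, so any combination of the two has to account for that step.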

V. Mouth Center Calibration

Calibration key [C]: $\delta_x = \bar{x}_\text{mouth} - 0.5$, $\delta_y = \bar{y}_\text{mouth} - 0.5$

CSS shift: $T_x = \delta_x z W$, $T_y = -\delta_y z H$
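A minimal sketch of the calibration step. Which landmarks are averaged into $(\bar{x}_\text{mouth}, \bar{y}_\text{mouth})$ is not specified in these notes, so the function takes any non-empty list of normalized mouth points; the name `calibrateMouth` is hypothetical.

```javascript
// mouthPoints: normalized { x, y } mouth landmarks captured when [C] is pressed.
// z: zoom factor; W, H: viewport size in px.
function calibrateMouth(mouthPoints, z, W, H) {
  const mean = (key) =>
    mouthPoints.reduce((sum, pt) => sum + pt[key], 0) / mouthPoints.length;
  const dx = mean('x') - 0.5;   // delta_x
  const dy = mean('y') - 0.5;   // delta_y
  const Tx = dx * z * W;        // CSS shift in px
  const Ty = -dy * z * H;
  return { dx, dy, Tx, Ty, cssTransform: `translate(${Tx}px, ${Ty}px)` };
}

console.log(calibrateMouth([{ x: 0.6, y: 0.4 }, { x: 0.7, y: 0.5 }], 1, 800, 600));
```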

VI. Synthetic Impulse Response

$$\text{IR}(t) = (2u-1)e^{-6t/\tau_r} + \sum_k a_k\delta(t - t_k)$$

Early reflections:

k    t_k (ms)    a_k (L / R)
1    18          0.70 / 0.60
2    32          0.43 / 0.50
3    55          0.35 / 0.35
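The impulse response can be rendered into two sample buffers as sketched below. Reading $u$ as uniform noise in $[0,1)$ drawn independently per sample and per channel is an assumption, as are the function name and the parameters `sampleRate`, `seconds`, `tauR` and `rng`.

```javascript
// Builds a stereo impulse response: decaying noise plus three early reflections.
// tauR and seconds are in seconds; rng returns uniform values in [0, 1).
function buildImpulseResponse(sampleRate, seconds, tauR, rng = Math.random) {
  const len = Math.floor(sampleRate * seconds);
  const left = new Float32Array(len);
  const right = new Float32Array(len);
  for (let i = 0; i < len; i++) {
    const env = Math.exp((-6 * (i / sampleRate)) / tauR); // e^(-6 t / tau_r)
    left[i] = (2 * rng() - 1) * env;
    right[i] = (2 * rng() - 1) * env;
  }
  const reflections = [
    { ms: 18, l: 0.70, r: 0.60 },
    { ms: 32, l: 0.43, r: 0.50 },
    { ms: 55, l: 0.35, r: 0.35 },
  ];
  for (const { ms, l, r } of reflections) {
    const i = Math.round((ms / 1000) * sampleRate); // discrete delta at t_k
    if (i < len) {
      left[i] += l;
      right[i] += r;
    }
  }
  return { left, right };
}
```

In a browser, the two arrays could be copied into an `AudioBuffer` with `copyToChannel` and assigned to a `ConvolverNode` for the wet path.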

VII. Mobile Camera & Mic Acquisition

Fallback order:

exact deviceId
→ ideal deviceId
→ facingMode: user
→ video: true
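The fallback order translates into a list of `getUserMedia` constraint objects tried in sequence. In this sketch the function names are hypothetical, and `getUserMedia` is injected as an argument so the loop can be exercised outside a browser.

```javascript
// Constraint sets in fallback order. deviceId may be undefined on first run.
function cameraConstraintFallbacks(deviceId) {
  const list = [];
  if (deviceId) {
    list.push({ video: { deviceId: { exact: deviceId } } });
    list.push({ video: { deviceId: { ideal: deviceId } } });
  }
  list.push({ video: { facingMode: 'user' } });
  list.push({ video: true });
  return list;
}

// Tries each constraint set until one resolves with a stream.
// In a page, pass (c) => navigator.mediaDevices.getUserMedia(c).
async function acquireCamera(getUserMedia, deviceId) {
  let lastError;
  for (const constraints of cameraConstraintFallbacks(deviceId)) {
    try {
      return await getUserMedia(constraints);
    } catch (err) {
      lastError = err; // e.g. the stored device is gone; try the next, looser entry
    }
  }
  throw lastError;
}
```

The same pattern would apply to the microphone with `audio` constraints in place of `video`.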

Unified device listener:

navigator.mediaDevices.addEventListener('devicechange', async () => {
  await this.refreshCameraInputs();
  await this.refreshMicInputs();
});

VIII. Valve Mapping

Effective tube length: $L_\text{eff} = L_0 \prod_{k=1}^{4} 2^{v_k s_k/12}$

Tube frequency: $f_\text{tube} = \frac{c}{2L_\text{eff}}$
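A minimal sketch of the valve mapping, reading $v_k \in \{0, 1\}$ as the valve state and $s_k$ as the semitone drop of valve $k$. The default drops of 2, 1, 3 and 5 semitones, the speed of sound of 343 m/s, and the open length in the example are assumptions, not values from these notes.

```javascript
// L0: open tube length in meters; valves: array of 0/1 states v_k.
// semitones: s_k per valve (assumed defaults); c: speed of sound in m/s (assumed).
function tubeFrequency(L0, valves, semitones = [2, 1, 3, 5], c = 343) {
  let Leff = L0;
  valves.forEach((v, k) => {
    Leff *= Math.pow(2, (v * semitones[k]) / 12); // each pressed valve lengthens the tube
  });
  return c / (2 * Leff);
}

// With an illustrative 5.9 m open tube: open horn vs. valves 1 and 3 pressed.
console.log(tubeFrequency(5.9, [0, 0, 0, 0]), tubeFrequency(5.9, [1, 0, 1, 0]));
```

Pressing valves 1 and 3 lowers the tube frequency by five equal-tempered semitones, a factor of $2^{-5/12}$.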

IX. Web Instrument Paradigm

The innovation is not additive synthesis. The innovation is the control channel: Camera → Lip Geometry → Fractional Harmonics → Timbral Field

The web browser becomes:

  • A lip amplifier
  • A high-dimensional gestural interface
  • A programmable brass organology

X. Spectral Metrics

Centroid: $C = \frac{\sum f_k |X_k|^2}{\sum |X_k|^2}$

Harmonicity:

$$H = \frac{\text{harmonic energy}}{\text{total energy}}$$

Brassiness index: $B_s = 0.6\frac{C}{C_\text{ref}} + 0.4(1-H)$
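The three metrics can be computed from one magnitude spectrum, as sketched below. The notes do not define how harmonic energy is measured; counting bins within `tolBins` of a multiple of the current fundamental `f0` is an assumption, as are the function and parameter names.

```javascript
// mags:   linear magnitudes |X_k| for bins k = 0..N-1
// binHz:  bin spacing in Hz (sampleRate / fftSize)
// f0:     current fundamental in Hz; cRef: reference centroid in Hz
// tolBins: half-width, in bins, of the window around each harmonic (assumed)
function spectralMetrics(mags, binHz, f0, cRef, tolBins = 1) {
  let total = 0;
  let weighted = 0;
  let harmonic = 0;
  for (let k = 0; k < mags.length; k++) {
    const f = k * binHz;
    const e = mags[k] * mags[k];                  // bin energy |X_k|^2
    total += e;
    weighted += f * e;
    const nearest = Math.round(f / f0) * f0;      // closest multiple of f0
    if (nearest > 0 && Math.abs(f - nearest) <= tolBins * binHz) harmonic += e;
  }
  if (total === 0) return { C: 0, H: 0, Bs: 0 };  // silence
  const C = weighted / total;                     // centroid
  const H = harmonic / total;                     // harmonicity
  return { C, H, Bs: 0.6 * (C / cRef) + 0.4 * (1 - H) };
}
```

If the spectrum comes from an `AnalyserNode` via `getFloatFrequencyData`, the values are in dB and would need converting with `10 ** (dB / 20)` before being passed in.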

XI. Roadmap

  1. Full FaceMesh integration
  2. Calibrated tube-length tables
  3. Instrument switching (tuba / trumpet)
  4. Karplus–Strong mode
  5. Session export (JSON / extended MIDI)