YouPlay
Working memory
YouPlay is being built as a performance instrument in which prompt, interface, audio routing, and symbolic emergence are the same object. It is not only a player or a search tool. It is a conductor surface where language triggers a slow transformation of environments, artifacts, mic resonance, and visual matter.
The current prototype combines:
- prompt conduction in a full-screen performative text surface
- emergent floating words as clickable symbolic agents
- rotating YouTube layers for continuous external environments
- Audio-Alpaca layers for WebAudio-processable semantic audio
- Freesound preview layers for additional field textures
- microphone input as a resonant live body inside the matrix
- a 13x13 matrix HUD where source and processor states become moving cursors
- a tunnel / bloom-field visual engine built in Three.js plus custom shader passes
The central idea is that the prompt is not metadata. It is an event score.
Semantic engine and custom Ollama API
The intelligence layer is mediated by the custom correction service configured through:
- CORRECTION_API_URL
- CORRECTION_API_TOKEN
- CORRECTION_API_TIMEOUT_MS
/api/youplay/prompt-shape acts as the bridge between the interface and the remote Ollama service. The service receives text and returns a structured response that is reused as a shaping device for:
- prompt clarification
- phrase decomposition
- emergent term extraction
- semantic steering of searches
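As a sketch of how a shaped response might be consumed client-side, assuming a minimal response contract (the field names `refinedPrompt`, `phrases`, and `emergentTerms` are illustrative assumptions, not the actual contract of /api/youplay/prompt-shape):

```typescript
// Hypothetical response shape for /api/youplay/prompt-shape.
// Field names are assumptions for illustration, not the real contract.
interface ShapeResponse {
  refinedPrompt: string;
  phrases: string[];
  emergentTerms: string[];
}

// Normalize a raw shaping response: trim, drop empties, dedupe,
// and cap list sizes so the interface is never flooded with terms.
function normalizeShapeResponse(raw: ShapeResponse, maxItems = 8): ShapeResponse {
  const clean = (items: string[]) =>
    [...new Set(items.map((s) => s.trim()).filter((s) => s.length > 0))].slice(0, maxItems);
  return {
    refinedPrompt: raw.refinedPrompt.trim(),
    phrases: clean(raw.phrases),
    emergentTerms: clean(raw.emergentTerms),
  };
}
```

The cap keeps the emergent field sparse enough to stay readable as a performance surface rather than a tag dump.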
This is important conceptually: the AI is not being used as a final authoring agent, but as a semantic destabilizer and refiner. It listens to the current textual gesture and re-injects it as a more musically productive set of phrases and associated words.
In practical terms the AI layer currently does the following:
- Takes the raw prompt or clicked word.
- Sends it to the local server-side shaping endpoint.
- Uses the Ollama-backed response to propose a refined prompt.
- Extracts short phrases that can be routed to YouTube, Alpaca, or Freesound.
- Produces emergent terms that reappear in the visual field as possible next gestures.
This creates a loop where language is never static. Prompt becomes prompt-memory, then becomes search phrase, then becomes audio layer, then becomes visible word again.
Emergence behavior
Emergence in YouPlay is not random decoration. It is a staged ecology of terms.
The term pool is assembled from:
- the latest committed prompt
- decomposed phrases from that prompt
- recently active prompts
- terms extracted from source titles and metadata
- fallback vocabulary used as a low-level attractor field
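One way to sketch this weighting, assuming one weight per source family (the weight values, source names, and function shape here are illustrative, not the implemented ones):

```typescript
// Illustrative per-source weights; real values are an assumption.
type TermSource = "prompt" | "phrase" | "recent" | "metadata" | "fallback";

const SOURCE_WEIGHT: Record<TermSource, number> = {
  prompt: 1.0,   // latest committed prompt dominates
  phrase: 0.8,   // decomposed phrases from that prompt
  recent: 0.6,   // recently active prompts
  metadata: 0.4, // source titles and metadata
  fallback: 0.1, // low-level attractor vocabulary
};

// Merge terms from all sources, keeping the highest weight per term,
// sorted so the strongest attractors surface first.
function buildTermPool(entries: Array<[string, TermSource]>): Array<[string, number]> {
  const pool = new Map<string, number>();
  for (const [term, source] of entries) {
    pool.set(term, Math.max(pool.get(term) ?? 0, SOURCE_WEIGHT[source]));
  }
  return [...pool.entries()].sort((a, b) => b[1] - a[1]);
}
```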
These words are weighted, visualized, and recycled. They move against the main object flow, occupy border-heavy positions, and can be clicked to bias the next prompt transformation. This means the cloud is not a tag list. It is a symbolic counter-current.
The emergence logic has three simultaneous roles:
- Memory: it remembers what the system has already heard or searched.
- Suggestion: it proposes new micro-deviations for the performer.
- Circulation: it keeps language circulating between sound, object, and score.
The result is a proto-ecology of conducting words.
Semantic performance in a musical context
YouPlay treats musical performance as semantic performance. Instead of separating score, instrument, and interpretation, it lets the performer act through linguistic tension:
- naming
- recombining
- selecting
- interrupting
- re-voicing
Every entered word has consequences in at least four domains:
- It changes search behavior.
- It changes visual symbolic matter.
- It changes layer occupation across buses.
- It changes the future prompt field by feeding the emergent vocabulary.
This makes the act of typing, clicking, dictating, or speaking into the mic a musically operative gesture. A word is no longer a label for music. It is one of the forces that composes it.
Hybrid source model
YouTube
YouTube is used for persistent environmental audio layers through the iframe API. It provides continuity, cultural residue, found ambience, and unstable semantic contamination. Because it is iframe-based, it cannot be inserted directly into the WebAudio processor matrix, so its control is limited to layer-level volume and pseudo-pan behavior.
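Since the IFrame Player API exposes `player.setVolume` on a 0-100 integer scale, layer-level control reduces to mapping the matrix's normalized gain onto that range (the helper name is an assumption):

```typescript
// Map a normalized bus gain (0..1) onto the YouTube IFrame API's
// 0..100 integer volume scale, clamping out-of-range values.
function toYouTubeVolume(gain: number): number {
  return Math.round(Math.min(1, Math.max(0, gain)) * 100);
}
```

Usage would be roughly `player.setVolume(toYouTubeVolume(busGain))`; finer DSP control is not reachable through the iframe boundary.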
Audio-Alpaca
Audio-Alpaca is the most direct bridge between language and processable sound. Prompt-related search results are routed into WebAudio, where they can feed the internal buses and processors. This is the semantic audio core of the system.
Freesound
Freesound previews provide a third source family based on searchable environmental and material recordings. The current route uses preview OGG/MP3 assets rather than original uploads, in order to stay lightweight and immediately streamable inside the browser.
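Freesound's public API reports preview renditions under keys such as `preview-hq-ogg` and `preview-lq-mp3`; a sketch of picking the first available one (the preference order is an assumption of this sketch):

```typescript
// Preference order for Freesound preview renditions; an assumption
// for this sketch, favoring higher-quality OGG first.
const PREVIEW_PREFERENCE = [
  "preview-hq-ogg",
  "preview-hq-mp3",
  "preview-lq-ogg",
  "preview-lq-mp3",
];

// Return the first preview URL present in a Freesound-style
// `previews` object, or null if none are available.
function pickPreviewUrl(previews: Record<string, string>): string | null {
  for (const key of PREVIEW_PREFERENCE) {
    if (previews[key]) return previews[key];
  }
  return null;
}
```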
Mic input
The microphone is the embodied bus. It transforms the performer into one more active layer, not an external controller. When active, the mic is routed into the matrix and becomes subject to moving processor states, especially delay and future effect-node mappings.
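A minimal sketch of that routing, assuming a GainNode per matrix bus (`getUserMedia` and `createMediaStreamSource` are standard Web Audio calls; the bus naming is an assumption, and this is a definition only, requiring a browser AudioContext to actually run):

```typescript
// Route the live microphone into the matrix as one more source bus.
// Once connected, the mic is subject to whatever processor states
// currently move across that bus (e.g. delay).
async function routeMicIntoMatrix(
  ctx: AudioContext,
  micBus: GainNode,
): Promise<MediaStreamAudioSourceNode> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const source = ctx.createMediaStreamSource(stream);
  source.connect(micBus); // from here on the mic is just another layer
  return source;
}
```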
Matrix logic
The matrix is currently conceived as a living interconnection diagram rather than a static mixer.
It contains:
- source buses
- effect buses
- cross-routes between effects
- floating XY cursor labels that represent bus state through position
The main conceptual mapping is:
- X as a horizontal parameter domain, often pan or an effect's primary value
- Y as a vertical parameter domain, often gain or feedback
For delay, the intended mapping is explicit:
- X -> delay time from 0 to 4000 ms
- Y -> feedback from 0 to 1
The De (delay) label should therefore not be read as a number display. It is a spatial cursor that materializes processor state.
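The stated mapping can be written as a pure function (clamping the cursor to the unit square is an assumption about how positions are normalized):

```typescript
// Map the delay cursor's normalized XY position onto delay parameters:
// X -> delay time 0..4000 ms, Y -> feedback 0..1.
function xyToDelay(x: number, y: number): { timeMs: number; feedback: number } {
  const clamp01 = (v: number) => Math.min(1, Math.max(0, v));
  return { timeMs: clamp01(x) * 4000, feedback: clamp01(y) };
}
```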
The matrix should always feel like a slow choreography of latent relations rather than a deterministic mixer.
Prompt as object, interface, and score
The prompt surface is deliberately oversized and central because it is a performance body.
It behaves simultaneously as:
- interface
- symbolic instrument
- score fragment
- output surface
The cursor is not only utilitarian. It is a temporal indicator, almost a conducting baton, marking the potential site of inscription.
The full-screen prompt is therefore not a text box in the usual UI sense. It is a stage where inscription happens before audio replacement. The system tries to preserve continuity by fading new searches onto other buses rather than replacing the whole scene at once.
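That fade logic can be sketched as an equal-power crossfade curve (illustrative only; the actual ramp shape used in YouPlay is an assumption):

```typescript
// Equal-power crossfade gains for progress t in [0, 1]:
// outgoing and incoming layers sum to roughly constant power,
// so the scene never drops out during a search replacement.
function crossfadeGains(t: number): { outGain: number; inGain: number } {
  const p = Math.min(1, Math.max(0, t));
  return {
    outGain: Math.cos((p * Math.PI) / 2), // fading bus
    inGain: Math.sin((p * Math.PI) / 2),  // incoming bus
  };
}
```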
Object + interface + resonant body + symbolic artifact
One of the underlying goals of YouPlay is to melt together four categories usually kept apart:
- Object: visual shapes, props, devices, rhombuses, tunnel forms.
- Interface: prompt field, HUD, matrix cursors, service monitor.
- Resonant body: the mic input and the human performer.
- Symbolic artifact: words, labels, prompts, semantic fragments.
In the piece these are not four layers of representation. They are four modes of the same artifact.
The mic is both body and source. The floating word is both suggestion and symbolic object. The prompt is both interface and score. The processor cursor is both data display and compositional agent.
This is the main aesthetic memory to preserve while continuing development.
Current behaviors already established
- prompt shaping through the custom Ollama-backed API
- phrase decomposition into multiple search routes
- slow fade-in / fade-out prompt replacement logic
- non-sudden round-robin loading across buses
- clickable floating words in WebGL
- mic on by default when the engine starts, where browser permission allows
- STT scaffold upgraded to browser speech-recognition control
- animated matrix nodes for source and processor buses
- active delay mapping as a first real processor movement
- procedural bloom field and softened godray post-processing
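The non-sudden round-robin mentioned above can be sketched as a simple cyclic allocator (the bus count and state shape are assumptions):

```typescript
// Cyclic bus allocator: each new search lands on the next bus
// instead of replacing the current one, keeping the scene continuous.
function makeRoundRobin(busCount: number): () => number {
  let next = 0;
  return () => {
    const bus = next;
    next = (next + 1) % busCount;
    return bus;
  };
}
```

Combined with the fade-in/fade-out logic, this is what lets a new prompt enter the field without erasing the old one abruptly.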
Constraints and current caveats
- YouTube buses still have iframe limitations and do not enter the WebAudio matrix as native sources.
- Browser STT depends on SpeechRecognition / webkitSpeechRecognition support.
- Some matrix processors are still visually staged before being fully mapped to DSP parameters.
- The note should be updated as the bus topology expands, especially once more YouTube buses and richer effect-node mappings are added.
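The SpeechRecognition dependency can be feature-detected defensively (this sketch only checks availability; the prefixed fallback covers WebKit browsers):

```typescript
// Feature-detect browser STT: standard constructor first, then the
// webkit-prefixed one; reports false outside the browser entirely.
function sttAvailable(): boolean {
  const g = globalThis as any;
  return typeof (g.SpeechRecognition ?? g.webkitSpeechRecognition) === "function";
}
```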
Direction
The direction is not toward a polished fixed UI. It is toward an unstable semantic instrument where:
- AI prompt shaping
- found audio
- embodied input
- visual objects
- symbolic residue
all participate in one musical surface.
The strongest criterion for future changes should be this:
Does the feature make YouPlay feel more like a living conductor-instrument, where language, resonance, and symbolic matter continuously fold into one another?