If I may:
Formant shaping, i.e. overtone shaping, is as old as synthesis itself. In what we tend to consider classical synthesis, which is actually subtractive synthesis, the only real compromises made to emulate the influence of formants on a sound are:
1 - adding one resonant peak per available filter
2 - adding key follow so that high sounds come out bright while low sounds come out dull (see the sketch after this list).
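To make those two compromises concrete, here is a minimal Python sketch: one resonant peak (a simple state-variable filter) plus key follow so the cutoff tracks the played note. All the names and the key-follow amount are my own illustrative choices, not any particular synth's parameters:

```python
import math

SR = 44100.0  # sample rate in Hz

def key_follow_cutoff(base_cutoff, note_freq, ref_freq=261.63, amount=1.0):
    """Scale the cutoff with the played pitch: amount=0 keeps it fixed,
    amount=1 makes it track the note 1:1 (ref_freq = middle C)."""
    return base_cutoff * (note_freq / ref_freq) ** amount

class ResonantLowpass:
    """Chamberlin state-variable filter: one resonant peak at the cutoff."""
    def __init__(self, cutoff, q):
        self.f = 2.0 * math.sin(math.pi * cutoff / SR)  # frequency coefficient
        self.damp = 1.0 / q                              # damping from resonance
        self.low = self.band = 0.0

    def process(self, x):
        self.low += self.f * self.band
        high = x - self.low - self.damp * self.band
        self.band += self.f * high
        return self.low

# A note an octave above middle C gets a proportionally brighter filter:
flt = ResonantLowpass(key_follow_cutoff(1000.0, note_freq=523.25), q=8.0)
```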
In reality, shaping the proper overtones is much more complex. Every instrument, including the human voice, has a set of fixed, non-shifting overtone regions that shape that instrument's basic character. That is also why everybody has their own voice character (and I do not sound like Caruso).
The major question then is: can you afford to add extra oscillators to create all the dynamic or static overtones you need for the desired emulation? That is why additive synthesis is, at least in theory, the most powerful synthesis method of all, but also the most difficult to build and handle in practice.
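As a rough illustration of why that gets expensive, here is a minimal additive sketch: one sine oscillator per overtone, each with its own static amplitude (the instrument's fixed character) and a simple decay envelope (the dynamic part). The partial count and amplitudes are made up for the example, not measured from any real instrument:

```python
import math

SR = 44100

def additive_tone(f0, partial_amps, duration=1.0, decay=3.0):
    """Sum sine partials at k*f0, each with its own amplitude and decay."""
    n = int(SR * duration)
    out = [0.0] * n
    for k, amp in enumerate(partial_amps, start=1):
        for i in range(n):
            t = i / SR
            env = math.exp(-decay * k * t)  # higher partials die off faster
            out[i] += amp * env * math.sin(2 * math.pi * k * f0 * t)
    return out

# Eight partials already need eight "oscillators"; real emulations need far more.
tone = additive_tone(220.0, [1.0, 0.5, 0.33, 0.25, 0.2, 0.17, 0.14, 0.12])
```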
If you cannot afford to add oscillators at will, however, another approach must be used. Instead of dedicating extra oscillators per formant, comb filtering is used to accentuate certain harmonics that are already present in a complex waveform. This is actually one step below additive synthesis and therefore a much more practical proposition, as vocoders and string/choir ensembles have shown.
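A minimal sketch of that idea, assuming a simple feedback comb: a delay of SR/f0 samples fed back onto the input puts resonant peaks at f0 and all its multiples, so harmonics already present in the source get boosted without any extra oscillators. The parameter names are illustrative:

```python
SR = 44100

def comb_filter(signal, f0, feedback=0.9):
    """Feedback comb: resonant peaks at f0 and all its multiples."""
    delay = max(1, round(SR / f0))    # delay length sets the peak spacing
    buf = [0.0] * delay               # circular buffer holding y[n - delay]
    out = []
    for i, x in enumerate(signal):
        y = x + feedback * buf[i % delay]  # recirculate the delayed output
        buf[i % delay] = y
        out.append(y)
    return out
```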
The reference to the Trautonium is spot on insofar as the later version that Oskar Sala developed could already be used as a sort of early additive synthesizer (although the same applies to the classic pipe organ and the Hammond organ), but it also had an adjustable bank of non-dynamic filters that could (sort of) emulate fixed formants.
Circling back to the subject at last: our main associations with vocal sounds are based on the fact that their formants change in a rather pronounced way when we change the shape of our vocal tract. Which brings us to the formant shaping in the FS1R: it uses an algorithm emulating multiple time-varying filters that accentuate certain frequency ranges to do a very similar thing.
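I do not know Yamaha's actual algorithm, but the general idea can be sketched with a pair of resonant bandpass filters whose center frequencies glide from rough "ah" formant values toward rough "ee" values while processing a buzzy source. The formant frequencies below are approximate textbook figures; everything else is an illustrative assumption:

```python
import math

SR = 44100

def two_pole_bandpass(signal, center_freqs, bandwidth=100.0):
    """Two-pole resonator whose center frequency may change every sample."""
    r = math.exp(-math.pi * bandwidth / SR)  # pole radius sets the bandwidth
    y1 = y2 = 0.0
    out = []
    for x, fc in zip(signal, center_freqs):
        cos_w = math.cos(2 * math.pi * fc / SR)
        y = (1.0 - r) * x + 2.0 * r * cos_w * y1 - r * r * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def vowel_morph(source):
    """Glide two formants from rough 'ah' (700/1200 Hz) to 'ee' (300/2300 Hz)."""
    n = len(source)
    f1 = [700.0 + (300.0 - 700.0) * i / n for i in range(n)]
    f2 = [1200.0 + (2300.0 - 1200.0) * i / n for i in range(n)]
    return [a + b for a, b in zip(two_pole_bandpass(source, f1),
                                  two_pole_bandpass(source, f2))]
```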
Without searching for the exact patent: in the 90s it became possible to program algorithms that could run complex filtering processes fast enough in real time. So software became fast enough to use all these old insights and actually make your synth sound, in real time, like somebody using his mouth to shape the formants of an electronic sound. By the way: such high processing speeds were also the reason why physical modeling, the original subject of this thread, moved into focus at about the same time.
Further advances in DSP speed have led to formant synthesis systems fast enough to actually create the overtones themselves in real time instead of using a filter-based approach, but I expect the FS1R to be too early for that. It is however possible that bits of Karplus-Strong modeling were already involved (https://en.wikipedia.org/wiki/Karplus%E2%80%93Strong_string_synthesis).
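For reference, Karplus-Strong in its basic form (as described on that Wikipedia page) is tiny: a noise burst circulating through a delay line with a two-point average acts as oscillator and damping filter at once:

```python
import random

SR = 44100

def karplus_strong(f0, duration=1.0, damping=0.996):
    """Plucked-string tone: noise burst in a delay line with averaging filter."""
    delay = int(SR / f0)                                     # delay sets the pitch
    buf = [random.uniform(-1.0, 1.0) for _ in range(delay)]  # noise excitation
    out = []
    for i in range(int(SR * duration)):
        y = buf[i % delay]
        # two-point average lowpasses the loop, so high overtones decay faster
        buf[i % delay] = damping * 0.5 * (y + buf[(i + 1) % delay])
        out.append(y)
    return out

pluck = karplus_strong(220.0)  # an A3 "string" pluck
```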
Anyway: never has a synth sounded so much like Peter Frampton doing his talkbox guitar solo on Show Me The Way.