Hi, first: your envelope followers are too fast. 1.3 ms corresponds to 769 Hz.
Play around with these values a little. For a 20 Hz filter you only need a
20 Hz follower, for a 70 Hz filter a 70 Hz follower, and so on.
It is very tricky to find the right values. Longer times give more of a singing sound; shorter times keep the direct character of the spoken words. If the times are too short, you modulate the synthesis filter (AC) with the frequency from the analysis filter. Then you get amplitude modulation, and this doesn't sound good.
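As a rough sketch of the arithmetic behind these numbers (Python; the function names are my own and only illustrative): a time t corresponds to a frequency of 1/t, so 1.3 ms works out to about 769 Hz, and a follower matched to a band's frequency gets a time of 1/f. Treat the results as starting points for tuning by ear, not as fixed values.

```python
def time_ms_to_hz(time_ms):
    """Frequency in Hz that corresponds to a time in milliseconds (f = 1/t)."""
    return 1000.0 / time_ms


def hz_to_time_ms(freq_hz):
    """Time in milliseconds that corresponds to a frequency in Hz (t = 1/f)."""
    return 1000.0 / freq_hz


# The follower from the post: 1.3 ms is roughly 769 Hz.
print(round(time_ms_to_hz(1.3)))  # 769

# Starting-point follower times for a 20 Hz and a 70 Hz band.
for band_hz in (20.0, 70.0):
    print(band_hz, "Hz ->", round(hz_to_time_ms(band_hz), 1), "ms")
```

So a 20 Hz band would start with a follower around 50 ms, a 70 Hz band around 14 ms, and from there you adjust by ear.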
Then, for speech synthesis, remember that the relevant frequencies are in a range from 100 to 3000 Hz.
You can also tune the filters like normal notes over 2 octaves. This can give a chord.
Then a trick: every second filter output should be inverted and then added to the sum.
The audio outputs of synthesis filters 1, 3, 5, 7 are positive; those of filters 2, 4, 6, 8 are inverted. The Elektor vocoder did the same.
There was a DIY vocoder in Elektor in the 1980s. It had an interesting schematic.
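The alternating-polarity trick can be sketched like this (Python, one sample per band; the function name and the list of band samples are just assumptions for illustration, not taken from the Elektor schematic):

```python
def mix_alternating(band_outputs):
    """Sum the synthesis band outputs, inverting every second one.

    band_outputs[0] is filter 1 (positive), band_outputs[1] is filter 2
    (inverted), and so on.
    """
    total = 0.0
    for index, sample in enumerate(band_outputs):
        sign = 1.0 if index % 2 == 0 else -1.0  # filters 1,3,5,... positive
        total += sign * sample
    return total


# Four bands: 1.0 - 0.5 + 0.25 - 0.125
print(mix_alternating([1.0, 0.5, 0.25, 0.125]))  # 0.625
```

In hardware this is just an inverter on every second filter output before the summing stage.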
Your basics are OK, but the fine tuning still has to be done.
Then pitch shifting, or better, formant shifting: you can do it if you control all the synthesis (!) filters in the same way (not the analysis filters). It is not the Cher effect, but you can move your voice sound from a child's to a man's to a woman's.
Example: filter 1 = 100 Hz, filter 2 = 150 Hz, filter 3 = 400 Hz, ...
If you now control all the filters with the same voltage, scaled in V/oct,
you get 200, 300, 800, ... (one octave up), and in the end you have formant shifting, because the analysis filters stay the same while the synthesis filters can move up or down.
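A small sketch of that V/oct scaling (Python; the function name is made up, and it assumes an ideal 1 V per octave response, so each volt doubles every synthesis filter frequency):

```python
def shift_volt_per_octave(freqs_hz, control_volts):
    """Shift all synthesis filter frequencies by the same V/oct control voltage.

    Assumes an ideal 1 V/oct response: +1 V doubles, -1 V halves.
    """
    factor = 2.0 ** control_volts
    return [freq * factor for freq in freqs_hz]


synthesis_filters = [100.0, 150.0, 400.0]  # example values from the post

print(shift_volt_per_octave(synthesis_filters, 1.0))   # [200.0, 300.0, 800.0]
print(shift_volt_per_octave(synthesis_filters, -1.0))  # [50.0, 75.0, 200.0]
```

The analysis filters are not touched, so only the formants move, not the pitch.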
Pitch shifting: this is possible only with the carrier. If you have a three-oscillator carrier, you can sing with three voices in a chord. I suggest a superwave oscillator.