[EAS] Question re CAP and text-to-speech

Wed Dec 7 13:02:00 CST 2011

Suzanne,

A CAP message consists of various text fields, and optional pointers 
to audio files.  A one way link will embed these audio files as data 
and not use pointers.  A CAP message can't be sent down a phone line 
(unless it is a data call).  Most likely, the CAP/EAS device is 
sending its audio output to that phone line.

Per ECIG, if a properly flagged audio file is present in a CAP message, 
it will be used.  Otherwise, a local TTS conversion is performed by 
the receiving device.
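
A rough sketch of that rule in Python, for illustration only.  The 
CAP 1.2 element names (resource, resourceDesc, mimeType, uri, 
derefUri) come from the CAP schema.  The "EAS Broadcast Content" flag 
value, the function name choose_audio, and the use of only the 
description field as the TTS text are assumptions made for this 
example; the actual ECIG rules for flagging audio and assembling the 
TTS text are more detailed.

```python
import base64
import xml.etree.ElementTree as ET

CAP_NS = {"cap": "urn:oasis:names:tc:emergency:cap:1.2"}
# Assumption: the resourceDesc value that flags audio meant for EAS use.
EAS_AUDIO_DESC = "EAS Broadcast Content"


def choose_audio(cap_xml):
    """Return ("embedded", bytes), ("uri", str), or ("tts", text)."""
    root = ET.fromstring(cap_xml)
    for res in root.findall("cap:info/cap:resource", CAP_NS):
        desc = res.findtext("cap:resourceDesc", default="", namespaces=CAP_NS)
        mime = res.findtext("cap:mimeType", default="", namespaces=CAP_NS)
        # Skip anything that is not flagged audio (maps, images, etc.).
        if desc.strip() != EAS_AUDIO_DESC or not mime.strip().startswith("audio/"):
            continue
        # One-way links carry the audio inline as base64 data.
        embedded = res.findtext("cap:derefUri", namespaces=CAP_NS)
        if embedded:
            return ("embedded", base64.b64decode(embedded))
        # Two-way links carry a pointer that the device fetches.
        uri = res.findtext("cap:uri", namespaces=CAP_NS)
        if uri:
            return ("uri", uri.strip())
    # No usable audio supplied: fall back to local text-to-speech.
    text = root.findtext("cap:info/cap:description", default="", namespaces=CAP_NS)
    return ("tts", text.strip())
```

The first element of the result tells the device whether to play 
supplied audio or to run its own TTS on the returned text.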

So, in general, if there is no provided audio, the local CAP/EAS 
system at the broadcaster (or the one that feeds your phone line) will 
do the text-to-speech.

Some CAP origination systems will take the text that is going into 
the CAP message and convert it, at the server, to audio, and place 
that audio file on the server.  The resulting original CAP message 
will contain a pointer to that audio file, or on a one way system, 
the audio will be embedded in the original message.
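
The pointer-versus-embedded choice on the origination side might look 
something like this in Python.  This is a sketch, not any vendor's 
implementation: the function name audio_resource, the one_way switch, 
the "EAS Broadcast Content" description, and the audio/mpeg MIME type 
are assumptions for illustration, and the CAP namespace is omitted 
for brevity.

```python
import base64
import xml.etree.ElementTree as ET


def audio_resource(audio, url, one_way):
    """Build a CAP <resource> element for server-generated TTS audio.

    audio   -- the audio file contents as bytes
    url     -- where the server published the file (hypothetical)
    one_way -- True if receivers cannot fetch files back from the server
    """
    res = ET.Element("resource")
    ET.SubElement(res, "resourceDesc").text = "EAS Broadcast Content"
    ET.SubElement(res, "mimeType").text = "audio/mpeg"
    ET.SubElement(res, "size").text = str(len(audio))
    if one_way:
        # Embed the audio itself as base64 data inside the message.
        ET.SubElement(res, "derefUri").text = base64.b64encode(audio).decode("ascii")
    else:
        # Leave the file on the server and send only a pointer to it.
        ET.SubElement(res, "uri").text = url
    return res
```

Either way the receiving device sees an audio resource; the only 
difference is whether the message carries the bytes or a link.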

From the point of view of the originator, they supplied text 
only.  From the point of view of the broadcaster, they got 
audio.  That audio is server-generated TTS.

As always, there are many advantages and disadvantages.

1) The EOC provides text and audio spoken by a human.

Pro:  Always better, in my opinion.  Local pronunciation can be 
correct.  Less chance of "dial nine hundred eleven" and similar 
problems.  Other languages easily handled, at the source.
Con:  Someone at the EOC needs to voice it.  Mike fright, background 
noise, delay in reading, and delay in approvals can result in bad or 
delayed audio.  As much as a 1MB file will need to be delivered to 
the broadcasters (more if multiple languages are used) rather than 
about 4k for a text only CAP message.

2) The EOC provides no audio, and neither does the server.  Audio is 
done by the local device.

Pro: Less bandwidth is needed to get the message to the 
broadcasters.  One less approval is needed in the EOC procedures, so 
there is less latency in getting the message out.
Con: The audio generated by the inexpensive CAP/EAS equipment will 
never be as good as a real live human.  Handling of place names, 
abbreviations, pauses, pacing, etc. is never optimal.  Different vendor 
voices sound different; some are better at some words than others.  
Multiple languages are limited.  Pronunciation training for each 
vendor device in the area must be done.

3) The TTS is done at the CAP origination server as part of the 
generation of the original CAP message.

Pro: This may be a reasonable compromise.  Only one TTS engine must be 
trained in local place names and handling of abbreviations.  A 
high-end (expensive) TTS generator can be used (because it exists only 
at the central location).  Retains the advantages from the EOC point 
of view.  Everyone in the area hears the same message, because it is 
delivered as an audio file to all devices.
Con: The large audio file still needs to be delivered.

Long answer to a short question.

That said, what do you mean by "send a CAP-enabled message down a 
dedicated phone line to our State Primary"?  Unless that phone line 
is sending IP, it isn't really a CAP message.  It is probably an EAS 
message.  That will always only contain audio, no text, so the TTS 
has to happen before it gets to that phone line, in any of the ways 
discussed above.

Regards,
Harold

At 04:35 PM 12/6/2011, suzanne at mab.org wrote:
>Can one of you smart engineering types please answer a question for 
>a non-techie policy wonk?  At what point in the EAS message chain 
>does a text message translate into speech?  If our state Emergency 
>Management Agency were to send a CAP-enabled message down a 
>dedicated phone line to our State Primary (as a redundant backup to 
>internet delivery), would the message have to be "in voice"?  Or 
>does the translation happen at the receiving end, i.e. inside the 
>EAS box at the State Primary?
>