[EAS] Question re CAP and text-to-speech
Harold Price
hprice at sagealertingsystems.com
Wed Dec 7 13:02:00 CST 2011
Suzanne,
A CAP message consists of various text fields, and optional pointers
to audio files. A one way link will embed these audio files as data
and not use pointers. A CAP message itself can't be sent down a phone
line (unless it is a data call). Most likely, the output of a CAP/EAS
device is what is sending audio to that phone line.
Per ECIG, if a properly flagged audio file is present in a CAP message,
it will be used. Otherwise, a local TTS conversion is performed by
the receiving device.
So, in general, if there is no provided audio, the local CAP/EAS
system at the broadcaster (or whatever is at the input to your phone
line) will do the text to speech.
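As a rough sketch of that decision in Python, assuming a CAP 1.2
message and assuming the audio is flagged with a resourceDesc of
"EAS Broadcast Content" (my reading of the convention; check the ECIG
guide for the exact rules). The function name, sample messages, and
URL are made up for illustration, and a real device does a good deal
more checking than this.

```python
import xml.etree.ElementTree as ET

CAP_NS = {"cap": "urn:oasis:names:tc:emergency:cap:1.2"}
EAS_AUDIO_DESC = "EAS Broadcast Content"  # assumed flag value, see note above


def choose_audio_source(cap_xml):
    """Return ("supplied", uri) if flagged audio is present,
    otherwise ("local_tts", text_to_speak)."""
    root = ET.fromstring(cap_xml)
    for info in root.findall("cap:info", CAP_NS):
        for res in info.findall("cap:resource", CAP_NS):
            desc = res.findtext("cap:resourceDesc", default="", namespaces=CAP_NS)
            mime = res.findtext("cap:mimeType", default="", namespaces=CAP_NS)
            if desc.strip() == EAS_AUDIO_DESC and mime.startswith("audio/"):
                uri = res.findtext("cap:uri", default="", namespaces=CAP_NS)
                return ("supplied", uri)
    # No usable audio: the receiving device falls back to its own TTS.
    text = root.findtext("cap:info/cap:description", default="", namespaces=CAP_NS)
    return ("local_tts", text)


# Hypothetical, heavily trimmed messages (not complete, valid CAP alerts).
TEXT_ONLY = """<alert xmlns="urn:oasis:names:tc:emergency:cap:1.2">
  <info>
    <description>Test of the text only path.</description>
  </info>
</alert>"""

WITH_AUDIO = """<alert xmlns="urn:oasis:names:tc:emergency:cap:1.2">
  <info>
    <description>Test of the supplied audio path.</description>
    <resource>
      <resourceDesc>EAS Broadcast Content</resourceDesc>
      <mimeType>audio/mpeg</mimeType>
      <uri>http://example.invalid/alert.mp3</uri>
    </resource>
  </info>
</alert>"""

print(choose_audio_source(TEXT_ONLY))
print(choose_audio_source(WITH_AUDIO))
```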
Some CAP origination systems will take the text that is going into
the CAP message and convert it, at the server, to audio, and place
that audio file on the server. The resulting original CAP message
will contain a pointer to that audio file, or on a one way system,
the audio will be embedded in the original message.
From the point of view of the originator, they supplied text
only. From the point of view of the broadcaster, they got
audio. That audio is server-generated TTS.
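To make the pointer versus embedded distinction concrete, here is a
minimal Python sketch of the two forms of a CAP resource block. In CAP
1.2 the pointer goes in <uri> and embedded data goes in <derefUri> as
base64. The helper name, the fake audio bytes, the URL, and the
resourceDesc value are assumptions for illustration, and the XML
namespace is left out to keep it short.

```python
import base64
import xml.etree.ElementTree as ET


def build_resource(audio_bytes, uri=None):
    """Build a simplified CAP <resource> element for an audio file.

    With a uri, the element only points at the file (two-way link).
    Without one, the audio is embedded as base64 in <derefUri>
    (one-way link).
    """
    res = ET.Element("resource")
    ET.SubElement(res, "resourceDesc").text = "EAS Broadcast Content"  # assumed flag
    ET.SubElement(res, "mimeType").text = "audio/mpeg"
    ET.SubElement(res, "size").text = str(len(audio_bytes))
    if uri is not None:
        ET.SubElement(res, "uri").text = uri
    else:
        encoded = base64.b64encode(audio_bytes).decode("ascii")
        ET.SubElement(res, "derefUri").text = encoded
    return res


fake_audio = b"\x00\x01" * 1000  # stand-in for real audio data
pointer = build_resource(fake_audio, uri="http://example.invalid/alert.mp3")
embedded = build_resource(fake_audio)

print(len(ET.tostring(pointer)), "bytes with a pointer")
print(len(ET.tostring(embedded)), "bytes with embedded audio")
```

Base64 makes the embedded copy about a third larger than the raw
audio, which adds to the bandwidth cost on a one way link.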
As always, there are many advantages and disadvantages.
1) The EOC provides text and audio spoken by a human.
Pro: Always better, in my opinion. Local pronunciation can be
correct. Less chance of "dial nine hundred eleven" and similar
problems. Other languages easily handled, at the source.
Con: Someone at the EOC needs to voice it. Mike fright, background
noise, delay in reading, and delay in approvals can result in bad or
delayed audio. As much as a 1MB file will need to be delivered to
the broadcasters (more if multiple languages are used) rather than
about 4k for a text only CAP message.
2) The EOC provides no audio, and neither does the server. Audio is
done by the local device.
Pro: Less bandwidth is needed to get the message to the
broadcasters. One less approval needed by the EOC procedures, less
latency in getting the message out.
Con: The audio generated by the inexpensive CAP/EAS equipment will
never be as good as a real live human. Place names, abbreviations,
odd pauses, pacing, etc. are never optimum. Different vendor voices
sound different, some are better at some words than others. Multiple
languages are limited. Pronunciation training for each vendor device
in the area must be done.
3) The TTS is done at the CAP origination server as part of the
generation of the original CAP message.
Pro: This may be a reasonable compromise. Only one TTS engine must be
trained in local place names and handling of abbreviations. A high
end (expensive) TTS generator can be used (because it exists only at
the central location). Retains advantages from EOC point of
view. Everyone in the area hears the same message, because it is
delivered as an audio file to all devices.
Con: The large audio file still needs to be delivered.
Long answer to a short question.
That said, what do you mean by "send a CAP-enabled message down a
dedicated phone line to our State Primary"? Unless that phone line
is sending IP, it isn't really a CAP message. It is probably an EAS
message. That will always only contain audio, no text, so the TTS
has to happen before it gets to that phone line, in any of the ways
discussed above.
Regards,
Harold
At 04:35 PM 12/6/2011, suzanne at mab.org wrote:
>Can one of you smart engineering types please answer a question for
>a non-techie policy wonk? At what point in the EAS message chain
>does a text message translate into speech? If our state Emergency
>Management Agency were to send a CAP-enabled message down a
>dedicated phone line to our State Primary (as a redundant backup to
>internet delivery), would the message have to be "in voice"? Or
>does the translation happen at the receiving end, i.e. inside the
>EAS box at the State Primary?
>