How to give Siri a male voice on an iPhone, iPad, or iPod touch

Siri is a voice assistant that was first introduced in 2011 along with iOS 5. It has developed considerably since then: it has learned to speak different languages (including Russian), come to Mac computers, and gained the ability to work with third-party apps. But it made a qualitative leap only with the announcement of iOS 10: its voice is now based on deep learning, which allows it to sound more natural and smooth. What deep learning is and how Siri's voice is synthesized is what this article is about.

Introduction

Speech synthesis, the artificial reproduction of human speech, is widely used in areas ranging from voice assistants to games. Combined with speech recognition, speech synthesis has recently become an integral part of virtual personal assistants such as Siri.

Two speech synthesis technologies are used in the industry: unit selection and parametric synthesis. Unit selection synthesis provides the highest quality given a sufficient amount of high-quality speech recordings, which makes it the most widely used speech synthesis method in commercial products. Parametric synthesis, on the other hand, produces very clear and smooth speech, but with lower overall quality. Modern unit selection systems combine some of the advantages of both approaches and are therefore called hybrid systems. Hybrid unit selection methods are similar to classical unit selection, but they use a parametric approach to predict which sound units should be selected.

Lately, deep learning has been gaining momentum in speech technologies and is largely superior to traditional methods such as hidden Markov models (HMMs), which infer unknown parameters from observed ones; the estimated parameters can then be used in further analysis, for example for pattern recognition. Deep learning has enabled a completely new approach to speech synthesis called direct waveform modeling. It can provide both the high quality of unit selection synthesis and the flexibility of parametric synthesis. However, given its extremely high computational cost, it is not yet feasible on consumer devices.

How speech synthesis works

Creating a high-quality text-to-speech (TTS) system for a personal assistant is no easy task. The first step is to find a professional voice that sounds pleasant, is easy to understand, and matches Siri's personality. To capture some of the variation in the vast diversity of human speech, 10 to 20 hours of speech need to be recorded in a professional studio. The recording scripts range from audiobooks to navigation instructions, and from prompts to witty jokes. This natural speech typically cannot be used as-is in a voice assistant, because it is impossible to record every utterance the assistant may ever need to say. Unit selection TTS is therefore based on cutting the recorded speech into elementary components, such as phonemes, and then recombining them according to the input text to create completely new speech. In practice, selecting appropriate speech segments and combining them is not easy, since the acoustic characteristics of each phoneme depend on its neighbors and on the intonation of the utterance, which often makes speech units incompatible with each other. The figure below shows how speech can be synthesized using a speech database divided into phonemes:


The upper part of the figure shows the synthesized utterance “Unit Selection Synthesis” and its phonetic transcription using phonemes. The corresponding synthetic signal and its spectrogram are shown below. Speech segments, separated by lines, are continuous segments of speech from the database that can contain one or more phonemes.
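To make this cut-and-recombine idea concrete, here is a minimal, purely illustrative sketch of what a unit database could look like. The class and field names are hypothetical and are not taken from Apple's system.

```python
# Illustrative sketch only: a toy unit-selection database, not Siri's implementation.
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Unit:
    phoneme: str            # e.g. "S", "IH"
    samples: list = field(default_factory=list)  # raw audio samples of the segment
    pitch: float = 0.0      # average fundamental frequency, Hz
    duration: float = 0.0   # seconds

class UnitDatabase:
    def __init__(self):
        self._by_phoneme = defaultdict(list)

    def add(self, unit: Unit):
        self._by_phoneme[unit.phoneme].append(unit)

    def candidates(self, phoneme: str):
        """All recorded units that realise the given phoneme."""
        return self._by_phoneme[phoneme]

# Usage: collect candidate units for each phoneme of the target utterance.
db = UnitDatabase()
db.add(Unit("S", duration=0.08))                 # unvoiced fricative
db.add(Unit("IH", pitch=210.0, duration=0.06))
target_phonemes = ["S", "IH", "R", "IY"]         # rough phoneme sequence for "Siri"
lattice = [db.candidates(p) for p in target_phonemes]
```

Synthesis then amounts to picking one candidate per position in this lattice so that the whole sequence sounds natural, which is exactly the selection problem described below.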

The main problem in unit selection TTS is to find a sequence of units (e.g. phonemes) that satisfies the input text and the predicted intonation, provided that the units can be combined without audible glitches. Traditionally, the process consists of two parts, a front-end and a back-end, although in modern systems the boundary between them can be blurred. The purpose of the front-end is to provide phonetic transcription and intonation information based on the source text. This also includes normalization of the source text, which may contain numbers, abbreviations, and so on:
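As a small, hypothetical example of what such normalization can look like (the rules below are invented for illustration and are not the front-end Apple uses):

```python
# Toy front-end normalisation: expand digits and a few abbreviations into words
# before phonetic transcription (illustrative rules only).
ABBREVIATIONS = {"dr.": "doctor", "st.": "street", "etc.": "et cetera"}
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

def normalize(text: str) -> str:
    words = []
    for token in text.lower().split():
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
        elif token.isdigit():
            # Spell out digits; a real system also handles full numbers, dates, currency, etc.
            words.extend(DIGIT_WORDS[int(d)] for d in token)
        else:
            words.append(token)
    return " ".join(words)

print(normalize("Meet me at 221 Baker St."))  # -> "meet me at two two one baker street"
```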


Using the symbolic linguistic representation generated by the text analysis module, the intonation generation module predicts values for acoustic characteristics such as phrase duration and intonation. These values are then used to select the appropriate sound units. The unit selection problem is highly complex, which is why modern synthesizers use machine learning methods that can learn the correspondence between text and speech and then predict the values of speech features from the values of the input text features. This model is learned during the training phase of the synthesizer using a large amount of text and speech data. The input to the model consists of numerical linguistic features, such as phoneme, word, or phrase identity, converted into a usable numerical form. The output of the model consists of numerical acoustic characteristics of speech, such as the spectrum, fundamental frequency, and phrase duration. During synthesis, the trained statistical model maps the input text features to speech features, which are then used to drive the back-end unit selection process, where appropriate intonation and duration are important.
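A hypothetical sketch of how linguistic context might be turned into such a numeric input vector (the phoneme inventory and feature layout are invented for illustration):

```python
# Encode a phoneme in context as a numeric feature vector for the statistical model.
PHONEMES = ["sil", "S", "IH", "R", "IY"]   # tiny phoneme inventory for the example

def one_hot(symbol: str, inventory: list) -> list:
    vec = [0.0] * len(inventory)
    vec[inventory.index(symbol)] = 1.0
    return vec

def linguistic_features(prev_ph, cur_ph, next_ph, position_in_word, is_stressed):
    """Identity of the phoneme and its neighbours plus simple positional/prosodic flags."""
    return (one_hot(prev_ph, PHONEMES)
            + one_hot(cur_ph, PHONEMES)
            + one_hot(next_ph, PHONEMES)
            + [float(position_in_word), 1.0 if is_stressed else 0.0])

x = linguistic_features("sil", "S", "IH", position_in_word=0, is_stressed=False)
# A trained model maps x to acoustic targets such as duration, pitch and spectral parameters.
```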

Unlike the front-end, the back-end is largely language independent. It consists of selecting the desired sound units and concatenating (that is, gluing) them into a phrase. When the system is trained, the recorded speech data is segmented into individual speech segments using forced alignment between the recorded speech and the recording script (using acoustic speech recognition models). The segmented speech is then used to build a database of sound units. The database is further augmented with important information, such as the linguistic context and acoustic characteristics of each unit. Using the constructed unit database and the predicted intonation features that guide the selection process, a Viterbi search is performed (at the top are the target phonemes, below are the candidate sound units, and the red line is their best combination):


The selection is based on two criteria: first, the sound units must have the desired (target) intonation, and second, the units must, as far as possible, join without audible glitches at the boundaries. These two criteria are called the target cost and the concatenation cost, respectively. The target cost is the difference between the predicted target acoustic characteristics and the acoustic properties extracted from each unit, while the concatenation cost is the acoustic difference between consecutive units:


Once the optimal sequence of units is determined, the individual audio signals are concatenated to create continuous synthetic speech.
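A rough, hypothetical sketch of the two cost functions just described (the feature names and weights are invented; the real system uses much richer acoustic features):

```python
# Illustrative target and concatenation costs over toy feature dictionaries.
def target_cost(predicted: dict, unit: dict, w_pitch=1.0, w_dur=1.0) -> float:
    """How far a candidate unit's acoustics are from the predicted targets."""
    return (w_pitch * abs(predicted["pitch"] - unit["pitch"])
            + w_dur * abs(predicted["duration"] - unit["duration"]))

def concatenation_cost(left: dict, right: dict) -> float:
    """Mismatch at the join; zero if the units were contiguous in the original recording."""
    if left["source"] == right["source"] and left["end"] == right["start"]:
        return 0.0  # gluing contiguous speech costs nothing
    return abs(left["pitch_at_end"] - right["pitch_at_start"])

def total_cost(targets: list, units: list) -> float:
    """Sum of target costs plus concatenation costs along the chosen unit sequence."""
    return (sum(target_cost(t, u) for t, u in zip(targets, units))
            + sum(concatenation_cost(a, b) for a, b in zip(units, units[1:])))
```

The search described below looks for the unit sequence that minimizes this kind of total cost.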

Hidden Markov models (HMMs) are commonly used as a statistical model for target predictions because they directly model the distributions of acoustic parameters and thus can be easily used to calculate the target cost. However, deep learning-based approaches often outperform HMMs in parametric speech synthesis.

The goal of Siri's TTS system is to train a single deep learning-based model that can automatically and accurately predict both target and concatenation costs for the audio units in the database. Thus, instead of HMMs, it uses a mixture density network (MDN) to predict distributions over these characteristics. MDNs combine conventional deep neural networks (DNNs) with Gaussian mixture models.

A conventional DNN is an artificial neural network with several hidden layers of neurons between the input and output layers; it can therefore model complex, nonlinear relationships between input and output features. In contrast, a Gaussian mixture model (GMM) models the probability distribution of the output given the input using a set of Gaussian distributions and is typically trained with the expectation-maximization method. An MDN combines the advantages of the two, using a DNN to model the complex relationship between input and output data while providing a probability distribution at the output:
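To illustrate how an MDN differs from a plain DNN at its output, here is a minimal NumPy sketch of an MDN output layer. The dimensions and weight names are made up; Siri's actual architecture is not public beyond the description above.

```python
# Minimal MDN output head: map hidden activations to a Gaussian mixture over acoustic targets.
import numpy as np

def mdn_head(hidden, W_pi, W_mu, W_logvar):
    """Return mixture weights, component means and variances for the acoustic features."""
    logits = hidden @ W_pi
    pi = np.exp(logits - logits.max())
    pi /= pi.sum()                        # softmax -> mixture weights
    mu = hidden @ W_mu                    # component means (K components x D features, flattened)
    var = np.exp(hidden @ W_logvar)       # exp keeps variances positive
    return pi, mu, var

# Toy dimensions: H hidden units, K mixture components, D acoustic features (pitch, duration, ...).
H, K, D = 16, 4, 3
rng = np.random.default_rng(0)
pi, mu, var = mdn_head(rng.standard_normal(H),
                       rng.standard_normal((H, K)),
                       rng.standard_normal((H, K * D)),
                       rng.standard_normal((H, K * D)))
```

The variances returned here are what make the model's predictions context dependent, which matters in the next paragraph.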


Siri uses a unified target and concatenation model based on an MDN, which can predict distributions of both the target speech characteristics (spectrum, pitch, and duration) and the concatenation costs between sound units. Sometimes speech features, such as formants, are quite stable and evolve slowly, for example in vowels. Elsewhere, speech can change quite quickly, for example at transitions between voiced and unvoiced sounds. To account for this, the model must be able to adjust its parameters to that variability, and the MDN does so through the variances embedded in the model. This is important for the quality of the synthesis, since we want to compute target and concatenation costs that are specific to the current context.

After the unit costs are computed with the MDN, a traditional Viterbi search is performed to find the best combination of sound units. These are then joined using waveform overlap-add to find the optimal concatenation points and produce smooth, continuous synthetic speech.
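A compact, purely illustrative sketch of such a Viterbi search over the candidate-unit lattice; `target_cost(i, j)` and `concat_cost(i, k, j)` are assumed helper functions in the spirit of the toy costs sketched earlier.

```python
# Dynamic-programming (Viterbi) search for the cheapest sequence of candidate units.
def viterbi(lattice, target_cost, concat_cost):
    """lattice[i] is the list of candidate units for position i.
    target_cost(i, j): cost of using candidate j at position i.
    concat_cost(i, k, j): cost of joining candidate k at position i-1 with candidate j at position i."""
    # best[i][j] = (cheapest total cost of reaching candidate j at position i, back-pointer)
    best = [{j: (target_cost(0, j), None) for j in range(len(lattice[0]))}]
    for i in range(1, len(lattice)):
        column = {}
        for j in range(len(lattice[i])):
            prev_j, prev_cost = min(
                ((k, best[i - 1][k][0] + concat_cost(i, k, j)) for k in best[i - 1]),
                key=lambda kc: kc[1])
            column[j] = (prev_cost + target_cost(i, j), prev_j)
        best.append(column)
    # Backtrack from the cheapest final candidate to recover the best unit sequence.
    j = min(best[-1], key=lambda k: best[-1][k][0])
    path = [j]
    for i in range(len(lattice) - 1, 0, -1):
        j = best[i][j][1]
        path.append(j)
    return list(reversed(path))
```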

Results

To build the MDN-based system for Siri, at least 15 hours of high-quality speech were recorded at a sampling rate of 48 kHz. The speech was segmented into phonemes using forced alignment, that is, automatic speech recognition was applied to align the recorded audio with the acoustic features extracted from the speech signal. This segmentation process produced approximately 1-2 million phoneme units.

For MDN-based unit selection, a single unified target and concatenation model was created. The input to the MDN consists mostly of binary values with a few additional features that represent contextual information (the two preceding and two following phonemes).

The quality of the new Siri TTS system is superior to the previous one, as confirmed by the listening tests shown in the figure below (interestingly, the new Russian voice of Siri was rated best):


The better sound quality comes from the MDN-based unit database, which ensures better selection and concatenation of sound units, from the higher sampling rate (48 kHz instead of 22 kHz), and from improved audio compression.

You can read the original article (a good knowledge of English and physics is required), and also listen to how Siri's voice changed in iOS 9, 10, and 11.

Siri is the faithful assistant of every Apple fan. With this system you can check the weather, call friends, listen to music, and so on. It speeds up finding whatever you need. Say you ask Siri to show today's weather in St. Petersburg, and she happily helps you. They say that very soon she will learn to really listen to people, since many often complain to her about their problems, and so far she only soullessly offers the number of the nearest psychological help line.

So, imagine you have grown tired of her voice and would like to change it. Many people think this is impossible, but in fact it only takes about twenty seconds.

Step one

Go to Settings. The icon is usually located on the first page of the home screen or in the "Utilities" folder.

Step two

Once you have opened Settings, look for the Siri item. It is located in the third section of the menu.

Step three

Next to the Siri label, switch the toggle to the on position. If it is already on, skip this step.

Step four

Go to the "Siri Voice" section and select the option you like best. Here you can choose different accents as well as change the gender of the voice. Not all languages offer a choice of accent, but most do. In any case, this is not the main thing, since after a while the assistant itself begins to adapt to you.

Would you like to have a personal assistant on your iPhone? For example, one that helps you plan your day, week, and even month, reminds you of important matters in a pleasant manner, schedules your meetings, directs your activities, and makes calls or sends emails directly from your smartphone. Such an intelligent voice interface program, Siri for iPhone, is covered in Russia by the SiriPort project team.

The Siri voice assistant meets today's requirements for artificial intelligence. The application is very smart and can carry out voice commands for virtually any action on a smartphone: call people from your contact list, send messages, find the information you need, and create bookmarks and notes without using the keyboard at all, only the voice interface. This article explains how to set up Siri on an iPhone 4, 5, or 6.

The personal assistant is a voice recognition program that comes preinstalled on Apple devices. The voice assistant works on iOS 7 and later on the iPhone 4S, iPhone 5, iPhone 5S, iPhone 6, iPhone 6S, and iPhone 7. In addition, the assistant is available on the iPad mini, mini 2, and mini 3, on the 5th-generation iPod touch, on the Apple Watch, and on the 3rd-generation iPad and later.

Since the release of iOS 8.3, Siri on the iPhone can be configured to use Russian. iOS 10 on newer devices also expands the assistant's capabilities, making it much easier to find and remember personal information and saving, as they say, time and money.

Want to know how to enable Siri on iPhone?

If you don't know how to turn Siri on or off on an iPhone 4 through 7, let's go through it step by step, using the iPhone 4S and iPhone 6S as examples. First, check whether the assistant is available on your iPhone and, if not, why Siri does not work on it. If it turns out that the assistant cannot be run on your iPhone, don't despair: you can install similar alternative apps, for example Dragon Go!, developed by Nuance, which can work with other apps installed on the iPhone, such as Google, Netflix, Yelp, and others.

If the voice assistant came preinstalled on the iPhone, it will most likely be active by default. To check, hold down the Home button on your iPhone. Siri will beep when it is ready to listen. Then give a command by voice: for example, say clearly: "Check my mail!"

If Siri is not activated, you can enable it yourself. Open the home screen, tap "Settings", find the "General" section, and turn on "Siri". You can then give the assistant dozens of tasks out loud. Try a greeting such as "Hey Siri!", or ask, say, "What's the weather, Siri?" You can also choose the gender of your assistant's voice in the settings.

How to change Siri voice or language

If the voice assistant speaks to you in a language you don't understand, you can change it. To do this, find Siri in the iPhone's Settings and select "Siri Language". A list of languages will open; scroll through it and select the one you want the assistant to use from then on.

If you want to personalize the assistant's communication style, you can configure not only its voice but also how it responds to you. To do this, go to Settings again, open "Siri", find the "Voice Feedback" item, and choose the option that suits you.

By the way, the developers prudently gave the voice assistant the ability to recognize voices, intonation, accents, and even dialects; it understands many languages.

Siri mode in the car

Turning on Siri can make your tasks much easier, for example by choosing the right route on the map while you are driving. To do this, the car must support CarPlay or Apple's "Eyes Free" mode. To use the assistant, call it up by pressing the voice command button on the steering wheel and give Siri the appropriate command.

If your car has a CarPlay-enabled touchscreen, activate Siri by holding the Home button in the on-screen menu. When you speak a command, the assistant waits for a pause in your speech before executing it. If the car is very noisy, it is better to tap the sound-wave button on the screen so that Siri knows you are finished and can begin the assigned task. If necessary, you can also disable Siri from the iPhone settings.

You can also connect the assistant via a Bluetooth headset or a USB cable. In that case, perform the same steps in the same order.

iPhone and iPad users can now type text queries and commands for Siri. But there is one caveat: in the beta versions of iOS 11, you have to choose between text and voice input. When typing for Siri is enabled, Siri will not accept voice commands. It would be much more convenient if Siri could switch between these options automatically; perhaps Apple will take this into account in future versions.

How to use Siri text commands:

To enable text commands for Siri in iOS 11, do the following:

Step 1: Open the Siri and Search section and activate the Listen to “Hey Siri” option.


Step 2: Go to Settings > General > Accessibility > Siri.

Step 3. Turn on the switch next to the “Text input for Siri” option.


Step 4: Press and hold the Home button. Instead of the usual chime, the screen will display the question "How can I help?" and a standard keyboard.


Step 5: Simply type your query or command and tap Done.

Siri's response will be displayed as text. If the virtual assistant does not understand the task, you can tap the request and edit it.


External keyboard

Requests to Siri also work with an external keyboard on the iPad. A dedicated Home key (like on the Logitech K811) makes input even more convenient: by pressing the key and typing a command for Siri, the user can perform simple tasks such as sending a message, playing music, or creating a note.

This kind of functionality is especially important now that Apple is positioning the iPad Pro as a computer replacement. iOS is gradually becoming a professional-level operating system that is tightly integrated with the hardware, always connected to the Internet, and constantly in a person's pocket.

Voice assistants are becoming more and more widespread in our daily lives. Most users of the iPhone and other Apple products are familiar with one of them, Siri, but few appreciate the full potential of virtual assistants or know how to use all of their capabilities and functions.

What is a voice assistant

Imagine a devoted friend who is always by your side, ready to talk to you at any time of the day or night, answer any question, and carry out instructions. At the same time, he never gets tired, is never in a bad mood, and every day he becomes smarter and understands you better. That is what the voice assistants available for everyday use today are like.

Voice assistants are built into computers, tablets, phones, smart watches, smart speakers, and even cars. It is important to understand that interaction with a voice assistant happens entirely by voice, hands-free, without pressing any buttons. This is a fundamentally new way for a person to interact with a program, one that is very similar to communication between people. The best-known voice assistants today are:

  • Siri from Apple.
  • Google Assistant from Google.
  • Alexa from Amazon.
  • Alice from Yandex.

We have already written about some of them, and in this article we will talk in detail about Siri.


Voice assistant Siri

Siri was the first voice assistant to support Russian; only later did the domestic Alice appear, released at the end of 2017, and later still, in the summer of 2018, Google Assistant began speaking Russian. Siri recognizes Russian speech quite well, even when music is playing nearby or there is background noise.


Siri on iPhone SE

Siri wasn't always owned by Apple. Initially, it was a separate application in the App Store for iOS. In 2010, Apple acquired Siri Inc. and its unique technology. Shortly after the purchase, Apple built Siri into the iPhone 4S and then into subsequent devices. In 2011, Siri thus became the first product on the personal voice assistant market.

Siri adapts to each user individually, learns their preferences, and begins to understand its "master" better. This is most noticeable in how much better it recognizes your voice after the first weeks of use. You can also tell Siri how to address you and how to pronounce the names of the contacts in your address book, so that it understands you better. And when Siri pronounces a name incorrectly, you can always correct her and show her the right stress.

Siri is available on iPhone, iPad, Mac, Apple Watch, Apple TV and almost all modern cars via the CarPlay function. The way you launch Siri and the list of available commands varies depending on your device.


How to launch Siri on iPhone, iPad and iPod touch

Launch by pressing the Home button

Siri is available on all iPhones starting with the iPhone 4s, running iOS 5 and above. To launch Siri on an iPhone (other than the iPhone X), press and hold the Home button.

To launch Siri on iPhone X, you need to press and hold the side button.

After the beep, you can make a request. On some devices, you must wait for Siri to appear on the screen before giving a command.

Hey Siri - How to enable Siri with your voice

Siri can be launched solely using your voice, without pressing any buttons at all. All you have to do is say, “Hey Siri.” After the sound signal, you can ask a question or give a command.

To do this, the “Hey Siri” function must be activated on the device: Settings → Siri and search → Listen to “Hey Siri”.

On all iPhone models starting with the iPhone 6s, as well as on the iPad Pro, this feature can be used at any time: just say "Hey Siri" loudly enough for the device's microphones to pick it up. On earlier iPhones and iPads, the always-listening feature only works when the device is connected to a charger.

How to enable Siri on headphones

Using an original Apple headset with remote control buttons or compatible Bluetooth headphones, you can activate Siri by holding down the center button or the call button. After the beep, you can make a request.

With Apple's AirPods, launch Siri by double-tapping the outer surface of either earbud.

Siri on Mac

Siri is available on Mac computers running macOS 10.12 Sierra and later. At the moment, however, the functionality of the voice assistant on the Mac is limited. Siri can make FaceTime calls, write messages, play music, show the weather forecast, and help you work with files and folders.


Siri on Mac

It is worth noting that working with files on a computer using the voice assistant is genuinely convenient. Siri can quickly search for files and sort them by type, date, or keyword. For example, if you tell Siri, "Show me my photos from yesterday," a folder with the corresponding media files will open.

There are several ways to activate Siri on a Mac: click the Siri icon in the menu bar or in the Dock, or press and hold the default keyboard shortcut (Command + Space).

There will likely be more commands for Siri in future versions of macOS, including commands for HomeKit. This would be a logical continuation of Apple's integration of the voice assistant into its laptops and desktops.


Siri functions

Siri, a personal assistant, can answer questions, make recommendations, and carry out commands. Let's look at some of them.


This is just a small part of everything Siri can do. You can get acquainted with many more commands in our article about Siri commands. You will find a complete list of commands for the voice assistant on the iPhone and the HomePod smart speaker in our reference mobile application, which we update regularly. You can download the Siri Commands app for free; with it installed, you will always have an up-to-date list of commands for the voice assistant.


