Speech to Text

Arvancloud Video Platform’s AI services can convert the speech in a video into text. You can use this feature to generate subtitles for your videos or to translate the spoken content into another language.

To access this feature, navigate to the “Video Transcript” page from the “Video Platform” menu, under the “AI” section.

Begin the process by clicking the “Create New Job” button. At this point, you have the option to select videos that you have already uploaded to your channels or upload a new video using a link.

After selecting the video, choose the desired service type from the options available, which include “Translation” and “Transcription.”

Translation

With the “Translation” option, an English subtitle file is generated from the Persian speech in your video. To store the file, select an Object Storage Bucket located in the Shahriar region.

For guidance on creating a bucket in Arvancloud, refer to the guide on creating buckets in Object Storage.

Transcription

Choose this option to convert your video’s speech into text. Currently, Persian and English are the only supported input languages.

Once you have selected the input file language and specified the output folder, the video is ready for processing.
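
The choices described so far can be summarized in a short Python sketch. It is only an illustration of the constraints stated in the text (Translation accepts Persian input, Transcription accepts Persian or English, and an output bucket must be selected). The class name, field names, and language codes are hypothetical and do not correspond to any Arvancloud API.

```python
from dataclasses import dataclass

# Input languages accepted by each service type, as described in the text.
# The keys and language codes are hypothetical labels chosen for this sketch.
SERVICE_LANGUAGES = {
    "translation": {"fa"},          # Persian speech -> English subtitles
    "transcription": {"fa", "en"},  # Persian or English speech -> text
}


@dataclass(frozen=True)
class TranscriptJob:
    """Hypothetical record of the options chosen when creating a job."""
    video: str           # an uploaded video or a link to a new one
    service: str         # "translation" or "transcription"
    input_language: str  # "fa" or "en"
    bucket: str          # Object Storage bucket that receives the output

    def problems(self):
        """Return a list of human-readable issues; empty means the job looks valid."""
        issues = []
        if not self.video:
            issues.append("no video selected")
        allowed = SERVICE_LANGUAGES.get(self.service)
        if allowed is None:
            issues.append(f"unknown service type: {self.service!r}")
        elif self.input_language not in allowed:
            issues.append(
                f"{self.service} does not support input language {self.input_language!r}"
            )
        if not self.bucket:
            issues.append("no output bucket selected")
        return issues
```

For example, `TranscriptJob("talk.mp4", "translation", "en", "my-bucket").problems()` reports one issue, because the text describes Translation only for Persian input.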

Note that the generated text may be less accurate for files containing significant non-speech audio, such as music, background noise, or prolonged silences, than for videos with clear, distinct speech.

Finally, click the “Start Upload” button to start processing. The processing time depends on the length of the video. You can monitor the processing status of your files on the “Video Transcript” page.

Once processing completes successfully, the output is available in the selected Object Storage Bucket.
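
If you prefer to check for the output from a script rather than the panel, the following Python sketch lists the object keys in a bucket. It assumes the bucket is reachable through an S3-compatible API and that you pass in a client exposing `list_objects_v2` (for example, a boto3 S3 client). The function name, endpoint, bucket name, and credentials shown are placeholders, not values taken from this guide.

```python
def list_output_keys(client, bucket, prefix=""):
    """Return every object key in `bucket` that starts with `prefix`.

    `client` is assumed to be an S3-compatible client with a
    `list_objects_v2` method, such as one created by boto3.
    """
    keys = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        page = client.list_objects_v2(**kwargs)
        # "Contents" is absent when the page holds no objects.
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
        if not page.get("IsTruncated"):
            return keys
        # More results remain: request the next page.
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


# Hypothetical usage (requires boto3 and valid credentials):
#
#   import boto3
#   client = boto3.client(
#       "s3",
#       endpoint_url="https://<your-object-storage-endpoint>",
#       aws_access_key_id="<access-key>",
#       aws_secret_access_key="<secret-key>",
#   )
#   print(list_output_keys(client, "<your-bucket>"))
```

Passing the client in as an argument keeps the function independent of any particular endpoint or credential setup.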

You can also use the generated text to subtitle the same video on the Arvancloud Video Platform.