Tag Archives: rtmp

2018-03-17 a small video setup

Post Syndicated from Vasil Kolev original https://vasil.ludost.net/blog/?p=3381

I am putting together (for now mostly in my head) a setup for video streaming and recording at the hackerspaces in Bulgaria. The requirements are:

– minimal investment in new hardware;
– (relatively) easy to use (I assume the people there are at least somewhat technically literate);
– the ability to stream to the current platforms, and maybe to their own page as well;
– recording/archiving;
– tolerable quality.

The goal of the setup is to handle the simplest type of event, which is a single speaker with a presentation.

The components are the following:

– audio capture – it can be from the air, but better a lavalier mic on the speaker, plus capturing the room somehow for questions and so on;
– audio amplification – even in a small room it is good to amplify the speaker's sound and play it through some speakers; at the very least it gives feedback on whether their microphone is on;
– video recording – recording the video of the presentation and maybe the speaker themselves while talking. This comes either as the variant with a camera filming the speaker and the screen, or as a screen capture directly from their laptop (or some more complex setup, which is probably not worth writing about here);
– streaming – getting the audio/video signals out over some protocol and streaming them to some service;
– restreaming – having the service distribute it everywhere and maybe record it.

The options for components/setups in my head are the following:

– an ffmpeg command that streams the screen plus audio from the sound card that has a decent microphone plugged into it – we have this in a few variants, tested and working (for windows and linux), and we should upload them somewhere (a minimal sketch is shown after this list). This is the fastest approach and requires almost no extra hardware (apart from a microphone, because the ones in laptops are useless). The microphone can be, for example, some bluetooth/usb headset, or just the mic from a headset, as long as it is close to the speaker's head. It can also be one of the standard lavalier mics used at various events; I have a Chinese digital one that generally makes me happy and costs around 200-something leva from aliexpress;

– a simple small camera that can record video of the screen and audio, and can also push it out over IP somehow. These are basically gopros (if a way can be found to feed audio into them) and some other similar cameras, which do not have particularly good quality (especially the audio, so an external microphone is a must), but people tend to have them around;

– a simple camera that cannot push over IP, but has an HDMI output. This is one of the things people happen to have for various reasons, and this category includes half of the DSLRs and photo cameras out there (the ones that do not overheat after long (2-hour) use), gopros and regular camcorders. It gets combined with a device that can capture HDMI and stream it, where the option for now is a certain Chinese device;

– streaming service – one can use youtube, my streaming, or, if they hate themselves, facebook. Many places should be able to run something simple on their own premises (for example an nginx with the rtmp module), stream to it, have it record, restream from it to other places, and give people some easy way to watch (with video.js/hls.js, like we last did for openfest).
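
To make the first variant above concrete, here is a minimal, untested sketch of such an ffmpeg invocation for a Linux laptop; the display, audio source, resolution, bitrates and the RTMP URL/stream key are placeholders that have to be adjusted for the specific machine and service.

#!/bin/sh
# Sketch only: grab the X11 screen and the default PulseAudio source (the mic)
# and push them to an RTMP ingest. The URL/stream key, resolution and bitrates
# are placeholders; -g 60 gives a keyframe every 2 s at 30 fps, which most
# streaming services require.
ffmpeg \
    -f x11grab -framerate 30 -video_size 1280x720 -i :0.0 \
    -f pulse -i default \
    -c:v libx264 -preset veryfast -b:v 1000k -g 60 -pix_fmt yuv420p \
    -c:a aac -b:a 128k -ar 44100 \
    -f flv 'rtmp://stream.example.org/live/STREAMKEY'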

So, for the moment, the main things I am hunting for are:

– cheap, working microphones;
– cheap, working cameras with an hdmi output (or an ethernet port; the wifi thing is a mess), which are switchable between 50hz and 60hz;
– an hdmi capture option.

Ideas are welcome, and I will try to put one of these together for initLab.

Security updates for Wednesday

Post Syndicated from ris original https://lwn.net/Articles/723690/rss

Security updates have been issued by CentOS (libtirpc and rpcbind), Debian (libtasn1-3, libtasn1-6, and samba), Fedora (FlightGear, openvpn, and python-fedora), openSUSE (libtirpc and libxslt), Oracle (libtirpc and rpcbind), Red Hat (samba, samba3x, and samba4), Scientific Linux (samba and samba4), SUSE (java-1_7_0-ibm, java-1_7_1-ibm, java-1_8_0-ibm, samba, and tomcat), and Ubuntu (jbig2dec, miniupnpc, rtmpdump, and samba).

Security updates for Monday

Post Syndicated from ris original https://lwn.net/Articles/722774/rss

Security updates have been issued by Arch Linux (git, lxc, openvpn, and zziplib), Debian (bind9, bitlbee, postgresql-9.4, rtmpdump, sane-backends, and squirrelmail), Fedora (ghostscript, git, kdelibs, kf5-kauth, libplist, libreoffice, openvpn, php-horde-ingo, qemu, radicale, rpcbind, and xen), and Ubuntu (git and kde4libs).

Security updates for Wednesday

Post Syndicated from ris original https://lwn.net/Articles/722356/rss

Security updates have been issued by CentOS (bind, java-1.7.0-openjdk, qemu-kvm, and thunderbird), Debian (git, libtirpc, lxterminal, radicale, rpcbind, and xen), Fedora (batik, java-1.8.0-openjdk-aarch32, kernel, pcre, and weechat), Gentoo (ffmpeg, firefox, libav, and thunderbird), Red Hat (flash-plugin, jasper, java-1.6.0-ibm, java-1.7.1-ibm, java-1.8.0-ibm, and qemu-kvm), Scientific Linux (jasper and qemu-kvm), and Ubuntu (apache2, batik, fop, freetype, and rtmpdump).

Security advisories for Monday

Post Syndicated from ris original https://lwn.net/Articles/713770/rss

Arch Linux has updated gst-plugins-bad (two vulnerabilities), gst-plugins-base-libs (multiple vulnerabilities), gst-plugins-good (multiple vulnerabilities), gst-plugins-ugly (two vulnerabilities), and gstreamer (denial of service).

CentOS has updated ntp (C7; C6: multiple vulnerabilities), spice (C7: two vulnerabilities), and spice-server (C6: two vulnerabilities).

Debian has updated svgsalamander (server-side request forgery).

Debian-LTS has updated libphp-phpmailer (information disclosure).

Fedora has updated epiphany (F25: multiple vulnerabilities), iio-sensor-proxy (F25: unspecified), jasper (F24: code execution), thunderbird (F25; F24: multiple vulnerabilities), and wavpack (F24: multiple vulnerabilities).

Gentoo has updated rtmpdump (multiple vulnerabilities).

Mageia has updated java-1.8.0-openjdk (multiple vulnerabilities), openssl (three vulnerabilities), php (multiple vulnerabilities), phpmyadmin (two vulnerabilities), and thunderbird (multiple vulnerabilities).

openSUSE has updated cpio (42.2, 42.1: out-of-bounds write), gnutls (42.2, 42.1: multiple vulnerabilities), GraphicsMagick (42.2; 42.1: multiple vulnerabilities), gstreamer-0_10-plugins-bad (42.2: code execution), libgit2 (42.1: multiple vulnerabilities), and virtualbox (42.2: multiple vulnerabilities).

Oracle has updated spice (OL7: two vulnerabilities) and spice-server (OL6: two vulnerabilities).

Red Hat has updated ntp (RHEL6,7: multiple vulnerabilities), spice (RHEL7: two vulnerabilities), and spice-server (RHEL6: two vulnerabilities).

Scientific Linux has updated ntp (SL6,7: multiple vulnerabilities), spice (SL7: two vulnerabilities), and spice-server (SL6: two vulnerabilities).

SUSE has updated spice (SLE12-SP2; SLE12-SP1; SLES12; SLE11-SP4: two vulnerabilities).

2017-01-07 streaming

Post Syndicated from Vasil Kolev original https://vasil.ludost.net/blog/?p=3336

A few observations about streaming and the platforms.
(today I streamed the founding assembly of Да България (of which I am now also a member); I will write on the topic of Да България some other time)

Facebook has one of the most brain-dead streaming platforms I have come across. Besides the various requirements and the upper limit on quality (the maximum is 720p), their events expire rather quickly (i.e. you cannot create one the evening before, like on youtube), one or two interruptions end the event (if you need to tweak your settings a few times, you have to create it all over again), and they also have a limit on duration (which is especially annoying). To top it off, without flash you cannot get the stream to go live no matter what you do, so while running my tests I had to use the windows VM (and my wife's test facebook account).

Youtube, for its part, has a small disagreement with ffmpeg: it sends some keepalive over the RTMP session which ffmpeg completely ignores and never reads, so at some point some tcp buffers fill up (we are talking about 16-ish bytes every minute or two, so it takes a few hours to show up) and the connection breaks. Thank god they do not take the event down as quickly as facebook, so it can be restarted.

My own streaming server worked best (an nginx with mod_nginx_rtmp). Since there were some problems re-encoding everything locally, I pushed the output of the hardware encoder directly to marla at 10mbps, pulled it from there with 3 ffmpegs, and uploaded the stream, squashed down to 1mbps, to facebook, youtube, and the same nginx, so I could watch it myself.

And so that I have the line for pushing to facebook written down (since it took me an afternoon to get it right, mostly because of the fight with that flash) – the two important things are -g 45 (60 also works), because they require a keyframe at least every 2 seconds, and -r 30, because they require 30 fps video. The rest is standard – H.264, AAC, 44100hz (and mono audio, because that is what I was being fed). The -af volume=60d addition should be removed by anyone who is not being fed audio at horrifyingly low levels.

(to everyone who had remarks about the audio: there is no good automatic gain control that I could have put in to properly level out what was coming in. The microphones in the hall came through at different levels, people did not always know where to speak into them, and there was nothing more that could be done. Personally I would have looked for a way to rig as many of the participants as possible with headsets, which our team really, really hates)

ffmpeg -i rtmp://strm.ludost.net/st/XXXXX -r 30 -c:v libx264  -b:v 1000k -s 1280x720 -preset:v veryfast -threads 6 -minrate 1000k -r 30 -g 45 -maxrate 1000k \
	-c:a libfaac -ar 44100 -ac 1 -b:a 128k -af 'volume=60d' -f flv 'rtmp://rtmp-api.facebook.com:80/rtmp/XXXXXX'

Otherwise it is unpleasant that we have to live on proprietary codecs. In some recent tests around FOSDEM it once again turned out that VP9 still cannot be encoded in near real time without at least twice the CPU power needed for H.264, has little support, and has no hardware that can produce it.

2016-11-21 VoctoMix at OpenFest

Post Syndicated from Vasil Kolev original https://vasil.ludost.net/blog/?p=3327

The situation

For OpenFest we had two of those FOSDEM boxes, which basically let us plug almost any video source into them and get it out over the network. They come as a set of two – one is used to plug in the speaker, the other for the camera. With these two boxes and a bit of software video mixing, you can very easily build a good video recording setup for one room.

The diagram of the setup can be seen on github, and it is easy to see that it is quite a bit simpler than the others we use. We are thinking of using a variant of it for FOSDEM 2017 (which can be followed in the github repos – issues, wiki and all sorts of things).

VoctoMix

The missing component in the whole thing was a software mixer for us to use. We tried various ones – first an ffmpeg with a few patches (it breaks too easily), then OBS (which leaks memory like crazy and is not particularly stable), and finally we settled on voctomix, which is developed by the CCC and is basically a wonderful hacker tool that works as follows:

  • It has inputs on TCP ports for the following:
    • Video streams (cameras, the speaker's laptop)
    • Supporting streams (background, what to show while we are not live, etc.)
    • Commands for various actions (switching the picture, etc.)
  • Outputs, again over TCP, for:
    • The video stream
    • The audio stream
    • A copy of every incoming stream
    • Previews of the streams and of the outgoing picture

The software basically just switches between a few things (some picture fullscreen, picture-in-picture in various arrangements, etc.) and produces a stream that can be used. There is a separate application (voctogui) that attaches to it and is used as a console – it can show previews of the streams and send commands to the main process (voctocore).

How we used it

Sending from the boxes

To begin with, getting a stream out of the boxes is done with ffmpeg/avconv, over UDP, over multicast. UDP, because it is more resilient to random interruptions and will not cause desynchronization; multicast, so that it can be watched from more than one place (for example to check what exactly is coming out). The command looks like this:

# these are needed, because the default socket size is too small.
echo 81921024 > /proc/sys/net/core/wmem_max
echo 81921024 > /proc/sys/net/core/wmem_default

echo 81921024 > /proc/sys/net/core/rmem_max
echo 81921024 > /proc/sys/net/core/rmem_default

/usr/local/bin/bmd-streamer -f /usr/lib/firmware -k 1000 -S hdmi -F 0 | \
 ffmpeg -i - -c copy -f mpegts 'udp://227.0.0.1:9000?overrun_nonfatal=1&buffer_size=81921024&fifo_size=178481'

The interesting part here is the parameters of the UDP stream – giant buffers (which are also set on the kernel above), so that no matter what happens, writing into the buffer is not delayed. In general, losing packets is not a problem, but it is quite bad to get a delay in the whole stream, because it leads to desynchronization. (Losing packets is also bad, and to address it I am working on something that adds forward error correction to this stream; a good person has written a patch for ffmpeg implementing pro-mpeg, which has exactly that functionality, and I hope we manage to use it at FOSDEM.)

Receiving in voctomix

Voctocore itself accepts the streams in exactly the format it is configured for (in our case 1280×720, 30fps, audio as pcm_s16le at 44100hz), in an MKV container. To that end, the scripts that feed it look like this:

#!/bin/sh
confdir="`dirname "$0"`/../"
. $confdir/default-config.sh
if [ -f $confdir/config.sh ]; then
    . $confdir/config.sh
fi


ffmpeg -y -nostdin \
    -i 'udp://227.0.0.1:9000?overrun_nonfatal=1&buffer_size=81921024&fifo_size=178481' \
    -ac 2 \
    -filter_complex "
        [0:v] scale=$WIDTH:$HEIGHT,fps=$FRAMERATE,setdar=16/9,setsar=1 [v] ;
        [0:a] aresample=$AUDIORATE [a]
    " \
    -map "[v]" -map "[a]" \
    -pix_fmt yuv420p \
    -c:v rawvideo \
    -c:a pcm_s16le \
    -f matroska \
    tcp://localhost:10000

This basically says "take the udp stream, scale it to whatever we want, bend the pixels and the aspect ratio into exactly what we need, and send it as mkv to port 10000". Bending the pixels and the aspect ratio (setsar, setdar) is needed mainly when the camera's output cannot be changed and comes in as something strange like 1920×1088, which leads to slightly differently shaped pixels.

For the whole thing to work, we have two such scripts (one per box), as well as a similar one that simply loops a PNG, which serves as the background (a sketch of that one is below). In the original scripts people used a video as the background for picture-in-picture, but that is more confusing for viewers and we do not use it.
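
For completeness, a rough sketch of what that background feeder could look like; the PNG path and the background port (16000 here) are assumptions and have to match the actual voctocore configuration, and the variables come from the same config files as in the script above.

#!/bin/sh
# Sketch of the background feeder: loop a single PNG plus silent audio into
# voctocore. The PNG path and the port (16000) are assumptions; WIDTH, HEIGHT,
# FRAMERATE and AUDIORATE come from the same config as above.
confdir="`dirname "$0"`/../"
. $confdir/default-config.sh
if [ -f $confdir/config.sh ]; then
    . $confdir/config.sh
fi

ffmpeg -y -nostdin \
    -loop 1 -re -i "$confdir/background.png" \
    -f lavfi -i "anullsrc=r=$AUDIORATE:cl=stereo" \
    -filter_complex "
        [0:v] scale=$WIDTH:$HEIGHT,fps=$FRAMERATE,setdar=16/9,setsar=1 [v]
    " \
    -map "[v]" -map 1:a \
    -pix_fmt yuv420p \
    -c:v rawvideo \
    -c:a pcm_s16le \
    -f matroska \
    tcp://localhost:16000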

Streaming and recording from voctomix

The streaming and the recording are basically very similar scripts, so I will show only the one that sends to the restreamer:

#!/bin/sh
ffmpeg -y -nostdin \
    -i tcp://localhost:15000 \
    -threads:0 0 \
    -aspect 16:9 \
    -c:v libx264 \
    -maxrate:v:0 2000k -bufsize:v:0 8192k \
    -pix_fmt:0 yuv420p -profile:v:0 main -b:v 512k \
    -preset:v:0 ultrafast \
    \
    -ac 1 -c:a libfdk_aac -b:a 96k -ar 44100 \
    -map 0:v \
    -map 0:a -filter:a:0 pan=mono:c0=FL \
    -ac:a:2 2 \
    \
    -y -f flv rtmp://10.23.0.1:1935/st/STREAM

(the script is an example, since I kept tweaking things afterwards)

Overall, the raw data is simply taken from port 15000, encoded to H.264 and sent to the server. In the same way it can be converted to WEBM and pushed out, but that takes a lot more CPU time and we have not gotten to the point of using it.
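
As an illustration of the WEBM variant mentioned above (not something we actually ran), here is a sketch with the same raw input, encoded as VP9/Opus; the realtime settings, the bitrate and the output file are assumptions.

#!/bin/sh
# Sketch of the WEBM variant: same raw feed from voctocore, encoded as
# VP9/Opus. The deadline/cpu-used trade-off, the bitrate and the output file
# are assumptions, not settings we actually used.
ffmpeg -y -nostdin \
    -i tcp://localhost:15000 \
    -c:v libvpx-vp9 -deadline realtime -cpu-used 8 -b:v 1500k \
    -c:a libopus -b:a 96k -ar 48000 -ac 1 \
    -f webm /srv/records/stream.webm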

Extras for voctomix

Something we did not include at OpenFest but will have at FOSDEM is a small addition that lets people control voctomix remotely with very few resources. In principle voctogui is not a light process and has very serious network requirements if it is not started locally (on the order of 1Gbps just for it), but it allows all sorts of horrifying things with a little extra code. With a simple script that takes a screenshot once a second, and a very simple one that sends commands, we will have a way for specific people to have control over the broadcast (a rough sketch of both follows).
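
A rough sketch of both helper scripts, with loud assumptions: that voctocore's text control port is 9999 and that it understands commands like set_video_a and set_composite_mode (check against the voctomix version in use), and that voctogui runs on an X display where ImageMagick's import can grab the screen.

#!/bin/sh
# screenshot.sh (sketch): grab the screen where voctogui runs, once a second.
# Assumes an X display at :0 and ImageMagick's "import"; the output path is a
# placeholder for wherever the remote people can fetch it from.
while true; do
    DISPLAY=:0 import -window root /var/www/html/voctogui.png
    sleep 1
done

#!/bin/sh
# control.sh (sketch): pass a single command to voctocore's control port.
# The port (9999) and command names like "set_video_a cam1" or
# "set_composite_mode fullscreen" are assumptions about the voctomix version.
echo "$@" | nc -q 1 localhost 9999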

Another thing we used to some extent at OpenFest for monitoring the streams is another script with mpv, which takes a list of URLs and names for them and opens them in separate windows arranged next to each other on the screen, overlaying a bar with the audio level on each one, so you can see whether it is okay (since it is not practically possible to listen to several rooms at once). Its problem is that it takes quite a bit of CPU time to decode all the streams, and a T420 with an i7 was struggling with the 6 streams from the fest. You can see what the screen looked like here (a rough sketch of the idea is below).
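
A rough sketch of the monitoring idea; the URLs, window sizes and positions are placeholders, and the lavfi-complex graph simply draws ffmpeg's showvolume meter over each video.

#!/bin/sh
# Sketch of the monitoring script: one mpv window per stream, tiled with
# --geometry, with an audio level bar overlaid via showvolume. URLs, sizes and
# positions are placeholders; vid1/aid1/vo/ao are mpv's labels for the first
# video/audio track and the final outputs.
i=0
for url in rtmp://stream.example.org/st/hall1 rtmp://stream.example.org/st/hall2; do
    mpv --really-quiet --no-border \
        --geometry="640x360+$((640 * i))+0" \
        --lavfi-complex='[aid1]asplit[ao][a1];[a1]showvolume=w=640:h=20[sv];[vid1][sv]overlay=0:0[vo]' \
        "$url" &
    i=$((i + 1))
done
wait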

Operation

Working with voctomix is not complicated, but for now it is full of things that have to be done by hand. Here is what the startup process looks like (for now; we are working on automation):

  • voctocore
  • voctogui
  • the scripts for receiving from the cameras (cam1.sh, grab.sh)
  • the background-generation script
  • the streaming script
  • recording (record.sh)

After that, the different pictures are switched from voctogui as needed. Overall it is much easier to understand than the bigger setups with converters and so on, but it also has fewer features.
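
Since the automation is not done yet, here is only a sketch of what a start-everything script could look like, using tmux so each component gets its own window; the paths, the script names and the way voctocore/voctogui are launched are assumptions based on the components listed above.

#!/bin/sh
# Sketch of a start-everything script: one tmux window per component, started
# in the order listed above. The paths and the exact voctocore/voctogui
# invocations are assumptions.
SESSION=voctomix
BASE=/opt/voctomix-scripts

tmux new-session -d -s "$SESSION" -n core 'voctocore'
sleep 2   # give voctocore time to open its ports before the feeders connect
tmux new-window -t "$SESSION" -n gui  'voctogui'
tmux new-window -t "$SESSION" -n cam1 "$BASE/cam1.sh"
tmux new-window -t "$SESSION" -n grab "$BASE/grab.sh"
tmux new-window -t "$SESSION" -n bg   "$BASE/bg.sh"
tmux new-window -t "$SESSION" -n strm "$BASE/stream.sh"
tmux new-window -t "$SESSION" -n rec  "$BASE/record.sh"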

How well does it work?

It works wonderfully, although we are trying to track down a bug with the audio getting delayed, which shows up at some point. I am starting to think there is some problem with the laptop we do the mixing on itself.

What more could we want?

We came up with a few extras to add, so that we reach the functionality of the hardware setup:

  • A way to feed the projector's picture from our end. This will require some work to get the latency of the whole thing under 100ms, because otherwise it will be quite noticeable (imagine the speaker doing something and the projector changing 5 seconds later). One of the options we came up with is to hook the projector up to a pi that can directly pick which multicast to watch (a camera, the speaker's laptop, or something else); a sketch of that idea follows after this list.
  • Overlay titles during the talk – we need to see what else has to be touched; I think there is some functionality like that (or we can put them in the background).
  • Better synchronization of the different streams – if we work with several cameras, it may turn out to be a problem that one runs a few frames behind another, and we will have to play with delays.
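
A sketch of the projector-pi idea from the first item above; the multicast group is the one used earlier in this post, and the flags are only a guess at a low-latency configuration, which would have to be measured against the 100ms target.

#!/bin/sh
# Sketch for the projector pi: play one of the multicast feeds with caching
# and buffering turned down. Whether this actually gets under 100ms has to be
# measured; these are just the obvious knobs.
SOURCE="${1:-udp://227.0.0.1:9000}"   # pass a different group to switch feeds
exec mpv --fs --no-cache --untimed \
    --demuxer-lavf-o=fflags=+nobuffer \
    "$SOURCE"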

Overall I am very happy with voctomix, and if I manage to convince the team, next year we can use it much more than the purely hardware setup (we will just need powerful machines to cope with the encoding, because for now we only manage to work at 720p without setting the laptop on fire).

Bringing the Viewer In: The Video Opportunity in Virtual Reality

Post Syndicated from mikesefanov original https://yahooeng.tumblr.com/post/151940036881

By Satender Saroha, Video Engineering

Virtual reality (VR) 360° videos are the next frontier of how we engage with and consume content. Unlike a traditional scenario in which a person views a screen in front of them, VR places the user inside an immersive experience. A viewer is “in” the story, and not on the sidelines as an observer.

Ivan Sutherland, widely regarded as the father of computer graphics, laid out the vision for virtual reality in his famous speech, “Ultimate Display” in 1965 [1]. In that he said, “You shouldn’t think of a computer screen as a way to display information, but rather as a window into a virtual world that could eventually look real, sound real, move real, interact real, and feel real.”

Over the years, significant advancements have been made to bring reality closer to that vision. With the advent of headgear capable of rendering 3D spatial audio and video, realistic sound and visuals can be virtually reproduced, delivering immersive experiences to consumers.

When it comes to entertainment and sports, streaming in VR has become the new 4K HEVC/UHD of 2016. This has been accelerated by the release of new camera capture hardware like GoPro and streaming capabilities such as 360° video streaming from Facebook and YouTube. Yahoo streams lots of engaging sports, finance, news, and entertainment video content to tens of millions of users. The opportunity to produce and stream such content in 360° VR opens a unique opportunity to Yahoo to offer new types of engagement, and bring the users a sense of depth and visceral presence.

While this is not an experience that is live in product, it is an area we are actively exploring. In this blog post, we take a look at what’s involved in building an end-to-end VR streaming workflow for both Live and Video on Demand (VOD). Our experiments and research go from camera rig setup, to video stitching, to encoding, to the eventual rendering of videos in video players on desktops and VR headsets. We also discuss challenges yet to be solved and the opportunities they present in streaming VR.

1. The Workflow

Yahoo’s video platform has a workflow that is used internally to enable streaming to an audience of tens of millions with the click of a few buttons. During experimentation, we enhanced this same proven platform and set of APIs to build a complete 360°/VR experience. The diagram below shows the end-to-end workflow for streaming 360°/VR that we built on Yahoo’s video platform.

Figure 1: VR Streaming Workflow at Yahoo

1.1. Capturing 360° video

In order to capture a virtual reality video, you need access to a 360°-capable video camera. Such a camera uses either fish-eye lenses or has an array of wide-angle lenses to collectively cover a 360 (θ) by 180 (ϕ) sphere as shown below.

Though it sounds simple, there is a real challenge in capturing a scene in 3D 360° as most of the 360° video cameras offer only 2D 360° video capture.

In initial experiments, we tried capturing 3D video using two cameras side-by-side for the left and right eyes, arranged in a spherical shape. However, this required too many cameras – instead, we use view interpolation in the stitching step to create virtual cameras.

Another important consideration with 360° video is the number of axes the camera is capturing video with. In traditional 360° video that is captured using only a single axis (what we refer to as horizontal video), a user can turn their head from left to right. But this setup of cameras does not support a user tilting their head at 90°.

To achieve true 3D in our setup, we went with 6-12 GoPro cameras having 120° field of view (FOV) arranged in a ring, and an additional camera each on top and bottom, with each one outputting 2.7K at 30 FPS.

1.2. Stitching 360° video

Projection Layouts

Because a 360° view is a spherical video, the surface of this sphere needs to be projected onto a planar surface in 2D so that video encoders can process it. There are two popular layouts:

Equirectangular layout: This is the most widely-used format in computer graphics to represent spherical surfaces in a rectangular form with an aspect ratio of 2:1. This format has redundant information at the poles which means some pixels are over-represented, introducing distortions at the poles compared to the equator (as can be seen in the equirectangular mapping of the sphere below).

Figure 2: Equirectangular Layout [2]

CubeMap layout: CubeMap layout is a format that has also been used in computer graphics. It contains six individual 2D textures that map to six sides of a cube. The figure below is a typical cubemap representation. In a cubemap layout, the sphere is projected onto six faces and the images are folded out into a 2D image, so pieces of a video frame map to different parts of a cube, which leads to extremely efficient compact packing. Cubemap layouts require about 25% fewer pixels compared to equirectangular layouts.

Figure 3: CubeMap Layout [3]

Stitching Videos

In our setup, we experimented with a couple of stitching software packages. One was from Vahana VR [4], and the other was a modified version of the open-source Surround360 technology that works with a GoPro rig [5]. Both output equirectangular panoramas for the left and the right eye. Here are the steps involved in stitching together a 360° image:

Raw frame image processing: Converts uncompressed raw video data to RGB, which involves several steps, starting from black-level adjustment, to applying demosaicing algorithms in order to figure out the RGB color parts for each pixel based on the surrounding pixels. This also involves gamma correction, color correction, and anti-vignetting (undoing the reduction in brightness at the image periphery). Finally, this stage applies sharpening and noise-reduction algorithms to enhance the image and suppress the noise.

Calibration: During the calibration step, stitching software takes steps to avoid vertical parallax while stitching overlapping portions in adjacent cameras in the rig. The purpose is to align everything in the scene, so that both eyes see every point at the same vertical coordinate. This step essentially matches the key points in images among adjacent camera pairs. It uses computer vision algorithms for feature detection like Binary Robust Invariant Scalable Keypoints (BRISK) [6] and AKAZE [7].

Optical Flow: During stitching, to cover the gaps between adjacent real cameras and provide interpolated view, optical flow is used to create virtual cameras. The optical flow algorithm finds the pattern of apparent motion of image objects between two consecutive frames caused by the movement of the object or camera. It uses OpenCV algorithms to find the optical flow [8].

Below are the frames produced by the GoPro camera rig:

Figure 4: Individual frames from 12-camera rig

Figure 5: Stitched frame output with PtGui

Figure 6: Stitched frame with barrel distortion using Surround360

Figure 7: Stitched frame after removing barrel distortion using Surround360

To get the full depth in stereo, the rig is set up so that i = r * sin(FOV/2 – 360/n), where:

  • i = IPD/2, where IPD is the inter-pupillary distance between the eyes.
  • r = Radius of the rig.
  • FOV = Field of view of GoPro cameras, 120 degrees.
  • n = Number of cameras which is 12 in our setup.

Given that IPD is normally 6.4 cm, i should be greater than 3.2 cm. This implies that with a 12-camera setup, the radius of the rig comes to 14 cm. Usually, if there are more cameras it is easier to avoid black stripes.

Reducing Bandwidth – FOV-based adaptive transcoding

For a truly immersive experience, users expect 4K (3840 x 2160) quality resolution at 60 frames per second (FPS) or higher. Given that typical HMDs have a FOV of 120 degrees, a full 360° video needs a resolution of at least 12K (11520 x 6480). 4K streaming needs a bandwidth of 25 Mbps [9], so for 12K resolution this effectively translates to > 75 Mbps, and even more for higher framerates. However, the average wifi connection in the US has a bandwidth of 15 Mbps [10].

One way to address the bandwidth issue is by reducing the resolution of areas that are out of the field of view. Spatial sub-sampling is used during transcoding to produce multiple viewport-specific streams. Each viewport-specific stream has high resolution in a given viewport and low resolution in the rest of the sphere.

On the player side, we can modify traditional adaptive streaming logic to take the field of view into account. Depending on the video, if the user moves their head around a lot, this could trigger multiple buffer fetches and result in rebuffering. Ideally, this will work best for videos where the excessive motion happens in one field of view at a time and does not span multiple fields of view at the same time. This work is still in an experimental stage.

The default output format from stitching software of both Surround360 and Vahana VR is equirectangular format. In order to reduce the size further, we pass it through a cubemap filter transform integrated into ffmpeg to get an additional pixel reduction of ~25%  [11] [12].
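
The integration described above used Facebook's transform filter [11]; purely as an illustration of the same equirectangular-to-cubemap step, newer stock ffmpeg builds ship a v360 filter that can do a comparable conversion (a substitute for, not the filter the team integrated):

#!/bin/sh
# Illustration only: equirectangular to 3x2 cubemap with ffmpeg's stock v360
# filter, as a stand-in for the integrated transform filter [11]. File names
# and encoder settings are placeholders.
ffmpeg -i equirect_input.mp4 \
    -vf "v360=input=equirect:output=c3x2" \
    -c:v libx264 -crf 18 -c:a copy \
    cubemap_output.mp4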

At the end of the above steps, the stitching pipeline produces high-resolution stereo 3D panoramas, which are then ingested into the existing Yahoo Video transcoding pipeline to produce multi-bitrate HLS streams.

1.3. Adding a stitching step to the encoding pipeline

Live – In order to prepare for multi-bitrate streaming over the Internet, a live 360° video-stitched stream in RTMP is ingested into Yahoo’s video platform. A live Elemental encoder was used to re-encode and package the live input into multiple bitrates for adaptive streaming on any device (iOS, Android, browser, Windows, Mac, etc.).

Video on Demand – The existing Yahoo video transcoding pipeline was used to package multi-bitrate HLS streams from raw equirectangular mp4 source videos.

1.4. Rendering 360° video into the player

The spherical video stream is delivered to the Yahoo player in multiple bit rates. As a user changes their viewing angle, different portions of the frame are shown, presenting a 360° immersive experience. There are two types of VR players currently supported at Yahoo:

WebVR-based Javascript Player – The Web community has been very active in enabling VR experiences natively, without plugins, from within browsers. The W3C has a Javascript proposal [13] which describes support for accessing virtual reality (VR) devices, including sensors and head-mounted displays, on the Web. VRDisplay is the main starting point for all the device APIs supported. Some of the key interfaces and attributes exposed are:

  • VRDisplayCapabilities: has attributes indicating position support, orientation support, and whether there is an external display.
  • VRLayer: contains the HTML5 canvas element which is presented by the VRDisplay when its submitFrame is called. It also contains attributes defining the left-bound and right-bound textures within the source canvas for presenting to each eye.
  • VREyeParameters: has the information required to correctly render a scene for a given eye. For each eye, it has the offset – the distance from the middle of the user’s eyes to the center point of one eye, which is half of the interpupillary distance (IPD). In addition, it maintains the current FOV of the eye, and the recommended renderWidth and renderHeight of each eye’s viewport.
  • getVRDisplays: returns the list of VRDisplays (HMDs) accessible to the browser.

We implemented a subset of the WebVR spec in the Yahoo player (not in production yet) that lets you watch monoscopic and stereoscopic 3D video on supported web browsers (Chrome, Firefox, Samsung), including Oculus Gear VR-enabled phones. The Yahoo player takes the equirectangular video and maps its individual frames onto an HTML5 canvas element. It uses the WebGL and Three.js libraries to do the computations for detecting the orientation and extracting the corresponding frames to display.

For web devices which support only monoscopic rendering, like desktop browsers without an HMD, it creates a single PerspectiveCamera object specifying the FOV and aspect ratio. As the device’s requestAnimationFrame is called, it renders the new frames. As part of rendering a frame, it first calculates the projection matrix for the FOV and sets the X (user’s right), Y (up), Z (behind the user) coordinates of the camera position.

For devices that support stereoscopic rendering, like mobile phones with Samsung Gear, the webvr player creates two PerspectiveCamera objects, one for the left eye and one for the right eye. Each PerspectiveCamera queries the VR device capabilities to get the eye parameters like FOV, renderWidth and renderHeight every time a frame needs to be rendered, at the native refresh rate of the HMD. The key difference between stereoscopic and monoscopic is the perceived sense of depth that the user experiences, as the video frames, separated by an offset, are rendered by separate canvas elements to each individual eye.

Cardboard VR – Google provides a VR sdk for both iOS and Android [14]. This simplifies common VR tasks like lens distortion correction, spatial audio, head tracking, and stereoscopic side-by-side rendering. For iOS, we integrated Cardboard VR functionality into our Yahoo Video SDK, so that users can watch stereoscopic 3D videos on iOS using Google Cardboard.

2. Results

With all the pieces in place, and experimentation done, we were able to successfully do 360° live streaming of an internal company-wide event.

Figure 8: 360° Live streaming of Yahoo internal event

In addition to demonstrating our live streaming capabilities, we are also experimenting with showing 360° VOD videos produced with a GoPro-based camera rig. Here is a screenshot of one of the 360° videos being played in the Yahoo player.

Figure 9: Yahoo Studios produced 360° VOD content in the Yahoo Player

3. Challenges and Opportunities

3.1. Enormous amounts of data

As we alluded to in the video processing section of this post, delivering 4K resolution videos for each eye for each FOV at a high frame rate remains a challenge. While FOV-adaptive streaming does reduce the size by providing high resolution streams separately for each FOV, providing an impeccable 60 FPS or higher viewing experience still requires a lot more data than the current internet pipes can handle. Some of the other possible options we are paying close attention to are:

Compression efficiency with HEVC and VP9 – New codecs like HEVC and VP9 have the potential to provide significant compression gains. HEVC open source codecs like x265 have shown a 40% compression performance gain compared to the currently ubiquitous H.264/AVC codec. Likewise, the VP9 codec from Google has shown similar 40% compression performance gains. The key challenge is the hardware decoding support and the browser support. But with Apple and Microsoft very much behind HEVC, and Firefox and Chrome already supporting VP9, we believe most browsers will support HEVC or VP9 within a year.

Using 10 bit color depth vs 8 bit color depth – Traditional monitors support 8 bpc (bits per channel) for displaying images. Given each pixel has 3 channels (RGB), 8 bpc maps to 256x256x256 color/luminosity combinations, representing 16 million colors. With 10 bit color depth, you have the potential to represent even more colors. But the biggest stated advantage of using 10 bit color depth is with respect to compression during encoding, even if the source only uses 8 bits per channel [16]. Both the x264 and x265 codecs support 10 bit color depth, and ffmpeg already supports encoding at 10 bit color depth.
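
As a hedged illustration of the 10-bit point (not a command from Yahoo's pipeline), an HEVC encode of an 8-bit source at 10-bit color depth with ffmpeg and x265 could look like this, assuming an ffmpeg build whose libx265 supports 10-bit:

#!/bin/sh
# Illustration of encoding an 8-bit source at 10 bit color depth with x265.
# Requires libx265 built with 10-bit support; file names and the CRF value
# are placeholders.
ffmpeg -i input_8bit.mp4 \
    -c:v libx265 -pix_fmt yuv420p10le -crf 22 -preset medium \
    -c:a copy \
    output_10bit_hevc.mp4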

3.2. Six degrees of freedom

With current camera rig workflows, users viewing the streams through an HMD are able to achieve three degrees of freedom (DoF), i.e., the ability to move up/down, clockwise/anti-clockwise, and swivel. But you still can’t get a different perspective when you move inside the scene, i.e., move forward/backward. Until now, this true six-DoF immersive VR experience has only been possible in CG VR games. In video streaming, LightField technology-based video cameras produced by Lytro are the first ones to capture light field volume data from all directions [15]. But Lightfield-based videos require an order of magnitude more data than traditional fixed-FOV, fixed-IPD, fixed-lens camera rigs like GoPro. As bandwidth problems get resolved via better compression and better networks, achieving true immersion should be possible.

4. Conclusion

VR streaming is an emerging medium and with the addition of 360° VR playback capability, Yahoo’s video platform provides us a great starting point to explore the opportunities in video with regard to virtual reality. As we continue to work to delight our users by showing immersive video content, we remain focused on optimizing the rendering of high-quality 4K content in our players. We’re looking at building FOV-based adaptive streaming capabilities and better compression during delivery. These capabilities, and the enhancement of our webvr player to play on more HMDs like HTC Vive and Oculus Rift, will set us on track to offer streaming capabilities across the entire spectrum. At the same time, we are keeping a close watch on advancements in supporting spatial audio experiences, as well as advancements in the ability to stream volumetric lightfield videos to achieve true six degrees of freedom, with the aim of realizing the true potential of VR.

Glossary – VR concepts:

VR – Virtual reality, commonly referred to as VR, is an immersive computer-simulated reality experience that places viewers inside an experience. It “transports” viewers from their physical reality into a closed virtual reality. VR usually requires a headset device that takes care of sights and sounds, while the most-involved experiences can include external motion tracking, and sensory inputs like touch and smell. For example, when you put on VR headgear you suddenly start feeling immersed in the sounds and sights of another universe, like the deck of the Star Trek Enterprise. Though you remain physically at your place, VR technology is designed to manipulate your senses in a manner that makes you truly feel as if you are on that ship, moving through the virtual environment and interacting with the crew.

360 degree video – A 360° video is created with a camera system that simultaneously records all 360 degrees of a scene. It is a flat equirectangular video projection that is morphed into a sphere for playback on a VR headset. A standard world map is an example of equirectangular projection, which maps the surface of the world (sphere) onto orthogonal coordinates.

Spatial Audio – Spatial audio gives the creator the ability to place sound around the user. Unlike traditional mono/stereo/surround audio, it responds to head rotation in sync with video. While listening to spatial audio content, the user receives a real-time binaural rendering of an audio stream [17].

FOV – A human can naturally see 170 degrees of viewable area (field of view). Most consumer-grade head mounted displays (HMDs), like Oculus Rift and HTC Vive, now display 90 to 120 degrees.

Monoscopic video – A monoscopic video means that both eyes see a single flat image, or video file. A common camera setup involves six cameras filming six different fields of view. Stitching software is then used to form a single equirectangular video. The maximum output resolution for 2D monoscopic videos on Gear VR is 3480×1920 at 30 frames per second.

Presence – Presence is a kind of immersion where the low-level systems of the brain are tricked to such an extent that they react just as they would to non-virtual stimuli.

Latency – It’s the time between when you move your head, and when you see physical updates on the screen. An acceptable latency is anywhere from 11 ms (for games) to 20 ms (for watching 360 vr videos).

Head Tracking – There are two forms:

  • Positional tracking – movements and related translations of your body, e.g. swaying side to side.
  • Traditional head tracking – left, right, up, down, and roll, like a clock’s rotation.

References:

[1] Ultimate Display Speech as reminisced by Fred Brooks: http://www.roadtovr.com/fred-brooks-ivan-sutherlands-1965-ultimate-display-speech/

[2] Equirectangular Layout Image: https://www.flickr.com/photos/[email protected]/10111691364/

[3] CubeMap Layout: http://learnopengl.com/img/advanced/cubemaps_skybox.png

[4] Vahana VR: http://www.video-stitch.com/

[5] Surround360 Stitching software: https://github.com/facebook/Surround360

[6] Computer Vision Algorithm BRISK: https://www.robots.ox.ac.uk/~vgg/rg/papers/brisk.pdf

[7] Computer Vision Algorithm AKAZE: http://docs.opencv.org/3.0-beta/doc/tutorials/features2d/akaze_matching/akaze_matching.html

[8] Optical Flow: http://docs.opencv.org/trunk/d7/d8b/tutorial_py_lucas_kanade.html

[9] 4K connection speeds: https://help.netflix.com/en/node/306

[10] Average connection speeds in US: https://www.akamai.com/us/en/about/news/press/2016-press/akamai-releases-fourth-quarter-2015-state-of-the-internet-report.jsp

[11] CubeMap transform filter for ffmpeg: https://github.com/facebook/transform

[12] FFMPEG software: https://ffmpeg.org/

[13] WebVR Spec: https://w3c.github.io/webvr/

[14] Google Daydream SDK: https://vr.google.com/cardboard/developers/

[15] Lytro LightField Volume for six DoF: https://www.lytro.com/press/releases/lytro-immerge-the-worlds-first-professional-light-field-solution-for-cinematic-vr

[16] 10 bit color depth: https://gist.github.com/l4n9th4n9/4459997

RTMP Api for Node.JS to ease the implementation of RTMP servers and clients

Post Syndicated from Delian Delchev original http://deliantech.blogspot.com/2014/06/rtmp-api-for-nodejs-to-ease.html

Hello to all. As I mentioned before, I needed to implement an RTMP streaming server in Node.JS. All of the available modules for implementing RTMP in Node’s NPM repository were incomplete, incorrect or unusable. Not only that, but the librtmp used by libav tools like avconv and avplay was incorrect and incomplete. The same goes for most of the implementations I’ve checked (covering perl, python and others). I tried to fix a few of them, but in the end I had to write one of my own.

This is my library of RTMP-related tools and API for Node.JS. It is named node-rtmpapi and is available in the npm repository. You can also get it here – https://github.com/delian/node-rtmpapi

It works well for me, and it has been tested with MistServer, OrbanEncoders and librtmp (from libav). That does not mean it will work for you, though :)

RTMP is a quite badly documented and extremely badly implemented protocol. During my tests I have seen issues like crashes of libraries (including Adobe’s original one) if the upper-layer commands are sent in an unexpected order (although this is allowed by the RTMP protocol, and the order of the upper-layer commands is not documented at all). I have also seen (within Adobe’s rtmp library) an incorrect implementation of the setPeerBandwidth command. Generally, each RTMP implementation is on its own, and the only way to make it work is to adjust and tune it according to the software you communicate with. Therefore I separated my code into utils that allow me to write my own RTMP server relatively easily and to adjust it according to my needs.

The current library supports only TCP as a transport (although TLS and HTTP/HTTPS would be easy to implement, I haven’t focused on that yet). It provides separate code that implements streaming (readQueue), the chunk layer of the protocol (rtmpChunk), the upper-layer messaging (assembling and disassembling of messages over chunks, rtmpMessage), stream processing (rtmpStream) and a basic server implementation without the business logic (rtmpServer). Simplified documentation is provided at the github repository.

The current library uses callbacks for each upper-layer command it receives. I am planning to migrate the code to use node streams and to trigger events per command, instead of callbacks. This will greatly simplify the usage and the understanding of the library for a node programmer. However, this is the future, and in order to preserve compatibility I will probably name it something different (like node-streams-rtmpapi).

AMF0/3 encoding/decoding in Node.JS

Post Syndicated from Delian Delchev original http://deliantech.blogspot.com/2014/06/amf03-encodingdecoding-in-nodejs.html

I am writing my own RTMP restreamer (RTMP is Adobe’s dying streaming protocol, widely used with Flash) in Node.JS.

Although there are quite a few RTMP modules, none is complete, nor operates with Node.JS buffers, nor fully supports either AMF0 or AMF3 encoding and decoding. So I had to write one of my own.

The first module is the AMF0/AMF3 utils that allow me to encode or decode AMF data. AMF is a binary encoding used in Adobe’s protocols, very similar to BER (used in ITU’s protocols) but supporting complex objects. In general, the goal of AMF is to encode ActionScript objects into binary. As ActionScript is a language belonging to the JavaScript family, ActionScript objects are basically javascript objects (with the exception of some simplified arrays).

My module is named node-amfutils and is now available in the public NPM repository, as well as here – https://github.com/delian/node-amfutils

It is not fully complete nor very well tested, as I have a very limited environment to do the tests in. However, it works for me and provides the best AMF0 and AMF3 support currently available for Node.JS:

  • It can encode/decode all the objects defined in both AMF0 and AMF3 (the other AMF modules in the npm repository support only partial AMF0 or partial AMF3)
  • It uses Node.JS buffers (it is not necessary to do string to buffer to string conversion, as you have to do with the other modules)

It is easy to use this module. You just have to do something like this:

var amfUtils = require('node-amfutils');
var buffer = amfUtils.amf0Encode([{ a: "xxx" }, { b: null }]);


I am writing my own RTMP restreamer (RTMP is Adobe’s dying streaming protocol widely used with Flash) in Node.JS.Although, there are quite of few RTMP modules, no one is complete, nor operates with Node.JS buffers, nor support fully ether AMF0 or AMF3 encoding and decoding.So I had to write one on my own.The first module is the AMF0/AMF3 utils that allow me to encode or decode AMF data. AMF is a binary encoding used in Adobe’s protocols, very similar to BER (used in ITU’s protocols) but supporting complex objects. In general the goal of AMF is to encode ActiveScript objects into binary. As ActiveScript is a language belonging to the JavaScript’s familly, basically the ActiveScript’s objects are javascript objects (with the exception of some simplified arrays).My module is named node-amfutils and is now available in the public NPM repository as well as here https://github.com/delian/node-amfutilsIt is not fully completed nor very well tested as I have very limited environment to do the tests. However, it works for me and provides the best AMF0 and AMF3 support currently available for Node.JS – It can encode/decode all the objects defined in both AMF0 and AMF3 (the other AMF modules in the npm repository supports partial AMF0 or partial AMF3)It uses Node.JS buffers (it is not necessary to do string to buffer to string conversion, as you have to do with the other modules)It is easy to use this module. You just have to do something like this:var amfUtils = require(‘node-amfutils’);var buffer = amfUtils.amf0Encode([{ a: “xxx”},b: null]);