Quite a number of AMD GPU based video cards are running drivers too outdated for such a modern task as low latency game streaming, and users have no clue that the video driver is letting them down. For example, here is a slice of the version structure for those who “have things going rather well”:
The current recommended (“stable”?) version is 20.4.2 (just 20%), released on 15-May-2020, 10 weeks ago, and the optional (“latest”, “beta”?) version is 20.7.2. Many users run 19.xx simply because they installed the driver pulled in by Windows Update, expecting it to be a good driver. It is not, and with many adapters it simply does not work for some of the video encoding tasks because it does not follow the documented behavior. Quite a few users have no idea that their “standard” video driver, delivered via the Windows Update channel, is hugely outdated and that multiple updates have been available.
Now the structure for the AMD RX 5×00 XT series (especially the popular RX 5700 XT):
The small fraction of 20.5.1 reflects the broken state of that driver: its video encoder fails to process video. Yes, it is fixed in 20.7.1, but only users who check and install optional updates of AMD Adrenalin 2020 have a chance to be aware that a fixing update is available.
Another confusing thing is that there is a recommended version of the AMD driver software, and pulling recommended updates seems to be the default setting. Yet the driver download section (link above) suggests installing the optional/latest version 20.7.2 of the driver software package.
I was under the impression that AMD hardware allows just one video encoding session and prevents multiple sessions from running side by side. This had been the consistent behavior I was seeing, and I always wondered why the limit had to be that tight.
To my surprise, the actual limit is higher, namely sixteen (16!) sessions runnable in parallel. In particular, with the GPU in my dev box…
The problem has been a bug in the AMD driver and/or the AMD AMF runtime, which triggered an exception in low latency mode. With this bug it is indeed just one session at a time. Even though the bug has been present for literally years, it is good that AMD engineers do respond on GitHub, and this resulted in problem identification, a workaround and, I hope, a resolution as well.
The good thing is that only two or more concurrent low latency sessions are disallowed. Multiple regular sessions plus zero or one low latency session, up to 16 in total, are still fine. That is, falling back to a non-low-latency session is a possible workaround.
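A minimal sketch of that fallback, assuming the public AMF SDK with its sample helper g_AMFFactory already initialized and an AMF context already created for the adapter (include paths and the exact property values should be checked against your SDK version):

// Hedged sketch: create an AMF H.264 encoder, preferring low latency usage and
// falling back to a regular session if low latency initialization fails (for
// example because another low latency session already exists).
#include "public/common/AMFFactory.h"                      // g_AMFFactory helper from the AMF SDK samples
#include "public/include/components/VideoEncoderVCE.h"     // AMFVideoEncoderVCE_AVC, AMF_VIDEO_ENCODER_USAGE

amf::AMFComponent* CreateEncoder(amf::AMFContext* Context, amf_int32 Width, amf_int32 Height, bool LowLatency)
{
    amf::AMFComponent* Encoder = nullptr;
    if(g_AMFFactory.GetFactory()->CreateComponent(Context, AMFVideoEncoderVCE_AVC, &Encoder) != AMF_OK)
        return nullptr;
    if(LowLatency)
        Encoder->SetProperty(AMF_VIDEO_ENCODER_USAGE, AMF_VIDEO_ENCODER_USAGE_LOW_LATENCY);
    if(Encoder->Init(amf::AMF_SURFACE_NV12, Width, Height) == AMF_OK)
        return Encoder;
    Encoder->Release();
    return nullptr;
}

The calling code would first try CreateEncoder(..., true) and, if that returns nullptr, retry with false to get a regular (default usage) session, which is exactly the workaround described above.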
#include <unknwn.h>
#include <winrt\base.h>
#include <winrt\Windows.Foundation.h>
int main()
{
}
Output:
1>------ Build started: Project: CppWinrt01, Configuration: Debug x64 ------
1>CppWinrt01.cpp
1>C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\cppwinrt\winrt\impl\Windows.Foundation.0.h(983,26): error C2039: 'wait_for': is not a member of 'winrt::impl'
1>C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\cppwinrt\winrt\impl\Windows.Foundation.0.h(103): message : see declaration of 'winrt::impl'
1>C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\cppwinrt\winrt\impl\Windows.Foundation.0.h(985): message : see reference to class template instantiation 'winrt::impl::consume_Windows_Foundation_IAsyncAction' being compiled
1>C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\cppwinrt\winrt\impl\Windows.Foundation.0.h(1004,26): error C2039: 'wait_for': is not a member of 'winrt::impl'
1>C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\cppwinrt\winrt\impl\Windows.Foundation.0.h(103): message : see declaration of 'winrt::impl'
1>C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\cppwinrt\winrt\impl\Windows.Foundation.0.h(1006): message : see reference to class template instantiation 'winrt::impl::consume_Windows_Foundation_IAsyncActionWithProgress' being compiled
1>C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\cppwinrt\winrt\impl\Windows.Foundation.0.h(1038,26): error C2039: 'wait_for': is not a member of 'winrt::impl'
1>C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\cppwinrt\winrt\impl\Windows.Foundation.0.h(103): message : see declaration of 'winrt::impl'
1>C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\cppwinrt\winrt\impl\Windows.Foundation.0.h(1040): message : see reference to class template instantiation 'winrt::impl::consume_Windows_Foundation_IAsyncOperationWithProgress' being compiled
1>C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\cppwinrt\winrt\impl\Windows.Foundation.0.h(1057,26): error C2039: 'wait_for': is not a member of 'winrt::impl'
1>C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\cppwinrt\winrt\impl\Windows.Foundation.0.h(103): message : see declaration of 'winrt::impl'
1>C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\cppwinrt\winrt\impl\Windows.Foundation.0.h(1059): message : see reference to class template instantiation 'winrt::impl::consume_Windows_Foundation_IAsyncOperation' being compiled
1>Done building project "CppWinrt01.vcxproj" -- FAILED.
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
For the record, stepping down to SDK 10.0.18362 heals the build:
The Isolated property is supposed to enable referencing in-process COM servers as a registration-free COM dependency, but something got broken along the way: Visual Studio 2019 Preview and .NET 5 produce applications that lose this link.
It is still a preview, so hopefully things get resolved in a timely manner.
There is some support for Opus in Windows; unfortunately, however, it is not documented. IIRC it came along to extend media codec support in the Microsoft Edge browser, and since internally Microsoft Edge uses the standard platform media API, Media Foundation, the decoder came in the form of a Media Foundation Transform.
It is interesting that Opus decoding was put deep enough into the platform to appear across multiple environments, even including Windows IoT:
However, Microsoft did not update the Media Foundation API itself to indicate the presence of the new codec support. The documentation has no mention of an Opus decoder. The thing has been present in Windows for four years, but it is not exposed to developers…
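The decoder can still be found by enumeration. A hedged sketch along these lines (MFAudioFormat_Opus is defined in current SDK headers even though the decoder itself is undocumented) lists audio decoder MFTs that accept Opus input:

// Hedged sketch: enumerate audio decoder MFTs accepting Opus input
#include <cstdio>
#include <mfapi.h>
#include <mftransform.h>
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfuuid.lib")
#pragma comment(lib, "ole32.lib")

int main()
{
    MFStartup(MF_VERSION);
    MFT_REGISTER_TYPE_INFO InputType { MFMediaType_Audio, MFAudioFormat_Opus };
    IMFActivate** Activates = nullptr;
    UINT32 ActivateCount = 0;
    if(SUCCEEDED(MFTEnumEx(MFT_CATEGORY_AUDIO_DECODER, MFT_ENUM_FLAG_ALL, &InputType, nullptr, &Activates, &ActivateCount)))
    {
        for(UINT32 Index = 0; Index < ActivateCount; Index++)
        {
            WCHAR* FriendlyName = nullptr;
            UINT32 FriendlyNameLength = 0;
            if(SUCCEEDED(Activates[Index]->GetAllocatedString(MFT_FRIENDLY_NAME_Attribute, &FriendlyName, &FriendlyNameLength)))
            {
                wprintf(L"%s\n", FriendlyName); // Expected to list the stock Opus decoder MFT
                CoTaskMemFree(FriendlyName);
            }
            Activates[Index]->Release();
        }
        CoTaskMemFree(Activates);
    }
    MFShutdown();
    return 0;
}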
Apart from this, stock Opus support, whether in the decoder, in the WebM parser, or in both, is limited to mono and stereo audio. There is no support for more sophisticated channel configurations, neither in Media Foundation nor in Edge itself. Edge Beta has it because it inherited the capability from Chrome, which in turn bundles libopus directly, through its use of FFmpeg.
A 5.1 Opus audio fragment played by Edge Beta but not by Edge:
Edge Beta’s internals:
Since the limitation is in Media Foundation primitives, other Media Foundation based applications exhibit similar behavior. For example, Movies and TV application similarly fails on this media file.
With an Intel J4115 CPU and Intel UHD Graphics 600 GPU, it runs Windows 10 and is capable of rendering and encoding video in real time.
1600×900@60 is a bit too heavy for it: VLC consumes the pre-buffered content and hits an underflow around the 26th second of playback. VLC's buggy HLS client implementation then exhibits playback artifacts (and eventually locks up dead and/or crashes completely; retroactive replay of the produced content confirms the video stream itself is okay).
This small thing, without excessive horsepower, uncovered another bug too.
The video content is prepared by a Media Foundation pipeline with the Intel® Quick Sync Video H.264 Encoder MFT as the GPU encoder. Once in a while an encoded frame flashes, exhibiting a lack of proper synchronization in Intel's MFT.
A broken frame can look like this or otherwise, and it is supposedly caused by taking a work item into encoding without bothering to wait for completion of the scheduled GPU work.
However, this is nothing new; I wrote before about the same issue in another vendor's implementation:
Adding a patch with a D3D11 event query and waiting on it works around the sync issue (giving reasons to call it Intel's bug in the first place), so the video posted at the top shows a proper video stream.
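The workaround boils down to something along these lines (a hedged sketch: issue an event query after the rendering calls that produce the frame and block until the GPU has drained them, before handing the texture over to the encoder MFT):

// Hedged sketch: ensure GPU work targeting the texture is complete before the
// texture is passed to the hardware encoder MFT.
#include <windows.h>
#include <unknwn.h>
#include <winrt/base.h>
#include <d3d11.h>

void WaitForGpuCompletion(ID3D11Device* Device, ID3D11DeviceContext* Context)
{
    D3D11_QUERY_DESC QueryDesc { D3D11_QUERY_EVENT };
    winrt::com_ptr<ID3D11Query> Query;
    winrt::check_hresult(Device->CreateQuery(&QueryDesc, Query.put()));
    Context->End(Query.get()); // Place the event after the previously submitted work
    Context->Flush();
    // S_FALSE means the GPU has not reached the event yet
    while(Context->GetData(Query.get(), nullptr, 0, 0) == S_FALSE)
        Sleep(0);
}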
I have mentioned issues in AMD's and Intel's video encoding related drivers, APIs and integration components. Now I switched the development box video card to NVIDIA's and immediately hit their glitch too.
The NVIDIA GeForce RTX 2060 SUPER offers a really fast video encoder; consumer hardware from AMD and Intel is simply nowhere near. 3840×2160@144 video can be encoded as fast as under 10 ms per frame:
However, that is with their hardware and their API, with the Media Foundation integration based on a custom Media Foundation wrapper.
NVIDIA's Media Foundation encoder transform (MFT) shipped with the video driver fails to do even a simple thing correctly. Encoding a texture using the NVIDIA MFT:
It looks like the internal color space conversion taking place inside the transform is failing…
The NVIDIA HEVC Encoder MFT handles the same input (textures) correctly.
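For context, "encoding a texture" here means feeding the MFT a Direct3D 11 texture wrapped into a Media Foundation sample, roughly like this (a hedged sketch; the real wrapper also negotiates media types, supplies the D3D device manager and drives the asynchronous MFT event model around it):

// Hedged sketch: wrap an ID3D11Texture2D into an IMFSample and feed it to an encoder MFT
#include <unknwn.h>
#include <winrt/base.h>
#include <d3d11.h>
#include <mfapi.h>
#include <mftransform.h>
#pragma comment(lib, "mfplat.lib")

void EncodeTexture(IMFTransform* Transform, ID3D11Texture2D* Texture, LONGLONG Time, LONGLONG Duration)
{
    winrt::com_ptr<IMFMediaBuffer> Buffer;
    winrt::check_hresult(MFCreateDXGISurfaceBuffer(__uuidof(ID3D11Texture2D), Texture, 0, FALSE, Buffer.put()));
    winrt::com_ptr<IMFSample> Sample;
    winrt::check_hresult(MFCreateSample(Sample.put()));
    winrt::check_hresult(Sample->AddBuffer(Buffer.get()));
    winrt::check_hresult(Sample->SetSampleTime(Time));       // 100 ns units
    winrt::check_hresult(Sample->SetSampleDuration(Duration));
    winrt::check_hresult(Transform->ProcessInput(0, Sample.get(), 0));
    // Encoded output is then collected with IMFTransform::ProcessOutput
    // (event driven for hardware MFTs)
}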
Adding another crazy thing into internal tooling uncovered a new interesting bug. An application creates an audiovisual HLS (HTTP Live Streaming, the adaptive streaming protocol developed by Apple) stream on the fly, now with Microsoft PlayReady DRM encryption attached.
UWP samples offer a sample application for playback: the PlayReady sample. The playback looks like this:
Oops, no video! However, this is behavior by design: DRM-enabled video is protected on multiple layers, and in the end the image cannot be captured back as a screenshot: the video is automatically removed from the view.
A physical picture of the application on the monitor is:
Depending on protection level, there might be HDCP enforcement as well.
Now the bug: protected video running in a UWP MediaElement cannot be taken to another monitor (even though it belongs to the same video adapter):
It cannot even be restarted on the other monitor! Generally speaking, it is not necessarily a MediaPlayer element bug; it can as well be DXGI, for example, and it can even be an NVIDIA driver bug. Even though NVIDIA software reports that both monitors can carry an HDCP-enabled signal, there is something missing. One interesting thing is that it is not even possible to query HDCP status via the standard API for one of the monitors, while it works for the other.
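The "standard API" here is the Output Protection Manager (OPM). A hedged per-monitor check can be as simple as asking OPM for the video outputs of each monitor and looking at the result code (the full protection-level query requires the certificate/OMAC handshake and is omitted here):

// Hedged sketch: check whether OPM, the API used for HDCP queries, is even reachable per monitor
#include <windows.h>
#include <opmapi.h>
#include <cstdio>
#pragma comment(lib, "dxva2.lib")
#pragma comment(lib, "user32.lib")
#pragma comment(lib, "ole32.lib")

BOOL CALLBACK MonitorCallback(HMONITOR Monitor, HDC, LPRECT, LPARAM)
{
    ULONG OutputCount = 0;
    IOPMVideoOutput** Outputs = nullptr;
    HRESULT const Result = OPMGetVideoOutputsFromHMONITOR(Monitor, OPM_VOS_OPM_SEMANTICS, &OutputCount, &Outputs);
    printf("Monitor %p: 0x%08lX, %lu video output(s)\n", static_cast<void*>(Monitor), static_cast<unsigned long>(Result), OutputCount);
    if(SUCCEEDED(Result))
    {
        for(ULONG Index = 0; Index < OutputCount; Index++)
            Outputs[Index]->Release();
        CoTaskMemFree(Outputs);
    }
    return TRUE; // Continue enumeration
}

int main()
{
    CoInitializeEx(nullptr, COINIT_MULTITHREADED);
    EnumDisplayMonitors(nullptr, nullptr, MonitorCallback, 0);
    CoUninitialize();
    return 0;
}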
PlayReady DRM itself is not a technology that comes with detailed public information, and its support in Windows is seriously limited in terms of documentation and integration support. There is just the one sample mentioned above, and pretty much all related information is simply classified. There is not much room for debugging either, because Microsoft intentionally limited PlayReady support to the stock implementation, which is hard to use even partially and which runs in an isolated protected process: Media Foundation Media Pipeline EXE (related keyword: PsProtectedSignerAuthenticode).
More unusual stuff: a mix of a Media Foundation pipeline, specifically with a custom media source, and a GStreamer pipeline. It appears that for all the bulkiness of the GStreamer runtime, the framework remains open for integrations and flexible enough for a mix of pre-built and custom components.
A PoC push source plugin on top of a Media Foundation media session and media source is reasonably small, and it is mixed into the pipeline in a pretty straightforward way.
A Media Foundation Media Source with Direct3D 11 rendering and Direct2D graphics is played through a GStreamer pipeline to an OpenGL sink
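Hosting such a mix does not take much on the GStreamer side; something along these lines (a hedged sketch: the element name "mfpushsrc" is made up for illustration, the real name is whatever the PoC plugin registers):

// Hedged sketch: run a GStreamer pipeline pulling video from a custom
// Media Foundation backed source element and rendering to an OpenGL sink.
#include <gst/gst.h>

int main(int argc, char** argv)
{
    gst_init(&argc, &argv);
    GError* Error = nullptr;
    // "mfpushsrc" is a placeholder name for the PoC Media Foundation push source element
    GstElement* Pipeline = gst_parse_launch("mfpushsrc ! videoconvert ! glimagesink", &Error);
    if(!Pipeline)
    {
        g_printerr("Failed to create pipeline: %s\n", Error->message);
        g_clear_error(&Error);
        return 1;
    }
    gst_element_set_state(Pipeline, GST_STATE_PLAYING);
    // Run until an error or end of stream
    GstBus* Bus = gst_element_get_bus(Pipeline);
    GstMessage* Message = gst_bus_timed_pop_filtered(Bus, GST_CLOCK_TIME_NONE, static_cast<GstMessageType>(GST_MESSAGE_ERROR | GST_MESSAGE_EOS));
    if(Message)
        gst_message_unref(Message);
    gst_object_unref(Bus);
    gst_element_set_state(Pipeline, GST_STATE_NULL);
    gst_object_unref(Pipeline);
    return 0;
}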
As a codebase and a plugin development environment, GStreamer is closer to DirectShow than to Media Foundation: there is a structure of base classes for reuse, implemented, however, in C (not even C++) and, respectively, with a fair amount of chaos inside.
Still, GStreamer looks cool overall and over the years has accumulated a bunch of useful plugins. The long list includes multiple integrations and implementations of MPEG related stuff of sorts, and RTP related pieces in particular.
Added a few more resolutions to the NvcEncode tool. Resolutions above 4K are tried with the H.264 codec, but they are expected to fail since H.264 is limited to resolutions up to 4096 pixels in width or height. So the new ones apply to H.265/HEVC, and they work pretty well on the NVIDIA GeForce RTX 2060 SUPER:
One interesting thing, and it is too visible and consistent to be an occasional fluctuation, is that per-frame latency is lower for higher rate feeds. The most recent run has a great example of this effect:
I have only an educated guess; the driver development guys are likely to have a good explanation. This is probably something NVIDIA can improve for those who want the absolutely lowest encoding latencies.
The interface methods lack pure specifiers. This might be OK for some development, but once you try to inherit your handler class from public winrt::implements<AsyncCallback, IRtwqAsyncCallback> you are in trouble!
1>Foo.obj : error LNK2001: unresolved external symbol "public: virtual long __cdecl IRtwqAsyncCallback::GetParameters(unsigned long *,unsigned long *)" (?GetParameters@IRtwqAsyncCallback@@UEAAJPEAK0@Z)
1>Foo.obj : error LNK2001: unresolved external symbol "public: virtual long __cdecl IRtwqAsyncCallback::Invoke(struct IRtwqAsyncResult *)" (?Invoke@IRtwqAsyncCallback@@UEAAJPEAUIRtwqAsyncResult@@@Z)
The problem exists in the current Windows 10 SDK and, at the very least, since 10.0.18362.0.
To work it around without touching SDK code, a project-side addition along the following lines satisfies the linker:
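A minimal sketch of such an addition (assuming the intent is simply to give the two declared-but-never-defined virtual methods bodies so that the vtable references resolve; real handler classes override both methods anyway):

// Hedged sketch: project-side definitions for the SDK-declared, never-defined
// IRtwqAsyncCallback methods; they exist only to satisfy the linker.
#include <windows.h>
#include <rtworkq.h>

HRESULT STDMETHODCALLTYPE IRtwqAsyncCallback::GetParameters(DWORD*, DWORD*)
{
    return E_NOTIMPL;
}

HRESULT STDMETHODCALLTYPE IRtwqAsyncCallback::Invoke(IRtwqAsyncResult*)
{
    return E_NOTIMPL;
}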
AMD is seemingly not making any progress in improving the video encoding ASICs on their video cards. The new stuff looks pretty depressing.
The AMD Radeon RX 5700 XT was a bit of a move forward, just a bit. The new series looks about the same, if anything a bit slower, and it is quite clear that the existing, cheaper NVIDIA offering beats the hell out of the new AMD gear.
Not to mention that NVIDIA cards are capable of handling larger resolutions, where AMD's bar is at 3840×2160@90.
The engineering quality of Microsoft's most recent work around Media Foundation is terrible. It surely passes some internal tests to make sure the software meets the requirements of the use cases needed for internal products, but the published work gives the impression that there is no one left to care about the API offerings to the wide audience.
I have been putting the component into an existing code base in order to extend it with reference software video encoding, now in H.265/HEVC format. Hence the stock software encoder, regardless of its performance and quality metrics.
The encoder started giving nonsensical exceptions and errors, in particular rejecting obviously valid input. After sorting out a few things, I started seeing the MFT produce E_FAIL on the very first video frame it receives.
The suspected problem was (and there were not many other things left) that the output media type was set two times. Both calls were valid, with good arguments, and made before any payload processing. The second call supplied the same media type, with all the same attributes, EXACTLY. Both media type setting calls were successful. The whole media type setting story did not produce any errors at the stage of handling the streaming start messages.
Still, the second call apparently ruined the internal state because, and there can be no other explanation, of the shitty quality of the MFT itself.
A code fragment that discards the second media type setting call at the wrapping level gets the MFT back to processing. What can I say…
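The discarding wrapper is roughly this (a hedged sketch of the idea, not the actual code: compare the incoming type with the one already set and skip forwarding the redundant call):

// Hedged sketch: suppress a redundant SetOutputType call carrying an identical media type
#include <unknwn.h>
#include <winrt/base.h>
#include <mfapi.h>
#include <mfidl.h>
#include <mftransform.h>

HRESULT SetEncoderOutputType(IMFTransform* Transform, IMFMediaType* MediaType)
{
    DWORD constexpr EqualityFlags = MF_MEDIATYPE_EQUAL_MAJOR_TYPES | MF_MEDIATYPE_EQUAL_FORMAT_TYPES | MF_MEDIATYPE_EQUAL_FORMAT_DATA;
    winrt::com_ptr<IMFMediaType> CurrentMediaType;
    if(SUCCEEDED(Transform->GetOutputCurrentType(0, CurrentMediaType.put())) && CurrentMediaType)
    {
        DWORD Flags = 0;
        if(SUCCEEDED(CurrentMediaType->IsEqual(MediaType, &Flags)) && (Flags & EqualityFlags) == EqualityFlags)
            return S_OK; // Same type already set; do not poke the fragile MFT again
    }
    return Transform->SetOutputType(0, MediaType, 0);
}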
StreamingServer is the application I am using as an internal testbed for various media processing and encoding primitives. As an application (or a service) it is capable of streaming HLS assets, preparing them on the fly without the need to keep and host real media files. The functionality includes:
Supports video only, audio only, and combined video and audio assets
Supports parts of ISO/IEC 23001-7 “Common encryption in ISO base media file format files” specification and implements ‘cenc’ and ‘cbcs’ encryption schemes with AES-CTR-128 and AES-CBC-128 encryption modes of operation respectively
Supports live HLS assets, including live finite and live infinite assets
Encoding services are provided by the underlying Media Foundation encoders; due to the state of Media Foundation and, specifically, the awful quality of vendor-specific third party integrations, the application (a) might have issues with specific video cards, (b) implements built-in encoding based on the NVIDIA Video Codec SDK for NVIDIA GPUs, (c) offers a software-only mode for GPU agnostic operation
The application assumes just one client, and its streaming services are, generally speaking, limited to a trivial HTTP serving loop. Still, multiple clients should be able to request data in parallel too.
It is possible to dump the produced responses as files for retroactive review. Unless responses are written to files, they are streamed in HTTP chunked mode at the lowest latency.
Quick start
Start the application with privilege elevation to enable its initialization with HTTP Server API services. Unless overridden with command line parameters, the application uses the first available DXGI device for hardware assisted video encoding, and exposes its HTTP functionality under the http://localhost/hls base. Open http://localhost/hls/about to get up-to-date syntax for the command line and URIs, and also to check the status of the application.
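Elevation is needed because registering the URL with the HTTP Server API (http.sys) requires administrative rights or a pre-created URL reservation; in essence the startup does something along these lines (a hedged sketch, not the application's actual code):

// Hedged sketch: HTTP Server API initialization; without elevation or a URL
// reservation, HttpAddUrlToUrlGroup fails with ERROR_ACCESS_DENIED.
#include <windows.h>
#include <http.h>
#pragma comment(lib, "httpapi.lib")

int main()
{
    HTTPAPI_VERSION Version = HTTPAPI_VERSION_2;
    HttpInitialize(Version, HTTP_INITIALIZE_SERVER, nullptr);
    HTTP_SERVER_SESSION_ID SessionId = 0;
    HttpCreateServerSession(Version, &SessionId, 0);
    HTTP_URL_GROUP_ID UrlGroupId = 0;
    HttpCreateUrlGroup(SessionId, &UrlGroupId, 0);
    ULONG const Result = HttpAddUrlToUrlGroup(UrlGroupId, L"http://localhost:80/hls/", 0, 0);
    // Result is ERROR_ACCESS_DENIED (5) when running without elevation
    // ... create a request queue, bind it to the URL group and serve requests ...
    HttpCloseUrlGroup(UrlGroupId);
    HttpCloseServerSession(SessionId);
    HttpTerminate(HTTP_INITIALIZE_SERVER, nullptr);
    return Result == NO_ERROR ? 0 : 1;
}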
Problem resolution
The application is best suited for use with NVIDIA GPUs doing hardware H.264 video encoding. In the case of video encoding issues, it makes sense to start the application with the “-Software” switch to put it into software-only mode: video frames will be generated by Direct2D into WIC bitmaps instead of DXGI and Direct3D 11 textures, and video encoders will use system memory backed Media Foundation media buffers and samples.
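The software path essentially renders into a WIC bitmap render target, roughly like this (a hedged sketch of the technique, not the application's code; the frame size is arbitrary):

// Hedged sketch: Direct2D drawing into a system memory WIC bitmap (no GPU involved),
// from which pixels can be copied into a Media Foundation media buffer for software encoding.
#include <unknwn.h>
#include <winrt/base.h>
#include <d2d1.h>
#include <wincodec.h>
#pragma comment(lib, "d2d1.lib")

int main()
{
    winrt::init_apartment();
    auto const WicFactory = winrt::create_instance<IWICImagingFactory>(CLSID_WICImagingFactory);
    winrt::com_ptr<IWICBitmap> Bitmap;
    winrt::check_hresult(WicFactory->CreateBitmap(1280, 720, GUID_WICPixelFormat32bppPBGRA, WICBitmapCacheOnDemand, Bitmap.put()));
    winrt::com_ptr<ID2D1Factory> D2dFactory;
    winrt::check_hresult(D2D1CreateFactory(D2D1_FACTORY_TYPE_SINGLE_THREADED, D2dFactory.put()));
    winrt::com_ptr<ID2D1RenderTarget> RenderTarget;
    winrt::check_hresult(D2dFactory->CreateWicBitmapRenderTarget(Bitmap.get(), D2D1::RenderTargetProperties(), RenderTarget.put()));
    RenderTarget->BeginDraw();
    RenderTarget->Clear(D2D1::ColorF(D2D1::ColorF::CornflowerBlue)); // Frame content is drawn here
    winrt::check_hresult(RenderTarget->EndDraw());
    // The IWICBitmap pixels would then be locked and copied into an IMFMediaBuffer
    return 0;
}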
Over 20+ years there has been a steady flow of questions along the lines of “how to build these projects”. Back in the day, the problem was more about having exactly matching settings in the application/library projects and the mandatory dependent static library. At some point Microsoft abandoned the samples, then removed them from the SDK completely. Luckily, at some point the samples were returned to the public as “Win7Samples” under “Windows Classic Samples” published on GitHub.
The DirectShow samples there, however, exist in the state in which they were dropped years ago. Still functioning and in good standing, but not prepared for building out of the box. So the flow of “how to build” questions is still here.
I made a fork of the repository (branch “directshow” on the fork of Microsoft's repository; “Samples/Win7Samples/multimedia/directshow” from the root of the repository) and upgraded a few projects, the most popular ones (including AmCap, PushSource, EzRGB24, and the beginner's DShowPlayer application):
The code requires Microsoft Visual Studio 2019 (Community version is okay) and current Windows 10 SDK.
To start, clone the fork, locate the README in the directshow folder, open the solution and build the code, in the Debug or Release configuration, for the Win32 or x64 platform.
Introducing another popular DirectShow project: Vivek’s source filter which emulates a video capture device. For a long time the code was hosted on P “The March Hare” W’s website, which was eventually taken down.
This problem is not fatal or severe, but it is a long-standing one, and Microsoft folks should look into it because, as the StackOverflow question suggests, it confuses people.
It is also a widespread one, and, for instance, it can be easily reproduced by one of the apps I posted earlier:
If you start the application in self-debugging mode with the -Debug command line parameter, the debug output is redirected to the console and those messages are immediately visible:
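Self-debugging here means the process relaunches itself under a minimal debugger loop and echoes OutputDebugString messages to the console; a hedged sketch of that technique (the application's actual mechanism may differ in details):

// Hedged sketch: relaunch the executable under a debugger loop and echo
// OUTPUT_DEBUG_STRING_EVENT data to the console.
#include <windows.h>
#include <cstdio>
#include <string>

void RunDebugLoop(wchar_t* CommandLine)
{
    STARTUPINFOW StartupInformation { sizeof(STARTUPINFOW) };
    PROCESS_INFORMATION ProcessInformation {};
    if(!CreateProcessW(nullptr, CommandLine, nullptr, nullptr, FALSE, DEBUG_ONLY_THIS_PROCESS, nullptr, nullptr, &StartupInformation, &ProcessInformation))
        return;
    for(;;)
    {
        DEBUG_EVENT Event {};
        if(!WaitForDebugEvent(&Event, INFINITE))
            break;
        if(Event.dwDebugEventCode == OUTPUT_DEBUG_STRING_EVENT && !Event.u.DebugString.fUnicode)
        {
            // The string lives in the debuggee's address space; copy it out
            std::string Text(Event.u.DebugString.nDebugStringLength, 0);
            ReadProcessMemory(ProcessInformation.hProcess, Event.u.DebugString.lpDebugStringData, Text.data(), Text.size(), nullptr);
            printf("%s", Text.c_str());
        }
        if(Event.dwDebugEventCode == EXIT_PROCESS_DEBUG_EVENT)
            break;
        ContinueDebugEvent(Event.dwProcessId, Event.dwThreadId, DBG_CONTINUE);
    }
    CloseHandle(ProcessInformation.hThread);
    CloseHandle(ProcessInformation.hProcess);
}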
In the referenced StackOverflow answer I also advertise the Microsoft Windows Implementation Libraries (WIL), which I like and use myself where appropriate, and which I think is a good, and underrated, piece of software. No wonder it is used internally in the DXGI implementation.