It appears that re-initialization of an encoding session with NVIDIA Video Codec SDK produces, or at least might produce, an unexpected memory leak.
So, how does it work exactly?
NVENCSTATUS Status;
Status = m_ApiFunctionList.nvEncInitializeEncoder(m_Encoder, &InitializeParams);
assert(Status == NV_ENC_SUCCESS);
// NOTE: Another nvEncInitializeEncoder call
Status = m_ApiFunctionList.nvEncInitializeEncoder(m_Encoder, &InitializeParams);
assert(Status == NV_ENC_SUCCESS); // Still success
...
Status = m_ApiFunctionList.nvEncDestroyEncoder(m_Encoder);
assert(Status == NV_ENC_SUCCESS);
The root cause is the secondary nvEncInitializeEncoder call. Granted, this might not be exactly how the API is designed to be used, but the returned statuses all indicate success, so it would be a bit hard to justify the leak by saying that a second initialization call was not expected in the first place. Apparently the implementation overwrites internally allocated resources without properly releasing or reusing them, and without triggering any warning of sorts.
Another part of the problem is the eclectic design of the API in the first place. You open a “session” and obtain an “encoder” as a result. Then you initialize the “encoder”, and when you are finished you destroy the “encoder”. Do you destroy the “session”? Oh no, you don’t have any session at all, except that the API that opens a “session” actually opens an “encoder”.
So when I get into a situation where I want to initialize an encoder that is already initialized, what I do is destroy the existing “encoder”, open a new “session”, and only then initialize the session-encoder once again with the initialization parameters.
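A minimal sketch of this workaround in the style of the snippet above; m_Device and m_DeviceType are assumed members identifying the device the session is bound to:
NVENCSTATUS Status;
// Tear down the current "encoder" (which is really the session)
Status = m_ApiFunctionList.nvEncDestroyEncoder(m_Encoder);
assert(Status == NV_ENC_SUCCESS);
m_Encoder = nullptr;
// Open a fresh session...
NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS SessionParams { };
SessionParams.version = NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS_VER;
SessionParams.apiVersion = NVENCAPI_VERSION;
SessionParams.device = m_Device; // e.g. ID3D11Device* or CUDA context (assumed member)
SessionParams.deviceType = m_DeviceType; // e.g. NV_ENC_DEVICE_TYPE_DIRECTX (assumed member)
Status = m_ApiFunctionList.nvEncOpenEncodeSessionEx(&SessionParams, &m_Encoder);
assert(Status == NV_ENC_SUCCESS);
// ...and only then initialize, exactly once per session
Status = m_ApiFunctionList.nvEncInitializeEncoder(m_Encoder, &InitializeParams);
assert(Status == NV_ENC_SUCCESS);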
It appears there is a sort of limitation (read: “a bug”) in the Media Foundation MPEG-4 File Source implementation when it comes to reading long fragmented MP4 files.
When the respective media source is used to read such a file (for which, by the way, it does not offer seeking), the source issues MF_SOURCE_READERF_ENDOFSTREAM before reaching the actual end of file.
When some software sees a full hour of video in the file…
… the Media Foundation primitive, after reading frame 00:58:35.1833333, issues an “oh, gimme a break” event and reports end of stream.
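A sketch of how the problem shows up through the Source Reader API, assuming a reader created over such a fragmented MP4 file: ReadSample signals MF_SOURCE_READERF_ENDOFSTREAM while the last delivered timestamp is still well short of the reported duration.
#include <cstdio>
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <atlbase.h>

void CheckEndOfStream(IMFSourceReader* Reader)
{
    PROPVARIANT DurationValue;
    PropVariantInit(&DurationValue);
    Reader->GetPresentationAttribute(MF_SOURCE_READER_MEDIASOURCE, MF_PD_DURATION, &DurationValue);
    LONGLONG const Duration = static_cast<LONGLONG>(DurationValue.uhVal.QuadPart); // 100 ns units
    PropVariantClear(&DurationValue);
    LONGLONG LastSampleTime = 0;
    for(; ; )
    {
        DWORD StreamIndex, StreamFlags;
        LONGLONG SampleTime;
        CComPtr<IMFSample> Sample;
        if(FAILED(Reader->ReadSample(MF_SOURCE_READER_FIRST_VIDEO_STREAM, 0, &StreamIndex, &StreamFlags, &SampleTime, &Sample)))
            break;
        if(StreamFlags & MF_SOURCE_READERF_ENDOFSTREAM)
            break; // with the files in question this fires prematurely
        if(Sample)
            LastSampleTime = SampleTime;
    }
    printf("last sample at %.3f s of %.3f s total\n", LastSampleTime / 1e7, Duration / 1e7);
}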
Surface Pro (5th Gen) infrared camera streamed into Chrome browser in H.264 encoding over WebSocket connection
The screenshot above shows the Surface Pro tablet’s infrared camera (known as “Microsoft IR Camera Front” on the device) captured live, encoded and streamed (everything up to this point hosted by a Microsoft Media Foundation Media Session) over the network using WebSockets into Chrome’s HTML5 video tag by means of Media Source Extensions (MSE).
Why? Because why not.
Unfortunately, Microsoft did not publish/document an API to access infrared and depth (time-of-flight) cameras so that traditional applications could use the hardware capabilities. Nevertheless, the functionality is available in the Universal Windows Platform (UWP), see Windows.Media.Capture.Frames and friends.
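For reference, a C++/WinRT sketch of that documented UWP route (the frame handler body is a placeholder for actual processing, and real code would keep Capture and Reader alive beyond the function):
#include <winrt/Windows.Foundation.h>
#include <winrt/Windows.Foundation.Collections.h>
#include <winrt/Windows.Media.Capture.h>
#include <winrt/Windows.Media.Capture.Frames.h>
using namespace winrt;
using namespace Windows::Foundation;
using namespace Windows::Media::Capture;
using namespace Windows::Media::Capture::Frames;

IAsyncAction CaptureInfraredAsync()
{
    // Find a source group that exposes an infrared source
    for(MediaFrameSourceGroup const& Group : co_await MediaFrameSourceGroup::FindAllAsync())
        for(MediaFrameSourceInfo const& Info : Group.SourceInfos())
            if(Info.SourceKind() == MediaFrameSourceKind::Infrared)
            {
                MediaCapture Capture;
                MediaCaptureInitializationSettings Settings;
                Settings.SourceGroup(Group);
                Settings.MemoryPreference(MediaCaptureMemoryPreference::Cpu);
                Settings.StreamingCaptureMode(StreamingCaptureMode::Video);
                co_await Capture.InitializeAsync(Settings);
                MediaFrameReader Reader = co_await Capture.CreateFrameReaderAsync(Capture.FrameSources().Lookup(Info.Id()));
                Reader.FrameArrived([] (MediaFrameReader const& Sender, MediaFrameArrivedEventArgs const&)
                {
                    if(auto Frame = Sender.TryAcquireLatestFrame())
                    {
                        // Frame.VideoMediaFrame().SoftwareBitmap() carries the infrared image here
                    }
                });
                co_await Reader.StartAsync();
                co_return;
            }
}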
The UWP implementation is apparently using Media Foundation in its backyard, so the functionality could certainly be published for desktop applications as well. Another interesting thing is that my [undocumented] way of accessing the device seems to bypass the frame server and talk to the device directly, including for video.
It does not look like Microsoft is planning to extend visibility of these new features to the desktop Media Foundation API, since they keep adding new features without exposing them for public use outside UWP. The UWP API itself is eclectic, and I can’t imagine how one could get a good understanding of it without a good grip on the underlying API layers.
Some time ago I shared an application which I have been using to embed a git reference into binary resources, especially as a post-build event in an automated manner: Embedding a Git reference at build time.
This time I needed a small amendment related to the use of a git repository as a sub-module of another repository. To make troubleshooting easier, when a project is built as a part of a bigger build through a sub-module repository reference, the git details of both the repository and its parent can be embedded into the resources.
The utility accepts multiple path arguments, goes over all of them and concatenates the “git log” output. When multiple paths are given, it is okay for some of them to be invalid or unrelated to git repositories.
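Not the utility itself, just a sketch of the underlying idea: run “git log” in every given path and concatenate whatever succeeds, silently skipping paths that are not git repositories.
#include <cstdio>
#include <string>
#include <vector>

std::string ConcatenatedGitLog(std::vector<std::string> const& Paths)
{
    std::string Result;
    for(auto&& Path : Paths)
    {
        std::string const Command = "git -C \"" + Path + "\" log -1 2>nul";
        if(FILE* Pipe = _popen(Command.c_str(), "r"))
        {
            char Buffer[256];
            while(fgets(Buffer, sizeof Buffer, Pipe))
                Result += Buffer;
            _pclose(Pipe); // a failing command (not a repository) simply contributes nothing
        }
    }
    return Result;
}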
The DirectShow Video Mixing Renderer (VMR-7) filter exhibits a (regression?) bug on Windows 10 systems. When aspect ratio preservation is enabled in VMR_ARMODE_LETTER_BOX mode, which quite often makes sense as the default mode, the letterboxing does not work as expected.
The problem is easy to reproduce with the well-known DShowPlayerSDK sample application, with an edit enforcing VMR-7 mode. Once video has started, just resize the window: the parts not covered by video are not erased as expected.
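For reference, the configuration in question is just the windowed-mode aspect ratio setting on the filter, roughly as below:
#include <dshow.h>
#include <atlbase.h>

HRESULT EnableLetterbox(IBaseFilter* Vmr7BaseFilter)
{
    CComQIPtr<IVMRAspectRatioControl> AspectRatioControl(Vmr7BaseFilter);
    if(!AspectRatioControl)
        return E_NOINTERFACE;
    // With this mode active, Windows 10 fails to erase the letterbox bars on window resize
    return AspectRatioControl->SetAspectRatioMode(VMR_ARMODE_LETTER_BOX);
}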
The interesting part about the live WebM Media Foundation media source I mentioned in the previous post is that the whole thing works great on… a Raspberry Pi 3 Model B+ running Windows 10 IoT Core (RaspberryPi 3B+ Technical Preview Build 17661).
Windows 10 IoT has pretty much the same Media Foundation infrastructure as the other Universal Windows Platform environments (Desktop, Xbox, HoloLens), including the core API, primitives, and support in the XAML MediaElement (MediaPlayerElement). There is no DirectX support on Raspberry Pi 3 Model B+ and video delivery fails, however this is a sort of known/expected problem with the Technical Preview build. Audio playback is okay.
The picture above is taken from a C# UWP application (that is, the ARM platform) running a MediaPlayerElement control, which takes a live audio signal from the network over a Windows.Networking.Sockets.MessageWebSocket connection.
A custom WebM live media source (the platform does not have a capable primitive out of the box) forwards the signal to the media element for low-latency audio playback. The codec is Opus and, yes, the stock Media Foundation audio decoder MFT decodes the signal just fine.
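A C++/WinRT sketch of the network leg of this, with the handoff to the custom media source left as a comment and the endpoint URL being a placeholder:
#include <winrt/Windows.Foundation.h>
#include <winrt/Windows.Networking.Sockets.h>
#include <winrt/Windows.Storage.Streams.h>
#include <vector>
using namespace winrt;
using namespace Windows::Foundation;
using namespace Windows::Networking::Sockets;
using namespace Windows::Storage::Streams;

IAsyncAction ConnectLiveWebmAsync()
{
    MessageWebSocket Socket; // real code keeps Socket alive for the lifetime of the stream
    Socket.Control().MessageType(SocketMessageType::Binary);
    Socket.MessageReceived([] (MessageWebSocket const&, MessageWebSocketMessageReceivedEventArgs const& Args)
    {
        DataReader Reader = Args.GetDataReader();
        std::vector<uint8_t> Data(Reader.UnconsumedBufferLength());
        Reader.ReadBytes(Data);
        // Hand the received WebM bytes over to the custom live media source here
    });
    co_await Socket.ConnectAsync(Uri { L"ws://192.168.1.2:8080/live.webm" }); // placeholder endpoint
}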
“The next generation of game capture is here.” The device addresses the needs of real-time video signal capture: offering a pass-through HDMI connection, the box provides a video capture sink over a USB 3.1 Type C interface and makes the video signal available to video capture applications via the standard DirectShow and Media Foundation APIs.
I was interested in whether the device implements video compression, H.264 and/or H.265/HEVC, in hardware. The technical specifications include:
• Max Pass-Through Resolutions: 2160p60 HDR / 1440p144 / 1080p240
• Max Record Resolutions: 2160p30 / 1440p60 / 1080p120 / 1080p60 HDR
• Supported Resolutions (Video input): 2160p, 1440p, 1080p, 1080i, 720p, 576p, 480p
• Record Format: MPEG 4 (H.264+AAC) or (H.265+AAC)*
…
Notes: *H.265 Compression and HDR are supported by RECentral
So there is a direct mention of video compression, and given the state of the technology and the price of the box it makes sense to have it there. The Logitech C930e camera, after all, has been offering onboard H.264 video compression for years.
So is it there in the Ultra thing? NO, IT IS NOT. Pathetic…
One could of course guess this from a study of the FAQ section, in the part about third-party software configuration. The software is clearly expected to use external compression capabilities. However, popular software is also known not to use the latest stuff, so there was a little chance that a hardware codec was still there. I think it would be fair to state right in the technical specification that the product does not offer any encoding capabilities.
The good thing is that the box offers 10-bit video capture up to 2560×1440@30 – there is not much inexpensive hardware capable of doing that job.
The specification mentions a high-rate 1920×1080@120 mode, but I don’t see it among the capabilities the device actually advertises.
Also, the video capture capabilities exposed through the Media Foundation API suggest that it is possible to capture into video memory, bypassing the system memory mapping/copy. Even though this is irrelevant to most applications, some newer ones, including those leveraging the UWP video capture API, could take advantage of it (for example, video capture apps running on low-power devices).
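A sketch of what enabling that path looks like with the Source Reader; the D3D11 device is assumed to be created with D3D11_CREATE_DEVICE_VIDEO_SUPPORT:
#include <d3d11.h>
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <atlbase.h>

HRESULT CreateVideoMemoryReader(IMFMediaSource* Source, ID3D11Device* Device, IMFSourceReader** Reader)
{
    UINT ResetToken = 0;
    CComPtr<IMFDXGIDeviceManager> DeviceManager;
    HRESULT hr = MFCreateDXGIDeviceManager(&ResetToken, &DeviceManager);
    if(FAILED(hr))
        return hr;
    hr = DeviceManager->ResetDevice(Device, ResetToken);
    if(FAILED(hr))
        return hr;
    CComPtr<IMFAttributes> Attributes;
    MFCreateAttributes(&Attributes, 2);
    Attributes->SetUnknown(MF_SOURCE_READER_D3D_MANAGER, DeviceManager);
    Attributes->SetUINT32(MF_READWRITE_ENABLE_HARDWARE_TRANSFORMS, 1);
    // Samples read from such a reader can carry DXGI texture backed buffers
    return MFCreateSourceReaderFromMediaSource(Source, Attributes, Reader);
}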
The monitor of one system is remoted to another system, where the latter is… an Xbox One X. Perceivable latency at 1920×1080@60 monitor resolution is under 2 video frames, even though there are so many things happening in between.
The source system is powered by a moderate GeForce GTX 750, with its video encoding engine (encoding alone on this GPU requires around 12 ms of H.264 compression work per frame) loaded at 40%. There are Rainway, Sachiel, Protocol Buffers and WebRTC on the sending side. Not necessary here, but good to mention: the network video data packaging overall remains HTML5 compliant. On the client side of things the same chain unwinds with, of course, the use of DXVA2 for video decoding. Xbox GPU engine utilization fluctuates around 6%, and the broadcast overall remains an easy job, with latency caused by scheduling rather than processing complexity.
Write your code to fit within 80 columns of text. This helps those of us who like to print out code and look at your code in an xterm without resizing it. The longer answer is that there must be some limit to the width of the code in order to reasonably allow developers to have multiple files side-by-side in windows on a modest display. If you are going to pick a width limit, it is somewhat arbitrary but you might as well pick something standard. Going with 90 columns (for example) instead of 80 columns wouldn’t add any significant value and would be detrimental to printing out code. Also many other projects have standardized on 80 columns, so some people have already configured their editors for it (vs something else, like 90 columns). This is one of many contentious issues in coding standards, but it is not up for debate.
Is there a more stupid rule than wrapping source code lines just because someone might possibly look at the code in an xterm?
So the source consumes less than 25% of the width of a quite ordinary monitor, wasting all the space on the right. At the same time, the source code lines are objectively long and get massively wrapped.
Wrapping destroys readability of code.
Re-wrapping source code has an obvious negative effect on change tracking.
I, for one, want to see as much of the source code as possible at a glance, because it helps to have a picture of what is going on. Information at the end of lines is less important, so it is not a big deal even if it goes beyond the right visible margin, but it is important to have as many LINES of code visible as possible – I would even prefer to skip blank lines and use the IDE’s ability to collapse comments, functions, regions and scopes. For this reason some developers even rotate monitors into portrait mode – to see more of the source code at a time.
Fitting into 80 columns, and having it not even up for debate, is clearly a genius move to keep devs productive. Through continuous irritation.
A bump of a StackOverflow post about a Media Foundation design flaw related to video encoding.
Set attributes via ICodecAPI for a H.264 IMFSinkWriter Encoder
I am trying to tweak the attributes of the H.264 encoder created via ActivateObject() by retrieving the ICodecAPI interface to it. Although I do not get errors, my settings are not taken into account. […]
Media Foundation’s Sink Writer is a simplified API with the encoder configuration question left out of the picture. The fundamental problem here is that you don’t own the encoder MFT and you are accessing it over the writer’s head; the behavior of encoders with respect to changing settings after everything is set up depends on the implementation, which in the encoder’s case is vendor specific and may vary across hardware.
Your more reliable option is to manage the encoder MFT directly and supply the Sink Writer with already encoded video.
A potential trick to make things work with less effort is to retrieve the encoder’s IMFTransform as well, and to clear and then set back the input/output media types after you are finished with the ICodecAPI update. By nudging the media types you prompt the encoder to re-configure its internals, and it would do so with your fine tuning already in place. Note that this, generally speaking, might have side effects.
The ‘trick’ seems to work for some of the ICodecAPI parameters (e.g. CODECAPI_AVEncCommonQualityVsSpeed) and only for Microsoft’s h.264 encoder. No effect on CODECAPI_AVEncH264CABACEnable. The doc indeed seems to be specifically for Microsoft’s encoder and not be a generic API. I’m using the QuickSync and NVidia codecs, do you know if those are configurable via the ICodecAPI assuming I create the MFT myself?
Vendor-provided encoders fall under the Certified Hardware Encoder requirements, so they must support the ICodecAPI values mentioned in the MSDN article. What is important is that the order of configuration calls is not defined. If you are managing the encoder yourself, you would do the ICodecAPI setup before setting up the media types. In the Sink Writer scenario the writer has already configured the media types by the time you jump in with your fine tuning. Hence my trick suggestion includes the part about resetting the existing media types. Because this trick is sensitive to implementation details, I would suggest getting the current media types, clearing them on the MFT, doing the ICodecAPI part and then setting the types back. I assume that this should work in a greater number of scenarios, not just with the MS encoder. Yet it still remains an unreliable hack.
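A sketch of that refined variant, assuming stream 0 is the video stream, the encoder is the first transform behind the writer, and CODECAPI_AVEncCommonQualityVsSpeed stands in for whatever property is being tuned; to be called once the writer has configured the encoder:
#include <mfidl.h>
#include <mfreadwrite.h>
#include <mftransform.h>
#include <icodecapi.h>
#include <codecapi.h>
#include <atlbase.h>

HRESULT NudgeEncoder(IMFSinkWriter* Writer, DWORD VideoStreamIndex)
{
    CComQIPtr<IMFSinkWriterEx> WriterEx(Writer);
    if(!WriterEx)
        return E_NOINTERFACE;
    // Locate the encoder MFT behind the writer (index 0 is an assumption; enumerate and
    // check the category in real code)
    GUID Category;
    CComPtr<IMFTransform> Transform;
    HRESULT hr = WriterEx->GetTransformForStream(VideoStreamIndex, 0, &Category, &Transform);
    if(FAILED(hr))
        return hr;
    // Remember the media types the writer has already negotiated
    CComPtr<IMFMediaType> InputType, OutputType;
    Transform->GetInputCurrentType(0, &InputType);
    Transform->GetOutputCurrentType(0, &OutputType);
    // Clear the types, apply the ICodecAPI fine tuning, then set the types back
    Transform->SetInputType(0, nullptr, 0);
    Transform->SetOutputType(0, nullptr, 0);
    CComQIPtr<ICodecAPI> CodecApi(Transform);
    if(CodecApi)
    {
        VARIANT Value;
        VariantInit(&Value);
        Value.vt = VT_UI4;
        Value.ulVal = 50; // example value
        CodecApi->SetValue(&CODECAPI_AVEncCommonQualityVsSpeed, &Value);
    }
    Transform->SetOutputType(0, OutputType, 0); // output first, then input - the usual encoder order
    Transform->SetInputType(0, InputType, 0);
    return S_OK;
}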
IMO Nvidia’s encoder implementation is terrible (the worst across vendors); Intel’s is better, but it still has its own issues. Again IMO, the MFTs are only provided to meet the minimal certification requirements for hardware video encoding, and for this reason their implementations are not well aligned. Various software packages prefer to implement video encoding via vendor SDKs rather than the Media Foundation Transform interface. In one of my projects I, too, skipped the idea of leveraging the stock MFTs for encoding and implemented my own MFTs on top of the vendor SDKs.
Would the class factory approach in this post work with the IMFSinkWriter? This would avoid writing too much code…
I suppose so, yes; this should work, even though I feel it is not pleasant work to patch it that way. Also, you might need to take into account support for hardware encoders, because the Sink Writer tends to use hardware-assisted encoding in some cases, including the scenario where it is given a DXGI device.
Another sort of hack, similar but maybe a bit less intrusive (although to implement it you would have to have a better understanding of the internals), is to redefine the vendor-specific encoder CLSIDs within the Sink Writer initialization scope. There are just three encoders (AMD, Intel, Nvidia; okay, there is a fourth from Shanghai Zhaoxin Semiconductor, but it is not really popular) and their CLSIDs are known. If you CoRegisterClassObject in a smart way, you can hook MFT instantiation while still letting Media Foundation decide which encoder to choose. It is just another idea though, and which route is best might depend on other factors.
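Just to illustrate the interception idea and nothing more: the encoder CLSID and the vendor DLL path below are placeholders, error handling is reduced to a minimum, and whether a given activation path is actually caught this way needs to be verified per scenario.
#include <windows.h>
#include <mftransform.h>

static CLSID const CLSID_VendorEncoderMft = { /* vendor specific, known value */ };

class InterceptingClassFactory : public IClassFactory
{
    LONG m_ReferenceCount = 1;
public:
    // IUnknown
    STDMETHODIMP QueryInterface(REFIID InterfaceIdentifier, void** Object) override
    {
        if(InterfaceIdentifier == IID_IUnknown || InterfaceIdentifier == IID_IClassFactory)
        {
            *Object = static_cast<IClassFactory*>(this);
            AddRef();
            return S_OK;
        }
        *Object = nullptr;
        return E_NOINTERFACE;
    }
    STDMETHODIMP_(ULONG) AddRef() override { return InterlockedIncrement(&m_ReferenceCount); }
    STDMETHODIMP_(ULONG) Release() override { ULONG const ReferenceCount = InterlockedDecrement(&m_ReferenceCount); if(!ReferenceCount) delete this; return ReferenceCount; }
    // IClassFactory: create the real encoder straight from the vendor DLL so that the call
    // does not loop back into this very registration, then fine tune and hand it out
    STDMETHODIMP CreateInstance(IUnknown* OuterUnknown, REFIID InterfaceIdentifier, void** Object) override
    {
        typedef HRESULT (STDAPICALLTYPE* DLLGETCLASSOBJECT)(REFCLSID, REFIID, void**);
        HMODULE const Module = LoadLibraryW(L"VendorEncoder.dll"); // placeholder path
        if(!Module)
            return REGDB_E_CLASSNOTREG;
        auto const GetClassObject = reinterpret_cast<DLLGETCLASSOBJECT>(GetProcAddress(Module, "DllGetClassObject"));
        if(!GetClassObject)
            return REGDB_E_CLASSNOTREG;
        IClassFactory* RealFactory = nullptr;
        HRESULT hr = GetClassObject(CLSID_VendorEncoderMft, IID_PPV_ARGS(&RealFactory));
        if(FAILED(hr))
            return hr;
        hr = RealFactory->CreateInstance(OuterUnknown, InterfaceIdentifier, Object);
        RealFactory->Release();
        // ICodecAPI fine tuning of the freshly created MFT would go right here
        return hr;
    }
    STDMETHODIMP LockServer(BOOL) override { return S_OK; }
};

// Around the Sink Writer setup: register the hook, create the writer, revoke
//   DWORD RegistrationCookie;
//   CoRegisterClassObject(CLSID_VendorEncoderMft, new InterceptingClassFactory, CLSCTX_INPROC_SERVER, REGCLS_MULTIPLEUSE, &RegistrationCookie);
//   // ... MFCreateSinkWriterFromURL and the rest of the setup ...
//   CoRevokeClassObject(RegistrationCookie);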
The actual video frame is a P frame both in terms of MP4 box formatting and contained NAL units (the video is in fact an “infinite GOP” flavor of recording where all frames are P frames except the very first IDR one).
The problem is specific to fragmented MP4 files (and maybe even to a subset of those), however it is pretty consistent and shows up with both H.264 and H.265/HEVC video.
Another problem (bug) with the Microsoft Media Foundation MPEG-4 Media Source H.265/HEVC handler is that it ignores the conformance_window_flag flag and the related values from H.265’s seq_parameter_set_rbsp (see the H.265 spec, F.7.3.2.2.1 General sequence parameter set RBSP syntax).
The problem might or might not be limited to fragmented MP4 variants.
It is overall questionable whether it was a good idea to report video stream properties using parameter set data. This is not necessarily bad, especially if it had been accurately documented in the first place. Apparently it raises certain issues from time to time, like this one: Media Foundation and Windows Explorer reporting an incorrect video resolution, 2560×1440 instead of 1920×1080. Pretty much every other piece of software and library does not take the trouble to parse the bitstream and simply forwards the values from the tkhd and/or stsd boxes, and why not?
Not so with Media Foundation primitives, which shake the properties out of the bitstream and its parameter sets. There is no problem if the values match one another throughout the file, of course.
A bigger problem, however, is that when parsing the H.265/HEVC bitstream the media source handler fails to take the cropping window into account… Seriously!
conformance_window_flag equal to 1 indicates that the conformance cropping window offset parameters follow next in the SPS. conformance_window_flag equal to 0 indicates that the conformance cropping window offset parameters are not present.
The popular resolution of 1920×1080, when encoded in 16×16 blocks, effectively consists of 120×68 blocks, that is 1088 luma samples in height. The height of 1080 is obtained by cropping the 1088-sample coded height at one or both edges. By ignoring the cropping, Microsoft’s handler misreports the video size as 1920×1088 even when all other parts of the video file carry the correct value of 1080.
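For reference, the arithmetic the handler should be applying for the common 4:2:0 case (SubWidthC = SubHeightC = 2 per the H.265 spec), shown as a sketch:
// Conformance window offsets are expressed in chroma units, hence the factor of 2 for 4:2:0
unsigned int CroppedWidth(unsigned int pic_width_in_luma_samples, unsigned int conf_win_left_offset, unsigned int conf_win_right_offset)
{
    return pic_width_in_luma_samples - 2 * (conf_win_left_offset + conf_win_right_offset);
}
unsigned int CroppedHeight(unsigned int pic_height_in_luma_samples, unsigned int conf_win_top_offset, unsigned int conf_win_bottom_offset)
{
    return pic_height_in_luma_samples - 2 * (conf_win_top_offset + conf_win_bottom_offset);
}
// 1920x1088 coded with conf_win_bottom_offset = 4 (that is, 8 luma rows):
// CroppedHeight(1088, 0, 4) == 1080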
1920×1080 HEVC (meaning it does not play in every browser – beware and use Edge)
If there were a prize for the messiest SDK, Intel Media SDK would be a favorite. They seem to have put special care into making things confusing, unclear, inconvenient to use and thoroughly perplexing.
So there is no clear signal as to which versions of the SDK support the SkipFrame field. One has to query, and the query itself is not straightforward: one needs to build a multi-piece structure requesting multiple things, among which this field comes back zeroed if the functionality is not supported. That could be fine if other vendors had not shown that there are much friendlier ways to expose features to developers.
Going further: the member itself is documented as introduced in SDK version 1.9. Good to know! Let us continue reading:
The enumeration itself is available since SDK version 1.11. That’s a twist!
To summarize, it is likely unsafe to do anything about this functionality, which is one small thing among so many there, before SDK 1.9. With SDK versions 1.9 and 1.10 the values are undefined, because the enumeration was only introduced in SDK 1.11 and then extended in 1.13. Regardless of the SDK version, one also needs to build a query (which alone makes you feel miserable if you happen to know how capability discovery is implemented by NVIDIA), because even if the field is known to the SDK runtime, its implementation might be missing.
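A minimal sketch of such a query, assuming an already initialized encode session; if SkipFrame comes back zeroed, the feature is not there (structure and enumeration names as of the SDK versions mentioned above):
#include <mfxvideo.h>

bool IsSkipFrameSupported(mfxSession Session)
{
    mfxExtCodingOption2 Option2In {}, Option2Out {};
    Option2In.Header.BufferId = Option2Out.Header.BufferId = MFX_EXTBUFF_CODING_OPTION2;
    Option2In.Header.BufferSz = Option2Out.Header.BufferSz = sizeof (mfxExtCodingOption2);
    Option2In.SkipFrame = MFX_SKIPFRAME_INSERT_DUMMY; // ask whether this mode is doable
    mfxExtBuffer* ExtBufferIn[] { &Option2In.Header };
    mfxExtBuffer* ExtBufferOut[] { &Option2Out.Header };
    mfxVideoParam In {}, Out {};
    In.mfx.CodecId = Out.mfx.CodecId = MFX_CODEC_AVC;
    In.NumExtParam = Out.NumExtParam = 1;
    In.ExtParam = ExtBufferIn;
    Out.ExtParam = ExtBufferOut;
    mfxStatus const Status = MFXVideoENCODE_Query(Session, &In, &Out);
    return Status >= MFX_ERR_NONE && Option2Out.SkipFrame != 0; // warnings are positive statuses
}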
However, as often happens, there is a silver lining if you look hard enough: we have to thank Intel for offering the capability at all, because AMD does not offer it whatsoever.
Further experiments with Direct3D 11 shadertoy rendering: HTTP Server API integration and on-demand serving of parts of an HTTP Live Streaming (HLS) asset using Media Foundation with hardware video encoding. An hls.js player is capable of reading and playing the content, including stepping between quality levels.
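A reduced sketch of the HTTP Server API side of such an integration, under assumptions of my own: a fixed URL prefix, synchronous processing, and a hypothetical RenderAndEncodeSegment() standing in for the Media Foundation rendering/encoding part; shutdown and error handling are omitted.
#include <windows.h>
#include <http.h>
#include <string>
#include <vector>
#pragma comment(lib, "httpapi.lib")

std::vector<UINT8> RenderAndEncodeSegment(PCWSTR RelativePath); // hypothetical, produces playlist or segment bytes

void ServeHls()
{
    HTTPAPI_VERSION Version = HTTPAPI_VERSION_2;
    HttpInitialize(Version, HTTP_INITIALIZE_SERVER, nullptr);
    HTTP_SERVER_SESSION_ID SessionIdentifier;
    HttpCreateServerSession(Version, &SessionIdentifier, 0);
    HTTP_URL_GROUP_ID UrlGroupIdentifier;
    HttpCreateUrlGroup(SessionIdentifier, &UrlGroupIdentifier, 0);
    HttpAddUrlToUrlGroup(UrlGroupIdentifier, L"http://+:8080/hls/", 0, 0);
    HANDLE QueueHandle;
    HttpCreateRequestQueue(Version, nullptr, nullptr, 0, &QueueHandle);
    HTTP_BINDING_INFO BindingInformation {};
    BindingInformation.Flags.Present = 1;
    BindingInformation.RequestQueueHandle = QueueHandle;
    HttpSetUrlGroupProperty(UrlGroupIdentifier, HttpServerBindingProperty, &BindingInformation, sizeof BindingInformation);
    std::vector<UINT8> RequestBuffer(64 << 10);
    for(; ; )
    {
        auto const Request = reinterpret_cast<HTTP_REQUEST*>(RequestBuffer.data());
        ULONG BytesReturned;
        if(HttpReceiveHttpRequest(QueueHandle, HTTP_NULL_ID, 0, Request, static_cast<ULONG>(RequestBuffer.size()), &BytesReturned, nullptr) != NO_ERROR)
            break;
        // The playlist or media segment is produced on demand at this very moment
        std::vector<UINT8> Body = RenderAndEncodeSegment(Request->CookedUrl.pAbsPath);
        HTTP_RESPONSE Response;
        ZeroMemory(&Response, sizeof Response);
        Response.StatusCode = 200;
        Response.pReason = "OK";
        Response.ReasonLength = 2;
        HTTP_DATA_CHUNK BodyChunk;
        ZeroMemory(&BodyChunk, sizeof BodyChunk);
        BodyChunk.DataChunkType = HttpDataChunkFromMemory;
        BodyChunk.FromMemory.pBuffer = Body.data();
        BodyChunk.FromMemory.BufferLength = static_cast<ULONG>(Body.size());
        Response.EntityChunkCount = 1;
        Response.pEntityChunks = &BodyChunk;
        ULONG BytesSent;
        HttpSendHttpResponse(QueueHandle, Request->RequestId, 0, &Response, nullptr, &BytesSent, nullptr, 0, nullptr, nullptr);
    }
}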
A sort of Google Stadia for shadertoys, with video on demand and possibly low latency. Standard HLS latency (I am not following the latest HTTP/2 extensions for lower latency HLS) is of course nowhere near the real ultra-low latency that we have in Rainway for web-based game streaming, at levels as low as 10-20 milliseconds with HTML5 delivery; however, the approach proves that it is possible to deliver content with on-demand rendering.
Perhaps it is possible to use the approach to broadcast live content with server-side, GPU-based post-processing. With a single viewer it is easy to change quality levels, because the client requests a new segment without also downloading it in another quality. Since consumer-grade H.264/H.265 encoders are not normally designed to encode much faster than realtime (1920×1080@100 for H.264 is something to align expectations with, perhaps with only higher-end NVIDIA cards offering more), a quality change can be handled easily, but producing several qualities at a time might be an excessive load.
The overall simplicity of HLS syntax allows formatting the virtual asset in a flexible way: it can be a true live asset, or it can be a static, fixed-length, seek-enabled asset with on-demand rendering from a randomly accessed point.
I would also like to use this opportunity to mention another beautiful shader, “The Universe Within” by Martijn “BigWings” Steinrucken, which is running in my screenshot.
Some time ago I found that my account at Intel® Developer Zone was disabled. It was strange, but who knows; let us go with the assumption that there was a good reason.
For a moment I thought I was using the wrong credentials, but I have them saved. When the password reset email did not show up, it was a bigger surprise – at least these things were supposed to be working. The username reminder did work and generally confirmed that I was using the proper sign-in data.
Given that the “Contact Us” form is dedicated to login problems and, once submitted, says “Thank you for contacting Intel. Your information has been submitted and we will respond to your inquiry within 48 hours.”, they seem to be disabling accounts from time to time, and there is an emergency feedback channel for the unexpected.
However, I just realized that a couple of weeks or more have already passed, and there has been no response. RIP Intel Developer Zone.
Video GPU vendors (AMD, Intel, NVIDIA) ship their hardware with drivers, which in turn provide a hardware-assisted decoder for JPEG (also known as MJPG, MJPEG and Motion JPEG) video in the form factor of a Media Foundation Transform (MFT).
JPEG is not included in the DirectX Video Acceleration (DXVA) 2.0 specification, however the hardware carries an implementation of the decoder. A separate additional MFT is a natural way to provide OS integration.
Presumably the MFT behaves as a normal asynchronous MFT; however, as long as this markup does not have side effects with Microsoft’s own software, AMD does not care about the confusion it causes for others.
Furthermore, the registration information for this decoder suggests that it can decode into the MFVideoFormat_NV12 video format, and sadly this is again an inaccurate promise. Despite the claim, the capability is missing and Microsoft’s Video Processor MFT jumps in as needed to satisfy such a format conversion.
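A quick way to see what the registration actually advertises is to enumerate the hardware MJPG decoders and look at their registered output subtypes; a sketch, assuming MFStartup has been called and the activation objects expose MFT_OUTPUT_TYPES_Attributes:
#include <cstdio>
#include <vector>
#include <mfapi.h>
#include <mftransform.h>
#include <atlbase.h>

void ListMjpgDecoders()
{
    MFT_REGISTER_TYPE_INFO const InputType { MFMediaType_Video, MFVideoFormat_MJPG };
    IMFActivate** Activates = nullptr;
    UINT32 ActivateCount = 0;
    if(FAILED(MFTEnumEx(MFT_CATEGORY_VIDEO_DECODER, MFT_ENUM_FLAG_HARDWARE | MFT_ENUM_FLAG_SORTANDFILTER, &InputType, nullptr, &Activates, &ActivateCount)))
        return;
    for(UINT32 Index = 0; Index < ActivateCount; Index++)
    {
        CComHeapPtr<WCHAR> FriendlyName;
        UINT32 Length;
        Activates[Index]->GetAllocatedString(MFT_FRIENDLY_NAME_Attribute, &FriendlyName, &Length);
        wprintf(L"%s\n", static_cast<WCHAR*>(FriendlyName));
        UINT32 BlobSize = 0;
        Activates[Index]->GetBlobSize(MFT_OUTPUT_TYPES_Attributes, &BlobSize);
        std::vector<MFT_REGISTER_TYPE_INFO> OutputTypes(BlobSize / sizeof (MFT_REGISTER_TYPE_INFO));
        Activates[Index]->GetBlob(MFT_OUTPUT_TYPES_Attributes, reinterpret_cast<UINT8*>(OutputTypes.data()), BlobSize, &BlobSize);
        for(auto&& OutputType : OutputTypes)
            if(OutputType.guidSubtype == MFVideoFormat_NV12)
                wprintf(L"  NV12 output is advertised\n"); // which is not what the MFT actually delivers
        Activates[Index]->Release();
    }
    CoTaskMemFree(Activates);
}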
These were just minor things, more or less easy to tolerate. However, a rule of thumb is that the Media Foundation glue layer provided by technology partners such as GPU vendors only satisfies minimal certification requirements, and beyond that it causes suffering and pain to anyone who wants to use it in real-world scenarios.
AMD’s take on making developers feel miserable is the way hardware-assisted JPEG decoding actually takes place.
The thread 0xc880 has exited with code 0 (0x0).
The thread 0x593c has exited with code 0 (0x0).
The thread 0xa10 has exited with code 0 (0x0).
The thread 0x92c4 has exited with code 0 (0x0).
The thread 0x9c14 has exited with code 0 (0x0).
The thread 0xa094 has exited with code 0 (0x0).
The thread 0x609c has exited with code 0 (0x0).
The thread 0x47f8 has exited with code 0 (0x0).
The thread 0xe1ec has exited with code 0 (0x0).
The thread 0x6cd4 has exited with code 0 (0x0).
The thread 0x21f4 has exited with code 0 (0x0).
The thread 0xd8f8 has exited with code 0 (0x0).
The thread 0xf80 has exited with code 0 (0x0).
The thread 0x8a90 has exited with code 0 (0x0).
The thread 0x103a4 has exited with code 0 (0x0).
The thread 0xa16c has exited with code 0 (0x0).
The thread 0x6754 has exited with code 0 (0x0).
The thread 0x9054 has exited with code 0 (0x0).
The thread 0x9fe4 has exited with code 0 (0x0).
The thread 0x12360 has exited with code 0 (0x0).
The thread 0x31f8 has exited with code 0 (0x0).
The thread 0x3214 has exited with code 0 (0x0).
The thread 0x7968 has exited with code 0 (0x0).
The thread 0xbe84 has exited with code 0 (0x0).
The thread 0x11720 has exited with code 0 (0x0).
The thread 0xde10 has exited with code 0 (0x0).
The thread 0x5848 has exited with code 0 (0x0).
The thread 0x107fc has exited with code 0 (0x0).
The thread 0x6e04 has exited with code 0 (0x0).
The thread 0x6e90 has exited with code 0 (0x0).
The thread 0x2b18 has exited with code 0 (0x0).
The thread 0xa8c0 has exited with code 0 (0x0).
The thread 0xbd08 has exited with code 0 (0x0).
The thread 0x1262c has exited with code 0 (0x0).
The thread 0x12140 has exited with code 0 (0x0).
The thread 0x8044 has exited with code 0 (0x0).
The thread 0x6208 has exited with code 0 (0x0).
The thread 0x83f8 has exited with code 0 (0x0).
The thread 0x10734 has exited with code 0 (0x0).
For whatever reason they create a thread for every processed video frame, or close to that… Resource utilization and performance are affected accordingly. Imagine you are processing a video feed from a high frame rate camera. The decoder itself, including its AMF runtime overhead, decodes images in a millisecond or less, but they spoiled it with absurd threading, topped with other bugs.
However, AMD video cards still have the hardware implementation of the codec, and this capability is also exposed via their AMF SDK.
I guess they stop harassing developers once those developers switch from the out-of-the-box MFT to the SDK interface into their decoder. “AMD MFT MJPEG Decoder” is highly likely just a wrapper over the AMF interface; however, my guess is that the problematic part is exactly the abandoned wrapper and not the core functionality.
The previous post focused on problems with the hardware MFT decoder provided as a part of the video driver package. This time I am going to share some data on how the inefficiency affects video capture performance, using a high frame rate 260 FPS camera as a test stand. Apparently the effect is more visible at high frame rates, because the CPU and GPU hardware is already fast enough when processing a less complicated signal.
There is already some interest from the AMD end (why this is exceptional on its own deserves a separate post), and some bug fixes are already under way.
The performance problem is easy to overlook because the decoder overall performs without fatal issues and provides the expected output: no failures, no error codes, no deadlocks, neither the CPU nor a GPU engine is maxed out, so things look more or less fine at first glance… The test application uses Media Foundation and the Source Reader API to read textures in hardware MFT enabled mode, and discards the textures, just printing out the frame rate.
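A rough sketch of what such a test loop can look like; the real application reads asynchronously, while here a synchronous reader is assumed, created with MF_READWRITE_ENABLE_HARDWARE_TRANSFORMS and MF_SOURCE_READER_D3D_MANAGER attributes so that the hardware decoder MFT is engaged:
#include <cstdio>
#include <windows.h>
#include <mfapi.h>
#include <mfreadwrite.h>
#include <atlbase.h>

void MeasureFrameRate(IMFSourceReader* Reader)
{
    ULONGLONG BaseTime = GetTickCount64();
    unsigned int SampleCount = 0;
    for(; ; )
    {
        DWORD StreamIndex, StreamFlags;
        LONGLONG SampleTime;
        CComPtr<IMFSample> Sample;
        if(FAILED(Reader->ReadSample(MF_SOURCE_READER_FIRST_VIDEO_STREAM, 0, &StreamIndex, &StreamFlags, &SampleTime, &Sample)))
            break;
        if(StreamFlags & MF_SOURCE_READERF_ENDOFSTREAM)
            break;
        if(!Sample)
            continue;
        SampleCount++; // the sample (a texture in D3D aware mode) is discarded right away
        ULONGLONG const Time = GetTickCount64();
        if(Time - BaseTime >= 2000) // report every couple of seconds
        {
            printf("%.3f video samples per second captured\n", SampleCount * 1000.0 / (Time - BaseTime));
            BaseTime = Time;
            SampleCount = 0;
        }
    }
}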
AMD MFT MJPEG Decoder
C:\...\MjpgCameraReader\bin\x64\Release>MjpgCameraReader.exe
Using camera HD USB Camera
Using adapter Radeon RX 570 Series
Using video capture format 640x360@260.004 MFVideoFormat_MJPG
Using hardware decoder MFT AMD MFT MJPEG Decoder
Using video frame format 640x384@260.004 MFVideoFormat_YUY2
72.500 video samples per second captured
134.000 video samples per second captured
135.000 video samples per second captured
134.500 video samples per second captured
135.500 video samples per second captured
134.000 video samples per second captured
134.000 video samples per second captured
135.000 video samples per second captured
134.500 video samples per second captured
133.500 video samples per second captured
134.000 video samples per second captured
With no sign of hitting a bottleneck the reader process produces ~134 FPS from the video capture device.
Alax.Info MJPG Video Decoder for AMD Hardware
My replacement for the hardware decoder MFT does the decoding of the same signal and, generally, shares a lot with AMD’s own decoder: both MFTs are built on top of the Advanced Media Framework (AMF) SDK. The driver package installs the runtime for this SDK and installs a decoder MFT which is linked against a copy of the runtime (according to an AMD representative, the statically linked copy shares the same codebase).
C:\...\MjpgCameraReader\bin\x64\Release>MjpgCameraReader.exe
Using camera HD USB Camera
Using adapter Radeon RX 570 Series
Using video capture format 640x360@260.004 MFVideoFormat_MJPG
Using substitute decoder Alax.Info MJPG Video Decoder for AMD Hardware
Using video frame format 640x360@260.004 MFVideoFormat_YUY2
74.000 video samples per second captured
261.000 video samples per second captured
261.000 video samples per second captured
261.000 video samples per second captured
261.000 video samples per second captured
260.500 video samples per second captured
261.000 video samples per second captured
261.000 video samples per second captured
261.000 video samples per second captured
261.000 video samples per second captured
260.500 video samples per second captured
Similar CPU and GPU utilization levels, with a higher frame rate. Actually, with the expected frame rate, because it is the rate the camera is supposed to operate at.
1280×720@120 Mode
Interestingly, in the lower FPS mode the AMD MFT threading issues are still present, and, more than that, the MFT exhibits two other issues (one of them a “just ignore” one, per AMD’s comment). At the same time the video capture rate is no longer reduced: the horsepower of the hardware hides the implementation inefficiency.
Using camera HD USB Camera
Using adapter Radeon RX 570 Series
Using video capture format 1280x720@120.000 MFVideoFormat_MJPG
Using hardware decoder MFT AMD MFT MJPEG Decoder
Using video frame format 1280x736@120.000 MFVideoFormat_YUY2
18.500 video samples per second captured
120.000 video samples per second captured
120.000 video samples per second captured
120.000 video samples per second captured
120.000 video samples per second captured
120.000 video samples per second captured
120.000 video samples per second captured
120.000 video samples per second captured
120.000 video samples per second captured
Intel Hardware M-JPEG Decoder MFT
AMD is not the only GPU vendor out there, and my development system is equipped with an integrated GPU from Intel as well, so why not give it a try?
In AMD’s defence, Intel’s decoder exhibits subpar performance of its own:
C:\...\MjpgCameraReader\bin\x64\Release>MjpgCameraReader.exe
Using camera HD USB Camera
Using adapter Intel(R) UHD Graphics 630
Using video capture format 640x360@260.004 MFVideoFormat_MJPG
Using hardware decoder MFT Intel® Hardware M-JPEG Decoder MFT
Using video frame format 640x368@260.004 MFVideoFormat_YUY2
24.000 video samples per second captured
63.500 video samples per second captured
63.500 video samples per second captured
64.000 video samples per second captured
63.500 video samples per second captured
63.000 video samples per second captured
63.500 video samples per second captured
62.000 video samples per second captured
63.500 video samples per second captured
64.000 video samples per second captured
63.500 video samples per second captured
At lower relative utilization levels and, again, without hitting any bottleneck visibly, the capture rate is reduced.
And this happens even without the threading problem that I could at least see in AMD’s case.
The 120 FPS mode is doing fine:
C:\...\MjpgCameraReader\bin\x64\Release>MjpgCameraReader.exe
Using camera HD USB Camera
Using adapter Intel(R) UHD Graphics 630
Using video capture format 1280x720@120.000 MFVideoFormat_MJPG
Using hardware decoder MFT Intel® Hardware M-JPEG Decoder MFT
Using video frame format 1280x720@120.000 MFVideoFormat_YUY2
77.000 video samples per second captured
119.000 video samples per second captured
120.000 video samples per second captured
121.000 video samples per second captured
119.000 video samples per second captured
121.000 video samples per second captured
120.000 video samples per second captured
120.000 video samples per second captured
120.500 video samples per second captured
119.500 video samples per second captured
120.000 video samples per second captured
That is, there is an obvious performance issue in Intel’s implementation, since it fails to process the lower resolution signal at its original rate, or even at the rate it shows for the higher resolution signal!
So does 1920×1080@60:
C:\...\MjpgCameraReader\bin\x64\Release>MjpgCameraReader.exe
Using camera HD USB Camera
Using adapter Intel(R) UHD Graphics 630
Using video capture format 1920x1080@60.000 MFVideoFormat_MJPG
Using hardware decoder MFT Intel® Hardware M-JPEG Decoder MFT
Using video frame format 1920x1088@60.000 MFVideoFormat_YUY2
49.500 video samples per second captured
60.500 video samples per second captured
59.500 video samples per second captured
60.000 video samples per second captured
60.000 video samples per second captured
60.000 video samples per second captured
60.000 video samples per second captured
60.000 video samples per second captured
60.000 video samples per second captured
60.000 video samples per second captured
60.000 video samples per second captured
In closing
The bottom line is that the hardware ASICs are generally good, but the quality of the software MFT layer is not something GPU vendors care much about.
The application below does the testing on the first available GPU, and it assumes you have a video capture device compatible with the Media Foundation API. The application uses the camera’s highest frame rate MJPG format and the hardware decoder MFT associated with the GPU.
One more thing to mention is that video capture takes place through the so-called Microsoft Windows Camera Frame Server (FrameServer) service, notorious and undocumented. The Frame Server virtualizes the video capture device, adding processing overhead and cross-process synchronization.
Some time later I will compare the performance of capturing via the Frame Server and via the Media Foundation default implementation of the video capture device proxy. I expect, though, that there is no visible performance difference, as those parts are, after all, done well.
“Modern” C++/WinRT is a way to write rather powerful things in a compact and readable manner, mixing together everything you can think of: classic C++ and its libraries; UWP APIs including HTTP client, JSON and COM; the ability to put the code into console/desktop applications; the async API model and C++20 coroutines.
A fragment of Telegram bot code that echoes a message back, written with just the bare Windows 10 SDK API set and no external libraries, for example:
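The original snippet is not reproduced here; below is a minimal sketch along the same lines, talking to the public Telegram Bot API with Windows.Web.Http and Windows.Data.Json (the <TOKEN> placeholder and the simplifying assumption that every update is a text message are mine):
#include <winrt/Windows.Foundation.h>
#include <winrt/Windows.Foundation.Collections.h>
#include <winrt/Windows.Web.Http.h>
#include <winrt/Windows.Data.Json.h>
using namespace winrt;
using namespace Windows::Foundation;
using namespace Windows::Web::Http;
using namespace Windows::Data::Json;

IAsyncAction EchoOnceAsync() // call winrt::init_apartment() beforehand
{
    HttpClient Client;
    JsonObject const Updates = JsonObject::Parse(co_await Client.GetStringAsync(Uri { L"https://api.telegram.org/bot<TOKEN>/getUpdates" }));
    for(IJsonValue const& UpdateValue : Updates.GetNamedArray(L"result"))
    {
        JsonObject const Message = UpdateValue.GetObject().GetNamedObject(L"message");
        hstring const Text = Message.GetNamedString(L"text");
        auto const ChatIdentifier = static_cast<int64_t>(Message.GetNamedObject(L"chat").GetNamedNumber(L"id"));
        // Echo the text back into the same chat
        co_await Client.GetStringAsync(Uri { L"https://api.telegram.org/bot<TOKEN>/sendMessage?chat_id=" + to_hstring(ChatIdentifier) + L"&text=" + Uri::EscapeComponent(Text) });
    }
}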
In continuation of the previous post about C++/WinRT and Telegram, here we are with @ParameterSetAnalyzeBot: “Your buddy to extract H.264 parameter set NAL data”. In a chat, it expects an MP4 file with an H.264 video track to be sent to him (her?). Then it extracts the data from the sample description box and deciphers it into readable form:
It literally feeds the MP4 file to the Media Foundation Source Reader API, pulls MF_MT_MPEG_SEQUENCE_HEADER and pipes the data to the h264_analyze tool (my fork of it has a Visual Studio 2019 project and an added ability to take input from stdin for piping needs).
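A sketch of the extraction part, assuming MFStartup has been called; the blob holds the parameter set data, which is then piped to h264_analyze:
#include <vector>
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <atlbase.h>

std::vector<UINT8> ReadSequenceHeader(PCWSTR Path)
{
    CComPtr<IMFSourceReader> Reader;
    ATLENSURE_SUCCEEDED(MFCreateSourceReaderFromURL(Path, nullptr, &Reader));
    CComPtr<IMFMediaType> MediaType;
    ATLENSURE_SUCCEEDED(Reader->GetCurrentMediaType(MF_SOURCE_READER_FIRST_VIDEO_STREAM, &MediaType));
    UINT32 Size = 0;
    ATLENSURE_SUCCEEDED(MediaType->GetBlobSize(MF_MT_MPEG_SEQUENCE_HEADER, &Size));
    std::vector<UINT8> Data(Size);
    ATLENSURE_SUCCEEDED(MediaType->GetBlob(MF_MT_MPEG_SEQUENCE_HEADER, Data.data(), Size, &Size));
    return Data; // feed this to h264_analyze via its stdin
}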
Maybe it is worth adding a full Media Foundation attribute printout as well, and similar H.265/HEVC data. That will have to wait for the next occasion though.
And – yeah – it does have support for fragmented MP4 too:
Comparing time codes is one method, and getting an impression of the latency by driving is another. The Rainway Xbox One UWP application acts as a thin client to a desktop PC game.
Whoever the engineers who wrote the core technology, the minimal-latency streaming code – wow, I am so impressed by what they’ve created! It’s SO quick, like I’m streaming now from two computers to remote platforms, and everything is all over WiFi and latency is 9ms or less. This is giving life to some old hardware, and it’s enabling me to use my computer anywhere.