So even though OpenGL has been expanded dramatically since the days of the original wglCreateContext, this coupling has been retained.
This causes a nasty problem for multi-target output.
With the benefit of hindsight, I think it’s fair to say that this implicit behavior was always a bad idea; it just never caused much of a problem, because most early OpenGL usage involved a single thread targeting a single window on a single monitor. Since that trivial scenario involves only one device context, there was never an issue.
However, now that it is typical for modern applications to be multi-threaded
and target multiple windows on multiple monitors, the implicit device context targeting becomes a real issue.
Let’s start with the common case.
The most obvious problem the implicit device context target causes is that it is by definition impossible to draw to more than one window from the same OpenGL context without repeatedly calling wglMakeCurrent() to switch its target. Even if all the application wants to do is draw one full-screen image to each of two monitors, Windows treats this as two separate windows with two separate device contexts, so at least one wglMakeCurrent() per window is required every frame.
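To make this concrete, here is a minimal sketch of the per-frame flip-flopping this forces, where DrawScene() and the two HDC parameters stand in for whatever the application’s actual rendering and window setup look like:

    // One frame of output to two full-screen windows from a single GL context.
    // At least two wglMakeCurrent() calls are unavoidable here, every frame:
    void RenderBothMonitors(HGLRC GLContext, HDC MonitorA, HDC MonitorB)
    {
        wglMakeCurrent(MonitorA, GLContext); // re-target window A
        DrawScene(0);
        SwapBuffers(MonitorA);

        wglMakeCurrent(MonitorB, GLContext); // re-target window B
        DrawScene(1);
        SwapBuffers(MonitorB);
    }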
This is bad for two reasons. First, wglMakeCurrent() cannot be assumed to be a lightweight operation in OpenGL drivers, and in general the driver has no way of knowing a priori that all you’re doing with the wglMakeCurrent() is flip-flopping between two consistently targeted windows. Thus, if a driver wants to support this path efficiently, it must add yet more “pattern discovery” to the GPU driver as it tries to determine that the user is not actually targeting a fresh window requiring new setup, but rather simply switching between two consistent windows in an attempt to update them both.
Second, wglMakeCurrent() is not an OpenGL call per se; it’s a Windows call, so it is not platform-independent and cannot be used by the platform-independent part of a graphics layer. While this may seem like a minor issue, it turns out to be rather unfortunate, since operations that target the window back buffer must now thunk to the platform layer and back to ensure that the magic “target” of the operation has been set up before they can continue issuing what would otherwise have been platform-independent OpenGL code.
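As a sketch of what that thunking looks like in practice (all names here are hypothetical), the platform-independent renderer ends up needing a hook like this before it can touch the back buffer:

    // Hypothetical platform hook: the portable renderer cannot meaningfully
    // bind the window target until the platform layer has pointed the GL
    // context at the right window, so every draw-to-window path detours
    // through something like this first.
    void Win32SetWindowTarget(win32_window *Window)
    {
        wglMakeCurrent(Window->DC, Window->GLContext);
    }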
Things get worse when more threads get involved.
Things don’t get any better when multi-threaded code is involved. In keeping with the original intention of “one GL context per thread”, it is actually well specified in OpenGL for an application to give each of its threads (say, one per CPU core) its own GL context so that streams of commands can be constructed separately. Assuming these threads work off a job queue, whichever thread dequeues some OpenGL work can simply do it on its own context, and everything is fine.
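Setting that up is straightforward. Here is a sketch of the per-thread context creation, assuming the usual object sharing between contexts so that buffers and textures created on one thread are usable by the others:

    // Create one GL context per worker thread, sharing objects with the
    // main context via wglShareLists. Each worker then makes its context
    // current once at startup (e.g., on its own hidden window's DC) and
    // services the job queue with ordinary GL calls from then on.
    HGLRC CreateWorkerContext(HDC WindowDC, HGLRC MainContext)
    {
        HGLRC WorkerContext = wglCreateContext(WindowDC);
        wglShareLists(MainContext, WorkerContext);
        return WorkerContext;
    }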
However, all of that falls apart once you need to target an output surface. Because each thread’s OpenGL context implicitly targets only one device context, a thread that wants to do OpenGL work against a specific window must call wglMakeCurrent() with the HDC of that window. But it is illegal for two OpenGL contexts to target the same HDC at the same time, so any two threads that might render to the same window must now introduce a mutex around the wglMakeCurrent().
While it may seem like this would be necessary in any case, the truth is that no such mutex is required during construction of the frame’s GPU commands. Rather, the synchronization required here is a fence on GPU command retirement: the streams of commands built by the threads should retire in a specific order on the GPU, regardless of when they were submitted to the driver.
There is a very simple API change that would solve these problems.
All of these problems come from one root cause: the fact that OpenGL contexts implicitly target a single Windows device context. But there’s no real reason that has to remain true. We could add a
single pair of API calls and fix it.
OpenGL already has a robust set of APIs that deal with the concept of framebuffers. This was a necessary addition to the API because developers wanted to be able to render to textures, so instead of rendering always targeting the implicit device context, they needed a way to target a texture instead. Thus we already have:
When these functions were added, the implicit framebuffer handle of 0 became shorthand for the implicit device context target, and any other handle was one that the application could create and target for its own internal use.
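In code, the distinction looks like this (TextureFramebuffer being a handle the application previously created with glGenFramebuffers):

    // Target an application-created framebuffer, e.g., for render-to-texture:
    glBindFramebuffer(GL_FRAMEBUFFER, TextureFramebuffer);
    // ... draw to the texture ...

    // Target the reserved handle 0, meaning "the implicit window target":
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    // ... draw to whatever HDC wglMakeCurrent() last established ...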
So in some sense, the problem of how to tell the GL where output is supposed to go was already solved, and the solution already works quite well. Thus all we really need to do to solve all the Windows-specific targeting problems is remove the
implicit device context targeting of framebuffer handle 0 and replace it with
explicit device context targeting with specific framebuffer handles. We can do this with two very simple extensions to wgl:
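    // The two proposed entry points. The names come from the proposal itself;
    // the exact signatures here are a guess at the shape such an extension
    // might take:
    GLuint wglAcquireDCFramebuffer(HDC WindowDC);
    void   wglReleaseDCFramebuffer(HDC WindowDC, GLuint Framebuffer);

The handle returned by wglAcquireDCFramebuffer would then be bound with glBindFramebuffer just like any application-created framebuffer, so the platform-specific part of the code shrinks to the acquire and release calls themselves, and everything that actually draws stays platform-independent.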
Note that the wglReleaseDCFramebuffer call isn’t even strictly necessary for most usage, because device contexts might never need to be explicitly released, but it would be nice for completeness in the case where OpenGL rendering ceases on a device context and the application would like to let the GL driver know it can free any associated resources.
Specifically, if multiple GL contexts call wglAcquireDCFramebuffer on the same device context, the GL driver has to be smart enough to handle commands from those contexts in a thread-safe way. It would still be entirely up to the application to ensure that the commands are issued mutually exclusively, so that the order in which the driver received them was always the order the application intended.
This doesn’t strike me as a particularly good way to do things, because it forces the driver to do implicit ordering, which may restrict how threading can work in the driver and lead to unnecessary complexity or performance problems. Thus I would additionally propose that, were such an extension actually to be added, there should be one requirement:
Whenever two or more OpenGL contexts target the same framebuffer, they are required to use a fence.
To relieve the driver from any responsibility of serializing command streams across threads, any application using two OpenGL contexts to target the same device context for output should be required to insert a fence that serializes them:
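    // A sketch using the standard sync objects (OpenGL 3.2 / ARB_sync),
    // assuming contexts A and B share objects and both hold a framebuffer
    // acquired for the same HDC. Context A's commands are fenced; context B
    // makes the GPU wait on that fence, so B's commands cannot retire before
    // A's, regardless of when either thread submitted them.

    // On the thread driving context A, after issuing its commands:
    GLsync FrameFence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    glFlush(); // ensure the fence actually reaches the driver

    // On the thread driving context B, before issuing its commands:
    glWaitSync(FrameFence, 0, GL_TIMEOUT_IGNORED);
    // ... context B's commands targeting the shared framebuffer ...
    glDeleteSync(FrameFence);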