Why _would_ you expect anything different to happen at a massive company?
It's not like the team who writes the drivers is likely to know of the team working on optimizing compilers, profilers, or anything at all, really.
My experience has been that especially in companies working in diverse disciplines across disparate codebases, very little is shared. A team of 8 in a tiny company is just as likely to make the same mistakes as a team of 8 in a bigger company. At large companies with more unified codebases and disciplines, maybe one person or team has added some process which helps identify egregious performance issues at some point in the past. But such shared process or tooling would be really hard to build at a company like Intel, where one team makes open-source Linux drivers while another makes highly specialized RTL design software, for example.
>Why _would_ you expect anything different to happen at a massive company?
Because a massive company has enough money around to put the processes in place and hire skilled people to do both deep[0] testing and system[1] testing.
[1] The definition of "system testing" I'm using: "Testing to assess the value of the system to people who matter." Those include stakeholders, application developers, end users, etc.
Maybe they did profile it and this fix is the result. Or maybe Vulkan raytracing on Linux for an unreleased GPU is lower priority and they just recently got around to noticing it.
Massive companies are more prone to silly errors like this.
Source: I work for a similar massive company. You would not believe the number of issues similar to this. This one is getting attention because it happened in open-source code.
Does the company have people whose job description includes looking for deeper problems such as this one?
I don't know what your position or political standing in the company is, but I assume that with the tech job market the way it is, if you still work there you care about the company to some degree. So perhaps bringing this issue up with (more) senior management is the way to go.
And if they say there is no budget, or that it would take a bureaucratic nightmare to make space for it in the budget, ask them what the budget is for dealing with PR disasters such as this one.
That's really ignorant given that Intel has thousands of software engineers supporting hundreds of open-source projects you use daily, including Linux, where Intel has consistently been a top-ten contributor for years.
This mistake could easily have been made in other vendors' Linux GPU drivers too; in the end those don't have nearly the same priority (and in turn resources) as the Windows GPU drivers. It's a really easy mistake to make, and also a very easy one to find. And I don't know if anyone even cared about ray tracing with Intel integrated graphics on Linux desktops (and in turn nobody profiled it deeply); ray tracing is generally something you're much less likely to do on an integrated GPU.
And sure, their software department(s?) probably have a lot of potential for improvement; they have likely been hampered by the same internal structures which led to Intel faceplanting somewhat hard recently.
Even so, the very first thing anybody learns about GPU programming is to use the VRAM on the card whenever possible, and to minimize transfers back and forth between VRAM and main memory. This is a super basic mistake that should have been caught by some kind of test suite, at least.
Intel's high-level software teams are okay, and their hardware teams are great, but their firmware teams are a bit of a garbage fire. I assume that nobody really wants to work on firmware, and the organization does not encourage it.
I'm not sure this is something you'd easily find through profiling. The change was making a memory allocation use GPU memory rather than system memory. Allocating system memory probably isn't noticeably slower than allocating GPU memory, so the line that's at fault wouldn't show up when profiling. Instead, memory access in GPU-side raytracing code is just a bit slower when accessing the allocated memory.
So you would have to profile GPU-side code, which is probably really hard; and you'd have to find slow memory accesses, not slow code or slow algorithms, which is even harder. And those memory accesses may be spread out, so that each instruction which uses the slow memory won't stand out; the effect may only be noticeable in aggregate.
People working at big companies are ALWAYS worried about releasing lots of code to fulfill some monthly or quarterly goal. The idea that they have time to profile, improve, or check results is inconsistent with reality. When you see real code produced at big companies, it is barely good enough to satisfy the requirements, forget about any sense of high quality.
Not just Intel but programmers in general have got to demand better tools and use the tools they have. This is an obvious problem if you look for it. Profiling needs to be on every programmer's checklist.
That could be a GPU memory leak in an application, no? When an application allocates GPU memory, that's taken from main system memory on integrated chips, and the Intel driver would be responsible for that.