[WC] PC gamers experiencing crashes with 13/14th gen Intel Core i9 CPUs | UP: Intel issues root cause findings, further stability update inbound

OP
OP
anonpuffs

anonpuffs

Veteran
Icon Extra
29 Nov 2022
10,418
11,877
If AMD doesn't gain more share after this then there's nothing else they can do.
Probably won't. The majority of Intel's market share comes from laptops and large corporate vendors like Dell and HP that sell basement-tier prebuilts to corporate offices. Those are much more sensitive to price than performance, and Intel has the advantage that it owns its own foundry and can thus sell at lower margins. Outside of a few temporary products Intel is producing with TSMC, I don't see that changing much. Also, because Intel owns its own foundry and has only recently started taking external orders, it has a far larger readily available supply for laptops, which is a large part of what's limiting AMD's market share there.
 
OP
OP
anonpuffs

anonpuffs

Veteran
Icon Extra
29 Nov 2022
10,418
11,877
lmaoooo this is insane, recommended specs from intel degrade their silicon? holy shit
No, the quick and dirty is this:
For years Intel has published "recommended" power specs, which motherboard vendors of course ignored because they weren't enforced.
Then when questioned, Intel would always respond that the motherboard vendor settings were "in-spec" even if they turned off all the limiters and essentially ran unlimited power, as long as the TjMax was respected. Which led to motherboard vendors learning how to game the limits.
Now the 13th and 14th gen i7s and i9s are (allegedly) degrading, and people think it's because those motherboard vendor settings were causing the CPUs to draw too much current.

But Intel is trying to play hush-hush because the water is a little muddy on who is to blame (ofc the answer is both Intel and mobo vendors). The issue is Intel's new baseline specs can reduce the performance of their top end parts by ~10% and they don't want those to be the numbers benchmarkers go with during the product review cycle, but they also probably don't want a huge number of RMA claims from people with Raptor Lake parts lol
 
OP
OP
anonpuffs

anonpuffs

Veteran
Icon Extra
29 Nov 2022
10,418
11,877

Update: Level1techs has reached out to game devs to ask for crash telemetry and found some serious problems with Intel 13th/14th gen processors crashing that he alleges may not be able to be fixed through motherboard bios updates

-Rate of errors increasing over time
-looked at rate of errors from datacenter/server-type motherboards, even found crashing in cpus running on conservative motherboards with much lower power and speed.
-saw similar rates of failure across motherboard manufacturers, which shouldn't be happening if it's a problem with power settings as each motherboard vendor has different defaults
-was able to talk to a datacenter server provider; in one particular contract, Intel systems cost ~$1000 more per unit than similarly-configured AMD systems because service contract pricing went up due to a multitude of CPU problems
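
The vendor comparison in the third bullet can be sketched roughly like this. It's only an illustration of the reasoning: the vendor names and counts below are invented placeholders, not data from the video, and `crash_rates` / `spread` are hypothetical helper names. The idea is that if board power defaults were the main cause, per-vendor crash rates should differ noticeably, so a max/min spread close to 1x would point away from that explanation.

```python
from collections import defaultdict

# Hypothetical telemetry rows: (motherboard_vendor, sessions, crashes).
# These numbers are invented placeholders, NOT data from the video.
rows = [
    ("vendor_a", 10_000, 210),
    ("vendor_b", 8_000, 160),
    ("vendor_c", 12_000, 252),
]

def crash_rates(rows):
    """Return crashes per session for each vendor."""
    sessions = defaultdict(int)
    crashes = defaultdict(int)
    for vendor, n_sessions, n_crashes in rows:
        sessions[vendor] += n_sessions
        crashes[vendor] += n_crashes
    return {v: crashes[v] / sessions[v] for v in sessions}

def spread(rates):
    """Ratio of the highest to the lowest per-vendor crash rate."""
    return max(rates.values()) / min(rates.values())

rates = crash_rates(rows)
for vendor, rate in sorted(rates.items()):
    print(f"{vendor}: {rate:.2%}")
print(f"max/min spread: {spread(rates):.2f}x")
```

With real telemetry you'd also want to normalize by CPU model and time in service, since the first bullet says error rates grow over time.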
 

ToTTenTranz

Veteran
Icon Extra
4 Aug 2023
1,496
1,560
Wow, these are pretty damning if true.
 
OP
OP
anonpuffs

anonpuffs

Veteran
Icon Extra
29 Nov 2022
10,418
11,877
I have a i9 13900k with RTX4090.

So am I DOOMED....
are you experiencing instability? if not then probably not. still worth keeping an eye out since 10-25% figure seems very high, and that could increase over time if this is an actual silicon degradation issue and not some sort of manufacturing defect
 
29 Jan 2024
20
16
Not really...at least for now.

But the fear inside is too high at this moment
 
OP
OP
anonpuffs

anonpuffs

Veteran
Icon Extra
29 Nov 2022
10,418
11,877
well if you want peace of mind maybe limit the boosting behavior of your chip and lower the power limits until the thing blows over and intel issues a bigger fix/root cause findings. 10% isn't that much to lose and you won't even lose that much if you're playing at higher res.
 
OP
OP
anonpuffs

anonpuffs

Veteran
Icon Extra
29 Nov 2022
10,418
11,877
Yikes

Intel is selling defective 13-14th Gen CPUs


My team at Alderon Games, working on the multiplayer dinosaur survival game Path of Titans, has been encountering significant problems with Intel CPU stability. These issues, including crashes, instability, and memory corruption, are confined to the 13th and 14th generation processors. Despite all released microcode, BIOS, and firmware updates, the problem remains unresolved.

We have identified failures in five main areas:

  • End Customers: Thousands of crashes on Intel CPUs on 13th and 14th Gen CPUs in our crash reporting tools.
  • Official Dedicated Game Servers: Experiencing constant crashes, taking entire servers down.
  • Development Team: Developers using these CPUs face frequent instability while building and working on the game. It can also cause SSD and memory corruption.
  • Game Server Providers: Hosting community servers with persistent crashing issues.
  • Benchmarking Tools: Decompression and memory tests unrelated to Path of Titans also fail.

Over the last 3–4 months, we have observed that CPUs initially working well deteriorate over time, eventually failing. The failure rate we have observed from our own testing is nearly 100%, indicating it's only a matter of time before affected CPUs fail. This issue is gaining attention from news outlets and has been noted by Fortnite and RAD Game Tools, which powers decompression behind Unreal Engine.

Users are also receiving misleading error messages about running out of video driver memory, despite having sufficient memory.

Actions We Are Taking


To prevent further harm to our game, we are implementing the following measures:


  • Server Migration: We are swapping all our servers to AMD, which experience 100 times fewer crashes compared to Intel CPUs that were found to be defective.
  • Hosting Recommendations: We advise anyone hosting Path of Titans servers or selling game servers to avoid purchasing or using 13th and 14th gen Intel CPUs.
  • In-Game Notifications: We are adding a popup message in-game to inform users with these processors about the issue. Many users are currently unaware of why their game is crashing and what they can do about it.
____________
Warframe dev post: [image: 1000002267.png]
 
OP
OP
anonpuffs

anonpuffs

Veteran
Icon Extra
29 Nov 2022
10,418
11,877

-has been working on the story for months behind the scenes, have gotten a wide variety of tips
-believes the reason Intel has been quiet is because if a large number of chips aren't able to stably perform at advertised levels they may be exposed to legal liabilities
-is working with failure analysis labs on a tip for a manufacturing-stage defect based on chemical deposition leading to oxidation
-is putting out a video due to the imminent release of zen5 while they continue the analysis
-basically can't recommend any 13th or 14th gen intel parts until intel makes a statement
-is asking viewers for more data points if they have failed intel parts
-have instability reports going as far back as March of 2023
-still unclear how widespread the issue is
-challenging to disambiguate what the root causes are due to so many possible failure modes
-still in the process of validating failed specimens reported to GN
-major intel customer reported to GN that over 8 million 13th gen CPUs are potentially affected from their inventory, 6.1 million of which range from 13600k to 13900k, including -k, -f, -t, -kf and non-k models. Total affected population ranges between 10-25% depending on where they were deployed; actual known units with instability and failure range between 600k-2million
-same customer does not have 14th gen statistics yet, but those are expected to be affected too. expected to affect units from 03/2023 to at least 04/2024
-customer stressed that the 'power limit' story is not the primary problem - it is a deeper problem with the chips themselves
-source claimed that a manufacturing error led to oxidation of the vias (power/signaling conduits between different layers of a chip)
-spoke to failure analysis lab and gave a rundown of how potential oxidation would occur
-alleged leaked document says the maximum officially supported memory speeds will be reduced from DDR5-5600 to DDR5-4800
-has received a list of affected companies including large hedge funds - namedropped citadel
-claims Intel has told OEMs that they have observed a 0.035% failure rate worldwide, while OEMs claim 10-25%; the 10-25% figure comes from more than one unconnected source (one was Wendell's)
-another system integrator is failing 12% of their intel CPUs during intake
-after reviewing QA processes of several system integrators it seemed that the ones that did longer and more intensive testing were failing more intel CPUs
-referenced a recent buildzoid video that speculated the ringbus getting blasted by high voltages and degrading is also a possibility
-regardless of what the root cause is, there are already millions of units affected - potentially being seen in server ecosystem due to the heavy usage/uptime
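
For scale, here's the arithmetic on the figures quoted in the list above. It's a rough sketch that takes the relayed numbers at face value; "over 8 million" is treated as exactly 8 million, so the shares are approximate, and the variable names are mine.

```python
# Figures as quoted in the summary above (all are claims relayed by GN's sources).
inventory = 8_000_000                      # "over 8 million" 13th gen CPUs, rounded down
known_low, known_high = 600_000, 2_000_000 # known unstable/failed units
intel_rate = 0.035 / 100                   # failure rate Intel reportedly told OEMs
oem_low, oem_high = 0.10, 0.25             # failure rate OEMs reportedly claim

def share(part, whole):
    """Fraction of the whole that the part represents."""
    return part / whole

print(f"known unstable share: {share(known_low, inventory):.1%} "
      f"to {share(known_high, inventory):.1%}")
print(f"OEM claim vs Intel claim: {oem_low / intel_rate:.0f}x "
      f"to {oem_high / intel_rate:.0f}x")
```

So the known-failure counts work out to roughly 7.5% to 25% of that inventory, and the OEM figure is hundreds of times higher than the rate Intel reportedly quoted.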
 

Cool hand luke

Veteran
14 Feb 2023
2,895
5,143
Hilarious. But that's what you get when you cobble together a gaming experience desperate to emulate the one a perfectly formed console offers.
 
OP
OP
anonpuffs

anonpuffs

Veteran
Icon Extra
29 Nov 2022
10,418
11,877
Seems like we have a proper response from Intel.

July 2024 Update on Instability Reports on Intel Core 13th and 14th Gen Desktop Processors


Based on extensive analysis of Intel Core 13th/14th Gen desktop processors returned to us due to instability issues, we have determined that elevated operating voltage is causing instability issues in some 13th/14th Gen desktop processors. Our analysis of returned processors confirms that the elevated operating voltage is stemming from a microcode algorithm resulting in incorrect voltage requests to the processor.
Intel is delivering a microcode patch which addresses the root cause of exposure to elevated voltages. We are continuing validation to ensure that scenarios of instability reported to Intel regarding its Core 13th/14th Gen desktop processors are addressed. Intel is currently targeting mid-August for patch release to partners following full validation.
Intel is committed to making this right with our customers, and we continue asking any customers currently experiencing instability issues on their Intel Core 13th/14th Gen desktop processors reach out to Intel Customer Support for further assistance.


Honestly? I don't know if I believe it. I hope it resolves the issues but if it was a simple voltage fix why did this problem persist for over a year with no acknowledgement or fix? Also server motherboards are much more locked down for voltages and yet the CPUs are still failing when socketed in them. I smell desperation.

Also, hilariously enough

Intel says 13th and 14th Gen mobile CPUs are crashing, but not due to the same bug as desktop chips — chipmaker blames common software and hardware issues
Amidst reports of the 13th and 14th Generation processor instability extending to mobile chips, Intel has sent a statement to Tom's Hardware to clarify the situation. While there has been instability feedback on some mobile SKUs, the cause of the instability differs from their desktop counterparts.

"Intel is aware of a small number of instability reports on Intel Core 13th/14th Gen mobile processors.


"Based on our in-depth analysis of the reported Intel Core 13/14 Gen desktop processor instability issues, Intel has determined that mobile products are not exposed to the same issue. The symptoms being reported on 13/14 Gen mobile systems – including system hangs and crashes – are common symptoms stemming from a broad range of potential software and hardware issues.

"As always, if users are experiencing issues with their Intel-powered laptops we encourage them to reach out to the system manufacturer for further assistance.
" — Intel representative to Tom's Hardware.

Alderon Games was one of the few companies that shared its statistics about Raptor Lake and Raptor Lake Refresh Core i9 crash rates. The founder, Matthew Cassells, recently stated that although the company's laptops with mobile variants crashed less frequently than the desktop chips, the issue still existed on laptops.

Cassells responded to Intel's statement in a Reddit thread:

"The laptops crash in the exact same way as the desktop parts including workloads under Unreal Engine, decompression, ycruncher or similar. Laptop chips we have seen failing include but not limited to 13900HX etc.," Cassells said.
Sure Jan GIF

what a clusterfuck
 
OP
OP
anonpuffs

anonpuffs

Veteran
Icon Extra
29 Nov 2022
10,418
11,877
Visual FX studio claims 50% failure rate

Unreal Engine supervisor at ModelFarm blasts 50% failure rate with Intel chips — company switching to AMD's Ryzen 9 9950X, praises single-threaded performance​

Dylan Browne, an Unreal Engine Supervisor and Feature Film VFX at the ModelFarm visual effects studio, posted on X that his company is experiencing a 50% failure rate for systems powered by Intel's Core i9-13900K and 14900K processors. As a result, the company is deploying AMD's as-yet-unreleased Zen 5 Ryzen 9 9950X processors in place of Intel-powered solutions, with Browne praising AMD's single-thread performance.

The report represents yet another piece of unwelcome news for Intel, which announced yesterday that it had found the root cause of the issues and will issue a microcode mitigation in mid-August. (This isn't a 'fix' for CPUs experiencing the issue — impacted processors are irreversibly damaged and must be replaced.)


The problems with Unreal Engine aren't entirely unexpected, as early reports of the Intel crashes revolved around the Oodle compression used with the game engine. The news that ModelFarm is dropping Intel CPUs follows game studio Alderon Games' announcement that Intel desktop CPUs have a 100% crash rate, and laptop chips of the same generation are also affected. However, Intel has now disputed the claims of laptop failures.


Browne claimed that two brand-new processors immediately exhibited instability, while a few others took some time to exhibit symptoms. The computers were all focused on Unreal Engine work, which works best with multi-core systems.

Browne is "fairly sure" that most of the unstable systems use Asus ROG motherboards but will provide an update with more specifics later. However, the Intel chips exhibited instability even with lower power limits. Browne said the motherboards of the affected CPUs have already been tweaked, but that didn’t seem to help with the problems the Core i9-14900Ks and 13900Ks were experiencing.

Intel has acknowledged the issue and announced a solution to the instability problem yesterday. However, the microcode patch to address the problem isn’t expected to arrive until mid-August, so we’re unsure if this will truly stop the crashes. Nevertheless, businesses cannot wait that long for a system that doesn’t suffer from this issue, and Browne said that any new machines for the studio he’s working at will use AMD Ryzen 9950X chips.

The Ryzen 9000 series is expected to become available on store shelves on July 31, but some retailers have already added them to their online stores. Nevertheless, some organizations likely already have early samples of the chips for testing under NDA, and ModelFarm seems to be among them.

more at the link

Also, GN responds to the Intel statement
 
OP
OP
anonpuffs

anonpuffs

Veteran
Icon Extra
29 Nov 2022
10,418
11,877

What Intel didn’t write on Reddit but thinks internally – The search for the solution to the Raptor Lake S instabilities continues (Leak)​

I already posted yesterday in a news item and briefly commented on what Intel officially published on the night of 22 July 2024 via Reddit as a kind of interim report. However, according to the unanimous opinion of all colleagues and readers, the whole thing is unsatisfactorily short and contains nothing substantial apart from the announcement of another microcode update for mid-August 2024 and also leaves more questions unanswered than it answers. Somewhat later, however, internal communication seems to have got underway, so I would now like to add excerpts from an internal statement for our readers, which Intel, for whatever reason, has not (yet) written publicly.


There is nothing extremely secret or reprehensible in it, but the content also shows a certain perplexity and Intel’s continuing efforts to find a final clarification. And because I want to avoid misinterpreting content and ignoring headlines this time, I have made a little more effort and commented out the relevant passages so as not to create room for too many interpretations, some of them contrary.
Now let’s move on to what we unfortunately haven’t been able to read yet. The so-called “Problem Statement” briefly summarizes the situation, according to which it mainly refers to feedback from end users on board hardware and the Core i7 and i9 processors of Raptor Lake S and the refresh, but excludes the server area, embedded and mobile systems such as notebooks. This is also nothing new at first, but we are no longer talking explicitly only about K models, as was the case with the eTVB bug, but the entire lineup above the Core i5. There is nothing about these in the internal report, although the Core i5 (apart from the K model) are still based on the Alder Lake S (C0), which is not affected anyway.


– Intel customers have reported recurring OS and application hangs and errors on 13th and 14th Generation Intel desktop processors, particularly Core i7 and Core i9 SKUs.
– Reports to date have come primarily from end user enthusiast/gaming systems with commercial ODM motherboards, and OEM workstations.
––Intel—





This is all still relatively unexciting until the so-called “debug status”. Intel is said to have analyzed processors sent in from RMA cases and measured a significant increase in Vmin, i.e. the lower limit of the operating voltages. Intel also writes interesting details about the cumulative and accumulating effects that ultimately lead to a much too high Vmin. However, this briefly described analysis by Intel also shows that the maximum voltage requested by the processor must definitely be reduced in order to reduce or eliminate the cumulative exposure to voltages that can lead to an increase in Vmin.


So that’s confirmed so far, but they will continue the investigation to fully understand the root cause (again, Intel refers to this as a kind of “root cause”, but not THE root cause) and also address other potential aspects of this problem. Again, I can’t really find anything that couldn’t have been shared with the public on Reddit. Except for the fact that they have found symptoms but are still looking for root causes. Of course, the full description would have been better, but in view of the Ryzen launch next week, the short version that has now been brought forward is at least somewhat comprehensible.


– Intel observes a significant increase to the minimum operating voltage (Vmin) across multiple cores on returned affected processors from customers.
– This increase is similar in outcome to parts subjected to elevated voltage and temperature conditions for reliability testing.
– Factors contributing to this Vmin increase include elevated voltage, high frequency, and elevated temperature.
– Even under idle conditions at relatively cool temperatures, sporadic elevated voltages are observed when the processor is resumed from low power states in order to service background operations before entering a low power state again.
– At a sufficiently high voltage, these short-duration events can accumulate over time, contributing to the increase in Vmin.
– Intel analysis indicates a need to reduce the maximum voltage requested by the processor in order to reduce or eliminate accumulated exposure to voltages which may result in an increase to Vmin.
While Intel has confirmed elevated voltages impact the increase in Vmin, investigation continues in order to fully understand root cause and address other potential aspects of this issue.
––Intel—
However, solutions should also be found, even if it is only a preventive measure or a kind of workaround. Or a complete replacement. The conclusions are also quite remarkable, because the microcode to be provided in August for the official (NDA) board partners (which is then to be distributed via the respective UEFI of the mainboard manufacturers) only addresses the problem with the minimum operating voltage Vmin. This also includes a VID limit of 1.55 volts as a possible solution, which must not be overridden by any automatic mechanism.


In addition, a small number of benchmarks are said to have measured minimal performance losses and the timing of the responsible microcode to the time after the Ryzen launch also has a slight aftertaste here. Or it is simply due to the time that still needs to be taken. But it is again emphasized that further investigations are necessary to ensure that all possible circumstances have been covered. Intel also explains that this microcode update may not fix all systems that show the known symptoms. In this case, the SKU should be replaced via an RMA process.


– Intel is validating a microcode update to limit VID requests above 1.55V as a potential future corrective action, targeted for production release in mid-August to NDA customers.
Early testing by Intel on a small number of benchmarks indicates minimal performance impact due to this microcode change.
– While this microcode update addresses the elevated voltage aspect of this issue, further analysis is required to understand if this proposed mitigation addresses all scenarios.
This microcode update, once validated and released, may not address existing systems in the field with instability symptoms.
Systems which continue to exhibit symptoms associated with this issue should have the processor returned to Intel for RMA.
––Intel—
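
To make the proposed mitigation concrete, here's a toy sketch of what "limit VID requests above 1.55V" means. It is not Intel's microcode or any real API: the function names and the voltage trace are invented for illustration, and only the 1.55 V figure comes from the quoted statement.

```python
VID_CAP_VOLTS = 1.55  # limit quoted in the leaked statement

def clamp_vid(requested_volts: float, cap: float = VID_CAP_VOLTS) -> float:
    """Return the voltage request after applying the cap."""
    return min(requested_volts, cap)

def samples_above_cap(samples, cap=VID_CAP_VOLTS):
    """Count samples whose request exceeds the cap (a crude exposure proxy)."""
    return sum(1 for v in samples if v > cap)

# Hypothetical trace of VID requests in volts; invented values for illustration.
trace = [1.20, 1.58, 1.35, 1.62, 1.10]
clamped = [clamp_vid(v) for v in trace]

print(clamped)
print(samples_above_cap(trace), samples_above_cap(clamped))
```

The point is just that capping requests removes the short over-voltage events going forward; per the statement, it does nothing for Vmin shift that has already accumulated.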
Once again: There is actually nothing in it that could not have been written, except perhaps the fact that they are still not sure where the cause lies, but they are certain of it.
More at the link

TL;DR: it sounds like IgorsLab has a source document or quote showing that Intel has found a root cause of the issue, but not necessarily the real root cause. Basically, for some reason the processor requests high spike voltages in low-load scenarios when transitioning from idle to powered states, which accumulates degradation damage to the circuitry over time. However, it sounds like either they haven't found all of the causes of the voltage requests, or they're unsure whether they have.
 
OP
OP
anonpuffs

anonpuffs

Veteran
Icon Extra
29 Nov 2022
10,418
11,877
TechSpot: all 65W+ CPUs affected

Intel's crashing CPU crisis deepens as more models are affected than originally thought


The issue extends beyond enthusiast chips, hinting at a more complex root cause

A hot potato: Intel probably thought the worst was behind them after the company identified the source of the instability surrounding its 13th- and 14th-gen CPUs and promised a patch to address the issue. But new reports say that the patch won't resolve the problems for processors already experiencing crashes. Even worse, whatever the problem is, it affects a broader range of models than previously assumed.
The news coming out of Intel about its crashing 13th- and 14th-generation CPUs is not getting any better, even after they said they had finally "solved" the mystery behind the instability, and promised that a patch should arrive by the middle of next month.


The first disappointment is that the patch won't fix the processors if they are already crashing. Intel has advised owners to use Intel Default Settings in their motherboard BIOS while waiting for the microcode update, although this is not a guaranteed fix. But it appears the best course of action for customers that have already experienced damage is to simply replace the processor instead of tweaking BIOS settings. Intel would not share estimates with reporters of how many chips are likely to be irreversibly impacted.
Worse, it now appears that the crashing issue is also affecting all 65W and higher CPUs as well as the mainstream non-K models alongside their K/KF/KS variants.
More at the link
 