
MCE 1367 Status Bits: Memory Controller Read Error


If this were an uncorrectable error, the ESXi host would crash.

The i7core_edac driver probes for an alternate address in case the first lookup fails:

    if (dev_descr->dev_id == PCI_DEVICE_ID_INTEL_I7_NONCORE && !pdev)
        pdev = pci_get_device(PCI_VENDOR_ID_INTEL,
                              PCI_DEVICE_ID_INTEL_I7_NONCORE_ALT, *prev);

The sb_edac driver defines its own PCI device IDs, with a comment that the table should be moved to pci_id.h when submitted upstream:

    #define PCI_DEVICE_ID_INTEL_SBRIDGE_SAD0 0x3cf4 /* 12.6 */
    #define PCI_DEVICE_ID_INTEL_SBRIDGE_SAD1 0x3cf6 /* 12.7 */

I highly recommend printing it, because you will be doing some back-and-forth seeking.

The i7core_edac probe loop takes an extra reference on every device it finds. pci_get_device() drops the reference on the device passed as its starting point, and the driver keeps calling it until it returns NULL, so it calls pci_dev_get() before saving the pointer for the next iteration:

    pci_dev_get(pdev);
    *prev = pdev;

Let me give you another MCE example. This was captured from an ESXi host that eventually had two faulty memory modules, but the fault was only acknowledged by the manufacturer when they had … (VMware KB: https://kb.vmware.com/kb/1005184)

Machine Check Exception Decoder

For all other occurrences of this MCE, the cpu# alternated within the range 0-15, which means the fault was always detected on the first CPU. What's your thought on this?

On the driver side, i7core_edac notes that it is possible to have mixed RDDR3/UDDR3 with Nehalem, provided that they are on different memory channels, and it sets:

    mci->mtype_cap = MEM_FLAG_DDR3;
    mci->edac_ctl_cap = EDAC_FLAG_NONE;
    mci->edac_cap = EDAC_FLAG_NONE;
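The cpu#-to-CPU reasoning is plain integer division. A minimal sketch, assuming (hypothetically) 16 logical CPUs per socket numbered contiguously, which fits the 0-15 range in the example but should be checked against the real topology of your host:

```c
/* Hypothetical topology: 16 logical CPUs per socket, numbered
 * contiguously (socket 0 = cpu 0-15, socket 1 = cpu 16-31, ...).
 * Verify the actual layout of your host before relying on this. */
#define CPUS_PER_SOCKET 16u

/* Map the logical cpu# reported with an MCE to a socket index. */
static unsigned int cpu_to_socket(unsigned int cpu)
{
    return cpu / CPUS_PER_SOCKET;
}
```

Under that assumption every cpu# from 0 to 15 maps to socket 0, and cpu# 16 would be the first logical CPU of socket 1.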

An EDAC driver comment notes that the EDAC core should be handling the channel mask, in order to point to the group of DIMMs where the error may be happening.

Use a program like 7-Zip to extract the newly created file to a temporary location. Once it is extracted, you need to extract it again (I know, they doubled up the compression).

The i7core_edac error-reporting path notes that it currently generates only one event:

    if (uncorrected_error || !pvt->is_registered)
        edac_mc_handle_error(tp_event, mci,
                             m->addr >> PAGE_SHIFT,
                             m->addr & ~PAGE_MASK,
                             syndrome, channel, dimm, -1,
                             err, msg, m);

I'm open to discussing this topic, and even some MCEs you have had, in the comments. One would figure it's hardware, but it could also be software related.
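The edac_mc_handle_error() call above splits the machine-check address into a page frame number and an offset within that page. A small userspace sketch of that arithmetic, assuming 4 KiB pages (PAGE_SHIFT of 12); the macros are redefined here to mirror the kernel's (where PAGE_MASK is ~(PAGE_SIZE - 1)), and the helper names are hypothetical:

```c
#include <stdint.h>

/* Assumed 4 KiB pages, mirroring the kernel's definitions. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Page frame number: the address with the in-page bits shifted out. */
static uint64_t addr_to_page(uint64_t addr)
{
    return addr >> PAGE_SHIFT;
}

/* Offset inside the page: ~PAGE_MASK keeps only the low 12 bits. */
static uint64_t addr_to_offset(uint64_t addr)
{
    return addr & ~PAGE_MASK;
}
```

For example, address 0x12345678 gives page 0x12345 and offset 0x678; shifting the page back left by PAGE_SHIFT and OR-ing in the offset restores the original address.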

CMCI Signaling for Patrol Scrub UCR Errors Not Supported

One driver comment explains that to support more QPI (QuickPath Interconnect) buses, you just increment this number:

    #define MAX_SOCKET_BUSES 2

Source listing: http://lxr.free-electrons.com/source/drivers/edac/sb_edac.c?v=3.8

The error-injection code does not allow enabling injection on more than one channel, to keep the code simpler.

When decoding an address, the driver rejects addresses that lie between TOLM (top of low memory) and 4 GiB. A comment notes that this check does not cover the legacy range (e.g. VGA addresses), but that it is unlikely the memory controller would generate an error on that range:

    if ((addr > (u64) pvt->tolm) && (addr < (1LL << 32)))
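The TOLM check quoted above can be isolated into a helper. This is a sketch: addr_in_mmio_hole() is a hypothetical name, and the caller is assumed to supply the TOLM value that the driver reads from the hardware:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr lies above TOLM (top of low memory) but below 4 GiB,
 * i.e. in the hole used for PCI MMIO rather than DRAM. This is the
 * same condition as the driver's check: addr > tolm && addr < 1 << 32. */
static bool addr_in_mmio_hole(uint64_t addr, uint64_t tolm)
{
    return addr > tolm && addr < (UINT64_C(1) << 32);
}
```

With TOLM at 0xC0000000, an error address of 0xD0000000 falls in the hole, while 0x1000 and 0x100000000 do not.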

From the error-injection code: flipping bits in two symbol pairs will cause an uncorrectable error to be injected. The per-parameter sysfs store handlers are generated by a macro that begins:

    #define DECLARE_ADDR_MATCH(param, limit) \
    static ssize_t i7core_inject_store_##param( \
        struct …
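DECLARE_ADDR_MATCH uses token pasting (##) to stamp out one store function per injection parameter. The userspace sketch below imitates only that pattern; parse_<param>(), struct inject_params, the "any" keyword and the limits passed to the macro are assumptions for illustration, not the driver's exact code:

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the driver's injection settings. */
struct inject_params {
    long channel;
    long dimm;
    long rank;
};

static struct inject_params inject;

/* Expands to parse_<param>(): accepts "any" (stored as -1, meaning
 * "match anything") or a decimal value strictly below limit.
 * Returns 0 on success, -EINVAL on bad input. */
#define DECLARE_ADDR_MATCH(param, limit)                        \
    static int parse_##param(const char *data)                  \
    {                                                           \
        char *end;                                              \
        unsigned long value;                                    \
                                                                \
        if (strcmp(data, "any") == 0) {                         \
            inject.param = -1;                                  \
            return 0;                                           \
        }                                                       \
        errno = 0;                                              \
        value = strtoul(data, &end, 10);                        \
        if (errno != 0 || end == data || *end != '\0' ||        \
            value >= (unsigned long)(limit))                    \
            return -EINVAL;                                     \
        inject.param = (long)value;                             \
        return 0;                                               \
    }

/* Illustrative limits only. */
DECLARE_ADDR_MATCH(channel, 3)
DECLARE_ADDR_MATCH(dimm, 3)
DECLARE_ADDR_MATCH(rank, 4)
```

The benefit of the macro is that each new match parameter costs one line, while the validation logic lives in a single place.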

The MCE handler comments that the error itself should be handled later by i7core_check_error. WARNING: as this routine should be called at NMI time, extra care should be taken to avoid deadlocks, and …

The driver also uses legacy scan probing to detect some devices, walking its device table:

    while (table && table->descr) {
        pdev = pci_get_device(PCI_VENDOR_ID_INTEL, table->descr[0].dev_id, NULL);
        if (unlikely(!pdev))
        …

This is where leverage from your VMware support engineer comes in very handy, speaking from my experience.

Thanks to gsilver in the forums for this info. The MCi_STATUS register carries status bits such as VAL, OVER, UC, and EN.

For error injection, if REPEAT_EN is not enabled in the inject mask, only one error is produced.
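A sketch for pulling those flags out of a raw IA32_MCi_STATUS value. The bit positions (VAL 63, OVER 62, UC 61, EN 60, MISCV 59, ADDRV 58, PCC 57, MCA error code in bits 15:0, model-specific code in bits 31:16) come from the Intel SDM; the struct and function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bits of IA32_MCi_STATUS (Intel SDM Vol. 3B, ch. 15). */
#define MCI_STATUS_VAL   (UINT64_C(1) << 63) /* register holds a valid error */
#define MCI_STATUS_OVER  (UINT64_C(1) << 62) /* another error arrived while valid */
#define MCI_STATUS_UC    (UINT64_C(1) << 61) /* error was not corrected */
#define MCI_STATUS_EN    (UINT64_C(1) << 60) /* reporting enabled for this error */
#define MCI_STATUS_MISCV (UINT64_C(1) << 59) /* MCi_MISC holds extra info */
#define MCI_STATUS_ADDRV (UINT64_C(1) << 58) /* MCi_ADDR holds the address */
#define MCI_STATUS_PCC   (UINT64_C(1) << 57) /* processor context corrupted */

struct mci_status {
    bool val, over, uc, en, miscv, addrv, pcc;
    uint16_t mca_code;   /* bits 15:0, architectural MCA error code */
    uint16_t model_code; /* bits 31:16, model-specific error code */
};

static struct mci_status decode_mci_status(uint64_t s)
{
    struct mci_status d;

    d.val   = (s & MCI_STATUS_VAL) != 0;
    d.over  = (s & MCI_STATUS_OVER) != 0;
    d.uc    = (s & MCI_STATUS_UC) != 0;
    d.en    = (s & MCI_STATUS_EN) != 0;
    d.miscv = (s & MCI_STATUS_MISCV) != 0;
    d.addrv = (s & MCI_STATUS_ADDRV) != 0;
    d.pcc   = (s & MCI_STATUS_PCC) != 0;
    d.mca_code   = (uint16_t)(s & 0xffff);
    d.model_code = (uint16_t)((s >> 16) & 0xffff);
    return d;
}
```

For a made-up status of 0x9C00000000010091, VAL and EN are set, UC is clear (a corrected error), MISCV and ADDRV are set, the MCA error code is 0x0091 and the model-specific code is 0x0001.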

There, download a manual named "Intel 64 and IA-32 Architectures Software Developer's Manual Combined Volumes 3A, 3B, and 3C: System Programming Guide".

The driver header (http://www.redhat.com) says it was forked and adapted from the i5400_edac driver and is based on public Intel datasheets, including Intel Core i7 Processor Extreme Edition …

The driver's decoding comment describes the fields of a memory error code: f (correction report filtering): if 1, subsequent errors won't be shown; mmm = error type; cccc = channel. If the mask doesn't match, report an error to …

Since this particular CPU has a quad-channel memory controller, the channel numbers range from 0 to 3.
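Those fields belong to the memory-controller compound error code defined in the Intel SDM, which has the binary form 000f 0000 1mmm cccc. A sketch of decoding it; the 0xef80 mask test should match the check the Linux EDAC drivers perform, while the helper names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Matches the pattern 000f 0000 1mmm cccc: bits 15-13 and 11-8
 * must be 0 and bit 7 must be 1. The mask 0xef80 ignores the
 * f bit (12) and the mmm/cccc fields. */
static bool is_mem_ctrl_error(uint16_t code)
{
    return (code & 0xef80) == 0x0080;
}

/* f: correction report filtering; if 1, subsequent errors won't be shown. */
static bool mem_error_filtered(uint16_t code)
{
    return ((code >> 12) & 0x1) != 0;
}

/* mmm: error type, bits 6-4. */
static unsigned int mem_error_type(uint16_t code)
{
    return (code >> 4) & 0x7;
}

/* cccc: channel, bits 3-0; 0xf means the channel is not specified. */
static unsigned int mem_error_channel(uint16_t code)
{
    return code & 0xf;
}

static const char *mem_error_type_name(unsigned int mmm)
{
    switch (mmm) {
    case 0: return "generic undefined request";
    case 1: return "memory read error";
    case 2: return "memory write error";
    case 3: return "address/command error";
    case 4: return "memory scrubbing error";
    default: return "reserved";
    }
}
```

A memory read error reported on channel 1 would carry the code 0x0091 (mmm = 001, cccc = 0001). On a quad-channel controller, cccc should fall between 0 and 3, with 0xf meaning the channel was not specified.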

The non-core device can show up at either of two PCI IDs, so the probing code needs to test for the other one in case this one fails:

    { PCI_DESCR(0, 0, PCI_DEVICE_ID_INTEL_I7_NONCORE) },

If REPEAT_EN is enabled in the inject mask, injection repeats until the inject mask is cleared. A FIXME notes that the routine assumes that the MAXNUMDIMMS value of MC_MAX_DOD is reliable enough to …

If you cannot see the vmkernel-zdump file, follow the steps below.