Bad Power Mgmt


Guest don_b_1
Posted

New and fully updated install of XP Home SP2 installed to freshly formatted

partition at (hd0,0) for new components. No previous power mgmt difficulties.

 

New parts are as follows:

MSI MS-7325 v1.0 (K9N4 SLI) NVIDIA nForce 500 SLI chipset

AMI BIOS v. 1.3 (but the problem exists with v.1.4 and v.1.5)

AMD64 X2 5000+ Black Edition

Corsair TWIN 2X2048-6400C4 G (two overnight runs of memtest86 mode 5

produced no errors)

MSI NX7300LE 256 MB - NVIDIA GeForce 7300 LE

New 500 W power supply (34 A on the 12 V rail)

Nothing overclocked

 

I activated Home/Office Desk profile in default configuration and find the

computer rebooted when I expect it to be hibernated. I activated the

keyboard sleep button and verified standby mode initiates a reboot.

 

I set up a "Hibertest" profile to prevent Standby mode and it entered

hibernation one time only. Computer enters and exits hibernation perfectly

when manually initiated.

 

All power profiles turn off the display and hard disks properly and on

schedule. These resume properly with any input.

 

So, two problems: hibernation never initiates automatically, and standby mode

causes a reboot whether initiated by the system, the sleep button, or the shutdown menu

item.

 

Power management has been reinstalled. Changing ACPI BIOS functions has no

effect on either situation.

 

Windows Error Reporting says the standby mode crash is caused by the video

driver and issues advisory "STOP 0x000000EA THREAD_STUCK_IN_DEVICE_DRIVER".

It recommends updating to the most current video driver. I did this.

Alternatively, it suggests disabling hardware acceleration and disabling

write combining. I tried this too. The standby reboot problem still exists

very reliably.

 

Not sure I believe the error reporting service since the computer passes all

stability and torture tests. Other than the rebooting initiated by standby

mode, the computer has no problems that I've identified so far.

 

Any ideas on cause of this? Are there any optional hotfixes issued for

nvidia accelerated graphics driver issues?

 

Thanks,

Don

Guest don_b_1
Posted

Update to above

 

Update: I did remove hiberfil.sys, run chkdsk /r, defrag and reactivated the

hiberfil. Now I can push my sleep button and put the computer into dead

powered down standby where the lights and fans turn completely off, then

bring it right back up with a key press. I can also manually put it into full

hibernation and bring it back up with the power switch but I still have no

automatic hibernation and have lost automatic standby.

Guest PaulMaudib
Posted

Re: Update to above

 

On Mon, 25 Feb 2008 23:20:04 -0800, don_b_1

<donb1@discussions.microsoft.com> wrote:

>Update: I did remove hiberfil.sys, run chkdsk /r, defrag and reactivated the

>hiberfil. Now I can push my sleep button and put the computer into dead

>powered down standby where the lights and fans turn completely off, then

>bring it right back up with a key press. I can also manually put it into full

>hibernation and bring it back up with the power switch but I still have no

>automatic hibernation and have lost automatic standby.

 

Too bad nobody has any idea what you are responding to.

Guest w_tom
Posted

Re: Update to above

 

On Feb 26, 2:20 am, don_b_1 <do...@discussions.microsoft.com> wrote:

> Now I can push my sleep button and put the computer into dead

> powered down standby where the lights and fans turn completely off, then

> bring it right back up with a key press. I can also manually put it into full

> hibernation and bring it back up with the power switch but I still have no

> automatic hibernation and have lost automatic standby.

 

The error message is probably correct. What others call stress

testing is not a real diagnostic and tells you little. The question is whether the video

driver or the video hardware is defective. Defective hardware (like so

many other devices, including a power supply) can be defective while

still booting and running a computer.

 

First, does every connector on video card have a corresponding

connection? If not, why not?

 

Second, what do the video card manufacturer's diagnostics report? If it

does not provide diagnostics for free, well, how responsible was that

manufacturer? Problems like yours are why diagnostics are provided and

run.

 

Third, is the power supply really working properly? To test it, the

computer must be doing everything at once: sound card making noise, while

the CD-ROM is reading, while another program is searching through the hard

drive, while the video processor is doing complex graphics (e.g. a movie),

while IE is downloading a program from the internet. Now you are ready

to measure the four critical voltages on any one each of the orange, red,

yellow, and purple wires from the power supply. Those DC voltages must

exceed 3.23, 4.87, and 11.7 (for the 3.3 V, 5 V, and 12 V rails). The motherboard monitor is not sufficient.
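
The check described above can be sketched in a few lines of Python. This is only an illustration: the minimums are the three figures quoted in the post, the wire colours follow the usual ATX convention, and applying the 5 V minimum to the purple (5 V standby) wire is an assumption, since the post gives three numbers for four wires. The readings in `example` are hypothetical.

```python
# Minimum acceptable DC readings quoted in the post, keyed by ATX wire colour.
# Assumption: the purple (5 V standby) wire shares the 5 V minimum, because
# the post lists only three numbers for four wires.
MIN_VOLTS = {
    "orange": 3.23,   # 3.3 V rail
    "red": 4.87,      # 5 V rail
    "yellow": 11.7,   # 12 V rail
    "purple": 4.87,   # 5 V standby (assumed minimum)
}


def check_rails(readings):
    """Return {colour: bool}; True when the reading exceeds its minimum."""
    return {colour: readings[colour] > MIN_VOLTS[colour] for colour in MIN_VOLTS}


# Hypothetical multimeter readings taken while the machine is fully loaded.
example = {"orange": 3.31, "red": 4.95, "yellow": 11.62, "purple": 5.02}

for colour, ok in check_rails(example).items():
    status = "OK" if ok else "LOW"
    print(f"{colour:>6}: {example[colour]:.2f} V  {status}")
```

With these made-up readings only the yellow (12 V) wire is flagged, which is the kind of marginal result the full-load measurement is meant to expose.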

 

In each case, establish each suspect as either definitively good or

definitively bad. 'Maybe' answers such as your stress test tell us

nothing useful.

 

BTW, when executing video processor diagnostics, then heat that card

with a hairdryer on high. If the video card has an intermittent fault, then

sometimes a 100% defect can only be identified at the perfectly ideal

temperature output by a hairdryer on highest heat setting. This

determines something 'definitively'. And then never look back while

moving on to other possible suspects.

Guest don_b_1
Posted

Re: Update to above

 

Thanks for the info and advice w_tom.

 

At this point, all the new hardware is suspect even though it tests out OK.

The power supply seems good as all voltages come in within 2% above spec

whether under load or not. I verified with two different meters. I suppose I

should have another look at that. The diagnostics on the video card don't

show a problem.

 

Further checking revealed Windows is definitely the culprit here. My new

installation got thoroughly trashed but the problem is I don't know why. I'm

hoping I didn't get a good install when I changed the hardware. I didn't

bother with the power management in the beginning but when I did, it had

problems.

 

The other night after I ran the chkdsk/defrag, etc. I cured one thing but

more things goofed up. I lost all my restore points and Windows started

handling my processor cores in a very crazy manner. This was definitely a

Windows issue because it wasn't happening in Safe Mode or Linux. I also

couldn't start the Windows recovery manager from the CD.

 

Big question is: what happened? I'm hoping I got a bad install in the beginning.

Thinking back, I'm not absolutely certain I did a proper format on the drive

when I reinstalled three weeks ago. I know I didn't have to break out my

"proof" disk to allow the XP upgrade to continue and blew through everything

quick and dirty. I'm doing a reinstall right now and definitely did do a

proper format this time. Twice. Once with Gparted and a followup with Windows

just to make sure.

 

We'll see but it already appears as though I have a problem as the display

won't return when recovering from manually induced standby in ACPI S3 mode

and it still reboots instead of recovering when in S1 mode. I think maybe I

have to get the proper hotfixes for the AMD X2 CPU. My AMD X2 laptop with

Vista doesn't have any power mgmt issues.

 

I also have a weird power options window with no UPS tab but a double entry

screen under the main Power Schemes tab with setup for battery power and AC

power being side by side. Never seen that before.

 

I'm not sure what you mean when you ask if every connector on video card has

a corresponding connection. There are different video outputs but I'm only

using the D-shell output to run my monitor.

Guest w_tom
Posted

Re: Update to above

 

On Feb 27, 4:29 pm, don_b_1 <do...@discussions.microsoft.com> wrote:

> At this point, all the new hardware is suspect even though it tests out OK.

> The power supply seems good as all voltages come in within 2% above spec

> whether under load or not. I verified with two different meters. I suppose I

> should have another look at that. The diagnostics on the video card doesn't

> show a problem.

>

> Further checking revealed Windows is definitely the culprit here. My new

> installation got thoroughly trashed but the problem is I don't know why. I'm

> hoping I didn't get a good install when I changed the hardware. I didn't

> bother with the power management in the beginning but when I did, it had

> problems. ....

 

You reported that all voltages were OK under full load. To be clear,

full load means downloading from the Internet, while playing complex

graphics (e.g. a movie) on the video processor, while playing sound, while

reading a CD-ROM, while defragmenting the disk, while reading a

floppy, etc. Then record those numbers. Being within 2% does not

tell as much as what those numbers actually are on any one each of the purple,

orange, red, and yellow wires when under that load.

 

Moving on, it is normal for defective hardware to pass the

diagnostic at normal temperature. Some defects only appear in a

diagnostic when hardware (ie video processor and adjacent memory) is

heated by a hairdryer on its highest heat setting. (The same heat is required

to make Memtest86 an effective diagnostic.)

 

Anything involving disk drive hardware (partitioning or formatting)

is irrelevant to your problem. Hardware associated with your problem

is limited to video processor, sound card, some memory (used by OS),

CPU (and its associated power supply), power supply 'system' (not just

the power supply), and some motherboard functions.

 

I don't remember if this was discussed. But a visual inspection of

electrolytic capacitors may reveal swelling - which would create a

voltage problem across the motherboard resulting in what appears to be

strange (unrelated), and unique problems.

 

Whatever the problem is has caused (apparently) the video subsystem

to not work properly as indicated by the BSOD error code and by how

that video processor interferes with power down routines in BIOS.

Hopefully something above will identify a hardware failure. If not,

we can only assume an incompatibility between that video card (both

firmware and driver) and the BIOS (power management) firmware /

hardware.

 

I am also assuming the video processor manufacturer web site has

been consulted for a latest driver version. Not at all a likely

solution, but ...

 

Use heat as a diagnostic tool. For example, are problems worse when

only one part of the hardware is heated with that hairdryer? This is a

technique used in aerospace to find defective hardware before that

hardware started failing. Cold (e.g. 40 degrees F) is also a useful (but

not as effective) diagnostic tool.

Guest don_b_1
Posted

Re: Update to above

 

After downloading Windows Updates all night, the computer is doing better. It

will enter and recover from S1 standby with no problem. I haven't yet tested

ACPI S3 mode. It will also manually and automatically enter hibernation

and my power options control panel applet is now correct.

 

The previous problems sure seem to be Windows induced rather than hardware

or driver problems. Heat was not an issue. I first became aware of it when

the computer was idle. I would return to it and find it either locked up with

no display (requiring a power down for recovery) or find it rebooted when it

should have been in hibernation. Heat had nothing to do with this. Working

the bejeezus out of the computer has never incited a shutdown or an error

that I can see.

 

I can drive the GPU very hard for hours running a simulation that operates

at +75 fps and never induce a shutdown of any sort. It will heat up a bit

according to the temperature monitor but that doesn't appear to cause

problems.

 

Right now I'm still running the original drivers that came with the

components and things seem to be fine so far. Next I'll update the drivers

to nvidia latest and see what happens.

 

The thing that bothers me is what caused Windows to become so corrupt. It

was driving my second core to 80% or better at all times producing an average

CPU Usage of 40-50% even as System Idle Process was clocking in at 97-99%.

This was obviously a Windows problem since it didn't occur in Safe Mode or

when running Linux.
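
The 40-50% figure is what you would expect if one core sat at 80% or more while the other was nearly idle, assuming the overall CPU Usage number is simply the mean of the per-core loads. A minimal Python sketch of that arithmetic, with hypothetical per-core values:

```python
def overall_cpu_usage(per_core_percent):
    """Mean of the per-core utilisation figures, in percent."""
    return sum(per_core_percent) / len(per_core_percent)


# Hypothetical loads on a dual-core CPU: first core idle, second core busy.
low = overall_cpu_usage([0.0, 80.0])
high = overall_cpu_usage([0.0, 100.0])
print(f"overall CPU usage between {low:.0f}% and {high:.0f}%")
```

This only shows the averaging; it does not explain why System Idle Process still reported 97-99% at the same time.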

 

When it was goofed like this, it required 70 seconds to complete a specific

operation that normally required 14.7 seconds. With affinity set to CPU1, the

second core, it required 90 seconds.

 

At that time, I could complete the same test operation in Linux in 13

seconds flat. Now that Windows is all reinstalled and tamed, it's back to

completing the operation in 14.7 seconds and will do it with affinity set to

CPU1 in 13.5 seconds.

 

BTW: this test I mention is doing a find and replace operation on a 2.2 MB,

407 page text document that requires almost 69000 replacements of the letter

"a" with "1234567890". I run the Windows test in Word and the Linux test,

using the same document, in Open Office.
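
For anyone who wants a repeatable version of this kind of timing test outside a word processor, here is a rough Python sketch. It is not the Word or Open Office test: the sample text is a synthetic stand-in for the real document, `timed_replace` is a hypothetical helper, and a plain string replace will finish far faster than an editor does, so the timings are not comparable to the figures above.

```python
import time


def timed_replace(text, old, new):
    """Replace every `old` with `new`; return (result, count, seconds)."""
    count = text.count(old)            # how many replacements will be made
    start = time.perf_counter()
    result = text.replace(old, new)
    elapsed = time.perf_counter() - start
    return result, count, elapsed


# Synthetic stand-in for the 2.2 MB document, which is not available here.
sample = "a quick brown fox jumps over a lazy dog\n" * 1000

result, count, seconds = timed_replace(sample, "a", "1234567890")
print(f"{count} replacements in {seconds:.4f} s")
```

Running the same script before and after a reinstall, or under both operating systems, gives a comparison that does not depend on which word processor is installed.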

Guest w_tom
Posted

Re: Update to above

 

On Feb 28, 12:55 pm, don_b_1 <do...@discussions.microsoft.com> wrote:

> At that time, I could complete the same test operation in Linux in 13

> seconds flat. Now that Windows is all reinstalled and tamed, it's back to

> completing the operation in 14.7 seconds and will do it with affinity set to

> CPU1 in 13.5 seconds.

 

Both your diagnostics and the results of reloading Windows

suggest the hardware has always been good and that Windows was

corrupted. The problem may have been in the HAL; more likely in the

interface between the HAL and other related functions.

 

Sometimes, when reloading system files, some files do not get

updated. As a different example, critical files for IP functions

(networking) would not properly reload until every peripheral that

uses IP (modem, wireless, etc.) was first removed. You may have had

the same problem.

 

If, for some reason, a function (or file) was an older version and

the other function (file) was a newer update, then incompatibilities

can exist. Microsoft programmers would have assumed both functions

were either original or updated; never intended two different versions

to work together. This creates strange problems - could explain why

an interface between HAL and BIOS power management was corrupted. How

could two 'talking' functions end up at different rev levels

(incompatible)? I can only speculate. However, if a computer with MS

updates was then reloaded with original OS files, then incompatible

revisions might exist.

 

Symptoms suggest the problem has been eliminated by reloading and

updating all OS files to the same (latest) rev level.

 

What can cause a processor to work hard while no processes are

executing? An interrupt that the CPU processes but never gets

cleared. An example that might explain 80% CPU time. Unfortunately

we have too little information to answer any better. But once all

files were reloaded to same rev level, then things apparently work.

Guest don_b_1
Posted

Re: Update to above

 

Here's where it gets weird Tom. I also use Ubuntu linux. After I built the

new computer, I did the XP reinstall to accommodate the hardware upgrade but I

didn't reinstall the linux since it still worked. I merely installed and

enabled the nvidia restricted drivers. It was so easy it was almost

automatic. After a week or so, about the time I first started noticing

oddities with the XP, my linux started having problems. I assumed it was

something to do with putting the new hardware under it so I reinstalled and

did all my updates on it. (Sixteen hours of downloading those updates) I

never could get the nvidia drivers to install on this copy of Ubuntu and was

stuck with generic vesa or the open source nv driver. I tried everything I

could think of, and hammered on it for a week. I used all the linux methods,

used a third party installer and did the miserable nvidia command line

routine to get it to go. No way. I even put an installation on a different

hard drive but nothing would work.

 

Subsequent to formatting, reinstalling, updating, testing and verifying my

XP installation as good, I booted up Ubuntu to see if it acted any

differently. Sure enough, I used the Restricted Drivers Manager to install

the proprietary nvidia and it took it immediately.

 

I never would have dreamed that a corrupted copy of Windows could goof up a

computer so badly that it would prevent a completely different operating

system on a completely different physical drive from functioning properly.

 

Seems like I learn something new every time I blink.

 

Thanks for your help amigo.

Guest w_tom
Posted

Re: Update to above

 

On Feb 29, 10:21 pm, don_b_1 <do...@discussions.microsoft.com> wrote:

> I never would have dreamed that a corrupted copy of Windows could goof up a

> computer so badly that it would prevent a completely different operating

> system on a completely different physical drive from functioning properly.

 

Software cannot harm hardware. The rare exception once involved

video monitors, which have since been redesigned to eliminate that

'software harming hardware' problem.

 

Software might change CMOS settings. But CMOS (except for the date/time

clock) is only changed by BIOS; not by any other software. This new

symptom suggests the Windows HAL does not explain the problem. Again,

these analog type of problems are why we use heat (and cold) to

aggravate, then locate, the problem. Problem could be a weak

transistor that periodically does and does not conduct enough current

(does not create a logic one or logic zero; only creates an undefined level - a

voltage in between). Problem could be a hardware problem that causes

timing shifts / delays. Again we use temperature to make that failure

obvious to diagnostics. These problems could exist between CPU, in

buses between essential functions, or even on the video processor

interface.

 

Do your diagnostics also test multiprocessor arbitration functions

on the motherboard? Probably not if diagnostics are not provided by

the computer manufacturer.

 

Heat could also cause CPU's power supply to become less stable. Any

power supply voltage can be completely defective and CPU will still

work OK. Again, temperature may aggravate the problem sufficiently

that bad voltages are apparent.

 

Of course, you have inspected electrolytic capacitors for bulging -

another indication that voltages are no longer stable.

 

Appreciate the objective. Aggravate a marginal and therefore

intermittent hardware problem. Also useful is what those voltage

numbers actually are (not merely that voltages are in spec). Change between

light load and the above described full load testing to record number

changes. Again, seeking something marginal. Voltages can be in spec

but numbers indicate the problem. A best tool to find these is a

hairdryer on highest heat. Heat is not a problem as others so

mistakenly assume. Heat is a most powerful diagnostic tool to

aggravate, make hard, and therefore find a defect.
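
The light-load versus full-load comparison above can be recorded with a short Python sketch like the following. The readings are hypothetical, the rail names are just labels, and `rail_droop` is an illustrative helper; the post gives no figure for how much change counts as marginal, so the script only reports the differences.

```python
def rail_droop(light_load, full_load):
    """Return {rail: volts lost} when going from light load to full load."""
    return {
        rail: round(light_load[rail] - full_load[rail], 3)
        for rail in light_load
    }


# Hypothetical multimeter readings in volts.
light = {"3.3V": 3.36, "5V": 5.08, "12V": 12.21}
full = {"3.3V": 3.33, "5V": 5.03, "12V": 11.95}

for rail, drop in rail_droop(light, full).items():
    print(f"{rail}: dropped {drop:.2f} V under full load")
```

Keeping the actual numbers, rather than a pass or fail note, makes it possible to compare readings taken at room temperature with readings taken while the hardware is heated.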

 

Of course, motherboards even have tiny capacitors scattered about to

'decouple' adjacent ICs. If one is missing, then intermittent

failures can occur. Inspection will never find such problems. Just

another reason why we aggravate problems by working hardware at

extremes - full load, highest and lowest temperatures, etc. - to make

a problem hard before trying to find or fix it. Eliminating

intermittent failures is the most difficult task. Suggested are some

tools to use in that art.

Guest don_b_1
Posted

Re: Update to above

 

Thanks again Tom.

 

Of course hardware is always a suspect but in this case there really is no

indication of such. Heat and/or working the computer very hard has never

caused a problem.

 

What I do know for certain is:

 

1) Since formatting and reinstalling and updating XP, there have still been

no errors or problems with the computer or either operating system.

 

2) Prior to this reinstallation, there was nothing but rapidly progressing

problems with the computer and both operating systems that finally reached

the point the computer was out of control.

 

Logic and common sense have to place the blame on the original install of XP

but I'm watching very carefully.
