
Why do unwanted characters appear in emails, web pages, and forums?



Guest Stan Hilliard
Posted

Why are these unwanted characters displayed?

 

Here is an example:

I sent an email to myself with a "-" (minus sign) in the title. When I
read the email subject lines on my provider's server using Firefox,
the - is displayed as these three characters: –.

 

This is just one example. Other characters can be transformed. The

occurrence of this problem is intermittent for me and thus I cannot

necessarily repeat it at any one time.

 

I think this issue is widespread. I did a Google search on – and got
141,000,000 hits.

 

What causes this and how can it be prevented?

 

Stan Hilliard

Guest Bruce Hagen
Posted

Re: Why do unwanted characters appear in emails, web pages, and forums?

 

 

"Stan Hilliard" <usenetreplyMS@samplingplansNOTSPAM.com> wrote in message

news:rkama4dov53bappeiredaetuupf5m1r32c@4ax.com...

> Why are these unwanted characters displayed?


 

 

One cause is composing a message in Word with Smart Quotes enabled and
then pasting it into an e-mail. The same thing happens with text pasted
into websites.

 

See:

http://ezinearticles.com/?Microsoft-Word-Smart-Quotes-and-Article-Marketers-Dont-Mix&id=15624

Guest John Wunderlich
Posted

Re: Why do unwanted characters appear in emails, web pages, and forums?

 

Stan Hilliard <usenetreplyMS@samplingplansNOTSPAM.com> wrote in

news:rkama4dov53bappeiredaetuupf5m1r32c@4ax.com:

> Why are these unwanted characters displayed?


 

This is a character encoding issue. The email is being generated in

one character encoding and being read in a different encoding. There

should be a field in the header defining the encoding (e.g. your post

specified ISO-8859-1 encoding) and the client software should respect

this encoding but this isn't always the case.
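As a rough illustration of that header field, here is a minimal Python sketch using the standard library's email package. Python is not mentioned anywhere in this thread, and the subject and body strings are made up for the example. It labels a message as UTF-8 on the sending side and shows that a reader which honors the label gets the original text back.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Sender side: build a message and label its body as UTF-8.
# The subject and body below are hypothetical sample text.
msg = EmailMessage()
msg["Subject"] = "Sampling plans \u2013 update"   # \u2013 is the en dash
msg.set_content("Dash test \u2013 done.", charset="utf-8")

content_type = str(msg["Content-Type"])   # this header carries the charset label
wire = msg.as_bytes()                     # the serialized message as bytes

# Receiver side: a client that honors the label recovers the original text.
received = message_from_bytes(wire, policy=policy.default)
subject = str(received["Subject"])
body = received.get_content().strip()

print(content_type)
print(subject)
print(body)
```

A reader that ignores the label and assumes a different character set is where the stray characters come from.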

 

In your example, you say you typed a single dash "-" character. In
reality, your source program transformed that dash into an "En Dash",
which is Unicode code point U+2013. The En Dash does not exist in
ISO-8859-1, and email was originally set up to handle only 7-bit
characters, so it cannot be sent as-is. For some reason, your sending
program encoded it using the UTF-8 character encoding (common in Unix),
which represents it as the three bytes 0xE2, 0x80, 0x93. When those
bytes are looked up one at a time in a single-byte character set
(strictly speaking Windows-1252, which most software substitutes for
ISO-8859-1), the result is the three characters that you quoted above.
The program that read these bytes did not perform the proper character
translation, so you see those three characters.
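The byte-level mix-up can be reproduced in a few lines. This is only a sketch in Python (my choice of language, not something used in the thread), and it assumes the reader falls back to Windows-1252, which is what produces the three visible characters.

```python
EN_DASH = "\u2013"

# What the sender puts on the wire: the UTF-8 encoding of the en dash.
utf8_bytes = EN_DASH.encode("utf-8")
print([hex(b) for b in utf8_bytes])

# What a reader shows if it treats each byte as one Windows-1252 character.
garbled = utf8_bytes.decode("cp1252")
print(garbled)

# In strict ISO-8859-1 the last two bytes are invisible control characters.
strict_latin1 = utf8_bytes.decode("iso-8859-1")

# The damage is reversible as long as no byte was lost along the way.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == EN_DASH)
```

The last step only works when the garbled text still contains all three characters; once software has replaced one of them with "?", the original cannot be recovered this way.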

 

This can happen with quotes and double-dashes as well. To prevent
this, the program you use to generate the email or post must label the
message with the character set it actually uses (ISO-8859-1 works as
long as you stick to characters in that set). Also, a lot of programs
try to be "smart" and change simple dashes into En Dashes or simple
quotes into start-quote / end-quote pairs. There may be a setting that
will prevent the program from generating these enhanced symbols. It
all depends on the client that you're using.
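If the program offers no such setting, the "smart" characters can also be converted back before sending. The following Python sketch is one way to do that; the function name and the particular list of characters are my own illustrative choices, not something taken from any of the programs mentioned here.

```python
# Map common "smart" punctuation to plain ASCII equivalents.
# This table is a hypothetical starting point, not a complete list.
SMART_TO_ASCII = str.maketrans({
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2026": "...",  # ellipsis
})

def to_plain_punctuation(text: str) -> str:
    """Replace smart punctuation with ASCII characters; leave the rest alone."""
    return text.translate(SMART_TO_ASCII)

sample = "\u201cSampling plans\u201d \u2013 it\u2019s done\u2026"
print(to_plain_punctuation(sample))
```

Text cleaned this way contains only 7-bit characters for the punctuation involved, so it displays the same regardless of which character set the reader assumes.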

 

I've learned to simply ignore these occurrences.

 

Hope this helps,

John

Guest Spiral
Posted

Re: Why do unwanted characters appear in emails, web pages, and forums?

 

John Wunderlich wrote:

> This is a character encoding issue. The email is being generated in

> one character encoding and being read in a different encoding. There

> should be a field in the header defining the encoding (e.g. your post

> specified ISO-8859-1 encoding) and the client software should respect

> this encoding but this isn't always the case.


 

Nice answer. Thanks for writing--hope the poster reads this.

Guest Stan Hilliard
Posted

Re: Why do unwanted characters appear in emails, web pages, and forums?

 

On Tue, 19 Aug 2008 21:09:57 GMT, John Wunderlich

<jwunderlich@lycos.com> wrote:


>This is a character encoding issue. The email is being generated in

>one character encoding and being read in a different encoding. There

>should be a field in the header defining the encoding (e.g. your post

>specified ISO-8859-1 encoding) and the client software should respect

>this encoding but this isn't always the case.


 

John, thanks for that explanation.

I was using Pegasus as my email client. It was set to use ISO-8859-1.

But I think I must have written the message in another program --

probably OpenOffice, or Word, or NoteTab.

 

I checked the dash in the email title and it was longer than the minus
sign that I put there. I changed it back and this fixed the problem.
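A dash that only looks slightly longer is easy to miss by eye. As a sketch of an automated check (in Python, which is an assumption on my part, with a made-up helper name and sample title), the following lists every character outside the 7-bit range along with its Unicode name.

```python
import unicodedata

def find_non_ascii(text):
    """Return (position, code point, Unicode name) for each non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 0x7F
    ]

# Hypothetical email titles: one with an en dash, one with a plain minus sign.
suspect_title = "Plans \u2013 v2"
clean_title = "Plans - v2"

print(find_non_ascii(suspect_title))
print(find_non_ascii(clean_title))
```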

 

I have changed the settings in my word processors to prevent automatic

character replacement.

 

It was recommended to me on another forum to change Pegasus's
character set from ISO-8859-1 to UTF-8. Do you think that is a good
idea?

 

Stan Hilliard

Guest John Wunderlich
Posted

Re: Why do unwanted characters appear in emails, web pages, and forums?

 

Stan Hilliard <usenetreplyMS@samplingplansNOTSPAM.com> wrote in

news:lkfpa4p390fgjcm216a7h3pckqq80imdfr@4ax.com:


> It was recommended to me on another forum to change Pegasus's
> character set from ISO-8859-1 to UTF-8. Do you think that is a
> good idea?

 

It probably wouldn't hurt to do this. The big thing that UTF-8 has
going for it is that normal US keyboard characters (the ASCII range,
0x00 to 0x7F) are simply represented as themselves, so if you don't
use extended characters, everything stays the same. If you do use
extended characters, then with the setting on UTF-8 your client should
label the email as UTF-8, and a savvy email client at the other end
should recognize this character set and translate the extended
characters back to their original form. Fonts vary, though, so a
character is still not guaranteed to show up properly on the
recipient's screen if their chosen font lacks it.
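The point about the ASCII range can be checked directly. In this small Python sketch (the language and the sample strings are my own choices for illustration), plain keyboard text produces identical bytes in ASCII, ISO-8859-1, and UTF-8, while an accented letter does not.

```python
# Plain US keyboard text: every encoding yields the same bytes.
plain = "Sampling plans - v2"
same_bytes = (
    plain.encode("ascii")
    == plain.encode("iso-8859-1")
    == plain.encode("utf-8")
)
print(same_bytes)

# An extended character: one byte in ISO-8859-1, two bytes in UTF-8.
accented = "caf\u00e9"
latin1_bytes = accented.encode("iso-8859-1")
utf8_bytes = accented.encode("utf-8")
print(latin1_bytes, utf8_bytes)
```

This is why switching the client to UTF-8 changes nothing for ordinary text and only matters once extended characters appear.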

 

HTH,

John

