Guest Stan Hilliard Posted August 19, 2008

Why are these unwanted characters displayed?

Here is an example: I sent an email to myself with a "-" (minus sign) in the title. When I read the email subject lines on my provider's server using Firefox, the - is displayed as these three characters: –.

This is just one example. Other characters can be transformed. The occurrence of this problem is intermittent for me and thus I cannot necessarily repeat it at any one time.

I think this issue is widespread. I did a Google search on – and got 141,000,000 hits.

What causes this and how can it be prevented?

Stan Hilliard
Guest Bruce Hagen Posted August 19, 2008

Re: Why do unwanted characters appear in emails, web pages, and forums?

"Stan Hilliard" <usenetreplyMS@samplingplansNOTSPAM.com> wrote in message news:rkama4dov53bappeiredaetuupf5m1r32c@4ax.com...
> Why are these unwanted characters displayed?
>
> Here is an example: I sent an email to myself with a "-" (minus sign) in the title. When I read the email subject lines on my provider's server using Firefox, the - is displayed as these three characters: –.
>
> This is just one example. Other characters can be transformed. The occurrence of this problem is intermittent for me and thus I cannot necessarily repeat it at any one time.
>
> I think this issue is widespread. I did a Google search on – and got 141,000,000 hits.
>
> What causes this and how can it be prevented?
>
> Stan Hilliard

One reason is when someone creates a message using Word with Smart Quotes enabled and then pastes it into an e-mail. It is a similar issue with websites. See: http://ezinearticles.com/?Microsoft-Word-Smart-Quotes-and-Article-Marketers-Dont-Mix&id=15624
Guest John Wunderlich Posted August 19, 2008

Re: Why do unwanted characters appear in emails, web pages, and forums?

Stan Hilliard <usenetreplyMS@samplingplansNOTSPAM.com> wrote in news:rkama4dov53bappeiredaetuupf5m1r32c@4ax.com:
> Why are these unwanted characters displayed?
>
> Here is an example: I sent an email to myself with a "-" (minus sign) in the title. When I read the email subject lines on my provider's server using Firefox, the - is displayed as these three characters: –.
>
> This is just one example. Other characters can be transformed. The occurrence of this problem is intermittent for me and thus I cannot necessarily repeat it at any one time.
>
> I think this issue is widespread. I did a Google search on – and got 141,000,000 hits.
>
> What causes this and how can it be prevented?

This is a character encoding issue. The email is being generated in one character encoding and read in a different one. There should be a field in the header defining the encoding (e.g. your post specified ISO-8859-1), and the client software should respect it, but this isn't always the case.

In your example, you say you typed a single dash "-" character. In reality, your source email program has transformed that dash into an "En Dash", which is Unicode code point U+2013. Since email is set up to handle only 7-bit characters, this En Dash cannot be sent as-is, so for some reason your sending program encoded it using the UTF-8 character encoding (common in Unix), which represents it as the three bytes 0xE2, 0x80, 0x93. When these bytes are looked up one at a time in the ISO-8859-1 character set (in practice its Windows-1252 superset, which most software substitutes for it), the result is the three characters that you quoted above. The program that read these bytes did not perform the proper character translation, so you see those three characters.

This can happen with quotes and double-dashes as well.
To prevent this, the program you use to generate the email or post must be configured to use the ISO-8859-1 character set. Also, a lot of programs try to be "smart" and change simple dashes into En Dashes or simple quotes into start-quote / end-quote pairs. There may be a setting that will prevent the program from generating these enhanced symbols. It all depends on the client that you're using.

I've learned to simply ignore these occurrences.

Hope this helps,
John
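The mismatch described above can be reproduced in a few lines. This is only a sketch of the byte-level mechanics, not a reconstruction of what Pegasus or the webmail page actually did; the variable names are made up for illustration, and the reader is assumed to be decoding as Windows-1252 (Python's "cp1252"), which is what most software uses when a page or message is labelled ISO-8859-1.

```python
# The en dash (U+2013) that "smart" editors substitute for a typed hyphen.
EN_DASH = "\u2013"

# UTF-8 encodes this single character as three bytes: E2 80 93.
utf8_bytes = EN_DASH.encode("utf-8")

# A reader that assumes Windows-1252 turns each byte into its own character:
# a-circumflex, euro sign, left double quotation mark.
garbled = utf8_bytes.decode("cp1252")

# Strict ISO-8859-1 maps 0x80 and 0x93 to invisible control characters
# instead, so only the a-circumflex would be visible.
strict_latin1 = utf8_bytes.decode("iso-8859-1")

# The damage is reversible as long as the garbled text is still intact:
# re-encode with the wrong codec, then decode with the right one.
repaired = garbled.encode("cp1252").decode("utf-8")

print(utf8_bytes.hex())       # e28093
print(ascii(garbled))         # '\xe2\u20ac\u201c'
print(ascii(strict_latin1))   # '\xe2\x80\x93'
print(repaired == EN_DASH)    # True
```

The last step only works when no byte was lost along the way; once a program has replaced an unknown character with "?", the original cannot be recovered.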
Guest Spiral Posted August 19, 2008

Re: Why do unwanted characters appear in emails, web pages, and forums?

John Wunderlich wrote:
> This is a character encoding issue. The email is being generated in one character encoding and read in a different one. There should be a field in the header defining the encoding (e.g. your post specified ISO-8859-1), and the client software should respect it, but this isn't always the case.
>
> In your example, you say you typed a single dash "-" character. In reality, your source email program has transformed that dash into an "En Dash", which is Unicode code point U+2013. Since email is set up to handle only 7-bit characters, this En Dash cannot be sent as-is, so for some reason your sending program encoded it using the UTF-8 character encoding (common in Unix), which represents it as the three bytes 0xE2, 0x80, 0x93. When these bytes are looked up one at a time in the ISO-8859-1 character set (in practice its Windows-1252 superset, which most software substitutes for it), the result is the three characters that you quoted above. The program that read these bytes did not perform the proper character translation, so you see those three characters.
>
> This can happen with quotes and double-dashes as well. To prevent this, the program you use to generate the email or post must be configured to use the ISO-8859-1 character set. Also, a lot of programs try to be "smart" and change simple dashes into En Dashes or simple quotes into start-quote / end-quote pairs. There may be a setting that will prevent the program from generating these enhanced symbols. It all depends on the client that you're using.
>
> I've learned to simply ignore these occurrences.
>
> Hope this helps,
> John

Nice answer. Thanks for writing; I hope the poster reads this.
Guest Stan Hilliard Posted August 21, 2008

Re: Why do unwanted characters appear in emails, web pages, and forums?

On Tue, 19 Aug 2008 21:09:57 GMT, John Wunderlich <jwunderlich@lycos.com> wrote:
>Stan Hilliard <usenetreplyMS@samplingplansNOTSPAM.com> wrote in news:rkama4dov53bappeiredaetuupf5m1r32c@4ax.com:
>
>> Why are these unwanted characters displayed?
>>
>> Here is an example: I sent an email to myself with a "-" (minus sign) in the title. When I read the email subject lines on my provider's server using Firefox, the - is displayed as these three characters: –.
>>
>> This is just one example. Other characters can be transformed. The occurrence of this problem is intermittent for me and thus I cannot necessarily repeat it at any one time.
>>
>> I think this issue is widespread. I did a Google search on – and got 141,000,000 hits.
>>
>> What causes this and how can it be prevented?
>
>This is a character encoding issue. The email is being generated in one character encoding and read in a different one. There should be a field in the header defining the encoding (e.g. your post specified ISO-8859-1), and the client software should respect it, but this isn't always the case.
>
>In your example, you say you typed a single dash "-" character. In reality, your source email program has transformed that dash into an "En Dash", which is Unicode code point U+2013. Since email is set up to handle only 7-bit characters, this En Dash cannot be sent as-is, so for some reason your sending program encoded it using the UTF-8 character encoding (common in Unix), which represents it as the three bytes 0xE2, 0x80, 0x93. When these bytes are looked up one at a time in the ISO-8859-1 character set (in practice its Windows-1252 superset, which most software substitutes for it), the result is the three characters that you quoted above. The program that read these bytes did not perform the proper character translation, so you see those three characters.
>
>This can happen with quotes and double-dashes as well. To prevent this, the program you use to generate the email or post must be configured to use the ISO-8859-1 character set. Also, a lot of programs try to be "smart" and change simple dashes into En Dashes or simple quotes into start-quote / end-quote pairs. There may be a setting that will prevent the program from generating these enhanced symbols. It all depends on the client that you're using.
>
>I've learned to simply ignore these occurrences.
>
>Hope this helps,
> John

John, thanks for that explanation.

I was using Pegasus as my email client. It was set to use ISO-8859-1. But I think I must have written the message in another program -- probably OpenOffice, or Word, or NoteTab.

I checked the dash in the email title and it was longer than the minus sign that I put there. I changed it back and this fixed the problem.

I have changed the settings in my word processors to prevent automatic character replacement.

It was recommended to me on another forum to change Pegasus's character set from ISO-8859-1 to UTF-8. Do you think that is a good idea?

Stan Hilliard
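When text has already been written in a word processor with automatic replacement switched on, the substituted characters can also be mapped back to plain keyboard characters before pasting into an email. The sketch below is one possible way to do that; the table `SMART_TO_PLAIN` and the function `plain_punctuation` are hypothetical names, and the list of characters is a small assumed sample, not a complete inventory of what OpenOffice or Word may insert.

```python
# Typographic characters commonly inserted by "smart" autocorrect,
# mapped to plain ASCII equivalents (an assumed, incomplete sample).
SMART_TO_PLAIN = {
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2026": "...",  # horizontal ellipsis
}

# str.maketrans accepts a dict of single characters to replacement strings.
_TABLE = str.maketrans(SMART_TO_PLAIN)


def plain_punctuation(text: str) -> str:
    """Return text with the listed typographic characters replaced by ASCII."""
    return text.translate(_TABLE)


subject = "Sampling plans \u2013 \u201cdraft\u201d"
print(plain_punctuation(subject))  # Sampling plans - "draft"
```

Text cleaned this way contains only 7-bit characters, so it reads the same whichever character set the receiving program assumes.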
Guest John Wunderlich Posted August 21, 2008

Re: Why do unwanted characters appear in emails, web pages, and forums?

Stan Hilliard <usenetreplyMS@samplingplansNOTSPAM.com> wrote in news:lkfpa4p390fgjcm216a7h3pckqq80imdfr@4ax.com:
> I was using Pegasus as my email client. It was set to use ISO-8859-1. But I think I must have written the message in another program -- probably OpenOffice, or Word, or NoteTab.
>
> I checked the dash in the email title and it was longer than the minus sign that I put there. I changed it back and this fixed the problem.
>
> I have changed the settings in my word processors to prevent automatic character replacement.
>
> It was recommended to me on another forum to change Pegasus's character set from ISO-8859-1 to UTF-8. Do you think that is a good idea?

It probably wouldn't hurt to do this. The big thing that UTF-8 has going for it is that normal US keyboard characters (the 7-bit ASCII range, 0x00 to 0x7F) are simply represented as themselves, so if you don't use extended characters, everything is the same. If you do use extended characters, then with the setting at UTF-8 your client should send the email identified as UTF-8, and a savvy email client at the other end should recognize this character set and translate the extended characters back to the original representation. Fonts vary, though, so even then it is not guaranteed that the character will show up properly on their screen; that depends on their choice of font.

HTH,
John
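Both points in this reply can be checked with Python's standard library. The sketch below is an illustration only: it shows that plain ASCII text produces identical bytes in ISO-8859-1 and UTF-8, and how a subject line carrying an en dash can be sent as 7-bit text that names its own character set (the RFC 2047 "encoded-word" form). It uses `email.header` from the standard library and says nothing about how Pegasus itself builds headers; the sample subject strings are made up.

```python
from email.header import Header, decode_header, make_header

# Plain keyboard characters: the bytes are identical in both encodings.
ascii_subject = "Sampling plans - draft"
same_bytes = ascii_subject.encode("utf-8") == ascii_subject.encode("iso-8859-1")

# An extended character: the two encodings disagree.
e_acute_latin1 = "\u00e9".encode("iso-8859-1")  # one byte, 0xE9
e_acute_utf8 = "\u00e9".encode("utf-8")         # two bytes, 0xC3 0xA9

# A subject with an en dash, packed into a 7-bit header that declares UTF-8.
subject = "Sampling plans \u2013 draft"
encoded = Header(subject, charset="utf-8").encode()

# A client that honours the declared charset recovers the original text.
decoded = str(make_header(decode_header(encoded)))

print(same_bytes)          # True
print(encoded.isascii())   # True
print(decoded == subject)  # True
```

The encoded header begins with a marker naming the character set, which is what lets the receiving client pick the right translation instead of guessing.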