LinuxQuestions.org
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

Old 11-20-2023, 11:56 AM   #1
Phunction
LQ Newbie
 
Registered: Dec 2009
Posts: 15

Rep: Reputation: 0
Apache issue with UTF-8 after moving to new server.


I have a strange problem. I am moving my site from an older LAMP server to a new Debian 12 LAMP server. The files were tarred on the old server, copied to the new one, and untarred.

Any UTF-8 characters like ’ are showing as the black diamond with question mark.

I added AddDefaultCharset utf-8 to conf-enabled/charset.conf

I also tried adding <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> to the html file (originally not there). Examining the page in the browser with F12 does show charset utf-8 for content type.

I cannot figure out why the site displays fine on the old server but not on the new one.

The only difference I can find is that the locale on the old server is en_US.UTF-8, but on the new server it is C.UTF-8.

Any hints would be great.

Last edited by Phunction; 11-20-2023 at 12:43 PM.
 
Old 11-20-2023, 12:05 PM   #2
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,753

Rep: Reputation: 7983
Quote:
Originally Posted by Phunction
Closed. This question is not about programming or software development. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. You can edit the question so it's on-topic or see if it can be answered on another Stack Exchange site, but be sure to read the on-topic page for a site before posting there.

Closed yesterday.

I have a strange problem. I am moving my site from an older LAMP server to a new Debian 12 LAMP server. The files were tarred on the old server, copied to the new one, and untarred. Any UTF-8 characters like ’ are showing as the black diamond with question mark. I added AddDefaultCharset utf-8 to conf-enabled/charset.conf

I also tried adding <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> to the html file (originally not there). Examining the page in the browser with F12 does show charset utf-8 for content type. I cannot figure out why the site displays fine on the old server but not on the new one. The only difference I can find is that the locale on the old server is en_US.UTF-8, but on the new server it is C.UTF-8.
Two things:
  1. You seriously copied/pasted this directly from Stack Exchange, and didn't even bother to remove the header???
  2. You say it's a 'strange problem'....yet post the exact solution when you say the locales are different.
Your 'hint' is to change the locale on the new server to match what was on your old server. You don't tell us what character sets are installed on your new server, what language(s) your website is in, or what language your site is written in.
 
Old 11-20-2023, 12:41 PM   #3
Phunction
LQ Newbie
 
Registered: Dec 2009
Posts: 15

Original Poster
Rep: Reputation: 0
Yes, I quickly copied and pasted as it was closed on Stack Exchange. I know it is mostly for programming, but there were similar questions and I am getting desperate.

It is a default Debian 12 install. The site is mostly in English, but it has the odd Unicode character, as some text was originally copied and pasted from Word on the old server.

Last edited by Phunction; 11-20-2023 at 12:43 PM.
 
Old 11-20-2023, 12:43 PM   #4
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,753

Rep: Reputation: 7983
Quote:
Originally Posted by Phunction
Yes, I quickly copied and pasted as it was closed on Stack Exchange. I know it is mostly for programming, but there were similar questions and I am getting desperate.
Then your 'hint' is still to set the locale to match. If you're 'desperate', you can begin by answering the questions you were asked, looking at the logs, and doing things to try to resolve your error.

The locale doesn't match, and you're getting issues related to locale....why is that (or the fix) surprising???
 
Old 11-20-2023, 01:58 PM   #5
Phunction
LQ Newbie
 
Registered: Dec 2009
Posts: 15

Original Poster
Rep: Reputation: 0
From my understanding, the two locales were fairly compatible. I did not want to change the locale in case it affected something else on the server.
 
Old 11-20-2023, 01:58 PM   #6
Phunction
LQ Newbie
 
Registered: Dec 2009
Posts: 15

Original Poster
Rep: Reputation: 0
Also, try not being an A hole about it.
 
Old 11-20-2023, 02:14 PM   #7
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,753

Rep: Reputation: 7983
Quote:
Originally Posted by Phunction
From my understanding, the two locales were fairly compatible. I did not want to change the locale in case it affected something else on the server.
"Fairly compatible" is not 100% compatible...if they were, there wouldn't be two locales, would there???
Quote:
Originally Posted by Phunction
Also, try not being an A hole about it.
Try thinking about things before posting. You copy/paste a problem onto multiple forums, then you don't answer questions when asked (and you STILL don't), get a 'hint' about what to change, then post things like this??? Good luck.
 
Old 11-20-2023, 03:17 PM   #8
scasey
LQ Veteran
 
Registered: Feb 2013
Location: Tucson, AZ, USA
Distribution: CentOS 7.9.2009
Posts: 5,750

Rep: Reputation: 2222
I’ve never figured out a way to make Word’s “smart quotes” display consistently on a web page. There may be one, but I found it easier to just edit the document to change them into normal quote marks.
I started to write a script to do that, but while working on it I discovered it only takes 3 or 4 steps to just edit a page in Word, so I never finished the script.

Besides the server locale and page’s Content-type, I believe that the settings and available fonts on the browser/client computer also have an effect on the ability to have them work. As I said, I gave up and eliminated the source of the problem.

Last edited by scasey; 11-20-2023 at 03:25 PM.
 
Old 11-20-2023, 04:06 PM   #9
Phunction
LQ Newbie
 
Registered: Dec 2009
Posts: 15

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by scasey
I’ve never figured out a way to make Word’s “smart quotes” display consistently on a web page. There may be one, but I found it easier to just edit the document to change them into normal quote marks.
I started to write a script to do that, but while working on it I discovered it only takes 3 or 4 steps to just edit a page in Word, so I never finished the script.

Besides the server locale and page’s Content-type, I believe that the settings and available fonts on the browser/client computer also have an effect on the ability to have them work. As I said, I gave up and eliminated the source of the problem.
Thanks. Unfortunately, there are a few hundred pages to edit. I tried searching for that quote so I could do a find and replace, but grep is unable to find that mark. I tried a simple grep * -e '’' but that does not work. That character is byte 0x92, but I am not sure how to do a search and replace on it.

I don't think it is a browser issue as the same file displays fine in the same browser on the old server. I may just try changing locales and hope nothing breaks.
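
A likely reason the grep finds nothing: in a UTF-8 terminal, the ’ typed into the pattern is the three bytes e2 80 99, so it cannot match a lone 0x92 byte in the files. Below is a minimal sketch, in Python, of one way to list the affected files by searching at the byte level instead. The function name `find_cp1252_suspects` and the `*.html` pattern are assumptions made for illustration, not anything taken from the actual site.

```python
from pathlib import Path


def find_cp1252_suspects(root, pattern="*.html"):
    """Return files under root that contain byte 0x92 and are not valid UTF-8.

    The UTF-8 check matters because 0x92 can also appear legitimately as a
    continuation byte inside a valid UTF-8 sequence.
    """
    suspects = []
    for path in sorted(Path(root).rglob(pattern)):
        data = path.read_bytes()
        if b"\x92" not in data:
            continue
        try:
            data.decode("utf-8")
        except UnicodeDecodeError:
            # Contains 0x92 and fails strict UTF-8 decoding.
            suspects.append(path)
    return suspects
```

Calling `find_cp1252_suspects("/var/www/html")` (the path is only an example) would return the list of pages worth inspecting before changing any server setting.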
 
Old 11-20-2023, 05:26 PM   #10
metaed
Member
 
Registered: Apr 2022
Location: US
Distribution: Slackware64 15.0
Posts: 374

Rep: Reputation: 172
This is only a partial answer.

Your files are not UTF-8 encoded. A right-single-quote would have been UTF-8 encoded as 0xe28099 (three bytes). The 0x92 encoding is characteristic of Windows-1252. So setting AddDefaultCharset utf-8 is, unfortunately, wide of the mark. I suspect you should aim for an encoding of windows-1252.

How to do that depends too much on how your local system is preconfigured for me to make a good guess about. AddDefaultCharset windows-1252 might work in the right context. If not, you might try adding <meta http-equiv="Content-Type" content="text/html; charset=windows-1252"> to one of your files and see if you get the results you want.
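
The byte values above can be checked directly. This is a small Python sketch showing how the same character encodes under each charset, and what a UTF-8 decoder does with a lone 0x92; the variable names are illustrative only.

```python
RIGHT_SINGLE_QUOTE = "\u2019"  # the curly apostrophe Word inserts

# The same character under the two encodings discussed above.
utf8_bytes = RIGHT_SINGLE_QUOTE.encode("utf-8")
cp1252_bytes = RIGHT_SINGLE_QUOTE.encode("cp1252")

print(utf8_bytes.hex())    # e28099
print(cp1252_bytes.hex())  # 92

# A lone 0x92 is not valid UTF-8. A lenient decoder substitutes U+FFFD,
# the replacement character that browsers draw as a black diamond with
# a question mark.
shown = b"It\x92s".decode("utf-8", errors="replace")
print(shown == "It\ufffds")  # True
```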
 
Old 11-20-2023, 05:57 PM   #11
Phunction
LQ Newbie
 
Registered: Dec 2009
Posts: 15

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by metaed
This is only a partial answer.

Your files are not UTF-8 encoded. A right-single-quote would have been UTF-8 encoded as 0xe28099 (three bytes). The 0x92 encoding is characteristic of Windows-1252. So setting AddDefaultCharset utf-8 is, unfortunately, wide of the mark. I suspect you should aim for an encoding of windows-1252.

How to do that depends too much on how your local system is preconfigured for me to make a good guess about. AddDefaultCharset windows-1252 might work in the right context. If not, you might try adding <meta http-equiv="Content-Type" content="text/html; charset=windows-1252"> to one of your files and see if you get the results you want.
That is the strange part, though: the old server, which works fine, is set to UTF-8 in the Apache conf file. I tried your suggestion and set the charset in the HTML header to windows-1252, but it made no difference.

Right now I am trying to figure out how to do a search and replace on the command line to change the Windows-specific characters to their ASCII counterparts.
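
One possible way to do that replacement is sketched below in Python, working on raw bytes so the terminal locale does not matter. The mapping covers only the common Windows-1252 punctuation bytes (curly quotes, dashes, ellipsis); the names `CP1252_TO_ASCII`, `asciify` and `fix_file`, and the `.bak` backup suffix, are assumptions for illustration. Files that already decode as valid UTF-8 are skipped, because in those files the same byte values can be part of legitimate multi-byte sequences.

```python
from pathlib import Path

# Windows-1252 punctuation bytes and plain ASCII stand-ins.
CP1252_TO_ASCII = {
    b"\x91": b"'",    # left single quote
    b"\x92": b"'",    # right single quote / apostrophe
    b"\x93": b'"',    # left double quote
    b"\x94": b'"',    # right double quote
    b"\x96": b"-",    # en dash
    b"\x97": b"-",    # em dash
    b"\x85": b"...",  # horizontal ellipsis
}


def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def asciify(data: bytes) -> bytes:
    """Replace each mapped Windows-1252 byte with its ASCII stand-in."""
    for raw, plain in CP1252_TO_ASCII.items():
        data = data.replace(raw, plain)
    return data


def fix_file(path, backup_suffix=".bak"):
    """Rewrite path in place if it is not valid UTF-8; return True if changed.

    The original bytes are saved next to the file with backup_suffix appended.
    """
    path = Path(path)
    data = path.read_bytes()
    if is_valid_utf8(data):
        return False  # already UTF-8 (or pure ASCII): leave untouched
    fixed = asciify(data)
    if fixed == data:
        return False  # invalid UTF-8, but none of the mapped bytes present
    Path(str(path) + backup_suffix).write_bytes(data)
    path.write_bytes(fixed)
    return True
```

If keeping the typographic quotes is preferable to flattening them, an alternative under the same assumptions is to transcode instead: `data.decode("cp1252").encode("utf-8")`. That call raises `UnicodeDecodeError` on the few byte values Windows-1252 leaves undefined, so it is worth trying on copies first.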
 
  



Tags
apache, utf-8




All times are GMT -5.
