May 5, 2009
If you are having trouble transferring text between MS Word and a CMS such as WordPress or Drupal, here’s some straightforward practical advice that should help.
What’s the matter with Word?
Here’s a fairly common scenario: you have some text in a Microsoft Word document you’d like to put in a web page. So you open the document, select the text and do a Copy.

Then you go to your website’s administration panel and find the page to insert into, and do Paste.

At this point the new text probably appears to be OK, so you save the page. But when you check it on the site itself, it looks out of place: there’s additional weird code, the font is different from the rest of your content, and very often it’s also the wrong colour or size. Aaaarrrggghhhh!!

What’s going on? First, for this to happen you must be using what’s called a Rich Text Editor to edit text on your site: one that gives you a WYSIWYG (What You See Is What You Get) view of content as you enter it. WordPress comes with the TinyMCE WYSIWYG editor built in: it’s the one shown when you select the Visual mode for editing. Drupal sites may have any one of a number of different editors installed, of which the commonest are TinyMCE and FCKeditor.
It doesn’t matter which editor you are using, though: if it’s in WYSIWYG mode, you are likely to encounter this problem when copying text from Word. It happens because, when you copy, Word automatically tries to preserve the formatting of your original document by translating it to HTML. Unfortunately the HTML it generates is non-standard and of poor quality. It contains code that will tend to override the choices for font, text colour and layout that your web designer has built into your site, hence the problems we’ve seen above.
“Wait a minute, though,” you may be saying: “I don’t see this problem, yet I copy and paste content from Word all the time.” This may be the case if your site has been set up to filter the HTML entered, removing all the extraneous code that Word added before the content is displayed. Even if that’s the case, and you are happy with how things are working, you may still have a problem with copying images: that’s dealt with in the next section, so keep reading.
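To give a rough idea of what such a filter does, here is a minimal Python sketch. It is not the code WordPress, Drupal or any editor actually uses: the function name `clean_word_html` and the sample markup are made up for illustration, and regular expressions are a crude way to process HTML. It simply removes the kinds of Word-specific markup described above.

```python
import re


def clean_word_html(html):
    """Strip some common Word-specific markup from an HTML fragment.

    Illustrative only: a real site should rely on a proper HTML sanitiser.
    """
    # Remove conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if.*?<!\[endif\]-->", "", html,
                  flags=re.DOTALL | re.IGNORECASE)
    # Remove Office namespace tags such as <o:p> and </o:p>
    html = re.sub(r"</?[a-z]+:[^>]*>", "", html, flags=re.IGNORECASE)
    # Remove <font> and <span> wrappers but keep the text inside them
    html = re.sub(r"</?(font|span)\b[^>]*>", "", html, flags=re.IGNORECASE)
    # Drop style, class and lang attributes, which carry Word's fonts and colours
    html = re.sub(r"""\s+(style|class|lang)\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)""",
                  "", html, flags=re.IGNORECASE)
    return html


# Hypothetical example of what a paragraph pasted from Word can look like
sample = (
    '<p class="MsoNormal" style="font-family:Arial;color:red">'
    '<span style="mso-bidi-font-size:12pt">Hello <b>world</b></span>'
    '<o:p></o:p></p>'
)
print(clean_word_html(sample))  # <p>Hello <b>world</b></p>
```

The paragraph and the bold text survive, while the font, colour and Office-only tags are gone, so the site’s own stylesheet decides how the text looks.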
What becomes of the broken image?

(Yes, it's meant to look like this!)
It’s worth noting that if your Word document contains images, you will not be able to insert these successfully into a web page by means of a copy and paste operation, whatever you do. Generally, the best alternative is to copy the text first, then use your editor’s image upload function to add each image, but this is only possible if you have the images available as separate files. Also, each image file must be in a format compatible with the web: GIF, JPEG or PNG.
If you don’t have your images as files in the right format, here’s a quick solution that works most of the time.
- While editing your document in Word, go to the File -> Save As… menu option.
- Choose to save the document as a web page, and select a suitable name and location for it.
- Close the document.
- Using Windows Explorer, browse to where you saved the document. If you called it “my_document” then you’ll see a file called “my_document.htm” and a folder called “my_document_files”. In the folder you’ll find copies of all the images from your document.
- Now you can use the image uploader that comes with your CMS to insert each image into your new post.
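If the folder contains a lot of files, a short script can pick out the ones that are in a web-compatible format. The following is a minimal Python sketch, assuming the folder is called “my_document_files” as in the example above; the function name `find_web_images` is made up for illustration.

```python
from pathlib import Path

# The web-compatible formats mentioned above: GIF, JPEG and PNG
WEB_IMAGE_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png"}


def find_web_images(folder):
    """Return the web-compatible image files in a folder, sorted by name."""
    return sorted(
        path for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in WEB_IMAGE_EXTENSIONS
    )
```

Calling `find_web_images("my_document_files")` returns a list of paths you can then upload one by one. Anything else Word may have put in the folder is ignored.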
Cures for your Word ills
Here are three simple ways to solve your problems with copying text from MS Word:
- TinyMCE’s Paste from Word tool. If you are using the TinyMCE editor (which is almost certainly the case for WordPress users), there is a special button designed to help with this problem, appropriately enough called “Paste From Word”. This button causes a popup window to appear, with an area into which you can paste the text copied from a Word document. TinyMCE attempts to remove all unwanted code from the text before inserting it. Most of the time it does a good enough job, but you should be aware that this function is far from perfect: sometimes code remains that can cause trouble.
- Back to basics. It’s usually possible to turn off the WYSIWYG function of your editor altogether (in WordPress click on the tab labelled “HTML”), and use a plain text entry box instead. If you copy and paste into this, you’ll get the text you want, but without formatting. You can then switch back into the WYSIWYG mode and add back the formatting manually. How attractive an option this is will depend on how much text you are entering and how it’s formatted, but it’s a safe way to deal with the problem. Another way of achieving the same effect is to use a text editor such as Notepad as an intermediate place to hold the text: copy from Word into Notepad, copy from Notepad into your WYSIWYG editor. Again, you’ll have to put back any formatting you need by hand.
- OpenOffice. OpenOffice is a free alternative to Microsoft Office that includes a word processing program called Writer. One of the advantages of using this instead of Word is that it generates much better HTML than Word does, although the text may still include some unwanted formatting. In that case, combining OpenOffice with the Paste from Word tool will often do the trick.
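For the technically minded, the “Back to basics” approach amounts to throwing away every tag and keeping only the text. Here is a minimal Python sketch of that idea using the standard library’s `html.parser` module; the names `TextOnly` and `strip_formatting` are made up for illustration, and this is not how Notepad or any editor works internally.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect the text of an HTML fragment and ignore all tags."""

    SKIPPED = {"style", "script"}  # their contents are code, not readable text

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skipping = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skipping += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skipping:
            self._skipping -= 1

    def handle_data(self, data):
        # Keep text unless we are inside a <style> or <script> block
        if not self._skipping:
            self.parts.append(data)


def strip_formatting(html):
    """Return only the text of an HTML fragment, with all formatting removed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

For example, `strip_formatting('<p class="MsoNormal"><b>Hello</b> &amp; <i>goodbye</i></p>')` gives back `Hello & goodbye`. As with Notepad, any bold, italics or headings you still want have to be added back by hand.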
Please note that I haven’t tested the above solutions with every version of Word and every WYSIWYG editor, so your experience may be different. Hopefully, though, I’ve managed to shed some useful light on what is for many a frustrating problem. As ever, if you have any questions please let me know.
This article was originally published exclusively for subscribers to our free newsletter.