Relieve the pain of using Microsoft Word with WordPress or Drupal


If you are having trouble transferring text between MS Word and a CMS such as WordPress or Drupal, here’s some straightforward practical advice that should help.

What’s the matter with Word?

Here’s a fairly common scenario: you have some text in a Microsoft Word document you’d like to put in a web page. So you open the document, select the text and do a Copy.

Then you go to your website’s administration panel, find the page you want to insert it into, and do Paste.

At this point the new text probably appears to be OK, so you save the page. But when you go to check it on the site itself, it looks out of place: there’s additional weird code, the font is different from the rest of your content, and very often it’s also the wrong colour or size. Aaaarrrggghhhh!!

What’s going on? First, for this to happen you must be using what’s called a Rich Text Editor to edit text on your site: one that gives you a WYSIWYG (What You See Is What You Get) view of content as you enter it. WordPress comes with the TinyMCE WYSIWYG editor built in (it’s the one shown when you select the Visual mode for editing), while Drupal sites may have any one of several editors installed, the most common being TinyMCE and FCKeditor.

It doesn’t matter which editor you are using, though: if it’s in WYSIWYG mode, you are likely to encounter this problem when copying text from Word. It happens because, when you copy, Word automatically tries to preserve the formatting of your original document by translating it to HTML. Unfortunately, the HTML it generates is non-standard and of poor quality. It contains code that tends to override the choices for font, text colour and layout that your web designer has built into your site, hence the problems described above.

“Wait a minute, though,” you may be saying. “I don’t see this problem, yet I copy and paste content from Word all the time.” This may be the case if your site has been set up to filter the HTML entered, removing all the extraneous code added by Word before the content is displayed. Even so, and even if you are happy with how things are working, you may still have a problem with copying images: that’s dealt with in the next section, so keep reading.
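To give a feel for what such a filter does, here is a minimal sketch in Python using only the standard library. It is not the filter that WordPress, Drupal or any particular module actually uses. The class and function names are hypothetical, and the lists of tags and attributes to drop are assumptions based on the kind of code Word typically adds: inline `style` and `class="MsoNormal"` attributes, `<span>` and `<font>` wrappers, and Office tags such as `<o:p>`.

```python
from html import escape
from html.parser import HTMLParser

# Assumed lists (hypothetical): adjust them for your own site.
UNWRAP_TAGS = {"span", "font"}          # keep their text, drop the tag itself
SKIP_TAGS = {"style", "script", "xml"}  # drop the tag and everything inside it
DROP_ATTRS = {"style", "class", "lang", "face", "color", "size"}
VOID_TAGS = {"br", "hr", "img"}         # elements with no closing tag to emit


class WordCleaner(HTMLParser):
    """Hypothetical filter that strips Word-style formatting but keeps structure."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a SKIP_TAGS element

    def _ignored(self, tag):
        # Office-namespaced tags such as <o:p> contain a colon
        return tag in UNWRAP_TAGS or ":" in tag

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth or self._ignored(tag):
            return
        # Keep useful attributes (href, src, alt...), drop formatting ones
        kept = "".join(
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs
            if name not in DROP_ATTRS
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or self._ignored(tag) or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

    def handle_comment(self, data):
        pass  # drops comments, including Word's <!--[if gte mso 9]> blocks


def clean_word_html(fragment: str) -> str:
    parser = WordCleaner()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)
```

For example, `clean_word_html('<p class="MsoNormal" style="font-family:Calibri"><span style="color:red">Hello</span><o:p></o:p></p>')` gives `<p>Hello</p>`, leaving the font and colour to your site’s own stylesheet.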

What becomes of the broken image?


It’s worth noting that if your Word document contains images, you will not be able to insert them successfully into a web page by copying and pasting, whatever you do. Generally, the best alternative is to copy the text first, then use your editor’s image upload function to add each image, but this is only possible if you have the images available as separate files. Also, each image file must be in a web-compatible format: GIF, JPEG or PNG.
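If you are not sure whether an image file really is a GIF, JPEG or PNG (a file name ending can be misleading), its first few bytes will tell you. The Python sketch below checks the standard file signatures of those three formats. The function name is illustrative only, and it checks just the signature, not whether the rest of the file is valid.

```python
from typing import Optional

# Standard file signatures ("magic numbers") of the three web-friendly formats.
SIGNATURES = {
    b"GIF87a": "GIF",
    b"GIF89a": "GIF",
    b"\xff\xd8\xff": "JPEG",
    b"\x89PNG\r\n\x1a\n": "PNG",
}


def web_image_format(path) -> Optional[str]:
    """Return "GIF", "JPEG" or "PNG" if the file's first bytes match, else None."""
    with open(path, "rb") as f:
        header = f.read(8)  # the longest signature above is 8 bytes
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None
```

A genuine JPEG returns `"JPEG"`, while a BMP that has merely been renamed to `.jpg` returns `None` and would need converting before upload.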

If you don’t have your images as files in the right format, here’s a quick solution that works most of the time.

  1. While editing your document in Word, go to the File > Save As… menu option.
  2. Choose to save the document as a web page, and select a suitable name and location for it.
  3. Close the document.
  4. Using Windows Explorer, browse to where you saved the document. If you called it “my_document”, you’ll see a file called “my_document.htm” and a folder called “my_document_files”. In the folder you’ll find copies of all the images from your document.
  5. Now you can use the image uploader that comes with your CMS to insert each image into your new post.
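If you do this often, picking out the uploadable images in step 4 can be scripted. The Python sketch below assumes the folder naming described above (“my_document.htm” alongside “my_document_files”); localised versions of Word may name the folder differently. It lists only GIF, JPEG and PNG files and skips anything else Word places there, such as an XML file list. The function name is hypothetical.

```python
from pathlib import Path

WEB_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png"}


def find_uploadable_images(saved_page):
    """List web-ready images in the folder Word created next to a saved web page."""
    page = Path(saved_page)
    # Assumption: my_document.htm -> my_document_files in the same directory
    files_dir = page.with_name(page.stem + "_files")
    if not files_dir.is_dir():
        return []
    # Keep only files whose extension is one of the web formats, sorted by name
    return sorted(
        p for p in files_dir.iterdir()
        if p.is_file() and p.suffix.lower() in WEB_EXTENSIONS
    )
```

For example, `find_uploadable_images("my_document.htm")` returns the paths of the image files you can then add one by one with your CMS’s uploader.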

Cures for your Word ills

Here are three simple ways to solve your problems with copying text from MS Word:

  1. TinyMCE’s Paste from Word tool. If you are using the TinyMCE editor (which is almost certainly the case for WordPress users), there is a special button designed to help with this problem, appropriately enough called “Paste From Word”. Clicking it opens a popup window with an area into which you can paste the text copied from a Word document. TinyMCE attempts to remove all unwanted code from the text before inserting it. Most of the time it does a good enough job, but be aware that this function is far from perfect: sometimes code remains that can cause trouble.
  2. Back to basics. It’s usually possible to turn off the WYSIWYG function of your editor altogether (in WordPress click on the tab labelled “HTML”), and use a plain text entry box instead. If you copy and paste into this, you’ll get the text you want, but without formatting. You can then switch back into the WYSIWYG mode and add back the formatting manually. How attractive an option this is will depend on how much text you are entering and how it’s formatted, but it’s a safe way to deal with the problem. Another way of achieving the same effect is to use a text editor such as Notepad as an intermediate place to hold the text: copy from Word into Notepad, copy from Notepad into your WYSIWYG editor. Again, you’ll have to put back any formatting you need by hand.
  3. OpenOffice. OpenOffice is a free alternative to Microsoft Office that includes a word processing program called Writer. One of the advantages of using this instead of Word is that it generates much better HTML than Word does, although the text may still include some unwanted formatting. In that case, combining OpenOffice with the Paste from Word tool will often do the trick.
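The “Back to basics” approach works because a plain text box, like Notepad, keeps the words and throws away all the markup. The Python sketch below roughly approximates that transformation on a piece of HTML. It is not what your browser or Notepad does internally; the class and function names are hypothetical, and the set of tags treated as line breaks is an assumption.

```python
import re
from html.parser import HTMLParser

# Assumed set of elements that should start a new line in the plain-text result.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class PlainTextExtractor(HTMLParser):
    """Keeps only the words, roughly what pasting via Notepad gives you."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 inside <style> or <script>

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("style", "script"):
            self.skip_depth = max(0, self.skip_depth - 1)

    def handle_data(self, data):
        if not self.skip_depth:
            # HTML treats runs of whitespace (including newlines) as one space
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_plain_text(fragment: str) -> str:
    parser = PlainTextExtractor()
    parser.feed(fragment)
    parser.close()
    lines = (line.strip() for line in "".join(parser.parts).split("\n"))
    return "\n".join(line for line in lines if line)
```

Bold, fonts and colours all disappear, which is exactly why you then have to put back any formatting you need by hand in the WYSIWYG editor.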

Please note that I haven’t tested the above solutions with every version of Word and every WYSIWYG editor, so your experience may be different. Hopefully, though, I’ve managed to shed some useful light on what is for many a frustrating problem. As ever, if you have any questions please let me know.

This article was originally published exclusively for subscribers to our free newsletter.

9 Responses to “Relieve the pain of using Microsoft Word with WordPress or Drupal”

  1. Adam

    It would be a much nicer idea, I think, to integrate OpenOffice into Drupal, so that pages/docs can be automatically saved as a node, with no worries about image associations.

  2. Alfred Armstrong

    Interesting idea, Adam, although I wouldn’t like to attempt the integration myself (unless someone has a big pot of money they’d like to spend on it).

    The integration code would have to upload embedded images behind the scenes so they could be turned into HTML IMG tags, and there might be other issues to resolve, but I can see how it might work.

  3. Dane

    If you’re using Drupal, the Office HTML Filter can strip Office-generated HTML gunk, no matter what your choice of editor (TinyMCE, FCKeditor, even posts submitted by mail using Mailhandler)

  4. Pslcbs

    Hello! For me, in Drupal, the best way to get clean text from Word is this:
    1.- Disable your rich text editor with the link you normally find under the textarea.
    2.- Paste your text into it.
    3.- Select and copy the pasted text from the field. Doing this you get a clean piece of text, well formatted but without dirty code from Word.
    4.- Delete the text.
    5.- Enable the rich text editor and paste your clean text.
    6.- Done!!

  5. kangnamkid

    Yeah, sure, all of these workaround suggestions do work. But you can’t really tell a client they need to save it into a text file first, etc., etc. The client needs to get work product from Word into Drupal, with no intervening hassles. And no complaints back to me about different fonts showing up all over the site. And yes, many clients have no idea what the difference between a Word file and a text file is. I thought that was the whole point of something like Drupal… to make web publishing easy for the non-technical person…
