rajubalaji

Is it possible to get the content of the body of any Word (.doc) attachment in Apex?

Hi everyone,

I am trying to write an Apex class that reads the body content of a Word document that I uploaded.

public void fetchData()
    {
        // 'content' is assumed to be a String member variable of the enclosing class.
        List<Folder> folderList = [SELECT Id, Name FROM Folder WHERE Name = 'Templates' LIMIT 1];
        if(folderList.isEmpty()){
            return;
        }
        List<Document> dList = [SELECT Name, ContentType, Body FROM Document WHERE FolderId = :folderList[0].Id AND Name = 'Test_Form' LIMIT 1];
        if(dList.isEmpty()){
            // No matching document, so there is nothing to replace.
            return;
        }
        try{
            // Blob.toString() expects UTF-8 text, so it can throw for a binary Word file.
            content = dList[0].Body.toString();
            content = content.replaceAll('@Number','5678');
            content = content.replaceAll('@Topics','Topics');
            content = content.replaceAll('@Name', 'Test');
        }catch(Exception e)
        {
            System.debug('Exception >>> ' + e.getMessage());
        }
    }

However, my Word document has a lot of headings and tables. Is it possible to get all of the body content? If anyone knows how, please help me.

I have searched through the links in our community, but I still could not find the right information.

Thanks in advance,
Raju
Shivankur (Salesforce Developers)
Hi Raju,

The content of Word documents in the .docx format is compressed using the Zip format. To extract the text, you need to uncompress a file named 'word/document.xml' that is embedded in every .docx file. For that you can use the open-source Zippex library: https://github.com/pdalcol/Zippex

After you install Zippex, use this code to get the content in plain text:
//wordFileBlob is a Blob that contains the Word document
Zippex myZip = new Zippex(wordFileBlob);
//Uncompress data
String wordDoc = myZip.getFile('word/document.xml').toString();
//Remove XML tags
String plainText = wordDoc.stripHtmlTags();

System.debug(plainText);
I hope you have already checked this thread and tried to implement it:
https://developer.salesforce.com/forums/?id=906F0000000AZ7OIAW
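
To connect this with the code in the question, here is a rough sketch that reads the template from the Documents folder, extracts the text with Zippex, and then fills in the placeholders. It assumes Zippex is installed and that Test_Form was saved as a .docx file. The class name WordTemplateReader and both method names are made up for illustration. Since stripHtmlTags() drops all of the XML markup, headings and tables may come out as run-together plain text.

```apex
// Sketch only: WordTemplateReader and its methods are hypothetical names.
public class WordTemplateReader {

    // Replaces each placeholder with its value. String.replace() matches
    // literally, so characters in the placeholder are not treated as regex.
    public static String fillPlaceholders(String text, Map<String, String> replacements) {
        String result = text;
        for (String placeholder : replacements.keySet()) {
            result = result.replace(placeholder, replacements.get(placeholder));
        }
        return result;
    }

    // Returns the plain text of the template with placeholders filled in,
    // or null when the document is not found.
    public static String readTemplate() {
        List<Document> docs = [
            SELECT Body
            FROM Document
            WHERE Folder.Name = 'Templates' AND Name = 'Test_Form'
            LIMIT 1
        ];
        if (docs.isEmpty()) {
            return null;
        }

        // Assumes the body is a .docx (Zip) file, as described above.
        Zippex myZip = new Zippex(docs[0].Body);
        String wordDoc = myZip.getFile('word/document.xml').toString();
        String plainText = wordDoc.stripHtmlTags();

        return fillPlaceholders(plainText, new Map<String, String>{
            '@Number' => '5678',
            '@Topics' => 'Topics',
            '@Name' => 'Test'
        });
    }
}
```

You could call WordTemplateReader.readTemplate() from your controller and assign the result to your content variable.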

I hope the above information helps. Please mark it as Best Answer so that it can help others in the future.

Thanks.
rajubalaji
Hi Shivankur,

Thank you so much for the reply.

Is it not possible without installing Zippex?

We don't want to work with a ZIP file. When we click the Visualforce page preview, we want it to download the file with the Word document's content as it is.

If anyone knows how, please help me.

Thanks in advance,
Raju