Get content of .docx attachment


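I am trying to get the content of an attachment using Apex.
For this I have used:
    // Round-trip the attachment body through Base64, then decode it as UTF-8 text.
    String myEncodedString = EncodingUtil.base64Encode(mailAttachment.Body);
    System.debug('myEncodedString +++++++ ' + myEncodedString);
    // toString() fails when the Blob is binary data rather than valid UTF-8.
    String attBody = EncodingUtil.base64Decode(myEncodedString).toString();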

With this I am able to get the data for .csv and .txt (UTF-8 encoded) files, but I am not able to read the contents of .docx, .xlsx, .pdf, etc.
The error I get is 'BLOB is not a valid UTF-8 string'.
After a bit of research I found that, although formats like .docx use UTF-8 internally, they are archived formats (similar to zip).

So can anybody help me get the content of these formats?

Help would be appreciated.

You may be able to convert the blob to hex [1] and then do something with that hex string. The problem will then be extracting the zip to get to the XML contents inside the file. I do not know of a way to do this in Apex, and I would not recommend even trying, since you will be severely limited by the CPU governor limit.
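
Before attempting any extraction, it may help to confirm that the attachment really is a zip container, using the hex conversion mentioned above. The following is only a minimal sketch: it assumes the file begins with the standard ZIP local file header signature (bytes 50 4B 03 04), which is what .docx and .xlsx files start with. The class name `AttachmentSniffer` and the method `isZipContainer` are hypothetical names chosen for illustration, not part of any Salesforce API.

```apex
// Hypothetical helper (illustrative only, not a Salesforce API).
public class AttachmentSniffer {
    // Returns true when the Blob starts with the ZIP signature "PK\x03\x04".
    public static Boolean isZipContainer(Blob body) {
        if (body == null || body.size() < 4) {
            return false;
        }
        // convertToHex turns the whole Blob into a hex string, so the heap
        // cost is roughly twice the attachment size. Large files may hit
        // governor limits.
        String hex = EncodingUtil.convertToHex(body).toLowerCase();
        return hex.startsWith('504b0304');
    }
}
```

Calling `AttachmentSniffer.isZipContainer(mailAttachment.Body)` only tells you whether the body is an archive that cannot be read with `toString()`. It does not unzip anything, and a .pdf (which is not zip-based) would return false.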

Thanks pcon.
I had already tried this solution, but I get only special characters from it (similar to Firoz Khan in the link you provided).