AKS018

Reading an MS Word (.doc/.docx) file through Apex

Hi,

Can you please give me an example of, or a suggestion for, reading an MS Word (.doc/.docx) file through Apex?

Alexander_E
I'm facing the same problem. It would be great to get rid of the layout markup. We need to read the text out of an MS Word (.doc/.docx) file through Apex.
Anand Singh

If you upload the document through a Visualforce page (using the Browse button) and try to extract its content in the Apex controller, it will most likely fail because of the view state limit when the document is large.

The trick is to upload the document as an Attachment (linked to a dummy record that can be deleted later) and read the Attachment's content in an Apex controller or class that contains the XML parsing logic (the Apex XML parser can be used to parse the Word document).
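
To make the structure being parsed concrete, here is a minimal sketch in Python rather than Apex (chosen purely for illustration; nothing in it is an Apex API). A .docx file is a ZIP package whose body text lives in word/document.xml inside `w:t` elements, so that XML has to be taken out of the package before an XML parser can read it. The older .doc format is binary and is not covered here. The helper names `extract_docx_text` and `make_sample_docx` are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part of a .docx.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_docx_text(docx_bytes: bytes) -> str:
    """Return the text of a .docx, one line per paragraph (hypothetical helper)."""
    # A .docx is a ZIP package; the body is stored in word/document.xml.
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        xml_bytes = package.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # Each w:p is a paragraph; its visible text is split across w:t runs.
    # Simplification: paragraphs nested in text boxes would be visited twice.
    for para in root.iter(f"{{{W_NS}}}p"):
        runs = [t.text or "" for t in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


def make_sample_docx() -> bytes:
    """Build a tiny stand-in package in memory (demo data, not a full .docx)."""
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        '<w:p><w:r><w:t>Hello</w:t></w:r>'
        '<w:r><w:t xml:space="preserve"> world</w:t></w:r></w:p>'
        '<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>'
        '</w:body></w:document>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", body)
    return buffer.getvalue()


if __name__ == "__main__":
    print(extract_docx_text(make_sample_docx()))
```

In Apex, the same walk over the `w:p` and `w:t` elements would be done with the XML parser, assuming the content of word/document.xml is already available as a string.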

To reduce the view state in the Apex controller:
a) Do not parse the entire Word document; instead, use the String substring() method to grab only the portion of the document that needs parsing.
b) Make sure to set the variables that hold large text content from attachments to null once they have served their purpose.
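
Points (a) and (b) can be sketched as follows, again in Python for illustration only. The helper `slice_between` and the marker strings are hypothetical, and the sample XML is made up; the idea is simply to keep the small slice that will be parsed and release the large string.

```python
def slice_between(text: str, start_marker: str, end_marker: str) -> str:
    """Return the part of text from start_marker through end_marker, or ""."""
    start = text.find(start_marker)
    if start == -1:
        return ""
    end = text.find(end_marker, start + len(start_marker))
    if end == -1:
        return ""
    return text[start:end + len(end_marker)]


if __name__ == "__main__":
    # Hypothetical large document text loaded from the attachment.
    document_xml = (
        "<w:document><w:body><w:p>Needed text</w:p></w:body>"
        "<w:sectPr/></w:document>"
    )
    # (a) Keep only the portion that actually needs parsing.
    body_only = slice_between(document_xml, "<w:body>", "</w:body>")
    # (b) Drop the reference to the large string once it has served its purpose.
    document_xml = None
    print(body_only)
```

In an Apex controller, declaring such a variable `transient` is another way to keep it out of the view state.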

Possible issue:

The Word document may not be well-formed XML: it may be missing closing tags or punctuation. To handle this, you need to develop logic that repairs the XML (by adding the closing tags and missing punctuation) BEFORE the text is passed to the XML parser.
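
One possible shape for that repair step is sketched below in Python, for illustration only. The function name `repair_tags` is hypothetical, and the approach (a stack of open tags driven by a regular expression) is just one option: it inserts missing closing tags and drops stray ones, but it does not fix missing punctuation and it assumes no `<` or `>` inside attribute values, comments, or CDATA sections.

```python
import re
import xml.etree.ElementTree as ET

# Matches opening, closing, and self-closing tags such as <w:p>, </w:p>, <w:br/>.
TAG_PATTERN = re.compile(r"<(/?)([A-Za-z_][\w:.-]*)([^<>]*?)(/?)>")


def repair_tags(fragment: str) -> str:
    """Insert missing closing tags and drop unmatched closing tags."""
    out = []
    stack = []  # names of elements that are currently open
    pos = 0
    for match in TAG_PATTERN.finditer(fragment):
        out.append(fragment[pos:match.start()])  # text before this tag
        pos = match.end()
        closing, name, _attrs, self_closing = match.groups()
        if closing:
            if name not in stack:
                continue  # stray closing tag: drop it
            # Close any inner elements that were left open.
            while stack[-1] != name:
                out.append(f"</{stack.pop()}>")
            stack.pop()
            out.append(match.group(0))
        else:
            out.append(match.group(0))
            if not self_closing:
                stack.append(name)
    out.append(fragment[pos:])
    # Close whatever is still open at the end (e.g. after a substring cut).
    out.extend(f"</{name}>" for name in reversed(stack))
    return "".join(out)


if __name__ == "__main__":
    repaired = repair_tags("<doc><p>Hello <b>world</p>")
    print(repaired)          # <doc><p>Hello <b>world</b></p></doc>
    ET.fromstring(repaired)  # parses without raising ParseError
```

This matters most when the text has been cut with substring, because the cut usually leaves several elements open at the end of the fragment.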

Hope this helps!!

Peter Morales
Powerful Word Recovery Tool (http://www.softmagnat.com/word-recovery.html) is software that can fix corrupt Word documents with .docx or .doc extensions. It repairs Word files safely and preserves the original fonts, text, headers/footers, and images. It can also repair inaccessible DOC/DOCX files. To download: http://www.softmagnat.com/word-recovery.html
Maneet Singh
Use the Doc File Recovery Tool to repair a corrupt MS Office DOC file. With it, an Office user can recover a corrupt Word DOC file in a few minutes. The Word Recovery Tool works on all Windows operating systems, and a free demo version is available for trial. Visit: Word Recovery tool (http://www.sysinfotools.com/recovery/word-file-repair-software.php)
Geoff Marsh
Word Repair Tool can repair corrupted or damaged Word documents and restore their complete data, such as OLE objects, images, forms, graphs, hyperlinks, tables, text, headers, and footnotes. It can repair and recover corrupt, damaged, or inaccessible Word documents in formats such as DOC, DOCX, DOCM, DOTX, and DOTM.

For any queries, see: http://www.mannatsoftware.com/stellar-phoenix-word-repair.html
Olga Smith 6
Users can try some manual methods before downloading any Word recovery tool. There are manual methods that can repair damaged Word files in formats such as DOCX, DOC, DOTX, DOCM, and DOTM created by MS Office 2016, 2013, 2010, 2007, etc. With these methods, users can restore the images, tables, text, charts, links, etc. saved in the document. Read more: http://recoveryandmanagement.com/repair-word-file-manually/