Java Convert .docx File to .html File using XDocReport

Tags: docx html XDocReport

In this Java tutorial, we learn how to convert a Word (.docx) file to an HTML file using the XDocReport library.

Table of contents

  1. Add XDocReport Converter DOCX XWPF Dependency to Java Project
  2. How to convert .docx file to .html file in Java
  3. How to Use FileConverter Class to convert Word to HTML File

Add XDocReport Converter DOCX XWPF Dependency to Java Project

If you build your project with Gradle, add the following dependency to the build.gradle file.

implementation group: 'fr.opensagres.xdocreport', name: 'fr.opensagres.xdocreport.converter.docx.xwpf', version: '2.0.3'

If you build your project with Maven, add the following dependency to the pom.xml file.

<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>fr.opensagres.xdocreport.converter.docx.xwpf</artifactId>
    <version>2.0.3</version>
</dependency>
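
After adding the dependency, it can be useful to confirm that the converter classes were actually resolved. The sketch below is one way to do that with plain Java reflection. It is not part of XDocReport: the DependencyCheck class and the isOnClasspath method are hypothetical names introduced for illustration, and the class name checked is the XHTMLConverter class imported by the FileConverter class in this tutorial.

```java
// Hypothetical helper (not part of XDocReport): reports whether a class
// can be found on the classpath, without initializing it.
class DependencyCheck {

    static boolean isOnClasspath(String className) {
        try {
            // initialize = false: only locate and load the class, do not run static initializers
            Class.forName(className, false, DependencyCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Fully qualified name of the converter class used later in this tutorial
        String converter = "fr.opensagres.poi.xwpf.converter.xhtml.XHTMLConverter";
        System.out.println(converter + " available: " + isOnClasspath(converter));
    }
}
```

If the program reports that the class is not available, the dependency was most likely not downloaded or the build file was not reloaded.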

How to convert .docx file to .html file in Java

Given a Word file, we can use the XDocReport API to convert it to an HTML file with the following steps.

  • Step 1: Open the .docx file as an InputStream using FileInputStream.
  • Step 2: Create a new XWPFDocument object using the XWPFDocument(InputStream is) constructor.
  • Step 3: Create a new instance of XHTMLOptions using the XHTMLOptions.create() static method.
  • Step 4: Open the .html file for writing as an OutputStream using FileOutputStream.
  • Step 5: Use the XHTMLConverter.getInstance().convert(XWPFDocument document, OutputStream out, T options) method to convert the .docx file to an .html file.

In the FileConverter class below, we implement the convertWordToHtml(String docxFileName, String htmlFileName) method, which converts a .docx file to an .html file with the given file names.

FileConverter.java

import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLConverter;
import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class FileConverter {

    public void convertWordToHtml(String docxFileName, String htmlFileName) {
        // try-with-resources closes the document and both streams, even if conversion fails
        try (InputStream inputStream = new FileInputStream(docxFileName);
             OutputStream outputStream = new FileOutputStream(htmlFileName);
             XWPFDocument document = new XWPFDocument(inputStream)) {
            XHTMLOptions options = XHTMLOptions.create();
            // Convert .docx file to .html file
            XHTMLConverter.getInstance().convert(document, outputStream, options);
        } catch (IOException e) {
            // FileNotFoundException is a subclass of IOException, so one catch block covers both
            e.printStackTrace();
        }
    }
}
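
Step 1 assumes that the input really is a .docx file. A .docx file is a ZIP package, so a file that is not one (for example a legacy .doc file renamed to .docx) can be rejected before it is passed to the XWPFDocument constructor. The following is a minimal sketch of such a pre-check using only the JDK. The DocxSniffer class and the looksLikeZipPackage method are hypothetical names, not part of XDocReport or Apache POI, and the check only inspects the first four bytes, so it does not prove that the file is a valid Word document.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical pre-check (not part of XDocReport or Apache POI).
class DocxSniffer {

    // ZIP archives normally begin with the local file header signature 'P' 'K' 0x03 0x04.
    private static final byte[] ZIP_SIGNATURE = {0x50, 0x4B, 0x03, 0x04};

    static boolean looksLikeZipPackage(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = new byte[ZIP_SIGNATURE.length];
            int total = 0;
            // read() may return fewer bytes than requested, so loop until the header is full
            while (total < header.length) {
                int n = in.read(header, total, header.length - total);
                if (n < 0) {
                    break; // end of file reached before four bytes were read
                }
                total += n;
            }
            return total == header.length && Arrays.equals(header, ZIP_SIGNATURE);
        }
    }
}
```

A caller could run this check first and skip the conversion, or report a clearer error message, when it returns false.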

How to Use FileConverter Class to convert Word to HTML File

For example, suppose we have a sample Word file located at D:\SimpleSolution\Data\Document.docx with the content shown in the screenshot below.

[Screenshot: content of the sample Document.docx file]

In the following example Java program, we use the FileConverter class from the previous step to convert the sample Word file above to an HTML file.

ConvertDocxToHtmlExample1.java

public class ConvertDocxToHtmlExample1 {
    public static void main(String... args) {
        String docxFileName = "D:\\SimpleSolution\\Data\\Document.docx";
        String htmlFileName = "D:\\SimpleSolution\\Data\\Document.html";

        FileConverter fileConverter = new FileConverter();
        fileConverter.convertWordToHtml(docxFileName, htmlFileName);
    }
}
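
The example above hard-codes both file names. When the HTML file should simply sit next to the Word file, the output name can be derived from the input name instead. Below is a minimal sketch of that idea using java.nio.file.Path. The HtmlFileNames class and the htmlPathFor method are hypothetical names introduced here, not part of XDocReport.

```java
import java.nio.file.Path;

// Hypothetical helper (not part of XDocReport): derives the .html path from a .docx path.
class HtmlFileNames {

    // Assumes docxPath names a file, so getFileName() is not null.
    static Path htmlPathFor(Path docxPath) {
        String fileName = docxPath.getFileName().toString();
        int dot = fileName.lastIndexOf('.');
        // Drop the last extension if there is one; a leading dot (hidden file) is kept
        String baseName = (dot > 0) ? fileName.substring(0, dot) : fileName;
        // resolveSibling keeps the parent directory and swaps only the file name
        return docxPath.resolveSibling(baseName + ".html");
    }
}
```

With this helper, the second argument of convertWordToHtml could be written as HtmlFileNames.htmlPathFor(Paths.get(docxFileName)).toString().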

Run the Java application and the HTML file is generated at D:\SimpleSolution\Data\Document.html. Opened in a browser, it looks like the screenshot below.

[Screenshot: the generated Document.html file opened in a browser]

Happy Coding 😊
