jsoup parse HTML Document from a Java String
Introduction
In this tutorial we will explore how to use the jsoup library in a Java program to parse HTML from a Java String into a jsoup Document object.
What is jsoup?
jsoup is a Java library for working with real-world HTML. It provides a very convenient API for fetching URLs and extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors.
For more information about the library, visit the jsoup homepage at jsoup.org
Add jsoup library to your project
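To use the jsoup library in a Gradle project, add the following dependency to the build.gradle file. The implementation configuration is used here because the older compile configuration is deprecated and was removed in Gradle 7.
implementation 'org.jsoup:jsoup:1.13.1'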
To use the jsoup library in a Maven project, add the following dependency to the pom.xml file.
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.13.1</version>
</dependency>
To download the jsoup-1.13.1.jar file directly, visit the jsoup download page at jsoup.org/download
Parse HTML Document from a Java String
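jsoup provides the static method Jsoup.parse(), which takes a String argument and parses it into a jsoup Document object. The parser is lenient: if the input is missing the html, head or body elements, they are created in the resulting Document.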
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupParseStringExample {
    public static void main(String... args) {
        String sampleHtml = "<html><head><title>Simple Solution</title></head>" +
                "<body><p id='content'>jsoup Tutorial</p></body></html>";

        // Parse the complete HTML String into a Document
        Document document = Jsoup.parse(sampleHtml);

        // Look up the <p> element by its id attribute
        Element contentElement = document.getElementById("content");

        System.out.println("Document Title: " + document.title());
        System.out.println("Content Text: " + contentElement.text());
    }
}
Output:
Document Title: Simple Solution
Content Text: jsoup Tutorial
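When the HTML String contains relative links, it can help to tell jsoup where the markup came from. The following is a minimal sketch using the Jsoup.parse(String html, String baseUri) overload together with Element.absUrl(); the class name, the helper method resolveFirstLink and the https://example.com/ address are made up for illustration and are not part of the original example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class JsoupParseWithBaseUriExample {

    // Hypothetical helper: returns the absolute URL of the first link in the
    // HTML, or an empty String when the HTML contains no <a href> element.
    static String resolveFirstLink(String html, String baseUri) {
        // The second argument is the base URI used to resolve relative URLs
        Document document = Jsoup.parse(html, baseUri);
        Element link = document.selectFirst("a[href]");
        return link == null ? "" : link.absUrl("href");
    }

    public static void main(String... args) {
        String sampleHtml = "<html><body><a href='/docs/intro.html'>Intro</a></body></html>";

        // Prints https://example.com/docs/intro.html
        System.out.println(resolveFirstLink(sampleHtml, "https://example.com/"));
    }
}
```

Without a base URI, as in the single-argument Jsoup.parse(sampleHtml) call, absUrl() has nothing to resolve a relative href against and returns an empty String.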
Parse a Fragment of HTML from a Java String
If we have only a fragment of HTML, for example user input from a web form, we can parse it with the static method Jsoup.parseBodyFragment(). It returns a complete Document in which the parsed fragment is placed inside the body element.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class JsoupParseBodyFragmentExample {
    public static void main(String... args) {
        String sampleUserInput = "<div><p>sample user input text</p></div>";

        // Parse the fragment; jsoup places it inside the <body> of a new Document
        Document document = Jsoup.parseBodyFragment(sampleUserInput);

        // Collect all <p> elements and read the text of the first one
        Elements textElements = document.getElementsByTag("p");
        String contentText = textElements.first().text();

        System.out.println(contentText);
    }
}
Output:
sample user input text
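User input is often not well-formed, and it may not contain the element we are looking for, in which case first() returns null. The following is a minimal sketch of a more defensive variant; the class name and the helper method firstParagraphText are hypothetical names chosen for this illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class JsoupFragmentSafeReadExample {

    // Hypothetical helper: returns the text of the first <p> element in the
    // fragment, or an empty String when the fragment has no <p> element.
    static String firstParagraphText(String fragment) {
        Document document = Jsoup.parseBodyFragment(fragment);
        // selectFirst returns null when nothing matches the CSS selector
        Element paragraph = document.selectFirst("p");
        return paragraph == null ? "" : paragraph.text();
    }

    public static void main(String... args) {
        // The <p> tags are never closed; the parser closes them implicitly
        String untidyInput = "<p>first<p>second";

        Document document = Jsoup.parseBodyFragment(untidyInput);
        System.out.println(document.select("p").size());       // prints 2
        System.out.println(firstParagraphText(untidyInput));   // prints first

        // Plain text without any markup yields an empty String, not an exception
        System.out.println(firstParagraphText("no markup here").isEmpty()); // prints true
    }
}
```

Checking for null before calling text() keeps the program from throwing a NullPointerException when the submitted fragment does not contain the expected element.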
Happy Coding 😊