jsoup extract CSS class name of HTML element in Java
Tags: Java jsoup HTML Parser
Introduction
In this post, we explore how to use the jsoup library in a Java application to extract the CSS class names of an element in an HTML document.
Add jsoup library to your Java project
To use the jsoup library in a Gradle project, add the following dependency to the build.gradle file.
implementation 'org.jsoup:jsoup:1.13.1'
To use the jsoup library in a Maven project, add the following dependency to the pom.xml file.
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.13.1</version>
</dependency>
To download the jsoup-1.13.1.jar file, visit the jsoup download page at jsoup.org/download
Sample HTML File
For example, suppose we have a sample.html file with the following content.
<!DOCTYPE html>
<html>
<body>
    <div id="container" class="class1 class2 class3">
        <p>Simple Solution</p>
    </div>
</body>
</html>
Extract CSS class names as a String
The following Java example uses the Element.className() method to get all CSS class names from the class attribute as a single Java String.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.File;
import java.io.IOException;

public class ExtractCssClassExample1 {
    public static void main(String... args) {
        try {
            // Parse sample.html from the working directory as UTF-8
            String fileName = "sample.html";
            File file = new File(fileName);
            Document document = Jsoup.parse(file, "UTF-8");

            // Look up the <div id="container"> element
            Element element = document.getElementById("container");

            // className() returns the value of the class attribute as one String
            String cssClassName = element.className();
            System.out.println(cssClassName);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Output:
class1 class2 class3
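The example above assumes that the element exists and has a class attribute. As a minimal sketch of the other cases, the snippet below parses an in-memory HTML string instead of sample.html. The class ClassNameSketch and the helper classNameOf are hypothetical names introduced only for this illustration. It relies on getElementById() returning null for an unknown id and on className() returning an empty String when the element has no class attribute.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ClassNameSketch {

    // Hypothetical helper: returns the class string of the element with the given id,
    // or an empty String when no element has that id.
    public static String classNameOf(String html, String id) {
        Document document = Jsoup.parse(html);
        Element element = document.getElementById(id);
        // getElementById() returns null when the id is not found
        return element == null ? "" : element.className();
    }

    public static void main(String... args) {
        String html = "<div id=\"container\" class=\"class1 class2 class3\">"
                + "<p id=\"text\">Simple Solution</p></div>";

        // Element with a class attribute
        System.out.println(classNameOf(html, "container"));
        // Element without a class attribute: className() is empty
        System.out.println(classNameOf(html, "text").isEmpty());
        // Unknown id: the helper falls back to an empty String
        System.out.println(classNameOf(html, "missing").isEmpty());
    }
}
```

Returning an empty String for a missing element is just one possible choice; throwing an exception or returning an Optional may fit better in code that needs to tell the two cases apart.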
We can also use the Element.attr() method to read the class attribute directly, which returns the same result as the Element.className() method for this sample.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.File;
import java.io.IOException;

public class ExtractCssClassExample2 {
    public static void main(String... args) {
        try {
            // Parse sample.html from the working directory as UTF-8
            String fileName = "sample.html";
            File file = new File(fileName);
            Document document = Jsoup.parse(file, "UTF-8");

            // Look up the <div id="container"> element
            Element element = document.getElementById("container");

            // attr("class") reads the class attribute value by name
            String cssClass = element.attr("class");
            System.out.println(cssClass);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Output:
class1 class2 class3
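Because attr("class") returns an empty String both when the attribute is missing and when it is empty, it can be useful to check hasAttr("class") first. The following is a minimal sketch of that idea using an in-memory HTML string. The class ClassAttrSketch and the helper describeClassAttr are hypothetical names used only for this illustration, and the value is trimmed explicitly so that surrounding whitespace in the attribute does not affect the result.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ClassAttrSketch {

    // Hypothetical helper: describes the class attribute of the element with the given id.
    public static String describeClassAttr(String html, String id) {
        Document document = Jsoup.parse(html);
        Element element = document.getElementById(id);
        if (element == null) {
            return "no such element";
        }
        // hasAttr() distinguishes a missing attribute from an empty one
        if (!element.hasAttr("class")) {
            return "no class attribute";
        }
        // Trim explicitly so leading or trailing spaces in the attribute are ignored
        return "class=[" + element.attr("class").trim() + "]";
    }

    public static void main(String... args) {
        String html = "<div id=\"a\" class=\" class1 class2 \"></div><div id=\"b\"></div>";

        System.out.println(describeClassAttr(html, "a")); // class=[class1 class2]
        System.out.println(describeClassAttr(html, "b")); // no class attribute
        System.out.println(describeClassAttr(html, "c")); // no such element
    }
}
```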
Extract CSS class names as a Set of String
The jsoup library also provides the Element.classNames() method, which returns all of the element’s class names as a Java Set of String.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.File;
import java.io.IOException;
import java.util.Set;

public class ExtractCssClassExample3 {
    public static void main(String... args) {
        try {
            // Parse sample.html from the working directory as UTF-8
            String fileName = "sample.html";
            File file = new File(fileName);
            Document document = Jsoup.parse(file, "UTF-8");

            // Look up the <div id="container"> element
            Element element = document.getElementById("container");

            // classNames() splits the class attribute into individual names
            Set<String> cssClassNames = element.classNames();
            for (String name : cssClassNames) {
                System.out.println(name);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Output:
class1
class2
class3
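Since classNames() returns a Set, a class name that is repeated in the attribute appears only once, and membership can be tested with contains(). The sketch below illustrates this with an in-memory HTML string and also uses Element.hasClass() for a direct check on the element. The class ClassNamesSketch and the helper classNamesOf are hypothetical names introduced only for this example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

import java.util.Collections;
import java.util.Set;

public class ClassNamesSketch {

    // Hypothetical helper: returns the class names of the element with the given id,
    // or an empty Set when no element has that id.
    public static Set<String> classNamesOf(String html, String id) {
        Element element = Jsoup.parse(html).getElementById(id);
        if (element == null) {
            return Collections.<String>emptySet();
        }
        return element.classNames();
    }

    public static void main(String... args) {
        // "class1" is listed twice in the class attribute on purpose
        String html = "<div id=\"container\" class=\"class1 class2 class1\"></div>";

        Set<String> names = classNamesOf(html, "container");
        System.out.println(names.size());             // 2, the duplicate collapses
        System.out.println(names.contains("class2")); // true

        // hasClass() checks a single class name directly on the element
        Element element = Jsoup.parse(html).getElementById("container");
        System.out.println(element.hasClass("class3")); // false
    }
}
```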
Happy Coding 😊