public class SimpleHtmlExtractor extends Object
Modifier and Type | Method and Description |
---|---|
protected static org.htmlcleaner.HtmlCleaner | getHtmlCleaner() |
static String | getInnerHtml(String html, String tagName, boolean byHtmlCleaner): Extracts the inner HTML of the first tag found by tagName. |
static String | getInnerText(String html, String tagName): Extracts the inner text of the first tag found by tagName. |
static String | getText(String html): Extracts the text from the HTML markup. |
protected static org.htmlcleaner.HtmlCleaner getHtmlCleaner()
public static String getInnerHtml(String html, String tagName, boolean byHtmlCleaner)
Extracts the inner HTML of the first tag found by tagName.
If byHtmlCleaner is true, the HTML Cleaner library is used to extract the inner content of the tag found by tagName. This option handles complex HTML, but it costs more because the input has to be cleaned first.
For simple HTML input and better performance, set byHtmlCleaner to false to use the simple extraction option. If the input is more complex and you need a more accurate result, set byHtmlCleaner to true and accept the higher cost.
If tagName is null or empty, the root element is used.
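The documented contract (first matching tag wins, null when nothing matches) can be illustrated with a small standalone sketch. This is not the implementation of SimpleHtmlExtractor: the class InnerHtmlSketch and the method firstInnerHtml are hypothetical names, the regular expression is an assumption about what a "simple" extraction might look like, and the root-element fallback for a null or empty tagName is not modeled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in that only mirrors the documented contract of
// getInnerHtml(html, tagName, false); it is NOT the library's code.
final class InnerHtmlSketch {

    // Returns the inner HTML of the first <tagName>...</tagName> pair,
    // or null when no such tag is found.
    static String firstInnerHtml(String html, String tagName) {
        if (tagName == null || tagName.isEmpty()) {
            // The documented method uses the root element here; this sketch does not model that.
            throw new IllegalArgumentException("root-element fallback is not modeled");
        }
        if (html == null) {
            return null;
        }
        String name = Pattern.quote(tagName);
        // Opening tag with optional attributes, lazily captured content, closing tag.
        Pattern pattern = Pattern.compile(
                "<" + name + "(?:\\s[^>]*)?>(.*?)</" + name + "\\s*>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        Matcher matcher = pattern.matcher(html);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Hello, <b>World</b>!</p><p>Second</p></body></html>";
        System.out.println(firstInnerHtml(html, "p"));     // prints Hello, <b>World</b>!
        System.out.println(firstInnerHtml(html, "table")); // prints null
    }
}
```

A pattern like this stops at the first closing tag, so nested elements with the same name or unbalanced markup would give wrong results. That kind of input is plausibly where setting byHtmlCleaner to true, at a higher cost, becomes the better choice.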
Parameters:
html -
tagName - the name of the tag, which may be the root tag; null or empty selects the root tag
byHtmlCleaner -
Returns:
null when the tag is not found
public static String getInnerText(String html, String tagName)
Extracts the inner text of the first tag found by tagName.
If tagName is null or empty, the root element is used.
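To make the difference between inner text and inner HTML concrete, here is a minimal standalone sketch. It is not the library's implementation: InnerTextSketch and firstInnerText are hypothetical names, returning null for a missing tag is an assumption (the description above does not state it), character entities are left undecoded, and the root-element fallback is not modeled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in illustrating "inner text of the first matching tag";
// it is NOT the code behind getInnerText.
final class InnerTextSketch {

    // Returns the text inside the first <tagName>...</tagName> pair with any
    // nested markup removed, or null when the tag is not found (assumed behavior).
    static String firstInnerText(String html, String tagName) {
        if (html == null || tagName == null || tagName.isEmpty()) {
            return null; // the documented root-element fallback is not modeled here
        }
        String name = Pattern.quote(tagName);
        Matcher matcher = Pattern.compile(
                "<" + name + "(?:\\s[^>]*)?>(.*?)</" + name + "\\s*>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL).matcher(html);
        if (!matcher.find()) {
            return null;
        }
        // Strip the tags that remain inside the captured content, keeping only the text.
        return matcher.group(1).replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        String html = "<div><h1 class=\"title\">Release <em>notes</em></h1><p>Body</p></div>";
        System.out.println(firstInnerText(html, "h1")); // prints Release notes
    }
}
```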
Parameters:
html -
tagName - the name of the tag, which may be the root tag; null or empty selects the root tag
Copyright © 2008–2016 Hippo B.V. (http://www.onehippo.com). All rights reserved.