SimpleHtmlExtractor (Hippo Site Toolkit 4.1.0 API)

java.lang.Object
- org.hippoecm.hst.utils.SimpleHtmlExtractor

public class SimpleHtmlExtractor
extends Object

Simple HTML Tag Extractor

Method Summary

| Modifier and Type | Method and Description |
|---|---|
| `protected static org.htmlcleaner.HtmlCleaner` | `getHtmlCleaner()` |
| `static String` | `getInnerHtml(String html, String tagName, boolean byHtmlCleaner)` Extracts the inner HTML of the first tag named `tagName`. |
| `static String` | `getInnerText(String html, String tagName)` Extracts the inner text of the first tag named `tagName`. |
| `static String` | `getText(String html)` Extracts the text content of the HTML markup. |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Method Detail
  - getHtmlCleaner
```
protected static org.htmlcleaner.HtmlCleaner getHtmlCleaner()
```
  - getInnerHtml
```
public static String getInnerHtml(String html,
                                  String tagName,
                                  boolean byHtmlCleaner)
```
    Extracts the inner HTML of the first tag named tagName. If byHtmlCleaner is true, the HTML Cleaner library is used to extract the inner content of that tag.
    The HTML Cleaner option handles complex HTML more reliably, but it costs more because the input has to be cleaned first. For simple HTML input, set byHtmlCleaner to false to use the faster, simple extraction; for more complex input where a correct result matters, set it to true and accept the extra processing cost.
    
    If tagName is null or empty, then the root element is used.
    
    Parameters:
    
    html - the HTML source to extract from
    
    tagName - the name of the tag to find (the root tag's name is also accepted), or null/empty for the root tag
    
    byHtmlCleaner - true to extract with the HTML Cleaner library; false to use the simple extraction
    
    Returns:
    
    the inner HTML of the tag, or null if the tag is not found
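    
    The simple (non-HTML Cleaner) mode can be pictured as a plain string scan. The sketch below is hypothetical and is not the HST implementation: `InnerHtmlSketch` and `innerHtml` are illustrative names, and it assumes well-formed, properly nested markup. It finds the first opening tag with the given name, counts nested tags of the same name, and returns everything up to the matching close tag, falling back to the first element when `tagName` is null or empty.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a "simple" (byHtmlCleaner = false) inner-HTML extraction.
// Not the HST implementation: it scans the raw string, assumes well-formed, properly
// nested tags, and does not treat comments, CDATA or <script> bodies specially.
final class InnerHtmlSketch {

    // First opening tag in the input; group 1 captures its name, e.g. "<div class='x'>" -> "div".
    private static final Pattern FIRST_ELEMENT =
            Pattern.compile("<([a-zA-Z][a-zA-Z0-9]*)\\b[^>]*>");

    static String innerHtml(String html, String tagName) {
        if (html == null) {
            return null;
        }
        String name = tagName;
        if (name == null || name.isEmpty()) {
            // Null or empty tag name: use the first (root) element instead.
            Matcher root = FIRST_ELEMENT.matcher(html);
            if (!root.find()) {
                return null;
            }
            name = root.group(1);
        }
        // Opening or closing tags with this name, case-insensitively; group 1 is "/" for a close tag.
        Pattern tag = Pattern.compile("<(/?)" + Pattern.quote(name) + "\\b[^>]*>",
                Pattern.CASE_INSENSITIVE);
        Matcher m = tag.matcher(html);
        int contentStart = -1;
        int depth = 0;
        while (m.find()) {
            boolean closing = !m.group(1).isEmpty();
            boolean selfClosing = !closing && m.group().endsWith("/>");
            if (contentStart < 0) {
                if (closing) {
                    continue;          // stray close tag before the first opening tag
                }
                if (selfClosing) {
                    return "";         // e.g. <br/> has no inner content
                }
                contentStart = m.end();
                depth = 1;
            } else if (closing) {
                if (--depth == 0) {
                    return html.substring(contentStart, m.start());
                }
            } else if (!selfClosing) {
                depth++;               // nested element with the same name
            }
        }
        return null;                   // tag not found, or never closed
    }

    public static void main(String[] args) {
        String html = "<html><body><div>outer <div>inner</div> tail</div></body></html>";
        System.out.println(innerHtml(html, "div"));   // outer <div>inner</div> tail
    }
}
```

    With the real class, the corresponding call is `SimpleHtmlExtractor.getInnerHtml(html, "div", false)`. Passing `true` instead routes the extraction through HTML Cleaner, which is the safer choice for malformed input such as unclosed tags, where a scan like this one simply returns null.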
  - getInnerText
```
public static String getInnerText(String html,
                                  String tagName)
```
    Extracts the inner text of the first tag named tagName.
    If tagName is null or empty, then the root element is used.
    
    Parameters:
    
    html - the HTML source to extract from
    
    tagName - the name of the tag to find (the root tag's name is also accepted), or null/empty for the root tag
    
    Returns:
    
    the inner text of the tag
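    
    Conceptually, the inner text is what remains of the tag's inner HTML once the markup is removed. The following hypothetical sketch illustrates that idea with regular expressions; `InnerTextSketch.innerText` is an illustrative name, not part of HST, and returning null for a missing tag is a choice of the sketch. For brevity it matches the first `<tagName ...>...</tagName>` pair non-greedily, so a nested element with the same name cuts the result short.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: inner text = the first matching element's content with its tags removed.
// Not the HST implementation.
final class InnerTextSketch {

    // First opening tag in the input; group 1 captures its name.
    private static final Pattern FIRST_ELEMENT =
            Pattern.compile("<([a-zA-Z][a-zA-Z0-9]*)\\b[^>]*>");
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]*>");

    static String innerText(String html, String tagName) {
        if (html == null) {
            return null;
        }
        String name = tagName;
        if (name == null || name.isEmpty()) {
            // Null or empty tag name: use the first (root) element instead.
            Matcher root = FIRST_ELEMENT.matcher(html);
            if (!root.find()) {
                return null;
            }
            name = root.group(1);
        }
        // Non-greedy match of the first <name ...>...</name> pair. Simplification: a nested
        // element with the same name ends the match at its own close tag.
        Pattern element = Pattern.compile(
                "<" + Pattern.quote(name) + "\\b[^>]*>(.*?)</" + Pattern.quote(name) + "\\s*>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        Matcher m = element.matcher(html);
        if (!m.find()) {
            return null;
        }
        // Replace each tag with a space so adjacent blocks do not run together, then collapse
        // whitespace. Side effect: a tag inside a word, as in "<b>W</b>orld", adds a space.
        String text = ANY_TAG.matcher(m.group(1)).replaceAll(" ");
        return text.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<div><h1>Title</h1><p>Hello <b>world</b></p></div>";
        System.out.println(innerText(html, "p"));    // Hello world
        System.out.println(innerText(html, null));   // Title Hello world
    }
}
```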
  - getText
```
public static String getText(String html)
```
    Extracts the text content of the HTML markup.
    
    Parameters:
    
    html - the HTML source to extract text from
    
    Returns:
    
    the text content of the HTML markup
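    
    A rough idea of what text extraction involves is given by this hypothetical regex-based sketch; `TextSketch.text` is an illustrative name, and this is not how the HST method is documented to work. It drops comments and `script`/`style` bodies, replaces tags with spaces, decodes a few common entities, and collapses whitespace. A regex approach is adequate only for simple, trusted fragments: it cannot cope with constructs such as `>` inside attribute values.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of plain-text extraction from an HTML fragment. Not the HST implementation.
final class TextSketch {

    // Comments, and <script>/<style> elements together with their bodies (not visible text).
    private static final Pattern NON_TEXT = Pattern.compile(
            "<!--.*?-->|<(script|style)\\b[^>]*>.*?</\\1\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]*>");

    static String text(String html) {
        if (html == null) {
            return null;
        }
        String s = NON_TEXT.matcher(html).replaceAll(" ");
        s = ANY_TAG.matcher(s).replaceAll(" ");
        // Decode a handful of common entities (&nbsp; becomes a plain space);
        // &amp; goes last so "&amp;lt;" yields "&lt;", not "<".
        s = s.replace("&nbsp;", " ")
             .replace("&lt;", "<")
             .replace("&gt;", ">")
             .replace("&quot;", "\"")
             .replace("&amp;", "&");
        return s.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<p>Fish &amp; chips</p><script>var x = 1;</script><p>&lt;cheap&gt;</p>";
        System.out.println(text(html));   // Fish & chips <cheap>
    }
}
```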
