SimpleHtmlExtractor (Hippo Site Toolkit 2.24.07 API)


org.hippoecm.hst.utils
Class SimpleHtmlExtractor

java.lang.Object
  org.hippoecm.hst.utils.SimpleHtmlExtractor

public class SimpleHtmlExtractor
extends Object

Simple HTML Tag Extractor

Version: $Id: SimpleHtmlExtractor.java 22564 2010-04-27 12:53:45Z wko $

Method Summary
`protected static org.htmlcleaner.HtmlCleaner`	`getHtmlCleaner()`
`static String`	`getInnerHtml(String html, String tagName, boolean byHtmlCleaner)` Extracts the inner HTML of the first tag matching `tagName`.
`static String`	`getInnerText(String html, String tagName)` Extracts the inner text of the first tag matching `tagName`.
`static String`	`getText(String html)` Extracts the text from the HTML markup.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Method Detail

getHtmlCleaner

protected static org.htmlcleaner.HtmlCleaner getHtmlCleaner()

getInnerHtml

public static String getInnerHtml(String html,
                                  String tagName,
                                  boolean byHtmlCleaner)

Extracts the inner HTML of the first tag matching tagName. If byHtmlCleaner is true, the HTML Cleaner library is used to extract the inner content of that tag.

The byHtmlCleaner option handles complex HTML more reliably, but it costs more because the input must be cleaned first. For simple HTML input where performance matters, set byHtmlCleaner to false to use simple extraction. For more complex input where a more correct result matters, set byHtmlCleaner to true and accept the extra cost.

If tagName is null or empty, then the root element is used.

Parameters:
    html - the HTML input
    tagName - the name of the tag to find, or null/empty to use the root element
    byHtmlCleaner - true to extract with the HTML Cleaner library, false to use simple extraction
Returns:
    the inner HTML of the tag, or null when the tag is not found
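
To make the trade-off behind byHtmlCleaner concrete, the following is a minimal, self-contained sketch of what a "simple" extraction could look like. It is not the HST implementation: the class InnerHtmlSketch and the helper firstInnerHtml are hypothetical names introduced here, and the regex-based matching is an assumption chosen for illustration. Unlike the documented method, this sketch returns null for a null or empty tagName instead of falling back to the root element.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the HST API.
class InnerHtmlSketch {

    // Returns the inner HTML of the first <tagName ...>...</tagName> pair,
    // or null when no such pair is found (mirroring the documented null result).
    static String firstInnerHtml(String html, String tagName) {
        if (html == null || tagName == null || tagName.isEmpty()) {
            return null; // assumption: the sketch does not handle the root-element case
        }
        String name = Pattern.quote(tagName);
        // Opening tag with optional attributes, lazy body, then the closing tag.
        Pattern p = Pattern.compile(
                "<" + name + "(\\s[^>]*)?>(.*?)</" + name + "\\s*>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        Matcher m = p.matcher(html);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Hello</title></head>"
                + "<body><p>Hi <b>there</b></p></body></html>";
        System.out.println(firstInnerHtml(html, "title")); // prints "Hello"
        System.out.println(firstInnerHtml(html, "p"));     // prints "Hi <b>there</b>"
        System.out.println(firstInnerHtml(html, "h1"));    // prints "null"
    }
}
```

A pattern like this is cheap but stops at the first closing tag, so nested tags of the same name or unclosed tags give wrong results. That is the kind of input for which the byHtmlCleaner option is described above. With the real class, the corresponding call would follow the documented signature, for example SimpleHtmlExtractor.getInnerHtml(html, "title", false).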

getInnerText

public static String getInnerText(String html,
                                  String tagName)

Extracts the inner text of the first tag matching tagName.
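
The difference between inner HTML and inner text can be sketched as follows. This is an illustrative assumption, not the HST implementation: InnerTextSketch, stripTags and firstInnerText are hypothetical names, the tag stripping is a plain regex replacement, and returning null for a missing tag is assumed by analogy with getInnerHtml.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the HST API.
class InnerTextSketch {

    // Removes anything that looks like a tag and trims surrounding whitespace.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "").trim();
    }

    // Finds the first <tagName>...</tagName> pair and returns its content
    // with nested markup removed, or null when the tag is not found (assumed).
    static String firstInnerText(String html, String tagName) {
        String name = Pattern.quote(tagName);
        Matcher m = Pattern.compile(
                "<" + name + "(\\s[^>]*)?>(.*?)</" + name + "\\s*>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL).matcher(html);
        return m.find() ? stripTags(m.group(2)) : null;
    }

    public static void main(String[] args) {
        String html = "<div><p>Hi <b>there</b>, reader</p></div>";
        System.out.println(firstInnerText(html, "p"));  // prints "Hi there, reader"
        System.out.println(firstInnerText(html, "h1")); // prints "null"
    }
}
```

With the real class, the equivalent call would follow the documented signature, for example SimpleHtmlExtractor.getInnerText(html, "p").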