java.lang.Object
  org.hippoecm.hst.utils.SimpleHtmlExtractor
public class SimpleHtmlExtractor
Simple HTML Tag Extractor
| Method Summary | |
|---|---|
| protected static org.htmlcleaner.HtmlCleaner | getHtmlCleaner() |
| static String | getInnerHtml(String html, String tagName, boolean byHtmlCleaner) Extracts the inner HTML of the first tag found by the tagName. |
| static String | getInnerText(String html, String tagName) Extracts the inner text of the first tag found by the tagName. |
| static String | getText(String html) Extracts the text from the HTML markup. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Method Detail |
|---|
protected static org.htmlcleaner.HtmlCleaner getHtmlCleaner()
public static String getInnerHtml(String html,
String tagName,
boolean byHtmlCleaner)
Extracts the inner HTML of the first tag found by the tagName.
If the byHtmlCleaner parameter is set to true, the HTML Cleaner library
is used to extract the inner content of the tag found by the tagName.
This option can handle complex HTML, but it costs more
because the input has to be cleaned first.
For simple HTML input and better performance, set byHtmlCleaner to false
to use the simple extraction option.
If the HTML input is more complex and you need a more accurate result, set
byHtmlCleaner to true and accept the higher operational cost.
If tagName is null or empty, the root element is used.
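Parameters:
html - the HTML input
tagName - the name of the tag including the root, or null/empty for the root tag
byHtmlCleaner - whether to use the HTML Cleaner library for the extraction
Returns:
null when the tag is not found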
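The contract described above (first matching tag, null when the tag is not found) can be illustrated with a minimal, self-contained sketch of what a simple string-based extraction might look like. The class `InnerHtmlSketch` and its method `innerHtml` are hypothetical names introduced for illustration; this is not the SimpleHtmlExtractor implementation, and root-element handling for a null or empty tagName is left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration only; not the library's code.
final class InnerHtmlSketch {

    // Returns the inner HTML of the first <tagName> element,
    // or null when no such tag is found.
    static String innerHtml(String html, String tagName) {
        if (html == null || tagName == null || tagName.isEmpty()) {
            return null; // assumption: root-element handling is omitted in this sketch
        }
        String name = Pattern.quote(tagName);
        // Opening tag with optional attributes, lazily matched content, closing tag.
        Pattern p = Pattern.compile(
                "<" + name + "(?:\\s[^>]*)?>(.*?)</" + name + "\\s*>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Hello</title></head>"
                + "<body><p>Hi <b>there</b></p></body></html>";
        System.out.println(innerHtml(html, "title")); // Hello
        System.out.println(innerHtml(html, "p"));     // Hi <b>there</b>
        System.out.println(innerHtml(html, "table")); // null
    }
}
```

A pattern like this stops at the first closing tag, so nested tags of the same name or malformed markup give wrong results. That is the kind of input for which the documentation recommends `getInnerHtml(html, tagName, true)`.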
public static String getInnerText(String html,
String tagName)
Extracts the inner text of the first tag found by the tagName.
If tagName is null or empty, then the root element is used.
Parameters:
html - the HTML input
tagName - the name of the tag including the root, or null/empty for the root tag
public static String getText(String html)
Extracts the text from the HTML markup.
Parameters:
html - the HTML input
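As a rough illustration of what extracting text from markup means, the following minimal sketch strips tags and collapses whitespace. The class `TextSketch` and its method `text` are hypothetical names, and the regex-based approach is an assumption made for the example. The actual getText implementation may work differently, for instance by using the HTML Cleaner library.

```java
// Hypothetical stand-in for illustration only; not the library's code.
final class TextSketch {

    // Removes anything that looks like a tag and normalizes whitespace.
    // Assumption: character entities such as &amp; are left undecoded.
    static String text(String html) {
        if (html == null) {
            return null;
        }
        // Replace each tag with a space so adjacent block texts do not fuse.
        String noTags = html.replaceAll("(?s)<[^>]*>", " ");
        return noTags.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<div><h1>Title</h1>\n<p>Body <b>text</b></p></div>";
        System.out.println(text(html)); // Title Body text
    }
}
```

With the documented API, the corresponding call would be `SimpleHtmlExtractor.getText(html)`.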