RSS
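SAX vs. DOM
A DOM parser builds the whole document as an in-memory tree before your code can query it. A SAX parser instead reads the document sequentially and reports each piece as an event. For the beginning of this feed, the events are:
- XML element start, name: channel
- XML element start, name: title
- XML text node, data: “同济大学软件学院通知RSS” (Tongji University School of Software Engineering notices RSS)
- XML element end, name: title
- XML element start, name: link
- XML text node, data: “http://sse.tongji.edu.cn”
- XML element end, name: link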
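To make the contrast concrete, here is a minimal sketch that parses a small fragment shaped like the feed above in both styles. The fragment, the ASCII title "SSE notices", and the names SaxVsDomDemo, saxEvents and domTitle are illustrative assumptions. The SAX half records events as the parser pushes them; the DOM half loads the tree first and then queries it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxVsDomDemo {
    // Sample fragment shaped like the feed above (ASCII title used for simplicity).
    static final String XML =
        "<channel><title>SSE notices</title><link>http://sse.tongji.edu.cn</link></channel>";

    // SAX: the parser calls us back while it reads; it never builds a tree.
    static List<String> saxEvents(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                events.add("start:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                events.add("text:" + new String(ch, start, length));
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parse failed", e);
        }
        return events;
    }

    // DOM: the whole document is loaded into a tree first, then queried.
    static String domTitle(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("title").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("DOM parse failed", e);
        }
    }
}
```

For a large feed, the SAX approach keeps memory flat because only the current event is in hand, which is the usual reason to prefer it for streaming input.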
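Implementing the RSS reader in Java
SAX delivers these events through the following callbacks, which org.xml.sax.helpers.DefaultHandler implements as no-ops that we can override: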
void startDocument()
Receive notification of the beginning of the document.
void endDocument()
Receive notification of the end of the document.
void startElement(String uri, String localName, String qName, Attributes attributes)
Receive notification of the start of an element.
qName - the qualified name (with prefix), or the empty string if qualified names are not available.
void endElement(String uri, String localName, String qName)
Receive notification of the end of an element.
void characters(char[] ch, int start, int length)
Receive notification of character data inside an element.
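The qName note above matters in practice. With the JDK's built-in parser, a SAXParserFactory that is not namespace-aware (the default) typically reports localName as an empty string, which is presumably why the handler in this article matches on qName. The sketch below compares both modes on an RSS-style element with the Dublin Core prefix dc; the sample XML and the names NameDemo and namesOfDate are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class NameDemo {
    // Sample <item> using the Dublin Core "dc" prefix, as many feeds do.
    static final String XML =
        "<item xmlns:dc=\"http://purl.org/dc/elements/1.1/\"><dc:date>2000-01-01</dc:date></item>";

    // Returns {uri, localName, qName} as reported for the <dc:date> element.
    static String[] namesOfDate(boolean namespaceAware) {
        String[] result = new String[3];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                // Match on either name so the check works in both modes.
                if ("date".equals(localName) || "dc:date".equals(qName)) {
                    result[0] = uri;
                    result[1] = localName;
                    result[2] = qName;
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            factory.newSAXParser().parse(new InputSource(new StringReader(XML)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return result;
    }
}
```

In namespace-aware mode the element arrives as uri "http://purl.org/dc/elements/1.1/", localName "date", qName "dc:date"; in the default mode only qName is filled in.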
First, we define an RSSItem class to hold one news item:
public class RSSItem {
    private String title;
    private String description;
    private String link;
    private String pubdate;
    // setters and getters…
}
Next, we create the parser, RSSHandler, which extends DefaultHandler:
public class RSSHandler extends DefaultHandler {
    //…
}
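startDocument creates an initial RSSItem, so that channel-level elements such as the feed's own title, which arrive before the first item, have an object to write into:
public void startDocument() {
    mRSSItem = new RSSItem();
}
In startElement, we record which element is being read by setting currentState:
public void startElement(String uri, String localName, String qName, Attributes attributes) {
    // an element starts
    if (qName.equals("channel")) {
        return;
    }
    if (qName.equals("item")) {
        // each <item> is a new news entry, so create a fresh RSSItem
        mRSSItem = new RSSItem();
        return;
    }
    if (qName.equals("title")) {
        currentState = TITLE_STATE;
        return;
    }
    // the same for description, link and pubDate…
}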
When the parser reaches a text node, it calls characters. All we need to do there is set the corresponding property of the current mRSSItem according to currentState.
public void characters(char[] ch, int start, int length) {
    String str = new String(ch, start, length);
    switch (currentState) {
        case TITLE_STATE:
            mRSSItem.setTitle(str);
            currentState = 0;
            break;
        // the same for description, link and pubDate…
    }
}
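One caveat about characters: the SAX ContentHandler contract allows a parser to deliver a single text node in several chunks, and parsers commonly split around entity references such as &amp;. Because the characters method above stores the first chunk and then resets currentState, the rest of the text can be lost. A common remedy is to append every chunk to a buffer and read the buffer in endElement. A minimal sketch of that pattern, where TitleCollector and firstTitle are hypothetical names:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Collects the full text of the first <title> element, however many
// characters() calls the parser uses to deliver it.
class TitleCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private boolean inTitle;
    private String title;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals("title") && title == null) {
            inTitle = true;
            buffer.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inTitle) {
            buffer.append(ch, start, length); // append every chunk, never overwrite
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (inTitle && qName.equals("title")) {
            title = buffer.toString(); // the text is only known to be complete here
            inTitle = false;
        }
    }

    static String firstTitle(String xml) {
        TitleCollector collector = new TitleCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), collector);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return collector.title;
    }
}
```

For example, firstTitle("<item><title>A &amp; B</title></item>") yields the whole string "A & B" regardless of how the parser chunks it.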
When an item has been fully parsed, we store the current mRSSItem in a List.
private List<RSSItem> mRSSItems;
public RSSHandler(List<RSSItem> mRSSItems) {
    this.mRSSItems = mRSSItems;
}
//…
public void endElement(String uri, String localName, String qName) {
    // an element ends; a closed <item> is one complete news entry
    if (qName.equals("item")) {
        mRSSItems.add(mRSSItem);
    }
}
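Putting the pieces together, a complete, self-contained variant of the handler might look like the sketch below. It is not the author's code: the names FeedItem, FeedHandler and the static parse helper are hypothetical, the fields are package-private instead of using setters for brevity, and it swaps the currentState approach for a text buffer whose contents are assigned in endElement, so text split across several characters calls is kept whole. Channel-level elements are skipped because current is null outside an item.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Plain data holder for one <item>, playing the role of RSSItem.
class FeedItem {
    String title = "", description = "", link = "", pubDate = "";
}

// Collects every <item> into the caller's list.
class FeedHandler extends DefaultHandler {
    private final List<FeedItem> items;
    private final StringBuilder text = new StringBuilder();
    private FeedItem current; // null while outside an <item>

    FeedHandler(List<FeedItem> items) {
        this.items = items;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals("item")) {
            current = new FeedItem();
        }
        text.setLength(0); // start collecting text for this element afresh
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current == null) {
            return; // channel-level <title>, <link>, ... are ignored
        }
        String value = text.toString().trim();
        switch (qName) {
            case "title": current.title = value; break;
            case "description": current.description = value; break;
            case "link": current.link = value; break;
            case "pubDate": current.pubDate = value; break;
            case "item": items.add(current); current = null; break;
            default: break;
        }
    }

    // Convenience helper: parse an XML string and return the items found.
    static List<FeedItem> parse(String xml) {
        List<FeedItem> items = new ArrayList<>();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new FeedHandler(items));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse feed", e);
        }
        return items;
    }
}
```

Usage: List<FeedItem> items = FeedHandler.parse(xml); where xml holds the downloaded feed text.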
At this point RSSHandler is essentially complete. Next, we obtain an InputStream for the feed and parse it with RSSHandler.
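import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
// … inside a method:
String url_str = "http://sse.tongji.edu.cn/SSEMainRSS.aspx";
List<RSSItem> mRSSItems = new ArrayList<>(); // RSSHandler adds one RSSItem per <item>
try {
    URL url = new URL(url_str);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("GET");
    conn.connect();
    // try-with-resources closes the stream even if parsing fails
    try (InputStream in = conn.getInputStream()) {
        // create the factory
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // create the SAX parser
        SAXParser parser = factory.newSAXParser();
        // obtain an XMLReader
        XMLReader xmlReader = parser.getXMLReader();
        // attach the event handler to the reader
        RSSHandler mRSSHandler = new RSSHandler(mRSSItems);
        xmlReader.setContentHandler(mRSSHandler);
        // start the streaming parse
        xmlReader.parse(new InputSource(in));
    }
} catch (IOException | ParserConfigurationException | SAXException e) {
    // IOException also covers the MalformedURLException thrown by new URL(...)
    e.printStackTrace();
}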
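The fetch-and-parse step can also be factored into a reusable helper. The sketch below is one way to do it, not the author's code: FeedFetcher, parse and fetchAndParse are hypothetical names, and the 10-second timeouts are arbitrary assumptions. It checks the HTTP status before reading, closes both the stream and the connection, and hands the parser raw bytes rather than a Reader so the parser can pick up the encoding from the XML declaration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class FeedFetcher {
    // Parses any XML byte stream with the given handler.
    static void parse(InputStream in, DefaultHandler handler)
            throws IOException, ParserConfigurationException, SAXException {
        SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(in), handler);
    }

    // Downloads the feed at feedUrl and streams it straight into the parser.
    static void fetchAndParse(String feedUrl, DefaultHandler handler)
            throws IOException, ParserConfigurationException, SAXException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(feedUrl).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(10_000); // assumed value
        conn.setReadTimeout(10_000);    // assumed value
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("unexpected HTTP status " + status);
            }
            try (InputStream in = conn.getInputStream()) {
                parse(in, handler);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

Since RSSHandler extends DefaultHandler, it can be passed in directly, e.g. FeedFetcher.fetchAndParse(url_str, new RSSHandler(mRSSItems)).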
In addition, we need a method that strips HTML markup from the retrieved text:
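public String getContent(String html) {
    // drop HTML entities such as &nbsp; and all tags such as <p> or <br/>
    // ("\&" is not a legal escape in a Java string literal, so the pattern starts with a plain "&")
    String str = html.replaceAll("&[a-zA-Z]{1,10};", "").replaceAll("<[^>]*>", "");
    // remove stray angle brackets left by truncated tags; keeps '(', ')' and '/' in the text
    str = str.replaceAll("[<>]", "");
    return str;
}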
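getContent deletes entities outright, so "Fish &amp; chips" becomes "Fish  chips". If such characters should survive, one option is to remove the tags and then decode entities instead. The sketch below handles decimal references like &#39; and only a handful of common named entities (not the full HTML entity table); HtmlText and toPlainText are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HtmlText {
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern NUMERIC = Pattern.compile("&#(\\d{1,7});");

    static String toPlainText(String html) {
        // 1. Remove tags such as <p>, <br/> or <a href="...">.
        String s = TAG.matcher(html).replaceAll("");
        // 2. Decode decimal character references such as &#39;.
        Matcher m = NUMERIC.matcher(s);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            int cp = Integer.parseInt(m.group(1));
            String decoded = Character.isValidCodePoint(cp) ? new String(Character.toChars(cp)) : "";
            m.appendReplacement(sb, Matcher.quoteReplacement(decoded));
        }
        m.appendTail(sb);
        s = sb.toString();
        // 3. Decode a few named entities; &amp; goes last so "&amp;lt;" becomes "&lt;", not "<".
        s = s.replace("&nbsp;", " ")
             .replace("&lt;", "<")
             .replace("&gt;", ">")
             .replace("&quot;", "\"")
             .replace("&amp;", "&");
        return s.trim();
    }
}
```

For example, toPlainText("<p>Fish &amp; chips&nbsp;<b>today</b></p>") returns "Fish & chips today".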
Finally, I used a simple Swing interface to display the results, as shown below:
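The Swing code itself is not included here. A minimal sketch of such a view, assuming the parsed titles and descriptions are passed in as two parallel lists (an assumption of this sketch; FeedView, titlesModel and show are hypothetical names), puts the titles in a JList on the left and shows the selected item's text on the right:

```java
import java.awt.BorderLayout;
import java.util.List;
import javax.swing.DefaultListModel;
import javax.swing.JFrame;
import javax.swing.JList;
import javax.swing.JScrollPane;
import javax.swing.JSplitPane;
import javax.swing.JTextArea;
import javax.swing.SwingUtilities;

class FeedView {
    // Builds the list model shown on the left: one row per news title.
    static DefaultListModel<String> titlesModel(List<String> titles) {
        DefaultListModel<String> model = new DefaultListModel<>();
        for (String t : titles) {
            model.addElement(t);
        }
        return model;
    }

    // Opens a window: titles on the left, the selected item's text on the right.
    // titles and descriptions must be parallel lists of the same length.
    static void show(List<String> titles, List<String> descriptions) {
        SwingUtilities.invokeLater(() -> {
            JList<String> list = new JList<>(titlesModel(titles));
            JTextArea body = new JTextArea();
            body.setLineWrap(true);
            body.setEditable(false);
            list.addListSelectionListener(e -> {
                int i = list.getSelectedIndex();
                if (!e.getValueIsAdjusting() && i >= 0) {
                    body.setText(descriptions.get(i));
                }
            });
            JFrame frame = new JFrame("RSS");
            frame.setDefaultCloseOperation(JFrame.DISPOSE_ON_CLOSE);
            frame.add(new JSplitPane(JSplitPane.HORIZONTAL_SPLIT,
                    new JScrollPane(list), new JScrollPane(body)), BorderLayout.CENTER);
            frame.setSize(800, 500);
            frame.setVisible(true);
        });
    }
}
```

All Swing components are created inside SwingUtilities.invokeLater, so show can be called from the thread that did the parsing.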