  • Using regular expressions to match a TTL document processed by rdf3x

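    1. Take, for example, the following fragment of a TTL document processed by rdf3x:

    Note that the continuation lines are indented by two spaces.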

    <http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2363853> <http://rdf.ebi.ac.uk/terms/chembl#hasBindingSite> <http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2622>.
    <http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2659> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdf.ebi.ac.uk/terms/chembl#BindingSite>;
      <http://www.w3.org/2000/01/rdf-schema#label> "CHEMBL_BS_2659";
      <http://rdf.ebi.ac.uk/terms/chembl#chemblId> "CHEMBL_BS_2659";
      <http://rdf.ebi.ac.uk/terms/chembl#hasTarget> <http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2363965>;
      <http://rdf.ebi.ac.uk/terms/chembl#bindingSiteName> "30S ribosomal protein S1".
    <http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2363965> <http://rdf.ebi.ac.uk/terms/chembl#hasBindingSite> <http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2659> , <http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2623>.
    <http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2623> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdf.ebi.ac.uk/terms/chembl#BindingSite>;
      <http://www.w3.org/2000/01/rdf-schema#label> "CHEMBL_BS_2623";
      <http://rdf.ebi.ac.uk/terms/chembl#chemblId> "CHEMBL_BS_2623";
      <http://rdf.ebi.ac.uk/terms/chembl#hasTarget> <http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2363965>;
      <http://rdf.ebi.ac.uk/terms/chembl#bindingSiteName> "16S/23S ribosomal RNA interface".
    <http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2624> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdf.ebi.ac.uk/terms/chembl#BindingSite>;
      <http://www.w3.org/2000/01/rdf-schema#label> "CHEMBL_BS_2624";
      <http://rdf.ebi.ac.uk/terms/chembl#chemblId> "CHEMBL_BS_2624";
      <http://rdf.ebi.ac.uk/terms/chembl#hasTarget> <http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2364022>;
      <http://rdf.ebi.ac.uk/terms/chembl#bindingSiteName> "23S ribosomal RNA".
    <http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2364022> <http://rdf.ebi.ac.uk/terms/chembl#hasBindingSite> <http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2624>.

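    2. The regular expression code, written in Java

    The active pattern line and the two commented-out pattern lines below it produce the three different results needed; enable one of them at a time.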

    package com.jena;

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class rdfReader3 {

        public static void main(String[] args) {
            // Enable exactly one of the three patterns. The pattern is compiled once, outside the read loop.
            // (1) Match the content of every pair of angle brackets.
            Pattern p = Pattern.compile("<([^<>]*)>");
            // (2) Match each subject: the line must not start with a space, then capture
            //     the content of the first <>, which itself contains no angle brackets.
            // Pattern p = Pattern.compile("^[^ ]*<([^<>]*)>");
            // (3) Match two leading spaces, then capture the content of the <> that follows.
            // Pattern p = Pattern.compile("  <([^<>]*)>");

            // try-with-resources closes the reader even if reading fails.
            try (BufferedReader br = new BufferedReader(
                    new FileReader("C:/Users/Don/workspace/Jena/src/com/jena/bindingsite"))) {
                String s;
                while ((s = br.readLine()) != null) {
                    Matcher m = p.matcher(s);
                    // Print capture group 1 (the text between < and >) for every match on the line.
                    while (m.find()) {
                        System.out.println(m.group(1));
                    }
                }
            } catch (IOException e) {
                System.out.println(e.getMessage());
            }
        }
    }
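
    To try the three patterns without the input file, here is a minimal sketch that runs them against a few lines copied from the fragment above. The class name TtlPatternDemo and the helper extract are made up for this illustration, and the subject pattern is written as ^[^ ]*, which is only one way to say "the line does not start with a space". It assumes that subject lines start at column 0 and continuation lines start with exactly two spaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TtlPatternDemo {
    // Six lines copied from the fragment above (one binding site plus its target).
    static final String[] LINES = {
        "<http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2624> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdf.ebi.ac.uk/terms/chembl#BindingSite>;",
        "  <http://www.w3.org/2000/01/rdf-schema#label> \"CHEMBL_BS_2624\";",
        "  <http://rdf.ebi.ac.uk/terms/chembl#chemblId> \"CHEMBL_BS_2624\";",
        "  <http://rdf.ebi.ac.uk/terms/chembl#hasTarget> <http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2364022>;",
        "  <http://rdf.ebi.ac.uk/terms/chembl#bindingSiteName> \"23S ribosomal RNA\".",
        "<http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2364022> <http://rdf.ebi.ac.uk/terms/chembl#hasBindingSite> <http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2624>."
    };

    static final Pattern ALL = Pattern.compile("<([^<>]*)>");          // (1) every <...>
    static final Pattern SUBJECT = Pattern.compile("^[^ ]*<([^<>]*)>"); // (2) first <...> of an unindented line
    static final Pattern INDENTED = Pattern.compile("  <([^<>]*)>");    // (3) <...> right after two spaces

    // Collects capture group 1 of every match of p, line by line.
    static List<String> extract(Pattern p, String[] lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            Matcher m = p.matcher(line);
            while (m.find()) {
                out.add(m.group(1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("all:      " + extract(ALL, LINES).size() + " matches");
        System.out.println("subjects: " + extract(SUBJECT, LINES));
        System.out.println("indented: " + extract(INDENTED, LINES));
    }
}
```

    On these six lines the first pattern finds 11 IRIs, the second finds the two subjects, and the third finds the four predicates on the indented lines.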

    (1) Match the contents of every pair of angle brackets

    Output:

    http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2363853
    http://rdf.ebi.ac.uk/terms/chembl#hasBindingSite
    http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2622
    http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2659
    http://www.w3.org/1999/02/22-rdf-syntax-ns#type
    http://rdf.ebi.ac.uk/terms/chembl#BindingSite
    http://www.w3.org/2000/01/rdf-schema#label
    http://rdf.ebi.ac.uk/terms/chembl#chemblId
    http://rdf.ebi.ac.uk/terms/chembl#hasTarget
    http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2363965
    http://rdf.ebi.ac.uk/terms/chembl#bindingSiteName
    http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2363965
    http://rdf.ebi.ac.uk/terms/chembl#hasBindingSite
    http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2659
    http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2623
    http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2623
    http://www.w3.org/1999/02/22-rdf-syntax-ns#type
    http://rdf.ebi.ac.uk/terms/chembl#BindingSite
    http://www.w3.org/2000/01/rdf-schema#label
    http://rdf.ebi.ac.uk/terms/chembl#chemblId
    http://rdf.ebi.ac.uk/terms/chembl#hasTarget
    http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2363965
    http://rdf.ebi.ac.uk/terms/chembl#bindingSiteName
    http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2624
    http://www.w3.org/1999/02/22-rdf-syntax-ns#type
    http://rdf.ebi.ac.uk/terms/chembl#BindingSite
    http://www.w3.org/2000/01/rdf-schema#label
    http://rdf.ebi.ac.uk/terms/chembl#chemblId
    http://rdf.ebi.ac.uk/terms/chembl#hasTarget
    http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2364022
    http://rdf.ebi.ac.uk/terms/chembl#bindingSiteName
    http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2364022
    http://rdf.ebi.ac.uk/terms/chembl#hasBindingSite
    http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2624

    (2) Match each subject, that is, the contents of the first pair of angle brackets on every line that does not start with two spaces

    Output:

    http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2363853
    http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2659
    http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2363965
    http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2623
    http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2624
    http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2364022

    (3) Match lines that start with two spaces, capturing the contents of the <> that follows (the contents contain no angle brackets)

    http://www.w3.org/2000/01/rdf-schema#label
    http://rdf.ebi.ac.uk/terms/chembl#chemblId
    http://rdf.ebi.ac.uk/terms/chembl#hasTarget
    http://rdf.ebi.ac.uk/terms/chembl#bindingSiteName
    http://www.w3.org/2000/01/rdf-schema#label
    http://rdf.ebi.ac.uk/terms/chembl#chemblId
    http://rdf.ebi.ac.uk/terms/chembl#hasTarget
    http://rdf.ebi.ac.uk/terms/chembl#bindingSiteName
    http://www.w3.org/2000/01/rdf-schema#label
    http://rdf.ebi.ac.uk/terms/chembl#chemblId
    http://rdf.ebi.ac.uk/terms/chembl#hasTarget
    http://rdf.ebi.ac.uk/terms/chembl#bindingSiteName
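
    Results (2) and (3) are printed as separate flat lists, so the link between a predicate and its subject is lost. One possible way to keep that link is sketched below: it reuses the bracket pattern from (1) and treats a line that starts with two spaces as a continuation of the most recent subject. The class name PredicatesBySubject and the method group are hypothetical, and the sketch relies on the same two-space layout as the fragment above; it is not a general Turtle parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PredicatesBySubject {
    static final Pattern BRACKET = Pattern.compile("<([^<>]*)>");

    // Maps each subject IRI to the predicate IRIs that follow it, in file order.
    static Map<String, List<String>> group(List<String> lines) {
        Map<String, List<String>> bySubject = new LinkedHashMap<>();
        String subject = null;
        for (String line : lines) {
            Matcher m = BRACKET.matcher(line);
            if (!line.startsWith("  ")) {
                // Unindented line: the first <...> is a new subject.
                if (!m.find()) {
                    continue; // blank line or a line without any <...>
                }
                subject = m.group(1);
            }
            // The next <...> on the line (the first one on a continuation line) is the predicate.
            if (subject != null && m.find()) {
                bySubject.computeIfAbsent(subject, k -> new ArrayList<>()).add(m.group(1));
            }
        }
        return bySubject;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "<http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2624> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdf.ebi.ac.uk/terms/chembl#BindingSite>;",
            "  <http://www.w3.org/2000/01/rdf-schema#label> \"CHEMBL_BS_2624\";",
            "  <http://rdf.ebi.ac.uk/terms/chembl#hasTarget> <http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2364022>.",
            "<http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2364022> <http://rdf.ebi.ac.uk/terms/chembl#hasBindingSite> <http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2624>.");
        group(lines).forEach((subject, predicates) ->
            System.out.println(subject + " -> " + predicates));
    }
}
```

    A LinkedHashMap is used so that subjects come out in the order they appear in the file, which matches the order of result (2).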
    

    To match data that begins with two spaces, simply type two literal spaces at the start of the pattern:

    Pattern p = Pattern.compile("  <([^<>]*)>");
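
    One caveat: because the code calls find(), the two-space pattern is not tied to the start of the line and will also match wherever two spaces happen to precede a < in the middle of a line. The fragment above contains no such line, so the results are unaffected, but anchoring with ^ makes the intent explicit. The sketch below compares the two; the class name AnchoredIndentDemo, the helper extract, and the example.org line are invented for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchoredIndentDemo {
    static final Pattern UNANCHORED = Pattern.compile("  <([^<>]*)>");
    // ^ pins the two spaces to the start of the input; " {2}" means exactly two space characters.
    static final Pattern ANCHORED = Pattern.compile("^ {2}<([^<>]*)>");

    // Collects capture group 1 of every match of p in a single line.
    static List<String> extract(Pattern p, String line) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(line);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // A real continuation line from the fragment: both patterns match it.
        String continuation = "  <http://rdf.ebi.ac.uk/terms/chembl#chemblId> \"CHEMBL_BS_2624\";";
        // A made-up line with a double space in the middle: only the unanchored pattern matches.
        String doubleSpace = "<http://example.org/s>  <http://example.org/p> <http://example.org/o>.";

        System.out.println(extract(UNANCHORED, continuation));
        System.out.println(extract(ANCHORED, continuation));
        System.out.println(extract(UNANCHORED, doubleSpace));
        System.out.println(extract(ANCHORED, doubleSpace));
    }
}
```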
  • Original article: https://www.cnblogs.com/Donnnnnn/p/5718947.html