Indexed by CSTPCD
Scopus

Petroleum Science Bulletin, 2025, Vol. 10, Issue (5): 1069-1082. doi: 10.3969/j.issn.2096-1693.2025.02.026


Reservoir geology question answering system based on GraphRAG

LIU Zhaonian1,2, JIANG Bin2, WANG Ning2, MENG Han1,*, LI Weichong2, JIANG Man2, SHI Yinliang2, LIN Botao1, JIN Yan1,3,4

  1 College of Artificial Intelligence, China University of Petroleum, Beijing 102249, China
  2 CNOOC Research Institute Co., Ltd, Beijing 100028, China
  3 State Key Laboratory of Petroleum Resources and Engineering, China University of Petroleum, Beijing 102249, China
  4 College of Petroleum Engineering, China University of Petroleum, Beijing 102249, China
  • Received: 2025-05-07 Revised: 2025-09-15 Online: 2025-10-15 Published: 2025-10-21
  • Contact: MENG Han E-mail: liuzhn@cnooc.com.cn; hmeng@dgqinyehang.com

Construction of a reservoir geology question answering system based on graph retrieval-augmented generation

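LIU Zhaonian1,2, JIANG Bin2, WANG Ning2, MENG Han1,*, LI Weichong2, JIANG Man2, SHI Yinliang2, LIN Botao1, JIN Yan1,3,4

  1 College of Artificial Intelligence, China University of Petroleum (Beijing), Beijing 102249
  2 CNOOC Research Institute Co., Ltd., Beijing 100028
  3 State Key Laboratory of Petroleum Resources and Engineering, China University of Petroleum (Beijing), Beijing 102249
  4 College of Petroleum Engineering, China University of Petroleum (Beijing), Beijing 102249
  • Corresponding author: MENG Han E-mail: liuzhn@cnooc.com.cn; hmeng@dgqinyehang.com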
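  • About the author: LIU Zhaonian (born 1987), doctoral candidate and senior engineer; main research interests are data governance and digital-intelligent development in the oil and gas industry, liuzhn@cnooc.com.cn
  • Funding:
    Frontier Interdisciplinary Exploration Research Program of China University of Petroleum (Beijing) (2462024XKQY003); National Natural Science Foundation of China project "Non-planar propagation mechanism of hydraulic fractures under combined fracturing of gravel, interface, and matrix in conglomerate reservoirs" (42277122)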

Abstract:

During petroleum exploration and development, documents accumulated over many years record a large body of engineering knowledge and practical experience that is important for scientific oilfield development and production decision-making. However, this information is mostly stored in multimodal, unstructured forms such as textual descriptions, data tables, and illustrative figures. Lacking a unified structured representation, it is inefficient to retrieve and use, and the knowledge is difficult to apply systematically. Traditional information retrieval methods are limited when handling complex cross-paragraph, multimodal corpora, while question answering that relies only on large language models is prone to hallucination and context fragmentation, and so cannot meet the accuracy and interpretability requirements of professional fields. To address this problem, this paper builds a graph retrieval-augmented intelligent question answering system for reservoir geology on top of Microsoft's open-source graph retrieval-augmented generation (GraphRAG) framework.
To address the linguistic complexity, hierarchical diversity, and structural heterogeneity of oilfield documents, three optimizations were applied. First, a segmentation method based on logical document structure identifies heading hierarchies and numbering rules to divide the text into reasonable semantic units, avoiding semantic fragmentation during entity and relation extraction. Second, a prompt optimization mechanism that incorporates the terminology system of reservoir geology improves the accuracy and completeness of entity and relation extraction and reduces errors and omissions. Third, a multimodal output mechanism links textual answers to relevant figures and tables through embedding matching, so that answers are linguistically coherent and backed by visual evidence, which enhances their interpretability and credibility. For the experiments, a comprehensive report of about ninety thousand characters from a typical offshore oilfield was used as the data source to construct a knowledge graph and evaluate the system. Compared with the unoptimized methods and the original framework, the optimized system achieved significant improvements in factuality, answer relevance, context precision, and context recall. The gains in factuality and answer relevance indicate that the system generates answers that conform more accurately to the facts and to the intent of the question, while the gains in the context metrics show that it is stronger at cross-paragraph integration and multimodal association.
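The structure-based segmentation idea can be illustrated with a minimal sketch. This is not the authors' implementation: the heading pattern (decimal numbering such as "1", "1.2", "1.2.3"), the `Section` class, and the `split_by_headings` function are all hypothetical names and assumptions introduced for illustration, and real oilfield reports would likely need richer numbering rules.

```python
import re
from dataclasses import dataclass, field

# Assumed heading convention: decimal numbering followed by a title,
# e.g. "2.1 Structure". A body line that happens to start with a number
# would also match, so a real system needs stricter rules.
HEADING_RE = re.compile(r"^(\d+(?:\.\d+)*)\s+(\S.*)$")


@dataclass
class Section:
    number: str               # heading number, e.g. "1.2"
    title: str                # heading text
    level: int                # depth in the hierarchy (1 = top level)
    lines: list = field(default_factory=list)

    @property
    def text(self) -> str:
        return "\n".join(self.lines).strip()


def split_by_headings(document: str) -> list:
    """Split a document into sections at numbered headings.

    Text that appears before the first heading is ignored in this sketch.
    """
    sections = []
    current = None
    for line in document.splitlines():
        match = HEADING_RE.match(line.strip())
        if match:
            number, title = match.groups()
            current = Section(number, title, level=number.count(".") + 1)
            sections.append(current)
        elif current is not None:
            current.lines.append(line)
    return sections
```

Each resulting section keeps its heading number and level, so a downstream extraction step could be given whole semantic units rather than fixed-length chunks that cut across headings.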
The results show that the system is more accurate and reliable in knowledge extraction, organization, and application, and that it has good engineering adaptability and scalability. It provides a feasible solution for the structured management and intelligent use of complex oilfield knowledge, and offers a reference and practical experience for applying large language models in petroleum engineering and other highly specialized fields.
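The multimodal output mechanism in the abstract links answers to figures and tables through embedding matching. The sketch below shows one way such matching could work, under loud assumptions: `embed` is a stand-in that uses bag-of-words counts instead of a real embedding model, and `match_figures`, the caption dictionary, `top_k`, and `threshold` are hypothetical names and parameters that do not come from the paper.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in embedding (assumption): bag-of-words term counts.
    # A real system would call a trained embedding model here.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_figures(answer: str, captions: dict, top_k: int = 1,
                  threshold: float = 0.1) -> list:
    """Return ids of the figures or tables whose captions best match the answer."""
    answer_vec = embed(answer)
    scored = [(cosine(answer_vec, embed(caption)), fig_id)
              for fig_id, caption in captions.items()]
    # Drop weak matches so unrelated figures are never attached.
    scored = [(score, fig_id) for score, fig_id in scored if score >= threshold]
    scored.sort(reverse=True)
    return [fig_id for _, fig_id in scored[:top_k]]
```

The threshold is a design choice in this sketch: it lets the function return an empty list when no caption is close enough, rather than always attaching the nearest figure.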

Key words: graph-retrieval-augmented generation, reservoir geology, multimodal retrieval, intelligent question answering, large language model

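Abstract (Chinese version):

During petroleum exploration and development, documents accumulated over the long term contain a wealth of engineering knowledge and practical experience, and these materials are of great significance for scientific oilfield development and production decision-making. However, such information is mostly kept in multimodal, unstructured forms such as textual descriptions, data tables, and explanatory figures, and lacks a unified structured representation, so retrieval and utilization are inefficient and the knowledge is hard to apply systematically. Traditional information retrieval methods have limitations when processing complex cross-paragraph, multimodal corpora, while question answering that relies solely on large language models is prone to hallucination and context fragmentation, making it hard to meet the requirements of professional fields for accuracy and interpretability. To solve this problem, this paper builds a graph retrieval-augmented intelligent question answering system for reservoir geology, based on Microsoft's open-source graph retrieval-augmented generation framework. In view of the complex language, numerous hierarchy levels, and heterogeneous structure of oilfield documents, three optimization methods are applied. The first is a segmentation method based on the logical structure of documents, which identifies heading levels and numbering rules to divide semantic units sensibly and avoid semantic fragmentation during entity and relation extraction. The second is a prompt optimization mechanism that incorporates the reservoir geology terminology system, improving the accuracy and completeness of recognizing and extracting key entities and semantic relations and reducing errors and omissions. The third is a multimodal output mechanism that links textual answers with relevant figures and tables through embedding matching, so that results are linguistically coherent and supported by visual evidence, enhancing interpretability and credibility. In the experiments, a comprehensive report of about ninety thousand characters from a typical offshore oilfield is used as the data source to build a knowledge graph and carry out evaluation. Compared with the unoptimized methods and the original framework, the results show that the optimized system improves significantly on factuality, answer relevance, context precision, and context recall. The improvement in factuality and relevance indicates that the system can more accurately generate answers that match the facts and the intent of the question, while the improvement in the context metrics shows that it has greater advantages in cross-paragraph integration and multimodal association. The results show that the system is more accurate and reliable in knowledge extraction, organization, and application and has good engineering adaptability and potential for extension. It provides a feasible approach to the structured management and intelligent use of complex oilfield knowledge, and a reference for deploying large language models in petroleum engineering and other highly specialized fields.

Key words (Chinese version): graph retrieval augmentation, reservoir geology, multimodal retrieval, intelligent question answering, large model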

CLC Number: