Chinese Sci-Tech Core Journal
(China Science and Technology Paper Statistical Source Journal)
  Indexed in Scopus

Petroleum Science Bulletin ›› 2025, Vol. 10 ›› Issue (5): 1056-1068. doi: 10.3969/j.issn.2096-1693.2025.02.027


A forecasting method for gas well production based on large language model (LLM)

CONG Mengze1,2, XUE Liang1,3,*, HAN Jiangxia1,3, MIAO Deyu1,3, LIU Yuetian1,3

  1 State Key Laboratory of Petroleum Resources and Engineering, China University of Petroleum, Beijing 102249, China
    2 College of Artificial Intelligence, China University of Petroleum, Beijing 102249, China
    3 College of Petroleum Engineering, China University of Petroleum, Beijing 102249, China
  • Received: 2025-03-25  Revised: 2025-05-23  Online: 2025-10-15  Published: 2025-10-21
  • Corresponding author: *XUE Liang (b. 1983), PhD, professor and doctoral supervisor; research focus: intelligent reservoir flow theory and oil/gas field development technology; xueliang@dgqinyehang.com
  • First author: CONG Mengze (b. 2001), master's student; research focus: applications of artificial intelligence in oil and gas reservoir development; dreamze_c@student.dgqinyehang.com
  • Funding: National Natural Science Foundation of China (52274048); Beijing Natural Science Foundation (3222037)


Abstract:

Accurate and reliable production forecasting is a critical component for the efficient development of oil and gas fields and supports informed scientific decision-making. Although machine learning methods have achieved significant progress in this domain, existing models are typically trained from scratch using limited historical production data, making it difficult to effectively capture the complex nonlinear dynamics, long-term temporal dependencies, and high-dimensional interactions among variables inherent in production time series. This often leads to insufficient generalization capacity and limited predictive robustness. To address these challenges, this study proposes a novel gas well production forecasting method based on large language models (LLMs). The approach builds upon a pre-trained GPT-2 architecture and incorporates several key adaptations to enable effective time-series prediction. First, the input data—including daily gas production rate, tubing pressure, casing pressure, and production time—are subjected to instance normalization to facilitate knowledge transfer. Second, a trainable embedding layer is designed to map numerical time-series data into the semantic embedding space of the LLM, thereby achieving cross-modal alignment between continuous signals and the discrete representation format required by the model. Third, a parameter-efficient transfer learning strategy combining freezing and fine-tuning is implemented: the core self-attention and feed-forward network layers of the LLM are frozen to preserve general-purpose knowledge acquired during pre-training, while the positional encoding and layer normalization modules are selectively fine-tuned to enhance the model’s ability to characterize temporal patterns specific to production dynamics. The resulting model, termed GPT4TS, is systematically evaluated on real-world production data from a marine carbonate gas reservoir in the Sichuan Basin. 
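The three adaptations described above can be illustrated with a minimal, dependency-free sketch. The helper names (`instance_normalize`, `linear_embed`, `is_trainable`) and the GPT-2 parameter-name patterns (`wpe`, `ln_1`, `attn`, `mlp`, following Hugging Face's naming convention) are assumptions for illustration, not the authors' implementation:

```python
import math

def instance_normalize(window):
    """Step 1: normalize one input window to zero mean / unit variance.
    The (mean, std) statistics are returned so the forecast can later be
    de-normalized back to physical units (e.g. daily gas rate)."""
    mean = sum(window) / len(window)
    var = sum((x - mean) ** 2 for x in window) / len(window)
    std = math.sqrt(var) + 1e-8  # epsilon guards against constant windows
    return [(x - mean) / std for x in window], (mean, std)

def denormalize(window, stats):
    """Invert instance normalization using the saved statistics."""
    mean, std = stats
    return [x * std + mean for x in window]

def linear_embed(patch, weight, bias):
    """Step 2: a trainable linear map taking a patch of numeric
    time-series values into the LLM's d-dimensional embedding space."""
    return [sum(w * x for w, x in zip(row, patch)) + b
            for row, b in zip(weight, bias)]

def is_trainable(param_name):
    """Step 3: the freeze/fine-tune split from the abstract.
    Self-attention and feed-forward weights stay frozen to preserve
    pre-trained knowledge; positional embeddings and layer norms
    are fine-tuned to capture production-specific temporal patterns."""
    if ".attn." in param_name or ".mlp." in param_name:
        return False                            # frozen core blocks
    return (param_name.startswith("wpe")        # positional embeddings
            or ".ln_" in param_name             # per-block layer norms
            or param_name.startswith("ln_f"))   # final layer norm
```

With these rules, a window [10, 12, 14] normalizes to roughly [-1.22, 0, 1.22], `h.0.attn.c_attn.weight` stays frozen, and `wpe.weight` and `h.0.ln_1.weight` remain trainable.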
Experimental results show that for wells with long production histories, GPT4TS significantly outperforms the conventional LSTM model. Under univariate input, the mean absolute percentage error (MAPE) is reduced by 18.573% on average; under multivariate input, the MAPE reduction reaches 35.610%, demonstrating its superior capability in modeling complex trends and leveraging multi-variable synergies. However, for newly commissioned wells with short production histories, insufficient data hinders effective fine-tuning, leading to lower prediction accuracy compared to LSTM. This study not only validates the potential of large language models in petroleum production forecasting but also highlights their strong dependence on historical data length, providing both theoretical insights and practical guidance for model selection in real-world engineering applications.
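The error metric behind the reported numbers can be reproduced in a few lines. `mape` is the standard definition; `relative_reduction` shows one common reading of a percentage "reduction" in MAPE relative to the baseline LSTM. The sample values below are illustrative, not the paper's data:

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.
    Assumes no zeros in `actual` (daily gas rates are positive)."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n

def relative_reduction(baseline, improved):
    """Percentage drop of an error metric relative to a baseline model."""
    return 100.0 * (baseline - improved) / baseline
```

For example, a baseline MAPE of 10% improved to 8% corresponds to a 20% relative reduction.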

Key words: large language model, production prediction, machine learning, time series data, model fine-tuning

CLC number: