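GOP Large Model Theory and Practice

Course Study Recommendations

List of Notebook Examples

1. Overview and Fundamental Principles of Large Models

1.1 Historical Evolution of Language Models

1.1.1 Definition of a Language Model

1.1.2 Autoregressive Language Models

1.1.3 Text Representation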

1.1.3.1 Word2vec

1.1.3.2 FastText

1.1.3.3 Limitations of Word2Vec-like Embeddings

1.1.4 Encoder-Decoder Models

1.1.5 Transformer

1.1.5.1 Transformer Model Architecture

1.1.5.2 Self-Attention

1.1.5.3 Multi-Head Attention

1.1.5.4 Cross Attention

1.1.6 The Eve of LLMs: The NLP Paradigm Shift

1.1.7 Encoder Models

1.1.8 Decoder Models

1.1.8.2 GPT-2: Generalizing to Unseen Tasks

1.1.8.3 GPT-2: Task Specifications

1.1.8.4 GPT-3: Language Models are Few-Shot Learners

1.1.9 Large Language Models (LLMs)

1.2 Core Characteristics of LLMs

1.2.2 Emergent Abilities

1.2.2.1 In-context learning

1.2.2.2 Instruction following

1.2.2.3 Chain of Thought

1.2.3 Capability Boundaries of Large Models

2. Pre-training of Large Models

2.1 Data Collection and Preparation

2.1.1 Data Source

2.1.2 Data Preprocessing

2.1.3 Data Scheduling

2.1.4 Tokenization

2.2 Breaking Down LLM Architecture

2.2.1 Comparison of Architectures

2.2.2 Component Breakdown

2.2.2.1 Embedding

2.2.2.2 Positional Encoding

2.2.2.3 Activation

2.2.2.4 Add & Norm

2.2.2.5 Summary of network configurations

2.2.3 Self Attention

2.2.4 Multi-Head Attention

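2.2.5 Attention Improvements

2.2.5.1 Softmax Analysis

2.2.5.2 Linear Attention Improvements

2.2.5.3 Sparse Attention Improvements

2.2.5.4 Multi-Head Attention Improvements

2.3 Training Methods

2.3.1 Analysis of Compute, Network, and Memory Bottlenecks

2.3.2 Model Parallelism and Data Parallelism

2.3.3 Mixed-Precision Training

3. Post-training of Large Models

3.1 Introduction to Post-training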

3.2 Fine-tuning

3.2.2 Advantages and Disadvantages

3.2.4 Applicable Scenarios

3.3 Reinforcement learning

3.3.2 Overview of Methods

3.3.3 Reward Models

3.4 Test-time Scaling Methods

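3.4.2 Overview of Methods

3.4.2.1 CoT Reasoning

3.4.2.2 Tree-Structured Reasoning

3.4.2.3 Search Strategies

3.4.3 Advantages and Disadvantages

4. Large Model Application Development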

4.1 Prompt Engineering

4.4 Agent or Agentic Workflow

4.5 Model Selection

4.6 Tool and Framework Selection

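5. Fine-tuning in Practice

5.1 Data Preparation

5.1.1 Introduction to Common Datasets

5.1.2 Custom Datasets

5.2 Framework Selection

5.3 Hardware Selection

5.4 Model Selection

5.5 Training Methods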

6.1 Benchmark Preparation

6.2 Framework Selection

7. Inference Deployment in Practice

7.1 Framework Selection

7.2 Hardware Selection

7.3 Common Inference Optimization Methods

Overview and Fundamental Principles of Large Models