Rotated Object Detection in Remote Sensing Imagery with MMRotate: A Practical Guide

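If you have worked with satellite imagery, especially images full of buildings, you have probably run into an annoying problem: the buildings are rarely axis-aligned. Viewing angle, terrain relief, and the orientation of the buildings themselves all leave them tilted at arbitrary angles. Conventional detectors such as YOLO or Faster R-CNN normally output horizontal rectangles only. For a tilted target, a horizontal box pulls in a lot of background, which makes the box imprecise and hurts downstream analysis.

That is the problem rotated object detection solves. In this post we use MMRotate, the open-source toolbox from OpenMMLab, to detect rotated buildings in satellite imagery and correct for their angle. I will walk through the whole workflow, from data preparation and model training to deployment, so that you can apply it in your own project.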

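1. Why does remote sensing imagery need rotated object detection?

First, why this matters. Imagine a high-resolution satellite image densely packed with buildings. Your job is to:

- Count buildings: for urban planning or post-disaster assessment
- Measure building size: to estimate floor area or monitor change
- Analyze building orientation: to study city layout or rooftop solar potential

With conventional horizontal boxes you hit several problems.

Problem 1: imprecise boxes. A horizontal box around a tilted building also covers a lot of non-building area such as roads and green space. As a result:

- Area measurements are off
- Downstream classification or recognition suffers
- Adjacent buildings tend to end up in the same box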
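
To put a number on how much background a horizontal box drags in, here is a minimal sketch that compares the area of a tilted rectangle with the area of the horizontal box enclosing it. The 100×20 building footprint and the function name `aabb_fill_ratio` are made up for illustration.

```python
import math


def aabb_fill_ratio(width, height, angle_deg):
    """Fraction of the enclosing horizontal box covered by the rotated rectangle."""
    theta = math.radians(angle_deg)
    c, s = abs(math.cos(theta)), abs(math.sin(theta))
    # Width and height of the axis-aligned box around the rotated rectangle
    box_w = width * c + height * s
    box_h = width * s + height * c
    return (width * height) / (box_w * box_h)


# Hypothetical 100x20 footprint at a few tilt angles
for angle in (0, 15, 30, 45):
    ratio = aabb_fill_ratio(100, 20, angle)
    print(f"{angle:>2} deg: building fills {ratio:.0%} of the horizontal box")
```

For an elongated footprint the ratio drops quickly as the tilt grows, which is exactly the wasted area described above.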

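Problem 2: lost angle information. A building's orientation can itself be valuable information, for example to:

- Flag possible planning violations (orientation that does not match the approved plan)
- Analyze daylight or ventilation
- Study regularities in street layout

Problem 3: more missed and false detections. For densely packed tilted buildings, horizontal boxes overlap heavily, so non-maximum suppression (NMS) can wrongly suppress correct detections.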
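
The overlap problem can be made concrete with a small sketch. It computes the IoU of two parallel, tilted buildings twice: once on their horizontal bounding boxes and once on the rotated rectangles themselves, using Sutherland-Hodgman polygon clipping. This is an illustration in plain Python, not the CUDA operator MMRotate uses, and the two example rectangles are invented.

```python
import math


def corners(cx, cy, w, h, angle_deg):
    """Corner points of a rotated rectangle, in counter-clockwise coordinate order."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    offsets = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]


def polygon_area(poly):
    """Shoelace formula."""
    n = len(poly)
    return abs(sum(poly[i][0] * poly[(i + 1) % n][1]
                   - poly[(i + 1) % n][0] * poly[i][1] for i in range(n))) / 2


def clip(subject, clipper):
    """Intersection of two convex polygons (Sutherland-Hodgman)."""
    def inside(p, a, b):
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p, q, a, b):
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / (dx1 * dy2 - dy1 * dx2)
        return (p[0] + t * dx1, p[1] + t * dy1)

    out = subject
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inp, out = out, []
        for j in range(len(inp)):
            p, q = inp[j - 1], inp[j]  # edge from previous vertex to current vertex
            if inside(q, a, b):
                if not inside(p, a, b):
                    out.append(intersect(p, q, a, b))
                out.append(q)
            elif inside(p, a, b):
                out.append(intersect(p, q, a, b))
        if not out:
            return []
    return out


def rotated_iou(r1, r2):
    p1, p2 = corners(*r1), corners(*r2)
    inter_poly = clip(p1, p2)
    inter = polygon_area(inter_poly) if len(inter_poly) >= 3 else 0.0
    return inter / (polygon_area(p1) + polygon_area(p2) - inter)


def horizontal_iou(r1, r2):
    def aabb(r):
        xs, ys = zip(*corners(*r))
        return min(xs), min(ys), max(xs), max(ys)

    a, b = aabb(r1), aabb(r2)
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


# Two 100x20 buildings tilted 45 degrees, side by side with a 10-pixel gap
off = 30 / math.sqrt(2)
a = (100.0, 100.0, 100.0, 20.0, 45.0)
b = (100.0 - off, 100.0 + off, 100.0, 20.0, 45.0)
print(f"horizontal IoU: {horizontal_iou(a, b):.2f}, rotated IoU: {rotated_iou(a, b):.2f}")
```

The buildings do not touch, so their rotated IoU is zero, yet their horizontal boxes overlap enough that a typical NMS threshold could discard one of them.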

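That is why we need detection boxes that carry angle information, which is exactly what rotated object detection provides. MMRotate is a toolbox built for this class of problem.

2. A first look at the MMRotate toolbox

MMRotate is part of the OpenMMLab ecosystem. If you have used MMDetection, its design will feel familiar. In short, MMRotate offers:

Unified algorithm framework. It bundles the mainstream rotated detectors, including Oriented R-CNN (used in this post), RoI Transformer, ReDet, and others, so you do not have to juggle several codebases to compare algorithms.

Flexible configuration system. The OpenMMLab config system lets you switch models and tune parameters by editing config files, usually without touching code.

High-performance implementation. It relies on the CUDA operators for rotated boxes in MMCV, so training and inference are fast.

Multiple angle definitions. It supports the angle conventions commonly used in remote sensing, such as the OpenCV definition and the long-edge definitions, which saves you from converting by hand.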
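
To see why the angle definition matters, note that the same rectangle can be written with different (w, h, angle) triples. The sketch below normalizes any triple to a long-edge form in which w is the longer side and the angle lies in [-90°, 90°). It assumes the angle is in radians and that w is measured along the direction the angle points in. The function `to_long_edge_90` is an illustration of the idea, not MMRotate's own conversion routine.

```python
import math


def to_long_edge_90(w, h, theta):
    """Rewrite (w, h, theta) so that w is the long edge and theta is in [-pi/2, pi/2)."""
    if w < h:
        # Swapping the sides turns the reference edge by 90 degrees
        w, h = h, w
        theta += math.pi / 2
    # A rectangle looks the same after a 180-degree turn, so wrap with period pi
    theta = (theta + math.pi / 2) % math.pi - math.pi / 2
    return w, h, theta


w, h, theta = to_long_edge_90(20.0, 100.0, math.radians(30))
print(w, h, round(math.degrees(theta), 1))
```

Whichever convention your annotations use, the model config and any post-processing code have to agree on it, otherwise widths, heights, and angles get silently mixed up.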

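For building detection in remote sensing imagery we pick Oriented R-CNN. It strikes a good balance between accuracy and speed, and it performs well on DOTA, the benchmark dataset for object detection in aerial images.

3. Environment setup and data preparation

3.1 Installing MMRotate

First, set up the environment. Python 3.8 or later and a CUDA-capable GPU are recommended.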

    # Create a virtual environment (optional but recommended)
    conda create -n mmrotate python=3.8 -y
    conda activate mmrotate

    # Install PyTorch (pick the build that matches your CUDA version)
    # CUDA 11.3 is used as the example here
    pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113

    # Install MMCV
    pip install mmcv-full==1.7.1 -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.12/index.html

    # Install MMDetection
    pip install mmdet==2.28.2

    # Clone the MMRotate repository and install it
    git clone https://github.com/open-mmlab/mmrotate.git
    cd mmrotate
    pip install -v -e .

After installation, run a quick check:

    import mmrotate
    print(mmrotate.__version__)

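3.2 Preparing the DOTA dataset

DOTA (Dataset for Object deTection in Aerial images) is one of the most widely used datasets for object detection in remote sensing. DOTA v1.0 contains 2,806 aerial images ranging from about 800×800 to 4000×4000 pixels, annotated with 15 categories: "ship", "storage-tank", "baseball-diamond", "tennis-court", "basketball-court", "ground-track-field", "harbor", "bridge", "large-vehicle", "small-vehicle", "helicopter", "roundabout", "soccer-ball-field", "swimming-pool", and "plane".

Note that DOTA v1.0 has no dedicated "building" category. To keep things simple, this post uses the "large-vehicle" category as the running example. The detection principle is the same and carries over to buildings once you have rotated-box annotations for them.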

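Downloading and processing the dataset

Download the dataset:
- Download DOTA v1.0 from the official DOTA website
- Or use one of the public mirrors

Dataset layout: after downloading and extracting, the directory structure looks like this:

    DOTA/
    ├── train/
    │   ├── images/
    │   │   ├── P0000.png
    │   │   ├── P0001.png
    │   │   └── ...
    │   └── labelTxt/
    │       ├── P0000.txt
    │       ├── P0001.txt
    │       └── ...
    ├── val/
    │   ├── images/
    │   └── labelTxt/
    └── test/
        └── images/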

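Understanding the annotation format: DOTA annotations are plain text files with one object per line, in the format:
```
x1 y1 x2 y2 x3 y3 x4 y4 category difficult
```
Here (x1, y1) through (x4, y4) are the four vertices of the quadrilateral, listed in order around its boundary (clockwise or counter-clockwise), category is the class name, and difficult marks whether the object is hard to detect (0 or 1).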
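
The format above can be read with a few lines of Python. This is a minimal sketch: the names `DotaObject` and `parse_dota_line` are hypothetical, the sample line is invented, and the parser assumes that any line without exactly ten fields (for example a metadata header) should be skipped.

```python
from typing import List, NamedTuple, Optional, Tuple


class DotaObject(NamedTuple):
    polygon: List[Tuple[float, float]]  # four (x, y) vertices
    category: str
    difficult: int


def parse_dota_line(line: str) -> Optional[DotaObject]:
    """Parse one annotation line; return None for headers or malformed lines."""
    parts = line.split()
    if len(parts) != 10:
        return None
    try:
        coords = [float(v) for v in parts[:8]]
        difficult = int(parts[9])
    except ValueError:
        return None
    polygon = list(zip(coords[0::2], coords[1::2]))
    return DotaObject(polygon, parts[8], difficult)


# Invented sample line in the documented format
sample = "2753 2408 2861 2385 2888 2468 2805 2502 plane 0"
obj = parse_dota_line(sample)
print(obj.category, obj.polygon[0], obj.difficult)
```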
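Converting to the MMRotate format: MMRotate expects the data in a specific form. Fortunately, it ships a conversion script:
```
# Run from the mmrotate directory
python tools/data/dota/split/img_split.py \
    --base-json tools/data/dota/split/split_configs/ss_train.json
python tools/data/dota/split/img_split.py \
    --base-json tools/data/dota/split/split_configs/ss_val.json
```
The script cuts each large image into overlapping 1024×1024 tiles, because most models cannot process very large images directly. Set the input and output directories in the JSON config files before running it.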
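
The idea behind overlapping tiles can be sketched in a few lines. The code below computes tile windows for a given image size. It is a simplified illustration rather than the logic of `img_split.py`, and the 200-pixel overlap is an assumed value for the example.

```python
def tile_starts(length, size=1024, gap=200):
    """Start offsets of windows of `size` pixels that overlap by at least `gap` pixels."""
    if length <= size:
        return [0]
    step = size - gap
    starts = list(range(0, length - size, step))
    starts.append(length - size)  # last window sits flush with the image border
    return starts


def tile_windows(width, height, size=1024, gap=200):
    """All (x1, y1, x2, y2) tile windows covering a width x height image."""
    return [(x, y, min(x + size, width), min(y + size, height))
            for y in tile_starts(height, size, gap)
            for x in tile_starts(width, size, gap)]


windows = tile_windows(4000, 4000)
print(len(windows), "tiles; first:", windows[0], "last:", windows[-1])
```

The overlap exists so that an object cut by one tile border appears whole in a neighboring tile.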

Generating file lists: after splitting, generate the file lists for the training and validation sets:

    # A simple example script
    import os


    def generate_file_list(image_dir, label_dir, output_file):
        """Write the path of every PNG image that has a matching label file."""
        with open(output_file, 'w') as f:
            for img_name in os.listdir(image_dir):
                if img_name.endswith('.png'):
                    img_path = os.path.join(image_dir, img_name)
                    label_path = os.path.join(label_dir, img_name.replace('.png', '.txt'))
                    if os.path.exists(label_path):
                        f.write(f'{img_path}\n')


    # Assumes the split data lives in these directories
    train_image_dir = 'data/dota/train/images/'
    train_label_dir = 'data/dota/train/labelTxt/'
    val_image_dir = 'data/dota/val/images/'
    val_label_dir = 'data/dota/val/labelTxt/'

    generate_file_list(train_image_dir, train_label_dir, 'data/dota/train.txt')
    generate_file_list(val_image_dir, val_label_dir, 'data/dota/val.txt')

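4. Configuring and training the Oriented R-CNN model

4.1 Understanding the config file

MMRotate uses config files to define the entire training pipeline. Using Oriented R-CNN as the example, let's look at the key parts.

First, locate the config file. The configs directory of MMRotate holds the configs for every model. For Oriented R-CNN we can use:

    configs/oriented_rcnn/oriented_rcnn_r50_fpn_1x_dota.py

Depending on the MMRotate version, the file name may carry an angle-version suffix (for example oriented_rcnn_r50_fpn_1x_dota_le90.py), so check your checkout and adjust the paths in the commands that follow.

This config inherits from several base configs:

    # Base configs
    _base_ = [
        '../_base_/datasets/dota.py',          # dataset config
        '../_base_/schedules/schedule_1x.py',  # training schedule
        '../_base_/default_runtime.py'         # runtime config
    ]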

Key config items:

Model structure:

    model = dict(
        type='OrientedRCNN',  # detector type
        backbone=dict(  # backbone network
            type='ResNet',
            depth=50,  # ResNet-50
            num_stages=4,
            out_indices=(0, 1, 2, 3),
            frozen_stages=1,
            norm_cfg=dict(type='BN', requires_grad=True),
            norm_eval=True,
            style='pytorch',
            init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
        neck=dict(  # feature pyramid network
            type='FPN',
            in_channels=[256, 512, 1024, 2048],
            out_channels=256,
            num_outs=5),
        rpn_head=dict(  # RPN head
            type='OrientedRPNHead',
            in_channels=256,
            feat_channels=256,
            anchor_generator=dict(...),
            bbox_coder=dict(...),
            loss_cls=dict(...),
            loss_bbox=dict(...)),
        roi_head=dict(  # RoI head
            type='OrientedStandardRoIHead',
            bbox_roi_extractor=dict(...),
            bbox_head=dict(
                type='RotatedShared2FCBBoxHead',
                in_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=15,  # DOTA has 15 categories
                bbox_coder=dict(...),
                reg_class_agnostic=True,
                loss_cls=dict(...),
                loss_bbox=dict(...))),
        train_cfg=dict(...),  # training settings
        test_cfg=dict(...))   # test settings

Data augmentation: augmentation matters a lot for remote sensing imagery, and MMRotate provides a rich set of options:

    train_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(type='LoadAnnotations', with_bbox=True),
        dict(type='RResize', img_scale=(1024, 1024)),
        dict(
            type='RRandomFlip',
            flip_ratio=0.5,
            direction=['horizontal', 'vertical', 'diagonal']),
        dict(
            type='PolyRandomRotate',
            rotate_ratio=0.5,
            angles_range=180,
            auto_bound=False),
        dict(type='Normalize', ...),
        dict(type='Pad', size_divisor=32),
        dict(type='DefaultFormatBundle'),
        dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
    ]

PolyRandomRotate is an augmentation specific to rotated object detection: it randomly rotates the image together with its annotation boxes.
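
What this augmentation does to the annotations can be sketched as follows: with some probability, pick a random angle and rotate every polygon vertex about the image center. This is a conceptual illustration only. The real transform also rotates the pixels and deals with boxes that leave the image, and the function names below are hypothetical. The parameter names mirror the config keys above purely for readability.

```python
import math
import random


def rotate_polygon(points, angle_deg, center):
    """Rotate (x, y) points by angle_deg around center."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    cx, cy = center
    return [(cx + (x - cx) * c - (y - cy) * s,
             cy + (x - cx) * s + (y - cy) * c) for x, y in points]


def random_rotate_annotation(polygon, img_size, rotate_ratio=0.5,
                             angles_range=180, rng=random):
    """With probability rotate_ratio, rotate the polygon about the image center.

    Returns the (possibly rotated) polygon and the angle that was applied.
    """
    if rng.random() >= rotate_ratio:
        return list(polygon), 0.0
    angle = rng.uniform(-angles_range, angles_range)
    width, height = img_size
    return rotate_polygon(polygon, angle, (width / 2, height / 2)), angle


box = [(400, 500), (600, 500), (600, 540), (400, 540)]
rotated, angle = random_rotate_annotation(box, (1024, 1024), rng=random.Random(0))
print(f"applied {angle:.1f} degrees:", [(round(x), round(y)) for x, y in rotated])
```

Because the rotation is rigid, side lengths are preserved and only the orientation changes, which is what lets the model see the same object at many angles.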

4.2 Starting training

With the config ready, you can start training. If you have enough GPU memory, train directly:

    # Single-GPU training
    python tools/train.py configs/oriented_rcnn/oriented_rcnn_r50_fpn_1x_dota.py

    # Multi-GPU training (for example, 4 GPUs)
    CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_train.sh configs/oriented_rcnn/oriented_rcnn_r50_fpn_1x_dota.py 4

If you run out of memory, reduce the batch size:

    # Edit in the config file
    data = dict(
        samples_per_gpu=2,  # batch size per GPU; adjust to your GPU memory
        workers_per_gpu=2,  # data-loading workers per GPU
        train=dict(...),
        val=dict(...),
        test=dict(...))

During training, the console prints output like this:

2024-01-15 10:30:15,276 - mmrotate - INFO - Epoch [1][100/1462] lr: 0.00100, eta: 1 day, 2:30:15, time: 0.512, data_time: 0.012, memory: 5123, loss_rpn_cls: 0.1234, loss_rpn_bbox: 0.0456, loss_cls: 0.2345, loss_bbox: 0.0678, loss: 0.4713
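
If you want to plot or monitor these numbers, a line like the one above can be pulled apart with a regular expression. This is a small sketch that assumes the log keeps the `key: value` layout shown here. The function name `parse_train_log` is made up for illustration.

```python
import re

LOG_LINE = (
    "2024-01-15 10:30:15,276 - mmrotate - INFO - Epoch [1][100/1462] "
    "lr: 0.00100, eta: 1 day, 2:30:15, time: 0.512, data_time: 0.012, memory: 5123, "
    "loss_rpn_cls: 0.1234, loss_rpn_bbox: 0.0456, loss_cls: 0.2345, "
    "loss_bbox: 0.0678, loss: 0.4713"
)


def parse_train_log(line):
    """Extract epoch, iteration, learning rate, and loss terms from one log line."""
    m = re.search(r"Epoch \[(\d+)\]\[(\d+)/(\d+)\]", line)
    if m is None:
        return None  # not a training-progress line
    record = {
        "epoch": int(m.group(1)),
        "iter": int(m.group(2)),
        "iters_per_epoch": int(m.group(3)),
    }
    # Pick up "lr" and every field whose name starts with "loss"
    for key, value in re.findall(r"\b(lr|loss\w*): ([-+0-9.eE]+)", line):
        record[key] = float(value)
    return record


record = parse_train_log(LOG_LINE)
print(record["epoch"], record["iter"], record["loss"])
```

In this sample the total loss is the sum of the four component losses, which is a quick sanity check when reading logs.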

When training finishes, the model weights are saved under the work_dirs directory.

4.3 Model evaluation

After training, evaluate the model:

    # Single-GPU testing
    python tools/test.py configs/oriented_rcnn/oriented_rcnn_r50_fpn_1x_dota.py work_dirs/oriented_rcnn_r50_fpn_1x_dota/epoch_12.pth --eval mAP

    # Multi-GPU testing
    CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_test.sh configs/oriented_rcnn/oriented_rcnn_r50_fpn_1x_dota.py work_dirs/oriented_rcnn_r50_fpn_1x_dota/epoch_12.pth 4 --eval mAP

For DOTA the evaluation metric is mAP (mean Average Precision). A well-trained Oriented R-CNN model can reach an mAP above 70% on DOTA.
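
For readers who have not computed mAP by hand, here is a minimal sketch of average precision for a single class, using the area under the interpolated precision-recall curve. It assumes each detection has already been marked as a true or false positive (in rotated detection this matching is normally done with rotated IoU against the ground truth). The evaluator in MMRotate may use a different interpolation, so treat the numbers as illustrative.

```python
def average_precision(scores, is_true_positive, num_gt):
    """Area under the interpolated precision-recall curve for one class."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Interpolate: precision at each recall is the best precision at any higher recall
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap):
    """mAP is the plain mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)


# Four detections for one class, four ground-truth objects (made-up example)
ap = average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, True], num_gt=4)
print(f"AP = {ap:.3f}")
```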

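5. Practical use and deployment

5.1 Single-image inference

How do you use the trained model? Here is a complete inference example:

    import numpy as np
    import mmrotate  # noqa: F401  (registers the rotated models with MMDetection)
    from mmdet.apis import inference_detector, init_detector, show_result_pyplot

    # Paths to the config file and the checkpoint
    config_file = 'configs/oriented_rcnn/oriented_rcnn_r50_fpn_1x_dota.py'
    checkpoint_file = 'work_dirs/oriented_rcnn_r50_fpn_1x_dota/epoch_12.pth'

    # Build the model
    model = init_detector(config_file, checkpoint_file, device='cuda:0')

    # Input image
    img = 'test_image.jpg'

    # Run inference
    result = inference_detector(model, img)

    # Visualize the result
    show_result_pyplot(model, img, result, score_thr=0.3)

    # To read out the individual boxes
    bboxes = result[0]  # detections for the first class
    for bbox in bboxes:
        # In MMRotate 0.x each row is [cx, cy, w, h, angle, score], angle in radians.
        # The angle convention is set by angle_version in the config
        # (for example 'le90', i.e. [-90, 90) degrees).
        cx, cy, w, h, angle, score = (float(v) for v in bbox)
        if score > 0.5:  # confidence threshold
            print(f'Detected object, confidence: {score:.3f}')

            # Corner points from center, size, and angle
            cos_a, sin_a = np.cos(angle), np.sin(angle)
            half_w = np.array([cos_a, sin_a]) * w / 2
            half_h = np.array([-sin_a, cos_a]) * h / 2
            center = np.array([cx, cy])
            points = np.stack([center - half_w - half_h, center + half_w - half_h,
                               center + half_w + half_h, center - half_w + half_h])
            print(f'Corner points: {points.tolist()}')

            print(f'Center: ({cx:.1f}, {cy:.1f})')
            print(f'Size: {w:.1f} × {h:.1f}')
            print(f'Angle: {np.degrees(angle):.1f}°')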

5.2 Batch processing and angle correction

In practice we often need to process many images and correct for angle. Here is a batch-processing example:

    import json
    import os

    import cv2
    import numpy as np
    from tqdm import tqdm
    import mmrotate  # noqa: F401  (registers the rotated models with MMDetection)
    from mmdet.apis import inference_detector, init_detector


    def rbox_to_corners(cx, cy, w, h, angle):
        """Corner points (4x2) of a rotated box; angle in radians."""
        cos_a, sin_a = np.cos(angle), np.sin(angle)
        half_w = np.array([cos_a, sin_a]) * w / 2
        half_h = np.array([-sin_a, cos_a]) * h / 2
        center = np.array([cx, cy])
        return np.stack([center - half_w - half_h, center + half_w - half_h,
                         center + half_w + half_h, center - half_w + half_h])


    def batch_process_images(model, image_dir, output_dir, score_thr=0.3):
        """Run detection on every image in a directory and save the results.

        Args:
            model: the loaded detector
            image_dir: input image directory
            output_dir: output directory
            score_thr: confidence threshold
        """
        os.makedirs(output_dir, exist_ok=True)

        results_json = {}  # detections keyed by image file name
        image_files = [f for f in os.listdir(image_dir)
                       if f.lower().endswith(('.png', '.jpg', '.jpeg'))]

        for img_file in tqdm(image_files, desc='Processing images'):
            img_path = os.path.join(image_dir, img_file)

            # Run inference
            result = inference_detector(model, img_path)

            # Parse the result
            detections = []
            for class_id, bboxes in enumerate(result):
                for bbox in bboxes:
                    # Each row is [cx, cy, w, h, angle, score] in MMRotate 0.x
                    cx, cy, w, h, angle, score = (float(v) for v in bbox)
                    if score <= score_thr:
                        continue
                    corners = rbox_to_corners(cx, cy, w, h, angle)
                    detections.append({
                        'class_id': class_id,
                        'class_name': model.CLASSES[class_id],
                        'confidence': score,
                        'points': corners.reshape(-1).tolist(),
                        'bbox': corners.tolist(),
                        'rotated_rect': {
                            'center': [cx, cy],
                            'size': [w, h],
                            'angle': float(np.degrees(angle)),  # degrees
                        },
                    })
            results_json[img_file] = detections

            # Render and save the visualization
            vis_img = model.show_result(img_path, result, score_thr=score_thr, show=False)
            cv2.imwrite(os.path.join(output_dir, f'vis_{img_file}'), vis_img)

        # Save all detections as JSON
        with open(os.path.join(output_dir, 'detections.json'), 'w') as f:
            json.dump(results_json, f, indent=2)

        return results_json


    # Usage example (config_file and checkpoint_file as in section 5.1)
    model = init_detector(config_file, checkpoint_file, device='cuda:0')
    results = batch_process_images(
        model=model,
        image_dir='path/to/your/images',
        output_dir='output/detections',
        score_thr=0.3
    )
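
The angle-correction step itself comes down to undoing the detected rotation about the box center. The sketch below does this for the corner points only, assuming the box is given as center, size, and an angle in radians. The function name `upright_corners` is hypothetical. The same rotation can be applied to the pixels with an affine warp such as `cv2.warpAffine`, but check the sign convention of whichever routine you use.

```python
import math


def upright_corners(cx, cy, w, h, angle_rad):
    """Corners of a detection before and after rotating it by -angle about its center."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    tilted = [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]
    # Undo the rotation: rotate every corner by -angle around the center
    corrected = [(cx + (x - cx) * c + (y - cy) * s,
                  cy - (x - cx) * s + (y - cy) * c) for x, y in tilted]
    return tilted, corrected


tilted, corrected = upright_corners(50.0, 50.0, 40.0, 10.0, math.radians(30))
print("tilted:   ", [(round(x, 1), round(y, 1)) for x, y in tilted])
print("corrected:", [(round(x, 1), round(y, 1)) for x, y in corrected])
```

After correction the box is axis-aligned, so cropping it gives an upright view of the object with its long side horizontal.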

5.3 Deploying to production

In production we may want to expose the model as an API service. Here is a simple example using Flask:

    import io

    import cv2
    import numpy as np
    from flask import Flask, jsonify, request
    from PIL import Image
    import mmrotate  # noqa: F401  (registers the rotated models with MMDetection)
    from mmdet.apis import inference_detector, init_detector

    app = Flask(__name__)

    # Model is loaded once at startup and shared by all requests
    model = None


    def load_model():
        global model
        config_file = 'configs/oriented_rcnn/oriented_rcnn_r50_fpn_1x_dota.py'
        checkpoint_file = 'work_dirs/oriented_rcnn_r50_fpn_1x_dota/epoch_12.pth'
        model = init_detector(config_file, checkpoint_file, device='cuda:0')
        print("Model loaded")


    @app.route('/detect', methods=['POST'])
    def detect():
        if 'image' not in request.files:
            return jsonify({'error': 'no image uploaded'}), 400

        file = request.files['image']

        # Decode the upload and convert it to BGR
        image = Image.open(io.BytesIO(file.read())).convert('RGB')
        image = cv2.cvtColor(np.array(image), cv2.COLOR_RGB2BGR)

        # Run inference
        result = inference_detector(model, image)

        # Parse the result
        detections = []
        score_thr = float(request.form.get('threshold', 0.3))
        for class_id, bboxes in enumerate(result):
            for bbox in bboxes:
                # Each row is [cx, cy, w, h, angle, score], angle in radians
                cx, cy, w, h, angle, score = (float(v) for v in bbox)
                if score > score_thr:
                    cos_a, sin_a = np.cos(angle), np.sin(angle)
                    half_w = np.array([cos_a, sin_a]) * w / 2
                    half_h = np.array([-sin_a, cos_a]) * h / 2
                    center = np.array([cx, cy])
                    points = np.stack([center - half_w - half_h, center + half_w - half_h,
                                       center + half_w + half_h, center - half_w + half_h])
                    detections.append({
                        'class': model.CLASSES[class_id],
                        'confidence': score,
                        'center': [cx, cy],
                        'size': [w, h],
                        'angle': float(np.degrees(angle)),  # degrees
                        'points': points.tolist()
                    })

        return jsonify({
            'image_size': [image.shape[1], image.shape[0]],  # [width, height]
            'detections': detections,
            'count': len(detections)
        })


    if __name__ == '__main__':
        load_model()
        app.run(host='0.0.0.0', port=5000, debug=False)

Once the service is running, you can send an image over HTTP and get the detections back:

    curl -X POST -F "image=@test.jpg" -F "threshold=0.3" http://localhost:5000/detect

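6. Optimization tips and common problems

6.1 Model optimization tips

Data augmentation:
- For remote sensing imagery, random rotation (0-360 degrees) is important
- Consider random brightness and contrast changes to mimic different weather conditions
- For small objects, try multi-scale training
Model choice:
- If accuracy is the priority, try ReDet or S2A-Net
- If speed is the priority, consider a lighter backbone (such as ResNet-18)
- For small objects, adjust the scales of the feature pyramid
Post-processing:
- Tune the NMS (non-maximum suppression) threshold
- For dense objects, try Soft-NMS or DIoU-NMS
- Choose a confidence threshold that suits the application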
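
To see what tuning the NMS threshold changes, here is a minimal sketch of greedy NMS with a pluggable IoU function. For brevity the example uses horizontal (x1, y1, x2, y2) boxes. For rotated boxes you would pass a rotated IoU function instead. The names `greedy_nms` and `box_iou` are hypothetical, and this is not the implementation MMRotate uses.

```python
def box_iou(a, b):
    """IoU of two horizontal (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_fn, iou_thr):
    """Keep boxes in score order, dropping any that overlap a kept box above iou_thr."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou_fn(boxes[i], boxes[k]) <= iou_thr for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print("strict threshold 0.5:", greedy_nms(boxes, scores, box_iou, 0.5))
print("loose threshold 0.7: ", greedy_nms(boxes, scores, box_iou, 0.7))
```

A lower threshold removes more overlapping boxes, which cuts duplicates but can also delete true neighbors in dense scenes, so the right value depends on how tightly your objects are packed.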

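6.2 Troubleshooting common problems

Problem 1: the training loss does not decrease

- Check whether the learning rate is appropriate (too high or too low)
- Check that the annotations are correct
- Try using pretrained weights
- Check whether the data augmentation is too aggressive

Problem 2: inference is slow

- Use a smaller input image size
- Try model quantization or pruning
- Use TensorRT to accelerate inference
- Process images in batches to improve GPU utilization

Problem 3: poor results on small objects

- Add more training samples of small objects
- Adjust the anchor sizes and aspect ratios
- Use the shallow levels of the feature pyramid
- Try a model designed specifically for small objects

Problem 4: inaccurate angle predictions

- Check that the angle definition matches the annotations
- Add more rotation augmentation
- Adjust the weight of the angle regression loss
- Use a loss that models the rotated box more faithfully (such as the GWD or KLD loss)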

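6.3 Practical considerations

Domain adaptation: if the buildings you need to detect differ a lot from what DOTA contains, fine-tune the model. Collecting annotated data from the target domain, even just a few hundred images, can improve results noticeably.
Compute resources: rotated object detection is computationally heavier than horizontal detection. If resources are limited, consider:
- Using a smaller backbone
- Lowering the input image resolution
- Using model distillation
Annotation tools: annotating rotated boxes is more work than annotating horizontal boxes. A dedicated annotation tool helps, for example:
- LabelMe (polygon annotation, which can be converted to rotated boxes)
- CVAT (feature-rich)
- Roboflow (online service)
Evaluation metrics: beyond mAP, real applications may also care about:
- Angle error (mean angular difference)
- Recall (especially for critical targets)
- Inference speed (FPS)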
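
Angle error needs a little care because orientations wrap around: for a rectangle, 89° and -89° are only 2° apart. The sketch below computes a mean angle error with that periodicity taken into account. It assumes predictions have already been matched to ground-truth boxes and that angles are in degrees. The function names are made up for illustration.

```python
def angle_difference(pred_deg, gt_deg, period=180.0):
    """Smallest absolute difference between two orientations with the given period."""
    diff = abs(pred_deg - gt_deg) % period
    return min(diff, period - diff)


def mean_angle_error(pred_angles, gt_angles, period=180.0):
    """Mean periodic angle difference over matched prediction/ground-truth pairs."""
    pairs = list(zip(pred_angles, gt_angles))
    return sum(angle_difference(p, g, period) for p, g in pairs) / len(pairs)


print(angle_difference(89.0, -89.0))             # 2.0, not 178.0
print(mean_angle_error([10.0, 89.0], [12.0, -89.0]))
```

For objects that look the same after a 90° turn, such as square footprints, a period of 90 would be the more sensible choice.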