An In-Depth Look at Ghidra Python Script Development: Fundamentals and Practice

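Contents

- I. Ghidra Python Scripting Basics
  - 1. Runtime Environment and Core Rules
  - 2. Core Modules and Commonly Used Classes/Functions
- II. Five Hands-On Script Examples
  - Case 1: Batch-Decompile All Functions to a File
  - Case 2: Extract All Strings and Export Them to CSV
  - Case 3: Batch-Rename Similar Functions (Feature-Based)
  - Case 4: Scan the Program for Hard-Coded IP Addresses
  - Case 5: Collect Function Call Relationships and Generate a DOT Graph
- III. Ghidra Python Scripting Tips
- IV. Summary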

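In automotive embedded development, Ghidra is often used to analyze binary target code. Ghidra is the reverse-engineering framework open-sourced by the NSA, and its built-in Python scripting engine (based on Jython, compatible with Python 2.7) greatly extends how much of the analysis can be automated. Whether the task is batch decompilation, function feature extraction, or custom vulnerability detection, a Python script can turn repetitive manual work into an efficient automated workflow. This article starts with the basic rules and core APIs of Ghidra Python scripting, then walks through five practical scripts to cover the main development techniques.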

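I. Ghidra Python Scripting Basics

1. Runtime Environment and Core Rules

Interpreter: Ghidra ships with Jython (Python 2.7 with Java interop), so a script can call Ghidra's Java API directly and can use most Python 2.7 syntax;
Script entry point: there is no explicit main function; a script runs from top to bottom and reads the current analysis context from global variables such as currentProgram and currentAddress;
Core dependencies: no extra libraries need to be installed; import Ghidra's built-in packages (such as ghidra.program.model.listing and ghidra.app.decompiler) directly.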
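
As a minimal sketch of the "no main function, context comes from globals" rule, the snippet below looks the injected variables up by name, so the same file also loads outside Ghidra. The helper name `describe_context` is hypothetical, and the sketch assumes the injected variables are visible through `globals()`; `currentProgram` and `currentAddress` are the globals named above.

```python
# Sketch: read Ghidra's injected globals defensively.
# describe_context is a hypothetical helper, not part of the Ghidra API.

def describe_context(namespace):
    """Summarize the analysis context found in a script's global namespace."""
    program = namespace.get("currentProgram")
    if program is None:
        # Outside Ghidra the injected globals simply do not exist.
        return "no program loaded (running outside Ghidra?)"
    return "%s @ %s" % (program.getName(), namespace.get("currentAddress"))


# Script body: executed top to bottom, no main() required.
# print(...) with a single argument is valid in Python 2.7 and Python 3.
print(describe_context(globals()))
```

Run outside Ghidra, this only reports that no program is loaded; inside a script it should report the program name and the cursor address.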

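2. Core Modules and Commonly Used Classes/Functions

A Ghidra Python script works by calling Ghidra's Java API. The modules, classes, and calls below are the ones used most often in reverse-engineering scripts:

Module/Class	Description
`ghidra.program.model.listing`	Program listing: access to functions, instructions, data, and related code units (the core package)
`ghidra.app.decompiler`	Decompiler: turns binary functions into C code
`ghidra.program.model.address`	Addresses: constructing, comparing, and offsetting addresses
`ghidra.program.model.mem`	Memory: reading and writing the program's memory contents
`ghidra.util.task.ConsoleTaskMonitor`	Task monitor: progress tracking and cancellation for long-running operations such as decompilation and analysis
`currentProgram`	Global variable: the currently open program object
`listing = currentProgram.getListing()`	Gets the listing object used to iterate over functions and instructions
`func.getEntryPoint()`	Returns a function's entry address
`decompiler.decompileFunction(func, timeout, monitor)`	Decompiles the given function (called on a `DecompInterface` instance)
`getBytes(addr, length)`	Script helper that reads `length` bytes starting at `addr`; the lower-level `memory.getBytes(addr, buf)` fills a preallocated byte array instead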
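
One detail matters whenever bytes read from program memory are compared with opcode constants: Java bytes are signed, so Jython reports values from -128 to 127. The sketch below shows the usual masking step with plain lists standing in for the data a memory read returns; the helper names `to_unsigned` and `to_hex` are hypothetical.

```python
# Sketch: Java bytes are signed (-128..127); mask them before comparing
# against opcode constants written as 0x00..0xFF.

def to_unsigned(signed_bytes):
    """Convert an iterable of signed byte values to a list of 0..255 ints."""
    return [b & 0xFF for b in signed_bytes]


def to_hex(signed_bytes):
    """Render bytes as a space-separated hex string."""
    return " ".join("%02x" % b for b in to_unsigned(signed_bytes))


# -72 is the signed view of the byte 0xB8.
raw = [-72, 1, 0, 0, 0]
print(to_hex(raw))  # prints "b8 01 00 00 00"
```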

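II. Five Hands-On Script Examples

Case 1: Batch-Decompile All Functions to a File

Purpose: iterate over every function in the program and export the decompiled C code to a single file for offline analysis.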

# -*- coding: utf-8 -*-
# Batch-decompile every function and write the C output to one file.
import codecs  # Python 2: write the output as UTF-8

from ghidra.app.decompiler import DecompInterface
from ghidra.util.task import ConsoleTaskMonitor

# Output path (adjust as needed)
EXPORT_PATH = "D:\\ghidra_decompile_all_functions.c"
# Per-function decompiler timeout in seconds
TIMEOUT_SECS = 60

program = currentProgram
listing = program.getListing()

# Initialize the decompiler for the current program
decompiler_service = DecompInterface()
decompiler_service.openProgram(program)

# Task monitor required by decompileFunction()
monitor = ConsoleTaskMonitor()

try:
    with codecs.open(EXPORT_PATH, "w", "utf-8") as f:
        f.write("// Auto decompiled by Ghidra Python Script\n")
        f.write("#include <stdint.h>\n\n")

        # Iterate over all functions in ascending address order
        for func in listing.getFunctions(True):
            func_name = func.getName()
            func_addr = func.getEntryPoint()

            # Skip underscore-prefixed names (typically compiler/runtime
            # helpers) unless the name also contains "sub_"
            if func_name.startswith("_") and func_name.find("sub_") == -1:
                continue

            # Write a banner with the function name and entry address
            f.write("// ===================== %s @ 0x%s =====================\n" % (func_name, func_addr))

            result = decompiler_service.decompileFunction(func, TIMEOUT_SECS, monitor)
            if result.decompileCompleted():
                f.write(result.getDecompiledFunction().getC() + "\n\n")
            else:
                f.write("// Decompile failed: %s\n\n" % func_name)
                print "Decompile failed: %s" % func_name
finally:
    # Release the decompiler process even if the export fails
    decompiler_service.dispose()

print "All functions decompiled to: %s" % EXPORT_PATH
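
If one file per function is preferred over a single large export, the function name has to be turned into a safe file name first, because demangled C++ names can contain characters such as `::`, `<`, and `>`. The following is a small sketch of that step only; `safe_filename` is a hypothetical helper and the 64-character cap is an arbitrary assumption.

```python
import re


def safe_filename(func_name, entry_point, ext=".c"):
    """Build a filesystem-safe file name from a function name and address."""
    # Replace anything outside [A-Za-z0-9_.-] (e.g. '::', '<', '>') with '_'.
    cleaned = re.sub(r"[^A-Za-z0-9_.-]+", "_", func_name).strip("_")
    if not cleaned:
        cleaned = "unnamed"
    # The address suffix keeps overloaded or duplicate names distinct.
    return "%s_%s%s" % (cleaned[:64], entry_point, ext)


print(safe_filename("std::vector<int>::push_back", "00401a20"))
# prints "std_vector_int_push_back_00401a20.c"
```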

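Case 2: Extract All Strings and Export Them to CSV

Purpose: walk all strings defined in the program (ASCII and Unicode), collect each string's content, address, and length, and export them to a CSV file to help locate key strings quickly (for example hard-coded passwords or URLs).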

# -*- coding: utf-8 -*-
# Export every defined string to a CSV file.
import codecs

# Output path
CSV_PATH = "D:\\ghidra_strings.csv"

program = currentProgram
listing = program.getListing()


def csv_quote(text):
    # Quote the field and double any embedded quotes so that commas and
    # newlines inside a string cannot break the row
    return u'"%s"' % text.replace(u'"', u'""')


count = 0
with codecs.open(CSV_PATH, "w", "utf-8") as f:
    f.write(u"Address,Type,Length,Content\n")

    # Iterate over all defined data items in ascending address order
    for data in listing.getDefinedData(True):
        # hasStringValue() is true for data whose value is a string
        if not data.hasStringValue() or data.getValue() is None:
            continue

        addr = data.getAddress()
        type_name = data.getDataType().getName()  # e.g. "string", "unicode"
        str_len = data.getLength()                # length in bytes
        # Keep the value as unicode; codecs encodes it on write
        content = unicode(data.getValue())

        f.write(u"0x%s,%s,%d,%s\n" % (addr, type_name, str_len, csv_quote(content)))
        count += 1

print "%d strings exported to: %s" % (count, CSV_PATH)
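
Once the CSV exists, the search for URLs and password-like strings can be done offline. The sketch below is meant for an ordinary Python 3 interpreter outside Ghidra and assumes the `Address,Type,Length,Content` header written by the export script; the two regular expressions and the `triage` helper are illustrative assumptions, not a complete secret detector.

```python
import csv
import io
import re

# Illustrative patterns only; extend them for the target at hand.
URL_RE = re.compile(r"\b(?:https?|ftp)://[^\s\"']+", re.IGNORECASE)
SECRET_RE = re.compile(r"passw(?:or)?d|secret|token|api[_-]?key", re.IGNORECASE)


def triage(rows):
    """Return (address, kind, content) for rows that look interesting."""
    hits = []
    for row in rows:
        content = row["Content"]
        if URL_RE.search(content):
            hits.append((row["Address"], "url", content))
        elif SECRET_RE.search(content):
            hits.append((row["Address"], "secret", content))
    return hits


# Inline sample standing in for the exported file.
SAMPLE = (
    "Address,Type,Length,Content\n"
    '0x00402000,string,26,"http://example.com/update"\n'
    '0x00402020,string,16,"admin_password="\n'
    '0x00402040,string,6,"hello"\n'
)

for hit in triage(csv.DictReader(io.StringIO(SAMPLE))):
    print("%s,%s,%s" % hit)
```

For a real export, replace the `io.StringIO(SAMPLE)` argument with a file opened via `open(path, newline="", encoding="utf-8")`.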

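Case 3: Batch-Rename Similar Functions (Feature-Based)

Purpose: find functions that still carry an automatically generated name and, if a function contains a given instruction (for example mov eax, 0x1), rename it to func_syscall_N to make the listing easier to read. Note that Ghidra's default function names start with FUN_; the sub_ prefix is IDA's convention, so a script that filters on sub_ would match nothing in a typical Ghidra project.

# -*- coding: utf-8 -*-
# Rename auto-named functions that contain a given instruction.
from ghidra.program.model.symbol import SourceType

program = currentProgram
listing = program.getListing()

# Target instruction bytes: mov eax, 0x1 -> B8 01 00 00 00 (x86)
TARGET_OPCODE = [0xB8, 0x01, 0x00, 0x00, 0x00]
AUTO_PREFIX = "FUN_"  # Ghidra's default prefix for unnamed functions
RENAME_PREFIX = "func_syscall_"
counter = 0

# Iterate over all functions
for func in listing.getFunctions(True):
    func_name = func.getName()
    # Only process functions that still carry the default name
    if not func_name.startswith(AUTO_PREFIX):
        continue

    func_addr = func.getEntryPoint()
    has_target_opcode = False

    # Compare the bytes of each instruction in the function body
    for inst in listing.getInstructions(func.getBody(), True):
        # Java bytes are signed, so mask to 0..255 before comparing
        opcode_bytes = [b & 0xFF for b in inst.getBytes()]
        if opcode_bytes == TARGET_OPCODE:
            has_target_opcode = True
            break

    # Rename the function if the instruction was found
    if has_target_opcode:
        new_name = "%s%d" % (RENAME_PREFIX, counter)
        func.setName(new_name, SourceType.USER_DEFINED)
        print "Renamed: %s -> %s (0x%s)" % (func_name, new_name, func_addr)
        counter += 1

print "Renamed %d functions" % counter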
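
An exact byte comparison only finds one specific immediate value. A common refinement is a pattern with wildcard positions, so that any `mov eax, imm32` matches. The following is a minimal sketch of such a matcher on plain lists; the `matches` helper and the use of `None` as a wildcard are assumptions made for illustration.

```python
def matches(pattern, data):
    """True if data starts with pattern; None in pattern matches any byte."""
    if len(data) < len(pattern):
        return False
    for expected, actual in zip(pattern, data):
        # Mask so that signed Java bytes compare correctly.
        if expected is not None and expected != (actual & 0xFF):
            return False
    return True


# x86 "mov eax, imm32": opcode 0xB8 followed by any 4-byte immediate.
MOV_EAX_IMM32 = [0xB8, None, None, None, None]

print(matches(MOV_EAX_IMM32, [0xB8, 0x01, 0x00, 0x00, 0x00]))  # True
print(matches(MOV_EAX_IMM32, [-72, 0x3C, 0x00, 0x00, 0x00]))   # True (signed 0xB8)
print(matches(MOV_EAX_IMM32, [0x90]))                          # False (too short)
```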

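Case 4: Scan the Program for Hard-Coded IP Addresses

Purpose: scan program memory in 4-byte steps, interpret each value as an IPv4 address (for example 0x0100007F → 127.0.0.1), and export every candidate to support vulnerability analysis. Because any 4 bytes can be read as an address, the output is a list of candidates that still needs manual review.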

# -*- coding: utf-8 -*-
# Scan initialized memory in 4-byte steps and list IPv4 candidates.
import codecs

from ghidra.program.model.mem import MemoryAccessException

# Output path
IP_PATH = "D:\\ghidra_hardcoded_ips.txt"

program = currentProgram
memory = program.getMemory()

ip_list = []


def bytes_to_ip(byte1, byte2, byte3, byte4):
    """Format 4 bytes, in memory order, as a dotted-quad IPv4 address.

    Network byte order is assumed: the bytes 7F 00 00 01 (which read as
    0x0100007F when loaded as a little-endian integer) give 127.0.0.1.
    """
    return "%d.%d.%d.%d" % (byte1 & 0xFF, byte2 & 0xFF, byte3 & 0xFF, byte4 & 0xFF)


# Walk each readable, initialized block (uninitialized blocks hold no bytes)
for block in memory.getBlocks():
    if not (block.isRead() and block.isInitialized()):
        continue

    start_addr = block.getStart()
    size = block.getSize()
    offset = 0

    # Step through the block 4 bytes at a time without running past its end
    while offset + 4 <= size:
        current_addr = start_addr.add(offset)
        offset += 4
        try:
            b1 = memory.getByte(current_addr)
            b2 = memory.getByte(current_addr.add(1))
            b3 = memory.getByte(current_addr.add(2))
            b4 = memory.getByte(current_addr.add(3))
        except MemoryAccessException:
            continue

        ip = bytes_to_ip(b1, b2, b3, b4)
        # Drop the two most common non-address values
        if ip not in ["0.0.0.0", "255.255.255.255"]:
            ip_list.append((current_addr, ip))

# Write the candidates in scan order
with codecs.open(IP_PATH, "w", "utf-8") as f:
    f.write("Address,IP Address\n")
    for addr, ip in ip_list:
        f.write("0x%s,%s\n" % (addr, ip))

print "Found %d candidate IPs, exported to: %s" % (len(ip_list), IP_PATH)
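
A raw 4-byte scan reports almost every word in the binary, so some extra filtering is usually needed before the list is useful. The sketch below shows one possible heuristic on dotted-quad strings; `is_plausible_ipv4` is a hypothetical helper, and its rules (reject a first octet of 0 or 224 and above, reject a last octet of 0 or 255) are assumptions that will also discard some genuine addresses.

```python
def is_plausible_ipv4(ip):
    """Heuristic filter for dotted-quad strings produced by a raw scan."""
    parts = ip.split(".")
    if len(parts) != 4:
        return False
    octets = [int(part) for part in parts]
    if octets[0] == 0 or octets[0] >= 224:
        return False  # "this network", multicast, and reserved ranges
    if octets[3] in (0, 255):
        return False  # typical network and broadcast addresses
    return True


candidates = ["127.0.0.1", "192.168.1.10", "0.16.32.48",
              "239.255.255.250", "10.0.0.255"]
kept = [ip for ip in candidates if is_plausible_ipv4(ip)]
print(kept)  # ['127.0.0.1', '192.168.1.10']
```

Cross-checking the surviving addresses against the strings exported in Case 2 is another inexpensive way to raise confidence.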

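Case 5: Collect Function Call Relationships and Generate a DOT Graph

Purpose: collect the direct call relationships of a chosen function (who calls it and what it calls) and write them to a DOT file that Graphviz can render as a diagram.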

# -*- coding: utf-8 -*-
# Write the direct callers and callees of one function to a DOT file.
import codecs

# Target function name (adjust as needed)
TARGET_FUNC_NAME = "main"
DOT_PATH = "D:\\ghidra_callgraph.dot"

program = currentProgram
listing = program.getListing()

# Find the target function by name
target_func = None
for func in listing.getFunctions(True):
    if func.getName() == TARGET_FUNC_NAME:
        target_func = func
        break

if target_func is None:
    print "Function %s not found!" % TARGET_FUNC_NAME
else:
    # getCallingFunctions()/getCalledFunctions() follow call references over
    # the whole function body. Looking up references from the entry point
    # alone would miss every call made later in the function.
    # "monitor" is the task monitor Ghidra provides to every script.
    caller_list = sorted(set(fn.getName() for fn in target_func.getCallingFunctions(monitor)))
    callee_list = sorted(set(fn.getName() for fn in target_func.getCalledFunctions(monitor)))

    # Generate the DOT file
    with codecs.open(DOT_PATH, "w", "utf-8") as f:
        f.write("digraph CallGraph {\n")
        f.write('    node [shape=box];\n')
        # Edges: caller -> target function
        for caller in caller_list:
            f.write('    "%s" -> "%s";\n' % (caller, TARGET_FUNC_NAME))
        # Edges: target function -> callee
        for callee in callee_list:
            f.write('    "%s" -> "%s";\n' % (TARGET_FUNC_NAME, callee))
        f.write("}\n")

    print "Call graph generated to: %s" % DOT_PATH
    print "Callers of %s: %s" % (TARGET_FUNC_NAME, caller_list)
    print "Callees of %s: %s" % (TARGET_FUNC_NAME, callee_list)
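
The DOT-writing step can be separated from Ghidra and checked on its own. The sketch below builds the same kind of graph text from plain name lists, removes duplicate edges, and escapes double quotes in node names; `dot_quote` and `build_dot` are hypothetical helper names.

```python
def dot_quote(name):
    """Quote a node name for DOT, escaping embedded double quotes."""
    return '"%s"' % name.replace('"', '\\"')


def build_dot(target, callers, callees):
    """Return DOT text with caller->target and target->callee edges."""
    edges = set()
    for caller in callers:
        edges.add((caller, target))
    for callee in callees:
        edges.add((target, callee))

    lines = ["digraph CallGraph {", "    node [shape=box];"]
    for src, dst in sorted(edges):  # sorted for stable, diff-friendly output
        lines.append("    %s -> %s;" % (dot_quote(src), dot_quote(dst)))
    lines.append("}")
    return "\n".join(lines) + "\n"


print(build_dot("main", ["_start", "_start"], ["init", "printf", "printf"]))
```

With Graphviz installed, the resulting file can be rendered with a command such as `dot -Tpng ghidra_callgraph.dot -o callgraph.png`.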

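III. Ghidra Python Scripting Tips

Type conversion: Jython converts between Java objects and Python types automatically (for example java.lang.String becomes a Python unicode string), but pay attention to encodings when writing files (utf-8 is recommended);
Exception handling: wrap memory reads, function lookups, and similar operations in try-except, and catch the specific exception where possible, so that an invalid address does not abort the whole script;
Performance: when processing large amounts of data (such as a full memory scan), iterate block by block via getBlocks() and skip blocks that are not relevant instead of probing every address one byte at a time;
Debugging: print intermediate results, or try code snippets interactively in Ghidra's Python interpreter window;
Compatibility: avoid Python 3-only syntax (such as f-strings); use % formatting or str.format() instead.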
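
The compatibility tip can be illustrated with a short sketch that runs unchanged under Jython 2.7 and Python 3. The helper names are hypothetical, and the 8-digit width is simply an assumption suited to 32-bit targets.

```python
def format_hit(name, address):
    """Format with the % operator, which behaves the same in 2.7 and 3."""
    return "%s @ 0x%08x" % (name, address)


def format_hit_alt(name, address):
    """Same output using str.format() with explicit positional indexes."""
    return "{0} @ 0x{1:08x}".format(name, address)


# print(...) with a single argument is valid in both Python 2 and 3;
# an f-string such as f"{name}" would be a syntax error under Jython 2.7.
print(format_hit("main", 0x401000))      # main @ 0x00401000
print(format_hit_alt("main", 0x401000))  # main @ 0x00401000
```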

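IV. Summary

Ghidra Python scripts are a key tool for automating reverse engineering. By calling Ghidra's extensive Java API, a script can automate the whole workflow, from batch decompilation and feature extraction to custom analyses. The five scripts in this article cover the most common reverse-engineering scenarios, and they can serve as a starting point for further extensions such as vulnerability scanning or malware signature matching. Learning to write Ghidra Python scripts can substantially speed up analysis by turning repetitive manual steps into reusable automated workflows.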