<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>宋旭东</title>
  
  <subtitle>Biomedical Engineering / Bioinformatics / Data Analysis</subtitle>
  <link href="https://song-xudong.github.io/atom.xml" rel="self"/>
  
  <link href="https://song-xudong.github.io/"/>
  <updated>2025-03-16T10:13:53.000Z</updated>
  <id>https://song-xudong.github.io/</id>
  
  <author>
    <name>Song Xudong</name>
    
  </author>
  
  <generator uri="https://hexo.io/">Hexo</generator>
  
  <entry>
    <title>MATLAB Imaging Data Processing (3)</title>
    <link href="https://song-xudong.github.io/2025/03/16/MATLAB%E5%BD%B1%E5%83%8F%E6%95%B0%E6%8D%AE%E5%A4%84%E7%90%86(%E4%B8%89)/"/>
    <id>https://song-xudong.github.io/2025/03/16/MATLAB%E5%BD%B1%E5%83%8F%E6%95%B0%E6%8D%AE%E5%A4%84%E7%90%86(%E4%B8%89)/</id>
    <published>2025-03-16T10:10:48.000Z</published>
    <updated>2025-03-16T10:13:53.000Z</updated>
    
    <content type="html"><![CDATA[<h1 id="DICOM图像元数据解析"><a href="#DICOM图像元数据解析" class="headerlink" title="DICOM图像元数据解析"></a>DICOM Image Metadata Explained</h1><p>DICOM (Digital Imaging and Communications in Medicine) is the standard format for medical imaging and is widely used in healthcare. Each file carries rich metadata describing the patient, the device, and the image characteristics. The fields below, named as in the struct returned by MATLAB's dicominfo, are explained one by one:</p><h3 id="1-文件元信息（File-Meta-Information）"><a href="#1-文件元信息（File-Meta-Information）" class="headerlink" title="1. 文件元信息（File Meta Information）"></a>1. <strong>File Meta Information</strong></h3><ul><li><strong>Filename</strong>: Name of the file.</li><li><strong>FileModDate</strong>: Date the file was last modified.</li><li><strong>FileSize</strong>: Size of the file in bytes.</li><li><strong>Format</strong>: File format (e.g. DICOM).</li><li><strong>FormatVersion</strong>: Version of the file format.</li><li><strong>Width</strong>, <strong>Height</strong>: Image width and height in pixels.</li><li><strong>BitDepth</strong>: Bit depth of the image (e.g. 8-bit, 16-bit).</li><li><strong>ColorType</strong>: Color type of the image (e.g. grayscale, color).</li><li><strong>FileMetaInformationGroupLength</strong>: Length of the file meta information group.</li><li><strong>FileMetaInformationVersion</strong>: Version of the file meta information.</li><li><strong>MediaStorageSOPClassUID</strong>: UID of the media storage SOP class.</li><li><strong>MediaStorageSOPInstanceUID</strong>: UID of the media storage SOP instance.</li><li><strong>TransferSyntaxUID</strong>: UID of the transfer syntax, i.e. how the data set is encoded.</li><li><strong>ImplementationClassUID</strong>: UID of the implementation class.</li><li><strong>ImplementationVersionName</strong>: Name of the implementation version.</li><li><strong>IdentifyingGroupLength</strong>: Length of the identifying group.</li><li><strong>SpecificCharacterSet</strong>: Character set in use (e.g. ISO_IR 192 for UTF-8).</li></ul><h3 id="2-患者信息（Patient-Information）"><a href="#2-患者信息（Patient-Information）" class="headerlink" title="2. 患者信息（Patient Information）"></a>2. 
<strong>Patient Information</strong></h3><ul><li><strong>PatientName</strong>: Patient's name.</li><li><strong>PatientID</strong>: Patient's ID.</li><li><strong>PatientBirthDate</strong>: Patient's date of birth.</li><li><strong>PatientSex</strong>: Patient's sex.</li><li><strong>PatientAge</strong>: Patient's age.</li><li><strong>PatientWeight</strong>: Patient's weight.</li><li><strong>AdditionalPatientHistory</strong>: Additional patient history.</li></ul><h3 id="3-研究和系列信息（Study-and-Series-Information）"><a href="#3-研究和系列信息（Study-and-Series-Information）" class="headerlink" title="3. 研究和系列信息（Study and Series Information）"></a>3. <strong>Study and Series Information</strong></h3><ul><li><strong>StudyDate</strong>: Date of the study.</li><li><strong>SeriesDate</strong>: Date of the series.</li><li><strong>AcquisitionDate</strong>: Date of the acquisition.</li><li><strong>ContentDate</strong>: Date the image content was created.</li><li><strong>StudyTime</strong>, <strong>SeriesTime</strong>, <strong>AcquisitionTime</strong>, <strong>ContentTime</strong>: The time-of-day counterparts of the dates above.</li><li><strong>AccessionNumber</strong>: Accession number of the study.</li><li><strong>Modality</strong>: Imaging modality (e.g. MR for MRI, CT, US).</li><li><strong>Manufacturer</strong>: Device manufacturer.</li><li><strong>InstitutionName</strong>: Name of the institution.</li><li><strong>ReferringPhysicianName</strong>: Name of the referring physician.</li><li><strong>StationName</strong>: Name of the station.</li><li><strong>SeriesDescription</strong>: Description of the series.</li><li><strong>ManufacturerModelName</strong>: Model name of the device.</li></ul><h3 id="4-图像参数（Image-Parameters）"><a href="#4-图像参数（Image-Parameters）" class="headerlink" title="4. 图像参数（Image Parameters）"></a>4. 
<strong>Image Parameters</strong></h3><ul><li><strong>SliceThickness</strong>: Slice thickness (mm).</li><li><strong>RepetitionTime</strong>: Repetition time (TR, in ms).</li><li><strong>EchoTime</strong>: Echo time (TE, in ms).</li><li><strong>NumberOfAverages</strong>: Number of averages.</li><li><strong>ImagingFrequency</strong>: Imaging frequency.</li><li><strong>ImagedNucleus</strong>: Nucleus being imaged.</li><li><strong>EchoNumbers</strong>: Echo number(s).</li><li><strong>MagneticFieldStrength</strong>: Magnetic field strength (tesla).</li><li><strong>SpacingBetweenSlices</strong>: Spacing between slices (mm).</li><li><strong>EchoTrainLength</strong>: Echo train length.</li><li><strong>PercentSampling</strong>: Percent sampling.</li><li><strong>PercentPhaseFieldOfView</strong>: Field of view in the phase-encoding direction, as a percentage.</li><li><strong>PixelBandwidth</strong>: Pixel bandwidth.</li><li><strong>DeviceSerialNumber</strong>: Serial number of the device.</li><li><strong>SoftwareVersions</strong>: Software version(s).</li><li><strong>ProtocolName</strong>: Name of the protocol.</li></ul><h3 id="5-其他信息（Miscellaneous-Information）"><a href="#5-其他信息（Miscellaneous-Information）" class="headerlink" title="5. 其他信息（Miscellaneous Information）"></a>5. <strong>Miscellaneous Information</strong></h3><ul><li><strong>ContrastBolusAgent</strong>: Contrast agent.</li><li><strong>ContrastBolusRoute</strong>: Administration route of the contrast agent.</li><li><strong>HeartRate</strong>: Heart rate.</li><li><strong>CardiacNumberOfImages</strong>: Number of images per cardiac cycle.</li><li><strong>TriggerWindow</strong>: Trigger window.</li><li><strong>ReconstructionDiameter</strong>: Reconstruction diameter.</li><li><strong>ReceiveCoilName</strong>: Name of the receive coil.</li><li><strong>AcquisitionMatrix</strong>: Acquisition matrix.</li><li><strong>InPlanePhaseEncodingDirection</strong>: In-plane phase-encoding direction.</li><li><strong>FlipAngle</strong>: Flip angle.</li><li><strong>VariableFlipAngleFlag</strong>: Variable flip angle flag.</li><li><strong>SAR</strong>: Specific absorption rate (SAR).</li><li><strong>PatientPosition</strong>: Patient position relative to the scanner.</li><li><strong>Laterality</strong>: Laterality (left or right side of a paired body part).</li></ul><h3 id="6-图像数据（Image-Data）"><a href="#6-图像数据（Image-Data）" class="headerlink" title="6. 图像数据（Image Data）"></a>6. 
<strong>Image Data</strong></h3><ul><li><strong>Rows</strong>, <strong>Columns</strong>: Number of rows and columns in the image.</li><li><strong>PixelSpacing</strong>: Physical spacing between pixel centers (mm).</li><li><strong>BitsAllocated</strong>, <strong>BitsStored</strong>, <strong>HighBit</strong>: Bits allocated per pixel, bits actually stored, and the position of the most significant bit.</li><li><strong>PixelRepresentation</strong>: Pixel representation (0 = unsigned integer, 1 = signed two's complement).</li><li><strong>SamplesPerPixel</strong>: Number of samples per pixel.</li><li><strong>PhotometricInterpretation</strong>: Photometric interpretation (e.g. MONOCHROME2, RGB).</li><li><strong>SmallestImagePixelValue</strong>, <strong>LargestImagePixelValue</strong>: Smallest and largest pixel values.</li><li><strong>WindowCenter</strong>, <strong>WindowWidth</strong>: Window center and window width used for display.</li><li><strong>PixelDataGroupLength</strong>, <strong>PixelData</strong>: Length of the pixel data group, and the actual image data.</li></ul><h3 id="7-私有信息（Private-Information）"><a href="#7-私有信息（Private-Information）" class="headerlink" title="7. 私有信息（Private Information）"></a>7. <strong>Private Information</strong></h3><ul><li>Private fields usually begin with "Private_" and store extra information specific to a device or institution. Their meaning has to be looked up in the manufacturer's documentation.</li></ul><h3 id="8-唯一标识符（Unique-Identifiers）"><a href="#8-唯一标识符（Unique-Identifiers）" class="headerlink" title="8. 唯一标识符（Unique Identifiers）"></a>8. <strong>Unique Identifiers</strong></h3><ul><li><strong>StudyInstanceUID</strong>, <strong>SeriesInstanceUID</strong>, <strong>SOPInstanceUID</strong>: UIDs that uniquely identify the study, the series, and the SOP instance.</li><li><strong>InstanceNumber</strong>: Instance number.</li></ul><h3 id="9-位置和方向（Position-and-Orientation）"><a href="#9-位置和方向（Position-and-Orientation）" class="headerlink" title="9. 位置和方向（Position and Orientation）"></a>9. <strong>Position and Orientation</strong></h3><ul><li><strong>ImagePositionPatient</strong>: Position of the image in the patient coordinate system.</li><li><strong>ImageOrientationPatient</strong>: Orientation of the image in the patient coordinate system.</li><li><strong>FrameOfReferenceUID</strong>: UID of the frame of reference.</li></ul><h3 id="10-时间和事件（Time-and-Events）"><a href="#10-时间和事件（Time-and-Events）" class="headerlink" title="10. 时间和事件（Time and Events）"></a>10. 
<strong>Time and Events</strong></h3><ul><li><strong>AcquisitionTime</strong>: Acquisition time.</li><li><strong>TriggerTime</strong>: Trigger time.</li><li><strong>ContentTime</strong>: Content time.</li></ul><h3 id="11-其他标识符（Other-Identifiers）"><a href="#11-其他标识符（Other-Identifiers）" class="headerlink" title="11. 其他标识符（Other Identifiers）"></a>11. <strong>Other Identifiers</strong></h3><ul><li><strong>PatientID</strong>, <strong>AccessionNumber</strong>: Patient ID and accession number, which identify the patient and the study.</li></ul><h3 id="12-设备信息（Device-Information）"><a href="#12-设备信息（Device-Information）" class="headerlink" title="12. 设备信息（Device Information）"></a>12. <strong>Device Information</strong></h3><ul><li><strong>StationName</strong>, <strong>DeviceSerialNumber</strong>: Station name and serial number, which identify the acquisition device.</li></ul><h2 id="简单练习"><a href="#简单练习" class="headerlink" title="简单练习"></a>Simple Exercise</h2><p><img src="/picture/image-20250316180206236.png" alt="image-20250316180206236"></p><h1 id="DPABI-安装"><a href="#DPABI-安装" class="headerlink" title="DPABI 安装"></a>DPABI Installation</h1><p><strong>DPABI</strong> (Data Processing &amp; Analysis for Brain Imaging) is an open-source MATLAB toolbox for processing and analyzing brain imaging data. It offers a wide range of functions, particularly for functional MRI (fMRI) and structural MRI (sMRI) data. DPABI aims to simplify and speed up brain imaging pipelines, and its flexibility and efficiency make it well suited to neuroscience and brain imaging researchers.</p><p>Reference tutorial: <a href="https://blog.csdn.net/qq_43419761/article/details/121131875">https://blog.csdn.net/qq_43419761&#x2F;article&#x2F;details&#x2F;121131875</a></p><p>Download</p><p>Official site: <a href="https://rfmri.org/DPABI">https://rfmri.org/DPABI</a></p><p><img src="/picture/image-20250228200313384.png" alt="image-20250228200313384"></p><p>Unzip the archive into MATLAB's toolbox folder, i.e. the MATLAB installation directory&#x2F;toolbox, then add that folder with its subfolders to the MATLAB search path.</p><p><img src="/picture/image-20250228200408755.png" alt="image-20250228200408755"></p><h2 id="SPM安装"><a href="#SPM安装" class="headerlink" title="SPM安装"></a>SPM Installation</h2><p><a href="https://www.fil.ion.ucl.ac.uk/spm/software/download/">https://www.fil.ion.ucl.ac.uk/spm/software/download/</a></p><p><img 
src="/picture/image-20250228201203832.png" alt="image-20250228201203832"></p><p>There is no SPM version listed for the MATLAB 2024 release I have installed.</p><p>So I first tried downloading the latest SPM and using that.</p><p><img src="/picture/image-20250228202105692.png" alt="image-20250228202105692"></p><p>Installation succeeded.</p><h1 id="AAL脑区模板解读"><a href="#AAL脑区模板解读" class="headerlink" title="AAL脑区模板解读"></a>Understanding the AAL Brain Atlas</h1><p>AAL.nii</p><p><strong>The AAL (Automated Anatomical Labeling) template</strong> is a commonly used brain atlas that divides the brain into anatomical regions and assigns each region a unique integer label. It is a standard tool in neuroimaging research, particularly for analyzing functional MRI (fMRI) and structural MRI (sMRI) data.</p><p>In MATLAB, read the AAL.nii file with y_ReadAll:</p><table><tbody><tr><td class="code"><pre><span class="line">[Data, VoxelSize, FileList, Header] = y_ReadAll(AAL_file);</span><br></pre></td></tr></tbody></table><table><tbody><tr><td class="code"><pre><span class="line"><span class="comment">% y_ReadAll - read NIfTI, GIfTI, or DPABINet Matrix files</span></span><br><span class="line"><span class="comment">% ------------------------------------------------------------------------</span></span><br><span class="line"><span class="comment">% Input:</span></span><br><span class="line"><span class="comment">% InputName - path to the input file or directory, in one of these forms:</span></span><br><span class="line"><span class="comment">%             1. A single file (e.g. a .nii, .nii.gz, .gii, or .mat file).</span></span><br><span class="line"><span class="comment">%             2. A directory containing:</span></span><br><span class="line"><span class="comment">%                - for NIfTI: one 4D file or a set of 3D files.</span></span><br><span class="line"><span class="comment">%                - for GIfTI: one 2D file or a set of 1D files.</span></span><br><span class="line"><span class="comment">%                - for DPABINet Matrix: a set of .mat files.</span></span><br><span class="line"><span class="comment">%             3. 
A file list (cell array) in which each element is the path of one file.</span></span><br><span class="line"><span class="comment">% Output:</span></span><br><span class="line"><span class="comment">% Data - image data matrix:</span></span><br><span class="line"><span class="comment">%        - for NIfTI: a 4D matrix.</span></span><br><span class="line"><span class="comment">%        - for GIfTI: a 2D matrix.</span></span><br><span class="line"><span class="comment">%        - for DPABINet Matrix: a 2D matrix.</span></span><br><span class="line"><span class="comment">% VoxelSize - voxel size (meaningful for NIfTI files only).</span></span><br><span class="line"><span class="comment">% FileList - list of the files that were read.</span></span><br><span class="line"><span class="comment">% Header - header structure:</span></span><br><span class="line"><span class="comment">%          - for NIfTI: contains fields such as fname, dim, dt, mat, pinfo.</span></span><br><span class="line"><span class="comment">%          - for GIfTI: contains the GIfTI header information.</span></span><br><span class="line"><span class="comment">%          - for DPABINet Matrix: contains the matrix name and size information.</span></span><br></pre></td></tr></tbody></table><h1 id="作业代码"><a href="#作业代码" class="headerlink" title="作业代码"></a>Assignment Code</h1><table><tbody><tr><td class="code"><pre><span class="line">clc,clear;</span><br><span class="line"><span class="comment">% Read the AAL template</span></span><br><span class="line">AAL_file = <span class="string">'011.nii'</span>;  <span class="comment">% AAL template file name</span></span><br><span class="line">[Data, VoxelSize, FileList, Header] = y_ReadAll(AAL_file);</span><br><span class="line"></span><br><span class="line"><span class="comment">% Collect the unique region labels in the AAL template</span></span><br><span class="line">unique_regions = unique(Data);</span><br><span class="line">unique_regions = unique_regions(unique_regions &gt; <span class="number">0</span>);  <span class="comment">% drop the background label 0</span></span><br><span class="line"></span><br><span class="line"><span class="comment">% Create the mask folder</span></span><br><span class="line">mask_folder = <span class="string">'mask'</span>;</span><br><span class="line"><span 
class="keyword">if</span> ~exist(mask_folder, <span class="string">'dir'</span>)</span><br><span class="line">    mkdir(mask_folder);</span><br><span class="line"><span class="keyword">end</span></span><br><span class="line"></span><br><span class="line"><span class="comment">% Loop over the regions, building and saving one mask each</span></span><br><span class="line"><span class="keyword">for</span> <span class="built_in">i</span> = <span class="number">1</span>:<span class="built_in">length</span>(unique_regions)</span><br><span class="line">    region_value = unique_regions(<span class="built_in">i</span>);</span><br><span class="line">    </span><br><span class="line">    <span class="comment">% Build the binary (0/1) mask</span></span><br><span class="line">    mask = double(Data == region_value);</span><br><span class="line">    </span><br><span class="line">    <span class="comment">% Build the file name (label zero-padded to three digits)</span></span><br><span class="line">    filename = sprintf(<span class="string">'%03d.nii'</span>, region_value);</span><br><span class="line">    filepath = fullfile(mask_folder, filename);</span><br><span class="line">    </span><br><span class="line">    <span class="comment">% Adjust the Header to match the mask</span></span><br><span class="line">    mask_header = Header;  <span class="comment">% copy the original Header</span></span><br><span class="line">    mask_header.dim = <span class="built_in">size</span>(mask);  <span class="comment">% update the dimensions</span></span><br><span class="line">    mask_header.dt = [<span class="number">16</span>, <span class="number">0</span>];  <span class="comment">% SPM type code 16 = float32 (64 would be double); adjust as needed</span></span><br><span class="line">    </span><br><span class="line">    <span class="comment">% Save the mask as a NIfTI file</span></span><br><span class="line">    y_Write(mask, mask_header, filepath);</span><br><span class="line"><span class="keyword">end</span></span><br><span class="line"></span><br><span class="line"><span class="built_in">disp</span>(<span 
class="string">'Masks for all brain regions have been generated and saved to the mask folder.'</span>);</span><br></pre></td></tr></tbody></table>]]></content>
    
    
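    <summary type="html">DICOM Image Metadata Explained. DICOM (Digital Imaging and Communications in Medicine) is the standard format for medical imaging and is widely used in healthcare. Each file carries rich metadata describing the patient, the device, and the image characteristics. 1. File Meta Information. Filename: name of the file.</summary>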
    
    
    
    <category term="生物信息学" scheme="https://song-xudong.github.io/categories/%E7%94%9F%E7%89%A9%E4%BF%A1%E6%81%AF%E5%AD%A6/"/>
    
    
    <category term="MATLAB" scheme="https://song-xudong.github.io/tags/MATLAB/"/>
    
    <category term="影像数据处理" scheme="https://song-xudong.github.io/tags/%E5%BD%B1%E5%83%8F%E6%95%B0%E6%8D%AE%E5%A4%84%E7%90%86/"/>
    
  </entry>
  
  <entry>
    <title>MATLAB Imaging Data Processing (2)</title>
    <link href="https://song-xudong.github.io/2025/03/16/MATLAB%E5%BD%B1%E5%83%8F%E6%95%B0%E6%8D%AE%E5%A4%84%E7%90%86(%E4%BA%8C)/"/>
    <id>https://song-xudong.github.io/2025/03/16/MATLAB%E5%BD%B1%E5%83%8F%E6%95%B0%E6%8D%AE%E5%A4%84%E7%90%86(%E4%BA%8C)/</id>
    <published>2025-03-16T10:04:03.000Z</published>
    <updated>2025-03-16T10:07:16.000Z</updated>
    
    <content type="html"><![CDATA[<h1 id="进阶基本命令"><a href="#进阶基本命令" class="headerlink" title="进阶基本命令"></a>More Basic Commands</h1><p><img src="/picture/image-20250224190005358.png" alt="image-20250224190005358"></p><table><tbody><tr><td class="code"><pre><span class="line">clc,clear</span><br><span class="line"></span><br><span class="line"><span class="comment">%doc fullfile</span></span><br><span class="line"></span><br><span class="line">f1 = <span class="string">'111\222\333.m'</span>;</span><br><span class="line">f2 = fullfile(<span class="string">'111'</span>,<span class="string">'222'</span>,<span class="string">'333.m'</span>);</span><br><span class="line">f3 = strcat(<span class="string">'111\'</span>,<span class="string">'222\'</span>,<span class="string">'333.m'</span>);</span><br><span class="line">f4 = [<span class="string">'111\'</span>,<span class="string">'222\'</span>,<span class="string">'333.m'</span>];</span><br><span class="line"><span class="comment">%filesep can stand in for \</span></span><br><span class="line">f5 = [<span class="string">'111'</span>,filesep,<span class="string">'222'</span>,filesep,<span class="string">'333.m'</span>];</span><br><span class="line"></span><br><span class="line"><span class="comment">%fileparts splits a path into folder, file name, and extension</span></span><br><span class="line">[filepath,name,ext] = fileparts(f1);</span><br><span class="line"><span class="comment">%return only the file name</span></span><br><span class="line">[~,name1] = fileparts(f1);</span><br><span class="line"><span class="comment">%return only the extension</span></span><br><span class="line">[~,~,ext1] = fileparts(f1);</span><br><span class="line"></span><br><span class="line"><span class="comment">%find returns the positions where a condition holds</span></span><br><span class="line">a = <span class="number">1</span>:<span class="number">2</span>:<span class="number">10</span>;</span><br><span class="line"><span class="comment">%equal is ==, greater than is &gt;, less than is &lt;</span></span><br><span class="line">b = <span class="built_in">find</span>(a==<span 
class="number">5</span>);</span><br><span class="line"></span><br><span class="line"><span class="comment">%genpath builds a path string for a folder and all of its subfolders</span></span><br><span class="line">p = genpath(<span class="string">"D:\MATLAB\work2"</span>);</span><br><span class="line"></span><br><span class="line"><span class="comment">%addpath adds folders to the MATLAB search path</span></span><br><span class="line"></span><br><span class="line"><span class="comment">%zip compresses files</span></span><br><span class="line"><span class="comment">%zip(zipfilename,filenames)</span></span><br><span class="line"></span><br><span class="line"><span class="comment">%gunzip decompresses .gz files (unzip handles .zip archives)</span></span><br><span class="line"></span><br><span class="line"><span class="comment">%strsplit splits a string (at whitespace by default)</span></span><br><span class="line">straa1 = <span class="string">"my name is matlab"</span>;</span><br><span class="line">a = strsplit(straa1);</span><br><span class="line"></span><br><span class="line"><span class="comment">%open a folder selection dialog</span></span><br><span class="line">path1 = uigetdir(<span class="string">'C:\'</span>);</span><br><span class="line"></span><br><span class="line"><span class="comment">%open a file selection dialog</span></span><br><span class="line">filename1 = uigetfile();</span><br></pre></td></tr></tbody></table><h1 id="矩阵操作"><a href="#矩阵操作" class="headerlink" title="矩阵操作"></a>Matrix Operations</h1><table><tbody><tr><td class="code"><pre><span class="line">%matrix computations</span><br><span class="line">a = rand(1000,1);</span><br><span class="line">hist(a);</span><br><span class="line"></span><br><span class="line">%randn draws from the standard normal distribution, so values can be negative</span><br><span class="line">a = randn(1000,1);</span><br><span class="line">hist(a);</span><br><span class="line"></span><br><span class="line">%std standard deviation (var gives the variance)</span><br><span class="line">std(a)</span><br><span class="line"></span><br><span class="line">%mean average</span><br><span class="line">mean(a)</span><br><span class="line"></span><br><span class="line">%sum adds the elements; for a matrix it sums each column</span><br><span class="line">sum(a)</span><br><span class="line">a = 
rand(1000,2);</span><br><span class="line">sum(a)</span><br><span class="line"></span><br><span class="line">%zeros creates a matrix of all zeros</span><br><span class="line">zeros(5,2)</span><br><span class="line"></span><br><span class="line">%ones creates a matrix of all ones</span><br><span class="line">ones(5,2)</span><br><span class="line"></span><br><span class="line">% create a matrix of all sixes</span><br><span class="line">X = 6*ones(5,2)</span><br><span class="line"></span><br><span class="line">%eye identity matrix (ones on the diagonal, zeros elsewhere)</span><br><span class="line">eye(5)</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">a = rand(4,2);</span><br><span class="line">sum(a)</span><br><span class="line">% a' transpose: rows become columns and columns become rows</span><br><span class="line">a=a'</span><br><span class="line"></span><br><span class="line">%length returns the size of the largest dimension, i.e. max(size(a))</span><br><span class="line">length(a)</span><br><span class="line"></span><br><span class="line">%size returns the number of rows and columns</span><br><span class="line">a</span><br><span class="line">size(a)</span><br><span class="line">%number of rows</span><br><span class="line">size(a,1)</span><br><span class="line">%number of columns</span><br><span class="line">size(a,2)</span><br><span class="line"></span><br><span class="line">%matrix multiplication</span><br><span class="line"></span><br><span class="line">b1 =[1 2</span><br><span class="line">    3 4]</span><br><span class="line">b2 =[5 6</span><br><span class="line">    7 8]</span><br><span class="line"></span><br><span class="line">b1*b2</span><br></pre></td></tr></tbody></table><p>Matrix multiplication reference: <a href="https://www.bilibili.com/video/BV1Nq421w7vH/?spm_id_from=333.1007.top_right_bar_window_history.content.click">https://www.bilibili.com/video/BV1Nq421w7vH/?spm_id_from&#x3D;333.1007.top_right_bar_window_history.content.click</a></p><h2 id="作业二"><a href="#作业二" class="headerlink" title="作业二"></a>Assignment 2</h2><p><img src="/picture/image-20250225170334446.png" alt="image-20250225170334446"></p><table><tbody><tr><td class="code"><pre><span class="line"><span class="comment">% Select a folder</span></span><br><span 
class="line">selectedFolder = uigetdir(pwd, <span class="string">'Select the folder containing the PDF files'</span>);</span><br><span class="line"><span class="keyword">if</span> selectedFolder == <span class="number">0</span></span><br><span class="line">    error(<span class="string">'No folder selected; operation cancelled.'</span>);</span><br><span class="line"><span class="keyword">end</span></span><br><span class="line"></span><br><span class="line"><span class="comment">% List all PDF files in the folder</span></span><br><span class="line">pdfFiles = dir(fullfile(selectedFolder, <span class="string">'*.pdf'</span>));</span><br><span class="line"></span><br><span class="line"><span class="comment">% Check that there is at least one PDF file</span></span><br><span class="line"><span class="keyword">if</span> <span class="built_in">isempty</span>(pdfFiles)</span><br><span class="line">    error(<span class="string">'The selected folder contains no PDF files.'</span>);</span><br><span class="line"><span class="keyword">end</span></span><br><span class="line"></span><br><span class="line"><span class="comment">% Create a pdf folder next to the selected folder</span></span><br><span class="line">pdfFolder = fullfile(fileparts(selectedFolder), <span class="string">'pdf'</span>);</span><br><span class="line"><span class="keyword">if</span> ~exist(pdfFolder, <span class="string">'dir'</span>)</span><br><span class="line">    mkdir(pdfFolder);</span><br><span class="line"><span class="keyword">end</span></span><br><span class="line"></span><br><span class="line"><span class="comment">% Loop over the PDF files</span></span><br><span class="line"><span class="keyword">for</span> <span class="built_in">i</span> = <span class="number">1</span>:<span class="built_in">length</span>(pdfFiles)</span><br><span class="line">    <span class="comment">% Full path of the current PDF file</span></span><br><span class="line">    currentPdfPath = fullfile(pdfFiles(<span class="built_in">i</span>).folder, pdfFiles(<span class="built_in">i</span>).name);</span><br><span class="line">    </span><br><span class="line">    <span class="comment">% 
Create a folder named after the PDF file</span></span><br><span class="line">    [~, pdfName, ~] = fileparts(pdfFiles(<span class="built_in">i</span>).name);</span><br><span class="line">    newFolder = fullfile(pdfFolder, pdfName);</span><br><span class="line">    <span class="keyword">if</span> ~exist(newFolder, <span class="string">'dir'</span>)</span><br><span class="line">        mkdir(newFolder);</span><br><span class="line">    <span class="keyword">end</span></span><br><span class="line">    </span><br><span class="line">    <span class="comment">% Copy the PDF file under a new name</span></span><br><span class="line">    newPdfPath = fullfile(newFolder, <span class="string">'report.pdf'</span>);</span><br><span class="line">    copyfile(currentPdfPath, newPdfPath);</span><br><span class="line"><span class="keyword">end</span></span><br><span class="line"></span><br><span class="line"><span class="comment">% Compress the pdf folder</span></span><br><span class="line">zipFileName = fullfile(fileparts(selectedFolder), <span class="string">'pdf.zip'</span>);</span><br><span class="line">zip(zipFileName, pdfFolder);</span><br><span class="line"></span><br><span class="line"><span class="comment">% Report completion</span></span><br><span class="line"><span class="built_in">disp</span>(<span class="string">'The PDF files have been extracted and compressed.'</span>);</span><br></pre></td></tr></tbody></table><p>It works!</p>]]></content>
    
    
    <summary type="html">More Basic Commands: clc,clear %doc fullfile f1 &amp;#x3D; &amp;#x27;111&#92;222&#92;333.m&amp;#x27;; f2 &amp;#x3D; fullfile(&amp;#x27;111&amp;#x27;,&amp;#x27;222&amp;#x27;,&amp;#x27;333.m&amp;#x27;);</summary>
    
    
    
    <category term="生物信息学" scheme="https://song-xudong.github.io/categories/%E7%94%9F%E7%89%A9%E4%BF%A1%E6%81%AF%E5%AD%A6/"/>
    
    
    <category term="MATLAB" scheme="https://song-xudong.github.io/tags/MATLAB/"/>
    
    <category term="影像数据处理" scheme="https://song-xudong.github.io/tags/%E5%BD%B1%E5%83%8F%E6%95%B0%E6%8D%AE%E5%A4%84%E7%90%86/"/>
    
  </entry>
  
  <entry>
    <title>MATLAB Imaging Data Processing (1)</title>
    <link href="https://song-xudong.github.io/2025/02/24/MATLAB%E5%BD%B1%E5%83%8F%E6%95%B0%E6%8D%AE%E5%A4%84%E7%90%86(%E4%B8%80)/"/>
    <id>https://song-xudong.github.io/2025/02/24/MATLAB%E5%BD%B1%E5%83%8F%E6%95%B0%E6%8D%AE%E5%A4%84%E7%90%86(%E4%B8%80)/</id>
    <published>2025-02-24T08:19:02.000Z</published>
    <updated>2025-02-24T09:27:54.000Z</updated>
    
    <content type="html"><![CDATA[<h1 id="安装"><a href="#安装" class="headerlink" title="安装"></a>Installation</h1><p>I followed the resources and crack method from a Bilibili video:</p><p><a href="https://www.bilibili.com/video/BV1DoAweWENJ/?spm_id_from=333.788.top_right_bar_window_default_collection.content.click">https://www.bilibili.com/video/BV1DoAweWENJ/?spm_id_from&#x3D;333.788.top_right_bar_window_default_collection.content.click</a></p><p>Installed successfully and working.</p><h2 id="基本函数"><a href="#基本函数" class="headerlink" title="基本函数"></a>Basic Functions</h2><p>Reference tutorial: <a href="https://www.bilibili.com/video/BV1bv411B7wX?spm_id_from=333.788.videopod.episodes&vd_source=b938c9620af06f4224f5fd4db315cbd4">https://www.bilibili.com/video/BV1bv411B7wX?spm_id_from&#x3D;333.788.videopod.episodes&amp;vd_source&#x3D;b938c9620af06f4224f5fd4db315cbd4</a></p><p><img src="/picture/image-20250222163942309.png" alt="image-20250222163942309"></p><h3 id="查看帮助"><a href="#查看帮助" class="headerlink" title="查看帮助"></a>Getting Help</h3><p>Type help or doc followed by the name of the function you want help on:</p><table><tbody><tr><td class="code"><pre><span class="line">help mkdir</span><br><span class="line">doc mkdir</span><br></pre></td></tr></tbody></table><p>Paths</p><p>cd ..</p><p>cd …</p><p>The syntax matches basic Linux shell commands and is fairly simple.</p><h3 id="简单尝试"><a href="#简单尝试" class="headerlink" title="简单尝试"></a>A Quick Try</h3><table><tbody><tr><td class="code"><pre><span class="line"><span class="comment">%this is a test file</span></span><br><span class="line"><span class="comment">%clear the command window and the workspace variables</span></span><br><span class="line">clc,clear</span><br><span class="line"><span class="comment">%print work directory </span></span><br><span class="line">pwd</span><br><span class="line"><span class="comment">%make directory </span></span><br><span class="line">mkdir <span class="number">123</span></span><br><span class="line"><span class="comment">%remove  directory </span></span><br><span class="line">rmdir <span class="number">123</span></span><br><span class="line"><span class="comment">%list </span></span><br><span class="line">ls</span><br><span class="line"><span 
class="comment">%try declaring variables</span></span><br><span class="line">a=<span class="number">1</span>;</span><br><span class="line">b=<span class="number">2</span>;</span><br><span class="line"><span class="comment">%copy file and rename</span></span><br><span class="line">copyfile(<span class="string">"test.m"</span>,<span class="string">"test2.m"</span>)</span><br><span class="line"><span class="comment">%which shows where a function is located</span></span><br><span class="line">which ls</span><br></pre></td></tr></tbody></table><p><img src="/picture/image-20250224142052391.png" alt="image-20250224142052391"></p><h2 id="矩阵操作"><a href="#矩阵操作" class="headerlink" title="矩阵操作"></a>Matrix Operations</h2><p><img src="/picture/image-20250224150915166.png" alt="image-20250224150915166"></p><table><tbody><tr><td class="code"><pre><span class="line"><span class="comment">%arithmetic sequence: starts at 1, increases by 2 each time (the step defaults to 1 when omitted), and does not exceed 20</span></span><br><span class="line"><span class="number">1</span>:<span class="number">2</span>:<span class="number">20</span></span><br><span class="line"></span><br><span class="line"><span class="comment">%random values between 0 and 1, 5 rows by 3 columns</span></span><br><span class="line">a=<span class="built_in">rand</span>(<span class="number">5</span>,<span class="number">3</span>)</span><br><span class="line"></span><br><span class="line"><span class="comment">%the value at row 1, column 2</span></span><br><span class="line">a(<span class="number">1</span>,<span class="number">2</span>)</span><br><span class="line"></span><br><span class="line"><span class="comment">%the values in the first column</span></span><br><span class="line">a(:,<span class="number">1</span>)</span><br><span class="line">a(<span class="number">1</span>:<span class="keyword">end</span>,<span class="number">1</span>)</span><br><span class="line"></span><br><span class="line"><span class="comment">%the values in the first row</span></span><br><span class="line">a(<span class="number">1</span>,:)</span><br><span class="line">a(<span class="number">1</span>,<span class="number">1</span>:<span 
class="keyword">end</span>)</span><br><span class="line"></span><br><span class="line"><span class="comment">%MATLAB stores matrices in column-major order, so a single linear index can also pick out an element</span></span><br><span class="line"><span class="comment">%the 7th value counting down the columns (row 2, column 2 of this 5-by-3 matrix)</span></span><br><span class="line">a(<span class="number">7</span>)</span><br><span class="line"></span><br><span class="line"><span class="comment">%take only columns 1 and 3</span></span><br><span class="line">a(:,[<span class="number">1</span>,<span class="number">3</span>])</span><br><span class="line"></span><br><span class="line"><span class="comment">%take only the elements where rows 2 and 4 cross columns 1 and 3</span></span><br><span class="line">a([<span class="number">2</span>,<span class="number">4</span>],[<span class="number">1</span>,<span class="number">3</span>])</span><br><span class="line"></span><br><span class="line"><span class="comment">%concatenating matrices</span></span><br><span class="line">a = <span class="built_in">rand</span>(<span class="number">1</span>,<span class="number">10</span>); </span><br><span class="line">b = <span class="built_in">rand</span>(<span class="number">1</span>,<span class="number">10</span>); </span><br><span class="line">c = [a b] </span><br><span class="line"><span class="comment">%horizontal concatenation: same as [a b], giving a 1-by-20 row</span></span><br><span class="line">c = [a, b]</span><br><span class="line"><span class="comment">%vertical concatenation: stacks a on top of b, giving a 2-by-10 matrix</span></span><br><span class="line">c = [a; b]</span><br></pre></td></tr></tbody></table><h2 id="简单的循环作业"><a href="#简单的循环作业" class="headerlink" title="简单的循环作业"></a>A Simple Loop Assignment</h2><p><img src="/picture/image-20250224154809419.png" alt="image-20250224154809419"></p><table><tbody><tr><td class="code"><pre><span class="line"><span class="comment">% Get the current directory</span></span><br><span class="line">currentDir = pwd;</span><br><span class="line"></span><br><span class="line"><span class="comment">% Path of the ori folder</span></span><br><span class="line">oriDir = fullfile(currentDir, <span class="string">'ori'</span>);</span><br><span class="line"></span><br><span class="line"><span class="comment">% 
Define the sub folder path</span></span><br><span class="line">subDir = fullfile(currentDir, <span class="string">'sub'</span>);</span><br><span class="line"></span><br><span class="line"><span class="comment">% Create the sub folder (commented out)</span></span><br><span class="line"><span class="comment">% if ~exist(subDir, 'dir')</span></span><br><span class="line"><span class="comment">%     mkdir(subDir);</span></span><br><span class="line"><span class="comment">% end</span></span><br><span class="line"></span><br><span class="line"><span class="comment">% List all PDF files in the ori folder</span></span><br><span class="line">pdfFiles = dir(fullfile(oriDir, <span class="string">'*.pdf'</span>))</span><br><span class="line"></span><br><span class="line"><span class="comment">% Loop over the PDF files</span></span><br><span class="line"><span class="keyword">for</span> <span class="built_in">i</span> = <span class="number">1</span>:<span class="built_in">length</span>(pdfFiles)</span><br><span class="line">    <span class="comment">% Name of the current PDF file</span></span><br><span class="line">    pdfName = pdfFiles(<span class="built_in">i</span>).name;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">% Extract the ID (assumes file names of the form 'ID.pdf')</span></span><br><span class="line">    [~, name, ~] = fileparts(pdfName);</span><br><span class="line">    folderName = name; <span class="comment">% the file name without its extension is taken as the ID</span></span><br><span class="line">    </span><br><span class="line">    <span class="comment">% Create a subfolder named after the ID</span></span><br><span class="line">    newFolder = fullfile(subDir, folderName);</span><br><span class="line">    <span class="keyword">if</span> ~exist(newFolder, <span class="string">'dir'</span>)</span><br><span class="line">        mkdir(newFolder);</span><br><span class="line">    <span class="keyword">end</span></span><br><span class="line">    </span><br><span class="line">    <span class="comment">% Move the PDF and rename it to report.pdf</span></span><br><span class="line">    sourceFile = fullfile(oriDir, pdfName);</span><br><span 
class="line">    destinationFile = fullfile(newFolder, <span class="string">'report.pdf'</span>);</span><br><span class="line">    movefile(sourceFile, destinationFile);</span><br><span class="line"><span class="keyword">end</span></span><br><span class="line"></span><br><span class="line"><span class="comment">% Caution: the next line deletes the sub folder together with every report.pdf just moved into it; uncomment only to clean up</span></span><br><span class="line"><span class="comment">% rmdir(subDir, 's');</span></span><br><span class="line"></span><br><span class="line"><span class="built_in">disp</span>(<span class="string">'操作完成'</span>);</span><br></pre></td></tr></tbody></table><p>The script runs.</p>]]></content>
    
    
    <summary type="html">安装参考B站资源和破解方法 https:&amp;#x2F;&amp;#x2F;www.bilibili.com&amp;#x2F;video&amp;#x2F;BV1DoAweWENJ&amp;#x2F;?spm_id_from&amp;#x3D;333.788.top_right_bar_window_default_collection.content.click 成功安装，可以使用 基本函数参考教</summary>
    
    
    
    <category term="生物信息学" scheme="https://song-xudong.github.io/categories/%E7%94%9F%E7%89%A9%E4%BF%A1%E6%81%AF%E5%AD%A6/"/>
    
    
    <category term="MATLAB" scheme="https://song-xudong.github.io/tags/MATLAB/"/>
    
    <category term="影像数据处理" scheme="https://song-xudong.github.io/tags/%E5%BD%B1%E5%83%8F%E6%95%B0%E6%8D%AE%E5%A4%84%E7%90%86/"/>
    
  </entry>
  
  <entry>
    <title>LLM大模型（一）</title>
    <link href="https://song-xudong.github.io/2025/01/16/LLM%E5%A4%A7%E6%A8%A1%E5%9E%8B%EF%BC%88%E4%B8%80%EF%BC%89/"/>
    <id>https://song-xudong.github.io/2025/01/16/LLM%E5%A4%A7%E6%A8%A1%E5%9E%8B%EF%BC%88%E4%B8%80%EF%BC%89/</id>
    <published>2025-01-16T05:55:12.000Z</published>
    <updated>2025-02-24T10:20:48.000Z</updated>
    
    <content type="html"><![CDATA[<h1 id="LLM大模型的概念"><a href="#LLM大模型的概念" class="headerlink" title="The concept of large language models (LLMs)"></a>The concept of large language models (LLMs)</h1><p>Reference video: <a href="https://www.bilibili.com/video/BV1XS411w7qr?spm_id_from=333.788.videopod.episodes&vd_source=b938c9620af06f4224f5fd4db315cbd4&p=2">https://www.bilibili.com/video/BV1XS411w7qr?spm_id_from&#x3D;333.788.videopod.episodes&amp;vd_source&#x3D;b938c9620af06f4224f5fd4db315cbd4&amp;p&#x3D;2</a></p><p><img src="/picture/image-20250116135716298.png" alt="image-20250116135716298"></p><ul><li>LLMs (large language models) are language models built with a very large number of parameters and large amounts of data; they can understand and generate natural-language text. They are usually based on deep learning, in particular the Transformer architecture.</li><li>LLMs are an application of AI to natural language processing (NLP), used for machine translation, text summarization, question answering and many other tasks.</li></ul><h2 id="生成式AI的概念"><a href="#生成式AI的概念" class="headerlink" title="The concept of generative AI"></a>The concept of generative AI</h2><p><img src="/picture/image-20250116143753854.png" alt="image-20250116143753854"></p><p>ChatGPT is one such AI system: it builds on the capabilities of an LLM and interacts with users through conversation.</p><p>Generative AI can be understood as an advanced application of machine learning and deep learning.</p><h1 id="生成式AI的使用"><a href="#生成式AI的使用" class="headerlink" title="Using generative AI"></a>Using generative AI</h1><h2 id="生成ChatGPT的API"><a href="#生成ChatGPT的API" class="headerlink" title="Creating a ChatGPT API key"></a>Creating a ChatGPT API key</h2><p>Paid, but a newly created account comes with some credit, which expires.</p><p><a href="https://platform.openai.com/settings/organization/api-keys">https://platform.openai.com/settings/organization/api-keys</a></p><p>Generate the key directly on the OpenAI website (requires access to the international internet).</p><p>Copy the key when it is generated, because it is shown only once.</p><p><img src="/picture/image-20250117173643868.png" alt="image-20250117173643868"></p><h2 id="生成Gemine的API"><a href="#生成Gemine的API" class="headerlink" title="Creating a Gemini API key"></a>Creating a Gemini API key</h2><p>Developed by Google; free.</p><p><a href="https://aistudio.google.com/app/apikey">https://aistudio.google.com/app/apikey</a></p><p>Do not share your API key with anyone.</p><p><img src="/picture/image-20250117174603168.png" alt="image-20250117174603168"></p><h1 id="在平台上使用ChatGPT-api"><a href="#在平台上使用ChatGPT-api" class="headerlink" title="Using the ChatGPT API on a platform"></a>Using the ChatGPT API on a platform</h1><p>Code reference: <a 
href="https://github.com/Hoper-J/AI-Guide-and-Demos-zh_CN/tree/master">https://github.com/Hoper-J/AI-Guide-and-Demos-zh_CN&#x2F;tree&#x2F;master</a></p><p>Developing generative AI is fairly complex and the models are not yet fully open source, so the first step for a beginner is to learn how to use ChatGPT.</p><p>Run the code on Colab: <a href="https://colab.research.google.com/drive/">https://colab.research.google.com/drive/</a></p><p>A VPN is required. ChatGPT cannot be reached directly from mainland China, so many errors are simply network problems.</p><h2 id="基本结构"><a href="#基本结构" class="headerlink" title="Basic structure"></a>Basic structure</h2><table><tbody><tr><td class="code"><pre><span class="line">!pip install openai</span><br><span class="line">!pip install gradio</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="comment">## Basic structure</span></span><br><span class="line"><span class="keyword">from</span> openai <span class="keyword">import</span> OpenAI</span><br><span class="line"><span class="keyword">import</span> openai</span><br><span class="line"><span class="keyword">import</span> gradio <span class="keyword">as</span> gr</span><br><span class="line"><span class="keyword">import</span> json</span><br><span class="line"><span class="keyword">from</span> typing <span class="keyword">import</span> <span class="type">List</span>, <span class="type">Dict</span>, <span class="type">Tuple</span></span><br><span class="line"></span><br><span class="line">client = OpenAI(</span><br><span class="line">    <span class="comment"># defaults to os.environ.get("OPENAI_API_KEY")</span></span><br><span class="line">    api_key=<span class="string">"自己的API"</span>,</span><br><span class="line">    base_url=<span class="string">"https://models.inference.ai.azure.com"</span></span><br><span class="line">    <span class="comment"># base_url="https://api.chatanywhere.org/v1"</span></span><br><span class="line">)</span><br><span class="line">response = client.chat.completions.create(</span><br><span class="line">        model=<span class="string">"gpt-4o"</span>,</span><br><span class="line">        messages=[{<span class="string">'role'</span>: 
<span class="string">'user'</span>, <span class="string">'content'</span>: <span class="string">'请告诉我关于机器学习的基本概念'</span>}],</span><br><span class="line">        max_tokens=<span class="number">100</span>,</span><br><span class="line">)</span><br><span class="line">message_content = response.choices[<span class="number">0</span>].message.content</span><br><span class="line"><span class="built_in">print</span>(message_content)</span><br></pre></td></tr></tbody></table><h1 id="使用-API-快速搭建你的第一个-AI-应用"><a href="#使用-API-快速搭建你的第一个-AI-应用" class="headerlink" title="Quickly building your first AI application with the API"></a>Quickly building your first AI application with the API</h1><h2 id="测试API"><a href="#测试API" class="headerlink" title="Testing the API"></a>Testing the API</h2><table><tbody><tr><td class="code"><pre><span class="line">!pip install openai</span><br><span class="line">!pip install gradio</span><br><span class="line"></span><br><span class="line">import os</span><br><span class="line">import json</span><br><span class="line">from typing import List, Dict, Tuple</span><br><span class="line"></span><br><span class="line">import openai</span><br><span class="line">import gradio as gr</span><br><span class="line"></span><br><span class="line"># TODO: set your own API key ("自己的API" is a placeholder); this demo uses the GitHub-hosted endpoint below</span><br><span class="line">OPENAI_API_KEY = "自己的API"</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">client = openai.OpenAI(</span><br><span class="line">    api_key=OPENAI_API_KEY,</span><br><span class="line">    base_url="https://models.inference.ai.azure.com",  # GitHub-hosted ChatGPT API endpoint</span><br><span class="line">)</span><br><span class="line"></span><br><span class="line"># Check whether the API is configured correctly</span><br><span class="line"># If everything works, you will see "API 设置成功！！"</span><br><span class="line">try:</span><br><span class="line">    response = client.chat.completions.create(</span><br><span class="line">            model="gpt-4o",  # gpt-4o or gpt-4o-mini both work; the quota is limited</span><br><span class="line">            messages=[{'role': 'user', 
'content': "测试"}],  # a minimal test message</span><br><span class="line">            max_tokens=1,</span><br><span class="line">    )</span><br><span class="line">    print("API 设置成功！！")  # report success</span><br><span class="line">except Exception as e:</span><br><span class="line">    print(f"API 可能有问题，请检查：{e}")  # print the detailed error</span><br></pre></td></tr></tbody></table><p>If the call succeeds, the output is:</p><p><img src="/picture/image-20250118155515388.png" alt="image-20250118155515388"></p><p>API 设置成功！！</p><h2 id="文章摘要（单轮对话应用）"><a href="#文章摘要（单轮对话应用）" class="headerlink" title="Article summarization (a single-turn application)"></a>Article summarization (a single-turn application)</h2><p>In this task you turn your chatbot into a <strong>summarizer</strong>: when the user enters an article, it summarizes the content.</p><p>Complete the following steps:</p><ol><li>Design a prompt for generating the summary and put it in <strong>prompt_for_summarization</strong>.</li><li><strong>Click the run button</strong>; an interactive interface appears.</li><li>Find an article, or use the current sample article 《从百草园到三味书屋》, and enter it in the input box labelled “文章” (article).</li><li>Click the “发送” (send) button to generate the summary. The “温度” (temperature) slider controls how creative the output is: the higher the temperature, the more creative the output.</li><li>If you <strong>want to change the prompt</strong>, stop the cell, go back to the TODO section, change it, and run the cell again.</li><li>Once you are satisfied with the result, click the “导出” (export) button to save it. A file named <strong>part1.json</strong> appears in the file list.</li></ol><p>Notes:</p><ul><li><strong>Clicking “导出” again overwrites the previous result.</strong></li><li><strong>Even with the same prompt, the output may differ between runs.</strong></li></ul><hr><p>Before running this cell, make sure you have run the <strong>package installation</strong> and <strong>imports and setup</strong> cells.</p><p><strong>Remember to stop this cell before moving on to the next step.</strong></p><table><tbody><tr><td class="code"><pre><span class="line"><span class="comment"># <span class="doctag">TODO:</span> enter the summarization prompt here</span></span><br><span class="line">prompt_for_summarization = <span class="string">"请将以下文章概括成几句话。"</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># Function that resets the conversation</span></span><br><span class="line"><span class="keyword">def</span> <span class="title function_">reset</span>() -&gt; <span class="type">List</span>:</span><br><span class="line">    <span class="keyword">return</span> []</span><br><span class="line"></span><br><span class="line"><span class="comment"># Function that calls the model to generate the summary</span></span><br><span 
class="keyword">def</span> <span class="title function_">interact_summarization</span>(<span class="params">prompt: <span class="built_in">str</span>, article: <span class="built_in">str</span>, temp=<span class="number">1.0</span></span>) -&gt; <span class="type">List</span>[<span class="type">Tuple</span>[<span class="built_in">str</span>, <span class="built_in">str</span>]]:</span><br><span class="line">    <span class="string">'''</span></span><br><span class="line"><span class="string">    * Parameters:</span></span><br><span class="line"><span class="string">      - prompt: the prompt used in this part</span></span><br><span class="line"><span class="string">      - article: the article to summarize</span></span><br><span class="line"><span class="string">      - temp: the model temperature; the higher it is, the more creative the response.</span></span><br><span class="line"><span class="string">    '''</span></span><br><span class="line">    user_message = <span class="string">f"<span class="subst">{prompt}</span>\n<span class="subst">{article}</span>"</span></span><br><span class="line">    response = client.chat.completions.create(</span><br><span class="line">        model=<span class="string">"gpt-4o"</span>,  <span class="comment"># served by the endpoint configured in client</span></span><br><span class="line">        messages=[{<span class="string">'role'</span>: <span class="string">'user'</span>, <span class="string">'content'</span>: user_message}],</span><br><span class="line">        temperature=temp,</span><br><span class="line">        max_tokens=<span class="number">200</span>,  <span class="comment"># note: this caps the length of the generated text</span></span><br><span class="line">    )</span><br><span class="line"></span><br><span class="line">    <span class="keyword">return</span> [(user_message, response.choices[<span class="number">0</span>].message.content)]</span><br><span class="line"></span><br><span class="line"><span class="comment"># Export the conversation to part1.json in the current folder</span></span><br><span 
class="line"><span class="keyword">def</span> <span class="title function_">export_summarization</span>(<span class="params">chatbot: <span class="type">List</span>[<span class="type">Tuple</span>[<span class="built_in">str</span>, <span class="built_in">str</span>]], article: <span class="built_in">str</span></span>) -&gt; <span class="literal">None</span>:</span><br><span class="line">    <span class="string">'''</span></span><br><span class="line"><span class="string">    * 参数:</span></span><br><span class="line"><span class="string">      - chatbot: 模型的对话记录，存储在元组列表中</span></span><br><span class="line"><span class="string">      - article: 需要摘要的文章</span></span><br><span class="line"><span class="string">    '''</span></span><br><span class="line">    target = {<span class="string">"chatbot"</span>: chatbot, <span class="string">"article"</span>: article}</span><br><span class="line">    <span class="keyword">with</span> <span class="built_in">open</span>(<span class="string">"part1.json"</span>, <span class="string">"w"</span>) <span class="keyword">as</span> file:</span><br><span class="line">        json.dump(target, file)</span><br><span class="line"></span><br><span class="line"><span class="comment"># 生成 Gradio 的UI界面</span></span><br><span class="line"><span class="keyword">with</span> gr.Blocks() <span class="keyword">as</span> demo:</span><br><span class="line">    gr.Markdown(<span class="string">"# 第1部分：摘要\n填写任何你喜欢的文章，让聊天机器人为你总结！"</span>)</span><br><span class="line">    chatbot = gr.Chatbot()</span><br><span class="line">    prompt_textbox = gr.Textbox(label=<span class="string">"提示词"</span>, value=prompt_for_summarization, visible=<span class="literal">False</span>)</span><br><span class="line">    article_textbox = gr.Textbox(label=<span class="string">"文章"</span>, interactive=<span class="literal">True</span>, value=<span class="string">"我家的后面有一个很大的园，相传叫作百草园。现在是早已并屋子一起卖给朱 文公的子孙了，连那最末次的相见也已经隔了七八年，其中似乎确凿只有一些野草 ；但那时却是我的乐园。 　　
不必说碧绿的菜畦，光滑的石井栏，高大的皂荚树，紫红的桑椹；也不必说鸣 蝉在树叶里长吟，肥胖的黄蜂伏在菜花上，轻捷的叫天子（云雀）忽然从草间直窜 向云霄里去了。单是周围的短短的泥墙根一带，就有无限趣味。油蛉在这里低唱， 蟋蟀们在这里弹琴。翻开断砖来，有时会遇见蜈蚣；还有斑蝥，倘若用手指按住它 的脊梁，便会拍的一声，从后窍喷出一阵烟雾。何首乌藤和木莲藤缠络着，木莲有 莲房一般的果实，何首乌有拥肿的根。有人说，何首乌根是有象人形的，吃了便可 以成仙，我于是常常拔它起来，牵连不断地拔起来，也曾因此弄坏了泥墙，却从来 没有见过有一块根象人样。如果不怕刺，还可以摘到覆盆子，象小珊瑚珠攒成的小 球，又酸又甜，色味都比桑椹要好得远。 　 　　长的草里是不去的，因为相传这园里有一条很大的赤练蛇。 　　长妈妈曾经讲给我一个故事听：先前，有一个读书人住在古庙里用功，晚间， 在院子里纳凉的时候，突然听到有人在叫他。答应着，四面看时，却见一个美女的 脸露在墙头上，向他一笑，隐去了。他很高兴；但竟给那走来夜谈的老和尚识破了 机关。说他脸上有些妖气，一定遇见“美女蛇”了；这是人首蛇身的怪物，能唤人 名，倘一答应，夜间便要来吃这人的肉的。他自然吓得要死，而那老和尚却道无妨 ，给他一个小盒子，说只要放在枕边，便可高枕而卧。他虽然照样办，却总是睡不 着，——当然睡不着的。到半夜，果然来了，沙沙沙！门外象是风雨声。他正抖作 一团时，却听得豁的一声，一道金光从枕边飞出，外面便什么声音也没有了，那金 光也就飞回来，敛在盒子里。后来呢？后来，老和尚说，这是飞蜈蚣，它能吸蛇的 脑髓，美女蛇就被它治死了。 　　结末的教训是：所以倘有陌生的声音叫你的名字，你万不可答应他。　　 　　这故事很使我觉得做人之险，夏夜乘凉，往往有些担心，不敢去看墙上，而且 极想得到一盒老和尚那样的飞蜈蚣。走到百草园的草丛旁边时，也常常这样想。但 直到现在，总还没有得到，但也没有遇见过赤练蛇和美女蛇。叫我名字的陌生声音 自然是常有的，然而都不是美女蛇。 　　冬天的百草园比较的无味；雪一下，可就两样了。拍雪人（将自己的全形印在 雪上）和塑雪罗汉需要人们鉴赏，这是荒园，人迹罕至，所以不相宜，只好来捕鸟 。薄薄的雪，是不行的；总须积雪盖了地面一两天，鸟雀们久已无处觅食的时候才 好。扫开一块雪，露出地面，用一支短棒支起一面大的竹筛来，下面撒些秕谷，棒 上系一条长绳，人远远地牵着，看鸟雀下来啄食，走到竹筛底下的时候，将绳子一 拉，便罩住了。但所得的是麻雀居多，也有白颊的“张飞鸟”，性子很躁，养不过 夜的。 　　这是闰土的父亲所传授的方法，我却不大能用。明明见它们进去了，拉了绳， 跑去一看，却什么都没有，费了半天力，捉住的不过三四只。闰土的父亲是小半天 便能捕获几十只，装在叉袋里叫着撞着的。我曾经问他得失的缘由，他只静静地笑 道：你太性急，来不及等它走到中间去。 　　我不知道为什么家里的人要将我送进书塾里去了，而且还是全城中称为最严厉 的书塾。也许是因为拔何首乌毁了泥墙罢，也许是因为将砖头抛到间壁的梁家去了 罢，也许是因为站在石井栏上跳下来罢，……都无从知道。总而言之：我将不能常 到百草园了。Ａｄｅ，我的蟋蟀们！Ａｄｅ，我的覆盆子们和木莲们！ 　　出门向东，不上半里，走过一道石桥，便是我的先生的家了。从一扇黑油的竹 门进去，第三间是书房。中间挂着一块扁道：三味书屋；扁下面是一幅画，画着一 只很肥大的梅花鹿伏在古树下。没有孔子牌位，我们便对着那扁和鹿行礼。第一次 算是拜孔子，第二次算是拜先生。 　　第二次行礼时，先生便和蔼地在一旁答礼。他是一个高而瘦的老人，须发都花 白了，还戴着大眼镜。我对他很恭敬，因为我早听到，他是本城中极方正，质朴， 博学的人。 　　不知从那里听来的，东方朔也很渊博，他认识一种虫，名曰“怪哉”，冤气所 化，用酒一浇，就消释了。我很想详细地知道这故事，但阿长是不知道的，因为她 毕竟不渊博。现在得到机会了，可以问先生。 　　“先生，‘怪哉’这虫，是怎么一回事？……”我上了生书，将要退下来的时 候，赶忙问。 　　“不知道！”他似乎很不高兴，脸上还有怒色了。 　　我才知道做学生是不应该问这些事的，只要读书，因为他是渊博的宿儒，决不 至于不知道，所谓不知道者，乃是不愿意说。年纪比我大的人，往往如此，我遇见 过好几回了。 　　我就只读书，正午习字，晚上对课。先生最初这几天对我很严厉，后来却好起 来了，不过给我读的书渐渐加多，对课也渐渐地加上字去，从三言到五言，终于到 七言。 　　三味书屋后面也有一个园，虽然小，但在那里也可以爬上花坛去折腊梅花，在 地上或桂花树上寻蝉蜕。最好的工作是捉了苍蝇喂蚂蚁，静悄悄地没有声音。然而 同窗们到园里的太多，太久，可就不行了，先生在书房里便大叫起来：—— 　　
“人都到那里去了？” 　　人们便一个一个陆续走回去；一同回去，也不行的。他有一条戒尺，但是不常 用，也有罚跪的规矩，但也不常用，普通总不过瞪几眼，大声道：—— 　　“读书！” 　　于是大家放开喉咙读一阵书，真是人声鼎沸。有念“仁远乎哉我欲仁斯仁至矣 ”的，有念“笑人齿缺曰狗窦大开”的，有念“上九潜龙勿用”的，有念“厥土下 上上错厥贡苞茅橘柚”的……先生自己也念书。后来，我们的声音便低下去，静下 去了，只有他还大声朗读着：—— 　　“铁如意，指挥倜傥，一座皆惊呢～～；金叵罗，颠倒淋漓噫，千杯未醉嗬～ ～……” 　　我疑心这是极好的文章，因为读到这里，他总是微笑起来，而且将头仰起，摇 着，向后面拗过去，拗过去。 　　先生读书入神的时候，于我们是很相宜的。有几个便用纸糊的盔甲套在指甲上 做戏。我是画画儿，用一种叫作“荆川纸”的，蒙在小说的绣像上一个个描下来， 象习字时候的影写一样。读的书多起来，画的画也多起来；书没有读成，画的成绩 却不少了，最成片断的是《荡寇志》和《西游记》的绣像，都有一大本。后来，因 为要钱用，卖给一个有钱的同窗了。他的父亲是开锡箔店的；听说现在自己已经做 了店主，而且快要升到绅士的地位了。这东西早已没有了罢。 　　　　　　　　　　　　　　　　　　 　　九月十八日。"</span>)</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">with</span> gr.Column():</span><br><span class="line">        gr.Markdown(<span class="string">"# 温度调节\n温度用于控制聊天机器人的输出。温度越高，响应越具创造性。"</span>)</span><br><span class="line">        temperature_slider = gr.Slider(<span class="number">0.0</span>, <span class="number">2.0</span>, <span class="number">1.0</span>, step=<span class="number">0.1</span>, label=<span class="string">"温度"</span>)</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">with</span> gr.Row():</span><br><span class="line">        sent_button = gr.Button(value=<span class="string">"发送"</span>)</span><br><span class="line">        reset_button = gr.Button(value=<span class="string">"重置"</span>)</span><br><span class="line"></span><br><span class="line">    <span class="keyword">with</span> gr.Column():</span><br><span class="line">        gr.Markdown(<span class="string">"# 保存结果\n当你对结果满意后，点击导出按钮保存结果。"</span>)</span><br><span class="line">        export_button = gr.Button(value=<span class="string">"导出"</span>)</span><br><span class="line">    </span><br><span class="line">    <span class="comment"># 连接按钮与函数</span></span><br><span class="line">    sent_button.click(interact_summarization, inputs=[prompt_textbox, article_textbox, temperature_slider], outputs=[chatbot])</span><br><span class="line">    reset_button.click(reset, 
outputs=[chatbot])</span><br><span class="line">    export_button.click(export_summarization, inputs=[chatbot, article_textbox])</span><br><span class="line"></span><br><span class="line"><span class="comment"># 启动 Gradio 界面</span></span><br><span class="line">demo.launch(debug=<span class="literal">True</span>)</span><br></pre></td></tr></tbody></table><h3 id="检查并打印你的结果"><a href="#检查并打印你的结果" class="headerlink" title="检查并打印你的结果"></a>检查并打印你的结果</h3><table><tbody><tr><td class="code"><pre><span class="line"># 加载对话记录的 JSON 文件</span><br><span class="line">with open("part1.json", "r") as f:</span><br><span class="line">    context = json.load(f)</span><br><span class="line"></span><br><span class="line">chatbot = context['chatbot']  # 获取对话记录</span><br><span class="line">article = context['article']  # 获取原始文章</span><br><span class="line">summarization = chatbot[0][-1]  # 获取摘要结果</span><br><span class="line"></span><br><span class="line"># 生成 Gradio 的UI界面</span><br><span class="line">with gr.Blocks() as demo:</span><br><span class="line">    gr.Markdown("# 第1部分：摘要\n你可以查看文章和摘要！")</span><br><span class="line">    chatbot = gr.Chatbot(value=context['chatbot'])  # 加载对话历史</span><br><span class="line">    article_textbox = gr.Textbox(label="文章", interactive=False, value=context['article'])  # 显示原始文章</span><br><span class="line"></span><br><span class="line">    # 构建展示摘要和原文的部分</span><br><span class="line">    with gr.Column():</span><br><span class="line">        gr.Markdown("# 只是一个检查")</span><br><span class="line">        gr.Textbox(label="文章", value=article, show_copy_button=True)  # 显示并允许复制原文</span><br><span class="line">        gr.Textbox(label="摘要", value=summarization, show_copy_button=True)  # 显示并允许复制摘要</span><br><span class="line"></span><br><span class="line"># 启动 Gradio 界面</span><br><span class="line">demo.launch(debug=True)</span><br></pre></td></tr></tbody></table><h2 id="第2部分：角色扮演（多轮对话应用）"><a href="#第2部分：角色扮演（多轮对话应用）" class="headerlink" 
title="第2部分：角色扮演（多轮对话应用）"></a>第2部分：角色扮演（多轮对话应用）</h2><p>在此任务中，你需要将聊天机器人设定为<strong>角色扮演模式</strong>。你应该为它指定一个角色，然后通过提示让它进入该角色的状态。</p><p>你需要完成以下步骤：</p><ol><li>想出一个你希望聊天机器人扮演的<strong>角色</strong>，以及一个使聊天机器人进入该角色的提示词。在 <strong>character_for_chatbot</strong> 中填写角色，在 <strong>prompt_for_roleplay</strong> 中填写提示词。</li><li><strong>点击运行按钮，界面将弹出一个可交互的界面。</strong></li><li><strong>与聊天机器人进行</strong> 2 轮 <strong>互动</strong>。在标为“输入”的框中输入你想说的话，然后点击“发送”按钮。（你可以使用“温度”滑块来控制输出的创造性。）</li><li>如果你<strong>想更改提示词或角色</strong>，可以停止单元格，返回TODO重新设置，然后重新运行单元格。</li><li>在你获得满意的结果后，点击“导出”按钮保存结果。文件列表中将出现一个名为 <strong>part2.json</strong> 的文件。</li></ol><p>注意：</p><ul><li><strong>如果你再次点击“导出”按钮，之前的结果将被覆盖。</strong></li><li><strong>即使使用相同的提示词，输出的结果可能仍然不同。</strong></li></ul><hr><p>在运行此单元格之前，请确保已运行 <strong>安装包</strong> 和 <strong>导入与设置</strong>。</p><p><strong>记得在进行下一步前停止此单元格。</strong></p><table><tbody><tr><td class="code"><pre><span class="line"># TODO: 填写以下两行：character_for_chatbot 和 prompt_for_roleplay</span><br><span class="line"># 第一个是你希望聊天机器人扮演的角色（注意，真正起作用的实际是prompt）</span><br><span class="line"># 第二个是使聊天机器人扮演某个角色的提示词</span><br><span class="line">character_for_chatbot = "面试官"</span><br><span class="line">prompt_for_roleplay = "我需要你面试我有关AI的知识，仅提出问题"</span><br><span class="line"></span><br><span class="line"># 清除对话的函数</span><br><span class="line">def reset() -&gt; List:</span><br><span class="line">    return []</span><br><span class="line"></span><br><span class="line"># 调用模型生成对话的函数</span><br><span class="line">def interact_roleplay(chatbot: List[Tuple[str, str]], user_input: str, temp=1.0) -&gt; List[Tuple[str, str]]:</span><br><span class="line">    '''</span><br><span class="line">    * 参数:</span><br><span class="line"></span><br><span class="line">      - user_input: 每轮对话中的用户输入</span><br><span class="line"></span><br><span class="line">      - temp: 模型的温度参数。温度用于控制聊天机器人的输出。温度越高，响应越具创造性。</span><br><span class="line"></span><br><span class="line">    '''</span><br><span class="line">    try:</span><br><span 
class="line">        messages = []</span><br><span class="line">        for input_text, response_text in chatbot:</span><br><span class="line">            messages.append({'role': 'user', 'content': input_text})</span><br><span class="line">            messages.append({'role': 'assistant', 'content': response_text})</span><br><span class="line"></span><br><span class="line">        messages.append({'role': 'user', 'content': user_input})</span><br><span class="line"></span><br><span class="line">        response = client.chat.completions.create(</span><br><span class="line">            model="gpt-4o",  # github模型</span><br><span class="line">            messages=messages,  # 包含用户的输入和对话历史</span><br><span class="line">            temperature=temp,  # 使用温度参数控制创造性</span><br><span class="line">            max_tokens=200,  # 控制输出的最大 token 数量</span><br><span class="line">        )</span><br><span class="line">        chatbot.append((user_input, response.choices[0].message.content))</span><br><span class="line"></span><br><span class="line">    except Exception as e:</span><br><span class="line">        print(f"发生错误：{e}")</span><br><span class="line">        chatbot.append((user_input, f"抱歉，发生了错误：{e}"))</span><br><span class="line">    return chatbot</span><br><span class="line"></span><br><span class="line"># 导出整个对话记录的函数</span><br><span class="line">def export_roleplay(chatbot: List[Tuple[str, str]], description: str) -&gt; None:</span><br><span class="line">    '''</span><br><span class="line">    * 参数:</span><br><span class="line"></span><br><span class="line">      - chatbot: 模型的对话记录，存储在元组列表中</span><br><span class="line"></span><br><span class="line">      - description: 此任务的描述</span><br><span class="line"></span><br><span class="line">    '''</span><br><span class="line">    target = {"chatbot": chatbot, "description": description}</span><br><span class="line">    with open("part2.json", "w") as file:</span><br><span class="line">        json.dump(target, 
file)</span><br><span class="line"></span><br><span class="line"># 进行第一次对话</span><br><span class="line">first_dialogue = interact_roleplay([], prompt_for_roleplay)</span><br><span class="line"></span><br><span class="line"># 生成 Gradio 的UI界面</span><br><span class="line">with gr.Blocks() as demo:</span><br><span class="line">    gr.Markdown(f"# 第2部分：角色扮演\n聊天机器人想和你玩一个角色扮演游戏，试着与它互动吧！")</span><br><span class="line">    chatbot = gr.Chatbot(value=first_dialogue)</span><br><span class="line">    description_textbox = gr.Textbox(label="机器人扮演的角色", interactive=False, value=f"{character_for_chatbot}")</span><br><span class="line">    input_textbox = gr.Textbox(label="输入", value="")</span><br><span class="line">    </span><br><span class="line">    with gr.Column():</span><br><span class="line">        gr.Markdown("# 温度调节\n温度用于控制聊天机器人的输出。温度越高，响应越具创造性。")</span><br><span class="line">        temperature_slider = gr.Slider(0.0, 2.0, 1.0, step=0.1, label="温度")</span><br><span class="line">    </span><br><span class="line">    with gr.Row():</span><br><span class="line">        sent_button = gr.Button(value="发送")</span><br><span class="line">        reset_button = gr.Button(value="重置")</span><br><span class="line">    </span><br><span class="line">    with gr.Column():</span><br><span class="line">        gr.Markdown("# 保存结果\n当你对结果满意后，点击导出按钮保存结果。")</span><br><span class="line">        export_button = gr.Button(value="导出")</span><br><span class="line"></span><br><span class="line">    # 连接按钮与函数</span><br><span class="line">    sent_button.click(interact_roleplay, inputs=[chatbot, input_textbox, temperature_slider], outputs=[chatbot])</span><br><span class="line">    reset_button.click(reset, outputs=[chatbot])</span><br><span class="line">    export_button.click(export_roleplay, inputs=[chatbot, description_textbox])</span><br><span class="line"></span><br><span class="line"># 启动 Gradio 界面</span><br><span 
class="line">demo.launch(debug=True)</span><br></pre></td></tr></tbody></table><h1 id="API测试"><a href="#API测试" class="headerlink" title="API测试"></a>API测试</h1><p>以chatGPT为例</p><table><tbody><tr><td class="code"><pre><span class="line"><span class="comment">#!pip install openai</span></span><br><span class="line"><span class="comment">#!pip install gradio</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> os</span><br><span class="line"><span class="keyword">import</span> json</span><br><span class="line"><span class="keyword">from</span> typing <span class="keyword">import</span> <span class="type">List</span>, <span class="type">Dict</span>, <span class="type">Tuple</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> openai</span><br><span class="line"><span class="keyword">import</span> gradio <span class="keyword">as</span> gr</span><br><span class="line"></span><br><span class="line"><span class="comment">###################################</span></span><br><span class="line"><span class="comment">#这里输入API和对应的网站</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line">OPENAI_API_KEY = <span class="string">"自己的API"</span></span><br><span class="line">OPENAI_API_WEB = <span class="string">"https://api.chatanywhere.tech"</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># 不设置则默认使用环境变量</span></span><br><span class="line"><span class="keyword">if</span> <span class="keyword">not</span> OPENAI_API_KEY:</span><br><span class="line">    OPENAI_API_KEY = os.getenv(<span class="string">'OPENAI_API_KEY'</span>)</span><br><span class="line"></span><br><span class="line">client = openai.OpenAI(</span><br><span class="line">    api_key=OPENAI_API_KEY,</span><br><span class="line">    base_url=OPENAI_API_WEB,</span><br><span class="line">)</span><br><span class="line"></span><br><span 
class="line"><span class="comment"># Check whether the API is configured correctly</span></span><br><span class="line"><span class="comment"># If everything works, you will see "API 设置成功！！"</span></span><br><span class="line"><span class="keyword">try</span>:</span><br><span class="line">    response = client.chat.completions.create(</span><br><span class="line">        model=<span class="string">"gpt-4o-mini"</span>,</span><br><span class="line">        messages=[{<span class="string">"role"</span>: <span class="string">"user"</span>, <span class="string">"content"</span>: <span class="string">"测试"</span>}],  <span class="comment"># a minimal test message</span></span><br><span class="line">        max_tokens=<span class="number">1</span>,</span><br><span class="line">    )</span><br><span class="line">    <span class="built_in">print</span>(<span class="string">"API 设置成功！！"</span>)  <span class="comment"># report success</span></span><br><span class="line"><span class="keyword">except</span> Exception <span class="keyword">as</span> e:</span><br><span class="line">    <span class="built_in">print</span>(<span class="string">f"API 可能有问题，请检查：<span class="subst">{e}</span>"</span>)  <span class="comment"># print the detailed error</span></span><br></pre></td></tr></tbody></table><p>If the API is configured correctly, it returns:</p><p><img src="/picture/image-20250118190732067.png" alt="image-20250118190732067"></p><h1 id="API"><a href="#API" class="headerlink" title="API"></a>API</h1><h2 id="github的api"><a href="#github的api" class="headerlink" title="GitHub API"></a>GitHub API</h2><p>Free, but the number of calls is limited.</p><p><a href="https://github.com/marketplace/models/azure-openai/gpt-4o">https://github.com/marketplace/models/azure-openai/gpt-4o</a></p><p><code>base_url="https://models.inference.ai.azure.com"</code></p><h2 id="免费api"><a href="#免费api" class="headerlink" title="Free API"></a>Free API</h2><p>The free tier supports gpt-3.5-turbo, embedding, gpt-4o-mini and gpt-4. Because gpt-4 is expensive, it is limited to 3 calls per day (reset at 00:00). For faster, more stable gpt-4 access, use the paid tier.</p><p><a 
href="https://github.com/chatanywhere/GPT_API_free?tab=readme-ov-file">https://github.com/chatanywhere/GPT_API_free?tab&#x3D;readme-ov-file</a></p><ul><li><strong>Relay host 1: <code>https://api.chatanywhere.tech</code> (relay inside mainland China, lower latency)</strong></li><li><strong>Relay host 2: <code>https://api.chatanywhere.org</code> (for use outside China)</strong></li></ul>]]></content>
    
    
    <summary type="html">LLM大模型的概念参考视频：https:&amp;#x2F;&amp;#x2F;www.bilibili.com&amp;#x2F;video&amp;#x2F;BV1XS411w7qr?spm_id_from&amp;#x3D;333.788.videopod.episodes&amp;amp;vd_source&amp;#x3D;b938c9620af06f4224f5fd4db315cbd4&amp;amp;p&amp;#</summary>
    
    
    
    <category term="生物信息学" scheme="https://song-xudong.github.io/categories/%E7%94%9F%E7%89%A9%E4%BF%A1%E6%81%AF%E5%AD%A6/"/>
    
    
    <category term="AI" scheme="https://song-xudong.github.io/tags/AI/"/>
    
    <category term="机器学习" scheme="https://song-xudong.github.io/tags/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0/"/>
    
  </entry>
  
  <entry>
    <title>National Scholarship</title>
    <link href="https://song-xudong.github.io/2025/01/14/%E5%9B%BD%E5%AE%B6%E5%A5%96%E5%AD%A6%E9%87%91/"/>
    <id>https://song-xudong.github.io/2025/01/14/%E5%9B%BD%E5%AE%B6%E5%A5%96%E5%AD%A6%E9%87%91/</id>
    <published>2025-01-14T07:04:28.000Z</published>
    <updated>2025-01-14T07:33:03.000Z</updated>
    
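    <content type="html"><![CDATA[<h1 id="学生生涯的最大奖项"><a href="#学生生涯的最大奖项" class="headerlink" title="学生生涯的最大奖项"></a>The Biggest Award of My Student Years</h1><p>Receiving the National Scholarship for Graduate Students is a great honor for me. I had originally hoped only to graduate without incident, which would already have been a satisfying end to my master's studies.</p><p>Unexpectedly, the paper I wrote with my senior labmate Xiao happened to be published over the summer break. With the two of us as first and second authors of a Q2 SCI paper, I made it into the ranks of National Scholarship recipients and went on to receive several other honors because of it.</p><p>It has given my family something to look forward to and has been an encouragement after my years of study!</p><p>Keep working hard! 😊</p><p>Next goals: graduate smoothly + apply for a PhD</p><p><img src="/picture/guojiang-17368395517361.jpg" alt="guojiang"></p>]]></content>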
    
    
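    <summary type="html">The Biggest Award of My Student Years: Receiving the National Scholarship for Graduate Students is a great honor for me. I had originally hoped only to graduate without incident, which would already have been a satisfying end to my master's studies. Unexpectedly, the paper I wrote with my senior labmate Xiao happened to be published over the summer break. With the two of us as first and second authors of a Q2 SCI paper, I made it into the ranks of National Scholarship recipients and went on to receive several other honors because of it. It has given my family something to look forward to and has been an encouragement after my years of study</summary>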
    
    
    
    <category term="生活" scheme="https://song-xudong.github.io/categories/%E7%94%9F%E6%B4%BB/"/>
    
    
    <category term="大学" scheme="https://song-xudong.github.io/tags/%E5%A4%A7%E5%AD%A6/"/>
    
    <category term="获奖" scheme="https://song-xudong.github.io/tags/%E8%8E%B7%E5%A5%96/"/>
    
  </entry>
  
  <entry>
    <title>Metagenomic Analysis</title>
    <link href="https://song-xudong.github.io/2024/11/22/%E5%AE%8F%E5%9F%BA%E5%9B%A0%E7%BB%84/"/>
    <id>https://song-xudong.github.io/2024/11/22/%E5%AE%8F%E5%9F%BA%E5%9B%A0%E7%BB%84/</id>
    <published>2024-11-22T05:54:04.000Z</published>
    <updated>2024-11-22T06:05:24.000Z</updated>
    
    <content type="html"><![CDATA[<p>Start from the rna-seq environment</p><table><tbody><tr><td class="code"><pre><span class="line">conda activate rna_p3      </span><br></pre></td></tr></tbody></table><p>Create the metagenomic analysis environment</p><table><tbody><tr><td class="code"><pre><span class="line">conda create -n metagenomic</span><br><span class="line"></span><br><span class="line">conda activate metagenomic</span><br></pre></td></tr></tbody></table><p>Install kneaddata</p><table><tbody><tr><td class="code"><pre><span class="line">conda install -c biobakery kneaddata</span><br></pre></td></tr></tbody></table><h1 id="下载数据"><a href="#下载数据" class="headerlink" title="下载数据"></a>Downloading the Data</h1><p>Reference: <a href="https://blog.csdn.net/Mr_pork/article/details/139743229">https://blog.csdn.net/Mr_pork&#x2F;article&#x2F;details&#x2F;139743229</a></p><p>This is a human colorectal cancer metagenomic dataset; we pick 10 of its samples for analysis.</p><p>The run accessions are needed as a list. Download the data with prefetch, which is already available from the rna-seq workflow.</p><p>SRA.txt</p><p><img src="/picture/image-20240831151004127.png" alt="image-20240831151004127"></p><p>Metadata of the selected samples</p><p><img src="/picture/image-20240831151030690.png" alt="image-20240831151030690"></p><table><tbody><tr><td class="code"><pre><span class="line">nohup prefetch -f no --option-file SRA.txt &amp;</span><br></pre></td></tr></tbody></table><p>Add -O to choose the output path</p><ul><li><code>-O|--output-directory &lt;directory&gt;</code>: directory in which to save the files.</li></ul><p>I ran the download command directly on the HPC cluster rather than submitting it as an sbatch job.</p><h3 id="查看后台任务"><a href="#查看后台任务" class="headerlink" title="查看后台任务"></a>Checking Background Jobs</h3><table><tbody><tr><td class="code"><pre><span class="line">jobs</span><br><span class="line"></span><br><span class="line"># or</span><br><span class="line">ps -f</span><br></pre></td></tr></tbody></table><h1 id="SRA文件转为FASTQ格式"><a href="#SRA文件转为FASTQ格式" class="headerlink" title="SRA文件转为FASTQ格式"></a>Converting SRA Files to FASTQ</h1><h2 id="单个转格式"><a href="#单个转格式" class="headerlink" title="单个转格式"></a>Converting a Single File</h2><p>The slow converter; it is too slow and not recommended.</p><table><tbody><tr><td class="code"><pre><span class="line"># Convert the SRR12207279 record in the current directory</span><br><span class="line">fastq-dump --split-3 
--gzip ./SRR12207279</span><br></pre></td></tr></tbody></table><p>Use this one instead: it converts with multiple threads and writes uncompressed .fastq files.</p><table><tbody><tr><td class="code"><pre><span class="line">fasterq-dump --split-3 ./SRR12207283</span><br></pre></td></tr></tbody></table><h2 id="批量转格式"><a href="#批量转格式" class="headerlink" title="批量转格式"></a>Batch Conversion</h2><p>A handy one-liner: move the files out of every subfolder of ./SRR into ./SRR itself, then delete the emptied subfolders.</p><table><tbody><tr><td class="code"><pre><span class="line">find ./SRR -mindepth 1 -type d -exec sh -c 'mv "$1"/* ./SRR/ &amp;&amp; rmdir "$1"' _ {} \;</span><br></pre></td></tr></tbody></table><p>Batch conversion with fasterq-dump; put all .sra files in the SRR folder first.</p><table><tbody><tr><td class="code"><pre><span class="line"><span class="comment"># Input directory</span></span><br><span class="line">sra_dir<span class="operator">=</span><span class="string">"./SRR"</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># Output directory</span></span><br><span class="line">output_dir<span class="operator">=</span><span class="string">"./fastq-result"</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># Loop over every .sra file in the input directory</span></span><br><span class="line"><span class="keyword">for</span> sra_file <span class="keyword">in</span> <span class="operator">$</span>sra_dir<span class="operator">/</span><span class="operator">*</span>.sra</span><br><span class="line">do</span><br><span class="line">    <span class="comment"># File name without the path or the .sra extension</span></span><br><span class="line">    filename<span class="operator">=</span><span class="operator">$</span><span class="punctuation">(</span>basename <span class="string">"$sra_file"</span> .sra<span class="punctuation">)</span></span><br><span class="line">    </span><br><span class="line">    <span class="comment"># Convert each file with fasterq-dump</span></span><br><span class="line">    fasterq<span class="operator">-</span>dump <span class="operator">-</span><span class="operator">-</span>outdir <span class="string">"$output_dir"</span> <span class="operator">-</span><span 
class="operator">-</span>split<span class="operator">-</span><span class="number">3</span> <span class="string">"$sra_file"</span></span><br><span class="line">done</span><br></pre></td></tr></tbody></table><p>The fastq files are written to .&#x2F;fastq-result under the current path.</p><h1 id="质控"><a href="#质控" class="headerlink" title="质控"></a>Quality Control</h1><p>These are ordinary sequence files, so the QC workflow is much the same as for transcriptome data.</p><h2 id="fastp质控"><a href="#fastp质控" class="headerlink" title="fastp质控"></a>QC with fastp</h2><p>This part is supplementary material.</p><p>Install fastp with conda</p><table><tbody><tr><td class="code"><pre><span class="line"># note: the fastp version in bioconda may not be the latest</span><br><span class="line">conda install -c bioconda fastp</span><br></pre></td></tr></tbody></table><p>Single sample</p><p>Pass the paired-end input files with -i and -I; -o and -O name the cleaned output files. fastp also writes a JSON report (fastp.json) and an HTML report (fastp.html).</p><table><tbody><tr><td class="code"><pre><span class="line">fastp -i in.R1.fq.gz -I in.R2.fq.gz -o out.R1.fq.gz -O out.R2.fq.gz</span><br></pre></td></tr></tbody></table><p>Batch</p><table><tbody><tr><td class="code"><pre><span class="line"># Create the folder for the cleaned files</span><br><span class="line">mkdir  clean-fastp</span><br><span class="line"></span><br><span class="line"># Change to the directory that holds the fastq files</span><br><span class="line">cd ./fastq-result/</span><br><span class="line"></span><br><span class="line"># Loop over every file ending in _1.fastq</span><br><span class="line">for file1 in *_1.fastq; do</span><br><span class="line">    # Sample name: the file name without the _1.fastq suffix</span><br><span class="line">    base=$(basename "$file1" _1.fastq)</span><br><span class="line">    </span><br><span class="line">    # Names of the mate file and of the outputs</span><br><span class="line">    file2="${base}_2.fastq"</span><br><span class="line">    fileoo1="${base}_1.fq"</span><br><span class="line">    fileoo2="${base}_2.fq"</span><br><span class="line">    jsono="${base}.json"</span><br><span class="line">    htmlo="${base}.html"</span><br><span class="line"></span><br><span class="line">    </span><br><span class="line">    # Run the check and the fastp command in the background</span><br><span class="line">    (</span><br><span class="line">        # 
Check whether the matching _2.fastq file exists</span><br><span class="line">        if [ -e "$file2" ]; then</span><br><span class="line">            # If it exists, run fastp</span><br><span class="line">            fastp -i "$file1"  -o ../clean-fastp/"$fileoo1" -I "$file2" -O ../clean-fastp/"$fileoo2"  --json  ../clean-fastp/"$jsono"  --html  ../clean-fastp/"$htmlo"</span><br><span class="line">        else</span><br><span class="line">            # Otherwise print an error message</span><br><span class="line">            echo "Error: No matching file found for $file1"</span><br><span class="line">        fi</span><br><span class="line">    ) &amp;</span><br><span class="line">done</span><br><span class="line"></span><br><span class="line"># Wait for all background processes to finish</span><br><span class="line">wait</span><br></pre></td></tr></tbody></table><h2 id="multiqc汇总质控结果"><a href="#multiqc汇总质控结果" class="headerlink" title="multiqc汇总质控结果"></a>Summarizing QC Results with multiqc</h2><table><tbody><tr><td class="code"><pre><span class="line"># Run from the project root; the fastp JSON reports are in clean-fastp</span><br><span class="line">multiqc ./clean-fastp/ -o   ./clean-fastp/</span><br></pre></td></tr></tbody></table><p>Reviewing this report is optional: fastp is a robust QC tool, and as long as the input files are fine the results should be too.</p><p>The files ending in .fq are the cleaned sequence files used in the rest of the analysis.</p><p><img src="/picture/image-20240831181023502.png" alt="image-20240831181023502"></p><h1 id="去宿主-方法一"><a href="#去宿主-方法一" class="headerlink" title="去宿主-方法一"></a>Host Removal, Method 1</h1><p>Host removal simply means aligning the reads to the host genome and collecting the reads that do not align into new files; those files are the host-depleted data. The host genome has to be downloaded beforehand and indexed with bowtie2-build. Taking human as the example:</p><h2 id="构建索引"><a href="#构建索引" class="headerlink" title="构建索引"></a>Building the Index</h2><p>Find your species on the official site <a href="https://hgdownload.soe.ucsc.edu/downloads.html">https://hgdownload.soe.ucsc.edu/downloads.html</a></p><p>Human genome hg38</p><p><a href="http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.chromFa.tar.gz">http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.chromFa.tar.gz</a></p><p>Mouse genome mm10</p><p><a href="http://hgdownload.cse.ucsc.edu/goldenPath/mm10/bigZips/chromFa.tar.gz">http://hgdownload.cse.ucsc.edu/goldenPath/mm10/bigZips/chromFa.tar.gz</a></p><table><tbody><tr><td 
class="code"><pre><span class="line">wget http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.chromFa.tar.gz</span><br><span class="line">tar -zxvf hg38.chromFa.tar.gz </span><br><span class="line"># the hg38 archive unpacks into a chroms/ directory</span><br><span class="line">cat chroms/*.fa &gt; hg38.fa</span><br><span class="line">bowtie2-build hg38.fa hg38</span><br></pre></td></tr></tbody></table><p>Building the index yourself is rather slow.</p><h2 id="bowtie2比对-单个"><a href="#bowtie2比对-单个" class="headerlink" title="bowtie2比对,单个"></a>bowtie2 Alignment, Single Sample</h2><p>Reference: <a href="https://www.jianshu.com/p/fe9c5cc7373e">https://www.jianshu.com/p/fe9c5cc7373e</a></p><table><tbody><tr><td class="code"><pre><span class="line">bowtie2 -p 20 -x /public/home/dk_szy/songxudong/metagenomic/db/hg37dec_v0.1 -1 data/fastp/${sample}_1.fq.gz \</span><br><span class="line">    -2 data/fastp/${sample}_2.fq.gz -S data/rm_human/${sample}.sam \</span><br><span class="line">    --un-conc data/rm_human/${sample}.fq --very-sensitive</span><br><span class="line">  rm data/rm_human/${sample}.sam</span><br></pre></td></tr></tbody></table><h2 id="bowtie2比对-批量"><a href="#bowtie2比对-批量" class="headerlink" title="bowtie2比对,批量"></a>bowtie2 Alignment, Batch</h2><table><tbody><tr><td class="code"><pre><span class="line"># Create the output folder, then change to the directory that holds the cleaned fastq files</span><br><span class="line">mkdir rm_human</span><br><span class="line"></span><br><span class="line">cd ./clean-fastp/</span><br><span class="line"></span><br><span class="line"># File-name suffixes of the paired reads (adjust to your data)</span><br><span class="line">file1_pattern="_1.fq"</span><br><span class="line">file2_pattern="_2.fq"</span><br><span class="line"></span><br><span class="line"># Loop over every file ending with the first suffix</span><br><span class="line">for file1 in *${file1_pattern}; do</span><br><span class="line">    # Sample name: the file name with the suffix removed</span><br><span class="line">    base=$(basename "$file1" ${file1_pattern})</span><br><span class="line">    # Name of the mate file, built from the second suffix</span><br><span class="line">    file2="${base}${file2_pattern}"</span><br><span class="line"></span><br><span class="line">    # Run the check and the bowtie2 command in the background</span><br><span 
class="line">    (</span><br><span class="line">        # Check whether the mate file exists</span><br><span class="line">        if [ -e "$file2" ]; then</span><br><span class="line">            echo "Sample $base: found $file1 paired with $file2"</span><br><span class="line">            # If it exists, run bowtie2; --un-conc writes the unaligned pairs to ${base}.1.fq and ${base}.2.fq</span><br><span class="line">            bowtie2 -p 20 -x /public/home/dk_szy/songxudong/metagenomic/db/hg37dec_v0.1 -1 "$file1" \</span><br><span class="line">                -2  "$file2" -S ${base}.sam \</span><br><span class="line">                --un-conc ../rm_human/${base}.fq --very-sensitive</span><br><span class="line">            rm ${base}.sam</span><br><span class="line">        else</span><br><span class="line">            # Otherwise print an error message</span><br><span class="line">            echo "Error: no file matching $file1 was found"</span><br><span class="line">        fi</span><br><span class="line">    ) &amp;</span><br><span class="line">done</span><br><span class="line"></span><br><span class="line"># Wait for all background processes to finish</span><br><span class="line">wait</span><br></pre></td></tr></tbody></table><h2 id="使用hisat2直接用构建好的索引进行比对"><a href="#使用hisat2直接用构建好的索引进行比对" class="headerlink" title="#使用hisat2直接用构建好的索引进行比对"></a>Aligning with hisat2 Against a Prebuilt Index</h2><p>An attempt at aligning against an index that has already been built.</p><table><tbody><tr><td class="code"><pre><span class="line"></span><br><span class="line">mkdir -p ./align/flag</span><br><span class="line">cd ./align/</span><br><span class="line">pwd</span><br><span class="line"></span><br><span class="line"># Location of the reference genome index</span><br><span class="line">index='/public/home/dk_szy/songxudong/rna-test/reference/human-UCSC-hg38/hg38/genome'</span><br><span class="line"></span><br><span class="line"># Directory that holds the cleaned fastq files</span><br><span class="line">fastq_dir="../clean-fastp"</span><br><span class="line"></span><br><span class="line"># Loop over every _1.fq file in that directory</span><br><span class="line">for file1 in $fastq_dir/*_1.fq; do</span><br><span class="line">    # 
Extract the sample ID by stripping the _1.fq suffix</span><br><span class="line">    id=$(basename "$file1" _1.fq)</span><br><span class="line">    </span><br><span class="line">    # Path of the matching _2.fq file</span><br><span class="line">    file2="$fastq_dir/${id}_2.fq"</span><br><span class="line">    </span><br><span class="line">    # Check whether the _2.fq file exists</span><br><span class="line">    if [ -f "$file2" ]; then</span><br><span class="line">        echo "${id}: running hisat2"</span><br><span class="line">        </span><br><span class="line">        # Align with hisat2; output goes to the current directory (./align/)</span><br><span class="line">        hisat2 -t -p 20  -x $index \</span><br><span class="line">            -1 "$file1" \</span><br><span class="line">            -2 "$file2"  -S  "${id}.sam" </span><br><span class="line">        </span><br><span class="line">        # Convert sam to bam and remove the sam; output stays in the current directory (./align/)</span><br><span class="line">        # Note: -F 12 drops reads that are unmapped or whose mate is unmapped; the BAM is not coordinate-sorted despite its name</span><br><span class="line">        echo -e " ${id} sam2bam and remove sam   "</span><br><span class="line">        samtools view -F 12 -@ 12 -b "./${id}.sam" &gt; "./${id}_sorted.bam"</span><br><span class="line">        rm "./${id}.sam"</span><br><span class="line">    else</span><br><span class="line">        echo "No matching _2.fq file found for $file1"</span><br><span class="line">    fi</span><br><span class="line">done</span><br></pre></td></tr></tbody></table><h1 id="去宿主-方法二-kneaddata"><a href="#去宿主-方法二-kneaddata" class="headerlink" title="#去宿主-方法二 kneaddata"></a>Host Removal, Method 2: kneaddata</h1><p>Use kneaddata for both quality control and host removal.</p><p>kneaddata ships only a limited set of reference sequences, mainly human and mouse; anything else has to be built yourself.</p><table><tbody><tr><td class="code"><pre><span class="line"># List the available databases</span><br><span class="line">kneaddata_database</span><br></pre></td></tr></tbody></table><p>Download the human reference; the speed seems acceptable.</p><table><tbody><tr><td class="code"><pre><span class="line">mkdir -p db</span><br><span class="line"></span><br><span class="line">kneaddata_database --download human_genome bowtie2 
db/</span><br></pre></td></tr></tbody></table><table><tbody><tr><td class="code"><pre><span class="line">kneaddata -i seq/C2_1.fq.gz -i seq/C2_2.fq.gz \</span><br><span class="line">  -o qc/ -v -t 8 --remove-intermediate-output \</span><br><span class="line">  --trimmomatic ~/.conda/envs/qc2/share/trimmomatic \</span><br><span class="line">  --trimmomatic-options "ILLUMINACLIP:$HOME/.conda/envs/qc2/share/trimmomatic/adapters/TruSeq3-PE.fa:2:40:15 SLIDINGWINDOW:4:20 MINLEN:50" \</span><br><span class="line">  --bowtie2-options '--very-sensitive --dovetail' \</span><br><span class="line">  --bowtie2-options="--reorder" \</span><br><span class="line">  -db db/Homo_sapiens</span><br></pre></td></tr></tbody></table><h1 id="物种注释：kraken2"><a href="#物种注释：kraken2" class="headerlink" title="物种注释：kraken2"></a>Taxonomic Annotation: kraken2</h1><p>Reference: <a href="https://www.jianshu.com/p/fe9c5cc7373e">https://www.jianshu.com/p/fe9c5cc7373e</a></p><p>Kraken2 is a tool for classifying high-throughput sequencing reads and identifying the species they come from. It classifies reads against the genome sequences in a reference database, using a k-mer approach to be both fast and accurate.</p><p>The basic steps for classifying with Kraken2:</p><p>Install Kraken2: it can be downloaded from the official Kraken2 site; here it is installed with conda.</p><table><tbody><tr><td class="code"><pre><span class="line">conda install bioconda::kraken2</span><br></pre></td></tr></tbody></table><p>Prepare the reference database: Kraken2 needs a reference database to classify sequencing data. You can download the officially built standard database, or download the genome sequences you need from NCBI, Ensembl, or other sources and build a database with Kraken2's own tools.</p><p>In <code>--standard</code> mode only 5 libraries are downloaded: archaea, bacteria, human, UniVec_Core (vectors), and viral.</p><table><tbody><tr><td class="code"><pre><span class="line"># This did not succeed on the HPC cluster, so I downloaded a prebuilt database instead</span><br><span class="line">#kraken2-build --standard --threads 20 --db ./</span><br></pre></td></tr></tbody></table><p>Site for manual download: <a href="https://benlangmead.github.io/aws-indexes/k2">https://benlangmead.github.io/aws-indexes/k2</a></p><p>Download the Standard database; the file is about 90 GB.</p><table><tbody><tr><td class="code"><pre><span 
class="line"># Manual download command</span><br><span class="line">wget -c https://genome-idx.s3.amazonaws.com/kraken/k2_standard_20240605.tar.gz</span><br></pre></td></tr></tbody></table><p>Run Kraken2: classifying sequencing data with Kraken2 uses the following command:</p><table><tbody><tr><td class="code"><pre><span class="line">kraken2 --db &lt;path_to_database&gt; &lt;input_file&gt; --output &lt;output_file&gt;</span><br></pre></td></tr></tbody></table><p>Here <code>&lt;path_to_database&gt;</code> is the path to the reference database, <code>&lt;input_file&gt;</code> is the input file to classify, and <code>&lt;output_file&gt;</code> is the name of the output file. Kraken2 writes the per-read classification results to this file; a summary report is written separately when --report is given.</p><p>Note that kraken needs at least as much RAM as the size of the database, because it loads the whole database into memory before annotating the reads. If you see an error saying the database could not be loaded, try requesting more memory.</p><h2 id="单个比对"><a href="#单个比对" class="headerlink" title="单个比对"></a>Single Sample</h2><p>The --db argument on the second line is the folder that holds the reference database.</p><table><tbody><tr><td class="code"><pre><span class="line">kraken2 --threads 20 \</span><br><span class="line">    --db /public/home/dk_szy/songxudong/metagenomic/test \</span><br><span class="line">    --confidence 0.05 \</span><br><span class="line">    --output ./result/test.output \</span><br><span class="line">    --report ./report/test.kreport \</span><br><span class="line">    --paired \</span><br><span class="line">    ../rm_human/ERR1018185.1.fq \</span><br><span class="line">    ../rm_human/ERR1018185.2.fq</span><br></pre></td></tr></tbody></table><h2 id="批量比对"><a href="#批量比对" class="headerlink" title="批量比对"></a>Batch</h2><table><tbody><tr><td class="code"><pre><span class="line">mkdir kraken-result</span><br><span class="line"># Change to the directory that holds the host-depleted fastq files</span><br><span class="line">cd ./rm_human/</span><br><span class="line"></span><br><span class="line"># File-name suffixes of the paired reads (adjust to your data)</span><br><span class="line">file1_pattern=".1.fq"</span><br><span class="line">file2_pattern=".2.fq"</span><br><span class="line"></span><br><span class="line"># Loop over every file ending with the first suffix</span><br><span class="line">for file1 in *${file1_pattern}; do</span><br><span class="line">    # Sample name: the file name with the suffix removed</span><br><span class="line">    
base=$(basename "$file1" ${file1_pattern})</span><br><span class="line">    # Name of the mate file, built from the second suffix</span><br><span class="line">    file2="${base}${file2_pattern}"</span><br><span class="line"></span><br><span class="line">    # Run the check and the kraken2 command in the background</span><br><span class="line">    (</span><br><span class="line">        # Check whether the mate file exists</span><br><span class="line">        if [ -e "$file2" ]; then</span><br><span class="line">            echo "Sample $base: found $file1 paired with $file2"</span><br><span class="line">            # If it exists, run kraken2</span><br><span class="line">            kraken2 --threads 20 \</span><br><span class="line">                --db /public/home/dk_szy/songxudong/metagenomic/test \</span><br><span class="line">                --confidence 0.05 \</span><br><span class="line">                --output ../kraken-result/${base}.output \</span><br><span class="line">                --report ../kraken-result/${base}.kreport \</span><br><span class="line">                --paired \</span><br><span class="line">                ./${file1} \</span><br><span class="line">                ./${file2}</span><br><span class="line"></span><br><span class="line">            echo "Sample $base: finished"</span><br><span class="line">        else</span><br><span class="line">            # Otherwise print an error message</span><br><span class="line">            echo "Error: no file matching $file1 was found"</span><br><span class="line">        fi</span><br><span class="line">    ) &amp;</span><br><span class="line">done</span><br><span class="line"></span><br><span class="line"># Wait for all background processes to finish</span><br><span class="line">wait</span><br></pre></td></tr></tbody></table><h1 id="物种组成及丰度估计"><a href="#物种组成及丰度估计" class="headerlink" title="物种组成及丰度估计"></a>Species Composition and Abundance Estimation</h1><p>Frankly, the metagenomics tutorials I found are rather thin; none of them covered how to process the results.</p><p>Tutorial used: <a href="https://www.jianshu.com/nb/54122549">https://www.jianshu.com/nb/54122549</a></p><p>Install bracken first</p><table><tbody><tr><td class="code"><pre><span 
class="line">conda install bioconda::bracken</span><br></pre></td></tr></tbody></table><p>Run bracken to estimate species abundance at each taxonomic level (the species level, -l S, is shown here):</p><table><tbody><tr><td class="code"><pre><span class="line">mkdir out</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"># Change to the directory that holds the kraken2 reports</span><br><span class="line">cd ./kraken-result/</span><br><span class="line"></span><br><span class="line"># File-name suffix of the kraken2 reports</span><br><span class="line">file1_pattern=".kreport"</span><br><span class="line"></span><br><span class="line"># Loop over every file ending with that suffix</span><br><span class="line">for file1 in *${file1_pattern}; do</span><br><span class="line">    # Sample name: the file name with the suffix removed</span><br><span class="line">    base=$(basename "$file1" ${file1_pattern})</span><br><span class="line">    </span><br><span class="line">    # Run bracken directly; there is no paired file to check</span><br><span class="line">    (</span><br><span class="line">        echo "Sample $base: found $file1"</span><br><span class="line">        # Run bracken; note that its -t option is the minimum read threshold, not a thread count</span><br><span class="line">        bracken \</span><br><span class="line">            -d /public/home/dk_szy/songxudong/metagenomic/test \</span><br><span class="line">            -i ${base}.kreport \</span><br><span class="line">            -o ../out/${base}.bracken.S \</span><br><span class="line">            -w ../out/${base}.bracken.S.kreport \</span><br><span class="line">            -l S \</span><br><span class="line">            -t 20 </span><br><span class="line">    ) &amp;</span><br><span class="line">done</span><br><span class="line"></span><br><span class="line"># Wait for all background processes to finish</span><br><span class="line">wait</span><br></pre></td></tr></tbody></table><h2 id="结果整理"><a href="#结果整理" class="headerlink" title="结果整理"></a>Organizing the Results</h2><p>Install kraken-biom</p><table><tbody><tr><td class="code"><pre><span class="line">conda install bioconda::kraken-biom</span><br></pre></td></tr></tbody></table><table><tbody><tr><td class="code"><pre><span class="line"># Merge the report files into biom format</span><br><span class="line">kraken-biom 
\</span><br><span class="line">./out/*.kreport \</span><br><span class="line">--max D \</span><br><span class="line">-o ./out/S.biom</span><br><span class="line"></span><br><span class="line"># Convert the biom file to a count table with the biom command-line tool</span><br><span class="line"># (this is "biom convert" from biom-format, not ImageMagick's convert)</span><br><span class="line">biom convert \</span><br><span class="line">-i ./out/S.biom \</span><br><span class="line">-o ./out/S.count.tsv.tmp \</span><br><span class="line">--to-tsv \</span><br><span class="line">--header-key taxonomy</span><br><span class="line"></span><br><span class="line"># Tidy the output format and pad empty species names</span><br><span class="line">sed 's/; g__; s__/; g__; s__ /' ./out/S.count.tsv.tmp \</span><br><span class="line">&gt; ./out/S.taxID.count.tsv</span><br><span class="line"></span><br><span class="line"># Replace the leading taxon ID with the Latin name</span><br><span class="line">sed '/^#/! s/^[0-9]\+\t\(.*[A-Za-z]__\([^\t;]\+\)\)$/\2\t\1/' \</span><br><span class="line">./out/S.taxID.count.tsv &gt; ./out/S.taxName.count.tsv</span><br><span class="line"></span><br><span class="line"># Keep only the abundance columns, for later plotting</span><br><span class="line">sed '1d; 2s/^#//' ./out/S.taxName.count.tsv | \</span><br><span class="line">awk -F $'\t' -v 'OFS=\t' '{$NF = ""; print $0}' | \</span><br><span class="line">sed 's/\t$//' &gt; ./out/S.count.tsv</span><br></pre></td></tr></tbody></table><p>Output:<br>S.taxName.count.tsv is the abundance table that includes the taxonomy column; it can be used to draw stacked species bar charts.<br>S.count.tsv is the abundance table with the taxonomy column removed.</p><p>Plotting code has not been prepared yet.<br><img src="/picture/27313279-7f11db61e735cc99.png" alt="27313279-7f11db61e735cc99"></p><h1 id="a多样性"><a href="#a多样性" class="headerlink" title="a多样性"></a>Alpha Diversity</h1><p>The tutorial does not explain the method in detail, but it mentions using the two R packages vegan and phyloseq for diversity analysis.</p><p>Here I reuse the approach I previously applied to 16S data.</p><p>OTU file: S.count.tsv</p><p><img src="/picture/image-20240905155737677.png" alt="image-20240905155737677"></p><p>Grouping file: group.txt</p><p><img src="/picture/image-20240905155808792.png" 
alt="image-20240905155808792"></p><p>R语言</p><table><tbody><tr><td class="code"><pre><span class="line">getwd<span class="punctuation">(</span><span class="punctuation">)</span></span><br><span class="line">setwd<span class="punctuation">(</span><span class="string">"D:/rtest/songtest/metagenomic"</span><span class="punctuation">)</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># 安装和加载必要的包</span></span><br><span class="line"><span class="keyword">if</span> <span class="punctuation">(</span><span class="operator">!</span>requireNamespace<span class="punctuation">(</span><span class="string">"BiocManager"</span><span class="punctuation">,</span> quietly <span class="operator">=</span> <span class="literal">TRUE</span><span class="punctuation">)</span><span class="punctuation">)</span></span><br><span class="line">  install.packages<span class="punctuation">(</span><span class="string">"BiocManager"</span><span class="punctuation">)</span></span><br><span class="line">BiocManager<span class="operator">::</span>install<span class="punctuation">(</span><span class="string">"phyloseq"</span><span class="punctuation">)</span></span><br><span class="line">rm<span class="punctuation">(</span><span class="built_in">list</span><span class="operator">=</span>ls<span class="punctuation">(</span><span class="punctuation">)</span><span class="punctuation">)</span></span><br><span class="line">library<span class="punctuation">(</span>vegan<span class="punctuation">)</span></span><br><span class="line">library<span class="punctuation">(</span>reshape2<span class="punctuation">)</span></span><br><span class="line">library<span class="punctuation">(</span>ggplot2<span class="punctuation">)</span></span><br><span class="line">library<span class="punctuation">(</span>ggpubr<span class="punctuation">)</span></span><br><span class="line">library<span class="punctuation">(</span>RColorBrewer<span class="punctuation">)</span></span><br><span 
class="line"></span><br><span class="line">data <span class="operator">&lt;-</span> read.delim<span class="punctuation">(</span><span class="string">"D:/rtest/songtest/metagenomic/S.count.txt"</span><span class="punctuation">,</span>header<span class="operator">=</span><span class="literal">TRUE</span><span class="punctuation">,</span>sep<span class="operator">=</span><span class="string">"\t"</span><span class="punctuation">,</span>row.names<span class="operator">=</span><span class="number">1</span><span class="punctuation">)</span></span><br><span class="line">group <span class="operator">&lt;-</span> read.delim<span class="punctuation">(</span><span class="string">"D:/rtest/songtest/metagenomic/group.txt"</span><span class="punctuation">)</span></span><br><span class="line"><span class="comment">#抽平</span></span><br><span class="line">otu <span class="operator">&lt;-</span> data</span><br><span class="line"><span class="comment">#求和查看每个样本的和</span></span><br><span class="line">colSums<span class="punctuation">(</span>data<span class="punctuation">)</span></span><br><span class="line"><span class="comment">#使用该代码进行抽平</span></span><br><span class="line">otu_Flattening <span class="operator">=</span> as.data.frame<span class="punctuation">(</span>t<span class="punctuation">(</span>rrarefy<span class="punctuation">(</span>t<span class="punctuation">(</span>otu<span class="punctuation">)</span><span class="punctuation">,</span> <span class="built_in">min</span><span class="punctuation">(</span>colSums<span class="punctuation">(</span>otu<span class="punctuation">)</span><span class="punctuation">)</span><span class="punctuation">)</span><span class="punctuation">)</span><span class="punctuation">)</span></span><br><span class="line"><span class="comment">#查看抽平后的每个样本的和</span></span><br><span class="line">colSums<span class="punctuation">(</span>otu_Flattening<span class="punctuation">)</span></span><br><span class="line">data <span class="operator">&lt;-</span> 
otu_Flattening</span><br><span class="line"></span><br><span class="line">ttdata <span class="operator">&lt;-</span> t<span class="punctuation">(</span>data<span class="punctuation">)</span></span><br><span class="line">data<span class="operator">&lt;-</span>data<span class="operator">/</span>apply<span class="punctuation">(</span>data<span class="punctuation">,</span><span class="number">2</span><span class="punctuation">,</span><span class="built_in">sum</span><span class="punctuation">)</span></span><br><span class="line">tdata<span class="operator">=</span>t<span class="punctuation">(</span>data<span class="punctuation">)</span></span><br><span class="line">a<span class="operator">&lt;-</span>as.data.frame<span class="punctuation">(</span>tdata<span class="punctuation">)</span></span><br><span class="line">a<span class="operator">=</span>as.data.frame<span class="punctuation">(</span>lapply<span class="punctuation">(</span>a<span class="punctuation">,</span><span class="built_in">as.numeric</span><span class="punctuation">)</span><span class="punctuation">)</span></span><br><span class="line"></span><br><span class="line">shannon<span class="operator">&lt;-</span>diversity<span class="punctuation">(</span>a<span class="punctuation">,</span>index<span class="operator">=</span><span class="string">"shannon"</span><span class="punctuation">)</span></span><br><span class="line">simpson<span class="operator">&lt;-</span>diversity<span class="punctuation">(</span>a<span class="punctuation">,</span>index<span class="operator">=</span><span class="string">"simpson"</span><span class="punctuation">)</span></span><br><span class="line">Chao1  <span class="operator">&lt;-</span> estimateR<span class="punctuation">(</span>ttdata<span class="punctuation">)</span><span class="punctuation">[</span><span class="number">2</span><span class="punctuation">,</span> <span class="punctuation">]</span></span><br><span class="line">ACE  <span class="operator">&lt;-</span> 
estimateR<span class="punctuation">(</span>ttdata<span class="punctuation">)</span><span class="punctuation">[</span><span class="number">4</span><span class="punctuation">,</span> <span class="punctuation">]</span></span><br><span class="line">invsimpson<span class="operator">&lt;-</span>diversity<span class="punctuation">(</span>a<span class="punctuation">,</span>index<span class="operator">=</span><span class="string">"invsimpson"</span><span class="punctuation">)</span></span><br><span class="line"></span><br><span class="line">data_shannon<span class="operator">=</span>data.frame<span class="punctuation">(</span>shannon<span class="punctuation">)</span></span><br><span class="line">data_simpson<span class="operator">=</span>data.frame<span class="punctuation">(</span>simpson<span class="punctuation">)</span></span><br><span class="line">data_Chao1<span class="operator">=</span>data.frame<span class="punctuation">(</span>Chao1<span class="punctuation">)</span></span><br><span class="line">data_ACE<span class="operator">=</span>data.frame<span class="punctuation">(</span>ACE<span class="punctuation">)</span></span><br><span class="line">data_invsimpson<span class="operator">=</span>data.frame<span class="punctuation">(</span>invsimpson<span class="punctuation">)</span></span><br><span class="line"></span><br><span class="line">spe_alpha<span class="operator">&lt;-</span>cbind<span class="punctuation">(</span></span><br><span class="line">  data_shannon<span class="punctuation">,</span></span><br><span class="line">  data_simpson<span class="punctuation">,</span></span><br><span class="line">  data_invsimpson<span class="punctuation">,</span></span><br><span class="line">  data_Chao1<span class="punctuation">,</span></span><br><span class="line">  data_ACE<span class="punctuation">,</span></span><br><span class="line">  group</span><br><span class="line"><span class="punctuation">)</span></span><br><span class="line"></span><br></pre></td></tr></tbody></table><h1 
id="物种分类结果整理"><a href="#物种分类结果整理" class="headerlink" title="物种分类结果整理"></a>物种分类结果整理</h1><p>还是使用之前16s的方法</p><p>使用上面的 S.taxName.count.tsv 为包含物种分类信息的丰度文件,可以用此结果绘制物种堆积图</p><p><img src="/picture/image-20240921154706720.png" alt="image-20240921154706720"></p><table><tbody><tr><td class="code"><pre><span class="line">setwd<span class="punctuation">(</span><span class="string">"D:/rtest/songtest/metagenomic/lefse"</span><span class="punctuation">)</span></span><br><span class="line"></span><br><span class="line"><span class="comment">#taxon_data &lt;- read.csv("merged_otu_taxonomy.csv", stringsAsFactors = FALSE,row.names = 1)</span></span><br><span class="line">taxon_data <span class="operator">&lt;-</span> read.delim<span class="punctuation">(</span><span class="string">"D:/rtest/songtest/metagenomic/lefse/S.taxName.count.tsv"</span><span class="punctuation">,</span> row.names<span class="operator">=</span><span class="number">1</span><span class="punctuation">)</span></span><br><span class="line">taxon_data<span class="operator">$</span>kingdom <span class="operator">&lt;-</span> sapply<span class="punctuation">(</span>strsplit<span class="punctuation">(</span>taxon_data<span class="operator">$</span>taxonomy<span class="punctuation">,</span> <span class="string">";"</span><span class="punctuation">)</span><span class="punctuation">,</span> `[`<span class="punctuation">,</span> <span class="number">1</span><span class="punctuation">)</span></span><br><span class="line">taxon_data<span class="operator">$</span>phylum <span class="operator">&lt;-</span> sapply<span class="punctuation">(</span>strsplit<span class="punctuation">(</span>taxon_data<span class="operator">$</span>taxonomy<span class="punctuation">,</span> <span class="string">";"</span><span class="punctuation">)</span><span class="punctuation">,</span> `[`<span class="punctuation">,</span> <span class="number">2</span><span class="punctuation">)</span></span><br><span class="line">taxon_data<span 
class="operator">$</span><span class="built_in">class</span> <span class="operator">&lt;-</span> sapply<span class="punctuation">(</span>strsplit<span class="punctuation">(</span>taxon_data<span class="operator">$</span>taxonomy<span class="punctuation">,</span> <span class="string">";"</span><span class="punctuation">)</span><span class="punctuation">,</span> `[`<span class="punctuation">,</span> <span class="number">3</span><span class="punctuation">)</span></span><br><span class="line">taxon_data<span class="operator">$</span>order <span class="operator">&lt;-</span> sapply<span class="punctuation">(</span>strsplit<span class="punctuation">(</span>taxon_data<span class="operator">$</span>taxonomy<span class="punctuation">,</span> <span class="string">";"</span><span class="punctuation">)</span><span class="punctuation">,</span> `[`<span class="punctuation">,</span> <span class="number">4</span><span class="punctuation">)</span></span><br><span class="line">taxon_data<span class="operator">$</span>family <span class="operator">&lt;-</span> sapply<span class="punctuation">(</span>strsplit<span class="punctuation">(</span>taxon_data<span class="operator">$</span>taxonomy<span class="punctuation">,</span> <span class="string">";"</span><span class="punctuation">)</span><span class="punctuation">,</span> `[`<span class="punctuation">,</span> <span class="number">5</span><span class="punctuation">)</span></span><br><span class="line">taxon_data<span class="operator">$</span>genus <span class="operator">&lt;-</span> sapply<span class="punctuation">(</span>strsplit<span class="punctuation">(</span>taxon_data<span class="operator">$</span>taxonomy<span class="punctuation">,</span> <span class="string">";"</span><span class="punctuation">)</span><span class="punctuation">,</span> `[`<span class="punctuation">,</span> <span class="number">6</span><span class="punctuation">)</span></span><br><span class="line">taxon_data<span class="operator">$</span>species <span 
class="operator">&lt;-</span> sapply<span class="punctuation">(</span>strsplit<span class="punctuation">(</span>taxon_data<span class="operator">$</span>taxonomy<span class="punctuation">,</span> <span class="string">";"</span><span class="punctuation">)</span><span class="punctuation">,</span> `[`<span class="punctuation">,</span> <span class="number">7</span><span class="punctuation">)</span></span><br><span class="line"></span><br><span class="line">write.csv<span class="punctuation">(</span>taxon_data<span class="punctuation">,</span>file <span class="operator">=</span> <span class="string">"otu-result.csv"</span><span class="punctuation">)</span> <span class="comment">#保存</span></span><br></pre></td></tr></tbody></table><p>导出分类好的 otu-result.csv 文件</p><h2 id="得到每个水平的结果"><a href="#得到每个水平的结果" class="headerlink" title="得到每个水平的结果"></a>得到每个水平的结果</h2><p>界（Kingdom）、门（Phylum）、纲（Class）、目（Order）、科（Family）、属（Genus）、种（Species）</p><p>16s只能注释到 属（Genus）水平的结果，虽然我们有属水平的文件，但是是没有相应的可用结果</p><table><tbody><tr><td class="code"><pre><span class="line"><span class="comment"># 载入必要的库</span></span><br><span class="line">library<span class="punctuation">(</span>dplyr<span class="punctuation">)</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># 读取数据</span></span><br><span class="line">otu_data <span class="operator">&lt;-</span> read.csv<span class="punctuation">(</span><span class="string">"otu-result.csv"</span><span class="punctuation">,</span> header <span class="operator">=</span> <span class="literal">TRUE</span><span class="punctuation">,</span> row.names <span class="operator">=</span> <span class="number">1</span><span class="punctuation">)</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># 移除Confidence列</span></span><br><span class="line">otu_data <span class="operator">&lt;-</span> otu_data<span class="punctuation">[</span><span class="operator">!</span><span class="built_in">names</span><span 
class="punctuation">(</span>otu_data<span class="punctuation">)</span> <span class="operator">%in%</span> <span class="string">"Confidence"</span><span class="punctuation">]</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># 删除kingdom等于"Unassigned"的行</span></span><br><span class="line">otu_data <span class="operator">&lt;-</span> otu_data<span class="punctuation">[</span>otu_data<span class="operator">$</span>kingdom <span class="operator">!=</span> <span class="string">"Unassigned"</span><span class="punctuation">,</span> <span class="punctuation">]</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># 分类级别列表</span></span><br><span class="line">taxonomic_levels <span class="operator">&lt;-</span> <span class="built_in">c</span><span class="punctuation">(</span><span class="string">"phylum"</span><span class="punctuation">,</span> <span class="string">"class"</span><span class="punctuation">,</span> <span class="string">"order"</span><span class="punctuation">,</span> <span class="string">"family"</span><span class="punctuation">,</span> <span class="string">"genus"</span><span class="punctuation">,</span> <span class="string">"species"</span><span class="punctuation">)</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># 上一层级名称</span></span><br><span class="line">previous_level <span class="operator">&lt;-</span> <span class="string">"kingdom"</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># 遍历每个分类级别，进行分组和求和，并保存到CSV文件</span></span><br><span class="line"><span class="keyword">for</span> <span class="punctuation">(</span>i <span class="keyword">in</span> <span class="built_in">seq_along</span><span class="punctuation">(</span>taxonomic_levels<span class="punctuation">)</span><span class="punctuation">)</span> <span class="punctuation">{</span></span><br><span class="line">  current_level <span 
class="operator">&lt;-</span> taxonomic_levels<span class="punctuation">[</span>i<span class="punctuation">]</span></span><br><span class="line">  </span><br><span class="line">  <span class="comment"># 确保当前层级和上一层级列存在并且数据类型正确</span></span><br><span class="line">  otu_data<span class="punctuation">[[</span>current_level<span class="punctuation">]</span><span class="punctuation">]</span> <span class="operator">&lt;-</span> <span class="built_in">as.character</span><span class="punctuation">(</span>otu_data<span class="punctuation">[[</span>current_level<span class="punctuation">]</span><span class="punctuation">]</span><span class="punctuation">)</span></span><br><span class="line">  otu_data<span class="punctuation">[[</span>previous_level<span class="punctuation">]</span><span class="punctuation">]</span> <span class="operator">&lt;-</span> <span class="built_in">as.character</span><span class="punctuation">(</span>otu_data<span class="punctuation">[[</span>previous_level<span class="punctuation">]</span><span class="punctuation">]</span><span class="punctuation">)</span></span><br><span class="line">  </span><br><span class="line">  <span class="comment"># 对当前层级列进行分组并求和，同时保留上一层级信息</span></span><br><span class="line">  summary_data <span class="operator">&lt;-</span> otu_data <span class="operator">%&gt;%</span></span><br><span class="line">    group_by<span class="punctuation">(</span><span class="operator">!</span><span class="operator">!</span>sym<span class="punctuation">(</span>previous_level<span class="punctuation">)</span><span class="punctuation">,</span> <span class="operator">!</span><span class="operator">!</span>sym<span class="punctuation">(</span>current_level<span class="punctuation">)</span><span class="punctuation">)</span> <span class="operator">%&gt;%</span></span><br><span class="line">    summarise<span class="punctuation">(</span>across<span class="punctuation">(</span>where<span class="punctuation">(</span><span 
class="built_in">is.numeric</span><span class="punctuation">)</span><span class="punctuation">,</span> <span class="built_in">sum</span><span class="punctuation">,</span> na.rm <span class="operator">=</span> <span class="literal">TRUE</span><span class="punctuation">)</span><span class="punctuation">)</span> <span class="operator">%&gt;%</span></span><br><span class="line">    ungroup<span class="punctuation">(</span><span class="punctuation">)</span></span><br><span class="line">  </span><br><span class="line">  <span class="comment"># 生成文件名</span></span><br><span class="line">  csv_filename <span class="operator">&lt;-</span> paste0<span class="punctuation">(</span>previous_level<span class="punctuation">,</span> <span class="string">"_"</span><span class="punctuation">,</span> current_level<span class="punctuation">,</span> <span class="string">".csv"</span><span class="punctuation">)</span></span><br><span class="line">  </span><br><span class="line">  <span class="comment"># 保存结果到CSV文件</span></span><br><span class="line">  write.csv<span class="punctuation">(</span>summary_data<span class="punctuation">,</span> csv_filename<span class="punctuation">,</span> row.names <span class="operator">=</span> <span class="literal">FALSE</span><span class="punctuation">)</span></span><br><span class="line">  </span><br><span class="line">  <span class="comment"># 更新上一层级为当前层级</span></span><br><span class="line">  previous_level <span class="operator">&lt;-</span> current_level</span><br><span class="line"><span class="punctuation">}</span></span><br></pre></td></tr></tbody></table><p><img src="/picture/image-20240812191527453.png" alt="image-20240812191527453"></p><h1 id="基因组组装"><a href="#基因组组装" class="headerlink" title="基因组组装"></a>基因组组装</h1><p><a href="https://www.jianshu.com/p/77131fa96caa">https://www.jianshu.com/p/77131fa96caa</a></p><p><strong>使用 megahit 进行组装:</strong></p><p>安装</p><table><tbody><tr><td class="code"><pre><span class="line">conda activate 
metagenomic</span><br><span class="line"></span><br><span class="line">conda install bioconda::megahit</span><br></pre></td></tr></tbody></table><p>Single-sample example</p><table><tbody><tr><td class="code"><pre><span class="line"><span class="meta"># -1 / -2: paired-end inputs (fq1, fq2); --min-contig-len: minimum contig length</span></span><br><span class="line"><span class="meta"># --tmp-dir: temporary directory; --num-cpu-threads: thread count</span></span><br><span class="line"><span class="meta"># --memory: memory limit; megahit reads values between 0 and 1 as a fraction of total RAM and larger values as bytes, so check this setting</span></span><br><span class="line"><span class="meta"># --out-dir: output directory; --out-prefix: output file prefix</span></span><br><span class="line">megahit \</span><br><span class="line"><span class="number">-1</span> ./A1_1.fq.gz \</span><br><span class="line"><span class="number">-2</span> ./A1_2.fq.gz \</span><br><span class="line">--min-contig-len <span class="number">1000</span> \</span><br><span class="line">--tmp-dir ./ \</span><br><span class="line">--memory <span class="number">6</span> \</span><br><span class="line">--num-cpu-threads <span class="number">4</span> \</span><br><span class="line">--<span class="keyword">out</span>-dir A1_megahit \</span><br><span class="line">--<span class="keyword">out</span>-prefix A1</span><br><span class="line"><span class="meta">## For multiple datasets, pass the input files as a comma-separated list</span></span><br></pre></td></tr></tbody></table><h2 id="批量组装"><a href="#批量组装" class="headerlink" title="批量组装"></a>Batch assembly</h2><p>Requirement: megahit creates the output directory itself, so it must not exist before the run</p><table><tbody><tr><td class="code"><pre><span class="line"># Set the working directory to the one that holds the fastq files (edit this!)</span><br><span class="line">cd ./fastq-result/</span><br><span class="line"></span><br><span class="line"># Filename suffixes of the paired reads (edit these!)</span><br><span class="line">file1_pattern=".1.fq"</span><br><span class="line">file2_pattern=".2.fq"</span><br><span class="line"></span><br><span class="line">#${file1}</span><br><span class="line">#${file2}</span><br><span class="line">#${base}</span><br><span class="line"># Loop over every file that ends with the first suffix</span><br><span class="line">for file1 in *${file1_pattern}; do</span><br><span class="line">    # Strip the suffix from the filename to get the sample name</span><br><span class="line">    base=$(basename "$file1" 
${file1_pattern})</span><br><span class="line">    # Build the matching second-read filename</span><br><span class="line">    file2="${base}${file2_pattern}"</span><br><span class="line"></span><br><span class="line">    # Run the check and the megahit command in the background</span><br><span class="line">    (</span><br><span class="line">        # Check that the mate file exists</span><br><span class="line">        if [ -e "$file2" ]; then</span><br><span class="line">            echo "Found sample $base: $file1 pairs with $file2"</span><br><span class="line">            # If it exists, run the per-sample command (edit this!)</span><br><span class="line">            #trim_galore -q 25 --phred33 --length 35 --stringency 3 --paired -o ../clean_data/ "$file1" "$file2"</span><br><span class="line">            # Each sample gets its own output directory, because megahit refuses to write into an existing one</span><br><span class="line">            megahit \</span><br><span class="line">-1 ./rm_human/${file1} \</span><br><span class="line">-2 ./rm_human/${file2} \</span><br><span class="line">--min-contig-len 1000 \</span><br><span class="line">--tmp-dir ./ \</span><br><span class="line">--memory 6 \</span><br><span class="line">--num-cpu-threads 20 \</span><br><span class="line">--out-dir ./${base}_megahit \</span><br><span class="line">--out-prefix ${base}</span><br><span class="line"></span><br><span class="line">        else</span><br><span class="line">            # Otherwise print an error message</span><br><span class="line">            echo "Error: no mate file found for $file1"</span><br><span class="line">        fi</span><br><span class="line">    ) &amp;</span><br><span class="line">done</span><br><span class="line"></span><br><span class="line"># Wait for all background jobs to finish</span><br><span class="line">wait</span><br></pre></td></tr></tbody></table><h1 id="Lefse"><a href="#Lefse" class="headerlink" title="Lefse"></a>Lefse</h1><p>Reference: <a 
href="https://blog.csdn.net/a852232394/article/details/139296579?spm=1001.2101.3001.6650.2&utm_medium=distribute.pc_relevant.none-task-blog-2~default~YuanLiJiHua~Position-2-139296579-blog-126683847.235%5Ev43%5Econtrol&depth_1-utm_source=distribute.pc_relevant.none-task-blog-2~default~YuanLiJiHua~Position-2-139296579-blog-126683847.235%5Ev43%5Econtrol&utm_relevant_index=5">https://blog.csdn.net/a852232394/article/details/139296579?spm=1001.2101.3001.6650.2&amp;utm_medium&#x3D;distribute.pc_relevant.none-task-blog-2%7Edefault%7EYuanLiJiHua%7EPosition-2-139296579-blog-126683847.235%5Ev43%5Econtrol&amp;depth_1-utm_source&#x3D;distribute.pc_relevant.none-task-blog-2%7Edefault%7EYuanLiJiHua%7EPosition-2-139296579-blog-126683847.235%5Ev43%5Econtrol&amp;utm_relevant_index&#x3D;5</a></p><p>需要3个输入文件</p><p>sample_table.csv<br><img src="/picture/image-20240812220639621.png" alt="image-20240812220639621"></p><p>feature_table.csv</p><p><img src="/picture/image-20240812220710558.png" alt="image-20240812220710558"></p><p>tax_table.csv</p><p><img src="/picture/image-20240812220801693.png" alt="image-20240812220801693"></p><table><tbody><tr><td class="code"><pre><span class="line">rm(list=ls())</span><br><span class="line">pacman::p_load(tidyverse,microeco,magrittr)</span><br><span class="line"></span><br><span class="line">feature_table &lt;- read.csv('feature_table.csv', row.names = 1)</span><br><span class="line">sample_table &lt;- read.csv('sample_table.csv', row.names = 1)</span><br><span class="line">tax_table &lt;- read.csv('tax_table.csv', row.names = 1)</span><br><span class="line"></span><br><span class="line">head(feature_table)[,1:6]; head(sample_table); head(tax_table)[,1:6]</span><br><span class="line"></span><br><span class="line">dataset &lt;- microtable$new(sample_table = sample_table,</span><br><span class="line">                          otu_table = feature_table, </span><br><span class="line">                          tax_table = tax_table)</span><br><span 
class="line">dataset</span><br><span class="line"></span><br><span class="line">lefse &lt;- trans_diff$new(dataset = dataset, </span><br><span class="line">                        method = "lefse", </span><br><span class="line">                        group = "Group", </span><br><span class="line">                        #过少可增大下面选项</span><br><span class="line">                        alpha = 0.1, </span><br><span class="line">                        lefse_subgroup = NULL)</span><br><span class="line"></span><br><span class="line">write.csv(lefse$res_diff,file = "lefse-lda.csv") #保存</span><br><span class="line">write.csv(lefse$abund_table,file = "lefse-input.csv") #保存</span><br></pre></td></tr></tbody></table><p>分析出初步结果</p><p>lefse-lda.csv LDA，可自行画图</p><p>lefse-input.csv lefse输入文件，可在在线网站进行可视化或者R可视化</p><h2 id="在线绘制"><a href="#在线绘制" class="headerlink" title="在线绘制"></a>在线绘制</h2><p>复制 lefse-input.csv 内容，进行修改</p><p><a href="https://www.bic.ac.cn/BIC/#/">https://www.bic.ac.cn/BIC/#/</a></p><p>找到lefse选项</p><p>主要是更改第一行的列名，定义分组</p><p><img src="/picture/image-20240813092603213.png" alt="image-20240813092603213"></p><p>检查数据可用后可直接出结果</p><p>（某些未知的菌可在原始数据中提前删除）</p><p><img src="/picture/image-20240813092855547.png" alt="image-20240813092855547"></p><p><img src="/picture/image-20240813092948111.png" alt="image-20240813092948111"></p><p>在线网站的编辑功能好像更强大了，作图方便建议采用</p><h2 id="R绘制"><a href="#R绘制" class="headerlink" title="R绘制"></a>R绘制</h2><p><a 
href="https://blog.csdn.net/a852232394/article/details/139296579?spm=1001.2101.3001.6650.2&utm_medium=distribute.pc_relevant.none-task-blog-2~default~YuanLiJiHua~Position-2-139296579-blog-126683847.235%5Ev43%5Econtrol&depth_1-utm_source=distribute.pc_relevant.none-task-blog-2~default~YuanLiJiHua~Position-2-139296579-blog-126683847.235%5Ev43%5Econtrol&utm_relevant_index=5">https://blog.csdn.net/a852232394/article/details/139296579?spm=1001.2101.3001.6650.2&amp;utm_medium&#x3D;distribute.pc_relevant.none-task-blog-2~default~YuanLiJiHua~Position-2-139296579-blog-126683847.235%5Ev43%5Econtrol&amp;depth_1-utm_source&#x3D;distribute.pc_relevant.none-task-blog-2~default~YuanLiJiHua~Position-2-139296579-blog-126683847.235%5Ev43%5Econtrol&amp;utm_relevant_index&#x3D;5</a></p><p>The plots built into the microeco package are not especially attractive; you can restyle them or draw your own from the raw data</p><p>LDA</p><table><tbody><tr><td class="code"><pre><span class="line">## use_number: which features to show (here the first 20)</span><br><span class="line">## group_order: your own group names, in plotting order</span><br><span class="line">lefse$plot_diff_bar(use_number = 1:20, </span><br><span class="line">                    width = 0.8, </span><br><span class="line">                    group_order = c("subject-1", "subject-2"))</span><br></pre></td></tr></tbody></table><p><img src="/picture/image-20240813093433231.png" alt="image-20240813093433231"></p><p>LEfSe cladogram</p><table><tbody><tr><td class="code"><pre><span class="line">library(ggtree)</span><br><span class="line">lefse$plot_diff_cladogram(use_taxa_num = 200, </span><br><span class="line">                       use_feature_num = 50, </span><br><span class="line">                       clade_label_level = 5, </span><br><span class="line">                       group_order =  c("subject-1", "subject-2"))</span><br><span class="line"></span><br></pre></td></tr></tbody></table><p><img src="/picture/image-20240813093404407.png" alt="image-20240813093404407"></p>]]></content>
    
    
    <summary type="html">First use the rna-seq environment: conda activate rna_p3. Create the metagenomic analysis environment: conda create -n metagenomic, then conda activate metagenomic. Install kneaddata: conda install -c bioba</summary>
    
    
    
    <category term="生物信息学" scheme="https://song-xudong.github.io/categories/%E7%94%9F%E7%89%A9%E4%BF%A1%E6%81%AF%E5%AD%A6/"/>
    
    
    <category term="生物信息学" scheme="https://song-xudong.github.io/tags/%E7%94%9F%E7%89%A9%E4%BF%A1%E6%81%AF%E5%AD%A6/"/>
    
    <category term="宏基因组" scheme="https://song-xudong.github.io/tags/%E5%AE%8F%E5%9F%BA%E5%9B%A0%E7%BB%84/"/>
    
  </entry>
  
  <entry>
    <title>Translatome Analysis Workflow</title>
    <link href="https://song-xudong.github.io/2024/11/18/Ribo-seq%E7%BF%BB%E8%AF%91%E7%BB%84%E5%AD%A6/"/>
    <id>https://song-xudong.github.io/2024/11/18/Ribo-seq%E7%BF%BB%E8%AF%91%E7%BB%84%E5%AD%A6/</id>
    <published>2024-11-18T04:07:58.000Z</published>
    <updated>2024-11-18T04:15:50.000Z</updated>
    
    <content type="html"><![CDATA[<h1 id="Ribo-seq的介绍"><a href="#Ribo-seq的介绍" class="headerlink" title="Ribo-seq的介绍"></a>Ribo-seq的介绍</h1><p><a href="https://www.cell.com/cell-metabolism/fulltext/S1550-4131(22)00541-1?uuid=uuid:1357b65f-e2ff-45e2-a40c-7a90f3170be5#mmc2">https://www.cell.com/cell-metabolism/fulltext/S1550-4131(22)00541-1?uuid=uuid%3A1357b65f-e2ff-45e2-a40c-7a90f3170be5#mmc2</a></p><p>核糖体分析 (Ribo-seq) 和蛋白质基因组学的最新进展已经鉴定出数千种未注释的肽和小蛋白质、微生物蛋白质 (MP)，由哺乳动物基因组中的小开放阅读框 (smORF) 编码。</p><p>核糖体分析，也称为 Ribo-seq，可生成核糖体保护 RNA 片段 (RPF) 的全基因组分配和定量 ，从而提供整个转录组的翻译（翻译组）的实时快照。</p><p>RFs（Ribosome footprints）：核糖体足迹</p><p>使用rna-seq的环境</p><table><tbody><tr><td class="code"><pre><span class="line">conda activate rna_p3</span><br></pre></td></tr></tbody></table><h1 id="测试"><a href="#测试" class="headerlink" title="测试"></a>测试</h1><p>参考文章</p><p><a href="https://www.sciencedirect.com/science/article/pii/S1525001621001337?via=ihub#mmc1">https://www.sciencedirect.com/science/article/pii/S1525001621001337?via%3Dihub#mmc1</a></p><p>参考数据</p><p><a href="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE155899">https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE155899</a></p><h2 id="后台批量下载"><a href="#后台批量下载" class="headerlink" title="后台批量下载"></a>后台批量下载</h2><table><tbody><tr><td class="code"><pre><span class="line">nohup prefetch -f no --option-file SRA.txt &amp;</span><br></pre></td></tr></tbody></table><h1 id="SRA文件转为FASTQ格式"><a href="#SRA文件转为FASTQ格式" class="headerlink" title="SRA文件转为FASTQ格式"></a>SRA文件转为FASTQ格式</h1><h2 id="单个转格式"><a href="#单个转格式" class="headerlink" title="单个转格式"></a>单个转格式</h2><p>慢的转换，太慢了，不建议使用</p><table><tbody><tr><td class="code"><pre><span class="line">#将当前</span><br><span class="line">fastq-dump --split-3 --gzip ./SRR12207279</span><br></pre></td></tr></tbody></table><p>使用这个转换，多线程转换格式，输出的为fq的文件</p><table><tbody><tr><td class="code"><pre><span class="line">fasterq-dump --split-3 ./SRR12207283</span><br></pre></td></tr></tbody></table><h2 id="批量转格式"><a href="#批量转格式" 
class="headerlink" title="批量转格式"></a>批量转格式</h2><p>小命令：删除当前目录SRR文件夹里的所有分文件夹，只保留其文件</p><table><tbody><tr><td class="code"><pre><span class="line">find ./SRR -mindepth 1 -type d -exec sh -c 'mv {}/* ./SRR; rmdir {}' \;</span><br></pre></td></tr></tbody></table><p>fasterq-dump进行批量转换，将所有 .sra 文件都放在SRR文件夹里</p><table><tbody><tr><td class="code"><pre><span class="line"><span class="comment"># 设置输入目录</span></span><br><span class="line">sra_dir<span class="operator">=</span><span class="string">"./SRR"</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># 设置输出目录</span></span><br><span class="line">output_dir<span class="operator">=</span><span class="string">"./fastq-result"</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># 遍历目录中的所有.sra文件</span></span><br><span class="line"><span class="keyword">for</span> sra_file <span class="keyword">in</span> <span class="operator">$</span>sra_dir<span class="operator">/</span><span class="operator">*</span>.sra</span><br><span class="line">do</span><br><span class="line">    <span class="comment"># 获取不带路径的文件名</span></span><br><span class="line">    filename<span class="operator">=</span><span class="operator">$</span><span class="punctuation">(</span>basename <span class="string">"$sra_file"</span> .sra<span class="punctuation">)</span></span><br><span class="line">    </span><br><span class="line">    <span class="comment"># 使用fasterq-dump处理每个文件</span></span><br><span class="line">    fasterq<span class="operator">-</span>dump <span class="operator">-</span><span class="operator">-</span>outdir <span class="string">"$output_dir"</span> <span class="operator">-</span><span class="operator">-</span>split<span class="operator">-</span><span class="number">3</span> <span class="string">"$sra_file"</span></span><br><span class="line">done</span><br></pre></td></tr></tbody></table><p>生成在当前路径 .&#x2F;fastq-result 下的 fastq 文件</p><h1 id="下载rRNA序列"><a href="#下载rRNA序列" 
class="headerlink" title="下载rRNA序列"></a>下载rRNA序列</h1><p>参考 <a href="https://www.jianshu.com/p/10477f96f12e">https://www.jianshu.com/p/10477f96f12e</a></p><p><img src="/picture/image-20240905223242161.png" alt="image-20240905223242161"></p><p><img src="/picture/image-20240905223210970.png" alt="image-20240905223210970"></p><h1 id="cutadapt过滤序列"><a href="#cutadapt过滤序列" class="headerlink" title="cutadapt过滤序列"></a>cutadapt过滤序列</h1><p>-u 4 \ 可能并不需要</p><table><tbody><tr><td class="code"><pre><span class="line">mkdir cutadapt-result</span><br><span class="line"># 将工作目录设置为fastq文件所在的目录</span><br><span class="line">cd ./fastq-result/</span><br><span class="line"></span><br><span class="line"># 将传入的参数赋值给变量</span><br><span class="line">file1_pattern=".fastq"</span><br><span class="line"></span><br><span class="line"># 遍历所有以第一个参数模式结尾的文件</span><br><span class="line">for file1 in *${file1_pattern}; do</span><br><span class="line">    # 从文件名中提取去掉模式后的部分</span><br><span class="line">    base=$(basename "$file1" ${file1_pattern})</span><br><span class="line">    </span><br><span class="line">    # 直接执行trim_galore命令，不需要检查对应的file2是否存在</span><br><span class="line">    (</span><br><span class="line">        echo "找到名为 $base 的文件 $file1"</span><br><span class="line">        #循环执行代码区</span><br><span class="line">        #trim_galore -q 25 --phred33 --length 35 --stringency 3 --paired -o ../clean_data/ "$file1" "$file2"</span><br><span class="line">cutadapt -j 20 \</span><br><span class="line">  -a "TGGAATTCTCGGGTGCCAAGG" \</span><br><span class="line">  -u 4 \</span><br><span class="line">  -m 24 \</span><br><span class="line">  -M 35 \</span><br><span class="line">  -q 20 \</span><br><span class="line">  --match-read-wildcards \</span><br><span class="line">  --max-n 0.25 \</span><br><span class="line">  -o ../cutadapt-result/${base}_clear.fastq \</span><br><span class="line">  ./${file1}</span><br><span class="line">    ) &amp;</span><br><span class="line">done</span><br><span 
class="line"></span><br><span class="line"># 等待所有后台进程完成</span><br><span class="line">wait</span><br></pre></td></tr></tbody></table><h1 id="建立rRNA索引"><a href="#建立rRNA索引" class="headerlink" title="建立rRNA索引"></a>建立rRNA索引</h1><table><tbody><tr><td class="code"><pre><span class="line">bowtie2-build rRNA.fasta rattrna</span><br></pre></td></tr></tbody></table><h2 id="bowtie2比对，删除rRNA序列"><a href="#bowtie2比对，删除rRNA序列" class="headerlink" title="bowtie2比对，删除rRNA序列"></a>bowtie2比对，删除rRNA序列</h2><table><tbody><tr><td class="code"><pre><span class="line">mkdir bowtie-resule</span><br><span class="line"># 将工作目录设置为fastq文件所在的目录</span><br><span class="line">cd ./cutadapt-result/</span><br><span class="line"></span><br><span class="line"># 将传入的参数赋值给变量</span><br><span class="line">file1_pattern="_clear.fastq"</span><br><span class="line"></span><br><span class="line"># 遍历所有以第一个参数模式结尾的文件</span><br><span class="line">for file1 in *${file1_pattern}; do</span><br><span class="line">    # 从文件名中提取去掉模式后的部分</span><br><span class="line">    base=$(basename "$file1" ${file1_pattern})</span><br><span class="line">    </span><br><span class="line">    # 直接执行trim_galore命令，不需要检查对应的file2是否存在</span><br><span class="line">    (</span><br><span class="line">        echo "找到名为 $base 的文件 $file1"</span><br><span class="line">        #循环执行代码区</span><br><span class="line">        #trim_galore -q 25 --phred33 --length 35 --stringency 3 --paired -o ../clean_data/ "$file1" "$file2"</span><br><span class="line">bowtie2 -x /public/home/dk_szy/songxudong/riboseq/rRNA/rattrna --un-gz ../bowtie-resule/${base}.fastq.gz -U ./${file1} -p 20 -S ../bowtie-resule/${base}.sam</span><br><span class="line"></span><br><span class="line">    ) &amp;</span><br><span class="line">done</span><br><span class="line"></span><br><span class="line"># 等待所有后台进程完成</span><br><span class="line">wait</span><br></pre></td></tr></tbody></table><h1 id="STAR比对到测序物种基因组"><a href="#STAR比对到测序物种基因组" class="headerlink" 
title="STAR比对到测序物种基因组"></a>Aligning to the species genome with STAR</h1><p>According to online tutorials, STAR is the fastest aligner, at the cost of high CPU and memory requirements</p><p>Reference: <a href="https://www.jianshu.com/p/5b6dfc954315">https://www.jianshu.com/p/5b6dfc954315</a></p><h2 id="安装"><a href="#安装" class="headerlink" title="安装"></a>Installation</h2><table><tbody><tr><td class="code"><pre><span class="line">conda install bioconda::star</span><br></pre></td></tr></tbody></table><h2 id="下载参考基因组"><a href="#下载参考基因组" class="headerlink" title="下载参考基因组"></a>Downloading the reference genome</h2><p>The reference species is rat: <a href="https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=10116">Rattus norvegicus</a></p><p>Download from Ensembl: <a href="https://asia.ensembl.org/Rattus_norvegicus/Info/Index">https://asia.ensembl.org/Rattus_norvegicus&#x2F;Info&#x2F;Index</a></p><p><img src="/picture/image-20240909084551111.png" alt="image-20240909084551111"></p><p>Files to download: Rattus_norvegicus.mRatBN7.2.dna.toplevel.fa.gz</p><p>Rattus_norvegicus.mRatBN7.2.112.gtf.gz</p><table><tbody><tr><td class="code"><pre><span class="line"># Download in the background</span><br><span class="line">nohup wget -c https://ftp.ensembl.org/pub/release-112/fasta/rattus_norvegicus/dna/Rattus_norvegicus.mRatBN7.2.dna.toplevel.fa.gz &amp;</span><br><span class="line">nohup wget -c https://ftp.ensembl.org/pub/release-112/gtf/rattus_norvegicus/Rattus_norvegicus.mRatBN7.2.112.gtf.gz &amp;</span><br></pre></td></tr></tbody></table><h2 id="构建索引"><a href="#构建索引" class="headerlink" title="构建索引"></a>Building the index</h2><p>Decompress the reference genome and the annotation</p><p>First measure the maximum read length, which determines STAR's --sjdbOverhang parameter</p><p><a href="https://blog.csdn.net/qazplm12_3/article/details/119687084">https://blog.csdn.net/qazplm12_3&#x2F;article&#x2F;details&#x2F;119687084</a></p><table><tbody><tr><td class="code"><pre><span class="line">conda install -c bioconda seqkit</span><br></pre></td></tr></tbody></table><table><tbody><tr><td class="code"><pre><span class="line">seqkit stat *.fastq</span><br></pre></td></tr></tbody></table><p><img src="/picture/image-20240909100054216.png" 
alt="image-20240909100054216"></p><p>The maximum read length in our Ribo-seq data is 31, so we set --sjdbOverhang 30 (maximum read length minus 1)</p><p><a href="https://www.jianshu.com/p/9bdad4a4f98f">https://www.jianshu.com/p/9bdad4a4f98f</a></p><table><tbody><tr><td class="code"><pre><span class="line">gzip -c -d Rattus_norvegicus.mRatBN7.2.112.gtf.gz &gt; Rattus_norvegicus.mRatBN7.2.112.gtf</span><br><span class="line">gzip -c -d Rattus_norvegicus.mRatBN7.2.dna.toplevel.fa.gz &gt; Rattus_norvegicus.mRatBN7.2.dna.toplevel.fa</span><br></pre></td></tr></tbody></table><table><tbody><tr><td class="code"><pre><span class="line">cd reference</span><br><span class="line"></span><br><span class="line">STAR \</span><br><span class="line">    --runMode genomeGenerate \</span><br><span class="line">    --runThreadN 20 \</span><br><span class="line">    --genomeDir ./ \</span><br><span class="line">    --genomeFastaFiles ./Rattus_norvegicus.mRatBN7.2.dna.toplevel.fa \</span><br><span class="line">    --sjdbGTFfile ./Rattus_norvegicus.mRatBN7.2.112.gtf \</span><br><span class="line">    --sjdbOverhang 30</span><br></pre></td></tr></tbody></table><p>This took about 20 minutes, which is indeed somewhat faster than other tools</p><p>The output is 18 files covering sequence and annotation</p><p><img src="/picture/image-20240910152808887.png" alt="image-20240910152808887"></p><h2 id="序列比对"><a href="#序列比对" class="headerlink" title="序列比对"></a>Alignment</h2><p>Parameters follow the reference article</p><p>Single sample</p><table><tbody><tr><td class="code"><pre><span class="line">STAR --outSAMtype BAM SortedByCoordinate \</span><br><span class="line">--runThreadN 20 \</span><br><span class="line">--genomeDir ./reference \</span><br><span class="line">--outFilterMismatchNmax 2 \</span><br><span class="line">--outFilterMultimapNmax 5 \</span><br><span class="line">--outFilterMatchNmin 16 \</span><br><span class="line">--alignEndsType EndToEnd \</span><br><span class="line">--readFilesIn  ./bowtie-resule/SRR12414240.fastq \</span><br><span class="line">--outFileNamePrefix ./star-result/star_output_</span><br></pre></td></tr></tbody></table><p>Batch</p><table><tbody><tr><td class="code"><pre><span class="line"># Set the working directory to the one that holds the fastq files</span><br><span class="line">cd 
./bowtie-resule/</span><br><span class="line"></span><br><span class="line"># Suffix that identifies the input files</span><br><span class="line">file1_pattern=".fastq"</span><br><span class="line"></span><br><span class="line"># Loop over every file that ends with that suffix</span><br><span class="line">for file1 in *${file1_pattern}; do</span><br><span class="line">    # Strip the suffix from the file name to get the sample name</span><br><span class="line">    base=$(basename "$file1" "${file1_pattern}")</span><br><span class="line">    </span><br><span class="line">    # Single-end data: run STAR directly in a background subshell, no paired file2 to check for</span><br><span class="line">    (</span><br><span class="line">        echo "Found sample $base in file $file1"</span><br><span class="line">        # Commands run for each file</span><br><span class="line">        #trim_galore -q 25 --phred33 --length 35 --stringency 3 --paired -o ../clean_data/ "$file1" "$file2"</span><br><span class="line">        STAR --outSAMtype BAM SortedByCoordinate \</span><br><span class="line">--runThreadN 20 \</span><br><span class="line">--genomeDir ../reference \</span><br><span class="line">--outFilterMismatchNmax 2 \</span><br><span class="line">--outFilterMultimapNmax 5 \</span><br><span class="line">--outFilterMatchNmin 16 \</span><br><span class="line">--alignEndsType EndToEnd \</span><br><span class="line">--quantMode TranscriptomeSAM \</span><br><span class="line">--readFilesIn  "./${file1}" \</span><br><span class="line">--outFileNamePrefix "../star-result/${base}"</span><br><span class="line">    ) &amp;</span><br><span class="line">done</span><br><span class="line"></span><br><span class="line"># Wait for all background processes to finish</span><br><span class="line">wait</span><br></pre></td></tr></tbody></table><p>The resulting SRR12414240Aligned.toTranscriptome.out.bam</p><p>is the transcriptome-level alignment that RiboCode needs as input.</p><h1 id="生成表达矩阵"><a href="#生成表达矩阵" class="headerlink" title="生成表达矩阵"></a>Generating the expression matrix</h1><table><tbody><tr><td class="code"><pre><span class="line">gtf='/public/home/dk_szy/songxudong/rna-test/reference/gft/mm10.refGene.gtf.gz'</span><br><span class="line"></span><br><span class="line">mkdir  -p  
./counts</span><br><span class="line"></span><br><span class="line">cd ./counts</span><br><span class="line"></span><br><span class="line">pwd</span><br><span class="line"></span><br><span class="line">featureCounts -T  20  -p  -a  "$gtf"  -o  counts.txt  ../align/*.bam</span><br><span class="line"></span><br><span class="line">multiqc ./</span><br><span class="line"></span><br><span class="line">echo -e " \n \n \n ALL WORK DONE !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!  \n "</span><br></pre></td></tr></tbody></table><h1 id="Ribocode分析"><a href="#Ribocode分析" class="headerlink" title="Ribocode分析"></a>RiboCode analysis</h1><p>Official workflow: <a href="https://github.com/zhengtaoxiao/RiboCode">https://github.com/zhengtaoxiao/RiboCode</a></p><p>Tsinghua University tutorial: <a href="https://book.ncrnalab.org/teaching/part-iii.-ngs-data-analyses/7.rna-regulation-ii/ribo_seq">https://book.ncrnalab.org/teaching/part-iii.-ngs-data-analyses/7.rna-regulation-ii/ribo_seq</a></p><h2 id="conda安装"><a href="#conda安装" class="headerlink" title="conda安装"></a>Installing with conda</h2><p>Not used: the conda-installed package still raised errors at run time; see the source installation below.</p><p>RiboCode has requirements on the Python version, so create a dedicated environment.</p><table><tbody><tr><td class="code"><pre><span class="line">conda create -n ribocode python=2.7</span><br></pre></td></tr></tbody></table><p>Activate the environment</p><table><tbody><tr><td class="code"><pre><span class="line">conda activate ribocode</span><br></pre></td></tr></tbody></table><p>Install RiboCode</p><table><tbody><tr><td class="code"><pre><span class="line">conda install -c bioconda ribocode</span><br></pre></td></tr></tbody></table><h3 id="补充修改"><a href="#补充修改" class="headerlink" title="补充修改"></a>Additional fixes</h3><p>Note: the failures were mainly compatibility problems. My workaround was to pin python&#x3D;2.7 in conda and install numpy&#x3D;1.16.5.</p><table><tbody><tr><td class="code"><pre><span class="line">conda update h5py</span><br><span class="line">conda install numpy=1.16.5</span><br></pre></td></tr></tbody></table><h2 id="源码安装（可选）"><a href="#源码安装（可选）" class="headerlink" title="#源码安装（可选）"></a>Installing from source (optional)</h2><p>This still fails for now. Other tutorials suggest the scripts target a Python version that is too old, but the exact cause is unclear.</p><p><img 
src="/picture/image-20240909140051290.png" alt="image-20240909140051290"></p><p>Download RiboCode-1.2.13.tar.gz</p><table><tbody><tr><td class="code"><pre><span class="line">pip install --user  RiboCode-*.tar.gz</span><br></pre></td></tr></tbody></table><p>Add the environment variables</p><table><tbody><tr><td class="code"><pre><span class="line">export PATH=$PATH:$HOME/.local/bin/</span><br><span class="line">export PYTHONPATH=$HOME/.local/lib/python2.7</span><br><span class="line"></span><br><span class="line">source ~/.bashrc</span><br></pre></td></tr></tbody></table><h2 id="准备注释文件"><a href="#准备注释文件" class="headerlink" title="准备注释文件"></a>Preparing the annotation files</h2><p>Activate the environment</p><table><tbody><tr><td class="code"><pre><span class="line">conda activate ribocode</span><br></pre></td></tr></tbody></table><table><tbody><tr><td class="code"><pre><span class="line">prepare_transcripts -g ./reference/Rattus_norvegicus.mRatBN7.2.112.gtf -f ./reference/Rattus_norvegicus.mRatBN7.2.dna.toplevel.fa -o ./ribocode-seference</span><br></pre></td></tr></tbody></table><h2 id="选择RPF读数的长度范围并识别P位点位置"><a href="#选择RPF读数的长度范围并识别P位点位置" class="headerlink" title="选择RPF读数的长度范围并识别P位点位置"></a>Selecting the RPF read-length range and identifying P-site positions</h2><p>(The P-site lies at roughly nucleotides 13-15 of the ribosome footprint; source: 基迪奥生物 (Gene Denovo).)</p><p>This step took far too much time,</p><p>mainly because of the Python version and the STAR output from the previous step: the input must be a transcriptome-level alignment, so --quantMode TranscriptomeSAM is an essential STAR parameter.</p><table><tbody><tr><td class="code"><pre><span class="line">(ribocode) [dk_szy@login1 riboseq]$ python --version</span><br><span class="line">Python 2.7.18 :: Anaconda, Inc.</span><br></pre></td></tr></tbody></table><p>Single sample</p><table><tbody><tr><td class="code"><pre><span class="line">metaplots -a ./ribocode-seference/ -r ./star-result/SRR12414240Aligned.toTranscriptome.out.bam \</span><br><span class="line">-o ./test/ \</span><br><span class="line">-m 26 -M 50 -s yes -pv1 1 -pv2 1</span><br></pre></td></tr></tbody></table><p><img src="/picture/image-20240910101424560.png" alt="image-20240910101424560"></p><p>Batch</p><p>Use the "-i" option to pass a text file that lists the bam files, one file name per line.</p><table><tbody><tr><td 
class="code"><pre><span class="line">cd ./star-result/</span><br><span class="line">metaplots -a ../ribocode-seference/ -i ./test.txt \</span><br><span class="line">-o ./test/ \</span><br><span class="line">-m 26 -M 50 -s yes -pv1 1 -pv2 1</span><br></pre></td></tr></tbody></table><p><img src="/picture/image-20240911094820898.png" alt="image-20240911094820898"></p><p>The generated _pre_config.txt can be used as the basis for config.txt.</p><p><img src="/picture/image-20240923102222043.png" alt="image-20240923102222043"></p><h2 id="使用核糖体分析数据检测翻译的ORF："><a href="#使用核糖体分析数据检测翻译的ORF：" class="headerlink" title="使用核糖体分析数据检测翻译的ORF："></a>Detecting translated ORFs from ribosome profiling data</h2><p>Prepare a config.txt file whose content comes from the _pre_config.txt produced in the previous step. Avoid editing it where possible and copy each sample's line as is; otherwise it may fail for reasons that are hard to trace, such as whitespace differences.</p><p>Place it in the same directory as the alignment files.</p><p>For example:</p><table><tbody><tr><td class="code"><pre><span class="line"># List the ribosome profiling bam/sam files below and specify the lengths and P-site locations of alignment reads which</span><br><span class="line"># are most likely originated from the translating ribosomes. If multiple files are defined, their P-site densities along</span><br><span class="line"># each nucleotide would be added together.</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"># Explanation of each column:</span><br><span class="line"># 1. SampleName: specify a name for each sample</span><br><span class="line"># 2. AlignmentFile: ribosome profiling alignment file (bam or sam format) at the transcript-level</span><br><span class="line"># 3. Stranded: Strandedness. Specify 'yes' for stranded interpretation, 'reverse' for reversed strand interpretation, or</span><br><span class="line">#              "no" for non strand-specific libraries.</span><br><span class="line"># 4,5. 
P-siteReadLength, and P-siteOffsets: the read lengths and P-sites locations.</span><br><span class="line">#      Both of them can be estimated by perform the metagene analysis using our package.</span><br><span class="line">#      List all lengths or P-site locations which separated by ",".</span><br><span class="line"></span><br><span class="line"># SampleName	AlignmentFile	Stranded(yes/reverse)	P-siteReadLength	P-siteLocations</span><br><span class="line">SRR12414240Aligned.toTranscriptome.out	SRR12414240Aligned.toTranscriptome.out.bam	yes	28,29,30,31,32,33,34,35	12,12,12,12,12,12,12,12</span><br><span class="line">SRR12414241Aligned.toTranscriptome.out	SRR12414241Aligned.toTranscriptome.out.bam	yes	28,29,30,31,32,35	12,12,12,12,12,15</span><br><span class="line">SRR12414242Aligned.toTranscriptome.out	SRR12414242Aligned.toTranscriptome.out.bam	yes	28,29,30,31	12,12,12,12</span><br><span class="line">SRR12414243Aligned.toTranscriptome.out	SRR12414243Aligned.toTranscriptome.out.bam	yes	28,29,30,31,33,34	12,12,12,12,12,12</span><br></pre></td></tr></tbody></table><p>Command</p><table><tbody><tr><td class="code"><pre><span class="line">RiboCode -a ../ribocode-seference/  -c ./config.txt -l no -g -o ../test/</span><br></pre></td></tr></tbody></table><p><img src="/picture/image-20240911094748548.png" alt="image-20240911094748548"></p><h1 id="测序长度计数"><a href="#测序长度计数" class="headerlink" title="测序长度计数"></a>Counting read lengths</h1><p>Use seqkit to summarize the reads that remain after filtering with bowtie2.</p><table><tbody><tr><td class="code"><pre><span class="line">seqkit fx2tab -j 20 -l -n -i -H ./bowtie-resule/*.fastq | cut -f 2 | sort | uniq -c &gt; sum.txt</span><br></pre></td></tr></tbody></table><p>This produces sum.txt, which holds the read-length counts.</p><h1 id="计算翻译效率"><a href="#计算翻译效率" class="headerlink" title="计算翻译效率"></a>Calculating translation efficiency</h1><p>Not tried yet.</p><p>Reference: <a 
href="https://book.ncrnalab.org/teaching/part-iii.-ngs-data-analyses/7.rna-regulation-ii/ribo_seq">https://book.ncrnalab.org/teaching/part-iii.-ngs-data-analyses/7.rna-regulation-ii/ribo_seq</a></p><table><tbody><tr><td class="code"><pre><span class="line">library<span class="punctuation">(</span>xtail<span class="punctuation">)</span></span><br><span class="line">ribo <span class="operator">&lt;-</span> read.table<span class="punctuation">(</span><span class="string">'Ribo_count.txt'</span><span class="punctuation">,</span>header<span class="operator">=</span><span class="built_in">T</span><span class="punctuation">,</span> <span class="built_in">quote</span><span class="operator">=</span><span class="string">''</span><span class="punctuation">,</span>check.names<span class="operator">=</span><span class="built_in">F</span><span class="punctuation">,</span> sep<span class="operator">=</span><span class="string">'\t'</span><span class="punctuation">,</span>row.names<span class="operator">=</span><span class="number">1</span><span class="punctuation">)</span></span><br><span class="line">mrna <span class="operator">&lt;-</span> read.table<span class="punctuation">(</span><span class="string">'RNA_count.txt'</span><span class="punctuation">,</span>header<span class="operator">=</span><span class="built_in">T</span><span class="punctuation">,</span> <span class="built_in">quote</span><span class="operator">=</span><span class="string">''</span><span class="punctuation">,</span>check.names<span class="operator">=</span><span class="built_in">F</span><span class="punctuation">,</span> sep<span class="operator">=</span><span class="string">'\t'</span><span class="punctuation">,</span>row.names<span class="operator">=</span><span class="number">1</span><span class="punctuation">)</span></span><br><span class="line"></span><br><span class="line">ribo <span class="operator">&lt;-</span> ribo<span class="punctuation">[</span><span class="punctuation">,</span><span 
class="built_in">c</span><span class="punctuation">(</span><span class="string">"wtnouvb1"</span><span class="punctuation">,</span><span class="string">"wtnouvb2"</span><span class="punctuation">,</span><span class="string">"wtnouvb3"</span><span class="punctuation">,</span><span class="string">"wtuvb1"</span><span class="punctuation">,</span><span class="string">"wtuvb2"</span><span class="punctuation">,</span><span class="string">"wtuvb3"</span><span class="punctuation">)</span><span class="punctuation">]</span></span><br><span class="line">mrna <span class="operator">&lt;-</span> mrna<span class="punctuation">[</span><span class="built_in">c</span><span class="punctuation">(</span><span class="string">"CD1_1"</span><span class="punctuation">,</span><span class="string">"CD1_2"</span><span class="punctuation">,</span><span class="string">"CD1_3"</span><span class="punctuation">,</span><span class="string">"CD0_1"</span><span class="punctuation">,</span><span class="string">"CD0_2"</span><span class="punctuation">,</span><span class="string">"CD0_3"</span><span class="punctuation">)</span><span class="punctuation">]</span></span><br><span class="line"></span><br><span class="line">condition <span class="operator">&lt;-</span> <span class="built_in">c</span><span class="punctuation">(</span><span class="string">"control"</span><span class="punctuation">,</span><span class="string">"control"</span><span class="punctuation">,</span><span class="string">"control"</span><span class="punctuation">,</span><span class="string">"treat"</span><span class="punctuation">,</span><span class="string">"treat"</span><span class="punctuation">,</span><span class="string">"treat"</span><span class="punctuation">)</span></span><br><span class="line">results <span class="operator">&lt;-</span> xtail<span class="punctuation">(</span>mrna<span class="punctuation">,</span>ribo<span class="punctuation">,</span>condition<span class="punctuation">,</span>minMeanCount<span 
class="operator">=</span><span class="number">1</span><span class="punctuation">,</span>bins<span class="operator">=</span><span class="number">10000</span><span class="punctuation">)</span></span><br><span class="line">results_tab <span class="operator">&lt;-</span> resultsTable<span class="punctuation">(</span>results<span class="punctuation">,</span>sort.by<span class="operator">=</span><span class="string">"pvalue.adjust"</span><span class="punctuation">,</span>log2FCs<span class="operator">=</span><span class="literal">TRUE</span><span class="punctuation">,</span> log2Rs<span class="operator">=</span><span class="literal">TRUE</span><span class="punctuation">)</span></span><br><span class="line">write.table<span class="punctuation">(</span>results_tab<span class="punctuation">,</span><span class="string">"TE.xls"</span><span class="punctuation">,</span><span class="built_in">quote</span><span class="operator">=</span><span class="built_in">F</span><span class="punctuation">,</span>sep<span class="operator">=</span><span class="string">"\t"</span><span class="punctuation">)</span></span><br></pre></td></tr></tbody></table><h1 id="ChatGPT根据参考文章提供的内容"><a href="#ChatGPT根据参考文章提供的内容" class="headerlink" title="ChatGPT根据参考文章提供的内容"></a>Pipeline suggested by ChatGPT from the reference article</h1><table><tbody><tr><td class="code"><pre><span class="line"># Quality control with Cutadapt</span><br><span class="line">cutadapt -j 4 \</span><br><span class="line">  -a "TGGAATTCTCGGGTGCCAAGG" \</span><br><span class="line">  -u 4 \</span><br><span class="line">  -m 24 \</span><br><span class="line">  -M 35 \</span><br><span class="line">  -q 20 \</span><br><span class="line">  --match-read-wildcards \</span><br><span class="line">  --max-n 0.25 \</span><br><span class="line">  -o trimmed_reads.fastq \</span><br><span class="line">  input_reads.fastq</span><br><span class="line"></span><br><span class="line"># Remove rRNA reads with bowtie</span><br><span class="line">bowtie -p 4 \</span><br><span class="line">  -v 2 \</span><br><span 
class="line">  -m 1 \</span><br><span class="line">  --un non_rRNA_reads.fastq \</span><br><span class="line">  path_to_index/rRNA_index \</span><br><span class="line">  trimmed_reads.fastq</span><br><span class="line"></span><br><span class="line"># Map the remaining reads to the rat genome with STAR</span><br><span class="line">STAR --runThreadN 4 \</span><br><span class="line">  --genomeDir path_to_STAR_index \</span><br><span class="line">  --readFilesIn non_rRNA_reads.fastq \</span><br><span class="line">  --outFileNamePrefix star_output_ \</span><br><span class="line">  --outFilterMismatchNmax 2 \</span><br><span class="line">  --outFilterMultimapNmax 5 \</span><br><span class="line">  --outFilterMatchNmin 16 \</span><br><span class="line">  --alignEndsType EndToEnd</span><br><span class="line"></span><br><span class="line"># Identify and quantify ORFs with RiboCode</span><br><span class="line">RiboCode.py -c config_file.txt \</span><br><span class="line">  -l no \</span><br><span class="line">  -m 10 \</span><br><span class="line">  -g \</span><br><span class="line">  -b \</span><br><span class="line">  -A "CTG,GTG,TTG" \</span><br><span class="line">  path_to_STAR_output/Aligned.out.sam</span><br><span class="line"></span><br><span class="line"># Analyze the TE of the ORFs with Ribodiff</span><br><span class="line">Ribodiff.py -i RiboCode_output_ORFs.txt \</span><br><span class="line">  -o Ribodiff_output.txt</span><br><span class="line"></span><br><span class="line"># Analyze the correlation between RNA-seq and Ribo-seq replicates</span><br><span class="line"># This step usually needs a custom script or statistics software; the command here is only an example</span><br><span class="line">Rscript analyze_correlation.R Ribo-seq_data.txt RNA-seq_data.txt</span><br><span class="line"></span><br></pre></td></tr></tbody></table><h1 id="绘图"><a href="#绘图" class="headerlink" title="绘图"></a>Plotting</h1><p>Read-length statistics</p><table><tbody><tr><td class="code"><pre><span class="line">seqkit fx2tab -j 30 -l  -n -i -H file.fastq.gz  &gt; Length.txt</span><br></pre></td></tr></tbody></table><h1 id="编码的保守性"><a href="#编码的保守性" class="headerlink" 
title="编码的保守性"></a>Conservation of the coding sequence</h1><p><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9889109/">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9889109/</a></p><p><img src="/picture/image-20240925142042648.png" alt="image-20240925142042648"></p><p>Run the analysis with <a href="https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Translations&PROGRAM=tblastn&PAGE_TYPE=BlastSearch&BLAST_SPEC=">tblastn</a>.</p><p>Copy the protein sequence encoded by the smORF</p><p>from the earlier _collapsed.txt file.</p><p><img src="/picture/image-20240925142143277.png" alt="image-20240925142143277"></p><p>Open the website</p><p><a href="https://blast.ncbi.nlm.nih.gov/Blast.cgi">https://blast.ncbi.nlm.nih.gov/Blast.cgi</a></p><p><img src="/picture/image-20240925142257291.png" alt="image-20240925142257291"></p><p><img src="/picture/image-20240925142332303.png" alt="image-20240925142332303"></p>]]></content>
    
    
    <summary type="html">Introduction to Ribo-seq: https:&amp;#x2F;&amp;#x2F;www.cell.com&amp;#x2F;cell-metabolism&amp;#x2F;fulltext&amp;#x2F;S1550-4131(22)00541-1?uuid&amp;#x3D;uuid%3A1357b65f-e2ff-45e2-a40c-7a90f3170be5#mmc2 Ribosome profiling (Ribo-seq)</summary>
    
    
    
    <category term="生物信息学" scheme="https://song-xudong.github.io/categories/%E7%94%9F%E7%89%A9%E4%BF%A1%E6%81%AF%E5%AD%A6/"/>
    
    
    <category term="生物信息学" scheme="https://song-xudong.github.io/tags/%E7%94%9F%E7%89%A9%E4%BF%A1%E6%81%AF%E5%AD%A6/"/>
    
    <category term="Ribo-seq" scheme="https://song-xudong.github.io/tags/Ribo-seq/"/>
    
  </entry>
  
  <entry>
    <title>AutoDock Molecular Docking</title>
    <link href="https://song-xudong.github.io/2024/11/16/AutoDock%E5%88%86%E5%AD%90%E5%AF%B9%E6%8E%A5/"/>
    <id>https://song-xudong.github.io/2024/11/16/AutoDock%E5%88%86%E5%AD%90%E5%AF%B9%E6%8E%A5/</id>
    <published>2024-11-16T04:17:02.000Z</published>
    <updated>2024-11-16T04:24:07.000Z</updated>
    
    <content type="html"><![CDATA[<h1 id="文件下载"><a href="#文件下载" class="headerlink" title="文件下载"></a>Downloading the files</h1><h2 id="下载PDB文件"><a href="#下载PDB文件" class="headerlink" title="下载PDB文件"></a>Downloading the PDB file</h2><p>This walkthrough docks eEF2K with oleuropein as an example.</p><hr><p>Structure file for eEF2K: the eEF2K-TR construct, without ATP or ADP, including CaM</p><p><a href="https://www.rcsb.org/structure/7SHQ">https://www.rcsb.org/structure/7SHQ</a></p><p>Download <a href="https://files.rcsb.org/download/7SHQ.pdb">PDB Format</a></p><p>Structure file for oleuropein</p><p><a href="https://pubchem.ncbi.nlm.nih.gov/compound/5281544">https://pubchem.ncbi.nlm.nih.gov/compound/5281544</a></p><p>Download <a href="https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/CID/5281544/record/SDF?record_type=2d&response_type=save&response_basename=Structure2D_COMPOUND_CID_5281544">2D Structure</a></p><p>Put the files in the <strong>AutoDock working directory</strong> (the path must not contain Chinese characters)<br><img src="/picture/image-20240520095924170.png" alt="image-20240520095924170"></p><h2 id="格式转化"><a href="#格式转化" class="headerlink" title="格式转化"></a>Format conversion</h2><p>Use <strong>Open Babel</strong> to convert the <strong>Oleuropein</strong> sdf file into a pdb file that AutoDock can read.</p><p>Choose the input and output formats and directories, then click CONVERT.</p><p><img src="/picture/image-20240520100335713.png" alt="image-20240520100335713"></p><h1 id="Autodock对接"><a href="#Autodock对接" class="headerlink" title="Autodock对接"></a>Docking with AutoDock</h1><h2 id="软件启动"><a href="#软件启动" class="headerlink" title="软件启动"></a>Starting the software</h2><p><img src="/picture/image-20240520101335053.png" alt="image-20240520101335053"></p><p>Remember to disable the AMD graphics card and close Lenovo PC Manager.</p><p>Reference: <a href="https://www.zhihu.com/question/393216168">https://www.zhihu.com/question/393216168</a></p><h2 id="对接流程"><a href="#对接流程" class="headerlink" title="对接流程"></a>Docking workflow</h2><p>Reference: <a href="https://zhuanlan.zhihu.com/p/662465038">https://zhuanlan.zhihu.com/p/662465038</a></p><p>Open the AutoDock software</p><h3 id=""><a href="#" class="headerlink" title=""></a><img src="/picture/v2-803b427f5480684166eca85d6a2f8b4d_720w.webp" alt="img"></h3><p>First set the path: File-&gt;Preferences-&gt;Set. In the dialog shown below, choose Startup 
directory and paste in the path of the folder that holds our five files.</p><p><img src="/picture/v2-c701e2e475f091be274df740cbfeec84_720w.webp" alt="img"></p><p>Then click Make Default to set it as the default path.</p><p>3. Prepare the protein. Click File-&gt;Read Molecule and select the protein.</p><p><img src="/picture/v2-6c8b1ab72e7109ade47c139d4874d1a3_720w.webp" alt="img"></p><p><img src="/picture/v2-0a8f980d32862213414cedf591383707_720w.webp" alt="img"></p><p>First remove water from the protein and add hydrogens: Edit-&gt;Delete Water, then Edit-&gt;Hydrogens-&gt;Add-&gt;OK.</p><p><img src="/picture/v2-6376d484378e06555d188433fb02dced_720w.webp" alt="img"></p><p>Set it as the macromolecule: click Grid-&gt;Macromolecule-&gt;Choose, pick the protein, and click Select Molecule.</p><p><img src="/picture/v2-8b9dd1da2bc6a10d96e077d1e49cbeaf_720w.webp" alt="img"></p><p>This yields a pdbqt file for the protein; save it. (Avoid special characters in the file name: the long dash in the name shown below can cause a run-time error.)</p><p><img src="/picture/v2-e6d2128b1714165d8be2979303418f0e_720w.webp" alt="img"></p><p>4. Prepare the small molecule. Delete the protein with Edit-&gt;Delete, load the small molecule the same way the protein was loaded, and add hydrogens in the same way. Use Ligand-&gt;Input-&gt;Choose to set the small molecule as the ligand, then detect its torsions with Ligand-&gt;Torsion Tree-&gt;Detect Root.</p><p><img src="/picture/v2-bfba88d41c1b429ca40b3ebe25b4fbc0_720w.webp" alt="img"></p><p>Ligand-&gt;Torsion Tree-&gt;Choose Torsions</p><p><img src="/picture/v2-f0dd7ae87740457582c230f56b76919a_720w.webp" alt="img"></p><p>Click Done.</p><p><img src="/picture/v2-8d621f05dbf66ba84e284160f3ffc41c_720w.webp" alt="img"></p><p>Red bonds cannot rotate; green bonds can.</p><p><img src="/picture/v2-869105bc1d67749ebace248043d86bfc_720w.webp" alt="img"></p><p>Export the small molecule with Ligand-&gt;Output-&gt;Save as PDBQT to get its pdbqt file.</p><p><img src="/picture/v2-71982514ca7db4d4c0964503879a3f1d_720w.webp" alt="img"></p><p>Part 4: start docking.</p><p>Delete the current small molecule.</p><p>Load the protein first: click Grid-&gt;Macromolecule-&gt;Open and select the protein. Answer Yes to every prompt that follows, then confirm.</p><p><img src="/picture/v2-df73e9cd522e93fc94234b038a12324a_720w.webp" alt="img"></p><p>Next load the small molecule: Grid-&gt;Set Map Types-&gt;Open Ligand</p><p><img src="/picture/v2-ed8226ce84b9a275b89b7128f0e5eb42_720w.webp" alt="img"></p><p>Now set up the docking parameters: click Grid-&gt;Grid Box and a box appears.</p><p><img 
src="/picture/v2-4e8ee0136bc729ed20bdbf62c51243ba_720w.webp" alt="img"></p><p>Because we do not know roughly where the binding site is, adjust the three box dimensions until the box encloses both the protein and the small molecule. Rotate the view to check that everything is inside.</p><p><img src="/picture/v2-144cee9ad61d0754b7d043fa1c2f7df5_720w.webp" alt="img"></p><p>This much is enough.</p><p>Click DejaVu GUI, i.e. this icon</p><p><img src="/picture/v2-4e789eb0a12e597c6770aee32bb427b1_720w.webp" alt="img"></p><p>Click root, select the small molecule, and clear the checkbox shown in the screenshot.</p><p><img src="/picture/v2-3e6a18fdeebf4301bae9bfd0b4807af0_720w.webp" alt="img"></p><p>Now drag the small molecule out of the box with the right mouse button, then tick the checkbox you just cleared. In the window opened earlier, click File-&gt;Close saving current.</p><p><img src="/picture/v2-919c3bd82b48706e9263ecb7f84f3c06_720w.webp" alt="img"></p><p>Click Grid-&gt;Output-&gt;Save GPF</p><p><img src="/picture/v2-bf9a1779303125ccdda9f59ee36e280b_720w.webp" alt="img"></p><p>Save it as 1. If the extension is not added automatically, type the gpf extension by hand.</p><p><img src="/picture/v2-9d2b285e7a451e483a57a873bb52f0b3_720w.webp" alt="img"></p><p>Click Save (the name must not contain Chinese characters, spaces, or special symbols).</p><p>Click Run-&gt;Run AutoGrid</p><p><img src="/picture/v2-e403161aca167f5c5bd5794069014dff_720w.webp" alt="img"></p><p>Browse to our gpf file</p><p><img src="/picture/v2-5aae1d9189cd4d9950d2a71294d7e351_720w.webp" alt="img"></p><p>A glg file name is filled in.</p><p><img src="/picture/v2-e80da1e2612273d6910c589295008876_720w.webp" alt="img"></p><p>Click Launch. A new window opens; wait for it to finish.</p><p><img src="/picture/v2-d2fcfdf2faf9fa9fe5a865ea0f1f5559_720w.webp" alt="img"></p><p>When the run is done, the data folder contains many new files ending in map, plus one glg file.</p><p><img src="/picture/v2-765527a3c183541af30b57235e8891cc_720w.webp" alt="img"></p><p>Click Docking-&gt;Macromolecule-&gt;Set Rigid Filename and open the protein. Then click Docking-&gt;Ligand-&gt;Choose, select the small molecule, click Set as Ligand, and accept.</p><p>Click Docking-&gt;Search Parameters-&gt;Genetic Algorithm</p><p><img src="/picture/v2-60f697b8f54bd85803af1a392fef78d3_720w.webp" alt="img"></p><p>The first row is the number of docking runs. I chose 50 here (the official recommendation is 50 runs); click Accept.</p><p><img src="/picture/v2-1f82fc3babe402387ed5b89e2d191548_720w.webp" alt="img"></p><p>Next, Docking-&gt;Docking Parameters-&gt;Accept.</p><p>Docking-&gt;Output-&gt;Lamarckian (4.2) writes the parameter file. Again add the dpf extension by hand and click Save.</p><p><img 
src="/picture/v2-7ca3d16c0217e7093fa80b0db9636c4d_720w.webp" alt="img"></p><p><img src="/picture/v2-8fdcff10bfecb1824ee1f7ec1506d899_720w.webp" alt="img"></p><p>Click Run-&gt;Run AutoDock</p><p><img src="/picture/v2-d762ba8aaada0d787b5bc0bfccf07ac7_720w.webp" alt="img"></p><p>Browse to the dpf file you just saved; a dlg file name is filled in. Click Launch and wait for the program to run. (Docking takes a while; the run is complete when the window closes by itself.)</p><p><img src="/picture/v2-8aa645ab404280a08c691966b2083f45_720w.webp" alt="img"></p><p>Once the dlg file has been written to the folder, you can remove all molecules with Edit-&gt;Delete-&gt;All Molecules.</p><p><img src="/picture/v2-28a9b4911c8a1b9de9a71e04652b4779_720w.webp" alt="img"></p><p>Part 5: analyze the results</p><p>Analyze-&gt;Dockings-&gt;Open</p><p><img src="/picture/v2-e0c2cca664be92ae55623cf0efbfcac9_720w.webp" alt="img"></p><p>Open the dlg file and click OK. Click Analyze-&gt;Macromolecule-&gt;Open and wait for the macromolecule to appear, then click Analyze-&gt;Conformations-&gt;Play, ranked by energy. A new window appears, as shown below.</p><p><img src="/picture/v2-334f0dc473615150cde59dd961924641_720w.webp" alt="img"></p><p>Click the second-to-last button, i.e.</p><p><img src="/picture/v2-031ec389b376060954ff39030e011fb4_720w.webp" alt="img"></p><p>In the new window, click Build H-bonds and then Show Info.</p><p><img src="/picture/v2-4c2fe4a47237d4393585175eb37a075a_720w.webp" alt="img"></p><p>Another window shows the binding energy of the first docking result, the number of hydrogen bonds formed, and so on.</p><p><img src="/picture/v2-fc2dde254d39d4298649749324d97b39_720w.webp" alt="img"></p><p>Click Analyze-&gt;Conformations-&gt;Load… to view the binding energies of the other results.</p><p>Next, click Write Complex.</p><p><img src="/picture/v2-0b7984ce356031cd0060950c47a50ce6_720w.webp" alt="img"></p><p>The output is a pdbqt file; type the extension by hand.</p><p><img src="/picture/v2-84a4c1b6258cc4d3990adb066def04d7_720w.webp" alt="img"></p><p>After saving, convert the file to pdb format with Open Babel. It can then be opened in PyMOL (for installation see <a href="https://zhuanlan.zhihu.com/p/663296401">开源版pymol的下载与安装（写给自己） - 知乎 (zhihu.com)</a>) to inspect the docking result and make figures.</p><p><strong>Exporting images with PyMOL</strong></p><ol><li>Open PyMOL, choose File-&gt;Open, and select the pdb file.</li></ol><p><img src="/picture/v2-49af57a90a72a0b80755ca12c33e5294_720w.webp" alt="img"></p><p>2. Click the S button at the bottom right of PyMOL to display the amino acid residues.</p><p><img src="/picture/v2-43bb5be4a23ce551fb5c82dfea933244_720w.webp" alt="img"></p><p>Here UNL is the small molecule, and the entries after it are protein residues.</p><h1 
id="对接完成使用PyMol可视化"><a href="#对接完成使用PyMol可视化" class="headerlink" title="对接完成使用PyMol可视化"></a>Visualizing the finished docking with PyMOL</h1>]]></content>
    
    
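    <summary type="html">Downloading the files. Downloading the PDB file. This walkthrough docks eEF2K with oleuropein as an example. Structure file for eEF2K: the eEF2K-TR construct, without ATP or ADP, including CaM https:&amp;#x2F;&amp;#x2F;www.rcsb.org&amp;#x2F;structure&amp;#x2F;7SHQ Download PDB Format. Structure file for oleuropein https:&amp;#x2F;&amp;#x2F;pubchem.n</summary>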
    
    
    
    <category term="生物信息学" scheme="https://song-xudong.github.io/categories/%E7%94%9F%E7%89%A9%E4%BF%A1%E6%81%AF%E5%AD%A6/"/>
    
    
    <category term="生物信息学" scheme="https://song-xudong.github.io/tags/%E7%94%9F%E7%89%A9%E4%BF%A1%E6%81%AF%E5%AD%A6/"/>
    
    <category term="分子对接" scheme="https://song-xudong.github.io/tags/%E5%88%86%E5%AD%90%E5%AF%B9%E6%8E%A5/"/>
    
  </entry>
  
  <entry>
    <title>Upstream Transcriptome Analysis</title>
    <link href="https://song-xudong.github.io/2024/11/16/%E8%BD%AC%E5%BD%95%E7%BB%84%E4%B8%8A%E6%B8%B8%E5%88%86%E6%9E%90/"/>
    <id>https://song-xudong.github.io/2024/11/16/%E8%BD%AC%E5%BD%95%E7%BB%84%E4%B8%8A%E6%B8%B8%E5%88%86%E6%9E%90/</id>
    <published>2024-11-16T02:13:12.000Z</published>
    <updated>2024-11-16T02:16:21.000Z</updated>
    
    <content type="html"><![CDATA[<h1 id="创建环境"><a href="#创建环境" class="headerlink" title="创建环境"></a>Creating the environment</h1><table><tbody><tr><td class="code"><pre><span class="line">conda create -n rna_p3 python=3    sra-tools               </span><br><span class="line">conda env list                                         # list environments</span><br><span class="line">conda activate rna_p3                                  # enter the conda environment; do this at the start of every analysis session</span><br><span class="line">conda deactivate                                    # leave the current conda environment</span><br></pre></td></tr></tbody></table><h1 id="上游分析软件下载"><a href="#上游分析软件下载" class="headerlink" title="上游分析软件下载"></a>Installing the upstream analysis software</h1><table><tbody><tr><td class="code"><pre><span class="line">conda install -y package_name=version</span><br></pre></td></tr></tbody></table><p>For example, sra-tools</p><table><tbody><tr><td class="code"><pre><span class="line">conda install bioconda::sra-tools</span><br></pre></td></tr></tbody></table><p>The exact install command for each package can be found at <a href="https://anaconda.org/">https://anaconda.org/</a></p><p>Install</p><p>QC and cleaning: fastqc multiqc trim-galore<br>Alignment and counting: hisat2 subread samtools&#x3D;1.6 salmon</p><p>Installing them one at a time is recommended, so that each one completes cleanly.</p><h1 id="示例文件下载"><a href="#示例文件下载" class="headerlink" title="示例文件下载"></a>Downloading the example files</h1><p>Reference article</p><p><a href="https://www.nature.com/articles/s41422-021-00477-x#data-availability">https://www.nature.com/articles/s41422-021-00477-x#data-availability</a></p><p>Find this statement in the article:</p><p>All data generated in the current study are available in the Gene Expression Omnibus with accession number GSE154290.</p><p>On the NCBI website: <a href="https://www.ncbi.nlm.nih.gov/">https://www.ncbi.nlm.nih.gov/</a></p><p>Select GEO DataSets</p><p>Search for GSE154290</p><p><img src="/picture/image-20240801164417712.png" alt="image-20240801164417712"></p><p><img src="/picture/image-20240801164435455.png" alt="image-20240801164435455"></p><p><img src="/picture/image-20240805154029037.png" alt="image-20240805154029037"></p><p>PAIRED means paired-end sequencing.</p><p><img src="/picture/image-20240801164513367.png" 
alt="image-20240801164513367"></p><p>SRA.txt</p><table><tbody><tr><td class="code"><pre><span class="line">SRR12207279</span><br><span class="line">SRR12207280</span><br><span class="line">SRR12207283</span><br><span class="line">SRR12207284</span><br></pre></td></tr></tbody></table><h2 id="prefetch下载"><a href="#prefetch下载" class="headerlink" title="prefetch下载"></a>Downloading with prefetch</h2><p>Single download</p><table><tbody><tr><td class="code"><pre><span class="line">prefetch SRR12207279</span><br></pre></td></tr></tbody></table><p>Batch download</p><table><tbody><tr><td class="code"><pre><span class="line">prefetch -f no -p --option-file SRA.txt</span><br></pre></td></tr></tbody></table><p>Background download</p><table><tbody><tr><td class="code"><pre><span class="line">nohup prefetch SRR12207279 &amp;</span><br></pre></td></tr></tbody></table><p>Note: on the HPC cluster, downloading through a submitted sbatch job did not work; I suspect a network restriction on sbatch jobs.</p><h2 id="后台批量下载"><a href="#后台批量下载" class="headerlink" title="后台批量下载"></a>Batch download in the background</h2><table><tbody><tr><td class="code"><pre><span class="line">nohup prefetch -f no --option-file SRA.txt &amp;</span><br></pre></td></tr></tbody></table><p>$ nohup: ignoring input and appending output to ‘nohup.out’ is not an error; press Enter to continue.</p><h3 id="查看后台任务"><a href="#查看后台任务" class="headerlink" title="查看后台任务"></a>Checking background jobs</h3><table><tbody><tr><td class="code"><pre><span class="line">jobs</span><br><span class="line"></span><br><span class="line"># or</span><br><span class="line">ps -f</span><br></pre></td></tr></tbody></table><p>This prints</p><p>UID PID PPID C STIME TTY TIME CMD<br>dk_szy 195230 145297 4 15:11 pts&#x2F;2 00:00:00 prefetch -f no --option-file SRA.txt</p><p>Killing a background job</p><table><tbody><tr><td class="code"><pre><span class="line">kill -9 195230    # the PID from the ps output above; kill %1 targets job 1 as listed by jobs</span><br></pre></td></tr></tbody></table><h3 id="下载完成"><a href="#下载完成" class="headerlink" title="下载完成"></a>Download complete</h3><p>Download progress is written to nohup.out, where you can check that every file was downloaded completely.</p><h1 id="SRA文件转为FASTQ格式"><a href="#SRA文件转为FASTQ格式" class="headerlink" title="SRA文件转为FASTQ格式"></a>Converting SRA files to FASTQ</h1><h2 id="单个转格式"><a href="#单个转格式" class="headerlink" 
title="单个转格式"></a>Converting a single file</h2><p>fastq-dump works but is too slow, so it is not recommended:</p><table><tbody><tr><td class="code"><pre><span class="line"># convert the SRR12207279 run in the current directory to gzipped FASTQ</span><br><span class="line">fastq-dump --split-3 --gzip ./SRR12207279</span><br></pre></td></tr></tbody></table><p>Use fasterq-dump instead; it converts with multiple threads and writes uncompressed .fastq files:</p><table><tbody><tr><td class="code"><pre><span class="line">fasterq-dump --split-3 ./SRR12207283</span><br></pre></td></tr></tbody></table><h2 id="批量转格式"><a href="#批量转格式" class="headerlink" title="批量转格式"></a>Batch conversion</h2><p>Helper command: flatten the SRR folder in the current directory by moving the files out of every subfolder and then removing the emptied subfolders.</p><table><tbody><tr><td class="code"><pre><span class="line">find ./SRR -mindepth 1 -type d -exec sh -c 'mv {}/* ./SRR; rmdir {}' \;</span><br></pre></td></tr></tbody></table><p>Batch conversion with fasterq-dump; put all .sra files in the SRR folder first.</p><table><tbody><tr><td class="code"><pre><span class="line"><span class="comment"># input directory</span></span><br><span class="line">sra_dir<span class="operator">=</span><span class="string">"./SRR"</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># output directory</span></span><br><span class="line">output_dir<span class="operator">=</span><span class="string">"./fastq-result"</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># loop over every .sra file in the input directory</span></span><br><span class="line"><span class="keyword">for</span> sra_file <span class="keyword">in</span> <span class="operator">$</span>sra_dir<span class="operator">/</span><span class="operator">*</span>.sra</span><br><span class="line">do</span><br><span class="line">    <span class="comment"># file name without the path or the .sra extension</span></span><br><span class="line">    filename<span class="operator">=</span><span class="operator">$</span><span class="punctuation">(</span>basename <span class="string">"$sra_file"</span> .sra<span class="punctuation">)</span></span><br><span class="line">    </span><br><span class="line">    <span class="comment"># convert each file with fasterq-dump</span></span><br><span class="line">    fasterq<span class="operator">-</span>dump <span 
class="operator">-</span><span class="operator">-</span>outdir <span class="string">"$output_dir"</span> <span class="operator">-</span><span class="operator">-</span>split<span class="operator">-</span><span class="number">3</span> <span class="string">"$sra_file"</span></span><br><span class="line">done</span><br></pre></td></tr></tbody></table><p>The fastq files are written to .&#x2F;fastq-result under the current path.</p><h2 id="md5检查文件完整性"><a href="#md5检查文件完整性" class="headerlink" title="md5检查文件完整性"></a>Checking file integrity with md5</h2><p>Checksums generated locally only show whether the files change afterwards; to verify the download itself, compare against the MD5 values published by the data source.</p><p>Generate the MD5 values:</p><table><tbody><tr><td class="code"><pre><span class="line">md5sum *gz &gt;md5.txt</span><br><span class="line">cat md5.txt</span><br></pre></td></tr></tbody></table><p>Verify them; this must be run from the same folder:</p><table><tbody><tr><td class="code"><pre><span class="line">md5sum -c md5.txt</span><br></pre></td></tr></tbody></table><h1 id="原始数据质量查看"><a href="#原始数据质量查看" class="headerlink" title="原始数据质量查看"></a>Inspecting raw data quality</h1><h2 id="fastqc"><a href="#fastqc" class="headerlink" title="fastqc"></a>fastqc</h2><table><tbody><tr><td class="code"><pre><span class="line">ls ./fastq-result/* | xargs fastqc -t 12 -o   ./fastq-result/</span><br></pre></td></tr></tbody></table><h2 id="multiqc"><a href="#multiqc" class="headerlink" title="multiqc"></a>multiqc</h2><table><tbody><tr><td class="code"><pre><span class="line">multiqc ./fastq-result/ -o   ./fastq-result/</span><br></pre></td></tr></tbody></table><h1 id="质控"><a href="#质控" class="headerlink" title="质控"></a>Quality control</h1><p>trim_galore</p><p>Reference: <a 
href="https://mp.weixin.qq.com/s?search_click_id=2253394900344533860-1633341597788-628761&sub=&__biz=MzAxMDkxODM1Ng==&mid=2247503527&idx=4&sn=261be2c7ff2a7f80b2e6ab139028d780&chksm=9b4b8e1cac3c070a963f59fcc55095fe0ae37d1eecde3e67f6682779c91d0e907a8fd7eb8f84&scene=3&subscene=10000&clicktime=1633341597&enterid=1633341597&ascene=0&devicetype=android-30&version=28000f35&nettype=cmnet&abtest_cookie=AAACAA==&lang=zh_CN&exportkey=AdXQxZkOzbrJK+UwprsoMEk=&pass_ticket=pvvLsQPcUT6hjrE3KSHbHTaZKL/VtNBVq+LbY9r0hucz/DdbU3NgO9ofB9mtC3fS&wx_header=1">https://mp.weixin.qq.com/s?search_click_id&#x3D;2253394900344533860-1633341597788-628761&amp;sub&#x3D;&amp;__biz&#x3D;MzAxMDkxODM1Ng&#x3D;&#x3D;&amp;mid&#x3D;2247503527&amp;idx&#x3D;4&amp;sn&#x3D;261be2c7ff2a7f80b2e6ab139028d780&amp;chksm&#x3D;9b4b8e1cac3c070a963f59fcc55095fe0ae37d1eecde3e67f6682779c91d0e907a8fd7eb8f84&amp;scene&#x3D;3&amp;subscene&#x3D;10000&amp;clicktime&#x3D;1633341597&amp;enterid&#x3D;1633341597&amp;ascene&#x3D;0&amp;devicetype&#x3D;android-30&amp;version&#x3D;28000f35&amp;nettype&#x3D;cmnet&amp;abtest_cookie&#x3D;AAACAA%3D%3D&amp;lang&#x3D;zh_CN&amp;exportkey&#x3D;AdXQxZkOzbrJK%2BUwprsoMEk%3D&amp;pass_ticket&#x3D;pvvLsQPcUT6hjrE3KSHbHTaZKL%2FVtNBVq%2BLbY9r0hucz%2FDdbU3NgO9ofB9mtC3fS&amp;wx_header&#x3D;1</a></p><h2 id="单个质控"><a href="#单个质控" class="headerlink" title="单个质控"></a>Trimming a single sample</h2><p>-q 25: quality threshold; low-quality bases (Phred score below 25) are trimmed from the read ends</p><p>--phred33: the quality scores use Phred+33 encoding</p><p>--length 35: minimum read length; shorter reads are discarded after trimming</p><p>--stringency 3: the minimum number of bases that must overlap the adapter sequence before it is trimmed. The default is 1, which is very strict. It can be relaxed moderately, because the trailing adapter is almost never read by the sequencer.</p><p>--paired: paired-end sequencing</p><p>-o .&#x2F;clean_data&#x2F;: output directory</p><p>.&#x2F;fastq-result&#x2F;SRR12207279_1.fastq and .&#x2F;fastq-result&#x2F;SRR12207279_2.fastq are the input files</p><table><tbody><tr><td class="code"><pre><span class="line">trim_galore -q 25 --phred33 --length 35 --stringency 3 --paired -o ./clean_data/ ./fastq-result/SRR12207279_1.fastq ./fastq-result/SRR12207279_2.fastq</span><br></pre></td></tr></tbody></table><h2 id="批量质控"><a href="#批量质控" class="headerlink" 
title="批量质控"></a>Trimming in batch</h2><p>The working directory must contain the .&#x2F;fastq-result&#x2F; folder, which holds the sequencing data (.fastq or .fastq.gz).</p><p>Method 1: a plain sequential loop. It is slow (about half an hour for 4 paired-end samples) but unlikely to go wrong.</p><table><tbody><tr><td class="code"><pre><span class="line"># change into the directory that holds the fastq files</span><br><span class="line">cd ./fastq-result/</span><br><span class="line"></span><br><span class="line"># loop over every file ending in _1.fastq</span><br><span class="line">for file1 in *_1.fastq</span><br><span class="line">do</span><br><span class="line">    # strip the _1.fastq suffix to get the sample name</span><br><span class="line">    base=$(basename "$file1" _1.fastq)</span><br><span class="line">    </span><br><span class="line">    # build the matching _2.fastq file name</span><br><span class="line">    file2="${base}_2.fastq"</span><br><span class="line">    </span><br><span class="line">    # check that the matching _2.fastq file exists</span><br><span class="line">    if [ -e "$file2" ]; then</span><br><span class="line">        # if it exists, run trim_galore on the pair</span><br><span class="line">        trim_galore -q 25 --phred33 --length 35 --stringency 3 --paired -o ../clean_data/ "$file1" "$file2"</span><br><span class="line">    else</span><br><span class="line">        # otherwise print an error message</span><br><span class="line">        echo "Error: No matching file found for $file1"</span><br><span class="line">    fi</span><br><span class="line">done</span><br></pre></td></tr></tbody></table><p>The .fq files (.fq.gz when the input is gzipped) are the trimmed reads; the .txt files are the per-sample trimming reports, which also record the parameters the software was run with.</p><p>Method 2: run all samples at once as background jobs; 4 paired-end samples take about 10 minutes.</p><table><tbody><tr><td class="code"><pre><span class="line"># change into the directory that holds the fastq files</span><br><span class="line">cd ./fastq-result/</span><br><span class="line"></span><br><span class="line"># loop over every file ending in _1.fastq</span><br><span class="line">for file1 in *_1.fastq; do</span><br><span class="line">    # strip the _1.fastq suffix to get the sample name</span><br><span class="line">    base=$(basename "$file1" _1.fastq)</span><br><span class="line">    </span><br><span class="line">    # build the matching _2.fastq file name</span><br><span class="line">    file2="${base}_2.fastq"</span><br><span class="line">    </span><br><span 
class="line">    # run the check and the trim_galore command in the background</span><br><span class="line">    (</span><br><span class="line">        # check that the matching _2.fastq file exists</span><br><span class="line">        if [ -e "$file2" ]; then</span><br><span class="line">            # if it exists, run trim_galore on the pair</span><br><span class="line">            trim_galore -q 25 --phred33 --length 35 --stringency 3 --paired -o ../clean_data/ "$file1" "$file2"</span><br><span class="line">        else</span><br><span class="line">            # otherwise print an error message</span><br><span class="line">            echo "Error: No matching file found for $file1"</span><br><span class="line">        fi</span><br><span class="line">    ) &amp;</span><br><span class="line">done</span><br><span class="line"></span><br><span class="line"># wait for all background jobs to finish</span><br><span class="line">wait</span><br></pre></td></tr></tbody></table><p>Result</p><p><img src="/picture/image-20240803183040309.png" alt="image-20240803183040309"></p><p>Check the sample quality again.</p><p>The XXXXXX_val_1.fq files are the trimmed reads.</p><h2 id="fastp质控"><a href="#fastp质控" class="headerlink" title="fastp质控"></a>Quality control with fastp</h2><p>This part is optional further reading.</p><p>Install fastp with conda:</p><table><tbody><tr><td class="code"><pre><span class="line"># note: the fastp version in bioconda may be not the latest</span><br><span class="line">conda install -c bioconda fastp</span><br></pre></td></tr></tbody></table><p>Single sample</p><p>-i and -I take the paired-end input files; -o and -O name the cleaned output files. fastp also writes a JSON report and an HTML report (fastp.json and fastp.html by default).</p><table><tbody><tr><td class="code"><pre><span class="line">fastp -i in.R1.fq.gz -I in.R2.fq.gz -o out.R1.fq.gz -O out.R2.fq.gz</span><br></pre></td></tr></tbody></table><p>Batch</p><table><tbody><tr><td class="code"><pre><span class="line"># create the folder for the cleaned files</span><br><span class="line">mkdir  clean-fastp</span><br><span class="line"></span><br><span class="line"># change into the directory that holds the fastq files</span><br><span class="line">cd ./fastq-result/</span><br><span class="line"></span><br><span class="line"># loop over every file ending in _1.fastq</span><br><span class="line">for file1 in *_1.fastq; do</span><br><span class="line">    
# strip the _1.fastq suffix to get the sample name</span><br><span class="line">    base=$(basename "$file1" _1.fastq)</span><br><span class="line">    </span><br><span class="line">    # build the matching _2.fastq file name and the output file names</span><br><span class="line">    file2="${base}_2.fastq"</span><br><span class="line">    fileoo1="${base}_1.fq"</span><br><span class="line">    fileoo2="${base}_2.fq"</span><br><span class="line">    jsono="${base}.json"</span><br><span class="line">    htmlo="${base}.html"</span><br><span class="line"></span><br><span class="line">    </span><br><span class="line">    # run the check and the fastp command in the background</span><br><span class="line">    (</span><br><span class="line">        # check that the matching _2.fastq file exists</span><br><span class="line">        if [ -e "$file2" ]; then</span><br><span class="line">            # if it exists, run fastp on the pair</span><br><span class="line">            fastp -i "$file1"  -o ../clean-fastp/"$fileoo1" -I "$file2" -O ../clean-fastp/"$fileoo2"  --json  ../clean-fastp/"$jsono"  --html  ../clean-fastp/"$htmlo"</span><br><span class="line">        else</span><br><span class="line">            # otherwise print an error message</span><br><span class="line">            echo "Error: No matching file found for $file1"</span><br><span class="line">        fi</span><br><span class="line">    ) &amp;</span><br><span class="line">done</span><br><span class="line"></span><br><span class="line"># wait for all background jobs to finish</span><br><span class="line">wait</span><br></pre></td></tr></tbody></table><h1 id="下载参考基因组"><a href="#下载参考基因组" class="headerlink" title="下载参考基因组"></a>Downloading the reference genome</h1><p>Reference: <a href="http://www.360doc.com/content/21/0708/21/44561002_985728537.shtml">http://www.360doc.com/content/21/0708/21/44561002_985728537.shtml</a></p><p>Download page on the HISAT2 website for the UCSC reference genome indexes: <a href="https://daehwankimlab.github.io/hisat2/download/#h-sapiens">https://daehwankimlab.github.io/hisat2/download/#h-sapiens</a></p><p>The website provides prebuilt index files for human and mouse. The archive includes a make_grch38_tran.sh file that records in detail how the index was built.</p><p>Normally you have to build an index before aligning, but the human and mouse indexes can be downloaded from the HISAT2 website and used directly.</p><p>Guides to building an index: <a 
href="https://www.bilibili.com/video/BV1mt411J7v8/?spm_id_from=333.337.search-card.all.click&vd_source=b938c9620af06f4224f5fd4db315cbd4">https://www.bilibili.com/video/BV1mt411J7v8/?spm_id_from&#x3D;333.337.search-card.all.click&amp;vd_source&#x3D;b938c9620af06f4224f5fd4db315cbd4</a></p><p><a href="https://blog.csdn.net/qq_74093550/article/details/131915068">https://blog.csdn.net/qq_74093550&#x2F;article&#x2F;details&#x2F;131915068</a></p><p>Download and extract the index files you need, either mm10 or GRCm38.</p><p>Download UCSC mm10, the mouse reference genome: create a folder and download it into .&#x2F;reference. This is the one used in this test run.</p><p><img src="/picture/image-20240805141405479.png" alt="image-20240805141405479"></p><table><tbody><tr><td class="code"><pre><span class="line">wget https://genome-idx.s3.amazonaws.com/hisat/mm10_genome.tar.gz</span><br><span class="line">tar -zxvf *tar.gz </span><br></pre></td></tr></tbody></table><p>The other mouse reference genome is GRCm38. The downloaded GRCm38 archive appeared to be the same file (I have not verified this), so for mouse I simply use UCSC mm10.</p><p><img src="/picture/image-20240805141345151.png" alt="image-20240805141345151"></p><table><tbody><tr><td class="code"><pre><span class="line">wget https://cloud.biohpc.swmed.edu/index.php/s/grcm38/download -O GRCm38.tar.gz</span><br></pre></td></tr></tbody></table><h2 id="参考序列及注释下载汇总"><a href="#参考序列及注释下载汇总" class="headerlink" title="参考序列及注释下载汇总"></a>Summary of reference sequence and annotation downloads</h2><p>Ready-to-use indexes and annotations</p><p>Download from UCSC (recommended): <a href="https://hgdownload.soe.ucsc.edu/downloads.html">https://hgdownload.soe.ucsc.edu/downloads.html</a></p><p>Choose the species and download chromFa.tar.gz.</p><p>Human</p><table><tbody><tr><td class="code"><pre><span class="line">## genome index (HISAT2)  UCSC hg38</span><br><span class="line">https://genome-idx.s3.amazonaws.com/hisat/hg38_genome.tar.gz</span><br><span class="line">## reference annotation</span><br><span class="line">https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/hg38.refGene.gtf.gz</span><br></pre></td></tr></tbody></table><p>Mouse</p><table><tbody><tr><td class="code"><pre><span class="line">## genome index (HISAT2)  UCSC mm10</span><br><span class="line">https://genome-idx.s3.amazonaws.com/hisat/mm10_genome.tar.gz</span><br><span 
class="line">## reference annotation</span><br><span class="line">https://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/genes/mm10.refGene.gtf.gz</span><br></pre></td></tr></tbody></table><h2 id="自己构建参考序列索引（可选）"><a href="#自己构建参考序列索引（可选）" class="headerlink" title="自己构建参考序列索引（可选）"></a>Building your own reference index (optional)</h2><p>Download guide: <a href="https://blog.csdn.net/u011262253/article/details/117486244">https://blog.csdn.net/u011262253/article/details/117486244</a></p><p>This step is needed if you do not use the prebuilt UCSC index.</p><h3 id="Ensembl数据库"><a href="#Ensembl数据库" class="headerlink" title="Ensembl数据库"></a>Ensembl</h3><p>Reference: <a href="https://blog.csdn.net/flashan_shensanceng/article/details/115705200">https://blog.csdn.net/flashan_shensanceng&#x2F;article&#x2F;details&#x2F;115705200</a></p><p>In the Ensembl database: <a href="https://asia.ensembl.org/index.html">https://asia.ensembl.org/index.html</a></p><p>Find the species your data comes from.</p><p><img src="/picture/image-20240806182532687.png" alt="image-20240806182532687"></p><p><strong>Both the reference genome and the annotation come in many versions; choose according to your actual situation.</strong></p><p>Ensembl provides reference genomes in two assembly forms (primary and toplevel) and with three kinds of repeat handling: unmasked (dna), soft-masked (dna_sm), and masked (dna_rm).</p><p>In most cases choose <strong>.dna.primary_assembly or .dna_sm.primary_assembly</strong>.</p><p>If neither is available, <strong>.dna.toplevel.fa.gz</strong> also works.</p><p>The annotation is offered as .gtf (General Transfer Format) and .gff (General Feature Format) files, each in three variants; pick the one that fits your needs:</p><ol><li>gtf: the complete annotation; this is the one to choose in most cases</li><li>chr: chromosome-level annotation</li><li>abinitio: annotation of the predicted gene set</li></ol><p>For example:</p><p><a href="https://ftp.ensembl.org/pub/release-112/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz">Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz</a></p><p><a href="https://ftp.ensembl.org/pub/release-112/gtf/homo_sapiens/Homo_sapiens.GRCh38.112.gtf.gz">Homo_sapiens.GRCh38.112.gtf.gz</a></p><h3 id="NCBI数据库"><a href="#NCBI数据库" class="headerlink" title="NCBI数据库"></a>NCBI</h3><p><a href="https://www.ncbi.nlm.nih.gov/datasets/genome/">https://www.ncbi.nlm.nih.gov/datasets/genome/</a></p><p><img src="/picture/image-20240806184009539.png" 
alt="image-20240806184009539"></p><p><img src="/picture/image-20240806184101829.png" alt="image-20240806184101829"></p><p>Download:</p><p>Genome sequences (FASTA)</p><p>Annotation features (GTF)</p><h3 id="GENCODE数据库"><a href="#GENCODE数据库" class="headerlink" title="GENCODE数据库"></a>GENCODE</h3><p>If you work only with human and mouse, GENCODE is strongly recommended: compared with the other databases, its genomes and annotations are the most up to date and complete.</p><p><img src="/picture/image-20240806185050209.png" alt="image-20240806185050209"></p><p><img src="/picture/image-20240806185112023.png" alt="image-20240806185112023"></p><p>The usual choices:</p><table><thead><tr><th>Genome sequence (GRCh38.p14)</th><th>ALL</th></tr></thead><tbody><tr><td>Comprehensive gene annotation</td><td>CHR</td></tr></tbody></table><h3 id="UCSC数据库"><a href="#UCSC数据库" class="headerlink" title="UCSC数据库"></a>UCSC</h3><p>Prebuilt indexes that can be used directly exist only for human and mouse; for other species you have to build the index yourself.</p><p>Find your species on the official site: <a href="https://hgdownload.soe.ucsc.edu/downloads.html">https://hgdownload.soe.ucsc.edu/downloads.html</a></p><p>Choose the files that suit your analysis.</p><p><img src="/picture/image-20240807163219749.png" alt="image-20240807163219749"></p><p>Downloaded files:</p><p><img src="/picture/image-20240807181922011.png" alt="image-20240807181922011"></p><h3 id="构建方法"><a href="#构建方法" class="headerlink" title="构建方法"></a>How to build the index</h3><p>Reference: <a href="https://blog.csdn.net/weixin_40640700/article/details/116891230">https://blog.csdn.net/weixin_40640700&#x2F;article&#x2F;details&#x2F;116891230</a></p><p>As an example, download the mouse reference sequence from Ensembl and build the index with hisat2.</p><p>Note: align with the same software that built the index, and use the annotation that matches the reference sequence.</p><p>Download the reference sequence and annotation: <a href="https://asia.ensembl.org/Mus_musculus/Info/Index">https://asia.ensembl.org/Mus_musculus&#x2F;Info&#x2F;Index</a></p><p>Files to download: <img src="/picture/image-20240902110039833.png" alt="image-20240902110039833"></p><p>Reference genome: Mus_musculus.GRCm39.dna.primary_assembly.fa.gz</p><p>Annotation file: Mus_musculus.GRCm39.112.gtf.gz</p><p>Download from the command line:</p><table><tbody><tr><td class="code"><pre><span class="line"># genome sequence</span><br><span class="line">wget -c 
https://ftp.ensembl.org/pub/release-112/fasta/mus_musculus/dna/Mus_musculus.GRCm39.dna.primary_assembly.fa.gz</span><br><span class="line"># annotation file</span><br><span class="line">wget  -c https://ftp.ensembl.org/pub/release-112/gtf/mus_musculus/Mus_musculus.GRCm39.112.gtf.gz</span><br></pre></td></tr></tbody></table><p>Build commands; building takes a long time, possibly 1 to 2 hours.</p><table><tbody><tr><td class="code"><pre><span class="line"># decompress the genome file</span><br><span class="line">gzip -d Mus_musculus.GRCm39.dna.primary_assembly.fa.gz</span><br><span class="line"># build the index with hisat2</span><br><span class="line">hisat2-build -p 20  Mus_musculus.GRCm39.dna.primary_assembly.fa mousegenome</span><br></pre></td></tr></tbody></table><p><img src="/picture/image-20240903100450370.png" alt="image-20240903100450370"></p><p>This produces 8 index files ending in .ht2. The leading part of each file name is the index name you pass as the last argument to hisat2-build (mousegenome in the command above; my own run used MusGRCm39).</p><p>I then compared my self-built index with the officially provided one.</p><p>The alignment rates with my own index and with the official index were almost the same.</p><p><img src="/picture/image-20240903100856364.png" alt="image-20240903100856364"></p><p>I have not tested this extensively yet.</p><table><tbody><tr><td class="code"><pre><span class="line">samtools sort -@ 12 -o ./688.bam ./6668888.sam</span><br><span class="line">samtools sort -@ 12 -o ./677.bam ./666777.sam</span><br><span class="line"></span><br><span class="line">gtf='/public/home/dk_szy/songxudong/rna-test/reference/mouse-Ensembl-bowtie2/gft/Mus_musculus.GRCm39.112.gtf.gz'</span><br><span class="line">gtf0='/public/home/dk_szy/songxudong/rna-test/reference/mouse-UCSC-mm10/gft/mm10.refGene.gtf.gz'</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">featureCounts -T  20  -p  -a  $gtf  -o  counts.txt  ./688.bam</span><br><span class="line"></span><br><span class="line">featureCounts -T  20  -p  -a  $gtf0  -o  counts0.txt  ./677.bam</span><br></pre></td></tr></tbody></table><p>Note that with the annotation downloaded from Ensembl, the gene IDs in the resulting count matrix are Ensembl IDs such as ENSMUSG00000104478, so an ID conversion is needed afterwards.</p><h1 id="hisat2比对基因"><a href="#hisat2比对基因" class="headerlink" title="hisat2比对基因"></a>Aligning reads with hisat2</h1><h2 id="hisat2单个比对"><a href="#hisat2单个比对" 
class="headerlink" title="hisat2单个比对"></a>Aligning a single sample with hisat2</h2><p>One sample took about 8 minutes; with several samples, remember to adjust the requested run time.</p><table><tbody><tr><td class="code"><pre><span class="line">index='/public/home/dk_szy/songxudong/rna-test/reference/mm10/genome'</span><br><span class="line"></span><br><span class="line">id=234234214</span><br><span class="line"></span><br><span class="line">file1="./clean_data/SRR12207279_1_val_1.fq"</span><br><span class="line"></span><br><span class="line">file2="./clean_data/SRR12207279_2_val_2.fq"</span><br><span class="line"></span><br><span class="line">echo $file1</span><br><span class="line">echo $file2</span><br><span class="line">echo "${id}.sam" </span><br><span class="line">hisat2 -t -p 12 -x $index \</span><br><span class="line">    -1 "$file1" \</span><br><span class="line">    -2 "$file2" \</span><br><span class="line">    -S "${id}.sam"</span><br></pre></td></tr></tbody></table><h2 id="hisat2批量比对"><a href="#hisat2批量比对" class="headerlink" title="hisat2批量比对"></a>Aligning in batch with hisat2</h2><p>hisat2</p><p>Alignment turns the paired-end fq files into a SAM file, which is then converted to a BAM file.</p><p>SAM files are very large, so the script deletes each one as soon as it has been converted to BAM.</p><p>The results are written to the .&#x2F;align&#x2F; folder.</p><p>On line 25 of the script (hisat2 -t -p 20 -x $index \), the thread count given by -p 20 can be adjusted to speed up the run.</p><table><tbody><tr><td class="code"><pre><span class="line"></span><br><span class="line">mkdir -p ./align/flag</span><br><span class="line">cd ./align/</span><br><span class="line">pwd</span><br><span class="line"></span><br><span class="line">## location of the reference genome index</span><br><span class="line">index='/public/home/dk_szy/songxudong/rna-test/reference/mm10/genome'</span><br><span class="line"></span><br><span class="line"># the trimmed fastq files are in the clean_data folder</span><br><span class="line">fastq_dir="../clean_data"</span><br><span class="line"></span><br><span class="line"># loop over every read-1 file (*_1_val_1.fq) in clean_data</span><br><span class="line">for file1 in $fastq_dir/*_1_val_1.fq; do</span><br><span class="line">    # extract the sample ID from the read-1 file name</span><br><span class="line">    id=$(basename "$file1" .fq | sed 's/_1_val_1//')</span><br><span class="line">    
</span><br><span class="line">    # build the matching read-2 file name</span><br><span class="line">    file2="$fastq_dir/${id}_2_val_2.fq"</span><br><span class="line">    </span><br><span class="line">    # check that the read-2 file exists</span><br><span class="line">    if [ -f "$file2" ]; then</span><br><span class="line">        echo "333#  ${id}  ！！！！！ is on the hisat2 Working !!!"</span><br><span class="line">        </span><br><span class="line">        # align with hisat2; the SAM file is written to the current directory (./align/)</span><br><span class="line">        hisat2 -t -p 20  -x $index \</span><br><span class="line">            -1 "$file1" \</span><br><span class="line">            -2 "$file2"  -S  "${id}.sam" </span><br><span class="line">        </span><br><span class="line">        # convert SAM to sorted BAM in the current directory (./align/), then remove the SAM</span><br><span class="line">        echo -e " ${id} sam2bam and remove sam   "</span><br><span class="line">        samtools sort -@ 12 -o "./${id}_sorted.bam" "./${id}.sam"</span><br><span class="line">        rm "./${id}.sam"</span><br><span class="line">    else</span><br><span class="line">        echo "No matching 2.fastq file found for $file1"</span><br><span class="line">    fi</span><br><span class="line">done</span><br></pre></td></tr></tbody></table><h2 id="查看运行结果"><a href="#查看运行结果" class="headerlink" title="查看运行结果"></a>Checking the results</h2><p>slurm-555095.out is the log file of the run; grep pulls out the lines that report the alignment results.</p><table><tbody><tr><td class="code"><pre><span class="line">grep alignment slurm-555095.out</span><br></pre></td></tr></tbody></table><p><img src="/picture/image-20240805090435619.png" alt="image-20240805090435619"></p><h1 id="生成counts表达矩阵"><a href="#生成counts表达矩阵" class="headerlink" title="生成counts表达矩阵"></a>Generating the counts expression matrix</h1><p>The counts are obtained with featureCounts.</p><h2 id="下载参考基因注释"><a href="#下载参考基因注释" class="headerlink" title="下载参考基因注释"></a>Downloading the reference gene annotation</h2><p>Using UCSC mm10 as the example:</p><p><a 
href="https://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/genes/">https://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/genes/</a></p><p><img src="/picture/image-20240805104440947.png" alt="image-20240805104440947"></p><p>gtf&#x3D;'&#x2F;public&#x2F;home&#x2F;dk_szy&#x2F;songxudong&#x2F;rna-test&#x2F;reference&#x2F;gft&#x2F;mm10.refGene.gtf.gz' is the annotation file; <strong>it must match the reference genome used for alignment</strong>.</p><table><tbody><tr><td class="code"><pre><span class="line">date</span><br><span class="line"></span><br><span class="line">gtf='/public/home/dk_szy/songxudong/rna-test/reference/gft/mm10.refGene.gtf.gz'</span><br><span class="line"></span><br><span class="line">mkdir  -p  ./counts</span><br><span class="line"></span><br><span class="line">cd ./counts</span><br><span class="line"></span><br><span class="line">pwd</span><br><span class="line"></span><br><span class="line">featureCounts -T  20  -p  -a  $gtf  -o  counts.txt  ../align/*.bam</span><br><span class="line"></span><br><span class="line">multiqc ./</span><br><span class="line"></span><br><span class="line">echo -e " \n \n \n ALL WORK DONE !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!  
\n "</span><br><span class="line">date</span><br></pre></td></tr></tbody></table><p>The results are in .&#x2F;counts:</p><p>Count matrix: counts.txt</p><p>Summary report: multiqc_report.html</p><h2 id="类型转换"><a href="#类型转换" class="headerlink" title="类型转换"></a>Converting counts to other units</h2><p>The featureCounts output includes the gene length, so it can be used directly for the conversion.</p><p>First arrange it into a file that keeps the gene length column:</p><p><img src="/picture/image-20240816162701639.png" alt="image-20240816162701639"></p><table><tbody><tr><td class="code"><pre><span class="line"></span><br><span class="line"><span class="comment"># read the raw count data to be converted</span></span><br><span class="line">exprSet<span class="operator">&lt;-</span>read.csv<span class="punctuation">(</span><span class="string">"count-改名.csv"</span><span class="punctuation">,</span>row.names <span class="operator">=</span> <span class="number">1</span><span class="punctuation">)</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># drop genes whose counts are 0 in every sample</span></span><br><span class="line">exprSet1 <span class="operator">&lt;-</span> exprSet<span class="punctuation">[</span>rowSums<span class="punctuation">(</span>exprSet<span class="punctuation">[</span><span class="punctuation">,</span><span class="number">2</span><span class="operator">:</span>ncol<span class="punctuation">(</span>exprSet<span class="punctuation">)</span><span class="punctuation">]</span><span class="punctuation">)</span> <span class="operator">&gt;</span> <span class="number">0</span><span class="punctuation">,</span><span class="punctuation">]</span></span><br><span class="line"><span class="built_in">dim</span><span class="punctuation">(</span>exprSet<span class="punctuation">)</span></span><br><span class="line"><span class="built_in">names</span><span class="punctuation">(</span>exprSet<span class="punctuation">)</span></span><br><span class="line"></span><br><span class="line"><span class="comment">## extract the count columns</span></span><br><span class="line">count <span class="operator">&lt;-</span> exprSet1<span class="punctuation">[</span><span 
class="punctuation">,</span><span class="number">2</span><span class="operator">:</span>ncol<span class="punctuation">(</span>exprSet<span class="punctuation">)</span><span class="punctuation">]</span></span><br><span class="line">write.csv<span class="punctuation">(</span>count<span class="punctuation">,</span><span class="string">"song_count.csv"</span><span class="punctuation">)</span></span><br><span class="line"><span class="comment">## extract the gene lengths and convert them to kb</span></span><br><span class="line">gene_length_kb <span class="operator">&lt;-</span> exprSet1<span class="operator">$</span>Length <span class="operator">/</span> <span class="number">1000</span></span><br><span class="line">head<span class="punctuation">(</span>gene_length_kb<span class="punctuation">)</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># CPM, stored as log2(CPM + 1)</span></span><br><span class="line">cpm <span class="operator">=</span> log2<span class="punctuation">(</span>edgeR<span class="operator">::</span>cpm<span class="punctuation">(</span>count<span class="punctuation">)</span><span class="operator">+</span><span class="number">1</span><span class="punctuation">)</span></span><br><span class="line">write.csv<span class="punctuation">(</span>cpm<span class="punctuation">,</span><span class="string">"song_CPM.csv"</span><span class="punctuation">)</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="comment">## TPM</span></span><br><span class="line"><span class="comment">### reads per kilobase (RPK): normalize the counts by gene length</span></span><br><span class="line">data_rpk <span class="operator">&lt;-</span> count <span class="operator">/</span>gene_length_kb</span><br><span class="line"><span class="comment">## scale each sample so that its RPK values sum to one million</span></span><br><span class="line">TPM <span class="operator">&lt;-</span> t<span class="punctuation">(</span>t<span class="punctuation">(</span>data_rpk<span class="punctuation">)</span> <span 
class="operator">/</span> colSums<span class="punctuation">(</span>data_rpk<span class="punctuation">)</span> <span class="operator">*</span> <span class="number">1000000</span><span class="punctuation">)</span></span><br><span class="line">head<span class="punctuation">(</span>TPM<span class="punctuation">)</span></span><br><span class="line"><span class="comment">## take the per-gene mean as a quick check</span></span><br><span class="line">avg_tmp <span class="operator">&lt;-</span> data.frame<span class="punctuation">(</span>avg_tmp <span class="operator">=</span> rowMeans<span class="punctuation">(</span>TPM<span class="punctuation">)</span><span class="punctuation">)</span></span><br><span class="line">head<span class="punctuation">(</span>avg_tmp<span class="punctuation">)</span></span><br><span class="line"><span class="comment">## save the data</span></span><br><span class="line">write.csv<span class="punctuation">(</span>TPM<span class="punctuation">,</span><span class="string">"song_TPM.csv"</span><span class="punctuation">)</span></span><br><span class="line"></span><br><span class="line"><span class="comment">## FPKM: RPK divided by the total counts of each sample, per million</span></span><br><span class="line">FPKM <span class="operator">&lt;-</span> t<span class="punctuation">(</span>t<span class="punctuation">(</span>data_rpk<span class="punctuation">)</span> <span class="operator">/</span> colSums<span class="punctuation">(</span>count <span class="punctuation">)</span> <span class="operator">*</span> <span class="number">10</span><span class="operator">^</span><span class="number">6</span><span class="punctuation">)</span></span><br><span class="line">write.csv<span class="punctuation">(</span>FPKM<span class="punctuation">,</span><span class="string">"song_FPKM.csv"</span><span class="punctuation">)</span></span><br></pre></td></tr></tbody></table><h1 id="转录本相关"><a href="#转录本相关" class="headerlink" title="转录本相关"></a>Transcript-level analysis</h1><p>I have not explored this in depth.</p><p>It is based on the PASA software.</p><p>A short introductory video: <a 
href="https://www.bilibili.com/video/BV1KE421g7kT?t=3053.0&p=7">https://www.bilibili.com/video/BV1KE421g7kT?t=3053.0&amp;p=7</a></p><p><img src="/picture/image-20241024163759421.png" alt="image-20241024163759421"></p><h1 id="样品的重命名和分组"><a href="#样品的重命名和分组" class="headerlink" title="样品的重命名和分组"></a>Renaming and grouping the samples</h1><p>This part can be skipped.</p><p>The sample names in counts.txt are edited according to the grouping; with only a few samples, editing them by hand in Excel works just as well.</p><p>For downloaded data, get the Metadata file from NCBI.</p><p><img src="/picture/image-20240805111958493.png" alt="image-20240805111958493"></p><p>Make the changes in R.</p><p>Required files:</p><p>Metadata file: SraRunTable.txt</p><p>Count file: counts.txt</p><table><tbody><tr><td class="code"><pre><span class="line"><span class="comment">### environment setup</span></span><br><span class="line">rm<span class="punctuation">(</span><span class="built_in">list</span><span class="operator">=</span>ls<span class="punctuation">(</span><span class="punctuation">)</span><span class="punctuation">)</span></span><br><span class="line">options<span class="punctuation">(</span>stringsAsFactors <span class="operator">=</span> <span class="built_in">F</span><span class="punctuation">)</span> </span><br><span class="line">library<span class="punctuation">(</span>tidyverse<span class="punctuation">)</span> <span class="comment"># ggplot2 stringr dplyr tidyr readr purrr  tibble forcats</span></span><br><span class="line">library<span class="punctuation">(</span>data.table<span class="punctuation">)</span> <span class="comment"># fast multi-threaded file reading</span></span><br><span class="line">setwd<span class="punctuation">(</span><span class="string">"C:/Users/Lenovo/Desktop/test"</span><span class="punctuation">)</span></span><br><span class="line"></span><br><span class="line"><span class="comment">#### process and subset counts to get the expression matrix ####</span></span><br><span class="line">a1 <span class="operator">&lt;-</span> fread<span class="punctuation">(</span><span class="string">'./counts.txt'</span><span class="punctuation">,</span></span><br><span class="line">            header <span class="operator">=</span> <span 
class="built_in">T</span><span class="punctuation">,</span>data.table <span class="operator">=</span> <span class="built_in">F</span><span class="punctuation">)</span><span class="comment"># load counts; the first row is used as the column names</span></span><br><span class="line">colnames<span class="punctuation">(</span>a1<span class="punctuation">)</span></span><br><span class="line">counts <span class="operator">&lt;-</span> a1<span class="punctuation">[</span><span class="punctuation">,</span><span class="number">7</span><span class="operator">:</span>ncol<span class="punctuation">(</span>a1<span class="punctuation">)</span><span class="punctuation">]</span> <span class="comment"># keep the per-sample count columns (column 7 onward) as counts</span></span><br><span class="line">rownames<span class="punctuation">(</span>counts<span class="punctuation">)</span> <span class="operator">&lt;-</span> a1<span class="operator">$</span>Geneid <span class="comment"># use the gene IDs as row names</span></span><br><span class="line"><span class="comment"># rename the samples</span></span><br><span class="line">colnames<span class="punctuation">(</span>counts<span class="punctuation">)</span></span><br><span class="line">colnames<span class="punctuation">(</span>counts<span class="punctuation">)</span> <span class="operator">&lt;-</span> gsub<span class="punctuation">(</span><span class="string">'../align/'</span><span class="punctuation">,</span><span class="string">''</span><span class="punctuation">,</span> <span class="comment"># strip the sample-name prefix</span></span><br><span class="line">                         gsub<span class="punctuation">(</span><span class="string">'_sorted.bam'</span><span class="punctuation">,</span><span class="string">''</span><span class="punctuation">,</span>  colnames<span class="punctuation">(</span>counts<span class="punctuation">)</span><span class="punctuation">)</span><span class="punctuation">)</span> <span class="comment"># strip the sample-name suffix</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="comment">#### 
Import or build the sample sheet, then rename and group the sample columns ####</span></span><br><span class="line">b <span class="operator">&lt;-</span> read.csv<span class="punctuation">(</span><span class="string">'./SraRunTable.txt'</span><span class="punctuation">)</span></span><br><span class="line">b</span><br><span class="line">name_list <span class="operator">&lt;-</span> b<span class="operator">$</span>source_name<span class="punctuation">[</span>match<span class="punctuation">(</span>colnames<span class="punctuation">(</span>counts<span class="punctuation">)</span><span class="punctuation">,</span>b<span class="operator">$</span>Run<span class="punctuation">)</span><span class="punctuation">]</span>; name_list  <span class="comment"># pick the metadata column to use as sample names</span></span><br><span class="line">nlgl <span class="operator">&lt;-</span> data.frame<span class="punctuation">(</span>row.names<span class="operator">=</span>colnames<span class="punctuation">(</span>counts<span class="punctuation">)</span><span class="punctuation">,</span></span><br><span class="line">                   name_list<span class="operator">=</span>name_list<span class="punctuation">,</span></span><br><span class="line">                   group_list<span class="operator">=</span>name_list<span class="punctuation">)</span></span><br><span class="line">fix<span class="punctuation">(</span>nlgl<span class="punctuation">)</span>  <span class="comment"># edit the sample names and group labels by hand</span></span><br><span class="line">name_list <span class="operator">&lt;-</span> nlgl<span class="operator">$</span>name_list</span><br><span class="line">colnames<span class="punctuation">(</span>counts<span class="punctuation">)</span> <span class="operator">&lt;-</span> name_list <span class="comment"># rename the samples</span></span><br><span class="line">group_list <span class="operator">&lt;-</span> nlgl<span class="operator">$</span>group_list</span><br><span class="line">gl <span class="operator">&lt;-</span> data.frame<span class="punctuation">(</span>row.names<span 
class="operator">=</span>colnames<span class="punctuation">(</span>counts<span class="punctuation">)</span><span class="punctuation">,</span> <span class="comment"># data frame mapping sample names to groups</span></span><br><span class="line">                 group_list<span class="operator">=</span>group_list<span class="punctuation">)</span></span><br><span class="line"></span><br><span class="line">write.csv<span class="punctuation">(</span>counts<span class="punctuation">,</span>file <span class="operator">=</span> <span class="string">"count-改名.csv"</span><span class="punctuation">)</span> <span class="comment"># save</span></span><br><span class="line">write.csv<span class="punctuation">(</span>nlgl<span class="punctuation">,</span>file <span class="operator">=</span> <span class="string">"分组.csv"</span><span class="punctuation">)</span> <span class="comment"># save</span></span><br></pre></td></tr></tbody></table><p>File for downstream analysis: count-改名.csv<br>Group information file: 分组.csv</p><h1 id="文件目录"><a href="#文件目录" class="headerlink" title="文件目录"></a>Directory layout</h1><p><img src="/picture/image-20240805171736813.png" alt="image-20240805171736813"></p><p>test: test files</p><p>SRR: downloaded source files</p><p>fastq-result: paired-end files after extraction</p><p>clean_data: sequence files after quality control</p><p>align: aligned files, not yet annotated</p><p>counts: count files produced after alignment; the samples still need to be renamed</p><p>reference: reference sequences and annotation files</p><p>.sh: step-by-step analysis scripts, submitted with sbatch XXX.sh</p>]]></content>
    
    
    <summary type="html">Create the environment: conda create -n rna_p3 python&amp;#x3D;3 sra-tools; conda env list # list environments; conda activate rna_p3</summary>
    
    
    
    <category term="生物信息学" scheme="https://song-xudong.github.io/categories/%E7%94%9F%E7%89%A9%E4%BF%A1%E6%81%AF%E5%AD%A6/"/>
    
    
    <category term="生物信息学" scheme="https://song-xudong.github.io/tags/%E7%94%9F%E7%89%A9%E4%BF%A1%E6%81%AF%E5%AD%A6/"/>
    
    <category term="RNA-seq" scheme="https://song-xudong.github.io/tags/RNA-seq/"/>
    
  </entry>
  
  <entry>
    <title>Termite Fungus Comb Polysaccharides Alleviate Hyperglycemia and Hyperlipidemia in Type 2 Diabetic Mice by Regulating Hepatic Glucose/Lipid Metabolism and the Gut Microbiota</title>
    <link href="https://song-xudong.github.io/2024/11/16/%E8%AE%BA%E6%96%872/"/>
    <id>https://song-xudong.github.io/2024/11/16/%E8%AE%BA%E6%96%872/</id>
    <published>2024-11-16T01:43:30.000Z</published>
    <updated>2024-11-16T01:54:05.000Z</updated>
    
    <content type="html"><![CDATA[<p><img src="/picture/image-20241116094650564.png" alt="image-20241116094650564"></p><h1 id="Termite-Fungus-Comb-Polysaccharides-Alleviate-Hyperglycemia-and-Hyperlipidemia-in-Type-2-Diabetic-Mice-by-Regulating-Hepatic-Glucose-Lipid-Metabolism-and-the-Gut-Microbiota"><a href="#Termite-Fungus-Comb-Polysaccharides-Alleviate-Hyperglycemia-and-Hyperlipidemia-in-Type-2-Diabetic-Mice-by-Regulating-Hepatic-Glucose-Lipid-Metabolism-and-the-Gut-Microbiota" class="headerlink" title="Termite Fungus Comb Polysaccharides Alleviate Hyperglycemia and Hyperlipidemia in Type 2 Diabetic Mice by Regulating Hepatic Glucose&#x2F;Lipid Metabolism and the Gut Microbiota"></a>Termite Fungus Comb Polysaccharides Alleviate Hyperglycemia and Hyperlipidemia in Type 2 Diabetic Mice by Regulating Hepatic Glucose&#x2F;Lipid Metabolism and the Gut Microbiota</h1><p>Publication date: 2024-7-6</p><p>Authors: Xiao H, <strong>Song X</strong>, Wang P, Li W, Qin S, Huang C, et al.</p><p>DOI: 10.3390&#x2F;ijms25137430</p><p>Journal tier: CAS Zone 2 (中科院2区)</p><p>Impact factor: 4.9</p><p>Citation: Xiao H, Song X, Wang P, Li W, Qin S, Huang C, Wu B, Jia B, Gao Q, Song Z. Termite Fungus Comb Polysaccharides Alleviate Hyperglycemia and Hyperlipidemia in Type 2 Diabetic Mice by Regulating Hepatic Glucose&#x2F;Lipid Metabolism and the Gut Microbiota. <em>International Journal of Molecular Sciences</em>. 2024; 25(13):7430.</p><p>Direct link: <a href="https://doi.org/10.3390/ijms25137430">https://doi.org/10.3390/ijms25137430</a></p>]]></content>
    
    
    <summary type="html">Termite Fungus Comb Polysaccharides Alleviate Hyperglycemia and Hyperlipidemia in Type 2 Diabetic Mice by Regulating Hepatic Glucose&amp;#x2F;Lipid Metab</summary>
    
    
    
    <category term="论文" scheme="https://song-xudong.github.io/categories/%E8%AE%BA%E6%96%87/"/>
    
    
    <category term="论文" scheme="https://song-xudong.github.io/tags/%E8%AE%BA%E6%96%87/"/>
    
    <category term="二作" scheme="https://song-xudong.github.io/tags/%E4%BA%8C%E4%BD%9C/"/>
    
  </entry>
  
  <entry>
    <title>Bibliometric analysis of vitamin D and obesity research over the period 2000 to 2023</title>
    <link href="https://song-xudong.github.io/2024/11/16/%E8%AE%BA%E6%96%871/"/>
    <id>https://song-xudong.github.io/2024/11/16/%E8%AE%BA%E6%96%871/</id>
    <published>2024-11-16T01:25:33.000Z</published>
    <updated>2024-11-16T01:43:10.000Z</updated>
    
    <content type="html"><![CDATA[<p><img src="/picture/image-20241116093701587.png" alt="image-20241116093701587"></p><h1 id="Bibliometric-analysis-of-vitamin-D-and-obesity-research-over-the-period-2000-to-2023"><a href="#Bibliometric-analysis-of-vitamin-D-and-obesity-research-over-the-period-2000-to-2023" class="headerlink" title="Bibliometric analysis of vitamin D and obesity research over the period 2000 to 2023"></a>Bibliometric analysis of vitamin D and obesity research over the period 2000 to 2023</h1><p>Publication date: 2024-7-18</p><p>Authors: <strong>Song X</strong>, Qin S, Chen S, Zhang C, Lin L, Song Z.</p><p>DOI: 10.3389&#x2F;fphar.2024.1445061</p><p>Journal tier: CAS Zone 2 (中科院2区)</p><p>Impact factor: 4.4</p><p>Citation: Song X, Qin S, Chen S, Zhang C, Lin L, Song Z. Bibliometric analysis of vitamin D and obesity research over the period 2000 to 2023. <em>Front Pharmacol</em>. 2024;15:1445061. Published 2024 Jul 18. doi:10.3389&#x2F;fphar.2024.1445061</p><p>Direct link: <a href="https://doi.org/10.3389/fphar.2024.1445061">https://doi.org/10.3389/fphar.2024.1445061</a></p>]]></content>
    
    
    <summary type="html">Bibliometric analysis of vitamin D and obesity research over the period 2000 to 2023. Publication date: 2024-7-18. Authors: Song X, Qin S, Chen S, Zhang C, Lin L, Song Z.</summary>
    
    
    
    <category term="论文" scheme="https://song-xudong.github.io/categories/%E8%AE%BA%E6%96%87/"/>
    
    
    <category term="论文" scheme="https://song-xudong.github.io/tags/%E8%AE%BA%E6%96%87/"/>
    
    <category term="一作" scheme="https://song-xudong.github.io/tags/%E4%B8%80%E4%BD%9C/"/>
    
  </entry>
  
</feed>
