guelfoweb/malware-analysis-static

Codex SKILL.md for malware analysis of suspicious binaries

Compatible with Claude Code, Codex CLI, and Cursor
npx add-skill guelfoweb/malware-analysis-static

name: malware-analysis-static
author: Gianni Amato
version: 1.2.0
homepage: https://github.com/guelfoweb/malware-analysis-static
update_source: https://raw.githubusercontent.com/guelfoweb/malware-analysis-static/main/SKILL.md
description: Use this skill for malware analysis of suspicious binaries, Android APKs, Office documents, web payloads, and scripts or source code such as HTML, SVG, MHTML, JS, VBS, VBA, HTA, WSF, PowerShell, batch, shell, Python, PHP, and similar droppers across Linux, macOS, and Windows. It guides Codex through triage, reverse engineering, family attribution, static analysis, controlled decoding, evidence preservation, AGENTS.md context tracking, REPORT.md reporting, C2 extraction, stage extraction, and cross-platform tool discovery with per-tool install authorization.

Malware Analysis

Show this warning before starting analysis:

Warning: malware analysis should be performed on an isolated lab machine or disposable VM, not on a personal or production workstation.

Versioning And Update Check

This skill uses explicit versioning.

Startup behavior:

  1. read the local skill version from the frontmatter
  2. if network access is available and the user has not prohibited network use, fetch the remote SKILL.md from:
    • https://raw.githubusercontent.com/guelfoweb/malware-analysis-static/main/SKILL.md
  3. read the remote version field from the frontmatter and compare it with the local version
  4. if the installed skill is behind, tell the user clearly that a newer version exists and ask whether they want to update the skill before continuing
  5. if the user declines the update, continue the analysis with the current local version
  6. if the remote file is unavailable or the frontmatter cannot be parsed, continue normally and record that the update check could not be completed

Rules:

  • do not block the analysis if the update check fails
  • do not auto-update silently
  • only ask the user about updating when a newer version is actually available
  • if the versions match, continue without surfacing unnecessary update chatter
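The comparison in steps 1 to 3 can be sketched as follows. This is a minimal illustration, assuming the frontmatter exposes a `version:` field in dotted numeric form (as in `1.2.0`). The helper names `parse_version` and `update_available` are hypothetical, and fetching the remote file is deliberately left out.

```python
import re
from typing import Optional, Tuple

# Assumption: the version appears as "version: X.Y.Z" somewhere in the frontmatter text.
VERSION_RE = re.compile(r"\bversion:\s*(\d+(?:\.\d+)*)")


def parse_version(frontmatter: str) -> Optional[Tuple[int, ...]]:
    """Return the version as a tuple of ints, or None if it cannot be parsed."""
    match = VERSION_RE.search(frontmatter)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def update_available(local_text: str, remote_text: Optional[str]) -> Optional[bool]:
    """True if the remote is newer, False if not, None if the check could not be completed."""
    if remote_text is None:  # remote file unavailable
        return None
    local = parse_version(local_text)
    remote = parse_version(remote_text)
    if local is None or remote is None:  # frontmatter could not be parsed
        return None
    return remote > local
```

A `None` result corresponds to the "continue normally and record that the update check could not be completed" case, so the caller never blocks on it.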

Scope

Use this skill for:

  • Windows PE binaries, DLLs, and .NET assemblies
  • Linux ELF binaries
  • macOS Mach-O binaries
  • Android APKs
  • Office documents
  • web payloads such as HTML, HTM, MHTML, SVG, XML, and XSL with embedded script or smuggling logic
  • scripts and source payloads such as JS, JSE, VBS, VBE, VBA, HTA, WSF, WSH, PowerShell, BAT, CMD, shell scripts, Python, PHP, Perl, Ruby, Lua, AutoIt, Node.js loaders, webshells, and downloader chains

Primary mode is static analysis. If the user explicitly requests dynamic analysis, only prepare or use an isolated disposable lab. Never run malware on a personal workstation, production system, or host OS that contains real credentials or user data.

Objectives

For each case, try to determine:

  • file type and architecture
  • MD5, SHA1, and SHA256
  • likely malware family, cluster, or toolkit if supported by evidence
  • C2 infrastructure
  • drop URLs and stage URLs
  • config endpoints and embedded configuration
  • persistence and anti-analysis behavior
  • embedded, dropped, or downloaded stages
  • obfuscation, packing, encoding, compression, and encryption methods
  • decoding, decryption, unpacking, and staging logic

State uncertainty explicitly. Do not make unsupported claims.

Reverse Engineering Mandate

Do real malware reverse engineering, not only IOC grepping.

Autonomy rule:

  • proceed automatically through the next reasonable analytical step whenever it is within the authorized scope, available tooling, and lab-safe context
  • do not stop to ask the user whether to continue with the next obvious analysis action
  • do not ask the user to choose between equivalent next analytical steps when one can be attempted immediately
  • if remote retrieval is not already authorized, continue automatically with the deepest offline analysis available instead of offering retrieval as a choice
  • prefer offline continuation by default whenever remote retrieval would require additional authorization
  • only interrupt the flow to ask the user when a real decision is required:
    • a missing tool must be installed
    • a network retrieval or execution step would exceed the already authorized scope
    • the next action creates a meaningful safety or cost decision
    • progress is blocked by an unresolved external dependency

Examples of what should be done automatically:

  • parse the next suspicious stream, object, resource, macro, script block, or overlay
  • recover and analyze the next stage when a drop URL or stage URL is already in scope
  • follow xrefs from suspicious strings into code
  • decode the next blob when the parent code clearly defines the decoder
  • inspect the next relevant artifact in the execution or compromise chain

When both of these are available:

  • deeper offline parsing or reverse engineering of already available artifacts
  • remote retrieval that is not yet explicitly authorized

Codex must choose the offline path automatically and continue working without asking.

Required mindset:

  • trace control flow and data flow when strings are insufficient
  • move from triage to deeper reversing when the interesting logic is hidden
  • inspect imports, xrefs, resources, overlays, config blobs, and decode routines
  • reconstruct decoding, decryption, decompression, and staging logic when feasible
  • use disassembly, decompilation, headless reverse-engineering tooling, and small helper scripts as needed
  • preserve all important results in evidence files, AGENTS.md, and REPORT.md

Do not stop at the first obstacle. If one method fails:

  1. try an equivalent tool
  2. try a lower-level view such as disassembly, sections, resources, raw blobs, or metadata
  3. try a different extraction or decoding approach
  4. document what failed, why it failed, and what was tried
  5. continue until the remaining blocker is real and clearly explained

Valid escalation examples:

  • strings -> decoded strings -> xrefs -> disassembly -> decompilation
  • manifest review -> decompiled code -> smali -> native library inspection
  • macro extraction -> script reconstruction -> decoder emulation -> stage extraction
  • PE metadata -> imports -> config blob carving -> Ghidra headless analysis

Only stop when:

  • the requested scope is complete
  • the remaining blocker cannot be resolved with available tools and permissions
  • further progress would require execution that the user did not authorize

Tool Discovery And Install Policy

This skill must work as a single SKILL.md. Do not assume helper scripts exist beside it.

Before analysis:

  1. detect the OS and shell environment
  2. discover available tools dynamically
  3. choose the best available tool for each task
  4. use a fallback if the preferred tool is missing
  5. ask the user before installing any missing tool
  6. do not install tools silently
  7. after authorization, use the native package manager or the most direct official installation path

Sample-type tool check:

  • after identifying the sample type, verify the tools required for that specific analysis path before deep analysis begins
  • check only the tools relevant to the current sample type, not the entire global tool list
  • if a mandatory tool or capability for that sample type is missing, ask the user whether to install it before proceeding
  • if a valid fallback exists, use the fallback and record the limitation
  • if only strongly recommended tools are missing, continue unless the missing capability blocks a concrete step
  • do not interrupt the workflow to install irrelevant tools unrelated to the current sample type

Typical discovery commands:

# Linux / macOS (POSIX shell)
uname -s
command -v file strings rg python3 node php perl bash sh pwsh yara objdump readelf nm otool lipo r2 rabin2 jadx apktool aapt2 floss xorsearch ilspycmd monodis oleid olevba oledump.py exiftool binwalk 7z upx analyzeHeadless xmllint xxd base64 iconv
python3 -c "import pefile, lief" 2>/dev/null
# Windows (PowerShell)
$PSVersionTable.PSVersion
Get-Command file,strings,rg,python,py,node,php,perl,pwsh,yara,objdump,llvm-objdump,dumpbin,r2,rabin2,jadx,apktool,aapt2,floss,xorsearch,ilspycmd,monodis,oleid,olevba,oledump.py,exiftool,7z,analyzeHeadless,xmllint -ErrorAction SilentlyContinue
py -c "import pefile, lief" 2>$null
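Where Python is already available, the same discovery and fallback selection can be sketched in a platform-neutral way. This is only an illustration: `discover_tools` and `pick_tool` are hypothetical helper names, and the preference order shown is an assumption, not a ranking prescribed by this skill.

```python
import shutil
from typing import Dict, Iterable, List, Optional


def discover_tools(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None when it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def pick_tool(available: Dict[str, Optional[str]], preferences: List[str]) -> Optional[str]:
    """Return the first preferred tool that is available, or None if none is."""
    for name in preferences:
        if available.get(name):
            return name
    return None


# Hypothetical preference order for a disassembly task.
found = discover_tools(["r2", "objdump", "llvm-objdump"])
print(pick_tool(found, ["r2", "objdump", "llvm-objdump"]))
```

A `None` result from `pick_tool` is the point at which the install question should be put to the user.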

Core tools:

  • file
  • sha256sum or equivalent
  • strings
  • ripgrep (rg)
  • python3 or py
  • xxd
  • base64
  • yara
  • objdump
  • nm
  • exiftool
  • 7z

Important specialized tools:

  • Android: jadx CLI, apktool, aapt2, strings, ripgrep (rg), yara, optional floss, bundletool, apkanalyzer
  • PE/.NET: strings, yara, floss, xorsearch, radare2 (r2), rabin2, ilspycmd, monodis, pefile, lief, optional Ghidra headless with analyzeHeadless, capa, upx, sigcheck
  • Office/script: oletools (oleid, olevba), oledump.py, unzip, zipinfo, strings, ripgrep (rg), sed, awk, tr, optional CyberChef CLI, rtfobj
  • ELF/Mach-O: objdump, readelf, nm, otool, lipo, binwalk, radare2 (r2), rabin2, optional Ghidra headless with analyzeHeadless
  • Web/script/source: strings, ripgrep (rg), sed, awk, tr, xxd, base64, iconv, xmllint, python3, optional node, php, perl, pwsh, CyberChef CLI, beautifiers or AST parsers

Interpreters and runtimes may be used only for safe text parsing, AST extraction, beautification, or reimplementing a decoder on inert data. Do not execute the malicious sample as-is with node, php, perl, pwsh, python, bash, or other runtimes.

Install examples, only after user authorization:

# Linux
sudo apt-get update
sudo apt-get install -y file binutils unzip p7zip-full yara apktool aapt default-jre mono-utils exiftool binwalk upx-ucl
python3 -m pip install --user flare-floss oletools oledump pefile lief flare-capa
dotnet tool install --global ilspycmd
# macOS
brew update
brew install binutils p7zip yara apktool openjdk python dotnet exiftool binwalk upx
pip3 install --user flare-floss oletools oledump pefile lief flare-capa
dotnet tool install --global ilspycmd
# Windows
winget install --id Python.Python.3.12 --accept-source-agreements --accept-package-agreements
winget install --id OpenJS.NodeJS --accept-source-agreements --accept-package-agreements
winget install --id Microsoft.DotNet.SDK.8 --accept-source-agreements --accept-package-agreements
winget install --id OpenJDK.OpenJDK.17 --accept-source-agreements --accept-package-agreements
winget install --id Microsoft.Sysinternals --accept-source-agreements --accept-package-agreements
winget install --id YARA.YARA --accept-source-agreements --accept-package-agreements
winget install --id OliverBetz.ExifTool --accept-source-agreements --accept-package-agreements
py -m pip install --user flare-floss oletools oledump pefile lief flare-capa
dotnet tool install --global ilspycmd

For Ghidra headless, download the official release and expose support/analyzeHeadless or support\analyzeHeadless.bat in PATH.

Required Tool Set

Use these levels to decide whether the environment is sufficient for a complete analysis.

Mandatory

These are the tools or capabilities expected for a solid end-to-end workflow across the main sample types:

  • file
  • sha256sum or equivalent hash tooling
  • strings
  • ripgrep (rg)
  • python3 or py
  • yara
  • 7z
  • objdump
  • nm
  • exiftool
  • jadx
  • apktool
  • aapt2
  • floss
  • xorsearch
  • radare2 (r2)
  • rabin2
  • Ghidra headless with analyzeHeadless
  • ilspycmd
  • monodis
  • oletools (oleid, olevba)
  • oledump.py
  • Python modules pefile, lief

Strongly Recommended

These are not mandatory in every case, but a full malware analyst workflow will regularly benefit from them:

  • readelf
  • binwalk
  • unzip
  • zipinfo
  • xxd
  • base64
  • sed
  • awk
  • tr
  • xmllint
  • iconv
  • otool and lipo on macOS
  • bundletool
  • apkanalyzer
  • rtfobj
  • CyberChef CLI

Optional Or Situational

These are useful for specific formats, families, or analyst preferences:

  • capa
  • upx
  • sigcheck
  • node
  • php
  • perl
  • pwsh
  • beautifiers or AST parsers for script-heavy cases

Interpretation rule:

  • if a mandatory tool or capability is missing, the analysis may still continue, but Codex must say that the environment is degraded and use the best fallback
  • if a strongly recommended tool is missing, continue normally unless the missing capability blocks a specific step
  • if an optional tool is missing, mention it only when it would materially improve the current case

Persistent Case Files

Always maintain two files at the case root:

  • AGENTS.md: working memory
  • REPORT.md: analyst-facing final report, updated during the work

AGENTS.md

Create or read AGENTS.md before substantive work. Update it after every meaningful action or finding.

Use it to preserve:

  • current sample and current status
  • tool availability and fallback decisions
  • confirmed findings and provenance
  • extracted stages and parent-child relationships
  • decoder or decryption notes
  • family attribution status
  • blockers, open questions, and next actions

Recommended sections:

  • Current Summary
  • Sample Inventory
  • Environment And Tooling
  • Timeline
  • Confirmed Findings
  • Decoders And Algorithms
  • Family Attribution
  • Evidence Index
  • Open Questions
  • Next Actions
  • Final Assessment
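One way to bootstrap the file with the recommended sections is sketched below. The Markdown heading layout and the helper names are assumptions made for illustration. The only behavior taken from the text above is that an existing AGENTS.md is read and extended, never recreated.

```python
from pathlib import Path

SECTIONS = [
    "Current Summary",
    "Sample Inventory",
    "Environment And Tooling",
    "Timeline",
    "Confirmed Findings",
    "Decoders And Algorithms",
    "Family Attribution",
    "Evidence Index",
    "Open Questions",
    "Next Actions",
    "Final Assessment",
]


def agents_skeleton(sample_name: str) -> str:
    """Build an empty AGENTS.md body with one heading per recommended section."""
    lines = [f"# AGENTS.md: {sample_name}", ""]
    for section in SECTIONS:
        lines += [f"## {section}", "", "(empty)", ""]
    return "\n".join(lines)


def ensure_agents_file(case_root: Path, sample_name: str) -> Path:
    """Create AGENTS.md only if it is missing, so existing working memory is kept."""
    path = case_root / "AGENTS.md"
    if not path.exists():
        path.write_text(agents_skeleton(sample_name), encoding="utf-8")
    return path
```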

REPORT.md

Create REPORT.md during the analysis, not only at the end. Update the same file if the work continues.

REPORT.md must remain clear, simple, and easy to consult. It is the clean summary; AGENTS.md is the working memory.

REPORT.md must contain:

  • case summary
  • sample inventory
  • ordered execution or compromise chain
  • per-stage details with:
    • filename or label
    • MD5
    • SHA1
    • SHA256
    • file type
    • source or parent stage
    • role in the chain
  • indicators separated into:
    • drop URLs
    • stage URLs
    • C2 URLs
    • domains
    • IP addresses
  • decoding, decryption, unpacking, or staging logic
  • persistence and anti-analysis findings
  • malware family assessment
  • confidence and limitations

If the chain cannot be fully reconstructed, say where it breaks and why.
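The per-stage details listed above map naturally onto a small record type. The following sketch assumes a plain-text rendering and uses hypothetical names (`Stage`, `render_chain`); the actual REPORT.md layout is left to the analyst.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    label: str      # filename or label
    md5: str
    sha1: str
    sha256: str
    file_type: str
    parent: str     # source or parent stage
    role: str       # role in the chain


def render_chain(stages: List[Stage]) -> str:
    """Render stages in chain order as a plain-text block for REPORT.md."""
    out: List[str] = []
    for index, stage in enumerate(stages, start=1):
        out += [
            f"Stage {index}: {stage.label}",
            f"  MD5: {stage.md5}",
            f"  SHA1: {stage.sha1}",
            f"  SHA256: {stage.sha256}",
            f"  File type: {stage.file_type}",
            f"  Parent: {stage.parent}",
            f"  Role: {stage.role}",
            "",
        ]
    return "\n".join(out)
```

Keeping the list ordered by position in the chain makes the point where reconstruction breaks easy to mark with a final placeholder entry.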

Evidence Preservation

Always create a case directory and preserve evidence even if the request appears limited to IOC extraction.

Recommended layout:

  • case/AGENTS.md
  • case/REPORT.md
  • case/00-intake/
  • case/01-triage/
  • case/02-strings/
  • case/03-static/
  • case/04-decoding/
  • case/05-config/
  • case/06-extracted/
  • case/07-stages/
  • case/08-yara/
  • case/09-reports/
  • case/10-iocs/
  • case/11-notes/

Create it manually if needed:

# Linux / macOS (bash or zsh; brace expansion is not available in plain sh)
mkdir -p case/{00-intake,01-triage,02-strings,03-static,04-decoding,05-config,06-extracted,07-stages,08-yara,09-reports,10-iocs,11-notes}
touch case/AGENTS.md case/REPORT.md
# Windows (PowerShell)
$dirs = "00-intake","01-triage","02-strings","03-static","04-decoding","05-config","06-extracted","07-stages","08-yara","09-reports","10-iocs","11-notes"
$dirs | ForEach-Object { New-Item -ItemType Directory -Force -Path (Join-Path "case" $_) | Out-Null }
New-Item -ItemType File -Force -Path (Join-Path "case" "AGENTS.md") | Out-Null
New-Item -ItemType File -Force -Path (Join-Path "case" "REPORT.md") | Out-Null

Preserve at minimum:

  • hash reports
  • file-type identification output
  • ASCII and Unicode strings output
  • decoded strings
  • decompiled or extracted code fragments of interest
  • config blobs
  • decoder scripts
  • C2 URLs, domains, IPs
  • drop URLs and stage URLs
  • update URLs and fallback infrastructure
  • embedded, extracted, or downloaded payload stages with hashes
  • YARA hits
  • final analyst report

Save command output to files whenever practical.

Remote Stage Handling

If a confirmed drop URL, stage URL, payload URL, update URL, or fallback download URL is found, do not stop at identifying the URL. When retrieval is authorized and can be performed safely from a lab context:

  • never stop at a suspicious stage URL when a safe and authorized retrieval path exists
  • always retrieve, hash, document, and continue the analysis on the new stage
  • treat every retrieved component as part of the same execution or compromise chain unless evidence proves otherwise
  • download the referenced file
  • save it under case/07-stages/ or another clearly named evidence path
  • compute hashes immediately
  • record provenance in AGENTS.md
  • analyze the retrieved file as a new stage
  • update REPORT.md and the parent-child chain

For downloaded stages, preserve context from the parent sample:

  • follow the same decoding, decryption, unpacking, parameter-building, header-building, or staging logic introduced by the code path that generated the URL or request
  • if the parent transforms the payload before execution with a wrapper, key, offset, encoding, archive layer, or rename step, reproduce and document that logic before concluding the stage analysis
  • if the URL is dead, blocked, or requires unsafe interaction, record that limitation and continue with static reconstruction where possible

If retrieval is not already authorized:

  • do not stop to ask whether you should retrieve the file
  • continue automatically with offline parsing, carving, deobfuscation, decoder reconstruction, blob extraction, and deeper reverse engineering of the artifacts already present
  • only mention the remote retrieval as a documented next option after you have exhausted the reasonable offline path
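Once a stage has been retrieved under authorization, the "save, hash immediately, record provenance" sequence can be sketched as below. Naming the file after its SHA256 and logging to a `provenance.jsonl` file are assumptions made for this example, as are the helper names. The download itself is out of scope here, so the function takes the stage bytes as input.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict


def hash_bytes(data: bytes) -> Dict[str, str]:
    """Compute the three hashes required for every stage."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def preserve_stage(case_root: Path, data: bytes, source_url: str, parent_sha256: str) -> Dict[str, str]:
    """Save a retrieved stage under 07-stages/ and return its provenance record."""
    hashes = hash_bytes(data)
    stage_dir = case_root / "07-stages"
    stage_dir.mkdir(parents=True, exist_ok=True)
    stage_path = stage_dir / f"{hashes['sha256']}.bin"
    if not stage_path.exists():  # identical content is already preserved; do not rewrite it
        stage_path.write_bytes(data)
    record = {
        **hashes,
        "path": str(stage_path),
        "source_url": source_url,
        "parent_sha256": parent_sha256,
    }
    # Append-only log, so earlier provenance entries are never lost.
    with (stage_dir / "provenance.jsonl").open("a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
    return record
```

The returned record carries the fields needed for the parent-child entry in AGENTS.md and the per-stage block in REPORT.md.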

Required Workflow

  1. identify sample type, architecture, container format, and hashes
  2. create the case workspace, create or read AGENTS.md, and preserve the original sample
  3. run triage:
    • file type
    • hashes
    • strings
    • metadata
    • archives and containers
    • YARA
  4. run the sample-type workflow
  5. reconstruct encoding, encryption, or config decoding with minimal deterministic scripts if needed
  6. extract and save embedded blobs, dropped URLs, stage URLs, and config material
  7. if a reachable stage is authorized for retrieval in a lab context, download it, hash it, preserve parent logic, and continue the analysis on that stage
  8. attempt malware family attribution using technically defensible evidence
  9. update AGENTS.md with findings, blockers, open questions, next actions, and stage relationships
  10. create or update REPORT.md with the ordered chain, hashes, infrastructure, findings, and limitations
  11. produce a structured answer with evidence, provenance, and confidence
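As an illustration of the "minimal deterministic scripts" mentioned in step 5, the sketch below brute-forces a single-byte XOR key over an inert blob and keeps only keys that reveal URL-like plaintext. Single-byte XOR is just one simple scheme chosen for the example. The function names and the URL pattern are assumptions, and real samples often need the decoder reconstructed from the parent code instead.

```python
import re
from typing import List, Tuple

# Assumption: a hit is any run of printable ASCII that starts with http:// or https://.
URL_RE = re.compile(rb"https?://[\x21-\x7e]{4,}")


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with the same one-byte key."""
    return bytes(b ^ key for b in data)


def brute_single_byte_xor(blob: bytes) -> List[Tuple[int, bytes]]:
    """Try every non-zero one-byte key and return (key, url) pairs for URL-like plaintext."""
    hits: List[Tuple[int, bytes]] = []
    for key in range(1, 256):
        for match in URL_RE.finditer(xor_bytes(blob, key)):
            hits.append((key, match.group(0)))
    return hits
```

The script operates only on bytes already carved from the sample, and both the script and its output belong under case/04-decoding/.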

Sample-Type Procedures

Common constraints for all sample types:

  • static analysis only unless explicitly moved into isolated dynamic mode
  • do not execute malware, payloads, scripts, macros, HTML smuggling chains, or webshells
  • every material claim must be backed by visible evidence

Android APK

Objectives:

  • identify suspicious permissions, services, receivers, jobs, and foreground services
  • recover C2s, drop URLs, stage URLs, and config endpoints
  • identify reflection, dynamic loading, native bridges, crypto wrappers, and family indicators

Procedure:

  1. analyze manifest and metadata with aapt2, apktool, or equivalent
  2. decompile with jadx and/or apktool
  3. search for:
    • obfuscated code
    • networking classes
    • reflection, dynamic loading, native bridges
    • hardcoded hosts and URLs
    • Firebase, MQTT, WebSocket, sockets
    • AES, DES, RSA, ECC, ChaCha, RC4, XOR, Base64, custom encoding
    • hardcoded or runtime-derived keys
  4. trace where the C2 originates, how it is reconstructed, and where it is used
  5. if the C2 is hidden, reconstruct the decoder and preserve script plus output
  6. if a stage URL or config URL is recoverable and authorized, download it and analyze it while preserving the parent APK decoding or staging logic
  7. save manifest findings, decompiled classes, assets, decoded config, and downloaded stages

Android family-pattern enrichment:

  • always extract and compare these APK features:
    • permission values from the manifest
    • application values from the manifest or application definition
    • intent-related values extracted across the APK, including but not limited to manifest declarations
  • use the remote dataset published at:
    • https://raw.githubusercontent.com/guelfoweb/artifacts/refs/heads/main/data/patterns.json
  • use this dataset as a family-attribution aid, not as a standalone verdict
  • compare the APK features against the dataset and compute a similarity score per family
  • express the similarity score as a percentage
  • treat that percentage as a match score, not as certainty
  • report for the top candidate families:
    • family name
    • match score percentage
    • matched permissions
    • matched application value
    • matched intent-related values
    • missing or divergent values
  • treat intent as a noisier support signal than exact manifest fields and correlate it with code, config, services, receivers, and infrastructure before concluding attribution
  • correlate the score with code structure, config, infrastructure, services, receivers, and other family indicators before concluding attribution
  • if the dataset is unavailable, continue the APK analysis normally and record that the family-pattern enrichment step could not be performed
  • if multiple families score closely, report them as candidate families and lower the confidence
  • if the score is weak, report the attribution as tentative
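The scoring step could look like the sketch below. Two things here are assumptions and not facts about the dataset: the schema (each family mapped to sets of `permission`, `application`, and `intent` values) and the scoring formula (share of a family's pattern values observed in the APK). The real patterns.json may be structured differently, and the family names in the usage note are placeholders.

```python
from typing import Dict, List, Set

Features = Dict[str, Set[str]]


def match_score(apk: Features, family: Features):
    """Return (percentage, matched, missing) for one family's pattern values."""
    matched: Features = {}
    missing: Features = {}
    total = 0
    hit = 0
    for field, expected in family.items():
        observed = apk.get(field, set())
        matched[field] = expected & observed
        missing[field] = expected - observed
        total += len(expected)
        hit += len(matched[field])
    percent = 100.0 * hit / total if total else 0.0
    return percent, matched, missing


def rank_families(apk: Features, dataset: Dict[str, Features], top: int = 3) -> List[dict]:
    """Score every family and return the top candidates, best match first."""
    scored = []
    for name, family in dataset.items():
        percent, matched, missing = match_score(apk, family)
        scored.append({
            "family": name,
            "score_percent": round(percent, 1),
            "matched": matched,
            "missing": missing,
        })
    return sorted(scored, key=lambda row: row["score_percent"], reverse=True)[:top]
```

For instance, an APK that shows two of three values listed for a placeholder family "FamilyA" would be reported at 66.7%, with the third value listed under missing. That percentage is a match score to be correlated with code and infrastructure evidence, not a verdict.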

Required APK-specific report content:

  • recovered C2s: host, IP, URL, fallback
  • top family candidates with match score percentage
  • protection mechanism used
  • decoding or decryption algorithm if applicable
  • relevant class, method, resource, or smali path
  • confidence

Windows PE And .NET

Objectives:

  • identify PE type, architecture, .NET usage, packing, signer, resources, config, and network logic
  • recover C2s, drop URLs, stage URLs, Telegram artifacts, fallback infrastructure
  • identify persistence, anti-analysis behavior, and family indicators

Procedure:

  1. triage with file, hashes, strings, UTF-16 strings, objdump, rabin2, pefile, or lief
  2. search specifically for URL, IP, domain, user-agent, HTTP endpoints, Telegram bot tokens, chat_id, and api.telegram.org
  3. use floss for decoded strings and classify the obfuscation type
  4. use xorsearch for hidden URL or IP material and preserve offsets or blobs
  5. inspect imports, sections, xrefs, resources, and crypto or network functions with r2 or rabin2
  6. focus on patterns such as WinInet, WinHTTP, sockets, crypt APIs, and BCrypt
  7. if .NET, decompile with ilspycmd; use monodis if readability is damaged by obfuscation
  8. if needed, use Ghidra headless to inspect functions, constants, crypto routines, and C2 construction paths
  9. recover overlays, config blobs, mutexes, registry paths, services, task names, and staged URLs
  10. if Telegram is used, extract bot token, chat_id if present, and communication flow
  11. if a stage URL is authorized for retrieval, download it and analyze it while preserving the parent binary decoding, decrypting, unpacking, or request-building logic
  12. save all extracted resources, config blobs, decoders, and stages with hashes and provenance
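Steps 1 and 2 can be approximated without external tools when `strings` is unavailable. The sketch below extracts ASCII and UTF-16LE strings from raw bytes and searches them for URLs and Telegram bot tokens. The token pattern (digits, a colon, then 35 URL-safe characters) and the minimum string length of 6 are assumptions for illustration, and the helper names are hypothetical.

```python
import re
from typing import Dict, Iterator, List

ASCII_RE = re.compile(rb"[\x20-\x7e]{6,}")
UTF16_RE = re.compile(rb"(?:[\x20-\x7e]\x00){6,}")  # printable ASCII stored as UTF-16LE
URL_RE = re.compile(r"https?://[^\s\"'<>]+")
# Assumption: bot tokens look like "<digits>:<35 URL-safe characters>".
TOKEN_RE = re.compile(r"(?<!\d)\d{6,12}:[A-Za-z0-9_-]{35}(?![A-Za-z0-9_-])")


def extract_strings(data: bytes) -> Iterator[str]:
    """Yield printable ASCII strings, then UTF-16LE strings, found in data."""
    for match in ASCII_RE.finditer(data):
        yield match.group(0).decode("ascii")
    for match in UTF16_RE.finditer(data):
        yield match.group(0).decode("utf-16-le")


def find_iocs(data: bytes) -> Dict[str, List[str]]:
    """Collect URL and Telegram token candidates from both string encodings."""
    urls, tokens = set(), set()
    for text in extract_strings(data):
        urls.update(URL_RE.findall(text))
        tokens.update(TOKEN_RE.findall(text))
    return {"urls": sorted(urls), "telegram_tokens": sorted(tokens)}
```

Every hit is a candidate only. Its origin (plain strings here, as opposed to decoded strings or code) still needs to be recorded and confirmed against the code path that uses it.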

Required PE/.NET-specific report content:

  • EXE type: native, .NET, or packed
  • C2s with type and origin:
    • type: URL, IP, domain, Telegram bot
    • origin: strings, decoded strings, xorsearch, code
  • protection mechanism: none, XOR, standard crypto, or custom
  • relevant code path
  • confidence

If no C2 is found, say so explicitly, state whether it likely resolves at runtime, and explain the static-analysis limitation.

Office, VBS, JS, HTA, VBA, PowerShell

Objectives:

  • identify macros, autoexec triggers, downloader logic, LOLBins, encoded commands, and remote payload sources
  • recover C2s, drop URLs, stage URLs, and embedded scripts
  • determine likely family or script-kit when technically supported

Procedure:

  1. identify file type with file and determine whether macros are present
  2. use oleid, olevba, and oledump.py where available
  3. for macro-enabled content, identify AutoOpen, Document_Open, and Workbook_Open
  4. search for Eval, Execute, Shell, Run, PowerShell, and LOLBins such as certutil, bitsadmin, and mshta
  5. for OOXML, inspect XML, relationships, embedded objects, and external template or macro references
  6. search for CreateObject, GetObject, MSXML2.XMLHTTP, WinHttpRequest, Chr, ChrW, Asc, Split, Replace, Join
  7. reconstruct obfuscated strings or command lines such as numeric arrays, Chr() / ChrW(), Base64, simple XOR, and split concatenation
  8. search specifically for URL, IP, domain, PowerShell -enc, Telegram Bot API usage, Paste sites, GitHub raw, and similar stage sources
  9. if a remote stage is authorized for retrieval, download it and analyze it while preserving the parent document or script decoding, deobfuscation, or staging logic
  10. preserve extracted macros, decoded PowerShell, embedded payloads, remote URLs, downloaded stages, and stage relationships
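Two of the reconstructions named in step 7 and step 8 are simple enough to sketch directly: rebuilding a string from a chain of `Chr()` / `ChrW()` calls, and decoding a PowerShell `-enc` argument, which is Base64 over UTF-16LE text. The function names are hypothetical. The `Chr` helper only handles numeric literals and ignores any string literals mixed into the expression, so it is a starting point and not a full VBA evaluator.

```python
import base64
import re

# Matches Chr(104), ChrW(104), and VBA hex literals such as Chr(&H68).
CHR_RE = re.compile(r"ChrW?\(\s*(&H[0-9A-Fa-f]+|\d+)\s*\)", re.IGNORECASE)


def decode_chr_chain(expr: str) -> str:
    """Rebuild the text produced by Chr(n) / ChrW(n) calls, ignoring the joining operators."""
    chars = []
    for token in CHR_RE.findall(expr):
        if token.upper().startswith("&H"):
            value = int(token[2:], 16)
        else:
            value = int(token)
        chars.append(chr(value))
    return "".join(chars)


def decode_powershell_enc(b64: str) -> str:
    """Decode a PowerShell -EncodedCommand argument (Base64 over UTF-16LE)."""
    return base64.b64decode(b64).decode("utf-16-le")
```

For example, `decode_chr_chain("Chr(104) & Chr(116) & ChrW(&H74) & Chr(112)")` returns `"http"`. Both helpers treat the input purely as text and never evaluate it.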

Required document/script report content:

  • file type: Office with macros, Office without macros, or script type
  • obfuscation mechanisms used
  • C2s with type and origin:
    • type: URL, IP, Telegram, Paste, GitHub raw, other
    • origin: macro, script, XML, decoding
  • relevant logical code path
  • confidence

If no C2 is found, say so explicitly, state whether it likely resolves at runtime, and explain the static-analysis limitation.

HTML, Web, Script, And Source-Code Payloads

Objectives:

  • identify malicious logic in HTML, HTM, MHTML, SVG, XML, XSL, JS, JSE, VBS, VBE, WSF, WSH, HTA, PowerShell, BAT, CMD, shell, Python, PHP, Perl, Ruby, Lua, AutoIt, Node.js, JSP, ASP, ASPX, and similar payloads
  • recover C2s, drop URLs, stage URLs, webshell endpoints, credentials, and embedded payloads
  • identify HTML smuggling, browser-based delivery, downloader logic, webshell features, and obfuscation chains
  • determine likely family, toolkit, loader pattern, or script-kit when technically supported

Procedure:

  1. identify the apparent language or markup type and all embedded sub-languages
  2. normalize formatting without executing the sample:
    • beautify minified JS where useful
    • unwrap long lines
    • decode escaped, hex, percent-encoded, Base64, UTF-16, or mixed encodings
    • preserve original and normalized versions
  3. search for dangerous primitives and stage-delivery logic such as:
    • eval, Function, setTimeout, setInterval, unescape, atob, fromCharCode
    • ActiveXObject, WScript.Shell, CreateObject, MSXML2.XMLHTTP, WinHttpRequest, ADODB.Stream
    • IEX, Invoke-Expression, DownloadString, WebClient, Start-BitsTransfer
    • certutil, bitsadmin, mshta, rundll32, regsvr32, curl, wget, powershell
    • exec, system, shell_exec, passthru, subprocess, os.system, child_process, sockets, HTTP clients
  4. for HTML smuggling and browser-delivered payloads, inspect Blob, URL.createObjectURL, download, msSaveBlob, Uint8Array, hidden iframes, meta refresh, redirects, and payload reconstruction in HTML or JS
  5. for webshells and server-side code, identify command execution, upload or download features, authentication gates, outbound callbacks, and exfil paths
  6. reconstruct decoding or staging logic with minimal helper scripts on inert data only
  7. if a referenced stage is authorized for retrieval, download it and analyze it while preserving the parent HTML, script, or source-code delivery logic
  8. save normalized source, extracted payloads, reconstructed stages, decoder scripts, endpoints, and downloaded stages
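The normalization in step 2 can be done on inert text with a few small decoders. The sketch below covers `String.fromCharCode(...)` with literal arguments, `\xNN` escapes, and percent-encoding. The function names are hypothetical, and calls whose arguments are expressions instead of numeric literals are deliberately skipped so they are not guessed at.

```python
import re
from typing import List
from urllib.parse import unquote

FROM_CHAR_CODE_RE = re.compile(r"String\.fromCharCode\(([^)]*)\)")
HEX_ESCAPE_RE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def decode_from_char_code(source: str) -> List[str]:
    """Return the strings built by String.fromCharCode calls that use only numeric literals."""
    decoded = []
    for args in FROM_CHAR_CODE_RE.findall(source):
        try:
            # Base 0 accepts both decimal (104) and hex (0x68) literals.
            codes = [int(part.strip(), 0) for part in args.split(",")]
        except ValueError:
            continue  # arguments are not plain literals; leave for manual review
        decoded.append("".join(chr(code) for code in codes))
    return decoded


def decode_hex_escapes(text: str) -> str:
    """Replace \\xNN escape sequences with the characters they denote."""
    return HEX_ESCAPE_RE.sub(lambda m: chr(int(m.group(1), 16)), text)


def decode_percent(text: str) -> str:
    """Decode percent-encoded text such as %68%74%74%70."""
    return unquote(text)
```

The decoded output should be saved next to the untouched original so both versions remain available as evidence.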

Required web/source report content:

  • file type and language or script family
  • delivery pattern: HTML smuggling, downloader, loader, stager, webshell, credential harvester, or unknown
  • obfuscation mechanisms used
  • C2s with type and origin:
    • type: URL, IP, domain, panel path, webshell endpoint, Telegram, Paste, GitHub raw, other
    • origin: source code, embedded script, HTML or DOM logic, decoding, extracted stage
  • relevant logical code path
  • confidence

If no C2 is found, say so explicitly, state whether it likely resolves or builds at runtime, and explain the static-analysis limitation.

ELF And Mach-O

Objectives:

  • identify network endpoints, drop URLs, stage URLs, config paths, and persistence
  • recover embedded scripts, archives, and config blobs
  • determine likely family, toolkit, or implant lineage when supported

Procedure:

  1. determine ELF, Mach-O, fat Mach-O, or wrapper script format
  2. use strings, objdump, readelf, nm, otool, lipo, r2, rabin2, or Ghidra headless as appropriate
  3. inspect linked libraries, symbols, hardcoded paths, shell commands, network and crypto APIs
  4. look for persistence via launch agents, launch daemons, cron, systemd, shell profile changes, login items, or hidden service files
  5. use binwalk and archive extraction for embedded content
  6. if a stage URL is authorized for retrieval, download it and analyze it while preserving the parent Unix-side decoding, unpacking, or staging logic
  7. preserve extracted stages, shell scripts, plist files, configs, decoded infrastructure, and downloaded stages
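When `file` is missing, step 1 can fall back to checking magic bytes directly. The sketch below is a rough classifier with hypothetical names. The fat Mach-O versus Java class distinction is a heuristic (both start with `CA FE BA BE`; a small architecture count suggests a fat binary), so an ambiguous result should be confirmed with another tool.

```python
MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce": "Mach-O 32-bit (big-endian)",
    b"\xce\xfa\xed\xfe": "Mach-O 32-bit (little-endian)",
    b"\xfe\xed\xfa\xcf": "Mach-O 64-bit (big-endian)",
    b"\xcf\xfa\xed\xfe": "Mach-O 64-bit (little-endian)",
    b"\xca\xfe\xba\xbf": "fat Mach-O (64-bit offsets)",
}


def identify_container(header: bytes) -> str:
    """Classify a file from its first bytes as ELF, Mach-O, fat Mach-O, or a script."""
    magic = header[:4]
    if magic == b"\x7fELF":
        # e_ident[EI_CLASS]: 1 means 32-bit, 2 means 64-bit.
        ei_class = header[4] if len(header) > 4 else 0
        return {1: "ELF 32-bit", 2: "ELF 64-bit"}.get(ei_class, "ELF (unknown class)")
    if magic in MACHO_MAGICS:
        return MACHO_MAGICS[magic]
    if magic == b"\xca\xfe\xba\xbe":
        if len(header) < 8:
            return "fat Mach-O or Java class (ambiguous)"
        # Heuristic: a fat header stores a small architecture count here, while a
        # Java class stores minor/major versions, which read as 45 or more.
        count = int.from_bytes(header[4:8], "big")
        return "fat Mach-O" if count < 45 else "Java class"
    if header[:2] == b"#!":
        return "script with shebang"
    return "unknown"
```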

Family Attribution Rules

Attempt family attribution only when supported by technical evidence such as:

  • config layout and field names
  • mutex names
  • user-agent formats
  • campaign IDs
  • protocol structure
  • crypto constants
  • registry or filesystem conventions
  • service, task, or receiver names
  • builder markers
  • packer signatures
  • distinctive strings or code structure

If attribution is weak:

  • say that the family is unknown or tentative
  • provide candidate families if useful
  • explain why the attribution is uncertain

For Android APKs, if the remote family-pattern dataset is available, include the manifest-based family match score as supporting evidence, but never treat it as proof on its own.

Final Response Requirements

Default to concise structured output, not narrative prose.

Always include:

  • file type
  • architecture
  • SHA256
  • likely malware family or candidate families when supported
  • key findings
  • extracted or reconstructed C2 values
  • extracted or reconstructed drop URLs
  • extracted or reconstructed stage URLs
  • origin of each important value
  • obfuscation, packing, compression, or protection method
  • decoding or decryption algorithm used
  • persistence artifacts
  • extracted stages and their hashes
  • evidence file locations
  • AGENTS.md path and update status
  • REPORT.md path and update status
  • confidence
  • explicit limits if the sample appears packed, incomplete, or runtime-dependent

If dynamic analysis was explicitly requested and performed in a disposable lab, also include:

  • environment used
  • observed processes
  • observed network destinations
  • dropped files
  • persistence observed

Safety Rules

  • Never run the sample on a personal or production machine.
  • Never use browser-based sandboxes unless the user explicitly asks and policy allows it.
  • Never claim a URL is active, reachable, or malicious without evidence.
  • Never destroy the original sample.
  • Never overwrite extracted evidence.
  • Keep decoder scripts minimal and deterministic.
  • Keep dynamic claims separate from static claims.
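The "never overwrite extracted evidence" rule can be enforced in helper scripts by opening files in exclusive-create mode. This is a minimal sketch with a hypothetical function name; the numbered-suffix naming on collision is an assumption, not a convention defined by this skill.

```python
from pathlib import Path


def write_evidence(path: Path, data: bytes) -> Path:
    """Write evidence without overwriting; on a name collision, use the next free numbered name."""
    candidate = path
    counter = 1
    while True:
        try:
            # Mode "x" raises FileExistsError instead of truncating an existing file.
            with candidate.open("xb") as handle:
                handle.write(data)
            return candidate
        except FileExistsError:
            candidate = path.with_name(f"{path.stem}.{counter}{path.suffix}")
            counter += 1
```

Writing `decoded.txt` twice therefore leaves the first file untouched and stores the second result as `decoded.1.txt`.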
