For details on YARA rules, see:
https://yara.readthedocs.io/en/stable/writingrules.html
According to the official documentation, a YARA rule looks like this:
[1]: yara rule
-
- /*
- This is a multi-line comment ...
- */
- rule silent_banker : banker
- {
- meta:
- description = "This is just an example"
- threat_level = 3
- in_the_wild = true
- strings:
- $a = {6A 40 68 00 30 00 00 6A 14 8D 91}
- $b = {8D 4D B0 2B C1 83 C0 27 99 6A 4E 59 F7 F9}
- $c = "UVODFRYSIHLNWPEJXQZAKCBGMT"
- condition:
- $a or $b or $c
- }

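1. /* ... */ is a comment and is optional. See the link above for details.
2. rule is a YARA keyword and a mandatory part of every rule. It may also be preceded by modifiers such as global and private.
3. silent_banker is the rule identifier. It is chosen by the user and generally serves as the name of the rule.
4. banker is a rule tag, used mainly to filter scan results.
5. meta is the metadata section. It holds descriptive information about the rule, such as the author, the date, or other details.
6. strings is the strings section: the strings that describe the characteristics of a sample. Text strings, hexadecimal strings and regular expressions can all be used.
7. condition is the condition section. It expresses how the strings defined above are combined with each other and with other conditions.
Of these, items 6 and 7 are the most complex and the richest. Writing a rule mostly means writing these two sections.
Note: the documentation linked at the start of this article describes the latest version of YARA, while the source code analysed here is an early version, so some features in the documentation are absent from this source,
for example xor strings and base64 strings.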
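To make the seven parts above more concrete, here is a minimal sketch in C of how they could be held in memory once a rule has been parsed. All type and function names (sketch_rule, sketch_string, sketch_context, sketch_add_rule, sketch_lookup_rule) are hypothetical and chosen for illustration only; the real structures in the YARA source carry more fields and are not reproduced here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical modifier flags for the keywords that may precede "rule". */
enum { SKETCH_RULE_PRIVATE = 0x01, SKETCH_RULE_GLOBAL = 0x02 };

/* One entry of the strings section, e.g. $c = "UVOD..." */
typedef struct sketch_string {
    const char *identifier;          /* "$c" */
    const char *text;                /* pattern exactly as written */
    struct sketch_string *next;
} sketch_string;

/* One parsed rule (simplified: a single tag, no meta, no condition tree). */
typedef struct sketch_rule {
    const char *identifier;          /* "silent_banker" */
    int flags;                       /* SKETCH_RULE_* bits */
    const char *tag;                 /* "banker" */
    sketch_string *strings;
    struct sketch_rule *next;
} sketch_rule;

/* Stand-in for the compilation context that owns every rule. */
typedef struct sketch_context {
    sketch_rule *rules;
    int errors;
} sketch_context;

/* Prepend a rule to the context's rule list. */
static void sketch_add_rule(sketch_context *ctx, sketch_rule *rule)
{
    rule->next = ctx->rules;
    ctx->rules = rule;
}

/* Find a rule by identifier; returns NULL when it is not present. */
static sketch_rule *sketch_lookup_rule(sketch_context *ctx, const char *id)
{
    sketch_rule *r;

    for (r = ctx->rules; r != NULL; r = r->next)
        if (strcmp(r->identifier, id) == 0)
            return r;

    return NULL;
}
```

A caller would fill in one sketch_rule per parsed rule, hand it to sketch_add_rule, and later retrieve it by name with sketch_lookup_rule.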
Rule compilation converts rules like the one at [1] into an in-memory data structure, YARA_CONTEXT* context;
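Compilation is handled mainly by the files lex.l, grammar.y, ast.h and ast.c.
lex.l is the lexical analysis part of the rule language. The flex program compiles it into lex.h and lex.c.
grammar.y is the syntax analysis part. The bison program compiles it into grammar.h and grammar.c.
ast.h/ast.c mainly create the new in-memory structures such as rules and strings, and look them up.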
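As a rough illustration of what the lexical analysis step does, the following hand-written C function sorts a lexeme into a few token classes in the same spirit as the patterns in lex.l (see the appendix). It is only a sketch: the enum, the function name sketch_classify and the shortened keyword list are assumptions made for this example, and the real scanner is generated by flex rather than written by hand.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    SK_KEYWORD,
    SK_STRING_IDENTIFIER,   /* $name */
    SK_STRING_COUNT,        /* #name */
    SK_STRING_OFFSET,       /* @name */
    SK_IDENTIFIER,
    SK_NUMBER,
    SK_UNKNOWN
} sketch_token;

/* True when every character is a letter, a digit or '_' (empty is allowed). */
static int sketch_is_ident_tail(const char *s)
{
    for (; *s != '\0'; s++)
        if (!isalnum((unsigned char) *s) && *s != '_')
            return 0;
    return 1;
}

/* True when the string is a non-empty run of decimal digits. */
static int sketch_is_digits(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++)
        if (!isdigit((unsigned char) *s))
            return 0;
    return 1;
}

static sketch_token sketch_classify(const char *lexeme)
{
    /* Only a handful of keywords; lex.l lists many more. */
    static const char *keywords[] = {
        "rule", "meta", "strings", "condition", "private", "global"
    };
    size_t i;

    /* Keywords are checked first, so "rule" is never an identifier. */
    for (i = 0; i < sizeof(keywords) / sizeof(keywords[0]); i++)
        if (strcmp(lexeme, keywords[i]) == 0)
            return SK_KEYWORD;

    switch (lexeme[0])
    {
        case '$':
            return sketch_is_ident_tail(lexeme + 1) ? SK_STRING_IDENTIFIER : SK_UNKNOWN;
        case '#':
            return sketch_is_ident_tail(lexeme + 1) ? SK_STRING_COUNT : SK_UNKNOWN;
        case '@':
            return sketch_is_ident_tail(lexeme + 1) ? SK_STRING_OFFSET : SK_UNKNOWN;
        default:
            break;
    }

    if ((isalpha((unsigned char) lexeme[0]) || lexeme[0] == '_')
        && sketch_is_ident_tail(lexeme + 1))
        return SK_IDENTIFIER;

    if (sketch_is_digits(lexeme))
        return SK_NUMBER;

    return SK_UNKNOWN;
}
```

For example, sketch_classify("rule") yields SK_KEYWORD, while sketch_classify("$a") yields SK_STRING_IDENTIFIER.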
The entry points for rule parsing in the yara program are the following two functions:
parse_rules_string
parse_rules_file
- int parse_rules_string(const char* rules_string, YARA_CONTEXT* context)
- {
- yyscan_t yyscanner;
- YY_BUFFER_STATE state;
-
- yylex_init(&yyscanner);
-
- yyset_extra(context, yyscanner);
-
- state = yy_scan_string(rules_string, yyscanner);
-
- yyset_lineno(1, yyscanner);
- yyparse(yyscanner);
-
- yylex_destroy(yyscanner);
-
- return context->errors;
- }
-
-
-
- int parse_rules_file(FILE* rules_file, YARA_CONTEXT* context)
- {
- yyscan_t yyscanner;
-
- yylex_init(&yyscanner);
-
- #ifdef DEBUG
- yyset_debug(1, yyscanner);
- #endif
-
- yyset_in(rules_file, yyscanner);
- yyset_extra(context, yyscanner);
-
- yyparse(yyscanner); // entry point of the grammar parser; yylex is the entry point of the lexer
-
- yylex_destroy(yyscanner);
-
- return context->errors;
- }

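Parsing flow:
Inside the yyparse function (grammar.c), YYLEX is called to fetch tokens. Once the tokens match a user-defined BNF production, control enters a large switch statement that calls the corresponding user-defined reduction function (reduce_*):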
- yyreduce:
- /* yyn is the number of a rule to reduce with. */
- yylen = yyr2[yyn];
-
- /* If YYLEN is nonzero, implement the default value of the action:
- `$$ = $1'.
- Otherwise, the following line sets YYVAL to garbage.
- This behavior is undocumented and Bison
- users should not rely upon it. Assigning to YYVAL
- unconditionally makes the parser a bit smaller, and it avoids a
- GCC warning that YYVAL may be used uninitialized. */
- yyval = yyvsp[1-yylen];
-
-
- YY_REDUCE_PRINT (yyn);
- switch (yyn)
- {
- case 6: // the case numbers roughly follow the order in which the BNF productions are defined in grammar.y
- #line 279 "grammar.y"
- {
- if (reduce_rule_declaration(yyscanner, (yyvsp[(3) - (9)].c_string),(yyvsp[(1) - (9)].integer),(yyvsp[(4) - (9)].tag),(yyvsp[(6) - (9)].meta),(yyvsp[(7) - (9)].string),(yyvsp[(8) - (9)].term)) != ERROR_SUCCESS)
- {
- yyerror(yyscanner, NULL);
- YYERROR;
- }
- }
- break;
- .........................
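The expressions of the form yyvsp[(3) - (9)] in the generated code can look cryptic. They index the parser's semantic value stack relative to its top: for a production with n right-hand-side symbols, yyvsp points at the value of the last symbol, so the k-th symbol sits at yyvsp[k - n]. The sketch below reproduces only that arithmetic; sketch_value and sketch_rhs are hypothetical names, not part of bison's output.

```c
/* Hypothetical stand-in for the parser's semantic value type. */
typedef union {
    long integer;
    const char *c_string;
} sketch_value;

/* vsp points at the value of the LAST right-hand-side symbol.
   Returns the slot of symbol k (1-based) in a production of n symbols,
   which is the same arithmetic as yyvsp[(k) - (n)]. */
static sketch_value *sketch_rhs(sketch_value *vsp, int k, int n)
{
    return &vsp[k - n];
}
```

Read this way, the call in case 6 passes the values of the 3rd, 1st, 4th, 6th, 7th and 8th symbols of a nine-symbol production to reduce_rule_declaration, and the default action yyval = yyvsp[1-yylen] is simply the value of the first symbol.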

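The rest of the code is not hard to follow.
The more interesting part is the new_hex_string function in ast.c.
This function fully implements the parsing and handling of the ? wildcard, [num-num] and (BYTE|BYTE) patterns.
The information about these three kinds of pattern is stored in the mask field.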
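One common way to keep wildcard information in a mask, shown below as a sketch, is to store a mask byte next to every pattern byte and compare only the bits the mask selects. The encoding used here (0xFF for a fixed byte, 0xF0 or 0x0F for a half wildcard, 0x00 for ??) is an assumption made for illustration, as are the function names. The sketch handles only the ? wildcard, does not attempt [num-num] or (BYTE|BYTE), and is not a copy of new_hex_string.

```c
#include <ctype.h>
#include <stddef.h>

/* Decode one hex digit or '?' into a 4-bit value and a 4-bit mask.
   Returns 0 when the character is neither. */
static int sketch_nibble(char c, unsigned char *value, unsigned char *mask)
{
    if (c == '?')
    {
        *value = 0x0;
        *mask = 0x0;   /* wildcard: ignore these four bits */
        return 1;
    }

    if (!isxdigit((unsigned char) c))
        return 0;

    c = (char) tolower((unsigned char) c);
    *value = (unsigned char) (isdigit((unsigned char) c) ? c - '0' : c - 'a' + 10);
    *mask = 0xF;       /* fixed: all four bits must match */
    return 1;
}

/* Parse text such as "6A ?? 4? 00" into parallel byte and mask arrays.
   Returns the number of bytes written, or -1 on bad input or overflow. */
static int sketch_parse_hex(const char *text, unsigned char *bytes,
                            unsigned char *mask, size_t capacity)
{
    size_t n = 0;

    while (*text != '\0')
    {
        unsigned char hv, hm, lv, lm;

        if (isspace((unsigned char) *text))
        {
            text++;
            continue;
        }

        if (n == capacity)
            return -1;

        /* Each byte needs exactly two nibble characters. */
        if (!sketch_nibble(text[0], &hv, &hm) || !sketch_nibble(text[1], &lv, &lm))
            return -1;

        bytes[n] = (unsigned char) ((hv << 4) | lv);
        mask[n] = (unsigned char) ((hm << 4) | lm);
        n++;
        text += 2;
    }

    return (int) n;
}

/* Returns 1 when the pattern matches at the start of data, 0 otherwise. */
static int sketch_match_at(const unsigned char *data, size_t data_len,
                           const unsigned char *bytes,
                           const unsigned char *mask, size_t n)
{
    size_t i;

    if (data_len < n)
        return 0;

    for (i = 0; i < n; i++)
        if ((data[i] & mask[i]) != bytes[i])
            return 0;

    return 1;
}
```

Because wildcard nibbles are stored as zero in the pattern bytes, masking the input and comparing it with the pattern byte is enough; no separate branch per wildcard is needed.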
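This concludes the analysis of this part. For more information, see the source code in the appendix.
Appendix:
lex.l (comments have been added to this copy; the formatting may be broken)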
- /*
- Copyright (c) 2007. Victor M. Alvarez [plusvic@gmail.com].
- All rights reserved.
- Redistribution and use in source and binary forms, with or without
- modification, are permitted provided that the following conditions
- are met:
- 1. Redistributions of source code must retain the above copyright
- notice, this list of conditions and the following disclaimer.
- 2. Redistributions in binary form must reproduce the above copyright
- notice, this list of conditions and the following disclaimer in the
- documentation and/or other materials provided with the distribution.
- 3. All advertising materials mentioning features or use of this software
- must display the following acknowledgement:
- This product includes software developed by Victor M. Alvarez and its
- contributors.
- 4. Neither the name of Victor M. Alvarez nor the names of its contributors
- may be used to endorse or promote products derived from this software
- without specific prior written permission.
- THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
- AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
- IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
- ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
- LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
- CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
- SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
- INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
- CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
- ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
- POSSIBILITY OF SUCH DAMAGE.
- */
-
- /* Lexical analyzer for YARA */
-
- %{
-
- #include <math.h>
- #include <stdio.h>
- #include <string.h>
- #include "grammar.h"
- #include "xtoi.h"
- #include "mem.h"
- #include "sizedstr.h"
- #include "lex.h"
- #include "yara.h"
-
- #define LEX_CHECK_SPACE_OK(data, current_size, max_length) \
- if (strlen(data) + current_size >= max_length - 1) \
- { \
- yyerror(yyscanner, "out of space in lex_buf"); \
- yyterminate(); \
- }
-
- #define YYTEXT_TO_BUFFER \
- { \
- char *yptr = yytext; \
- LEX_CHECK_SPACE_OK(yptr, yyextra->lex_buf_len, LEX_BUF_SIZE); \
- while ( *yptr ) \
- { \
- *yyextra->lex_buf_ptr++ = *yptr++; \
- yyextra->lex_buf_len++; \
- } \
- }
-
- #ifdef WIN32
- #define snprintf _snprintf
- #endif
-
- %}
- /* flex options: reentrant generates a reentrant scanner; bison-bridge makes the scanner work together with bison */
- %option reentrant bison-bridge
- /**/
- %option noyywrap
- %option nounistd
- %option yylineno
-
- %option verbose
- %option warn
-
- /*http://postgresqlchina.com/tecdocdetail/1 */
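- /* %x declares an exclusive start condition. While the scanner is in such a condition, only the rules in the rules section that are tagged with it can match. */
- /* Such rules are marked with <start condition>. For example, if the definitions section declares %x xb, then while xb is active only rules that begin with <xb> can match; all other rules are ignored. */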
- %x str
- %x regexp
- %x include
- %x comment
-
- digit [0-9]
- letter [a-zA-Z]
- hexdigit [a-fA-F0-9]
-
- %%
- /* keywords and operators */
- "<" { return _LT_; }
- ">" { return _GT_; }
- "<=" { return _LE_; }
- ">=" { return _GE_; }
- "==" { return _EQ_; }
- "!=" { return _NEQ_; }
- "<<" { return _SHIFT_LEFT_; }
- ">>" { return _SHIFT_RIGHT_; }
- "private" { return _PRIVATE_; }
- "global" { return _GLOBAL_; }
- "rule" { return _RULE_; }
- "meta" { return _META_; }
- "strings" { return _STRINGS_; }
- "ascii" { return _ASCII_; }
- "wide" { return _WIDE_; }
- "fullword" { return _FULLWORD_; }
- "nocase" { return _NOCASE_; }
- "condition" { return _CONDITION_; }
- "true" { return _TRUE_; }
- "false" { return _FALSE_; }
- "not" { return _NOT_; }
- "and" { return _AND_; }
- "or" { return _OR_; }
- "at" { return _AT_; }
- "in" { return _IN_; }
- "of" { return _OF_; }
- "them" { return _THEM_; }
- "for" { return _FOR_; }
- "all" { return _ALL_; }
- "any" { return _ANY_; }
- "entrypoint" { return _ENTRYPOINT_; }
- "filesize" { return _SIZE_; }
- "rva" { return _RVA_; }
- "offset" { return _OFFSET_; }
- "file" { return _FILE_; }
- "section" { return _SECTION_; }
- "uint8" { return _UINT8_; }
- "uint16" { return _UINT16_; }
- "uint32" { return _UINT32_; }
- "int8" { return _INT8_; }
- "int16" { return _INT16_; }
- "int32" { return _INT32_; }
- "matches" { return _MATCHES_; }
- "contains" { return _CONTAINS_; }
- "index" { return _INDEX_; }
-
- /* multi-line comments */
- "/*" { BEGIN(comment); }
- <comment>"*/" { BEGIN(INITIAL); }
- <comment>(.|\n) { /* skip comments */ }
-
-
- /* single-line comments */
- "//"[^\n]* { /* skip single-line comments */ }
-
- include[ \t]+\" {
- yyextra->lex_buf_ptr = yyextra->lex_buf;
- yyextra->lex_buf_len = 0;
- BEGIN(include);
- }
- <include>[^\"]+ {
- YYTEXT_TO_BUFFER;
- }
- <include>\" {
- char buffer[1024];
- char *current_file_name;
- char *s = NULL;
- char *b = NULL;
- char *f;
- FILE* fh;
- YARA_CONTEXT* context = yyget_extra(yyscanner);
- if (context->allow_includes)
- {
- *yyextra->lex_buf_ptr = '\0'; // null-terminate included file path
- // move path of current source file into buffer
- current_file_name = yr_get_current_file_name(context);
- if (current_file_name != NULL)
- {
- strncpy(buffer, yr_get_current_file_name(context), sizeof(buffer)-1);
- buffer[sizeof(buffer)-1] = '\0';
- }
- else
- {
- buffer[0] = '\0';
- }
- // make included file path relative to current source file
- s = strrchr(buffer, '/');
- #ifdef WIN32
- b = strrchr(buffer, '\\'); // in Windows both path delimiters are accepted
- #endif
- if (s != NULL || b != NULL)
- {
- f = (b > s)? (b + 1): (s + 1);
- strncpy(f, yyextra->lex_buf, sizeof(buffer) - (f - buffer));
- buffer[sizeof(buffer)-1] = '\0';
- // SECURITY: Potential for directory traversal here.
- fh = fopen(buffer, "r");
- // if include file was not found relative to current source file, try to open it
- // with path as specified by user (maybe user wrote a full path)
- if (fh == NULL)
- {
- // SECURITY: Potential for directory traversal here.
- fh = fopen(yyextra->lex_buf, "r");
- }
- }
- else
- {
- // SECURITY: Potential for directory traversal here.
- fh = fopen(yyextra->lex_buf, "r");
- }
- if (fh != NULL)
- {
- int error_code = ERROR_SUCCESS;
- if ((error_code = yr_push_file_name(context, yyextra->lex_buf)) != ERROR_SUCCESS)
- {
- if (error_code == ERROR_INCLUDES_CIRCULAR_REFERENCE)
- {
- yyerror(yyscanner, "includes circular reference");
- }
- else if (error_code == ERROR_INCLUDE_DEPTH_EXCEEDED)
- {
- yyerror(yyscanner, "includes circular reference");
- }
- yyterminate();
- }
- yr_push_file(context, fh);
- yypush_buffer_state(yy_create_buffer(fh, YY_BUF_SIZE, yyscanner), yyscanner);
- }
- else
- {
- snprintf(buffer, sizeof(buffer), "can't open include file: %s", yyextra->lex_buf);
- yyerror(yyscanner, buffer);
- }
- }
- else // not allowing includes
- {
- yyerror(yyscanner, "includes are disabled");
- yyterminate();
- }
- BEGIN(INITIAL);
- }
- <<EOF>> {
- YARA_CONTEXT* context = yyget_extra(yyscanner);
- FILE* file = yr_pop_file(context);
- if (file != NULL)
- {
- fclose(file);
- }
- yr_pop_file_name(context);
- yypop_buffer_state(yyscanner);
- if (!YY_CURRENT_BUFFER)
- {
- yyterminate();
- }
- }
- /* string identifier ending in a wildcard, such as $a* (current YARA still accepts this form inside "of" expressions) */
- $({letter}|{digit}|_)*"*" {
- yylval->c_string = (char*) yr_strdup(yytext);
- return _STRING_IDENTIFIER_WITH_WILDCARD_;
- }
- /* string identifier */
- $({letter}|{digit}|_)* {
- yylval->c_string = (char*) yr_strdup(yytext);
- return _STRING_IDENTIFIER_;
- }
- /* string count (#name) used in the condition section */
- #({letter}|{digit}|_)* {
- yylval->c_string = (char*) yr_strdup(yytext);
- yylval->c_string[0] = '$'; /* replace # by $*/
- return _STRING_COUNT_;
- }
- /* string offset (@name) used in the condition section */
- @({letter}|{digit}|_)* {
- yylval->c_string = (char*) yr_strdup(yytext);
- yylval->c_string[0] = '$'; /* replace @ by $*/
- return _STRING_OFFSET_;
- }
- /* identifier */
- ({letter}|_)({letter}|{digit}|_)* {
- if (strlen(yytext) > 128)
- {
- yyerror(yyscanner, "indentifier too long");
- }
- yylval->c_string = (char*) yr_strdup(yytext);
- return _IDENTIFIER_;
- }
- {digit}+(MB|KB){0,1} {
- yylval->integer = (size_t) atol(yytext);
- if (strstr(yytext, "KB") != NULL)
- {
- yylval->integer *= 1024;
- }
- else if (strstr(yytext, "MB") != NULL)
- {
- yylval->integer *= 1048576;
- }
- return _NUMBER_;
- }
- 0x{hexdigit}+ {
-
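The number rule near the end of the listing converts an optional KB or MB suffix into a byte count. Extracted into a plain C function, the same logic looks roughly like this; sketch_parse_size is a hypothetical name, and the body follows the action code shown in the listing.

```c
#include <stdlib.h>
#include <string.h>

/* Convert a literal such as "100", "64KB" or "2MB" into a byte count.
   atol stops at the first non-digit, so the suffix is checked separately. */
static size_t sketch_parse_size(const char *text)
{
    size_t value = (size_t) atol(text);

    if (strstr(text, "KB") != NULL)
        value *= 1024;
    else if (strstr(text, "MB") != NULL)
        value *= 1048576;

    return value;
}
```

This is what lets a condition compare filesize against a literal written with a KB or MB suffix instead of a raw byte count.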
