Artigo Revisado por pares

CryptoSAT: a tool for SAT‐based cryptanalysis

2018; Institution of Engineering and Technology; Volume: 12; Issue: 6 Linguagem: Inglês

10.1049/iet-ifs.2017.0176

ISSN

1751-8717

Autores

Frédéric Lafitte,

Tópico(s)

Advanced Malware Detection Techniques

Resumo

IET Information SecurityVolume 12, Issue 6 p. 463-474 Research ArticleFree Access CryptoSAT: a tool for SAT-based cryptanalysis Frédéric Lafitte, Corresponding Author Frédéric Lafitte frederic.lafitte@mil.be orcid.org/0000-0001-9748-3130 Department of Mathematics, Royal Military Academy, BelgiumSearch for more papers by this author Frédéric Lafitte, Corresponding Author Frédéric Lafitte frederic.lafitte@mil.be orcid.org/0000-0001-9748-3130 Department of Mathematics, Royal Military Academy, BelgiumSearch for more papers by this author First published: 01 November 2018 https://doi.org/10.1049/iet-ifs.2017.0176Citations: 5AboutSectionsPDF ToolsRequest permissionExport citationAdd to favoritesTrack citation ShareShare Give accessShare full text accessShare full-text accessPlease review our Terms and Conditions of Use and check box below to share full-text version of article.I have read and accept the Wiley Online Library Terms and Conditions of UseShareable LinkUse the link below to share a full-text version of this article with your friends and colleagues. Learn more.Copy URL Share a linkShare onFacebookTwitterLinkedInRedditWechat Abstract The security of symmetric key primitives comes from their exposure to public scrutiny in the context of competitions such as Advanced Encryption Standard, Secure Hash Algorithm 3, or currently CAESAR. However, due to the increasing number of primitives subjected to these competitions, the quality of the scrutiny relies on the availability of automated tools. Although SAT solvers have already proved useful for the automated analysis of these primitives, there is a lack of practical software tools for this purpose. This study describes a framework that aims to make SAT-based analyses accessible to cryptographers. The framework is implemented in a free open-source tool called CryptoSAT which is available in the public domain. 1 Introduction Throughout this article, symmetric key primitives refer to algorithms with fixed-length inputs such as block ciphers, keystream generators, compression functions, permutations, i.e. security-critical algorithms for which, unlike all the other cryptographic mechanisms, no formal security proofs are available. While operation modes and asymmetric key mechanisms are supported by manual proofs and cryptographic protocols often verified using formal methods, the security of symmetric key primitives comes from the 'test of time', i.e. from their exposure to public scrutiny; if no attack is exposed after years of analysis by the research community, the primitive is considered suitable for deployment. To stimulate public scrutiny, the test of time most often takes the form of a competition where cryptographers submit their primitives and spend about 4 years attempting to break rival submissions. To summarise major competitions: 15 block ciphers were competing in Advanced Encryption Standard (1998–2000), 34 stream ciphers in eStream (2005–2008), 51 hash functions in Secure Hash Algorithm 3 (2008–2012), and 55 authenticated-encryption algorithms in CAESAR (2014–ongoing). Due to the increasing number of primitives, automated tools are needed in order to speed-up their analysis. Formal verification is an active field of research based on mathematical logic. Very basically, a system and its properties are expressed by a set of formulas that can be checked automatically by a machine. Since verification consists of logical inferences from precise specifications, a verification engine (without bugs) provides an irrefutable proof that (the specification of) a system has the specified properties. Formal methods have reached a maturity level suitable for industrial applications. For example, hardware and critical software are often verified in an automated manner. They are also often used for the verification of cryptographic protocols and sometimes to assist manual proofs. However, these methods inherit the limitations of mathematical logic: while some logics are rich enough to express practically any property, they are either undecidable or intractable. Other logics are simple enough to be efficiently verified by a computer but do not allow to express useful properties. For the formal verification of cryptographic primitives, propositional logic strikes a good balance: (i) many properties can be expressed in a precise and rigorous manner as instances of the Boolean satisfiability problem, or instances of SAT for short, (ii) the corresponding verification engines, called SAT solvers, have become incredibly efficient over more than 20 years of optimisation and fine-tuning. Although the verification of a propositional formula requires to search a space that is of exponential size (in the number of propositions), SAT solvers exhibit surprising performances in practice. SAT solvers have already been used on several occasions for cryptanalysis, but their adoption by the symmetric key community seems to be hindered by the lack of convenient software; as mentioned in the seminal work of Massacci et al. [1], obtaining a propositional formula that captures a cryptographic algorithm is feasible but not easy. Although some open-source tools do exist for the SAT-based analysis of symmetric key primitives, these tools still require quite some work from the cryptanalyst before obtaining a reliable encoding of the target algorithm. In particular, these tools require the cryptanalyst to specify the target algorithm using a different language than the algorithm's reference implementation, which is usually written in C or C++. This study describes a framework and its implementation in the software tool CryptoSAT that allows for a cryptanalyst that is not familiar with SAT to easily verify properties of cryptographic implementations written in C or C++. The tool offers a usable and extensible platform in which all the steps of a SAT-based cryptanalysis are integrated, from the generation and manipulation of SAT instances to their solution, while abstracting away from the user all the propositional aspects that constitute its backend. An overview of the software package is given in Fig. 1 (the elements depicted in the figure and their interaction will be specified in the course of the article). Fig. 1Open in figure viewerPowerPoint Overview of the CryptoSAT package The package is written in C++ and R; R is a free open-source collaborative effort widely used for (but not restricted to) data analysis and numerical computing. It is an implementation of S, a statistical programming language that appeared around 1975. More information can be found in [2, 3] and https://www.r-project.org. For the purpose of SAT-based cryptanalysis, R offers a high level scripting language; as suggested by the examples in this study, a cryptanalyst will only need a very basic knowledge of the R language in order to use Crypto SAT. After reviewing existing tools in Section 2.1, the objectives of the framework are stated in Section 2.2. The method used in order to obtain SAT encodings from C or C++ implementations is described in Section 4 based on the formulas from Section 3. This method is wrapped in R code such that a user can exploit it from a higher level of abstraction. The R code is organised around three objects: Target (Section 5.1), Instance (Section 5.2), and Solution (Section 5.3). Section 6 explains how to get started with Crypto SAT and how to extend it with new targets in a verifiable manner. Finally, Section 7 concludes. The remainder of the article assumes some familiarity with propositional logic. For clarity, some notions are informally introduced hereafter. Boolean satisfiability is a decision problem: given a propositional formula , is there a valuation for which is true? In practice, SAT solvers answer the problem by searching for a satisfying valuation v (which makes them valuable for cryptanalysis) and the formulas given to solvers are conjunctive normal forms (CNFs) in the DIMACS format. A CNF is a conjunction () of clauses, a clause is a disjunction () of literals, and a literal is a proposition () or its negation (). The DIMACS format (https://dimacs.rutgers.edu/pub/challenge/satisfiability/doc) is a commonly accepted format for representing inputs (and outputs) of SAT solvers. An input file in DIMACS CNF contains one line per clause, each line ending with '', and in each line, a literal (resp. ) is simply represented by the integer i (resp. ). Finally, a 'problem line' (header) defines the kind of formula, the number of clauses and the number of propositions. Example 1.The exclusive OR of 2 bits () can be formulated in CNF by considering the negation of those valuations for which the XOR is false: . This CNF is written in DIMACS as follows: p cnf 2 2 1 2 0 −1 −2 0 2 Literature review and objectives 2.1 Literature review This section reviews the existing methods for generating cryptographic SAT instances for which an implementation is available in the public domain. Previous works for which an implementation is not available (e.g. [4-6]) use essentially the Tseitin transformations, with the exception of [7]. CryptoSAT implements a variant of [7] which is described in Section 3. From algebraic normal forms (ANFs) [8]. In 2007, a method to convert instances of the multivariate quadratic (MQ) problem over to SAT instances was introduced. The MQ problem consists of a system of ANFs of degree 2. The purpose of reducing MQ to SAT was to measure the efficiency of SAT solvers when confronted with the systems of ANFs (of low degree and sparse in terms of monomials) that arise from algebraic cryptanalyses. The performance of SAT solvers compared favourably against that of algebraic methods implemented in the computer algebra systems Magma and Singular [8]. An implementation of the conversion method was made publicly available (http://www.nicolascourtois.com/software/CourtoisBardJefferson_public_distribution.zip), allowing for a cryptanalyst to quickly obtain DIMACS CNF encodings of the algebraic system at hand. As a consequence, this translation method has been widely used in the symmetric key community as a way to enhance algebraic cryptanalyses. Grain-of-salt [9]. Since 2010, a tool called Grain-of- Salt is available (http://www.msoos.org/grain-of-salt/) exclusively for the analysis of shift-register-based keystream generators. These algorithms have a common structure; they are essentially finite-state machines, determined by a state, a state transition function and an output function. The input of Grain-of-Salt is similar to a configuration file that allows instantiating these components. Uniform reduction to SAT (URSA) [10]. In 2010, the tool URSA was proposed as a generic method to translate different problems into SAT instances, among which cryptographic problems. It requires the specification of the target algorithm in a new language (called URSA) that is specific to the tool. The CNF encodings are obtained from the language's semantic. The tool was used in [11] in order to show that previous differential analyses of some block ciphers (i.e. XTEA and SHACAL-1) were based on incompatible differential trails. Crypt Log Ver [12]. Made available in 2011, the tool CryptLogVer (http://www.ecrypt.eu.org/tools/cryptlogver) greatly simplifies the generation of SAT instances but still requires quite some effort from the user since the latter is required to re-write the target primitive in a hardware description language (HDL) called SystemVerilog, and to use the CryptLogVer software in order to convert the HDL code into a SAT instance. The instance can then be verified and solved independently of the CryptLogVer software. The authors of this tool applied it to the analysis of round-reduced versions of SHA-3 finalists, among which Keccak [12-14]. Transalg [15]. In 2014, the Transalg tool was proposed in order to produce symbolic executions, in the form of propositional formulas, using techniques from compilation theory. The tool requires the specification of the target in an (untyped) 'C-like language' that is specific to the tool. Crypto SMT. In 2014, the Crypto SMT tool (http://www2.compute.dtu.dk/stek/cryptosmt.html) was introduced for the verification of properties based on the constraint solver simple theorem prover (STP, http://stp.github.io/) (which translates the given constraint satisfaction problem into SAT). The tool requires the user to model the target algorithm as an STP program (using the SMT or the CVC language) and has been used for the analysis of differential properties of the block cipher Simon [16]. In recent years, STP has become a common tool in cryptanalysis (e.g. [17-22]) and is often used in order to verify the differential properties of ARX primitives based on [17]. The limitations of the aforementioned tools are explained in the next section. To overcome them, Crypto SAT improves upon previous works [7] in which operator overloading (a feature of the C++ language) was proposed in order to obtain a symbolic execution of the target primitive. The instance generation method from [7] requires keeping a large formula in memory and translating it to CNF in a separate step, which results in high memory requirements and lower performance. Sections 3 and 4 describe a modification of the method from [7] that allows efficiently generating formulas directly in the DIMACS CNF format. Once a primitive is formulated in CNF, it can be very time-consuming to manipulate the formula in order to encode a search problem of cryptographic interest. Crypto SAT offers tools in order to facilitate this step, although the properties that can be easily verified are somewhat limited in the current release. 2.2 Objectives The evaluation of cryptographic algorithms and their implementations is an intricate and time-consuming endeavour. Under time constraints, it is unlikely that an evaluator will take the time to become familiar with novel software tools or methodologies that are available in the public domain for the analysis of algorithms. Therefore, an important objective is to make the SAT framework immediately accessible for the impatient cryptanalyst. Ideally, an evaluator should be able to quickly obtain answers to questions about the design and implementation of the algorithm without having to become familiar with propositional reasoning or a new language to specify the target. A related objective is, therefore, to make the framework as user friendly as possible: integrate all steps involved in an SAT-based analysis: (i) generate the formula, (ii) encode properties (iii) solve formula (iv) manipulating solver's output. This is not the case with the existing tools. The tools mentioned in the previous section require an evaluator to translate the target implementation into a different language, a procedure that has two important drawbacks: Rewriting the code is time consuming and might lead to implementation errors that will further delay the analysis. The evaluator will end up verifying the security of a specification that does not necessarily model all the flaws that might be present in the implementation. The second point is of critical importance. When evaluating a cryptographic product, an important objective is to rule out potential backdoors. Although backdoors might be present in the design of an algorithm (e.g. see the third question of the interview at https://www.iso.org/iso/sc_27_interviewwalterfumy.pdf), such backdoors are likely to be detected during the algorithm's exposure to public scrutiny (test of time), whereas a backdoor in the implementation would only be detectable during the product's evaluation by a handful of evaluators under time constraints. An example of potential flaw due to the implementation can be found in [23]. Therefore, a third objective consists of making the verification as close to the implementation as possible. Finally, in order to be useful for practicians, the tool must guarantee that the SAT encodings are reliable. That is, the tool must implement a mechanism that allows automatically verifying the correctness of the SAT encodings. Let us summarise the objectives introduced in the previous paragraphs as follows: Transparency: the user is not required to be familiar with the verification methods. Language: the target is specified in a well-known language. Integration: the user is not required to switch from one platform to another. This allows the user to write programs that automate the entirety of the analysis. Flexibility: the tool's ability to verify arbitrary (propositional) properties. Verifiability: it must be easy to verify that a SAT encoding reliably represents the target algorithm. Crypto SAT, which is in the public domain since 2013, was developed further with an effort to meet all the objectives listed above. An important advantage is that the evaluation target consists of a slight modification of its reference C/C++ implementation. The package includes unit testing based on test vectors in order to guarantee its reliability and facilitate its extension. 3 Generating SAT instances This section discusses how to translate the properties of algorithms into CNF so that SAT solvers can be used for their verification. An algorithm can always be decomposed into a sequence of basic instructions for some N. Therefore, its propositional encoding, which we denote , can be simply obtained using the propositional encoding of these instructions as follows: (1)Furthermore, if the formulas are written in CNF, then (1) is a CNF. So, finding a CNF for all possible basic instructions is sufficient in order to obtain a CNF encoding of an algorithm. This section will provide the CNF corresponding to a variety of simple instructions and properties P, allowing us to represent a wide range of algorithms and properties suitable for SAT-solver verification. To optimise their performance, primitives are often designed using word-lengths that match the processors. Typical word-lengths are . So the simple instructions are most often operators on words . Common operators found in symmetric key primitives are the bitwise logical operators NOT (), XOR (), AND (), OR (), as well as addition modulo () and left rotation of amplitude s (), with s a constant integer. S-boxes are also a frequent component of primitives. So, the instructions considered in the remainder are of the following forms: Once the formula is obtained, any property P of E, for which a CNF is known, can be verified by calling a SAT solver on an instance . The properties P considered in the remainder are the following: , , , , where is a list of -bit words. These formulas are easy to adapt when considering a subset of bits, allowing to express properties such as 'bits 1 and 3 of A are equal to (a given function of) bits 2 and 3 of B'. 3.1 Preliminaries The formulas in this section are numbered only if they are in the CNF. In these equations, a symbol , with i an integer, means that the equality is justified by equation (i). A symbol (resp. , ) means that equality is by truth tables (resp. by notation, by definition). Finally, equality without annotation is either trivial or justified in the text. A word is represented with an uppercase letter and its bits by the corresponding lowercase letter, e.g. the bits of a word are written . A set of k -bit words is written and its complement is written . For a formula, is assumed to have conjunctive terms written (2), with each term being a disjunction of literals written (3) (2) (3)Any relation on bits is a set of n -tuples that can be seen as n -bit words. Let denote these words. The complementary relation is written as and its words . Let be a propositional formula that is true if and only if . Let us write instead of . For example, is true if and only if . The conditions under which can be naturally expressed, providing the CNF of the formula (4)i.e. the CNF is a conjunction of conditions for not having an unsatisfying word. Notice that (resp. ) signifies the literal (resp. x). So, any relation on bits can be put in CNF using (4) which allows to model arbitrary properties. We will now focus on functional relations in order to model algorithms. A function defines a relation between its input and output bits. The relation has words written . Let (resp. ) be the first n bits (resp. last m bits) of . The relation is functional, i.e. for an there is a single such that . That is, whenever we necessarily have in order to satisfy . Then, the CNF of can be obtained in (5) instead of (4). (5)By using (5), the CNF of any function from n to m bits can always be expressed using clauses of length each. However, these CNFs are, in general, impractical: an operation on -bit words defines a functional relation among bits that would be expressed using clauses. Table 1 shows the number of clauses for different word lengths. Notice that in practice, S-boxes are most often of small size, rarely exceeding . Table 1. Number of clauses using (5) 8 16 32 64 Therefore, (5) will only be used for (small) S-boxes and the instructions of the form will require a different CNF encoding. Before defining these encodings in Section 3.4, the relevant identities are given in Sections 3.2 and 3.3. 3.2 Identities on bits For , the De Morgan rules (6) and (7), as well as the distributivity rules (8) and (9), are obvious from the truth tables (6) (7) (8) (9)The equality and inequality of a and b are written in CNF as follows: (10) (11)Another useful equality is () for any formula . Before giving the CNF of , we need to introduce that of (with one of the disjunctions of ) (12) (13)i.e. in order for the formula's size to be 'small', each term is replaced by a new proposition which is defined to be equivalent by the formula . 3.3 Identities of words For words , equality and inequality are written in the CNF as follows: (14) (15)The set membership can be expressed in a straightforward manner as follows: (16) (17)Table 2 summarises formulas (10)–(17), and their number of clauses. Table 2. CNF of basic relations Formula CNF of Number of clauses Equation 2 (10)* 2 (11)* (12)* (13)* (14)* (15)* (16)* (17)* 3.4 Operators By using the formulas from Table 2, the basic instructions found in cryptographic primitives can be written in CNF as shown in Table 3. Table 3. CNF of operators (18) (19) (20) (21) (22) (23) (24) (25) (26) In the last formula of Table 3, the propositions correspond to the carry bit. For , the formula expresses the value of (i.e. ) while the formula expresses the value of the carry bit (i.e. ). 4 C++ implementation Since the version C++11 of the ISO standard for C++, the header ‹cstdint› (stdint.h) defines uint8_t, uint16_t, uint32_t, uint64_t, which are unsigned integer types with bit-length exactly 8, 16, 32, 64, respectively. We will therefore only consider these types. Crypto SAT uses a template class U‹X› where the template parameter X is to be replaced by one of uint8_t, uint16_t, uint32_t, or uint64_t. This allows representing each word in a primitive's implementation (written in C or C++) as instances of this class. In the template class U‹X›, each operator shown in Table 4 is overloaded in order to output the formula from Tables 2 and 3 that corresponds to it. So, running a sequence of operations on instances of U‹X› will produce a symbolic execution of the algorithm directly in the CNF (using the DIMACS format). Table 4. C++ operators that are overloaded in CryptoSAT Description In text In C++ bitwise negation bitwise exclusive or bitwise logical and & bitwise logical or | addition modulo + right shift >> left shift << assignment = The rest of this section provides details on how the operators are overloaded in the C++ class U‹X›. This class gives rise to the classes U8, U16, U32, and U64 defined as follows: A class U with represents an -bit word and essentially consists of A value unsigned and of length bits, whose type is one of uint8_t, uint16_t, uint32_t, or uint64_t, depending on . A vector bitID of identifiers (integers) for the bits in the value, which correspond to the indices of the propositions, with the convention that the integers 0 and 1 are dummy identifiers reserved for the Boolean constants, i.e. and . (Hence the smallest identifier for a proposition is 2.) The only purpose of the vector bitID is to track the propositions' identifier and to make them available to the overloaded operators for the generation of clauses. 4.1 Constructors A class U‹X› can be instantiated in four manners: Using the (default) constructor: U‹X› (X value=0, bool isvar=0). So, by default, the object is a zero constant, i.e. none of its bits will appear as propositions in the CNF. For a constant, bitID will contain entries 0 or 1 according to the binary encoding of the value. For a variable, the content of bitID will consist of new (fresh) identifiers. These are obtained by incrementing a counter globalindex, which is common to all the U, . The copy constructor, U‹X› (U‹X› other), is not explicitly defined, but since operator= is, the instruction U‹X› a = b; is technically the same as U‹X› a(b); (the operator= copies all fields from the argument b to the object a and then returns the object for cascading assignments). The constructor U‹X› (U‹X› other) allows to build a U‹X› from an object U‹X› with . When Y is a larger (resp. smaller) word, the value of type X is initialised by extending with leading zeros (resp. truncating) the value of type Y. The constructors U‹X› (U‹X›, U‹X›) and U‹X› (U‹X›, U‹Z›) allow composing small Us into bigger ones. Details on these constructors can be found in Appendix 1. 4.2 Operators To illustrate how the overloaded code will make use of the vector bitID in order to generate clauses in DIMACS CNF format, let us consider the case of a bitwise OR: with int funct_or(int,int) defined so as to output (18) in DIMACS format: Example 2 illustrates the difference between instances of U‹X› defined as constants and those defined as variables depending on the second constructor argument isVar. Example 2.In the program below c is declared as the bitwise OR of a and b. When a and b are declared as constants (left), the bits of c are identified by the integers 2–9, corresponding to the propositions . The unit clauses define the value of the variable c. When a and b are declared as variables, the program will generate the clauses () from (18) 4.3 S-boxes Let us consider a reference implementation that uses an S-box (left) and the modification needed to produce clauses (right): The function S() uses the array S_LUT to define the returned U8, and uses the function S_CNF() to produce the clauses: Finally, the function S_CNF() produces the clauses according to (5) (an example can be found in Appendix 2). For any S-box S, Crypto SAT will automatically generate S_LUT and the functions S_CNF() and S() as discussed at the end of Section 5.1. 4.4 Additional outputs The previous paragraphs showed how the most common operators found in symmetric key cryptography can be 'clausified' and how operator overloading allows obtaining the CNF encoding of a primitive written in C or C++. Another purpose of the class U‹X› is to output words of interest with their bitID and value so that Crypto SAT can collect these and make sense of the propositions in the generated formula. This is done by the method U‹X›::print(ofstream&, ofstream&,…) using two output streams, one for bitID, the other for value, as shown in Example 3. Example 3.Consider a program that starts with the instruction U8 X(0xbe,1); . Since this is the first (variable) U defined, the bitID of X will contain the integers 2– 9. The instruction X.print(vars,vals,'X') will output X: 2 3 4 5 6 7 8 9 on vars and X: 0xbe on vals. The method X.print(vars,vals,'X') takes an additional argument, iteration, which is useful when considering the iterative values taken by a word (Example 4). Example 4.: The instructions for (i = 0; i < rounds; ++i) {X = F(X); X.print(vars,vals, 'X',i);} will output X[0]: 2 3 4 5 6 7 8 9\nX[1]:… on vars and X[0]: 0xbe\nX[1]: … on vals. 5 Software objects in CryptoSAT Once a SAT instance is generated, its manipulation can be quite cumbersome. Just in order to check if the instance reliably represents the target, for example by valuating the instance using the target's test vectors, the user would have to map the propositional variables to the input–output bits of the program and define their value using unit clauses. An important goal of Crypto SAT is to abstract away the correspondence between bits and propositional variables so that the user can focus on the properties of the C++ program. This section describes the objects Target, Instance, and Solution which form the core of the Crypto SAT software. Overview (see Fig. 1). A Target's role is to produce the target's C++ code according to user defined parameters. The method generateInstance() will compile and execute that code in order to collect the three outputs mentioned in Section 4 (i.e. the clauses, the propositions and their values) which define an Instance. An Instance allows the user to encode properties by manipulating the clauses from a high level of abstraction. The method solveWith(solver) will execute the SAT-solver given as argument and collect the solver's output in order to define an object Solution. A Solution contains the solver's output. Its ro

Referência(s)
Altmetric
PlumX