The promise of artificial intelligence in chemical engineering: Is it here, finally?
2018; Wiley; Volume: 65; Issue: 2; Language: English
10.1002/aic.16489
ISSN 1547-5905
The current excitement about artificial intelligence (AI), particularly machine learning (ML), is palpable and contagious. The expectation that AI is poised to "revolutionize," perhaps even take over, humanity has elicited prophetic visions and concerns from some luminaries.1-4 There is also a great deal of interest in the commercial potential of AI, which is attracting significant sums of venture capital and state-sponsored investment globally, particularly in China.5 McKinsey, for instance, predicts the potential commercial impact of AI in several domains, envisioning markets worth trillions of dollars.6 All this is driven by the sudden, explosive, and surprising advances AI has made in the last 10 years or so. AlphaGo, autonomous cars, Alexa, Watson, and other such systems, in game playing, robotics, computer vision, speech recognition, and natural language processing, are indeed stunning advances. But, as with earlier AI breakthroughs, such as expert systems in the 1980s and neural networks in the 1990s, there is also considerable hype and a tendency to overestimate the promise of these advances, as the market research firm Gartner and others have noted about emerging technology.7

It is quite understandable that many chemical engineers are excited about the potential applications of AI, and ML in particular,8 for use in such applications as catalyst design.9-11 It might seem that this prospect offers a novel approach to challenging, long-standing problems in chemical engineering using AI. However, the use of AI in chemical engineering is not new—it is, in fact, a 35-year-old ongoing program with some remarkable successes along the way.

This article is aimed broadly at chemical engineers who are interested in the prospects for AI in our domain, as well as at researchers new to this area. The objectives of this article are threefold. First, to review the progress we have made so far, highlighting past efforts that contain valuable lessons for the future. Second, drawing on these lessons, to identify promising current and future opportunities for AI in chemical engineering. To avoid getting caught up in the current excitement and to assess the prospects more carefully, it is important to take such a longer and broader view, as a "reality check." Third, since AI is going to play an increasingly dominant role in chemical engineering research and education, it is important to recount and record, however incompletely, certain early milestones for historical purposes.

It is apparent that chemical engineering is at an important crossroads. Our discipline is undergoing an unprecedented transition—one that presents significant challenges and opportunities in modeling and automated decision-making. This has been driven by the convergence of cheap and powerful computing and communications platforms, tremendous progress in molecular engineering, the ever-increasing automation of globally integrated operations, tightening environmental constraints, and business demands for speedier delivery of goods and services to market. One important outcome of this convergence is the generation, use, and management of massive amounts of diverse data, information, and knowledge, and this is where AI, particularly ML, would play an important role.

AI is not a single technique but a collection of branches. Some of these are application-focused, such as game playing and vision. Others are methodological, such as expert systems and ML—the two branches that are most directly and immediately applicable to our domain, and hence the focus of this article.
These are the branches that have been investigated the most in the last 35 years by AI researchers in chemical engineering. While the current "buzz" is mostly around ML, the expert system framework holds important symbolic knowledge representation concepts and inference techniques that could prove useful in the years ahead as we strive to develop more comprehensive solutions that go beyond the purely data-centric emphasis of ML.

Many tasks in these different branches of AI share certain common features. They all require pattern recognition, reasoning, and decision-making under complex conditions. And they often deal with ill-defined problems, noisy data, model uncertainties, combinatorially large search spaces, nonlinearities, and the need for speedy solutions. But such features are also found in many problems in process systems engineering (PSE)—in synthesis, design, control, scheduling, optimization, and risk management. So, some of us thought, in the early 1980s, that we should examine such problems from an AI perspective.15-17

The excitement about AI at that time, then centered on expert systems, was just as palpable and contagious as it is today, with high expectations for AI's near-term potential.18-20 Hundreds of millions of dollars were invested in AI start-ups as well as within large companies. AI spurred the development of special-purpose hardware, called Lisp machines (e.g., Symbolics Lisp machines). Promising proof-of-concept systems were demonstrated in many domains, including chemical engineering (see below). In this phase, it was expected that AI would have a significant impact in chemical engineering in the near future. However, unlike optimization and model predictive control, AI did not quite live up to its early promise. So, what happened? Why did AI not have more impact? Before addressing this question, it is necessary to examine the different phases of AI, as I classify them, in chemical engineering.

While major efforts to develop AI methods for chemical engineering problems started in the early 1980s, it is remarkable that some researchers (for instance, Gary Powers, Dale Rudd, and Jeff Siirola) were investigating AI in PSE in the late 1960s and early 1970s.21 In particular, the Adaptive Initial DEsign Synthesizer (AIDES) system, developed by Siirola and Rudd22 for process synthesis, represents a significant development. This was arguably the first system that employed AI methods such as means-ends analysis, symbolic manipulation, and linked data structures in chemical engineering.

Phase I, the Expert Systems Era (from the early 1980s through the mid-1990s), saw the first broad effort to exploit AI in chemical engineering. Expert systems, also called knowledge-based systems, rule-based systems, or production systems, are computer programs that mimic the problem-solving of humans with expertise in a given domain.23, 24 Expert problem-solving typically involves large amounts of specialized knowledge, called domain knowledge, often in the form of rules of thumb, called heuristics, typically learned and refined over years of problem-solving experience. The amount of knowledge manipulated is often vast, and the expert system rapidly narrows down the search by recognizing patterns and by using the appropriate heuristics. The architecture of these systems was inspired by the stimulus–response model of cognition from psychology and the pattern-matching-and-search model of symbolic computation, which originated in Emil Post's work in symbolic logic.
Building on this work, Simon and Newell in the late 1960s and 1970s devised the production system framework, an important conceptual, representational, and architectural breakthrough for developing expert systems.25-27 The crucial insight here was that one needs to, and one can, separate domain knowledge from its order of execution, that is, from search or inference, thereby achieving the necessary computational flexibility to address ill-structured problems. In contrast, conventional programs consist of a set of statements whose order of execution is predetermined. Therefore, if the execution order is not known or cannot be anticipated a priori, as in the case of medical diagnosis, for example, this approach will not work. Expert systems programming alleviated this problem by making a clear distinction between the knowledge base and the search or inference strategy. This not only allowed for flexible execution but also facilitated the incremental addition of knowledge without distorting the overall program structure. This rule-based knowledge representation and architecture are intuitive, relatively easy to understand, and make it straightforward to generate explanations of the system's decisions.

This new approach facilitated the development of a number of impressive expert systems, starting with MYCIN, an expert system for diagnosing infectious diseases28 developed at Stanford University during 1972–82. This led to other successful systems of this era, such as PROSPECTOR (for mineral prospecting29) and R1 (for configuring VAX computers30), among others. These systems inspired the first expert system application in chemical engineering, CONPHYDE, developed in 1983 by Bañares-Alcántara, Westerberg, and Rychner at Carnegie Mellon16 for predicting thermophysical properties of complex fluid mixtures. CONPHYDE was implemented using the Knowledge Acquisition System (KAS) that had been used for PROSPECTOR. This was quickly followed by DECADE, in 1985, again from the same CMU researchers,17 for catalyst design.

There was other remarkable early work in process synthesis, design, modeling, and diagnosis as well. In synthesis and design, for instance, important conceptual advances were made by Stephanopoulos and his students, starting with Design-Kit,31 and, in modeling, MODEL.LA, a language for developing process models.32 In process fault diagnosis, Davis33 and Kramer,34, 35 and their groups, made important contributions in the same period. My group developed causal model-based diagnostic expert systems,36 a departure from the heuristics-based approach, which was the dominant theme of the time. We also demonstrated the potential of learning expert systems, an unusual idea at that time, as automated learning in expert systems was not in vogue.37 The need for causal models in AI, a topic that has emerged as very important now,38 was also recognized in those early years.39 This period also saw expert system work commencing in Europe,40 particularly for conceptual design support.

An important large-scale program in this era was the Abnormal Situation Management (ASM) consortium, funded at $17 million by the National Institute of Standards and Technology's Advanced Technology Program and by the leading oil companies, under the leadership of Honeywell.41 Three academic groups, led by Davis (Ohio State), Vicente (University of Toronto), and myself at Purdue, were also involved in the consortium. This program was the forerunner of the current Clean Energy Smart Manufacturing Innovation Institute, which was funded in 2016.42
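To make the separation between the knowledge base and the inference engine concrete, here is a minimal sketch of a forward-chaining production system in Python. The rules and facts are hypothetical illustrations loosely styled after process fault diagnosis; they are not taken from MYCIN, CONPHYDE, or any other system mentioned above.

```python
# Minimal sketch of a production (rule-based) system: the knowledge base
# (rules) is kept separate from the inference engine (forward chaining),
# so rules can be added or removed without changing the control logic.
# The rules below are hypothetical examples for illustration only.

# Knowledge base: each rule is (set of premises, conclusion).
RULES = [
    ({"reactor_temp_high", "coolant_flow_low"}, "cooling_system_fault"),
    ({"cooling_system_fault"}, "shutdown_recommended"),
    ({"feed_rate_high", "reactor_temp_high"}, "runaway_risk"),
]

def forward_chain(facts, rules):
    """Fire every rule whose premises are satisfied, repeatedly,
    until no new conclusions can be derived (a fixed point)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

if __name__ == "__main__":
    observations = {"reactor_temp_high", "coolant_flow_low"}
    print(forward_chain(observations, RULES))
    # Derives 'cooling_system_fault' and then 'shutdown_recommended'.
```

The inference engine above knows nothing about cooling systems; all of the domain knowledge lives in the rule list, which is what made incremental knowledge addition and explanation generation natural in this style of programming.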
The first course on AI in PSE was developed and taught at Columbia University in 1986, and it was subsequently offered at Purdue University for many years. The earlier offerings had an expert systems emphasis but, as ML advanced, the course evolved in later years to include topics such as clustering, neural networks, statistical classifiers, graph-based models, and genetic algorithms. In 1986, Stephanopoulos published an article43 titled "Artificial Intelligence in Process Engineering," in which he discussed the potential of AI in process engineering and outlined a research program to realize it. Coincidentally, in the same issue, I had an article with the same title, which described the Columbia course.44 In my article, I discussed topics from the course, and it mirrored what Stephanopoulos had outlined as the research program. (Curiously, we did not know each other at that time and had written our articles independently, yet with the same title, at the same time, with almost the same content, and had submitted them to the same journal for the same issue!)

The first AIChE session on AI was organized by Gary Powers (CMU) at the annual meeting held in Chicago in 1985. The first national meeting on AI in process engineering was held in 1987 at Columbia University, co-organized by Venkatasubramanian, Stephanopoulos, and Davis, and sponsored by the National Science Foundation, the American Association for Artificial Intelligence, and Air Products. The first international conference, Intelligent Systems in Process Engineering (ISPE’95), sponsored by the Computer Aids for Chemical Engineering (CACHE) Corporation and co-organized by Stephanopoulos, Davis, and Venkatasubramanian, was held at Snowmass, CO, in July 1995. The CACHE Corporation had also organized an Expert Systems Task Force in 1985, under the leadership of Stephanopoulos, to develop tools for the instruction of AI in chemical engineering.45 The task force published a series of monographs on AI in process engineering during 1989–1993.

Despite impressive successes, the expert system approach did not quite take off, as it suffered from serious drawbacks. It took a lot of effort, time, and money to develop a credible expert system for industrial applications. Furthermore, it was also difficult and expensive to maintain and update the knowledge base as new information came in or the target application changed, such as in the retrofitting of a chemical plant. This approach did not scale well for practical applications (more on this in the sections "Lack of impact of AI during Phases I and II" and "Are things different now for AI to have impact?").

As the excitement about expert systems waned in the 1990s due to these practical difficulties, interest in another AI technique was picking up greatly. This was the beginning of Phase II, the Neural Networks Era, roughly from 1990 onward. It marked a crucial shift from the top-down design paradigm of expert systems to the bottom-up paradigm of neural nets, which acquired knowledge automatically from large amounts of data, thus easing the development and maintenance of models. It all started with the reinvention of the backpropagation algorithm by Rumelhart, Hinton, and Williams in 1986 for training feedforward neural networks to learn hidden patterns in input–output data. The algorithm had been proposed earlier, in 1974, by Paul Werbos as part of his Ph.D. thesis at Harvard. Backpropagation is essentially an algorithm for implementing gradient-descent search: it uses the chain rule of calculus to propagate errors back through the network and iteratively adjust the strengths (i.e., weights) of the connections between nodes so that the network learns the patterns.
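As a concrete illustration of this procedure, here is a minimal NumPy sketch of a one-hidden-layer feedforward network trained by backpropagation on a toy nonlinear classification problem (XOR). The network size, learning rate, and number of iterations are arbitrary illustrative choices, not taken from any of the cited works.

```python
# Minimal backpropagation sketch: a one-hidden-layer feedforward network
# trained by gradient descent on the XOR problem. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

# Toy nonlinear classification data: XOR.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Weights and biases: input -> hidden (2 x 4) and hidden -> output (4 x 1).
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0  # learning rate
for _ in range(5000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)        # hidden-layer activations
    out = sigmoid(h @ W2 + b2)      # network outputs

    # Backward pass: propagate the error with the chain rule.
    d_out = (out - y) * out * (1 - out)     # error at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)      # error propagated to the hidden layer

    # Gradient-descent updates of the connection weights and biases.
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(np.round(out, 2))  # should approach [[0], [1], [1], [0]]
```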
While the idea of neural networks had been around since 1943, from the work of McCulloch and Pitts, and was further developed by Rosenblatt, Minsky, and Papert in the 1960s, these earlier models were limited in scope as they could not handle problems with nonlinearity. The key breakthrough this time was the ability to solve nonlinear function approximation and nonlinear classification problems in an automated manner using the backpropagation learning algorithm. The typical structure of a feedforward neural network from this era is shown in Figure 1, with its input, hidden, and output layers of neurons, and their associated signals, weights, and biases. The figure also shows examples of nonlinear function approximation and nonlinear classification problems that such networks were able to solve, provided enough data were available.46

[Figure 1. (a) Architecture of a feedforward neural network. (b) Examples of nonlinear function approximation and classification problems. Adapted from: https://medium.com/@curiousily/tensorflow-for-hackers-part-iv-neural-network-from-scratch-1a4f504dfa8, https://neustan.wordpress.com/2015/09/05/neural-networks-vs-svm-where-when-and-above-all-why/, http://mccormickml.com/2015/08/26/rbfn-tutorial-part-ii-function-approximation/]

This novel automated nonlinear modeling ability spurred a tremendous amount of work in a variety of domains, including chemical engineering.47 Researchers made substantial progress on addressing challenging problems in modeling,48, 49 fault diagnosis,50-55 control,56, 57 and product design.58 In particular, the recognition of the connection between the autoencoder architecture and nonlinear principal component analysis by Kramer,48 and the recognition of the nature of the basis-function approximation of neural networks through the Wave-Net architecture by Bakshi and Stephanopoulos,49 are outstanding contributions. There were hundreds of articles in our domain during this phase; only some of the earliest and key articles are highlighted here.

While this phase was largely driven by neural networks, researchers also made progress on expert systems (such as the ASM consortium) and genetic algorithms at that time. For instance, we proposed59 directed evolution of engineering polymers in silico using genetic algorithms. This led in subsequent years60 to the multiscale model-based informatics framework called Discovery Informatics61 for materials design. The Discovery Informatics framework led to the successful development of materials design systems using directed evolution in several industrial applications, such as gasoline additives,62 formulated rubbers,63 and catalyst design.64 During this period, researchers were also beginning to realize the challenges and opportunities in multiscale modeling using informatics techniques.65, 66 Other important advances not based on neural networks included research into frameworks and architectures for building AI systems, such as blackboard architectures, integrated problem-solving-and-learning systems, and cognitive architectures.
Architectures such as Prodigy and Soar are examples of this work.67 Similarly, there was progress in process synthesis and design,68 domain-specific representations and languages,32, 69 domain-specific compilers,70 ontologies,71, 72 modeling environments,32, 73 molecular structure search engines,74 automatic reaction network generators,64 and chemical entity extraction systems.74 These references by no means constitute a comprehensive list. All this work, and other work along similar lines, performed some two decades ago is still relevant and useful today in the modern era of data science. Building such systems using modern tools presents major opportunities.

Despite the surprising success of neural networks in many practical applications, some especially challenging problems in vision, natural language processing, and speech understanding remained beyond the capabilities of the neural nets of this era. Researchers suspected that one would need neural nets with many more hidden layers, not just one, but training these turned out to be nearly impossible. So the field was more or less stuck for about a decade, until a breakthrough arrived for training deep neural nets, thus launching the current phase, which we discuss in the section "Phases of AI in Chemical Engineering: Current."

In spite of all this effort over two decades, AI was not as transformative in chemical engineering as we had anticipated. In hindsight, it is clear why this was the case. First, the problems we were attacking are extremely challenging even today. Second, we lacked the powerful computing, storage, communication, and programming environments required to address such challenging problems. Third, we were greatly limited by data. And, finally, whatever resources were available were very expensive.

There were three kinds of challenges in Phases I and II—conceptual, implementational, and organizational. While we made good progress on conceptual issues such as knowledge representation and inference strategies for attacking problems in synthesis, design, diagnosis, and safety, we could not overcome the implementation challenges and organizational difficulties involved in practical applications. In short, there was no "technology push." Further, as it turned out, there was no "market pull" either, in the sense that the low-hanging fruit in process engineering, in that period, could be picked more readily by optimization and by model-predictive control (MPC) technologies. Generally speaking, as algorithms and hardware improved over the years, these traditional approaches scaled well on problems for which we could build and solve first-principles-based models. In contrast, problems for which such models are difficult to build (e.g., diagnosis, safety analysis, and materials design), or almost impossible to generate (e.g., speech recognition), required AI-based approaches, which in turn demanded enormous computational power and voluminous data, neither of which was available during this period.

This lack of practical success led to two "AI winters," one at the end of the Expert Systems era and the other at the end of the Neural Networks era, when funding for AI research greatly diminished both in computer science and in the application domains. This slowed progress even more. In addition, it typically seems to take about 50 years for a technology to mature, penetrate, and have widespread impact, from discovery to adoption.
For instance, it took about 50 years for simulation technology such as Aspen Plus to achieve about 90% market penetration, counting from the time computer simulation of chemical plants was first proposed in the 1950s.75 A similar discovery-growth-and-penetration cycle occurred in optimization as well, for mixed-integer linear programming (MILP) and mixed-integer nonlinear programming (MINLP) technologies, and for MPC. In retrospect, during Phases I and II, AI as a tool was only about 10–15 years old. It was too early to expect widespread impact. This analysis suggests that one could expect wider impact around 2030–35. While predicting technology penetration and impact is hardly an exact science, this estimate nevertheless seems reasonable given the current state of AI. As it turns out, those of us who started working on AI in the early 1980s were much too early as far as impact is concerned, but it was intellectually challenging and exciting to attack these problems. Many of the intellectual challenges, such as developing hybrid AI methods and causal model-based AI systems, are still around, as I shall discuss.

The progress of AI over the last decade or so has been very exciting, and the resource limitations mentioned above are largely gone now. Implementational difficulties have been greatly diminished. Organizational and psychological barriers are also lower now, as people have started to trust and accept recommendations from AI-assisted systems, such as Google, Alexa, and Yelp, more readily and for a variety of tasks. Companies are beginning to embrace organizational and work-flow changes to accommodate AI-assisted work processes.

It is both interesting and instructive to make the following comparison. In 1985, arguably the most powerful computer was the CRAY-2 supercomputer. Its computational speed was 1.9 gigaflops and it consumed 150 kW of power. The $16 million machine (about $32 million in today's dollars) was huge and required a large, customized, air-conditioned environment to house it. So, what would the CRAY-2 look like now? Well, it would look like the Apple Watch (Series 1). In fact, the Apple Watch is more powerful than the CRAY-2 was. The Apple Watch performs at 3 gigaflops while consuming just 1 W of power—and it costs $300! That is roughly a 150,000-fold gain in performance per unit cost, just on the hardware side. There have been equally dramatic advances in software—in the performance of algorithms and in high-level programming environments such as MATLAB, Mathematica, Python, Hadoop, Julia, and TensorFlow. Gone are the days when we had to program in Lisp for weeks to achieve what can now be accomplished in a few minutes with a few lines of code. We have also seen great progress in wireless communication technologies. The other critical development is the availability of tremendous amounts of data, "big data," in many domains, which made possible the stunning advances in ML (more on this below). All this is game-changing.

What accounts for this progress? Basically, Moore's law continued to deliver without fail for the last 30 years, far outlasting its expected lifespan, making these stunning advances possible. As a result, the "technology push" is here. The "market pull," I believe, is also here because much of the efficiency gains that could be accomplished using optimization and MPC technologies have largely been realized. Hence, for further gains, for further automation, one must go up the value chain, and that means going after challenging decision-making problems that require AI-assisted solutions. So, now we have a "technology push-market pull" convergence.
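As a quick back-of-the-envelope check, the roughly 150,000-fold performance-per-unit-cost figure quoted above is consistent with the numbers given, assuming the comparison uses the CRAY-2's inflation-adjusted price of about $32 million:

\[
\frac{3\ \text{GFLOPS} / \$300}{1.9\ \text{GFLOPS} / \$3.2\times 10^{7}}
\;\approx\; \frac{1.0\times 10^{-2}}{5.9\times 10^{-8}}
\;\approx\; 1.7\times 10^{5},
\]

that is, on the order of 150,000, before even accounting for the CRAY-2's 150 kW power draw and facility costs.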
Looking back some 30 years from now, history will recognize three early milestones in AI: Deep Blue defeating Garry Kasparov in chess in 1997, Watson becoming Jeopardy champion in 2011, and the surprising win by AlphaGo in 2016. The AI advances that made these amazing feats possible are now poised to have an impact that goes far beyond game playing.

In my view, we entered Phase III, the era of Data Science or Predictive Analytics, around 2005. This new phase was made possible by three important ideas: deep or convolutional neural nets (CNNs), reinforcement learning, and statistical ML. These are the technologies behind the recent AI success stories in game playing, natural language processing, robotics, and vision.

Unlike neural nets of the 1990s, which typically had only one hidden layer of neurons, deep neural nets have multiple hidden layers, as shown in Figure 2. Such an architecture has the potential to extract features hierarchically for complex pattern recognition. However, such deep networks were nearly impossible to train using backpropagation or any other gradient-descent algorithm. The breakthrough came in 2006,76, 77 by using a layer-by-layer training strategy coupled with a considerable increase in processing speed, in the form of graphics processing units. In addition, a procedure called convolution, used in the training of the neural net,78 made such feature extraction feasible. Convolution is a filtering technique, well known in the domain of signal processing, for extracting features from a noisy signal. After the initial specification of the network architecture and the filter parameters, such as the size and number of filters, a CNN learns during training, from a very large data set—and this is a crucial requirement—the appropriate filters that lead to a successful performance by the network.79-81

[Figure 2. Typical convolutional neural network (CNN) architecture. Adapted from https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/]

Another architectural innovation was the recurrent neural network.82 A feedforward neural network has no notion of temporal order; the only input it considers is the current example it has been shown. This is not appropriate for problems with sequential information, such as time series data, where what comes next typically depends on what has gone before. For instance, to predict the next word in a sentence one needs to know which words came before it. Recurrent networks address such problems by taking as their input not just the current example but also what they have seen previously. Since the output depends on what has occurred before, the network behaves as if it has "memory." This "memory" property was further enhanced by another architectural innovation, the long short-term memory (LSTM) unit. The typical LSTM unit consists of a cell, an input gate, an output gate, and a forget gate. The cell remembers values over arbitrary time intervals, and the three gates regulate the flow of information into and out of the cell. LSTM networks are well suited for making predictions based on time series data, since there can be lags of unknown duration between important events in a time series.
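To illustrate the filtering view of convolution described above, here is a small NumPy sketch that slides a fixed 3 x 3 edge-detecting filter over a toy image. In a CNN the filter weights would be learned from data during training rather than specified by hand; the image and filter here are invented purely for illustration.

```python
# Convolution as feature extraction: slide a small filter over a 2-D input
# and compute local weighted sums (strictly, cross-correlation, as in CNNs).
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D convolution of a single-channel image with one filter."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy "image": a bright square on a dark background.
image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0

# A hand-specified vertical-edge filter; a CNN would learn such filters.
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])

feature_map = conv2d(image, kernel)
print(feature_map)  # nonzero entries mark the vertical edges of the square
```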
While the key advances in deep learning are in the architecture and training of large-scale deep neural networks, the second important idea, reinforcement learning, can be thought of as a scheme for learning a sequence of actions to achieve a desired outcome, such as maximizing an objective function. It is a goal-oriented learning procedure in which an agent learns the desired behavior by suitably adapting its internal states based on the reward-punishment signals it receives iteratively in response to its dynamic interaction with the environment. A simple example is the strategy one uses to train a pet, say a dog, where one rewards the pet with a treat if it learns the desired behavior and punishes it if it does not. When this is repeated many times, one is essentially reinforcing the reward-punishment patterns until the pet adopts the desired behavior. This feedback control-based learning mechanism is essentially Bellman's dynamic programming in modern ML garb.83

For this approach to work well for complex problems, such as the game of Go, one literally needs millions of "training sessions." AlphaGo played millions of games against itself to learn the game from scratch, accumulating thousands of years' worth of human expertise and skill during a period of just a few days.13, 14 As stunning as this accomplishment is, one must note that the game-playing domain has the enviable property that it can provide almost unlimited training data over unlimited training runs with a great deal of accuracy. This is typically not the case in science and engineering, where one is data-limited even in this "big data" era. But this limitation might be overcome whenever the source of the data is a computer simulation, as in some materials science applications.

For the sake of completeness, it is important to point out that reinforcement learning differs from the other dominant learning paradigms: supervised and unsupervised learning. In supervised learning, the system learns the relationship between input (X) and output (Y) given a set of input–output (X-Y) pairs. In contrast, in unsupervised learning, only a set of X is given, with no labels (i.e., no Y). The system is supposed to discover the regularities in the data on its own, hence "unsupervised." One could say that unsupervised learning looks for similarities among examples.
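As a minimal illustration of the reinforcement learning loop described above, with its reward signal and a Bellman-style update at its core, here is a sketch of tabular Q-learning on a toy one-dimensional "corridor" task. The environment, reward, and parameter values are invented for illustration and are vastly simpler than anything used in AlphaGo or the cited work.

```python
# Tabular Q-learning sketch: an agent learns, from reward signals alone,
# to walk right along a short corridor to reach the rewarding end state.
import random

N_STATES = 5          # states 0..4; state 4 is terminal and rewarding
ACTIONS = [-1, +1]    # step left or step right
alpha, gamma, eps = 0.5, 0.9, 0.2   # learning rate, discount, exploration
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Environment dynamics: move, stay inside the corridor, reward at the end."""
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

for episode in range(500):
    s = random.randrange(N_STATES - 1)   # start each episode in a random state
    done = False
    while not done:
        # Epsilon-greedy: mostly exploit current knowledge, sometimes explore.
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        nxt, r, done = step(s, a)
        # Bellman-style update: move Q toward reward plus discounted future value.
        best_next = max(Q[(nxt, act)] for act in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = nxt

# The learned greedy policy should be +1 (step right) for states 0-3;
# state 4 is terminal, so its entry is never trained.
print({s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES)})
```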