Apache Mahout: Machine Learning on Distributed Dataflow Systems

Peer-reviewed article

2020; The MIT Press; Volume: 21; Issue: 127; Language: English

ISSN

1533-7928

Authors

Robin Anil, Gökhan Çapan, Isabel Drost-Fromm, Ted Dunning, Ellen Friedman, Trevor Grant, Shannon Quinn, Paritosh Ranjan, Sebastian Schelter, Özgür Yılmazel

Topic(s)

Cloud Computing and Resource Management

Abstract

Apache Mahout is a library for scalable machine learning (ML) on distributed dataflow systems, offering various implementations of classification, clustering, dimensionality reduction and recommendation algorithms. Mahout was a pioneer in large-scale machine learning in 2008, when it started and targeted MapReduce, which was the predominant abstraction for scalable computing in industry at that time. Mahout has been widely used by leading web companies and is part of several commercial cloud offerings. In recent years, Mahout migrated to a general framework enabling a mix of dataflow programming and linear algebraic computations on backends such as Apache Spark and Apache Flink. This design allows users to execute data preprocessing and model training in a single, unified dataflow system, instead of requiring a complex integration of several specialized systems. Mahout is maintained as a community-driven open source project at the Apache Software Foundation, and is available at https://mahout.apache.org.
