MLlib is Apache Spark's scalable machine learning library.

    Ease of use

    Usable in Java, Scala, Python, and R.

    MLlib fits into Spark's APIs and interoperates with NumPy in Python (as of Spark 0.9) and R libraries (as of Spark 1.5). You can use any Hadoop data source (e.g. HDFS, HBase, or local files), making it easy to plug into Hadoop workflows.

    from pyspark.ml.clustering import KMeans

    # Load a dataset in LIBSVM format from any Hadoop-supported storage.
    data = spark.read.format("libsvm")\
      .load("hdfs://...")

    # Fit a k-means model with 10 clusters.
    model = KMeans(k=10).fit(data)
    Calling MLlib in Python

    Performance

    High-quality algorithms, 100x faster than MapReduce.

    Spark excels at iterative computation, enabling MLlib to run fast. At the same time, we care about algorithmic performance: MLlib contains high-quality algorithms that leverage iteration, and can yield better results than the one-pass approximations sometimes used on MapReduce.

    Logistic regression in Hadoop and Spark
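
    As a hedged sketch of how such an iterative algorithm is invoked through MLlib's Python API (assuming a PySpark shell where `spark` is already defined; the dataset path and parameter values are illustrative, not tied to the benchmark above):

    from pyspark.ml.classification import LogisticRegression

    # Training data in MLlib's DataFrame format: a "label" column and a vector "features" column.
    training = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")

    # Each of the (up to) 10 iterations reruns over the cached dataset in memory,
    # rather than launching a separate MapReduce-style pass per iteration.
    lr = LogisticRegression(maxIter=10, regParam=0.01)
    model = lr.fit(training)
    print(model.coefficients)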

    Runs everywhere

    Spark runs on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud, against diverse data sources.

    You can run Spark using its standalone cluster mode, on EC2, on Hadoop YARN, on Mesos, or on Kubernetes. Access data in HDFS, Apache Cassandra, Apache HBase, Apache Hive, and hundreds of other data sources.
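
    A minimal sketch of the same idea in Python (the table and file names below are hypothetical): the DataFrames that MLlib consumes can come from any of these sources, independent of the cluster manager.

    from pyspark.sql import SparkSession

    # The cluster manager (standalone, YARN, Mesos, Kubernetes) is normally chosen
    # via spark-submit or cluster configuration, not in application code.
    spark = SparkSession.builder.appName("data-sources").enableHiveSupport().getOrCreate()

    events = spark.read.parquet("hdfs:///data/events.parquet")   # hypothetical HDFS path
    ratings = spark.sql("SELECT * FROM analytics.ratings")       # hypothetical Hive table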

    Algorithms

    MLlib contains many algorithms and utilities.

    ML algorithms include:

    • Classification: logistic regression, naive Bayes,...
    • Regression: generalized linear regression, survival regression,...
    • Decision trees, random forests, and gradient-boosted trees
    • Recommendation: alternating least squares (ALS), sketched after this list
    • Clustering: K-means, Gaussian mixtures (GMMs),...
    • Topic modeling: latent Dirichlet allocation (LDA)
    • Frequent itemsets, association rules, and sequential pattern mining
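
    Each of these follows MLlib's common estimator/model API. A minimal sketch using ALS, assuming an existing SparkSession `spark` (the column names, path, and parameters are illustrative assumptions):

    from pyspark.ml.recommendation import ALS

    # Hypothetical ratings DataFrame with userId, movieId, and rating columns.
    ratings = spark.read.parquet("hdfs:///data/ratings.parquet")

    als = ALS(userCol="userId", itemCol="movieId", ratingCol="rating",
              rank=10, maxIter=10)
    model = als.fit(ratings)
    top5 = model.recommendForAllUsers(5)   # top-5 item recommendations per user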

    ML workflow utilities include:

    • Feature transformations: standardization, normalization, hashing,...
    • ML Pipeline construction (see the sketch after this list)
    • Model evaluation and hyper-parameter tuning
    • ML persistence: saving and loading models and Pipelines
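
    A hedged sketch tying several of these utilities together (the input DataFrame `df`, its columns, and the parameter grid are hypothetical):

    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler, StandardScaler
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

    # Hypothetical input: a DataFrame `df` with numeric columns "x1", "x2" and a binary "label".
    assembler = VectorAssembler(inputCols=["x1", "x2"], outputCol="raw_features")
    scaler = StandardScaler(inputCol="raw_features", outputCol="features")
    lr = LogisticRegression(maxIter=20)
    pipeline = Pipeline(stages=[assembler, scaler, lr])

    # Hyper-parameter tuning: cross-validate over the regularization parameter.
    grid = ParamGridBuilder().addGrid(lr.regParam, [0.01, 0.1]).build()
    cv = CrossValidator(estimator=pipeline, estimatorParamMaps=grid,
                        evaluator=BinaryClassificationEvaluator(), numFolds=3)
    model = cv.fit(df)

    # ML persistence: save the best fitted Pipeline for later reuse.
    model.bestModel.write().overwrite().save("/tmp/lr_pipeline_model")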

    Other utilities include:

    • Distributed linear algebra: SVD, PCA,...
    • Statistics: summary statistics, hypothesis testing,... (see the sketch below)
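
    A minimal statistics sketch (the small in-line dataset is made up for illustration; assumes an existing SparkSession `spark`):

    from pyspark.ml.linalg import Vectors
    from pyspark.ml.stat import Correlation, Summarizer

    # Hypothetical feature vectors.
    df = spark.createDataFrame(
        [(Vectors.dense(1.0, 2.0),),
         (Vectors.dense(2.0, 4.1),),
         (Vectors.dense(3.0, 6.2),)],
        ["features"])

    # Column-wise summary statistics and a Pearson correlation matrix.
    df.select(Summarizer.metrics("mean", "variance").summary(df.features)).show(truncate=False)
    print(Correlation.corr(df, "features").head()[0])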

    Refer to the MLlib guide for usage examples.

    Community

    MLlib is developed as part of the Apache Spark project. It thus gets tested and updated with each Spark release.

    If you have questions about the library, ask on the Spark mailing lists.

    MLlib is still a rapidly growing project and welcomes contributions. If you'd like to submit an algorithm to MLlib, read how to contribute to Spark and send us a patch!

    Getting started

    To get started with MLlib:

    • Download Spark. MLlib is included as a module.
    • Read the MLlib guide, which includes various usage examples.
    • Learn how to deploy Spark on a cluster if you'd like to run in distributed mode. You can also run locally on a multicore machine without any setup (a minimal local-mode sketch follows).
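
    A minimal local-mode sketch (assuming PySpark is installed, e.g. via pip install pyspark):

    from pyspark.sql import SparkSession

    # "local[*]" runs Spark in-process using all cores on the machine; no cluster required.
    spark = SparkSession.builder \
        .master("local[*]") \
        .appName("mllib-quickstart") \
        .getOrCreate()

    print(spark.range(10).count())   # quick sanity check that the session works
    spark.stop()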