Getting Started with Tachyon: Configuration and Usage

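Tachyon is a memory-based distributed file system (project home page: tachyon-project.org) and an important component of AMPLab's BDAS (Berkeley Data Analytics Stack). It addresses problems such as recomputation caused by a lost cache, and the same data being held in memory more than once by different applications (jobs) or even by different computing frameworks. Currently, Spark 1.1 by default…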

Features

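Java-like File API: Tachyon's native API closely resembles the Java file API. It provides InputStream and OutputStream interfaces as well as efficient memory-mapped I/O, and using this API gives the best performance.
Compatibility: Tachyon implements the Hadoop FileSystem interface, so Hadoop MapReduce and Spark can run on Tachyon without any modification.
Native support for raw tables: Tachyon natively supports column-oriented data, so users can selectively keep frequently accessed columns in memory.
Pluggable underlayer file system: Tachyon provides a way to persist in-memory data to an underlying file system. HDFS and single-node local file systems are currently supported.
Web UI: Users can browse the file system in a web browser. In debug mode, administrators can view details such as the location of each file.
Command line interaction: Users can interact with Tachyon through ./bin/tachyon tfs, for example to copy files between Tachyon and the local file system.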

How It Works in Brief

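See Dr. Haoyuan's slides from June 30. Tachyon uses the common Master/Worker architecture, and ZooKeeper can be used to make the Master highly available. The Master node manages and maintains the file system metadata (using a journal made of an image plus an edit log; see reference 1), while the file data itself is kept in the memory of the Worker nodes. Workers and the Master communicate over Thrift. In addition, users can ask for specific files to be persisted to the underlying storage (saved to the under-layer HDFS).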
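The "image plus edit log" idea can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class JournalSketch and all of its methods are hypothetical names invented for illustration, not Tachyon's API, and the real journal format is not covered here. It shows one common way such a journal works: every metadata change is appended to an edit log, the state is periodically folded into a snapshot (the image), and a restarted or standby master rebuilds its state by loading the image and replaying the remaining edits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model of metadata recovery from a snapshot ("image") plus an edit log.
// Hypothetical names throughout; this is not Tachyon's implementation.
class JournalSketch {
    /** One logged metadata mutation. */
    static final class Edit {
        final boolean create;   // true = create a file, false = delete it
        final String path;
        Edit(boolean create, String path) { this.create = create; this.path = path; }
    }

    private final TreeSet<String> files = new TreeSet<>();   // live namespace in memory
    private TreeSet<String> image = new TreeSet<>();         // last snapshot
    private final List<Edit> editLog = new ArrayList<>();    // edits since the snapshot

    void create(String path) { apply(new Edit(true, path)); }
    void delete(String path) { apply(new Edit(false, path)); }

    private void apply(Edit e) {
        editLog.add(e);          // log the change first, then mutate the live state
        applyTo(files, e);
    }

    private static void applyTo(TreeSet<String> state, Edit e) {
        if (e.create) state.add(e.path); else state.remove(e.path);
    }

    /** Fold the current state into a new image and truncate the edit log. */
    void checkpointImage() {
        image = new TreeSet<>(files);
        editLog.clear();
    }

    /** What a restarted master would rebuild: the image with the edits replayed on top. */
    TreeSet<String> recover() {
        TreeSet<String> state = new TreeSet<>(image);
        for (Edit e : editLog) applyTo(state, e);
        return state;
    }

    TreeSet<String> live() { return files; }

    public static void main(String[] args) {
        JournalSketch master = new JournalSketch();
        master.create("/data/a");
        master.create("/data/b");
        master.checkpointImage();     // the image now holds /data/a and /data/b
        master.delete("/data/a");     // these two edits exist only in the edit log
        master.create("/data/c");
        System.out.println("live:      " + master.live());
        System.out.println("recovered: " + master.recover());
    }
}
```

The recovered namespace should always equal the live one, which is the property that lets a standby master take over after a failure.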

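Tachyon makes full use of memory: it keeps only one copy of the data in memory (in-memory data is not replicated), and it applies the idea of lineage to the storage layer, checkpointing asynchronously to Tachyon's underlying file system. When a file is written into Tachyon, Tachyon checkpoints it to the underlying storage asynchronously in the background. Recomputation works as follows (see figure): if File Set B is lost, it has to be regenerated from File Set A by re-running the Spark job.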
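The interplay of a single in-memory copy, checkpointing, and lineage-based recomputation can be sketched as a toy model. Everything here is an assumption made for illustration: LineageSketch and its methods are hypothetical names, files are plain strings, the "job" is an ordinary function standing in for a Spark job, and the explicit checkpoint() call stands in for the asynchronous background checkpoint.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy model: one in-memory copy per file, optional checkpoint, recompute via lineage.
// Hypothetical names throughout; this is not Tachyon's implementation.
class LineageSketch {
    private final Map<String, String> memory = new HashMap<>();     // single in-memory copy
    private final Map<String, String> underFs = new HashMap<>();    // checkpointed copies
    private final Map<String, String> parentOf = new HashMap<>();   // lineage: output -> input
    private final Map<String, Function<String, String>> jobOf = new HashMap<>(); // output -> job
    private int recomputations = 0;

    /** Write an input file that already exists in the under file system. */
    void writeSource(String name, String data) {
        memory.put(name, data);
        underFs.put(name, data);
    }

    /** Write a file produced by running a job on a parent file, recording the lineage. */
    void writeDerived(String name, String parent, Function<String, String> job) {
        parentOf.put(name, parent);
        jobOf.put(name, job);
        memory.put(name, job.apply(read(parent)));
    }

    /** Stand-in for the asynchronous background checkpoint. */
    void checkpoint(String name) {
        String data = memory.get(name);
        if (data != null) underFs.put(name, data);
    }

    /** Simulate losing the only in-memory copy, e.g. when a worker dies. */
    void loseMemory(String name) { memory.remove(name); }

    String read(String name) {
        String data = memory.get(name);
        if (data != null) return data;
        data = underFs.get(name);                 // checkpointed: just reload it
        if (data == null) {                       // not checkpointed: recompute from lineage
            String parent = parentOf.get(name);
            if (parent == null) throw new IllegalStateException("lost with no lineage: " + name);
            data = jobOf.get(name).apply(read(parent));
            recomputations++;
        }
        memory.put(name, data);
        return data;
    }

    int recomputations() { return recomputations; }

    public static void main(String[] args) {
        LineageSketch fs = new LineageSketch();
        fs.writeSource("A", "file set a");
        fs.writeDerived("B", "A", s -> s.toUpperCase());
        fs.loseMemory("B");                       // B was never checkpointed
        System.out.println(fs.read("B") + " after " + fs.recomputations() + " recomputation(s)");
    }
}
```

Once B has been checkpointed, losing its in-memory copy only costs a reload from the under file system; the job is re-run only for data that was lost before its checkpoint completed.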

Tachyon defines the following cache (write) types:

package tachyon.client;

import java.io.IOException;

/**
 * Different write types for a TachyonFile.
 */
public enum WriteType {
  /**
   * Write the file and must cache it.
   */
  MUST_CACHE(1),
  /**
   * Write the file and try to cache it.
   */
  TRY_CACHE(2),
  /**
   * Write the file synchronously to the under fs, and also try to cache it.
   */
  CACHE_THROUGH(3),
  /**
   * Write the file synchronously to the under fs, no cache.
   */
  THROUGH(4),
  /**
   * Write the file asynchronously to the under fs (either must cache or must through).
   */
  ASYNC_THROUGH(5);
  ......
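To make the differences between the five types easier to compare, here is a standalone sketch that mirrors the constants above and derives a few yes/no properties from their comments. It is deliberately named WriteTypeSketch so it is not mistaken for the real tachyon.client.WriteType; the helper methods reflect one reading of the comments (in particular, treating ASYNC_THROUGH as "cache now, write to the under fs later") and are assumptions, not the library's code.

```java
// Standalone mirror of the write types listed above, with helper predicates
// derived from their comments. Hypothetical; not the tachyon.client.WriteType class.
enum WriteTypeSketch {
    MUST_CACHE(1),
    TRY_CACHE(2),
    CACHE_THROUGH(3),
    THROUGH(4),
    ASYNC_THROUGH(5);

    private final int value;

    WriteTypeSketch(int value) { this.value = value; }

    int getValue() { return value; }

    /** Does this type attempt to place the data in memory? */
    boolean isCache() { return this != THROUGH; }

    /** Must the write fail if the data cannot be cached? */
    boolean isMustCache() { return this == MUST_CACHE; }

    /** Is the data written synchronously to the under file system? */
    boolean isThrough() { return this == CACHE_THROUGH || this == THROUGH; }

    /** Is the data written to the under file system in the background? */
    boolean isAsync() { return this == ASYNC_THROUGH; }

    public static void main(String[] args) {
        for (WriteTypeSketch t : values()) {
            System.out.println(t + "(" + t.getValue() + ")"
                + " cache=" + t.isCache()
                + " mustCache=" + t.isMustCache()
                + " syncThrough=" + t.isThrough()
                + " async=" + t.isAsync());
        }
    }
}
```

Read this way, CACHE_THROUGH is the safest choice for data that must survive a worker failure immediately, while MUST_CACHE and TRY_CACHE leave durability to later checkpointing or recomputation.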

Tachyon Cluster Configuration

Download and unpack Tachyon 0.5:
wget http://tachyon-project.org/downloads/tachyon-0.5.0-bin.tar.gz
tar xvfz tachyon-0.5.0-bin.tar.gz
cd tachyon-0.5.0/conf
Following the Configuration Settings page of the official Tachyon documentation, and besides setting the correct JAVA_HOME, the parameters we need to set are:

#Basic
tachyon.home = /var/lib/spark/tachyon-0.5.0
tachyon.underfs.address = hdfs://hdp01:8020
tachyon.data.folder = /user/spark/tach_data
tachyon.workers.folder = /user/spark/tach_worker
# tachyon.underfs.hdfs.impl = "org.apache.hadoop.hdfs.DistributedFileSystem" #default
# tachyon.max.columns = 1000 #default
# tachyon.table.metadata.byte = 5242880 #default

#HA
tachyon.usezookeeper = true
tachyon.zookeeper.address = hdp02:2181, hdp03:2181, hdp04:2181
tachyon.zookeeper.election.path = "/tach_elect"
tachyon.zookeeper.leader.path = "/tach_leader"

#Master
# tachyon.master.journal.folder = "$TACHYON_UNDERFS_ADDRESS/user/spark/tach_journal/"  #default $tachyon.home + "/journal/"
tachyon.master.hostname = hdp04
# tachyon.master.port = 19998    #default
# tachyon.master.web.port = 19999    #default
# tachyon.master.whitelist = "/"    #default

#Worker
# tachyon.worker.port = 29998 #default
# tachyon.worker.data.port = 29999 #default
tachyon.worker.memory.size = 10G      #default 128M
tachyon.worker.data.folder = /mnt/ramdisk           #default /mnt/ramdisk

#User
# tachyon.user.failed.space.request.limits = 3    #default
# tachyon.user.quota.unit.bytes = 8MB    #default
# tachyon.user.file.buffer.bytes = 1MB    #default
# tachyon.user.default.block.size.byte = 1GB    #default
# tachyon.user.remote.read.buffer.size.byte = 1MB    #default
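Before rolling such a parameter list out to a cluster, it can help to sanity-check it, for example to confirm which settings are actually active and how large a value like 10G is in bytes. The following is a minimal sketch, not part of Tachyon: TachyonConfSketch, parse, and toBytes are hypothetical helpers, the "key = value  #comment" syntax is simply the notation used in the listing above, and the size suffix rules (K, M, G, T with an optional trailing B, powers of 1024) are an assumption that may differ from how Tachyon itself interprets them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for checking a "key = value  #comment" parameter listing.
// Not part of Tachyon; the suffix rules below are assumptions.
class TachyonConfSketch {
    /** Collect active settings; lines starting with '#' are treated as disabled. */
    static Map<String, String> parse(String text) {
        Map<String, String> conf = new LinkedHashMap<>();
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) continue;
            int hash = line.indexOf('#');              // strip a trailing "#default ..." note
            if (hash >= 0) line = line.substring(0, hash).trim();
            int eq = line.indexOf('=');
            if (eq < 0) continue;
            String key = line.substring(0, eq).trim();
            String value = line.substring(eq + 1).trim();
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);   // drop surrounding quotes
            }
            conf.put(key, value);
        }
        return conf;
    }

    /** Convert sizes such as "10G", "128M", "8MB" or "5242880" to bytes (1K = 1024). */
    static long toBytes(String size) {
        String s = size.trim().toUpperCase();
        if (s.endsWith("B")) s = s.substring(0, s.length() - 1);
        if (s.isEmpty()) throw new IllegalArgumentException("not a size: " + size);
        long unit = 1L;
        switch (s.charAt(s.length() - 1)) {
            case 'K': unit = 1L << 10; break;
            case 'M': unit = 1L << 20; break;
            case 'G': unit = 1L << 30; break;
            case 'T': unit = 1L << 40; break;
            default: break;
        }
        if (unit != 1L) s = s.substring(0, s.length() - 1);
        return Long.parseLong(s.trim()) * unit;
    }

    public static void main(String[] args) {
        String listing =
              "#Worker\n"
            + "# tachyon.worker.port = 29998 #default\n"
            + "tachyon.worker.memory.size = 10G      #default 128M\n"
            + "tachyon.worker.data.folder = /mnt/ramdisk           #default /mnt/ramdisk\n";
        Map<String, String> conf = parse(listing);
        System.out.println(conf);
        System.out.println(toBytes(conf.get("tachyon.worker.memory.size")) + " bytes");
    }
}
```

Note that the parser treats any '#' after a value as the start of a comment, so it would not handle a value that legitimately contains that character.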
