
Oink: Making Pig Self-Service

Category: mysql · Posted by 搞代码 on 2022-01-09


The Platform and Infrastructure team at eBay Inc. is happy to announce the open-sourcing of Oink – a self-service solution to Apache Pig.

Pig and Hadoop overview

Apache Pig is a platform for analyzing large data sets. It uses a high-level language for expressing data analysis programs, coupled with the infrastructure for evaluating these programs. Pig abstracts the Map/Reduce paradigm, making it very easy for users to write complex tasks using Pig’s language, called Pig Latin. Because execution of tasks can be optimized automatically, Pig Latin allows users to focus on semantics rather than efficiency. Another key benefit of Pig Latin is extensibility: users can do special-purpose processing by creating their own functions.

Apache Hadoop and Pig provide an excellent platform for extracting and analyzing data from very large application logs. At eBay, we on the Platform and Infrastructure team are responsible for storing the terabytes of logs that thousands of eBay application servers generate every day. Hadoop and Pig offer us an array of tools to search and view logs and to generate reports on application behavior. Because the logs are available in Hadoop, engineers (users of applications) can also use Hadoop and Pig for custom processing, such as writing Pig scripts to extract useful information.

The problem

Today, Pig is primarily used through the command line to spawn jobs. This model wasn’t well suited to the Platform team at eBay, as the cluster that housed the application logs was shared with other teams. This situation created a number of issues:

  • Governance – In a shared cluster, governance is critical. The Pig scripts and requests of one customer should not affect those of other customers and stakeholders of the cluster. Providing CLI access would also make it difficult to control the number of job submissions.
  • Scalability – Giving CLI access to every Pig customer created another challenge: scalability. Onboarding customers was time-consuming and cumbersome.
  • Change management – No easy means existed to upgrade or modify common libraries.

Hence, we needed a solution that acted as a gateway to Pig job submission, provided QoS, and abstracted the user from cluster configuration.

The solution: Oink

Oink solves the above challenges not only by allowing execution of Pig requests through a REST interface, but also by enabling users to register jars, view the status of Pig requests, view Pig request output, and even cancel a running Pig request. With the REST interface, the user has a cleaner way to submit Pig requests compared to CLI access. Oink serves as a single point of entry for Pig requests, thereby facilitating rate limiting and QoS enforcement for different customers.
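The single-point-of-entry idea can be sketched in a few lines of Python. This is an illustration only, not Oink's actual API: the `Gateway` class, its method names, and the per-customer cap are hypothetical, and no Pig job is launched. It shows how routing every submission through one component makes it straightforward to enforce a quota per customer and to expose status and cancel operations.

```python
import itertools
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    RUNNING = "running"
    CANCELLED = "cancelled"


class QuotaExceeded(Exception):
    """Raised when a customer already has too many running requests."""


@dataclass
class Gateway:
    """Hypothetical single entry point that tracks requests per customer."""

    max_running_per_customer: int = 2  # assumed quota, not an Oink setting
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))
    _requests: dict = field(default_factory=dict)  # id -> request record

    def _running(self, customer):
        return sum(
            1
            for r in self._requests.values()
            if r["customer"] == customer and r["status"] is Status.RUNNING
        )

    def submit(self, customer, script):
        # Reject the request up front instead of letting it reach the cluster.
        if self._running(customer) >= self.max_running_per_customer:
            raise QuotaExceeded(customer)
        request_id = next(self._ids)
        self._requests[request_id] = {
            "customer": customer,
            "script": script,
            "status": Status.RUNNING,
        }
        return request_id

    def status(self, request_id):
        return self._requests[request_id]["status"]

    def cancel(self, request_id):
        self._requests[request_id]["status"] = Status.CANCELLED
```

In this sketch, a customer who has reached the cap gets a `QuotaExceeded` error while other customers can still submit, which is the kind of isolation the governance requirement calls for.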

Oink runs as a servlet inside a web container and allows users to run multiple requests in parallel within a single JVM instance. This capability was not supported initially; it required the patch found in PIG-3866. The patch provides multi-tenant environment support so that different users can share the same instance.
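As a rough analogy for running several users' requests side by side in one process, the Python sketch below uses a thread pool. It is not how Oink or the PIG-3866 patch is implemented: `run_request` is a hypothetical stand-in that only counts script lines, and the point is simply that each request keeps its own state so that concurrent users do not interfere with one another.

```python
from concurrent.futures import ThreadPoolExecutor


def run_request(user, script):
    # Stand-in for launching a Pig script. All state lives in local
    # variables, so nothing is shared between concurrent requests.
    context = {"user": user, "script": script}
    line_count = len(context["script"].splitlines())
    return f"{context['user']}: ran {line_count} statement line(s)"


def run_in_parallel(requests, workers=4):
    # One pool serves every user; results come back in submission order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_request, user, script) for user, script in requests]
        return [future.result() for future in futures]
```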

With Oink, eBay’s Platform and Infrastructure team has been able to onboard more than 100 different use cases onto its cluster. Currently, more than 6,000 Pig jobs run every day without any manual intervention from the team.

Special thanks to Vijay Samuel, Ruchir Shah, Mahesh Somani, and Raju Kolluru for open-sourcing Oink. If you have any queries related to Oink, please submit an issue through GitHub.

