I found a very useful article about database sharding; here is the content.
This article accompanies the slides from a presentation on database sharding. Sharding is a technique for scaling databases horizontally, and we use it at Netlog. If you’re interested in high performance, scalability, MySQL, php, caching, partitioning, Sphinx, federation or Netlog, read on …
This presentation was given on the second day of FOSDEM 2009 in Brussels. FOSDEM is an annual conference on open source software with about 5000 hackers. I was invited by Kris Buytaert and Lenz Grimmer to give a talk in the MySQL Dev Room. The talk was based on an earlier talk I gave at BarcampGent 2.
Overview
Who am I?
What is Netlog?
A history of scaling database systems
Hitting limits
Sharding basics
Sharding schemes
Implications
Existing solutions
Implementation
Tackling the problems
Final thoughts
Slides
Resources
Who am I?
Currently I am a Lead Web Developer at Netlog working with php, MySQL and other frontend technologies to develop and improve the features of our social network. I’ve been doing this for 3 years now. For this paper it is important to mention that I am neither a DBA nor a sys-admin, so I approach the problem of scaling databases from an application / developer point of view.
Of course, the solutions described here are the result of a lot of effort from the Development and IT Services Departments at Netlog.
What is Netlog?
For those of you who are unfamiliar with Netlog, it’s best to sketch a little overview of who and what we are, and especially where we come from in terms of user base and growth. It will help you put our scalability story in perspective. At the moment we have over 40 million active members, resulting in over 50 million unique visitors per month. This adds up to 5+ billion page views per month and 6 billion online minutes per month. We’re active in 26 languages and 30+ countries, with our 5 most active countries being Italy, Belgium, Turkey, Switzerland and Germany. (If you’re interested in more info about the company, check our About Pages and sign up for an account.)
In terms of database statistics, this type of usage results, among other things, in huge amounts of data to store (e.g. 100+ million friendships for nl.netlog.com). The nature of our application (lots of interaction) results in a very write-heavy app (with a read-write ratio of about 1.4 to 1). A typical database, before sharding, handled an average of 3000+ queries per second during peak time (15h – 22h local time, for nl.netlog.com).
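To make those figures more concrete, here is a rough back-of-the-envelope sketch in Python. It is not part of the original talk: the function name `split_queries` is made up for illustration, and it assumes that the 1.4 to 1 read-write ratio holds uniformly during peak time and that 3000 queries per second is taken as a round lower bound.

```python
def split_queries(total_qps, reads_per_write):
    """Split a total query rate into (reads, writes) per second.

    Assumption: the read-write ratio applies evenly to the whole load.
    """
    # For every `reads_per_write` reads there is exactly 1 write.
    writes = total_qps / (reads_per_write + 1.0)
    reads = total_qps - writes
    return reads, writes


# Figures quoted in the text: 3000+ queries/s, read-write ratio of about 1.4 to 1.
reads, writes = split_queries(3000, 1.4)
print(f"reads/s ~ {reads:.0f}, writes/s ~ {writes:.0f}")
# prints "reads/s ~ 1750, writes/s ~ 1250"
```

Under these assumptions, roughly 4 out of every 10 queries are writes. That matters because reads can often be spread over extra copies of the data, while every write still has to be applied wherever that data lives.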
Of course, these requirements do not have to be met by every application, and different applications require different scaling strategies. Nevertheless we wouldn’t have thought (or hoped) to be where we are today, when we started off 7 years ago as a college student project. We are convinced that we can give you further insight into scalability and share some valuable suggestions.
Below is a graph of our growth in the last year.
This growth has of course resulted in several performance issues. The bottleneck for us has often been the database layer, because this layer is the only layer in the web stack that isn’t stateless. The interactions and dependencies in a relational database system make scaling horizontally less straightforward.
Netlog is (being) built and runs on open source software such as php, MySQL, Apache, Debian, Memcached, Sphinx, Lighttpd, Squid, and many more. Our solutions for scaling databases are also built on these technologies. That’s why we want to give something back by documenting and sharing our story.
A history of scaling database systems
Like every hobby project, Netlog (then asl.to, “your internet passport”) started off, more than 7 years ago, with a single database instance on a (probably virtual) server in a shared hosting environment. As traffic grew and load increased, we moved to a separate server, and eventually to a split setup with MySQL and php on separate machines (database setup 1).