A GFS cluster consists of a single master and multiple
and is accessed by multiple clients, Each of these is typically a commodity Linux
chunkservers
machine running a user-level server process. It is easy to run
both a chunkserver and a client on the same machine, as long
as machine resources permit and the lower reliability caused
by running possibly flaky application code is acceptable.
Files are divided into fixed-size chunks. Each chunki s
identified by an immutable and globally unique 64 bit chunkassigned by the master at the time of chunkcreat ion.
handle
Chunkservers store chunks on local disks as Linux files and
read or write chunkda ta specified by a chunkha ndle and
byte range. For reliability, each chunkis replicated on multiple
chunkservers. By default, we store three replicas, though
users can designate different replication levels for different
regions of the file namespace.
The master maintains all file system metadata. This includes
the namespace, access control information, the mapping
from files to chunks, and the current locations of chunks.
It also controls system-wide activities such as chunklease
management, garbage collection of orphaned chunks, and
chunkmigration between chunkservers. The master periodically
communicates with each chunkserver in HeartBeat
messages to give it instructions and collect its state.
GFS client code linked into each application implements
the file system API and communicates with the master and
chunkservers to read or write data on behalf of the application.
Clients interact with the master for metadata operations,
but all data-bearing communication goes directly to
the chunkservers. We do not provide the POSIX API and
therefore need not hookin to the Linux vnode layer.
Neither the client nor the chunkserver caches file data.
Client caches offer little benefit because most applications
stream through huge files or have working sets too large
to be cached. Not having them simplifies the client and
the overall system by eliminating cache coherence issues.
(Clients do cache metadata, however.) Chunkservers need
not cache file data because chunks are stored as local files
and so Linux’s buffer cache already keeps frequently accessed
data in memory.
Architecture