Monday, December 6, 2010

Design Assumptions of Google File System:

• The system is built from many inexpensive commodity components that often fail. It must constantly monitor itself and detect, tolerate, and recover promptly from component failures on a routine basis.
• The system stores a modest number of large files. We expect a few million files, each typically 100 MB or larger in size. Multi-GB files are the common case and should be managed efficiently. Small files must be supported, but we need not optimize for them.
• The workloads primarily consist of two kinds of reads: large streaming reads and small random reads. In large streaming reads, individual operations typically read hundreds of KBs, more commonly 1 MB or more. Successive operations from the same client often read through a contiguous region of a file. A small random read typically reads a few KBs at some arbitrary offset. Performance-conscious applications often batch and sort their small reads to advance steadily through the file rather than go back and forth (see the read-batching sketch after this list).
• The workloads also have many large, sequential writes that append data to files. Typical operation sizes are similar to those for reads. Once written, files are seldom modified again. Small writes at arbitrary positions in a file are supported but do not have to be efficient.
• The system must efficiently implement well-defined semantics for multiple clients that concurrently append to the same file. Our files are often used as producer-consumer queues or for many-way merging. Hundreds of producers, running one per machine, will concurrently append to a file. Atomicity with minimal synchronization overhead is essential. The file may be read later, or a consumer may be reading through the file simultaneously (see the append sketch after this list).
• High sustained bandwidth is more important than low latency. Most of our target applications place a premium on processing data in bulk at a high rate, while few have stringent response time requirements for an individual read or write.
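The read assumption mentions that performance-conscious applications batch and sort their small reads so they move forward through the file instead of seeking back and forth. Here is a minimal Go sketch of that idea, not anything from the paper itself; ReadRequest, readSorted, and example.dat are made-up names, and an ordinary local file stands in for a GFS file.

```go
package main

import (
	"fmt"
	"os"
	"sort"
)

// ReadRequest describes one small random read: a byte offset and a length.
type ReadRequest struct {
	Offset int64
	Length int
}

// readSorted sorts pending small reads by offset and issues them in order,
// so successive operations advance steadily forward through the file.
func readSorted(f *os.File, reqs []ReadRequest) ([][]byte, error) {
	sort.Slice(reqs, func(i, j int) bool { return reqs[i].Offset < reqs[j].Offset })
	out := make([][]byte, 0, len(reqs))
	for _, r := range reqs {
		buf := make([]byte, r.Length)
		if _, err := f.ReadAt(buf, r.Offset); err != nil {
			return nil, err
		}
		out = append(out, buf)
	}
	return out, nil
}

func main() {
	f, err := os.Open("example.dat") // any local file stands in for a GFS file
	if err != nil {
		fmt.Println(err)
		return
	}
	defer f.Close()

	// Requests arrive out of order; readSorted issues them offset-ascending.
	data, err := readSorted(f, []ReadRequest{{8192, 4096}, {0, 4096}, {65536, 4096}})
	fmt.Println(len(data), err)
}
```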
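The concurrent-append assumption is what GFS addresses with its atomic record append operation. The sketch below is only a loose single-machine analogue, assuming a local file opened in append mode and a mutex standing in for the atomicity GFS provides on the server side; appendRecord and queue.log are invented for the example. It shows the producer side of the producer-consumer pattern: many writers appending self-delimiting, length-prefixed records to one shared file.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"os"
	"sync"
)

var mu sync.Mutex // stands in for the atomicity GFS provides on the server side

// appendRecord appends one length-prefixed record as a single unit, so
// records from concurrent producers never interleave with each other.
func appendRecord(f *os.File, payload []byte) error {
	mu.Lock()
	defer mu.Unlock()
	var hdr [4]byte
	binary.BigEndian.PutUint32(hdr[:], uint32(len(payload)))
	if _, err := f.Write(hdr[:]); err != nil {
		return err
	}
	_, err := f.Write(payload)
	return err
}

func main() {
	f, err := os.OpenFile("queue.log", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		fmt.Println(err)
		return
	}
	defer f.Close()

	// Eight producers stand in for the "hundreds of producers, one per machine".
	var wg sync.WaitGroup
	for p := 0; p < 8; p++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for i := 0; i < 100; i++ {
				_ = appendRecord(f, []byte(fmt.Sprintf("record %d from producer %d", i, id)))
			}
		}(p)
	}
	wg.Wait()
}
```

A consumer can read the same file while producers are still appending, walking the length prefixes to recover record boundaries, which mirrors the paper's point that a consumer may be reading through the file simultaneously.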