From nfsv4-bounces@ietf.org Mon Jun 06 15:16:59 2005
From: "Wachdorf, Daniel R" <drwachd@sandia.gov>
To: nfsv4@ietf.org
Date: Mon, 6 Jun 2005 13:16:30 -0600
Subject: [nfsv4] Solaris 10 Nfsv4 bug

Is there a separate channel for reporting NFSv4 bugs in Solaris 10 or should I go through our normal support channels?

-dan

--------------------------------------
Daniel Wachdorf
drwachd@sandia.gov
Sandia National Laboratories
Cyber Security Technologies
505-284-8060

From nfsv4-bounces@ietf.org Mon Jun 06 16:07:47 2005
From: eric kustarz
To: "Wachdorf, Daniel R"
Cc: nfsv4@ietf.org
Date: Mon, 06 Jun 2005 13:05:57 -0700
Subject: Re: [nfsv4] Solaris 10 Nfsv4 bug

Wachdorf, Daniel R wrote:
> Is there a separate channel for reporting NFSv4 bugs in Solaris 10 or
> should I go through our normal support channels?

In the "near" future, there will be an opensolaris nfs alias. For now, just use "spencer.shepler@sun.com" or me :)

I imagine this is related to the problems reported on nfsv4@linux-nfs.org? I was about to reply to you on that... i'll reply on that alias...
eric

From nfsv4-bounces@ietf.org Mon Jun 13 13:17:49 2005
From: Garth Goodson
To: nfsv4@ietf.org
Date: Mon, 13 Jun 2005 10:17:38 -0700
Subject: [nfsv4] new pNFS draft (draft-welch-pnfs-ops-02.txt)

I have incorporated a substantial number of additions and changes into the pNFS operations draft. Many of the changes arise from feedback from other individuals and companies interested in pNFS. This draft adds an NFSv4 file-layout specification and attempts to refine the pNFS semantics. It is based on the early pNFS drafts.

See http://www.ietf.org/internet-drafts/draft-welch-pnfs-ops-02.txt

Feedback and comments are welcome...
-Garth Goodson

From nfsv4-bounces@ietf.org Tue Jun 14 09:54:59 2005
From: Jim Zelenka <jimz@panasas.com>
To: nfsv4@ietf.org
Date: Tue, 14 Jun 2005 09:54:45 -0400
Subject: [nfsv4] pNFS draft for object-based storage (draft-zelenka-pnfs-obj-00.txt)

As a companion to draft-welch-pnfs-ops-02.txt, we have published draft-zelenka-pnfs-obj-00.txt (http://www.ietf.org/internet-drafts/draft-zelenka-pnfs-obj-00.txt), which proposes pNFS handling for object-based storage and also suggests some refinements of the pNFS protocol.

We welcome all comments/feedback/etc.

Thanks,
Jim Zelenka

--
Jim Zelenka
Software Engineer, Panasas, Inc.
Accelerating Time to Results(TM) with Clustered Storage
www.panasas.com
412-323-3500

From nfsv4-bounces@ietf.org Mon Jun 20 15:40:24 2005
From: Garth Goodson
To: nfsv4@ietf.org
Date: Mon, 20 Jun 2005 12:39:59 -0700
Subject: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt)

There are a number of open issues and changes that were suggested for the pNFS draft (draft-welch-pnfs-ops-02.txt). Please post any feedback or opinions (or feel free to add more).

Thanks,
-Garth

3.1.1 Device IDs

Proposal: device IDs are valid only while a server is up; remappings while the server is up must use a different device ID. (I prefer this.)

Alternative: device IDs are attached to leases and may be timed out (they probably also need to be recallable or able to be invalidated).

3.1.2 Aggregation Schemes

Proposal: the aggregation scheme (e.g., striping) is part of the opaque, layout-type-defined structure.

Alternative: make a general striping aggregation that sits outside of the opaque structure; push striping up a level (it would reference layout-type-specific opaque structures, which contain the devices).

3.2.2 Operation Sequencing

Issue: a race condition exists between LAYOUTGET and LAYOUTRECALL passing each other on the wire. Sessions do not solve this, since the two are on different channels. This may require a sequence ID to be returned in layout operations.

3.3.1 Identifying Layouts

Proposal: layouts are identified by

Alternative: we may need to distinguish between read and read/write layout types; e.g., block/object layouts may have separate read and read/write layouts and will want to handle them separately. Propose adding the iomode to the layout identification (and allowing a recall to specify the specific mode or any mode).
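A rough, editor-supplied sketch of the identification scheme proposed in 3.3.1 above (none of the names below come from the draft; the key fields and the ANY-mode recall behaviour are assumptions made purely for illustration): keying a granted layout by (client ID, filehandle, iomode) rather than just (client ID, filehandle) lets a recall target only the layouts of one mode, or all of them.

```python
from dataclasses import dataclass
from enum import Enum


class IOMode(Enum):
    READ = 1
    RW = 2
    ANY = 3  # usable in a recall to match either mode


@dataclass(frozen=True)
class LayoutKey:
    clientid: int
    filehandle: bytes
    iomode: IOMode  # the additional field proposed in 3.3.1


class LayoutTable:
    """Toy per-server table of granted layouts, keyed as in 3.3.1."""

    def __init__(self):
        self._layouts = {}

    def grant(self, key: LayoutKey, layout):
        self._layouts[key] = layout

    def recall(self, clientid: int, filehandle: bytes, iomode=IOMode.ANY):
        """Return the keys a LAYOUTRECALL would name for this mode."""
        return [
            k for k in self._layouts
            if k.clientid == clientid
            and k.filehandle == filehandle
            and (iomode is IOMode.ANY or k.iomode is iomode)
        ]
```

Under that keying, a server could recall only the read/write layout of a file (for example in the copy-on-write case of 3.3.3 below) while leaving read-only layouts in place.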
3.3.3 Copy-on-write

Same as the discussion for 3.3.1.

3.4 Recalling a Layout

Addition: add a recall that recalls all layouts pertaining to a specific fsid.

Issue: long callback recalls (if the client has dirty data that needs to be flushed). This is mostly wording on how long the client should have to write data while holding the layout. It can always revert to sending the data through the metadata server if I/Os to the data servers fail (due to the server revoking the layout).

3.5 Committing a Layout

Addition: add mtime/atime hints to LAYOUTCOMMIT (the server can use them or not based on current state -- e.g., the server should not allow time to go backwards and should not set the mtime if the mtime is already higher than the one being set). SETATTR is used to specify an exact time and is constrained by regular v4 semantics. The main difference between the times in LAYOUTCOMMIT and SETATTR is that the times set by SETATTR are mandatory rather than hints.

3.5.1 LAYOUTCOMMIT and EOF

Change: instead of specifying newEOF with a flag (depending on whether the client thinks it is setting a new EOF), the client will specify the last byte to which it wrote.

5.1 File Striping and Data Access

Change: simplify the striping layout -- have an enum for SPARSE vs. DENSE layouts instead of a skip and start offset (an illustrative sketch of the two follows after this list).

Issue: think about what error gets returned if a client performs a non-(READ/WRITE/PUTFH/COMMIT) operation at a data server; this may be a problem if a regular NFSv4 data server is used, as it has no way to differentiate the accesses.

5.2 Global Stateids

Issue: this does not provide for unmodified NFSv4 data servers. More thinking must be done if unmodified v4 servers are to be used as data servers. See section 5.7 for discussion.

5.3 I/O Mode

Change: don't restrict the I/O mode to be RW. It may be useful to have read-only replicas to which a client can be directed if the iomode is READ.

5.4.1 Lock State Propagation

Issue: can the sequence IDs in stateids be ignored on the data servers (and what if sessions are used)?

5.4.3 Access State Propagation

Issue: the NFSv4 spec says that READs/WRITEs do not require the same principal as the OPEN. This opens a security hole, but some implementations depend on it. pNFS should probably go with the spec on this and not change the semantics.

6.3 pnfs_devaddr4

Change: switch on layouttype4 instead of devaddrtypes4.

Change?: add a disk signature to the list of types.

7 pNFS File Attributes

Add: PREFERRED_ALIGNMENT and PREFERRED_BLOCKSIZE as FSID-level attributes.

9.2 LAYOUTCOMMIT

Add: mtime/atime time attribute hints to the arguments.

Change: neweof in the result, as per the object Internet-Draft. Basically, have a specific structure for each layout type that is the LAYOUTCOMMIT layout (vs. the layout received by LAYOUTGET). This allows extra opaque data to be sent on LAYOUTCOMMIT.

General: IANA -- think about whether additional layout types go through IANA or a specification process.
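The sketch referenced under item 5.1 above, supplied by the editor rather than taken from the draft: for a simple round-robin stripe (the fixed stripe unit, stripe width, and function names are all assumptions), a SPARSE layout addresses a data server with the file's logical offset, while a DENSE layout packs the stripe units held by one data server together and addresses them with a compacted offset.

```python
STRIPE_UNIT = 64 * 1024          # assumed fixed stripe unit
NUM_DEVICES = 4                  # assumed stripe width


def device_for(file_offset: int) -> int:
    """Which data server holds this byte (round-robin striping)."""
    return (file_offset // STRIPE_UNIT) % NUM_DEVICES


def sparse_offset(file_offset: int) -> int:
    """SPARSE layout: the data server is addressed with the file offset."""
    return file_offset


def dense_offset(file_offset: int) -> int:
    """DENSE layout: stripe units on one server are packed contiguously."""
    stripe_number = file_offset // STRIPE_UNIT
    within_unit = file_offset % STRIPE_UNIT
    units_on_this_device = stripe_number // NUM_DEVICES
    return units_on_this_device * STRIPE_UNIT + within_unit


if __name__ == "__main__":
    off = 5 * STRIPE_UNIT + 100          # byte 100 of stripe unit 5
    print(device_for(off))               # -> 1
    print(sparse_offset(off))            # -> 327780
    print(dense_offset(off))             # -> 65636
```

An enum in the layout would simply select between these two interpretations instead of describing them with a skip and start offset.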
From nfsv4-bounces@ietf.org Mon Jun 20 16:01:26 2005
From: "J. Bruce Fields"
To: Garth Goodson
Cc: nfsv4@ietf.org
Date: Mon, 20 Jun 2005 16:01:19 -0400
Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt)

On Mon, Jun 20, 2005 at 12:39:59PM -0700, Garth Goodson wrote:
> 5.4.3 Access State Propagation
>
> Issue: the NFSv4 spec says that READs/WRITEs do not require the same
> principal as the OPEN. This opens a security hole, but some
> implementations depend on it. pNFS should probably go with the spec on
> this and not change the semantics.

I still disagree here. In my opinion this was a mistake in the protocol -- users shouldn't be able to close other users' opens. At the very least, we shouldn't allow it to propagate further.

--b.
From nfsv4-bounces@ietf.org Mon Jun 20 16:04:18 2005
From: "J. Bruce Fields"
To: Garth Goodson
Cc: nfsv4@ietf.org
Date: Mon, 20 Jun 2005 16:04:14 -0400
Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt)

On Mon, Jun 20, 2005 at 12:39:59PM -0700, Garth Goodson wrote:
> 3.1.1 Device IDs
>
> Proposal: device IDs are valid only while a server is up; remappings
> while the server is up must use a different device ID. (I prefer this.)
>
> Alternative: device IDs are attached to leases and may be timed out
> (they probably also need to be recallable or able to be invalidated).

Note that the language in 3.1.1 also needs to be clarified to make clear which of the above is proposed. (E.g., it says the mapping may change on reboot, but it doesn't explicitly forbid changes between reboots; in fact the last sentence implies that such changes are allowable.)

--b.
From nfsv4-bounces@ietf.org Mon Jun 20 16:51:15 2005
From: Garth Goodson
To: "J. Bruce Fields"
Cc: nfsv4@ietf.org
Date: Mon, 20 Jun 2005 13:50:58 -0700
Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt)

J. Bruce Fields wrote:
> Note that the language in 3.1.1 also needs to be clarified to make clear
> which of the above is proposed. (E.g., it says the mapping may change on
> reboot, but it doesn't explicitly forbid changes between reboots; in fact
> the last sentence implies that such changes are allowable.)

The last sentence is saying something different. It is saying that if the mapping between the data and the device ID changes, then the layout should be recalled -- not the mapping between the device ID and the device.

There are a number of wording changes that will be made (throughout the document). I have not commented on these types of changes.

-Garth
From nfsv4-bounces@ietf.org Mon Jun 20 16:53:05 2005
From: Marc Eshel
To: Garth Goodson
Cc: nfsv4@ietf.org
Date: Mon, 20 Jun 2005 13:52:38 -0700
Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt)

nfsv4-bounces@ietf.org wrote on 06/20/2005 12:39:59 PM:

> 5.1 File Striping and Data Access
>
> Change: simplify the striping layout -- have an enum for SPARSE vs.
> DENSE layouts instead of a skip and start offset.
>
> Issue: think about what error gets returned if a client performs a
> non-(READ/WRITE/PUTFH/COMMIT) operation at a data server; this may be a
> problem if a regular NFSv4 data server is used, as it has no way to
> differentiate the accesses.

Why should it be an error in the first place? I would like all the nodes in my cluster filesystem to take the roles of metadata server or data server for each file independently.

Marc.
From nfsv4-bounces@ietf.org Mon Jun 20 16:55:46 2005
From: "J. Bruce Fields"
To: Garth Goodson
Cc: nfsv4@ietf.org
Date: Mon, 20 Jun 2005 16:55:42 -0400
Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt)

On Mon, Jun 20, 2005 at 01:50:58PM -0700, Garth Goodson wrote:
> The last sentence is saying something different. It is saying that if
> the mapping between the data and the device ID changes, then the layout
> should be recalled -- not the mapping between the device ID and the
> device.

OK, I guess I see. Though maybe that "SHOULD" should be a "MUST".

> There are a number of wording changes that will be made (throughout the
> document). I have not commented on these types of changes.

Fair enough.

--b.
From nfsv4-bounces@ietf.org Mon Jun 20 16:59:49 2005
From: Garth Goodson
To: Marc Eshel
Cc: nfsv4@ietf.org
Date: Mon, 20 Jun 2005 13:59:34 -0700
Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt)

I agree. I don't think it need be an error, but if the system isn't designed to handle it, it would be nice if the data servers did/could reply with an error.

-Garth

Marc Eshel wrote:
> Why should it be an error in the first place? I would like all the
> nodes in my cluster filesystem to take the roles of metadata server or
> data server for each file independently.
> Marc.
From nfsv4-bounces@ietf.org Mon Jun 20 17:04:23 2005
From: Dean Hildebrand
To: Garth Goodson
Cc: Marc Eshel, nfsv4@ietf.org
Date: Mon, 20 Jun 2005 17:03:47 -0400 (EDT)
Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt)

If they are stock, how can they return a new error unless they know something, in which case they are no longer stock?
Dean

On Mon, 20 Jun 2005, Garth Goodson wrote:
> I agree. I don't think it need be an error, but if the system isn't
> designed to handle it, it would be nice if the data servers did/could
> reply with an error.
>
> -Garth
From nfsv4-bounces@ietf.org Mon Jun 20 18:06:24 2005
From: Garth Goodson
To: Dean Hildebrand
Cc: nfsv4@ietf.org
Date: Mon, 20 Jun 2005 15:06:10 -0700
Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt)

Correct, that is one of the issues I mentioned. But I think there are cases where the data servers are not stock and where it is not desirable that they service more than I/O requests (i.e., that they service metadata requests), in which case they may return an error (and that error should be decided upon).

-Garth

Dean Hildebrand wrote:
> If they are stock, how can they return a new error unless they know
> something, in which case they are no longer stock?
> Dean
From nfsv4-bounces@ietf.org Tue Jun 21 09:39:20 2005
From: "Noveck, Dave"
To: "Goodson, Garth", Dean Hildebrand
Cc: nfsv4@ietf.org
Date: Tue, 21 Jun 2005 09:39:07 -0400
Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt)

Hold on. I'm not clear exactly how people are supposing that this will work.

If I have a given server, and let us say it is the metadata server for the root directory of an fs, then with v4 (and so far pNFS) as it stands it is the metadata server for every descendent object until and unless the fsid changes. If a client were to decide arbitrarily that it could try to do metadata operations on a different server with that same filehandle, then I would say that that client is seriously confused and we'd like to know about it.

If the server is deciding, in Marc's words, that "nodes in my cluster filesystem ... take the roles of metadata server or data server for each file independently", how is the client to determine what the proper metadata server for each file is? Are we talking about some protocol extension beyond what has already been discussed for pNFS to distribute the data server role? We haven't talked about anything to specifically support distribution of the metadata server role for a filesystem. I'm not saying that that would be a bad thing to have, but it seems to be a new thing, and right now I think the pNFS focus should be more on nailing down the stuff already discussed.

It is possible with a cluster filesystem to have multiple clients each mount different servers as the metadata server and leave it to the cluster filesystem to provide the coherence for metadata operations. In that case multiple servers would be acting as metadata servers for what is really the same filesystem. However, in that case, I would argue that we still want the language and the error in Garth's draft (even if Garth now seems to be backing away from it). Suppose a client is talking to server A, a file has filehandle X, and that file has two stripes, one with handle P on server B and the other with handle Q on server C. Those particular handles (should) give you the ability to do data operations and nothing else. If you do metadata operations with them, you should get an error, since you are doing something that pNFS does not allow. This is without regard to the fact that B and C may have the ability to act as metadata servers for other handles. However, they should not act as metadata servers for stripes P and Q, since P and Q are stripes and not files. Even if you had a file with a single stripe, the handle for that stripe as a file should be different from the one that only gives the client the right to do I/O -- P and P', let's say. If a client takes a handle that gives him the right to do I/O (he got it from a layout) and uses it for metadata operations, he is violating the protocol and should get an error (and the server SHOULD make sure that he gets one).
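Dave's last paragraph argues that a filehandle obtained from a layout should authorize I/O operations and nothing else. A small, editor-supplied sketch of what that server-side check might look like (the operation set mirrors the READ/WRITE/PUTFH/COMMIT list from item 5.1; the error raised and the idea of tagging handles in a set are assumptions, since the actual error code is one of the open issues in the thread):

```python
# Operations a pNFS data server is expected to accept on a layout
# (I/O-only) filehandle, per the discussion above.
IO_OPS = {"READ", "WRITE", "COMMIT", "PUTFH"}


class DataServer:
    def __init__(self):
        # Filehandles that were handed out via layouts and therefore
        # only authorize I/O; regular filehandles are not in this set.
        self.io_only_handles = set()

    def register_layout_handle(self, fh: bytes):
        self.io_only_handles.add(fh)

    def dispatch(self, op: str, fh: bytes):
        if fh in self.io_only_handles and op not in IO_OPS:
            # Placeholder error -- the thread leaves the actual
            # error code to be decided.
            raise PermissionError("pNFS: metadata op on an I/O-only filehandle")
        return f"{op} accepted"
```

A completely stock NFSv4 server keeps no such distinction between handles, which is exactly Dean's earlier objection: without some pNFS awareness it cannot tell an I/O-only handle from a regular one.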
From nfsv4-bounces@ietf.org Tue Jun 21 12:51:17 2005
From: Marc Eshel
To: "Noveck, Dave"
Cc: "Goodson, Garth", nfsv4@ietf.org
Date: Tue, 21 Jun 2005 09:50:44 -0700
Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt)

nfsv4-bounces@ietf.org wrote on 06/21/2005 06:39:07 AM:

> If the server is deciding, in Marc's words, that "nodes in my cluster
> filesystem ... take the roles of metadata server or data server for
> each file independently", how is the client to determine what the
> proper metadata server for each file is? Are we talking about some
> protocol extension beyond what has already been discussed for pNFS
> to distribute the data server role?

With fs_locations we give the client a list of metadata servers that can act as the server for a given filesystem, and the client can choose which one to use and switch among them at will or because of some failure. I agree that once you have got a layout for a file you should follow the layout's instructions, read/write only from the specified nodes, and get an error if you don't.

> If a client takes a handle that gives him the right to do I/O (he got
> it from a layout) and uses it for metadata operations, he is violating
> the protocol and should get an error (and the server SHOULD make sure
> that he gets one).

Yes, once you have started to use metadata server A for file X you should stick to it as the metadata server for that file until you are done with it, and follow the layout to use the appropriate data servers. I am not sure about the restriction on the usage of filehandles, though. It is an added complication to the server implementation to add information to the fh that restricts its usage from specific nodes. We should say that the client should not use a fh given by a layout for other purposes, but not require an error if it does.

Marc.
> > -----Original Message----- > From: Goodson, Garth > Sent: Monday, June 20, 2005 6:06 PM > To: Dean Hildebrand > Cc: nfsv4@ietf.org > Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) > > > Correct, that is one of the issues I mentioned. But I think there are > cases where the data servers are neither stock, nor where it is > desirable that they service more than I/O requests (i.e., metadata > requests) in which case they may return an error (and that error should > be decided upon). > > -Garth > > Dean Hildebrand wrote: > > If they are stock, how can they return a new error unless they know > > something, in which case they are no longer stock. > > Dean > > > > On Mon, 20 Jun 2005, Garth Goodson wrote: > > > > > >>I agree. I don't think it need be an error, but if the system isn't > >>designed to handle it, it would be nice if the data servers did/could > >>reply with an error. > >> > >> > >>-Garth > >> > >>Marc Eshel wrote: > >> > >>> > >>>nfsv4-bounces@ietf.org wrote on 06/20/2005 12:39:59 PM: > >>> > >>> > 5.1 File Striping and Data Access > >>> > > >>> > Change: simplify striping layout -- have enum for SPARSE vs. DENSE > >>> > layout instead of skip and start offset > >>> > > >>> > Issue: think about what error gets returned if a client performs a > >>> > non(READ/WRITE/PUTFH/COMMIT) at a data server; issue: this may be a > >>> > problem if a regular nfsv4 data server is used as it has no way to > >>> > differentiate accesses. > >>> > > >>> > >>>Why is should it be an error in the first place. I would like all the > >>>nodes in my cluster filesystem to take the roles of metadata server or > >>>data server for each file independently. > >>>Marc. > >> > >>_______________________________________________ > >>nfsv4 mailing list > >>nfsv4@ietf.org > >>https://www1.ietf.org/mailman/listinfo/nfsv4 > >> > > _______________________________________________ > nfsv4 mailing list > nfsv4@ietf.org > https://www1.ietf.org/mailman/listinfo/nfsv4 > > _______________________________________________ > nfsv4 mailing list > nfsv4@ietf.org > https://www1.ietf.org/mailman/listinfo/nfsv4 --=_alternative 005C7F9688257027_= Content-Type: text/html; charset="US-ASCII"

--=_alternative 005C7F9688257027_=-- --===============1887892119== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 --===============1887892119==-- From nfsv4-bounces@ietf.org Tue Jun 21 12:59:37 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dkm65-0007aB-LN; Tue, 21 Jun 2005 12:59:37 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dkm63-0007a1-Bq; Tue, 21 Jun 2005 12:59:35 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id MAA07787; Tue, 21 Jun 2005 12:59:32 -0400 (EDT) Received: from mx1.netapp.com ([216.240.18.38]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1DkmTz-0002az-RF; Tue, 21 Jun 2005 13:24:21 -0400 Received: from smtp2.corp.netapp.com (10.57.159.114) by mx1.netapp.com with ESMTP; 21 Jun 2005 09:59:25 -0700 X-IronPort-AV: i="3.93,218,1115017200"; d="scan'208"; a="201421316:sNHT19941100" Received: from [10.34.24.132] (loderunner.hq.netapp.com [10.34.24.132]) by smtp2.corp.netapp.com (8.13.1/8.13.1/NTAP-1.6) with ESMTP id j5LGxNtr005338; Tue, 21 Jun 2005 09:59:23 -0700 (PDT) Message-ID: <42B8476B.2020006@netapp.com> Date: Tue, 21 Jun 2005 09:59:23 -0700 From: Garth Goodson User-Agent: Debian Thunderbird 1.0.2 (X11/20050602) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Marc Eshel Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: 0.0 (/) X-Scan-Signature: 7d33c50f3756db14428398e2bdedd581 Content-Transfer-Encoding: 7bit Cc: nfsv4-bounces@ietf.org, "Noveck, Dave" , nfsv4@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org > Yes, once you started to use a metadata server A for file X you should > stick to it as the metadata server for the file until you are done with > it and follow the layout to use the appropriate data servers. I am not > sure why the restriction on the usage of file handles. It is an add > complication to the server implementation to add information to the fh > that restricts it usage from specific nodes. We should say that the > client should not use fh given by layout for other purposes but not > require an error if it does. > > Marc. > Currently, I'm not sure that we can require an error (as much as I would like to require it). Some people want to be able to use unmodified NFSv4 servers as the data servers. If this is allowed those data servers will not know to return an error. This needs to be worked out. I still believe that it would help the client if an error could be returned (as Dave pointed out), especially if the data server can not service the metadata operation. 
-Garth _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Tue Jun 21 13:12:56 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DkmIy-0001CK-5P; Tue, 21 Jun 2005 13:12:56 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DkmIx-0001CF-3e for nfsv4@megatron.ietf.org; Tue, 21 Jun 2005 13:12:55 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id NAA08632 for ; Tue, 21 Jun 2005 13:12:52 -0400 (EDT) Received: from mx2.netapp.com ([216.240.18.37]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1Dkmgq-0002uX-It for nfsv4@ietf.org; Tue, 21 Jun 2005 13:37:41 -0400 Received: from smtp1.corp.netapp.com (10.57.156.124) by mx2.netapp.com with ESMTP; 21 Jun 2005 10:12:40 -0700 X-IronPort-AV: i="3.93,218,1115017200"; d="scan'208"; a="252475630:sNHT18287720" Received: from svlexc03.hq.netapp.com (svlexc03.corp.netapp.com [10.57.156.149]) by smtp1.corp.netapp.com (8.13.1/8.13.1/NTAP-1.6) with ESMTP id j5LHCdEF022636 for ; Tue, 21 Jun 2005 10:12:39 -0700 (PDT) Received: from lavender.hq.netapp.com ([10.56.11.75]) by svlexc03.hq.netapp.com with Microsoft SMTPSVC(6.0.3790.0); Tue, 21 Jun 2005 10:12:39 -0700 Received: from exnane01.hq.netapp.com ([10.97.0.61]) by lavender.hq.netapp.com with Microsoft SMTPSVC(5.0.2195.6713); Tue, 21 Jun 2005 10:12:39 -0700 Received: from tmt.netapp.com ([10.97.6.31]) by exnane01.hq.netapp.com with Microsoft SMTPSVC(6.0.3790.0); Tue, 21 Jun 2005 13:12:37 -0400 Message-Id: <6.2.1.2.2.20050621130609.048c2ca0@exnane01.nane.netapp.com> X-Mailer: QUALCOMM Windows Eudora Version 6.2.1.2 Date: Tue, 21 Jun 2005 13:12:26 -0400 To: nfsv4@ietf.org From: "Talpey, Thomas" Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) In-Reply-To: <42B8476B.2020006@netapp.com> References: <42B8476B.2020006@netapp.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" X-OriginalArrivalTime: 21 Jun 2005 17:12:37.0908 (UTC) FILETIME=[6C903D40:01C57684] X-Spam-Score: 0.0 (/) X-Scan-Signature: d6b246023072368de71562c0ab503126 X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org At 12:59 PM 6/21/2005, Garth Goodson wrote: >Currently, I'm not sure that we can require an error (as much as I would >like to require it). Some people want to be able to use unmodified >NFSv4 servers as the data servers. If this is allowed those data >servers will not know to return an error. In the presence of a session (which is guaranteed by pNFS, right?), the v4 server can easily remember whether the client previously requested and received a device list. Of course, this doesn't prove that a given client request is non-conflicting (the client could be mixing regular v4 and pNFS traffic on the session), but with a little common sense specsmanship I think it could be forbidden on a per-pNFS-session basis. In fact, I think it would be a good idea, since the server can and should be able to tune its resources to match the fact that clients won't be doing much data transfer to the metadata server. Tom. 
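One way to read the per-session suggestion above is sketched below: once a client has fetched a device list (or a layout) on a session, the server marks that session and can refuse metadata operations arriving on it. The session structure, the GETDEVICELIST trigger, and the NFS4ERR_NOTSUPP placeholder are assumptions made for the example, not protocol text.

/*
 * Rough sketch of the per-pNFS-session idea: remember on the session
 * that the client is doing pNFS I/O, and refuse metadata operations
 * arriving on that session.  All names here are illustrative.
 */
#include <stdbool.h>
#include <stdio.h>

enum nfs_op  { OP_PUTFH, OP_READ, OP_WRITE, OP_COMMIT, OP_GETATTR, OP_GETDEVICELIST };
enum nfs_err { NFS4_OK = 0, NFS4ERR_NOTSUPP = 10004 };

struct session {
    unsigned long id;
    bool pnfs_data_only;     /* remembered across requests on this session */
};

static enum nfs_err dispatch(struct session *s, enum nfs_op op)
{
    if (op == OP_GETDEVICELIST) {
        s->pnfs_data_only = true;    /* this session is being used for pNFS I/O */
        return NFS4_OK;
    }
    if (s->pnfs_data_only &&
        op != OP_PUTFH && op != OP_READ && op != OP_WRITE && op != OP_COMMIT)
        return NFS4ERR_NOTSUPP;      /* forbidden on a per-pNFS-session basis */
    return NFS4_OK;
}

int main(void)
{
    struct session s = { .id = 1, .pnfs_data_only = false };

    dispatch(&s, OP_GETDEVICELIST);                        /* client sets up pNFS */
    printf("READ    -> %d\n", dispatch(&s, OP_READ));      /* still allowed */
    printf("GETATTR -> %d\n", dispatch(&s, OP_GETATTR));   /* refused */
    return 0;
}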
_______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Tue Jun 21 13:14:32 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DkmKW-0001ID-J6; Tue, 21 Jun 2005 13:14:32 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DkmKV-0001I0-44; Tue, 21 Jun 2005 13:14:31 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id NAA08819; Tue, 21 Jun 2005 13:14:28 -0400 (EDT) Received: from e2.ny.us.ibm.com ([32.97.182.142]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1DkmiR-0002xt-Ib; Tue, 21 Jun 2005 13:39:16 -0400 Received: from d01relay02.pok.ibm.com (d01relay02.pok.ibm.com [9.56.227.234]) by e2.ny.us.ibm.com (8.12.11/8.12.11) with ESMTP id j5LHEFF7000994; Tue, 21 Jun 2005 13:14:15 -0400 Received: from d01av03.pok.ibm.com (d01av03.pok.ibm.com [9.56.224.217]) by d01relay02.pok.ibm.com (8.12.10/NCO/VERS6.7) with ESMTP id j5LHEFKK253048; Tue, 21 Jun 2005 13:14:15 -0400 Received: from d01av03.pok.ibm.com (loopback [127.0.0.1]) by d01av03.pok.ibm.com (8.12.11/8.13.3) with ESMTP id j5LHE5XC012983; Tue, 21 Jun 2005 13:14:05 -0400 Received: from [9.56.227.90] (d01ml604.pok.ibm.com [9.56.227.90]) by d01av03.pok.ibm.com (8.12.11/8.12.11) with ESMTP id j5LHE5m8012793; Tue, 21 Jun 2005 13:14:05 -0400 In-Reply-To: <42B8476B.2020006@netapp.com> To: Garth Goodson Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) MIME-Version: 1.0 X-Mailer: Lotus Notes Build V70_M4_01112005 Beta 3NP January 11, 2005 Message-ID: From: Marc Eshel Date: Tue, 21 Jun 2005 10:13:51 -0700 X-MIMETrack: Serialize by Router on D01ML604/01/M/IBM(Build V70_06092005|June 09, 2005) at 06/21/2005 13:14:05, Serialize complete at 06/21/2005 13:14:05 X-Spam-Score: 0.0 (/) X-Scan-Signature: 3002fc2e661cd7f114cb6bae92fe88f1 Cc: nfsv4-bounces@ietf.org, "Noveck, Dave" , nfsv4@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: multipart/mixed; boundary="===============0316768988==" Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org This is a multipart message in MIME format. --===============0316768988== Content-Type: multipart/alternative; boundary="=_alternative 005E9D5788257027_=" This is a multipart message in MIME format. --=_alternative 005E9D5788257027_= Content-Type: text/plain; charset="US-ASCII" nfsv4-bounces@ietf.org wrote on 06/21/2005 09:59:23 AM: > > Yes, once you started to use a metadata server A for file X you should > > stick to it as the metadata server for the file until you are done with > > it and follow the layout to use the appropriate data servers. I am not > > sure why the restriction on the usage of file handles. It is an add > > complication to the server implementation to add information to the fh > > that restricts it usage from specific nodes. We should say that the > > client should not use fh given by layout for other purposes but not > > require an error if it does. > > > > Marc. > > > > Currently, I'm not sure that we can require an error (as much as I would > like to require it). Some people want to be able to use unmodified > NFSv4 servers as the data servers. If this is allowed those data > servers will not know to return an error. 
This needs to be worked out. > I still believe that it would help the client if an error could be > returned (as Dave pointed out), especially if the data server can not > service the metadata operation. > > -Garth > If the data server can not service the metadata operation you will get an error. The only problem is if it can but it should not. I don't see why we would like to restrict it. Marc. --=_alternative 005E9D5788257027_= Content-Type: text/html; charset="US-ASCII"


Marc. --=_alternative 005E9D5788257027_=-- --===============0316768988== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 --===============0316768988==-- From nfsv4-bounces@ietf.org Tue Jun 21 14:15:47 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DknEA-0003Wn-6V; Tue, 21 Jun 2005 14:12:02 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DknE8-0003Wa-5t; Tue, 21 Jun 2005 14:12:00 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id OAA13849; Tue, 21 Jun 2005 14:11:59 -0400 (EDT) Received: from e6.ny.us.ibm.com ([32.97.182.146]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1Dknc4-0004Wh-32; Tue, 21 Jun 2005 14:36:46 -0400 Received: from d01relay04.pok.ibm.com (d01relay04.pok.ibm.com [9.56.227.236]) by e6.ny.us.ibm.com (8.12.11/8.12.11) with ESMTP id j5LIBl0J025047; Tue, 21 Jun 2005 14:11:47 -0400 Received: from d01av04.pok.ibm.com (d01av04.pok.ibm.com [9.56.224.64]) by d01relay04.pok.ibm.com (8.12.10/NCO/VERS6.7) with ESMTP id j5LIBlW1149032; Tue, 21 Jun 2005 14:11:47 -0400 Received: from d01av04.pok.ibm.com (loopback [127.0.0.1]) by d01av04.pok.ibm.com (8.12.11/8.13.3) with ESMTP id j5LIBkQN029939; Tue, 21 Jun 2005 14:11:46 -0400 Received: from [9.56.227.90] (d01ml604.pok.ibm.com [9.56.227.90]) by d01av04.pok.ibm.com (8.12.11/8.12.11) with ESMTP id j5LIBkKZ029927; Tue, 21 Jun 2005 14:11:46 -0400 In-Reply-To: <6.2.1.2.2.20050621130609.048c2ca0@exnane01.nane.netapp.com> To: "Talpey, Thomas" Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) MIME-Version: 1.0 X-Mailer: Lotus Notes Build V70_M4_01112005 Beta 3NP January 11, 2005 Message-ID: From: Marc Eshel Date: Tue, 21 Jun 2005 11:11:38 -0700 X-MIMETrack: Serialize by Router on D01ML604/01/M/IBM(Build V70_06092005|June 09, 2005) at 06/21/2005 14:11:46, Serialize complete at 06/21/2005 14:11:46 X-Spam-Score: 0.1 (/) X-Scan-Signature: bdc523f9a54890b8a30dd6fd53d5d024 Cc: nfsv4@ietf.org, nfsv4-bounces@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: multipart/mixed; boundary="===============1076809633==" Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org This is a multipart message in MIME format. --===============1076809633== Content-Type: multipart/alternative; boundary="=_alternative 0063E7AE88257027_=" This is a multipart message in MIME format. --=_alternative 0063E7AE88257027_= Content-Type: text/plain; charset="US-ASCII" nfsv4-bounces@ietf.org wrote on 06/21/2005 10:12:26 AM: > At 12:59 PM 6/21/2005, Garth Goodson wrote: > >Currently, I'm not sure that we can require an error (as much as I would > >like to require it). Some people want to be able to use unmodified > >NFSv4 servers as the data servers. If this is allowed those data > >servers will not know to return an error. > > In the presence of a session (which is guaranteed by pNFS, right?), the > v4 server can easily remember whether the client previously requested > and received a device list. 
Of course, this doesn't prove that a given > client request is non-conflicting (the client could be mixing regular v4 > and pNFS traffic on the session), but with a little common sense > specsmanship I think it could be forbidden on a per-pNFS-session > basis. > > In fact, I think it would be a good idea, since the server can and should > be able to tune its resources to match the fact that clients won't be > doing much data transfer to the metadata server. > > Tom. > Yes it would be nice to configure some nodes to be just data server nodes but it would also be nice and useful to distribute all metadata and data server among all the nodes. The server has the control using exportfs and fs-locations to export its metadata nodes and layout information to list the data servers. If we say that the file handles given by layout should only be used for READ/WRITE/PUTFH/COMMIT it should be enough. I don't see the need for additional restrictions. If you do please explain why. Marc. --=_alternative 0063E7AE88257027_= Content-Type: text/html; charset="US-ASCII"


Marc. --=_alternative 0063E7AE88257027_=-- --===============1076809633== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 --===============1076809633==-- From nfsv4-bounces@ietf.org Tue Jun 21 16:51:33 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DkpiV-0002fF-GK; Tue, 21 Jun 2005 16:51:33 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DkpiQ-0002eh-OK; Tue, 21 Jun 2005 16:51:26 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id QAA13592; Tue, 21 Jun 2005 16:51:24 -0400 (EDT) Received: from mx1.netapp.com ([216.240.18.38]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1Dkq6P-0005Qh-Ba; Tue, 21 Jun 2005 17:16:14 -0400 Received: from smtp2.corp.netapp.com (10.57.159.114) by mx1.netapp.com with ESMTP; 21 Jun 2005 13:51:10 -0700 X-IronPort-AV: i="3.93,218,1115017200"; d="scan'208,217"; a="201840810:sNHT32839732" Received: from svlexc03.hq.netapp.com (svlexc03.corp.netapp.com [10.57.156.149]) by smtp2.corp.netapp.com (8.13.1/8.13.1/NTAP-1.6) with ESMTP id j5LKp9iX005818; Tue, 21 Jun 2005 13:51:09 -0700 (PDT) Received: from lavender.hq.netapp.com ([10.56.11.75]) by svlexc03.hq.netapp.com with Microsoft SMTPSVC(6.0.3790.0); Tue, 21 Jun 2005 13:51:09 -0700 Received: from exnane01.hq.netapp.com ([10.97.0.61]) by lavender.hq.netapp.com with Microsoft SMTPSVC(5.0.2195.6713); Tue, 21 Jun 2005 13:51:08 -0700 X-MimeOLE: Produced By Microsoft Exchange V6.5.7226.0 Content-class: urn:content-classes:message MIME-Version: 1.0 Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) Date: Tue, 21 Jun 2005 16:51:07 -0400 Message-ID: Thread-Topic: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) Thread-Index: AcV2gWVZzDC6hwe2R4CMTxLYr3RIDwAHJckg From: "Noveck, Dave" To: "Marc Eshel" X-OriginalArrivalTime: 21 Jun 2005 20:51:08.0887 (UTC) FILETIME=[F350DE70:01C576A2] X-Spam-Score: 0.9 (/) X-Scan-Signature: e05124dbe0e171b371ff9d88326a1ab7 Cc: "Goodson, Garth" , nfsv4-bounces@ietf.org, nfsv4@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: multipart/mixed; boundary="===============2090594847==" Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org This is a multi-part message in MIME format. --===============2090594847== Content-class: urn:content-classes:message Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C576A2.F279A5EF" This is a multi-part message in MIME format. 
------_=_NextPart_001_01C576A2.F279A5EF Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable > With fs-locations we give the client a list of metadata server=20 > that can act as the server for given filesystem and the client=20 > can choose which one to use and switch among them at will or=20 > because of some failure.=20 =20 The problem is that in v4.0 as it stands there is an enormous range of ways that a client can interpret the fs_locations list: =20 "Here is a list of servers and you can switch when there is failure, after refetching your change attibutes (because change attribute is within the purview of a particular server)." (and that is not the most extreme case of discontinuity -- some people are thinking that=20 filehandles and fileids will just change or that you may wind up with a slighly out-of-data version of the data). =20 "Here is a list of servers and you can switch when there is a=20 failure with no discontinuity of access (changes in fh's, stateid's, fileids, change attributes), although changing is a big deal and you shouldn't do it without good cause." =20 "Here is a list of servers and you can switch when there is a failure with no discontinuity of access or even at will since there is no big cost to switch." =20 "Here is a list of servers and you can access any of these=20 servers as you will at the same time (multi-pathing or a=20 cluster fs) since they are all effectively the same thing". =20 The problem is that the client only knows about the list and has no way of knowing which of the statements above is associated with the list of servers he is getting. I have been thinking about a locations_info attribute for 4.1 that would allow the server to tell the client which of those he meant and also=20 give preference information (local vs. remote copies, absolutely up-to-data vs. slightly out-of-data copies). =20 > is solely within the purview of a particular server =20 > I agree that once you got a layout for a file you should follow=20 > the layout instruction and read/write only from the specified=20 > nodes and get an error if you don't.=20 =20 Then it sounds like we are in violent agreement, except maybe for choice of modal auxiliaries or capitalization. =20 You say "you should follow the layout instruction" and I'm torn between saying "you SHOULD follow the layout instruction" and "you MUST follow the layout instruction". =20 You say "[should] get an error if you don't" and I say "the server SHOULD give you an error if you don't". -----Original Message----- From: Marc Eshel [mailto:eshel@almaden.ibm.com]=20 Sent: Tuesday, June 21, 2005 12:51 PM To: Noveck, Dave Cc: Dean Hildebrand; Goodson, Garth; nfsv4@ietf.org; nfsv4-bounces@ietf.org Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) =09 =09 nfsv4-bounces@ietf.org wrote on 06/21/2005 06:39:07 AM: =09 > Hold on. I'm not clear exactly how people are supposing that this > will work. >=20 > If I have a given server and let us say it is the metadata server > for the root directory of an fs, then with v4 (and so far pnfs) as > it stands it is the metadata server for every every descendent object > until and unless the fsid changes. If a client were to decide > arbitrarily that it could try to do metadata operations on a different > server with that same filehandle, then I would say that that client=20 > is seriouly confused and we'd like to know about it. >=20 > If the server is deciding, in Marc's words, that "nodes in my cluster=20 > filesystem ... 
take the roles of metadata server or data server for=20 > each file independently", how is the client to determine what the=20 > proper metadata server for each file is? Are we talking about some > protocol extension beyond what has already been discussed for pnfs > to distribute the data server role? We haven't talked about anything > to specifically support distribution of the metadata server role for > a filesystem. I'm not saying that that would be a bad thing to have,=20 > but it seems to be a new thing and right now I think the pnfs focus=20 > should be more on nailing down the stuff already discussed. =09 With fs-locations we give the client a list of metadata server that can act as the server for given filesystem and the client can choose which one to use and switch among them at will or because of some failure. I agree that once you got a layout for a file you should follow the layout instruction and read/write only from the specified nodes and get an error if you don't.=20 =20 > It is possible with a cluster filesystem to have multiple clients > each mount different servers as the metadata server and leave it to > the cluster filesystem to provide the coherence for metadata operations. > In that case multiple servers would be acting as metadata servers for > what is really the same filesystem. However, in that case, I would > argue that we still want the language and the error in Garth's draft > (even if Garth now seems to backing away from it). Suppose a client > is talking to server A and a file has filehandle X and that file > has two stripes, one with handle P on server B and the other with > handle Q on server C. Those particular handles (should) give you=20 > the ability to do data operations and nothing else. If you do metadata=20 > operations with them, you should get an error, since you are doing > something that pnfs does not allow. This is without regard to the > fact that B and C may have the ability to act as metadata servers > for other handles. However, they should not act as metadata servers > for stripes P and Q since P and Q are stripes and not files. Even if=20 > you had a file with a single stripe, the handle for that stripe as a > file should be different from the one that only gives the client the > right to do IO, P and P' let's say. If a client takes a handle that=20 > gives him the right to do IO (he got it from a layout) and uses it > for metadata operations, he is violating the protocol and should get > an error (and the server SHOULD make sure that he gets one). =09 Yes, once you started to use a metadata server A for file X you should stick to it as the metadata server for the file until you are done with it and follow the layout to use the appropriate data servers. I am not sure why the restriction on the usage of file handles. It is an add complication to the server implementation to add information to the fh that restricts it usage from specific nodes. We should say that the client should not use fh given by layout for other purposes but not require an error if it does.=20 =09 Marc. =20 =09 >=20 > -----Original Message----- > From: Goodson, Garth=20 > Sent: Monday, June 20, 2005 6:06 PM > To: Dean Hildebrand > Cc: nfsv4@ietf.org > Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) >=20 >=20 > Correct, that is one of the issues I mentioned. 
But I think there are=20 > cases where the data servers are neither stock, nor where it is=20 > desirable that they service more than I/O requests (i.e., metadata=20 > requests) in which case they may return an error (and that error should=20 > be decided upon). >=20 > -Garth >=20 > Dean Hildebrand wrote: > > If they are stock, how can they return a new error unless they know > > something, in which case they are no longer stock. > > Dean > >=20 > > On Mon, 20 Jun 2005, Garth Goodson wrote: > >=20 > >=20 > >>I agree. I don't think it need be an error, but if the system isn't > >>designed to handle it, it would be nice if the data servers did/could > >>reply with an error. > >> > >> > >>-Garth > >> > >>Marc Eshel wrote: > >> > >>> > >>>nfsv4-bounces@ietf.org wrote on 06/20/2005 12:39:59 PM: > >>> > >>> > 5.1 File Striping and Data Access > >>> > > >>> > Change: simplify striping layout -- have enum for SPARSE vs. DENSE > >>> > layout instead of skip and start offset > >>> > > >>> > Issue: think about what error gets returned if a client performs a > >>> > non(READ/WRITE/PUTFH/COMMIT) at a data server; issue: this may be a > >>> > problem if a regular nfsv4 data server is used as it has no way to > >>> > differentiate accesses. > >>> > > >>> > >>>Why is should it be an error in the first place. I would like all the > >>>nodes in my cluster filesystem to take the roles of metadata server or > >>>data server for each file independently. > >>>Marc. > >> > >>_______________________________________________ > >>nfsv4 mailing list > >>nfsv4@ietf.org > >>https://www1.ietf.org/mailman/listinfo/nfsv4 > >> >=20 > _______________________________________________ > nfsv4 mailing list > nfsv4@ietf.org > https://www1.ietf.org/mailman/listinfo/nfsv4 >=20 > _______________________________________________ > nfsv4 mailing list > nfsv4@ietf.org > https://www1.ietf.org/mailman/listinfo/nfsv4 =09 ------_=_NextPart_001_01C576A2.F279A5EF Content-Type: text/html; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Message
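For illustration only, here is one shape the locations_info hint described above could take, mirroring the four readings of fs_locations that were listed. The type and field names are invented for this sketch; nothing like this has been agreed for 4.1.

/*
 * Illustration only: a possible locations_info hint.  Names and fields
 * are invented; they are not draft text.
 */
#include <stdio.h>

enum fs_location_class {
    FSLOC_SWITCH_ON_FAILURE_DISCONTINUOUS,  /* refetch change attrs; fh/fileid may change */
    FSLOC_SWITCH_ON_FAILURE_CONTINUOUS,     /* no discontinuity, but switching is costly  */
    FSLOC_SWITCH_AT_WILL,                   /* no discontinuity, switching is cheap       */
    FSLOC_SIMULTANEOUS                      /* multi-pathing / cluster fs: use any, any time */
};

struct fs_location_hint {
    enum fs_location_class kind;    /* which of the four statements the server means */
    int preference;                 /* e.g. local vs. remote copy; lower is better    */
    int currency;                   /* 0 = absolutely up to date, >0 = may lag        */
};

int main(void)
{
    struct fs_location_hint hint = {
        .kind = FSLOC_SWITCH_AT_WILL, .preference = 0, .currency = 0
    };
    printf("kind=%d preference=%d currency=%d\n",
           hint.kind, hint.preference, hint.currency);
    return 0;
}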
= =00 ------_=_NextPart_001_01C576A2.F279A5EF-- --===============2090594847== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 --===============2090594847==-- From nfsv4-bounces@ietf.org Tue Jun 21 18:27:48 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DkrDg-0002mb-72; Tue, 21 Jun 2005 18:27:48 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DkrDc-0002mT-HN; Tue, 21 Jun 2005 18:27:45 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id SAA20295; Tue, 21 Jun 2005 18:27:41 -0400 (EDT) Received: from e1.ny.us.ibm.com ([32.97.182.141]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1Dkrbb-0000Xl-53; Tue, 21 Jun 2005 18:52:32 -0400 Received: from d01relay02.pok.ibm.com (d01relay02.pok.ibm.com [9.56.227.234]) by e1.ny.us.ibm.com (8.12.11/8.12.11) with ESMTP id j5LMRTwH030433; Tue, 21 Jun 2005 18:27:29 -0400 Received: from d01av04.pok.ibm.com (d01av04.pok.ibm.com [9.56.224.64]) by d01relay02.pok.ibm.com (8.12.10/NCO/VERS6.7) with ESMTP id j5LMRTKK259646; Tue, 21 Jun 2005 18:27:29 -0400 Received: from d01av04.pok.ibm.com (loopback [127.0.0.1]) by d01av04.pok.ibm.com (8.12.11/8.13.3) with ESMTP id j5LMRTAx005040; Tue, 21 Jun 2005 18:27:29 -0400 Received: from [9.56.227.90] (d01ml604.pok.ibm.com [9.56.227.90]) by d01av04.pok.ibm.com (8.12.11/8.12.11) with ESMTP id j5LMRTJM005037; Tue, 21 Jun 2005 18:27:29 -0400 In-Reply-To: To: "Noveck, Dave" Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) MIME-Version: 1.0 X-Mailer: Lotus Notes Build V70_M4_01112005 Beta 3NP January 11, 2005 Message-ID: From: Marc Eshel Date: Tue, 21 Jun 2005 15:27:20 -0700 X-MIMETrack: Serialize by Router on D01ML604/01/M/IBM(Build V70_06092005|June 09, 2005) at 06/21/2005 18:27:28, Serialize complete at 06/21/2005 18:27:28 Content-Type: text/plain; charset="US-ASCII" X-Spam-Score: 0.0 (/) X-Scan-Signature: 52f7a77164458f8c7b36b66787c853da Cc: "Goodson, Garth" , nfsv4-bounces@ietf.org, nfsv4@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org "Noveck, Dave" wrote on 06/21/2005 01:51:07 PM: > > With fs-locations we give the client a list of metadata server > > that can act as the server for given filesystem and the client > > can choose which one to use and switch among them at will or > > because of some failure. > > The problem is that in v4.0 as it stands there is an enormous > range of ways that a client can interpret the fs_locations list: > > "Here is a list of servers and you can switch when there is failure, > after refetching your change attibutes (because change attribute > is within the purview of a particular server)." (and that is not the > most extreme case of discontinuity -- some people are thinking that > filehandles and fileids will just change or that you may wind up > with a slighly out-of-data version of the data). 
> > "Here is a list of servers and you can switch when there is a > failure with no discontinuity of access (changes in fh's, stateid's, > fileids, change attributes), although changing is a big deal > and you shouldn't do it without good cause." > > "Here is a list of servers and you can switch when there is a > failure with no discontinuity of access or even at will since > there is no big cost to switch." > > "Here is a list of servers and you can access any of these > servers as you will at the same time (multi-pathing or a > cluster fs) since they are all effectively the same thing". > > The problem is that the client only knows about the list and > has no way of knowing which of the statements above is associated > with the list of servers he is getting. I have been thinking > about a locations_info attribute for 4.1 that would allow the > server to tell the client which of those he meant and also > give preference information (local vs. remote copies, absolutely > up-to-data vs. slightly out-of-data copies). > > > is solely within the purview of a particular server > > > I agree that once you got a layout for a file you should follow > > the layout instruction and read/write only from the specified > > nodes and get an error if you don't. > > Then it sounds like we are in violent agreement, except maybe for > choice of modal auxiliaries or capitalization. > > You say "you should follow the layout instruction" and I'm torn > between saying "you SHOULD follow the layout instruction" and > "you MUST follow the layout instruction". > > You say "[should] get an error if you don't" and I say "the server > SHOULD give you an error if you don't". > I say should follow and not MUST follow because I am trying to avoid the complication to the server if it MUST enforce this rule which might not be a problem for the server in the first place. For example, I might want to allow the read of the same large file(no caching) from one set of data server for client A and a different set of data server for client B. Now to enforce the above rule the server need to some how encode into the file handle information about which client can read what from which data server? I prefer not have this added extra work on the server and just say that the client should follow the rule to guaranty successful operation. Marc. 
_______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Tue Jun 21 18:33:46 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DkrJS-0005Kn-Fc; Tue, 21 Jun 2005 18:33:46 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DkrJO-0005Kc-TB; Tue, 21 Jun 2005 18:33:44 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id SAA20853; Tue, 21 Jun 2005 18:33:40 -0400 (EDT) Received: from mx2.netapp.com ([216.240.18.37]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1DkrhO-0000ho-0e; Tue, 21 Jun 2005 18:58:31 -0400 Received: from smtp2.corp.netapp.com (10.57.159.114) by mx2.netapp.com with ESMTP; 21 Jun 2005 15:33:30 -0700 X-IronPort-AV: i="3.93,219,1115017200"; d="scan'208"; a="252976577:sNHT18599380" Received: from [10.34.24.132] (loderunner.hq.netapp.com [10.34.24.132]) by smtp2.corp.netapp.com (8.13.1/8.13.1/NTAP-1.6) with ESMTP id j5LMXUT0012533; Tue, 21 Jun 2005 15:33:30 -0700 (PDT) Message-ID: <42B895BA.2060501@netapp.com> Date: Tue, 21 Jun 2005 15:33:30 -0700 From: Garth Goodson User-Agent: Debian Thunderbird 1.0.2 (X11/20050602) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Marc Eshel Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: 0.0 (/) X-Scan-Signature: 0fa76816851382eb71b0a882ccdc29ac Content-Transfer-Encoding: 7bit Cc: nfsv4-bounces@ietf.org, "Noveck, Dave" , nfsv4@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org Marc Eshel wrote: > "Noveck, Dave" wrote on 06/21/2005 01:51:07 PM: > > >>>With fs-locations we give the client a list of metadata server >>>that can act as the server for given filesystem and the client >>>can choose which one to use and switch among them at will or >>>because of some failure. >> >>The problem is that in v4.0 as it stands there is an enormous >>range of ways that a client can interpret the fs_locations list: >> >>"Here is a list of servers and you can switch when there is failure, >>after refetching your change attibutes (because change attribute >>is within the purview of a particular server)." (and that is not the >>most extreme case of discontinuity -- some people are thinking that >>filehandles and fileids will just change or that you may wind up >>with a slighly out-of-data version of the data). >> >>"Here is a list of servers and you can switch when there is a >>failure with no discontinuity of access (changes in fh's, stateid's, >>fileids, change attributes), although changing is a big deal >>and you shouldn't do it without good cause." >> >>"Here is a list of servers and you can switch when there is a >>failure with no discontinuity of access or even at will since >>there is no big cost to switch." >> >>"Here is a list of servers and you can access any of these >>servers as you will at the same time (multi-pathing or a >>cluster fs) since they are all effectively the same thing". 
>> >>The problem is that the client only knows about the list and >>has no way of knowing which of the statements above is associated >>with the list of servers he is getting. I have been thinking >>about a locations_info attribute for 4.1 that would allow the >>server to tell the client which of those he meant and also >>give preference information (local vs. remote copies, absolutely >>up-to-data vs. slightly out-of-data copies). >> >> >>>is solely within the purview of a particular server >> >>>I agree that once you got a layout for a file you should follow >>>the layout instruction and read/write only from the specified >>>nodes and get an error if you don't. >> >>Then it sounds like we are in violent agreement, except maybe for >>choice of modal auxiliaries or capitalization. >> >>You say "you should follow the layout instruction" and I'm torn >>between saying "you SHOULD follow the layout instruction" and >>"you MUST follow the layout instruction". >> >>You say "[should] get an error if you don't" and I say "the server >>SHOULD give you an error if you don't". >> > > > I say should follow and not MUST follow because I am trying to avoid the > complication to the server if it MUST enforce this rule which might not be > a problem for the server in the first place. For example, I might want to > allow the read of the same large file(no caching) from one set of data > server for client A and a different set of data server for client B. Now > to enforce the above rule the server need to some how encode into the file > handle information about which client can read what from which data > server? I prefer not have this added extra work on the server and just say > that the client should follow the rule to guaranty successful operation. > Marc. I think your example can be handled by giving client A and client B different layouts for the same file. The filehandles in the different layouts can be different as would be the device IDs. As long as the metadata server controls the sharing modes (read-only vs. read/write) there shouldn't be a problem. 
-Garth _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Tue Jun 21 18:46:18 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DkrVa-0001JW-Fq; Tue, 21 Jun 2005 18:46:18 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DkrVY-0001JM-9w; Tue, 21 Jun 2005 18:46:16 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id SAA21781; Tue, 21 Jun 2005 18:46:13 -0400 (EDT) Received: from e4.ny.us.ibm.com ([32.97.182.144]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1DkrtY-00013T-Oh; Tue, 21 Jun 2005 19:11:05 -0400 Received: from d01relay02.pok.ibm.com (d01relay02.pok.ibm.com [9.56.227.234]) by e4.ny.us.ibm.com (8.12.11/8.12.11) with ESMTP id j5LMk1j3025860; Tue, 21 Jun 2005 18:46:01 -0400 Received: from d01av02.pok.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by d01relay02.pok.ibm.com (8.12.10/NCO/VERS6.7) with ESMTP id j5LMk1KK259260; Tue, 21 Jun 2005 18:46:01 -0400 Received: from d01av02.pok.ibm.com (loopback [127.0.0.1]) by d01av02.pok.ibm.com (8.12.11/8.13.3) with ESMTP id j5LMk0SN006091; Tue, 21 Jun 2005 18:46:00 -0400 Received: from [9.56.227.90] (d01ml604.pok.ibm.com [9.56.227.90]) by d01av02.pok.ibm.com (8.12.11/8.12.11) with ESMTP id j5LMk0av006076; Tue, 21 Jun 2005 18:46:00 -0400 In-Reply-To: <42B895BA.2060501@netapp.com> To: Garth Goodson Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) MIME-Version: 1.0 X-Mailer: Lotus Notes Build V70_M4_01112005 Beta 3NP January 11, 2005 Message-ID: From: Marc Eshel Date: Tue, 21 Jun 2005 15:45:51 -0700 X-MIMETrack: Serialize by Router on D01ML604/01/M/IBM(Build V70_06092005|June 09, 2005) at 06/21/2005 18:46:00, Serialize complete at 06/21/2005 18:46:00 Content-Type: text/plain; charset="US-ASCII" X-Spam-Score: 0.0 (/) X-Scan-Signature: 34d35111647d654d033d58d318c0d21a Cc: nfsv4-bounces@ietf.org, "Noveck, Dave" , nfsv4@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org Garth Goodson wrote on 06/21/2005 03:33:30 PM: > Marc Eshel wrote: > > "Noveck, Dave" wrote on 06/21/2005 01:51:07 PM: > > > > > >>>With fs-locations we give the client a list of metadata server > >>>that can act as the server for given filesystem and the client > >>>can choose which one to use and switch among them at will or > >>>because of some failure. > >> > >>The problem is that in v4.0 as it stands there is an enormous > >>range of ways that a client can interpret the fs_locations list: > >> > >>"Here is a list of servers and you can switch when there is failure, > >>after refetching your change attibutes (because change attribute > >>is within the purview of a particular server)." (and that is not the > >>most extreme case of discontinuity -- some people are thinking that > >>filehandles and fileids will just change or that you may wind up > >>with a slighly out-of-data version of the data). > >> > >>"Here is a list of servers and you can switch when there is a > >>failure with no discontinuity of access (changes in fh's, stateid's, > >>fileids, change attributes), although changing is a big deal > >>and you shouldn't do it without good cause." 
> >> > >>"Here is a list of servers and you can switch when there is a > >>failure with no discontinuity of access or even at will since > >>there is no big cost to switch." > >> > >>"Here is a list of servers and you can access any of these > >>servers as you will at the same time (multi-pathing or a > >>cluster fs) since they are all effectively the same thing". > >> > >>The problem is that the client only knows about the list and > >>has no way of knowing which of the statements above is associated > >>with the list of servers he is getting. I have been thinking > >>about a locations_info attribute for 4.1 that would allow the > >>server to tell the client which of those he meant and also > >>give preference information (local vs. remote copies, absolutely > >>up-to-data vs. slightly out-of-data copies). > >> > >> > >>>is solely within the purview of a particular server > >> > >>>I agree that once you got a layout for a file you should follow > >>>the layout instruction and read/write only from the specified > >>>nodes and get an error if you don't. > >> > >>Then it sounds like we are in violent agreement, except maybe for > >>choice of modal auxiliaries or capitalization. > >> > >>You say "you should follow the layout instruction" and I'm torn > >>between saying "you SHOULD follow the layout instruction" and > >>"you MUST follow the layout instruction". > >> > >>You say "[should] get an error if you don't" and I say "the server > >>SHOULD give you an error if you don't". > >> > > > > > > I say should follow and not MUST follow because I am trying to avoid the > > complication to the server if it MUST enforce this rule which might not be > > a problem for the server in the first place. For example, I might want to > > allow the read of the same large file(no caching) from one set of data > > server for client A and a different set of data server for client B. Now > > to enforce the above rule the server need to some how encode into the file > > handle information about which client can read what from which data > > server? I prefer not have this added extra work on the server and just say > > that the client should follow the rule to guaranty successful operation. > > Marc. > > I think your example can be handled by giving client A and client B > different layouts for the same file. The filehandles in the different > layouts can be different as would be the device IDs. As long as the > metadata server controls the sharing modes (read-only vs. read/write) > there shouldn't be a problem. > > -Garth I know I can do it. I just don't want to make sure (enforce the rule) that each client is using the file handles to read only from the specified data server. Marc. Marc. 
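For concreteness, a rough sketch of the bookkeeping being objected to here, under the assumption (hypothetical, not from the draft) that each data server is told which (client, filehandle) pairs its layouts cover and rejects any other I/O:

from typing import Set, Tuple


class DataServer:
    def __init__(self, name: str) -> None:
        self.name = name
        # (client_id, filehandle) pairs this data server has been told about,
        # e.g. pushed down by the metadata server when a layout is granted.
        self.grants: Set[Tuple[str, bytes]] = set()

    def read(self, client_id: str, fh: bytes, offset: int, count: int) -> bytes:
        if (client_id, fh) not in self.grants:
            # Strict enforcement: reject I/O that does not match a granted layout.
            raise PermissionError("access denied: no layout grants this client access here")
        return b"\0" * count  # placeholder for real data


if __name__ == "__main__":
    ds1 = DataServer("ds1")
    ds1.grants.add(("clientA", b"\x01\x02"))
    print(len(ds1.read("clientA", b"\x01\x02", 0, 8)))    # allowed
    try:
        ds1.read("clientB", b"\x01\x02", 0, 8)             # client B was laid out elsewhere
    except PermissionError as e:
        print(e)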
_______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Tue Jun 21 19:54:14 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DksZK-00032l-Ky; Tue, 21 Jun 2005 19:54:14 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DksZJ-00032g-7Z for nfsv4@megatron.ietf.org; Tue, 21 Jun 2005 19:54:13 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id TAA28014 for ; Tue, 21 Jun 2005 19:54:12 -0400 (EDT) Received: from mx1.netapp.com ([216.240.18.38]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1DksxI-0003Fn-9w for nfsv4@ietf.org; Tue, 21 Jun 2005 20:19:02 -0400 Received: from smtp2.corp.netapp.com (10.57.159.114) by mx1.netapp.com with ESMTP; 21 Jun 2005 16:54:01 -0700 X-IronPort-AV: i="3.93,219,1115017200"; d="scan'208"; a="201922787:sNHT17774552" Received: from [10.34.24.132] (loderunner.hq.netapp.com [10.34.24.132]) by smtp2.corp.netapp.com (8.13.1/8.13.1/NTAP-1.6) with ESMTP id j5LNs1QV003470; Tue, 21 Jun 2005 16:54:01 -0700 (PDT) Message-ID: <42B8A899.5030204@netapp.com> Date: Tue, 21 Jun 2005 16:54:01 -0700 From: Garth Goodson User-Agent: Debian Thunderbird 1.0.2 (X11/20050602) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Marc Eshel Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: 0.0 (/) X-Scan-Signature: b22590c27682ace61775ee7b453b40d3 Content-Transfer-Encoding: 7bit Cc: "Noveck, Dave" , nfsv4@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org Marc Eshel wrote: > Garth Goodson wrote on 06/21/2005 03:33:30 PM: > > >>Marc Eshel wrote: >> >>>"Noveck, Dave" wrote on 06/21/2005 01:51:07 > > PM: > >>> >>>>>With fs-locations we give the client a list of metadata server >>>>>that can act as the server for given filesystem and the client >>>>>can choose which one to use and switch among them at will or >>>>>because of some failure. >>>> >>>>The problem is that in v4.0 as it stands there is an enormous >>>>range of ways that a client can interpret the fs_locations list: >>>> >>>>"Here is a list of servers and you can switch when there is failure, >>>>after refetching your change attibutes (because change attribute >>>>is within the purview of a particular server)." (and that is not the >>>>most extreme case of discontinuity -- some people are thinking that >>>>filehandles and fileids will just change or that you may wind up >>>>with a slighly out-of-data version of the data). >>>> >>>>"Here is a list of servers and you can switch when there is a >>>>failure with no discontinuity of access (changes in fh's, stateid's, >>>>fileids, change attributes), although changing is a big deal >>>>and you shouldn't do it without good cause." >>>> >>>>"Here is a list of servers and you can switch when there is a >>>>failure with no discontinuity of access or even at will since >>>>there is no big cost to switch." 
>>>> >>>>"Here is a list of servers and you can access any of these >>>>servers as you will at the same time (multi-pathing or a >>>>cluster fs) since they are all effectively the same thing". >>>> >>>>The problem is that the client only knows about the list and >>>>has no way of knowing which of the statements above is associated >>>>with the list of servers he is getting. I have been thinking >>>>about a locations_info attribute for 4.1 that would allow the >>>>server to tell the client which of those he meant and also >>>>give preference information (local vs. remote copies, absolutely >>>>up-to-data vs. slightly out-of-data copies). >>>> >>>> >>>> >>>>>is solely within the purview of a particular server >>>> >>>>>I agree that once you got a layout for a file you should follow >>>>>the layout instruction and read/write only from the specified >>>>>nodes and get an error if you don't. >>>> >>>>Then it sounds like we are in violent agreement, except maybe for >>>>choice of modal auxiliaries or capitalization. >>>> >>>>You say "you should follow the layout instruction" and I'm torn >>>>between saying "you SHOULD follow the layout instruction" and >>>>"you MUST follow the layout instruction". >>>> >>>>You say "[should] get an error if you don't" and I say "the server >>>>SHOULD give you an error if you don't". >>>> >>> >>> >>>I say should follow and not MUST follow because I am trying to avoid > > the > >>>complication to the server if it MUST enforce this rule which might > > not be > >>>a problem for the server in the first place. For example, I might want > > to > >>>allow the read of the same large file(no caching) from one set of data > > >>>server for client A and a different set of data server for client B. > > Now > >>>to enforce the above rule the server need to some how encode into the > > file > >>>handle information about which client can read what from which data >>>server? I prefer not have this added extra work on the server and just > > say > >>>that the client should follow the rule to guaranty successful > > operation. > >>>Marc. >> >>I think your example can be handled by giving client A and client B >>different layouts for the same file. The filehandles in the different >>layouts can be different as would be the device IDs. As long as the >>metadata server controls the sharing modes (read-only vs. read/write) >>there shouldn't be a problem. >> >>-Garth > > > I know I can do it. I just don't want to make sure (enforce the rule) that > each client is using the file handles to read only from the specified data > server. > Marc. > > Marc. Ok, that is a valid concern (not having to propagate layouts to the data servers to validate that I/Os are coming from the correct clients). I guess the object guys get around this by encoding the layout/device IDs into the capability that is handed back to the client with the layout. It has been marked as an open issue... 
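A loose sketch of the capability idea alluded to above, assuming a generic MAC scheme rather than the actual OSD capability format: the metadata server signs the filehandle, device ID, and allowed operations with a key it shares with the data server, so the data server can validate I/O without holding per-layout state.

import hashlib
import hmac

SHARED_KEY = b"mds-and-ds-shared-secret"   # hypothetical out-of-band shared key


def make_capability(fh: bytes, device_id: str, allowed_ops: str) -> bytes:
    # Issued by the metadata server and handed to the client with the layout.
    msg = fh + device_id.encode() + allowed_ops.encode()
    return hmac.new(SHARED_KEY, msg, hashlib.sha256).digest()


def data_server_check(fh: bytes, device_id: str, op: str, cap: bytes) -> bool:
    # The data server recomputes the MAC for the op classes it would allow and
    # compares; no per-layout state is kept on the data server.
    for allowed_ops in ("READ", "READ,WRITE"):
        if op in allowed_ops.split(",") and hmac.compare_digest(
            cap, make_capability(fh, device_id, allowed_ops)
        ):
            return True
    return False


if __name__ == "__main__":
    cap = make_capability(b"\x01\x02", "ds1", "READ")          # granted with the layout
    print(data_server_check(b"\x01\x02", "ds1", "READ", cap))   # True
    print(data_server_check(b"\x01\x02", "ds1", "WRITE", cap))  # False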
-Garth _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Wed Jun 22 06:55:22 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dl2t8-00053f-Hx; Wed, 22 Jun 2005 06:55:22 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dl2t6-00053X-Jd; Wed, 22 Jun 2005 06:55:20 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id GAA24097; Wed, 22 Jun 2005 06:55:17 -0400 (EDT) Received: from mx2.netapp.com ([216.240.18.37]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1Dl3HD-0002ro-BA; Wed, 22 Jun 2005 07:20:15 -0400 Received: from smtp1.corp.netapp.com (10.57.156.124) by mx2.netapp.com with ESMTP; 22 Jun 2005 03:55:10 -0700 X-IronPort-AV: i="3.93,220,1115017200"; d="scan'208"; a="253067190:sNHT23769408" Received: from svlexc03.hq.netapp.com (svlexc03.corp.netapp.com [10.57.156.149]) by smtp1.corp.netapp.com (8.13.1/8.13.1/NTAP-1.6) with ESMTP id j5MAtAIi018285; Wed, 22 Jun 2005 03:55:10 -0700 (PDT) Received: from burgundy.hq.netapp.com ([10.56.10.66]) by svlexc03.hq.netapp.com with Microsoft SMTPSVC(6.0.3790.0); Wed, 22 Jun 2005 03:55:10 -0700 Received: from exnane01.hq.netapp.com ([10.97.0.61]) by burgundy.hq.netapp.com with Microsoft SMTPSVC(5.0.2195.6713); Wed, 22 Jun 2005 03:55:10 -0700 X-MimeOLE: Produced By Microsoft Exchange V6.5.7226.0 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) Date: Wed, 22 Jun 2005 06:55:08 -0400 Message-ID: Thread-Topic: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) Thread-Index: AcV2sGpmjMnqRzZCRDOS8fA6mORgUAAZWreg From: "Noveck, Dave" To: "Marc Eshel" X-OriginalArrivalTime: 22 Jun 2005 10:55:10.0160 (UTC) FILETIME=[DBDDE500:01C57718] X-Spam-Score: 0.0 (/) X-Scan-Signature: 093efd19b5f651b2707595638f6c4003 Content-Transfer-Encoding: quoted-printable Cc: "Goodson, Garth" , nfsv4-bounces@ietf.org, nfsv4@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org > > You say "you should follow the layout instruction" and I'm torn > > between saying "you SHOULD follow the layout instruction" and > > "you MUST follow the layout instruction". > >=20 > > You say "[should] get an error if you don't" and I say "the server > > SHOULD give you an error if you don't". >=20 > I say should follow and not MUST follow because I am trying to avoid = the=20 > complication to the server if it MUST enforce this rule which might = not be=20 > a problem for the server in the first place.=20 Hold on. I never suggested "MUST" for the server's obligation to check. It seems that the server checking is where you see = difficulties/inconvenience. I did suggest "MUST" as a possiblility for the clients' obligation to=20 conform. These two do *not* have to go in tandem. 
While it wouldn't = make any sense to have the client not have to conform while the server is = giving him an error if he doesn't, it is perfectly reasonable for the spec to strongly state the rule for the client but not to insist that the server check for compliance if it has great difficulties doing so. > For example, I might want to=20 > allow the read of the same large file(no caching) from one set of data = > server for client A and a different set of data server for client B. = Now=20 > to enforce the above rule the server need to some how encode into the = file=20 > handle information about which client can read what from which data=20 > server?=20 I may not be uderstanding your example correctly but it sounds like the case you are worried about is not really at issue here. I prefer not have this added extra work on the server and just say=20 that the client should follow the rule to guaranty successful operation. = -----Original Message----- From: Marc Eshel [mailto:eshel@almaden.ibm.com] Sent: Tuesday, June 21, 2005 6:27 PM To: Noveck, Dave Cc: Dean Hildebrand; Goodson, Garth; nfsv4@ietf.org; nfsv4-bounces@ietf.org Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) "Noveck, Dave" wrote on 06/21/2005 01:51:07 PM: > > With fs-locations we give the client a list of metadata server=20 > > that can act as the server for given filesystem and the client=20 > > can choose which one to use and switch among them at will or=20 > > because of some failure.=20 >=20 > The problem is that in v4.0 as it stands there is an enormous > range of ways that a client can interpret the fs_locations list: >=20 > "Here is a list of servers and you can switch when there is failure, > after refetching your change attibutes (because change attribute > is within the purview of a particular server)." (and that is not the > most extreme case of discontinuity -- some people are thinking that=20 > filehandles and fileids will just change or that you may wind up > with a slighly out-of-data version of the data). >=20 > "Here is a list of servers and you can switch when there is a=20 > failure with no discontinuity of access (changes in fh's, stateid's, > fileids, change attributes), although changing is a big deal > and you shouldn't do it without good cause." >=20 > "Here is a list of servers and you can switch when there is a > failure with no discontinuity of access or even at will since > there is no big cost to switch." >=20 > "Here is a list of servers and you can access any of these=20 > servers as you will at the same time (multi-pathing or a=20 > cluster fs) since they are all effectively the same thing". >=20 > The problem is that the client only knows about the list and > has no way of knowing which of the statements above is associated > with the list of servers he is getting. I have been thinking > about a locations_info attribute for 4.1 that would allow the > server to tell the client which of those he meant and also=20 > give preference information (local vs. remote copies, absolutely > up-to-data vs. slightly out-of-data copies). >=20 > > is solely within the purview of a particular server >=20 > > I agree that once you got a layout for a file you should follow=20 > > the layout instruction and read/write only from the specified=20 > > nodes and get an error if you don't.=20 >=20 > Then it sounds like we are in violent agreement, except maybe for > choice of modal auxiliaries or capitalization. 
>=20 > You say "you should follow the layout instruction" and I'm torn > between saying "you SHOULD follow the layout instruction" and > "you MUST follow the layout instruction". >=20 > You say "[should] get an error if you don't" and I say "the server > SHOULD give you an error if you don't". >=20 I say should follow and not MUST follow because I am trying to avoid the = complication to the server if it MUST enforce this rule which might not = be=20 a problem for the server in the first place. For example, I might want = to=20 allow the read of the same large file(no caching) from one set of data=20 server for client A and a different set of data server for client B. Now = to enforce the above rule the server need to some how encode into the = file=20 handle information about which client can read what from which data=20 server? I prefer not have this added extra work on the server and just = say=20 that the client should follow the rule to guaranty successful operation. = Marc.=20 _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Wed Jun 22 06:57:31 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dl2vD-0005aX-1R; Wed, 22 Jun 2005 06:57:31 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dl2vC-0005Zl-Is for nfsv4@megatron.ietf.org; Wed, 22 Jun 2005 06:57:30 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id GAA24271 for ; Wed, 22 Jun 2005 06:57:27 -0400 (EDT) Received: from mx1.netapp.com ([216.240.18.38]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1Dl3JJ-0002th-8U for nfsv4@ietf.org; Wed, 22 Jun 2005 07:22:25 -0400 Received: from smtp2.corp.netapp.com (10.57.159.114) by mx1.netapp.com with ESMTP; 22 Jun 2005 03:57:21 -0700 X-IronPort-AV: i="3.93,220,1115017200"; d="scan'208"; a="201968973:sNHT22844000" Received: from svlexc02.hq.netapp.com (svlexc02.corp.netapp.com [10.57.157.136]) by smtp2.corp.netapp.com (8.13.1/8.13.1/NTAP-1.6) with ESMTP id j5MAvKPs009281 for ; Wed, 22 Jun 2005 03:57:20 -0700 (PDT) Received: from burgundy.hq.netapp.com ([10.56.10.66]) by svlexc02.hq.netapp.com with Microsoft SMTPSVC(5.0.2195.6713); Wed, 22 Jun 2005 03:57:20 -0700 Received: from exnane01.hq.netapp.com ([10.97.0.61]) by burgundy.hq.netapp.com with Microsoft SMTPSVC(5.0.2195.6713); Wed, 22 Jun 2005 03:57:20 -0700 X-MimeOLE: Produced By Microsoft Exchange V6.5.7226.0 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Subject: FW: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) Date: Wed, 22 Jun 2005 06:57:19 -0400 Message-ID: Thread-Topic: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) Thread-Index: AcV2sGpmjMnqRzZCRDOS8fA6mORgUAAZWregAADEk5A= From: "Noveck, Dave" To: X-OriginalArrivalTime: 22 Jun 2005 10:57:20.0774 (UTC) FILETIME=[29B80260:01C57719] X-Spam-Score: 0.0 (/) X-Scan-Signature: 6640e3bbe8a4d70c4469bcdcbbf0921d Content-Transfer-Encoding: quoted-printable X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org Best to ignore the following for the moment. Send hit inadvertantly. 
Updated, more coherent message will be sent soon. -----Original Message----- From: Noveck, Dave=20 Sent: Wednesday, June 22, 2005 6:55 AM To: 'Marc Eshel' Cc: Dean Hildebrand; Goodson, Garth; nfsv4@ietf.org; nfsv4-bounces@ietf.org Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) > > You say "you should follow the layout instruction" and I'm torn > > between saying "you SHOULD follow the layout instruction" and > > "you MUST follow the layout instruction". > >=20 > > You say "[should] get an error if you don't" and I say "the server > > SHOULD give you an error if you don't". >=20 > I say should follow and not MUST follow because I am trying to avoid = the=20 > complication to the server if it MUST enforce this rule which might = not be=20 > a problem for the server in the first place.=20 Hold on. I never suggested "MUST" for the server's obligation to check. It seems that the server checking is where you see = difficulties/inconvenience. I did suggest "MUST" as a possiblility for the clients' obligation to=20 conform. These two do *not* have to go in tandem. While it wouldn't = make any sense to have the client not have to conform while the server is = giving him an error if he doesn't, it is perfectly reasonable for the spec to strongly state the rule for the client but not to insist that the server check for compliance if it has great difficulties doing so. > For example, I might want to=20 > allow the read of the same large file(no caching) from one set of data = > server for client A and a different set of data server for client B. = Now=20 > to enforce the above rule the server need to some how encode into the = file=20 > handle information about which client can read what from which data=20 > server?=20 I may not be uderstanding your example correctly but it sounds like the case you are worried about is not really at issue here. I prefer not have this added extra work on the server and just say=20 that the client should follow the rule to guaranty successful operation. = -----Original Message----- From: Marc Eshel [mailto:eshel@almaden.ibm.com] Sent: Tuesday, June 21, 2005 6:27 PM To: Noveck, Dave Cc: Dean Hildebrand; Goodson, Garth; nfsv4@ietf.org; nfsv4-bounces@ietf.org Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) "Noveck, Dave" wrote on 06/21/2005 01:51:07 PM: > > With fs-locations we give the client a list of metadata server=20 > > that can act as the server for given filesystem and the client=20 > > can choose which one to use and switch among them at will or=20 > > because of some failure.=20 >=20 > The problem is that in v4.0 as it stands there is an enormous > range of ways that a client can interpret the fs_locations list: >=20 > "Here is a list of servers and you can switch when there is failure, > after refetching your change attibutes (because change attribute > is within the purview of a particular server)." (and that is not the > most extreme case of discontinuity -- some people are thinking that=20 > filehandles and fileids will just change or that you may wind up > with a slighly out-of-data version of the data). >=20 > "Here is a list of servers and you can switch when there is a=20 > failure with no discontinuity of access (changes in fh's, stateid's, > fileids, change attributes), although changing is a big deal > and you shouldn't do it without good cause." 
>=20 > "Here is a list of servers and you can switch when there is a > failure with no discontinuity of access or even at will since > there is no big cost to switch." >=20 > "Here is a list of servers and you can access any of these=20 > servers as you will at the same time (multi-pathing or a=20 > cluster fs) since they are all effectively the same thing". >=20 > The problem is that the client only knows about the list and > has no way of knowing which of the statements above is associated > with the list of servers he is getting. I have been thinking > about a locations_info attribute for 4.1 that would allow the > server to tell the client which of those he meant and also=20 > give preference information (local vs. remote copies, absolutely > up-to-data vs. slightly out-of-data copies). >=20 > > is solely within the purview of a particular server >=20 > > I agree that once you got a layout for a file you should follow=20 > > the layout instruction and read/write only from the specified=20 > > nodes and get an error if you don't.=20 >=20 > Then it sounds like we are in violent agreement, except maybe for > choice of modal auxiliaries or capitalization. >=20 > You say "you should follow the layout instruction" and I'm torn > between saying "you SHOULD follow the layout instruction" and > "you MUST follow the layout instruction". >=20 > You say "[should] get an error if you don't" and I say "the server > SHOULD give you an error if you don't". >=20 I say should follow and not MUST follow because I am trying to avoid the = complication to the server if it MUST enforce this rule which might not = be=20 a problem for the server in the first place. For example, I might want = to=20 allow the read of the same large file(no caching) from one set of data=20 server for client A and a different set of data server for client B. Now = to enforce the above rule the server need to some how encode into the = file=20 handle information about which client can read what from which data=20 server? I prefer not have this added extra work on the server and just = say=20 that the client should follow the rule to guaranty successful operation. 
= Marc.=20 _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Wed Jun 22 09:51:00 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dl5d6-0008Hg-Jp; Wed, 22 Jun 2005 09:51:00 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dl5d4-0008HP-MX; Wed, 22 Jun 2005 09:50:58 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id JAA11287; Wed, 22 Jun 2005 09:50:56 -0400 (EDT) Received: from mx2.netapp.com ([216.240.18.37]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1Dl61C-0000JL-Lb; Wed, 22 Jun 2005 10:15:55 -0400 Received: from smtp1.corp.netapp.com (10.57.156.124) by mx2.netapp.com with ESMTP; 22 Jun 2005 06:50:48 -0700 X-IronPort-AV: i="3.93,221,1115017200"; d="scan'208"; a="253110518:sNHT24388284" Received: from svlexc02.hq.netapp.com (svlexc02.corp.netapp.com [10.57.157.136]) by smtp1.corp.netapp.com (8.13.1/8.13.1/NTAP-1.6) with ESMTP id j5MDolwd012359; Wed, 22 Jun 2005 06:50:47 -0700 (PDT) Received: from lavender.hq.netapp.com ([10.56.11.75]) by svlexc02.hq.netapp.com with Microsoft SMTPSVC(5.0.2195.6713); Wed, 22 Jun 2005 06:50:47 -0700 Received: from exnane01.hq.netapp.com ([10.97.0.61]) by lavender.hq.netapp.com with Microsoft SMTPSVC(5.0.2195.6713); Wed, 22 Jun 2005 06:50:47 -0700 X-MimeOLE: Produced By Microsoft Exchange V6.5.7226.0 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) Date: Wed, 22 Jun 2005 09:50:45 -0400 Message-ID: Thread-Topic: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) Thread-Index: AcV2sGpmjMnqRzZCRDOS8fA6mORgUAAfERgw From: "Noveck, Dave" To: "Marc Eshel" X-OriginalArrivalTime: 22 Jun 2005 13:50:47.0105 (UTC) FILETIME=[64601B10:01C57731] X-Spam-Score: 1.3 (+) X-Scan-Signature: 7e439b86d3292ef5adf93b694a43a576 Content-Transfer-Encoding: quoted-printable Cc: "Goodson, Garth" , nfsv4-bounces@ietf.org, nfsv4@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org The full response this time. > > You say "you should follow the layout instruction" and I'm torn > > between saying "you SHOULD follow the layout instruction" and > > "you MUST follow the layout instruction". > >=20 > > You say "[should] get an error if you don't" and I say "the server > > SHOULD give you an error if you don't". >=20 > I say should follow and not MUST follow because I am trying to avoid = the=20 > complication to the server if it MUST enforce this rule which might = not be=20 > a problem for the server in the first place.=20 Hold on. I never suggested "MUST" for the server's obligation to check. It seems that the server checking is where you see = difficulties/inconvenience. I did suggest "MUST" as a possiblility for the clients' obligation to=20 conform. These two do *not* have to go in tandem. 
While it wouldn't = make any sense to have the client not have to conform while the server is = giving him an error if he doesn't, it is perfectly reasonable for the spec to strongly state the rule for the client but not to insist that the server check for compliance if it has great difficulties doing so. > For example, I might want to=20 > allow the read of the same large file(no caching) from one set of data = > server for client A and a different set of data server for client B. = Now=20 > to enforce the above rule the server need to some how encode into the = file=20 > handle information about which client can read what from which data=20 > server?=20 I may not be uderstanding your example correctly but it sounds like the case you are worried about is not really at issue here. I know we have been talking kind of loosely about should/SHOULD/MUST=20 "follow the layout instruction". This is overbroad. If a server is told to use server111.clustersRus.org and takes that same handles and uses it on some other server, server111.clustersRus.com for example, then he is not following the layout instruction, but the spec is not going to require anybody to specifically act to make sure that he=20 gets an error. The effect of using a filehandle on a server other than the one it for had always been undefined, and I expect it will continue to be. Even though your data servers above are going to=20 be in more confederal relationship than the two server111's, I=20 think the same will still hold. If I take a handle for X and use it on Y, I have a real good chance of getting STALE but there is no guarantee that I will. The specific issue that started this (and that I'm still talking=20 about) is more limited. I'm given a handle H for a server A in a layout and in that is the requirement that that handle be valid for READ/WRITE, etc. and not for SETATTR. If the client uses it on A and does a SETATTR, he SHOULD get an error. If he uses that same handle on B and does a READ then he is broken but the server has no obligation to recognize handles for other servers. Similarly if he does a SETATTR with the handle on B. > I prefer not have this added extra work on the server and just say=20 > that the client should follow the rule to guaranty successful = operation.=20 In the IETF "should" is very weak and amounts to "gee, it is sort of a good idea to". "SHOULD" is much stronger and says "Do it unless you = have a real good reason not to". "MUST" just says to do it.=20 I guess I still think that if you receive a handle for server A in a=20 layout, you MUST NOT use it to do operations on that server other than PUTFH, COMMIT, READ, WRITE, and that if you do, the server SHOULD give=20 you an error. If you feel this is too difficult for the server, then the "SHOULD"=20 would give you enough wiggle-room. 
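A minimal sketch of the check being proposed here, assuming an illustrative flag-bit encoding and error code (neither is spec text): a filehandle minted as part of a layout is marked data-only, and the server errors out on any operation other than PUTFH, COMMIT, READ, or WRITE against such a handle.

DATA_ONLY_OPS = {"PUTFH", "COMMIT", "READ", "WRITE"}
DATA_ONLY_FLAG = 0x01                     # hypothetical: one bit reserved in the fh


def mint_layout_fh(base_fh: bytes) -> bytes:
    # Tag the handle so the server can later tell it came from a layout.
    return bytes([DATA_ONLY_FLAG]) + base_fh


def check_op(fh: bytes, op: str) -> str:
    if fh and (fh[0] & DATA_ONLY_FLAG) and op not in DATA_ONLY_OPS:
        # The client is misusing a layout handle; the server SHOULD error out.
        return "ERROR: operation not permitted on a layout filehandle"
    return "OK"


if __name__ == "__main__":
    lfh = mint_layout_fh(b"\x42\x42")
    print(check_op(lfh, "READ"))      # OK
    print(check_op(lfh, "SETATTR"))   # rejected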
-----Original Message----- From: Marc Eshel [mailto:eshel@almaden.ibm.com] Sent: Tuesday, June 21, 2005 6:27 PM To: Noveck, Dave Cc: Dean Hildebrand; Goodson, Garth; nfsv4@ietf.org; nfsv4-bounces@ietf.org Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) "Noveck, Dave" wrote on 06/21/2005 01:51:07 PM: > > With fs-locations we give the client a list of metadata server=20 > > that can act as the server for given filesystem and the client=20 > > can choose which one to use and switch among them at will or=20 > > because of some failure.=20 >=20 > The problem is that in v4.0 as it stands there is an enormous > range of ways that a client can interpret the fs_locations list: >=20 > "Here is a list of servers and you can switch when there is failure, > after refetching your change attibutes (because change attribute > is within the purview of a particular server)." (and that is not the > most extreme case of discontinuity -- some people are thinking that=20 > filehandles and fileids will just change or that you may wind up > with a slighly out-of-data version of the data). >=20 > "Here is a list of servers and you can switch when there is a=20 > failure with no discontinuity of access (changes in fh's, stateid's, > fileids, change attributes), although changing is a big deal > and you shouldn't do it without good cause." >=20 > "Here is a list of servers and you can switch when there is a > failure with no discontinuity of access or even at will since > there is no big cost to switch." >=20 > "Here is a list of servers and you can access any of these=20 > servers as you will at the same time (multi-pathing or a=20 > cluster fs) since they are all effectively the same thing". >=20 > The problem is that the client only knows about the list and > has no way of knowing which of the statements above is associated > with the list of servers he is getting. I have been thinking > about a locations_info attribute for 4.1 that would allow the > server to tell the client which of those he meant and also=20 > give preference information (local vs. remote copies, absolutely > up-to-data vs. slightly out-of-data copies). >=20 > > is solely within the purview of a particular server >=20 > > I agree that once you got a layout for a file you should follow=20 > > the layout instruction and read/write only from the specified=20 > > nodes and get an error if you don't.=20 >=20 > Then it sounds like we are in violent agreement, except maybe for > choice of modal auxiliaries or capitalization. >=20 > You say "you should follow the layout instruction" and I'm torn > between saying "you SHOULD follow the layout instruction" and > "you MUST follow the layout instruction". >=20 > You say "[should] get an error if you don't" and I say "the server > SHOULD give you an error if you don't". >=20 I say should follow and not MUST follow because I am trying to avoid the = complication to the server if it MUST enforce this rule which might not = be=20 a problem for the server in the first place. For example, I might want = to=20 allow the read of the same large file(no caching) from one set of data=20 server for client A and a different set of data server for client B. Now = to enforce the above rule the server need to some how encode into the = file=20 handle information about which client can read what from which data=20 server? I prefer not have this added extra work on the server and just = say=20 that the client should follow the rule to guaranty successful operation. 
= Marc.=20 _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Wed Jun 22 11:24:24 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dl75T-0004XO-VR; Wed, 22 Jun 2005 11:24:23 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dl75R-0004X4-Dz; Wed, 22 Jun 2005 11:24:21 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id LAA23051; Wed, 22 Jun 2005 11:24:18 -0400 (EDT) Received: from newman.eecs.umich.edu ([141.213.4.11]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1Dl7TZ-0003y5-Dk; Wed, 22 Jun 2005 11:49:18 -0400 Received: from willow.eecs.umich.edu (willow.eecs.umich.edu [141.213.4.14]) by newman.eecs.umich.edu (8.13.2/8.13.0) with ESMTP id j5MFNxuU000457 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 22 Jun 2005 11:23:59 -0400 Received: from willow.eecs.umich.edu (localhost.eecs.umich.edu [127.0.0.1]) by willow.eecs.umich.edu (8.13.1/8.13.0) with ESMTP id j5MFNw8s016805; Wed, 22 Jun 2005 11:23:59 -0400 Received: from localhost (dhildebz@localhost) by willow.eecs.umich.edu (8.13.1/8.13.1/Submit) with ESMTP id j5MFNwC8016802; Wed, 22 Jun 2005 11:23:58 -0400 Date: Wed, 22 Jun 2005 11:23:58 -0400 (EDT) From: Dean Hildebrand To: "Noveck, Dave" Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) In-Reply-To: Message-ID: References: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Spam-Status: No, score=-1.8 required=5.0 tests=BAYES_00,NO_OBLIGATION autolearn=no version=3.0.3 X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on newman.eecs.umich.edu X-Virus-Scan: : UVSCAN at UoM/EECS X-Spam-Score: 1.3 (+) X-Scan-Signature: 848ed35f2a4fc0638fa89629cb640f48 Cc: "Goodson, Garth" , nfsv4-bounces@ietf.org, Marc Eshel , nfsv4@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org Two comments: 1) I think another example to keep in mind is that a layout could be redundant. You can imagine layouts that just return "file available on all data servers", and then let the client use its own load balancing scheme to access data. Of course this depends on all clients using the same load balancing algorithm, but file systems already exist in this manner. 2) I think it can be safely assumed that someone is going to want to use these extensions to send GETATTR, etc to the data servers to offload work from the metadata server. Garth's NASD paper did this with NASD NFS and showed the benefits. If one layout driver redirects GETATTR's, etc and another one doesn't, I assume the faster one will be used, not the one that follows the spec. If we would rather put this ability into a new FILE_LOCATIONS attribute or something similar, then we should say so. I wrote a FILE_LOCATIONS internet draft about 2 years ago that never went anywhere.... Dean On Wed, 22 Jun 2005, Noveck, Dave wrote: > The full response this time. > > > > You say "you should follow the layout instruction" and I'm torn > > > between saying "you SHOULD follow the layout instruction" and > > > "you MUST follow the layout instruction". 
> > > > > > You say "[should] get an error if you don't" and I say "the server > > > SHOULD give you an error if you don't". > > > > I say should follow and not MUST follow because I am trying to avoid the > > complication to the server if it MUST enforce this rule which might not be > > a problem for the server in the first place. > > Hold on. I never suggested "MUST" for the server's obligation to check. > It seems that the server checking is where you see difficulties/inconvenience. > > I did suggest "MUST" as a possiblility for the clients' obligation to > conform. These two do *not* have to go in tandem. While it wouldn't make > any sense to have the client not have to conform while the server is giving > him an error if he doesn't, it is perfectly reasonable for the spec to > strongly state the rule for the client but not to insist that the server > check for compliance if it has great difficulties doing so. > > > For example, I might want to > > allow the read of the same large file(no caching) from one set of data > > server for client A and a different set of data server for client B. Now > > to enforce the above rule the server need to some how encode into the file > > handle information about which client can read what from which data > > server? > > I may not be uderstanding your example correctly but it sounds like the > case you are worried about is not really at issue here. > > I know we have been talking kind of loosely about should/SHOULD/MUST > "follow the layout instruction". This is overbroad. If a server is > told to use server111.clustersRus.org and takes that same handles and > uses it on some other server, server111.clustersRus.com for example, > then he is not following the layout instruction, but the spec is not > going to require anybody to specifically act to make sure that he > gets an error. The effect of using a filehandle on a server other > than the one it for had always been undefined, and I expect it will > continue to be. Even though your data servers above are going to > be in more confederal relationship than the two server111's, I > think the same will still hold. If I take a handle for X and use it > on Y, I have a real good chance of getting STALE but there is no > guarantee that I will. > > The specific issue that started this (and that I'm still talking > about) is more limited. I'm given a handle H for a server A in a > layout and in that is the requirement that that handle be valid > for READ/WRITE, etc. and not for SETATTR. If the client uses it > on A and does a SETATTR, he SHOULD get an error. If he uses that > same handle on B and does a READ then he is broken but the server > has no obligation to recognize handles for other servers. Similarly > if he does a SETATTR with the handle on B. > > > I prefer not have this added extra work on the server and just say > > that the client should follow the rule to guaranty successful operation. > > In the IETF "should" is very weak and amounts to "gee, it is sort of a > good idea to". "SHOULD" is much stronger and says "Do it unless you have > a real good reason not to". "MUST" just says to do it. > > I guess I still think that if you receive a handle for server A in a > layout, you MUST NOT use it to do operations on that server other than > PUTFH, COMMIT, READ, WRITE, and that if you do, the server SHOULD give > you an error. > > If you feel this is too difficult for the server, then the "SHOULD" > would give you enough wiggle-room. 
> > > -----Original Message----- > From: Marc Eshel [mailto:eshel@almaden.ibm.com] > Sent: Tuesday, June 21, 2005 6:27 PM > To: Noveck, Dave > Cc: Dean Hildebrand; Goodson, Garth; nfsv4@ietf.org; > nfsv4-bounces@ietf.org > Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) > > > "Noveck, Dave" wrote on 06/21/2005 01:51:07 PM: > > > > With fs-locations we give the client a list of metadata server > > > that can act as the server for given filesystem and the client > > > can choose which one to use and switch among them at will or > > > because of some failure. > > > > The problem is that in v4.0 as it stands there is an enormous > > range of ways that a client can interpret the fs_locations list: > > > > "Here is a list of servers and you can switch when there is failure, > > after refetching your change attibutes (because change attribute > > is within the purview of a particular server)." (and that is not the > > most extreme case of discontinuity -- some people are thinking that > > filehandles and fileids will just change or that you may wind up > > with a slighly out-of-data version of the data). > > > > "Here is a list of servers and you can switch when there is a > > failure with no discontinuity of access (changes in fh's, stateid's, > > fileids, change attributes), although changing is a big deal > > and you shouldn't do it without good cause." > > > > "Here is a list of servers and you can switch when there is a > > failure with no discontinuity of access or even at will since > > there is no big cost to switch." > > > > "Here is a list of servers and you can access any of these > > servers as you will at the same time (multi-pathing or a > > cluster fs) since they are all effectively the same thing". > > > > The problem is that the client only knows about the list and > > has no way of knowing which of the statements above is associated > > with the list of servers he is getting. I have been thinking > > about a locations_info attribute for 4.1 that would allow the > > server to tell the client which of those he meant and also > > give preference information (local vs. remote copies, absolutely > > up-to-data vs. slightly out-of-data copies). > > > > > is solely within the purview of a particular server > > > > > I agree that once you got a layout for a file you should follow > > > the layout instruction and read/write only from the specified > > > nodes and get an error if you don't. > > > > Then it sounds like we are in violent agreement, except maybe for > > choice of modal auxiliaries or capitalization. > > > > You say "you should follow the layout instruction" and I'm torn > > between saying "you SHOULD follow the layout instruction" and > > "you MUST follow the layout instruction". > > > > You say "[should] get an error if you don't" and I say "the server > > SHOULD give you an error if you don't". > > > > I say should follow and not MUST follow because I am trying to avoid the > complication to the server if it MUST enforce this rule which might not be > a problem for the server in the first place. For example, I might want to > allow the read of the same large file(no caching) from one set of data > server for client A and a different set of data server for client B. Now > to enforce the above rule the server need to some how encode into the file > handle information about which client can read what from which data > server? 
I prefer not have this added extra work on the server and just say > that the client should follow the rule to guaranty successful operation. > Marc. > _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Wed Jun 22 13:13:17 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dl8mr-0007n7-E7; Wed, 22 Jun 2005 13:13:17 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dl8mq-0007mz-Kv; Wed, 22 Jun 2005 13:13:16 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id NAA03650; Wed, 22 Jun 2005 13:13:12 -0400 (EDT) Received: from e3.ny.us.ibm.com ([32.97.182.143]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1Dl9B0-0007mA-0i; Wed, 22 Jun 2005 13:38:14 -0400 Received: from d01relay02.pok.ibm.com (d01relay02.pok.ibm.com [9.56.227.234]) by e3.ny.us.ibm.com (8.12.11/8.12.11) with ESMTP id j5MHCuS8032608; Wed, 22 Jun 2005 13:12:56 -0400 Received: from d01av04.pok.ibm.com (d01av04.pok.ibm.com [9.56.224.64]) by d01relay02.pok.ibm.com (8.12.10/NCO/VERS6.7) with ESMTP id j5MHCuZJ261810; Wed, 22 Jun 2005 13:12:56 -0400 Received: from d01av04.pok.ibm.com (loopback [127.0.0.1]) by d01av04.pok.ibm.com (8.12.11/8.13.3) with ESMTP id j5MHCk4b019872; Wed, 22 Jun 2005 13:12:46 -0400 Received: from [9.56.227.90] (d01ml604.pok.ibm.com [9.56.227.90]) by d01av04.pok.ibm.com (8.12.11/8.12.11) with ESMTP id j5MHCk6s019736; Wed, 22 Jun 2005 13:12:46 -0400 In-Reply-To: To: "Noveck, Dave" Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) MIME-Version: 1.0 X-Mailer: Lotus Notes Build V70_M4_01112005 Beta 3NP January 11, 2005 Message-ID: From: Marc Eshel Date: Wed, 22 Jun 2005 10:12:26 -0700 X-MIMETrack: Serialize by Router on D01ML604/01/M/IBM(Build V70_06092005|June 09, 2005) at 06/22/2005 13:12:46, Serialize complete at 06/22/2005 13:12:46 Content-Type: text/plain; charset="US-ASCII" X-Spam-Score: 1.3 (+) X-Scan-Signature: 6e922792024732fb1bb6f346e63517e4 Cc: "Goodson, Garth" , nfsv4@ietf.org, nfsv4-bounces@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org "Noveck, Dave" wrote on 06/22/2005 06:50:45 AM: > The full response this time. > > > > You say "you should follow the layout instruction" and I'm torn > > > between saying "you SHOULD follow the layout instruction" and > > > "you MUST follow the layout instruction". > > > > > > You say "[should] get an error if you don't" and I say "the server > > > SHOULD give you an error if you don't". > > > > > I say should follow and not MUST follow because I am trying to avoid the > > complication to the server if it MUST enforce this rule which might not be > > a problem for the server in the first place. > > Hold on. I never suggested "MUST" for the server's obligation to check. > It seems that the server checking is where you see difficulties/inconvenience. > > I did suggest "MUST" as a possiblility for the clients' obligation to > conform. These two do *not* have to go in tandem. 
While it wouldn't make > any sense to have the client not have to conform while the server is giving > him an error if he doesn't, it is perfectly reasonable for the spec to > strongly state the rule for the client but not to insist that the server > check for compliance if it has great difficulties doing so. > > > For example, I might want to > > allow the read of the same large file(no caching) from one set of data > > server for client A and a different set of data server for client B. Now > > to enforce the above rule the server need to some how encode into the file > > handle information about which client can read what from which data > > server? > > I may not be uderstanding your example correctly but it sounds like the > case you are worried about is not really at issue here. The example was just to illustrate the information (capabilities) that you would have to encode in the file handle if we wanted the server verify that the client follow the rules. > > I know we have been talking kind of loosely about should/SHOULD/MUST > "follow the layout instruction". This is overbroad. If a server is > told to use server111.clustersRus.org and takes that same handles and > uses it on some other server, server111.clustersRus.com for example, > then he is not following the layout instruction, but the spec is not > going to require anybody to specifically act to make sure that he > gets an error. The effect of using a filehandle on a server other > than the one it for had always been undefined, and I expect it will > continue to be. Even though your data servers above are going to > be in more confederal relationship than the two server111's, I > think the same will still hold. If I take a handle for X and use it > on Y, I have a real good chance of getting STALE but there is no > guarantee that I will. > > The specific issue that started this (and that I'm still talking > about) is more limited. I'm given a handle H for a server A in a > layout and in that is the requirement that that handle be valid > for READ/WRITE, etc. and not for SETATTR. If the client uses it > on A and does a SETATTR, he SHOULD get an error. If he uses that > same handle on B and does a READ then he is broken but the server > has no obligation to recognize handles for other servers. Similarly > if he does a SETATTR with the handle on B. > > > I prefer not have this added extra work on the server and just say > > that the client should follow the rule to guaranty successful operation. > > In the IETF "should" is very weak and amounts to "gee, it is sort of a > good idea to". "SHOULD" is much stronger and says "Do it unless you have > a real good reason not to". "MUST" just says to do it. > > I guess I still think that if you receive a handle for server A in a > layout, you MUST NOT use it to do operations on that server other than > PUTFH, COMMIT, READ, WRITE, and that if you do, the server SHOULD give > you an error. > > If you feel this is too difficult for the server, then the "SHOULD" > would give you enough wiggle-room. > I can see that I can get a way with not checking on the server but I would like to implement a server that is closer to the spec recommendation and understand the requirements. The "SHOULD" makes me a little uncomfortable I would prefer "should", but if it is only me I will shut-up. I would rather keep using file handles as file ids which can be used on any node of a cluster file system and not have to encode capabilities into them. 
Now the client needs to remember all those different file handles that should be used for different operations and the server verifying that the client did so. Marc. _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Wed Jun 22 14:58:28 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DlAQe-00089D-8M; Wed, 22 Jun 2005 14:58:28 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DlAQc-000895-IC; Wed, 22 Jun 2005 14:58:26 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id OAA17221; Wed, 22 Jun 2005 14:58:24 -0400 (EDT) Received: from mx1.netapp.com ([216.240.18.38]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1DlAol-0006Kz-D2; Wed, 22 Jun 2005 15:23:26 -0400 Received: from smtp1.corp.netapp.com (10.57.156.124) by mx1.netapp.com with ESMTP; 22 Jun 2005 11:58:15 -0700 X-IronPort-AV: i="3.93,221,1115017200"; d="scan'208"; a="202012379:sNHT24311260" Received: from svlexc02.hq.netapp.com (svlexc02.corp.netapp.com [10.57.157.136]) by smtp1.corp.netapp.com (8.13.1/8.13.1/NTAP-1.6) with ESMTP id j5MIwEBT007542; Wed, 22 Jun 2005 11:58:14 -0700 (PDT) Received: from lavender.hq.netapp.com ([10.56.11.75]) by svlexc02.hq.netapp.com with Microsoft SMTPSVC(5.0.2195.6713); Wed, 22 Jun 2005 11:58:14 -0700 Received: from exnane01.hq.netapp.com ([10.97.0.61]) by lavender.hq.netapp.com with Microsoft SMTPSVC(5.0.2195.6713); Wed, 22 Jun 2005 11:58:14 -0700 X-MimeOLE: Produced By Microsoft Exchange V6.5.7226.0 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) Date: Wed, 22 Jun 2005 14:58:12 -0400 Message-ID: Thread-Topic: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) Thread-Index: AcV3TaPPqL1sKFj4T26HyZXxZx7S4gACeXwg From: "Noveck, Dave" To: "Marc Eshel" X-OriginalArrivalTime: 22 Jun 2005 18:58:14.0121 (UTC) FILETIME=[57A75D90:01C5775C] X-Spam-Score: 1.3 (+) X-Scan-Signature: 43317e64100dd4d87214c51822b582d1 Content-Transfer-Encoding: quoted-printable Cc: "Goodson, Garth" , nfsv4@ietf.org, nfsv4-bounces@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org > I can see that I can get a way with not checking on the server but I would=20 > like to implement a server that is closer to the spec recommendation and=20 > understand the requirements. The "SHOULD" makes me a little uncomfortable=20 > I would prefer "should", but if it is only me I will shut-up. It sounds like you'd prefer "if you feel like it" and that makes me uncomfortable. > I would rather keep using file handles as file ids which can be used on=20 > any node of a cluster file system and not have to encode capabilities into=20 > them. Now the client needs to remember all those different file handles=20 > that should be used for different operations Whoa! Before you said you were OK with the client obligation (to only do READ/WRITE/etc on filehandles it got from layouts) and only objected to the work of the server verifying compliance. 
Now it appears that you object to forcing the client to obey that rule, i.e. that the problem with the verification is not that it is hard to=20 do but that it would make the client remember "all those different file handles that should be used for different operations". If that is the case then we have a real problem. You have an implementation in which every server may act as a metadata server but the pnfs client cannot=20 assume that all of the implementations with which it will interact will have that characteristic or else we have a massive (lack-of)- interoperability problem. If a layout tells the client he may use handle A on server X to READ/WRITE then he had to be capable of=20 respecting that, whether the server holds him to it or not. I'm perfectly OK with exposing additional functionality that a=20 cluster fs would provide for metadata load-balancing and failover as long as we are clear that this is something that the client is directed to use based on server characteristics. For example, if the devinfo entry says that the layout handle may be used to read/ write on a certain set of guaranteed-equivalent servers, then this=20 is fine. Or if a locations_info attribute for the fs indicated that coherent metadata service was available on a given set of servers, then this is OK as well. But each of these options is an option=20 and the basic architecture of pnfs is that there is a distinction between data service and meta-data service and that the client=20 has to maintain that distinction. Just as a pnfs client should=20 not use a block address in a SETATTR request or send a filehandle=20 in a SCSI block write :-), it should not send a handle it got from=20 a layout in a SETATTR request. It should not send a filehandle it=20 got from the meta-data server to a data server unless it has some=20 specific guidance that it can, such as a locations_info attribute=20 saying servers X, Y, Z are equivalent. The important point is=20 that that latter is not always going to be there and the client may not assume that it is. =20 > and the server verifying that=20 > the client did so. The verification is a big help when testing. This is going to be more complicated that what we've done in the past and the=20 earlier we detect a problem the better off we are all going=20 to be. I wouldn't think of trying to make this work without=20 that kind of checking, particular given all the possible types=20 of implementations we have been talking about here. All this=20 requires is one bit in a file handle saying whether it gives the=20 right to do all operations (including metadata-server operations)=20 or just the subset for data server operations. If you are inclined=20 not to do this, my question would be, "Do you feel lucky?". -----Original Message----- From: Marc Eshel [mailto:eshel@almaden.ibm.com]=20 Sent: Wednesday, June 22, 2005 1:12 PM To: Noveck, Dave Cc: Goodson, Garth; nfsv4@ietf.org; nfsv4-bounces@ietf.org Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) "Noveck, Dave" wrote on 06/22/2005 06:50:45 AM: > The full response this time. >=20 > > > You say "you should follow the layout instruction" and I'm torn > > > between saying "you SHOULD follow the layout instruction" and > > > "you MUST follow the layout instruction". > > >=20 > > > You say "[should] get an error if you don't" and I say "the server > > > SHOULD give you an error if you don't". 
> > > > > I say should follow and not MUST follow because I am trying to avoid the > > complication to the server if it MUST enforce this rule, which might not be > > a problem for the server in the first place. > > Hold on. I never suggested "MUST" for the server's obligation to check. > It seems that the server checking is where you see difficulties/inconvenience. > > I did suggest "MUST" as a possibility for the clients' obligation to > conform. These two do *not* have to go in tandem. While it wouldn't make > any sense to have the client not have to conform while the server is giving > him an error if he doesn't, it is perfectly reasonable for the spec to > strongly state the rule for the client but not to insist that the server > check for compliance if it has great difficulties doing so. > > > For example, I might want to > > allow the read of the same large file (no caching) from one set of data > > servers for client A and a different set of data servers for client B. Now > > to enforce the above rule the server needs to somehow encode into the file > > handle information about which client can read what from which data > > server? > > I may not be understanding your example correctly but it sounds like the > case you are worried about is not really at issue here. The example was just to illustrate the information (capabilities) that you would have to encode in the file handle if we wanted the server to verify that the client follows the rules. > > I know we have been talking kind of loosely about should/SHOULD/MUST > "follow the layout instruction". This is overbroad. If a server is > told to use server111.clustersRus.org and takes that same handle and > uses it on some other server, server111.clustersRus.com for example, > then he is not following the layout instruction, but the spec is not > going to require anybody to specifically act to make sure that he > gets an error. The effect of using a filehandle on a server other > than the one it was issued for has always been undefined, and I expect it will > continue to be. Even though your data servers above are going to > be in a more confederal relationship than the two server111's, I > think the same will still hold. If I take a handle for X and use it > on Y, I have a real good chance of getting STALE but there is no > guarantee that I will. > > The specific issue that started this (and that I'm still talking > about) is more limited. I'm given a handle H for a server A in a > layout and in that is the requirement that that handle be valid > for READ/WRITE, etc. and not for SETATTR. If the client uses it > on A and does a SETATTR, he SHOULD get an error. If he uses that > same handle on B and does a READ then he is broken but the server > has no obligation to recognize handles for other servers. Similarly > if he does a SETATTR with the handle on B. > > > I prefer not to have this added extra work on the server and just say > > that the client should follow the rule to guarantee successful operation. > > In the IETF "should" is very weak and amounts to "gee, it is sort of a > good idea to". "SHOULD" is much stronger and says "Do it unless you have > a real good reason not to".
"MUST" just says to do it.=20 >=20 > I guess I still think that if you receive a handle for server A in a=20 > layout, you MUST NOT use it to do operations on that server other than > PUTFH, COMMIT, READ, WRITE, and that if you do, the server SHOULD give > you an error. >=20 > If you feel this is too difficult for the server, then the "SHOULD"=20 > would give you enough wiggle-room. >=20 I can see that I can get a way with not checking on the server but I would=20 like to implement a server that is closer to the spec recommendation and understand the requirements. The "SHOULD" makes me a little uncomfortable=20 I would prefer "should", but if it is only me I will shut-up. I would rather keep using file handles as file ids which can be used on=20 any node of a cluster file system and not have to encode capabilities into=20 them. Now the client needs to remember all those different file handles=20 that should be used for different operations and the server verifying that=20 the client did so. Marc.=20 _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Wed Jun 22 16:54:09 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DlCEb-0002Gt-Hy; Wed, 22 Jun 2005 16:54:09 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DlCEX-0002FH-7n; Wed, 22 Jun 2005 16:54:05 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id QAA04409; Wed, 22 Jun 2005 16:54:02 -0400 (EDT) Received: from e2.ny.us.ibm.com ([32.97.182.142]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1DlCcj-0001jX-BT; Wed, 22 Jun 2005 17:19:05 -0400 Received: from d01relay02.pok.ibm.com (d01relay02.pok.ibm.com [9.56.227.234]) by e2.ny.us.ibm.com (8.12.11/8.12.11) with ESMTP id j5MKrtZp003553; Wed, 22 Jun 2005 16:53:55 -0400 Received: from d01av04.pok.ibm.com (d01av04.pok.ibm.com [9.56.224.64]) by d01relay02.pok.ibm.com (8.12.10/NCO/VERS6.7) with ESMTP id j5MKrtZJ224388; Wed, 22 Jun 2005 16:53:55 -0400 Received: from d01av04.pok.ibm.com (loopback [127.0.0.1]) by d01av04.pok.ibm.com (8.12.11/8.13.3) with ESMTP id j5MKrtFk022236; Wed, 22 Jun 2005 16:53:55 -0400 Received: from [9.56.227.90] (d01ml604.pok.ibm.com [9.56.227.90]) by d01av04.pok.ibm.com (8.12.11/8.12.11) with ESMTP id j5MKrtEM022233; Wed, 22 Jun 2005 16:53:55 -0400 In-Reply-To: To: "Noveck, Dave" Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) MIME-Version: 1.0 X-Mailer: Lotus Notes Build V70_M4_01112005 Beta 3NP January 11, 2005 Message-ID: From: Marc Eshel Date: Wed, 22 Jun 2005 13:53:44 -0700 X-MIMETrack: Serialize by Router on D01ML604/01/M/IBM(Build V70_06092005|June 09, 2005) at 06/22/2005 16:53:55, Serialize complete at 06/22/2005 16:53:55 Content-Type: text/plain; charset="US-ASCII" X-Spam-Score: 0.0 (/) X-Scan-Signature: 41c17b4b16d1eedaa8395c26e9a251c4 Cc: "Goodson, Garth" , nfsv4@ietf.org, nfsv4-bounces@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org "Noveck, Dave" on 06/22/2005 11:58:12 AM: > Whoa! 
Before you said you were OK with the client obligation (to only > do > READ/WRITE/etc on filehandles it got from layouts) and only objected > to the work of the server verifying compliance. > > Now it appears that you object to forcing the client to obey that rule, > i.e. that the problem with the verification is not that it is hard to > do but that it would make the client remember "all those different file > handles that should be used for different operations". If that is the > case then we have a real problem. You have an implementation in which > every server may act as a metadata server but the pnfs client cannot > assume that all of the implementations with which it will interact > will have that characteristic or else we have a massive (lack-of)- > interoperability problem. If a layout tells the client he may use > handle A on server X to READ/WRITE then he had to be capable of > respecting that, whether the server holds him to it or not. I don't object. I just voiced a concern about the implementation overhead. It is obvious that I am thinking of cluster filesystem only and if there is a need for other implementation to require that the client use only the file handles provided for each specific operation then fine. > I'm perfectly OK with exposing additional functionality that a > cluster fs would provide for metadata load-balancing and failover > as long as we are clear that this is something that the client is > directed to use based on server characteristics. For example, if > the devinfo entry says that the layout handle may be used to read/ > write on a certain set of guaranteed-equivalent servers, then this > is fine. Or if a locations_info attribute for the fs indicated that > coherent metadata service was available on a given set of servers, > then this is OK as well. But each of these options is an option > and the basic architecture of pnfs is that there is a distinction > between data service and meta-data service and that the client > has to maintain that distinction. Just as a pnfs client should > not use a block address in a SETATTR request or send a filehandle > in a SCSI block write :-), it should not send a handle it got from > a layout in a SETATTR request. It should not send a filehandle it > got from the meta-data server to a data server unless it has some > specific guidance that it can, such as a locations_info attribute > saying servers X, Y, Z are equivalent. The important point is > that that latter is not always going to be there and the client > may not assume that it is. This sound like a good compromise I would like to see the above options in the protocol. > > and the server verifying that > > the client did so. > > The verification is a big help when testing. This is going to > be more complicated that what we've done in the past and the > earlier we detect a problem the better off we are all going > to be. I wouldn't think of trying to make this work without > that kind of checking, particular given all the possible types > of implementations we have been talking about here. All this > requires is one bit in a file handle saying whether it gives the > right to do all operations (including metadata-server operations) > or just the subset for data server operations. If you are inclined > not to do this, my question would be, "Do you feel lucky?". > Like you say it is only one bit and it is not to difficult to implement on the server, but now you force the client to remember 2 file handles that a different only by one bit (big waste of space :). 
The client need to remember only which are metadata servers and which are data servers and be required to direct operation to the appropriate server. If the client made a mistake and the server cares than the server can reject the operation. Marc. _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Wed Jun 22 21:51:22 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DlGsE-0005W1-7P; Wed, 22 Jun 2005 21:51:22 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DlGsC-0005Vt-6v; Wed, 22 Jun 2005 21:51:20 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id VAA25753; Wed, 22 Jun 2005 21:51:15 -0400 (EDT) Received: from brmea-mail-4.sun.com ([192.18.98.36]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1DlHGM-0006OL-Lt; Wed, 22 Jun 2005 22:16:21 -0400 Received: from sfbaymail1sca.SFBay.Sun.COM ([129.145.154.35]) by brmea-mail-4.sun.com (8.12.10/8.12.9) with ESMTP id j5N1pCqg020981; Wed, 22 Jun 2005 19:51:12 -0600 (MDT) Received: from sheplap.Central.Sun.COM (sheplap.Central.Sun.COM [10.1.194.251]) by sfbaymail1sca.SFBay.Sun.COM (8.12.10+Sun/8.12.10/ENSMAIL,v2.2) with ESMTP id j5N1pBPu013826; Wed, 22 Jun 2005 18:51:11 -0700 (PDT) Received: by sheplap.Central.Sun.COM (Postfix, from userid 76367) id EA1F44065BF; Wed, 22 Jun 2005 20:51:50 -0500 (CDT) Date: Wed, 22 Jun 2005 20:51:50 -0500 From: Spencer Shepler To: Marc Eshel Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) Message-ID: <20050623015150.GV5698@sheplap.Central.Sun.COM> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.1i X-Spam-Score: 0.0 (/) X-Scan-Signature: 8b30eb7682a596edff707698f4a80f7d Cc: "Goodson, Garth" , nfsv4-bounces@ietf.org, "Noveck, Dave" , nfsv4@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: spencer.shepler@sun.com List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org On Wed, Marc Eshel wrote: <...> > > > and the server verifying that > > > the client did so. > > > > The verification is a big help when testing. This is going to > > be more complicated that what we've done in the past and the > > earlier we detect a problem the better off we are all going > > to be. I wouldn't think of trying to make this work without > > that kind of checking, particular given all the possible types > > of implementations we have been talking about here. All this > > requires is one bit in a file handle saying whether it gives the > > right to do all operations (including metadata-server operations) > > or just the subset for data server operations. If you are inclined > > not to do this, my question would be, "Do you feel lucky?". > > > Like you say it is only one bit and it is not to difficult to implement on > the server, but now you force the client to remember 2 file handles that a > different only by one bit (big waste of space :). The client need to > remember only which are metadata servers and which are data servers and be > required to direct operation to the appropriate server. If the client made > a mistake and the server cares than the server can reject the operation. > Marc. 
This is a general comment (and one I have made before) based on the expectations of what the server will "look" like for a pNFS extension. It would be prudent identify the various operational, deployment or implementation models that people either have planned for the pNFS functionality or can reasonably imagine. It will be important so that we have this in mind when reviewing the general protocol for the subtle interactions of meta-data and data servers as has been identified in this thread of discussion. Oh yeah, the above is with my working group co-chair hat on. Spencer _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Thu Jun 23 01:12:12 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DlK0a-0001P2-2I; Thu, 23 Jun 2005 01:12:12 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DlK0V-0001Lv-LG; Thu, 23 Jun 2005 01:12:07 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id BAA11312; Thu, 23 Jun 2005 01:12:07 -0400 (EDT) Received: from e5.ny.us.ibm.com ([32.97.182.145]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1DlKOj-0006hN-PY; Thu, 23 Jun 2005 01:37:12 -0400 Received: from d01relay02.pok.ibm.com (d01relay02.pok.ibm.com [9.56.227.234]) by e5.ny.us.ibm.com (8.12.11/8.12.11) with ESMTP id j5N5Bf4e013661; Thu, 23 Jun 2005 01:11:41 -0400 Received: from d01av03.pok.ibm.com (d01av03.pok.ibm.com [9.56.224.217]) by d01relay02.pok.ibm.com (8.12.10/NCO/VERS6.7) with ESMTP id j5N5BfZJ226824; Thu, 23 Jun 2005 01:11:41 -0400 Received: from d01av03.pok.ibm.com (loopback [127.0.0.1]) by d01av03.pok.ibm.com (8.12.11/8.13.3) with ESMTP id j5N5BVKu004552; Thu, 23 Jun 2005 01:11:31 -0400 Received: from [9.56.227.90] (d01ml604.pok.ibm.com [9.56.227.90]) by d01av03.pok.ibm.com (8.12.11/8.12.11) with ESMTP id j5N5BVum004222; Thu, 23 Jun 2005 01:11:31 -0400 In-Reply-To: <20050623015150.GV5698@sheplap.Central.Sun.COM> To: spencer.shepler@sun.com Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) MIME-Version: 1.0 X-Mailer: Lotus Notes Build V70_M4_01112005 Beta 3NP January 11, 2005 Message-ID: From: Marc Eshel Date: Wed, 22 Jun 2005 22:11:09 -0700 X-MIMETrack: Serialize by Router on D01ML604/01/M/IBM(Build V70_06092005|June 09, 2005) at 06/23/2005 01:11:30, Serialize complete at 06/23/2005 01:11:30 Content-Type: text/plain; charset="US-ASCII" X-Spam-Score: 0.0 (/) X-Scan-Signature: 8b30eb7682a596edff707698f4a80f7d Cc: "Goodson, Garth" , nfsv4-bounces@ietf.org, "Noveck, Dave" , nfsv4@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org Spencer Shepler wrote on 06/22/2005 06:51:50 PM: > It would be prudent identify the various operational, deployment or > implementation models that people either have planned for the pNFS > functionality or can reasonably imagine. It will be important so that > we have this in mind when reviewing the general protocol for the > subtle interactions of meta-data and data servers as has been > identified in this thread of discussion. > > Oh yeah, the above is with my working group co-chair hat on. 
> I just started to think about this topic lately so I don't have a clear model so will just dump what I think that I can or would like to do in short (I am always very terse but I will try to elaborate:). Give a cluster filesystem where all the data is available on all the nodes I would like to use pNFS to do parallel I/O from as many nodes as possible so I would make all nodes to be data servers. I believe the metadata operation can saturate a single node even if it not doing any data I/O so I would like all the nodes to also be metadata server, in other words distribute all the operations to all the nodes. Or, direct all clients to a specific node for a file or a files segment because it is cached on that node or that node has faster access to the disks. Have multiple alternate nodes or maybe all the nodes for any given I/O in the case of an error. Return the data from the metadata server for small files and avoid all the layout exchange and redirection. Have short way to reference a list of nodes that can be in the hundreds that can be given once and not repeated in every layout. Not have to many requirement to validate correct client behavior which requires a lot of book keeping on the server side if the only thing it affected is performance (in the case of cluster filesystem) after all if the client requested all the data from the metadata server it is valid option and no one will produce any error codes. I am not sure if this is much help but I plan to spend much more time on the topic when I get back from vacation in 3 weeks and provide more input. I think that Dave Noveck suggested in his last note to add some options that will help with cluster filesystem implementations and I think this is a good idea :) maybe we need some more input from other cluster filesystem planed or even prototyped implementations. Marc. _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Thu Jun 23 18:26:21 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dla9M-0008CL-VW; Thu, 23 Jun 2005 18:26:20 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dla9K-0008CD-GM for nfsv4@megatron.ietf.org; Thu, 23 Jun 2005 18:26:18 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id SAA09444 for ; Thu, 23 Jun 2005 18:26:16 -0400 (EDT) Received: from gw-w.panasas.com ([63.80.58.206] helo=medlicott.panasas.com) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1DlaXj-0000yn-Ki for nfsv4@ietf.org; Thu, 23 Jun 2005 18:51:32 -0400 Received: from panasas.com (welch@localhost) by medlicott.panasas.com (8.11.6/8.11.6) with ESMTP id j5NMPqH30039; Thu, 23 Jun 2005 15:25:52 -0700 Message-Id: <200506232225.j5NMPqH30039@medlicott.panasas.com> X-Authentication-Warning: medlicott.panasas.com: welch owned process doing -bs X-Mailer: exmh version 2.7.3 (cvs) 04/15/2005 with nmh-1.0.4 To: Marc Eshel Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) In-reply-to: References: Comments: In-reply-to Marc Eshel message dated "Wed, 22 Jun 2005 13:53:44 -0700." 
From: Brent Welch X-URL: http://www.panasas.com/ X-Face: "HxE|?EnC9fVMV8f70H83&{fgLE.|FZ^$>@Q(yb#N,Eh~N]e&]=> r5~UnRml1:4EglY{9B+ :'wJq$@c_C!l8@<$t,{YUr4K,QJGHSvS~U]H`<+L*x?eGzSk>XH\W:AK\j?@?c1o, "Noveck, Dave" , nfsv4@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org >>>Marc Eshel said: > > "Noveck, Dave" on 06/22/2005 11:58:12 AM: > > > Whoa! Before you said you were OK with the client obligation (to only > > do > > READ/WRITE/etc on filehandles it got from layouts) and only objected > > to the work of the server verifying compliance. > > > > Now it appears that you object to forcing the client to obey that rule, > > i.e. that the problem with the verification is not that it is hard to > > do but that it would make the client remember "all those different file > > handles that should be used for different operations". If that is the > > case then we have a real problem. You have an implementation in which > > every server may act as a metadata server but the pnfs client cannot > > assume that all of the implementations with which it will interact > > will have that characteristic or else we have a massive (lack-of)- > > interoperability problem. If a layout tells the client he may use > > handle A on server X to READ/WRITE then he had to be capable of > > respecting that, whether the server holds him to it or not. > > I don't object. I just voiced a concern about the implementation overhead. > It is obvious that I am thinking of cluster filesystem only and if there > is a need for other implementation to require that the client use only the > file handles provided for each specific operation then fine. > > > I'm perfectly OK with exposing additional functionality that a > > cluster fs would provide for metadata load-balancing and failover > > as long as we are clear that this is something that the client is > > directed to use based on server characteristics. For example, if > > the devinfo entry says that the layout handle may be used to read/ > > write on a certain set of guaranteed-equivalent servers, then this > > is fine. Or if a locations_info attribute for the fs indicated that > > coherent metadata service was available on a given set of servers, > > then this is OK as well. But each of these options is an option > > and the basic architecture of pnfs is that there is a distinction > > between data service and meta-data service and that the client > > has to maintain that distinction. Just as a pnfs client should > > not use a block address in a SETATTR request or send a filehandle > > in a SCSI block write :-), it should not send a handle it got from > > a layout in a SETATTR request. It should not send a filehandle it > > got from the meta-data server to a data server unless it has some > > specific guidance that it can, such as a locations_info attribute > > saying servers X, Y, Z are equivalent. The important point is > > that that latter is not always going to be there and the client > > may not assume that it is. > > This sound like a good compromise I would like to see the above options in > the protocol. I'd like to suggest that we mention the issues about multiple metadata servers, but that we don't explicitly address them in the current pNFS proposals. The goal is to get pNFS clients that interoperate with different servers. 
If some servers have very different semantics (transparent failover among them, servicing of metadata or data operations with internal forwarding, whatever) then that has a big impact on the clients. In otherwords, we are starting small with just an effort to distribute the I/O load. Bypassing the metadata server for I/O goes a long way to reducing load and providing scalability. Let's get that worked out before we do metadata load balancing. If you really, really, wanted to go there, then you could define a new layout type that returned, e.g., a set of equivalent (deviceID, filehandle) that the client could use based on the availability or load of the data server. You might also be tempted (as Dean is) to return layouts that hint to the client that if it did a GETATTR to a data server it would get back something sensible. However, I don't think we should go there, even though you and I, as cluster file system implementers may have already done that. > > > and the server verifying that > > > the client did so. > > > > The verification is a big help when testing. This is going to > > be more complicated that what we've done in the past and the > > earlier we detect a problem the better off we are all going > > to be. I wouldn't think of trying to make this work without > > that kind of checking, particular given all the possible types > > of implementations we have been talking about here. All this > > requires is one bit in a file handle saying whether it gives the > > right to do all operations (including metadata-server operations) > > or just the subset for data server operations. If you are inclined > > not to do this, my question would be, "Do you feel lucky?". > > > Like you say it is only one bit and it is not to difficult to implement on > the server, but now you force the client to remember 2 file handles that a > different only by one bit (big waste of space :). The client need to > remember only which are metadata servers and which are data servers and be > required to direct operation to the appropriate server. If the client made > a mistake and the server cares than the server can reject the operation. I have the same reaction as Dave - I don't see how you can argue that the spec should imply that the client can get away with switching around the handles used on the metadata servers and the data servers. If your implementation wants to give out the same bit pattern for these cases, that's fine. But clients simply MUST use the file handles in the layouts for operations on the corresponding device, and it is simply undefined what happens if they use a file handle from the metadata server with a data server or vice versa, or heck, swap around the file handles among the different data servers. "Of course" the clients will keep track of what handles are to be used with what servers and what operations, because they MUST. 
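To make the "one bit in a file handle" idea from this exchange concrete, here is a minimal C sketch, assuming invented structure and operation names (nothing below comes from draft-welch-pnfs-ops-02 or from any real server): the metadata server derives the layout handle from its normal handle by setting a single data-only flag, and each server's check is one bit test before allowing anything beyond PUTFH/COMMIT/READ/WRITE.

/* Hypothetical sketch of the "one capability bit in the filehandle" idea
 * discussed above.  The handle layout and the operation names are invented
 * for illustration only; they are not taken from any draft. */
#include <stdint.h>
#include <stdio.h>

#define FH_FLAG_DATA_ONLY 0x01   /* handle came from a layout: I/O ops only */

struct filehandle {
    uint8_t  flags;              /* the capability bit lives here           */
    uint64_t fileid;             /* cluster-wide file id, same on every node */
};

enum op { OP_PUTFH, OP_READ, OP_WRITE, OP_COMMIT, OP_SETATTR };

/* Metadata server: derive the layout handle from the normal handle by
 * setting one bit; everything else stays identical. */
static struct filehandle make_layout_handle(struct filehandle fh)
{
    fh.flags |= FH_FLAG_DATA_ONLY;
    return fh;
}

/* Server-side check: the verification costs a single bit test per request. */
static int server_allows(const struct filehandle *fh, enum op op)
{
    if (fh->flags & FH_FLAG_DATA_ONLY)
        return op == OP_PUTFH || op == OP_READ ||
               op == OP_WRITE || op == OP_COMMIT;
    return 1;                    /* full handle: all operations permitted */
}

int main(void)
{
    struct filehandle meta = { .flags = 0, .fileid = 42 };
    struct filehandle data = make_layout_handle(meta);

    printf("SETATTR with metadata handle: %s\n",
           server_allows(&meta, OP_SETATTR) ? "ok" : "rejected");
    printf("SETATTR with layout handle:   %s\n",
           server_allows(&data, OP_SETATTR) ? "ok" : "rejected");
    printf("WRITE   with layout handle:   %s\n",
           server_allows(&data, OP_WRITE) ? "ok" : "rejected");
    return 0;
}

On the client side the bookkeeping is the same either way: the handle carried in the layout is remembered per device and file and used only for I/O, which is the obligation described above.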
-- Brent Welch Software Architect, Panasas Inc Accelerating Time to Results(tm) with Clustered Storage www.panasas.com welch@panasas.com _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Thu Jun 23 18:50:59 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DlaXD-0003R3-Dd; Thu, 23 Jun 2005 18:50:59 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DlaXB-0003Qy-LT for nfsv4@megatron.ietf.org; Thu, 23 Jun 2005 18:50:57 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id SAA11272 for ; Thu, 23 Jun 2005 18:50:55 -0400 (EDT) Received: from gw-w.panasas.com ([63.80.58.206] helo=medlicott.panasas.com) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1DlavZ-00027g-21 for nfsv4@ietf.org; Thu, 23 Jun 2005 19:16:11 -0400 Received: from panasas.com (welch@localhost) by medlicott.panasas.com (8.11.6/8.11.6) with ESMTP id j5NMohJ30146; Thu, 23 Jun 2005 15:50:43 -0700 Message-Id: <200506232250.j5NMohJ30146@medlicott.panasas.com> X-Authentication-Warning: medlicott.panasas.com: welch owned process doing -bs X-Mailer: exmh version 2.7.3 (cvs) 04/15/2005 with nmh-1.0.4 To: Marc Eshel Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) In-reply-to: References: Comments: In-reply-to Marc Eshel message dated "Wed, 22 Jun 2005 22:11:09 -0700." From: Brent Welch X-URL: http://www.panasas.com/ X-Face: "HxE|?EnC9fVMV8f70H83&{fgLE.|FZ^$>@Q(yb#N,Eh~N]e&]=> r5~UnRml1:4EglY{9B+ :'wJq$@c_C!l8@<$t,{YUr4K,QJGHSvS~U]H`<+L*x?eGzSk>XH\W:AK\j?@?c1o, spencer.shepler@sun.com, "Noveck, Dave" , nfsv4@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org >>>Marc Eshel said: > > Spencer Shepler wrote on 06/22/2005 06:51:50 PM: > > > It would be prudent identify the various operational, deployment or > > implementation models that people either have planned for the pNFS > > functionality or can reasonably imagine. It will be important so that > > we have this in mind when reviewing the general protocol for the > > subtle interactions of meta-data and data servers as has been > > identified in this thread of discussion. > > > > Oh yeah, the above is with my working group co-chair hat on. > > > I just started to think about this topic lately so I don't have a clear > model so will just dump what I think that I can or would like to do in > short (I am always very terse but I will try to elaborate:). Give a > cluster filesystem where all the data is available on all the nodes I > would like to use pNFS to do parallel I/O from as many nodes as possible > so I would make all nodes to be data servers. I believe the metadata > operation can saturate a single node even if it not doing any data I/O so > I would like all the nodes to also be metadata server, in other words > distribute all the operations to all the nodes. Or, direct all clients to > a specific node for a file or a files segment because it is cached on that > node or that node has faster access to the disks. Have multiple alternate > nodes or maybe all the nodes for any given I/O in the case of an error. 
> Return the data from the metadata server for small files and avoid all the > layout exchange and redirection. Have short way to reference a list of > nodes that can be in the hundreds that can be given once and not repeated > in every layout. Not have to many requirement to validate correct client > behavior which requires a lot of book keeping on the server side if the > only thing it affected is performance (in the case of cluster filesystem) > after all if the client requested all the data from the metadata server it > is valid option and no one will produce any error codes. > > I am not sure if this is much help but I plan to spend much more time on > the topic when I get back from vacation in 3 weeks and provide more input. > I think that Dave Noveck suggested in his last note to add some options > that will help with cluster filesystem implementations and I think this is > a good idea :) maybe we need some more input from other cluster filesystem > planed or even prototyped implementations. First I'll restate what I think your model is, and then describe another one. Under your cluster file system there is some storage substrate that today is hidden by your "nodes" (e.g., a back-end SAN). And, your nodes cooperate to manage metadata and each exports an identical view. You are thinking that pNFS will be another layer over your nodes, so that the underlying storage system is still fairly hidden. In this model, pNFS will let you fetch data for a single file from many "nodes" in parallel, and so get higher bandwidth (ideally) than a single node can deliver. Also, by artfully distributing the layouts returned to clients, you can smear the I/O load over more nodes and achieve more balanced load among your nodes. pNFS in its current form does not directly address the balancing of metadata operations like GETATTR over your nodes. The only approach I can offer you is that different clients mount different nodes to get a coarse level of metadata load balancing. As an aside, I think the FS_LOCATIONS attribute is similar in spirit to what you want, but you want it on a per-file basis. Today that operation applies to whole file systems for the purposes of migration. Ultimately I think we'll want a FILE_LOCATION attribute (or something) that redirects a client to a different metadata server. But that would be a different extension than pNFS. I think it could be orthogonal. A different model for your cluster file system is to bring the pNFS clients more tightly into your cluster file system by exposing more of the underlying storage layer. If you had a SAN, for example, then you'd be giving out block layouts and letting the clients sit right on the SAN and bypass your nodes altogether to do I/O. The objects world takes this approach. The clients can communicate directly with storage devices, and the storage devices don't really know how the objects being read/written by clients fit into the file system. The clients have to communicate with metadata servers which take the role of building up file system semantics on top of something with a simpler interface. Blocks are really simple, and objects are slightly richer. By shunting all the I/O load directly to the storage devices, the metadata servers don't have all that much work to do. So, I'd characterize this as an "asymmetric" model where data servers own particular pieces of storage, and the metadata servers direct clients to the appropriate location via layouts. 
In contrast, you have a "symmetric" model where any data is available at any storage server. But, ultimately there is a hidden asymmetric model unless you have fully replicated all the data on all nodes in the symmetric system. -- Brent Welch Software Architect, Panasas Inc Accelerating Time to Results(tm) with Clustered Storage www.panasas.com welch@panasas.com _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Thu Jun 23 20:06:09 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dlbhx-0000qk-60; Thu, 23 Jun 2005 20:06:09 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dlbhw-0000qb-6V; Thu, 23 Jun 2005 20:06:08 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id UAA18343; Thu, 23 Jun 2005 20:06:07 -0400 (EDT) Received: from e2.ny.us.ibm.com ([32.97.182.142]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1Dlc6M-0005w7-JF; Thu, 23 Jun 2005 20:31:23 -0400 Received: from d01relay02.pok.ibm.com (d01relay02.pok.ibm.com [9.56.227.234]) by e2.ny.us.ibm.com (8.12.11/8.12.11) with ESMTP id j5O05snQ022406; Thu, 23 Jun 2005 20:05:54 -0400 Received: from d01av03.pok.ibm.com (d01av03.pok.ibm.com [9.56.224.217]) by d01relay02.pok.ibm.com (8.12.10/NCO/VERS6.7) with ESMTP id j5O05siO261134; Thu, 23 Jun 2005 20:05:54 -0400 Received: from d01av03.pok.ibm.com (loopback [127.0.0.1]) by d01av03.pok.ibm.com (8.12.11/8.13.3) with ESMTP id j5O05r11015950; Thu, 23 Jun 2005 20:05:53 -0400 Received: from [9.56.227.90] (d01ml604.pok.ibm.com [9.56.227.90]) by d01av03.pok.ibm.com (8.12.11/8.12.11) with ESMTP id j5O05rNw015947; Thu, 23 Jun 2005 20:05:53 -0400 In-Reply-To: <200506232225.j5NMPqH30039@medlicott.panasas.com> To: Brent Welch Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) MIME-Version: 1.0 X-Mailer: Lotus Notes Build V70_M4_01112005 Beta 3NP January 11, 2005 Message-ID: From: Marc Eshel Date: Thu, 23 Jun 2005 17:05:40 -0700 X-MIMETrack: Serialize by Router on D01ML604/01/M/IBM(Build V70_06092005|June 09, 2005) at 06/23/2005 20:05:53, Serialize complete at 06/23/2005 20:05:53 Content-Type: text/plain; charset="US-ASCII" X-Spam-Score: 0.0 (/) X-Scan-Signature: 3d7f2f6612d734db849efa86ea692407 Cc: "Goodson, Garth" , nfsv4-bounces@ietf.org, "Noveck, Dave" , nfsv4@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org nfsv4-bounces@ietf.org wrote on 06/23/2005 03:25:52 PM: > > >>>Marc Eshel said: > > > > "Noveck, Dave" on 06/22/2005 11:58:12 AM: > > > > > Whoa! Before you said you were OK with the client obligation (to only > > > do > > > READ/WRITE/etc on filehandles it got from layouts) and only objected > > > to the work of the server verifying compliance. > > > > > > Now it appears that you object to forcing the client to obey that rule, > > > i.e. that the problem with the verification is not that it is hard to > > > do but that it would make the client remember "all those different file > > > handles that should be used for different operations". If that is the > > > case then we have a real problem. 
You have an implementation in which > > > every server may act as a metadata server but the pnfs client cannot > > > assume that all of the implementations with which it will interact > > > will have that characteristic or else we have a massive (lack-of)- > > > interoperability problem. If a layout tells the client he may use > > > handle A on server X to READ/WRITE then he had to be capable of > > > respecting that, whether the server holds him to it or not. > > > > I don't object. I just voiced a concern about the implementation > overhead. > > It is obvious that I am thinking of cluster filesystem only and if there > > is a need for other implementation to require that the client use only > the > > file handles provided for each specific operation then fine. > > > > > I'm perfectly OK with exposing additional functionality that a > > > cluster fs would provide for metadata load-balancing and failover > > > as long as we are clear that this is something that the client is > > > directed to use based on server characteristics. For example, if > > > the devinfo entry says that the layout handle may be used to read/ > > > write on a certain set of guaranteed-equivalent servers, then this > > > is fine. Or if a locations_info attribute for the fs indicated that > > > coherent metadata service was available on a given set of servers, > > > then this is OK as well. But each of these options is an option > > > and the basic architecture of pnfs is that there is a distinction > > > between data service and meta-data service and that the client > > > has to maintain that distinction. Just as a pnfs client should > > > not use a block address in a SETATTR request or send a filehandle > > > in a SCSI block write :-), it should not send a handle it got from > > > a layout in a SETATTR request. It should not send a filehandle it > > > got from the meta-data server to a data server unless it has some > > > specific guidance that it can, such as a locations_info attribute > > > saying servers X, Y, Z are equivalent. The important point is > > > that that latter is not always going to be there and the client > > > may not assume that it is. > > > > This sound like a good compromise I would like to see the above options > in > > the protocol. > > I'd like to suggest that we mention the issues about multiple > metadata servers, but that we don't explicitly address them in > the current pNFS proposals. The goal is to get pNFS clients that > interoperate with different servers. If some servers have very > different semantics (transparent failover among them, servicing > of metadata or data operations with internal forwarding, whatever) > then that has a big impact on the clients. In otherwords, we are > starting small with just an effort to distribute the I/O load. > Bypassing the metadata server for I/O goes a long way to reducing > load and providing scalability. Let's get that worked out before > we do metadata load balancing. > > If you really, really, wanted to go there, then you could define > a new layout type that returned, e.g., a set of equivalent > (deviceID, filehandle) that the client could use based on > the availability or load of the data server. You might also > be tempted (as Dean is) to return layouts that hint to the client > that if it did a GETATTR to a data server it would get back > something sensible. However, I don't think we should go there, even > though you and I, as cluster file system implementers may have > already done that. 
> Yes I really really want to go there because there are few different cluster filesystems out there today with clusters of hounders and thousands of nodes and they can really really benefit from the p in pNFS. it is not some future requirement and I really don't want to wait for the next version of this protocol. I don't want to give hundred identical file handles, I want a way to give one file handles and tell the client to use it on a list of data servers that I can give once and reference over and over. I would also use Dean's hint for GETATTR. > > > > and the server verifying that > > > > the client did so. > > > > > > The verification is a big help when testing. This is going to > > > be more complicated that what we've done in the past and the > > > earlier we detect a problem the better off we are all going > > > to be. I wouldn't think of trying to make this work without > > > that kind of checking, particular given all the possible types > > > of implementations we have been talking about here. All this > > > requires is one bit in a file handle saying whether it gives the > > > right to do all operations (including metadata-server operations) > > > or just the subset for data server operations. If you are inclined > > > not to do this, my question would be, "Do you feel lucky?". > > > > > Like you say it is only one bit and it is not to difficult to implement > on > > the server, but now you force the client to remember 2 file handles that > a > > different only by one bit (big waste of space :). The client need to > > remember only which are metadata servers and which are data servers and > be > > required to direct operation to the appropriate server. If the client > made > > a mistake and the server cares than the server can reject the operation. > > I have the same reaction as Dave - I don't see how you can argue that > the spec should imply that the client can get away with switching > around the handles used on the metadata servers and the data servers. > If your implementation wants to give out the same bit pattern for > these cases, that's fine. But clients simply MUST use the file handles > in the layouts for operations on the corresponding device, and it > is simply undefined what happens if they use a file handle from the > metadata server with a data server or vice versa, or heck, swap around > the file handles among the different data servers. "Of course" the > clients will keep track of what handles are to be used with what > servers and what operations, because they MUST. > At first I just suggest that server will not have to verify the usage of the right file handle, not that the client swap around file handles (Dave said that I don't really have to If I don't want), but like you suggested I can use equivalent file handles to avoid the problem, the client have nothing to swap or get confused with. Now I suggest like in the above comment that the client get only one file handles so there is no possibility for confusion and we can save a lot of space. 
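As a rough illustration of what Marc is asking for, one file handle plus a node list that is handed out once and then only referenced, here is a small C sketch; the structures and field names are invented for this example and are not part of the current draft.

/* Illustrative only: one way a layout could name hundreds of data servers
 * without repeating them.  The types below are invented for this sketch. */
#include <stdint.h>
#include <stdio.h>

#define MAX_NODES 512

struct device_list {                 /* sent to the client once, cached by id */
    uint32_t id;
    uint32_t count;
    const char *addr[MAX_NODES];     /* network addresses of the data servers */
};

struct file_layout {                 /* small: one handle, one reference      */
    uint64_t filehandle;             /* same handle is valid on every node    */
    uint32_t device_list_id;         /* refers to a cached struct device_list */
    uint32_t stripe_size;            /* bytes per stripe unit                 */
};

/* Client side: pick the data server for a given file offset by striding
 * round-robin across the cached device list. */
static const char *server_for_offset(const struct file_layout *lo,
                                     const struct device_list *dl,
                                     uint64_t offset)
{
    uint64_t stripe = offset / lo->stripe_size;
    return dl->addr[stripe % dl->count];
}

int main(void)
{
    struct device_list dl = { .id = 7, .count = 3,
                              .addr = { "node0", "node1", "node2" } };
    struct file_layout lo = { .filehandle = 42, .device_list_id = 7,
                              .stripe_size = 1 << 20 };

    for (uint64_t off = 0; off < (uint64_t)4 << 20; off += 1 << 20)
        printf("offset %8llu -> %s\n",
               (unsigned long long)off, server_for_offset(&lo, &dl, off));
    return 0;
}

Because the same data is reachable on every node in Marc's symmetric model, a client that gets an error from one entry could retry the same handle against another entry in the list, which is the failover behavior he sketched earlier in the thread.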
> -- > Brent Welch > Software Architect, Panasas Inc > Accelerating Time to Results(tm) with Clustered Storage > _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Thu Jun 23 20:24:58 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dlc0A-0004sN-Px; Thu, 23 Jun 2005 20:24:58 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dlc07-0004s8-4Q for nfsv4@megatron.ietf.org; Thu, 23 Jun 2005 20:24:57 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id UAA19264 for ; Thu, 23 Jun 2005 20:24:51 -0400 (EDT) Received: from e5.ny.us.ibm.com ([32.97.182.145]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1DlcOV-0006uD-ER for nfsv4@ietf.org; Thu, 23 Jun 2005 20:50:07 -0400 Received: from d01relay04.pok.ibm.com (d01relay04.pok.ibm.com [9.56.227.236]) by e5.ny.us.ibm.com (8.12.11/8.12.11) with ESMTP id j5O0OdtC010668 for ; Thu, 23 Jun 2005 20:24:39 -0400 Received: from d01av02.pok.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by d01relay04.pok.ibm.com (8.12.10/NCO/VERS6.7) with ESMTP id j5O0OdKb205956 for ; Thu, 23 Jun 2005 20:24:39 -0400 Received: from d01av02.pok.ibm.com (loopback [127.0.0.1]) by d01av02.pok.ibm.com (8.12.11/8.13.3) with ESMTP id j5O0Oc5w000976 for ; Thu, 23 Jun 2005 20:24:39 -0400 Received: from [9.56.227.90] (d01ml604.pok.ibm.com [9.56.227.90]) by d01av02.pok.ibm.com (8.12.11/8.12.11) with ESMTP id j5O0OchV000973; Thu, 23 Jun 2005 20:24:38 -0400 In-Reply-To: <200506232250.j5NMohJ30146@medlicott.panasas.com> To: Brent Welch Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) MIME-Version: 1.0 X-Mailer: Lotus Notes Build V70_M4_01112005 Beta 3NP January 11, 2005 Message-ID: From: Marc Eshel Date: Thu, 23 Jun 2005 17:24:26 -0700 X-MIMETrack: Serialize by Router on D01ML604/01/M/IBM(Build V70_06092005|June 09, 2005) at 06/23/2005 20:24:37, Serialize complete at 06/23/2005 20:24:37 Content-Type: text/plain; charset="US-ASCII" X-Spam-Score: 0.0 (/) X-Scan-Signature: 200d029292fbb60d25b263122ced50fc Cc: "Goodson, Garth" , spencer.shepler@sun.com, "Noveck, Dave" , nfsv4@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org This my last note I am going on vacation and I will not have internet access for 3 weeks. I will continue to bug you when I come back. Marc. Brent Welch wrote on 06/23/2005 03:50:43 PM: > >>>Marc Eshel said: > > > > Spencer Shepler wrote on 06/22/2005 06:51:50 > PM: > > > > > It would be prudent identify the various operational, deployment or > > > implementation models that people either have planned for the pNFS > > > functionality or can reasonably imagine. It will be important so that > > > we have this in mind when reviewing the general protocol for the > > > subtle interactions of meta-data and data servers as has been > > > identified in this thread of discussion. > > > > > > Oh yeah, the above is with my working group co-chair hat on. > > > > > I just started to think about this topic lately so I don't have a clear > > model so will just dump what I think that I can or would like to do in > > short (I am always very terse but I will try to elaborate:). 
Give a > > cluster filesystem where all the data is available on all the nodes I > > would like to use pNFS to do parallel I/O from as many nodes as possible > > so I would make all nodes to be data servers. I believe the metadata > > operation can saturate a single node even if it not doing any data I/O > so > > I would like all the nodes to also be metadata server, in other words > > distribute all the operations to all the nodes. Or, direct all clients > to > > a specific node for a file or a files segment because it is cached on > that > > node or that node has faster access to the disks. Have multiple > alternate > > nodes or maybe all the nodes for any given I/O in the case of an error. > > Return the data from the metadata server for small files and avoid all > the > > layout exchange and redirection. Have short way to reference a list of > > nodes that can be in the hundreds that can be given once and not > repeated > > in every layout. Not have to many requirement to validate correct client > > behavior which requires a lot of book keeping on the server side if the > > only thing it affected is performance (in the case of cluster > filesystem) > > after all if the client requested all the data from the metadata server > it > > is valid option and no one will produce any error codes. > > > > I am not sure if this is much help but I plan to spend much more time on > > the topic when I get back from vacation in 3 weeks and provide more > input. > > I think that Dave Noveck suggested in his last note to add some options > > that will help with cluster filesystem implementations and I think this > is > > a good idea :) maybe we need some more input from other cluster > filesystem > > planed or even prototyped implementations. > > First I'll restate what I think your model is, and then describe another > one. > > Under your cluster file system there is some storage substrate that > today is hidden by your "nodes" (e.g., a back-end SAN). And, your > nodes cooperate to manage metadata and each exports an identical view. > You are thinking that pNFS will be another layer over your nodes, so > that the underlying storage system is still fairly hidden. In this model, > pNFS will let you fetch data for a single file from many "nodes" in > parallel, and so get higher bandwidth (ideally) than a single node > can deliver. Also, by artfully distributing the layouts returned to > clients, you can smear the I/O load over more nodes and achieve more > balanced load among your nodes. pNFS in its current form does not > directly address the balancing of metadata operations like GETATTR > over your nodes. The only approach I can offer you is that different > clients mount different nodes to get a coarse level of metadata > load balancing. > I would like to use fs_locations to distribute the client to different nodes. > As an aside, I think the FS_LOCATIONS attribute is similar in spirit > to what you want, but you want it on a per-file basis. Today that > operation applies to whole file systems for the purposes of migration. > Ultimately I think we'll want a FILE_LOCATION attribute (or something) > that redirects a client to a different metadata server. But that would > be a different extension than pNFS. I think it could be orthogonal. I don't think that it is orthogonal I was hoping that the output of pNFS will include file-locations. 
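For the coarse metadata load balancing mentioned here (different clients mounting different nodes, steered by an fs_locations-style list), the client-side choice might look like the following sketch; the node list and the hash rule are assumptions for illustration, not anything the fs_locations attribute actually specifies, and the real attribute applies per filesystem rather than per file.

/* Sketch of coarse metadata load balancing: each client picks one of a set
 * of equivalent metadata nodes.  Node names and selection rule are invented. */
#include <stdio.h>
#include <stddef.h>

static const char *metadata_nodes[] = { "mds0", "mds1", "mds2", "mds3" };
#define NNODES (sizeof metadata_nodes / sizeof metadata_nodes[0])

/* Pick a node deterministically from the client's identity so that a given
 * client always talks to the same metadata server. */
static const char *pick_metadata_server(const char *client_id)
{
    size_t h = 0;
    for (const char *p = client_id; *p; p++)
        h = h * 31 + (unsigned char)*p;
    return metadata_nodes[h % NNODES];
}

int main(void)
{
    const char *clients[] = { "clientA", "clientB", "clientC" };
    for (size_t i = 0; i < sizeof clients / sizeof clients[0]; i++)
        printf("%s mounts %s\n", clients[i], pick_metadata_server(clients[i]));
    return 0;
}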
> A different model for your cluster file system is to bring the pNFS > clients more tightly into your cluster file system by exposing more > of the underlying storage layer. If you had a SAN, for example, then > you'd be giving out block layouts and letting the clients sit right > on the SAN and bypass your nodes altogether to do I/O. I prefer the file layout protocol on the block one. > The objects world takes this approach. The clients can communicate > directly with storage devices, and the storage devices don't really > know how the objects being read/written by clients fit into the > file system. The clients have to communicate with metadata servers > which take the role of building up file system semantics on top of > something with a simpler interface. Blocks are really simple, and > objects are slightly richer. By shunting all the I/O load directly > to the storage devices, the metadata servers don't have all that > much work to do. So, I'd characterize this as an "asymmetric" model > where data servers own particular pieces of storage, and the metadata > servers direct clients to the appropriate location via layouts. > In contrast, you have a "symmetric" model where any data is available > at any storage server. But, ultimately there is a hidden asymmetric > model unless you have fully replicated all the data on all nodes > in the symmetric system. > I have a symmetric system the only asymmetric might be in some configuration only in regards to performance and I would use the pNFS protocol optimize the performance where asymmetry exist. I believe that the cluster filesystem model is the simpler one and should be taken into consideration in the first version of pNFS. > -- > Brent Welch > Software Architect, Panasas Inc > Accelerating Time to Results(tm) with Clustered Storage _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Thu Jun 23 20:27:39 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dlc2l-0005kQ-J5; Thu, 23 Jun 2005 20:27:39 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dlc2j-0005iy-G0 for nfsv4@megatron.ietf.org; Thu, 23 Jun 2005 20:27:37 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id UAA19452 for ; Thu, 23 Jun 2005 20:27:36 -0400 (EDT) Received: from traakan.com ([66.160.190.59]) by ietf-mx.ietf.org with smtp (Exim 4.33) id 1DlcR6-0006xg-Je for nfsv4@ietf.org; Thu, 23 Jun 2005 20:52:52 -0400 Received: from GWW15 ([64.168.153.34]) by traakan.com for ; Thu, 23 Jun 2005 17:22:53 -0700 From: "Gordon Waidhofer" To: , Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) Date: Thu, 23 Jun 2005 17:26:45 -0700 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0) X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1506 In-Reply-To: Importance: Normal X-Spam-Score: 0.0 (/) X-Scan-Signature: df9edf1223802dd4cf213867a3af6121 Content-Transfer-Encoding: 7bit X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org I'll plead 
ignorance right up front because I haven't had time to read the spec and still don't. But........ Something (metadata server I guess) is providing a (deviceID, fileHandle) pair for accessing file content. How is the deviceID mapped to a node address? Is it possible that the aggregation of data servers can be handled by the deviceID->node mapping (here is a list of alternate addresses for this deviceID)? I'm not a stake holder in pNFS at this time but think it likely in the future. I would, for the sake of sound forward progress, suggest that multiple metadata servers is an order of magnitude more complicated than the single metadata server case. It would hopelessly stall pNFS, and useful single metadata server deployments would stall needlessly because of it. There does seem to be fair bit of chatter about aggregate devices (clients able to alternate at will) that it's worth a little try. I agree with Spencer that case studies would go a long way to helping frame the mind and the discussion. FWIW. Regards, -gww > -----Original Message----- > From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org]On Behalf Of > Marc Eshel > Sent: Thursday, June 23, 2005 5:06 PM > To: Brent Welch > Cc: Goodson, Garth; nfsv4-bounces@ietf.org; Noveck, Dave; nfsv4@ietf.org > Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) > > > nfsv4-bounces@ietf.org wrote on 06/23/2005 03:25:52 PM: > > > > > >>>Marc Eshel said: > > > > > > "Noveck, Dave" on 06/22/2005 11:58:12 AM: > > > > > > > Whoa! Before you said you were OK with the client obligation (to > only > > > > do > > > > READ/WRITE/etc on filehandles it got from layouts) and only > objected > > > > to the work of the server verifying compliance. > > > > > > > > Now it appears that you object to forcing the client to obey that > rule, > > > > i.e. that the problem with the verification is not that it is hard > to > > > > do but that it would make the client remember "all those different > file > > > > handles that should be used for different operations". If that is > the > > > > case then we have a real problem. You have an implementation in > which > > > > every server may act as a metadata server but the pnfs client > cannot > > > > assume that all of the implementations with which it will interact > > > > will have that characteristic or else we have a massive (lack-of)- > > > > interoperability problem. If a layout tells the client he may use > > > > handle A on server X to READ/WRITE then he had to be capable of > > > > respecting that, whether the server holds him to it or not. > > > > > > I don't object. I just voiced a concern about the implementation > > overhead. > > > It is obvious that I am thinking of cluster filesystem only and if > there > > > is a need for other implementation to require that the client use > only > > the > > > file handles provided for each specific operation then fine. > > > > > > > I'm perfectly OK with exposing additional functionality that a > > > > cluster fs would provide for metadata load-balancing and failover > > > > as long as we are clear that this is something that the client is > > > > directed to use based on server characteristics. For example, if > > > > the devinfo entry says that the layout handle may be used to read/ > > > > write on a certain set of guaranteed-equivalent servers, then this > > > > is fine. Or if a locations_info attribute for the fs indicated > that > > > > coherent metadata service was available on a given set of servers, > > > > then this is OK as well. 
> -----Original Message-----
> From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] On Behalf Of Marc Eshel
> Sent: Thursday, June 23, 2005 5:06 PM
> To: Brent Welch
> Cc: Goodson, Garth; nfsv4-bounces@ietf.org; Noveck, Dave; nfsv4@ietf.org
> Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt)
>
> nfsv4-bounces@ietf.org wrote on 06/23/2005 03:25:52 PM:
>
> > >>>Marc Eshel said:
> > >
> > > "Noveck, Dave" on 06/22/2005 11:58:12 AM:
> > >
> > > > Whoa! Before you said you were OK with the client obligation (to only do READ/WRITE/etc on filehandles it got from layouts) and only objected to the work of the server verifying compliance.
> > > >
> > > > Now it appears that you object to forcing the client to obey that rule, i.e. that the problem with the verification is not that it is hard to do but that it would make the client remember "all those different file handles that should be used for different operations". If that is the case then we have a real problem. You have an implementation in which every server may act as a metadata server, but the pnfs client cannot assume that all of the implementations with which it will interact will have that characteristic, or else we have a massive (lack-of-)interoperability problem. If a layout tells the client he may use handle A on server X to READ/WRITE, then he has to be capable of respecting that, whether the server holds him to it or not.
> > >
> > > I don't object. I just voiced a concern about the implementation overhead. It is obvious that I am thinking of cluster filesystems only, and if there is a need for other implementations to require that the client use only the file handles provided for each specific operation, then fine.
> > >
> > > > I'm perfectly OK with exposing additional functionality that a cluster fs would provide for metadata load-balancing and failover as long as we are clear that this is something that the client is directed to use based on server characteristics. For example, if the devinfo entry says that the layout handle may be used to read/write on a certain set of guaranteed-equivalent servers, then this is fine. Or if a locations_info attribute for the fs indicated that coherent metadata service was available on a given set of servers, then this is OK as well. But each of these options is an option, and the basic architecture of pnfs is that there is a distinction between data service and meta-data service and that the client has to maintain that distinction. Just as a pnfs client should not use a block address in a SETATTR request or send a filehandle in a SCSI block write :-), it should not send a handle it got from a layout in a SETATTR request. It should not send a filehandle it got from the meta-data server to a data server unless it has some specific guidance that it can, such as a locations_info attribute saying servers X, Y, Z are equivalent. The important point is that that latter is not always going to be there and the client may not assume that it is.
> > >
> > > This sounds like a good compromise; I would like to see the above options in the protocol.
> >
> > I'd like to suggest that we mention the issues about multiple metadata servers, but that we don't explicitly address them in the current pNFS proposals. The goal is to get pNFS clients that interoperate with different servers. If some servers have very different semantics (transparent failover among them, servicing of metadata or data operations with internal forwarding, whatever) then that has a big impact on the clients. In other words, we are starting small with just an effort to distribute the I/O load. Bypassing the metadata server for I/O goes a long way to reducing load and providing scalability. Let's get that worked out before we do metadata load balancing.
> >
> > If you really, really, wanted to go there, then you could define a new layout type that returned, e.g., a set of equivalent (deviceID, filehandle) pairs that the client could use based on the availability or load of the data server. You might also be tempted (as Dean is) to return layouts that hint to the client that if it did a GETATTR to a data server it would get back something sensible. However, I don't think we should go there, even though you and I, as cluster file system implementers, may have already done that.
>
> Yes, I really really want to go there, because there are a few different cluster filesystems out there today with clusters of hundreds and thousands of nodes, and they can really really benefit from the "p" in pNFS. It is not some future requirement, and I really don't want to wait for the next version of this protocol. I don't want to give out a hundred identical file handles; I want a way to give one file handle and tell the client to use it on a list of data servers that I can give once and reference over and over. I would also use Dean's hint for GETATTR.
>
> > > > > and the server verifying that the client did so.
> > > >
> > > > The verification is a big help when testing. This is going to be more complicated than what we've done in the past, and the earlier we detect a problem the better off we are all going to be. I wouldn't think of trying to make this work without that kind of checking, particularly given all the possible types of implementations we have been talking about here. All this requires is one bit in a file handle saying whether it gives the right to do all operations (including metadata-server operations) or just the subset for data server operations. If you are inclined not to do this, my question would be, "Do you feel lucky?".
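Here is a minimal sketch of the one-bit check described above, assuming a server that reserves a flag bit in the first byte of its otherwise opaque filehandles to mean "usable for data-server operations only". The encoding, names, and operation classes are invented for illustration; nothing here comes from the draft.

    /*
     * Illustrative only: assume this server reserves one flag bit in the
     * first byte of its opaque filehandle to mean "this handle came from
     * a layout and may only be used for data-server operations (READ,
     * WRITE, COMMIT, ...)".  A metadata operation arriving with such a
     * handle is rejected.
     */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define FH_FLAG_DS_ONLY 0x01u   /* hypothetical flag bit */

    struct nfs_fh {
        uint32_t len;
        uint8_t  data[128];
    };

    enum op_class { OP_METADATA, OP_DATA };

    /* Return true if the operation is allowed with this filehandle. */
    static bool fh_permits(const struct nfs_fh *fh, enum op_class op)
    {
        bool ds_only = (fh->data[0] & FH_FLAG_DS_ONLY) != 0;
        return op == OP_DATA || !ds_only;
    }

    int main(void)
    {
        struct nfs_fh layout_fh = { .len = 16, .data = { FH_FLAG_DS_ONLY } };
        struct nfs_fh meta_fh   = { .len = 16, .data = { 0 } };

        printf("SETATTR with layout handle:   %s\n",
               fh_permits(&layout_fh, OP_METADATA) ? "allowed" : "rejected");
        printf("WRITE with layout handle:     %s\n",
               fh_permits(&layout_fh, OP_DATA) ? "allowed" : "rejected");
        printf("SETATTR with metadata handle: %s\n",
               fh_permits(&meta_fh, OP_METADATA) ? "allowed" : "rejected");
        return 0;
    }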
> > > Like you say, it is only one bit, and it is not too difficult to implement on the server, but now you force the client to remember two file handles that differ only by one bit (big waste of space :). The client needs to remember only which are metadata servers and which are data servers, and be required to direct operations to the appropriate server. If the client makes a mistake and the server cares, then the server can reject the operation.
> >
> > I have the same reaction as Dave - I don't see how you can argue that the spec should imply that the client can get away with switching around the handles used on the metadata servers and the data servers. If your implementation wants to give out the same bit pattern for these cases, that's fine. But clients simply MUST use the file handles in the layouts for operations on the corresponding device, and it is simply undefined what happens if they use a file handle from the metadata server with a data server or vice versa, or heck, swap around the file handles among the different data servers. "Of course" the clients will keep track of what handles are to be used with what servers and what operations, because they MUST.
>
> At first I just suggested that the server would not have to verify the usage of the right file handle, not that the client would swap around file handles (Dave said that I don't really have to if I don't want to), but like you suggested I can use equivalent file handles to avoid the problem, so the client has nothing to swap or get confused with. Now I suggest, as in the above comment, that the client get only one file handle, so there is no possibility for confusion and we can save a lot of space.
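One way a client could do the bookkeeping described above (always use the filehandle the layout named for the device it named) is a small per-device table; in a symmetric design every entry may hold the same handle, but the lookup discipline stays the same. This is a hypothetical client-side sketch, not anything the draft specifies.

    /*
     * Hypothetical client-side bookkeeping: remember, per device ID, the
     * filehandle the layout said to use there, and look it up before
     * issuing I/O.  In a symmetric cluster filesystem every slot may hold
     * the same handle, but the discipline is unchanged.
     */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    struct nfs_fh { uint32_t len; uint8_t data[128]; };

    #define MAX_DEVICES 16

    struct fh_table {
        uint32_t      device_id[MAX_DEVICES];
        struct nfs_fh fh[MAX_DEVICES];
        int           count;
    };

    static void fh_table_set(struct fh_table *t, uint32_t dev, const struct nfs_fh *fh)
    {
        if (t->count >= MAX_DEVICES)
            return;                      /* table full; real code would grow it */
        t->device_id[t->count] = dev;
        t->fh[t->count] = *fh;
        t->count++;
    }

    /* The handle to use for I/O on 'dev', or NULL if we hold no layout there. */
    static const struct nfs_fh *fh_table_get(const struct fh_table *t, uint32_t dev)
    {
        for (int i = 0; i < t->count; i++)
            if (t->device_id[i] == dev)
                return &t->fh[i];
        return NULL;
    }

    int main(void)
    {
        struct fh_table tbl = { .count = 0 };
        struct nfs_fh fh = { .len = 16 };
        memset(fh.data, 0xab, sizeof fh.data);

        /* Same handle registered for two data servers, as in the symmetric case. */
        fh_table_set(&tbl, 1, &fh);
        fh_table_set(&tbl, 2, &fh);

        printf("device 1: %s\n", fh_table_get(&tbl, 1) ? "have layout handle" : "no layout");
        printf("device 3: %s\n", fh_table_get(&tbl, 3) ? "have layout handle" : "no layout");
        return 0;
    }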
> > --
> > Brent Welch
> > Software Architect, Panasas Inc
> > Accelerating Time to Results(tm) with Clustered Storage
>
> _______________________________________________
> nfsv4 mailing list
> nfsv4@ietf.org
> https://www1.ietf.org/mailman/listinfo/nfsv4

_______________________________________________
nfsv4 mailing list
nfsv4@ietf.org
https://www1.ietf.org/mailman/listinfo/nfsv4

From nfsv4-bounces@ietf.org Sun Jun 26 14:14:55 2005
Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dmbeh-0002gz-Cp; Sun, 26 Jun 2005 14:14:55 -0400
Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dmbef-0002gu-NQ for nfsv4@megatron.ietf.org; Sun, 26 Jun 2005 14:14:53 -0400
Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id OAA29197 for ; Sun, 26 Jun 2005 14:14:51 -0400 (EDT)
Received: from gw-e.panasas.com ([65.194.124.178] helo=blackcomb.panasas.com) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1Dmc3e-0007UI-9x for nfsv4@ietf.org; Sun, 26 Jun 2005 14:40:43 -0400
Received: from [127.0.0.1] (bhalevy@dynamic-vpn34.panasas.com [172.17.19.34]) by blackcomb.panasas.com (8.9.3/8.9.3) with ESMTP id OAA19272; Sun, 26 Jun 2005 14:14:39 -0400
Message-ID: <42BEF082.1040202@panasas.com>
Date: Sun, 26 Jun 2005 21:14:26 +0300
From: Benny Halevy
User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Garth Goodson
Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt)
References: <42B8A899.5030204@netapp.com>
In-Reply-To: <42B8A899.5030204@netapp.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Score: 0.0 (/)
X-Scan-Signature: 39bd8f8cbb76cae18b7e23f7cf6b2b9f
Content-Transfer-Encoding: 7bit
Cc: Marc Eshel , nfsv4@ietf.org, "Noveck, Dave"
X-BeenThere: nfsv4@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: NFSv4 Working Group
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Sender: nfsv4-bounces@ietf.org
Errors-To: nfsv4-bounces@ietf.org

On 6/22/2005, Garth Goodson wrote:
[snip]
>>
>> I know I can do it. I just don't want to make sure (enforce the rule) that each client is using the file handles to read only from the specified data server.
>> Marc.
>>
>> Marc.
>
> Ok, that is a valid concern (not having to propagate layouts to the data servers to validate that I/Os are coming from the correct clients). I guess the object guys get around this by encoding the layout/device IDs into the capability that is handed back to the client with the layout.

T10's object capabilities model does not encode the client identity, nor the layout/device IDs, into the capability, so the object storage device (OSD) has no way to verify that an I/O request came from the "correct" client (e.g. the client that got the cap could have given it to another client and it would just work). Yet the capability is generated for each device, so a cap for one device wouldn't work on another device if the two devices have different device keys.

> It has been marked as an open issue...
>
> -Garth
>
> _______________________________________________
> nfsv4 mailing list
> nfsv4@ietf.org
> https://www1.ietf.org/mailman/listinfo/nfsv4

_______________________________________________
nfsv4 mailing list
nfsv4@ietf.org
https://www1.ietf.org/mailman/listinfo/nfsv4
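To illustrate the per-device-key property described above: the capability names an object and permissions but not a client, and it only verifies under the key of the device it was minted for. The checksum below is a toy stand-in, not the actual T10 OSD security algorithm, and all structure and field names are illustrative.

    /*
     * Toy illustration: a capability is validated with a keyed checksum
     * computed under the target device's secret key.  Nothing in it names
     * the client, so any holder can use it, but a cap minted for device A
     * will not verify on device B because B's key differs.  The "MAC"
     * below is a trivial stand-in, NOT the real T10 OSD algorithm.
     */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    struct osd_cap {
        uint64_t object_id;      /* which object the cap grants access to */
        uint32_t permissions;    /* e.g. read/write bits */
        uint64_t mac;            /* keyed checksum, computed by the MDS */
    };

    /* Toy keyed checksum (FNV-1a over key || fields). Illustrative only. */
    static uint64_t toy_mac(uint64_t device_key, uint64_t object_id, uint32_t perms)
    {
        uint8_t buf[20];
        memcpy(buf, &device_key, 8);
        memcpy(buf + 8, &object_id, 8);
        memcpy(buf + 16, &perms, 4);

        uint64_t h = 0xcbf29ce484222325ULL;
        for (size_t i = 0; i < sizeof buf; i++) {
            h ^= buf[i];
            h *= 0x100000001b3ULL;
        }
        return h;
    }

    /* Metadata server: mint a cap for a particular device. */
    static struct osd_cap mint_cap(uint64_t device_key, uint64_t oid, uint32_t perms)
    {
        struct osd_cap c = { .object_id = oid, .permissions = perms };
        c.mac = toy_mac(device_key, oid, perms);
        return c;
    }

    /* Device: accept the cap only if it verifies under *this* device's key. */
    static bool device_accepts(uint64_t my_key, const struct osd_cap *c)
    {
        return c->mac == toy_mac(my_key, c->object_id, c->permissions);
    }

    int main(void)
    {
        uint64_t key_dev_a = 0x1111, key_dev_b = 0x2222;
        struct osd_cap cap = mint_cap(key_dev_a, /*oid=*/42, /*perms=*/0x3);

        printf("device A accepts cap: %s\n", device_accepts(key_dev_a, &cap) ? "yes" : "no");
        printf("device B accepts cap: %s\n", device_accepts(key_dev_b, &cap) ? "yes" : "no");
        return 0;
    }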