From nfsv4-bounces@ietf.org Mon Jun 06 15:16:59 2005
From: "Wachdorf, Daniel R" <drwachd@sandia.gov>
To: nfsv4@ietf.org
Date: Mon, 6 Jun 2005 13:16:30 -0600
Subject: [nfsv4] Solaris 10 Nfsv4 bug

Is there a separate channel for reporting NFSv4 bugs in Solaris 10 or should I go through our normal support channels?

-dan

--------------------------------------
Daniel Wachdorf
drwachd@sandia.gov
Sandia National Laboratories
Cyber Security Technologies
505-284-8060

From nfsv4-bounces@ietf.org Mon Jun 06 16:07:47 2005
From: eric kustarz
To: "Wachdorf, Daniel R"
Cc: nfsv4@ietf.org
Date: Mon, 06 Jun 2005 13:05:57 -0700
Subject: Re: [nfsv4] Solaris 10 Nfsv4 bug

Wachdorf, Daniel R wrote:
> Is there a separate channel for reporting NFSv4 bugs in Solaris 10 or
> should I go through our normal support channels?

In the "near" future, there will be an opensolaris nfs alias. For now, just use "spencer.shepler@sun.com" or me :)

I imagine this is related to the problems reported on nfsv4@linux-nfs.org? I was about to reply to you on that... i'll reply on that alias...
eric

From nfsv4-bounces@ietf.org Mon Jun 13 13:17:49 2005
From: Garth Goodson
To: nfsv4@ietf.org
Date: Mon, 13 Jun 2005 10:17:38 -0700
Subject: [nfsv4] new pNFS draft (draft-welch-pnfs-ops-02.txt)

I have incorporated a substantial number of additions and changes into the pNFS operations draft. Many of the changes arise from feedback from other individuals and companies interested in pNFS. This draft adds an NFSv4 file-layout specification and attempts to refine the pNFS semantics. It is based on the early pNFS drafts.

See http://www.ietf.org/internet-drafts/draft-welch-pnfs-ops-02.txt

Feedback and comments are welcome...
-Garth Goodson

From nfsv4-bounces@ietf.org Tue Jun 14 09:54:59 2005
From: Jim Zelenka <jimz@panasas.com>
To: nfsv4@ietf.org
Date: Tue, 14 Jun 2005 09:54:45 -0400
Subject: [nfsv4] pNFS draft for object-based storage (draft-zelenka-pnfs-obj-00.txt)

As a companion to draft-welch-pnfs-ops-02.txt, we have published draft-zelenka-pnfs-obj-00.txt (http://www.ietf.org/internet-drafts/draft-zelenka-pnfs-obj-00.txt), which proposes pNFS handling for object-based storage and also suggests some refinements of the pNFS protocol.

We welcome all comments/feedback/etc.

Thanks,
Jim Zelenka

--
Jim Zelenka
Software Engineer, Panasas, Inc.
Accelerating Time to Results(TM) with Clustered Storage
www.panasas.com
412-323-3500

From nfsv4-bounces@ietf.org Mon Jun 20 15:40:24 2005
From: Garth Goodson
To: nfsv4@ietf.org
Date: Mon, 20 Jun 2005 12:39:59 -0700
Subject: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt)

There are a number of open issues and changes that were suggested for the pNFS draft (draft-welch-pnfs-ops-02.txt). Please post any feedback or opinions (or feel free to add more).

Thanks,
-Garth

3.1.1 Device IDs

Proposal: device IDs are valid only while a server is up; remappings while the server is up must use a different device ID. (I prefer this.)

Alternative: device IDs are attached to leases and may be timed out (they probably also need to be recallable or able to be invalidated).

3.1.2 Aggregation Schemes

Proposal: the aggregation scheme (e.g., striping) is part of the opaque, layout-type-defined structure.

Alternative: make a general striping aggregation that sits outside of the opaque structure; push striping up a level (it would reference layout-type-specific opaque structures, which contain the devices).

3.2.2 Operation Sequencing

Issue: a race condition exists between LAYOUTGET and LAYOUTRECALL passing each other on the wire. Sessions do not solve this, since the two are on different channels. This may require a sequence ID to be returned in layout operations.

3.3.1 Identifying Layouts

Proposal: layouts are identified by

Alternative: we may need to distinguish between read and read/write layout types; e.g., block/object layouts may have separate read and read/write layouts and will want to handle them separately. Propose adding the iomode to the layout identification (and allowing a recall to specify the specific mode or any mode).
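A rough, editor-supplied sketch of the identification scheme proposed in 3.3.1 above (none of the names below come from the draft; the key fields and the ANY-mode recall behaviour are assumptions made purely for illustration): keying a granted layout by (client ID, filehandle, iomode) rather than just (client ID, filehandle) lets a recall target only the layouts of one mode, or all of them.

```python
from dataclasses import dataclass
from enum import Enum


class IOMode(Enum):
    READ = 1
    RW = 2
    ANY = 3  # usable in a recall to match either mode


@dataclass(frozen=True)
class LayoutKey:
    clientid: int
    filehandle: bytes
    iomode: IOMode  # the additional field proposed in 3.3.1


class LayoutTable:
    """Toy per-server table of granted layouts, keyed as in 3.3.1."""

    def __init__(self):
        self._layouts = {}

    def grant(self, key: LayoutKey, layout):
        self._layouts[key] = layout

    def recall(self, clientid: int, filehandle: bytes, iomode=IOMode.ANY):
        """Return the keys a LAYOUTRECALL would name for this mode."""
        return [
            k for k in self._layouts
            if k.clientid == clientid
            and k.filehandle == filehandle
            and (iomode is IOMode.ANY or k.iomode is iomode)
        ]
```

Under that keying, a server could recall only the read/write layout of a file (for example in the copy-on-write case of 3.3.3 below) while leaving read-only layouts in place.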
3.3.3 Copy-on-write

Same as the discussion for 3.3.1.

3.4 Recalling a Layout

Addition: add a recall that recalls all layouts pertaining to a specific fsid.

Issue: long callback recalls (if the client has dirty data that needs to be flushed). This is mostly wording on how long the client should have to write data while holding the layout. It can always revert to sending the data through the metadata server if I/Os to the data servers fail (due to the server revoking the layout).

3.5 Committing a Layout

Addition: add mtime/atime hints to LAYOUTCOMMIT (the server can use them or not based on current state -- e.g., the server should not allow time to go backwards and should not set the mtime if the mtime is already higher than the one being set). SETATTR is used to specify an exact time and is constrained by regular v4 semantics. The main difference between the times in LAYOUTCOMMIT and SETATTR is that the times set by SETATTR are mandatory rather than hints.

3.5.1 LAYOUTCOMMIT and EOF

Change: instead of specifying newEOF with a flag (depending on whether the client thinks it is setting a new EOF), the client will specify the last byte to which it wrote.

5.1 File Striping and Data Access

Change: simplify the striping layout -- have an enum for SPARSE vs. DENSE layouts instead of a skip and start offset (an illustrative sketch of the two follows after this list).

Issue: think about what error gets returned if a client performs a non-(READ/WRITE/PUTFH/COMMIT) operation at a data server; this may be a problem if a regular NFSv4 data server is used, as it has no way to differentiate the accesses.

5.2 Global Stateids

Issue: this does not provide for unmodified NFSv4 data servers. More thinking must be done if unmodified v4 servers are to be used as data servers. See section 5.7 for discussion.

5.3 I/O Mode

Change: don't restrict the I/O mode to be RW. It may be useful to have read-only replicas to which a client can be directed if the iomode is READ.

5.4.1 Lock State Propagation

Issue: can the sequence IDs in stateids be ignored on the data servers (and what if sessions are used)?

5.4.3 Access State Propagation

Issue: the NFSv4 spec says that READs/WRITEs do not require the same principal as the OPEN. This opens a security hole, but some implementations depend on it. pNFS should probably go with the spec on this and not change the semantics.

6.3 pnfs_devaddr4

Change: switch on layouttype4 instead of devaddrtypes4.

Change?: add a disk signature to the list of types.

7 pNFS File Attributes

Add: PREFERRED_ALIGNMENT and PREFERRED_BLOCKSIZE as FSID-level attributes.

9.2 LAYOUTCOMMIT

Add: mtime/atime time attribute hints to the arguments.

Change: neweof in the result, as per the object Internet-Draft. Basically, have a specific structure for each layout type that is the LAYOUTCOMMIT layout (vs. the layout received by LAYOUTGET). This allows extra opaque data to be sent on LAYOUTCOMMIT.

General: IANA -- think about whether additional layout types go through IANA or a specification process.
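The sketch referenced under item 5.1 above, supplied by the editor rather than taken from the draft: for a simple round-robin stripe (the fixed stripe unit, stripe width, and function names are all assumptions), a SPARSE layout addresses a data server with the file's logical offset, while a DENSE layout packs the stripe units held by one data server together and addresses them with a compacted offset.

```python
STRIPE_UNIT = 64 * 1024          # assumed fixed stripe unit
NUM_DEVICES = 4                  # assumed stripe width


def device_for(file_offset: int) -> int:
    """Which data server holds this byte (round-robin striping)."""
    return (file_offset // STRIPE_UNIT) % NUM_DEVICES


def sparse_offset(file_offset: int) -> int:
    """SPARSE layout: the data server is addressed with the file offset."""
    return file_offset


def dense_offset(file_offset: int) -> int:
    """DENSE layout: stripe units on one server are packed contiguously."""
    stripe_number = file_offset // STRIPE_UNIT
    within_unit = file_offset % STRIPE_UNIT
    units_on_this_device = stripe_number // NUM_DEVICES
    return units_on_this_device * STRIPE_UNIT + within_unit


if __name__ == "__main__":
    off = 5 * STRIPE_UNIT + 100          # byte 100 of stripe unit 5
    print(device_for(off))               # -> 1
    print(sparse_offset(off))            # -> 327780
    print(dense_offset(off))             # -> 65636
```

An enum in the layout would simply select between these two interpretations instead of describing them with a skip and start offset.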
From nfsv4-bounces@ietf.org Mon Jun 20 16:01:26 2005
From: "J. Bruce Fields"
To: Garth Goodson
Cc: nfsv4@ietf.org
Date: Mon, 20 Jun 2005 16:01:19 -0400
Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt)

On Mon, Jun 20, 2005 at 12:39:59PM -0700, Garth Goodson wrote:
> 5.4.3 Access State Propagation
>
> Issue: the NFSv4 spec says that READs/WRITEs do not require the same
> principal as the OPEN. This opens a security hole, but some
> implementations depend on it. pNFS should probably go with the spec on
> this and not change the semantics.

I still disagree here. In my opinion this was a mistake in the protocol -- users shouldn't be able to close other users' opens. At the very least, we shouldn't allow it to propagate further.

--b.
From nfsv4-bounces@ietf.org Mon Jun 20 16:04:18 2005
From: "J. Bruce Fields"
To: Garth Goodson
Cc: nfsv4@ietf.org
Date: Mon, 20 Jun 2005 16:04:14 -0400
Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt)

On Mon, Jun 20, 2005 at 12:39:59PM -0700, Garth Goodson wrote:
> 3.1.1 Device IDs
>
> Proposal: device IDs are valid only while a server is up; remappings
> while the server is up must use a different device ID. (I prefer this.)
>
> Alternative: device IDs are attached to leases and may be timed out
> (they probably also need to be recallable or able to be invalidated).

Note that the language in 3.1.1 also needs to be clarified to make clear which of the above is proposed. (E.g., it says the mapping may change on reboot, but it doesn't explicitly forbid changes between reboots; in fact the last sentence implies that such changes are allowable.)

--b.
From nfsv4-bounces@ietf.org Mon Jun 20 16:51:15 2005
From: Garth Goodson
To: "J. Bruce Fields"
Cc: nfsv4@ietf.org
Date: Mon, 20 Jun 2005 13:50:58 -0700
Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt)

J. Bruce Fields wrote:
> Note that the language in 3.1.1 also needs to be clarified to make clear
> which of the above is proposed. (E.g., it says the mapping may change on
> reboot, but it doesn't explicitly forbid changes between reboots; in fact
> the last sentence implies that such changes are allowable.)

The last sentence is saying something different. It is saying that if the mapping between the data and the device ID changes, then the layout should be recalled -- not the mapping between the device ID and the device.

There are a number of wording changes that will be made (throughout the document). I have not commented on these types of changes.

-Garth
From nfsv4-bounces@ietf.org Mon Jun 20 16:53:05 2005
From: Marc Eshel
To: Garth Goodson
Cc: nfsv4@ietf.org
Date: Mon, 20 Jun 2005 13:52:38 -0700
Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt)

nfsv4-bounces@ietf.org wrote on 06/20/2005 12:39:59 PM:

> 5.1 File Striping and Data Access
>
> Change: simplify the striping layout -- have an enum for SPARSE vs.
> DENSE layouts instead of a skip and start offset.
>
> Issue: think about what error gets returned if a client performs a
> non-(READ/WRITE/PUTFH/COMMIT) operation at a data server; this may be a
> problem if a regular NFSv4 data server is used, as it has no way to
> differentiate the accesses.

Why should it be an error in the first place? I would like all the nodes in my cluster filesystem to take the roles of metadata server or data server for each file independently.

Marc.
From nfsv4-bounces@ietf.org Mon Jun 20 16:55:46 2005
From: "J. Bruce Fields"
To: Garth Goodson
Cc: nfsv4@ietf.org
Date: Mon, 20 Jun 2005 16:55:42 -0400
Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt)

On Mon, Jun 20, 2005 at 01:50:58PM -0700, Garth Goodson wrote:
> The last sentence is saying something different. It is saying that if
> the mapping between the data and the device ID changes, then the layout
> should be recalled -- not the mapping between the device ID and the
> device.

OK, I guess I see. Though maybe that "SHOULD" should be a "MUST".

> There are a number of wording changes that will be made (throughout the
> document). I have not commented on these types of changes.

Fair enough.

--b.
From nfsv4-bounces@ietf.org Mon Jun 20 16:59:49 2005
From: Garth Goodson
To: Marc Eshel
Cc: nfsv4@ietf.org
Date: Mon, 20 Jun 2005 13:59:34 -0700
Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt)

I agree. I don't think it need be an error, but if the system isn't designed to handle it, it would be nice if the data servers did/could reply with an error.

-Garth

Marc Eshel wrote:
> Why should it be an error in the first place? I would like all the
> nodes in my cluster filesystem to take the roles of metadata server or
> data server for each file independently.
> Marc.
From nfsv4-bounces@ietf.org Mon Jun 20 17:04:23 2005
From: Dean Hildebrand
To: Garth Goodson
Cc: Marc Eshel, nfsv4@ietf.org
Date: Mon, 20 Jun 2005 17:03:47 -0400 (EDT)
Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt)

If they are stock, how can they return a new error unless they know something, in which case they are no longer stock?
Dean

On Mon, 20 Jun 2005, Garth Goodson wrote:
> I agree. I don't think it need be an error, but if the system isn't
> designed to handle it, it would be nice if the data servers did/could
> reply with an error.
>
> -Garth
From nfsv4-bounces@ietf.org Mon Jun 20 18:06:24 2005
From: Garth Goodson
To: Dean Hildebrand
Cc: nfsv4@ietf.org
Date: Mon, 20 Jun 2005 15:06:10 -0700
Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt)

Correct, that is one of the issues I mentioned. But I think there are cases where the data servers are not stock and where it is not desirable that they service more than I/O requests (i.e., that they service metadata requests), in which case they may return an error (and that error should be decided upon).

-Garth

Dean Hildebrand wrote:
> If they are stock, how can they return a new error unless they know
> something, in which case they are no longer stock?
> Dean
From nfsv4-bounces@ietf.org Tue Jun 21 09:39:20 2005
From: "Noveck, Dave"
To: "Goodson, Garth", Dean Hildebrand
Cc: nfsv4@ietf.org
Date: Tue, 21 Jun 2005 09:39:07 -0400
Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt)

Hold on. I'm not clear exactly how people are supposing that this will work.

If I have a given server, and let us say it is the metadata server for the root directory of an fs, then with v4 (and so far pNFS) as it stands it is the metadata server for every descendent object until and unless the fsid changes. If a client were to decide arbitrarily that it could try to do metadata operations on a different server with that same filehandle, then I would say that that client is seriously confused and we'd like to know about it.

If the server is deciding, in Marc's words, that "nodes in my cluster filesystem ... take the roles of metadata server or data server for each file independently", how is the client to determine what the proper metadata server for each file is? Are we talking about some protocol extension beyond what has already been discussed for pNFS to distribute the data server role? We haven't talked about anything to specifically support distribution of the metadata server role for a filesystem. I'm not saying that that would be a bad thing to have, but it seems to be a new thing, and right now I think the pNFS focus should be more on nailing down the stuff already discussed.

It is possible with a cluster filesystem to have multiple clients each mount different servers as the metadata server and leave it to the cluster filesystem to provide the coherence for metadata operations. In that case multiple servers would be acting as metadata servers for what is really the same filesystem. However, in that case, I would argue that we still want the language and the error in Garth's draft (even if Garth now seems to be backing away from it). Suppose a client is talking to server A, a file has filehandle X, and that file has two stripes, one with handle P on server B and the other with handle Q on server C. Those particular handles (should) give you the ability to do data operations and nothing else. If you do metadata operations with them, you should get an error, since you are doing something that pNFS does not allow. This is without regard to the fact that B and C may have the ability to act as metadata servers for other handles. However, they should not act as metadata servers for stripes P and Q, since P and Q are stripes and not files. Even if you had a file with a single stripe, the handle for that stripe as a file should be different from the one that only gives the client the right to do I/O -- P and P', let's say. If a client takes a handle that gives him the right to do I/O (he got it from a layout) and uses it for metadata operations, he is violating the protocol and should get an error (and the server SHOULD make sure that he gets one).
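Dave's last paragraph argues that a filehandle obtained from a layout should authorize I/O operations and nothing else. A small, editor-supplied sketch of what that server-side check might look like (the operation set mirrors the READ/WRITE/PUTFH/COMMIT list from item 5.1; the error raised and the idea of tagging handles in a set are assumptions, since the actual error code is one of the open issues in the thread):

```python
# Operations a pNFS data server is expected to accept on a layout
# (I/O-only) filehandle, per the discussion above.
IO_OPS = {"READ", "WRITE", "COMMIT", "PUTFH"}


class DataServer:
    def __init__(self):
        # Filehandles that were handed out via layouts and therefore
        # only authorize I/O; regular filehandles are not in this set.
        self.io_only_handles = set()

    def register_layout_handle(self, fh: bytes):
        self.io_only_handles.add(fh)

    def dispatch(self, op: str, fh: bytes):
        if fh in self.io_only_handles and op not in IO_OPS:
            # Placeholder error -- the thread leaves the actual
            # error code to be decided.
            raise PermissionError("pNFS: metadata op on an I/O-only filehandle")
        return f"{op} accepted"
```

A completely stock NFSv4 server keeps no such distinction between handles, which is exactly Dean's earlier objection: without some pNFS awareness it cannot tell an I/O-only handle from a regular one.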
From nfsv4-bounces@ietf.org Tue Jun 21 12:51:17 2005
From: Marc Eshel
To: "Noveck, Dave"
Cc: "Goodson, Garth", nfsv4@ietf.org
Date: Tue, 21 Jun 2005 09:50:44 -0700
Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt)

nfsv4-bounces@ietf.org wrote on 06/21/2005 06:39:07 AM:

> If the server is deciding, in Marc's words, that "nodes in my cluster
> filesystem ... take the roles of metadata server or data server for
> each file independently", how is the client to determine what the
> proper metadata server for each file is? Are we talking about some
> protocol extension beyond what has already been discussed for pNFS
> to distribute the data server role?

With fs_locations we give the client a list of metadata servers that can act as the server for a given filesystem, and the client can choose which one to use and switch among them at will or because of some failure. I agree that once you have got a layout for a file you should follow the layout's instructions, read/write only from the specified nodes, and get an error if you don't.

> If a client takes a handle that gives him the right to do I/O (he got
> it from a layout) and uses it for metadata operations, he is violating
> the protocol and should get an error (and the server SHOULD make sure
> that he gets one).

Yes, once you have started to use metadata server A for file X you should stick to it as the metadata server for that file until you are done with it, and follow the layout to use the appropriate data servers. I am not sure about the restriction on the usage of filehandles, though. It is an added complication to the server implementation to add information to the fh that restricts its usage from specific nodes. We should say that the client should not use a fh given by a layout for other purposes, but not require an error if it does.

Marc.
> > -----Original Message----- > From: Goodson, Garth > Sent: Monday, June 20, 2005 6:06 PM > To: Dean Hildebrand > Cc: nfsv4@ietf.org > Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) > > > Correct, that is one of the issues I mentioned. But I think there are > cases where the data servers are neither stock, nor where it is > desirable that they service more than I/O requests (i.e., metadata > requests) in which case they may return an error (and that error should > be decided upon). > > -Garth > > Dean Hildebrand wrote: > > If they are stock, how can they return a new error unless they know > > something, in which case they are no longer stock. > > Dean > > > > On Mon, 20 Jun 2005, Garth Goodson wrote: > > > > > >>I agree. I don't think it need be an error, but if the system isn't > >>designed to handle it, it would be nice if the data servers did/could > >>reply with an error. > >> > >> > >>-Garth > >> > >>Marc Eshel wrote: > >> > >>> > >>>nfsv4-bounces@ietf.org wrote on 06/20/2005 12:39:59 PM: > >>> > >>> > 5.1 File Striping and Data Access > >>> > > >>> > Change: simplify striping layout -- have enum for SPARSE vs. DENSE > >>> > layout instead of skip and start offset > >>> > > >>> > Issue: think about what error gets returned if a client performs a > >>> > non(READ/WRITE/PUTFH/COMMIT) at a data server; issue: this may be a > >>> > problem if a regular nfsv4 data server is used as it has no way to > >>> > differentiate accesses. > >>> > > >>> > >>>Why is should it be an error in the first place. I would like all the > >>>nodes in my cluster filesystem to take the roles of metadata server or > >>>data server for each file independently. > >>>Marc. > >> > >>_______________________________________________ > >>nfsv4 mailing list > >>nfsv4@ietf.org > >>https://www1.ietf.org/mailman/listinfo/nfsv4 > >> > > _______________________________________________ > nfsv4 mailing list > nfsv4@ietf.org > https://www1.ietf.org/mailman/listinfo/nfsv4 > > _______________________________________________ > nfsv4 mailing list > nfsv4@ietf.org > https://www1.ietf.org/mailman/listinfo/nfsv4 --=_alternative 005C7F9688257027_= Content-Type: text/html; charset="US-ASCII"

--=_alternative 005C7F9688257027_=-- --===============1887892119== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 --===============1887892119==-- From nfsv4-bounces@ietf.org Tue Jun 21 12:59:37 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dkm65-0007aB-LN; Tue, 21 Jun 2005 12:59:37 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dkm63-0007a1-Bq; Tue, 21 Jun 2005 12:59:35 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id MAA07787; Tue, 21 Jun 2005 12:59:32 -0400 (EDT) Received: from mx1.netapp.com ([216.240.18.38]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1DkmTz-0002az-RF; Tue, 21 Jun 2005 13:24:21 -0400 Received: from smtp2.corp.netapp.com (10.57.159.114) by mx1.netapp.com with ESMTP; 21 Jun 2005 09:59:25 -0700 X-IronPort-AV: i="3.93,218,1115017200"; d="scan'208"; a="201421316:sNHT19941100" Received: from [10.34.24.132] (loderunner.hq.netapp.com [10.34.24.132]) by smtp2.corp.netapp.com (8.13.1/8.13.1/NTAP-1.6) with ESMTP id j5LGxNtr005338; Tue, 21 Jun 2005 09:59:23 -0700 (PDT) Message-ID: <42B8476B.2020006@netapp.com> Date: Tue, 21 Jun 2005 09:59:23 -0700 From: Garth Goodson User-Agent: Debian Thunderbird 1.0.2 (X11/20050602) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Marc Eshel Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: 0.0 (/) X-Scan-Signature: 7d33c50f3756db14428398e2bdedd581 Content-Transfer-Encoding: 7bit Cc: nfsv4-bounces@ietf.org, "Noveck, Dave" , nfsv4@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org > Yes, once you started to use a metadata server A for file X you should > stick to it as the metadata server for the file until you are done with > it and follow the layout to use the appropriate data servers. I am not > sure why the restriction on the usage of file handles. It is an add > complication to the server implementation to add information to the fh > that restricts it usage from specific nodes. We should say that the > client should not use fh given by layout for other purposes but not > require an error if it does. > > Marc. > Currently, I'm not sure that we can require an error (as much as I would like to require it). Some people want to be able to use unmodified NFSv4 servers as the data servers. If this is allowed those data servers will not know to return an error. This needs to be worked out. I still believe that it would help the client if an error could be returned (as Dave pointed out), especially if the data server can not service the metadata operation. 
-Garth _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Tue Jun 21 13:12:56 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DkmIy-0001CK-5P; Tue, 21 Jun 2005 13:12:56 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DkmIx-0001CF-3e for nfsv4@megatron.ietf.org; Tue, 21 Jun 2005 13:12:55 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id NAA08632 for ; Tue, 21 Jun 2005 13:12:52 -0400 (EDT) Received: from mx2.netapp.com ([216.240.18.37]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1Dkmgq-0002uX-It for nfsv4@ietf.org; Tue, 21 Jun 2005 13:37:41 -0400 Received: from smtp1.corp.netapp.com (10.57.156.124) by mx2.netapp.com with ESMTP; 21 Jun 2005 10:12:40 -0700 X-IronPort-AV: i="3.93,218,1115017200"; d="scan'208"; a="252475630:sNHT18287720" Received: from svlexc03.hq.netapp.com (svlexc03.corp.netapp.com [10.57.156.149]) by smtp1.corp.netapp.com (8.13.1/8.13.1/NTAP-1.6) with ESMTP id j5LHCdEF022636 for ; Tue, 21 Jun 2005 10:12:39 -0700 (PDT) Received: from lavender.hq.netapp.com ([10.56.11.75]) by svlexc03.hq.netapp.com with Microsoft SMTPSVC(6.0.3790.0); Tue, 21 Jun 2005 10:12:39 -0700 Received: from exnane01.hq.netapp.com ([10.97.0.61]) by lavender.hq.netapp.com with Microsoft SMTPSVC(5.0.2195.6713); Tue, 21 Jun 2005 10:12:39 -0700 Received: from tmt.netapp.com ([10.97.6.31]) by exnane01.hq.netapp.com with Microsoft SMTPSVC(6.0.3790.0); Tue, 21 Jun 2005 13:12:37 -0400 Message-Id: <6.2.1.2.2.20050621130609.048c2ca0@exnane01.nane.netapp.com> X-Mailer: QUALCOMM Windows Eudora Version 6.2.1.2 Date: Tue, 21 Jun 2005 13:12:26 -0400 To: nfsv4@ietf.org From: "Talpey, Thomas" Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) In-Reply-To: <42B8476B.2020006@netapp.com> References: <42B8476B.2020006@netapp.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" X-OriginalArrivalTime: 21 Jun 2005 17:12:37.0908 (UTC) FILETIME=[6C903D40:01C57684] X-Spam-Score: 0.0 (/) X-Scan-Signature: d6b246023072368de71562c0ab503126 X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org At 12:59 PM 6/21/2005, Garth Goodson wrote: >Currently, I'm not sure that we can require an error (as much as I would >like to require it). Some people want to be able to use unmodified >NFSv4 servers as the data servers. If this is allowed those data >servers will not know to return an error. In the presence of a session (which is guaranteed by pNFS, right?), the v4 server can easily remember whether the client previously requested and received a device list. Of course, this doesn't prove that a given client request is non-conflicting (the client could be mixing regular v4 and pNFS traffic on the session), but with a little common sense specsmanship I think it could be forbidden on a per-pNFS-session basis. In fact, I think it would be a good idea, since the server can and should be able to tune its resources to match the fact that clients won't be doing much data transfer to the metadata server. Tom. 
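One way to read the per-session suggestion above is sketched below: once a client has fetched a device list (or a layout) on a session, the server marks that session and can refuse metadata operations arriving on it. The session structure, the GETDEVICELIST trigger, and the NFS4ERR_NOTSUPP placeholder are assumptions made for the example, not protocol text.

/*
 * Rough sketch of the per-pNFS-session idea: remember on the session
 * that the client is doing pNFS I/O, and refuse metadata operations
 * arriving on that session.  All names here are illustrative.
 */
#include <stdbool.h>
#include <stdio.h>

enum nfs_op  { OP_PUTFH, OP_READ, OP_WRITE, OP_COMMIT, OP_GETATTR, OP_GETDEVICELIST };
enum nfs_err { NFS4_OK = 0, NFS4ERR_NOTSUPP = 10004 };

struct session {
    unsigned long id;
    bool pnfs_data_only;     /* remembered across requests on this session */
};

static enum nfs_err dispatch(struct session *s, enum nfs_op op)
{
    if (op == OP_GETDEVICELIST) {
        s->pnfs_data_only = true;    /* this session is being used for pNFS I/O */
        return NFS4_OK;
    }
    if (s->pnfs_data_only &&
        op != OP_PUTFH && op != OP_READ && op != OP_WRITE && op != OP_COMMIT)
        return NFS4ERR_NOTSUPP;      /* forbidden on a per-pNFS-session basis */
    return NFS4_OK;
}

int main(void)
{
    struct session s = { .id = 1, .pnfs_data_only = false };

    dispatch(&s, OP_GETDEVICELIST);                        /* client sets up pNFS */
    printf("READ    -> %d\n", dispatch(&s, OP_READ));      /* still allowed */
    printf("GETATTR -> %d\n", dispatch(&s, OP_GETATTR));   /* refused */
    return 0;
}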
_______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Tue Jun 21 13:14:32 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DkmKW-0001ID-J6; Tue, 21 Jun 2005 13:14:32 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DkmKV-0001I0-44; Tue, 21 Jun 2005 13:14:31 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id NAA08819; Tue, 21 Jun 2005 13:14:28 -0400 (EDT) Received: from e2.ny.us.ibm.com ([32.97.182.142]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1DkmiR-0002xt-Ib; Tue, 21 Jun 2005 13:39:16 -0400 Received: from d01relay02.pok.ibm.com (d01relay02.pok.ibm.com [9.56.227.234]) by e2.ny.us.ibm.com (8.12.11/8.12.11) with ESMTP id j5LHEFF7000994; Tue, 21 Jun 2005 13:14:15 -0400 Received: from d01av03.pok.ibm.com (d01av03.pok.ibm.com [9.56.224.217]) by d01relay02.pok.ibm.com (8.12.10/NCO/VERS6.7) with ESMTP id j5LHEFKK253048; Tue, 21 Jun 2005 13:14:15 -0400 Received: from d01av03.pok.ibm.com (loopback [127.0.0.1]) by d01av03.pok.ibm.com (8.12.11/8.13.3) with ESMTP id j5LHE5XC012983; Tue, 21 Jun 2005 13:14:05 -0400 Received: from [9.56.227.90] (d01ml604.pok.ibm.com [9.56.227.90]) by d01av03.pok.ibm.com (8.12.11/8.12.11) with ESMTP id j5LHE5m8012793; Tue, 21 Jun 2005 13:14:05 -0400 In-Reply-To: <42B8476B.2020006@netapp.com> To: Garth Goodson Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) MIME-Version: 1.0 X-Mailer: Lotus Notes Build V70_M4_01112005 Beta 3NP January 11, 2005 Message-ID: From: Marc Eshel Date: Tue, 21 Jun 2005 10:13:51 -0700 X-MIMETrack: Serialize by Router on D01ML604/01/M/IBM(Build V70_06092005|June 09, 2005) at 06/21/2005 13:14:05, Serialize complete at 06/21/2005 13:14:05 X-Spam-Score: 0.0 (/) X-Scan-Signature: 3002fc2e661cd7f114cb6bae92fe88f1 Cc: nfsv4-bounces@ietf.org, "Noveck, Dave" , nfsv4@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: multipart/mixed; boundary="===============0316768988==" Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org This is a multipart message in MIME format. --===============0316768988== Content-Type: multipart/alternative; boundary="=_alternative 005E9D5788257027_=" This is a multipart message in MIME format. --=_alternative 005E9D5788257027_= Content-Type: text/plain; charset="US-ASCII" nfsv4-bounces@ietf.org wrote on 06/21/2005 09:59:23 AM: > > Yes, once you started to use a metadata server A for file X you should > > stick to it as the metadata server for the file until you are done with > > it and follow the layout to use the appropriate data servers. I am not > > sure why the restriction on the usage of file handles. It is an add > > complication to the server implementation to add information to the fh > > that restricts it usage from specific nodes. We should say that the > > client should not use fh given by layout for other purposes but not > > require an error if it does. > > > > Marc. > > > > Currently, I'm not sure that we can require an error (as much as I would > like to require it). Some people want to be able to use unmodified > NFSv4 servers as the data servers. If this is allowed those data > servers will not know to return an error. 
This needs to be worked out. > I still believe that it would help the client if an error could be > returned (as Dave pointed out), especially if the data server can not > service the metadata operation. > > -Garth > If the data server can not service the metadata operation you will get an error. The only problem is if it can but it should not. I don't see why we would like to restrict it. Marc. --=_alternative 005E9D5788257027_= Content-Type: text/html; charset="US-ASCII"


Marc. --=_alternative 005E9D5788257027_=-- --===============0316768988== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 --===============0316768988==-- From nfsv4-bounces@ietf.org Tue Jun 21 14:15:47 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DknEA-0003Wn-6V; Tue, 21 Jun 2005 14:12:02 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DknE8-0003Wa-5t; Tue, 21 Jun 2005 14:12:00 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id OAA13849; Tue, 21 Jun 2005 14:11:59 -0400 (EDT) Received: from e6.ny.us.ibm.com ([32.97.182.146]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1Dknc4-0004Wh-32; Tue, 21 Jun 2005 14:36:46 -0400 Received: from d01relay04.pok.ibm.com (d01relay04.pok.ibm.com [9.56.227.236]) by e6.ny.us.ibm.com (8.12.11/8.12.11) with ESMTP id j5LIBl0J025047; Tue, 21 Jun 2005 14:11:47 -0400 Received: from d01av04.pok.ibm.com (d01av04.pok.ibm.com [9.56.224.64]) by d01relay04.pok.ibm.com (8.12.10/NCO/VERS6.7) with ESMTP id j5LIBlW1149032; Tue, 21 Jun 2005 14:11:47 -0400 Received: from d01av04.pok.ibm.com (loopback [127.0.0.1]) by d01av04.pok.ibm.com (8.12.11/8.13.3) with ESMTP id j5LIBkQN029939; Tue, 21 Jun 2005 14:11:46 -0400 Received: from [9.56.227.90] (d01ml604.pok.ibm.com [9.56.227.90]) by d01av04.pok.ibm.com (8.12.11/8.12.11) with ESMTP id j5LIBkKZ029927; Tue, 21 Jun 2005 14:11:46 -0400 In-Reply-To: <6.2.1.2.2.20050621130609.048c2ca0@exnane01.nane.netapp.com> To: "Talpey, Thomas" Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) MIME-Version: 1.0 X-Mailer: Lotus Notes Build V70_M4_01112005 Beta 3NP January 11, 2005 Message-ID: From: Marc Eshel Date: Tue, 21 Jun 2005 11:11:38 -0700 X-MIMETrack: Serialize by Router on D01ML604/01/M/IBM(Build V70_06092005|June 09, 2005) at 06/21/2005 14:11:46, Serialize complete at 06/21/2005 14:11:46 X-Spam-Score: 0.1 (/) X-Scan-Signature: bdc523f9a54890b8a30dd6fd53d5d024 Cc: nfsv4@ietf.org, nfsv4-bounces@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: multipart/mixed; boundary="===============1076809633==" Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org This is a multipart message in MIME format. --===============1076809633== Content-Type: multipart/alternative; boundary="=_alternative 0063E7AE88257027_=" This is a multipart message in MIME format. --=_alternative 0063E7AE88257027_= Content-Type: text/plain; charset="US-ASCII" nfsv4-bounces@ietf.org wrote on 06/21/2005 10:12:26 AM: > At 12:59 PM 6/21/2005, Garth Goodson wrote: > >Currently, I'm not sure that we can require an error (as much as I would > >like to require it). Some people want to be able to use unmodified > >NFSv4 servers as the data servers. If this is allowed those data > >servers will not know to return an error. > > In the presence of a session (which is guaranteed by pNFS, right?), the > v4 server can easily remember whether the client previously requested > and received a device list. 
Of course, this doesn't prove that a given > client request is non-conflicting (the client could be mixing regular v4 > and pNFS traffic on the session), but with a little common sense > specsmanship I think it could be forbidden on a per-pNFS-session > basis. > > In fact, I think it would be a good idea, since the server can and should > be able to tune its resources to match the fact that clients won't be > doing much data transfer to the metadata server. > > Tom. > Yes it would be nice to configure some nodes to be just data server nodes but it would also be nice and useful to distribute all metadata and data server among all the nodes. The server has the control using exportfs and fs-locations to export its metadata nodes and layout information to list the data servers. If we say that the file handles given by layout should only be used for READ/WRITE/PUTFH/COMMIT it should be enough. I don't see the need for additional restrictions. If you do please explain why. Marc. --=_alternative 0063E7AE88257027_= Content-Type: text/html; charset="US-ASCII"


Marc. --=_alternative 0063E7AE88257027_=-- --===============1076809633== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 --===============1076809633==-- From nfsv4-bounces@ietf.org Tue Jun 21 16:51:33 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DkpiV-0002fF-GK; Tue, 21 Jun 2005 16:51:33 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DkpiQ-0002eh-OK; Tue, 21 Jun 2005 16:51:26 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id QAA13592; Tue, 21 Jun 2005 16:51:24 -0400 (EDT) Received: from mx1.netapp.com ([216.240.18.38]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1Dkq6P-0005Qh-Ba; Tue, 21 Jun 2005 17:16:14 -0400 Received: from smtp2.corp.netapp.com (10.57.159.114) by mx1.netapp.com with ESMTP; 21 Jun 2005 13:51:10 -0700 X-IronPort-AV: i="3.93,218,1115017200"; d="scan'208,217"; a="201840810:sNHT32839732" Received: from svlexc03.hq.netapp.com (svlexc03.corp.netapp.com [10.57.156.149]) by smtp2.corp.netapp.com (8.13.1/8.13.1/NTAP-1.6) with ESMTP id j5LKp9iX005818; Tue, 21 Jun 2005 13:51:09 -0700 (PDT) Received: from lavender.hq.netapp.com ([10.56.11.75]) by svlexc03.hq.netapp.com with Microsoft SMTPSVC(6.0.3790.0); Tue, 21 Jun 2005 13:51:09 -0700 Received: from exnane01.hq.netapp.com ([10.97.0.61]) by lavender.hq.netapp.com with Microsoft SMTPSVC(5.0.2195.6713); Tue, 21 Jun 2005 13:51:08 -0700 X-MimeOLE: Produced By Microsoft Exchange V6.5.7226.0 Content-class: urn:content-classes:message MIME-Version: 1.0 Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) Date: Tue, 21 Jun 2005 16:51:07 -0400 Message-ID: Thread-Topic: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) Thread-Index: AcV2gWVZzDC6hwe2R4CMTxLYr3RIDwAHJckg From: "Noveck, Dave" To: "Marc Eshel" X-OriginalArrivalTime: 21 Jun 2005 20:51:08.0887 (UTC) FILETIME=[F350DE70:01C576A2] X-Spam-Score: 0.9 (/) X-Scan-Signature: e05124dbe0e171b371ff9d88326a1ab7 Cc: "Goodson, Garth" , nfsv4-bounces@ietf.org, nfsv4@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: multipart/mixed; boundary="===============2090594847==" Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org This is a multi-part message in MIME format. --===============2090594847== Content-class: urn:content-classes:message Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C576A2.F279A5EF" This is a multi-part message in MIME format. 
------_=_NextPart_001_01C576A2.F279A5EF Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable > With fs-locations we give the client a list of metadata server=20 > that can act as the server for given filesystem and the client=20 > can choose which one to use and switch among them at will or=20 > because of some failure.=20 =20 The problem is that in v4.0 as it stands there is an enormous range of ways that a client can interpret the fs_locations list: =20 "Here is a list of servers and you can switch when there is failure, after refetching your change attibutes (because change attribute is within the purview of a particular server)." (and that is not the most extreme case of discontinuity -- some people are thinking that=20 filehandles and fileids will just change or that you may wind up with a slighly out-of-data version of the data). =20 "Here is a list of servers and you can switch when there is a=20 failure with no discontinuity of access (changes in fh's, stateid's, fileids, change attributes), although changing is a big deal and you shouldn't do it without good cause." =20 "Here is a list of servers and you can switch when there is a failure with no discontinuity of access or even at will since there is no big cost to switch." =20 "Here is a list of servers and you can access any of these=20 servers as you will at the same time (multi-pathing or a=20 cluster fs) since they are all effectively the same thing". =20 The problem is that the client only knows about the list and has no way of knowing which of the statements above is associated with the list of servers he is getting. I have been thinking about a locations_info attribute for 4.1 that would allow the server to tell the client which of those he meant and also=20 give preference information (local vs. remote copies, absolutely up-to-data vs. slightly out-of-data copies). =20 > is solely within the purview of a particular server =20 > I agree that once you got a layout for a file you should follow=20 > the layout instruction and read/write only from the specified=20 > nodes and get an error if you don't.=20 =20 Then it sounds like we are in violent agreement, except maybe for choice of modal auxiliaries or capitalization. =20 You say "you should follow the layout instruction" and I'm torn between saying "you SHOULD follow the layout instruction" and "you MUST follow the layout instruction". =20 You say "[should] get an error if you don't" and I say "the server SHOULD give you an error if you don't". -----Original Message----- From: Marc Eshel [mailto:eshel@almaden.ibm.com]=20 Sent: Tuesday, June 21, 2005 12:51 PM To: Noveck, Dave Cc: Dean Hildebrand; Goodson, Garth; nfsv4@ietf.org; nfsv4-bounces@ietf.org Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) =09 =09 nfsv4-bounces@ietf.org wrote on 06/21/2005 06:39:07 AM: =09 > Hold on. I'm not clear exactly how people are supposing that this > will work. >=20 > If I have a given server and let us say it is the metadata server > for the root directory of an fs, then with v4 (and so far pnfs) as > it stands it is the metadata server for every every descendent object > until and unless the fsid changes. If a client were to decide > arbitrarily that it could try to do metadata operations on a different > server with that same filehandle, then I would say that that client=20 > is seriouly confused and we'd like to know about it. >=20 > If the server is deciding, in Marc's words, that "nodes in my cluster=20 > filesystem ... 
take the roles of metadata server or data server for=20 > each file independently", how is the client to determine what the=20 > proper metadata server for each file is? Are we talking about some > protocol extension beyond what has already been discussed for pnfs > to distribute the data server role? We haven't talked about anything > to specifically support distribution of the metadata server role for > a filesystem. I'm not saying that that would be a bad thing to have,=20 > but it seems to be a new thing and right now I think the pnfs focus=20 > should be more on nailing down the stuff already discussed. =09 With fs-locations we give the client a list of metadata server that can act as the server for given filesystem and the client can choose which one to use and switch among them at will or because of some failure. I agree that once you got a layout for a file you should follow the layout instruction and read/write only from the specified nodes and get an error if you don't.=20 =20 > It is possible with a cluster filesystem to have multiple clients > each mount different servers as the metadata server and leave it to > the cluster filesystem to provide the coherence for metadata operations. > In that case multiple servers would be acting as metadata servers for > what is really the same filesystem. However, in that case, I would > argue that we still want the language and the error in Garth's draft > (even if Garth now seems to backing away from it). Suppose a client > is talking to server A and a file has filehandle X and that file > has two stripes, one with handle P on server B and the other with > handle Q on server C. Those particular handles (should) give you=20 > the ability to do data operations and nothing else. If you do metadata=20 > operations with them, you should get an error, since you are doing > something that pnfs does not allow. This is without regard to the > fact that B and C may have the ability to act as metadata servers > for other handles. However, they should not act as metadata servers > for stripes P and Q since P and Q are stripes and not files. Even if=20 > you had a file with a single stripe, the handle for that stripe as a > file should be different from the one that only gives the client the > right to do IO, P and P' let's say. If a client takes a handle that=20 > gives him the right to do IO (he got it from a layout) and uses it > for metadata operations, he is violating the protocol and should get > an error (and the server SHOULD make sure that he gets one). =09 Yes, once you started to use a metadata server A for file X you should stick to it as the metadata server for the file until you are done with it and follow the layout to use the appropriate data servers. I am not sure why the restriction on the usage of file handles. It is an add complication to the server implementation to add information to the fh that restricts it usage from specific nodes. We should say that the client should not use fh given by layout for other purposes but not require an error if it does.=20 =09 Marc. =20 =09 >=20 > -----Original Message----- > From: Goodson, Garth=20 > Sent: Monday, June 20, 2005 6:06 PM > To: Dean Hildebrand > Cc: nfsv4@ietf.org > Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) >=20 >=20 > Correct, that is one of the issues I mentioned. 
But I think there are=20 > cases where the data servers are neither stock, nor where it is=20 > desirable that they service more than I/O requests (i.e., metadata=20 > requests) in which case they may return an error (and that error should=20 > be decided upon). >=20 > -Garth >=20 > Dean Hildebrand wrote: > > If they are stock, how can they return a new error unless they know > > something, in which case they are no longer stock. > > Dean > >=20 > > On Mon, 20 Jun 2005, Garth Goodson wrote: > >=20 > >=20 > >>I agree. I don't think it need be an error, but if the system isn't > >>designed to handle it, it would be nice if the data servers did/could > >>reply with an error. > >> > >> > >>-Garth > >> > >>Marc Eshel wrote: > >> > >>> > >>>nfsv4-bounces@ietf.org wrote on 06/20/2005 12:39:59 PM: > >>> > >>> > 5.1 File Striping and Data Access > >>> > > >>> > Change: simplify striping layout -- have enum for SPARSE vs. DENSE > >>> > layout instead of skip and start offset > >>> > > >>> > Issue: think about what error gets returned if a client performs a > >>> > non(READ/WRITE/PUTFH/COMMIT) at a data server; issue: this may be a > >>> > problem if a regular nfsv4 data server is used as it has no way to > >>> > differentiate accesses. > >>> > > >>> > >>>Why is should it be an error in the first place. I would like all the > >>>nodes in my cluster filesystem to take the roles of metadata server or > >>>data server for each file independently. > >>>Marc. > >> > >>_______________________________________________ > >>nfsv4 mailing list > >>nfsv4@ietf.org > >>https://www1.ietf.org/mailman/listinfo/nfsv4 > >> >=20 > _______________________________________________ > nfsv4 mailing list > nfsv4@ietf.org > https://www1.ietf.org/mailman/listinfo/nfsv4 >=20 > _______________________________________________ > nfsv4 mailing list > nfsv4@ietf.org > https://www1.ietf.org/mailman/listinfo/nfsv4 =09 ------_=_NextPart_001_01C576A2.F279A5EF Content-Type: text/html; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Message
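For illustration only, here is one shape the locations_info hint described above could take, mirroring the four readings of fs_locations that were listed. The type and field names are invented for this sketch; nothing like this has been agreed for 4.1.

/*
 * Illustration only: a possible locations_info hint.  Names and fields
 * are invented; they are not draft text.
 */
#include <stdio.h>

enum fs_location_class {
    FSLOC_SWITCH_ON_FAILURE_DISCONTINUOUS,  /* refetch change attrs; fh/fileid may change */
    FSLOC_SWITCH_ON_FAILURE_CONTINUOUS,     /* no discontinuity, but switching is costly  */
    FSLOC_SWITCH_AT_WILL,                   /* no discontinuity, switching is cheap       */
    FSLOC_SIMULTANEOUS                      /* multi-pathing / cluster fs: use any, any time */
};

struct fs_location_hint {
    enum fs_location_class kind;    /* which of the four statements the server means */
    int preference;                 /* e.g. local vs. remote copy; lower is better    */
    int currency;                   /* 0 = absolutely up to date, >0 = may lag        */
};

int main(void)
{
    struct fs_location_hint hint = {
        .kind = FSLOC_SWITCH_AT_WILL, .preference = 0, .currency = 0
    };
    printf("kind=%d preference=%d currency=%d\n",
           hint.kind, hint.preference, hint.currency);
    return 0;
}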
= =00 ------_=_NextPart_001_01C576A2.F279A5EF-- --===============2090594847== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 --===============2090594847==-- From nfsv4-bounces@ietf.org Tue Jun 21 18:27:48 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DkrDg-0002mb-72; Tue, 21 Jun 2005 18:27:48 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DkrDc-0002mT-HN; Tue, 21 Jun 2005 18:27:45 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id SAA20295; Tue, 21 Jun 2005 18:27:41 -0400 (EDT) Received: from e1.ny.us.ibm.com ([32.97.182.141]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1Dkrbb-0000Xl-53; Tue, 21 Jun 2005 18:52:32 -0400 Received: from d01relay02.pok.ibm.com (d01relay02.pok.ibm.com [9.56.227.234]) by e1.ny.us.ibm.com (8.12.11/8.12.11) with ESMTP id j5LMRTwH030433; Tue, 21 Jun 2005 18:27:29 -0400 Received: from d01av04.pok.ibm.com (d01av04.pok.ibm.com [9.56.224.64]) by d01relay02.pok.ibm.com (8.12.10/NCO/VERS6.7) with ESMTP id j5LMRTKK259646; Tue, 21 Jun 2005 18:27:29 -0400 Received: from d01av04.pok.ibm.com (loopback [127.0.0.1]) by d01av04.pok.ibm.com (8.12.11/8.13.3) with ESMTP id j5LMRTAx005040; Tue, 21 Jun 2005 18:27:29 -0400 Received: from [9.56.227.90] (d01ml604.pok.ibm.com [9.56.227.90]) by d01av04.pok.ibm.com (8.12.11/8.12.11) with ESMTP id j5LMRTJM005037; Tue, 21 Jun 2005 18:27:29 -0400 In-Reply-To: To: "Noveck, Dave" Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) MIME-Version: 1.0 X-Mailer: Lotus Notes Build V70_M4_01112005 Beta 3NP January 11, 2005 Message-ID: From: Marc Eshel Date: Tue, 21 Jun 2005 15:27:20 -0700 X-MIMETrack: Serialize by Router on D01ML604/01/M/IBM(Build V70_06092005|June 09, 2005) at 06/21/2005 18:27:28, Serialize complete at 06/21/2005 18:27:28 Content-Type: text/plain; charset="US-ASCII" X-Spam-Score: 0.0 (/) X-Scan-Signature: 52f7a77164458f8c7b36b66787c853da Cc: "Goodson, Garth" , nfsv4-bounces@ietf.org, nfsv4@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org "Noveck, Dave" wrote on 06/21/2005 01:51:07 PM: > > With fs-locations we give the client a list of metadata server > > that can act as the server for given filesystem and the client > > can choose which one to use and switch among them at will or > > because of some failure. > > The problem is that in v4.0 as it stands there is an enormous > range of ways that a client can interpret the fs_locations list: > > "Here is a list of servers and you can switch when there is failure, > after refetching your change attibutes (because change attribute > is within the purview of a particular server)." (and that is not the > most extreme case of discontinuity -- some people are thinking that > filehandles and fileids will just change or that you may wind up > with a slighly out-of-data version of the data). 
> > "Here is a list of servers and you can switch when there is a > failure with no discontinuity of access (changes in fh's, stateid's, > fileids, change attributes), although changing is a big deal > and you shouldn't do it without good cause." > > "Here is a list of servers and you can switch when there is a > failure with no discontinuity of access or even at will since > there is no big cost to switch." > > "Here is a list of servers and you can access any of these > servers as you will at the same time (multi-pathing or a > cluster fs) since they are all effectively the same thing". > > The problem is that the client only knows about the list and > has no way of knowing which of the statements above is associated > with the list of servers he is getting. I have been thinking > about a locations_info attribute for 4.1 that would allow the > server to tell the client which of those he meant and also > give preference information (local vs. remote copies, absolutely > up-to-data vs. slightly out-of-data copies). > > > is solely within the purview of a particular server > > > I agree that once you got a layout for a file you should follow > > the layout instruction and read/write only from the specified > > nodes and get an error if you don't. > > Then it sounds like we are in violent agreement, except maybe for > choice of modal auxiliaries or capitalization. > > You say "you should follow the layout instruction" and I'm torn > between saying "you SHOULD follow the layout instruction" and > "you MUST follow the layout instruction". > > You say "[should] get an error if you don't" and I say "the server > SHOULD give you an error if you don't". > I say should follow and not MUST follow because I am trying to avoid the complication to the server if it MUST enforce this rule which might not be a problem for the server in the first place. For example, I might want to allow the read of the same large file(no caching) from one set of data server for client A and a different set of data server for client B. Now to enforce the above rule the server need to some how encode into the file handle information about which client can read what from which data server? I prefer not have this added extra work on the server and just say that the client should follow the rule to guaranty successful operation. Marc. 
_______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Tue Jun 21 18:33:46 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DkrJS-0005Kn-Fc; Tue, 21 Jun 2005 18:33:46 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DkrJO-0005Kc-TB; Tue, 21 Jun 2005 18:33:44 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id SAA20853; Tue, 21 Jun 2005 18:33:40 -0400 (EDT) Received: from mx2.netapp.com ([216.240.18.37]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1DkrhO-0000ho-0e; Tue, 21 Jun 2005 18:58:31 -0400 Received: from smtp2.corp.netapp.com (10.57.159.114) by mx2.netapp.com with ESMTP; 21 Jun 2005 15:33:30 -0700 X-IronPort-AV: i="3.93,219,1115017200"; d="scan'208"; a="252976577:sNHT18599380" Received: from [10.34.24.132] (loderunner.hq.netapp.com [10.34.24.132]) by smtp2.corp.netapp.com (8.13.1/8.13.1/NTAP-1.6) with ESMTP id j5LMXUT0012533; Tue, 21 Jun 2005 15:33:30 -0700 (PDT) Message-ID: <42B895BA.2060501@netapp.com> Date: Tue, 21 Jun 2005 15:33:30 -0700 From: Garth Goodson User-Agent: Debian Thunderbird 1.0.2 (X11/20050602) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Marc Eshel Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: 0.0 (/) X-Scan-Signature: 0fa76816851382eb71b0a882ccdc29ac Content-Transfer-Encoding: 7bit Cc: nfsv4-bounces@ietf.org, "Noveck, Dave" , nfsv4@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org Marc Eshel wrote: > "Noveck, Dave" wrote on 06/21/2005 01:51:07 PM: > > >>>With fs-locations we give the client a list of metadata server >>>that can act as the server for given filesystem and the client >>>can choose which one to use and switch among them at will or >>>because of some failure. >> >>The problem is that in v4.0 as it stands there is an enormous >>range of ways that a client can interpret the fs_locations list: >> >>"Here is a list of servers and you can switch when there is failure, >>after refetching your change attibutes (because change attribute >>is within the purview of a particular server)." (and that is not the >>most extreme case of discontinuity -- some people are thinking that >>filehandles and fileids will just change or that you may wind up >>with a slighly out-of-data version of the data). >> >>"Here is a list of servers and you can switch when there is a >>failure with no discontinuity of access (changes in fh's, stateid's, >>fileids, change attributes), although changing is a big deal >>and you shouldn't do it without good cause." >> >>"Here is a list of servers and you can switch when there is a >>failure with no discontinuity of access or even at will since >>there is no big cost to switch." >> >>"Here is a list of servers and you can access any of these >>servers as you will at the same time (multi-pathing or a >>cluster fs) since they are all effectively the same thing". 
>> >>The problem is that the client only knows about the list and >>has no way of knowing which of the statements above is associated >>with the list of servers he is getting. I have been thinking >>about a locations_info attribute for 4.1 that would allow the >>server to tell the client which of those he meant and also >>give preference information (local vs. remote copies, absolutely >>up-to-data vs. slightly out-of-data copies). >> >> >>>is solely within the purview of a particular server >> >>>I agree that once you got a layout for a file you should follow >>>the layout instruction and read/write only from the specified >>>nodes and get an error if you don't. >> >>Then it sounds like we are in violent agreement, except maybe for >>choice of modal auxiliaries or capitalization. >> >>You say "you should follow the layout instruction" and I'm torn >>between saying "you SHOULD follow the layout instruction" and >>"you MUST follow the layout instruction". >> >>You say "[should] get an error if you don't" and I say "the server >>SHOULD give you an error if you don't". >> > > > I say should follow and not MUST follow because I am trying to avoid the > complication to the server if it MUST enforce this rule which might not be > a problem for the server in the first place. For example, I might want to > allow the read of the same large file(no caching) from one set of data > server for client A and a different set of data server for client B. Now > to enforce the above rule the server need to some how encode into the file > handle information about which client can read what from which data > server? I prefer not have this added extra work on the server and just say > that the client should follow the rule to guaranty successful operation. > Marc. I think your example can be handled by giving client A and client B different layouts for the same file. The filehandles in the different layouts can be different as would be the device IDs. As long as the metadata server controls the sharing modes (read-only vs. read/write) there shouldn't be a problem. 
-Garth _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Tue Jun 21 18:46:18 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DkrVa-0001JW-Fq; Tue, 21 Jun 2005 18:46:18 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DkrVY-0001JM-9w; Tue, 21 Jun 2005 18:46:16 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id SAA21781; Tue, 21 Jun 2005 18:46:13 -0400 (EDT) Received: from e4.ny.us.ibm.com ([32.97.182.144]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1DkrtY-00013T-Oh; Tue, 21 Jun 2005 19:11:05 -0400 Received: from d01relay02.pok.ibm.com (d01relay02.pok.ibm.com [9.56.227.234]) by e4.ny.us.ibm.com (8.12.11/8.12.11) with ESMTP id j5LMk1j3025860; Tue, 21 Jun 2005 18:46:01 -0400 Received: from d01av02.pok.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by d01relay02.pok.ibm.com (8.12.10/NCO/VERS6.7) with ESMTP id j5LMk1KK259260; Tue, 21 Jun 2005 18:46:01 -0400 Received: from d01av02.pok.ibm.com (loopback [127.0.0.1]) by d01av02.pok.ibm.com (8.12.11/8.13.3) with ESMTP id j5LMk0SN006091; Tue, 21 Jun 2005 18:46:00 -0400 Received: from [9.56.227.90] (d01ml604.pok.ibm.com [9.56.227.90]) by d01av02.pok.ibm.com (8.12.11/8.12.11) with ESMTP id j5LMk0av006076; Tue, 21 Jun 2005 18:46:00 -0400 In-Reply-To: <42B895BA.2060501@netapp.com> To: Garth Goodson Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) MIME-Version: 1.0 X-Mailer: Lotus Notes Build V70_M4_01112005 Beta 3NP January 11, 2005 Message-ID: From: Marc Eshel Date: Tue, 21 Jun 2005 15:45:51 -0700 X-MIMETrack: Serialize by Router on D01ML604/01/M/IBM(Build V70_06092005|June 09, 2005) at 06/21/2005 18:46:00, Serialize complete at 06/21/2005 18:46:00 Content-Type: text/plain; charset="US-ASCII" X-Spam-Score: 0.0 (/) X-Scan-Signature: 34d35111647d654d033d58d318c0d21a Cc: nfsv4-bounces@ietf.org, "Noveck, Dave" , nfsv4@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org Garth Goodson wrote on 06/21/2005 03:33:30 PM: > Marc Eshel wrote: > > "Noveck, Dave" wrote on 06/21/2005 01:51:07 PM: > > > > > >>>With fs-locations we give the client a list of metadata server > >>>that can act as the server for given filesystem and the client > >>>can choose which one to use and switch among them at will or > >>>because of some failure. > >> > >>The problem is that in v4.0 as it stands there is an enormous > >>range of ways that a client can interpret the fs_locations list: > >> > >>"Here is a list of servers and you can switch when there is failure, > >>after refetching your change attibutes (because change attribute > >>is within the purview of a particular server)." (and that is not the > >>most extreme case of discontinuity -- some people are thinking that > >>filehandles and fileids will just change or that you may wind up > >>with a slighly out-of-data version of the data). > >> > >>"Here is a list of servers and you can switch when there is a > >>failure with no discontinuity of access (changes in fh's, stateid's, > >>fileids, change attributes), although changing is a big deal > >>and you shouldn't do it without good cause." 
> >> > >>"Here is a list of servers and you can switch when there is a > >>failure with no discontinuity of access or even at will since > >>there is no big cost to switch." > >> > >>"Here is a list of servers and you can access any of these > >>servers as you will at the same time (multi-pathing or a > >>cluster fs) since they are all effectively the same thing". > >> > >>The problem is that the client only knows about the list and > >>has no way of knowing which of the statements above is associated > >>with the list of servers he is getting. I have been thinking > >>about a locations_info attribute for 4.1 that would allow the > >>server to tell the client which of those he meant and also > >>give preference information (local vs. remote copies, absolutely > >>up-to-data vs. slightly out-of-data copies). > >> > >> > >>>is solely within the purview of a particular server > >> > >>>I agree that once you got a layout for a file you should follow > >>>the layout instruction and read/write only from the specified > >>>nodes and get an error if you don't. > >> > >>Then it sounds like we are in violent agreement, except maybe for > >>choice of modal auxiliaries or capitalization. > >> > >>You say "you should follow the layout instruction" and I'm torn > >>between saying "you SHOULD follow the layout instruction" and > >>"you MUST follow the layout instruction". > >> > >>You say "[should] get an error if you don't" and I say "the server > >>SHOULD give you an error if you don't". > >> > > > > > > I say should follow and not MUST follow because I am trying to avoid the > > complication to the server if it MUST enforce this rule which might not be > > a problem for the server in the first place. For example, I might want to > > allow the read of the same large file(no caching) from one set of data > > server for client A and a different set of data server for client B. Now > > to enforce the above rule the server need to some how encode into the file > > handle information about which client can read what from which data > > server? I prefer not have this added extra work on the server and just say > > that the client should follow the rule to guaranty successful operation. > > Marc. > > I think your example can be handled by giving client A and client B > different layouts for the same file. The filehandles in the different > layouts can be different as would be the device IDs. As long as the > metadata server controls the sharing modes (read-only vs. read/write) > there shouldn't be a problem. > > -Garth I know I can do it. I just don't want to make sure (enforce the rule) that each client is using the file handles to read only from the specified data server. Marc. Marc. 
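For concreteness, a rough sketch of the bookkeeping being objected to here, under the assumption (hypothetical, not from the draft) that each data server is told which (client, filehandle) pairs its layouts cover and rejects any other I/O:

from typing import Set, Tuple


class DataServer:
    def __init__(self, name: str) -> None:
        self.name = name
        # (client_id, filehandle) pairs this data server has been told about,
        # e.g. pushed down by the metadata server when a layout is granted.
        self.grants: Set[Tuple[str, bytes]] = set()

    def read(self, client_id: str, fh: bytes, offset: int, count: int) -> bytes:
        if (client_id, fh) not in self.grants:
            # Strict enforcement: reject I/O that does not match a granted layout.
            raise PermissionError("access denied: no layout grants this client access here")
        return b"\0" * count  # placeholder for real data


if __name__ == "__main__":
    ds1 = DataServer("ds1")
    ds1.grants.add(("clientA", b"\x01\x02"))
    print(len(ds1.read("clientA", b"\x01\x02", 0, 8)))    # allowed
    try:
        ds1.read("clientB", b"\x01\x02", 0, 8)             # client B was laid out elsewhere
    except PermissionError as e:
        print(e)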
_______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Tue Jun 21 19:54:14 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DksZK-00032l-Ky; Tue, 21 Jun 2005 19:54:14 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DksZJ-00032g-7Z for nfsv4@megatron.ietf.org; Tue, 21 Jun 2005 19:54:13 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id TAA28014 for ; Tue, 21 Jun 2005 19:54:12 -0400 (EDT) Received: from mx1.netapp.com ([216.240.18.38]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1DksxI-0003Fn-9w for nfsv4@ietf.org; Tue, 21 Jun 2005 20:19:02 -0400 Received: from smtp2.corp.netapp.com (10.57.159.114) by mx1.netapp.com with ESMTP; 21 Jun 2005 16:54:01 -0700 X-IronPort-AV: i="3.93,219,1115017200"; d="scan'208"; a="201922787:sNHT17774552" Received: from [10.34.24.132] (loderunner.hq.netapp.com [10.34.24.132]) by smtp2.corp.netapp.com (8.13.1/8.13.1/NTAP-1.6) with ESMTP id j5LNs1QV003470; Tue, 21 Jun 2005 16:54:01 -0700 (PDT) Message-ID: <42B8A899.5030204@netapp.com> Date: Tue, 21 Jun 2005 16:54:01 -0700 From: Garth Goodson User-Agent: Debian Thunderbird 1.0.2 (X11/20050602) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Marc Eshel Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: 0.0 (/) X-Scan-Signature: b22590c27682ace61775ee7b453b40d3 Content-Transfer-Encoding: 7bit Cc: "Noveck, Dave" , nfsv4@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org Marc Eshel wrote: > Garth Goodson wrote on 06/21/2005 03:33:30 PM: > > >>Marc Eshel wrote: >> >>>"Noveck, Dave" wrote on 06/21/2005 01:51:07 > > PM: > >>> >>>>>With fs-locations we give the client a list of metadata server >>>>>that can act as the server for given filesystem and the client >>>>>can choose which one to use and switch among them at will or >>>>>because of some failure. >>>> >>>>The problem is that in v4.0 as it stands there is an enormous >>>>range of ways that a client can interpret the fs_locations list: >>>> >>>>"Here is a list of servers and you can switch when there is failure, >>>>after refetching your change attibutes (because change attribute >>>>is within the purview of a particular server)." (and that is not the >>>>most extreme case of discontinuity -- some people are thinking that >>>>filehandles and fileids will just change or that you may wind up >>>>with a slighly out-of-data version of the data). >>>> >>>>"Here is a list of servers and you can switch when there is a >>>>failure with no discontinuity of access (changes in fh's, stateid's, >>>>fileids, change attributes), although changing is a big deal >>>>and you shouldn't do it without good cause." >>>> >>>>"Here is a list of servers and you can switch when there is a >>>>failure with no discontinuity of access or even at will since >>>>there is no big cost to switch." 
>>>> >>>>"Here is a list of servers and you can access any of these >>>>servers as you will at the same time (multi-pathing or a >>>>cluster fs) since they are all effectively the same thing". >>>> >>>>The problem is that the client only knows about the list and >>>>has no way of knowing which of the statements above is associated >>>>with the list of servers he is getting. I have been thinking >>>>about a locations_info attribute for 4.1 that would allow the >>>>server to tell the client which of those he meant and also >>>>give preference information (local vs. remote copies, absolutely >>>>up-to-data vs. slightly out-of-data copies). >>>> >>>> >>>> >>>>>is solely within the purview of a particular server >>>> >>>>>I agree that once you got a layout for a file you should follow >>>>>the layout instruction and read/write only from the specified >>>>>nodes and get an error if you don't. >>>> >>>>Then it sounds like we are in violent agreement, except maybe for >>>>choice of modal auxiliaries or capitalization. >>>> >>>>You say "you should follow the layout instruction" and I'm torn >>>>between saying "you SHOULD follow the layout instruction" and >>>>"you MUST follow the layout instruction". >>>> >>>>You say "[should] get an error if you don't" and I say "the server >>>>SHOULD give you an error if you don't". >>>> >>> >>> >>>I say should follow and not MUST follow because I am trying to avoid > > the > >>>complication to the server if it MUST enforce this rule which might > > not be > >>>a problem for the server in the first place. For example, I might want > > to > >>>allow the read of the same large file(no caching) from one set of data > > >>>server for client A and a different set of data server for client B. > > Now > >>>to enforce the above rule the server need to some how encode into the > > file > >>>handle information about which client can read what from which data >>>server? I prefer not have this added extra work on the server and just > > say > >>>that the client should follow the rule to guaranty successful > > operation. > >>>Marc. >> >>I think your example can be handled by giving client A and client B >>different layouts for the same file. The filehandles in the different >>layouts can be different as would be the device IDs. As long as the >>metadata server controls the sharing modes (read-only vs. read/write) >>there shouldn't be a problem. >> >>-Garth > > > I know I can do it. I just don't want to make sure (enforce the rule) that > each client is using the file handles to read only from the specified data > server. > Marc. > > Marc. Ok, that is a valid concern (not having to propagate layouts to the data servers to validate that I/Os are coming from the correct clients). I guess the object guys get around this by encoding the layout/device IDs into the capability that is handed back to the client with the layout. It has been marked as an open issue... 
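A loose sketch of the capability idea alluded to above, assuming a generic MAC scheme rather than the actual OSD capability format: the metadata server signs the filehandle, device ID, and allowed operations with a key it shares with the data server, so the data server can validate I/O without holding per-layout state.

import hashlib
import hmac

SHARED_KEY = b"mds-and-ds-shared-secret"   # hypothetical out-of-band shared key


def make_capability(fh: bytes, device_id: str, allowed_ops: str) -> bytes:
    # Issued by the metadata server and handed to the client with the layout.
    msg = fh + device_id.encode() + allowed_ops.encode()
    return hmac.new(SHARED_KEY, msg, hashlib.sha256).digest()


def data_server_check(fh: bytes, device_id: str, op: str, cap: bytes) -> bool:
    # The data server recomputes the MAC for the op classes it would allow and
    # compares; no per-layout state is kept on the data server.
    for allowed_ops in ("READ", "READ,WRITE"):
        if op in allowed_ops.split(",") and hmac.compare_digest(
            cap, make_capability(fh, device_id, allowed_ops)
        ):
            return True
    return False


if __name__ == "__main__":
    cap = make_capability(b"\x01\x02", "ds1", "READ")          # granted with the layout
    print(data_server_check(b"\x01\x02", "ds1", "READ", cap))   # True
    print(data_server_check(b"\x01\x02", "ds1", "WRITE", cap))  # False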
-Garth _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Wed Jun 22 06:55:22 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dl2t8-00053f-Hx; Wed, 22 Jun 2005 06:55:22 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dl2t6-00053X-Jd; Wed, 22 Jun 2005 06:55:20 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id GAA24097; Wed, 22 Jun 2005 06:55:17 -0400 (EDT) Received: from mx2.netapp.com ([216.240.18.37]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1Dl3HD-0002ro-BA; Wed, 22 Jun 2005 07:20:15 -0400 Received: from smtp1.corp.netapp.com (10.57.156.124) by mx2.netapp.com with ESMTP; 22 Jun 2005 03:55:10 -0700 X-IronPort-AV: i="3.93,220,1115017200"; d="scan'208"; a="253067190:sNHT23769408" Received: from svlexc03.hq.netapp.com (svlexc03.corp.netapp.com [10.57.156.149]) by smtp1.corp.netapp.com (8.13.1/8.13.1/NTAP-1.6) with ESMTP id j5MAtAIi018285; Wed, 22 Jun 2005 03:55:10 -0700 (PDT) Received: from burgundy.hq.netapp.com ([10.56.10.66]) by svlexc03.hq.netapp.com with Microsoft SMTPSVC(6.0.3790.0); Wed, 22 Jun 2005 03:55:10 -0700 Received: from exnane01.hq.netapp.com ([10.97.0.61]) by burgundy.hq.netapp.com with Microsoft SMTPSVC(5.0.2195.6713); Wed, 22 Jun 2005 03:55:10 -0700 X-MimeOLE: Produced By Microsoft Exchange V6.5.7226.0 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) Date: Wed, 22 Jun 2005 06:55:08 -0400 Message-ID: Thread-Topic: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) Thread-Index: AcV2sGpmjMnqRzZCRDOS8fA6mORgUAAZWreg From: "Noveck, Dave" To: "Marc Eshel" X-OriginalArrivalTime: 22 Jun 2005 10:55:10.0160 (UTC) FILETIME=[DBDDE500:01C57718] X-Spam-Score: 0.0 (/) X-Scan-Signature: 093efd19b5f651b2707595638f6c4003 Content-Transfer-Encoding: quoted-printable Cc: "Goodson, Garth" , nfsv4-bounces@ietf.org, nfsv4@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org > > You say "you should follow the layout instruction" and I'm torn > > between saying "you SHOULD follow the layout instruction" and > > "you MUST follow the layout instruction". > >=20 > > You say "[should] get an error if you don't" and I say "the server > > SHOULD give you an error if you don't". >=20 > I say should follow and not MUST follow because I am trying to avoid = the=20 > complication to the server if it MUST enforce this rule which might = not be=20 > a problem for the server in the first place.=20 Hold on. I never suggested "MUST" for the server's obligation to check. It seems that the server checking is where you see = difficulties/inconvenience. I did suggest "MUST" as a possiblility for the clients' obligation to=20 conform. These two do *not* have to go in tandem. 
While it wouldn't = make any sense to have the client not have to conform while the server is = giving him an error if he doesn't, it is perfectly reasonable for the spec to strongly state the rule for the client but not to insist that the server check for compliance if it has great difficulties doing so. > For example, I might want to=20 > allow the read of the same large file(no caching) from one set of data = > server for client A and a different set of data server for client B. = Now=20 > to enforce the above rule the server need to some how encode into the = file=20 > handle information about which client can read what from which data=20 > server?=20 I may not be uderstanding your example correctly but it sounds like the case you are worried about is not really at issue here. I prefer not have this added extra work on the server and just say=20 that the client should follow the rule to guaranty successful operation. = -----Original Message----- From: Marc Eshel [mailto:eshel@almaden.ibm.com] Sent: Tuesday, June 21, 2005 6:27 PM To: Noveck, Dave Cc: Dean Hildebrand; Goodson, Garth; nfsv4@ietf.org; nfsv4-bounces@ietf.org Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) "Noveck, Dave" wrote on 06/21/2005 01:51:07 PM: > > With fs-locations we give the client a list of metadata server=20 > > that can act as the server for given filesystem and the client=20 > > can choose which one to use and switch among them at will or=20 > > because of some failure.=20 >=20 > The problem is that in v4.0 as it stands there is an enormous > range of ways that a client can interpret the fs_locations list: >=20 > "Here is a list of servers and you can switch when there is failure, > after refetching your change attibutes (because change attribute > is within the purview of a particular server)." (and that is not the > most extreme case of discontinuity -- some people are thinking that=20 > filehandles and fileids will just change or that you may wind up > with a slighly out-of-data version of the data). >=20 > "Here is a list of servers and you can switch when there is a=20 > failure with no discontinuity of access (changes in fh's, stateid's, > fileids, change attributes), although changing is a big deal > and you shouldn't do it without good cause." >=20 > "Here is a list of servers and you can switch when there is a > failure with no discontinuity of access or even at will since > there is no big cost to switch." >=20 > "Here is a list of servers and you can access any of these=20 > servers as you will at the same time (multi-pathing or a=20 > cluster fs) since they are all effectively the same thing". >=20 > The problem is that the client only knows about the list and > has no way of knowing which of the statements above is associated > with the list of servers he is getting. I have been thinking > about a locations_info attribute for 4.1 that would allow the > server to tell the client which of those he meant and also=20 > give preference information (local vs. remote copies, absolutely > up-to-data vs. slightly out-of-data copies). >=20 > > is solely within the purview of a particular server >=20 > > I agree that once you got a layout for a file you should follow=20 > > the layout instruction and read/write only from the specified=20 > > nodes and get an error if you don't.=20 >=20 > Then it sounds like we are in violent agreement, except maybe for > choice of modal auxiliaries or capitalization. 
>=20 > You say "you should follow the layout instruction" and I'm torn > between saying "you SHOULD follow the layout instruction" and > "you MUST follow the layout instruction". >=20 > You say "[should] get an error if you don't" and I say "the server > SHOULD give you an error if you don't". >=20 I say should follow and not MUST follow because I am trying to avoid the = complication to the server if it MUST enforce this rule which might not = be=20 a problem for the server in the first place. For example, I might want = to=20 allow the read of the same large file(no caching) from one set of data=20 server for client A and a different set of data server for client B. Now = to enforce the above rule the server need to some how encode into the = file=20 handle information about which client can read what from which data=20 server? I prefer not have this added extra work on the server and just = say=20 that the client should follow the rule to guaranty successful operation. = Marc.=20 _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Wed Jun 22 06:57:31 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dl2vD-0005aX-1R; Wed, 22 Jun 2005 06:57:31 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dl2vC-0005Zl-Is for nfsv4@megatron.ietf.org; Wed, 22 Jun 2005 06:57:30 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id GAA24271 for ; Wed, 22 Jun 2005 06:57:27 -0400 (EDT) Received: from mx1.netapp.com ([216.240.18.38]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1Dl3JJ-0002th-8U for nfsv4@ietf.org; Wed, 22 Jun 2005 07:22:25 -0400 Received: from smtp2.corp.netapp.com (10.57.159.114) by mx1.netapp.com with ESMTP; 22 Jun 2005 03:57:21 -0700 X-IronPort-AV: i="3.93,220,1115017200"; d="scan'208"; a="201968973:sNHT22844000" Received: from svlexc02.hq.netapp.com (svlexc02.corp.netapp.com [10.57.157.136]) by smtp2.corp.netapp.com (8.13.1/8.13.1/NTAP-1.6) with ESMTP id j5MAvKPs009281 for ; Wed, 22 Jun 2005 03:57:20 -0700 (PDT) Received: from burgundy.hq.netapp.com ([10.56.10.66]) by svlexc02.hq.netapp.com with Microsoft SMTPSVC(5.0.2195.6713); Wed, 22 Jun 2005 03:57:20 -0700 Received: from exnane01.hq.netapp.com ([10.97.0.61]) by burgundy.hq.netapp.com with Microsoft SMTPSVC(5.0.2195.6713); Wed, 22 Jun 2005 03:57:20 -0700 X-MimeOLE: Produced By Microsoft Exchange V6.5.7226.0 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Subject: FW: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) Date: Wed, 22 Jun 2005 06:57:19 -0400 Message-ID: Thread-Topic: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) Thread-Index: AcV2sGpmjMnqRzZCRDOS8fA6mORgUAAZWregAADEk5A= From: "Noveck, Dave" To: X-OriginalArrivalTime: 22 Jun 2005 10:57:20.0774 (UTC) FILETIME=[29B80260:01C57719] X-Spam-Score: 0.0 (/) X-Scan-Signature: 6640e3bbe8a4d70c4469bcdcbbf0921d Content-Transfer-Encoding: quoted-printable X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org Best to ignore the following for the moment. Send hit inadvertantly. 
Updated, more coherent message will be sent soon. -----Original Message----- From: Noveck, Dave=20 Sent: Wednesday, June 22, 2005 6:55 AM To: 'Marc Eshel' Cc: Dean Hildebrand; Goodson, Garth; nfsv4@ietf.org; nfsv4-bounces@ietf.org Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) > > You say "you should follow the layout instruction" and I'm torn > > between saying "you SHOULD follow the layout instruction" and > > "you MUST follow the layout instruction". > >=20 > > You say "[should] get an error if you don't" and I say "the server > > SHOULD give you an error if you don't". >=20 > I say should follow and not MUST follow because I am trying to avoid = the=20 > complication to the server if it MUST enforce this rule which might = not be=20 > a problem for the server in the first place.=20 Hold on. I never suggested "MUST" for the server's obligation to check. It seems that the server checking is where you see = difficulties/inconvenience. I did suggest "MUST" as a possiblility for the clients' obligation to=20 conform. These two do *not* have to go in tandem. While it wouldn't = make any sense to have the client not have to conform while the server is = giving him an error if he doesn't, it is perfectly reasonable for the spec to strongly state the rule for the client but not to insist that the server check for compliance if it has great difficulties doing so. > For example, I might want to=20 > allow the read of the same large file(no caching) from one set of data = > server for client A and a different set of data server for client B. = Now=20 > to enforce the above rule the server need to some how encode into the = file=20 > handle information about which client can read what from which data=20 > server?=20 I may not be uderstanding your example correctly but it sounds like the case you are worried about is not really at issue here. I prefer not have this added extra work on the server and just say=20 that the client should follow the rule to guaranty successful operation. = -----Original Message----- From: Marc Eshel [mailto:eshel@almaden.ibm.com] Sent: Tuesday, June 21, 2005 6:27 PM To: Noveck, Dave Cc: Dean Hildebrand; Goodson, Garth; nfsv4@ietf.org; nfsv4-bounces@ietf.org Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) "Noveck, Dave" wrote on 06/21/2005 01:51:07 PM: > > With fs-locations we give the client a list of metadata server=20 > > that can act as the server for given filesystem and the client=20 > > can choose which one to use and switch among them at will or=20 > > because of some failure.=20 >=20 > The problem is that in v4.0 as it stands there is an enormous > range of ways that a client can interpret the fs_locations list: >=20 > "Here is a list of servers and you can switch when there is failure, > after refetching your change attibutes (because change attribute > is within the purview of a particular server)." (and that is not the > most extreme case of discontinuity -- some people are thinking that=20 > filehandles and fileids will just change or that you may wind up > with a slighly out-of-data version of the data). >=20 > "Here is a list of servers and you can switch when there is a=20 > failure with no discontinuity of access (changes in fh's, stateid's, > fileids, change attributes), although changing is a big deal > and you shouldn't do it without good cause." 
>=20 > "Here is a list of servers and you can switch when there is a > failure with no discontinuity of access or even at will since > there is no big cost to switch." >=20 > "Here is a list of servers and you can access any of these=20 > servers as you will at the same time (multi-pathing or a=20 > cluster fs) since they are all effectively the same thing". >=20 > The problem is that the client only knows about the list and > has no way of knowing which of the statements above is associated > with the list of servers he is getting. I have been thinking > about a locations_info attribute for 4.1 that would allow the > server to tell the client which of those he meant and also=20 > give preference information (local vs. remote copies, absolutely > up-to-data vs. slightly out-of-data copies). >=20 > > is solely within the purview of a particular server >=20 > > I agree that once you got a layout for a file you should follow=20 > > the layout instruction and read/write only from the specified=20 > > nodes and get an error if you don't.=20 >=20 > Then it sounds like we are in violent agreement, except maybe for > choice of modal auxiliaries or capitalization. >=20 > You say "you should follow the layout instruction" and I'm torn > between saying "you SHOULD follow the layout instruction" and > "you MUST follow the layout instruction". >=20 > You say "[should] get an error if you don't" and I say "the server > SHOULD give you an error if you don't". >=20 I say should follow and not MUST follow because I am trying to avoid the = complication to the server if it MUST enforce this rule which might not = be=20 a problem for the server in the first place. For example, I might want = to=20 allow the read of the same large file(no caching) from one set of data=20 server for client A and a different set of data server for client B. Now = to enforce the above rule the server need to some how encode into the = file=20 handle information about which client can read what from which data=20 server? I prefer not have this added extra work on the server and just = say=20 that the client should follow the rule to guaranty successful operation. 
= Marc.=20 _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Wed Jun 22 09:51:00 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dl5d6-0008Hg-Jp; Wed, 22 Jun 2005 09:51:00 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dl5d4-0008HP-MX; Wed, 22 Jun 2005 09:50:58 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id JAA11287; Wed, 22 Jun 2005 09:50:56 -0400 (EDT) Received: from mx2.netapp.com ([216.240.18.37]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1Dl61C-0000JL-Lb; Wed, 22 Jun 2005 10:15:55 -0400 Received: from smtp1.corp.netapp.com (10.57.156.124) by mx2.netapp.com with ESMTP; 22 Jun 2005 06:50:48 -0700 X-IronPort-AV: i="3.93,221,1115017200"; d="scan'208"; a="253110518:sNHT24388284" Received: from svlexc02.hq.netapp.com (svlexc02.corp.netapp.com [10.57.157.136]) by smtp1.corp.netapp.com (8.13.1/8.13.1/NTAP-1.6) with ESMTP id j5MDolwd012359; Wed, 22 Jun 2005 06:50:47 -0700 (PDT) Received: from lavender.hq.netapp.com ([10.56.11.75]) by svlexc02.hq.netapp.com with Microsoft SMTPSVC(5.0.2195.6713); Wed, 22 Jun 2005 06:50:47 -0700 Received: from exnane01.hq.netapp.com ([10.97.0.61]) by lavender.hq.netapp.com with Microsoft SMTPSVC(5.0.2195.6713); Wed, 22 Jun 2005 06:50:47 -0700 X-MimeOLE: Produced By Microsoft Exchange V6.5.7226.0 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) Date: Wed, 22 Jun 2005 09:50:45 -0400 Message-ID: Thread-Topic: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) Thread-Index: AcV2sGpmjMnqRzZCRDOS8fA6mORgUAAfERgw From: "Noveck, Dave" To: "Marc Eshel" X-OriginalArrivalTime: 22 Jun 2005 13:50:47.0105 (UTC) FILETIME=[64601B10:01C57731] X-Spam-Score: 1.3 (+) X-Scan-Signature: 7e439b86d3292ef5adf93b694a43a576 Content-Transfer-Encoding: quoted-printable Cc: "Goodson, Garth" , nfsv4-bounces@ietf.org, nfsv4@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org The full response this time. > > You say "you should follow the layout instruction" and I'm torn > > between saying "you SHOULD follow the layout instruction" and > > "you MUST follow the layout instruction". > >=20 > > You say "[should] get an error if you don't" and I say "the server > > SHOULD give you an error if you don't". >=20 > I say should follow and not MUST follow because I am trying to avoid = the=20 > complication to the server if it MUST enforce this rule which might = not be=20 > a problem for the server in the first place.=20 Hold on. I never suggested "MUST" for the server's obligation to check. It seems that the server checking is where you see = difficulties/inconvenience. I did suggest "MUST" as a possiblility for the clients' obligation to=20 conform. These two do *not* have to go in tandem. 
While it wouldn't = make any sense to have the client not have to conform while the server is = giving him an error if he doesn't, it is perfectly reasonable for the spec to strongly state the rule for the client but not to insist that the server check for compliance if it has great difficulties doing so. > For example, I might want to=20 > allow the read of the same large file(no caching) from one set of data = > server for client A and a different set of data server for client B. = Now=20 > to enforce the above rule the server need to some how encode into the = file=20 > handle information about which client can read what from which data=20 > server?=20 I may not be uderstanding your example correctly but it sounds like the case you are worried about is not really at issue here. I know we have been talking kind of loosely about should/SHOULD/MUST=20 "follow the layout instruction". This is overbroad. If a server is told to use server111.clustersRus.org and takes that same handles and uses it on some other server, server111.clustersRus.com for example, then he is not following the layout instruction, but the spec is not going to require anybody to specifically act to make sure that he=20 gets an error. The effect of using a filehandle on a server other than the one it for had always been undefined, and I expect it will continue to be. Even though your data servers above are going to=20 be in more confederal relationship than the two server111's, I=20 think the same will still hold. If I take a handle for X and use it on Y, I have a real good chance of getting STALE but there is no guarantee that I will. The specific issue that started this (and that I'm still talking=20 about) is more limited. I'm given a handle H for a server A in a layout and in that is the requirement that that handle be valid for READ/WRITE, etc. and not for SETATTR. If the client uses it on A and does a SETATTR, he SHOULD get an error. If he uses that same handle on B and does a READ then he is broken but the server has no obligation to recognize handles for other servers. Similarly if he does a SETATTR with the handle on B. > I prefer not have this added extra work on the server and just say=20 > that the client should follow the rule to guaranty successful = operation.=20 In the IETF "should" is very weak and amounts to "gee, it is sort of a good idea to". "SHOULD" is much stronger and says "Do it unless you = have a real good reason not to". "MUST" just says to do it.=20 I guess I still think that if you receive a handle for server A in a=20 layout, you MUST NOT use it to do operations on that server other than PUTFH, COMMIT, READ, WRITE, and that if you do, the server SHOULD give=20 you an error. If you feel this is too difficult for the server, then the "SHOULD"=20 would give you enough wiggle-room. 
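A minimal sketch of the check being proposed here, assuming an illustrative flag-bit encoding and error code (neither is spec text): a filehandle minted as part of a layout is marked data-only, and the server errors out on any operation other than PUTFH, COMMIT, READ, or WRITE against such a handle.

DATA_ONLY_OPS = {"PUTFH", "COMMIT", "READ", "WRITE"}
DATA_ONLY_FLAG = 0x01                     # hypothetical: one bit reserved in the fh


def mint_layout_fh(base_fh: bytes) -> bytes:
    # Tag the handle so the server can later tell it came from a layout.
    return bytes([DATA_ONLY_FLAG]) + base_fh


def check_op(fh: bytes, op: str) -> str:
    if fh and (fh[0] & DATA_ONLY_FLAG) and op not in DATA_ONLY_OPS:
        # The client is misusing a layout handle; the server SHOULD error out.
        return "ERROR: operation not permitted on a layout filehandle"
    return "OK"


if __name__ == "__main__":
    lfh = mint_layout_fh(b"\x42\x42")
    print(check_op(lfh, "READ"))      # OK
    print(check_op(lfh, "SETATTR"))   # rejected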
-----Original Message----- From: Marc Eshel [mailto:eshel@almaden.ibm.com] Sent: Tuesday, June 21, 2005 6:27 PM To: Noveck, Dave Cc: Dean Hildebrand; Goodson, Garth; nfsv4@ietf.org; nfsv4-bounces@ietf.org Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) "Noveck, Dave" wrote on 06/21/2005 01:51:07 PM: > > With fs-locations we give the client a list of metadata server=20 > > that can act as the server for given filesystem and the client=20 > > can choose which one to use and switch among them at will or=20 > > because of some failure.=20 >=20 > The problem is that in v4.0 as it stands there is an enormous > range of ways that a client can interpret the fs_locations list: >=20 > "Here is a list of servers and you can switch when there is failure, > after refetching your change attibutes (because change attribute > is within the purview of a particular server)." (and that is not the > most extreme case of discontinuity -- some people are thinking that=20 > filehandles and fileids will just change or that you may wind up > with a slighly out-of-data version of the data). >=20 > "Here is a list of servers and you can switch when there is a=20 > failure with no discontinuity of access (changes in fh's, stateid's, > fileids, change attributes), although changing is a big deal > and you shouldn't do it without good cause." >=20 > "Here is a list of servers and you can switch when there is a > failure with no discontinuity of access or even at will since > there is no big cost to switch." >=20 > "Here is a list of servers and you can access any of these=20 > servers as you will at the same time (multi-pathing or a=20 > cluster fs) since they are all effectively the same thing". >=20 > The problem is that the client only knows about the list and > has no way of knowing which of the statements above is associated > with the list of servers he is getting. I have been thinking > about a locations_info attribute for 4.1 that would allow the > server to tell the client which of those he meant and also=20 > give preference information (local vs. remote copies, absolutely > up-to-data vs. slightly out-of-data copies). >=20 > > is solely within the purview of a particular server >=20 > > I agree that once you got a layout for a file you should follow=20 > > the layout instruction and read/write only from the specified=20 > > nodes and get an error if you don't.=20 >=20 > Then it sounds like we are in violent agreement, except maybe for > choice of modal auxiliaries or capitalization. >=20 > You say "you should follow the layout instruction" and I'm torn > between saying "you SHOULD follow the layout instruction" and > "you MUST follow the layout instruction". >=20 > You say "[should] get an error if you don't" and I say "the server > SHOULD give you an error if you don't". >=20 I say should follow and not MUST follow because I am trying to avoid the = complication to the server if it MUST enforce this rule which might not = be=20 a problem for the server in the first place. For example, I might want = to=20 allow the read of the same large file(no caching) from one set of data=20 server for client A and a different set of data server for client B. Now = to enforce the above rule the server need to some how encode into the = file=20 handle information about which client can read what from which data=20 server? I prefer not have this added extra work on the server and just = say=20 that the client should follow the rule to guaranty successful operation. 
= Marc.=20 _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Wed Jun 22 11:24:24 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dl75T-0004XO-VR; Wed, 22 Jun 2005 11:24:23 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dl75R-0004X4-Dz; Wed, 22 Jun 2005 11:24:21 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id LAA23051; Wed, 22 Jun 2005 11:24:18 -0400 (EDT) Received: from newman.eecs.umich.edu ([141.213.4.11]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1Dl7TZ-0003y5-Dk; Wed, 22 Jun 2005 11:49:18 -0400 Received: from willow.eecs.umich.edu (willow.eecs.umich.edu [141.213.4.14]) by newman.eecs.umich.edu (8.13.2/8.13.0) with ESMTP id j5MFNxuU000457 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 22 Jun 2005 11:23:59 -0400 Received: from willow.eecs.umich.edu (localhost.eecs.umich.edu [127.0.0.1]) by willow.eecs.umich.edu (8.13.1/8.13.0) with ESMTP id j5MFNw8s016805; Wed, 22 Jun 2005 11:23:59 -0400 Received: from localhost (dhildebz@localhost) by willow.eecs.umich.edu (8.13.1/8.13.1/Submit) with ESMTP id j5MFNwC8016802; Wed, 22 Jun 2005 11:23:58 -0400 Date: Wed, 22 Jun 2005 11:23:58 -0400 (EDT) From: Dean Hildebrand To: "Noveck, Dave" Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) In-Reply-To: Message-ID: References: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Spam-Status: No, score=-1.8 required=5.0 tests=BAYES_00,NO_OBLIGATION autolearn=no version=3.0.3 X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on newman.eecs.umich.edu X-Virus-Scan: : UVSCAN at UoM/EECS X-Spam-Score: 1.3 (+) X-Scan-Signature: 848ed35f2a4fc0638fa89629cb640f48 Cc: "Goodson, Garth" , nfsv4-bounces@ietf.org, Marc Eshel , nfsv4@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org Two comments: 1) I think another example to keep in mind is that a layout could be redundant. You can imagine layouts that just return "file available on all data servers", and then let the client use its own load balancing scheme to access data. Of course this depends on all clients using the same load balancing algorithm, but file systems already exist in this manner. 2) I think it can be safely assumed that someone is going to want to use these extensions to send GETATTR, etc to the data servers to offload work from the metadata server. Garth's NASD paper did this with NASD NFS and showed the benefits. If one layout driver redirects GETATTR's, etc and another one doesn't, I assume the faster one will be used, not the one that follows the spec. If we would rather put this ability into a new FILE_LOCATIONS attribute or something similar, then we should say so. I wrote a FILE_LOCATIONS internet draft about 2 years ago that never went anywhere.... Dean On Wed, 22 Jun 2005, Noveck, Dave wrote: > The full response this time. > > > > You say "you should follow the layout instruction" and I'm torn > > > between saying "you SHOULD follow the layout instruction" and > > > "you MUST follow the layout instruction". 
> > > > > > You say "[should] get an error if you don't" and I say "the server > > > SHOULD give you an error if you don't". > > > > I say should follow and not MUST follow because I am trying to avoid the > > complication to the server if it MUST enforce this rule which might not be > > a problem for the server in the first place. > > Hold on. I never suggested "MUST" for the server's obligation to check. > It seems that the server checking is where you see difficulties/inconvenience. > > I did suggest "MUST" as a possiblility for the clients' obligation to > conform. These two do *not* have to go in tandem. While it wouldn't make > any sense to have the client not have to conform while the server is giving > him an error if he doesn't, it is perfectly reasonable for the spec to > strongly state the rule for the client but not to insist that the server > check for compliance if it has great difficulties doing so. > > > For example, I might want to > > allow the read of the same large file(no caching) from one set of data > > server for client A and a different set of data server for client B. Now > > to enforce the above rule the server need to some how encode into the file > > handle information about which client can read what from which data > > server? > > I may not be uderstanding your example correctly but it sounds like the > case you are worried about is not really at issue here. > > I know we have been talking kind of loosely about should/SHOULD/MUST > "follow the layout instruction". This is overbroad. If a server is > told to use server111.clustersRus.org and takes that same handles and > uses it on some other server, server111.clustersRus.com for example, > then he is not following the layout instruction, but the spec is not > going to require anybody to specifically act to make sure that he > gets an error. The effect of using a filehandle on a server other > than the one it for had always been undefined, and I expect it will > continue to be. Even though your data servers above are going to > be in more confederal relationship than the two server111's, I > think the same will still hold. If I take a handle for X and use it > on Y, I have a real good chance of getting STALE but there is no > guarantee that I will. > > The specific issue that started this (and that I'm still talking > about) is more limited. I'm given a handle H for a server A in a > layout and in that is the requirement that that handle be valid > for READ/WRITE, etc. and not for SETATTR. If the client uses it > on A and does a SETATTR, he SHOULD get an error. If he uses that > same handle on B and does a READ then he is broken but the server > has no obligation to recognize handles for other servers. Similarly > if he does a SETATTR with the handle on B. > > > I prefer not have this added extra work on the server and just say > > that the client should follow the rule to guaranty successful operation. > > In the IETF "should" is very weak and amounts to "gee, it is sort of a > good idea to". "SHOULD" is much stronger and says "Do it unless you have > a real good reason not to". "MUST" just says to do it. > > I guess I still think that if you receive a handle for server A in a > layout, you MUST NOT use it to do operations on that server other than > PUTFH, COMMIT, READ, WRITE, and that if you do, the server SHOULD give > you an error. > > If you feel this is too difficult for the server, then the "SHOULD" > would give you enough wiggle-room. 
> > > -----Original Message----- > From: Marc Eshel [mailto:eshel@almaden.ibm.com] > Sent: Tuesday, June 21, 2005 6:27 PM > To: Noveck, Dave > Cc: Dean Hildebrand; Goodson, Garth; nfsv4@ietf.org; > nfsv4-bounces@ietf.org > Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) > > > "Noveck, Dave" wrote on 06/21/2005 01:51:07 PM: > > > > With fs-locations we give the client a list of metadata server > > > that can act as the server for given filesystem and the client > > > can choose which one to use and switch among them at will or > > > because of some failure. > > > > The problem is that in v4.0 as it stands there is an enormous > > range of ways that a client can interpret the fs_locations list: > > > > "Here is a list of servers and you can switch when there is failure, > > after refetching your change attibutes (because change attribute > > is within the purview of a particular server)." (and that is not the > > most extreme case of discontinuity -- some people are thinking that > > filehandles and fileids will just change or that you may wind up > > with a slighly out-of-data version of the data). > > > > "Here is a list of servers and you can switch when there is a > > failure with no discontinuity of access (changes in fh's, stateid's, > > fileids, change attributes), although changing is a big deal > > and you shouldn't do it without good cause." > > > > "Here is a list of servers and you can switch when there is a > > failure with no discontinuity of access or even at will since > > there is no big cost to switch." > > > > "Here is a list of servers and you can access any of these > > servers as you will at the same time (multi-pathing or a > > cluster fs) since they are all effectively the same thing". > > > > The problem is that the client only knows about the list and > > has no way of knowing which of the statements above is associated > > with the list of servers he is getting. I have been thinking > > about a locations_info attribute for 4.1 that would allow the > > server to tell the client which of those he meant and also > > give preference information (local vs. remote copies, absolutely > > up-to-data vs. slightly out-of-data copies). > > > > > is solely within the purview of a particular server > > > > > I agree that once you got a layout for a file you should follow > > > the layout instruction and read/write only from the specified > > > nodes and get an error if you don't. > > > > Then it sounds like we are in violent agreement, except maybe for > > choice of modal auxiliaries or capitalization. > > > > You say "you should follow the layout instruction" and I'm torn > > between saying "you SHOULD follow the layout instruction" and > > "you MUST follow the layout instruction". > > > > You say "[should] get an error if you don't" and I say "the server > > SHOULD give you an error if you don't". > > > > I say should follow and not MUST follow because I am trying to avoid the > complication to the server if it MUST enforce this rule which might not be > a problem for the server in the first place. For example, I might want to > allow the read of the same large file(no caching) from one set of data > server for client A and a different set of data server for client B. Now > to enforce the above rule the server need to some how encode into the file > handle information about which client can read what from which data > server? 
I prefer not have this added extra work on the server and just say > that the client should follow the rule to guaranty successful operation. > Marc. > _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Wed Jun 22 13:13:17 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dl8mr-0007n7-E7; Wed, 22 Jun 2005 13:13:17 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dl8mq-0007mz-Kv; Wed, 22 Jun 2005 13:13:16 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id NAA03650; Wed, 22 Jun 2005 13:13:12 -0400 (EDT) Received: from e3.ny.us.ibm.com ([32.97.182.143]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1Dl9B0-0007mA-0i; Wed, 22 Jun 2005 13:38:14 -0400 Received: from d01relay02.pok.ibm.com (d01relay02.pok.ibm.com [9.56.227.234]) by e3.ny.us.ibm.com (8.12.11/8.12.11) with ESMTP id j5MHCuS8032608; Wed, 22 Jun 2005 13:12:56 -0400 Received: from d01av04.pok.ibm.com (d01av04.pok.ibm.com [9.56.224.64]) by d01relay02.pok.ibm.com (8.12.10/NCO/VERS6.7) with ESMTP id j5MHCuZJ261810; Wed, 22 Jun 2005 13:12:56 -0400 Received: from d01av04.pok.ibm.com (loopback [127.0.0.1]) by d01av04.pok.ibm.com (8.12.11/8.13.3) with ESMTP id j5MHCk4b019872; Wed, 22 Jun 2005 13:12:46 -0400 Received: from [9.56.227.90] (d01ml604.pok.ibm.com [9.56.227.90]) by d01av04.pok.ibm.com (8.12.11/8.12.11) with ESMTP id j5MHCk6s019736; Wed, 22 Jun 2005 13:12:46 -0400 In-Reply-To: To: "Noveck, Dave" Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) MIME-Version: 1.0 X-Mailer: Lotus Notes Build V70_M4_01112005 Beta 3NP January 11, 2005 Message-ID: From: Marc Eshel Date: Wed, 22 Jun 2005 10:12:26 -0700 X-MIMETrack: Serialize by Router on D01ML604/01/M/IBM(Build V70_06092005|June 09, 2005) at 06/22/2005 13:12:46, Serialize complete at 06/22/2005 13:12:46 Content-Type: text/plain; charset="US-ASCII" X-Spam-Score: 1.3 (+) X-Scan-Signature: 6e922792024732fb1bb6f346e63517e4 Cc: "Goodson, Garth" , nfsv4@ietf.org, nfsv4-bounces@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org "Noveck, Dave" wrote on 06/22/2005 06:50:45 AM: > The full response this time. > > > > You say "you should follow the layout instruction" and I'm torn > > > between saying "you SHOULD follow the layout instruction" and > > > "you MUST follow the layout instruction". > > > > > > You say "[should] get an error if you don't" and I say "the server > > > SHOULD give you an error if you don't". > > > > > I say should follow and not MUST follow because I am trying to avoid the > > complication to the server if it MUST enforce this rule which might not be > > a problem for the server in the first place. > > Hold on. I never suggested "MUST" for the server's obligation to check. > It seems that the server checking is where you see difficulties/inconvenience. > > I did suggest "MUST" as a possiblility for the clients' obligation to > conform. These two do *not* have to go in tandem. 
While it wouldn't make > any sense to have the client not have to conform while the server is giving > him an error if he doesn't, it is perfectly reasonable for the spec to > strongly state the rule for the client but not to insist that the server > check for compliance if it has great difficulties doing so. > > > For example, I might want to > > allow the read of the same large file(no caching) from one set of data > > server for client A and a different set of data server for client B. Now > > to enforce the above rule the server need to some how encode into the file > > handle information about which client can read what from which data > > server? > > I may not be uderstanding your example correctly but it sounds like the > case you are worried about is not really at issue here. The example was just to illustrate the information (capabilities) that you would have to encode in the file handle if we wanted the server verify that the client follow the rules. > > I know we have been talking kind of loosely about should/SHOULD/MUST > "follow the layout instruction". This is overbroad. If a server is > told to use server111.clustersRus.org and takes that same handles and > uses it on some other server, server111.clustersRus.com for example, > then he is not following the layout instruction, but the spec is not > going to require anybody to specifically act to make sure that he > gets an error. The effect of using a filehandle on a server other > than the one it for had always been undefined, and I expect it will > continue to be. Even though your data servers above are going to > be in more confederal relationship than the two server111's, I > think the same will still hold. If I take a handle for X and use it > on Y, I have a real good chance of getting STALE but there is no > guarantee that I will. > > The specific issue that started this (and that I'm still talking > about) is more limited. I'm given a handle H for a server A in a > layout and in that is the requirement that that handle be valid > for READ/WRITE, etc. and not for SETATTR. If the client uses it > on A and does a SETATTR, he SHOULD get an error. If he uses that > same handle on B and does a READ then he is broken but the server > has no obligation to recognize handles for other servers. Similarly > if he does a SETATTR with the handle on B. > > > I prefer not have this added extra work on the server and just say > > that the client should follow the rule to guaranty successful operation. > > In the IETF "should" is very weak and amounts to "gee, it is sort of a > good idea to". "SHOULD" is much stronger and says "Do it unless you have > a real good reason not to". "MUST" just says to do it. > > I guess I still think that if you receive a handle for server A in a > layout, you MUST NOT use it to do operations on that server other than > PUTFH, COMMIT, READ, WRITE, and that if you do, the server SHOULD give > you an error. > > If you feel this is too difficult for the server, then the "SHOULD" > would give you enough wiggle-room. > I can see that I can get a way with not checking on the server but I would like to implement a server that is closer to the spec recommendation and understand the requirements. The "SHOULD" makes me a little uncomfortable I would prefer "should", but if it is only me I will shut-up. I would rather keep using file handles as file ids which can be used on any node of a cluster file system and not have to encode capabilities into them. 
Now the client needs to remember all those different file handles that should be used for different operations and the server verifying that the client did so. Marc. _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Wed Jun 22 14:58:28 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DlAQe-00089D-8M; Wed, 22 Jun 2005 14:58:28 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DlAQc-000895-IC; Wed, 22 Jun 2005 14:58:26 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id OAA17221; Wed, 22 Jun 2005 14:58:24 -0400 (EDT) Received: from mx1.netapp.com ([216.240.18.38]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1DlAol-0006Kz-D2; Wed, 22 Jun 2005 15:23:26 -0400 Received: from smtp1.corp.netapp.com (10.57.156.124) by mx1.netapp.com with ESMTP; 22 Jun 2005 11:58:15 -0700 X-IronPort-AV: i="3.93,221,1115017200"; d="scan'208"; a="202012379:sNHT24311260" Received: from svlexc02.hq.netapp.com (svlexc02.corp.netapp.com [10.57.157.136]) by smtp1.corp.netapp.com (8.13.1/8.13.1/NTAP-1.6) with ESMTP id j5MIwEBT007542; Wed, 22 Jun 2005 11:58:14 -0700 (PDT) Received: from lavender.hq.netapp.com ([10.56.11.75]) by svlexc02.hq.netapp.com with Microsoft SMTPSVC(5.0.2195.6713); Wed, 22 Jun 2005 11:58:14 -0700 Received: from exnane01.hq.netapp.com ([10.97.0.61]) by lavender.hq.netapp.com with Microsoft SMTPSVC(5.0.2195.6713); Wed, 22 Jun 2005 11:58:14 -0700 X-MimeOLE: Produced By Microsoft Exchange V6.5.7226.0 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) Date: Wed, 22 Jun 2005 14:58:12 -0400 Message-ID: Thread-Topic: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) Thread-Index: AcV3TaPPqL1sKFj4T26HyZXxZx7S4gACeXwg From: "Noveck, Dave" To: "Marc Eshel" X-OriginalArrivalTime: 22 Jun 2005 18:58:14.0121 (UTC) FILETIME=[57A75D90:01C5775C] X-Spam-Score: 1.3 (+) X-Scan-Signature: 43317e64100dd4d87214c51822b582d1 Content-Transfer-Encoding: quoted-printable Cc: "Goodson, Garth" , nfsv4@ietf.org, nfsv4-bounces@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org > I can see that I can get a way with not checking on the server but I would=20 > like to implement a server that is closer to the spec recommendation and=20 > understand the requirements. The "SHOULD" makes me a little uncomfortable=20 > I would prefer "should", but if it is only me I will shut-up. It sounds like you'd prefer "if you feel like it" and that makes me uncomfortable. > I would rather keep using file handles as file ids which can be used on=20 > any node of a cluster file system and not have to encode capabilities into=20 > them. Now the client needs to remember all those different file handles=20 > that should be used for different operations Whoa! Before you said you were OK with the client obligation (to only do READ/WRITE/etc on filehandles it got from layouts) and only objected to the work of the server verifying compliance. 
Now it appears that you object to forcing the client to obey that rule, i.e. that the problem with the verification is not that it is hard to=20 do but that it would make the client remember "all those different file handles that should be used for different operations". If that is the case then we have a real problem. You have an implementation in which every server may act as a metadata server but the pnfs client cannot=20 assume that all of the implementations with which it will interact will have that characteristic or else we have a massive (lack-of)- interoperability problem. If a layout tells the client he may use handle A on server X to READ/WRITE then he had to be capable of=20 respecting that, whether the server holds him to it or not. I'm perfectly OK with exposing additional functionality that a=20 cluster fs would provide for metadata load-balancing and failover as long as we are clear that this is something that the client is directed to use based on server characteristics. For example, if the devinfo entry says that the layout handle may be used to read/ write on a certain set of guaranteed-equivalent servers, then this=20 is fine. Or if a locations_info attribute for the fs indicated that coherent metadata service was available on a given set of servers, then this is OK as well. But each of these options is an option=20 and the basic architecture of pnfs is that there is a distinction between data service and meta-data service and that the client=20 has to maintain that distinction. Just as a pnfs client should=20 not use a block address in a SETATTR request or send a filehandle=20 in a SCSI block write :-), it should not send a handle it got from=20 a layout in a SETATTR request. It should not send a filehandle it=20 got from the meta-data server to a data server unless it has some=20 specific guidance that it can, such as a locations_info attribute=20 saying servers X, Y, Z are equivalent. The important point is=20 that that latter is not always going to be there and the client may not assume that it is. =20 > and the server verifying that=20 > the client did so. The verification is a big help when testing. This is going to be more complicated that what we've done in the past and the=20 earlier we detect a problem the better off we are all going=20 to be. I wouldn't think of trying to make this work without=20 that kind of checking, particular given all the possible types=20 of implementations we have been talking about here. All this=20 requires is one bit in a file handle saying whether it gives the=20 right to do all operations (including metadata-server operations)=20 or just the subset for data server operations. If you are inclined=20 not to do this, my question would be, "Do you feel lucky?". -----Original Message----- From: Marc Eshel [mailto:eshel@almaden.ibm.com]=20 Sent: Wednesday, June 22, 2005 1:12 PM To: Noveck, Dave Cc: Goodson, Garth; nfsv4@ietf.org; nfsv4-bounces@ietf.org Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) "Noveck, Dave" wrote on 06/22/2005 06:50:45 AM: > The full response this time. >=20 > > > You say "you should follow the layout instruction" and I'm torn > > > between saying "you SHOULD follow the layout instruction" and > > > "you MUST follow the layout instruction". > > >=20 > > > You say "[should] get an error if you don't" and I say "the server > > > SHOULD give you an error if you don't". 
> > > > > I say should follow and not MUST follow because I am trying to avoid the > > complication to the server if it MUST enforce this rule, which might not be > > a problem for the server in the first place. > > Hold on. I never suggested "MUST" for the server's obligation to check. > It seems that the server checking is where you see difficulties/inconvenience. > > I did suggest "MUST" as a possibility for the clients' obligation to > conform. These two do *not* have to go in tandem. While it wouldn't make > any sense to have the client not have to conform while the server is giving > him an error if he doesn't, it is perfectly reasonable for the spec to > strongly state the rule for the client but not to insist that the server > check for compliance if it has great difficulties doing so. > > > For example, I might want to > > allow the read of the same large file (no caching) from one set of data > > servers for client A and a different set of data servers for client B. Now > > to enforce the above rule the server needs to somehow encode into the file > > handle information about which client can read what from which data > > server? > > I may not be understanding your example correctly but it sounds like the > case you are worried about is not really at issue here. The example was just to illustrate the information (capabilities) that you would have to encode in the file handle if we wanted the server to verify that the client follows the rules. > > I know we have been talking kind of loosely about should/SHOULD/MUST > "follow the layout instruction". This is overbroad. If a server is > told to use server111.clustersRus.org and takes that same handle and > uses it on some other server, server111.clustersRus.com for example, > then he is not following the layout instruction, but the spec is not > going to require anybody to specifically act to make sure that he > gets an error. The effect of using a filehandle on a server other > than the one it was issued for has always been undefined, and I expect it will > continue to be. Even though your data servers above are going to > be in a more confederal relationship than the two server111's, I > think the same will still hold. If I take a handle for X and use it > on Y, I have a real good chance of getting STALE but there is no > guarantee that I will. > > The specific issue that started this (and that I'm still talking > about) is more limited. I'm given a handle H for a server A in a > layout and in that is the requirement that that handle be valid > for READ/WRITE, etc. and not for SETATTR. If the client uses it > on A and does a SETATTR, he SHOULD get an error. If he uses that > same handle on B and does a READ then he is broken but the server > has no obligation to recognize handles for other servers. Similarly > if he does a SETATTR with the handle on B. > > > I prefer not to have this added extra work on the server and just say > > that the client should follow the rule to guarantee successful operation. > > In the IETF "should" is very weak and amounts to "gee, it is sort of a > good idea to". "SHOULD" is much stronger and says "Do it unless you have > a real good reason not to".
"MUST" just says to do it.=20 >=20 > I guess I still think that if you receive a handle for server A in a=20 > layout, you MUST NOT use it to do operations on that server other than > PUTFH, COMMIT, READ, WRITE, and that if you do, the server SHOULD give > you an error. >=20 > If you feel this is too difficult for the server, then the "SHOULD"=20 > would give you enough wiggle-room. >=20 I can see that I can get a way with not checking on the server but I would=20 like to implement a server that is closer to the spec recommendation and understand the requirements. The "SHOULD" makes me a little uncomfortable=20 I would prefer "should", but if it is only me I will shut-up. I would rather keep using file handles as file ids which can be used on=20 any node of a cluster file system and not have to encode capabilities into=20 them. Now the client needs to remember all those different file handles=20 that should be used for different operations and the server verifying that=20 the client did so. Marc.=20 _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Wed Jun 22 16:54:09 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DlCEb-0002Gt-Hy; Wed, 22 Jun 2005 16:54:09 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DlCEX-0002FH-7n; Wed, 22 Jun 2005 16:54:05 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id QAA04409; Wed, 22 Jun 2005 16:54:02 -0400 (EDT) Received: from e2.ny.us.ibm.com ([32.97.182.142]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1DlCcj-0001jX-BT; Wed, 22 Jun 2005 17:19:05 -0400 Received: from d01relay02.pok.ibm.com (d01relay02.pok.ibm.com [9.56.227.234]) by e2.ny.us.ibm.com (8.12.11/8.12.11) with ESMTP id j5MKrtZp003553; Wed, 22 Jun 2005 16:53:55 -0400 Received: from d01av04.pok.ibm.com (d01av04.pok.ibm.com [9.56.224.64]) by d01relay02.pok.ibm.com (8.12.10/NCO/VERS6.7) with ESMTP id j5MKrtZJ224388; Wed, 22 Jun 2005 16:53:55 -0400 Received: from d01av04.pok.ibm.com (loopback [127.0.0.1]) by d01av04.pok.ibm.com (8.12.11/8.13.3) with ESMTP id j5MKrtFk022236; Wed, 22 Jun 2005 16:53:55 -0400 Received: from [9.56.227.90] (d01ml604.pok.ibm.com [9.56.227.90]) by d01av04.pok.ibm.com (8.12.11/8.12.11) with ESMTP id j5MKrtEM022233; Wed, 22 Jun 2005 16:53:55 -0400 In-Reply-To: To: "Noveck, Dave" Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) MIME-Version: 1.0 X-Mailer: Lotus Notes Build V70_M4_01112005 Beta 3NP January 11, 2005 Message-ID: From: Marc Eshel Date: Wed, 22 Jun 2005 13:53:44 -0700 X-MIMETrack: Serialize by Router on D01ML604/01/M/IBM(Build V70_06092005|June 09, 2005) at 06/22/2005 16:53:55, Serialize complete at 06/22/2005 16:53:55 Content-Type: text/plain; charset="US-ASCII" X-Spam-Score: 0.0 (/) X-Scan-Signature: 41c17b4b16d1eedaa8395c26e9a251c4 Cc: "Goodson, Garth" , nfsv4@ietf.org, nfsv4-bounces@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org "Noveck, Dave" on 06/22/2005 11:58:12 AM: > Whoa! 
Before you said you were OK with the client obligation (to only > do > READ/WRITE/etc on filehandles it got from layouts) and only objected > to the work of the server verifying compliance. > > Now it appears that you object to forcing the client to obey that rule, > i.e. that the problem with the verification is not that it is hard to > do but that it would make the client remember "all those different file > handles that should be used for different operations". If that is the > case then we have a real problem. You have an implementation in which > every server may act as a metadata server but the pnfs client cannot > assume that all of the implementations with which it will interact > will have that characteristic or else we have a massive (lack-of)- > interoperability problem. If a layout tells the client he may use > handle A on server X to READ/WRITE then he had to be capable of > respecting that, whether the server holds him to it or not. I don't object. I just voiced a concern about the implementation overhead. It is obvious that I am thinking of cluster filesystem only and if there is a need for other implementation to require that the client use only the file handles provided for each specific operation then fine. > I'm perfectly OK with exposing additional functionality that a > cluster fs would provide for metadata load-balancing and failover > as long as we are clear that this is something that the client is > directed to use based on server characteristics. For example, if > the devinfo entry says that the layout handle may be used to read/ > write on a certain set of guaranteed-equivalent servers, then this > is fine. Or if a locations_info attribute for the fs indicated that > coherent metadata service was available on a given set of servers, > then this is OK as well. But each of these options is an option > and the basic architecture of pnfs is that there is a distinction > between data service and meta-data service and that the client > has to maintain that distinction. Just as a pnfs client should > not use a block address in a SETATTR request or send a filehandle > in a SCSI block write :-), it should not send a handle it got from > a layout in a SETATTR request. It should not send a filehandle it > got from the meta-data server to a data server unless it has some > specific guidance that it can, such as a locations_info attribute > saying servers X, Y, Z are equivalent. The important point is > that that latter is not always going to be there and the client > may not assume that it is. This sound like a good compromise I would like to see the above options in the protocol. > > and the server verifying that > > the client did so. > > The verification is a big help when testing. This is going to > be more complicated that what we've done in the past and the > earlier we detect a problem the better off we are all going > to be. I wouldn't think of trying to make this work without > that kind of checking, particular given all the possible types > of implementations we have been talking about here. All this > requires is one bit in a file handle saying whether it gives the > right to do all operations (including metadata-server operations) > or just the subset for data server operations. If you are inclined > not to do this, my question would be, "Do you feel lucky?". > Like you say it is only one bit and it is not to difficult to implement on the server, but now you force the client to remember 2 file handles that a different only by one bit (big waste of space :). 
The client need to remember only which are metadata servers and which are data servers and be required to direct operation to the appropriate server. If the client made a mistake and the server cares than the server can reject the operation. Marc. _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Wed Jun 22 21:51:22 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DlGsE-0005W1-7P; Wed, 22 Jun 2005 21:51:22 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DlGsC-0005Vt-6v; Wed, 22 Jun 2005 21:51:20 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id VAA25753; Wed, 22 Jun 2005 21:51:15 -0400 (EDT) Received: from brmea-mail-4.sun.com ([192.18.98.36]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1DlHGM-0006OL-Lt; Wed, 22 Jun 2005 22:16:21 -0400 Received: from sfbaymail1sca.SFBay.Sun.COM ([129.145.154.35]) by brmea-mail-4.sun.com (8.12.10/8.12.9) with ESMTP id j5N1pCqg020981; Wed, 22 Jun 2005 19:51:12 -0600 (MDT) Received: from sheplap.Central.Sun.COM (sheplap.Central.Sun.COM [10.1.194.251]) by sfbaymail1sca.SFBay.Sun.COM (8.12.10+Sun/8.12.10/ENSMAIL,v2.2) with ESMTP id j5N1pBPu013826; Wed, 22 Jun 2005 18:51:11 -0700 (PDT) Received: by sheplap.Central.Sun.COM (Postfix, from userid 76367) id EA1F44065BF; Wed, 22 Jun 2005 20:51:50 -0500 (CDT) Date: Wed, 22 Jun 2005 20:51:50 -0500 From: Spencer Shepler To: Marc Eshel Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) Message-ID: <20050623015150.GV5698@sheplap.Central.Sun.COM> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.1i X-Spam-Score: 0.0 (/) X-Scan-Signature: 8b30eb7682a596edff707698f4a80f7d Cc: "Goodson, Garth" , nfsv4-bounces@ietf.org, "Noveck, Dave" , nfsv4@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: spencer.shepler@sun.com List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org On Wed, Marc Eshel wrote: <...> > > > and the server verifying that > > > the client did so. > > > > The verification is a big help when testing. This is going to > > be more complicated that what we've done in the past and the > > earlier we detect a problem the better off we are all going > > to be. I wouldn't think of trying to make this work without > > that kind of checking, particular given all the possible types > > of implementations we have been talking about here. All this > > requires is one bit in a file handle saying whether it gives the > > right to do all operations (including metadata-server operations) > > or just the subset for data server operations. If you are inclined > > not to do this, my question would be, "Do you feel lucky?". > > > Like you say it is only one bit and it is not to difficult to implement on > the server, but now you force the client to remember 2 file handles that a > different only by one bit (big waste of space :). The client need to > remember only which are metadata servers and which are data servers and be > required to direct operation to the appropriate server. If the client made > a mistake and the server cares than the server can reject the operation. > Marc. 
This is a general comment (and one I have made before) based on the expectations of what the server will "look" like for a pNFS extension. It would be prudent identify the various operational, deployment or implementation models that people either have planned for the pNFS functionality or can reasonably imagine. It will be important so that we have this in mind when reviewing the general protocol for the subtle interactions of meta-data and data servers as has been identified in this thread of discussion. Oh yeah, the above is with my working group co-chair hat on. Spencer _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Thu Jun 23 01:12:12 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DlK0a-0001P2-2I; Thu, 23 Jun 2005 01:12:12 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DlK0V-0001Lv-LG; Thu, 23 Jun 2005 01:12:07 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id BAA11312; Thu, 23 Jun 2005 01:12:07 -0400 (EDT) Received: from e5.ny.us.ibm.com ([32.97.182.145]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1DlKOj-0006hN-PY; Thu, 23 Jun 2005 01:37:12 -0400 Received: from d01relay02.pok.ibm.com (d01relay02.pok.ibm.com [9.56.227.234]) by e5.ny.us.ibm.com (8.12.11/8.12.11) with ESMTP id j5N5Bf4e013661; Thu, 23 Jun 2005 01:11:41 -0400 Received: from d01av03.pok.ibm.com (d01av03.pok.ibm.com [9.56.224.217]) by d01relay02.pok.ibm.com (8.12.10/NCO/VERS6.7) with ESMTP id j5N5BfZJ226824; Thu, 23 Jun 2005 01:11:41 -0400 Received: from d01av03.pok.ibm.com (loopback [127.0.0.1]) by d01av03.pok.ibm.com (8.12.11/8.13.3) with ESMTP id j5N5BVKu004552; Thu, 23 Jun 2005 01:11:31 -0400 Received: from [9.56.227.90] (d01ml604.pok.ibm.com [9.56.227.90]) by d01av03.pok.ibm.com (8.12.11/8.12.11) with ESMTP id j5N5BVum004222; Thu, 23 Jun 2005 01:11:31 -0400 In-Reply-To: <20050623015150.GV5698@sheplap.Central.Sun.COM> To: spencer.shepler@sun.com Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) MIME-Version: 1.0 X-Mailer: Lotus Notes Build V70_M4_01112005 Beta 3NP January 11, 2005 Message-ID: From: Marc Eshel Date: Wed, 22 Jun 2005 22:11:09 -0700 X-MIMETrack: Serialize by Router on D01ML604/01/M/IBM(Build V70_06092005|June 09, 2005) at 06/23/2005 01:11:30, Serialize complete at 06/23/2005 01:11:30 Content-Type: text/plain; charset="US-ASCII" X-Spam-Score: 0.0 (/) X-Scan-Signature: 8b30eb7682a596edff707698f4a80f7d Cc: "Goodson, Garth" , nfsv4-bounces@ietf.org, "Noveck, Dave" , nfsv4@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org Spencer Shepler wrote on 06/22/2005 06:51:50 PM: > It would be prudent identify the various operational, deployment or > implementation models that people either have planned for the pNFS > functionality or can reasonably imagine. It will be important so that > we have this in mind when reviewing the general protocol for the > subtle interactions of meta-data and data servers as has been > identified in this thread of discussion. > > Oh yeah, the above is with my working group co-chair hat on. 
> I just started to think about this topic lately so I don't have a clear model so will just dump what I think that I can or would like to do in short (I am always very terse but I will try to elaborate:). Give a cluster filesystem where all the data is available on all the nodes I would like to use pNFS to do parallel I/O from as many nodes as possible so I would make all nodes to be data servers. I believe the metadata operation can saturate a single node even if it not doing any data I/O so I would like all the nodes to also be metadata server, in other words distribute all the operations to all the nodes. Or, direct all clients to a specific node for a file or a files segment because it is cached on that node or that node has faster access to the disks. Have multiple alternate nodes or maybe all the nodes for any given I/O in the case of an error. Return the data from the metadata server for small files and avoid all the layout exchange and redirection. Have short way to reference a list of nodes that can be in the hundreds that can be given once and not repeated in every layout. Not have to many requirement to validate correct client behavior which requires a lot of book keeping on the server side if the only thing it affected is performance (in the case of cluster filesystem) after all if the client requested all the data from the metadata server it is valid option and no one will produce any error codes. I am not sure if this is much help but I plan to spend much more time on the topic when I get back from vacation in 3 weeks and provide more input. I think that Dave Noveck suggested in his last note to add some options that will help with cluster filesystem implementations and I think this is a good idea :) maybe we need some more input from other cluster filesystem planed or even prototyped implementations. Marc. _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Thu Jun 23 18:26:21 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dla9M-0008CL-VW; Thu, 23 Jun 2005 18:26:20 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dla9K-0008CD-GM for nfsv4@megatron.ietf.org; Thu, 23 Jun 2005 18:26:18 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id SAA09444 for ; Thu, 23 Jun 2005 18:26:16 -0400 (EDT) Received: from gw-w.panasas.com ([63.80.58.206] helo=medlicott.panasas.com) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1DlaXj-0000yn-Ki for nfsv4@ietf.org; Thu, 23 Jun 2005 18:51:32 -0400 Received: from panasas.com (welch@localhost) by medlicott.panasas.com (8.11.6/8.11.6) with ESMTP id j5NMPqH30039; Thu, 23 Jun 2005 15:25:52 -0700 Message-Id: <200506232225.j5NMPqH30039@medlicott.panasas.com> X-Authentication-Warning: medlicott.panasas.com: welch owned process doing -bs X-Mailer: exmh version 2.7.3 (cvs) 04/15/2005 with nmh-1.0.4 To: Marc Eshel Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) In-reply-to: References: Comments: In-reply-to Marc Eshel message dated "Wed, 22 Jun 2005 13:53:44 -0700." 
From: Brent Welch X-URL: http://www.panasas.com/ X-Face: "HxE|?EnC9fVMV8f70H83&{fgLE.|FZ^$>@Q(yb#N,Eh~N]e&]=> r5~UnRml1:4EglY{9B+ :'wJq$@c_C!l8@<$t,{YUr4K,QJGHSvS~U]H`<+L*x?eGzSk>XH\W:AK\j?@?c1o, "Noveck, Dave" , nfsv4@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org >>>Marc Eshel said: > > "Noveck, Dave" on 06/22/2005 11:58:12 AM: > > > Whoa! Before you said you were OK with the client obligation (to only > > do > > READ/WRITE/etc on filehandles it got from layouts) and only objected > > to the work of the server verifying compliance. > > > > Now it appears that you object to forcing the client to obey that rule, > > i.e. that the problem with the verification is not that it is hard to > > do but that it would make the client remember "all those different file > > handles that should be used for different operations". If that is the > > case then we have a real problem. You have an implementation in which > > every server may act as a metadata server but the pnfs client cannot > > assume that all of the implementations with which it will interact > > will have that characteristic or else we have a massive (lack-of)- > > interoperability problem. If a layout tells the client he may use > > handle A on server X to READ/WRITE then he had to be capable of > > respecting that, whether the server holds him to it or not. > > I don't object. I just voiced a concern about the implementation overhead. > It is obvious that I am thinking of cluster filesystem only and if there > is a need for other implementation to require that the client use only the > file handles provided for each specific operation then fine. > > > I'm perfectly OK with exposing additional functionality that a > > cluster fs would provide for metadata load-balancing and failover > > as long as we are clear that this is something that the client is > > directed to use based on server characteristics. For example, if > > the devinfo entry says that the layout handle may be used to read/ > > write on a certain set of guaranteed-equivalent servers, then this > > is fine. Or if a locations_info attribute for the fs indicated that > > coherent metadata service was available on a given set of servers, > > then this is OK as well. But each of these options is an option > > and the basic architecture of pnfs is that there is a distinction > > between data service and meta-data service and that the client > > has to maintain that distinction. Just as a pnfs client should > > not use a block address in a SETATTR request or send a filehandle > > in a SCSI block write :-), it should not send a handle it got from > > a layout in a SETATTR request. It should not send a filehandle it > > got from the meta-data server to a data server unless it has some > > specific guidance that it can, such as a locations_info attribute > > saying servers X, Y, Z are equivalent. The important point is > > that that latter is not always going to be there and the client > > may not assume that it is. > > This sound like a good compromise I would like to see the above options in > the protocol. I'd like to suggest that we mention the issues about multiple metadata servers, but that we don't explicitly address them in the current pNFS proposals. The goal is to get pNFS clients that interoperate with different servers. 
If some servers have very different semantics (transparent failover among them, servicing of metadata or data operations with internal forwarding, whatever) then that has a big impact on the clients. In otherwords, we are starting small with just an effort to distribute the I/O load. Bypassing the metadata server for I/O goes a long way to reducing load and providing scalability. Let's get that worked out before we do metadata load balancing. If you really, really, wanted to go there, then you could define a new layout type that returned, e.g., a set of equivalent (deviceID, filehandle) that the client could use based on the availability or load of the data server. You might also be tempted (as Dean is) to return layouts that hint to the client that if it did a GETATTR to a data server it would get back something sensible. However, I don't think we should go there, even though you and I, as cluster file system implementers may have already done that. > > > and the server verifying that > > > the client did so. > > > > The verification is a big help when testing. This is going to > > be more complicated that what we've done in the past and the > > earlier we detect a problem the better off we are all going > > to be. I wouldn't think of trying to make this work without > > that kind of checking, particular given all the possible types > > of implementations we have been talking about here. All this > > requires is one bit in a file handle saying whether it gives the > > right to do all operations (including metadata-server operations) > > or just the subset for data server operations. If you are inclined > > not to do this, my question would be, "Do you feel lucky?". > > > Like you say it is only one bit and it is not to difficult to implement on > the server, but now you force the client to remember 2 file handles that a > different only by one bit (big waste of space :). The client need to > remember only which are metadata servers and which are data servers and be > required to direct operation to the appropriate server. If the client made > a mistake and the server cares than the server can reject the operation. I have the same reaction as Dave - I don't see how you can argue that the spec should imply that the client can get away with switching around the handles used on the metadata servers and the data servers. If your implementation wants to give out the same bit pattern for these cases, that's fine. But clients simply MUST use the file handles in the layouts for operations on the corresponding device, and it is simply undefined what happens if they use a file handle from the metadata server with a data server or vice versa, or heck, swap around the file handles among the different data servers. "Of course" the clients will keep track of what handles are to be used with what servers and what operations, because they MUST. 
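To make the "one bit in a file handle" idea from this exchange concrete, here is a minimal C sketch, assuming invented structure and operation names (nothing below comes from draft-welch-pnfs-ops-02 or from any real server): the metadata server derives the layout handle from its normal handle by setting a single data-only flag, and each server's check is one bit test before allowing anything beyond PUTFH/COMMIT/READ/WRITE.

/* Hypothetical sketch of the "one capability bit in the filehandle" idea
 * discussed above.  The handle layout and the operation names are invented
 * for illustration only; they are not taken from any draft. */
#include <stdint.h>
#include <stdio.h>

#define FH_FLAG_DATA_ONLY 0x01   /* handle came from a layout: I/O ops only */

struct filehandle {
    uint8_t  flags;              /* the capability bit lives here           */
    uint64_t fileid;             /* cluster-wide file id, same on every node */
};

enum op { OP_PUTFH, OP_READ, OP_WRITE, OP_COMMIT, OP_SETATTR };

/* Metadata server: derive the layout handle from the normal handle by
 * setting one bit; everything else stays identical. */
static struct filehandle make_layout_handle(struct filehandle fh)
{
    fh.flags |= FH_FLAG_DATA_ONLY;
    return fh;
}

/* Server-side check: the verification costs a single bit test per request. */
static int server_allows(const struct filehandle *fh, enum op op)
{
    if (fh->flags & FH_FLAG_DATA_ONLY)
        return op == OP_PUTFH || op == OP_READ ||
               op == OP_WRITE || op == OP_COMMIT;
    return 1;                    /* full handle: all operations permitted */
}

int main(void)
{
    struct filehandle meta = { .flags = 0, .fileid = 42 };
    struct filehandle data = make_layout_handle(meta);

    printf("SETATTR with metadata handle: %s\n",
           server_allows(&meta, OP_SETATTR) ? "ok" : "rejected");
    printf("SETATTR with layout handle:   %s\n",
           server_allows(&data, OP_SETATTR) ? "ok" : "rejected");
    printf("WRITE   with layout handle:   %s\n",
           server_allows(&data, OP_WRITE) ? "ok" : "rejected");
    return 0;
}

On the client side the bookkeeping is the same either way: the handle carried in the layout is remembered per device and file and used only for I/O, which is the obligation described above.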
-- Brent Welch Software Architect, Panasas Inc Accelerating Time to Results(tm) with Clustered Storage www.panasas.com welch@panasas.com _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Thu Jun 23 18:50:59 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DlaXD-0003R3-Dd; Thu, 23 Jun 2005 18:50:59 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1DlaXB-0003Qy-LT for nfsv4@megatron.ietf.org; Thu, 23 Jun 2005 18:50:57 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id SAA11272 for ; Thu, 23 Jun 2005 18:50:55 -0400 (EDT) Received: from gw-w.panasas.com ([63.80.58.206] helo=medlicott.panasas.com) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1DlavZ-00027g-21 for nfsv4@ietf.org; Thu, 23 Jun 2005 19:16:11 -0400 Received: from panasas.com (welch@localhost) by medlicott.panasas.com (8.11.6/8.11.6) with ESMTP id j5NMohJ30146; Thu, 23 Jun 2005 15:50:43 -0700 Message-Id: <200506232250.j5NMohJ30146@medlicott.panasas.com> X-Authentication-Warning: medlicott.panasas.com: welch owned process doing -bs X-Mailer: exmh version 2.7.3 (cvs) 04/15/2005 with nmh-1.0.4 To: Marc Eshel Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) In-reply-to: References: Comments: In-reply-to Marc Eshel message dated "Wed, 22 Jun 2005 22:11:09 -0700." From: Brent Welch X-URL: http://www.panasas.com/ X-Face: "HxE|?EnC9fVMV8f70H83&{fgLE.|FZ^$>@Q(yb#N,Eh~N]e&]=> r5~UnRml1:4EglY{9B+ :'wJq$@c_C!l8@<$t,{YUr4K,QJGHSvS~U]H`<+L*x?eGzSk>XH\W:AK\j?@?c1o, spencer.shepler@sun.com, "Noveck, Dave" , nfsv4@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org >>>Marc Eshel said: > > Spencer Shepler wrote on 06/22/2005 06:51:50 PM: > > > It would be prudent identify the various operational, deployment or > > implementation models that people either have planned for the pNFS > > functionality or can reasonably imagine. It will be important so that > > we have this in mind when reviewing the general protocol for the > > subtle interactions of meta-data and data servers as has been > > identified in this thread of discussion. > > > > Oh yeah, the above is with my working group co-chair hat on. > > > I just started to think about this topic lately so I don't have a clear > model so will just dump what I think that I can or would like to do in > short (I am always very terse but I will try to elaborate:). Give a > cluster filesystem where all the data is available on all the nodes I > would like to use pNFS to do parallel I/O from as many nodes as possible > so I would make all nodes to be data servers. I believe the metadata > operation can saturate a single node even if it not doing any data I/O so > I would like all the nodes to also be metadata server, in other words > distribute all the operations to all the nodes. Or, direct all clients to > a specific node for a file or a files segment because it is cached on that > node or that node has faster access to the disks. Have multiple alternate > nodes or maybe all the nodes for any given I/O in the case of an error. 
> Return the data from the metadata server for small files and avoid all the > layout exchange and redirection. Have short way to reference a list of > nodes that can be in the hundreds that can be given once and not repeated > in every layout. Not have to many requirement to validate correct client > behavior which requires a lot of book keeping on the server side if the > only thing it affected is performance (in the case of cluster filesystem) > after all if the client requested all the data from the metadata server it > is valid option and no one will produce any error codes. > > I am not sure if this is much help but I plan to spend much more time on > the topic when I get back from vacation in 3 weeks and provide more input. > I think that Dave Noveck suggested in his last note to add some options > that will help with cluster filesystem implementations and I think this is > a good idea :) maybe we need some more input from other cluster filesystem > planed or even prototyped implementations. First I'll restate what I think your model is, and then describe another one. Under your cluster file system there is some storage substrate that today is hidden by your "nodes" (e.g., a back-end SAN). And, your nodes cooperate to manage metadata and each exports an identical view. You are thinking that pNFS will be another layer over your nodes, so that the underlying storage system is still fairly hidden. In this model, pNFS will let you fetch data for a single file from many "nodes" in parallel, and so get higher bandwidth (ideally) than a single node can deliver. Also, by artfully distributing the layouts returned to clients, you can smear the I/O load over more nodes and achieve more balanced load among your nodes. pNFS in its current form does not directly address the balancing of metadata operations like GETATTR over your nodes. The only approach I can offer you is that different clients mount different nodes to get a coarse level of metadata load balancing. As an aside, I think the FS_LOCATIONS attribute is similar in spirit to what you want, but you want it on a per-file basis. Today that operation applies to whole file systems for the purposes of migration. Ultimately I think we'll want a FILE_LOCATION attribute (or something) that redirects a client to a different metadata server. But that would be a different extension than pNFS. I think it could be orthogonal. A different model for your cluster file system is to bring the pNFS clients more tightly into your cluster file system by exposing more of the underlying storage layer. If you had a SAN, for example, then you'd be giving out block layouts and letting the clients sit right on the SAN and bypass your nodes altogether to do I/O. The objects world takes this approach. The clients can communicate directly with storage devices, and the storage devices don't really know how the objects being read/written by clients fit into the file system. The clients have to communicate with metadata servers which take the role of building up file system semantics on top of something with a simpler interface. Blocks are really simple, and objects are slightly richer. By shunting all the I/O load directly to the storage devices, the metadata servers don't have all that much work to do. So, I'd characterize this as an "asymmetric" model where data servers own particular pieces of storage, and the metadata servers direct clients to the appropriate location via layouts. 
In contrast, you have a "symmetric" model where any data is available at any storage server. But, ultimately there is a hidden asymmetric model unless you have fully replicated all the data on all nodes in the symmetric system. -- Brent Welch Software Architect, Panasas Inc Accelerating Time to Results(tm) with Clustered Storage www.panasas.com welch@panasas.com _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Thu Jun 23 20:06:09 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dlbhx-0000qk-60; Thu, 23 Jun 2005 20:06:09 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dlbhw-0000qb-6V; Thu, 23 Jun 2005 20:06:08 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id UAA18343; Thu, 23 Jun 2005 20:06:07 -0400 (EDT) Received: from e2.ny.us.ibm.com ([32.97.182.142]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1Dlc6M-0005w7-JF; Thu, 23 Jun 2005 20:31:23 -0400 Received: from d01relay02.pok.ibm.com (d01relay02.pok.ibm.com [9.56.227.234]) by e2.ny.us.ibm.com (8.12.11/8.12.11) with ESMTP id j5O05snQ022406; Thu, 23 Jun 2005 20:05:54 -0400 Received: from d01av03.pok.ibm.com (d01av03.pok.ibm.com [9.56.224.217]) by d01relay02.pok.ibm.com (8.12.10/NCO/VERS6.7) with ESMTP id j5O05siO261134; Thu, 23 Jun 2005 20:05:54 -0400 Received: from d01av03.pok.ibm.com (loopback [127.0.0.1]) by d01av03.pok.ibm.com (8.12.11/8.13.3) with ESMTP id j5O05r11015950; Thu, 23 Jun 2005 20:05:53 -0400 Received: from [9.56.227.90] (d01ml604.pok.ibm.com [9.56.227.90]) by d01av03.pok.ibm.com (8.12.11/8.12.11) with ESMTP id j5O05rNw015947; Thu, 23 Jun 2005 20:05:53 -0400 In-Reply-To: <200506232225.j5NMPqH30039@medlicott.panasas.com> To: Brent Welch Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) MIME-Version: 1.0 X-Mailer: Lotus Notes Build V70_M4_01112005 Beta 3NP January 11, 2005 Message-ID: From: Marc Eshel Date: Thu, 23 Jun 2005 17:05:40 -0700 X-MIMETrack: Serialize by Router on D01ML604/01/M/IBM(Build V70_06092005|June 09, 2005) at 06/23/2005 20:05:53, Serialize complete at 06/23/2005 20:05:53 Content-Type: text/plain; charset="US-ASCII" X-Spam-Score: 0.0 (/) X-Scan-Signature: 3d7f2f6612d734db849efa86ea692407 Cc: "Goodson, Garth" , nfsv4-bounces@ietf.org, "Noveck, Dave" , nfsv4@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org nfsv4-bounces@ietf.org wrote on 06/23/2005 03:25:52 PM: > > >>>Marc Eshel said: > > > > "Noveck, Dave" on 06/22/2005 11:58:12 AM: > > > > > Whoa! Before you said you were OK with the client obligation (to only > > > do > > > READ/WRITE/etc on filehandles it got from layouts) and only objected > > > to the work of the server verifying compliance. > > > > > > Now it appears that you object to forcing the client to obey that rule, > > > i.e. that the problem with the verification is not that it is hard to > > > do but that it would make the client remember "all those different file > > > handles that should be used for different operations". If that is the > > > case then we have a real problem. 
You have an implementation in which > > > every server may act as a metadata server but the pnfs client cannot > > > assume that all of the implementations with which it will interact > > > will have that characteristic or else we have a massive (lack-of)- > > > interoperability problem. If a layout tells the client he may use > > > handle A on server X to READ/WRITE then he had to be capable of > > > respecting that, whether the server holds him to it or not. > > > > I don't object. I just voiced a concern about the implementation > overhead. > > It is obvious that I am thinking of cluster filesystem only and if there > > is a need for other implementation to require that the client use only > the > > file handles provided for each specific operation then fine. > > > > > I'm perfectly OK with exposing additional functionality that a > > > cluster fs would provide for metadata load-balancing and failover > > > as long as we are clear that this is something that the client is > > > directed to use based on server characteristics. For example, if > > > the devinfo entry says that the layout handle may be used to read/ > > > write on a certain set of guaranteed-equivalent servers, then this > > > is fine. Or if a locations_info attribute for the fs indicated that > > > coherent metadata service was available on a given set of servers, > > > then this is OK as well. But each of these options is an option > > > and the basic architecture of pnfs is that there is a distinction > > > between data service and meta-data service and that the client > > > has to maintain that distinction. Just as a pnfs client should > > > not use a block address in a SETATTR request or send a filehandle > > > in a SCSI block write :-), it should not send a handle it got from > > > a layout in a SETATTR request. It should not send a filehandle it > > > got from the meta-data server to a data server unless it has some > > > specific guidance that it can, such as a locations_info attribute > > > saying servers X, Y, Z are equivalent. The important point is > > > that that latter is not always going to be there and the client > > > may not assume that it is. > > > > This sound like a good compromise I would like to see the above options > in > > the protocol. > > I'd like to suggest that we mention the issues about multiple > metadata servers, but that we don't explicitly address them in > the current pNFS proposals. The goal is to get pNFS clients that > interoperate with different servers. If some servers have very > different semantics (transparent failover among them, servicing > of metadata or data operations with internal forwarding, whatever) > then that has a big impact on the clients. In otherwords, we are > starting small with just an effort to distribute the I/O load. > Bypassing the metadata server for I/O goes a long way to reducing > load and providing scalability. Let's get that worked out before > we do metadata load balancing. > > If you really, really, wanted to go there, then you could define > a new layout type that returned, e.g., a set of equivalent > (deviceID, filehandle) that the client could use based on > the availability or load of the data server. You might also > be tempted (as Dean is) to return layouts that hint to the client > that if it did a GETATTR to a data server it would get back > something sensible. However, I don't think we should go there, even > though you and I, as cluster file system implementers may have > already done that. 
> Yes I really really want to go there because there are few different cluster filesystems out there today with clusters of hounders and thousands of nodes and they can really really benefit from the p in pNFS. it is not some future requirement and I really don't want to wait for the next version of this protocol. I don't want to give hundred identical file handles, I want a way to give one file handles and tell the client to use it on a list of data servers that I can give once and reference over and over. I would also use Dean's hint for GETATTR. > > > > and the server verifying that > > > > the client did so. > > > > > > The verification is a big help when testing. This is going to > > > be more complicated that what we've done in the past and the > > > earlier we detect a problem the better off we are all going > > > to be. I wouldn't think of trying to make this work without > > > that kind of checking, particular given all the possible types > > > of implementations we have been talking about here. All this > > > requires is one bit in a file handle saying whether it gives the > > > right to do all operations (including metadata-server operations) > > > or just the subset for data server operations. If you are inclined > > > not to do this, my question would be, "Do you feel lucky?". > > > > > Like you say it is only one bit and it is not to difficult to implement > on > > the server, but now you force the client to remember 2 file handles that > a > > different only by one bit (big waste of space :). The client need to > > remember only which are metadata servers and which are data servers and > be > > required to direct operation to the appropriate server. If the client > made > > a mistake and the server cares than the server can reject the operation. > > I have the same reaction as Dave - I don't see how you can argue that > the spec should imply that the client can get away with switching > around the handles used on the metadata servers and the data servers. > If your implementation wants to give out the same bit pattern for > these cases, that's fine. But clients simply MUST use the file handles > in the layouts for operations on the corresponding device, and it > is simply undefined what happens if they use a file handle from the > metadata server with a data server or vice versa, or heck, swap around > the file handles among the different data servers. "Of course" the > clients will keep track of what handles are to be used with what > servers and what operations, because they MUST. > At first I just suggest that server will not have to verify the usage of the right file handle, not that the client swap around file handles (Dave said that I don't really have to If I don't want), but like you suggested I can use equivalent file handles to avoid the problem, the client have nothing to swap or get confused with. Now I suggest like in the above comment that the client get only one file handles so there is no possibility for confusion and we can save a lot of space. 
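As a rough illustration of what Marc is asking for, one file handle plus a node list that is handed out once and then only referenced, here is a small C sketch; the structures and field names are invented for this example and are not part of the current draft.

/* Illustrative only: one way a layout could name hundreds of data servers
 * without repeating them.  The types below are invented for this sketch. */
#include <stdint.h>
#include <stdio.h>

#define MAX_NODES 512

struct device_list {                 /* sent to the client once, cached by id */
    uint32_t id;
    uint32_t count;
    const char *addr[MAX_NODES];     /* network addresses of the data servers */
};

struct file_layout {                 /* small: one handle, one reference      */
    uint64_t filehandle;             /* same handle is valid on every node    */
    uint32_t device_list_id;         /* refers to a cached struct device_list */
    uint32_t stripe_size;            /* bytes per stripe unit                 */
};

/* Client side: pick the data server for a given file offset by striding
 * round-robin across the cached device list. */
static const char *server_for_offset(const struct file_layout *lo,
                                     const struct device_list *dl,
                                     uint64_t offset)
{
    uint64_t stripe = offset / lo->stripe_size;
    return dl->addr[stripe % dl->count];
}

int main(void)
{
    struct device_list dl = { .id = 7, .count = 3,
                              .addr = { "node0", "node1", "node2" } };
    struct file_layout lo = { .filehandle = 42, .device_list_id = 7,
                              .stripe_size = 1 << 20 };

    for (uint64_t off = 0; off < (uint64_t)4 << 20; off += 1 << 20)
        printf("offset %8llu -> %s\n",
               (unsigned long long)off, server_for_offset(&lo, &dl, off));
    return 0;
}

Because the same data is reachable on every node in Marc's symmetric model, a client that gets an error from one entry could retry the same handle against another entry in the list, which is the failover behavior he sketched earlier in the thread.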
> -- > Brent Welch > Software Architect, Panasas Inc > Accelerating Time to Results(tm) with Clustered Storage > _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Thu Jun 23 20:24:58 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dlc0A-0004sN-Px; Thu, 23 Jun 2005 20:24:58 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dlc07-0004s8-4Q for nfsv4@megatron.ietf.org; Thu, 23 Jun 2005 20:24:57 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id UAA19264 for ; Thu, 23 Jun 2005 20:24:51 -0400 (EDT) Received: from e5.ny.us.ibm.com ([32.97.182.145]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1DlcOV-0006uD-ER for nfsv4@ietf.org; Thu, 23 Jun 2005 20:50:07 -0400 Received: from d01relay04.pok.ibm.com (d01relay04.pok.ibm.com [9.56.227.236]) by e5.ny.us.ibm.com (8.12.11/8.12.11) with ESMTP id j5O0OdtC010668 for ; Thu, 23 Jun 2005 20:24:39 -0400 Received: from d01av02.pok.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by d01relay04.pok.ibm.com (8.12.10/NCO/VERS6.7) with ESMTP id j5O0OdKb205956 for ; Thu, 23 Jun 2005 20:24:39 -0400 Received: from d01av02.pok.ibm.com (loopback [127.0.0.1]) by d01av02.pok.ibm.com (8.12.11/8.13.3) with ESMTP id j5O0Oc5w000976 for ; Thu, 23 Jun 2005 20:24:39 -0400 Received: from [9.56.227.90] (d01ml604.pok.ibm.com [9.56.227.90]) by d01av02.pok.ibm.com (8.12.11/8.12.11) with ESMTP id j5O0OchV000973; Thu, 23 Jun 2005 20:24:38 -0400 In-Reply-To: <200506232250.j5NMohJ30146@medlicott.panasas.com> To: Brent Welch Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) MIME-Version: 1.0 X-Mailer: Lotus Notes Build V70_M4_01112005 Beta 3NP January 11, 2005 Message-ID: From: Marc Eshel Date: Thu, 23 Jun 2005 17:24:26 -0700 X-MIMETrack: Serialize by Router on D01ML604/01/M/IBM(Build V70_06092005|June 09, 2005) at 06/23/2005 20:24:37, Serialize complete at 06/23/2005 20:24:37 Content-Type: text/plain; charset="US-ASCII" X-Spam-Score: 0.0 (/) X-Scan-Signature: 200d029292fbb60d25b263122ced50fc Cc: "Goodson, Garth" , spencer.shepler@sun.com, "Noveck, Dave" , nfsv4@ietf.org X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org This my last note I am going on vacation and I will not have internet access for 3 weeks. I will continue to bug you when I come back. Marc. Brent Welch wrote on 06/23/2005 03:50:43 PM: > >>>Marc Eshel said: > > > > Spencer Shepler wrote on 06/22/2005 06:51:50 > PM: > > > > > It would be prudent identify the various operational, deployment or > > > implementation models that people either have planned for the pNFS > > > functionality or can reasonably imagine. It will be important so that > > > we have this in mind when reviewing the general protocol for the > > > subtle interactions of meta-data and data servers as has been > > > identified in this thread of discussion. > > > > > > Oh yeah, the above is with my working group co-chair hat on. > > > > > I just started to think about this topic lately so I don't have a clear > > model so will just dump what I think that I can or would like to do in > > short (I am always very terse but I will try to elaborate:). 
Give a > > cluster filesystem where all the data is available on all the nodes I > > would like to use pNFS to do parallel I/O from as many nodes as possible > > so I would make all nodes to be data servers. I believe the metadata > > operation can saturate a single node even if it not doing any data I/O > so > > I would like all the nodes to also be metadata server, in other words > > distribute all the operations to all the nodes. Or, direct all clients > to > > a specific node for a file or a files segment because it is cached on > that > > node or that node has faster access to the disks. Have multiple > alternate > > nodes or maybe all the nodes for any given I/O in the case of an error. > > Return the data from the metadata server for small files and avoid all > the > > layout exchange and redirection. Have short way to reference a list of > > nodes that can be in the hundreds that can be given once and not > repeated > > in every layout. Not have to many requirement to validate correct client > > behavior which requires a lot of book keeping on the server side if the > > only thing it affected is performance (in the case of cluster > filesystem) > > after all if the client requested all the data from the metadata server > it > > is valid option and no one will produce any error codes. > > > > I am not sure if this is much help but I plan to spend much more time on > > the topic when I get back from vacation in 3 weeks and provide more > input. > > I think that Dave Noveck suggested in his last note to add some options > > that will help with cluster filesystem implementations and I think this > is > > a good idea :) maybe we need some more input from other cluster > filesystem > > planed or even prototyped implementations. > > First I'll restate what I think your model is, and then describe another > one. > > Under your cluster file system there is some storage substrate that > today is hidden by your "nodes" (e.g., a back-end SAN). And, your > nodes cooperate to manage metadata and each exports an identical view. > You are thinking that pNFS will be another layer over your nodes, so > that the underlying storage system is still fairly hidden. In this model, > pNFS will let you fetch data for a single file from many "nodes" in > parallel, and so get higher bandwidth (ideally) than a single node > can deliver. Also, by artfully distributing the layouts returned to > clients, you can smear the I/O load over more nodes and achieve more > balanced load among your nodes. pNFS in its current form does not > directly address the balancing of metadata operations like GETATTR > over your nodes. The only approach I can offer you is that different > clients mount different nodes to get a coarse level of metadata > load balancing. > I would like to use fs_locations to distribute the client to different nodes. > As an aside, I think the FS_LOCATIONS attribute is similar in spirit > to what you want, but you want it on a per-file basis. Today that > operation applies to whole file systems for the purposes of migration. > Ultimately I think we'll want a FILE_LOCATION attribute (or something) > that redirects a client to a different metadata server. But that would > be a different extension than pNFS. I think it could be orthogonal. I don't think that it is orthogonal I was hoping that the output of pNFS will include file-locations. 
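For the coarse metadata load balancing mentioned here (different clients mounting different nodes, steered by an fs_locations-style list), the client-side choice might look like the following sketch; the node list and the hash rule are assumptions for illustration, not anything the fs_locations attribute actually specifies, and the real attribute applies per filesystem rather than per file.

/* Sketch of coarse metadata load balancing: each client picks one of a set
 * of equivalent metadata nodes.  Node names and selection rule are invented. */
#include <stdio.h>
#include <stddef.h>

static const char *metadata_nodes[] = { "mds0", "mds1", "mds2", "mds3" };
#define NNODES (sizeof metadata_nodes / sizeof metadata_nodes[0])

/* Pick a node deterministically from the client's identity so that a given
 * client always talks to the same metadata server. */
static const char *pick_metadata_server(const char *client_id)
{
    size_t h = 0;
    for (const char *p = client_id; *p; p++)
        h = h * 31 + (unsigned char)*p;
    return metadata_nodes[h % NNODES];
}

int main(void)
{
    const char *clients[] = { "clientA", "clientB", "clientC" };
    for (size_t i = 0; i < sizeof clients / sizeof clients[0]; i++)
        printf("%s mounts %s\n", clients[i], pick_metadata_server(clients[i]));
    return 0;
}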
> A different model for your cluster file system is to bring the pNFS > clients more tightly into your cluster file system by exposing more > of the underlying storage layer. If you had a SAN, for example, then > you'd be giving out block layouts and letting the clients sit right > on the SAN and bypass your nodes altogether to do I/O. I prefer the file layout protocol on the block one. > The objects world takes this approach. The clients can communicate > directly with storage devices, and the storage devices don't really > know how the objects being read/written by clients fit into the > file system. The clients have to communicate with metadata servers > which take the role of building up file system semantics on top of > something with a simpler interface. Blocks are really simple, and > objects are slightly richer. By shunting all the I/O load directly > to the storage devices, the metadata servers don't have all that > much work to do. So, I'd characterize this as an "asymmetric" model > where data servers own particular pieces of storage, and the metadata > servers direct clients to the appropriate location via layouts. > In contrast, you have a "symmetric" model where any data is available > at any storage server. But, ultimately there is a hidden asymmetric > model unless you have fully replicated all the data on all nodes > in the symmetric system. > I have a symmetric system the only asymmetric might be in some configuration only in regards to performance and I would use the pNFS protocol optimize the performance where asymmetry exist. I believe that the cluster filesystem model is the simpler one and should be taken into consideration in the first version of pNFS. > -- > Brent Welch > Software Architect, Panasas Inc > Accelerating Time to Results(tm) with Clustered Storage _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www1.ietf.org/mailman/listinfo/nfsv4 From nfsv4-bounces@ietf.org Thu Jun 23 20:27:39 2005 Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dlc2l-0005kQ-J5; Thu, 23 Jun 2005 20:27:39 -0400 Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dlc2j-0005iy-G0 for nfsv4@megatron.ietf.org; Thu, 23 Jun 2005 20:27:37 -0400 Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id UAA19452 for ; Thu, 23 Jun 2005 20:27:36 -0400 (EDT) Received: from traakan.com ([66.160.190.59]) by ietf-mx.ietf.org with smtp (Exim 4.33) id 1DlcR6-0006xg-Je for nfsv4@ietf.org; Thu, 23 Jun 2005 20:52:52 -0400 Received: from GWW15 ([64.168.153.34]) by traakan.com for ; Thu, 23 Jun 2005 17:22:53 -0700 From: "Gordon Waidhofer" To: , Subject: RE: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) Date: Thu, 23 Jun 2005 17:26:45 -0700 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0) X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1506 In-Reply-To: Importance: Normal X-Spam-Score: 0.0 (/) X-Scan-Signature: df9edf1223802dd4cf213867a3af6121 Content-Transfer-Encoding: 7bit X-BeenThere: nfsv4@ietf.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: NFSv4 Working Group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfsv4-bounces@ietf.org Errors-To: nfsv4-bounces@ietf.org I'll plead 
ignorance right up front because I haven't had time to read the spec and still don't. But........ Something (metadata server I guess) is providing a (deviceID, fileHandle) pair for accessing file content. How is the deviceID mapped to a node address? Is it possible that the aggregation of data servers can be handled by the deviceID->node mapping (here is a list of alternate addresses for this deviceID)? I'm not a stake holder in pNFS at this time but think it likely in the future. I would, for the sake of sound forward progress, suggest that multiple metadata servers is an order of magnitude more complicated than the single metadata server case. It would hopelessly stall pNFS, and useful single metadata server deployments would stall needlessly because of it. There does seem to be fair bit of chatter about aggregate devices (clients able to alternate at will) that it's worth a little try. I agree with Spencer that case studies would go a long way to helping frame the mind and the discussion. FWIW. Regards, -gww > -----Original Message----- > From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org]On Behalf Of > Marc Eshel > Sent: Thursday, June 23, 2005 5:06 PM > To: Brent Welch > Cc: Goodson, Garth; nfsv4-bounces@ietf.org; Noveck, Dave; nfsv4@ietf.org > Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt) > > > nfsv4-bounces@ietf.org wrote on 06/23/2005 03:25:52 PM: > > > > > >>>Marc Eshel said: > > > > > > "Noveck, Dave" on 06/22/2005 11:58:12 AM: > > > > > > > Whoa! Before you said you were OK with the client obligation (to > only > > > > do > > > > READ/WRITE/etc on filehandles it got from layouts) and only > objected > > > > to the work of the server verifying compliance. > > > > > > > > Now it appears that you object to forcing the client to obey that > rule, > > > > i.e. that the problem with the verification is not that it is hard > to > > > > do but that it would make the client remember "all those different > file > > > > handles that should be used for different operations". If that is > the > > > > case then we have a real problem. You have an implementation in > which > > > > every server may act as a metadata server but the pnfs client > cannot > > > > assume that all of the implementations with which it will interact > > > > will have that characteristic or else we have a massive (lack-of)- > > > > interoperability problem. If a layout tells the client he may use > > > > handle A on server X to READ/WRITE then he had to be capable of > > > > respecting that, whether the server holds him to it or not. > > > > > > I don't object. I just voiced a concern about the implementation > > overhead. > > > It is obvious that I am thinking of cluster filesystem only and if > there > > > is a need for other implementation to require that the client use > only > > the > > > file handles provided for each specific operation then fine. > > > > > > > I'm perfectly OK with exposing additional functionality that a > > > > cluster fs would provide for metadata load-balancing and failover > > > > as long as we are clear that this is something that the client is > > > > directed to use based on server characteristics. For example, if > > > > the devinfo entry says that the layout handle may be used to read/ > > > > write on a certain set of guaranteed-equivalent servers, then this > > > > is fine. Or if a locations_info attribute for the fs indicated > that > > > > coherent metadata service was available on a given set of servers, > > > > then this is OK as well. 
> -----Original Message-----
> From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] On Behalf Of Marc Eshel
> Sent: Thursday, June 23, 2005 5:06 PM
> To: Brent Welch
> Cc: Goodson, Garth; nfsv4-bounces@ietf.org; Noveck, Dave; nfsv4@ietf.org
> Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt)
>
> nfsv4-bounces@ietf.org wrote on 06/23/2005 03:25:52 PM:
>
> > >>>Marc Eshel said:
> > >
> > > "Noveck, Dave" on 06/22/2005 11:58:12 AM:
> > >
> > > > Whoa! Before you said you were OK with the client obligation (to only do READ/WRITE/etc on filehandles it got from layouts) and only objected to the work of the server verifying compliance.
> > > >
> > > > Now it appears that you object to forcing the client to obey that rule, i.e. that the problem with the verification is not that it is hard to do but that it would make the client remember "all those different file handles that should be used for different operations". If that is the case then we have a real problem. You have an implementation in which every server may act as a metadata server, but the pnfs client cannot assume that all of the implementations with which it will interact will have that characteristic, or else we have a massive (lack-of-)interoperability problem. If a layout tells the client he may use handle A on server X to READ/WRITE, then he has to be capable of respecting that, whether the server holds him to it or not.
> > >
> > > I don't object. I just voiced a concern about the implementation overhead. It is obvious that I am thinking of cluster filesystems only, and if there is a need for other implementations to require that the client use only the file handles provided for each specific operation, then fine.
> > >
> > > > I'm perfectly OK with exposing additional functionality that a cluster fs would provide for metadata load-balancing and failover as long as we are clear that this is something that the client is directed to use based on server characteristics. For example, if the devinfo entry says that the layout handle may be used to read/write on a certain set of guaranteed-equivalent servers, then this is fine. Or if a locations_info attribute for the fs indicated that coherent metadata service was available on a given set of servers, then this is OK as well. But each of these options is an option, and the basic architecture of pnfs is that there is a distinction between data service and meta-data service and that the client has to maintain that distinction. Just as a pnfs client should not use a block address in a SETATTR request or send a filehandle in a SCSI block write :-), it should not send a handle it got from a layout in a SETATTR request. It should not send a filehandle it got from the meta-data server to a data server unless it has some specific guidance that it can, such as a locations_info attribute saying servers X, Y, Z are equivalent. The important point is that that latter is not always going to be there and the client may not assume that it is.
> > >
> > > This sounds like a good compromise; I would like to see the above options in the protocol.
> >
> > I'd like to suggest that we mention the issues about multiple metadata servers, but that we don't explicitly address them in the current pNFS proposals. The goal is to get pNFS clients that interoperate with different servers. If some servers have very different semantics (transparent failover among them, servicing of metadata or data operations with internal forwarding, whatever) then that has a big impact on the clients. In other words, we are starting small with just an effort to distribute the I/O load. Bypassing the metadata server for I/O goes a long way to reducing load and providing scalability. Let's get that worked out before we do metadata load balancing.
> >
> > If you really, really, wanted to go there, then you could define a new layout type that returned, e.g., a set of equivalent (deviceID, filehandle) pairs that the client could use based on the availability or load of the data server. You might also be tempted (as Dean is) to return layouts that hint to the client that if it did a GETATTR to a data server it would get back something sensible. However, I don't think we should go there, even though you and I, as cluster file system implementers, may have already done that.
>
> Yes, I really really want to go there, because there are a few different cluster filesystems out there today with clusters of hundreds and thousands of nodes, and they can really really benefit from the "p" in pNFS. It is not some future requirement, and I really don't want to wait for the next version of this protocol. I don't want to give out a hundred identical file handles; I want a way to give one file handle and tell the client to use it on a list of data servers that I can give once and reference over and over. I would also use Dean's hint for GETATTR.
>
> > > > > and the server verifying that the client did so.
> > > >
> > > > The verification is a big help when testing. This is going to be more complicated than what we've done in the past, and the earlier we detect a problem the better off we are all going to be. I wouldn't think of trying to make this work without that kind of checking, particularly given all the possible types of implementations we have been talking about here. All this requires is one bit in a file handle saying whether it gives the right to do all operations (including metadata-server operations) or just the subset for data server operations. If you are inclined not to do this, my question would be, "Do you feel lucky?".
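Here is a minimal sketch of the one-bit check described above, assuming a server that reserves a flag bit in the first byte of its otherwise opaque filehandles to mean "usable for data-server operations only". The encoding, names, and operation classes are invented for illustration; nothing here comes from the draft.

    /*
     * Illustrative only: assume this server reserves one flag bit in the
     * first byte of its opaque filehandle to mean "this handle came from
     * a layout and may only be used for data-server operations (READ,
     * WRITE, COMMIT, ...)".  A metadata operation arriving with such a
     * handle is rejected.
     */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define FH_FLAG_DS_ONLY 0x01u   /* hypothetical flag bit */

    struct nfs_fh {
        uint32_t len;
        uint8_t  data[128];
    };

    enum op_class { OP_METADATA, OP_DATA };

    /* Return true if the operation is allowed with this filehandle. */
    static bool fh_permits(const struct nfs_fh *fh, enum op_class op)
    {
        bool ds_only = (fh->data[0] & FH_FLAG_DS_ONLY) != 0;
        return op == OP_DATA || !ds_only;
    }

    int main(void)
    {
        struct nfs_fh layout_fh = { .len = 16, .data = { FH_FLAG_DS_ONLY } };
        struct nfs_fh meta_fh   = { .len = 16, .data = { 0 } };

        printf("SETATTR with layout handle:   %s\n",
               fh_permits(&layout_fh, OP_METADATA) ? "allowed" : "rejected");
        printf("WRITE with layout handle:     %s\n",
               fh_permits(&layout_fh, OP_DATA) ? "allowed" : "rejected");
        printf("SETATTR with metadata handle: %s\n",
               fh_permits(&meta_fh, OP_METADATA) ? "allowed" : "rejected");
        return 0;
    }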
> > > Like you say, it is only one bit, and it is not too difficult to implement on the server, but now you force the client to remember two file handles that differ only by one bit (big waste of space :). The client needs to remember only which are metadata servers and which are data servers, and be required to direct operations to the appropriate server. If the client makes a mistake and the server cares, then the server can reject the operation.
> >
> > I have the same reaction as Dave - I don't see how you can argue that the spec should imply that the client can get away with switching around the handles used on the metadata servers and the data servers. If your implementation wants to give out the same bit pattern for these cases, that's fine. But clients simply MUST use the file handles in the layouts for operations on the corresponding device, and it is simply undefined what happens if they use a file handle from the metadata server with a data server or vice versa, or heck, swap around the file handles among the different data servers. "Of course" the clients will keep track of what handles are to be used with what servers and what operations, because they MUST.
>
> At first I just suggested that the server would not have to verify the usage of the right file handle, not that the client would swap around file handles (Dave said that I don't really have to if I don't want to), but like you suggested I can use equivalent file handles to avoid the problem, so the client has nothing to swap or get confused with. Now I suggest, as in the above comment, that the client get only one file handle, so there is no possibility for confusion and we can save a lot of space.
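One way a client could do the bookkeeping described above (always use the filehandle the layout named for the device it named) is a small per-device table; in a symmetric design every entry may hold the same handle, but the lookup discipline stays the same. This is a hypothetical client-side sketch, not anything the draft specifies.

    /*
     * Hypothetical client-side bookkeeping: remember, per device ID, the
     * filehandle the layout said to use there, and look it up before
     * issuing I/O.  In a symmetric cluster filesystem every slot may hold
     * the same handle, but the discipline is unchanged.
     */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    struct nfs_fh { uint32_t len; uint8_t data[128]; };

    #define MAX_DEVICES 16

    struct fh_table {
        uint32_t      device_id[MAX_DEVICES];
        struct nfs_fh fh[MAX_DEVICES];
        int           count;
    };

    static void fh_table_set(struct fh_table *t, uint32_t dev, const struct nfs_fh *fh)
    {
        if (t->count >= MAX_DEVICES)
            return;                      /* table full; real code would grow it */
        t->device_id[t->count] = dev;
        t->fh[t->count] = *fh;
        t->count++;
    }

    /* The handle to use for I/O on 'dev', or NULL if we hold no layout there. */
    static const struct nfs_fh *fh_table_get(const struct fh_table *t, uint32_t dev)
    {
        for (int i = 0; i < t->count; i++)
            if (t->device_id[i] == dev)
                return &t->fh[i];
        return NULL;
    }

    int main(void)
    {
        struct fh_table tbl = { .count = 0 };
        struct nfs_fh fh = { .len = 16 };
        memset(fh.data, 0xab, sizeof fh.data);

        /* Same handle registered for two data servers, as in the symmetric case. */
        fh_table_set(&tbl, 1, &fh);
        fh_table_set(&tbl, 2, &fh);

        printf("device 1: %s\n", fh_table_get(&tbl, 1) ? "have layout handle" : "no layout");
        printf("device 3: %s\n", fh_table_get(&tbl, 3) ? "have layout handle" : "no layout");
        return 0;
    }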
> > --
> > Brent Welch
> > Software Architect, Panasas Inc
> > Accelerating Time to Results(tm) with Clustered Storage
>
> _______________________________________________
> nfsv4 mailing list
> nfsv4@ietf.org
> https://www1.ietf.org/mailman/listinfo/nfsv4

_______________________________________________
nfsv4 mailing list
nfsv4@ietf.org
https://www1.ietf.org/mailman/listinfo/nfsv4

From nfsv4-bounces@ietf.org Sun Jun 26 14:14:55 2005
Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dmbeh-0002gz-Cp; Sun, 26 Jun 2005 14:14:55 -0400
Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1Dmbef-0002gu-NQ for nfsv4@megatron.ietf.org; Sun, 26 Jun 2005 14:14:53 -0400
Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id OAA29197 for ; Sun, 26 Jun 2005 14:14:51 -0400 (EDT)
Received: from gw-e.panasas.com ([65.194.124.178] helo=blackcomb.panasas.com) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1Dmc3e-0007UI-9x for nfsv4@ietf.org; Sun, 26 Jun 2005 14:40:43 -0400
Received: from [127.0.0.1] (bhalevy@dynamic-vpn34.panasas.com [172.17.19.34]) by blackcomb.panasas.com (8.9.3/8.9.3) with ESMTP id OAA19272; Sun, 26 Jun 2005 14:14:39 -0400
Message-ID: <42BEF082.1040202@panasas.com>
Date: Sun, 26 Jun 2005 21:14:26 +0300
From: Benny Halevy
User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Garth Goodson
Subject: Re: [nfsv4] pNFS issues/changes (draft-welch-pnfs-ops-02.txt)
References: <42B8A899.5030204@netapp.com>
In-Reply-To: <42B8A899.5030204@netapp.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Score: 0.0 (/)
X-Scan-Signature: 39bd8f8cbb76cae18b7e23f7cf6b2b9f
Content-Transfer-Encoding: 7bit
Cc: Marc Eshel , nfsv4@ietf.org, "Noveck, Dave"
X-BeenThere: nfsv4@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: NFSv4 Working Group
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Sender: nfsv4-bounces@ietf.org
Errors-To: nfsv4-bounces@ietf.org

On 6/22/2005, Garth Goodson wrote:
[snip]
>>
>> I know I can do it. I just don't want to make sure (enforce the rule) that each client is using the file handles to read only from the specified data server.
>> Marc.
>>
>> Marc.
>
> Ok, that is a valid concern (not having to propagate layouts to the data servers to validate that I/Os are coming from the correct clients). I guess the object guys get around this by encoding the layout/device IDs into the capability that is handed back to the client with the layout.

T10's object capabilities model does not encode the client identity, nor the layout/device IDs, into the capability, so the object storage device (OSD) has no way to verify that an I/O request came from the "correct" client (e.g. the client that got the cap could have given it to another client and it would just work). Yet the capability is generated for each device, so a cap for one device wouldn't work on another device if the two devices have different device keys.

> It has been marked as an open issue...
>
> -Garth
>
> _______________________________________________
> nfsv4 mailing list
> nfsv4@ietf.org
> https://www1.ietf.org/mailman/listinfo/nfsv4

_______________________________________________
nfsv4 mailing list
nfsv4@ietf.org
https://www1.ietf.org/mailman/listinfo/nfsv4
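To illustrate the per-device-key property described above: the capability names an object and permissions but not a client, and it only verifies under the key of the device it was minted for. The checksum below is a toy stand-in, not the actual T10 OSD security algorithm, and all structure and field names are illustrative.

    /*
     * Toy illustration: a capability is validated with a keyed checksum
     * computed under the target device's secret key.  Nothing in it names
     * the client, so any holder can use it, but a cap minted for device A
     * will not verify on device B because B's key differs.  The "MAC"
     * below is a trivial stand-in, NOT the real T10 OSD algorithm.
     */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    struct osd_cap {
        uint64_t object_id;      /* which object the cap grants access to */
        uint32_t permissions;    /* e.g. read/write bits */
        uint64_t mac;            /* keyed checksum, computed by the MDS */
    };

    /* Toy keyed checksum (FNV-1a over key || fields). Illustrative only. */
    static uint64_t toy_mac(uint64_t device_key, uint64_t object_id, uint32_t perms)
    {
        uint8_t buf[20];
        memcpy(buf, &device_key, 8);
        memcpy(buf + 8, &object_id, 8);
        memcpy(buf + 16, &perms, 4);

        uint64_t h = 0xcbf29ce484222325ULL;
        for (size_t i = 0; i < sizeof buf; i++) {
            h ^= buf[i];
            h *= 0x100000001b3ULL;
        }
        return h;
    }

    /* Metadata server: mint a cap for a particular device. */
    static struct osd_cap mint_cap(uint64_t device_key, uint64_t oid, uint32_t perms)
    {
        struct osd_cap c = { .object_id = oid, .permissions = perms };
        c.mac = toy_mac(device_key, oid, perms);
        return c;
    }

    /* Device: accept the cap only if it verifies under *this* device's key. */
    static bool device_accepts(uint64_t my_key, const struct osd_cap *c)
    {
        return c->mac == toy_mac(my_key, c->object_id, c->permissions);
    }

    int main(void)
    {
        uint64_t key_dev_a = 0x1111, key_dev_b = 0x2222;
        struct osd_cap cap = mint_cap(key_dev_a, /*oid=*/42, /*perms=*/0x3);

        printf("device A accepts cap: %s\n", device_accepts(key_dev_a, &cap) ? "yes" : "no");
        printf("device B accepts cap: %s\n", device_accepts(key_dev_b, &cap) ? "yes" : "no");
        return 0;
    }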