Isilon Stuff and Other Things
This is a disclaimer: Using the notes below is dangerous for both your sanity and peace of mind. If you still want to read them, beware that they may be "not even wrong". Everything I write here is just a mnemonic device to give me a chance to fix things I badly broke because I'm bloody stupid and think I can tinker with stuff that is way above my head and get away with it. It reminds me of Gandalf's warning: "Perilous to all of us are the devices of an art deeper than we ourselves possess." Moreover, a lot of it I blatantly stole off the net from people obviously cleverer than me -- not very hard. Forgive me. My bad. Please consider this and go away. You have been warned!
(:#toc:)
EMC Support
- Support is at https://support.emc.com
- One must create a profile with 2 roles, one as Authorized Contact and another as Dial Home, Primary Contact.
- Site information:
Site ID: 1003902358
Created On: 05/13/2016 12:36 PM
Site Name: MCGILL UNIVERSITY
Address 1: 3801 UNIVERSITY ST
Address 2: ROOM WB212
City: MONTREAL
State:
Country: CA
Postal Code: H3A 2B4
About This Cluster
This is from the web interface, [Help] → [About This Cluster]
About This Cluster

OneFS / Upgrade
  Isilon OneFS v8.0.0.4 B_MR_8_0_0_4_053(RELEASE) installed on all nodes.

Packages and Updates
  No packages or updates are installed.

Cluster Information
  GUID: 000e1ea7eec05c211157780e00f5f0ce64c1

Cluster Hardware
  Node    Model                                             Configuration  Serial Number
  Node 1  Isilon X410-4U-Dual-64GB-2x1GE-2x10GE SFP+-144TB  400-0049-03    SX410-301608-0260
  Node 2  Isilon X410-4U-Dual-64GB-2x1GE-2x10GE SFP+-144TB  400-0049-03    SX410-301608-0255
  Node 4  Isilon X410-4U-Dual-64GB-2x1GE-2x10GE SFP+-144TB  400-0049-03    SX410-301608-0264
  Node 3  Isilon X410-4U-Dual-64GB-2x1GE-2x10GE SFP+-144TB  400-0049-03    SX410-301608-0254
  Node 5  Isilon X410-4U-Dual-64GB-2x1GE-2x10GE SFP+-144TB  400-0049-03    SX410-301608-0248

Cluster Firmware
  Device              Type      Firmware                 Nodes
  BMC_S2600CP         BMC       1.25.9722                1-5
  CFFPS1              CFFPS     03.03                    1-5
  CFFPS2              CFFPS     03.03                    1-5
  CMCSDR_Honeybadger  CMCSDR    00.0B                    1-5
  CMC_HFHB            CMC       02.05                    1-5
  IsilonFPV1          FrontPnl  UI.01.36                 1-5
  LOx2-MLC-YD         Nvram     rp180c01+rp180c01        1-5
  Lsi                 DiskCtrl  20.00.04.00              1-5
  LsiExp0             DiskExp   0910+0210                1-5
  LsiExp1             DiskExp   0910+0210                1-5
  Mellanox            Network   2.30.8020+ISL1090110018  1-5
  QLogic-NX2          10GigE    7.6.55                   1-5

Copyright © 2001-2017 EMC Corporation. All Rights Reserved.
Logical Node Numbers (LNNs), Device IDs, Serial Numbers and Firmware
- Use the isi_for_array command to loop over the nodes and run a command on each of them:
BIC-Isilon-Cluster-4# isi_for_array isi_hw_status -i BIC-Isilon-Cluster-4: SerNo: SX410-301608-0264 BIC-Isilon-Cluster-4: Config: 400-0049-03 BIC-Isilon-Cluster-4: FamCode: X BIC-Isilon-Cluster-4: ChsCode: 4U BIC-Isilon-Cluster-4: GenCode: 10 BIC-Isilon-Cluster-4: Product: X410-4U-Dual-64GB-2x1GE-2x10GE SFP+-144TB BIC-Isilon-Cluster-1: SerNo: SX410-301608-0260 BIC-Isilon-Cluster-1: Config: 400-0049-03 BIC-Isilon-Cluster-1: FamCode: X BIC-Isilon-Cluster-1: ChsCode: 4U BIC-Isilon-Cluster-1: GenCode: 10 BIC-Isilon-Cluster-1: Product: X410-4U-Dual-64GB-2x1GE-2x10GE SFP+-144TB BIC-Isilon-Cluster-2: SerNo: SX410-301608-0255 BIC-Isilon-Cluster-2: Config: 400-0049-03 BIC-Isilon-Cluster-2: FamCode: X BIC-Isilon-Cluster-2: ChsCode: 4U BIC-Isilon-Cluster-2: GenCode: 10 BIC-Isilon-Cluster-2: Product: X410-4U-Dual-64GB-2x1GE-2x10GE SFP+-144TB BIC-Isilon-Cluster-3: SerNo: SX410-301608-0254 BIC-Isilon-Cluster-3: Config: 400-0049-03 BIC-Isilon-Cluster-3: FamCode: X BIC-Isilon-Cluster-3: ChsCode: 4U BIC-Isilon-Cluster-3: GenCode: 10 BIC-Isilon-Cluster-3: Product: X410-4U-Dual-64GB-2x1GE-2x10GE SFP+-144TB BIC-Isilon-Cluster-5: SerNo: SX410-301608-0248 BIC-Isilon-Cluster-5: Config: 400-0049-03 BIC-Isilon-Cluster-5: FamCode: X BIC-Isilon-Cluster-5: ChsCode: 4U BIC-Isilon-Cluster-5: GenCode: 10 BIC-Isilon-Cluster-5: Product: X410-4U-Dual-64GB-2x1GE-2x10GE SFP+-144TB
- The isi_nodes command can extract formatted strings, for example:
BIC-Isilon-Cluster-3# isi_nodes %{id} %{lnn} %{name} %{serialno}
1 1 BIC-Isilon-Cluster-1 SX410-301608-0260
2 2 BIC-Isilon-Cluster-2 SX410-301608-0255
4 3 BIC-Isilon-Cluster-3 SX410-301608-0254
3 4 BIC-Isilon-Cluster-4 SX410-301608-0264
6 5 BIC-Isilon-Cluster-5 SX410-301608-0248
- Why is there an %{id} equal to 6?
- 20160923: I think I can now answer this question.
- It might be the result of the initial configuration of the cluster back in April ‘16.
- The guy who did it (J Laganiere, from Gallium-it.com) had a few problems with nodes not responding.
- The nodes are labeled from top to bottom, 1 (highest in the rack) to 5 (lowest in the rack).
- They should have been labeled according to their physical order in the rack, 1 at the bottom to 5 at the top.
- As to why the LNNs don't match the Device IDs: Device IDs are incremented as nodes are failed out of and added to the cluster.
- I smartfailed one node once, which explains the ID=6.
- This is extremely annoying as the allocation of IPs is also affected.
- For a given LNN, the last octet of its IP in the prod pool doesn't match the last octet of its IP in the node pool.
BIC-Isilon-Cluster >>> lnnset LNN Device ID Cluster IP ---------------------------------------- 1 1 10.0.3.1 2 2 10.0.3.2 3 4 10.0.3.4 4 3 10.0.3.3 5 6 10.0.3.5 BIC-Isilon-Cluster-2# isi_nodes %{lnn} %{devid} %{external} %{dynamic} 1 1 172.16.10.20 132.206.178.237,172.16.20.237 2 2 172.16.10.21 132.206.178.236,172.16.20.236 3 4 172.16.10.22 132.206.178.233,172.16.20.234 4 3 172.16.10.23 132.206.178.234,172.16.20.235 5 6 172.16.10.24 132.206.178.235,172.16.20.233 BIC-Isilon-Cluster-2# isi network interfaces ls LNN Name Status Owners IP Addresses -------------------------------------------------------------------- 1 10gige-1 Up - - 1 10gige-2 Up - - 1 10gige-agg-1 Up groupnet0.prod.pool0 132.206.178.237 groupnet0.node.pool1 172.16.20.237 1 ext-1 Up groupnet0.mgmt.mgmt 172.16.10.20 1 ext-2 No Carrier - - 1 ext-agg Not Available - - 2 10gige-1 Up - - 2 10gige-2 Up - - 2 10gige-agg-1 Up groupnet0.prod.pool0 132.206.178.236 groupnet0.node.pool1 172.16.20.236 2 ext-1 Up groupnet0.mgmt.mgmt 172.16.10.21 2 ext-2 No Carrier - - 2 ext-agg Not Available - - 3 10gige-1 Up - - 3 10gige-2 Up - - 3 10gige-agg-1 Up groupnet0.prod.pool0 132.206.178.233 groupnet0.node.pool1 172.16.20.234 3 ext-1 Up groupnet0.mgmt.mgmt 172.16.10.22 3 ext-2 No Carrier - - 3 ext-agg Not Available - - 4 10gige-1 Up - - 4 10gige-2 Up - - 4 10gige-agg-1 Up groupnet0.prod.pool0 132.206.178.234 groupnet0.node.pool1 172.16.20.235 4 ext-1 Up groupnet0.mgmt.mgmt 172.16.10.23 4 ext-2 No Carrier - - 4 ext-agg Not Available - - 5 10gige-1 Up - - 5 10gige-2 Up - - 5 10gige-agg-1 Up groupnet0.prod.pool0 132.206.178.235 groupnet0.node.pool1 172.16.20.233 5 ext-1 Up groupnet0.mgmt.mgmt 172.16.10.24 5 ext-2 No Carrier - - 5 ext-agg Not Available - - -------------------------------------------------------------------- Total: 30
- This lists the status of devices and firmware across the cluster nodes:
BIC-Isilon-Cluster-1# isi upgrade cluster firmware devices Device Type Firmware Mismatch Lnns --------------------------------------------------------------------- CFFPS1_Blastoff CFFPS 03.03 - 1-5 CFFPS2_Blastoff CFFPS 03.03 - 1-5 CMC_HFHB CMC 01.02 - 1-5 CMCSDR_Honeybadger CMCSDR 00.0B - 1-5 Lsi DiskCtrl 17.00.01.00 - 1-5 LsiExp0 DiskExp 0910+0210 - 1-5 LsiExp1 DiskExp 0910+0210 - 1-5 IsilonFPV1 FrontPnl UI.01.36 - 1-2,4-5 Mellanox Network 2.30.8020+ISL1090110018 - 1-5 LOx2-MLC-YD Nvram rp180c01+rp180c01 - 1-5 --------------------------------------------------------------------- Total: 10
Licenses
- Installed licenses and their activation status:
BIC-Isilon-Cluster-3# isi license licenses ls
Name                  Status     Expiration
----------------------------------------------------
SmartDedupe           Inactive   -
Swift                 Activated  -
SmartQuotas           Activated  -
InsightIQ             Activated  -
SmartPools            Inactive   -
SmartLock             Inactive   -
Isilon for vCenter    Inactive   -
CloudPools            Inactive   -
Hardening             Inactive   -
SnapshotIQ            Activated  -
HDFS                  Inactive   -
SyncIQ                Inactive   -
SmartConnect Advanced Activated  -
----------------------------------------------------
Total: 13
Alerts and Events
- Modify event retention period from 90 days (default) to 360:
BIC-Isilon-Cluster-1# isi event settings view
      Retention Days: 90
       Storage Limit: 1
   Maintenance Start: Never
Maintenance Duration: Never
  Heartbeat Interval: daily
BIC-Isilon-Cluster-1# isi event settings modify --retention-days 360
BIC-Isilon-Cluster-1# isi event settings view
      Retention Days: 360
       Storage Limit: 1
   Maintenance Start: Never
Maintenance Duration: Never
  Heartbeat Interval: daily
- The syntax to modify event settings:
isi event settings modify [--retention-days <integer>] [--storage-limit <integer>] [--maintenance-start <timestamp>] [--clear-maintenance-start] [--maintenance-duration <duration>] [--heartbeat-interval
- Every event has two ID numbers that help to establish the context of the event.
- The event type ID identifies the type of event that has occurred.
- The event instance ID is a unique number that is specific to a particular occurrence of an event type.
- When an event is submitted to the kernel queue, an event instance ID is assigned.
- You can reference the instance ID to determine the exact time that an event occurred.
- You can view individual events. However, you manage events and alerts at the event group level.
BIC-Isilon-Cluster-3# isi event events list ID Occurred Sev Lnn Eventgroup ID Message -------------------------------------------------------------------------------------------------------- 1.426 04/19 13:28 U 0 1 Resolved from PAPI 3.309 04/19 11:11 C 4 1 External network link ext-1 (igb0) down 2.530 04/20 00:00 I 2 131 Heartbeat Event 3.545 04/27 22:09 C 4 131101 Disk Repair Complete: Bay 2, Type HDD, LNUM 34. Replace the drive according to the instructions in the OneFS Help system. 1.664 05/05 12:05 U 0 131124 Resolved from PAPI 3.563 05/03 23:56 C 4 131124 One or more drives (bay(s) 2 / type(s) HDD) are ready to be replaced. 3.551 04/27 22:19 C 4 131124 One or more drives (bay(s) 2 / type(s) HDD) are ready to be replaced. BIC-Isilon-Cluster-3# isi event events view 3.545 ID: 3.545 Eventgroup ID: 131101 Event Type: 100010010 Message: Disk Repair Complete: Bay 2, Type HDD, LNUM 34. Replace the drive according to the instructions in the OneFS Help system. Devid: 3 Lnn: 4 Time: 2016-04-27T22:09:12 Severity: critical Value: 0.0 BIC-Isilon-Cluster-3# isi event groups list ID Started Ended Causes Short Events Severity --------------------------------------------------------------------------------- 3 04/19 11:10 04/19 13:28 external_network 2 critical 2 04/19 11:10 04/19 13:28 external_network 2 critical 4 04/19 11:10 04/19 11:16 NODE_STATUS_OFFLINE 2 critical 1 04/19 11:11 04/19 13:28 external_network 2 critical 24 04/19 11:16 04/19 11:16 NODE_STATUS_ONLINE 1 information 26 04/19 11:23 04/19 13:28 external_network 2 critical 27 04/19 11:31 04/19 13:28 WINNET_AUTH_NIS_SERVERS_UNREACH 6 critical 32 04/19 12:43 04/19 12:44 HW_IPMI_POWER_SUPPLY_STATUS_REG 4 critical ... 524525 05/30 02:18 05/30 02:18 SYS_DISK_REMOVED 1 critical 524538 05/30 02:19 -- SYS_DISK_UNHEALTHY 3 critical ... BIC-Isilon-Cluster-3# isi event groups view 524525 ID: 524525 Started: 05/30 02:18 Causes Long: Disk Repair Complete: Bay 18, Type HDD, LNUM 27. Replace the drive according to the instructions in the OneFS Help system. Last Event: 2016-05-30T02:18:16 Ignore: No Ignore Time: Never Resolved: Yes Ended: 05/30 02:18 Events: 1 Severity: critical BIC-Isilon-Cluster-3# isi event groups view 524538 ID: 524538 Started: 05/30 02:19 Causes Long: One or more drives (bay(s) 18 / type(s) HDD) are ready to be replaced. Last Event: 2016-06-04T12:42:09 Ignore: No Ignore Time: Never Resolved: No Ended: -- Events: 3 Severity: critical
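- Later in these notes I resolve event groups through the web UI, but there is a CLI equivalent; flags quoted from memory, so check isi event groups modify --help first:
# Mark an event group (e.g. the still-open one above) as ignored, or as resolved:
BIC-Isilon-Cluster-3# isi event groups modify 524538 --ignored true
BIC-Isilon-Cluster-3# isi event groups modify 524538 --resolved true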
Scheduling A Maintenance Window
- You can schedule a maintenance window by setting a maintenance start time and duration.
- During a scheduled maintenance window, the system will continue to log events, but no alerts will be generated.
- Scheduling a maintenance window will keep channels from being flooded by benign alerts associated with cluster maintenance procedures.
- Active event groups will automatically resume generating alerts when the scheduled maintenance period ends.
- You can schedule a maintenance window to discontinue alerts while you are performing maintenance on your cluster.
- Schedule a maintenance window by running the isi event settings modify command.
- The following example schedules a maintenance window that begins on September 1, 2015 at 11:00 PM and lasts for two days:
isi event settings modify --maintenance-start 2015-09-01T23:00:00 --maintenance-duration 2D
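- To end the window early, the syntax above also lists a flag to clear the start time; something like this should do it (not tested here):
isi event settings modify --clear-maintenance-start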
Hardware, Devices and Nodes
Storage Pool Protection Level
- The default and suggested protection level for a cluster smaller than 2 PB is +2d:1n.
- A +2d:1n protection level means the cluster can recover from two simultaneous drive failures or one node failure without sustaining any data loss.
- The parity overhead is 20% for a 5-node cluster with a +2d:1n protection level (see the quick arithmetic below).
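- Quick arithmetic behind that 20% figure (my own reading of the protection scheme, so possibly not even wrong): with +2d:1n on 5 nodes, a protection group spans two drives per node, i.e. 10 stripe units across the cluster, 2 of which are FEC, so the overhead is 2/10 = 20%.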
BIC-Isilon-Cluster-4# isi storagepool list
Name            Nodes Requested Protection HDD     Total     %     SSD Total %
--------------------------------------------------------------------------------------
x410_144tb_64gb 1-5   +2d:1n               1.1190T 641.6275T 0.17% 0   0     0.00%
--------------------------------------------------------------------------------------
Total: 1                                   1.1190T 641.6275T 0.17% 0   0     0.00%
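- To see what protection is actually applied to a given file or directory, isi get does the job; the path below is just an example:
# Shows the requested protection policy and actual level for the files under a path:
BIC-Isilon-Cluster-4# isi get /ifs/data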
Hardware status on a specific node:
BIC-Isilon-Cluster-4# isi_hw_status SerNo: SX410-301608-0264 Config: 400-0049-03 FamCode: X ChsCode: 4U GenCode: 10 Product: X410-4U-Dual-64GB-2x1GE-2x10GE SFP+-144TB HWGen: CTO (CTO Hardware) Chassis: ISI36V3 (Isilon 36-Bay(V3) Chassis) CPU: GenuineIntel (2.00GHz, stepping 0x000306e4) PROC: Dual-proc, Octa-HT-core RAM: 68602642432 Bytes Mobo: IntelS2600CP (Intel S2600CP Motherboard) NVRam: LX4381 (Isilon LOx NVRam Card) (2016MB card) (size 2113798144B) FlshDrv: None (No physical dongle supported) ((null)) DskCtl: LSI2308SAS2 (LSI 2308 SAS Controller) (8 ports) DskExp: LSISAS2X24_X2 (LSI SAS2x24 SAS Expander (Qty 2)) PwrSupl: PS1 (type=ACBEL POLYTECH , fw=03.03) PwrSupl: PS2 (type=ACBEL POLYTECH , fw=03.03) ChasCnt: 1 (Single-Chassis System) NetIF: ib1,ib0,igb0,igb1,bxe0,bxe1 IBType: MT4099 QDR (Mellanox MT4099 IB QDR Card) LCDver: IsiVFD1 (Isilon VFD V1) IMB: Board Version 0xffffffff Power Supplies OK Power Supply 1 good Power Supply 2 good CPU Operation (raw 0x88390000) = Normal CPU Speed Limit = 100.00% FAN TAC SENSOR 1 = 8800.000 FAN TAC SENSOR 2 = 8800.000 FAN TAC SENSOR 3 = 8800.000 PS FAN SPEED 1 = 9600.000 PS FAN SPEED 2 = 9500.000 BB +12.0V = 11.935 BB +5.0V = 4.937 BB +3.3V = 3.268 BB +5.0V STBY = 4.894 BB +3.3V AUX = 3.268 BB +1.05V P1Vccp = 0.828 BB +1.05V P2Vccp = 0.822 BB +1.5 P1DDR AB = na BB +1.5 P1DDR CD = na BB +1.5 P2DDR AB = na BB +1.5 P2DDR CD = na BB +1.8V AUX = 1.769 BB +1.1V STBY = 1.081 BB VBAT = 3.120 BB +1.35 P1LV AB = 1.342 BB +1.35 P1LV CD = 1.348 BB +1.35 P2LV AB = 1.378 BB +1.35 P2LV CD = 1.348 VCC_12V0 = 12.100 VCC_5V0 = 5.000 VCC_3V3 = 3.300 VCC_1V8 = 1.800 VCC_5V0_SB = 4.900 VCC_1V0 = 0.990 VCC_5V0_CBL = 5.000 VCC_SW = 4.900 VBATT_1 = 4.000 VBATT_2 = 4.000 PS IN VOLT 1 = 241.000 PS IN VOLT 2 = 241.000 PS OUT VOLT 1 = 12.300 PS OUT VOLT 2 = 12.300 PS IN CURR 1 = 1.200 PS IN CURR 2 = 1.200 PS OUT CURR 1 = 19.000 PS OUT CURR 2 = 19.500 Front Panel Temp = 20.6 BB EDGE Temp = 25.000 BB BMC Temp = 34.000 BB P2 VR Temp = 30.000 BB MEM VR Temp = 28.000 LAN NIC Temp = 42.000 P1 Therm Margin = -56.000 P2 Therm Margin = -58.000 P1 DTS Therm Mgn = -56.000 P2 DTS Therm Mgn = -58.000 DIMM Thrm Mrgn 1 = -66.000 DIMM Thrm Mrgn 2 = -68.000 DIMM Thrm Mrgn 3 = -67.000 DIMM Thrm Mrgn 4 = -66.000 TEMP SENSOR 1 = 23.000 PS TEMP 1 = 28.000 PS TEMP 2 = 28.000
List devices on node 5 (by logical node number):
BIC-Isilon-Cluster-1# isi devices list --node-lnn 5 Lnn Location Device Lnum State Serial ----------------------------------------------- 5 Bay 1 /dev/da1 35 HEALTHY S1Z1S6BY 5 Bay 2 /dev/da2 34 HEALTHY Z1ZAECJM 5 Bay 3 /dev/da19 17 HEALTHY S1Z1SB0L 5 Bay 4 /dev/da20 16 HEALTHY S1Z1SAYP 5 Bay 5 /dev/da3 33 HEALTHY Z1ZA74A4 5 Bay 6 /dev/da21 15 HEALTHY Z1ZAEQ13 5 Bay 7 /dev/da22 14 HEALTHY S1Z1SAF5 5 Bay 8 /dev/da23 13 HEALTHY S1Z1SB0C 5 Bay 9 /dev/da4 32 HEALTHY Z1ZAEPR8 5 Bay 10 /dev/da24 12 HEALTHY Z1ZAB3ZD 5 Bay 11 /dev/da25 11 HEALTHY S1Z1RYGS 5 Bay 12 /dev/da26 10 HEALTHY S1Z1SB0A 5 Bay 13 /dev/da5 31 HEALTHY Z1ZAEPS5 5 Bay 14 /dev/da6 30 HEALTHY Z1ZAF5GQ 5 Bay 15 /dev/da7 29 HEALTHY Z1ZAB40S 5 Bay 16 /dev/da27 9 HEALTHY Z1ZAF625 5 Bay 17 /dev/da8 28 HEALTHY Z1ZAEPJY 5 Bay 18 /dev/da9 27 HEALTHY Z1ZAF1LG 5 Bay 19 /dev/da10 26 HEALTHY Z1ZAF724 5 Bay 20 /dev/da28 8 HEALTHY Z1ZAF5W8 5 Bay 21 /dev/da11 25 HEALTHY Z1ZAEW1W 5 Bay 22 /dev/da12 24 HEALTHY Z1ZAF0CW 5 Bay 23 /dev/da29 7 HEALTHY Z1ZAF5VM 5 Bay 24 /dev/da30 6 HEALTHY Z1ZAF59X 5 Bay 25 /dev/da31 5 HEALTHY Z1ZAF21G 5 Bay 26 /dev/da32 4 HEALTHY Z1ZAF5QJ 5 Bay 27 /dev/da33 3 HEALTHY Z1ZAF58Y 5 Bay 28 /dev/da13 23 HEALTHY Z1ZAF6CG 5 Bay 29 /dev/da34 2 HEALTHY Z1ZAB3XJ 5 Bay 30 /dev/da14 22 HEALTHY S1Z1RYHB 5 Bay 31 /dev/da35 1 HEALTHY Z1ZAB3TQ 5 Bay 32 /dev/da15 21 HEALTHY Z1ZAEPYX 5 Bay 33 /dev/da36 0 HEALTHY Z1ZAF4Z0 5 Bay 34 /dev/da16 20 HEALTHY Z1ZAEPMC 5 Bay 35 /dev/da17 19 HEALTHY Z1ZAF4H4 5 Bay 36 /dev/da18 18 HEALTHY Z1ZAF6JA ----------------------------------------------- Total: 36
Use the isi_for_array command to select node 4 and list its drives:
BIC-Isilon-Cluster-1# isi_for_array -n4 isi devices drive list BIC-Isilon-Cluster-4: Lnn Location Device Lnum State Serial BIC-Isilon-Cluster-4: ----------------------------------------------- BIC-Isilon-Cluster-4: 4 Bay 1 /dev/da1 35 HEALTHY S1Z1STTN BIC-Isilon-Cluster-4: 4 Bay 2 /dev/da2 36 HEALTHY Z1Z9XE67 BIC-Isilon-Cluster-4: 4 Bay 3 /dev/da19 17 HEALTHY S1Z1NE5B BIC-Isilon-Cluster-4: 4 Bay 4 /dev/da20 16 HEALTHY S1Z1QQBN BIC-Isilon-Cluster-4: 4 Bay 5 /dev/da3 33 HEALTHY S1Z1RYJ0 BIC-Isilon-Cluster-4: 4 Bay 6 /dev/da21 15 HEALTHY S1Z1SL53 BIC-Isilon-Cluster-4: 4 Bay 7 /dev/da22 14 HEALTHY S1Z1QNVG BIC-Isilon-Cluster-4: 4 Bay 8 /dev/da23 13 HEALTHY S1Z1R8TT BIC-Isilon-Cluster-4: 4 Bay 9 /dev/da4 32 HEALTHY S1Z1SLDG BIC-Isilon-Cluster-4: 4 Bay 10 /dev/da24 12 HEALTHY S1Z1RVGX BIC-Isilon-Cluster-4: 4 Bay 11 /dev/da25 11 HEALTHY S1Z1QNSG BIC-Isilon-Cluster-4: 4 Bay 12 /dev/da26 10 HEALTHY S1Z1NEGJ BIC-Isilon-Cluster-4: 4 Bay 13 /dev/da5 31 HEALTHY S1Z1QR9E BIC-Isilon-Cluster-4: 4 Bay 14 /dev/da6 30 HEALTHY S1Z1SL23 BIC-Isilon-Cluster-4: 4 Bay 15 /dev/da7 29 HEALTHY S1Z1NEPA BIC-Isilon-Cluster-4: 4 Bay 16 /dev/da27 9 HEALTHY S1Z1SLAZ BIC-Isilon-Cluster-4: 4 Bay 17 /dev/da8 28 HEALTHY S1Z1STT6 BIC-Isilon-Cluster-4: 4 Bay 18 /dev/da9 27 HEALTHY S1Z1SL2W BIC-Isilon-Cluster-4: 4 Bay 19 /dev/da10 26 HEALTHY S1Z1SL4P BIC-Isilon-Cluster-4: 4 Bay 20 /dev/da28 8 HEALTHY S1Z1QS4J BIC-Isilon-Cluster-4: 4 Bay 21 /dev/da11 25 HEALTHY S1Z1SAXY BIC-Isilon-Cluster-4: 4 Bay 22 /dev/da12 24 HEALTHY S1Z1SL9J BIC-Isilon-Cluster-4: 4 Bay 23 /dev/da29 7 HEALTHY S1Z1NFS6 BIC-Isilon-Cluster-4: 4 Bay 24 /dev/da30 6 HEALTHY S1Z1NE26 BIC-Isilon-Cluster-4: 4 Bay 25 /dev/da31 5 HEALTHY S1Z1RX6H BIC-Isilon-Cluster-4: 4 Bay 26 /dev/da32 4 HEALTHY S1Z1QRTK BIC-Isilon-Cluster-4: 4 Bay 27 /dev/da33 3 HEALTHY S1Z1SAWG BIC-Isilon-Cluster-4: 4 Bay 28 /dev/da13 23 HEALTHY S1Z1QR5B BIC-Isilon-Cluster-4: 4 Bay 29 /dev/da34 2 HEALTHY S1Z1RVEK BIC-Isilon-Cluster-4: 4 Bay 30 /dev/da14 22 HEALTHY S1Z1SLAN BIC-Isilon-Cluster-4: 4 Bay 31 /dev/da35 1 HEALTHY S1Z1QPES BIC-Isilon-Cluster-4: 4 Bay 32 /dev/da15 21 HEALTHY S1Z1SLBR BIC-Isilon-Cluster-4: 4 Bay 33 /dev/da36 0 HEALTHY S1Z1SAXM BIC-Isilon-Cluster-4: 4 Bay 34 /dev/da16 20 HEALTHY S1Z1RVJX BIC-Isilon-Cluster-4: 4 Bay 35 /dev/da17 19 HEALTHY S1Z1RV62 BIC-Isilon-Cluster-4: 4 Bay 36 /dev/da18 18 HEALTHY S1Z1RYH9 BIC-Isilon-Cluster-4: ----------------------------------------------- BIC-Isilon-Cluster-4: Total: 36
Loop through the cluster nodes and grep for non-healthy drives using isi_for_array:
BIC-Isilon-Cluster-4# isi_for_array "isi devices drive list| grep -iv healthy" BIC-Isilon-Cluster-2: Lnn Location Device Lnum State Serial BIC-Isilon-Cluster-2: ----------------------------------------------- BIC-Isilon-Cluster-2: ----------------------------------------------- BIC-Isilon-Cluster-2: Total: 36 BIC-Isilon-Cluster-3: Lnn Location Device Lnum State Serial BIC-Isilon-Cluster-3: ----------------------------------------------- BIC-Isilon-Cluster-3: ----------------------------------------------- BIC-Isilon-Cluster-3: Total: 36 BIC-Isilon-Cluster-4: Lnn Location Device Lnum State Serial BIC-Isilon-Cluster-4: ----------------------------------------------- BIC-Isilon-Cluster-4: 4 Bay 2 - N/A REPLACE - BIC-Isilon-Cluster-4: ----------------------------------------------- BIC-Isilon-Cluster-4: Total: 36 BIC-Isilon-Cluster-1: Lnn Location Device Lnum State Serial BIC-Isilon-Cluster-1: ----------------------------------------------- BIC-Isilon-Cluster-1: ----------------------------------------------- BIC-Isilon-Cluster-1: Total: 36 BIC-Isilon-Cluster-5: Lnn Location Device Lnum State Serial BIC-Isilon-Cluster-5: ----------------------------------------------- BIC-Isilon-Cluster-5: ----------------------------------------------- BIC-Isilon-Cluster-5: Total: 36
View Drive Firmware Status
BIC-Isilon-Cluster-5# isi devices drive firmware list --node-lnn all Lnn Location Firmware Desired Model ----------------------------------------------------- 1 Bay 1 SNG4 - ST4000NM0033-9ZM170 1 Bay 2 SNG4 - ST4000NM0033-9ZM170 1 Bay 3 SNG4 - ST4000NM0033-9ZM170 1 Bay 4 SNG4 - ST4000NM0033-9ZM170 1 Bay 5 SNG4 - ST4000NM0033-9ZM170 1 Bay 6 SNG4 - ST4000NM0033-9ZM170 1 Bay 7 SNG4 - ST4000NM0033-9ZM170 1 Bay 8 SNG4 - ST4000NM0033-9ZM170 1 Bay 9 SNG4 - ST4000NM0033-9ZM170 1 Bay 10 SNG4 - ST4000NM0033-9ZM170 1 Bay 11 SNG4 - ST4000NM0033-9ZM170 1 Bay 12 SNG4 - ST4000NM0033-9ZM170 1 Bay 13 SNG4 - ST4000NM0033-9ZM170 1 Bay 14 SNG4 - ST4000NM0033-9ZM170 1 Bay 15 SNG4 - ST4000NM0033-9ZM170 1 Bay 16 SNG4 - ST4000NM0033-9ZM170 1 Bay 17 SNG4 - ST4000NM0033-9ZM170 1 Bay 18 SNG4 - ST4000NM0033-9ZM170 1 Bay 19 SNG4 - ST4000NM0033-9ZM170 1 Bay 20 SNG4 - ST4000NM0033-9ZM170 1 Bay 21 SNG4 - ST4000NM0033-9ZM170 1 Bay 22 SNG4 - ST4000NM0033-9ZM170 1 Bay 23 SNG4 - ST4000NM0033-9ZM170 1 Bay 24 SNG4 - ST4000NM0033-9ZM170 1 Bay 25 SNG4 - ST4000NM0033-9ZM170 1 Bay 26 SNG4 - ST4000NM0033-9ZM170 1 Bay 27 SNG4 - ST4000NM0033-9ZM170 1 Bay 28 SNG4 - ST4000NM0033-9ZM170 1 Bay 29 SNG4 - ST4000NM0033-9ZM170 1 Bay 30 SNG4 - ST4000NM0033-9ZM170 1 Bay 31 SNG4 - ST4000NM0033-9ZM170 1 Bay 32 SNG4 - ST4000NM0033-9ZM170 1 Bay 33 SNG4 - ST4000NM0033-9ZM170 1 Bay 34 SNG4 - ST4000NM0033-9ZM170 1 Bay 35 SNG4 - ST4000NM0033-9ZM170 1 Bay 36 SNG4 - ST4000NM0033-9ZM170 2 Bay 1 SNG4 - ST4000NM0033-9ZM170 2 Bay 2 SNG4 - ST4000NM0033-9ZM170 2 Bay 3 SNG4 - ST4000NM0033-9ZM170 2 Bay 4 SNG4 - ST4000NM0033-9ZM170 2 Bay 5 SNG4 - ST4000NM0033-9ZM170 2 Bay 6 SNG4 - ST4000NM0033-9ZM170 2 Bay 7 SNG4 - ST4000NM0033-9ZM170 2 Bay 8 SNG4 - ST4000NM0033-9ZM170 2 Bay 9 SNG4 - ST4000NM0033-9ZM170 2 Bay 10 SNG4 - ST4000NM0033-9ZM170 2 Bay 11 SNG4 - ST4000NM0033-9ZM170 2 Bay 12 SNG4 - ST4000NM0033-9ZM170 2 Bay 13 SNG4 - ST4000NM0033-9ZM170 2 Bay 14 SNG4 - ST4000NM0033-9ZM170 2 Bay 15 SNG4 - ST4000NM0033-9ZM170 2 Bay 16 SNG4 - ST4000NM0033-9ZM170 2 Bay 17 SNG4 - ST4000NM0033-9ZM170 2 Bay 18 SNG4 - ST4000NM0033-9ZM170 2 Bay 19 SNG4 - ST4000NM0033-9ZM170 2 Bay 20 SNG4 - ST4000NM0033-9ZM170 2 Bay 21 SNG4 - ST4000NM0033-9ZM170 2 Bay 22 SNG4 - ST4000NM0033-9ZM170 2 Bay 23 SNG4 - ST4000NM0033-9ZM170 2 Bay 24 SNG4 - ST4000NM0033-9ZM170 2 Bay 25 SNG4 - ST4000NM0033-9ZM170 2 Bay 26 SNG4 - ST4000NM0033-9ZM170 2 Bay 27 SNG4 - ST4000NM0033-9ZM170 2 Bay 28 SNG4 - ST4000NM0033-9ZM170 2 Bay 29 SNG4 - ST4000NM0033-9ZM170 2 Bay 30 SNG4 - ST4000NM0033-9ZM170 2 Bay 31 SNG4 - ST4000NM0033-9ZM170 2 Bay 32 SNG4 - ST4000NM0033-9ZM170 2 Bay 33 SNG4 - ST4000NM0033-9ZM170 2 Bay 34 SNG4 - ST4000NM0033-9ZM170 2 Bay 35 SNG4 - ST4000NM0033-9ZM170 2 Bay 36 SNG4 - ST4000NM0033-9ZM170 4 Bay 1 SNG4 - ST4000NM0033-9ZM170 4 Bay 2 SNG4 - ST4000NM0033-9ZM170 4 Bay 3 SNG4 - ST4000NM0033-9ZM170 4 Bay 4 SNG4 - ST4000NM0033-9ZM170 4 Bay 5 SNG4 - ST4000NM0033-9ZM170 4 Bay 6 SNG4 - ST4000NM0033-9ZM170 4 Bay 7 SNG4 - ST4000NM0033-9ZM170 4 Bay 8 SNG4 - ST4000NM0033-9ZM170 4 Bay 9 SNG4 - ST4000NM0033-9ZM170 4 Bay 10 SNG4 - ST4000NM0033-9ZM170 4 Bay 11 SNG4 - ST4000NM0033-9ZM170 4 Bay 12 SNG4 - ST4000NM0033-9ZM170 4 Bay 13 SNG4 - ST4000NM0033-9ZM170 4 Bay 14 SNG4 - ST4000NM0033-9ZM170 4 Bay 15 SNG4 - ST4000NM0033-9ZM170 4 Bay 16 SNG4 - ST4000NM0033-9ZM170 4 Bay 17 SNG4 - ST4000NM0033-9ZM170 4 Bay 18 SNG4 - ST4000NM0033-9ZM170 4 Bay 19 SNG4 - ST4000NM0033-9ZM170 4 Bay 20 SNG4 - ST4000NM0033-9ZM170 4 Bay 21 SNG4 - ST4000NM0033-9ZM170 4 Bay 22 SNG4 - ST4000NM0033-9ZM170 4 Bay 23 SNG4 - ST4000NM0033-9ZM170 4 
Bay 24 SNG4 - ST4000NM0033-9ZM170 4 Bay 25 SNG4 - ST4000NM0033-9ZM170 4 Bay 26 SNG4 - ST4000NM0033-9ZM170 4 Bay 27 SNG4 - ST4000NM0033-9ZM170 4 Bay 28 SNG4 - ST4000NM0033-9ZM170 4 Bay 29 SNG4 - ST4000NM0033-9ZM170 4 Bay 30 SNG4 - ST4000NM0033-9ZM170 4 Bay 31 SNG4 - ST4000NM0033-9ZM170 4 Bay 32 SNG4 - ST4000NM0033-9ZM170 4 Bay 33 SNG4 - ST4000NM0033-9ZM170 4 Bay 34 SNG4 - ST4000NM0033-9ZM170 4 Bay 35 SNG4 - ST4000NM0033-9ZM170 4 Bay 36 SNG4 - ST4000NM0033-9ZM170 3 Bay 1 SNG4 - ST4000NM0033-9ZM170 3 Bay 2 SNG4 - ST4000NM0033-9ZM170 3 Bay 3 SNG4 - ST4000NM0033-9ZM170 3 Bay 4 SNG4 - ST4000NM0033-9ZM170 3 Bay 5 SNG4 - ST4000NM0033-9ZM170 3 Bay 6 SNG4 - ST4000NM0033-9ZM170 3 Bay 7 SNG4 - ST4000NM0033-9ZM170 3 Bay 8 SNG4 - ST4000NM0033-9ZM170 3 Bay 9 SNG4 - ST4000NM0033-9ZM170 3 Bay 10 SNG4 - ST4000NM0033-9ZM170 3 Bay 11 SNG4 - ST4000NM0033-9ZM170 3 Bay 12 SNG4 - ST4000NM0033-9ZM170 3 Bay 13 SNG4 - ST4000NM0033-9ZM170 3 Bay 14 SNG4 - ST4000NM0033-9ZM170 3 Bay 15 SNG4 - ST4000NM0033-9ZM170 3 Bay 16 SNG4 - ST4000NM0033-9ZM170 3 Bay 17 SNG4 - ST4000NM0033-9ZM170 3 Bay 18 SNG4 - ST4000NM0033-9ZM170 3 Bay 19 SNG4 - ST4000NM0033-9ZM170 3 Bay 20 SNG4 - ST4000NM0033-9ZM170 3 Bay 21 SNG4 - ST4000NM0033-9ZM170 3 Bay 22 SNG4 - ST4000NM0033-9ZM170 3 Bay 23 SNG4 - ST4000NM0033-9ZM170 3 Bay 24 SNG4 - ST4000NM0033-9ZM170 3 Bay 25 SNG4 - ST4000NM0033-9ZM170 3 Bay 26 SNG4 - ST4000NM0033-9ZM170 3 Bay 27 SNG4 - ST4000NM0033-9ZM170 3 Bay 28 SNG4 - ST4000NM0033-9ZM170 3 Bay 29 SNG4 - ST4000NM0033-9ZM170 3 Bay 30 SNG4 - ST4000NM0033-9ZM170 3 Bay 31 SNG4 - ST4000NM0033-9ZM170 3 Bay 32 SNG4 - ST4000NM0033-9ZM170 3 Bay 33 SNG4 - ST4000NM0033-9ZM170 3 Bay 34 SNG4 - ST4000NM0033-9ZM170 3 Bay 35 SNG4 - ST4000NM0033-9ZM170 3 Bay 36 SNG4 - ST4000NM0033-9ZM170 5 Bay 1 SNG4 - ST4000NM0033-9ZM170 5 Bay 2 SNG4 - ST4000NM0033-9ZM170 5 Bay 3 SNG4 - ST4000NM0033-9ZM170 5 Bay 4 SNG4 - ST4000NM0033-9ZM170 5 Bay 5 SNG4 - ST4000NM0033-9ZM170 5 Bay 6 SNG4 - ST4000NM0033-9ZM170 5 Bay 7 SNG4 - ST4000NM0033-9ZM170 5 Bay 8 SNG4 - ST4000NM0033-9ZM170 5 Bay 9 SNG4 - ST4000NM0033-9ZM170 5 Bay 10 SNG4 - ST4000NM0033-9ZM170 5 Bay 11 SNG4 - ST4000NM0033-9ZM170 5 Bay 12 SNG4 - ST4000NM0033-9ZM170 5 Bay 13 SNG4 - ST4000NM0033-9ZM170 5 Bay 14 SNG4 - ST4000NM0033-9ZM170 5 Bay 15 SNG4 - ST4000NM0033-9ZM170 5 Bay 16 SNG4 - ST4000NM0033-9ZM170 5 Bay 17 SNG4 - ST4000NM0033-9ZM170 5 Bay 18 SNG4 - ST4000NM0033-9ZM170 5 Bay 19 SNG4 - ST4000NM0033-9ZM170 5 Bay 20 SNG4 - ST4000NM0033-9ZM170 5 Bay 21 SNG4 - ST4000NM0033-9ZM170 5 Bay 22 SNG4 - ST4000NM0033-9ZM170 5 Bay 23 SNG4 - ST4000NM0033-9ZM170 5 Bay 24 SNG4 - ST4000NM0033-9ZM170 5 Bay 25 SNG4 - ST4000NM0033-9ZM170 5 Bay 26 SNG4 - ST4000NM0033-9ZM170 5 Bay 27 SNG4 - ST4000NM0033-9ZM170 5 Bay 28 SNG4 - ST4000NM0033-9ZM170 5 Bay 29 SNG4 - ST4000NM0033-9ZM170 5 Bay 30 SNG4 - ST4000NM0033-9ZM170 5 Bay 31 SNG4 - ST4000NM0033-9ZM170 5 Bay 32 SNG4 - ST4000NM0033-9ZM170 5 Bay 33 SNG4 - ST4000NM0033-9ZM170 5 Bay 34 SNG4 - ST4000NM0033-9ZM170 5 Bay 35 SNG4 - ST4000NM0033-9ZM170 5 Bay 36 SNG4 - ST4000NM0033-9ZM170 ----------------------------------------------------- Total: 180
Add a drive to a node:
BIC-Isilon-Cluster-5# isi devices add <bay> --node-lnn=<node#>
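- In OneFS 8 the same operations also live under the drive subcommand, which is what the replacement procedure below actually uses (format first if the drive isn't OneFS-formatted, then add):
BIC-Isilon-Cluster-5# isi devices drive format <bay>
BIC-Isilon-Cluster-5# isi devices drive add <bay>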
Disk Failure Replacement Procedure
- A disk in bay 4 of Logical Node Number 5 (node 5) is bad.
### Remove bad disk, insert new one. # List disk devices on node 5: BIC-Isilon-Cluster-4# isi_for_array -n5 isi devices drive list BIC-Isilon-Cluster-5: Lnn Location Device Lnum State Serial BIC-Isilon-Cluster-5: ----------------------------------------------- BIC-Isilon-Cluster-5: 5 Bay 1 /dev/da1 35 HEALTHY S1Z1S6BY BIC-Isilon-Cluster-5: 5 Bay 2 /dev/da2 34 HEALTHY Z1ZAECJM BIC-Isilon-Cluster-5: 5 Bay 3 /dev/da19 17 HEALTHY S1Z1SB0L BIC-Isilon-Cluster-5: 5 Bay 4 /dev/da20 N/A NEW K4K73KGB BIC-Isilon-Cluster-5: 5 Bay 5 /dev/da3 33 HEALTHY Z1ZA74A4 BIC-Isilon-Cluster-5: 5 Bay 6 /dev/da21 15 HEALTHY Z1ZAEQ13 BIC-Isilon-Cluster-5: 5 Bay 7 /dev/da22 14 HEALTHY S1Z1SAF5 BIC-Isilon-Cluster-5: 5 Bay 8 /dev/da23 13 HEALTHY S1Z1SB0C BIC-Isilon-Cluster-5: 5 Bay 9 /dev/da4 32 HEALTHY Z1ZAEPR8 BIC-Isilon-Cluster-5: 5 Bay 10 /dev/da24 36 HEALTHY S1Z26JWM BIC-Isilon-Cluster-5: 5 Bay 11 /dev/da25 11 HEALTHY S1Z1RYGS BIC-Isilon-Cluster-5: 5 Bay 12 /dev/da26 10 HEALTHY S1Z1SB0A BIC-Isilon-Cluster-5: 5 Bay 13 /dev/da5 31 HEALTHY Z1ZAEPS5 BIC-Isilon-Cluster-5: 5 Bay 14 /dev/da6 30 HEALTHY Z1ZAF5GQ BIC-Isilon-Cluster-5: 5 Bay 15 /dev/da7 29 HEALTHY Z1ZAB40S BIC-Isilon-Cluster-5: 5 Bay 16 /dev/da27 9 HEALTHY Z1ZAF625 BIC-Isilon-Cluster-5: 5 Bay 17 /dev/da8 28 HEALTHY Z1ZAEPJY BIC-Isilon-Cluster-5: 5 Bay 18 /dev/da9 27 HEALTHY Z1ZAF1LG BIC-Isilon-Cluster-5: 5 Bay 19 /dev/da10 26 HEALTHY Z1ZAF724 BIC-Isilon-Cluster-5: 5 Bay 20 /dev/da28 8 HEALTHY Z1ZAF5W8 BIC-Isilon-Cluster-5: 5 Bay 21 /dev/da11 25 HEALTHY Z1ZAEW1W BIC-Isilon-Cluster-5: 5 Bay 22 /dev/da12 24 HEALTHY Z1ZAF0CW BIC-Isilon-Cluster-5: 5 Bay 23 /dev/da29 7 HEALTHY Z1ZAF5VM BIC-Isilon-Cluster-5: 5 Bay 24 /dev/da30 6 HEALTHY Z1ZAF59X BIC-Isilon-Cluster-5: 5 Bay 25 /dev/da31 5 HEALTHY Z1ZAF21G BIC-Isilon-Cluster-5: 5 Bay 26 /dev/da32 4 HEALTHY Z1ZAF5QJ BIC-Isilon-Cluster-5: 5 Bay 27 /dev/da33 3 HEALTHY Z1ZAF58Y BIC-Isilon-Cluster-5: 5 Bay 28 /dev/da13 23 HEALTHY Z1ZAF6CG BIC-Isilon-Cluster-5: 5 Bay 29 /dev/da34 2 HEALTHY Z1ZAB3XJ BIC-Isilon-Cluster-5: 5 Bay 30 /dev/da14 22 HEALTHY S1Z1RYHB BIC-Isilon-Cluster-5: 5 Bay 31 /dev/da35 1 HEALTHY Z1ZAB3TQ BIC-Isilon-Cluster-5: 5 Bay 32 /dev/da15 21 HEALTHY Z1ZAEPYX BIC-Isilon-Cluster-5: 5 Bay 33 /dev/da36 0 HEALTHY Z1ZAF4Z0 BIC-Isilon-Cluster-5: 5 Bay 34 /dev/da16 20 HEALTHY Z1ZAEPMC BIC-Isilon-Cluster-5: 5 Bay 35 /dev/da17 19 HEALTHY Z1ZAF4H4 BIC-Isilon-Cluster-5: 5 Bay 36 /dev/da18 18 HEALTHY Z1ZAF6JA BIC-Isilon-Cluster-5: ----------------------------------------------- BIC-Isilon-Cluster-5: Total: 36 # Check cluster status: BIC-Isilon-Cluster-4# isi status Cluster Name: BIC-Isilon-Cluster Cluster Health: [ ATTN] Cluster Storage: HDD SSD Storage Size: 638.0T (645.7T Raw) 0 (0 Raw) VHS Size: 7.7T Used: 204.8T (32%) 0 (n/a) Avail: 433.2T (68%) 0 (n/a) Health Throughput (bps) HDD Storage SSD Storage ID |IP Address |DASR | In Out Total| Used / Size |Used / Size ---+---------------+-----+-----+-----+-----+-----------------+----------------- 1|172.16.10.20 | OK |28.8M|83.3k|28.9M|41.0T/ 130T( 32%)|(No Storage SSDs) 2|172.16.10.21 | OK | 1.2M| 3.7M| 4.9M|40.9T/ 130T( 32%)|(No Storage SSDs) 3|172.16.10.22 | OK | 3.2k| 167k| 170k|41.0T/ 130T( 32%)|(No Storage SSDs) 4|172.16.10.23 | OK | 861k|49.2M|50.0M|41.0T/ 130T( 32%)|(No Storage SSDs) 5|172.16.10.24 |-A-- | 858k| 1.6M| 2.5M|41.0T/ 126T( 32%)|(No Storage SSDs) ---+---------------+-----+-----+-----+-----+-----------------+----------------- Cluster Totals: |31.7M|54.7M|86.4M| 205T/ 638T( 32%)|(No Storage SSDs) Health Fields: D = Down, A = Attention, S = Smartfailed, 
R = Read-Only Critical Events: 10/14 23:24 5 One or more drives (bay(s) 4 / location / type(s) HDD) are... Cluster Job Status: No running jobs. No paused or waiting jobs. # Try to add disk drive in bay 4 of node 5: BIC-Isilon-Cluster-4# isi devices add 4 --node-lnn=5 You are about to add drive bay4, on node lnn 5. Are you sure? (yes/[no]): yes Initiating add on bay4 The drive in bay4 was not added to the file system because it is not formatted. Format the drive to add it to the file system by running the following command, where <bay> is the bay number of the drive: isi devices drive format <bay> # Oups. It doesn't like it. The new disk is an Hitachi drive while # the cluster is built of Seagate ES3. # Login directly to node 5 as it's easier to do the stuff directly there. BIC-Isilon-Cluster-4# ssh BIC-Isilon-Cluster-5 Password: Last login: Sun Oct 15 02:31:18 2017 from 172.16.10.160 Copyright (c) 2001-2016 EMC Corporation. All Rights Reserved. Copyright (c) 1992-2016 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. Isilon OneFS v8.0.0.4 # Check the state of the drive in bay 4: BIC-Isilon-Cluster-5# isi devices drive view 4 Lnn: 5 Location: Bay 4 Lnum: N/A Device: /dev/da20 Baynum: 4 Handle: 333 Serial: K4K73KGB Model: HUS726040ALA610 Tech: SATA Media: HDD Blocks: 7814037168 Logical Block Length: 512 Physical Block Length: 512 WWN: 0000000000000000 State: NEW Purpose: UNKNOWN Purpose Description: A drive whose purpose is unknown Present: Yes # Check the difference between it and the drive in bay 3 (healthy): BIC-Isilon-Cluster-5# isi devices drive view 3 Lnn: 5 Location: Bay 3 Lnum: 17 Device: /dev/da19 Baynum: 3 Handle: 353 Serial: S1Z1SB0L Model: ST4000NM0033-9ZM170 Tech: SATA Media: HDD Blocks: 7814037168 Logical Block Length: 512 Physical Block Length: 512 WWN: 5000C5008CAD9092 State: HEALTHY Purpose: STORAGE Purpose Description: A drive used for normal data storage operation Present: Yes # Format the drive bay 4: BIC-Isilon-Cluster-5# isi devices drive format 4 You are about to format drive bay4, on node lnn 5. Are you sure? 
(yes/[no]): yes BIC-Isilon-Cluster-5# isi devices drive view 4 Lnn: 5 Location: Bay 4 Lnum: 37 Device: /dev/da20 Baynum: 4 Handle: 332 Serial: K4K73KGB Model: HUS726040ALA610 Tech: SATA Media: HDD Blocks: 7814037168 Logical Block Length: 512 Physical Block Length: 512 WWN: 0000000000000000 State: NONE Purpose: NONE Purpose Description: A drive that doesn't yet have a purpose Present: Yes # The drive shows up as 'PREPARING'now: BIC-Isilon-Cluster-5# isi devices drive list Lnn Location Device Lnum State Serial ------------------------------------------------- 5 Bay 1 /dev/da1 35 HEALTHY S1Z1S6BY 5 Bay 2 /dev/da2 34 HEALTHY Z1ZAECJM 5 Bay 3 /dev/da19 17 HEALTHY S1Z1SB0L 5 Bay 4 /dev/da20 37 PREPARING K4K73KGB 5 Bay 5 /dev/da3 33 HEALTHY Z1ZA74A4 5 Bay 6 /dev/da21 15 HEALTHY Z1ZAEQ13 5 Bay 7 /dev/da22 14 HEALTHY S1Z1SAF5 5 Bay 8 /dev/da23 13 HEALTHY S1Z1SB0C 5 Bay 9 /dev/da4 32 HEALTHY Z1ZAEPR8 5 Bay 10 /dev/da24 36 HEALTHY S1Z26JWM 5 Bay 11 /dev/da25 11 HEALTHY S1Z1RYGS 5 Bay 12 /dev/da26 10 HEALTHY S1Z1SB0A 5 Bay 13 /dev/da5 31 HEALTHY Z1ZAEPS5 5 Bay 14 /dev/da6 30 HEALTHY Z1ZAF5GQ 5 Bay 15 /dev/da7 29 HEALTHY Z1ZAB40S 5 Bay 16 /dev/da27 9 HEALTHY Z1ZAF625 5 Bay 17 /dev/da8 28 HEALTHY Z1ZAEPJY 5 Bay 18 /dev/da9 27 HEALTHY Z1ZAF1LG 5 Bay 19 /dev/da10 26 HEALTHY Z1ZAF724 5 Bay 20 /dev/da28 8 HEALTHY Z1ZAF5W8 5 Bay 21 /dev/da11 25 HEALTHY Z1ZAEW1W 5 Bay 22 /dev/da12 24 HEALTHY Z1ZAF0CW 5 Bay 23 /dev/da29 7 HEALTHY Z1ZAF5VM 5 Bay 24 /dev/da30 6 HEALTHY Z1ZAF59X 5 Bay 25 /dev/da31 5 HEALTHY Z1ZAF21G 5 Bay 26 /dev/da32 4 HEALTHY Z1ZAF5QJ 5 Bay 27 /dev/da33 3 HEALTHY Z1ZAF58Y 5 Bay 28 /dev/da13 23 HEALTHY Z1ZAF6CG 5 Bay 29 /dev/da34 2 HEALTHY Z1ZAB3XJ 5 Bay 30 /dev/da14 22 HEALTHY S1Z1RYHB 5 Bay 31 /dev/da35 1 HEALTHY Z1ZAB3TQ 5 Bay 32 /dev/da15 21 HEALTHY Z1ZAEPYX 5 Bay 33 /dev/da36 0 HEALTHY Z1ZAF4Z0 5 Bay 34 /dev/da16 20 HEALTHY Z1ZAEPMC 5 Bay 35 /dev/da17 19 HEALTHY Z1ZAF4H4 5 Bay 36 /dev/da18 18 HEALTHY Z1ZAF6JA ------------------------------------------------- # Add the drive bay 4: BIC-Isilon-Cluster-5# isi devices drive add 4 You are about to add drive bay4, on node lnn 5. Are you sure? (yes/[no]): yes Initiating add on bay4 The add operation is in-progress. A OneFS-formatted drive was found in bay4 and is being added to the file system. Wait a few minutes and then list all drives to verify that the add operation completed successfully. # There was a event while this was going on. # Not sure what is means as bay for is not a Seagate ES3 anymore. 
BIC-Isilon-Cluster-5# isi event groups view 4882746 ID: 4882746 Started: 10/18 14:20 Causes Long: Drive in bay 4 location Bay 4 is unknown model ST4000NM0033-9ZM170 Lnn: 5 Devid: 6 Last Event: 2017-10-18T14:20:30 Ignore: No Ignore Time: Never Resolved: Yes Resolve Time: 2017-10-18T14:18:15 Ended: 10/18 14:18 Events: 2 Severity: warning # After a little while, minutes, the drive shows up good: BIC-Isilon-Cluster-5# isi devices drive list Lnn Location Device Lnum State Serial ----------------------------------------------- 5 Bay 1 /dev/da1 35 HEALTHY S1Z1S6BY 5 Bay 2 /dev/da2 34 HEALTHY Z1ZAECJM 5 Bay 3 /dev/da19 17 HEALTHY S1Z1SB0L 5 Bay 4 /dev/da20 37 HEALTHY K4K73KGB 5 Bay 5 /dev/da3 33 HEALTHY Z1ZA74A4 5 Bay 6 /dev/da21 15 HEALTHY Z1ZAEQ13 5 Bay 7 /dev/da22 14 HEALTHY S1Z1SAF5 5 Bay 8 /dev/da23 13 HEALTHY S1Z1SB0C 5 Bay 9 /dev/da4 32 HEALTHY Z1ZAEPR8 5 Bay 10 /dev/da24 36 HEALTHY S1Z26JWM 5 Bay 11 /dev/da25 11 HEALTHY S1Z1RYGS 5 Bay 12 /dev/da26 10 HEALTHY S1Z1SB0A 5 Bay 13 /dev/da5 31 HEALTHY Z1ZAEPS5 5 Bay 14 /dev/da6 30 HEALTHY Z1ZAF5GQ 5 Bay 15 /dev/da7 29 HEALTHY Z1ZAB40S 5 Bay 16 /dev/da27 9 HEALTHY Z1ZAF625 5 Bay 17 /dev/da8 28 HEALTHY Z1ZAEPJY 5 Bay 18 /dev/da9 27 HEALTHY Z1ZAF1LG 5 Bay 19 /dev/da10 26 HEALTHY Z1ZAF724 5 Bay 20 /dev/da28 8 HEALTHY Z1ZAF5W8 5 Bay 21 /dev/da11 25 HEALTHY Z1ZAEW1W 5 Bay 22 /dev/da12 24 HEALTHY Z1ZAF0CW 5 Bay 23 /dev/da29 7 HEALTHY Z1ZAF5VM 5 Bay 24 /dev/da30 6 HEALTHY Z1ZAF59X 5 Bay 25 /dev/da31 5 HEALTHY Z1ZAF21G 5 Bay 26 /dev/da32 4 HEALTHY Z1ZAF5QJ 5 Bay 27 /dev/da33 3 HEALTHY Z1ZAF58Y 5 Bay 28 /dev/da13 23 HEALTHY Z1ZAF6CG 5 Bay 29 /dev/da34 2 HEALTHY Z1ZAB3XJ 5 Bay 30 /dev/da14 22 HEALTHY S1Z1RYHB 5 Bay 31 /dev/da35 1 HEALTHY Z1ZAB3TQ 5 Bay 32 /dev/da15 21 HEALTHY Z1ZAEPYX 5 Bay 33 /dev/da36 0 HEALTHY Z1ZAF4Z0 5 Bay 34 /dev/da16 20 HEALTHY Z1ZAEPMC 5 Bay 35 /dev/da17 19 HEALTHY Z1ZAF4H4 5 Bay 36 /dev/da18 18 HEALTHY Z1ZAF6JA ----------------------------------------------- Total: 36 BIC-Isilon-Cluster-5# isi devices drive view 4 Lnn: 5 Location: Bay 4 Lnum: 37 Device: /dev/da20 Baynum: 4 Handle: 332 Serial: K4K73KGB Model: HUS726040ALA610 Tech: SATA Media: HDD Blocks: 7814037168 Logical Block Length: 512 Physical Block Length: 512 WWN: 0000000000000000 State: HEALTHY Purpose: STORAGE Purpose Description: A drive used for normal data storage operation Present: Yes # Check cluster state after resolving the event group. # I did resolve through the web UI as it is easier. 
BIC-Isilon-Cluster-5# isi status Cluster Name: BIC-Isilon-Cluster Cluster Health: [ OK ] Cluster Storage: HDD SSD Storage Size: 641.6T (649.3T Raw) 0 (0 Raw) VHS Size: 7.7T Used: 204.8T (32%) 0 (n/a) Avail: 436.8T (68%) 0 (n/a) Health Throughput (bps) HDD Storage SSD Storage ID |IP Address |DASR | In Out Total| Used / Size |Used / Size ---+---------------+-----+-----+-----+-----+-----------------+----------------- 1|172.16.10.20 | OK | 500k| 0| 500k|41.0T/ 130T( 32%)|(No Storage SSDs) 2|172.16.10.21 | OK | 4.6k| 320k| 325k|40.9T/ 130T( 32%)|(No Storage SSDs) 3|172.16.10.22 | OK | 0| 0| 0|41.0T/ 130T( 32%)|(No Storage SSDs) 4|172.16.10.23 | OK | 321k|83.4k| 404k|41.0T/ 130T( 32%)|(No Storage SSDs) 5|172.16.10.24 | OK | 1.1M| 0| 1.1M|41.0T/ 130T( 32%)|(No Storage SSDs) ---+---------------+-----+-----+-----+-----+-----------------+----------------- Cluster Totals: | 1.9M| 403k| 2.3M| 205T/ 642T( 32%)|(No Storage SSDs) Health Fields: D = Down, A = Attention, S = Smartfailed, R = Read-Only Critical Events: Cluster Job Status: Running jobs: Job Impact Pri Policy Phase Run Time -------------------------- ------ --- ---------- ----- ---------- MultiScan[4097] Low 4 LOW 1/4 0:02:18 No paused or waiting jobs. No failed jobs. Recent job results: Time Job Event --------------- -------------------------- ------------------------------ 10/18 04:00:22 ShadowStoreProtect[4096] Succeeded (LOW) 10/18 03:02:15 SnapshotDelete[4095] Succeeded (MEDIUM) 10/18 02:00:04 WormQueue[4094] Succeeded (LOW) 10/18 01:01:37 SnapshotDelete[4093] Succeeded (MEDIUM) 10/18 00:31:08 SnapshotDelete[4092] Succeeded (MEDIUM) 10/18 00:01:39 SnapshotDelete[4091] Succeeded (MEDIUM) 10/17 23:04:54 FSAnalyze[4089] Succeeded (LOW) 10/17 22:33:17 SnapshotDelete[4090] Succeeded (MEDIUM) 11/15 14:53:34 MultiScan[1254] MultiScan[1254] Failed 10/06 14:45:55 ChangelistCreate[975] ChangelistCreate[975] Failed
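- To keep an eye on the MultiScan job kicked off by the drive add without the full isi status screen, the job engine commands should do; I'm quoting these from memory, so double-check with --help:
BIC-Isilon-Cluster-5# isi job jobs list
BIC-Isilon-Cluster-5# isi job status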
Network
BIC-Isilon-Cluster-1# isi config Welcome to the Isilon IQ configuration console. Copyright (c) 2001-2016 EMC Corporation. All Rights Reserved. Enter 'help' to see list of available commands. Enter 'help <command>' to see help for a specific command. Enter 'quit' at any prompt to discard changes and exit. Node build: Isilon OneFS v8.0.0.0 B_8_0_0_037(RELEASE) Node serial number: SX410-301608-0260 BIC-Isilon-Cluster >>> status Configuration for 'BIC-Isilon-Cluster' Local machine: ----------------------------------+----------------------------------------- Node LNN : 1 | Date : 2016/06/09 15:53:01 EDT ----------------------------------+----------------------------------------- Interface : ib1 | MAC : 00:00:00:49:fe:80:00:00:00:00:00:00:7c:fe:90:03:00:9e:e9:a2 IP Address : 10.0.1.1 | MAC Options : none ----------------------------------+----------------------------------------- Interface : ib0 | MAC : 00:00:00:48:fe:80:00:00:00:00:00:00:7c:fe:90:03:00:9e:e9:a1 IP Address : 10.0.2.1 | MAC Options : none ----------------------------------+----------------------------------------- Interface : lo0 | MAC : 00:00:00:00:00:00 IP Address : 10.0.3.1 | MAC Options : none ----------------------------------+----------------------------------------- Network: ----------------------------------+----------------------------------------- JoinMode : Manual Interfaces: ----------------------------------+----------------------------------------- Interface : int-a | Flags : enabled_ok Netmask : 255.255.255.0 | MTU : N/A ----------------+-----------------+------------------+---------------------- Low IP | High IP | Allocated | Free ----------------+-----------------+------------------+---------------------- 10.0.1.1 | 10.0.1.254 | 5 | 249 ----------------+-----------------+------------------+---------------------- Interface : int-b | Flags : enabled_ok Netmask : 255.255.255.0 | MTU : N/A ----------------+-----------------+------------------+---------------------- Low IP | High IP | Allocated | Free ----------------+-----------------+------------------+---------------------- 10.0.2.1 | 10.0.2.254 | 5 | 249 ----------------+-----------------+------------------+---------------------- Interface : lpbk | Flags : enabled_ok cluster_traffic failover Netmask : 255.255.255.0 | MTU : 1500 ----------------+-----------------+------------------+---------------------- Low IP | High IP | Allocated | Free ----------------+-----------------+------------------+---------------------- 10.0.3.1 | 10.0.3.254 | 5 | 249 ----------------+-----------------+------------------+----------------------
- Initial groupnet and DNS client settings:
BIC-Isilon-Cluster-4# isi network groupnets list ID DNS Cache Enabled DNS Search DNS Servers Subnets ----------------------------------------------------------------- groupnet0 True - 132.206.178.7 mgmt 132.206.178.186 prod node ----------------------------------------------------------------- Total: 1 BIC-Isilon-Cluster-4# isi network groupnets view groupnet0 ID: groupnet0 Name: groupnet0 Description: Initial groupnet DNS Cache Enabled: True DNS Options: - DNS Search: - DNS Servers: 132.206.178.7, 132.206.178.186 Server Side DNS Search: True Subnets: mgmt, prod, node
- List and view the network subnets defined in the cluster:
BIC-Isilon-Cluster-4# isi network subnets list
ID             Subnet           Gateway|Priority Pools SC Service
------------------------------------------------------------------------
groupnet0.mgmt 172.16.10.0/24   172.16.10.1|2    mgmt  0.0.0.0
groupnet0.node 172.16.20.0/23   172.16.20.1|3    pool1 172.16.20.232
groupnet0.prod 132.206.178.0/24 132.206.178.1|1  pool0 132.206.178.232
------------------------------------------------------------------------
Total: 3
- List and view the network pools defined in the cluster.
- Note that the IP allocation for the pool groupnet0.prod.pool0 is set to dynamic. This requires a SmartConnect Advanced license.
BIC-Isilon-Cluster-4# isi network pools list ID SC Zone Allocation Method groupnet0.mgmt.mgmt mgmt.isi.bic.mni.mcgill.ca static groupnet0.node.pool1 nfs.isi-node.bic.mni.mcgill.ca dynamic groupnet0.prod.pool0 nfs.isi.bic.mni.mcgill.ca dynamic ---------------------------------------------------------------------- Total: 3 BIC-Isilon-Cluster-4# isi network pools view groupnet0.mgmt.mgmt ID: groupnet0.mgmt.mgmt Groupnet: groupnet0 Subnet: mgmt Name: mgmt Rules: - Access Zone: System Allocation Method: static Aggregation Mode: lacp SC Suspended Nodes: - Description: - Ifaces: 1:ext-1, 2:ext-1, 4:ext-1, 3:ext-1, 5:ext-1 IP Ranges: 172.16.10.20-172.16.10.24 Rebalance Policy: auto SC Auto Unsuspend Delay: 0 SC Connect Policy: round_robin SC Zone: mgmt.isi.bic.mni.mcgill.ca SC DNS Zone Aliases: - SC Failover Policy: round_robin SC Subnet: prod SC Ttl: 0 Static Routes: - BIC-Isilon-Cluster-4# isi network pools view groupnet0.prod.pool0 ID: groupnet0.prod.pool0 Groupnet: groupnet0 Subnet: prod Name: pool0 Rules: - Access Zone: prod Allocation Method: dynamic Aggregation Mode: lacp SC Suspended Nodes: - Description: - Ifaces: 1:10gige-agg-1, 2:10gige-agg-1, 4:10gige-agg-1, 3:10gige-agg-1, 5:10gige-agg-1 IP Ranges: 132.206.178.233-132.206.178.237 Rebalance Policy: auto SC Auto Unsuspend Delay: 0 SC Connect Policy: round_robin SC Zone: nfs.isi.bic.mni.mcgill.ca SC DNS Zone Aliases: - SC Failover Policy: round_robin SC Subnet: prod SC Ttl: 0 Static Routes: - IC-Isilon-Cluster-2# isi network pools view groupnet0.node.pool1 ID: groupnet0.node.pool1 Groupnet: groupnet0 Subnet: node Name: pool1 Rules: - Access Zone: prod Allocation Method: dynamic Aggregation Mode: lacp SC Suspended Nodes: - Description: - Ifaces: 1:10gige-agg-1, 2:10gige-agg-1, 4:10gige-agg-1, 3:10gige-agg-1, 5:10gige-agg-1 IP Ranges: 172.16.20.233-172.16.20.237 Rebalance Policy: auto SC Auto Unsuspend Delay: 0 SC Connect Policy: round_robin SC Zone: nfs.isi-node.bic.mni.mcgill.ca SC DNS Zone Aliases: - SC Failover Policy: round_robin SC Subnet: node SC Ttl: 0 Static Routes: -
- Display network interfaces configuration:
BIC-Isilon-Cluster-4# isi network interfaces list LNN Name Status Owners IP Addresses -------------------------------------------------------------------- 1 10gige-1 Up - - 1 10gige-2 Up - - 1 10gige-agg-1 Up groupnet0.prod.pool0 132.206.178.237 groupnet0.node.pool1 172.16.20.237 1 ext-1 Up groupnet0.mgmt.mgmt 172.16.10.20 1 ext-2 No Carrier - - 1 ext-agg Not Available - - 2 10gige-1 Up - - 2 10gige-2 Up - - 2 10gige-agg-1 Up groupnet0.prod.pool0 132.206.178.236 groupnet0.node.pool1 172.16.20.236 2 ext-1 Up groupnet0.mgmt.mgmt 172.16.10.21 2 ext-2 No Carrier - - 2 ext-agg Not Available - - 3 10gige-1 Up - - 3 10gige-2 Up - - 3 10gige-agg-1 Up groupnet0.prod.pool0 132.206.178.234 groupnet0.node.pool1 172.16.20.234 3 ext-1 Up groupnet0.mgmt.mgmt 172.16.10.22 3 ext-2 No Carrier - - 3 ext-agg Not Available - - 4 10gige-1 Up - - 4 10gige-2 Up - - 4 10gige-agg-1 Up groupnet0.prod.pool0 132.206.178.235 groupnet0.node.pool1 172.16.20.235 4 ext-1 Up groupnet0.mgmt.mgmt 172.16.10.23 4 ext-2 No Carrier - - 4 ext-agg Not Available - - 5 10gige-1 Up - - 5 10gige-2 Up - - 5 10gige-agg-1 Up groupnet0.prod.pool0 132.206.178.233 groupnet0.node.pool1 172.16.20.233 5 ext-1 Up groupnet0.mgmt.mgmt 172.16.10.24 5 ext-2 No Carrier - - 5 ext-agg Not Available - - -------------------------------------------------------------------- Total: 30
- Suspend or resume a node:
From the docu65065_OneFS-8.0.0-CLI-Administration-Guide, page 950:
Suspend or resume a node
You can suspend and resume SmartConnect DNS query responses on a node.
Procedure
1. To suspend DNS query responses for a node:
   a. (Optional) To identify a list of nodes and IP address pools, run the following command:
      isi network interfaces list
   b. Run the isi network pools sc-suspend-nodes command and specify the pool ID and logical node number (LNN).
      Specify the pool ID you want in the following format: <groupnet_name>.<subnet_name>.<pool_name>
      The following command suspends DNS query responses on node 3 when queries come through IP addresses in pool5 under groupnet1.subnet3:
      isi network pools sc-suspend-nodes groupnet1.subnet3.pool5 3
2. To resume DNS query responses for an IP address pool, run the isi network pools sc-resume-nodes command and specify the pool ID and logical node number (LNN).
   The following command resumes DNS query responses on node 3 when queries come through IP addresses in pool5 under groupnet1.subnet3:
   isi network pools sc-resume-nodes groupnet1.subnet3.pool5 3
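- Translated to this cluster, suspending and later resuming SmartConnect answers for node 5 on the prod pool would look like this (not tested here yet):
BIC-Isilon-Cluster-4# isi network pools sc-suspend-nodes groupnet0.prod.pool0 5
BIC-Isilon-Cluster-4# isi network pools sc-resume-nodes groupnet0.prod.pool0 5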
Example of an IP Failover with Dynamic Allocation Method
- First, set the dynamic IP allocation for the pool:
isi network pools modify groupnet0.prod.pool0 --alloc-method=dynamic
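- Quick check that the change took:
BIC-Isilon-Cluster-4# isi network pools view groupnet0.prod.pool0 | grep 'Allocation Method'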
- Then pull the fiber cables from one node, say node 5 and watch what happens:
- Before pulling the cables:
BIC-Isilon-Cluster-4# isi network interfaces list ... 3 10gige-1 Up - - 3 10gige-2 Up - - 3 10gige-agg-1 Up groupnet0.prod.pool0 132.206.178.235 3 ext-1 Up groupnet0.mgmt.mgmt 172.16.10.22 3 ext-2 No Carrier - - 3 ext-agg Not Available - - 4 10gige-1 Up - - ... 5 10gige-1 Up - - 5 10gige-2 Up - - 5 10gige-agg-1 Up groupnet0.prod.pool0 132.206.178.237 5 ext-1 Up groupnet0.mgmt.mgmt 172.16.10.24 5 ext-2 No Carrier - - 5 ext-agg Not Available - - -------------------------------------------------------------------- Total: 30
- After pulling the cables:
- Node 5's external interfaces 10gige-1, 10gige-2 and 10gige-agg-1 now show No Carrier.
- Note how node 3's external interface 10gige-agg-1 picked up the IP of node 5.
BIC-Isilon-Cluster-4# isi network interfaces list LNN Name Status Owners IP Addresses ... 3 10gige-1 Up - - 3 10gige-2 Up - - 3 10gige-agg-1 Up groupnet0.prod.pool0 132.206.178.235 132.206.178.237 3 ext-1 Up groupnet0.mgmt.mgmt 172.16.10.22 3 ext-2 No Carrier - - 3 ext-agg Not Available - - ... 5 10gige-1 No Carrier - - 5 10gige-2 No Carrier - - 5 10gige-agg-1 No Carrier groupnet0.prod.pool0 - 5 ext-1 Up groupnet0.mgmt.mgmt 172.16.10.24 5 ext-2 No Carrier - - 5 ext-agg Not Available - -
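- After plugging the cables back in, the same listing should show node 5 picking an IP back up, since the pool's rebalance policy is auto:
BIC-Isilon-Cluster-4# isi network interfaces list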
How To Add a Subnet to a Cluster
- Goal: have clients access the Isilon cluster through the private network 172.16.20.0/24 (the data network).
- Hosts in the arnodes compute cluster have 2 extra bonded NICs configured on this network.
- The private network 172.16.20.0/24 is directly attached to the cluster's front-end: there are no intervening gateways or routers.
- This section explains how to configure the Isilon cluster so that clients on 172.16.20.0/24 are granted NFS access.
- Current cluster network state:
BIC-Isilon-Cluster-4# isi network interfaces ls LNN Name Status Owners IP Addresses -------------------------------------------------------------------- 1 10gige-1 Up - - 1 10gige-2 Up - - 1 10gige-agg-1 Up groupnet0.prod.pool0 132.206.178.237 1 ext-1 Up groupnet0.mgmt.mgmt 172.16.10.20 1 ext-2 No Carrier - - 1 ext-agg Not Available - - 2 10gige-1 Up - - 2 10gige-2 Up - - 2 10gige-agg-1 Up groupnet0.prod.pool0 132.206.178.236 2 ext-1 Up groupnet0.mgmt.mgmt 172.16.10.21 2 ext-2 No Carrier - - 2 ext-agg Not Available - - 3 10gige-1 Up - - 3 10gige-2 Up - - 3 10gige-agg-1 Up groupnet0.prod.pool0 132.206.178.233 3 ext-1 Up groupnet0.mgmt.mgmt 172.16.10.22 3 ext-2 No Carrier - - 3 ext-agg Not Available - - 4 10gige-1 Up - - 4 10gige-2 Up - - 4 10gige-agg-1 Up groupnet0.prod.pool0 132.206.178.234 4 ext-1 Up groupnet0.mgmt.mgmt 172.16.10.23 4 ext-2 No Carrier - - 4 ext-agg Not Available - - 5 10gige-1 Up - - 5 10gige-2 Up - - 5 10gige-agg-1 Up groupnet0.prod.pool0 132.206.178.235 5 ext-1 Up groupnet0.mgmt.mgmt 172.16.10.24 5 ext-2 No Carrier - - 5 ext-agg Not Available - - -------------------------------------------------------------------- BIC-Isilon-Cluster-4# isi network pools ls ID SC Zone Allocation Method ---------------------------------------------------------------------- groupnet0.mgmt.mgmt mgmt.isi.bic.mni.mcgill.ca static groupnet0.prod.pool0 nfs.isi.bic.mni.mcgill.ca dynamic ---------------------------------------------------------------------- Total: 2 BIC-Isilon-Cluster-4# isi network subnets ls ID Subnet Gateway|Priority Pools SC Service ------------------------------------------------------------------------ groupnet0.mgmt 172.16.10.0/24 172.16.10.1|2 mgmt 0.0.0.0 groupnet0.prod 132.206.178.0/24 132.206.178.1|1 pool0 132.206.178.232 ------------------------------------------------------------------------ Total: 2
- It is faster and easier to configure this by using the WebUI rather than the CLI.

- Essentially, it boils down to the following actions:
- Create a new subnet called node in the default groupnet groupnet0.
- Set the SmartConnect (SC) Service IP to 172.16.20.232.
- Update the domain master DNS server with the new delegation record and "glue" record. More on this later.
- The new subnet, once created:
BIC-Isilon-Cluster-4# isi network subnets view groupnet0.node ID: groupnet0.node Name: node Groupnet: groupnet0 Pools: pool1 Addr Family: ipv4 Base Addr: 172.16.20.0 CIDR: 172.16.20.0/24 Description: - Dsr Addrs: - Gateway: 172.16.20.1 Gateway Priority: 3 MTU: 1500 Prefixlen: 24 Netmask: 255.255.255.0 Sc Service Addr: 172.16.20.232 VLAN Enabled: False VLAN ID: -
- Create a new pool called pool1 with the following properties:
- Access zone is set to prod, like the pool pool0.
- Allocation method is dynamic.
- Select the 10gige aggregate interfaces from each node.
- Set the SmartConnect connect policy to round_robin.
- Best practice might be to set it to cpu or network utilization for NFSv4; benchmarking should help.
- Name the SmartConnect zone nfs.isi-node.bic.mni.mcgill.ca.
- Proper records will have to be set in the master domain DNS server for the new zone. More on this later.
- The new pool, once created:
BIC-Isilon-Cluster-4# isi network pools view groupnet0.node.pool1 ID: groupnet0.node.pool1 Groupnet: groupnet0 Subnet: node Name: pool1 Rules: - Access Zone: prod Allocation Method: dynamic Aggregation Mode: lacp SC Suspended Nodes: - Description: - Ifaces: 1:10gige-agg-1, 2:10gige-agg-1, 4:10gige-agg-1, 3:10gige-agg-1, 5:10gige-agg-1 IP Ranges: 172.16.20.233-172.16.20.237 Rebalance Policy: auto SC Auto Unsuspend Delay: 0 SC Connect Policy: round_robin SC Zone: nfs.isi-node.bic.mni.mcgill.ca SC DNS Zone Aliases: - SC Failover Policy: round_robin SC Subnet: node SC Ttl: 0 Static Routes: -
- With this in place, the cluster's network interface settings look like this:
BIC-Isilon-Cluster-4# isi network interfaces ls LNN Name Status Owners IP Addresses -------------------------------------------------------------------- 1 10gige-1 Up - - 1 10gige-2 Up - - 1 10gige-agg-1 Up groupnet0.prod.pool0 132.206.178.237 groupnet0.node.pool1 172.16.20.237 1 ext-1 Up groupnet0.mgmt.mgmt 172.16.10.20 1 ext-2 No Carrier - - 1 ext-agg Not Available - - 2 10gige-1 Up - - 2 10gige-2 Up - - 2 10gige-agg-1 Up groupnet0.prod.pool0 132.206.178.236 groupnet0.node.pool1 172.16.20.236 2 ext-1 Up groupnet0.mgmt.mgmt 172.16.10.21 2 ext-2 No Carrier - - 2 ext-agg Not Available - - 3 10gige-1 Up - - 3 10gige-2 Up - - 3 10gige-agg-1 Up groupnet0.prod.pool0 132.206.178.233 groupnet0.node.pool1 172.16.20.234 3 ext-1 Up groupnet0.mgmt.mgmt 172.16.10.22 3 ext-2 No Carrier - - 3 ext-agg Not Available - - 4 10gige-1 Up - - 4 10gige-2 Up - - 4 10gige-agg-1 Up groupnet0.prod.pool0 132.206.178.234 groupnet0.node.pool1 172.16.20.235 4 ext-1 Up groupnet0.mgmt.mgmt 172.16.10.23 4 ext-2 No Carrier - - 4 ext-agg Not Available - - 5 10gige-1 Up - - 5 10gige-2 Up - - 5 10gige-agg-1 Up groupnet0.prod.pool0 132.206.178.235 groupnet0.node.pool1 172.16.20.233 5 ext-1 Up groupnet0.mgmt.mgmt 172.16.10.24 5 ext-2 No Carrier - - 5 ext-agg Not Available - - -------------------------------------------------------------------- Total: 30
- A few notes about the above:
- Because the initial cluster configuration was sloppy, the LNNs (Logical Node Numbers) and Node IDs don't match.
- This explains why some 10gige-agg-1 interfaces have a different last octet in pool0 and pool1.
- Ultimately, the LNNs and Node IDs should be re-assigned to match the nodes' physical positions in the rack.
- This would avoid potential mistakes when updating or servicing the cluster.
- Current setting:
BIC-Isilon-Cluster-4# isi config Welcome to the Isilon IQ configuration console. Copyright (c) 2001-2016 EMC Corporation. All Rights Reserved. Enter 'help' to see list of available commands. Enter 'help <command>' to see help for a specific command. Enter 'quit' at any prompt to discard changes and exit. Node build: Isilon OneFS v8.0.0.1 B_MR_8_0_0_1_131(RELEASE) Node serial number: SX410-301608-0264 BIC-Isilon-Cluster >>> lnnset LNN Device ID Cluster IP ---------------------------------------- 1 1 10.0.3.1 2 2 10.0.3.2 3 4 10.0.3.4 4 3 10.0.3.3 5 6 10.0.3.5 BIC-Isilon-Cluster >>> exit BIC-Isilon-Cluster-4#
- The domain DNS configuration must be updated:
  - The new zone delegation for the SmartConnect zone isi-node.bic.mni.mcgill.ca. has to be put in place.
  - A new glue record must be created for the SSIP (SmartConnect Service IP) of the delegated zone.
; glue record sip-node.bic.mni.mcgill.ca. IN A 172.16.20.232 ; zone delegation isi-node.bic.mni.mcgill.ca. IN NS sip-node.bic.mni.mcgill.ca.
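- To check that the delegation is visible from the outside, an NS query should come back with the sip-node glue host defined above (a quick sketch with plain dig; output omitted here):
~$ dig +short NS isi-node.bic.mni.mcgill.ca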
- Verify that the SC Zone nfs.isi-node.bic.mni.mcgill.ca resolves properly and in a round-robin way, both on the cluster and on clients:
malin@login1:~$ dig +short nfs.isi-node.bic.mni.mcgill.ca 172.16.20.236 malin@login1:~$ dig +short nfs.isi-node.bic.mni.mcgill.ca 172.16.20.237 malin@login1:~$ dig +short nfs.isi-node.bic.mni.mcgill.ca 172.16.20.233 malin@login1:~$ dig +short nfs.isi-node.bic.mni.mcgill.ca 172.16.20.234 malin@login1:~$ dig +short nfs.isi-node.bic.mni.mcgill.ca 172.16.20.235 malin@login1:~$ dig +short nfs.isi-node.bic.mni.mcgill.ca 172.16.20.236 BIC-Isilon-Cluster-4# dig +short nfs.isi-node.bic.mni.mcgill.ca 172.16.20.237 BIC-Isilon-Cluster-4# dig +short nfs.isi-node.bic.mni.mcgill.ca 172.16.20.233 BIC-Isilon-Cluster-4# dig +short nfs.isi-node.bic.mni.mcgill.ca 172.16.20.234 BIC-Isilon-Cluster-4# dig +short nfs.isi-node.bic.mni.mcgill.ca 172.16.20.235 BIC-Isilon-Cluster-4# dig +short nfs.isi-node.bic.mni.mcgill.ca 172.16.20.236 BIC-Isilon-Cluster-4# dig +short nfs.isi-node.bic.mni.mcgill.ca 172.16.20.237
- It works!
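- A small shell loop makes the round-robin easier to eyeball than repeating dig by hand (a sketch; any POSIX shell on a client will do):
~$ for i in 1 2 3 4 5 6; do dig +short nfs.isi-node.bic.mni.mcgill.ca; done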
FileSystems and Access Zones
- There are 2 access zones defined.
- The default zone is System: it must exist and cannot be deleted.
- The access zone called prod will be used to hold the user data.
- Both zones have lsa-nis-provider:BIC among their Auth Providers.
- See the section about NIS below: this might create a security weakness.
BIC-Isilon-Cluster-4# isi zone zones list Name Path ------------ System /ifs prod /ifs ------------ Total: 2 BIC-Isilon-Cluster-4# isi zone view System Name: System Path: /ifs Groupnet: groupnet0 Map Untrusted: - Auth Providers: lsa-nis-provider:BIC, lsa-file-provider:System, lsa-local-provider:System NetBIOS Name: - User Mapping Rules: - Home Directory Umask: 0077 Skeleton Directory: /usr/share/skel Cache Entry Expiry: 4H Zone ID: 1 BIC-Isilon-Cluster-4# isi zone view prod Name: prod Path: /ifs Groupnet: groupnet0 Map Untrusted: - Auth Providers: lsa-nis-provider:BIC, lsa-local-provider:prod, lsa-file-provider:System NetBIOS Name: - User Mapping Rules: - Home Directory Umask: 0077 Skeleton Directory: /usr/share/skel Cache Entry Expiry: 4H Zone ID: 2
- Another, more concise way of displaying the defined access zones:
BIC-Isilon-Cluster-4# isi zone list -v Name: System Path: /ifs Groupnet: groupnet0 Map Untrusted: - Auth Providers: lsa-nis-provider:BIC, lsa-file-provider:System, lsa-local-provider:System NetBIOS Name: - User Mapping Rules: - Home Directory Umask: 0077 Skeleton Directory: /usr/share/skel Cache Entry Expiry: 4H Zone ID: 1 -------------------------------------------------------------------------------- Name: prod Path: /ifs Groupnet: groupnet0 Map Untrusted: - Auth Providers: lsa-nis-provider:BIC, lsa-local-provider:prod, lsa-file-provider:System NetBIOS Name: - User Mapping Rules: - Home Directory Umask: 0077 Skeleton Directory: /usr/share/skel Cache Entry Expiry: 4H Zone ID: 2
NFS, NIS: Exports and Aliases.
- There seems to be something amiss with NIS and OneFS v8.0.
- The System access zone had to be provided with NIS authentication, as otherwise only numerical UIDs and GIDs show up on the /ifs/data filesystem.
- There might be a potential security weakness there.
- See https://community.emc.com/thread/193468?start=0&tstart=0, even though this thread is for v7.2.
- Created /etc/netgroup with "+" in it on one node as suggested in the post above, and somehow OneFS propagated it to the other nodes (see the sketch after the listings below).
- List the NIS auth providers:
BIC-Isilon-Cluster-4# isi auth nis list Name NIS Domain Servers Status ----------------------------------------- BIC vamana 132.206.178.227 online 132.206.178.243 ----------------------------------------- Total: 1
BIC-Isilon-Cluster-1# isi auth nis view BIC Name: BIC NIS Domain: vamana Servers: 132.206.178.227, 132.206.178.243 Status: online Authentication: Yes Balance Servers: Yes Check Online Interval: 3m Create Home Directory: No Enabled: Yes Enumerate Groups: Yes Enumerate Users: Yes Findable Groups: - Findable Users: - Group Domain: NIS_GROUPS Groupnet: groupnet0 Home Directory Template: - Hostname Lookup: Yes Listable Groups: - Listable Users: - Login Shell: /bin/bash Normalize Groups: No Normalize Users: No Provider Domain: - Ntlm Support: all Request Timeout: 20 Restrict Findable: Yes Restrict Listable: No Retry Time: 5 Unfindable Groups: wheel, 0, insightiq, 15, isdmgmt, 16 Unfindable Users: root, 0, insightiq, 15, isdmgmt, 16 Unlistable Groups: - Unlistable Users: - User Domain: NIS_USERS Ypmatch Using Tcp: No
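- For reference, the /etc/netgroup workaround mentioned above amounts to a one-line file (a sketch; the lone "+" simply falls through to the NIS netgroup map):
# /etc/netgroup (created on one node; OneFS propagated it to the others)
+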
- Show the exports for the zone prod:
BIC-Isilon-Cluster-1# isi nfs exports list --zone prod ID Zone Paths Description --------------------------------- 1 prod /ifs/data - --------------------------------- Total: 1 BIC-Isilon-Cluster-1# isi nfs exports view 1 --zone prod ID: 1 Zone: prod Paths: /ifs/data Description: - Clients: 132.206.178.0/24 Root Clients: - Read Only Clients: - Read Write Clients: - All Dirs: No Block Size: 8.0k Can Set Time: Yes Case Insensitive: No Case Preserving: Yes Chown Restricted: No Commit Asynchronous: No Directory Transfer Size: 128.0k Encoding: DEFAULT Link Max: 32767 Map Lookup UID: No Map Retry: Yes Map Root Enabled: True User: nobody Primary Group: - Secondary Groups: - Map Non Root Enabled: False User: nobody Primary Group: - Secondary Groups: - Map Failure Enabled: False User: nobody Primary Group: - Secondary Groups: - Map Full: Yes Max File Size: 8192.00000P Name Max Size: 255 No Truncate: No Read Only: No Readdirplus: Yes Readdirplus Prefetch: 10 Return 32Bit File Ids: No Read Transfer Max Size: 1.00M Read Transfer Multiple: 512 Read Transfer Size: 128.0k Security Type: unix Setattr Asynchronous: No Snapshot: - Symlinks: Yes Time Delta: 1.0 ns Write Datasync Action: datasync Write Datasync Reply: datasync Write Filesync Action: filesync Write Filesync Reply: filesync Write Unstable Action: unstable Write Unstable Reply: unstable Write Transfer Max Size: 1.00M Write Transfer Multiple: 512 Write Transfer Size: 512.0k
- It doesn’t seem possible to directly list the netgroups defined on the NIS master.
- One can however list the members of a specific netgroup if one happens to know its name:
BIC-Isilon-Cluster-4# isi auth netgroups view xgeraid --recursive --provider nis:BIC Netgroup: - Domain: - Hostname: edgar-xge.bic.mni.mcgill.ca Username: - -------------------------------------------------------------------------------- Netgroup: - Domain: - Hostname: gustav-xge.bic.mni.mcgill.ca Username: - -------------------------------------------------------------------------------- Netgroup: - Domain: - Hostname: tatania-xge.bic.mni.mcgill.ca Username: - -------------------------------------------------------------------------------- Netgroup: - Domain: - Hostname: tubal-xge.bic.mni.mcgill.ca Username: - -------------------------------------------------------------------------------- Netgroup: - Domain: - Hostname: tullus-xge.bic.mni.mcgill.ca Username: - -------------------------------------------------------------------------------- Netgroup: - Domain: - Hostname: tutor-xge.bic.mni.mcgill.ca Username: - BIC-Isilon-Cluster-1# isi auth netgroups view computecore Netgroup: - Domain: - Hostname: thaisa Username: - -------------------------------------------------------------------------------- Netgroup: - Domain: - Hostname: vaux Username: - -------------------------------------------------------------------------------- Netgroup: - Domain: - Hostname: widow Username: -
I can't reproduce this behaviour anymore, so it should be taken with a grain of salt! I'll leave this in place for the moment, but it might go away soon…
- Clients in netgroups must be specified with IP addresses; names don't work:
BIC-Isilon-Cluster-1# isi nfs exports modify 1 --clear-clients --zone prod BIC-Isilon-Cluster-1# isi auth netgroups view isix --zone prod Netgroup: - Domain: - Hostname: dromio.bic.mni.mcgill.ca Username: - BIC-Isilon-Cluster-1# isi nfs exports modify 1 --clients isix --zone prod bad host dromio in netgroup isix, skipping BIC-Isilon-Cluster-1# isi auth netgroups view xisi Netgroup: - Domain: - Hostname: 132.206.178.51 Username: - BIC-Isilon-Cluster-1# isi nfs exports modify 1 --add-clients xisi --zone prod BIC-Isilon-Cluster-1# isi nfs exports view 1 --zone prod ID: 1 Zone: prod Paths: /ifs/data Description: - Clients: xisi Root Clients: - Read Only Clients: - Read Write Clients: - All Dirs: No Block Size: 8.0k Can Set Time: Yes Case Insensitive: No Case Preserving: Yes Chown Restricted: No Commit Asynchronous: No Directory Transfer Size: 128.0k Encoding: DEFAULT Link Max: 32767 Map Lookup UID: No Map Retry: Yes Map Root Enabled: True User: nobody Primary Group: - Secondary Groups: - Map Non Root Enabled: False User: nobody Primary Group: - Secondary Groups: - Map Failure Enabled: False User: nobody Primary Group: - Secondary Groups: - Map Full: Yes Max File Size: 8192.00000P Name Max Size: 255 No Truncate: No Read Only: No Readdirplus: Yes Readdirplus Prefetch: 10 Return 32Bit File Ids: No Read Transfer Max Size: 1.00M Read Transfer Multiple: 512 Read Transfer Size: 128.0k Security Type: unix Setattr Asynchronous: No Snapshot: - Symlinks: Yes Time Delta: 1.0 ns Write Datasync Action: datasync Write Datasync Reply: datasync Write Filesync Action: filesync Write Filesync Reply: filesync Write Unstable Action: unstable Write Unstable Reply: unstable Write Transfer Max Size: 1.00M Write Transfer Multiple: 512 Write Transfer Size: 512.0k
Workaround To The Zone Exports Issue With Netgroups.
- Netgroup entries in the NIS maps must be FQDNs: even short names won't work with the option --ignore-unresolvable-hosts.
- Modify the zone export by adding that option: isi nfs exports modify 1 --add-clients sgibic --ignore-unresolvable-hosts --zone prod
BIC-Isilon-Cluster-3# isi auth netgroups view sgibic --recursive --provider nis:BIC Netgroup: - Domain: - Hostname: julia.bic.mni.mcgill.ca Username: - -------------------------------------------------------------------------------- Netgroup: - Domain: - Hostname: luciana.bic.mni.mcgill.ca Username: - -------------------------------------------------------------------------------- Netgroup: - Domain: - Hostname: mouldy.bic.mni.mcgill.ca Username: - -------------------------------------------------------------------------------- Netgroup: - Domain: - Hostname: vaux.bic.mni.mcgill.ca Username: - BIC-Isilon-Cluster-3# isi nfs exports view 1 --zone prod ID: 1 Zone: prod Paths: /ifs/data Description: - Clients: isix, xisi Root Clients: - Read Only Clients: - Read Write Clients: - All Dirs: No Block Size: 8.0k Can Set Time: Yes Case Insensitive: No Case Preserving: Yes Chown Restricted: No Commit Asynchronous: No Directory Transfer Size: 128.0k Encoding: DEFAULT Link Max: 32767 Map Lookup UID: No Map Retry: Yes Map Root Enabled: True User: nobody Primary Group: - Secondary Groups: - Map Non Root Enabled: False User: nobody Primary Group: - Secondary Groups: - Map Failure Enabled: False User: nobody Primary Group: - Secondary Groups: - Map Full: Yes Max File Size: 8192.00000P Name Max Size: 255 No Truncate: No Read Only: No Readdirplus: Yes Readdirplus Prefetch: 10 Return 32Bit File Ids: No Read Transfer Max Size: 1.00M Read Transfer Multiple: 512 Read Transfer Size: 128.0k Security Type: unix Setattr Asynchronous: No Snapshot: - Symlinks: Yes Time Delta: 1.0 ns Write Datasync Action: datasync Write Datasync Reply: datasync Write Filesync Action: filesync Write Filesync Reply: filesync Write Unstable Action: unstable Write Unstable Reply: unstable Write Transfer Max Size: 1.00M Write Transfer Multiple: 512 Write Transfer Size: 512.0k BIC-Isilon-Cluster-3# isi nfs exports modify 1 --add-clients sgibic --zone prod bad host julia in netgroup sgibic, skipping BIC-Isilon-Cluster-3# isi nfs exports view 1 --zone prod ID: 1 Zone: prod Paths: /ifs/data Description: - Clients: isix, xisi Root Clients: - Read Only Clients: - Read Write Clients: - All Dirs: No Block Size: 8.0k Can Set Time: Yes Case Insensitive: No Case Preserving: Yes Chown Restricted: No Commit Asynchronous: No Directory Transfer Size: 128.0k Encoding: DEFAULT Link Max: 32767 Map Lookup UID: No Map Retry: Yes Map Root Enabled: True User: nobody Primary Group: - Secondary Groups: - Map Non Root Enabled: False User: nobody Primary Group: - Secondary Groups: - Map Failure Enabled: False User: nobody Primary Group: - Secondary Groups: - Map Full: Yes Max File Size: 8192.00000P Name Max Size: 255 No Truncate: No Read Only: No Readdirplus: Yes Readdirplus Prefetch: 10 Return 32Bit File Ids: No Read Transfer Max Size: 1.00M Read Transfer Multiple: 512 Read Transfer Size: 128.0k Security Type: unix Setattr Asynchronous: No Snapshot: - Symlinks: Yes Time Delta: 1.0 ns Write Datasync Action: datasync Write Datasync Reply: datasync Write Filesync Action: filesync Write Filesync Reply: filesync Write Unstable Action: unstable Write Unstable Reply: unstable Write Transfer Max Size: 1.00M Write Transfer Multiple: 512 Write Transfer Size: 512.0k BIC-Isilon-Cluster-3# isi nfs exports modify 1 --add-clients sgibic --ignore-unresolvable-hosts --zone prod BIC-Isilon-Cluster-3# isi nfs exports view 1 --zone prod ID: 1 Zone: prod Paths: /ifs/data Description: - Clients: isix, sgibic, xisi Root Clients: - Read Only Clients: - Read Write Clients: - All Dirs: No 
Block Size: 8.0k Can Set Time: Yes Case Insensitive: No Case Preserving: Yes Chown Restricted: No Commit Asynchronous: No Directory Transfer Size: 128.0k Encoding: DEFAULT Link Max: 32767 Map Lookup UID: No Map Retry: Yes Map Root Enabled: True User: nobody Primary Group: - Secondary Groups: - Map Non Root Enabled: False User: nobody Primary Group: - Secondary Groups: - Map Failure Enabled: False User: nobody Primary Group: - Secondary Groups: - Map Full: Yes Max File Size: 8192.00000P Name Max Size: 255 No Truncate: No Read Only: No Readdirplus: Yes Readdirplus Prefetch: 10 Return 32Bit File Ids: No Read Transfer Max Size: 1.00M Read Transfer Multiple: 512 Read Transfer Size: 128.0k Security Type: unix Setattr Asynchronous: No Snapshot: - Symlinks: Yes Time Delta: 1.0 ns Write Datasync Action: datasync Write Datasync Reply: datasync Write Filesync Action: filesync Write Filesync Reply: filesync Write Unstable Action: unstable Write Unstable Reply: unstable Write Transfer Max Size: 1.00M Write Transfer Multiple: 512 Write Transfer Size: 512.0k
A Real Example With Quotas
- Create an export with no root squashing for the hosts in the admincore NIS netgroup, and restrict access to the hosts in admincore.
- List the exports in the prod zone.
- Check the exports for any error.
BIC-Isilon-Cluster-4# isi nfs exports create /ifs/data/bicadmin1 --zone prod --clients admincore --root-clients admincore --ignore-unresolvable-hosts BIC-Isilon-Cluster-4# isi nfs exports list --zone prod ID Zone Paths Description ------------------------------------------- 1 prod /ifs/data - 3 prod /ifs/data/bicadmin1 - ------------------------------------------- Total: 2 BIC-Isilon-Cluster-4# isi nfs exports view 3 --zone prod ID: 3 Zone: prod Paths: /ifs/data/bicadmin1 Description: - Clients: admincore Root Clients: admincore Read Only Clients: - Read Write Clients: - All Dirs: No Block Size: 8.0k Can Set Time: Yes Case Insensitive: No Case Preserving: Yes Chown Restricted: No Commit Asynchronous: No Directory Transfer Size: 128.0k Encoding: DEFAULT Link Max: 32767 Map Lookup UID: No Map Retry: Yes Map Root Enabled: True User: nobody Primary Group: - Secondary Groups: - Map Non Root Enabled: False User: nobody Primary Group: - Secondary Groups: - Map Failure Enabled: False User: nobody Primary Group: - Secondary Groups: - Map Full: Yes Max File Size: 8192.00000P Name Max Size: 255 No Truncate: No Read Only: No Readdirplus: Yes Readdirplus Prefetch: 10 Return 32Bit File Ids: No Read Transfer Max Size: 1.00M Read Transfer Multiple: 512 Read Transfer Size: 128.0k Security Type: unix Setattr Asynchronous: No Snapshot: - Symlinks: Yes Time Delta: 1.0 ns Write Datasync Action: datasync Write Datasync Reply: datasync Write Filesync Action: filesync Write Filesync Reply: filesync Write Unstable Action: unstable Write Unstable Reply: unstable Write Transfer Max Size: 1.00M Write Transfer Multiple: 512 Write Transfer Size: 512.0k BIC-Isilon-Cluster-4# isi nfs exports check --zone prod ID Message ---------- ---------- Total: 0
How to create NFS aliases and use them
- It might be useful to create NFS aliases so that NFS clients can use a short symbolic name to mount the Isilon exports.
- Useful for the ipl, movement or noel agglomerated mount points like /ifs/data/ipl/ipl-5-6-8-10/, /ifs/data/movement/movement3-4-5-6-7 or /ifs/data/noel/noel1-5.
BIC-Isilon-Cluster-2# mkdir /ifs/data/movement/movement3-4-5-6-7 BIC-Isilon-Cluster-2# isi quota quotas create /ifs/data/movement/movement3-4-5-6-7 directory --zone prod --hard-threshold 400G --container=yes BIC-Isilon-Cluster-2# for i in 3 4 5 6 7; do mkdir /ifs/data/movement/movement3-4-5-6-7/movement$i; done BIC-Isilon-Cluster-2# ll /ifs/data/movement/movement3-4-5-6-7 total 14 drwxr-xr-x 7 root wheel 135 Oct 19 14:56 ./ drwxr-xr-x 5 root wheel 89 Oct 19 14:42 ../ drwxr-xr-x 2 root wheel 0 Oct 19 14:56 movement3/ drwxr-xr-x 2 root wheel 0 Oct 19 14:56 movement4/ drwxr-xr-x 2 root wheel 0 Oct 19 14:56 movement5/ drwxr-xr-x 2 root wheel 0 Oct 19 14:56 movement6/ drwxr-xr-x 2 root wheel 0 Oct 19 14:56 movement7/ BIC-Isilon-Cluster-2# for i in 3 4 5 6 7; do isi nfs exports create /ifs/data/movement/movement3-4-5-6-7/movement$i --zone prod --clients admincore --root-clients admincore; done BIC-Isilon-Cluster-2# for i in 3 4 5 6 7; do isi nfs aliases create /movement$i /ifs/data/movement/movement3-4-5-6-7/movement$i --zone prod; done
- This is used for the ipl, movement and noel allocated storage:
BIC-Isilon-Cluster-2# isi nfs aliases ls --zone prod | egrep '(ipl|movement|noel)' prod /ipl1 /ifs/data/ipl/ipl-agglo/ipl1 prod /ipl10 /ifs/data/ipl/ipl-5-6-8-10/ipl10 prod /ipl11 /ifs/data/ipl/ipl11 prod /ipl2 /ifs/data/ipl/ipl-agglo/ipl2 prod /ipl3 /ifs/data/ipl/ipl-agglo/ipl3 prod /ipl4 /ifs/data/ipl/ipl-agglo/ipl4 prod /ipl5 /ifs/data/ipl/ipl-5-6-8-10/ipl5 prod /ipl6 /ifs/data/ipl/ipl-5-6-8-10/ipl6 prod /ipl7 /ifs/data/ipl/ipl-agglo/ipl7 prod /ipl8 /ifs/data/ipl/ipl-5-6-8-10/ipl8 prod /ipl9 /ifs/data/ipl/ipl-agglo/ipl9 prod /ipl_proj01 /ifs/data/ipl/ipl-agglo/proj01 prod /ipl_proj02 /ifs/data/ipl/proj02 prod /ipl_proj03 /ifs/data/ipl/proj03 prod /ipl_proj04 /ifs/data/ipl/proj04 prod /ipl_proj05 /ifs/data/ipl/proj05 prod /ipl_proj06 /ifs/data/ipl/proj06 prod /ipl_proj07 /ifs/data/ipl/proj07 prod /ipl_proj08 /ifs/data/ipl/proj08 prod /ipl_proj09 /ifs/data/ipl/proj09 prod /ipl_proj10 /ifs/data/ipl/proj10 prod /ipl_proj11 /ifs/data/ipl/proj11 prod /ipl_proj12 /ifs/data/ipl/proj12 prod /ipl_proj13 /ifs/data/ipl/proj13 prod /ipl_proj14 /ifs/data/ipl/proj14 prod /ipl_proj15 /ifs/data/ipl/proj15 prod /ipl_proj16 /ifs/data/ipl/proj16 prod /ipl_quarantine /ifs/data/ipl/quarantine prod /ipl_scratch01 /ifs/data/ipl/scratch01 prod /ipl_scratch02 /ifs/data/ipl/scratch02 prod /ipl_scratch03 /ifs/data/ipl/scratch03 prod /ipl_scratch04 /ifs/data/ipl/scratch04 prod /ipl_scratch05 /ifs/data/ipl/scratch05 prod /ipl_scratch06 /ifs/data/ipl/scratch06 prod /ipl_scratch07 /ifs/data/ipl/scratch07 prod /ipl_scratch08 /ifs/data/ipl/scratch08 prod /ipl_scratch09 /ifs/data/ipl/scratch09 prod /ipl_scratch10 /ifs/data/ipl/scratch10 prod /ipl_scratch11 /ifs/data/ipl/scratch11 prod /ipl_scratch12 /ifs/data/ipl/scratch12 prod /ipl_scratch13 /ifs/data/ipl/scratch13 prod /ipl_scratch14 /ifs/data/ipl/scratch14 prod /ipl_scratch15 /ifs/data/ipl/scratch15 prod /ipl_user01 /ifs/data/ipl/ipl-agglo/user01 prod /ipl_user02 /ifs/data/ipl/user02 prod /movement3 /ifs/data/movement/movement3-4-5-6-7/movement3 prod /movement4 /ifs/data/movement/movement3-4-5-6-7/movement4 prod /movement5 /ifs/data/movement/movement3-4-5-6-7/movement5 prod /movement6 /ifs/data/movement/movement3-4-5-6-7/movement6 prod /movement7 /ifs/data/movement/movement3-4-5-6-7/movement7 prod /movement8 /ifs/data/movement/movement8 prod /movement9 /ifs/data/movement/movement9 prod /noel1 /ifs/data/noel/noel1-5/noel1 prod /noel2 /ifs/data/noel/noel1-5/noel2 prod /noel3 /ifs/data/noel/noel1-5/noel3 prod /noel4 /ifs/data/noel/noel1-5/noel4 prod /noel5 /ifs/data/noel/noel1-5/noel5 prod /noel6 /ifs/data/noel/noel6 prod /noel7 /ifs/data/noel/noel7 prod /noel8 /ifs/data/noel/noel8
- With the NFS aliases in place, an NFS client can mount an export like this:
~$ mkdir /mnt/ifs/movement7 ~$ mount -t nfs -o vers=4 nfs.isi.bic.mni.mcgill.ca:/movement7 /mnt/ifs/movement7
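- To make such a mount permanent on the client, an /etc/fstab entry along these lines should work (a sketch; same alias and NFSv4 option as the manual mount above):
# /etc/fstab (sketch)
nfs.isi.bic.mni.mcgill.ca:/movement7  /mnt/ifs/movement7  nfs  vers=4  0 0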
Quotas
User Quotas
- One can use the web GUI to create a user quota for the export /ifs/data/bicadmin1 defined above.
- Here, using the CLI, we create a user quota for user malin on the export /ifs/data/bicadmin1 with soft and hard limits.
BIC-Isilon-Cluster-4# isi quota quotas create /ifs/data/bicadmin1 user \ --user malin --hard-threshold 1G \ --soft-threshold 500M --soft-grace 1W --zone prod --verbose Created quota: USER:malin@/ifs/data/bicadmin1 BIC-Isilon-Cluster-2# isi quota quotas list --path /ifs/data/bicadmin1 --zone prod --verbose Type AppliesTo Path Snap Hard Soft Adv Grace Files With Overhead W/O Overhead Over Enforced Container Linked ------------------------------------------------------------------------------------------------------------------------------------------------- user malin /ifs/data/bicadmin1 No 1.00G 500.00M - 1W 4625 1.39G 1024.00M - Yes No No directory DEFAULT /ifs/data/bicadmin1 No 400.00G 399.00G - 1W 797410 138.94G 93.39G - Yes Yes - ------------------------------------------------------------------------------------------------------------------------------------------------- Total: 2
Directory Quotas
- The export /ifs/data/loris with ID=5 has already been created.
- Here we put a 1 TB directory quota on it and list the quota explicitly.
- The option --container=yes should be used: df on a client will then display the export's quota value rather than the whole cluster space available in the export's zone (see the df sketch after the listing below).
BIC-Isilon-Cluster-3# isi nfs exports list --zone prod ID Zone Paths Description ------------------------------------------- 1 prod /ifs/data - 3 prod /ifs/data/bicadmin1 - 4 prod /ifs/data/bicdata - 5 prod /ifs/data/loris - ------------------------------------------- Total: 4 BIC-Isilon-Cluster-3# isi nfs exports view 5 --zone prod ID: 5 Zone: prod Paths: /ifs/data/loris Description: - Clients: admincore Root Clients: admincore Read Only Clients: - Read Write Clients: - All Dirs: No Block Size: 8.0k Can Set Time: Yes Case Insensitive: No Case Preserving: Yes Chown Restricted: No Commit Asynchronous: No Directory Transfer Size: 128.0k Encoding: DEFAULT Link Max: 32767 Map Lookup UID: No Map Retry: Yes Map Root Enabled: True User: nobody Primary Group: - Secondary Groups: - Map Non Root Enabled: False User: nobody Primary Group: - Secondary Groups: - Map Failure Enabled: False User: nobody Primary Group: - Secondary Groups: - Map Full: Yes Max File Size: 8192.00000P Name Max Size: 255 No Truncate: No Read Only: No Readdirplus: Yes Readdirplus Prefetch: 10 Return 32Bit File Ids: No Read Transfer Max Size: 1.00M Read Transfer Multiple: 512 Read Transfer Size: 128.0k Security Type: unix Setattr Asynchronous: No Snapshot: - Symlinks: Yes Time Delta: 1.0 ns Write Datasync Action: datasync Write Datasync Reply: datasync Write Filesync Action: filesync Write Filesync Reply: filesync Write Unstable Action: unstable Write Unstable Reply: unstable Write Transfer Max Size: 1.00M Write Transfer Multiple: 512 Write Transfer Size: 512.0k BIC-Isilon-Cluster-3# isi quota quotas create /ifs/data/loris directory \ --hard-threshold 1T --zone prod --container=yes BIC-Isilon-Cluster-3# isi quota quotas list Type AppliesTo Path Snap Hard Soft Adv Used ---------------------------------------------------------------------------- user malin /ifs/data/bicadmin1 No 10.00G 500.00M - 1024.00M directory DEFAULT /ifs/data/bicadmin1 No 900.00G 800.00G - 298.897G directory DEFAULT /ifs/data/bicdata No 1.00T - - 106.904G directory DEFAULT /ifs/data/loris No 1.00T - - 0 ---------------------------------------------------------------------------- Total: 4 BIC-Isilon-Cluster-3# isi quota quotas view /ifs/data/loris directory Path: /ifs/data/loris Type: directory Snapshots: No Thresholds Include Overhead: No Usage Files: 16050 With Overhead: 422.04G W/O Overhead: 335.45G Over: - Enforced: Yes Container: Yes Linked: - Thresholds Hard Threshold: 1.00T Hard Exceeded: No Hard Last Exceeded: 1969-12-31T19:00:00 Advisory: - Advisory Exceeded: No Advisory Last Exceeded: - Soft Threshold: - Soft Exceeded: No Soft Last Exceeded: - Soft Grace: -
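- Because the quota was created with --container=yes, a client mounting this export should see the 1 TB quota as the filesystem size. A quick sanity check from a client (a sketch; the mount point name is just an example):
~$ df -h /mnt/ifs/loris    # the Size column should show about 1.0T, not the whole /ifs capacity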
Snapshots
- This has been recently (Aug 2016) set in motion.
- All the settings are in flux, like the snapshot schedules and the path naming conventions.
- Some experimentation will be in order.
Snapshot schedules listing
BIC-Isilon-Cluster-3# isi snapshot schedules ls ID Name --------------------------------- 2 snapshot-bicadmin1-daily-31d 3 snapshot-bicdata-daily-7d 4 snapshot-mril2-daily-3d 5 snapshot-mril3-daily-3d --------------------------------- Total: 4
Viewing the scheduled snapshots in detail
BIC-Isilon-Cluster-3# isi snapshot schedules ls -v ID: 2 Name: snapshot-bicadmin1-daily-31d Path: /ifs/data/bicadmin1 Pattern: snapshot_bicadmin1_31d_%Y-%m-%d-%H-%M Schedule: every 1 days at 07:00 PM Duration: 1M1D Alias: alias-snapshot-bicadmin1-daily Next Run: 2016-10-04T19:00:00 Next Snapshot: snapshot_bicadmin1_31d_2016-10-04-19-00 -------------------------------------------------------------------------------- ID: 3 Name: snapshot-bicdata-daily-7d Path: /ifs/data/bicdata Pattern: snapshot-bicdata_daily_7d_%Y-%m-%d-%H-%M Schedule: every 1 days at 07:00 PM Duration: 1W1D Alias: alias-snapshot-bicdata-daily Next Run: 2016-10-04T19:00:00 Next Snapshot: snapshot-bicdata_daily_7d_2016-10-04-19-00 -------------------------------------------------------------------------------- ID: 4 Name: snapshot-mril2-daily-3d Path: /ifs/data/mril/mril2 Pattern: snapshot-mril2-daily-3d-%Y-%m-%d-%H-%M Schedule: every 1 days at 11:45 PM Duration: 3D1H Alias: alias-snapshot-mril2-daily-3d Next Run: 2016-10-04T23:45:00 Next Snapshot: snapshot-mril2-daily-3d-2016-10-04-23-45 -------------------------------------------------------------------------------- ID: 5 Name: snapshot-mril3-daily-3d Path: /ifs/data/mril/mril3 Pattern: snapshot-mril3-daily-3d-%Y-%m-%d-%H-%M Schedule: every 1 days at 11:45 PM Duration: 3D2H Alias: alias-snapshot-mril3-daily-3d Next Run: 2016-10-04T23:45:00 Next Snapshot: snapshot-mril3-daily-3d-2016-10-04-23-45
Listing the snapshots and viewing the details on a particular snapshot
BIC-Isilon-Cluster-3# isi snapshot snapshots list ID Name Path -------------------------------------------------------------------------------- 378 alias-snapshot-bicadmin1-daily /ifs/data/bicadmin1 737 snapshot_bicadmin1_30D_2016-09-03-_19-00 /ifs/data/bicadmin1 740 snapshot-bicdata_daily_30D_expiration_2016-09-03-_19-00 /ifs/data/bicdata 744 snapshot_bicadmin1_30D_2016-09-04-_19-00 /ifs/data/bicadmin1 747 snapshot-bicdata_daily_30D_expiration_2016-09-04-_19-00 /ifs/data/bicdata 751 snapshot_bicadmin1_30D_2016-09-05-_19-00 /ifs/data/bicadmin1 754 snapshot-bicdata_daily_30D_expiration_2016-09-05-_19-00 /ifs/data/bicdata 758 snapshot_bicadmin1_30D_2016-09-06-_19-00 /ifs/data/bicadmin1 761 snapshot-bicdata_daily_30D_expiration_2016-09-06-_19-00 /ifs/data/bicdata 765 snapshot_bicadmin1_30D_2016-09-07-_19-00 /ifs/data/bicadmin1 768 snapshot-bicdata_daily_30D_expiration_2016-09-07-_19-00 /ifs/data/bicdata 772 snapshot_bicadmin1_30D_2016-09-08-_19-00 /ifs/data/bicadmin1 775 snapshot-bicdata_daily_30D_expiration_2016-09-08-_19-00 /ifs/data/bicdata 779 snapshot_bicadmin1_30D_2016-09-09-_19-00 /ifs/data/bicadmin1 782 snapshot-bicdata_daily_30D_expiration_2016-09-09-_19-00 /ifs/data/bicdata 786 snapshot_bicadmin1_30D_2016-09-10-_19-00 /ifs/data/bicadmin1 789 snapshot-bicdata_daily_30D_expiration_2016-09-10-_19-00 /ifs/data/bicdata 793 snapshot_bicadmin1_30D_2016-09-11-_19-00 /ifs/data/bicadmin1 796 snapshot-bicdata_daily_30D_expiration_2016-09-11-_19-00 /ifs/data/bicdata 800 snapshot_bicadmin1_30D_2016-09-12-_19-00 /ifs/data/bicadmin1 803 snapshot-bicdata_daily_30D_expiration_2016-09-12-_19-00 /ifs/data/bicdata 807 snapshot_bicadmin1_30D_2016-09-13-_19-00 /ifs/data/bicadmin1 810 snapshot-bicdata_daily_30D_expiration_2016-09-13-_19-00 /ifs/data/bicdata 814 snapshot_bicadmin1_30D_2016-09-14-_19-00 /ifs/data/bicadmin1 817 snapshot-bicdata_daily_30D_expiration_2016-09-14-_19-00 /ifs/data/bicdata 821 snapshot_bicadmin1_30D_2016-09-15-_19-00 /ifs/data/bicadmin1 824 snapshot-bicdata_daily_30D_expiration_2016-09-15-_19-00 /ifs/data/bicdata 828 snapshot_bicadmin1_30D_2016-09-16-_19-00 /ifs/data/bicadmin1 831 snapshot-bicdata_daily_30D_expiration_2016-09-16-_19-00 /ifs/data/bicdata 835 snapshot_bicadmin1_30D_2016-09-17-_19-00 /ifs/data/bicadmin1 838 snapshot-bicdata_daily_30D_expiration_2016-09-17-_19-00 /ifs/data/bicdata 842 snapshot_bicadmin1_30D_2016-09-18-_19-00 /ifs/data/bicadmin1 845 snapshot-bicdata_daily_30D_expiration_2016-09-18-_19-00 /ifs/data/bicdata 849 snapshot_bicadmin1_30D_2016-09-19-_19-00 /ifs/data/bicadmin1 852 snapshot-bicdata_daily_30D_expiration_2016-09-19-_19-00 /ifs/data/bicdata 856 snapshot_bicadmin1_30D_2016-09-20-_19-00 /ifs/data/bicadmin1 859 snapshot-bicdata_daily_30D_expiration_2016-09-20-_19-00 /ifs/data/bicdata 863 snapshot_bicadmin1_30D_2016-09-21-_19-00 /ifs/data/bicadmin1 866 snapshot-bicdata_daily_30D_expiration_2016-09-21-_19-00 /ifs/data/bicdata 870 snapshot_bicadmin1_30D_2016-09-22-_19-00 /ifs/data/bicadmin1 873 snapshot-bicdata_daily_30D_expiration_2016-09-22-_19-00 /ifs/data/bicdata 877 snapshot_bicadmin1_30D_2016-09-23-_19-00 /ifs/data/bicadmin1 880 snapshot-bicdata_daily_30D_expiration_2016-09-23-_19-00 /ifs/data/bicdata 884 snapshot_bicadmin1_30D_2016-09-24-_19-00 /ifs/data/bicadmin1 887 snapshot-bicdata_daily_30D_expiration_2016-09-24-_19-00 /ifs/data/bicdata 891 snapshot_bicadmin1_30D_2016-09-25-_19-00 /ifs/data/bicadmin1 894 snapshot-bicdata_daily_30D_expiration_2016-09-25-_19-00 /ifs/data/bicdata 898 snapshot_bicadmin1_30D_2016-09-26-_19-00 /ifs/data/bicadmin1 
901 snapshot-bicdata_daily_30D_expiration_2016-09-26-_19-00 /ifs/data/bicdata 905 snapshot_bicadmin1_30D_2016-09-27-_19-00 /ifs/data/bicadmin1 908 snapshot-bicdata_daily_30D_expiration_2016-09-27-_19-00 /ifs/data/bicdata 912 snapshot_bicadmin1_30D_2016-09-28-_19-00 /ifs/data/bicadmin1 915 snapshot-bicdata_daily_30D_expiration_2016-09-28-_19-00 /ifs/data/bicdata 919 snapshot_bicadmin1_30D_2016-09-29-_19-00 /ifs/data/bicadmin1 922 snapshot-bicdata_daily_30D_expiration_2016-09-29-_19-00 /ifs/data/bicdata 926 snapshot_bicadmin1_30D_2016-09-30-_19-00 /ifs/data/bicadmin1 929 snapshot-bicdata_daily_30D_expiration_2016-09-30-_19-00 /ifs/data/bicdata 933 snapshot_bicadmin1_30D_2016-10-01-_19-00 /ifs/data/bicadmin1 936 snapshot-bicdata_daily_30D_expiration_2016-10-01-_19-00 /ifs/data/bicdata 940 snapshot_bicadmin1_30D_2016-10-02-_19-00 /ifs/data/bicadmin1 943 snapshot-bicdata_daily_30D_expiration_2016-10-02-_19-00 /ifs/data/bicdata 947 snapshot_bicadmin1_30D_2016-10-03-_19-00 /ifs/data/bicadmin1 950 snapshot-bicdata_daily_30D_expiration_2016-10-03-_19-00 /ifs/data/bicdata 952 FSAnalyze-Snapshot-Current-1475546412 /ifs -------------------------------------------------------------------------------- Total: 64 BIC-Isilon-Cluster-3# isi snapshot snapshots view snapshot_bicadmin1_30D_2016-10-03-_19-00 ID: 947 Name: snapshot_bicadmin1_30D_2016-10-03-_19-00 Path: /ifs/data/bicadmin1 Has Locks: No Schedule: snapshot-bicadmin1-daily-31d Alias Target ID: - Alias Target Name: - Created: 2016-10-03T19:00:03 Expires: 2016-11-03T19:00:00 Size: 1.016G Shadow Bytes: 0 % Reserve: 0.00% % Filesystem: 0.00% State: active
- What is this snapshot FSAnalyze-Snapshot-Current-1475546412?? I never created it.
- It looks like it goes against best practices: its path is /ifs.
- Found the answer: this is needed for the FS analytics done with the InsightIQ server.
- DO NOT DELETE IT!
BIC-Isilon-Cluster-3# isi snapshot snapshots view FSAnalyze-Snapshot-Current-1475546412 ID: 952 Name: FSAnalyze-Snapshot-Current-1475546412 Path: /ifs Has Locks: No Schedule: - Alias Target ID: - Alias Target Name: - Created: 2016-10-03T22:00:12 Expires: - Size: 1.2129T Shadow Bytes: 0 % Reserve: 0.00% % Filesystem: 0.00% State: active
Snapshot aliases point to the latest snapshot
BIC-Isilon-Cluster-3# isi snapshot aliases ls ID Name Target ID Target Name --------------------------------------------------------------------------------------- 378 alias-snapshot-bicadmin1-daily 947 snapshot_bicadmin1_30D_2016-10-03-_19-00 ---------------------------------------------------------------------------------------
Creating snapshot schedules
- Create a snapshot schedule and an alias to it so that it points to the last performed snapshot.
- For example, create a snapshot schedule for /ifs/data/mril/mril2 that is run:
  - every day at 11:45 PM,
  - with a retention period of 73 hours (3 days + 1 hour),
  - with the alias alias-snapshot-mril2-daily-3d pointing to the last scheduled snapshot.
BIC-Isilon-Cluster-2# isi snapshot schedules create snapshot-mril2-daily-3d /ifs/data/mril/mril2 snapshot_mril2_daily_3d-%Y-%m-%d-%H-%M \ "every 1 days at 11:45 PM" --duration 73H --alias alias-snapshot-mril2-daily-3d BIC-Isilon-Cluster-2# isi snapshot schedules ls ID Name --------------------------------- 2 snapshot-bicadmin1-daily-31d 3 snapshot-bicdata-daily-7d 4 snapshot-mril2-daily-3d 5 snapshot-mril3-daily-3d --------------------------------- Total: 4 BIC-Isilon-Cluster-2# isi snapshot schedules view 4 ID: 4 Name: snapshot-mril2-daily-3d Path: /ifs/data/mril/mril2 Pattern: snapshot_mril2_daily_3d-%Y-%m-%d-%H-%M Schedule: every 1 days at 11:45 PM Duration: 3D1H Alias: alias-snapshot-mril2-daily-3d
- The CLI command can be messy! The web GUI is more intuitive.
- See below for the pattern syntax.
- Syntax for snapshot schedule creation:
BIC-Isilon-Cluster-2# isi snapshot schedules create <name> <path> <pattern> <schedule> [--alias <alias>] [--duration <duration>] [--verbose Options <name> Specifies a name for the snapshot schedule. <path> Specifies the path of the directory to include in the snapshots. <pattern> Specifies a naming pattern for snapshots created according to the schedule. See below. <schedule> Specifies how often snapshots are created. Specify in the following format: "<interval> [<frequency>]" Specify <interval> in one of the following formats: Every [{other | <integer>}] week [on <day>] Every [{other | <integer>}] month [on the <integer>] Every [<day>[, ...] [of every [{other | <integer>}] week]] The last {day | weekday | <day>} of every [{other |<integer>}] month The <integer> {weekday | <day>} of every [{other | <integer>}] month Yearly on <month> <integer> Yearly on the {last | <integer>} [weekday | <day>] of <month> Specify <frequency> in one of the following formats: at <hh>[:<mm>] [{AM | PM}] every [<integer>] {hours | minutes} [between <hh>[:<mm>] [{AM | PM}] and <hh>[:<mm>] [{AM | PM}]] every [<integer>] {hours | minutes} [from <hh>[:<mm>] [{AM | PM}] to <hh>[:<mm>] [{AM | PM}]] You can optionally append "st", "th", or "rd" to <integer>. For example, you can specify "Every 1st month" Specify <day> as any day of the week or a three-letter abbreviation for the day. For example, both "saturday" and "sat" are valid. --alias <alias> Specifies an alias for the latest snapshot generated based on the schedule. The alias enables you to quickly locate the most recent snapshot that was generated according to the schedule. Specify as any string. {--duration | -x} <duration> Specifies how long snapshots generated according to the schedule are stored on the cluster before OneFS automatically deletes them. Specify in the following format: <integer><units> The following <units> are valid: Y Specifies years M Specifies months W Specifies weeks D Specifies days H Specifies hours {--verbose | -v} Displays a message confirming that the snapshot schedule was created.
- The following variables can be included in a snapshot naming pattern:
- Have fun choosing one!
Variable | Description |
---|---|
%A | The day of the week. |
%a | The abbreviated day of the week. For example, if the snapshot is generated on a Sunday, %a is replaced with Sun. |
%B | The name of the month. |
%b | The abbreviated name of the month. For example, if the snapshot is generated in September, %b is replaced with Sep. |
%C | The first two digits of the year. For example, if the snapshot is created in 2014, %C is replaced with 20. |
%c | The time and day. This variable is equivalent to specifying %a %b %e %T %Y. |
%d | The two digit day of the month. |
%e | The day of the month. A single-digit day is preceded by a blank space. |
%F | The date. This variable is equivalent to specifying %Y-%m-%d. |
%G | The year, based on the week that the snapshot was created in (first day of the week calculated as Monday). It is equivalent to %Y, except when the week containing the snapshot date mostly belongs to the adjacent year, in which case that year is used. For example, if the snapshot is created on Sunday, January 1, 2017, %G is replaced with 2016, because only one day of that week is in 2017. |
%g | The abbreviated year corresponding to %G (its last two digits). For example, if the snapshot is created on Sunday, January 1, 2017, %g is replaced with 16, because only one day of that week is in 2017. |
%H | The hour. The hour is represented on the 24-hour clock. Single-digit hours are preceded by a zero. For example, if a snapshot is created at 1:45 AM, %H is replaced with 01. |
%h | The abbreviated name of the month. This variable is equivalent to specifying %b. |
%I | The hour represented on the 12-hour clock. Single-digit hours are preceded by a zero. For example, if a snapshot is created at 1:45 PM, %I is replaced with 01. |
%j | The numeric day of the year. For example, if a snapshot is created on February 1, %j is replaced with 32. |
%k | The hour represented on the 24-hour clock. Single-digit hours are preceded by a blank space. |
%l | The hour represented on the 12-hour clock. Single-digit hours are preceded by a blank space. For example, if a snapshot is created at 1:45 AM, %l is replaced with 1. |
%M | The two-digit minute. |
%m | The two-digit month. |
%p | AM or PM. |
%{PolicyName} | The name of the replication policy that the snapshot was created for. This variable is valid only if you are specifying a snapshot naming pattern for a replication policy. |
%R | The time. This variable is equivalent to specifying %H:%M. |
%r | The time. This variable is equivalent to specifying %I:%M:%S %p. |
%S | The two-digit second. |
%s | The second represented in UNIX or POSIX time. |
%{SrcCluster} | The name of the source cluster of the replication policy that the snapshot was created for. This variable is valid only if you are specifying a snapshot naming pattern for a replication policy. |
%T | The time. This variable is equivalent to specifying %H:%M:%S. |
%U | The two-digit numerical week of the year. Numbers range from 00 to 53. The first day of the week is calculated as Sunday. |
%u | The numerical day of the week. Numbers range from 1 to 7. The first day of the week is calculated as Monday. For example, if a snapshot is created on Sunday, %u is replaced with 7. |
%V | The two-digit numerical week of the year that the snapshot was created in. Numbers range from 01 to 53. The first day of the week is calculated as Monday. If the week of January 1 is four or more days in length, then that week is counted as the first week of the year. |
%v | The day that the snapshot was created. This variable is equivalent to specifying %e-%b-%Y. |
%W | The two-digit numerical week of the year that the snapshot was created in. Numbers range from 00 to 53. The first day of the week is calculated as Monday. |
%w | The numerical day of the week that the snapshot was created on. Numbers range from 0 to 6. The first day of the week is calculated as Sunday. For example, if the snapshot was created on Sunday, %w is replaced with 0. |
%X | The time that the snapshot was created. This variable is equivalent to specifying %H:%M:%S. |
%Y | The year that the snapshot was created in. |
%y | The last two digits of the year that the snapshot was created in. For example, if the snapshot was created in 2014, %y is replaced with 14. |
%Z | The time zone that the snapshot was created in. |
%z | The offset from coordinated universal time (UTC) of the time zone that the snapshot was created in. If preceded by a plus sign, the time zone is east of UTC. If preceded by a minus sign, the time zone is west of UTC. |
%+ | The time and date that the snapshot was created. This variable is equivalent to specifying %a %b %e %X %Z %Y. |
%% | Escapes a percent sign. For example, 100%% is replaced with 100%. |
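- For example, the schedule snapshot-bicadmin1-daily-31d above uses the pattern snapshot_bicadmin1_31d_%Y-%m-%d-%H-%M, so the snapshot taken at 7:00 PM on October 4, 2016 comes out as:
Pattern: snapshot_bicadmin1_31d_%Y-%m-%d-%H-%M
Result:  snapshot_bicadmin1_31d_2016-10-04-19-00   (%Y=2016, %m=10, %d=04, %H=19, %M=00)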
Creating ChangeList between snapshots
- Create a ChangeList between 2 snapshots and list its content.
- Delete it at the end.
BIC-Isilon-Cluster-3# isi snapshot snapshots ls | grep mril3 965 snapshot-mril3-daily-3d-2016-10-04-23-45 /ifs/data/mril/mril3 966 alias-snapshot-mril3-daily-3d /ifs/data/mril/mril3 979 snapshot-mril3-daily-3d-2016-10-05-23-45 /ifs/data/mril/mril3 BIC-Isilon-Cluster-3# isi snapshot snapshots view 979 ID: 979 Name: snapshot-mril3-daily-3d-2016-10-05-23-45 Path: /ifs/data/mril/mril3 Has Locks: No Schedule: snapshot-mril3-daily-3d Alias Target ID: - Alias Target Name: - Created: 2016-10-05T23:45:04 Expires: 2016-10-09T01:45:00 Size: 6.0k Shadow Bytes: 0 % Reserve: 0.00% % Filesystem: 0.00% State: active BIC-Isilon-Cluster-3# isi snapshot snapshots view 965 ID: 965 Name: snapshot-mril3-daily-3d-2016-10-04-23-45 Path: /ifs/data/mril/mril3 Has Locks: No Schedule: snapshot-mril3-daily-3d Alias Target ID: - Alias Target Name: - Created: 2016-10-04T23:45:10 Expires: 2016-10-08T01:45:00 Size: 61.0k Shadow Bytes: 0 % Reserve: 0.00% % Filesystem: 0.00% State: active BIC-Isilon-Cluster-3# isi job jobs start ChangelistCreate --older-snapid 965 --newer-snapid 979 BIC-Isilon-Cluster-3# isi_changelist_mod -l 965_979_inprog BIC-Isilon-Cluster-3# isi job jobs list ID Type State Impact Pri Phase Running Time --------------------------------------------------------------- 964 ChangelistCreate Running Low 5 2/4 21m --------------------------------------------------------------- Total: 1 BIC-Isilon-Cluster-3# isi_changelist_mod -l 965_979 BIC-Isilon-Cluster-3# isi_changelist_mod -a 965_979 st_ino=4357852748 st_mode=040755 st_size=14149 st_atime=1475608476 st_mtime=1475608476 st_ctime=1475698088 st_flags=224 cl_flags=00 path=/ifs/data/mril/mril3/ilana/matlab st_ino=4374360447 st_mode=040755 st_size=207 st_atime=1467833479 st_mtime=1467833479 st_ctime=1475698088 st_flags=224 cl_flags=00 path=/ifs/data/mril/mril3/ilana/matlab/AMICO-master/matlab/other st_ino=4402080042 st_mode=0100644 st_size=1033 st_atime=1475678676 st_mtime=1475678676 st_ctime=1475698087 st_flags=224 cl_flags=01 path=/ifs/data/mril/mril3/ilana/matlab/correlate.m~ st_ino=4402080043 st_mode=0100644 st_size=2922 st_atime=1475588733 st_mtime=1475588733 st_ctime=1475698087 st_flags=224 cl_flags=01 path=/ifs/data/mril/mril3/ilana/matlab/AMICO-master/matlab/other/AMICO_LoadData.m st_ino=4414831639 st_mode=0100644 st_size=1047 st_atime=1475690420 st_mtime=1475690420 st_ctime=1475698087 st_flags=224 cl_flags=01 path=/ifs/data/mril/mril3/ilana/matlab/correlate.m st_ino=4374468851 st_mode=0100644 st_size=2921 st_atime=1466519137 st_mtime=1466519137 st_ctime=1470857490 st_flags=224 cl_flags=02 path=/ifs/data/mril/mril3/ilana/matlab/AMICO-master/matlab/other/AMICO_LoadData.m st_ino=4416223571 st_mode=0100644 st_size=890 st_atime=1475264575 st_mtime=1475264575 st_ctime=1475350931 st_flags=224 cl_flags=02 path=/ifs/data/mril/mril3/ilana/matlab/correlate.m BIC-Isilon-Cluster-3# isi_changelist_mod -k 965_979
Jobs
How To Delete A Large Number Of Files/Dirs Without Impacting Cluster Performance
- Submit the job type TreeDelete.
- Here, /ifs/data/zmanda contains 5 TB of restored data from the Zmanda NDMP backup tapes.
BIC-Isilon-Cluster-4# isi job jobs start TreeDelete --paths /ifs/data/zmanda --priority 10 --policy low Started job [4050] BIC-Isilon-Cluster-4# isi job jobs list ID Type State Impact Pri Phase Running Time --------------------------------------------------------- 4050 TreeDelete Running Low 10 1/1 - --------------------------------------------------------- Total: 1 BIC-Isilon-Cluster-4# isi job jobs view 4050 ID: 4050 Type: TreeDelete State: Running Impact: Low Policy: LOW Pri: 10 Phase: 1/1 Start Time: 2017-10-12T11:45:41 Running Time: 22s Participants: 1, 2, 3, 4, 6 Progress: Started Waiting on job ID: - Description: {'count': 1, 'lins': {'1:1044:db60': """/ifs/data/zmanda"""}} BIC-Isilon-Cluster-4# isi status Cluster Name: BIC-Isilon-Cluster Cluster Health: [ OK ] Cluster Storage: HDD SSD Storage Size: 641.6T (649.3T Raw) 0 (0 Raw) VHS Size: 7.7T Used: 207.1T (32%) 0 (n/a) Avail: 434.5T (68%) 0 (n/a) Health Throughput (bps) HDD Storage SSD Storage ID |IP Address |DASR | In Out Total| Used / Size |Used / Size ---+---------------+-----+-----+-----+-----+-----------------+----------------- 1|172.16.10.20 | OK |48.9k| 133k| 182k|41.4T/ 130T( 32%)|(No Storage SSDs) 2|172.16.10.21 | OK |20.5M|128.0|20.5M|41.4T/ 130T( 32%)|(No Storage SSDs) 3|172.16.10.22 | OK | 1.3M| 111k| 1.4M|41.4T/ 130T( 32%)|(No Storage SSDs) 4|172.16.10.23 | OK |14.1M|75.6M|89.6M|41.4T/ 130T( 32%)|(No Storage SSDs) 5|172.16.10.24 | OK |96.6k|66.7k| 163k|41.4T/ 130T( 32%)|(No Storage SSDs) ---+---------------+-----+-----+-----+-----+-----------------+----------------- Cluster Totals: |36.0M|75.9M| 112M| 207T/ 642T( 32%)|(No Storage SSDs) Health Fields: D = Down, A = Attention, S = Smartfailed, R = Read-Only Critical Events: Cluster Job Status: Running jobs: Job Impact Pri Policy Phase Run Time -------------------------- ------ --- ---------- ----- ---------- TreeDelete[4050] Low 10 LOW 1/1 0:25:01 No paused or waiting jobs. No failed jobs. Recent job results: Time Job Event --------------- -------------------------- ------------------------------ 10/12 04:00:02 ShadowStoreProtect[4049] Succeeded (LOW) 10/12 03:05:16 SnapshotDelete[4048] Succeeded (MEDIUM) 10/12 02:00:17 WormQueue[4047] Succeeded (LOW) 10/12 01:05:31 SnapshotDelete[4046] Succeeded (MEDIUM) 10/12 00:04:57 SnapshotDelete[4045] Succeeded (MEDIUM) 10/11 23:21:32 FSAnalyze[4043] Succeeded (LOW) 10/11 22:37:01 SnapshotDelete[4044] Succeeded (MEDIUM) 10/11 20:00:25 ShadowStoreProtect[4042] Succeeded (LOW) 11/15 14:53:34 MultiScan[1254] MultiScan[1254] Failed 10/06 14:45:55 ChangelistCreate[975] ChangelistCreate[975] Failed
InsightIQ Installation and Config
Install IIQ
- License must be installed on the Isilon cluster.
- Create a CentOS 6.7 (yuck) virtual machine and properly configure the network on it.
- Call this machine zaphod.
- Need the InsightIQ shell script from EMC support.
- Will use a local (to the VM) data store for IIQ.
- The install script will fail due to a dependency mismatch with openssl, I think.
- Here is a way to force the install.
- Extract the content of the self-packaged script.
- Remove the offending package openssl-1.0.1e-42.el6_7.2.x86_64.rpm from it.
- Manually install openssl-devel: yum install openssl-devel.x86_64.
- Run the install script sh ./install_insightiq.sh.
root@zaphod ~$ sh ./install-insightiq-4.0.0.0049.sh --target ./iiq root@zaphod ~$ cd iiq root@zaphod ~$ rm openssl-1.0.1e-42.el6_7.2.x86_64.rpm root@zaphod ~/iiq$ ll *.rpm -rw-r--r-- 1 root root 928548 Jan 12 18:55 bash-4.1.2-29.el6.x86_64.rpm -rw-r--r-- 1 root root 367680 Jan 12 18:55 freetype-2.3.11-14.el6_3.1.x86_64.rpm -rw-r--r-- 1 root root 3993500 Jan 12 18:55 glibc-2.12-1.149.el6_6.7.x86_64.rpm -rw-r--r-- 1 root root 14884088 Jan 12 18:56 glibc-common-2.12-1.149.el6_6.7.x86_64.rpm -rw-r--r-- 1 root root 25899811 Jan 12 18:55 isilon-insightiq-4.0.0.0049-1.x86_64.rpm -rw-r--r-- 1 root root 139192 Jan 12 18:56 libXfont-1.4.5-3.el6_5.x86_64.rpm -rw-r--r-- 1 root root 25012 Jan 12 18:56 libfontenc-1.0.5-2.el6.x86_64.rpm -rw-r--r-- 1 root root 178512 Jan 12 18:56 libjpeg-turbo-1.2.1-3.el6_5.x86_64.rpm -rw-r--r-- 1 root root 186036 Jan 12 18:56 libpng-1.2.49-1.el6_2.x86_64.rpm -rw-r--r-- 1 root root 280524 Jan 12 18:56 openssh-5.3p1-112.el6_7.x86_64.rpm -rw-r--r-- 1 root root 448872 Jan 12 18:56 openssh-clients-5.3p1-112.el6_7.x86_64.rpm -rw-r--r-- 1 root root 331544 Jan 12 18:56 openssh-server-5.3p1-112.el6_7.x86_64.rpm -rw-r--r-- 1 root root 1225760 Jan 12 18:56 openssl-devel-1.0.1e-42.el6_7.2.x86_64.rpm -rw-r--r-- 1 root root 1033984 Jan 12 18:56 postgresql93-9.3.4-1PGDG.rhel6.x86_64.rpm -rw-r--r-- 1 root root 1544220 Jan 12 18:56 postgresql93-devel-9.3.4-1PGDG.rhel6.x86_64.rpm -rw-r--r-- 1 root root 194856 Jan 12 18:56 postgresql93-libs-9.3.4-1PGDG.rhel6.x86_64.rpm -rw-r--r-- 1 root root 4259740 Jan 12 18:56 postgresql93-server-9.3.4-1PGDG.rhel6.x86_64.rpm -rw-r--r-- 1 root root 43900 Jan 12 18:56 ttmkfdir-3.0.9-32.1.el6.x86_64.rpm -rw-r--r-- 1 root root 453984 Jan 12 18:56 tzdata-2015b-1.el6.noarch.rpm -rw-r--r-- 1 root root 39308573 Jan 12 18:56 wkhtmltox-0.12.2.1_linux-centos6-amd64.rpm -rw-r--r-- 1 root root 76712 Jan 12 18:56 xorg-x11-font-utils-7.2-11.el6.x86_64.rpm -rw-r--r-- 1 root root 2929960 Jan 12 18:56 xorg-x11-fonts-75dpi-7.2-9.1.el6.noarch.rpm -rw-r--r-- 1 root root 532016 Jan 12 18:56 xorg-x11-fonts-Type1-7.2-9.1.el6.noarch.rpm root@zaphod ~/iiq$ yum list openssl\* Installed Packages openssl.x86_64 1.0.1e-42.el6_7.4 @updates/$releasever Available Packages openssl.i686 1.0.1e-42.el6_7.4 updates openssl-devel.i686 1.0.1e-42.el6_7.4 updates openssl-devel.x86_64 1.0.1e-42.el6_7.4 updates openssl-perl.x86_64 1.0.1e-42.el6_7.4 updates openssl-static.x86_64 1.0.1e-42.el6_7.4 updates openssl098e.i686 0.9.8e-20.el6.centos.1 updates openssl098e.x86_64 0.9.8e-20.el6.centos.1 updates root@zaphod ~/iiq$ yum install openssl-devel.x86_64 =============================================================================================================================================================================================== Package Arch Version Repository Size =============================================================================================================================================================================================== Installing: openssl-devel x86_64 1.0.1e-42.el6_7.4 updates 1.2 M Installing for dependencies: keyutils-libs-devel x86_64 1.4-5.el6 base 29 k krb5-devel x86_64 1.10.3-42z1.el6_7 updates 502 k libcom_err-devel x86_64 1.41.12-22.el6 base 33 k libselinux-devel x86_64 2.0.94-5.8.el6 base 137 k libsepol-devel x86_64 2.0.41-4.el6 base 64 k zlib-devel x86_64 1.2.3-29.el6 base 44 k Transaction Summary 
=============================================================================================================================================================================================== Install 7 Package(s) Total download size: 2.0 M Installed size: 4.9 M Installed: openssl-devel.x86_64 0:1.0.1e-42.el6_7.4 Dependency Installed: keyutils-libs-devel.x86_64 0:1.4-5.el6 krb5-devel.x86_64 0:1.10.3-42z1.el6_7 libcom_err-devel.x86_64 0:1.41.12-22.el6 libselinux-devel.x86_64 0:2.0.94-5.8.el6 libsepol-devel.x86_64 0:2.0.41-4.el6 zlib-devel.x86_64 0:1.2.3-29.el6 root@zaphod ~/iiq$ sh ./install_insightiq.sh This script automates the installation or upgrade of InsightIQ. If you are running a version of InsightIQ that can be upgraded by this version, the upgrade will occur automatically. If you are trying to upgrade an unsupported version, the script will exit. If you are installing on a new system, the script will perform a clean install. Are you ready to proceed with the installation? Please enter (Y)es or (N)o followed by [ENTER] >>> y =============================================================================================================================================================================================== Package Arch Version Repository Size =============================================================================================================================================================================================== Installing: freetype x86_64 2.3.11-14.el6_3.1 /freetype-2.3.11-14.el6_3.1.x86_64 816 k isilon-insightiq x86_64 4.0.0.0049-1 /isilon-insightiq-4.0.0.0049-1.x86_64 93 M libXfont x86_64 1.4.5-3.el6_5 /libXfont-1.4.5-3.el6_5.x86_64 295 k libfontenc x86_64 1.0.5-2.el6 /libfontenc-1.0.5-2.el6.x86_64 40 k libjpeg-turbo x86_64 1.2.1-3.el6_5 /libjpeg-turbo-1.2.1-3.el6_5.x86_64 466 k libpng x86_64 2:1.2.49-1.el6_2 /libpng-1.2.49-1.el6_2.x86_64 639 k postgresql93 x86_64 9.3.4-1PGDG.rhel6 /postgresql93-9.3.4-1PGDG.rhel6.x86_64 5.2 M postgresql93-devel x86_64 9.3.4-1PGDG.rhel6 /postgresql93-devel-9.3.4-1PGDG.rhel6.x86_64 6.7 M postgresql93-libs x86_64 9.3.4-1PGDG.rhel6 /postgresql93-libs-9.3.4-1PGDG.rhel6.x86_64 631 k postgresql93-server x86_64 9.3.4-1PGDG.rhel6 /postgresql93-server-9.3.4-1PGDG.rhel6.x86_64 15 M ttmkfdir x86_64 3.0.9-32.1.el6 /ttmkfdir-3.0.9-32.1.el6.x86_64 99 k wkhtmltox x86_64 1:0.12.2.1-1 /wkhtmltox-0.12.2.1_linux-centos6-amd64 109 M xorg-x11-font-utils x86_64 1:7.2-11.el6 /xorg-x11-font-utils-7.2-11.el6.x86_64 294 k xorg-x11-fonts-75dpi noarch 7.2-9.1.el6 /xorg-x11-fonts-75dpi-7.2-9.1.el6.noarch 2.9 M xorg-x11-fonts-Type1 noarch 7.2-9.1.el6 /xorg-x11-fonts-Type1-7.2-9.1.el6.noarch 863 k Installing for dependencies: avahi-libs x86_64 0.6.25-15.el6 base 55 k blas x86_64 3.2.1-4.el6 base 321 k c-ares x86_64 1.10.0-3.el6 base 75 k cups-libs x86_64 1:1.4.2-72.el6 base 321 k cyrus-sasl-gssapi x86_64 2.1.23-15.el6_6.2 base 34 k fontconfig x86_64 2.8.0-5.el6 base 186 k gnutls x86_64 2.8.5-19.el6_7 updates 347 k keyutils x86_64 1.4-5.el6 base 39 k lapack x86_64 3.2.1-4.el6 base 4.3 M libX11 x86_64 1.6.0-6.el6 base 586 k libX11-common noarch 1.6.0-6.el6 base 192 k libXau x86_64 1.0.6-4.el6 base 24 k libXext x86_64 1.3.2-2.1.el6 base 35 k libXrender x86_64 0.9.8-2.1.el6 base 24 k libbasicobjects x86_64 0.1.1-11.el6 base 21 k libcollection x86_64 0.6.2-11.el6 base 36 k libdhash x86_64 0.4.3-11.el6 base 24 k libevent x86_64 1.4.13-4.el6 base 66 k libgfortran x86_64 4.4.7-16.el6 base 267 k libgssglue x86_64 0.1-11.el6 base 23 k libini_config x86_64 
1.1.0-11.el6 base 46 k libipa_hbac x86_64 1.12.4-47.el6_7.8 updates 106 k libldb x86_64 1.1.25-2.el6_7 updates 113 k libnl x86_64 1.1.4-2.el6 base 121 k libpath_utils x86_64 0.2.1-11.el6 base 24 k libref_array x86_64 0.1.4-11.el6 base 23 k libsss_idmap x86_64 1.12.4-47.el6_7.8 updates 110 k libtalloc x86_64 2.1.5-1.el6_7 updates 26 k libtdb x86_64 1.3.8-1.el6_7 updates 43 k libtevent x86_64 0.9.26-2.el6_7 updates 29 k libtiff x86_64 3.9.4-10.el6_5 base 343 k libtirpc x86_64 0.2.1-10.el6 base 79 k libxcb x86_64 1.9.1-3.el6 base 110 k nfs-utils x86_64 1:1.2.3-64.el6 base 331 k nfs-utils-lib x86_64 1.1.5-11.el6 base 68 k pytalloc x86_64 2.1.5-1.el6_7 updates 10 k python-argparse noarch 1.2.1-2.1.el6 base 48 k python-sssdconfig noarch 1.12.4-47.el6_7.8 updates 133 k rpcbind x86_64 0.2.0-11.el6_7 updates 51 k samba4-libs x86_64 4.2.10-6.el6_7 updates 4.4 M sssd x86_64 1.12.4-47.el6_7.8 updates 101 k sssd-ad x86_64 1.12.4-47.el6_7.8 updates 193 k sssd-client x86_64 1.12.4-47.el6_7.8 updates 152 k sssd-common x86_64 1.12.4-47.el6_7.8 updates 978 k sssd-common-pac x86_64 1.12.4-47.el6_7.8 updates 136 k sssd-ipa x86_64 1.12.4-47.el6_7.8 updates 238 k sssd-krb5 x86_64 1.12.4-47.el6_7.8 updates 135 k sssd-krb5-common x86_64 1.12.4-47.el6_7.8 updates 191 k sssd-ldap x86_64 1.12.4-47.el6_7.8 updates 216 k sssd-proxy x86_64 1.12.4-47.el6_7.8 updates 130 k Transaction Summary =============================================================================================================================================================================================== Install 65 Package(s) Total size: 252 M Total download size: 15 M Installed size: 277 M insightiq 0:off 1:off 2:on 3:on 4:on 5:on 6:off chmod: cannot access `sssd': No such file or directory ip6tables: unrecognized service ip6tables: unrecognized service error reading information on service ip6tables: No such file or directory Shutting down interface eth0: [ OK ] Shutting down loopback interface: [ OK ] Bringing up loopback interface: [ OK ] Bringing up interface eth0: Determining if ip address 132.206.178.250 is already in use for device eth0... 
[ OK ] Generating RSA private key, 2048 bit long modulus ..+++ .........................................................................................................................+++ e is 65537 (0x10001) Signature ok subject=/C=US/ST=Washington/L=Seattle/O=EMC Isilon/CN=InsightIQ/emailAddress=support@emc.com Getting Private key Initializing database: [ OK ] Starting iiq_db service: [ OK ] Starting insightiq: [ OK ] Installed: freetype.x86_64 0:2.3.11-14.el6_3.1 isilon-insightiq.x86_64 0:4.0.0.0049-1 libXfont.x86_64 0:1.4.5-3.el6_5 libfontenc.x86_64 0:1.0.5-2.el6 libjpeg-turbo.x86_64 0:1.2.1-3.el6_5 libpng.x86_64 2:1.2.49-1.el6_2 postgresql93.x86_64 0:9.3.4-1PGDG.rhel6 postgresql93-devel.x86_64 0:9.3.4-1PGDG.rhel6 postgresql93-libs.x86_64 0:9.3.4-1PGDG.rhel6 postgresql93-server.x86_64 0:9.3.4-1PGDG.rhel6 ttmkfdir.x86_64 0:3.0.9-32.1.el6 wkhtmltox.x86_64 1:0.12.2.1-1 xorg-x11-font-utils.x86_64 1:7.2-11.el6 xorg-x11-fonts-75dpi.noarch 0:7.2-9.1.el6 xorg-x11-fonts-Type1.noarch 0:7.2-9.1.el6 Dependency Installed: avahi-libs.x86_64 0:0.6.25-15.el6 blas.x86_64 0:3.2.1-4.el6 c-ares.x86_64 0:1.10.0-3.el6 cups-libs.x86_64 1:1.4.2-72.el6 cyrus-sasl-gssapi.x86_64 0:2.1.23-15.el6_6.2 fontconfig.x86_64 0:2.8.0-5.el6 gnutls.x86_64 0:2.8.5-19.el6_7 keyutils.x86_64 0:1.4-5.el6 lapack.x86_64 0:3.2.1-4.el6 libX11.x86_64 0:1.6.0-6.el6 libX11-common.noarch 0:1.6.0-6.el6 libXau.x86_64 0:1.0.6-4.el6 libXext.x86_64 0:1.3.2-2.1.el6 libXrender.x86_64 0:0.9.8-2.1.el6 libbasicobjects.x86_64 0:0.1.1-11.el6 libcollection.x86_64 0:0.6.2-11.el6 libdhash.x86_64 0:0.4.3-11.el6 libevent.x86_64 0:1.4.13-4.el6 libgfortran.x86_64 0:4.4.7-16.el6 libgssglue.x86_64 0:0.1-11.el6 libini_config.x86_64 0:1.1.0-11.el6 libipa_hbac.x86_64 0:1.12.4-47.el6_7.8 libldb.x86_64 0:1.1.25-2.el6_7 libnl.x86_64 0:1.1.4-2.el6 libpath_utils.x86_64 0:0.2.1-11.el6 libref_array.x86_64 0:0.1.4-11.el6 libsss_idmap.x86_64 0:1.12.4-47.el6_7.8 libtalloc.x86_64 0:2.1.5-1.el6_7 libtdb.x86_64 0:1.3.8-1.el6_7 libtevent.x86_64 0:0.9.26-2.el6_7 libtiff.x86_64 0:3.9.4-10.el6_5 libtirpc.x86_64 0:0.2.1-10.el6 libxcb.x86_64 0:1.9.1-3.el6 nfs-utils.x86_64 1:1.2.3-64.el6 nfs-utils-lib.x86_64 0:1.1.5-11.el6 pytalloc.x86_64 0:2.1.5-1.el6_7 python-argparse.noarch 0:1.2.1-2.1.el6 python-sssdconfig.noarch 0:1.12.4-47.el6_7.8 rpcbind.x86_64 0:0.2.0-11.el6_7 samba4-libs.x86_64 0:4.2.10-6.el6_7 sssd.x86_64 0:1.12.4-47.el6_7.8 sssd-ad.x86_64 0:1.12.4-47.el6_7.8 sssd-client.x86_64 0:1.12.4-47.el6_7.8 sssd-common.x86_64 0:1.12.4-47.el6_7.8 sssd-common-pac.x86_64 0:1.12.4-47.el6_7.8 sssd-ipa.x86_64 0:1.12.4-47.el6_7.8 sssd-krb5.x86_64 0:1.12.4-47.el6_7.8 sssd-krb5-common.x86_64 0:1.12.4-47.el6_7.8 sssd-ldap.x86_64 0:1.12.4-47.el6_7.8 sssd-proxy.x86_64 0:1.12.4-47.el6_7.8
Configure IIQ and X509 Certificates for Web Access
- Follow the install manual, with the caveat that it contains some errors and omissions.
- Create a user called iiq on the IIQ server (zaphod).
- On the Isilon cluster, activate the user insightiq on the Auth File provider, System zone.
BIC-Isilon-Cluster-4# isi auth users view insightiq Name: insightiq DN: - DNS Domain: - Domain: UNIX_USERS Provider: lsa-file-provider:System Sam Account Name: insightiq UID: 15 SID: S-1-22-1-15 Enabled: Yes Expired: No Expiry: - Locked: No Email: - GECOS: InsightIQ User Generated GID: No Generated UID: No Generated UPN: Yes Primary Group ID: GID:15 Name: insightiq Home Directory: /ifs/home/insightiq Max Password Age: - Password Expired: No Password Expiry: - Password Last Set: - Password Expires: Yes Shell: /sbin/nologin UPN: insightiq@UNIX_USERS User Can Change Password: No
- Install the BIC wildcard Comodo X509 certificate, the server key and the Comodo CA bundle in /home/iiq/certificates/STAR_bic_mni_mcgill_ca.pem.
root@zaphod ~$ cat STAR_bic_mni_mcgill_ca.crt \
               STAR_bic_mni_mcgill_ca.key \
               COMODO_CA_bundle.crt >> /home/iiq/certificates/STAR_bic_mni_mcgill_ca.pem
- Protect that file since it contains the secret server key.
root@zaphod ~$ chmod 400 /home/iiq/certificates/STAR_bic_mni_mcgill_ca.pem
- Modify /etc/isilon/insightiq.ini for the server cert location: ssl_pem = /home/iiq/certificates/STAR_bic_mni_mcgill_ca.pem.
- Restart the IIQ stuff with /etc/init.d/insightiq restart.
- The installation guide uses the command iiq_restart, which is an alias defined in /etc/profile.d/insightiq.sh.
- Check in /var/log/insightiq_stdio.log to see if the cert is OK.
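- The same steps from the shell, plus a quick check that the web server actually presents the wildcard certificate and its chain (a sketch; the grep only confirms the ini edit, and the hostname in the openssl line is an example for whatever name resolves to zaphod):
root@zaphod ~$ grep ssl_pem /etc/isilon/insightiq.ini
ssl_pem = /home/iiq/certificates/STAR_bic_mni_mcgill_ca.pem
root@zaphod ~$ /etc/init.d/insightiq restart
root@zaphod ~$ tail -n 50 /var/log/insightiq_stdio.log
root@zaphod ~$ echo | openssl s_client -connect zaphod.bic.mni.mcgill.ca:443 2>/dev/null | openssl x509 -noout -subject -issuer -dates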
- Ports 80 and 443 must not be blocked by a firewall. Access restrictions should be enabled, however.
- Connect to the web interface using the credentials for the user iiq.
- Go to “Settings” and add a cluster to monitor using the SmartConnect service IP name sip.bic.mni.mcgill.ca.
- Use the cluster’s local user insightiq and its credentials to connect to the cluster.
- Bingo.
NFS Benchmarks Using FIO
- The following is shamelessly copied/stolen (with a few modifications to suit our local environment) from this EMC blog entry:
- The benchmark as described in the above URL bypasses the mechanisms provided by the Isilon SmartConnect Advanced product:
  - Connections to the cluster are made directly to an IP of a particular node in the Isilon cluster.
  - This is done in order not to introduce any bias from the load-balancing (round-robin, CPU or network) done by SmartConnect.
- Strategy:
- Latencies and IOPs (I/O operations per second) are the most meaningful metrics when assessing random I/O performance: bandwidth is of secondary value for this kind of I/O access pattern.
- For sequential access to storage, I/O performance is best assessed by measuring the client-to-server bandwidth.
- The client buffers and caching mechanisms must be examined and dealt with carefully.
- We are not interested in benchmarking the clients' efficient use of their local caches!
- A bird's-eye view of the network layout and the working files organization, as explained in the URL above:
/ifs/data/fiotest => /mnt/isilon/fiotest/ .. /fiojob_8k_50G_4jobs_randrw /fioresult_8k_50G_4jobs_randrw_172.16.20.42.log /172.16.20.102/ /172.16.20.203/ /172.16.20.204/ /172.16.20.42/ /control/.. /cleanup_remount_1to1.sh /nfs_copy_trusted.sh +-----------------+ /run_nfs_fio_8k_50G_4jobs_randrw.sh | node02 | /nfs_hosts.list | 172.16.20.202 | /trusted.key +-----------------+ /trusted.key.pub 2x 1GiG +-----------------+ +------------------+ | node03 |.........| LNN2 | | 172.16.20.203 |.........| 172.16.20.236 | +-----------------+ +------------------+ +-----------------+ +------------------+ | node04 |.........| LNN4 | | 172.16.20.204 |.........| 172.16.20.234 | +-----------------+ +------------------+ +-----------------+ +------------------+ | thaisa |.........| LNN5 | | 172.16.20.42 |.........| 172.16.20.235 | +-----------------+ +------------------+ +-----------------+ +------------------+ | widow |.........| LNN1 | | 172.16.20.102 |.........| 172.16.20.237 | +-----------------+ +------------------+ +------------------+ | LNN3 | | 172.16.20.233 | +------------------+
If this diagram is enough for you and you are not interested in the details of the networking setup, skip to the FIO Configuration and Benchmarking section, or simply jump to the FIO NFS Statistics Reports section for the actual results of the benchmarks.
Nodes Configuration
- Use the nodes node02, node03, node04, thaisa and widow.
- Note: somehow I can't make the 6th node vaux mount the Isilon exports, so I didn't use it out of frustration.
- Here are the relevant nodes' network configurations and settings:
- node02, node03 and node04 have the same network layout:
  - eth0 in 192.168.86.0/24
  - eth1 and eth2 bonded to bond0 in the data network 172.16.20.0/24
  - bond0:0 IP alias in the management network 172.16.10.0/24
- All data links from the nodes to the Isilon cluster network front-end are dual 1GiG links in a bond.
node02: eth0 inet addr:192.168.86.202 Bcast:192.168.86.255 Mask:255.255.255.0 bond0 inet addr:172.16.20.202 Bcast:172.16.20.255 Mask:255.255.255.0 bond0:0 inet addr:172.16.10.202 Bcast:172.16.10.255 Mask:255.255.255.0 ~# route -n Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 0.0.0.0 192.168.86.1 0.0.0.0 UG 0 0 0 eth0 172.16.10.0 0.0.0.0 255.255.255.0 U 0 0 0 bond0 172.16.20.0 0.0.0.0 255.255.255.0 U 0 0 0 bond0 192.168.86.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0 node03: eth0 inet addr:192.168.86.203 Bcast:192.168.86.255 Mask:255.255.255.0 bond0 inet addr:172.16.20.203 Bcast:172.16.20.255 Mask:255.255.255.0 bond0:0 inet addr:172.16.10.203 Bcast:172.16.10.255 Mask:255.255.255.0 ~# route -n Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 0.0.0.0 192.168.86.1 0.0.0.0 UG 0 0 0 eth0 172.16.10.0 0.0.0.0 255.255.255.0 U 0 0 0 bond0 172.16.20.0 0.0.0.0 255.255.255.0 U 0 0 0 bond0 192.168.86.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0 node04: eth0 inet addr:192.168.86.204 Bcast:192.168.86.255 Mask:255.255.255.0 bond0 inet addr:172.16.20.204 Bcast:172.16.20.255 Mask:255.255.255.0 bond0:0 inet addr:172.16.10.204 Bcast:172.16.10.255 Mask:255.255.255.0 ~# route -n Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 0.0.0.0 192.168.86.1 0.0.0.0 UG 0 0 0 eth0 172.16.10.0 0.0.0.0 255.255.255.0 U 0 0 0 bond0 172.16.20.0 0.0.0.0 255.255.255.0 U 0 0 0 bond0 192.168.86.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
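- To double-check the bond members and mode on a client node (distro-agnostic; output not shown):
~# cat /proc/net/bonding/bond0
~# ip addr show bond0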
- thaisa and widow are in real life Xen Dom0 hosts.
  - They provide virtual hosts when running a Xen-ified kernel.
  - For the purpose of this test, both have been rebooted without a Xen kernel.
  - They use a (virtual) bridged network interface xenbr0 connected to a bonded network interface bond0 that acts as the external physical network interface.
thaisa: ~# brctl show bridge name bridge id STP enabled interfaces xenbr0 8000.00e081c19a1a no bond0 xenbr0 inet addr:132.206.178.42 Bcast:132.206.178.255 Mask:255.255.255.0 xenbr0:0 inet addr:172.16.20.42 Bcast:172.16.20.255 Mask:255.255.255.0 route -n Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 0.0.0.0 132.206.178.1 0.0.0.0 UG 0 0 0 xenbr0 132.206.178.0 0.0.0.0 255.255.255.0 U 0 0 0 xenbr0 172.16.20.0 0.0.0.0 255.255.255.0 U 0 0 0 xenbr0 192.168.86.0 0.0.0.0 255.255.255.0 U 0 0 0 xenbr0 widow: ~# brctl show bridge name bridge id STP enabled interfaces xenbr0 8000.00e081c19a9a no bond0 xenbr0 inet addr:132.206.178.102 Bcast:132.206.178.255 Mask:255.255.255.0 xenbr0:0 inet addr:172.16.20.102 Bcast:172.16.20.255 Mask:255.255.255.0 route -n Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 0.0.0.0 132.206.178.1 0.0.0.0 UG 0 0 0 xenbr0 132.206.178.0 0.0.0.0 255.255.255.0 U 0 0 0 xenbr0 172.16.20.0 0.0.0.0 255.255.255.0 U 0 0 0 xenbr0 192.168.86.0 0.0.0.0 255.255.255.0 U 0 0 0 xenbr0
- Create a NIS netgroup fiotest:
# Temp netgroup for fio test: node02,node03,node04,thaisa,widow.
# nodes access Isilon from the 172.16.20.0/24 network.
# thaisa, vaux and widow have bonded NIC aliases in 172.16.20.0/24 and 132.206.178.0/24 networks.
# FOLLOWING LINE IS A ONE LINER. WATCH OUT FOR END_OF_LINE CHARACTERS!
# DO NOT COPY-AND-PASTE!!
fiotest (172.16.20.202,,) (172.16.20.203,,) (172.16.20.204,,) \
        (172.16.20.42,,) (132.206.178.42,,) (thaisa.bic.mni.mcgill.ca,,) \
        (172.16.20.102,,) (132.206.178.102,,) (widow.bic.mni.mcgill.ca,,)
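- Once the NIS maps are pushed, check that the netgroup resolves on the clients (either command should list the triples above, assuming the NIS client tools are installed):
~# getent netgroup fiotest
~# ypmatch fiotest netgroup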
- Node02 is the control host (called “harness” in the blog link above).
- Node02 should have password-less root access to the Isilon cluster and to the hosts in fiotest:
  - Create an ssh key and distribute the public key to the fiotest hosts in ~root/.ssh/authorized_keys (a sketch follows the ssh config file below).
  - Distribute the pub key on all the nodes of the cluster using isi_for_array.
  - Create an ssh config file that will redirect the ssh host keys to /dev/null.
  - The options CheckHostIP and StrictHostKeyChecking are there to remove any spurious warning messages from the output stream of the FIO logfiles.
node02:~# cat .ssh/config
Host mgmt.isi.bic.mni.mcgill.ca
    ForwardX11 no
    ForwardAgent no
    User root
    CheckHostIP no
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null
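- A minimal sketch of creating the key pair and pushing the public key out (assuming ssh-copy-id is available on node02; adapt the host list to the fiotest members and use isi_for_array for the cluster side as noted above):
node02:~# ssh-keygen -t rsa -b 2048 -N "" -f /mnt/isilon/fiotest/control/trusted.key
node02:~# for h in 172.16.20.203 172.16.20.204 172.16.20.42 172.16.20.102 mgmt.isi.bic.mni.mcgill.ca; do
              ssh-copy-id -i /mnt/isilon/fiotest/control/trusted.key.pub root@$h
          done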
- Verify that you can ssh from node02 to any of the client nodes or cluster nodes, and also to mgmt.isi.bic.mni.mcgill.ca, without issuing a password.
- Continue if and only if you have this working without any problem.
- Create an NFSv4 export on the Isilon cluster with the following properties:
isi nfs exports create /ifs/data/fiotest --zone prod --root-clients fiotest --clients fiotest
isi nfs aliases create /fiotest /ifs/data/fiotest --zone prod
isi quota quotas create /ifs/data/fiotest directory --zone prod --hard-threshold 2T --container yes
chmod 777 /ifs/data/fiotest
ls -ld /ifs/data/fiotest
drwxrwxrwx    9 root  wheel  4711 Sep 27 11:11 /ifs/data/fiotest
- no_root_squash for the hosts in fiotest: all nodes must be able to write as root in /ifs/data/fiotest.
- read/write by everyone for the top dir /ifs/data/fiotest.
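- To double-check the export, alias and quota afterwards (OneFS 8.x CLI from memory, so verify the flags against --help):
isi nfs exports list --zone prod
isi nfs aliases list --zone prod
isi quota quotas list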
- On all the nodes:
  - Create the local mount point /mnt/isilon/fiotest.
  - Verify that each node can mount the Isilon export on the local mount point /mnt/isilon/fiotest (a one-off test mount is sketched just below).
  - Continue if and only if you have this working without any problem.
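- For example, on node03, a one-off test mount against its paired cluster IP (see nfs_hosts.list below):
node03:~# mkdir -p /mnt/isilon/fiotest
node03:~# mount -t nfs -o vers=4 172.16.20.236:/fiotest /mnt/isilon/fiotest
node03:~# df -h /mnt/isilon/fiotest
node03:~# umount /mnt/isilon/fiotest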
- Create a file /mnt/isilon/fiotest/control/nfs_hosts.list with the IPs of the client nodes and the paired cluster node IPs.
- The separator is the pipe character |.
- No comments, no white space, no trailing end-of-line white space!
~# cat /mnt/isilon/fiotest/control/nfs_hosts.list
172.16.20.203|172.16.20.236
172.16.20.204|172.16.20.234
172.16.20.42|172.16.20.235
172.16.20.102|172.16.20.237
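- A quick sanity check for stray whitespace or bad separators before running anything (cat -A shows invisible characters, the read loop mimics how the scripts will parse the file):
~# cat -A /mnt/isilon/fiotest/control/nfs_hosts.list
~# while IFS='|' read -r client node; do echo "client=$client node=$node"; done < /mnt/isilon/fiotest/control/nfs_hosts.list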
- Verify that the export /ifs/data/fiotest on the Isilon cluster can be mounted on all nodes.
- Only when the configuration above is done and working correctly can we start working with FIO.
If you are not interested in the FIO configuration, jump to the FIO NFS Statistics Reports section for the actual results of the benchmarks.
FIO Configuration and Benchmarking
- The FIO benchmarks are run using the following logic:
- On the master node node02, for each benchmark run with a specific FIO configuration:
  - Run the cleaning script /mnt/isilon/fiotest/control/cleanup_remount_1to1.sh:
    - For each benchmarking node in /mnt/isilon/fiotest/control/nfs_hosts.list:
      - Remove any FIO working files located in /mnt/isilon/fiotest/xxx.xxx.xxx.xxx used in the read-write benchmarks.
      - umount/remount the Isilon export on the benchmarking node.
  - Run the FIO script /mnt/isilon/fiotest/control/run_nfs_fio_8k_50G_4jobs_randrw.sh:
    - Flush the L1 and L2 caches on all the Isilon cluster nodes.
    - For each node in /mnt/isilon/fiotest/control/nfs_hosts.list:
      - Connect to the benchmarking node with ssh.
      - Sync all local filesystem cache buffers to disk and flush all I/O caches.
      - Run the FIO command:
        - The FIO job file is /mnt/isilon/fiotest/fiojob_8k_50G_4jobs_randrw.
        - The FIO working dir is /mnt/isilon/fiotest/xxx.xxx.xxx.xxx, where the x's are the IP of the benchmarking node.
        - The FIO output is sent to /mnt/isilon/fiotest/fioresult_8k_50G_4jobs_randrw_xxx.xxx.xxx.xxx.log.
- Once more, an ascii diagram explaining the files layout:
/ifs/data/fiotest => /mnt/isilon/fiotest/ .. /fiojob_8k_50G_4jobs_randrw <-- FIO jobfile /fioresult_8k_50G_4jobs_randrw_172.16.20.42.log <-- FIO output logfile /172.16.20.102/ <-- FIO working directory /172.16.20.203/ <-- " " /172.16.20.204/ <-- " " /172.16.20.42/ <-- " " /control/.. /cleanup_remount_1to1.sh <-- cleanup and remount script /nfs_copy_trusted.sh <-- key distributor script +-----------------+ /run_nfs_fio_8k_50G_4jobs_randrw.sh <-- FIO start script | node02 | /nfs_hosts.list <-- nodes list | 172.16.20.202 | /trusted.key <-- ssh private +-----------------+ /trusted.key.pub <-- and public keys 2x 1GiG +-----------------+ +------------------+ | node03 |.........| LNN2 | | 172.16.20.203 |.........| 172.16.20.236 | +-----------------+ +------------------+ +-----------------+ +------------------+ | node04 |.........| LNN4 | | 172.16.20.204 |.........| 172.16.20.234 | +-----------------+ +------------------+ +-----------------+ +------------------+ | thaisa |.........| LNN5 | | 172.16.20.42 |.........| 172.16.20.235 | +-----------------+ +------------------+ +-----------------+ +------------------+ | widow |.........| LNN1 | | 172.16.20.102 |.........| 172.16.20.237 | +-----------------+ +------------------+ +------------------+ | LNN3 | | 172.16.20.233 | +------------------+
The cleanup script file /mnt/isilon/fiotest/control/cleanup_remount_1to1.sh
#!/bin/bash
# first go through all lines in nfs_hosts.list
for i in $(cat /mnt/isilon/fiotest/control/nfs_hosts.list) ; do
    # then split each line read in to an array by the pipe symbol
    IFS='|' read -a pairs <<< "${i}"
    # show back the mapping
    echo "Client host: ${pairs[0]}   Isilon node: ${pairs[1]}"
    # connect over ssh with the key and mount hosts, create directories etc. - has to be a single line
    ssh -i /mnt/isilon/fiotest/control/trusted.key ${pairs[0]} -fqno StrictHostKeyChecking=no \
      "[ -d /mnt/isilon/fiotest/${pairs[0]} ] && rm -rf /mnt/isilon/fiotest/${pairs[0]}; sleep 1; \
       umount -fl /mnt/isilon/fiotest; sleep 1; \
       mount -t nfs -o vers=4 ${pairs[1]}:/fiotest /mnt/isilon/fiotest; sleep 1; \
       [ ! -d /mnt/isilon/fiotest/${pairs[0]} ] && mkdir /mnt/isilon/fiotest/${pairs[0]}"
    # erase the array pair
    unset pairs
    # go for the next line in nfs_hosts.list
done
The FIO script file /mnt/isilon/fiotest/control/run_nfs_fio_8k_50G_4jobs_randrw.sh
#!/bin/bash
# First, connect to the first isilon node, and flush the cache on the array.
# This might take minutes to complete.
echo -n "Purging L1 and L2 cache first..."
ssh -i /mnt/isilon/fiotest/control/trusted.key mgmt.isi.bic.mni.mcgill.ca -fqno StrictHostKeyChecking=no "isi_for_array isi_flush"
#ssh -i /mnt/isilon/fiotest/control/trusted.key mgmt.isi.bic.mni.mcgill.ca -fqno StrictHostKeyChecking=no "isi_for_array w"
# wait for the cache flushing to finish, normally around 10 seconds is enough
# on larger clusters, sometimes up to a few minutes should be used!
echo "...sleeping for 30secs"
sleep 30
# The L3 cache purge is not recommended as all metadata accelerated by SSDs is going. but, maybe...
#echo "On OneFS 7.1.1 clusters and newer, running L3, purging L3 cache";
#ssh -i /mnt/isilon/fiotest/control/trusted.key 10.63.208.64 -fqno StrictHostKeyChecking=no "isi_for_array isi_flush --l3-full";
#sleep 10;
# The rest is similar to the other scripts
# First go through all lines in nfs_hosts.list
for i in $(cat /mnt/isilon/fiotest/control/nfs_hosts.list) ; do
    # then split each line read in to an array by the pipe symbol
    IFS='|' read -a pairs <<< "${i}"
    # Connect over ssh with the key and run fio on each node - has to be a single line
    # "sync && echo 3 > /proc/sys/vm/drop_caches" purges all buffers to disk
    # The fio jobfile is one level above the control directory
    ssh -i /mnt/isilon/fiotest/control/trusted.key ${pairs[0]} -fqno StrictHostKeyChecking=no \
      "sync && echo 3 > /proc/sys/vm/drop_caches; FILENAME=\"/mnt/isilon/fiotest/${pairs[0]}\" \
       fio --output=/mnt/isilon/fiotest/fioresult_8k_50G_4jobs_randrw_${pairs[0]}.log \
       /mnt/isilon/fiotest/fiojob_8k_50G_4jobs_randrw"
done
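- The whole sweep over block sizes can then be driven from node02 with a trivial wrapper; this assumes one run script per block size following the same naming scheme (only run_nfs_fio_8k_50G_4jobs_randrw.sh is shown in these notes):
#!/bin/bash
# hypothetical wrapper: clean/remount, then run the FIO job for each block size
for bs in 4k 8k 16k 32k 64k 128k 256k 512k 1024k; do
    sh /mnt/isilon/fiotest/control/cleanup_remount_1to1.sh
    sh /mnt/isilon/fiotest/control/run_nfs_fio_${bs}_50G_4jobs_randrw.sh
done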
The FIO jobfile /mnt/isilon/fiotest/fiojob_8k_50G_4jobs_randrw
- The most important parameters:
  - directory=${FILENAME} sets the working directory to the variable ${FILENAME}, set in the FIO calling script.
  - rw=randrw specifies a mixed random read and write I/O pattern.
  - size=50G sets the total transferred I/O size to 50GB.
  - bs= sets the block size for I/O units. Default: 4k.
  - direct=0 makes use of buffered I/O.
  - ioengine=sync uses a synchronous ioengine (simple read, write and fseek system calls).
  - iodepth=1: I/O depth set to 1 (number of I/O units to keep in flight towards the working file).
  - numjobs=4 creates 4 clones (processes/threads performing the same workload) of this job.
  - group_reporting aggregates per-job stats into one per-group report when numjobs is specified.
  - runtime=10800 restricts the run time to 10800 seconds (3 hours). This might limit the total transferred size to less than the value specified by size=.
; --start job file --
[global]
description=-------------THIS IS A JOB DOING ${FILENAME} ---------
directory=${FILENAME}
rw=randrw
size=50G
bs=8k
zero_buffers
direct=0
sync=0
refill_buffers
ioengine=sync
iodepth=1
numjobs=4
group_reporting
runtime=10800
[8k_randread]
; -- end job file --
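- For the sequential runs only rw= and bs= change; a sketch of what the 1024k sequential-mixed variant would look like (job name and values are illustrative, matching the 1024k_seqrw output shown below):
; --start job file --
[global]
description=-------------THIS IS A JOB DOING ${FILENAME} ---------
directory=${FILENAME}
rw=rw
size=50G
bs=1024k
zero_buffers
direct=0
sync=0
refill_buffers
ioengine=sync
iodepth=1
numjobs=4
group_reporting
runtime=10800
[1024k_seqrw]
; -- end job file --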
A typical output log file from FIO is like the following:
- 1024k I/O block size, random read-write I/O pattern, 4 threads, 50GB data transferred per thread, 200GB total.
- 4 of these are submitted at the same time on 4 different nodes, for a total of 16 threads and a total transferred size of 800GB (runtime might limit this).
1024k_randrw: (g=0): rw=randrw, bs=1M-1M/1M-1M/1M-1M, ioengine=sync, iodepth=1 ... fio-2.1.11 Starting 4 processes 1024k_seqrw: Laying out IO file(s) (1 file(s) / 51200MB) 1024k_seqrw: Laying out IO file(s) (1 file(s) / 51200MB) 1024k_seqrw: Laying out IO file(s) (1 file(s) / 51200MB) 1024k_seqrw: Laying out IO file(s) (1 file(s) / 51200MB) 1024k_seqrw: (groupid=0, jobs=4): err= 0: pid=27587: Mon Oct 3 11:03:24 2016 Description : [-------------THIS IS A JOB DOING /mnt/isilon/fiotest/172.16.20.203 ---------] read : io=102504MB, bw=36773KB/s, iops=35, runt=2854400msec clat (msec): min=11, max=29666, avg=106.96, stdev=433.67 lat (msec): min=11, max=29666, avg=106.96, stdev=433.67 clat percentiles (msec): | 1.00th=[ 23], 5.00th=[ 27], 10.00th=[ 29], 20.00th=[ 34], | 30.00th=[ 36], 40.00th=[ 42], 50.00th=[ 54], 60.00th=[ 83], | 70.00th=[ 110], 80.00th=[ 143], 90.00th=[ 198], 95.00th=[ 255], | 99.00th=[ 408], 99.50th=[ 510], 99.90th=[ 6390], 99.95th=[10290], | 99.99th=[16712] bw (KB /s): min= 34, max=37415, per=31.68%, avg=11651.07, stdev=8193.03 write: io=102296MB, bw=36698KB/s, iops=35, runt=2854400msec clat (usec): min=399, max=57018, avg=450.81, stdev=477.23 lat (usec): min=399, max=57018, avg=451.15, stdev=477.23 clat percentiles (usec): | 1.00th=[ 410], 5.00th=[ 418], 10.00th=[ 422], 20.00th=[ 426], | 30.00th=[ 430], 40.00th=[ 434], 50.00th=[ 438], 60.00th=[ 442], | 70.00th=[ 450], 80.00th=[ 458], 90.00th=[ 470], 95.00th=[ 490], | 99.00th=[ 556], 99.50th=[ 580], 99.90th=[ 684], 99.95th=[ 956], | 99.99th=[27520] bw (KB /s): min= 34, max=80788, per=34.53%, avg=12670.23, stdev=10361.46 lat (usec) : 500=48.01%, 750=1.90%, 1000=0.01% lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.10%, 50=23.92% lat (msec) : 100=9.51%, 250=13.87%, 500=2.40%, 750=0.14%, 1000=0.01% lat (msec) : 2000=0.01%, >=2000=0.10% cpu : usr=0.07%, sys=1.03%, ctx=2703279, majf=0, minf=118 IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=102504/w=102296/d=0, short=r=0/w=0/d=0 latency : target=0, window=0, percentile=100.00%, depth=1 Run status group 0 (all jobs): READ: io=102504MB, aggrb=36772KB/s, minb=36772KB/s, maxb=36772KB/s, mint=2854400msec, maxt=2854400msec WRITE: io=102296MB, aggrb=36698KB/s, minb=36698KB/s, maxb=36698KB/s, mint=2854400msec, maxt=2854400msec
- An explanation of the output stats in a FIO logfile (from the manpage):
io Number of megabytes of I/O performed. bw Average data rate (bandwidth). runt Threads run time. slat Submission latency minimum, maximum, average and standard deviation. This is the time it took to submit the I/O. clat Completion latency minimum, maximum, average and standard deviation. This is the time between submission and completion. bw Bandwidth minimum, maximum, percentage of aggregate bandwidth received, average and standard deviation. cpu CPU usage statistics. Includes user and system time, number of context switches this thread went through and number of major and minor page faults. IO depths Distribution of I/O depths. Each depth includes everything less than (or equal) to it, but greater than the previous depth. IO issued Number of read/write requests issued, and number of short read/write requests. IO latencies Distribution of I/O completion latencies. The numbers follow the same pattern as IO depths. The group statistics show: io Number of megabytes I/O performed. aggrb Aggregate bandwidth of threads in the group. minb Minimum average bandwidth a thread saw. maxb Maximum average bandwidth a thread saw. mint Shortest runtime of threads in the group. maxt Longest runtime of threads in the group. Finally, disk statistics are printed with reads first: ios Number of I/Os performed by all groups. merge Number of merges in the I/O scheduler. ticks Number of ticks we kept the disk busy. io_queue Total time spent in the disk queue. util Disk utilization.
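- A crude way to pull the headline numbers (bandwidth, IOPs, latency percentiles) out of the result logs; a sketch, the patterns may need tweaking:
~# grep -E "read :|write:" /mnt/isilon/fiotest/fioresult_*.log
~# grep -E "95.00th|99.95th" /mnt/isilon/fiotest/fioresult_*.log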
FIO NFS Statistics Reports
- FIO outputs stats galore!
- Some stats, like disk statistics, are not relevant in the case of NFS benchmarking.
- Both synchronous (sync) and asynchronous (libaio) IO, buffered and un-buffered, should be benchmarked.
- Synchronous IO (sync) is what regular applications usually do.
- Synchronous here just refers to the system call interface, i.e. when the system call returns to the application.
- It does not imply synchronous I/O, aka O_SYNC, which is way slower and is enabled by sync=1.
- Thus it does not guarantee that the I/O has been physically written to the underlying device.
- For reads, the IO has been done by the device. For writes, it could just be sitting in the page cache for later writeback.
- For reads, the IO always happens in the context of the process.
- For buffered writes, it usually does not: the process merely dirties the page, and kernel threads will most often do the actual writeback of the data.
- direct=1 will circumvent the page cache.
- direct=1 will make the writes sync as well.
- So instead of just returning when it’s in page cache, when a sync write with direct=1 returns, the data has been received and acknowledged by the backing device.
- aio assumes the identity of the process. aio is mostly used by databases.
- Question:
- What is the difference between the following two, other than that the second one seems to be more popular in fio example job files?
- 1) ioengine=sync + direct=1
- 2) ioengine=libaio + direct=1
- Current answer: with libaio, fio can issue further I/Os while the Linux kernel handles the outstanding ones.
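- A sketch of what the [global] section could look like for that second, asynchronous round (the iodepth value is only an example to sweep over):
[global]
directory=${FILENAME}
rw=randrw
size=50G
bs=8k
direct=1
ioengine=libaio
iodepth=16
numjobs=4
group_reporting
runtime=10800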
- Perform FIO random IO (mixed read-write) and sequential IO (mixed read-write).
- Block size ranges from 4k to 1024k in multiplicative steps of 2.
- Working file size is set to 50G (runtime=10800, i.e. 3 hours, might limit the total transferred size).
- Basic synchronous read and write is used for the ioengine (ioengine=sync).
  - A second round of async benchmarks should be attempted with ioengine=libaio (Linux native asynchronous I/O), along with direct=1 and a range of iodepth values.
- Buffered IO is set (direct=false).
  - Un-buffered IO will almost certainly worsen the stats, but that's not Real Life (TM).
  - The real performance of the Isilon cluster should be assessed by bypassing the client's local memory caching, i.e. set direct=1 and iodepth=1 and higher values.
- Set iodepth=1, as it doesn't make sense to use any larger value when using a synchronous ioengine.
  - It is important to realize that the OS/kernel/block IO stack might restrict the iodepth parameter values.
  - This is to be checked when one sets ioengine=libaio AND direct=false.
- 4 threads for each FIO job will be launched (numjobs=4).
- Only consider the following stats:
  - Random IO: IOPs and latencies (average and 95th percentile values).
  - Sequential IO: bandwidth.
- Plot the following stats versus the FIO block sizes used (4k, 8k, 16k, 32k, 64k, 128k, 256k, 512k, 1024k):
  - IOPs for random IO reads and writes.
  - Total submission and completion latency (clat) 95th percentile values, clat 95.00th=[XXX], i.e. 95% of all latencies are under this value.
  - Bandwidth for sequential reads and writes.