IT: Release Engineering / RELENG-773

Nexus running out of inodes


Details

    Description

      Opening this bug to track the following:

      1. Resolution of the issue
      2. Documentation in releng/docs regarding the issue


          Activity

            Thanh Ha (zxiiro) added a comment -

            Proposed some documentation here: https://gerrit.linuxfoundation.org/infra/8676

            Thanh Ha (zxiiro) added a comment -

            On Saturday, Feb 3rd, 2018, with the help of jconway, we discovered that the file system on OpenDaylight's Nexus instance had run out of inodes. We mitigated this by clearing up some files to free inodes, which returned Nexus to normal operation.

            On Monday, Feb 5th, 2018, we discovered that the mitigation was not sufficient and ran into the inode issue again. We decided to delete all jenkins092/builder-* jobs, which are very old and no longer necessary, as well as the jenkins092/docs-* jobs. This cleared up about 3 million inodes, which should buy us some more time. We will continue to monitor the system to ensure we are not rapidly running low on inodes.

            The longer-term solution is to migrate to a new XFS file system; an rsync is in progress to move the files. We also decided to put log storage on a separate file system in the future, so that if the inode issue recurs it is isolated and does not cause a total outage of Nexus.
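            For reference, a minimal sketch of how the inode-heavy trees could be located before a cleanup like the one above. The storage path comes from the Nexus log excerpt further down in this ticket; the directory depth and the exact commands used for the actual cleanup are assumptions.

            # Sketch only: count filesystem entries per directory under the logs
            # storage tree to find inode-heavy candidates (e.g. jenkins092/builder-*).
            # Adjust the glob depth to match the actual repository layout.
            STORAGE=/srv/sonatype-work/nexus/storage/logs
            for d in "$STORAGE"/*/; do
                printf '%10d  %s\n' "$(find "$d" | wc -l)" "$d"
            done | sort -rn | head -20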

            Thanh Ha (zxiiro) added a comment -

            Types of errors that can be found in the Nexus logs when Nexus runs out of space and/or inodes:

            2018-02-05 00:03:31 ERROR [1506131-1837956] - org.sonatype.nexus.web.internal.ErrorPageFilter - Internal error
            org.eclipse.jetty.io.EofException: null
            
            
            2018-02-05 00:27:41 ERROR [1506131-1837917] - org.sonatype.nexus.unpack.rest.UnpackPlexusResource - Got exception during processing request "PUT http://nexus.opendaylight.org/service/local/repositories/logs/content-compressed/releng/vex-yul-odl-jenkins-1/yangtools-maven-javadoc-publish-nitrogen/14": 
            org.sonatype.nexus.proxy.LocalStorageException: Could not create the directory hierarchy in repository "logs" [id=logs] to write "/srv/sonatype-work/nexus/storage/logs/releng/vex-yul-odl-jenkins-1/yangtools-maven-javadoc-publish-nitrogen/14/javadoc/org/opendaylight/yangtools/yang/model/util/class-use"
            
            
            2018-02-05 00:27:43 ERROR [1506131-1837898] - org.sonatype.nexus.unpack.rest.UnpackPlexusResource - Got exception during processing request "PUT http://nexus.opendaylight.org/service/local/repositories/logs/content-compressed/releng/vex-yul-odl-jenkins-1/yangtools-maven-javadoc-publish-nitrogen/14": 
            org.sonatype.nexus.proxy.LocalStorageException: No space left on device
            
            
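            A quick way to check whether Nexus is currently hitting this condition is to grep its log for the errors above. The log path here is an assumption based on the default sonatype-work layout.

            # Sketch: count occurrences of the space/inode errors shown above.
            # NEXUS_LOG is an assumed location; adjust to the real log path.
            NEXUS_LOG=/srv/sonatype-work/nexus/logs/nexus.log
            grep -cE 'No space left on device|Could not create the directory hierarchy' "$NEXUS_LOG"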
            Anil Belur (askb) added a comment -

            Looks like since the last check we have used nearly 0.7 million inodes on Nexus, so we may have another 4-5 days until the issue crops up again.

            Last login: Tue Feb  6 00:56:17 2018 from 172.30.100.72
            $ df -i
            Filesystem             Inodes     IUsed     IFree IUse% Mounted on
            /dev/vda1             8387584     69316   8318268    1% /
            devtmpfs              2028104       358   2027746    1% /dev
            tmpfs                 2033463         1   2033462    1% /dev/shm
            tmpfs                 2033463       549   2032914    1% /run
            tmpfs                 2033463        16   2033447    1% /sys/fs/cgroup
            /dev/mapper/vg1-lv1 163840000 159943908   3896092   98% /srv
            tmpfs                 2033463         1   2033462    1% /run/user/0
            tmpfs                 2033463         1   2033462    1% /run/user/930600049
            /dev/mapper/vg2-lv2 524287552  38842350 485445202    8% /newsrv
            tmpfs                 2033463         1   2033462    1% /run/user/930600062
            
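            Since the plan is to keep monitoring inode usage, here is a minimal sketch of a check that could run from cron. The /srv mount point matches the df -i output above; the 90% threshold and everything else are assumptions.

            #!/bin/bash
            # Sketch: warn when inode usage on /srv crosses a threshold.
            THRESHOLD=90
            USAGE=$(df -Pi /srv | awk 'NR==2 {print int($5)}')   # IUse% column, numeric part only
            if [ "$USAGE" -ge "$THRESHOLD" ]; then
                echo "WARNING: /srv inode usage at ${USAGE}% (threshold ${THRESHOLD}%)"
            fi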
            Anil Belur (askb) added a comment -

            zxiiro, is the rsync to `/newsrv` going through? The number of used inodes (38842350) is the same as yesterday.

            /dev/mapper/vg1-lv1 163840000 161284661   2555339   99% /srv
            /dev/mapper/vg2-lv2 524287552  *38842350* 485445202    8% /newsrv
            
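            Two quick checks that answer the question above; nothing here is specific to this host beyond the /newsrv mount point.

            # Sketch: confirm an rsync is still running and watch inode growth on the target.
            pgrep -af rsync                  # list running rsync processes with their arguments
            df -Pi /newsrv | awk 'NR==2'     # current IUsed, to compare with yesterday's 38842350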
            Thanh Ha (zxiiro) added a comment - edited

            Looks like the rsync has completed (although I'm surprised at the difference in storage space used); jconway will look into it tomorrow. I've issued another rsync in the meantime so that it pulls in newer files.

            Filesystem           Size  Used Avail Use% Mounted on
            /dev/mapper/vg1-lv1  4.9T  3.1T  1.6T  67% /srv
            /dev/mapper/vg2-lv2  4.9T  1.4T  3.5T  29% /newsrv
            
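            The follow-up sync mentioned above would be an incremental rsync from the old volume onto the new one, along these lines. The exact flags and paths were not captured in this ticket, so treat this as a sketch only.

            # Sketch: incremental re-sync from the old /srv volume to the new XFS volume.
            # -H preserves hard links, -A/-X preserve ACLs and xattrs, --delete removes
            # files on the target that no longer exist on the source.
            rsync -aHAX --delete --info=progress2 /srv/ /newsrv/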

            Thanh Ha (zxiiro) added a comment -

            The plan now is to do the migration on Monday.

            We decided to drop the /newsrv partition and restart the syncing, since we want to split up the partition layout for logs. The new syncs are running now.

            In the meantime I will see if I can clear up enough inodes to last us until Monday.
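            A rough sketch of the split layout being aimed for. The LVM device names are hypothetical and only for illustration; the main point is that XFS allocates inodes dynamically, and a dedicated logs filesystem keeps log growth from exhausting the artifact storage again.

            # Sketch only: separate XFS filesystems for artifacts and logs.
            # vg2-lv_artifacts and vg2-lv_logs are made-up names for illustration.
            mkfs.xfs /dev/mapper/vg2-lv_artifacts
            mkfs.xfs /dev/mapper/vg2-lv_logs
            mount /dev/mapper/vg2-lv_artifacts /srv
            mkdir -p /srv/sonatype-work/nexus/storage/logs
            mount /dev/mapper/vg2-lv_logs /srv/sonatype-work/nexus/storage/logs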

            Thanh Ha (zxiiro) added a comment -

            This was completed yesterday. We swapped the file systems as follows:

            /srv                new file system for artifacts
            /srv/.../logs       new file system for logs only
            /srv/.../old-logs   symlink to the old logs

            We are still waiting for the logs to complete syncing.

            People

              Assignee: Thanh Ha (zxiiro)
              Reporter: Thanh Ha (zxiiro)
              Votes: 0
              Watchers (4): Andrew Grimberg, Anil Belur, Jordan Conway, Thanh Ha (zxiiro)

              Dates

                Created:
                Updated:
                Resolved: