NVMe ZNS Support Coming To Linux 5.9


  • NVMe ZNS Support Coming To Linux 5.9

    Phoronix: NVMe ZNS Support Coming To Linux 5.9

    Landing in the block subsystem's "-next" tree today is ZNS support for NVMe drives...

  • #2
    I'm curious whether this would be beneficial for desktop use.
    I just watched a video about zoned namespaces, but from what I gather the target audience isn't desktop users.

    I'm guessing zoned namespaces are useful in situations where services are virtualized but their storage is still written to a single NVMe SSD. With this zoning, each of those services could be given its own zone on the drive. Still, I'd like some confirmation, as I'm far from sure about this.

    • #3
      I suppose benchmarks will be needed. Anything else is speculation, and unimportant.

      • #4
        Short answer: yes, even if mainly through reduced cost, with some performance gains (less garbage-collection complexity and lower DRAM requirements for mapping).

        • #5
          Originally posted by markg85 View Post
          I'm curious whether this would be beneficial for desktop use.
          I just watched a video about zoned namespaces, but from what I gather the target audience isn't desktop users.

          I'm guessing zoned namespaces are useful in situations where services are virtualized but their storage is still written to a single NVMe SSD. With this zoning, each of those services could be given its own zone on the drive. Still, I'd like some confirmation, as I'm far from sure about this.
          Zoned namespaces are mostly about giving the OS more information about how the drive itself behaves. An NVMe device may expose itself as a collection of 4k sectors, and the underlying media may well be able to write 4k blocks, but it may only be able to erase/rewrite blocks that are 64k, 128k, or even bigger. So if you're doing a lot of updates to a lot of small files, you can run into write amplification, where a few writes on the OS side involve writing and rewriting the underlying data multiple times as the drive erases and rewrites the blocks containing it.

          For the desktop, the big wins I can see offhand are more predictable drive performance and longer life, because the OS no longer has to make assumptions about how the underlying storage medium behaves. Bringing all that information up to the OS level means filesystem developers, app developers, etc. can work it into their decisions about how and when to write data -- it might be worth it, for example, to accept a slightly higher risk of data loss and batch writes together to get consistent performance. It can also mean cheaper drives: once that bookkeeping moves to the OS, the drive can just worry about writing blocks to flash, rather than shuffling data around to make itself look like a disk with 4k sectors, along with all the on-drive RAM/CPU needed to keep up that illusion.
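
          Since this lands in the same zoned block device interface Linux already uses for SMR drives, you can poke at the OS-visible picture yourself. Here's a rough sketch (untested, /dev/nvme0n1 is a placeholder) using the BLKREPORTZONE ioctl from <linux/blkzoned.h>; each zone reports its start, length and current write pointer, all in 512-byte sectors:

          #include <stdio.h>
          #include <stdlib.h>
          #include <fcntl.h>
          #include <unistd.h>
          #include <sys/ioctl.h>
          #include <linux/blkzoned.h>

          int main(void)
          {
              int fd = open("/dev/nvme0n1", O_RDONLY);
              if (fd < 0) { perror("open"); return 1; }

              unsigned int nr = 8; /* ask for the first 8 zones only */
              struct blk_zone_report *rep =
                  calloc(1, sizeof(*rep) + nr * sizeof(struct blk_zone));
              if (!rep) return 1;
              rep->sector = 0;    /* start reporting from the first zone */
              rep->nr_zones = nr;

              if (ioctl(fd, BLKREPORTZONE, rep) < 0) {
                  perror("BLKREPORTZONE");
                  return 1;
              }

              /* The kernel rewrites nr_zones to the number actually reported. */
              for (unsigned int i = 0; i < rep->nr_zones; i++)
                  printf("zone %u: start=%llu len=%llu wp=%llu cond=%u\n", i,
                         (unsigned long long)rep->zones[i].start,
                         (unsigned long long)rep->zones[i].len,
                         (unsigned long long)rep->zones[i].wp,
                         rep->zones[i].cond);

              free(rep);
              close(fd);
              return 0;
          }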

          • #6
            Originally posted by markg85 View Post
            I'm curious whether this would be beneficial for desktop use. I just watched a video about zoned namespaces, but from what I gather the target audience isn't desktop users. […]
            That video really is targeted at the server market. But the largest users of DRAM-less SSDs are desktop users. Removing the DRAM from an SSD might save something like 10 USD off the build cost of a machine, and over a million units that really adds up; if the DRAM no longer buys any performance advantage, it's a direct saving that device makers will take. Currently, removing the DRAM from an SSD without ZNS can result in a short SSD lifespan and unpredictable performance as the controller tries to make up for the missing DRAM, which is not great for consumer complaints and warranty claims.

            The big thing about ZNS is that it reduces the DRAM an SSD needs in order to perform well. That means a DRAM-less ZNS SSD should have fewer issues and come a lot closer to ZNS drives that do have DRAM; we won't know how close until we see DRAM-less ZNS drives in production (it's possible they'll be so close that there is no functional difference in lifespan). The problem is that operating systems have to catch up, and I don't see Microsoft being that quick. Android, Chromebooks and Linux workstations could be taking advantage of ZNS drives fairly quickly.

            Do note the video listed work on support for F2FS, ext4 and Btrfs, with XFS down the track. This is another case of ZFS being left out in the cold, as it was with SMR.

            Zone-based storage devices, be they SSDs (ZNS) or hard drives (SMR), are where the technology is heading; they will come to the desktop in volume at some point.

            • #7
              @KesZerda and @oiaohm, does that mean that I, as an end user, have to do exactly nothing to get ZNS and the benefits it brings in time with Linux?
              Also, from a developer's point of view, do I have to do anything?

              I'd find it "magical" if it just started working "out of the blue" one day, which I can't imagine being the case. Also, from what I gather, zones are reserved areas. Let's assume, for a 1TB NVMe drive, that you have 1000 zones (probably more, but it makes the example easier), so each zone would be 1GB. Now what if you've used up all the space in one zone and it can't grow, because the other zones are occupying all the remaining "space" (while being mostly empty)? I guess what I'm asking is whether I would get "out of space" errors when one zone is full even though the drive as a whole still has many gigabytes free in zones that aren't fully utilized yet.

              • #8
                Originally posted by markg85 View Post
                @KesZerda and @oiaohm, does that mean that I, as an end user, have to do exactly nothing to get ZNS and the benefits it brings in time with Linux?
                Also, from a developer's point of view, do I have to do anything?
                If you are a file system developer, yes, you have to change the way you do things. End users formatting ZNS drives, like SMR drives, may have to do it with particular flags so everything works right. Beyond that, users really should not notice any major operational difference.
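
                As a rough illustration (my own sketch, untested; nvme0n1 is a placeholder device name), tooling can tell whether a device is zoned at all from a sysfs attribute the kernel exposes, which reads "none", "host-aware" or "host-managed". The mkfs flags themselves vary per file system; f2fs-tools, for example, has a mode for zoned block devices.

                #include <stdio.h>

                int main(void)
                {
                    /* The kernel reports the zoned model of each block device here. */
                    FILE *f = fopen("/sys/block/nvme0n1/queue/zoned", "r");
                    if (!f) { perror("fopen"); return 1; }

                    char model[32];
                    if (fgets(model, sizeof(model), f))
                        printf("zoned model: %s", model); /* e.g. "host-managed" for ZNS */
                    fclose(f);
                    return 0;
                }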

                Originally posted by markg85 View Post
                I'd find it "magical" if it just started working "out of the blue" one day, which I can't imagine being the case. Also, from what I gather, zones are reserved areas. Let's assume, for a 1TB NVMe drive, that you have 1000 zones (probably more, but it makes the example easier), so each zone would be 1GB. Now what if you've used up all the space in one zone and it can't grow, because the other zones are occupying all the remaining "space" (while being mostly empty)? I guess what I'm asking is whether I would get "out of space" errors when one zone is full even though the drive as a whole still has many gigabytes free in zones that aren't fully utilized yet.
                Zones are more like sectors. So, in your example, a multi-gigabyte file could be spread over multiple zones by the file system. What zones expose from an SSD is what are called banks. The reality of an SSD is that you can append data to a bank, but to delete anything in a bank you have to erase the whole thing; this is very much the same as SMR. In theory a bank in an SSD could be 1GB in size; in reality 4 to 256MB is more likely, but I'll stick with your 1GB banks/zones. You don't see that size in production now, but it would be possible in future.
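
                Those two rules are the whole programming model. A rough sketch of them (untested, placeholder device, most error handling trimmed): writes land only at a zone's write pointer, and space is reclaimed only by resetting the whole zone, via BLKRESETZONE from <linux/blkzoned.h>.

                #define _GNU_SOURCE /* for O_DIRECT */
                #include <fcntl.h>
                #include <string.h>
                #include <unistd.h>
                #include <sys/ioctl.h>
                #include <linux/blkzoned.h>

                int main(void)
                {
                    int fd = open("/dev/nvme0n1", O_WRONLY | O_DIRECT);
                    if (fd < 0) return 1;

                    /* Resetting a whole zone is the only way to reclaim space in it. */
                    struct blk_zone_range range = {
                        .sector = 0,          /* zone start, in 512-byte sectors */
                        .nr_sectors = 524288, /* zone length, here a 256MB zone */
                    };
                    ioctl(fd, BLKRESETZONE, &range);

                    /* Writes must start at the write pointer (the zone start, right
                     * after a reset) and proceed strictly sequentially. */
                    char buf[4096] __attribute__((aligned(4096)));
                    memset(buf, 0xab, sizeof(buf));
                    pwrite(fd, buf, sizeof(buf), 0);       /* at the write pointer: OK */
                    pwrite(fd, buf, sizeof(buf), 1 << 20); /* past it: a host-managed
                                                              zone rejects this write */
                    close(fd);
                    return 0;
                }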

                Currently, the fact that an SSD is made up of banks is hidden by the controller, and that is why an SSD holding a lot of fragmented data can stall out. It's a different problem from the one you are thinking of. You were thinking that one zone being completely full means things are stuffed; that is not the problem. The real problem with current SSDs is when all 1000 of those 1GB zones are partly full of stuff, at which point the drive has to stall while it reorders things.


                If the file system knows about the zones/banks in the device, it can be in charge of what information gets packed into which zones/banks. That removes guesswork: current SSDs really are guessing what data belongs with what other data.

                Yes, you'll find instructions for using the TRIM command on SSDs, and this packs blocks as tightly as possible in a fairly dumb way. Say you are downloading a large file when a TRIM runs. The current SSD controller could take a small temporary file and shove it into the same bank as the large file you are downloading and intend to keep for quite some time. That temporary file eventually gets deleted, and now the next compacting TRIM has to copy the large file's contents to another bank in order to unify free space again, because of the strict rule that you can only erase complete banks. A file system that knows about zones/banks could have avoided this goof: it knows which files are actually open and being appended to, and it can take a highly educated guess at what is a temporary file and group those into their own zones/banks, whereas the controller inside the SSD only knows which areas are written or unwritten.

                If you haven't already worked it out: smart allocation of zones/banks in an SSD can also reduce the number of writes, because data doesn't have to be moved around as often. Remember, SSD flash has limited write cycles.

                Please note that hard drive makers are putting SMR drives into the desktop space too, but as device-managed drives that basically pull the same trick current SSDs do: hiding the fact that they have to erase large zones, at the cost of extra RAM and processing in the controller. In both the existing SSD case and the device-managed SMR case, write operations happen without the OS's knowledge, which makes it possible for data integrity issues to be created behind the file system's back, with late detection that they happened.

                Also note that support for zone-based drives has been under way in the Linux kernel for over half a decade now, so this is not magically starting to work overnight; it has taken many years of work by many individuals to get to this point. Lots and lots of work has been done so it can "just work", and the finer details needed to make zoned storage work perfectly still have to be completed.

                • #9
                  Originally posted by oiaohm View Post

                  […] Do note the video listed work on support for F2FS, ext4 and Btrfs, with XFS down the track. This is another case of ZFS being left out in the cold, as it was with SMR.
                  Any idea what it'll take for ZFS to support it? Will ext4 et al. need to re-create their volumes in order to use ZNS, or will it work with existing volumes as long as the kernel supports it?

                  • #10
                    Originally posted by oiaohm View Post

                    If you are a file system developer, yes, you have to change the way you do things. […]
                    Hi @oiaohm, thank you for that elaborate explanation!
                    I think it makes a bit more sense to me now.

                    Note that by the "developer" point of view I meant "just" a developer, specifically not a filesystem dev.
                    But it looks like we application developers don't have to care about this at all.

                    Lastly, would this zone stuff be auto-enabled on, say, ext4 once all the pieces are in place? Or is it going to be off by default, with users (or distributions) able to opt in and enable it?
