
Thread: Hot-Data Tracking Still Baking For The Linux Kernel

  1. #1
    Join Date
    Jan 2007
    Posts
    14,591

    Default Hot-Data Tracking Still Baking For The Linux Kernel

    Phoronix: Hot-Data Tracking Still Baking For The Linux Kernel

    A few months ago I wrote about hot-data tracking for the Linux kernel, a VFS feature that could be used by Btrfs and other Linux file-systems for delivering improved performance. Unfortunately the patch-set didn't make the new Linux 3.8 development cycle, but hot-data tracking is still being worked on for merging into a future Linux kernel release...

    http://www.phoronix.com/vr.php?view=MTI1OTc

  2. #2

    Default

    ZFS already does this through the ARC algorithm.

  3. #3
    Join Date
    May 2011
    Posts
    1,475

    Default

    Quote Originally Posted by ryao View Post
    ZFS already does this through the ARC algorithm.
    Which means Linux should have it by the year 2050.

    Shortly after it's integrated into Windows 27.

  4. #4
    Join Date
    Oct 2012
    Posts
    148

    Default

    Quote Originally Posted by ryao View Post
    ZFS already does this through the ARC algorithm.
    And ZFS lacks support for removing devices from a pool, defragmentation (in a filesystem that fragments files by design), and RAID level migration...

    Both file systems are far from "complete".

  5. #5
    Join Date
    Aug 2010
    Posts
    14

    Default

    Quote Originally Posted by ryao View Post
    ZFS already does this through the ARC algorithm.
    No it doesn't; the ARC keeps a copy of the hottest blocks, which also still reside in their original location. A better solution would migrate the blocks to faster or slower storage depending on the usage pattern, which also means it would be persistent across reboots, which the ZFS ARC is not. Also, the ARC only does read caching; with a true tiering solution even writes are faster once the data has been migrated to a faster storage tier.
    Hopefully the Linux VFS implementation will make it possible to achieve this functionality.
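
    To show roughly what I mean, here is a toy sketch in Python; the thresholds, the count decay and the tier names are all made up by me, not taken from the actual patch-set:

    Code:
    import json

    # Toy model of tiered storage: count per-block accesses and migrate
    # blocks between a fast and a slow tier based on how hot they are.
    PROMOTE_THRESHOLD = 100  # accesses before a block moves to the fast tier
    DEMOTE_THRESHOLD = 10    # at or below this, it moves back to slow storage

    class TieringTracker:
        def __init__(self):
            self.access_counts = {}  # block id -> access count
            self.tier = {}           # block id -> "fast" or "slow"

        def record_access(self, block):
            self.access_counts[block] = self.access_counts.get(block, 0) + 1

        def rebalance(self):
            # Migrate blocks between tiers; halving the counts makes stale
            # data cool off over time instead of staying hot forever.
            for block, count in self.access_counts.items():
                if count >= PROMOTE_THRESHOLD:
                    self.tier[block] = "fast"
                elif count <= DEMOTE_THRESHOLD:
                    self.tier[block] = "slow"
                self.access_counts[block] = count // 2

        def save(self, path):
            # Persist the tracking state so it survives a reboot,
            # which is exactly what the ARC does not do.
            with open(path, "w") as f:
                json.dump({"counts": self.access_counts, "tier": self.tier}, f)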
    Last edited by LasseKongo; 12-23-2012 at 01:05 PM.

  6. #6

    Default

    Quote Originally Posted by LasseKongo View Post
    No it doesn't; the ARC keeps a copy of the hottest blocks, which also still reside in their original location. A better solution would migrate the blocks to faster or slower storage depending on the usage pattern, which also means it would be persistent across reboots, which the ZFS ARC is not. Also, the ARC only does read caching; with a true tiering solution even writes are faster once the data has been migrated to a faster storage tier.
    Hopefully the Linux VFS implementation will make it possible to achieve this functionality.
    Try out L2ARC. It is a cache on faster storage. Migrating things (like Apple's Fusion drive) is bad for performance because it requires additional IOs. Having a copy somewhere faster does not have such a penalty. There is no reason that ARC could not be used in either scenario, but moving data around does not make sense when you can just cache it.
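
    To put rough numbers on that, here is a back-of-the-envelope model in Python; every IO cost in it is invented for illustration, nothing is measured from ZFS:

    Code:
    # Back-of-the-envelope IO accounting: keeping a cached copy of a hot
    # block vs. migrating it between tiers. All costs are invented for
    # illustration, not measured.

    def cache_copy(reads):
        # L2ARC-style: the copy is written to the fast device off the
        # critical path and the original block never moves.
        miss_io = 1         # first read comes from the slow pool
        fill_io = 1         # async write of the copy to the cache device
        hits = reads - 1    # every later read is served from the cache
        return miss_io + fill_io + hits

    def migrate(reads):
        # Tiering-style: the block is physically moved, which costs extra
        # IOs on the way up and again on the way back down.
        promote_io = 2      # read from the slow tier + write to the fast tier
        remap_io = 1        # metadata update recording the new location
        hits = reads        # reads served from the fast tier
        demote_io = 3       # eventually moving the cooled-off block back
        return promote_io + remap_io + hits + demote_io

    print(cache_copy(100), migrate(100))  # 101 vs. 106: the copy wins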
    Last edited by ryao; 12-23-2012 at 03:28 PM.

  7. #7
    Join Date
    Aug 2010
    Posts
    14

    Default

    Quote Originally Posted by ryao View Post
    Try out L2ARC. It is a cache on faster storage. Migrating things (like Apple's Fusion drive) is bad for performance because it requires additional IOs. Having a copy somewhere faster does not have such a penalty. There is no reason that ARC could not be used in either scenario, but moving data around does not make sense when you can just cache it.
    I am using an L2ARC to cache two RAID-Z vdevs in my ZFS box, and yes, read operations benefit if the data is in the cache. There are a couple of downsides as I see it:

    * After a clean boot the L2ARC is invalid, and depending on your setup it can take some time to warm up with new data.
    * Just like with ZFS dedup, there is a table keeping track of which blocks are located on the L2ARC; I believe it is 320 bytes per block, as with the dedup tables. If you have a large L2ARC this consumes a lot of memory: a 100GB L2ARC with 8K blocks needs a table of roughly 4GB of RAM (worked out in the sketch after this list). In FreeBSD the default setting allows 25% of main memory for ZFS metadata, which means I would need 16GB in the system to keep the table in memory.
    * Writes still go to the slow disks first, and the data is then eventually copied to the L2ARC for caching. A ZIL can speed up metadata operations, but not data.
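
    For anyone who wants to plug in their own numbers, here is the arithmetic behind the second point as a quick Python snippet (the 320 bytes per entry is, again, just my assumption from the dedup tables):

    Code:
    # Rough memory cost of the L2ARC map, using the figures from the post
    # above; the 320 bytes per tracked block is an assumption, not a
    # number from the ZFS source.
    l2arc_size = 100 * 2**30   # 100GB cache device
    block_size = 8 * 2**10     # 8K blocks
    entry_size = 320           # assumed bytes of RAM per tracked block

    entries = l2arc_size // block_size  # 13,107,200 blocks to track
    map_ram = entries * entry_size      # ~3.9GB of RAM just for the map
    total_ram = map_ram * 4             # FreeBSD's 25% metadata cap

    print(f"{map_ram / 2**30:.1f} GB map -> ~{total_ram / 2**30:.0f} GB system RAM")
    # prints: 3.9 GB map -> ~16 GB system RAM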

    The data migration can be a scheduled job; it doesn't have to be in real time, or it can be done on idle I/O cycles.

    It will certainly be interesting to see how the BTRFS guys are going to use this. I heard the RAID5/6 patches are going to be included in 3.8, which would clear another obstacle to adoption for me. I will stick with ZFS for the time being, but I think BTRFS will be really good in maybe a year's time. Just the fact that I cannot add or remove disks in the vdevs, or even remove an entire vdev, in ZFS is starting to piss me off.

  8. #8

    Default

    Quote Originally Posted by LasseKongo View Post
    I am using an L2ARC to cache two RAID-Z vdevs in my ZFS box, and yes, read operations benefit if the data is in the cache. There are a couple of downsides as I see it:

    * After a clean boot the L2ARC is invalid, and depending on your setup it can take some time to warm up with new data.
    * Just like with ZFS dedup, there is a table keeping track of which blocks are located on the L2ARC; I believe it is 320 bytes per block, as with the dedup tables. If you have a large L2ARC this consumes a lot of memory: a 100GB L2ARC with 8K blocks needs a table of roughly 4GB of RAM. In FreeBSD the default setting allows 25% of main memory for ZFS metadata, which means I would need 16GB in the system to keep the table in memory.
    * Writes still go to the slow disks first, and the data is then eventually copied to the L2ARC for caching. A ZIL can speed up metadata operations, but not data.

    The data migration can be a scheduled job; it doesn't have to be in real time, or it can be done on idle I/O cycles.

    It will certainly be interesting to see how the BTRFS guys are going to use this. I heard the RAID5/6 patches are going to be included in 3.8, which would clear another obstacle to adoption for me. I will stick with ZFS for the time being, but I think BTRFS will be really good in maybe a year's time. Just the fact that I cannot add or remove disks in the vdevs, or even remove an entire vdev, in ZFS is starting to piss me off.
    You certainly are familiar with ZFS. However, you are wrong about the ZIL only speeding up metadata operations. It applies to data as well, although only to small synchronous writes. You also always have a ZIL (unless you set sync=disabled on your datasets and zvols). You can make it external to your normal vdevs by using a SLOG device. As for the L2ARC, it should work very well for those who do not reboot frequently. People have discussed making it persistent across reboots for a while, although the fact that the hottest data remains cached in RAM limits the usefulness of doing that. In theory, hibernation could be used instead of reboots, although I have yet to test that with swap on a ZFS zvol.
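
    Conceptually, the sync write path with a SLOG works something like this simplified Python model; it is just the idea, not actual ZFS code:

    Code:
    # Simplified model of a small synchronous write with a SLOG: the
    # caller unblocks as soon as the record lands on the fast log device,
    # and the data reaches the slow main pool later with the transaction
    # group commit. Not actual ZFS code.

    def sync_write(data, slog, pending_txg):
        slog.append(data)         # fast device: the latency-critical IO
        pending_txg.append(data)  # queued in memory for the next txg
        return "acknowledged"     # application continues immediately

    def txg_commit(pool, pending_txg, slog):
        pool.extend(pending_txg)  # data hits the main pool asynchronously
        pending_txg.clear()
        slog.clear()              # log records only get replayed after a crash

    slog, pool, pending = [], [], []
    print(sync_write("record", slog, pending))  # returns after the log write
    txg_commit(pool, pending, slog)             # happens in the background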

    Anyway, you are correct about the memory requirements of the L2ARC map. That is not much of a problem if you have enough memory to hold the map within the space ZFS allows for metadata. It might be best to account for the map separately from other metadata, so that a large L2ARC map cannot cannibalize cache space and force the hottest metadata out to the L2ARC.

    By the way, I don't know what you mean by "add or remove disks to vdevs". You can certainly take disks away, although doing that leaves the vdevs in a degraded state until you replace them.
    Last edited by ryao; 12-24-2012 at 01:30 PM.

  9. #9
    Join Date
    Jan 2009
    Posts
    1,345

    Default

    Quote Originally Posted by ryao View Post
    ZFS already does this through the ARC algorithm.
    super fantastic awesome brah
