
Thread: Ted Ts'o: EXT4 Within Striking Distance Of XFS

  1. #21
    Join Date
    Aug 2008
    Posts
    234

    Default

    Quote Originally Posted by kebabbert View Post
You don't understand what data integrity is.

It is not about a disk crashing or anything similar. It is about getting back the same data you put on the disk. Imagine you put this data on disk: "1234567890", but a corruption occurred, so you got back "2234567890". The hardware does not even notice that the data got corrupted. This is called silent corruption, and it occurs all the time.

Now, imagine you have a fast filesystem, but there is silent corruption now and then. You can NOT trust the data you get back. As I have shown in links, this happens to XFS, ReiserFS, JFS, ext3, etc. It even happens to hardware RAID, all the time.

CERN did a test: their 3,000 Linux storage servers showed hundreds of instances of silent corruption. (CERN wrote a known bit pattern to the disks and compared the result, and they differed.) CERN cannot trust the data on disk. Therefore CERN is now migrating to ZFS (which is actually the only modern filesystem designed from scratch to protect against silent corruption).

I don't get it: who wants a fast filesystem that gives you false data?
You obviously don't get it, or you refuse to note the examples given to you, like the Google situation. You also don't get that there is no such thing as 100% data integrity, EVER. No chance. In every case, choices are made by balancing cost, performance, and data-integrity needs.

Google doesn't need perfect data integrity, or perfect hardware quality for that matter. Given the volumes of data they process and the volume of hardware they deal with, they don't care if a server is of iffy quality; they toss it. Large data volumes might also mean that a faster but more error-prone filesystem is the best fit ROI-wise.

Like I asked you: do you use ECC in all your computers in all situations? Do you slightly overvolt, overclock, or underclock any part of your computer? Back up on tape? Back up on 10-year DVDs? The point is, you personally make trade-offs on data integrity on a daily basis.

Gee, I don't get it! Ohhh, how can someone, or a corporation, live in a world without 100% data integrity?!

  2. #22
    Join Date
    Jul 2009
    Posts
    61

    Default

    Quote Originally Posted by energyman View Post
    And no matter what the devs write 'it was never guaranteed'... FUCK YOU.
POSIX specifies what you, as an application, must do if you want your data safely stored. How much simpler does it need to be? Follow POSIX, or it is the program author's fault when the data is not safely stored. Early filesystems like ext2 happened to be designed such that it wasn't really a problem if software didn't save properly, so people got away with bad code for too long. Now we have filesystems that exploit every optimization POSIX allows, and broken software fails on them. So fix the software.

Didn't know you were still around, Jade, since the Reiser confession. Or was that forced or something, I wonder?

  3. #23
    Join Date
    Jan 2008
    Location
    Have a good day.
    Posts
    678

    Default

    Quote Originally Posted by tytso View Post
    Some people like to treat file system benchmarks as a competition, and want to score wins and losses. That's not the way I look at it. I hack file systems because I'm passionate about working on that technology. I'm more excited about how I can make ext4 better, and not whether I can "beat down" some other file system. That's not what it's all about.

    -- Ted
    That's a hell of a quote. Respect.

  4. #24
    Join Date
    Aug 2008
    Posts
    234

    Default

    Quote Originally Posted by yotambien View Post
    That's a hell of a quote. Respect.
I think the first thought that popped into my head was "Thank you!", for the hard work and the sentiment.

  5. #25
    Join Date
    Nov 2008
    Posts
    418

    Default

    Quote Originally Posted by Tgui View Post
You obviously don't get it, or you refuse to note the examples given to you, like the Google situation. You also don't get that there is no such thing as 100% data integrity, EVER. No chance. In every case, choices are made by balancing cost, performance, and data-integrity needs.

Google doesn't need perfect data integrity, or perfect hardware quality for that matter. Given the volumes of data they process and the volume of hardware they deal with, they don't care if a server is of iffy quality; they toss it. Large data volumes might also mean that a faster but more error-prone filesystem is the best fit ROI-wise.

Like I asked you: do you use ECC in all your computers in all situations? Do you slightly overvolt, overclock, or underclock any part of your computer? Back up on tape? Back up on 10-year DVDs? The point is, you personally make trade-offs on data integrity on a daily basis.

Gee, I don't get it! Ohhh, how can someone, or a corporation, live in a world without 100% data integrity?!
Hmm... you don't seem to understand what I mean.

CERN stores lots of data because of the LHC (Large Hadron Collider), which cost billions and took decades to plan and build. They are trying to find the Higgs boson. CERN really does think it is important that the bits from their experiments are stored correctly. Now they are migrating to ZFS:


    http://blogs.sun.com/simons/entry/hp..._science_means

"Simultaneously, more LHC sites are beginning to use Sun's Thumper (Sun Fire x4500) ultra-dense disk storage systems.

    Having conducted testing and analysis of ZFS, it is felt that the combination of ZFS and Solaris solves the critical data integrity issues that have been seen with other approaches. They feel the problem has been solved completely with the use of this technology. There is currently about one Petabyte of Thumper storage deployed across Tier1 and Tier2 sites. That number is expected to rise to approximately four Petabytes by the end of this summer."



    Here is another link about LHC and ZFS:
    http://hpc-events.com/sun-dresden07/CERN_Gasthuber.pdf

"Solaris 10/11 with ZFS solves critical data integrity cases
seen on all types of HW (/$ independent) <--------- OBS!!!!
we feel that this issue is solved completely
Numerous sites now deploy ZFS + Thumper
other storage + ZFS still minor
>1 PB already in operation (T1 + T2)
- just count the well known sites (as of April 07)
- doubles soon"

Also, in finance (which is where I work) it is extremely important that data is stored correctly. I hope you understand that there are fields of work where performance is secondary.

Of course I am not saying that ZFS is 100% secure, but it is far more secure than other solutions. The thing is that ZFS uses end-to-end checksumming; no other solution does. End-to-end means the entire chain, from RAM down to the controller down to the disk. There might be bit flips in any part of the chain. Other solutions only verify checksums within a single layer, not when data passes from one layer to the next, so they never compare what comes out at the end of the chain with what went in.

For instance, here is a case where ZFS's end-to-end checksums immediately detected an error in a switch. The switch was injecting faulty bits into the data stream on its way down to the server. Before ZFS, nobody noticed the faulty switch or the corrupt data:
    http://jforonda.blogspot.com/2007/01...meets-zfs.html

    "As it turns out our trusted SAN was silently corrupting data due to a bad/flaky FC port in the switch. DMX3500 faithfully wrote the bad data and returned normal ACKs back to the server, thus all our servers reported no storage problems.
    ...
    ZFS was the first one to pick up on the silent corruption"




There is also research by computer scientists showing that hardware RAID and ordinary filesystems are not secure; you cannot rely on them. There is also research showing ZFS to be safe: ZFS detected all artificially introduced errors, and it would have corrected them all if the researchers had used RAID instead of a single disk.

The first step is detecting errors; only then can you correct them. Unfortunately, ZFS is the only filesystem designed from scratch to detect errors.




In summary, you can use a storage solution that scales to petabytes and is SAFE. So I don't see why you would focus on raw performance if the fast solution cannot be used in, for instance, finance.

(No, I do not use ECC RAM yet, because I am waiting to upgrade my old ZFS server. When I do, I will certainly use ECC RAM. I agree ECC is important.)

  6. #26
    Join Date
    Jul 2008
    Posts
    1,731

    Default

    Quote Originally Posted by Ranguvar View Post
POSIX specifies what you, as an application, must do if you want your data safely stored. How much simpler does it need to be? Follow POSIX, or it is the program author's fault when the data is not safely stored. Early filesystems like ext2 happened to be designed such that it wasn't really a problem if software didn't save properly, so people got away with bad code for too long. Now we have filesystems that exploit every optimization POSIX allows, and broken software fails on them. So fix the software.

Didn't know you were still around, Jade, since the Reiser confession. Or was that forced or something, I wonder?
File A is on disk.

You want to rename it to B.

You call rename(). A crash at the wrong moment and both are gone. Or there is a file A or a file B, but its contents? Gone. That is fucking braindead idiocy.
    from the btrfs faq:

    What are the crash guarantees of rename?
Renames NOT overwriting existing files do not give additional guarantees. This means a sequence like

echo "content" > file.tmp
mv file.tmp file
# *crash*

will most likely give you a zero-length "file". The sequence can leave you with either:
- neither file nor file.tmp exists, or
- either file.tmp or file exists and is 0-size or contains "content"


That is unacceptable, no matter what POSIX says. POSIX is crap anyway (Windows NT is POSIX-compliant too... yeah).

Whoever thinks a clusterfuck like that is acceptable has a major problem with reality.

In reality, data is sacrosanct. Nuking it is not an option. An FS that nukes data is fucking broken by design.

  7. #27
    Join Date
    Apr 2008
    Location
    Saskatchewan, Canada
    Posts
    466

    Default

    Quote Originally Posted by Ranguvar View Post
    POSIX says what you do or do not do, as an app, if you want your data safely stored. How much more simple does it need to be? Follow POSIX, or else it's the program author's fault when the data is not safely stored.
    Fsync() is evil.

    I mean, really, truly, horribly, satanically evil.

    Flushing data on my laptop forces the disk to spin up just to write a file that I probably don't care that much about, thereby wasting my battery power. Flushing data on an ecommerce database server, on the other hand, is probably vital to ensure that databases are kept up to date.

    But that's a system configuration choice, and should not ever be something that applications randomly decide to do. If I don't care that I might lose the last five minutes of files when I crash, then Firefox shouldn't be calling fsync() every time I visit a new web page. But it does because there are so many crappy filesystems which will corrupt your files if you crash before everything has been flushed to disk.

Having every application decide whether to force a write to the disk and waste my battery power is simply braindead. Filesystems should behave sensibly so that we don't need this kind of hackery to make them work the way they should have worked in the first place. If I have file A on disk and I edit it and write it out, then the filesystem should ensure that when I read it back after a reboot I get either the old version or the new one, not an empty file or some corrupted mixture of the two. Anything else is unacceptable in a general-use filesystem (special-use filesystems may well prefer speed over consistency and be able to handle corruption issues).

  8. #28
    Join Date
    Oct 2008
    Posts
    3,251

    Default

    Quote Originally Posted by kebabbert View Post
Hmm... you don't seem to understand what I mean.
    No, it's pretty clear you're the one who isn't understanding. Everyone agrees that in certain cases it's important to have data integrity. What you don't seem to get is that in certain situations it is perfectly acceptable to have data errors. Even if you don't know they are there. This has been explained to you, but you keep repeating the same stuff so I'm not sure if you're ignoring us or just don't understand the concept.

  9. #29
    Join Date
    Nov 2008
    Posts
    418

    Default

    Quote Originally Posted by smitty3268 View Post
    No, it's pretty clear you're the one who isn't understanding. Everyone agrees that in certain cases it's important to have data integrity. What you don't seem to get is that in certain situations it is perfectly acceptable to have data errors.
Ok, please enlighten me. I work in finance, so I have trouble seeing situations where it is acceptable that the data you get back is not correct. Maybe my line of work has colored me, but please give me some real-life examples where it is acceptable to get erroneous data (not contrived ones).

I suppose you also advocate fast CPUs that every once in a while insist that 1 + 1 = 7?

To me, it is strange to suggest such storage solutions or hardware. I can promise you that in finance your suggestions would get kicked out faster than greased lightning.

  10. #30
    Join Date
    Oct 2007
    Location
    Sweden
    Posts
    174

    Default

    Quote Originally Posted by kebabbert View Post
Ok, please enlighten me. I work in finance, so I have trouble seeing situations where it is acceptable that the data you get back is not correct.
Multimedia files usually tolerate slight corruption. It might show up as a small artefact in a movie or an image, or a crackle in an audio stream. Applications such as video hosting would probably be great candidates for fast but less secure storage.
