Data-mining reveals that 80% of books published 1924-63 never had their copyrights renewed and are now in the public domain.
But there's another source of public domain works: until the 1976 Copyright Act, US works were not copyrighted unless they were registered, and then they quickly became public domain unless that registration was renewed.
For many years, the Internet Archive has hosted an archive of registration records, which were partially machine-readable.
Enter the New York Public Library, which employed a group of people to encode all these records in XML, making them amenable to automated data-mining.
But here's a genuinely fun fact: most books published in the US before 1964 are in the public domain!
Back then, you had to send in a form to get a second 28-year copyright term, and most people didn't bother.
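To make the data-mining idea concrete, here is a minimal sketch of the kind of check that XML-encoded records allow: list every registration that has no matching renewal. The element names, the `regnum` attribute, and the sample entries are all made up for illustration; they are not the schema of the actual transcription, and real records rarely match this cleanly.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the tag names, the "regnum" attribute and the
# entries themselves are illustrative assumptions, not the real record format.
SAMPLE = """
<records>
  <registration regnum="A12345" year="1950"><title>Example Novel</title></registration>
  <registration regnum="A67890" year="1951"><title>Another Book</title></registration>
  <renewal regnum="A12345" year="1978"/>
</records>
"""


def unrenewed_titles(xml_text):
    """Return titles of registrations with no renewal sharing their regnum."""
    root = ET.fromstring(xml_text)
    # Collect the registration numbers that appear in any renewal record.
    renewed = {r.get("regnum") for r in root.iter("renewal")}
    # Keep only registrations whose number never shows up among the renewals.
    return [
        reg.findtext("title")
        for reg in root.iter("registration")
        if reg.get("regnum") not in renewed
    ]


print(unrenewed_titles(SAMPLE))  # ['Another Book']
```

A missing renewal in a sketch like this is only a first signal. Matching real registrations to renewals means dealing with inconsistent numbering, transcription errors, and works that were never registered at all.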