Google Print and Fair Use

Google Print is the topic that may single-handedly keep the copyright-related blog world in business for the next few years.
Last week, Google added the full text of 10,000 public domain books into the Google Print database. The NY Times reports: Google Adds Library Texts to Search Database: “The additions, from the university libraries at Michigan, Harvard and Stanford and from the New York Public Library, represent the first large group of material to be made available electronically from those libraries, which along with Oxford University contracted with Google last year to let the company scan and make searchable the contents of much or all of their collections.”
On Google’s corporate blog, Adam Mathes writes: Preserving public domain books: “The world’s libraries are a tremendous source of knowledge, much of which has never been available online. One of our goals for Google Print is to change that, and today we’ve taken an exciting step toward meeting it: making available a number of public domain books that were never subject to copyright or whose copyright has expired.”
The following day, announced a program that will sell online access to “any page, section or chapter of a book.” These commercial programs will convert the full-text databases used for searching into a way to offer access to full-text works as well as a way to compensate rightsholders, much like iTunes. Again, from the NY Times: Want ‘War and Peace’ Online? How About 20 Pages at a Time?: “The idea is to do for books what Apple has done for music, allowing readers to buy and download parts of individual books for their own use through their computers rather than trek to a store or receive them by mail. Consumers could purchase a single recipe from a cookbook, for example, or a chapter on rebuilding a car engine from a repair manual.”
This week, the debate even spilled over into my favorite football-related column, Gregg Easterbrook’s Tuesday Morning Quarterback, where Easterbrook writes:

Copyright law gives authors and performers the exclusive right to make or authorize copies of their works; the exclusive right to make or authorize copies is, at heart, what a copyright represents.…
[Google] says it will not scan books whose authors send a letter of objection. But if you want to use a copyrighted work, the legal onus is on you to get permission, not on the copyright holder to lodge a protest. Google’s position is like saying that if you do not want your house broken in to, it is your responsibility to send a notification to thieves. In this analogy, Google is the thief — just like in the real world! Remember when Google maintained it would never be the next Microsoft? It’s not; Microsoft obeys the law. Remember when Google was going to be a corporate good-guy? Google is fast becoming the next Enron; maybe this is the kind of thing that happens when your founders decide they need an entire Boeing 767 to themselves. Contrast Google’s corporate kleptomania to Amazon’s decision to offer online books only if authors grant permission. As we enter the digital age, it becomes ever-more important society resists the idea that unaccountable corporations have an unlimited right to seize whatever exists in electronic form. And Google, now that you have declared it is fine to copy intellectual property without permission, surely you won’t object if anyone steals your proprietary software and corporate data?

In order to understand the legal implications of the Google Print case, we have to look at what Google is doing: scanning books into an electronic database for the purpose of indexing.
In Kelly v. Arriba Soft, the 9th Circuit ruled that creating thumbnails of images in a search engine is fair use.

The search engine at issue in this case is unconventional in that it displays the results of a user’s query as “thumbnail” images. When a user wants to search the internet for information on a certain topic, he or she types a search term into a search engine, which then produces a list of web sites that contain information relating to the search term. Normally, the list of results is in text format. The Arriba search engine, however, produces its list of results as small pictures.
To provide this service, Arriba developed a computer program that “crawls” the web looking for images to index. This crawler downloads full-sized copies of the images onto Arriba’s server. The program then uses these copies to generate smaller, lower-resolution thumbnails of the images. Once the thumbnails are created, the program deletes the full-sized originals from the server. Although a user could copy these thumbnails to his computer or disk, he cannot increase the resolution of the thumbnail; any enlargement would result in a loss of clarity of the image.

The Google Print service provides essentially the same service as the Arriba Soft image search engine, except that it searches print books instead of digital images.

We must determine if Arriba’s use of the images merely superseded the object of the originals or instead added a further purpose or different character…Although Arriba made exact replications of Kelly’s images, the thumbnails were much smaller, lower-resolution images that served an entirely different function than Kelly’s original images.

The court ruled that creating a search engine index is a transformative use that does not supersede the purpose of the original work. The character of a copy used in a search engine index is different from the character of a copy used for reading. The search engine use helps readers find the book. The intrinsic purposes of the two uses are different.
The court found that creating a complete copy is necessary to create a service that adds value to the images:

It was necessary for Arriba to copy the entire image to allow users to recognize the image and decide whether to pursue more information about the image or the originating web site. If Arriba only copied part of the image, it would be more difficult to identify it, thereby reducing the usefulness of the visual search engine.

Google’s book scans are used only for the purpose of creating a full-text index for searching and not for offering text to users. Google is not distributing copies of copyrighted books without permission. For books submitted to the index by publishers, Google provides access to a couple of pages (with permission of the copyright owner). For books scanned in under the partnership with university libraries, Google provides access to ~30 word excerpts that contain the user’s search term. Google’s Screenshots page explains this well.
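To make the idea of a ~30 word excerpt concrete, here is a minimal sketch of how a search service might cut a short window of text around a matched term. This is an illustration only, not a description of how Google Print actually works; the function name `snippet` and its `width` parameter are hypothetical.

```python
from typing import Optional


def snippet(text: str, term: str, width: int = 30) -> Optional[str]:
    """Return about `width` words of `text` around the first match of `term`.

    Hypothetical helper for illustration; not Google's implementation.
    """
    words = text.split()
    target = term.lower()
    for i, word in enumerate(words):
        # Compare case-insensitively, ignoring surrounding punctuation.
        if word.lower().strip('.,;:!?"\'()') == target:
            start = max(0, i - width // 2)
            end = min(len(words), start + width)
            return " ".join(words[start:end])
    # The term does not occur, so no excerpt is shown at all.
    return None
```

The point of the sketch is that the full text is needed to build the index and find the match, but only a small window around the match is ever shown to the user.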
In UMG Recordings v., the court found that a digital locker service, which created medium-shifted full copies of recorded music, was an infringing use. The defendant’s service not only created but distributed complete copies. Like the Arriba Soft thumbnail images, these copies were at a lower resolution/fidelity than the original works. Unlike the Arriba Soft thumbnails, these copies were used to supplant the original use of the works– for listening.
The key difference between Google and Arriba Soft is that Arriba searches images already on the web in digital form. Google is digitizing books that were made available only in print, possibly superseding the market for electronic versions of those same books. Images placed on the web may be thought to be made available with an implied consent to be indexed.
Google Print does not provide access to the complete work and its full copies are used to add value by creating an index, rather than to merely replace the traditional use.
If Google, like Amazon, were providing access to a complete copyrighted work, Google would clearly need permission.
The authors’ and publishers’ complaint is based on the fact that Google is copying the entire book without permission in order to create this index. And this question shows why this case is important. Does copyright law regulate the act of copying or the act of distribution? If making a copy of a complete work in order to create a searchable index is itself infringement, then Google’s entire business is threatened. In indexing the web, Google creates complete copies of web pages, unless the web publisher explicitly opts out using the robots.txt protocol. In addition, Google not only creates, but also distributes medium-shifted cache copies of .PDF and .DOC files.
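For readers unfamiliar with the robots.txt opt-out, the following sketch shows how a crawler might check a publisher's robots.txt before copying a page, using Python's standard `urllib.robotparser` module. The robots.txt contents, the crawler name `ExampleBot`, and the example.com URLs are made up for illustration; this does not describe Google's own crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is excluded from the whole site,
# and every other crawler is excluded only from /private/.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_index(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules allow `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_index(ROBOTS_TXT, "ExampleBot", "http://example.com/page.html"))
print(may_index(ROBOTS_TXT, "OtherBot", "http://example.com/page.html"))
```

Note that the default is opt-out: a page with no matching Disallow rule is treated as available for crawling, which is the same structure the authors and publishers object to in Google Print.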
If copyright law is concerned with regulating the act of copying, then Google may be in trouble, but then so might culture. As a matter of public policy, copyright law might be better served by regulating distribution rather than regulating copying per se. If it is impossible to search the entire web, we lose this wonderful resource. Prohibiting intermediate copying will harm public access to information. The mere fact that Google would have the ability to disseminate infringing copies might not mean that it should be prohibited from using those copies.
The NY Public Library will hold a live panel discussion, The Battle Over Books: Authors & Publishers Take on the Google Print Library Project, with Allan Adler (Association of American Publishers), Chris Anderson (Wired Magazine), David Drummond (Google), Paul LeClerc & David Ferriero (The New York Public Library), Lawrence Lessig (Stanford Law School), and Nick Taylor (The Authors Guild). I will liveblog this, if possible.
Pat Schroeder and Bob Barr wrote an op-ed piece in the Washington Times stressing the rights of authors: Reining in Google: “Not only is Google trying to rewrite copyright law, it is also crushing creativity. ”
In Forbes, Nick Schultz defends Google: Don’t Fear Google: “The way the current copyright law works, I can take a book out from any library, read it and write a review of it for publication on the Web site I edit or in the pages of or anywhere else. This “fair use” of material involves no copyright violation. Readers benefit from learning a bit about the book, authors and publishers benefit from increased exposure. ”
Copyright treatise author Raymond Nimmer thinks that the Google project is very different from the Arriba Soft case and that Google’s use is not fair use, based mainly on the fact that it is a commercial enterprise: Google Lawsuit Begins; Fair Use: “On the one hand, this large company desires to make a massive number of copies of other persons’ property for its own profit. On the other hand, the authors and publishers that own the property rights have been given exclusive rights to copy or distribute copies of their works as part of a statutory scheme that intends to provide authors with incentive to create new works.”
Another treatise author, William Patry, prefers to apply a market substitute test for fair use: Google Revisited: “So in the Google project, why should we care if there are server copies? The purposes for the copies in connection with the Print Library project is to give people access to knowledge about the existence of the book as well as a tiny amount of text. That is of great help to researchers and hopefully to authors and publishers of the books too. It in no way harms copyright owners unless the project becomes something else, namely a full-text service which then is a market substitute.”
I tend to think that this is the core analysis of fair use: if the use is a market substitute for the original work, it is probably not a fair use.
Jason Schultz was quoted in a segment on NPR’s California Report on Google Lawsuits over Images, Books
In, Farhad Manjoo has an excellent piece that summarizes the implications of these cases: Throwing Google at the book: “A year later, Google’s grand plan to digitize the world’s books still seems as fantastical as it did when it was first proposed. Earlier this year, the company started scanning books at libraries, and on Nov. 3 launched an elegant beta version of its book search engine — but the project faces an uncertain future.”
On a tangentially related note, Eric Goldman discusses a different search engine indexing case: Newborn v. Yahoo: “In this case, a web publisher sued Google and Yahoo for contributory copyright and contributory trademark infringement based (apparently) on their indexing the publisher’s press releases. I say “apparently” because the plaintiff was unable to articulate a legal complaint or a statement of facts that the judge could understand. Because of the defects in the complaint, the judge granted a motion to dismiss with prejudice, ending the case before it started.”
More links and commentary follow in the extended entry.

Search Engine Watch’s Danny Sullivan discusses the difference between creating a full-text index and making the full text content accessible on the web: Indexing Versus Caching & How Google Print Doesn’t Reprint: “Here’s the thing. Google is NOT, repeat NOT, republishing copies of books that it scans out of libraries. This is a fundamental mistake that many people seem to be making.”
In Slate, Tim Wu frames the debate: Leggo My Ego: Google Print and the other culture war.: “What’s going on? Google has become the new ground zero for the “other” culture war. Not the one between Ralph Reed and Timothy Leary, but the war between Silicon Valley and Hollywood; California’s cultural civil war. At stake are two different visions of what might best promote authorship in this country. One side trumpets the culture of authorial exposure, the other urges the culture of authorial control.”
Michael Madison also looks at how the publishers frame the case: Google Print II: “This is not only bet-the-company litigation, it’s bet-the-Internet litigation.”
At the U of Chicago Law faculty blog, Doug Lichtman ponders Google Print: “Remember that whatever legal rule we create here is a legal rule, not a Google-specific contract. Thus, we have to make sure that any fair use rights we articulate here will work in a world that has lots of players (Google, Yahoo, new startups, and so on) plus also dishonest players that will abuse the rules here in much the same way that the Grokster and Napster folks abused the rules related to small-scale sharing of music.”
Joseph Liu wonders how the Grokster contributory infringement standard will affect Google’s liability: Google & Grokster: “Even if you believe, as do I, that Google’s activities are or should be fair use, there’s an interesting separate question re: what efforts, if any, Google should be obligated to take to keep the digitized books secure from third parties. For example, what if third parties could use Google Print to easily reconstruct full digital versions of print books (e.g. by sending a series of overlapping queries to Google Print and reassembling the search results)?”
In the Washington Post, Mary Sue Coleman, the president of the University of Michigan, defends Google: Riches We Must Share . .: “Beyond the specific legal challenges emerging in the wake of such a sea change, there are deeply important public policy issues at stake. We must not lose sight of the transformative nature of Google’s plan or the public good that can come from it.”
On the same Washington Post op-ed page, Nick Taylor, president of the Authors Guild, writes: . . . But Not at Writers’ Expense: “The value of Google’s project notwithstanding, society has traditionally seen its greatest value in the rights of individuals, and particularly in the dignity of their work and just compensation for it.”
Laura Quilter comments on both op-ed pieces: lost licensing revenue & Google Print: “I was particularly disappointed with Nick Taylor’s editorial, in a few ways. Taylor wisely doesn’t actually make any legal arguments. Instead, his editorial boils down to the complaint that Google Print is lost licensing revenue for publishers. It’s okay, that he makes that point, because that’s actually the publishers and Authors’ Guild’s real (and only) point. I just resent the rhetorical slurs that are used to pad the actual argument.”
Wired reports that Writers Side With Google in Scrap: “Google’s plan to scan library book collections and make them searchable may be drawing ire from publishers and authors’ advocates, but some obscure and first-time writers are lining up on the search engine’s side of the dispute — arguing that the benefits of inclusion in the online database outweigh the drawbacks.”
One of those authors is Meghann Marco, who sent A Letter to Google: “I think Google Print is a good idea. No one has been able to explain to me how I would suffer from people being able to search for phrases and read excerpts of my book on-line. ”
Richard Nash is a publisher whose company, Soft Skull Press, is a member of the Association of American Publishers, but he disagrees with the position of the association: The Google Debate: “I’ve been having a (very civilized) exchange with the Association of American Publishers over their Google lawsuit. I’m basically furious about what’s going on, though I can’t blame them directly: it’s the membership that decides what they do.”
In the LA Times, Xeni Jardin admonishes, You authors are saps to resist Googling: “If the paranoid myopia that drives such thinking penetrates too deeply into the law, search engines will eventually shut down. What’s the difference, after all, between a copyrighted Web page and a copyrighted book? What if Internet entrepreneurs could sue Google for indexing their websites?”
Siva Vaidhyanathan, Derek Slater, Michael Madison and Laura Quilter discuss the long-term potential effects on copyright, as well as the relevance of Google as the defendant. (There are more posts by each on their blogs, but these are probably the core of the discussion)
Siva: “Game On” for Google Print: “My real issues are with the libraries here. Google can and should do what’s best for its shareholders. The rest of us should worry about what’s best for the culture, democracy, and the Internet. We can’t count on any company to do that for us. We should be able to count on libraries to do it. They work for us. Google doesn’t.”
Slater: Google Print and “Copyright Meltdown”: “These days, any time a comparison is made to P2P of almost any kind, many people have knee jerk reactions and resort to firmly entrenched battle stances. Those interested in having serious discussions about the future of copyright must work very hard to keep that tarpit of a debate about P2P from infecting all other issues – debates not only about Google Print, but also about podcasting and me2me technologies, for instance.”
Madison: Google Bad, Library Good — Follow Up: “Both Google and the libraries are parts of overlapping institutional universes, and it’s best to look at Google Print, and the wisdom of signing on with Google (or rejecting Google’s plans) in light of those universes.”
Quilter: Google’s evilness is beside the point (Bonus Rant Included): “The point of people’s support for Google Print is not that we support Google, love Google, or want Google to control our access to information. The point is that Google, and any other entity who wants to do it, should be able to add value to information. Google should not be THE ONE; Google should be ONE OF MANY. Picking and choosing a single entity presupposes that the information is already controlled, and this new use, this new added value, is to be carefully metered as a scarce resource.”
Findlaw’s Julie Hilden: Authors Sue Google Over Its “Print for Libraries” Project: Will the Suit Succeed? Should It? And Why, As An Author, I’m Opting Out of Any Class Action: “Should this suit be certified as a class action? What should Google’s position be on class certification? (We know the plaintiffs position: They want it.) Who’s likely to win this suit? And, assuming the suit is certified as a class action, should individual authors opt in, or opt out?”
LawFont has a nice summary of the controversy: Book digitisation projects: Google Print and all that
Copyfight’s Donna Wentworth strings together a similar cross-blog conversation: Speaking Volumes
Finally, Google Print, a bibliography.