Sorry, Writers, but I'm Siding With Google's Robots

February 8, 2014

THE WALL STREET JOURNAL
by James Panero

Copyright laws too often stifle the creativity they claim to protect. Time for a 21st-century update.

How much did mention of "copyright" increase in American books published in the second half of the 20th century? The answer is by nearly a factor of three. How about "intellectual property," a neologism designed to equate copyright with real property? By a whopping factor of 70. But what about "public domain," the term for our creative commons where the arts are replanted and renewed? The answer is almost not at all.

We know this thanks to a new program called Ngram, an offshoot of Google Books that analyzes the metadata of what is now the world's most extensive literary index. Ngram gives us a sense of how ideas have circulated over the past 200 years. And when it comes to creative freedom, the numbers don't look good.

Since the 1970s, U.S. terms of copyright have been extended and tightened at the behest of the film, music and publishing industries in a way that hurts how we can enjoy, share, study and repurpose culture. Don't believe me? When was the last time you saw Martin Luther King's "I have a dream" speech on television in full? As a copyrighted work zealously guarded and monetized by the King estate, it's still rarely shown.

Technology companies have emerged as the key counterweight to the lawyers and lobbyists of the content giants. And that's one reason November's victory for Google Books in Authors Guild v. Google is important.

In 2004, Google announced a partnership with Harvard, Stanford, Oxford, the University of Michigan and the New York Public Library to begin scanning their holdings, turning the printed pages of millions of books into digital grist for its search mill. The robot scanners ran their eyes over everything, from books in the public domain to copyrighted material, which under current law includes most of what's been published since 1923. The results have been a boon to the culture of ideas.

Yet since Google never tracked down the millions of rights-holders of more recent works, the initiative has been embroiled in litigation over copyright infringement since its inception—even though Google has used copyrighted books only for its search index (as opposed to showing the full text). The Authors Guild, one of the plaintiffs against Google, declared the scanning "exploitation" and a "hazard for every author." U.S. Circuit Judge Denny Chin in Manhattan disagreed and dismissed the group's claims after eight years of litigation, declaring Google's project a "transformative" fair use. The Authors Guild has vowed to appeal.

As a writer, I'm siding with the robots. Google Books is far from perfect: Even advocates have worried about the consolidation of scanned information, fearing it will lead to a new digital monopoly. But it brings literature into the online world, exposing a younger generation to books they otherwise would never encounter.

Google Books' legal victory can also be seen as a chink in the armor of ironclad copyright laws. Copyright was never meant to be an indefinite "intellectual property." Article I, Section 8 of the U.S. Constitution gives Congress the power "To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries." Much like patents, copyright was a utilitarian measure to protect creative work through a temporary government-granted monopoly.

For the founders, that meant a protective period of 14 years with the right of renewal for another 14. Since then, and especially over the last three decades, the terms have exploded. For self-made work, copyright is now in effect for the life of the author plus 70 years. For work-for-hire, the terms are 95 years after publication or 120 years after creation, whichever is shorter.

In Congress, the terms have tended to have the curious ability to grow just as Mickey Mouse is set to exit copyright, effectively locking down America's cultural patrimony to protect Disney. The "Copyright Term Extension Act" of 1998 is commonly derided as "The Mickey Mouse Protection Act," since it extended Disney's control of the cartoon character for another 20 years. The motion picture industry has argued for even more—a perpetual copyright, or "forever less one day." But would this actually be good for the arts? Numerous studies, such as a 2007 analysis by economist Rufus Pollock at Cambridge, have shown that far shorter terms would maximize creative output.

Considering the Democratic Party's ties to Hollywood, Republicans should be the natural leaders on intellectual property reform. Conservatives such as Reihan Salam, Patrick Ruffini, Timothy P. Carney and Jordan Bloom have argued convincingly for it—but so far the party isn't listening. When Derek Khanna, a young policy analyst, wrote a white paper in 2012 for the Republican Study Committee on rolling back copyright, he was shown the door. "The Republican Party hasn't been pro-innovation," he explained to me. "Copyright reform is a vital component of a more forward-leading platform."

At the start of 2014, Duke Law School's Center for the Study of the Public Domain published a list of books that would be entering the public domain under the laws that existed through 1978. For works ranging from Jack Kerouac's "On the Road" to Dr. Seuss's "Cat in the Hat," "you would be free to translate these books into other languages, create Braille or audio versions for visually impaired readers . . . or adapt them for film." Too bad: Under current law, you can't.

"Poetry can only be made out of other poems; novels out of other novels," wrote the critic Northrop Frye. "Literature shapes itself, and is not shaped externally." The freedom to work with a renewed public domain should be our inheritance—if only we stopped Mickey Mousing around with copyright.


At the Internet Archive, Saving Data While Spurning the Cloud

December 4, 2013

Brewster Kahle, founder of the Internet Archive (all photographs by the author for Hyperallergic)

HYPERALLERGIC
by James Panero

SAN FRANCISCO — At 3:30 am on November 6, a fire swept through the scanning center of the Internet Archive. The news was poignant for an organization that thinks hard about how information is lost and the best ways to save it. For nearly two decades, the San Francisco nonprofit has been uniquely dedicated to the open preservation of web, text, coding, audio, and video media — a Library of Congress for the 21st century built through private philanthropy and sweat equity. None of the Archive’s employees or volunteers were hurt in the blaze, but the fire totaled the Archive’s annex building along with $600,000 in digitization equipment and some irreplaceable archival material. An emergency appeal brought in $60,000 over its first two days, and the drive is ongoing.

“This episode has reminded us that digitizing and making copies are good strategies for both access and preservation,” wrote Brewster Kahle, the Archive’s founder and director, on the organization’s blog the day of the fire. Thanks to the Archive’s mirrored servers — spread over three continents — and a warehouse of hard-copy source material, Kahle said the Archive’s digital data would have survived even if its headquarters had been fully destroyed. “Let's keep making copies,” he concluded. (The virtue of the Internet’s duplicating qualities is a topic I wrote about in an essay called “The Culture of the Copy.”)

How libraries endure was on Kahle’s mind when I visited the Archive in San Francisco’s Richmond District earlier this year. “What happens to libraries is that they’re burned,” he said. “They are generally burned by governments. The Library of Congress, for instance, has already been burned once, by the Brits. So if that’s what happens, well, design for it, make copies.”

Internet Archive servers in the Christian Science building that houses the organization in San Francisco

Kahle now has the resources to make copies on a grand scale. The Archive recently surpassed 10 petabytes of data (they printed bumper stickers to mark the milestone). The nonprofit costs $12 million a year to run: about $5 million of that comes in from libraries paying 10 cents a page for digital scans, $2 million comes from national and local libraries paying for archival services, and about $5 million comes in from foundations. “I am the funder of last resort,” says Kahle. “I won the lottery, the Internet lottery, so I can plug in when it doesn’t come through.” Even before Kahle sold his company Alexa Internet to Amazon in the late 1990s, he had focused on preserving digital information through duplication. “If the Library of Alexandria had made copies, and put them into either India or China, we would have the other works of Aristotle, the other plays of Euripides.”

A “scribe” works at one of the book digitization stations damaged in the fire

In 2009, Kahle moved the Archive headquarters into a former Christian Science church where it operates today. The colonnaded facade of the classical building suited Kahle’s Alexandrian aspirations (the Archive has also partnered with the revived Bibliotheca in Alexandria, Egypt). “We bought this building because it matched our logo,” he told me. The basement meeting room became the Archive’s open-plan office, halfway between a tech hub and a student commons. Over a long table where Kahle invited me to join the Archive’s Friday lunch, the office breaks bread with whomever Kahle finds interesting. On the day I visited, Kahle introduced himself by dropping a black box in my hands — an inexpensive hard drive that could store a library’s worth of books and be widely distributed. My other seatmate was a crunchy bookseller from a San Francisco commune.

Upstairs, the church’s large sanctuary remains unchanged save for a few modifications. In the pews are statues of the Archive’s long-term workers. “It’s kind of a riff on the terracotta soldiers idea,” Kahle explained. “If you work at a non-profit there is no gold at the end of the rainbow. There’s no stock options. So this is sort of a way of saying thanks.” In front, the hymnal numbers have been replaced by the numbers for Pi and Phi. In two apses at the back are racks of blinking servers. “That is 2.5 petabytes of the primary copy of the Internet Archive.” Upstairs, in the church’s old offices, are the Archive’s additional primary servers. “The idea of having your data in an off-site location center, or in the ‘cloud,’ wherever the hell that is, strikes me as an insane idea. If it’s really important to you, keep it close to you.” Kahle said the Archive follows its own server design. “To buy something from Dell, HP, Sun, whatever — their profit margins are so unbelievably huge and their products so bad that it actually was better to design and build our own.”

Statues of long-time Internet Archive staffers recall terracotta soldiers

Mixed in among the servers is the Archive’s one-room schoolhouse for Kahle’s teenage son, Logan. “This is a classroom for one student and one teacher,” Kahle said as we walked by. “We are experimenting with one-on-one teaching. Logan wanted to learn differently and faster than what he was able to do in private school.” Kahle argued that his stripped-down approach is economically more efficient than a private school with its administration and overhead costs.

For someone who made his fortune off the Internet, Kahle has an unexpected off-the-grid mindset. His sense for multimedia survivalism took off when he realized the technology existed to do what might sound impossible: through the right software and storage, to take a snapshot of the entire open Internet every two months. The public face of this effort became “the Wayback Machine,” the free online interface that allows anyone to search the Archive’s database that at last count boasted “368 Billion web pages saved over time.” What Kahle calls “an out-of-print web pages service” is now used by about 600,000 people a day and is the Archive’s most recognizable feature.

Yet the Archive’s reach now goes beyond the web to the preservation of a broad range of media. “We started collecting television,” Kahle said. “The Library of Congress is supposed to, but they weren’t. Twenty channels of television 24 hours a day.” Last year, the archive created a searchable video database of television news. “We’d like to make everyone into a Jon Stewart research department, so you can basically reference and compare and contrast what it is that has been on television.”

A film digitization station.

The digital Archive includes recorded audio (with nearly 9,000 live concert recordings of the Grateful Dead), vintage software and gaming code, old public service messages and video captures of ghostly home movies, and a vast library of scanned books, which can either be digitally loaned or downloaded depending on copyright. Here Kahle wants to create an archive similar to Google Books but “without having centralized control.” While he has praised the recent fair use summary judgment in favor of Google Books over The Authors Guild, Kahle has been critical of Google’s proprietorial control over its own scanned archive, as well as the quality that results from its robotic scanners. “We actively encourage people like Aaron Swartz to go and download millions of books at a time,” he says of his own scans. “We publish tools on how to do it. This is what libraries are for.” It was the book and video scanning building, where I saw young employees and volunteers hunched over rows of stations labeled “scribes,” that burned on November 6.

For all his faith in digital technology, Kahle believes in keeping hard copies. While other libraries may scan their contents in order to reduce their paper storage costs, sometimes “de-accessioning” books to pulping mills, Kahle has created an offsite storage vault where he hopes to keep a copy of every published book available, which he estimates to be 10 million copies. Much like the Svalbard Global Seed Vault buried in the permafrost of Norway, what he calls “The Physical Archive of the Internet Archive” already stores 500,000 copies in climate-controlled shipping containers along with other hard copy assets such as the Archive’s old servers — all there for future needs or as an archive of last resort in a doomsday destruction of the digital database.

“I have more faith right now in the Wikipedia generation than I do in the institutions that get all the funding, whether they be universities, libraries, museums. The bottom up generation is building the real infrastructure,” Kahle said.

“So how come you’re not the Librarian of Congress?” I asked.

“He’s still alive,” Kahle responded, as he moved on to point out the next rack of servers.

A panoramic view of the Internet Archive offices in San Francisco


Copycat Quandary

April 10, 2013

Edward C. Banfield

THE NEW CRITERION
April 2013

To the Editors:

James Panero writes in “The Culture of the Copy” (The New Criterion, January 2013) that my college professor Ed Banfield suggested museums sell their original works and replace them with passable facsimiles—a suggestion for which your founder Hilton Kramer criticized him. This gives the wrong impression. Ed thought many second-rate museums felt they had to purchase only original works, and, due to their very limited budgets, they could only afford second-rate art originals. As a result, museumgoers in smaller cities did not have the opportunity to view first-rate art. He thought that the Rockefellers and others had created copies of well-known works which were indistinguishable from the originals and which sold for relatively modest prices. Therefore, why not allow smaller, less wealthy museums to purchase these copies so their publics could view first-rate rather than second-rate art? It sounded reasonable to me when Ed proposed it, and it sounds reasonable to me now. I am at a loss to understand why the art community so violently objects to this.

Robert L. Freedman, Esq.
Philadelphia

James Panero replies:

The use of copies has an important place in the history of art. This is true especially when access to original artwork has been limited. Up through the first half of the twentieth century, plaster casts made from original sculptures were used widely as study aids in museums and art academies. By mid-century, however, these casts were removed from view. In part, American museums had by then come into possession of more original work. But I would also argue that copies came to be overly devalued in relation to originals, and this was unfortunate. I am glad to hear that the Metropolitan Museum now lends its plaster cast collection out to universities here and abroad. The art museum at Fairfield University in Connecticut, for example, currently displays several Met casts on long-term loan.

In other words, the idea of allowing “smaller, less wealthy museums” to display copies “so their publics could view first-rate rather than second-rate art” was around long before Professor Banfield made his proposal concerning art copies in the early 1980s. One could say that art-library slide collections, and before that magic lantern projections, were all copies used in much the same way as those plaster casts. The same goes today for the high-resolution digital scans available through initiatives such as Google Art Project.

In all of these cases, copies serve as necessary substitutes. Their availability has been widely beneficial to a public that might not otherwise have access to great works of art. And even when originals are available, reproductions have a place, because they don’t keep museum hours, and it’s not always possible to lecture about art in a gallery setting.

If Professor Banfield had suggested only that second-rate museums use their limited resources to purchase copies, as Freedman suggests, I agree that would have sounded reasonable. But Banfield suggested much more in his proposal, and the art community was right to object to it.

“I go further,” Banfield wrote in 1982, “Why should public museums not substitute reproductions for originals?” Kramer was therefore correct in giving the impression that Banfield advocated the wholesale deaccessioning, or selling off, of museum collections to fulfill his vision. Banfield’s arguments for this were esoteric at best, nonsensical at worst, but had something to do with a desire to see the “multibillion-dollar art business . . . fall into an acute and permanent recession.” Whatever the reasoning, it was an unreasonable and vastly destructive idea when Banfield proposed it. It remains so today in ideas such as the “Central Library Plan,” a proposal to remove the books and gut the stacks at the main branch of the New York Public Library, which I mention in my essay.

At the heart of these ideas is both a contempt for the art-going, book-reading public and the elitist sense that they either don’t deserve or cannot appreciate the real thing. “It would not be unduly cynical,” Banfield wrote in 1982, “to say that many of the thousands who stood in line for a ten-second look at ‘Aristotle Contemplating the Bust of Homer,’ after the Metropolitan Museum paid $6 million to acquire it, would as willingly have stood to see the $6 million in cash.” Sorry, but to make such a statement is about as cynical as you can get.
