EPIC Testimony on Content Extraction / Google Gmail / March 15, 2005

Testimony of Chris Jay Hoofnagle
Director, Electronic Privacy Information Center West Coast Office

Privacy Risks of E-mail Scanning

Before the California Senate Judiciary Committee

Tuesday, March 15, 2005

Introduction

Chairman Dunn, Vice-Chairman Morrow, and Members of the Committee, thank you for extending the opportunity to testify on the privacy risks raised by e-mail scanning. My name is Chris Hoofnagle and I am director of the Electronic Privacy Information Center's (EPIC) West Coast office. Founded in 1994, EPIC has been a leader in protecting Internet privacy. We released the first Internet privacy policy report in 1997, and continue to monitor business practices that erode individuals' privacy.

Today's hearing focuses on e-mail scanning. It is impossible to discuss this topic without at least mentioning Google, the provider of Gmail, an advertising-supported e-mail system that engages in "content extraction" (the term used in Google's patents) on all incoming and outgoing e-mail in order to engage in direct marketing.

As an initial matter, I think it is important to recognize that Google is an innovative and much-admired company. But it is also important to recognize that the law has long distinguished between the content of communications and routing or "traffic" information. The content of communications is sacred; for law enforcement to gain access to the content of a telephone call, it must comply with heightened procedures that go far beyond those for a normal search warrant. Similarly, private parties can be subject to civil and criminal sanctions for accessing the content of another's conversation. It therefore causes one some dissonance when Google, a company that I and many others admire, proposes and executes a plan in which it will cross the content barrier, extract content from messages, and commodify personal information for advertising.

I wish to emphasize that, from a technical standpoint, Google is right about one thing: there is no categorical difference between "content extraction" and spam filtering; each involves a process that analyzes the content of messages. However, from a legal standpoint, there is a fundamental difference. The law has long recognized that communications providers should not peek into the contents of a message unless they have a valid reason relating to the actual delivery of mail.

Similarly, there is no technical difference between a warrantless search of a home and one supported by a warrant. The difference is legal, and it is an important difference. If it is only technology, not law, that marks the lines in the sand between commercial or government surveillance and privacy, the day will soon come when there will be no privacy for any of us.

Finally, to be intellectually honest, we have to consider how we would feel if a different company, one not as popular and trusted as Google, were engaging in content extraction. I doubt that many in the technical community would have been as receptive to this system had it been offered by Microsoft. And although Google is held in high esteem by the public as a good corporate citizen, past performance is no guarantee of future behavior, especially now that Google has completed its IPO and the company has a legal duty to maximize shareholder wealth.

The Importance of Communications Privacy

The initial postal mail systems in the United States were unreliable: not only did messages get lost, they were opened by strangers and sometimes read or altered. Benjamin Franklin, in establishing the national postal service, recognized the importance not only of reliable delivery, but also of message integrity and privacy.[1] Today, first class postal mail is sealed against inspection and is reliably delivered across the country.

In the field of wire communications, privacy and integrity are key. With the passage of Title III of the Omnibus Crime Control and Safe Streets Act of 1968, Congress made clear that wiretapping was an investigative means of last resort. Congress required a "super warrant" process that specifies not only a showing of probable cause, but also minimization, statistical reporting, and even reporting to the person who was investigated if no prosecution was brought. Standards were also brought to bear on wiretapping performed for national security purposes.

With the Internet, we should strive to maintain strong standards for communications privacy. Privacy gives individuals a shield from outside pressure; it allows people to be autonomous. Privacy enables us to freely communicate. It allows us to visit medical web sites and obtain information without revealing who we are or our specific need for the information. Privacy enables us to freely associate without fear of retaliation. All of these values are represented in the law that governs mail and phones. The law should also extend these principles to e-mail, as it is rapidly replacing first class mail, and to some extent, phone conversations, as a preferred method of communication.

If Ben Franklin could give us a strong measure of privacy in our mails back in the 18th Century, should not the promising technology of the Internet deliver a similar (or better) standard in the 21st?

Content Extraction: Total Information Awareness for the Internet

The prospect that a computer could, en masse, view transactional and content data and draw conclusions was the plan of John Poindexter's Total Information Awareness (TIA). TIA proposed to look at a wide array of personal information and make inferences for the prevention of terrorism or general crime. Congress rejected Poindexter's plan.

Google's content extraction is different from TIA in that it is designed to pitch advertising rather than catch criminals. But regardless of its purpose, it is an invasive system that looks at the following bits of information:

Subject Line
Body of the e-mail
Sender's name
Sender's address
Business card file (e.g. vcard)
Directory paths of attached files
Attached files (e.g. word processing files, pictures, etc.)
Information from a web page link included in e-mail
Concepts derived from files and web page links
Time e-mail was sent
Geographic location of sender
Geographic location of recipient

Allowing the extraction of this content from e-mail messages is likely to have profound consequences for privacy. First, if companies can view private messages to pitch advertising, it is only a matter of time before law enforcement will seek access to detect criminal conspiracies. All too often in Washington, one hears policy wonks asking, "if credit card companies can analyze your data to sell your cereal, why can't the FBI mine your data for terrorism?"

In the 1990s, privacy advocates warned the public about the risks to privacy that were posed by direct marketers. We argued that it was a matter of time before law enforcement and national security interests tried to obtain direct marketing data. To counter these risks, the Direct Marketing Association (DMA) touted its self-regulatory ethical guidelines. In numerous representations to the media and regulators, DMA officials and direct marketers attempted to quell criticism surrounding the possibility of law enforcement access to marketing data.

But now there are strong alliances between direct marketers and law enforcement interests. Despite the promises of the 1990s, many direct marketing companies sell data to the government for law enforcement or antiterrorism purposes. Companies such as ChoicePoint, a Georgia-based marketing company that sells a broad array of data, have multi-million dollar contracts with dozens of federal agencies. Acxiom, a Little Rock, Arkansas company, was even contemplated as a source of consumer transactional data for Poindexter's TIA. Nuala O'Connor Kelly, the privacy officer of the Department of Homeland Security, was formerly employed by DoubleClick, a major Internet direct advertiser.

The point of this is not to enumerate all the ties between direct marketers and homeland security interests, but to illustrate that their interests are similar: both want to collect more data on Americans and have it flow from the private sector to the government. How can we be guaranteed that Google's content extraction will not be converted into a tool of law enforcement? How can we be guaranteed that if Google deploys Gmail, law enforcement will not seek a court order to convert it into a system of surveillance?[2]

Second, content extraction may reduce Fourth Amendment expectations of privacy. In the United States, whether a search violates the Fourth Amendment turns partly on whether the person had a legitimate expectation of privacy. If a major online e-mail provider such as Google is allowed to monitor private communications, even in an automated way, expectations of e-mail privacy may be eroded. That is, courts may treat the service as evidence that there is no reasonable expectation of privacy in e-mail. Businesses and government organizations may thus find it easier to legally monitor e-mail communications. These effects are long-term and will undoubtedly outlive Google.

Third, non-subscribers may have the content of their messages extracted. Non-subscribers who are e-mailing a Gmail user have not consented to the surveillance, and indeed may not even be aware that their communications are being analyzed or that a profile may be being compiled.

Fourth, greater privacy risks may be presented by computer content extraction than a human eavesdropper. Just because content extraction is performed by a computer doesn't mean that it is less privacy invasive. It may be more so. For instance, in the telemarketing context, substantive regulation did not become popular until marketers started using computers to assist their calling. Through the use of autodialers, telemarketers could call millions of people. It was that advance over humans' limited capabilities to call many people that provided the tipping point for states and Congress to regulate the field.

Similarly, in some ways, having a person read your e-mail would be less privacy invasive than having a computer system do so. Unlike large computer systems, people do not have unlimited storage, memory and associative capability. Computers, by contrast, can build profiles of users based on their communications. Computer-based content extraction can make privacy invasions continuous and automated, making it as difficult a privacy problem to solve as spam and telemarketing, which are also continuous and automated by computer.

Fifth, content extraction may cause a "race to the bottom." Over the past ten years, Internet privacy practices have raced to the bottom. Prominent Internet retailers have changed their privacy policies to the detriment of consumers without any legal consequence. Tracking mechanisms have become more pervasive and invasive. Many sites that did not require personal information just a few years ago now require extensive registrations. There is no reason to believe the same will not occur with content extraction. If one major company is allowed to do it, others will follow suit, and one will have fewer options for privacy in e-mail.

Specific Google Privacy Risks

Eric Schmidt, Google's chief executive, highlighted a few broad priorities for the company, like adding more types of information to Google's index and using personal information about each user to answer queries better. "We are moving to a Google that knows more about you," he said.[3]

Although Google's tools are extremely useful, we have to be mindful that the company is in a unique position to monitor individuals' computer use. On the Internet, the risk to privacy that Google presents is probably second only to Microsoft. While risks of content extraction have been articulated above, specific risks posed by Google also deserve consideration.

First, Google's access to personal information is expanding. The company employs a persistent cookie on users' computers (that does not expire until 2038), thus allowing the company to track users across product lines. Although Google said that it does not cross-reference the cookies, nothing is stopping them from doing so at any time ("It might be really useful for us to know that information. I'd hate to rule anything like that out," said Google co-founder Larry Page).

Google retains a powerful ability to create detailed profiles of users, whether or not it does so today, drawing on e-mail addresses and "concept" information about a person's friends, family and co-workers; the daily search terms typed into Google; and myriad personal information provided to Orkut, a social networking service. The Gmail privacy policy explicitly allows such uses: "Google may share cookie information among its other services for the purpose of providing you a better experience."

Second, the prospect of unlimited data retention creates a honey pot for law enforcement. Although Google currently says that it will not record the "concepts" extracted from e-mails, it could decide to do so in the future and thereby create detailed profiles of users. Building such profiles on years of past communication in addition to current communications is made easier if users never delete e-mails. Additionally, under the Stored Communications Act, communications stored for more than 180 days receive weaker protection against law enforcement access; with Gmail, many such e-mails could be made easily available to police.

Third, Google's privacy policy is insufficient. The company can transfer users' personal information if the company itself is sold. ("We reserve the right to transfer your personal information in the event of a transfer of ownership of Google, such as acquisition by or merger with another company".) Also, Google can make unilateral changes to the policy and, unless it deems them "significant," it may not even notify users ("If we make any significant changes to this policy, we will notify you by posting a notice of such changes on the Gmail login page."). As outlined above, the policy regarding retention is very broad: "...residual copies of e-mail may remain on our systems for some time, even after you have deleted messages from your mailbox or after the termination of your account." (These and the rest of the references to the privacy policy are based on the 6/28/2004 version.)

Conclusion

Thank you for holding this hearing on content extraction. We continue to believe that e-mail should be a surveillance-free zone, an area where individuals should be able to communicate freely without either commercial or law enforcement intrusion.

[1] Robert Ellis Smith, Ben Franklin's Web Site: Privacy and Curiosity from Plymouth Rock to the Internet (Privacy Journal, 2000).

[2] See, e.g., The Company v. United States, No. 02-15635 (9th Cir. Nov. 18, 2003), where a federal district court ordered a car navigation service provider to leave the system on continuously, turning the automobile's navigation equipment into a wiretap for law enforcement.

[3] Saul Hansell, Google's Chief Speaks, but Not Its Finance Officer, N.Y. Times, Feb. 10, 2005, available at http://www.nytimes.com/2005/02/10/technology/10google.html

Last Updated: March 15, 2005
Page URL: http://www.epic.org/privacy/gmail/casjud3.15.05.html