Data brokers scrape public records and sell information online. Years of stalking and abuse have followed—and policymakers need to step in.
In August, the Consumer Financial Protection Bureau (CFPB) announced that it was undertaking new rulemaking related to data brokers, or companies (in the author’s definition) involved in the business of collecting, aggregating, selling, and otherwise monetizing consumers’ information. This followed the CFPB’s request for information regarding data brokers and their business practices and came alongside numerous other data brokerage-related policy actions, such as California’s passage of the Delete Act, which aims to let consumers more efficiently opt out of the sale of their data by some third-party data brokers.
In the debate about data privacy and harms to Americans, however, one issue has not received adequate attention from the press or in policy conversations relative to the severity and volume of harm: the link between publicly available information and stalking and gendered violence. For decades, “people search” data brokers have compiled profiles on millions of people—including their family members, contact information, and home addresses—and published them online for search and sale. It can cost as little as $0.95 per record—or $3.40/search, for a monthly fee—to buy one of these dossiers. In turn, for decades, abusive individuals have bought this data and used it to hunt down and stalk, harass, intimidate, assault, and even murder other people. The harms of stalking and gendered violence fall predominantly on women as well as members of the LGBTQIA+ community.
This matters for the privacy debate because many so-called people search websites get this data by scraping public records, from voting registries to property filings. Yet this information is completely exempted from many state privacy laws because it is considered “publicly available information.” One prominent line of argument suggests that since the information is already out there, a company that aggregates it, digitizes it, and links it to profiles of specific individuals makes no difference.
This piece analyzes the links between people search data brokers, stalking and gendered violence, and public records. It also analyzes the state privacy law exemptions and other gaps that permit companies to scrape and use information from those records. I argue that the debate surrounding people search websites and data brokers manifests as a clash between two seemingly irreconcilable perspectives: that the only way to better protect people is to have this information entirely removed from the internet, and that removing any of this information from the internet is unthinkable given that it is already published and that government records should be accessible to journalists, civil society watchdogs, and other stakeholders. This binary framing must be resisted. The way forward in this debate is for policymakers to recognize the importance of public records for press reporting and other functions while also dispensing with the myth that digitizing information, aggregating it, linking it to individuals, and selling it online somehow does not considerably change the risks posed to individuals and communities of people.
People Search Data Brokers and Public Records
The data brokerage ecosystem encompasses companies, broadly speaking, involved in the business of collecting, aggregating, selling, and sharing data. There are first-party companies involved in this ecosystem—mobile apps, websites, and others that directly collect consumers’ information from them and then sell it. There are also third-party companies involved in this ecosystem—firms that acquire information about consumers with whom they do not directly interact. Within the ecosystem, there is also variance between companies’ focus areas: Some data brokers, for example, focus heavily on consumer credit reporting and consumer profiling (such as Equifax, TransUnion, and Experian), while others focus entirely on selling people’s geolocation data. Data brokers get their information from a variety of sources, which can include mobile app developers, software development kits embedded in mobile apps, government records, retailers, health tech companies not covered by the Health Insurance Portability and Accountability Act (HIPAA), marketing firms, and much, much more.
People search data brokers—also known as “white pages” websites—focus on compiling dossiers of information on people in order to sell it to other individuals. Typically, these companies get their information from government records. These include voting registries, property filings, marriage certificates, motor vehicle records, criminal records, court documents, death records, professional licenses (such as for lawyers), bankruptcy filings, and much more. These dossiers contain everything from individuals’ names, addresses, and family information to data about finances, criminal justice system history, and home and vehicle purchases. People search websites’ business pitch boils down to the fact that they have done the work of compiling data, digitizing it, and linking it to specific people so that it can be searched online.
For decades, individuals engaged in various forms of abuse have purchased this information from people search websites and used it to hunt down and stalk, harass, intimidate, assault, and even murder other people. In October 1999, an individual stalking a 20-year-old woman, Amy Boyer, pulled up next to her car while she was at work and shot and killed her; he then shot himself. This man had purchased her date of birth from people search broker Docusearch for $20. (After the broker gave multiple dates of birth that were not for the right person, he provided them with her home address to get the right date of birth.) The man then spent $45 to acquire her Social Security number; $109 to get her employment information, which was refunded after the company could not deliver; $30 to search her by Social Security number, which yielded her home address (which he reportedly already had); and then another $109 to get a second employment information search. For this last request, a Docusearch employee called Boyer, lied about their identity and motives to get her to tell them her employment address, and then supplied that information to the stalker. The following month, the individual went to her workplace and murdered her.
Boyer’s family sued the broker. In that case, Remsburg v. Docusearch, the New Hampshire Supreme Court concluded in 2003:
Threats posed by stalking and identity theft lead to the conclusion that the risk of criminal misconduct is sufficiently foreseeable so that an investigator has a duty to exercise reasonable care in disclosing a third person’s personal information to a client; this is especially true when the investigator does not know the client or the client’s purpose in seeking the information.
Congress also passed Amy Boyer’s Law in 2000, to prohibit selling or publicly displaying an individual’s Social Security number without the “affirmatively expressed consent” of that person. It did not, however, apply to home addresses, children’s information, and other data points that can be gleaned from public records.
Legal debates about the Remsburg v. Docusearch case persisted afterward; for example, some observers raised the question of whether murder is a foreseeable risk, even if stalking is foreseeable, and argued that the Social Security number is “irrelevant” in this context because the individual already knew Boyer’s home address. Nonetheless, in the ensuing years there have been many cases of stalking and gendered violence in which the abusive individuals acquired data from people search data brokers. In 2017, for instance, a survivor of domestic violence put it this way to ABC News:
If you have someone who’s tried to kill you, for them to be able to just type in your name, and any known address that you’ve stayed at can pop up. It’s scary, because now they know ways to start trying to find you.
For years, privacy advocates, regulators, and organizations supporting survivors of stalking and gendered violence have voiced concerns about people search data brokers and their linkage and sale of personal information. Beth Givens, the founder of Privacy Rights Clearinghouse, argued in 2002 that “providing access to public records on the internet alters the balance between access and privacy that has existed in paper and microfiche records.” Four experts from the National Network to End Domestic Violence and a criminal justice scholar wrote in 2007 that “survivors have found that within their own communities, critical conversations about privacy and victim safety are being left out of community decisions to publish information that is considered to be part of a public record.”
More recently, the National Domestic Violence Hotline notes that stalking tactics include, among many others, “collecting information about you using public records, [using] online search services, or hiring investigators.” The Safety Net Project, part of the National Network to End Domestic Violence, warns that people search websites, on top of stalkerware and other technologies, can be another vector through which abusive individuals gather information about people; after all, “they are easy for even people who have never heard of them to find and use.”
The Federal Trade Commission (FTC) even noted in its landmark 2014 report on data brokers that “people search products can be used to facilitate harassment, or even stalking, and may expose domestic violence victims, law enforcement officers, prosecutors, public officials, or other individuals to retaliation or other harm.”
“Publicly Available Information” Carve-Outs
While there are ongoing debates about whether people search data brokers have legal responsibilities to the people about whom they gather and sell data, the sources of this information—public records—are completely carved out from every single state consumer privacy law. Consumer privacy laws in California, Colorado, Connecticut, Delaware, Indiana, Iowa, Montana, Oregon, Tennessee, Texas, Utah, and Virginia all contain highly similar or completely identical carve-outs for “publicly available information” or government records. Tennessee’s consumer data privacy law, for example, stipulates that “personal information,” a cornerstone of the legislation, does not include “publicly available information,” defined as
information that is lawfully made available through federal, state, or local government records, or information that a business has a reasonable basis to believe is lawfully made available to the general public through widely distributed media, by the consumer, or by a person to whom the consumer has disclosed the information, unless the consumer has restricted the information to a specific audience.
This is the exact same language as the carve-out in, among others, the California privacy regime, which is often held up as the national leader in state privacy regulations. What this means is that consumers, as is frequently noted in the media, have some legal rights in these states—or will, once the laws come into effect—to mandate that companies delete some of their information. But this does not apply to every scenario, kind of data, and company. “Publicly available information” and government records lie outside the scope of covered personal data.
Even under California’s newly passed Delete Act—which creates a centralized mechanism for consumers to ask some third-party data brokers to delete their information—consumers across the board cannot exercise these rights when it comes to data scraped from property filings, marriage certificates, public court documents, and much more. (There is a limited set of exceptions here when it comes to survivors of stalking and domestic violence, discussed more below.) The reason California’s Delete Act does not fix this problem is that the act exists alongside the California Consumer Privacy Act and the California Privacy Rights Act and adopts the same definitions, which include this public records carve-out. And unless the people search data broker is engaged in activity covered under the federal Fair Credit Reporting Act, such as compiling credit information and selling it to financial institutions—which gives the consumer some rights over the data—there is no federal protection against this kind of data aggregation, linkage, and sale, either.
Such exemptions pose stalking and gendered violence risks to individuals. If someone in Virginia, to give a different example, asks a people search data broker to delete their information, and the information in question is derived entirely from “publicly available information,” the company could choose to do so of its own volition—but it would not be legally required to comply. Part of the reasoning behind this fact is that the original source of the information remains “public.” With property filings, for instance, the municipal agency that originally holds the filing would still have the documents in a cabinet or their local server, whether the people search data broker deleted the information from its systems or not. It is still possible for members of the public to physically retrieve the documentation.
But treating these scenarios as equivalent, and the data broker’s activity as producing zero change in risk, ignores the fact that digitizing the information, aggregating it, and linking it to specific individuals meaningfully changes the context of the data—and the risk to people. Givens, in her 2002 article “Public Records on the Internet,” referenced other scholars using “practical obscurity” to “describe the de facto privacy protection accorded court documents stored in back rooms and accessible only by visiting the court house and asking a clerk to retrieve them.”
Take court filings as an example. Reporters, accountability watchdogs, and even some legal rights organizations may want to have access to court filings to monitor corruption and wrongdoing, watch for civil rights violations, and otherwise keep the public informed about the criminal justice system and other legal goings-on. At the same time, the publication of this information allows data brokers to compile the information, link it to people, and then sell that data. A June 2005 report by the National Network to End Domestic Violence, for example, noted:
Many courts are beginning to publish both indexes of court records and the full documents and case files to the Internet, often without providing any notice to citizens or options for victims to restrict web-access. The Montgomery County, Pennsylvania Court went a step further, publishing the names and addresses of victims (and their children) who obtain protection orders on the Internet.
Once the information is digitized, aggregated, linked to a person, and sold online by a data broker, anyone with an internet connection can go online and search for an individual’s profile. The murder of Boyer, and other tragic cases since then, underscores that abusive individuals can follow these very steps and buy people’s whereabouts and other data for mere dollars or cents. Similarly, even though a court record may be “public” already, the personal information from it was not previously available online, linked to a person (and a profile of them), and for sale by a private company with its own autonomy over whether or not to vet buyers.
By contrast, retrieving the information in person would require, at minimum, traveling to a specific place, in person, during business hours, and interacting with an official who would have to procure the documents—where that place, not insignificantly, is a government building. Doing so may also leave a paper trail of some kind with a government organization, rather than a private enterprise. As just one example, to retrieve criminal records in Washington, D.C., from the Metropolitan Police Department, a person typically must fill out a criminal history request with their name, race, Social Security number, date of birth, gender, and place of birth—whether requesting in person or via mail—and might also have to provide an officer with valid government ID. People requesting public records in D.C. must often specify the purpose for doing so, too. These two scenarios of typing online on a computer and visiting a police station in person are clearly very different. Yet when they are discussed by some privacy lawyers and many data brokers, they are often portrayed or treated as indistinguishable.
People search data brokers’ very own business pitches contradict the idea that there is no change to the data context. Many people search data broker websites will discuss the supposed value their services and databases add by doing the work of digitizing and aggregating information and linking it to people, including by scraping public records. Indeed, that is the whole reason they expect people to pay for their services. Although the information is already “public” in some sense, there is a meaningful change to the data context—and the risk.
Breaking Beyond the Binary
There are plenty of important reasons for a democracy to have court documents and bankruptcy filings, for instance, be government records in the public domain. Civil society watchdogs might wish to review financial information and property holdings of an elected official to investigate allegations of corruption. Reporters might rely on court documents to understand government proceedings and issues that affect members of their state and local communities. The long list of hypothetical scenarios goes on. On its website, the Society of Professional Journalists, for instance, writes that “public records, meetings, and proceedings play an important role in the journalism profession,” even though “media outlets call attention to that role relatively infrequently.”
Fifty-four civil society organizations—from the ACLU to the Freedom of the Press Foundation—made this argument recently, in response to a bill that Sens. Amy Klobuchar (D-Minn.) and Ted Cruz (R-Texas) attempted to put into the National Defense Authorization Act for 2024, to try to remove information about themselves, their families, and their staff from some data brokers’ databases and from some websites—but not information about anyone else, including their own constituents. As Klobuchar made clear, the intention was not to target journalism, but to minimize the risk that a violent person could acquire information online about a member of Congress and target them (even if it would not achieve that objective). The 54 civil society organizations cited concerns this would harm press reporting and accountability for elected officials:
[A]s drafted, Amendment 218 would allow Members of Congress to compel the censorship of a broad range of information whose publication is protected by the First Amendment—including the types of information routinely reported by journalists, government watchdogs, and ordinary citizens. This is precisely the information necessary for the American public to evaluate lawmakers’ adherence to our laws and ethical standards, as well as their policy promises to their constituents.
These are important points, and they are a valuable perspective from organizations on the front lines of press reporting and anti-corruption issues.
Simultaneously, though, they reflect a binary that is typical of this debate about “publicly available information” and risks of interpersonal and other violence. On the one hand, there is a proposed solution that would call for (in theory) completely removing information about certain people from the internet and limiting the publication of that information in public records. (Of course, with elected representatives, there is arguably both enhanced risk to those people as well as enhanced reason to make sure information on properties, vehicle ownership, and more remains public.) The view is that having it up is too dangerous, and takedowns from the internet are the only solution.
On the other hand, there is a view that importantly recognizes the value of this information for press reporting and anti-corruption work—yet entirely dismisses the idea that digitizing the information, aggregating it, and making it available for search and sale online changes the risks. These organizations, while often well intentioned and making many good points, fell into this trap in their letter to Klobuchar and Cruz: The proposed bill’s “damaging outcomes,” they wrote, “must also be weighed against the fact that Members of Congress, like all Americans, are already protected by a variety of criminal statutes and civil remedies against conduct such as stalking and assault, which make much of the legislation superfluous.” As I have written previously:
Putting aside the groups’ relevant concerns about speech and holding elected officials accountable, the notion that the criminalization of stalking makes the publication of home addresses online “superfluous” is contradicted by decades’ worth of gendered violence associated with the aggregation and publication of that very information. Selling a person’s address and other data online for just a few dollars, on a website that anyone can publicly access and search, enhances risks to individuals’ safety.
The conversation is even more urgent in the case of stalking and gendered violence. Despite rightful and important discussion about the safety of members of Congress and their families, or federal judges and their families (after a horrific, misogynistic attack on the home of a New Jersey federal judge in 2020 and the murder of her 20-year-old son), people search data brokers have contributed for decades to stalking and gendered violence problems, with significant harms, and the issue has received insufficient policy attention. Congress has yet to pass these kinds of legal protections for survivors of stalking and gendered violence. To her credit, Klobuchar has led some of the recent, related activity in Congress, such as writing a letter with Sen. Lisa Murkowski (R-Alaska) in 2021 to the FTC about the need to address people search websites and stalking. But the overall lack of policy attention and follow-through on this issue needs to change—including because changes in law will best empower the FTC to carry out privacy investigations and enforcements.
What Is to Be Done?
There is not a single “solution” or silver bullet to address the competing interests and concerns at issue. But there are some points and ideas that policymakers, legislators, and other stakeholders should consider. These include expanding the discussion about what kinds of legal, regulatory, and policy measures could target, in particular, the digitization, aggregation, linkage, and sale of personal information from public records; strengthening address confidentiality programs while also recognizing their core gaps; and breaking beyond the binary of proposing widespread takedowns of public records information or doing nothing at all. Policymakers must also realize that while corporate controls and best practices are important, they are a supplement to regulation, not a substitute for it altogether.
Placing limits on what data brokers can digitize, aggregate, and sell online does not necessarily require placing limits on the underlying record. These are related but distinct issues. It is possible to say that government agencies can keep physical copies of records in filing cabinets while also saying that a private company gathering those records, digitizing them, piling them into a database, linking them to individuals and their families, and posting them on the internet for sale is engaging in a prohibited or restricted activity. Breaking out of the binary of what is “possible” to address risks of stalking, gendered violence, and similar harms opens up the conversation for these kinds of solutions. Doing so does not eliminate other factors, such as the importance of press reporting, but it escapes an all-or-nothing framing where privacy is conceptually pitted against freedom—and where the safety of survivors of gendered violence, women, LGBTQIA+ people, and others is endangered in the process. One way to think about this may be targeting the digitization, aggregation, and brokerage of data about people, not the availability of public records themselves. The technological processes meaningfully change the data context and risk.
Of course, the records are still out there, and this is why some states have implemented “address confidentiality programs.” These permit survivors of stalking and gendered violence to swap out their home address in public records with a government-assigned address. In Washington, D.C., to stick with the earlier example, qualified applicants receive an authorization card with a substitute address, which is put on public records, accepted by D.C. agencies, and used for mail forwarding to the person’s real home.
It is clearly better to have these programs than not; California, New York, and many other states have implemented similar or adjacent efforts to increase protections for individuals. But the programs have their flaws. In Washington, D.C., for instance, applicants must have moved within the past 60 days, plan to move within the next 30 days, or have “taken adequate measures to ensure their current address is not easily accessible online or through public records.”
The website for California’s Safe at Home program—which provides a free post office box and mail forwarding service—states that “Safe at Home is not authorized to make demands of private organizations or businesses” and that “no, the program is not able to delete information that already exists in public records.” There is some added benefit in California, as the underlying law (California Government Code § 6208.1) says that no businesses can knowingly and intentionally publicly post or display a program participant’s home address or home phone number online. But it appears that the state of California does not take on the burden of enforcing this itself, instead allowing individuals to bring legal action if their data is left up. The state also does not promise complete success with the program and cannot retroactively force companies to delete information retrieved from previous public records.
Strengthening these state programs and easing the application burden on survivors would help improve the status quo. However, the same issue remains—there is a link between people search data brokers, public records, and risks of stalking and violence. This is perhaps even more reason to consider the first point—that if address confidentiality programs improve protections for public records that can be physically retrieved, then the glaring gaps center around people search websites still having the ability (in most cases) to publish and sell that information online.
Experts and scholars have also offered some ideas worthy of consideration by policymakers, privacy analysts, and other stakeholders. It is necessary to get first-hand perspectives from advocacy organizations and survivors of stalking and gendered violence themselves. While this piece cites some of these perspectives, it is surely not intended to be a comprehensive survey; any policymaker approaching this issue should learn from survivors of stalking and gendered violence as well as others working on these issues to understand their experiences and the nature of the harms. There is plenty of nuance to the reality that cannot be captured in news articles. University of Georgia School of Law professor Thomas Kadri offers some additional options in his 2022 article “Brokered Abuse,” such as changing anti-abuse laws to offer more privacy by obscurity and moving faster to address the imminent harms faced by people whose information is online. Kadri also argues for a broader understanding of harm in this context:
The harms of brokered data go beyond the fact that brokers make it easier to surveil people and expose them to physical, psychological, financial, and reputational harms. In addition, people must beg every single broker to conceal their information from thousands of separate databases, over and over again, with little or no legal recourse if brokers reject their efforts to regain some obscurity.
It is also worth considering arguments about whether company controls would help with the issue. As Kaveh Waddell wrote for The Atlantic in January 2017, “there are legal limits on how people can use information gleaned from these sites—they can’t be used to evaluate a job candidate, for example, or to stalk someone—but there are few actual safeguards in place to keep a user from doing just that.” Looking back further, the nonprofit Electronic Privacy Information Center (EPIC) made a similar point in its amicus curiae brief in the Remsburg v. Docusearch case: In that instance, the Docusearch co-owners both “implicitly acknowledg[ed] that many of their clients seek information for illegal or harmful purposes,” as EPIC wrote, “yet their only method of verifying [clients’] intentions is simply asking for a verbal assurance of good intentions.”
Put bluntly, it is absurd to think that asking a stalker “are you doing anything illegal with this data?” would prevent harm. There is certainly a place for know-your-customer controls in people search data brokers’ sales processes to better vet their clients. But those controls should be instituted as a supplement to strong legal and regulatory controls, not as a substitute altogether for measures attempting to prevent harm before it happens. To quote Kate Crawford’s 2014 article “Big Data Stalking” about the data broker industry, “a narrow focus on individual responsibility is not enough: the problem is systemic.”
Without regulation, the measures that some companies have instituted of their own volition do not entirely work anyway. Journalist Mara Hvistendahl wrote about this issue after attempting to remove her own information from people search websites in 2020. People search data brokers were often acquired by other companies, she said, requiring her to keep track of which was which; and “worse yet, the companies were continually trawling driver’s license registration records, voter registration databases, and address information from the U.S. Postal Service, creating listings to replace the ones I had removed.” Even when consumers can ask a people search broker to delete their information—including for pressing safety and security reasons—the broker’s systems might automatically repopulate that data mere hours or days later.
The abuse-related harms have been too great and too numerous, and persisted for far too long. Policymakers, lawyers, and other voices in this conversation should reconsider how they discuss issues of public records, data brokers, and stalking and violence. Years of front-line work on stalking and gendered violence, combined with years of expert writing and scholarship on people search data brokers and stalking, provide a foundation for this debate. To move forward, policymakers must begin by recognizing how the modern digitization, aggregation, linkage, and sale of information fundamentally changes the context of the data—and the risks to people’s lives.
– Justin Sherman is a contributing editor at Lawfare. He is also the founder and CEO of Global Cyber Strategies, a Washington, DC-based research and advisory firm; a senior fellow at Duke University’s Sanford School of Public Policy, where he runs its research project on data brokerage; and a nonresident fellow at the Atlantic Council.
Published courtesy of Lawfare.