New York City police have shut down a controversial unit that sent plainclothes detectives into Muslim communities without specific cause and secretly designated several mosques “terrorism enterprises,” effectively allowing police to investigate anyone who prayed there. The unit, formed after the 9/11 attacks, sent informants with secret microphones to record sermons and collected license plates of people parked outside, automatically monitoring their movements around the city.
“The Demographics Unit created psychological warfare in our community,” said Linda Sarsour, of the Arab American Association of New York. “Those documents, they showed where we live. That’s the cafe where I eat. That’s where I pray. That’s where I buy my groceries. They were able to see their entire lives on those maps. And it completely messed with the psyche of the community.”
Although the unit is now defunct, it remains unclear how many of its controversial practices may be continued in other forms. The Times reports that “the Police Department appears to be moving its policies closer to those of the F.B.I. Both agencies are allowed to use census data, public information and government data to create detailed maps of ethnic communities.”
At Robinson + Yu, we’re crazy about coffee, and we’ve picked up a digital scale to measure out just the right amount of grounds for each cup. But judging from Amazon’s automatic suggestions, weighing coffee may not be the most popular use for this scale. Under the heading “Customers Who Bought This Item Also Bought,” the site “serves up a quickstart kit for selling drugs” — including several varieties of tiny ziplock bags, a drug testing kit, a machine for manufacturing gelatin capsule pills, and a safe designed to look like a Dr. Pepper can.
These recommendations are a product of big data methods — in this case, “collaborative filtering.” Amazon analyzes shopping data from thousands of customers, and automatically highlights items that tend to get bought together. Netflix uses a similar approach for its recommendations.
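The core of item-to-item collaborative filtering can be sketched in a few lines: count how often each pair of items shows up in the same order, then recommend the items most often co-purchased with a given one. The order data and item names below are invented for illustration; Amazon's actual system is far more sophisticated.

```python
from collections import Counter
from itertools import combinations

# Toy order history: each order is a set of purchased items (invented data).
orders = [
    {"digital scale", "ziplock bags"},
    {"digital scale", "ziplock bags", "capsule machine"},
    {"digital scale", "coffee grinder"},
    {"ziplock bags", "capsule machine"},
]

# Count how often each pair of items appears in the same order.
pair_counts = Counter()
for order in orders:
    for pair in combinations(sorted(order), 2):
        pair_counts[pair] += 1

def also_bought(item, top_n=3):
    """Items most often bought together with `item`, by co-purchase count."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return [name for name, _ in scores.most_common(top_n)]

print(also_bought("digital scale"))  # ziplock bags rank first
```

No individual's history needs to be inspected to generate these lists; the suggestive pattern emerges from aggregate co-purchase counts alone.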
As far as we know, law enforcement officials have not (and likely could not) simply ask for the shopping records of everyone who bought this scale. But Amazon itself, by looking at individual shopping histories, probably could identify some likely drug dealers simply from the things they’ve bought. It’s a powerful reminder of how much companies like Amazon can infer from our purchase patterns, and of the need for clear policies that restrict how that information may be used.
The FBI may have used big data analysis methods — looking for hidden patterns in campaign finance filings — to detect alleged campaign finance fraud by conservative activist Dinesh D’Souza. Mr. D’Souza allegedly made contributions in the names of others, a practice barred by federal law.
Some of New York’s largest auto insurers charge higher rates to drivers with less education and non-professional jobs, claims the New York Public Interest Research Group (NYPIRG) in a new report.
NYPIRG used auto insurers’ websites to record pricing information. Holding all factors constant except education and occupation, it found that drivers with non-professional, non-managerial jobs and less education were quoted higher rates — even if they had better driving records or drove less. For example:
Geico charged a retail cashier with a high school degree and a flawless driving record 23% more than an executive with a college degree who caused a crash 3-5 years ago ($722 vs. $557) and 24% more than an executive with a college degree who got two speeding tickets 1-3 years ago ($722 vs. $584).
“In many cases, [considering these factors] increases rates for those who can least afford it,” said Andy Morrison of NYPIRG.
Some argue that only factors related to driving should be fair game, especially since drivers are legally required to buy coverage. “I am convinced that the use of non-driving-related factors to raise rates for people with low-paying jobs and less education is unfair and should be tightly regulated, if allowed to be used at all,” said J. Robert Hunter, Director of Insurance for Consumer Federation of America and a former Texas Insurance Commissioner.
But insurers claim that education and occupation data helps them more accurately predict risk. “These factors are correlated with risk, which is why regulators have allowed the use of education and occupation in determining how much a consumer pays for insurance,” said Ellen Melchionni, president of the New York Insurance Association.
Though pricing car insurance on individuals’ supposed level of risk sounds fair, the practice could undermine the broad social benefits of our insurance system.
NYPIRG has asked New York regulators to look more closely at the practice and potentially apply a state law that bars “unfairly discriminatory” rates.
In Los Angeles, police car recording antennas — which can automatically capture and store an officer’s voice, even when the officer leaves her vehicle — are going missing. In one division, almost half of the 80 cars are missing an antenna.
“This equipment is for the protection of the public and of the officers. To have people who don’t like the rules to take it upon themselves to do something like this is very troubling,” said Steve Soboroff, president of the independent commission that oversees the department.
This isn’t the first time we’ve seen police officers object to monitoring tools. Last November, police officers in Boston complained about lost privacy after GPS monitoring devices were installed in their cruisers.
If you want to read about the limits and pitfalls of big data, you’ve had a growing number of options in recent weeks. One of the best is “Big data: are we making a big mistake?” by Tim Harford in the Financial Times. It’s worth a read in full, and adds to the big data lessons we offered last month.
Additional lessons include:
Big data is often just “found data.” Unlike the large datasets physicists capture in scientific laboratories, companies’ data comes from our web searches, financial transactions, social network activity, and the like. These sources often provide an incomplete and “messy collage of datapoints.”
Big data is vulnerable to sampling errors, favoring or excluding certain groups. Lots of data doesn’t mean representative data. This can create challenges, even for well-meaning data projects. For example, Boston created a smartphone app called “Street Bump,” which uses residents’ smartphone accelerometers to detect potholes and automatically report them to the city. But standing alone, the app is likely to drive the attention of city services to younger, affluent areas where more people own smartphones.
Impressive predictions often come with quiet false positives. Target’s data-driven prediction of a woman’s pregnancy has become legendary. But we don’t know how many times the retailer guessed wrong. “There’s a huge false positive issue,” said Kaiser Fung, who has developed similar programs for other stores.
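The false-positive problem is a matter of base rates: when the condition being predicted is rare, even an accurate model flags mostly wrong answers. The numbers below are invented for illustration, not drawn from Target's program.

```python
# Illustrative base-rate arithmetic (all numbers invented for this sketch).
shoppers = 1_000_000
base_rate = 0.01            # suppose 1% of shoppers are actually pregnant
sensitivity = 0.90          # the model flags 90% of true cases
false_positive_rate = 0.05  # and wrongly flags 5% of everyone else

true_positives = shoppers * base_rate * sensitivity                  # 9,000
false_positives = shoppers * (1 - base_rate) * false_positive_rate   # 49,500

precision = true_positives / (true_positives + false_positives)
print(f"{precision:.1%} of flagged shoppers are true cases")
```

Under these assumptions, fewer than one in six flagged shoppers is actually pregnant, even though the model catches 90 percent of true cases.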
The bottom line, Harford argues, is that big data analyses — like traditional statistical analyses — must be handled with care: “Statisticians have spent the past 200 years figuring out what traps lie in wait when we try to understand the world through data. The data are bigger, faster and cheaper these days — but we must not pretend that the traps have all been made safe. They have not.”
Fusion centers and “suspicious activity reporting” are “particularly concerning because of the way they can lead to racial and religious profiling,” explains the EFF in a new and extensive FAQ on fusion centers.
On Monday, the Supreme Court declined to intervene in a pending lawsuit against the NSA’s domestic surveillance of calling records. The Court is not likely to assess the program in the near term.
In testimony to the Council of Europe yesterday, Edward Snowden claimed that the NSA specifically targeted “purely civilian or human rights organizations” for surveillance within the borders of the United States.
Wikipedia is one of the most visible sites on the web, topping search results for many of its 4.4 million articles. Crafted by volunteer editors, it tries to cover everything — and comes remarkably close. But Wikipedia is edited primarily by white men. It provides 159 articles on Simpsons characters, but just 19 on Mexican feminists.
The Wikipedia community is formally open, but can be culturally unwelcoming to diverse voices, including those of women. A recent study found that women make up as little as 9 percent of all editors. This lack of diversity has spurred extensive discussion and a growing campaign to recruit more women editors.
A recent “edit-a-thon” focused on topics related to art and feminism brought together roughly 600 participants in 30 locations around the world. Sarah Mirk, one of the participants, writes that Wikipedia mirrors the biases of the mainstream media, since it requires articles to cite external markers of a subject’s “notability.”
While Wikipedia is a great platform for knowledge, it builds on existing institutions. The fact that female artists are less likely to have their work reviewed in mainstream media and less likely to have their work shown in museums means it’s harder to add them to Wikipedia, too.
Nonetheless, the edit-a-thon contributed roughly a hundred new articles on art and feminism topics to Wikipedia, and improved many more.
It’s a start.
New Washington Post columnist Catherine Rampell highlights the importance of the Census Bureau and other federal statistical efforts to track social and economic conditions:
In the past few years, federal statistical programs — you know, the ones that collect information openly through surveys, rather than secretly through wiretapping and malware — have been under attack. Budgets have been chopped and data series eliminated or at least made fuzzier, messier, less useful . . . our measurement tools are breaking down. [For example,] the American Community Survey . . . costs just $234 million to administer but determines how $450 billion in federal funds gets allocated for schools, housing, veterans’ benefits and roads, among other things.
The LAPD’s Pacific Division has asked local residents to help deter crime by taking a walk in neighborhoods it predicts to be at higher risk of crime. It’s novel to see predictive policing technology used to encourage positive community engagement.
The LAPD analyzes past crime data to highlight zones on a map that it considers when deploying its officers. These zones change every day.
“You can kind of think of crime as a disease. If a crime happens, we can see how it affects the likelihood that another incident is going to happen within a certain area in a certain amount of time after that,” explained Jeremy Heffner, who works on predictive policing software.
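The contagion idea Heffner describes is often modeled with self-exciting processes: each incident temporarily raises the predicted risk nearby, and that boost fades with time and distance. The sketch below is a minimal illustration of that logic with invented incidents and decay parameters; it is not the LAPD's actual software.

```python
import math

# Toy incident log: (x_cell, y_cell, days_ago). Invented data.
incidents = [(2, 3, 1), (2, 4, 2), (8, 8, 10), (2, 3, 0)]

def risk(x, y, incidents, decay_days=7.0, decay_cells=2.0):
    """Score a grid cell: each past incident contributes a weight that
    decays exponentially with its age and its distance from the cell."""
    score = 0.0
    for ix, iy, age in incidents:
        dist = math.hypot(x - ix, y - iy)
        score += math.exp(-age / decay_days) * math.exp(-dist / decay_cells)
    return score

print(risk(2, 3, incidents))  # near two fresh incidents -> high score
print(risk(8, 8, incidents))  # near one 10-day-old incident -> lower score
```

Because the scores are driven entirely by the incident log, any bias in what gets recorded, such as uneven arrest patterns, flows directly into which zones light up on the map.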
There is a risk that such predictions, which are based on historical records such as past arrests, might entrench police bias. But here, the Division is also asking for its community’s help. From its release:
To further increase the effectiveness of Predictive Policing we are asking the public to spend any free time that you may have in these areas too. You can simply walk with a neighbor, exercise, or walk your dog in these areas and your presence alone can assist in deterring would be criminals from committing crime in your neighborhood. Each day we will release via social media the closest cross streets for [locations] in your neighborhood.
At least one neighbor has expressed willingness to help out. “I’d change the route I take on dog walks to help out. And if lots of my neighbors do the same, it’ll be a sign of civic health. We’re all responsible for safeguarding our neighborhoods,” said Conor Friedersdorf (a resident of the Venice area of Los Angeles), writing for the Atlantic.