Thursday, January 3, 2019

Trust In Digital Content

This is the fourth and I hope final part of a series about trust in digital content that might be called:
Is this the real life?
Is this just fantasy?
The series so far moved down the stack:
  • The first part was Certificate Transparency, about how we know we are getting content from the Web site we intended to.
  • The second part was Securing The Software Supply Chain, about how we know we're running the software we intended to, such as the browser that got the content whose certificate was transparent.
  • The third part was Securing The Hardware Supply Chain, about how we can know that the hardware on which our secured software runs is doing what we expect.
Below the fold this part asks whether, even if the certificate, software and hardware were all perfectly secure, we could trust what we were seeing.

Max Read's How Much of the Internet Is Fake? Turns Out, a Lot of It, Actually introduces the idea of "The Inversion":
For a period of time in 2013, the Times reported this year, a full half of YouTube traffic was “bots masquerading as people,” a portion so high that employees feared an inflection point after which YouTube’s systems for detecting fraudulent traffic would begin to regard bot traffic as real and human traffic as fake. They called this hypothetical event “the Inversion.”
These were "click-fraud" bots. Read explains:
In late November, the Justice Department unsealed indictments against eight people accused of fleecing advertisers of $36 million in two of the largest digital ad-fraud operations ever uncovered. Digital advertisers tend to want two things: people to look at their ads and “premium” websites — i.e., established and legitimate publications — on which to host them. The two schemes at issue in the case, dubbed Methbot and 3ve by the security researchers who found them, faked both. Hucksters infected 1.7 million computers with malware that remotely directed traffic to “spoofed” websites — “empty websites designed for bot traffic” that served up a video ad purchased from one of the internet’s vast programmatic ad-exchanges, but that were designed, according to the indictments, “to fool advertisers into thinking that an impression of their ad was served on a premium publisher site,” like that of Vogue or The Economist. Views, meanwhile, were faked by malware-infected computers with marvelously sophisticated techniques to imitate humans: bots “faked clicks, mouse movements, and social network login information to masquerade as engaged human consumers.” Some were sent to browse the internet to gather tracking cookies from other websites, just as a human visitor would have done through regular behavior. Fake people with fake cookies and fake social-media accounts, fake-moving their fake cursors, fake-clicking on fake websites — the fraudsters had essentially created a simulacrum of the internet, where the only real things were the ads.
Outside the simulacrum the ads may be real, malvertising, or cryptocurrency miners, but the content they are displayed in is less and less real. In Managing the Cultural Record in the Information Warfare Era, Cliff Lynch surveys this problem:
The first development is the ability to fabricate audio and video evidence. Software that can do this is becoming readily available and doesn't require extraordinary computational resources. If you want to produce a persuasive video of someone speaking any script you'd like and if that person has a reasonable amount of available recorded video, you can feed that video into the fabrication software. The obvious place for this is politics: pick your target politician, put words in his or her mouth, then package this into propaganda or attack ads as desired.

Fabrication is much more than talking heads, of course. In keeping with the long tradition of early technology exploitation in pornography markets, another popular application is "deepfakes," where someone (a public figure or otherwise) is substituted into a starring role in a porn video (the term "deepfakes" is used both for the overall substitution technology and for the specific porn application). This is already happening, though the technology is as yet far from perfect. Beyond the obvious uses (e.g., advertising and propaganda), there are plentiful disturbing applications that remain unexplored, particularly when these can be introduced into authoritative contexts. Imagine, for example, being able to source fabrications such as police body-camera footage, CCTV surveillance, or drone/satellite reconnaissance feeds. The nature of evidence is changing quickly.
...
While there's a great deal to be learned from our experiences over the past century, what's different today is the scale, the ready availability of these tools to interested individuals (rather than nation-states), and the move into audio/video contexts.
Insecurity of the Internet of Things adds to the problem, as Charlie Osborne describes in Hackers can exploit this bug in surveillance cameras to tamper with footage:
Researchers have discovered a vulnerability in Nuuo surveillance cameras which can be exploited to hijack these devices and tamper with footage and live feeds.

On Thursday, cybersecurity firm Digital Defense said that its Vulnerability Research Team (VRT) had uncovered a zero-day vulnerability in Nuuo NVRmini 2 Network Video Recorder firmware, software used by hundreds of thousands of surveillance cameras worldwide.
...
Back in September, researchers from Tenable revealed a remote code execution flaw in Nuuo cameras. This vulnerability, nicknamed Peekaboo, also permitted attackers to tamper with camera footage.
Like Lynch, Max Read addresses this issue of trust in content:
The only site that gives me that dizzying sensation of unreality as often as Amazon does is YouTube, which plays host to weeks’ worth of inverted, inhuman content. TV episodes that have been mirror-flipped to avoid copyright takedowns air next to huckster vloggers flogging merch who air next to anonymously produced videos that are ostensibly for children. An animated video of Spider-Man and Elsa from Frozen riding tractors is not, you know, not real: Some poor soul animated it and gave voice to its actors, and I have no doubt that some number (dozens? Hundreds? Millions? Sure, why not?) of kids have sat and watched it and found some mystifying, occult enjoyment in it. But it’s certainly not “official,” and it’s hard, watching it onscreen as an adult, to understand where it came from and what it means that the view count beneath it is continually ticking up.

These, at least, are mostly bootleg videos of popular fictional characters, i.e., counterfeit unreality. Counterfeit reality is still more difficult to find—for now. In January 2018, an anonymous Redditor created a relatively easy-to-use desktop-app implementation of “deepfakes,” the now-infamous technology that uses artificial-intelligence image processing to replace one face in a video with another — putting, say, a politician’s over a porn star’s. A recent academic paper from researchers at the graphics-card company Nvidia demonstrates a similar technique used to create images of computer-generated “human” faces that look shockingly like photographs of real people. (Next time Russians want to puppeteer a group of invented Americans on Facebook, they won’t even need to steal photos of real people.) Contrary to what you might expect, a world suffused with deepfakes and other artificially generated photographic images won’t be one in which “fake” images are routinely believed to be real, but one in which “real” images are routinely believed to be fake — simply because, in the wake of the Inversion, who’ll be able to tell the difference?
Even without altering a frame of video, simply editing it has already had major real-world consequences. Paul Schrodt's 'The Apprentice' editor recalls Donald Trump saying he wanted to 'drill' female crew members reports:
CineMontage, a journal for the Motion Picture Editors Guild, talked to editors who worked on the NBC reality show, who say that the image of Donald Trump "was carefully crafted and manufactured in postproduction to feature a persona of success, leadership, and glamour, despite the raw footage of the reality star that was often 'a disaster.'"

"We were told to not show anything that was considered too much of a 'peek behind the curtain,'" one editor, Jonathon Braun, told CineMontage.

The editors say one of their biggest challenges was in the boardroom, making Trump's often whimsical decisions about who was fired instead look "legitimate."

"Trump would often make arbitrary decisions which had nothing to do with people's merit," an anonymous editor said. "He'd make decisions based on whom he liked or disliked personally, whether it be for looks or lifestyle, or he'd keep someone that 'would make good TV' [according to Trump]."

This required creative editing to set up the firings in a way that would make them seem logical, according to the sources, and while manipulative editing is standard in reality TV, this was apparently on another level.
Thomas Hale's What crowdfunding is really about describes another way to distort reality:
Imagine you’re a small, fast-growing business. You attract £20m from a single, institutional investor. Suddenly, there’s a public relations crisis. You hire a specialist to deal with the fallout. The institutional investor is concerned, but they don’t really get involved. Their expertise lies in risk and return, not reputation.

Now, imagine that the same investor gets thousands of its employees to continuously defend your reputation on social media. Imagine they include it as a condition of the investment, for free.

You’ve just imagined crowdfunding.
...
It multiplies the size of the online community willing to participate (which in the case of Monzo is already bigger than those with investments). The £20m the company aims to raise this week could easily add well over 10,000 equity investors in its business (the investment limit is £2,000 per person). Instead of trying to control reputation through relationships with a small number of custodians, the sheer volume of support has the capacity to swamp critical voices.
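The arithmetic behind "well over 10,000" is worth making explicit (my gloss, not Hale's): the £2,000 per-person cap puts a floor under the investor count, and since most participants invest far less than the maximum, the actual number will be much larger:

\[
\frac{\pounds 20{,}000{,}000}{\pounds 2{,}000\ \text{(maximum per person)}} = 10{,}000\ \text{investors, at minimum}
\]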
Lynch notices an even more subtle problem:
Anyone who has followed security breaches and penetrations over the past few years knows that the track record of protecting data aggregations from exfiltration and subsequent disclosure or exploitation is very poor. And there are many examples of attackers that have maintained a presence in organizational networks and systems over long periods of time once they have succeeded in an initial penetration. While a tremendous amount of data has been stolen, we hear very little about data that has been compromised or altered, particularly in a low-key way. I believe that in the long term, compromise is going to be much more damaging and destabilizing than disclosure or exfiltration.
He distinguishes two motivations for compromise:
I want to explicitly note here the difference between the act of quietly rewriting the record and enjoying the results of the rewrites that are accepted as truth and that of deliberately destroying the confidence of the public (including the scholarly community) by creating compromise, confusion, and ambiguity to suggest that the record cannot be trusted.
Faked data doesn't have to be the result of external compromise:
Metrics should be the most real thing on the internet: They are countable, trackable, and verifiable, and their existence undergirds the advertising business that drives our biggest social and search platforms. Yet not even Facebook, the world’s greatest data–gathering organization, seems able to produce genuine figures. In October, small advertisers filed suit against the social-media giant, accusing it of covering up, for a year, its significant overstatements of the time users spent watching videos on the platform (by 60 to 80 percent, Facebook says; by 150 to 900 percent, the plaintiffs say). According to an exhaustive list at MarketingLand, over the past two years Facebook has admitted to misreporting the reach of posts on Facebook Pages (in two different ways), the rate at which viewers complete ad videos, the average time spent reading its “Instant Articles,” the amount of referral traffic from Facebook to external websites, the number of views that videos received via Facebook’s mobile site, and the number of video views in Instant Articles.
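The mechanism behind the video-metric overstatement, as reported at the time, was a denominator error: total watch time was divided by only the views lasting longer than three seconds, not by all views. A toy calculation, with invented numbers, shows how strongly that inflates the average:

```typescript
// Toy model of the inflated "average watch time" metric: divide total
// seconds watched by only the views longer than 3 seconds, instead of
// by all views. All numbers are invented for illustration.
const viewDurations = [1, 1, 2, 2, 3, 15, 30, 45, 60, 90]; // seconds per view

const totalSeconds = viewDurations.reduce((a, b) => a + b, 0); // 249
const allViews = viewDurations.length;                         // 10
const longViews = viewDurations.filter((d) => d > 3).length;   // 5

const honestAverage = totalSeconds / allViews;    // 24.9 s
const inflatedAverage = totalSeconds / longViews; // 49.8 s

console.log(
  `honest: ${honestAverage.toFixed(1)}s, reported: ${inflatedAverage.toFixed(1)}s, ` +
  `overstatement: ${(100 * (inflatedAverage / honestAverage - 1)).toFixed(0)}%`
); // overstatement: 100%
```

The more short views a video attracts, the bigger the overstatement, which is how the plaintiffs' 150 to 900 percent figures become plausible.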
Facebook is lying for its own profit. Nathalie Maréchal's Targeted Advertising Is Ruining the Internet and Breaking the World gets closer to the real question about destroying trust in media, and especially academic content: cui bono?
Safiya Noble, an associate professor at the University of California, Los Angeles and author of Algorithms of Oppression, told me in an email that “we are dependent upon commercial search engines to sort truth from fiction, yet these too, are unreliable fact-checkers on many social and political issues. In essence, we are witnessing a full-blown failure of trust in online platforms at a time when they are the most influential force in undermining or protecting democratic ideals around the world.”
...
Advertising’s shift to digital has cannibalized the news media’s revenue, thus weakening the entire public sphere. And linking advertising to pageviews incentivizes media organizations to produce articles that perform well, sometimes at the expense of material that educates, entertains, or holds power-holders accountable. Targeted advertising provides tools for political advertisers and propagandists to micro-segment audiences in ways that inhibit a common understanding of reality. This creates a perfect storm for authoritarian populists like Rodrigo Duterte, Donald Trump, and Jair Bolsonaro to seize power, with dire consequences for human rights.
Tom Sullivan edges toward the same question:
It is remarkable what moral compromises people will make provided the right incentives. Intelligence and foreign policy analyst Malcolm Nance recently mentioned the acronym used in the intelligence community for motives behind people becoming spies. MICE: Money, Ideology, Compromise or Coercion, and Ego. It was money that made “Sashko” lie for a living.

Filmmaker Kate Stonehill's "Fake News Fairytale" tells the story of a Macedonian teenager who took up spreading fake news for a quick buck until he saw what havoc his actions contributed to thousands of miles away:
“I think there’s a common misconception that people who write fake news must have a nefarious desire to influence politics one way or another,” Stonehill told The Atlantic. “I’m sure some of them undoubtedly do, but I met many people in Macedonia, including Sashko, who were writing fake news simply because they can make some money.”

[...]

Stonehill believes the first step to combatting the seductive proliferation of falsehoods is opening up an honest, critical discussion about technology, speech, and politics in order to better understand the fake-news phenomenon. “How, when, and why did the truth lose its currency?” she asked. “Who is profiting when the truth doesn’t matter? In my opinion, we’re only just beginning to unpack the answers to these questions.”
"Who is profiting when the truth doesn’t matter?" is the important question. The answer is obvious - those with the power to shape the lie. Governments, not just authoritarian governments, and powerful corporations. Here are a few US examples:
  • We are at the end of a historic process of governments of all stripes failing to weigh the short-term benefits of lying to their citizens against the long-term costs of eroding belief in government pronouncements. The Cold War, the JFK, RFK and MLK assassinations, Vietnam, Iraq, the War on Terror, austerity, and the foreclosure crisis all featured blatant lying by the government. They have led to a situation in which no-one in the reality-based community believes anything they hear from the current administration. Greg Sargent's tweetstorm makes important points about Trump's lying:
    Why does Trump lie *all the time* about *everything,* even the most trivial, easily disprovable matters?

    The frequency and the audacity of Trump’s disinformation is the *whole point* of it -- to wear you down. More and more of the lies slip past, undetected and uncorrected.
    ...
    Once Trump’s lying is understood as concerted and deliberate disinformation, it becomes clear that the frequency and audacity of it is *the whole point.*

    Those are features of the lying. They are central to declaring the power to say what reality is.

    The other crucial half of this is to destroy the credibility of the institutional press.

    Previous presidents have tangled with the media. But Trump’s ongoing casting of the press as the "enemy of the people" is in important respects something new:
    ...
    Trump is *openly and unapologetically* declaring that norms of consistency and standards of interplay with the institutional press *do not* bind him.
  • Over the same period corporations have run massive long-term disinformation campaigns about pollution, smoking, and climate change, among others. And billionaire right-wing media owners have facilitated them, while using their reach to intimidate governments (see Hack Attack, Nick Davies' astonishing account of the Murdoch press' "information operations" in the UK).
In the academic context, Lynch sees some hope:
A four-pronged approach to the new information warfare environment seems to be emerging. One prong is greatly improved forensics; this is a mostly technical challenge, and memory organizations will be mainly users, not developers, of these technologies. Documentation of provenance and chain of custody are already natural actions for memory organizations; the challenge here is to make this work more transparent and rigorous and to allow broad participation. Capture of materials, particularly in a world of highly targeted and not easily visible channels, will be a third challenge at both technical and intellectual levels (though we are seeing some help now from platform providers). Finally, contextualization of fakes or suspected fakes is perhaps the greatest challenge, and the one that is least amenable to technological solutions.
Even in the academic context, Lynch's prongs have issues:
  1. Forensics will always remain one side of a co-evolutionary process. Machine learning merely accelerates both sides of it. And even during periods when forensics have the upper hand, they must be applied to have an effect. Unless they are built into browsers and enabled by default, even most scholars will not bother to apply them.
  2. Documentation of provenance and chain of custody may be a natural action for memory organizations, but in an era of shrinking real budgets it isn't a priority for funding as against, for example, subscriptions.
  3. Capture of open access academic content isn't a priority for funding either; capture of paywalled content is difficult and sure to become more so as Elsevier absorbs the whole of the academic workflow.
  4. Contextualization is resource intensive and, as Lynch points out, hard to automate. So, again, it will be hard to justify adequate funding.
In the research communication context, the bad incentives researchers, reviewers and publishers operate under are destroying trust in peer-reviewed research from the inside. I wrote in 2016's More Is Not Better:
Even if there is no actual misconduct, the bad incentives will still cause bad science to proliferate via natural selection, or the scientific equivalent of Gresham's Law that "bad money drives out good". The Economist's Incentive Malus, subtitled Poor scientific methods may be hereditary, is based on The natural selection of bad science by Paul E. Smaldino and Richard McElreath, which starts:

Poor research design and data analysis encourage false-positive findings. Such poor methods persist despite perennial calls for improvement, suggesting that they result from something more than just misunderstanding. The persistence of poor methods results partly from incentives that favour them, leading to the natural selection of bad science. This dynamic requires no conscious strategizing—no deliberate cheating nor loafing—by scientists, only that publication is a principal factor for career advancement.
Of Lynch's four prongs, only forensics in the form of replication and audit bots such as Statcheck can improve things at this level.
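The core of what an audit bot like Statcheck does is simple to sketch: recompute the p-value implied by a reported test statistic and flag results where the reported p disagrees by more than rounding could explain. Below is a minimal sketch for the z-test case only; Statcheck itself is an R package that parses APA-formatted statistics and handles t, F, chi-squared and correlation tests, and all function names here are mine:

```typescript
// Standard normal CDF via the Abramowitz-Stegun erf approximation
// (absolute error below ~1.5e-7), so no stats library is needed.
function normalCdf(x: number): number {
  const z = Math.abs(x) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * z);
  const poly = t * (0.254829592 + t * (-0.284496736 +
    t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-z * z);
  return x >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// Two-sided p-value implied by a reported z statistic.
function impliedP(z: number): number {
  return 2 * (1 - normalCdf(Math.abs(z)));
}

// Flag a reported (z, p) pair that rounding alone cannot reconcile.
function inconsistent(z: number, reportedP: number, tol = 0.005): boolean {
  return Math.abs(impliedP(z) - reportedP) > tol;
}

console.log(inconsistent(1.96, 0.05)); // false: consistent
console.log(inconsistent(1.96, 0.01)); // true: p does not match the statistic
```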

Outside the academic context, Lynch's prongs are wholly inadequate to the scale of the problem. It is one of those problems, so common in this era of massive inequality, whose costs are imposed on everyone while the benefits accrue to a small, affluent minority. Fixing them requires overwhelming coordinated political action, the very thing destroying trust in digital content is designed to prevent.

5 comments:

David. said...

Much more on how editing created Donald Trump in Patrick Radden Keefe's How Mark Burnett Resurrected Donald Trump as an Icon of American Success:

"Burnett has often boasted that, for each televised hour of “The Apprentice,” his crews shot as many as three hundred hours of footage. The real alchemy of reality television is the editing—sifting through a compost heap of clips and piecing together an absorbing story. Jonathon Braun, an editor who started working with Burnett on “Survivor” and then worked on the first six seasons of “The Apprentice,” told me, “You don’t make anything up. But you accentuate things that you see as themes.” He readily conceded how distorting this process can be."

David. said...

It is advertising, but PPCprotect has a good overview of click farms.

David. said...

Bad bots now make up 20 percent of web traffic by Charlie Osborne has an astonishing graphic of the small proportion of Web traffic that is actual humans (probably):

"Bots, in general, are estimated to make up roughly 37.9 percent of all Internet traffic. In 2018, one in five website requests -- 20.4 percent -- of traffic was generated by bad bots alone."

David. said...

In Let adware be treated as malware, Canuck boffins declare after breaking open Wajam ad injector, Thomas Claburn reports on research by Xavier de Carné de Carnavalet and Mohammad Mannan of Concordia University:

"Wajam, which injects ads into browser traffic, uses techniques employed by malware: browser process injection attacks (man-in-the-browser) seen in the Zeus banking Trojan, anti-analysis and evasion techniques, anti-detection features seen in rootkits, security policy downgrading and data leakage.

Also, over the past four years, the code has contained flaws that expose people using it to arbitrary content injection, man-in-the-middle (MITM) attacks, and remote code execution (RCE)."

The problem is that this adware isn't recognized as malware because:

"security companies remain reticent to apply the term malware too liberally because companies making dubious software have a history of suing. Recall in 2005 how spyware biz Zango, now defunct, sued Zone Labs for calling its software what it was."

David. said...

StackOverflow is trying but failing to protect its readers from tracking and malvertising, reports Tim Anderson for The Register:

"StackOverflow has previously stated that its policy "includes but is not limited to running only static, non-animated banner[s], keeping all ads relevant to software development, not participating in real-time bidding or selling our inventory to ad networks. We are not selling user data or targeting ads to you based on any personally identifiable user data."

An updated policy that covers collecting personal information, monitoring account activity and using profile information and job history to target advertising is here."

Despite this, a banner ad for Microsoft Azure was found to be:

"complete with JavaScript code intended to track users regardless of their privacy choices.

A user (Gregg Man from the Google Chrome developer team) noticed the issue because he had browser developer tools open and it appeared the site was trying to play audio, a common annoyance, with the debug message "The AudioContext was not allowed to start".

Deeper investigation though showed that the script was not in fact trying to start audio, but rather calling numerous browser APIs in order to create a unique "fingerprint" for the user's computer. This means that even with cookies blocked, the ad server can track the user."
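For the curious, here is a minimal sketch of this class of fingerprinting. It follows the commonly described audio-fingerprint technique rather than the actual ad script: render a fixed signal graph with OfflineAudioContext and reduce the output samples to a number that varies subtly from machine to machine, cookies or no cookies:

```typescript
// Minimal audio-fingerprint sketch (browser TypeScript). The rendered
// samples depend on the device's floating-point audio processing, so
// the reduced value acts as an identifier without any cookies.
async function audioFingerprint(): Promise<number> {
  // 1 channel, 5000 samples at 44.1 kHz: a tiny, deterministic render.
  const ctx = new OfflineAudioContext(1, 5000, 44100);

  const osc = ctx.createOscillator();
  osc.type = "triangle";
  osc.frequency.value = 10000;

  // The dynamics compressor is where most machine-to-machine
  // variation in the output samples comes from.
  const comp = ctx.createDynamicsCompressor();
  osc.connect(comp);
  comp.connect(ctx.destination);
  osc.start(0);

  const buffer = await ctx.startRendering();
  const samples = buffer.getChannelData(0);

  // Sum of absolute sample values: a crude but stable reduction.
  let sum = 0;
  for (let i = 0; i < samples.length; i++) sum += Math.abs(samples[i]);
  return sum;
}

// Usage in a browser console: audioFingerprint().then(fp => console.log(fp));
```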