Exclusive: Kaspersky Lab plans Swiss data center to combat spying allegations – documents

MOSCOW/TORONTO (Reuters) – Moscow-based Kaspersky Lab plans to open a data center in Switzerland to address Western government concerns that Russia exploits its anti-virus software to spy on customers, according to internal documents seen by Reuters.

FILE PHOTO: The logo of Russia’s Kaspersky Lab is displayed at the company’s office in Moscow, Russia October 27, 2017. REUTERS/Maxim Shemetov/File Picture

Kaspersky is setting up the center in response to actions in the United States, Britain and Lithuania last year to stop using the company’s products, according to the documents, which were confirmed by a person with direct knowledge of the matter.

The action is the latest effort by Kaspersky, a global leader in anti-virus software, to parry accusations by the U.S. government and others that the company spies on customers at the behest of Russian intelligence. The U.S. last year ordered civilian government agencies to remove the Kaspersky software from their networks.

Kaspersky has strongly rejected the accusations and filed a lawsuit against the U.S. ban.

The U.S. allegations were the “trigger” for setting up the Swiss data center, said the person familiar with Kaspersky’s Switzerland plans, but not the only factor.

“The world is changing,” they said, speaking on condition of anonymity when discussing internal company business. “There is more balkanisation and protectionism.”

The person declined to provide further details on the new project, but added: “This is not just a PR stunt. We are really changing our R&D infrastructure.”

A Kaspersky spokeswoman declined to comment on the documents reviewed by Reuters.

In a statement, Kaspersky Lab said: “To further deliver on the promises of our Global Transparency Initiative, we are finalizing plans for the opening of the company’s first transparency center this year, which will be located in Europe.”

“We understand that during a time of geopolitical tension, mirrored by an increasingly complex cyber-threat landscape, people may have questions and we want to address them.”

Kaspersky Lab launched a campaign in October to dispel concerns about possible collusion with the Russian government by promising to let independent experts scrutinize its software for security vulnerabilities and “back doors” that governments could exploit to spy on its customers.

The company also said at the time that it would open “transparency centers” in Asia, Europe and the United States but did not provide details. The new Swiss facility is dubbed the Swiss Transparency Centre, according to the documents.


Work in Switzerland is due to begin “within weeks” and be completed by early 2020, said the person with knowledge of the matter.

The plans have been approved by Kaspersky Lab CEO and founder Eugene Kaspersky, who owns a majority of the privately held company, and will be announced publicly in the coming months, according to the source.

“Eugene is upset. He would rather spend the money elsewhere. But he knows this is necessary,” the person said.

It is possible the move could be derailed by the Russian security services, which might resist moving the data center outside their jurisdiction, people familiar with Kaspersky and its relations with the government said.

Western security officials said Russia’s Federal Security Service (FSB), the successor to the Soviet-era KGB, exerts influence over Kaspersky management decisions, though the company has repeatedly denied those allegations.

The Swiss center will collect and analyze files identified as suspicious on the computers of tens of millions of Kaspersky customers in the United States and European Union, according to the documents reviewed by Reuters. Data from other customers will continue to be sent to a Moscow data center for review and analysis.

Files would only be transmitted from Switzerland to Moscow in cases when anomalies are detected that require manual review, the person said, adding that about 99.6 percent of such samples do not currently undergo this process.

A third party will review the center’s operations to make sure that all requests for such files are properly signed, stored and available for review by outsiders including foreign governments, the person said.

Moving operations to Switzerland will address concerns about laws that enable Russian security services to monitor data transmissions inside Russia and force companies to assist law enforcement agencies, according to the documents describing the plan.

The company will also move to Switzerland the department that builds its anti-virus software using code written in Moscow, the documents showed.

Kaspersky has received “solid support” from the Swiss government, said the source, who did not identify specific officials who have endorsed the plan.

Reporting by Jack Stubbs in Moscow and Jim Finkle in Toronto; Editing by Jonathan Weber

'Socially responsible' investors reassess Facebook ownership

NEW YORK (Reuters) – With European and U.S. lawmakers calling for investigations into reports that Facebook user data was accessed by UK-based consultancy Cambridge Analytica to help President Donald Trump win the 2016 election, investors are asking even more questions about the social media company’s operations.

A Facebook sign is displayed at the Conservative Political Action Conference (CPAC) at National Harbor, Maryland, U.S., February 23, 2018. REUTERS/Joshua Roberts

An increasingly vocal base of investors who put their money where their values are had already started to sour on Facebook, one of the market’s tech darlings.

Facebook’s shares closed down nearly 7.0 percent on Monday, wiping nearly $40 billion off its market value as investors worried that potential legislation could damage the company’s advertising business.

Facebook Inc Chief Executive Mark Zuckerberg is facing calls from lawmakers to explain how the political consultancy gained improper access to data on 50 million Facebook users.

Cambridge Analytica strongly denied the media claims and said it had deleted all the Facebook data it obtained from a third-party application in 2014 after learning the information did not adhere to data protection rules.

“The lid is being opened on the black box of Facebook’s data practices, and the picture is not pretty,” said Frank Pasquale, a University of Maryland law professor who has written about Silicon Valley’s use of data.

The scrutiny presents a fresh threat to Facebook’s reputation, which is already under attack over Russia’s alleged use of Facebook tools to sway U.S. voters with divisive and false news posts before and after the 2016 election.

“We do have some concerns,” said Ron Bates, portfolio manager on the $131 million 1919 Socially Responsive Balanced Fund, a Facebook shareholder.

“The big issue of the day around customer incidents and data is something that has been discussed among ESG (environmental, social and corporate governance) investors for some time and has been a concern.”

Bates said he is encouraged by the fact that the company has acknowledged the privacy issues and is responding, and thinks it remains an appropriate investment for now.

Facebook said on Monday it had hired digital forensics firm Stroz Friedberg to carry out a comprehensive audit of Cambridge Analytica, which had agreed to comply and give the forensics firm complete access to its servers and systems.

“What would be a deal-breaker for us would be if we saw this recurring and we saw significant risk to the consumer around privacy,” said Bates.

More than $20 trillion globally was allocated to “responsible” investment strategies in 2016, a figure that grew by a quarter from two years earlier, according to the Global Sustainable Investment Alliance, an advocacy group.

New York City Comptroller Scott Stringer, who oversees $193 billion in city pension fund assets, said in a statement to Reuters on Monday that, “as investors in Facebook, we’re closely following what are very alarming reports.”

Sustainalytics BV, a widely used research service that rates companies on their ESG performance for investors, told Reuters on Monday it is reviewing its Facebook rating, which is currently “average.”

“We’re definitely taking a look at it to see if there should be some change,” said Matthew Barg, research manager at Sustainalytics.

“Their business model is so closely tied to having access to consumer data and building off that access. You want to see that they understand that and care about that.”

ESG investors had already expressed concerns about Facebook before media reports that Cambridge Analytica harvested the private data on Facebook users to develop techniques to support Trump’s presidential campaign.

Wall Street investors, including ESG funds, have ridden the tech sector to record highs in recent months, betting on further outsized returns from stocks including Facebook, Apple Inc and Google parent Alphabet Inc.

Jennifer Sireklove, director of responsible investing at Seattle-based Parametric, a money manager with $200 billion in assets, said an increasing number of ethics-focused investors were avoiding Facebook and other social media companies, even before the most recent reports about privacy breaches.

Parametric held a call with clients on Friday to discuss concerns about investing in social media companies overall, including Google.

“More investors are starting to question whether these companies are contributing to a fair and well-informed public marketplace, or are we becoming all the more fragmented because of the ways in which these companies are operating,” she said.

Reporting by Trevor Hunnicutt and David Randall; Additional reporting by Kate Duguid in New York and Noel Randewich in San Francisco; Editing by Jennifer Ablan and Clive McKeef

Lithium-Silicon Batteries Could Give Your Phone 30% More Power

A new battery technology could increase the power packed into phones, cars, and smartwatches by 30% or more within the next few years. The new lithium-silicon batteries, nearing production-ready status thanks to startups including Sila Technologies and Angstron Materials, will leapfrog marginal improvements in existing lithium-ion batteries.

Recent promises of breakthrough battery technology have often amounted to little, but veteran Wall Street Journal tech reporter Christopher Mims believes lithium-silicon is the real thing. So do BMW, Intel, and Qualcomm, all of which are backing the development of the new batteries.

Get Data Sheet, Fortune’s technology newsletter.

The core innovation is building anodes, one of the main components of any battery, primarily from silicon. Silicon anodes hold more power than today’s graphite-based versions, but are often delicate or short-lived in real-world applications. Sila Technologies has built prototypes that solve the problem by using silicon and graphene nanoparticles to make the technology more durable, and says its design can store 20% to 40% more energy than today’s lithium-ion batteries. Several startups are competing to build the best lithium-silicon batteries, though, and one — Enovix, backed by Intel and Qualcomm — says its approach could pack as much as 50% more energy into a smartphone.

One of the major battery suppliers for both Apple and Samsung is Amperex Technology, which has a strategic investment partnership with Sila. That could point to much more long-lasting mobile devices on the way. The new batteries, Amperex Chief Operating Officer Joe Kit Chu Lam told the Journal, will probably be announced in a consumer device within the next two years. BMW also says it aims to incorporate the technology in an electric car by 2023, increasing power capacity by 10% to 15% over lithium-ion batteries.

China Will Block Travel for Those With Bad ‘Social Credit’

Chinese authorities will begin revoking the travel privileges of citizens with low scores on the country’s so-called “social credit system,” which ranks Chinese citizens based on comprehensive monitoring of their behavior. Those who fall afoul of the system could be blocked from rail and air travel for up to a year.

China’s National Development and Reform Commission released announcements on Friday saying that the restrictions could be triggered by a broad range of offenses. According to Reuters, those include acts from spreading false information about terrorism to using expired tickets or smoking on trains.

Get Data Sheet, Fortune’s technology newsletter.

The Chinese government publicized its plans to create a social credit system in 2014. There is some evidence that the government’s system is entwined with China’s private credit scoring systems, such as Alibaba’s Zhima Credit, which tracks users of the AliPay smartphone payment system. It evaluates not only individuals’ financial history (which has proven problematic enough in the U.S.), but consumption patterns, education, and even social connections.

A Wired report last year found that a user with a low Zhima Credit score had to pay more to rent a bicycle, hotel room, or even an umbrella. Zhima Credit’s CEO has said, in an eerie prefiguring of the new travel restrictions, that the system “will ensure that the bad people in society don’t have a place to go, while good people can move freely and without obstruction.”

Though the policy has only now become public, Reuters says it may have come into effect earlier — in a press conference last year, an official said 6.15 million Chinese citizens had already been blocked from air travel for social misdeeds.

6 Signs You're About to Be Fired

No matter how hard you work, there’s a possibility you may someday be laid off or fired, often without much warning. However, after your boss has delivered the bad news, chances are you’ll be able to look back and think of a few warning signs.

But what if you could know in advance that the hammer was about to fall? Those who have been fired multiple times often report similar experiences in the hours, days, and even weeks before they were let go.

Here are a few signs that you may need to dust off your resume.

1. Your boss warns you.

Your boss likely won’t give you an exact date and time of your firing in advance, but many employees do get warnings. The first indication is likely your performance review, which will contain valuable insights into how your boss thinks you’re doing.

Beyond that, you may receive verbal or written warnings about certain behaviors that could put your job at risk. If you ignore those warnings and refuse to make changes, your supervisor may feel there’s no other choice but to terminate.

2. You commit fireable offenses.

Not every fired employee is guilty of an offense, but there are things you can do that will increase your risk. If you’re chronically late, for instance, you could end up on the chopping block.

In fact, in a 2017 CareerBuilder survey, a whopping 41 percent of employers said they’ve fired an employee for being late. You’ll also put a target on your back by having an affair with a coworker or client, blabbing about your company on social media, or behaving inappropriately.

3. The job is a bad fit.

When you landed the job, it may have been the right fit at the time. Or perhaps it was always a bad match, but you needed the money. Whatever the situation, if your job is no longer right for you, you may not be the only one noticing it.

Consider edging your way back into the job market by networking and keeping an eye out for opportunities that are a good fit. Otherwise, you’re not only risking termination, but you’re wasting time in a job that won’t further your career.

4. You’ve been ostracized.

It usually takes a while for employers to fire someone, especially if HR brings pressure to document everything to avoid legal issues. During that period, employees who know the termination is imminent tend to distance themselves from the person. You may notice people having difficulty making eye contact, or you may be shut out of important meetings. If you start to feel as though people are avoiding you, it might be time to get your resume ready.

5. Your boss’s behavior has changed.

In the months leading up to a termination, an employee often finds his or her boss has a sudden change in behavior. I’ve seen this run in extremes. At one job years ago, not too long before I was let go, my boss began clamping down on me, micromanaging my every move. I’ve also seen it where a soon-to-be-fired colleague found themselves completely abandoned by the boss. Either way, this type of sudden behavior change isn’t usually good news.

6. Your company has changed.

Layoffs and terminations often occur as a result of a company-wide change. It could be something as simple as losing a big client, which cuts into the business’s income. Mergers and acquisitions also prompt unexpected staff changes, sometimes impacting large groups of people at once.

It’s important to realize that not every company change will result in terminations. However, employers will usually expend a great deal of effort reassuring employees nothing will change, only to turn around and make changes soon after.

As a journalist and employee of television and radio stations, I saw this situation repeatedly because of the ever-changing media landscape and the layoffs that came with it over the years. You sometimes get a little too familiar with that feeling of dread that pops up before an expected layoff. The best remedy for this is to always keep your resume up to date.

Firings often catch people by surprise, even if there were warning signs. But if you begin to feel uncomfortable with your work situation, you can always meet with a recruiter or begin networking in your industry to make valuable connections. Once you are ready to begin looking for a job, you’ll be in a position to quickly move on to something else.

Bitcoin exchange reaches deal with Barclays for UK transactions

LONDON (Reuters) – One of the biggest bitcoin exchanges has struck a rare deal which will allow it to open a bank account with Britain’s Barclays, making it easier for UK customers of the exchange to buy and sell cryptocurrencies, the UK boss of the exchange said on Wednesday.

Workers are seen in at Barclays bank offices in the Canary Wharf financial district in London, Britain, November 17, 2017. Picture taken November 17, 2017. REUTERS/Toby Melville

Large global banks have been reluctant to do business with companies that handle bitcoin and other digital coins because of concerns they are used by criminals to launder money and that regulators will soon crack down on them.

San Francisco-based exchange Coinbase said its UK subsidiary was the first to be granted an e-money license by the UK’s financial watchdog, a precursor to establishing the banking relationship with Barclays.

The Barclays account will make transactions easier for British customers; previously, they had to transfer pounds into euros and go through an Estonian bank.

“Having domestic GBP payments with Barclays reduces the cost, improves the customer experience…and makes the transaction faster,” said Zeeshan Feroz, Coinbase’s UK CEO.

The UK is the largest market for Coinbase in Europe, and the exchange said its customer base in the region was growing at twice the rate of elsewhere.

A collection of Bitcoin (virtual currency) tokens are displayed in this picture illustration taken December 8, 2017. REUTERS/Benoit Tessier/Illustration

Feroz said that it took considerable time to get a UK bank on board, partly because Barclays needed to be sure that Coinbase had the right systems in place to prevent money laundering.

Regulators across the globe have warned that cryptocurrencies are used by criminals to launder money, and some exchanges have been shut down.

“It’s a completely brand new industry. There’s a lot of understanding and risk management that’s needed,” Feroz said.

Despite growing interest in both digital currencies and the technology behind them, some big lenders have limited their customers’ ability to buy cryptocurrencies, fearing a plunge in their value will leave customers unable to repay debts.

In February, British banks Lloyds and Virgin Money said they would ban credit card customers from buying cryptocurrencies, following the lead of JP Morgan and Citigroup.

Coinbase said it had also become the first crypto exchange to use Britain’s Faster Payments Scheme, a network used by the traditional financial industry.

Reporting by Tommy Wilkes and Emma Rumney; Editing by Elaine Hardcastle

Hybrid cloud file and object pushes the frontiers of storage

Public cloud services have been widely adopted by IT departments around the world. But it has become clear that hybrid solutions spanning on- and off-premises deployment are often superior, and they appear to be on the rise.

However, getting data in and out of the public cloud can be tricky from a performance and consistency point of view. So, could a new wave of distributed file systems and object stores hold the answer?

Hybrid cloud operations require the ability to move data between private and public datacentres. Without data mobility, public and private cloud are nothing more than two separate environments that can’t exploit the benefits of data and application portability.

Looking at the storage that underpins public and private cloud, there are potentially three options available.

Block storage, traditionally used for high-performance input/output (I/O), doesn’t offer practical mobility features. The technology is great on-premise, or across locations operated by the same organisation.

That’s because block access storage depends on the use of a file system above the block level to organise data and provide functionality. For example, snapshots and replication depend on the maintenance of strict consistency between data instances.

Meanwhile, object storage provides high scalability and ubiquitous access, but can lag in terms of data integrity and performance capabilities required by modern applications.

Last writer wins

There’s also no concept of object locking – it’s simply a case of last writer wins. This is great for relatively static content, but not practical for database applications or analytics that need to do partial content reads and updates.
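
As a rough, hypothetical sketch (plain Python, not any particular vendor's API), the snippet below shows why last-writer-wins is fine for static content but awkward for workloads that update data concurrently: with whole-object puts and no locking, the slower of two writers silently overwrites the other's change.

object_store = {}  # key -> whole object body, as in a simple object store

def put(key, body):
    object_store[key] = body          # whole-object replace: no lock, no merge

def get(key):
    return object_store[key]

# Two clients read the same object, append their own row, and write back.
put("report.csv", "row1\nrow2\n")
version_a = get("report.csv") + "row3-from-client-A\n"
version_b = get("report.csv") + "row3-from-client-B\n"
put("report.csv", version_a)
put("report.csv", version_b)          # last writer wins: client A's row is lost
print(get("report.csv"))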

But object storage is a method of choice for some distributed hybrid cloud storage environments. It can provide a single object/file environment across locations, with S3 almost a de facto standard for access between sites.

File storage sits between the two extremes. It offers high scalability, data integrity and security, and file systems provide locking that protects against concurrent updates, either locally or globally, depending on how lock management is implemented. Often, file system permissions integrate with existing credential management systems such as Active Directory.
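
By way of contrast, here is a minimal, hedged example of the locking a file system offers, using POSIX advisory locks via Python's fcntl module (Unix only). Real NAS platforms implement their own local or global lock managers, but the effect is similar: a second writer waits rather than clobbering the first.

import fcntl

with open("report.csv", "a") as f:
    fcntl.flock(f, fcntl.LOCK_EX)      # blocks until any other writer releases its lock
    f.write("row-from-this-writer\n")  # safe: no concurrent locked writer can interleave
    fcntl.flock(f, fcntl.LOCK_UN)      # release so the next writer can proceed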

File systems, like object storage, implement a single global name space that abstracts from the underlying hardware and provides consistency in accessing content, wherever it is located. Some object storage-based systems also provide file access via the network file system (NFS) and server message block (SMB) protocols.

In some ways, what we’re looking at here is a development of the parallel file system, or its key functionality, for hybrid cloud operations.

Distributed and parallel file systems have been on the market for years. Dell EMC is a market leader with its Isilon hardware platform, DDN offers a hardware solution called Gridscaler, and there is a range of other software solutions such as Lustre, Ceph and IBM’s Spectrum Scale (GPFS).

But these are not built for hybrid cloud operations. So, what do new solutions offer over the traditional suppliers?

Distributed file systems 2.0

The new wave of distributed file systems and object stores are built to operate in hybrid cloud environments. In other words, they are designed to work across private and public environments.

Key to this is support for public cloud and the capability to deploy a scale-out file/object cluster in the public cloud and span on/off-premise operations with a hybrid solution.

Native support for public cloud means much more than simply running a software instance in a cloud VM. Solutions need to be deployable with automation, understand the performance characteristics of storage in cloud instances and be lightweight and efficient to reduce costs as much as possible.

New distributed file systems in particular are designed to cover applications that require very low latency to operate efficiently. These include traditional databases, high-performance analytics, financial trading and general high-performance computing applications, such as life sciences and media/entertainment.

By providing data mobility, these new distributed file systems allow end users and IT organisations to take advantage of cheap compute in public cloud, while maintaining data consistency across geographic boundaries.

Supplier roundup

WekaIO was founded in 2013 and has spent almost five years developing a scale-out parallel file system solution called Matrix. Matrix is a POSIX-compliant file system that was specifically designed for NVMe storage.

As a scale-out storage offering, Matrix runs across a cluster of commodity storage servers or can be deployed in the public cloud and run on standard compute instances using local SSD block storage. It also claims hybrid operations are possible, with the ability to tier to public cloud services. WekaIO publishes latency figures as low as 200µs and I/O throughput of 20,000 to 50,000 IOPS per CPU core.

Elastifile was founded in 2014 and has a team with a range of successful storage product developments behind it, including XtremIO and XIV. The Elastifile Cloud File System (ECFS) is a software solution built to scale across thousands of compute nodes, offering file, block and object storage.

ECFS is designed to support heterogeneous environments, including public and private cloud environments under a single global name space. Today, this is achieved using a feature called CloudConnect, which bridges the gap between on-premise and cloud deployments.

Qumulo was founded in 2012 by a team that previously worked on developing the Isilon scale-out NAS platform. The Qumulo File Fabric (QF2) is a scale-out software solution that can be deployed on commodity hardware or in the public cloud.

Cross-platform capabilities are provided through the ability to replicate file shares between physical locations using a feature called Continuous Replication. Although primarily a software solution, QF2 is available as an appliance with a throughput of 4GBps per node (minimum four nodes), although no latency figures are quoted.

Object storage maker Cloudian announced an upgrade in January 2018 to its Hyperstore product which brings true hybrid cloud operations across Microsoft, Amazon and Google cloud environments with data portability between them. Cloudian is based on the Apache Cassandra open source distributed database.

It can come as storage software that customers deploy on commodity hardware, in cloud software format or in hardware appliance form. Hyperfile file access – which is Posix/Windows compliant – can also be deployed on-premise and in the cloud to provide file access.

Multi-cloud data controller

Another object storage specialist, Scality, will release a commercially supported version of its “multi-cloud data controller” Zenko at the end of March. The product promises customers hybrid cloud functionality: the ability to move, replicate, tier, migrate and search data across on-premise and private cloud locations and the public cloud, although it is not yet clear how seamless those operations will be.

Zenko is based on Scality’s 2016 launch of its S3 server, which provided S3 access to Scality Ring object storage. The key concept behind Zenko is to allow customers to mix and match Scality on-site storage with storage from different cloud providers, initially Amazon Web Services, Google Cloud Platform and Microsoft Azure.

Microsoft women filed 238 discrimination and harassment complaints

SAN FRANCISCO (Reuters) – Women at Microsoft Corp working in U.S.-based technical jobs filed 238 internal complaints about gender discrimination or sexual harassment between 2010 and 2016, according to court filings made public on Monday.

FILE PHOTO: The Microsoft logo is shown on the Microsoft Theatre in Los Angeles, California, U.S., June 13, 2017. REUTERS/Mike Blake/File Photo – RC177D20CF10

The figure was cited by plaintiffs suing Microsoft for systematically denying pay raises or promotions to women at the world’s largest software company. Microsoft denies it had any such policy.

The lawsuit, filed in Seattle federal court in 2015, is attracting wider attention after a series of powerful men have left or been fired from their jobs in entertainment, the media and politics for sexual misconduct.

Plaintiffs’ attorneys are pushing to proceed as a class action lawsuit, which could cover more than 8,000 women.

More details about Microsoft’s human resources practices were made public on Monday in legal filings submitted as part of that process.

The two sides are exchanging documents ahead of trial, which has not been scheduled.

Out of 118 gender discrimination complaints filed by women at Microsoft, only one was deemed “founded” by the company, according to the unsealed court filings.

Attorneys for the women described the number of complaints as “shocking” in the court filings, and said the response by Microsoft’s investigations team was “lackluster.”

Companies generally keep information about internal discrimination complaints private, making it unclear how the number of complaints at Microsoft compares to those at its competitors.

In a statement on Tuesday, Microsoft said it had a robust system to investigate concerns raised by its employees, and that it wanted them to speak up.

Microsoft budgets more than $55 million a year to promote diversity and inclusion, it said in court filings. The company had about 74,000 U.S. employees at the end of 2017.

Microsoft said the plaintiffs cannot cite one example of a pay or promotion problem in which Microsoft’s investigations team should have found a violation of company policy but did not.

U.S. District Judge James Robart has not yet ruled on the plaintiffs’ request for class action status.

A Reuters review of federal lawsuits filed between 2006 and 2016 revealed hundreds containing sexual harassment allegations where companies used common civil litigation tactics to keep potentially damning information under wraps.

Microsoft had argued that the number of women’s human resources complaints should be kept secret because publicizing the outcomes could deter employees from reporting future abuses.

A court-appointed official found that scenario “far too remote a competitive or business harm” to justify keeping the information sealed.

Reporting by Dan Levine; Additional reporting by Salvador Rodriguez; Editing by Bill Rigby, Edwina Gibbs and Bernadette Baum

Dropbox sees IPO price between $16 and $18 per share

(Reuters) – Data-sharing business Dropbox Inc (DBX.O) on Monday filed for an initial public offering of 36 million shares, giving the company a value of more than $7 billion at the higher end of the range.

The DropBox logo is seen in this illustration photo July 28, 2017. REUTERS/Thomas White/Illustration

Dropbox expects its debut price to be between $16 and $18 per share, the company said in a filing. (bit.ly/2FwgJJ2)

The San Francisco-based company, which started as a free service to share and store photos, music and other large files, competes with much larger technology firms such as Alphabet Inc’s (GOOGL.O) Google, Microsoft Corp (MSFT.O) and Amazon.com Inc (AMZN.O) as well as cloud-storage rival Box Inc (BOX.N).

In its regulatory filing with the Securities and Exchange Commission, Dropbox reported 2017 revenue of $1.11 billion, up 31 percent from $844.8 million a year earlier.

The Dropbox app is seen in this illustration photo October 16, 2017. REUTERS/Thomas White/Illustration

The company’s net loss narrowed to $111.7 million in 2017 from $210.2 million in 2016.

Dropbox, which has 11 million paying users across 180 countries, said that about half of its 2017 revenue came from customers outside the United States.

The IPO will be a key test of Dropbox’s worth after it was valued at almost $10 billion in a private fundraising round in 2014.

Goldman Sachs & Co, JPMorgan, Deutsche Bank Securities and BofA Merrill Lynch are the lead underwriters for the offering.

Reporting by Diptendu Lahiri in Bengaluru; Editing by Arun Koyyur

Can Machine Learning Find Meaning in a Mess of Genes?

“We don’t have much ground truth in biology.” According to Barbara Engelhardt, a computer scientist at Princeton University, that’s just one of the many challenges that researchers face when trying to prime traditional machine-learning methods to analyze genomic data. Techniques in artificial intelligence and machine learning are dramatically altering the landscape of biological research, but Engelhardt doesn’t think those “black box” approaches are enough to provide the insights necessary for understanding, diagnosing and treating disease. Instead, she’s been developing new statistical tools that search for expected biological patterns to map out the genome’s real but elusive “ground truth.”


Engelhardt likens the effort to detective work, as it involves combing through constellations of genetic variation, and even discarded data, for hidden gems. In research published last October, for example, she used one of her models to determine how mutations relate to the regulation of genes on other chromosomes (referred to as distal genes) in 44 human tissues. Among other findings, the results pointed to a potential genetic target for thyroid cancer therapies. Her work has similarly linked mutations and gene expression to specific features found in pathology images.

The applications of Engelhardt’s research extend beyond genomic studies. She built a different kind of machine-learning model, for instance, that makes recommendations to doctors about when to remove their patients from a ventilator and allow them to breathe on their own.

She hopes her statistical approaches will help clinicians catch certain conditions early, unpack their underlying mechanisms, and treat their causes rather than their symptoms. “We’re talking about solving diseases,” she said.

To this end, she works as a principal investigator with the Genotype-Tissue Expression (GTEx) Consortium, an international research collaboration studying how gene regulation, expression and variation contribute to both healthy phenotypes and disease. Right now, she’s particularly interested in working on neuropsychiatric and neurodegenerative diseases, which are difficult to diagnose and treat.

Quanta Magazine recently spoke with Engelhardt about the shortcomings of black-box machine learning when applied to biological data, the methods she’s developed to address those shortcomings, and the need to sift through “noise” in the data to uncover interesting information. The interview has been condensed and edited for clarity.

What motivated you to focus your machine-learning work on questions in biology?

I’ve always been excited about statistics and machine learning. In graduate school, my adviser, Michael Jordan [at the University of California, Berkeley], said something to the effect of: “You can’t just develop these methods in a vacuum. You need to think about some motivating applications.” I very quickly turned to biology, and ever since, most of the questions that drive my research are not statistical, but rather biological: understanding the genetics and underlying mechanisms of disease, hopefully leading to better diagnostics and therapeutics. But when I think about the field I am in—what papers I read, conferences I attend, classes I teach and students I mentor—my academic focus is on machine learning and applied statistics.

We’ve been finding many associations between genomic markers and disease risk, but except in a few cases, those associations are not predictive and have not allowed us to understand how to diagnose, target and treat diseases. A genetic marker associated with disease risk is often not the true causal marker of the disease—one disease can have many possible genetic causes, and a complex disease might be caused by many, many genetic markers possibly interacting with the environment. These are all challenges that someone with a background in statistical genetics and machine learning, working together with wet-lab scientists and medical doctors, can begin to address and solve. Which would mean we could actually treat genetic diseases—their causes, not just their symptoms.

You’ve spoken before about how traditional statistical approaches won’t suffice for applications in genomics and health care. Why not?

First, because of a lack of interpretability. In machine learning, we often use “black-box” methods—[classification algorithms called] random forests, or deeper learning approaches. But those don’t really allow us to “open” the box, to understand which genes are differentially regulated in particular cell types or which mutations lead to a higher risk of a disease. I’m interested in understanding what’s going on biologically. I can’t just have something that gives an answer without explaining why.

The goal of these methods is often prediction, but given a person’s genotype, it is not particularly useful to estimate the probability that they’ll get Type 2 diabetes. I want to know how they’re going to get Type 2 diabetes: which mutation causes the dysregulation of which gene to lead to the development of the condition. Prediction is not sufficient for the questions I’m asking.

A second reason has to do with sample size. Most of the driving applications of statistics assume that you’re working with a large and growing number of data samples—say, the number of Netflix users or emails coming into your inbox—with a limited number of features or observations that have interesting structure. But when it comes to biomedical data, we don’t have that at all. Instead, we have a limited number of patients in the hospital, a limited number of genotypes we can sequence—but a gigantic set of features or observations for any one person, including all the mutations in their genome. Consequently, many theoretical and applied approaches from statistics can’t be used for genomic data.

What makes the genomic data so challenging to analyze?

The most important signals in biomedical data are often incredibly small and completely swamped by technical noise. It’s not just about how you model the real, biological signal—the questions you’re trying to ask about the data—but also how you model that in the presence of this incredibly heavy-handed noise that’s driven by things you don’t care about, like which population the individuals came from or which technician ran the samples in the lab. You have to get rid of that noise carefully. And we often have a lot of questions that we would like to answer using the data, and we need to run an incredibly large number of statistical tests—literally trillions—to figure out the answers. For example, to identify an association between a mutation in a genome and some trait of interest, where that trait might be the expression levels of a specific gene in a tissue. So how can we develop rigorous, robust testing mechanisms where the signals are really, really small and sometimes very hard to distinguish from noise? How do we correct for all this structure and noise that we know is going to exist?
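
As a hedged illustration of that kind of noise correction (a generic approach, not the GTEx pipeline itself), one common first pass is to remove the dominant principal components of the expression matrix, which tend to capture batch, population and other technical effects, before testing for the small biological signals. The matrix below is hypothetical.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2000))        # stand-in for a samples x genes expression matrix
X = X - X.mean(axis=0)                  # centre each gene

pca = PCA(n_components=10)
scores = pca.fit_transform(X)           # dominant axes of variation, often mostly technical
X_clean = X - scores @ pca.components_  # residual expression used for downstream association tests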

So what approach do we need to take instead?

My group relies heavily on what we call sparse latent factor models, which can sound quite mathematically complicated. The fundamental idea is that these models partition all the variation we observed in the samples, with respect to only a very small number of features. One of these partitions might include 10 genes, for example, or 20 mutations. And then as a scientist, I can look at those 10 genes and figure out what they have in common, determine what this given partition represents in terms of a biological signal that affects sample variance.

So I think of it as a two-step process: First, build a model that separates all the sources of variation as carefully as possible. Then go in as a scientist to understand what all those partitions represent in terms of a biological signal. After this, we can validate those conclusions in other data sets and think about what else we know about these samples (for instance, whether everyone of the same age is included in one of these partitions).
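
Here is a minimal sketch of that two-step process using a generic sparse decomposition from scikit-learn rather than Engelhardt's own models; the expression matrix, gene names and component count are all made up for illustration.

import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(0)
n_samples, n_genes = 200, 1000
X = rng.normal(size=(n_samples, n_genes))           # stand-in for a samples x genes expression matrix
gene_names = [f"GENE_{i}" for i in range(n_genes)]  # hypothetical identifiers

# Step 1: partition the observed variation into a small number of sparse factors.
model = SparsePCA(n_components=10, alpha=1.0, random_state=0)
scores = model.fit_transform(X)                     # per-sample factor activations

# Step 2: inspect each factor "by hand" -- which few genes load on it, and what do they share?
for k, loadings in enumerate(model.components_):
    active = np.flatnonzero(loadings)
    print(f"factor {k}: {active.size} genes, e.g. {[gene_names[j] for j in active[:10]]}")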

When you say “go in as a scientist,” what do you mean?

I’m trying to find particular biological patterns, so I build these models with a lot of structure and include a lot about what kinds of signals I’m expecting. I establish a scaffold, a set of parameters that will tell me what the data say, and what patterns may or may not be there. The model itself has only a certain amount of expressivity, so I’ll only be able to find certain types of patterns. From what I’ve seen, existing general models don’t do a great job of finding signals we can interpret biologically: They often just determine the biggest influencers of variance in the data, as opposed to the most biologically impactful sources of variance. The scaffold I build instead represents a very structured, very complex family of possible patterns to describe the data. The data then fill in that scaffold to tell me which parts of that structure are represented and which are not.

So instead of using general models, my group and I carefully look at the data, try to understand what’s going on from the biological perspective, and tailor our models based on what types of patterns we see.

How does the latent factor model work in practice?

We applied one of these latent factor models to pathology images [pictures of tissue slices under a microscope], which are often used to diagnose cancer. For every image, we also had data about the set of genes expressed in those tissues. We wanted to see how the images and the corresponding gene expression levels were coordinated.

We developed a set of features describing each of the images, using a deep-learning method to identify not just pixel-level values but also patterns in the image. We pulled out over a thousand features from each image, give or take, and then applied a latent factor model and found some pretty exciting things.

For example, we found sets of genes and features in one of these partitions that described the presence of immune cells in the brain. You don’t necessarily see these cells on the pathology images, but when we looked at our model, we saw a component there that represented only genes and features associated with immune cells, not brain cells. As far as I know, no one’s seen this kind of signal before. But it becomes incredibly clear when we look at these latent factor components.

You’ve worked with dozens of human tissue types to unpack how specific genetic variations help shape complex traits. What insights have your methods provided?

We had 44 tissues, donated from 449 human cadavers, and their genotypes (sequences of their whole genomes). We wanted to understand more about the differences in how those genotypes expressed their genes in all those tissues, so we did more than 3 trillion tests, one by one, comparing every mutation in the genome with every gene expressed in each tissue. (Running that many tests on the computing clusters we’re using now takes about two weeks; when we move this iteration of GTEx to the cloud as planned, we expect it to take around two hours.) We were trying to figure out whether the [mutant] genotype was driving distal gene expression. In other words, we were looking for mutations that weren’t located on the same chromosome as the genes they were regulating. We didn’t find very much: a little over 600 of these distal associations. Their signals were very low.
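
At toy scale, and leaving out the covariates and noise correction a real GTEx analysis includes, the pairwise scan described above amounts to regressing each gene's expression on each variant's genotype and recording the p-value. A hedged sketch with simulated data:

import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(1)
n_samples, n_variants, n_genes = 400, 50, 30                                 # tiny, hypothetical scale
genotypes = rng.integers(0, 3, size=(n_samples, n_variants)).astype(float)   # 0/1/2 allele counts
expression = rng.normal(size=(n_samples, n_genes))

results = []
for v in range(n_variants):
    for g in range(n_genes):
        fit = linregress(genotypes[:, v], expression[:, g])
        results.append((v, g, fit.slope, fit.pvalue))

# Smallest p-values first; a real study applies a far more stringent multiple-testing threshold.
for v, g, slope, p in sorted(results, key=lambda r: r[3])[:5]:
    print(f"variant {v} ~ gene {g}: slope={slope:+.3f}, p={p:.2e}")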

But one of the signals was strong: an exciting thyroid association, in which a mutation appeared to distally regulate two different genes. We asked ourselves: How is this mutation affecting expression levels in a completely different part of the genome? In collaboration with Alexis Battle’s lab at Johns Hopkins University, we looked near the mutation on the genome and found a gene called FOXE1, for a transcription factor that regulates the transcription of genes all over the genome. The FOXE1 gene is only expressed in thyroid tissues, which was interesting. But we saw no association between the mutant genotype and the expression levels of FOXE1. So we had to look at the components of the original signal we’d removed before—everything that had appeared to be a technical artifact—to see if we could detect the effects of the FOXE1 protein broadly on the genome.

We found a huge impact of FOXE1 in the technical artifacts we’d removed. FOXE1, it seems, regulates a large number of genes only in the thyroid. Its variation is driven by the mutant genotype we found. And that genotype is also associated with thyroid cancer risk. We went back to the thyroid cancer samples—we had about 500 from the Cancer Genome Atlas—and replicated the distal association signal. These things tell a compelling story, but we wouldn’t have learned it unless we had tried to understand the signal that we’d removed.

What are the implications of such an association?

Now we have a particular mechanism for the development of thyroid cancer and the dysregulation of thyroid cells. If FOXE1 is a druggable target—if we can go back and think about designing drugs to enhance or suppress the expression of FOXE1—then we can hope to prevent people at high thyroid cancer risk from getting it, or to treat people with thyroid cancer more effectively.

The signal from broad-effect transcription factors like FOXE1 actually looks a lot like the effects we typically remove as part of the noise: population structure, or the batches the samples were run in, or the effects of age or sex. A lot of those technical influences are going to affect approximately similar numbers of genes—around 10 percent—in a similar way. That’s why we usually remove signals that have that pattern. In this case, though, we had to understand the domain we were working in. As scientists, we looked through all the signals we’d gotten rid of, and this allowed us to find the effects of FOXE1 showing up so strongly in there. It involved manual labor and insights from a biological background, but we’re thinking about how to develop methods to do it in a more automated way.

So with traditional modeling techniques, we’re missing a lot of real biological effects because they look too similar to noise?

Yes. There are a ton of cases in which the interesting pattern and the noise look similar. Take these distal effects: Pretty much all of them, if they are broad effects, are going to look like the noise signal we systematically get rid of. It’s methodologically challenging. We have to think carefully about how to characterize when a signal is biologically relevant or just noise, and how to distinguish the two. My group is working fairly aggressively on figuring that out.

Why are those relationships so difficult to map, and why look for them?

There are so many tests we have to do; the threshold for the statistical significance of a discovery has to be really, really high. That creates problems for finding these signals, which are often incredibly small; if our threshold is that high, we’re going to miss a lot of them. And biologically, it’s not clear that there are many of these really broad-effect distal signals. You can imagine that natural selection would eliminate the kinds of mutations that affect 10 percent of genes—that we wouldn’t want that kind of variability in the population for so many genes.
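
For a sense of scale, a simple Bonferroni correction over the roughly 3 trillion tests mentioned earlier pushes the per-test p-value threshold down to about 1.7e-14:

n_tests = 3e12                  # "more than 3 trillion tests"
alpha = 0.05                    # family-wise error rate to control
threshold = alpha / n_tests     # Bonferroni-corrected per-test p-value cutoff
print(f"per-test threshold: {threshold:.1e}")   # ~1.7e-14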

But I think there’s no doubt that these distal associations play an enormous role in disease, and that they may be considered as druggable targets. Understanding their role broadly is incredibly important for human health.

Original story reprinted with permission from Quanta Magazine, an editorially independent publication of the Simons Foundation whose mission is to enhance public understanding of science by covering research developments and trends in mathematics and the physical and life sciences.