How Do Cloud Collaboration Platforms Handle Search: An Investigation

It’s impossible to overstate how much we take search capabilities for granted. When a cloud platform advertises its search technology, it can often feel like an apartment ad bragging that the place has walls. We assume that search functionality, and powerful search functionality at that, is a given: the bare minimum we can expect from a cloud collaboration platform.

It’s really no surprise that so many people feel this way. After all, many of us have spent more of our lives with access to search engines than without. Search is the ground floor of our understanding of the internet.

Still, search technology is vastly underrated, and many organizations still wrestle with it. For companies storing hundreds of thousands of files across tens of thousands of folders in cloud content collaboration platforms, the ability to find what a given user is looking for quickly and easily is essential. The alternative is scrolling through what seems like endless data, hoping to spot the name of the document or spreadsheet you need.

Cloud storage providers understand this, which is why they emphasize the power of their search capabilities. Because the ultimate goal of adopting a collaboration platform is easy, comprehensive access to your data, Box, Dropbox, SharePoint Online, and Google Drive have been working to give their customers increasingly effective ways to sort through their files and find exactly what they’re looking for.

We at Cloud FastPath are constantly interested in the new technologies and features that cloud collaboration platforms are developing. Recently, we’ve been looking at how cloud search capabilities are growing, how each platform handles them, and what the next step is likely to be.

As such, we put together an experiment, uploading five different files into Box, Dropbox, SharePoint Online, and Google Drive. Each file was specifically crafted to test the search engine on a unique type of search:

  • An image with text on it, to see if the image could be found by the text it contained;
  • A PDF with text, to see whether the engine would recognize the file by that text alone;
  • A Word document containing non-Roman scripts, to see how well the search engine could recognize those characters;
  • A spreadsheet containing a link to a file, to see whether the spreadsheet could be found by the link alone;
  • A Latin-1-encoded test file, to discover whether its contents would be immediately findable (other example file types are available on GitHub).
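Why would an encoding test matter? A Latin-1 file stores each accented character as a single byte, while most modern software assumes UTF-8, so a search indexer that guesses the wrong encoding can end up indexing garbled text. The short Python sketch below writes a Latin-1 file and decodes it both ways; the file name and sample sentence are hypothetical, not our actual test file.

```python
# Sketch: write a Latin-1 (ISO-8859-1) test file and decode it two ways.
# The file name and sample sentence are hypothetical, not our actual test file.
import tempfile
from pathlib import Path

SAMPLE = "Café crème brûlée à la façon française"


def write_latin1_file(directory: Path) -> Path:
    """Write SAMPLE as Latin-1 bytes (one byte per accented character)."""
    path = directory / "latin1_sample.txt"  # hypothetical file name
    path.write_bytes(SAMPLE.encode("latin-1"))
    return path


def decode_as(path: Path, encoding: str) -> str:
    """Decode the file's bytes; errors="replace" mimics a lenient indexer
    that swaps undecodable bytes for U+FFFD instead of failing."""
    return path.read_bytes().decode(encoding, errors="replace")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = write_latin1_file(Path(tmp))
        # Would a search for "Café" match the text the indexer stored?
        print("Café" in decode_as(path, "latin-1"))  # True
        print("Café" in decode_as(path, "utf-8"))    # False
```

Decoded as Latin-1, the text comes back intact; decoded as UTF-8 with replacement, every accented letter turns into the U+FFFD replacement character, so a query for "Café" would miss the file entirely.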

The Experiment

At the beginning of the experiment, we uploaded all five files to each of the four platforms. We then waited roughly ten minutes so that each platform could digest, process, and index the new files.

To craft the text for both the image and the PDF, we used WordGenerator.Net, reasoning that an intentionally random combination of words would make the file less likely to be mistaken for, or clustered with, other pre-existing files. Although we tried to control as many factors as possible, the four platforms were not identical in how much content and how many users they had. They were, however, very similar: none had more than 1 TB of content or more than 30 users.

The experiment was divided into two parts. In the first, we used the five files described above to see what types of information the search capabilities of Box, Dropbox, SharePoint Online, and Google Drive could locate. Because each file tested the search engines in a different way, this showed us where some platforms excelled and others fell short. While we did time each search to see how long a given platform took to locate what we were looking for, speed was of far less interest here.

We then performed a second test focused on how quickly each platform indexes new files. We uploaded a simple Word document and immediately searched for it, aiming to learn how long it takes for an uploaded file to become searchable. Viewed in tandem, the two tests were designed to examine both efficiency and how each platform actually handles, and diversifies, its search capabilities.
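The indexing test boils down to a simple loop: upload, then search repeatedly until the file shows up. Here is a minimal Python sketch of that loop. The upload and search functions are hypothetical placeholders for whatever client a given platform provides (this is not any real platform’s API), and a fake in-memory platform with an artificial indexing delay stands in so the sketch runs on its own.

```python
# Sketch of the indexing-speed test: upload a document, then poll search until
# it shows up. upload/search are hypothetical stand-ins for a platform client;
# FakePlatform simulates an index that lags uploads by a fixed delay.
import time
from typing import Callable, Dict, List, Optional, Tuple


def time_to_searchable(
    upload: Callable[[str, str], None],
    search: Callable[[str], List[str]],
    name: str,
    body: str,
    query: str,
    poll_interval: float = 0.5,
    timeout: float = 300.0,
) -> Optional[float]:
    """Seconds from upload until `query` returns `name`, or None on timeout."""
    start = time.monotonic()
    upload(name, body)
    while time.monotonic() - start < timeout:
        if name in search(query):
            return time.monotonic() - start
        time.sleep(poll_interval)
    return None


class FakePlatform:
    """Hypothetical platform whose search index lags uploads by `delay` seconds."""

    def __init__(self, delay: float) -> None:
        self.delay = delay
        self.files: Dict[str, Tuple[str, float]] = {}

    def upload(self, name: str, body: str) -> None:
        self.files[name] = (body, time.monotonic())

    def search(self, query: str) -> List[str]:
        now = time.monotonic()
        return [
            name
            for name, (body, uploaded_at) in self.files.items()
            if now - uploaded_at >= self.delay and query.lower() in body.lower()
        ]


if __name__ == "__main__":
    platform = FakePlatform(delay=1.0)
    elapsed = time_to_searchable(
        platform.upload, platform.search,
        name="index-test.docx", body="zephyr lantern quorum",
        query="lantern", poll_interval=0.1,
    )
    print(f"searchable after about {elapsed:.1f} s")
```

Because the loop only checks every poll_interval seconds, the measured time is an upper bound accurate to roughly one interval. That is plenty to separate a platform that indexes in seconds from one that takes minutes.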

The Results

Looking first at the indexing speed test, the disparities were stark. Google Drive and Dropbox Business both located a Word document by the text in its body in under five seconds (in truth, Google Drive seemed to manage it instantaneously), while SharePoint Online’s search took almost a minute and Box’s took several minutes. After learning that Box’s search functionality had been down during the initial experiment, we reran the indexing test several more times. Box did better, but it remained the slowest of the four platforms.

Still, the complications during the initial run were an important reminder of a major issue providers must address on the way to perfecting search: the possibility of the feature going down. This may seem obvious, but because search is a standalone feature, it can fail while the rest of the platform works normally, to the point where users might not know there’s a systemic problem until they receive a notification. A search outage can be quite disruptive for businesses and end users, and even something as simple as slow indexing can become immensely frustrating.

Platform            Indexing Time
Box                 >2 minutes
Dropbox             4 seconds
SharePoint Online   50 seconds
Google Drive        4 seconds


While the indexing results varied widely, the results of the other five tests were surprisingly similar. None of the platforms could locate either the image or the PDF by the text it contained, nor could any of them find the linked file. (Interestingly, Box did seem capable of finding other PDFs and links, which might suggest that its indexing problems hindered its ability to find certain uploads.) However, all four platforms quickly identified the non-Roman characters, and although their speed varied, all found the encoding test file in under 30 seconds.

Platform            Text in Image   Text in PDF   Foreign Characters in Word Doc   Link in Spreadsheet   Latin-1-encoded File
Box                 No              No            Yes                              No                    Yes
Dropbox             No              No            Yes                              No                    Yes
SharePoint Online   No              No            Yes                              No                    Yes
Google Drive        No              No            Yes                              No                    Yes


Advanced Search Features

After completing our tests, we also looked at the advanced search features each platform provides. Google Drive’s were by far the most detailed, but most platforms offered a healthy range of filters, including the file’s location (i.e., which folder, site, or drive), its type (docx, xlsx, csv, ppt, etc.), its owner, and when it was most recently accessed.
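Conceptually, each of these filters narrows the result set independently, and a file must pass every filter the user sets. A rough Python sketch of that idea follows; the FileRecord fields, the in_folder helper, and the sample files are hypothetical and not modeled on any platform’s actual metadata schema.

```python
# Sketch of combinable advanced-search filters (location, type, owner,
# last accessed). FileRecord, in_folder, and the sample files are hypothetical.
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional


@dataclass
class FileRecord:
    name: str
    folder: str
    owner: str
    last_accessed: datetime

    @property
    def extension(self) -> str:
        return self.name.rsplit(".", 1)[-1].lower() if "." in self.name else ""


def in_folder(path: str, folder: str) -> bool:
    """True if `path` is `folder` itself or anywhere beneath it."""
    folder = folder.rstrip("/")
    return path == folder or path.startswith(folder + "/")


def advanced_search(
    files: Iterable[FileRecord],
    folder: Optional[str] = None,
    file_type: Optional[str] = None,
    owner: Optional[str] = None,
    accessed_after: Optional[datetime] = None,
) -> List[FileRecord]:
    """Return files that pass every filter that was set; unset filters are ignored."""
    results = []
    for f in files:
        if folder is not None and not in_folder(f.folder, folder):
            continue
        if file_type is not None and f.extension != file_type.lower():
            continue
        if owner is not None and f.owner != owner:
            continue
        if accessed_after is not None and f.last_accessed < accessed_after:
            continue
        results.append(f)
    return results


if __name__ == "__main__":
    now = datetime(2018, 6, 1)  # fixed "today" so the example is repeatable
    files = [
        FileRecord("budget.xlsx", "/Finance/Q2", "alice", now - timedelta(days=2)),
        FileRecord("notes.docx", "/Finance", "bob", now - timedelta(days=40)),
        FileRecord("deck.pptx", "/Sales", "alice", now - timedelta(days=1)),
    ]
    hits = advanced_search(files, folder="/Finance", owner="alice",
                           accessed_after=now - timedelta(days=7))
    print([f.name for f in hits])  # ['budget.xlsx']
```

The folder check is path-aware, so restricting a search to "/Fin" does not accidentally match "/Finance".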

Advanced search features in Google Drive


How each platform structures its storage heavily shapes the search options it provides. As the following images show, SharePoint Online immediately gives users the option to select not only specific files, but also specific sites and events. While it retains the basic categories mentioned above, SharePoint Online clearly recognizes that users are comfortable navigating SharePoint’s architecture, and its search functionality is tailored to that.

SharePoint search interface

SharePoint search results


Box’s search options are comparatively pared down, but the platform gives users the choice between searching by a file’s content, location, and history, and searching by the metadata template applied to it. Like SharePoint Online, it also lets users restrict a search to a given folder or search their entire cloud ecosystem, simply by checking a box.

Box search interface


Dropbox is by far the most basic, with a single universal search bar. You can search for files by name or by content, but filters such as owner, creation date, or file type are not available. Despite the speed with which we saw Dropbox Business index files, these limited search criteria can make Dropbox folder structures harder to navigate quickly.

These variations in search capabilities largely reflect the ways businesses tend to use each platform, but all four have room to evolve and help users track down the files and folders they need more comprehensively. Extensive as Google Drive’s search options are, simply adding more categories is not the only way to enhance search technology. In fact, recent developments from vendors like Box suggest far more innovative and creative ways to expand the potential of search in cloud collaboration platforms.

The Future of Search

While it would be easy to dismiss the search capabilities of SharePoint Online, Box, Dropbox, and Google Drive because each stumbled over certain tests, it would also be a mistake. Cloud collaboration platforms are known for constant, relentless innovation, and if the gaps in performance demonstrate anything, it’s that search is a new and exciting frontier for cloud providers. It’s ultimately anybody’s game, and how providers seize this opportunity will likely have a major effect on their success.

New technologies and features already point to an evolution in search capabilities. Take Box Skills, which uses machine learning to identify specific items and words in an image, such as a tennis racket or a company slogan, and tag the image accordingly, adding a new level of detailed organization to the user experience.

On top of this, Google Drive and Dropbox Business employ machine-learning-backed features that automatically suggest files to users who are searching, based on a number of factors. Google’s version, Google Drive Quick Access, which launched for desktop at the tail end of 2016 after first being available on mobile devices, can recommend files based on who edits them, what time of day they are regularly accessed, how recently they were changed, and even which files seem most relevant to upcoming scheduled meetings.
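To make the idea of signal-based suggestions concrete, here is a toy Python scoring function that combines the kinds of signals listed above. The weights, field names, and formula are entirely our own assumptions for illustration; Google and Dropbox use machine-learned models whose details are not public, and this is not a reconstruction of them.

```python
# Toy sketch: rank file suggestions from usage signals of the kinds described
# above. Weights, fields, and formula are hypothetical; this is NOT how Google
# Drive Quick Access or Dropbox actually score files.
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Set

WEIGHTS = {"collaborator": 1.0, "usual_hour": 0.5, "recency": 2.0, "meeting": 1.5}


@dataclass
class Candidate:
    name: str
    editors: Set[str]
    usual_hours: Set[int]  # hours of the day when the user tends to open it
    last_modified: datetime
    keywords: Set[str] = field(default_factory=set)


def score(c: Candidate, contacts: Set[str], now: datetime, meeting_terms: Set[str]) -> float:
    s = WEIGHTS["collaborator"] * len(c.editors & contacts)  # edited by people you work with
    if now.hour in c.usual_hours:  # usually opened at this time of day
        s += WEIGHTS["usual_hour"]
    age_days = max((now - c.last_modified).total_seconds() / 86400, 0.0)
    s += WEIGHTS["recency"] / (1.0 + age_days)  # recent edits score higher
    s += WEIGHTS["meeting"] * len(c.keywords & meeting_terms)  # matches an upcoming meeting
    return s


def suggest(cands: List[Candidate], contacts: Set[str], now: datetime,
            meeting_terms: Set[str], k: int = 3) -> List[Candidate]:
    """Return the top-k candidates by score, highest first."""
    return sorted(cands, key=lambda c: score(c, contacts, now, meeting_terms), reverse=True)[:k]


if __name__ == "__main__":
    now = datetime(2018, 3, 5, 9)
    files = [
        Candidate("old-memo.docx", set(), {15}, datetime(2018, 1, 1)),
        Candidate("roadmap.docx", {"dana"}, {9}, datetime(2018, 3, 5, 8), {"roadmap"}),
    ]
    top = suggest(files, contacts={"dana"}, now=now, meeting_terms={"roadmap"})
    print([c.name for c in top])  # ['roadmap.docx', 'old-memo.docx']
```

A hand-weighted sum like this is easy to read but brittle; the appeal of machine learning here is that the weights can be learned from what users actually open.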

Tools like these show that the teams behind Dropbox, SharePoint Online, Google Drive, and Box are taking search, and its potential for growth, seriously. They are leading a nonstop push to expand the features available to both individual users and entire organizations.

As has been the case with nearly every other aspect of cloud collaboration, we firmly believe that search capability is in for some exciting and revolutionary expansions.