Channel: Research Data @Essex

Emerging technical solutions for local and cloud based research data management


Research groups and university departments need practical solutions for better managing their data. Storage and access are particular problems, as highlighted by the RDE project’s completed (and soon-to-be-published) consultation work at the university. Almost all the researchers we interviewed expressed concern over how their data was stored, and felt they couldn’t collaborate either internally or externally as easily as they would like. What follows is a review of some possible approaches to these problems, considering factors such as data security, sharing potential and ease of setup. Here we are looking specifically at the stage between data collection and repository, when data needs to be managed, backed up and kept secure while allowing for collaboration. There’s been quite a bit of discussion about various potential solutions among the research data management community, so this is a kind of informal synthesis that focuses on three of the most significant.

I’ll first provide a brief bit of background on the state of play at the University of Essex. Researchers are currently given the option (i.e. not forced) of enlisting computer services to provide and maintain storage for their research. We found that most choose not to do this, partially through a lack of awareness that the option exists, but also to maintain control and keep things simple. As a result there are divergent standards of storage across the institution. There are no specific tools for data management provided by the university, and IT services operate a Microsoft Active Directory service through which users interact with the central servers. This provides basic access control and security for all computers on the network and automates a certain amount of maintenance work (e.g. software updates).

DataStage

Official URL: http://www.dataflow.ox.ac.uk/index.php/about/about-datastage

DataStage is a new piece of software developed by Oxford's DataFlow team, targeted squarely at research groups and facilitating the integration of local file management and repository deposit. As well as acting as a tailored accompaniment to their repository package, DataBank, DataStage is offered in standalone form with the promise of cross-repository platform compatibility.

Available as beta software only at present, DataStage is still in active development, so it’s hard to pass any kind of judgement at this stage, but we will make a few comments. Foremost, it's great to see something built from the ground up to fit research data management (RDM) processes. There are two key elements: a simple mapped drive method for adding and organising files (on the user’s machine), and a web interface for adding metadata and packaging for repository submission. This seems a logical way to do things, and would integrate well with existing research workflows. Tailoring of the web interface to different research groups is possible, but would require a fair amount of programming effort at this stage. DataStage is also rather limited by SWORD functionality at the moment, in that the transfer protocol is not able to adequately handle complex data collections. There’s a lengthy exposition of data-specific challenges on the SWORD project blog which is worth a read if you’re following the progress of this tool. We’re really looking forward to the incoming ‘data version’.
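To make the "add metadata and package for submission" step a little more concrete, here is a minimal sketch in Python. It is not DataStage's code and does not follow any SWORD packaging format: the function names and the manifest.json layout are assumptions made purely for illustration. It simply bundles a set of files with descriptive metadata and per-file checksums into a single zip archive, which is roughly the kind of package a repository deposit needs.

```python
import hashlib
import io
import json
import zipfile


def package_collection(files, metadata):
    """Bundle files and descriptive metadata into one zip archive (as bytes).

    `files` maps a relative path to its content as bytes; `metadata` is any
    JSON-serialisable dict. The manifest layout is hypothetical, not a
    DataStage or SWORD format.
    """
    manifest = {
        "metadata": metadata,
        # A checksum per file lets the receiving end verify the transfer.
        "files": {
            path: hashlib.sha256(content).hexdigest()
            for path, content in sorted(files.items())
        },
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
        for path, content in sorted(files.items()):
            archive.writestr("data/" + path, content)
    return buffer.getvalue()


def verify_package(package_bytes):
    """Return True if every file in the archive matches its manifest checksum."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        manifest = json.loads(archive.read("manifest.json"))
        return all(
            hashlib.sha256(archive.read("data/" + path)).hexdigest() == digest
            for path, digest in manifest["files"].items()
        )
```

A flat manifest like this copes with a handful of files; the difficulty noted above starts when a collection has nested structure, versions or very large files, which a single one-shot package handles poorly.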

Despite the above challenges, no other tool yet comes close to DataStage as a solution for web access and external collaboration in an academic context. DataStage remains in beta, and while it is perhaps not yet feasible for easy deployment, we are excited to see where it goes next.

SharePoint

Official URL: http://sharepoint.microsoft.com/

SharePoint has traditionally been an enterprise-oriented file management tool, but there are a few specific cases of SharePoint for RDM that we have come across. The best comes from Southampton’s DataPool project, who have been testing it out as a data management environment that could even be integrated with a repository, à la DataStage. Check out Dorothy Byatt’s screen captures here. This looks impressive, offering an all-encompassing interface for management of the research process. SharePoint has the potential to occupy a similar niche to DataStage, but will perhaps be viewed more favourably by university IT services given existing technical knowledge. This is further aided by the VRE research-sharing toolkits Microsoft offer as plugins for SharePoint.

At the University of Essex, the IT department wishes to focus SharePoint development on corporate systems, and is exploring cloud-based alternatives for data (see the next section). Perhaps Southampton’s work, if published, could persuade them otherwise? If you have any experience with SharePoint for data management that we haven’t covered here, we'd love to hear from you! We haven’t found much in the way of ‘living’ examples yet.

'Cloud' solutions

While acknowledging that ‘cloud’ is more a buzzword than any specific architecture, this is a topic that researchers have asked us about and it merits some discussion. We suspect that popular ‘cloud’-based services such as DropBox and Google Drive are used more widely than reported, and their uptake is likely to increase as their benefits become more widely known. While such storage methods have great potential and incredible convenience for the mobile researcher, potentially significant issues over data security should be very carefully considered. The section on cloud storage in the Information Security Guide: Effective Practices and Solutions for Higher Education has the following advice on DropBox:

“Use of cloud data storage solutions such as DropBox should typically be avoided for storage of high risk institutional information. That is, a file that contains private or sensitive information, information that is covered by federal regulations or that has a very high intellectual property value to your institution.”

In addition to offering no absolute guarantee as to the security of your data, these services also require agreement to terms and conditions which may not be desirable. Academics should therefore be discouraged from using such services for research, but if this argument is to hold any weight there needs to be an alternative with the same convenience and flexibility. Fortunately, the JANET Brokerage provide a framework and support hub for securing cloud services for academia. This might include assessment of service security through a rigorous audit process. The server-side aspect, then, seems catered for. A novel client will have to be developed, however, if there is to be a similar level of workflow integration to e.g. DropBox, and I have yet to hear of any active developments in this area.

An alternative is to operate an internally hosted cloud service such as ownCloud, which involves no loss of ownership or control of data. The Orbital project has done a great review of the ownCloud service, and concludes that it could be ideal for a research environment. The number of cloud options is growing rapidly and I know other universities have their own alternatives, e.g. Exeter working with ATMOS. It will be interesting to see in time which are adopted more widely in academia.

Encryption

Encryption is essential for those working with sensitive data, and is not necessarily built into any of the solutions discussed above. One solution recommended by the UK Data Archive, which we have used to transfer research data during the RD@E project, is the free and open-source TrueCrypt. The encryption software packages the files in a locked archive that requires a key file and passphrase to unlock. We found it fairly easy to use as total newbies. More extensive guidance on encryption methods can be found on the UK Data Archive website.
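For anyone wondering why an archive would need both a key file and a passphrase, the rough idea can be sketched in a few lines of Python. This is not TrueCrypt's actual scheme: the function name, the way the two secrets are combined and the iteration count are all assumptions chosen for illustration. The point is only that the encryption key is derived from both inputs, so someone who obtains just one of them cannot reconstruct it.

```python
import hashlib


def derive_key(passphrase, keyfile_bytes, salt, iterations=200_000):
    """Derive a 32-byte key from a passphrase AND the contents of a key file.

    Illustrative only (hypothetical scheme, not TrueCrypt's): the key file is
    hashed, joined with the passphrase, and stretched with PBKDF2-HMAC-SHA256.
    """
    keyfile_digest = hashlib.sha256(keyfile_bytes).digest()
    secret = keyfile_digest + passphrase.encode("utf-8")
    # PBKDF2 makes each guess deliberately slow, which hampers brute forcing.
    return hashlib.pbkdf2_hmac("sha256", secret, salt, iterations, dklen=32)
```

In practice the salt would be random and stored alongside the archive, and the derived key would be handed to a well-reviewed cipher implementation rather than anything home-made.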

Concluding thoughts

It should be clear from this review that there is no technical panacea for managing research data at this moment in time. This is still an emerging area, but the pressure is on to find practical solutions and development is likely to accelerate. For now institutions are presented with two options: either develop bespoke tools (standards and best practice considered, of course), or attempt to manage a patchwork of existing tools as well as possible.

It will be interesting to see at the JISC Managing Research Data programme meeting later this month what other tools the community are working with. We will report right here on the blog.
