Google Summer of Code

Submitted by: CodeVib

Category: Science and Technology

Date Submitted: 05/03/2016 07:51 PM

Interactive pipeline for retrieving, processing and visualising vast public gene expression datasets

__________________________________________________________________________

Name: Vibhor Sharma

Education: Software Engineering Undergraduate, Wuhan University, China

Nationality: Indian

Email: vibhormagotra@gmail.com

Mobile: 008613247115103

Mentor: Mr. Jüri Reimand

Project Description

Summary:

This project aims to create an R software package for retrieving datasets that are stored in public databases such as ArrayExpress and GEO but currently lie unused. These datasets contain information on differentially expressed genes that could provide valuable insights into cancer genomics and cancer prediction.

Project Aims:

This project aims at:

● Retrieving large collections of gene expression data efficiently.

● Carrying out preprocessing and quality control on the datasets, which is of utmost importance.

● Visualising the datasets through a web interface so that they can be browsed interactively.

How the project will fulfill the aims:

● Datasets are available for different platforms and organisms, so each of these datasets needs to be processed separately. To retrieve the datasets we could use packages such as “GEOquery” to query the databases online and download the necessary data. We can use GEO DataSets IDs (GDSxxx, where xxx represents some number) to download those datasets, or we could download the simpler tab-delimited text files, such as GSE series matrix files, as ExpressionSet objects by using the respective GSExxxx IDs. The user could therefore be asked to input GSE IDs or GDS IDs to download the respective datasets. After getting the GSE series matrix files, we can then check whether the expression values are already on a log2 scale. If not then we must ...
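The retrieval and log2 check described above could be sketched in R roughly as follows. This is only an illustration under stated assumptions, not the proposal's implementation: `fetch_expression` and `looks_log2` are hypothetical helper names, and the cutoff of 100 on the 99th percentile is an assumed heuristic. The GEOquery calls (`getGEO`, `GDS2eSet`) and `Biobase::exprs` are existing Bioconductor functions.

```r
# Heuristic check: TRUE when the values already look log2-scaled.
# Assumption: log2 microarray intensities stay far below 100, while raw
# intensities typically reach the thousands. The cutoff is illustrative.
looks_log2 <- function(expr, cutoff = 100) {
  vals <- as.numeric(expr)
  vals <- vals[is.finite(vals)]
  if (length(vals) == 0) stop("expression matrix has no finite values")
  q99 <- unname(quantile(vals, 0.99))
  q99 <= cutoff
}

# Hypothetical helper: download one GEO accession and return a list of
# expression matrices. Requires the Bioconductor packages GEOquery and Biobase.
fetch_expression <- function(accession) {
  if (!requireNamespace("GEOquery", quietly = TRUE) ||
      !requireNamespace("Biobase", quietly = TRUE)) {
    stop("GEOquery and Biobase (Bioconductor) must be installed")
  }
  if (grepl("^GSE[0-9]+$", accession)) {
    # A series may span several platforms, so getGEO() returns a list
    # with one ExpressionSet per series matrix file.
    esets <- GEOquery::getGEO(accession, GSEMatrix = TRUE)
    lapply(esets, Biobase::exprs)
  } else if (grepl("^GDS[0-9]+$", accession)) {
    # A curated GEO DataSet is converted to an ExpressionSet explicitly.
    gds <- GEOquery::getGEO(accession)
    eset <- GEOquery::GDS2eSet(gds, do.log2 = FALSE)
    list(Biobase::exprs(eset))
  } else {
    stop("expected a GSE or GDS accession, got: ", accession)
  }
}
```

With network access and the packages installed, a call such as `mats <- fetch_expression("GSExxxx")` with a real accession in place of the placeholder, followed by `sapply(mats, looks_log2)`, would report which matrices appear to be on a log2 scale.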
