BioHPC
Computational Biology Application Suite for High Performance Computing
BioHPC: Architecture
The system consists of a web server running the interface (ASP.NET, C#),
a Microsoft SQL Server database (accessed through ADO.NET), compute
clusters running Microsoft Windows, an FTP server and a file server. Two
local compute cluster schedulers are supported (CCS and Cornell vsched);
remote clusters can be used via JSDL. An example of a working JSDL
connection can be seen in BioHPC @ CBS, which is linked to the Athena
cluster at Microsoft headquarters.
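For remote clusters, a job is described in a JSDL document. As a rough illustration of what such a description looks like, the sketch below builds a minimal JSDL 1.0 job definition in Python. The job name, executable and arguments are hypothetical, and the POSIXApplication extension is used only because it is the simplest standard application element; the source does not say which JSDL elements BioHPC actually sends.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the JSDL 1.0 specification.
JSDL = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"


def build_jsdl(job_name, executable, arguments):
    """Return a minimal JSDL job definition as an XML string."""
    ET.register_namespace("jsdl", JSDL)
    ET.register_namespace("jsdl-posix", POSIX)

    job_def = ET.Element(f"{{{JSDL}}}JobDefinition")
    job_desc = ET.SubElement(job_def, f"{{{JSDL}}}JobDescription")

    ident = ET.SubElement(job_desc, f"{{{JSDL}}}JobIdentification")
    ET.SubElement(ident, f"{{{JSDL}}}JobName").text = job_name

    app = ET.SubElement(job_desc, f"{{{JSDL}}}Application")
    posix_app = ET.SubElement(app, f"{{{POSIX}}}POSIXApplication")
    ET.SubElement(posix_app, f"{{{POSIX}}}Executable").text = executable
    for arg in arguments:
        ET.SubElement(posix_app, f"{{{POSIX}}}Argument").text = arg

    return ET.tostring(job_def, encoding="unicode")


# Hypothetical job: the names and arguments are for illustration only.
jsdl_xml = build_jsdl("example-job", "myapp.exe", ["--input", "data.fasta"])
print(jsdl_xml)
```

Because the document is plain XML, the same description can in principle be handed to any scheduler front end that accepts JSDL, which is what makes it suitable for linking to remote clusters.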
Users can interact with their jobs and data through a web browser, FTP
or e-mail. Jobs are currently submitted through BioHPC active server web
pages, and users are provided with links for monitoring job progress.
When a job finishes, an e-mail with a link to the final results (by HTTP
or FTP) is sent. Users can cancel, stop and restart jobs through a set
of URLs provided at the time of submission. The interface also has a
built-in user management system which can limit software and/or database
access to specified users. Data access between users is restricted:
users can access only their own jobs and data, even when working as
guests. There is also an administrative interface for managing jobs,
users and applications, with automatic e-mail notification of possible
problems.
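The job controls and the access rule described above can be sketched as a small state model. This is an illustration only: the state names, the `Job` class and the `can_access` and `apply_action` helpers are hypothetical, and the source does not specify which transitions the real interface allows.

```python
from dataclasses import dataclass

# Hypothetical transitions; the actions mirror those offered to users.
TRANSITIONS = {
    ("queued", "cancel"): "cancelled",
    ("running", "cancel"): "cancelled",
    ("running", "stop"): "stopped",
    ("stopped", "restart"): "queued",
}


@dataclass
class Job:
    job_id: int
    owner: str
    state: str = "queued"


def can_access(user, job):
    """Users may only see and control their own jobs."""
    return user == job.owner


def apply_action(user, job, action):
    """Apply cancel/stop/restart if the user owns the job and the move is valid."""
    if not can_access(user, job):
        raise PermissionError(f"{user} does not own job {job.job_id}")
    key = (job.state, action)
    if key not in TRANSITIONS:
        raise ValueError(f"cannot {action} a job that is {job.state}")
    job.state = TRANSITIONS[key]
    return job.state


job = Job(job_id=1, owner="alice", state="running")
apply_action("alice", job, "stop")     # job is now stopped
apply_action("alice", job, "restart")  # job goes back to the queue
```

Checking ownership before every action is one simple way to enforce the rule that users, including guests, can reach only their own jobs.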
Compute clusters
Two MS Windows based systems are supported:
- MS Windows Compute Cluster Server 2003 is used on all x64
hardware platforms. MS MPI is used for parallel communication.
- MS Windows Server 2003 with the Cornell vsched scheduler is
used on all 32-bit (x86) hardware platforms. MPI Pro is used
for parallel communication.
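The pairing of platform, scheduler and MPI library above amounts to a small lookup table. A minimal sketch, assuming hypothetical platform keys `x64` and `x86` (the latter for the 32-bit systems) and a hypothetical `cluster_stack` helper:

```python
# Platform -> software stack, taken from the list above.
CLUSTER_STACKS = {
    "x64": {
        "os": "MS Windows Compute Cluster Server 2003",
        "scheduler": "CCS",
        "mpi": "MS MPI",
    },
    "x86": {  # 32-bit platforms
        "os": "MS Windows Server 2003",
        "scheduler": "Cornell vsched",
        "mpi": "MPI Pro",
    },
}


def cluster_stack(platform):
    """Return the OS, scheduler and MPI library for a hardware platform."""
    try:
        return CLUSTER_STACKS[platform]
    except KeyError:
        raise ValueError(f"unsupported platform: {platform}") from None


print(cluster_stack("x64")["scheduler"])
```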
Web server
- A web server is required, but it does not need to be a separate
machine. It can be installed on the head node.
- The interface is written in C# on the ASP.NET
platform.
- Web pages use JavaScript in the client browser
and ASP on the server side.
- It is fully compatible with all popular web
browsers supporting DOM and JavaScript.
- The CBSU web server runs MS Windows Server
2003.
- Windows SDK and .NET 3.0 are required.
File server
- A file server is required, but it does not need to be a separate
machine. It can be installed on the cluster head node or on the web
server.
- A network drive is used for simple data exchange and sharing between
nodes.
- The network drive is also accessible from outside
the cluster, by the web server.
- Computational biology applications need local hard disk
storage (~50 GB) on each node for sequence
databases.
- The current BioHPC@CBSU file server has a capacity of
6 TB.
FTP server
- An FTP server is optional, and may be located on the head node, the
web server or the file server.
- The FTP server is used for long-term storage of the results of
finished jobs.
- It is also used for transferring input/output
data files that are too big for HTTP transfer.
- The BioHPC@CBSU FTP server runs MS Windows Server 2003
and has a storage capacity of 6 TB.
SQL server
- An SQL server is required, but it does not need to
be a separate machine. It may be installed on the head node, the web
server or another machine.
- The full MS SQL Server is NOT necessary; MS SQL Server Express
Advanced (free) may be used.
- The SQL server is used to store and manage user and interface
information.
- It generates a unique job id during submission
and keeps all further job information.
- The BioHPC@CBSU SQL server runs MS Windows Server 2003 and MS SQL Server 2005.
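How a database can hand out unique job ids at submission time can be sketched as follows. SQLite (from the Python standard library) stands in for MS SQL Server here, and the `jobs` table, its columns and the `submit_job` helper are hypothetical; the actual BioHPC schema is not described in the source.

```python
import sqlite3


def open_db():
    """Create an in-memory database with a hypothetical jobs table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE jobs (
            job_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- unique id per job
            owner  TEXT NOT NULL,
            app    TEXT NOT NULL,
            status TEXT NOT NULL DEFAULT 'submitted'
        )
        """
    )
    return conn


def submit_job(conn, owner, app):
    """Insert a job row and return the id the database generated for it."""
    cur = conn.execute(
        "INSERT INTO jobs (owner, app) VALUES (?, ?)", (owner, app)
    )
    conn.commit()
    return cur.lastrowid


conn = open_db()
first = submit_job(conn, "alice", "example-app")
second = submit_job(conn, "guest", "example-app")
```

Letting the database generate the id keeps ids unique even when several submissions arrive at once, and the same row can then hold the later job information (status, owner, application).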