CCSE Data Center Services
The goal of the College of Computer Science and Software Engineering Data Research Center (CCSE-DRC) is to provide a cloud service infrastructure for big data analyses and deep learning using cutting-edge software platforms. The CCSE-DRC provides different services based on the needs of our faculty and students.
The CCSE-DRC architecture consists of physical machines, used either as standalone resources or, mainly, as a cluster. Our cluster lays the foundation for complex software frameworks.
Services provided by CCSE Data Research Center
KVM virtual machines with any Linux distribution are available. The standard setup includes 2 cores, 8 GB of memory, and 100 GB of storage, but it can be expanded upon prior request to the CCSE-DRC. These machines can be desktops or servers and can be accessed through the CLI or a GUI (SPICE or VNC). For GUI access, the user should install the GUI client.
- Compiled against library: libvirt 6.0.0
- Using library: libvirt 6.0.0
- Using API: QEMU 6.0.0
- Running hypervisor: QEMU 4.2.1
LXD is a next-generation system container manager. It offers a user experience similar to virtual machines but uses Linux containers instead. Some of the advantages of containers over VMs include quicker spin-up of applications and better sharing of resources, which is vital in our academic community. As with VMs, we offer CLI and desktop access for our users, as well as the deployment of multiple containers to create a personalized micro-cluster.
- LXD version: 4.0.7
Deep Learning Servers
These are shared resources used by our GRAs, students, and faculty. Each server has 4 Nvidia Tesla (Maxwell) GPUs and an attached iSCSI storage server.
The main deep learning frameworks are installed, such as PyTorch, TensorFlow, Keras, and Caffe. These servers also include the most important Python libraries for machine learning and data science, such as NumPy, pandas, SciPy, Matplotlib, SymPy, scikit-image, scikit-learn, Bokeh, and Jupyter, among others. In addition, the R statistical language, the Julia mathematical language, and Unix tools are installed.
Users can install their own Python libraries in a virtual environment, and their own R packages, without administrator rights. If a user needs administrative rights to install system libraries, the servers provide standard Docker and NVIDIA Docker containers. Please contact the CCSE-DRC for more information.
- torch 1.9.0
- torchsummary 1.5.1
- torchvision 0.10.0
- tensorboard 2.6.0
- tensorboard-data-server 0.6.1
- tensorboard-plugin-wit 1.8.0
- tensorflow 2.5.0
- jupyter 1.0.0
- jupyter-client 6.1.12
- keras 2.6.0
- keras-nightly 2.5.0.dev2021032900
- Keras-Preprocessing 1.1.2
- matplotlib 3.4.2
- matplotlib-inline 0.1.2
- numba 0.53.1
- numpy 1.19.5
- opencv-python 22.214.171.124
- pandas 1.3.1
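Because users can install their own libraries in a virtual environment, the versions visible in a given session may differ from the list above. The following is a minimal sketch, not an official CCSE-DRC tool, of how one might compare the active environment against a few of the listed versions using only the Python standard library. The names `EXPECTED`, `installed_version`, and `compare_versions` are hypothetical, and the package subset is chosen only for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# A subset of the packages and versions listed above (illustrative selection).
EXPECTED = {
    "torch": "1.9.0",
    "torchvision": "0.10.0",
    "tensorflow": "2.5.0",
    "numpy": "1.19.5",
    "pandas": "1.3.1",
    "matplotlib": "3.4.2",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def compare_versions(expected):
    """Map each package name to a (listed, installed) pair of versions."""
    return {name: (want, installed_version(name)) for name, want in expected.items()}


if __name__ == "__main__":
    for name, (want, have) in compare_versions(EXPECTED).items():
        status = "matches" if want == have else "differs"
        print(f"{name}: listed {want}, installed {have} ({status})")
```

Running the script inside a virtual environment reports that environment's packages, so it is a quick way to confirm which copy of a library a job will actually import.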
Hadoop is an open-source software framework used for storing and processing Big Data in a distributed manner on large clusters of commodity hardware. Apache HDFS, the Hadoop Distributed File System, is a block-structured file system in which each file is divided into blocks of a predetermined size. These blocks are stored across a cluster of one or several machines. The HDFS architecture follows a Head/Worker design, where a cluster comprises a single NameNode (Head node) and all other nodes are DataNodes (Worker nodes). HDFS can be deployed on a broad spectrum of machines that support Java. Although several DataNodes can run on a single machine, in practice they are spread across multiple machines.
The CCSE-DRC currently provides a cluster with 10 nodes, which can be expanded as needed. This is a shared resource and can be used to experiment with new MapReduce algorithms or to process very large data sets.
- Hadoop 2.7
- Spark 2.1
- Spark Standalone Cluster 3.1.2
- Hadoop 3.0
- Spark 3.1.2
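The block-structured layout described above can be illustrated with a small, self-contained sketch. It assumes the stock Hadoop defaults of a 128 MiB block size and a replication factor of 3; the actual settings of the CCSE-DRC cluster are not stated here. The round-robin placement is a deliberate simplification for illustration, since the real NameNode uses a rack-aware policy, and the function names and DataNode names are hypothetical.

```python
# Assumed default HDFS block size (128 MiB in Hadoop 2.x and 3.x).
BLOCK_SIZE = 128 * 1024 * 1024


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes in bytes of the blocks a file is divided into."""
    if file_size <= 0:
        return []
    full_blocks, remainder = divmod(file_size, block_size)
    # Every block is full-sized except possibly the last one.
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_blocks(num_blocks, datanodes, replication=3):
    """Assign each block to `replication` distinct DataNodes, round-robin.

    This is a simplification: real HDFS placement is rack-aware.
    """
    replication = min(replication, len(datanodes))
    return {
        block_id: [
            datanodes[(block_id + offset) % len(datanodes)]
            for offset in range(replication)
        ]
        for block_id in range(num_blocks)
    }


if __name__ == "__main__":
    # Hypothetical names for a 10-node cluster.
    nodes = [f"datanode{i}" for i in range(1, 11)]
    blocks = split_into_blocks(300 * 1024 * 1024)  # a 300 MiB file
    for block_id, replicas in place_blocks(len(blocks), nodes).items():
        print(block_id, blocks[block_id], replicas)
```

Under these assumptions, a 300 MiB file occupies two full 128 MiB blocks plus one 44 MiB block, and each block is held by three different DataNodes.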
Classic data analysis was performed on data resting mostly in a relational database or in plain text. The natural consequence of this approach is that the analysis is conducted either on the whole collected data set or on a data batch accumulated over a period of time. Later, with the proliferation of portable devices, traditional storage methods became insufficient for the enormous amount of continuously flowing data available. New technologies such as streaming analytics emerged to overcome these limitations.
Streaming analytics processes never-ending data originating from connected devices (IoT), people networks (social media), and interrelated complex systems (autonomous platforms), among others. Some goals of streaming analytics are to facilitate real-time statistical analysis, to perform machine learning analysis and training, and to interact with other frameworks for permanent data storage. Real-time analysis refers not only to the analysis of data as it arrives, but also to the analysis of data batches collected over short periods of time, ranging from seconds to minutes. In addition, these systems must be able to hold the information in temporary storage for a specific period of time and should be able to store more than one temporal batch.
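The idea of short temporal batches held in temporary storage can be sketched in a few lines of plain Python. This is only an illustration of the concept and does not use Kafka or any other streaming framework; the class name `MicroBatcher` and its parameters are hypothetical, and the sketch assumes that events arrive in timestamp order.

```python
from collections import deque


class MicroBatcher:
    """Group timestamped events into fixed-length windows, keeping the most recent ones."""

    def __init__(self, window_seconds, max_batches):
        self.window_seconds = window_seconds
        # A bounded deque drops the oldest batch once max_batches is reached.
        self.batches = deque(maxlen=max_batches)
        self._current_window = None
        self._current = []

    def add(self, timestamp, value):
        """Add one event; close the open batch when a new window starts."""
        window = int(timestamp // self.window_seconds)
        if window != self._current_window:
            self.flush()
            self._current_window = window
        self._current.append(value)

    def flush(self):
        """Move the open batch, if any, into temporary storage."""
        if self._current:
            self.batches.append((self._current_window, list(self._current)))
        self._current = []
        self._current_window = None


if __name__ == "__main__":
    batcher = MicroBatcher(window_seconds=10, max_batches=2)
    for ts, reading in [(1, "a"), (5, "b"), (12, "c"), (25, "d"), (31, "e")]:
        batcher.add(ts, reading)
    batcher.flush()
    print(list(batcher.batches))
```

With 10-second windows and room for two batches, the example keeps only the two most recent windows; older batches would need to be handed to permanent storage before they are dropped.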
The CCSE-DRC currently provides a cluster with 2 Kafka nodes, which can be expanded as needed. This is a shared resource; for more information, contact the CCSE-DRC.
Kubernetes, at its most basic level, is a system for running and coordinating containerized applications across a cluster of machines. It is a platform designed to manage the complete life cycle of containerized applications and services using methods that provide predictability, scalability, and high availability. In conjunction with Kubernetes we use Helm, a package manager for Kubernetes that allows developers and operators to more easily package, configure, and deploy applications and services onto Kubernetes clusters. For more information, contact the CCSE-DRC.
Services we will provide in the near future
The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable.
Azure Development Tools for Teaching.
Faculty, staff, and students can sign in with their netIDs and passwords and then go to the software section to see everything that is available. This covers only Microsoft products, but it includes operating systems and other development software, including Project and Visio.
Access to NETLAB+ is fast and easy, requiring only a web browser with a Java plugin. Built-in applications for virtual machine and device consoles are included. NETLAB+ provides scheduled access to your virtual machines and lab equipment. All lab access is controlled by reservation through the scheduler. Using the calendar interface, students and instructors can view the pods and time slots available and schedule lab time at their convenience.