CS 643 Cloud Computing Programming Project 2
<h2>Cover Page</h2>
<p><strong>CS 643 Cloud Computing Programming Project 2</strong></p>
<p>Student Name</p>
<p>Institutional Affiliation</p>
<p>Course</p>
<p>Instructor</p>
<p>Date</p>
<h2>Project Purpose and Learning Objectives</h2>
<p>The purpose of this individual programming assignment is to develop practical skills in building parallel machine learning applications on the Amazon AWS cloud platform. The project focuses on using Apache Spark to train a machine learning model across multiple EC2 instances, applying Spark MLlib to develop and evaluate the model, and using Docker to containerize the prediction application for simplified deployment.</p>
<h2>Machine Learning Application Requirements</h2>
<p>The project requires the development of a wine quality prediction model using Spark and MLlib. The model must be trained using the TrainingDataset.csv file on four EC2 instances in parallel. The ValidationDataset.csv file should be used to validate the model and tune model parameters for improved prediction performance. The final prediction application should accept a test dataset path as a command-line argument and output the model’s prediction performance using the F1 score.</p>
<h2>Implementation Environment and Programming Requirements</h2>
<p>The assignment must be implemented in Java, Scala, or Python on Ubuntu Linux. Students may use AWS services such as EMR, Flintrock, or other Spark-compatible cloud services to complete parallel training. The training process does not need to be containerized, but the prediction application must be capable of running on a single EC2 instance before being packaged into a Docker container.</p>
<h2>Docker Deployment Requirement</h2>
<p>The prediction application must be packaged inside a Docker container so that it can be deployed and tested across different environments. The Docker image should be uploaded to Docker Hub.
The testing command should allow the evaluator to run the container with a test file path, using Docker volume mapping if needed to make the dataset accessible inside the container.</p>
<h2>Submission Requirements</h2>
<p>The final submission must include a text, Word, or PDF document containing a GitHub link to the project code and a Docker Hub link to the container image. The GitHub repository should include both the parallel training code and the prediction application code. The document must also provide step-by-step instructions for setting up the AWS cloud environment, running parallel model training, executing the prediction application without Docker, and running the prediction application with Docker.</p>
<h2>AI Tool Usage Documentation</h2>
<p>If ChatGPT or another AI coding assistant is used, the submission must explain how the tool supported the project. The report should identify which code was generated by AI, which code was written from scratch, and which AI-generated code was adapted. It should also describe the usefulness of the AI-generated suggestions and provide an overall reflection on the experience of using AI tools during development.</p>
<h2>Grading Criteria</h2>
<p>The project will be graded on four major components. The parallel training implementation is worth 40 points, the single-machine prediction application is worth 20 points, the Docker container for the prediction application is worth 20 points, and the written description of the development process, including any experience with ChatGPT or Copilot, is worth 20 points. An additional 20 points of extra credit may be awarded to the top three submissions based on prediction performance.</p>
<h2>Conclusion</h2>
<p>This project provides practical experience in cloud-based machine learning, parallel model training, Spark MLlib, AWS EC2 deployment, and Docker containerization.
Successful completion requires both technical implementation and clear documentation of the development process, cloud setup, model training, prediction execution, Docker deployment, and any AI-assisted programming support.</p>
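<p>The prediction requirement described above (a test dataset path passed on the command line, with the F1 score as the reported metric) could be sketched in Python as follows. This is a minimal, dependency-free illustration, not the required Spark implementation. The names weighted_f1 and parse_args are hypothetical, and averaging per-class F1 by class support is an assumption, since the assignment does not state which F1 variant is expected. A real submission would load the trained MLlib model and would likely compute the metric with Spark's own evaluator.</p>

```python
import argparse
from collections import Counter


def weighted_f1(labels, predictions):
    """Average the per-class F1 scores, weighting each class by its support.

    Assumption: support-weighted averaging over the quality classes; the
    assignment text only says "F1 score".
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if not labels:
        raise ValueError("at least one example is required")

    support = Counter(labels)          # number of true examples per class
    predicted = Counter(predictions)   # number of predictions per class
    correct = Counter(l for l, p in zip(labels, predictions) if l == p)

    total = 0.0
    for cls, n in support.items():
        tp = correct[cls]
        precision = tp / predicted[cls] if predicted[cls] else 0.0
        recall = tp / n
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += f1 * n
    return total / len(labels)


def parse_args(argv=None):
    """Read the test dataset path as the single positional argument."""
    parser = argparse.ArgumentParser(
        description="Wine quality prediction (illustrative sketch)"
    )
    parser.add_argument("test_path", help="path to the test dataset CSV file")
    return parser.parse_args(argv)
```

<p>In a full application, the path returned by parse_args would be handed to the code that reads the test dataset, and the true and predicted quality labels would then be passed to the metric. When the application runs in Docker, the same argument would refer to a path inside the container that a volume mapping makes available.</p>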