Common Steps to Preparing the Infrastructure for Institutional and Enterprise AI

Mondrian AI Co., Ltd.

With recent advances in Artificial Intelligence (AI), companies and institutions around the world are ramping up their efforts to integrate AI into their business and solution workflows. Institutions with sufficient resources can build the AI capability internally. It is also not uncommon to see companies shop around, experimenting with various AI offerings from startups or established companies.

AI investments and bets are being placed across many domains to transform industries and daily life. From an investment point of view, McKinsey considers six categories of AI investment: autonomous vehicles, natural language, computer vision, smart robotics, virtual agents, and general-purpose machine learning. In 2016, the biggest investment went to general-purpose machine learning, amounting to around $7 billion.

Investing in AI is investing in the future. Besides investing in people, by recruiting for AI-related roles and preparing the existing workforce to perform AI tasks, institutions and enterprises need to be well advised on the infrastructure needed to build the AI capability. Investing in infrastructure is not a lightweight task due to its long-term impact on the institution. An emerging technology like AI introduces a new set of tools and systems, which may affect existing business processes and workflows. Thus, careful consideration should be given when making an institution- or company-wide choice of AI infrastructure and tool suite.

One of the important aspects of building AI capability is AI modeling, which consists of three phases: model generation, model training, and model evaluation/retraining. To build a good AI model with high precision and recall, it is important to ensure that three elements exist: good quality data, a proper algorithm, and sufficient computing infrastructure.

Good quality data. Raw data is usually not AI-ready. Data should first be cleaned, filtered, transformed, and enriched before being used to build the AI model.
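As a minimal illustration of these four steps, the sketch below uses pandas; the file and column names are hypothetical and stand in for whatever raw data an institution actually holds.

```python
import pandas as pd

# Hypothetical raw event data; file and column names are for illustration only.
df = pd.read_csv("raw_events.csv")

# Clean: drop rows with missing values in required fields.
df = df.dropna(subset=["user_id", "timestamp", "value"])

# Filter: keep only plausible readings.
df = df[(df["value"] >= 0) & (df["value"] <= 100)]

# Transform: parse timestamps and normalize the value column.
df["timestamp"] = pd.to_datetime(df["timestamp"])
df["value_norm"] = (df["value"] - df["value"].mean()) / df["value"].std()

# Enrich: add a derived feature, e.g. hour of day.
df["hour"] = df["timestamp"].dt.hour

df.to_parquet("ai_ready_events.parquet")
```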

Proper algorithm. At the current state of the art, an AI model is tailored to a specific problem domain. Researchers and practitioners are still searching for and experimenting with Artificial General Intelligence (AGI), machine intelligence that could successfully perform any intellectual task a human being can. A few years ago, the convolutional neural network (CNN) was arguably the main algorithm for building neural-network-based AI. These days, newer algorithms such as generative adversarial networks (GAN), recurrent neural networks (RNN), and reinforcement learning (RL) have also gained popularity across various problem domains.
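For a concrete sense of what such a model looks like, here is a minimal CNN defined in PyTorch (one popular framework among several); the layer sizes are illustrative, not prescriptive.

```python
import torch
import torch.nn as nn

# A minimal convolutional network for 28x28 grayscale images
# (e.g. MNIST-sized inputs); all sizes are illustrative only.
class SmallCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

model = SmallCNN()
print(model(torch.randn(1, 1, 28, 28)).shape)  # torch.Size([1, 10])
```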

Sufficient computing infrastructure. Model generation is an iterative process: once the model is generated, it needs to keep learning and improving as the system takes in more data and the data analyst or engineer identifies false positives or false negatives. Improving the model can mean changing the parameters of the neurons in the neural network or adding/removing layers. These iterative tasks consume a significant amount of computing resources. Hence, it is also important to plan the computing infrastructure so that models can be built and trained in less time.
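A minimal sketch of this train/evaluate cycle, assuming PyTorch and using randomly generated stand-in data:

```python
import torch
import torch.nn as nn

# A minimal train/evaluate loop; the model and data are placeholders.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stand-in for real data: 1,000 random samples with binary labels.
X, y = torch.randn(1000, 20), torch.randint(0, 2, (1000,))

for epoch in range(10):
    # Training step: each iteration adjusts the network's parameters.
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

    # Evaluation step: inspecting errors (false positives/negatives)
    # guides whether to gather more data or change the architecture.
    with torch.no_grad():
        accuracy = (model(X).argmax(dim=1) == y).float().mean()
    print(f"epoch {epoch}: loss={loss.item():.3f}, accuracy={accuracy:.3f}")
```

Every pass through this loop, multiplied by thousands of models and far larger datasets, is what drives the demand for serious compute.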

The Era of GPU Computing

Parallelization has been a common route to improving system performance. A decade ago, parallelization was primarily done by splitting processing across multiple CPU cores (multi-core parallelism) or distributing the task payload to several computing nodes (multi-node parallelism / distributed systems). In recent years, a new form of parallelization has been taking the spotlight: GPU parallelization.

One big strength of the GPU compared with the CPU is the number of cores. High-end NVIDIA desktop GPUs, for example, are built with more than two thousand cores; compare this with a modern server CPU with eight or sixteen cores. Each GPU core has also been optimized to execute compute-intensive tasks, such as floating-point computation and matrix operations, which are very common in AI workloads.
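The difference is easy to observe on a large matrix multiplication. The sketch below, assuming PyTorch and an available CUDA device, times the same operation on CPU and GPU:

```python
import time
import torch

# Compare a large matrix multiplication on CPU and (if available) GPU.
a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

start = time.time()
_ = a @ b
print(f"CPU: {time.time() - start:.3f}s")

if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    torch.cuda.synchronize()  # wait for the host-to-device transfer
    start = time.time()
    _ = a_gpu @ b_gpu
    torch.cuda.synchronize()  # wait for the kernel to finish
    print(f"GPU: {time.time() - start:.3f}s")
```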

The core-level parallelization is further amplified by node-level GPU parallelization. Using technology such as the NVIDIA Collective Communications Library (NCCL), it is becoming possible to perform parallel computation on GPUs located in different hosts. This opens up interesting possibilities for speeding up complex neural network training with distributed GPUs, increasing the likelihood of discovering new AI algorithms with better precision and more general applicability.
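As a sketch of how this looks in practice, PyTorch exposes NCCL as a backend for distributed training; the script below assumes it is launched with `torchrun --nproc_per_node=<gpus> script.py` and that each process has a CUDA device.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # NCCL handles the GPU-to-GPU communication, within or across hosts.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(128, 10).cuda(local_rank)
    # DDP uses NCCL collectives (all-reduce) to average gradients
    # across all participating GPUs.
    model = DDP(model, device_ids=[local_rank])

    x = torch.randn(32, 128).cuda(local_rank)
    loss = model(x).sum()
    loss.backward()  # gradients are synchronized across GPUs here

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```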

Preparing Infrastructure for the Era of AI

The era of AI-powered computing will be preceded by a transition from CPU-based to GPU-based computation. Institutions and enterprises need to be ready for this transition and prepare in advance to gain a first-mover advantage over their competitors.

While institutions can immediately use various AI tools and platforms provided by the big players, such as Amazon with its SageMaker solution or Microsoft with its Azure Machine Learning service, building such infrastructure in house can also be a preferred route.

Whatever the specific use of AI, the infrastructure is usually set up with the following common steps:

1. Set up a distributed cluster for traditional distributed compute tasks

The traditional distributed compute cluster is primarily used for Extract-Transform-Load (ETL) tasks. Because data is usually not immediately usable by the AI modeler, an ETL cluster may need to be provisioned. The cluster can run Hadoop or Spark to process raw data and transform it into a format that can be readily consumed by the AI modeler, as in the sketch below.
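A minimal PySpark job following this extract-transform-load pattern might look like the following; the HDFS paths and column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-for-ai").getOrCreate()

# Extract: read raw logs from distributed storage.
raw = spark.read.json("hdfs:///data/raw/events/*.json")

# Transform: clean the records and reshape them into model-ready features.
clean = (
    raw.dropna(subset=["user_id", "event_time"])
       .withColumn("event_time", F.to_timestamp("event_time"))
       .groupBy("user_id")
       .agg(F.count("*").alias("event_count"),
            F.avg("duration").alias("avg_duration"))
)

# Load: write columnar output the AI modeler can consume directly.
clean.write.mode("overwrite").parquet("hdfs:///data/ai_ready/user_features")

spark.stop()
```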

2. Set up a GPU workstation / cluster

The GPU workstation will primarily be used for AI modeling activities. A single workstation should suffice for generating a few models. However, when thousands of AI models must be generated, trained, and evaluated, it is more desirable to set up a cluster of GPU servers and perform the modeling activities on the cluster.

Model generation involves experimenting with data transformations. A convenient tool for a data scientist or AI engineer is a notebook-style UI. Jupyter Notebook is often used for fiddling with data and generating the AI model.
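The first cells of such a notebook typically load the prepared data and explore it before any model is chosen; the snippet below is illustrative and reuses the hypothetical output of the ETL step above.

```python
# Typical opening cells of an exploratory Jupyter notebook (illustrative).
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_parquet("ai_ready_events.parquet")  # output of the ETL step
print(df.describe())                             # quick statistical summary

# Visualize a feature distribution before deciding on a model.
df["value_norm"].hist(bins=50)
plt.show()
```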

3. Automate the infrastructure deployment

As more data is processed, the infrastructure should also be expanded, which means that new compute nodes or GPU workstations must be provisioned. Provisioning the hardware is followed by the installation of software components. An institution without a sufficient engineering / DevOps team may face difficulties installing and configuring the components for doing AI. A technology-focused company with sufficient DevOps capacity, however, may kickstart an initiative to automate the installation, configuration, and deployment of the infrastructure.

The first step to automating infrastructure deployment is containerizing the software components. This means that instead of being installed on VMs or bare metal, software components are built into a container image, published to a registry, and deployed by pulling the relevant image from the registry.
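As a sketch of this build-publish-deploy cycle, the snippet below uses the Docker SDK for Python; the image and registry names are hypothetical.

```python
import docker

# Connect to the local Docker daemon.
client = docker.from_env()

# Build the component into a container image (hypothetical tag).
image, _ = client.images.build(path=".", tag="registry.example.com/ai/etl:1.0")

# Publish the image to a private registry.
client.images.push("registry.example.com/ai/etl", tag="1.0")

# Deploy by pulling and running the image on a target host.
client.containers.run("registry.example.com/ai/etl:1.0", detach=True)
```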

Mondrian AI has been building AI workstations and platforms to help institutions and enterprises start their AI journey with less pain. To learn more about the Mondrian AI workstation and the Mondrian Platform, simply reach out to us at contact[at]mondrian.ai.
