An Introduction and Explanation
At the time of writing (July 2025), Oracle have a “Race to Certification 2025” promotion where several certifications have free training and exam attempts. One of these is the snappily named Oracle Cloud Infrastructure 2025 Certified AI Foundations Associate. I took a lot of notes while going through the video training, and as I learn by writing, I have turned these notes into a couple of articles (with a little help from AI, obviously), the first of which appears below.
Whether or not you intend to take the exam, this foundation article is a useful general primer on how AI works and especially on the language and terms that are used. If you do want to obtain the certification, the contents of these two articles cover the full syllabus and may serve as an alternative to, or at least a supplement to, Oracle’s own video-based training.
Here is the link to the certification if you are interested (you will need an Oracle account): Oracle Cloud Infrastructure AI Foundations Associate 2025 certification
In this foundation article, we will cover:
- Grasping the Fundamentals: Clearly distinguish between Artificial Intelligence, Machine Learning, and Deep Learning.
- Learning the Terminology: Become fluent in the essential vocabulary of AI, from “regression” and “CNNs” to “LLMs” and “fine-tuning.”
- Exploring Practical Use Cases: Discover how AI is applied to solve real-world problems, such as analysing customer feedback, providing medical transcriptions, and creating new content.
The follow-up article presents a more practical angle on AI capabilities in the Oracle Cloud:
- See Theory in Action: Understand how a major cloud provider like Oracle Cloud Infrastructure delivers AI capabilities through its infrastructure, platforms, and services.
- Consider the Ethical Implications: Appreciate the importance of building AI systems that are responsible, fair, and trustworthy.
Let’s begin.
Module 1: AI Foundations

Introduction to AI Concepts
The goal of this module is to establish a clear understanding of the core concepts that define Artificial Intelligence.
We will explain the key terms you’ll encounter when dealing with AI, explore the relationship between them, and outline the standard process every machine learning project follows. By the end of this module, you will have a solid framework for understanding the more advanced topics to come.
Artificial Intelligence (AI)
Definition: At its broadest, AI refers to the development of computer systems able to perform tasks that normally require human intelligence. It is the science of making machines capable of imitating human thought and behaviour. Note that the term “AI” doesn’t imply the systems are actually “intelligent”, only that they appear to be. There is a related term, “Artificial General Intelligence (AGI)”, which is generally used for systems that display real intelligence, a capability that doesn’t yet exist.
Objective: The overall objective is to create systems that can perceive their environment, reason, learn, and make decisions to achieve specific goals. A key driver for AI adoption is its ability to process vast amounts of data at a speed and scale far beyond human capability, which allows for enhanced efficiency, pattern detection, and automation.
Example: A self-driving car uses AI to perceive its surroundings (pedestrians, other cars, traffic signals) and make complex decisions like changing lanes safely. This single application involves multiple underlying AI techniques.
Machine Learning (ML)
Definition: Machine Learning is a subset of AI. Instead of being explicitly programmed with rules, ML systems are given the ability to learn and improve from experience on their own. The core objective is to develop algorithms that learn patterns from past data to make predictions or identify trends.
The Learning Process: ML enables systems to learn from large amounts of provided data. For example, instead of writing thousands of rules to identify spam, you provide an ML algorithm with thousands of example emails, and it learns the characteristics of spam for itself.
Example: An email filtering system uses machine learning to automatically classify incoming messages as “spam” or “not spam” based on patterns it has learned from a history of previously categorised emails.
Deep Learning (DL)
Definition: Deep Learning is a specialised subset of ML that uses a layered structure of algorithms known as Artificial Neural Networks (ANNs), which are inspired by the human brain. Deep Learning excels at finding intricate and complex patterns in large datasets.
Key Characteristic: DL is particularly effective with complex, unstructured data like images, sound, and natural language, where features may or may not be easily interpretable by humans.
Example: An image classification application uses deep learning to recognise complex visual patterns like shapes, textures, and colours. This enables it to identify whether a photo contains a specific object (like a cat or a dog).
The Machine Learning Lifecycle
Regardless of the specific technique used, building an AI solution generally follows a standardised, cyclical process known as the Machine Learning Lifecycle. Understanding these stages is fundamental to knowing how AI is implemented in the real world.
There are several different definitions with varying numbers of steps, but the basic lifecycle consists of the following key stages:
- Data Collection & Preparation – The initial step involves gathering the necessary data for the task. The quality and quantity of this data are critical for the success of the model, so the preparation must include cleansing the data (handling missing values, removing errors) and preparing it for the model (e.g., labelling).
- Model Training – This is the core “learning” phase. An algorithm processes the prepared data to establish a relationship between the input features and the desired output. During training, the model adjusts its internal parameters to find patterns and create an accurate representation of the data.
- Model Evaluation – Once trained, the model’s performance must be tested on a separate set of data it has not seen before. Doing so measures the model’s accuracy and effectiveness, ensuring it can generalise its learning to new, unseen situations.
- Inference (Prediction) – Inference is the primary function of a trained model, using the finalised model to make predictions or decisions on new, live data points. For example, after being trained on historical house prices, a model can infer the price of a new house on the market.
- Deployment – Deployment is the process of making the trained model available for use in a production environment, such as integrating it into a mobile app, a website, or an internal business process.
- Monitoring & Retraining – Once deployed, a model’s performance is continuously monitored. Over time, as new data becomes available or patterns in the world change, the model may need to be retrained with updated data to maintain its accuracy. This cyclical monitoring and updating ensures the AI system remains relevant and effective.
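To make these stages a little more concrete, here is a minimal sketch assuming Python and the scikit-learn library (the course itself doesn’t prescribe any particular tools). It walks through data preparation, training, evaluation, and inference on a small built-in dataset; deployment and monitoring are omitted because they depend entirely on your environment.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Data collection & preparation: a small, already-clean built-in dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)   # hold back unseen data for evaluation

# Model training: the algorithm adjusts its parameters to fit the training data
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Model evaluation: measure performance on data the model has never seen
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))

# Inference: predict the class of a "new" data point
print("Prediction:", model.predict(X_test[:1]))
```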
Module 2: Machine Learning

Introduction
Now that you understand the high-level landscape of AI, this module will dig into the practical engine of modern AI: Machine Learning. We will explore the three primary ways that machines learn: Supervised, Unsupervised, and Reinforcement Learning.
You will learn how to identify which approach to use for a given problem and understand the fundamental concepts like regression, classification, and clustering that form the building blocks of most AI applications.
Types of Machine Learning
Supervised Learning: Learning from Labelled Data
Supervised learning is the most common type of machine learning. It’s called “supervised” because the learning process is guided by a dataset where the correct answers are already known, much like a student learning with a teacher’s answer sheet.
Core Concept: The model is trained on labelled data, which consists of input features and a corresponding output label, also known as the target variable. The goal is for the model to learn the relationship between the inputs and the target so it can predict the target for new, unseen inputs.
Use Cases:
- Spam Detection: Classifies incoming emails as ‘Spam’ or ‘Not Spam’ by training on a large dataset of labelled examples.
- Medical Diagnosis and Risk Assessment: Predicts a patient’s health risk or even a potential diagnosis by analysing patterns in labelled historical medical data.
- Image Classification: Assigns labels to images, like ‘cat’ or ‘dog’, after training on a vast library of pre-labelled photos.
- Credit Scoring and Fraud Detection: Identifies the risk of fraudulent transactions or loan defaults by analysing patterns in historical financial data.
- Predictive Forecasting (Regression): Forecasts future numerical values like sales, temperatures, or stock prices by analysing past time-series data.
- Recommendation Engines: Suggests new items by predicting a user’s potential rating based on their past activity and preferences. Note that unsupervised learning (see next section) can also be used for recommendation engines, by clustering similar users together and suggesting what those users bought or watched next.
Supervised learning is typically used for two types of tasks: Regression and Classification.
A. Regression: Predicting Continuous Values
Definition: Regression is used when the target variable you want to predict is a continuous numerical value.
Use Case: Think about predicting a house price based on its features (square footage, number of bedrooms, location). The price is a continuous value that can fall anywhere within a range, rather than one of a fixed set of categories.
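As a minimal illustration of regression (assuming scikit-learn; the features and prices below are invented), a linear model learns the relationship between a house’s features and its price, then predicts a price for a house it has never seen:

```python
from sklearn.linear_model import LinearRegression

# Illustrative training data: [square footage, bedrooms] -> sale price
X = [[1200, 2], [1500, 3], [1800, 3], [2400, 4]]
y = [250_000, 310_000, 360_000, 480_000]

model = LinearRegression().fit(X, y)

# Predict a continuous value (a price) for a new house
print(model.predict([[2000, 3]]))
```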
B. Classification: Predicting Discrete Categories
Definition: Classification is used when the target variable is a discrete category or label. The model’s job is to assign one of a predefined set of labels to a new input.
Types of Classification:
- Binary Classification: There are only two possible outcomes. For example, an email is either “Spam” or “Not Spam”.
- Multi-Class Classification: There are more than two possible outcomes. For example, a system that classifies hospital patients into “Low Risk,” “Moderate Risk,” or “High Risk” categories.
The Role of the Loss Function: During training, a loss function is used to measure the model’s performance. It calculates the difference, or “cost”, between the model’s predictions and the actual target labels. The objective is to adjust the model’s internal parameters to minimise this cost to make the predictions more accurate.
Understanding Thresholds: In many classification models, the initial output is a probability rather than a final answer. For example, an email might be assessed as having a 60% chance of being spam. A classification threshold (e.g., 0.5) is used to convert that probability into a final decision: in this example, spam or not spam. Adjusting the threshold makes the model more or less sensitive. For instance, increasing the threshold for a spam filter means it needs to be more certain an email is spam before classifying it as such, reducing false positives but potentially allowing more spam through.
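Here is a small sketch of binary classification and the role of the threshold, again assuming scikit-learn and invented data. The model outputs a probability, and the threshold converts it into a final decision:

```python
from sklearn.linear_model import LogisticRegression

# Illustrative features: [number of links in the email, sender is in contacts (1/0)]
X = [[0, 1], [3, 1], [8, 0], [12, 0], [1, 1], [10, 0]]
y = [0, 0, 1, 1, 0, 1]   # 0 = not spam, 1 = spam

clf = LogisticRegression().fit(X, y)

# The raw output is a probability, not a final answer
prob_spam = clf.predict_proba([[6, 0]])[0][1]

# The threshold turns the probability into a decision; raising it reduces
# false positives but lets more borderline spam through
threshold = 0.5
print(round(prob_spam, 2), "Spam" if prob_spam >= threshold else "Not spam")
```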
Parametric vs Non-parametric Algorithms
A parametric model, like Linear Regression, makes a strong assumption about the form of the data, e.g., that the relationship between the inputs and the output can be represented by a straight line. It learns a fixed number of parameters, e.g., the slope and intercept of that line, to define the model.
A non-parametric model, like K-Nearest Neighbours (KNN), does not make such assumptions about the data’s underlying structure. Instead of using a fixed set of parameters, it uses the entire training dataset to make predictions. The model’s complexity grows as you add more data, making it flexible but often more computationally intensive.
KNN classifies a new, unseen data point based on the principle that “you are who your neighbours are”. Imagine you want to classify a new email as “Spam” or “Not Spam”. If you set k=7, KNN will find the 7 emails from your labelled training set that are most similar to your new email. If 5 of those 7 neighbours are labelled “Spam” and 2 are “Not Spam”, KNN will predict that your new email is “Spam”.
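As a minimal KNN sketch (assuming scikit-learn, with purely illustrative data), the model stores the whole training set and lets the seven nearest labelled examples vote on the label of a new email:

```python
from sklearn.neighbors import KNeighborsClassifier

# Purely illustrative training data: [links in email, exclamation marks]
X_train = [[0, 0], [1, 0], [0, 1], [2, 1], [1, 2], [2, 0], [3, 1],
           [8, 5], [9, 6], [10, 4], [12, 7], [9, 8], [11, 5], [10, 6]]
y_train = ["Not Spam"] * 7 + ["Spam"] * 7

# k=7: the 7 most similar training emails vote on the label of a new email
knn = KNeighborsClassifier(n_neighbors=7).fit(X_train, y_train)
print(knn.predict([[9, 5]]))   # the majority of its 7 nearest neighbours are "Spam"
```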
Unsupervised Learning: Finding Hidden Patterns
Unsupervised learning works with unlabelled data. There is no pre-existing “correct answer” or target variable.
Core Concept: The goal is not to predict an output but to explore and understand the inherent structure and relationships within the data itself.
Primary Application – Clustering: Clustering is a common unsupervised technique where the algorithm groups similar data points together into clusters. Items within a cluster are more similar to each other than to those in other clusters.
Use Cases:
- Customer Segmentation: Grouping customers based on purchasing behaviour to create targeted marketing campaigns.
- Topic Analysis: Analysing customer feedback forms to identify and group common complaints or themes.
- Outlier Analysis: Spotting unusual spending patterns in credit card transactions to identify potentially fraudulent usage.
Distinction: It’s important to remember that tasks requiring a predefined label, like spam detection, are classed as supervised, not unsupervised.
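As a small illustration of clustering (assuming scikit-learn, with invented customer data), K-Means groups similar customers together without ever being told what the groups should be:

```python
from sklearn.cluster import KMeans

# Illustrative, unlabelled customer data: [orders per year, annual spend]
customers = [[5, 100], [6, 120], [50, 900], [55, 950], [20, 400], [22, 420]]

# Ask for 3 clusters; the algorithm decides the groupings itself
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_)   # the cluster assigned to each customer
```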
Reinforcement Learning: Learning Through Trial and Error
Reinforcement Learning (RL) is distinct from the data-driven approaches above.
Core Concept: RL involves an agent that learns to make optimal decisions by interacting with its environment. The agent performs actions and receives feedback in the form of rewards or penalties. Through trial and error, it learns a policy (a strategy) that maximises its cumulative reward over time.
Use Cases: RL is best suited for dynamic, goal-oriented tasks like training a robot to navigate a room, mastering a strategic video or board game, or optimising an autonomous vehicle’s driving policy.
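To make the agent, action, and reward loop concrete, here is a toy sketch of tabular Q-learning (one common RL algorithm) on an invented five-cell corridor, where the agent is rewarded for reaching the final cell. Everything here is illustrative rather than taken from the course:

```python
import random

N_STATES = 5          # cells 0..4; cell 4 is the goal
ACTIONS = [-1, +1]    # move left or right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2

# Q-table: the agent's estimate of future reward for each (state, action) pair
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(500):
    state = 0
    while state != N_STATES - 1:
        # Epsilon-greedy: mostly exploit the best known action, sometimes explore
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state = min(max(state + action, 0), N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        # Nudge the estimate towards reward + discounted best future value
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

# After training, the learned policy should be "always move right" (+1)
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)])
```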
Understand Your Data: Choose Your Method
The type of data you have available is a factor in deciding how to analyse it, but the choice of machine learning method (Supervised, Unsupervised, or Reinforcement Learning) actually depends on your goal. Let’s explore how different goals applied to the same data types lead to different training methods.
Working with Time-Series and Sequential Data
Sequential data is defined by its order, where events happen in a defined sequence. Examples include stock prices, server logs, or a user’s click history on a website.
Goal 1: To predict a future value or category
Method: Supervised Learning
How it works: You use the historical sequence as your input features and a future event as your labelled target variable.
Example (Regression): Using the last 30 days of a stock’s price (the sequence) to predict tomorrow’s price (a continuous value).
Example (Classification): Using a customer’s sequence of viewed products to predict whether they will make a purchase (“Yes” or “No”).
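A common way to set this up, sketched below with scikit-learn and invented prices, is a sliding window: each window of recent values becomes the input features, and the next value becomes the labelled target.

```python
from sklearn.linear_model import LinearRegression

prices = [101, 103, 102, 105, 107, 110, 108, 112, 115, 117]   # illustrative closing prices

# Each window of 3 past prices is an input; the next price is the target label
window = 3
X = [prices[i:i + window] for i in range(len(prices) - window)]
y = [prices[i + window] for i in range(len(prices) - window)]

model = LinearRegression().fit(X, y)
print(model.predict([prices[-window:]]))   # predicted next value in the sequence
```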
Goal 2: To identify unusual patterns or anomalies
Method: Unsupervised Learning
How it works: You don’t have labelled examples of anomalies. Instead, you train a model to understand what “normal” sequential behaviour looks like. Any data that significantly deviates from this learned normal pattern is flagged as an anomaly.
Example: Monitoring a sequence of network traffic data to detect a potential security breach that doesn’t match typical patterns.
Goal 3: To learn an optimal strategy over time
Method: Reinforcement Learning
How it works: An agent makes a series of decisions at each step in the sequence, receiving rewards or penalties to learn the best long-term strategy.
Example: An automated stock trading bot deciding whether to buy, hold, or sell at each time step to maximise its final profit.
Working with Complex Data (Images, Text, Audio)
These forms of data are unstructured: their features aren’t neatly organised in tables. They are the primary domain of Deep Learning (which we’ll cover in Module 3).
Goal 1: To categorise or label the data
Method: Supervised Learning
How it works: You need a large, labelled dataset. The model learns to map the complex input (like pixels in an image) to a specific label.
Example (Images): Training a model on thousands of images labelled as “Cat” or “Dog” to classify new images.
Example (Text): Training a model on news articles labelled with topics like “Sports,” “Politics,” or “Technology” to categorise new articles.
Goal 2: To group similar data without predefined labels
Method: Unsupervised Learning
How it works: The model analyses the data and groups it based on inherent similarities it discovers on its own.
Example (Images): An algorithm that processes a personal photo library and automatically groups pictures into clusters of beach vacations, city trips, and family portraits without any prior labels.
Example (Text): Using clustering to analyse thousands of customer reviews to discover the main emerging themes or complaints.
Goal 3: To create new, original data
Method: Generative Models (an advanced form of Unsupervised or Self-Supervised Learning).
How it works: The model learns the underlying patterns and structure of the input data so deeply that it can generate new, synthetic examples that are similar to the original data.
Example: A model trained on portraits of historical figures can generate a new, realistic-looking portrait of a person who never existed. We will explore this in detail in Module 4.
Module 3: Deep Learning & Neural Network Architectures

Introduction
In the previous module, we mentioned that Deep Learning is a powerful subset of Machine Learning used for handling complex data such as images, audio or large amounts of text.
Now we will explore the fundamental building block of deep learning, the Artificial Neural Network, and then examine the specialised architectures designed for specific tasks like analysing images and understanding sequences.
Fundamentals of Artificial Neural Networks (ANNs)
At its core, deep learning is powered by Artificial Neural Networks, which are computational models inspired by the structure of the human brain.
Basic Structure: An ANN consists of interconnected nodes, or neurons, organised into layers. There are three main types of layers:
- Input Layer: This is the network’s entry point. It receives the raw input data for the task, such as the individual pixel values of an image or numerical representations of words in a sentence.
- Hidden Layers: Positioned between the input and output layers, these are the core computational engine of the network. The primary function of hidden layers is to process and transform the inputs, learning to recognise features and patterns. In an image recognition task, the initial hidden layers might learn to detect simple features like edges and colours, while deeper layers learn to combine these into more complex patterns like eyes, noses, or textures. They are responsible for capturing the internal representation of the data. The “deep” in “deep learning” signifies the presence of multiple hidden layers.
- Output Layer: This is the final layer that produces the network’s result, such as a classification label (“Cat”), a numerical prediction (a stock price), or generated text.
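As a minimal sketch of this structure (assuming PyTorch; any deep learning framework would look similar), the model below has an input layer of 784 features, such as a 28x28 pixel image, two hidden layers, and a 10-class output layer:

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 128),   # input layer feeding the first hidden layer
    nn.ReLU(),
    nn.Linear(128, 64),    # second hidden layer: multiple hidden layers make it "deep"
    nn.ReLU(),
    nn.Linear(64, 10),     # output layer: one score per class
)
print(model)
```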
Key Deep Learning Architectures
While the basic ANN structure is versatile, specialised architectures have been developed to handle specific types of data with much greater efficiency. Here we will discuss three key types: CNNs, RNNs, and LSTMs.
Convolutional Neural Networks (CNNs): The Visual Experts
Core Concept: CNNs are the go-to architecture for processing grid-like data, most notably images. Their power comes from a specialised type of layer called a convolutional layer.
Function of the Convolutional Layer: The primary function of the convolutional layer is to detect specific features in the input image. It works by sliding small filters (also called kernels) across the image to identify patterns like edges, corners, textures, and shapes, which allows the network to learn a hierarchical representation of visual features automatically.
Primary Use Cases:
- Image Classification: Determining what an image contains.
- Facial Recognition: Identifying individuals from an image or video.
- Object Detection: Locating and identifying multiple objects within an image.
In essence, CNNs are the backbone of most Computer Vision tasks. The goal of computer vision is to enable computers to “see” and interpret the visual world. It focuses on the task of identifying patterns in images and extracting relevant features from them, allowing machines to understand the content of photos, videos, and other visual inputs.
When you encounter a task that involves analysing visual data, like recognising faces, detecting license plates on vehicles, or classifying documents based on their layout, you are dealing with a computer vision problem and will likely use a CNN.
CNNs can be used in combination with other types of learning, such as for self-driving cars, where the output from the CNN would be an input into reinforcement learning algorithms.
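The sketch below, again assuming PyTorch, shows the typical shape of a small CNN: convolutional layers slide filters across the image to detect features, pooling layers downsample them, and a final linear layer produces the classification.

```python
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # 16 filters scan the RGB image for edges/textures
    nn.ReLU(),
    nn.MaxPool2d(2),                              # downsample, keeping the strongest responses
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # deeper filters combine simple features into shapes
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 2),                     # e.g. "cat" vs "dog" for a 32x32 input image
)
print(cnn)
```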
Recurrent Neural Networks (RNNs): The Sequence Specialists
Core Concept: RNNs are designed specifically for handling sequential data, where the order of information is critical (e.g., text, time-series data, music).
Key Feature: Unlike other networks, RNNs have a feedback loop which allows information to persist, creating an internal state or “memory” of previous inputs. This memory enables the network to understand context and dependencies across time.
Primary Use Cases:
- Text Completion: Suggesting the next word in a sentence, as in writing poetry.
- Music Generation: Composing a melody where the next note depends on the previous ones.
- Speech Recognition: Transcribing spoken language into text.
Primary Limitation: The memory of a standard RNN is short-term. Due to a technical challenge known as the vanishing gradient problem, RNNs struggle with long-range dependencies, which means their ability to connect information from far back in a long sequence is limited. For example, in a long paragraph, it might forget the context provided in the first sentence by the time it reaches the last.
Long Short-Term Memory (LSTM) Networks: Overcoming RNN Limitations
Core Concept: An LSTM is a special, more advanced type of RNN specifically designed to solve the short-term memory problem. It is the practical solution to the vanishing gradient issue found in standard RNNs.
Key Innovation – The Gating Mechanism: LSTMs introduce a memory cell that can maintain information for longer periods. The flow of information into and out of this cell is controlled by a series of gates:
1. Input Gate: Decides what new information from the current input is important and should be stored in the memory cell.
2. Forget Gate: Decides what information from the past is no longer relevant and should be discarded.
3. Output Gate: Determines what part of the information from the memory cell should be used to produce the network’s output at the current time step.
Primary Benefit: The gating mechanism allows the network to selectively remember important information and forget irrelevant details over very long sequences. This gives it both a long-term and a short-term memory, making it far more powerful for complex sequential tasks than a standard RNN.
Use Cases: LSTMs are used for the same tasks as RNNs (text analysis, translation, speech recognition) but are generally the preferred choice when dealing with longer, more complex sequences where long-term context is vital.
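As a minimal sketch (assuming PyTorch, whose nn.LSTM layer implements the gating mechanism internally), an LSTM reads a sequence step by step, carries its memory cell forward, and its final hidden state feeds a classifier:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
classifier = nn.Linear(64, 2)

sequence = torch.randn(1, 20, 32)          # batch of 1, sequence of 20 steps, 32 features per step
outputs, (hidden, cell) = lstm(sequence)   # `cell` is the gated long-term memory
print(classifier(hidden[-1]))              # prediction based on the whole sequence's context
```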
Module 4: Generative AI & Large Language Models (LLMs)

Introduction
The previous modules focused on AI that predicts, classifies, or finds patterns. This module explores one of the most transformative areas of modern AI: Generative AI.
Here, the primary goal is not to predict a label but to create new, original content. We will examine the technology that powers this revolution, Large Language Models (LLMs), and learn the key techniques for controlling and customising their output.
What is Generative AI?
Definition: Generative AI is a category of artificial intelligence that, instead of just analysing existing data, creates diverse new content such as text, audio, and images. Generative models learn patterns from vast, unstructured, and unlabelled data (often a significant portion of the public internet). They do so without human supervision, learning grammar, facts, reasoning abilities, and conversational styles on their own.
Common Applications of Generative AI
The capabilities of Generative AI and LLMs have led to a wide array of powerful applications across many industries. While the underlying technology is complex, its uses are often intuitive and creative. Here are some of the most common applications:
- Content Creation and Augmentation: The most direct application includes writing drafts for emails, marketing copy, blog posts, and even creative works like poetry and scripts. It can also augment human writing by suggesting line completions or rephrasing sentences.
- Advanced Chatbots and Conversational Agents: Unlike older rule-based chatbots, LLM-powered agents can engage in nuanced, context-aware conversations, answer complex questions, and serve as sophisticated customer service representatives.
- Text Summarisation: Generative AI can read and understand long documents, articles, or reports and condense them into concise, accurate summaries, saving significant time and effort.
- Code Generation: Developers use LLMs to generate code snippets in various programming languages, debug existing code, and even translate code from one language to another, accelerating the software development lifecycle.
- Image, Music, and Art Generation: Models can be trained on visual or audio data to create entirely new, original works, including generating realistic images from text descriptions, composing new musical melodies, or creating unique digital art.
- Semantic Search and Knowledge Management: By understanding the meaning behind text, Generative AI can power search systems that find results based on contextual relevance rather than just keyword matching, making it easier to find information in large knowledge bases.
Large Language Models (LLMs)
Large Language Models are the core technology behind most modern text-based Generative AI applications. They are highly advanced neural networks designed to understand and generate human language.
Tokens: Before an LLM can process text, it breaks the input down into smaller pieces called tokens. A token can be a word, part of a word, or even just punctuation. The tokenisation process turns language into a numerical format that the model can work with.
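As a purely illustrative sketch (real LLM tokenisers use learned sub-word vocabularies rather than hand-written rules), here is the general idea of turning text into numeric token IDs:

```python
# One plausible sub-word split of a sentence; the split itself is invented
text = "Unbelievably, the cat sat."
tokens = ["Un", "believ", "ably", ",", " the", " cat", " sat", "."]

# Map each distinct token to an integer ID: the numerical form the model actually sees
vocabulary = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
token_ids = [vocabulary[tok] for tok in tokens]
print(token_ids)
```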
The Transformer Architecture: While the RNNs and LSTMs we saw in Module 3 process text sequentially (one word at a time), LLMs are built on the Transformer architecture, which can process all tokens in a sentence simultaneously, making it vastly more efficient and powerful.
The Attention Mechanism: The breakthrough feature of the Transformer is its attention mechanism (or self-attention), which allows the model, when processing a single word, to look at all the other words in the input sequence and weigh their importance to understand the context.
Think of how you read a sentence. When you see the word “it,” your brain automatically looks for the noun it refers to. The attention mechanism gives the model this same ability, allowing it to resolve ambiguities and track relationships across long sentences and paragraphs far more effectively than an RNN’s memory.
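For the curious, here is a stripped-down sketch of scaled dot-product self-attention, assuming PyTorch and using random values in place of learned weights. Each token builds a query, compares it against every token's key, and takes a weighted average of the values; those weights are the "attention":

```python
import math
import torch

# 5 tokens, each represented by a 16-dimensional embedding (random stand-ins)
tokens = torch.randn(5, 16)

# In a real model these projection matrices are learned during training
Wq, Wk, Wv = torch.randn(16, 16), torch.randn(16, 16), torch.randn(16, 16)

Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
scores = Q @ K.T / math.sqrt(16)          # how relevant each token is to every other token
weights = torch.softmax(scores, dim=-1)   # each row sums to 1: the attention weights
context = weights @ V                     # context-aware representation of every token

print(weights[2])   # how much the third token attends to each token in the sequence
```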
The Balancing Act: Creating a powerful LLM isn’t just about making the model bigger. It requires finding an optimal balance between model size (the number of parameters), data size (the amount of training text), and data quality. A smaller model trained on very high-quality data can often outperform a larger model trained on “noisy” data.
Hallucinations: Large Language Models don’t always get it right: they can sometimes generate text that is non-factual, nonsensical, or not grounded in their training data. This is an ongoing issue that AI companies are working to resolve. When asking an LLM for factual information, it should ALWAYS be fact-checked before you use it for learning, commercial, or any other purposes. Powerful though LLMs are, many other models and mechanisms are being developed to overcome these issues.
Working with LLMs: Customisation and Control
A pre-trained LLM is a powerful generalist, but its true value is unlocked when you guide or adapt it for a specific purpose. There are two main approaches: Prompting and Fine-tuning.
Prompting: Guiding the Model without Changing It
This approach involves carefully crafting the input (the prompt) to steer the model’s output in real-time, without altering the model itself.
Prompt Engineering: There is an art and a science to designing prompts that elicit a specific style, format, or type of response from an LLM. Prompt Engineering is a lengthy topic in itself and the subject of many articles online.
In-context Learning (or Few-shot Prompting): Responses can be improved by providing explicit examples of your task directly within the prompt. For instance, if you want to classify movie reviews, you might give the model two examples of reviews and their sentiment before giving it the new one you want it to classify. This shows the model the exact format you want and dramatically improves its accuracy for your specific task.
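Here is what such a few-shot prompt might look like. The wording and examples are invented; nothing about the model itself changes, as the examples live entirely inside the prompt:

```python
# The examples simply show the LLM the task and the expected output format
prompt = """Classify the sentiment of each movie review as Positive or Negative.

Review: "A stunning, heartfelt film from start to finish."
Sentiment: Positive

Review: "Two hours of my life I will never get back."
Sentiment: Negative

Review: "The pacing dragged, but the ending completely won me over."
Sentiment:"""

# `prompt` would now be sent to the LLM, which is expected to reply with a
# single word ("Positive") in the same format as the examples.
print(prompt)
```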
Fine-tuning: Adapting the Model’s Knowledge
Fine-tuning is a more advanced technique that involves changing the model’s internal parameters.
Definition: Fine-tuning is the process of retraining a pre-trained model on a smaller, task-specific dataset. The model’s parameters are adjusted to improve its performance, adapt it to a specific domain (like legal or medical language), or align it with a certain conversational tone.
Efficient Fine-tuning: Training all the parameters of a massive LLM is extremely expensive. Modern techniques, like the T-Few method, offer a more efficient solution. Instead of updating the entire model, this approach selectively updates only a fraction of the model’s weights, dramatically reducing the cost and time required for customisation and making it feasible to create specialised models for specific business needs.
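To illustrate the general idea of selective updates, here is a minimal sketch assuming PyTorch. It shows only the principle of freezing most weights and training a small task-specific layer; it is not an implementation of T-Few itself:

```python
import torch.nn as nn

# Stand-in for a large pre-trained model (real LLMs have billions of parameters)
pretrained = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))

# Freeze the pre-trained weights so they are not updated during fine-tuning
for param in pretrained.parameters():
    param.requires_grad = False

# A small task-specific layer is the only part that gets trained
task_head = nn.Linear(512, 3)

trainable = sum(p.numel() for p in task_head.parameters())
total = sum(p.numel() for p in pretrained.parameters()) + trainable
print(f"Training {trainable:,} of {total:,} parameters")
```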
Conclusion
So, that’s the basics of AI. Given the information in this article, you should have a better understanding of what you are reading in other stories about AI.
If you intend to carry on and sit the Oracle Cloud Infrastructure 2025 Certified AI Foundations Associate exam, do read through the second article in this series (it is much shorter than this one), which covers some Oracle-specific knowledge around AI functionality offered by the Oracle Cloud.

