The Data, Analytics and AI Glossary
Plain English for the terms that come up when you are trying to get real value from your data. A plain definition, with an example or a note on why it matters where it helps.
Agreed definitions
Writing down, in plain business terms, what your core measures mean - an active customer, net revenue - so every number agrees. It is the least technical and most valuable part of getting data right.
Settling once what an active customer is, before anyone reports on it.
Why it matters: Most reporting disputes are really arguments about definitions nobody ever wrote down.
AI agent
An AI that does not just answer but takes actions on your behalf, stringing several steps together towards a goal. It is the direction much of AI is heading, and the point at which governance stops being optional.
An assistant that reads a request, looks up the data, and drafts the reply - in one go.
Why it matters: The more an agent can do, the more it matters what it is allowed to touch.
AI governance
The rules and oversight that keep AI use safe, fair and accountable: what it can access, how its answers are checked, and who is responsible. As AI does more, this stops being optional.
Deciding which data an assistant may see, and how its answers are reviewed, before you switch it on.
Why it matters: The businesses that adopt AI well are the ones that set the guardrails first, not after.
Analysis Views
A Business Central feature for slicing transactions by dimension inside the system, without a separate reporting tool. Useful for quick views close to the ledger.
Viewing sales by department without leaving Business Central.
Analytics
Turning data into answers you can act on. It spans a ladder, from describing what happened, through explaining why, to predicting what comes next and recommending what to do. Most businesses live on the bottom rung and have far more to gain higher up than they realise.
Knowing sales fell is data; knowing which customers are about to lapse, and calling them, is analytics.
Analytics Acceleration Programme
(AAP) Hopton's way of delivering analytics in three stages: getting the foundations right, proving value quickly on a real decision, and handing over a platform your own team can run. It is built to make you capable, not dependent.
The path from a standing start to your own people running the platform.
Why it matters: It is deliberately the opposite of a retainer you can never leave.
Anomaly detection
Spotting values that do not fit the normal pattern, which can flag errors, fraud or genuine change early. It watches the data so people do not have to.
An alert when a cost code jumps far outside its usual range.
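One simple way to picture "far outside its usual range" is to ask how many standard deviations a new value sits from the historical average. The sketch below assumes that rule; the function name, the threshold of three and the figures are all made up for illustration, and real tools use more careful methods.

```python
from statistics import mean, stdev

def is_anomaly(history, latest, threshold=3.0):
    """True when `latest` is more than `threshold` standard deviations
    away from the mean of `history` (needs at least two past values)."""
    centre = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any change at all is out of pattern.
        return latest != centre
    return abs(latest - centre) / spread > threshold

# Made-up monthly totals for one cost code.
monthly_cost = [1020, 980, 1005, 995, 1010, 990]

print(is_anomaly(monthly_cost, 1000))  # in line with the usual range
print(is_anomaly(monthly_cost, 2400))  # far outside it
```

The threshold is the business decision: set it too low and people learn to ignore the alerts.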
Apache Spark
(Spark) An engine for processing very large amounts of data quickly across many machines. It sits under platforms like Microsoft Fabric and Databricks, and matters when your volumes are truly large.
Crunching billions of rows that would overwhelm an ordinary database.
Why it matters: It is a heavyweight tool; needing it is a sign of real scale, not a default.
Artificial intelligence
(AI) A broad term for software that performs tasks we associate with human intelligence, such as recognising patterns or understanding language. Most useful business AI today is machine learning, or the assistants built on it.
An assistant that summarises a report, or a model that predicts demand.
Azure Data Factory
(ADF) Microsoft's data movement and orchestration service, used to schedule and run the flows that move and transform data, often into a warehouse. Its capabilities also live inside Fabric.
The scheduled flows that pull yesterday's orders from each source into the warehouse overnight.
Azure SQL
A managed cloud database from Microsoft, often the right-sized warehouse for a mid-market business that does not yet need Fabric. Reliable, well understood, and inexpensive at modest scale.
A tidy Azure SQL warehouse behind Power BI handles a great many mid-market reporting needs.
Why it matters: Plenty of businesses reach for a big platform when a clean Azure SQL database would have done the job for far less.
Azure Synapse Analytics
Microsoft's previous-generation analytics platform, much of which has been folded into Microsoft Fabric. You will still meet it in established estates.
A business that built on Synapse a few years ago, now weighing a move to Fabric.
Basket analysis
Finding which products are bought together more often than chance would predict, to inform ranging, merchandising and promotions. It works on the till and order data you already keep.
Two lines that sell together far more than their individual popularity would explain.
Why it matters: The skill is telling a real pattern from a coincidence; bread and milk sell together because everyone buys both.
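A common way to tell a real pattern from a coincidence is "lift": how often two products appear together, divided by how often they would appear together if they were bought independently. The sketch below assumes that measure; the baskets are invented and the function name is illustrative only.

```python
def lift(baskets, a, b):
    """Lift of products a and b: 1.0 means no more than chance,
    well above 1.0 suggests a genuine pairing."""
    n = len(baskets)
    has_a = sum(1 for basket in baskets if a in basket)
    has_b = sum(1 for basket in baskets if b in basket)
    has_both = sum(1 for basket in baskets if a in basket and b in basket)
    if has_a == 0 or has_b == 0:
        return 0.0
    return (has_both / n) / ((has_a / n) * (has_b / n))

# Invented till data: everyone buys bread and milk, a few buy gin and tonic.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "gin", "tonic"},
    {"bread", "milk"},
    {"bread", "milk", "gin", "tonic"},
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "milk", "crisps"},
]

print(lift(baskets, "bread", "milk"))  # 1.0: together only because both are everywhere
print(lift(baskets, "gin", "tonic"))   # 4.0: together far more than popularity explains
```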
Benchmark
Comparing a number against something meaningful: a peer group, an industry norm, or your own past, so you can judge whether it is good, bad or about right. A figure on its own rarely tells you much; a figure in context does.
A 30-day debtor figure means little until you know your sector typically runs at 45.
Why it matters: Without a benchmark, every number is just a number, and targets get set by gut feel.
Big data
Data that is too large, too fast-moving or too varied for ordinary tools to handle. The phrase is less fashionable than it was, and most mid-market businesses have a tidy-up problem rather than a big data problem.
Sensor readings from a production line, arriving by the second, are big data; a few years of orders are not.
Why it matters: Most businesses reaching for big data tools would get further by getting their ordinary data in order first.
Business Central
(BC) Microsoft's ERP for small and mid-sized businesses, running finance, sales, purchasing, inventory and operations. It is where many mid-market businesses create the data they later want to report on.
Where the orders, invoices and ledger entries are created in the first place.
Why it matters: Reporting on it well is a particular skill, because the data is structured for running the business, not for analysis.
Business intelligence
(BI) Turning business data into reports, dashboards and answers that support decisions. For most mid-market businesses, in practice, it means Power BI sitting on a clean model.
The monthly board pack, and the dashboards behind it, are business intelligence.
Why it matters: BI tells you what is happening and why; predicting what comes next is where analytics and AI take over.
Calculated column and measure
Two ways to add calculations in Power BI. A calculated column is worked out row by row and stored; a measure is worked out on the fly as you slice and filter. Choosing the wrong one is a common cause of slow, bloated models.
Margin per line is often a column; total margin for whatever you have selected is a measure.
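The difference is easier to see outside Power BI. The sketch below is plain Python, not DAX, and only mimics the idea: the "column" is worked out once per row and stored, while the "measure" is recalculated for whatever rows are selected. The table and names are invented.

```python
sales_lines = [
    {"region": "North", "revenue": 100.0, "cost": 60.0},
    {"region": "North", "revenue": 300.0, "cost": 240.0},
    {"region": "South", "revenue": 200.0, "cost": 100.0},
]

# "Calculated column": worked out once per row and stored with the data.
for line in sales_lines:
    line["margin"] = line["revenue"] - line["cost"]

# "Measure": worked out on the fly for the current selection.
def margin_pct(selection):
    revenue = sum(line["revenue"] for line in selection)
    if revenue == 0:
        return 0.0
    return sum(line["margin"] for line in selection) / revenue

north = [line for line in sales_lines if line["region"] == "North"]

print(margin_pct(north))        # margin % for the North selection only
print(margin_pct(sales_lines))  # margin % across everything
```

A percentage like this has to be a measure: storing it per row and averaging the rows would give a different, wrong answer.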
Certified dataset
Marking a model or dataset as trusted and endorsed, so people know which version to build on. It is how you stop the spread of rival copies.
A certified sales model everyone is told to use, ending the quiet proliferation of near-duplicates.
Chart of accounts
The structured list of accounts a business records its finances against. Consolidating several companies means lining these up first.
Two companies using different codes for the same cost have to be mapped together before group totals mean anything.
Why it matters: Reconciling mismatched charts of accounts is the first and largest job in any multi-company consolidation.
Churn prediction
A model that scores which customers are drifting away, learned from the history of those who left and those who stayed, so you can act before they go.
A weekly list of at-risk accounts for the retention team to call.
Why it matters: Keeping a customer is far cheaper than winning one, and a few weeks of warning is usually enough.
Classification and regression
Two common kinds of prediction. Classification puts something into a category; regression predicts a number.
Copilot
Microsoft's AI assistant built into Power BI and other tools, which summarises, drafts reports and answers questions in plain language. It is helpful, and only ever as good as the model beneath it.
Asking Copilot to summarise a report, draft a page, or explain a measure.
Why it matters: Point it at a clean model and it shines; point it at a mess and it produces confident, wrong answers.
Data
Recorded facts about your business: sales, customers, orders, stock, the lot. On its own it is raw material, neither useful nor useless until something is done with it, and most businesses are sitting on far more of it than they use.
Every till receipt, invoice and service record you keep is data, whether or not anyone is looking at it.
Why it matters: You are almost never short of data. You are short of data that is clean, joined up and trusted.
Data governance
The agreements and controls that make data trustworthy: what terms mean, who can see what, and how a number can be traced. It is what separates data people trust from data they argue about.
Without it, every report quietly invents its own version of the truth.
Why it matters: Governance is not bureaucracy for its own sake; it is the difference between confidence and chaos.
Data lake
A large, cheap store that holds raw data of any shape, before it is cleaned and structured. It is where data can land first and be refined later.
Years of raw logs and exports kept as-is, ready to be shaped when a need arises.
Data lineage
The traceable path from a number back through the model to its source, so any figure can be explained and defended.
When someone senior challenges a number, you can show exactly how it was calculated.
Why it matters: An answer you cannot trace is an answer you cannot defend, which is a problem the moment AI starts producing them.
Data mart
A smaller, focused slice of a warehouse, built for one team or subject area. It gives a department what it needs without the whole warehouse.
A finance data mart holding just what the finance team reports on.
Data maturity
How far along a business is in using data well, from spreadsheets and gut feel to governed, predictive analytics. Knowing where you sit decides the right next step.
A business reporting by spreadsheet is at a different stage from one running churn models, and needs different things next.
Why it matters: Most failed projects tried to skip a stage; maturity is the map that stops you doing that.
Data mining
Sifting large amounts of data for patterns and relationships that are not obvious, often as a first step towards predictive work.
Finding that two product lines sell together far more often than expected.
Data modelling
Structuring data so it is easy and fast to report on: deciding the tables, the relationships and the definitions. It is the difference between a model that is a pleasure to work with and one that fights you.
Shaping raw sales, customer and product tables into a clean star schema.
Why it matters: Most reporting problems are modelling problems wearing a different hat.
Data pipeline
An automated flow that moves and transforms data from a source into your warehouse or lakehouse, on a schedule, without anyone exporting and pasting.
A nightly pipeline pulling yesterday's orders from Business Central into the warehouse.
Data quality
Whether your data is accurate, complete, consistent and current. Poor quality is the single most common reason analytics and AI projects disappoint.
If half your customer records have no postcode, any analysis by area starts compromised, and no clever tool will rescue it.
Why it matters: AI does not fix poor data, it amplifies it - and does so with great confidence.
Data residency
Where your data is physically stored, which can matter for legal, contractual and compliance reasons.
Keeping UK data in a UK region to satisfy a customer's requirement.
Data strategy
A short, plain plan for how data will earn its keep: what matters, in what order, and how it will be governed. Best kept to a single page.
A one-page plan the board reads, not a binder nobody opens.
Why it matters: A strategy nobody reads changes nothing; brevity is what makes it real.
Data visualisation
Presenting data as charts, maps and visuals so a pattern can be seen at a glance rather than read from a table. Good visualisation makes the point; poor visualisation buries it.
A single trend line that shows in a second what a page of figures would take minutes to find.
Why it matters: The aim is the fastest path to understanding, not the most impressive-looking chart.
Data warehouse
A database built and structured specifically for reporting and analysis, as opposed to the systems that run the business day to day. It brings sources together and shapes them for fast, consistent reporting.
Sales, finance and stock brought into one place and modelled for reporting.
Databricks
A data and AI platform built around large-scale engineering and data science. It is strong at what it is built for, and for most mid-market businesses it is more platform than the work requires.
Heavy machine learning across very large datasets.
Dataflow
A lower-code way to pull and shape data in the Microsoft stack, often used to feed a model without building full pipelines. A useful middle ground for smaller needs.
A dataflow that cleans a supplier's spreadsheet feed before it reaches reporting.
DAX
The formula language used to write measures and calculations in Power BI. It is powerful and famously easy to get subtly wrong, which is why calculations belong in one governed model rather than scattered across reports.
A year-to-date sales figure, written once as a DAX measure and reused everywhere.
Why it matters: Two people writing the same measure slightly differently is how a business ends up with two versions of revenue.
Deep learning
A kind of machine learning using layered networks loosely inspired by the brain, behind much of modern AI including language and image tools. It is powerful and data-hungry.
The technology underneath an assistant that understands written language.
Delta Lake
A storage format that brings reliability and structure to a data lake, underpinning Microsoft Fabric and Databricks. It is plumbing you rarely deal with directly.
The format your Fabric tables quietly sit on.
Descriptive analytics
Reporting on what has already happened. It is the most common kind of analytics and the place nearly every business starts, and done well it is valuable in its own right.
Last month's sales by region, set against the month before.
Diagnostic analytics
Going a step beyond description to understand why something happened, usually by slicing and comparing until the cause shows up.
Sales dropped, and the diagnostic view reveals it was one region and one product line, not the whole book.
Dimension
The descriptive context you slice your numbers by: customer, product, region, date. Dimensions are the 'by' in sales by region by month.
Dimensions
Tags such as department, project or region applied to transactions in Business Central, used to analyse the numbers by category later.
Tagging every cost with a department, so you can report by department afterwards.
Direct Lake
A Fabric feature that lets Power BI read large data straight from OneLake at speed, without importing it or running slow live queries. It combines the speed of import with the freshness of live data.
Fast reporting over very large tables, without the wait of a full refresh.
Why it matters: It is one of the genuine technical reasons, as opposed to the brochure ones, for a larger business to move to Fabric.
DirectQuery and Import
Two ways Power BI gets at data. Import copies it into the model for speed; DirectQuery leaves it in the source and queries it live. Import is usually faster and the right default; DirectQuery suits very large or very current data.
Most mid-market reports run on Import, refreshed on a schedule; DirectQuery is the exception, not the rule.
Dynamics 365
(D365) Microsoft's family of business applications, of which Business Central is the ERP for small and mid-sized businesses.
Business Central for finance and operations, alongside Dynamics 365 Sales for the pipeline.
ERP
(Enterprise Resource Planning) The core system that runs the business: finance, sales, purchasing, stock, operations. For many mid-market firms that system is Business Central.
The system your finance team lives in day to day.
Establish, Build, Continuity
The three stages of the Analytics Acceleration Programme. Establish gets the foundations and definitions right; Build proves value on a real decision; Continuity hands ownership to your team.
Foundations first, then a working result, then it is yours to run.
ETL and ELT
Extract, transform, load: the work of getting data out of source systems, cleaning and shaping it, and landing it where reporting can use it. ELT swaps the last two steps, loading the raw data first and transforming it inside the warehouse. Either way it decides whether anything downstream is trustworthy.
Pulling, tidying and combining four systems into one clean reporting layer.
Why it matters: It is the least glamorous and most important part of any analytics build, and the first thing skimped under time pressure.
Fabric capacity
(F-SKU) The unit you buy Microsoft Fabric in, sized from small (F2) upwards. From F64 and above, report viewers can read content without their own paid licence - which is the threshold that often decides the cost.
Below F64 you license every viewer; at F64 and above viewing is included in the capacity.
Why it matters: Sizing the capacity wrong, in either direction, is one of the most expensive mistakes in the Microsoft data world.
Fabric data agent
The Fabric feature that lets people ask questions of governed data in plain language and get answers back. The value is the trusted model underneath, not the chat box on top.
Asking which region grew fastest last quarter, in words, and getting the right number.
Fact table
The table holding the events you measure, such as sales lines or transactions. It is usually the largest table in the model and the thing your measures add up.
Financial reporting
Business Central's built-in tool for building financial statements from the chart of accounts. Good for ledger-accurate finance views, limited beyond them.
A profit and loss run straight from BC - fine until you need history, other modules, or design.
Why it matters: It does finance statements well and was never meant to carry the whole business's reporting.
Forecasting
Projecting what is likely to happen, such as demand or cash, from patterns in your history, and expressing it as a range rather than a single figure. It runs on the sales history you already keep.
A forecast of next month's demand per line, with a sensible high and low.
Why it matters: A range you can plan against beats a single number that is always wrong by some amount.
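To show what "a range rather than a single figure" looks like, here is a deliberately naive sketch: the central estimate is the average of recent months and the range is two standard deviations either side. Real forecasting accounts for trend and seasonality; the function name, the width of two and the figures are assumptions for illustration.

```python
from statistics import mean, stdev

def naive_forecast(history, width=2.0):
    """Return (low, point, high) for the next period from past values."""
    point = mean(history)
    margin = width * stdev(history)
    return point - margin, point, point + margin

# Invented monthly demand for one product line.
monthly_demand = [100, 110, 90, 105, 95]

low, point, high = naive_forecast(monthly_demand)
print(f"Plan for about {point:.0f}, somewhere between {low:.0f} and {high:.0f}")
```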
Generative AI
(GenAI) AI that creates new content - text, images, code - rather than predicting a number or category. The assistants most people now mean by AI are generative.
Drafting a summary, an email, or a first pass at a report.
Why it matters: It is fluent and confident, which is both its strength and its risk; it needs grounding in your own trusted data.
Hallucination
When an AI gives a confident answer that is simply wrong. The risk is highest when it is pointed at ungoverned, missing or contradictory data.
A clean, persuasive sentence quoting a revenue figure three departments would dispute.
Why it matters: The danger is not that it is wrong - it is that it is wrong fluently, which makes people trust it more.
Incremental refresh
Refreshing only the data that has changed rather than reloading everything, which keeps large models fast and cheap to update.
Refreshing only the last few days of sales each night, rather than three years of it.
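The idea can be sketched in a few lines: keep everything older than a cut-off exactly as it was, and reload only the recent window from the source. This is an illustration of the principle, not how Power BI implements it; the function name, the three-day window and the rows are invented.

```python
from datetime import date, timedelta

def incremental_refresh(stored_rows, source_rows, today, window_days=3):
    """Keep stored rows older than the window; reload the window from source."""
    cutoff = today - timedelta(days=window_days)
    untouched = [row for row in stored_rows if row["date"] < cutoff]
    reloaded = [row for row in source_rows if row["date"] >= cutoff]
    return untouched + reloaded

today = date(2024, 6, 30)
stored = [
    {"date": date(2024, 6, 1), "amount": 100.0},
    {"date": date(2024, 6, 29), "amount": 40.0},  # since corrected in the source
]
source = [
    {"date": date(2024, 6, 1), "amount": 100.0},
    {"date": date(2024, 6, 29), "amount": 45.0},
    {"date": date(2024, 6, 30), "amount": 70.0},
]

refreshed = incremental_refresh(stored, source, today)
print([row["amount"] for row in refreshed])
```

The window needs to be wide enough to catch late corrections; anything changed further back than that will be missed until a full reload.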
Intercompany
Trading between your own companies, which has to be identified and removed before group figures make sense, or you count the same money twice.
One subsidiary selling to another, netted out of the consolidated accounts.
Why it matters: Forget it and the group looks bigger and more profitable than it is, which auditors notice.
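A small worked example of the double-counting, assuming a simplified ledger where each sale records a seller and a buyer. The company names, amounts and function name are invented, and a real consolidation also has to eliminate the matching costs and balances.

```python
group_companies = {"UK Ltd", "Ireland Ltd"}

sales = [
    {"seller": "UK Ltd", "buyer": "Ireland Ltd", "amount": 200.0},  # intercompany
    {"seller": "UK Ltd", "buyer": "Outside Customer", "amount": 500.0},
    {"seller": "Ireland Ltd", "buyer": "Another Customer", "amount": 300.0},
]

def consolidated_revenue(sales, group):
    """Group revenue with sales between group companies removed."""
    return sum(sale["amount"] for sale in sales if sale["buyer"] not in group)

raw_total = sum(sale["amount"] for sale in sales)
print(raw_total)                                     # 1000.0, overstated
print(consolidated_revenue(sales, group_companies))  # 800.0, what the group really earned
```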
KPI
(Key Performance Indicator) The handful of numbers that show whether the business is on track. The skill is keeping the list short, because a hundred indicators is the same as none.
Monthly recurring revenue, debtor days and gross margin: the three the board acts on.
Why it matters: Most dashboards track far too much; the value is in choosing the few that change a decision.
Lakehouse
A store that combines the flexibility of a data lake with the structure of a warehouse, suited to larger or semi-structured data. It is a core building block of Microsoft Fabric.
Useful when you have volume and variety that a tidy warehouse alone would struggle with.
Large language model
(LLM) The kind of AI behind assistants like Copilot, trained on vast amounts of text to understand and generate language. It is excellent with words and is not, by itself, a source of truth about your business.
The technology that lets you ask a question in plain English and get a written answer.
Machine learning
(ML) Learning patterns from your history to say something useful about what happens next. It is prediction from data, not reasoning, and it learns from the records you already keep.
Learning from the customers who left to spot the next ones at risk.
Why it matters: You usually do not need new data or a data scientist to start - you need a clear question your data can answer.
Master data
(MDM) The core reference data of the business, such as the definitive list of customers or products, kept consistent so everything else lines up.
One agreed customer list, rather than three that nearly match.
Why it matters: Much of what looks like a reporting problem is really a master data problem in disguise.
Measure
A calculation in the model, such as gross margin or year-to-date sales, defined once and reused everywhere it is needed. Agreeing your measures is most of what makes reporting trustworthy.
One agreed margin measure, used across a dozen reports, rather than a dozen slightly different versions.
Medallion architecture
A common way to organise data in layers, from raw (bronze) through cleaned (silver) to ready-for-reporting (gold), so quality improves at each stage. It keeps a build orderly and traceable.
Raw exports land in bronze, are cleaned into silver, and shaped for reports in gold.
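The three layers can be sketched as two small steps over an invented export. The cleaning rules here (trim and tidy the customer name, drop rows whose amount cannot be read) are assumptions chosen for the example, as are the function names.

```python
# Bronze: the raw export, kept exactly as it arrived.
bronze = [
    {"customer": " acme ", "amount": "100.50"},
    {"customer": "ACME", "amount": "49.50"},
    {"customer": "Bolt", "amount": "n/a"},  # unreadable amount
]

def to_silver(bronze_rows):
    """Silver: cleaned and typed, with unreadable rows left out."""
    silver = []
    for row in bronze_rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # in a real build, log or quarantine rather than silently drop
        silver.append({"customer": row["customer"].strip().title(), "amount": amount})
    return silver

def to_gold(silver_rows):
    """Gold: shaped for reporting, here total sales per customer."""
    totals = {}
    for row in silver_rows:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals

silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Because bronze is never altered, any gold figure can be traced back to the rows it came from.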
Metric
Any number you measure. A KPI is a metric that matters enough to act on; the great majority of metrics are not KPIs and never should be.
Microsoft Fabric
Microsoft's unified data platform, bringing a lakehouse, warehouse, data engineering, real-time and Power BI together on one capacity and one bill, inside your own tenant. It is where Microsoft is heading, which is not the same as where every business needs to be today.
Instead of separate tools stitched together, one platform under one roof.
Why it matters: Adopting it early, before the work demands it, is a common and expensive mistake.
Model
In AI, the thing that has learned the pattern and produces the prediction. It is built once from history, then used again and again, and kept up to date as the world moves.
A churn model that scores every customer each night.
Model Context Protocol
(MCP) An emerging standard for connecting AI assistants to your tools and data in a controlled way. It is how agents are given safe, governed access to act.
Letting an assistant query a governed dataset without handing it the keys to everything.
Natural language analytics
Asking questions of your data in ordinary words and getting an answer, rather than building or reading a report. It depends entirely on a clean semantic layer to be trustworthy.
Point it at a governed model and it helps; point it at a mess and it answers wrongly, with confidence.
Why it matters: The demo always impresses; what decides whether it is safe is the model nobody demos.
Natural language processing
(NLP) The branch of AI concerned with understanding and generating human language. It is what lets you ask questions in plain words and have software understand them.
Typing a question to a report and getting an answer back.
Object-level security
Hiding whole tables or fields from people who should not see them, as opposed to hiding individual rows. It works alongside row-level security.
Hiding the salary table entirely from everyone outside HR and finance.
On-premises data gateway
A small piece of software that lets Power BI in the cloud reach data that still lives on your own servers, securely. It is the bridge between on-premises systems and cloud reporting.
Refreshing a Power BI report from a database that still sits in your own server room.
OneLake
Fabric's single storage layer, where data lives once and every Fabric tool reads from the same copy, rather than each keeping its own.
Your Business Central and sales data land in OneLake and are used everywhere without copying.
Power BI
Microsoft's reporting and analytics tool. It connects to your data, builds a model with agreed definitions, and turns that into reports people can read and explore. For most mid-market Microsoft businesses it is the natural choice.
The dashboards your board reviews each month are usually built in Power BI.
Why it matters: It is only ever as good as the model beneath it, which is where the real work sits.
Power BI Desktop and Service
The two halves of Power BI. Desktop is the free Windows application where reports are built; the Service is the cloud where they are published, shared and refreshed.
An analyst builds in Desktop, then publishes to the Service for the business to read.
Power BI Pro
The standard per-user Power BI licence, needed to create content and, below an F64 capacity, to view shared content. It is the default licence for most users.
Below an F64 capacity, every person reading reports needs a Pro licence.
Why it matters: The maths between licensing everyone on Pro and buying a capacity is exactly where money is won or lost.
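The maths can be sketched as a simple break-even, using the rule described in this glossary: below F64 everyone needs Pro, while at F64 and above only the people who create content do. The prices below are placeholders, not real Microsoft prices, and the sketch ignores discounts and other licence types.

```python
PRO_PRICE = 10.0         # placeholder monthly price per user, not a real figure
CAPACITY_PRICE = 5000.0  # placeholder monthly price for an F64 capacity

def pro_only_cost(creators, viewers, pro_price):
    """Everyone, creator or viewer, holds a Pro licence."""
    return (creators + viewers) * pro_price

def capacity_cost(creators, capacity_price, pro_price):
    """Capacity covers viewing; creators still hold Pro."""
    return capacity_price + creators * pro_price

def break_even_viewers(capacity_price, pro_price):
    """Number of viewers at which the two options cost the same."""
    return capacity_price / pro_price

print(pro_only_cost(5, 100, PRO_PRICE))               # 1050.0
print(capacity_cost(5, CAPACITY_PRICE, PRO_PRICE))    # 5050.0
print(break_even_viewers(CAPACITY_PRICE, PRO_PRICE))  # 500.0
```

With these placeholder prices, a business with 100 viewers is far better off on Pro; the capacity only pays for itself on licensing grounds beyond 500 viewers.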
Power Platform
Microsoft's low-code family - Power BI, Power Apps, Power Automate and Power Pages - for reporting, building apps and automating work without heavy development.
A Power Automate flow that emails a report the moment a target is missed.
Power Query
The tool inside Power BI for cleaning and shaping data before it reaches the model. It is the unglamorous step that decides whether the model is sound.
Trimming, renaming and joining raw exports into tidy tables before any reporting is built.
Predictive analytics
Using your history to say what is likely to happen next. This is where machine learning starts to earn its keep, and where data you already hold becomes forward-looking.
Scoring which customers are most likely to lapse next quarter, so you can act first.
Why it matters: A few weeks of warning is usually enough to change the outcome, which a backward-looking report can never give you.
Prescriptive analytics
The furthest rung: not just what will happen, but what to do about it. It pairs a prediction with a recommended action.
Not only that a line will run short, but how much to reorder and when.
Prompt
The instruction or question you give an AI assistant. Clearer, more specific prompts get noticeably better answers.
Pyramid Analytics
A decision intelligence platform that brings data preparation, analytics and data science together in one governed, no-code environment. In March 2026 it was acquired by ServiceNow, which is folding its semantic and modelling layer into the ServiceNow AI platform.
One governed model serving a business reader, an analyst and a data scientist - and now the semantic layer that grounds ServiceNow's AI agents in agreed definitions.
Why it matters: As Pyramid Partner of the Year, Hopton can give a balanced view on what the new direction means for existing users.
Relationship
The link between tables that lets the model combine them, so a sale knows which customer and product it belongs to. Relationships are what turn separate tables into a model.
Report and dashboard
A report is a set of pages you explore and drill into; a dashboard is a single summary view of the headlines. The distinction helps you design for the audience.
A one-page board dashboard, backed by a detailed report the team drills into.
Retrieval-augmented generation
(RAG) Giving an AI assistant your own trusted documents or data to answer from, rather than relying only on what it learned in training, so answers are grounded in your facts.
An assistant that answers from your own policies and reports, not the general internet.
RFM
(Recency, Frequency, Monetary) A simple, robust way of segmenting customers by how recently and how often they buy and how much they spend. It is often the best first step into predictive work.
Spotting your best customers, and the ones quietly slipping away.
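The three numbers come straight from an order history. The sketch below works them out for an invented list of orders; the function name and data are illustrative, and how you then band customers into segments is a separate choice.

```python
from datetime import date

def rfm(orders, as_of):
    """Recency (days since last order), frequency (order count) and
    monetary (total spend) per customer."""
    summary = {}
    for customer, order_date, amount in orders:
        entry = summary.setdefault(
            customer, {"last_order": order_date, "frequency": 0, "monetary": 0.0}
        )
        entry["last_order"] = max(entry["last_order"], order_date)
        entry["frequency"] += 1
        entry["monetary"] += amount
    return {
        customer: {
            "recency_days": (as_of - entry["last_order"]).days,
            "frequency": entry["frequency"],
            "monetary": entry["monetary"],
        }
        for customer, entry in summary.items()
    }

# Invented orders: (customer, order date, amount).
orders = [
    ("A", date(2024, 6, 1), 100.0),
    ("A", date(2024, 6, 20), 50.0),
    ("B", date(2024, 1, 5), 400.0),
]

scores = rfm(orders, as_of=date(2024, 6, 30))
print(scores["A"])  # recent and regular
print(scores["B"])  # a big spender who has gone quiet
```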
Row-level security
A rule that limits which rows of data each person can see, tied to their identity, so one report shows each user only what they are allowed to.
A regional manager opens the company report and sees only their own region's figures.
Why it matters: Get it wrong and a friendly report becomes the quickest route to someone seeing what they should not.
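In spirit, row-level security is a filter applied before anyone sees the data. The sketch below illustrates the principle in plain Python rather than Power BI's own role definitions; the users, regions and function name are invented. It denies by default, so an unknown user sees nothing.

```python
# Which regions each person may see (invented users).
user_regions = {
    "north.manager@example.com": {"North"},
    "finance.director@example.com": {"North", "South"},
}

report_rows = [
    {"region": "North", "sales": 120.0},
    {"region": "South", "sales": 80.0},
]

def visible_rows(rows, user):
    """Return only the rows this user is allowed to see."""
    allowed = user_regions.get(user, set())  # unknown users get an empty set
    return [row for row in rows if row["region"] in allowed]

print(visible_rows(report_rows, "north.manager@example.com"))
print(visible_rows(report_rows, "someone.else@example.com"))
```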
Self-service BI
Letting people answer their own questions from a trusted model, rather than queue for the data team. It works well when the model underneath is governed, and descends into chaos when it is not.
A manager builds their own view from certified data without waiting on anyone.
Why it matters: Self-service on an ungoverned estate just multiplies the number of conflicting versions of the truth.
Semantic model
The shared layer where your data is modelled and each measure is defined once. It is the real asset in any Power BI estate, because every report, and every AI assistant, reads from it.
Define net revenue once in the model, and every report - and Copilot - uses the same figure.
Why it matters: The dashboards are disposable; the model is the thing worth investing in and protecting.
Single source of truth
One agreed place each number comes from, so two people asking the same question get the same answer. In practice it usually means one governed model that every report and assistant reads from.
When finance and sales quote the same revenue figure without arguing, you have one; when they do not, you do not.
Why it matters: Most reporting disputes are not about the data - they are about there being several versions of it.
Snowflake
A cloud data warehouse known for scaling storage and compute separately and working across clouds. Like Databricks, it answers a problem of scale that most mid-market businesses do not yet have.
Large-scale warehousing, or sharing data across organisations.
Star schema
A way of structuring data for reporting, with fact tables in the middle surrounded by dimension tables. It is the foundation of a model that is both fast and easy to understand.
A sales fact table linked to customer, product and date dimensions.
Why it matters: Most slow, confusing models are slow and confusing because they ignored this simple shape.
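The shape is simple enough to sketch with ordinary Python dictionaries: one fact table of sales in the middle, each row pointing at a customer and a product dimension. All names and figures are invented; the point is that any "sales by something" question is the same short lookup.

```python
from collections import defaultdict

# Dimension tables: the descriptive context, keyed by id.
customers = {
    1: {"name": "Acme", "region": "North"},
    2: {"name": "Bolt", "region": "South"},
}
products = {
    10: {"name": "Widget", "category": "Hardware"},
    11: {"name": "Licence", "category": "Software"},
}

# Fact table: the events being measured, holding only keys and numbers.
sales = [
    {"customer_id": 1, "product_id": 10, "amount": 120.0},
    {"customer_id": 1, "product_id": 11, "amount": 80.0},
    {"customer_id": 2, "product_id": 10, "amount": 50.0},
]

def total_by(fact_rows, dimension, key, attribute):
    """Sum the fact amounts, grouped by one attribute of one dimension."""
    totals = defaultdict(float)
    for row in fact_rows:
        totals[dimension[row[key]][attribute]] += row["amount"]
    return dict(totals)

print(total_by(sales, customers, "customer_id", "region"))
print(total_by(sales, products, "product_id", "category"))
```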
Structured and unstructured data
Structured data sits in neat rows and columns, like a sales ledger. Unstructured data does not, like emails, documents, photos or call notes.
Your order lines are structured; the free-text notes your service team types are unstructured.
Supervised and unsupervised learning
Two broad styles of machine learning. Supervised learning trains on examples with known outcomes; unsupervised learning finds structure without being told the answer.
Training data
The history a model learns from. For most businesses this is the sales, service and order data already sitting in their systems, not something new to gather.
Three years of orders is the training set a demand model learns from.
Why it matters: You have been collecting the training set for years; the barrier is rarely volume, it is whether it is clean and joined up.
UK data protection
(UK GDPR) The rules governing personal data in the UK. They apply in full to anything AI does with personal data; the technology gets no exemption.
An assistant answering questions about customers still has to respect access, consent and security.
Why it matters: The most common and avoidable breach is people pasting business or client data into public AI tools.
Vector database
A store designed to find things by meaning rather than exact match, which is how AI assistants retrieve the right information to answer from.
Finding the three most relevant documents to answer a question, even when they use different words.
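"By meaning" works because each document is turned into a list of numbers (a vector), and similar meanings end up pointing in similar directions. The sketch below assumes cosine similarity as the comparison and uses tiny hand-made vectors as stand-ins; a real system gets its vectors from an embedding model and searches millions of them with specialised indexes.

```python
import math

def cosine_similarity(a, b):
    """1.0 means the same direction; near 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_relevant(query_vector, documents, top_k=2):
    """Titles of the top_k documents closest in meaning to the query."""
    ranked = sorted(
        documents,
        key=lambda title: cosine_similarity(query_vector, documents[title]),
        reverse=True,
    )
    return ranked[:top_k]

# Hand-made stand-in vectors, not real embeddings.
documents = {
    "refund policy": [0.9, 0.1, 0.0],
    "holiday policy": [0.1, 0.9, 0.0],
    "returns process": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]  # stands in for "how do I get my money back?"

print(most_relevant(query, documents))
```

The query shares no words with "returns process", yet that document still ranks second because its vector points the same way.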
When plain English is not enough
Knowing the words is a start.
Turning your own data into something the whole business can trust is the work. If any term in here raised a question about your own data, we are happy to talk it through.