Glossary

A
API (Application Programming Interface)
A set of programming standards and procedures for accessing or building web-based software applications.
API Marketplaces
An API marketplace is, like any marketplace, a platform that brings together two stakeholders: one engaged in selling and one engaged in purchasing. Applied to APIs (application programming interfaces), it is where providers list their APIs and consumers buy access to them.
Algorithm
A mathematical formula or statistical process used to analyse data.
Application
Software that enables a computer to perform a certain task.
Augmented Analytics
Augmented Analytics refers to the use of machine learning and natural language processing to improve data analysis and sharing.
AutoML
Automated Machine Learning (AutoML) is the end-to-end automation of the process of applying machine learning to real-world problems.
Automation
Automation is the creation and use of technology to oversee, execute, and control the production of goods and services, allowing tasks previously performed by humans to be carried out with little or no human intervention.
B
Big Data
Big data describes a large amount of data that is constantly growing. The data is diverse and can include structured, semi-structured, and unstructured data that can be used for machine learning and advanced analytics.
Blockchain
A blockchain is a system of records (called blocks) linked together across a peer-to-peer network and secured using cryptography. Each block contains a cryptographic hash of the previous block, a timestamp, and transaction data.
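For intuition, here is a minimal Python sketch of the hash-linking described above; the field names and single-machine setup are illustrative simplifications, not a real blockchain protocol.

```python
# A toy sketch of hash-linked blocks; field names are illustrative.
import hashlib
import json
import time

def make_block(data, previous_hash):
    block = {
        "timestamp": time.time(),
        "data": data,
        "previous_hash": previous_hash,
    }
    # The block's hash covers its contents, including the previous
    # block's hash, which is what links the chain together.
    block["hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()
    ).hexdigest()
    return block

genesis = make_block("genesis", previous_hash="0" * 64)
second = make_block("tx: alice -> bob, 5", previous_hash=genesis["hash"])
print(second["previous_hash"] == genesis["hash"])  # True
```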
C
Consumer Analytics
Consumer analytics involves the process of collecting and analysing large amounts of customer data. This data can offer companies valuable insights about their customers, such as their behaviour, likes, and dislikes, and helps companies make informed decisions about marketing and customer relationship management.
Continuous Intelligence
Continuous intelligence refers to real-time analytics in which business value is continuously derived from all available data.
D
Data Centre
A data centre is a collection of computer servers (also switches, firewalls, and routers in some cases) used to store, process, and distribute massive amounts of data.
Data Dashboarding
A data dashboard is a tool that provides an interactive and centralized format for monitoring, measuring, and analysing key metrics.
Data Engineering
Data engineering is a branch of data science that deals with the mechanisms of data collection and analysis. Data engineers ensure that the data used by a company is accurate, reliable, and organized.
Data Lakes
A data lake is a system that stores data in its raw, native format, structured or unstructured and of any size, with no fixed boundaries imposed on files.
Data Management
Data management refers to everything related to the use of data as an indispensable resource. It encompasses the collection, storage and protection of data to ensure the data remains accessible and reliable.
Data Modelling
Data modelling is the process of creating a data model for the data to be stored in a database. This data model is a conceptual representation of data objects, the associations between different data objects, and the rules governing them.
Deep learning
Deep learning is an area of machine learning and artificial intelligence in which algorithms based on multi-layered neural networks can learn, often without supervision, from large amounts of unstructured data.
Digital Ethics
Digital Ethics is the study of how to conduct oneself ethically, responsibly and professionally across digital platforms.
Digital Transformation
Digital transformation involves the use of technology to transform business processes, the organization’s internal culture, and customer relationship management.
E
Embedded Analytics
Embedded analytics builds data analysis and business intelligence capabilities directly into business applications, making them more accessible and enabling users to work smarter and more efficiently.
H
Historical Data
Historical data refers to data collected in the past. This is often used as a basis for predicting future trends.
I
Innovation
Innovation is the process of developing and adopting new creative production techniques or ways of thinking.
Insights
An insight is an in-depth and accurate understanding of a complex problem.
Integration
Integration is a mathematical term for the process of solving an equation that requires finding one or more integrals.
Intelligence
Intelligence refers to the ability to understand concepts, make judgments, and apply acquired knowledge.
IoT
IoT (Internet of Things) is the network of interconnected physical devices that are connected to the Internet, have unique identifiers, and can independently transmit data over the network.
IoT Edge Analytics
IoT Edge Analytics is a tool that enables companies and organizations to process data closer to its source, allowing data generated by sensor-rich assets or devices to be pre-processed in real time.
J
Juridical Data Compliance
The use of data stored in a country must comply with the laws of that country. Relevant when using cloud solutions with data stored in different countries or continents.
K
KPI
KPI stands for Key Performance Indicator. It is usually used in the context of business/marketing analysis as a type of performance measurement to evaluate the success of an organization or a specific campaign. Examples of KPIs applied to eCommerce: Cart Abandonment Rate, Conversion Rate, Cost of Customer Acquisition, Customer Lifetime Value, Average Order Value, and Gross Profit Margin.
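As a concrete illustration, several of the eCommerce KPIs listed above reduce to simple arithmetic; the figures below are hypothetical.

```python
# Illustrative KPI arithmetic with made-up eCommerce figures.
visits, carts, orders, revenue = 10_000, 1_200, 300, 21_000.0

conversion_rate = orders / visits          # orders per site visit
cart_abandonment = 1 - orders / carts      # carts that never convert
average_order_value = revenue / orders     # revenue per order

print(f"Conversion rate:  {conversion_rate:.1%}")    # 3.0%
print(f"Cart abandonment: {cart_abandonment:.1%}")   # 75.0%
print(f"Avg order value:  {average_order_value:.2f}")  # 70.00
```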
Knowledge Graphs
A knowledge graph is a knowledge base that represents entities and the relationships between them. Google's Knowledge Graph, which enriches search results with information from a variety of sources, is a well-known example.
L
Latency
Any delay in a response or transmission of data from one point to another.
Load Balancing
The process of distributing workloads across a computer network or cluster of computers to optimize performance.
Location Analytics
Location Analytics brings mapping and map-driven analysis to enterprise systems and data warehouses, allowing you to associate geospatial datasets with business data.
Location Data
GPS data describing a geographic location.
M
Machine-generated Data
Data created automatically by machines via sensors or algorithms or other non-human sources.
Machine Learning
A method of designing systems that can learn, adapt, and improve based on the data fed to them. Using predictive and statistical algorithms fed into these machines, they continually learn and target correct behavior and insights, and they continually improve as more data flows through the system.
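A minimal sketch of this learn-adapt-predict loop, using scikit-learn (an assumed dependency, not named above) and made-up toy data:

```python
# Fit a simple model on labelled data, then predict on unseen input.
from sklearn.linear_model import LogisticRegression

# Toy training data: hours studied -> passed (1) or failed (0).
X_train = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y_train = [0, 0, 0, 1, 1, 1]

model = LogisticRegression()
model.fit(X_train, y_train)    # the model adapts to the data it is fed

print(model.predict([[4.5]]))  # prediction for new data, e.g. [1]
```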
MapReduce
A programming model for processing and generating large amounts of data. This model does two different things. First, map transforms one data set into another, more useful and decomposed data set made up of parts called tuples; tuples can typically be processed independently across multiple processors. Second, reduce takes all the decomposed, processed tuples and combines their outputs into a usable result.
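The two phases can be sketched in plain Python; this toy word count stands in for a distributed MapReduce job.

```python
# Map emits (word, 1) tuples; reduce combines the tuples per key.
from collections import defaultdict

documents = ["big data", "big ideas", "data lakes hold data"]

# Map: transform each document into decomposed (key, value) tuples.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Reduce: group tuples by key and combine their values.
counts = defaultdict(int)
for word, n in mapped:
    counts[word] += n

print(dict(counts))  # {'big': 2, 'data': 3, 'ideas': 1, 'lakes': 1, 'hold': 1}
```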
Massively Parallel Processing (MPP)
Using many different processors (or computers) to perform specific computing tasks simultaneously.
Mean
The arithmetic average of the data. The population mean is denoted by μ (the Greek letter mu) and the sample mean by x̄ (x-bar).
Median
The middle value of a data set when its values are ordered by magnitude.
Metadata
Data about data; it provides information about the data itself, for example where the data points were collected.
Mode
The measurement that occurs most frequently in a data set.
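The mean, median, and mode defined above can all be computed with Python's standard library; the data below is made up.

```python
# The three measures of central tendency on a small sample.
import statistics

data = [2, 3, 3, 5, 7, 10]

print(statistics.mean(data))    # 5.0  (arithmetic average)
print(statistics.median(data))  # 4.0  (middle of the ordered values)
print(statistics.mode(data))    # 3    (most frequent value)
```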
Multi-dimensional Databases
A database optimized for Online Analytical Processing (OLAP) applications and data warehousing.
Moving Average
The moving average is a technical indicator that allows investors to analyse price action. Based on past prices, the moving average helps smooth out price movements and identify the direction of a trend and its support and resistance levels.
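A simple moving average can be sketched with the standard library; the price series and window size below are illustrative.

```python
# Average each sliding window of `window` consecutive prices.
from collections import deque

def moving_average(prices, window):
    buf, out = deque(maxlen=window), []
    for p in prices:
        buf.append(p)
        if len(buf) == window:
            out.append(sum(buf) / window)
    return out

prices = [10, 11, 13, 12, 15, 16, 14]
print(moving_average(prices, window=3))
# [11.33..., 12.0, 13.33..., 14.33..., 15.0]
```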
Multivariate Analysis
Multivariate analysis is a technique used to analyse data containing two or more independent variables in order to predict the value of a dependent variable.
N
Network Analysis
Analyzing connections between nodes in a network and the strength of their ties.
Neural Network
Models inspired by the biological structure of the brain. These are used to estimate mathematical functions and enable various types of learning algorithms. Deep learning is a related term, widely regarded as a modern rebranding of the neural network paradigm.
NLP
NLP, or Natural Language Processing, is a subfield of linguistics, computer science, information technology, and artificial intelligence that focuses on the interactions between computers and human language.
Natural-Language Generation
Natural Language Generation (NLG) is the conversion of data into natural language so that it can be understood more easily. This AI technology is mainly used by companies and organizations seeking better customer engagement.
Normal Distribution
The most important continuous probability distribution in statistics is the normal distribution (also known as the Gaussian distribution), the well-known bell curve. Once the mean μ and standard deviation σ are specified, the entire curve is determined.
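Python's standard library (3.8+) models the normal distribution directly; a small illustration of how μ and σ determine the curve:

```python
# A standard normal distribution, fully specified by mu and sigma.
from statistics import NormalDist

dist = NormalDist(mu=0, sigma=1)

print(dist.pdf(0))      # height at the peak of the bell curve, ~0.3989
print(dist.cdf(1.96))   # ~0.975: 97.5% of values fall below 1.96
```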
NoSQL (Not ONLY SQL)
A broad class of database management systems that depart from the widely used relational database model. NoSQL databases are not primarily based on tables and generally do not use SQL for data manipulation. They are designed to handle large amounts of data and are often well suited to big data systems thanks to their flexibility and the distributed-first architecture required for large unstructured databases.
Null Hypothesis
A statement of no change or difference; it is assumed to be true until sufficient evidence is presented to disprove it.
O
Operational Databases
Databases that support the regular operations of an organization and are generally very important to the business. They typically use online transaction processing (OLTP), which allows them to enter, collect, and retrieve specific information about the organization.
Optimization Analysis
The process of finding optimal problem parameters subject to constraints. Optimization algorithms heuristically test a variety of parameter configurations to find an optimal result, as determined by an objective function (also called a fitness function).
Outlier Detection
The identification of data points that deviate significantly from the general average within a data set or combination of data. An outlier is numerically distant from the rest of the data, indicating that something is out of the ordinary and generally requiring additional analysis.
OpenCV
OpenCV (Open Source Computer Vision) is one of the most popular open-source libraries for real-time image processing and machine learning.
Optimization
Optimization refers to the process of improving achievable performance by eliminating undesirable factors. It works on the principle of finding the best available alternative under given constraints, with minimal cost and time.
Ordinal
Ordinal describes data whose values have a natural order or rank relative to others of their kind.
P
PaaS
PaaS is short for Platform as a Service. Businesses use PaaS to host the applications used in their day-to-day work without needing to purchase and manage the underlying infrastructure, which may include servers, networks, and operating systems.
Pattern Recognition
Identifying patterns in data using algorithms to make predictions about new data from the same source.
Population
A record consisting of all members of a group. Descriptive parameters (e.g. μ, σ) are used to describe the population.
Predictive Modelling
The process of developing a model that is most likely to predict a trend or outcome.
Probability Distribution
A statistical function that describes all possible values and probabilities that a random variable can take on within a given range. Probability distributions can be discrete or continuous.
Public Data
Public information or a record created with public funds.
Q
Qualitative
This refers to the use of analysis or judgment based on information that is not measurable or quantifiable. It is not driven by numbers and therefore relies heavily on social interactions, experimentation, and other intangible approaches.
Quality Assurance
This term refers to ensuring the quality of a service or product, particularly the assurance that all requirements are met. Quality assurance involves making sure that errors and defects are avoided, problems are resolved, and quality is kept under control.
Quantile Deviation
A measure of dispersion that tells you how data is distributed around a central point; it is typically computed as half the interquartile range.
Quantile Range Outliers
The Quantile Range Outliers method for outlier detection uses the quantile distribution of the values in a column to locate the extreme values. Quantiles are useful for detecting outliers because there is no distributional assumption associated with them. The data is simply sorted from smallest to largest. For example, the 20th percentile is the value where 20% of the values are smaller. Extreme values are found using a multiplier of the interquartile range, the distance between two specified quantiles.
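A sketch of this method in Python, using the conventional 1.5 × IQR multiplier (the multiplier and data are illustrative):

```python
# Flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] as outliers.
import statistics

data = [12, 13, 13, 14, 15, 15, 16, 16, 17, 48]

q1, _, q3 = statistics.quantiles(data, n=4)  # 25th and 75th percentiles
iqr = q3 - q1                                # interquartile range
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = [x for x in data if x < low or x > high]
print(outliers)  # [48]
```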
Query
A request for information from a database to answer a certain question.
R
Real Time Data
Data created, processed, stored, analyzed and visualized within milliseconds.
Regression Analysis
Regression is a statistical method used to determine the relationship between the mean of one variable (the dependent variable) and the corresponding values of another variable (the independent variable).
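A minimal illustration using the standard library's linear_regression (Python 3.10+), with made-up data:

```python
# Ordinary least-squares fit of a dependent variable y on an
# independent variable x.
from statistics import linear_regression

x = [1, 2, 3, 4, 5]             # independent variable
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # dependent variable

slope, intercept = linear_regression(x, y)
print(f"y ~ {slope:.2f} * x + {intercept:.2f}")  # roughly y ~ 2x
```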
Regular Expressions
Regular expressions are strings of characters that define a search pattern, used to match patterns within text.
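A small example of such a search pattern using Python's built-in re module:

```python
# Find ISO-style dates (YYYY-MM-DD) in free text.
import re

pattern = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

text = "Orders shipped on 2021-03-15 and 2021-04-02."
print(pattern.findall(text))  # ['2021-03-15', '2021-04-02']
```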
Reinforcement learning
Reinforcement learning is a machine learning paradigm that focuses on taking action to maximize reward. It asks software and machines to determine the best method, behavior, or path, depending on what the situation calls for.
Resource
Resource refers to a primary supply required to perform a specific activity. Resources can be quantitative (measurable) or qualitative.
Robotic Process Automation Software
Robotic process automation (RPA) software is a type of technology that allows anyone to configure software, or a "robot", to emulate and integrate the actions of a human interacting with digital systems in order to carry out a business process. In other words, it provides a digital workforce.
S
Sample
A dataset consisting of only a subset of the members of a population. Sample statistics are used to draw conclusions about the entire population from the measurements of a sample.
Scalability
The ability of a system or process to maintain an acceptable level of performance as workload or scope increases.
Semi-structured Data
Data that is not structured by a formal data model but offers other ways (tags or other markings) to describe the data hierarchies.
Sentiment Analysis
The application of statistical functions and probability theory to comments people make online or on social media to determine how they feel about a product, service, or company.
Significant Difference
The term used to describe the results of a statistical hypothesis test where a difference is too large to reasonably be attributed to chance.
Single-variance Test (Chi-square Test)
Compares the variance of a sample of data to a target value. Uses the chi-square distribution.
Software as a Service (SaaS)
Enables vendors to host an application and make it available via the internet (cloud servicing). SaaS providers deliver software over the cloud rather than as installed copies.
Spark (Apache Spark)
A fast, open-source, in-memory data processing engine to efficiently run streaming, machine learning, or SQL workloads that require rapid iterative access to datasets. Spark is generally much faster than MapReduce.
Spatial Analysis
Analyzing spatial data, such as geographic data or topological data, to identify and understand patterns and regularities within data distributed across a geographic space.
SQL (Structured Query Language)
A programming language for retrieving data from a relational database.
Stream Processing
Stream processing is designed to operate on real-time and streaming data with continuous queries. Combined with streaming analytics (i.e. the ability to continuously compute mathematical or statistical analyses during the stream), stream processing solutions are designed to process large volumes in real time.
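A toy sketch of such a continuous query in Python; the generator below stands in for a real-time source such as a message queue (a hypothetical setup), and a per-key aggregate is kept continuously up to date.

```python
# Process events one at a time, keeping a running average per key.
from collections import defaultdict

def event_stream():
    # Stand-in for a real-time source of (sensor, reading) events.
    yield from [("sensor-a", 21.5), ("sensor-b", 19.0),
                ("sensor-a", 22.0), ("sensor-b", 18.5)]

totals, counts = defaultdict(float), defaultdict(int)
for sensor, reading in event_stream():
    totals[sensor] += reading
    counts[sensor] += 1
    # The "continuous query": an always-current average per sensor.
    print(sensor, round(totals[sensor] / counts[sensor], 2))
```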
Structured Data
Data organized according to a given structure.
Sum of Squares
In ANOVA, the total sum of squares expresses the total variation that can be attributed to different factors. In the ANOVA table, %SS is the factor sum of squares divided by the total sum of squares, similar to R² in regression.
T
Text Analytics
Text analytics is the process of drawing meaning out of written communication, often in the context of enhancing customer experience. It involves examining text and finding patterns or interests that can inform action.
Text Mining
Text mining is a method for analyzing unstructured data. In other words, it derives high-quality information from texts. The amount of data is large, so it is processed and structured using software that identifies patterns, themes, keywords, etc.
Time Series
Time series refers to a sequence of numerical data points indexed at equally spaced points in time. It is used to track the movement and progress of each of the data points.
Time Series Modelling
Time series modelling is a method of forecasting that uses time-based data, such as the time series described above, to gain further insights. Time-based data is tracked and analysed over a set period of time in order to inform later decisions.
Trend Analytics
Trend analytics is data analysis bounded by two points in time. It is a business-driven approach that helps organizations capitalize on current trends at the right time.
Transactional Data
Data that relates to the conducting of business, such as accounts payable and receivable data or product shipments data.
Two Sample t-test
A statistical test to compare the means of two samples of data. Uses the t-distribution.
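A quick illustration using SciPy (an assumed dependency), with made-up samples:

```python
# Compare the means of two independent samples with a t-test.
from scipy import stats

sample_a = [5.1, 5.3, 5.8, 5.5, 5.2]
sample_b = [4.4, 4.9, 4.6, 4.8, 4.5]

result = stats.ttest_ind(sample_a, sample_b)
# A large |t| and a small p-value suggest a real difference in means.
print(result.statistic, result.pvalue)
```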
Type I Error
The error that occurs when the null hypothesis is rejected when it is actually true.
Type II Error
The error that occurs when the null hypothesis is not rejected when in fact it is false.
U
Univariate
A mathematical term describing data composed of observations of a single characteristic or attribute.
Uniform
Uniform is a term used to describe things in a group that are standardized or identical.
Union
In terms of data, a union is a user-defined data type available in C that contains variables of other data types in the same memory location.
Univariate analysis
This form of data analysis is performed on a single variable. In such an analysis, each variable in the data set is carefully examined and summarized on its own.
Unknown Variable
An unknown variable is a variable whose value is unspecified, mysterious, or arbitrary.
Unstructured Data
Unstructured data refers to information that is not arranged according to a pre-defined data model, such as text and multimedia content that does not fit neatly in a database.
Unsupervised learning
Unsupervised learning is when an AI (artificial intelligence) algorithm or machine learns from data without labelled outputs. The algorithm works on its own, based on the information provided, without any guidance or help.
V
Variable
A variable refers to a numeric value, property, or quantity that increases or decreases depending on the situation.
Venn Diagram
A Venn diagram is a pictorial representation of mathematical sets, drawn as multiple overlapping closed curves used to organize information visually.
Video/Image Analytics
Video/image analytics tools categorize images and videos, for example from social media, sorting them according to the same attributes applied to text: gender, age, facial expressions, objects, actions, scenes, topics, sentiment, and brand logos.
Virtual Assistants
This term describes a profession in which people do administrative, technical, or creative work from the comfort of their home or another remote location.
Virtual Reality
Virtual reality refers to a simulated environment, created using computer technology, in which the user is immersed in the experience.
Visualization
Visualization is a graphical method used to communicate a message, often through images, animation, or charts.
W
Web Analytics
Web analytics is the analysis of data based on the behavior of visitors on a particular website. It is basically used to optimize websites to attract more visitors.
Web Design
Web design is simply defined as the process of creating websites. It takes into account various aesthetic factors including layout, content and graphics.
Z
Zero
Zero is both a number and the numeric digit used to represent that number in numerals. The number 0 fulfills a central role in mathematics as the additive identity of the integers, real numbers, and many other algebraic structures. As a digit, 0 is used as a placeholder in place-value systems.
