Help Me Understand Data Warehouse, Data Mart, ETL, ELT, Data Lake, DaaS …

Databases and data management has its own set of technical terminology, descriptive terms, and jargon. When dealing with databases (and related processes such as ETL) it’s important to understand what these terms mean and the processes they refer to. This is important not just from a technological perspective but also from a business one. Understanding specific industry terms allows you to better select the services your company requires and what the scope of these services should be. This is important because many companies waste money every year on services they don’t need and features they don’t use. The opposite is also true as more than one company has had severe negative business impacts from not having the database infrastructure they need to perform daily tasks.

What follows is a brief lexicon of common industry terminology. The information provided is meant to be a general overview and not get too detailed on the technological matters at hand. Remember you don’t need to know everything about how a system or service works but understanding what it is on a basic level is highly valuable.

A Brief Lexicon Of  Database Service Terminology

  • The Cloud: a term that’s seemingly impossible to escape in modern networking the cloud simply refers to how data is accessed. Older databases were localized to the physical location of the hardware itself. The cloud simply refers to accessing information via an internet connection. Modern databases can be accessed anywhere (even if the database hardware is thousands of miles away) via the internet. Note that proper software and access configuration (username/password) is still required.

  • DaaS (Data As A Service): chiefly a marketing term this refers to companies offering data based solutions such as cloud services, databases, and related services as a product. Data itself isn’t so much the service as the care, management, and organization of the data itself is what is truly being sold. May also overlap with SaaS (Software as a Service) due to associated database management software many service providers include with their data management services.

  • Data Availability: all database service providers state how your data will be available. This is commonly stated in uptime and ability to respond to outside events such as natural disasters without losing access to information. Some providers can keep service availability in rather extreme conditions however, such services are normally more expensive.

  • Database: a database is simply a centralized hub where information is stored and accessed by a company’s employees. Databases can range greatly in scope from a small database accessed by a dozen employees to huge redundant storage systems accessed by thousands of different users every day. Data Marts and Data Warehouses refer to specific types of databases.

  • Data Mart: a data mart can be thought of as a specialized database. It contains information related to a specific topic, department, or area of interest to your company. As data marts are designed for specific user groups they only contain relevant information. For example, a data mart designed for your HR department likely wouldn’t contain daily sales data as they have no real need for that information.

  • Data Warehouse: a data warehouse collects valuable information from a variety of sources to generate reports, analytical documentation, predictions, and overviews of important company events such as current sales trends, customer behavior, and other key business topics. A data warehouse is your central database of all valuable company information.

  • DBMS (Database Management System): refers to the software used to create, update, and manage a database. Such software is often user-friendly allowing end users to access, update, and add information to a database. The advantages of A DBMS is it allows easy access and updating of database information without the need of a specialized skillset. The key disadvantage is adding a DBMS on top of your existing database system requires more resources be used by the system.

  • ETL (Extract, Transform, Load): refers to the overall process of how data is migrated to a database. The data is extracted from its original location, transformed to the necessary format (it may also be organized and cleaned up as needed), and loaded into the database system so others can access it. The ETL process creates a traditional data warehouse as defined above.

  • ELT (Extract, Load, Transform): works similar to an ETL process, however, the order the steps performed in is changed. In an ELT process data is extracted and loaded into a database before it is transformed. A key drawback of this system is it requires a powerful database system to function but when applied properly it gives you the ability to move a large amount of data without impacting the processes of the source data locations or networks. ELT creates a ‘data lake’ where the information is stored but needs a further transformation to produce organized results.

  • IoT (The Internet of Things): tangibly related to databases IoT refers to everyday objects that have internet connectivity. This can include smart watches, appliances, self-driving cars, and so on. Where this relates to databases is the ever widening variety of tools you can use to access your company’s data (such as checking your work email via your smart watch).

  • NoSQL: the No, in this case, can refer to ‘not only’ or ‘non-relational’. This refers to a database that does not make use of the traditional relational tables with rows and columns. For certain types of data, this can be very useful as a NoSQL database is very efficient, streamlined, and can be run across multiple servers simultaneously a cost efficient way of increasing database functionality.

  • Petabyte: as database technology continues to mature the sheer size of storage space available also continues to grow. While home storage (and many businesses) use gigabytes and terabytes (1,000 gigabytes in size) to judge their storage space a petabyte is 1,000,000 gigabytes of storage and will see further use as a term in the future to describe storage size.

  • SQL (Structured Query Language): the traditional programing language of databases information is stored in a table format with rows and columns. Information is managed with exact commands such as insert, delete, and update. SQL has been in use in its standardized form since the mid-1980s.
  • WEBINAR

    “Prepare your Inbound-Marketing for the world of AI”

    DATE: 26th Oct 2017

    TIME: 15:30 (London)

    Reserve your place
  • Posted in: