What is a data warehouse and what type of data should it contain quizlet?

Terms in this set (60)

A subject-oriented, integrated, time variant, and non-volatile collection of data in support of management's decision-making process

What is a data warehouse

Subjects of enterprise

DW focused around subjects of enterprise or application areas?

Data is represented as snapshots, although the trend is towards real-time data

What is meant by Time Variance

* Potential high returns on investment
* Competitive advantage
* Increased productivity of corporate decision makers

Benefits of Data Warehousing (Business side)

* Underestimation of resources for data loading
* Hidden problems with source systems
* Required data not captured
* Increased end-user demands
* Data homogenization
* High demand for resources
* Data ownership
* High maintenance
* Long duration projects
* Complexity of integration

Problems of Data Warehousing (10)

online transaction processing (OLTP) databases

Main Sources of Data for DW

* Holds current and integrated operational data for analysis.
* Often structured and supplied with data in the same way as the data warehouse.
* Can act as staging area for data to be moved into the warehouse.

Why Operational Data Store (ODS) used?

Legacy systems are unable to fulfil reporting requirements

Problem with using Operational Data Store (ODS)

Internal or external data sources (primarily OLTP )

Explain the tools for Extraction in ETL

* Applies a series of rules to extracted data, determines how data will be used for analysis

* Data summations, data encoding, data merging, data splitting, data calculations, and creation of surrogate keys.

Explain the tools for Transformation in ETL

As data loads, additional constraints defined in the database schema can be activated (such as uniqueness, referential integrity, and mandatory fields)

Explain the tools for Loading in ETL
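
The three ETL cards above can be sketched end to end. This is a minimal illustration, not a real ETL tool: the source rows, the table names (`customer_dim`, `sales_fact`) and the cleaning rules are all assumptions made for the example. It shows transformation rules (cleaning, a calculation, surrogate keys) followed by a load into tables whose uniqueness, referential-integrity and mandatory-field constraints are enforced as rows arrive.

```python
import sqlite3

# Hypothetical rows extracted from an OLTP source (illustrative only).
source_rows = [
    {"order_id": 1, "customer": "acme ", "qty": 2, "unit_price": 10.0},
    {"order_id": 2, "customer": "Acme", "qty": 1, "unit_price": 10.0},
    {"order_id": 3, "customer": "Zenith", "qty": 5, "unit_price": 4.0},
]


def transform(rows):
    """Apply simple rules: clean names, assign surrogate keys, calculate totals."""
    customer_keys = {}  # cleaned customer name -> surrogate key
    facts = []
    for row in rows:
        name = row["customer"].strip().title()  # data cleaning / encoding
        key = customer_keys.setdefault(name, len(customer_keys) + 1)  # surrogate key
        total = row["qty"] * row["unit_price"]  # data calculation
        facts.append((row["order_id"], key, total))
    return customer_keys, facts


def load(conn, customer_keys, facts):
    """Load into tables whose constraints are checked while the data loads."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this for referential integrity
    conn.execute(
        "CREATE TABLE customer_dim ("
        " customer_key INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE sales_fact ("
        " order_id INTEGER PRIMARY KEY,"
        " customer_key INTEGER NOT NULL REFERENCES customer_dim(customer_key),"
        " total REAL NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO customer_dim VALUES (?, ?)",
        [(key, name) for name, key in customer_keys.items()],
    )
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", facts)
    conn.commit()


conn = sqlite3.connect(":memory:")
customer_keys, facts = transform(source_rows)
load(conn, customer_keys, facts)
```

A later insert into `sales_fact` that names a `customer_key` missing from `customer_dim` would raise `sqlite3.IntegrityError`, which is the kind of constraint activation the loading card describes.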

Manage external data sources

* Analysis of data to ensure consistency.
* Transformation and merging of source data from temporary storage into data warehouse tables.
* Creation of indexes and views on base tables.
* Generation of denormalizations (if necessary).
* Generation of aggregations (if necessary).
* Backing-up and archiving data.
* Use of query profiles to determine which indexes and aggregations are appropriate.

Role of warehouse Manager (7)

Directing queries to the appropriate tables and scheduling the execution of queries

Role of Query Manager

* To map data sources to a common view of information within the warehouse.
* To automate the production of summary tables.
* To direct a query to the most appropriate data source.

Role of Metadata

* Load performance
* Load processing
* Data quality management
* Query performance
* Terabyte scalability
* Mass user scalability
* Networked data warehouse
* Warehouse administration
* Integrated dimensional analysis
* Advanced query functionality

Data Warehouse DBMS Requirements (10)

* A database that contains a subset of corporate data to support the analytical requirements of certain area of business

What is a datamart

* To give users access to the data they need to analyse most often.

* To provide data in a form that matches the collective view of the data by a group of users in a department or business application area.

* To improve end-user response time due to the reduction in the volume of data to be accessed.

* To provide appropriately structured data as dictated by the requirements of the end user access tools.

Reasons to use datamarts

* Simpler than DW
* Cost less
* Users easier to identify

Why use Data mart instead of DW

* Which user requirements are most important
* Which data should be considered first?
* Should the project be scaled down into something more manageable?
* Should the infrastructure for a scaled down project be capable of ultimately delivering a full-scale enterprise-wide data warehouse (i.e., do you want it to be scalable in the end)?

4 questions that should be asked at the beginning of every data warehouse project

Because they are simpler and more specific

Why is it often better to begin a project with data marts?

* Interview staff users
* Interview operational staff to see what clean, consistent data sources there are

How are requirements gathered at the beginning of a DW project?

Can give a consistent and comprehensive data strategy

Main advantage of Inmon's Corporate Information Factory strategy

Large and complex project that may not fulfil requirements within the time frame

Main disadvantage of Inmon's Corporate Information Factory strategy

Scaled down project so can demonstrate value faster

Main Advantage of Kimball's Business Dimensional Lifecycle

If data marts are developed by separate teams, the resulting DW may not be consistent

Main disadvantage of Kimball's Business Dimensional Lifecycle

Identify information requirements and associated business uses, Data Warehouse Bus Matrix is produced

How is a DW project begun, and what is produced as a result?

* List key business processes and how they are to be analysed
* Used to create first data mart
* Dimensionality modelling to establish the data model (called star schema) for each data mart.

Purpose of the Data Warehouse Bus Matrix

* Technology (top track)
* Data (middle track)
* Business intelligence (BI) applications (bottom track)

3 Tracks of DW Design

Table with a composite primary key, called the fact table, and a set of smaller tables called dimension tables.

What does Dimensionality modelling produce

Each dimension table has a non-composite primary key that matches one component of the composite key in the fact table

How do the Fact and Dimension Tables interact
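
The fact/dimension relationship described above can be sketched as a tiny star schema. Everything named here (`date_dim`, `product_dim`, `sales_fact`, the sample rows) is an assumption for illustration. Each dimension table has a single-column primary key, and the fact table's composite primary key is made up of those keys. The query constrains on a dimension attribute and sums an additive fact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE date_dim    (date_key    INTEGER PRIMARY KEY, calendar_date TEXT, month TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);

-- Fact table: numeric, additive measures plus one key per dimension.
CREATE TABLE sales_fact (
    date_key    INTEGER NOT NULL REFERENCES date_dim(date_key),
    product_key INTEGER NOT NULL REFERENCES product_dim(product_key),
    units_sold  INTEGER NOT NULL,
    revenue     REAL    NOT NULL,
    PRIMARY KEY (date_key, product_key)
);

INSERT INTO date_dim VALUES (1, '2024-01-15', '2024-01'), (2, '2024-02-03', '2024-02');
INSERT INTO product_dim VALUES (10, 'Widget', 'Hardware'), (11, 'Manual', 'Books');
INSERT INTO sales_fact VALUES (1, 10, 3, 30.0), (1, 11, 1, 12.0), (2, 10, 2, 20.0);
""")

# Dimension attributes (category, month) act as the query constraints and
# grouping labels; the fact table supplies the values being summed.
rows = conn.execute("""
    SELECT d.month, SUM(f.revenue)
    FROM sales_fact AS f
    JOIN date_dim AS d    ON d.date_key = f.date_key
    JOIN product_dim AS p ON p.product_key = f.product_key
    WHERE p.category = 'Hardware'
    GROUP BY d.month
    ORDER BY d.month
""").fetchall()
```

Here `rows` holds one revenue total per month for the Hardware category only.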

Dimension Tables

Fact or Dimension Tables usually contain textual data

Fact Table

Fact or Dimension Tables usually contain numerical and additive values

True

Facts are generated by events that occurred in the past, T/F?

True

Dimension attributes are used as the constraints in data warehouse queries, T/F

speed up query performance by denormalising reference information into a single dimension table (different from snowflake schema)

Why are star schemas used

This gives independence from OLTP keys

Why are natural keys in star schemas replaced with surrogate keys?
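
One way to picture independence from OLTP keys is a small key map, sketched below. The class name `SurrogateKeyMap` and the source-system names are hypothetical. The warehouse hands out its own integer keys, so two source systems that reuse the same natural key do not collide, and a change of key format in a source system does not ripple into the fact table.

```python
from itertools import count


class SurrogateKeyMap:
    """Hands out warehouse-owned integer keys for (source system, natural key) pairs."""

    def __init__(self):
        self._next = count(1)
        self._keys = {}

    def key_for(self, source, natural_key):
        pair = (source, natural_key)
        if pair not in self._keys:
            self._keys[pair] = next(self._next)  # first sighting: allocate a new key
        return self._keys[pair]


keys = SurrogateKeyMap()
crm_key = keys.key_for("crm", "C-100")
billing_key = keys.key_for("billing", "C-100")  # same natural key, different system
crm_again = keys.key_for("crm", "C-100")        # repeat lookup returns the same key
```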

Snowflake schemas have normalised dimension tables, whereas star schemas have denormalised ones

Difference between Snowflake and Star Schemas?

* Efficiency
* Ability to handle changing requirements
* Extensibility
* Ability to model common business situations
* Predictable query processing

Advantages of Dimensionality Modelling

Select Business Process
Declare Grain
Choose Dimensions
Identify Facts

4 Steps of Stage 1 of Kimball's Business Dimensional Lifecycle

* Decide what a record of the fact table is to represent.
* Identify dimensions of the fact table. The grain decision for the fact table also determines the grain of each dimension table
* Time is always present on fact table of schema

Purpose of the Declare Grain step in Kimball's approach?

Dimensions set the context for asking questions about the facts in the fact table.

Purpose of the Choose Dimensions step in Kimball's approach?

Conformed

Term for a dimension that is used in more than one data mart

They must be exactly the same dimension, or one must be a mathematical subset of the other

If a dimension is in more than one data mart, what must be the case?

NO

Are non-numeric facts usable in a fact table?

NO NO NO

Are non-additive facts usable in a fact table?

...

fact at different granularity from other facts in table

* Phase 2 involves the rounding out of the dimensional tables.
* Text Descriptions added to dimension tables so easy to understand

What does Phase 2 of Kimball's approach do? (2)

* A dimensional model, which contains more than one fact table sharing one or more conformed dimension tables

Fact Constellation

Data Warehouse / Data Mart + Data mining / OLAP

Business Intelligence BI Definition

Multi-dimensional views of Data

Support for complex calculations

Time Intelligence

3 key features of OLAP

Roll up
Drill down
Slice and Dice
Pivot

Analytical operations that can be performed on data cube

* Multidimensional conceptual view
* Transparency
* Accessibility
* Consistent report performance
* Client-server architecture
* Generic dimensionality
* Dynamic sparse matrix handling
* Multi-user support
* Unrestricted cross-dimensional operations
* Intuitive data manipulation
* Flexible reporting
* Unlimited dimensions and aggregation levels

Codd's Rules for OLAP (12)

ROLLUP, CUBE

Which OLAP extensions to SQL build on GROUP BY in vanilla SQL?
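
To show what these two extensions add to a plain GROUP BY, here is a pure-Python sketch rather than SQL (not every database engine supports the keywords). The sample data and function names are assumptions. `None` plays the role of the NULL that SQL places in a subtotal row: a ROLLUP groups by successively shorter prefixes of the column list, while a CUBE groups by every subset of the columns.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (region, product, amount) rows.
sales = [
    ("North", "Widget", 10),
    ("North", "Gadget", 5),
    ("South", "Widget", 7),
]
DIMENSIONS = ("region", "product")


def aggregate(rows, keep):
    """Sum the measure, grouping only by the dimension positions in `keep`."""
    totals = defaultdict(int)
    for *dims, amount in rows:
        group = tuple(dims[i] if i in keep else None for i in range(len(dims)))
        totals[group] += amount
    return dict(totals)


def rollup(rows, n_dims):
    """Groupings (d1..dn), (d1..dn-1), ..., (): hierarchical subtotals and a grand total."""
    result = {}
    for size in range(n_dims, -1, -1):
        result.update(aggregate(rows, set(range(size))))
    return result


def cube(rows, n_dims):
    """Groupings for every subset of the dimensions: all cross-tabulated subtotals."""
    result = {}
    for size in range(n_dims + 1):
        for keep in combinations(range(n_dims), size):
            result.update(aggregate(rows, set(keep)))
    return result


rollup_totals = rollup(sales, len(DIMENSIONS))
cube_totals = cube(sales, len(DIMENSIONS))
```

With two dimensions, the cube result contains everything in the rollup result plus the per-product subtotals across all regions.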

RANK, DENSE_RANK

...

DENSE_RANK always consecutive, RANK jumps if there have been ties in the ranking

Difference between RANK and DENSE_RANK operation
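
The difference is easiest to see on a list with a tie. The two functions below are a plain-Python sketch of the ranking rules, assuming a descending order (highest score ranks first); they are not the SQL window functions themselves.

```python
def rank(values):
    """Ties share a rank, and the following rank skips ahead by the number of ties."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def dense_rank(values):
    """Ties share a rank, and the following rank is always the next integer."""
    distinct = sorted(set(values), reverse=True)
    return [distinct.index(v) + 1 for v in values]


scores = [90, 85, 85, 70]
ranks = rank(scores)              # [1, 2, 2, 4]
dense_ranks = dense_rank(scores)  # [1, 2, 2, 3]
```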

Reveal more detailed data (the reverse of roll up, which aggregates data)

Purpose of the Drill down operation on OLAP cubes

Slice selects on one dimension, whereas dice selects on two or more

Difference between Slice and Dice operations on OLAP cube

Rotates the axes to show the dimensions in a different orientation (like a Rubik's cube)

Purpose of Pivot operation on an OLAP data cube
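
The cube operations in the cards above can be sketched on a dictionary of cells. The cube contents, the dimension order and the function names are all assumptions for illustration; a real OLAP server stores and indexes cubes very differently. Roll up aggregates to a coarser level, drill down moves back to a finer one, slice fixes one dimension, dice restricts several, and pivot reorders the axes.

```python
from collections import defaultdict

# Hypothetical cube cells keyed by (year, quarter, region, product).
cube = {
    (2024, "Q1", "North", "Widget"): 10,
    (2024, "Q1", "South", "Widget"): 4,
    (2024, "Q2", "North", "Widget"): 6,
    (2024, "Q2", "North", "Gadget"): 3,
}
YEAR, QUARTER, REGION, PRODUCT = range(4)


def roll_up(cells, keep):
    """Aggregate to a coarser level by keeping only the dimensions in `keep`."""
    totals = defaultdict(int)
    for coords, value in cells.items():
        totals[tuple(coords[i] for i in keep)] += value
    return dict(totals)


def slice_(cells, dim, value):
    """Fix a single dimension to one value."""
    return {c: v for c, v in cells.items() if c[dim] == value}


def dice(cells, criteria):
    """Restrict two or more dimensions to sets of allowed values."""
    return {
        c: v
        for c, v in cells.items()
        if all(c[dim] in allowed for dim, allowed in criteria.items())
    }


def pivot(cells, order):
    """Reorder the axes without changing any values."""
    return {tuple(c[i] for i in order): v for c, v in cells.items()}


by_year = roll_up(cube, [YEAR])              # coarse view: totals per year
by_quarter = roll_up(cube, [YEAR, QUARTER])  # drill down: one level more detail
north = slice_(cube, REGION, "North")
north_q2_widgets = dice(cube, {REGION: {"North"}, QUARTER: {"Q2"}, PRODUCT: {"Widget"}})
product_first = pivot(cube, [PRODUCT, REGION, YEAR, QUARTER])
```

Drill down is modelled here as rolling up to a finer level than the current view, which keeps the two operations as mirror images of each other.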

Associations discovery

Sequential pattern discovery

Similar Time Sequence Discovery

3 Variations of Link Analysis

* Business Understanding
* Data Understanding
* Data Preparation
* Modelling
* Evaluation
* Deployment

CRISP-DM phases

What is a data warehouse and what type of data should it contain?

Data warehouses are solely intended to perform queries and analysis and often contain large amounts of historical data. The data within a data warehouse is usually derived from a wide range of sources such as application log files and transaction applications.

What is a data warehouse quizlet?

Data warehouse: a logical collection of information, gathered from many different operational databases, that supports business analysis activities and decision-making tasks. The primary purpose of a data warehouse is to aggregate information throughout an organization into a single repository for decision-making purposes.

What are the data types in data warehouse?

Types of data stored in a data warehouse:
* Historical data
* Derived data
* Metadata

What is a data warehouse and what are its main characteristics quizlet?

A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data that supports management's decision-making process.