What is a data warehouse and what type of data should it contain quizlet?
Terms in this set (60)

What is a data warehouse?
A subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decision-making process.

Is a DW focused around subjects of the enterprise or application areas?
Subjects of the enterprise.

What is meant by time variance?
Data is represented in snapshots, but the move is towards real-time data.

Benefits of data warehousing (business side)
* Potential high returns on investment
Problems of data warehousing (10)
* Underestimation of resources for data loading

Main sources of data for a DW
Online transaction processing (OLTP) databases.

Why is an operational data store (ODS) used?
* Holds current and integrated operational data for analysis.

Problem with using an operational data store (ODS)
Legacy systems are unable to fulfil reporting requirements.

Explain the tools for extraction in ETL
Internal or external data sources (primarily OLTP).

Explain the tools for transformation in ETL
* Applies a series of rules to extracted data and determines how data will be used for analysis.
* Data summations, data encoding, data merging, data splitting, data calculations, and creation of surrogate keys.

Explain the tools for loading in ETL
As data loads, additional constraints defined in the database schema can be activated (such as uniqueness, referential integrity, and mandatory fields).

Role of warehouse manager (7)
Manage external data sources
* Analysis of data to ensure consistency.

Role of query manager
Directing queries to the appropriate tables and scheduling the execution of queries.

Role of metadata
* To map data sources to a common view of information within the warehouse.

Data warehouse DBMS requirements (9)
* Load performance

What is a data mart?
* A database that contains a subset of corporate data to support the analytical requirements of a certain area of business.

Reasons to use data marts
* To give users access to the data they need to analyse most often.
* To provide data in a form that matches the collective view of the data by a group of users in a department or business application area.
* To improve end-user response time due to the reduction in the volume of data to be accessed.
* To provide appropriately structured data as dictated by the requirements of the end-user access tools.

Why use a data mart instead of a DW?
* Simpler than a DW

4 questions that should be asked at the beginning of every data warehouse project
* Which user requirements are most important

Why is it often better to begin a project with data marts?
Because they are simpler and more specific.

How are requirements gathered at the beginning of a DW project?
* Interview staff users

Main advantage of Inmon's Corporate Information Factory
Can give a consistent and comprehensive data strategy.

Main disadvantage of Inmon's Corporate Information Factory
Large and complex, so it may not fulfil requirements within the time frame.

Main advantage of Kimball's Business Dimensional Lifecycle
Scaled-down project, so it can demonstrate value faster.

Main disadvantage of Kimball's Business Dimensional Lifecycle
If data marts are developed by separate teams, the end DW may not be consistent.

How is a DW project begun, and what is produced as a result?
Identify information requirements and associated business uses; a Data Warehouse Bus Matrix is produced.

Purpose of the Data Warehouse Bus Matrix
* List key business processes and how they are to be analysed

3 tracks of DW design
* Technology (top track)

What does dimensionality modelling produce?
A table with a composite primary key, called the fact table, and a set of smaller tables called dimension tables.

How do the fact and dimension tables interact?
Each dimension table has a non-composite primary key that corresponds to one component of the composite key in the fact table.

Do fact or dimension tables usually contain textual data?
Dimension tables.

Do fact or dimension tables usually contain numerical and additive values?
Fact table.

Facts are generated by events that occurred in the past, T/F?
True.

Dimension attributes are used as the constraints in data warehouse queries, T/F?
True.

Why are star schemas used?
To speed up query performance by denormalising reference information into a single dimension table (different from a snowflake schema).

Why are natural keys in star schemas replaced with surrogate keys?
This gives independence from OLTP keys.

Difference between snowflake and star schemas?
Snowflake schemas are normalised.

Advantages of dimensionality modelling
* Efficiency

4 steps of Stage 1 of Kimball's Business Dimensional Lifecycle
Select business process

Purpose of the Declare Grain step in Kimball's lifecycle?
* Decide what a record of the fact table is to represent.

Purpose of the Choose Dimensions step in Kimball's lifecycle?
Dimensions set the context for asking questions about the facts in the fact table.

Term for a dimension that is in more than one data mart
Conformed.

If a dimension is in more than one data mart, what must be the case?
They must be exactly the same dimension, or one must be a mathematical subset of the other.

Are non-numeric facts usable in a fact table?
No.

Are non-additive facts usable in a fact table?
No.

Fact at a different granularity from other facts in the table
...

What does Phase 2 of Kimball's lifecycle do? (2)
* Phase 2 involves the rounding out of the dimensional tables.

Fact constellation
* A dimensional model that contains more than one fact table sharing one or more conformed dimension tables.

Business intelligence (BI) definition
Data warehouse / data mart + data mining / OLAP.

3 key features of OLAP
* Multi-dimensional views of data
* Support for complex calculations
* Time intelligence

Analytical operations that can be performed on a data cube
Roll up

Codd's rules for OLAP (12)
* Multidimensional conceptual view

Which OLAP-extended SQL functions correspond to GROUP BY in vanilla SQL?
ROLLUP, CUBE.

...
RANK, DENSE_RANK

Difference between the RANK and DENSE_RANK operations
DENSE_RANK values are always consecutive; RANK jumps if there have been ties in the ranking.

Purpose of the drill-down operation on OLAP cubes
Reveal the detailed data that makes up the aggregate data.

Difference between the slice and dice operations on an OLAP cube
Slice selects on one dimension, whereas dice selects on many.

Purpose of the pivot operation on an OLAP data cube
Changes the axes of the different dimensions (like a Rubik's cube, I guess).

3 variations of link analysis
* Associations discovery
* Sequential pattern discovery
* Similar time sequence discovery

CRISP-DM phases
* Business Understanding
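The star schema and OLAP cards above can be made concrete with a small sketch. The example below uses Python's standard sqlite3 module to build one fact table with a composite primary key and two dimension tables with surrogate keys, then runs a roll-up (product to category) and a slice (one store city). Every table name, column name, and sample row here (dim_product, dim_store, fact_sales, the SKU codes, the cities) is hypothetical and chosen only for illustration; it is not taken from the flashcards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,  -- surrogate key, independent of OLTP keys
    product_code TEXT,                 -- natural key from the (hypothetical) source
    category     TEXT                  -- textual, descriptive attribute
);
CREATE TABLE dim_store (
    store_key INTEGER PRIMARY KEY,     -- surrogate key
    city      TEXT
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    sale_date   TEXT,
    amount      REAL,                  -- numeric, additive fact
    PRIMARY KEY (product_key, store_key, sale_date)  -- composite key
);
""")
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [
    (1, "SKU-100", "Books"),
    (2, "SKU-200", "Books"),
    (3, "SKU-300", "Games"),
])
conn.executemany("INSERT INTO dim_store VALUES (?, ?)", [(1, "Leeds"), (2, "York")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
    (1, 1, "2024-01-05", 20.0),
    (2, 1, "2024-01-05", 30.0),
    (3, 1, "2024-01-06", 50.0),
    (1, 2, "2024-01-06", 10.0),
])

# Roll up: aggregate individual sales up to the category level.
roll_up = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()

# Slice: restrict one dimension (store city) to a single value.
slice_leeds = conn.execute("""
    SELECT SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_store AS s ON s.store_key = f.store_key
    WHERE s.city = 'Leeds'
""").fetchone()[0]

print(roll_up)      # [('Books', 60.0), ('Games', 50.0)]
print(slice_leeds)  # 100.0
```

The dimension attributes (category, city) act as the query constraints and grouping columns, while the fact table supplies the additive measure, which is the division of labour the cards describe.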
What is a data warehouse and what type of data should it contain?
Data Warehouse Defined
Data warehouses are solely intended to perform queries and analysis and often contain large amounts of historical data. The data within a data warehouse is usually derived from a wide range of sources such as application log files and transaction applications.
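To illustrate how data from several sources might end up in a warehouse, here is a minimal extract-transform-load sketch in Python. It assumes two made-up sources (an OLTP order list and one application log line), assigns surrogate keys during transformation, and relies on schema constraints during loading. The record layouts, the log format, and the names oltp_orders, app_log_lines, dim_customer, and fact_order are all assumptions for this example only.

```python
import sqlite3

# Hypothetical source data: OLTP order rows and an application log.
oltp_orders = [
    {"customer_id": "C-17", "amount": "19.50"},
    {"customer_id": "C-42", "amount": "5.00"},
]
app_log_lines = ["order customer=C-17 amount=3.50"]

def extract():
    """Pull rows from both sources into one common (customer_id, amount) shape."""
    rows = [(o["customer_id"], o["amount"]) for o in oltp_orders]
    for line in app_log_lines:
        # Skip the leading event name, then parse key=value pairs.
        fields = dict(part.split("=") for part in line.split()[1:])
        rows.append((fields["customer"], fields["amount"]))
    return rows

def transform(rows):
    """Convert amounts to numbers and assign a surrogate key per customer."""
    surrogate = {}
    facts = []
    for customer_id, amount in rows:
        key = surrogate.setdefault(customer_id, len(surrogate) + 1)
        facts.append((key, float(amount)))
    return surrogate, facts

def load(conn, surrogate, facts):
    """Insert rows; the schema enforces uniqueness, mandatory fields, and references."""
    conn.executescript("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,
        customer_id  TEXT NOT NULL UNIQUE
    );
    CREATE TABLE fact_order (
        customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
        amount       REAL NOT NULL
    );
    """)
    conn.executemany(
        "INSERT INTO dim_customer VALUES (?, ?)",
        [(key, cid) for cid, key in surrogate.items()],
    )
    conn.executemany("INSERT INTO fact_order VALUES (?, ?)", facts)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
surrogate, facts = transform(extract())
load(conn, surrogate, facts)

totals = conn.execute("""
    SELECT c.customer_id, SUM(f.amount)
    FROM fact_order AS f
    JOIN dim_customer AS c ON c.customer_key = f.customer_key
    GROUP BY c.customer_id
    ORDER BY c.customer_id
""").fetchall()
print(totals)  # [('C-17', 23.0), ('C-42', 5.0)]
```

The order from the log and the order from the OLTP table for customer C-17 land under the same surrogate key, which is the integration step that makes cross-source analysis possible.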
What is a data warehouse quizlet?
Data warehouse: a logical collection of information, gathered from many different operational databases, that supports business analysis activities and decision-making tasks. Primary purpose of a data warehouse: to aggregate information throughout an organization into a single repository for decision-making purposes.
What are the data types in a data warehouse?
Types of data stored in a data warehouse: historical data, derived data, and metadata.
What is a data warehouse and what are its main characteristics quizlet?
A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data that supports management's decision-making process.
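One way to picture "time-variant" and "nonvolatile" together is an append-only store of dated snapshots. The sketch below is a simplified Python illustration under that assumption: each load appends rows stamped with a snapshot date and never updates earlier rows, so a query can ask what a value was as of any past date. The names snapshots, load_snapshot, and balance_as_of, and the account data, are hypothetical.

```python
from datetime import date

# Append-only list of snapshot rows; earlier rows are never changed or deleted.
snapshots = []

def load_snapshot(as_of, balances):
    """Append one dated snapshot of account balances."""
    for account, balance in balances.items():
        snapshots.append({"as_of": as_of, "account": account, "balance": balance})

def balance_as_of(account, when):
    """Return the balance from the latest snapshot taken on or before `when`."""
    rows = [r for r in snapshots if r["account"] == account and r["as_of"] <= when]
    if not rows:
        return None
    return max(rows, key=lambda r: r["as_of"])["balance"]

load_snapshot(date(2024, 1, 31), {"A-1": 100, "A-2": 250})
load_snapshot(date(2024, 2, 29), {"A-1": 80, "A-2": 300})

print(balance_as_of("A-1", date(2024, 2, 15)))  # 100
print(balance_as_of("A-1", date(2024, 3, 1)))   # 80
```

An operational system would typically overwrite the balance in place; keeping both snapshots is what lets the warehouse answer historical questions.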