Data Infrastructure for Analytics in Education

Cary K. Jim, Ph.D., Data & Analytics Manager, The ASSISTments Foundation

With growing interest in migrating information systems to the cloud, modern digital infrastructure allows greater system interoperability and data integration to meet an organization’s needs and strategic goals. Although many resources can help educational groups conceptualize and implement data analytics, some challenges remain at the operational level that affect how data are collected, stored, and retrieved. The success of digital transformation for analytics lies at the intersection of process, people, and technology. These three components work together to build and support the data infrastructure and strategy at your organization. A few key questions to consider with your department leads and stakeholders in the initial phase of building data infrastructure for analytics are:

  • What are the strategic goals and key outcomes to focus on at your organization?
  • Where does the existing data that can address these goals live within your IT infrastructure? Are any data needed for your goals or objectives not being tracked? Are there any data discrepancy issues?
  • What types of data delivery or analytics output does your team need, and how often should they be updated?
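The data discrepancy question above can be made concrete with a small check. The sketch below is a minimal illustration, assuming two hypothetical extracts (for example, one from a student information system and one from a learning platform) that both key students by ID. The system names, IDs, grade levels, and the `find_discrepancies` helper are all illustrative assumptions, not part of any particular product.

```python
# Hypothetical exports from two systems: student ID -> grade level.
# System names, IDs, and values are illustrative only.
sis = {"S001": 6, "S002": 7, "S003": 7}       # e.g., a student information system
platform = {"S002": 7, "S003": 8, "S004": 6}  # e.g., a learning platform


def find_discrepancies(a: dict, b: dict) -> dict:
    """Compare two ID-keyed extracts and report where they disagree."""
    shared = a.keys() & b.keys()  # IDs present in both systems
    return {
        "only_in_a": set(a) - set(b),                        # tracked in A but not B
        "only_in_b": set(b) - set(a),                        # tracked in B but not A
        "mismatched": {k for k in shared if a[k] != b[k]},  # same ID, different value
    }


report = find_discrepancies(sis, platform)
for label, ids in report.items():
    print(label, sorted(ids))
```

In practice the two dictionaries would come from extracts of your own systems. Records missing from one side point to data that are not being tracked consistently, and a mismatch such as the differing grade level for S003 is the kind of discrepancy worth resolving before the data feed into analytics.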

This may sound like a chicken-and-egg dilemma: you need to fully understand your data systems to foresee what is needed to make decisions or drive strategies, yet those decisions and strategies determine which data matter. Depending on the maturity of your organization’s IT systems and processes, you may need to conduct an initial assessment and identify the use cases for applying data to your goals. If you don’t have the talent in-house, most technology partners can provide experts to guide your team and do the heavy lifting in technical design and implementation. At the executive level, your leadership in guiding and communicating business and strategic goals will help directors and managers align their data processes and workflows and build a culture of best practices during the design and development of the data infrastructure. Next, we review some key stages, from data collection to analytics, and what to look for in each stage.

Figure 1. Key Stages in Data Architecture for Analytics in Education

In education, data are often used to inform teaching and learning, user experience, knowledge management, or business strategies. To meet these goals, a purposeful data architecture is built to manage different data processes across information systems and web applications. Figure 1 illustrates a generic architecture showing how data movement and processes are set up for analytics. The progression through the stages ensures that data structure and quality improve before the data reach the final stage. Whether to use a data lake or a data warehouse depends on your data scenario and how soon you want the systems to be ready for analytics and reporting. There are three types of data warehouse architecture: relational, dimensional, and hybrid. Each type handles data processing and integration in its own way to support different kinds of data analysis, reporting, or self-service data retrieval by end users.
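To illustrate how data quality can improve as records move through successive stages, here is a minimal sketch using plain Python. It assumes three hypothetical stages (a raw export, a cleaned layer, and a curated, analytics-ready summary); the stage names, field names, and cleaning rules are illustrative assumptions and are not taken from Figure 1.

```python
from statistics import mean

# Stage 1 (raw): a hypothetical export where values arrive as strings,
# with duplicates, missing IDs, and unparseable scores.
raw_records = [
    {"student_id": "S001", "assignment": "A1", "score": "80"},
    {"student_id": "S001", "assignment": "A1", "score": "80"},   # duplicate row
    {"student_id": "S002", "assignment": "A1", "score": "95"},
    {"student_id": "",     "assignment": "A1", "score": "70"},   # missing ID
    {"student_id": "S003", "assignment": "A1", "score": "n/a"},  # unparseable score
    {"student_id": "S002", "assignment": "A2", "score": "88"},
]


def clean(records):
    """Stage 2: drop rows without an ID or numeric score, cast types, de-duplicate.

    Assumes (student_id, assignment) identifies one submission; the first
    occurrence is kept.
    """
    seen = set()
    cleaned = []
    for r in records:
        if not r["student_id"]:
            continue
        try:
            score = float(r["score"])
        except ValueError:
            continue
        key = (r["student_id"], r["assignment"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"student_id": r["student_id"],
                        "assignment": r["assignment"],
                        "score": score})
    return cleaned


def curate(records):
    """Stage 3: aggregate into an analytics-ready summary (mean score per assignment)."""
    by_assignment = {}
    for r in records:
        by_assignment.setdefault(r["assignment"], []).append(r["score"])
    return {a: mean(scores) for a, scores in sorted(by_assignment.items())}


cleaned = clean(raw_records)
summary = curate(cleaned)
print(len(raw_records), len(cleaned), summary)
```

Each stage has a single responsibility, so problems can be traced to the step that should have caught them. In a real system these stages would typically live in a data lake or warehouse rather than in memory, but the progression from loosely structured to analytics-ready data is the same idea.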

Data modeling is the most tedious and demanding task, but it is an important step in mapping out how data points are connected and structured. There are three types of data models, and each adds clarity to the architectural design and to how data and information should be integrated for analytics. The conceptual data model presents an abstract view of different technologies and how they support your organization’s processes. Next, a logical data model specifies data entities, their relationships, and their attributes, such as data types and formats. The last phase is a physical data model, which defines the schemas for how data will be physically stored in the systems and how entities or data objects relate to one another. During data modeling, the architectural design and engineering efforts should be documented and kept current. Expect your systems to evolve; maintaining this documentation as part of the knowledge management process will save the team some headaches down the road.

The data analysis stage may start small, moving from basic descriptive analysis to exploratory data analysis before building predictive models as your data infrastructure matures. Any analysis should be designed around your objectives and goals to ensure the results can inform decision-making. More advanced analysis and data mining approaches that address both business processes and analytics workflows are common in the business domain [1]. The field of education can consider these frameworks to help solidify a data-driven decision-making process at your organization.
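As one concrete example of a physical data model, the sketch below defines a small dimensional (star) schema using Python’s built-in `sqlite3` module and an in-memory database, then runs the kind of basic descriptive query an early analysis stage might start with. The tables (`dim_student`, `dim_assignment`, `fact_submission`), their columns, and the sample rows are hypothetical assumptions chosen for illustration; a production warehouse would use its own platform and naming conventions.

```python
import sqlite3

# In-memory database for illustration only.
conn = sqlite3.connect(":memory:")

# Physical schema: two dimension tables and one fact table (a star schema).
conn.executescript("""
CREATE TABLE dim_student (
    student_key INTEGER PRIMARY KEY,
    student_id  TEXT NOT NULL UNIQUE,
    grade_level INTEGER
);
CREATE TABLE dim_assignment (
    assignment_key INTEGER PRIMARY KEY,
    assignment_id  TEXT NOT NULL UNIQUE,
    topic          TEXT
);
CREATE TABLE fact_submission (
    student_key    INTEGER NOT NULL REFERENCES dim_student(student_key),
    assignment_key INTEGER NOT NULL REFERENCES dim_assignment(assignment_key),
    score          REAL,
    PRIMARY KEY (student_key, assignment_key)
);
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO dim_student VALUES (?, ?, ?)",
                 [(1, "S001", 7), (2, "S002", 7), (3, "S003", 8)])
conn.executemany("INSERT INTO dim_assignment VALUES (?, ?, ?)",
                 [(1, "A1", "fractions"), (2, "A2", "ratios")])
conn.executemany("INSERT INTO fact_submission VALUES (?, ?, ?)",
                 [(1, 1, 80.0), (2, 1, 95.0), (2, 2, 88.0), (3, 2, 70.0)])

# Descriptive analysis: number of submissions and mean score per topic.
rows = conn.execute("""
    SELECT a.topic, COUNT(*) AS n, AVG(f.score) AS mean_score
    FROM fact_submission AS f
    JOIN dim_assignment AS a ON a.assignment_key = f.assignment_key
    GROUP BY a.topic
    ORDER BY a.topic
""").fetchall()

for row in rows:
    print(row)
```

The fact table holds the measurable events (scores), while the dimension tables hold the descriptive attributes used to slice them. This separation is what makes grouping by topic or grade level a simple join rather than a restructuring of the data.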

Other essential components of a comprehensive modern data architecture are data governance, data security, metadata, and data curation. Although building a data infrastructure for analytics involves trade-offs due to limits on time, talent, and resources, it is possible to reach a mature data infrastructure with a clear map of your data from its birth to its final resting place.
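Metadata and governance can start with something as simple as a machine-readable data dictionary. The sketch below is a minimal illustration, assuming a hypothetical `FieldSpec` record per field (name, expected type, description, and a flag for personally identifiable information). It shows how the same metadata can drive record validation and identify fields that security rules should protect. The field names and rules are illustrative assumptions, not a standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One data dictionary entry (hypothetical structure)."""
    name: str
    dtype: type
    description: str
    contains_pii: bool = False  # flag for governance and security rules


# Hypothetical data dictionary for a submissions dataset.
DATA_DICTIONARY = [
    FieldSpec("student_id", str, "Student identifier from the source system", contains_pii=True),
    FieldSpec("assignment", str, "Assignment identifier"),
    FieldSpec("score", float, "Percent correct, 0-100"),
]


def validate(record: dict) -> list[str]:
    """Return a list of problems found when checking a record against the dictionary."""
    problems = []
    for spec in DATA_DICTIONARY:
        if spec.name not in record:
            problems.append(f"missing field: {spec.name}")
        elif not isinstance(record[spec.name], spec.dtype):
            problems.append(f"{spec.name}: expected {spec.dtype.__name__}")
    # Fields present in the data but never documented are also a governance issue.
    unknown = set(record) - {s.name for s in DATA_DICTIONARY}
    problems.extend(f"undocumented field: {name}" for name in sorted(unknown))
    return problems


def pii_fields() -> list[str]:
    """List fields that should be masked or access-restricted."""
    return [s.name for s in DATA_DICTIONARY if s.contains_pii]


print(validate({"student_id": "S001", "assignment": "A1", "score": 80.0}))  # []
print(validate({"student_id": "S001", "score": "80", "notes": "late"}))
print(pii_fields())  # ['student_id']
```

The second call reports a missing `assignment` field, a `score` stored as text, and an undocumented `notes` field. Keeping the dictionary in version control alongside the data models also supports the documentation practice described above.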

[1] CRISP-DM, SAS Institute’s SEMMA, and Six Sigma’s DMAIC are among the most commonly used methodologies in data mining projects and business process improvement.

About The ASSISTments Foundation
Assess Learning. Analyze Data. Differentiate Instruction.
The ASSISTments Foundation is the 501(c)(3) nonprofit behind ASSISTments Teacher, an evidence-backed digital math practice and assessment platform. Our mission is to impact advancements in math instruction to make teaching and learning more evidence-based and aligned to the diverse needs of students. Our vision is for every student to be seen, supported, and successful in math class. With ASSISTments Teacher, educators can accurately assess student progress, identify areas for improvement, and make data-informed decisions that positively impact learning.

The ASSISTments Foundation (TAF) works with Worcester Polytechnic Institute (WPI) to conduct cutting-edge research on the learning sciences.