Introduction

Research background and motivations

The domain of education is vast and intricate, entailing the growth and development of both individuals and communities. In an age marked by rapid advancements in science and technology, the educational system undergoes constant refinement and modernization1. Education serves as the cornerstone of social progress, and within higher education, audio-visual aesthetic education holds significant importance in enhancing students’ comprehensive qualities and cultivating well-rounded talents. As a form of aesthetic education, it not only enhances students’ aesthetic qualities and perspectives but also enriches their perceptiveness and comprehension, which are crucial for holistic personal growth2. Computer and internet technologies have profoundly impacted many fields, including education. Audio-visual aesthetic education has integrated computer and deep learning technologies, which open new possibilities for enhancing teaching quality, and efforts continue to improve both its technology and its instructional methods in order to raise pedagogical standards. Notably, as computer and internet technologies evolve, a pervasive trend of interdisciplinary collaboration has emerged across diverse industries, including audio-visual aesthetic education3. In response to the contemporary demand for multifaceted competencies, the concept of Science, Technology, Engineering, Arts, and Mathematics (STEAM) education has emerged, reflecting its comprehensive educational nature. Concurrently, pedagogical approaches rooted in deep learning technology remain under active exploration4.

At the core of progress in the domain of audio-visual aesthetic education is the fundamental enhancement of individuals’ comprehensive capabilities. This purpose aligns harmoniously with the overarching objectives of STEAM education5. As a pivotal form of integrated education, STEAM education emphasizes the holistic exploration of science, technology, engineering, arts, and mathematics, which closely resonates with the goals of audio-visual aesthetic education. Therefore, the fusion of both approaches may better meet the contemporary demands for well-rounded and comprehensive talents.

Throughout history, art education has been intrinsically linked with the emotive expression of humanity. New challenges arise in the context of modern university education. In contemporary higher education, achieving pedagogical objectives and aligning with the prevailing epoch necessitates the implementation of aesthetics-based teaching practices6. In the contemporary global landscape, audio-visual aesthetic education has transcended its conventional role of merely imparting oral instruction in music and art, adopting a comprehensive and integrated approach. In this regard, educators are enjoined to not only cultivate students’ acumen in specific disciplines but also foster their holistic learning across diverse domains, thereby augmenting their overall proficiency7. For example, Anggereni et al. studied the application of audio-visual media education to enhance kindergarten children’s singing skills, with the aim of promoting their cognitive development8. While audio-visual aesthetic education ostensibly fosters music and visual art appreciation skills, it indispensably imparts robust foundational competencies requisite for mastery across various disciplines and scientific pursuits. Audio-visual aesthetic education requires students to possess strong fundamental skills, including an accurate understanding of emotional expression, which forms the foundation for steadily improving students’ aesthetic abilities9. Traditional audio-visual aesthetic education has expanded beyond music and fine arts, placing increased emphasis on cultivating students’ overall capabilities. Winarto and colleagues conducted a qualitative study to assess the effectiveness of integrating audio-visual media into Islamic junior high school education in enhancing learning outcomes. The research findings indicate that using audio-visual media can effectively improve students’ academic performance and interest in Islamic junior high school education10. 
However, several critical issues remain within existing research. For instance, a deeper understanding is needed regarding the impact of audio-visual aesthetic education on students’ overall abilities and how it fosters emotional expression and aesthetic competence development. Additionally, the effective integration of STEAM education principles to advance the development of audio-visual aesthetic education remains an unresolved issue. To address these concerns, this research aims to bridge the gaps in current research. This research explores the comprehensiveness and integration of audio-visual aesthetic education and its role in promoting students’ holistic development. It also investigates methods for incorporating STEAM education principles into audio-visual aesthetic education to enhance educational quality. Through this research, the objective is to provide fresh insights and innovative approaches to the field of education, catering to the needs of contemporary students and aligning with the advancements of this era.

Research objectives

The continuous advancement of technology has broadened the scope of audio-visual aesthetic education, extending its reach beyond traditional domains such as music and visual arts. This expansion prompts the question of effectively integrating audio-visual aesthetic education into students’ overall learning experiences, enhancing their comprehensive abilities across various disciplines. Furthermore, the incorporation of STEAM education principles has become increasingly vital. Therefore, the central research question of this paper revolves around the strategies for advancing audio-visual aesthetic education to meet the evolving needs of contemporary students. This includes integrating STEAM education principles, addressing interdisciplinary challenges, and modernizing educational approaches to align with the developments of our era.

The research is dedicated to enhancing the effectiveness of audio-visual aesthetics in university-level vocal music education. It seeks to achieve the following specific research objectives:

(1) Explore the integration of audio-visual aesthetic education: this objective examines methods for integrating audio-visual aesthetic education more comprehensively into a broader range of academic fields. It requires the development of innovative educational approaches and curriculum designs to deliver a more inclusive educational experience.

(2) Evaluate the effectiveness of the new educational model: the research aims to assess the effectiveness of a proposed three-stage educational model. This evaluation includes empirical research and surveys to gauge the teaching outcomes of the new model. The primary goal is to determine whether this model significantly enhances students’ audio-visual aesthetic abilities, self-confidence in learning, and overall learning efficiency.

(3) Provide best practices for audio-visual aesthetic education: the ultimate objective is to offer a set of best practices and recommendations for audio-visual aesthetic education. These insights are intended as references for educators and decision-makers. The research outcomes aim to provide substantial guidance for the enhancement and advancement of audio-visual aesthetic education within higher education institutions.

Through the pursuit of these objectives, this research aims to contribute to the ongoing development and improvement of audio-visual aesthetic education. It situates the field in its contemporary context and reviews the pertinent literature on deep learning and STEAM education. By synthesizing these dimensions, the research proposes an innovative paradigm for curriculum development and a novel teaching process model. The research outcomes have the potential to improve the teaching of audio-visual aesthetics in higher education institutions. They may also familiarize educational practitioners with the merits of integrating the STEAM education model and deep learning into their instructional frameworks, encouraging wider acceptance and adoption of novel educational principles, particularly those intrinsic to STEAM. Over time, the assimilation of these concepts may prompt reform of the broader social teaching system and raise the overall standard of education within society. The research therefore has value for audio-visual aesthetics education: it can help raise the quality of education in this field, promote interdisciplinary learning, inform educational policies and theories, and support the development of students with comprehensive skills and innovative capabilities. The research outcomes offer experiences and lessons that can inform the future of education.

Literature review

Audio-visual aesthetic education represents a pivotal objective within the domain of art instruction. Early in the evolution of art pedagogy, foundational works emphasized the necessity for colleges and universities to prioritize aesthetic education as a key instructional goal11. The origins of STEAM education can be traced back to the Cold War era, when the Soviet Union’s 1957 launch of the first artificial satellite compelled the United States government to recognize the need for an educational paradigm that seamlessly integrated technology and science12. Subsequently, educational models progressed from Science-Technology-Society education to STEM and eventually to STEAM13. Towards the close of the twentieth century, the United States shifted from a state-led teaching system to a national framework governed by federal standards14. The early 21st century saw the adoption of STEM education across all levels of learning in the United States, aiming to cultivate a generation of more skilled and well-rounded individuals15, and art was subsequently integrated into the comprehensive STEAM education system. Notably, the implementation of the STEAM education system in the United States facilitated the emergence of numerous high-tech and multifaceted talents, significantly elevating the nation’s international influence and educational competitiveness16. As a mature STEAM education system took shape in the United States, countries worldwide recognized the strategic significance of this model and subsequently embraced its principles17. The earnest consideration given by researchers to aesthetic education underscores the crucial role of audio-visual aesthetic education in talent development.
Simultaneously, the steady growth and influence of STEAM education, since its inception, have substantially impacted the United States’ educational landscape, prompting numerous colleges and universities to adopt STEAM education systems aligned with their distinct attributes. Moreover, with the advent of computing technology, the application of deep learning in education has emerged as a prominent area of interest. In this context, the utilization of STEAM education concepts and deep learning technology to establish an aesthetic education system in China represents a timely and noteworthy subject worthy of scholarly investigation.

In the context of STEAM education, scholars have conducted various studies. Jesionkowska et al. (2020) researched active learning methods for STEAM subjects, utilizing a format in which students were tasked with building augmented reality (AR) applications as part of their learning. They evaluated the applicability of active learning for STEAM subjects through qualitative case study methods18. Bertrand and Namukasa (2020) aimed to understand STEAM teaching programs and student learning provided by non-profit organizations and public schools in Ontario, Canada. They conducted qualitative case studies involving interviews, observations, and data analysis of curriculum documents, identifying facilitating factors and examining relationships in comparison to other STEAM studies19. Park and Cho (2022) viewed history as a humanities discipline that could be combined with STEM, exploring various learning objectives related to history in STEAM curriculum materials. They analyzed the presentation of history-related learning objectives and highlighted several ways in which historical and STEM learning objectives interact20. Skowronek et al. (2022) presented a special perspective on underrepresentation in STEM and minority students’ education. They emphasized that combining technical training with social sciences, arts, ethics, and business can enable future leaders to creatively solve recognizable barriers to a sustainable society21. Lee (2021) introduced the general education situation in Uzbekistan, analyzed newspaper articles and government documents related to STEAM education, and shared teachers’ views on STEAM education through survey feedback and interviews. Additionally, they provided suggestions for the government to more effectively achieve STEAM education-related reform goals22.

Scholars have undertaken numerous studies on audio-visual aesthetics education. For instance, Yan and Xia explored the integration of virtual reality (VR) and artificial intelligence technologies into music education to enhance students’ learning capabilities. Through the creation of interactive course delivery models and empirical verification, their results demonstrated the high accuracy of this technology in music signal recognition. This technology has the potential to effectively improve music instruction outcomes23. Abdunabievich and his colleagues delved into various types of educational technologies pertaining to moral, aesthetic, and student education. They classified these technologies into different categories based on distinct contexts and stages, highlighting their diversity and adaptability to cater to students with varying educational backgrounds and in different contexts24. Meanwhile, Yue and his research team examined China’s education system and reforms, with a specific focus on the application of educational policies in high school music education. They emphasized the crucial role of aesthetic education as a pathway to guide individuals in discovering, perceiving, expressing, and creating beauty in society25. Collectively, these studies on audio-visual aesthetics education underscore its significant and multifaceted contributions to the field of education.

In conclusion, existing research emphasizes the importance of audio-visual aesthetic education, and also highlights the necessity of combining the STEAM education concept with audio-visual aesthetic education. However, regarding how to achieve this integration in specific teaching practices, especially its application in different cultural and educational backgrounds, further discussion is still needed. Therefore, this research focuses on proposing a new teaching model to promote the effective combination of audio-visual aesthetic education and STEAM education and explore its application in college vocal music education.

Research methodology

Literature research method

The literature research method is a deliberate examination of data pertaining to the research subject and serves as a foundational element in scientific inquiry. As with other information-gathering methods, it requires a careful search plan in its preliminary stages, followed by thorough screening of the retrieved literature for authenticity and applicability. STEAM education and audio-visual aesthetic education both have a rich developmental history and practical application in advanced nations, particularly the United States, and therefore offer substantial value for research. Keywords such as “STEAM education” and “audio-visual aesthetics” are used to search repositories such as CNKI and Google Scholar for papers published in recent years. This exploration provides an understanding of educational development and theoretical support for the research. An in-depth review of the relevant literature shows that research on effectively combining STEAM education and audio-visual aesthetic education to improve vocal music teaching in institutions of higher learning is still in its infancy. On this basis, this research proposes a teaching model that integrates deep learning and STEAM education, and verifies its effect through empirical research.

Improved curriculum integration development model based on STEAM education

This research entails a comprehensive analysis of Chinese and foreign scholarly papers, culminating in synthesizing and evaluating various case studies pertaining to STEAM courses. Drawing from this extensive review, an integrated curriculum development model anchored in the principles of STEAM education is proposed26. Concurrently, an examination of the STEAM education network is conducted, elucidating the key focal points of this educational approach. These include analytical exploration, learning research, educational concepts, educational standards, educational simulation, interdisciplinarity, alignment with national requirements, and other critical elements27, as depicted in Fig. 1.

Fig. 1. Hotspots of STEAM education.

Figure 1 depicts STEAM education as an all-encompassing educational system, offering a fresh perspective and a new direction for higher education audio-visual aesthetic teaching. In this combined approach of audio-visual aesthetic education and STEAM education, it is essential to emphasize the integration of traditional culture and STEAM education28. This means that in the teaching process, traditional cultural elements should be combined with the principles of STEAM education, allowing students to appreciate works of art and gain knowledge in science, technology, engineering, and mathematics. In the context of this integrated approach, exploring and developing an educational system after researching a focal point becomes an effective implementation model based on STEAM education. This model considers interdisciplinary collaboration and integration as essential educational methods, enabling students to acquire more comprehensive knowledge through cross-disciplinary learning while fostering comprehensive abilities and innovative thinking. Moreover, this model should align with the country’s educational needs, harmonizing educational objectives with national development strategies to cultivate highly skilled talents that meet societal demands.

In summary, the integrated curriculum development model based on STEAM provides significant guidance and reference for higher education audio-visual aesthetic teaching, making education more comprehensive, practical, and adaptable to contemporary development. By combining traditional culture with STEAM education and emphasizing interdisciplinary collaboration and alignment with national needs, higher education audio-visual aesthetic teaching can effectively nurture talent with comprehensive qualities and innovative capabilities, contributing to the advancement and progress of society.

Project practice method of STEAM education

By its very nature, STEAM education necessitates a multifaceted approach and continuous adaptation. Drawing insights from both the findings of the literature review and accumulated experience, STEAM teaching initiatives can be broadly classified into five distinct categories. The diverse array of scientific teaching methods encompassed within the STEAM system is elucidated in Table 1.

Table 1. Development approach to science education in the STEAM concept.

Within Table 1, the evolution of scientific education methods for STEAM concepts represents an ongoing and progressive endeavor. This pursuit must be directed towards holistic and comprehensive development, intertwining the diverse and extensive qualities of STEAM’s multidisciplinary domains to foster the cultivation of comprehensive and innovative talents. The advancement of STEAM education development must underscore the cultivation of comprehensive capabilities and align with the imperatives of national progress. Through interdisciplinary courses, the focus should be on scientific and core concepts29, anchored in well-defined teaching objectives and informed by existing instructional resources. Moreover, it should be rooted in practicality, drawing insights from practical feedback and historical data. Innovative ideas in this realm necessitate the creation of pertinent teaching projects attuned to local needs and the specific educational context of students. These projects should manifest in diverse forms, unconstrained by traditional paradigms. The emphasis should not be on requiring every project to encompass all five areas of STEAM but rather on facilitating cross-domain integration to foster a synergistic and comprehensive educational experience.

Teaching strategies and process models based on deep learning

In educational strategies related to deep learning, considerable attention is devoted to integrating STEAM and artificial intelligence thinking30. The pedagogical process of deep learning can be delineated into three distinct stages, as visually depicted in Fig. 2.

Fig. 2. Teaching strategies for deep learning.

In Fig. 2, within the domain of cognitive skills, the educational objective is to empower students to personalize their instruction through autonomous knowledge acquisition and critical thinking. The aim is to foster a flexible application of acquired knowledge, both within and beyond the educational institution. Students are therefore encouraged to integrate theoretical learning with practical experience inside and outside the school setting, thereby cultivating autonomous and critical thinking capabilities. Regarding interpersonal communication, the learning program fosters group cooperation and experiential learning within the school environment and beyond. Emphasis is placed on cultivating teamwork and collective consciousness among students. This stage aims to facilitate student interaction and communication, enhancing interpersonal skills through collaborative efforts and hands-on learning experiences. With regard to personal aptitude development, repeated decision-making and engagement in educational practices are paramount in enabling students to cultivate independent higher-order thinking skills. At this stage, students can hone their innovative and problem-solving abilities through consistent practice and deliberate decision-making. Within college audio-visual aesthetic teaching under deep learning, a prevalent approach is a three-stage model comprising pre-teaching, interactive learning, and extension learning. The interactive learning phase encompasses five essential components: concept introduction, deep understanding, thinking transformation, ability transfer, and inductive reflection, as depicted in Fig. 3.

Fig. 3. Three-stage deep learning teaching process model.

In Fig. 3, the pre-teaching stage is the foundation of the instructional process and requires educators to comprehensively grasp the textbook’s content and students’ circumstances. It entails meticulous curriculum planning that incorporates pertinent materials31. This stage involves determining teaching material content, instructional content, student profiles, and pre-tasks. The second stage, interactive learning, fuses deep learning principles with teaching methodologies and is the pivotal part of the instructional approach. Designers must clearly understand the logical connection between teaching and learning materials. Through reflection and synthesis, the elements of the teaching process are evaluated, leading to refinements in the teaching format and improved learning efficacy. The third stage, extension learning, concludes the teaching process and is usually conducted in the form of assignments. Extension learning must be closely linked to pre-teaching and interactive learning, so that the three stages form a rigorous cyclical model that ensures the integrity and effectiveness of the teaching process. The quality and effectiveness of higher education audio-visual aesthetic teaching can be enhanced by employing these deep learning teaching strategies in conjunction with STEAM education and the cultivation of artificial intelligence thinking. Students improve their aesthetic abilities and develop comprehensive skills, innovative thinking, and critical thinking, preparing themselves for future development. Moreover, this teaching strategy can better adapt to the rapid growth of the information society and the application trends of deep learning technology.

Experimental design and performance evaluation

Experimental materials

Retrieval data based on STEAM education

Drawing upon the antecedent literature research, the keyword for retrieval is “STEM/STEAM education.” Subsequently, the assembled corpus of relevant scholarly articles on the identified subject is presented in Fig. 4.

Fig. 4. Annual statistics of publications related to STEAM education.

Figure 4 highlights a discernible surge in the volume of scholarly papers concerning STEAM education, particularly notable following the announcement of the STEM education law in the United States in 2015. The United States remains at the forefront of exploring the dimensions of STEAM education, evidenced by its expanding coverage of related research topics. The integration of STEAM education is now inseparably linked to keywords such as educational philosophy, educational literacy, and interdisciplinary studies. The data underscores the steady progression of STEAM education as a prominent and widely accepted pedagogical approach, signifying its ascent into the realm of mainstream educational methodologies duly acknowledged and embraced by educational practitioners.
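An annual tally of the kind summarized in Fig. 4 can be produced from a list of retrieved records. The following is a minimal sketch only: the record structure (a dictionary with a `year` field) and the sample entries are hypothetical and are not the data retrieved in this research.

```python
from collections import Counter


def publications_per_year(records):
    """Count records per publication year, returned in ascending year order."""
    counts = Counter(record["year"] for record in records)
    return dict(sorted(counts.items()))


# Hypothetical records for illustration only; not the retrieved corpus.
sample_records = [
    {"title": "Paper A", "year": 2014},
    {"title": "Paper B", "year": 2016},
    {"title": "Paper C", "year": 2016},
    {"title": "Paper D", "year": 2015},
]

for year, count in publications_per_year(sample_records).items():
    print(year, count)
```

Sorting the counts by year keeps the output in the order in which an annual trend would normally be plotted.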

Comparison of old and new teaching concepts

To assess the advantages of the aesthetic teaching method based on STEAM education and deep learning over traditional teaching, this research conducts a one-week teaching comparison experiment in art appreciation. Two classes, Classes A and B, from University Z, are selected for this experiment. Class A adopts a curriculum that is developed and enhanced through the lens of STEAM education, with the teacher designing specific content for the three-stage teaching process based on the strategies and process model of deep learning. Class B follows the conventional audio-visual aesthetic teaching methods typically employed in colleges and universities. After the teaching sessions are completed, a comprehensive questionnaire is administered to gauge the efficacy of the two teaching methodologies, and a comparative analysis is conducted to discern the impact of each approach. The statistical outcomes pertaining to the course content of Class A are presented in Fig. 5, while those of Class B are depicted in Fig. 6.

Fig. 5. Scoring of class A course content evaluation items.

Fig. 6. Scoring of class B course content evaluation items.

As shown in Figs. 5 and 6, Class A demonstrates a higher average score than Class B. This observation suggests that the three-stage educational process model, guided by the integration of STEAM and deep learning principles, improves overall teaching effectiveness. Most students demonstrate improved learning efficiency under this teaching approach, which not only fosters their audio-visual aesthetic proficiency but also bolsters their self-confidence in the learning process.
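A comparison of class averages such as the one above can be complemented by a test statistic. The source reports only that the Class A average is higher and does not name a significance test, so the sketch below is an assumption: it computes the mean difference and Welch's t statistic (which does not assume equal variances) for two independent samples. The score lists are hypothetical placeholders, not the questionnaire data.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples (unequal variances allowed)."""
    if len(sample_a) < 2 or len(sample_b) < 2:
        raise ValueError("each sample needs at least two scores")
    # Standard error of the difference in means, using sample variances.
    standard_error = sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    if standard_error == 0:
        raise ValueError("both samples have zero variance")
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Hypothetical item scores (out of 15) for illustration only.
class_a_scores = [12, 13, 14, 11, 13]
class_b_scores = [9, 10, 11, 10, 8]

print("mean difference:", mean(class_a_scores) - mean(class_b_scores))
print("Welch t:", welch_t(class_a_scores, class_b_scores))
```

Turning the statistic into a p-value additionally requires the Welch-Satterthwaite degrees of freedom and a t distribution, which is left to a statistics package.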

Regarding course performance assessment, three key aspects are considered for scoring: group mutual evaluation, teacher evaluation, and self-evaluation. The statistical results for Classes A and B, employing a scoring criterion of 3 points per item, are presented in Figs. 7 and 8, respectively.

Fig. 7. Scoring of class A course performance evaluation items.

Fig. 8. Scoring of class B course performance evaluation items.

For the course performance segment, whose results are illustrated in Figs. 7 and 8, each class is organized into four groups of 10 students. Students in Class A exhibit a heightened capacity to reflect accurately on their own perspectives, summarize learning outcomes, and address learning challenges. Notably, 55% of Class A students report having clear learning objectives and methods. They adapt their learning approaches to task requirements and show a clear grasp of their learning achievements. In peer evaluations within their groups, 60% of Class A students acknowledge that group members accommodate differing viewpoints and fulfill their individual responsibilities while coordinating with one another, thereby enhancing overall learning efficiency. Compared to Class B, Class A students demonstrate a more precise ability to position works according to their appreciation, along with a more comprehensive understanding of the emotional expression and specialized knowledge inherent in the works. Their application of aesthetic principles extends beyond the course to offer novel perspectives on appreciating works from the standpoint of other disciplines. In terms of teaching design, STEAM education-based courses revolve predominantly around the work’s central theme, emphasizing the logical progression of the process and its interconnectedness with other disciplines. In contrast, traditional audio-visual aesthetic education primarily centers on the course itself, which may diverge from actual aesthetic demands and imposes certain limitations.

The three-stage education model, which amalgamates STEAM education with deep learning, exhibits a profound consideration of students’ learning conditions while harnessing the potent analytical capabilities of computers in education. This leads to a notable enhancement in teaching efficiency and flexibility. Notably, STEAM education implementation in colleges and universities has garnered high acclaim from educators, underscoring the current national focus on students’ comprehensive development and the promotion of interdisciplinary learning initiatives.

Research methodology and curriculum evaluation

This experiment used a comparative research approach to evaluate the quality of audio-visual aesthetics education. Purposeful sampling techniques were utilized to intentionally select two equivalently sized classes, ensuring a representative comparison. Consent was obtained from both students and teachers to ensure the legitimacy and ethicality of the research process. Surveys were conducted to collect feedback from students and teachers, and the gathered data were subsequently subjected to analysis and evaluation. Statistical methods were employed to assess the reliability of the collected questionnaire data. The participants are college students from different cultural backgrounds, covering a variety of majors and learning experiences. The participants are students from urban and rural areas, aged between 18 and 25. Most of the participants have received higher education and have a certain foundation in music and visual arts. The diversity of these participants enhances the applicability of the research results in different educational environments. In addition, the data collection process pays special attention to the participants’ backgrounds in audio-visual aesthetic education to ensure that the collected data can reflect their learning experiences in their respective cultural and educational environments.
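The text above does not name the statistic used to assess questionnaire reliability. One common choice for multi-item questionnaires is Cronbach's alpha, and the sketch below assumes that choice purely for illustration. The response matrix is hypothetical: each row is one respondent and each column is one questionnaire item.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for rows of respondents, each holding k item scores."""
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses (3 items, 4 respondents) for illustration only.
sample_responses = [
    [3, 2, 3],
    [2, 2, 2],
    [1, 1, 2],
    [3, 3, 3],
]

print("Cronbach's alpha:", round(cronbach_alpha(sample_responses), 3))
```

Values closer to 1 indicate that the items vary together; the threshold treated as acceptable depends on the field and is not specified in the source.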

The Art Appreciation Elective offered by Z University serves as the foundation for this comparative study. Two equally sized classes, denoted as Classes A and B, are purposefully selected for the teaching comparison. Their courses are primarily centered on music appreciation, with the addition of video appreciation to evaluate the audio-visual aesthetic teaching quality. The selected works for appreciation encompass traditional musical compositions such as A Parting Tune with A Thrice Repeated Refrain and High Mountains and Flowing Water. The instructional duration spans one week, encompassing three distinct stages: pre-learning, interactive learning, and extension learning. Throughout the course, various audio-visual materials are available to enrich the learning experience. Upon the week-long teaching sessions’ conclusion, students and teachers are provided with pertinent questionnaires to gather their feedback and insights. These questionnaires are subsequently collected for analysis and evaluation.

Regarding the course content, the enhanced curriculum integrating deep learning and STEAM education encompasses a pre-learning phase wherein students actively engage in learning and comprehension. Subsequently, the teaching effectiveness is evaluated based on three key aspects: aesthetic understanding, artistic perception, and cultural interpretation, each scored according to the instructor’s assessment. Three components of course performance evaluation are appraised: self-assessment, group peer assessment, and teacher evaluation, each contributing to the evaluation of teaching quality. The specific scoring criteria and details for these evaluations are provided in Table 2.

Table 2 Scoring criteria for course content evaluation.

Each item in the evaluation is scored out of 15 points, with the following grading scale: (1) 1–5 points: average performance; (2) 6–10 points: good performance; (3) 11–15 points: excellent performance. Course performance is evaluated within the context of cooperative learning, encompassing three critical aspects: self-evaluation, group mutual evaluation, and teacher evaluation. The detailed scoring criteria for each of these aspects are outlined in Table 3.

Table 3 Scoring rules for course performance.

The evaluation process involves scoring and assessing three specific aspects of course performance based on designated teaching indicators. Following this assessment, the collected questionnaires are subjected to statistical analysis.
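
As a minimal sketch of the 15-point grading scale described above, the mapping from an item score to its performance band could be written as follows. The function name `grade_band` and the sample scores are hypothetical and serve only as an illustration; they are not taken from Table 2 or Table 3.

```python
def grade_band(score: int) -> str:
    """Map an item score on the 15-point scale to its performance band."""
    if not 1 <= score <= 15:
        raise ValueError("score must be between 1 and 15")
    if score <= 5:
        return "average"    # 1-5 points
    if score <= 10:
        return "good"       # 6-10 points
    return "excellent"      # 11-15 points


# Hypothetical instructor scores for one student (illustrative values only).
scores = {
    "aesthetic understanding": 12,
    "artistic perception": 9,
    "cultural interpretation": 4,
}
bands = {item: grade_band(value) for item, value in scores.items()}
```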

Questionnaire reliability analysis

The questionnaire reliability analysis treats the individual questions within the questionnaire as equivalent. Reliability can be conceptualized as the ratio of the signal variance to the total variance, as expressed in Eq. (1) and Eq. (2).

$$S_X^2 = S_T^2 + S_E^2$$
(1)
$$r = \frac{S_T^2}{S_X^2}$$
(2)

In Eq. (1) and Eq. (2), the subscripts X, T, and E denote the variances of score, signal, and noise, respectively. The reliability coefficient, denoted by r, indicates the extent of relevance among the questions within the questionnaire. Its value increases with the level of consistency among the questions. If the questionnaire comprises k equivalent questions, the score variance can be represented by a covariance matrix, as exemplified in Eq. (3).

$$c = \left[ \begin{array}{ccc} \sigma_{1,1}^2 & \cdots & \sigma_{1,k} \\ \vdots & \ddots & \vdots \\ \sigma_{k,1} & \cdots & \sigma_{k,k}^2 \end{array} \right]$$
(3)

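The sum of the diagonal elements in the matrix is denoted by \(\sum \sigma_i^2\), and \(\sigma_Y^2\) denotes the variance of the total score, that is, the sum of all elements in the matrix. The derivation of the reliability coefficient is presented in Eq. (4).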

$$1 - \frac{\sum \sigma_i^2}{\sigma_Y^2}$$
(4)

To ensure self-consistency of the equation, Eq. (4) is multiplied by \(\frac{k}{k-1}\), resulting in the general expression for Cronbach’s α, as shown in Eq. (5):

$$\alpha = \frac{k}{k-1}\left( 1 - \frac{\sum \sigma_i^2}{\sigma_Y^2} \right)$$
(5)
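
Eq. (5) can be evaluated directly from a table of item scores. The sketch below assumes rows are respondents and columns are questionnaire items, and it uses the sample variance (`ddof=1`) throughout. The `responses` array is made-up data for illustration, not the study's questionnaire results.

```python
import numpy as np


def cronbach_alpha(scores) -> float:
    """Cronbach's alpha per Eq. (5); rows = respondents, columns = items."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                                # number of items
    item_variances = scores.var(axis=0, ddof=1)        # sigma_i^2 for each item
    total_variance = scores.sum(axis=1).var(ddof=1)    # sigma_Y^2 of the summed score
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)


# Hypothetical responses: 5 respondents x 3 items (illustrative values only).
responses = np.array([
    [12, 11, 13],
    [8, 9, 7],
    [14, 13, 15],
    [6, 7, 5],
    [10, 10, 11],
])
alpha = cronbach_alpha(responses)
```

Because the made-up items move closely together, the resulting coefficient is high; real questionnaire data would typically produce a lower value.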

The reliability analysis, employing Cronbach’s α as the measure, primarily relies on covariance. Let \(\bar{v}\) denote the mean of the k item variances on the diagonal of the matrix; the sum of the item variances is then k times this mean, as depicted in Eq. (6).

$$\sum \sigma_i^2 = k\bar{v}$$
(6)

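Similarly, with \(\bar{c}\) denoting the mean of the \(k^2 - k\) off-diagonal covariances, the total score variance can be expressed as the sum of the variance and covariance terms, as shown in Eq. (7).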

$$\sigma_Y^2 = k\bar{v} + (k^2 - k)\bar{c}$$
(7)

Substituting Eq. (6) and Eq. (7) into Eq. (5), Eq. (8) is derived.

$$\alpha = \frac{k}{k-1}\left( 1 - \frac{k\bar{v}}{k\bar{v} + k\left( k-1 \right)\bar{c}} \right)$$
(8)

Because the term in parentheses reduces to \(\frac{(k-1)\bar{c}}{\bar{v} + (k-1)\bar{c}}\), Equation (8) can be simplified as Eq. (9).

$$\alpha = \frac{k\bar{c}}{\bar{v} + \left( k-1 \right)\bar{c}}$$
(9)

When the items are standardized to unit variance, \(\bar{v} = 1\) and the mean covariance \(\bar{c}\) becomes the mean inter-item correlation \(\bar{r}\), leading to the mathematical representation of the reliability analysis depicted in Eq. (10).

$$\alpha = \frac{k\bar{r}}{1 + \left( k-1 \right)\bar{r}}$$
(10)
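
The covariance form in Eq. (9) and the correlation form in Eq. (10) can be checked numerically. The sketch below builds the item covariance matrix of Eq. (3) with NumPy and computes both coefficients. The function names and the `responses` array are illustrative assumptions, not part of the study.

```python
import numpy as np


def alpha_from_covariance(scores) -> float:
    """Eq. (9): alpha from the mean item variance and mean inter-item covariance."""
    cov = np.cov(np.asarray(scores, dtype=float), rowvar=False)  # k x k matrix of Eq. (3)
    k = cov.shape[0]
    v_bar = np.trace(cov) / k                             # mean of the diagonal variances
    c_bar = (cov.sum() - np.trace(cov)) / (k * (k - 1))   # mean of the off-diagonal covariances
    return k * c_bar / (v_bar + (k - 1) * c_bar)


def standardized_alpha(scores) -> float:
    """Eq. (10): alpha from the mean inter-item correlation."""
    corr = np.corrcoef(np.asarray(scores, dtype=float), rowvar=False)
    k = corr.shape[0]
    r_bar = (corr.sum() - k) / (k * (k - 1))              # diagonal of a correlation matrix sums to k
    return k * r_bar / (1 + (k - 1) * r_bar)


# Hypothetical responses: 5 respondents x 3 items (illustrative values only).
responses = np.array([
    [12, 11, 13],
    [8, 9, 7],
    [14, 13, 15],
    [6, 7, 5],
    [10, 10, 11],
])
alpha_cov = alpha_from_covariance(responses)
alpha_std = standardized_alpha(responses)
```

Eq. (9) is an algebraic rearrangement of Eq. (5), so `alpha_cov` matches the value obtained from the raw item and total variances. `alpha_std` differs slightly whenever the item variances are unequal.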

The reliability analysis employs the measure of Cronbach’s α reliability coefficient, as illustrated in Fig. 9.

Fig. 9 Cronbach’s α reliability evaluation criteria.

Figure 9 presents the reliability levels associated with Cronbach’s α, with a reliability coefficient exceeding 0.7 considered reliable according to established standards. Consistent with the principles and measurement criteria for reliability testing, the analysis of the questionnaire’s reliability is presented in Table 4.

Table 4 Analysis of results of questionnaire reliability.

In Table 4, the overall reliability score of the questionnaire reaches 0.736, which exceeds the 0.7 threshold and indicates credible results.

Specific applications of deep learning technology in audio-video aesthetic teaching

In instructional design, deep learning technology is mainly applied to audio-video aesthetic education in the following ways:

(1) Emotion recognition: The convolutional neural network in deep learning is utilized to analyze students’ facial expressions and emotional responses when watching videos and listening to music during the learning process. Through emotion recognition technology, teachers can capture students’ emotional changes in real time, such as excitement, calmness, and concentration, and thus adjust teaching strategies and provide personalized guidance according to students’ emotional feedback.

(2) Audio analysis: This research adopts the time-frequency image analysis technology in deep learning (such as short-time Fourier transform combined with deep learning models) to analyze the audio data of musical works. It intends to help students understand the rhythm, melody, and emotional expression in music more deeply. This technology can automatically generate music score analysis and emotional features, enhancing students’ music perception ability.

(3) Visual data processing: In the video appreciation session, deep learning analyzes elements such as color, composition, and movement trajectory in videos to help students feel the beauty of artworks more comprehensively and visually. This not only helps improve students’ visual aesthetic ability but also enables students to associate and compare the visual elements in audio-video works with musical expressions.

Through the introduction of these technologies, the teaching process can provide more efficient and accurate feedback on students’ learning status and help students achieve significant improvement in emotional expression and artistic perception ability.
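
The time-frequency analysis mentioned under audio analysis can be illustrated with a short-time Fourier transform. The sketch below is a minimal NumPy-only version. The frame length, hop size, sample rate, and the synthetic 440 Hz tone are all assumptions chosen for illustration, since the study does not report its preprocessing settings. The deep learning model that would consume the resulting time-frequency image is not shown.

```python
import numpy as np


def stft_magnitude(signal, sample_rate, frame_len=1024, hop=256):
    """Return (freqs, times, magnitudes) of a Hann-windowed STFT."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window to each.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    magnitudes = np.abs(np.fft.rfft(frames, axis=1)).T    # shape: (freq bins, frames)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    times = (np.arange(n_frames) * hop + frame_len / 2) / sample_rate
    return freqs, times, magnitudes


# Synthetic one-second 440 Hz tone standing in for a music recording.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

freqs, times, mags = stft_magnitude(tone, sample_rate)
peak_hz = freqs[mags.mean(axis=1).argmax()]   # strongest frequency bin on average
```

The `mags` array is the time-frequency image. With these settings each frequency bin is about 7.8 Hz wide, so `peak_hz` lands on the bin nearest to 440 Hz rather than on 440 Hz exactly.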

Challenges and limitations of deep learning in music education

As a core technology in the field of artificial intelligence, deep learning has shown great potential in art education in recent years. In addition to its application in college vocal music teaching, it can also play an important role in other fields of art education and provide new possibilities for interdisciplinary research. In art education, deep learning can be used in aspects such as image generation, style transfer, and art authentication. Through generative adversarial networks, students can blend their own creations with the styles of famous artists to create works with unique artistic styles. This not only stimulates students’ creative enthusiasm but also deepens their understanding of different art schools and techniques. Meanwhile, deep learning technology can also help authenticate the authenticity of artworks and provide technical support for the art market and art history research. In dance education, deep learning can be used for motion capture and posture analysis to help students correct dance movements and improve expressiveness. By recording students’ dance movements with camera equipment, deep learning models can analyze and provide feedback in real time, point out deficiencies in movements, and provide personalized practice suggestions. This technology not only improves teaching efficiency but also enhances students’ learning initiative. In the fields of drama and film education, deep learning can be used for emotion recognition and character analysis. Through the analysis of performers’ facial expressions and voices, deep learning models can evaluate the accuracy of their emotional expressions and help students better understand the psychology of characters. Moreover, deep learning can also be used in scriptwriting to assist screenwriters in generating plot developments and dialogues and stimulate creative inspiration. 
Combining deep learning with STEAM education helps cultivate students’ interdisciplinary thinking and comprehensive abilities. The advanced technical means provided by deep learning can enrich the teaching content and forms of STEAM education. For example, in music education, using deep learning technology for audio analysis and generation can help students understand music structure and creative principles more deeply; in engineering education, combining deep learning for music production and arrangement can cultivate students’ technical application ability and artistic creation ability. This integration not only enriches teaching content but also cultivates students’ comprehensive qualities and innovation abilities. With the continuous development of deep learning technology, its application prospects in art education will be even broader. Future interdisciplinary research can further explore the combination methods of deep learning and different art forms and develop more intelligent tools and platforms suitable for education. In addition, through interdisciplinary collaborative research, deep integration of technology and art can be achieved, promoting innovation and the development of education models.

Although deep learning technology has shown significant advantages in audio-video aesthetic teaching, there are still some challenges and limitations in specific applications:

(1) Demand for hardware resources: The operation of deep learning algorithms usually requires strong computing power and a large amount of data support. However, many colleges and universities may face problems such as insufficient hardware equipment or imperfect infrastructure. This limits the widespread application of deep learning technology to a certain extent.

(2) Technical ability requirements: For front-line teachers, using deep learning technology for instructional design and implementation may require a certain technical background or professional training. Teachers need to master basic programming skills and the usage methods of deep learning tools, which undoubtedly increases the complexity of teaching implementation.

(3) Data acquisition and privacy issues: Deep learning models need a large amount of student data (such as emotional responses and audio responses) for training and optimization, which involves data collection, storage, and privacy protection. In educational institutions, how to collect and use these data legally and compliantly is an aspect that requires special attention.

(4) Teaching resources and costs: The introduction of deep learning technology may mean the need to purchase specialized teaching software, training datasets, and corresponding hardware equipment. These resources may not be easily accessible in some educational institutions. In particular, schools with limited resources may find it difficult to implement this innovative teaching model.

This discussion of challenges is intended to provide a useful reference for other researchers and educators who use deep learning technology and to promote the reasonable and effective application of this technology in education.

Discussion

According to Harlen (2016), the development of a curriculum grounded in the principles of STEAM is feasible and emphasizes the need to enhance students’ potential and foster their comprehensive abilities, aligning with the viewpoint espoused in this research32. Li (2016) employed diverse evaluation methods in teaching development, combining STEAM education with maker education, and introduced novel evaluation approaches to drive educational advancement33. In line with these perspectives, this research adopts STEAM education to propose a novel teaching model, subsequently evaluating and comparing its efficacy against the traditional teaching approach. The research findings affirm the superior efficiency of the new teaching model.

Additionally, Hsiao and Su (2021) integrated the concept of sustainable development into VR-assisted STEAM education, providing students with comprehensive inter-disciplinary STEAM education. The research findings show that the combination of STEAM education and VR-assisted experiential courses improves students’ learning satisfaction and outcomes while strengthening their motivation to learn34. Consistent with the results of this research, both studies demonstrate that STEAM education can enhance students’ interest in learning. Ozkan and Umdu Topsakal (2021) investigated the effectiveness of STEAM education in cultivating the conceptual understanding of force and energy topics among 13–14-year-old students. They conducted experiments with 7th-grade students in experimental and control groups. The results indicated that STEAM education positively influenced students’ conceptual understanding, reducing or transferring misconceptions. Moreover, the experimental group’s post-test conceptual understanding scores were significantly higher than those of the control group35. Their experiment, like this research, demonstrated that STEAM education can strengthen students’ comprehension of classroom knowledge and enhance their learning abilities. Mun (2022) explored the role of aesthetic experiences in the learning process of integrating art, science, and technology, studying students’ experiences of creating interactive art in the context of STEAM education in South Korea. The study found that through this learning experience, students recognized the limitations of their thoughts and changed their ideas by applying new scientific knowledge and skills36. This result indicated an improvement in students’ comprehensive abilities, which aligns with the results of this research. Utomo et al. aimed to assess the effectiveness of a STEAM biotechnology module incorporating Flash animations in high school biology instruction. The results validated the module, and student responses were highly positive. Effectiveness test results demonstrated significant learning progress among students who used the module37. These findings align with this research, suggesting that interdisciplinary STEAM approaches and multimedia teaching tools can enhance the effectiveness of biology education. Zheng based their study on an English audio-visual oral course for a 2022 law major class, using 50 original Disney English movies as core teaching materials. They further applied STEAM educational principles to enhance students’ listening and speaking skills38. That study shares commonalities with the current research, utilizing movies for audio-visual aesthetics education and integrating STEAM education principles.

The educational model adopted in this research places students at the forefront to harness their intrinsic motivation and foster their creative capacity through a dynamic three-stage deep learning process centered around the works. The overarching goal is to cultivate inter-disciplinary abilities among students. In light of the contemporary context and the rapid advancement of science and technology, it is imperative to continuously innovate teaching methods. Traditional modes of education, wherein teachers solely dictate information to students, fail to stimulate innovative thinking among students. Therefore, fostering students’ ability for scientific exploration necessitates a confluence of theory with practical application, allowing theoretical knowledge to evolve and flourish through hands-on exploration.

In practical teaching scenarios, the innovative curriculum design based on STEAM education and deep learning, along with the three-stage teaching process model, can effectively help students enhance their music and visual appreciation abilities and promote comprehensive qualities, fostering well-rounded development in aesthetic perspectives, emotional expression, and overall abilities. As a contemporary teaching concept, STEAM education prioritizes students’ comprehensive development and aligns with the requirements of the era of economic globalization. Combining the STEAM education concept with audio-visual aesthetic teaching provides an organic way to integrate scientific knowledge with emotional expression in the instructional approach. This novel teaching model can be promoted in higher education, positively impacting teaching quality and promoting students’ comprehensive development. The research’s evaluation of teaching effectiveness through questionnaire surveys demonstrates the superiority of the new educational model. This finding indicates that the new curriculum design concept and teaching process model can effectively enhance students’ learning outcomes and comprehensive abilities, which is of practical significance for cultivating well-rounded college students.

Conclusion

Research contribution

In the context of ongoing reforms in the national education system, the enhancement of audio-visual aesthetic teaching quality in colleges necessitates the effective utilization of teaching resources and information reservoirs. Continuous improvement in teaching methodologies is crucial, optimizing the convergence of science and technology with instructional efficiency. Drawing on an extensive literature review, this research synthesizes and examines the principles of deep learning and STEAM education, offering a comprehensive analysis of the current state, challenges, and methodological approaches in education. Significantly, an innovative and refined developmental model for educational courses is proposed, anchored in the prevailing themes of the teaching era. A three-stage teaching process model is introduced upon the foundation of deep learning, laying the groundwork for the future advancement of audio-visual aesthetic courses in higher education institutions. Given that college students have engaged with society and actively contribute to national development, their knowledge reservoirs reflect the country’s educational level and scientific and technological prowess. This research endeavors to support and empower audio-visual aesthetics educators, catalyzing the advancement of education within the nation.

This research holds multifaceted implications for policy, practice, and theoretical development. In terms of policy, it introduces a teaching model that fuses audio-visual aesthetics with STEAM education and empirically substantiates its efficacy in enhancing the quality of education. Policymakers may contemplate integrating this teaching approach into educational policies to foster more holistic education and elevate students’ overall competence. In practice, the three-stage teaching model and the content assessment methods provide practical teaching guidelines for educators. Professionals in the field of education can draw inspiration from these techniques to improve their teaching methods, particularly in the realm of audio-visual aesthetics education. In terms of theoretical advancement, this research, in collaboration with STEAM education theory, establishes a new theoretical framework for the field of education. This interdisciplinary educational approach may kindle greater interest among researchers in comprehensive educational models, thereby propelling further progress in educational theory. In summary, this research offers valuable insights for policymakers, educational practitioners, and researchers in education, contributing to enhancing educational practice and theory.

Future works and research limitations

Although this research has accomplished specific goals, several limitations should be noted. The research was conducted with students from just two classes, resulting in a relatively small sample size. This limited scope could potentially affect the research’s external validity, which concerns the generalizability of research findings to larger populations or different contexts. Consequently, future research might enlarge the sample size to ascertain the relevance of this study to more specific subjects and teaching scenarios. In the contemporary educational landscape, integrating information technology with STEAM education has emerged as a productive and encouraging teaching approach, fostering students’ independent learning capabilities. Dissemination of these relevant teaching methods is paramount and should be expeditiously pursued. Nonetheless, specific challenges, such as prevailing teaching philosophies among educators and the capital investment required by educational institutions, pose obstacles to the widespread adoption of STEAM education and deep learning in the education field. Consequently, the current implementation of these methods remains in a trial phase. To advance this field, future research should focus on broadening the application scope of STEAM education and deep learning in teaching, thereby helping more educational practitioners recognize the research value of this innovative educational model. Moreover, improvements in academic courses are predominantly confined to the instructional aspect, and the understanding of deep learning and STEAM educational concepts may not be sufficiently comprehensive. The practicality of teaching experiments may lack universality, and due to resource constraints, experiments are restricted to one university, resulting in challenges in controlling variables effectively. 
In summary, the investigation of audio-visual aesthetic teaching calls for substantial verification and exploration. Additional experimentation with larger samples is warranted to yield more reliable and robust data. Overall, research on enhancing teaching methodologies in this domain remains in its early stages.