INTRODUCTION
In recent decades, an increasing number of interventions have integrated language arts and inquiry-based science and technology (S&T) instruction in elementary education (Cervetti et al., 2012; Vitale and Romance, 2011). In S&T education, students are taught about the natural and material (human-made) environment and technological artifacts by collecting, analysing and interpreting information gathered through experimenting and testing (Cakir, 2008; NGSS Lead States, 2013). Although the precise elementary school language arts curriculum varies across schools and countries, curriculum core standards typically address components of reading skills, writing skills and oral language in the first language (e.g., see Common Core Standards Initiative, 2010; International Reading Association and National Council of Teachers of English, 1996; SLO, 2020). Integrating language arts and inquiry-based S&T education aligns with sociocultural theories of learning, which emphasize that language learning can be enhanced when it occurs in a context that is socially and culturally relevant to students (Lemke, 1990). In line with these theories, science education is viewed as a community of discourse (Leach and Scott, 1995; Lemke, 1990), as language plays a vital role in developing an understanding of the world, and more specifically, of S&T (Daniels, 2001; Lemke, 1990). Constructivist and sociocultural theories of learning emphasize the importance of inquiry- or design-based pedagogies in S&T education, as these pedagogies suit the true nature of S&T (Lewis, 2006). The theoretical alignments between S&T and language learning are reflected in recent curricular shifts towards promoting science knowledge-in-use (Harris et al., 2019) and language-in-use (Lee et al., 2013), rather than focusing on the body of knowledge of the subjects. Recent studies have shown that there are multiple connections between the standards of the two subjects (Lee et al., 2013).
Language arts and S&T instruction involve many similar cognitive and intellectual processes, such as making predictions and assessing the quality of arguments and assumptions based on data and evidence (Baker, 1991; Bradbury, 2014). Consequently, interest in linking the curricular contents of S&T with language arts education has increased in recent years.
For teachers, an important motive for integrating instruction in language arts and S&T concerns the limited time spent on teaching S&T in elementary education (Martin et al., 2012). Many teachers feel insecure about their content knowledge and scientific and technological skills (Appleton, 2007; Asma et al., 2011; Traianou, 2007). Integrating instruction in S&T and language arts can make S&T instruction more appealing for teachers, and could thus increase the time spent on teaching S&T (Appleton, 2007).
In response to these developments, interventions have been designed to evaluate the effects of integrated language arts and S&T (ILS&T) curricula that adopt an inquiry- or design-based pedagogy (e.g., Guthrie et al., 2004; Romance and Vitale, 2001). Because of the natural connection between both subjects and the potential learning opportunities in integrated approaches, an integrated approach could promote students’ metacognitive and conceptual growth, and lead to a higher level of retention (Yore and Treagust, 2006). However, interventions have shown high levels of variation with respect to their content and instructional approach, and in terms of their effects on students’ learning outcomes. An analysis of the differences between the interventions is needed to understand the variability in their effects. Integrated language arts and S&T curricula may differ substantially, because of the various language arts modalities (i.e., reading, writing, oral language, and vocabulary) and S&T characteristics (e.g., design or inquiry, primary focus on knowledge or skills) that can be involved. The characteristics of the interventions will logically affect their results. Below, we explicate how this review differs from previous reviews on this topic. This review aims to provide a systematic analysis of the effects of the reported ILS&T interventions by answering the following research questions:
- What features characterize studies of and interventions for ILS&T?
- Does ILS&T instruction enhance language arts and S&T learning compared to language arts and S&T instruction that is not integrated?
- Which characteristics of ILS&T interventions are associated with the effects on language arts and S&T learning?
BACKGROUND LITERATURE
Before tackling the empirical literature related to the research questions, it is necessary to consider what the relevant theory and research have to say about expected student learning gains as a result of learning language arts and S&T in an integrated way. Given those expected learning gains, it is also important to develop a theoretical and empirical basis for determining which study and intervention characteristics could moderate the effects on student achievement in language arts and S&T.
Expected Student Learning Outcomes of ILS&T Instruction
From a theoretical perspective, ILS&T instruction might improve learning outcomes with respect to several aspects of language arts and S&T learning.
Children’s language skills might benefit from the meaningful and authentic context of science education. Integration of the language arts and S&T curricula provides students with an authentic purpose for communicating and for using and interpreting language in different forms (e.g., texts, conversations, figures; see Christie, 2017; Hapgood and Palincsar, 2006). Because of this, scholars have proposed that S&T instruction can create a stronger sense of purpose and relevance and can therefore facilitate retention for language learning (Guthrie et al., 2006; Stoller, 2008). Scholars have also argued that reading motivation can be enhanced by situating reading in a meaningful and activating context, such as the science classroom (Wigfield and Guthrie, 1997).
Moreover, reading (and writing) can be viewed as a constructive process of interpreting disciplinary information (Osborne, 2002), and prior knowledge is therefore considered to be a critical determinant of disciplinary learning, including in the language arts (Kintsch, 2004).
From the perspective of science education, S&T learning can arguably benefit from integration with language arts instruction, because more advanced language skills can help students transform global ideas into (S&T) knowledge that is more coherent and structured (Osborne, 2002). S&T knowledge and skills draw strongly on students’ (productive and receptive) language skills. As in the scientific community, students have to communicate and interpret information in texts and discussions that is often abstract and complex (for instance, see Lee et al., 2013). Enhancing students’ language skills can therefore also enhance their ability to participate in S&T practices.
Finally, various scholars have argued that students’ attitudes towards S&T learning could be enhanced by making S&T instruction more relevant and coherent to students’ daily lives, and by focusing on the distinctive value of engaging in science (Chen et al., 2014; Jenkins, 2011). This can be realized through integration with language arts education, by offering real-world examples through texts and discussions, and engaging with scientific content in meaningful and multimodal ways.
Although integration can be an effective approach to teaching language arts and S&T, there are also potential obstacles. Nixon and Akerson (2004) argued that it may be difficult to align the complex goals of language arts learning with the goals of S&T. For example, when students are instructed to report about their scientific observations using a new writing framework, this may end up suppressing the cognitive processing of science concepts, because students’ attention is mainly directed towards the correct use of the new writing structure. The limited attentional capacity model suggests that increasing the cognitive complexity of a task may cause students to prioritize one aspect of performance and neglect the others, due to limited attentional resources (Skehan and Foster, 2001). Similarly, teachers may feel the need to make allowances for the learning objectives of one of the domains to make the learning activities feasible for the students. This may lead to superficial treatment of challenging learning material, which can be demotivating for students (Brophy and Alleman, 1991).
Prior Reviews and Current Study
Scholars have previously examined the added value of curricula that integrate science and language for student learning. Yore et al. (2003) conducted an extensive conceptual review of the literature on literacy and science integration in order to outline current trends and future directions for this area of research, but did not evaluate the effects on student learning. Bradbury (2014) reviewed the literature on science and language integration, with an emphasis on empirical studies and their effects, but did not distinguish between traditional science instruction and inquiry-based or design-based instruction. Furthermore, the review did not report the effect sizes (ESs) of the studies, but rather presented a descriptive analysis of the interventions. Several meta-analyses have evaluated the effect of literacy and science integration on specific aspects of language arts learning, such as vocabulary (Guo et al., 2016) and writing (Graham et al., 2020). Other meta-analyses have examined the integration of language arts instruction with other content-area learning (Graham et al., 2020; Hwang et al., 2022), including science, social studies, and mathematics. Although these reviews reported promising results, it is not yet possible to disentangle the effects of ILS&T instruction based on their findings. In particular, the impact of ILS&T instruction with inquiry- or design-based pedagogy has not previously been subjected to thorough examination in any review. This pedagogical focus is relevant, because of the shift from traditional, teacher-led science instruction to inquiry- or design-based pedagogies in curricular standards (e.g., NGSS Lead States, 2013). These pedagogies are underpinned by different theories of learning (e.g., behaviourism, constructivism) and are characterized by different beliefs about how knowledge is constructed.
Furthermore, the way in which learning content is offered to students impacts learning outcomes (for a recent review, see Khalaf and Mohammed Zin, 2018). Inquiry- or design-based learning also exposes students to discipline-specific language and emphasizes communicating about complex concepts and relationships. The current review additionally encompasses all domains of language learning (i.e., reading, writing, oral language, and vocabulary) to reflect the language arts curriculum in elementary schools. In this way, the effects that are described in these interventions are closely aligned with classroom practice advocated in many current educational standards (e.g., NGSS Lead States, 2013). Finally, scholars have not previously canvassed the features of studies and interventions to identify effective approaches to ILS&T instruction. This requires a clear characterization of ILS&T studies and interventions, and an analysis of the relation between these characteristics and the intervention effects, which was a second aim of the current review.
Potential Moderators
Various factors can potentially affect the impact of ILS&T interventions on student outcomes. The effects on student outcomes can vary depending on the characteristics of the study (e.g., study design, instruments used to measure student achievement) and of the intervention (e.g., learning goals, instructional method). Potential moderators are described below and summarized in Table 2.
Study characteristics
When comparing effects of educational interventions, it is first important to consider the research design of the study (Wilson and Lipsey, 2001). Studies with a (cluster) randomized design (i.e., experiments in which clusters of people, such as pre-existing classes or schools, rather than individuals are assigned at random to treatments) are generally preferred for studying intervention effects, as this design affords the best ground for causal inferences. When participants are not assigned at random, the experimental and control groups may differ on average on relevant characteristics (e.g., motivation, performance level), which makes it difficult to attribute results to the intervention. In quasi-experiments, matching procedures can be used to make participants in the two conditions as similar as possible with respect to certain covariates (e.g., demographics, ability level, school setting). However, matching is only possible for variables on which information is available. It is, for example, difficult to match teachers on their teaching ability (Borenstein et al., 2019).
Second, it is important to consider the type of control group that is included in experimental studies. The effect of ILS&T instruction can be compared to control groups receiving only language instruction, only S&T instruction, or separate language and S&T instruction. For instance, when comparing the ILS&T intervention to only S&T instruction, students in the control group receive the same S&T instruction as the experimental group, but without the incorporation of language instruction. Therefore, it would seem likely that the control group students would show progress on their S&T outcomes, but not so much on their language outcomes, as they did not receive that instruction at all. Thus, the comparison with each type of control group presents a slightly different type of evidence.
Third, the implementation scale can affect the outcome of an intervention. Studies with a smaller sample size make it easier for program developers to maintain a higher degree of treatment fidelity (“super implementation”), and therefore tend to overestimate effects (Cheung and Slavin, 2016).
Fourth, the methods used to evaluate the intervention effects should be considered. Independent instruments are preferred to researcher-developed instruments (Wolf and Harbatkin, 2023). Researcher-developed instruments specifically designed for a study are often associated with higher effect sizes (ESs), because of their (over)alignment with the intervention (Cheung and Slavin, 2016; Wilson and Lipsey, 2001). The time of testing in pre-post-test designs can also influence intervention effects. Post-test measures that are administered directly after the intervention only inform about the short-term effects, whereas effects found with retention tests are more likely to be long-lasting. In addition, when reporting ESs, post-test gains that are not adjusted for pre-test differences can be a lingering effect of pre-test differences or standard error, instead of a treatment effect (Wilson and Lipsey, 2001).
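The point about unadjusted post-test gains can be illustrated with a short numerical sketch. The functions and numbers below are hypothetical and are not the formulas or data used in this review; they assume one common way of expressing an ES as a standardized mean difference, once from post-test means only and once from the difference in pre-to-post gains, both scaled by the pooled pre-test standard deviation.

```python
from math import sqrt


def pooled_sd(sd_t, n_t, sd_c, n_c):
    """Pooled standard deviation of the treatment and control groups."""
    return sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2))


def post_only_es(post_t, post_c, sd_pooled):
    """Standardized difference in post-test means, ignoring pre-test scores."""
    return (post_t - post_c) / sd_pooled


def pre_adjusted_es(pre_t, post_t, pre_c, post_c, sd_pre_pooled):
    """Standardized difference in gains (treatment gain minus control gain)."""
    return ((post_t - pre_t) - (post_c - pre_c)) / sd_pre_pooled


# Hypothetical study: the treatment group already scored higher at pre-test.
sd = pooled_sd(sd_t=10.0, n_t=30, sd_c=10.0, n_c=30)
unadjusted = post_only_es(post_t=60.0, post_c=54.0, sd_pooled=sd)
adjusted = pre_adjusted_es(pre_t=52.0, post_t=60.0, pre_c=48.0, post_c=54.0,
                           sd_pre_pooled=sd)
print(f"unadjusted ES = {unadjusted:.2f}, pre-test adjusted ES = {adjusted:.2f}")
```

In this invented example, two thirds of the apparent post-test advantage was already present before the intervention, so the unadjusted ES (0.60) is three times the adjusted one (0.20).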
Intervention characteristics
Besides the study characteristics, the interventions described in the studies can also vary, in terms of both the instructional intervention provided to students and the support (if any) given to teachers via a teacher professional development (TPD) program.
Instructional intervention. It is crucial to consider instructional alignment, which has been demonstrated to produce higher achievement results than poorly aligned instruction (Cohen, 1987). The learning objectives largely determine the content and format of instruction, and therefore, these aspects must be coordinated appropriately. Various learning taxonomies can be used to assess instructional alignment, such as Bloom’s learning taxonomy (Anderson et al., 2001) or Gagné’s learning hierarchies (Gagné, 1968). Primarily, assessing instructional alignment requires a classification of learning goals, learning content and instructional methods.
To this end, ILS&T instruction ideally involves concrete learning goals for language arts and for S&T learning. A widely used classification of the domains that learning goals may focus on is: knowledge, skills and/or attitude (e.g., Bloom and Krathwohl, 1956). Regarding S&T instruction, learning goals can attend to the development of knowledge of the natural and material environment, skills for scientific inquiry and technological design (e.g., defining problems, analysing data), and a critical, curious and investigative attitude (NGSS Lead States, 2013). In language arts instruction, learning goals can address the development of knowledge about language (e.g., text structures and knowledge of reading strategies), language skills (e.g., reading comprehension, the application of reading or writing strategies), and a positive attitude towards language-related learning (Common Core Standards Initiative, 2010). Moreover, language learning activities can serve as a vehicle for S&T learning, for instance, when oral language activities involve talking about S&T phenomena, without offering deliberate support for advancing students’ oral language skills. Additionally, the integration of the learning goals can be realized in various ways, depending on the intensity and complexity of subject integration. The framework developed by Gresnigt et al. (2014) proposes a hierarchy of integration approaches: fragmented, connected, nested, multidisciplinary, interdisciplinary and transdisciplinary (see Table 1). At the lowest level of integration, the curriculum has separate goals for each subject. At the highest level, the curricular goals transcend the individual disciplines.
Table 1. Classification of integrated language and S&T programs (Gresnigt et al., 2014)
| No | Hierarchy of integration |
|---|---|
| 1 | Fragmented: Separate and distinct learning goals for the subjects. Often viewed as the traditional way of teaching. |
| 2 | Connected: A connection is made between the subjects. The content of the lessons is taught by separate teachers, or the content of the lessons is about one subject. |
| 3 | Nested: A skill or knowledge from one subject is targeted within the other subject, but each subject has its own (set of) learning goals and one of the subjects is dominant over the other. |
| 4 | Multidisciplinary: Two or more subject areas are organized around the same theme or topic, but the disciplines preserve their identity. Each subject has its own (set of) learning goals, and the subjects are equally important. |
| 5 | Interdisciplinary: Skills and concepts are emphasized across the subject areas rather than within the subjects. The (set of) learning goals transcend(s) the individual subjects. The learning goals are (predominantly) taken from subject curricula or schoolbooks and/or are teacher oriented. |
| 6 | Transdisciplinary: The curriculum and (set of) learning goals transcend the individual subjects, and the learning goals predominantly include solving real-world problems and/or are student oriented. |
Second, the language arts and S&T learning content that is taught to students to achieve the learning goals should be considered. Various frameworks have been developed to classify learning content in educational settings. Frameworks from a science inquiry perspective often include declarative or conceptual knowledge (knowing what) and procedures (knowing how, e.g., methods for investigating) (Furtak et al., 2012; McCormick, 1997). A third category that is often distinguished in this context relates to epistemological knowledge (knowing why, the nature of knowledge and how knowledge is created, e.g., knowing how scientists make claims) (Duschl, 2008; Furtak et al., 2012). Learning content can thus be classified as declarative knowledge, procedures, or epistemological knowledge. In many cases, educational interventions attend to more than one learning goal (e.g., knowledge and skills), and more than one type of learning content (e.g., declarative knowledge and procedures).
Third, the instructional method used to achieve the learning goals should be considered. Inquiry- or design-based learning is believed to best suit the nature of science, engineering and technology (Lewis, 2006). Based on a simplified version of the categorization of classroom inquiry developed by Banchi and Bell (2008), a distinction can be made between confirmatory inquiry or design and guided inquiry or design. In confirmatory inquiry or design, students are presented with a question/problem, and follow given procedures to confirm the answer or solution. In guided inquiry or design, students design their own procedures in a self-directed exploration, after being presented with a question or problem. In our view, inquiry- and design-based education should not be confused with unguided discovery learning without any form of direct instruction or transmission of knowledge, as direct instruction also has its place in inquiry and design education (Kirschner et al., 2006).
Linguistic activities in S&T instruction, such as argumentation exercises, place high demands on students’ cognitive, metacognitive and social abilities (Cheuk, 2016). Therefore, students need (temporary) support from the teacher to carry out tasks that are currently beyond their ability, also referred to as scaffolding (van de Pol et al., 2010). Scaffolding includes a diagnosis of students’ needs, which requires teachers to monitor students’ learning progress. Next, teachers determine the appropriate follow-up instructional support (i.e., scaffold). This support may involve teachers giving additional explanation, giving modelling examples (e.g., demonstration), or providing cognitive feedback to students. Then, a gradual shift in responsibility from teacher to student takes place as the student becomes more skilled.
Additionally, it can be expected that the duration of the intervention may influence the intended achievements. Students need sufficient instructional time to process information and develop knowledge and skills. Thus, short-term interventions may be less successful in achieving positive learning outcomes, especially when learning goals and content are complex.
Finally, it should be noted that contextual aspects of the student intervention play a role in the implementation, such as individual characteristics of the students (e.g., age, gender, prior knowledge, level of competence in S&T and language arts, see Kyriakides et al., 2018), teacher characteristics (e.g., experience with ILS&T teaching, teachers’ beliefs, attitude, and self-efficacy regarding the intervention, see Thurlings et al., 2015) and classroom characteristics (e.g., available time and resources for learning).
Teacher professional development. It is well known that teachers play a pivotal role in the success of educational reform and interventions (Dobber et al., 2017). Scholars have argued that integration of school subjects is a particularly complex undertaking that requires teachers to recognize meaningful connections between the learning processes in both subjects, and therefore requires teacher professional development (TPD; Akerson and Young, 2008; Bradbury, 2014). Literature in the field of content and language integrated learning (CLIL) has extensively explored the complexities involved in synthesizing instruction in second language learning and subject areas, including science or STEM (e.g., understanding of effective pedagogical strategies, language proficiency, collaboration; see Kim and Graham, 2022; Pérez Cañado, 2018). However, the existing literature does not yet provide sufficient insight into which aspects of TPD may have an effect on student learning outcomes in the particular context of ILS&T instruction. Below, we draw from the broader literature on effective characteristics of TPD in disciplinary contexts to identify several key features of TPD that could potentially affect student learning outcomes in the context of ILS&T instruction. Additionally, in this examination of TPD characteristics, a cognitive psychology perspective is adopted, emphasizing teachers’ cognitive processes and decision-making strategies underpinning ILS&T teaching practices.
Because of the complexity of learning to integrate language arts instruction and S&T instruction, it is expected that short-term TPD programs will not suffice. However, the required duration of TPD programs may depend on the prior knowledge and experience of the teachers. Although scholars have often called attention to the appropriate duration of TPD programs (e.g., Darling-Hammond et al., 2017; Desimone, 2002), there is no undisputed benchmark.
Much as in the discussion of the important characteristics of the instructional program for students, TPD programs require instructional alignment between the learning goals, learning content and instructional method to be effective (Cohen, 1987). Overall, TPD learning goals can be categorized as attending to the development of knowledge, skills, or attitudes. Two main types of knowledge can be distinguished: content knowledge (CK) and pedagogical content knowledge (PCK) (Fernandez, 2014; Shulman, 1986). In this context, CK refers to domain-specific knowledge of S&T and language arts, while PCK refers to the knowledge of appropriate instructional and pedagogical strategies for teaching subject-matter within the domains. Furthermore, teachers’ skills can be categorized as simpler or more complex skills; whereas complex skills require the coordination and integration of knowledge and skills (van Merriënboer and Kirschner, 2017), simple skills do not require the performer to process much information or make many decisions. For example, giving direct instruction to explain an S&T concept can be identified as a relatively simple skill, whereas scaffolding students can be categorized as a complex skill, as it requires teachers to make decisions that vary across different contexts (van Merriënboer and Kirschner, 2017). Finally, learning goals can attend to teachers’ attitude(s), for example, towards ILS&T education.
As with the instructional program for students, the learning content of TPD programs can be categorized as declarative knowledge (i.e., factual, conceptual), procedures (i.e., “know-how”, methods) or epistemological knowledge (i.e., the nature of knowledge within a domain).
Regarding the instructional method, it is important that there are clear standards for the intended competency development. Such standards can help make the desired outcome of the TPD activities more concrete and assessable, which helps determine whether TPD activities were effective in achieving their goals or not, and to plan for improvement if necessary. Through modelling of the intended instructional practices, best practices can be made explicit to teachers (Borko et al., 2010; Darling-Hammond et al., 2017). Furthermore, researchers have suggested that teachers should engage in practice-based TPD experiences that allow them to immerse themselves in their own learning in order to shift their thinking and teaching practice (Borko et al., 2010; Loucks-Horsley et al., 2009). By offering teachers opportunities for situated practice, the learning tasks are representative of the real task in daily life, which stimulates the transfer of learning (van Merriënboer et al., 2002). As in the instruction for students, teachers can be supported during the TPD activities by providing (a gradual decrease of) scaffolds, by monitoring progress and providing appropriate follow-up instructional support (Fang et al., 2008; van de Pol et al., 2010). In addition, Borko et al. (2010) argued that collaborative practice is an important component of high-quality TPD activities. When teachers can work collaboratively to enhance and reflect on their practice, as well as support and motivate each other, the TPD activities are more effective.
Finally, involving school leadership in TPD activities can enhance the successful implementation of reform efforts (Darling-Hammond et al., 2017; Desimone, 2002). School leaders may actively participate in TPD activities or promote the rationale behind the TPD actively among teachers.
Table 2. Overview of study and intervention characteristics that potentially moderate effects
| Coded study characteristics | | |
|---|---|---|
| Research design | | Cluster-randomized experiment / quasi-experiment |
| Implementation scale (number of students) | | Small (< 100) / medium (100-500) / large (> 500) |
| Control group | | Language-only / S&T-only / separate language and S&T instruction |
| Measurement method | Instruments | Independent / researcher-developed |
| | Time of post-test | Directly after intervention / retention test |
| Intervention characteristics | | |
| Instructional intervention | | |
| Learning goal | Language a | NS / language knowledge / language skills / attitude towards language / language as a means for S&T learning |
| | S&T a | NS / S&T knowledge / inquiry or design skills / attitude towards S&T |
| | Level of integration | NS / fragmented / connected / nested (in S&T or language) / multidisciplinary / interdisciplinary / transdisciplinary |
| Learning content | Language a | NS / declarative knowledge / procedures / epistemological knowledge |
| | S&T a | NS / declarative knowledge / procedures / epistemological knowledge |
| Instructional method | Learning tasks | NS / confirmatory inquiry/design / guided inquiry/design |
| | Monitoring | No / yes |
| | Follow-up instructional support a | NS / additional explanation / demonstration / cognitive feedback |
| Duration | | Number of hours / time span |
| TPD intervention | | |
| TPD | Took place in study | Yes / no |
| Duration | | Number of hours / time span |
| Learning goal a | | NS / CK (language and/or S&T) / PCK (language and/or S&T) / simple skills / complex skills / attitude |
| Learning content a | | NS / declarative knowledge / procedures / epistemological knowledge |
| Instructional method | Standards for intended teacher competencies | None / vague/not clear / clear |
| | Modelling of intended instructional practices | Yes / no |
| | Opportunities for practice a | Isolated from classroom context / situated in classroom context |
| | | Yes / no |
| | Monitoring | Yes / no |
| | Follow-up instructional support a | NS / additional explanation / demonstration / cognitive feedback |
| | Collective participation | NS / individual / team-based |
| Attention to contextual conditions | Attention to school leadership and organization (for embedding the innovation) | Yes / no |

a A combination of codes is possible.
NS = Not specified.
MATERIALS AND METHODS
Selection of Studies
A systematic search was carried out in three databases: Web of Science, Scopus, and ERIC. The search strings included search terms related to
- language arts learning (including reading, writing, oral language, and vocabulary),
- elementary school, and
- inquiry- or design-based S&T education.
Due to the complexity of the terminology, two sets of search terms were used, focusing on either inquiry or design learning. To find relevant literature on ILS&T interventions that included inquiry-based learning, the following search string was used: (“5E” OR “inquiry-based learning” OR “inquiry cycle” OR “science inquiry”) AND (“language” OR “literacy” OR “vocabulary” OR “reading” OR “writing” OR “oral language”) AND (“elementary school” OR “primary school”). The second search string addressed ILS&T interventions that included design-based learning: (“engineering design” OR “technological design”) AND (“language” OR “literacy” OR “vocabulary” OR “reading” OR “writing” OR “oral language”) AND (“elementary school” OR “primary school”). Several inclusion criteria were identified based on the PICOS framework (population, intervention, comparison, outcome, study type), as can be seen in Table 3.
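The structure of the two search strings above (OR within a group of terms, AND between groups) can be sketched as follows. The helper names `or_group` and `build_query` are hypothetical, and the sketch deliberately ignores database-specific details such as field tags or truncation symbols, which differ between Web of Science, Scopus, and ERIC.

```python
def or_group(terms):
    """Join quoted terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*groups):
    """Combine several OR-groups with AND."""
    return " AND ".join(or_group(group) for group in groups)


# Term groups copied from the search strings reported in the text.
INQUIRY = ["5E", "inquiry-based learning", "inquiry cycle", "science inquiry"]
DESIGN = ["engineering design", "technological design"]
LANGUAGE = ["language", "literacy", "vocabulary", "reading", "writing",
            "oral language"]
SCHOOL = ["elementary school", "primary school"]

inquiry_query = build_query(INQUIRY, LANGUAGE, SCHOOL)
design_query = build_query(DESIGN, LANGUAGE, SCHOOL)

print(inquiry_query)
print(design_query)
```

Keeping the language and school groups in one place means that both queries are guaranteed to share exactly the same terms for those two components.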
Table 3. PICOS inclusion criteria for study selection
| PICOS element | Inclusion criteria |
|---|---|
| Population | It involves students from kindergarten through grade 6 (ages 4-12). It is undertaken in a school setting to ensure ecological validity. It involves a general student population instead of emphasizing student populations with specific characteristics (e.g., second language learners, learning disabilities). |
| Intervention | It evaluates an intervention that engages students in learning S&T as well as language arts. S&T instruction in both conditions involves an inquiry- and/or design-based pedagogy. Educational approaches that are restricted to the transmission of declarative knowledge without paying attention to the practices and nature of S&T are beyond our scope. |
| Comparison | It examines the effect of an ILS&T intervention compared to a non-integrated (language and/or S&T) curriculum. |
| Outcome(s) | It reports quantitative measurements of the effects of the intervention on student learning outcomes for one or more of (a) knowledge, (b) skills, or (c) attitudes in relation to language and/or S&T. |
| Study type | It describes a two-group pre-test post-test design. It was peer-reviewed and published in the last 20 years (2000-2022). |
Additional relevant literature was found through the “snowball method”, that is, by analysing the reference lists of relevant articles to yield further results. Figure 1 shows the PRISMA flow chart of the selection process for studies included in this review. A total of 19 studies were included.
Analysis
To answer the first research question (What features characterize studies of and interventions for ILS&T?), all intervention characteristics were coded based on coding rules that were developed by the research team. The coding rules covered all variables that emerged from the theoretical framework listed in Table 2. Each code included a description and an example. It should be noted that, although the contextual aspects of the student intervention emerged from the theoretical framework as an intervention characteristic, they were not included as a variable in our study because the information in the articles was too limited. All studies were coded separately by the first author based on the coding rules. When studies provided too little information about certain characteristics, these variables were coded as not specified (NS). To determine interrater agreement, a random sample of 20% of the studies was coded by a second independent researcher, who was first trained in the use of the coding rules. Each coder was given a copy of the full articles, coding sheet, and coding instructions. The total percentage of agreement was 88.6%. Because of the large number of categories that were coded, and because the categories consisted of varying numbers of coding options, it was not possible to calculate an overall Cohen’s kappa.
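The percentage of agreement reported above can be illustrated with a minimal sketch. It assumes simple item-by-item agreement between two coders; the function name and the example codes are hypothetical and are not taken from the coding sheets used in this review.

```python
from typing import Sequence


def percent_agreement(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Percentage of coding decisions on which two coders assigned the same code."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("Both coders must code the same, non-empty set of items")
    # Count the items for which both coders chose the identical code.
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100 * matches / len(coder_a)


# Hypothetical codes for five variables of one study (illustration only).
first_coder = ["NS", "GI", "CI", "GI", "M"]
second_coder = ["NS", "GI", "GI", "GI", "M"]
print(percent_agreement(first_coder, second_coder))  # 4 of 5 codes match
```

Unlike Cohen’s kappa, this measure does not correct for chance agreement, which is why it can be pooled across categories with different numbers of coding options.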
To answer the second research question (Does ILS&T instruction enhance language and S&T learning compared to language and S&T instruction that is not integrated?), the learning outcomes of the interventions were compared using Cohen’s d as the common measure for effect sizes (ESs). Cohen’s d expresses the standardized mean difference between experimental and control groups. When a study did not report ESs, we primarily used the means and standard deviations of the experimental and control groups to calculate the difference in post-test means divided by the pooled standard deviation, per the following formula (Cohen, 1998):
\[\text{Cohen's } d = \frac{M_{2} - M_{1}}{{SD}_{pooled}}\]
where
\[{SD}_{pooled} = \sqrt{\frac{{SD}_{1}^{2} + {SD}_{2}^{2}}{2}}\ .\]
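As a minimal sketch, the two formulas above can be computed as follows. The function names are illustrative, group 2 is assumed to be the experimental group, and the example values are hypothetical rather than drawn from any reviewed study.

```python
import math


def pooled_sd(sd1: float, sd2: float) -> float:
    """Pooled SD as the root mean square of the two group SDs (formula above)."""
    return math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)


def cohens_d(m1: float, m2: float, sd1: float, sd2: float) -> float:
    """Standardized difference in post-test means, (M2 - M1) / SD_pooled.

    Assumption: group 1 is the control group and group 2 the experimental
    group, so a positive d favours the intervention.
    """
    return (m2 - m1) / pooled_sd(sd1, sd2)


# Hypothetical post-test results (illustration only).
d = cohens_d(m1=70.0, m2=75.0, sd1=10.0, sd2=10.0)
print(round(d, 2))  # 0.5
```

Note that this pooled standard deviation gives both groups equal weight, which matches the formula as stated and is most appropriate when group sizes are similar.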
When the means and standard deviations were not reported, ESs were calculated based on F-tests, t-tests (Borenstein et al., 2019), or given values of Pearson’s r (Ruscio, 2008) or Z-score (Rosenthal and DiMatteo, 2001). In the case of missing data, authors were contacted to attempt to retrieve the data. Several studies reported ESs as partial eta squared (ηp²), which can be defined as the ratio of the variance accounted for by an effect to that variance plus its associated error variance. For comparability, the partial eta squared values were converted to Cohen’s d values (Cohen, 1998); the partial eta squared results are also reported, as they refer to a different type of effect. Some studies reported more than one outcome measure for the same outcome variable. For these studies, the mean ES was calculated for the outcome variable. If the sample sizes were roughly equal, the unweighted mean was computed. If the sample sizes were different, the mean ESs were weighted by the sample sizes. Likewise, multiple studies reported mean scores for different groups (e.g., for grade 3 and for grade 5). These ESs were combined in the same manner, if the groups received the same intervention under the same conditions.
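The conversions mentioned above can be sketched with commonly used textbook formulas. These are assumptions about the exact variants applied, since the review cites its sources without printing the formulas; the function names are illustrative. The partial eta squared conversion (via Cohen’s f for a two-group comparison) does reproduce the paired values reported for Chen et al. (2016) in Table 4.

```python
import math


def d_from_partial_eta_squared(eta_sq_p: float) -> float:
    """Convert partial eta squared to d via Cohen's f (two-group case).

    f = sqrt(eta^2 / (1 - eta^2)) and, for two groups, d = 2 * f.
    """
    return 2 * math.sqrt(eta_sq_p / (1 - eta_sq_p))


def d_from_r(r: float) -> float:
    """Convert a correlation r to d, assuming roughly equal group sizes."""
    return 2 * r / math.sqrt(1 - r ** 2)


def d_from_t(t: float, n1: int, n2: int) -> float:
    """Convert an independent-samples t statistic to d."""
    return t * math.sqrt(1 / n1 + 1 / n2)


# Partial eta squared values reported for Chen et al. (2016) in Table 4.
print(round(d_from_partial_eta_squared(0.13), 2))  # 0.77 (S&T knowledge)
print(round(d_from_partial_eta_squared(0.06), 2))  # 0.51 (attitude)
```

For a two-group comparison, F equals t squared, so an F-test result can be converted by passing its square root to `d_from_t`, with the sign taken from the direction of the mean difference.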
In terms of homogeneity, we assumed high variability of effects across studies. As can be seen in Table 4, the studies varied in terms of study design, scale, and instrumentation.
Moreover, studies used different types of control groups, comparing the ILS&T intervention to S&T-only, language-only, or separate S&T and language instruction. In the context of this review, we considered all three control groups to be a type of “business-as-usual” approach. However, they offer a different type of evidence. Therefore, the computation of an overall ES would not be representative and would not allow for the careful consideration of the conditions under which the studies were carried out. To report combined effects of the studies on students’ learning achievements, the mean weighted ESs for all outcome variables were computed for studies with a similar study design (i.e., cluster-randomized, quasi-experimental), by weighting the ESs by their inverse variance (Borenstein et al., 2021; Wilson and Lipsey, 2001). The standard error of estimate (SE) was calculated for each ES, using the formula below.
\[SE = \sqrt{\frac{n_{1} + n_{2}}{n_{1}n_{2}} + \frac{{ES}_{sm}^{2}}{2(n_{1} + n_{2})}}\ .\]
Then, the SE was transformed into a weight, using the formula w = 1/SE². Finally, the weighted mean ES was calculated by dividing the sum of the weighted ESs by the sum of the weights:
\[\overline{ES} = \frac{\sum_{}^{}{(w \times ES)}}{\sum_{}^{}w}\ .\]
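The weighting procedure can be sketched as follows. This is an illustration under stated assumptions: it uses the common large-sample form of the standard error of a standardized mean difference, in which the effect size term is squared, and the function names and example group sizes are hypothetical.

```python
import math
from typing import Sequence, Tuple

Study = Tuple[float, int, int]  # (effect size, n of group 1, n of group 2)


def standard_error(es: float, n1: int, n2: int) -> float:
    """Standard error of a standardized mean difference (large-sample form)."""
    return math.sqrt((n1 + n2) / (n1 * n2) + es ** 2 / (2 * (n1 + n2)))


def inverse_variance_weight(es: float, n1: int, n2: int) -> float:
    """Weight w = 1 / SE^2, so more precise estimates count more."""
    return 1 / standard_error(es, n1, n2) ** 2


def weighted_mean_es(studies: Sequence[Study]) -> float:
    """Sum of weighted ESs divided by the sum of the weights."""
    weights = [inverse_variance_weight(es, n1, n2) for es, n1, n2 in studies]
    weighted_sum = sum(w * es for w, (es, _, _) in zip(weights, studies))
    return weighted_sum / sum(weights)


# Hypothetical studies: one large with a small ES, one small with a large ES.
example = [(0.25, 420, 418), (0.77, 36, 36)]
print(round(weighted_mean_es(example), 2))
```

Because weights grow with sample size, the combined estimate is pulled towards the larger study, which is one reason the weighted means reported per study design can differ from simple averages of the ESs in Table 4.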
To answer the third research question (Which characteristics of ILS&T interventions are associated with the effects on language and S&T learning?), the intervention characteristics were analysed to identify recurring patterns in the relation between those characteristics and the effects on student achievement.
Whether the interventions were characterized by an appropriate match between the learning goals, learning content and instructional method was investigated first. To determine this, four relations were examined, namely,
- S&T learning goal and learning content,
- language learning goal and learning content,
- S&T learning goal and instructional method (i.e., type of learning task), and
- S&T learning content and instructional method (i.e., type of learning content).
The alignment of the language learning goal and content with the instructional method was not assessed, because here, the instructional method referred to the type of learning task, which was S&T-related (i.e., confirmatory or guided inquiry/design tasks). For each of these relations, a study could receive 1 point, adding up to a maximum total score of 4. Relations that could not be scored due to missing information, for example, when a study did not specify the learning goal, were coded as not specified (NS). Such a study would also receive a “missing” total score, because otherwise the total score would suggest that one or more of the relations was deemed not appropriate. The first and second relations addressed the match between the learning goal and learning content for S&T and language; they involved a similar set of scoring rules but were scored separately. Learning goals that involved knowledge were deemed appropriately matched when the learning content included declarative or epistemological knowledge. When learning goals involved skills, the learning content had to address procedures to be an appropriate match. All types of learning content were considered appropriate for learning goals related to attitude. For the third relation, it was examined whether the S&T learning goals were appropriately aligned with the type of learning task (i.e., confirmatory or guided inquiry/design tasks). Confirmatory inquiry was not considered appropriate for learning goals that included skills in inquiry/design, because students follow given procedures to confirm an answer or solution. Therefore, students likely do not develop inquiry or design skills in such a setting.
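The total-score rule described above can be sketched as follows: each of the four relations is scored 1 (appropriate), 0 (not appropriate), or left unscored, and any unscored relation makes the total missing rather than lowering it. The function name and the use of `None` for a missing score are illustrative choices; the example scores are those listed for two studies in Table 9.

```python
from typing import Optional, Sequence


def alignment_total(scores: Sequence[Optional[int]]) -> Optional[int]:
    """Total alignment score (0-4), or None when any relation is missing."""
    if len(scores) != 4:
        raise ValueError("Expected scores for exactly four relations")
    if any(score is None for score in scores):
        # A partial sum would wrongly imply that a relation was misaligned.
        return None
    if any(score not in (0, 1) for score in scores):
        raise ValueError("Each relation is scored 0 or 1")
    return sum(scores)


# Scores as listed in Table 9.
print(alignment_total([1, 1, 0, 0]))     # Wright and Gotwals (2017): 2
print(alignment_total([1, None, 1, 1]))  # Hong et al. (2013): None (missing)
```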
Next, we examined whether the other intervention characteristics distinguished in Table 2 were associated with the intervention effects. As in the analysis for the second research question, mean weighted ESs were calculated for studies grouped by similar intervention characteristics, with the ESs weighted by their inverse variance (Borenstein et al., 2021; Wilson and Lipsey, 2001).
RESULTS
Research Question 1: Characteristics of the Studies and Interventions
Study characteristics
Table 4 shows an overview of the characteristics of all 19 studies included in this review. The studies were mostly conducted in the USA (n = 11). The other studies were conducted in Taiwan (n = 5), the United Kingdom (n = 1), Turkey (n = 1) and the Netherlands (n = 1). They were evenly distributed with respect to their publication dates, with 6 studies published between 2000-2006, 5 studies between 2007-2013, and 8 studies between 2014-2021. Most studies involved students from the middle years of elementary school (Grades 3, 4 and 5) and adopted a quasi-experimental design (n = 16), out of which 7 studies had composed the experimental and control groups based on matching procedures at the start of the study. In the other quasi-experimental studies (n = 9), non-random assignment procedures were used. Few studies adopted the strongest research design (i.e., a cluster-randomized controlled trial; n = 3) and only one study included a retention test. Ten studies included a medium student sample size (100-500). The other studies adopted either small (<100) or large (>500) student sample sizes (n = 5 and n = 4, respectively). The experimental conditions were compared to a control condition where students received either only S&T instruction (n = 10), separate language and S&T instruction (n = 4), or only language instruction (n = 4). For one study, the type of instruction in the control group was not specified. Most control groups received instruction according to the regular, district-adopted language arts and/or science curriculum (n = 16), with a few exceptions where a different intervention was implemented in the control group (e.g., strategy instruction for reading comprehension without S&T integration, Guthrie et al., 2004). The use of independent or researcher-developed instruments to measure student achievement was evenly distributed in the sample.
Table 4. Overview of study characteristics and effects on student learning outcomes
| Reference | DE | SSS | G | Instrument | TT | A | Control group | Effects: S&T | Effects: Language | Country |
|---|---|---|---|---|---|---|---|---|---|---|
| Biyik and Senel (2019) | QE | 38 | 4 | RD (S&T knowledge); I (inquiry or design skills) | A, R | Y | S&T only | Knowledge: d=0.66a, d=0.47a (retention test); Inquiry or design skills: d=0.57a | | Turkey |
| Cervetti et al. (2012) | CR | 467 to 1,027 | 4 | RD | A | Y | S&T only | Knowledge: d=0.54a | Reading: d=0.09a; Writing: d=0.40; Vocabulary: d=0.13a | USA |
| Chen et al. (2013) | CR | 838 | 4 | RD | A | Y (S&T knowledge), N (writing) | S&T only | Knowledge: d=0.25 | | USA |
| Chen et al. (2016) | QE | 72 | 4 | RD | A | Y | S&T only | Knowledge: d=0.77a (ηp²=0.13); Attitude: d=0.51a (ηp²=0.06) | | Taiwan |
| Girod and Twyman (2009) | QE | 53 | 2 | I | A | Y | S&T only | Knowledge: d=1.35a; Attitude: d=0.38a | | USA |
| Guthrie et al. (2000) | CR | 162 | 3, 5 | I | A | Y | Separate instruction both | - | Reading motivation: d=0.44a | USA |
| Guthrie et al. (2004) | QE (matched) | 243 | 3, 5 | I c | A | N | Separate instruction both | - | Reading: d=1.11; Reading motivation: d=1.2a | USA |
| Hong et al. (2013) | QE | 218 | 5 | RD (S&T knowledge), I (S&T attitude) | A | Y | NS | Knowledge: d=0.91a (ηp²=0.17); Attitude: d=0.29a (ηp²=0.29) | | Taiwan |
| Kara and Kingir (2021) | QE | 107 | 4 | RD | D, A | Y | S&T only | Knowledge: d=1.12a | | Taiwan |
| Lai and Chan (2020) | QE | 118 | 5 | RD | A | Y | S&T only | Knowledge: d=0.32a; Attitude: d=0.19a | | Taiwan |
| Lutz et al. (2006) | QE | 80 | 4 | RD | A | Y | Language only | | Reading: d=0.87a | USA |
| Mercer et al. (2004) | QE (matched) | 230 | 4 | I | NS | Y | Separate instruction both | Knowledge: d=0.39a | | UK |
| Romance and Vitale (2001) b | QE (matched) | 540 | 4, 5 | I | NS | Y | Separate instruction both | Knowledge: d=0.68a | Reading: d=0.22a | USA |
| van Keulen and Boendermaker (2020) | QE (matched) | 141 | 3-6 | I | A | Y | Language only | Attitude: d=0.22a | Reading: d=0.05a | Netherlands |
| Vitale and Romance (2011) | QE (matched) | 513 | 1, 2 | I | NS | Y | Separate instruction both | Knowledge: d=0.16a | Reading: d=0.52a | USA |
| Vitale and Romance (2012) | QE (matched) | 363 | 1, 2 | I | A | N | Separate instruction both | Knowledge: d=0.94a | Reading: d=0.72a | USA |
| Wigfield and Guthrie (2004) | QE (matched) | 350 | 3 | I | A | Y | Language only | | Attitude (reading motivation): d=0.18a | USA |
| Wright and Gotwals (2017) | QE | 147 | K | RD | A | Y | Separate instruction both | Knowledge: d=1.17a | Vocabulary: d=1.42a | USA |
| Yang and Wang (2014) | QE | 49 | 4 | RD | D, A | Y | S&T only | Knowledge: d=0.63a | | Taiwan |

Note: Study characteristics that were not specified in studies are indicated by NS. For study type, CR: Cluster-randomized; QE: Quasi-experimental. For instrument, RD: Researcher developed; I: Independent. Grade levels are equivalents in the US educational system. Time of testing is coded as directly after (A), retention (R) or during (D) the intervention. Bolded effect sizes are statistically significant.
DE = Design; SSS = Student sample size; G = Grade; TT = Time of testing; A = Adjusted for pre-test differences.
a Cohen’s d was calculated by the researcher based on the available data about the study.
b Only year 4 findings were included from Romance and Vitale (2001), as the other data did not comply with our inclusion criteria.
c Only ESs measured by independent measures were included, as both types of instruments were used in this study.
Intervention characteristics
The interventions could include both an instructional intervention (for students) and TPD (for teachers). Table 5 shows an overview of the characteristics of the instructional intervention in the 19 studies. The ILS&T instructional interventions most often incorporated multiple learning goals that attended to (a combination of) language skills, specifically writing (n = 15) and reading (n = 12), and S&T knowledge (n = 16) and inquiry or design skills (n = 10). Regarding the integration level, most studies adopted a multidisciplinary approach (n = 10); none of the studies adopted the lowest or highest integration levels (i.e., connected, transdisciplinary). The language-related learning content largely emphasized procedures (e.g., how to write a report; n = 14). Writing activities took many forms, for example journal writing, summarizing, and writing evidence-based explanations. The reading activities addressed second-hand investigations through texts, developing content-area reading strategies, and the understanding of text features and structures, among other things. Fewer studies included learning content that addressed oral language (n = 6) and vocabulary (n = 3). Examples of oral language activities are constructing scientific arguments through evidence-based reasoning and developing discussion skills. The vocabulary activities attended to, for example, academic and scientific vocabulary. The S&T learning content predominantly related to declarative knowledge (e.g., definition of force; n = 14). Unfortunately, in many studies the instructional methods used in the intervention were not specified. Notably, most interventions engaged students in guided inquiry or design activities (n = 12) but did not specify that the monitoring of students’ learning progress was part of instruction, or that a type of follow-up instructional support (e.g., additional explanation, demonstration) was offered to students.
The duration of the interventions ranged widely, from 12 to 135 hours, with most interventions lasting between 20 and 50 hours (n = 8). The time span over which the intervention took place ranged from 4 to 40 weeks, with most interventions covering relatively short time spans (12 weeks or less; n = 12). Four studies did not indicate the number of hours spent on the ILS&T intervention.
Table 5. Overview of intervention characteristics
| Reference | Learning goal: Language | Learning goal: S&T | Integration level | Learning content: Language | Learning content: S&T | Method: Learning task | Intervention duration |
|---|---|---|---|---|---|---|---|
| Biyik and Senel (2019) | Language as S&T learning | Knowledge | N (in S&T) | NS | NS | NS | 24 hours/8 weeks |
| Cervetti et al. (2012) | Skills | Knowledge | M | DK, P | DK, P | GI | 30-40 hours/40 weeks |
| Chen et al. (2013) | Skills | Knowledge | N (in S&T) | P | DK | NS | NS/8 weeks |
| Chen et al. (2016) | NS | NS | N (in S&T) | NS | DK | CI | 24 hours/12 weeks |
| Girod and Twyman (2009) | Knowledge, skills | Knowledge, inquiry or design skills | I | NS | NS | NS | NS/10 weeks |
| Guthrie et al. (2000) | Skills | Knowledge, inquiry or design skills | M | P | DK, P | GI | NS/36 weeks |
| Guthrie et al. (2004) | Skills | Knowledge, inquiry or design skills | M | P | DK, P | GI | 90 hours/12 weeks |
| Hong et al. (2013) | Skills | Knowledge, inquiry or design skills | I | P | NS | GI | 18 hours/12 weeks |
| Kara and Kingir (2021) | Skills | Knowledge, inquiry or design skills | M | P | P | GI | 34 hours/17 weeks |
| Lai and Chan (2020) | Skills | Knowledge | N (in S&T) | P | DK, P | NS | 27 hours/9 weeks |
| Lutz et al. (2006) | Skills | Knowledge, inquiry or design skills | M | P | DK, P | GI | 90-120 hours/12 weeks |
| Mercer et al. (2004) | Knowledge, skills | NS | N (in language) | DK, EK | DK | GI | 12 hours/23 weeks |
| Romance and Vitale (2001) | Skills | Knowledge | M | P | DK, P | GI | 80 hours/40 weeks |
| van Keulen and Boendermaker (2020) | Skills | Inquiry or design skills | M | P | P | GI | 52 hours/26 weeks |
| Vitale and Romance (2011) | Skills | Knowledge | M | P | DK, P | GI | 30 hours/8 weeks |
| Vitale and Romance (2012) | Skills | Knowledge, inquiry or design skills | M | NS | DK, P | GI | 135 hours/36 weeks |
| Wigfield and Guthrie (2004) | Skills | Knowledge, inquiry or design skills | M | P | DK, P | GI | NS/12 weeks |
| Wright and Gotwals (2017) a | Skills | Knowledge, inquiry or design skills | I | DK, P | DK, P | CI | 30 hours/8 weeks |
| Yang and Wang (2014) | Skills | Knowledge | N (in S&T) | P | DK | CI | 12 hours/4 weeks |

Note: NS = Not specified. For integration level, N = nested, M = multidisciplinary, I = interdisciplinary. For learning content, DK = declarative knowledge; P = procedures; EK = epistemological knowledge. For learning task, GI = guided inquiry/design, CI = confirmatory inquiry/design.
a Monitoring was specified to occur only in Wright and Gotwals (2017). Whether follow-up instructional support was provided was not specified in any of the included studies.
Out of the 19 studies, 10 studies incorporated TPD activities to equip teachers for the ILS&T instructional intervention that was implemented in the study. Table 6 shows an overview of the characteristics of the TPD activities that were described in these 10 studies. The mean duration of the TPD activities was 48 hours (SD = 23.8) over a mean time span of 8.2 weeks (SD = 10.2). In five studies, the learning goals were not specified. In the other studies, the TPD learning goals were related to pedagogical content knowledge (n = 4) for S&T instruction, content knowledge (n = 3), and complex teacher skills (developing ILS&T lessons). The learning content as operationalized in the TPD activities was not specified in most studies (n = 8). The standards for the intended teacher competencies were not specified in any of these studies. Only one study included opportunities for teachers to practice in their own classrooms (situated practice). In one study, specific mention was made that teachers’ progress was continually monitored throughout the duration of the TPD activities. In four studies, teachers collectively participated in the TPD activities as a school team. None of the studies reported the involvement of school leaders in the TPD activities.
Table 6. Overview of TPD characteristics
| Reference | Learning goal(s) | Learning content | Opportunities for practice | Modelling of intended instructional practices | M | Duration | ICP |
|---|---|---|---|---|---|---|---|
| Guthrie et al. (2000) | PCK | NS | NS | NS | NS | 40 hours / missing | Individual |
| Guthrie et al. (2004) | NS | NS | NS | NS | NS | 80 hours / missing | Individual |
| Lutz et al. (2006) | NS | NS | NS | NS | NS | 80 hours / 2 weeks | NS |
| Mercer et al. (2004) | NS | NS | NS | Yes | NS | 8 hours / missing | Individual |
| Romance and Vitale (2001) | CK, PCK (S&T) | Declarative knowledge, procedures | Yes, situated | NS | NS | 60 hours / 13 weeks | Team-based |
| van Keulen and Boendermaker (2020) | PCK (S&T), complex skills | NS | NS | NS | NS | 60 hours / 24 weeks | Team-based |
| Vitale and Romance (2011) | CK, PCK (S&T) | Declarative knowledge, procedures | NS | NS | Yes | 32 hours / missing | Individual |
| Vitale and Romance (2012) | NS | NS | NS | NS | NS | 36 hours / 1 week | NS |
| Wigfield and Guthrie (2004) | NS | NS | NS | NS | NS | Missing / 2 weeks | Individual |
| Wright and Gotwals (2017) | CK | NS | NS | NS | NS | 36 hours / 1 week | Team-based |

Note: NS = Not specified. Standards for intended competencies, follow-up instructional support, and attention to school leadership were not specified to occur in any of the included studies. M = Monitoring; ICP = Individual or collective participation.
Research Question 2: Effects of ILS&T Interventions on Students’ Learning Achievement
All ESs that were obtained from the 19 included studies are given in Table 4. Out of all obtained ESs, 16 were statistically significant, all of which were in favour of the treatment group. On average, students who received ILS&T instruction demonstrated higher levels of learning achievement than their peers in the control group, with a mean ES of d = 0.43. However, it is important to acknowledge the high variability in the ESs, which ranged from d = 0.05 to d = 1.71. Moreover, the ESs reflected several different outcome variables (e.g., writing, S&T knowledge, reading motivation). It was therefore worthwhile to examine the ESs more closely per outcome variable. To this end, Table 7 shows that when ESs are grouped by outcome variable, all mean ESs have a positive direction, meaning that the treatment group outperformed the control group. In almost all studies, students receiving the ILS&T intervention outperformed their peers in the control group for learning achievement in language and S&T. Another distinct finding from Table 7 is the relatively high number of studies that measured an intervention effect on reading, S&T knowledge, and attitude towards S&T learning, compared to the other outcome variables. Moreover, none of the studies investigated the effects on students’ oral language achievement. Although some interventions did include oral language activities, the effect on students’ corresponding language achievement was not measured. In other words, evidence on the impact of such interventions on oral language achievement is lacking.
Table 7. Mean effect sizes of ILS&T interventions per outcome measure
| Outcome variable | Mean ES (Cohen’s d) | Number of ESs |
|---|---|---|
| Vocabulary | 0.20 | 2 |
| Writing | 0.40 | 1 |
| Reading | 0.33 | 7 |
| Reading motivation | 0.62 | 3 |
| S&T knowledge | 0.56 | 14 |
| Inquiry or design skills | 0.57 | 1 |
| Attitude towards S&T learning | 0.31 | 5 |
Although the findings above provide evidence for the effects of ILS&T instruction on student achievement, we also considered the wide variation in the characteristics of the studies (e.g., study design) and the implications of this for the strength of the evidence provided by the studies. Hence, Table 8 shows the results of categorizing the studies based on their characteristics and determining the mean ESs for the grouped studies. The mean ES for the quasi-experimental studies was more than twice as large as the mean ES for the cluster-randomized studies, with the largest mean ES found for studies that used matching procedures for composing the experimental and control groups. The studies that included a large sample (N > 500) yielded a smaller mean ES than studies with small or medium samples. Table 8 shows that studies that compared the ILS&T intervention to separate language and S&T instruction demonstrated the highest mean ES. All studies with a control group receiving separate language and S&T instruction reported statistically significant ESs in favour of the treatment group. These ESs concerned measures of S&T knowledge (5 ESs), reading (4 ESs), reading motivation (2 ESs) and vocabulary (1 ES). Out of the 10 studies that compared ILS&T interventions to S&T-only instruction, 8 ESs were statistically significant, measuring effects related to S&T knowledge (8 ESs), inquiry or design skills (1 ES), attitude towards S&T learning (1 ES), writing (1 ES) and vocabulary (1 ES). The mean ES for the studies that compared the ILS&T intervention to language-only instruction was considerably lower than the ES for studies with other types of control groups. Among the studies that included a control group receiving language-only instruction, only half of the ESs were statistically significant: one for reading comprehension, and one for reading motivation.
Based on these studies, it is difficult to determine whether the integration of S&T with language arts instruction enhances students’ learning achievement in language compared to when they are offered language instruction only. Finally, Table 8 indicates that ESs obtained with independent instruments were almost twice as high as ESs obtained with researcher-developed instruments (0.57 versus 0.32).
Table 8. Comparison of mean effect sizes of ILS&T interventions for study characteristics
| Study characteristic | Mean ES (Cohen’s d) | Number of ESs | Number of studies |
|---|---|---|---|
| Study design | | | |
| Cluster-randomized | 0.25 | 6 | 3 |
| Quasi-experimental | 0.61 | 27 | 16 |
| Quasi-experimental: matched groups | 0.78 | 15 | 7 |
| Quasi-experimental: non-equivalent groups | 0.56 | 12 | 9 |
| Sample size | | | |
| Small (< 100) | 0.71 | 8 | 5 |
| Medium (100-500) | 0.66 | 19 | 10 |
| Large (> 500) | 0.26 | 6 | 4 |
| Control group type | | | |
| Separate language and S&T instruction | 0.66 | 12 | 4 |
| S&T-only instruction | 0.29 | 15 | 10 |
| Language-only instruction | 0.21 | 4 | 4 |
| Not clear | 0.59 | 2 | 1 |
| Type of instrument | | | |
| Independent | 0.57 | 17 | 11 a |
| Researcher-developed | 0.32 | 16 | 10 a |

a Some studies used both independent and researcher-developed instruments to measure different outcome variables, which is why the sum of the number of studies adds up to more than 19.
Research Question 3: Relation Between Intervention Characteristics and Intervention Effects
The characteristics of the ILS&T interventions as described in the 19 studies are given in Table 5. Overall, it stands out that few studies provide detailed descriptions of the ILS&T intervention, making analysis of the relation between these intervention characteristics and the intervention effects challenging. We hypothesized that a good alignment between the learning goal, learning content and instructional method of the ILS&T interventions would enhance learning and would therefore lead to better results (higher ESs). Table 9 shows how well the interventions were aligned with respect to their goals, content, and method. Out of the 19 studies, 11 studies provided sufficient information to determine whether there was an appropriate match between the learning goals, content, and instructional method of the intervention. Out of these 11 studies, only 1 study demonstrated a misalignment between these factors (Wright and Gotwals, 2017).
In that study, the learning goals and content included, among others, inquiry or design skills and (S&T) procedures, while instruction followed a confirmatory inquiry approach (i.e., students followed given procedures), which is not the most appropriate method for the development of such skills. Nevertheless, the study reported very high ESs for student achievement, although this may be explained by the fact that only student achievement in S&T knowledge and vocabulary was measured.
Table 9. Alignment between learning goals, content, and instructional method in ILS&T interventions
| Reference | Learning goal – Learning content: Language | Learning goal – Learning content: S&T | Learning goal – Instructional method | Learning content – Instructional method | Total score |
|---|---|---|---|---|---|
| Biyik and Senel (2019) | Missing | Missing | Missing | Missing | Missing |
| Cervetti et al. (2012) | 1 | 1 | 1 | 1 | 4 |
| Chen et al. (2013) | 1 | 1 | Missing | Missing | Missing |
| Chen et al. (2016) | Missing | Missing | Missing | Missing | Missing |
| Girod and Twyman (2009) | Missing | Missing | Missing | Missing | Missing |
| Guthrie et al. (2000) | 1 | 1 | 1 | 1 | 4 |
| Guthrie et al. (2004) | 1 | 1 | 1 | 1 | 4 |
| Hong et al. (2013) | 1 | Missing | 1 | 1 | Missing |
| Kara and Kingir (2021) | 1 | 1 | 1 | 1 | 4 |
| Lai and Chan (2020) | 1 | 1 | Missing | Missing | Missing |
| Lutz et al. (2006) | 1 | 1 | 1 | 1 | 4 |
| Mercer et al. (2004) | 0 | Missing | 1 | 1 | Missing |
| Romance and Vitale (2001) | 1 | 1 | 1 | 1 | 4 |
| van Keulen and Boendermaker (2020) | 1 | 1 | 1 | 1 | 4 |
| Vitale and Romance (2011) | 1 | 1 | 1 | 1 | 4 |
| Vitale and Romance (2012) | Missing | 0 | 1 | 1 | Missing |
| Wigfield and Guthrie (2004) | 1 | 1 | 1 | 1 | 4 |
| Wright and Gotwals (2017) | 1 | 1 | 0 | 0 | 2 |
| Yang and Wang (2014) | 1 | 1 | 1 | 1 | 4 |
Table 10 shows the mean ESs for interventions that adopted a similar level of integration. It can be observed that the mean ES increases with higher levels of integration, indicating that when the two subjects are more intertwined, this is accompanied by higher learning gains compared to non-integrated instruction. It was expected that studies with a nested approach, where learning goals for one subject are dominant over those for the other, would yield higher ESs for the dominant subject. However, because all studies with a nested approach (in both S&T and language) only measured S&T learning outcomes, this hypothesis could not be tested. Furthermore, it should be noted that some mean ESs in Table 10 are only based on one ES and therefore do not provide very strong evidence of a stable pattern.
Table 10. Mean effect sizes for four levels of integration
| Level of integration | S&T: Mean ES (Cohen’s d) | S&T: Number of ESs | S&T: Number of studies | Language: Mean ES (Cohen’s d) | Language: Number of ESs | Language: Number of studies |
|---|---|---|---|---|---|---|
| Nested in S&T | .36 | 8 | 5 | - | - | - |
| Nested in language | .39 | 1 | 1 | - | - | - |
| Multidisciplinary | .56 | 6 | 6 | .31 | 12 | 9 |
| Interdisciplinary | .73 | 5 | 3 | 1.42 | 1 | 1 |

Note: The number of studies includes all outcome variables related to the subjects (e.g., for language: reading, writing, vocabulary), which is why the sum of the number of studies adds up to more than 19.
Regarding the duration of the intervention, Table 11 shows that the mean ES obtained from studies with interventions covering a short time span (12 weeks or less) was higher than the mean ES from studies with interventions covering a longer time span (13 weeks or more). Regarding the other intervention characteristics that were examined, only one study specified that during the intervention, the teachers monitored students’ learning progress towards the learning goals. None of the studies specified that the teachers implemented follow-up instructional support during the intervention. Therefore, it is difficult to analyse whether this had any impact on the ESs of these interventions.
Table 11. Mean effect sizes for studies with short or long duration of the intervention, and with and without TPD
Studies | Mean ES (Cohen’s d) | Number of ESs | Number of studies
Studies with short interventions (12 weeks or less) | .59 | 20 | 12
Studies with long interventions (13 weeks or more) | .34 | 13 | 7
Studies with TPD | .55 | 14 | 9
Studies without TPD | .36 | 19 | 10
Finally, the last intervention characteristic concerned the TPD activities that were used to prepare teachers for implementation of the instructional intervention. Table 11 shows that studies that included TPD activities yielded higher ESs on average than studies with no TPD activities. The studies provided too little information about the learning goals, learning content and instructional method of the TPD activities to perform any meaningful analysis of the association of these characteristics with the study ES (see Table 6).
DISCUSSION
This review provided an overview of the effects of ILS&T instruction on student learning achievement in language arts and S&T. Unlike previous reviews that evaluated the effects of language arts and science integration, the current review only included studies that focused on S&T instruction with an inquiry- or design-based pedagogy. Furthermore, this review addressed all aspects of language arts learning, including reading, writing, oral language, and vocabulary. Finally, this review described a comprehensive analysis of the features of studies and interventions, and their association with the reported study effects. An important finding is that many studies lacked a detailed description of the study and intervention characteristics, complicating our analysis. This stresses the need for scholars to provide detailed reports of the design and implementation of intervention studies. In the section below, the main findings of the review are discussed along with the limitations and areas for future research.
Characteristics of the Studies and Interventions
The first research question was: What are the characteristics of the studies and interventions in the experimental literature on ILS&T? The analysis resulted in an overview of the distribution of the corpus of studies with respect to their study and intervention characteristics, which showed wide variation. Moreover, many studies lacked specification of the characteristics of the instructional intervention as well as the TPD activities, particularly regarding the instructional methods. Thus, it is unknown whether these interventions did not include certain characteristics, such as the monitoring of students’ learning process or opportunities for teachers to practice, or whether the publications merely did not mention them explicitly.
Effects of ILS&T Interventions on Students’ Learning Achievement
The second research question addressed the effects of ILS&T interventions on student learning. Similar to other review studies in which the impact of integrated instruction in the context of science and language arts learning was investigated (Bradbury, 2014; Graham et al., 2020; Guo et al., 2016; Hwang et al., 2022), we found evidence for the effectiveness of ILS&T instruction. The analysis showed that on average, students who received ILS&T instruction demonstrated higher levels of learning achievement for all reported outcome variables for language and S&T than their peers in a control group. Even though no statistically significant differences between the treatment and control group were found in a few studies, none reported statistically significant effects in favour of the control group. Although these findings are preliminary due to the limited scope of the available research, they indicate that ILS&T instruction can improve learning achievement in language arts and S&T and does not harm learning in either subject.
A comparison of the weighted mean ESs of studies with similar study characteristics revealed that ESs tended to be higher for studies that involved a quasi-experimental design and a control group that received separate language and S&T instruction. As expected, studies with a small sample size (< 100) demonstrated higher ESs, in line with the argument that studies with smaller samples tend to overestimate effects (Cheung and Slavin, 2016). The mean ES obtained from independent instruments was higher than the weighted mean ES obtained from researcher-developed instruments. This finding is surprising, as the reverse was expected due to the closer alignment of researcher-developed instruments with the intervention (Cheung and Slavin, 2016; Wilson and Lipsey, 2001).
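To illustrate why weighting matters when small studies report larger effects, the following minimal sketch computes a sample-size-weighted mean of Cohen’s d values. It assumes weighting by total sample size, which may differ from the weighting scheme described in the Method section of this review; the function name and all input values are hypothetical and are not taken from the included studies.

```python
from typing import Sequence


def weighted_mean_es(effect_sizes: Sequence[float],
                     sample_sizes: Sequence[int]) -> float:
    """Return the sample-size-weighted mean of a set of effect sizes.

    Assumption: each effect size is weighted by its study's total N.
    """
    if len(effect_sizes) != len(sample_sizes) or not effect_sizes:
        raise ValueError("need equally long, non-empty sequences")
    total_n = sum(sample_sizes)
    # Each effect size contributes in proportion to its sample size.
    return sum(d * n for d, n in zip(effect_sizes, sample_sizes)) / total_n


# Hypothetical values for illustration only (not from the reviewed studies):
# a small study with a large effect and a larger study with a small effect.
hypothetical_d = [0.80, 0.20]
hypothetical_n = [50, 150]

unweighted = sum(hypothetical_d) / len(hypothetical_d)
weighted = weighted_mean_es(hypothetical_d, hypothetical_n)
print(round(unweighted, 2), round(weighted, 2))
```

In this hypothetical case the weighted mean is lower than the unweighted mean, because the larger study with the smaller effect counts more heavily.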
Based on the reported outcome variables in the studies that were included in this review, it can be argued that students who received ILS&T instruction outperformed their peers who received separate language and S&T instruction on all measures of S&T knowledge, reading, reading motivation and vocabulary. When compared to S&T-only instruction, evidence was found that ILS&T enhances students’ achievement in S&T knowledge, inquiry and design skills, and attitude towards S&T learning. These studies provide evidence that the addition of language arts instruction to S&T instruction enhances S&T learning. Although some of these studies also measured students’ language achievement (i.e., vocabulary, writing), ESs found for language achievement are less meaningful in these studies as it can be expected that students who did not receive language instruction during the intervention would score lower on a measure of language achievement. Finally, this review was unable to provide compelling evidence that ILS&T instruction enhances students’ learning achievement when compared to language-only instruction, due to the low number of studies that included a control group receiving language-only instruction. Additionally, few to none of the studies measured students’ oral language skills, writing skills, and inquiry/design skills, so we could not confirm that ILS&T instruction is effective for enhancing these outcome variables. The absence of such assessment in the studies may be partially due to the complexity of measuring these skills (see Davey et al., 2015; Dockrell and Marshall, 2015).
From a theoretical perspective, there are many potential benefits of ILS&T instruction for enhancing students’ language and S&T learning achievement. Still, the approaches to integration that were described in the studies were often relatively rudimentary (e.g., reading or talking about an S&T topic, developing vocabulary). It can be questioned whether the potential of ILS&T is currently being fully harnessed in these interventions. Moreover, it is difficult to assess this based on the reported ESs, because the instruments that are being used to measure student learning achievement are often not aligned with the complexity of the intended (integrated) learning goal of the intervention. The current review only distinguished between independent and researcher-developed instruments, but more features of the measurement procedure are worthwhile to consider.
First, it is important to align assessment with the purpose and content of the intervention, which unfortunately was not always the case in the studies included in this review. For instance, a study that evaluates an intervention that is meant to develop skills (e.g., reading, inquiry/design skills) but only includes assessment of S&T knowledge may produce a high effect size and falsely give the impression that the intervention was highly effective.
Second, the nature of various types of instruments should be considered. For example, testing S&T knowledge or vocabulary can be done in a relatively straightforward and reliable manner, using multiple-choice or open-ended questions that elicit conceptual knowledge. Measuring students’ inquiry/design skills or reading comprehension skills can be more complex and requires a different format, such as performance assessment (Shavelson, 1991). Thus, achieving higher ESs for more complex outcome variables may be more challenging.
Third, it should be considered which outcome variable is measured by instruments, and whether this outcome variable aligns with the intended object of assessment. In our analysis, it was difficult to determine the outcome variable that was measured by an instrument in some studies. For example, Chen et al. (2016) designed an instrument in their study that aimed to measure students’ written argumentation skills. However, after examining the assessment rubric or scoring rules that were included, we concluded that the instrument only measured students’ S&T knowledge, and neither their writing skills in a specified genre nor their argumentation skills (which could be considered an S&T skill). Similarly, we concluded that the MAT- and ITBS-tests that were often used in studies only assessed students’ scientific knowledge, rather than “critical thinking skills”, as was claimed by the authors. This indicates that at times, simplified instruments are used to measure relatively complex skills, which may imply a lack of suitable instruments. This issue has been widely addressed in the context of language arts assessment. For instance, assessment of reading comprehension is inherently complex: because comprehension itself is a process that cannot be observed directly, it must be based on indirect symptoms and artifacts of text comprehension, such as disciplinary knowledge (Johnston, 1984; Pearson and Hamm, 2005). A common concern in the literature on reading comprehension assessment is whether it is possible to measure the complex interplay of knowledge, strategies and skills required for reading comprehension using a reading test that mainly contains multiple-choice comprehension questions (Francis et al., 2005; Pearson and Hamm, 2005). Similarly, effective writing involves a complex interplay of accurate presentation of information, fluency, syntax, and conventions (e.g., see Isaacson, 1984).
Thus, more research is needed to explore the availability and suitability of instruments to measure student learning outcomes in the context of ILS&T instruction.
Relation Between Intervention Characteristics and Effects
The third research question was: How do the intervention characteristics relate to the intervention’s effects on student learning? Instructional interventions with good alignment between the learning goal(s), learning content and instructional method did not appear to report higher (or lower) ESs than studies with a missing total score; therefore, it remains difficult to conclude whether well-aligned interventions contribute to higher student learning achievement. A further comparison revealed that ESs tended to be higher when the instructional interventions adopted a higher level of integration (i.e., interdisciplinary), which was also found in previous studies (Gresnigt et al., 2014; Loepp, 1999). Additionally, it was found that the mean ESs were higher for interventions that had a short duration (12 weeks or less). It has often been found that short interventions tend to yield higher ESs, but also lead to short-term improvements in student learning. Measuring long-lasting effects for interventions with a long duration (e.g., lasting a school year) can be more challenging.
It was found that interventions that included TPD activities generally yielded higher ESs. Unfortunately, too little information was provided by the studies regarding the TPD characteristics (i.e., learning goals, learning content, instructional method) to perform further analysis. As the focus of the studies was on the instructional intervention, it might not be surprising that the level of detail about the TPD activities was much lower. Moreover, it is worth bearing in mind that the teachers involved in the study may have had sufficient prior knowledge or experience with ILS&T, rendering TPD unnecessary.
Limitations
A few limitations to the present review warrant note. First, this review only included published studies, as it was not deemed feasible to include grey literature. Previous reviews have noted that published studies report higher ESs than unpublished ones (Cheung and Slavin, 2016), so the mean ESs reported here may be overestimated. Second, several limitations apply to the ESs’ reliability and precision. The ESs reported by the studies vary in their underlying data, as different instruments and statistical methods were used to calculate the intervention outcomes. The preferred method would be to calculate the ES with the complete and original data, rather than deducing ESs from the values mentioned by the studies. However, in this study the researchers made every effort (within reasonable constraints) to ensure the highest possible reliability and precision of the ESs, as described in the Method section. Third, no attention was paid to the treatment fidelity (see Carroll et al., 2007) of the interventions in this review. Ideally, this information would be included, to determine the degree to which the intervention was implemented in classroom practice as intended. Both teachers and teacher educators play an important role in whether an intervention is implemented as planned. Low treatment fidelity may explain the low ESs found in some studies.
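Regarding the second limitation, the sketch below shows one common way to derive Cohen’s d from the summary statistics a study reports (group means, standard deviations and sample sizes), using the pooled standard deviation. It is an illustration under stated assumptions, not the procedure used in this review, which is described in the Method section; the function name and the input values are hypothetical.

```python
import math


def cohens_d(mean_t: float, sd_t: float, n_t: int,
             mean_c: float, sd_c: float, n_c: int) -> float:
    """Cohen's d for treatment vs. control, using the pooled SD.

    Assumption: only post-test means, SDs and group sizes are available.
    """
    if n_t + n_c <= 2:
        raise ValueError("need more than two participants in total")
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_var)


# Hypothetical summary statistics for illustration only.
d = cohens_d(mean_t=55.0, sd_t=10.0, n_t=30,
             mean_c=50.0, sd_c=10.0, n_c=30)
print(round(d, 2))
```

Because such a value depends entirely on the reported means and standard deviations, rounding or incomplete reporting in a publication carries over directly into the derived ES.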
Future Research
This review has shown that more high-quality research is needed to determine whether ILS&T instruction enhances student learning in language and S&T when compared to non-integrated language and S&T instruction, and to understand why certain intervention studies produce more of the desired effects than others. Three main areas of research require attention, namely, specification of the features of the studies and the interventions, expansion of the variables to be investigated, and the systematic analysis of potential moderators of the effects of ILS&T instruction.
Several hypotheses could not be tested in this review due to lack of specification of the characteristics of the intervention in the included studies. This must be rectified in future studies. The authors therefore call upon researchers to report the study and intervention design thoroughly. Researchers should adequately describe the procedures and substantiate design choices when reporting on intervention studies. At the least, this should include the specification of learning goals, learning content and instructional method, and ideally the features of the contextual conditions for learning as well.
More research is needed to expand the relevant variables investigated as an outcome of ILS&T instruction. This review showed that there is disparity in the number of studies reporting effects for the different outcome variables. For example, the number of studies that measured students’ reading achievement was much higher than studies reporting on students’ writing or oral language skills. To resolve this issue, future studies should include measurements for all student learning outcomes that were addressed in the intervention, which was not the case in many of the studies that were included in this review.
Future research could also further examine the availability or design of suitable instruments for the measurement of relatively complex outcome measures, such as oral language skills, writing skills, and inquiry or design skills. Moreover, it would be worthwhile to design instruments that measure integrated constructs (i.e., argumentative science writing, oral presentations on scientific experiments or technological designs). In this way, the instrument can capture the goals and nature of the ILS&T intervention and of the intended assessment.
In addition, there are other relevant variables that may be worth addressing in future research. For example, this review synthesized studies that compared the experimental conditions (namely ILS&T interventions) to three types of control conditions: separate language and S&T instruction, language-only instruction, and S&T-only instruction. Future research could include a comparison of an ILS&T intervention to all three types of control groups, to give more substantive insight into the value of integration for both subjects.
Finally, more research is needed to systematically analyse the potential moderators of effects of ILS&T instruction. This review focused on the effects of relatively short interventions, where the teachers who implemented the intervention were supported by a team of researchers and experts. In the long term, such support is not always available in practice. The upscaling of programs for system-wide adoption can pose challenges. Moreover, interventions that are successful in controlled settings may not have the same results when used in real-world settings, due to inadequate fidelity among other things. At this stage, other factors can contribute to the success of the implementation, such as active educational leadership (see Timperley et al., 2008). Many reform efforts fail, resulting in teachers returning to traditional teaching methods (Cohen and Mehta, 2017). Making explicit what these factors are could contribute to higher success rates for reform efforts that go beyond controlled interventions.
Longitudinal studies that investigate the effects of ILS&T interventions are required to reveal long-term effects on students’ learning achievement in language and S&T. This is especially desirable for examining more complex learning outcomes, such as reading comprehension skills and inquiry or design skills, which are not developed over the course of a few weeks or months. In the current review, weighted mean ESs were calculated for groups of studies with similar characteristics, but due to the variation in the studies and interventions, we were only able to look at one variable at a time. When more studies about ILS&T interventions are available, a thorough meta-analysis could be performed to gain more systematic insight into the potential moderators of the effects of ILS&T instruction. This would also allow for more thorough analysis of the interaction between study and intervention characteristics. Similarly, it would be interesting to look at the cross-over effects of the instructional and TPD interventions. Based on the current review and the data that were available from the included studies, too little is yet known about the combined impact of student and teacher learning in the context of ILS&T instruction.
CONCLUSION
This review provides valuable insights into the impact of ILS&T interventions on students’ learning achievement in S&T and language. Although it remains difficult to determine which approach to ILS&T instruction enhances student learning most, this review has taken an important step in providing an overview of the current literature on this topic. This review distinguishes itself from past reviews by focusing on interventions that encompass all domains of language learning (i.e., reading, writing, oral language, and vocabulary), in line with the elementary language arts curriculum. Moreover, this review focused exclusively on interventions that adopt an inquiry- or design-based pedagogy in S&T instruction, which suits the true nature of science, engineering and technology (Lewis, 2006). By assuming this focus, the interventions that were included in this review are closely aligned to classroom practices that are currently being advocated by educational standards (e.g., NGSS Lead States, 2013). The findings of this review also give insight into the areas that are still unknown and require additional research. Moreover, this study provides a substantiated framework for analysing ILS&T interventions and offers new insights into the content and approach of interventions described in the existing literature. These insights can improve the quality of the design of ILS&T interventions. Most importantly, this study showed that ILS&T instruction is, in most cases, effective in enhancing student achievement for language and S&T when compared to non-integrated instruction.