Artificial intelligence detects awareness of functional relation with the environment in 3-month-old babies (Scientific Reports)


A recent experiment probed how purposeful action emerges in early life by manipulating infants’ functional connection to an object in the environment (i.e., tethering an infant’s foot to a colorful mobile). Vicon motion capture data from multiple infant joints were used to construct Histograms of Joint Displacements (HJDs), pose-based descriptors of 3D infant spatial trajectories. Using HJDs as inputs, machine and deep learning systems were tasked with classifying the experimental stage from which snippets of movement data were sampled. The architectures tested included k-Nearest Neighbour (kNN), Linear Discriminant Analysis (LDA), a fully connected network (FCNet), a 1D-Convolutional Neural Network (1D-Conv), a 1D-Capsule Network (1D-CapsNet), 2D-Conv and 2D-CapsNet. Sliding-window scenarios were used for temporal analysis to search for topological changes in infant movement related to functional context. kNN and LDA achieved higher classification accuracy with single-joint features, while deep learning approaches, particularly 2D-CapsNet, achieved higher accuracy on full-body features. For every AI architecture tested, measures of foot activity displayed the most distinct and coherent pattern alterations across experimental stages (reflected in the highest classification accuracies), indicating that interaction with the world impacts infant behaviour most at the site of organism~world connection.
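The HJD idea can be illustrated with a minimal sketch: for each joint, frame-to-frame displacement magnitudes are binned into a normalized histogram, and per-joint histograms can be concatenated into a full-body feature vector. The bin count, binning scheme and normalization below are illustrative assumptions, not the study’s exact implementation.

```python
import numpy as np

def hjd_descriptor(trajectory, n_bins=16):
    """Histogram of Joint Displacements (HJD) for a single joint.

    trajectory : (T, 3) array of 3D joint positions over T frames.
    Returns a normalized histogram of frame-to-frame displacement
    magnitudes. Bin edges and normalization are illustrative; the
    published pipeline may bin displacements differently.
    """
    disp = np.diff(trajectory, axis=0)      # (T-1, 3) frame-to-frame displacement vectors
    mag = np.linalg.norm(disp, axis=1)      # displacement magnitudes
    hist, _ = np.histogram(mag, bins=n_bins, range=(0.0, mag.max() + 1e-9))
    return hist / hist.sum()                # normalize to a probability distribution

def full_body_hjd(joints):
    """Concatenate per-joint histograms into a full-body feature.
    joints : dict mapping joint name -> (T, 3) trajectory array."""
    return np.concatenate([hjd_descriptor(tr) for tr in joints.values()])
```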

Artificial neural networks (ANNs) were first developed to understand biological behaviour and mechanisms of cognition. Turing believed that a crucial measure of artificial intelligence (AI) is the ability to mimic the way in which complex behaviour becomes organized in infants. Given recent technical advancements in computing and AI as well as theoretical advancements in infant learning, it may be possible to use machine and deep learning techniques to study how infants transition from loosely structured exploratory movement to more organized, intentional action. Thus far, such methods have focused on analyzing spontaneous movements and distinguishing fidgety from non-fidgety movements.

Though early infant movement is chaotic, meaningful patterns emerge as infants adapt to external perturbations and constraints through dynamic interaction between brain, body and environment. However, little is known about the mechanisms by which infants begin to intentionally act on functional relationships with their environment. Laws governing bidirectional interaction between infant and environment are lacking, and the roots of conscious, coordinated, goal-directed action remain largely unexplored.

A paradigm designed a half century ago to study infant memory and learning provides an experimental window into the formation of human agency, action towards an end. In this so-called mobile conjugate reinforcement (MCR) paradigm, Rovee et al. connected a ribbon between an infant’s ankle and a mobile suspended over the infant’s crib. Conjugate reinforcement refers to the sights, noises, and sensations due to mobile movement all being ostensibly dependent on and in proportion to the magnitude and rate of infant action. In short, the thinking was that the more the infant moved, the more ‘reward’ the mobile provided, stimulating further infant movement. Infants moved the connected leg at much higher rates compared to baseline, which Rovee and Rovee interpreted as reinforcement learning. However, mounting evidence suggests that rather than being rewarded by mobile stimulation per se, the increase in infant movement is driven by infant detection of the self~mobile relationship. The key variable manipulated in MCR is the infant’s functional connection to the world, transforming the infant from a disconnected observer to a connected actor. Bidirectional information exchange through coordination is thought to generate meaning and create the opportunity for infant discovery of agency.

If infants do in fact discover that they can ‘make the world behave’ in MCR, dynamical analysis should expose the mechanisms of the discovery process. A necessary step in studying the development of sentient agency is being able to detect structural changes (in time, space and function) related to goal-directedness and to differentiate between exploratory and goal-directed action. One hypothesis is that the moment of agentive realization (the ‘Aha! moment’) constitutes a kind of phase transition marked by sudden changes in activity rate, coordination and variability. Dynamical tools have recently been developed to identify infant agentive discovery in the infant~mobile paradigm, and related results support the notion of agentive discovery as a phase transition dependent on tight organism~environment coordination. It may therefore be possible for AI systems to automatically detect these and/or other changes in infant movement patterns reflecting detection of baby~world causal relationships. Though the target measure in most infant contingency studies is movement rate, some studies have found that infants modify multiple features of movement, including amplitude, timing and inter-joint coordination, while exploring and exploiting their functional relationship with the mobile. AI tools may be particularly suited to deal with the complexity and subtleties of infant movement and, more generally, of agent~object interaction in 3D space.
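As a toy illustration of the phase-transition idea, a sudden jump in movement rate can be flagged with a simple trailing-baseline test. This is only a sketch of the concept; the window length and threshold are arbitrary choices, and the study’s dynamical analysis is considerably richer.

```python
import numpy as np

def detect_rate_transition(rate, window=30, z_thresh=3.0):
    """Flag candidate phase transitions in a movement-rate series.

    rate : 1D array, e.g. limb-movement rate estimated per time bin.
    Returns indices where the rate exceeds the trailing baseline by more
    than z_thresh standard deviations -- a crude proxy for a hypothesized
    'Aha!' transition, not the study's actual analysis.
    """
    candidates = []
    for t in range(window, len(rate)):
        baseline = rate[t - window:t]
        mu, sigma = baseline.mean(), baseline.std() + 1e-9
        if (rate[t] - mu) / sigma > z_thresh:
            candidates.append(t)
    return candidates
```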

A variety of machine and deep learning methods can be implemented for pose recognition using video recordings of infant movement. For example, McCay et al. extracted pose-based features from video recordings to develop an automatic and independent method to diagnose cerebral palsy (CP) in infants. They used a freely available library called OpenPose to obtain skeletal joint coordinates during sequences of movement in 12 infants up to seven months of age. That study achieved nearly 92% accuracy (two classes) using machine learning techniques (k-Nearest Neighbour (kNN) and Linear Discriminant Analysis (LDA)) and 91.7% accuracy using a fully connected neural network (FCNet). Additionally, Tsuji et al. constructed a neural network with a stochastic structure that was able to distinguish between normal and abnormal infant movements with up to 92.2% accuracy.
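A minimal scikit-learn sketch of this kind of baseline comparison, with synthetic placeholder data standing in for real pose-based descriptors and stage labels:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 64))    # placeholder pose-based descriptors (e.g., HJDs)
y = rng.integers(0, 4, size=120)  # placeholder experimental-stage labels

# Shallow baselines of the kind used by McCay et al.; hyperparameters
# here (k=5, 5-fold CV) are illustrative choices, not theirs.
for name, clf in [("kNN", KNeighborsClassifier(n_neighbors=5)),
                  ("LDA", LinearDiscriminantAnalysis())]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```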

In general, deep learning structures that use convolutional neural networks (CNNs) have achieved state-of-the-art accuracy for classification of adult action. To complete CNN feature extraction and classification automatically, regions of an output feature map are pooled. Though typically employed to reduce the computational cost of the model, such pooling may also result in a loss of important information. Two common approaches are max and mean pooling. In max pooling, a filter (a 2 × 2 grid, for example) is slid over the output feature map and only the maximum value in the grid area is retained. (In the example of a 2 × 2 filter, output data are reduced from four values to one.) Although pooling is a useful means to simplify the network, it is impossible to know from pooled outputs alone where, and how many times, a filtered feature is encountered in the data. A new form of neural network, the capsule network (CapsNet), was developed to address this issue. Using groups of artificial neurons (i.e., mathematical functions designed to model biological neurons) that encode visual entities and the relationships between them, CapsNets model part-whole hierarchical relationships explicitly. A CapsNet encapsulates artificial neurons in vector-valued units to form the first layer (primary capsules) and uses a novel Dynamic Routing (DR) procedure to establish routes between this layer and subsequent layers (parent capsules). In an image classification task, the hierarchical relationships among an object’s parts are represented explicitly, so it is possible to interpret features and determine which parts of an image belong to a particular object. For example, DR might allow a CapsNet not only to assess whether elements of a face (e.g., eyes, lips, nose) are present but also whether these elements are realistically situated in relation to one another. Lastly, CapsNets can be trained more efficiently than traditional CNNs. Because CNNs are not rotation-invariant, they must be trained on large amounts of input data augmented by many combinations of transformations (e.g., rotation, zooming, cropping, inversion) to classify new data accurately. Critically, since CapsNets handle rotational variation, they can be trained with fewer samples and may be particularly suited for infant action recognition, for which large datasets are difficult to acquire.
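To make the pooling argument concrete, here is a small numpy example: 2 × 2 max pooling keeps each window’s strongest activation but discards where within the window it occurred. The ‘squash’ nonlinearity shown after it is the standard capsule formulation from Sabour et al.; the feature-map values are made up for illustration.

```python
import numpy as np

fmap = np.array([[0.1, 0.9, 0.2, 0.0],
                 [0.3, 0.4, 0.8, 0.1],
                 [0.5, 0.0, 0.2, 0.6],
                 [0.7, 0.2, 0.1, 0.3]])

# 2x2 max pooling with stride 2: each window of four values collapses to
# one, and the position of the maximum inside the window is lost.
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[0.9 0.8]
               #  [0.7 0.6]]

# Capsules instead output vectors; the 'squash' nonlinearity scales a
# capsule vector's length into (0, 1) while preserving its direction,
# so length can encode existence probability and direction encodes pose.
def squash(v, eps=1e-9):
    norm2 = np.sum(v ** 2)
    return (norm2 / (1.0 + norm2)) * v / np.sqrt(norm2 + eps)
```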

While techniques in motion identification, reconstruction, and analysis for automatic recognition of specific human activities continue to improve, classifying human action patterns is challenging because these patterns involve both temporal and spatial characteristics. Joints connect the segments of the human body into an articulated system, and human actions comprise the continuous evolution of the spatial configuration of these segments. Though the majority of AI action recognition research focuses on adult movement, researchers have begun developing systems to automatically analyze movement in paediatric populations, including infants. However, automatic analysis of infant behaviour is complex, since data are often captured in uncontrolled natural settings while the infant is freely moving and interacting with a variety of objects. Factors such as variation in infants’ physiques, lighting changes, and examiner involvement undermine the robustness of classification for markerless capture of movement data (i.e., estimating 3D limb position and movement from video recordings). Optical flow, frequency-domain analysis, and background removal are techniques commonly used to deal with these challenges. While pose estimation from video recordings can deliver high accuracy, the extracted image sequences supply a large amount of data at a high computational cost.
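Of the techniques just named, dense optical flow is the most readily illustrated. A minimal OpenCV sketch, using a synthetically shifted frame in place of real infant video:

```python
import numpy as np
import cv2

rng = np.random.default_rng(0)
prev_gray = rng.integers(0, 255, size=(240, 320), dtype=np.uint8)  # placeholder frame
next_gray = np.roll(prev_gray, 2, axis=1)  # shifted copy simulates horizontal motion

# Farneback dense optical flow: a per-pixel (dx, dy) motion field that
# can feed downstream movement features. Positional args after the flow
# init (None): pyr_scale, levels, winsize, iterations, poly_n,
# poly_sigma, flags (values here are common defaults).
flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)
mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])  # motion magnitude/direction
```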

Alternatively, reconstructed 3D skeleton data from marker-based motion capture (MoCap) systems have been shown to be dynamically robust and to have anti-interference properties for automatic action classification. Consistent positioning of physical sensors (i.e., visual markers, accelerometers, gyroscopes) across infant participants is admittedly difficult: joint landmarks are often obscured by fat, and the sensors themselves may modify infant behaviour. Nevertheless, skeletal joint information extracted from marker-based MoCap systems provides high temporal resolution and extremely high accuracy. For example, Yu et al. used an adaptive skeleton definition to translate and rotate the virtual camera’s viewpoint and generate a new coordinate system, producing a robust and adaptive neural network for automatic, optimal spatiotemporal representation. In many respects, action recognition is more straightforward using skeletal joint information than RGB video imagery, and skeletal data are therefore preferred here.
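The coordinate-system idea behind Yu et al.’s adaptive skeleton definition can be sketched in its simplest form: translate all joints to a root joint and rotate so that a body axis aligns with the new frame. The joint indices and axis choices below are illustrative placeholders, not their method.

```python
import numpy as np

def body_centred(joints, root=0, left_hip=1, right_hip=2):
    """Re-express one (J, 3) skeleton frame in a body-centred coordinate
    system: origin at the root joint, x-axis along the hip line.
    Joint indices are illustrative placeholders."""
    centred = joints - joints[root]            # translate root joint to the origin
    x = centred[right_hip] - centred[left_hip]
    x = x / (np.linalg.norm(x) + 1e-9)         # unit vector along the hip axis
    up = np.array([0.0, 0.0, 1.0])             # assumed lab 'up' direction
    z = up - np.dot(up, x) * x                 # orthogonalize 'up' against the hip axis
    z = z / (np.linalg.norm(z) + 1e-9)
    y = np.cross(z, x)                         # complete the right-handed basis
    R = np.stack([x, y, z])                    # rows are the new basis vectors
    return centred @ R.T                       # coordinates in the body-centred frame
```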

The data used in the present work were obtained from a baby~mobile experiment which quantitatively identified moments of agentive discovery and explored underlying mechanisms through analysis of infant motion, mobile motion and their coordination dynamics. Movement data were collected using a Vicon 3D MoCap system. Given the nature of this experiment and the fact that the MoCap data provide exact infant joint locations, several machine and deep learning approaches for classifying pose-based features are proposed and evaluated here. The first main objective is to classify infant movement across different experimental stages, ranging from spontaneous activity, to reacting to an externally driven moving stimulus (here, a mobile), to making the mobile move, to losing control over the mobile. Successful classification would indicate that the functional context, and infant sensitivity to that context, drive changes in the structural features of infant movement. The classification accuracy across experimental stages will highlight the pros and cons of applying different machine and deep learning approaches in infant studies and, at the same time, expose whether behaviour is more constrained and characteristic within certain contexts, leading to higher classification accuracy. The second main aim is to study temporal features of infant behaviour using sliding windows. To reiterate, the infant’s discovery that ‘I can make the mobile move’ emerges from a coordinative dance between organism and environment which unfolds in time. Assessing classification accuracy using sliding windows will provide new insight into the dynamic topological evolution of infants exploring and discovering their relationship to the world. With our unique dataset, and in the context of related evidence of infant agentive discovery, we examine and optimize machine and deep learning methods to characterize the flow of structural change in infant movement reflective of cognitive processes and behavioural adaptation within and across coordinative contexts. We demonstrate that approaches based on deep learning are well suited to working with pose-based data. In particular, we show that CapsNet-based approaches, such as the 1D-Capsule Network (1D-CapsNet) and 2D-CapsNet, preserve the hierarchy of features and avoid information loss in the model’s architecture by substituting dynamic routing for pooling, especially when fused features are employed. More to the point, we demonstrate that AI systems provide significant insight into the early ability of infants to actively detect and engage in a functional relationship with the environment.
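The sliding-window analysis can be sketched as follows: segment each joint trajectory into overlapping windows, compute a descriptor per window, and classify each window’s experimental stage with a trained model. Window length and stride below are illustrative, and `hjd_descriptor` refers to the hypothetical sketch given earlier.

```python
import numpy as np

def sliding_windows(trajectory, win=200, stride=50):
    """Yield (start, snippet) pairs of overlapping (win, 3) snippets from
    a (T, 3) joint trajectory. win/stride are illustrative choices."""
    for start in range(0, len(trajectory) - win + 1, stride):
        yield start, trajectory[start:start + win]

def classify_timeline(trajectory, clf):
    """Per-window stage labels: descriptor -> fitted classifier -> label.
    `hjd_descriptor` is the earlier sketch; `clf` is any fitted classifier."""
    return [(start, clf.predict(hjd_descriptor(snippet)[None, :])[0])
            for start, snippet in sliding_windows(trajectory)]
```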
