Using the Emoji Spatial Stroop Task to measure valence-space effects of emoji

What are valence-space effects?
Valence-space effects are a psychological phenomenon in which abstract concepts of "good" (positive valence) are associated with higher or rightward spatial locations, while concepts of "bad" (negative valence) are associated with lower or leftward spatial locations.
In cognitive processing, this shows up as a congruency effect: information is typically easier to process when its valence and its spatial location match ("congruent") than when they do not ("incongruent"). We therefore expect positive stimuli to be easier to process in upper or rightward positions than in lower or leftward positions, and the reverse for negative stimuli.
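As a rough illustration of how a congruency effect could be quantified, the sketch below takes the mean reaction time on incongruent trials and subtracts the mean on congruent trials. The reaction times, the position labels, and the function names are invented for this example and are not data or code from the study; reaction time is assumed here as the measure of processing ease.

```python
from statistics import mean

# Valence/position pairings treated as congruent, following the description above.
CONGRUENT_PAIRS = {
    ("positive", "upper"), ("positive", "right"),
    ("negative", "lower"), ("negative", "left"),
}

# Hypothetical trials: (valence, position, reaction time in ms). Invented values.
trials = [
    ("positive", "upper", 520),
    ("positive", "lower", 560),
    ("negative", "lower", 540),
    ("negative", "upper", 590),
]


def is_congruent(valence, position):
    """Return True when valence and spatial position match."""
    return (valence, position) in CONGRUENT_PAIRS


def congruency_effect(trials):
    """Mean incongruent RT minus mean congruent RT (ms)."""
    congruent = [rt for v, p, rt in trials if is_congruent(v, p)]
    incongruent = [rt for v, p, rt in trials if not is_congruent(v, p)]
    return mean(incongruent) - mean(congruent)


print(congruency_effect(trials))  # 45
```

A positive value would indicate slower responses on incongruent trials, which is the pattern a valence-space effect predicts.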
What is the Emoji Spatial Stroop Task?
The Spatial Stroop Task is a widely used paradigm in psychology that measures interference between a stimulus's spatial location and its meaning. It is a variant of the classic Stroop task, which typically uses colours and words. Our Emoji Spatial Stroop Task uses emoji as stimuli to measure how characteristics such as valence affect how easily emoji are processed when presented in different physical locations on a screen.
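To make the design concrete, here is a minimal sketch of how a trial list for such a task might be built: each emoji is crossed with each screen position, every trial is labelled as congruent or incongruent, and the order is shuffled. The two emoji, the restriction to upper and lower positions, the number of repeats, and all names are assumptions for illustration only, not the actual stimulus set or procedure.

```python
import itertools
import random

# Hypothetical stimulus set: emoji mapped to an assumed valence label.
EMOJI_VALENCE = {"😀": "positive", "😢": "negative"}
POSITIONS = ["upper", "lower"]  # vertical positions only, for simplicity


def build_trials(repeats=2, seed=0):
    """Cross every emoji with every position, label congruency, and shuffle."""
    rng = random.Random(seed)  # seeded so the order is reproducible
    trials = []
    for _ in range(repeats):
        for (emoji, valence), position in itertools.product(
            EMOJI_VALENCE.items(), POSITIONS
        ):
            trials.append({
                "emoji": emoji,
                "valence": valence,
                "position": position,
                # Congruent: positive in upper space, or negative in lower space.
                "congruent": (valence == "positive") == (position == "upper"),
            })
    rng.shuffle(trials)
    return trials


for trial in build_trials():
    print(trial)
```

Fully crossing emoji with positions keeps the number of congruent and incongruent trials equal, so any difference between them is not an artefact of unequal trial counts.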
Do we see vertical valence-space effects of emoji?
We often embody emotion in physical space (e.g., "feeling up"). If emoji are emotional, we should see similar effects when they are displayed in different locations in physical space. When people rate the emotionality of emoji, happy emoji are rated as more positive when presented in upper vertical space, and sad emoji as more negative when presented in lower space. However, this only seems to hold when we explicitly report the valence of emoji, not for how we implicitly embody them.
Do we see any cross-modal effects when processing emoji?
Cross-modal processing refers to the way we integrate information from more than one sense at once (e.g., visual and aural information). If emoji are emotional, positive emoji should be easier to process alongside aural information that tends to be embodied more positively (e.g., higher pitch), and negative emoji alongside aural information that is embodied more negatively (e.g., lower pitch).