using ANNs to decode retinal ganglion cell encoding


This is an older article written back in 2021, when I was still a first-year university student. The writing and citation use are less than ideal, but the subject matter is still relevant and interesting, so I decided to upload it.



Motivation

Today, the most widely adopted form of VR is the Head Mounted Display (HMD) [1]. An HMD feeds visual information to the eyes through screens and auditory information through speakers; both are ways of delivering artificial input to the brain via its external sensory organs. We can define virtual reality as the act of replacing natural sensory input with artificial input created by software, thus hijacking the brain’s inputs and outputs and using them inside a virtual environment. By this definition, today’s technology is primitive and barely qualifies as virtual reality. Placing screens near the human eye is not a long-term solution; it creates a host of unnecessary problems such as refresh rate, resolution, and the screen-door effect. While some people are working to fix those problems, the bigger issue is the inherent limitation of such screens, and the logical next step is mostly overlooked. VR today is only a prototype of what it could be.


The retinal encoding problem

1. Neuron

A neuron is a cell made up of three parts: the dendrites, which take inputs from other neurons; the cell body, which processes those inputs; and the axon, which passes an output on to the next neurons. It is the main cell providing connectivity and computing power in the brain [2] [3], numbering about 100 billion [4]. The connections between neurons comprise all of who you are [5]. Each neuron receives electrical spikes from a number of other neurons at its dendrites, and if the combined input is powerful enough, it fires a spike of its own down its axon, which connects to a number of other neurons that repeat the process. [6]
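This threshold behaviour can be caricatured in a few lines of code. Below is a minimal leaky integrate-and-fire sketch in Python; the input rates, weights, leak factor, and threshold are invented numbers for illustration, not measurements.

```python
import numpy as np

# Minimal leaky integrate-and-fire neuron: inputs are weighted spikes,
# the membrane potential leaks toward rest, and the neuron fires its own
# spike whenever the potential crosses a threshold. All constants are
# made up for illustration.
rng = np.random.default_rng(0)

n_inputs = 20        # number of presynaptic neurons
n_steps = 200        # simulation steps
threshold = 1.0      # firing threshold
leak = 0.9           # fraction of potential kept each step

weights = rng.uniform(0.05, 0.15, n_inputs)  # synaptic strengths
potential = 0.0
output_spikes = []

for t in range(n_steps):
    incoming = rng.random(n_inputs) < 0.1    # which inputs spike this step
    potential = leak * potential + weights @ incoming
    if potential >= threshold:               # strong enough: fire and reset
        output_spikes.append(t)
        potential = 0.0

print(f"neuron fired {len(output_spikes)} times in {n_steps} steps")
```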

2. Eye

The mammalian eye is amazing; vision is the most important sense humans possess [7]. To get a better understanding of its architecture, let us follow the path light takes. Light enters the eye through the pupil, the amount admitted being controlled by the iris, which contracts to let less light in and dilates so that we can see in darker places. The light then travels through the lens, which controls the focal point of the eye: depending on the shape the lens takes, you can focus on objects near you or far in the distance. Finally it hits the back of the eye, where the retina is located. [8]

The retina is made up of three stacked layers of cells, each with its own job. The first layer, a matrix of photoreceptors, is hit by the light refracted by the lens. The number and size of photoreceptors vary across the retina. Think of it as a target with a centre: in the middle is a region called the fovea, where the photoreceptors are smallest and give the most detailed view [9]. Moving away from the fovea, the photoreceptors grow larger and sparser, which gives us a sharp point of focus and a blurry periphery. This is nothing like the cameras available today, which capture a rectangular image of uniform quality throughout. After the photoreceptors, the signal travels through the second layer to the most important layer of the retina, the last one, made up of so-called ganglion cells, the cells responsible for digitizing the images captured by the receptors. Ganglion cells are a type of neuron, and thus have an input and an output [10]. The input arrives through the second layer in the form of electrical signals. One ganglion cell is connected to several layer-2 cells, which in turn connect to even more photoreceptors. The structure converges like a funnel: there are around 125 million photoreceptors but only around 1.5 million ganglion cells [11]. The retina transforms these signals into nerve spikes.
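To make the funnel shape concrete, here is a toy sketch in Python that pools many receptor samples into far fewer ganglion-cell signals, with small pools near the centre and larger ones toward the edge. The pool sizes are invented purely to illustrate the convergence.

```python
import numpy as np

# Toy picture of retinal convergence: many photoreceptor samples are
# pooled into far fewer ganglion-cell signals. Pools are small near the
# centre (sharp vision) and large toward the edge (blurry periphery).
# Array sizes and pool sizes are invented, not anatomical values.
rng = np.random.default_rng(5)
photoreceptors = rng.random(1000)  # 1-D stand-in for the receptor mosaic

signals = []
i = 0
while i < len(photoreceptors):
    distance_from_center = abs(i - 500) / 500          # 0 at fovea, 1 at edge
    pool = 1 + int(9 * distance_from_center)           # 1 receptor per cell centrally,
    signals.append(photoreceptors[i:i + pool].mean())  # up to 10 peripherally
    i += pool

print(f"{len(photoreceptors)} receptors -> {len(signals)} ganglion signals")
```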

“Nerve spikes are a time-coded digital form of electrical signalling used to transmit nervous system information over long distances, in this case through the optic nerve and into brain visual centers” [12]. The optic nerve is a bundle of all the axons of the ganglion cells. The information that reaches the brain is far richer than the simple image a digital camera might take: it includes temporal, spatial, and motion information [13] [14]. “Research over the past several decades has made clear that most RGCs are not merely light detectors, but rather feature detectors, which send a diverse set of parallel, highly processed images of the world on to higher centers (of the brain)” [15]. Even when first characterized in 1967, ganglion cells showed three distinct patterns of light response. ON-type cells responded with a short burst at the onset of light and then sustained constant discharges throughout the stimulation. ON-OFF-type cells responded with a burst at the onset and offset of light but were otherwise quiet. OFF-type cells were quiet until the stimulus light was turned off, whereupon they responded with a sustained burst of impulses [10]. Since then, we have discovered that there are in fact around 15-20 different types of RGCs, reflecting distinctive features of the spatial and temporal pattern of stimulus activity [16]. The information is then processed by the brain, which decides on the appropriate action to take.
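The three classic response types are easy to picture as toy firing-rate traces. The sketch below, in Python, uses made-up rates chosen only for their shape; it illustrates the patterns described in [10], not a model fitted to data.

```python
import numpy as np

# Toy firing-rate traces for the three classic ganglion cell types [10].
# All rates and time windows are made-up numbers chosen only for shape.
t = np.arange(300)                       # time steps
light = (t >= 100) & (t < 200)           # stimulus on between t=100 and t=200
onset = (t >= 100) & (t < 110)           # brief window after light onset
offset = (t >= 200) & (t < 210)          # brief window after light offset

rate_on = np.where(onset, 80, np.where(light, 30, 2))     # burst, then sustained firing
rate_on_off = np.where(onset | offset, 80, 2)             # bursts at transitions only
rate_off = np.where(light, 0, np.where(t >= 200, 40, 5))  # quiet until the light goes off

for name, rate in [("ON", rate_on), ("ON-OFF", rate_on_off), ("OFF", rate_off)]:
    print(f"{name:>6}: during light {rate[light].mean():5.1f} Hz, "
          f"after offset {rate[200:250].mean():5.1f} Hz")
```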

While some parts of this system are simple in concept, the overall structure is unknown [17]. As usual in the brain, everything is interconnected [18], so this is not a linear process. For example, the photoreceptors in layer 1 of the retina are influenced by the ganglion cells in layer 3, even though they are the ones sending the information. Everything is connected to, and influenced by, everything else [17]. As a result, isolating a ganglion cell, or a small population of them, gives us some clues about how they work internally, but it tells us nothing about the system as a whole. This is called the retinal encoding/decoding problem, and solving it will be one of the most important advances in neuroscience and computing in the coming years. [19]


What have people done until now?

People have been trying for half a century to decode the visual information leaving the retina. The pioneers of this work shared the Nobel Prize in Physiology or Medicine in 1967, having managed to record the responses of individual ganglion cells in a vertebrate retina [20]. Since then we have tried many different approaches, and in some cases succeeded in decoding rudimentary images. The models we have today fall into three categories: linear, linear-nonlinear, and nonlinear [21]. Each offers different advantages and disadvantages, and all are important in the field. This article focuses on the nonlinear model. Linear and linear-nonlinear models have a history of success at decoding rudimentary images [22] [23], but with the advancement of neural networks and their nonlinear potential, the field is advancing rapidly [17].
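As a point of reference for the linear category, here is a minimal least-squares decoder sketch in Python. Both the “true” encoding and the spike counts are synthetic stand-ins; real studies fit such filters to recorded spike trains [22].

```python
import numpy as np

# Minimal sketch of the *linear* decoder category: a least-squares map
# from ganglion spike counts back to image pixels. All data is synthetic;
# the "true" encoding filter below is a random stand-in.
rng = np.random.default_rng(1)

n_cells, n_pixels, n_samples = 50, 64, 500
true_filter = rng.normal(size=(n_cells, n_pixels))  # hidden encoding

images = rng.normal(size=(n_samples, n_pixels))
spikes = images @ true_filter.T + 0.1 * rng.normal(size=(n_samples, n_cells))

# Fit decoding weights W so that spikes @ W approximates the images.
W, *_ = np.linalg.lstsq(spikes, images, rcond=None)

reconstructed = spikes @ W
err = np.mean((reconstructed - images) ** 2)
print(f"mean squared reconstruction error: {err:.4f}")
```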


Neural Networks

An artificial neural network (ANN) is a brain-inspired algorithm comprising layers, each holding a certain number of neurons, also called nodes. The usual structure contains an input layer, a number of hidden layers, and a final output layer [24]. The number of hidden layers varies greatly, from just one to hundreds. Each neuron in a layer holds a piece of information that is propagated further into the network according to its weight. In the brain, where the connections are biological, every time a neuron’s action potential triggers another neuron, the connection between them becomes stronger [25], making the same spike pattern easier to reproduce next time. Over time, the strong connections within a network of neurons form patterns; this is how humans learn [26]. The same principle has been adapted to neural networks, the strength of a connection becoming the concept of a weight: after training, one neuron’s output counts for more than another’s in the next neuron’s computation.

In a basic neural network, information flows in only one direction; this property is called feed-forward [27]. (There are also types of neural networks where information is cycled back to earlier layers.) A piece of information enters at an input node, is processed, and is transferred into the hidden layers. Each node receives more than one input, and the inputs are weighted. The computation that happens inside a node is called the activation function [28] and depends on the use of the network. After the input enters the next node, it is processed and fed forward, and the procedure repeats until the information reaches the output layer. For example, if we feed the pixels of an image into individual input nodes and then perform some procedure inside the hidden layers, we can output the same number of pixels and map them back into their original arrangement, producing a modified image.
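A bare forward pass makes this structure concrete. The sketch below uses arbitrary layer sizes and a ReLU activation chosen only for illustration; it shows information flowing from input nodes through one hidden layer to the output.

```python
import numpy as np

# A bare feed-forward pass: input layer -> one hidden layer -> output layer.
# Layer sizes and the ReLU activation are arbitrary illustrative choices.
rng = np.random.default_rng(2)

def relu(x):
    return np.maximum(0.0, x)       # the activation function inside each node

x = rng.normal(size=4)              # 4 input nodes

W1 = rng.normal(size=(8, 4)) * 0.5  # weights: input -> hidden (8 nodes)
b1 = np.zeros(8)
W2 = rng.normal(size=(3, 8)) * 0.5  # weights: hidden -> output (3 nodes)
b2 = np.zeros(3)

hidden = relu(W1 @ x + b1)          # each hidden node sums its weighted inputs
output = W2 @ hidden + b2           # information flows only forward

print("output:", output)
```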

When information is first fed through a neural network, the output is random. But by comparing the produced output with the desired one, you can adjust the weight of each connection. This principle is called backpropagation [29] and it is widely used to train neural nets. After a period of training, a neural network can do amazing things, like interpret a photo, generate convincing text, and much more. [30] [31]
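Here is a minimal sketch of that training loop, assuming a tiny two-layer network and an arbitrary toy task: the forward pass produces an output, the error is pushed backwards through the chain rule, and every weight is nudged downhill.

```python
import numpy as np

# Backpropagation on a tiny network. The task (learning y = sin(x1 + x2))
# and all sizes are arbitrary; only the mechanics matter here.
rng = np.random.default_rng(3)

X = rng.uniform(-1, 1, size=(256, 2))
y = np.sin(X.sum(axis=1, keepdims=True))

W1 = rng.normal(size=(2, 16)) * 0.5; b1 = np.zeros(16)
W2 = rng.normal(size=(16, 1)) * 0.5; b2 = np.zeros(1)
lr = 0.1                                 # learning rate

for epoch in range(500):
    # forward pass
    h = np.tanh(X @ W1 + b1)
    pred = h @ W2 + b2
    err = pred - y                       # random at first, shrinks with training

    # backward pass: chain rule, output layer first
    dW2 = h.T @ err / len(X)
    db2 = err.mean(axis=0)
    dh = err @ W2.T * (1 - h ** 2)       # derivative of tanh
    dW1 = X.T @ dh / len(X)
    db1 = dh.mean(axis=0)

    # gradient step on every weight
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print("final mse:", float((err ** 2).mean()))
```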


Typical Decoder

A typical retinal decoder is composed of two parts: a spike-to-image converter and an image-to-image autoencoder [32] [33] [34]. An autoencoder is a neural network that learns from data without supervision. The spike-to-image stage maps the neural spike trains to individual pixels of an intermediate image. The autoencoder stage takes that intermediate image, which looks like random noise, and translates it into the final output image. For the autoencoder, a convolutional neural network (CNN) is used. A CNN is a type of ANN that includes convolutional layers among its hidden layers. If an ANN is like the brain, a CNN is like the retina, because those convolutional layers actively analyze the image to understand what it contains [35] [36]. CNNs are widely used in visual tasks such as labeling photographs [37]. Output quality has improved greatly since CNNs, and neural networks in general, were introduced [38] [39] [40] [41], but accuracy and detail are still lacking, and the ability to decode live video is still in development. [42] [43]
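The two-stage structure can be sketched without any deep-learning library. In the toy Python below, stage 1 is a random linear readout standing in for the learned spike-to-image converter, and stage 2 is a single fixed smoothing convolution standing in for the trained CNN autoencoder; nothing here is the published architecture, only its shape.

```python
import numpy as np

# Structural sketch of the two-stage decoder [32]-[34]. Everything is
# synthetic: the readout is random, and the "autoencoder" is one fixed
# 3x3 averaging kernel, a placeholder for the learned CNN.
rng = np.random.default_rng(4)
side, n_cells = 16, 200

spikes = rng.poisson(3.0, size=n_cells).astype(float)  # fake spike counts

# Stage 1: linear readout from spikes to pixels of an intermediate image.
readout = rng.normal(size=(side * side, n_cells)) / n_cells
intermediate = (readout @ spikes).reshape(side, side)  # looks like noise

# Stage 2: convolutional cleanup. A real decoder trains a CNN autoencoder
# end-to-end; a single smoothing pass only shows where that stage sits.
kernel = np.full((3, 3), 1 / 9)
padded = np.pad(intermediate, 1, mode="edge")
final = np.zeros_like(intermediate)
for i in range(side):
    for j in range(side):
        final[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)

print("intermediate std:", intermediate.std(), "-> final std:", final.std())
```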

The problem with neural networks is that they offer no insight into how the retina works. When training a model, the network develops a new way of decoding information that could be fundamentally different from the original one used in the retina, even if they both perform the same task. [17]


What effect would solving it have on the VR industry and in general?

As I have said before, the retina shares the brain’s level of complexity. “A retina is essentially a piece of brain” [44]. Unlike the brain, the retina, “with its highly layered structural organization and ease of access as a whole organ that can be studied from sensory integration to neural coding, is the perfect model to study functional organization in the central neural system” [45]. In short, the retina is a Rosetta stone for the brain: if we solve vision, the other senses should follow, because there is evidence that the brain works under one general algorithm.

In April 2000, MIT neuroscientists rewired the brain of a ferret so that visual information coming from the eyes reached the auditory cortex. The brain’s response was to adapt to that information: the auditory cortex learned to see. The ferret was fine after a period of adaptation. [46] Similar studies have followed. This suggests that parts of the brain can be rewired to take on other processes. Therefore, if we solve the visual system, we have good reason to believe the rest of the brain will follow shortly. Even if we encounter problems along the way, decoding the visual system alone would be a huge step forward.

Understanding the brain’s visual I/O means we can hack it. With the coming adoption of brain-machine interfaces, used at first mainly for medical and augmentation purposes, VR will most likely see a much wider adoption rate, integrated in the form of a software extension. Everyone with an implant will be able to try a far better virtual experience by connecting directly to the brain, and at some point a full-dive VR experience. This removes the need to buy dedicated hardware and integrates VR into the daily lives of, eventually, the majority of people.

It is currently impossible to predict the main uses of the technology, as the whole culture will be different; human experience will be fundamentally different from what we have today. The medical field will see a huge push forward, with direct access to the brain helping to treat conditions such as paralysis or Alzheimer’s, and even making the experience of recovery more pleasant and easy [47] [48]. Another use is entertainment: with full-dive experiences available everywhere, it is fair to say that the time we spend today in transit or waiting will be filled with VR dives. With this, virtual reality will stop being the prototype we have today and mature into its full potential as a medium.



References

[1] "Global Virtual Reality (Semi & Fully Immersive, Non-immersive) Market Size, Share & Trends Analysis Report, 2020-2027," ResearchAndMarkets.com, 16 September 2020. [Online]. Available: https://www.businesswire.com/news/home/20200916005472/en/Global-Virtual-Reality-Semi-Fully-Immersive-Non-immersive-Market-Size-Share-Trends-Analysis-Report-2020-2027---ResearchAndMarkets.com.

[2] D. A. Woodruff, "What is a neuron?," The University of Queensland, 13 August 2019. [Online]. Available: https://qbi.uq.edu.au/brain/brain-anatomy/what-neuron.

[3] M. Hennig, "Neural Computation," 3 December 2018. [Online]. Available: https://www.inf.ed.ac.uk/teaching/courses/nc/ln_all_2018.pdf.

[4] S. Herculano-Houzel, "The human brain in numbers: a linearly scaled-up primate brain," Frontiers in Human Neuroscience, vol. 3, p. 31, 2009.

[5] F. Crick, The Astonishing Hypothesis: The Scientific Search for the Soul, 1994, p. 3.

[6] "How do neurons work?," The University of Queensland, [Online]. Available: https://qbi.uq.edu.au/brain-basics/brain/brain-physiology/how-do-neurons-work.

[7] R. J. Sternberg and K. Sternberg, Cognitive Psychology, Nelson Education, 2016.

[8] P. D. Heeger, "Perception Lecture Notes: The Eye and Image Formation," Department of Psychology, New York University, 2006. [Online]. Available: http://www.cns.nyu.edu/~david/courses/perception/lecturenotes/eye/eye.html.

[9] A. Bringmann et al., "The primate fovea: Structure, function and development," Progress in Retinal and Eye Research, vol. 66, pp. 49-84, 2018.

[10] R. Nelson, "Ganglion cell physiology," Webvision: The Organization of the Retina and Visual System, 10 April 2007. [Online]. Available: https://webvision.med.utah.edu/book/part-ii-anatomy-and-physiology-of-the-retina/ganglion-cell-physiology/.

[11] P. J. Vance et al., "Bioinspired approach to modeling retinal ganglion cells using system identification techniques," IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 5, pp. 1796-1808, 2017.

[12] R. Nelson, "Visual responses of ganglion cells," in Webvision: The Organization of the Retina and Visual System, 2007.

[13] R. Gütig et al., "Computing complex visual features with retinal spike times," PLoS One, vol. 8, no. 1, p. e53063, 2013.

[14] B. Liu et al., "Predictive encoding of motion begins in the primate retina," bioRxiv, 2020.

[15] J. R. Sanes and R. H. Masland, "The types of retinal ganglion cells: current status and implications for neuronal classification," Annual Review of Neuroscience, vol. 38, pp. 221-246, 2015.

[16] T. Gollisch, "Features and functions of nonlinear spatial integration by retinal ganglion cells," Journal of Physiology-Paris, vol. 107, no. 5, pp. 338-348, 2013.

[17] Z. Yu et al., "Toward the Next Generation of Retinal Neuroprosthesis: Visual Computation with Spikes," Engineering, vol. 6, no. 4, pp. 449-461, 2020.

[18] D. Deriso, "The Brain As A Network," Scitable, 15 March 2012. [Online]. Available: https://www.nature.com/scitable/blog/the-artful-brain/the_brain_part_1/#:~:text=The%20brain%20is%20an%20enormously,with%201015%20synapses1.s.

[19] Source needed.

[20] "The Nobel Prize in Physiology or Medicine 1967," Nobel Media, [Online]. Available: https://www.nobelprize.org/prizes/medicine/1967/summary/.

[21] G. Schwartz and F. Rieke, "Nonlinear spatial encoding by retinal ganglion cells: when 1 + 1 ≠ 2," Journal of General Physiology, vol. 138, no. 3, pp. 283-290, 2011.

[22] S. B. Ryu et al., "Optimal linear filter based light intensity decoding from rabbit retinal ganglion cell spike trains," in Proc. 3rd International IEEE/EMBS Conference on Neural Engineering, 2007.

[23] R. H. Masland, "Processing and encoding of visual information in the retina," Current Opinion in Neurobiology, vol. 6, no. 4, pp. 467-474, 1996.

[24] "Quick intro without brain analogies," Stanford University, 2020. [Online]. Available: https://cs231n.github.io/neural-networks-1/.

[25] "What is synaptic plasticity?," The University of Queensland, [Online]. Available: https://qbi.uq.edu.au/brain-basics/brain/brain-physiology/what-synaptic-plasticity.

[26] R. G. Morris, T. Takeuchi and A. J. Duszkiewicz, "The synaptic plasticity and memory hypothesis: encoding, storage and persistence," Philosophical Transactions of the Royal Society B: Biological Sciences, 2014.

[27] djmw, "Feedforward neural networks 1. What is a feedforward neural network?," 26 April 2004. [Online]. Available: https://www.fon.hum.uva.nl/praat/manual/Feedforward_neural_networks_1__What_is_a_feedforward_ne.html.

[28] A. S. V, "Understanding Activation Functions in Neural Networks," The Theory Of Everything, 30 March 2017. [Online]. Available: https://medium.com/the-theory-of-everything/understanding-activation-functions-in-neural-networks-9491262884e0.

[29] L. Mou, "Notes on Back Propagation in 4 Lines," March 2015. [Online]. Available: http://sei.pku.edu.cn/~moull12/resource/backprop.pdf.

[30] W. D. Heaven, "OpenAI’s new language generator GPT-3 is shockingly good—and completely mindless," MIT Technology Review, 20 July 2020. [Online]. Available: https://www.technologyreview.com/2020/07/20/1005454/openai-machine-learning-language-generator-gpt-3-nlp/.

[31] M. Somers, "Deepfakes, explained," MIT Management Sloan School, 21 July 2020. [Online]. Available: https://mitsloan.mit.edu/ideas-made-to-matter/deepfakes-explained.

[32] Y. Zhang et al., "Reconstruction of natural visual scenes from neural spikes with deep neural networks," Neural Networks, vol. 125, pp. 19-30, 2020.

[33] Y. J. Kim et al., "Nonlinear decoding of natural images from large-scale primate retinal ganglion recordings," bioRxiv, 2020.

[34] N. Parthasarathy et al., "Neural networks for efficient Bayesian decoding of natural images from retinal neurons," in Advances in Neural Information Processing Systems, 2017.

[35] G. Lindsay, "Convolutional neural networks as a model of the visual system: past, present, and future," Journal of Cognitive Neuroscience, pp. 1-15, 2020.

[36] J. Kim et al., "Convolutional neural network with biologically inspired retinal structure," Procedia Computer Science, vol. 88, pp. 145-154, 2016.

[37] T. Zhou et al., "Classify multi-label images via improved CNN model with adversarial network," Multimedia Tools and Applications, vol. 79, no. 9, pp. 6871-6890, 2020.

[38] N. Maheswaranathan et al., "Deep learning models reveal internal structure and diverse computations in the retina under natural scenes," bioRxiv, p. 340943, 2018.

[39] Q. Yan et al., "Revealing structure components of the retina by deep learning networks," arXiv preprint arXiv:1711.02837, 2017.

[40] Q. Yan et al., "Revealing Fine Structures of the Retinal Receptive Field by Deep-Learning Networks," IEEE Transactions on Cybernetics, 2020.

[41] J. Glaser et al., "Machine learning for neural decoding," eNeuro, vol. 7, no. 4, 2020.

[42] O. Marre et al., "High accuracy decoding of dynamical motion from a large retinal population," PLoS Computational Biology, vol. 11, no. 7, p. e1004304, 2015.

[43] V. Botella-Soler et al., "Nonlinear decoding of a complex movie from the mammalian retina," PLoS Computational Biology, vol. 14, no. 5, p. e1006057, 2018.

[44] E. Fernandez and R. Normann, "Introduction to Visual Prostheses," Webvision, The Organization of the Retina and Visual System, [Online]. Available: https://webvision.med.utah.edu/book/part-xv-prosthetics%20/introduction-to-visual-prostheses-by-eduardo-fernandez-and-richard-normann/.

[45] DG CONNECT, "Understanding how the retina encodes information with the RENVISION project," European Commission, 25 April 2017. [Online]. Available: https://ec.europa.eu/digital-single-market/en/news/understanding-how-retina-encodes-information-renvision-project.

[46] J. Sharma, A. Angelucci and M. Sur, "Induction of visual orientation modules in auditory cortex," Nature, vol. 404, pp. 841-847, 2000.

[47] A. Vourvopoulos et al., "Effects of a brain-computer interface with virtual reality (VR) neurofeedback: A pilot study in chronic stroke patients," Frontiers in Human Neuroscience, vol. 13, p. 210, 2019.

[48] S. Bermúdez i Badia et al., "Virtual reality for sensorimotor rehabilitation post stroke: Design principles and evidence," in Neurorehabilitation Technology, 2016, pp. 573-603.