Dear Sara,

I reply below to your claims.

From: Sara A. Solla <sasolla@gmail.com>
Date: Monday, October 21, 2024 at 5:12 PM
To: Grossberg, Stephen <steve@bu.edu>
Cc: André Fabio Kohn via Comp-neuro <comp-neuro@lists.cnsorg.org>, Stephen Grossberg <steve@cns.bu.edu>
Subject: Re: [Comp-neuro] Re: Some scientific history that I experienced relevant to the recent Nobel Prizes to Hopfield and Hinton

> Stephen, You have been complaining about not getting enough credit since I first met you in the mid 1980s.

That is not true.

> You refer to publishing before Hopfield's 1984 paper. You deliberately ignore his 1982 paper, which received 27946 citations:
>
> Neural networks and physical systems with emergent collective computational abilities. J. J. Hopfield, PNAS, April 15, 1982, 79(8), 2554-2558.

My post was originally on a web site with strict word limits. That is why I mentioned Hopfield's 1984 article, but not his 1982 article. Section 6 of my 1988 review article:

Grossberg, S. (1988). Nonlinear neural networks: Principles, mechanisms, and architectures. Neural Networks, 1, 17-61.
https://sites.bu.edu/steveg/files/2016/06/Gro1988NN.pdf

reviews Binary, Linear, and Continuous-Nonlinear classical neural network models by many authors. This article has been cited 2410 times. Section 6 shows that the 1982 equation that Hopfield used is a variant of the 1943 McCulloch-Pitts model. The Liapunov function for it is just a discrete version of the Liapunov function for the Additive Model.

> In contrast, the 1971 student paper you mentioned received 118 citations.

I assume that you mean:

Grossberg, S. (1971). Embedding fields: Underlying philosophy, mathematics, and applications to psychology, physiology, and anatomy. Journal of Cybernetics, 1, 28-50.
https://sites.bu.edu/steveg/files/2016/06/Gro1971JoC.pdf

This was a review article published in a journal that ceased to exist in 1980 (see below) and thus could not be searched well by Google, which began in 1998. The article discussed Generalized Additive Models (see its Section 6). I was able to get articles published proving global limit and oscillation theorems for Generalized Additive Models in the Proceedings of the National Academy of Sciences between 1967 and 1971; e.g.,

Grossberg, S. (1971). Pavlovian pattern learning by nonlinear neural networks. Proceedings of the National Academy of Sciences, 68, 828-831.
https://sites.bu.edu/steveg/files/2016/06/Gro1971ProNatAcaSci.pdf

See its Equations 1, 2, 3, and 4, etc.

> Your 1983 paper with Michael Cohen in IEEE Transactions on Systems, Man, and Cybernetics was received on August 1, 1982, several months after the 1982 paper by Hopfield had appeared in print. You did get a good number of citations on this paper, 3344, but not as good as 27946.

Michael Cohen and I tried to submit it to the Journal of Cybernetics in 1980. But there was a glitch in the submission process: the Journal of Cybernetics unexpectedly stopped publishing that year, and our article got lost in the shuffle, unknown to us for a while. We resubmitted in 1982, which is why it was published in 1983. Our results pre-dated Hopfield 1982 and 1984 by two years.

About the difference in citations: there is a difference between discovery and marketing. I was told that Hopfield knew about my work, both alone and with Michael Cohen, before he went around the country lecturing about work that we had previously published, without citation.

> You mention a 1982 paper with Michael Cohen; I have not been able to find it.

I think that was a typo. Michael published the following article in 1992:

Cohen, M. A. (1992). The construction of arbitrary stable dynamics in nonlinear neural networks. Neural Networks, 5, 83-103.
https://www.sciencedirect.com/science/article/abs/pii/S0893608005800085

Here is the Abstract:

"In this paper, two methods for constructing systems of ordinary differential equations realizing any fixed finite set of equilibria in any fixed finite dimension are introduced; no spurious equilibria are possible for either method. By using the first method, one can construct a system with the fewest number of equilibria, given a fixed set of attractors. Using a strict Lyapunov function [boldface mine] for each of these differential equations, a large class of systems with the same set of equilibria is constructed. A method of fitting these nonlinear systems to trajectories is proposed. In addition, a general method which will produce an arbitrary number of periodic orbits of shapes of arbitrary complexity is also discussed. A more general second method is given to construct a differential equation which converges to a fixed given finite set of equilibria. This technique is much more general in that it allows this set of equilibria to have any of a large class of indices which are consistent with the Morse Inequalities. It is clear that this class is not universal, because there is a large class of additional vector fields with convergent dynamics which cannot be constructed by the above method. The easiest way to see this is to enumerate the set of Morse indices which can be obtained by the above method and compare this class with the class of Morse indices of arbitrary differential equations with convergent dynamics. The former set of indices are a proper subclass of the latter, therefore, the above construction cannot be universal. In general, it is a difficult open problem to construct a specific example of a differential equation with a given fixed set of equilibria, permissible Morse indices, and permissible connections between stable and unstable manifolds. A strict Lyapunov function is given for this second case as well.
This strict Lyapunov function [boldface mine] as above enables construction of a large class of examples consistent with these more complicated dynamics and indices. The determination of all the basins of attraction in the general case for these systems is also difficult and open."

> I have never heard anybody referring to you as 'the father of AI'.

As an acolyte of Hopfield, you wouldn't.

> What I do remember is a visit to your group at BU many, many years ago - in the late 80s or early 90s. I was amazed at finding out that you closely supervised every word in every slide that anybody in your group was allowed to show when invited to give a talk. Everybody was under a lot of pressure to 'stay on message', where the 'message' was your view on things. I had never encountered a scientific group run as a cult. It made an impression, but not a positive one. So much for training over 100 people - by teaching them not to think by themselves.

You describe my devoted teaching as something ugly, even ad hominem. When I was a student, no one told me how to write an article or how to prepare and give a talk or poster. I developed a method to enable my students to grow intellectually and become independent scholars. When I started working with a new PhD student or postdoc, we discussed what topics would potentially interest them. Then we read hundreds of psychological and neurobiological articles together about that topic and began to discuss the results in them. For students interested in engineering and AI, we studied all the methods that were available to solve problems in a targeted problem domain. I encouraged my students to advance their own concepts about what might be the underlying mechanisms that generated an article's data or benchmarks. Because we studied large amounts of data, favorite concepts often hit a brick wall and had to be discarded. We developed a modeling method that I like to call the Method of Minimal Anatomies.
See Figure 2.37 in my 2021 Magnum Opus:
https://www.amazon.com/Conscious-Mind-Resonant-Brain-Makes/dp/0190070552

This Method is just Occam's Razor applied to neural networks. As the Figure summarizes, it "Operationalizes the proper level of abstraction" and acknowledges "that you cannot 'derive a brain' in one step". Our models have been incrementally developed in a principled way to the present time. Your statement that "the message was your [my] view on things" is the opposite of how we worked. And how would you know? You were not there.

The proof of the pudding is that all my students got positions that they wanted after they earned their PhDs with me. I was also told by multiple famous scientists that our department sent them their best postdoctoral fellows. In fact, various major labs would contact me to let me know that they had an open position and to ask if one of my students was about to graduate. My students have gone on to successful careers after their postdocs and other first post-PhD positions. Around half of them went into academe, and the other half into engineering, technology, and AI. Groups of us have periodically gotten together for happy social events, either at international conferences or in the Boston area. And they sent me information about their accomplishments and growing families for many years.

> As for back-propagation, this is not what Hinton is cited for in the Nobel Prize citation. It is for the Boltzmann machine.

Yes, I know that, and I have also noted that my PhD student, James Williamson, developed statistical neural network models that did not have problems related to Deep Learning. I have elsewhere noted that Restricted Boltzmann Machines (RBMs) also have analogs in the Adaptive Resonance Theory (ART) family of models, but without the problems that follow from using variants of Deep Learning. Here are two of them, both developed by my PhD student, James R.
Williamson:

Gaussian ARTMAP: A Neural Network for Fast Incremental Learning of Noisy Multidimensional Maps
https://www.sciencedirect.com/science/article/pii/0893608095001158

A Constructive, Incremental Learning Network for Mixture Modeling and Classification
http://techlab.bu.edu/files/resources/articles_tt/A%20constructive,%20increm...

> Finally, I always wondered: If ART solves all problems, why are there ML/AI problems that remain to be solved?

I would never claim that. I have always taught that science is never done. See the Method of Minimal Anatomies. The fact that you can even think that about my work shows, sad to say, how biased you have become.

On Mon, Oct 21, 2024 at 7:49 AM Grossberg, Stephen via Comp-neuro <comp-neuro@lists.cnsorg.org> wrote:

Dear Comp-neuro colleagues,

Here are some short summaries of the history of neural network discoveries, as I experienced it, that are relevant to the recent Nobel Prizes to Hopfield and Hinton:

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

THE NOBEL PRIZES IN PHYSICS TO HOPFIELD AND HINTON FOR MODELS THEY DID NOT DISCOVER: THE CASE OF HOPFIELD

Here I summarize my concerns about the Hopfield award. I published articles in 1967-1972 in the Proceedings of the National Academy of Sciences that introduced the Additive Model that Hopfield used in 1984. My articles proved global theorems about the limits and oscillations of my Generalized Additive Models. See https://sites.bu.edu/steveg for these articles. For example:

Grossberg, S. (1971). Pavlovian pattern learning by nonlinear neural networks. Proceedings of the National Academy of Sciences, 68, 828-831.
https://lnkd.in/emzwx4Tw

This article illustrates that my mathematical results were part of a research program to develop biological neural networks that provide principled mechanistic explanations of psychological and neurobiological data.
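For concreteness, the Additive Model referred to above has a standard form that can be sketched as follows (my own rendering of the commonly cited equations, not a verbatim quote from any one paper):

```latex
% Additive Model: STM activity x_i of cell (population) i
\frac{dx_i}{dt} = -A_i x_i + \sum_{j=1}^{n} f_j(x_j)\, z_{ji} + I_i
```

Here \(A_i\) is a decay rate, \(f_j\) a (typically sigmoid) signal function, \(z_{ji}\) an adaptive weight (LTM trace), and \(I_i\) an external input. Hopfield's 1984 circuit equations have this additive form with fixed symmetric weights.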
Later, Michael Cohen and I published a Liapunov function that included the Additive Model and generalizations thereof in 1982 and 1983, before Hopfield (1984) appeared. For example:

Cohen, M. A., and Grossberg, S. (1983). Absolute stability of global pattern formation and parallel memory storage by competitive neural networks. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13, 815-826.
https://lnkd.in/eAFAdvbu

I was told that Hopfield knew about my work before he published his 1984 article, without citation.

Recall that I started my neural networks research in 1957 as a Freshman at Dartmouth College. That year, I introduced the biological neural network paradigm, as well as the short-term memory (STM), medium-term memory (MTM), and long-term memory (LTM) laws that are used to this day, including in the Additive Model, to explain data about how brains make minds. See the review in https://lnkd.in/gJZJtP_W . When I started in 1957, I knew no one else who was doing neural networks. That is why my colleagues call me the Father of AI.

I then worked hard to create a neural networks community, notably a research center, academic department, the International Neural Network Society, the journal Neural Networks, multiple international conferences on neural networks, and Boston-area research centers, while training over 100 gifted PhD students, postdocs, and faculty to do neural network research. See the Wikipedia page. That is why I did not have time or strength to fight for priority of my models.
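For readers who want the mathematics, the 1983 Cohen-Grossberg result mentioned above can be sketched as follows (my paraphrase of the standard statement, not a verbatim quote from the paper):

```latex
% Cohen-Grossberg competitive system, with symmetric interactions c_{ij} = c_{ji}:
\frac{dx_i}{dt} = a_i(x_i)\Big[\, b_i(x_i) - \sum_{j=1}^{n} c_{ij}\, d_j(x_j) \Big]

% Global Liapunov function, nonincreasing along trajectories:
V = -\sum_{i=1}^{n} \int_{0}^{x_i} b_i(\xi)\, d_i'(\xi)\, d\xi
    + \frac{1}{2} \sum_{j,k=1}^{n} c_{jk}\, d_j(x_j)\, d_k(x_k)
```

With \(a_i \equiv 1\) and suitable choices of \(b_i\), \(c_{ij}\), and \(d_j\), this system reduces to the Additive Model with symmetric weights, and the Hopfield (1984) energy function is the corresponding special case of \(V\).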
Recently, I was able to provide a self-contained and non-technical overview and synthesis of some of my scientific discoveries since 1957, as well as explanations of the work of many other scientists, in my 2021 Magnum Opus:

Conscious Mind, Resonant Brain: How Each Brain Makes a Mind
https://lnkd.in/eiJh4Ti

++++++++++++++++++++++++++++++++++++++++++++++++++++

THE NOBEL PRIZES IN PHYSICS TO HOPFIELD AND HINTON FOR MODELS THEY DID NOT DISCOVER: THE CASE OF HINTON

Here I summarize my concerns about the Hinton award. Many authors developed Back Propagation (BP) before Hinton; e.g., Amari (1967), Werbos (1974), and Parker (1982), all before Rumelhart, Hinton, and Williams (1986).

BP has serious computational weaknesses: It is UNTRUSTWORTHY (because it is UNEXPLAINABLE). It is UNRELIABLE (because it can experience CATASTROPHIC FORGETTING). It should thus never be used in financial or medical applications. BP learning is also SLOW and uses non-biological NONLOCAL WEIGHT TRANSPORT. See Figure, right column, top.

In 1988, I published 17 computational problems of BP:
https://lnkd.in/erKJvXFA

BP gradually fell out of favor because other models were better. Later, huge online databases and supercomputers enabled Deep Learning to use BP to learn.

My 1988 article contrasted BP with Adaptive Resonance Theory (ART), which I first published in 1976:
https://lnkd.in/evkfq22G

See Figure, right column, bottom. ART never had BP's problems. ART is now the most advanced cognitive and neural theory that explains HOW HUMANS LEARN TO ATTEND, RECOGNIZE, and PREDICT events in a changing world. ART also explains and simulates data from hundreds of psychological and neurobiological experiments.

In 1980, I derived ART from a THOUGHT EXPERIMENT about how ANY system can AUTONOMOUSLY learn to correct predictive errors in a changing world:
https://lnkd.in/eGWE8kJg

The thought experiment derives ART from a few facts of life that do not mention mind or brain.
ART is thus a UNIVERSAL solution of the problem of autonomous error correction in a changing world. That is why ART models can be used in designs for AUTONOMOUS ADAPTIVE INTELLIGENCE in engineering, technology, and AI.

ART also proposes a solution of the classical MIND-BODY PROBLEM: HOW, WHERE in our brains, and WHY from a deep computational perspective, we CONSCIOUSLY SEE, HEAR, FEEL, and KNOW about the world, and use our conscious states to PLAN and ACT to realize VALUED GOALS. For details, see:

Conscious Mind, Resonant Brain: How Each Brain Makes a Mind
https://lnkd.in/eiJh4Ti

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

_______________________________________________
Comp-neuro mailing list -- comp-neuro@lists.cnsorg.org
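The CATASTROPHIC FORGETTING weakness attributed to back propagation above can be seen even in a toy gradient-descent learner. The sketch below is my own illustration, not taken from any of the cited papers: a single linear unit is trained to convergence on one association, then on an overlapping second association with no rehearsal of the first; the second task's gradient updates overwrite the weights the first task relied on.

```python
import numpy as np

def train(w, x, target, lr=0.1, steps=200):
    """Plain gradient descent on squared error for a linear unit y = w . x."""
    for _ in range(steps):
        y = w @ x
        w = w - lr * (y - target) * x  # gradient of 0.5 * (y - target)**2
    return w

w = np.zeros(2)

# Task A: input [1, 0] should map to 1.0; trained to convergence.
x_a, t_a = np.array([1.0, 0.0]), 1.0
w = train(w, x_a, t_a)
err_a_before = abs(w @ x_a - t_a)  # essentially zero

# Task B: overlapping input [1, 1] should map to 0.0 (no rehearsal of Task A).
w = train(w, np.array([1.0, 1.0]), 0.0)
err_a_after = abs(w @ x_a - t_a)  # Task A has been "forgotten": error jumps to ~0.5

print(f"Task A error before Task B: {err_a_before:.6f}")
print(f"Task A error after  Task B: {err_a_after:.6f}")
```

Rehearsal-free sequential training on correlated inputs degrades the earlier association because both tasks compete for the same shared weights; this is the failure mode that incremental-learning architectures such as ART are designed to avoid.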