Astrophysicist Maksym Tsizh Elaborates on Universe Simulations and the Boundaries of Artificial Intelligence

Maksym Tsizh holds the title of Candidate of Physical and Mathematical Sciences and serves as a research fellow at the Astronomical Observatory of Ivan Franko National University of Lviv.

He studies the large-scale structure of the universe: how galaxies are distributed in space, what structures they form, and how they evolve, drawing on observational data and cosmological simulations.

We discuss how machine learning helps explore space, how the AI tools used in research differ from popular chatbots, and why humans remain the primary architects of scientific knowledge.

Maksym Tsizh

Maksym, do you use LLMs in your day-to-day work?

For the past two years I have used them almost daily, mostly for programming. They have become good enough to handle highly specialized tools, libraries, and programming languages. That is sufficient for simple tasks: separate modules that are later assembled into something larger. They will not write a 200-line script on their own, but they do well with 50-line pieces that can be combined into more complex solutions. I now start most of my scripts with a large language model, but I always design the architecture myself.

Why not allow the model to manage the architecture?

There is too much context to explain in full. It is faster and easier to write the architecture myself and then have the language model work on the individual parts.

Which model do you use?

Gemini is free and works well enough. Colleagues tell me there are better options, but it is sufficient for my needs.

You started out in academia, then spent several years as an R&D engineer in image recognition at a commercial company, and later returned to academia. Do you draw on that experience in your current research?

Certainly. During that time I learned a great deal about programming, machine learning, and computer vision. I became proficient with GitHub and various libraries, and I picked up many programming patterns. All of this feeds into what I do now. I also saw how far image recognition has advanced in recent years: tasks that once took a team of engineers over a week can now be done by one person in a day. I was no longer working there by then, but the change is telling.

Do you use vibe coding in your work?

This approach does not work for highly specialized tasks. There, each step in the chain of calculations has to be carried out systematically, and that cannot be done with a single prompt. In addition, large language models (LLMs) are trained on template solutions, whereas scientific work often calls for new solutions built from first principles. Vibe coding is an interesting development, but it has not yet been adopted in our field.

Can artificial intelligence genuinely be regarded as a form of intelligence, or merely as an advanced simulation?

Artificial intelligence remains a somewhat vague concept. There is a well-known phenomenon in which people keep raising the bar for what counts as intelligence. The Turing test used to be regarded as the standard benchmark, but today's AI systems pass it. So expectations keep rising: people want machines to show far more sophisticated abilities before they will call them truly intelligent. I believe this trend will continue for a long time.

I do not think truly human-like intelligence will appear for several decades, since we ourselves do not fully understand how it works. Whether to call this intelligence is essentially a semantic dispute and not a very interesting one. If such an intelligence does emerge, I think it is unlikely to be based on large language models. It seems to me that these models have already reached the peak of their capabilities.

What are you working on right now?

I work in several areas. The main one is observational cosmology. I am also involved in classifying galaxies from images, an area where many research groups are active. In recent years, though, my main interest has shifted to the large-scale structure of the universe.

Before I began my dissertation I did theoretical research, for example on models of dark energy, which mostly meant deriving equations. Now I am more interested in empirical work with measurements. For the large-scale structure of the universe, that means the distribution of matter, the arrangement of galaxies, the structures they form, and the details of their evolution.

How do you use machine learning in this work?

We use cosmological simulations for this, and I have worked with them extensively in recent years. They track how the large-scale distribution of structures evolves, and I try to make them as consistent as possible with data from telescope observations.

To train a model to recognize specific structural features, you first show it, on a small sample, where the large-scale formations are: filaments, walls, and various topological configurations in the distribution. Machine learning models trained this way on simulation results are then transferred to the observational data provided by telescopes.
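
The train-on-simulation, apply-to-observation workflow described above can be sketched in a few lines. This is only an illustration under stated assumptions: the two features (density contrast and shape anisotropy), the sample values, and the "void" label are hypothetical, and a simple nearest-centroid classifier stands in for whatever model a research group would actually use.

```python
def fit_centroids(features, labels):
    """Nearest-centroid training: average the feature vectors of each class."""
    grouped = {}
    for vec, label in zip(features, labels):
        grouped.setdefault(label, []).append(vec)
    return {
        label: [sum(column) / len(vecs) for column in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def predict(centroids, vec):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(label):
        return sum((c - v) ** 2 for c, v in zip(centroids[label], vec))
    return min(centroids, key=sq_dist)


# Hypothetical labeled sample taken from a simulation:
# each entry is (density contrast, shape anisotropy).
sim_features = [(0.1, 0.1), (0.2, 0.2), (1.0, 0.4), (1.2, 0.5), (2.0, 0.9), (2.4, 0.8)]
sim_labels = ["void", "void", "wall", "wall", "filament", "filament"]

centroids = fit_centroids(sim_features, sim_labels)

# Hypothetical measurements derived from a galaxy survey (no labels available).
obs_features = [(0.18, 0.12), (1.05, 0.5), (2.3, 0.8)]
obs_labels = [predict(centroids, vec) for vec in obs_features]
print(obs_labels)  # ['void', 'wall', 'filament']
```

The sketch ignores the hardest part of the real problem, which is that simulated and observed data differ in noise, selection effects, and resolution, so a model trained on one does not transfer to the other automatically.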

Telescopes carry out extensive sky surveys, and many such instruments are already in operation. These surveys cover tens of millions of galaxies. Models also help reconstruct the distribution of matter in the universe.

The Vera Rubin Observatory has already commenced its initial observations and will soon produce 20 terabytes of data each night. Is it now impractical to manage such a data stream without the application of machine learning?

There is no alternative. Ever since digital imaging arrived more than 30 years ago, images have had to be processed by computer right away. We now live in an era of data overload, and astronomy is no exception. Many telescopes are in operation, and their resolution is extremely high.

Machine learning is indispensable at almost every stage: filtering and preprocessing to remove noise and unwanted elements from images, and then analyzing, classifying, and recognizing objects. Machine learning algorithms have long been integral to this field and continue to evolve.

But telescopes do not operate only in the visible spectrum. What about radio, X-ray, or infrared radiation?

All of this can be turned into an image. Radio waves and X-rays are also captured by a telescope, just at different wavelengths. It is the same phenomenon: they are all electromagnetic waves.

Could you please explain to a layperson how a large language model (LLM) differs from machine learning techniques specifically designed for particular tasks in cosmology?

In a large language model (LLM), the basic processing unit is a token: a unit somewhere between a single letter and a complete meaningful segment of text. These models are trained on the sequences in which tokens occur across very large datasets. As a result, when a user enters text, the model can predict the most probable next token. This looks highly sophisticated, but it should be regarded as an illusion of intelligence.
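
To make next-token prediction concrete, here is a deliberately tiny sketch. It assumes whitespace-separated words as tokens and a bigram count table as the "model"; real LLMs use subword tokens and a neural network conditioned on a long context, so this only illustrates the idea of predicting the most probable continuation from observed sequences. The corpus and function names are made up for the example.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token is followed by each other token."""
    following = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        following[current][nxt] += 1
    return following


def most_probable_next(following, token):
    """Return the most frequent successor of `token`, or None if it was never seen."""
    if token not in following:
        return None
    return following[token].most_common(1)[0][0]


# Toy corpus; splitting on whitespace is a simplification of real tokenization.
corpus = "the galaxy forms a filament and the galaxy forms a wall and the galaxy rotates"
model = train_bigram(corpus.split())

print(most_probable_next(model, "the"))     # galaxy
print(most_probable_next(model, "galaxy"))  # forms
```

The table knows nothing about galaxies; it only reflects which token followed which in the training text, which is the point the answer above is making.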

Convolutional neural networks (CNNs), by contrast, are used mainly for image processing in many fields, including cosmology and astrophysics. In such models, the basic unit of data is a pixel rather than a token. The nature of the data shapes the model’s properties and the way it works: transformer-based language models work with linguistic tokens, while convolutional neural networks work with pixels and the spatial relationships between them.
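
The phrase "pixels and their spatial relationships" can be illustrated with the core operation of a CNN layer: sliding a small kernel over an image and summing the products. The sketch below is a plain-Python version with no padding or stride; the image and the hand-picked edge-detecting kernel are made up for the example, whereas a real network learns its kernels during training. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this is cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (valid positions only) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# A 4x4 image: dark on the left (0), bright on the right (1).
image = [[0, 0, 1, 1] for _ in range(4)]

# A 2x2 kernel that responds to a left-to-right increase in brightness.
kernel = [[-1, 1],
          [-1, 1]]

edges = convolve2d(image, kernel)
print(edges)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The output is non-zero only where neighboring pixels differ, which shows how the result depends on the arrangement of pixels rather than on any single pixel value.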

So to understand the difference, start by looking at the data the model works with. In cosmology in particular, we rely mostly on visual information, which is why these specialized tools are necessary.


Maksym, could you give an example of how you use large language models (LLMs) in your research?

For example, I wrote a paper, and the reviewer said the work was good but pointed to another publication and asked whether I could replicate its method on my own data. That publication describes a very specific function that is fitted to the data, with its parameters chosen so as to recover an unknown function from the data. Since this is not my main research area, I simply gave the paper to a large language model (LLM) and said: here is the function, write a script that fits the data with it. I could have done this myself, but this approach saved a great deal of time.
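
The kind of script described here, fitting a parametric function to data, might look like the sketch below. The interview does not say which function the paper used, so a power law y = a * x**b is assumed purely as a stand-in, and the data points are synthetic. The fit is ordinary least squares on the logarithms of x and y, which is not the same as a nonlinear fit on the raw values when the data are noisy.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on (log x, log y); requires x, y > 0."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, log_y))
    b = sxy / sxx                       # slope in log-log space
    a = math.exp(mean_y - b * mean_x)   # intercept converted back to linear space
    return a, b


# Synthetic, noise-free data generated from a = 3.0, b = -1.5 (hypothetical values).
xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0 * x ** -1.5 for x in xs]

a, b = fit_power_law(xs, ys)
print(round(a, 6), round(b, 6))
```

A script of this size is exactly the sort of small, easily verified module for which a quick check against known inputs, as done here with synthetic data, is enough to catch mistakes.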

Another example: if I want to learn about the latest developments in a field not too far removed from my own, I can ask the LLM to “provide sources on such-and-such a topic.” Why “not too far removed”? Because if the subject were microbiology or another field I know nothing about, I could not tell whether I had been misled. In a neighboring field, I can spot the errors.

It can find links on a given subject, run an in-depth analysis, summarize, and quickly draw conclusions. However, these are not matters of great importance, and the results are easy to verify. I would not rely on it alone for anything critical.

Do you believe in scientific intuition? If a machine can recognize patterns more effectively than a human, is there still a place for humans in science?

Intuition has historically played a pivotal role. However, the enigma surrounding scientific intuition reflects the broader mystery of human intuition as a whole. Its mechanisms remain incompletely understood; nevertheless, it is undeniably beneficial. Frankly, I find it challenging to compare this with machine algorithms, as scientific intuition cannot be easily suppressed either.

In general, though, it probably plays a smaller role now that the era of big data has arrived. Intuition used to be valuable because it dealt with physics, something tangible in the world around us, until science advanced to the point where we have to work with huge datasets.

You manage a Telegram channel dedicated to science. Do you utilize AI models in your work for content creation or image generation?

For me, the channel is mainly for fun. It is a place for pleasant conversations with like-minded people and a way to unwind. The tone is lighthearted because it is not meant to be a serious undertaking. I pick interesting news from the frontiers of science myself and share it there. I do not need help with this, because I enjoy writing the posts and adding humor. I do not earn money from it and do not plan to. I do it purely for enjoyment.

I suppose you could set up an agent to search for news in a particular field and monitor arXiv, creating a kind of automated feed. But for me the channel is mainly a way to talk with like-minded people, not a formal project.

You popularize science and work with students. Do you think the use of AI could lower the quality of training for the next generation of scientists?

I see it differently. It is a great advantage that AI in all its forms is available to students, and it is a pity that it was not available earlier. Of course, how well it works depends on the approach taken by the university, the instructor, and those responsible for training future professionals. When students start writing a term paper or a bachelor’s or master’s thesis, I tell them: “Use every resource available.”

However, I aim to give them a task that AI alone could not complete. That way I can let them use these tools to learn more. I used to give them extra materials or hints; now AI handles much of that. I have not seen any downsides to this approach.

It used to hallucinate more often; this is now much less common. For instance, it might give a link to a non-existent article or cite a work that does not exist; in other words, it makes things up. I always warn students about this: they may not find the exact source the model cites, but they will often come across material on a similar topic to what they were looking for.

At the very least, they gain experience working with this kind of AI, learn that it can say things that are not true, and understand the need to verify information. That is the main thing they learn.

It also helps them with coding, and I have no objection to that. I am not responsible for the quality of their programming, but it matters to me that they have a reliable tool. They need scripts for data processing. If AI produces a working script that gives correct results, I will review it, they will understand how it works, and that is enough. They spend less time on coding, which I consider a significant advantage.

So AI does not stop people from thinking critically, but rather frees them from routine work?

A person who does nothing, or deliberately chooses tasks that require no mental effort, does not think. But in my experience, no single AI system, nor any combination of them, is enough to fully replace a researcher. You still have to analyze the results yourself and consider the various factors that could have affected them.

AI is not yet sophisticated enough for this, and that is unlikely to change in the near future. Large language models (LLMs) do not exist in the physical world; they operate in a world of written texts and sources. Only humans live in and experience the real world. Systems that could navigate the physical environment would be a particularly interesting development.

A considerable number of scientists are presently either engaged in military service or have emigrated from the country. To what extent can artificial intelligence serve as a partial remedy for the shortage of specialists in Ukrainian science?

We conducted our own research on this topic. It showed that, due to the war, Ukrainian astronomers have been publishing approximately 20% fewer scientific papers. Many of our staff members — both from the university and the observatory — are serving in the Armed Forces of Ukraine. In addition, many Ukrainian scientists have had to leave the country. All of these factors combined have led to this outcome.

Can it replace human intelligence? As I said, I see no sign that it can fully replace the profession of a researcher in any field. Research requires a combination of skills: coding, formulating hypotheses, structuring the research, and publishing papers. AI can automate many tasks, but the critical ones will still need a human. Some stages may go faster, but not dramatically, nothing close to tenfold, because everything an LLM produces has to be carefully checked and proofread. I do not think it will substantially change the overall picture, but it helps considerably in many respects.

Moreover, it is not only Ukrainian scientists who use AI. The scientific community as a whole is getting used to the fact that this tool is now available to everyone, and expectations are shifting accordingly. Researchers may, on average, be expected to publish more papers per year. It may also become standard to fully disclose the algorithms used.

Will the standards for researchers change?

First of all, the requirement for reproducibility will become stricter. It must be spelled out, step by step, how the research was conducted and how the conclusions were reached. This has always been a basic scientific standard, but it matters even more with the arrival of AI. When using such tools, you cannot simply obtain results and hide the methods; that goes against scientific integrity. Without transparency and verifiability of the whole process, there can be no genuine scientific result.

A new standard is taking shape that requires authors to disclose openly any help they received from AI at any stage, whether in writing text or in programming. The scientific community is gradually adapting to AI and to the need for rules that everyone follows.

Have Ukrainian researchers gained more access to data or modeling results since the start of the war?

I cannot give a specific example, but overall many new opportunities have appeared. At the start of the war they were not evenly distributed. Western institutions offered positions to researchers who were leaving Ukraine, while it was somewhat harder for those who stayed in dangerous regions to get support. Still, such opportunities did exist.

Special grant competitions were launched for our scientists, and Ukrainian researchers continued to take part in international research teams. Support took various forms, including competitions and funding to preserve, strengthen, or restore research infrastructure in Ukraine.

Some publishers gave Ukrainian researchers free access to materials that are otherwise available only for a fee. Some journals, such as “Monthly Notices of the Royal Astronomical Society,” offered Ukrainian researchers free publication. We were not always able to take advantage of every offer, but there are certainly examples of such initiatives.

To sum up our discussion: do humans remain the main creators of scientific knowledge, with AI serving only as a tool in their hands?

Certainly. Machine learning and large language models help with data processing, coding, and finding sources. But it is people who formulate hypotheses, design the research, and interpret the results. That said, these tools keep improving.
