CatBoost AI Model Achieves R² of 0.988 in Predicting Water-Cleaning Biochar Performance

Key Takeaways

All-Star AI Model: Out of nine machine learning models tested, CatBoost performed best, predicting biochar’s dye adsorption capacity with an R² of 0.988 (it accounts for about 98.8% of the variation in the measured data).
Conditions Matter Most: The model revealed that experimental conditions (such as dye concentration and temperature) are the most important group of factors for dye removal, accounting for 50.8% of feature importance and outweighing the biochar’s physical and chemical properties (34.1%).
The #1 Factor: The single most significant predictor of adsorption performance was C0, the ratio of the initial dye concentration to the biochar dose.
Proven in the Lab: The model’s predictions aren’t just theoretical. They were tested against new, real-world lab experiments and confirmed to be highly accurate, achieving a validation R² of 0.9037.
A Tool for All: The researchers built a user-friendly graphical user interface (GUI) around the model, so others can use it to quickly predict biochar performance without data-science expertise.

Industrial dyes from textiles, plastics, and paper manufacturing pollute wastewater with toxic, colorful compounds that are notoriously difficult to remove. A promising solution to this environmental challenge is biochar, which acts like a sponge that adsorbs and traps these dyes. The problem is that finding the best combination of biochar type, dye, and treatment conditions can require thousands of slow, costly lab experiments. A new study by Chong Liu and colleagues, published in Carbon Research, uses artificial intelligence to cut out this guesswork, developing a model that predicts how well a biochar will perform with an R² of 0.988.

The research team set out to build the most robust predictive model to date. They began by compiling a large dataset from the scientific literature: 685 data points from experiments involving 43 types of biochar and 15 dyes. Each data point included 17 input features, such as the biochar’s carbon content and surface area, the water’s pH and temperature, and the dye’s molecular properties. The team fed this data into nine machine learning (ML) models to see which could most accurately predict the biochar’s “adsorption capacity” (how much dye each gram of biochar takes up). After rigorous data processing and training, one model emerged as the clear winner: CatBoost.
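In batch adsorption studies, adsorption capacity is conventionally computed from a mass balance, q_e = (C_init - C_e) · V / m: the drop in dye concentration multiplied by the solution volume, divided by the biochar mass. The article does not say how each literature value was derived, so the sketch below simply assumes this standard definition; the helper name and the batch numbers are hypothetical.

```python
def adsorption_capacity(c_init_mg_l, c_eq_mg_l, volume_l, mass_g):
    """Equilibrium adsorption capacity q_e in mg of dye per g of biochar.

    Standard batch mass balance: q_e = (C_init - C_e) * V / m.
    Hypothetical helper for illustration; not code from the study.
    """
    if mass_g <= 0:
        raise ValueError("biochar mass must be positive")
    return (c_init_mg_l - c_eq_mg_l) * volume_l / mass_g


# Hypothetical batch test: 50 mL of 100 mg/L dye, 20 mg of biochar,
# with 20 mg/L of dye still in solution at equilibrium.
q_e = adsorption_capacity(100.0, 20.0, 0.050, 0.020)
print(f"q_e = {q_e:.1f} mg/g")  # 80 mg/L * 0.05 L / 0.02 g = 200 mg/g
```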

The CatBoost model performed exceptionally well, achieving a coefficient of determination (R²) of 0.9880. A value this close to 1.0 means the model’s predictions track the experimental outcomes very closely: it accounts for about 98.8% of the variation in measured adsorption capacity. It clearly outperformed the eight other models, including popular tree-based models such as Random Forest and kernel-based models such as Kriging. The model was also highly stable, maintaining its accuracy across 1,000 random validation iterations, which confirmed it as a reliable and robust prediction tool.
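R² compares a model’s squared prediction errors with the spread of the measurements around their mean, R² = 1 - SS_res / SS_tot, so 1.0 is a perfect fit and 0 is no better than always predicting the average. Stability over repeated random validation can be checked by re-splitting the data many times and looking at the spread of R². The sketch below assumes an 80/20 random hold-out in each iteration and uses an ordinary least-squares line on synthetic data as a hypothetical stand-in for CatBoost; the article does not describe the authors’ exact splitting scheme, and none of these numbers come from the study.

```python
import random
import statistics


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least-squares line: a hypothetical stand-in for CatBoost."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def repeated_holdout(xs, ys, iterations=1000, test_frac=0.2, seed=0):
    """Refit on a fresh random split each iteration; return mean and stdev of test R²."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    n_test = int(len(xs) * test_frac)
    scores = []
    for _ in range(iterations):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [slope * xs[i] + intercept for i in test]
        scores.append(r_squared([ys[i] for i in test], preds))
    return statistics.fmean(scores), statistics.stdev(scores)


# Synthetic data only (not from the study): y = 3x + 5 plus Gaussian noise.
data_rng = random.Random(42)
xs = [data_rng.uniform(0, 10) for _ in range(200)]
ys = [3 * x + 5 + data_rng.gauss(0, 1) for x in xs]
mean_r2, sd_r2 = repeated_holdout(xs, ys)
print(f"mean R^2 = {mean_r2:.3f}, stdev = {sd_r2:.3f}")
```

A small standard deviation across the 1,000 splits is what “stable” means here: the score does not hinge on one lucky train/test partition.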

Having a highly accurate model allowed the researchers to look “under the hood” and ask a critical question: what really matters for removing dye? The model’s feature importance analysis provided a clear answer. The dominant group of factors, accounting for 50.8% of the impact, was the “experimental conditions”. This was significantly more important than the biochar’s own physical and chemical characteristics (34.1%) or the specific type of dye being removed (15.1%).
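Group-level figures such as 50.8%, 34.1%, and 15.1% are typically obtained by summing per-feature importance scores within each category and normalizing the totals to 100%. The article does not say exactly how the authors aggregated them, so the sketch below assumes simple summation; the feature names, group labels, and importance values are all hypothetical and are not the study’s.

```python
from collections import defaultdict

# Hypothetical per-feature importance scores (illustrative, NOT the study's values).
FEATURE_IMPORTANCE = {
    "c0_ratio": 30.0,
    "temperature": 8.0,
    "ph": 7.0,
    "contact_time": 5.0,
    "bet_surface_area": 20.0,
    "carbon_content": 10.0,
    "dye_molecular_weight": 12.0,
    "dye_charge": 8.0,
}

# Hypothetical mapping of each feature to one of the three groups named in the article.
FEATURE_GROUP = {
    "c0_ratio": "experimental conditions",
    "temperature": "experimental conditions",
    "ph": "experimental conditions",
    "contact_time": "experimental conditions",
    "bet_surface_area": "biochar properties",
    "carbon_content": "biochar properties",
    "dye_molecular_weight": "dye properties",
    "dye_charge": "dye properties",
}


def group_shares(importance, groups):
    """Sum feature importances per group and express each group as % of the total."""
    totals = defaultdict(float)
    for feature, score in importance.items():
        totals[groups[feature]] += score
    grand_total = sum(totals.values())
    return {group: 100.0 * score / grand_total for group, score in totals.items()}


shares = group_shares(FEATURE_IMPORTANCE, FEATURE_GROUP)
for group, pct in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{group}: {pct:.1f}%")
```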

Drilling down further, the single most significant feature of all was C0, the ratio of the initial dye concentration to the dose of biochar. The model showed that adsorption capacity increases dramatically as this ratio rises, as a higher concentration provides a stronger “driving force” to push dye molecules onto the biochar’s surface. This effect works up to a certain point, after which the biochar’s active sites become saturated, and the performance plateaus. The second most important factor was the biochar’s specific surface area (BET), which showed a similar pattern: performance increased until the surface area reached about 750 m²/g, and then leveled off.
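Because C0 divides the initial dye concentration (mg/L) by the biochar dose (g/L), it has units of mg/g, the same as adsorption capacity, and it caps capacity from above: biochar cannot take up more dye than was added. The rise-then-plateau pattern can be illustrated by combining that mass balance with a Langmuir isotherm. This is a swapped-in textbook model, not the study’s method; the values of q_max, K, and the dose below are hypothetical.

```python
import math


def c0_ratio(initial_conc_mg_l, dose_g_l):
    """Initial dye concentration / biochar dose (mg/L divided by g/L gives mg/g)."""
    return initial_conc_mg_l / dose_g_l


def langmuir_batch_capacity(initial_conc_mg_l, dose_g_l, q_max_mg_g, k_l_per_mg):
    """Equilibrium capacity q_e (mg/g) for a batch test, ASSUMING Langmuir adsorption.

    Mass balance:  q_e = (C_init - C_e) / dose
    Langmuir:      q_e = q_max * K * C_e / (1 + K * C_e)
    Eliminating q_e gives K*C_e**2 + b*C_e - C_init = 0 with
    b = 1 + dose*q_max*K - K*C_init. The positive root is computed in
    whichever algebraically equivalent form avoids cancellation.
    """
    k, c0 = k_l_per_mg, initial_conc_mg_l
    b = 1.0 + dose_g_l * q_max_mg_g * k - k * c0
    root = math.sqrt(b * b + 4.0 * k * c0)
    c_eq = 2.0 * c0 / (b + root) if b >= 0 else (-b + root) / (2.0 * k)
    return (c0 - c_eq) / dose_g_l


# Hypothetical parameters: q_max = 200 mg/g, K = 0.1 L/mg, dose fixed at 1 g/L.
Q_MAX, K_L, DOSE = 200.0, 0.1, 1.0
initial_concs = [10, 50, 100, 200, 500, 1000, 2000]
capacities = [langmuir_batch_capacity(c, DOSE, Q_MAX, K_L) for c in initial_concs]
for c, q in zip(initial_concs, capacities):
    print(f"C0 ratio = {c0_ratio(c, DOSE):6.0f} mg/g  ->  q_e = {q:6.1f} mg/g")
```

At low C0 nearly all the dye is adsorbed, so q_e tracks the ratio; at high C0 the sites fill and q_e levels off just below q_max, mirroring the saturation the model picked up.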

A model that only works on a computer is just an academic exercise. The team validated their findings in the lab. They created a new biochar from cotton straw and used it to remove three different dyes (Methylene Blue, Congo Red, and Malachite Green) under various conditions. They then compared the actual measured results to what the CatBoost model predicted. The model passed the real-world test with flying colors, achieving a validation R² of 0.9037, confirming its practical applicability.

This study provides a powerful and economical tool for environmental engineering. Instead of relying on expensive trial and error, researchers and water treatment plants can now use this model to rapidly screen thousands of potential scenarios. The team even built a simple graphical user interface (GUI) to make the model accessible. This data-driven approach can help identify the most efficient and cost-effective way to clean wastewater by pointing to the parameters that matter most (such as concentration and surface area) to optimize for a cleaner environment.
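Screening many scenarios amounts to evaluating a trained predictor over a grid of candidate conditions and keeping those that meet a goal. The sketch below assumes the goal is the lowest biochar dose that removes at least 90% of the dye. Because the published model is not reproduced here, `toy_predict_capacity` is a hypothetical stand-in that only mimics the qualitative trends described above (capacity capped by C0 and by surface area, flattening near 750 m²/g); in practice the trained model would be called instead.

```python
from itertools import product


def toy_predict_capacity(initial_conc_mg_l, dose_g_l, bet_m2_g):
    """HYPOTHETICAL stand-in for a trained model; NOT the published CatBoost model.

    Capacity (mg/g) can never exceed C0 = C_init / dose, and is further capped by
    an illustrative site limit that grows with surface area up to 750 m2/g.
    """
    site_limit = 0.3 * min(bet_m2_g, 750.0)  # 0.3 mg per m2 is an arbitrary constant
    return min(initial_conc_mg_l / dose_g_l, site_limit)


def removal_percent(initial_conc_mg_l, dose_g_l, capacity_mg_g):
    """Share of the added dye that ends up on the biochar."""
    return 100.0 * capacity_mg_g * dose_g_l / initial_conc_mg_l


def cheapest_scenario(initial_conc_mg_l, doses, bet_areas, target_removal=90.0):
    """Return (dose, bet, capacity) with the smallest dose meeting the target, or None."""
    passing = []
    for dose, bet in product(doses, bet_areas):
        q = toy_predict_capacity(initial_conc_mg_l, dose, bet)
        if removal_percent(initial_conc_mg_l, dose, q) >= target_removal:
            passing.append((dose, bet, q))
    # Tuples compare by dose first, then surface area, so min() picks the lowest dose.
    return min(passing, default=None)


best = cheapest_scenario(100.0, doses=[0.1, 0.25, 0.5, 1.0, 2.0], bet_areas=[250, 500, 750, 1000])
print(best)
```

Swapping the toy function for a real predictor leaves the screening loop unchanged, which is the practical appeal of a fast surrogate model over repeated lab trials.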

Source: Liu, C., Balasubramanian, P., Nguyen, X. C., An, J., Praneeth, S., Zhang, P., & Huang, H. (2025). Enhanced machine learning prediction of biochar adsorption for dyes: Parameter optimization and experimental validation. Carbon Research, 4(46).


Shanthi Prabha V

Shanthi Prabha V, PhD is a Biochar Scientist and Science Editor at Biochar Today.