According to the research paper published on arXiv.org: "rStar-Math achieves this by exercising ... which the policy SLM and PPM are built from scratch and iteratively evolved to improve reasoning ..."