Background
Robotic manipulation of deformable, plastic materials such as clay or putty is challenging because the object undergoes large, irreversible shape changes that depend on its contact history. A vision-based perception pipeline can continuously observe the current shape, while vision–language–action (VLA) models offer a promising way to translate a target shape, given as an image or a language instruction, into a sequence of shaping actions. This enables a robot to “look at the goal” and adapt its manipulation strategy online to sculpt the material into the desired form.
Objective
The goal of this thesis is to develop a vision-driven robotic clay-sculpting system that can (1) perceive and track the clay’s shape in real time, (2) interpret a target shape specified by an image or a language prompt, and (3) generate and execute a closed-loop action strategy that drives a robotic arm to deform the clay into the target shape.
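To make the intended control flow concrete, the short Python sketch below shows one possible closed-loop structure. It is a toy illustration under stated assumptions, not the thesis method: the clay is represented as a boolean voxel occupancy grid, shape error is measured with intersection-over-union (IoU), and the planner greedily corrects one disagreeing voxel per step. The names perceive, plan_action, and execute are hypothetical placeholders for the real perception, policy, and robot-execution components.

import numpy as np

rng = np.random.default_rng(0)
GRID = (16, 16, 16)  # assumed workspace discretization

def iou(a, b):
    """Intersection-over-union between two boolean occupancy grids."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 1.0

def perceive(clay):
    """Placeholder perception: reads the simulated clay state directly;
    a real pipeline would fuse RGB-D frames into this representation."""
    return clay.copy()

def plan_action(obs, goal):
    """Placeholder planner: pick one voxel where observation and goal
    disagree. A learned goal-conditioned policy would replace this."""
    diff = np.argwhere(obs != goal)
    return tuple(diff[0]) if len(diff) else None

def execute(clay, action):
    """Placeholder execution: toggle the chosen voxel, standing in for
    pushing clay into, or scraping it out of, that cell."""
    clay[action] ^= True

# Closed loop: perceive -> compare to goal -> act, until converged.
clay = rng.random(GRID) < 0.3        # random initial blob
goal = np.zeros(GRID, dtype=bool)    # target: a solid inner cube
goal[4:12, 4:12, 4:12] = True

for step in range(goal.size):
    obs = perceive(clay)
    if iou(obs, goal) >= 0.99:
        print(f"goal reached after {step} corrective actions")
        break
    execute(clay, plan_action(obs, goal))

In the envisioned system, perceive would fuse RGB-D observations into the shape representation, and plan_action would be replaced by a learned policy (for instance a VLA model conditioned on the encoded target shape), but the perceive-compare-act loop would keep this structure.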