Traditional computer vision sees cities as shapes, not social systems; this paper shows how vision-language reasoning enables AI to identify meaningful urban spaces like schools and parks by thinking in stages.