Abstract
As part of human core knowledge, the representation of objects is a building
block of mental representation that supports high-level concepts and symbolic
reasoning. While humans develop the ability to perceive objects situated in 3D
environments without supervision, models that learn the same abilities under
constraints similar to those faced by human infants are lacking.
Toward this end, we developed a novel network architecture that simultaneously
learns to 1) segment objects from discrete images, 2) infer their 3D locations,
and 3) perceive depth, all while using only information directly available to
the brain as training data, namely sequences of images and self-motion. The
core idea is to treat objects as latent causes of visual input, which the brain
uses to make efficient predictions of future scenes. Object representations are
thus learned as an essential byproduct of learning to predict.