Description
Obtaining well-calibrated probability density functions (PDFs) of photometric redshifts (photo-z) for galaxies without spectroscopy remains a challenge for many science goals. Deep learning tools have proven powerful for this task and are growing in popularity. These include, in particular, state-of-the-art deep neural networks that are typically fed multi-band galaxy images or photometry and produce density estimates that mimic PDFs. However, besides being hard to interpret, such density estimates usually lack a rigorous statistical basis and suffer from systematic biases. To tackle these problems, we develop a two-stage method that provides a statistical basis for estimating photo-z PDFs by incorporating a weighted k-nearest-neighbor (kNN) algorithm into a conventional neural network. In the first stage, we establish a latent space that encodes redshift information via a representation learning framework trained with observed galaxy images and spectroscopic redshift labels. In the second stage, we select the k nearest neighbors of each query galaxy in the learned latent space and construct a photo-z PDF from their labels, with the optimal k determined by diagnostics of the local distribution of the probability integral transform (PIT). By fitting and assigning different weights to the neighbors, this approach further allows us to recalibrate the PDFs and to resolve distribution mismatch between the inference set and the training set. Experiments on SDSS and CFHTLS data show that our method produces well-calibrated photo-z PDFs over different redshift ranges. In contrast to benchmark methods, our method is robust under distribution mismatch in that it limits photo-z-dependent biases, and it does so without compromising accuracy, and thus holds promise for future large-scale surveys.
Furthermore, the local PIT diagnostics applied in our method can probe the local structure of the data, which increases model interpretability by suggesting possible correlations between redshift and observational or physical variables, and may offer meaningful insights into the properties of different galaxy populations.
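To make the second stage concrete, the following is a minimal sketch of how a weighted kNN photo-z PDF and its PIT value could be computed. It is an illustration under stated assumptions, not the implementation used in this work: the function names `knn_photoz_pdf` and `pit_value`, the Euclidean metric, the histogram binning, and the random stand-in data are all hypothetical, and the fitting of the weights and the selection of k from local PIT diagnostics are not shown.

```python
import numpy as np

def knn_photoz_pdf(query, latents, z_spec, k, z_edges, weights=None):
    """Binned photo-z PDF built from the k nearest labelled neighbours.

    Assumptions (not from the abstract): Euclidean distance in latent
    space, a histogram PDF, and per-training-galaxy weights supplied
    by the caller (uniform if None).
    """
    dist = np.linalg.norm(latents - query, axis=1)
    idx = np.argsort(dist)[:k]                      # indices of the k nearest neighbours
    if weights is None:
        w = np.ones(k)
    else:
        w = np.asarray(weights, dtype=float)[idx]
    hist, _ = np.histogram(z_spec[idx], bins=z_edges, weights=w)
    return hist / hist.sum()                        # probability mass per redshift bin

def pit_value(pdf, z_edges, z_true):
    """PIT: the CDF of the estimated PDF evaluated at the true redshift."""
    cdf = np.concatenate([[0.0], np.cumsum(pdf)])   # CDF at each bin edge
    return float(np.interp(z_true, z_edges, cdf))   # linear interpolation within bins

# Synthetic stand-in data; a real latent space would come from the trained network.
rng = np.random.default_rng(0)
latents = rng.normal(size=(500, 8))
z_spec = rng.uniform(0.0, 0.4, size=500)
z_edges = np.linspace(0.0, 0.4, 41)

pdf = knn_photoz_pdf(latents[0], latents, z_spec, k=20, z_edges=z_edges)
pit = pit_value(pdf, z_edges, z_spec[0])
```

For well-calibrated PDFs, the PIT values collected over many galaxies should be approximately uniform on [0, 1]; checking this within local regions of the latent space is the kind of diagnostic the abstract describes for choosing k.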