Astronomical datasets are typically very large, and manually
classifying the data in them is effectively impossible. We use
machine learning algorithms to provide classifications (as stars,
quasars, and galaxies) for more than one billion objects cataloged
photometrically in the Third Data Release of the Sloan Digital Sky
Survey (SDSS III). We used $k$NN, SVM, and random forest
algorithms in a distributed environment over the cloud to classify
1,183,850,913 unclassified, photometric objects present in the SDSS
III catalog. The SDSS III catalog contains photometric data for all
objects viewed through a telescope, and spectroscopic data for a small
part of these. Although it is possible to classify all the objects
using spectroscopic data, it is impractical to obtain such data for
each one of them. Classifying such a large dataset on a single machine
would be impractically slow, so we used the Spark cluster-computing
framework to implement a distributed computing environment over the
cloud. We found that writing the results (dozens of gigabytes) to
cloud storage is very slow with $k$NN. Writing the results with SVM is
faster, since it is done in parallel, but its accuracy is only about
87\%, owing to the absence of a kernel SVM implementation in
Spark. We then used the random forest algorithm to classify the
entire set of 1,183,850,913 objects with an accuracy of 94\% in about
17 hours of processing time. The resulting catalog is significant, as
even collecting spectroscopic data for this many objects would take
decades, and our classifications can help astronomers and
astrophysicists carry out further studies.
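The core classification step can be sketched in miniature. This is an illustrative single-machine example only: the paper ran random forests through Spark MLlib on a cloud cluster, whereas here scikit-learn's `RandomForestClassifier` stands in, and the photometric features and star/galaxy/quasar labels below are synthetic placeholders, not SDSS data.

```python
# Minimal sketch of random-forest classification of photometric objects.
# Assumptions (not from the paper): scikit-learn instead of Spark MLlib,
# synthetic features in place of SDSS u, g, r, i, z magnitudes, and
# placeholder label rules (0 = star, 1 = galaxy, 2 = quasar).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 3000

# Five hypothetical photometric features per object.
X = rng.normal(size=(n, 5))
# Placeholder labels derived from the features so the task is learnable.
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int) + (X[:, 2] > 1).astype(int)

# Train on the "spectroscopically labeled" subset, evaluate on held-out data;
# in the paper, the trained model is then applied to the unlabeled photometric
# objects at scale.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

In the distributed setting, the same pattern applies: fit on the spectroscopically confirmed objects, then map the trained model over partitions of the billion-object photometric catalog.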