mirror of https://asciireactor.com/otho/cs-5821.git
synced 2024-11-21 21:55:06 +00:00
project files added
parent ec21317213
commit 1559cdd06d
23
project/anna_program
Normal file
@ -0,0 +1,23 @@
# Get data from csv (assumes HTRU_2.csv has no header row)
setwd("~/Documents/Classes/CS5821/project/HTRU2")
raw_data = read.csv("HTRU_2.csv", header=FALSE)
colnames(raw_data) = c("X1","X2","X3","X4","X5","X6","X7","X8","PLSR")

# create a new set with equal amounts of pulsars and non-pulsars
ydata = which(raw_data$PLSR==1)  # row indices of pulsars
ndata = which(raw_data$PLSR==0)  # row indices of non-pulsars
set.seed(11)
# index rows of raw_data; rbind() on the two samples would only build a 2 x 1500 index matrix
data = raw_data[c(sample(ydata,1500),sample(ndata,1500)),]


# QDA
library(ISLR)
library(MASS)  # provides qda()
train = sample(nrow(data),2500)  # training row indices; the other 500 rows are held out
qda.fit = qda(PLSR~X1+X2+X3+X4,data=data,subset=train)
print(qda.fit)  # shows priors and group means; summary() only lists the object's components
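The script stops once the model is fitted. A minimal sketch of a possible next step, scoring the fit on the rows not drawn into the training sample, might look like the following. It runs on simulated two-class data so that it does not depend on `HTRU_2.csv`; the names `sim`, `test_rows`, `conf_mat` and `test_err` are illustrative and not part of the commit.

```r
library(MASS)  # provides qda() and its predict() method

set.seed(11)
n = 1500  # rows per class, mirroring the balanced sample in the script

# Simulated stand-in for the balanced sample (not real survey data)
sim = data.frame(
  X1 = c(rnorm(n, 0, 1), rnorm(n, 2, 2)),
  X2 = c(rnorm(n, 0, 1), rnorm(n, 1, 3)),
  PLSR = factor(rep(c(0, 1), each = n))
)

train = sample(nrow(sim), 2500)                  # training row indices
test_rows = setdiff(seq_len(nrow(sim)), train)   # the 500 held-out rows

qda.fit = qda(PLSR ~ X1 + X2, data = sim, subset = train)
qda.pred = predict(qda.fit, sim[test_rows, ])$class

# Confusion matrix and misclassification rate on the held-out rows
conf_mat = table(predicted = qda.pred, actual = sim$PLSR[test_rows])
test_err = mean(qda.pred != sim$PLSR[test_rows])
print(conf_mat)
print(test_err)
```

With the real data, `sim` would be replaced by the balanced data frame and the formula by the chosen predictors.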
BIN
project/data/HTRU2/HTRU2.zip
Normal file
Binary file not shown.
1
project/data/HTRU2/HTRU_2.arff
Normal file
File diff suppressed because one or more lines are too long
1
project/data/HTRU2/HTRU_2.csv
Normal file
File diff suppressed because one or more lines are too long
123
project/data/HTRU2/Readme.txt
Normal file
@ -0,0 +1,123 @@
******************************************************************************************

# HTRU2

Author: Rob Lyon, School of Computer Science & Jodrell Bank Centre for Astrophysics,
University of Manchester, Kilburn Building, Oxford Road, Manchester M13 9PL.

Contact: rob@scienceguyrob.com or robert.lyon@.manchester.ac.uk
Web: http://www.scienceguyrob.com or http://www.cs.manchester.ac.uk
or alternatively http://www.jb.man.ac.uk
******************************************************************************************

1. Overview

HTRU2 is a data set which describes a sample of pulsar candidates collected during the
High Time Resolution Universe Survey (South) [1].

Pulsars are a rare type of neutron star that produce radio emission detectable here on
Earth. They are of considerable scientific interest as probes of space-time, the
interstellar medium, and states of matter (see [2] for more uses).

As pulsars rotate, their emission beam sweeps across the sky, and when this crosses
our line of sight, it produces a detectable pattern of broadband radio emission. As pulsars
rotate rapidly, this pattern repeats periodically. Thus pulsar search involves looking
for periodic radio signals with large radio telescopes.

Each pulsar produces a slightly different emission pattern, which varies slightly with each
rotation (see [2] for an introduction to pulsar astrophysics to find out why). Thus a
potential signal detection, known as a 'candidate', is averaged over many rotations of the
pulsar, as determined by the length of an observation. In the absence of additional information,
each candidate could potentially describe a real pulsar. However, in practice almost all
detections are caused by radio frequency interference (RFI) and noise, making legitimate
signals hard to find.
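
Averaging a candidate over many rotations is commonly called folding. A minimal sketch of the idea, assuming the period is already known and using simulated data (every name and value below is illustrative and not taken from the survey pipeline):

```r
set.seed(1)
period = 50     # samples per rotation (assumed known here)
n_rot = 200     # number of rotations covered by the observation
idx = 0:(period * n_rot - 1)

# A weak pulse occupying phase bins 20..24, buried in unit-variance noise
phase = idx %% period
pulse = ifelse(phase >= 20 & phase < 25, 0.5, 0)
signal = pulse + rnorm(length(idx))

# Fold: average all samples that share the same rotational phase
profile = tapply(signal, phase, mean)

# Noise in the folded profile shrinks roughly as 1/sqrt(n_rot),
# so the pulse stands out even though a single rotation hides it
peak_bin = as.integer(names(which.max(profile)))
print(peak_bin)
```

The folded `profile` plays the role of the integrated pulse profile: one averaged value per phase bin.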

Machine learning tools are now being used to automatically label pulsar candidates to
facilitate rapid analysis. Classification systems in particular are being widely adopted
(see [4,5,6,7,8,9]); these treat the candidate data sets as binary classification problems.
Here the legitimate pulsar examples are a minority positive class, and spurious examples
the majority negative class. At present multi-class labels are unavailable, given the
costs associated with data annotation.

The data set shared here contains 16,259 spurious examples caused by RFI/noise, and 1,639
real pulsar examples. These examples have all been checked by human annotators. Each
candidate is described by 8 continuous variables. The first four are simple statistics
obtained from the integrated pulse profile (folded profile). This is an array of continuous
variables that describe a longitude-resolved version of the signal that has been averaged
in both time and frequency (see [3] for more details). The remaining four variables are
similarly obtained from the DM-SNR curve (again see [3] for more details). These are
summarised below:

1. Mean of the integrated profile.
2. Standard deviation of the integrated profile.
3. Excess kurtosis of the integrated profile.
4. Skewness of the integrated profile.
5. Mean of the DM-SNR curve.
6. Standard deviation of the DM-SNR curve.
7. Excess kurtosis of the DM-SNR curve.
8. Skewness of the DM-SNR curve.
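
The four statistics can be sketched in a few lines of R. This uses population-moment definitions, which are one common convention; this file does not say which estimators PulsarFeatureLab actually uses, so treat the function `profile_stats` and the made-up `profile` values as illustrative only.

```r
# Moment-based summary statistics of a numeric vector
profile_stats = function(x) {
  m = mean(x)
  s = sqrt(mean((x - m)^2))             # population standard deviation
  skew = mean((x - m)^3) / s^3          # third standardised moment
  exkurt = mean((x - m)^4) / s^4 - 3    # excess kurtosis: 0 for a normal distribution
  c(mean = m, sd = s, excess_kurtosis = exkurt, skewness = skew)
}

# Illustrative profile: flat baseline with one narrow peak (made-up values)
profile = c(rep(1, 60), 5, 9, 5, rep(1, 61))
print(round(profile_stats(profile), 3))
```

A narrow peak on a flat baseline gives positive skewness and positive excess kurtosis, which fits the idea that these statistics can help separate pulse-like profiles from noise.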

HTRU 2 Summary

17,898 total examples.
1,639 positive examples.
16,259 negative examples.
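
To make the class imbalance concrete, the counts above can be turned into ratios. The short R sketch below uses only the three figures from the summary; the variable names are illustrative. It shows that about 9.2% of examples are positive, and that a classifier which always answers "negative" would already be about 90.8% accurate, which is why plain accuracy is a weak yardstick on this data.

```r
n_pos = 1639     # real pulsar examples (from the summary)
n_neg = 16259    # spurious examples caused by RFI/noise
n_tot = n_pos + n_neg

pos_frac = n_pos / n_tot        # share of positive examples
neg_per_pos = n_neg / n_pos     # negatives per positive

# Accuracy of a trivial classifier that always predicts "negative"
baseline_acc = n_neg / n_tot

print(round(c(pos_frac = pos_frac,
              neg_per_pos = neg_per_pos,
              baseline_acc = baseline_acc), 3))
```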

The data is presented in two formats: CSV and ARFF (used by the WEKA data mining tool).
Candidates are stored in both files in separate rows. Each row lists the variables first,
and the class label is the final entry. The class labels used are 0 (negative) and 1
(positive).
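
A minimal sketch of reading rows in this layout with R follows. The two rows are made-up values, not records from the data set. The column names `X1`..`X8` and `PLSR` are illustrative, and the sketch assumes the CSV has no header row.

```r
# Two made-up rows in the documented layout: 8 variables, then the class label
csv_text = "100.5,45.2,0.3,0.1,2.5,18.0,9.1,95.0,0
35.7,30.1,3.9,20.4,60.2,70.3,1.2,0.9,1"

# header = FALSE is an assumption: every row is treated as a candidate
cand = read.csv(text = csv_text, header = FALSE)
colnames(cand) = c(paste0("X", 1:8), "PLSR")

features = cand[, 1:8]    # the 8 continuous variables
label = factor(cand$PLSR, levels = c(0, 1),
               labels = c("negative", "positive"))
print(table(label))
```

For the real file, `text = csv_text` would be replaced by the path to `HTRU_2.csv`.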

Please note that the data contains no positional information or other astronomical details. It is
simply feature data extracted from candidate files using the PulsarFeatureLab tool (see [10]).

2. Citing our work

If you use the dataset in your work please cite us using the DOI of the dataset, and the paper:

R. J. Lyon, B. W. Stappers, S. Cooper, J. M. Brooke, J. D. Knowles, Fifty Years of Pulsar
Candidate Selection: From simple filters to a new principled real-time classification approach
MNRAS, 2016.

3. Acknowledgements

This data was obtained with the support of grant EP/I028099/1 for the University of Manchester
Centre for Doctoral Training in Computer Science, from the UK Engineering and Physical Sciences
Research Council (EPSRC). The raw observational data was collected by the High Time Resolution
Universe Collaboration using the Parkes Observatory, funded by the Commonwealth of Australia and
managed by the CSIRO.

4. References

[1] M. J. Keith et al., "The High Time Resolution Universe Pulsar Survey - I. System Configuration
and Initial Discoveries", 2010, Monthly Notices of the Royal Astronomical Society, vol. 409,
pp. 619-627. DOI: 10.1111/j.1365-2966.2010.17325.x

[2] D. R. Lorimer and M. Kramer, "Handbook of Pulsar Astronomy", Cambridge University Press, 2005.

[3] R. J. Lyon, "Why Are Pulsars Hard To Find?", PhD Thesis, University of Manchester, 2015.

[4] R. J. Lyon et al., "Fifty Years of Pulsar Candidate Selection: From simple filters to a new
principled real-time classification approach", Monthly Notices of the Royal Astronomical Society,
submitted.

[5] R. P. Eatough et al., "Selection of radio pulsar candidates using artificial neural networks",
Monthly Notices of the Royal Astronomical Society, vol. 407, no. 4, pp. 2443-2450, 2010.

[6] S. D. Bates et al., "The high time resolution universe pulsar survey vi. an artificial neural
network and timing of 75 pulsars", Monthly Notices of the Royal Astronomical Society, vol. 427,
no. 2, pp. 1052-1065, 2012.

[7] D. Thornton, "The High Time Resolution Radio Sky", PhD thesis, University of Manchester,
Jodrell Bank Centre for Astrophysics School of Physics and Astronomy, 2013.

[8] K. J. Lee et al., "PEACE: pulsar evaluation algorithm for candidate extraction a software package
for post-analysis processing of pulsar survey candidates", Monthly Notices of the Royal Astronomical
Society, vol. 433, no. 1, pp. 688-694, 2013.

[9] V. Morello et al., "SPINN: a straightforward machine learning solution to the pulsar candidate
selection problem", Monthly Notices of the Royal Astronomical Society, vol. 443, no. 2,
pp. 1651-1662, 2014.

[10] R. J. Lyon, "PulsarFeatureLab", 2015, https://dx.doi.org/10.6084/m9.figshare.1536472.v1.
BIN
project/data/HTRU2/__MACOSX/._HTRU_2.arff
Normal file
Binary file not shown.
BIN
project/data/HTRU2/__MACOSX/._Readme.txt
Normal file
Binary file not shown.
BIN
project/mariia/project/plots.png
Normal file
Binary file not shown.
Size: 34 KiB
17
project/notes
Normal file
@ -0,0 +1,17 @@
Task

Doesn't ha

I am not an oracle but usually you don't know what parameters to pass into a classifier - figuring those out is a big deal

problem with column 9

selecting right features
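
One hedged way to approach the feature-selection note is to compare candidate predictor sets by cross-validated error. The sketch below uses simulated data in which `X1` is informative and `X2` is pure noise. The helper `cv_error` and all names and values are hypothetical, and QDA is used only because the project script uses it.

```r
library(MASS)  # provides qda()

set.seed(11)
n = 300
# Simulated stand-in data: X1 separates the classes, X2 does not
sim = data.frame(
  X1 = c(rnorm(n, 0, 1), rnorm(n, 2, 1)),
  X2 = rnorm(2 * n),
  PLSR = factor(rep(c(0, 1), each = n))
)

# k-fold cross-validated misclassification rate for one model formula
cv_error = function(formula, data, k = 5) {
  fold = sample(rep(1:k, length.out = nrow(data)))
  errs = sapply(1:k, function(i) {
    fit = qda(formula, data = data[fold != i, ])
    pred = predict(fit, data[fold == i, ])$class
    mean(pred != data$PLSR[fold == i])
  })
  mean(errs)
}

candidates = list(PLSR ~ X1, PLSR ~ X2, PLSR ~ X1 + X2)
errors = sapply(candidates, cv_error, data = sim)
names(errors) = sapply(candidates, function(f) paste(deparse(f), collapse = ""))
print(round(errors, 3))
```

The predictor set with the lowest cross-validated error is a reasonable starting point. With eight variables there are only 255 non-empty subsets, so an exhaustive comparison is feasible.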
BIN
project/stu1188.pdf
Normal file
Binary file not shown.