Linking Filter Validation
[1]:
import pandas as pd
import numpy as np
from sorcha.modules.PPLinkingFilter import PPLinkingFilter
PPLinkingFilter aims to mimic the effects of the Solar System Processing (SSP) pipeline in linking objects. More information can be found here. If we use the SSP defaults, for an object to be linked, it must have:
- At least 2 observations in a night to constitute a valid tracklet.
  - These observations must have an angular separation of at least 0.5 arcseconds in order to be recognised as separate.
  - However, subsequent observations in a tracklet must occur within 90 minutes (0.0625 days) of each other.
- At least 3 tracklets to form a valid track.
  - These tracklets must be observed within a window of less than 15 days.
We also expect only 95% of the objects that meet these criteria to actually be linked (the detection efficiency). For now, we set this parameter to 100% so that we can test the other criteria in isolation.
These six parameters can be changed in the config file and are found in the [LINKINGFILTER] section.
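To make the criteria above concrete, here is a minimal, hypothetical re-statement of them for a single object. It is not sorcha's implementation: the helper names `angular_sep_arcsec` and `is_linkable` are illustrative, and the way observations are assigned to nights, the pairwise tracklet test, and the strict "less than 15 days" window check are all simplifying assumptions.

```python
import numpy as np


def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation via the haversine formula (degrees in, arcseconds out)."""
    ra1, dec1, ra2, dec2 = np.radians([ra1, dec1, ra2, dec2])
    a = (np.sin((dec2 - dec1) / 2) ** 2
         + np.cos(dec1) * np.cos(dec2) * np.sin((ra2 - ra1) / 2) ** 2)
    return np.degrees(2 * np.arcsin(np.sqrt(a))) * 3600.0


def is_linkable(times, ra, dec, min_observations=2, min_angular_separation=0.5,
                max_time_separation=0.0625, min_tracklets=3,
                min_tracklet_window=15, night_start_utc=17.0):
    """Hypothetical single-object check of the SSP-style linking criteria.

    Assumptions (not taken from sorcha): nights begin night_start_utc hours
    into each MJD; a night with at least min_observations detections holds a
    tracklet if some pair is >= min_angular_separation arcsec apart and
    <= max_time_separation days apart; min_tracklets tracklet nights must
    span less than min_tracklet_window days.
    """
    times, ra, dec = (np.asarray(x, dtype=float) for x in (times, ra, dec))
    # Label each observation with the MJD on which its night began (assumed convention).
    nights = np.floor(times - night_start_utc / 24.0)

    tracklet_nights = []
    for night in np.unique(nights):  # np.unique returns the nights in sorted order
        idx = np.flatnonzero(nights == night)
        if len(idx) < min_observations:
            continue
        # Look for at least one pair that is separated on the sky but close in time.
        if any(
            angular_sep_arcsec(ra[i], dec[i], ra[j], dec[j]) >= min_angular_separation
            and abs(times[j] - times[i]) <= max_time_separation
            for k, i in enumerate(idx) for j in idx[k + 1:]
        ):
            tracklet_nights.append(night)

    # Slide over runs of min_tracklets consecutive tracklet nights.
    for first in range(len(tracklet_nights) - min_tracklets + 1):
        span = tracklet_nights[first + min_tracklets - 1] - tracklet_nights[first]
        if span < min_tracklet_window:
            return True
    return False
```

Applied to the test object built below (the times, RA and Dec from cell [3]), this sketch returns True, and it returns False for each of the four modified versions tested later in this notebook.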
[2]:
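min_observations = 2          # detections per night needed for a tracklet
min_angular_separation = 0.5  # arcseconds
max_time_separation = 0.0625  # days (90 minutes)
min_tracklets = 3             # tracklets needed for a track
min_tracklet_window = 15      # days
detection_efficiency = 1      # fraction of linkable objects that are actually linked
night_start_utc = 17.0        # hours, UTC; used to assign observations to nights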
Let’s create an object that should definitely be linked according to these parameters.
[3]:
obj_id = ["pretend_object"] * 6
field_id = np.arange(1, 7)
times = [60000.03, 60000.06, 60005.03, 60005.06, 60008.03, 60008.06]
ra = [142, 142.1, 143, 143.1, 144, 144.1]
dec = [8, 8.1, 9, 9.1, 10, 10.1]
[4]:
observations = pd.DataFrame(
{
"ObjID": obj_id,
"FieldID": field_id,
"fieldMJD_TAI": times,
"RA_deg": ra,
"Dec_deg": dec
}
)
[5]:
observations
[5]:
| | ObjID | FieldID | fieldMJD_TAI | RA_deg | Dec_deg |
|---|---|---|---|---|---|
| 0 | pretend_object | 1 | 60000.03 | 142.0 | 8.0 |
| 1 | pretend_object | 2 | 60000.06 | 142.1 | 8.1 |
| 2 | pretend_object | 3 | 60005.03 | 143.0 | 9.0 |
| 3 | pretend_object | 4 | 60005.06 | 143.1 | 9.1 |
| 4 | pretend_object | 5 | 60008.03 | 144.0 | 10.0 |
| 5 | pretend_object | 6 | 60008.06 | 144.1 | 10.1 |
Now let’s run the linking filter. As this object should be linked, we should get all six observations back, with two new columns: object_linked and date_linked_MJD.
[6]:
linked_observations = PPLinkingFilter(observations, detection_efficiency, min_observations, min_tracklets, min_tracklet_window, min_angular_separation, max_time_separation, night_start_utc)
[7]:
linked_observations
[7]:
| | ObjID | FieldID | fieldMJD_TAI | RA_deg | Dec_deg | object_linked | date_linked_MJD |
|---|---|---|---|---|---|---|---|
| 0 | pretend_object | 1 | 60000.03 | 142.0 | 8.0 | True | 60007.0 |
| 1 | pretend_object | 2 | 60000.06 | 142.1 | 8.1 | True | 60007.0 |
| 2 | pretend_object | 3 | 60005.03 | 143.0 | 9.0 | True | 60007.0 |
| 3 | pretend_object | 4 | 60005.06 | 143.1 | 9.1 | True | 60007.0 |
| 4 | pretend_object | 5 | 60008.03 | 144.0 | 10.0 | True | 60007.0 |
| 5 | pretend_object | 6 | 60008.06 | 144.1 | 10.1 | True | 60007.0 |
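The date_linked_MJD of 60007.0 is earlier than the MJD of the last observation (60008.03). One plausible explanation, offered as an assumption rather than a description of sorcha's code, is that observations are grouped into nights that begin `night_start_utc` hours into each MJD, and that the linking date is the label of the night on which the third tracklet was observed. The sketch below computes those night labels under that assumption.

```python
import numpy as np

night_start_utc = 17.0  # hours UTC, as set earlier in this notebook
times = np.array([60000.03, 60000.06, 60005.03, 60005.06, 60008.03, 60008.06])

# Assumed convention: shift each MJD back by night_start_utc hours, then take the
# floor, so every observation is labelled with the MJD on which its night began.
night_labels = np.floor(times - night_start_utc / 24.0)
print(night_labels)
```

Under this assumption the third pair of observations falls in the night labelled 60007, which matches the date_linked_MJD column above.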
Success! The object was linked. Now let’s play with this dataframe a little. First, let’s remove the first observation, so that only two complete tracklets remain.
[8]:
observations_two_tracklets = observations.iloc[1:].copy()
[9]:
observations_two_tracklets
[9]:
| | ObjID | FieldID | fieldMJD_TAI | RA_deg | Dec_deg | object_linked |
|---|---|---|---|---|---|---|
| 1 | pretend_object | 2 | 60000.06 | 142.1 | 8.1 | True |
| 2 | pretend_object | 3 | 60005.03 | 143.0 | 9.0 | True |
| 3 | pretend_object | 4 | 60005.06 | 143.1 | 9.1 | True |
| 4 | pretend_object | 5 | 60008.03 | 144.0 | 10.0 | True |
| 5 | pretend_object | 6 | 60008.06 | 144.1 | 10.1 | True |
[10]:
unlinked_observations = PPLinkingFilter(observations_two_tracklets, detection_efficiency, min_observations, min_tracklets, min_tracklet_window, min_angular_separation, max_time_separation, night_start_utc)
[11]:
unlinked_observations
[11]:
| | ObjID | FieldID | fieldMJD_TAI | RA_deg | Dec_deg | object_linked | date_linked_MJD |
|---|---|---|---|---|---|---|---|
As expected, the object is no longer linked and an empty dataframe is returned. Now let’s move the last two observations (the third tracklet) outside of the 15-day window.
[12]:
observations_large_window = observations.copy()
observations_large_window['fieldMJD_TAI'] = [60000.03, 60000.06, 60005.03, 60005.06, 60016.03, 60016.06]
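A quick check of the arithmetic: with the third tracklet moved to MJD 60016, the three tracklets now span about 16 days, which exceeds the 15-day limit. Exactly how sorcha measures the window is not stated here, so this sketch simply assumes it is the gap between the first observations of the first and last tracklets.

```python
min_tracklet_window = 15  # days

# First observation of each of the three tracklets in observations_large_window.
tracklet_start_times = [60000.03, 60005.03, 60016.03]
span_days = tracklet_start_times[-1] - tracklet_start_times[0]

print(f"span = {span_days:.2f} days; within window: {span_days < min_tracklet_window}")
```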
[13]:
unlinked_observations = PPLinkingFilter(observations_large_window, detection_efficiency, min_observations, min_tracklets, min_tracklet_window, min_angular_separation, max_time_separation, night_start_utc)
[14]:
unlinked_observations
[14]:
| | ObjID | FieldID | fieldMJD_TAI | RA_deg | Dec_deg | object_linked | date_linked_MJD |
|---|---|---|---|---|---|---|---|
Once again, the object is no longer linked. What if we move the first two observations so close together on the sky (about 0.05 arcseconds apart) that they no longer form a valid tracklet?
[15]:
observations_small_sep = observations.copy()
observations_small_sep["RA_deg"] = [142, 142.00001, 143, 143.1, 144, 144.1]
observations_small_sep["Dec_deg"] = [8, 8.00001, 9, 9.1, 10, 10.1]
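To confirm that the modified pair really falls below the 0.5-arcsecond threshold, the separation can be estimated with the flat-sky (small-angle) approximation, which is accurate for separations far below a degree. This is a quick check rather than the formula sorcha itself uses, and the helper name `small_angle_sep_arcsec` is hypothetical.

```python
import math


def small_angle_sep_arcsec(ra1, dec1, ra2, dec2):
    """Approximate angular separation in arcseconds for nearby points (degrees in)."""
    # RA offsets shrink by cos(Dec) away from the celestial equator.
    mean_dec = math.radians((dec1 + dec2) / 2)
    d_ra = (ra2 - ra1) * math.cos(mean_dec)
    d_dec = dec2 - dec1
    return math.hypot(d_ra, d_dec) * 3600.0


min_angular_separation = 0.5  # arcseconds

original_pair = small_angle_sep_arcsec(142, 8, 142.1, 8.1)
modified_pair = small_angle_sep_arcsec(142, 8, 142.00001, 8.00001)
print(f"original: {original_pair:.1f} arcsec, modified: {modified_pair:.3f} arcsec")
```

The original pair is hundreds of arcseconds apart, while the modified pair is roughly a tenth of the threshold.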
[16]:
unlinked_observations = PPLinkingFilter(observations_small_sep, detection_efficiency, min_observations, min_tracklets, min_tracklet_window, min_angular_separation, max_time_separation, night_start_utc)
[17]:
unlinked_observations
[17]:
| | ObjID | FieldID | fieldMJD_TAI | RA_deg | Dec_deg | object_linked | date_linked_MJD |
|---|---|---|---|---|---|---|---|
And the object is no longer linked. Next, let’s move the first two observations further apart in time (0.07 days, or about 101 minutes) so that, once again, they no longer form a valid tracklet.
[18]:
observations_large_time = observations.copy()
observations_large_time["fieldMJD_TAI"] = [60000.03, 60000.10, 60005.03, 60005.06, 60008.03, 60008.06]
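The arithmetic behind this test: the gap between the first two observations grows from 0.03 days to 0.07 days, which is longer than the 90-minute (0.0625-day) limit.

```python
max_time_separation = 0.0625  # days, i.e. 90 minutes

# Gap between the first two observations in observations_large_time.
dt_days = 60000.10 - 60000.03
dt_minutes = dt_days * 24 * 60

print(f"gap = {dt_days:.4f} d = {dt_minutes:.1f} min; "
      f"limit = {max_time_separation * 24 * 60:.0f} min")
```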
[19]:
unlinked_observations = PPLinkingFilter(observations_large_time, detection_efficiency, min_observations, min_tracklets, min_tracklet_window, min_angular_separation, max_time_separation, night_start_utc)
[20]:
unlinked_observations
[20]:
| | ObjID | FieldID | fieldMJD_TAI | RA_deg | Dec_deg | object_linked | date_linked_MJD |
|---|---|---|---|---|---|---|---|
And as expected, we no longer link the object.
Finally, let’s check that the detection efficiency works as expected. Let’s set it to 0.95.
[21]:
detection_efficiency = 0.95
Now let’s make a dataframe containing the same linkable object repeated 10,000 times, each copy with its own ObjID.
[22]:
objs = [["pretend_object_" + str(a)] * 6 for a in range(0, 10000)]
obj_id_long = [item for sublist in objs for item in sublist]
field_id_long = list(np.arange(1, 7)) * 10000
times_long = [60000.03, 60000.06, 60005.03, 60005.06, 60008.03, 60008.06] * 10000
ra_long = [142, 142.1, 143, 143.1, 144, 144.1] * 10000
dec_long = [8, 8.1, 9, 9.1, 10, 10.1] * 10000
[23]:
observations_long = pd.DataFrame(
{
"ObjID": obj_id_long,
"FieldID": field_id_long,
"fieldMJD_TAI": times_long,
"RA_deg": ra_long,
"Dec_deg": dec_long
}
)
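The same 60,000-row table can also be built without flattening Python lists, using NumPy's `np.repeat` (each ID repeated six times in a row) and `np.tile` (the six-observation pattern repeated 10,000 times). This is an alternative sketch rather than a change to the cell above; the variable names `observations_long_alt`, `times_template`, `ra_template` and `dec_template` are hypothetical.

```python
import numpy as np
import pandas as pd

n_objects = 10000
times_template = [60000.03, 60000.06, 60005.03, 60005.06, 60008.03, 60008.06]
ra_template = [142, 142.1, 143, 143.1, 144, 144.1]
dec_template = [8, 8.1, 9, 9.1, 10, 10.1]
n_obs = len(times_template)

observations_long_alt = pd.DataFrame(
    {
        # Each ID appears n_obs times in a row: object 0 x6, object 1 x6, ...
        "ObjID": np.repeat([f"pretend_object_{i}" for i in range(n_objects)], n_obs),
        # The per-observation columns repeat the whole six-row pattern n_objects times.
        "FieldID": np.tile(np.arange(1, n_obs + 1), n_objects),
        "fieldMJD_TAI": np.tile(times_template, n_objects),
        "RA_deg": np.tile(ra_template, n_objects),
        "Dec_deg": np.tile(dec_template, n_objects),
    }
)
```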
[24]:
observations_long
[24]:
| | ObjID | FieldID | fieldMJD_TAI | RA_deg | Dec_deg |
|---|---|---|---|---|---|
| 0 | pretend_object_0 | 1 | 60000.03 | 142.0 | 8.0 |
| 1 | pretend_object_0 | 2 | 60000.06 | 142.1 | 8.1 |
| 2 | pretend_object_0 | 3 | 60005.03 | 143.0 | 9.0 |
| 3 | pretend_object_0 | 4 | 60005.06 | 143.1 | 9.1 |
| 4 | pretend_object_0 | 5 | 60008.03 | 144.0 | 10.0 |
| ... | ... | ... | ... | ... | ... |
| 59995 | pretend_object_9999 | 2 | 60000.06 | 142.1 | 8.1 |
| 59996 | pretend_object_9999 | 3 | 60005.03 | 143.0 | 9.0 |
| 59997 | pretend_object_9999 | 4 | 60005.06 | 143.1 | 9.1 |
| 59998 | pretend_object_9999 | 5 | 60008.03 | 144.0 | 10.0 |
| 59999 | pretend_object_9999 | 6 | 60008.06 | 144.1 | 10.1 |
60000 rows × 5 columns
If the detection efficiency were perfect, all of these objects would be linked. However, we have set the detection efficiency to 0.95, so we should expect roughly 95% of these objects (about 9,500) to be returned by the linking filter. Let’s find out.
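Conceptually, a linking efficiency below 100% can be modelled as an independent random draw for each linkable object. The sketch below assumes exactly that (one Bernoulli trial per object, not per observation), which may differ from how sorcha applies the parameter internally, and also computes the binomial scatter expected in the linked fraction.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed so the sketch is reproducible
n_objects = 10000
detection_efficiency = 0.95

# Assumption: each linkable object is kept independently with probability
# detection_efficiency.
kept = rng.random(n_objects) < detection_efficiency
fraction = kept.mean()

# Standard deviation of the kept fraction for a binomial process.
sigma = np.sqrt(detection_efficiency * (1 - detection_efficiency) / n_objects)
print(f"kept fraction = {fraction:.4f}, expected {detection_efficiency} +/- {sigma:.4f}")
```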
[25]:
long_linked_observations = PPLinkingFilter(observations_long, detection_efficiency, min_observations, min_tracklets, min_tracklet_window, min_angular_separation, max_time_separation, night_start_utc)
[26]:
long_linked_observations
[26]:
| | ObjID | FieldID | fieldMJD_TAI | RA_deg | Dec_deg | object_linked | date_linked_MJD |
|---|---|---|---|---|---|---|---|
| 0 | pretend_object_0 | 1 | 60000.03 | 142.0 | 8.0 | True | 60007.0 |
| 1 | pretend_object_1624 | 1 | 60000.03 | 142.0 | 8.0 | True | 60007.0 |
| 2 | pretend_object_5206 | 1 | 60000.03 | 142.0 | 8.0 | True | 60007.0 |
| 3 | pretend_object_5205 | 1 | 60000.03 | 142.0 | 8.0 | True | 60007.0 |
| 4 | pretend_object_1625 | 1 | 60000.03 | 142.0 | 8.0 | True | 60007.0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 59995 | pretend_object_5720 | 6 | 60008.06 | 144.1 | 10.1 | True | 60007.0 |
| 59996 | pretend_object_5721 | 6 | 60008.06 | 144.1 | 10.1 | True | 60007.0 |
| 59997 | pretend_object_5722 | 6 | 60008.06 | 144.1 | 10.1 | True | 60007.0 |
| 59998 | pretend_object_5708 | 6 | 60008.06 | 144.1 | 10.1 | True | 60007.0 |
| 59999 | pretend_object_9999 | 6 | 60008.06 | 144.1 | 10.1 | True | 60007.0 |
60000 rows × 7 columns
[27]:
len(long_linked_observations["ObjID"].unique())/10000
[27]:
1.0
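The detection efficiency is stochastic, so the linked fraction should vary slightly from run to run around 0.95. With 10,000 objects, however, the expected scatter is only about 0.002 (the binomial standard deviation, sqrt(0.95 × 0.05 / 10000) ≈ 0.0022), so the value of 1.0 shown above is not consistent with a 95% efficiency: in this run every object was linked, which suggests the efficiency was not applied.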