Filtering and Downsampling

Overview

PoreFlow provides tools to both filter and downsample raw nanopore sequencing data. This page describes both methods.

Key Concepts

  • Downsampling

    Reduces the number of samples in a dataset.

    Used for:

    • Faster visualization
    • Efficient storage
    • Reducing computational load

  • Filtering

    Retains the same number of samples but removes unwanted frequencies.

    Used for:

    • Noise reduction
    • Removing artifacts

Downsampling

Downsampling reduces the sampling frequency by an integer factor. It is particularly useful for UTube data, which can be recorded at high sample rates (50 kHz) and therefore produces large datasets for long measurements. Downsampling to, say, 5 kHz before plotting is a great way to improve performance.
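To get a feel for the sizes involved, the arithmetic can be sketched as follows. The one-hour duration, the two float64 columns, and the helper name raw_size_mb are assumptions made for illustration; actual PoreFlow dataframes may hold more columns or use other dtypes.

```python
def raw_size_mb(sfreq_hz, duration_s, n_columns=2, bytes_per_value=8):
    """Approximate in-memory size in megabytes (hypothetical helper)."""
    n_samples = sfreq_hz * duration_s
    return n_samples * n_columns * bytes_per_value / 1e6


one_hour = 3600  # seconds, assumed measurement length
size_50k = raw_size_mb(50_000, one_hour)  # recorded at 50 kHz
size_5k = raw_size_mb(5_000, one_hour)    # downsampled by a factor of 10

print(f"50 kHz: {size_50k:.0f} MB, 5 kHz: {size_5k:.0f} MB")
# 50 kHz: 2880 MB, 5 kHz: 288 MB
```

Under these assumptions, a tenfold reduction in sample rate gives a tenfold reduction in memory.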

Under the hood, scipy.signal.decimate is used to downsample the signal, which applies an anti-aliasing filter before downsampling.
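The effect of the anti-aliasing filter can be illustrated with SciPy directly. The sketch below is not PoreFlow code: it uses a synthetic 4 kHz tone (an assumption chosen for illustration) and compares scipy.signal.decimate with naive slicing when going from 50 kHz to 5 kHz.

```python
import numpy as np
from scipy.signal import decimate

sfreq = 50_000                 # original sample rate in Hz
factor = 10                    # integer downsampling factor, giving 5 kHz
t = np.arange(sfreq) / sfreq   # one second of data

# A 4 kHz tone lies above the new Nyquist frequency (2.5 kHz).
tone = np.sin(2 * np.pi * 4_000 * t)

naive = tone[::factor]              # no anti-aliasing: the tone aliases to 1 kHz
filtered = decimate(tone, factor)   # low-pass filters first, then keeps every 10th sample


def rms(x):
    """Root-mean-square amplitude of a signal."""
    return float(np.sqrt(np.mean(x ** 2)))


print(len(tone), len(naive), len(filtered))
print(f"RMS naive: {rms(naive):.3f}, RMS decimated: {rms(filtered):.3f}")
```

With naive slicing the out-of-band tone survives at full strength as a spurious 1 kHz component, whereas decimate suppresses it before discarding samples.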

Downsampling is done using the .downsample method on PoreFlow dataframe objects, such as RawDataFrame or EventDataFrame.

Example

import poreflow as pf

with pf.File("utube_measurement.dat") as f:
    raw = f.get_raw()  # (1)!
    print(f"File sample rate {f.sfreq} Hz, {len(f)} samples.")

raw = raw.downsample(2500) # (2)!

print(f"Original sample rate {raw.sfreq_original} downsampled to {raw.sfreq} Hz, now {len(raw)} samples.")
  1. Get raw data for channel 0.
  2. Downsample to 2.5 kHz.
File sample rate 50000.0 Hz, 2258000 samples.
Original sample rate 50000.0 downsampled to 2500.0 Hz, now 112900 samples.

By default, only the current and voltage columns are processed with an anti-aliasing filter before downsampling.

Filtering

Unwanted high-frequency noise can be filtered using a 4th-order Bessel low-pass filter. Specify the cut-off frequency of the filter to set the frequency above which the signal is attenuated.
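As a rough illustration of what such a filter does, the sketch below builds a 4th-order Bessel low-pass filter with SciPy and applies it to a synthetic signal. It is not PoreFlow's implementation: the use of scipy.signal.bessel, the zero-phase sosfiltfilt call, and the test signal are all assumptions made for the example.

```python
import numpy as np
from scipy.signal import bessel, sosfiltfilt

sfreq = 50_000   # sample rate in Hz
cutoff = 1_000   # cut-off frequency in Hz
t = np.arange(sfreq) / sfreq  # one second of data

# Slow 100 Hz signal plus unwanted 10 kHz noise.
noisy = np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 10_000 * t)

# 4th-order low-pass Bessel filter, expressed as second-order sections.
sos = bessel(4, cutoff, btype="low", fs=sfreq, output="sos")
# Assumption: zero-phase (forward-backward) filtering.
smoothed = sosfiltfilt(sos, noisy)


def amplitude_at(x, freq_hz):
    """Amplitude of the FFT bin at freq_hz (1 Hz resolution for 1 s of data)."""
    spectrum = np.abs(np.fft.rfft(x)) * 2 / len(x)
    return float(spectrum[int(freq_hz)])


print(f"Samples before/after: {len(noisy)} / {len(smoothed)}")
print(f"10 kHz amplitude before: {amplitude_at(noisy, 10_000):.3f}")
print(f"10 kHz amplitude after:  {amplitude_at(smoothed, 10_000):.6f}")
```

The number of samples is unchanged; only the content above the cut-off frequency is attenuated.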

Filtering is done using the .apply_filter method on PoreFlow dataframe objects, such as RawDataFrame or EventDataFrame.

Example

with pf.File("utube_measurement.dat") as f:
    raw = f.get_raw()

raw = raw.apply_filter(1000) # (1)!

print(f"Sample rate {raw.sfreq} filtered with a cutoff at {raw.filter_cutoff}.")
  1. Apply a low-pass filter with a cut-off frequency of 1000 Hz.
Sample rate 50000.0 filtered with a cutoff at 1000.

By default, only the current and voltage columns are filtered. To change this behaviour, check out the reference.

Do not confuse with DataFrame.filter

It is easy to confuse pandas.DataFrame.filter with poreflow.BaseDataFrame.apply_filter. The former selects rows or columns by their labels and leaves the values untouched; the latter does signal processing on the voltage/current columns of the DataFrame.
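A short pandas-only example makes the difference concrete. The column names and values below are made up for illustration; the point is that DataFrame.filter only picks labels and performs no signal processing.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "time": [0.0, 0.1, 0.2],
        "current": [101.0, 99.5, 100.2],
        "voltage": [180.0, 180.0, 180.0],
    }
)

# Select columns whose label contains "current"; the values are left untouched.
selected = df.filter(like="current")

print(list(selected.columns))        # ['current']
print(selected["current"].tolist())  # [101.0, 99.5, 100.2]
```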

Additional examples

Depending on your data, you might want to always first downsample to a specific frequency, and only then do further filtering or processing. This is often the case for high-frequency UTube data, which generally is first downsampled to around 5 kHz. An example of doing so:

with pf.File("utube_measurement.dat") as f:
    raw = f.get_raw()

raw = raw.downsample(5000).apply_filter(1000)

print(f"Original sample rate: {raw.sfreq_original} Hz")
print(f"Downsampled to:       {raw.sfreq} Hz")
print(f"Filtered to:          {raw.filter_cutoff} Hz")
Original sample rate: 50000.0 Hz
Downsampled to:       5000.0 Hz
Filtered to:          1000 Hz
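The same two-step order can be sketched with plain SciPy to show why it matters which sample rate the filter is designed against. This is an illustration on synthetic noise, not PoreFlow's internal code; the variable names and the choice of sosfiltfilt are assumptions.

```python
import numpy as np
from scipy.signal import bessel, decimate, sosfiltfilt

sfreq_original = 50_000  # Hz
target_sfreq = 5_000     # Hz
cutoff = 1_000           # Hz

factor = sfreq_original // target_sfreq  # integer factor of 10
rng = np.random.default_rng(0)
current = rng.normal(loc=100.0, scale=5.0, size=sfreq_original)  # synthetic 1 s trace

# Step 1: downsample (with anti-aliasing) to 5 kHz.
downsampled = decimate(current, factor)
sfreq = sfreq_original / factor

# Step 2: the cut-off must lie below the new Nyquist frequency (2.5 kHz),
# and the filter is designed against the downsampled rate, not the original one.
assert cutoff < sfreq / 2
sos = bessel(4, cutoff, btype="low", fs=sfreq, output="sos")
filtered = sosfiltfilt(sos, downsampled)

print(f"{len(current)} -> {len(downsampled)} samples at {sfreq:.0f} Hz, cut-off {cutoff} Hz")
# 50000 -> 5000 samples at 5000 Hz, cut-off 1000 Hz
```

Downsampling first means the later filter runs on ten times fewer samples, which is where most of the speed-up comes from.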

DataFrame attributes

The example above demonstrates the three attributes in poreflow.BaseDataFrame used to keep track of downsampling and filtering.

  • BaseDataFrame.sfreq_original is set to the original sample rate of the file from which the dataframe is read. It is not changed by filtering/downsampling.
  • BaseDataFrame.sfreq is the (downsampled) sample rate of the DataFrame.
  • BaseDataFrame.filter_cutoff is None if the data is unfiltered and set to the cut-off frequency of the filter after filtering.