
poreflow.File

Bases: DataFile

An interface for handling sequencing files.

This class extends fast5_research.BulkFast5 to provide more convenient methods for data retrieval and interaction, returning data as poreflow objects. Supports both UTube (.dat) and ONT (.fast5) files.

Extra data created when analyzing the file, like event or step data, is stored in a separate "annotations" file, thus preserving the original data.

Attributes:

annotator (Annotation): Annotation handler for the file.

Examples:

Opening an ONT file and reading the entire "raw" measurement from the first channel.

>>> import poreflow as pf
>>> f = pf.File("example.fast5")
>>> raw_data = f.get_raw(channel=0)
>>> f.close()

Closing the file can be done automatically using a with statement.

>>> with pf.File("example.fast5") as f:
...     raw_data = f.get_raw(channel=0)

UTube device measurements are opened identically:

>>> with pf.File("example_utube.dat") as f:
...     raw_data = f.get_raw()

Note that these measurements have only one channel, so the channel argument of File.get_raw can be omitted. Loading a UTube file creates a Fast5 file next to the .dat file, which is then used for I/O.

channels property

List of channels in poreFlow notation (0-indexed).

context_meta property

Gets context metadata from the file.

Returns:

dict: Context metadata containing experimental context information.

device property

Identifies the device type from tracking metadata.

Returns:

str: Either pf.UTUBE or pf.ONT, depending on the device ID.

events property writable

Gets all events from the file.

Returns:

pf.EventsDataFrame: DataFrame containing all events.

n_events property

Gets the number of events saved on disk.

Returns:

int: Number of events stored in the annotation file.

n_steps property

Gets the number of steps saved on disk.

Returns:

int: Number of steps stored in the annotation file.

name property writable

Gets or sets a pretty name for the file.

Gets a human-readable name for the file. If no custom name is set, returns the filename.

Returns:

str: The pretty name.

sfreq property

Gets the sampling frequency of the data.

Retrieves the sampling frequency from metadata sources in the Fast5 file.

Returns:

float: Sampling frequency in Hz.

steps property writable

Gets all steps from the file.

Returns:

pf.StepsDataFrame: DataFrame containing all steps.

tracking_meta property

Gets tracking metadata from the file.

Returns:

dict: Tracking metadata containing device information and identifiers.

__init__(filename, verbose=0, annotation_name=None, search_path=None, force_conversion=False, mode='r+')

Initializes the File object.

Parameters:

filename (str or Path, required): Path to the .fast5 or .dat file.
verbose (int, default 0): Controls warning messages for .dat file conversion. 2: use warnings.warn. 1: use print(). 0: suppress messages.
force_conversion (bool, default False): Force re-conversion of the .dat file to .fast5.
search_path (str or Path, default None): Path to search for annotations when loading a file. If None, annotation (.annot.fast5) files are searched for in the same directory as the data (.fast5) file. Conversely, if filename is an annotation file, search_path is searched for the data file.

Returns:

None
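The lookup described for search_path can be sketched with pathlib. This is a minimal sketch, not poreflow's implementation: it assumes the annotation file is named by replacing the data file's .fast5 suffix with .annot.fast5, and the helpers annotation_path and data_path are hypothetical names.

```python
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]
ANNOT_SUFFIX = ".annot.fast5"  # suffix named in the search_path description


def annotation_path(data_file: PathLike, search_path: Optional[PathLike] = None) -> Path:
    """Assumed naming: <stem>.fast5 -> <stem>.annot.fast5 (hypothetical helper)."""
    data_file = Path(data_file)
    # Default: look next to the data file; otherwise look in search_path.
    directory = Path(search_path) if search_path is not None else data_file.parent
    return directory / (data_file.stem + ANNOT_SUFFIX)


def data_path(annotation_file: PathLike, search_path: Optional[PathLike] = None) -> Path:
    """Inverse mapping: <stem>.annot.fast5 -> <stem>.fast5 (hypothetical helper)."""
    annotation_file = Path(annotation_file)
    name = annotation_file.name
    if not name.endswith(ANNOT_SUFFIX):
        raise ValueError(f"not an annotation file: {annotation_file}")
    stem = name[: -len(ANNOT_SUFFIX)]
    directory = Path(search_path) if search_path is not None else annotation_file.parent
    return directory / (stem + ".fast5")


print(annotation_path("/data/run1.fast5", search_path="/annots"))
```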

__len__()

Gets the total number of samples in the file.

Returns:

int: Total number of samples across all channels.

close()

Closes the file along with the annotation file.

Returns:

None

filter_events(mask)

Filter events based on a boolean mask.

Parameters:

mask (Series | ndarray, required): Boolean mask; True keeps the corresponding event.
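The "True keeps the event" rule behaves like boolean indexing in pandas (df[mask]). The plain-Python sketch below only illustrates that rule; apply_keep_mask is a hypothetical helper, and its length check is an assumption rather than documented behaviour of filter_events.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def apply_keep_mask(events: Sequence[T], mask: Sequence[bool]) -> List[T]:
    """Keep events[i] wherever mask[i] is True (hypothetical helper)."""
    if len(events) != len(mask):
        raise ValueError("mask must have exactly one entry per event")
    return [event for event, keep in zip(events, mask) if keep]


durations = [0.5, 2.0, 0.1]          # hypothetical per-event durations in seconds
mask = [d > 0.3 for d in durations]  # True -> keep the event
print(apply_keep_mask(["e0", "e1", "e2"], mask))  # ['e0', 'e1']
```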

find_events(processes=None, verbose=0, channels=None, index=None, **kwargs)

Find events in the file using parallel processing.

Finds events in channels and also stores open-state current fits for all channels searched.

Parameters:

processes (int, default None): Number of workers.
verbose (default 0): Verbosity level.
channels (list, default None): Specific channels to search.
index (tuple, default None)
**kwargs (default {}): Additional arguments for event detection. See

get_channels()

Gets a list of available channels in the file.

Retrieves channel numbers from the file, converting from ONT's 1-based indexing to poreFlow's 0-based indexing.

Returns:

list: List of channel numbers (0-indexed) available in the file.

Note

ONT stores channels in Fast5 files starting from 1. Channels in poreFlow are indexed from 0. So 0 → Channel 1, etc.
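The offset between the two conventions amounts to a shift by one. The helper names below are hypothetical and only restate the rule from the note above.

```python
def ont_to_poreflow(ont_channel: int) -> int:
    """Convert a 1-based ONT channel number to a 0-based poreFlow channel."""
    if ont_channel < 1:
        raise ValueError(f"ONT channels start at 1, got {ont_channel}")
    return ont_channel - 1


def poreflow_to_ont(channel: int) -> int:
    """Convert a 0-based poreFlow channel to a 1-based ONT channel number."""
    if channel < 0:
        raise ValueError(f"poreFlow channels start at 0, got {channel}")
    return channel + 1


# poreFlow channel 0 is ONT channel 1, and so on.
print([ont_to_poreflow(c) for c in (1, 2, 512)])  # [0, 1, 511]
```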

get_data(pointer, voltage=False)

Get data from a measurement as a poreflow.BaseDataFrame object.

This method retrieves the raw current data from a specified channel, allows for slicing and downsampling, and returns it as a poreflow.BaseDataFrame object.

Parameters:

pointer (Pointer, required): Pointer to the data.
voltage (bool, default False): Whether to return voltage data.

Returns:

poreflow.BaseDataFrame: An object containing the current data and metadata.

Notes

This method is intended as an abstract, general-purpose way to get data. Prefer the dedicated methods get_raw, get_event, etc.

get_event(item)

Get data from an event as a poreflow.EventDataFrame object.

This method retrieves the raw current data from a specified event. Requires events to have been found and stored in the annotation object.

Parameters:

item (int, required): The event number to retrieve data from.
downsample (float, default None): The new sampling frequency to downsample to. If None, no downsampling is performed.

Returns:

poreflow.EventDataFrame: An object containing the current of an event.

get_events(channel=None)

Reads events from disk.

Parameters:

channel (int, default None): Only return events from a specific channel. If None, events from all channels are returned.

Returns:

pf.EventsDataFrame: DataFrame containing events.

get_raw(channel=0, times=None, index=None)

Get data from a channel as a poreflow.RawDataFrame object.

This method retrieves the raw current and voltage data from a specified channel, allows for slicing and downsampling, and returns it packaged as a dataframe-type object.

Parameters:

channel (int, default 0): The channel number to retrieve data from.
times (tuple[float, float], default None): A (start_time, end_time) tuple in seconds used to slice the data. If None, all data is returned.
index (tuple[int, int], default None): A (start_index, end_index) tuple used to slice the data. If None, all data is returned.
downsample (float, default None): The new sampling frequency to downsample to. If None, no downsampling is performed.

Returns:

poreflow.RawDataFrame: An object containing the current and voltage data and metadata.
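How times, index and downsample might interact can be sketched on a plain list of samples. This is a minimal sketch under stated assumptions, not the library's code: it assumes times maps to sample indices by multiplying by the sampling frequency (see sfreq), that times and index are mutually exclusive, and it downsamples by averaging non-overlapping blocks, which poreflow may not do. All helper names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def times_to_index(times: Tuple[float, float], sfreq: float) -> Tuple[int, int]:
    """Convert (start_time, end_time) in seconds to (start_index, end_index)."""
    start, end = times
    return int(round(start * sfreq)), int(round(end * sfreq))


def block_downsample(samples: Sequence[float], sfreq: float, new_sfreq: float) -> List[float]:
    """Average non-overlapping blocks of about sfreq / new_sfreq samples."""
    if new_sfreq <= 0 or new_sfreq > sfreq:
        raise ValueError("new_sfreq must be in (0, sfreq]")
    # Non-integer ratios are rounded, so the effective rate is sfreq / factor.
    factor = int(round(sfreq / new_sfreq))
    n_blocks = len(samples) // factor  # a trailing partial block is dropped
    return [sum(samples[i * factor:(i + 1) * factor]) / factor for i in range(n_blocks)]


def slice_and_downsample(
    samples: Sequence[float],
    sfreq: float,
    times: Optional[Tuple[float, float]] = None,
    index: Optional[Tuple[int, int]] = None,
    downsample: Optional[float] = None,
) -> List[float]:
    """Hypothetical stand-in for the slicing/downsampling steps of get_raw."""
    if times is not None and index is not None:
        raise ValueError("pass either times or index, not both")
    if times is not None:
        index = times_to_index(times, sfreq)
    if index is not None:
        start, end = index
        samples = samples[start:end]
    if downsample is not None:
        samples = block_downsample(samples, sfreq, downsample)
    return list(samples)


current = [float(i) for i in range(10)]  # 10 samples recorded at 10 Hz
print(slice_and_downsample(current, 10.0, times=(0.2, 0.6), downsample=5.0))  # [2.5, 4.5]
```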

get_steps(event=None)

Reads steps from disk.

Parameters:

event (int, default None): Only return steps from a specific event. If None, steps from all events are returned.

Returns:

pf.StepsDataFrame: DataFrame containing steps.

get_t(n=None)

Get time array.

Parameters:

n (int, default None): Number of time points to return. If None, the number of time points equals the number of samples in the current data.

Returns:

The time array.
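A time array of this kind can be sketched as sample indices divided by the sampling frequency. This sketch assumes the axis starts at 0 s and is expressed in seconds; time_axis is a hypothetical helper, and n_samples stands in for the length of the current data.

```python
from typing import List, Optional


def time_axis(sfreq: float, n_samples: int, n: Optional[int] = None) -> List[float]:
    """Return n time points spaced 1/sfreq apart, starting at 0 (hypothetical helper).

    If n is None, one time point per sample is returned (n = n_samples).
    """
    if n is None:
        n = n_samples
    return [i / sfreq for i in range(n)]


print(time_axis(sfreq=2.0, n_samples=4))  # [0.0, 0.5, 1.0, 1.5]
```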

map(worker_func, pointers, processes=None, verbose=0, callback=None, **kwargs)

Run a worker function in parallel over a list of pointers.

Parameters:

worker_func (Callable, required): Function to run. Must accept (fname, channel, lock, **kwargs).
pointers (list[Pointer], required): List of pointers.
processes (int, default None): Number of worker processes to use. If None, the number of logical CPUs in the system is used.
verbose (int, default 0): Verbosity level.
callback (Callable, default None): Function to call with each result.
**kwargs (default {}): Keyword arguments to pass to the worker function.

Returns:

list: List of results from the worker function.

Notes

This method is intended for advanced users. For usage examples, see the source code of File.find_events or File.find_steps.
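The worker contract can be illustrated without poreflow. The sketch below mimics only what the parameter list documents: each call receives (fname, channel, lock, **kwargs), the callback sees every result, and the results come back as a list. It is a hypothetical stand-in, not poreflow's implementation: it uses a thread pool so the example stays self-contained, plain channel numbers replace Pointer objects, and toy_worker, run_workers and annotation_log are made-up names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Optional, Sequence

# Hypothetical shared resource standing in for the annotation file.
annotation_log: List[tuple] = []


def toy_worker(fname: str, channel: int, lock: threading.Lock, scale: int = 1, **kwargs: Any) -> tuple:
    """Hypothetical worker following the (fname, channel, lock, **kwargs) contract."""
    # A real worker would open `fname` and analyse `channel`; here the result is faked.
    result = (channel, channel * scale)
    # Serialize writes to the shared resource so workers do not interleave them.
    with lock:
        annotation_log.append(result)
    return result


def run_workers(
    worker_func: Callable[..., Any],
    fname: str,
    channels: Sequence[int],
    processes: Optional[int] = None,
    callback: Optional[Callable[[Any], None]] = None,
    **kwargs: Any,
) -> List[Any]:
    """Thread-based stand-in for File.map: fan out, collect results in input order."""
    lock = threading.Lock()
    results = []
    with ThreadPoolExecutor(max_workers=processes) as pool:
        futures = [pool.submit(worker_func, fname, ch, lock, **kwargs) for ch in channels]
        for fut in futures:
            res = fut.result()
            if callback is not None:
                callback(res)
            results.append(res)
    return results


seen: List[tuple] = []
out = run_workers(toy_worker, "example.fast5", [0, 1, 2], processes=2, callback=seen.append, scale=2)
print(out)  # [(0, 0), (1, 2), (2, 4)]
```

For CPU-bound analysis a process pool (multiprocessing, or concurrent.futures.ProcessPoolExecutor) is the closer match to File.map; the lock then has to be shareable between processes, for example one created by a multiprocessing.Manager.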

remove_events(channels=None, events=None, include_ios=True, include_steps=True)

Removes events from specific channels, or all events.

Parameters:

channels (int | list[int] | None, default None):
  • If int: Remove events from the specified channel only.
  • If list[int]: Remove events from the specified channels.
  • If None: Remove all events from all channels.
events (int | list[int] | None, default None)
include_ios (bool, default True): Whether to also remove IOS fits from the specified channels.
include_steps (bool, default True): Whether to also remove steps in the specified channels.

Returns:

None

Todo

Docs
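The int | list[int] | None convention used by channels here (and by remove_ios) can be normalized into an explicit list before doing any work. This is a sketch with a hypothetical helper; rejecting unknown channel numbers is an assumption, not documented behaviour.

```python
from typing import List, Sequence, Union


def normalize_channels(channels: Union[int, Sequence[int], None], available: Sequence[int]) -> List[int]:
    """Turn an int | list[int] | None channel argument into an explicit list."""
    if channels is None:
        # None means "all channels".
        return list(available)
    if isinstance(channels, int):
        # A single int targets exactly one channel.
        channels = [channels]
    unknown = [c for c in channels if c not in available]
    if unknown:
        raise ValueError(f"unknown channel(s): {unknown}")
    return list(channels)


print(normalize_channels(None, available=[0, 1, 2]))    # [0, 1, 2]
print(normalize_channels(1, available=[0, 1, 2]))       # [1]
print(normalize_channels([0, 2], available=[0, 1, 2]))  # [0, 2]
```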

remove_ios(channels=None)

Removes open-state current (IOS) fits from specific channels or from all channels.

Instead of deleting the entries, sets the polynomial coefficients to NaN to preserve the dataset structure.
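The "set to NaN instead of delete" approach keeps every entry and its length while marking the fit as absent. The sketch assumes, purely for illustration, that fits are held as a dict mapping channel to a list of polynomial coefficients; poreflow stores them in the annotation file, and nan_out_fits is a hypothetical name.

```python
import math
from typing import Dict, Iterable, List


def nan_out_fits(fits: Dict[int, List[float]], channels: Iterable[int]) -> None:
    """Replace each listed channel's coefficients with NaN, in place.

    The entry and its length are kept, so the stored structure is unchanged.
    """
    for ch in channels:
        fits[ch] = [math.nan] * len(fits[ch])


fits = {0: [1.2, -0.3, 0.01], 1: [1.1, -0.2, 0.02]}
nan_out_fits(fits, [0])
print(fits[0])  # [nan, nan, nan]
print(fits[1])  # [1.1, -0.2, 0.02]
```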

Parameters:

channels (int | list[int] | None, default None):
  • If int: Remove the IOS fit from the specified channel only.
  • If list[int]: Remove IOS fits from the specified channels.
  • If None: Remove IOS fits from all channels.

Returns:

None

Examples:

Remove IOS fit from channel 0 only:

>>> f.remove_ios(0)

Remove IOS fits from channels 0 and 2:

>>> f.remove_ios([0, 2])

Remove all IOS fits:

>>> f.remove_ios(None)

set_steps(df_steps, mode='w')

Saves steps to disk.

Parameters:

df_steps (StepsDataFrame, required): Steps to save. Set to None to delete all steps stored on disk.
mode (str, default 'w'): Whether to replace ('w') or append ('a') to the existing steps on disk.
'w'