poreflow.File
Bases: DataFile
An interface for handling sequencing files.
This class extends fast5_research.BulkFast5 to provide
more convenient methods for data retrieval and interaction, returning
data as poreflow objects. Supports both UTube (.dat) and ONT (.fast5)
files.
Extra data created when analyzing the file, such as event or step data, is stored in a separate "annotations" file, thus preserving the original data.
Attributes:
| Name | Type | Description |
|---|---|---|
| annotator | Annotation | Annotation handler for the file. |
Examples:
Opening an ONT file and reading the entire "raw" measurement from the first channel.
>>> import poreflow as pf
>>> f = pf.File("example.fast5")
>>> raw_data = f.get_raw(channel=0)
>>> f.close()
Closing the file can be done automatically using a with statement.
UTube device measurements are opened identically.
Note that these measurements only have one channel, so the
channel argument in File.get_raw can be left out. Loading
a UTube file will place a Fast5 file next to the .dat file, which is
used for I/O.
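The with-statement pattern mentioned above can be sketched without poreflow installed. The class below is a hypothetical stand-in, not the library's implementation: it only mimics the `get_raw` and `close` calls shown in the examples and assumes the file closes itself when the block exits.

```python
class StandInFile:
    """Hypothetical stand-in for pf.File; it only mimics open/close behaviour."""

    def __init__(self, filename):
        self.filename = filename
        self.closed = False

    def get_raw(self, channel=0):
        if self.closed:
            raise ValueError("file is closed")
        # Placeholder samples; a real file would return measured data.
        return [0.0, 1.0, 2.0]

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Close on exit, whether or not the block raised an exception.
        self.close()
        return False


with StandInFile("example.fast5") as f:
    raw_data = f.get_raw(channel=0)

# After the block, the file has been closed without an explicit f.close().
is_closed = f.closed
```

With poreflow itself, the equivalent would presumably be `with pf.File("example.fast5") as f:` followed by the same `get_raw` call.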
channels
property
List of channels in poreFlow notation
context_meta
property
Gets context metadata from the file.
Returns:
| Name | Type | Description |
|---|---|---|
| dict | dict | Context metadata containing experimental context information. |
device
property
Identifies the device type from tracking metadata.
Returns:
| Name | Type | Description |
|---|---|---|
| str | | Either pf.UTUBE or pf.ONT depending on device ID. |
events
property
writable
Gets all events from the file.
Returns:
| Type | Description |
|---|---|
| EventsDataFrame | pf.EventsDataFrame: DataFrame containing all events. |
n_events
property
Gets the number of events saved on disk.
Returns:
| Name | Type | Description |
|---|---|---|
| int | int | Number of events stored in the annotation file. |
n_steps
property
Gets the number of steps saved on disk.
Returns:
| Name | Type | Description |
|---|---|---|
| int | int | Number of steps stored in the annotation file. |
name
property
writable
Gets or sets a pretty name for the file.
Gets a human-readable name for the file. If no custom name is set, returns the filename.
Returns:
| Name | Type | Description |
|---|---|---|
| str | str | The pretty name. |
sfreq
property
Gets the sampling frequency of the data.
Retrieves the sampling frequency from metadata sources in the Fast5 file.
Returns:
| Name | Type | Description |
|---|---|---|
| float | float | Sampling frequency in Hz. |
steps
property
writable
Gets all steps from the file.
Returns:
| Type | Description |
|---|---|
| StepsDataFrame | pf.StepsDataFrame: DataFrame containing all steps. |
tracking_meta
property
Gets tracking metadata from the file.
Returns:
| Name | Type | Description |
|---|---|---|
| dict | dict | Tracking metadata containing device information and identifiers. |
__init__(filename, verbose=0, annotation_name=None, search_path=None, force_conversion=False, mode='r+')
Initializes the File object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filename | str or Path | Path to the .fast5 or .dat file. | required |
| verbose | int | Controls warning messages for .dat file conversion. 2: Use warnings.warn. 1: Use print(). 0 (default): Suppress messages. | 0 |
| force_conversion | bool | Force re-conversion of the .dat file to .fast5. | False |
| search_path | str or Path | Path to search for annotations when loading a file. Defaults to None, in which case annotation (.annot.fast5) files are searched for in the same directory as the data (.fast5) file. | None |
Returns:
| Type | Description |
|---|---|
| | None |
__len__()
Gets the total number of samples in the file.
Returns:
| Name | Type | Description |
|---|---|---|
| int | int | Total number of samples across all channels. |
close()
Closes the file along with the annotation file.
Returns:
| Type | Description |
|---|---|
| | None |
filter_events(mask)
Filters events based on a mask.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mask | Series \| ndarray | Boolean mask; events where the mask is True are kept. | required |
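To illustrate the mask convention (True keeps an event), here is a minimal sketch in plain Python. The event records and the `duration` field are made up for illustration. In poreflow the mask would be a Series or ndarray, presumably aligned with the events table.

```python
# Hypothetical event records; the field names are illustrative only.
events = [
    {"event": 0, "duration": 0.8},
    {"event": 1, "duration": 0.1},
    {"event": 2, "duration": 1.5},
]

# Build a boolean mask: True means "keep this event".
mask = [ev["duration"] >= 0.5 for ev in events]

# Apply the mask, keeping only the events flagged True.
kept = [ev for ev, keep in zip(events, mask) if keep]
kept_ids = [ev["event"] for ev in kept]
```

Here the short event (number 1) is dropped and the other two remain.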
find_events(processes=None, verbose=0, channels=None, index=None, **kwargs)
Finds events in the file using parallel processing.
Finds events in channels. Also stores open-state current fits for all channels searched.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| processes | int | Number of workers. | None |
| verbose | | Verbosity level. | 0 |
| channels | list | Specific channels to search. | None |
| index | tuple | | None |
| **kwargs | | Additional arguments for event detection. | {} |
get_channels()
Gets a list of available channels in the file.
Retrieves channel numbers from the file, converting from ONT's 1-based indexing to poreFlow's 0-based indexing.
Returns:
| Name | Type | Description |
|---|---|---|
| list | | List of channel numbers (0-indexed) available in the file. |
Note
ONT stores channels in Fast5 files starting from 1, whereas channels in poreFlow are indexed from 0. poreFlow channel 0 therefore corresponds to ONT channel 1, and so on.
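The off-by-one relationship between the two numbering schemes can be written out as a small sketch. The helper names `ont_to_poreflow` and `poreflow_to_ont` are hypothetical and not part of poreflow.

```python
def ont_to_poreflow(ont_channel):
    """Convert a 1-based ONT channel number to a 0-based poreFlow index."""
    if ont_channel < 1:
        raise ValueError("ONT channel numbers start at 1")
    return ont_channel - 1


def poreflow_to_ont(channel):
    """Convert a 0-based poreFlow index back to a 1-based ONT channel number."""
    if channel < 0:
        raise ValueError("poreFlow channel indices start at 0")
    return channel + 1


# Example: ONT channels 1, 2 and 5 as they would appear in poreFlow notation.
ont_channels = [1, 2, 5]
pf_channels = [ont_to_poreflow(c) for c in ont_channels]
```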
get_data(pointer, voltage=False)
Get data from a measurement as a poreflow.BaseDataFrame object.
This method retrieves the raw current data from a specified channel,
allows for slicing and downsampling, and returns it as a
poreflow.BaseDataFrame object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pointer | Pointer | Pointer to the data. | required |
| voltage | bool | Whether to return voltage data. | False |
Returns:
| Type | Description |
|---|---|
| BaseDataFrame | poreflow.base.BaseData: An object containing the current data and metadata. |
Notes
This method is intended as an abstract, general way to get data. Using the methods get_raw, get_event, etc. is preferred.
get_event(item)
Get data from an event as a poreflow.EventDataFrame object.
This method retrieves the raw current data from a specified event. Requires events to have been found and stored in the annotation object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| item | int | The event number to retrieve data from. | required |
| downsample | float | The new sampling frequency to downsample to. If None, no downsampling is performed. Defaults to None. | required |
Returns:
| Type | Description |
|---|---|
| EventDataFrame | poreflow.EventDataFrame: An object containing the current of an event. |
get_events(channel=None)
Reads events from disk.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| channel | int | Only return events from a specific channel. Defaults to None, in which case events from all channels are returned. | None |
Returns:
| Type | Description |
|---|---|
| EventsDataFrame | pf.EventsDataFrame: DataFrame containing events. |
get_raw(channel=0, times=None, index=None)
Get data from a channel as a poreflow.RawDataFrame object.
This method retrieves the raw current and voltage data from a specified channel, allows for slicing and downsampling, and returns it packaged as a dataframe-type object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| channel | int | The channel number to retrieve data from. Defaults to 0. | 0 |
| times | tuple[float, float] | A tuple of (start_time, end_time) in seconds to slice the data. Defaults to None (all data). | None |
| index | tuple[int, int] | A tuple of (start_index, end_index) to slice the data. Defaults to None (all data). | None |
| downsample | float | The new sampling frequency to downsample to. If None, no downsampling is performed. Defaults to None. | required |
Returns:
| Type | Description |
|---|---|
| RawDataFrame | poreflow.RawDataFrame: An object containing the current and voltage data and metadata. |
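The `times` and `index` arguments describe the same kind of slice in different units, linked by the sampling frequency. The sketch below shows one plausible conversion. The helper names are hypothetical, and the rounding convention is an assumption; poreflow may round or truncate differently.

```python
def times_to_index(times, sfreq):
    """Convert (start_time, end_time) in seconds to sample indices.

    Assumes sample 0 sits at t = 0 s and rounds to the nearest sample.
    """
    start_time, end_time = times
    return (int(round(start_time * sfreq)), int(round(end_time * sfreq)))


def index_to_times(index, sfreq):
    """Convert (start_index, end_index) back to times in seconds."""
    start_index, end_index = index
    return (start_index / sfreq, end_index / sfreq)


sfreq = 5000.0  # Hz, example value
index = times_to_index((1.0, 2.5), sfreq)
times = index_to_times(index, sfreq)
```

At 5 kHz, the window from 1.0 s to 2.5 s spans samples 5000 to 12500 under these assumptions.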
get_steps(event=None)
Reads steps from disk.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| event | int | Only return steps from a specific event. Defaults to None, in which case steps from all events are returned. | None |
Returns:
| Type | Description |
|---|---|
| StepsDataFrame | pf.StepsDataFrame: DataFrame containing steps. |
get_t(n=None)
Gets the time array.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n | int | Number of time points to return. Defaults to None, in which case the number of time points equals the number of samples in the current data. | None |
Returns:
map(worker_func, pointers, processes=None, verbose=0, callback=None, **kwargs)
Run a worker function in parallel over some mappable.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| worker_func | Callable | Function to run. Must accept (fname, channel, lock, **kwargs). | required |
| pointers | list[Pointer] | List of pointers. | required |
| processes | int | Number of worker processes to use. If processes is None (default), the number of logical CPUs in the system is used. | None |
| verbose | int | Verbosity level. | 0 |
| callback | Callable | Function to call with each result. | None |
| **kwargs | | Keyword arguments to pass to the worker function. | {} |
Returns:
| Name | Type | Description |
|---|---|---|
| list | | List of results from the worker function. |
Notes
This method is intended for advanced users. For usage examples, see the source code of File.find_events or File.find_steps.
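As a hedged sketch of what a compatible worker might look like, the function below follows the documented `(fname, channel, lock, **kwargs)` shape. Everything else is an assumption: the worker name, the placeholder data, and the serial loop that stands in for the parallel dispatch. How File.map turns pointers into worker calls is not shown here.

```python
import threading


def count_above(fname, channel, lock, threshold=0.0, **kwargs):
    """Hypothetical worker with the documented (fname, channel, lock, **kwargs) shape."""
    # Placeholder samples standing in for data read from `fname`.
    samples = [0.2, 1.4, 0.9, 2.1]
    n_above = sum(1 for s in samples if s > threshold)
    # A real worker would write to shared files here; the lock serialises that step.
    with lock:
        result = {"fname": fname, "channel": channel, "n_above": n_above}
    return result


lock = threading.Lock()  # stand-in for the lock handed to each worker
collected = []           # filled by the code playing the role of `callback`
results = []

# Serial stand-in for the parallel map: one call per channel.
for channel in [0, 1]:
    res = count_above("example.fast5", channel, lock, threshold=1.0)
    collected.append(res["n_above"])
    results.append(res)
```

Extra keyword arguments such as `threshold` correspond to the `**kwargs` that are forwarded to the worker.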
remove_events(channels=None, events=None, include_ios=True, include_steps=True)
Removes events from specific channels or all events.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| events | int \| list[int] \| None | | None |
| channels | int \| list[int] \| None | | None |
| include_ios | bool | Whether to also remove IOS fits from the specified channels. Defaults to True. | True |
| include_steps | bool | Whether to also remove steps in the specified channels. Defaults to True. | True |
Returns:
| Type | Description |
|---|---|
| None | None |
Todo
Docs
remove_ios(channels=None)
Removes open state current fits from specific channels or all channels.
Instead of deleting the entries, sets the polynomial coefficients to NaN to preserve the dataset structure.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| channels | int \| list[int] \| None | | None |
Returns:
| Type | Description |
|---|---|
| None | None |
Examples:
Remove IOS fit from channel 0 only:
Remove IOS fits from channels 0 and 2:
Remove all IOS fits:
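The three cases above, and the NaN-instead-of-delete behaviour, can be mimicked with a small stand-in. The `IosStore` class is hypothetical and only models the described semantics. It assumes that `channels` accepts an int, a list of ints, or None, as in the signature.

```python
import math


class IosStore:
    """Hypothetical stand-in holding one list of polynomial coefficients per channel."""

    def __init__(self, fits):
        self.fits = {ch: list(coeffs) for ch, coeffs in fits.items()}

    def remove_ios(self, channels=None):
        # None means "all channels"; a single int is treated as a one-item list.
        if channels is None:
            channels = list(self.fits)
        elif isinstance(channels, int):
            channels = [channels]
        for ch in channels:
            # Overwrite with NaN rather than deleting, so the structure is preserved.
            self.fits[ch] = [math.nan] * len(self.fits[ch])


store = IosStore({0: [1.0, 0.1], 1: [2.0, 0.2], 2: [3.0, 0.3]})

store.remove_ios(channels=0)              # channel 0 only
untouched_after_first = list(store.fits[1])

store.remove_ios(channels=[0, 2])         # channels 0 and 2
store.remove_ios()                        # all channels
```

After the final call every channel is still present, but all of its coefficients are NaN.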
set_steps(df_steps, mode='w')
Saves steps to disk.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df_steps | StepsDataFrame | Steps to save. Set to None to delete all steps stored on disk. | required |
| mode | str | Whether to replace ('w') or append ('a') to existing steps on disk. | 'w' |
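The difference between the two modes can be illustrated with plain lists. The `save_steps` helper is hypothetical and only models replace versus append. It does not touch any file and says nothing about how poreflow stores steps.

```python
def save_steps(stored, new_steps, mode="w"):
    """Return what would be stored after saving `new_steps` with the given mode."""
    if mode == "w":
        # Replace: existing steps are discarded.
        return list(new_steps)
    if mode == "a":
        # Append: new steps are added after the existing ones.
        return list(stored) + list(new_steps)
    raise ValueError("mode must be 'w' or 'a'")


on_disk = ["step0", "step1"]
replaced = save_steps(on_disk, ["step2"], mode="w")
appended = save_steps(on_disk, ["step2"], mode="a")
```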