pylimer_tools.io package

Submodules

pylimer_tools.io.extract_thermo_data module

pylimer_tools.io.extract_thermo_data.detect_headers(file: str, max_nr_of_lines_to_read: int = 1500, use_cache: bool = True) → List[str]

Read max_nr_of_lines_to_read lines from the given file and return all possible header lines.

Some assumptions are made regarding the columns, e.g., that 75% of them start with a character.

Parameters:

file (str) – The file to search for header lines
max_nr_of_lines_to_read (int) – The number of lines to read in search for header lines. Use a negative number to read the whole file.
use_cache (bool) – Whether to read the result from the cache. The cache is ignored if the file has changed since the cache was written.

Returns:

List of detected header lines

Return type:

List[str]
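
The heuristic described above can be sketched in plain Python. This is not the library's implementation: the helper names `looks_like_header` and `detect_header_candidates` are hypothetical, and reading the 75% rule as "at least 75% of the whitespace-separated columns start with a letter" is an assumption made for illustration.

```python
def looks_like_header(line: str, min_fraction: float = 0.75) -> bool:
    """Return True if at least min_fraction of the whitespace-separated
    columns start with a letter (assumed reading of the 75% rule)."""
    columns = line.split()
    if not columns:
        return False
    starting_with_letter = sum(1 for col in columns if col[0].isalpha())
    return starting_with_letter / len(columns) >= min_fraction


def detect_header_candidates(lines, max_lines=1500):
    """Collect unique header-like lines among the first max_lines lines.
    A negative max_lines means: look at all lines."""
    candidates = []
    for index, line in enumerate(lines):
        if 0 <= max_lines <= index:
            break
        stripped = line.strip()
        if looks_like_header(stripped) and stripped not in candidates:
            candidates.append(stripped)
    return candidates


sample = [
    "Step Temp E_pair E_mol TotEng Press",
    "0 1.0 -5.2 0.3 -4.9 0.12",
    "100 1.1 -5.1 0.3 -4.8 0.10",
    "Step Temp E_pair E_mol TotEng Press",
]
print(detect_header_candidates(sample))
```

Note that such a heuristic also flags plain-text lines (for example, warnings) as candidates, which is consistent with the function returning all possible header lines rather than a single one.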

pylimer_tools.io.extract_thermo_data.extract_thermo_params(file, header: str | List[str] | None = 'Step Temp E_pair E_mol TotEng Press', texts_to_read: int = 50, min_line_len: int = 5, use_cache: bool = True, lines_to_read_to_detect_header: int = 100000, lines_to_read_till_header: float = -1) → DataFrame

Extract the thermodynamic outputs produced for this simulation, i.e., in LAMMPS, by the thermo command.

In particular, this function can handle log files and sections with different columns, and it skips over warnings as well as broken lines.

Note: The header parameter can be a list. Take care when reading a file that contains sections with different headers.

Parameters:

file (str) – The file path to the file to read from
header (Union[str, List[str], None]) – The header of the CSV (where to start reading at). Can be a string, a list of strings, or None if you want to try the detection.
texts_to_read (int) – The number of times to expect the header
min_line_len (int) – The minimal length of a line to be accepted as data
use_cache (bool) – Whether to read from the cache (the cache is written either way). The cache is not read if the file has changed in the meantime.
lines_to_read_to_detect_header (int) – The number of lines to read when trying to detect headers
lines_to_read_till_header (float) – The number of lines that may be skipped before a header must have been found. This is useful for (a) finding the header and (b) exiting early if you are unsure about the header(s).

Returns:

The thermodynamic parameters

Return type:

pd.DataFrame
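
The behaviour described above (start reading at each header, skip warnings and broken lines) can be illustrated with a small standalone sketch. It is not the library's code: the function name `extract_thermo_rows`, the rule that a section ends at a line starting with "Loop time", and the log excerpt are all assumptions, and it returns a list of dictionaries rather than a DataFrame.

```python
def extract_thermo_rows(lines, header="Step Temp E_pair E_mol TotEng Press",
                        min_line_len=5):
    """Collect thermo rows that follow each occurrence of the header.

    Assumptions of this sketch: a section ends at a line starting with
    "Loop time", and any line that is too short or does not parse into
    exactly one number per column is skipped (warnings, broken lines).
    """
    columns = header.split()
    rows = []
    in_section = False
    for raw_line in lines:
        line = raw_line.strip()
        if line == header:
            in_section = True
            continue
        if not in_section:
            continue
        if line.startswith("Loop time"):
            in_section = False
            continue
        if len(line) < min_line_len:
            continue
        try:
            values = [float(token) for token in line.split()]
        except ValueError:
            continue  # e.g. "WARNING: ..." in the middle of a section
        if len(values) == len(columns):
            rows.append(dict(zip(columns, values)))
    return rows


log_lines = [
    "LAMMPS (hypothetical log excerpt)",
    "Step Temp E_pair E_mol TotEng Press",
    "0 1.0 -5.2 0.3 -4.9 0.12",
    "WARNING: something noteworthy happened",
    "100 1.1 -5.1",
    "200 1.2 -5.0 0.3 -4.7 0.09",
    "Loop time of 1.23 on 1 procs",
    "300 9 9 9 9 9",
]
rows = extract_thermo_rows(log_lines)
print([row["Step"] for row in rows])
```

In this excerpt, the warning and the truncated row are dropped, and the line after the end of the section is ignored because no header precedes it.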

pylimer_tools.io.extract_thermo_data.get_thermo_cache_name_suffix(header: str | List[str] | None = 'Step Temp E_pair E_mol TotEng Press', texts_to_read: float = 50, min_line_len: float = 5) → str

Compose a cache file suffix that distinguishes different thermo reader parameters.

Parameters:

header (Union[str, List[str], None]) – The header of the CSV (where to start reading at)
texts_to_read (float) – The number of times to expect the header
min_line_len (float) – The minimal length of a line to be accepted as data

Returns:

A string to be used as cache file suffix

Return type:

str
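
One way to build such a suffix is to hash the reader parameters, so that any change in them yields a different cache file. The sketch below is an assumption about how this could be done, not the scheme the library actually uses; the function name `cache_name_suffix` and the `-<digest>.cache` format are hypothetical.

```python
import hashlib


def cache_name_suffix(header="Step Temp E_pair E_mol TotEng Press",
                      texts_to_read=50, min_line_len=5):
    """Build a short suffix that changes whenever a reader parameter changes."""
    # Lists are not hashable and have no stable canonical form, so convert
    # them to tuples before taking the repr.
    normalized = tuple(header) if isinstance(header, list) else header
    key = repr((normalized, texts_to_read, min_line_len))
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]
    return f"-{digest}.cache"


print(cache_name_suffix())
print(cache_name_suffix(header=["Step Temp", "Step Press"]))
```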

pylimer_tools.io.extract_thermo_data.read_multi_section_separated_value_file(file: str, separator: str | None = None, use_cache: bool = True, comment: str | None = None, skip_err: bool = False) → DataFrame

Reads a file with multiple sections that have different headers throughout the file.

This function handles files with multiple data sections that may have different column structures. It automatically detects the separator if not specified and combines all sections into a single DataFrame.

Parameters:

file (str) – Path to the file to read
separator (Union[str, None]) – Character used to separate values in the file (auto-detected if None)
use_cache (bool) – Whether to use cached results if available
comment (Union[str, None]) – Character indicating the start of comments (e.g., “#”)
skip_err (bool) – Whether to skip errors when processing sections

Returns:

Combined DataFrame containing all data from the file

Return type:

pd.DataFrame

Note

Particularly useful for reading output files from the DPDSimulator or other multi-section files where the structure may change between sections.
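
To make the idea of combining sections with different columns concrete, here is a minimal standalone sketch. It is not the library's implementation: the function name `read_multi_section`, the rule that a line without any numeric field is a header, and the sample data are assumptions. It returns a list of dictionaries instead of a DataFrame.

```python
def read_multi_section(lines, separator=None, comment="#"):
    """Combine sections with different headers into one list of records.

    Assumption of this sketch: a line where no field parses as a number is a
    header that starts a new section. Missing columns are filled with None.
    """
    def is_number(token):
        try:
            float(token)
            return True
        except ValueError:
            return False

    records, all_columns, header = [], [], None
    for raw_line in lines:
        line = raw_line.split(comment, 1)[0] if comment else raw_line
        line = line.strip()
        if not line:
            continue
        fields = [field.strip() for field in line.split(separator)]
        numeric = [is_number(field) for field in fields]
        if not any(numeric):
            header = fields
            for column in header:
                if column not in all_columns:
                    all_columns.append(column)
            continue
        if header is None or not all(numeric) or len(fields) != len(header):
            continue  # data before any header, or a malformed row
        records.append(dict(zip(header, (float(field) for field in fields))))
    # Give every record the union of all columns seen in the file.
    return [{col: rec.get(col) for col in all_columns} for rec in records]


sample_lines = [
    "# produced by a hypothetical simulator",
    "Step Pressure",
    "0 1.5",
    "10 1.4",
    "Step Pressure Volume",
    "20 1.3 100.0",
]
combined = read_multi_section(sample_lines)
print(combined)
```

Rows from the first section carry `None` in the `Volume` column, which mirrors how a combined table has to represent values that a section did not provide.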

pylimer_tools.io.extract_thermo_data.read_one_group(fp, header, min_line_len=4, additional_lines_skip=0, lines_to_read_till_header=1000.0) → str

Read one group of CSV lines from the file.

Parameters:

fp (file object) – The file pointer to the file to read from
header (str or list) – The header of the CSV (where to start reading at)
min_line_len (int) – The minimal length of a line to be accepted as data
additional_lines_skip (int) – Number of lines to skip after reading the header
lines_to_read_till_header (float) – Maximum number of lines to read until finding the header

Returns:

The filename of a temporary CSV file, or empty string if no data was read

Return type:

str

pylimer_tools.io.read_pylimer_tools_output_file module

This module provides a few functions to read output from pylimer_tools_cpp’s simulators.

pylimer_tools.io.read_pylimer_tools_output_file.read_avg_file(filename: str) → DataFrame

Read an averages-output file from one of the simulators shipped with pylimer_tools.

This function parses the output file format used by pylimer_tools_cpp simulators, handling multiple data sections and converting them to a pandas DataFrame. The function also caches results to improve performance on subsequent reads.

Parameters:

filename (str) – Path to the averages file to read

Returns:

DataFrame containing the parsed averages data, grouped by OutputStep

Return type:

pd.DataFrame

Note

The function automatically filters out lines containing “-nan” values, null characters, or fewer than 3 columns. The returned DataFrame is grouped by OutputStep, keeping only the last entry for each step.
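
The filtering and de-duplication rules stated above can be sketched without the library. The helper names `filter_avg_lines` and `keep_last_per_step`, the column names `A` and `B`, and the sample lines are hypothetical; only the rules themselves come from the description.

```python
def filter_avg_lines(lines):
    """Drop lines containing "-nan" or null characters, or with < 3 columns."""
    return [
        line for line in lines
        if "-nan" not in line and "\x00" not in line and len(line.split()) >= 3
    ]


def keep_last_per_step(rows, step_key="OutputStep"):
    """Keep only the last row seen for each value of step_key."""
    last = {}
    for row in rows:
        last[row[step_key]] = row  # later rows overwrite earlier ones
    return list(last.values())


raw = [
    "0 1.0 2.0",
    "10 -nan 2.1",       # dropped: contains "-nan"
    "10 1.1 2.1",
    "20 1.2",            # dropped: fewer than 3 columns
    "20 1.3 2.3\x00",    # dropped: contains a null character
    "10 1.15 2.15",      # replaces the earlier entry for step 10
]
kept = filter_avg_lines(raw)
rows = [dict(zip(("OutputStep", "A", "B"), map(float, line.split())))
        for line in kept]
result = keep_last_per_step(rows)
print(result)
```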

pylimer_tools.io.read_lammps_output_file module

This module provides a few functions to read LAMMPS’ output files, including:

log files (thermo output)
dump files (focusing on the coordinates of atoms)
data files (the LAMMPS structure)
averaged data (from fix ave/time... or fix ave/hist...)
correlation data (from fix ave/correlate/...)

pylimer_tools.io.read_lammps_output_file.read_averages_file(filepath, use_cache: bool = True, sep=' ') → DataFrame

Read a file written by a fix ave/time command.

Uses pandas’ read_csv after detecting the columns.

Important assumption: The first 2 or 3 lines in the file are:

comment,
then one header indicating the columns,
and then either data or potentially a second header, if it is a sectioned file (e.g., from a fix ave/time … vector)

Parameters:

filepath (str) – Path to the averages file
use_cache (bool) – Whether to use the cache to speed up reading & writing
sep (str) – Delimiter used in the file (default is space)

Returns:

DataFrame containing the parsed average data

Return type:

pd.DataFrame

Raises:

FileNotFoundError – If the averages file does not exist
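
The assumed file layout (a comment line, then a header line, then data) can be illustrated with a short parser for the simple, non-sectioned case. This sketch uses plain string splitting instead of pandas’ read_csv, and the function name `read_simple_averages`, the fix name `myfix`, and the variable names in the sample are hypothetical.

```python
def read_simple_averages(text, sep=" "):
    """Parse a non-sectioned averages file given as a string.

    Assumed layout (matching the description above): line 1 is a comment,
    line 2 is a "#"-prefixed header naming the columns, the rest is data.
    """
    lines = text.splitlines()
    columns = lines[1].lstrip("#").split()
    records = []
    for line in lines[2:]:
        # Drop empty tokens so repeated separators do not break the parsing.
        tokens = [token for token in line.split(sep) if token]
        if tokens:
            records.append(dict(zip(columns, map(float, tokens))))
    return records


sample_text = (
    "# Time-averaged data for fix myfix\n"
    "# TimeStep v_pxx v_pyy\n"
    "100 0.5 0.6\n"
    "200 0.4 0.7\n"
)
records = read_simple_averages(sample_text)
print(records)
```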

pylimer_tools.io.read_lammps_output_file.read_correlation_file(filepath, group_key='Timestep', use_cache: bool = True) → DataFrame

Read a file written by a fix ave/correlate{/long} command.

Parameters:

filepath (str) – Path to the correlation file
group_key (str) – The key that denotes a new section
use_cache (bool) – Whether to use the cache to speed up reading & writing

Returns:

DataFrame containing the correlation data. Use the group_key with the DataFrame’s groupby() to restore the original sections.

Return type:

pd.DataFrame

Raises:

FileNotFoundError – If the correlation file does not exist

pylimer_tools.io.read_lammps_output_file.read_data_file(structure_file: str, atom_style: List[AtomStyle] | None = None) → Universe

Read a LAMMPS data file (structure file) into a Universe.

Parameters:

structure_file (str) – Path to the structure file
atom_style (Union[List[AtomStyle], None]) – The atom style(s) in the structure file (defaults to AtomStyle.Molecule if None)

Returns:

Universe object representing the molecular structure

Return type:

Universe

Raises:

FileNotFoundError – If the structure file does not exist

pylimer_tools.io.read_lammps_output_file.read_dump_file(data_file, dump_file, atom_style: List[AtomStyle] | None = None) → UniverseSequence

Read a LAMMPS dump file containing snapshots of structures into a UniverseSequence.

Parameters:

data_file (str) – Path to the LAMMPS data file containing structure information
dump_file (str) – Path to the LAMMPS dump file containing trajectory information
atom_style (Union[List[AtomStyle], None]) – The atom style(s) used in the data file

Returns:

Sequence of Universe objects representing the trajectory

Return type:

UniverseSequence

pylimer_tools.io.read_lammps_output_file.read_histogram_file(filepath, use_cache: bool = True) → DataFrame

Read a file written by fix ave/hist or similar.

This is a wrapper around read_sectioned_averages_file for histogram data.

Parameters:

filepath (str) – Path to the histogram file
use_cache (bool) – Whether to use the cache to speed up reading & writing

Returns:

DataFrame containing the parsed histogram data

Return type:

pd.DataFrame

See:

read_sectioned_averages_file()

pylimer_tools.io.read_lammps_output_file.read_log_file(filepath, lines_to_read_to_detect_header=500000) → DataFrame

Read a LAMMPS log (thermo output) file.

Parameters:

filepath (str) – Path to the LAMMPS log file
lines_to_read_to_detect_header (int) – Maximum number of lines to read when detecting the header

Returns:

DataFrame containing the parsed thermo data

Return type:

pd.DataFrame

pylimer_tools.io.read_lammps_output_file.read_sectioned_averages_file(filepath, use_cache: bool = True) → DataFrame

Read a file written by a fix ave/time command with multiple sections.

Use the section delimiter columns together with pandas’ groupby() to restore the original sections.

Parameters:

filepath (str) – Path to the sectioned averages file
use_cache (bool) – Whether to use the cache to speed up reading & writing

Returns:

DataFrame containing the parsed sectioned data

Return type:

pd.DataFrame

Raises:

FileNotFoundError – If the file does not exist
ValueError – If the file format is not recognized as a proper sectioned averages file

pylimer_tools.io.unit_styles module

class pylimer_tools.io.unit_styles.UnitStyle(unit_configuration: dict, ureg: UnitRegistry)

Bases: object

UnitStyle: a collection of the units of a particular LAMMPS unit style, expressed in SI units. Use it to convert your LAMMPS output data to SI units.

Example usage:

from pylimer_tools.io.unit_styles import UnitStyleFactory

unit_style_factory = UnitStyleFactory()
unit_style = unit_style_factory.get_unit_style(
    "lj", polymer="pdms", warning=False, accept_mol=True)

# multiply with the following factor to convert LJ stress to SI units,
# namely MPa in this example:
lj_stress_to_si_conversion_factor = (1.0 * unit_style.pressure).to("MPa").magnitude

Initialize a UnitStyle object.

Parameters:

unit_configuration (dict) – Dictionary containing unit definitions
ureg (UnitRegistry) – Pint unit registry to use for unit conversions
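
The arithmetic behind such a conversion factor can be followed by hand. In LJ units, pressure is measured in units of epsilon / sigma^3, so the factor depends only on the energy and length scales. The sketch below uses made-up round values for epsilon and sigma to keep the numbers easy to check; they are assumptions and do not correspond to any polymer shipped with the library.

```python
# Hypothetical LJ scales, chosen only to make the arithmetic easy to follow.
epsilon_joule = 1.0e-21   # energy scale in J (assumption)
sigma_meter = 1.0e-9      # length scale in m (assumption)

# In LJ units, pressure is measured in epsilon / sigma^3.
pressure_unit_pa = epsilon_joule / sigma_meter**3
pressure_unit_mpa = pressure_unit_pa / 1.0e6

lj_stress = 0.25  # a stress value in reduced LJ units (made up)
stress_mpa = lj_stress * pressure_unit_mpa
print(f"{stress_mpa:.3f} MPa")
```

With these values, one LJ pressure unit corresponds to about 1 MPa, so a reduced stress of 0.25 maps to about 0.25 MPa. A UnitStyle performs the same kind of scaling, with the unit bookkeeping handled by Pint.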

__getattr__(property: str)

Shorthand access for get_base_unit_of().

Parameters:

property (str) – The property name to get the unit for

Returns:

The unit object for the requested property

Return type:

pint.Quantity

Example usage:

# mass_in_lj: a mass value given in LJ units
units = UnitStyleFactory().get_unit_style("lj", polymer="pdms")
mass_with_units = mass_in_lj * units.mass

get_base_unit_of(property: str)

Returns the conversion factor from the unit style to SI units.

Parameters:

property (str) – The property name to get the unit for (e.g., “mass”, “distance”)

Returns:

The unit object for the requested property

Return type:

pint.Quantity

Example usage:

# mass_in_lj: a mass value given in LJ units
units = UnitStyleFactory().get_unit_style("lj", polymer="pdms")
mass_in_si = mass_in_lj * units.get_base_unit_of("mass")

get_underlying_unit_registry()

Get the underlying Pint unit registry.

Returns:

The unit registry used by this UnitStyle

Return type:

UnitRegistry

class pylimer_tools.io.unit_styles.UnitStyleFactory

Bases: object

A factory for creating multiple UnitStyle instances that share the same UnitRegistry, so that they are compatible with each other.

Initialize the UnitStyleFactory with a new UnitRegistry.

get_available_polymers() → list

List all available polymers for which LJ unit conversions are provided.

Returns:

List of polymer names

Return type:

list

get_everares_et_al_data() → DataFrame

Load the Everaers et al. (2020) unit properties data.

Returns:

DataFrame containing polymer properties from Everaers et al.

Return type:

pd.DataFrame

get_unit_registry()

Get the underlying unit registry.

Returns:

The unit registry used by this factory

Return type:

UnitRegistry

get_unit_style(unit_type: str, dimension: int = 3, **kwargs) → UnitStyle

Get a UnitStyle instance corresponding to the unit system requested.

Parameters:

unit_type (str) – The unit type, e.g. “lj”, “nano”, “real”, “si”, etc.
dimension (int) – The dimension of the box
kwargs (dict) – Additional arguments required for certain unit styles

Returns:

A UnitStyle object for the requested unit system

Return type:

UnitStyle

Raises:

ValueError – If required parameters are missing
NotImplementedError – If the requested unit type is not implemented

For LJ units, you must specify the polymer using the polymer parameter.

Module contents

This module contains various utility functions for working with LAMMPS and pylimer_tools input and output files.