Preprocessing

Rampy offers a handful of functions to preprocess your spectra. Do not hesitate to propose or ask for new functionality!

Below you will find the documentation of the relevant functions. Many of them are used in the Example notebooks.

Flip the X axis

Some spectra come with decreasing X values. Rampy offers a simple function to flip them, which can be necessary, for example, before resampling (interpolation algorithms usually require increasing X values). The function also handles X values in arbitrary order: it returns an array sorted by increasing X.

rampy.flipsp(sp: ndarray) → ndarray

Sorts or flips a spectrum along its row dimension based on x-axis values.

Parameters:

sp (np.ndarray) – A 2D array where the first column contains x-axis values and subsequent columns contain y-values.

Returns:

The input array sorted in ascending order based on the first column.

Return type:

np.ndarray

Notes

Uses np.argsort to ensure sorting regardless of initial order.

Example

>>> import numpy as np
>>> import rampy as rp
>>> sp = np.array([[300, 30], [100, 10], [200, 20]])
>>> sorted_sp = rp.flipsp(sp)
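
The Notes above mention np.argsort. As a minimal NumPy-only sketch of that idea (an illustration under that assumption, not rampy's actual source; the helper name sort_by_x is hypothetical), sorting the rows by their first column could look like this:

```python
import numpy as np

def sort_by_x(sp):
    """Hypothetical helper: return the rows of sp ordered by increasing x (column 0)."""
    sp = np.asarray(sp)
    order = np.argsort(sp[:, 0])  # indices that sort the x column
    return sp[order, :]           # reorder whole rows so each y stays paired with its x

sp = np.array([[300, 30], [100, 10], [200, 20]])
sorted_sp = sort_by_x(sp)
print(sorted_sp)
```

Because whole rows are reordered, this works for a strictly decreasing axis as well as for an axis in arbitrary order.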

Shift the X axis

You can shift the X axis by a given value using the following function:

rampy.shiftsp(sp: ndarray, shift: float) → ndarray

Shifts the x-axis values of a spectrum by a given amount.

Parameters:

sp (np.ndarray) – A 2D array where the first column contains x-axis values (e.g., frequency or wavenumber) and subsequent columns contain y-values.
shift (float) – The amount by which to shift the x-axis values. Negative values shift left; positive values shift right.

Returns:

The input array with shifted x-axis values.

Return type:

np.ndarray

Example

>>> import numpy as np
>>> import rampy as rp
>>> sp = np.array([[100, 10], [200, 20], [300, 30]])
>>> shifted_sp = rp.shiftsp(sp, shift=50)

Extract a portion / portions of a signal

You can use the rampy.extract_signal() function to do that (formerly rampy.get_portion_interest()).

rampy.extract_signal(x: ndarray, y: ndarray, roi) → ndarray

Extracts the signal from specified regions of interest (ROI) in the x-y data.

This function selects and extracts portions of the input x-y data based on the specified regions of interest (ROI) provided in roi. Each region is defined by a lower and upper bound along the x-axis.

Parameters:

x (ndarray) – The x-axis values (e.g., time, wavelength, or other independent variables).
y (ndarray) – The y-axis values corresponding to the x-axis values (e.g., signal intensity).
roi (ndarray or list of lists) –
Regions of interest (ROI) where the signal should be extracted. Must be an n x 2 array or a list of lists, where n is the number of regions to extract. Each sublist or row should contain two elements:
- The lower bound of the region (inclusive).
- The upper bound of the region (inclusive).
Example
- Array: np.array([[10, 20], [50, 70]])
- List: [[10, 20], [50, 70]]

Returns:

A 2-column array containing the extracted x-y signals from the specified regions. The first column contains the x values, and the second column contains the corresponding y values.

Return type:

ndarray

Raises:

ValueError – If roi is not a valid n x 2 array or list of lists, or if any region in roi falls outside the range of x.

Notes

Overlapping regions in roi are not merged; they are processed as separate regions.
If no valid regions are found within roi, an empty array is returned.

Examples

Extracting signal from two regions in an x-y dataset:

>>> import numpy as np
>>> import rampy as rp
>>> x = np.linspace(0, 100, 101)
>>> y = np.sin(x / 10) + np.random.normal(0, 0.1, size=x.size)
>>> roi = [[10, 20], [50, 70]]
>>> extracted_signal = rp.extract_signal(x, y, roi)
>>> print(extracted_signal)
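
To make the selection logic concrete, here is a NumPy-only sketch of ROI extraction with inclusive bounds, as described in the parameter list. It is an illustration rather than rampy's implementation; the helper name extract_roi_sketch is hypothetical, and it skips the input validation that the documented function performs.

```python
import numpy as np

def extract_roi_sketch(x, y, roi):
    """Hypothetical helper: stack the (x, y) points that fall inside each ROI."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pieces = []
    for lower, upper in np.asarray(roi, dtype=float):
        mask = (x >= lower) & (x <= upper)  # inclusive bounds, as documented
        pieces.append(np.column_stack((x[mask], y[mask])))
    return np.vstack(pieces)  # regions are appended in the order given in roi

x = np.linspace(0, 100, 101)
y = np.sin(x / 10)
extracted = extract_roi_sketch(x, y, [[10, 20], [50, 70]])
print(extracted.shape)  # 11 + 21 points -> (32, 2)
```

Each region is handled separately, so overlapping regions would contribute their shared points twice, in line with the note that overlaps are not merged.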

Remove spikes

Spikes can be removed with the rampy.despiking() function. It takes the X and Y values of a spectrum, a number of neighbouring points (neigh) and a threshold. The threshold is the number of standard deviations above the mean noise level beyond which a point is considered a spike. For instance, with a threshold of 3, a point is considered a spike if it lies more than 3 standard deviations above the mean of the noise. The function then replaces the spike with the mean of the neigh points before and after it.

rampy.despiking(x: ndarray, y: ndarray, neigh: int = 4, threshold: int = 3) → ndarray

Removes spikes from a 1D signal using a threshold-based approach.

This function identifies spikes in a signal by comparing local residuals to a threshold based on the root mean square error (RMSE). Spikes are replaced with the mean of neighboring points.

Parameters:

x (np.ndarray) – A 1D array containing the x-axis values of the signal.
y (np.ndarray) – A 1D array containing the y-axis values of the signal to despike.
neigh (int) – The number of neighboring points to use for calculating average values during despiking and for smoothing. Default is 4.
threshold (int) – The multiplier of RMSE used to identify spikes. Default is 3.

Returns:

A 1D array of the despiked signal.

Return type:

np.ndarray

Example

>>> import numpy as np
>>> import rampy as rp
>>> x = np.linspace(0, 10, 100)
>>> y = rp.gaussian(x, 10., 5., 2.0)  # peak centred inside the x range
>>> y[50] += 20.0  # add an artificial spike to remove
>>> y_despiked = rp.despiking(x, y)
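
The threshold logic described above can be sketched with NumPy alone. This is an illustration, not rampy's implementation: the helper name despike_sketch is hypothetical, and the moving-average reference curve is an assumption made for this example (the documentation does not specify how the signal is smoothed).

```python
import numpy as np

def despike_sketch(y, neigh=4, threshold=3.0):
    """Hypothetical helper: replace outliers by the mean of their neighbours."""
    y = np.asarray(y, dtype=float)
    width = 2 * neigh + 1
    # Smooth reference (assumption): centred moving average, edges padded with end values.
    padded = np.pad(y, neigh, mode="edge")
    y_smooth = np.convolve(padded, np.ones(width) / width, mode="valid")
    residual = np.abs(y - y_smooth)
    rmse = np.sqrt(np.mean(residual ** 2))
    y_out = y.copy()
    # A point is a spike when its residual exceeds threshold times the RMSE.
    for i in np.flatnonzero(residual > threshold * rmse):
        lo, hi = max(0, i - neigh), min(y.size, i + neigh + 1)
        neighbours = np.concatenate((y[lo:i], y[i + 1:hi]))  # exclude the spike itself
        y_out[i] = neighbours.mean()
    return y_out

x = np.linspace(0.0, 10.0, 200)
y_clean = 10.0 * np.exp(-((x - 5.0) ** 2) / 2.0)
y_spiky = y_clean.copy()
y_spiky[100] += 50.0  # artificial spike on top of the peak
y_fixed = despike_sketch(y_spiky)
```

Only the flagged point is modified; every other value of the signal is returned unchanged.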

Resampling a spectrum

We sometimes need to resample a spectrum onto a new X axis; rampy.resample() does this. For instance, suppose a spectrum has an X axis running from 400 to 1300 cm⁻¹ with a point every 0.9 cm⁻¹, and we want the same spectrum with a point every 1 cm⁻¹. The following function performs this interpolation:

rampy.resample(x: ndarray, y: ndarray, x_new: ndarray, fill_value='extrapolate', **kwargs) → ndarray

Resamples a y signal along new x-axis values using interpolation.

Parameters:

x (np.ndarray) – Original x-axis values.
y (np.ndarray) – Original y-axis values corresponding to x.
x_new (np.ndarray) – New x-axis values for resampling.
fill_value (array-like or (array-like, array-like) or “extrapolate”, optional) – Behavior of the interpolation for requested points outside of the data range. See the scipy.interpolate.interp1d documentation for details: https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html. Default is “extrapolate”.
**kwargs –
Additional arguments passed to scipy.interpolate.interp1d.
- kind (str or int): Type of interpolation (‘linear’, ‘cubic’, etc.). Default is ‘linear’.
- bounds_error (bool): If True, a ValueError is raised any time interpolation is attempted on a value outside of the range of x (where extrapolation is necessary). If False, out of bounds values are assigned fill_value. By default, an error is raised unless fill_value=“extrapolate”.

Returns:

Resampled y-values corresponding to x_new.

Return type:

np.ndarray

Example

>>> import numpy as np
>>> import rampy as rp
>>> original_x = np.array([100, 200, 300])
>>> original_y = np.array([10, 20, 30])
>>> new_x = np.linspace(100, 300, 5)
>>> resampled_y = rp.resample(original_x, original_y, new_x)
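
The 0.9 cm⁻¹ to 1 cm⁻¹ case mentioned in the introduction of this section can be sketched as follows. The block uses np.interp as a stand-in for linear interpolation so that it runs with NumPy alone; with rampy, the last step would be rp.resample(x, y, x_new), following the signature above. The straight-line "spectrum" is synthetic data invented for the illustration.

```python
import numpy as np

# Original axis: 400 to 1300 cm-1 with one point every 0.9 cm-1 (1001 points)
x = np.linspace(400.0, 1300.0, 1001)
y = 2.0 * x + 1.0  # synthetic data, only here to have something to interpolate

# Target axis: one point every 1 cm-1 over the same range (901 points)
x_new = np.arange(400.0, 1301.0, 1.0)

# Linear interpolation of y onto the new axis
y_new = np.interp(x_new, x, y)
print(x.size, x_new.size)
```

Note that np.interp does not extrapolate: outside the original range it returns the end values, whereas the documented default of rampy.resample() is fill_value="extrapolate".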

Normalisation

Rampy provides the rampy.normalise() function to normalise the Y values of a spectrum to:

the maximum intensity
the trapezoidal area under the curve
the min-max range of intensities

rampy.normalise(y: ndarray, x: ndarray = None, method: str = 'intensity') → ndarray

Normalizes the y signal(s) using specified methods.

This function normalizes the input y signal(s) based on the chosen method: by area under the curve, maximum intensity, or min-max scaling.

Parameters:

y (np.ndarray) – A 2D array of shape (m values, n samples) containing the y values to normalize.
x (np.ndarray, optional) – A 2D array of shape (m values, n samples) containing the x values corresponding to y. Required for area normalization. Default is None.
method (str) –
The normalization method to use. Options are:
- ‘area’: Normalize by the area under the curve.
- ‘intensity’: Normalize by the maximum intensity.
- ‘minmax’: Normalize using min-max scaling.

Returns:

A 2D array of normalized y signals with the same shape as the input y.

Return type:

np.ndarray

Raises:

ValueError – If x is not provided when using the ‘area’ normalization method.
NotImplementedError – If an invalid normalization method is specified.

Example

>>> import numpy as np
>>> import rampy as rp
>>> x = np.linspace(0, 10, 100)
>>> y = rp.gaussian(x, 10., 5., 2.0)  # peak centred inside the x range
>>> y_norm = rp.normalise(y, x=x, method="area")
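
For reference, the three normalisations listed at the top of this section can be written out in plain NumPy. This is a sketch of the underlying arithmetic under the usual definitions (division by the maximum, division by the trapezoidal area, and rescaling to the 0 to 1 range), not a copy of rampy's code; the signal is synthetic.

```python
import numpy as np

x = np.linspace(0.0, 10.0, 100)
y = 10.0 * np.exp(-((x - 5.0) ** 2) / 2.0) + 1.0  # synthetic peak on a constant offset

# Trapezoidal area under the curve, written out explicitly
area = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

y_intensity = y / np.max(y)                            # maximum becomes 1
y_area = y / area                                      # area under the curve becomes 1
y_minmax = (y - np.min(y)) / (np.max(y) - np.min(y))   # values span 0 to 1
```

Only the area method needs the x values, which is why x is optional for the other two.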

Temperature and excitation line effects

Raman spectra may need to be corrected for temperature and excitation line effects. See the review of Brooker et al. (1988) for details. Rampy offers several ways to do so with the rampy.tlcorrection() function.

rampy.tlcorrection(x: ndarray, y: ndarray, temperature: float, wavelength: float, **kwargs) → tuple

Corrects Raman spectra for temperature and excitation line effects.

This function applies corrections to Raman spectra to account for temperature and laser excitation wavelength effects. It supports multiple correction equations and normalization methods, making it suitable for a variety of materials and experimental conditions.

Parameters:

x (np.ndarray) – Raman shifts in cm⁻¹.
y (np.ndarray) – Intensity values (e.g., counts).
temperature (float) – Temperature in °C.
wavelength (float) – Wavelength of the laser that excited the sample, in nm.
correction (str, optional) –
The correction equation to use. Default is ‘long’. Options are:
- ‘long’: Default equation from Galeener and Sen (1978) with a v₀³ coefficient correction.
- ‘galeener’: Original equation from Galeener and Sen (1978), based on Shuker and Gammon (1970).
- ‘hehlen’: Equation from Hehlen (2010), preserving the Boson peak signal.
normalisation (str, optional) –
Normalization method for the corrected data. Default is ‘area’. Options are:
- ‘intensity’: Normalize by maximum intensity.
- ‘area’: Normalize by total area under the curve.
- ‘no’: No normalization.
density (float, optional) – Density of the studied material in kg/m³, used only with the ‘hehlen’ equation. Default is 2210.0 (density of silica).

Returns:

x (np.ndarray): Raman shift values after correction.
ycorr (np.ndarray): Corrected intensity values.
ese_corr (np.ndarray): Propagated errors calculated as √y on raw intensities.

Return type:

tuple[np.ndarray, np.ndarray, np.ndarray]

Raises:

ValueError – If an invalid correction or normalization method is specified.

Notes

The ‘galeener’ equation is a modification of Shuker and Gammon’s formula to account for the (v₀ − v)⁴ dependence of Raman intensity.
The ‘long’ equation includes a v₀³ coefficient to remove cubic meter dimensions, as used in several studies like Mysen et al. (1982).
The ‘hehlen’ equation avoids signal suppression below 500 cm⁻¹, preserving features like the Boson peak in glasses.

References

Galeener, F.L., & Sen, P.N. (1978). Theory of the first-order vibrational spectra of disordered solids. Physical Review B, 17(4), 1928–1933.
Hehlen, B. (2010). Inter-tetrahedra bond angle of permanently densified silicas extracted from their Raman spectra. Journal of Physics: Condensed Matter, 22(2), 025401.
Brooker, M.H., Nielsen, O.F., & Praestgaard, E. (1988). Assessment of correction procedures for reduction of Raman spectra. Journal of Raman Spectroscopy, 19(2), 71–78.
Mysen, B.O., Finger, L.W., Virgo, D., & Seifert, F.A. (1982). Curve-fitting of Raman spectra of silicate glasses. American Mineralogist, 67(7-8), 686–695.
Neuville, D.R., & Mysen, B.O. (1996). Role of aluminium in the silicate network: In situ high-temperature study of glasses and melts on the join SiO₂-NaAlO₂. Geochimica et Cosmochimica Acta, 60(9), 1727–1737.
Le Losq, C., Neuville, D.R., Moretti, R., & Roux, J. (2012). Determination of water content in silicate glasses using Raman spectrometry: Implications for the study of explosive volcanism. American Mineralogist, 97(5-6), 779–790.
Shuker, R., & Gammon, R.W. (1970). Raman-scattering selection-rule breaking and the density of states in amorphous materials. Physical Review Letters, 25(4), 222.

Examples

Correct a simple spectrum using default parameters:

>>> import numpy as np
>>> import rampy as rp
>>> x = np.array([100, 200, 300])  # Raman shifts in cm⁻¹
>>> y = np.array([10, 20, 30])     # Intensity values
>>> temperature = 25.0             # Temperature in °C
>>> wavelength = 532.0             # Wavelength in nm
>>> x_corr, y_corr, ese_corr = rp.tlcorrection(x, y, temperature, wavelength)

Use a specific correction equation and normalization method:

>>> x_corr, y_corr, ese_corr = rp.tlcorrection(
...     x=x,
...     y=y,
...     temperature=25,
...     wavelength=532,
...     correction='hehlen',
...     normalisation='intensity',
...     density=2500
... )
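
To show what such a correction computes, the sketch below writes out a Long-type correction as it is commonly given in the literature cited above: the intensity is multiplied by v₀³·v / (v₀ − v)⁴ and by the Bose-Einstein term 1 − exp(−hcv / kT). This is an assumption about the form of the ‘long’ equation, not rampy's source code; the helper name long_correction_sketch is hypothetical, and the normalisation and error propagation steps are left out.

```python
import numpy as np

H = 6.62607015e-34  # Planck constant, J s
C = 299792458.0     # speed of light, m/s
K_B = 1.380649e-23  # Boltzmann constant, J/K

def long_correction_sketch(x_cm, y, temperature_c, wavelength_nm):
    """Hypothetical helper: apply a Long-type correction (assumed form, see lead-in)."""
    nu0 = 1.0e9 / wavelength_nm                  # laser wavenumber in m^-1
    nu = 100.0 * np.asarray(x_cm, dtype=float)   # Raman shift, cm^-1 -> m^-1
    t_kelvin = temperature_c + 273.15            # temperature in K
    frequency_term = nu0 ** 3 * nu / (nu0 - nu) ** 4
    boltzmann_term = 1.0 - np.exp(-H * C * nu / (K_B * t_kelvin))
    return np.asarray(y, dtype=float) * frequency_term * boltzmann_term

x = np.array([100, 200, 300])  # Raman shifts in cm^-1
y = np.array([10, 20, 30])     # intensity values
y_long = long_correction_sketch(x, y, temperature_c=25.0, wavelength_nm=532.0)
print(y_long)
```

Both factors are dimensionless, so the corrected values keep the units of the input intensities until a normalisation is applied.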

Wavelength-wavenumber conversion

The convert_x_units() function converts X values from nm to inverse cm (cm⁻¹), or the reverse. Do not hesitate to propose new ways to enrich it!

rampy.convert_x_units(x: ndarray, laser_nm: float = 532.0, way: str = 'nm_to_cm-1') → ndarray

Converts between nanometers and inverse centimeters for Raman spectroscopy.

Parameters:

x (np.ndarray) – Array of x-axis values to convert.
laser_nm (float) – Wavelength of the excitation laser in nanometers. Default is 532.0 nm.
way (str) –
Conversion direction. Options are:
- “nm_to_cm-1”: Convert from nanometers to inverse centimeters.
- “cm-1_to_nm”: Convert from inverse centimeters to nanometers.

Returns:

Converted x-axis values.

Return type:

np.ndarray

Raises:

ValueError – If an invalid conversion direction is specified.

Example

Convert from nanometers to inverse centimeters:

>>> import numpy as np
>>> import rampy as rp
>>> x_nm = np.array([600.0])
>>> x_cm_1 = rp.convert_x_units(x_nm)

Convert from inverse centimeters to nanometers:

>>> x_cm_1 = np.array([1000.0])
>>> x_nm = rp.convert_x_units(x_cm_1, way="cm-1_to_nm")
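
The arithmetic behind such a conversion is short enough to write out. The sketch below assumes that the cm⁻¹ values are Raman shifts measured relative to the laser line, which is the usual convention; the two helper names are hypothetical and this is not rampy's source code.

```python
def nm_to_shift_cm1(x_nm, laser_nm=532.0):
    """Hypothetical helper: absolute wavelength (nm) -> Raman shift (cm^-1)."""
    # 1e7 / wavelength_in_nm gives a wavenumber in cm^-1
    return 1.0e7 / laser_nm - 1.0e7 / x_nm

def shift_cm1_to_nm(shift_cm1, laser_nm=532.0):
    """Hypothetical helper: Raman shift (cm^-1) -> absolute wavelength (nm)."""
    return 1.0e7 / (1.0e7 / laser_nm - shift_cm1)

shift = nm_to_shift_cm1(600.0)   # about 2130.33 cm^-1 for a 532 nm laser
back = shift_cm1_to_nm(shift)    # round trip back to about 600 nm
print(round(shift, 2), round(back, 6))
```

The two functions are inverses of each other for a given laser wavelength, so a round trip recovers the starting value up to floating-point error.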