Data Processing

Introduction

Spectra allows you to perform several processing steps on x-y spectral data. Below we will showcase short examples, and then you will find the documentation of the variosu functions you may want to use!

As a starting point and for the sack of example, we create two synthetic signals to play with. They will be Gaussian signals randomly sampled along two different X axis, with noise and increasing backgrounds. One of them will also have a strong spike!

# Signal creation
using Spectra, Plots

# we create a fake signal with
x_1 = rand(1000)*100
x_2 = rand(1000)*100

# create a signal that is the combination of two gaussian peaks plus a background
background_1 = 0.08 * x_1
background_2 = 0.03 * x_2

# some noise
noise_1 = 0.5*randn(1000)
noise_2 = 0.3*randn(1000)

# the full toy signals
y_1 = gaussian(x_1, 10.0, 40., 5.) .+ background_1 .+ noise_1
y_2 = gaussian(x_2, 20.0, 60., 9.) .+ background_2 .+ noise_2

# one of them will have a spike!
y_1[600] = 250.0

# make a plot of our two spectra
scatter(x_1, y_1)
scatter!(x_2, y_2)

We can do the following steps (not necessarily in this order):

  • correct_xshift allows correcting X-axis shifts of your spectra from a reference value (e.g. silicon wafer reference in Raman spectroscopy).
  • nm_to_invcm or invcm_to_nm convert the X axis between nanometers (nm) and wavenumbers (cm$^{-1}$).
  • flipsp sort the X-axis (this is necessary for some algorithms).
  • resample allows getting our spectra on the same X axis for convenience.
  • despiking remove spikes in the signal.
  • baseline allows removing the background.
  • smooth allows smoothing signals.
  • tlcorrection corrects Raman spectra for temperature and laser wavelength effects.
  • normalise allows normalising the spectra.
  • extract_signal can extract specific portions of a signal.

Thanks to Julia's multiple dispatch, those functions support different types of inputs. Of course you will receive different outputs, see the individual documentation of each function for further details. This is quite convenient as this avoid you to write your own loops to process many spectra at once.

Let's now use some of those functions below on the signal generated above.

Sort X Axis

We can sort the data by passing an array of spectra to flipsp. After that we should have not problem plotting things with lines for instance!

spectrum_1 = flipsp([x_1 y_1])
spectrum_2 = flipsp([x_2 y_2])
plot(spectrum_1[:,1], spectrum_1[:, 2])
plot!(spectrum_2[:,1], spectrum_2[:, 2])

Remove spikes

OK, the plot above reveals a strong spike in one of the signals. We will treat actually both signals with despiking to remove possible spikes from the signals, using the default parameters. In summary, with the default settings, despiking checks if any points in a spectrum differ by more than 3 sigma from the mean value of the 4 neighboring points. You can change the default values to adjust the threshold (for more or less than 3-sigma), or to modify the number of neighboring points considered.

y_1 = despiking(x_1, y_1)
y_2 = despiking(x_2, y_2)
1000-element Vector{Float64}:
  0.18817811184926136
  2.825728859381716
 18.90814929798047
  5.552506574951748
  0.5404993688493123
  7.659309584346769
  2.1166773445622216
 -0.24237392463827892
  6.111709327837205
  2.0343588470916356
  ⋮
  0.3826958219865128
  1.1071294178761881
  0.3390395700211149
  0.6241420175657777
 14.268809220590922
  1.4067235017257023
 17.99350749200591
  0.9379340582981646
  0.5396373580960956
Tip

You could also call despiking on the collection of spectra as

collection_spectra = [[x_1 y_1], [x_2 y_2]] 
ys = despiking(collection_spectra)

Resample spectra

Using resample, we can resample a spectrum or spectra on a user-defined X axis by calling

x_new = collect(0.:0.5:100)
y_new = resample(x, y, x_new)

By default, resample uses a linear interpolation method from the DataInterpolations.jl package, but you can specify other methods available at https://docs.sciml.ai/DataInterpolations/stable/methods/.

If you have multiple spectra, it is here very interesting to provide a collection of those spectra because you will then receive an array of spectra in output, all sampled on the same X axis.

Continuing on the example shown above, we can do:

x_new = collect(0.:0.5:100)
spectra_ = [[x_1 y_1], [x_2 y_2]]
spectra_same_x = resample(spectra_, x_new)
plot(x_new, spectra_same_x)

Baseline subtraction

Baseline subtraction is performed using baseline, which serves as the main API and wraps several dedicated baseline correction algorithms. Similarly to the other functions, you can pass x and y vectors or a x vectors and an array of y spectra.

Continuing with our example, we will do here:

ys_corrected, ys_baselines = baseline(x_new, spectra_same_x, method="arPLS")
p1 = plot(x_new, spectra_same_x)
plot!(x_new, ys_baselines, labels=["background 1" "background 2"])

Other methods are available, see the Tutorials and baseline function documentation for further details!

Smoothing

Spectra smoothing can be achieved with the smooth function, which supports several algorithms:

  • Whittaker smoother: Custom Julia implementation based on the Matlab code of Eiler (2003). It supports both equally and unequally spaced X values.
  • Savitzky-Golay Smoother: Provided by the SavitskyGolay.jl library.
  • GCV cubic spline smoother: From the DataInterpolations.jl library.
  • Window-based smoothers: leverage the DSP.jl library.

For fine control over smoothing parameters, you can use the whittaker function directly, allowing you to change weights w or the smoothing order d (also possible in smooth).

Continuing with our example, we will pass the matrix of baseline corrected signals to smooth like:

smoothed_y = smooth(x_new, spectra_same_x; method="gcvspline")

p1 = plot(x_new, spectra_same_x)
plot!(x_new, smoothed_y, labels=["smoothed 1" "smoothed 2"])

Other methods are available, see the Tutorials and smooth function documentation for further details!

Signal normalisation

Using normalise, you can normalise signals to their maximum intensity (method="intensity"), the area under the curve (method="area") or to their minimum and maximum values (minimum will be set to 0, maximum to 1) (method="minmax").

For instance, continuing with our example, we can do:

normalised_ys = normalise(spectra_same_x, method="intensity")
p1 = plot(x_new, normalised_ys)

If you want to normalize the signals by their areas, you have to pass x values too, like:

normalised_y = normalise(y_matrix, x, method="intensity")

Signal extraction

Extract signals in specific regions of interest using extract_signal. You can pass associated x and y values, a single spectrum in the form of a [x y] matrix, or a list of [x y] matrices.

For instance, for a single signal in which we want the values between 40 and 60, we would write:

roi = [[40. 60.]]
extracted_x, extracted_y, indices = extract_signal(x, y, roi)

You can also extract the signals in different portions by using a matrix for the regions of interest. For instance, to extract signals between 20 and 40, and 60 and 80, we can do:

roi = [[20. 40.]; [60. 80.]]
extracted_x, extracted_y, indices = extract_signal(x, y, roi)

Functions API

Spectra.correct_xshiftFunction
correct_xshift(x::Vector{Float64}, y::Union{Vector{Float64}, Matrix{Float64}}, shift::Float64)
correct_xshift(sps::Vector{<:Matrix}, shift::Float64)

Return the signal(s) corrected from a given linear shift at the same x location as the input.

Signals can be provided as y (vector or an array of ys values) for a given x vector, or as a list of [x y] arrays of signals.

Depending on the arguments, it either returns a new vector or array of y at the position x, or a new list of corrected [x y] spectra.

This would be typically used to correct a linear shift in x on Raman spectra: for instance you measured the Si wafer peak at 522.1 cm-1 while you know it is at 520.7 cm-1. Therefore you will call this function to correct your spectra from this shift, without affecting the x values.

Examples

using Spectra

# for a vector y
x = [0., 1., 2., 3.]
y = 2*x
shift = -0.1

new_y = correct_xshift(x, y, shift)

# for an array of y
x2 = [0.5, 1.3, 2.0, 4.5]
y2 = [2*x 3*x 4*x]
new_y = correct_xshift(x, y2, shift)

# for a list of x-ys
old_spectra_list = [[x, y], [x2, y2]]
new_spectra_list = corrected(old_spectra_list, shift)
source
Spectra.nm_to_invcmFunction
nm_to_invcm(x::Vector{Float64}; laser_nm = 532.0)

Convert absolute wavelengths in nanometers (nm) to Raman shifts in inverse centimeters (cm⁻¹), given laser_nm, the wavelength of the excitation laser in nanometers.

Examples

If using a 532 nm laser line, you will do:

x_wavelength_nm = collect(557:1.0:560) # unit = nm
x_inv_cm = nm_to_invcm(x_wavelength_nm; laser_nm = 532.0)
source
Spectra.invcm_to_nmFunction
invcm_to_nm(shift_inv_cm::Vector{Float64}; laser_nm=532.0)

Convert Raman shifts in inverse centimeters (cm⁻¹) to absolute wavelengths in nanometers (nm), given laser_nm, the wavelength of the excitation laser in nanometers.

Examples

If using a 532 nm laser line, you will do:

x_inv_cm = collect(557:1.0:560) # unit = cm^-1
x_wavelength_nm = invcm_to_nm(x_inv_cm; laser_nm = 532.0)
source
Spectra.flipspFunction
flipsp(spectra::Union{Matrix{Float64}, Vector{<:Matrix}})

Return the spectrum array or a list of spectra arrays sorted by their first column values (increasing x).

Examples


# create some unsorted signals
x = [0., 2.,5.,-1]
x2 = [0., 2.,5.,-1, 3., -10.]
y = 2*x
y2 = 2*x2

# the arrays of signals
signal_1 = [x y]
signal_2 = [x2 y2]

# a list of signal arrays
sps = [signal_1, signal_2]

# flip one signal array
flipsp(signal_1)

# you can also do this
flipsp([x y])

# flip the list of signal arrays
flipsp(sps)
source
Spectra.resampleFunction
resample(x::Vector{Float64}, y::Union{Vector{Float64}, Matrix{Float64}}, x_new::Vector{Float64}; method::String="LinearInterpolation")
resample(multiple_spectra::Vector{<:Matrix{Float64}}, x_new::Vector{Float64}; method::String="LinearInterpolation")

Resample a signal or signals onto a new set of x-coordinates using interpolation.

Arguments

  • x::Vector{Float64}: The original x-coordinates corresponding to the input signal(s).
  • y::Union{Vector{Float64}, Matrix{Float64}}: The input signal(s) to resample.
    • If y is a vector, it represents a single signal.
    • If y is a matrix, each column represents a separate signal.
  • multiple_spectra::Vector{<:Matrix{Float64}}: A collection of spectra where each spectrum is a matrix with two columns:
    • First column: x-coordinates
    • Second column: y-values (signal intensities)
  • x_new::Vector{Float64}: The new x-coordinates onto which the signal(s) will be resampled.
  • method::String="LinearInterpolation": The interpolation method to use. Options include:

Returns

  • For single signal (vector) input: A vector of resampled values (Vector{Float64}).
  • For multiple signals (matrix) input: A matrix where each column corresponds to the resampled values of the respective input column (Matrix{Float64}).
  • For collection of spectra input: A matrix where each column contains the resampled y-values for the corresponding spectrum in the input collection.

Errors

  • Throws an error if an unsupported interpolation method is specified.
  • Throws an error if the dimensions of x and y do not match.

Methods

This function provides three methods to handle different input types:

  1. Single signal: resample(x::Vector{Float64}, y::Vector{Float64}, x_new::Vector{Float64})
  2. Multiple signals with common x-axis: resample(x::Vector{Float64}, y::Matrix{Float64}, x_new::Vector{Float64})
  3. Collection of spectra: resample(multiple_spectra::Vector{<:Matrix{Float64}}, x_new::Vector{Float64})

Notes

  • Uses the DataInterpolations.jl package for interpolation.
  • Extrapolation beyond the range of x is handled using the option extrapolation_right=ExtrapolationType.Extension and extrapolation_left=ExtrapolationType.Extension.
  • For the collection of spectra method, each spectrum matrix must have exactly two columns: x-coordinates in the first column and y-values in the second column. However, something great: the different spectra can have different lengths!
  • Automatically sorts the data

Examples

Example 1: resample a vector y or a matrix of ys (multiple spectra)

using Spectra, Plots

# signal creation
x = collect(0.:0.8:10.)
# create the signals with create_peaks()
peak_infos = [
    Dict(:type => :gaussian, :amplitude => 10.0, :center => 4.0, :hwhm => 0.6),
    Dict(:type => :gaussian, :amplitude => 5.0, :center => 6.0, :hwhm => 0.4),
]
ys, y = create_peaks(x, peak_infos)

# the new x axis
x_new = collect(0.:0.05:10.)

# resampling y as a vector
y2 = resample(x, y, x_new)

# resampling the ys array of the two peaks
y3 = resample(x, ys, x_new)

# plotting
p1 = scatter(x, y, label="Original data")
plot!(x_new, y2, label="Resampled y")
display(p1)

p2 = scatter(x, ys, label="Original peak data")
plot!(x_new, y3, label="Resampled peaks")
display(p2)

Example 2: resampling a collection of spectra

x = collect(0.:0.8:10.)
y, ys = create_peaks(x, peak_infos)
x2 = collect(0.:0.8:10.)
y2, ys2 = create_peaks(x2, peak_infos)
x3 = collect(0.:0.8:10.)
y3, ys3 = create_peaks(x3, peak_infos)

spectra_ = [[x y], [x2 y2], [x3 y3]]
x_new = collect(0.:0.05:10.)
spectra_resampled = resample(spectra_, x_new)
p3 = scatter(x, [y, y2, y3], label="Original data")
plot!(x_new, spectra_resampled, label="Resampled data")
display(p3)
source
Spectra.despikingFunction
despiking(x::Vector{Float64}, y::Vector{Float64}; neigh::Int=4, threshold::Int=3)
despiking(x::Vector{Float64}, y::Matrix{Float64}; neigh::Int=4, threshold::Int=3)
despiking(multiple_spectra::Vector{<:Matrix{Float64}}; neigh::Int=4, threshold::Int=3)

Remove spikes from signal(s) based on a threshold compared to a smoothed version.

This function smooths the spectra, calculates the residual error RMSE, and replaces points above threshold*RMSE with the average of non-spike neighboring points.

Arguments

  • x::Vector{Float64}: The x-coordinates of the signal.
  • y::Union{Vector{Float64}, Matrix{Float64}}: The signal(s) to despike.
    • If y is a vector, it represents a single signal.
    • If y is a matrix, each column represents a separate signal.
  • multiple_spectra::Vector{<:Matrix{Float64}}: A collection of spectra where each spectrum is a matrix with two columns:
    • First column: x-coordinates
    • Second column: y-values (signal intensities)
  • neigh::Int=4: Number of points around each spike to consider for calculating the replacement value.
  • threshold::Int=3: Multiplier of RMSE to identify spikes (points with residuals > threshold*RMSE).

Returns

  • For single signal input: A vector of despiked values (Vector{Float64}).
  • For multiple signals input: A matrix where each column corresponds to the despiked values of the respective input column (Matrix{Float64}).
  • For collection of spectra input: A vector of matrices, each containing the original x-coordinates and despiked y-values.

Methods

This function provides three methods to handle different input types:

  1. Single signal: despiking(x::Vector{Float64}, y::Vector{Float64}; neigh::Int=4, threshold::Int=3)
  2. Multiple signals with common x-axis: despiking(x::Vector{Float64}, y::Matrix{Float64}; neigh::Int=4, threshold::Int=3)
  3. Collection of spectra: despiking(multiple_spectra::Vector{<:Matrix{Float64}}; neigh::Int=4, threshold::Int=3)

Notes

  • The function uses the smooth() function with the "gcvspline" method to create a reference smoothed signal.
  • Spikes are identified as points where the residual error exceeds threshold*RMSE.
  • Spike values are replaced with the mean of neighboring non-spike points.

Examples

Example 1: Despiking a single signal

x = collect(0:0.1:10)
y = sin.(x) + 0.1*randn(length(x))
y[30] = 5.0 # Add a spike
y_clean = despiking(x, y)

Example 2: Despiking multiple signals with common x-axis

x = collect(0:0.1:10)
y1 = sin.(x) + 0.1randn(length(x))
y2 = cos.(x) + 0.1randn(length(x))
y1[30] = 5.0 # Add a spike to first signal
y2[40] = -4.0 # Add a spike to second signal
y_matrix = hcat(y1, y2)
y_clean_matrix = despiking(x, y_matrix)

Example 3: Despiking a collection of spectra

spectrum1 = hcat(collect(0:0.1:10), sin.(collect(0:0.1:10)) + 0.1randn(101))
spectrum1[30, 2] = 5.0 # Add a spike
spectrum2 = hcat(collect(0:0.1:8), cos.(collect(0:0.1:8)) + 0.1randn(81))
spectrum2[40, 2] = -4.0 # Add a spike
spectra_collection = [spectrum1, spectrum2]
clean_spectra = despiking(spectra_collection)

Errors

  • Throws an ArgumentError if x and y have different lengths.
  • Throws an ArgumentError if neigh or threshold are not positive integers.
source
Spectra.tlcorrectionFunction
tlcorrection(spectrum::Matrix{Float64}, temperature_C::Float64, laser_wavelength::Float64;
             correction="long", normalisation="area", density=2210.0)
tlcorrection(x::Vector{Float64}, y::Vector{Float64}, temperature_C::Float64, laser_wavelength::Float64;
             correction="long", normalisation="area", density=2210.0)
tlcorrection(multiple_spectra::Vector{<:Matrix{Float64}}, temperature_C::Float64, laser_wavelength::Float64;
             correction="long", normalisation="area", density=2210.0)

Temperature and laser wavelength correction for Raman spectra using one of three available correction equations: "long", "galeener", or "hehlen". Also supports optional normalization of the corrected spectra.

Arguments

  • spectrum::Matrix{Float64}: A single spectrum with x (Raman shift in cm⁻¹) and y (intensity) values in the first and second columns, respectively.
  • x::Vector{Float64}: The Raman shift values in cm⁻¹.
  • y::Vector{Float64}: The intensity values corresponding to x.
  • multiple_spectra::Vector{<:Matrix{Float64}}: A collection of spectra where each spectrum is a matrix with two columns (x and y values).
  • temperature_C::Float64: Temperature in degrees Celsius.
  • laser_wavelength::Float64: Wavelength of the excitation laser line in nanometers (nm).

Options

  • correction::String="long": The equation used for the correction. Choose from:
    • "long": Corrects using the Galeener equation with an additional ( nu_0^3 ) scaling factor for adimensionality (default).
    • "galeener": Uses the original Galeener equation without the ( nu_0^3 ) factor.
    • "hehlen": Applies the Hehlen equation, which includes density corrections to preserve low-frequency signals.
  • normalisation::String="area": Specifies whether to normalize the corrected signal. Options are:
    • "area": Normalizes by integrating over the area under the curve.
    • "intensity": Normalizes by peak intensity.
    • "minmax": Scales between minimum and maximum values.
    • "no": No normalization (default is "area").
  • density::Float64=2210.0: Density of the studied material in kg/m³, used only with the "hehlen" equation. Default is the density of silica glass.

Returns

For single spectrum input (x and y or spectrum):

  • x_out::Vector{Float64}: The Raman shift values (same as input x).
  • y_corr::Vector{Float64}: The corrected intensity values.
  • ese_corr::Vector{Float64}: The propagated errors on the corrected intensities.

For multiple spectra input (multiple_spectra):

  • A vector of tuples where each tuple contains (x_out, y_corr, ese_corr) for a single spectrum.

Notes

  • The old API is not indicated but still available for backward compatibility: you can call tlcorrection(spectrum, ...) with spectrum = [x y]
  • This correction uses the formula reported in Galeener and Sen (1978), Mysen et al. (1982), Brooker et al. (1988) and Hehlen et al. (2010).
  • The "galeener" equation is the exact one reported in Galeener and Sen (1978), which is a modification from Shuker and Gammon (1970) for accounting of (vo - v)^4 dependence of the Raman intensity. See also Brooker et al. (1988) for further discussion.
  • The "long" equation is that of Galeener and Sen (1978) corrected by a vo^3 coefficient for removing the cubic meter dimension of the equation of "galeener". This equation has been used in Mysen et al. (1982) and Le Losq et al. (2012).
  • The "hehlen" equation is that reported in Hehlen et al. (2010). It actually originates before this publication (see Brooker et al. 1988). It uses a different correction that avoid crushing the signal below 500 cm-1. THerefore, it has the advantage of keeping intact the Boson peak signal in glasses.

Notes

  1. Equations:
    • The "long" equation is a modified version of Galeener's equation that includes a nu_0^3 scaling factor to remove cubic meter dimensions. This version has been widely used in studies such as Mysen et al. (1982) and Le Losq et al. (2012).
    • The "galeener" equation is the original form reported by Galeener and Sen (1978), which modifies Shuker and Gammon's (1970) approach to account for (vo - v)^4 dependence.
    • The "hehlen" equation, introduced by Hehlen et al. (2010), avoids signal suppression below 500 cm⁻¹, preserving features like the Boson peak in glasses.
  2. Error Propagation:
    • Errors are calculated as sqrt{y} for raw data and propagated through the correction process.
  3. Normalization:
    • If normalization is enabled, it is applied after the correction step

Examples

Example 1: Correcting a single spectrum, provided as x-y

using Spectra, Random
x = collect(100.0:1.0:1000)
y = rand(length(x)) # Example spectrum
temperature_C = 25.0
laser_wavelength = 532.0 # nm
x_out, y_corr, ese_corr = tlcorrection(x, y, temperature_C, laser_wavelength)

Example 2: Correcting multiple spectra

spectrum1 = hcat(collect(100.0:10:1000), rand(91))
spectrum2 = hcat(collect(100.0:10:1000), rand(91))
multiple_spectra = [spectrum1, spectrum2]
corrected_spectra = tlcorrection(multiple_spectra, temperature_C, laser_wavelength)

Errors

  • Throws an error if an unsupported correction method is specified.
  • Throws an error if normalization is not one of "area", "intensity", "minmax", or "no".
  • Throws an error if input spectra have fewer than two columns or if no spectra are provided for multiple-spectra input.

References

  • Brooker et al. (1988) Journal of Raman Spectroscopy 19(2), 71-78.
  • Galeener and Sen (1978) Physical Review B 17 (4): 1928–33.
  • Hehlen (2010) Journal of Physics: Condensed Matter 22 (2): 025401.
  • Le Losq et al. (2012) American Mineralogist, 97, 779–790.
  • Mysen et al. (1982) American Mineralogist 67: 686–95.
  • Shuker and Gammon (1970) Physical Review Letters 25 (4): 222–25.
source
Spectra.normaliseFunction
normalise(y::Union{Vector{Float64}, Matrix{Float64}}; x::Union{Vector{Float64}, Nothing}=nothing, method::String="intensity")

Normalise the y signal(s) using one of several methods.

Arguments

  • y::Union{Vector{Float64}, Matrix{Float64}}: The input signal(s) to normalize.
    • If y is a vector, it represents a single signal.
    • If y is a matrix, each column represents a separate signal.
  • x::Union{Vector{Float64}, Nothing}: The x-coordinates corresponding to the y values (used only for "area" normalization). Default is nothing.
  • method::String: The normalization method to use. Options are:
    • "area": Normalize by the area under the curve (requires x).
    • "intensity": Normalize by dividing by the maximum intensity.
    • "minmax": Normalize to the range [0, 1].

Returns

  • A normalized vector or matrix of signals (Vector{Float64} or Matrix{Float64}).

Notes

  • For "area" normalization, you must provide an x vector with the same length as each column of y.
  • If using "intensity" or "minmax", no x vector is required.

Examples

using Spectra, Plots

# Single signal normalization

x = collect(0.:0.1:10.)

# create a signal that is the combination of two gaussian peaks
y, ys = gaussiennes([10.,5.], [5.,6.], [1.,0.1], x)

# normalise the signal by area
y_norm = normalise(y; x=x, method="area")
plot(x, y_norm)

# Or normalise multiple signals, such as the two peaks created above (no need of the x axis in this case):

peaks_norm = normalise(ys, method="intensity")
plot(x, y_norm)
source
Spectra.extract_signalFunction
extract_signal(x::Vector{Float64}, y::Vector{Float64}, roi::Matrix{Float64}) -> Tuple{Vector{Float64}, Vector{Float64}, Vector{Int}}
extract_signal(spectrum::Matrix{Float64}, roi::Matrix{Float64})
extract_signal(multiple_spectra::Vector{<:Matrix{Float64}}, roi::Matrix{Float64})

Extract the x-y spectral values in specified regions of interest (ROI) and their indices.

You can pass associated x and y values, a single spectrum in the form of a [x y] matrix, or a list of [x y] matrices.

Arguments

  • x::Vector{Float64}: The x-axis values.
  • y::Vector{Float64}: The corresponding y-axis values.
  • spectrum::Matrix{Float64}: A matrix of size n x 2, where:
    • Column 1: x-axis values.
    • Column 2: y-axis values.
  • multiple_spectra::Vector{<:Matrix{Float64}}: A collection of spectra, where each spectrum is a matrix with two columns:
    • First column: x-coordinates
    • Second column: y-values (signal intensities)
  • roi::Matrix{Float64}: A matrix of size n x 2, where each row specifies a region of interest:
    • Column 1: Lower bounds of the ROI.
    • Column 2: Upper bounds of the ROI.

Returns

  • if calling extract_signal(x, y, roi), it returns a tuple containing:
    1. Vector{Float64}: The x values within the ROI.
    2. Vector{Float64}: The y values within the ROI.
    3. Vector{Int}: The indices of the x and y values within the ROI.
  • if calling extract_signal(spectrum, roi), it returns a tuple containing:
    1. Vector{Float64}: The x values within the ROI.
    2. Vector{Float64}: The y values within the ROI.
    3. Vector{Int}: The indices of the x and y values within the ROI.
  • if calling extract_signal(multiple_spectra, roi), it returns a vector of tuples, where each tuple corresponds to a spectrum and contains:
    1. Vector{Float64}: The x values within the ROI.
    2. Vector{Float64}: The y values within the ROI.
    3. Vector{Int}: The indices of the x and y values within the ROI.

Errors

  • Throws an error if x and y have different lengths.
  • Throws an error if roi is not a 2D matrix with exactly 2 columns.

Notes

  • The function sorts both x-y pairs and the ROI for safety before processing.
  • Multiple ROIs are supported, and their results are concatenated.
source
Spectra.baselineFunction
baseline(x_input::Vector{Float64}, y_input::Vector{Float64}; roi::Union{Matrix{Float64}, Nothing} = nothing, method::String = "polynomial", kwargs...)
baseline(x_input::Vector{Float64}, y_input::Matrix{Float64}; roi::Union{Matrix{Float64}, Nothing} = nothing,  method::String = "polynomial", kwargs...)
baseline(multiple_spectra::Vector{<:Matrix{Float64}}; roi::Union{Matrix{Float64}, Nothing} = nothing,  method::String = "polynomial", kwargs...)

Subtract a baseline from a spectrum y sampled at x values, or a set of spectra, using specified methods that can either fit regions of interests (roi) defined by the user, or use automatic algorithms.

Arguments

  • x_input::Vector{Float64}: The x-axis values (e.g., wavelengths or time points).
  • y_input::Union{Vector{Float64}, Matrix{Float64}}: The spectral data to correct. Can be a single spectrum (vector) or multiple spectra (matrix).
  • multiple_spectra::Vector{<:Matrix{Float64}}: A collection of spectra, where each spectrum is a matrix with two columns:
    • First column: x-coordinates
    • Second column: y-values (signal intensities)
  • roi::Union{Matrix{Float64}, Nothing}: Regions of interest for baseline fitting, specified as a matrix where each row defines a range [start, end]. Default is nothing.
  • method::String: The baseline fitting method. Default is "polynomial". Supported methods:
    • "polynomial" or "poly": Polynomial fitting.
    • "Dspline": 1D spline fitting using Dierckx library.
    • "gcvspline": Generalized Cross Validation Spline fitting.
    • "exp": Exponential background fitting.
    • "log": Logarithmic background fitting.
    • "rubberband": Rubberband baseline fitting.
    • "als": Asymmetric least squares baseline fitting.
    • "arPLS": Asymmetrically reweighted penalized least squares fitting.
    • "drPLS": Doubly reweighted penalized least squares fitting.

Keyword Arguments

Optional parameters for specific methods:

  • polynomial_order::Int: Degree of polynomial for polynomial fitting. Default is 1.
  • s::Float64: Smoothing coefficient for spline methods. Default is 2.0.
  • lambda::Float64: Smoothness parameter for ALS, arPLS, drPLS, and Whittaker methods.
  • p::Float64: Parameter for ALS algorithm. Default is 0.01.
  • ratio::Float64: Parameter for arPLS and drPLS algorithms. Default is 0.01.
  • niter::Int: Number of iterations for ALS and drPLS algorithms. Default is 10.
  • eta::Float64: Roughness parameter for drPLS algorithm. Default is 0.5.
  • d::Int: Order of differences for smoothing algorithms. Default is 2.
  • d_gcv::Int: Order of differences for GCV spline algorithm. Default is 3.
  • p0_exp::Vector{Float64}: Initial parameters for exponential fitting. Default is [1., 1., 1.].
  • p0_log::Vector{Float64}: Initial parameters for logarithmic fitting. Default is [1., 1., 1., 1.].

Returns

  • corrected_signal::Union{Vector{Float64}, Matrix{Float64}}: Baseline-corrected signal(s).
  • baseline::Union{Vector{Float64}, Matrix{Float64}}: Calculated baseline(s).
Note

If providing a collection of spectra, you will receive a collection of tuples (corrected_signal, baseline)

Examples

Correcting a single spectrum:

x = collect(50:1.0:500)
background = 10.0 .* sin.(x./50.0) + 0.1.*x
y = 50.0 .* exp.(-log(2) .* ((x .-250.0)./1.0).^2) + background
y_corrected, y_baseline = baseline(x, y, method="drPLS")

Correcting with regions of interest, GCV spline method:

roi = [[50.0, 200.0], [300.0, 500.0]]
y_corrected, y_baseline = baseline(x, y, roi=roi, method="gcvspline")

You can also adjust manually the smoothing spline coefficient s:

y_corrected, y_baseline = baseline(x, y, roi=roi, method="gcvspline", s=1.0)

Using a vector or arrays of y values:

using Spectra, Plots

# we create a fake signal with 2 peaks plus 2 backgrounds
x = collect(0.:0.2:10.)

# create the signals with create_peaks()
peak_infos = [
    Dict(:type => :gaussian, :amplitude => 10.0, :center => 4.0, :hwhm => 0.6),
    Dict(:type => :gaussian, :amplitude => 5.0, :center => 6.0, :hwhm => 0.4),
]
ys, _ = create_peaks(x, peak_infos)

# add backgrounds
ys[:,1] = ys[:,1] .+ 0.1 * x
ys[:,2] = ys[:,2] .+ 0.2 * x

# fit the background on the first peak: provide as a vector
y_corr, base_ = baseline(x, ys[:,1], method="als")
p1 = plot(x, ys[:,1])
plot!(x, base_)
display(p1)

# Fit the background on multiple peaks: just provide the array!
y_corr, base_ = baseline(x, ys, method="als")
p2 = plot(x, ys, label=["signal 1" "signal 2"])
plot!(x, base_, label=["background 1" "background 2"])
display(p2)
````
## Treating a collection of y values:

Given x1, y1 and x2, y2 two signals, we can create a collection of spectra
and pass it to baseline():

julia collectionsp = [[x1 y1], [x2 y2]] collectedbaselines = baseline(collectionsp, method="als) ```Thecollectedbaselines` is then a vector that contains tuples of (y_corrected, baseline).

source
Spectra.als_baselineFunction
als_baseline(x::Vector{Float64}, y::Vector{Float64}; lambda::Float64=1.0e5, p::Float64=0.01, niter::Int=50, d::Int=2) -> Vector{Float64}

Estimate the baseline of a signal using the Asymmetric Least Squares (ALS) method.

Arguments

  • x::Vector{Float64}: The x-axis values (must be increasing).
  • y::Vector{Float64}: The corresponding y-axis values.
  • lambda::Float64=1.0e5: Smoothing parameter; larger values result in smoother baselines.
  • p::Float64=0.01: Asymmetry parameter; typically between 0.001 and 0.1.
  • niter::Int=50: Number of iterations for the algorithm.
  • d::Int=2: Order of differences for the penalty term.

Returns

  • z::Vector{Float64}: The estimated baseline.
Note

This method uses an iterative approach to minimize the asymmetric least squares error, making it suitable for signals with varying background intensity.

References

G. Eilers and H. Boelens, "Baseline correction with asymmetric least squares smoothing" 2005.

source
Spectra.arPLS_baselineFunction
arPLS_baseline(x::Vector{Float64}, y::Vector{Float64}; lambda::Float64=1.0e5, ratio::Float64=0.01, d::Int=2) -> Vector{Float64}

Estimate the baseline of a signal using the Adaptive Reweighted Penalized Least Squares (arPLS) method.

Arguments

  • x::Vector{Float64}: The x-axis values (must be increasing).
  • y::Vector{Float64}: The corresponding y-axis values.
  • lambda::Float64=1.0e5: Smoothing parameter; larger values result in smoother baselines.
  • ratio::Float64=0.01: Convergence ratio for stopping criterion.
  • d::Int=2: Order of differences for the penalty term.

Returns

  • z::Vector{Float64}: The estimated baseline.

Notes

The arPLS algorithm iteratively adjusts weights based on residuals to improve baseline estimation accuracy.

References

Baek et al., "Baseline correction using adaptive iteratively reweighted penalized least squares," Analyst 140 (2015): 250–257.

source
Spectra.drPLS_baselineFunction
drPLS_baseline(x::Vector{Float64}, y::Vector{Float64}; lambda::Float64=1.0e5, niter::Int=50, eta::Float64=0.5, ratio::Float64=0.001, d::Int=2) -> Vector{Float64}

Perform baseline correction using the Doubly Reweighted Penalized Least Squares (drPLS) algorithm.

Arguments

  • x::Vector{Float64}: The x-axis values (must be increasing).
  • y::Vector{Float64}: The corresponding y-axis values.
  • lambda::Float64=1.0e5: Smoothing parameter; larger values result in smoother baselines.
  • niter::Int=50: Maximum number of iterations for the algorithm.
  • eta::Float64=0.5: Parameter controlling the influence of weights in the penalty term.
  • ratio::Float64=0.001: Convergence threshold for stopping criterion.
  • d::Int=2: Order of differences for the penalty term.

Returns

  • baseline_fitted::Vector{Float64}: The estimated baseline.

Notes

The drPLS algorithm builds on arPLS by introducing an additional reweighting scheme to improve robustness against noise and outliers.

References

Xu et al., "Baseline correction method based on doubly reweighted penalized least squares," Applied Optics 58 (2019): 3913–3920.

source
Spectra.rubberband_baselineFunction
rubberband_baseline(x::Vector{Float64}, y::Vector{Float64}; segments=1) -> Vector{Float64}

Estimate a baseline using the rubberband method based on the lower convex hull.

Warning

This function is currently under development and may not behave as intended in all cases. !!

Arguments

  • x::Vector{Float64}: The x-axis values of the data.
  • y::Vector{Float64}: The corresponding y-axis values.
  • segments (optional): Specifies how to segment the data:
    • If an integer, splits the data into equally sized segments (default is 1).
    • If a vector of integers, specifies exact indices where segmentation occurs.

Returns

  • baseline_fitted::Vector{Float64}: The estimated baseline as a lower envelope of the data.

Raises

  • Throws an error if x and y have different lengths.
  • Throws an error if a segment does not contain enough points for interpolation.

Notes

The rubberband method estimates a baseline by:

  1. Identifying local minima to approximate the lower envelope.
  2. Computing the convex hull of these points.
  3. Interpolating between convex hull points to generate a smooth baseline.
source
Spectra.smoothFunction
smooth(x::Vector{Float64}, y::Union{Vector{Float64}, Matrix{Float64}}; 
       method::String = "gcvspline", 
       window_length::Int = 5, 
       polyorder::Int = 2, 
       lambda::Float64 = 10.0^5, 
       d::Int = 2, 
       w = nothing, 
       d_gcv::Int = 3)

Smooth a signal using various smoothing methods.

Arguments

  • x::Vector{Float64}: The x-coordinates of the input signal (e.g., time or spatial values).
  • y::Union{Vector{Float64}, Matrix{Float64}}: The y-coordinates of the input signal to be smoothed.
  • method::String: The smoothing method to use. Options include:
    • "whittaker": Whittaker smoother, which uses weights (w) and smoothing parameter (lambda).
    • "gcvspline": Generalized cross-validation spline smoothing (requires DataInterpolations.jl).
    • "flat": Moving average.
    • "hanning", "hamming", "bartlett", "blackman": Window-based smoothing methods (requires DSP.jl).
    • "savgol": Savitzky-Golay filter.
  • window_length::Int: The length of the smoothing window (used in window-based and Savitzky-Golay methods). Must be a positive odd integer
  • polyorder::Int: The polynomial order for the Savitzky-Golay filter. Must be less than window_length
  • lambda::Float64: Smoothing parameter for the Whittaker smoother. Higher values result in smoother fits
  • d::Int: Order of differences for the Whittaker smoother. Default is 2.
  • w: Weights for the Whittaker smoother. size(w) should be equal to size(y). If not provided (nothing), uniform weights are used by default.
  • d_gcv::Int: Order of differences for the GCV spline algorithm (used in "gcvspline")

Returns

  • A smoothed vector or matrix of signals (Vector{Float64} or Matrix{Float64}).

Notes

  • Deprecated Methods: The methods "GCVSmoothedNSpline", "MSESmoothedNSpline", and "DOFSmoothedNSpline" are no longer supported and will throw an error if used.
  • For window-based methods ("flat", "hanning", etc.), the signal is symmetrically extended at both ends to reduce edge effects.
  • The Savitzky-Golay filter requires that window_length be a positive odd integer and that polyorder be less than window_length.
  • For Whittaker smoothing, if weights (w) are provided, they must have the same length as x.

Example

using Spectra, Plots

x = collect(1:10)
y = [1.0, 2.5, 3.0, 4.2, 5.1, 6.0, 7.3, 8.1, 9.4, 10.0]
method = "hanning"
window_length = 3

smoothed_y = smooth(x, y; method=method, window_length=window_length)
p1 = plot(x, [y smoothed_y], label=["Original" "Smoothed"], title="Smoothing Example", xlabel="X-axis", ylabel="Y-axis")
display(p1)

Errors

  • Throws an error if an unsupported or deprecated method is specified.
  • Throws an error if window_length is not a positive odd integer.
  • For Savitzky-Golay smoothing:
    • Throws an error if polyorder is greater than or equal to window_length.
  • For Whittaker smoothing:
    • Throws an error if weights (w) are provided but do not match the length of x.

References

source
Spectra.whittakerFunction
whittaker(x::Vector{Float64}, y::Vector{Float64}, w::Vector{Float64}, lambda::Float64; d::Int=2) -> Vector{Float64}

Smooth a signal using the Whittaker smoother with divided differences.

Arguments

  • x::Vector{Float64}: The x-axis values of the data (must be increasing).
  • y::Vector{Float64}: The corresponding y-axis values, assumed to be sampled at equal intervals.
  • w::Vector{Float64}: Weights for each data point. Higher weights indicate greater importance in smoothing.
  • lambda::Float64: Smoothing parameter; larger values result in smoother outputs. It is recommended to be set based on the length of x.
  • d::Int=2: Order of differences for the penalty term (default is 2).

Returns

  • z::Vector{Float64}: The smoothed y-axis values.

Notes

  • For equally spaced x values, higher-order differences are computed manually.
  • For unequally spaced x values, a difference matrix is constructed using the ddmat function.
  • A sparse matrix representation is used for computational efficiency.

References

  • Eilers, P.H.C. (2003). "A perfect smoother." Analytical Chemistry, 75, 3631–3636.

Example

using Spectra, Random, Plots
x = sort(rand(50) .* 10)                 # Randomly spaced x values in [0, 10]
true_y = sin.(x)
y = true_y .+ 0.1 .* randn(length(x))  # Noisy sine wave
w = ones(length(x))                     # Equal weights
lambda = 0.1                          # Smoothing parameter, !!! its value depends on the length of x !!!
z = whittaker(x, y, w, lambda; d=2)

p1 = plot(x, true_y); plot!(x, y); plot!(x, z), display(p1)

Matlab version by Paul Eilers, 2003 Julia translation by Charles Le Losq 2017, revised 2025

source