Data Processing

Introduction

Spectra lets you perform several processing steps on x-y spectral data. Short examples come first, followed by the documentation of the various functions you may want to use.

As a starting point, and for the sake of example, we create two synthetic signals to play with. They are Gaussian signals randomly sampled along two different X axes, with noise and increasing backgrounds. One of them also has a strong spike!

# Signal creation
using Spectra, Plots

# two X axes, each randomly sampled between 0 and 100
x_1 = rand(1000)*100
x_2 = rand(1000)*100

# linearly increasing backgrounds, one per signal
background_1 = 0.08 * x_1
background_2 = 0.03 * x_2

# some noise
noise_1 = 0.5*randn(1000)
noise_2 = 0.3*randn(1000)

# the full toy signals
y_1 = gaussian(x_1, 10.0, 40., 5.) .+ background_1 .+ noise_1
y_2 = gaussian(x_2, 20.0, 60., 9.) .+ background_2 .+ noise_2

# one of them will have a spike!
y_1[600] = 250.0

# make a plot of our two spectra
scatter(x_1, y_1)
scatter!(x_2, y_2)

We can do the following steps (not necessarily in this order):

correct_xshift allows correcting X-axis shifts of your spectra from a reference value (e.g. silicon wafer reference in Raman spectroscopy).
nm_to_invcm or invcm_to_nm convert the X axis between nanometers (nm) and wavenumbers (cm$^{-1}$).
flipsp sorts the X axis (this is necessary for some algorithms).
resample allows getting our spectra on the same X axis for convenience.
despiking removes spikes in the signal.
baseline allows removing the background.
smooth allows smoothing signals.
tlcorrection corrects Raman spectra for temperature and laser wavelength effects.
normalise allows normalising the spectra.
extract_signal can extract specific portions of a signal.
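To make the unit conversion in the list above concrete, here is a minimal sketch of the arithmetic that links nanometers and wavenumbers. The helper names are hypothetical and are not the Spectra functions; whether nm_to_invcm returns absolute wavenumbers or Raman shifts relative to a laser line should be checked in its own documentation.

```julia
# Hypothetical helpers, written only to show the arithmetic.
# 1 cm = 1e7 nm, so a wavelength in nm maps to 1e7 / wavelength in cm^-1.
nm_to_wavenumber(lambda_nm) = 1.0e7 ./ lambda_nm
wavenumber_to_nm(nu_invcm) = 1.0e7 ./ nu_invcm

# Raman shift (cm^-1) of a scattered wavelength relative to the laser wavelength.
raman_shift(lambda_nm, laser_nm) = 1.0e7 / laser_nm .- 1.0e7 ./ lambda_nm

nu = nm_to_wavenumber(500.0)               # 20000.0 cm^-1
back = wavenumber_to_nm(nu)                # back to 500.0 nm
shifts = raman_shift([532.0, 546.0], 532.0)
```

With a 532 nm laser, light scattered at 546 nm corresponds to a Raman shift of about 482 cm$^{-1}$.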

Thanks to Julia's multiple dispatch, those functions support different types of inputs. The outputs differ accordingly; see the individual documentation of each function for further details. This is convenient because it saves you from writing your own loops to process many spectra at once.
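As an illustration of that idea, the sketch below defines one hypothetical function, peak_height, with three methods. It is not part of Spectra; it only shows how one name can accept a vector, a matrix of signals, or a list of [x y] matrices, and return a matching output.

```julia
# Hypothetical function, for illustration only (not a Spectra function).

# one signal: return a single number
peak_height(y::Vector{Float64}) = maximum(y)

# several signals stored as columns: return one number per column
peak_height(ys::Matrix{Float64}) = vec(maximum(ys, dims=1))

# a list of [x y] matrices: use the second column of each spectrum
peak_height(spectra::Vector{<:Matrix}) = [peak_height(sp[:, 2]) for sp in spectra]

h_single = peak_height([1.0, 3.0, 2.0])
h_matrix = peak_height([1.0 4.0; 2.0 3.0])
h_list = peak_height([[0.0 1.0; 1.0 5.0], [0.0 2.0; 1.0 0.5]])
```

Julia picks the method from the type of the argument, so the caller never writes the loop over spectra.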

Let's now use some of those functions below on the signal generated above.

Sort X Axis

We can sort the data by passing each spectrum, as a [x y] matrix, to flipsp. After that we should have no problem plotting the signals with lines, for instance!

spectrum_1 = flipsp([x_1 y_1])
spectrum_2 = flipsp([x_2 y_2])
plot(spectrum_1[:,1], spectrum_1[:, 2])
plot!(spectrum_2[:,1], spectrum_2[:, 2])
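For readers curious about what sorting a spectrum involves, here is a minimal sketch in plain Julia. The helper sort_by_x is hypothetical and is not the flipsp implementation; it only shows that the rows of a [x y] matrix are reordered together so that each y value stays attached to its x value.

```julia
# Hypothetical helper: reorder the rows of an [x y] matrix by increasing x.
sort_by_x(sp::Matrix{Float64}) = sp[sortperm(sp[:, 1]), :]

sp = [3.0 30.0; 1.0 10.0; 2.0 20.0]
sorted_sp = sort_by_x(sp)
```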

Remove spikes

OK, the plot above reveals a strong spike in one of the signals. We will treat both signals with despiking, using the default parameters, to remove any spikes. In summary, with the default settings, despiking checks whether any point in a spectrum differs by more than 3 sigma from the mean value of its 4 neighboring points. You can change the default values to adjust the threshold (more or less than 3 sigma) or the number of neighboring points considered.

y_1 = despiking(x_1, y_1)
y_2 = despiking(x_2, y_2)

1000-element Vector{Float64}:
  0.18817811184926136
  2.825728859381716
 18.90814929798047
  5.552506574951748
  0.5404993688493123
  7.659309584346769
  2.1166773445622216
 -0.24237392463827892
  6.111709327837205
  2.0343588470916356
  ⋮
  0.3826958219865128
  1.1071294178761881
  0.3390395700211149
  0.6241420175657777
 14.268809220590922
  1.4067235017257023
 17.99350749200591
  0.9379340582981646
  0.5396373580960956

Tip

You could also call despiking on the collection of spectra as

collection_spectra = [[x_1 y_1], [x_2 y_2]] 
ys = despiking(collection_spectra)
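The neighbor-based test described in this section can be sketched as follows. This is a simplified illustration, not the despiking source code: the function name, the choice of the standard deviation of the residuals as "sigma", and the replacement of a spike by the mean of its neighbors are all assumptions made for the example.

```julia
using Statistics

# Hypothetical helper, not the Spectra implementation.
function despike_sketch(y::Vector{Float64}; neigh::Int=4, threshold::Float64=3.0)
    n = length(y)
    half = neigh ÷ 2
    # mean of the nearest neighbours of each point (the point itself is excluded;
    # fewer neighbours are available at the two ends of the signal)
    neighbour_mean = similar(y)
    for i in 1:n
        idx = [j for j in max(1, i - half):min(n, i + half) if j != i]
        neighbour_mean[i] = mean(y[idx])
    end
    # assumption: sigma is the standard deviation of the residuals
    residuals = y .- neighbour_mean
    sigma = std(residuals)
    spikes = abs.(residuals) .> threshold * sigma
    # assumption: a flagged point is replaced by the mean of its neighbours
    y_out = copy(y)
    y_out[spikes] .= neighbour_mean[spikes]
    return y_out, findall(spikes)
end

# a smooth test signal with one artificial spike at index 50
y = [sin(0.1 * i) for i in 1:100]
y[50] = 50.0
y_clean, spike_idx = despike_sketch(y)
```

Note that the test compares neighbors by index, so it is only meaningful when the points are ordered along the X axis.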

Resample spectra

Using resample, we can resample a spectrum or spectra on a user-defined X axis by calling

x_new = collect(0.:0.5:100)
y_new = resample(x, y, x_new)

By default, resample uses a linear interpolation method from the DataInterpolations.jl package, but you can specify other methods available at https://docs.sciml.ai/DataInterpolations/stable/methods/.

If you have multiple spectra, it is particularly useful to pass them as a collection: you then receive an array of spectra as output, all sampled on the same X axis.

Continuing on the example shown above, we can do:

x_new = collect(0.:0.5:100)
spectra_ = [[x_1 y_1], [x_2 y_2]]
spectra_same_x = resample(spectra_, x_new)
plot(x_new, spectra_same_x)
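To show what linear resampling onto a new axis does, here is a hand-written sketch in plain Julia. It is not the DataInterpolations.jl code that resample relies on; the function name is hypothetical, and clamping to the end values outside the measured range is an assumption of this example (the library may extrapolate instead).

```julia
# Hypothetical helper, for illustration only.
function linear_resample_sketch(x::Vector{Float64}, y::Vector{Float64}, x_new::Vector{Float64})
    # sort the input so that neighbouring indices are neighbours along x
    p = sortperm(x)
    xs, ys = x[p], y[p]
    y_new = similar(x_new)
    for (k, xn) in enumerate(x_new)
        # index of the first sorted x value that is >= xn
        j = searchsortedfirst(xs, xn)
        if j == 1
            y_new[k] = ys[1]        # at or below the range: clamp (assumption)
        elseif j > length(xs)
            y_new[k] = ys[end]      # above the range: clamp (assumption)
        else
            # weight of the right-hand neighbour, between 0 and 1
            t = (xn - xs[j-1]) / (xs[j] - xs[j-1])
            y_new[k] = (1 - t) * ys[j-1] + t * ys[j]
        end
    end
    return y_new
end

# unsorted samples of the line y = 2x, resampled at two new positions
x = [0.0, 2.0, 1.0]
y = [0.0, 4.0, 2.0]
y_new = linear_resample_sketch(x, y, [0.5, 1.5])
```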

Baseline subtraction

Baseline subtraction is performed using baseline, which serves as the main API and wraps several dedicated baseline correction algorithms. As with the other functions, you can pass x and y vectors, or an x vector and an array of y spectra.

Continuing with our example, we will do here:

ys_corrected, ys_baselines = baseline(x_new, spectra_same_x, method="arPLS")
p1 = plot(x_new, spectra_same_x)
plot!(x_new, ys_baselines, labels=["background 1" "background 2"])

Other methods are available; see the Tutorials and the baseline function documentation for further details!
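To clarify what the two returned values represent, the sketch below fits a much simpler baseline than arPLS: a straight line, by least squares, on regions assumed to be free of signal. It is a different technique, chosen only because it fits in a few lines; the function name and the way the peak window is passed are hypothetical.

```julia
using LinearAlgebra

# Hypothetical helper: straight-line baseline fitted outside the window [lo, hi].
# This is NOT arPLS and not the Spectra implementation.
function linear_baseline_sketch(x::Vector{Float64}, y::Vector{Float64}, lo::Float64, hi::Float64)
    # keep only the points outside the window that contains the peak
    mask = (x .< lo) .| (x .> hi)
    # least-squares fit of y = intercept + slope * x on those points
    A = [ones(count(mask)) x[mask]]
    intercept, slope = A \ y[mask]
    bas = intercept .+ slope .* x
    # same order as in the example above: corrected signal, then baseline
    return y .- bas, bas
end

# noise-free toy signal: a peak of height 10 centred at 40, on a 0.08 * x background
x = collect(0.0:0.5:100.0)
peak = 10.0 .* exp.(-log(2) .* ((x .- 40.0) ./ 5.0) .^ 2)
y = peak .+ 0.08 .* x
y_corr, bas = linear_baseline_sketch(x, y, 20.0, 60.0)
```

The corrected signal is simply the input minus the estimated baseline, which is why both are worth plotting together.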

Smoothing

Spectra smoothing can be achieved with the smooth function, which supports several algorithms:

Whittaker smoother: Custom Julia implementation based on the Matlab code of Eilers (2003). It supports both equally and unequally spaced X values.
Savitzky-Golay smoother: Provided by the SavitzkyGolay.jl library.
GCV cubic spline smoother: From the DataInterpolations.jl library.
Window-based smoothers: leverage the DSP.jl library.

For fine control over smoothing parameters, you can use the whittaker function directly, allowing you to change weights w or the smoothing order d (also possible in smooth).
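To give an idea of what the weights w and the order d control, here is a minimal sketch of a Whittaker smoother for equally spaced data, following the penalized least-squares formulation of Eilers (2003). It is not the Spectra implementation: the function name, the keyword names and the default values are assumptions, and unequally spaced X values are not handled.

```julia
using LinearAlgebra, SparseArrays

# Hypothetical helper: solves (W + lambda * D'D) z = W y for the smoothed signal z,
# where D is the d-th order difference matrix and W holds the weights.
function whittaker_sketch(y::Vector{Float64}; lambda::Float64=100.0, d::Int=2,
                          w::Vector{Float64}=ones(length(y)))
    n = length(y)
    # build D by differencing the identity matrix d times
    D = sparse(1.0I, n, n)
    for _ in 1:d
        D = D[2:end, :] - D[1:end-1, :]
    end
    W = spdiagm(0 => w)
    return (W + lambda * (D' * D)) \ (w .* y)
end

# a straight line has zero second differences, so d = 2 leaves it unchanged
line = collect(2.0:2.0:20.0)
z_line = whittaker_sketch(line)

# a zig-zag signal gets flattened
zigzag = [Float64(i + (-1)^i) for i in 1:50]
z_zigzag = whittaker_sketch(zigzag)
roughness(v) = sum(abs2, diff(diff(v)))
```

A larger lambda penalizes roughness more strongly, and setting a weight to zero makes the smoother ignore that point and interpolate across it.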

Continuing with our example, we pass the matrix of resampled signals to smooth:

smoothed_y = smooth(x_new, spectra_same_x; method="gcvspline")

p1 = plot(x_new, spectra_same_x)
plot!(x_new, smoothed_y, labels=["smoothed 1" "smoothed 2"])

Other methods are available; see the Tutorials and the smooth function documentation for further details!

Signal normalisation

Using normalise, you can normalise signals to their maximum intensity (method="intensity"), the area under the curve (method="area") or to their minimum and maximum values (minimum will be set to 0, maximum to 1) (method="minmax").

For instance, continuing with our example, we can do:

normalised_ys = normalise(spectra_same_x, method="intensity")
p1 = plot(x_new, normalised_ys)

If you want to normalise the signals by their areas, you have to pass the x values too, like:

normalised_y = normalise(y_matrix, x, method="area")
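The three normalisations described in this section amount to the arithmetic sketched below. The helper names are hypothetical, and the use of the trapezoidal rule for the area is an assumption of this example rather than a statement about how normalise integrates.

```julia
# Hypothetical helpers, for illustration only.

# area under the curve by the trapezoidal rule (assumed integration scheme)
trapz(x, y) = sum(diff(x) .* (y[1:end-1] .+ y[2:end]) ./ 2)

norm_intensity(y) = y ./ maximum(y)                                # maximum becomes 1
norm_area(x, y) = y ./ trapz(x, y)                                 # area becomes 1
norm_minmax(y) = (y .- minimum(y)) ./ (maximum(y) - minimum(y))    # range becomes [0, 1]

x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 3.0]

y_int = norm_intensity(y)
y_area = norm_area(x, y)
y_mm = norm_minmax(y)
```

The area normalisation is the only one that needs x, since the area depends on the spacing between points.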

Signal extraction

Extract signals in specific regions of interest using extract_signal. You can pass associated x and y values, a single spectrum in the form of a [x y] matrix, or a list of [x y] matrices.

For instance, for a single signal in which we want the values between 40 and 60, we would write:

roi = [40. 60.]
extracted_x, extracted_y, indices = extract_signal(x, y, roi)

You can also extract the signals in different portions by using a matrix for the regions of interest. For instance, to extract signals between 20 and 40, and 60 and 80, we can do:

roi = [[20. 40.]; [60. 80.]]
extracted_x, extracted_y, indices = extract_signal(x, y, roi)
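The role of the region-of-interest matrix can be sketched in plain Julia: each row holds the lower and upper bound of one region, and a point is kept if it falls in any of them. The helper below is hypothetical and is not the extract_signal source; treating the bounds as inclusive is an assumption.

```julia
# Hypothetical helper, for illustration only.
function extract_sketch(x::Vector{Float64}, y::Vector{Float64}, roi::Matrix{Float64})
    keep = falses(length(x))
    for r in 1:size(roi, 1)
        # flag the points inside region r (bounds assumed inclusive)
        keep .= keep .| ((x .>= roi[r, 1]) .& (x .<= roi[r, 2]))
    end
    idx = findall(keep)
    return x[idx], y[idx], idx
end

x = collect(0.0:10.0:100.0)
y = 2.0 .* x
roi = [20. 40.; 60. 80.]
ex, ey, idx = extract_sketch(x, y, roi)
```

Here ex contains 20, 30, 40, 60, 70 and 80, and idx gives their positions in the original x vector.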

Functions API

Spectra.correct_xshift — Function

correct_xshift(x::Vector{Float64}, y::Union{Vector{Float64}, Matrix{Float64}}, shift::Float64)
correct_xshift(sps::Vector{<:Matrix}, shift::Float64)

Return the signal(s) corrected for a given linear shift, evaluated at the same x locations as the input.

Signals can be provided as y (vector or an array of ys values) for a given x vector, or as a list of [x y] arrays of signals.

Depending on the arguments, it either returns a new vector or array of y at the position x, or a new list of corrected [x y] spectra.

This is typically used to correct a linear shift in x on Raman spectra: for instance, you measured the Si wafer peak at 522.1 cm$^{-1}$ while you know it is at 520.7 cm$^{-1}$. You then call this function to correct your spectra for this shift, without changing the x values.

Examples

using Spectra

# for a vector y
x = [0., 1., 2., 3.]
y = 2*x
shift = -0.1

new_y = correct_xshift(x, y, shift)

# for an array of y
x2 = [0.5, 1.3, 2.0, 4.5]
y2 = [2*x 3*x 4*x]
new_y = correct_xshift(x, y2, shift)

# for a list of [x y] arrays
old_spectra_list = [[x y], [x2 y2]]
new_spectra_list = correct_xshift(old_spectra_list, shift)