edit

Pre-Processing

Temperature and frequency corrections for Raman spectra

Raman spectra can be corrected from temperature and excitation line effects using this function.

# Spectra.tlcorrection — Method.

tlcorrection(data::Array{Float64},temp::Float64,wave::Float64;correction="long",normalisation="area",density=2210.0)

INPUTS:

data: Array{Float64}, input spectrum with x and y in first and second columns respectively;

temp: Float64, the temperature in °C;

wave: Float64, the wavenumber at which the spectrum was acquirred in nm.

OPTIONS:

correction: String, the equation used for the correction. Choose between "long", "galeener", or "hehlen". Default = "long".

normalisation: String, indicate if you want to normalise your signal or not. Choose between "intensity", "area", or "no". Default = "area".

density: Float64, the density of the studied material in kg m-3, to be used with the "hehlen" equation. Default = 2210.0 (density of silica).

OUTPUTS:

(are combined in one array if only one output name is given)

x: Array{Float64}, containing the x values;

long: Array{Float64}, containing the corrected y values;

eselong: Array{Float64}, containing the errors calculated as sqrt(y) on raw data and propagated after the correction.

NOTES:

This correction uses the formula reported in Galeener and Sen (1978), Mysen et al. (1982), Brooker et al. (1988) and Hehlen et al. (2010).

The "galeener" equation is the exact one reported in Galeener and Sen (1978), which is a modification from Shuker and Gammon (1970) for accounting of (vo - v)^4 dependence of the Raman intensity. See also Brooker et al. (1988) for further discussion.

The "long" equation is that of Galeener and Sen (1978) corrected by a vo^3 coefficient for removing the cubic meter dimension of the equation of "galeener". This equation has been used in Mysen et al. (1982), Neuville and Mysen (1996) and Le Losq et al. (2012).

The "hehlen" equation is that reported in Hehlen et al. (2010). It actually originates before this publication (Brooker et al. 1988). It uses a different correction that avoid crushing the signal below 500 cm-1. THerefore, it has the advantage of keeping intact the Boson peak signal in glasses.

source

Removing cristal or epoxy signals

Spectra.jl contains a function that helps removing the signal from crystals in the Raman spectra of glasses. Two spectra are needed: that of the mixed crystal+glass signals, and that of the pure cristal signals. Please note that it also can be used to remove signal from epoxy.

This function is still under test and experimental. Further details on the code will be provided soon. For now, only a short description is provided.

# Spectra.ctxremoval — Method.

ctxremoval(liste,in_path,out_path,roi_all;input_properties=('   ',0),algorithm="FastICA",plot_intermediate_show = "no",plot_mixing_show = "yes",plot_final_show = "no",save_fig_switch = "yes", shutdown = 1300.,scaling=100.)

INPUTS

liste: Array{Float64}, an array contaning the information for getting the spectra. 

    Column 1: the name and relative path for the crystal spectra; 

    Column 2: the name and relative path for the (mixed) glass spectra;

    Column 3: smo_water, the coefficient of smothing for the spline that fits the backgrous below the signal of water (if any) in the glass spectra

    Column 4: number of iteration for the generation of new mixed spectra

    Column 5: the K_Start parameter, set at 0.0

    Column 6: the K_increament parameter, for mixing the signals

in_path: String, the relative location of the data, e.g. "./raw/"

out_path: String, the relative location where you want to save the corrected spectra, e.g. "./treated/" 

roi_all: Tuple, contains 2 arrays and 2 Float64 numbers. The 2 arrays indicate the regions of interest where the background correction is applied, for the cristal and the glass. The  2 float numbers indicate the starting and ending frequency of the peak used to correct the spectra from any shift in frequency. For instance:

    Those are the roi for fitting the baseline on the crystal (roi_ctx) and glass (roi_glass) signals:

    roi_ctx = [1260. 2000.;2000. 4000.]

    roi_glass = [1260. 2000.;2000. 3000.;3750. 4000.]

    We have a strong peak from the crystal at ~650 cm-1 that we can use to correct the spectra from any shift in frequency. So we indicate the values here:

    roi_xshift_low = 655.

    roi_xshift_high = 670.

    Then we construct the final tuple as:

    roi_all = (roi_ctx,roi_glass,roi_xshift_low,roi_xshift_high)

OPTIONS

input_properties: Tuple, this tuple contains the delimiter and the number of lines to skip in the raw data files. Default = ('  ',0);

algorithm: String, This indicates if the FastICA or the Non-negative Matrix Factorisation (NMF) algorithms from SciKit Learn will be used. Default = "FastICA";

plot_intermediate_show: String, This should be equal to "yes" or "no". It displays the intermediate figures. Default = "no";

plot_mixing_show: String, This should be equal to "yes" or "no". It displays the figures showing the mixing step. Default = "yes";

plot_final_show: String, This should be equal to "yes" or "no". It displays the final figures, showing the background subtraction and the retrieved signals. Default = "yes";

save_fig_switch: String, This should be equal to "yes" or "no". It indicates if you want to save the final figures in the location indicated by out_path;

shutdown: Float64, indicates where you consider the signals from silicate units to stop. Default = 1300.0;

scaling: Float64, the retrieved spectra are scaled to the original spectra using the Boson peak, located ~ 60-80 cm-1. This parameters indicates where you consider the Boson peak to stop for the scaling procedure. No need to put a too high value, as you might get strong crystal signals at frequencies > 100-150 cm-1.

OUTPUTS

All the corrected spectra and figures are saved in the location indicated in out_path. No direct outputs in Julia.

source

Smoothing signals

Smoothing the signal is achieved with the smooth function. Use of the GCVSplineNSmooth algorithm from the gcvspline Python library is recommended. It seems to give very reasonable smoothing signals. In the present call of GCVSplineNSmooth, errors as sqrt(y) are assumed.

# Spectra.smooth — Method.

smooth(x::Array{Float64},y::Array{Float64};filter=:SavitzkyGolay,M=5,N=2,ese_y = 1.0)

This function allows smoothing noisy data with Savitzky-Golay filter or GCV splines.

INPUTS:

x: Array{Float64}

1 column or vector array x values;

y: Array{Float64}

1 column or vector array y values associated with x;

OPTIONS:

filter: Symbol

the filter that will be used. Available filters are :SavitzkyGolay, :GCVSmoothedNSpline, :MSESmoothedNSpline, :DOFSmoothedNSpline, :Whittaker. The spline filters are automatic, but the :SavitzkyGolay filter requires adjusting the M and N parameters.

M = 5, Int

half window size of the :SavitzkyGolay filter. M is the number of points before and after to interpolate, i.e. the full width of the window is 2M+1;

N = 2, Int

polynomial degree of the :SavitzkyGolay filter;

lambda = 10.0^5, Float64

smoothing parameter of the Whittaker filter described in Eilers (2003). The higher the smoother the fit.

d = 2, Int

d parameter in Whittaker filter, see Eilers (2003)

ese_y, Float64 or Array{Float64}

an estimation of the errors on y

OUTPUTS:

smoothed_y: Array{Float64}

the smoothed y values.

NOTE:

See documentation and examples of gcvspline at https://charlesll.github.io/gcvspline/ for details on the gcvspline Python library.

source

Baseline subtraction

Baseline subtraction can be made with using the baseline function:

# Spectra.baseline — Method.

baseline(x::Array{Float64},y::Array{Float64},roi::Array{Float64},basetype::AbstractString;p=1.0,lambda = 10^5,SplOrder=3,roi_out="no")

Baseline subtraction can be made with using the baseline function:

INPUTS:

x: Array{Float64}

containing the x values;

y: Array{Float64}

containing the y values;

roi: Array{Float64}

containing the region of interest, i.e. the places where you want to fit the baseline. For instance, if the baseline should fit the regions comprised between 750 and 800 cm^{-1}, and 1250 and 1300 cm^{-1}: roi = [750. 800.; 1250. 1300.];

basetype: AbstractString

the type of baseline that you want to use. For now, polynomial and cubic spline baselines are available. Indicate the type you want as:

Polynomial baseline: enter "poly" for basetype, then the polynomial degree as p.

Dierckx cubic spline baseline: enter "Dspline" for basetype, then the smoothing degree as p. This uses the Dierckx package: https://github.com/kbarbary/Dierckx.jl

Generalised Cross-Validated baseline: enter "gsvspline" for basetype, then the smoothing degree as p.

Kernel Ridge Regression: enter "KRregression" for basetype, no need to provide p.

Support Vector Machines regression: enter "SVMregression" for basetype, no need to provide p.

ALS algorithm: enter "als" for automatic baseline fitting following Eilers and Boelens (2005).

arPSL algorithm: enter "arPLS" for automatic baseline fitting following Baek et al. (2015).

whittaker algorithm: enter "whittaker" to use the whittaker smoother described in Eiler (2003) which fit the signal in the roi regions. See also function whitsmdd().

OPTIONS:

p:: Float64

Default = 1.0. If using gcvspline or Dspline, this number indicates the spline smoothing coefficient. If using "poly", it is the degree of the polynomial function to be fitted. Please enter a float number (1.0, 2.0 or 3.0 for splines of order 1, 2 or 3), and it is automatically converted to an Integer for the polyfit function.

For the ALS algorithm, choose p in the range 0.001 - 0.1.

For the arPLS algorithm, p corresponds to the breaking ratio parameter in the paper of Baek et al. (2015). Test different values of p, starting at high p values (~0.1).

lambda: Float64

smoothing parameter of the ALS, arPLS and whittaker algorithms, recommended values in the range 10^2 - 10^9; Default = 10^5.

SplOrder: Integer

the spline coefficient to be used with the Dspline or gcvspline options. Default = 3.

roi_out: String, "no" or "yes".

This will result in an additional output matrix containing the y signal in the roi regions of interest, which can then be used to plot and to evaluate the roi provided to the baseline function.

niter: Int

number of iterations for the ALS algorithm. Default = 10.

OUTPUTS:

(are combined in a tuple in one array if only one output variable is provided)

y_corr: Array{Float64}

the spectrum corrected from its baseline;

bass: Array{Float64}

the baseline.

OPTIONAL OUTPUT:

y_roi_out: Array{Float64}

an 2 column array containing the initial x-y pairs of the signal in the roi regions of interest.

NOTES:

Errors on measurenements are automatically provided as sqrt(y) in gcvspline. For further options, please use the gcvspl and splderivative functions that directly call the GCVSPL and SPLDER function of the gcvspl.f program (Holtring, 1986). Further informations for the use of splines are given in the Splines section, see :ref:Splines.

The Kernel Ridge and Support Vector Machines regression algorithms call the Scikit Learn library, available in Python. This library thus SHOULD be installed. They are machine learning algorithms that will try to automatically fit the baseline in the provided regions of interest. They are slower that splines, but have the advantage of avoiding the (sometimes painful) tuning of the spline coefficients.

The Kernel Ridge and Support Vector Machines regression algorithms used a Cross-Validated approach to increase the generalisation and avoid overfitting. The GridSearchCV function of SciKit Learn is called, with 5 fold cross-validation and the following gridsearch parameters:

For KRregression:

param_grid=Dict("alpha"=> [1e0, 0.1, 1e-2, 1e-3],"gamma"=> logspace(-4, 4, 9));

For SVMregression:

param_grid=Dict("C"=> [1e0, 1e1, 1e2, 1e3],"gamma"=> logspace(-4, 4, 9)).

Please see the SciKit Learn documentation at http://scikit-learn.org/stable/index.html for further details on the implementation of those technics, together with the source code of Spectra.jl.

EXAMPLES:

For instance, for subtracting a constant baseline between 1250 and 1300 cm^{-1}:

roi = [1250. 1300.]

basetype = "poly"

y_corr, bas = baseline(x,y,roi,"poly",p=0.0)

For a linear baseline,

bas = baseline(x,y,roi,"poly",p=1.0)

For a second order polynomial baseline,

bas = baseline(x,y,roi,"poly",p=2.0)

with the last coefficient will be the one in front of x^2. This can continue as you want by adding more 1.0 values to p.

For a cubic spline baseline fitting the basis of a peak centered at 1100 cm^{-1} and with basis at 900 and 1250 cm^{-1}:

roi = [890. 910.; 1250. 1300.]

bas = baseline(x,y,roi,"Dspline",p=0.01)

p there is the smoothing parameter used.

source

Frequency shifts correction

In case your spectra are shifted from a reference value, Spectra offers several functions that allows you to correct it from this shift.

To correct a spectrum from a shift of P wavenumbers, you can simply call:

# Spectra.xshift_direct — Method.

xshift_direct(original_x::Array{Float64}, original_y::Array{Float64}, p::Float64)

To correct a spectrum for a p shift in X.

Used in xshift_correction.

INPUTS:

original_x: Array{Float64}, containing x values;

original_y: Array{Float64}, containing y values associated with x;

p: Array{Float64}, containing the value of how much x should be shifted.

OUTPUTS:

original_x: Array{Float64}, same as input;

corrected_y: Array{Float64}, the y values corrected from the p shift in original_x; 

p: Array{Float64}, same as input.

source

Sometime, two signals from the same mineral show a shift in the X axis, while they share a common X axis. To correct from such thing, you can use the function:

# Spectra.xshift_correction — Method.

xshift_correction(full_x::Array{Float64}, full_shifted_y::Array{Float64}, ref_x::Array{Float64}, ref_y::Array{Float64},shifted_y::Array{Float64})

To correct a shift between two spectra using a reference peak.

INPUTS:

full_x: Array{Float64}, containing x values that are not good;

full_shifted_y: Array{Float64}, containing y values associated with full_x;

ref_x: Array{Float64}, containing x values that are good;

ref_y: Array{Float64}, containing y values associated with ref_x.

shifted_y: Array{Float64}, containing y values associated with a selected range of full_x that corresponds to ref_x (for instance, a specific peak that you want to use to correct the shift).

OUTPUTS:

full_x: Array{Float64}, same as input;

corrected_y: Array{Float64}, the full_shifted_y values corrected from the shift; 

p: Array{Float64}, same as input.

ref_x is the common X axis of two particular ref_y and shifted_y signals, that should be for instance an intense and well defined peak in your spectra. If ref_y and shifted_y do not share the same X axis, you can use first the Dierckx spline to re-sample one of them and have both sharing a common X axis. See the examples for further details.

source