rampy package

Subpackages

Submodules

rampy.baseline module

rampy.baseline.baseline(x_input, y_input, bir, method, **kwargs)

Subtracts a baseline from an x-y spectrum.

Parameters:
  • x_input (ndarray) – x values.

  • y_input (ndarray) – y values.

  • bir (ndarray) – Contains the regions of interest, organised per line. For instance, roi = np.array([[100., 200.],[500.,600.]]) defines regions of interest between 100 and 200 as well as between 500 and 600. Note: this is NOT used by the “als” and “arPLS” algorithms, but it is still required when calling the function. bir and method will probably become args in a future iteration of rampy to solve this.

  • method (str) –

    • “poly”: polynomial fitting, with polynomial_order the degree of the polynomial;

    • “unispline”: spline with the UnivariateSpline function of Scipy, s is the spline smoothing factor (equal weights are assumed in the present case);

    • “gcvspline”: spline with the gcvspl.f algorithm, really robust. The spectrum must provide x, y and ese, and s is the smoothing factor; if ese are not provided, ese = sqrt(y) is assumed. Requires the installation of gcvspline with a “pip install gcvspline” call prior to use;

    • “exp”: exponential background;

    • “log”: logarithmic background;

    • “rubberband”: rubberband baseline fitting;

    • “als”: (automatic) asymmetric least squares baseline fitting following Eilers and Boelens 2005;

    • “arPLS”: (automatic) baseline correction using asymmetrically reweighted penalized least squares smoothing. Baek et al. 2015, Analyst 140: 250-257;

    • “drPLS”: (automatic) baseline correction method based on doubly reweighted penalized least squares. Xu et al., Applied Optics 58(14):3913-3920.

  • polynomial_order (int, optional) – The degree of the polynomial (0 for a constant), default = 1.

  • s (float, optional) – spline smoothing coefficient for the unispline and gcvspline algorithms.

  • lam (float, optional) – The lambda smoothness parameter for the ALS, arPLS and drPLS algorithms. Typical values are between 10**2 and 10**9; default = 10**5 for ALS and arPLS, and 10**6 for drPLS.

  • p (float, optional) – For the ALS algorithm, advised value between 0.001 and 0.1, default = 0.01.

  • ratio (float, optional) – Ratio parameter of the arPLS and drPLS algorithms. Default = 0.01 for arPLS and 0.001 for drPLS.

  • niter (int, optional) – Number of iterations of the ALS and drPLS algorithms, default = 10 for ALS and 100 for drPLS.

  • eta (float, optional) – Roughness parameter for the drPLS algorithm, between 0 and 1, default = 0.5.

  • p0_exp (list, optional) – contains the starting parameters for the exp baseline fit with curve_fit. Default = [1.,1.,1.].

  • p0_log (list, optional) – contains the starting parameters for the log baseline fit with curve_fit. Default = [1.,1.,1.,1.].

Returns:

  • out1 (ndarray) – Contains the corrected signal.

  • out2 (ndarray) – Contains the baseline.
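
As a rough illustration of how bir drives the fit, the plain-Python sketch below fits a straight line (the “poly” idea with polynomial_order = 1) to the points lying inside the regions of interest and subtracts it from the signal. The helper names in_regions and linear_baseline are hypothetical, and this is not rampy's implementation; it only mirrors the two outputs described above (corrected signal, then baseline).

```python
def in_regions(x, bir):
    """Return True for each x value inside any [low, high] row of bir."""
    return [any(low <= xi <= high for low, high in bir) for xi in x]


def linear_baseline(x, y, bir):
    """Fit y = intercept + slope*x on the points inside bir, then subtract it."""
    pts = [(xi, yi) for xi, yi, keep in zip(x, y, in_regions(x, bir)) if keep]
    n = len(pts)
    sx = sum(p[0] for p in pts)
    sy = sum(p[1] for p in pts)
    sxx = sum(p[0] ** 2 for p in pts)
    sxy = sum(p[0] * p[1] for p in pts)
    # Closed-form ordinary least squares for a straight line.
    slope = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    intercept = (sy - slope * sx) / n
    base = [intercept + slope * xi for xi in x]
    corrected = [yi - bi for yi, bi in zip(y, base)]
    return corrected, base


# Synthetic spectrum: sloping background plus one sharp peak at x = 5.
x = [float(i) for i in range(11)]
y = [2.0 + 0.5 * xi + (10.0 if xi == 5.0 else 0.0) for xi in x]
bir = [[0.0, 2.0], [8.0, 10.0]]  # peak-free regions on both sides of the peak
corrected, base = linear_baseline(x, y, bir)
```

Because the regions in bir exclude the peak, the fitted line follows the background only, and the peak height survives the subtraction.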

rampy.baseline.get_portion_interest(x, y, bir)

Extracts the portions of the signal located in the regions given by bir.

Parameters:
  • x (ndarray) – the x axis

  • y (ndarray) – the y values

  • bir (n x 2 array) – the x values of the regions where the signal needs to be extracted. Must be an n x 2 array, where n is the number of regions to extract; column 0 contains the low bounds and column 1 the high ones.

Returns:

yafit – a two-column x-y array containing the signals in the bir.

Return type:

ndarray

rampy.filters module

rampy.filters.smooth(x, y, method='whittaker', **kwargs)

smooth the provided y signal (sampled on x)

Parameters:
  • x (ndarray) – Nx1 array of x values (equally spaced).

  • y (ndarray) – Nx1 array of y values (equally spaced).

  • method (str) – Method for smoothing the signal; choose between savgol (Savitzky-Golay), GCVSmoothedNSpline, MSESmoothedNSpline, DOFSmoothedNSpline, whittaker, flat, hanning, hamming, bartlett, blackman.

  • window_length (int, optional) – The length of the filter window (i.e. the number of coefficients). window_length must be a positive odd integer.

  • polyorder (int, optional) – The order of the polynomial used to fit the samples. polyorder must be less than window_length.

  • Lambda (float, optional) – smoothing parameter of the Whittaker filter described in Eilers (2003). The higher the smoother the fit.

  • d (int, optional) – d parameter in Whittaker filter, see Eilers (2003).

  • ese_y (ndarray, optional) – errors associated with y (for the gcvspline algorithms)

Returns:

y_smo – smoothed signal sampled on x.

Return type:

ndarray

Notes

Using GCVSmoothedNSpline, MSESmoothedNSpline or DOFSmoothedNSpline requires the gcvspline package to be installed. See the gcvspline documentation for details on these three methods.

savgol uses the scipy.signal.savgol_filter() function.

References

Eilers, P.H.C., 2003. A Perfect Smoother. Anal. Chem. 75, 3631–3636. https://doi.org/10.1021/ac034173t

Scipy Cookbook: https://scipy-cookbook.readthedocs.io/items/SignalSmooth.html?highlight=smooth

rampy.filters.spectrafilter(spectre, filtertype, fq, numtaps, columns)

Filter specific frequencies in spectra with a Butterworth filter

Parameters:
  • spectre (ndarray) – Array of X-Y values of spectra. First column is X and subsequent n columns are Y values of n spectra. (see also spectraarray function)

  • filtertype (string) – type of filter; Choose between ‘low’, ‘high’, ‘bandstop’, ‘bandpass’.

  • fq (ndarray) – Frequency of the periodic signal you try to erase. If using a bandpass or bandstop filter, fq must be an array containing the cutoff frequencies.

  • columns (ndarray) – An array defining which columns to treat.

Returns:

out – filtered signals.

Return type:

ndarray

rampy.filters.whittaker(y, **kwargs)

smooth a signal with the Whittaker smoother

Parameters:
  • y (ndarray) – An array with the values to smooth (equally spaced).

  • Lambda (float, optional) – The smoothing coefficient, the higher the smoother. Default = 10^5.

Returns:

z – An array containing the smoothed values.

Return type:

ndarray

References

Eilers, P.H.C., 2003. A Perfect Smoother. Anal. Chem. 75, 3631–3636.
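
The principle of the smoother can be sketched in plain Python for first-order differences (d = 1): the smoothed signal z solves (I + Lambda * D'D) z = y, where D is the first-difference matrix, and this system is tridiagonal. The function names whittaker_d1 and roughness are hypothetical; rampy's own routine, its difference order and its solver are not reproduced here.

```python
def whittaker_d1(y, lam):
    """Whittaker smoothing with first-order differences (equally spaced y).

    Solves (I + lam * D'D) z = y with the Thomas algorithm, where D'D is
    tridiagonal with diagonal [1, 2, ..., 2, 1] and off-diagonals -1.
    """
    n = len(y)
    if n < 2:
        return list(y)
    diag = [1.0 + lam * (1.0 if i in (0, n - 1) else 2.0) for i in range(n)]
    off = -lam
    c = [0.0] * n  # modified upper diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = off / diag[0]
    d[0] = y[0] / diag[0]
    for i in range(1, n):  # forward elimination
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom
        d[i] = (y[i] - off * d[i - 1]) / denom
    z = [0.0] * n
    z[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        z[i] = d[i] - c[i] * z[i + 1]
    return z


def roughness(v):
    """Sum of squared first differences, the quantity penalised by lam."""
    return sum((b - a) ** 2 for a, b in zip(v, v[1:]))


y = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
z = whittaker_d1(y, 10.0)
```

A larger lam gives a smoother z; a constant signal is returned unchanged, and the sum of the signal is preserved.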

rampy.functions module

rampy.functions.constant(x, a)

returns a constant value

Parameters:

x (1D array)

Returns:

y – array filled with the value a

Return type:

1D array

rampy.functions.difffull(x1, x2, t, C0, C1, D)

Equation for the diffusion into a full slab, see Crank 1975

Here we assume the profile to have 2 surfaces of contact on each side

Parameters:
  • C0 (float) – the concentration in the core

  • C1 (float) – the concentration at the border

  • D (float) – log10 of the diffusion coefficient, the coefficient being in m^2.s^-1

  • x1, x2 – the profile lengths from the beginning and the end respectively, in meters

  • t (float) – time in seconds

rampy.functions.diffshort(x, t, C0, C1, D)

1D equation for the diffusion into a semi-infinite slab, see Crank 1975

Parameters:
  • C0 (float) – the concentration in the core

  • C1 (float) – the concentration at the border

  • D (float) – log10 of the diffusion coefficient, the coefficient being in m^2.s^-1

  • x (1D array) – the profile length in meters

  • t (float) – time in seconds

Returns:

Cx – concentration at x

Return type:

1D array
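
A minimal sketch of such a profile, assuming the standard error-function solution for diffusion into a semi-infinite medium (Crank, 1975) and D passed as log10 of the diffusivity, as documented above. The name diffshort_sketch is hypothetical, and the exact expression used inside rampy may be arranged differently.

```python
import math


def diffshort_sketch(x, t, C0, C1, D):
    """Concentration at each distance x (m) from the border after t seconds.

    C0 is the core concentration, C1 the border concentration and D the
    log10 of the diffusion coefficient (coefficient in m^2.s^-1).
    """
    diffusivity = 10.0 ** D
    scale = 2.0 * math.sqrt(diffusivity * t)
    return [C0 + (C1 - C0) * math.erfc(xi / scale) for xi in x]


# One hour of diffusion with a diffusivity of 1e-12 m^2/s.
profile = diffshort_sketch([0.0, 5e-5, 1.0], 3600.0, 1.0, 5.0, -12.0)
```

The profile equals C1 at the border (x = 0) and tends to C0 far from it.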

rampy.functions.funexp(x, a, b, c)

exponential baseline function

a*exp(b*(x-c))

rampy.functions.funlog(x, a, b, c, d)

log baseline function

a * ln(-b *(x-c)) - d*x**2
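
The two baseline formulas above can be written out directly in plain Python. The names exp_baseline and log_baseline are hypothetical stand-ins for funexp and funlog; note that the logarithmic form is only defined where -b*(x-c) is positive.

```python
import math


def exp_baseline(x, a, b, c):
    """a*exp(b*(x-c)), the exponential baseline formula."""
    return a * math.exp(b * (x - c))


def log_baseline(x, a, b, c, d):
    """a*ln(-b*(x-c)) - d*x**2, the logarithmic baseline formula."""
    argument = -b * (x - c)
    if argument <= 0:
        raise ValueError("-b*(x-c) must be positive for the logarithm")
    return a * math.log(argument) - d * x ** 2


value_exp = exp_baseline(3.0, 2.0, 0.5, 3.0)       # x == c, so the result is a
value_log = log_baseline(1.0, 2.0, -1.0, 0.0, 0.5)  # ln(1) = 0, so the result is -d*x**2
```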

rampy.functions.gauss_lsq(params, x)

predicts a sum of gaussian peaks with parameters params

Parameters:
  • params (1D array) – an array of the parameters of the peaks. The number of peaks is assumed to be equal to len(params)/3. In this array, list all intensities first, then all peak positions, then all peak half-widths at half-maximum (HWHM).

  • x (1D array) – x axis

Returns:

y – y values at position x

Return type:

1D array
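
The parameter layout described above (all intensities, then all positions, then all HWHM) can be illustrated with a short plain-Python sketch. It reuses the Gaussian formula given for rampy.peak_shapes.gaussian; the names gaussian_point and sum_of_gaussians are hypothetical and this is not the library code.

```python
import math


def gaussian_point(x, amp, freq, hwhm):
    """Single Gaussian value: amp*exp(-ln(2)*((x-freq)/hwhm)**2)."""
    return amp * math.exp(-math.log(2) * ((x - freq) / hwhm) ** 2)


def sum_of_gaussians(params, x):
    """Sum of len(params)/3 Gaussian peaks evaluated at each x."""
    n = len(params) // 3
    amps = params[:n]
    freqs = params[n:2 * n]
    hwhms = params[2 * n:3 * n]
    return [
        sum(gaussian_point(xi, a, f, w) for a, f, w in zip(amps, freqs, hwhms))
        for xi in x
    ]


# Two peaks: amplitudes 1 and 2, positions 10 and 20, HWHM 1 and 2.
params = [1.0, 2.0, 10.0, 20.0, 1.0, 2.0]
y = sum_of_gaussians(params, [10.0, 20.0, 11.0])
```

With well-separated peaks, the value at each peak position is close to that peak's amplitude, and one HWHM away from the first peak the signal is close to half its amplitude.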

rampy.functions.gauss_lsq_lfix(params, x)

predicts a sum of gaussian peaks with parameters params

Assumes that all peaks share the same HWHM.

Parameters:
  • params (1D array) – an array of the parameters of the peaks. The number of peaks is assumed to be equal to len(params)/3. In this array, list all intensities first, then all peak positions; the last element is the half-width at half-maximum shared by all peaks.

  • x (1D array) – x axis

Returns:

y – y values at position x

Return type:

1D array

rampy.functions.linear(x, a, b)

returns a + b*x

rampy.functions.linear0(x, a)

returns a*x

rampy.functions.multigaussian(x, params)

old attempt to have a multigaussian function, do not use. Will be removed soon.

rampy.functions.poly2(x, a, b, c)

returns a + b*x + c*x*x

rampy.maps module

class rampy.maps.maps(file, spectrometer_type='horiba', map_type='2D')

Bases: object

treat maps of Raman spectra

Parameters:
  • file (str) – filename, including path

  • spectrometer_type (str) – type of spectrometer, choose between “horiba” or “renishaw”, default: ‘horiba’

  • map_type (str) – type of map, choose between “2D” or “1D”, default: ‘2D’

area(y, region_to_investigate)

get the area under the curve in the region to investigate.

The area is calculated by trapezoidal integration, using np.trapz(). Do not forget to smooth the signal if necessary prior to using this.

Parameters:
  • y (object intensities) – the intensities to consider. For instance, pass self.normalised for performing the calculation on normalised spectra.

  • region_to_investigate (1x2 array) – the x values of regions where the area will be measured

Returns:

self.A – area under the curve in the region to investigate

Return type:

ndarray
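
For a single spectrum, the calculation amounts to keeping the points inside the region and applying the trapezoidal rule. The sketch below uses plain Python instead of np.trapz(); the name region_area is hypothetical, and the region is passed as a simple (low, high) pair for brevity.

```python
def region_area(x, y, region):
    """Trapezoidal area under y(x) for the points with low <= x <= high."""
    low, high = region
    pts = [(xi, yi) for xi, yi in zip(x, y) if low <= xi <= high]
    return sum(
        0.5 * (y0 + y1) * (x1 - x0)
        for (x0, y0), (x1, y1) in zip(pts, pts[1:])
    )


x = [float(i) for i in range(11)]
y = list(x)  # straight line y = x
area = region_area(x, y, (2.0, 4.0))
```

Noise in y goes straight into the sum, which is why smoothing beforehand is advised.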

area_ratio(y, region_to_investigate)

get the area ratio between two regions of interest.

The areas are calculated by trapezoidal integration, using np.trapz(). Do not forget to smooth the signals if necessary prior to using this.

Parameters:
  • y (object intensities) – the intensities to consider. For instance, pass self.normalised for performing the calculation on normalised spectra.

  • region_to_investigate (2x2 array) – the x values of regions where the areas will be measured. The two lines record the two regions of interest.

Returns:

self.A_ratio – Area ratio = area region 1 / area region 2

Return type:

ndarray

background(bir, method='poly', **kwargs)

correct a background from the initial signal I on a map using rampy.baseline

Parameters:
  • bir (ndarray) – arrays of the background interpolation regions.

  • method (string) – see the rampy.baseline documentation for the available methods. Default is polynomial.

All kwargs arguments for rampy.baseline() will be forwarded and can be used there.

Returns:

The background and corrected spectra are available at self.background and self.I_corrected.

centroid(y, region_to_investigate)

calculate the centroid in a given region of interest

Parameters:
  • y (object intensities) – the intensities to consider. For instance, pass self.normalised for performing the calculation on normalised spectra.

  • region_to_investigate (1x2 array) – the x values of regions where the centroid will be measured

Returns:

self.centroid_position – centroid position for the map

Return type:

ndarray

intensity(y, region_to_investigate)

get the maximum intensity in the region to investigate.

The intensity maximum is estimated from a simple np.max() search. Do not forget to smooth the signal if necessary prior to using this.

Parameters:
  • y (object intensities) – the intensities to consider. For instance, pass self.normalised for performing the calculation on normalised spectra.

  • region_to_investigate (1x2 array) – the x values of regions where the intensity will be measured

Returns:

self.I_max – Intensity maximum

Return type:

ndarray

intensity_ratio(y, region_to_investigate)

get the intensity ratio between two regions of interest.

The intensity maxima are estimated from a simple np.max() search. Do not forget to smooth the signals if necessary prior to using this.

Parameters:
  • y (object intensities) – the intensities to consider. For instance, pass self.normalised for performing the calculation on normalised spectra.

  • region_to_investigate (2x2 array) – the x values of regions where the intensity ratios will be measured. The two lines record the two regions of interest.

Returns:

self.I_ratio – Intensity ratio

Return type:

ndarray
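
For one spectrum, the ratio reduces to a maximum search in each of the two regions. The plain-Python sketch below illustrates this; region_max and intensity_ratio_sketch are hypothetical names, and the real method works on the whole map rather than on a single list.

```python
def region_max(x, y, region):
    """Largest y value among the points with low <= x <= high."""
    low, high = region
    return max(yi for xi, yi in zip(x, y) if low <= xi <= high)


def intensity_ratio_sketch(x, y, regions):
    """Maximum of region 1 divided by maximum of region 2."""
    first, second = regions
    return region_max(x, y, first) / region_max(x, y, second)


x = [float(i) for i in range(10)]
y = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 2.0, 8.0, 2.0, 0.0]
ratio = intensity_ratio_sketch(x, y, [[1.0, 3.0], [6.0, 8.0]])
```

A single noisy point can dominate a maximum search, hence the advice to smooth the signals first.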

normalise(y, method='intensity')

normalise the spectra to their max intensity, their area or min-max normalisation

This uses the internals of rampy.normalise.

Parameters:
  • y (object intensities) – the intensities to normalise. For instance, if you want to normalise the background corrected I, pass self.I_corrected.

  • method (string) – method used, choose between area, intensity, minmax

Return type:

The normalised spectra are available at self.I_normalised
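
The three options can be sketched for a single spectrum as follows. The definitions used here (division by the maximum, division by the trapezoidal area, and rescaling to the 0 to 1 range) are the usual ones and are an assumption about what rampy.normalise does internally; normalise_sketch is a hypothetical name.

```python
def normalise_sketch(x, y, method="intensity"):
    """Normalise one spectrum by its maximum, its area, or to the 0-1 range."""
    if method == "intensity":
        peak = max(y)
        return [yi / peak for yi in y]
    if method == "minmax":
        low, high = min(y), max(y)
        return [(yi - low) / (high - low) for yi in y]
    if method == "area":
        # Trapezoidal area under the whole spectrum.
        area = sum(
            0.5 * (y0 + y1) * (x1 - x0)
            for x0, x1, y0, y1 in zip(x, x[1:], y, y[1:])
        )
        return [yi / area for yi in y]
    raise ValueError("method must be 'area', 'intensity' or 'minmax'")


x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 3.0]
by_intensity = normalise_sketch(x, y, "intensity")
by_minmax = normalise_sketch(x, y, "minmax")
by_area = normalise_sketch(x, y, "area")
```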

smooth(y, method='whittaker', **kwargs)

uses the smooth function of rampy to smooth the signals

Parameters:
  • y (object intensities) – the intensities to smooth. For instance, if you want to smooth the background corrected I, pass self.I_corrected.

  • method (str) – Method for smoothing the signal; choose between savgol (Savitzky-Golay), GCVSmoothedNSpline, MSESmoothedNSpline, DOFSmoothedNSpline, whittaker, flat, hanning, hamming, bartlett, blackman.

  • window_length (int, optional) – The length of the filter window (i.e. the number of coefficients). window_length must be a positive odd integer.

  • polyorder (int, optional) – The order of the polynomial used to fit the samples. polyorder must be less than window_length.

  • Lambda (float, optional) – smoothing parameter of the Whittaker filter described in Eilers (2003). The higher the smoother the fit.

  • d (int, optional) – d parameter in Whittaker filter, see Eilers (2003).

  • ese_y (ndarray, optional) – errors associated with y (for the gcvspline algorithms)

Returns:

self.y_smoothed – the smoothed signal for the map

Return type:

ndarray

rampy.maps.peak(X, Y, lambdas, intensities, function, Xrange, amp, Xmean, sigma, y0, A)

to fit peaks in a map. Work in progress.

rampy.maps.read_horiba(file, map_type='2D')

read Horiba csv maps (1D, 2D)

Parameters:
  • file (str) – filename, including path

  • map_type (str) – 1D map (line) or 2D map, default: 2D

Returns:

  • X (m by n array) – X positions

  • Y (m by n array) – Y positions

  • lambdas (n array) – Raman shift

  • intensities (m by n array) – Intensities

rampy.maps.read_renishaw(file)

read Renishaw csv maps

Parameters:

file (str) – filename, including path

Returns:

  • X (m by n array) – X positions

  • Y (m by n array) – Y positions

  • lambdas (m by n array) – Raman shift

  • intensities (m by n array) – Intensities

rampy.mixing module

rampy.mixing.mixing_sp(y_fit, ref1, ref2)

mix two reference spectra to match the given ones

Parameters:
  • y_fit (ndarray, shape m * n) – an array containing the signals with m datapoints and n experiments

  • ref1 (ndarray, shape m) – an array containing the first reference signal

  • ref2 (ndarray, shape m) – an array containing the second reference signal

Returns:

out – the fractions of ref1 in the mix

Return type:

ndarray, shape n

Notes

Performs the calculation by minimizing the sum of the absolute values of the residuals (least absolute deviations), i.e. the objective function:

obj = sum(abs(y_fit-(ref1*F1 + ref2*(1-F1))))

Uses cvxpy to perform this calculation
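
For a single experiment, the objective above is a convex function of F1, so its minimum on [0, 1] can be located with a simple one-dimensional search. The sketch below replaces cvxpy with a ternary search in plain Python purely for illustration; mixing_fraction is a hypothetical name, and the bound 0 <= F1 <= 1 is an assumption.

```python
def mixing_fraction(y_fit, ref1, ref2, tol=1e-9):
    """Fraction F1 of ref1 minimising sum(abs(y_fit - (ref1*F1 + ref2*(1-F1))))."""

    def objective(f1):
        return sum(
            abs(y - (r1 * f1 + r2 * (1.0 - f1)))
            for y, r1, r2 in zip(y_fit, ref1, ref2)
        )

    low, high = 0.0, 1.0
    while high - low > tol:
        # Ternary search: drop the third of the interval that cannot hold the minimum.
        m1 = low + (high - low) / 3.0
        m2 = high - (high - low) / 3.0
        if objective(m1) < objective(m2):
            high = m2
        else:
            low = m1
    return 0.5 * (low + high)


ref1 = [1.0, 2.0, 3.0, 4.0]
ref2 = [4.0, 3.0, 2.0, 1.0]
y_fit = [0.3 * a + 0.7 * b for a, b in zip(ref1, ref2)]  # known mix: 30 % of ref1
fraction = mixing_fraction(y_fit, ref1, ref2)
```

For an m x n array of experiments, the same search would be repeated on each column.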

rampy.ml_classification module

class rampy.ml_classification.mlclassificator(x, y, **kwargs)

Bases: object

use machine learning algorithms from scikit learn to perform classification of spectra.

x

Spectra; n_features = n_frequencies.

Type:

{array-like, sparse matrix}, shape = (n_samples, n_features)

y

numeric labels.

Type:

array, shape = (n_samples,)

X_test

spectra organised in rows (1 row = one spectrum) that you want to use as a testing dataset. Those spectra should not be present in the x (training) dataset. The spectra should share a common X axis.

Type:

{array-like, sparse matrix}, shape = (n_samples, n_features)

y_test

numeric labels that you want to use as a testing dataset. Those targets should not be present in the y (training) dataset.

Type:

array, shape = (n_samples,)

algorithm

“Nearest Neighbors”, “Linear SVM”, “RBF SVM”, “Gaussian Process”, “Decision Tree”, “Random Forest”, “Neural Net”, “AdaBoost”, “Naive Bayes”, “QDA”

Type:

String

scaling

True or False. If True, data will be scaled during fitting and prediction with the requested scaler (see below).

Type:

Bool

scaler

the type of scaling performed. Choose between MinMaxScaler or StandardScaler, see http://scikit-learn.org/stable/modules/preprocessing.html for details. Default = “MinMaxScaler”.

Type:

String

test_size

the fraction of the dataset to use as a testing dataset; only used if X_test and y_test are not provided.

Type:

float

rand_state

the random seed that is used for reproducibility of the results. Default = 42.

Type:

Float64

params_

contains the values of the hyperparameters that should be provided to the algorithm. See scikit-learn documentation for details for each algorithm.

Type:

Dictionary

prediction_train

the predicted target values for the training y dataset.

Type:

Array{Float64}

prediction_test

the predicted target values for the testing y_test dataset.

Type:

Array{Float64}

model

A Scikit Learn object model, see scikit learn library documentation.

Type:

Scikit learn model

X_scaler

A Scikit Learn scaler object for the x values.

Type:

scikit learn scaler

Y_scaler

A Scikit Learn scaler object for the y values.

Type:

scikit learn scaler

Example

Given an array X of n samples by m frequencies, and Y an array of n x 1 concentrations

>>> model = rampy.mlclassificator(X,y)
>>> model.algorithm = "SVC"
>>> model.user_kernel = 'poly'
>>> model.fit()
>>> y_new = model.predict(X_new)

Remarks

For details on the hyperparameters of each algorithm, please consult the documentation of scikit-learn directly at: http://scikit-learn.org/stable/

In progress

fit()

Scale data and train the model with the indicated algorithm.

Do not forget to tune the hyperparameters.

Parameters:
  • algorithm (String) – algorithm to use. Choose between “Nearest Neighbors”, “Linear SVM”, “RBF SVM”, “Gaussian Process”, “Decision Tree”, “Random Forest”, “Neural Net”, “AdaBoost”, “Naive Bayes”, “QDA”.

predict(X)

Predict using the model.

Parameters:

X ({array-like, sparse matrix}, shape = (n_samples, n_features)) – Samples.

Returns:

C – predicted values.

Return type:

array, shape = (n_samples,)

Remark

If self.scaling is True, scaling will be performed on the input X.

refit()

Re-train a model previously trained with fit()

rampy.ml_exploration module

class rampy.ml_exploration.mlexplorer(x, **kwargs)

Bases: object

use machine learning algorithms from scikit learn to explore spectroscopic datasets

Performs automatic scaling and train/test split before NMF or PCA fit.

x

Spectra; n_features = n_frequencies.

Type:

{array-like, sparse matrix}, shape = (n_samples, n_features)

X_test

spectra organised in rows (1 row = one spectrum) that you want to use as a testing dataset. Those spectra should not be present in the x (training) dataset. The spectra should share a common X axis.

Type:

{array-like, sparse matrix}, shape = (n_samples, n_features)

algorithm

“PCA”, “NMF”, default = “PCA”

Type:

String,

scaling

True or False. If True, data will be scaled prior to fitting (see below),

Type:

Bool

scaler

the type of scaling performed. Choose between MinMaxScaler or StandardScaler, see http://scikit-learn.org/stable/modules/preprocessing.html for details. Default = “MinMaxScaler”.

Type:

String

test_size

the fraction of the dataset to use as a testing dataset; only used if X_test and y_test are not provided.

Type:

float

rand_state

the random seed that is used for reproducibility of the results. Default = 42.

Type:

Float64

model

A Scikit Learn object model, see scikit learn library documentation.

Type:

Scikit learn model

Remarks

For details on the hyperparameters of each algorithm, please consult the documentation of scikit-learn directly at: http://scikit-learn.org/stable/

Results for machine learning algorithms can vary from run to run. A way to solve that is to fix the random_state.

Example

Given an array X of n samples by m frequencies, and Y an array of n x 1 concentrations

>>> explo = rampy.mlexplorer(X) # X is an array of signals built by mixing two partial components
>>> explo.algorithm = 'NMF' # using Non-Negative Matrix factorization
>>> explo.nb_compo = 2 # number of components to use
>>> explo.test_size = 0.3 # size of test set
>>> explo.scaler = "MinMax" # scaler
>>> explo.fit() # fitting!
>>> W = explo.model.transform(explo.X_train_sc) # getting the mixture array
>>> H = explo.X_scaler.inverse_transform(explo.model.components_) # components in the original space
>>> plt.plot(X,H.T) # plot the two components
fit()

Train the model with the indicated algorithm.

Do not forget to tune the hyperparameters.

predict(X)

Predict using the model.

Parameters:

X ({array-like, sparse matrix}, shape = (n_samples, n_features)) – Samples.

Returns:

C – predicted values.

Return type:

array, shape = (n_samples,)

Remark

If self.scaling is True, scaling will be performed on the input X.

refit()

Train the model with the indicated algorithm.

Do not forget to tune the hyperparameters.

rampy.ml_regressor module

rampy.ml_regressor.chemical_splitting(Pandas_DataFrame, target, split_fraction=0.3, rand_state=42)

split datasets depending on their chemistry

Parameters:
  • Pandas_DataFrame (Pandas DataFrame) – The input DataFrame, with the names of the different data compositions in the first row.

  • target (string) – The target in the DataFrame according to which the dataset will be split.

  • split_fraction (float, between 0 and 1) – The fraction of the data assigned to the second output dataset (see Returns).

  • rand_state (float64) – the random seed that is used for reproducibility of the results. Default = 42.

Returns:

  • frame1 (Pandas DataFrame) – A dataset with (1-split_fraction) of the data from the input dataset, with unique chemical compositions / names

  • frame2 (Pandas DataFrame) – A dataset with split_fraction of the data from the input dataset, with unique chemical compositions / names

  • frame1_idx (ndarray) – Contains the indexes of the data picked in Pandas_DataFrame to construct frame1

  • frame2_idx (ndarray) – Contains the indexes of the data picked in Pandas_DataFrame to construct frame2

Notes

This function prevents the same chemical dataset from ending up in several of the training/testing/validating datasets used in ML.

Indeed, there is no point in putting data from the same original dataset / with the same chemical composition in the training, testing and validating datasets at once: this creates an initial bias in the splitting process.

Another way of doing that would be to write:

>>> grouped = Pandas_DataFrame.groupby(by='label')
>>> k = [i for i in grouped.groups.keys()]
>>> k_train, k_valid = model_selection.train_test_split(np.array(k),test_size=0.40,random_state=100)
>>> train = Pandas_DataFrame.loc[Pandas_DataFrame['label'].isin(k_train)]
>>> valid = Pandas_DataFrame.loc[Pandas_DataFrame['label'].isin(k_valid)]

(results will vary slightly as variable k is sorted but not variable names in the function below)
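
The principle of a composition-aware split can also be shown without pandas or scikit-learn: shuffle the unique labels with a fixed seed, send a fraction of them to the second set, and assign every row according to its label. The sketch below is a plain-Python illustration with the hypothetical name split_by_label; it applies split_fraction to the number of unique labels, which is an assumption about how the fraction is counted.

```python
import random


def split_by_label(labels, split_fraction=0.3, rand_state=42):
    """Return two lists of row indexes that never share a label."""
    unique = sorted(set(labels))
    rng = random.Random(rand_state)  # fixed seed for reproducible splits
    rng.shuffle(unique)
    n_second = int(round(split_fraction * len(unique)))
    second = set(unique[:n_second])
    idx1 = [i for i, lab in enumerate(labels) if lab not in second]
    idx2 = [i for i, lab in enumerate(labels) if lab in second]
    return idx1, idx2


labels = [
    "basalt", "basalt", "rhyolite", "andesite", "andesite",
    "dacite", "dacite", "dacite", "phonolite", "trachyte",
]
idx1, idx2 = split_by_label(labels)
```

Every composition ends up in exactly one of the two index lists, so repeated measurements of the same sample cannot leak between them.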

class rampy.ml_regressor.mlregressor(x, y, **kwargs)

Bases: object

use machine learning algorithms from scikit learn to perform regression between spectra and an observed variable.

x

Spectra; n_features = n_frequencies.

Type:

{array-like, sparse matrix}, shape = (n_samples, n_features)

y

The observed target values used for training.

Type:

array, shape = (n_samples,)

X_test

spectra organised in rows (1 row = one spectrum) that you want to use as a testing dataset. Those spectra should not be present in the x (training) dataset. The spectra should share a common X axis.

Type:

{array-like, sparse matrix}, shape = (n_samples, n_features)

y_test

the target that you want to use as a testing dataset. Those targets should not be present in the y (training) dataset.

Type:

array, shape = (n_samples,)

algorithm

“KernelRidge”, “SVM”, “LinearRegression”, “Lasso”, “ElasticNet”, “NeuralNet”, “BaggingNeuralNet”, default = “SVM”

Type:

String,

scaling

True or False. If True, data will be scaled during fitting and prediction with the requested scaler (see below),

Type:

Bool

scaler

the type of scaling performed. Choose between MinMaxScaler or StandardScaler, see http://scikit-learn.org/stable/modules/preprocessing.html for details. Default = “MinMaxScaler”.

Type:

String

test_size

the fraction of the dataset to use as a testing dataset; only used if X_test and y_test are not provided.

Type:

float

rand_state

the random seed that is used for reproducibility of the results. Default = 42.

Type:

Float64

param_kr

contains the values of the hyperparameters that should be provided to KernelRidge and GridSearch for the Kernel Ridge regression algorithm.

Type:

Dictionary

param_svm

contains the values of the hyperparameters that should be provided to SVM and GridSearch for the Support Vector regression algorithm.

Type:

Dictionary

param_neurons

contains the parameters for the Neural Network (MLPRegressor model in sklearn). Default = dict(hidden_layer_sizes=(3,), solver='lbfgs', activation='relu', early_stopping=True)

Type:

Dictionary

param_bagging

contains the parameters for the BaggingRegressor sklearn function that uses a MLPRegressor base method. Default = dict(n_estimators=100, max_samples=1.0, max_features=1.0, bootstrap=True, bootstrap_features=False, oob_score=False, warm_start=False, n_jobs=1, random_state=rand_state, verbose=0)

Type:

Dictionary

prediction_train

the predicted target values for the training y dataset.

Type:

Array{Float64}

prediction_test

the predicted target values for the testing y_test dataset.

Type:

Array{Float64}

model

A Scikit Learn object model, see scikit learn library documentation.

Type:

Scikit learn model

X_scaler

A Scikit Learn scaler object for the x values.

Y_scaler

A Scikit Learn scaler object for the y values.

Example

Given an array X of n samples by m frequencies, and y an array of n x 1 concentrations

>>> model = rampy.mlregressor(X,y)
>>> model.algorithm = "SVM"
>>> model.user_kernel = 'poly'
>>> model.fit()
>>> y_new = model.predict(X_new)

Remarks

For details on the hyperparameters of each algorithm, please consult the documentation of scikit-learn directly at:

http://scikit-learn.org/stable/

For Support Vector and Kernel Ridge regressions, mlregressor performs a cross-validation search using 5 KFold cross validators.

If the results are poor with Support Vector and Kernel Ridge regressions, you will have to tune the param_grid_kr or param_grid_svm dictionary, which records the hyperparameter space to investigate during the cross validation.

Results for machine learning algorithms can vary from run to run. A way to solve that is to fix the random_state. For neural nets, results from multiple neural nets (bagging technique) may also generalise better, such that it may be better to use the BaggingNeuralNet function.

fit()

Scale data and train the model with the indicated algorithm.

Do not forget to tune the hyperparameters.

Parameters:

algorithm (String,) – algorithm to use. Choose between “KernelRidge”, “SVM”, “LinearRegression”, “Lasso”, “ElasticNet”, “NeuralNet”, “BaggingNeuralNet”, default = “SVM”

predict(X)

Predict using the model.

Parameters:

X ({array-like, sparse matrix}, shape = (n_samples, n_features)) – Samples.

Returns:

C – predicted values.

Return type:

array, shape = (n_samples,)

Remark

If self.scaling is True, scaling will be performed on the input X.

refit()

Re-train a model previously trained with fit()

rampy.peak_area module

rampy.peak_area.gaussianarea(amp, HWHM, **options)

returns the area of a Gaussian peak

Parameters:
  • amp (float or ndarray) – amplitude of the peak

  • HWHM (float or ndarray) – half-width at half-maximum

  • eseAmplitude (float or ndarray, optional) – standard deviation on amp; Default = None

  • eseHWHM (float or ndarray, optional) – standard deviation on HWHM; Default = None

Returns:

  • area (float or ndarray) – peak area

  • esearea (float or ndarray) – error on peak area; will be None if no errors on amp and HWHM were provided.
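
For the Gaussian formula used in rampy.peak_shapes, amp*exp(-ln(2)*((x-freq)/HWHM)**2), the integral over all x has the closed form amp*HWHM*sqrt(pi/ln(2)). The sketch below checks that closed form against a numerical trapezoidal integration; the function names are hypothetical, and the error propagation performed by gaussianarea is not reproduced.

```python
import math


def gaussian_area_analytic(amp, hwhm):
    """Closed-form area of amp*exp(-ln(2)*((x-freq)/hwhm)**2)."""
    return amp * hwhm * math.sqrt(math.pi / math.log(2))


def gaussian_area_numeric(amp, hwhm, step=0.01, half_span=20.0):
    """Trapezoidal area on a grid covering +/- half_span HWHM around the peak."""
    n = int(round(2 * half_span * hwhm / step))
    xs = [-half_span * hwhm + step * i for i in range(n + 1)]
    ys = [amp * math.exp(-math.log(2) * (x / hwhm) ** 2) for x in xs]
    return sum(0.5 * (y0 + y1) * step for y0, y1 in zip(ys, ys[1:]))


analytic = gaussian_area_analytic(2.0, 3.0)
numeric = gaussian_area_numeric(2.0, 3.0)
```

The area does not depend on the peak position, which is why only amp and HWHM are needed.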

rampy.peak_area.peakarea(shape, **options)

returns the area of a peak

(!experimental!)

gaussian peak area is calculated analytically; areas for other peak shapes are calculated using trapezoidal integration.

Parameters:
  • shape (string) – gaussian, lorentzian, pseudovoigt or pearson7

  • amp (float or ndarray) – amplitude of the peak

  • pos (float or ndarray) – peak position

  • HWHM (float or ndarray) – half-width at half-maximum

  • a3 (float or ndarray) – a3 parameters for pearson7

  • eseAmplitude (float or ndarray) – standard deviation on amp; Default = None

  • eseHWHM (float or ndarray) – standard deviation on HWHM; Default = None

Returns:

  • area (float or ndarray) – peak area

  • esearea (float or ndarray) – error on peak area; will be None if no errors on amp and HWHM were provided.

rampy.peak_shapes module

rampy.peak_shapes.create_gauss()

rampy.peak_shapes.create_lorenz()

rampy.peak_shapes.gaussian(x, amp, freq, HWHM)

compute a Gaussian peak

Parameters:
  • x (ndarray) – the positions at which the signal should be sampled

  • amp (float or ndarray with size equal to x.shape) – amplitude

  • freq (float or ndarray with size equal to x.shape) – frequency/position of the Gaussian component

  • HWHM (float or ndarray with size equal to x.shape) – half-width at half-maximum

Returns:

out – the signal

Return type:

ndarray

Remarks

Formula is amp*np.exp(-np.log(2)*((x-freq)/HWHM)**2)

rampy.peak_shapes.lorentzian(x, amp, freq, HWHM)

compute a Lorentzian peak

Parameters:
  • x (ndarray) – the positions at which the signal should be sampled

  • amp (float or ndarray with size equal to x.shape) – amplitude

  • freq (float or ndarray with size equal to x.shape) – frequency/position of the Lorentzian component

  • HWHM (float or ndarray with size equal to x.shape) – half-width at half-maximum

Returns:

out – the signal

Return type:

ndarray

Remarks

Formula is amp/(1+((x-freq)/HWHM)**2)

rampy.peak_shapes.pearson7(x, a0, a1, a2, a3)

compute a Pearson7 peak

Parameters:
  • x (ndarray) – the positions at which the signal should be sampled

  • a0 (float or ndarrays of size equal to x.shape) – parameters of the Pearson7 equation

  • a1 (float or ndarrays of size equal to x.shape) – parameters of the Pearson7 equation

  • a2 (float or ndarrays of size equal to x.shape) – parameters of the Pearson7 equation

  • a3 (float or ndarrays of size equal to x.shape) – parameters of the Pearson7 equation

Returns:

out – the signal

Return type:

ndarray

Remarks

Formula is a0 / ( (1.0 + ((x-a1)/a2)**2.0 * (2.0**(1.0/a3) -1.0))**a3 )

rampy.peak_shapes.pseudovoigt(x, amp, freq, HWHM, L_ratio)

compute a pseudo-Voigt peak

Parameters:
  • x (ndarray) – the positions at which the signal should be sampled. Can be provided as vector, nx1 or nxm array.

  • amp (float or ndarray with size equal to x.shape) – amplitude

  • freq (float or ndarray with size equal to x.shape) – frequency/position of the Gaussian component

  • HWHM (float or ndarray with size equal to x.shape) – half-width at half-maximum

  • L_ratio (float or ndarray with size equal to x.shape) – ratio of the Lorentzian component, should be between 0 and 1 (included)

Returns:

out – the signal

Return type:

ndarray of size equal to x.shape

Remarks

Formula is (1-L_ratio)*gaussian(x,amp,freq,HWHM) + L_ratio*lorentzian(x,amp,freq,HWHM)
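
The four formulas quoted in this module share one property worth checking: at one half-width from the peak position, the signal is half the amplitude. The plain-Python sketch below evaluates the formulas for scalar inputs only; the *_point names are hypothetical, and the rampy functions themselves operate on arrays.

```python
import math


def gaussian_point(x, amp, freq, hwhm):
    return amp * math.exp(-math.log(2) * ((x - freq) / hwhm) ** 2)


def lorentzian_point(x, amp, freq, hwhm):
    return amp / (1 + ((x - freq) / hwhm) ** 2)


def pearson7_point(x, a0, a1, a2, a3):
    return a0 / ((1.0 + ((x - a1) / a2) ** 2.0 * (2.0 ** (1.0 / a3) - 1.0)) ** a3)


def pseudovoigt_point(x, amp, freq, hwhm, l_ratio):
    # Linear mix of a Gaussian and a Lorentzian sharing amp, freq and hwhm.
    return ((1 - l_ratio) * gaussian_point(x, amp, freq, hwhm)
            + l_ratio * lorentzian_point(x, amp, freq, hwhm))


amp, freq, hwhm = 4.0, 1000.0, 12.0
half_values = [
    gaussian_point(freq + hwhm, amp, freq, hwhm),
    lorentzian_point(freq + hwhm, amp, freq, hwhm),
    pearson7_point(freq + hwhm, amp, freq, hwhm, 2.0),
    pseudovoigt_point(freq + hwhm, amp, freq, hwhm, 0.3),
]
```

This is what makes HWHM directly comparable between the peak shapes; for pearson7, a1 plays the role of the position and a2 that of the half-width.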

rampy.rameau module

rampy.rameau.DG2017_calibrate(dictio)

Fit a calibration by optimizing the K coefficient in the DG2017 method

Parameters:

dictio (dictionary) – dictionary with arrays named “feo”, “rws” and “water”.

Returns:

popt – The optimized a and b parameters of the equation K = a * [FeO wt%] + b.

Return type:

ndarray
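Since wt% H2O = Rws * (a * [FeO wt%] + b) is linear in a and b, the calibration can be sketched as an ordinary least-squares problem. The code below is a stand-in written for illustration: the optimiser rampy uses is not stated on this page, and the name dg2017_calibrate_sketch and the synthetic standards are hypothetical.

```python
import numpy as np

def dg2017_calibrate_sketch(dictio):
    """Least-squares estimate of a and b in water = rws * (a * feo + b)."""
    feo = np.asarray(dictio["feo"], dtype=float)
    rws = np.asarray(dictio["rws"], dtype=float)
    water = np.asarray(dictio["water"], dtype=float)
    # water = a * (rws * feo) + b * rws, so the unknowns enter linearly
    design = np.column_stack([rws * feo, rws])
    popt, *_ = np.linalg.lstsq(design, water, rcond=None)
    return popt

# Synthetic, noise-free standards built from the default coefficients
feo = np.array([2.0, 5.0, 8.0, 11.0])
rws = np.array([0.5, 1.0, 1.5, 2.0])
standards = {"feo": feo, "rws": rws, "water": rws * (0.096 * feo + 0.663)}
a_fit, b_fit = dg2017_calibrate_sketch(standards)
```

On noise-free data the fit returns the coefficients used to build the standards; with real standards the residuals give a first idea of the calibration quality.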

rampy.rameau.DG2017_predict(dictio, a=0.096, b=0.663)

Calculate the K coefficient for the DG2017 method.

Parameters:
  • dictio (dict) – a dictionary with ndarrays named “feo” and “rws”

  • a, b (float) – factors in the equation: K = a * [FeO wt%] + b; default values from Di Genova et al. (2017)

Returns:

H2O (wt %) – The water content of the glasses calculated as Rws * (a * [FeO wt%] + b)

Return type:

ndarray
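The prediction step is a direct application of the equation quoted above. The sketch below reproduces it with NumPy; the function name dg2017_predict_sketch and the input values are hypothetical.

```python
import numpy as np

def dg2017_predict_sketch(dictio, a=0.096, b=0.663):
    """Water content as Rws * (a * [FeO wt%] + b)."""
    feo = np.asarray(dictio["feo"], dtype=float)
    rws = np.asarray(dictio["rws"], dtype=float)
    K = a * feo + b          # iron-dependent calibration coefficient
    return rws * K           # water content in wt%

glasses = {"feo": np.array([5.0, 10.0]), "rws": np.array([1.0, 0.5])}
h2o = dg2017_predict_sketch(glasses)
# K = 1.143 and 1.623, so h2o = 1.143 and 0.8115 wt%
```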

rampy.rameau.LL2012_calibrate(dictio)

Fit a calibration line following equations (2) and (3) from Le Losq et al. (2012)

Parameters:

dictio – dictionary with arrays named “feo”, “rws” and “water”.

Returns:

A – The parameter in the equation (3) of Le Losq et al. (2012).

Return type:

float

rampy.rameau.LL2012_predict(dictio, A=0.007609)

Predict the water content using the equation (3) from Le Losq et al. (2012)

Parameters:

dictio (dict) – a dictionary with ndarray named “rws”

Returns:

H2O – The glass water contents in wt%

Return type:

ndarray

rampy.rameau.fit_spectra(data_liste, method='LL2012', delim='\t', path_in='./raw/', laser=514.532, spline_coeff=0.001, poly_coeff=3)

Calculate the ratios of water and silicate signals from Raman spectra

Parameters:
  • data_liste (Pandas DataFrame) – Contains the list of spectra, see provided file as an example

  • method (string) – The method used. LL2012: Le Losq et al. (2012); DG2017: Di Genova et al. (2017). See references.

  • delim (string) – File delimiter. Use ‘\t’ for tab-separated text or ‘,’ for comma-separated text.

  • path_in (string) – Path for the spectra

  • laser (float) – Laser line wavelength in nm

  • spline_coeff (float) – Smoothing coefficient for the spline baseline. An array of size len(data_liste) can be provided. Default = 0.001.

  • poly_coeff (int) – Degree of the polynomial used by the polynomial baseline function. Default = 3 (DG2017 method). Set to 2 for the Behrens et al. (2006) method.

Returns:

  • x (ndarray) – Common x axis.

  • y_all (ndarray) – All raw spectra from data_liste in an array of length len(x) and with as many columns as spectra.

  • y_all_corr (ndarray) – All corrected spectra from data_liste in an array of length len(x) and with as many columns as spectra.

  • y_all_base (ndarray) – All baselines for spectra from data_liste in an array of length len(x) and with as many columns as spectra.

  • rws (ndarray) – The ratio of the water integrated intensity over that of silicate signals.

  • rw (ndarray) – The integrated intensity of water signal.

  • rs (ndarray) – The integrated intensity of silicate signals.

Raises:

IOError – If method is not set to LL2012 or DG2017.

References

  1. C. Le Losq, D. R. Neuville, R. Moretti, J. Roux, Determination of water content in silicate glasses using Raman spectrometry: Implications for the study of explosive volcanism. American Mineralogist. 97, 779–790 (2012).

  2. D. Di Genova et al., Effect of iron and nanolites on Raman spectra of volcanic glasses: A reassessment of existing strategies to estimate the water content. Chemical Geology. 475, 76–86 (2017).
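To make the returned rw, rs and rws quantities concrete, the sketch below integrates a baseline-corrected spectrum over two windows and takes the ratio. The window bounds, the synthetic spectrum and the helper name integrated_intensity are all hypothetical; this page does not state the integration limits rampy uses.

```python
import numpy as np

def integrated_intensity(x, y, lo, hi):
    """Trapezoidal area under y between x = lo and x = hi (x increasing)."""
    mask = (x >= lo) & (x <= hi)
    xs, ys = x[mask], y[mask]
    return float(np.sum(0.5 * (ys[1:] + ys[:-1]) * np.diff(xs)))

# Synthetic corrected spectrum: intensity 1 below 2000 cm-1, 2 above
x = np.linspace(0.0, 4000.0, 4001)
y_corr = np.where(x > 2000.0, 2.0, 1.0)

rs = integrated_intensity(x, y_corr, 200.0, 1200.0)    # hypothetical silicate window
rw = integrated_intensity(x, y_corr, 3000.0, 3600.0)   # hypothetical water window
rws = rw / rs
```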

class rampy.rameau.rameau(data_liste)

Bases: object

treat Raman spectra of glass to retrieve the glass water content

Parameters:

data_liste (Pandas dataframe) – A Pandas dataframe containing the data and various meta information.

x

a 1D array (Nx1) containing the common x axis (wavelength) of the spectra.

Type:

ndarray

y

a NxM array (with M the number of spectra) containing the raw intensities of the spectra.

Type:

ndarray

y_corr

a NxM array (with M the number of spectra) containing the corrected intensities of the spectra.

Type:

ndarray

y_base

a NxM array (with M the number of spectra) containing the backgrounds of the spectra.

Type:

ndarray

rws

a 1D array (Nx1) containing the ratio between the integrated intensities of the water and silicate signals.

Type:

ndarray

rw

a 1D array (Nx1) containing the integrated intensities of the water signal.

Type:

ndarray

rs

a 1D array (Nx1) containing the integrated intensities of the silicate signals.

water

the known glass water content provided in data_liste (set to 0 if predicting for unknowns)

water_predicted

the glass water content predicted from the spectra (see the predict method)

p

calibration coefficient(s) of the LL2012 or DG2017 method

Type:

ndarray

names

filenames indicated in the data_liste input

Type:

pandas dataframe

Notes

Uses either the LL2012 method (Le Losq et al., 2012) or the DG2017 (Di Genova et al., 2017) method. See references.

In the LL2012 method, a cubic spline is fitted to the regions of interest provided in self.data_liste (see example). The spline is smoothed by the spline_coeff of the data_reduction method. The water content is calculated following eq. (3) of LL2012, with the A coefficient either provided or calculated by the method self.calibrate().

In the DG2017 method, a third-order polynomial is fitted to the spectra following the instructions of Di Genova et al. (2017). The water content is calculated as wt% H2O = Rws * (a * [FeO wt%] + b) with a and b the coefficients either provided or calculated by the method self.calibrate().

References

LL2012: C. Le Losq, D. R. Neuville, R. Moretti, J. Roux, Determination of water content in silicate glasses using Raman spectrometry: Implications for the study of explosive volcanism. American Mineralogist. 97, 779–790 (2012).

DG2017: D. Di Genova et al., Effect of iron and nanolites on Raman spectra of volcanic glasses: A reassessment of existing strategies to estimate the water content. Chemical Geology. 475, 76–86 (2017).

calibrate(method='LL2012')

Fit a calibration by optimizing the K coefficient(s)

Parameters:
  • self (object) – rameau object with treated spectra (see data_reduction method)

  • method (string) – the method used; choose between “LL2012” or “DG2017”, default = “LL2012”.

Returns:

popt – The optimized parameter(s). If method = “DG2017”, popt = np.array([a,b]), the parameters of the equation K = a * [FeO wt%] + b. If method = “LL2012”, popt = A (float), the A parameter in equation (3) of Le Losq et al. (2012).

Return type:

ndarray or float

data_reduction(method='LL2012', delim='\t', path_in='./raw/', laser=514.532, spline_coeff=0.001, poly_coeff=3)

process Raman spectra of glass to calculate the Rws ratio

Parameters:
  • self (object) – a rameau object that has been initiated.

  • method (string) – The method used. LL2012: Le Losq et al. (2012); DG2017: Di Genova et al. (2017). See references. Default = “LL2012”.

  • delim (string) – File delimiter. Use ‘\t’ for tab-separated text or ‘,’ for comma-separated text. Default = ‘\t’.

  • path_in (string) – Path for the spectra. Default = ‘./raw/’

  • laser (float) – Laser line wavelength in nm. Default = 514.532.

  • spline_coeff (float) – Smoothing coefficient for the spline baseline. An array of size len(data_liste) can be provided. Default = 0.001.

  • poly_coeff (int) – Degree of the polynomial used by the polynomial baseline function. Default = 3 (DG2017 method; set to 2 for the Behrens et al. (2006) method).

Returns:

  • self.x (ndarray) – Common x axis.

  • self.y_all (ndarray) – All raw spectra from data_liste in an array of length len(x) and with as many columns as spectra.

  • self.y_all_corr (ndarray) – All corrected spectra from data_liste in an array of length len(x) and with as many columns as spectra.

  • self.y_all_base (ndarray) – All baselines for spectra from data_liste in an array of length len(x) and with as many columns as spectra.

  • self.rws (ndarray) – The ratio of the water integrated intensity over that of silicate signals.

  • self.rw (ndarray) – The integrated intensity of water signal.

  • self.rs (ndarray) – The integrated intensity of silicate signals.

names = []
p = None
predict(method='LL2012')

predict the water content from the Rws

Parameters:
  • self (object) – rameau object with treated spectra (see data_reduction method).

  • method (string) – the method used; choose between “LL2012” or “DG2017”, default = “LL2012”.

Returns:

H2O – The glass water contents in wt%

Return type:

array

rs = []
rw = []
rws = []
water = []
water_predicted = []
x = []
y = []
y_base = []
y_corr = []

rampy.spectranization module

rampy.spectranization.centroid(x, y, smoothing=False, **kwargs)

Calculate the centroid(s) of the y signal(s) as np.sum(y/np.sum(y)*x).

Parameters:
  • x (Numpy array, m values by n samples) – x values

  • y (Numpy array, m values by n samples) – y values

  • smoothing (bool, optional) – If True, smooth the signals with the arguments provided as kwargs. Default = False. The default method is Whittaker smoothing. See the rampy.smooth function for smoothing options and arguments.

Returns:

centroid – signal centroid(s)

Return type:

Numpy array, n samples
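The centroid expression can be written for several spectra at once by summing along the first axis. The sketch below assumes the "m values by n samples" layout described above, with one spectrum per column, and leaves out the optional smoothing; the name centroid_sketch is hypothetical.

```python
import numpy as np

def centroid_sketch(x, y):
    """Column-wise np.sum(y/np.sum(y)*x) for arrays of m values by n samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum(y / np.sum(y, axis=0) * x, axis=0)

# Two symmetric peaks centred at 40 and 60 on a shared axis
x1 = np.linspace(0.0, 100.0, 101)
x = np.column_stack([x1, x1])
y = np.column_stack([np.exp(-((x1 - 40.0) / 5.0) ** 2),
                     np.exp(-((x1 - 60.0) / 5.0) ** 2)])
c = centroid_sketch(x, y)   # one centroid per column
```

For symmetric peaks the centroid coincides with the peak position; for asymmetric bands it shifts towards the heavier side.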

rampy.spectranization.despiking(x, y, neigh=4, threshold=3)

remove spikes from the y 1D signal given a threshold

This function smooths the spectrum, calculates the root-mean-square error (RMSE) of the residuals, and replaces points that deviate by more than threshold*RMSE using the neighboring points

Parameters:
  • x (1D array) – x values of the signal to despike

  • y (1D array) – signal to despike

  • neigh (int) – number of points around each spike used to calculate the average value that replaces it

  • threshold (int) – multiplier of sigma, default = 3

Returns:

y – the signal without spikes

Return type:

1D array
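The procedure described above (smooth, compute the RMSE of the residuals, replace outliers by an average of their neighbours) can be sketched as follows. The smoother rampy applies is not specified on this page, so a running median is swapped in here; the function name despike_sketch and its window argument are hypothetical.

```python
import numpy as np

def despike_sketch(y, neigh=4, threshold=3, window=5):
    """Replace points deviating by more than threshold*RMSE from a smoothed copy."""
    y = np.asarray(y, dtype=float)
    # Stand-in smoother (assumption): running median over `window` points (odd)
    pad = window // 2
    padded = np.pad(y, pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    smooth = np.median(windows, axis=1)
    # Residuals and their root-mean-square error
    residual = y - smooth
    rmse = np.sqrt(np.mean(residual ** 2))
    spikes = np.flatnonzero(np.abs(residual) > threshold * rmse)
    out = y.copy()
    for i in spikes:
        # Average the neighbours within `neigh` points that are not spikes themselves
        idx = np.arange(max(i - neigh, 0), min(i + neigh + 1, y.size))
        idx = idx[~np.isin(idx, spikes)]
        if idx.size:
            out[i] = y[idx].mean()
    return out

x = np.linspace(0.0, 2.0 * np.pi, 200)
clean = np.sin(x)
spiky = clean.copy()
spiky[50] += 100.0            # one artificial spike
fixed = despike_sketch(spiky)
```

Only the flagged point is modified; every other sample is returned unchanged.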

rampy.spectranization.flipsp(sp)

Flip an array along the row dimension (dim = 1) if the row values are in decreasing order.

Parameters:

sp (ndarray) – An array with n columns, the first one should contain the X axis (frequency, wavenumber, etc.)

Returns:

sp – The same array but sorted such that the values in the first column are in increasing order.

Return type:

ndarray
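A minimal sketch of the behaviour described above, assuming the X axis is monotonic so that comparing its first and last values is enough to detect a decreasing order. The name flipsp_sketch is hypothetical.

```python
import numpy as np

def flipsp_sketch(sp):
    """Reverse the row order when the first column is decreasing."""
    sp = np.asarray(sp)
    if sp[-1, 0] < sp[0, 0]:
        return sp[::-1].copy()
    return sp

# X axis recorded from high to low values
sp = np.array([[3.0, 30.0], [2.0, 20.0], [1.0, 10.0]])
flipped = flipsp_sketch(sp)
```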

rampy.spectranization.normalise(y, x=0, method='intensity')

normalise y signal(s)

Parameters:
  • x (ndarray, m values by n samples) – x values

  • y (ndarray, m values by n samples) – corresponding y values

  • method (string) – method used, choose between area, intensity, minmax

Returns:

y_norm – Normalised signal(s)

Return type:

Numpy array
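The three method names correspond to common normalisation conventions. The sketch below implements them as they are usually defined (division by the maximum, rescaling to the 0–1 range, division by the trapezoidal area); that rampy uses exactly these definitions is an assumption, and the name normalise_sketch is hypothetical.

```python
import numpy as np

def normalise_sketch(y, x=None, method="intensity"):
    """Normalise y (m values, optionally by n samples) along the first axis."""
    y = np.asarray(y, dtype=float)
    if method == "intensity":
        return y / np.max(y, axis=0)
    if method == "minmax":
        y_min = np.min(y, axis=0)
        return (y - y_min) / (np.max(y, axis=0) - y_min)
    if method == "area":
        if x is None:
            raise ValueError("x values are needed for the 'area' method")
        x = np.asarray(x, dtype=float)
        # Trapezoidal area under each signal
        area = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x, axis=0), axis=0)
        return y / area
    raise ValueError("method should be 'area', 'intensity' or 'minmax'")

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 3.0])
y_int = normalise_sketch(y, method="intensity")
y_mm = normalise_sketch(y, method="minmax")
y_area = normalise_sketch(y, x=x, method="area")
```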

rampy.spectranization.resample(x, y, x_new, **kwargs)

Resample a y signal associated with x, along the x_new values.

Parameters:
  • x (ndarray) – The x values

  • y (ndarray) – The y values

  • x_new (ndarray) – The new X values

  • kind (str or int, optional) – Specifies the kind of interpolation as a string (‘linear’, ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘previous’, ‘next’, where ‘zero’, ‘slinear’, ‘quadratic’ and ‘cubic’ refer to a spline interpolation of zeroth, first, second or third order; ‘previous’ and ‘next’ simply return the previous or next value of the point) or as an integer specifying the order of the spline interpolator to use. Default is ‘linear’.

  • axis (int, optional) – Specifies the axis of y along which to interpolate. Interpolation defaults to the last axis of y.

  • copy (bool, optional) – If True, the class makes internal copies of x and y. If False, references to x and y are used. The default is to copy.

  • bounds_error (bool, optional) – If True, a ValueError is raised any time interpolation is attempted on a value outside of the range of x (where extrapolation is necessary). If False, out of bounds values are assigned fill_value. By default, an error is raised unless fill_value=”extrapolate”.

  • fill_value (array-like or (array-like, array_like) or “extrapolate”, optional) –

    if a ndarray (or float), this value will be used to fill in for requested points outside of the data range. If not provided, then the default is NaN. The array-like must broadcast properly to the dimensions of the non-interpolation axes. If a two-element tuple, then the first element is used as a fill value for x_new < x[0] and the second element is used for x_new > x[-1]. Anything that is not a 2-element tuple (e.g., list or ndarray, regardless of shape) is taken to be a single array-like argument meant to be used for both bounds as below, above = fill_value, fill_value.

    New in scipy version 0.17.0.

    If “extrapolate”, then points outside the data range will be extrapolated.

    New in scipy version 0.17.0.

  • assume_sorted (bool, optional) – If False, values of x can be in any order and they are sorted first. If True, x has to be an array of monotonically increasing values.

Returns:

  • y_new (ndarray) – y values interpolated at x_new.

Remarks

Uses scipy.interpolate.interp1d. Optional arguments are passed to scipy.interpolate.interp1d, see https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html
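Because the optional arguments are forwarded to scipy.interpolate.interp1d, their effect can be illustrated by calling interp1d directly. The sketch below shows two documented ways of handling a point outside the range of x; the data are made up for the example.

```python
import numpy as np
from scipy.interpolate import interp1d

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x
x_new = np.array([0.5, 1.5, 3.5])   # the last point lies outside the range of x

# bounds_error=False: out-of-range points receive fill_value (NaN by default)
y_nan = interp1d(x, y, kind="linear", bounds_error=False)(x_new)

# fill_value="extrapolate": out-of-range points are extrapolated
y_ext = interp1d(x, y, kind="linear", fill_value="extrapolate")(x_new)
```

With neither option set, interp1d raises a ValueError for the out-of-range point, which is often the safest default when resampling spectra onto a common axis.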

rampy.spectranization.shiftsp(sp, shift)

Shift the X axis (frequency, wavenumber, etc.) by a given value.

Parameters:
  • sp (ndarray) – An array with n columns, the first one should contain the X axis (frequency, wavenumber, etc.)

  • shift (float) – The shift value to apply.

Returns:

sp – The same array with the X axis (first column) shifted by the given value.

Return type:

ndarray

rampy.spectranization.spectraoffset(spectre, oft)

Vertically offset spectra by the values given in oft

Parameters:
  • spectre (ndarray) – array of spectra constructed with the spectrarray function

  • oft (ndarray) – array constructed with numpy and containing the coefficient for the offset to apply to spectra

Returns:

out – Array with spectra separated by offsets defined in oft

Return type:

ndarray

rampy.spectranization.spectrarray(name, sh, sf, x)

Construct a general array that contains the common X values in the first column and all Y values in the subsequent columns.

Parameters:
  • name (ndarray) – Array containing the names of the files (should work with a dataframe too).

  • sh (int) – Number of header lines in files to skip.

  • sf (int) – Number of footer lines in files to skip.

  • x (ndarray) – The common x axis.

Returns:

out – An array with the common X axis in the first column and all the spectra in the subsequent columns.

Return type:

ndarray

rampy.spectranization.spectrataux(spectres)

Calculate the increase/decrease rate of each frequency in a set of spectra.

Parameters:

spectres (ndarray) – An array of spectra containing the common X axis in first column and all the spectra in the subsequent columns. (see spectrarray function)

Returns:

taux – The rate of change of each frequency, fitted by a 2nd order polynomial function.

Return type:

ndarray

rampy.tlcorrection module

rampy.tlcorrection.tlcorrection(x, y, temp, wave, **kwargs)

Correct spectra for temperature and excitation line effects.

Parameters:
  • x (ndarray) – Raman shifts in cm-1

  • y (ndarray) – Intensity values as counts

  • temp (float) – Temperature in °C

  • wave (float) – wavelength of the laser that excited the sample, in nm

  • correction (string, optional) – Equation used for the correction. Choose between ‘long’, ‘galeener’, or ‘hehlen’. Default = ‘long’.

  • normalisation (string, optional) – Data normalisation procedure. Choose between ‘intensity’, ‘area’, or ‘no’. Default = ‘area’.

  • density (float, optional) – The density of the studied material in kg m-3, to be used with the ‘hehlen’ equation. Default = 2210.0 (density of silica).

Returns:

  • x (1darray) – Raman shifts values.

  • long (1darray) – corrected intensities.

  • eselong (1darray) – errors calculated as sqrt(y) on raw intensities and propagated after the correction.

Remarks

This correction uses the formulas reported in Galeener and Sen (1978), Mysen et al. (1982), Brooker et al. (1988) and Hehlen et al. (2010).

The ‘galeener’ equation is the exact one reported in Galeener and Sen (1978), which is a modification of that of Shuker and Gammon (1970) to account for the (vo - v)^4 dependence of the Raman intensity. See also Brooker et al. (1988) for further discussion.

The ‘long’ equation is that of Galeener and Sen (1978) corrected by a vo^3 coefficient, which removes the cubic meter dimension of the ‘galeener’ equation. This equation has been used in Mysen et al. (1982), Neuville and Mysen (1996) and Le Losq et al. (2012).

The ‘hehlen’ equation is that reported in Hehlen et al. (2010). It actually predates this publication (Brooker et al. 1988). It uses a different correction that avoids crushing the signal below 500 cm-1. Therefore, it has the advantage of keeping the Boson peak signal in glasses intact.
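This page does not print the equations themselves. As an illustration only, the sketch below applies the form in which the ‘long’ correction is usually quoted in the literature, I_corr = I * vo^3 * v * (1 - exp(-h c v / (k T))) / (vo - v)^4, with wavenumbers in m-1. Treat this form, the function name long_correction_sketch and the absence of a normalisation step as assumptions of the example rather than a description of rampy's code.

```python
import numpy as np

H = 6.62607015e-34     # Planck constant, J s
C = 299792458.0        # speed of light, m s-1
K_B = 1.380649e-23     # Boltzmann constant, J K-1

def long_correction_sketch(x, y, temp, wave):
    """Temperature and excitation-line correction in the commonly quoted 'long' form."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    nu0 = 1.0e9 / wave      # laser wavenumber in m-1 (wave given in nm)
    nu = 100.0 * x          # Raman shift converted from cm-1 to m-1
    T = temp + 273.15       # temperature converted from degrees C to K
    frequency = nu0 ** 3 * nu / (nu0 - nu) ** 4
    boltzmann = 1.0 - np.exp(-H * C * nu / (K_B * T))
    return y * frequency * boltzmann

x = np.array([100.0, 500.0, 1000.0])   # Raman shifts, cm-1
y = np.ones_like(x)                    # flat raw intensities
y_long = long_correction_sketch(x, y, temp=25.0, wave=532.0)
```

On a flat spectrum the corrected intensity grows with Raman shift, so the low-frequency region is damped relative to the rest, which is the behaviour the ‘hehlen’ remark above contrasts itself with.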

Module contents