rampy package
Subpackages
Submodules
rampy.baseline module
- rampy.baseline.baseline(x_input, y_input, bir, method, **kwargs)
Subtracts a baseline from an x-y spectrum.
- Parameters:
x_input (ndarray) – x values.
y_input (ndarray) – y values.
bir (ndarray) – Contains the regions of interest, organised one per row. For instance, bir = np.array([[100., 200.],[500.,600.]]) defines regions of interest between 100 and 200 as well as between 500 and 600. Note: this is NOT used by the “als” and “arPLS” algorithms, but it is still required when calling the function. bir and method will probably become args in a future iteration of rampy to solve this.
method (str) – The baseline method, one of:
“poly”: polynomial fitting, with polynomial_order the degree of the polynomial;
“unispline”: spline with the UnivariateSpline function of Scipy, with s the spline smoothing factor (equal weights are assumed in the present case);
“gcvspline”: spline with the gcvspl.f algorithm, really robust. Spectra must have x, y, ese in it, and s is the smoothing factor. For gcvspline, if ese are not provided we assume ese = sqrt(y). Requires the installation of gcvspline with a “pip install gcvspline” call prior to use;
“exp”: exponential background;
“log”: logarithmic background;
“rubberband”: rubberband baseline fitting;
“als”: (automatic) baseline least square fitting following Eilers and Boelens 2005;
“arPLS”: (automatic) baseline correction using asymmetrically reweighted penalized least squares smoothing. Baek et al. 2015, Analyst 140: 250-257;
“drPLS”: (automatic) baseline correction method based on doubly reweighted penalized least squares. Xu et al., Applied Optics 58(14):3913-3920.
polynomial_order (int, optional) – The degree of the polynomial (0 for a constant), default = 1.
s (float, optional) – spline smoothing coefficient for the unispline and gcvspline algorithms.
lam (float, optional) – The lambda smoothness parameter for the ALS, arPLS and drPLS algorithms. Typical values are between 10**2 and 10**9, default = 10**5 for ALS and arPLS and default = 10**6 for drPLS.
p (float, optional) – For the ALS algorithm, advised value between 0.001 and 0.1, default = 0.01.
ratio (float, optional) – Ratio parameter of the arPLS and drPLS algorithms, default = 0.01 for arPLS and 0.001 for drPLS.
niter (int, optional) – Number of iterations of the ALS and drPLS algorithms, default = 10 for ALS and default = 100 for drPLS.
eta (float, optional) – Roughness parameter for the drPLS algorithm, between 0 and 1, default = 0.5.
p0_exp (list, optional) – contains the starting parameters for the exp baseline fit with curve_fit. Default = [1.,1.,1.].
p0_log (list, optional) – contains the starting parameters for the log baseline fit with curve_fit. Default = [1.,1.,1.,1.].
- Returns:
out1 (ndarray) – Contains the corrected signal.
out2 (ndarray) – Contains the baseline.
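The role of bir can be illustrated with a minimal pure-Python sketch of a first-order polynomial baseline: the line is fitted only on the points that fall inside the regions of interest, then subtracted from the whole spectrum. The helper names select_roi and linear_baseline are hypothetical and this is not rampy's implementation; it only mirrors the idea behind method = “poly” with polynomial_order = 1.

```python
def select_roi(x, y, bir):
    """Keep the (x, y) points whose x falls inside any [low, high] row of bir."""
    return [(xi, yi) for xi, yi in zip(x, y)
            if any(low <= xi <= high for low, high in bir)]


def linear_baseline(x, y, bir):
    """Fit y = a + b*x on the regions of interest, then subtract it everywhere."""
    pts = select_roi(x, y, bir)
    n = len(pts)
    sx = sum(p[0] for p in pts)
    sy = sum(p[1] for p in pts)
    sxx = sum(p[0] ** 2 for p in pts)
    sxy = sum(p[0] * p[1] for p in pts)
    # closed-form ordinary least squares for a straight line
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n
    base = [a + b * xi for xi in x]
    corrected = [yi - bi for yi, bi in zip(y, base)]
    return corrected, base


x = [float(i) for i in range(0, 1001, 10)]
# synthetic spectrum: sloping background plus a triangular peak centred at 350
y = [0.01 * xi + 2.0 + max(0.0, 5.0 - abs(xi - 350.0) / 10.0) for xi in x]
bir = [[0.0, 200.0], [500.0, 1000.0]]  # regions that contain background only
corrected, base = linear_baseline(x, y, bir)
```

As with rampy.baseline.baseline, the sketch returns the corrected signal first and the baseline second. The regions in bir must avoid the peaks, otherwise the fitted baseline is pulled upwards.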
- rampy.baseline.get_portion_interest(x, y, bir)
Extracts the signals indicated in the bir.
- Parameters:
x (ndarray) – the x axis
y (ndarray) – the y values
bir (n x 2 array) – the x values of regions where the signal needs to be extracted; must be an n x 2 array, where n is the number of regions to extract, column 0 contains the low bounds and column 1 the high ones.
- Returns:
yafit – a two-column x-y array containing the signals in the bir.
- Return type:
ndarray
rampy.filters module
- rampy.filters.smooth(x, y, method='whittaker', **kwargs)
smooth the provided y signal (sampled on x)
- Parameters:
x (ndarray) – Nx1 array of x values (equally spaced).
y (ndarray) – Nx1 array of y values (equally spaced).
method (str) – Method for smoothing the signal; choose between savgol (Savitzky-Golay), GCVSmoothedNSpline, MSESmoothedNSpline, DOFSmoothedNSpline, whittaker, flat, hanning, hamming, bartlett, blackman.
window_length (int, optional) – The length of the filter window (i.e. the number of coefficients). window_length must be a positive odd integer.
polyorder (int, optional) – The order of the polynomial used to fit the samples. polyorder must be less than window_length.
Lambda (float, optional) – smoothing parameter of the Whittaker filter described in Eilers (2003). The higher the smoother the fit.
d (int, optional) – d parameter in Whittaker filter, see Eilers (2003).
ese_y (ndarray, optional) – errors associated with y (for the gcvspline algorithms)
- Returns:
y_smo – smoothed signal sampled on x.
- Return type:
ndarray
Notes
Use of GCVSmoothedNSpline, MSESmoothedNSpline and DOFSmoothedNSpline requires installation of gcvspline. See the gcvspline documentation for details on these three methods.
savgol uses the scipy.signal.savgol_filter() function.
References
Eilers, P.H.C., 2003. A Perfect Smoother. Anal. Chem. 75, 3631–3636. https://doi.org/10.1021/ac034173t
Scipy Cookbook: https://scipy-cookbook.readthedocs.io/items/SignalSmooth.html?highlight=smooth
- rampy.filters.spectrafilter(spectre, filtertype, fq, numtaps, columns)
Filter specific frequencies in spectra with a Butterworth filter
- Parameters:
spectre (ndarray) – Array of X-Y values of spectra. First column is X and subsequent n columns are Y values of n spectra. (see also spectraarray function)
filtertype (string) – type of filter; choose between ‘low’, ‘high’, ‘bandstop’, ‘bandpass’.
fq (ndarray) – Frequency of the periodic signal you try to erase. If using a bandpass or bandstop filter, fq must be an array containing the cutoff frequencies.
columns (ndarray) – An array defining which columns to treat.
- Returns:
out – filtered signals.
- Return type:
ndarray
- rampy.filters.whittaker(y, **kwargs)
smooth a signal with the Whittaker smoother
- Parameters:
y (ndarray) – An array with the values to smooth (equally spaced).
Lambda (float, optional) – The smoothing coefficient, the higher the smoother. Default = 10^5.
- Returns:
z – An array containing the smoothed values.
- Return type:
ndarray
References
Eilers, P.H.C., 2003. A Perfect Smoother. Anal. Chem. 75, 3631–3636.
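To make the role of Lambda concrete, here is a small pure-Python sketch of the Whittaker smoother as described by Eilers (2003): it solves (I + Lambda * D'D) z = y, where D is the d-th order difference matrix. Unit weights, dense matrices and the name whittaker_sketch are assumptions made for readability; this is not rampy's code and it is only practical for short signals.

```python
def whittaker_sketch(y, lam=1.0e5, d=2):
    """Return z solving (I + lam * D'D) z = y, D being the d-th order difference matrix."""
    n = len(y)
    # Build D by differencing the rows of the identity matrix d times.
    D = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(d):
        D = [[D[i + 1][j] - D[i][j] for j in range(n)] for i in range(len(D) - 1)]
    # Augmented system [I + lam * D'D | y].
    M = [[(1.0 if i == j else 0.0) + lam * sum(row[i] * row[j] for row in D)
          for j in range(n)] + [float(y[i])] for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution.
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        z[i] = (M[i][n] - sum(M[i][c] * z[c] for c in range(i + 1, n))) / M[i][i]
    return z


y_line = [2.0 * i + 1.0 for i in range(20)]   # a straight line has zero second differences
z_line = whittaker_sketch(y_line)             # so it is left essentially unchanged
y_zigzag = [(-1.0) ** i for i in range(20)]   # pure point-to-point oscillation
z_zigzag = whittaker_sketch(y_zigzag)         # flattened towards its smooth trend
```

The two test signals show the trade-off: with d = 2 the penalty ignores straight lines, while rapid oscillations are strongly damped when Lambda is large.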
rampy.functions module
- rampy.functions.constant(x, a)
returns a constant value
- Parameters:
x (1D array)
- Returns:
y – array filled with the value a
- Return type:
1D array
- rampy.functions.difffull(x1, x2, t, C0, C1, D)
Equation for the diffusion into a full slab, see Crank 1975
Here we assume the profile to have two surfaces of contact, one on each side
- Parameters:
C0 (float) – the concentration in the core
C1 (float) – the concentration at the border
D (float) – the log10 of the diffusion coefficient, the latter being in m^2.s^-1
x1, x2 – the profile lengths from beginning and end respectively, in meters
t (float) – time in seconds
- rampy.functions.diffshort(x, t, C0, C1, D)
1D equation for the diffusion into a semi-infinite slab, see Crank 1975
- Parameters:
C0 (float) – the concentration in the core
C1 (float) – the concentration at the border
D (float) – the log10 of the diffusion coefficient, the latter being in m^2.s^-1
x (1D array) – the profile length in meters
t (float) – time in seconds
- Returns:
Cx – concentration at x
- Return type:
1D array
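A sketch of the semi-infinite case, assuming the classical complementary error function solution given by Crank (1975): C(x, t) = C0 + (C1 - C0) * erfc(x / (2 * sqrt(D_lin * t))), with D_lin = 10**D because D is passed as a log10 value. The function name diffshort_sketch is hypothetical, and the exact expression used inside rampy.functions.diffshort should be checked against the source.

```python
import math


def diffshort_sketch(x, t, C0, C1, D):
    """Concentration profile in a semi-infinite slab (erfc solution, assumed form)."""
    d_lin = 10.0 ** D  # D is given as log10 of the diffusivity in m^2/s
    return [C0 + (C1 - C0) * math.erfc(xi / (2.0 * math.sqrt(d_lin * t))) for xi in x]


x = [0.0, 1.0e-5, 5.0e-5, 1.0e-3]  # distance from the border, in meters
profile = diffshort_sketch(x, t=3600.0, C0=0.1, C1=2.0, D=-12.0)
```

At x = 0 the profile equals the border concentration C1, and far from the border it tends to the core concentration C0.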
- rampy.functions.funexp(x, a, b, c)
exponential baseline function
a*exp(b*(x-c))
- rampy.functions.gauss_lsq(params, x)
predicts a sum of gaussian peaks with parameters params
- Parameters:
params (1D array) – an array of the parameters of the peaks. The number of peaks is assumed to be equal to len(params)/3. In this array, list intensities first, then all peak positions, then all peak half width at half maximum.
x (1D array) – x axis
- Returns:
y – y values at position x
- Return type:
1D array
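The parameter layout described above (all intensities, then all positions, then all half-widths) can be sketched as follows. The Gaussian expression is the one documented for rampy.peak_shapes.gaussian; the function name gauss_sum_sketch is hypothetical and the code works on plain lists rather than arrays.

```python
import math


def gauss_sum_sketch(params, x):
    """Sum of Gaussian peaks; params = [amp_1..amp_n, pos_1..pos_n, hwhm_1..hwhm_n]."""
    n = len(params) // 3
    amps, positions, hwhms = params[:n], params[n:2 * n], params[2 * n:]
    return [sum(a * math.exp(-math.log(2) * ((xi - p) / w) ** 2)
                for a, p, w in zip(amps, positions, hwhms))
            for xi in x]


# two peaks: amplitudes 10 and 5, positions 100 and 200, HWHM 8 and 12
params = [10.0, 5.0, 100.0, 200.0, 8.0, 12.0]
y = gauss_sum_sketch(params, [100.0, 108.0, 200.0])
```

With well-separated peaks, the value at a peak position is close to its amplitude, and the value one HWHM away is close to half of it.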
- rampy.functions.gauss_lsq_lfix(params, x)
predicts a sum of gaussian peaks with parameters params
Assumes that all peaks share the same HWHM.
- Parameters:
params (1D array) – an array of the parameters of the peaks. The number of peaks is assumed to be equal to len(params)/3. In this array, list intensities first, then all peak positions, then the last element is the peaks’ half width at half maximum.
x (1D array) – x axis
- Returns:
y – y values at position x
- Return type:
1D array
- rampy.functions.linear(x, a, b)
returns a + b*x
- rampy.functions.linear0(x, a)
returns a*x
- rampy.functions.multigaussian(x, params)
old attempt to have a multigaussian function, do not use. Will be removed soon.
- rampy.functions.poly2(x, a, b, c)
returns a + b*x + c*x*x
rampy.maps module
- class rampy.maps.maps(file, spectrometer_type='horiba', map_type='2D')
Bases:
object
treat maps of Raman spectra
- Parameters:
file (str) – filename, including path
spectrometer_type (str) – type of spectrometer, choose between “horiba” or “renishaw”, default: ‘horiba’
map_type (str) – type of map, choose between “2D” or “1D”, default: ‘2D’
- area(y, region_to_investigate)
get the area under the curve in the region to investigate.
The area is calculated by trapezoidal integration, using np.trapz(). Do not forget to smooth the signal if necessary prior to using this.
- Parameters:
y (object intensities) – the intensities to consider. For instance, pass self.normalised for performing the calculation on normalised spectra.
region_to_investigate (1x2 array) – the x values of regions where the area will be measured
- Returns:
self.A – the area under the curve in the region to investigate
- Return type:
ndarray
- area_ratio(y, region_to_investigate)
get the area ratio between two regions of interest.
The areas are calculated by trapezoidal integration, using np.trapz(). Do not forget to smooth the signals if necessary prior to using this.
- Parameters:
y (object intensities) – the intensities to consider. For instance, pass self.normalised for performing the calculation on normalised spectra.
region_to_investigate (2x2 array) – the x values of regions where the areas will be measured. The two rows record the two regions of interest.
- Returns:
self.A_ratio – Area ratio = area region 1 / area region 2
- Return type:
ndarray
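For a single spectrum, the trapezoidal area and the area ratio described above can be sketched in pure Python as below. The helper area_in_region is hypothetical; the maps methods apply the same idea to every spectrum of the map with np.trapz().

```python
def area_in_region(x, y, low, high):
    """Trapezoidal area under y for the points with low <= x <= high."""
    pts = [(xi, yi) for xi, yi in zip(x, y) if low <= xi <= high]
    return sum(0.5 * (y0 + y1) * (x1 - x0)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


x = [float(i) for i in range(11)]
y = [2.0] * 11                                # flat signal of height 2
regions = [[0.0, 4.0], [5.0, 10.0]]           # two rows, one per region of interest
area_1 = area_in_region(x, y, *regions[0])
area_2 = area_in_region(x, y, *regions[1])
ratio = area_1 / area_2                       # area region 1 / area region 2
```

Because the integration uses only the sampled points inside each region, noisy spectra give noisy areas, which is why smoothing beforehand is recommended.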
- background(bir, method='poly', **kwargs)
correct a background from the initial signal I on a map using rampy.baseline
- Parameters:
bir (ndarray) – arrays of the background interpolation regions.
method (string) – see the rampy.baseline documentation for the available methods. Default is “poly” (polynomial).
**kwargs – all keyword arguments of rampy.baseline() are forwarded and can be used here.
- Returns:
The background and the corrected spectra are available at self.background and self.I_corrected.
- centroid(y, region_to_investigate)
calculate the centroid in a given region of interest
- Parameters:
y (object intensities) – the intensities to consider. For instance, pass self.normalised for performing the calculation on normalised spectra.
region_to_investigate (1x2 array) – the x values of regions where the centroid will be measured
- Returns:
self.centroid_position – centroid position for the map
- Return type:
ndarray
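A minimal sketch of a centroid calculation for one spectrum, assuming the centroid is the intensity-weighted mean of x inside the region (sum(x*y) / sum(y)). The function name centroid_sketch is hypothetical and the definition is an assumption, not taken from the text above.

```python
def centroid_sketch(x, y, low, high):
    """Intensity-weighted mean position of the signal between low and high."""
    pts = [(xi, yi) for xi, yi in zip(x, y) if low <= xi <= high]
    total = sum(yi for _, yi in pts)
    return sum(xi * yi for xi, yi in pts) / total


x = [float(i) for i in range(11)]
y = [max(0.0, 3.0 - abs(xi - 5.0)) for xi in x]  # symmetric triangular peak at x = 5
position = centroid_sketch(x, y, 2.0, 8.0)
```

For a symmetric peak the centroid coincides with the peak maximum; for an asymmetric band it shifts towards the heavier side, and a residual baseline biases it.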
- intensity(y, region_to_investigate)
get the maximum intensity in the region to investigate.
The intensity maximum is estimated from a simple np.max() search. Do not forget to smooth the signal if necessary prior to using this.
- Parameters:
y (object intensities) – the intensities to consider. For instance, pass self.normalised for performing the calculation on normalised spectra.
region_to_investigate (1x2 array) – the x values of regions where the intensity will be measured
- Returns:
self.I_max – Intensity maximum
- Return type:
ndarray
- intensity_ratio(y, region_to_investigate)
get the intensity ratio between two regions of interest.
The intensity maxima are estimated from a simple np.max() search. Do not forget to smooth the signals if necessary prior to using this.
- Parameters:
y (object intensities) – the intensities to consider. For instance, pass self.normalised for performing the calculation on normalised spectra.
region_to_investigate (2x2 array) – the x values of regions where the intensity ratios will be measured. The two rows record the two regions of interest.
- Returns:
self.I_ratio – Intensity ratio
- Return type:
ndarray
- normalise(y, method='intensity')
normalise the spectra to their max intensity, their area or min-max normalisation
This uses the internals of rampy.normalise.
- Parameters:
y (object intensities) – the intensities to normalise. For instance, if you want to normalise the background-corrected I, pass self.I_corrected.
method (string) – method used, choose between area, intensity, minmax
- Returns:
The normalised spectra are available at self.I_normalised
- smooth(y, method='whittaker', **kwargs)
uses the smooth function of rampy to smooth the signals
- Parameters:
y (object intensities) – the intensities to smooth. For instance, if you want to smooth the background-corrected I, pass self.I_corrected.
method (str) – Method for smoothing the signal; choose between savgol (Savitzky-Golay), GCVSmoothedNSpline, MSESmoothedNSpline, DOFSmoothedNSpline, whittaker, flat, hanning, hamming, bartlett, blackman.
window_length (int, optional) – The length of the filter window (i.e. the number of coefficients). window_length must be a positive odd integer.
polyorder (int, optional) – The order of the polynomial used to fit the samples. polyorder must be less than window_length.
Lambda (float, optional) – smoothing parameter of the Whittaker filter described in Eilers (2003). The higher the smoother the fit.
d (int, optional) – d parameter in Whittaker filter, see Eilers (2003).
ese_y (ndarray, optional) – errors associated with y (for the gcvspline algorithms)
- Returns:
self.y_smoothed – the smoothed signal for the map
- Return type:
ndarray
- rampy.maps.peak(X, Y, lambdas, intensities, function, Xrange, amp, Xmean, sigma, y0, A)
to fit peaks in a map. Work in progress.
- rampy.maps.read_horiba(file, map_type='2D')
read Horiba csv maps (1D, 2D)
- Parameters:
file (str) – filename, including path
map_type (str) – 1D map (line) or 2D map, default: 2D
- Returns:
X (m by n array) – X positions
Y (m by n array) – Y positions
lambdas (n array) – Raman shift
intensities (m by n array) – Intensities
- rampy.maps.read_renishaw(file)
read Renishaw csv maps
- Parameters:
file (str) – filename, including path
- Returns:
X (m by n array) – X positions
Y (m by n array) – Y positions
lambdas (m by n array) – Raman shift
intensities (m by n array) – Intensities
rampy.mixing module
- rampy.mixing.mixing_sp(y_fit, ref1, ref2)
mix two reference spectra to match the given ones
- Parameters:
y_fit (ndarray, shape m * n) – an array containing the signals with m datapoints and n experiments
ref1 (ndarray, shape m) – an array containing the first reference signal
ref2 (ndarray, shape m) – an array containing the second reference signal
- Returns:
out – the fractions of ref1 in the mix
- Return type:
ndarray, shape n
Notes
Performs the calculation by minimizing the sum of the absolute values of the residuals:
obj = sum(abs(y_fit-(ref1*F1 + ref2*(1-F1))))
Uses cvxpy to perform this calculation.
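The objective above can be explored without cvxpy by a brute-force search over F1. This is a different and much cruder technique than the convex solver used by mixing_sp; it is shown only to make the objective concrete, for a single signal, and the name mixing_fraction_sketch is hypothetical.

```python
def mixing_fraction_sketch(y, ref1, ref2, steps=1000):
    """Grid search of F1 in [0, 1] minimizing sum(abs(y - (ref1*F1 + ref2*(1-F1))))."""
    best_f, best_obj = 0.0, float("inf")
    for k in range(steps + 1):
        f = k / steps
        obj = sum(abs(yi - (r1 * f + r2 * (1.0 - f)))
                  for yi, r1, r2 in zip(y, ref1, ref2))
        if obj < best_obj:
            best_f, best_obj = f, obj
    return best_f


ref1 = [1.0, 2.0, 3.0, 4.0]
ref2 = [4.0, 3.0, 2.0, 1.0]
y = [0.3 * a + 0.7 * b for a, b in zip(ref1, ref2)]  # 30 % of ref1, 70 % of ref2
fraction = mixing_fraction_sketch(y, ref1, ref2)
```

The objective is convex in F1, so the grid search finds the same minimum as a proper solver up to the grid resolution (here 0.001).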
rampy.ml_classification module
- class rampy.ml_classification.mlclassificator(x, y, **kwargs)
Bases:
object
use machine learning algorithms from scikit learn to perform classification of spectra.
- x
Spectra; n_features = n_frequencies.
- Type:
{array-like, sparse matrix}, shape = (n_samples, n_features)
- y
numeric labels.
- Type:
array, shape = (n_samples,)
- X_test
spectra organised in rows (1 row = one spectrum) that you want to use as a testing dataset. Those spectra should not be present in the x (training) dataset. The spectra should share a common X axis.
- Type:
{array-like, sparse matrix}, shape = (n_samples, n_features)
- y_test
numeric labels that you want to use as a testing dataset. Those targets should not be present in the y (training) dataset.
- Type:
array, shape = (n_samples,)
- algorithm
“Nearest Neighbors”, “Linear SVM”, “RBF SVM”, “Gaussian Process”, “Decision Tree”, “Random Forest”, “Neural Net”, “AdaBoost”, “Naive Bayes”, “QDA”
- Type:
String
- scaling
True or False. If True, data will be scaled during fitting and prediction with the requested scaler (see below).
- Type:
Bool
- scaler
the type of scaling performed. Choose between MinMaxScaler or StandardScaler, see http://scikit-learn.org/stable/modules/preprocessing.html for details. Default = “MinMaxScaler”.
- Type:
String
- test_size
the fraction of the dataset to use as a testing dataset; only used if X_test and y_test are not provided.
- Type:
float
- rand_state
the random seed that is used for reproducibility of the results. Default = 42.
- Type:
Float64
- params_
contain the values of the hyperparameters that should be provided to the algorithm. See scikit-learn documentation for details for each algorithm.
- Type:
Dictionary
- prediction_train
the predicted target values for the training y dataset.
- Type:
Array{Float64}
- prediction_test
the predicted target values for the testing y_test dataset.
- Type:
Array{Float64}
- model
A Scikit Learn object model, see scikit learn library documentation.
- Type:
Scikit learn model
- X_scaler
A Scikit Learn scaler object for the x values.
- Type:
scikit learn scaler
- Y_scaler
A Scikit Learn scaler object for the y values.
- Type:
scikit learn scaler
Example
Given an array X of n samples by m frequencies, and y an array of n numeric labels
>>> model = rampy.mlclassificator(X, y)
>>> model.algorithm = "SVC"
>>> model.user_kernel = 'poly'
>>> model.fit()
>>> y_new = model.predict(X_new)
Remarks
For details on hyperparameters of each algorithm, please consult the Scikit-learn documentation directly at: http://scikit-learn.org/stable/
In progress
- fit()
Scale data and train the model with the indicated algorithm.
Do not forget to tune the hyperparameters.
- Parameters:
algorithm (String) – algorithm to use. Choose between “Nearest Neighbors”, “Linear SVM”, “RBF SVM”, “Gaussian Process”, “Decision Tree”, “Random Forest”, “Neural Net”, “AdaBoost”, “Naive Bayes”, “QDA”.
- predict(X)
Predict using the model.
- Parameters:
X ({array-like, sparse matrix}, shape = (n_samples, n_features)) – Samples.
- Returns:
C (array, shape = (n_samples,)) – Returns predicted values.
Remark
If self.scaling is True, scaling will be performed on the input X.
- refit()
Re-train a model previously trained with fit()
rampy.ml_exploration module
- class rampy.ml_exploration.mlexplorer(x, **kwargs)
Bases:
object
use machine learning algorithms from scikit learn to explore spectroscopic datasets
Performs automatic scaling and train/test split before NMF or PCA fit.
- x
Spectra; n_features = n_frequencies.
- Type:
{array-like, sparse matrix}, shape = (n_samples, n_features)
- X_test
spectra organised in rows (1 row = one spectrum) that you want to use as a testing dataset. Those spectra should not be present in the x (training) dataset. The spectra should share a common X axis.
- Type:
{array-like, sparse matrix}, shape = (n_samples, n_features)
- algorithm
“PCA”, “NMF”, default = “PCA”
- Type:
String
- scaling
True or False. If True, data will be scaled prior to fitting (see below).
- Type:
Bool
- scaler
the type of scaling performed. Choose between MinMaxScaler or StandardScaler, see http://scikit-learn.org/stable/modules/preprocessing.html for details. Default = “MinMaxScaler”.
- Type:
String
- test_size
the fraction of the dataset to use as a testing dataset; only used if X_test is not provided.
- Type:
float
- rand_state
the random seed that is used for reproducibility of the results. Default = 42.
- Type:
Float64
- model
A Scikit Learn object model, see scikit learn library documentation.
- Type:
Scikit learn model
Remarks
For details on hyperparameters of each algorithm, please consult the Scikit-learn documentation directly at: http://scikit-learn.org/stable/
Results for machine learning algorithms can vary from run to run. A way to solve that is to fix the random_state.
Example
Given an array X of n samples by m frequencies, and Y an array of n x 1 concentrations
>>> explo = rampy.mlexplorer(X) # X is an array of signals built by mixing two partial components
>>> explo.algorithm = 'NMF' # using Non-Negative Matrix factorization
>>> explo.nb_compo = 2 # number of components to use
>>> explo.test_size = 0.3 # size of test set
>>> explo.scaler = "MinMax" # scaler
>>> explo.fit() # fitting!
>>> W = explo.model.transform(explo.X_train_sc) # getting the mixture array
>>> H = explo.X_scaler.inverse_transform(explo.model.components_) # components in the original space
>>> plt.plot(X,H.T) # plot the two components
- fit()
Train the model with the indicated algorithm.
Do not forget to tune the hyperparameters.
- predict(X)
Predict using the model.
- Parameters:
X ({array-like, sparse matrix}, shape = (n_samples, n_features)) – Samples.
- Returns:
C (array, shape = (n_samples,)) – Returns predicted values.
Remark
If self.scaling is True, scaling will be performed on the input X.
- refit()
Train the model with the indicated algorithm.
Do not forget to tune the hyperparameters.
rampy.ml_regressor module
- rampy.ml_regressor.chemical_splitting(Pandas_DataFrame, target, split_fraction=0.3, rand_state=42)
split datasets depending on their chemistry
- Parameters:
Pandas_DataFrame (Pandas DataFrame) – The input DataFrame, with the names of the different data compositions in the first row
target (string) – The target in the DataFrame according to which the dataset is split
split_fraction (float, between 0 and 1) – The fraction of the data assigned to the second output dataset (see Returns).
rand_state (float64) – the random seed that is used for reproducibility of the results. Default = 42.
- Returns:
frame1 (Pandas DataFrame) – A dataset with (1-split_fraction) of the data from the input dataset, with unique chemical compositions / names
frame2 (Pandas DataFrame) – A dataset with split_fraction of the data from the input dataset, with unique chemical compositions / names
frame1_idx (ndarray) – Contains the indexes of the data picked in Pandas_DataFrame to construct frame1
frame2_idx (ndarray) – Contains the indexes of the data picked in Pandas_DataFrame to construct frame2
Notes
This function prevents the same chemical dataset from ending up in several of the training/testing/validating datasets that are used in ML.
Indeed, it is worthless to put data from the same original dataset / with the same chemical composition in the training / testing / validating datasets. This creates an initial bias in the splitting process.
Another way of doing that would be to write:
>>> grouped = Pandas_DataFrame.groupby(by='label')
>>> k = [i for i in grouped.groups.keys()]
>>> k_train, k_valid = model_selection.train_test_split(np.array(k),test_size=0.40,random_state=100)
>>> train = Pandas_DataFrame.loc[Pandas_DataFrame['label'].isin(k_train)]
>>> valid = Pandas_DataFrame.loc[Pandas_DataFrame['label'].isin(k_valid)]
(results will vary slightly as variable k is sorted but not variable names in the function below)
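The principle of splitting by composition rather than by row can also be sketched without pandas: the unique labels are shuffled and assigned to one side or the other, then every row follows its label. The function group_split_sketch is hypothetical, works on a plain list of labels, and applies split_fraction to the number of groups (not rows), which is an assumption of this sketch.

```python
import random


def group_split_sketch(labels, split_fraction=0.3, seed=42):
    """Return (idx1, idx2) so that no label appears on both sides of the split."""
    groups = sorted(set(labels))
    rng = random.Random(seed)          # fixed seed for reproducible splits
    rng.shuffle(groups)
    n_second = int(round(split_fraction * len(groups)))
    second = set(groups[:n_second])    # labels sent to the second dataset
    idx1 = [i for i, lab in enumerate(labels) if lab not in second]
    idx2 = [i for i, lab in enumerate(labels) if lab in second]
    return idx1, idx2


labels = ["A", "A", "B", "C", "C", "C", "D", "E", "E", "F"]
idx1, idx2 = group_split_sketch(labels)
labels1 = {labels[i] for i in idx1}
labels2 = {labels[i] for i in idx2}
```

Every spectrum of a given composition lands in the same subset, so a model is never tested on a composition it was trained on.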
- class rampy.ml_regressor.mlregressor(x, y, **kwargs)
Bases:
object
use machine learning algorithms from scikit learn to perform regression between spectra and an observed variable.
- x
Spectra; n_features = n_frequencies.
- Type:
{array-like, sparse matrix}, shape = (n_samples, n_features)
- y
The observed target values.
- Type:
array, shape = (n_samples,)
- X_test
spectra organised in rows (1 row = one spectrum) that you want to use as a testing dataset. Those spectra should not be present in the x (training) dataset. The spectra should share a common X axis.
- Type:
{array-like, sparse matrix}, shape = (n_samples, n_features)
- y_test
the target that you want to use as a testing dataset. Those targets should not be present in the y (training) dataset.
- Type:
array, shape = (n_samples,)
- algorithm
“KernelRidge”, “SVM”, “LinearRegression”, “Lasso”, “ElasticNet”, “NeuralNet”, “BaggingNeuralNet”, default = “SVM”
- Type:
String
- scaling
True or False. If True, data will be scaled during fitting and prediction with the requested scaler (see below).
- Type:
Bool
- scaler
the type of scaling performed. Choose between MinMaxScaler or StandardScaler, see http://scikit-learn.org/stable/modules/preprocessing.html for details. Default = “MinMaxScaler”.
- Type:
String
- test_size
the fraction of the dataset to use as a testing dataset; only used if X_test and y_test are not provided.
- Type:
float
- rand_state
the random seed that is used for reproducibility of the results. Default = 42.
- Type:
Float64
- param_kr
contain the values of the hyperparameters that should be provided to KernelRidge and GridSearch for the Kernel Ridge regression algorithm.
- Type:
Dictionary
- param_svm
contains the values of the hyperparameters that should be provided to SVM and GridSearch for the Support Vector regression algorithm.
- Type:
Dictionary
- param_neurons
contains the parameters for the Neural Network (MLPRegressor model in sklearn). Default = dict(hidden_layer_sizes=(3,), solver='lbfgs', activation='relu', early_stopping=True)
- Type:
Dictionary
- param_bagging
contains the parameters for the BaggingRegressor sklearn function that uses a MLPRegressor base method. Default = dict(n_estimators=100, max_samples=1.0, max_features=1.0, bootstrap=True, bootstrap_features=False, oob_score=False, warm_start=False, n_jobs=1, random_state=rand_state, verbose=0)
- Type:
Dictionary
- prediction_train
the predicted target values for the training y dataset.
- Type:
Array{Float64}
- prediction_test
the predicted target values for the testing y_test dataset.
- Type:
Array{Float64}
- model
A Scikit Learn object model, see scikit learn library documentation.
- Type:
Scikit learn model
- X_scaler
A Scikit Learn scaler object for the x values.
- Y_scaler
A Scikit Learn scaler object for the y values.
Example
Given an array X of n samples by m frequencies, and Y an array of n x 1 concentrations
>>> model = rampy.mlregressor(X, y)
>>> model.algorithm = "SVM"
>>> model.user_kernel = 'poly'
>>> model.fit()
>>> y_new = model.predict(X_new)
Remarks
For details on hyperparameters of each algorithm, please consult the Scikit-learn documentation directly at:
http://scikit-learn.org/stable/
For Support Vector and Kernel Ridge regressions, mlregressor performs a cross-validation search using 5-fold (KFold) cross-validation.
If the results are poor with Support Vector and Kernel Ridge regressions, you will have to tune the param_grid_kr or param_grid_svm dictionary that records the hyperparameter space to investigate during the cross-validation.
Results for machine learning algorithms can vary from run to run. A way to solve that is to fix the random_state. For neural nets, results from multiple neural nets (bagging technique) may also generalise better, so it may be better to use the BaggingNeuralNet algorithm.
- fit()
Scale data and train the model with the indicated algorithm.
Do not forget to tune the hyperparameters.
- Parameters:
algorithm (String) – algorithm to use. Choose between “KernelRidge”, “SVM”, “LinearRegression”, “Lasso”, “ElasticNet”, “NeuralNet”, “BaggingNeuralNet”, default = “SVM”
- predict(X)
Predict using the model.
- Parameters:
X ({array-like, sparse matrix}, shape = (n_samples, n_features)) – Samples.
- Returns:
C (array, shape = (n_samples,)) – Returns predicted values.
Remark
If self.scaling is True, scaling will be performed on the input X.
- refit()
Re-train a model previously trained with fit()
rampy.peak_area module
- rampy.peak_area.gaussianarea(amp, HWHM, **options)
returns the area of a Gaussian peak
- Parameters:
amp (float or ndarray) – amplitude of the peak
HWHM (float or ndarray) – half-width at half-maximum
eseAmplitude (float or ndarray, optional) – standard deviation on amp; Default = None
eseHWHM (float or ndarray, optional) – standard deviation on HWHM; Default = None
- Returns:
area (float or ndarray) – peak area
esearea (float or ndarray) – error on peak area; will be None if no errors on amp and HWHM were provided.
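For the Gaussian shape documented in rampy.peak_shapes (amp*exp(-ln(2)*((x-freq)/HWHM)**2)), integrating over x gives the closed form area = amp * HWHM * sqrt(pi / ln 2). The sketch below computes it for scalars, together with a first-order error propagation that assumes independent errors on amp and HWHM. The function name and the propagation formula are assumptions of this sketch, not a transcription of rampy's code.

```python
import math

GAUSS_FACTOR = math.sqrt(math.pi / math.log(2))  # integral of exp(-ln2 * u**2) over u


def gaussian_area_sketch(amp, hwhm, ese_amp=None, ese_hwhm=None):
    """Analytical area of a Gaussian peak and, if errors are given, its uncertainty."""
    area = GAUSS_FACTOR * amp * hwhm
    if ese_amp is None or ese_hwhm is None:
        return area, None
    # first-order propagation, assuming uncorrelated errors on amp and HWHM
    ese_area = GAUSS_FACTOR * math.sqrt((hwhm * ese_amp) ** 2 + (amp * ese_hwhm) ** 2)
    return area, ese_area


area, ese_area = gaussian_area_sketch(10.0, 5.0, ese_amp=0.5, ese_hwhm=0.2)
area_only, no_error = gaussian_area_sketch(10.0, 5.0)
```

As in the documented function, the error is None when no uncertainties are supplied.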
- rampy.peak_area.peakarea(shape, **options)
returns the area of a peak
(!experimental!)
gaussian peak area is calculated analytically; areas for other peak shapes are calculated using trapezoidal integration.
- Parameters:
shape (string) – gaussian, lorentzian, pseudovoigt or pearson7
amp (float or ndarray) – amplitude of the peak
pos (float or ndarray) – peak position
HWHM (float or ndarray) – half-width at half-maximum
a3 (float or ndarray) – a3 parameter for pearson7
eseAmplitude (float or ndarray) – standard deviation on amp; Default = None
eseHWHM (float or ndarray) – standard deviation on HWHM; Default = None
- Returns:
area (float or ndarray) – peak area
esearea (float or ndarray) – error on peak area; will be None if no errors on amp and HWHM were provided.
rampy.peak_shapes module
- rampy.peak_shapes.create_gauss()
- rampy.peak_shapes.create_lorenz()
- rampy.peak_shapes.gaussian(x, amp, freq, HWHM)
compute a Gaussian peak
- Parameters:
x (ndarray) – the positions at which the signal should be sampled
amp (float or ndarray with size equal to x.shape) – amplitude
freq (float or ndarray with size equal to x.shape) – frequency/position of the Gaussian component
HWHM (float or ndarray with size equal to x.shape) – half-width at half-maximum
- Returns:
out (ndarray) – the signal
Remarks
Formula is amp*np.exp(-np.log(2)*((x-freq)/HWHM)**2)
- rampy.peak_shapes.lorentzian(x, amp, freq, HWHM)
compute a Lorentzian peak
- Parameters:
x (ndarray) – the positions at which the signal should be sampled
amp (float or ndarray with size equal to x.shape) – amplitude
freq (float or ndarray with size equal to x.shape) – frequency/position of the Lorentzian component
HWHM (float or ndarray with size equal to x.shape) – half-width at half-maximum
- Returns:
out (ndarray) – the signal
Remarks
Formula is amp/(1+((x-freq)/HWHM)**2)
- rampy.peak_shapes.pearson7(x, a0, a1, a2, a3)
compute a Pearson7 peak
- Parameters:
x (ndarray) – the positions at which the signal should be sampled
a0 (float or ndarrays of size equal to x.shape) – parameters of the Pearson7 equation
a1 (float or ndarrays of size equal to x.shape) – parameters of the Pearson7 equation
a2 (float or ndarrays of size equal to x.shape) – parameters of the Pearson7 equation
a3 (float or ndarrays of size equal to x.shape) – parameters of the Pearson7 equation
- Returns:
out (ndarray) – the signal
Remarks
Formula is a0 / ( (1.0 + ((x-a1)/a2)**2.0 * (2.0**(1.0/a3) -1.0))**a3 )
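The Pearson7 formula above can be checked numerically with a scalar sketch. The parameter names follow the signature; the function name pearson7_sketch is hypothetical. Reading the formula, a0 is the peak height, a1 the position, and a2 plays the role of a half-width at half-maximum, since at x = a1 + a2 the denominator reduces to 2 for any a3.

```python
def pearson7_sketch(x, a0, a1, a2, a3):
    """Scalar Pearson7 peak, following the documented formula."""
    return a0 / ((1.0 + ((x - a1) / a2) ** 2.0 * (2.0 ** (1.0 / a3) - 1.0)) ** a3)


height = pearson7_sketch(1000.0, 10.0, 1000.0, 15.0, 1.5)       # at the peak position
half = pearson7_sketch(1015.0, 10.0, 1000.0, 15.0, 1.5)         # one a2 away from it
lorentz_like = pearson7_sketch(1030.0, 10.0, 1000.0, 15.0, 1.0)  # a3 = 1
```

With a3 = 1 the expression reduces to the Lorentzian formula amp/(1+((x-freq)/HWHM)**2), and larger a3 values give faster-decaying, more Gaussian-like tails.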
- rampy.peak_shapes.pseudovoigt(x, amp, freq, HWHM, L_ratio)
compute a pseudo-Voigt peak
- Parameters:
x (ndarray) – the positions at which the signal should be sampled. Can be provided as vector, nx1 or nxm array.
amp (float or ndarray with size equal to x.shape) – amplitude
freq (float or ndarray with size equal to x.shape) – frequency/position of the peak
HWHM (float or ndarray with size equal to x.shape) – half-width at half-maximum
L_ratio (float or ndarray with size equal to x.shape) – ratio of the Lorentzian component, should be between 0 and 1 (included)
- Returns:
out (ndarray of size equal to x.shape) – the signal
Remarks
Formula is (1-L_ratio)*gaussian(x,amp,freq,HWHM) + L_ratio*lorentzian(x,amp,freq,HWHM)
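The three formulas documented for gaussian, lorentzian and pseudovoigt can be written as scalar pure-Python functions to check how they relate. The names carry a _sketch suffix to mark them as hypothetical; the rampy functions operate on arrays.

```python
import math


def gaussian_sketch(x, amp, freq, hwhm):
    """amp*exp(-ln(2)*((x-freq)/HWHM)**2)"""
    return amp * math.exp(-math.log(2) * ((x - freq) / hwhm) ** 2)


def lorentzian_sketch(x, amp, freq, hwhm):
    """amp/(1+((x-freq)/HWHM)**2)"""
    return amp / (1.0 + ((x - freq) / hwhm) ** 2)


def pseudovoigt_sketch(x, amp, freq, hwhm, l_ratio):
    """Linear mix of the two shapes; l_ratio is the Lorentzian fraction in [0, 1]."""
    if not 0.0 <= l_ratio <= 1.0:
        raise ValueError("l_ratio must be between 0 and 1")
    return ((1.0 - l_ratio) * gaussian_sketch(x, amp, freq, hwhm)
            + l_ratio * lorentzian_sketch(x, amp, freq, hwhm))


at_peak = pseudovoigt_sketch(1000.0, 10.0, 1000.0, 15.0, 0.4)
at_hwhm = pseudovoigt_sketch(1015.0, 10.0, 1000.0, 15.0, 0.4)
far_gauss = gaussian_sketch(1100.0, 10.0, 1000.0, 15.0)
far_lorentz = lorentzian_sketch(1100.0, 10.0, 1000.0, 15.0)
```

Both shapes equal amp at x = freq and amp/2 at x = freq ± HWHM, so any mix of them does too; they differ in the tails, where the Lorentzian decays much more slowly.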
rampy.rameau module
- rampy.rameau.DG2017_calibrate(dictio)
Fit a calibration by optimizing the K coefficient in the DG2017 method
- Parameters:
dictio (dictionary) – dictionary with arrays named “feo”, “rws” and “water”.
- Returns:
popt – The optimized a and b parameters of the equation K = a * [FeO wt%] + b.
- Return type:
ndarray
- rampy.rameau.DG2017_predict(dictio, a=0.096, b=0.663)
Calculate the K coefficient for the DG2017 method.
- Parameters:
dictio (dict) – a dictionary with ndarrays named “feo” and “rws”
a, b (float) – factors in the equation K = a * [FeO wt%] + b; default values from Di Genova et al. (2017)
- Returns:
H2O (wt %) – The water content of the glasses calculated as Rws * (a * [FeO wt%] + b)
- Return type:
ndarray
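The prediction equation stated above, H2O = Rws * (a * [FeO wt%] + b), can be sketched on plain lists as follows. The dictionary keys “feo” and “rws” and the default a and b come from the signature; the function name and the input values are made up for illustration.

```python
def dg2017_predict_sketch(dictio, a=0.096, b=0.663):
    """Water content (wt%) as rws * (a * feo + b), element by element."""
    return [rws * (a * feo + b) for feo, rws in zip(dictio["feo"], dictio["rws"])]


# hypothetical glasses: FeO contents in wt% and water/silicate area ratios
glasses = {"feo": [0.0, 10.0], "rws": [1.0, 2.0]}
water = dg2017_predict_sketch(glasses)
```

For an iron-free glass the coefficient K reduces to b, so the predicted water content is simply b times Rws.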
- rampy.rameau.LL2012_calibrate(dictio)
Fit a calibration line following equations (2) and (3) from Le Losq et al. (2012)
- Parameters:
dictio – dictionary with arrays named “feo”, “rws” and “water”.
- Returns:
A – The parameter in the equation (3) of Le Losq et al. (2012).
- Return type:
float
- rampy.rameau.LL2012_predict(dictio, A=0.007609)
Predict the water content using the equation (3) from Le Losq et al. (2012)
- Parameters:
dictio (dict) – a dictionary with ndarray named “rws”
- Returns:
H2O – The glass water contents in wt%
- Return type:
ndarray
- rampy.rameau.fit_spectra(data_liste, method='LL2012', delim='\t', path_in='./raw/', laser=514.532, spline_coeff=0.001, poly_coeff=3)
Calculate the ratios of water and silicate signals from Raman spectra
- Parameters:
data_liste (Pandas DataFrame) – Contains the list of spectra, see provided file as an example
method (string) – The method to use. LL2012: Le Losq et al. (2012); DG2017: Di Genova et al. (2017). See references.
delim (string) – File delimiter. Use ‘\t’ for tab-delimited text or ‘,’ for comma-separated text.
path_in (string) – Path for the spectra
laser (float) – Laser line wavelength in nm
spline_coeff (float) – Smoothing coefficient for the spline baseline. An array of size len(data_liste) can be provided. Default = 0.001.
poly_coeff (int) – Polynomial coefficient for the polynomial baseline function. Default = 3 (DG2017 method). Set to 2 for Behrens et al. (2006) method.
- Returns:
x (ndarray) – Common x axis.
y_all (ndarray) – All raw spectra from data_liste in an array of length len(x) and with as many columns as spectra.
y_all_corr (ndarray) – All corrected spectra from data_liste in an array of length len(x) and with as many columns as spectra.
y_all_base (ndarray) – All baselines for spectra from data_liste in an array of length len(x) and with as many columns as spectra.
rws (ndarray) – The ratio of the water integrated intensity over that of silicate signals.
rw (ndarray) – The integrated intensity of water signal.
rs (ndarray) – The integrated intensity of silicate signals.
- Raises:
IOError – If method is not set to LL2012 or DG2017.
References
C. Le Losq, D. R. Neuville, R. Moretti, J. Roux, Determination of water content in silicate glasses using Raman spectrometry: Implications for the study of explosive volcanism. American Mineralogist. 97, 779–790 (2012).
D. Di Genova et al., Effect of iron and nanolites on Raman spectra of volcanic glasses: A reassessment of existing strategies to estimate the water content. Chemical Geology. 475, 76–86 (2017).
- class rampy.rameau.rameau(data_liste)
Bases: object
Treat Raman spectra of glass to retrieve the glass water content.
- Parameters:
data_liste (Pandas dataframe) – A Pandas dataframe containing the data and various meta information.
- x
a 1D array (Nx1) containing the common x axis (wavelength) of the spectra.
- Type:
ndarray
- y
a NxM array (with M the number of spectra) containing the raw intensities of the spectra.
- Type:
ndarray
- y_corr
a NxM array (with M the number of spectra) containing the corrected intensities of the spectra.
- Type:
ndarray
- y_base
a NxM array (with M the number of spectra) containing the backgrounds of the spectra.
- Type:
ndarray
- rws
a 1D array (Nx1) containing the ratio between the integrated intensities of the water and silicate signals.
- Type:
ndarray
- rw
a 1D array (Nx1) containing the integrated intensities of the water signal.
- Type:
ndarray
- rs
a 1D array (Nx1) containing the integrated intensities of the silicate signals.
- water
the known glass water content provided in data_liste (set to 0 if predicting for unknowns)
- water_predicted
the predicted glass water content provided in data_liste (set to 0 if predicting for unknowns)
- p
calibration coefficient(s) of the LL2012 or DG2017 method
- Type:
ndarray
- names
filenames indicated in the data_liste input
- Type:
pandas dataframe
Notes
Uses either the LL2012 method (Le Losq et al., 2012) or the DG2017 (Di Genova et al., 2017) method. See references.
In the LL2012 method, a cubic spline is fitted to the regions of interest provided in self.data_liste (see example). The spline is smoothed by the spline_coeff of the data_reduction method. The water content is calculated following eq. (3) of LL2012, with the A coefficient either provided or calculated by the method self.calibrate().
In the DG2017 method, a third-order polynomial is fitted to the spectra following the instructions of Di Genova et al. (2017). The water content is calculated as wt% H2O = Rws * (a * [FeO wt%] + b) with a and b the coefficients either provided or calculated by the method self.calibrate().
References
LL2012: C. Le Losq, D. R. Neuville, R. Moretti, J. Roux, Determination of water content in silicate glasses using Raman spectrometry: Implications for the study of explosive volcanism. American Mineralogist. 97, 779–790 (2012).
DG2017: D. Di Genova et al., Effect of iron and nanolites on Raman spectra of volcanic glasses: A reassessment of existing strategies to estimate the water content. Chemical Geology. 475, 76–86 (2017).
- calibrate(method='LL2012')
Fit a calibration by optimizing the K coefficient(s)
- Parameters:
self (object) – rameau object with treated spectra (see data_reduction method)
method (string) – the method used; choose between “LL2012” or “DG2017”, default = “LL2012”.
- Returns:
popt – The optimized parameter(s). If method = “DG2017”, popt = np.array([a,b]), the parameters of the equation K = a * [FeO wt%] + b. If method = “LL2012”, popt = A (float), the A parameter in equation (3) of Le Losq et al. (2012).
- Return type:
ndarray or float
- data_reduction(method='LL2012', delim='\t', path_in='./raw/', laser=514.532, spline_coeff=0.001, poly_coeff=3)
process Raman spectra of glass to calculate the Rws ratio
- Parameters:
self (object) – a rameau object that has been initiated.
method (string) – The method to use. LL2012: Le Losq et al. (2012); DG2017: Di Genova et al. (2017). See references. Default = “LL2012”.
delim (string) – File delimiter. Use ‘\t’ for tab-delimited text or ‘,’ for comma-separated text. Default = ‘\t’.
path_in (string) – Path for the spectra. Default = ‘./raw/’
laser (float) – Laser line wavelength in nm. Default = 514.532.
spline_coeff (float) – Smoothing coefficient for the spline baseline. An array of size len(data_liste) can be provided. Default = 0.001.
poly_coeff (int) – Polynomial coefficient for the polynomial baseline function. Default = 3 (DG2017 method; set to 2 for Behrens et al. (2006) method).
- Returns:
self.x (ndarray) – Common x axis.
self.y_all (ndarray) – All raw spectra from data_liste in an array of length len(x) and with as many columns as spectra.
self.y_all_corr (ndarray) – All corrected spectra from data_liste in an array of length len(x) and with as many columns as spectra.
self.y_all_base (ndarray) – All baselines for spectra from data_liste in an array of length len(x) and with as many columns as spectra.
self.rws (ndarray) – The ratio of the water integrated intensity over that of silicate signals.
self.rw (ndarray) – The integrated intensity of water signal.
self.rs (ndarray) – The integrated intensity of silicate signals.
- names = []
- p = None
- predict(method='LL2012')
predict the water content from the Rws
- Parameters:
self (object) – rameau object with treated spectra (see data_reduction method).
method (string) – the method used; choose between “LL2012” or “DG2017”, default = “LL2012”.
- Returns:
H2O – The glass water contents in wt%
- Return type:
array
- rs = []
- rw = []
- rws = []
- water = []
- water_predicted = []
- x = []
- y = []
- y_base = []
- y_corr = []
rampy.spectranization module
- rampy.spectranization.centroid(x, y, smoothing=False, **kwargs)
Calculate the centroid(s) of the y signal(s) as np.sum(y/np.sum(y)*x)
- Parameters:
x (Numpy array, m values by n samples) – x values
y (Numpy array, m values by n samples) – y values
Options
smoothing (bool) – True or False. Smooth the signals with arguments provided as kwargs. The default method is Whittaker smoothing. See the rampy.smooth function for smoothing options and arguments.
- Returns:
centroid – signal centroid(s)
- Return type:
Numpy array, n samples
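The centroid expression above can be illustrated with a small NumPy sketch. It assumes that the sums run over the m values of each sample (axis 0), which is an interpretation of the m-by-n layout described in the parameters; centroid_sketch is a hypothetical name and the optional smoothing step is left out.

```python
import numpy as np

def centroid_sketch(x, y):
    """Intensity-weighted mean position of each column (no smoothing applied)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Normalise each column to unit sum, weight the x values, then sum per column.
    return np.sum(y / np.sum(y, axis=0) * x, axis=0)

# One synthetic sample: a symmetric peak centred at 40 on a 0-100 axis.
x = np.linspace(0.0, 100.0, 501).reshape(-1, 1)
y = np.exp(-((x - 40.0) / 5.0) ** 2)
c = centroid_sketch(x, y)
```

For a symmetric peak that lies well inside the sampled range, the centroid coincides with the peak position, which makes this a simple test signal.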
- rampy.spectranization.despiking(x, y, neigh=4, threshold=3)
Remove spikes from the 1D y signal given a threshold.
This function smooths the spectrum, calculates the root-mean-square error (RMSE) of the residuals, and replaces points whose residual exceeds threshold*RMSE using the neighboring points.
- Parameters:
x (1D array) – x axis of the signal to despike
y (1D array) – signal to despike
neigh (int) – numbers of points around the spikes to select for calculating average value for despiking
threshold (int) – multiplier of sigma, default = 3
- Returns:
y – the signal without spikes
- Return type:
1D array
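The despiking steps described above (smooth, compute the residual RMSE, replace outliers from their neighbours) can be sketched as follows. This is an illustration under stated assumptions, not the rampy implementation: a running median stands in for the smoothing step, the window parameter and the name despike_sketch are hypothetical, and flagged points are replaced by the mean of unflagged neighbours.

```python
import numpy as np

def despike_sketch(y, neigh=4, threshold=3, window=5):
    """Replace points whose residual exceeds threshold * RMSE (illustrative only)."""
    y = np.asarray(y, dtype=float)
    # Stand-in smoother: running median over `window` points with edge padding.
    pad = window // 2
    y_pad = np.pad(y, pad, mode="edge")
    y_smooth = np.array([np.median(y_pad[i:i + window]) for i in range(y.size)])
    residual = y - y_smooth
    rmse = np.sqrt(np.mean(residual ** 2))
    spikes = np.abs(residual) > threshold * rmse
    y_out = y.copy()
    for i in np.flatnonzero(spikes):
        lo, hi = max(0, i - neigh), min(y.size, i + neigh + 1)
        good = ~spikes[lo:hi]  # only average neighbours that are not spikes
        if good.any():
            y_out[i] = y[lo:hi][good].mean()
    return y_out

# Synthetic signal with a single artificial spike at index 30.
clean = np.sin(np.linspace(0.0, 3.0, 100))
spiky = clean.copy()
spiky[30] += 10.0
despiked = despike_sketch(spiky)
```

A median-based smoother is used here because a single spike barely moves it, so the residual at the spike stays large relative to the RMSE.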
- rampy.spectranization.flipsp(sp)
Flip an array along the row dimension (dim = 1) if the row values are in decreasing order.
- Parameters:
sp (ndarray) – An array with n columns, the first one should contain the X axis (frequency, wavenumber, etc.)
- Returns:
sp – The same array but sorted such that the values in the first column are in increasing order.
- Return type:
ndarray
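A minimal sketch of the behaviour described above, assuming the X axis is monotonic so that comparing its first and last values is enough to detect decreasing order. The name flipsp_sketch is hypothetical and this is not the rampy source.

```python
import numpy as np

def flipsp_sketch(sp):
    """Reverse the row order when the first column is in decreasing order."""
    sp = np.asarray(sp)
    if sp[0, 0] > sp[-1, 0]:
        return np.flip(sp, axis=0)
    return sp

# First column: X axis in decreasing order; second column: intensities.
sp = np.array([[300.0, 3.0],
               [200.0, 2.0],
               [100.0, 1.0]])
sp_sorted = flipsp_sketch(sp)
```

Flipping whole rows keeps each intensity attached to its X value, which is why the array is reversed along the row dimension rather than sorting the first column alone.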
- rampy.spectranization.normalise(y, x=0, method='intensity')
normalise y signal(s)
- Parameters:
x (ndarray, m values by n samples) – x values
y (ndarray, m values by n samples) – corresponding y values
method (string) – method used, choose between area, intensity, minmax
- Returns:
y_norm – Normalised signal(s)
- Return type:
Numpy array
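The three method names can be illustrated with common definitions: division by the trapezoidal area, division by the maximum, and rescaling to the 0-1 range. These definitions are assumptions about what each option means rather than a copy of the rampy code, and normalise_sketch is a hypothetical name.

```python
import numpy as np

def normalise_sketch(y, x=None, method="intensity"):
    """Normalise signal(s) along axis 0 using assumed definitions of each method."""
    y = np.asarray(y, dtype=float)
    if method == "area":
        if x is None:
            raise ValueError("x values are needed for area normalisation")
        x = np.asarray(x, dtype=float)
        # Trapezoidal integration written out explicitly.
        area = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x, axis=0), axis=0)
        return y / area
    if method == "intensity":
        return y / np.max(y, axis=0)
    if method == "minmax":
        y_min, y_max = np.min(y, axis=0), np.max(y, axis=0)
        return (y - y_min) / (y_max - y_min)
    raise ValueError("method should be 'area', 'intensity' or 'minmax'")

x = np.linspace(0.0, 10.0, 101)
y = 2.0 + x
y_area = normalise_sketch(y, x, method="area")
y_int = normalise_sketch(y, method="intensity")
y_mm = normalise_sketch(y, method="minmax")
```

Only the area option needs the x values, which is consistent with x being optional in the signature above.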
- rampy.spectranization.resample(x, y, x_new, **kwargs)
Resample a y signal associated with x, along the x_new values.
- Parameters:
x (ndarray) – The x values
y (ndarray) – The y values
x_new (ndarray) – The new X values
kind (str or int, optional) – Specifies the kind of interpolation as a string (‘linear’, ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘previous’, ‘next’, where ‘zero’, ‘slinear’, ‘quadratic’ and ‘cubic’ refer to a spline interpolation of zeroth, first, second or third order; ‘previous’ and ‘next’ simply return the previous or next value of the point) or as an integer specifying the order of the spline interpolator to use. Default is ‘linear’.
axis (int, optional) – Specifies the axis of y along which to interpolate. Interpolation defaults to the last axis of y.
copy (bool, optional) – If True, the class makes internal copies of x and y. If False, references to x and y are used. The default is to copy.
bounds_error (bool, optional) – If True, a ValueError is raised any time interpolation is attempted on a value outside of the range of x (where extrapolation is necessary). If False, out of bounds values are assigned fill_value. By default, an error is raised unless fill_value=”extrapolate”.
fill_value (array-like or (array-like, array_like) or “extrapolate”, optional) – If an ndarray (or float), this value will be used to fill in for requested points outside of the data range. If not provided, the default is NaN. The array-like must broadcast properly to the dimensions of the non-interpolation axes. If a two-element tuple, the first element is used as a fill value for x_new < x[0] and the second element is used for x_new > x[-1]. Anything that is not a 2-element tuple (e.g., list or ndarray, regardless of shape) is taken to be a single array-like argument used for both bounds, as in below, above = fill_value, fill_value. If “extrapolate”, points outside the data range will be extrapolated. The two-element tuple and “extrapolate” options are new in scipy version 0.17.0.
assume_sorted (bool, optional) – If False, values of x can be in any order and they are sorted first. If True, x has to be an array of monotonically increasing values.
- Returns:
y_new (ndarray) – y values interpolated at x_new.
Remarks
Uses scipy.interpolate.interp1d. Optional arguments are passed to scipy.interpolate.interp1d, see https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html
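Since the optional arguments are forwarded to scipy.interpolate.interp1d, their effect can be previewed by calling interp1d directly. The sketch below uses made-up data and shows linear interpolation with out-of-range points filled with NaN; passing the same keyword arguments to resample is expected to behave the same way, but that has not been verified here.

```python
import numpy as np
from scipy.interpolate import interp1d

# Made-up data: y = 2x sampled at integer x values.
x = np.linspace(0.0, 10.0, 11)
y = 2.0 * x

# New axis with one point (12.0) outside the original range.
x_new = np.array([0.5, 2.25, 9.75, 12.0])

# bounds_error=False lets out-of-range points take fill_value instead of raising.
f = interp1d(x, y, kind="linear", bounds_error=False, fill_value=np.nan)
y_new = f(x_new)
```

With the interp1d defaults (bounds_error unset and no fill_value), the out-of-range point would raise a ValueError instead, so these two arguments matter whenever x_new extends beyond x.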
- rampy.spectranization.shiftsp(sp, shift)
Shift the X axis (frequency, wavenumber, etc.) by a given value.
- Parameters:
sp (ndarray) – An array with n columns, the first one should contain the X axis (frequency, wavenumber, etc.)
shift (float) – The shift value to apply.
- Returns:
sp – The same array with the shift applied to the X axis (first column).
- Return type:
ndarray
- rampy.spectranization.spectraoffset(spectre, oft)
Vertically offset your spectra with the values in oft.
- Parameters:
spectre (ndarray) – array of spectra constructed with the spectrarray function
oft (ndarray) – array constructed with numpy and containing the coefficient for the offset to apply to spectra
- Returns:
out – Array with spectra separated by offsets defined in oft
- Return type:
ndarray
- rampy.spectranization.spectrarray(name, sh, sf, x)
Construct a general array that contains the common X values in the first column and all Y values in the subsequent columns.
- Parameters:
name (ndarray) – Array containing the names of the files (should work with a dataframe too).
sh (int) – Number of header lines in files to skip.
sf (int) – Number of footer lines in files to skip.
x (ndarray) – The common x axis.
- Returns:
out – An array with the common X axis in the first column and all the spectra in the subsequent columns.
- Return type:
ndarray
- rampy.spectranization.spectrataux(spectres)
Calculate the increase/decrease rate of each frequency in a set of spectra.
- Parameters:
spectres (ndarray) – An array of spectra containing the common X axis in first column and all the spectra in the subsequent columns. (see spectrarray function)
- Returns:
taux – The rate of change of each frequency, fitted by a 2nd order polynomial function.
- Return type:
ndarray
rampy.tlcorrection module
- rampy.tlcorrection.tlcorrection(x, y, temp, wave, **kwargs)
Correct spectra for temperature and excitation line effects.
- Parameters:
x (ndarray) – Raman shifts in cm-1
y (ndarray) – Intensity values as counts
temp (float) – Temperature in °C
wave (float) – wavelength of the laser that excited the sample, in nm
correction (string, optional) – Equation used for the correction. Choose between ‘long’, ‘galeener’, or ‘hehlen’. Default = ‘long’.
normalisation (string, optional) – Data normalisation procedure. Choose between ‘intensity’, ‘area’, or ‘no’. Default = ‘area’.
density (float, optional) – The density of the studied material in kg m-3, to be used with the ‘hehlen’ equation. Default = 2210.0 (density of silica).
- Returns:
x (1darray) – Raman shifts values.
long (1darray) – corrected intensities.
eselong (1darray) – errors calculated as sqrt(y) on raw intensities and propagated after the correction.
Remarks
This correction uses the formulas reported in Galeener and Sen (1978), Mysen et al. (1982), Brooker et al. (1988) and Hehlen et al. (2010).
The ‘galeener’ equation is the exact one reported in Galeener and Sen (1978), which is a modification of Shuker and Gammon (1970) to account for the (vo - v)^4 dependence of the Raman intensity. See also Brooker et al. (1988) for further discussion.
The ‘long’ equation is that of Galeener and Sen (1978) corrected by a vo^3 coefficient for removing the cubic meter dimension of the equation of ‘galeener’. This equation has been used in Mysen et al. (1982), Neuville and Mysen (1996) and Le Losq et al. (2012).
The ‘hehlen’ equation is that reported in Hehlen et al. (2010). It actually predates this publication (Brooker et al. 1988). It uses a different correction that avoids crushing the signal below 500 cm-1. Therefore, it has the advantage of keeping the Boson peak signal in glasses intact.