AutoML
Automated Machine Learning (AutoML) is about producing machine learning solutions for the data scientist without endless manual iteration over data preparation, model selection, and model parameters.
To listen to the video, copy and paste the website URL into the Chrome browser on your host machine, as there is no sound card in the lab environment.

Start PDI
Open the following transformation:
~/Workshop--Data-Integration/Labs/Module 6 - Machine Learning/autoML.ktr

Browse the various customer data sources:

Create the TPOT dataset
Run the transformation: autoML.ktr
Preview the results: data/TPOT.csv

Sign into Colab.
Connect to a hosted runtime.

Select File -> Open Notebook

Upload:
/home/pentaho/Workshop--Data-Integration/Labs/Module 7 - Workflows/Machine Learning/AutoML/data/credit_card_fraud.ipynb
AutoML Script
These are the code sections from the Jupyter notebook credit_card_fraud.ipynb:
Install the TPOT libraries:
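A minimal install cell, assuming a Colab runtime (the leading ! runs a shell command from the notebook):

!pip install tpot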
Import libraries:
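A typical import cell for this workflow; later cells assume these names are available, though the notebook's exact imports may differ:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier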
Import dataset:
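A sketch of loading the TPOT.csv file produced by the transformation, assuming it has been uploaded to the Colab session; the file name and header=None are assumptions:

# Read the raw export from PDI; header=None because the
# column names are added in the next step.
df = pd.read_csv('TPOT.csv', header=None)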
Add column headers:
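A sketch of assigning headers; the column names below are hypothetical placeholders for the real list in the notebook:

# Hypothetical headers - substitute the actual column names
# used in credit_card_fraud.ipynb.
df.columns = ['amount', 'merchant', 'hour', 'fraud']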
Convert dataset to numpy array and fit data (optional):
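One way this step might look, assuming fraud is the target column; the optional part fits scikit-learn's StandardScaler to the feature data:

from sklearn.preprocessing import StandardScaler

# Separate features and target as NumPy arrays.
features = df.drop(columns=['fraud']).values
target = df['fraud'].values

# Optional: fit a scaler to the features and standardize them.
scaler = StandardScaler()
features = scaler.fit_transform(features)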
Split the dataset: 75% is used for training and 25% for testing.
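A standard scikit-learn split; train_size=0.75 mirrors the common TPOT example, and random_state is set only for repeatability:

X_train, X_test, y_train, y_test = train_test_split(
    features, target, train_size=0.75, test_size=0.25, random_state=42)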
Run the TPOT Classifier:
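A sketch using the classic TPOT API; generations, population_size, and verbosity are illustrative values, and larger settings search more pipelines at the cost of runtime:

# Evolve candidate pipelines on the training data, then score
# the best one on the held-out test set.
tpot = TPOTClassifier(generations=5, population_size=20,
                      verbosity=2, random_state=42)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))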
Export Pipeline as Python script:
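TPOT can write its best pipeline out as a standalone Python script; the file name here is an assumption:

# Writes the winning pipeline (imports, preprocessing, model)
# to a runnable Python file.
tpot.export('tpot_credit_card_fraud_pipeline.py')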
Enable the rest of the hops in the transformation, except: Model Catalogue (table).
Open the step: auto machine learning
Click on the Input tab.
a. Use this tab to make selections for moving data from PDI fields to Python variables.
b. The All rows option is commonly used for data frames. A data frame is used for storing data tables and is composed of a list of vectors of equal length.
c. Select the All rows option to process all your data at once, for example when using the Python list of dictionaries structure.
Available variables
Use the plus sign button to add a Python variable to the input mapping for the script used in the transformation. You can remove the Python variable by clicking the X icon.
Variable name
Enter the name of the Python variable. The list of available variables will update automatically.
Step
Specify the name of the input step to map from. It can be any step in the parent transformation with an outgoing hop connected to the Python Executor step.
Data structure
Specify the data structure from which you want to pull the fields for mapping. You can select one of the following (a short sketch of all three appears after this list):
· Pandas data frame: the tabular data structure for Python/Pandas.
· NumPy array: a table of values, all of the same type, indexed by a tuple of non-negative integers.
· Python List of Dictionaries: each row in the PDI stream becomes a Python dictionary. All the dictionaries are put into a Python list.
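To make the three options concrete, here is a sketch of how one small PDI stream could appear under each structure; the field names are illustrative:

import numpy as np
import pandas as pd

# Pandas data frame: tabular, column-oriented.
frame = pd.DataFrame({'amount': [12.5, 80.0], 'fraud': [0, 1]})

# NumPy array: every value shares one type, indexed by position.
array = np.array([[12.5, 0.0], [80.0, 1.0]])

# Python list of dictionaries: one dictionary per PDI row.
rows = [{'amount': 12.5, 'fraud': 0}, {'amount': 80.0, 'fraud': 1}]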
The Mapping table contains the following field properties:
Data structure field
The value of the Python data structure field to which you want to map the PDI field.
Data structure type
The value of the data structure type assigned to the data structure field to which you want to map the PDI field.
PDI field
The name of the PDI field which contains the vector data stored in the mapped Python variable.
PDI data type
The value of the data type assigned to the PDI field, such as a date, a number, or a timestamp.
Click on the Output tab.
The script's output data frame, model.df, is converted back into PDI fields.
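As a hedged illustration only, the script might expose its results through an object whose df attribute is a pandas data frame; the container and column names below are assumptions based on the step's output mapping:

import pandas as pd
from types import SimpleNamespace

# Hypothetical container so the result is addressable as model.df.
model = SimpleNamespace()
model.df = pd.DataFrame({
    'pipeline': ['LogisticRegression', 'RandomForestClassifier'],
    'cross_validation_performance': [0.91, 0.95],
})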
Preview the data for the tfo_model_catalogue step and sort by Cross Validation Performance.
What does this mean?