Model Deployment
The concept of deployment in data science refers to the application of a model for prediction to new data. Building a model is generally not the end of the project. Even if the purpose of the model is to increase knowledge of the data, the knowledge gained will need to be organized and presented in a way that the customer can use. Depending on the requirements, the deployment phase can be as simple as generating a report or as complex as implementing a repeatable data science process. In many cases, it will be the customer, not the data analyst, who carries out the deployment steps. For example, a credit card company may want to deploy a trained model or set of models (e.g., neural networks, meta-learners) to quickly identify transactions that have a high probability of being fraudulent. However, even if the analyst will not carry out the deployment effort, it is important for the customer to understand up front what actions will need to be carried out in order to actually make use of the created models.
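As a minimal sketch of what "applying a model to new data" can look like in code, the Python example below assumes that the modeling phase produced a logistic regression fraud model and saved its parameters as JSON. The feature names (amount_usd, foreign_merchant, night_time), the coefficients, and the 0.5 threshold are all hypothetical; a real deployment would load the artifact from wherever the modeling team stores it.

```python
import json
import math

# Hypothetical model artifact handed over at the end of the modeling phase.
# The feature names, coefficients, and threshold are illustrative only.
MODEL_JSON = """
{
  "intercept": -4.0,
  "coefficients": {"amount_usd": 0.002, "foreign_merchant": 1.5, "night_time": 1.0},
  "threshold": 0.5
}
"""


def load_model(text):
    """Parse the serialized model parameters."""
    return json.loads(text)


def fraud_probability(model, txn):
    """Logistic regression score: sigmoid of intercept + weighted sum of features."""
    z = model["intercept"] + sum(
        coef * txn[name] for name, coef in model["coefficients"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def flag_transactions(model, transactions):
    """Return the ids of new transactions whose fraud probability meets the threshold."""
    return [
        txn["id"]
        for txn in transactions
        if fraud_probability(model, txn) >= model["threshold"]
    ]


if __name__ == "__main__":
    model = load_model(MODEL_JSON)
    new_transactions = [
        {"id": "t1", "amount_usd": 25.0, "foreign_merchant": 0, "night_time": 0},
        {"id": "t2", "amount_usd": 1200.0, "foreign_merchant": 1, "night_time": 1},
    ]
    print(flag_transactions(model, new_transactions))  # ['t2']
```

Keeping the parameters in a separate artifact is what lets the customer, rather than the analyst, rerun the scoring step on each new batch of transactions.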
Model deployment methods:
In general, there are four ways of deploying models in data science: using a data mining tool, using a programming language, using an SQL script in the database, or using PMML.
An example of using a data mining tool (Orange) to deploy a decision tree model.
An example of using a programming language (Visual Basic) to deploy a regression model.
The same regression model deployed as an SQL script.
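The idea of deploying one regression model in two places can be sketched in Python. A hypothetical linear regression (intercept 10.0, coefficients for made-up columns age and income) is scored once in application code and once by generating an equivalent SQL expression, which is run here against an in-memory SQLite table standing in for a production database. The table name, column names, and coefficients are assumptions for illustration.

```python
import sqlite3

# Hypothetical fitted linear regression: prediction = intercept + sum(coef * column)
INTERCEPT = 10.0
COEFFICIENTS = {"age": 0.5, "income": 0.25}


def predict(record):
    """Deployment in a programming language: score one record in application code."""
    return INTERCEPT + sum(coef * record[name] for name, coef in COEFFICIENTS.items())


def to_sql(table):
    """Deployment as SQL: emit a SELECT that computes the same prediction in the database.

    Table and column names are inserted verbatim, so they must come from a trusted source.
    """
    terms = " + ".join(f"{coef!r} * {name}" for name, coef in COEFFICIENTS.items())
    return f"SELECT id, {INTERCEPT!r} + {terms} AS prediction FROM {table}"


def score_in_sqlite(rows):
    """Load (id, age, income) rows into a throwaway SQLite table and score them with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE customers (id INTEGER, age REAL, income REAL)")
        conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
        return dict(conn.execute(to_sql("customers")).fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    rows = [(1, 40, 100.0), (2, 25, 60.0)]
    print(to_sql("customers"))
    print(score_in_sqlite(rows))
    print({rid: predict({"age": age, "income": inc}) for rid, age, inc in rows})
```

Both paths produce the same scores; the SQL version avoids moving data out of the database, while the application-code version is easier to embed in a service.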
Predictive Model Markup Language (PMML)
PMML is an XML-based language used to define statistical and data science models and to share them between compliant applications. It defines a standard for representing not only data science models but also data handling and data transformations (pre- and post-processing). PMML is developed by the Data Mining Group (DMG) to avoid proprietary issues and incompatibilities when deploying models. PMML eliminates the need for custom model deployment and allows for a clear separation of model development and model deployment tasks. PMML supports a wide range of data science methods, including association rules, clustering models, decision trees, naive Bayes classifiers, neural networks, regression models, rule sets, and support vector machines.
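To make the "define and share" idea concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. One function plays the model development tool and writes a linear regression as a PMML RegressionModel; another plays a separate consumer that reads the XML and scores a record. Only a tiny subset of PMML is handled (a single RegressionTable of NumericPredictor terms, with the exponent attribute ignored), the PMML 4.4 namespace is assumed, and the application name "hypothetical-trainer" is made up; a production deployment would rely on a full PMML consumer rather than this sketch.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-4_4"


def _q(tag):
    """Qualify a tag name with the PMML namespace."""
    return f"{{{PMML_NS}}}{tag}"


def export_regression_pmml(intercept, coefficients, target):
    """Producer side: serialize a linear regression as a minimal PMML document."""
    ET.register_namespace("", PMML_NS)  # write xmlns="..." instead of an ns0: prefix
    root = ET.Element(_q("PMML"), version="4.4")
    header = ET.SubElement(root, _q("Header"), description="Linear regression sketch")
    ET.SubElement(header, _q("Application"), name="hypothetical-trainer", version="0.1")

    fields = [*coefficients, target]
    dictionary = ET.SubElement(root, _q("DataDictionary"), numberOfFields=str(len(fields)))
    for name in fields:
        ET.SubElement(dictionary, _q("DataField"), name=name,
                      optype="continuous", dataType="double")

    model = ET.SubElement(root, _q("RegressionModel"), functionName="regression")
    schema = ET.SubElement(model, _q("MiningSchema"))
    for name in coefficients:
        ET.SubElement(schema, _q("MiningField"), name=name)
    ET.SubElement(schema, _q("MiningField"), name=target, usageType="target")

    table = ET.SubElement(model, _q("RegressionTable"), intercept=repr(intercept))
    for name, coef in coefficients.items():
        ET.SubElement(table, _q("NumericPredictor"), name=name, coefficient=repr(coef))
    return ET.tostring(root, encoding="unicode")


def score_from_pmml(pmml_text, record):
    """Consumer side: read the RegressionTable back and apply it to one record."""
    ns = {"p": PMML_NS}
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", ns)
    value = float(table.get("intercept"))
    for predictor in table.findall("p:NumericPredictor", ns):
        value += float(predictor.get("coefficient")) * record[predictor.get("name")]
    return value


if __name__ == "__main__":
    doc = export_regression_pmml(10.0, {"age": 0.5, "income": 0.25}, target="spend")
    print(doc)
    print(score_from_pmml(doc, {"age": 40, "income": 100.0}))
```

The consumer never imports anything from the producer; the XML document is the only thing the two sides share, which is the separation of development and deployment that PMML is meant to provide.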
PMML Processes
PMML Components
Header: contains general information about the PMML document, such as copyright information for the model, its description, and information about the application used to generate the model, such as its name and version. It can also contain a timestamp (the Timestamp element), which can be used to specify the date of model creation.
Data Dictionary: contains definitions for all the possible fields used by the model. It is here that a field is defined as continuous, categorical, or ordinal. Depending on this definition, the appropriate value ranges are then defined, as well as the data type (such as string or double).
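Data Transformations: transformations allow user data to be mapped into a more desirable form for use by the mining model. PMML defines several kinds of simple data transformations: normalization (NormContinuous, NormDiscrete), discretization (Discretize), value mapping (MapValues), functions (Apply), and aggregation (Aggregate).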
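As an example of one of these transformations, PMML's NormContinuous element describes a piecewise-linear normalization through a list of LinearNorm points, each pairing an original value (orig) with a normalized one (norm). The Python function below is a hypothetical sketch of how a consumer might evaluate it: a value between two points is interpolated linearly, and a value outside the covered range follows the outliers setting, assumed here to mean extrapolating from the end segment (asIs), returning a missing value (asMissingValues), or clamping to the end points (asExtremeValues).

```python
from bisect import bisect_right


def norm_continuous(x, points, outliers="asIs"):
    """Piecewise-linear normalization in the spirit of PMML NormContinuous.

    points: (orig, norm) pairs sorted by strictly increasing orig, at least two of them.
    Returns None when an out-of-range value is to be treated as missing.
    """
    origs = [orig for orig, _ in points]
    if x < origs[0] or x > origs[-1]:
        if outliers == "asMissingValues":
            return None
        if outliers == "asExtremeValues":
            return points[0][1] if x < origs[0] else points[-1][1]
        # "asIs": fall through and extrapolate with the first or last segment
    # Index of the segment [origs[i], origs[i + 1]] that contains (or is nearest to) x
    i = min(max(bisect_right(origs, x) - 1, 0), len(points) - 2)
    (a1, b1), (a2, b2) = points[i], points[i + 1]
    return b1 + (x - a1) * (b2 - b1) / (a2 - a1)


if __name__ == "__main__":
    # Hypothetical mapping of an income field onto roughly [0, 1]
    income_points = [(0.0, 0.0), (50_000.0, 0.8), (200_000.0, 1.0)]
    for value in (25_000.0, 125_000.0, 300_000.0):
        print(value,
              norm_continuous(value, income_points),
              norm_continuous(value, income_points, outliers="asExtremeValues"))
```

Storing the transformation next to the model means the deployed system applies exactly the same preprocessing that was used during training.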
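Model: contains the definition of the data science model. For example, a feed-forward neural network is represented in PMML by a "NeuralNetwork" element, which contains attributes such as the activation function (activationFunction) and the number of layers (numberOfLayers). These are followed by NeuralInputs, NeuralLayer, and NeuralOutputs elements that describe the architecture of the network.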
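The numbers stored inside a NeuralNetwork element describe a computation that is simple to sketch. Below is a minimal Python forward pass for a fully connected feed-forward network, assuming each neuron stores a bias and one weight per value from the previous layer, and that a single activation function (logistic by default) is applied at every layer. The weights are hypothetical, and the sketch does not cover PMML details such as per-layer settings or the NeuralOutputs mapping back to target fields.

```python
import math


def logistic(z):
    """Logistic (sigmoid) activation."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical 2-input network: a hidden layer of two neurons and one output neuron.
# Each neuron is (bias, [one weight per value coming from the previous layer]).
LAYERS = [
    [(0.0, [1.0, -1.0]), (0.5, [0.5, 0.5])],
    [(-0.25, [1.0, 1.0])],
]


def forward(inputs, layers, activation=logistic):
    """Propagate the inputs layer by layer and return the output layer's values."""
    values = list(inputs)
    for layer in layers:
        values = [
            activation(bias + sum(w * v for w, v in zip(weights, values)))
            for bias, weights in layer
        ]
    return values


if __name__ == "__main__":
    print(forward([2.0, 1.0], LAYERS))
```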
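Mining Schema: the mining schema lists all fields used in the model. This can be a subset of the fields defined in the data dictionary. It contains specific information about each field, such as its name (name), which must refer to a field in the data dictionary; its usage type (usageType), which marks the field as, e.g., an active input or the target to be predicted; its outlier treatment (outliers); its missing value replacement (missingValueReplacement); and how that replacement was derived (missingValueTreatment).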
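The per-field settings in the mining schema amount to a small input-preparation step that runs before the model itself. The sketch below uses a hypothetical Python dataclass whose attributes mirror the MiningField attributes usageType, outliers, lowValue, highValue, and missingValueReplacement. It assumes that outliers treated asMissingValues are then replaced like any other missing value, and it ignores missingValueTreatment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MiningField:
    """Hypothetical stand-in for a PMML MiningField element."""
    name: str
    usage_type: str = "active"      # only active fields are treated as model inputs here
    outliers: str = "asIs"          # "asIs", "asMissingValues", or "asExtremeValues"
    low_value: Optional[float] = None
    high_value: Optional[float] = None
    missing_value_replacement: Optional[float] = None


def prepare_input(fields, record):
    """Apply outlier treatment, then missing-value replacement, to each active field."""
    prepared = {}
    for field in fields:
        if field.usage_type != "active":
            continue  # e.g. the target field is not an input
        value = record.get(field.name)
        if value is not None:
            too_low = field.low_value is not None and value < field.low_value
            too_high = field.high_value is not None and value > field.high_value
            if field.outliers == "asMissingValues" and (too_low or too_high):
                value = None
            elif field.outliers == "asExtremeValues":
                if too_low:
                    value = field.low_value
                elif too_high:
                    value = field.high_value
        if value is None:
            value = field.missing_value_replacement  # stays None if no replacement is given
        prepared[field.name] = value
    return prepared


if __name__ == "__main__":
    schema = [
        MiningField("income", outliers="asExtremeValues", low_value=0.0,
                    high_value=200_000.0, missing_value_replacement=50_000.0),
        MiningField("age", outliers="asMissingValues", low_value=18.0,
                    high_value=100.0, missing_value_replacement=40.0),
        MiningField("churn", usage_type="target"),
    ]
    print(prepare_input(schema, {"income": 500_000.0, "age": 130.0}))
```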
Targets: allow post-processing of the predicted value, in the form of scaling, if the output of the model is continuous. Targets can also be used for classification tasks. In this case, the attribute priorProbability specifies a default probability for the corresponding target category. It is used if the prediction logic itself did not produce a result. This can happen, e.g., if an input value is missing and there is no other method for treating missing values.
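Both uses of Targets can be sketched in a few lines of Python. The first function assumes that a continuous prediction is first bounded by optional min and max values and then rescaled as value * rescaleFactor + rescaleConstant (names taken from the PMML Target element; the function itself is hypothetical and skips castInteger). The second falls back to the priorProbability values when the model returns no class probabilities. The numbers in the examples are made up.

```python
from typing import Dict, Optional


def post_process(raw, min_value=None, max_value=None,
                 rescale_factor=1.0, rescale_constant=0.0):
    """Bound a continuous prediction, then rescale it linearly."""
    value = raw
    if min_value is not None:
        value = max(value, min_value)
    if max_value is not None:
        value = min(value, max_value)
    return value * rescale_factor + rescale_constant


def predict_category(probabilities: Optional[Dict[str, float]],
                     priors: Dict[str, float]) -> str:
    """Pick the most probable category, using the priors if the model gave no result."""
    scores = probabilities if probabilities else priors
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # A model trained on a target scaled to [0, 1], mapped back to a 0-100 score
    print(post_process(1.2, min_value=0.0, max_value=1.0, rescale_factor=100.0))
    priors = {"fraud": 0.02, "legitimate": 0.98}
    print(predict_category({"fraud": 0.7, "legitimate": 0.3}, priors))
    print(predict_category(None, priors))  # e.g. an input was missing
```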
PMML 4.0 – New Features