Linear Regression Health Costs Calculator
https://www.freecodecamp.org/learn/machine-learning-with-python/machine-learning-with-python-projects/linear-regression-health-costs-calculator
In this challenge, you will predict healthcare costs using a regression algorithm.
You are given a dataset that contains information about different people including their healthcare costs. Use the data to predict healthcare costs based on new data.
You can access the full project instructions and starter code on Google Colaboratory.
Train/test split
https://www.tensorflow.org/tutorials/keras/regression#split_the_data_into_train_and_test
Use the pandas sample method:
train_dataset = dataset.sample(frac=0.8, random_state=0)
test_dataset = dataset.drop(train_dataset.index)
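As a quick check of this split, the sketch below applies the same two calls to a small made-up DataFrame and confirms the 80/20 sizes. The column names and values are hypothetical, not the real dataset, and the label column is assumed to be called `expenses`.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset: 10 rows of made-up values.
dataset = pd.DataFrame({
    "age": [19, 18, 28, 33, 32, 31, 46, 37, 37, 60],
    "bmi": [27.9, 33.8, 33.0, 22.7, 28.9, 25.7, 33.4, 27.7, 29.8, 25.8],
    "expenses": [16884.9, 1725.6, 4449.5, 21984.5, 3866.9,
                 3756.6, 8240.6, 7281.5, 6406.4, 28923.1],
})

# Randomly pick 80% of the rows for training (fixed seed for repeatability).
train_dataset = dataset.sample(frac=0.8, random_state=0)
# Every row that was not sampled becomes the test set.
test_dataset = dataset.drop(train_dataset.index)

# Separate the target column (assumed name) from the features.
train_labels = train_dataset.pop("expenses")
test_labels = test_dataset.pop("expenses")

print(len(train_dataset), len(test_dataset))
```

Note that `dataset.drop(train_dataset.index)` relies on the index being unique. If it is not, resetting it first with `reset_index(drop=True)` avoids dropping more rows than intended.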
Alternatively, sklearn's train_test_split works as well.
https://towardsdatascience.com/keras-101-a-simple-and-interpretable-neural-network-model-for-house-pricing-regression-31b1a77f05ae
from sklearn.model_selection import train_test_split
X = df.loc[:, df.columns != 'MEDV']
y = df.loc[:, df.columns == 'MEDV']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=123)
Normalization and model building
https://www.tensorflow.org/tutorials/keras/regression#split_the_data_into_train_and_test
horsepower = np.array(train_features['Horsepower'])
horsepower_normalizer = preprocessing.Normalization(input_shape=[1,])
horsepower_normalizer.adapt(horsepower)
Build the sequential model:
horsepower_model = tf.keras.Sequential([
    horsepower_normalizer,
    layers.Dense(units=1)
])
horsepower_model.summary()
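To make clear what this two-layer model computes, here is a NumPy sketch. It assumes that the normalization layer standardizes its input using the mean and variance learned by `adapt`, and that `Dense(units=1)` applies a single weight and bias. The horsepower values, the weight, and the bias are made up for illustration, and the helper names are hypothetical.

```python
import numpy as np

# Made-up feature values standing in for train_features['Horsepower'].
horsepower = np.array([75.0, 88.0, 160.0, 63.0, 67.0, 90.0])

# What adapt() is assumed to learn: the mean and variance of the data.
mean = horsepower.mean()
variance = horsepower.var()

def normalize(x):
    # Shift to zero mean and scale to unit variance.
    return (x - mean) / np.sqrt(variance)

def linear_model(x, weight, bias):
    # Dense(units=1) on one feature is y = weight * x_normalized + bias.
    return weight * normalize(x) + bias

normalized = normalize(horsepower)
# Hypothetical weight and bias; training would learn these values.
predictions = linear_model(horsepower, weight=-6.0, bias=23.0)
print(normalized.mean(), normalized.std())
```

Seen this way, the model is ordinary one-variable linear regression on a rescaled input, which is why it has only two trainable parameters.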
Model configuration and training
https://www.tensorflow.org/tutorials/keras/regression#split_the_data_into_train_and_test
Once the model is built, configure the training procedure using the `Model.compile()` method. The most important arguments to compile are the `loss` and the `optimizer`, since these define what will be optimized (`mean_absolute_error`) and how (using `optimizers.Adam`).
horsepower_model.compile(
    optimizer=tf.optimizers.Adam(learning_rate=0.1),
    loss='mean_absolute_error')
Once the training is configured, use `Model.fit()` to execute the training:
%%time
history = horsepower_model.fit(
    train_features['Horsepower'], train_labels,
    epochs=100,
    # suppress logging
    verbose=0,
    # Calculate validation results on 20% of the training data
    validation_split = 0.2)
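The loss and the validation split used here can be illustrated without TensorFlow. The sketch below computes mean absolute error by hand and mimics how `validation_split=0.2` holds out data: Keras takes the last 20% of the samples passed to `fit`, before any shuffling. The arrays are made-up values and both helper functions are hypothetical names, not Keras APIs.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    # Average size of the errors, ignoring their sign.
    return np.mean(np.abs(y_true - y_pred))

def split_for_validation(features, labels, validation_split):
    # Hold out the trailing fraction of the samples, in their given order.
    split_at = int(len(features) * (1.0 - validation_split))
    return (features[:split_at], labels[:split_at],
            features[split_at:], labels[split_at:])

# Made-up data: 10 samples of one feature and their labels.
features = np.arange(10, dtype=float)
labels = 2.0 * features + 1.0

x_train, y_train, x_val, y_val = split_for_validation(features, labels, 0.2)

# A deliberately imperfect hypothetical model: y = 2x (the bias is missing).
val_loss = mean_absolute_error(y_val, 2.0 * x_val)
print(len(x_train), len(x_val), val_loss)
```

Because the held-out rows come from the end of the data, rows that are ordered in some meaningful way should be shuffled before calling `fit`.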
When to use a Sequential model
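There are two ways to define a model:
one is the Sequential API,
the other is the functional API.
The Sequential API is the simpler form: the input data must already be prepared as tensors, so if the features include categorical data, you have to convert it to numeric values yourself.
Alternatively, use the functional API and add the categorical conversion right after the input layer.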
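Converting categorical features to numbers by hand, as the Sequential approach requires, can be done with one-hot encoding in pandas. This is only a sketch: the column names (`smoker`, `region`) and the rows are assumptions made for illustration, and `get_dummies` is one option among several.

```python
import pandas as pd

# Hypothetical rows with one numeric column and two categorical columns.
df = pd.DataFrame({
    "age": [19, 18, 28, 33],
    "smoker": ["yes", "no", "no", "no"],
    "region": ["southwest", "southeast", "southeast", "northwest"],
})

# Replace each categorical column with one 0/1 column per category.
encoded = pd.get_dummies(df, columns=["smoker", "region"], dtype=float)

# Every column is now numeric, so the frame can be turned into an array
# that a Sequential model accepts as input.
features = encoded.to_numpy(dtype="float32")
print(encoded.columns.tolist())
print(features.shape)
```

The encoding must be applied consistently to the training and test sets so that both end up with the same columns in the same order.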
https://colab.research.google.com/github/keras-team/keras-io/blob/master/guides/ipynb/sequential_model.ipynb#scrollTo=GCrA42dfKE9m
## When to use a Sequential model
A `Sequential` model is appropriate for **a plain stack of layers**
where each layer has **exactly one input tensor and one output tensor**.
Schematically, the following `Sequential` model:
# Define Sequential model with 3 layers
model = keras.Sequential(
    [
        layers.Dense(2, activation="relu", name="layer1"),
        layers.Dense(3, activation="relu", name="layer2"),
        layers.Dense(4, name="layer3"),
    ]
)
# Call model on a test input
x = tf.ones((3, 3))
y = model(x)
is equivalent to this function:
# Create 3 layers
layer1 = layers.Dense(2, activation="relu", name="layer1")
layer2 = layers.Dense(3, activation="relu", name="layer2")
layer3 = layers.Dense(4, name="layer3")
# Call layers on a test input
x = tf.ones((3, 3))
y = layer3(layer2(layer1(x)))
A Sequential model is **not appropriate** when:
- Your model has multiple inputs or multiple outputs
- Any of your layers has multiple inputs or multiple outputs
- You need to do layer sharing
- You want non-linear topology (e.g. a residual connection, a multi-branch
model)
References:
https://keras.io/guides/preprocessing_layers/
https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/structured_data/preprocessing_layers.ipynb?hl=ar-bh#scrollTo=6Yrj-_pr6jyL