Running on CPU

Before experiencing inference acceleration with NeuronCores, let's first run natural language processing inference with a BERT model on the CPU.

Inf1 instances are powered by 2nd generation Intel® Xeon® Scalable processors. An inf1.2xlarge instance provides 8 vCPUs.
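As a quick sanity check (not part of the original workshop steps), you can confirm the vCPU count from Python; on inf1.2xlarge it should report 8:

```python
import os

# Number of vCPUs visible to the OS; the value depends on the instance type
print(os.cpu_count())
```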

Step 1. Install transformers

Install HuggingFace's transformers package.

pip install transformers==4.6.0
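Before moving on, you can optionally confirm the package is importable. This is a hypothetical sanity check, not a workshop step:

```python
import importlib.util

# find_spec returns None when the package cannot be located
if importlib.util.find_spec("transformers") is None:
    print("transformers is not installed")
else:
    print("transformers found")
```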

Step 2. Create the inference Python script

Create an inference script named infer_bert_cpu.py with the following contents.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and the BERT model fine-tuned on MRPC (paraphrase detection);
# return_dict=False makes the model return a tuple of outputs
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc", return_dict=False)

# Setup some example inputs
sequence_0 = "The company HuggingFace is based in New York City"
sequence_1 = "Apples are especially bad for your health"
sequence_2 = "HuggingFace's headquarters are situated in Manhattan"

paraphrase = tokenizer.encode_plus(sequence_0, sequence_2, max_length=128, padding='max_length', truncation=True, return_tensors="pt")
not_paraphrase = tokenizer.encode_plus(sequence_0, sequence_1, max_length=128, padding='max_length', truncation=True, return_tensors="pt")

# Convert example inputs to a format that is compatible with TorchScript tracing
example_inputs_paraphrase = paraphrase['input_ids'], paraphrase['attention_mask'], paraphrase['token_type_ids']
example_inputs_not_paraphrase = not_paraphrase['input_ids'], not_paraphrase['attention_mask'], not_paraphrase['token_type_ids']

# Run inference on both example inputs
paraphrase_classification_logits = model(*example_inputs_paraphrase)
not_paraphrase_classification_logits = model(*example_inputs_not_paraphrase)

classes = ['not paraphrase', 'paraphrase']
paraphrase_prediction = paraphrase_classification_logits[0][0].argmax().item()
not_paraphrase_prediction = not_paraphrase_classification_logits[0][0].argmax().item()
print('BERT says that "{}" and "{}" are {}'.format(sequence_0, sequence_2, classes[paraphrase_prediction]))
print('BERT says that "{}" and "{}" are {}'.format(sequence_0, sequence_1, classes[not_paraphrase_prediction]))

Step 3. Run the inference script

Run the inference script infer_bert_cpu.py to perform model inference on the CPU.

python infer_bert_cpu.py

You should see the following output.

BERT says that "The company HuggingFace is based in New York City" and "HuggingFace's headquarters are situated in Manhattan" are paraphrase
BERT says that "The company HuggingFace is based in New York City" and "Apples are especially bad for your health" are not paraphrase
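Since this CPU run serves as the baseline to compare against Neuron later, it can be useful to measure per-inference latency. Below is a minimal timing sketch; the `measure_latency` helper is our own illustration, not part of the workshop script:

```python
import time

def measure_latency(fn, warmup=2, iters=10):
    # Warm-up runs exclude one-time costs such as lazy initialization
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    # Mean wall-clock time per call, in milliseconds
    return (time.perf_counter() - start) / iters * 1000.0

# Hypothetical usage inside infer_bert_cpu.py, after the model is loaded:
# latency_ms = measure_latency(lambda: model(*example_inputs_paraphrase))
# print(f"mean latency: {latency_ms:.1f} ms")
```

Recording this number now makes the speedup from NeuronCores easy to quantify in the next section.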