[Paper Review] Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence

๐ŸŠ ๋…ผ๋ฌธ ๋งํฌ: https://arxiv.org/pdf/1903.09588.pdf

Sun, C., Huang, L., & Qiu, X. (2019). Utilizing BERT for aspect-based sentiment analysis via constructing auxiliary sentence. arXiv preprint arXiv:1903.09588.


   Conventional sentiment analysis predicts only the overall sentiment of a review text. As a result, it cannot fully capture the varied sentiments a consumer expresses toward each attribute or entity. Aspect-based sentiment analysis (ABSA) directly learns and predicts the sentiment toward each of the several aspects that appear in a text. The goal of ABSA is to map the fine-grained polarity expressed in a review to each aspect. This makes it possible to evaluate the aggregated sentiments that users express toward each aspect across many reviews.

 

As in the figure above, a review that rates the Location aspect positively but the Food aspect negatively cannot be adequately represented by a single overall sentiment. ABSA therefore captures richer sentiment information. The paper constructs an auxiliary sentence so that ABSA can be handled as a sentence-pair classification task with BERT. The authors note that this setup resembles QA (question answering) and NLI (natural language inference) tasks.

 

1. Introduction


  ABSA aims to map fine-grained polarity to specific aspects. By combining the per-aspect scores from the reviews written by each user, one can evaluate aggregated sentiments and obtain a more granular understanding of a product or service. Earlier ABSA methods relied heavily on feature engineering. With the recent arrival of models such as ELMo, GPT, and BERT, much of that feature-engineering effort has been reduced. BERT in particular has been highly effective on QA and NLI tasks. However, applying BERT directly to ABSA had not yet produced much improvement. The paper constructs an auxiliary sentence from the aspect and thereby casts ABSA as a sentence-pair classification task. Concretely, it fine-tunes a pre-trained BERT model and achieves state-of-the-art (SOTA) results. The contributions of the paper are as follows.

  • It proposes a new method that converts (T)ABSA into a sentence-pair classification task.
  • It fine-tunes a pre-trained BERT model and achieves SOTA results on the SentiHood and SemEval-2014 Task 4 datasets.

 

2. Methodology


2.1 Task description

In TABSA, a sentence $s$ consists of a series of words $\{ w_1, \ldots, w_m \}$.

Some of these words, $\{ w_{i_1}, \ldots, w_{i_k} \}$, are pre-defined target words $\{ t_1, \ldots, t_k \}$.

 

The paper defines TABSA as a three-class classification problem. Given a sentence $s$, a set of target entities $T$, and a fixed aspect set $A = \{ \text{general}, \text{price}, \text{transit-location}, \text{safety} \}$, the model predicts $y \in \{ \text{positive}, \text{negative}, \text{none} \}$ for every target-aspect pair.

 

In the example above, $y$ is negative for the (LOCATION2, price) pair and none for the (LOCATION1, price) pair.
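The label space described above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the helper names `enumerate_pairs` and `label_pairs` are hypothetical, and the annotation dictionary contains only the single non-none label that the text mentions, so every other pair falls back to `none` purely for the sake of the example.

```python
from itertools import product

# Fixed aspect set and label set as given in the task definition.
ASPECTS = ["general", "price", "transit-location", "safety"]
LABELS = ["positive", "negative", "none"]


def enumerate_pairs(targets, aspects=ASPECTS):
    """Return every (target, aspect) pair that needs a prediction."""
    return list(product(targets, aspects))


def label_pairs(pairs, annotated):
    """Attach a label to each pair; pairs without an annotation get 'none'."""
    return {pair: annotated.get(pair, "none") for pair in pairs}


targets = ["LOCATION1", "LOCATION2"]
pairs = enumerate_pairs(targets)

# Only the annotation mentioned in the text; everything else is assumed 'none'
# here for illustration and may differ from the real dataset.
annotated = {("LOCATION2", "price"): "negative"}
labels = label_pairs(pairs, annotated)

print(len(pairs))                       # number of classification instances
print(labels[("LOCATION2", "price")])
print(labels[("LOCATION1", "price")])
```

With two targets and four aspects, one sentence expands into eight three-way classification instances.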

2.2 Construction of the auxiliary sentence

  The paper converts TABSA into a sentence-pair classification task using the following four ways of constructing the auxiliary sentence.

Here, S.P. stands for sentiment polarity, w/o for without, and w/ for with.

Sentences for QA-M

QA-M generates a question from each target-aspect pair. For example, for the pair (LOCATION1, safety) the generated question is "what do you think of the safety of location - 1 ?". The question format is the same for every pair.

 

Sentences for NLI-M

NLI-M does not require a strictly well-formed sentence for the target-aspect pair. The generated text is not a standard sentence but a simple pseudo-sentence. For example, the pair (LOCATION1, safety) yields "location - 1 - safety".
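The two multi-class templates (QA-M and NLI-M) can be sketched as simple string formatting. The function names and the `format_target` helper, which turns a placeholder like `LOCATION1` into `location - 1`, are assumptions made for this illustration; only the output strings come from the examples in the text.

```python
def format_target(target):
    """Turn a placeholder such as 'LOCATION1' into 'location - 1'."""
    name = target.rstrip("0123456789")   # alphabetic part, e.g. 'LOCATION'
    index = target[len(name):]           # trailing digits, e.g. '1'
    return f"{name.lower()} - {index}" if index else name.lower()


def qa_m(target, aspect):
    """QA-M: a question with the same format for every target-aspect pair."""
    return f"what do you think of the {aspect} of {format_target(target)} ?"


def nli_m(target, aspect):
    """NLI-M: a pseudo-sentence made of just the target and the aspect."""
    return f"{format_target(target)} - {aspect}"


print(qa_m("LOCATION1", "safety"))
print(nli_m("LOCATION1", "safety"))
```

In both cases the auxiliary sentence carries no label, so the classifier still predicts one of the three polarity classes.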

 

Sentences for QA-B

QA-B adds the label information to the auxiliary sentence and turns TABSA into a binary classification problem, where $\text{label} \in \{ \text{yes}, \text{no} \}$.

Three sentences are generated for each target-aspect pair, as follows.

  • “the polarity of the aspect safety of location - 1 is positive”
  • “the polarity of the aspect safety of location - 1 is negative”
  • “the polarity of the aspect safety of location - 1 is none”.

Sentences for NLI-B

NLI-B differs from QA-B only in that the auxiliary sentence becomes a pseudo-sentence. Examples are as follows.

  • “location - 1 - safety - positive”
  • “location - 1 - safety - negative”
  • “location - 1 - safety - none”.
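The binary variants (QA-B and NLI-B) can be sketched as follows. Each target-aspect pair expands into three auxiliary sentences, exactly one of which is labeled `yes`. The function names are hypothetical, and the `decode` step, which picks the polarity whose sentence receives the highest `yes` probability, reflects my reading of how the paper turns the binary outputs back into a class; treat it as an assumption rather than the authors' exact procedure.

```python
LABELS = ("positive", "negative", "none")


def qa_b(target_text, aspect, gold):
    """QA-B: one full sentence per candidate polarity, labeled yes/no."""
    return [
        (f"the polarity of the aspect {aspect} of {target_text} is {label}",
         "yes" if label == gold else "no")
        for label in LABELS
    ]


def nli_b(target_text, aspect, gold):
    """NLI-B: one pseudo-sentence per candidate polarity, labeled yes/no."""
    return [
        (f"{target_text} - {aspect} - {label}",
         "yes" if label == gold else "no")
        for label in LABELS
    ]


def decode(yes_probs):
    """Pick the polarity whose auxiliary sentence has the highest P(yes)."""
    return max(LABELS, key=lambda label: yes_probs[label])


for sentence, label in nli_b("location - 1", "safety", gold="none"):
    print(label, "|", sentence)

# Hypothetical model outputs, one P(yes) per candidate polarity.
print(decode({"positive": 0.1, "negative": 0.7, "none": 0.4}))
```

The cost of this formulation is that every pair is scored three times, once per candidate polarity.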

 

With these four ways of generating auxiliary sentences, TABSA is converted from a single-sentence classification problem into a sentence-pair classification problem. An example of sentence-pair classification is shown in the figure below.
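To make the sentence-pair format concrete, here is a minimal sketch of how the original sentence and an auxiliary sentence could be packed into one BERT-style input. It uses whitespace splitting as a stand-in for BERT's WordPiece tokenizer, and both the function name `pack_pair` and the example review sentence are hypothetical.

```python
def pack_pair(sentence, auxiliary):
    """Build [CLS] A [SEP] B [SEP] tokens with matching segment ids."""
    tokens_a = sentence.lower().split()    # stand-in for WordPiece tokens
    tokens_b = auxiliary.lower().split()
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment 0 covers [CLS], sentence A and the first [SEP];
    # segment 1 covers sentence B and the final [SEP].
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


# Hypothetical review sentence paired with a QA-M style auxiliary sentence.
tokens, segment_ids = pack_pair(
    "location - 1 is a very safe area",
    "what do you think of the safety of location - 1 ?",
)
for tok, seg in zip(tokens, segment_ids):
    print(seg, tok)
```

A real pipeline would also map tokens to vocabulary ids and pad or truncate to a maximum length, which is omitted here.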

 

2.3 Fine-tuning pre-trained BERT

The pre-trained BERT model is fine-tuned for the TABSA task. The final hidden state of the first input token is used as the input to a single added classification layer, and a softmax over its output gives the probability of each class.
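The classification head described above amounts to one linear map followed by a softmax. The sketch below uses plain Python lists and a toy hidden size of 3 instead of 768; the weights are made up, and the bias term is an assumption (common in implementations, but not stated in the text).

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(cls_vector, weights, bias=None):
    """Apply a K x H linear layer to the first-token state, then softmax."""
    if bias is None:
        bias = [0.0] * len(weights)        # assumed: no bias unless given
    logits = [
        sum(w * c for w, c in zip(row, cls_vector)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Toy example: hidden size H = 3, K = 3 classes (positive, negative, none).
cls_vector = [1.0, 0.0, -1.0]              # stand-in for the final hidden state
weights = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0]]
probs = classify(cls_vector, weights)
print(probs)
```

For the binary variants (QA-B, NLI-B) the same head is used with K = 2 outputs, one for yes and one for no.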

 

3. Experiments


3.1 Datasets

  ๋ชจ๋ธ ํ‰๊ฐ€์— ์‚ฌ์šฉ๋œ SentiHood(Saeidi et al., 2016) ๋ฐ์ดํ„ฐ์…‹์€ 5,215๊ฐœ์˜ ๋ฌธ์žฅ๊ณผ 3,862๊ฐœ์˜ ๊ฐœ๋ณ„ ํƒ€๊ฒŸ์œผ๋กœ ์ด๋ฃจ์–ด์ ธ์žˆ๋‹ค. ๋‚˜๋จธ์ง€๋Š” ๋‹ค์ค‘ ํƒ€๊ฒŸ์„ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค. ๊ฐ ๋ฌธ์žฅ์€ target-aspcet pair {t,a}์˜ ๋ฆฌ์ŠคํŠธ์™€ sentiment polarity์ธ y๊ฐ’์„ ๊ฐ€์ง„๋‹ค. ๊ฒฐ๊ตญ sentence s์™€ target t๊ฐ€ ์ฃผ์—ฌ์กŒ์„ ๋•Œ, t์— ๋Œ€ํ•ด ์–ธ๊ธ‰๋œ aspect a๋ฅผ ์ฐพ๊ณ , sentiment polarity์ธ y๋ฅผ ์ฐพ๋Š” ๊ฒƒ์ด ๋ชฉํ‘œ์ด๋‹ค.

๋˜ํ•œ SemEval-2014 Task 4 (Pontiki et al., 2014)๋Š” ABSA์—์„œ ์œ ๋ช…ํ•œ ๋ฐ์ดํ„ฐ ์…‹์ด๋‹ค.

 

3.2 Hyperparameters

  • Model: BERT-base
  • The number of Transformer blocks: 12
  • The hidden layer size: 768
  • The number of self-attention heads: 12
  • The total number of parameters for pre-trained model: 110M
  • The dropout probability: 0.1
  • The number of epochs: 4
  • The initial learning rate: 2e-5
  • The batch size: 24
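For reference, the settings listed above can be collected into a small configuration object. This is just a convenience sketch: the class name `FineTuneConfig` and its field names are hypothetical and do not correspond to any particular library.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters as listed in the review (field names are illustrative)."""
    model_name: str = "BERT-base"
    num_transformer_blocks: int = 12
    hidden_size: int = 768
    num_attention_heads: int = 12
    dropout_prob: float = 0.1
    num_epochs: int = 4
    learning_rate: float = 2e-5
    batch_size: int = 24

    @property
    def head_dim(self):
        """Per-head dimension implied by hidden size and head count."""
        return self.hidden_size // self.num_attention_heads

    def steps_per_epoch(self, num_examples):
        """Number of optimizer steps needed to see every example once."""
        return math.ceil(num_examples / self.batch_size)


cfg = FineTuneConfig()
print(cfg.head_dim)
print(cfg.steps_per_epoch(100))
```

Note that the number of training examples depends on the auxiliary-sentence variant, since each sentence is expanded into one instance per target-aspect pair (and per candidate polarity in the binary variants).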

3.3 Exp & Results

On SemEval-2014 Task 4 Subtask 3 (Aspect Category Detection), the proposed model improves over the benchmark models, as shown below.

On SemEval-2014 Task 4 Subtask 4 (Aspect Category Polarity), accuracy (%) also improves, as shown below.

4. Discussion

5. Conclusion

See the paper.

๋ฐ˜์‘ํ˜•