AIFFEL Life

[Day59] Let's Look into Bias in Embeddings

nevermet 2020. 12. 25. 17:43

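When I first heard the phrase "bias in embeddings", I did not really understand what it meant. After looking into it, I learned that the idea is this: if the natural language data an AI learns from is biased, the AI's behavior can be biased as well. Here are some resources worth reading.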

1. Predictive policing algorithms are racist. They need to be dismantled.

www.technologyreview.com/2020/07/17/1005396/predictive-policing-algorithms-racist-dismantled-machine-learning-bias-criminal-justice/

2. Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings (paper)

arxiv.org/pdf/1607.06520.pdf

3. Semantics derived automatically from language corpora necessarily contain human biases (paper)

arxiv.org/pdf/1608.07187.pdf

4. Project Implicit

implicit.harvard.edu/implicit/education.html

5. Text Embedding Models Contain Bias. Here's Why That Matters.

developers.googleblog.com/2018/04/text-embedding-models-contain-bias.html

6. Google WEAT (data): GoogleNews-vectors-negative300.bin.gz (1.5G), pretrained word2vec vectors

drive.google.com/u/0/uc?id=0B7XkCwpI5KDYNlNUTTlSS21pQmM&export=download

7. Synopsis data

aiffelstaticprd.blob.core.windows.net/media/documents/synopsis.zip

8. TF-IDF(Term Frequency-Inverse Document Frequency)

wikidocs.net/31698
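
To make the idea of measuring bias a little more concrete, here is a minimal pure-Python sketch of the WEAT (Word Embedding Association Test) effect size described in the paper from item 3. It is only an illustration under stated assumptions: the 2-D vectors are toy values invented for this example (real vectors would come from a pretrained embedding such as the data in item 6), the function names are hypothetical, and the population standard deviation is assumed for the normalization, which some implementations replace with the sample standard deviation.

```python
import math
from statistics import mean, pstdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, A, B):
    """s(w, A, B): mean similarity of w to attribute set A minus that to B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)


def weat_effect_size(X, Y, A, B):
    """Difference between the mean associations of target sets X and Y,
    divided by the standard deviation of the associations over X and Y
    combined (population standard deviation is an assumption here)."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (mean(s_x) - mean(s_y)) / pstdev(s_x + s_y)


# Toy 2-D vectors, invented for illustration only (not real embeddings).
X = [[1.0, 0.1], [0.9, 0.2]]  # target set X, close to attribute A
Y = [[0.1, 1.0], [0.2, 0.9]]  # target set Y, close to attribute B
A = [[1.0, 0.0]]              # attribute set A
B = [[0.0, 1.0]]              # attribute set B

score = weat_effect_size(X, Y, A, B)
print(f"WEAT effect size: {score:.3f}")
```

A score near zero would suggest that the two target sets are about equally associated with both attribute sets. A large positive score suggests that X leans toward A and Y toward B, and swapping X and Y flips the sign.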

 
