
NLP (111)
[huggingface🤗] Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA This post is a loose translation of the Hugging Face blog article 'Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA'. https://huggingface.co/blog/4bit-transformers-bitsandbytes Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA LLMs are known to be large, and running or training t..
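To give a concrete sense of what the linked post covers, here is a minimal sketch of loading a causal LM in 4-bit through the transformers + bitsandbytes integration. The model name and config values are illustrative assumptions, not taken from the post, and bitsandbytes requires a CUDA GPU.

```python
# Minimal sketch: 4-bit (NF4) loading via transformers + bitsandbytes.
# Model name and config values are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "facebook/opt-350m"  # hypothetical example model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # run the actual matmuls in bf16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

inputs = tokenizer("Quantization makes LLMs", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```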
[huggingface🤗] 8-bit Matrix Multiplication for transformers This post is a loose translation of the article below, written for study purposes. The introduction and background are covered lightly, while the parts that need to be understood are covered in as much detail as possible. https://huggingface.co/blog/hf-bitsandbytes-integration A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using Hugging Face Transformers, Accelerate and bitsan..
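Similarly, a minimal sketch of the 8-bit (LLM.int8()) path that article describes, assuming the transformers + accelerate + bitsandbytes stack; the model name is an illustrative assumption and a CUDA GPU is required.

```python
# Minimal sketch: 8-bit (LLM.int8()) loading via transformers + bitsandbytes.
# Model name is an illustrative assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigscience/bloom-560m"  # hypothetical example model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,   # int8 weights with mixed-precision decomposition for outliers
    device_map="auto",   # let accelerate place the weights
)

inputs = tokenizer("8-bit matrix multiplication", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```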
[Paper Review] Let's Verify Step by Step Paper: Let's Verify Step by Step Link: https://arxiv.org/abs/2305.20050 Let's Verify Step by Step In recent years, large language models have greatly improved in their ability to perform complex multi-step reasoning. However, even state-of-the-art models still regularly produce logical mistakes. To train more reliable models, we can turn either to outc arxiv.org Summary: Language models are impressive, but they still make logical mistakes. Recently..
[Terminology] MSE loss vs Cross Entropy loss What is the actual difference in code? If you found a post like this, you probably noticed that the two barely differ when you implement them and got curious. So did I. I roughly thought: both converge toward the answer anyway, so training should work fine with either. I understood the distributional view, but what you usually find is this: derive the loss from a Gaussian distribution and you get MSE loss → so use it for continuous random variables; derive it from a Categorical distribution and you get CE loss → so use it for discrete random variables. (It is well summarized here, so refer to that.) Fine, that is correct in principle. But is it also better in terms of performance? Nothing was said about that. Just that the Gaussian distribution is continuous..
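Since the question is what actually differs in code, a small sketch contrasting the two losses on a toy classification batch (the shapes and values are illustrative):

```python
# Minimal sketch: MSE on one-hot targets vs cross-entropy on class indices (PyTorch).
import torch
import torch.nn.functional as F

logits = torch.randn(4, 3)           # batch of 4 examples, 3 classes (toy values)
labels = torch.tensor([0, 2, 1, 0])  # class indices

# Cross-entropy: log-softmax + NLL of a categorical distribution over classes.
ce = F.cross_entropy(logits, labels)

# MSE: treat the softmax output as a regression toward the one-hot target,
# i.e. the Gaussian-likelihood view of the same prediction.
probs = torch.softmax(logits, dim=-1)
one_hot = F.one_hot(labels, num_classes=3).float()
mse = F.mse_loss(probs, one_hot)

print(ce.item(), mse.item())  # both shrink as predictions approach the labels,
                              # but their gradients and convergence behavior differ
```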
[Paper Review] Less is More: CLIPBERT for Video-and-Language Learning via Sparse Sampling Paper: Less is More: CLIPBERT for Video-and-Language Learning via Sparse Sampling Link: https://arxiv.org/abs/2102.06183 Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling The canonical approach to video-and-language learning (e.g., video question answering) dictates a neural model to learn from offline-extracted dense video features from vision models and text features fro..
[Paper Review] CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval Paper: CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval Link: https://arxiv.org/abs/2104.08860 CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval Video-text retrieval plays an essential role in multi-modal research and has been widely used in many real-world web applications. The CLIP (Contrastive Language-Image Pre-training), an image-language pre-t..
[dataset] Korean Information Retrieval Dataset
Name | Task type | Train | Dev | Test | Characteristic | Link
Miracl | IR | 868 | 213 | - | Multilingual dataset | https://huggingface.co/datasets/miracl/miracl
KLUE | QA | 17554 | 5841 | - | Korean version of GLUE | https://github.com/KLUE-benchmark/KLUE
KorQUAD v2 | QA | 83486 | 10165 | - | Korean version of SQUAD | https://korquad.github.io/
뉴스기사 기계독해데이터 (news-article MRC data) | MRC | 200K | | | AI hub | https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&ai..
[Paper Review] Locally Typical Sampling Paper: Locally Typical Sampling Link: https://arxiv.org/abs/2202.00666 Locally Typical Sampling Today's probabilistic language generators fall short when it comes to producing coherent and fluent text despite the fact that the underlying models perform well under standard metrics, e.g., perplexity. This discrepancy has puzzled the language generation arxiv.org I will not touch the mathematical proofs and understanding. The paper has mathematical proofs and ..
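For reference, locally typical sampling is exposed in the transformers generation API as the typical_p parameter; a minimal usage sketch, where the model name and the typical_p value are illustrative assumptions:

```python
# Minimal sketch: locally typical sampling via typical_p in transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # hypothetical example model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Locally typical sampling keeps tokens whose information content", return_tensors="pt")
out = model.generate(
    **inputs,
    do_sample=True,   # sampling must be enabled for typical_p to take effect
    typical_p=0.95,   # keep the locally typical set covering ~95% of the probability mass
    max_new_tokens=30,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```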