
All Posts (116)
[Paper Review] LARGER LANGUAGE MODELS DO IN-CONTEXT LEARNING DIFFERENTLY Paper: LARGER LANGUAGE MODELS DO IN-CONTEXT LEARNING DIFFERENTLY https://arxiv.org/abs/2303.03846 We study how in-context learning (ICL) in language models is affected by semantic priors versus input-label mappings. We investigate two setups, ICL with flipped labels and ICL with semantically-unrelated labels, across various model families (GP…
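To make the flipped-labels setup concrete, here is a minimal sketch of how such a prompt might be built; the sentiment task, example texts, and label names are illustrative assumptions, not taken from the paper.

```python
# Hypothetical flipped-label ICL prompt: the demonstration labels are
# intentionally inverted, so a model that follows its semantic priors
# will contradict the input-label mapping shown in context.
examples = [
    ("The movie was fantastic.", "negative"),     # true label would be "positive"
    ("I hated every minute of it.", "positive"),  # true label would be "negative"
]
query = "An absolute masterpiece."

prompt = "\n\n".join(f"Review: {text}\nSentiment: {label}" for text, label in examples)
prompt += f"\n\nReview: {query}\nSentiment:"
print(prompt)
```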
[Paper Review] A Survey on In-context Learning Paper: A Survey on In-context Learning Paper link: https://arxiv.org/abs/2301.00234 With the increasing ability of large language models (LLMs), in-context learning (ICL) has become a new paradigm for natural language processing (NLP), where LLMs make predictions only based on contexts augmented with a few examples. It has been a new tren… Why I chose this paper: it is relatively recent (June '23)…
[huggingface🤗] Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA This post is a loose translation of the Hugging Face blog post 'Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA'. https://huggingface.co/blog/4bit-transformers-bitsandbytes LLMs are known to be large, and running or training t…
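As a quick taste of what the post covers, here is a minimal sketch of loading a model in 4-bit NF4 with double quantization via transformers and bitsandbytes; the model name is an arbitrary placeholder, and exact arguments may vary across library versions.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with nested (double) quantization;
# matmuls are computed in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# "facebook/opt-350m" is just a placeholder model for illustration.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    quantization_config=bnb_config,
    device_map="auto",
)
```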
[huggingface🤗] 8-bit Matrix Multiplication for transformers This post is a loose translation of the article below, written as a study exercise. I cover the introduction and background lightly, but try to capture the parts that need real understanding in as much detail as possible. https://huggingface.co/blog/hf-bitsandbytes-integration A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using Hugging Face Transformers, Accelerate and bitsan…
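For context, the linked article builds on absmax int8 quantization; here is a minimal toy sketch of that idea in plain PyTorch (an illustration of the concept, not the bitsandbytes kernel itself).

```python
import torch

def absmax_quantize(x: torch.Tensor):
    """Toy absmax quantization: scale by 127 / max(|x|), round to int8."""
    scale = 127.0 / x.abs().max()
    q = (x * scale).round().to(torch.int8)
    return q, scale

x = torch.randn(4, 4)
q, scale = absmax_quantize(x)
x_hat = q.to(torch.float32) / scale  # dequantize; matches x up to rounding error
print((x - x_hat).abs().max())
```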
[Paper Review] Let's verify step by step Paper: Let's verify step by step Paper link: https://arxiv.org/abs/2305.20050 In recent years, large language models have greatly improved in their ability to perform complex multi-step reasoning. However, even state-of-the-art models still regularly produce logical mistakes. To train more reliable models, we can turn either to outc… Summary: Language models are impressive, but they still make logical mistakes. Recently…
[Terminology] MSE loss vs Cross Entropy loss: what's the difference in code? If you landed on this post, you probably noticed while implementing them that the two barely differ, and got curious. Same here. My rough thinking was: both converge toward the answer anyway, so training should go fine with either. I understood the distributional argument; the usual explanation goes: the negative log-likelihood of a Gaussian distribution yields the MSE loss → so use it for continuous random variables; the negative log-likelihood of a categorical distribution yields the CE loss → so use it for discrete random variables. (This is summarized well here, so have a look.) Fine, that's the right answer in principle. But does it actually perform better? On that, nothing was said. It's just that Gaussian distributions are continuous…
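As a minimal sketch of the "barely differ in code" observation, here is how the two losses might be applied to the same classification logits in PyTorch; the batch size and class count are arbitrary assumptions.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, 10)           # batch of 8 examples, 10 classes (arbitrary)
targets = torch.randint(0, 10, (8,))  # integer class labels

# Cross entropy takes raw logits and integer class indices directly.
ce = F.cross_entropy(logits, targets)

# MSE on the same task: compare softmax probabilities to one-hot targets.
probs = F.softmax(logits, dim=-1)
one_hot = F.one_hot(targets, num_classes=10).float()
mse = F.mse_loss(probs, one_hot)

print(ce.item(), mse.item())  # both are scalars you can backprop through
```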
[Paper Review] Less is More: CLIPBERT for Video-and-Language Learning via Sparse Sampling Paper: Less is More: CLIPBERT for Video-and-Language Learning via Sparse Sampling Paper link: https://arxiv.org/abs/2102.06183 The canonical approach to video-and-language learning (e.g., video question answering) dictates a neural model to learn from offline-extracted dense video features from vision models and text features fro…
[Paper Review] CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval Paper: CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval Paper link: https://arxiv.org/abs/2104.08860 Video-text retrieval plays an essential role in multi-modal research and has been widely used in many real-world web applications. The CLIP (Contrastive Language-Image Pre-training), an image-language pre-t…