[Paper Review] DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
Paper title: DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
Paper link: https://arxiv.org/abs/2309.03883
Despite their impressive capabilities, large language models (LLMs) are prone to hallucinations, i.e., generating content that deviates from facts seen during pretraining. We propose a simple..

[Paper Review] Contrastive Decoding: Open-ended Text Generation as Optimization
Paper title: Contrastive Decoding: Open-ended Text Generation as Optimization
Paper link: https://arxiv.org/abs/2210.15097
Given a language model (LM), maximum probability is a poor decoding objective for open-ended generation, because it produces short and repetitive text. On the other hand, sampling can often produce incoherent text that drifts..

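[Terminology] self-BLEU (If you are not familiar with BLEU, read about it first.)
Core problem
Language models usually generate K responses using beam search.
But if those responses are all nearly the same, there is little point in generating several of them.
In other words, a model that generates more diverse responses can be considered better.
However, there was no metric for evaluating how diverse the generated responses are.
Solution
Goal: have the model generate K responses and measure their diversity using BLEU.
Method: compute the BLEU score between every pair of the K responses and average the scores. Because BLEU measures n-gram overlap, a higher self-BLEU means the responses are more similar to each other, i.e., less diverse.
Example
We had GPT-3 generate 5 responses (A, B, C, D, E).
Compute the BLEU score for every pair.
All pairs: [(A, B), (A, C), (A, D), (A, E), (B, C), ...... (D, E)]
The number of pairs is..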
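The pairwise procedure above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the helper names (ngrams, sentence_bleu, self_bleu) are hypothetical, tokenization is naive whitespace splitting, and the BLEU here is a simplified sentence-level version with add-one smoothing. Since BLEU is asymmetric (hypothesis vs. reference), this sketch scores each pair in both directions and averages them; other implementations instead score each response against all the others as multiple references.

```python
import math
from collections import Counter
from itertools import combinations


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Simplified sentence-level BLEU against a single reference.

    Uses clipped n-gram precision with add-one smoothing (so a missing
    n-gram order does not zero out the score) and the usual brevity penalty.
    """
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        ref_counts = ngrams(reference, n)
        # Clip each hypothesis n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        log_precision_sum += math.log((overlap + 1) / (total + 1))
    # Brevity penalty: penalize hypotheses shorter than the reference.
    hyp_len, ref_len = len(hypothesis), len(reference)
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / max(hyp_len, 1))
    return bp * math.exp(log_precision_sum / max_n)


def self_bleu(responses, max_n=4):
    """Average BLEU over every unordered pair of responses.

    Higher values mean the responses overlap more (less diversity).
    """
    if len(responses) < 2:
        raise ValueError("self-BLEU needs at least two responses")
    tokenized = [r.split() for r in responses]
    scores = []
    for a, b in combinations(tokenized, 2):
        # Score both directions because BLEU is not symmetric.
        scores.append((sentence_bleu(a, b, max_n) + sentence_bleu(b, a, max_n)) / 2)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    same = ["the cat sat on the mat"] * 5
    varied = ["the cat sat on the mat", "a dog ran in the park",
              "birds fly south in winter", "the cat slept on the sofa",
              "rain fell all night long"]
    print("identical responses:", self_bleu(same))
    print("varied responses:", self_bleu(varied))
```

With 5 responses, combinations yields every unordered pair (A, B), (A, C), ..., (D, E). Identical responses score the maximum of 1.0, and more varied responses push the score down.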

[Paper Review] ConvGQR: Generative Query Reformulation for Conversational Search
Paper title: ConvGQR: Generative Query Reformulation for Conversational Search
Paper link: https://arxiv.org/abs/2305.15645
In conversational search, the user's real search intent for the current turn is dependent on the previous conversation history. It is challenging to determine a good search query from the whole conversation context. To avoi..

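[error] Freeze your parameters properly
Please don't do something as crazy as I did....
Wrong example
model.requires_grad = False
The problem is that this line runs without raising any error, so unless you notice a change in GPU usage or inspect the parameters separately, you will not notice the mistake. (On an nn.Module, requires_grad is not a built-in setting; the assignment just creates an ordinary attribute on the module object and leaves every parameter's requires_grad flag untouched.)
Correct example
for param in model.parameters(): param.requires_grad = False
I had copied the wrong version straight from some blog, and that blog seems to have been mistaken. Of course, searching now turns up plenty of good material. It was my own carelessness. Anyway, I'm stressing this in the hope that nobody repeats it.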
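A minimal PyTorch sketch that reproduces the mistake and checks the fix. It assumes PyTorch is installed; the tiny nn.Sequential model and the helper count_trainable are hypothetical, chosen only for illustration. The last part uses nn.Module.requires_grad_(False), a built-in method that sets the flag on all of a module's parameters and is equivalent to the loop.

```python
import torch.nn as nn


def count_trainable(module: nn.Module) -> int:
    """Number of parameter elements that will still receive gradients."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
trainable_before = count_trainable(model)

# Wrong: this silently creates a plain Python attribute on the module.
# No error is raised, and no parameter is frozen.
model.requires_grad = False
trainable_after_wrong = count_trainable(model)

# Correct: flip the flag on every parameter tensor.
for param in model.parameters():
    param.requires_grad = False
trainable_after_loop = count_trainable(model)

# Equivalent built-in on a fresh model: Module.requires_grad_(False).
other = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
other.requires_grad_(False)
trainable_after_builtin = count_trainable(other)

print(trainable_before, trainable_after_wrong,
      trainable_after_loop, trainable_after_builtin)
```

Counting trainable parameters like this after freezing is a cheap sanity check that catches the silent failure described above.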

[Paper Review] LORA-FA: MEMORY-EFFICIENT LOW-RANK ADAPTATION FOR LARGE LANGUAGE MODELS FINE-TUNING
Paper title: LORA-FA: MEMORY-EFFICIENT LOW-RANK ADAPTATION FOR LARGE LANGUAGE MODELS FINE-TUNING
Paper link: https://arxiv.org/abs/2308.03303
The low-rank adaptation (LoRA) method can largely reduce the amount of trainable parameters for fine-tuning large language models (LLMs), however, it still requires expensive activation m..

[Insight] Successful language model evals
Post title: Successful language model evals (Jason Wei)
Post link: https://www.jasonwei.net/blog/evals
Everybody uses evaluation benchmarks (“evals”), but I think they deserve more attention than they are currently getting. Evals are incentives for the research community, and breakthroughs are often closely linked to a huge performance jump on some eval
Large language models'..

[Paper Review] REPLUG: Retrieval-Augmented Black-Box Language Models
Paper title: REPLUG: Retrieval-Augmented Black-Box Language Models
Paper link: https://arxiv.org/abs/2301.12652
We introduce REPLUG, a retrieval-augmented language modeling framework that treats the language model (LM) as a black box and augments it with a tuneable retrieval model. Unlike prior retrieval-augmented LMs that train language models with special ..
