[Paper review] Diffuser: Efficient Transformers with Multi-hop Attention Diffusion for Long Sequences

Paper: Diffuser: Efficient Transformers with Multi-hop Attention Diffusion for Long Sequences
Link: https://arxiv.org/abs/2210.11794

Abstract excerpt: "Efficient Transformers have been developed for long sequence modeling, due to their subquadratic memory and time complexity. Sparse Transformer is a popular approach to improving t.."