How a new attention mechanism enables 8x longer context lengths while cutting VRAM requirements in half for LLM training on consumer hardware.