After I finished modifying the code to work in a "Unix-like" "one-shot" fashion, I ran into an issue during testing (one the standard C implementation also shares but manages to hide better) — the numbers it generated stopped being random, but only on macOS. It turns out that the C standard doesn't actually specify how rand is implemented under the hood, and the macOS implementation doesn't mix the bits of the seed around very well until you've generated at least one random number.
So, where is "Compressing model" coming from? I can search for it in the transformers package with grep -r "Compressing model" ., but nothing comes up. Searching across all installed packages, there are four hits in the vLLM compressed_tensors package. After some investigation to narrow it down, it seems to be coming from the ModelCompressor.compress_model function, as that's called from transformers in CompressedTensorsHfQuantizer._process_model_before_weight_loading.