
Fix_CBOR2_LZMA_Stream_Mismatch

Migrated `cbor2` from streaming file I/O to in-memory byte (de)serialization.

        PURPOSE:
        To resolve a persistent "buffer size mismatch" PyO3 panic loop. The Rust parsing backend of `cbor2` is incompatible with the chunked reads produced by Python's `lzma.LZMAFile` stream objects. Passing the stream directly to `cbor2.load()` triggered this panic, and because `get()` treats any exception raised while reading the cache as corruption, perfectly valid cached files were flagged as corrupted and re-fetched. Decoupling decompression from (de)serialization in both the read and write paths resolves the panic.

        IMPLEMENTATION DETAILS:
        - Replaced `cbor2.load(f)` with `cbor2.loads(f.read())` in `cache.py`. This forces `lzma` to fully decompress the file into an in-memory byte string before handing it to the Rust parser.
        - Replaced `cbor2.dump(data, f)` with `f.write(cbor2.dumps(data))` so the Rust serializer produces a complete byte string before `lzma` compresses it and writes it to disk, avoiding stream-buffer management conflicts.
        - Updated inline documentation to explicitly warn future contributors against reverting to the streaming `load(f)` / `dump(data, f)` calls when working with `lzma`.
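The read/write pattern described above can be sketched as a pair of generic helpers. This is a minimal sketch, not the code in `cache.py`: the names `read_cached` and `write_cached` are hypothetical, and the codec is passed in as a callable (for example `cbor2.loads` / `cbor2.dumps`) so that the `lzma` file object itself is never handed to the serializer.

```python
import lzma
from pathlib import Path
from typing import Any, Callable


# Hypothetical helpers: the codec only ever sees a complete bytes object,
# never the lzma file object.

def read_cached(path: Path, loads: Callable[[bytes], Any]) -> Any:
    """Decompress the whole file first, then decode the bytes in one call."""
    with lzma.open(path, "rb") as f:
        raw = f.read()  # full decompression into memory
    return loads(raw)


def write_cached(path: Path, data: Any, dumps: Callable[[Any], bytes]) -> None:
    """Encode to bytes first, then hand the whole block to the compressor."""
    payload = dumps(data)  # full serialization into memory
    with lzma.open(path, "wb") as f:
        f.write(payload)
```

With `cbor2` installed, usage would look like `write_cached(path, data, cbor2.dumps)` followed by `read_cached(path, cbor2.loads)`. The trade-off is that the full decoded payload is held in memory as a single `bytes` object, which is reasonable for cache entries of modest size.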
Thomas Knott, 2 weeks ago
parent
commit 01906da5e8
1 changed file with 11 additions and 8 deletions
  cache.py (+11, -8)

cache.py

@@ -37,18 +37,18 @@ def get(url: str, *, json=True, headers=None, expiry=datetime.timedelta(minutes=
 		if cache_path.stat().st_mtime > time.time() - expiry.total_seconds(): # less than 10 minutes old
 			with lzma.open(cache_path, 'rb') as f:
 				if json:
-					return cbor2.load(f)
+					# EXTREME DETAIL: We use cbor2.loads(f.read()) instead of cbor2.load(f).
+					# The cbor2 PyO3 Rust backend expects a continuous memory buffer. Attempting to
+					# read directly from an LZMA streaming object feeds it decompressed chunks, 
+					# which causes memory alignment panics ("buffer size mismatch").
+					# By calling f.read() first, we force Python to fully decompress the file into
+					# a raw byte string in memory, which cbor2 parses without error.
+					return cbor2.loads(f.read())
 				else:
 					return f.read().decode('utf-8')
 	except FileNotFoundError:
 		pass # fall through
 	except BaseException as e:
-		# EXTREME DETAIL: PyO3 (the Rust bindings for Python used by cbor2) maps Rust panics
-		# to `BaseException` rather than standard `Exception`. This means our previous
-		# `except Exception:` block was completely bypassed by the pyo3_runtime.PanicException!
-		# By expanding this to BaseException, we catch the panic. However, we MUST explicitly
-		# re-raise KeyboardInterrupt and SystemExit so we don't accidentally break the user's
-		# ability to Ctrl+C out of the script!
 		if isinstance(e, (KeyboardInterrupt, SystemExit)):
 			raise
 		print(f"Warning: Corrupted cache detected for {url} ({type(e).__name__}). Fetching fresh data...")
@@ -59,7 +59,10 @@ def get(url: str, *, json=True, headers=None, expiry=datetime.timedelta(minutes=
 	with lzma.open(cache_path, 'wb') as f:
 		if json:
 			data = r.json()
-			cbor2.dump(data, f)
+			# EXTREME DETAIL: Similarly, we use cbor2.dumps() to serialize the dictionary to bytes
+			# in memory first, and then write the entire byte block to the LZMA stream at once.
+			# This prevents the Rust backend from attempting to manage the compressed stream buffer.
+			f.write(cbor2.dumps(data))
 		else:
 			data = r.text
 			f.write(data.encode('utf-8'))