Megawarc
========

megawarc is useful if you have a .tar full of .warc.gz files and you really want one big .warc.gz. With megawarc you get your .warc.gz, but you can still restore the original .tar.

The megawarc tool looks for .warc.gz files in the .tar file and creates three files, which together form the megawarc:

* FILE.warc.gz is the concatenated .warc.gz
* FILE.tar contains any non-warc files from the .tar
* FILE.json.gz contains metadata

You need the JSON file to reconstruct the original .tar from
the .warc.gz and .tar files. The JSON file has the location
of every file from the original .tar file.
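
Concatenation works because several gzip files written back to back still form one valid gzip stream. The following is a minimal Python sketch of that idea, not megawarc's own code: `concat_members` is a hypothetical helper, and the two byte strings stand in for real .warc.gz files. It records an offset and size for each input, which is the kind of location information the JSON file keeps.

```python
import gzip


def concat_members(blobs):
    """Join independently gzipped blobs into one multi-member gzip stream.

    Returns the combined bytes and a list of (offset, size) pairs, one per
    input blob, so each original blob can be sliced back out later.
    """
    combined = bytearray()
    locations = []
    for blob in blobs:
        locations.append((len(combined), len(blob)))
        combined += blob
    return bytes(combined), locations


# Two stand-ins for separate .warc.gz files.
first = gzip.compress(b"first record\n")
second = gzip.compress(b"second record\n")

combined, locations = concat_members([first, second])

# The combined stream decompresses as a whole...
payload = gzip.decompress(combined)

# ...and each original file is still recoverable byte for byte.
offset, size = locations[1]
recovered = combined[offset:offset + size]
```

Here `payload` holds both records in order, and `recovered` is identical to `second`, which is why the original .tar can be rebuilt exactly.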

Metadata format
---------------

One line with a JSON object per file in the .tar.

```js
{
  "target": {
    "container": "warc" or "tar", // where is this file?
    "offset": number, // where in the tar/warc does this file start?
                      // for files in the tar this includes the tar header, which is
                      // copied to the tar.
    "size": number    // where does this file end?
                      // for files in the tar, this includes the padding to 512 bytes
  },
  "src_offsets": {
    "entry": number,     // where is this file in the original tar?
    "data": number,      // where does the data start? entry+512
    "next_entry": number // where does the next tar entry start
  },
  "header_fields": {
    ...                  // parsed fields from the tar header
  },
  "header_string": string // the tar header for this entry
}
```
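
As an illustration of the format above, here is a small Python sketch that reads a FILE.json.gz and summarizes it. It assumes only what the format describes: one JSON object per line with `target` and `src_offsets` keys. The function names (`iter_entries`, `summarize`, `data_follows_header`) are made up for this example and are not part of megawarc.

```python
import gzip
import json

TAR_HEADER_SIZE = 512  # per the format above: data starts at entry+512


def iter_entries(json_gz_path):
    """Yield one metadata dict per non-empty line of a FILE.json.gz."""
    with gzip.open(json_gz_path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def summarize(entries):
    """Return {container: (file_count, total_bytes)} over the entries."""
    totals = {}
    for entry in entries:
        target = entry["target"]
        count, size = totals.get(target["container"], (0, 0))
        totals[target["container"]] = (count + 1, size + target["size"])
    return totals


def data_follows_header(entry):
    """Check the documented relation between the entry and data offsets."""
    src = entry["src_offsets"]
    return src["data"] == src["entry"] + TAR_HEADER_SIZE
```

Calling `summarize(iter_entries("FILE.json.gz"))` would give a quick count of how many files ended up in the .warc.gz and how many stayed in the .tar.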

Usage
-----

```
megawarc convert FILE
```

Converts the tar file (containing .warc.gz files) to a megawarc.
It creates FILE.warc.gz, FILE.tar and FILE.json.gz from FILE.

```
megawarc pack FILE INFILE_1 [[INFILE_2] ...]
```

Creates a megawarc with basename FILE and recursively adds the
given files and directories to it, as if they were in a tar file.
It creates FILE.warc.gz, FILE.tar and FILE.json.gz.

```
megawarc restore FILE
```

Converts the megawarc back to the original tar.
It reads FILE.warc.gz, FILE.tar and FILE.json.gz to make FILE.
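
To make the role of the metadata concrete, the sketch below shows one lookup step that a restore would plausibly need: fetching the stored bytes for a single metadata entry. It is an assumption-laden illustration in Python, not the tool's actual restore code. `read_target` is a hypothetical helper, and it treats `size` as a byte count starting at `offset`. Writing the tar header back for files stored in the .warc.gz is not shown.

```python
def read_target(entry, warc_path, tar_path):
    """Return the stored bytes for one metadata entry.

    For "warc" entries these are the bytes of one original .warc.gz file.
    For "tar" entries they include the tar header and the 512-byte padding,
    as described in the metadata format.
    """
    target = entry["target"]
    if target["container"] == "warc":
        path = warc_path
    elif target["container"] == "tar":
        path = tar_path
    else:
        raise ValueError("unknown container: %r" % (target["container"],))

    with open(path, "rb") as f:
        f.seek(target["offset"])
        data = f.read(target["size"])

    # A short read means the metadata and the container do not match.
    if len(data) != target["size"]:
        raise ValueError("container is shorter than the metadata says")
    return data
```

With entries parsed from FILE.json.gz, `read_target(entry, "FILE.warc.gz", "FILE.tar")` returns the bytes for that entry, so a single .warc.gz can be pulled out without restoring the whole tar.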