Hadoop Archives: notes on parsing *.har files

Author: 从前慢现在也慢 | 2024-06-14 01:41:10

*.har

%2F dir 1378884867194+493+cdh4+supergroup 0 0 123.txt 2013 3.txt
%2F2013 dir 1378884762156+493+cdh4+supergroup 0 0 09
%2F2013%2F09%2F10%2F1.txt file part-0 12 12 1378883181096+420+cdh4+supergroup
%2F123.txt file part-0 0 12 1378866591533+420+cdh4+supergroup
%2F2013%2F09%2F10 dir 1378884856608+493+cdh4+supergroup 0 0 1.txt
%2F2013%2F09%2F11 dir 1378884867194+493+cdh4+supergroup 0 0 2.txt
%2F2013%2F09 dir 1378884821792+493+cdh4+supergroup 0 0 10 11
%2F2013%2F09%2F11%2F2.txt file part-0 24 12 1378883185898+420+cdh4+supergroup
%2F3.txt file part-0 36 12 1378883191541+420+cdh4+supergroup

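As you can see, the metadata file stores the full hierarchy of the archived directories and files, together with each file's data file, content offset, and length. The archived files are: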
/123.txt
/2013/09/10/1.txt
/2013/09/11/2.txt
/3.txt
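
To make the record layout concrete, here is a minimal Python sketch that parses one line of the metadata listing. The field layout is inferred from the sample above rather than taken from a specification, and `parse_index_line` and the dictionary keys are hypothetical names chosen for illustration.

```python
from urllib.parse import unquote


def parse_index_line(line):
    """Parse one metadata entry (layout inferred from the sample listing)."""
    fields = line.split()
    path, kind = unquote(fields[0]), fields[1]
    if kind == "dir":
        # dir:  <path> dir <mtime+perm+owner+group> 0 0 <child names...>
        props, extra = fields[2], {"children": [unquote(c) for c in fields[5:]]}
    else:
        # file: <path> file <part file> <offset> <length> <mtime+perm+owner+group>
        props = fields[5]
        extra = {"part": fields[2], "offset": int(fields[3]), "length": int(fields[4])}
    mtime, perm, owner, group = props.split("+")
    entry = {"path": path, "type": kind, "mtime": int(mtime),
             "perm": int(perm), "owner": owner, "group": group}
    entry.update(extra)
    return entry


entry = parse_index_line(
    "%2F2013%2F09%2F10%2F1.txt file part-0 12 12 1378883181096+420+cdh4+supergroup"
)
print(entry["path"], entry["part"], entry["offset"], entry["length"], oct(entry["perm"]))
```

The permission field is stored in decimal: 493 is octal 755 (directories) and 420 is octal 644 (files).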

part-0:

hdfs://aaaa
hdfs://aaaa
hdfs://aaaa
hdfs://aaaa

The data file holds the contents of all 4 files in the archived directory, concatenated back to back.
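
As a rough illustration, each archived file can be recovered by slicing the data file with the offset and length recorded in the metadata. The sketch below is only an assumption-laden example: `read_archived_file` is a hypothetical helper, and the trailing newline in each file is inferred from the recorded length of 12 bytes versus the 11 visible characters of `hdfs://aaaa`.

```python
def read_archived_file(part_bytes, offset, length):
    """Return the bytes of one archived file from the data file's content."""
    return part_bytes[offset:offset + length]


# Assumption: every archived file contains "hdfs://aaaa" plus a newline (12 bytes).
part0 = b"hdfs://aaaa\n" * 4

# (offset, length) pairs copied from the metadata entries of type "file".
files = {
    "/123.txt": (0, 12),
    "/2013/09/10/1.txt": (12, 12),
    "/2013/09/11/2.txt": (24, 12),
    "/3.txt": (36, 12),
}

for path, (offset, length) in files.items():
    print(path, read_archived_file(part0, offset, length))
```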

[*] Given the metadata file and the data file, it should be possible to restore the original directory structure.
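
One possible way to do that restoration, sketched in Python under the assumption that the metadata text and the data files have already been copied to local memory. The `restore` function and its parameters are hypothetical, the record layout is inferred from the sample listing, and the sketch does not restore owners, permissions, or timestamps, nor does it guard against malicious paths.

```python
import os
from urllib.parse import unquote


def restore(index_text, parts, dest):
    """Rebuild the archived tree under dest.

    index_text: metadata listing, one entry per line.
    parts: dict mapping a data file name (e.g. "part-0") to its bytes.
    """
    for line in index_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        relative = unquote(fields[0]).lstrip("/")
        target = os.path.join(dest, relative)
        if fields[1] == "dir":
            os.makedirs(target, exist_ok=True)
        elif fields[1] == "file":
            offset, length = int(fields[3]), int(fields[4])
            # Entries are not ordered parent-first, so create parents on demand.
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with open(target, "wb") as out:
                out.write(parts[fields[2]][offset:offset + length])
```

A call such as `restore(index_text, {"part-0": part0_bytes}, "restored")` would then write `restored/123.txt`, `restored/2013/09/10/1.txt`, and so on.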