#4 Allow for items to be deduplicated under a different name.

Open
arkiver opened this issue 1 year ago · 0 comments
arkiver commented 1 year ago

This will allow us to put context information in the item name, while not including the context information in the deduplication process.

Using this, we can get rid of most of the urls:filters, since those are mostly in place to prevent loops on bad page requisites. The context would hold the crawling depth, and we would not queue a new URL if its depth is over a certain threshold.
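A minimal sketch of why a depth carried in the context can replace a loop-preventing filter, assuming a breadth-first queue of `(url, depth)` pairs. `crawl`, `bad_requisites`, and `MAX_DEPTH` are hypothetical names chosen for illustration; the issue does not specify a threshold value or how links are extracted. The example requisite is a stylesheet that references `css/style.css` relatively, which resolves to a new, ever-deeper URL each time, so URL deduplication alone never stops it.

```python
from collections import deque
from typing import Callable
from urllib.parse import urljoin

# Hypothetical threshold; the issue does not state a value.
MAX_DEPTH = 3


def crawl(start: str, extract_links: Callable[[str], list[str]],
          max_depth: int = MAX_DEPTH) -> list[tuple[str, int]]:
    """Breadth-first crawl where each queued item carries its depth as context."""
    queue = deque([(start, 0)])
    seen: set[str] = set()  # plain URL dedup: cannot stop loops that mint new URLs
    processed: list[tuple[str, int]] = []
    while queue:
        url, depth = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        processed.append((url, depth))
        if depth >= max_depth:
            # Children would exceed the threshold, so nothing new is queued.
            continue
        for child in extract_links(url):
            queue.append((child, depth + 1))
    return processed


def bad_requisites(url: str) -> list[str]:
    """Hypothetical bad page requisite: a relative self-reference in a stylesheet."""
    if url.endswith("style.css"):
        # "css/style.css" relative to ".../css/style.css" gives ".../css/css/style.css"
        return [urljoin(url, "css/style.css")]
    return []
```

Crawling `https://example.com/css/style.css` with this sketch stops after four items, the last at depth 3, without any URL pattern filter.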

Example:
`url=URL&context=CONTEXT` would be deduplicated under, for example, `URL`.
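A minimal sketch of deduplicating under a different name, assuming item names are URL-encoded query strings with a `url` key and a `context` key, and that only the `url` value is used as the deduplication key. `make_item`, `dedup_key`, `Deduplicator`, and the `depth=N` context format are hypothetical, not part of the project.

```python
from urllib.parse import parse_qs, urlencode


def make_item(url: str, context: str) -> str:
    """Build a hypothetical item name; values are percent-encoded so '&' or '='
    inside the URL cannot break parsing."""
    return urlencode({"url": url, "context": context})


def dedup_key(item: str) -> str:
    """Return the name the item is deduplicated under: the URL without its context."""
    params = parse_qs(item, keep_blank_values=True, strict_parsing=True)
    return params["url"][0]


class Deduplicator:
    """Tracks dedup keys, so items differing only in context count as duplicates."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def add(self, item: str) -> bool:
        """Return True if the item is new, False if its URL was already seen."""
        key = dedup_key(item)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

With this sketch, `make_item(u, "depth=0")` and `make_item(u, "depth=2")` are different item names but share the key `u`, so the second is rejected as a duplicate while the context still travels with the item.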

arkiver added the enhancement label 1 year ago
No milestone
No assignees
1 participant
Due date

No due date set.

Dependencies

This issue currently has no dependencies.
