#4 Allow for items to be deduplicated under a different name.

Open
arkiver opened this issue 1 year ago · 0 comments

This will allow us to put context information in the item name, while not including the context information in the deduplication process.

Using this, we can get rid of most of urls:filters, since those are mostly in place to prevent loops on bad page requisites. The context would hold the crawling depth, and we would not queue a new URL if the depth is over a certain threshold.

Example:
`url=URL&context=CONTEXT` would be deduplicated under, for example, `URL`.
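
A rough sketch of how this could work, to make the idea concrete. Everything here is hypothetical and not taken from the existing code: the names `make_item`, `split_item`, and `DedupQueue`, the `max_depth` parameter, and the assumption that the context is just an integer depth and that the item name is percent-encoded like a query string.

```python
from urllib.parse import parse_qs, urlencode


def make_item(url: str, depth: int) -> str:
    """Build an item name of the form url=URL&context=CONTEXT (percent-encoded)."""
    return urlencode({"url": url, "context": str(depth)})


def split_item(item_name: str) -> tuple[str, int]:
    """Split an item name into its deduplication key (the URL) and its depth."""
    fields = parse_qs(item_name)
    return fields["url"][0], int(fields["context"][0])


class DedupQueue:
    """Queue that deduplicates on the URL only, ignoring the context."""

    def __init__(self, max_depth: int) -> None:
        self.max_depth = max_depth  # assumed threshold, not an existing setting
        self.seen: set[str] = set()
        self.items: list[str] = []

    def add(self, item_name: str) -> bool:
        """Queue the item unless it is too deep or its URL was already seen."""
        url, depth = split_item(item_name)
        if depth > self.max_depth:
            return False  # depth over the threshold: do not queue
        if url in self.seen:
            return False  # same URL under a different context: duplicate
        self.seen.add(url)
        self.items.append(item_name)  # the full name, context included, is kept
        return True


queue = DedupQueue(max_depth=3)
print(queue.add(make_item("https://example.com/a", 1)))  # first sighting of the URL
print(queue.add(make_item("https://example.com/a", 2)))  # same URL, other context
print(queue.add(make_item("https://example.com/b", 4)))  # over the depth threshold
```

With something like this, the depth check takes over the loop prevention that urls:filters currently handles, while the context stays visible in the item name.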

arkiver added the enhancement label 1 year ago