A Brief Look at the HTTP Status Code Return Mechanism of the Baidu Crawler

I. Introduction

An HTTP status code is a three-digit code that a web server includes in its response to a client (such as a browser). Standard codes such as 200 OK tell the client what state the requested page is in. Baidu, in order to collect data properly and store it in its database, has to follow the same HTTP rules. This article therefore gives a plain introduction to the HTTP return mechanism and to which status codes appear in which situations.


II. HTTP Status Codes

1. 200 OK: One of the most common and most important HTTP status codes. In most cases it indicates that the web server has successfully processed the request.

2. 301 Moved Permanently: A permanent redirect. For a given link such as www.example.com/old-page.html, requests are redirected to www.example.com/new-page.html.

3. 302 Found (Moved Temporarily): A temporary redirect. It is similar to 301 Moved Permanently, but the URL change is only temporary.

4. 404 Not Found: The requested page could not be found.

5. 403 Forbidden: A web server sometimes blocks a specific IP address, or blocks a specific IP address from using a particular method (for example, POST). In that case it returns 403 Forbidden.
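
The five codes above can be looked up with Python's standard `http.HTTPStatus` enum. The sketch below also maps each code to an action a crawler might take; the `crawler_action` function and its action names are hypothetical illustrations, not Baidu's documented behavior.

```python
from http import HTTPStatus


def describe(code: int) -> str:
    """Return the numeric code followed by its standard reason phrase."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


def crawler_action(code: int) -> str:
    """Hypothetical mapping from a status code to a crawler action.

    The action names are assumptions made for illustration only.
    """
    if code == HTTPStatus.OK:
        return "download"
    if code == HTTPStatus.MOVED_PERMANENTLY:
        return "follow_redirect_permanent"
    if code == HTTPStatus.FOUND:
        return "follow_redirect_temporary"
    if code in (HTTPStatus.NOT_FOUND, HTTPStatus.FORBIDDEN):
        return "skip"
    return "unhandled"


for code in (200, 301, 302, 404, 403):
    print(describe(code), "->", crawler_action(code))
```

Keeping the lookup (`describe`) separate from the policy (`crawler_action`) means the policy can change without touching the standard code definitions.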

III. HTTP Status Code Return Mechanism of the Baidu Crawler

1. The Baidu crawler first sends a request to the server and waits for the response in order to fetch the web page or other resource. If no response arrives within a certain time limit, the crawler treats the request as failed and stops crawling that page.

2. When a response arrives, the Baidu crawler inspects the HTTP status code returned by the server. If it is an error code such as 404 Not Found or 403 Forbidden, the crawler stops crawling the page immediately, with no further processing. If it is a normal status code such as 200 OK, the crawler continues and downloads the page content.

3. In addition to checking the status code, Baidu consults the Robots Exclusion Protocol (robots.txt) before sending requests, so that it does not waste resources on pages that the site owner has excluded from crawling in the site's robots.txt file.

4. Once the content of a page that returned a normal status code has been downloaded, the Baidu spider stores it in a database for later use, for example indexing it so that it can appear in the results when users search for related keywords on Baidu.

5. Finally, when all of the steps above have completed without error, the Baidu spider moves on to the next page in its task queue and repeats the process until every queued page has been handled.
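
The five steps above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not Baidu's implementation: the `crawl` function, the `ExampleSpider` user agent, the in-memory `store` dictionary standing in for a database, and the `fake_fetch` stub (which returns `None` to simulate a timeout) are all hypothetical. Only `urllib.robotparser.RobotFileParser` is a real standard-library class.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Optional, Tuple
from urllib.robotparser import RobotFileParser

# A fetcher returns (status_code, body), or None when the request timed out.
Fetcher = Callable[[str], Optional[Tuple[int, str]]]


def crawl(urls: Iterable[str], fetch: Fetcher, robots: RobotFileParser,
          user_agent: str = "ExampleSpider") -> Dict[str, str]:
    """Process a queue of URLs one by one and return the stored pages."""
    store: Dict[str, str] = {}      # stands in for the database (step 4)
    queue = deque(urls)             # the task queue (step 5)
    while queue:
        url = queue.popleft()
        # Step 3: consult robots.txt before sending the request.
        if not robots.can_fetch(user_agent, url):
            continue
        # Step 1: send the request; None means no response in time.
        response = fetch(url)
        if response is None:
            continue
        # Step 2: stop on error codes, continue only on 200 OK.
        status, body = response
        if status != 200:
            continue
        # Step 4: store the downloaded content.
        store[url] = body
    return store


# Hypothetical robots.txt rules and canned responses, so no network is needed.
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])

pages = {
    "http://www.example.com/": (200, "<html>home</html>"),
    "http://www.example.com/missing": (404, ""),
    "http://www.example.com/private/a": (200, "<html>secret</html>"),
}


def fake_fetch(url: str) -> Optional[Tuple[int, str]]:
    # URLs absent from `pages` simulate a timeout by returning None.
    return pages.get(url)


result = crawl(
    [
        "http://www.example.com/",
        "http://www.example.com/missing",
        "http://www.example.com/private/a",
        "http://www.example.com/slow",
    ],
    fake_fetch,
    robots,
)
print(sorted(result))
```

Passing the fetcher in as a parameter keeps the loop testable offline. A real fetcher could wrap `urllib.request.urlopen` with a `timeout` argument and return `None` when the timeout expires.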

Article URL: http://m.5511xx.com/article/ccocpog.html