• User Attivo

    Robots.txt file

    I've read a few posts on the forum
    and ended up writing this robots.txt:

    User-agent: Googlebot
    Disallow:
    User-agent: Googlebot-Image
    Disallow:
    User-agent: MSNBot
    Disallow:
    User-agent: Slurp
    Disallow:
    User-agent: Teoma
    Disallow:
    User-agent: Gigabot
    Disallow:
    User-agent: Scrubby
    Disallow:
    User-agent: Robozilla
    Disallow:
    User-agent: BecomeBot
    Disallow:
    User-agent: Nutch
    Disallow:
    User-agent: Fast
    Disallow:
    User-agent: Scooter
    Disallow:
    User-agent: Mercator
    Disallow:
    User-agent: Ask Jeeves
    Disallow:
    User-agent: teoma_agent
    Disallow:
    User-agent: ia_archiver
    Disallow:
    User-agent: BizBot04 kirk.overleaf.com
    Disallow:
    User-agent: HappyBot (gserver.kw.net)
    Disallow:
    User-agent: CaliforniaBrownSpider
    Disallow:
    User-agent: EINet/0.1 libwww/0.1
    Disallow:
    User-agent: Ibot/1.0 libwww-perl/0.40
    Disallow:
    User-agent: Merritt/1.0
    Disallow:
    User-agent: StatFetcher/1.0
    Disallow:
    User-agent: TeacherSoft/1.0 libwww/2.17
    Disallow:
    User-agent: WWW Collector
    Disallow:
    User-agent: processor/0.0ALPHA libwww-perl/0.20
    Disallow:
    User-agent: wobot/1.0 from 206.214.202.45
    Disallow:
    User-agent: WhoWhere Robot
    Disallow:
    User-agent: ITI Spider
    Disallow:
    User-agent: w3index
    Disallow:
    User-agent: MyCNNSpider
    Disallow:
    User-agent: SummyCrawler
    Disallow:
    User-agent: OGspider
    Disallow:
    User-agent: linklooker
    Disallow:
    User-agent: CyberSpyder
    Disallow:
    User-agent: SlowBot
    Disallow:
    User-agent: heraSpider
    Disallow:
    User-agent: Surfbot
    Disallow:
    User-agent: Bizbot003
    Disallow:
    User-agent: WebWalker
    Disallow:
    User-agent: SandBot
    Disallow:
    User-agent: EnigmaBot
    Disallow:
    User-agent: spyder3.microsys.com
    Disallow:
    User-agent: 205.252.60.71
    Disallow:
    User-agent: 194.20.32.131
    Disallow:
    User-agent: 198.5.209.201
    Disallow:
    User-agent: acke.dc.luth.se
    Disallow:
    User-agent: dallas.mt.cs.cmu.edu
    Disallow:
    User-agent: darkwing.cadvision.com
    Disallow:
    User-agent: waldec.com
    Disallow:
    User-agent: www2000.ogsm.vanderbilt.edu
    Disallow:
    User-agent: unet.ca
    Disallow:
    User-agent: murph.cais.net
    Disallow:
    User-agent: *
    Disallow: /cgi-bin
    Disallow: /wp-admin
    Disallow: /wp-includes
    Disallow: /wp-content/plugins
    Disallow: /wp-content/cache
    Disallow: /wp-content/themes
    Disallow: /*?*
    Disallow: /*?
    Sitemap: link to your sitemap

    Is this the optimal setup, in your opinion?


  • User Attivo

    Listing that whole series of spiders for no reason (they aren't being blocked, so why define a rule for them??) doesn't accomplish much, I think...


  • User Attivo

    The only part that might be worth keeping is

    
    User-agent: *
    Disallow: /cgi-bin
    Disallow: /wp-admin
    Disallow: /wp-includes
    Disallow: /wp-content/plugins
    Disallow: /wp-content/cache
    Disallow: /wp-content/themes
    
    

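    As a quick sanity check, those prefix rules can be exercised with Python's standard-library robots.txt parser. This is just a sketch: the bot name and sample paths are made up, and `urllib.robotparser` implements the original prefix-matching spec (no `*` wildcards), so only plain path rules like these are evaluated.

    ```python
    import urllib.robotparser

    # The recommended rule block from above, as a string.
    rules = """\
    User-agent: *
    Disallow: /cgi-bin
    Disallow: /wp-admin
    Disallow: /wp-includes
    Disallow: /wp-content/plugins
    Disallow: /wp-content/cache
    Disallow: /wp-content/themes
    """

    rp = urllib.robotparser.RobotFileParser()
    rp.parse(rules.splitlines())

    # Paths under a disallowed prefix are blocked; everything else is allowed.
    print(rp.can_fetch("AnyBot", "/wp-admin/options.php"))  # False
    print(rp.can_fetch("AnyBot", "/2010/hello-world/"))     # True
    ```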
    As for the last 3 lines, I'm not sure. I don't know how the Sitemap directive works in robots.txt, and the two lines with /*? risk blocking any page whose URL contains a question mark, which may not be what you want.

    
    Disallow: /*?*
    Disallow: /*?
    Sitemap: link tua sitemap
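    For context, rules like `Disallow: /*?` rely on the `*` wildcard extension popularized by the major search engines; the original robots.txt spec only does prefix matching. A rough sketch of how such a rule would match under that extension (the translation function and sample URLs are illustrative, not a full implementation):

    ```python
    import re

    def wildcard_rule_to_regex(rule):
        # Simplified sketch of the '*' / '$' wildcard extension:
        # '*' matches any run of characters, a trailing '$' anchors the end.
        pattern = re.escape(rule).replace(r"\*", ".*")
        if pattern.endswith(r"\$"):
            pattern = pattern[:-2] + "$"
        return re.compile("^" + pattern)

    blocked = wildcard_rule_to_regex("/*?")
    print(bool(blocked.match("/page?replytocom=42")))  # True: any URL with '?'
    print(bool(blocked.match("/about/")))              # False: no query string
    ```

    This illustrates the concern above: under the wildcard extension, `/*?` matches every URL containing a question mark, not just a specific set of query pages.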