Definition of Google's patents as of 31.03.05

mamilu

http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=1&f=G&l=50&co1=AND&d=PG01&s1=20050071741&OS=20050071741&RS=20050071741

Folks, it's a hell of a brick to get through, but it's terribly interesting.

To study and discuss it properly we'd need an accurate translation.

Keep in mind that "search engine 125" means Google.

We need to set up a Translation Committee.

Seven, [777] are you there?

rinzi

Elena, I think the URL is wrong... can you check?

emanuelebolondi

but what is it mainly about?

mamilu

Broadly speaking, I'd say it outlines how Google handles and weighs factors such as BLs (backlinks), the age of inbound links, of the site that sends them and of the site that receives them, their dilution over time, etc., and how those factors influence the SERPs.

It's being discussed in other forums, but with evident difficulties of interpretation and translation; so we need SEVEN!!

SEVEN!!!

beke

SUMMARY OF THE INVENTION

[0010] Systems and methods consistent with the principles of the invention may score documents based, at least in part, on history data associated with the documents. This scoring may be used to improve search results generated in connection with a search query.

[0011] According to one aspect consistent with the principles of the invention, a method for scoring a document is provided. The method may include identifying a document and obtaining one or more types of history data associated with the document. The method may further include generating a score for the document based, at least in part, on the one or more types of history data.

[0012] According to another aspect, a method for scoring documents is provided. The method may include determining an age of linkage data associated with a linked document and ranking the linked document based on a decaying function of the age of the linkage data.

It seems to be a system for scoring a document based on historical data stored about it, in particular the age of the links it receives, if I understood correctly.

Keep in mind that Google has patented hundreds of these systems, but there is no sure way, other than inference, to know which ones it actually uses in its algorithms.

andrez

Google has patented hundreds of these systems, but there is no sure way, other than inference, to know which ones it actually uses in its algorithms.

Of course.

But I'd consider it interesting, at the very least, to know about them and to be able to evaluate them.

So I'm also in favor of attempting a translation.

zil

I'm reading it in English; anyway, if you want, I'll translate the most interesting passages I find..

let's start

While a spiky rate of growth in the number of back links may be a factor used by search engine 125 to score documents, it may also signal an attempt to spam search engine 125. Accordingly, in this situation, search engine 125 may actually lower the score of a document(s) to reduce the effect of spamming.

A spiky growth in the number of backlinks can be a factor search engine 125 uses to rank (score) a document, but it can also signal an attempt to spam search engine 125. So in that situation search engine 125 may actually lower the document's rank to reduce the effect of the spamming.
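
A rough Python sketch of what "spiky" could mean in practice. Everything here is an assumption for illustration: the name `is_spiky`, the `factor` threshold and the comparison against the average of earlier periods are not in the patent, which gives no formula at this point.

```python
from statistics import mean

def is_spiky(new_links_per_period, factor=5.0):
    """Flag a spike when the latest period gained more than `factor`
    times the average number of new backlinks of the earlier periods.

    Hypothetical rule, not taken from the patent.
    """
    if len(new_links_per_period) < 2:
        return False  # not enough history to compare against
    *history, latest = new_links_per_period
    # Floor the baseline at 1 so a site with almost no links is not
    # flagged for gaining a handful of them.
    baseline = max(mean(history), 1)
    return latest > factor * baseline

steady = [10, 12, 9, 11, 13]    # new backlinks found per week
burst = [10, 12, 9, 11, 200]
print(is_spiky(steady), is_spiky(burst))
```

Whether a flagged document gets boosted or penalized is a separate decision; the quoted passage says the same signal can go either way.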

zil

For some queries, older documents may be more favorable than newer ones. As a result, it may be beneficial to adjust the score of a document based on the difference (in age) from the average age of the result set. In other words, search engine 125 may determine the age of each of the documents in a result set (e.g., using their inception dates), determine the average age of the documents, and modify the scores of the documents (either positively or negatively) based on a difference between the documents' age and the average age.

Basically it says that for some queries older sites may be favored, so it can adjust each document's score according to how far its age is from the average age of the result set. It then adds that a document's age for a query is the combined result of several criteria (they're explained on the page)
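
A minimal Python sketch of the adjustment described in the quoted passage. The function name `adjust_scores_by_age`, the `weight` parameter and the linear form are assumptions; the patent only says scores may be modified positively or negatively based on the difference from the average age.

```python
from statistics import mean

def adjust_scores_by_age(docs, weight=0.01):
    """docs: list of (score, age_in_days) tuples for one result set.

    Hypothetical linear rule: a document older than the result-set
    average gains score, a younger one loses it. A negative weight
    would favor newer documents instead.
    """
    if not docs:
        return []
    avg_age = mean(age for _, age in docs)
    return [score + weight * (age - avg_age) for score, age in docs]

results = [(1.0, 100), (1.0, 300), (1.0, 200)]  # average age is 200 days
print(adjust_scores_by_age(results))
```

The sign of `weight` would presumably depend on the query, since the passage says older documents are preferable only for some queries.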

zil

For example, a document whose content is edited often may be scored differently than a document whose content remains static over time. Also, a document having a relatively large amount of its content updated over time might be scored differently than a document having a relatively small amount of its content updated over time.

It explains that a site updated more frequently will be scored differently from a static one ("static" meaning not updated, not in the .html sense), and that a site that updates large amounts of content will in turn be scored differently from one that updates little

my note: this would explain why it gives a good rank to well-made blogs. Anyway, the page also has the formulas.. if one day I'm particularly fried I'll try to work out a nice algorithm..

zil

UA may also be determined as a function of one or more factors, such as the number of "new" or unique pages associated with a document over a period of time

UA is a factor in the formula for ranking based on updates (see my post above), and it says that one way to compute it is to look at how many new pages a site adds in a given period of time

zil

In this case, search engine 125 may store a term vector for a document (or page) and monitor it for relatively large changes.

It says that sometimes there isn't enough storage to keep every version of every page for update monitoring, so it opts to save a vector (set) of the most frequently used terms, something like a keyword list
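
To make the term-vector idea concrete, here is a small Python sketch. The helper names, the `top_k` cutoff, the use of cosine similarity and the `threshold` value are all assumptions; the patent only says a term vector may be stored and monitored for relatively large changes.

```python
import math
from collections import Counter

def term_vector(text, top_k=50):
    """Keep only the top_k most frequent lowercase words of a page."""
    return dict(Counter(text.lower().split()).most_common(top_k))

def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency dicts."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def changed_significantly(old_vec, new_vec, threshold=0.5):
    """Hypothetical test: a large change means low similarity."""
    return cosine_similarity(old_vec, new_vec) < threshold

old = term_vector("cheap flights to rome cheap hotels in rome")
new = term_vector("poker casino bonus poker")
print(changed_significantly(old, old), changed_significantly(old, new))
```

Only the small vector has to be stored per page, which is the point of the passage: the full page history is never kept.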

According to yet another implementation, search engine 125 may store a summary or other representation of a document and monitor this information for changes

or it can save a kind of summary, again to monitor changes

the next one is interesting

According to a further implementation, search engine 125 may generate a similarity hash (which may be used to detect near-duplication of a document) for the document and monitor it for changes

The engine can generate a "similarity hash" (I guess you all know what that is; anyway it's a kind of checksum of a document, along the lines of an md5 hash, except that similar documents should give similar hashes) and can use it both to see whether the document changes and to find copies of documents.
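
The patent does not say which similarity hash is meant. As an illustration only, here is a Python sketch of one well-known scheme of this kind (a simhash over words); the choice of simhash, the md5-based word hashing and the 64-bit size are assumptions.

```python
import hashlib

def simhash(text, bits=64):
    """Similarity hash over words (bits should be at most 64 here).

    Each word votes +1 or -1 on every bit position according to its
    own hash; the sign of the total decides the output bit. Documents
    sharing most words therefore tend to share most bits.
    """
    counts = [0] * bits
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # 64-bit word hash
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if counts[i] > 0)

def hamming_distance(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")

page_v1 = simhash("google patent on scoring documents with history data")
page_v2 = simhash("google patent on ranking documents with history data")
print(hamming_distance(page_v1, page_v2))
```

A small distance between two hashes suggests a near-duplicate, and a growing distance between successive hashes of the same page suggests the page has changed; where to put the cutoff is a tuning choice.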

zil

It talks about how to rank based on queries, and first specifies that it can do so based on users' choices, then

Another query-based factor may relate to the occurrence of certain search terms appearing in queries over time. A particular set of search terms may increasingly appear in queries over a period of time. For example, terms relating to a "hot" topic that is gaining/has gained popularity or a breaking news event would conceivably appear frequently over a period of time. In this case, search engine 125 may score documents associated with these search terms (or queries) higher than documents not associated with these terms.

Basically it can give more weight to sites associated with the most recent news/terms, the big news stories basically. An example to make the point: some time ago there was the tsunami; well, if my sites had been associated with the tsunami they would have carried more weight than sites not associated with it.

mamilu

Keep in mind that "search engine 125" means Google.

Well done Zil!!!
...hang in there

beke

I take back everything I said earlier.

It sure is interesting... I'm printing it right away and will go through it calmly over the weekend.

zil

This one is scary

Yet another query-based factor may relate to the extent to which a document appears in results for different queries. In other words, the entropy of queries for one or more documents may be monitored and used as a basis for scoring. For example, if a particular document appears as a hit for a discordant set of queries, this may (though not necessarily) be considered a signal that the document is spam, in which case search engine 125 may score the document relatively lower.

Basically it says that a site showing up for several discordant keywords may (not necessarily, though) be taken as a signal that the document is spam, in which case it will be given a relatively low score
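
One way to read "entropy of queries", sketched in Python. It assumes each query a document ranks for has already been assigned a topic label, and it uses plain Shannon entropy over those labels; the labels and the function name `query_entropy` are hypothetical, since the patent does not define how the entropy is computed.

```python
import math
from collections import Counter

def query_entropy(query_topics):
    """Shannon entropy (in bits) of the topics of the queries for
    which a document shows up as a hit.

    0 means every query is about the same topic; higher values mean
    a more discordant set of queries.
    """
    total = len(query_topics)
    if total == 0:
        return 0.0
    counts = Counter(query_topics)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

focused = ["travel", "travel", "travel", "travel"]
scattered = ["travel", "poker", "pharma", "loans"]
print(query_entropy(focused), query_entropy(scattered))
```

As the quoted text stresses, a high value would be a hint and not a verdict: a general news site also ranks for unrelated queries.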

zil

It talks at length about how the dates of links (how many new links per period, how many links disappear) also have an influence

By analyzing the change in the number or rate of increase/decrease of back links to a document (or page) over time, search engine 125 may derive a valuable signal of how fresh the document is.

In particular, it says there that by analyzing whether backlinks are decreasing or increasing it can work out whether a document is "fresh" or stale.. basically if we keep getting BLs it gives us a higher rank, if we lose them a lower one

zil

Then it describes the weight of links

basically the higher a site's rank, the more rank it passes on, and so far OK; then it gets more interesting and says that links from "real" sites (e.g. government ones) count a lot (a Lot with a capital L, I'd say)

this explains why, when I got one of my sites linked from the university's site, it took off

then it also says that links count more if they're on "fresh" sites, i.e. ones that are updated frequently

I'm off to dinner; if I have time I'll continue later, otherwise let's wait for some other kind soul

giorgiotave

Very interesting translation by zil. 777, come on, translate a little something

I was thinking that every now and then we could translate some interesting article; if we find a few kind souls, maybe we'll create a dedicated section. Naturally, as I'm going to do for the moderators, there will be a personal page where you can put your own links and profile.

raele.l.angelo

I'll pick up for a bit from where zil left off

[0071] According to one implementation, the analysis may depend on the number of new links to a document. For example, search engine 125 may monitor the number of new links to a document in the last n days compared to the number of new links since the document was first found. Alternatively, search engine 125 may determine the oldest age of the most recent y % of links compared to the age of the first link found.

The analysis may depend on the number of new links to the document. For example, gg (Google) can monitor the number of new links to a document in the last "n" days and compare it with the number of new links since the document was first found. Alternatively, gg can determine the oldest age of the most recent percentage (called y%) of links and compare it with the age of the first link found.

let me add ....an example is badly needed:

For the purpose of illustration, consider y=10 and two documents (web sites in this example) that were both first found 100 days ago. For the first site, 10% of the links were found less than 10 days ago, while for the second site 0% of the links were found less than 10 days ago (in other words, they were all found earlier). In this case, the metric results in 0.1 for site A and 0 for site B. The metric may be scaled appropriately. In another exemplary implementation, the metric may be modified by performing a relatively more detailed analysis of the distribution of link dates. For example, models may be built that predict if a particular distribution signifies a particular type of site (e.g., a site that is no longer updated, increasing or decreasing in popularity, superceded, etc.).

for illustration, consider y=10 (see above) and 2 documents both first found 100 days ago.
for the first site 10% of the links were found less than 10 days ago, while for the second site 0% of the links were found less than 10 days ago. in this case gg assigns a kind of value of 0.1 to the first site and 0 to the second. it also says this value can be refined by gg through a more detailed analysis of the distribution of link dates, and adjusted according to the fact that different date distributions correspond to different kinds of sites.
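
The numbers in the example can be reproduced with a few lines of Python. This is just one reading of the metric (share of links first found in the last n days); the function name `fresh_link_fraction` and the sample link ages are made up for illustration.

```python
def fresh_link_fraction(link_ages_days, n_days=10):
    """Fraction of a document's known links that were first found
    less than n_days ago."""
    if not link_ages_days:
        return 0.0
    recent = sum(1 for age in link_ages_days if age < n_days)
    return recent / len(link_ages_days)

# Both sites were first found 100 days ago and have 10 links each.
site_a = [5] + [50] * 9   # 1 link out of 10 is under 10 days old
site_b = [50] * 10        # every link is older than 10 days

print(fresh_link_fraction(site_a))  # 0.1
print(fresh_link_fraction(site_b))  # 0.0
```

The patent adds that the metric "may be scaled appropriately", so the raw fraction would only be a starting point.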

the analysis may depend on weights assigned to the links. In this case, each link may be weighted by a function that increases with the freshness of the link. The freshness of a link may be determined by the date of appearance/change of the link, the date of appearance/change of anchor text associated with the link, date of appearance/change of the document containing the link. The date of appearance/change of the document containing a link may be a better indicator of the freshness of the link based on the theory that a good link may go unchanged when a document gets updated if it is still relevant and good. In order to not update every link's freshness from a minor edit of a tiny unrelated part of a document, each updated document may be tested for significant changes (e.g., changes to a large portion of the document or changes to many different portions of the document) and a link's freshness may be updated (or not updated) accordingly.

The analysis may depend on the weights assigned to the links. Each link can be weighted by a function that increases with the link's freshness. The freshness of a link can be determined by:
-date of appearance/change of the link
-date of appearance/change of the text associated with a link (the anchor text)
-date of appearance/change of the document containing the link
The date of appearance/change of the document containing the link may be a better indicator of the link's freshness, based on the theory that a good link may stay unchanged when a document is updated if it is still good and relevant
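
A possible shape for "a function that increases with the freshness of the link", sketched in Python. The exponential form, the 30-day half-life and both function names are assumptions; the patent names no specific function. Here `age_days` stands for the time elapsed since whichever freshness date from the list above is used.

```python
def link_weight(age_days, half_life_days=30.0):
    """Weight in (0, 1]: 1 for a brand-new link, halving every
    half_life_days (hypothetical exponential decay)."""
    return 0.5 ** (age_days / half_life_days)

def weighted_link_score(link_ages_days, half_life_days=30.0):
    """Sum of freshness weights over all links to a document."""
    return sum(link_weight(age, half_life_days) for age in link_ages_days)

print(link_weight(0), link_weight(30), link_weight(60))  # 1.0 0.5 0.25
print(weighted_link_score([0, 30, 60]))                  # 1.75
```

With a rule like this, two documents with the same number of backlinks can end up with very different link scores depending on how recently those links were confirmed fresh.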

raele.l.angelo

Links may be weighted in other ways. For example, links may be weighted based on how much the documents containing the links are trusted (e.g., government documents can be given high trust). Links may also, or alternatively, be weighted based on how authoritative the documents containing the links are (e.g., CUT

links can be weighted in other ways. for example, links can be weighted based on how trusted the documents containing the links are....

OH GOD, I picked up from the wrong place...zil had already got this far

oh well...

According to yet another technique, the analysis may depend on an age distribution associated with the links pointing to a document. In other words, the dates that the links to a document were created may be determined and input to a function that determines the age distribution.CUT

The analysis may depend on the age distribution associated with the links pointing to a document, i.e. the dates on which the links pointing to a document were created

oh well, stuff we've already seen...

The dates that links appear can also be used to detect "spam," where owners of documents or their colleagues create links to their own document for the purpose of boosting the score assigned by a search engine. A typical, "legitimate" document attracts back links slowly...CUT

the same date-distribution function can be used to detect spam, where document owners or their colleagues create links to their own document to move up in the SERPs, since a typical, "legitimate" site gains BLs slowly....over time. then it talks about the link-exchange phenomenon...and about places where links can be posted without "editorial discretion"...I think that means pretty much everywhere, including forums, blogs, guestbooks etc.

[0081] Anchor Text

[0082]
one of the criteria for anchors is the way the anchor text changes, which can indicate that the document has been updated, or at any rate that the document's focus has changed.

[0083]
if a document's content changes so much that it differs significantly from the associated anchor text, it means the domain hosting the document may have significantly or completely changed its content (just think of when a domain expires and gets bought by someone else), and since the href text of a BL is treated as an integral part of the document it points to, the domain would rise in the SERPs for searches that are no longer on topic, which is to be avoided.

[0085]
The freshness of an anchor text can be used as a factor in scoring the document. The freshness of an anchor can be determined by the date of appearance/change of the anchor text and/or by the date of appearance/change of the document the anchor points to: the latter can be a good indicator of the freshness of an anchor text, based on the theory we saw earlier for link freshness, namely that good anchors may go unchanged when a document is updated

[0086]
ultimately gg can generate a score associated with the document based, at least in part, on information about the way the anchor text changes.

sorry, one small thing
I started from where zil had left off
but now, scrolling through the text of the page from the top, it seems to me that not all the sections have been covered here, or am I wrong?

rereading it, there seem to be lots of interesting bits: why don't we tackle this a bit more systematically?! like start again from 0 and, maybe, as someone said (I don't remember who, sorry), try to put together something useful... what do you say?