I have 3 fields in my Solr index, and I run two queries that are identical except that each searches a different field.
Indexed data:
employee : 220232
pskills : JSP, Servlet, HTML, Java
oskills : DB2, Oracle, JDBC, JNI, JSP, VC++, C, C++, Java, SQL, XML, Palm OS, Unix, Palm OS, AX, Linux, Solaris, Windows 2000, TCP/IP, IP, IDS, Asset Liability Management, Enterprise Application Integration
schema.xml:
<field name="employee" type="string" indexed="true" stored="true" required="true" />
<field name="pskills" type="text" indexed="true" stored="false" required="false" />
<field name="oskills" type="text" indexed="true" stored="false" required="false" />
Query 1 = employee:220232 AND (pskills:(("Java"))^3000.00)
Score : 0.6169528
Query 2 = employee:220232 AND (oskills:(("Java"))^3000.00)
Score : 0.32307756
My question: the keyword "Java" is present in both fields, so why do the two queries get different scores?
There are several possible reasons. Specifically:
- If the fields have different lengths, the score is affected: a match in a shorter field weighs more. (Definitely a factor here, since pskills is much shorter than oskills.)
- If the term matches more than once in one of the fields, that field gets a higher TF (for example, if Java appeared once in oskills but twice in pskills). (That does not appear to be the case here.)
- If the term is more common in one field than in the other across all documents in the index. For example, if "Java" appears in oskills in 1000 documents but in pskills in only 100 documents, then because of IDF a match in pskills scores higher.
Also, a score is only meaningful for one query against the index as it was when that query ran. Scores are not meant to be compared across queries.
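The three factors above can be sketched in a few lines of Python. This is a simplified illustration modelled on the classic Lucene TF-IDF factors (tf as the square root of the term frequency, idf as 1 + ln(numDocs / (docFreq + 1)), and a length norm of 1 / sqrt(number of terms)). It leaves out the query boost, query normalization, the coord factor, and the lossy encoding of norms, so it will not reproduce the scores from the question. The function names, the document counts, and the token counts (4 for pskills, 31 for oskills) are assumptions made for illustration; the real token counts depend on the analyzer configured for the "text" field type.

```python
import math


def tf(freq):
    """Term-frequency factor: grows with the number of matches in the field."""
    return math.sqrt(freq)


def idf(doc_freq, num_docs):
    """Inverse document frequency: rarer terms in a field score higher."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def length_norm(num_terms):
    """Length normalization: a match in a shorter field weighs more."""
    return 1.0 / math.sqrt(num_terms)


def field_score(freq, doc_freq, num_docs, num_terms):
    """Simplified per-field score; idf is squared as in classic TF-IDF scoring."""
    return tf(freq) * idf(doc_freq, num_docs) ** 2 * length_norm(num_terms)


# Hypothetical index statistics, chosen only to show the direction of each effect.
NUM_DOCS = 10_000
DOC_FREQ = 500

# Same term, same frequency, same IDF: only the field length differs.
pskills_score = field_score(freq=1, doc_freq=DOC_FREQ, num_docs=NUM_DOCS, num_terms=4)
oskills_score = field_score(freq=1, doc_freq=DOC_FREQ, num_docs=NUM_DOCS, num_terms=31)

# Same field length, but the term is rarer in one field than in the other.
rare_score = field_score(freq=1, doc_freq=100, num_docs=NUM_DOCS, num_terms=4)
common_score = field_score(freq=1, doc_freq=1000, num_docs=NUM_DOCS, num_terms=4)

print(f"pskills (short field): {pskills_score:.4f}")
print(f"oskills (long field):  {oskills_score:.4f}")
print(f"rare term:   {rare_score:.4f}")
print(f"common term: {common_score:.4f}")
```

With everything else equal, the shorter field and the rarer term each produce the larger value, which matches the direction of the difference seen between Query 1 and Query 2.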