Bad search engine in concrete5?

Hey Guys,

I have been testing the search engine and noticed one big thing. It may be a flaw or it may be normal, I'm not sure.

The search engine does not seem to search the text inside paragraphs (content), only the URL, title, etc.

For example, let's take this text from the front page:

concrete5 is easy to use, and flexible to build with.
 You'll find yourself saying "yes" and "sure" instead of 
"that's not in the statement of work" to your clients. 
When you deliver the finished website, training can be 
as easy as a phone call and pat on the back — editing 
with concrete5 is that intuitive.


Now search for "not in the statement of work".

It gives no results. Is there any way to fix this?

 
Tony replied:
My biggest gripe about the search engine is that by default it only indexes areas called main or main content, instead of all areas. You can add to the areas it searches very easily if you know what you're doing (override the index_search.job and call IndexedSearch::addSearchableArea('area name')), but people shouldn't have to do this. I somewhat understand the thinking behind indexing only the main content areas (so as not to index the same navigation links on every page), but maybe we could set it up to exclude areas called sidenav, sidebar, or header, and index all other areas by default?
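A minimal sketch of the exclude-list idea above, in plain PHP. The function name and the sample area names are hypothetical and nothing here is concrete5 API; it only illustrates skipping navigation-style areas and indexing everything else.

```php
<?php
// Hypothetical helper, not part of concrete5: given every area name on a
// page, return the ones that would be indexed under an exclude-list policy.
function filterIndexableAreas(array $areaNames, array $excluded = array('sidenav', 'sidebar', 'header'))
{
    $indexable = array();
    foreach ($areaNames as $name) {
        // Compare in lowercase so "Sidebar" and "sidebar" are both excluded.
        if (!in_array(strtolower($name), $excluded, true)) {
            $indexable[] = $name;
        }
    }
    return $indexable;
}

// Sample area names are made up for illustration.
print_r(filterIndexableAreas(array('Header', 'Main', 'Sidebar', 'Footer Notes')));
```

The names that survive the filter are the ones that would then be passed, one at a time, to whatever call registers a searchable area.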
glockops replied:
It would be nice if the search function had a dashboard page where you could set custom settings.

I'd also like to see search statistics, an auto-suggest function built right in, and maybe the ability to take users directly to a certain page if they search for something specific.

Also, is there (or are there plans to add) some sort of comment marker that tells the site search not to index something? Previously, I was using Sphider (sphider.eu) for site search. You could enter <!--sphider_noindex--> and it would skip everything until you closed the comment.

But any way you look at it, C5 Search needs some TLC. I'd gladly pay for a marketplace block that had advanced search features.
zoki replied (1 attachment):
I've managed to solve most of the search problems on c5 5.3.2. I've created a new table in my c5 database:
CREATE TABLE `PageSearchIndexAll` (
  `cID` int(10) unsigned NOT NULL default '0',
  `cMeta` text,
  `content` text,
  `cName` varchar(255) default NULL,
  `cDescription` text,
  `cPath` text,
  `cDatePublic` datetime default NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;


I then modified database_indexed_search.php (see attachment). The select part of the query used to fetch the results starts on line 48, where you'll see MATCH ... AGAINST ... statements. Statements with "IN BOOLEAN MODE" return only 1 if the keywords were found and 0 otherwise. Statements without "IN BOOLEAN MODE" return weighted results but are less flexible (they don't always find keywords that boolean mode does; I forgot why, so check the MySQL docs to find out more). I multiply the results by different integers to tweak the output.

I've also set up search suggestions (I'll post that code later if anyone is interested) that go through keywords and page titles and return a page title as a suggestion, so I want pages with matching titles to come first in the search results. If the user didn't choose the suggested title, I want keywords to take precedence over content. For more info on MySQL's text search capabilities, look for "full text search" in their docs.

If you decide to use the attached file, put it in the libraries folder in the c5 root of your site and change the extension to php. I have not tested my modifications on 5.3.3.1, only on 5.3.2.
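For readers without the attachment, here is a rough sketch of what "multiplying the results by different integers" could look like. The weights, the helper name, and the assumption that cMeta holds the keywords are all hypothetical; the real values are in the attached file.

```php
<?php
// Hypothetical helper: builds a relevance expression in which each column's
// boolean-mode match (1 or 0, as described in the post) is scaled by a weight.
function buildScoreExpression($quotedKw, array $weights)
{
    $parts = array();
    foreach ($weights as $column => $weight) {
        $parts[] = '(MATCH(' . $column . ') AGAINST (' . $quotedKw . ' IN BOOLEAN MODE) * ' . (int) $weight . ')';
    }
    // Summing the weighted terms lets a title hit outrank a content-only hit.
    return implode(' + ', $parts);
}

// "'statement'" stands in for the output of $db->quote($kw).
// Assumed weights: title first, then keywords (cMeta), then content.
echo buildScoreExpression("'statement'", array('cName' => 10, 'cMeta' => 5, 'content' => 1)), "\n";
```

The resulting string would go in the SELECT list under an alias and be used in ORDER BY, so pages matching on title sort ahead of pages matching only on content.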

I also have one question.
In 5.3.3.1 I've seen two options when adding a new page attribute:
1. Content included in "Keyword Search".
2. Field available in "Advanced Search".
How do I enable these two search types? Or are they only for future use? I checked the query used for searching in 5.3.3.1: it joins the PageSearchIndex and CollectionSearchIndexAttributes tables, but the attributes table is not used in the select or where part of the query (?!). Am I missing something?
olacom replied:
bump!
hursey013 replied:
Any progress with the search engine in 5.3.3.1? I am unable to search on custom page attributes even with the "content included in keyword search" option checked. Is there a way to modify the SQL query to include attributes?
zoki replied:
Actually they are used in the query, but for some reason it's not consistent. If you include, for example, keywords in your search, users have to type all the keywords in exactly the same order as they are written on the page (and would probably get only one page in the results). If they want all pages containing just one keyword they have to add '%' to their search query, though that is not needed for searching content. Luckily this is easy to fix. Just change the line
$kw = $db->quote($kw);

in database_indexed_search.php to
$kw = $db->quote('%'.preg_replace( '/\s/', '%', $kw).'%');

This adds '%' to the front and back of the search query and also replaces every whitespace character with '%'. If you don't want to replace whitespace characters, replace the old line with this instead:
$kw = $db->quote('%'.$kw.'%');
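To make the difference between the two variants concrete, here is a small stand-alone sketch of the pattern each one produces before it is quoted. The helper name is hypothetical and the $db->quote() step is left out so it runs without concrete5.

```php
<?php
// Hypothetical helper, not part of concrete5: builds the LIKE pattern that
// the modified lines hand to $db->quote().
function buildLikePattern($kw, $replaceWhitespace = true)
{
    if ($replaceWhitespace) {
        // Each single whitespace character becomes one '%' wildcard.
        $kw = preg_replace('/\s/', '%', $kw);
    }
    return '%' . $kw . '%';
}

echo buildLikePattern('some value'), "\n";        // %some%value%
echo buildLikePattern('some value', false), "\n"; // %some value%
```

With the first form the words only need to appear in that order somewhere in the attribute value; with the second they must appear as one contiguous phrase. Note that neither variant escapes '%' or '_' typed by the user, so those characters still act as wildcards.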
medhatmsm replied:
@zoki: Thanks for the fix. That was really useful!

I'd appreciate it if you or somebody could help me with this one:

http://www.concrete5.org/community/forums/customizing_c5/existing-c...
hursey013 replied:
I'm slightly confused - does this fix address search specifically for custom page attributes, or just searching in general?
zoki replied:
The fix is for page attributes. Originally the LIKE operator is used for attributes, but without '%'. If a page attribute included in search has a value such as "some value text" and you search for "value", that page will not show up in the search results without the modification.
hursey013 replied:
Thanks for the quick reply. I'm curious what I'm doing incorrectly, because even with the updated line in my database_indexed_search.php I am still not getting any results for keywords that are found in the page attributes. I have not modified the core at all, and really don't have anything complicated going on other than setting up a few sections using page attributes and a page list.

For example, I have pages for each member of a management team, which are all built out using attributes. However, if I search for their name or title (which are attributes), nothing is returned. Do you have any ideas what I am missing?

Thanks again.
zoki replied:
Sorry, I don't know what the problem could be, but you can also try this.
Find this line in database_indexed_search.php:
$attribsStr=' OR ak_' . $ak->getAttributeKeyHandle() . ' like '.$kw.' ';

and replace it with
$attribsStr=' OR MATCH(ak_' . $ak->getAttributeKeyHandle() . ') AGAINST ('.$kw.' IN BOOLEAN MODE) ';

I've tested this and it works. It uses full text search on attributes, the same way it's used for content. If you do this, you don't need the previous modification. I'm already using this on my production web site.
moth replied:
Edit:
No I'm wrong. But I think I might have a fix.
hursey013 replied:
Do share... I'd love to be able to search attributes on my site. Like you, I have several sites that basically rely on this functionality, and zoki's fixes for one reason or another do not seem to help.
moth replied:
It's a bit messy. I spent some time trying to debug the SQL and there's something funny going on with the output of getAttributeKeyHandle(). I don't know how to output the full SQL - there used to be a debug log in the dashboard but now it's gone?

Anyway - rather than use that function I just decided to manually pass my Custom Attribute names. The SQL looks IDENTICAL but the only one that returned matches was with my hard-coded Custom Attribute names. I haven't got a clue why this would be the case as the getAttributeKeyHandle() function appears to return everything as it should.

So...,

In database_indexed_search.php

I replaced this:

<?php
$keys = CollectionAttributeKey::getSearchableIndexedList();
$attribsStr = '';
foreach ($keys as $ak) {
    $attribsStr=' OR ak_' . $ak->getAttributeKeyHandle() . ' like '.$kw.' ';
}
?>

with this:

<?php
$attribsStr=' OR MATCH(ak_Attribute_One, ak_Attribute_Two) AGAINST ('.$kw.' IN BOOLEAN MODE) ';
?>

And then I get results.

You will need to add each attribute name as you create it, so in that respect this isn't really a fix, merely a hack. You can find the names in the SQL table CollectionSearchIndexAttributes.

This is tested on 5.3.3.1.

Perhaps someone with better PHP skills can do something a bit more elegant with this. Furthermore, I had to add this to the core file, since I cannot for the life of me get it to work if I place a new copy of database_indexed_search.php in my top-level 'libraries' folder, but that's another issue completely.
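One thing worth checking: in the loop as quoted above, $attribsStr is assigned with = on every pass, so if the core file really reads that way, only the last attribute would survive the loop. The sketch below appends with .= instead, so the clause does not need hard-coded names. It is a stand-alone illustration, not tested against concrete5: the helper name is hypothetical, and $handles stands in for the values returned by getAttributeKeyHandle().

```php
<?php
// Hypothetical helper, not part of concrete5. $handles stands in for the
// handles of the searchable attribute keys; $quotedKw for $db->quote($kw).
function buildAttributeClause(array $handles, $quotedKw)
{
    $attribsStr = '';
    foreach ($handles as $handle) {
        // '.=' appends one condition per attribute; a plain '=' would
        // overwrite the string and keep only the last attribute.
        $attribsStr .= ' OR MATCH(ak_' . $handle . ') AGAINST (' . $quotedKw . ' IN BOOLEAN MODE) ';
    }
    return $attribsStr;
}

echo buildAttributeClause(array('Attribute_One', 'Attribute_Two'), "'smith'"), "\n";
```

This emits one MATCH per column rather than a single MATCH over several columns, which also sidesteps mixing differently typed columns in one column list.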
moth replied:
Perhaps someone can enlighten me as to how to post code ;)
matogertel replied:
Ok, a few notes here:
- The attribute search is a bit buggy in 5.3.3.1. I specifically found problems searching for values in select lists with multiple values allowed.
- To include attributes in a keyword search, you need to tick Content included in "Keyword Search" in the attribute edit screen.
- If you add date, checkbox or numeric attributes to keyword search, nothing will work. This is a bug (or a feature) in MySQL, which doesn't like mixing text fields and binary fields in a "match against" query.
- In my search tools package, rather than working on database_indexed_search I worked on models/page_list.php. In fact I completely replaced the page_list with my own version, tuned to my needs. I think that is where you should be looking.
- Also, to make things worse (or better), every attribute (in models/attribute/types) can give the page list its own way of searching. The page list accesses this by calling "searchForm" in the attribute controller. As I said, every attribute does whatever it wants in that function, so results will not always be what you expect.
hursey013 replied:
So for you, if you tick "Content included in keyword search" and the custom attribute is a text field, it is searchable by concrete5 out of the box? For me, following those steps still does not seem to return any results.
moth replied:
It doesn't work out of the box for me, hursey.

I had to make the hacks as I outlined above, AND tick those boxes.

I think I've read most threads pertaining to this issue and it could well be that it works out-of-box for some and not for others. Could be MySQL versions, database collations, any number of different issues...