.htaccess - robots.txt that allows only certain files and folders and disallows everything else


I want my robots.txt to allow index.php and the images folder, and to disallow all other folders. Is this possible?

This is my code:

User-agent: *
Allow: /index.php
Allow: /images
Disallow: /

Secondly, is it possible to do the same job with .htaccess?

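First, be aware that the "Allow" option is an extension that was not part of the original robots.txt standard, and it is not supported by all crawlers. See the wiki page (in the "Nonstandard extensions" section) and the robotstxt.org page, which says:

This is currently a bit awkward, as there is no "Allow" field. The easy way is to put all files to be disallowed into a separate directory, say "stuff", and leave the one file in the level above this directory.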
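A minimal sketch of that layout, checked with Python's standard urllib.robotparser module. The directory name "stuff" comes from the quoted text; the file name notes.html and the user agent "ExampleBot" are hypothetical and only there for illustration.

```python
from urllib.robotparser import RobotFileParser

# robots.txt that relies only on the widely supported Disallow field.
# Everything that should stay out of the index lives under /stuff/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /stuff/
"""


def build_parser(text):
    """Parse robots.txt text held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" and "notes.html" are hypothetical names.
index_ok = parser.can_fetch("ExampleBot", "/index.php")
stuff_ok = parser.can_fetch("ExampleBot", "/stuff/notes.html")

print("index.php crawlable:", index_ok)
print("/stuff/notes.html crawlable:", stuff_ok)
```

Because no Allow line is involved, this works the same way for any crawler that honours Disallow.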

Some major crawlers do support it, but frustratingly they handle it in different ways. For example, Google prioritises Allow statements by matching characters and path length, whereas Bing prefers that you put the Allow statements first. The example you've given above will work in both cases, though.
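The two behaviours described above can be modelled roughly as follows. This is a simplified sketch, not either crawler's real implementation: it handles plain path prefixes only (no wildcards), and the function names and the /admin/login.php path are hypothetical.

```python
# Rules as (directive, path_prefix) pairs, in the order they appear
# in the robots.txt from the question.
RULES = [
    ("allow", "/index.php"),
    ("allow", "/images"),
    ("disallow", "/"),
]


def first_match(rules, path):
    """Order-based model: the first rule whose prefix matches decides."""
    for directive, prefix in rules:
        if path.startswith(prefix):
            return directive == "allow"
    return True  # no rule matched, so crawling is allowed


def longest_match(rules, path):
    """Length-based model: the matching rule with the longest prefix
    decides; on a tie in length, the allow rule wins."""
    matches = [
        (len(prefix), directive == "allow")
        for directive, prefix in rules
        if path.startswith(prefix)
    ]
    if not matches:
        return True
    return max(matches)[1]


PATHS = ["/index.php", "/images/logo.png", "/admin/login.php"]
results = {p: (first_match(RULES, p), longest_match(RULES, p)) for p in PATHS}

for path, (by_order, by_length) in results.items():
    print(path, "first-match:", by_order, "longest-match:", by_length)
```

With the rules in the order given in the question, both models agree on every path. If the Disallow line were moved to the top, the order-based model would block everything while the length-based model would be unaffected, which is why keeping the Allow lines first is the safer layout.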

Bear in mind that crawlers that do not support it will simply ignore it and will therefore see only the "Disallow" rule, which stops them indexing your entire site! You have to decide whether the work of moving files around (or writing a long list of Disallow rules for all your subdirectories) is worth the bonus of getting indexed by lesser crawlers. Probably not.
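To see the effect, here is a rough sketch that models an Allow-unaware crawler by dropping the Allow lines before parsing with Python's standard urllib.robotparser. The helper name, the honour_allow flag, and the "ExampleBot" user agent are hypothetical; real crawlers may behave differently.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the question.
ROBOTS_TXT = """\
User-agent: *
Allow: /index.php
Allow: /images
Disallow: /
"""


def can_fetch(robots_txt, path, honour_allow=True):
    """Return whether a crawler may fetch `path`.

    With honour_allow=False the Allow lines are removed first, which
    models a crawler that does not recognise the Allow field.
    """
    lines = robots_txt.splitlines()
    if not honour_allow:
        lines = [ln for ln in lines if not ln.lower().startswith("allow:")]
    parser = RobotFileParser()
    parser.parse(lines)
    return parser.can_fetch("ExampleBot", path)  # hypothetical bot name


supporting = can_fetch(ROBOTS_TXT, "/index.php", honour_allow=True)
ignoring = can_fetch(ROBOTS_TXT, "/index.php", honour_allow=False)

print("crawler that supports Allow:", supporting)
print("crawler that ignores Allow:", ignoring)
```

For the crawler that ignores Allow, only "Disallow: /" remains, so even index.php is off limits.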

As for .htaccess, it can't do anything useful here. You'd have to match the user agent against a large list of known bots, and you'd end up missing some or, worse, blocking real users.

