Sunday, June 24, 2012

Shebang/hashbang and the Single Page Interface: good or bad?

We have all noticed long URLs containing # or #! on Twitter and Facebook. The part introduced by the hash mark (#) is known as the fragment identifier: the optional last part of a URL for a document, typically used to identify a portion of that document.
The behaviour of # depends on the document's MIME type; in a PDF, for example, it acts in a different manner than in HTML.

Facebook and other JavaScript-driven applications use this technique because they want to make pages indexable and bookmarkable and to support the back button, all without reloading the entire page from the server. This approach is called the Single Page Interface, and it is based on JavaScript routing.
When we use #!, the URL is considered "AJAX crawlable." So if the URL is /ajax.html#!key=value, the crawler temporarily rewrites it as /ajax.html?_escaped_fragment_=key=value. This happens because hash fragments are never (by specification) sent to the server as part of an HTTP request, so the crawler needs some way to let your server know that it wants the content for the URL with the #.
The server, on the other hand, needs to know that it has to return an HTML snapshot rather than the normal page sent to the browser. Here a snapshot means all the content that appears on the page after the JavaScript has been executed.
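To make this concrete, here is a minimal sketch (not Facebook's or Google's actual code) of a Node.js handler that spots the crawler's _escaped_fragment_ parameter and returns a pre-rendered snapshot; the inline HTML strings are placeholders.

// Sketch only: serve an HTML snapshot when the crawler rewrites
// /ajax.html#!key=value into /ajax.html?_escaped_fragment_=key=value.
var http = require('http');
var url = require('url');

http.createServer(function (req, res) {
  var parsed = url.parse(req.url, true); // true => parse the query string
  res.writeHead(200, { 'Content-Type': 'text/html' });
  if (parsed.query._escaped_fragment_ !== undefined) {
    // The crawler is asking for the content behind the #! fragment.
    res.end('<html><body>Snapshot for fragment: ' +
            parsed.query._escaped_fragment_ + '</body></html>');
  } else {
    // Normal browsers get the JavaScript-driven page.
    res.end('<html><body><div id="content"></div><script src="app.js"></script></body></html>');
  }
}).listen(8080);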

Other uses besides AJAX
We can also use #! for pagination. URLs like blog/topic/page/1 and blog/topic/page/2 can appear as duplicate content, and Google doesn't like that, so in this case hashbangs may be the better option, or we can simply put a robots noindex on any page that is a pagination of another page.

Another benefit of this technique is that loading page content through AJAX and then injecting it into the current DOM can be much faster than loading a new page. In addition to the speed increase, further tricks, such as loading certain portions in the background, can be performed under the programmer's control.
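As an illustration, client-side hashbang routing boils down to listening for hash changes, fetching a fragment over AJAX, and injecting it into the DOM. The #content element and the /fragments/ URL scheme below are assumptions, not part of any particular framework.

// Sketch of hash-based routing: no full page reload, just a DOM update.
window.onhashchange = function () {
  var route = location.hash.replace(/^#!?/, ''); // "#!profile" -> "profile"
  var xhr = new XMLHttpRequest();
  xhr.open('GET', '/fragments/' + route + '.html', true);
  xhr.onload = function () {
    document.getElementById('content').innerHTML = xhr.responseText;
  };
  xhr.send();
};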

Issues

With hashbang URLs, the browser needs to download an HTML page, download and execute some JavaScript, recognise the hashbang path (which is only visible to the browser), and then fetch and render the content for that URL. So by removing it, we can reduce the page load time.


Spiders and search indexers can and do sometimes implement JavaScript runtimes. However, even in this case there’s no well recognised way to say ‘this is a redirect’ or ‘this content is not found’ in a way that non-humans will understand.

Also, the code will not be maintainable unless we write the front-end (JavaScript) in a modular way; otherwise it will be very hard to add further enhancements and support the existing code.

Solution

location.hash was a way for AJAX applications to get back button and bookmarking support.

HTML5 now introduces pushState. It provides a way to change the URL displayed in the browser through JavaScript without reloading the page.

window.history.pushState(data, "Title", "/new-url");

In order to support the back and forward buttons, we must be notified when they are clicked. We can do that using the window.onpopstate event. This event gives access to the state data that was passed to pushState earlier. Of course, we can also manually go back and forward with the standard history functions.
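Putting pushState and onpopstate together, a minimal sketch (the #content element and the URLs are assumptions) could look like this:

// Load a page fragment over AJAX and inject it into the current DOM.
function loadContent(url) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url, true);
  xhr.onload = function () {
    document.getElementById('content').innerHTML = xhr.responseText;
  };
  xhr.send();
}

// Navigate without a full reload and record the new URL in the history.
function navigate(url) {
  loadContent(url);
  window.history.pushState({ url: url }, '', url);
}

// Back/forward buttons fire popstate; event.state is what we passed to pushState.
window.onpopstate = function (event) {
  if (event.state) {
    loadContent(event.state.url);
  }
};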

Currently, pushState has support from the latest versions of Safari and Chrome, and Firefox 4 will be supporting it as well. It is worth noting that Flickr is already using the API in their new layout.
Libraries like jQuery BBQ are starting to support this feature with a fallback to the old hash trick.

The hard part is that support for history.pushState in Internet Explorer does not appear to be forthcoming. That makes the argument that browsers are quickly adopting that feature pretty dubious since IE accounts for a good 30-40% of traffic.

I found a very nice case study on this; it shows how hashbangs impact page routing and what the basic pitfalls are.

Tuesday, June 12, 2012

Open-source JS file loaders: LABjs review

This is a heated topic, and people keep doing things differently (http://www.phpied.com/preload-cssjavascript-without-execution/) to load HTML resources without blocking.

I have several thoughts on it. I spent some time reviewing the LABjs codebase and reached a few conclusions.

LABjs plays with the ready event and the XHR technique to load files (see how many ways we can load JS files without blocking: http://www.stevesouders.com/blog/2009/04/27/loading-scripts-without-blocking/).

The ready event occurs after the HTML document has been loaded, while the onload event occurs later, when all content (e.g. images etc) also has been loaded.

The onload event is a standard event in the DOM, while the ready event is browser-specific (so it requires various browser checks).

The purpose of the ready event is that it should occur as early as possible after the document has loaded, so that code that adds functionality to the elements in
the page doesn't have to wait for all content to load.

As I said, the ready event is not a standard event, so it needs different cross-browser checks, and it is not guaranteed to always give the correct result.

From time to time people have tried to detect the ready event reliably, but it is not foolproof.
---------------------------------------------------------------------
Let's check how reliable the ready event really is
---------------------------------------------------------------------
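Here is a rough sketch of the kind of branching a loader has to do just to detect "ready"; the old-IE fallback shown is one of several known tricks, and none of them is guaranteed to fire at exactly the right moment.

// Standards browsers: DOMContentLoaded fires once the HTML is parsed,
// before images and other subresources finish loading.
// Old IE: no DOMContentLoaded, so watch document.readyState instead,
// which can fire later than the real "DOM parsed" moment.
function onReady(fn) {
  if (document.addEventListener) {
    document.addEventListener('DOMContentLoaded', fn, false);
  } else {
    document.attachEvent('onreadystatechange', function () {
      if (document.readyState === 'complete') {
        fn();
      }
    });
  }
}

onReady(function () {
  // Safe to query and modify the DOM here.
});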

------------------------------
LABjs history
------------------------------


We have seen different script loaders, but I think LABjs's goal is a bit different from the others: it enables parallel downloading of JavaScript files while maintaining execution order.
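For reference, typical LABjs usage looks roughly like this (file names are placeholders): all scripts in the chain download in parallel, and .wait() forces execution order between the groups.

$LAB
  .script('framework.js')
  .wait()                          // wait for framework.js to execute first
  .script('plugin.framework.js')
  .script('myplugin.framework.js')
  .wait()
  .script('init.js');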


To do so, LABjs needs to know which browsers allow parallel downloads by default and then provide
other solutions for the browsers that don’t.

All loaders are using various browser detection techniques to determine the correct action
to optimize script loading.

But browser detection is still not foolproof in JS.

http://www.nczonline.net/blog/2009/12/29/feature-detection-is-not-browser-detection/


Unfortunately, LABjs has failed in the past to maintain execution order. See the links below.

https://twitter.com/getify/status/26109887817 [ by Kyle]

See Kyle's blog post (http://blog.getify.com/ff4-script-loaders-and-order-preservation/comment-page-1/) and, in particular, the comment from the guy who wrote the Firefox Gecko engine code
(http://blog.getify.com/ff4-script-loaders-and-order-preservation/comment-page-1/#comment-748).

Finally, Kyle maintains a wiki page (http://wiki.whatwg.org/wiki/Dynamic_Script_Execution_Order) listing these issues (caused by the differing behaviour of browsers such as Mozilla Gecko and WebKit) so that browser vendors can internalise them and come up with ways to solve them.

----------------------------
negative side effects
----------------------------
In the links below, Kyle mentions that LABjs will not work properly:

  • if the page uses document.write
  • if the codebase uses the ready event poorly
  • other negative effects are also mentioned; nicely abbreviated by Kyle as "FUBC" (flash of un-behaviored content)
http://labjs.com/description.php#whennottouse

http://blog.getify.com/labjs-new-hotness-for-script-loading/#ux-labjs

-----------------------------
Why do we need LABjs?
-----------------------------

First, all loader scripts are trying to enable parallel downloading of JavaScript resources. That’s a worthy goal but one that’s already being handled by newer browsers.

See the link below, where I load 19 HTML resources in the normal manner and the JS files are still downloaded in parallel.

http://172.16.3.228/public/test.html



Second, LABjs is very focused on maintaining script execution order. With this comes an assumption that we want to download multiple JavaScript files that have dependencies on one another. This is something I don't recommend, but I think some people feel it's important.



Third, other loaders are focused on separating the download and the execution of JavaScript: the idea is to download a JavaScript file but not execute it until a point in time that we determine (RequireJS does this job in a very decent manner). The assumption here is that the page is progressively enhanced, so the JavaScript isn't needed immediately. LABjs doesn't address this problem, and browsers aren't helping with it either.
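One way (among several) to separate download from execution is to fetch the script text over XHR, cache it, and execute it later by injecting an inline script element. This is only a sketch of the idea, not how LABjs or RequireJS actually implement it, and it only works for same-origin scripts.

var scriptCache = {};

// Download now, but don't run anything yet.
function preload(url) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url, true);
  xhr.onload = function () {
    scriptCache[url] = xhr.responseText;
  };
  xhr.send();
}

// Execute later, at a moment we choose.
function execute(url) {
  var script = document.createElement('script');
  script.text = scriptCache[url]; // runs synchronously when appended
  document.getElementsByTagName('head')[0].appendChild(script);
}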

So all these loaders require constant monitoring and updating as new browser versions come out. Maintenance already matters in JS-based websites, and these libraries add maintenance overhead that isn't necessary.

----------------------------------------------------
Then what does LazyLoad actually do?
---------------------------------------------------

At shiksha.com we use the loadScript and upLoadJsOnDemand functions for parallel downloading.

We can easily grep the codebase for "loadScript" or "upLoadJsOnDemand".

Actually, LazyLoad was introduced for a different purpose: it serves files which are not required for the main page rendering.
Some of its features are listed below, with a rough sketch of such a helper after the list.

1. load a file only once
2. load a file after the onload event (before the LazyLoad API, it was not possible to load JS files from within JavaScript code)
3. callback function execution
4. pass objects to the callback function in global scope
5. callback function execution even if the files are already loaded
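A hypothetical sketch of what such a helper might look like (the real loadScript/upLoadJsOnDemand functions in our codebase may differ in detail):

var loadedScripts = {};

function loadScript(url, callback) {
  if (loadedScripts[url]) {
    // Features 1 and 5: never load the same file twice, but still run the callback.
    if (callback) { callback(); }
    return;
  }
  var script = document.createElement('script');
  script.src = url;
  script.onload = function () {   // simplified: old IE needs onreadystatechange
    loadedScripts[url] = true;
    if (callback) { callback(); } // feature 3: run the callback after the load
  };
  document.getElementsByTagName('head')[0].appendChild(script);
}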




Sunday, June 10, 2012

PHP's future: PHP unframework

PEAR and PECL are well-known bundles of PHP libraries. There is one more, eZ Components,
which was designed for enterprise applications.

Recently I got a chance to look at a few more package collections along the lines of Rails gems, CPAN, or NPM (Node.js packages).

This is called a "PHP unframework": a set of general-purpose, object-oriented libraries. It has a modular architecture, meaning it isn't strictly MVC. It focuses on being secure, well documented and easy to use, while solving problems intrinsic to web development.

Few good examples are:

  • Flourish
  • Spoon http://www.spoon-library.com/
  • http://getcomposer.org/

I read a good article arguing that Symfony2 (the latest version) isn't an "MVC framework"; it's just a bunch of loosely coupled components working nicely together. The article was written by the lead Symfony developer.

http://fabien.potencier.org/article/49/what-is-symfony2

Much of this is still waiting to take its final shape, so stay tuned and get ready to use these awesome libraries.

Happy Coding !!!

storing secure encrypted passwords

We recently got news that LinkedIn's password hashes were hacked. The common practice in web development is to hash the user's password and store the resulting hash string, believing that the chance of two distinct strings having the same hash is so low that it is deemed mathematically impossible; but computers have become so powerful that SKYNET-level computing power can change the story.



Rainbow tables (http://en.wikipedia.org/wiki/Rainbow_table) precompute the mapping from hash strings back to possible combinations of keyboard characters. With trillions of records in rainbow tables, it takes only 160 seconds to crack the password “Fgpyyih804423”, which we would presume to be fairly safe (http://www.codinghorror.com/blog/2007/09/rainbow-hash-cracking.html).




Finally, what should we do?


1. Avoid storing plain-text passwords; always store salted hashes. Storing plain-text passwords is bad practice; in fact, "bad practice" is an understatement, and "irresponsible" might be more accurate.
Storing passwords in plain text is an embarrassing security breach waiting to happen.
Reasonable efforts must be made to keep sensitive data secure; storing passwords in plain text clearly violates this and in turn potentially voids any indemnity insurance you have in the event of a security breach.



  1. Create a random salt for each hash and store it in the DB.
  2. Avoid MD5; use SHA-256 or a Blowfish-based algorithm such as bcrypt.
  3. Use a strong algorithm to generate the random salt/key.
  4. If you are using a private key or similar secret, store it in an environment variable or in a file that no one else can access.




// generate_random_salt() is pseudocode for your own salt generator
$salt = generate_random_salt();

// store both $salt and $my_hash in the database for this user
$my_hash = sha1($salt . $secret);


Since the cracker has no idea what the salt is, there is no way he can use a precomputed rainbow table to
perform the crack. Even if he learns the salt, he would have to build a rainbow table specifically for your database, which is time-consuming. To make things even more difficult for the cracker, use a different salt for each password entry in the database.


2. Finally, it is recommended (http://chargen.matasano.com/chargen/2007/9/7/enough-with-the-rainbow-tables-what-you-need-to-know-about-s.html) to generate the stored hash by running 1000 iterations of hashing instead of just one. The extra computing burden on your server is negligible, while it increases the time needed to crack a single password by a factor of 1000 at the cracker's end. The point is to make the hashing process as slow as possible, not as fast as possible: since cracking makes password guesses and trial logins at a much higher pace than legitimate logins, the slowness hurts the cracker far more than it hurts your website.
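As a sketch of the "slow, iterated, salted hash" idea (shown in Node.js here, although the snippet above is PHP; the iteration count is only an example), PBKDF2 does exactly this kind of repeated hashing:

var crypto = require('crypto');

function hashPassword(password) {
  var salt = crypto.randomBytes(16).toString('hex');   // random per-user salt
  var iterations = 10000;                              // tune so hashing is deliberately slow
  var hash = crypto.pbkdf2Sync(password, salt, iterations, 32, 'sha256').toString('hex');
  return { salt: salt, iterations: iterations, hash: hash }; // store all three
}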

Main Key Areas:


Data storage - Store the passwords far away from where an attacker can get in. 


Encryption - Any encryption technique is a delaying tactic: once the attacker has your data, they will eventually crack your encryption given enough time. So mostly you're aiming to slow them down long enough for the rest of the system to discover you've been hacked, alert your users, and give the users time to change passwords or disable accounts.


Key storage - Encryption is only as good as your key storage. If the key is sitting right next
to the encrypted data, then it stands to reason that the attacker doesn't need to break your crypto, they just use the key. 


Intrusion detection - Have a good system in place that has a good chance of raising alarms if you should get hacked. If your password data is compromised, you want to get the word to your users well ahead of any threat.


Audit logging - Have really good records of who did what on the system - particularly in the
vicinity of your passwords.