Sunday, April 20, 2014

awk and log parsing


Find the number of total unique visitors:

cat access.log | awk '{print $1}' | sort | uniq -c | wc -l

2. Find the number of unique visitors today:

cat access.log | grep `date '+%e/%b/%G'` | awk '{print $1}' | sort | uniq -c | wc -l

3. Find the number of unique visitors this month:

cat access.log | grep `date '+%b/%G'` | awk '{print $1}' | sort | uniq -c | wc -l

4. Find the number of unique visitors on arbitrary date – for example March 22nd of 2007:

cat access.log | grep 22/Mar/2007 | awk '{print $1}' | sort | uniq -c | wc -l

5. (based on #3) Find the number of unique visitors for the month of March:

cat access.log | grep Mar/2007 | awk '{print $1}' | sort | uniq -c | wc -l

6. Show the sorted statistics of “number of visits/requests” “visitor’s IP address”:

cat access.log | awk '{print "requests from " $1}' | sort | uniq -c | sort

7. Similarly by adding “grep date”, as in above tips, the same statistics will be produces for “that” date:

cat access.log | grep 26/Mar/2007 | awk '{print "requests from " $1}' | sort | uniq -c | sort

Most Common 404s (Page Not Found)
cut -d'"' -f2,3 /var/log/apache/access.log | awk '$4=404{print $4" "$2}' | sort | uniq -c | sort -rg

2 - Count requests by HTTP code

cut -d'"' -f3 /var/log/apache/access.log | cut -d' ' -f2 | sort | uniq -c | sort -rg

3 - Largest Images
cut -d'"' -f2,3 /var/log/apache/access.log | grep -E '\.jpg|\.png|\.gif' | awk '{print $5" "$2}' | sort | uniq | sort -rg

4 - Filter Your IPs Requests
tail -f /var/log/apache/access.log | grep

5 - Top Referring URLS
cut -d'"' -f4 /var/log/apache/access.log | grep -v '^-#39; | grep -v '^http://www.yoursite.com' | sort | uniq -c | sort -rg

6 - Watch Crawlers Live
For this we need an extra file which we'll call bots.txt. Here's the contents:


Bot
Crawl
ai_archiver
libwww-perl
spider
Mediapartners-Google
slurp
wget
httrack


This just helps is to filter out common user agents used by crawlers.
Here's the command:
tail -f /var/log/apache/access.log | grep -f bots.txt

7 - Top Crawlers
This command will show you all the spiders that crawled your site with a count of the number of requests.
cut -d'"' -f6 /var/log/apache/access.log | grep -f bots.txt | sort | uniq -c | sort -rg


How To Get A Top Ten
You can easily turn the commands above that aggregate (the ones using uniq) into a top ten by adding this to the end:
| head

That is pipe the output to the head command.
Simple as that.

Zipped Log Files
If you want to run the above commands on a logrotated file, you can adjust easily by starting with a zcat on the file then piping to the first command (the one with the filename).

So this:
cut -d'"' -f3 /var/log/apache/access.log | cut -d' ' -f2 | sort | uniq -c | sort -rg
Would become this:
zcat /var/log/apache/access.log.1.gz | cut -d'"' -f3 | cut -d' ' -f2 | sort | uniq -c | sort -rg

confusion related to HMVC design in codeigniter

There are two main different features that HMVC adds to CodeIgniter which often confuses people:
  1. Modular MVC
  2. Hierarchal MVC
Modular MVC is the feature that most people want to use and is essentially just a way to have a cleaner folder structure.
HMVC is the practise of calling controllers from other controllers without the need for a new HTTP request. This is very rarely useful in my opinion, other than for things like calling a custom 404 page or "widgets".
MMVC adds barely anything to performance, calling a controller via HMVC is obviously almost twice as slow.

Either way neither will be noticeable. If your site is starting to crawl under high traffic then this is one of the last things you'll need to worry about.

nice website based on HTML5

https://chains.cc/
http://hypem.com/popular/
http://pinterest.com/
http://www.endomondo.com/
http://habitforge.com/
http://visual.ly/
http://www.evernote.com/
http://www.thesixtyone.com/
https://www.crunch.co.uk/
https://andbang.com/
http://kippt.com/
http://101in365.com/
http://mugtug.com/
http://www.tweetaboogle.com/
http://www.zocial.tv/
http://www.dearmap.com/

Technology related Blogs You must read

Here is the list. :-) Few links could be dead so please report me about broken links so that i can update.

meminfo Explained

"Free," "buffer," "swap," "dirty." What does it all mean? If you said, "something to do with the Summer of '68", you may need a primer on 'meminfo'.

The entries in the /proc/meminfo can help explain what's going on with your memory usage, if you know how to read it.
Example of "cat /proc/meminfo":
root: total:     used:     free:    shared: buffers: cached:
Mem:   1055760384 1041887232 13873152 0 100417536  711233536
Swap:  1077501952   8540160  1068961792     
  • MemTotal: 1031016 kB
  • MemFree: 13548 kB
  • MemShared: 0 kB
  • Buffers: 98064 kB
  • Cached: 692320 kB
  • SwapCached: 2244 kB
  • Active: 563112 kB
  • Inact_dirty: 309584 kB
  • Inact_clean: 79508 kB
  • Inact_target: 190440 kB
  • HighTotal: 130992 kB
  • HighFree: 1876 kB
  • LowTotal: 900024 kB
  • LowFree: 11672 kB
  • SwapTotal: 1052248 kB
  • SwapFree: 1043908 kB
  • Committed_AS: 332340 kB
  • The information comes in the form of both high-level and low-level statistics. At the top you see a quick summary of the most common values people would like to look at. Below you find the individual values we will discuss. First we will discuss the high-level statistics.

High-Level Statistics

  • MemTotal: Total usable ram (i.e. physical ram minus a few reserved bits and the kernel binary code)
  • MemFree: Is sum of LowFree+HighFree (overall stat)
  • MemShared: 0; is here for compat reasons but always zero.
  • Buffers: Memory in buffer cache. mostly useless as metric nowadays
  • Cached: Memory in the pagecache (diskcache) minus SwapCache
  • SwapCache: Memory that once was swapped out, is swapped back in but still also is in the swapfile (if memory is needed it doesn't need to be swapped out AGAIN because it is already in the swapfile. This saves I/O)



Detailed Level Statistics

VM Statistics


VM splits the cache pages into "active" and "inactive" memory. The idea is that if you need memory and some cache needs to be sacrificed for that, you take it from inactive since that's expected to be not used. The vm checks what is used on a regular basis and moves stuff around.
When you use memory, the CPU sets a bit in the pagetable and the VM checks that bit occasionally, and based on that, it can move pages back to active. And within active there's an order of "longest ago not used" (roughly, it's a little more complex in reality). The longest-ago used ones can get moved to inactive. Inactive is split into two in the above kernel (2.4.18-24.8.0). Some have it three.
  • Active: Memory that has been used more recently and usually not reclaimed unless absolutely necessary.
  • Inact_dirty: Dirty means "might need writing to disk or swap." Takes more work to free. Examples might be files that have not been written to yet. They aren't written to memory too soon in order to keep the I/O down. For instance, if you're writing logs, it might be better to wait until you have a complete log ready before sending it to disk.
  • Inact_clean: Assumed to be easily freeable. The kernel will try to keep some clean stuff around always to have a bit of breathing room.
  • Inact_target: Just a goal metric the kernel uses for making sure there are enough inactive pages around. When exceeded, the kernel will not do work to move pages from active to inactive. A page can also get inactive in a few other ways, e.g. if you do a long sequential I/O, the kernel assumes you're not going to use that memory and makes it inactive preventively. So you can get more inactive pages than the target because the kernel marks some cache as "more likely to be never used" and lets it cheat in the "last used" order.


Memory Statistics

  • HighTotal: is the total amount of memory in the high region. Highmem is all memory above (approx) 860MB of physical RAM. Kernel uses indirect tricks to access the high memory region. Data cache can go in this memory region.
  • LowTotal: The total amount of non-highmem memory.
  • LowFree: The amount of free memory of the low memory region. This is the memory the kernel can address directly. All kernel datastructures need to go into low memory.
  • SwapTotal: Total amount of physical swap memory.
  • SwapFree: Total amount of swap memory free.
  • Committed_AS: An estimate of how much RAM you would need to make a 99.99% guarantee that there never is OOM (out of memory) for this workload. Normally the kernel will overcommit memory. That means, say you do a 1GB malloc, nothing happens, really. Only when you start USING that malloc memory you will get real memory on demand, and just as much as you use. So you sort of take a mortgage and hope the bank doesn't go bust. Other cases might include when you mmap a file that's shared only when you write to it and you get a private copy of that data. While it normally is shared between processes. The Committed_AS is a guesstimate of how much RAM/swap you would need worst-case.

Android Memory Management

Android OS, as of V2.2 (and 2.3) have two general types of memory: internal storage (sometimes known as application stroage), and SD card (which may be flash memory that works like SD card, but not physical SD card, such as in the Nexus S).

All apps (in the form of APKs) are loaded into "app storage" part of the "ROM" (actually flash RAM). Part of the ROM is the boot ROM which loads the system. The other half of the ROM is "app storage". For example, in Motorola Droid, 256MB is RAM, and 512MB is ROM. Out of 512MB ROM, 256MB is Android System itself (actually a bit less), and the rest is "app storage", to max of 256MB.

With Android 2.2 and "Move2SD", a portion of the APK can be moved onto the "SD card", but main portion must remain in [s]internal[/s] app storage. The size of the main portion that stays would depend on the app. Some apps cannot be moved or will not function if moved. "Protected" apps cannot be moved. Apps that primarily consist of a service and a widget may not work if moved. add Services or widgets needed for startup should not be moved.

For example, If you have a 256MB system (shows as 262MB due to 1024 vs. 1000 KB size difference) and have 130MB of apps and data/cache loaded, then that leaves about 130MB for the system to actually RUN programs. That sounds like a lot, but in reality that is not enough, since the system itself takes 50-80MB, and services will take up another 30-50MB, leaving almost nothing. 

256MB RAM phone such as my Moto Droid, AutoKiller shows...

  • acore : 4.55MB (system)
  • dialer: 8.95MB (system)
  • system: 20.38MB (system)
  • autokiller:5.68MB
  • messaging: 3.41MB (system)
  • Swiftkey: 6.59MB
  • JuiceDefender 4.14MB
  • Calendar Storage 4.1 MB (system)
  • acore: 7.7MB (different pid) (system)
  • smart taskbar 3.81 MB
  • seePU 3.44MB
  • Screebl 4.38MB
  • SetCPU: 3.83MB
  • ATK Froyo 3.01MB
  • gapps: 7.79MB (system)
  • and 2 more at 4.66MB and 3.56MB
That adds up to... 99.88, or 100 MB.

But that is supposed to leave 156MB, right? Wrong. The system itself takes about 100 MB by itself, in addition to loaded programs, according to this thread about T-Mobile G1 (which has 192 MB of RAM, and has about 96000 KB after booting)

  • MemTotal: 231740 KB, or 226MB
  • MemFree: 3376
  • Buffers 272
  • Cache: 34960
  • SwapCache: 0

So the system (before OS kernel) uses about 30MB leaving about 226 MB
Cache itself used another 35 MB. , leaving about 189 MB
Minus 100 MB of auto-loaded apps, and you get... 89 MB.

If you run any programs that need more than that, programs and services will be killed to make room.

Native vs. Dalvik

There are two types of Android programs... "Native" programs, and VM programs.

Native programs are written for the specific CPU in the machine. While this gives better performance, this is much harder to achieve, so most people write program for the VM, or "Virtual Machine".

A "virtual machine" is basically a CPU emulator. You feed it a program, and it will run this program, as if it's a real CPU. The good thing about using a VM is it doesn't matter what the actual physical CPU the device uses. You write the program once, and never have to worry about converting it to other CPUs.

Android's VM is called Dalvik, and it is similar to Java's virtual machine. (In fact, Sun/Oracle sued Google for violating Java copyrights on JVM)


Different pieces of a single app

Most apps have either just an activity, or activity along with a service.

"Activity" is basically the user interface that takes your inputs and displays something back. Foreground app would be an activity.

"Service" is a background program that updates something. Common services includes input, widget updates, mail notification, and so on. Other services include Bluetooth, network updates, and so on.

(Actually there are two more types: broadcast receiver, and content provider, but those are not that pertinent to our discussion)

An app can use a widget, and the widget can use a lot of memory, usually several MB at once. You can see the different services and how much memory they are taking under Settings / Applications / Manage Services


How Services Use Memory

As explained above, Android OS have to run programs from within the limited space available, which, on older phones, isn't much. [s]From within that much memory, it needs system work space to load all the services (you probably have a dozen loaded, taking up at least 30 MB) System itself uses about 60-70 MB (acore, phone, gapps, messaging, etc.) That's 100 MB used. That doesn't leave much memory for anything else, if you have 100 MB of apps loaded. (256-100-100=56)[/s] 100MB for system itself, about 100MB used for apps and services, and you got almost nothing left.

If you look at the services screen, at the bottom, there's a bar: red, yellow, and green. There is a number in the red section and some in green. Your services adds up to the number in the green section. The yellow portion is some memory that can be freed. The red stuff are system stuff and can't be moved.


What Happens When System Runs Out of Memory

When the system needs to load programs, but don't see enough available, it will start killing programs and services (to the system, they are all considered "process") from memory based on the following priority:
Empty App: the app is in standby, not being used, but is still in memory. These can be killed without any effect.
Content Provider: process that provides content to the foreground, such as "contacts content provider", "calendar content provider", and so on. Various "storage" are also content providers. Those can be restarted when needed.

Hidden Application: apps not visible, but still running in the background. These are not exactly running, so killing them should have no serious consequences.

Secondary server: services that stay in background and apps such as Launcher (or other home replacements). Most services go here, like music player, clock updater, background sync, and so on, that's not built into the OS. If these are killed there may be some problems, such as the playback is interrupted, background sync stops, widget no longer updates, and so on.

Visible app: the app is running and visible, but due to multi-tasking or such is not currently "on top". Any program with a display in the notification area is considered "visible". Android OS will not kill these programs unless absolutely necessary, but it can happen.

Foreground app: you see this app on screen, currently running, but also includes the system itself and "phone". These are never killed. In any case, system and phone have much higher priority than any app to make sure those are never killed.

Each category above has a certain number associated with it, sometimes known as a "minfree" value (in either "pages" or megabytes, depending on the app). When Android OS free memory drops below the minfree value for that category, apps in that category are killed. The killing starts Empty App group as that has the highest number. if that's not enough, it then starts killing apps in the Content Provider Group, and it keeps going until it has finally freed up enough memory to load the app and all related processes (such as services).

NOTE: Having a constant "notification" in the notification area makes the program "visible app" instead of "hidden app", thus making it less likely to be killed by the system to make room for other apps.

A lot of problems with Android device occur when the system tries to make room by killing "secondary server" processes that are needed. Playback of audio (music or podcast) stopped, download stopped, location services stopped... etc. This especially happens on phones with little RAM. First Android phone, T-Mobile G1 / HTC Magic, has 192MB of RAM. Moto Droid have 256MB of RAM. Second generation of Android phones, like HTC Wildfire, got 384MB of RAM. Recent phones, like Droid X, Galaxy S, and so on got 512MB.

NOTE: Some apps, like web browser, can exit but still save the URL you were browsing. So when the process reloads, it is almost as if it was never unloaded. Unfortunately not all apps can do that.

So what is the solution?

There are two approaches to the problem: make more memory available, or pre-empt the auto-kill by killing apps yourself.

Making More Memory Available

There are four ways to make more memory available short of exchanging the phone for a more powerful one.

1) Free up more app storage / internal storage

[s]Either uninstall the apps altogether, or move2sd as much as possible. Keep in mind move2SD may not work for all apps, and amount that can be freed varies greatly. Uninstall an app is best, as it both frees up the space itself takes, and if it loads a service, that service is loaded either, saving even more space. [/s]

While it's true that the app that wasn't run won't take up any space, every widget is served by a service, and a small app can load a HUGE service by calling existing libraries and declare a large buffer for downloads. And just because you don't actually use the app doesn't mean the system will not load it. The only way to make sure the app will NOT be loaded is to uninstall it (or if you have Titanium Backup premium, you can "freeze" the app)

2) VMHeap

VMHeap adjusts the the amount of memory that can be dedicated to the Dalvik Virtual Machine (VM). In general this should not be touched, and does not really make more memory available. It is available only for experimentation purposes.

This usually is NOT tweakable without mod ROM such as Cyanogen Mod. And benefits are unproven so far. Don't change anything yet.

3) CompCache

CompCache, or "compressed cache", is handled by the Linux kernel. It takes a portion of your memory, and use it as a cache space, but compressed. By using on-the-fly compression it is able to make your memory appear to be a bit larger than it actually is. However, the result is slower performance.