
Archive for the ‘Google / Internet / Programming’ Category

Optimizing your website

Tuesday, February 12th, 2008

One of the most important factors that drive visitors away is slow-loading pages. Here we should distinguish between several types of sluggishness: (A) slow due to HTML bloat, (B) slow due to long server processing time, and (C) slow due to long browser rendering time.

(more…)

SEO friendly URLs with php, apache and mod-rewrite

Thursday, February 7th, 2008

Words used in URLs give hints to search engines about the content of the page. They also help advertisers figure out what the page is about. A URL that looks like node15 isn’t really optimized for search engines.

(more…)

Using atop to track your optimization results

Tuesday, January 29th, 2008

Recently, I rediscovered atop. It is an excellent command-line tool (just like top) that shows some advanced statistics, including disk usage and swap page faults per process. For per-process disk usage to work, a kernel patch is currently required. However, even without the kernel patch, if you know which processes are running on your server you’ll be able to guess which one is using the disk the most.

(more…)

Some thoughts about image and text ads

Wednesday, January 23rd, 2008

Lately, I’ve been thinking about whether mixing text and image ads is really profitable. The case that most advertising companies (like Google adsense, adbrite and lots of others) make is that allowing image ads in the same ad unit as text ads is usually better because they compete in price with text ads. But, my understanding is that image ads will only show if they beat the highest text ad price. I tried to analyze and see whether the final profit will be larger in case of mixed ads or not and I found conditions for image ads to make more money if allowed to compete against text ads in the same ad unit.
(more…)

Postgresql is really beating my expectations

Tuesday, January 15th, 2008

Since I updated the website this weekend, it has been downloading images from MRIS non-stop. In a previous post I said that I had around 900,000 images. I later discovered that I had dangling images whose listings had been deleted. Since this weekend, the update script has downloaded almost 820,000 images from MRIS and inserted those binaries into bytea fields in postgresql. I was expecting things to slow down, but I haven’t felt any slowdown since the process started. It seems the postgres team really optimized the b-tree indexes to the extreme.

(more…)

lawburg.com is up and running

Monday, November 5th, 2007

I had been playing with legal websites for a while, then thought: why don’t I use postgres on the Code of Federal Regulations and see what happens? I found that it indexed the law pretty well. I wouldn’t say it is the fastest search engine, but I think I can do something very competitive. I called the website lawburg.com since lawcity was taken. In the area where I live it is quite common for city names to end in the suffix burg, like Gaithersburg or Clarksburg in Montgomery County. So I combined the suffix burg with the word law to mean the law city.

Hopefully I’ll do more when I have more spare time.

Brute force attacks over SSH port

Friday, August 24th, 2007

On one of my Google searches zaf’s blog came up again; this time he was complaining about SSH attacks. I looked in my /var/log/auth.log and found this:

Aug 21 13:56:17 local sshd[7744]: Failed password for root from 202.171.152.211 port 42343 ssh2
Aug 21 13:56:19 local sshd[7752]: (pam_unix) authentication failure; logname= uid=0 euid=0
tty=ssh ruser= rhost=202.171.152.211.static.zoot.jp user=root
Aug 21 13:56:21 local sshd[7752]: Failed password for root from 202.171.152.211 port 43304 ssh2
Aug 21 13:56:26 local sshd[7768]: (pam_unix) authentication failure; logname= uid=0 euid=0
tty=ssh ruser= rhost=202.171.152.211.static.zoot.jp user=root
Aug 21 13:56:28 local sshd[7768]: Failed password for root from 202.171.152.211 port 44577 ssh2
Aug 21 13:56:30 local sshd[7770]: (pam_unix) authentication failure; logname= uid=0 euid=0
tty=ssh ruser= rhost=202.171.152.211.static.zoot.jp user=root
Aug 21 13:56:31 local sshd[7770]: Failed password for root from 202.171.152.211 port 46364 ssh2
Aug 21 13:56:33 local sshd[7776]: (pam_unix) authentication failure; logname= uid=0 euid=0
tty=ssh ruser= rhost=202.171.152.211.static.zoot.jp user=root
Aug 21 13:56:36 local sshd[7776]: Failed password for root from 202.171.152.211 port 47340 ssh2
Aug 21 13:56:38 local sshd[7782]: (pam_unix) authentication failure; logname= uid=0 euid=0
tty=ssh ruser= rhost=202.171.152.211.static.zoot.jp user=root
Aug 21 13:56:40 local sshd[7782]: Failed password for root from 202.171.152.211 port 48644 ssh2
Aug 21 13:56:42 local sshd[7790]: (pam_unix) authentication failure; logname= uid=0 euid=0
tty=ssh ruser= rhost=202.171.152.211.static.zoot.jp user=root
Aug 21 13:56:44 local sshd[7790]: Failed password for root from 202.171.152.211 port 49725 ssh2
Aug 21 13:56:46 local sshd[7800]: (pam_unix) authentication failure; logname= uid=0 euid=0
tty=ssh ruser= rhost=202.171.152.211.static.zoot.jp user=root
Aug 21 13:56:47 local sshd[7800]: Failed password for root from 202.171.152.211 port 50548 ssh2

Just by searching for the IP on Google, I found that this guy is very famous. The attack repeats two or three times a day from another Korean IP. After searching the net, the most promising and easiest-to-install solution was denyhosts. It scans /var/log/auth.log and applies rules to identify the IPs that are attacking your machine, then adds them to /etc/hosts.deny. A nice feature is a shared XML-RPC service through which every host running denyhosts can share the IPs trying to attack it. You can also download the latest blacklist.

Installation was practically trouble-free. I used synaptic to add the package, edited /etc/denyhosts.conf and enabled the SYNC parameters to share the IPs of attackers. After enabling SYNC, my hosts.deny was filled with more than 1400 IP addresses! Amazing how much time those people have to annoy us, instead of focusing on building something useful.

A quick note - I noticed that the attackers assume services run on standard port numbers: FTP on port 21 and SSH on port 22. A very simple countermeasure is to run the services on non-standard ports - this alone should thwart the vast majority of those automated attacks.
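To get a feel for what a log scanner like this has to do, here is a minimal Python sketch that counts failed password attempts per source address in sshd log lines like the ones above. This is only an illustration, not how denyhosts itself is implemented; the function names and the threshold of 5 are my own assumptions.

```python
import re
from collections import Counter

# Matches sshd lines such as:
#   "... sshd[7744]: Failed password for root from 202.171.152.211 port 42343 ssh2"
# The optional "invalid user " part covers attempts against non-existent accounts.
FAILED_RE = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+) port \d+")


def count_failed_logins(lines):
    """Return a Counter mapping source address -> number of failed password attempts."""
    counts = Counter()
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


def offenders(lines, threshold=5):
    """Addresses with at least `threshold` failures (the threshold is an arbitrary example value)."""
    return sorted(ip for ip, n in count_failed_logins(lines).items() if n >= threshold)
```

Feeding it the lines of the auth log (for example `offenders(open(path))`, with `path` pointing at your own log file) gives a list of candidate addresses to block; a real tool also has to handle log rotation, whitelists and expiry.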


Zoho and online office

Wednesday, August 22nd, 2007

One of the best tools that I used this year is Google Documents. I really liked the extent to which they stretched the use of javascript, and created almost an online office suite.

I was reading the Google Code blog when I came across a post about zoho.com; it was the first time I had heard of it. I tried the website, and it is way more advanced than Google Docs. It includes online presentations, project management, an online database and even a shared testing application.

I honestly believe that Microsoft should focus on the OS, do a good job there and stop stretching themselves so thin. They are nowhere close to any of those apps.

Google earth goes outside earth

Wednesday, August 22nd, 2007

I read Google’s post, and it’s just amazing how those guys think. I believe that those teams brainstorm and do good-quality product planning that lists and prioritizes features, and that they have developers who realize those features and don’t stop or refactor the product every couple of years.

So in summary, Google Earth is now capable of viewing the sky - which is what I call the result of good-quality lateral thinking and extending older ideas into today’s technology.

Installing freenx on ubuntu feisty - step by step till you see the GUI

Tuesday, August 14th, 2007

I’ve read a lot on the internet, and many posts, even the ubuntu community docs, claim that freenx installs seamlessly and works without any hiccups, which might in fact have been true at one point in time. The one that really worked well with no problems is nxserver from nomachine; however, the free version of nxserver is limited to two users and two sessions. This makes it impractical if a machine is accessed by more than two people.

Freenx comes in as the solution. However, every time I tried to install freenx I spent at least an hour trying to guess what was going on, and then I left it for something more useful. This time I decided to take it all the way to the end. I debugged every call from the nxserver file until I located the problem and finally got it to work.

I will describe what I did and how I got it to work. The solution might not work for everyone and on every setup, but it is better than nothing.

STEP 1
Add the following lines to your /etc/apt/sources.list [exactly like the community docs say]
deb http://free.linux.hp.com/~brett/seveas/freenx feisty-seveas freenx
deb-src http://free.linux.hp.com/~brett/seveas/freenx feisty-seveas freenx

STEP 2
apt-get install freenx

STEP 3
vi /usr/lib/nx/nxnode

Go to line 482 (press 482gg)

Edit the line (press i), change it to:
PATH="$PATH_BIN:$PATH"

Add another line after:
$PATH_BIN/nxagent :$display 2>&3 &

Go out of editing mode - press ESC

Save the file type :wq and press enter

STEP 4
vi /etc/nxserver/node.conf

go to line 70 (70gg)
uncomment ENABLE_SSH_AUTHENTICATION="1"

STEP 5
Download the nxclient for Windows from nomachine’s website. Make sure the SSL is not checked (i.e. SSL encryption is on). Connect with your client to your server and enjoy.

What was the problem with freenx then?
The nxagent does not run as expected and exits with code 1 because it can’t find an options file somewhere inside the .nx directory. That options file should have been generated earlier in the process, but it looks like it is not. After the nxagent terminates, your client finds no one to communicate with on the other side, and finally gives up and reports an error. If you remove the options parameter from the nxagent line, nxagent sometimes exits with an error because it can’t find a font! So I ended up removing all parameters except the display. Well, it worked OK for me and I hope it does for the rest of you.


Natural SEO

Friday, August 3rd, 2007

Adam wrote a post on Google’s webmaster central blog. I believe the way he describes cross-linking and server location is rooted in usability and benefit to the website visitor. One way or another, Google should be measuring the quality and usefulness of a site to the visitor. Things like bounce rates, pages per visit and of course inbound and outbound link quality are all factors in play here.

I believe that a website would succeed if it can capture people’s attention, and that point is the point where search engines will be sending even more visitors to that website. It’s actually the reverse of what we’re doing. Focusing on how to please the search engines won’t actually please them. But giving quality and a usable website to the visitor will make the visitor come again.

Another comment in line with the same thought was made by Adam on Webmaster World, when he mentioned in the middle of a post: “if this means site navigation that’s broken without javascript, well, that indicates a potentially significant user experience problem, and that should generally trump any SEO-related concerns.”

No more supplemental results

Wednesday, August 1st, 2007

Google announced yesterday that supplemental results went mainstream.

This was very important news for webmasters who used to complain A LOT that “my pages are all supplemental, please help”. I was one of those people for a very long time. The difference now is that there is no basis for complaining, as there is no longer any indication that their pages are supplemental. After all, showing that a page is supplemental does not contribute to the searcher’s experience.

Although I know I have supplemental pages in my website, my main goal is to increase my website traffic and conversion rates; that is what I learned over the course of the past three years. The last thing I learned to care about is how many pages are supplemental. Google, Yahoo and other search engines will be able to send a certain number of daily visitors. It is up to us to make those visitors come back again.

Currently I’m working on a newer version of the website that enhances the user experience, increases visitor engagement and creates a relationship with the website visitor. I believe that this is the correct route to grow my website.


Analytics broken

Tuesday, July 31st, 2007

On July 28th, I suddenly found no traffic in my analytics account. Although I repeatedly checked their blog, there were no posts until Monday, when the Google people noticed and posted this. I hope that one day they will implement an e-mail notification system instead of making us think that something is broken with our websites!

Now I’m using more than one stats service, other than awstats with daily breaks. I used to use awstats, but I gave up on it as it is not meant for a large website.

Besides analytics, I’m giving hittail a try to see what kind of information it adds.

Internet data: host sites comparison, site details and how many websites are there

Saturday, June 30th, 2007

I was browsing yesterday when I stumbled upon Netcraft. They provide a lot of information, such as domain-related information and the uptime of a domain. They issue a monthly web server survey that shows how many domains are active out there and which web server they are running. Another monthly report shows the most reliable hosting companies out there.

How many websites are out there?

What webserver are those websites running?

The number of websites is 122,000,635 according to Netcraft’s June 2007 survey, 3.5% more than in May 2007. If that monthly growth rate holds, the number of active websites doubles approximately every 20 months (log 2 / log 1.035 :) ).
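The doubling estimate is easy to check. The small Python sketch below assumes the 3.5% monthly growth stays constant, which is of course a simplification; the function name is mine.

```python
import math


def doubling_time(growth_rate):
    """Number of periods needed to double at a constant per-period growth rate.

    Solves (1 + growth_rate) ** n == 2 for n.
    """
    return math.log(2) / math.log(1 + growth_rate)


months = doubling_time(0.035)
print(f"{months:.1f} months")  # prints "20.1 months"
```

Any logarithm base works here, since only the ratio of the two logs matters.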

After running my website for two and a half years, I now realize that this data would have been extremely useful when I was picking my first web host. I hope that others starting their websites can benefit from that information.


Editing code using VIM for C / C++ / PHP versus emacs

Monday, June 25th, 2007

There are two scripts that I always use with VIM:
1. Tlist (or tags list): It lists the functions, variables in all opened source files.
2. Nerd commenter: Provides easy commenting of source code; the easiest way is using Ctrl+C during insert mode.

I’ve been using vim for a while, and lately I tried to use emacs for editing my code. I discovered that emacs is a much better editor, but one that takes a lot of time to configure. It is so configurable that it can take you days to customize it to the point that makes you happy. I liked the M-; fast comment shortcut in emacs, and its C-style indentation (corresponding to set cindent in vim).

Since I was trying to emulate the same environment I had with vim, I looked for something similar to Tlist, and I found an elisp program called ecb (emacs code browser) that provides the tags in the opened source file. I also found the speedbar from the cedet-tools website, which provides similar functionality. Unfortunately, neither tool was able to browse php code.

Although I believe that emacs is a much better editor, at the moment the available plugins do not support PHP, which for me is very important as I do a lot of php editing. However, for C/C++ I will be using emacs (or xemacs) for developing as it is much easier to work with.


Cumulative percentage of website traffic

Tuesday, June 19th, 2007

[Figure: Cumulative percentage of mibrahim.net traffic]

Every day of the week sees a different traffic pattern. For my website, for example, Mondays and Tuesdays are the busiest. Websites of a different nature and with other geographic targets will see different patterns.

At the start of the day I always wanted a rough estimate of the day’s traffic. twatch provides an estimate, and I guess it is just a linear extrapolation. I used the hourly bandwidth output from awstats and calculated the cumulative percentage over the day. On my site, about 40% of the day’s traffic has arrived by the end of the noon hour (just when the clock turns one), so at that point I divide the number of sessions (or page views) so far by 0.4, which gives a rough idea of how many sessions or page views I’ll see that day.
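The same calculation can be sketched in a few lines of Python. This is just my illustration of the idea, assuming you have a list of historical per-hour totals (for example, the hourly bandwidth figures from awstats); the function names are made up for the example.

```python
def cumulative_fractions(hourly_totals):
    """Turn per-hour totals (normally 24 values) into the cumulative share of the day.

    fractions[i] is the share of the day's traffic seen by the end of hour i.
    """
    total = sum(hourly_totals)
    if total <= 0:
        raise ValueError("hourly totals must sum to a positive number")
    running = 0
    fractions = []
    for value in hourly_totals:
        running += value
        fractions.append(running / total)
    return fractions


def project_daily_total(count_so_far, hours_elapsed, fractions):
    """Scale the count seen after `hours_elapsed` full hours up to a full-day estimate."""
    if not 1 <= hours_elapsed <= len(fractions):
        raise ValueError("hours_elapsed is outside the profile")
    fraction = fractions[hours_elapsed - 1]
    if fraction <= 0:
        raise ValueError("no traffic recorded yet in the profile for this hour")
    return count_so_far / fraction
```

With a profile where 40% of the traffic has arrived by 1 pm, 400 sessions at that hour project to about 1000 for the day. Since each weekday has its own pattern, it probably makes sense to keep a separate profile per weekday.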


apache log and awstats

Friday, May 18th, 2007

When I was hosting my website on bluehost.com I used awstats to track website usage. Although awstats is not as advanced as Google Analytics, it has a very important feature that Google Analytics does not have: it analyzes the apache logs. Because of this, awstats shows what apache sees.

When I did that, I noticed that three other domains that I own were pointing to the same server, with no virtual hosts set up for them. The result was that my main website (mibrahim.net) showed up on four domain names. Worse than that, Yahoo and Google crawled parts of the website, and Yahoo cached some pages under one of the other domains.

Of course this is one of the reasons I saw lower traffic, due to duplicate content penalties from search engines, and I anticipate it will take weeks until the traffic goes back to the older levels.

One good lesson: use more than one web stats tool to analyze your traffic - it won’t hurt, and sometimes it helps.


Mail moved out

Tuesday, May 8th, 2007

Finally my website is totally moved out of bluehost.com. The last step in the process was moving my mail out, and I did that some yesterday, and some today. Bluehost is a very good hosting company, actually they are the best shared hosting company I used, after lunarpages. They have lots of limitations, which are of course due to the shared hosting environment.

I learned about Google Apps a week ago. I tried it today, and found that it is excellent. I switched the MX records to point to Google’s mail exchange servers, and this is what my dig output looks like now:

; <<>> DiG 9.3.4 <<>> MX mibrahim.net
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 60105
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 5

;; QUESTION SECTION:
;mibrahim.net.                  IN      MX

;; ANSWER SECTION:
mibrahim.net.           959     IN      MX      5 ALT2.ASPMX.L.GOOGLE.COM.
mibrahim.net.           959     IN      MX      1 ASPMX.L.GOOGLE.COM.
mibrahim.net.           959     IN      MX      10 ASPMX2.GOOGLEMAIL.COM.
mibrahim.net.           959     IN      MX      5 ALT1.ASPMX.L.GOOGLE.COM.

;; ADDITIONAL SECTION:
ASPMX.L.GOOGLE.COM.     257     IN      A       72.14.247.27
ASPMX2.GOOGLEMAIL.COM.  440     IN      A       209.85.135.27
ALT1.ASPMX.L.GOOGLE.COM. 15     IN      A       72.14.253.27
ALT2.ASPMX.L.GOOGLE.COM. 20     IN      A       64.233.167.27
ALT2.ASPMX.L.GOOGLE.COM. 20     IN      A       64.233.167.114

;; Query time: 8 msec
;; SERVER: 71.252.0.12#53(71.252.0.12)
;; WHEN: Tue May  8 11:52:46 2007
;; MSG SIZE  rcvd: 220

After thinking a little about what Google did, I realized how extremely smart they are. Emails are basically more impressions for their ads. Instead of making individuals sign up, some of whom might end up being spammers, they made companies sign up, and obviously that’s more reliable than individual users.

Avoid PR leakage with technorati simpletags

Monday, May 7th, 2007

Like many other people, I tag my wordpress posts with relevant tags. This process usually leads to better coverage from technorati, and from other blog related websites that read the tags from your post. The way tags are usually done is by placing a link to technorati.com/tag/, and that’s the way plugins like simpletags operate.

Today I read at technorati that you can still tag your post without linking to technorati. That will be great as it will avoid leaking PR through many links pointing to technorati. Although many people debate the effect of page rank leak, it is better to be cautious especially if your website brings in business to you.

So I opened the simpletags plugin source code, and found a couple of locations to update. I basically changed the link of the tag from technorati to ‘#’, which has the effect of pointing to the same page. In this way all the tags should be processed by technorati (and others), and links will be pointing to your own page with static text links. In my opinion this should tip search engines with some keywords, and stop leaking PR if there was a PR leak. P.S. PR is only used by Google but similar link-based ranking methods are used with other search engines like Yahoo.

I attached the file to this post.


Out of the box apache versus my own compilation

Wednesday, May 2nd, 2007

I’ve been running tests trying to figure out whether to use the server packages that come pre-compiled with the linux distributions or to compile my own. I always was under the impression that compiling your own and optimizing for your own architecture is better.

When I actually tested that today, I found that there is a tremendous increase in speed when you compile and optimize for your architectures. I compared the apache that comes with ubuntu feisty versus my own compilation with export CFLAGS='-O3 -mtune=opteron', which makes gcc optimize for the opteron processor with full optimizations.

I ran a local copy of my compiled apache2, and used apache bench (ab) to access the default ‘It works!’ page with -n500 -c50. My locally compiled version served 7700 requests per second. I copied the same file and placed it inside the apache2 directory of the feisty distribution, then tested again with apache bench using the same parameters. I got 2200 requests per second.

The 3.5x difference in apache2 performance was just amazing! I know from my experience that -O3 with gcc makes a big difference in performance, but I didn’t imagine that coupling -O3 with processor tuning would result in that boost. Then again, I don’t really know what the default compilation flags of the Ubuntu packages are; maybe they didn’t use -O3.

I repeated the same test with lighttpd, using the same file. Apache bench reported 11953 requests/second - just amazing!
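A ratio and a percentage increase are easy to mix up when comparing benchmark numbers, so here is a tiny Python sketch that computes both from the requests-per-second figures quoted above. The function names are my own; the numbers are the ones from this post.

```python
def speedup(new_rps, baseline_rps):
    """How many times faster the new result is than the baseline."""
    return new_rps / baseline_rps


def percent_increase(new_rps, baseline_rps):
    """Increase over the baseline, expressed as a percentage of the baseline."""
    return (new_rps - baseline_rps) / baseline_rps * 100


# Custom-compiled apache2 (7700 req/s) versus the stock feisty package (2200 req/s)
print(f"{speedup(7700, 2200):.1f}x")           # prints "3.5x"
print(f"{percent_increase(7700, 2200):.0f}%")  # prints "250%"
```

By the same measure, the lighttpd result of 11953 requests/second is a bit more than five times the stock apache2 figure.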
