Friday 14 February 2014

Extracting the high level domain (or sub domain) from URL in T-SQL (SQL Server)

I've not posted anything technical for years now so when I had to solve this problem yesterday I thought it might be a good idea to publish it. And yes I did look for a solution online but all I found were approaches that did not work for me (e.g. did not work with sub domains) or involved converting to xml which I thought is pretty heavy for what I want to do.

So here's what worked for me:

 SELECT Url = SUBSTRING(SourceUrl,0,CHARINDEX('.',SourceUrl,0))  
 FROM  
 (        
 SELECT SourceUrl = REPLACE(REPLACE(REPLACE(REPLACE(SourceUrl, 'http://www.',''), 'https://www.',''),'https://',''),'http://','')  
 FROM myUrlTable   
 ) urlData  
Comments and improvements are welcome and I hope someone finds this useful!

No comments: