LinuxQuestions.org
Programming: This forum is for all programming questions.
The question does not have to be directly related to Linux, and any language is fair game.

07-30-2009, 06:01 AM   #1
aliahsan81
LQ Newbie
 
Registered: Sep 2008
Posts: 22

Rep: Reputation: 0
Curl getting HTML tags


I am writing a script in which I want to use curl to download a web page and check its status. The problem is that when I use curl on the Linux command line, it downloads the HTML tags as well. How can I ignore these tags? Any ideas?
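A minimal sketch of the literal request, assuming a plain `sed` filter is acceptable: delete anything that looks like a tag from curl's output. This is a naive line-by-line filter; it does not handle tags split across lines, HTML comments, `<script>`/`<style>` contents, or entities such as `&amp;`. The `strip_tags` name is hypothetical.

```shell
#!/bin/sh
# Naive sketch: remove every "<...>" sequence on each line of stdin.
# Assumption: tags never span lines; comments, <script>/<style> bodies
# and entities like &amp; are left as-is.
strip_tags() {
  sed -e 's/<[^>]*>//g'
}

# Real use (needs network): curl -s http://example.com | strip_tags
# Offline demonstration with a tiny inline page:
printf '<html><body><p>Server response is OK</p></body></html>\n' | strip_tags
```

For real pages, a text-mode browser such as lynx or elinks run with `-dump` renders the page and is usually more robust than a regular expression.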
 
07-30-2009, 06:09 AM   #2
JulianTosh
Member
 
Registered: Sep 2007
Location: Las Vegas, NV
Distribution: Fedora / CentOS
Posts: 674
Blog Entries: 3

Rep: Reputation: 90
You can use the -I option (which sends a HEAD request) to retrieve only the headers and the page status. For example:

Code:
$ curl -I http://www.google.com
HTTP/1.1 200 OK
Date: Thu, 30 Jul 2009 11:08:36 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
Server: gws
Transfer-Encoding: chunked

$
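If only the numeric status code is needed, a small filter over the `curl -I` output can pull it out of the first line. This is a sketch assuming the status line has the usual `HTTP/x.y CODE REASON` shape; `http_status` is a hypothetical helper name. (curl can also print the code directly with `curl -s -o /dev/null -w '%{http_code}' URL`.)

```shell
#!/bin/sh
# Sketch: print the status code from `curl -I` output read on stdin.
# Assumes the first line is the status line, e.g. "HTTP/1.1 200 OK".
http_status() {
  # Drop the CRs of CRLF line endings, keep line 1, print its 2nd field.
  tr -d '\r' | head -n 1 | awk '{ print $2 }'
}

# Real use (needs network): curl -sI http://www.google.com | http_status
# Offline demonstration with headers like the ones above:
printf 'HTTP/1.1 200 OK\r\nServer: gws\r\n\r\n' | http_status
```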
 
07-30-2009, 06:15 AM   #3
aliahsan81
LQ Newbie
 
Registered: Sep 2008
Posts: 22

Original Poster
Rep: Reputation: 0
Thanks for the quick reply.

I want to get the page content so I can parse it in my script.
 
07-30-2009, 06:22 AM   #4
JulianTosh
Member
 
Registered: Sep 2007
Location: Las Vegas, NV
Distribution: Fedora / CentOS
Posts: 674
Blog Entries: 3

Rep: Reputation: 90
Can you post an example of what content you're retrieving and specifically what content you wish to extract from it?
 
07-30-2009, 06:26 AM   #5
aliahsan81
LQ Newbie
 
Registered: Sep 2008
Posts: 22

Original Poster
Rep: Reputation: 0
Yes, sure. The page I am fetching reports the server status, which is either good or bad.

The text is something like this:

Server response is OK

Or

server sending error xxx
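Given those two messages, a script can classify the page text and, on failure, pull out the error code. A minimal sketch, assuming the page contains exactly one of the two phrases quoted above and that the code follows "server sending error" as a single word; `classify_status` is a hypothetical name. It works on raw HTML or on stripped text, since it only searches for the phrases.

```shell
#!/bin/sh
# Sketch: read page text on stdin; print "OK" (exit 0) or
# "ERROR <code>" (exit 1). The message formats are assumptions
# taken from the examples in this post.
classify_status() {
  text=$(cat)
  if printf '%s\n' "$text" | grep -qi 'server response is ok'; then
    echo "OK"
    return 0
  fi
  # Capture the word after "server sending error", if present.
  code=$(printf '%s\n' "$text" |
    sed -n 's/.*[Ss]erver sending error \([^ <]*\).*/\1/p' | head -n 1)
  echo "ERROR ${code:-unknown}"
  return 1
}

# Offline demonstration; real use: curl -s URL | classify_status
for page in '<p>Server response is OK</p>' '<b>server sending error 503</b>'; do
  if printf '%s\n' "$page" | classify_status; then
    echo "  -> healthy"
  else
    echo "  -> needs attention"
  fi
done
```

Returning a distinct exit status lets the caller use the function directly in an `if`, the same way the grep check works.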
 
07-30-2009, 06:32 AM   #6
JulianTosh
Member
 
Registered: Sep 2007
Location: Las Vegas, NV
Distribution: Fedora / CentOS
Posts: 674
Blog Entries: 3

Rep: Reputation: 90
It's probably easiest to grep for the affirmative message and treat anything else as an error:
Code:
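# -s hides curl's progress meter; grep -q prints nothing and only sets the exit status.
# The pipeline's exit status is grep's, so the if tests whether the phrase was found.
if curl -s http://example.com | grep -qi "Server response is OK"
then
  echo "Everything's OK"
else
  echo "Get me a hammer."
fi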
 
07-30-2009, 06:47 AM   #7
aliahsan81
LQ Newbie
 
Registered: Sep 2008
Posts: 22

Original Poster
Rep: Reputation: 0
OK, I have already done that. So does that mean there is no way to achieve what I asked? I asked because a browser doesn't show the HTML, so I thought curl would do the same. Anyway, thanks for your help.
 
07-30-2009, 08:09 AM   #8
speccy
Member
 
Registered: Dec 2008
Location: Portsmouth, UK
Distribution: slackware
Posts: 39

Rep: Reputation: 32
If you are trying to get just the page content (without tags), try:

Code:
lynx -dump http://www.example.com
 
07-31-2009, 09:02 AM   #9
aliahsan81
LQ Newbie
 
Registered: Sep 2008
Posts: 22

Original Poster
Rep: Reputation: 0
Yes, thanks. I am using elinks.
 
All times are GMT -5.