A fellow emailed me wanting to screen scrape, er, ah, harvest a page that only displays the data he wants with a postback.
Remember what an HTTP GET looks like under the covers:
GET /whatever/page.aspx?param1=value¶m2=value
Note that the GET includes no HTTP Body. That's important. With a POST the 'DATA' moves from the QueryString into the HTTP Body, but you can still have stuff in the QueryString.
POST /whatever/page.aspx?optionalotherparam1=value
Content-Type: application/x-www-form-urlencoded
Content-Length: 25
param1=value¶m2=value
Note the Content-Type header and the Content-Length, those are important.
A POST is just the verb for when you have an HTTP document. A GET implies you got nothing.
So, in C#, here's a GET:
public static string HttpGet(string URI)
{
System.Net.WebRequest req = System.Net.WebRequest.Create(URI);
req.Proxy = new System.Net.WebProxy(ProxyString, true); //true means no proxy
System.Net.WebResponse resp = req.GetResponse();
System.IO.StreamReader sr = new System.IO.StreamReader(resp.GetResponseStream());
return sr.ReadToEnd().Trim();
}
{
System.Net.WebRequest req = System.Net.WebRequest.Create(URI);
req.Proxy = new System.Net.WebProxy(ProxyString, true); //true means no proxy
System.Net.WebResponse resp = req.GetResponse();
System.IO.StreamReader sr = new System.IO.StreamReader(resp.GetResponseStream());
return sr.ReadToEnd().Trim();
}
Here's a POST:
public static string HttpPost(string URI, string Parameters)
{
System.Net.WebRequest req = System.Net.WebRequest.Create(URI);
req.Proxy = new System.Net.WebProxy(ProxyString, true); //Add these, as we're doing a POST
req.ContentType = "application/x-www-form-urlencoded";
req.Method = "POST";
//We need to count how many bytes we're sending. Post'ed Faked Forms should be name=value&
byte [] bytes = System.Text.Encoding.ASCII.GetBytes(Parameters);
req.ContentLength = bytes.Length;
System.IO.Stream os = req.GetRequestStream ();
os.Write (bytes, 0, bytes.Length); //Push it out there
os.Close ();
System.Net.WebResponse resp = req.GetResponse();
if (resp== null) return null;
System.IO.StreamReader sr = new System.IO.StreamReader(resp.GetResponseStream());
return sr.ReadToEnd().Trim();
}
{
System.Net.WebRequest req = System.Net.WebRequest.Create(URI);
req.Proxy = new System.Net.WebProxy(ProxyString, true); //Add these, as we're doing a POST
req.ContentType = "application/x-www-form-urlencoded";
req.Method = "POST";
//We need to count how many bytes we're sending. Post'ed Faked Forms should be name=value&
byte [] bytes = System.Text.Encoding.ASCII.GetBytes(Parameters);
req.ContentLength = bytes.Length;
System.IO.Stream os = req.GetRequestStream ();
os.Write (bytes, 0, bytes.Length); //Push it out there
os.Close ();
System.Net.WebResponse resp = req.GetResponse();
if (resp== null) return null;
System.IO.StreamReader sr = new System.IO.StreamReader(resp.GetResponseStream());
return sr.ReadToEnd().Trim();
}
I could and should have put in more 'using' statements, but you get the gist. And, there are other ways to have done this with the BCL, but this is one.
Now, how would you fake an HTTP PostBack? Use a tool like ieHttpHeaders to watch what a real postback looks like, and well, fake it. :) Just hope they don't require unique/encrypted ViewState (via ViewStateUserKey or EnableViewStateMac) for that page, or you're out of luck.
No comments:
Post a Comment