Can anybody help me download a string from this site?
I use this code, but
Dim client As New Net.WebClient
Dim str As String = client.DownloadString("http://www.tsetmc.com/tsev2/chart/data/IndexFinancial.aspx?i=32097828799138957&t=ph")
the results are different.
The correct data are numbers:
"20081206,9249,9168,9249,9178,8539624,9178;20081207,9178,9130,9178,9130,11752353,9130"
but the results look like this:
"‹ ŠÜT ÿdë’í,«…ohýˆg}ÿ÷µyÆdöûuuQà”ÄxD¬Ï³K}æ¿Sûù"
You should set the WebClient's encoding before calling DownloadString. Try this code:
Dim client As New Net.WebClient
client.Encoding = Encoding.UTF8
Dim str As String = client.DownloadString("http://goo.gl/JRvlsm")
If you "get" the headers for your link:
Status:200
Raw:
HTTP/1.1 200 OK
Cache-Control: public, max-age=9999
Content-Length: 33183
Content-Type: text/csv; charset=utf-8
Content-Encoding: gzip
Expires: Sat, 23 Jul 2016 02:32:58 GMT
Last-Modified: Fri, 22 Jul 2016 23:46:19 GMT
Vary: *
Set-Cookie: ASP.NET_SessionId=vsxyok45zvtgsbvp4iqxdh45; path=/; HttpOnly
X-Powered-By: ASP.NET
Date: Fri, 22 Jul 2016 23:46:19 GMT
Request:
GET /tsev2/chart/data/IndexFinancial.aspx?i=32097828799138957&t=ph HTTP/1.1
You find that the data is gzip compressed (see the "Content-Encoding:" line). To address that, use this code:
' Requires: Imports System.IO, System.IO.Compression, System.Net
Dim myUrl As String = "http://www.tsetmc.com/tsev2/chart/data/IndexFinancial.aspx?i=32097828799138957&t=ph"
Dim result As String
Using client As New WebClient
    ' Ask for gzip explicitly, then decompress the response stream by hand.
    client.Headers(HttpRequestHeader.AcceptEncoding) = "gzip"
    Using rs As New GZipStream(client.OpenRead(myUrl), CompressionMode.Decompress)
        Using reader As New StreamReader(rs)
            result = reader.ReadToEnd()
        End Using
    End Using
End Using
The result is uncompressed text, just as you have indicated as the correct set of numbers:
20081206,9249,9168,9249,9178,8539624,9178;20081207,9178,9130,9178,9130,11752353,9130;
Here is where I found the info for decompressing gzip (more info there):
Automatically decompress gzip response via WebClient.DownloadData
Note: you may have to add a reference to "System.IO.Compression" in your project.
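To see why the undecompressed body looks like garbage, here is a minimal Python sketch of the same idea. It does not call the real site; it compresses the sample line from the question locally to stand in for the gzip response body, so the variable names and the simulated body are assumptions for illustration only.

```python
import gzip

# Sample payload copied from the question; the real response body is much longer.
expected = "20081206,9249,9168,9249,9178,8539624,9178;20081207,9178,9130,9178,9130,11752353,9130"

# Simulate what a server sends when it answers with Content-Encoding: gzip.
compressed_body = gzip.compress(expected.encode("utf-8"))

# Decoding the compressed bytes directly yields unreadable characters,
# which is the kind of output shown in the question.
garbled = compressed_body.decode("latin-1")

# Decompress first, then decode with the charset from Content-Type.
recovered = gzip.decompress(compressed_body).decode("utf-8")

print(garbled == expected)    # False
print(recovered == expected)  # True
```

The order matters: decompress the bytes first, and only then decode them as text.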
When using Splash with Scrapy the headers are returned from the Splash server instead of the website Splash renders.
response.headers returns:
{b'Server': [b'TwistedWeb/19.7.0'], b'Date': [b'Sun, 11 Jul 2021 07:31:32 GMT'], b'Content-Type': [b'text/html; charset=utf-8']}
And I'm trying to get the headers of the actual website:
Connection: Keep-Alive
Content-Length: 5
Content-Type: text/html
Date: Sun, 11 Jul 2021 07:05:49 GMT
Keep-Alive: timeout=5, max=100
Server: Apache
X-Cache: HIT
How can I get the headers of the website instead of the Splash server?
I got it to work with this:
splash_lua_script = """
function main(splash, args)
    assert(splash:go(args.url))
    assert(splash:wait(0.5))
    local entries = splash:history()
    local last_response = entries[#entries].response
    return {
        html = splash:html(),
        headers = last_response.headers
    }
end
"""
Then refer to them via response.headers in Scrapy.
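If you end up handling the script result yourself, a small helper like the one below may be useful. It is only a sketch, and it assumes the history entry's headers arrive as a HAR-style list of name/value pairs; if your setup already gives you a plain mapping, you don't need it. The function name har_headers_to_dict and the sample script_result are hypothetical, with header values borrowed from the question.

```python
from collections import defaultdict


def har_headers_to_dict(har_headers):
    """Group HAR-style [{"name": ..., "value": ...}] pairs by lower-cased header name."""
    grouped = defaultdict(list)
    for pair in har_headers:
        grouped[pair["name"].lower()].append(pair["value"])
    return dict(grouped)


# Hypothetical script result, shaped like the table the Lua script returns.
script_result = {
    "html": "<html></html>",
    "headers": [
        {"name": "Server", "value": "Apache"},
        {"name": "Content-Type", "value": "text/html"},
        {"name": "X-Cache", "value": "HIT"},
    ],
}

site_headers = har_headers_to_dict(script_result["headers"])
print(site_headers["server"])  # ['Apache']
```

Values are kept as lists so that repeated headers are not lost.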
I'm using Hyper to send HTTP requests, but when the response includes multiple cookies, Hyper combines them into one, which then breaks parsing.
For example, here's a simple PHP script:
<?php
setcookie("hello", "world");
setcookie("foo", "bar");
Response using curl:
$ curl -sLD - http://local.example.com/test.php
HTTP/1.1 200 OK
Date: Sat, 24 Dec 2016 09:24:04 GMT
Server: Apache/2.4.25 (Unix) PHP/7.0.14
X-Powered-By: PHP/7.0.14
Set-Cookie: hello=world
Set-Cookie: foo=bar
Content-Length: 0
Content-Type: text/html; charset=UTF-8
However for the following Rust code:
let client = Client::new();
let response = client.get("http://local.example.com/test.php")
    .send()
    .unwrap();
println!("{:?}", response);
for header in response.headers.iter() {
    println!("{}: {}", header.name(), header.value_string());
}
...the output will be:
Response { status: Ok, headers: Headers { Date: Sat, 24 Dec 2016 09:31:54 GMT, Server: Apache/2.4.25 (Unix) PHP/7.0.14, X-Powered-By: PHP/7.0.14, Set-Cookie: hello=worldfoo=bar, Content-Length: 0, Content-Type: text/html; charset=UTF-8, }, version: Http11, url: "http://local.example.com/test.php", status_raw: RawStatus(200, "OK"), message: Http11Message { is_proxied: false, method: None, stream: Wrapper { obj: Some(Reading(SizedReader(remaining=0))) } } }
Date: Sat, 24 Dec 2016 09:31:54 GMT
Server: Apache/2.4.25 (Unix) PHP/7.0.14
X-Powered-By: PHP/7.0.14
Set-Cookie: hello=worldfoo=bar
Content-Length: 0
Content-Type: text/html; charset=UTF-8
This seems to be really weird to me. I used Wireshark to capture the response and there're two Set-Cookie headers in it. I also checked the Hyper documentation but got no clue...
I noticed Hyper internally uses a VecMap<HeaderName, Item> to store the headers. So does it concatenate them into one? Then how should I divide them into individual cookies afterwards?
I think that Hyper prefers to keep the cookies together in order to make it easier to do some extra stuff with them, like checking a cryptographic signature with CookieJar (cf. this implementation outline).
Another reason might be to keep the API simple. Headers in Hyper are indexed by type and you can only get a single instance of that type with Headers::get.
In Hyper, you'd usually access a header by using a corresponding type. In this case the type is SetCookie. For example:
if let Some(&SetCookie(ref cookies)) = response.headers.get() {
    for cookie in cookies.iter() {
        println!("Got a cookie. Name: {}. Value: {}.", cookie.name, cookie.value);
    }
}
Accessing the raw header value of Set-Cookie makes less sense, because then you'll have to reimplement a proper parsing of quotes and cookie attributes (cf. RFC 6265, 4.1).
P.S. Note that in Hyper 0.10 the cookie is no longer parsed, because the crate that was used for the parsing triggers the openssl dependency hell.
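For comparison, here is a rough Python sketch of the general approach: keep each Set-Cookie line separate and parse every line on its own. It is not Hyper's API. It feeds the header block from the curl output above into the standard library's header parser, and the variable names are just for illustration.

```python
import io
from http.client import parse_headers
from http.cookies import SimpleCookie

# Header block copied from the curl output in the question (status line omitted).
raw_headers = (
    b"Date: Sat, 24 Dec 2016 09:24:04 GMT\r\n"
    b"Server: Apache/2.4.25 (Unix) PHP/7.0.14\r\n"
    b"X-Powered-By: PHP/7.0.14\r\n"
    b"Set-Cookie: hello=world\r\n"
    b"Set-Cookie: foo=bar\r\n"
    b"Content-Length: 0\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"\r\n"
)

message = parse_headers(io.BytesIO(raw_headers))

# get_all returns one list entry per Set-Cookie line instead of a joined string.
set_cookie_lines = message.get_all("Set-Cookie")

cookies = {}
for line in set_cookie_lines:
    parsed = SimpleCookie()
    parsed.load(line)  # parse each header line on its own
    for name, morsel in parsed.items():
        cookies[name] = morsel.value

print(set_cookie_lines)  # ['hello=world', 'foo=bar']
print(cookies)           # {'hello': 'world', 'foo': 'bar'}
```

Once the lines are glued together as hello=worldfoo=bar, the boundary between the two cookies can no longer be recovered reliably, which is why the typed accessor is the safer route.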
I have been trying for a while now to consume the Eventbrite API with VB.NET. I am using HttpClient, but it only returns HTTP 401 Unauthorized. When I call the same method with the same headers using Postman, it returns the expected response with HTTP 200 OK.
VB.Net
Dim objClient As New HttpClient()
objClient.BaseAddress = New Uri("https://www.eventbriteapi.com/v3/")
objClient.DefaultRequestHeaders.Authorization = New AuthenticationHeaderValue("Bearer", "IODVRTRFJ5FVEXZXXXXX")
Dim objResponse As HttpResponseMessage = Await objClient.GetAsync("events/search?organizer.id=77181XXXXX")
If objResponse.IsSuccessStatusCode Then
    Dim strJSON As String = Await objResponse.Content.ReadAsStringAsync
    txtOutput.Text = strJSON
Else
    txtOutput.AppendText(objResponse.ToString + vbCrLf)
    txtOutput.AppendText(objResponse.RequestMessage.ToString + vbCrLf)
End If
objClient.Dispose()
Request
Method: GET, RequestUri: 'https://www.eventbriteapi.com/events/search?organizer.id=77181XXXXX', Version: 1.1, Content: <null>, Headers:
{
Authorization: Bearer IODVRTRFJ5FVEXZXXXXX
}
Response
StatusCode: 401, ReasonPhrase: 'UNAUTHORIZED', Version: 1.1, Content: System.Net.Http.StreamContent, Headers:
{
Transfer-Encoding: chunked
Connection: keep-alive
x-xss-protection: 1; mode=block
x-content-type-options: nosniff
Vary: Accept
X-UA-Compatible: IE=edge
X-Frame-Options: SAMEORIGIN
Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: Authorization
Date: Fri, 28 Nov 2014 14:32:02 GMT
P3P: CP="NOI ADM DEV PSAi COM NAV OUR OTRo STP IND DEM"
Set-Cookie: SS=AE3DLHTSIyq8Ey6stmsFe7sH0LwxwTjQNw; Domain=.eventbriteapi.com; httponly; Path=/; secure
Set-Cookie: eblang=lo%3Den_US%26la%3Den-us; Domain=.eventbriteapi.com; expires=Sat, 28-Nov-2015 14:32:02 GMT; httponly; Path=/
Set-Cookie: SP=AGQgbblV535q50zSGYa6PvdeMsiuIPDnFlsnyVrk5VIvAnFRtrUHh7AU791a46nkYXJQhH3_VZLFgHuw4j8sAYXPy3l6adKHpQ5js-vjoXyJpTp51nd4Ewnhd-lS9UlI2YL0rUaHCkLvt4_buXJOvRuN222hINBjvQBsJPrR9woApj_ic0MT0cJcNIDsY40PnEOhH8p2xijrXVZHQa6fjwemsjgJEu_Vn6NBi4UO9hBL7sLl-eetYyE; Domain=.eventbriteapi.com; httponly; Path=/
Set-Cookie: G=v%3D1%26i%3D1429e3af-ac09-4b67-b2a3-1f4473c28bcd%26a%3D51b%26s%3DAPDvTK5mqMI40Xz80zXcPKvFx7daGz_DOA; Domain=.eventbriteapi.com; expires=Sat, 28-Nov-2015 14:32:02 GMT; httponly; Path=/
Set-Cookie: AN=; expires=Thu, 01-Jan-1970 00:00:00 GMT; Max-Age=0; Path=/
Server: nginx
Allow: GET
Allow: HEAD
Allow: OPTIONS
Content-Type: application/json
}
Postman
The reason I was getting the 401 Unauthorized was that I was originally querying https://www.eventbriteapi.com/v3/events/search?organizer.id=77181XXXXX.
This URL is not valid; it should be https://www.eventbriteapi.com/v3/events/search/?organizer.id=77181XXXXX (note the extra / after search).
Eventbrite automatically redirected me to the correct URL, but the redirected request lost the authentication header and was therefore unauthorized.
Working VB.Net code:
Dim objClient As New HttpClient()
Try
    objClient.BaseAddress = New Uri("https://www.eventbriteapi.com/v3/")
    objClient.DefaultRequestHeaders.Authorization = New AuthenticationHeaderValue("Bearer", Context.EventBriteApiToken)
    ' Note the trailing slash before the query string; it avoids the redirect.
    Dim objResponse As HttpResponseMessage = Await objClient.GetAsync("users/" + Context.EventBriteUserId + "/owned_events/?page=" + intPage.ToString)
    objResponse.EnsureSuccessStatusCode() ' Throws an exception on a non-success status code
    Dim strJSON As String = Await objResponse.Content.ReadAsStringAsync
    Return JsonConvert.DeserializeObject(Of EventBrite.EventSearchResponse)(strJSON)
Catch ex As Exception
    Throw ' Rethrow without resetting the stack trace
Finally
    objClient.Dispose()
End Try
I had a similar issue.
I was building a .NET application using the HttpClient class. The URL I was calling used http, and the server redirected it to https. Apparently the redirect drops the headers. This didn't happen through Postman!
Solution: use https.
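Both answers come down to requesting the final URL directly so that no redirect happens. A minimal Python sketch of that idea follows. The helper name canonical_api_url is hypothetical, and the assumption that the path should always end in a slash is taken only from the endpoint discussed above; other APIs may differ.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_api_url(url):
    """Return url with an https scheme and a trailing slash on the path."""
    parts = urlsplit(url)
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    return urlunsplit(("https", parts.netloc, path, parts.query, parts.fragment))


print(canonical_api_url(
    "http://www.eventbriteapi.com/v3/events/search?organizer.id=77181XXXXX"
))
# https://www.eventbriteapi.com/v3/events/search/?organizer.id=77181XXXXX
```

Normalizing the URL up front is simpler than trying to re-attach the Authorization header after a redirect.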
I use a very simple jquery.ajax() call to fetch some HTML snippet from a server:
// Init add lines button
$('body').on('click', '.add-lines', function(e) {
    $.ajax({
        type : 'POST',
        url : $(this).attr('href')+'?ajax=1&addlines=1',
        data : $('#quickorder').serialize(),
        success : function(data,x,y) {
            $('#directorderform').replaceWith(data);
        },
        dataType : 'html'
    });
    e.preventDefault();
});
On the PHP side I basically echo out an HTML string. The jQuery version is 1.8.3.
The problem is in IE10: it works fine on Server A, which runs Apache, but fails on Server B, which runs Nginx + PHP-FPM. If I debug the success handler on Server B, I get undefined for data. In the Network tab of the IE developer tools I can see the full response and all headers. It may affect other IE versions, but I could only test IE10 so far.
Here are the two response headers:
Server A, Apache (works):
HTTP/1.1 200 OK
Date: Thu, 25 Apr 2013 13:28:08 GMT
Server: Apache
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Vary: Accept-Encoding,User-Agent
Content-Encoding: gzip
Content-Length: 1268
Keep-Alive: timeout=2, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=UTF-8
Server B, Nginx + PHP-FPM (fails):
HTTP/1.1 200 OK
Server: nginx/1.1.19
Date: Thu, 25 Apr 2013 13:41:43 GMT
Content-Type: text/html; charset=utf8
Transfer-Encoding: chunked
Connection: keep-alive
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Content-Encoding: gzip
The body part looks the same in both cases.
Any idea what could cause this issue?
Please also check the Content-Type Header, since Apache and Nginx are sending different values:
Content-Type: text/html; charset=UTF-8
vs.
Content-Type: text/html; charset=utf8
Update your Nginx config by adding this line:
charset UTF-8;
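If you want to check this on other servers, here is a small Python sketch that pulls the charset label out of a Content-Type value and compares it with the registered name UTF-8. The helper names are made up for this example, and the check says nothing about how any particular browser treats the label.

```python
from email.message import Message


def charset_label(content_type):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_param("charset")


def uses_registered_utf8_name(content_type):
    """True when the charset label is the registered name UTF-8, ignoring case."""
    label = charset_label(content_type)
    return label is not None and label.lower() == "utf-8"


print(uses_registered_utf8_name("text/html; charset=UTF-8"))  # True
print(uses_registered_utf8_name("text/html; charset=utf8"))   # False
```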
When I make a request in RestSharp like so:
var response = client.Execute<bool>(request);
I get the following error:
"Unable to cast object of type 'System.Boolean' to type 'System.Collections.Generic.IDictionary`2[System.String,System.Object]'."
This is complete HTTP response, per Fiddler:
HTTP/1.1 200 OK
Cache-Control: no-cache
Pragma: no-cache
Content-Type: application/json; charset=utf-8
Expires: -1
Server: Microsoft-IIS/7.5
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Date: Mon, 01 Apr 2013 15:09:14 GMT
Content-Length: 5
false
It appears that everything is kosher with the response, so what gives?
Also, if I'm doing something stupid with my WebAPI Controller by returning a simple value instead of an object and that would fix my problem, feel free to suggest.
RestSharp will only deserialize valid JSON. A bare false is not a valid JSON text according to RFC 4627, which requires an object or array at the top level. The server will need to return at least something like the following:
{ "foo": false }
And you'll need a class like the following to deserialize to:
public class BooleanResponse
{
    public bool Foo { get; set; }
}
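The same wrap-it-in-an-object idea can be sketched in Python. This only illustrates the rule described above and is not how RestSharp works internally. Python's json module happily accepts a bare false, so the top-level check is written out explicitly; the function name parse_boolean_response is an assumption for this example.

```python
import json
from dataclasses import dataclass


@dataclass
class BooleanResponse:
    foo: bool


def parse_boolean_response(body):
    """Decode body, accepting only a top-level JSON object with a boolean "foo"."""
    decoded = json.loads(body)
    if not isinstance(decoded, dict):
        raise ValueError("expected a JSON object at the top level")
    value = decoded["foo"]
    if not isinstance(value, bool):
        raise ValueError('"foo" must be a boolean')
    return BooleanResponse(foo=value)


print(parse_boolean_response('{ "foo": false }'))  # BooleanResponse(foo=False)

try:
    parse_boolean_response("false")
except ValueError as error:
    print(error)  # expected a JSON object at the top level
```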