In the early nineties, the World Wide Web (WWW) was invented. Nowadays, most people think that the WWW simply consists of all the pretty (or not so pretty) HTML-pages that you can read with your WWW browser. But back then, one of the main intentions behind the design of the WWW was to unify several existing communication protocols.
Then (and even now), information on the Internet was available via a multitude of channels: FTP, HTTP, E-Mail, News, Gopher [2], and many more. Thanks to the WWW, all these services can now be uniformly addressed via URLs (Uniform Resource Locators). The syntax of URLs is defined in the Internet standard RFC 1738. For our problem, we consider a simplified version of the syntax, which is as follows:
<protocol> "://" <host> [ ":" <port> ] [ "/" <path> ]
The square brackets [] mean that the enclosed string is optional and may or may not appear. Examples of URLs are the following:
- http://www.informatik.uni-ulm.de/acm
- ftp://acm.baylor.edu:1234/pub/staff/mr-p
- gopher://veryold.edu
More specifically,
- <protocol> is always one of http, ftp or gopher.
- <host> is a string consisting of alphabetic (a-z, A-Z) or numeric (0-9) characters, periods (.), dashes (-), and underscores (_).
- <port> is a positive integer, smaller than 65536.
- <path> is a string that contains no spaces.
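The four rules above can be sketched as one small check per component. This is only an illustration under stated assumptions: the class name `UrlComponentRules` and the helper names are hypothetical, and the sketch assumes that a host and a path, when present, are non-empty.

```java
import java.util.Arrays;
import java.util.List;

public class UrlComponentRules {
    // Hypothetical helpers for illustration; none of these names come from the problem statement.
    static final List<String> PROTOCOLS = Arrays.asList("http", "ftp", "gopher");

    static boolean isProtocol(String s) {
        return PROTOCOLS.contains(s);
    }

    static boolean isHost(String s) {
        // Letters, digits, period, dash, underscore; assumed to be non-empty.
        return s.matches("[A-Za-z0-9._-]+");
    }

    static boolean isPort(String s) {
        // Digits only; the 5-digit cap keeps Integer.parseInt from overflowing.
        if (!s.matches("\\d{1,5}")) {
            return false;
        }
        int value = Integer.parseInt(s);
        return value >= 1 && value < 65536;
    }

    static boolean isPath(String s) {
        // No whitespace anywhere; assumed to be non-empty.
        return s.matches("\\S+");
    }

    public static void main(String[] args) {
        System.out.println(isProtocol("gopher"));                  // true
        System.out.println(isHost("www.informatik.uni-ulm.de"));   // true
        System.out.println(isPort("1234"));                        // true
        System.out.println(isPort("65536"));                       // false
        System.out.println(isPath("pub/staff/mr-p"));              // true
    }
}
```

Since the input is stated to follow the format, a solution does not strictly need the port range check; it is included here only to mirror the rule as written.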
You are to write a program that parses a URL into its components.
[2] Gopher: the ancestor of today’s WWW. Now nearly extinct, but quite important when the WWW was invented.
The input starts with a line containing a single integer n, the number of URLs in the input file. The following n lines contain one URL each, in the format described above. The URLs will consist of at most 60 characters each.
For each URL in the input first print the number of the URL, as shown in the sample output. Then print four lines, stating the protocol, host, port and path specified by the URL. If the port and/or path are not given in the URL, print the string <default> instead. Adhere to the format shown in the sample output.
Print a blank line after each test case.
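Sample Input 1
3
ftp://acm.baylor.edu:1234/pub/staff/mr-p
http://www.informatik.uni-ulm.de/acm
gopher://veryold.edu
Sample Output 1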
URL #1
Protocol = ftp
Host = acm.baylor.edu
Port = 1234
Path = pub/staff/mr-p

URL #2
Protocol = http
Host = www.informatik.uni-ulm.de
Port = <default>
Path = acm

URL #3
Protocol = gopher
Host = veryold.edu
Port = <default>
Path = <default>

import java.io.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(System.out));
        int n = Integer.parseInt(br.readLine().trim());
        // Groups 1-4 capture protocol, host, port, and path; port and path are optional.
        String regex = "(http|ftp|gopher)://([\\w.-]+)(?::([\\d]+))?(?:/([\\S]+))?";
        // Compile once instead of once per input line.
        Pattern pattern = Pattern.compile(regex);
        String[] arr = {"Protocol = ", "Host = ", "Port = ", "Path = "};
        for (int i = 1; i <= n; i++) {
            bw.write("URL #" + i + "\n");
            Matcher matcher = pattern.matcher(br.readLine());
            if (matcher.find()) {
                for (int j = 1; j <= 4; j++) {
                    // An optional group that took no part in the match is null.
                    if (matcher.group(j) != null) {
                        bw.write(arr[j - 1] + matcher.group(j) + "\n");
                    } else {
                        bw.write(arr[j - 1] + "<default>" + "\n");
                    }
                }
            }
            // The problem statement asks for a blank line after each test case.
            bw.write("\n");
        }
        bw.flush();
    }
}
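The task is to split a given URL into Protocol, Host, Port, and Path, printing each part that is present and <default> for a Port or Path that is missing.
Protocol is one of http, ftp, or gopher, so it is (http|ftp|gopher).
Host consists of letters, digits, periods, dashes, and underscores, so it is ([\\w.-]+); \\w already covers letters, digits, and the underscore.
Port is just digits that follow a colon, but the colon itself must not be captured, and the Port may or may not be present, so it is (?::([\\d]+))?.
Path is a string without whitespace that follows a slash and, like Port, may or may not be present, so it is (?:/([\\S]+))?.
A for loop then prints the matched text for each group that exists, and <default> for each group that does not. The backslashes are doubled because the pattern is written as a Java string literal.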
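To see the group behavior in isolation, here is a minimal sketch that wraps the same pattern in a helper and runs it on two of the sample URLs. The class name `UrlGroupsDemo` and the method `parse` are hypothetical names introduced for this illustration. The point to observe is that the `(?: ... )` wrappers consume the colon and the slash without capturing them, and that an optional group which did not take part in the match comes back as `null`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlGroupsDemo {
    // Same pattern as in the solution: four capturing groups, the last two optional.
    static final Pattern URL = Pattern.compile(
            "(http|ftp|gopher)://([\\w.-]+)(?::([\\d]+))?(?:/([\\S]+))?");

    // Hypothetical helper: returns {protocol, host, port, path},
    // using "<default>" for a missing port or path, or null if nothing matches.
    static String[] parse(String url) {
        Matcher m = URL.matcher(url);
        if (!m.find()) {
            return null;
        }
        String[] parts = new String[4];
        for (int j = 1; j <= 4; j++) {
            String g = m.group(j); // null when the optional group was skipped
            parts[j - 1] = (g != null) ? g : "<default>";
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] samples = {
            "ftp://acm.baylor.edu:1234/pub/staff/mr-p",
            "gopher://veryold.edu"
        };
        for (String url : samples) {
            System.out.println(String.join(" | ", parse(url)));
        }
        // prints:
        // ftp | acm.baylor.edu | 1234 | pub/staff/mr-p
        // gopher | veryold.edu | <default> | <default>
    }
}
```

Keeping the parsing in its own method makes it easy to check individual URLs without feeding standard input.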
