You should not attempt to parse HTML with regex. HTML is not a regular language, so any regex you come up with will likely fail on some esoteric edge case. Please refer to the seminal answer to this question for specifics. While mostly formatted as a joke, it makes a very good point.
The following examples are in Java, but the regex will be similar, if not identical, in other languages.
String target = someString.replaceAll("<[^>]*>", "");
This assumes that your non-HTML text does not contain any < or > characters and that your input string is well formed.
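The sketch below expands the same idea slightly. It precompiles the pattern <[^>]*> once instead of recompiling it on every replaceAll call. It also shows the caveat just mentioned: a > inside a quoted attribute value ends the match early. The class and method names (StripAllTags, strip) are illustrative and do not come from any library.

```java
import java.util.regex.Pattern;

// Minimal sketch: assumes the text between tags contains no literal '<' or '>'.
final class StripAllTags {
    // Compiled once; same effect as someString.replaceAll("<[^>]*>", "").
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    private StripAllTags() {}

    // Deletes every "<...>" span and returns whatever text is left.
    static String strip(String html) {
        return TAG.matcher(html).replaceAll("");
    }
}
```

For example, StripAllTags.strip("<p>Hello, <b>world</b>!</p>") returns Hello, world!. In contrast, StripAllTags.strip("<a title=\">\">Hi</a>") returns ">Hi, because the match stops at the first > even though that character sits inside quotes.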
If you know they're a specific tag (for example, you know the text contains only <td> tags), you could do something like this. Here (?i) makes the match case-insensitive, /? removes the closing tags as well, and \b keeps it from matching longer tag names such as <tdata>:
String target = someString.replaceAll("(?i)</?td\\b[^>]*>", "");
Edit: Ωmega brought up a good point in a comment on another post: this would squish multiple results together if there were multiple tags. For example, if the input string were <td>Something</td><td>Another Thing</td>, then the above would result in SomethingAnother Thing.
In a situation where multiple tags are expected, we could do something like:
String target = someString.replaceAll("(?i)</?td\\b[^>]*>", " ").replaceAll("\\s+", " ").trim();
This replaces each tag with a single space, then collapses runs of whitespace, and then trims any whitespace from the ends.
HTML stands for HyperText Markup Language and is used to display information in the browser. Regular expressions can be used to find HTML tags in text, extract them, or remove them. Generally, it's not a good idea to parse HTML with regex, but a limited, known set of HTML can sometimes be parsed this way. Below is a simple regex that checks a string against the HTML tag pattern. It can also be used to remove all tags and leave only the text. The examples in this part are in JavaScript. Another common operation with HTML and regex is extracting the text between certain tags (a.k.a. scraping), which is covered after the tag-removal example.
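The sketch below shows both variants side by side in Java. It uses the case-insensitive pattern (?i)</?td\b[^>]*>, which matches opening and closing td tags, with or without attributes. The class name StripTdTags and the method names squash and spaced are made up for illustration.

```java
import java.util.regex.Pattern;

final class StripTdTags {
    // Opening or closing <td> tag, any attributes, any letter case;
    // \b stops it from matching longer names such as <tdata>.
    private static final Pattern TD = Pattern.compile("(?i)</?td\\b[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    private StripTdTags() {}

    // Removes the tags outright, so adjacent cells run together.
    static String squash(String html) {
        return TD.matcher(html).replaceAll("");
    }

    // Replaces each tag with a space, collapses whitespace runs, then trims.
    static String spaced(String html) {
        String replaced = TD.matcher(html).replaceAll(" ");
        return WHITESPACE.matcher(replaced).replaceAll(" ").trim();
    }
}
```

With the input <td>Something</td><td>Another Thing</td>, squash gives SomethingAnother Thing and spaced gives Something Another Thing. This is the difference described in the edit above.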
Match all HTML tags
/<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>/g;
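A quote-aware tag pattern skips over > characters inside quoted attribute values, and it can be used from Java as well. This sketch assumes the pattern <(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+> and shows three uses: finding every tag, checking whether a string is exactly one tag, and removing all tags. The class HtmlTagMatcher and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HtmlTagMatcher {
    // Quote-aware tag pattern: a quoted attribute value may contain '>'
    // without ending the match.
    static final Pattern TAG =
        Pattern.compile("<(?:\"[^\"]*\"['\"]*|'[^']*'['\"]*|[^'\">])+>");

    private HtmlTagMatcher() {}

    // Returns every tag found, in document order.
    static List<String> findTags(String html) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    // True only if the whole string is a single tag.
    static boolean isTag(String s) {
        return TAG.matcher(s).matches();
    }

    // Removes all tags, leaving only the text.
    static String strip(String html) {
        return TAG.matcher(html).replaceAll("");
    }
}
```

Unlike the simpler <[^>]*>, this pattern turns <a title=">">Hi</a> into Hi. It is still only a heuristic, not an HTML parser.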
// Remove all tags from a string
var htmlRegexG = /<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>/g;
'<p>Hello, world!</p>'.replace(htmlRegexG, ''); // returns 'Hello, world!'
Extract text between certain tags
var r1 = /
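Extraction could look like this in Java. This is a hedged sketch that assumes a lazy capture group between an opening tag and the next matching closing tag, which is one common way to write such a pattern. TagTextExtractor and between are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagTextExtractor {
    private TagTextExtractor() {}

    // Collects the content between <tagName ...> and the next </tagName>.
    // The lazy (.*?) stops at the first closing tag, so nested tags with the
    // same name are NOT handled correctly.
    static List<String> between(String html, String tagName) {
        String name = Pattern.quote(tagName); // treat the tag name literally
        Pattern p = Pattern.compile(
            "<" + name + "\\b[^>]*>(.*?)</" + name + "\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL); // DOTALL: content may span lines
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(html);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }
}
```

For example, between("<div>Hello</div><p>skip</p><div>World</div>", "div") yields [Hello, World]. Input such as <div><div>a</div>b</div> comes out wrong. That is the kind of edge case the warning at the top is about.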