How to strip HTML tags from a string in JavaScript

On the server side you would normally use PHP functions (such as strip_tags) to remove HTML and ugly formatting. However, if you can't do this on the server (or your server runs Node.js), you can still achieve it with JavaScript. In this article, you will find 3 ways to strip the HTML tags from a string in JavaScript.

1. Create a temporary DOM element and retrieve the text

This is the preferred (and recommended) way to strip the HTML from a string with JavaScript in the browser. The provided HTML string is set as the content of a temporary div element, and the text is then read back from the element's textContent property (falling back to innerText):

/**
 * Returns the text from an HTML string
 *
 * @param {String} html The HTML string
 * @returns {String} The text content without tags
 */
function stripHtml(html){
    // Create a new div element
    var temporalDivElement = document.createElement("div");
    // Set the HTML content with the provided string
    temporalDivElement.innerHTML = html;
    // Retrieve the text property of the element (cross-browser support)
    return temporalDivElement.textContent || temporalDivElement.innerText || "";
}

// Example markup (the exact tags are illustrative)
var htmlString = "<h1>Hello World</h1>\n<p>It's me, Mario</p>";

//Hello World
//It's me, Mario
console.log(stripHtml(htmlString));

The only problem with this approach (which is also its advantage) is that the browser handles the provided string as real HTML. That means that if the string contains JavaScript the browser can interpret, such as an inline event handler, it will be executed:

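// This won't do anything but retrieve the text:
// scripts inserted through innerHTML are not executed
stripHtml("<script>alert(\"Hello\");</script>")

// But this runs the handler as soon as the image fails to load
// (illustrative payload)
stripHtml("<img src='x' onerror='alert(\"Hello\")'>")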

Therefore, you should use this only if you trust the source of the HTML string.
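If the string comes from a source you don't fully trust, a commonly suggested alternative in browsers is to parse it with DOMParser instead of assigning it to the innerHTML of an element. The document it returns is generally treated as inert (scripting is disabled for it), so inline handlers like the one above are not expected to run. The following is a minimal sketch under that assumption. The function name stripHtmlInert is hypothetical, and this is not an HTML sanitizer: it only returns text.

```javascript
/**
 * Sketch: returns the text of an HTML string by parsing it into a
 * separate document instead of a live element.
 * Assumes a browser-like environment that provides DOMParser.
 *
 * @param {String} html The HTML string
 * @returns {String} The text content without tags
 */
function stripHtmlInert(html) {
    // Parse the string into a new, detached HTML document
    var doc = new DOMParser().parseFromString(html, "text/html");
    // Read the text of the parsed body (empty string if there is none)
    return doc.body.textContent || "";
}

// Only run the demo where DOMParser exists (it is not a Node.js global)
if (typeof DOMParser !== "undefined") {
    console.log(stripHtmlInert("<p>Hello <b>World</b></p>")); // Hello World
}
```

Entities are decoded by the parser here as well, just as with the temporary div.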

2. If you are using jQuery

If you use jQuery, you can simplify the code from the first step. The following code does the same as the code in the first step (the warnings apply too):

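// Example markup (the exact tags are illustrative)
var htmlString = "<div>\n <h1>Hello World</h1>\n <p>This is the text that we should get.</p>\n <p>Our Code World &copy; 2017</p>\n</div>";

// Put the HTML inside a temporary div and read its text
var strippedHtml = $("<div/>").html(htmlString).text();

// Hello World
// This is the text that we should get.
// Our Code World © 2017
console.log(strippedHtml);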

3. With a regular expression

If you're working in a Node environment, where neither document nor its createElement method is available, you can use a regular expression to remove all the HTML tags from a string:

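// Example markup (the exact tags are illustrative)
var htmlString = "<h1>Hello World</h1>\n<p>It's me, Mario</p>";

// Remove everything from a "<" up to the next ">"
var strippedHtml = htmlString.replace(/<[^>]+>/g, '');

//Hello World
//It's me, Mario
console.log(strippedHtml);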

This method works for simple markup, but it only removes the tags themselves, that is, everything between a less-than sign (<) and the next greater-than sign (>). HTML entities are left in the string untouched, as shown in the following example:

// Example markup (the exact tags are illustrative)
var htmlString = "<div>\n <h1>Hello World</h1>\n <p>This is the text that we should get.</p>\n <p>Our Code World &copy; 2017</p>\n</div>";

var strippedHtml = htmlString.replace(/<[^>]+>/g, '');

// Hello World
// This is the text that we should get.
// Our Code World &copy; 2017
console.log(strippedHtml);

The copyright entity (&copy; in this example) should be rendered as a copyright symbol (©), but it is still there as an HTML entity. That's clearly a disadvantage compared with the first method, but not everything is lost. You can use JavaScript to decode HTML entities into readable characters. The following example strips all the HTML using the replace instruction shown above and converts the HTML entities to human-readable characters using the he library:

// The he library has to be installed first (npm install he)
var he = require("he");

// Example markup (the exact tags are illustrative)
var htmlString = "<div>\n <h1>Hello World</h1>\n <p>This is the text that we should get.</p>\n <p>Our Code World &copy; 2017</p>\n</div>";

var strippedHtml = htmlString.replace(/<[^>]+>/g, '');
var decodedStrippedHtml = he.decode(strippedHtml);

// Hello World
// This is the text that we should get.
// Our Code World &copy; 2017
console.log(strippedHtml);

// Hello World
// This is the text that we should get.
// Our Code World © 2017
console.log(decodedStrippedHtml);

As you can see, the he library converted the remaining HTML entities into their readable values. Note that you don't necessarily need the he library, because you can write your own function to decode HTML entities.
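If adding a dependency is not an option, a hand-written decoder can cover the common cases. The following is a minimal sketch, not a replacement for he. The names NAMED_ENTITIES, decodeEntities and stripTagsAndDecode are hypothetical, and only numeric references plus a handful of named entities are handled. Anything else is left as it is.

```javascript
// Small, deliberately incomplete table of named entities (assumption:
// these few are enough for the example; HTML defines many more)
var NAMED_ENTITIES = {
    amp: "&",
    lt: "<",
    gt: ">",
    quot: "\"",
    apos: "'",
    nbsp: "\u00A0",
    copy: "\u00A9"
};

/**
 * Decodes decimal (&#169;), hexadecimal (&#xA9;) and a few named
 * (&copy;) entities in a single pass over the text.
 */
function decodeEntities(text) {
    return text.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z][a-z0-9]*);/gi, function (match, body) {
        if (body.charAt(0) === "#") {
            var isHex = body.charAt(1) === "x" || body.charAt(1) === "X";
            var codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
            // Leave references outside the Unicode range untouched
            if (codePoint > 0x10FFFF) {
                return match;
            }
            return String.fromCodePoint(codePoint);
        }
        // Unknown names are returned unchanged
        return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
            ? NAMED_ENTITIES[body]
            : match;
    });
}

/**
 * Removes the tags with the regular expression from this section and
 * then decodes the entities that are left.
 */
function stripTagsAndDecode(html) {
    return decodeEntities(html.replace(/<[^>]+>/g, ""));
}

// Our Code World © 2017 © ©
console.log(stripTagsAndDecode("<p>Our Code World &copy; 2017 &#169; &#xA9;</p>"));
```

The tags are removed before decoding so that an escaped tag such as &lt;b&gt; ends up as visible text instead of being stripped.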

Happy coding!

How do I extract a string from an HTML tag in Python?

This can be done with the re module: its findall() function extracts all the strings that match a regular expression built from the tag and its delimiters.

How do I strip HTML tags from a string?

There are several ways to strip all the HTML tags from a string in JavaScript. You can use the replace() function with a regular expression, or read the textContent or innerText property of an HTML DOM element.

What are the ways of finding HTML elements?

There are several ways to do this:

Finding HTML elements by id.

Finding HTML elements by tag name.

Finding HTML elements by class name.

Finding HTML elements by CSS selectors.

Finding HTML elements by HTML object collections.

How do I find tags in HTML?

To check your site's HTML tags:

Right-click on your webpage in Google Chrome.

Click 'Inspect'.

You'll see the HTML code in a panel at the side or bottom of the page.

Use Ctrl + F to find particular tags or elements.
