This online HTML to XML converter turns HTML tables into XML, which saves a lot of time. Copy, paste, and convert to XML. It traverses the DOM and looks for tables in the HTML data.
What can you do with HTML to XML?
- This tool helps you get plain XML from an HTML table very quickly without writing a single line of code.
- The free HTML to XML converter can load a website URL that contains tables and convert them to XML. Click the URL button, enter the URL, and submit.
- The converter also supports loading an HTML file to transform to XML. Click the Upload button and select a file.
- The online HTML to plain XML converter works well on Windows, Mac, Linux, Google Chrome, Firefox, Edge, and Safari.
Example of HTML
HTML with Table
Tom | Cruise |
Maria | Sharapova |
James | Bond |
Converted HTML to XML
Tom Cruise Maria Sharapova James Bond
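The idea of walking the DOM and collecting tables can be sketched in a few lines of PHP. This is only an illustration, not the converter's actual implementation: the function name `htmlTablesToXml` and the output element names `tables`, `table`, `row`, and `cell` are assumptions, and nested tables are not handled separately.

```php
<?php
// Hypothetical helper: the function name and the element names "tables",
// "table", "row" and "cell" are assumptions, not the online tool's real format.
function htmlTablesToXml(string $html): string
{
    $source = new DOMDocument();
    // Real-world HTML often triggers libxml warnings; suppress them here.
    @$source->loadHTML($html);

    $xml = new DOMDocument('1.0', 'UTF-8');
    $xml->formatOutput = true;
    $root = $xml->appendChild($xml->createElement('tables'));

    // Traverse the parsed DOM and look for every table element.
    foreach ($source->getElementsByTagName('table') as $table) {
        $tableNode = $root->appendChild($xml->createElement('table'));
        // Note: this also picks up rows of nested tables.
        foreach ($table->getElementsByTagName('tr') as $tr) {
            $rowNode = $tableNode->appendChild($xml->createElement('row'));
            foreach ($tr->childNodes as $cell) {
                if ($cell->nodeType === XML_ELEMENT_NODE
                    && in_array($cell->nodeName, ['td', 'th'], true)) {
                    $cellNode = $xml->createElement('cell');
                    // createTextNode escapes special characters on output.
                    $cellNode->appendChild($xml->createTextNode(trim($cell->textContent)));
                    $rowNode->appendChild($cellNode);
                }
            }
        }
    }
    return $xml->saveXML();
}

$html = '<table><tr><td>Tom</td><td>Cruise</td></tr>'
      . '<tr><td>Maria</td><td>Sharapova</td></tr>'
      . '<tr><td>James</td><td>Bond</td></tr></table>';
echo htmlTablesToXml($html);
```

Building the output with `createElement` and `createTextNode` rather than string concatenation keeps the result well formed even when a cell contains `<` or `&`.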
For Advanced Users
HTML with Table External URL
Load an external HTML URL in the browser address bar like this: //codebeautify.org/html-to-xml-converter?url=external-url
//codebeautify.org/html-to-xml-converter?url=//gist.githubusercontent.com/cbmgit/644916fb1e435ddc367233c6d785652f/raw/html-table.html
I found a way to convert (even bad) HTML into well-formed XML. I started by basing this on the DOM loadHTML function. Over time, however, several issues came up, so I optimized the code and added patches to correct side effects.
function tryToXml($dom, $content) {
    if (!$content) return false;
    // xml well formed content can be loaded as xml node tree
    $fragment = $dom->createDocumentFragment();
    // wonderful appendXML to add an XML string directly into the node tree!
    // appendXML will fail on an xml declaration so manually skip this when occurred
    if (substr($content, 0, 5) == '<?xml') {
        $content = substr($content, strpos($content, '>') + 1);
    }
    // ASSUMPTION: this part of the listing is garbled in the source. The
    // fallback call and the htmlToXml name and parameters are reconstructed.
    if (!@$fragment->appendXML($content)) {
        // not well formed: let the HTML parser repair the content
        return htmlToXml($dom, $content);
    }
    return $fragment;
}

// ASSUMPTION: the name and parameter list are reconstructed; only
// $content, $bodyOnly and the encoding prefix are visible in the source.
function htmlToXml($dom, $content, $needEncoding = false, $bodyOnly = true) {
    if (!$content) return false;
    if ($needEncoding) {
        // assumed prefix: tells loadHTML which encoding the content uses
        $content = '<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">' . $content;
    }
    // return a dom from the content
    $domInject = new DOMDocument("1.0", "UTF-8");
    $domInject->preserveWhiteSpace = false;
    $domInject->formatOutput = true;
    // html type
    try {
        @$domInject->loadHTML($content);
    } catch (Exception $e) {
        // do nothing and continue as it's normal that warnings will occur on nasty HTML content
    }
    // to check encoding: echo $dom->encoding
    reworkDom($domInject);
    if ($bodyOnly) {
        $fragment = $dom->createDocumentFragment();
        // retrieve nodes within /html/body
        foreach ($domInject->documentElement->childNodes as $elementLevel1) {
            if ($elementLevel1->nodeName == 'body' && $elementLevel1->nodeType == XML_ELEMENT_NODE) {
                foreach ($elementLevel1->childNodes as $elementInject) {
                    // insertBefore without a reference node appends at the end
                    $fragment->insertBefore($dom->importNode($elementInject, true));
                }
            }
        }
    } else {
        $fragment = $dom->importNode($domInject->documentElement, true);
    }
    return $fragment;
}

function reworkDom($node, $level = 0) {
    // start with the first child node to iterate
    $nodeChild = $node->firstChild;
    while ($nodeChild) {
        // remember the sibling first, as the current node may be removed
        $nodeNextChild = $nodeChild->nextSibling;
        switch ($nodeChild->nodeType) {
            case XML_ELEMENT_NODE:
                // iterate through children element nodes
                reworkDom($nodeChild, $level + 1);
                break;
            case XML_TEXT_NODE:
            case XML_CDATA_SECTION_NODE:
                // do nothing with text, cdata
                break;
            case XML_COMMENT_NODE:
                // replace the - sign in comments so they follow the w3c guideline
                $nodeChild->nodeValue = str_replace("-", "_", $nodeChild->nodeValue);
                break;
            case XML_DOCUMENT_TYPE_NODE: // 10: needs to be removed
            case XML_PI_NODE: // 7: remove PI
                $node->removeChild($nodeChild);
                $nodeChild = null; // make null to test later
                break;
            case XML_DOCUMENT_NODE:
                // should not appear as it's always the root, just to be complete
                // however generate exception!
            case XML_HTML_DOCUMENT_NODE:
                // should not appear as it's always the root, just to be complete
                // however generate exception!
            default:
                throw new Exception("Engine: reworkDom type not declared [" . $nodeChild->nodeType . "]");
        }
        $nodeChild = $nodeNextChild;
    }
}
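The comment branch in reworkDom replaces every hyphen because XML does not allow "--" inside a comment, and a comment may not end with "-". A narrower variant, sketched here as an assumption rather than as part of the original code, only rewrites the hyphens that actually break well-formedness. The function name `sanitizeCommentText` is hypothetical.

```php
<?php
// Hypothetical helper, not part of the original listing.
function sanitizeCommentText(string $text): string
{
    // Turn each run of two or more hyphens into underscores of the same length.
    $text = preg_replace_callback('/-{2,}/', function (array $m): string {
        return str_repeat('_', strlen($m[0]));
    }, $text);
    // A trailing single hyphen would fuse with the closing "-->" delimiter.
    if (substr($text, -1) === '-') {
        $text = substr($text, 0, -1) . '_';
    }
    return $text;
}

$dom = new DOMDocument('1.0', 'UTF-8');
$root = $dom->appendChild($dom->createElement('info'));
$root->appendChild($dom->createComment(sanitizeCommentText('a -- b - c-')));
echo $dom->saveXML($root), "\n";
```

Single hyphens inside the text stay readable this way, at the cost of a slightly more involved rule than the blanket str_replace.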
The tryToXml function also makes it possible to add several HTML pieces into one XML document, which I needed myself. In general it can be used like this:
$c='testtwo
';
$dom = new DOMDocument('1.0', 'UTF-8');
$n = $dom->appendChild($dom->createElement('info')); // make a root element
if ($valueXml = tryToXml($dom, $c)) {
    $n->appendChild($valueXml);
}
// escape the markup so it shows up as text in a browser
echo htmlentities($dom->saveXML($n)), "\n";
In this example the HTML in $c is output as well-formed XML inside the info element. The info root tag is added because it also allows converting content that has more than one top-level element; when the content has only one root tag, the info tag can be skipped. With this I'm getting really nice XML out of unstructured and even corrupted HTML!
I hope this is reasonably clear and that other people find it useful.
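For readers who only want the core idea, here is a condensed, self-contained sketch. It is not the code above: it skips the appendXML attempt and the reworkDom clean-up, the helper name `htmlBodyToFragment` is hypothetical, and the sample input with an unclosed font tag is an assumption chosen for illustration.

```php
<?php
// Hypothetical helper: parse messy HTML and return the body content as a
// fragment owned by $dom. Simplified: no doctype, PI or comment clean-up.
function htmlBodyToFragment(DOMDocument $dom, string $html): DOMDocumentFragment
{
    $parsed = new DOMDocument('1.0', 'UTF-8');
    // libxml reports warnings for broken markup; silence them here.
    @$parsed->loadHTML($html);

    $fragment = $dom->createDocumentFragment();
    $body = $parsed->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            // importNode copies the node into the target document.
            $fragment->appendChild($dom->importNode($child, true));
        }
    }
    return $fragment;
}

$dom = new DOMDocument('1.0', 'UTF-8');
$info = $dom->appendChild($dom->createElement('info')); // single root element
// Assumed sample input: the font tag is never closed.
$fragment = htmlBodyToFragment($dom, '<p>test<font>two</p><p>three</p>');
if ($fragment->hasChildNodes()) {
    $info->appendChild($fragment);
}
echo $dom->saveXML($info), "\n";
```

The info wrapper is what lets two sibling paragraphs live in one XML document, since XML requires exactly one root element.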