[www_shared] branch master updated: extract only article content
From: gnunet
Subject: [www_shared] branch master updated: extract only article content
Date: Tue, 11 May 2021 19:18:26 +0200
This is an automated email from the git hooks/post-receive script.
dold pushed a commit to branch master
in repository www_shared.
The following commit(s) were added to refs/heads/master by this push:
new 2b72c7f extract only article content
2b72c7f is described below
commit 2b72c7f57d318271856f992eb2e58c133ae5179e
Author: Florian Dold <florian@dold.me>
AuthorDate: Tue May 11 19:16:04 2021 +0200
extract only article content
---
sitegen/site.py | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/sitegen/site.py b/sitegen/site.py
index f9d3e7d..2148763 100644
--- a/sitegen/site.py
+++ b/sitegen/site.py
@@ -65,14 +65,14 @@ def cut_text(filename, count):
     return textreduced
 
 
-def extract_body(text):
+def extract_body(text, content_id="newspost-content"):
     """Extract the body of some HTML and
     return it wrapped in an <article> tag."""
     soup = BeautifulSoup(text, features="lxml")
-    bs = soup.findAll("body")
-    b = bs[0]
-    b.name = "article"
-    return b.prettify()
+    content = soup.find(id=content_id)
+    if content is None:
+        raise Error("can't extract content")
+    return content.prettify()
 
 
 def make_helpers(root, in_file, locale):
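
For readers of this notification, the behavior change in the hunk above can be sketched as a standalone snippet. This is an illustration under stated assumptions, not the code in sitegen/site.py: the diff does not show where `Error` is defined, so the sketch substitutes a hypothetical `ContentNotFoundError`; it uses the built-in "html.parser" backend instead of "lxml" to avoid the extra dependency; and the sample page is invented. Note also that the docstring kept by the commit still mentions wrapping in an <article> tag, while the new code appears to return the matched element unchanged.

```python
from bs4 import BeautifulSoup


class ContentNotFoundError(Exception):
    """Hypothetical exception; the commit raises `Error`, which the diff does not define."""


def extract_body(text, content_id="newspost-content"):
    """Return the prettified HTML of the element whose id is content_id."""
    # Assumption: "html.parser" stands in for the "lxml" backend used in the commit.
    soup = BeautifulSoup(text, features="html.parser")
    # find(id=...) returns the first matching element, or None if there is none.
    content = soup.find(id=content_id)
    if content is None:
        raise ContentNotFoundError(f"no element with id={content_id!r}")
    return content.prettify()


# Invented sample page: only the element with the matching id should be returned.
page = """<html><body>
<nav>Site menu</nav>
<div id="newspost-content"><p>Hello</p></div>
</body></html>"""

extracted = extract_body(page)
print(extracted)
```

Compared with the previous version, which renamed the whole <body> to <article>, this selects a single element by id, so navigation and other page chrome are left out, and a page without that id fails loudly instead of being extracted in full.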
--
To stop receiving notification emails like this one, please contact
gnunet@gnunet.org.