I have a string:
Manager of Medical Threat Devlop at Micro
Using re.findall, I want to find the words that come after the last "at", "for" or "of". If I apply r'(?:for|at|of)\s+(.*)'
I get the incorrect result ['Medical Threat Devlop at Micro'], because the match starts at the first keyword rather than the last one.
I want a regular expression that works from the end of the string and returns ['Micro'].
P.S. Please don't advise split.
More examples:
1) Manager of Medical Threat Devlop at Canno -> Canno
2) Manager of Medicalof Threat Devlop of Canno -> Canno
3) Manager of Medicalfor Threat Devlop for Canno -> Canno
4) Threat Devlop at Canno Matt -> Canno Matt
You can use
re.findall(r'.*\b(?:for|at|of)\s+(.*)', text)
See the regex demo. Details:
.* - any zero or more chars other than line break chars, as many as possible
\b - a word boundary
(?:for|at|of) - for, at or of
\s+ - one or more whitespaces
(.*) - Group 1: any zero or more chars other than line break chars, as many as possible.
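As a quick check, the greedy pattern can be run against the examples from the question. This is only a minimal sketch; the helper name after_last_keyword is made up for illustration and is not part of the answer.

```python
import re

# Greedy-prefix pattern from the answer: the leading .* pushes the
# keyword match as far to the right as possible.
PATTERN = re.compile(r'.*\b(?:for|at|of)\s+(.*)')

def after_last_keyword(text):
    """Hypothetical helper: list with the text after the last keyword."""
    return PATTERN.findall(text)

examples = [
    'Manager of Medical Threat Devlop at Canno',
    'Manager of Medicalof Threat Devlop of Canno',
    'Manager of Medicalfor Threat Devlop for Canno',
    'Threat Devlop at Canno Matt',
]
for text in examples:
    print(text, '->', after_last_keyword(text))
```

Because of the \b before the keyword, the "at" inside "Threat" and "Matt" and the "of" inside "Medicalof" are not treated as keywords.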
Another regex that will fetch the same results is
re.findall(r'\b(?:for|at|of)\s+((?:(?!\b(?:for|at|of)\b).)*)$', text)
Details:
\b - a word boundary
(?:for|at|of) - for, at or of
\s+ - one or more whitespaces
((?:(?!\b(?:for|at|of)\b).)*) - Group 1: any char other than a line break char, zero or more occurrences, as many as possible, that does not start a for, at or of whole-word char sequence
$ - end of string.
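To see that the two patterns agree, they can be compared side by side. The sketch below uses the string from the question plus a made-up string without any keyword; the variable names are illustrative assumptions.

```python
import re

# Pattern 1: a greedy prefix moves the match to the last keyword.
greedy = re.compile(r'.*\b(?:for|at|of)\s+(.*)')
# Pattern 2: the lookahead forbids another whole-word keyword
# between the matched keyword and the end of the string.
tempered = re.compile(r'\b(?:for|at|of)\s+((?:(?!\b(?:for|at|of)\b).)*)$')

samples = [
    'Manager of Medical Threat Devlop at Micro',
    'Threat Devlop at Canno Matt',
    'Micro',  # no keyword at all (made-up input)
]
for text in samples:
    a = greedy.findall(text)
    b = tempered.findall(text)
    print(repr(text), a, b, a == b)
```

For a string with no keyword, both patterns return an empty list.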
Note you can also use re.search since you expect a single match:
match = re.search(r'.*\b(?:for|at|of)\s+(.*)', text)
if match:
    print(match.group(1))
Try re.split; it would work for this.
Your question is not fully clear. Please give some more input and output examples.
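import re

s = 'Manager of Medical Threat Devlop at Micro'
# Split on the whole words "at", "for" or "of" and keep the last piece.
# The \b stops the "at" inside words such as "Threat" from splitting.
s = re.split(r'\b(?:at|for|of)\s+', s)[-1:]
print(s)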
OUTPUT
IN : OUTPUT
'Manager of Medical Threat Devlop at Micro' : ['Micro']
'Threat Devlop at Canno Matt' : ['Canno Matt']
There is another method to do this (using re.finditer).
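import re

string = 'Threat Devlop at Canno Matt'
# Find every whole-word "at", "for" or "of" followed by whitespace
matches = re.finditer(r'\b(?:at|for|of)\s+', string)
# Everything after the end of the last match is the answer
last_index = list(matches)[-1].end()
print(string[last_index:])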
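The snippet above raises an IndexError when the string contains no keyword, because list(...)[-1] indexes an empty list. A possible guard is sketched below; tail_after_last is a hypothetical helper name, and returning None for "no keyword" is an assumption about the desired behaviour.

```python
import re

def tail_after_last(text):
    """Hypothetical helper: text after the last whole-word keyword, or None."""
    last = None
    # Walk through all matches and keep only the final one.
    for last in re.finditer(r'\b(?:at|for|of)\s+', text):
        pass
    return text[last.end():] if last else None

print(tail_after_last('Threat Devlop at Canno Matt'))
print(tail_after_last('no keywords here'))
```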
I am not good at re at all (but I get it).
Yes, there is another way to do this (using re.findall).
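import re

string = 'Threat Devlop at Canno of Matjkasa'
# The greedy .* makes this match everything up to the last keyword
s = re.findall(r'.*\b(?:at|for|of)\s+', string)
# Remove that prefix; this assumes exactly one match was found
print(string.replace(*s, ''))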
If you want to do it with a regex, then here's the way to do it.
Replace matches of the following regex with the empty string:
.*\b(?:for|at|of)\b\s?
This will match:
.* : any characters (being greedy, this pattern matches as many characters as possible)
\b(?:for|at|of)\b : your hotwords between word boundaries
\s? : an optional whitespace character
Check the demo here
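In Python, "replace matches with the empty string" can be done with re.sub. The following is a minimal sketch of that idea; the function name strip_through_last_keyword is an assumption made for illustration.

```python
import re

def strip_through_last_keyword(text):
    """Hypothetical helper: drop everything up to and including the last keyword."""
    # Replace the greedy prefix (plus one optional whitespace) with nothing.
    return re.sub(r'.*\b(?:for|at|of)\b\s?', '', text)

print(strip_through_last_keyword('Manager of Medical Threat Devlop at Micro'))
print(strip_through_last_keyword('Threat Devlop at Canno Matt'))
```

Since \s? removes at most one whitespace character, a keyword followed by two spaces leaves one leading space in the result; \s* could be used instead if that matters. A string without any keyword is returned unchanged.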