1. fix a performance regression when using line-by-line highlighting
* the root cause is that chroma's `lexers.Get` is slow and a lexer cache
is missing during recent changes
2. clarify the chroma lexer detection behavior
* now we fully manage our logic to detect lexer, and handle overriding
problems, everything is fully under control
3. clarify "code analyze" behavior, now only 2 usages:
* only use file name and language to detect lexer (very fast), mainly
for "diff" page which contains a lot of files
* if no lexer is detected by file name and language, use code content to
detect again (slow), mainly for "view file" or "blame" page, which can
get best result
4. fix git diff bug, it caused "broken pipe" error for large diff files
For when Enry correctly recognized the language, but returns the
language name in a way that isn't recognized by chroma.
Resolves https://github.com/go-gitea/gitea/issues/22443
---------
Co-authored-by: Moritz Jörg <moritz.jorg@oceanbox.io>
Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
Fix#29136
Before: The result is a table and all line numbers are all in one row.
After: Use a separate table column for the line numbers.
---------
Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
Refactor `modules/indexer` to make it more maintainable. And it can be
easier to support more features. I'm trying to solve some of issue
searching, this is a precursor to making functional changes.
Current supported engines and the index versions:
| engines | issues | code |
| - | - | - |
| db | Just a wrapper for database queries, doesn't need version | - |
| bleve | The version of index is **2** | The version of index is **6**
|
| elasticsearch | The old index has no version, will be treated as
version **0** in this PR | The version of index is **1** |
| meilisearch | The old index has no version, will be treated as version
**0** in this PR | - |
## Changes
### Split
Splited it into mutiple packages
```text
indexer
├── internal
│ ├── bleve
│ ├── db
│ ├── elasticsearch
│ └── meilisearch
├── code
│ ├── bleve
│ ├── elasticsearch
│ └── internal
└── issues
├── bleve
├── db
├── elasticsearch
├── internal
└── meilisearch
```
- `indexer/interanal`: Internal shared package for indexer.
- `indexer/interanal/[engine]`: Internal shared package for each engine
(bleve/db/elasticsearch/meilisearch).
- `indexer/code`: Implementations for code indexer.
- `indexer/code/internal`: Internal shared package for code indexer.
- `indexer/code/[engine]`: Implementation via each engine for code
indexer.
- `indexer/issues`: Implementations for issues indexer.
### Deduplication
- Combine `Init/Ping/Close` for code indexer and issues indexer.
- ~Combine `issues.indexerHolder` and `code.wrappedIndexer` to
`internal.IndexHolder`.~ Remove it, use dummy indexer instead when the
indexer is not ready.
- Duplicate two copies of creating ES clients.
- Duplicate two copies of `indexerID()`.
### Enhancement
- [x] Support index version for elasticsearch issues indexer, the old
index without version will be treated as version 0.
- [x] Fix spell of `elastic_search/ElasticSearch`, it should be
`Elasticsearch`.
- [x] Improve versioning of ES index. We don't need `Aliases`:
- Gitea does't need aliases for "Zero Downtime" because it never delete
old indexes.
- The old code of issues indexer uses the orignal name to create issue
index, so it's tricky to convert it to an alias.
- [x] Support index version for meilisearch issues indexer, the old
index without version will be treated as version 0.
- [x] Do "ping" only when `Ping` has been called, don't ping
periodically and cache the status.
- [x] Support the context parameter whenever possible.
- [x] Fix outdated example config.
- [x] Give up the requeue logic of issues indexer: When indexing fails,
call Ping to check if it was caused by the engine being unavailable, and
only requeue the task if the engine is unavailable.
- It is fragile and tricky, could cause data losing (It did happen when
I was doing some tests for this PR). And it works for ES only.
- Just always requeue the failed task, if it caused by bad data, it's a
bug of Gitea which should be fixed.
---------
Co-authored-by: Giteabot <teabot@gitea.io>
Change all license headers to comply with REUSE specification.
Fix#16132
Co-authored-by: flynnnnnnnnnn <flynnnnnnnnnn@github>
Co-authored-by: John Olheiser <john.olheiser@gmail.com>
Use check attribute code to check the assigned language of a file and send that in to
chroma as a hint for the language of the file.
Signed-off-by: Andrew Thornton <art27@cantab.net>
* Server-side syntax hilighting for all code
This PR does a few things:
* Remove all traces of highlight.js
* Use chroma library to provide fast syntax hilighting directly on the server
* Provide syntax hilighting for diffs
* Re-style both unified and split diffs views
* Add custom syntax hilighting styling for both regular and arc-green
Fixes#7729Fixes#10157Fixes#11825Fixes#7728Fixes#3872Fixes#3682
And perhaps gets closer to #9553
* fix line marker
* fix repo search
* Fix single line select
* properly load settings
* npm uninstall highlight.js
* review suggestion
* code review
* forgot to call function
* fix test
* Apply suggestions from code review
suggestions from @silverwind thanks
Co-authored-by: silverwind <me@silverwind.io>
* code review
* copy/paste error
* Use const for highlight size limit
* Update web_src/less/_repository.less
Co-authored-by: Lauris BH <lauris@nix.lv>
* update size limit to 1MB and other styling tweaks
* fix highlighting for certain diff sections
* fix test
* add worker back as suggested
Co-authored-by: silverwind <me@silverwind.io>
Co-authored-by: Lauris BH <lauris@nix.lv>
Move langauge detection to separate module to be more reusable
Add option to disable vendored file exclusion from file search
Allways show all language stats for search